clarkb | Just about meeting time | 18:59 |
clarkb | Not sure how many people we'll end up with today | 18:59 |
clarkb | #startmeeting infra | 19:00 |
opendevmeet | Meeting started Tue Mar 26 19:00:06 2024 UTC and is due to finish in 60 minutes. The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot. | 19:00 |
opendevmeet | Useful Commands: #action #agreed #help #info #idea #link #topic #startvote. | 19:00 |
opendevmeet | The meeting name has been set to 'infra' | 19:00 |
clarkb | #link https://lists.opendev.org/archives/list/service-discuss@lists.opendev.org/thread/WN2HTKAHF257JN2FT3ZZ5YOAXF3Y5KW3/ Our Agenda | 19:00 |
frickler | o/ | 19:00 |
clarkb | #topic Announcements | 19:00 |
tonyb | o/ | 19:00 |
clarkb | The OpenStack Release happens next week and the PTG is the week after | 19:01 |
clarkb | #link https://etherpad.opendev.org/p/apr2024-ptg-opendev Get your PTG agenda items on the agenda doc | 19:01 |
clarkb | feel free to add ideas for our PTG time on this etherpad | 19:01 |
clarkb | #topic Server Upgrades | 19:01 |
clarkb | I don't think there is much new here. Other than to mention the rackspace mfa changes (which we'll dig into shortly) | 19:02 |
clarkb | We did make the changes on our end and we think that launch node as well as reverse dns updates should be working | 19:02 |
clarkb | it's only the forward dns commands that won't work anymore and we'll have to do those in the gui or figure out how to use the api key for that too (but we rarely update openstack.org records these days so not a big deal) | 19:02 |
clarkb | If you do launch new servers and run into trouble please say something. This is all fairly new and any new behavior is worth knowing about | 19:03 |
tonyb | noted | 19:04 |
clarkb | #topic MariaDB Upgrades | 19:04 |
fungi | fyi i didn't test the volume options in launch-node with it | 19:04 |
clarkb | We upgraded the refstack DB and it went just as smoothly as the paste upgrade (all good things) | 19:04 |
clarkb | fungi: ack, but that was always weird before :/ | 19:04 |
clarkb | The remaining services we need to upgrade are etherpad, gitea, gerrit, and mailman3 | 19:04 |
fungi | i think mm3 should be straightforward, also not a big version jump | 19:05 |
clarkb | due to the release and ptg I hesitate to upgrade etherpad, gitea, and gerrit dbs right now (also I'm busy with that stuff so generally have less time) | 19:05 |
clarkb | ya I suspect mm3 may be the safest one to do in the next little bit | 19:05 |
fungi | i can tackle that one when i'm on a more stable connection | 19:05 |
clarkb | sounds good. Let me know if I can help but it is pretty cookie cutter I think | 19:05 |
fungi | yep | 19:05 |
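[Editor's note: as context for the "cookie cutter" upgrade process, these services run MariaDB from the official container image, so the upgrade is essentially an image tag bump plus the image's built-in auto-upgrade hook. A hypothetical docker-compose fragment; the service name, versions, and paths are illustrative, not OpenDev's actual system-config:]

```yaml
services:
  mariadb:
    image: mariadb:10.11        # tag bumped from the previous major, e.g. 10.4
    environment:
      # real env var in the official mariadb image: runs mariadb-upgrade
      # against the existing datadir on container start
      MARIADB_AUTO_UPGRADE: "1"
    volumes:
      - /var/mariadb/db:/var/lib/mysql
```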
clarkb | #topic AFS Mirror Cleanups | 19:06 |
clarkb | We got centos 7 and opensuse leap pretty well cleared out after I fixed the script updates. There is another cleanup change https://review.opendev.org/c/opendev/system-config/+/913454 to remove the opensuse script entirely since it is nooping now and doesn't need to run at all | 19:07 |
clarkb | That isn't urgent | 19:07 |
clarkb | Next up is ubuntu xenial cleanup. I'm not going to be able to look at that more closely until after the PTG though | 19:07 |
clarkb | I did mention in the TC meeting today that we've freed up space if anyone wants to start on noble mirroring and images | 19:08 |
clarkb | I think the rough plan there is add jobs to dib for noble, then add mirroring, then add image builds to nodepool | 19:08 |
clarkb | I'm happy for others to push on that otherwise I'm likely also going to wait until after the PTG to dig into that | 19:08 |
clarkb | also happy to report that cleaning out opensuse and centos didn't make openafs sad. May have just been coincidence | 19:09 |
clarkb | or the response to the issue has corrected it for the future. Either way I was happy I didn't have to scramble to restore openafs | 19:09 |
clarkb | #topic Rebuilding Gerrit Images | 19:10 |
clarkb | #link https://review.opendev.org/c/opendev/system-config/+/912470 Update our 3.9 image to 3.9.2 | 19:10 |
clarkb | as mentioned previously this will also rebuild our gerrit 3.8 images against the latest state of that branch. I noticed in gerrit discord this morning that there are a number of bugfixes on stable-3.8 that people want a 3.8.5 release for. We can rebuild and get those deployed before a release happens so all the more reason to do this update | 19:10 |
clarkb | I'm thinking that maybe we plan to do this late next week after the openstack release is done? | 19:11 |
clarkb | we could potentially sneak in a gerrit mariadb upgrade at that time too | 19:11 |
fungi | wfm | 19:11 |
fungi | good with both | 19:11 |
clarkb | cool I won't worry about this too much until next week then | 19:11 |
fungi | maybe as separate restarts | 19:11 |
fungi | just to be extra sure we know what triggered a problem if it blows up | 19:12 |
clarkb | ++ | 19:12 |
clarkb | #topic Rackspace MFA Requirement | 19:12 |
clarkb | fungi updated our three primary accounts to use MFA with info in the usual place | 19:12 |
clarkb | thank you for handling that | 19:12 |
clarkb | Today is also the big day it becomes required. As mentioned earlier keep an eye out for unexpected changes in behavior and say something if you notice any | 19:13 |
fungi | it was very straightforward, the complexity was just lots of extra precautions and staged testing on our end | 19:13 |
clarkb | fungi: I did mean to ask which app you chose to setup to get the secret totp key | 19:13 |
clarkb | maybe it didn't matter and any of the app choices provided a valid string | 19:13 |
fungi | definitely keep an eye on build results from jobs since we're still wary of the swift/cloudfiles accounts | 19:14 |
corvus1 | are we uploading logs to rax currently? | 19:14 |
clarkb | corvus1: we are | 19:14 |
corvus1 | so we're going with the "assume it works and yank it out if it breaks" approach? | 19:14 |
clarkb | ya I think so | 19:14 |
corvus1 | sounds good to me | 19:14 |
fungi | clarkb: in the end it didn't make me choose a specific app. just said i had a phone app rather than choosing the sms option | 19:14 |
clarkb | fungi: gotcha | 19:15 |
fungi | and fed the copyable string equivalent beneath their qr code | 19:15 |
fungi | into our usual totp tool | 19:15 |
clarkb | cool I'm glad they made that easy (my credit union does not) | 19:15 |
fungi | it was only confusing insofar as they didn't mention that it might work with totp implementations besides the three popular phone apps they listed | 19:16 |
clarkb | ya this is a common issue with 2fa setups. They push you to an app which almost always results in totp because that's what all the apps do | 19:17 |
fungi | for my personal account, i hooked up two different librem key fobs (using the nitrocli utility which supports them) | 19:17 |
clarkb | they just pretend that the protocol is something people shouldn't be aware of | 19:17 |
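[Editor's note: the point here is that the "copyable string" under the QR code is just a standard TOTP secret, usable by any RFC 6238 implementation, not only the listed phone apps. A minimal stdlib-only sketch of the protocol (illustrative, not the tool fungi actually used; real provisioning strings are base32-encoded, while this takes raw key bytes):]

```python
import hashlib
import hmac
import struct


def totp(secret: bytes, for_time: int, digits: int = 6, step: int = 30) -> str:
    """RFC 6238 TOTP: HOTP (RFC 4226) over a 30-second time counter."""
    counter = for_time // step                      # time-based moving factor
    mac = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F                         # dynamic truncation offset
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


# RFC 6238 appendix B test vector (SHA-1, 8 digits, T=59):
print(totp(b"12345678901234567890", 59, digits=8))  # → 94287082
```

[Any authenticator app, hardware fob, or CLI tool computing this same function from the same secret produces matching codes, which is why the provider's list of "supported apps" is a UI fiction.]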
clarkb | #topic Project Renames | 19:18 |
clarkb | Still pencilled in for April 19 | 19:18 |
clarkb | #link https://review.opendev.org/c/opendev/system-config/+/911622 Move gerrit replication queue aside during project renames. | 19:18 |
fungi | sounds good, i expect to be around | 19:18 |
clarkb | this change is the only prep I've done so far but I expect to start organizing things after the PTG (there is a theme around scheduling for me can you tell) | 19:18 |
fungi | also we heard back from starlingx folks and they aren't entertaining any repo renames for that window | 19:19 |
clarkb | oh good | 19:19 |
clarkb | I missed that | 19:19 |
fungi | at least that was my takeaway from the starlingx-discuss ml thread | 19:19 |
clarkb | if you are listening in and do intend on renaming a project now is a good time to start preparing for that | 19:20 |
clarkb | #topic Nodepool Image Delete After Upload | 19:20 |
clarkb | I haven't seen any changes to implement the config update in opendev. I can't remember if someone volunteered for that last week (also maybe I missed the change) | 19:21 |
clarkb | I do think this may be a good space saving change we can make in addition to cleaning up old image builds like buster, leap etc so I should tack this onto the cleanup work for images | 19:21 |
clarkb | unless someone else wants to go ahead and push a change up | 19:22 |
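[Editor's note: for anyone picking this up, the builder-side knob under discussion is, to my understanding, nodepool's per-diskimage delete-after-upload setting. A hypothetical config fragment; the image name and kept format are illustrative:]

```yaml
# nodepool builder config sketch (illustrative values)
diskimages:
  - name: ubuntu-jammy
    delete-after-upload: true   # drop local files once all clouds have the image
    keep-formats:
      - qcow2                   # optionally retain one format for re-uploads
```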
clarkb | #topic Review02 had an Oops Last Night | 19:22 |
fungi | saw that, but too late to help out, sorry :( | 19:23 |
clarkb | around 02:46-02:47 UTC last night review02 was shutdown. Best I can tell looking at logs this was not a graceful shutdown | 19:23 |
fungi | didn't see any further updates from vexxhost folks yet | 19:24 |
clarkb | Once I realized this was happening I gave it a few minutes just to see if the cloud would automate a restart then manually started it. The containers were in an error 255 stopped state so did not auto start on boot. I manually down and up'd them again | 19:24 |
clarkb | ya the last update from vexxhost was that this was likely an OOM event on the host/hypervisor side which is why it was sudden and opaque to us | 19:24 |
fungi | at least it booted with minimal trauma, aside from maybe some changes not being properly indexed? | 19:25 |
tonyb | I missed that it was gone, because it looked like the address range was gone so I assumed it was and didn't dig further | 19:25 |
clarkb | everything seems to have come back mostly ok. I did note one manila change that wasn't indexed | 19:25 |
clarkb | fungi: ya that was the only issue I could find | 19:25 |
clarkb | and just one change that I found | 19:25 |
fungi | tonyb: i think (but am not certain) that vexxhost does bgp to the hypervisor hosts | 19:25 |
fungi | which explains why routing gets turned around at the core or edge when a vm is unreachable | 19:26 |
clarkb | I had come out of my migraine/headache cocoon to eat something and maybe push out a meeting agenda and did this instead :) | 19:26 |
tonyb | okay. good to know | 19:26 |
clarkb | I was actually feeling a lot better at that point so not a huge deal but that is why I didn't get the agenda out until this morning | 19:26 |
fungi | i had just gotten out of the car and onto the internet when i saw you'd rebooted the vm, terrible timing on my part | 19:27 |
frickler | to me it is a bit worrying having to assume that this can repeat any time | 19:27 |
clarkb | frickler: I agree, but until we hear back from the cloud as to why it happened we don't really know what the problem is and if it can repeat | 19:27 |
fungi | which is why hearing from vexxhost folks on a rca might help assuage our concerns | 19:28 |
clarkb | if it is an OOM then java does allocate memory quite greedily so I do think that is a good hunch | 19:28 |
clarkb | basically we'll look like the best candidate for killing in an oomkiller situation | 19:28 |
clarkb | and mnaser did say they would look today so hopefully we hear back soon and then determine if we need to make any suggestions or work with them to improve things | 19:28 |
frickler | the hypervisor only sees qemu as a whole | 19:29 |
clarkb | frickler: correct but we'll actually be using all 96GB of memory or whatever in that qemu process | 19:29 |
clarkb | whereas other VMs with that much RAM may only use a small portion in reality | 19:29 |
fungi | main concern is whether it's oversubscribed host, vs memory leak in services running alongside the hypervisor | 19:29 |
clarkb | due to the way java allocates memory | 19:29 |
frickler | ceph rbd client in qemu would be a not uncommon consumer | 19:30 |
fungi | ram oversubscription would be odd, since vexxhost had previously mentioned the hosts having an excess of available ram | 19:31 |
clarkb | we should keep an eye out for any unexpected fallout, but otherwise wait to hear back from them to see if we can help mitigate it going forward | 19:31 |
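[Editor's note: on the OOM hunch above, the Linux kernel ranks candidate victims by a per-process badness score, and a guest whose JVM has actually touched all of its allocated RAM tends to sit near the top. A Linux-only sketch for inspecting that score (illustrative, not something OpenDev runs in production):]

```python
import os
from pathlib import Path


def oom_score(pid: int) -> int:
    """Read the kernel's OOM badness score for pid (higher = killed first)."""
    return int(Path(f"/proc/{pid}/oom_score").read_text())


# On a hypervisor you'd compare scores across the qemu processes; here we
# just read our own process's score as a demonstration.
print(oom_score(os.getpid()))
```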
fungi | thanks again for jumping on that | 19:32 |
clarkb | #topic Open Discussion | 19:33 |
clarkb | Gitea made another 1.21 release... | 19:33 |
clarkb | we're playing a game of tag with them | 19:33 |
clarkb | #link https://review.opendev.org/c/opendev/system-config/+/914292 Update Gitea to 1.21.10 | 19:33 |
clarkb | and I've got a change for etherpad 2.0.1 that I'm testing currently | 19:33 |
clarkb | #link https://review.opendev.org/c/opendev/system-config/+/914119 | 19:33 |
clarkb | the previous ps didn't load our plugin correctly but I think that may have been a bug in the dockerfile | 19:34 |
clarkb | I also think it may be a bug in the upstream dockerfile (the dockerfile is actually a huge mess and maybe once we get things sorted on our end we can look at pushing a PR to clean it up) | 19:34 |
clarkb | as with the etherpad db upgrade I don't think we land this until after the PTG though | 19:34 |
clarkb | Anything else? | 19:36 |
clarkb | I'm still fighting this cold but other than the random headache it's mostly a non-issue at this point | 19:36 |
tonyb | not from me. | 19:36 |
frickler | ah, one thing | 19:37 |
frickler | did we want to block the storpool CI account now? | 19:37 |
clarkb | +1 from me. I emailed them months ago and got no response | 19:37 |
frickler | still seeing random reviews in devstack and elsewhere | 19:38 |
corvus1 | ++ | 19:38 |
fungi | also a heads up that i'm travelling on short notice due to a family emergency, not sure for how long, so spending a lot of time working from parking lots and waiting rooms over dicey phone tether. will try to help with things when i can safely do so | 19:38 |
frickler | clarkb: maybe you can do that then to make it more "official"? | 19:38 |
clarkb | frickler: sure I can add it to my todo list. Might not happen today but probably can this week | 19:39 |
fungi | fwiw, i think any of our gerrit admins should feel free to switch a problem account to inactive, especially with this much consensus. i just make sure to #status log it for posterity | 19:40 |
clarkb | ++ | 19:40 |
clarkb | Oh one more thing. Do we want to have a meeting during the PTG or should we cancel? | 19:40 |
clarkb | I think my schedule is open enough to have the meeting but not sure about others | 19:40 |
clarkb | I guess we can decide next week | 19:41 |
frickler | I'd rather skip unless something important happens | 19:41 |
clarkb | ack | 19:41 |
fungi | there's an openstack rbac session at this time i might try to catch | 19:42 |
fungi | so i concur with frickler | 19:42 |
clarkb | sounds like leaning towards cancelling on the 9th. We can make it official next week | 19:42 |
clarkb | Thank you for your time and help running opendev everyone! | 19:42 |
clarkb | I think we can call that a meeting | 19:42 |
clarkb | #endmeeting | 19:42 |
opendevmeet | Meeting ended Tue Mar 26 19:42:49 2024 UTC. Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4) | 19:42 |
opendevmeet | Minutes: https://meetings.opendev.org/meetings/infra/2024/infra.2024-03-26-19.00.html | 19:42 |
opendevmeet | Minutes (text): https://meetings.opendev.org/meetings/infra/2024/infra.2024-03-26-19.00.txt | 19:42 |
opendevmeet | Log: https://meetings.opendev.org/meetings/infra/2024/infra.2024-03-26-19.00.log.html | 19:42 |
frickler | thx clarkb | 19:43 |
fungi | thanks clarkb! | 19:43 |
fungi | oh, i looked at the schedule wrong, the rbac session is earlier. there are no ptg session slots during our meeting, but i'd still lean toward skipping and taking advantage of the break block in the ptg schedule for an actual break | 19:44 |