Tuesday, 2024-03-26

*** mmalchuk_ is now known as mmalchuk06:49
clarkbJust about meeting time18:59
clarkbNot sure how many people we'll end up with today18:59
clarkb#startmeeting infra19:00
opendevmeetMeeting started Tue Mar 26 19:00:06 2024 UTC and is due to finish in 60 minutes.  The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot.19:00
opendevmeetUseful Commands: #action #agreed #help #info #idea #link #topic #startvote.19:00
opendevmeetThe meeting name has been set to 'infra'19:00
clarkb#link https://lists.opendev.org/archives/list/service-discuss@lists.opendev.org/thread/WN2HTKAHF257JN2FT3ZZ5YOAXF3Y5KW3/ Our Agenda19:00
fricklero/19:00
clarkb#topic Announcements19:00
tonybo/19:00
clarkbThe OpenStack Release happens next week and the PTG is the week after19:01
clarkb#link https://etherpad.opendev.org/p/apr2024-ptg-opendev Get your PTG agenda items on the agenda doc19:01
clarkbfeel free to add ideas for our PTG time on this etherpad19:01
clarkb#topic Server Upgrades19:01
clarkbI don't think there is much new here. Other than to mention the rackspace mfa changes (which we'll dig into shortly)19:02
clarkbWe did make the changes on our end and we think that launch node as well as reverse dns updates should be working19:02
clarkbits only the forward dns commands that won't work anymore and we'll have to do those in the gui or figure out how to use the api key for that too (but we rarely update openstack.org records these days so not a big deal)19:02
clarkbIf you do launch new servers and run into trouble please say something. This is all fairly new and any new behavior is worth knowing about19:03
tonybnoted19:04
clarkb#topic MariaDB Upgrades19:04
fungifyi i didn't test the volume options in launch-node with it19:04
clarkbWe upgraded the refstack DB and it went just as smoothly as the paste upgrade (all good things)19:04
clarkbfungi: ack, but that was always weird before :/19:04
clarkbThe remaining services we need to upgrade are etherpad, gitea, gerrit, and mailman319:04
fungii think mm3 should be straightforward, also not a big version jump19:05
clarkbdue to the release and ptg I hesitate to upgrade etherpad, gitea, and gerrit dbs right now (also I'm busy with that stuff so generally have less time)19:05
clarkbya I suspect mm3 may be the safest one to do in the next little bit19:05
fungii can tackle that one when i'm on a more stable connection19:05
clarkbsounds good. Let me know if I can help but it is pretty cookie cutter I think19:05
fungiyep19:05
clarkb#topic AFS Mirror Cleanups19:06
clarkbWe got centos 7 and opensuse leap pretty well cleared out after I fixed the script updates. There is another cleanup change https://review.opendev.org/c/opendev/system-config/+/913454 to remove the opensuse script entirely since it is nooping now and doesn't need to run at all19:07
clarkbThat isn't urgent19:07
clarkbNext up is ubuntu xenial cleanup. I'm not going to be able to look at that more closely until after the PTG though19:07
clarkbI did mention in the TC meeting today that we've freed up space if anyone wants to start on noble mirroring and images19:08
clarkbI think the rough plan there is add jobs to dib for noble, then add mirroring, then add image builds to nodepool19:08
clarkbI'm happy for others to push on that otherwise I'm likely also going to wait until after the PTG to dig into that19:08
clarkbalso happy to report that cleaning out opensuse and centos didn't make openafs sad. May have just been coincidence19:09
clarkbor the response to the issue has corrected it for the future. Either way I was happy I didn't have to scramble openafs restoration19:09
clarkb#topic Rebuilding Gerrit Images19:10
clarkb#link https://review.opendev.org/c/opendev/system-config/+/912470 Update our 3.9 image to 3.9.219:10
clarkbas mentioned previously this will also rebuild our gerrit 3.8 images against the latest state of that branch. I noticed in gerrit discord this morning that there are a number of bugfixes on stable-3.8 that people want a 3.8.5 release for. We can rebuild and get those deployed before a release happens so all the more reason to do this update19:10
clarkbI'm thinking that maybe we plan to do this late next week after the openstack release is done?19:11
clarkbwe could potentially sneak in a gerrit mariadb upgrade at that time too19:11
fungiwfm19:11
fungigood with both19:11
clarkbcool I won't worry about this too much until next week then19:11
fungimaybe as separate restarts19:11
fungijust to be extra sure we know what triggered a problem if it blows up19:12
clarkb++19:12
clarkb#topic Rackspace MFA Requirement19:12
clarkbfungi updated our three primary accounts to use MFA with info in the usual place19:12
clarkbthank you for handling that19:12
clarkbToday is also the big day it will become required. As mentioned earlier keep an eye out for unexpected changes in behavior and say something if you notice any19:13
fungiit was very straightforward, the complexity was just lots of extra precautions and staged testing on our end19:13
clarkbfungi: I did mean to ask which app you chose to setup to get the secret totp key19:13
clarkbmaybe it didn't matter and any of the app choices provided a valid string19:13
fungidefinitely keep an eye on build results from jobs since we're still wary of the swift/cloudfiles accounts19:14
corvus1are we uploading logs to rax currently?19:14
clarkbcorvus1: we are19:14
corvus1so we're going with the "assume it works and yank it out if it breaks" approach?19:14
clarkbya I think so19:14
corvus1sounds good to me19:14
fungiclarkb: in the end it didn't make me choose a specific app. just said i had a phone app rather than choosing the sms option19:14
clarkbfungi: gotcha19:15
fungiand fed the copyable string equivalent beneath their qr code19:15
fungiinto our usual totp tool19:15
clarkbcool I'm glad they made that easy (my credit union does not)19:15
fungiit was only confusing insofar as they didn't mention that it might work with totp implementations besides the three popular phone apps they listed19:16
clarkbya this is a common issue with 2fa setups. They push you to an app which almost always results in totp because that's what all the apps do19:17
fungifor my personal account, i hooked up two different librem key fobs (using the nitrocli utility which supports them)19:17
clarkbthey just pretend that the protocol is something people shouldn't be aware of19:17
clarkb#topic Project Renames19:18
clarkbStill pencilled in for April 1919:18
clarkb#link https://review.opendev.org/c/opendev/system-config/+/911622 Move gerrit replication queue aside during project renames.19:18
fungisounds good, i expect to be around19:18
clarkbthis change is the only prep I've done so far but I expect to start organizing things after the PTG (there is a theme around scheduling for me can you tell)19:18
fungialso we heard back from starlingx folks and they aren't entertaining any repo renames for that window19:19
clarkboh good19:19
clarkbI missed that19:19
fungiat least that was my takeaway from the starlingx-discuss ml thread19:19
clarkbif you are listening in and do intend on renaming a project now is a good time to start preparing for that19:20
clarkb#topic Nodepool Image Delete After Upload19:20
clarkbI haven't seen any changes to implement the config update in opendev. I can't remember if someone volunteered for that last week (also maybe I missed the change)19:21
clarkbI do think this may be a good space saving change we can make in addition to cleaning up old image builds like buster, leap etc so I should tack this onto the cleanup work for images19:21
clarkbunless someone else wants to go ahead and push a change up19:22
clarkb#topic Review02 had an Oops Last Night19:22
fungisaw that, but too late to help out, sorry :(19:23
clarkbaround 02:46-02:47 UTC last night review02 was shutdown. Best I can tell looking at logs this was not a graceful shutdown19:23
fungididn't see any further updates from vexxhost folks yet19:24
clarkbOnce I realized this was happening I gave it a few minutes just to see if the cloud would automate a restart then manually started it. The containers were in an error 255 stopped state so did not auto start on boot. I manually down and up'd them again19:24
clarkbya the last update from vexxhost was that this was likely an OOM event on the host/hypervisor side which is why it was sudden and opaque to us19:24
fungiat least it booted with minimal trauma, aside from maybe some changes not being properly indexed?19:25
tonybI missed that it was gone, because it looked like the address range was gone so I assumed it was and didn't dig further19:25
clarkbeverything seems to have come back mostly ok. I did note one manila change that wasn't indexed19:25
clarkbfungi: ya that was the only issue I could find19:25
clarkband just one change that I found19:25
fungitonyb: i think (but am not certain) that vexxhost does bgp to the hypervisor hosts19:25
fungiwhich explains why routing gets turned around at the core or edge when a vm is unreachable19:26
clarkbI had come out of my migraine/headache cocoon to eat something and maybe push out a meeting agenda and did this instead :)19:26
tonybokay.  good to know 19:26
clarkbI was actually feeling a lot better at that point so not a huge deal but that is why I didn't get the agenda out until this morning19:26
fungii had just gotten out of the car and onto the internet when i saw you'd rebooted the vm, terrible timing on my part19:27
fricklerto me it is a bit worrying having to assume that this can repeat any time19:27
clarkbfrickler: I agree, but until we hear back from the cloud as to why it happened we don't really know what the problem is and if it can repeat19:27
fungiwhich is why hearing from vexxhost folks on a rca might help assuage our concerns19:28
clarkbif it is an OOM then java does allocate memory quite greedily so I do think that is a good hunch19:28
clarkbbasically we'll look like the best candidate for killing in an oomkiller situation19:28
clarkband mnaser did say they would look today so hopefully we hear back soon and then determine if we need to make any suggestions or work with them to improve things19:28
fricklerthe hypervisor only seems qemu as a whole19:29
frickler*sees19:29
clarkbfrickler: correct but we'll actually be using all 96GB of memory or whatever in that qemu process19:29
clarkbwhereas other VMs with that much RAM may only use a small portion in reality19:29
fungimain concern is whether it's oversubscribed host, vs memory leak in services running alongside the hypervisor19:29
clarkbdue to the way java allocates memory19:29
fricklerceph rbd client in qemu would be a not uncommon consumer19:30
fungiram oversubscription would be odd, since vexxhost had previously mentioned the hosts having an excess of available ram19:31
clarkbwe should keep an eye out for any unexpected fallout, but otherwise wait to hear back from them to see if we can help mitigate it going forward19:31
fungithanks again for jumping on that19:32
clarkb#topic Open Discussion19:33
clarkbGitea made another 1.21 release...19:33
clarkbwe're playing a game of tag with them19:33
clarkb#link https://review.opendev.org/c/opendev/system-config/+/914292 Update Gitea to 1.21.1019:33
clarkband I've got a change for etherpad 2.0.1 that I'm testing currently19:33
clarkb#link https://review.opendev.org/c/opendev/system-config/+/91411919:33
clarkbthe previous ps didn't load our plugin correctly but I think that may have been a bug in the dockerfile19:34
clarkbI also think it may be a bug in the upstream dockerfile (the dockerfile is actually a huge mess and maybe once we get things sorted on our end we can look at pushing a PR to clean it up)19:34
clarkbas with the etherpad db upgrade I don't think we land this until after the PTG though19:34
clarkbAnything else?19:36
clarkbI'm still fighting this cold but other than the random headache it's mostly a non issue at this point19:36
tonybnot from me.19:36
fricklerah, one thing19:37
fricklerdid we want to block the storpool CI account now?19:37
clarkb+1 from me. I emailed them months ago and got no response19:37
fricklerstill seeing random reviews in devstack and elsewhere19:38
corvus1++19:38
fungialso a heads up that i'm travelling on short notice due to a family emergency, not sure for how long, so spending a lot of time working from parking lots and waiting rooms over dicey phone tether. will try to help with things when i can safely do so19:38
fricklerclarkb: maybe you can do that then to make it more "official"?19:38
clarkbfrickler: sure I can add it to my todo list. Might not happen today but probably can this week19:39
fungifwiw, i think any of our gerrit admins should feel free to switch a problem account to inactive, especially with this much consensus. i just make sure to #status log it for posterity19:40
clarkb++19:40
clarkbOh one more thing. Do we want to have a meeting during the PTG or should we cancel?19:40
clarkbI think my schedule is open enough to have the meeting but not sure about others19:40
clarkbI guess we can decide next week19:41
fricklerI'd rather skip unless something important happens19:41
clarkback19:41
fungithere's an openstack rbac session at this time i might try to catch19:42
fungiso i concur with frickler19:42
clarkbsounds like leaning towards cancelling on the 9th. We can make it official next week19:42
clarkbThank you for your time and help running opendev everyone!19:42
clarkbI think we can call that a meeting19:42
clarkb#endmeeting19:42
opendevmeetMeeting ended Tue Mar 26 19:42:49 2024 UTC.  Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4)19:42
opendevmeetMinutes:        https://meetings.opendev.org/meetings/infra/2024/infra.2024-03-26-19.00.html19:42
opendevmeetMinutes (text): https://meetings.opendev.org/meetings/infra/2024/infra.2024-03-26-19.00.txt19:42
opendevmeetLog:            https://meetings.opendev.org/meetings/infra/2024/infra.2024-03-26-19.00.log.html19:42
fricklerthx clarkb 19:43
fungithanks clarkb!19:43
fungioh, i looked at the schedule wrong, the rbac session is earlier. there are no ptg session slots during our meeting, but i'd still lean toward skipping and taking advantage of the break block in the ptg schedule for an actual break19:44
