Tuesday, 2025-07-01

clarkbI'll start the meeting in just a few minutes. Heads up that the HVAC person showed up early, so if I disappear for a minute or two I'm probably answering a question or modifying HVAC settings. But I shouldn't be gone long18:58
clarkb#startmeeting infra19:00
opendevmeetMeeting started Tue Jul  1 19:00:05 2025 UTC and is due to finish in 60 minutes.  The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot.19:00
opendevmeetUseful Commands: #action #agreed #help #info #idea #link #topic #startvote.19:00
opendevmeetThe meeting name has been set to 'infra'19:00
clarkb#link https://lists.opendev.org/archives/list/service-discuss@lists.opendev.org/thread/YQ43ECZ5W6GKF4CTPVARL2U5XCQOKQCB/ Our Agenda19:00
clarkb#topic Announcements19:00
clarkbFriday is a US holiday so I expect several of us won't be around that day19:00
clarkbI'm also going to be out the 15th-17th which means someone else will need to run the meeting on the 15th or we can skip it19:01
clarkbI'm happy to defer that decision to everyone else since I won't be here :)19:01
fungii expect to be around19:01
corvusme too19:01
fungihappy to run meetings whenever19:02
clarkbthanks, I'll let you decide that week if you need to send a meeting agenda then fungi19:02
clarkbAnything else to announce?19:02
fungiopeninfra summit schedule is out19:02
clarkb#topic Zuul-launcher19:04
clarkbI guess the big news here is that we're using zuul launcher for the majority of nodes now19:04
fungiyay!19:04
clarkbBe on the lookout for unexpected behaviors. I found one yesterday and corvus  quickly had a patch for that just merged19:04
corvusyep!  a few nodes still supplied for images we don't have yet19:04
clarkb(more often than expected you can get nodesets with nodes from different cloud regions)19:05
corvusalso, there are a few labels we don't have for images that we do have (like nested-virt)19:05
clarkb#link https://review.opendev.org/c/opendev/zuul-providers/+/953269 Add Ubuntu Focal and Bionic images19:05
fungisomeone reported a cirros image file missing in a job just a little bit ago, not sure if it was on a zuul or nodepool built node though19:05
corvusi'm hoping to restart launchers with the latest fixes today; those also include additional metrics, so i'll merge updates to the graphs then too.19:05
clarkbfungi: ya unfortunately they didn't provide a build log to check19:06
fungiand we did at least confirm that we're building images with that file in the expected path19:06
fungior configured to do so anyway19:06
corvusfungi: for nodepool, niz, or both?19:07
fungiboth19:07
fungibut hard to dig deeper until we get more details19:07
corvusgood... there's a lot of image metadata just in the web ui now; so looking into that should be a little bit easier19:08
corvusbut i know all of the web ui stuff still needs some work19:08
clarkbcorvus: both mnasiadka's focal+bionic change and frickler's debian trixie depend on dib changes that haven't merged. If/when they do merge are we using dib from releases or master in the image build jobs?19:08
fungiotherwise we'll need to do another dib release19:09
corvusreleases, i think, and if that's right, that makes the depends-on testing a dangerous non-co-gating situation19:09
corvus(i wonder if we should have the job run, but fail, if it uses dib from source)19:10
corvusbut let's double check that, since i'm not sure19:10
clarkbcorvus: ++ that seems like a good plan. or we can use it from source too19:10
clarkbfrom source all the time I mean19:10
corvusyeah19:10
clarkbok so to summarize continue to try and debug issues that may be related to the niz switch, support corvus in improving things via code reviews etc, review the dib changes necessary to build additional images, and then try to rollout more images19:11
clarkband maybe we need to toggle how we're installing dib to make things less flaky post merge19:11
corvus++19:11
clarkb#link https://review.opendev.org/c/opendev/zuul-providers/+/951471 Debian Trixie Image builds19:12
clarkbOne thing I wanted to do this morning when looking into the claim that the cirros image is missing was to log into an image built by niz to double check. But the web ui only shows in-use nodes and doesn't supply their IPs. It sounds like the json blob may have some of that info if anyone else needs to look it up19:12
corvusi'm pretty sure we only install dib from source if the repo is there19:13
clarkbI suspect improving the web ui around that sort of thing is going to happen too as we get deeper into this19:13
corvusso easiest way to make it install from source all the time is just to add it to required-projects for the image build jobs19:13
clarkbmakes sense19:13
corvusclarkb: yeah, i think we should also be able to show the image id we used in the web ui/json19:14
clarkb++19:14
fungilisting not-yet-used nodes in the ui could be weird, since the nodes list is tenant scoped and those nodes wouldn't be associated with a tenant yet?19:14
clarkbI have confirmed that nodepool list doesn't seem to show you niz images. I didn't expect it to but thought maybe since they both use zk there would be enough overlap for that to magically work19:14
corvusfungi: yeah... though we do show building nodes that are assigned to a tenant19:15
corvusthe only thing that won't show up is unassigned ready nodes (typically from min-ready, but possibly from aborted buildsets)19:15
fungifair, if they've got an associated node request19:15
corvusthere is a way to filter for "ready nodes that could possibly be used by this tenant"; it's a bit more complex, but i think we can/should do it.19:15
fungicool!19:16
clarkbI think that would've been useful for me today so I'm ++ on doing that19:16
corvus(so then those ready nodes would show up in multiple tenant listings, until their probability field collapses)19:16
clarkbeach unassigned node is a quantum computer19:17
funginodes that are "available" to the tenant19:17
clarkbanything else on this topic?19:17
corvusi think that's it from me19:17
clarkboh mnasiadka just mentioned in #opendev that image build jobs are running out of disk19:18
clarkbso we may need to do more optimization of the disk usage in the jobs. But details are scarce right now. Needs more characterization19:18
corvusoof.  i guess we'll followup on that in opendev19:18
corvuswe do have cacti graphs19:18
clarkbI think this was on the image build node itself19:18
clarkbnot the launchers fwiw19:18
clarkbbut ya we can followup there later19:19
corvusoh derp sorry19:19
clarkb#topic Gerrit shutdown problems19:19
clarkbLast week I finally got around to doing the "testing" of gerrit shutdown processes in production19:19
clarkbAnd I think the hunch that h2 db compaction was the cause of slow gerrit shutdown was accurate.19:19
clarkbwe ran a manual kill -HUP against gerrit to rule out sigint vs sighup behavior differences and sighup produced the same slow shutdown. It ended up taking about 6-7 minutes to finally shut down19:20
clarkbwhile we were waiting I ran an strace against gerrit and it was reading/writing/seeking in h2 db files. And after the process completed the db files were smaller than when we started19:20
clarkbwe did the restart to apply the revert of h2 compaction so the testing was also the fix and I expect the next restart will be happy19:21
clarkbfungi: you helped out with ^ anything else to add?19:21
funginope, just that there was good solid evidence to support your guess19:21
fungilooking forward to smooter restarts in the future19:22
fungismoother19:22
clarkbso ya good news I think we've corrected this which means I can return to figuring out a gerrit 3.11 upgrade19:22
clarkb#topic Gerrit 3.11 Upgrade Planning19:22
clarkbhowever, I haven't started on this since hopefully fixing gerrit restarts19:22
clarkb#link https://www.gerritcodereview.com/3.11.html19:22
clarkbreading over the release notes is still helpful if you haven't done it yet19:22
clarkb#link https://etherpad.opendev.org/p/gerrit-upgrade-3.11 Planning Document for the eventual Upgrade19:22
clarkband you can put any thoughts or notes in that etherpad. I'd like to pick this up again either this week or next so hopefully there will be proper updates soon19:23
clarkb#topic Upgrading old servers19:23
clarkbThere are updates on this topic!19:23
fungilots19:24
clarkblate last week and over the weekend corvus replaced all of the core zuul servers with noble nodes19:24
clarkb(schedulers, mergers, executors, launchers)19:24
corvusi did self-approve some pro-forma changes.  hope that's okay.  :)19:24
fungiperfectly19:24
clarkbthen I followed that work up by replacing the zookeeper nodes behind zuul and nodepool yesterday19:25
corvusone thing that went wrong with the zuul upgrade: the load balancer19:25
clarkbcorvus: the problem is that we hardcode IP addrs in the proxy config right?19:25
corvusentirely my fault, because i forgot 2 things: 1) we moved the config files; and 2) the backend server config is explicit19:25
clarkbI feel like we do that because then we aren't reliant on DNS for that to work. Maybe we should just let haproxy do dns lookups?19:26
corvusi cleaned up our old config file locations on both load balancers (zuul and gitea), so hopefully that won't bite anyone else :)19:26
corvusyeah, i'm kind of thinking that just letting dns or ansible write that automatically might be okay19:26
clarkbI'm on board with that. Worst case we notice flakiness on the frontend and can revert19:27
clarkbthen tackle it some other way19:27
corvusautomatic config is... uh... what i was expecting to happen... and i think our process for replacing servers would work with that.19:27
corvusprobably just need to still be able to take a server in and out easily, but just not specify its ip address.19:28
clarkbya we do that with zookeeper actually19:28
clarkbthe servers are all listed with explicit IPs but ansible figures out what they are for us and puts them in the config file19:28
corvusi like that approach19:29
clarkband it does so via looking at hosts in the zookeeper ansible group19:29
corvussounds like consensus to change that.  that was the only followup from my weekend19:29
tonybMakes sense to me19:29
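The approach agreed on above (keep explicit IPs in the load balancer config, but have automation fill them in from an inventory group) can be sketched roughly as follows. This is only an illustration under stated assumptions: the inventory layout, the group name `zuul-web`, the backend name, the port, and the `render_backend` helper are all hypothetical and are not taken from OpenDev's system-config.

```python
# Hypothetical inventory: group name -> {hostname: IP address}.
# Hostnames and addresses are placeholders (RFC 5737 documentation range),
# not real OpenDev servers.
INVENTORY = {
    "zuul-web": {
        "zuul02.example.org": "192.0.2.12",
        "zuul01.example.org": "192.0.2.11",
    },
}


def render_backend(inventory, group, backend, port, disabled=()):
    """Render an haproxy-style backend stanza from one inventory group.

    Every host in the group becomes a `server` line with its explicit IP,
    so the proxy does not depend on DNS at runtime. Hosts named in
    `disabled` stay listed but are marked disabled, which keeps it easy
    to take a server in and out of rotation.
    """
    lines = [f"backend {backend}"]
    for host, ip in sorted(inventory[group].items()):
        line = f"    server {host} {ip}:{port} check"
        if host in disabled:
            line += " disabled"
        lines.append(line)
    return "\n".join(lines)


print(render_backend(INVENTORY, "zuul-web", "zuul_web", 9000,
                     disabled={"zuul02.example.org"}))
```

With this shape, replacing a server only means changing group membership; the IP in the rendered stanza follows from the inventory rather than from a hand-edited file.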
clarkbzookeeper replacement went smoothly. I did have one unexpected election behavior but afterwards I thought it through and the behavior makes sense to me after the fact19:30
clarkb#link https://review.opendev.org/c/opendev/zone-opendev.org/+/953844 Remove old zk servers from DNS19:30
clarkbafter lunch today I plan to approve ^ unless there are objections to remove the old zk servers from DNS. Then I'll plan to delete the old zk servers after or tomorrow morning (again if there are no objections)19:30
clarkbcorvus already acked doing ^ so let me know otherwise I'm planning to proceed with cleanup19:31
corvusclarkb: one thought: now that we've shown we can do that particular upgrade process, we should make sure not to copy those questionable notes about upgrades from the etherpad.19:31
corvusor revise them or whatever19:31
clarkbcorvus: ya I did put a note about them possibly being FUD in the etherpad already but I should go ahead and delete them or cross them out and mark them invalid19:31
corvus++19:32
clarkbnow that we have all these nodes running on noble with docker compose instead of docker-compose we can clean up their docker-compose.yaml files19:32
clarkb#link https://review.opendev.org/c/opendev/system-config/+/95384619:32
clarkb#link https://review.opendev.org/c/opendev/system-config/+/95384819:32
clarkbit was writing these changes that helped discover the nodes from multiple clouds behavior19:32
clarkbthose aren't urgent but it might be nice to get rid of the warnings19:34
corvuslooking forward to that :)19:34
clarkbI also swapped out mirror-update servers, not sure if we discussed that previously.19:34
clarkbEavesdrop and refstack are the "easy" nodes I have remaining on the easy list19:34
clarkbfungi: I know we've been busy with plenty of other stuff but any word on refstack cleanup?19:34
funginothing19:37
clarkbok. The list is still quite big but we're slowly whittling it down. Thanks for the help and happy to have more19:37
clarkb#topic OFTC Matrix bridge no longer supporting new users19:38
clarkbI have an action item to go write a spec for this I just haven't gotten to it yet. Maybe I should do that before I start looking at gerrit 3.1119:38
clarkbI guess there isn't anything new on this until I do that19:40
clarkb#topic Working through our TODO list19:40
clarkb#link https://etherpad.opendev.org/p/opendev-january-2025-meetup19:40
clarkbI also need to migrate this list to something a bit more permanent/better19:40
clarkbbut a friendly reminder that the list exists if you get bored :)19:41
clarkb#topic Pre PTG Planning19:43
clarkbI haven't heard any additional feedback on the week of October 6-10 for holding an opendev pre ptg19:43
clarkbI think we should all pencil in those dates and I'll start on an announcement and an agenda that we can fill in before then19:44
clarkbok no more feedback is good feedback I'll proceed with that as the plan for now19:47
clarkb#topic Open Discussion19:47
clarkbAnything else?19:47
fungii got nothin'19:51
clarkbin that case thanks everyone for your time here and elsewhere keeping opendev up and running19:52
clarkbwe'll be back next week at the same time and location19:52
clarkb#endmeeting19:52
opendevmeetMeeting ended Tue Jul  1 19:52:22 2025 UTC.  Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4)19:52
opendevmeetMinutes:        https://meetings.opendev.org/meetings/infra/2025/infra.2025-07-01-19.00.html19:52
opendevmeetMinutes (text): https://meetings.opendev.org/meetings/infra/2025/infra.2025-07-01-19.00.txt19:52
opendevmeetLog:            https://meetings.opendev.org/meetings/infra/2025/infra.2025-07-01-19.00.log.html19:52
fungithanks clarkb!19:54
