Tuesday, 2025-04-01

18:57 <clarkb> just about meeting time
19:00 <clarkb> #startmeeting infra
19:00 <opendevmeet> Meeting started Tue Apr  1 19:00:06 2025 UTC and is due to finish in 60 minutes.  The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot.
19:00 <opendevmeet> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
19:00 <opendevmeet> The meeting name has been set to 'infra'
19:00 <clarkb> #link https://lists.opendev.org/archives/list/service-discuss@lists.opendev.org/thread/5PQX3P4NIXU6FRRQRWPTQSZNICSJJVFF/ Our Agenda
19:00 <clarkb> #topic Announcements
19:00 <clarkb> OpenStack is going to publish its Epoxy (2025.1) release tomorrow
19:00 <clarkb> keep that in mind as we make changes over the next 24 hours or so
19:01 <clarkb> then the virtual PTG is being hosted next week (April 7 - 11) with meetpad being the default location for teams (they can choose to override the location if they wish)
19:01 <frickler> in particular hold back on https://review.opendev.org/c/openstack/openstack-zuul-jobs/+/941246 until the release is done, please
19:02 <clarkb> earlier today fungi and I tested meetpad functionality and it seems to be working so I went ahead and put meetpad02 and jvb02 in the emergency file
19:02 <frickler> what about etherpad?
19:02 <clarkb> this way new container images from upstream won't unexpectedly break us mid-PTG. We can remove the hosts from the emergency file Friday afternoon
19:02 <clarkb> frickler: etherpad uses images we build ourselves, so it shouldn't get auto-updated
19:03 <frickler> ah, right
19:03 <fungi> (but please don't push and approve any etherpad image update changes)
19:04 <clarkb> ++
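
(Background for readers: hosts listed in the emergency file on bridge are skipped by the periodic Ansible deployment runs, which is what prevents new upstream container images from being rolled out automatically. A minimal sketch of what such an entry might look like, assuming a YAML inventory with a "disabled" group; the actual file location, layout, and FQDNs here are assumptions, not the real file:)

    # hypothetical emergency-file entry; group name and hostnames are guesses
    all:
      children:
        disabled:
          hosts:
            meetpad02.opendev.org:
            jvb02.opendev.org:
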
19:04 <clarkb> anything else to announce?
19:04 <frickler> do we want to skip the meeting next week?
19:05 <clarkb> good question. I'm happy to host it if we like but also happy to skip if people think they will be too busy with the ptg
19:05 <clarkb> I have intentionally avoided scheduling opendev ptg time as we do tend to be too busy for that
19:05 * frickler wouldn't mind skipping. also +1 to the latter
19:06 <fungi> yeah, the meeting technically doesn't conflict with the ptg schedule since it's not during a ptg timeslot, but i'd be okay with skipping
19:06 <clarkb> let's say we'll skip then, and if something important comes up I can send out an agenda and reschedule it
19:06 <fungi> even if just to have one fewer obligation next week
19:06 <clarkb> but for now we'll say there is no meeting next week
19:06 <frickler> I also plan to more regularly skip the meeting during the summer time, but not set in stone yet
19:07 <clarkb> thanks for the heads up
19:07 <clarkb> #topic Zuul-launcher image builds
19:07 <clarkb> I think zuul has been able to dogfood zuul-launcher images and nodes a fair bit recently which is neat
19:08 <corvus> yeah i think the launcher is performing sufficiently well that we can expand its use
19:08 <corvus> i think we can switch the zuul tenant over to using it exclusively
19:08 <corvus> should we think about switching the opendev tenant too?
19:08 <clarkb> we do still need more image builds to be pushed up to https://opendev.org/opendev/zuul-providers/src/branch/master/zuul.d/image-build-jobs.yaml
19:08 <clarkb> corvus: no objections to switching opendev but that may need more images. Probably good motivation to get ^ done
19:09 <frickler> +1 to opendev
19:09 <clarkb> that is something I may be able to look at later this week or during the ptg next week depending on how gerrit things go
19:09 <fungi> i'm in favor
19:10 <corvus> sounds good, i'll make changes to switch the node usage over
19:10 <clarkb> corvus: are there any major known missing pieces to the system at this point?
19:10 <corvus> would be great for not-corvus to add more image jobs
19:10 <clarkb> or are we in the unknown unknowns right now and so more use is really what we need?
19:11 <corvus> i will also add the image build jobs to the periodic pipeline to make sure they're rebuilt frequently
19:11 <corvus> i think we need to add statsd
19:11 <corvus> i don't think the launcher emits any useful stats now
19:11 <clarkb> oh ya that would be good
19:12 <corvus> but other than that, as far as the sort of main-line functionality, i think it's generally there, and we're in the unknown-unknowns phase
19:12 <frickler> does autohold work the same as with nodepool?
19:12 <clarkb> sounds like we have a rough plan for next steps then. statsd, more images, periodic builds, switch opendev jobs over
19:13 <fungi> one question springs to mind: how do we go about building different images at different frequencies now? dedicated pipelines?
19:13 <corvus> frickler: probably not, that may be a missing piece
19:13 <corvus> fungi: yep
19:13 <frickler> fungi: maybe doing some in periodic-weekly would be good enough?
19:13 <fungi> yeah, i think so
19:13 <fungi> no need to over-complicate it
19:13 <clarkb> ++ to daily and weekly
19:14 <fungi> we always have the option of adding complexity later if we get bored and hate ourselves enough
19:14 <fungi> [masochist sysadmin stereotype]
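
(To make the dedicated-pipelines idea concrete: frequently changing images would rebuild in the daily periodic pipeline and slower-moving ones in periodic-weekly. A hedged sketch of what a zuul-providers project stanza could look like, assuming pipelines with those names exist in the tenant; the job names are invented for illustration:)

    # illustrative only; job names and pipeline availability are assumptions
    - project:
        periodic:
          jobs:
            - opendev-build-ubuntu-noble-image   # hypothetical daily rebuild
        periodic-weekly:
          jobs:
            - opendev-build-gentoo-image         # hypothetical weekly rebuild
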
19:14 <corvus> if we switch opendev over, and autohold isn't working and we need it, we can always dynamically switch back
19:15 <frickler> so the switch has to be per tenant or can we revert per repo if needed?
19:15 <clarkb> it is per job I think
19:15 <corvus> basically just saying: even if we switch the tenant over, we can always change back a single project, job, or even a single change.
19:15 <clarkb> so should be very flexible if we need an autohold
19:15 <corvus> yeah per job
19:15 <frickler> ah, cool
19:15 <clarkb> I think that works as a fallback
19:15 <fungi> the power of zuul
19:16 <clarkb> anything else on this subject?
19:16 <corvus> new labels will be like "niz-ubuntu-noble-8gb" and if you need to switch back to nodepool, just change it to "ubuntu-noble"
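
(In other words, falling back to nodepool is a one-line nodeset edit per job. A sketch using corvus's example labels; the job name here is hypothetical:)

    # change the label back to "ubuntu-noble" to return this job to nodepool
    - job:
        name: example-unit-tests    # hypothetical job
        nodeset:
          nodes:
            - name: primary
              label: niz-ubuntu-noble-8gb
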
19:16 <corvus> that's it from me
19:16 <frickler> not directly related, but we still have held noble builds
19:16 <frickler> not sure if zuul uses neutron in any tests?
19:17 <clarkb> there are nodepool jobs that test against a real openstack
19:17 <clarkb> so those might be affected if they run on noble
19:17 <frickler> those likely will be broken until ubuntu publishes a fixed kernel, which is planned for the week of the 14th
19:17 <clarkb> ack
19:17 <clarkb> fwiw the noble nodes I booted to replace old servers have ip6tables rules that look correct to me
19:18 <frickler> #link https://bugs.launchpad.net/neutron/+bug/2104134 for reference
19:18 <clarkb> so the brokenness must be in a very specific part of the ipv6 firewall handling
19:18 <frickler> yes, it is only a special ip6tables module that is missing
19:18 <clarkb> ack
19:18 <clarkb> #topic Container hygiene tasks
19:18 <clarkb> #link https://review.opendev.org/q/topic:%22opendev-python3.12%22+status:open Update images to use python3.12
19:18 <clarkb> we updated matrix-eavesdrop and accessbot last week to python3.12
19:19 <clarkb> accessbot broke on an ssl thing that fungi fixed up
19:19 <fungi> it happens
19:19 <clarkb> otherwise things are happy. I think this effort will be on hiatus this week while we wait for the openstack release and I focus on other things
19:19 <fungi> just glad to be fixing things for a change rather than breaking them
19:19 <clarkb> But so far no major issues with python3.12
19:20 <clarkb> In related news I did try to test zuul with python3.13 to see if we can skip 3.12 for zuul entirely. Unfortunately, zuul relies on google's re2 python package which doesn't have 3.13 wheels yet, and they require libabsl and pybind11 versions that are too new even for noble
19:20 <clarkb> if we really want to we can use bazel to build packages for us which should fetch all the deps and do the right thing
19:21 <clarkb> but for now I'm hopeful upstream pushes new wheels (there is a change proposed to add 3.13 wheel builds already)
19:21 <clarkb> #topic Booting a new Gerrit server
19:22 <clarkb> high on my todo list for early april is building a new gerrit server so that we can change the production server over to a newer os version
19:22 <clarkb> the rough plan I've got in mind is to boot the new server at the end of this week or early next week, then late the week after do the production cutover (~April 17/18)
19:23 <clarkb> but the first step in doing that is deciding where to boot it. Each of the available options has upsides and downsides. I personally think the best option is to stay where we are and use a non-boot-from-volume v3 flavor in vexxhost ymq
19:23 <clarkb> the reason for that is that gerrit has never been more stable for us than while running on the large flavor (particularly with the extra memory) in vexxhost, and we don't have access to large nodes like that elsewhere
19:24 <clarkb> the downside to this location is that ipv6 connectivity has been flaky for some isps in europe
19:24 <clarkb> alternatives would be rackspace classic (the main drawbacks there are that we'd probably have to redeploy to rax flex sooner rather than later, or back to vexxhost, and the flavors are smaller aiui) or ovh. The downside to ovh is their billing is weird and sometimes things go away unexpectedly
19:25 <clarkb> all that to say that my vote is for vexxhost ymq and if I don't hear strong objections or suggestions otherwise I'll probably boot a review03 there within the next week
19:25 <fungi> yeah, part of me feels wasteful because we're probably using a flavor twice the size of what we could get away with, but also we haven't been asked to scale it down and this is the core of all our project workflows
19:26 <clarkb> day to day the server is larger than we need, but whenever we have to do offline reindexing we definitely benefit from the memory and cpus
19:26 <fungi> so anything to help with stability makes sense
19:26 <clarkb> so bigger is good for upgrades/downgrades and unexpected spikes in demand
19:26 <fungi> as for doing it in other providers, i'm not sure we have any with similarly large flavors on offer
19:26 <frickler> I agree the current alternatives don't look too good, so better live with the IPv6 issues and periodically nag mnaser ;=D
19:27 <clarkb> once the server is up (wherever it goes) the next step will be to put the server in the inventory safely (no replication config) and sync data safely from review02 (again, don't copy the replication config)
19:27 <frickler> raxflex would be nice if it finally had IPv6
19:27 <fungi> agreed, for a lot of our control plane in fact
19:27 <clarkb> that should allow us to check everything is working before scheduling a downtime and doing a final sync over on ~April 17-18
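
(To illustrate the "sync safely" step: the point is that replication.config never lands on the new host while it could still push to production replication targets. A minimal sketch assuming an rsync-based copy driven from an Ansible play; the hostnames and Gerrit site path are guesses, not the actual playbook:)

    # hypothetical play; paths and inventory names are assumptions
    - name: Pre-seed review03 from review02 without the replication config
      hosts: review03.opendev.org
      tasks:
        - name: Copy the Gerrit site, excluding etc/replication.config
          ansible.builtin.command: >
            rsync -a --exclude=etc/replication.config
            review02.opendev.org:/home/gerrit2/review_site/
            /home/gerrit2/review_site/
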
19:28 <frickler> not sure if it would be an option to wait for that with gerrit? likely not, with no schedule for it yet
19:28 <clarkb> I don't think we should wait
19:28 <clarkb> there is never a great time to make changes to Gerrit so we just have to pick less bad times and roll with it
19:30 <clarkb> we don't have to make any hard decisions right this moment. I probably won't get to this until thursday or friday at the earliest. Chew on it and let me know if you have objections or other ideas
19:30 <frickler> fair enough
19:30 <clarkb> I also wanted to note that Gerrit just made a 3.10.5 release
19:30 <clarkb> #link https://review.opendev.org/c/opendev/system-config/+/946050 Gerrit 3.10.5 image update
19:30 <clarkb> the release notes don't look urgent to me so I'm fine holding off on this until after the openstack release
19:30 <clarkb> but I think late this week we should try to sneak this in too
19:32 <clarkb> just a heads up that is on my todo list. Don't want anyone to be surprised by a gerrit update, particularly during release week
19:32 <clarkb> #topic Upgrading old servers
19:32 <clarkb> Since our last meeting I ended up replacing the rax.iad and rax.dfw servers. rax.iad was to update the base os and rax.dfw was to get rescheduled to avoid network bandwidth issues
19:33 <clarkb> both are running noble now
19:33 <clarkb> haven't seen any complaints and in the case of dfw we made the region usable again in the process
19:34 <fungi> yeah, that was a good impulse
19:34 <fungi> seems like there was either something happening with the hypervisor host the old one was running on, or some arbitrary rate limit applied to the server instance's interface
19:35 <clarkb> for other servers in the pipeline this is the "easy" list: refstack, mirror-update, eavesdrop, zuul schedulers, and zookeeper servers
19:35 <clarkb> I'd appreciate any help others can offer on this, especially as I'm going to shift my focus to gerrit for the next bit
19:35 <fungi> we haven't heard back from rackspace folks on any underlying cause yet, that i've seen
19:35 <clarkb> fungi: right I haven't seen any root cause
19:35 <clarkb> but we left the old server up so they could continue to debug
19:35 <clarkb> cleaning it up will happen later
19:36 <clarkb> Oh also refstack may be the lowest priority as I'm not sure that there is anyone maintaining the software anymore
19:37 <clarkb> but the others are all valid I think
19:37 <clarkb> and there are more on the hard list (gerrit is on that list too)
19:37 <fungi> also we got confirmation from the foundation staff that they no longer rely on it for anything
19:37 <fungi> (refstack i mean)
19:38 <frickler> so announce deprecation and shut off in a year instead of migrating?
19:38 <clarkb> frickler: ya or even sooner
19:38 <clarkb> that rough plan makes sense to me.
19:39 <frickler> sure, I just wanted to avoid a "that's too fast" response ;)
19:39 <clarkb> but ya my next focus on this topic is gerrit. Would be great if others can fill in some of the gaps around that
19:39 <clarkb> and let me know if there are questions or concerns generally on the process
19:39 <clarkb> #topic Running certcheck on bridge
19:39 <clarkb> this is on the agenda mostly so I don't forget to look at it
19:39 <clarkb> I don't have any updates and don't think anyone else does so probably not much to say and we can continue
19:40 <clarkb> but I'll wait for a minute in case I'm wrong about that
19:41 <clarkb> #topic Working through our TODO list
19:42 <clarkb> If what we've discussed above doesn't inspire you or fill your todo list full of activities we do have a separate larger and broader list you can look at for inspiration
19:42 <clarkb> #link https://etherpad.opendev.org/p/opendev-january-2025-meetup
19:42 <clarkb> much of what we discuss week to week falls under this list but I just like to remind people we've got even more to dive into if there is interest
19:42 <clarkb> this applies to existing and new contributors alike
19:42 <clarkb> and feel free to reach out to me with questions if you have them about anything we're doing or have on that todo list
19:42 <clarkb> #topic Rotating mailman 3 logs
19:43 <clarkb> fungi: do we have any changes for this yet? speaking of autoholds, this might be a good use for autoholds to test whether copytruncate is usable
19:43 <fungi> no, sorry, not yet
19:44 <clarkb> ack, it's been busy lately and I think this week is no different. but might be good to throw something up and get it held so that we can observe the behavior over time
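
(For reference, the copytruncate behavior under discussion would look roughly like this if deployed as a logrotate rule via Ansible; the log path and rotation cadence are guesses for illustration, not the actual mailman3 layout:)

    # hypothetical sketch; destination path and log glob are assumptions
    - name: Install a copytruncate logrotate rule for mailman3 logs
      ansible.builtin.copy:
        dest: /etc/logrotate.d/mailman3
        content: |
          /var/lib/mailman/core/var/logs/*.log {
              weekly
              rotate 4
              compress
              copytruncate
          }
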
19:45 <clarkb> #topic Open Discussion
19:45 <clarkb> among everything else this week I'm juggling some family stuff and I'll be out Monday (so missing day 1 of the ptg)
19:46 <clarkb> as a perfect example I'm doing the school run this afternoon so will be out for a bit in 1.5 hours
19:46 <frickler> ah, that reminds me to drop the osc-placement autohold ;)
19:46 <fungi> i'll be around, happy to keep an eye on things when i'm not leading ptg sessions
19:47 <fungi> also for tomorrow's openstack release i'm going to try to be online by 10:00 utc (6am local for me) in case anything goes sideways
19:47 <clarkb> I won't be awake that early but when I do wake up I'll check in on things and can help out if necessary too
19:48 * tonyb will be around for the release also
19:49 <fungi> much appreciated
19:50 <clarkb> anything else?
19:50 <clarkb> hopefully everyone is able to celebrate the release tomorrow
19:51 <clarkb> thank you everyone for your time and effort and help.
19:51 <fungi> i'll be hosting an openinfra.live episode on thursday where openstack community leaders will talk about new features and changes in various components
19:51 <clarkb> As mentioned at the beginning of the meeting we will skip next week's meeting unless something important comes up. That way you can all enjoy the PTG and not burn out on meetings as quickly
19:52 <tonyb> Nice .... more sleep for me :)
19:52 <clarkb> that too :)
19:52 <clarkb> #endmeeting
19:52 <opendevmeet> Meeting ended Tue Apr  1 19:52:36 2025 UTC.  Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4)
19:52 <opendevmeet> Minutes:        https://meetings.opendev.org/meetings/infra/2025/infra.2025-04-01-19.00.html
19:52 <opendevmeet> Minutes (text): https://meetings.opendev.org/meetings/infra/2025/infra.2025-04-01-19.00.txt
19:52 <opendevmeet> Log:            https://meetings.opendev.org/meetings/infra/2025/infra.2025-04-01-19.00.log.html
19:52 <fungi> thanks clarkb!
19:52 <clarkb> oh I was going to mention the matrix oftc bridge is still up as of an hour ago or so
19:53 <clarkb> we probably want to keep an eye on that so we can notify people if it does end up going away
