clarkb | just about meeting time | 18:57 |
---|---|---|
clarkb | #startmeeting infra | 19:00 |
opendevmeet | Meeting started Tue Apr 1 19:00:06 2025 UTC and is due to finish in 60 minutes. The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot. | 19:00 |
opendevmeet | Useful Commands: #action #agreed #help #info #idea #link #topic #startvote. | 19:00 |
opendevmeet | The meeting name has been set to 'infra' | 19:00 |
clarkb | #link https://lists.opendev.org/archives/list/service-discuss@lists.opendev.org/thread/5PQX3P4NIXU6FRRQRWPTQSZNICSJJVFF/ Our Agenda | 19:00 |
clarkb | #topic Announcements | 19:00 |
clarkb | OpenStack is going to release its Epoxy 2025.1 release tomorrow | 19:00 |
clarkb | keep that in mind as we make changes over the next 24 hours or so | 19:00 |
clarkb | then the virtual PTG is being hosted next week (April 7 - 11) with meetpad being the default location for teams (they can choose to override the location if they wish) | 19:01 |
frickler | in particular hold back on https://review.opendev.org/c/openstack/openstack-zuul-jobs/+/941246 until the release is done, please | 19:01 |
clarkb | earlier today fungi and I tested meetpad functionality and it seems to be working so I went ahead and put meetpad02 and jvb02 in the emergency file | 19:02 |
frickler | what about etherpad? | 19:02 |
clarkb | this way new container images from upstream won't unexpectedly break us mid PTG. We can remove the hosts from the emergency file Friday afternoon | 19:02 |
clarkb | frickler: etherpad uses images we build so shouldn't get auto updated | 19:02 |
frickler | ah, right | 19:03 |
fungi | (but please don't push and approve any etherpad image update changes) | 19:03 |
clarkb | ++ | 19:04 |
clarkb | anything else to announce? | 19:04 |
frickler | do we want to skip the meeting next week? | 19:04 |
clarkb | good question. I'm happy to host it if we like but also happy to skip if people think they will be too busy with the ptg | 19:05 |
clarkb | I have intentionally avoided scheduling opendev ptg time as we do tend to be too busy for that | 19:05 |
* frickler | wouldn't mind skipping. also +1 to the latter | 19:05 |
fungi | yeah, the meeting technically doesn't conflict with the ptg schedule since it's not during a ptg timeslot, but i'd be okay with skipping | 19:06 |
clarkb | lets say we'll skip then and if something important comes up I can send out an agenda and reschedule it | 19:06 |
fungi | even if just to have one fewer obligation next week | 19:06 |
clarkb | but for now we'll say there is no meeting next week | 19:06 |
frickler | I also plan to more regularly skip the meeting during the summer time, but not set in stone yet | 19:06 |
clarkb | thanks for the heads up | 19:07 |
clarkb | #topic Zuul-launcher image builds | 19:07 |
clarkb | I think zuul has been able to dogfood zuul-launcher images and nodes a fair bit recently which is neat | 19:07 |
corvus | yeah i think the launcher is performing sufficiently well that we can expand its use | 19:08 |
corvus | i think we can switch the zuul tenant over to using it exclusively | 19:08 |
corvus | should we think about switching the opendev tenant too? | 19:08 |
clarkb | we do still need more image builds to be pushed up to https://opendev.org/opendev/zuul-providers/src/branch/master/zuul.d/image-build-jobs.yaml | 19:08 |
clarkb | corvus: no objections to switching opendev but that may need more images. Probably good motivation to get ^ done | 19:08 |
frickler | +1 to opendev | 19:09 |
clarkb | that is something I may be able to look at later this week or during the ptg next week depending on how gerrit things go | 19:09 |
fungi | i'm in favor | 19:09 |
corvus | sounds good, i'll make changes to switch the node usage over | 19:10 |
clarkb | corvus: are there any major known missing pieces to the system at this point? | 19:10 |
corvus | would be great for not-corvus to add more image jobs | 19:10 |
clarkb | or are we in the unknown unknowns right now and so more use is really what we need? | 19:10 |
corvus | i will also add the image build jobs to periodic pipeline to make sure they're rebuilt frequently | 19:11 |
corvus | i think we need to add statsd | 19:11 |
corvus | i don't think the launcher emits any useful stats now | 19:11 |
clarkb | oh ya that would be good | 19:11 |
corvus | but other than that, as far as the sort of main-line functionality, i think it's generally there, and we're in the unknown-unknowns phase | 19:12 |
frickler | does autohold work the same as with nodepool? | 19:12 |
clarkb | sounds like we have a rough plan for next steps then. statsd, more images, periodic builds, switch opendev jobs over | 19:12 |
fungi | one question springs to mind: how do we go about building different images at different frequencies now? dedicated pipelines? | 19:13 |
corvus | frickler: probably not, that may be a missing piece | 19:13 |
corvus | fungi: yep | 19:13 |
frickler | fungi: maybe doing some in periodic-weekly would be good enough? | 19:13 |
fungi | yeah, i think so | 19:13 |
fungi | no need to over-complicate it | 19:13 |
clarkb | ++ to daily and weekly | 19:13 |
fungi | we always have the option of adding complexity later if we get bored and hate ourselves enough | 19:14 |
fungi | [masochist sysadmin stereotype] | 19:14 |
corvus | if we switch opendev over, and autohold isn't working and we need it, we can always dynamically switch back | 19:14 |
frickler | so the switch has to be per tenant or can we revert per repo if needed? | 19:15 |
clarkb | it is per job I think | 19:15 |
corvus | basically just saying: even if we switch the tenant over, we can always change back one project/job/change even. | 19:15 |
clarkb | so should be very flexible if we need an autohold | 19:15 |
corvus | yeah per job | 19:15 |
frickler | ah, cool | 19:15 |
clarkb | I think that works as a fallback | 19:15 |
fungi | the power of zuul | 19:15 |
clarkb | anything else on this subject? | 19:16 |
corvus | new labels will be like "niz-ubuntu-noble-8gb" and if you need to switch back to nodepool, just change it to "ubuntu-noble" | 19:16 |
corvus | that's it from me | 19:16 |
frickler | not directly related, but we still have held noble builds | 19:16 |
frickler | not sure if zuul uses neutron in any tests? | 19:16 |
clarkb | there are nodepool jobs that test against a real openstack | 19:17 |
clarkb | so those might be affected if they run on noble | 19:17 |
frickler | those likely will be broken until ubuntu publishes a fixed kernel, which is planned for the week of the 14th | 19:17 |
clarkb | ack | 19:17 |
clarkb | fwiw the noble nodes I booted to replace old servers have ip6tables rules that look correct to me | 19:17 |
frickler | #link https://bugs.launchpad.net/neutron/+bug/2104134 for reference | 19:18 |
clarkb | so the brokenness must be in a very specific part of the ipv6 firewall handling | 19:18 |
frickler | yes, it is only a special ip6tables module that is missing | 19:18 |
clarkb | ack | 19:18 |
clarkb | #topic Container hygiene tasks | 19:18 |
clarkb | #link https://review.opendev.org/q/topic:%22opendev-python3.12%22+status:open Update images to use python3.12 | 19:18 |
clarkb | we updated matrix-eavesdrop and accessbot last week to python3.12 | 19:18 |
clarkb | accessbot broke on an ssl thing that fungi fixed up | 19:19 |
fungi | it happens | 19:19 |
clarkb | otherwise things are happy. I think this effort will be on hiatus this week while we wait for the openstack release and I focus on other things | 19:19 |
fungi | just glad to be fixing things for a change rather than breaking them | 19:19 |
clarkb | But so far no major issues with python3.12 | 19:19 |
clarkb | In related news I did try to test zuul with python3.13 to see if we can skip 3.12 for zuul entirely. Unfortunately, zuul relies on google's re2 python package which doesn't have 3.13 wheels yet, and they require libabsl and pybind11 versions that are too new even for noble | 19:20 |
clarkb | if we really want to we can use bazel to build packages for us which should fetch all the deps and do the right thing | 19:20 |
clarkb | but for now I'm hopeful upstream pushes new wheels (there is a change proposed to add 3.13 wheel builds already) | 19:21 |
clarkb | #topic Booting a new Gerrit server | 19:21 |
clarkb | high on my todo list for early April is building a new gerrit server so that we can change the production server over to a newer os version | 19:22 |
clarkb | the rough plan I've got in mind is to boot the new server end of this week or early next week, then late the week after do the production cutover (~April 17/18) | 19:22 |
clarkb | but the first step in doing that is deciding where to boot it. Each of the available options has downsides and upsides. I personally think the best option is to stay where we are and use a non boot from volume v3 flavor in vexxhost ymq | 19:23 |
clarkb | the reason for that is that gerrit has never been more stable for us than when running on the large flavor (particularly with extra memory) in vexxhost, and we don't have access to large nodes like that elsewhere | 19:23 |
clarkb | the downside to this location is ipv6 connectivity has been flaky for some isps in europe | 19:24 |
clarkb | alternatives would be rackspace classic, where the main drawbacks are that we'd probably have to redeploy to rax flex sooner rather than later (or back to vexxhost) and flavors are smaller aiui. Or ovh, where the downside is that their billing is weird and sometimes things go away unexpectedly | 19:24 |
clarkb | all that to say that my vote is for vexxhost ymq and if I don't hear strong objections or suggestions otherwise I'll probably boot a review03 there within the next week | 19:25 |
fungi | yeah, part of me feels wasteful because we're probably using a flavor twice the size of what we could get away with, but also we haven't been asked to scale it down and this is the core of all our project workflows | 19:25 |
clarkb | the day to day size is smaller than we need but whenever we have to do offline reindexing we definitely benefit from the memory and cpus | 19:26 |
fungi | so anything to help with stability makes sense | 19:26 |
clarkb | so bigger is good for upgrades/downgrades and unexpected spikes in demand | 19:26 |
fungi | as for doing it in other providers, i'm not sure we have any with similarly large flavors on offer | 19:26 |
frickler | I agree the current alternatives don't look too good, so better to live with the IPv6 issues and periodically nag mnaser ;=D | 19:26 |
clarkb | once the server is up (wherever it goes) the next step will be to put the server in the inventory safely (no replication config) and sync data safely from review02 (again don't copy the replication config) | 19:27 |
frickler | raxflex would be nice if it finally had IPv6 | 19:27 |
fungi | agreed, for a lot of our control plane in fact | 19:27 |
clarkb | that should allow us to check everything is working before scheduling a downtime and doing a final sync over on ~April 17-18 | 19:27 |
frickler | not sure if it would be an option to wait for that with gerrit? likely not with no schedule for it yet | 19:28 |
clarkb | I don't think we should wait | 19:28 |
clarkb | there is never a great time to make changes to Gerrit so we just have to pick less bad times and roll with it | 19:28 |
clarkb | we don't have to make any hard decisions right this moment. I probably won't get to this until thursday or friday at the earliest. Chew on it and let me know if you have objections or other ideas | 19:30 |
frickler | fair enough | 19:30 |
clarkb | I also wanted to note that Gerrit just made a 3.10.5 release | 19:30 |
clarkb | #link https://review.opendev.org/c/opendev/system-config/+/946050 Gerrit 3.10.5 image update | 19:30 |
clarkb | the release notes don't look urgent to me so I'm fine holding off on this until after the openstack release | 19:30 |
clarkb | but I think late this week we should try to sneak this in too | 19:30 |
clarkb | just a heads up that is on my todo list. Don't want anyone to be surprised by a gerrit update, particularly during release week | 19:32 |
clarkb | #topic Upgrading old servers | 19:32 |
clarkb | Since our last meeting I ended up replacing the rax.iad and rax.dfw servers. rax.iad was to update the base os and rax.dfw was to get rescheduled to avoid network bandwidth issues | 19:32 |
clarkb | both are running noble now | 19:33 |
clarkb | haven't seen any complaints and in the case of dfw we made the region useable again in the process | 19:33 |
fungi | yeah, that was a good impulse | 19:34 |
fungi | seems like there was either something happening with the hypervisor host the old one was running on, or some arbitrary rate limit applied to the server instance's interface | 19:34 |
clarkb | for other servers in the pipeline this is the "easy" list: refstack, mirror-update, eavesdrop, zuul schedulers, and zookeeper servers | 19:35 |
clarkb | I'd appreciate any help others can offer on this especially as I'm going to shift my focus on gerrit for the next bit | 19:35 |
fungi | we haven't heard back from rackspace folks on any underlying cause yet, that i've seen | 19:35 |
clarkb | fungi: right I haven't seen any root cause | 19:35 |
clarkb | but we left the old server up so they could continue to debug | 19:35 |
clarkb | cleaning it up will happen later | 19:35 |
clarkb | Oh also refstack may be the lowest priority as I'm not sure that there is anyone maintaining the software anymore | 19:36 |
clarkb | but the others are all valid I think | 19:37 |
clarkb | and there are more on the hard list (gerrit is on that list too) | 19:37 |
fungi | also we got confirmation from the foundation staff that they no longer rely on it for anything | 19:37 |
fungi | (refstack i mean) | 19:37 |
frickler | so announce deprecation and shut off in a year instead of migrating? | 19:38 |
clarkb | frickler: ya or even sooner | 19:38 |
clarkb | that rough plan makes sense to me. | 19:38 |
frickler | sure, I just wanted to avoid a "that's too fast" response ;) | 19:39 |
clarkb | but ya my next focus on this topic is gerrit. Would be great if others can fill in some of the gaps around that | 19:39 |
clarkb | and let me know if there are questions or concerns generally about the process | 19:39 |
clarkb | #topic Running certcheck on bridge | 19:39 |
clarkb | this is on the agenda mostly so I don't forget to look at it | 19:39 |
clarkb | I don't have any updates and don't think anyone else does so probably not much to say and we can continue | 19:39 |
clarkb | but I'll wait for a minute in case I'm wrong about that | 19:40 |
clarkb | #topic Working through our TODO list | 19:41 |
clarkb | If what we've discussed above doesn't inspire you or fill your todo list full of activities we do have a separate larger and broader list you can look at for inspiration | 19:42 |
clarkb | #link https://etherpad.opendev.org/p/opendev-january-2025-meetup | 19:42 |
clarkb | much of what we discuss week to week falls under this list but I just like to remind people we've got even more to dive into if there is interest | 19:42 |
clarkb | this applies to existing and new contributors alike | 19:42 |
clarkb | and feel free to reach out to me with questions if you have them about anything we're doing or have on that todo list | 19:42 |
clarkb | #topic Rotating mailman 3 logs | 19:42 |
clarkb | fungi: do we have any changes for this yet? speaking of autoholds this might be a good use case for an autohold to test whether copytruncate is usable | 19:43 |
fungi | no, sorry, not yet | 19:43 |
clarkb | ack, it's been busy lately and I think this week is no different. but might be good to throw something up and get it held so that we can observe the behavior over time | 19:44 |
clarkb | #topic Open Discussion | 19:45 |
clarkb | among everything else this week I'm juggling some family stuff and I'll be out Monday (so missing day 1 of the ptg) | 19:45 |
clarkb | as a perfect example I'm doing the school run this afternoon so will be out for a bit in 1.5 hours | 19:46 |
frickler | ah, that reminds me to drop the osc-placement autohold ;) | 19:46 |
fungi | i'll be around, happy to keep an eye on things when i'm not leading ptg sessions | 19:46 |
fungi | also for tomorrow's openstack release i'm going to try to be online by 10:00 utc (6am local for me) in case anything goes sideways | 19:47 |
clarkb | I won't be awake that early but when I do wake I'll check in on things and can help out if necessary too | 19:47 |
* tonyb | will be around for the release also | 19:48 |
fungi | much appreciated | 19:49 |
clarkb | anything else? | 19:50 |
clarkb | hopefully everyone is able to celebrate the release tomorrow | 19:50 |
clarkb | thank you everyone for your time and effort and help. | 19:51 |
fungi | i'll be hosting an openinfra.live episode on thursday where openstack community leaders will talk about new features and changes in various components | 19:51 |
clarkb | As mentioned at the beginning of the meeting we will skip next week's meeting unless something important comes up. That way you can all enjoy the PTG and not burn out on meetings as quickly | 19:51 |
tonyb | Nice .... more sleep for me :) | 19:52 |
clarkb | that too :) | 19:52 |
clarkb | #endmeeting | 19:52 |
opendevmeet | Meeting ended Tue Apr 1 19:52:36 2025 UTC. Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4) | 19:52 |
opendevmeet | Minutes: https://meetings.opendev.org/meetings/infra/2025/infra.2025-04-01-19.00.html | 19:52 |
opendevmeet | Minutes (text): https://meetings.opendev.org/meetings/infra/2025/infra.2025-04-01-19.00.txt | 19:52 |
opendevmeet | Log: https://meetings.opendev.org/meetings/infra/2025/infra.2025-04-01-19.00.log.html | 19:52 |
fungi | thanks clarkb! | 19:52 |
clarkb | oh I was going to mention the matrix oftc bridge is still up as of an hour ago or so | 19:52 |
clarkb | we probably want to keep an eye on that so we can notify people if it does end up going away | 19:53 |
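
A minimal sketch of the periodic image-build setup discussed at 19:11-19:13, assuming the standard Zuul project-pipeline syntax and the existing periodic and periodic-weekly pipelines; the job names are hypothetical placeholders, not the actual jobs defined in opendev/zuul-providers zuul.d/image-build-jobs.yaml:

```yaml
# Hypothetical sketch: attach image-build jobs to periodic pipelines so
# images are rebuilt on a schedule. Job names are placeholders only.
- project:
    periodic:
      jobs:
        - opendev-build-diskimage-ubuntu-noble      # rebuilt daily
    periodic-weekly:
      jobs:
        - opendev-build-diskimage-debian-bookworm   # rebuilt weekly
```

Splitting images between the daily and weekly pipelines matches the "no need to over-complicate it" approach agreed in the meeting.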
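A minimal sketch of the per-job label switch corvus described at 19:16, assuming standard Zuul job/nodeset syntax; the job and node names here are illustrative assumptions, only the labels come from the discussion:

```yaml
# Hypothetical sketch: opting a single job into zuul-launcher nodes.
- job:
    name: example-devstack-job        # placeholder job name
    nodeset:
      nodes:
        - name: controller            # placeholder node name
          label: niz-ubuntu-noble-8gb # zuul-launcher ("niz") label
          # To fall back to nodepool (e.g. if an autohold is needed),
          # change the label back to:
          # label: ubuntu-noble
```

Because the choice is made per job, a single project, job, or even a single change can be flipped back without reverting the whole tenant.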