clarkb | we will start our weekly meeting in just a minute or two | 18:59 |
---|---|---|
clarkb | #startmeeting infra | 19:00 |
opendevmeet | Meeting started Tue Mar 11 19:00:18 2025 UTC and is due to finish in 60 minutes. The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot. | 19:00 |
opendevmeet | Useful Commands: #action #agreed #help #info #idea #link #topic #startvote. | 19:00 |
opendevmeet | The meeting name has been set to 'infra' | 19:00 |
clarkb | #link https://lists.opendev.org/archives/list/service-discuss@lists.opendev.org/thread/X75EQWC4QAGVRJKPB3HTTKHKONAM26JJ/ Our Agenda | 19:00 |
clarkb | #topic Announcements | 19:00 |
clarkb | I don't have anything to announce. Did anyone else have something? | 19:00 |
* tonyb | has nothing | 19:02 |
clarkb | #topic Zuul-launcher image builds | 19:03 |
clarkb | I think the big update here is that we did do the raxflex surgery to switch tenants and update networks (and their MTUs) | 19:03 |
clarkb | that happened for both zuul-launcher and nodepool and as far as I know is working | 19:03 |
corvus | yay! | 19:04 |
corvus | i know at least some nodes were provided by zuul-launcher yesterday | 19:04 |
corvus | i need to check on a node failure though | 19:04 |
clarkb | that would imply things are mostly working I think. That's good | 19:04 |
corvus | so let's say i should confirm that's actually working with zl | 19:04 |
corvus | hrm, i think there may be a rax flex problem with zl | 19:05 |
corvus | 2025-03-10 17:26:56,062 ERROR zuul.Launcher: TypeError: create_server() requires either 'image' or 'boot_volume' | 19:05 |
corvus | not sure what the deal is there | 19:05 |
corvus | i'll follow up in #opendev tho | 19:06 |
clarkb | nodepool is using images to boot off of, not boot from volume, fwiw | 19:06 |
clarkb | ack. Anything else to bring up on this topic? | 19:06 |
corvus | i think we're really close to being ready to start expanding the use of niz | 19:06 |
corvus | like maybe one or two more features to add to the launcher | 19:06 |
corvus | so it's probably worth asking how to proceed? | 19:07 |
corvus | i'm thinking... | 19:07 |
corvus | continue to add some more providers, and switch the zuul tenant over to use niz exclusively. do that without the new flavors though (so we use the old flavors -- 8gb everywhere) | 19:07 |
fungi | i do have a change up to add rackspace flex dfw3 | 19:07 |
corvus | (basically, decouple the adding of new flavors from the switching to niz) | 19:08 |
corvus | fungi: oh cool | 19:08 |
fungi | note that sjc3 and dfw3 use different flavor names, so my change moves the label->flavor mapping up into the region layer | 19:08 |
corvus | if we do what i suggest -- should we also dial down max-servers in nodepool just a little? or let them duke it out? | 19:09 |
fungi | #link https://review.opendev.org/943104 Add the DFW3 region for Rackspace Flex | 19:09 |
clarkb | corvus: I suspect we can dial back max-servers by some small amount in nodepool | 19:09 |
corvus | fungi: nice -- that's why we can do it in both places | 19:09 |
fungi | yeah, i found that convenient | 19:09 |
fungi | great design! | 19:09 |
clarkb | we tend to float under max-servers due to $cloudproblems anyway so we may find that eventually leads to them duking it out :) | 19:10 |
corvus | we're definitely getting to the point where more images would be good to have :) we have everything needed for zuul i think, but it would be good to have other images that other tenants use | 19:10 |
corvus | clarkb: good point | 19:10 |
corvus | i think that's about it from me | 19:11 |
clarkb | thanks for the update. Adding more images and starting with the zuul tenant sounds great to me | 19:11 |
clarkb | #topic Updating Flavors in OVH | 19:12 |
clarkb | related is the OVH flavor surgery that we've discussed with amorin | 19:12 |
clarkb | #link https://etherpad.opendev.org/p/ovh-flavors | 19:12 |
clarkb | the proposal is to do this Monday (March 17) ish | 19:12 |
corvus | did amorin respond about scheduling? | 19:12 |
corvus | i missed a bunch of scrollback due to travel | 19:12 |
clarkb | 19:21:33* amorin | corvus: I will check with the team tomorrow about march 17 and let you know. | 19:13 |
clarkb | this was the last message I saw | 19:13 |
clarkb | so I guess we're still waiting on final confirmation they will proceed then | 19:13 |
corvus | cool, i did miss that. gtk. | 19:13 |
corvus | probably worth a ping to check on the status, but no rush | 19:13 |
clarkb | ++ | 19:13 |
clarkb | and ya otherwise it's mostly: be aware of this as a thing we're trying to do. It should make working with ovh in zuul-launcher and nodepool nicer | 19:14 |
clarkb | more future proofed by getting off the bespoke scheduling system | 19:14 |
clarkb | #topic Running infra-prod Jobs in Parallel on Bridge | 19:16 |
clarkb | Yesterday we landed the change to increase our infra-prod jobs semaphore limit to 2. Since then we have been running our infra-prod jobs in parallel | 19:16 |
corvus | yaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaay | 19:17 |
tonyb | \o/ #greatsuccess | 19:17 |
fungi | now deploys are happening ~2x as fast | 19:17 |
fungi | it's very nice! | 19:17 |
clarkb | I monitored deploy, opendev-prod-hourly, and periodic buildsets as well as general demand/load on bridge and thankfully no major issues came up | 19:17 |
clarkb | #link https://review.opendev.org/c/opendev/system-config/+/943999 A small logs fixup | 19:17 |
clarkb | this fixup addresses ansible logging on bridge that ianw noticed is off | 19:18 |
clarkb | #link https://review.opendev.org/c/opendev/system-config/+/943992 A small job ordering fixup | 19:18 |
clarkb | and this change makes puppet-else depend on letsencrypt because puppet deployed services do use LE certs | 19:18 |
clarkb | neither of which is urgent nor impacting functionality at the moment | 19:18 |
corvus | it's important that puppet work so we can update our jenkins configuration :) | 19:19 |
clarkb | indeed | 19:19 |
corvus | (amusingly, since change #1 is a puppet change, we can say 943998 changes later and we still have puppet) | 19:20 |
clarkb | other than those two changes I think the next step is considering increasing that semaphore limit to a larger number. Bridge has 8 vCPUs and load avg was below 2 every time I checked. I think that means we can safely bump up to 4 or 5 in the semaphore | 19:20 |
corvus | 4 sgtm | 19:21 |
tonyb | I like 4 also | 19:21 |
clarkb | I was also hoping to land a few more regular system-config changes to see deploy do its thing but the one that merged today largely noop'd | 19:21 |
clarkb | fungi's change from yesterday did not noop and was a good exercise thankfully | 19:22 |
clarkb | ok maybe try to land a change or two to system-config later today and if that and periodic look good bump up to 4 tomorrow | 19:22 |
fungi | once we have performance graphs we can check easily again, we might consider increasing it further | 19:23 |
clarkb | anything else on this topic? It was a long time coming but it's here and seems to be working so yay and thank you to everyone who helped make it happen | 19:23 |
clarkb | #topic Upgrading old servers | 19:24 |
clarkb | As expected last week was consumed by infra-prod changes and the start of this week too | 19:24 |
clarkb | so I don't have anything new on this front. But it is the next big item on my todo list to tackle | 19:25 |
clarkb | tonyb: did you have any updates? | 19:25 |
tonyb | :( No. | 19:26 |
clarkb | #topic Sprinting to Upgrade Servers to Noble | 19:26 |
clarkb | so as mentioned I'm hoping to pick this up again starting tomorrow most likely | 19:26 |
clarkb | Help with reviews is always appreciated as is help bootstrapping replacement servers | 19:26 |
clarkb | #link https://etherpad.opendev.org/p/opendev-server-replacement-sprint | 19:26 |
clarkb | feel free to throw some notes in that etherpad and I'll try to get it updated from my side as I pick it up again | 19:27 |
clarkb | Then in related news I started brainstorming what the Gerrit replacement looks like. I think it is basically boot a new gerrit and volume, have system-config deploy it "empty" without replication configured. Sanity check things look right and functional. Then sync over current prod state and recheck operation. Then schedule a downtime to do a final sync and cut over dns | 19:28 |
clarkb | the main thing is that we don't want replication to run until the new server is in production to avoid overwriting content on gitea | 19:28 |
clarkb | if you can think of any other sync points that need to be handled carefully let me know | 19:28 |
clarkb | #topic Running certcheck on bridge | 19:29 |
clarkb | as mentioned last week I volunteered to look at running this within an infra-prod job that runs daily instead | 19:29 |
clarkb | I haven't done that yet but am hopeful I'll be able to look before next meeting just because I have so much infra-prod stuff paged in at this point | 19:30 |
clarkb | and with things running in parallel the wall clock time cost for that should be minimal | 19:30 |
clarkb | #topic Working through our TODO list | 19:31 |
clarkb | #link https://etherpad.opendev.org/p/opendev-january-2025-meetup | 19:31 |
clarkb | this is just your weekly reminder to look at that list if you need good ideas for what to work on next. I think I can probably mark off infra-prod running in parallel now too | 19:31 |
clarkb | #topic Open Discussion | 19:32 |
clarkb | Anything else? | 19:32 |
fungi | i didn't have anything | 19:33 |
clarkb | my kids are out on spring break week after next. I may try to pop out a day or two to do family things as a result. but we have nothing big planned so my guess is it wouldn't be anything crazy | 19:34 |
tonyb | Nothing from me this week | 19:34 |
clarkb | that was quick. I think we can end early today then | 19:36 |
clarkb | thank you everyone for your time and help | 19:36 |
clarkb | see you back here next week | 19:36 |
clarkb | #endmeeting | 19:36 |
opendevmeet | Meeting ended Tue Mar 11 19:36:22 2025 UTC. Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4) | 19:36 |
opendevmeet | Minutes: https://meetings.opendev.org/meetings/infra/2025/infra.2025-03-11-19.00.html | 19:36 |
opendevmeet | Minutes (text): https://meetings.opendev.org/meetings/infra/2025/infra.2025-03-11-19.00.txt | 19:36 |
opendevmeet | Log: https://meetings.opendev.org/meetings/infra/2025/infra.2025-03-11-19.00.log.html | 19:36 |