Monday, 2025-02-24

[13:40] fungi: infra-root: heads up, we had a post_failure result on a build (a71fcfb612fa47d5b31a8b37a7ca211e in the openstack tenant) that logged an upload-logs-swift upload task failure at 10:42:50,003 from ze06, _swift_provider_name was rax_iad. i don't see other obvious post_failure builds at the moment, but keep an eye out in case there are swift or keystone problems in that provider
[13:42] fungi: https://rackspace.service-now.com/system_status doesn't indicate any issues
[15:37] fungi: as soon as my current meeting wraps up at the top of the hour, i'm going to run some errands and grab lunch, so probably gone between 16:00-18:00 utc
[15:42] fungi: cloudnull: by any chance, is the PUBLICNET network in flex dfw3 and sjc3 out of addresses? at one point i was able to boot a server instance with --network=PUBLICNET but now i get "SDKException: Error in creating the server. Compute service reports fault: [...] No valid service subnet for the given device owner, [...]" in both regions
[15:43] frickler: that sounds a bit like they might be explicitly disallowing instance ports in that network?
[15:44] fungi: possible, if so that's a semi-recent change
[15:45] fungi: it was convenient not to need to deal with floating-ips and creating a private network/router/et cetera for things that are always going to get global addresses anyway
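frickler's guess can be illustrated with a small model of Neutron's service-subnet filtering. This is a sketch under assumptions, not Neutron's code: it assumes (hypothetically) that PUBLICNET's subnets are tagged with service types such as network:floatingip and network:router_gateway, and that an instance port carries a device owner like compute:nova. Under our reading of Neutron's service-subnet behaviour, a port gets an address from a subnet whose service types include its device owner, else from a subnet with no service types, and otherwise allocation fails, which would match the fault above.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Subnet:
    name: str
    # Empty list means the subnet is unrestricted (usable by any port).
    service_types: List[str] = field(default_factory=list)


def candidate_subnets(subnets: List[Subnet], device_owner: str) -> List[Subnet]:
    """Approximate service-subnet filtering on one network (a model, not Neutron code)."""
    # 1. Prefer subnets that explicitly list this port's device owner.
    matching = [s for s in subnets if device_owner in s.service_types]
    if matching:
        return matching
    # 2. Otherwise fall back to subnets without any service types.
    unrestricted = [s for s in subnets if not s.service_types]
    if unrestricted:
        return unrestricted
    # 3. Nothing usable: the situation the compute fault message describes.
    raise LookupError(
        f"No valid service subnet for the given device owner {device_owner}"
    )


# Hypothetical PUBLICNET layout: only floating IPs and router gateways allowed.
publicnet = [Subnet("public-v4", ["network:floatingip", "network:router_gateway"])]

print([s.name for s in candidate_subnets(publicnet, "network:floatingip")])
try:
    candidate_subnets(publicnet, "compute:nova")
except LookupError as exc:
    print(exc)
```

If that is the change, a floating-IP setup would still work, since the floating IP and router gateway ports are the ones allowed on PUBLICNET while the instance port lives on a private subnet.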
[15:51] fungi: meeting wrapped up a few minutes early so heading out now, but should be back in a couple of hours
[16:07] clarkb: infra-root the three main things on my agenda for the start of the week are fixing up known_hosts generation on bridge: https://review.opendev.org/c/opendev/system-config/+/942307 (this one does impact infra-prod jobs, which are hard to test and potentially disruptive if we get it wrong, so please review carefully). Then upgrading gitea:
[16:07] clarkb: https://review.opendev.org/c/opendev/system-config/+/942477 and upgrading grafana are the other two: https://review.opendev.org/c/opendev/system-config/+/940997 I think we can do those in whichever order we feel more comfortable with
[16:07] clarkb: I also need to check and see if any parts of zuul are publishing tracing events to tracing01 and if not start tearing it down
[16:09] clarkb: https://tracing01.opendev.org/search?end=1740413341638000&limit=20&lookback=1h&maxDuration&minDuration&service=zuul&start=1740409741638000 has no data but updating the url to tracing.o.o does show data so ya everything seems to have shifted over
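The start and end parameters in that search URL look like microseconds since the Unix epoch (the convention the Jaeger search UI appears to use in its query string; treat that as an assumption). A quick Python check decodes them and shows the window is the one-hour lookback ending at about 16:09 UTC, the time of the message.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import parse_qs, urlparse

URL = (
    "https://tracing01.opendev.org/search?end=1740413341638000&limit=20"
    "&lookback=1h&maxDuration&minDuration&service=zuul&start=1740409741638000"
)
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def search_window(url: str):
    """Return (service, start, end), assuming start/end are epoch microseconds."""
    # Valueless parameters such as maxDuration are dropped by parse_qs by default.
    params = parse_qs(urlparse(url).query)
    # Integer timedelta arithmetic avoids float rounding of the microseconds.
    start = EPOCH + timedelta(microseconds=int(params["start"][0]))
    end = EPOCH + timedelta(microseconds=int(params["end"][0]))
    return params["service"][0], start, end


service, start, end = search_window(URL)
print(service, start.isoformat(), end.isoformat(), end - start)
```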
[16:12] clarkb: but before I load ssh keys for that it is time for local updates and reboots. I'll be back as soon as those can complete
[16:38] opendevreview: Clark Boylan proposed opendev/system-config master: Drop tracing01 from the inventory  https://review.opendev.org/c/opendev/system-config/+/942617
[16:40] opendevreview: Clark Boylan proposed opendev/zone-opendev.org master: Remove tracing01 from DNS  https://review.opendev.org/c/opendev/zone-opendev.org/+/942618
[16:41] clarkb: that seems like a good pile of work to start on a monday. Let me know what you think the best path forward is. Happy to let the infra-prod known_hosts setup stew a bit if we want to think it through carefully, or send it and then use these other changes as canaries (we can always write more changes to exercise it though)
[17:47] fungi: looking, thanks!
[18:08] fungi: all of those lgtm, but seems i'm the only +2 on any of them so far
[18:09] fungi: i suppose i could single-core approve some of the lower-impact changes to give others more time to check the rest
[18:10] clarkb: that would be great
[18:10] clarkb: thank you for looking
[18:10] clarkb: I intend on being around all day today so will keep an eye on things
[18:22] clarkb: fungi: where do you think we should start? I expect the grafana upgrade might be the most disruptive of the less disruptive changes (and even then it's not a user facing thing for the most part; we just revert and figure it out)
[18:24] fungi: yeah, i expect that and the tracing changes you just pushed are safe enough we can work through any unexpected fallout with only minor disruption
[18:24] clarkb: gitea should be pretty safe too as it's only a minor bugfix update
[18:25] clarkb: roll a die, divide by 2 and pick one of the three?
[18:25] fungi: agreed, once the grafana change deploys i'll approve the gitea upgrade next
[18:25] clarkb: cool
[18:25] fungi: the tracing changes are just cleanup so not all that much benefit to rushing them nor to letting them sit
[18:26] clarkb: ya
[18:37] clarkb: fungi: I wanted to followup and ask if you had a chance to look at the lists' log files that were growing without rotation and whether or not we should push rotation rules
[18:38] clarkb: and also wondered if you hit any more problems with uploading noble images and/or booting new mirror nodes in raxflex
[19:04] fungi: clarkb: haven't had a chance to add log rotation to the mailman containers yet
[19:04] opendevreview: Merged opendev/system-config master: Update to grafana 11.5.1  https://review.opendev.org/c/opendev/system-config/+/940997
[19:05] fungi: as for rackspace flex, images uploaded fine, servers can't attach addresses from the public network so we may have to do floating-ips, hoping someone there can clarify before we go ahead
[19:05] clarkb: got it thanks
[19:06] clarkb: have to wait for the hourly jobs to finish before grafana deploys
[19:16] opendevreview: Merged openstack/project-config master: Deprecate ironic-lib  https://review.opendev.org/c/openstack/project-config/+/939282
[19:28] clarkb: grafana deployed and it seems to still be working for me
[19:29] clarkb: I think I'm happy with it
[19:32] fungi: yeah, seems to be working fine still
[19:32] fungi: gitea upgrade change approved next
[19:41] clarkb: I've done a first pass update to the meeting agenda
[19:41] clarkb: let me know if there is anything I'm missing
[19:58] corvus: clarkb: fungi do we want definitive answers/plans to ianw's comments on 942307 before proceeding with that?
[20:09] clarkb: corvus: I did move the jobs inside project.yaml to address the two comments
[20:09] clarkb: and for the third comment I see that more as a future state to aim towards than a necessary update for today. I also don't think the current change is at odds with that plan
[20:10] clarkb: basically this change addresses the immediate problem, then in the future we'll continue to update system-config in this new way but stop updating it everywhere else with the proposed plan, which works for me
[20:12] clarkb: I rebased ianw's change that starts to sketch that out onto 942307 too to kinda show that was the direction I thought we should take. But let me know if you think we need to codify that more
[20:55] corvus: was mostly thinking that maybe throwing some of that into a comment to help keep track of the discussion would be good.  like, i don't know when/if we're planning on doing the paused job thing
[20:58] clarkb: I don't have a timeline myself as my primary concern right now is being able to replace old servers with new ones. But ya I can followup with a comment that indicates that this seems like a good plan for how to parallelize our infra-prod job runs
[21:01] clarkb: and posted
[21:04] corvus: that's great, thanks.  i think that will help us (me at least) keep track of the spinning plates
[21:06] clarkb: the gitea upgrade is about to land
[21:06] opendevreview: Merged opendev/system-config master: Update got Gitea 1.23.4  https://review.opendev.org/c/opendev/system-config/+/942477
[21:06] opendevreview: Merged opendev/system-config master: Drop tracing01 from the inventory  https://review.opendev.org/c/opendev/system-config/+/942617
[21:10] fungi: deploy ended up behind the hourlies though
[21:23] clarkb: the deploy has started now
[21:23] clarkb: hasn't gotten to upgrading anything yet though
[21:25] clarkb: https://gitea09.opendev.org:3081/opendev/system-config has updated and lgtm. I'm testing a clone next
[21:26] fungi: yeah, seems to be upgraded and still browseable and i can clone from it
[21:27] clarkb: git clone seems happy
[21:32] clarkb: and all 6 have upgraded now
[21:35] fungi: and the deploy buildset reported success
[21:36] fungi: that inventory change is going to run a lot of deploy jobs. we're up to 34, looks like?
[21:37] clarkb: ya that's the motivation for parallelizing them that ianw was working on, then it got back burnered
[21:37] clarkb: definitely a good thing to pick back up again
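One way to picture the parallelization idea: if each deploy job declares which jobs must finish before it, then all jobs whose prerequisites are done can run together as one wave. This is a sketch using Python's standard graphlib module (3.9+); the job names and dependencies below are hypothetical, chosen for illustration, and are not the real system-config job graph or ianw's change.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set


def parallel_waves(deps: Dict[str, Set[str]]) -> List[List[str]]:
    """Group jobs into waves; every job in a wave has all of its prerequisites
    in earlier waves, so a wave's jobs could run concurrently."""
    ts = TopologicalSorter(deps)
    ts.prepare()  # raises graphlib.CycleError if the graph has a cycle
    waves = []
    while ts.is_active():
        ready = sorted(ts.get_ready())  # sorted only for stable output
        waves.append(ready)
        ts.done(*ready)
    return waves


# Hypothetical job graph: job -> set of jobs that must finish first.
deps = {
    "infra-prod-base": {"infra-prod-bootstrap-bridge"},
    "infra-prod-letsencrypt": {"infra-prod-base"},
    "infra-prod-service-gitea": {"infra-prod-letsencrypt"},
    "infra-prod-service-grafana": {"infra-prod-letsencrypt"},
    "infra-prod-service-zuul": {"infra-prod-letsencrypt"},
}

for i, wave in enumerate(parallel_waves(deps), 1):
    print(i, wave)
```

With a serial pipeline the five jobs take five slots; grouped this way the three service jobs share the last wave, and the saving grows as an inventory change fans out to dozens of service jobs.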
[21:51] clarkb: corvus: before I remove tracing01 from DNS (and then subsequently delete it) you had indicated previously you don't think the data needed to migrate over to tracing02.  I assume that means you are also comfortable with it being deleted without preservation?
[21:51] corvus: clarkb: affirmative
[22:07] fungi: i guess we can go ahead with the dns removal for tracing01? i doubt we care to see the dependency's deploy jobs succeed first
[22:16] clarkb: fungi: ya I think it should be good to go for the dns removal
[22:27] fungi: in that case, fire in the hole
[22:34] opendevreview: Merged opendev/zone-opendev.org master: Remove tracing01 from DNS  https://review.opendev.org/c/opendev/zone-opendev.org/+/942618
[22:36] fungi: the deploy on that is going to be waiting behind the inventory deploy and the hourlies
[23:07] opendevreview: Clark Boylan proposed opendev/system-config master: Run gitea with memcached cache adapter  https://review.opendev.org/c/opendev/system-config/+/942650
[23:08] clarkb: I've been looking at gitea caching options and ended up with ^. The other options are redis (no longer open source right?) and twoqueue which is a size limited cache implemented within the Go runtime (it's compiled in)
[23:08] clarkb: I chose memcached because it is open source and not part of the go runtime (whose gc we think is part of the problem). If we prefer to start with twoqueue because it means one less container image and one fewer service to run we can start there instead
[23:08] clarkb: But I figured I'd push something up and see what people have to say about it
[23:08] clarkb: mnaser: ^ fyi this is coming out of the debugging for your slow request problems
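For context on the twoqueue option, here is a simplified 2Q cache in Python. It is a sketch modeled loosely on the 2Q variant in hashicorp/golang-lru (which we assume Gitea's twoqueue adapter wraps), reusing that library's default 0.25 recent and 0.5 ghost ratios; it is not Gitea's code. Keys seen once sit in a small recent queue, keys hit again are promoted to a frequent queue, and keys evicted from recent are remembered in a ghost list so a quick return goes straight to frequent. The size limit bounds memory, but all entries still live inside the application process, which is the trade-off against an external memcached.

```python
from collections import OrderedDict
from typing import Any, Hashable, Optional


class TwoQueueCache:
    """Simplified 2Q cache (sketch): 'recent' holds keys seen once, 'frequent'
    holds keys seen again, 'ghost' remembers keys recently evicted from
    'recent' (keys only, no values). OrderedDicts keep LRU order, oldest first."""

    def __init__(self, size: int, recent_ratio: float = 0.25, ghost_ratio: float = 0.5):
        if size <= 0:
            raise ValueError("size must be positive")
        self.size = size
        self.recent_size = int(size * recent_ratio)
        self.ghost_size = int(size * ghost_ratio)
        self.recent: "OrderedDict[Hashable, Any]" = OrderedDict()
        self.frequent: "OrderedDict[Hashable, Any]" = OrderedDict()
        self.ghost: "OrderedDict[Hashable, None]" = OrderedDict()

    def get(self, key: Hashable) -> Optional[Any]:
        if key in self.frequent:
            self.frequent.move_to_end(key)  # mark as most recently used
            return self.frequent[key]
        if key in self.recent:
            value = self.recent.pop(key)  # second hit: promote to frequent
            self.frequent[key] = value
            return value
        return None

    def add(self, key: Hashable, value: Any) -> None:
        if key in self.frequent:
            self.frequent[key] = value
            self.frequent.move_to_end(key)
            return
        if key in self.recent:
            del self.recent[key]  # re-added while recent: promote
            self.frequent[key] = value
            return
        if key in self.ghost:
            del self.ghost[key]  # returned soon after eviction: treat as frequent
            self._ensure_space(recent_evict=True)
            self.frequent[key] = value
            return
        self._ensure_space(recent_evict=False)
        self.recent[key] = value

    def _ensure_space(self, recent_evict: bool) -> None:
        if len(self.recent) + len(self.frequent) < self.size:
            return
        n_recent = len(self.recent)
        if n_recent > 0 and (
            n_recent > self.recent_size
            or (n_recent == self.recent_size and not recent_evict)
        ):
            old_key, _ = self.recent.popitem(last=False)  # evict oldest recent
            self.ghost[old_key] = None
            if len(self.ghost) > self.ghost_size:
                self.ghost.popitem(last=False)  # ghost list is bounded too
            return
        self.frequent.popitem(last=False)  # evict least recently used frequent


cache = TwoQueueCache(4)
for k in "ab":
    cache.add(k, k.upper())
cache.get("a")  # "a" is promoted to frequent
for k in "cde":
    cache.add(k, k.upper())  # filling up evicts "b" from recent into ghost
print(cache.get("b"), list(cache.frequent), list(cache.ghost))
```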
[23:21] corvus: neat: gerrit's issue tracker is returning no comment content for me
[23:21] corvus: eg https://issues.gerritcodereview.com/issues/40012594
[23:21] corvus: earlier today there was a description of the problem
[23:22] corvus: so either my interwebs are broken, or hosting dev tools is hard
[23:22] corvus: possibly both i guess
[23:22] clarkb: looks like if you login you see them
[23:22] clarkb: I wonder if they made that change because of all the spam they have been having
[23:22] clarkb: I can confirm the behavior in a tab group that isn't logged in but see comments when in a tab group that is logged in
[23:23] corvus: maybe, or maybe it's an anti-ai-scraping thing
[23:29] clarkb: related: it bugs me that you can't log out of anything you're logged into with google without logging entirely out of google. This also applies when you have multiple google accounts logged in at the same time. You can't log out of one or the other; you have to do both
[23:30] corvus: i live by firefox containers
[23:31] clarkb: ya I've been using them more and more. But usually there is a single google container to try and isolate google from everything else
[23:31] clarkb: but maybe I should cave and have multiple google containers
[23:52] clarkb: did another pass on the meeting agenda. I'll get that out at ~00:30 UTC. Let me know if anything needs updating
[23:59] fungi: i have a separate firefox container per google account, since i need separate google accounts for different nonprofits i work with. so annoying
