fungi | infra-root: heads up, we had a post_failure result on a build (a71fcfb612fa47d5b31a8b37a7ca211e in the openstack tenant) that logged an upload-logs-swift upload task failure at 10:42:50,003 from ze06, _swift_provider_name was rax_iad. i don't see other obvious post_failure builds at the moment, but keep an eye out in case there are swift or keystone problems in that provider | 13:40 |
fungi | https://rackspace.service-now.com/system_status doesn't indicate any issues | 13:42 |
fungi | as soon as my current meeting wraps up at the top of the hour, i'm going to run some errands and grab lunch, so probably gone between 16:00-18:00 utc | 15:37 |
fungi | cloudnull: by any chance, is the PUBLICNET network in flex dfw3 and sjc3 out of addresses? at one point i was able to boot a server instance with --network=PUBLICNET but now i get "SDKException: Error in creating the server. Compute service reports fault: [...] No valid service subnet for the given device owner, [...]" in both regions | 15:42 |
frickler | that sounds a bit like they might be explicitly disallowing instance ports in that network? | 15:43 |
fungi | possible, if so that's a semi-recent change | 15:44 |
fungi | it was convenient not to need to deal with floating-ips and creating a private network/router/et cetera for things that are always going to get global addresses anyway | 15:45 |
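For reference, the direct-attach approach fungi describes looks roughly like the following openstack CLI invocation; this is a hedged sketch, and the cloud, region, image, flavor, key, and server names are illustrative placeholders rather than the actual launch configuration:

```
# Hedged sketch: boot directly on the provider network instead of using a
# floating ip; all names here are placeholders, not opendev's real settings.
openstack --os-cloud raxflex --os-region-name SJC3 server create \
  --image ubuntu-noble \
  --flavor gp.0.2.4 \
  --network PUBLICNET \
  --key-name infra-root-keys \
  test-node
```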
fungi | meeting wrapped up a few minutes early so heading out now, but should be back in a couple of hours | 15:51 |
clarkb | infra-root the three main things on my agenda for the start of the week are fixing up known_hosts generations on bridge: https://review.opendev.org/c/opendev/system-config/+/942307 this one does impact infra-prod jobs which are hard to test and potentially disruptive if we get it wrong so please review carefully. Then upgrading gitea: | 16:07 |
clarkb | https://review.opendev.org/c/opendev/system-config/+/942477 and upgrading grafana are the other two: https://review.opendev.org/c/opendev/system-config/+/940997 I think we can do those in whichever order we feel more comfortable | 16:07 |
clarkb | I also need to check and see if any parts of zuul are publishing tracing events to tracing01 and if not start tearing it down | 16:07 |
clarkb | https://tracing01.opendev.org/search?end=1740413341638000&limit=20&lookback=1h&maxDuration&minDuration&service=zuul&start=1740409741638000 has no data but updating the url to tracing.o.o does show data so ya everything seems to have shifted over | 16:09 |
clarkb | but before I load ssh keys for that it is time for local updates and reboots. I'll be back as soon as those can complete | 16:12 |
opendevreview | Clark Boylan proposed opendev/system-config master: Drop tracing01 from the inventory https://review.opendev.org/c/opendev/system-config/+/942617 | 16:38 |
opendevreview | Clark Boylan proposed opendev/zone-opendev.org master: Remove tracing01 from DNS https://review.opendev.org/c/opendev/zone-opendev.org/+/942618 | 16:40 |
clarkb | that seems like a good pile of work to start on a monday. Let me know what you think the best path forward is. Happy to let the infra-prod known_hosts setup stew a bit if we want to think it through carefully or send it then use these other changes as canaries (we can always write more changes to exercise it though) | 16:41 |
fungi | looking, thanks! | 17:47 |
fungi | all of those lgtm, but seems i'm the only +2 on any of them so far | 18:08 |
fungi | i suppose i could single-core approve some of the lower-impact changes to give others more time to check the rest | 18:09 |
clarkb | that would be great | 18:10 |
clarkb | thank you for looking | 18:10 |
clarkb | I intend on being around all day today so will keep an eye on things | 18:10 |
clarkb | fungi: where do you think we should start? I expect the grafana upgrade might be the most disruptive of the less disruptive changes (and even then it's not a user facing thing for the most part, we just revert and figure it out) | 18:22 |
fungi | yeah, i expect that and the tracing changes you just pushed are safe enough we can work through any unexpected fallout with only minor disruption | 18:24 |
clarkb | gitea should be pretty safe too as it's only a minor bugfix update | 18:24 |
clarkb | roll a die, divide by 2 and pick one of the three? | 18:25 |
fungi | agreed, once the grafana change deploys i'll approve the gitea upgrade next | 18:25 |
clarkb | cool | 18:25 |
fungi | the tracing changes are just cleanup so not all that much benefit to rushing them nor to letting them sit | 18:25 |
clarkb | ya | 18:26 |
clarkb | fungi: I wanted to follow up and ask if you had a chance to look at the lists' log files that were growing without rotation and whether or not we should push rotation rules | 18:37 |
clarkb | and also wondered if you hit any more problems with uploading noble images and/or booting new mirror nodes in raxflex | 18:38 |
fungi | clarkb: haven't had a chance to add log rotation to the mailman containers yet | 19:04 |
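A logrotate stanza along these lines would be one way to cap that growth; the paths and retention policy below are assumptions about where the mailman containers write their logs, not the actual deployment layout:

```
# Hedged sketch: assumed bind-mount paths for the mailman-core and
# mailman-web containers; copytruncate avoids having to signal the
# processes running inside the containers.
/var/lib/mailman/core/var/logs/*.log
/var/lib/mailman/web/logs/*.log {
    weekly
    rotate 4
    compress
    missingok
    notifempty
    copytruncate
}
```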
opendevreview | Merged opendev/system-config master: Update to grafana 11.5.1 https://review.opendev.org/c/opendev/system-config/+/940997 | 19:04 |
fungi | as for rackspace flex, images uploaded fine, servers can't attach addresses from the public network so we may have to do floating-ips, hoping someone there can clarify before we go ahead | 19:05 |
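If direct attachment to PUBLICNET really is disallowed now, the floating-ip route fungi mentions would look roughly like this; it is a sketch only, and the network, router, image, flavor, and server names are placeholders:

```
# Hedged sketch of the floating-ip alternative; all names are placeholders.
openstack network create private-net
openstack subnet create --network private-net --subnet-range 10.0.0.0/24 private-subnet
openstack router create router0
openstack router set --external-gateway PUBLICNET router0
openstack router add subnet router0 private-subnet
openstack server create --image ubuntu-noble --flavor gp.0.2.4 --network private-net mirror-test
openstack floating ip create PUBLICNET
openstack server add floating ip mirror-test <allocated-floating-ip>
```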
clarkb | got it thanks | 19:05 |
clarkb | have to wait for the hourly jobs to finish before grafana deploys | 19:06 |
opendevreview | Merged openstack/project-config master: Deprecate ironic-lib https://review.opendev.org/c/openstack/project-config/+/939282 | 19:16 |
clarkb | grafana deployed and it seems to still be working for me | 19:28 |
clarkb | I think I'm happy with it | 19:29 |
fungi | yeah, seems to be working fine still | 19:32 |
fungi | gitea upgrade change approved next | 19:32 |
clarkb | I've done a first pass update to the meeting agenda | 19:41 |
clarkb | let me know if there is anything I'm missing | 19:41 |
corvus | clarkb: fungi do we want definitive answers/plans to ianw's comments on 942307 before proceeding with that? | 19:58 |
clarkb | corvus: I did move the jobs inside project.yaml to address the two comments | 20:09 |
clarkb | and for the third comment I see that more as a future state to aim towards than a necessary update for today. I also don't think the current change is at odds with that plan | 20:09 |
clarkb | basically this change addresses the immediate problem, then in the future we'll continue to update system-config in this new way but stop updating it everywhere else with the proposed plan, which works for me | 20:10 |
clarkb | I rebased ianw's change that starts to sketch that out onto 942307 too to kinda show that was the direction I thought we should take. But let me know if you think we need to codify that more | 20:12 |
corvus | was mostly thinking that maybe throwing some of that into a comment to help keep track of the discussion would be good. like, i don't know when/if we're planning on doing the paused job thing | 20:55 |
clarkb | I don't have a timeline myself as my primary concern right now is being able to replace old servers with new ones. But ya I can follow up with a comment that indicates that this seems like a good plan for how to parallelize our infra-prod job runs | 20:58 |
clarkb | and posted | 21:01 |
corvus | that's great, thanks. i think that will help us (me at least) keep track of the spinning plates | 21:04 |
clarkb | the gitea upgrade is about to land | 21:06 |
opendevreview | Merged opendev/system-config master: Update to Gitea 1.23.4 https://review.opendev.org/c/opendev/system-config/+/942477 | 21:06 |
opendevreview | Merged opendev/system-config master: Drop tracing01 from the inventory https://review.opendev.org/c/opendev/system-config/+/942617 | 21:06 |
fungi | deploy ended up behind the hourlies though | 21:10 |
clarkb | the deploy has started now | 21:23 |
clarkb | hasn't gotten to upgrading anything yet though | 21:23 |
clarkb | https://gitea09.opendev.org:3081/opendev/system-config has updated and lgtm. I'm testing a clone next | 21:25 |
fungi | yeah, seems to be upgraded and still browseable and i can clone from it | 21:26 |
clarkb | git clone seems happy | 21:27 |
clarkb | and all 6 have upgraded now | 21:32 |
fungi | and the deploy buildset reported success | 21:35 |
fungi | that inventory change is going to run a lot of deploy jobs. we're up to 34, looks like? | 21:36 |
clarkb | ya that's the motivation for parallelizing them; ianw was working on that but it got back-burnered | 21:37 |
clarkb | definitely a good thing to pick back up again | 21:37 |
clarkb | corvus: before I remove tracing01 from DNS (and then subsequently delete it) you had indicated previously you don't think the data needed to migrate over to tracing02. I assume that means you are also comfortable with it being deleted without preservation? | 21:51 |
corvus | clarkb: affirmative | 21:51 |
fungi | i guess we can go ahead with the dns removal for tracing01? i doubt we care to see the dependency's deploy jobs succeed first | 22:07 |
clarkb | fungi: ya I think it should be good to go for the dns removal | 22:16 |
fungi | in that case, fire in the hole | 22:27 |
opendevreview | Merged opendev/zone-opendev.org master: Remove tracing01 from DNS https://review.opendev.org/c/opendev/zone-opendev.org/+/942618 | 22:34 |
fungi | the deploy on that is going to be waiting behind the inventory deploy and the hourlies | 22:36 |
opendevreview | Clark Boylan proposed opendev/system-config master: Run gitea with memcached cache adapter https://review.opendev.org/c/opendev/system-config/+/942650 | 23:07 |
clarkb | I've been looking at gitea caching options and ended up with ^. The other options are redis (no longer open source right?) and twoqueue which is a size limited cache implemented within the Go runtime (it's compiled in) | 23:08 |
clarkb | I chose memcached because it is open source and not part of the go runtime (whose gc we think is part of the problem). If we prefer to start with twoqueue because it means one less container image and one fewer service to run we can start there instead | 23:08 |
clarkb | But I figured push something up and see what people have to say about it | 23:08 |
clarkb | mnaser: ^ fyi this is coming out of the debugging for your slow request problems | 23:08 |
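For context, Gitea's cache settings live in the [cache] section of app.ini; the snippet below is a sketch of what the memcached adapter (and the twoqueue alternative clarkb mentions) looks like per Gitea's documentation, with a placeholder host/port and sizing rather than whatever 942650 actually proposes:

```
; memcached adapter: an external cache outside the Go runtime's gc
[cache]
ADAPTER = memcache
HOST = 127.0.0.1:11211

; alternative: the in-process twoqueue adapter, size-limited via HOST
;[cache]
;ADAPTER = twoqueue
;HOST = {"size": 50000, "batchsize": 64}
```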
corvus | neat: gerrit's issue tracker is returning no comment content for me | 23:21 |
corvus | eg https://issues.gerritcodereview.com/issues/40012594 | 23:21 |
corvus | earlier today there was a description of the problem | 23:21 |
corvus | so either my interwebs are broken, or hosting dev tools is hard | 23:22 |
corvus | possibly both i guess | 23:22 |
clarkb | looks like if you login you see them | 23:22 |
clarkb | I wonder if they made that change because of all the spam they have been having | 23:22 |
clarkb | I can confirm the behavior in a tab group that isn't logged in but see comments when in a tab group that is logged in | 23:22 |
corvus | maybe, or maybe it's an anti-ai-scraping thing | 23:23 |
clarkb | related: it bugs me that you can't log out of anything you're logged into with google without logging entirely out of google. This also applies when you have multiple google accounts logged in at the same time. You can't log out of one or the other, you have to do both | 23:29 |
corvus | i live by firefox containers | 23:30 |
clarkb | ya I've been using them more and more. But usually there is a single google container to try and isolate google from everything else | 23:31 |
clarkb | but maybe I should cave and have multiple google containers | 23:31 |
clarkb | did another pass on the meeting agenda. I'll get that out at ~00:30 UTC. Let me know if anything needs updating | 23:52 |
fungi | i have a separate firefox container per google account, since i need separate google accounts for different nonprofits i work with. so annoying | 23:59 |