Tuesday, 2024-11-12

clarkbJust about meeting time18:59
clarkb#startmeeting infra19:00
opendevmeetMeeting started Tue Nov 12 19:00:16 2024 UTC and is due to finish in 60 minutes.  The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot.19:00
opendevmeetUseful Commands: #action #agreed #help #info #idea #link #topic #startvote.19:00
opendevmeetThe meeting name has been set to 'infra'19:00
clarkb#link https://lists.opendev.org/archives/list/service-discuss@lists.opendev.org/thread/HQSCECQODT5XIHWR633MLLITCB3FG243/ Our Agenda19:00
clarkbsorry for getting the agenda out late this week. I was out yesterday so sent it first thing today19:00
clarkb#topic Announcements19:01
clarkbI didn't have anything19:01
clarkbI expect we'll have our regularly scheduled meeting next week and the week after. There is a slight possibility that the one the week after runs into Thanksgiving plans and gets skipped, but I have no plans that would cause that at this point19:02
clarkb#topic Zuul-launcher image builds19:02
clarkbcorvus: last week you indicated that we needed additional changes in zuul as well as needing testing for raw image upload/download timing19:02
clarkbany updates on those items?19:02
corvusnothing since last week19:03
corvusslowly working on the upstream zuul changes; not started on the opendev-specific changes19:03
clarkbthanks. So to recap on the opendev side we can add more image builds (in addition to bullseye) and test image builds using raw images instead of qcow2 to see what the timing of that looks like19:04
corvusyep -- though note that one of the needed upstream changes is making the zuul-launcher download faster; so raw upload timings would be useful, but download timings are not optimized yet.19:04
clarkbgot it19:05
clarkbanything else on this subject?19:05
corvusnot from me19:06
clarkb#topic Backup Server Pruning19:06
clarkbas mentioned last week ianw pushed up a change to make automated retirement and purging of backups from Ansible possible. It requires us to explicitly list things to retire and then purge, but is nice for record keeping in comparison to what I did manually19:06
clarkbthe underlying process of removing backups ends up being very similar though19:07
clarkb#link https://review.opendev.org/c/opendev/system-config/+/933700 Backup deletions managed through Ansible19:07
clarkbthis is the primary change to do that. I'm happy with it but did write a followup to fix a minor issue we're already hitting after my manual cleanups19:07
clarkb#link https://review.opendev.org/c/opendev/system-config/+/934768 Handle backup verification for purged backups19:07
clarkbmaybe we can get those reviewed and landed then continue with cleanups using this system? I suspect I'll have to manually bring ethercalc02 into the same state on the other backup server but that isn't a big deal19:08
clarkbif we're happy with that I'll update my documentation change as well to refer to this system19:09
clarkbor just abandon it if it doesn't serve a purpose19:09
fungiworth noting, it seems like the "backup inconsistency" warnings we keep receiving for ethercalc are sent weekly19:10
clarkbfungi: yup that second change should address that19:10
fungiso it wasn't just a one-time event19:10
fungicool19:10
clarkb(I have to touch .retired in the ethercalc02 dir to make that work but then it should be handled)19:10
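A minimal sketch of the marker-file idea, assuming (hypothetically) that the verification job walks per-host directories under a backup root and skips any directory containing a `.retired` file. Only the `.retired` name and `ethercalc02` come from the discussion; the directory layout, `mark_retired`, `hosts_to_verify`, and `example01` are illustrative and are not the actual system-config change.

```python
import tempfile
from pathlib import Path
from typing import List

# Marker file named in the meeting; everything else here is hypothetical.
RETIRED_MARKER = ".retired"


def mark_retired(host_dir: Path) -> None:
    """Equivalent of `touch <host_dir>/.retired`."""
    (host_dir / RETIRED_MARKER).touch()


def hosts_to_verify(backup_root: Path) -> List[str]:
    """List per-host backup directories that still need consistency checks.

    Directories carrying the retired marker are skipped, so purged
    backups stop producing weekly inconsistency warnings.
    """
    return sorted(
        d.name
        for d in backup_root.iterdir()
        if d.is_dir() and not (d / RETIRED_MARKER).exists()
    )


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        for name in ("ethercalc02", "example01"):  # example01 is made up
            (root / name).mkdir()
        mark_retired(root / "ethercalc02")
        print(hosts_to_verify(root))  # ['example01']
```

Using a marker file rather than deleting the directory keeps a visible record that the host was retired on purpose, which matches the record-keeping motivation for managing this through Ansible.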
clarkb#topic Upgrading old servers19:11
clarkbI don't think there is anything new on this topic. But I'll leave it open for a minute or two for others to jump in with updates if we have them19:12
clarkb#topic Docker compose plugin with podman service for servers19:14
clarkbsimilar situation with this topic. I'm unaware of any updates but will leave it open for a couple minutes if there are any19:14
clarkb#topic Enabling mailman3 bounce processing19:16
clarkbthere are updates on this topic.19:16
clarkbAs promised I configured service-discuss to enable bounce processing. I didn't change any of the defaults as they seemed like a reasonable place to start.19:16
clarkbThis list is very low traffic so nothing really happened until I sent the meeting agenda email earlier today. That resulted in two list members having non zero scores19:17
clarkbThis is a good indication that bounce processing is unlikely to take immediate drastic action, but also that a more active list like openstack-discuss might more quickly trend towards removing people19:17
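Why list activity matters can be illustrated with a deliberately simplified model of bounce scoring. It assumes a member bounces every post, that the score goes up at most once per day, that it resets when the previous bounce is older than a staleness window, and that the member is disabled at a threshold. The threshold of 5 and the 7-day window are assumed to match Mailman 3's list defaults and should be checked against the running version; `days_until_disabled` is a hypothetical helper, not Mailman code.

```python
from datetime import date, timedelta
from typing import Optional

# Assumed defaults modelled on Mailman 3 list settings; verify before relying on them.
BOUNCE_SCORE_THRESHOLD = 5
BOUNCE_INFO_STALE_AFTER = timedelta(days=7)


def days_until_disabled(post_interval_days: int, max_posts: int = 365) -> Optional[int]:
    """Simplified model: a member bounces every post sent to the list.

    Returns the number of days from the first bounce until the score hits
    the threshold, or None if it never does within max_posts posts.
    """
    start = date(2024, 1, 1)
    score = 0
    last_bounce = None
    for i in range(max_posts):
        today = start + timedelta(days=i * post_interval_days)
        if last_bounce is None or today - last_bounce > BOUNCE_INFO_STALE_AFTER:
            score = 1  # first bounce, or stale bounce info: start counting again
        elif today != last_bounce:
            score += 1  # at most one increment per day
        last_bounce = today
        if score >= BOUNCE_SCORE_THRESHOLD:
            return (today - start).days
    return None


if __name__ == "__main__":
    for interval in (1, 7, 14):
        print(interval, days_until_disabled(interval))
```

Under these assumptions a daily-traffic list disables a consistently bouncing address after 4 days, a weekly list after 28, and a list that posts every two weeks never gets there because the bounce info goes stale between posts.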
clarkbin any case I think I'm comfortable with proceeding with enabling this on more (all?) lists if others are19:18
frickler+119:18
clarkbfungi: you moderate many lists, any preference on approach here? Should we try to do it on a list by list basis via moderator action or something else?19:18
fungii think it's probably fine to roll out to more lists at this point19:20
clarkbfungi: did you want to pick some you moderate and do that? I can apply it to the other opendev lists19:21
clarkbcorvus: I guess you might consider doing it for the zuul lists too19:21
fungisure, i can19:21
corvusdo we need to manually do it for all lists?  is there a default for new lists?19:21
clarkbcorvus: we currently don't configure it when creating new lists, but in theory we can update list creation to enable it on them. But ya we don't really manage list settings post creation19:22
fungiit's possible the default for new lists is already on, and this is migrated lists we're changing? i'd need to look19:22
clarkboh ya that could be too.19:22
corvusack... my view is that this should be enabled for all existing and new lists without further delay (but, also, without urgency).19:23
clarkbbut ya we should enable it by default on new lists as part of this process. I'm less sure if there is value in automating enablement in existing lists19:23
clarkbcorvus: ack thanks19:23
corvusso whatever the best method to achieve that is... :)19:23
fungii'll take a look at the settings for the new list added by change 924432 in july19:24
fungibut it'll take me a few minutes since i need to use the superuser on it19:24
fungi"Process Bounces" is "no" there19:25
clarkbok so default is off on list creation but it should be possible to switch that to on somehow19:26
fungiso i guess we have it off by default for new lists, i'll see if we have that set in config19:26
clarkband then we leave existing lists alone so we'll need to either manually toggle them or write some tool to go update each of them separately19:26
clarkbcorvus: more generally I think it should be safe for list moderators to toggle the setting in the lists they manage19:26
clarkband if we do automate it for everything we should noop on those19:26
corvusack; i'll manually flip the bit for zuul lists19:27
corvuserm, what's the option name? :)19:28
clarkbcorvus: you go to settings then bounce processing then flip it to enabled19:29
corvushttps://imgur.com/FwNWrOI19:29
corvusthat one?19:29
clarkbyes19:29
corvusok; that was already set for zuul lists19:29
clarkboh more data backing up that this should be fine19:30
clarkbanyway we can move on. I think we've got rough agreement to go ahead and do this; we can update new list creation then sort out existing lists as we go19:30
clarkb#topic Intermediate Insecure CI Registry Pruning19:30
clarkbThe intermediate registry / insecure ci registry seems to be more stable now after updating its installation19:31
clarkbafter discussing needed pruning last week we did some code review and found a bugfix as well as added a dry run option19:31
clarkbthe plan is still to do a proper pruning on Friday as announced but I'm curious if we want to do a dry run first say tomorrow?19:31
clarkbor do we think it is safest to do the dry run in the announced window which is ~Friday19:32
corvusi think dry-run now sounds good19:32
clarkbok I'll work on figuring that out probably first thing tomorrow. (I only took one day off yesterday but I returned to a fairly big backlog I'm digging through today)19:33
corvusi mean, worst case if i completely botched it is that we accidentally delete some temporary registry things and there's a small blip in jobs which can be corrected by rechecking.19:33
fricklerthis might also be affected by the rax identity issues, though?19:33
clarkbfrickler: oh yes good point19:33
clarkbso waiting for that to resolve is a good idea too19:33
clarkbhopefully by tomorrow morning that will be happy and I'll start a dry run in screen on the registry node19:34
clarkbI am hopeful that the bugfix means this will just work (tm)19:34
corvusyep, it's a plausible explanation and i think we can reset to "assume it works and debug what doesn't"19:35
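The value of the dry run option discussed above is that it exercises the same selection logic as a real prune while deleting nothing. A generic sketch of that pattern follows; `Blob`, `prune`, the age-based cutoff, and the 180-day figure in the example are all hypothetical and do not reflect how the registry's actual pruning code decides what to remove.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Callable, List


@dataclass
class Blob:
    digest: str
    last_modified: datetime


def prune(blobs: List[Blob], max_age: timedelta, now: datetime,
          delete: Callable[[str], None], dry_run: bool = True) -> List[str]:
    """Select blobs older than max_age; only call delete() when dry_run is False.

    Defaulting to dry_run=True means forgetting the flag is harmless.
    """
    cutoff = now - max_age
    victims = [b.digest for b in blobs if b.last_modified < cutoff]
    for digest in victims:
        if dry_run:
            print(f"would delete {digest}")
        else:
            delete(digest)
    return victims


if __name__ == "__main__":
    now = datetime(2024, 11, 13, tzinfo=timezone.utc)
    blobs = [Blob("sha256:old", now - timedelta(days=200)),
             Blob("sha256:new", now - timedelta(days=1))]
    prune(blobs, timedelta(days=180), now, delete=print)  # prints "would delete sha256:old"
```

Because the dry run and the real run share one code path, a clean dry run on the registry node is good evidence that the real prune will select the same objects.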
clarkb#topic Gerrit 3.10 Upgrade Planning19:35
clarkb#link https://etherpad.opendev.org/p/gerrit-upgrade-3.10 Gerrit upgrade planning document19:35
clarkbI would like to announce this upgrade soon if possible. Does the currently pencilled in time of Friday December 6 starting at 1600 UTC not work for some reason?19:36
fungiwfm19:37
clarkbya I'm not hearing any concerns I'll work on sending that email out later today unless something comes up before then19:37
clarkbOther than announcing the upgrade, the other current todo is simply going through the etherpad and ensuring we're comfortable with the changes/updates and any mitigations we might have19:38
clarkbso please look over the etherpad, add notes or concerns if you have them and I'll try to regularly review it and address them19:38
clarkbotherwise this seems like we're on track for upgrading on the 6th19:39
clarkb#topic RTD Build Trigger Requests19:40
clarkbafter more debugging ianw is of the opinion that we're getting hit by client fingerprinting in the CDN layer19:40
clarkbone thought is that simply using a different client (like curl) may sidestep the issue19:41
clarkb#link https://review.opendev.org/c/zuul/zuul-jobs/+/934243 switch to curl instead of ansible uri module19:41
ianwyeah, they linked to a post about their anti-bot things19:41
ianwhttps://about.readthedocs.com/blog/2024/07/ai-crawlers-abuse/19:42
clarkb#link https://about.readthedocs.com/blog/2024/07/ai-crawlers-abuse/19:42
clarkbI'm actually not sure if curl is in our executor images19:42
clarkbwe should check that before approving I guess but otherwise that seems like a reasonable workaround to me19:43
fungiit's present on the executors themselves at least (i checked)19:43
clarkbthis would run within the container I think19:43
ianwi think that job might start a node but not use it?19:43
fungiah, i'm always confused as to whether ansible is running things from inside the zuul-executor container or the distro19:43
fungiianw: yeah, that came up separately, for some reason it has a default nodeset instead of an empty one19:44
clarkbhttps://opendev.org/openstack/project-config/src/branch/master/playbooks/publish/trigger-rtd.yaml#L2 it runs on localhost in the job not sure if the job has a nodeset19:44
corvusdocker run --rm -it quay.io/zuul-ci/zuul-executor:latest bash19:44
corvusroot@8f99eae3145d:/# curl19:44
corvuscurl: try 'curl --help' or 'curl --manual' for more information19:44
clarkbhttps://opendev.org/openstack/project-config/src/branch/master/zuul.d/jobs.yaml#L1108-L112719:44
clarkbit does have a nodeset I think19:44
clarkbcorvus: ok cool so it should work then we can also use an empty nodeset in the job19:45
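A minimal sketch of what a curl-based trigger could send, assuming the generic webhook URL layout and `token` form field described in Read the Docs' documentation. This builds the command only; it is not the content of change 934243, and `rtd_trigger_argv`, `example-project`, and `12345` are placeholders. It would run on the executor (localhost) inside the zuul-executor container, where curl was confirmed to be present.

```python
import shlex
from typing import List


def rtd_trigger_argv(project: str, webhook_id: str, token: str) -> List[str]:
    """Build a curl command that POSTs a token to an RTD generic webhook.

    The URL layout is assumed from Read the Docs' generic webhook docs;
    project, webhook_id and token are placeholders supplied by the caller.
    """
    url = f"https://readthedocs.org/api/v2/webhook/{project}/{webhook_id}/"
    return [
        "curl",
        "--fail",        # non-2xx responses become a non-zero exit status
        "--silent",
        "--show-error",  # still print errors even though --silent is set
        "-X", "POST",
        "-d", f"token={token}",
        url,
    ]


if __name__ == "__main__":
    argv = rtd_trigger_argv("example-project", "12345", "not-a-real-token")
    print(shlex.join(argv))
    # To actually send it: subprocess.run(argv, check=True)
```

Passing the token as a command-line argument makes it visible in process listings on the executor; curl's `-d @filename` form reads the data from a file instead, which may be preferable for a secret.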
clarkb#topic promote-openstack-manuals-developer fails with Ansible errors19:46
clarkbThis is a different type of job error that occurs due to variables being undefined19:46
clarkb#link https://zuul.opendev.org/t/openstack/build/1a84db5d173c4777b9d730923721b04a Example failure19:46
clarkbpart of the problem here is that this promote job works for a special set of developer docs that aren't part of the main docs.openstack.org site so have special everything and in this case something isn't quite right19:47
clarkbI suspect that fixing this is going to require someone to page in all of the things that make this different and then apply that perspective to the jobs and add the missing bits?19:47
fricklernote that the last known successful run of that job was > 3y ago. and I assume the regression may have been triggered by a change in zuul almost that old19:48
fricklerhttps://opendev.org/zuul/zuul/commit/be50a6ca42c41c0608dd02930a01123afd4e606419:48
clarkbya at this point I doubt we should try digging into the deltas of what changed to break it; instead we just need to understand it properly, determine why it is broken and roll forward19:48
clarkbpersonally I've always argued that developer.openstack.org should've been docs.openstack.org/developer and use the same systems as exist for docs19:49
clarkbthis is I think evidence for why this is a good idea but also that ship sailed a long time ago and the best thing is to simply figure out what needs to be changed to make it work19:49
clarkbis anyone interested in debugging this further? I know fungi took a stab at it.19:51
clarkbalso I'm not sure there is anything opendev or zuul specific about it other than that is where the failure is originating19:51
clarkbit should be solveable by anyone reading the jobs and error message?19:51
fungii've unfortunately already paged out 99% of the context there. pulled in too many directions19:51
fricklerthe error comes from some data that iiuc is coming from a secret19:52
fungithe only reason i even started looking at it was because i noticed the site mentioned and linked to trystack.org which hasn't existed for years, and i wanted to get rid of that dead link19:52
fricklerso not easy to debug for an outsider IMO19:52
clarkbfrickler: outsiders have the same info that we do when it comes to secrets in zuul though?19:52
clarkblike I don't generally go off and decrypt things19:52
clarkb(in fact I'm not sure I ever have)19:53
fricklerit may be needed in this case, though19:53
fricklerbut anyway, I can look further into this, but with low priority19:53
clarkbyes it is possible this would be that case19:53
clarkbok thanks19:53
clarkb#topic Open Discussion19:53
clarkbAnything else?19:53
fricklersomeone mentioned connectivity issues to vexxhost IPv6 earlier today19:54
fungithe openinfra foundation gained control of the openinfra.org domain and is going to start working to relocate their various sites out of the google-controlled .dev tld19:54
fricklerI didn't look closer yet, but seems to be recurring issue of what we had earlier19:55
fungii'm putting together a plan to migrate lists.openinfra.dev to lists.openinfra.org19:55
clarkband jayf mentioned connectivity issues that I think were actually slowness (connections are logged in sshd log but things took longer than expected)19:55
fricklerI did confirm the "no route to host" from AS3320 (Deutsche Telekom, german incumbent ISP)19:55
fungii'm shooting for migrating that mailman site the first week of december, so probably sending an announcement to the foundation mailing list about it on monday. i'll circulate an etherpad with the planned steps in the coming days19:56
clarkbfungi: thanks for the heads up19:56
fricklerfungi: can we keep a redirect from the old site?19:56
fungithe change on the mailman side is pretty simple because it's all in mariadb, so just some update queries (ideally with the services temporarily offline)19:56
fungifrickler: yeah, i plan to keep the old urls and addresses working indefinitely19:57
frickleror do they want to drop the .dev domain?19:57
fungithey just want .org to be the official domain, but will retain control of the .dev one19:57
fricklercool +119:57
fungithere are no plans to drop it, so we can keep redirects and address aliases for ~ever19:57
fungii'll take care of the changes to add the redirects, aliases, config update, et cetera19:58
corvus#link https://review.opendev.org/c/openstack/project-config/+/934832 Fix openstack developer docs promote job [NEW]19:58
corvusi did that with no special knowledge19:58
corvusi just read the error message and looked at the secret def19:59
corvushope it works19:59
clarkband we are at time20:00
clarkbthank you everyone20:00
clarkbWe'll be back next week same time and location20:00
clarkb#endmeeting20:00
opendevmeetMeeting ended Tue Nov 12 20:00:18 2024 UTC.  Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4)20:00
opendevmeetMinutes:        https://meetings.opendev.org/meetings/infra/2024/infra.2024-11-12-19.00.html20:00
opendevmeetMinutes (text): https://meetings.opendev.org/meetings/infra/2024/infra.2024-11-12-19.00.txt20:00
opendevmeetLog:            https://meetings.opendev.org/meetings/infra/2024/infra.2024-11-12-19.00.log.html20:00
frickleroh, I was somehow assuming that all data in the secret would be encrypted and didn't check deeper yet, nice20:01
