Tuesday, 2024-11-26

clarkbJust about meeting time18:59
clarkb#startmeeting infra19:00
opendevmeetMeeting started Tue Nov 26 19:00:26 2024 UTC and is due to finish in 60 minutes.  The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot.19:00
opendevmeetUseful Commands: #action #agreed #help #info #idea #link #topic #startvote.19:00
opendevmeetThe meeting name has been set to 'infra'19:00
clarkb#link https://lists.opendev.org/archives/list/service-discuss@lists.opendev.org/thread/HVQKZ6YZUHSL6JQSADEIHVNK4PTCSM2E/ Our Agenda19:00
clarkb#topic Announcements19:00
clarkbReminder that this week is a major US holiday. I will be busy with family stuff the next few days and I expect that others will be in a similar boat19:01
fungiyes, same19:01
* frickler will be around19:02
clarkb#topic Zuul-launcher image builds19:02
clarkbcorvus continues to make progress on this item19:03
corvusyeah we're all set to test raw image upload but i have not done so yet19:03
clarkblast I saw there were promising results using zlib to compress raw images for their shuffling between locations. Gets the images down to a size similar to qcow2 and then timing after the compression is similar19:03
corvusclarkb: zstd actually19:04
clarkboh oops my bad19:04
corvusi benchmarked a bunch and zstd hit the sweet spot for time/space19:04
clarkbbut the important bit is that we can do a relatively quick compression that also gets us reasonable file sizes19:04
clarkband a reminder that adding new image builds for things other than debian bullseye is still helpful19:04
corvusyep, so i'm expecting something like 20m for every image format we deal with19:05
clarkbcorvus: I suppose vhd may not compress similarly to raw but that seems unlikely (iirc our vhd images are very similar to raw)19:06
clarkbanything else on this topic?19:07
corvusnot from me19:07
clarkb#topic Backup Server Pruning19:07
clarkbthe changes to automate the retirement and purging of backup users from the backup servers have landed19:07
clarkbwe also landed a change to retire and purge the borg-ethercalc02 user/backups from the vexxhost server to catch my manual work there up with the new automation19:08
clarkbbest I can tell this all went fine19:08
clarkb#link https://review.opendev.org/c/opendev/system-config/+/936203 retire these old unneeded backups in prep for eventual purging19:08
clarkbthis change is the next step in retiring the unnecessary backups that I identified previously19:08
clarkbthat should mark each of them retired then we should do a manual prune step which will reduce the total backups for each retired server to a single backup (rather than the weekly monthly etc set we keep)19:09
clarkbwe should do that to ensure the scripts work as expected before continuing to the next step. I think for future retirements this would happen somewhat naturally as we'd retire the server/service when it gets replaced or removed then a few months later we would prune due to normal disk utilization19:10
clarkbthen at some point later we would purge them. In this case I'm hoping we can speed that up since we're catching up with the new system against some very old backups19:10
clarkbso basically land the retirements, manually run the prune and ensure the script is happy, then land a change to purge the backups19:10
clarkball that to say I think 936203 should be relatively safe at this point as the next major step is manual pruning and by definition we're doing that when we can monitor it19:11
clarkbthank you ianw for the good brainstorming and implementation of this19:12
clarkbsometimes it is really helpful to have an outside perspective when you're just focused on fixing the immediate issue (high disk utilization)19:12
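The retire-then-prune step described above (a retired server is cut down to a single backup, while active servers keep their usual set) can be sketched roughly as follows. This is only an illustration under assumptions: the function name plan_prune, the host names, and the (host, date) tuple layout are hypothetical, and the actual scripts in system-config are not shown in this log.

```python
from collections import defaultdict
from datetime import date


def plan_prune(archives, retired_hosts):
    """Split archives into (keep, delete) lists.

    archives: iterable of (host, archive_date) tuples (hypothetical layout).
    retired_hosts: set of host names that have been marked retired.
    A retired host keeps only its newest archive. Archives of active
    hosts are all kept here; their normal weekly/monthly retention is
    not modeled in this sketch.
    """
    by_host = defaultdict(list)
    for host, when in archives:
        by_host[host].append(when)

    keep, delete = [], []
    for host, dates in by_host.items():
        dates.sort()
        if host in retired_hosts:
            # Newest archive survives, everything older is pruned.
            keep.append((host, dates[-1]))
            delete.extend((host, d) for d in dates[:-1])
        else:
            keep.extend((host, d) for d in dates)
    return keep, delete


if __name__ == "__main__":
    sample = [
        ("retired-host", date(2024, 1, 7)),
        ("retired-host", date(2024, 3, 3)),
        ("active-host", date(2024, 11, 24)),
    ]
    kept, pruned = plan_prune(sample, {"retired-host"})
    print("keep:", kept)
    print("delete:", pruned)
```

Purging, the later step mentioned in the discussion, would remove the remaining archive as well and is deliberately left out of the sketch.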
clarkb#topic Upgrading old servers19:12
clarkbtonyb: any updates here? fungi had to manually add some robots.txt handling and user agent filtering to the existing wiki to prevent the server from being overwhelmed19:13
clarkbthose may need to be resynced into the changes for deploying a new wiki19:13
fungii also did another quick check of user agents hitting wiki today and suspect the top offender(s) is/are bots with faked random user agent strings19:14
tonybsorry no updates from me.19:14
corvusdid "brief downtime" help with reading robots.txt?  i missed the end of that story19:15
corvusre-reading robots.txt after update19:15
fungithere are 6 distinct agents making up >99% of the request volume and all claiming to be smart phones19:15
fungicorvus: it didn't seem to slow anything down, no19:15
clarkbeven the new meta agent doesn't seem to respect crawl delay19:15
fungiload average there is nearly 100 at the moment, after telling a couple of llm training crawlers who did identify themselves to go away19:16
clarkbit's unfortunate that the internet has basically become a race to download as much info as possible. I seem to recall a futurama episode with this plot19:16
corvus:(19:16
corvusthose who don't watch futurama reruns are doomed to repeat them or something i think19:17
clarkbanyway if there are no other updates we can continue on19:18
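The kind of user agent check mentioned above (a handful of agents making up nearly all of the request volume) could be approximated with a small tally over access log lines. This is a sketch only: it assumes the server writes Apache's "combined" log format, where the user agent is the last quoted field, and the name agent_shares is hypothetical. It does not handle user agents that contain escaped quotes.

```python
import re
from collections import Counter

# In the "combined" log format the user agent is the final quoted field.
UA_RE = re.compile(r'"([^"]*)"\s*$')


def agent_shares(lines):
    """Return [(user_agent, count, share_of_total)] sorted by count, descending."""
    counts = Counter()
    for line in lines:
        match = UA_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    total = sum(counts.values())
    if total == 0:
        return []
    return [(ua, n, n / total) for ua, n in counts.most_common()]


if __name__ == "__main__":
    # Made-up sample lines, not real traffic from the wiki server.
    sample = [
        '203.0.113.5 - - [26/Nov/2024:19:14:00 +0000] "GET /a HTTP/1.1" 200 10 "-" "ExamplePhone/1.0"',
        '203.0.113.6 - - [26/Nov/2024:19:14:01 +0000] "GET /b HTTP/1.1" 200 10 "-" "ExamplePhone/1.0"',
        '203.0.113.7 - - [26/Nov/2024:19:14:02 +0000] "GET /c HTTP/1.1" 200 10 "-" "ExampleBot/2.0"',
    ]
    for ua, n, share in agent_shares(sample):
        print(f"{share:6.1%} {n:5d} {ua}")
```

Agents that fake random user agent strings would spread across many small buckets in a tally like this, so grouping by client address or request pattern may be needed as well.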
clarkb#topic Docker Hub Rate Limits19:18
clarkbwe managed to disable proxy caching for docker hub and my anecdotal perception is that this has helped without making the problem go away19:19
clarkbI don't really have any other input at this point other than to say reducing any use of docker hub hosted images will only improve the situation19:20
clarkbso if you've got easy places to address that (for example we fixed where zuul-registry is fetched for buildset registry jobs) that would be good19:20
corvusif anyone wants to pitch in on my mirror change, feel free19:20
clarkboh and there is the rehosting toolset in zuul-jobs that corvus is working on19:20
* clarkb finds a link19:21
corvusit's a simple change with complex testing :)19:21
clarkb#link https://review.opendev.org/c/zuul/zuul-jobs/+/93557419:21
corvusthat's the one19:21
corvusi think it's almost there19:21
clarkb#topic Docker compose plugin with podman service for servers19:24
clarkbjust to followup from last week I wanted to record that we're going to try and modify our docker and docker compose installation roles to behave differently on noble and newer so that we can somewhat transparently transition to things as they are deployed on newer hosts19:24
clarkbthis will aid our transition to new docker compose and podman by making the transition point a redeployment from older ubuntu to noble or newer19:26
clarkbbut other than that I think it's mostly just a matter of getting that work done at this point19:26
clarkbso may drop this off of next week's meeting19:26
clarkbany questions/concerns/thoughts before we move on?19:26
clarkb#topic Gerrit 3.10 Upgrade Planning19:28
clarkbGerrit just released new bugfix releases for 3.9 and 3.1019:28
clarkbmy plan is to get new images built and test nodes held today (I hope) so that I can test these new images with some manual upgrade and downgrade testing early next week19:29
clarkbassuming that all comes together I think we are still on track for upgrading on December 619:29
clarkbI want to say they did a release last year during thanksgiving week too...19:29
clarkb#link https://etherpad.opendev.org/p/gerrit-upgrade-3.10 Gerrit upgrade planning document19:31
clarkbreview of that is very much welcome particularly as we're less than 2 weeks away19:31
clarkbbut otherwise I think the ball is back in my court to catch up to these new gerrit releases19:31
clarkb#topic Renaming mailman's openinfra.dev vhost to openinfra.org19:31
clarkb#link https://etherpad.opendev.org/p/lists-openinfra-org-migration migration plan document19:31
clarkbfungi is planning to do this work on December 2 (that's Monday I think)19:31
fungiyep19:32
fungicurrently hacking away on it19:32
fungihoping to have the ansible change up for review later today19:33
clarkbanything else we should know or can do to help?19:34
fungi(have everything done except the apache redirects, i think)19:34
fungii've got a held job node i'm importing a production data set into with exim disabled, and will step through the database queries once i finish nailing them down19:34
fungii'll also send out a note to service-announce after today's meeting about downtime, if there are no objections19:35
fungiplanning for 15:00-17:00 but should hopefully go faster19:36
clarkbwfm19:37
clarkb#topic Upgrading Gitea to 1.22.419:39
clarkb#link https://review.opendev.org/c/opendev/system-config/+/936198 Upgrade gitea to 1.22.419:39
clarkbGitea made a new 1.22 release yesterday. Unfortunately I don't think they expect this to address the OOM issue with 1.2219:40
clarkbhowever there are a number of other bugfixes and performance improvements so this seems worthwhile?19:40
clarkbnot sure if we want to send it in now or delay for next week19:40
clarkbtesting seems clean and none of the templates we override were updated19:40
clarkb#topic Open Discussion19:43
clarkbAs mentioned Gerrit just did releases which I'll start poking at after lunch19:43
clarkbalso I'm trying to land the stack that ends at https://review.opendev.org/c/opendev/system-config/+/936297 once we can confirm the captchas are rendering better with the screenshot19:44
clarkbtrying to encourage people who fix bugs like that for us19:44
clarkbIn the process of working on ^ I noticed that we may have overdeleted an insecure ci registry hosted blob as part of the original pruning process19:44
clarkbI don't understand that yet but will look at it. If it is a timing issue then I suspect our daily prunes will be less susceptible as they take less than two hours but the initial prune took ~7 days19:45
fungii've got the aforementioned lists.openinfra.org gerrit change up for review and linked in the maintenance planning etherpad, i'll set it wip for now though19:45
fungiif anybody spots potential problems with it, let me know, but otherwise my focus is shifting to migration testing on the held node so i can flesh out the database queries in the pad19:47
clarkbconsidering we haven't seen any other complaints about 404s from the insecure ci registry hosted blobs my hunch is that this is more of a corner case than something very wrong which is different to the other bugs we have already fixed in the process19:47
clarkbfungi: will do thanks19:47
clarkbI'll give it until 19:50 then call it if there is nothing else19:48
clarkbok thanks everyone!19:50
clarkbenjoy the holiday and we'll be back next week19:50
clarkb#endmeeting19:50
opendevmeetMeeting ended Tue Nov 26 19:50:55 2024 UTC.  Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4)19:50
opendevmeetMinutes:        https://meetings.opendev.org/meetings/infra/2024/infra.2024-11-26-19.00.html19:50
opendevmeetMinutes (text): https://meetings.opendev.org/meetings/infra/2024/infra.2024-11-26-19.00.txt19:50
opendevmeetLog:            https://meetings.opendev.org/meetings/infra/2024/infra.2024-11-26-19.00.log.html19:50