clarkb | Just about meeting time | 18:59 |
---|---|---|
clarkb | #startmeeting infra | 19:00 |
opendevmeet | Meeting started Tue Nov 26 19:00:26 2024 UTC and is due to finish in 60 minutes. The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot. | 19:00 |
opendevmeet | Useful Commands: #action #agreed #help #info #idea #link #topic #startvote. | 19:00 |
opendevmeet | The meeting name has been set to 'infra' | 19:00 |
clarkb | #link https://lists.opendev.org/archives/list/service-discuss@lists.opendev.org/thread/HVQKZ6YZUHSL6JQSADEIHVNK4PTCSM2E/ Our Agenda | 19:00 |
clarkb | #topic Announcements | 19:00 |
clarkb | Reminder that this week is a major US holiday. I will be busy with family stuff the next few days and I expect that others will be in a similar boat | 19:01 |
fungi | yes, same | 19:01 |
* frickler will be around | 19:02 |
clarkb | #topic Zuul-launcher image builds | 19:02 |
clarkb | corvus continues to make progress on this item | 19:03 |
corvus | yeah we're all set to test raw image upload but i have not done so yet | 19:03 |
clarkb | last I saw there were promising results using zlib to compress raw images for their shuffling between locations. Gets the images down to a size similar to qcow2 and then timing after the compression is similar | 19:03 |
corvus | clarkb: zstd actually | 19:04 |
clarkb | oh oops my bad | 19:04 |
corvus | i benchmarked a bunch and zstd hit the sweet spot for time/space | 19:04 |
clarkb | but the important bit is that we can do a relatively quick compression that also gets us reasonable file sizes | 19:04 |
clarkb | and a reminder that adding new image builds for things other than debian bullseye is still helpful | 19:04 |
corvus | yep, so i'm expecting something like 20m for every image format we deal with | 19:05 |
clarkb | corvus: I suppose vhd may not compress similarly to raw, but that seems unlikely (iirc our vhd images are very similar to raw) | 19:06 |
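(For context, a minimal sketch of the zstd approach corvus describes, assuming the `zstandard` Python package; the paths and compression level here are illustrative, not the actual zuul-launcher settings:)

```python
# Hypothetical sketch: stream-compress a raw image with zstd so the artifact
# shuffled between builder and clouds is roughly qcow2-sized. Paths and level
# are made up; threads=-1 lets zstd use all available cores.
import zstandard

def compress_image(src_path: str, dst_path: str, level: int = 3) -> None:
    cctx = zstandard.ZstdCompressor(level=level, threads=-1)
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        cctx.copy_stream(src, dst)

compress_image("debian-bullseye.raw", "debian-bullseye.raw.zst")
```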
clarkb | anything else on this topic? | 19:07 |
corvus | not from me | 19:07 |
clarkb | #topic Backup Server Pruning | 19:07 |
clarkb | the changes to automate the retirement and purging of backup users from the backup servers have landed | 19:07 |
clarkb | we also landed a change to retire and purge the borg-ethercalc02 user/backups from the vexxhost server to catch my manual work there up with the new automation | 19:08 |
clarkb | best I can tell this all went fine | 19:08 |
clarkb | #link https://review.opendev.org/c/opendev/system-config/+/936203 retire these old unneeded backups in prep for eventual purging | 19:08 |
clarkb | this change is the next step in retiring the unnecessary backups that I identified previously | 19:08 |
clarkb | that should mark each of them retired, then we should do a manual prune step which will reduce the total backups for each retired server to a single backup (rather than the weekly, monthly, etc. set we keep) | 19:09 |
clarkb | we should do that to ensure the scripts work as expected before continuing to the next step. I think for future retirements this would happen somewhat naturally as we'd retire the server/service when it gets replaced or removed, then a few months later we would prune due to normal disk utilization | 19:10 |
clarkb | then at some point later we would purge them. In this case I'm hoping we can speed that up since we're catching up with the new system against some very old backups | 19:10 |
clarkb | so basically land the retirements, manually run the prune and ensure the script is happy, then land a change to purge the backups | 19:10 |
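(A rough sketch of what that manual prune step could look like, assuming one borg repo per backup user; the repo path layout and user list are illustrative, not the production script:)

```python
# Illustrative only: trim each retired user's borg repo down to a single
# archive with `borg prune --keep-last 1`. Layout and names are assumptions.
import subprocess

RETIRED_USERS = ["borg-ethercalc02"]  # example user from earlier in the meeting

for user in RETIRED_USERS:
    repo = f"/opt/backups/{user}/backup"  # assumed repo layout
    subprocess.run(
        ["borg", "prune", "--keep-last", "1", "--stats", repo],
        check=True,
    )
```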
clarkb | all that to say I think 936203 should be relatively safe at this point, as the next major step is manual pruning and by definition we're doing that when we can monitor it | 19:11 |
clarkb | thank you ianw for the good brainstorming and implementation of this | 19:12 |
clarkb | sometimes it is really helpful to have an outside perspective when you're just focused on fixing the immediate issue (high disk utilization) | 19:12 |
clarkb | #topic Upgrading old servers | 19:12 |
clarkb | tonyb: any updates here? fungi had to manually add some robots.txt handling and user agent filtering to the existing wiki to prevent the server from being overwhelmed | 19:13 |
clarkb | those may need to be resynced into the changes for deploying a new wiki | 19:13 |
fungi | i also did another quick check of user agents hitting wiki today and suspect the top offender(s) is/are bots with faked random user agent strings | 19:14 |
tonyb | sorry, no updates from me. | 19:14 |
corvus | did "brief downtime" help with reading robots.txt? i missed the end of that story | 19:15 |
corvus | re-reading robots.txt after update | 19:15 |
fungi | there are 6 distinct agents making up >99% of the request volume and all claiming to be smart phones | 19:15 |
fungi | corvus: it didn't seem to slow anything down, no | 19:15 |
clarkb | even the new meta agent doesn't seem to respect crawl delay | 19:15 |
fungi | load average there is nearly 100 at the moment, after telling a couple of llm training crawlers who did identify themselves to go away | 19:16 |
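(The sort of quick check fungi describes can be approximated with a few lines of Python over the Apache access log; the log path and the assumption that the user agent is the last quoted field of a combined-format line are mine, not the actual procedure used:)

```python
# Rough sketch: tally user agents in a combined-format Apache access log to
# find the handful of agents generating most of the request volume.
from collections import Counter
import re

agent_re = re.compile(r'"([^"]*)"$')  # UA is the last quoted field
counts = Counter()
with open("/var/log/apache2/access.log") as log:
    for line in log:
        m = agent_re.search(line.rstrip())
        if m:
            counts[m.group(1)] += 1

for agent, hits in counts.most_common(6):
    print(f"{hits:8d}  {agent}")
```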
clarkb | it's unfortunate that the internet has basically become a race to download as much info as possible. I seem to recall a futurama episode with this plot | 19:16 |
corvus | :( | 19:16 |
corvus | those who don't watch futurama reruns are doomed to repeat them or something i think | 19:17 |
clarkb | anyway if there are not other updates we can continue on | 19:18 |
clarkb | #topic Docker Hub Rate Limits | 19:18 |
clarkb | we managed to disable proxy caching for docker hub and my anecdotal perception is that this has helped without making the problem go away | 19:19 |
clarkb | I don't really have any other input at this point other than to say reducing any use of docker hub hosted images will only improve the situation | 19:20 |
clarkb | so if you've got easy places to address that (for example we fixed where zuul-registry is fetched for buildset registry jobs) that would be good | 19:20 |
corvus | if anyone wants to pitch in on my mirror change, feel free | 19:20 |
clarkb | oh and there is the rehosting toolset in zuul-jobs that corvus is working on | 19:20 |
* clarkb finds a link | 19:21 |
corvus | it's a simple change with complex testing :) | 19:21 |
clarkb | #link https://review.opendev.org/c/zuul/zuul-jobs/+/935574 | 19:21 |
corvus | that's the one | 19:21 |
corvus | i think it's almost there | 19:21 |
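(For illustration, rehosting an image off Docker Hub can be as simple as a skopeo copy; this is a sketch, not the zuul-jobs implementation linked above, and the mirror registry name is made up:)

```python
# Sketch: copy an image from docker.io to another registry so jobs can pull
# from the mirror instead. skopeo is a real tool; the target is invented.
import subprocess

def rehost(image: str, tag: str, mirror: str = "quay.io/example-mirror") -> None:
    subprocess.run(
        ["skopeo", "copy",
         f"docker://docker.io/{image}:{tag}",
         f"docker://{mirror}/{image}:{tag}"],
        check=True,
    )

rehost("library/python", "3.12-slim")
```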
clarkb | #topic Docker compose plugin with podman service for servers | 19:24 |
clarkb | just to follow up from last week I wanted to record that we're going to try and modify our docker and docker compose installation roles to behave differently on noble and newer so that we can somewhat transparently transition to things as they are deployed on newer hosts | 19:24 |
clarkb | this will aid our transition to new docker compose and podman by making the transition point a redeployment from older ubuntu to noble or newer | 19:26 |
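(The branching being described boils down to something like the following; the helper and exact version handling are hypothetical condensations, the real logic lives in the Ansible roles:)

```python
# Hypothetical condensation of the role logic: Noble (24.04) and newer get
# podman with the docker compose plugin; older hosts keep the status quo.
def container_runtime(ubuntu_version: tuple[int, int]) -> str:
    if ubuntu_version >= (24, 4):
        return "podman + docker compose plugin"
    return "docker engine + docker-compose"

assert container_runtime((24, 4)) == "podman + docker compose plugin"
assert container_runtime((22, 4)) == "docker engine + docker-compose"
```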
clarkb | but other than that I think it's mostly just a matter of getting that work done at this point | 19:26 |
clarkb | so I may drop this from next week's meeting | 19:26 |
clarkb | any questions/concerns/thoughts before we move on? | 19:26 |
clarkb | #topic Gerrit 3.10 Upgrade Planning | 19:28 |
clarkb | Gerrit just released new bugfix releases for 3.9 and 3.10 | 19:28 |
clarkb | my plan is to get new images built and test nodes held today (I hope) so that I can test these new images with some manual upgrade and downgrade testing early next week | 19:29 |
clarkb | assuming that all comes together I think we are still on track for upgrading on December 6 | 19:29 |
clarkb | I want to say they did a release last year during thanksgiving week too... | 19:29 |
clarkb | #link https://etherpad.opendev.org/p/gerrit-upgrade-3.10 Gerrit upgrade planning document | 19:31 |
clarkb | review of that is very much welcome, particularly as we're less than 2 weeks away | 19:31 |
clarkb | but otherwise I think the ball is back in my court to catch up to these new gerrit releases | 19:31 |
clarkb | #topic Renaming mailman's openinfra.dev vhost to openinfra.org | 19:31 |
clarkb | #link https://etherpad.opendev.org/p/lists-openinfra-org-migration migration plan document | 19:31 |
clarkb | fungi is planning to do this work on December 2 (that's Monday, I think) | 19:31 |
fungi | yep | 19:32 |
fungi | currently hacking away on it | 19:32 |
fungi | hoping to have the ansible change up for review later today | 19:33 |
clarkb | anything else we should know or can do to help? | 19:34 |
fungi | (have everything done except the apache redirects, i think) | 19:34 |
fungi | i've got a held job node i'm importing a production data set into with exim disabled, and will step through the database queries once i finish nailing them down | 19:34 |
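(Purely illustrative of the kind of rename queries being tested; the table and column names below are guesses at the Mailman 3 schema, and an in-memory sqlite db stands in for the production backend. The reviewed queries live in the etherpad linked above:)

```python
# Guesswork sketch, not the reviewed production queries: rewrite the mail
# host on list rows from the old vhost to the new one.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mailinglist (list_id TEXT, mail_host TEXT)")
conn.execute("INSERT INTO mailinglist VALUES "
             "('foundation.lists.openinfra.dev', 'lists.openinfra.dev')")
conn.execute("UPDATE mailinglist SET mail_host = REPLACE(mail_host, "
             "'lists.openinfra.dev', 'lists.openinfra.org')")
print(conn.execute("SELECT * FROM mailinglist").fetchall())
```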
fungi | i'll also send out a note to service-announce after today's meeting about downtime, if there are no objections | 19:35 |
fungi | planning for 15:00-17:00 but should hopefully go faster | 19:36 |
clarkb | wfm | 19:37 |
clarkb | #topic Upgrading Gitea to 1.22.4 | 19:39 |
clarkb | #link https://review.opendev.org/c/opendev/system-config/+/936198 Upgrade gitea to 1.22.4 | 19:39 |
clarkb | Gitea made a new 1.22 release yesterday. Unfortunately I don't think they expect this to address the OOM issue with 1.22 | 19:40 |
clarkb | however there are a number of other bugfixes and performance improvements so this seems worthwhile? | 19:40 |
clarkb | not sure if we want to send it in now or delay until next week | 19:40 |
clarkb | testing seems clean and none of the templates we override were updated | 19:40 |
clarkb | #topic Open Discussion | 19:43 |
clarkb | As mentioned Gerrit just did releases which I'll start poking at after lunch | 19:43 |
clarkb | also I'm trying to land the stack that ends at https://review.opendev.org/c/opendev/system-config/+/936297 once we can confirm the captchas are rendering better with the screenshot | 19:44 |
clarkb | trying to encourage people who fix bugs like that for us | 19:44 |
clarkb | In the process of working on ^ I noticed that we may have overdeleted an insecure ci registry hosted blob as part of the original pruning process | 19:44 |
clarkb | I don't understand that yet but will look at it. If it is a timing issue then I suspect our daily prunes will be less susceptible as they take less than two hours but the initial prune took ~7 days | 19:45 |
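(A toy model of the timing hypothesis, not zuul-registry's actual logic: if prunability is judged against a fixed grace window, a pass that runs for days leaves far more room for a blob uploaded mid-run to be swept up than a two-hour pass does:)

```python
# Toy model of the suspected race: an unreferenced blob older than the grace
# window looks prunable, even if a manifest uploaded while the long prune
# pass is still running will reference it before deletion happens.
GRACE_SECONDS = 2 * 3600  # invented value, not zuul-registry's setting

def looks_prunable(blob_age_seconds: float, referenced: bool) -> bool:
    return not referenced and blob_age_seconds > GRACE_SECONDS

print(looks_prunable(3 * 3600, referenced=False))  # True: at risk
```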
fungi | i've got the aforementioned lists.openinfra.org gerrit change up for review and linked in the maintenance planning etherpad, i'll set it wip for now though | 19:45 |
fungi | if anybody spots potential problems with it, let me know, but otherwise my focus is shifting to migration testing on the held node so i can flesh out the database queries in the pad | 19:47 |
clarkb | considering we haven't seen any other complaints about 404s from the insecure ci registry hosted blobs, my hunch is that this is more of a corner case than something very wrong, which is different from the other bugs we have already fixed in the process | 19:47 |
clarkb | fungi: will do thanks | 19:47 |
clarkb | I'll give it until 19:50 then call it if there is nothing else | 19:48 |
clarkb | ok thanks everyone! | 19:50 |
clarkb | enjoy the holiday and we'll be back next week | 19:50 |
clarkb | #endmeeting | 19:50 |
opendevmeet | Meeting ended Tue Nov 26 19:50:55 2024 UTC. Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4) | 19:50 |
opendevmeet | Minutes: https://meetings.opendev.org/meetings/infra/2024/infra.2024-11-26-19.00.html | 19:50 |
opendevmeet | Minutes (text): https://meetings.opendev.org/meetings/infra/2024/infra.2024-11-26-19.00.txt | 19:50 |
opendevmeet | Log: https://meetings.opendev.org/meetings/infra/2024/infra.2024-11-26-19.00.log.html | 19:50 |