Tuesday, 2025-01-07

clarkbJust a couple minutes until our first meeting of the year18:58
tonybThe excitement is palpable 18:59
* fungi palpates excitedly18:59
corvusthe pre-ping was helpful! :)18:59
fungi"helpful"18:59
* fungi wants to return to his winter slumber18:59
clarkb#startmeeting infra19:00
opendevmeetMeeting started Tue Jan  7 19:00:05 2025 UTC and is due to finish in 60 minutes.  The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot.19:00
opendevmeetUseful Commands: #action #agreed #help #info #idea #link #topic #startvote.19:00
opendevmeetThe meeting name has been set to 'infra'19:00
clarkb#link https://lists.opendev.org/archives/list/service-discuss@lists.opendev.org/thread/M5YEAMYDHKFBK7BJKHIQVFKGXJEW3KZF/ Our Agenda19:00
clarkb#topic Announcements19:00
clarkbWelcome to 2025.  Apologies for interrupting the winter slumber19:00
tonybI won't be around much next week19:00
fungi(or summer slumber in tonyb's case)19:01
clarkbtonyb: ack thanks for the heads up19:01
tonyb(Taking an actual vacation)19:01
clarkbI didn't really have anything to announce today. Anyone else with an announcement?19:01
clarkb#topic Zuul-launcher image builds19:02
clarkbcorvus: I feel like the holidays were just long enough that I've forgotten where we were on this19:02
clarkbI know there was the raw image handling and testing19:02
corvusmostly waiting on the #niz stack (which is in flaky-test fixing mode now) reviews welcome but not necessary at this time19:03
fungii wiped my brain clean like an etch-a-sketch, so don't feel bad19:03
corvusthat stack adds web ui and more image lifecycle stuff19:03
corvusso it'll be good to have that in place before more live testing19:03
clarkbcorvus: did the API stuff land (and presumably deployed through our weekly deployments)19:04
corvusnope that's interleaved in that stack19:04
corvusit's completely separate so i think it's actually okay to do a single-phase merge for that instead of our usual two-phase19:04
clarkbok /me scribbles a note to try and review that stack if paperwork gets done19:05
corvusmore urgent than that though is the dockerhub image stuff :)19:05
clarkbyup that has its own agenda item today19:05
corvusi don't relish the idea of trying to get all that merged before that :)19:05
clarkbanything else on this topic or should we continue on so that we can get to the container image mirroring?19:06
corvuscontinue i think19:06
clarkb#topic Deploying new Noble Servers19:06
clarkbmy podman prep change that updates install-docker to install podman and docker compose on Noble landed last week19:06
clarkbwe don't have any noble servers running containers yet so that was a noop (and I spot checked the deployments to ensure I didn't miss anything)19:07
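
For reference, the install-docker change discussed above amounts to branching on the Ubuntu release. A minimal sketch, with the package names and the version comparison assumed rather than taken from the actual role:

    # Sketch only: install podman plus the compose plugin on Noble instead of
    # docker-engine; package names here are assumptions, not the role's.
    - name: Install podman and docker compose on Ubuntu Noble
      ansible.builtin.package:
        name:
          - podman
          - docker-compose-v2
        state: present
      when:
        - ansible_facts['distribution'] == 'Ubuntu'
        - ansible_facts['distribution_version'] is version('24.04', '>=')
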
clarkbbut that means next step is now to deploy a new server that uses containers on Noble19:07
fungiwhat's a good candidate?19:07
clarkbmy hope is that I'll get beginning of the year paperwork stuff done tomorrowish and I can start on a new paste deployment late this week19:07
tonybI did test my ansible-devel changes which use install-docker on noble19:07
fungilodgeit/paste?19:07
clarkbfungi: my plan is paste since that is representative with a database but also small and simple19:07
fungior maybe mediawiki19:08
fungiyeah, paste feels right as a canary19:08
clarkbtonyb: have you seen any problems with it or is it working in CI for your ansible devel stuff?19:08
clarkbI guess that is the main callout on this topic today. The big change is in but I haven't put it to use in production yet. If you do put it to use and have feedback that is very much welcome19:09
tonybclarkb: No issues, though I don't actually *use* compose; it does install19:09
fungiis there some possibility of the ansible-devel job getting back into a passing state again? that's exciting19:09
clarkbya I think on bridge (which ansible-devel works on) it's minimal use of containers19:09
tonybfungi: if nothing else it sets somewhat of a timeline for dropping xenial and moving bridge to noble19:10
clarkbso ya let me know if you notice oddities, but I think we can continue on19:11
fungithat would be great19:11
tonybboth of which are needed before we can update the ansible version19:11
clarkb#topic Upgrading old servers19:11
clarkbthe discussion seems to be trending into this topic anyway19:11
clarkbya the issue with newer ansible is it won't run on remote hosts with older python19:11
clarkbhistorically ansible has maintained a wide array of python support for the remote side (the control side was more restricted)19:12
clarkbbut that has changed recently with a quite reduced set of supported python versions19:12
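
As an illustration of that remote-side constraint, a playbook can assert a minimum interpreter version up front. A minimal sketch, assuming 3.8 as the cutoff (the real minimum depends on the ansible-core release):

    # Sketch: fail fast on managed hosts whose Python is older than an
    # assumed minimum for newer ansible-core releases.
    - name: Check remote Python before doing real work
      hosts: all
      gather_facts: true
      tasks:
        - name: Abort on interpreters that newer ansible-core cannot manage
          ansible.builtin.fail:
            msg: "Remote Python {{ ansible_facts['python_version'] }} is too old"
          when: ansible_facts['python_version'] is version('3.8', '<')
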
clarkbanyway other than the noble work above is there anything else to be aware of for upgrading servers? I think we're going to end up needing to focus on this for the first half of the year19:13
clarkbbut I suspect that a lot of those jumps will be to noble so getting that working well upfront seems worthwhile19:13
tonybI wanted to say I started looking at mediawiki again19:13
tonybI'd like to send an announcement to service-announce this week19:14
tonyb#link https://etherpad.opendev.org/p/opendev-wiki-announce19:14
clarkbtonyb: are there new patches to review yet? (I haven't followed the irc notifications too closely the last few weeks)19:14
clarkbbut ya announcing that soonish seems good19:14
tonybNo new patches, but I addressed a bunch of feedback yesterday19:15
fricklerI've looked at some old content on the wiki recently, and I do wonder whether it would be better to start fresh19:16
clarkbok I'll try to catch back up on the announcement and the review stack today or tomorrow ish so that the end-of-week announcement schedule isn't held up by me (though I had looked at them previously and it's probably fine to proceed)19:16
fricklerand possibly just leave a read-only copy available somewhere19:16
JayFAs long as that read-only copy will exist as long as the new wiki will, that sounds like an excellent idea19:17
fungitonyb: exciting that you're so close! announcement looks fine other than shoehorning you into using jammy, you might end up wanting noble depending on the timeline19:17
tonybwhere were you two a year ago ;P19:17
clarkbI think I'm on the fence about starting over19:18
corvusi don't think we should start over19:18
clarkbanyone can simply start over on any content that is old as things stand, and that avoids needing to maintain two wikis19:18
corvusclarkb: ++19:18
clarkband it's not like starting over prevents things from becoming stale all over again. The fundamental problem is it needs curation and that doesn't change19:18
fungii don't see starting over as a solution, it doesn't solve the problem of the wiki containing old and abandoned content, merely resets the starting point for that19:18
JayFThat's true in the most general of senses, but the reality is we have years-old content in many places. That content is unlikely to *ever* be curated, so having it not carried over to confuse people would be nice.19:19
fungiit will continue to be a problem19:19
clarkbJayF: right but anyone can just go and archive that content today right?19:19
JayFIt's a lot easier to leave something behind than it is to delete something -- it's very hard to know when it's appropriate to remove it19:19
clarkbwe don't need to start over and host a special frozen wiki19:19
tonybfungi: I think I'd like to stick with Jammy and do the OS change once we're on the latest (mediawiki) LTS19:19
clarkb(mediawiki archives things when you delete them aiui so you can just delete them if you're 80% certain it should be deleted)19:20
fungibetter would be to actively delete any concerning old content, since starting from scratch doesn't prevent new content from ceasing to be cared for and ending up in the same state. or do we need semi-annual domain changes in order to force content refreshes?19:20
corvusif, let's say, the openstack project feels pretty strongly about reducing the confusion caused by outdated articles, one approach would be a one-time mass deletion/archiving of those articles.19:21
corvusor, something more like what wikipedia does: mass-additions of "this may be outdated" banners to articles.19:21
JayFfungi: wiki.dalmatian.openstack.org here we come? /s19:21
JayFI think a one time mass-update with "this is old" banners, or archiving old information, is a good idea19:22
fungithere's probably a mw plugin to automatically insert admonitions in pages that haven't seen an edit in x years19:22
clarkbhttps://www.mediawiki.org/wiki/Manual:Archive_table says things get archived when deleted19:22
clarkbso ya I think we can keep the current content and its curators can delete as necessary19:22
clarkbhashar can probably confirm that for us and you can test with a throwaway page19:23
fricklermakes sense, I'll try to take a look at that19:23
tonybfrickler: Thanks.19:23
clarkbalright anything else related to server upgrades?19:23
tonybnot from me19:24
clarkb#topic Mirroring Useful Container Images19:24
clarkbthe docker hub rate limit problems continue to plague us and others19:25
clarkbcorvus has made progress in setting up jobs to mirror useful images from docker hub to another registry (quay in this case) to alleviate the problem19:25
clarkb#link https://review.opendev.org/c/opendev/system-config/+/938508 Initial change to mirror images that are generically useful19:25
clarkbI have some thoughts on improving the tags that are mirrored, but I think that is good for a followup19:26
clarkbfor a first pass we should start smallish and make sure everything is working first?19:26
clarkbcorvus: those jobs will run in the periodic pipeline which means they will trigger at 0200 utc iirc and then fight for resources with all of the rest of the periodic jobs19:27
clarkbjust wondering if we should be careful when we merge that so that we can check nothing is exposed and quickly regen the api key if that does happen?19:28
corvusyeah... do we have an earlier pipeline so we can jump the gun? :)  or we can just see if the time is a problem.19:28
clarkbcorvus: we have the hourly opendev pipeline which might work temporarily? Also I realized that opendevorg already has python- images19:29
corvusthey must be out of date though19:29
clarkbI don't think those tags exist so I'm not sure that is a problem but wondering if we should be careful with those collisions to start19:29
clarkbya they would be old versions19:29
clarkbgerrit will collide too19:30
fricklerdownstream I added a second daily pipeline that runs 3h earlier for image builds that are then consumed by "normal" daily jobs, maybe we want to do something similar?19:30
corvusi think mirroring existing or non-existing tags is what we want...19:30
clarkbcorvus: ya I think its fine for python-19:30
corvusclarkb: interesting point on gerrit; maybe we should prefix with "mirror" or something?  or even make a new org?19:30
clarkbcorvus: ya I think for things like gerrit we need to namespace further either with a prefix or a new org19:30
clarkbwe could also use different tags but I suspect that would be more confusing19:31
corvushow about new org: opendevmirror?19:31
clarkbcorvus: I like that19:31
fungiit's clear enough as to what it is, wfm19:31
corvusfrickler: ack; sounds like a good solution if 0200 is a problem19:32
frickler+1 to opendevmirror19:32
clarkbok so make a new org to namespace things and avoid collisions with stuff opendev wants to host itself eventually. Keep the initial list small like we've got currently. Then follow up with additional tags etc19:32
tonyb++ on opendevmirror19:32
fungiadding a new timer trigger pipeline is cheap if we decide there is sufficient need to warrant it19:32
corvussounds good; any of those images we want to say don't belong there and should instead be handled by a job in the zuul tenant?19:33
frickleranother reason to do that: run before the normal periodic rush eats up more rate limits?19:33
fungii guess it's worth monitoring for failures to decide on the separate pipeline19:33
clarkbcorvus: I think the only one that opendev doesn't consume today is httpd19:33
corvusfrickler: yeah, that's the main problem i could see from using 0200, but don't know how bad it will be yet19:34
clarkbcorvus: but that seems generic enough that I'm happy for us to have it19:34
clarkb(we use the gerrit image in gerritlib testing iirc)19:34
corvus#action corvus make opendevmirror quay.org and update 93850819:34
corvus#undo19:35
corvus#action corvus make opendevmirror quay.io org and update 93850819:35
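
A mirroring task along the lines being discussed can be as small as a skopeo copy into the new namespace. A minimal sketch; the image, tag, and job wiring here are assumptions, not the contents of 938508:

    # Sketch: copy an upstream image, all architectures, from Docker Hub into
    # the quay.io/opendevmirror namespace. Registry login is assumed to be
    # handled elsewhere in the job.
    - name: Mirror httpd from Docker Hub to opendevmirror
      ansible.builtin.command: >-
        skopeo copy --all
        docker://docker.io/library/httpd:2.4
        docker://quay.io/opendevmirror/httpd:2.4
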
clarkbanything else on this topic?19:35
fungiwrt separate vs existing pipeline, i have no objection other than not wanting to prematurely overengineer it19:35
corvusfungi: ++19:36
clarkbya I wouldn't go out of our way to add a pipeline just yet but if an alternative to periodic already exists that might be a good option to start19:36
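
If a dedicated earlier pipeline does turn out to be needed, it is a small piece of Zuul configuration. A minimal sketch, with the pipeline name and the 23:00 UTC time both assumed:

    # Sketch: a timer-triggered pipeline that fires a few hours before the
    # existing 02:00 UTC periodic run.
    - pipeline:
        name: periodic-early
        description: Runs image mirroring ahead of the main periodic rush.
        manager: independent
        precedence: low
        post-review: true
        trigger:
          timer:
            - time: '0 23 * * *'
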
clarkbok there are a few more things I want to get into so lets keep moving19:37
clarkb#topic Gerrit H2 Cache File Growth19:37
clarkbJust before we all enjoyed some holiday time off we restarted Gerrit thinking it would be fine since we weren't changing the image19:37
fungiit's just a restart, wcpgw?19:37
clarkbturns out we were wrong and the underlying issue appears to be the growth of the git_file_diff and gerrit_file_diff h2 database cache backing files19:37
clarkbone of them was over 200GB iirc19:37
clarkbon startup Gerrit attempts to do db cleanup to prune caches down to size but this only affects the content within the db and not the db file itself19:38
clarkbhowever I suspect that h2 performance degrades badly when the backing file is that size and we had problems. We stopped Gerrit again and then moved the caches aside, forcing gerrit to start over with a clean cache file19:38
clarkblast I checked those cache files had already regrown back to about 20GB in size19:39
fungithankfully hashar was no stranger to this problem19:39
clarkbya hashar points out the default h2 compaction time is like 200ms which isn't enough for files of this size to be compacted down to a reasonable size19:39
clarkb#link https://review.opendev.org/c/opendev/system-config/+/938000 Suggested workaround from Hashar improves compaction when we do shutdown19:39
clarkbhashar's suggestion is that we allow compaction to run for up to 15 seconds instead. This compaction only runs on Gerrit shutdown though19:39
clarkbwhich means if we aren't shutting down Gerrit often the files could still grow quite a bit. I'm thinking it's still a good toggle to change though as it should help when we do shut down19:40
fungiand which i suppose could get skipped also in an unplanned outage19:40
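
For context, the setting hashar is pointing at is H2's MAX_COMPACT_TIME database option (default 200 ms), which H2 accepts alongside the cache database URL. A hypothetical sketch of carrying that value as a deployment variable; the variable names are invented for illustration and the real approach is in review 938000:

    # Hypothetical variables only, for illustration; see review 938000 for the
    # actual change. MAX_COMPACT_TIME is an H2 setting, in milliseconds.
    gerrit_h2_max_compact_time_ms: 15000
    gerrit_h2_cache_url_options: 'MAX_COMPACT_TIME={{ gerrit_h2_max_compact_time_ms }}'
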
clarkbbut it's also got me thinking maybe we should revert my changes to allow those caches to be bigger (pruned daily but still leading to fragmentation on disk)19:40
fungihave we observed performance improvements from the larger caches?19:41
clarkbthe goal with ^ is maybe daily pruning with smaller limits will reduce the fragmentation of the h2 backing files which leads to the growth19:41
clarkbfungi: not really no19:41
clarkbfungi: I was hoping that doing so would speed up gerrit startups because it wouldn't need to prune so much on startup19:41
fungii don't object to longer shutdown times if there's a runtime performance improvement to balance them out19:41
clarkbbut it seems the slowness may have been related to the size of the backing file all along19:41
corvusif we revert your change, would that actually help this?  the db may still produce just as much fragmentation garbage19:42
clarkbcorvus: right it isn't clear if the revert would help significantly since the cache is only pruned once a day and the sizes I picked were based on ~1day sizes anyway19:42
fungialso all of this occurred during what is traditionally our slowest activity time of the year19:43
fungiso we may not have great anecdotal experiences with its impact either way19:43
clarkbthe limit is 2GB today on one of them which means that is our floor. In theory it may grow up to 4gb in size before its daily pruning down to 2gb. If we revert my change the old limit was 256mb iirc so we'd prune from 2gb ish down to 256mb ish19:43
clarkbbut I'm happy to change one thing at a time if we just want to start with increasing the compaction time19:44
clarkbthat would look something like updating the configs for that h2 setting, stopping gerrit, starting gerrit, then probably stopping gerrit again to see if compaction does what we expect before starting gerrit again19:44
clarkba bit back and forth/flappy but I think important to actually observe the improvement19:44
fungisounds fine to me19:45
clarkbanyway if that seems reasonable leave a review on the change above (938000) and I'm happy to approve the change and drive those restarts at an appropriate time19:45
fungii doubt our user experience is granular enough to notice the flapping19:45
clarkblooks like corvus and tonyb already voted in favor so I'll proceed with the change-one-thing-at-a-time plan for now, and that one thing is increased compaction time19:46
clarkb#topic Rax-ord Noble Nodes with 1 VCPU19:46
clarkbI've kept this agenda item because I wanted to follow up and check if anyone had looked into a sanity check for our base pre playbook to fail early on instances with only one vcpu on rax xen19:46
clarkbI suspect this is a straightforward bit of ansible that looks at ansible facts but we do want to be careful to test it with base-test first to avoid unexpected fallout19:47
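
The check being described is a small fact-based guard in the base pre playbook. A minimal sketch (the message and threshold are assumptions, and as noted it would go through base-test first; in practice it would likely also be scoped to the affected provider):

    # Sketch: abort early when a rax-ord Xen node booted with a single vCPU,
    # before any expensive job setup runs.
    - name: Fail fast on nodes that booted with only one vCPU
      ansible.builtin.fail:
        msg: "Only {{ ansible_facts['processor_vcpus'] }} vCPU detected; likely a bad Xen boot"
      when: ansible_facts['processor_vcpus'] | int < 2
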
clarkbsounds like no. That's fine and the problem is intermittent. I'll probably drop this from next week's agenda and we can put it back if we need to (eg further debugging or problem gets worse etc)19:48
clarkb#topic Service Coordinator Election19:48
clarkbit is almost that time of year again where we need to elect a service coordinator for OpenDev19:49
clarkbIn the meeting agenda I wrote down this proposal: Nominations Open From February 4, 2025 to February 18, 2025. Voting February 19, 2025 to February 26, 2025. All times and dates will be UTC based.19:49
clarkbthis is basically a year after the first election in 2024 so should line up to 6 months after the last election19:50
clarkbif that schedule doesn't work for some reason (holiday, travel etc) please let me know between now and our next meeting but I think we can probably make this plan official next week if nothing comes up before then19:50
tonyb++19:51
clarkband start thinking about whether or not you'd like to run. I'm happy to support anyone that may be interested in taking on the role.19:51
clarkb#topic Beginning of the Year (Virtual) Meetup19:51
clarkband for the last agenda item I'd like to try and do something similar to the pre ptg we did early 2024. I know we said we should do more of these and then we didn't... but I think doing something like that early in the year is a good idea at the very least19:52
tonybSounds good to me19:52
clarkbLooking at a calendar I think one of the last two weeks of January would work for me so something like 21-23 or 28-30 ish19:53
clarkbfebruary is harder for me with random dentist and doctor appointments scattered through the month though I'm sure we can make something work if January doesn't19:53
clarkbany opinions on willingness / ability to participate and if able when works best?19:53
fungii've got some travel going on for the 15th through the 20th, but that should be doable for me19:54
fungiin january i mean19:54
* frickler is still very unclear on the ability part, will need to decide short term19:54
tonyb21-23 would be my preference as I can be more flexible with my awake hours that week which may make it easier/possible to get us all in "one place"19:55
corvuslunar new year is jan 29.  early 20s sounds good.19:55
clarkbok lets pencil in the days of 21-23. I will start working on compiling some agenda content and then we can nail down what hours work best as we get closer and have a better understanding of total content19:56
clarkbfrickler: and I guess let me know when you have better clarity19:56
fricklersure19:57
tonybclarkb: perfect19:57
clarkb#topic Open Discussion19:57
clarkbAnything else?19:57
corvusgosh there's a lot of steps to set up a quay org19:58
clarkboh I was also going to try and bring up the h2 db thing upstream19:58
clarkbjust to see if any other gerrit folks have input in addition to hashar19:58
fungithere were some extra steps just to (re)use my existing quay/rh account19:58
corvusapparently there's a lot of "inviting" accounts and users to join teams, which means a lot of clicking buttons in emails19:59
corvussome infra-root folks should have some email invites19:59
fungithey seemed to want me to fit my job role and position into some preset list that didn't even have "other" options19:59
corvusand we may need to revisit the set of infra-root that own these orgs19:59
clarkbcorvus: yup I see an invite20:00
fungii'm now an "it - operations, engineer"20:00
clarkbI'll look at that after lunch20:00
corvusoh, yes, the root account is now a "System Administrator" in "IT Operations"20:00
clarkbhaha20:00
clarkband we are at time20:00
tonybLOL20:00
clarkbthank you everyone20:00
clarkb#endmeeting20:00
opendevmeetMeeting ended Tue Jan  7 20:00:47 2025 UTC.  Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4)20:00
opendevmeetMinutes:        https://meetings.opendev.org/meetings/infra/2025/infra.2025-01-07-19.00.html20:00
opendevmeetMinutes (text): https://meetings.opendev.org/meetings/infra/2025/infra.2025-01-07-19.00.txt20:00
opendevmeetLog:            https://meetings.opendev.org/meetings/infra/2025/infra.2025-01-07-19.00.log.html20:00
fungithanks clarkb!20:00
tonybThanks clarkb, Thanks all20:00
* tonyb goes to make coffee20:01
* fungi goes to make beer20:01
