clarkb | Just a couple minutes until our first meeting of the year | 18:58 |
---|---|---|
tonyb | The excitement is palpable | 18:59 |
* fungi palpates excitedly | 18:59 | |
corvus | the pre-ping was helpful! :) | 18:59 |
fungi | "helpful" | 18:59 |
* fungi wants to return to his winter slumber | 18:59 | |
clarkb | #startmeeting infra | 19:00 |
opendevmeet | Meeting started Tue Jan 7 19:00:05 2025 UTC and is due to finish in 60 minutes. The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot. | 19:00 |
opendevmeet | Useful Commands: #action #agreed #help #info #idea #link #topic #startvote. | 19:00 |
opendevmeet | The meeting name has been set to 'infra' | 19:00 |
clarkb | #link https://lists.opendev.org/archives/list/service-discuss@lists.opendev.org/thread/M5YEAMYDHKFBK7BJKHIQVFKGXJEW3KZF/ Our Agenda | 19:00 |
clarkb | #topic Announcements | 19:00 |
clarkb | Welcome to 2025. Apologies for interrupting the winter slumber | 19:00 |
tonyb | I won't be around much next week | 19:00 |
fungi | (or summer slumber in tonyb's case) | 19:01 |
clarkb | tonyb: ack thanks for the heads up | 19:01 |
tonyb | (Taking an actual vacation) | 19:01 |
clarkb | I didn't really have anything to announce today. Anyone else with an announcement? | 19:01 |
clarkb | #topic Zuul-launcher image builds | 19:02 |
clarkb | corvus: I feel like the holidays were just long enough that I've forgotten where we were on this | 19:02 |
clarkb | I know there was the raw image handling and testing | 19:02 |
corvus | mostly waiting on the #niz stack (which is in flaky-test fixing mode now) reviews welcome but not necessary at this time | 19:03 |
fungi | i wiped my brain clean like an etch-a-sketch, so don't feel bad | 19:03 |
corvus | that stack adds web ui and more image lifecycle stuff | 19:03 |
corvus | so it'll be good to have that in place before more live testing | 19:03 |
clarkb | corvus: did the API stuff land (and presumably get deployed through our weekly deployments)? | 19:04 |
corvus | nope that's interleaved in that stack | 19:04 |
corvus | it's completely separate so i think it's actually okay to do a single-phase merge for that instead of our usual two-phase | 19:04 |
clarkb | ok /me scribbles a note to try and review that stack if paperwork gets done | 19:05 |
corvus | more urgent than that though is the dockerhub image stuff :) | 19:05 |
clarkb | yup that has its own agenda item today | 19:05 |
corvus | i don't relish the idea of trying to get all that merged before that :) | 19:05 |
clarkb | anything else on this topic or should we continue on so that we can get to the container image mirroring? | 19:06 |
corvus | continue i think | 19:06 |
clarkb | #topic Deploying new Noble Servers | 19:06 |
clarkb | my podman prep change that updates install-docker to install podman and docker compose on Noble landed last week | 19:06 |
clarkb | we don't have any noble servers running containers yet so that was a noop (and I spot checked the deployments to ensure I didn't miss anything) | 19:07 |
clarkb | but that means next step is now to deploy a new server that uses containers on Noble | 19:07 |
fungi | what's a good candidate? | 19:07 |
clarkb | my hope is that I'll get beginning of the year paperwork stuff done tomorrowish and I can start on a new paste deployment late this week | 19:07 |
tonyb | I did help my ansible-devel changes which use install-docker on noble | 19:07 |
fungi | lodgeit/paste? | 19:07 |
clarkb | fungi: my plan is paste since that is representative with a database but also small and simple | 19:07 |
fungi | or maybe mediawiki | 19:08 |
fungi | yeah, paste feels right as a canary | 19:08 |
clarkb | tonyb: have you seen any problems with it or is it working in CI for your ansible devel stuff? | 19:08 |
clarkb | I guess that is the main callout on this topic today. The big change is in but I haven't put it to use in production yet. If you do put it to use and have feedback that is very much welcome | 19:09 |
tonyb | clarkb: No issues but I don't actually *use* compose but it installs | 19:09 |
fungi | is there some possibility of the ansible-devel job getting back into a passing state again? that's exciting | 19:09 |
clarkb | ya I think on bridge (which ansible-devel works on) its minimal use of containers | 19:09 |
tonyb | fungi: if nothing else it sets somewhat of a timeline for dropping xenial and moving bridge to noble | 19:10 |
clarkb | so ya let me know if you notice oddities, but I think we can continue on | 19:11 |
fungi | that would be great | 19:11 |
tonyb | both of which are needed before we can update the ansible version | 19:11 |
clarkb | #topic Upgrading old servers | 19:11 |
clarkb | the discussion seems to be trending into this topic anyway | 19:11 |
clarkb | ya the issue with newer ansible is it won't run on remote hosts with older python | 19:11 |
clarkb | historically ansible has maintained a wide array of python support for the remote side (the control side was more restricted) | 19:12 |
clarkb | but that has changed recently with a quite reduced set of supported python versions | 19:12 |
clarkb | anyway other than the noble work above is there anything else to be aware of for upgrading servers? I think we're going to end up needing to focus on this for the first half of the year | 19:13 |
clarkb | but I suspect that a lot of those jumps will be to noble so getting that working well upfront seems worthwhile | 19:13 |
tonyb | I wanted to say I started looking at mediawiki again | 19:13 |
tonyb | I'd like to send an announcement to service-announce this week | 19:14 |
tonyb | #link https://etherpad.opendev.org/p/opendev-wiki-announce | 19:14 |
clarkb | tonyb: are there new patches to review yet? (I haven't followed the irc notifications too closely the last few weeks) | 19:14 |
clarkb | but ya announcing that soonish seems good | 19:14 |
tonyb | No new patches, but I addressed a bunch of feedback yesterday | 19:15 |
frickler | I've looked at some old content on the wiki recently, and I do wonder whether it would be better to start fresh | 19:16 |
clarkb | ok I'll try to catch back up on the announcement and the review stack today or tomorrow ish so that the announcement at the end of week schedule isn't held up by me (though I had looked at them previously and it's probably fine to proceed) | 19:16 |
frickler | and possibly just leave a read-only copy available somewhere | 19:16 |
JayF | As long as that read-only copy will exist as long as the new wiki will, that sounds like an excellent idea | 19:17 |
fungi | tonyb: exciting that you're so close! announcement looks fine other than shoehorning you into using jammy, you might end up wanting noble depending on the timeline | 19:17 |
tonyb | where were you two a year ago ;P | 19:17 |
clarkb | I think I'm on the fence about starting over | 19:18 |
corvus | i don't think we should start over | 19:18 |
clarkb | anyone can simply start over on any content that is old as is and that avoids needing to maintain two wikis | 19:18 |
corvus | clarkb: ++ | 19:18 |
clarkb | and its not like starting over prevents things from becoming stale all over again. The fundamental problem is it needs curation and that doesn't change | 19:18 |
fungi | i don't see starting over as a solution, it doesn't solve the problem of the wiki containing old and abandoned content, merely resets the starting point for that | 19:18 |
JayF | That's true in the most general of senses, but the reality is we have years-old content in many places. That content is unlikely to *ever* be curated, so having it not carried over to confuse people would be nice. | 19:19 |
fungi | it will continue to be a problem | 19:19 |
clarkb | JayF: right but anyone can just go and archive that content today right? | 19:19 |
JayF | It's a lot easier to leave something behind than it is to delete something -- it's very hard to know when it's appropriate to remove it | 19:19 |
clarkb | we don't need to start over and host a special frozen wiki | 19:19 |
tonyb | fungi: I think I'd like to stick with Jammy and do the OS change once we're on the latest (mediawiki) LTS | 19:19 |
clarkb | (mediawiki archives things when you delete them aiui so you can just delete them if you're 80% certain it should be deleted) | 19:20 |
fungi | better would be to actively delete any concerning old content, since starting from scratch doesn't prevent new content from ceasing to be cared for and ending up in the same state. or do we need semi-annual domain changes in order to force content refreshes? | 19:20 |
corvus | if, let's say, the openstack project feels pretty strongly about reducing the confusion caused by outdated articles, one approach would be a one-time mass deletion/archiving of those articles. | 19:21 |
corvus | or, something more like what wikipedia does: mass-additions of "this may be outdated" banners to articles. | 19:21 |
JayF | fungi: wiki.dalmatian.openstack.org here we come? /s | 19:21 |
JayF | I think a one time mass-update with "this is old" banners, or archiving old information, is a good idea | 19:22 |
fungi | there's probably a mw plugin to automatically insert admonitions in pages that haven't seen an edit in x years | 19:22 |
clarkb | https://www.mediawiki.org/wiki/Manual:Archive_table says things get archived when deleted | 19:22 |
clarkb | so ya I think we can keep the current content and its curators can delete as necessary | 19:22 |
clarkb | hashar can probably confirm that for us and you can test with a throwaway page | 19:23 |
frickler | makes sense, I'll try to take a look at that | 19:23 |
tonyb | frickler: Thanks. | 19:23 |
clarkb | alright anything else related to server upgrades? | 19:23 |
tonyb | not from me | 19:24 |
clarkb | #topic Mirroring Useful Container Images | 19:24 |
clarkb | the docker hub rate limit problems continue to plague us and others | 19:25 |
clarkb | corvus has made progress in setting up jobs to mirror useful images from docker hub to another registry (quay in this case) to alleviate the problem | 19:25 |
clarkb | #link https://review.opendev.org/c/opendev/system-config/+/938508 Initial change to mirror images that are generically useful | 19:25 |
clarkb | I have some thoughts on improving the tags that are mirrored, but I think that is good for a followup | 19:26 |
clarkb | for a first pass we should start smallish and make sure everything is working first? | 19:26 |
clarkb | corvus: those jobs will run in the periodic pipeline which means they will trigger at 0200 utc iirc and then fight for resources with all of the rest of the periodic jobs | 19:27 |
clarkb | just wondering if we should be careful when we merge that so that we can check nothing is exposed and quickly regen the api key if that does happen? | 19:28 |
corvus | yeah... do we have an earlier pipeline so we can jump the gun? :) or we can just see if the time is a problem. | 19:28 |
clarkb | corvus: we have the hourly opendev pipeline which might work temporarily? Also I realized that the python- images already have images in opendevorg | 19:29 |
corvus | they must be out of date though | 19:29 |
clarkb | I don't think those tags exist so I'm not sure that is a problem but wondering if we should be careful with those collisions to start | 19:29 |
clarkb | ya they would be old versions | 19:29 |
clarkb | gerrit will collide too | 19:30 |
frickler | downstream I added a second daily pipeline that runs 3h earlier for image builds that are then consumed by "normal" daily jobs, maybe we want to do something similar? | 19:30 |
corvus | i think mirroring existing or non-existing tags is what we want... | 19:30 |
clarkb | corvus: ya I think its fine for python- | 19:30 |
corvus | clarkb: interesting point on gerrit; maybe we should prefix with "mirror" or something? or even make a new org? | 19:30 |
clarkb | corvus: ya I think for things like gerrit we need to namespace further either with a prefix or a new org | 19:30 |
clarkb | we could also use different tags but I suspect that would be more confusing | 19:31 |
corvus | how about new org: opendevmirror? | 19:31 |
clarkb | corvus: I like that | 19:31 |
fungi | it's clear enough as to what it is, wfm | 19:31 |
corvus | frickler: ack; sounds like a good solution if 0200 is a problem | 19:32 |
frickler | +1 to opendevmirror | 19:32 |
clarkb | ok so make new org to namespace things and avoid collisions with stuff opendev wants to host itself eventually. Keep the initial list small like we've got currently. Then followup with additional tags etc | 19:32 |
tonyb | ++ on opendevmirror | 19:32 |
fungi | adding a new timer trigger pipeline is cheap if we decide there is sufficient need to warrant it | 19:32 |
corvus | sounds good; any of those images we want to say don't belong there and should instead be handled by a job in the zuul tenant? | 19:33 |
frickler | another reason to do that: run before the normal periodic rush eats up more rate limits? | 19:33 |
fungi | i guess it's worth monitoring for failures to decide on the separate pipeline | 19:33 |
clarkb | corvus: I think the only one that opendev doesn't consume today is httpd | 19:33 |
corvus | frickler: yeah, that's the main problem i could see from using 0200, but don't know how bad it will be yet | 19:34 |
clarkb | corvus: but that seems generic enough that I'm happy for us to have it | 19:34 |
clarkb | (we use the gerrit image in gerritlib testing iirc) | 19:34 |
corvus | #action corvus make opendevmirror quay.org and update 938508 | 19:34 |
corvus | #undo | 19:35 |
corvus | #action corvus make opendevmirror quay.io org and update 938508 | 19:35 |
clarkb | anything else on this topic? | 19:35 |
fungi | wrt separate vs existing pipeline, i have no objection other than not wanting to prematurely overengineer it | 19:35 |
corvus | fungi: ++ | 19:36 |
clarkb | ya I wouldn't go out of our way to add a pipeline just yet but if an alternative to periodic already exists that might be a good option to start | 19:36 |
clarkb | ok there are a few more things I want to get into so lets keep moving | 19:37 |
clarkb | #topic Gerrit H2 Cache File Growth | 19:37 |
clarkb | Just before we all enjoyed some holiday time off we restarted Gerrit thinking it would be fine since we weren't changing the image | 19:37 |
fungi | it's just a restart, wcpgw? | 19:37 |
clarkb | turns out we were wrong and the underlying issue appears to be the growth of the git_file_diff and gerrit_file_diff h2 database cache backing files | 19:37 |
clarkb | one of them was over 200GB iirc | 19:37 |
clarkb | on startup Gerrit attempts to do db cleanup to prune caches down to size but this only affects the content within the db and not the db file itself | 19:38 |
clarkb | however I suspect that h2 becomes very unperformant when the backing file is that size and we had problems. We stopped Gerrit again and then moved the caches aside forcing gerrit to start over with a clean cache file | 19:38 |
clarkb | last I checked those cache files had already regrown back to about 20GB in size | 19:39 |
fungi | thankfully hashar was no stranger to this problem | 19:39 |
clarkb | ya hashar points out the default h2 compaction time is like 200ms which isn't enough for files of this size to be compacted down to a reasonable size | 19:39 |
clarkb | #link https://review.opendev.org/c/opendev/system-config/+/938000 Suggested workaround from Hashar improves compaction when we do shutdown | 19:39 |
clarkb | hashar's suggestion is that we allow compaction to run for up to 15 seconds instead. This compaction only runs on Gerrit shutdown though | 19:39 |
clarkb | which means if we aren't shutting down Gerrit often the files could still grow quite a bit. I'm thinking it's still a good toggle to change though as it should help when we do shutdown | 19:40 |
fungi | and which i suppose could get skipped also in an unplanned outage | 19:40 |
clarkb | but its also got me thinking maybe we should revert my changes to allow those caches to be bigger (pruned daily but still leading to fragmentation on disk) | 19:40 |
fungi | have we observed performance improvements from the larger caches? | 19:41 |
clarkb | the goal with ^ is maybe daily pruning with smaller limits will reduce the fragmentation of the h2 backing files which leads to the growth | 19:41 |
clarkb | fungi: not really no | 19:41 |
clarkb | fungi: I was hoping that doing so would speedup gerrit startups because it wouldn't need to prune so much on startup | 19:41 |
fungi | i don't object to longer shutdown times if there's a runtime performance improvement to balance them out | 19:41 |
clarkb | but it seems the slowness may have been related to the size of the backing file all along | 19:41 |
corvus | if we revert your change, would that actually help this? the db may still produce just as much fragmentation garbage | 19:42 |
clarkb | corvus: right it isn't clear if the revert would help significantly since the cache is only pruned once a day and the sizes I picked were based on ~1day sizes anyway | 19:42 |
fungi | also all of this occurred during what is traditionally our slowest activity time of the year | 19:43 |
fungi | so we may not have great anecdotal experiences with its impact either way | 19:43 |
clarkb | the limit is 2GB today on one of them which means that is our floor. In theory it may grow up to 4GB in size before its daily pruning down to 2GB. If we revert my change the old limit was 256MB iirc so we'd prune from 2GB ish down to 256MB ish | 19:43 |
clarkb | but I'm happy to change one thing at a time if we just want to start with increasing the compaction time | 19:44 |
clarkb | that would look something like updating the configs for that h2 setting, stopping gerrit, starting gerrit, then probably stopping gerrit again to see if compaction does what we expect before starting gerrit again | 19:44 |
clarkb | a bit back and forth/flappy but I think important to actually observe the improvement | 19:44 |
fungi | sounds fine to me | 19:45 |
clarkb | anyway if that seems reasonable leave a review on the change above (938000) and I'm happy to approve the change and drive those restarts at an appropriate time | 19:45 |
fungi | i doubt our user experience is granular enough to notice the flapping | 19:45 |
clarkb | looks like corvus and tonyb already voted in favor so I'll proceed with change one thing at a time for now plan and that one thing is increased compaction time | 19:46 |
clarkb | #topic Rax-ord Noble Nodes with 1 VCPU | 19:46 |
clarkb | I've kept this agenda item because I wanted to followup and check if anyone had looked into a sanity check for our base pre playbook to early fail instances with only one vcpu on rax xen | 19:46 |
clarkb | I suspect this is a straightforward bit of ansible that looks at ansible facts but we do want to be careful to test it with base-test first to avoid unexpected fallout | 19:47 |
clarkb | sounds like no. That's fine and the problem is intermittent. I'll probably drop this from next week's agenda and we can put it back if we need to (eg further debugging or problem gets worse etc) | 19:48 |
clarkb | #topic Service Coordinator Election | 19:48 |
clarkb | it is almost that time of year again where we need to elect a service coordinator for OpenDev | 19:49 |
clarkb | In the meeting agenda I wrote down this proposal: Nominations Open From February 4, 2025 to February 18, 2025. Voting February 19, 2025 to February 26, 2025. All times and dates will be UTC based. | 19:49 |
clarkb | this is basically a year after the first election in 2024 so should line up to 6 months after the last election | 19:50 |
clarkb | if that schedule doesn't work for some reason (holiday, travel etc) please let me know between now and our next meeting but I think we can probably make this plan official next week if nothing comes up before then | 19:50 |
tonyb | ++ | 19:51 |
clarkb | and start thinking about whether or not you'd like to run. I'm happy to support anyone that may be interested in taking on the role. | 19:51 |
clarkb | #topic Beginning of the Year (Virtual) Meetup | 19:51 |
clarkb | and for the last agenda item I'd like to try and do something similar to the pre ptg we did early 2024. I know we said we should do more of these then we didn't... but I think doing something like that early in the year is a good idea at the very least | 19:52 |
tonyb | Sounds good to me | 19:52 |
clarkb | Looking at a calendar I think one of the last two weeks of January would work for me so something like 21-23 or 28-30 ish | 19:53 |
clarkb | february is harder for me with random dentist and doctor appointments scattered through the month though I'm sure we can make something work if January doesn't | 19:53 |
clarkb | any opinions on willingness / ability to participate and if able when works best? | 19:53 |
fungi | i've got some travel going on for the 15th through the 20th, but that should be doable for me | 19:54 |
fungi | in january i mean | 19:54 |
* frickler is still very unclear on the ability part, will need to decide short term | 19:54 | |
tonyb | 21-23 would be my preference as I can be more flexible with my awake hours that week which may make it easier/possible to get us all in "one place" | 19:55 |
corvus | lunar new year is jan 29. early 20s sounds good. | 19:55 |
clarkb | ok let's pencil in the days of 21-23. I will start working on compiling some agenda content and then we can nail down what hours work best as we get closer and have a better understanding of total content | 19:56 |
clarkb | frickler: and I guess let me know when you have better clarity | 19:56 |
frickler | sure | 19:57 |
tonyb | clarkb: perfect | 19:57 |
clarkb | #topic Open Discussion | 19:57 |
clarkb | Anything else? | 19:57 |
corvus | gosh there's a lot of steps to set up a quay org | 19:58 |
clarkb | oh I was also going to try and bring up the h2 db thing upstream | 19:58 |
clarkb | just to see if any other gerrit folks have input in addition to hashar | 19:58 |
fungi | there were some extra steps just to (re)use my existing quay/rh account | 19:58 |
corvus | apparently there's a lot of "inviting" accounts and users to join teams, which means a lot of clicking buttons in emails | 19:59 |
corvus | some infra-root folks should have some email invites | 19:59 |
fungi | they seemed to want me to fit my job role and position into some preset list that didn't even have "other" options | 19:59 |
corvus | and we may need to revisit the set of infra-root that own these orgs | 19:59 |
clarkb | corvus: yup I see an invite | 20:00 |
fungi | i'm now an "it - operations, engineer" | 20:00 |
clarkb | I'll look at that after lunch | 20:00 |
corvus | oh, yes, the root account is now a "System Administrator" in "IT Operations" | 20:00 |
clarkb | haha | 20:00 |
clarkb | and we are at time | 20:00 |
tonyb | LOL | 20:00 |
clarkb | thank you everyone | 20:00 |
clarkb | #endmeeting | 20:00 |
opendevmeet | Meeting ended Tue Jan 7 20:00:47 2025 UTC. Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4) | 20:00 |
opendevmeet | Minutes: https://meetings.opendev.org/meetings/infra/2025/infra.2025-01-07-19.00.html | 20:00 |
opendevmeet | Minutes (text): https://meetings.opendev.org/meetings/infra/2025/infra.2025-01-07-19.00.txt | 20:00 |
opendevmeet | Log: https://meetings.opendev.org/meetings/infra/2025/infra.2025-01-07-19.00.log.html | 20:00 |
fungi | thanks clarkb! | 20:00 |
tonyb | Thanks clarkb, Thanks all | 20:00 |
* tonyb goes to make coffee | 20:01 | |
* fungi goes to make beer | 20:01 |
Generated by irclog2html.py 2.17.3 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!