Tuesday, 2022-05-10

clarkb	Anyone else here for the meeting?	19:00
clarkb	We will get started momentarily	19:00
fungi	ahoy!	19:00
clarkb	#startmeeting infra	19:01
opendevmeet	Meeting started Tue May 10 19:01:24 2022 UTC and is due to finish in 60 minutes. The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot.	19:01
opendevmeet	Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.	19:01
opendevmeet	The meeting name has been set to 'infra'	19:01
clarkb	#link https://lists.opendev.org/pipermail/service-discuss/2022-May/000335.html Our Agenda	19:01
ianw	o/	19:01
frickler	\o	19:01
clarkb	I'm going to add a couple of items to this agenda at the end due to things that came up over the last couple of days. But shouldn't add too much time	19:01
clarkb	#topic Announcements	19:01
clarkb	We are about a month away from the Open Infra Summit. Mostly a heads up that it is happening, since it's been a while since there was an in-person event, and those tend to impact us in weird ways	19:02
clarkb	Etherpad will likely see more usage for example and CI will be quiet. Though maybe not because who knows what in person events will be like these days	19:02
clarkb	Anyway that runs June 7-9	19:02
clarkb	I'm not sure I'm ready to experience jet lag again	19:03
clarkb	#topic Actions from last meeting	19:04
clarkb	#link http://eavesdrop.openstack.org/meetings/infra/2022/infra.2022-05-03-19.01.txt minutes from last meeting	19:04
clarkb	I'm back to thinking of dropping this portion of the meeting again :) No actions to call out	19:04
clarkb	#topic Topics	19:04
clarkb	#topic Improving OpenDev's CD throughput	19:05
clarkb	A few of us had good discussion yesterday. I'm not publicly writing notes down anywhere yet just because the idea was to be candid and it needs filtration	19:05
clarkb	But there were good ideas around what sorts of thing we might do post zuulv6	19:05
clarkb	feel free to PM me directly and I'm happy to fill individuals in	19:06
clarkb	I'm hoping I'll have time to get to some of those improvements this week or next as well	19:06
clarkb	But I think making those changes is a good idea before we dive into further zuul controls	19:06
clarkb	but if anyone feels strongly the other way let me know	19:06
clarkb	Anything else on this topic?	19:07
fungi	not from me	19:07
clarkb	#topic Container Maintenance	19:08
clarkb	I was debugging the etherpad database stuff yesterday (we'll talk about that in depth later in the meeting) and noticed that the mariadb containers take an env flag to check for necessary upgrades on the db content during startup	19:08
clarkb	One of the items on the container maintenance list is mariadb upgrades. I half suspect that performing those may be as simple as setting the flag in our docker-compose.yaml files and then updating the mariadb image version. It isn't clear to me if we have to step through versions one by one though and we should still test it	19:09
clarkb	But that was a good discovery and I think it gives us a straightforward path towards running newer mariadb in our containers	19:09
clarkb	If anyone wants to look into that further I suspect the gerrit review upgrade job is a good place to start because we can modify the mariadb version at the same time we upgrade gerrit and check that it updates properly	19:10
clarkb	there is also some content in that db for reviewed file flags iirc so not just upgrading an empty db	19:11
clarkb	That was all I had on this topic.	19:11
clarkb	#topic Spring cleaning old reviews	19:11
clarkb	thank you everyone for the topic:system-config-cleanup reviews. I think the only one left in there has some ongoing debugging (gerrit's diff timeout and how stackalytics triggers it)	19:12
ianw	clarkb: ^ that's good info, and i owe a checklist for the 3.5 upgrade, with downgrade instructions from previous weeks that i haven't gotten to. i will add that there and think about it more	19:12
clarkb	ianw: fwiw I don't think upgrading mariadb when we upgrade gerrit is necessarily how we want to approach it in prod. But the testing gives us a good framework to exercise mariadb upgrades too	19:13
clarkb	system-config is down to under 150 open changes. I think we've trimmed the list by about 50% since we started	19:13
clarkb	When I have time I'll continue to try and review things and update changes that should be resurrected. Abandon those we don't need, etc	19:13
clarkb	Despite not being 10 open changes like we probably all want this is still really good progress. Thanks everyone for the help	19:14
clarkb	#topic Automating the build process of our PPAs	19:14
clarkb	ianw has recently been looking at automating the build process for our packages hosted in our PPAs	19:14
clarkb	specifically openafs and vhd-util	19:15
clarkb	ianw: maybe you can give us a rundown on what that looks like?	19:15
clarkb	I am excited for this as I always find deb packaging to involve secret magic and it requires a lot of effort for me to figure it out from scratch each time. Having it encoded in the automated tooling will help a lot with that	19:15
ianw	yes when adding jammy support i decided that manually uploading was a bit of a gap in our gitops	19:16
fungi	also with the way the repos for these packages got organized, you can mostly follow the recommended distro package building workflows with them too	19:17
ianw	we have two projects to hold the .deb source generation bits, and now we automatically upload them to ubuntu PPA for building	19:17
ianw	#link https://opendev.org/opendev/infra-openafs-deb	19:17
ianw	#link https://opendev.org/opendev/infra-vhd-util-deb	19:17
fungi	branch-per-distro-release	19:18
ianw	they run jobs that are defined in openstack-zuul-jobs and are configured in project-config (because it's a promote job, it needs a secret, so the config needs to be in a trusted repo)	19:18
clarkb	I don't think the secret needs to be in a trusted repo? It just needs to run in a post config context?	19:19
clarkb	*post merge context	19:19
ianw	#link https://opendev.org/openstack/openstack-zuul-jobs/src/branch/master/zuul.d/jobs.yaml#L1397	19:19
fungi	yeah, we could also pass-to-parent the secret if being used by a playbook in another repo	19:19
fungi	similar to how some other publication jobs are set up to use project-specific creds	19:20
ianw	yep, i think that could work too	19:20
ianw	openstack-zuul-jobs isn't really the best place for this, but it was where we have extant rpm build jobs	19:20
ianw	those jobs publish to locations with openstack/* in them, so moving them is a multi-step process (but possible)	19:21
clarkb	ya so I guess we're combining the desire to have ozj manage the package build directions but want to run the job on events to the actual deb repos	19:21
fungi	longer term it would be nice to do it out of the opendev tenant instead, but what's there is working great	19:21
ianw	right, the jobs are generic enough to build our debs, but not generic enough that i feel like they belong in zuul-jobs	19:21
clarkb	My only concern doing it that way is that ozj has a separate set of reviewers, but currently that set isn't anything I'd worry about	19:22
fungi	yeah, o-z-j was intended to be openstack-specific central job configuration	19:22
clarkb	just something we'll have to keep in mind if adding people to ozj	19:22
fungi	and this isn't really openstack-focused it's more for opendev	19:22
ianw	anyway, so i've built everything in a testing environment and it all works	19:22
ianw	#link https://review.opendev.org/c/openstack/openstack-zuul-jobs/+/840572	19:22
clarkb	fungi: right, maybe for that reason we should put the jobs in the deb repos themselves?	19:23
ianw	is the final change that pushes things to the "production" PPA (i.e. the ones that are consumed by production jobs, and nodepool for dib, etc.)	19:23
clarkb	and have them manage their own secrets	19:23
ianw	there's one secret for both ppa's, they're not separate?	19:24
clarkb	ianw: each repo can encode that secret in the repo I mean	19:24
clarkb	it will be two zuul secrets possibly of the same data	19:24
fungi	right, we could put the job configuration in system-config instead of o-z-j but i'm not all that worried about it in the near term	19:24
clarkb	but ya I'm also not worried about it in the short term. More of a "it would be good to lay this out without relying on openstack core reviewers being trustworthy for package builds too"	19:25
ianw	clarkb: yeah, putting in repo became a bit annoying because of the very branchy nature of it. currently there's nothing on the master branch, and we have a distro branch	19:25
clarkb	ah in that case using system-config to centralize things is probably fine	19:25
clarkb	basically do what you are doing with ozj but put it in system-config?	19:26
clarkb	then the overlap with people managing packages matches people managing config management	19:26
ianw	system-config is probably a more natural home for both the rpm and deb generation	19:27
ianw	i feel like the openafs rpm generation ended up there because it's closely tied to the wheel generation, where the wheel builders copy back their results to afs	19:28
clarkb	Regardless of where things end up longer term I'm very excited for this as it will help my noob packaging brain when we need to do updates :0-	19:28
clarkb	er :)	19:28
ianw	all the wheel jobs probably belong somewhere !o-z-j too, now	19:28
fungi	yeah, this is awesome work	19:28
clarkb	ianw: ya but I think we can have the wheels consume from the ppa too then it doesn't really matter where the ppa itself is managed	19:28
ianw	yep that's right, but i just think that it was how it ended up there, years and years ago	19:29
clarkb	gotcha	19:29
clarkb	alright anything else on this topic?	19:30
ianw	anyway, so it sounds like we're generally happy if i merge 840572 at some point today and switch production	19:30
clarkb	ya then plan for reconfiguring the home of the job and secret longer term.	19:30
ianw	then i'll clean up the testing ppas, and add some documentation to system-config	19:30
ianw	and yeah, make todo notes about the cleanup	19:30
clarkb	That might be a bit ugly to do but I bet enough has merged now that not merging the other change won't help anything	19:30
clarkb	so proceeding is fine with me	19:30
clarkb	thanks again, this is exciting	19:31
clarkb	#topic Jammy Support	19:31
clarkb	I think the last major piece missing from our Jammy support in CI is the wheel mirrors. The previous topic is ensuring we can build those so we should be on track for getting this solved.	19:31
clarkb	Is there anything else related to Jammy that we need to address? Did dib handle the phased package updates on jammy yet?	19:32
ianw	oh, no, sorry, i got distracted on that	19:32
ianw	that's on the todo list too	19:33
fungi	we put a short term hack in place for us though	19:33
clarkb	ya opendev is fine. More considering that DIB users probably want a similar fix then we can revert our local fix	19:33
clarkb	I've been trying to encourage others to push fixes as far upstream as possible so am taking my own advice :)	19:33
clarkb	Please do let us know if you find new Jammy behaviors that are unexpected or problematic. We've already run into at least one (phased package updates)	19:34
clarkb	but I guess for now we're good with Jammy and can move on	19:34
clarkb	#topic Keycloak container image updates	19:34
frickler	we'll also have to tackle the reprepro issue at one point	19:34
clarkb	++	19:34
clarkb	probably by upgrading the mirror-update node to jammy	19:34
fungi	remind me what the reprepro issue is again? libzstd?	19:35
clarkb	#undo	19:35
opendevmeet	Removing item from minutes: #topic Keycloak container image updates	19:35
clarkb	fungi: ya that	19:35
frickler	well it doesn't seem jammy fixes it	19:35
clarkb	frickler: oh I missed that	19:35
fungi	interesting	19:36
clarkb	frickler: has upstream reprepro commented further on the issue? last I checked the bug was pretty bare	19:36
clarkb	#link https://bugs.launchpad.net/ubuntu/+source/reprepro/+bug/1968198 Reprepro jammy issue	19:36
frickler	oh wait, maybe I'm mixing that up with reprepro vs. phased updates	19:36
clarkb	oh yes reprepro didn't support phased updates either	19:37
clarkb	but for our purposes that is fine. Definitely something upstream reprepro will likely need to fix though	19:37
clarkb	(since we are telling our nodes to ignore phased updates and just update to latest right away)	19:37
fungi	but luckily we don't need reprepro to support phased updates right now	19:37
fungi	right, that	19:37
ianw	is there a debian bug? i wonder what the relationship to ubuntu w.r.t. reprepro is	19:38
frickler	still the libzstd related issue remains	19:38
frickler	https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=984690 is for phased updates	19:38
clarkb	ianw: I'm not sure. Google produced the bug I found above	19:38
fungi	right, the suspicion is that by upgrading the mirror-update server to jammy, reprepro will have new enough libzstd and no longer error on some packages	19:39
frickler	can we deploy a second server to test that or do we need to upgrade inplace?	19:40
clarkb	frickler: we can deploy a second server and have it only run jammy updates and stop it on the original. Everything is written to afs which both servers can mount at the same time just shouldn't write to at the same time	19:40
fungi	i think as long as we coordinate locks between them, we can have more than one server in existence	19:40
clarkb	then if we're happy with it tell the new server to run all of mirror updates and stop the old server	19:41
frickler	sounds like a plan	19:41
ianw	yeah coordinate locks between them might be easier said than done	19:42
clarkb	ya I was thinking more like remove cron jobs from one server entirely and add them to the other server	19:42
clarkb	since a reboot will kill the lock	19:42
ianw	it's certainly possible, but the ansible is written to put all cron jobs on both hosts	19:42
clarkb	better to just not attempt the sync in the first place	19:42
frickler	just moving ubuntu mirroring should be independent of the other mirrors at least	19:42
clarkb	ianw: We should be able to modify that with a when clause and check the distro or something	19:42
clarkb	when: distro == jammy and when distro != jammy	19:43
clarkb	it's a bit of work but doable	19:43
ianw	yep, it would probably be a good ansible enhancement to have a config dict that said where each job went based on hostname	19:43
fungi	ianw: yeah, i didn't mean make them capable of automatically coordinating locks, just that we would make manual changes on the servers to ensure only one writing at a time	19:43
fungi	but getting more fancy could certainly be cool too	19:44
ianw	that way it could be all deployed on one host when we test, and split up easily in production	19:44
ianw	another todo list item.	19:44
ianw	anyway, i can look at getting jammy nodes going as we bring up the openafs bits over the next day or two	19:44
clarkb	thanks	19:45
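Editor's sketch of the idea ianw describes above: a config dict that says where each mirror cron job goes, keyed on hostname. It is written in Python purely for illustration and is not the Ansible in system-config. The host names, job names, and function names are all hypothetical.

```python
# Hypothetical mapping of mirror cron jobs to the one host allowed to run them.
# Each job appears exactly once, so only one server writes a given AFS volume.
JOB_HOSTS = {
    "ubuntu": "mirror-update-jammy.example.org",
    "debian": "mirror-update.example.org",
    "fedora": "mirror-update.example.org",
}


def jobs_for_host(hostname, job_hosts=JOB_HOSTS):
    """Return the sorted list of jobs that should get a cron entry on hostname."""
    return sorted(job for job, host in job_hosts.items() if host == hostname)


def all_on_one_host(hostname, job_hosts=JOB_HOSTS):
    """Return a mapping that places every job on a single host, e.g. for testing."""
    return {job: hostname for job in job_hosts}


if __name__ == "__main__":
    print(jobs_for_host("mirror-update-jammy.example.org"))
    print(jobs_for_host("mirror-update.example.org"))
```

Because the mapping gives each job a single owner, moving only the ubuntu job to a new server is a one-line change, and a test deployment can collapse everything onto one host with all_on_one_host().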
clarkb	#topic Keycloak container image updates	19:45
clarkb	Ok I noticed recently that keycloak had stopped publishing to dockerhub and is only publishing to quay.io now	19:45
clarkb	We updated our deployment to switch to quay.io and that will pull a newer image as well. I didn't check if it ended up auto updating the service yet. I should do that	19:46
corvus	08593cb05ab0 quay.io/keycloak/keycloak:legacy "/opt/jboss/tools/do…" 26 hours ago Up 26 hours keycloak-docker_keycloak_1	19:46
clarkb	looks like yes it restarted the container	19:46
clarkb	assuming it still works then that transition went well.	19:46
corvus	i just logged into zuul	19:47
clarkb	The other thing to call out here is the :legacy tag. The legacy tag means keycloak + wildfly. Their non legacy images do not use wildfly anymore and appear to be more configurable. However, our testing shows we can't just do a straight conversion from legacy to non legacy	19:47
clarkb	It would probably be worthwhile to dig into the new non legacy image setup to see if that would work better for us. I know one reason people are excited for it is the image is about half the size of the legacy wildfly image. Though I don't think that is a major problem for us	19:47
corvus	the only non-standard thing we're doing there is adjusting the listening port so it works with host networking	19:47
corvus	hopefully it's straightforward to figure out how to do that with whatever replaced it	19:48
clarkb	ah ok so maybe we need to adjust that in a different way since wildfly isn't doing the listening	19:48
clarkb	++	19:48
corvus	s/listening port/listening addr/	19:48
clarkb	I mostly wanted to bring this up as I'm not the best person to test wildfly currently (I should fix that but ETOOMANYTHINGS)	19:48
clarkb	sounds like we're good currently and if we can update the image and make CI happy we can probably convert to the new thing as well	19:49
corvus	++	19:49
clarkb	#topic Etherpad Database Problems	19:49
clarkb	Yesterday we upgraded Etherpad to 1.8.18 from 1.8.17. And it promptly broke on a database settings issue	19:50
clarkb	The problem is that our schema's default charset and collation method were set to utf8 and not utf8mb4 and utf8mb4_bin respectively	19:50
clarkb	1.8.18 updated its ueberdb dependency and it updated how it does logging and it was actually logging that crashed everything	19:50
fungi	(even though the table we're using was set to those instead)	19:50
clarkb	Previously it seems that etherpad logged this issue but then continued because the db table itself was fine	19:51
fungi	also i have to say, it's kinda silly that they check the database default character set and collation instead of the table being used	19:51
clarkb	but this update broke logging in that context which caused the whole thing to crash. To work around this we updated the default charset and collation to what etherpad wants. This shouldn't affect anything as it only affects CREATE TABLE (table is already created), LOAD DATA (we don't load data out of band; etherpad writes to the db itself) and stored routines, which I don't think there are any	19:51
clarkb	#link https://github.com/ether/ueberDB/issues/266	19:52
clarkb	I filed this issue upstream calling out the problem	19:52
clarkb	hopefully they can debug why their logging changes cause the whole thing to crash and fix it, but we should be good going forward. Our CI didn't catch it because mariadb must default to utf8mb4 and utf8mb4_bin when creating new databases	19:52
clarkb	our old schema from the land before time didn't have that goodness so it broke in prod	19:52
clarkb	But please do call out any utf8 issues with etherpad if you notice them	19:53
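Editor's sketch of the workaround discussed above, assuming the schema defaults were changed with a statement along these lines. The log does not record the exact command that was run, so this is only one plausible form. The database name and the old collation value shown are hypothetical; the target charset and collation come from the discussion.

```python
# Target values named in the discussion above.
WANTED_CHARSET = "utf8mb4"
WANTED_COLLATION = "utf8mb4_bin"


def defaults_match(charset, collation):
    """Return True when the schema defaults are what etherpad expects."""
    return charset == WANTED_CHARSET and collation == WANTED_COLLATION


def alter_database_sql(db_name):
    """Build a statement that changes only the schema-level defaults.

    Existing tables keep their own charset and collation; the defaults apply
    to things like future CREATE TABLE statements.
    """
    return (
        f"ALTER DATABASE `{db_name}` "
        f"CHARACTER SET {WANTED_CHARSET} COLLATE {WANTED_COLLATION};"
    )


if __name__ == "__main__":
    # "utf8_general_ci" is an illustrative old collation, not taken from the log.
    if not defaults_match("utf8", "utf8_general_ci"):
        print(alter_database_sql("etherpad_example"))
```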
clarkb	#topic acme.sh failing to issue new certs	19:53
clarkb	This was the thing I debugged this morning. tl;dr is that this is another weird side effect problem in the tool, like etherpad's	19:53
clarkb	specifically acme.sh is trying to create new keys for our certs because it thinks the key size we are requesting doesn't match the existing key size	19:53
clarkb	but that is because they are transitioning from being implicit that empty key size means 2048 to explicitly stating key size is 2048 and the comparison between "" and 2048 fails triggering the new key creation	19:54
clarkb	#link https://github.com/acmesh-official/acme.sh/issues/4077 wrote an upstream issue	19:54
clarkb	#link https://github.com/acmesh-official/acme.sh/pull/4078 and pushed a potential fix	19:54
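Editor's sketch of the comparison problem described above. acme.sh is a shell script, so this Python is only an illustration of the logic and not its actual code or the upstream fix. The function names are hypothetical, and the assumption that an empty value means 2048 is taken from the discussion.

```python
# Assumption from the discussion: an empty saved key length implicitly means 2048.
DEFAULT_KEY_LENGTH = "2048"


def normalize_key_length(value):
    """Treat a missing, empty, or quoted key length as a plain string value."""
    value = (value or "").strip().strip('"')
    return value if value else DEFAULT_KEY_LENGTH


def needs_new_key(saved_length, requested_length):
    """Return True only when the lengths differ after normalization."""
    return normalize_key_length(saved_length) != normalize_key_length(requested_length)


if __name__ == "__main__":
    # Naive comparison: an old config with no saved length looks "different".
    print("" != "2048")
    # Normalized comparison: the implicit default matches the explicit one.
    print(needs_new_key("", "2048"))
    # A real change in key size is still detected.
    print(needs_new_key("2048", "4096"))
```

Normalizing both sides before comparing keeps existing hosts from regenerating keys while still reacting to a deliberate key size change.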
clarkb	I think we can go manually edit the key config files and set Le_Keylength to 2048 or "2048" and acme.sh would continue to function. Similar to out of band changing db charset	19:55
clarkb	but I think that the PR I pushed upstream should also address this. One thing that isn't clear to me is how to test this	19:55
ianw	++ thanks for that. the other option is to pull an acme.sh with your change in it	19:55
clarkb	ianw: ^ maybe you have ideas on testing?	19:55
clarkb	ianw: oh we can point it to my fork on github I guess? I was wondering about that because I think ansible always updates acme.sh currently so I can't make local edits. I could locally edit, then manually run the issue command, but that won't tie into the whole dns syncing, which would only half test it	19:56
ianw	aiui it's really something that's occurring on hosts that already have a cert and are renewing?	19:56
clarkb	ianw: correct	19:57
fungi	yes	19:57
clarkb	new hosts should be fine since they will write out the 2048 key length in their configs from the start	19:57
ianw	that's two parts of the ci that aren't really tested :/ we test the general issuance, but never actually get the key	19:57
clarkb	I think for a day or two we can probably live with waiting on upstream to accept or reject my change (and they have their own CI system)	19:58
fungi	but also we don't test upgrading from a key written by acme.sh to a newer acme.sh version	19:58
clarkb	but after that we should maybe consider manually editing the file and then letting a prod run run?	19:58
fungi	and yeah, we have nearly a month before this becomes critical	19:58
clarkb	*manually edit the config file to set Le_Keylength	19:58
ianw	yeah, we could add an ansible to rm the autogenerated config-file (and then leave behind a .stamp file so that it doesn't run again until we want it to) and then it should just reissue	19:59
clarkb	it's mostly that I don't want stuff like this to pile up against all the summit prep and travel that I'm going to be busy with in a week or two :)	19:59
clarkb	ianw: will it not require a --force in that case?	19:59
ianw	that might be a good enhancement to the ansible anyway, give us a flag that we can set that says "ok, reissue everything on the next run"	19:59
clarkb	because that is the other option. We could set --force and remove it in the future	19:59
clarkb	but ya we don't have to solve that in this meeting and we are at time	20:00
clarkb	#topic Open Discussion	20:00
ianw	i'll have to page it back in, but we can remove some config and it will regenerate	20:00
clarkb	Really quickly before we end is there anything important that was missed?	20:00
fungi	just a heads up that we had another gerrit user end up with a changed lp/uo openid, so i retired their old account and scrubbed the e-mail address so that could be reused with a new account	20:00
clarkb	frickler mentioned fedora and opensuse mirroring are broken	20:00
clarkb	I believe opensuse mirroring is broken on OBS mirroring. I suspect we'll just sjust mirroring that content and fixing opensuse isn't urgent	20:01
clarkb	but the fedora mirroring may merit a closer look	20:01
ianw	i saw that, i'll take a look. i have a change out still to prune some of the fedora mirrors too	20:01
clarkb	ya I think I +2'd that one	20:01
ianw	#link https://review.opendev.org/c/opendev/system-config/+/837637	20:01
ianw	yeah i might try fixing this and then merge that while i'm looking at it	20:02
clarkb	sounds good	20:03
clarkb	thank you everyone!	20:03
fungi	thanks clarkb!	20:03
clarkb	we'll be here next week same time and location	20:03
frickler	oh, then I'll revoke my +w	20:03
clarkb	#endmeeting	20:03
opendevmeet	Meeting ended Tue May 10 20:03:20 2022 UTC. Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4)	20:03
opendevmeet	Minutes: https://meetings.opendev.org/meetings/infra/2022/infra.2022-05-10-19.01.html	20:03
opendevmeet	Minutes (text): https://meetings.opendev.org/meetings/infra/2022/infra.2022-05-10-19.01.txt	20:03
opendevmeet	Log: https://meetings.opendev.org/meetings/infra/2022/infra.2022-05-10-19.01.log.html	20:03