Tuesday, 2022-05-10

clarkbAnyone else here for the meeting?19:00
clarkbWe will get started momentarily19:00
fungiahoy!19:00
clarkb#startmeeting infra19:01
opendevmeetMeeting started Tue May 10 19:01:24 2022 UTC and is due to finish in 60 minutes.  The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot.19:01
opendevmeetUseful Commands: #action #agreed #help #info #idea #link #topic #startvote.19:01
opendevmeetThe meeting name has been set to 'infra'19:01
clarkb#link https://lists.opendev.org/pipermail/service-discuss/2022-May/000335.html Our Agenda19:01
ianwo/19:01
frickler\o19:01
clarkbI'm going to add a couple of items to this agenda at the end due to things that came up over the last couple of days. But shouldn't add too much time19:01
clarkb#topic Announcements19:01
clarkbWe are about a month away from the Open Infra Summit. Mostly a heads up that that is happening since it's been a while since there was an in person event which tends to impact us in weird ways19:02
clarkbEtherpad will likely see more usage for example and CI will be quiet. Though maybe not because who knows what in person events will be like these days19:02
clarkbAnyway that runs June 7-919:02
clarkbI'm not sure I'm ready to experience jet lag again19:03
clarkb#topic Actions from last meeting19:04
clarkb#link http://eavesdrop.openstack.org/meetings/infra/2022/infra.2022-05-03-19.01.txt minutes from last meeting19:04
clarkbI'm back to thinking of dropping this portion of the meeting again :) No actions to call out19:04
clarkb#topic Topics19:04
clarkb#topic Improving OpenDev's CD throughput19:05
clarkbA few of us had good discussion yesterday. I'm not publicly writing notes down anywhere yet just because the idea was to be candid and it needs filtration19:05
clarkbBut there were good ideas around what sorts of thing we might do post zuulv619:05
clarkbfeel free to PM me directly and I'm happy to fill individuals in19:06
clarkbI'm hoping I'll have time to get to some of those improvements this week or next as well19:06
clarkbBut I think making those changes is a good idea before we dive into further zuul controls19:06
clarkbbut if anyone feels strongly the other way let me know19:06
clarkbAnything else on this topic?19:07
funginot from me19:07
clarkb#topic Container Maintenance19:08
clarkbI was debugging the etherpad database stuff yesterday (we'll talk about that in depth later in the meeting) and noticed that the mariadb containers take an env flag to check for necessary upgrades on the db content during startup19:08
clarkbOne of the items on the container maintenance list is mariadb upgrades. I half suspect that performing those may be as simple as setting the flag in our docker-compose.yaml files and then updating the mariadb image version. It isn't clear to me if we have to step through versions one by one though and we should still test it19:09
clarkbBut that was a good discovery and I think it gives us a straightforward path towards running newer mariadb in our containers19:09
clarkbIf anyone wants to look into that further I suspect the gerrit review upgrade job is a good place to start because we can modify the mariadb version at the same time we upgrade gerrit and check that it updates properly19:10
clarkbthere is also some content in that db for reviewed file flags iirc so not just upgrading an empty db19:11
clarkbThat was all I had on this topic.19:11
clarkb#topic Spring cleaning old reviews19:11
clarkbthank you everyone for the topic:system-config-cleanup reviews. I think the only one left in there has some ongoing debugging (gerrit's diff timeout and how stackalytics triggers it)19:12
ianwclarkb: ^ that's good info, and i owe a checklist for the 3.5 upgrade, with downgrade instructions from previous weeks that i haven't got to.  i will add that there and think about it more19:12
clarkbianw: fwiw I don't think upgrading mariadb when we upgrade gerrit is necessarily how we want to approach it in prod. But the testing gives us a good framework to exercise mariadb upgrades too19:13
clarkbsystem-config is down to under 150 open changes. I think we've trimmed the list by about 50% since we started19:13
clarkbWhen I have time I'll continue to try and review things and update changes that should be resurrected. Abandon those we don't need, etc19:13
clarkbDespite not being 10 open changes like we probably all want this is still really good progress. Thanks everyone for the help19:14
clarkb#topic Automating the build process of our PPAs19:14
clarkbianw has recently been looking at automating the build process for our packages hosted in our PPAs19:14
clarkbspecifically openafs and vhd-util19:15
clarkbianw: maybe you can give us a rundown on what that looks like?19:15
clarkbI am excited for this as I always find deb packaging to involve secret magic and it requires a lot of effort for me to figure it out from scratch each time. Having it encoded in the automated tooling will help a lot with that19:15
ianwyes when adding jammy support i decided that manually uploading was a bit of a gap in our gitops19:16
fungialso with the way the repos for these packages got organized, you can mostly follow the recommended distro package building workflows with them too19:17
ianwwe have two projects to hold the .deb source generation bits, and now we automatically upload them to ubuntu PPA for building19:17
ianw#link https://opendev.org/opendev/infra-openafs-deb19:17
ianw#link https://opendev.org/opendev/infra-vhd-util-deb19:17
fungibranch-per-distro-release19:18
ianwthey run jobs that are defined in openstack-zuul-jobs and are configured in project-config (because it's a promote job, it needs a secret, so the config needs to be in a trusted repo)19:18
clarkbI don't think the secret needs to be in a trusted repo? It just needs to run in a post config context?19:19
clarkb*post merge context19:19
ianw#link https://opendev.org/openstack/openstack-zuul-jobs/src/branch/master/zuul.d/jobs.yaml#L139719:19
fungiyeah, we could also pass-to-parent the secret if being used by a playbook in another repo19:19
fungisimilar to how some other publication jobs are set up to use project-specific creds19:20
ianwyep, i think that could work too19:20
ianwopenstack-zuul-jobs isn't really the best place for this, but it was where we have extant rpm build jobs19:20
ianwthose jobs publish to locations with openstack/* in them, so moving them is a multi-step process (but possible)19:21
clarkbya so I guess we're combining the desire to have ozj manage the package build directions but want to run the job on events to the actual deb repos19:21
fungilonger term it would be nice to do it out of the opendev tenant instead, but what's there is working great19:21
ianwright, the jobs are generic enough to build our debs, but not generic enough that i feel like they belong in zuul-jobs19:21
clarkbMy only concern doing it that way is that ozj has a separate set of reviewers, but currently that set isn't anything I'd worry about19:22
fungiyeah, o-z-j was intended to be openstack-specific central job configuration19:22
clarkbjust something we'll have to keep in mind if adding people to ozj19:22
fungiand this isn't really openstack-focused it's more for opendev19:22
ianwanyway, so i've built everything in a testing environment and it all works19:22
ianw#link https://review.opendev.org/c/openstack/openstack-zuul-jobs/+/84057219:22
clarkbfungi: right, maybe for that reason we should put the jobs in the deb repos themselves?19:23
ianwis the final change that pushes things to the "production" PPA (i.e. the ones that are consumed by production jobs, and nodepool for dib, etc.)19:23
clarkband have them manage their own secrets19:23
ianwthere's one secret for both ppa's, they're not separate?19:24
clarkbianw: each repo can encode that secret in the repo I mean19:24
clarkbit will be two zuul secrets possibly of the same data19:24
fungiright, we could put the job configuration in system-config instead of o-z-j but i'm not all that worried about it in the near term19:24
clarkbbut ya I'm also not worried about it in the short term. More of a "it would be good to lay this out without relying on openstack core reviewers being trustworthy for package builds too"19:25
ianwclarkb: yeah, putting in repo became a bit annoying because of the very branchy nature of it.  currently there's nothing on the master branch, and we have a distro branch19:25
clarkbah in that case using system-config to centralize things is probably fine19:25
clarkbbasically do what you are doing with ozj but put it in system-config?19:26
clarkbthen the overlap with people managing packages matches people managing config management19:26
ianwsystem-config is probably a more natural home for both the rpm and deb generation19:27
ianwi feel like the openafs rpm generation ended up there because it's closely tied to the wheel generation, where the wheel builders copy back their results to afs19:28
clarkbRegardless of where things end up longer term I'm very excited for this as it will help my noob packaging brain when we need to do updates :0-19:28
clarkber :)19:28
ianwall the wheel jobs probably belong somewhere !o-z-j too, now19:28
fungiyeah, this is awesome work19:28
clarkbianw: ya but I think we can have the wheels consume from the ppa too then it doesn't really matter where the ppa itself is managed19:28
ianwyep that's right, but i just think that it was how it ended up there, years and years ago19:29
clarkbgotcha19:29
clarkbalright anything else on this topic?19:30
ianwanyway, so it sounds like we're generally happy if i merge 840572 at some point today and switch production19:30
clarkbya then plan for reconfiguring the home of the job and secret longer term.19:30
ianwthen i'll clean up the testing ppas, and add some documentation to system-config19:30
ianwand yeah, make todo notes about the cleanup19:30
clarkbThat might be a bit ugly to do but I bet enough has merged now that not merging the other change won't help anything19:30
clarkbso proceeding is fine with me19:30
clarkbthanks again, this is exciting19:31
clarkb#topic Jammy Support19:31
clarkbI think the last major piece missing from our Jammy support in CI is the wheel mirrors. The previous topic is ensuring we can build those so we should be on track for getting this solved.19:31
clarkbIs there anything else related to Jammy that we need to address? Did dib handle the phased package updates on jammy yet?19:32
ianwoh, no, sorry, i got distracted on that19:32
ianwthat's on the todo list too19:33
fungiwe put a short term hack in place for us though19:33
clarkbya opendev is fine. More considering that DIB users probably want a similar fix then we can revert our local fix19:33
clarkbI've been trying to encourage others to push fixes as far upstream as possible so am taking my own advice :)19:33
clarkbPlease do let us know if you find new Jammy behaviors that are unexpected or problematic. We've already run into at least one (phased package updates)19:34
clarkbbut I guess for now we're good with Jammy and can move on19:34
clarkb#topic Keycloak container image updates19:34
fricklerwe'll also have to tackle the reprepro issue at one point19:34
clarkb++19:34
clarkbprobably by upgrading the mirror-update node to jammy19:34
fungiremind me what the reprepro issue is again? libzstd?19:35
clarkb#undo19:35
opendevmeetRemoving item from minutes: #topic Keycloak container image updates19:35
clarkbfungi: ya that19:35
fricklerwell it doesn't seem jammy fixes it19:35
clarkbfrickler: oh I missed that19:35
fungiinteresting19:36
clarkbfrickler: has upstream reprepro commented further on the issue? last I checked the bug was pretty bare19:36
clarkb#link https://bugs.launchpad.net/ubuntu/+source/reprepro/+bug/1968198 Reprepro jammy issue19:36
frickleroh wait, maybe I'm mixing that up with reprepro vs. phased updates19:36
clarkboh yes reprepro didn't support phased updates either19:37
clarkbbut for our purposes that is fine. Definitely something upstream reprepro will likely need to fix though19:37
clarkb(since we are telling our nodes to ignore phased updates and just update to latest right away)19:37
fungibut luckily we don't need reprepro to support phased updates right now19:37
fungiright, that19:37
ianwis there a debian bug?  i wonder what the relationship to ubuntu w.r.t. to reprepro is19:38
fricklerstill the libzstd related issue remains19:38
fricklerhttps://bugs.debian.org/cgi-bin/bugreport.cgi?bug=984690 is for phased updates19:38
clarkbianw: I'm not sure. Google produced the bug I found above19:38
fungiright, the suspicion is that by upgrading the mirror-update server to jammy, reprepro will have new enough libzstd and no longer error on some packages19:39
fricklercan we deploy a second server to test that or do we need to upgrade inplace?19:40
clarkbfrickler: we can deploy a second server and have it only run jammy updates and stop it on the original. Everything is written to afs which both servers can mount at the same time just shouldn't write to at the same time19:40
fungii think as long as we coordinate locks between them, we can have more than one server in existence19:40
clarkbthen if we're happy with it tell the new server to run all of mirror updates and stop the old server19:41
fricklersounds like a plan19:41
ianwyeah coordinate locks between them might be easier said than done19:42
clarkbya I was thinking more like remove cron jobs from one server entirely and add them to the other server19:42
clarkbsince a reboot will kill the lock19:42
ianwit's certainly possible, but the ansible is written to put all cron jobs on both hosts19:42
clarkbbetter to just not attempt the sync in the first place19:42
fricklerjust moving ubuntu mirroring should be independent of the other mirrors at least19:42
clarkbianw: We should be able to modify that with a when clause and check the distro or something19:42
clarkbwhen: distro == jammy and when distro != jammy19:43
clarkbit's a bit of work but doable19:43
ianwyep, it would probably be a good ansible enhancement to have a config dict that said where each job went based on hostname19:43
fungiianw: yeah, i didn't mean make them capable of automatically coordinating locks, just that we would make manual changes on the servers to ensure only one writing at a time19:43
fungibut getting more fancy could certainly be cool too19:44
ianwthat way it could be all deployed on one host when we test, and split up easily in production19:44
ianwanother todo list item.19:44
ianwanyway, i can look at getting jammy nodes going as we bring up the openafs bits over the next day or two19:44
clarkbthanks19:45
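A minimal sketch of the "config dict that says where each job goes based on hostname" idea from the discussion above. This is an illustration only, not the actual system-config Ansible: the host names, job names, and the helper `jobs_for_host` are all hypothetical. Because each job maps to exactly one host, only one server ever writes a given mirror to AFS.

```python
# Hypothetical mapping of mirror cron jobs to the single host that runs them.
# All host and job names here are illustrative assumptions.
MIRROR_JOB_HOSTS = {
    "ubuntu": "mirror-update-jammy.example.org",
    "debian": "mirror-update.example.org",
    "fedora": "mirror-update.example.org",
}


def jobs_for_host(hostname, job_hosts=MIRROR_JOB_HOSTS):
    """Return the sorted list of cron jobs that should be installed on hostname."""
    return sorted(job for job, host in job_hosts.items() if host == hostname)


# In a test deployment every job can be pointed at one host.
ALL_ON_ONE = {job: "test-host.example.org" for job in MIRROR_JOB_HOSTS}

if __name__ == "__main__":
    print(jobs_for_host("mirror-update-jammy.example.org"))
    print(jobs_for_host("test-host.example.org", ALL_ON_ONE))
```

Moving a mirror from the old server to the new one is then a one-line change to the mapping rather than a per-host `when` clause.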
clarkb#topic Keycloak container image updates19:45
clarkbOk I noticed recently that keycloak had stopped publishing to dockerhub and is only publishing to quay.io now19:45
clarkbWe updated our deployment to switch to quay.io and that will pull a newer image as well. I didn't check if it ended up auto updating the service yet. I should do that19:46
corvus08593cb05ab0   quay.io/keycloak/keycloak:legacy   "/opt/jboss/tools/do…"   26 hours ago   Up 26 hours             keycloak-docker_keycloak_119:46
clarkblooks like yes it restarted the container19:46
clarkbassuming it still works then that transition went well.19:46
corvusi just logged into zuul19:47
clarkbThe other thing to call out here is the :legacy tag. The legacy tag means keycloak + wildfly. Their non legacy images do not use wildfly anymore and appear to be more configurable. However, our testing shows we can't just do a straight conversion from legacy to non legacy19:47
clarkbIt would probably be worthwhile to dig into the new non legacy image setup to see if that would work better for us. I know one reason people are excited for it is the image is about half the size of the legacy wildfly image. Though I don't think that is a major problem for us19:47
corvusthe only non-standard thing we're doing there is adjusting the listening port so it works with host networking19:47
corvushopefully it's straightforward to figure out how to do that with whatever replaced it19:48
clarkbah ok so maybe we need to adjust that in a different way since wildfly isn't doing the listening19:48
clarkb++19:48
corvuss/listening port/listening addr/19:48
clarkbI mostly wanted to bring this up as I'm not the best person to test wildfly currently (I should fix that but ETOOMANYTHINGS)19:48
clarkbsounds like we're good currently and if we can update the image and make CI happy we can probably convert to the new thing as well19:49
corvus++19:49
clarkb#topic Etherpad Database Problems19:49
clarkbYesterday we upgraded Etherpad to 1.8.18 from 1.8.17. And it promptly broke on a database settings issue19:50
clarkbThe problem is that our schema's default charset and collation method were set to utf8 and not utf8mb4 and utf8mb4_bin respectively19:50
clarkb1.8.18 updated its ueberdb dependency and it updated how it does logging and it was actually logging that crashed everything19:50
fungi(even though the table we're using in it was set to those instead)19:50
clarkbPreviously it seems that etherpad logged this issue but then continued because the db table itself was fine19:51
fungialso i have to say, it's kinda silly that they check the database default character set and collation instead of the table being used19:51
clarkbbut this update broke logging in that context which caused the whole thing to crash. To work around this we updated the default charset and collation to what etherpad wants. This shouldn't affect anything as it only affects CREATE TABLE (table is already created), LOAD DATA (we don't load data out of band, etherpad writes to the db itself) and stored routines which I don't think19:51
clarkbthere are any19:51
clarkb#link https://github.com/ether/ueberDB/issues/26619:52
clarkbI filed this issue upstream calling out the problem19:52
clarkbhopefully they can debug why their logging changes cause the whole thing to crash and fix it, but we should be good going forward. Our CI didn't catch it because mariadb must default to utf8mb4 and utf8mb4_bin when creating new databases19:52
clarkbour old schema from the land before time didn't have that goodness so it broke in prod19:52
clarkbBut please do call out any utf8 issues with etherpad if you notice them19:53
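A small sketch of the check and workaround described in this topic, assuming the schema defaults have already been read from the database (for example from `information_schema.SCHEMATA`). The database name and the helper names are hypothetical, and the log does not record the exact statement that was run in production; the SQL below is only the standard MariaDB form for changing schema defaults.

```python
# Defaults that etherpad's database layer expects, per the discussion above.
EXPECTED_CHARSET = "utf8mb4"
EXPECTED_COLLATION = "utf8mb4_bin"


def schema_defaults_ok(charset, collation):
    """True when the schema-level defaults match what etherpad wants."""
    return charset == EXPECTED_CHARSET and collation == EXPECTED_COLLATION


def alter_statement(db_name):
    """Build the statement that changes the schema defaults.

    This only changes defaults for future CREATE TABLE and similar
    operations; existing tables keep their own charset and collation.
    """
    return (
        f"ALTER DATABASE `{db_name}` "
        f"CHARACTER SET {EXPECTED_CHARSET} COLLATE {EXPECTED_COLLATION};"
    )


if __name__ == "__main__":
    # An old schema created with utf8 defaults, as in the prod case.
    if not schema_defaults_ok("utf8", "utf8_general_ci"):
        print(alter_statement("etherpad"))  # "etherpad" is an assumed name
```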
clarkb#topic acme.sh failing to issue new certs19:53
clarkbThis was the thing I debugged this morning. tl;dr is that this is another weird side effect problem in the tool like etherpad's19:53
clarkbspecifically acme.sh is trying to create new keys for our certs because it thinks the key size we are requesting doesn't match the existing key size19:53
clarkbbut that is because they are transitioning from being implicit that empty key size means 2048 to explicitly stating key size is 2048 and the comparison between "" and 2048 fails triggering the new key creation19:54
clarkb#link https://github.com/acmesh-official/acme.sh/issues/4077 wrote an upstream issue19:54
clarkb#link https://github.com/acmesh-official/acme.sh/pull/4078 and pushed a potential fix19:54
clarkbI think we can go manually edit the key config files and set Le_Keylength to 2048 or "2048" and acme.sh would continue to function. Similar to out of band changing db charset19:55
clarkbbut I think that the PR I pushed upstream should also address this. One thing that isn't clear to me is how to test this19:55
ianw++ thanks for that.  the other option is to pull an acme.sh with your change in it19:55
clarkbianw: ^ maybe you have ideas on testing?19:55
clarkbianw: oh we can point it to my fork on github I guess? I was wondering about that because I think ansible always updates acme.sh currently so I can't make local edits. I could locally edit then manually run the issue command but that won't tie into the whole dns syncing which would only half test it19:56
ianwaiui it's really something that's occurring on hosts that already have a cert and are renewing?19:56
clarkbianw: correct19:57
fungiyes19:57
clarkbnew hosts should be fine since they will write out the 2048 key length in their configs from the start19:57
ianwthat's two parts of the ci that aren't really tested :/  we test the general issuance, but never actually get the key19:57
clarkbI think for a day or two we can probably live with waiting on upstream to accept or reject my change (and they have their own CI system)19:58
fungibut also we don't test upgrading from a key written by acme.sh to a newer acme.sh version19:58
clarkbbut after that we should maybe consider manually editing the file and then letting a prod run run?19:58
fungiand yeah, we have nearly a month before this becomes critical19:58
clarkb*manually edit the config file to set Le_Keylength19:58
ianwyeah, we could add an ansible to rm the autogenerated config-file (and then leave behind a .stamp file so that it doesn't run again until we want it to) and then it *should* just reissue19:59
clarkbit's mostly that I don't want stuff like this to pile up against all the summit prep and travel that I'm going to be busy with in a week or two :)19:59
clarkbianw: will it not require a --force in that case?19:59
ianwthat might be a good enhancement to the ansible anyway, give us a flag that we can set that says "ok, reissue everything on the next run"19:59
clarkbbecause that is the other option. We could set --force and remove it in the future19:59
clarkbbut ya we don't have to solve that in this meeting and we are at time20:00
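The key length mismatch described in this topic can be illustrated with a short sketch. This is not acme.sh code (acme.sh is a shell script) and not the upstream fix; the function names are hypothetical, and the only assumption taken from the discussion is that an empty stored key length has historically meant 2048.

```python
# Assumption from the discussion: an empty key length implicitly means 2048.
DEFAULT_KEYLENGTH = "2048"


def naive_needs_new_key(stored, requested):
    """The failing comparison: "" and "2048" look different."""
    return stored != requested


def normalize_keylength(value):
    """Map an empty or quoted value onto an explicit key length string."""
    value = (value or "").strip().strip('"')
    return value or DEFAULT_KEYLENGTH


def needs_new_key(stored, requested):
    """Compare key lengths only after both sides are normalized."""
    return normalize_keylength(stored) != normalize_keylength(requested)


if __name__ == "__main__":
    # An old config with no stored key length, renewed by a newer client
    # that requests 2048 explicitly.
    print(naive_needs_new_key("", "2048"))
    print(needs_new_key("", "2048"))
```

Manually setting the stored key length to 2048 in the existing config files has the same effect as the normalization step: both sides of the comparison become explicit.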
clarkb#topic Open Discussion20:00
ianwi'll have to page it back in, but we can remove some config and it will regenerate20:00
clarkbReally quickly before we end is there anything important that was missed?20:00
fungijust a heads up that we had another gerrit user end up with a changed lp/uo openid, so i retired their old account and scrubbed the e-mail address so that could be reused with a new account20:00
clarkbfrickler mentioned fedora and opensuse mirroring are broken20:00
clarkbI believe opensuse mirroring is broken on OBS mirroring. I suspect we'll just stop mirroring that content and fixing opensuse isn't urgent20:01
clarkbbut the fedora mirroring may merit a closer look20:01
ianwi saw that, i'll take a look.  i have a change out still to prune some of the fedora mirrors too20:01
clarkbya I think I +2'd that one20:01
ianw#link https://review.opendev.org/c/opendev/system-config/+/83763720:01
ianwyeah i might  try fixing this and then merge that while i'm looking at it20:02
clarkbsounds good20:03
clarkbthank you everyone!20:03
fungithanks clarkb!20:03
clarkbwe'll be here next week same time and location20:03
frickleroh, then I'll revoke my +w20:03
clarkb#endmeeting20:03
opendevmeetMeeting ended Tue May 10 20:03:20 2022 UTC.  Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4)20:03
opendevmeetMinutes:        https://meetings.opendev.org/meetings/infra/2022/infra.2022-05-10-19.01.html20:03
opendevmeetMinutes (text): https://meetings.opendev.org/meetings/infra/2022/infra.2022-05-10-19.01.txt20:03
opendevmeetLog:            https://meetings.opendev.org/meetings/infra/2022/infra.2022-05-10-19.01.log.html20:03
