clarkb | Anyone else here for the meeting? | 19:00 |
clarkb | We will get started momentarily | 19:00 |
fungi | ahoy! | 19:00 |
clarkb | #startmeeting infra | 19:01 |
opendevmeet | Meeting started Tue May 10 19:01:24 2022 UTC and is due to finish in 60 minutes. The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot. | 19:01 |
opendevmeet | Useful Commands: #action #agreed #help #info #idea #link #topic #startvote. | 19:01 |
opendevmeet | The meeting name has been set to 'infra' | 19:01 |
clarkb | #link https://lists.opendev.org/pipermail/service-discuss/2022-May/000335.html Our Agenda | 19:01 |
ianw | o/ | 19:01 |
frickler | \o | 19:01 |
clarkb | I'm going to add a couple of items to this agenda at the end due to things that came up over the last couple of days. But shouldn't add too much time | 19:01 |
clarkb | #topic Announcements | 19:01 |
clarkb | We are about a month away from the Open Infra Summit. Mostly a heads up that that is happening since it's been a while since there was an in person event, which tends to impact us in weird ways | 19:02 |
clarkb | Etherpad will likely see more usage for example and CI will be quiet. Though maybe not because who knows what in person events will be like these days | 19:02 |
clarkb | Anyway that runs June 7-9 | 19:02 |
clarkb | I'm not sure I'm ready to experience jet lag again | 19:03 |
clarkb | #topic Actions from last meeting | 19:04 |
clarkb | #link http://eavesdrop.openstack.org/meetings/infra/2022/infra.2022-05-03-19.01.txt minutes from last meeting | 19:04 |
clarkb | I'm back to thinking of dropping this portion of the meeting again :) No actions to call out | 19:04 |
clarkb | #topic Topics | 19:04 |
clarkb | #topic Improving OpenDevs CD throughput | 19:05 |
clarkb | A few of us had a good discussion yesterday. I'm not publicly writing notes down anywhere yet just because the idea was to be candid and it needs filtration | 19:05 |
clarkb | But there were good ideas around what sorts of things we might do post zuulv6 | 19:05 |
clarkb | feel free to PM me directly and I'm happy to fill individuals in | 19:06 |
clarkb | I'm hoping I'll have time to get to some of those improvements this week or next as well | 19:06 |
clarkb | But I think making those changes is a good idea before we dive into further zuul controls | 19:06 |
clarkb | but if anyone feels strongly the other way let me know | 19:06 |
clarkb | Anything else on this topic? | 19:07 |
fungi | not from me | 19:07 |
clarkb | #topic Container Maintenance | 19:08 |
clarkb | I was debugging the etherpad database stuff yesterday (we'll talk about that in depth later in the meeting) and noticed that the mariadb containers take an env flag to check for necessary upgrades on the db content during startup | 19:08 |
clarkb | One of the items on the container maintenance list is mariadb upgrades. I half suspect that performing those may be as simple as setting the flag in our docker-compose.yaml files and then updating the mariadb image version. It isn't clear to me if we have to step through versions one by one though and we should still test it | 19:09 |
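The flag clarkb mentions is most likely the official mariadb image's `MARIADB_AUTO_UPGRADE` variable; a minimal docker-compose sketch under that assumption (service names and paths are illustrative, not our production config):

```yaml
services:
  mariadb:
    image: mariadb:10.11          # bump this tag to move to a newer series
    environment:
      # assumption: this is the upgrade-check flag; it makes the image's
      # entrypoint run mariadb-upgrade against the data dir at startup
      MARIADB_AUTO_UPGRADE: 1
    volumes:
      - /var/etherpad/db:/var/lib/mysql   # illustrative host path
```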
clarkb | But that was a good discovery and I think it gives us a straightforward path towards running newer mariadb in our containers | 19:09 |
clarkb | If anyone wants to look into that further I suspect the gerrit review upgrade job is a good place to start because we can modify the mariadb version at the same time we upgrade gerrit and check that it updates properly | 19:10 |
clarkb | there is also some content in that db for reviewed file flags iirc so not just upgrading an empty db | 19:11 |
clarkb | That was all I had on this topic. | 19:11 |
clarkb | #topic Spring cleaning old reviews | 19:11 |
clarkb | thank you everyone for the topic:system-config-cleanup reviews. I think the only one left in there has some ongoing debugging (gerrit's diff timeout and how stackalytics triggers it) | 19:12 |
ianw | clarkb: ^ that's good info, and i owe a checklist for the 3.5 upgrade, with downgrade instructions from previous weeks that i haven't gotten to. i will add that there and think about it more | 19:12 |
clarkb | ianw: fwiw I don't think upgrading mariadb when we upgrade gerrit is necessarily how we want to approach it in prod. But the testing gives us a good framework to exercise mariadb upgrades too | 19:13 |
clarkb | system-config is down to under 150 open changes. I think we've trimmed the list by about 50% since we started | 19:13 |
clarkb | When I have time I'll continue to try and review things and update changes that should be resurrected. Abandon those we don't need, etc | 19:13 |
clarkb | Despite not being 10 open changes like we probably all want this is still really good progress. Thanks everyone for the help | 19:14 |
clarkb | #topic Automating the build process of our PPAs | 19:14 |
clarkb | ianw has recently been looking at automating the build process for our packages hosted in our PPAs | 19:14 |
clarkb | specifically openafs and vhd-util | 19:15 |
clarkb | ianw: maybe you can give us a rundown on what that looks like? | 19:15 |
clarkb | I am excited for this as I always find deb packaging to involve secret magic and it requires a lot of effort for me to figure it out from scratch each time. Having it encoded in the automated tooling will help a lot with that | 19:15 |
ianw | yes when adding jammy support i decided that manually uploading was a bit of a gap in our gitops | 19:16 |
fungi | also with the way the repos for these packages got organized, you can mostly follow the recommended distro package building workflows with them too | 19:17 |
ianw | we have two projects to hold the .deb source generation bits, and now we automatically upload them to ubuntu PPA for building | 19:17 |
ianw | #link https://opendev.org/opendev/infra-openafs-deb | 19:17 |
ianw | #link https://opendev.org/opendev/infra-vhd-util-deb | 19:17 |
fungi | branch-per-distro-release | 19:18 |
ianw | they run jobs that are defined in openstack-zuul-jobs and are configured in project-config (because it's a promote job, it needs a secret, so the config needs to be in a trusted repo) | 19:18 |
clarkb | I don't think the secret needs to be in a trusted repo? It just needs to run in a post-merge context? | 19:19 |
ianw | #link https://opendev.org/openstack/openstack-zuul-jobs/src/branch/master/zuul.d/jobs.yaml#L1397 | 19:19 |
fungi | yeah, we could also pass-to-parent the secret if being used by a playbook in another repo | 19:19 |
fungi | similar to how some other publication jobs are set up to use project-specific creds | 19:20 |
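The pass-to-parent approach fungi describes could look roughly like this in Zuul config; all names here are hypothetical, and the encrypted secret payload is elided:

```yaml
# Sketch: the secret lives in the deb repo itself rather than a
# trusted repo, and is handed up to a parent job defined elsewhere.
- secret:
    name: infra-openafs-deb-ppa      # hypothetical secret name
    data:
      ssh_key: !encrypted/pkcs1-oaep |
        # encrypted blob elided

- job:
    name: infra-openafs-deb-upload   # hypothetical job name
    parent: build-deb-and-upload-ppa # hypothetical parent with the playbook
    secrets:
      - name: ppa_credentials
        secret: infra-openafs-deb-ppa
        pass-to-parent: true         # lets the parent's playbook use it
```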
ianw | yep, i think that could work too | 19:20 |
ianw | openstack-zuul-jobs isn't really the best place for this, but it was where we have extant rpm build jobs | 19:20 |
ianw | those jobs publish to locations with openstack/* in them, so moving them is a multi-step process (but possible) | 19:21 |
clarkb | ya so I guess we're combining the desire to have ozj manage the package build directions but want to run the job on events to the actual deb repos | 19:21 |
fungi | longer term it would be nice to do it out of the opendev tenant instead, but what's there is working great | 19:21 |
ianw | right, the jobs are generic enough to build our debs, but not generic enough that i feel like they belong in zuul-jobs | 19:21 |
clarkb | My only concern doing it that way is that ozj has a separate set of reviewers, but currently that set isn't anything I'd worry about | 19:22 |
fungi | yeah, o-z-j was intended to be openstack-specific central job configuration | 19:22 |
clarkb | just something we'll have to keep in mind if adding people to ozj | 19:22 |
fungi | and this isn't really openstack-focused it's more for opendev | 19:22 |
ianw | anyway, so i've built everything in a testing environment and it all works | 19:22 |
ianw | #link https://review.opendev.org/c/openstack/openstack-zuul-jobs/+/840572 | 19:22 |
clarkb | fungi: right, maybe for that reason we should put the jobs the deb repos themselves? | 19:23 |
ianw | is the final change that pushes things to the "production" PPA (i.e. the ones that are consumed by production jobs, and nodepool for dib, etc.) | 19:23 |
clarkb | and have them manage their own secrets | 19:23 |
ianw | there's one secret for both ppa's, they're not separate? | 19:24 |
clarkb | ianw: each repo can encode that secret in the repo I mean | 19:24 |
clarkb | it will be two zuul secrets possibly of the same data | 19:24 |
fungi | right, we could put the job configuration in system-config instead of o-z-j but i'm not all that worried about it in the near term | 19:24 |
clarkb | but ya I'm also not worried about it in the short term. More of a "it would be good to lay this out without relying on openstack core reviewers being trustworthy for package builds too" | 19:25 |
ianw | clarkb: yeah, putting in repo became a bit annoying because of the very branchy nature of it. currently there's nothing on the master branch, and we have a distro branch | 19:25 |
clarkb | ah in that case using system-config to centralize things is probably fine | 19:25 |
clarkb | basically do what you are doing with ozj but put it in system-config? | 19:26 |
clarkb | then the overlap with people managing packages matches people managing config management | 19:26 |
ianw | system-config is probably a more natural home for both the rpm and deb generation | 19:27 |
ianw | i feel like the openafs rpm generation ended up there because it's closely tied to the wheel generation, where the wheel builders copy back their results to afs | 19:28 |
clarkb | Regardless of where things end up longer term I'm very excited for this as it will help my noob packaging brain when we need to do updates :) | 19:28 |
ianw | all the wheel jobs probably belong somewhere !o-z-j too, now | 19:28 |
fungi | yeah, this is awesome work | 19:28 |
clarkb | ianw: ya but I think we can have the wheels consume from the ppa too then it doesn't really matter where the ppa itself is managed | 19:28 |
ianw | yep that's right, but i just think that it was how it ended up there, years and years ago | 19:29 |
clarkb | gotcha | 19:29 |
clarkb | alright anything else on this topic? | 19:30 |
ianw | anyway, so it sounds like we're generally happy if i merge 840572 at some point today and switch production | 19:30 |
clarkb | ya then plan for reconfiguring the home of the job and secret longer term. | 19:30 |
ianw | then i'll clean up the testing ppas, and add some documentation to system-config | 19:30 |
ianw | and yeah, make todo notes about the cleanup | 19:30 |
clarkb | That might be a bit ugly to do but I bet enough has merged now that not merging the other change won't help anything | 19:30 |
clarkb | so proceeding is fine with me | 19:30 |
clarkb | thanks again, this is exciting | 19:31 |
clarkb | #topic Jammy Support | 19:31 |
clarkb | I think the last major piece missing from our Jammy support in CI is the wheel mirrors. The previous topic is ensuring we can build those so we should be on track for getting this solved. | 19:31 |
clarkb | Is there anything else related to Jammy that we need to address? Did dib handle the phased package updates on jammy yet? | 19:32 |
ianw | oh, no, sorry, i got distracted on that | 19:32 |
ianw | that's on the todo list too | 19:33 |
fungi | we put a short term hack in place for us though | 19:33 |
clarkb | ya opendev is fine. More considering that DIB users probably want a similar fix then we can revert our local fix | 19:33 |
clarkb | I've been trying to encourage others to push fixes as far upstream as possible so am taking my own advice :) | 19:33 |
clarkb | Please do let us know if you find new Jammy behaviors that are unexpected or problematic. We've already run into at least one (phased package updates) | 19:34 |
clarkb | but I guess for now we're good with Jammy and can move on | 19:34 |
clarkb | #topic Keycloak container image updates | 19:34 |
frickler | we'll also have to tackle the reprepro issue at one point | 19:34 |
clarkb | ++ | 19:34 |
clarkb | probably by upgrading the mirror-update node to jammy | 19:34 |
fungi | remind me what the reprepro issue is again? libzstd? | 19:35 |
clarkb | #undo | 19:35 |
opendevmeet | Removing item from minutes: #topic Keycloak container image updates | 19:35 |
clarkb | fungi: ya that | 19:35 |
frickler | well it doesn't seem jammy fixes it | 19:35 |
clarkb | frickler: oh I missed that | 19:35 |
fungi | interesting | 19:36 |
clarkb | frickler: has upstream reprepro commented further on the issue? last I checked the bug was pretty bare | 19:36 |
clarkb | #link https://bugs.launchpad.net/ubuntu/+source/reprepro/+bug/1968198 Reprepro jammy issue | 19:36 |
frickler | oh wait, maybe I'm mixing that up with reprepro vs. phased updates | 19:36 |
clarkb | oh yes reprepro didn't support phased updates either | 19:37 |
clarkb | but for our purposes that is fine. Definitely something upstream reprepro will likely need to fix though | 19:37 |
clarkb | (since we are telling our nodes to ignore phased updates and just update to latest right away) | 19:37 |
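The transcript doesn't show the exact short-term hack, but the usual way to tell apt to skip phasing is an apt.conf.d drop-in; a sketch under that assumption (file path illustrative):

```
# /etc/apt/apt.conf.d/99-no-phased-updates  (path illustrative)
# Ignore Ubuntu's phased-update rollout and always install the
# newest available package version immediately.
APT::Get::Always-Include-Phased-Updates "true";
```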
fungi | but luckily we don't need reprepro to support phased updates right now | 19:37 |
fungi | right, that | 19:37 |
ianw | is there a debian bug? i wonder what the relationship to ubuntu w.r.t. to reprepro is | 19:38 |
frickler | still the libzstd related issue remains | 19:38 |
frickler | https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=984690 is for phased updates | 19:38 |
clarkb | ianw: I'm not sure google produced the bug I found above | 19:38 |
fungi | right, the suspicion is that by upgrading the mirror-update server to jammy, reprepro will have new enough libzstd and no longer error on some packages | 19:39 |
frickler | can we deploy a second server to test that or do we need to upgrade inplace? | 19:40 |
clarkb | frickler: we can deploy a second server and have it only run jammy updates and stop it on the original. Everything is written to afs which both servers can mount at the same time just shouldn't write to at the same time | 19:40 |
fungi | i think as long as we coordinate locks between them, we can have more than one server in existence | 19:40 |
clarkb | then if we're happy with it tell the new server to run all of mirror updates and stop the old server | 19:41 |
frickler | sounds like a plan | 19:41 |
ianw | yeah coordinate locks between them might be easier said than done | 19:42 |
clarkb | ya I was thinking more like remove cron jobs from one server entirely and add them to the other server | 19:42 |
clarkb | since a reboot will kill the lock | 19:42 |
ianw | it's certainly possible, but the ansible is written to put all cron jobs on both hosts | 19:42 |
clarkb | better to just not attempt the sync in the first place | 19:42 |
frickler | just moving ubuntu mirroring should be independent of the other mirrors at least | 19:42 |
clarkb | ianw: We should be able to modify that with a when clause and check the distro or something | 19:42 |
clarkb | when: distro == jammy and when distro != jammy | 19:43 |
clarkb | it's a bit of work but doable | 19:43 |
ianw | yep, it would probably be a good ansible enhancement to have a config dict that said where each job went based on hostname | 19:43 |
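ianw's config-dict idea could be sketched as an Ansible task like the following; every variable, job, and host name here is hypothetical, not the actual mirror-update role:

```yaml
# Sketch: a dict mapping each mirror cron job to the host that should
# run it, so jobs can be split between an old and a new server (or all
# put on one host in testing) without duplicating the role.
- name: Install mirror update cron jobs
  cron:
    name: "{{ item.key }}"
    job: "{{ item.value.command }}"
    minute: "{{ item.value.minute | default('0') }}"
  loop: "{{ mirror_jobs | dict2items }}"
  # only install a job on the host the dict assigns it to
  when: item.value.host == inventory_hostname
  vars:
    mirror_jobs:
      reprepro-ubuntu:
        command: /usr/local/bin/reprepro-mirror-update ubuntu  # hypothetical
        host: mirror-update02.opendev.org                      # hypothetical
```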
fungi | ianw: yeah, i didn't mean make them capable of automatically coordinating locks, just that we would make manual changes on the servers to ensure only one writing at a time | 19:43 |
fungi | but getting more fancy could certainly be cool too | 19:44 |
ianw | that way it could be all deployed on one host when we test, and split up easily in production | 19:44 |
ianw | another todo list item. | 19:44 |
ianw | anyway, i can look at getting jammy nodes going as we bring up the openafs bits over the next day or two | 19:44 |
clarkb | thanks | 19:45 |
clarkb | #topic Keycloak container image updates | 19:45 |
clarkb | Ok I noticed recently that keycloak had stopped publishing to dockerhub and is only publishing to quay.io now | 19:45 |
clarkb | We updated our deployment to switch to quay.io and that will pull a newer image as well. I didn't check if it ended up auto updating the service yet. I should do that | 19:46 |
corvus | 08593cb05ab0 quay.io/keycloak/keycloak:legacy "/opt/jboss/tools/do…" 26 hours ago Up 26 hours keycloak-docker_keycloak_1 | 19:46 |
clarkb | looks like yes it restarted the container | 19:46 |
clarkb | assuming it still works then that transition went well. | 19:46 |
corvus | i just logged into zuul | 19:47 |
clarkb | The other thing to call out here is the :legacy tag. The legacy tag means keycloak + wildfly. Their non legacy images do not use wildfly anymore and appear to be more configurable. However, our testing shows we can't just do a straight conversion from legacy to non legacy | 19:47 |
clarkb | It would probably be worthwhile to dig into the new non legacy image setup to see if that would work better for us. I know one reason people are excited for it is the image is about half the size of the legacy wildfly image. Though I don't think that is a major problem for us | 19:47 |
corvus | the only non-standard thing we're doing there is adjusting the listening port so it works with host networking | 19:47 |
corvus | hopefully it's straightforward to figure out how to do that with whatever replaced it | 19:48 |
clarkb | ah ok so maybe we need to adjust that in a different way since wildfly isn't doing the listening | 19:48 |
clarkb | ++ | 19:48 |
corvus | s/listening port/listening addr/ | 19:48 |
clarkb | I mostly wanted to bring this up as I'm not the best person to test wildfly currently (I should fix that but ETOOMANYTHINGS) | 19:48 |
clarkb | sounds like we're good currently and if we can update the image and make CI happy we can probably convert to the new thing as well | 19:49 |
corvus | ++ | 19:49 |
clarkb | #topic Etherpad Database Problems | 19:49 |
clarkb | Yesterday we upgraded Etherpad to 1.8.18 from 1.8.17. And it promptly broke on a database settings issue | 19:50 |
clarkb | The problem is that our schema's default charset and collation method were set to utf8 and not utf8mb4 and utf8mb4_bin respectively | 19:50 |
clarkb | 1.8.18 updated its ueberDB dependency, which changed how it does logging, and it was actually the logging that crashed everything | 19:50 |
fungi | (even though the table we're using was set to those instead) | 19:50 |
clarkb | Previously it seems that etherpad logged this issue but then continued because the db table itself was fine | 19:51 |
fungi | also i have to say, it's kinda silly that they check the database default character set and collation instead of the table being used | 19:51 |
clarkb | but this update broke logging in that context which caused the whole thing to crash. To work around this we updated the default charset and collation to what etherpad wants. This shouldn't affect anything as it only affects CREATE TABLE (table is already created), LOAD DATA (we don't load data out of band, etherpad writes to the db itself), and stored routines, of which I don't think there are any | 19:51 |
clarkb | #link https://github.com/ether/ueberDB/issues/266 | 19:52 |
clarkb | I filed this issue upstream calling out the problem | 19:52 |
clarkb | hopefully they can debug why their logging changes cause the whole thing to crash and fix it, but we should be good going forward. Our CI didn't catch it because mariadb must default to utf8mb4 and utf8mb4_bin when creating new databases | 19:52 |
clarkb | our old schema from the land before time didn't have that goodness so it broke in prod | 19:52 |
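The out-of-band change described above amounts to a single statement; a sketch, with the database name assumed:

```sql
-- Change only the schema-level defaults (used by CREATE TABLE, LOAD
-- DATA, and stored routines); existing tables keep their own settings.
ALTER DATABASE etherpad CHARACTER SET utf8mb4 COLLATE utf8mb4_bin;
```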
clarkb | But please do call out any utf8 issues with etherpad if you notice them | 19:53 |
clarkb | #topic acme.sh failing to issue new certs | 19:53 |
clarkb | This was the thing I debugged this morning. tl;dr is that this is another weird side-effect problem in the tool, like etherpad's | 19:53 |
clarkb | specifically acme.sh is trying to create new keys for our certs because it thinks the key size we are requesting doesn't match the existing key size | 19:53 |
clarkb | but that is because they are transitioning from an implicit empty key size meaning 2048 to explicitly stating the key size is 2048, and the comparison between "" and 2048 fails, triggering new key creation | 19:54 |
clarkb | #link https://github.com/acmesh-official/acme.sh/issues/4077 wrote an upstream issue | 19:54 |
clarkb | #link https://github.com/acmesh-official/acme.sh/pull/4078 and pushed a potential fix | 19:54 |
clarkb | I think we can go manually edit the key config files and set Le_Keylength to 2048 or "2048" and acme.sh would continue to function. Similar to out of band changing db charset | 19:55 |
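The failure mode can be illustrated with a few lines of shell (this is not acme.sh's actual code, just the shape of the bug): an empty stored key length from an old config file never string-compares equal to the new explicit default, so a fresh key gets generated.

```shell
#!/bin/sh
# Le_Keylength in a config written by an older acme.sh is empty,
# implicitly meaning 2048.
stored_keylength=""
# Newer acme.sh compares against an explicit default instead.
requested_keylength="2048"

if [ "$stored_keylength" != "$requested_keylength" ]; then
  echo "key length changed, creating new key"
else
  echo "reusing existing key"
fi
```

The manual fix clarkb describes is simply making the stored value explicit (e.g. editing the per-domain conf to say `Le_Keylength='2048'`) so the comparison matches.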
clarkb | but I think that the PR I pushed upstream should also address this. One thing that isn't clear to me is how to test this | 19:55 |
ianw | ++ thanks for that. the other option is to pull an acme.sh with your change in it | 19:55 |
clarkb | ianw: ^ maybe you have ideas on testing? | 19:55 |
clarkb | ianw: oh we can point it to my fork on github I guess? I was wondering about that because I think ansible always updates acme.sh currently so I can't make local edits. I could locally edit them and manually run the issue command but that won't tie into the whole dns syncing, which would only half test it | 19:56 |
ianw | aiui it's really something that's occurring on hosts that already have a cert and are renewing? | 19:56 |
clarkb | ianw: correct | 19:57 |
fungi | yes | 19:57 |
clarkb | new hosts should be fine since they will write out the 2048 key length in their configs from the start | 19:57 |
ianw | that's two parts of the ci that aren't really tested :/ we test the general issuance, but never actually get the key | 19:57 |
clarkb | I think for a day or two we can probably live with waiting on upstream to accept or reject my change (and they have their own CI system) | 19:58 |
fungi | but also we don't test upgrading from a key written by acme.sh to a newer acme.sh version | 19:58 |
clarkb | but after that we should maybe consider manually editing the file and then letting a prod run run? | 19:58 |
fungi | and yeah, we have nearly a month before this becomes critical | 19:58 |
clarkb | *manually edit the config file to set Le_Keylength | 19:58 |
ianw | yeah, we could add some ansible to rm the autogenerated config file (and then leave behind a .stamp file so that it doesn't run again until we want it to) and then it *should* just reissue | 19:59 |
clarkb | its mostly that I don't want stuff like this to pile up against all the summit prep and travel that I'm going to be busy with in a week or two :) | 19:59 |
clarkb | ianw: will it not require a --force in that case? | 19:59 |
ianw | that might be a good enhancement to the ansible anyway, give us a flag that we can set that says "ok, reissue everything on the next run" | 19:59 |
clarkb | because that is the other option. We could set --force and remove it in the future | 19:59 |
clarkb | but ya we don't have to solve that in this meeting and we are at time | 20:00 |
clarkb | #topic Open Discussion | 20:00 |
ianw | i'll have to page it back in, but we can remove some config and it will regenerate | 20:00 |
clarkb | Really quickly before we end is there anything important that was missed? | 20:00 |
fungi | just a heads up that we had another gerrit user end up with a changed lp/uo openid, so i retired their old account and scrubbed the e-mail address so that could be reused with a new account | 20:00 |
clarkb | frickler mentioned fedora and opensuse mirroring are broken | 20:00 |
clarkb | I believe opensuse mirroring is broken on OBS mirroring. I suspect we'll just stop mirroring that content and fixing opensuse isn't urgent | 20:01 |
clarkb | but the fedora mirroring may merit a closer look | 20:01 |
ianw | i saw that, i'll take a look. i have a change out still to prune some of the fedora mirrors too | 20:01 |
clarkb | ya I think I +2'd that one | 20:01 |
ianw | #link https://review.opendev.org/c/opendev/system-config/+/837637 | 20:01 |
ianw | yeah i might try fixing this and then merge that while i'm looking at it | 20:02 |
clarkb | sounds good | 20:03 |
clarkb | thank you everyone! | 20:03 |
fungi | thanks clarkb! | 20:03 |
clarkb | we'll be here next week same time and location | 20:03 |
frickler | oh, then I'll revoke my +w | 20:03 |
clarkb | #endmeeting | 20:03 |
opendevmeet | Meeting ended Tue May 10 20:03:20 2022 UTC. Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4) | 20:03 |
opendevmeet | Minutes: https://meetings.opendev.org/meetings/infra/2022/infra.2022-05-10-19.01.html | 20:03 |
opendevmeet | Minutes (text): https://meetings.opendev.org/meetings/infra/2022/infra.2022-05-10-19.01.txt | 20:03 |
opendevmeet | Log: https://meetings.opendev.org/meetings/infra/2022/infra.2022-05-10-19.01.log.html | 20:03 |