Tuesday, 2022-11-15

18:59 <clarkb> almost meeting time.
19:00 <fungi> indeed it be
19:01 <clarkb> #startmeeting infra
19:01 <opendevmeet> Meeting started Tue Nov 15 19:01:01 2022 UTC and is due to finish in 60 minutes.  The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot.
19:01 <opendevmeet> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
19:01 <opendevmeet> The meeting name has been set to 'infra'
19:01 <ianw> o/
19:01 <clarkb> #link https://lists.opendev.org/pipermail/service-discuss/2022-November/000379.html Our Agenda
19:01 <clarkb> #topic Announcements
19:01 <clarkb> I had no announcements
19:01 <clarkb> #topic Topics
19:01 <fungi> there are a couple
19:01 <clarkb> #undo
19:01 <opendevmeet> Removing item from minutes: #topic Topics
19:01 <clarkb> go for it
19:01 <fungi> nominations for the openinfra foundation board of directors are open
19:02 <fungi> and the cfp for the openinfra summit in vancouver is now open as well
19:02 <fungi> #link https://lists.openinfra.dev/pipermail/foundation/2022-November/003104.html 2023 Open Infrastructure Foundation Individual Director nominations are open
19:02 <fungi> #link https://lists.openinfra.dev/pipermail/foundation/2022-November/003105.html The CFP for the OpenInfra Summit 2023 is open
19:03 <fungi> that's all i can think of though
19:03 <clarkb> #topic Bastion Host Updates
19:03 <clarkb> #link https://review.opendev.org/q/topic:prod-bastion-group
19:03 <clarkb> #link https://review.opendev.org/q/topic:bridge-ansible-venv
19:04 <clarkb> looks like a few changes have merged since we last discussed this. ianw anything urgent or otherwise not captured by the two change topics that we should look at?
19:04 <clarkb> One idea I had was maybe we should consolidate to a single topic for review even if there are distinct trees of change happening?
19:05 <ianw> yeah i can clean up; i think prod-bastion-group is really now about being in a position to run parallel jobs
19:05 <ianw> which is basically "setup source in one place, then fire off jobs"
19:06 <clarkb> ah so maybe another topic for "things we need to do before turning off the old server"?
19:06 <ianw> the bridge-ansible-venv one i'll get back to is storing the host keys for our servers and deploying them to /etc/ssh
19:07 <ianw> fungi had some good points on making that better, so that's wip, but i'll get to that soon
19:07 <ianw> (the idea being that when we start a new bridge, i'm trying to make it so we have as few manual steps as possible :)
19:08 <clarkb> ++
19:08 <ianw> so writing down the manual steps has been a good way to try and think of ways to codify them :)
19:08 <fungi> it's a good approach, but gets weird if you don't include all ip addresses along with the hostnames, and we have that info in the inventory already
19:08 <ianw> the only other one is
19:08 <ianw> #link https://review.opendev.org/c/opendev/system-config/+/861284
19:08 <ianw> which converts our launch-node into a small package installed in a venv on the bridge node
19:09 <clarkb> and that should address our paramiko needs?
19:09 <clarkb> I'll have to take a look at that
19:09 <fungi> i'm going to need to launch a new server for mm3 this week probably, so will try to give that change a closer look
19:10 <clarkb> Great, I'll have need of that too for our next topic
19:10 <clarkb> #topic Upgrading old Servers
19:10 <ianw> yep, it fixes that issue, and i think is a path to help with openstacksdk versions too
19:10 <ianw> if we need two venvs with different versions -- well that's not great, but at least possible
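
A minimal sketch of the per-tool venv idea ianw describes: build an isolated virtualenv on the bridge host and pip-install a pinned openstacksdk into it, one venv per required version. The paths and version pins below are illustrative assumptions, not what change 861284 actually does.

    #!/usr/bin/env python3
    # Sketch only: create an isolated venv and install pinned requirements
    # into it, so two venvs can carry two different openstacksdk versions.
    # Paths and version pins are illustrative, not taken from change 861284.
    import subprocess
    import venv
    from pathlib import Path

    def build_tool_venv(path: Path, *requirements: str) -> None:
        """Create a venv at `path` and install the given requirements."""
        venv.EnvBuilder(with_pip=True, clear=True).create(path)
        subprocess.run([str(path / "bin" / "pip"), "install", *requirements],
                       check=True)

    if __name__ == "__main__":
        # Hypothetical: one venv with an older sdk pin, one tracking latest.
        build_tool_venv(Path("/opt/launcher-venv-old"), "openstacksdk<0.99")
        build_tool_venv(Path("/opt/launcher-venv-new"), "openstacksdk")
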
19:11 <clarkb> #link https://etherpad.opendev.org/p/opendev-bionic-server-upgrades
19:11 <clarkb> later this week I'm hoping to put a dent into some of these. I think we've basically sorted out all of the logistical challenges for doing jammy things once 861284 has landed
19:11 <clarkb> so ya I'll try to review that and continue to make progress here
19:12 <clarkb> I don't think there is much else to say about this now. We've hit the point where we just need to start doing the upgrades.
19:13 <clarkb> #topic Mailman 3
19:13 <clarkb> fungi has been pushing on this again. We ran into some problems with updated images that I think should be happier now
19:13 <fungi> in preparing to do a final round of import tests, i got stuck on upstream updates to the container image tag we were tracking which slightly broke our setup automation
19:13 <fungi> and fixing those in turn broke our forked images
19:14 <fungi> which i think i have sorted (waiting for zuul to confirm)
19:14 <clarkb> the good news is that this is giving us a sneak preview of the sorts of changes that need to be made for mm3 updates
19:14 <clarkb> I wouldn't say the story there is great, but it should at least be doable
19:14 <fungi> but pending successful repeat of a final round of migration testing, i hope to boot a future mm3 prod server later this week and send announcements for migration maintenance maybe?
19:15 <clarkb> the main issue being the way mailman deals with dependencies and needing specific versions of stuff
19:15 <clarkb> fungi: ++
19:15 <fungi> yeah, it's dependency hell, more or less
19:16 <fungi> anyway, might be worth looking at the calendar for some tentative dates
19:16 <fungi> i don't expect we need to provide a ton of advance notice for lists.opendev.org and lists.zuul-ci.org, but a couple of weeks heads-up might be nice
19:16 <fungi> which puts us in early december at the soonest
19:16 <clarkb> that seems reasonable
19:16 <ianw> ++
19:17 <fungi> based on my tests so far, importing those two sites plus dns cut-over is all doable inside of an hour
19:17 <fungi> lists.openstack.org will need a few hours minimum, but i won't want to tackle that until early next year
19:18 <fungi> should i be looking at trying to do the initial sites over a weekend, or do we think a friday is probably acceptable?
19:18 <clarkb> I think weekdays should be fine. Both lists are quite low traffic
19:18 <fungi> thinking about friday december 2, or that weekend (3..4)
19:19 <clarkb> the second is bad for me, but I won't let that stop you
19:19 <fungi> assuming things look good by the end of this week i can send a two-week warning on friday
19:20 <fungi> could shoot for friday december 9 instead, if that works better for folks, but that's getting closer to holidays
19:20 <fungi> i expect to be travelling at the end of december, so won't be around a computer as much
19:20 <corvus> i think almost no notice is required and you should feel free to do it at your convenience
19:21 <fungi> yeah, from a sending and receiving messages standpoint it should be essentially transparent
19:21 <corvus> (low traffic lists + smtp queuing)
19:22 <fungi> for people doing list moderation, the webui will be down, and when it comes back for them it will need a new login (they'll be able to get into their accounts via the password recovery steps)
19:23 <fungi> that's really the biggest impact. that and some unknowns around how dkim signatures on some folks' messages may stop validating if the new mailman alters posts in ways that v2 didn't
19:24 <frickler> maybe we should have a test ml where people concerned about their mail setup could test this?
19:24 <clarkb> frickler: unfortunately that would require setting up an entirely new domain
19:25 <clarkb> I think if we didn't have to do so much on a per-domain basis this would be easier to test and transition
19:25 <clarkb> but starting with low traffic domains is a reasonable stand-in
19:25 <clarkb> anyway I agree. I don't think a ton of notice is necessary. But sending some notice is probably a good idea as there will be user facing changes
19:25 <frickler> not upfront, but add something like test@lists.opendev.org
19:25 <fungi> would require a new domain before the migration, but could be a test ml on a migrated domain later if people want to test through it without bothering legitimate lists with noise posts
19:25 <clarkb> frickler: oh I see, ya not a bad idea
19:26 <fungi> we've had a test ml in the past, but dropped it during some domain shuffles in recent years
19:26 <fungi> i have no objection to adding one
19:26 <frickler> so that people from the large lists could test before those get moved
19:27 <clarkb> anything else on this topic or should we move on?
19:27 <fungi> if we're talking fridays, the next possible friday would be november 25, but since the 24th is a holiday around here i'm probably not going to be around much on the 25th
19:27 <clarkb> fungi: maybe we should consider a monday instead? and do the 7th or 28th?
19:29 <ianw> it's easier for me to help on a monday, but also i don't imagine i'd be much help ;)
19:29 <fungi> mondays mean smaller tolerance for extra downtime and more people actively trying to use stuff
19:29 <fungi> but fridays mean subtle problems may not get spotted for days
19:29 <fungi> so it's hard to say which is worse
19:29 <clarkb> right, but as corvus points out the risk there is minimal
19:30 <clarkb> for openstack that might change, but for opendev.org and zuul-ci.org I think it is fine?
19:30 <fungi> i have a call at 17:00 utc on the 28th but am basically open otherwise
19:30 <fungi> i can give it a shot. later in the day also means ianw is on hand
19:31 <fungi> or earlier in the day for frickler
19:31 <clarkb> and we can always reschedule if we get closer and decide that timing is bad
19:32 <fungi> yeah, i guess we can revisit in next week's meeting for a go/no-go
19:32 <clarkb> sounds good
19:32 <fungi> no need to burn more time on today's agenda
19:32 <clarkb> #topic Updating python base images
19:32 <clarkb> #link https://review.opendev.org/c/opendev/system-config/+/862152
19:32 <clarkb> This change has the reviews it needs. Assuming nothing else comes up tomorrow I plan to merge it then rebuild some of our images as well
19:33 <clarkb> At this point this is mostly a heads up that image churn will occur but it is expected to largely be a noop
19:33 <clarkb> #topic Etherpad container log growth
19:33 <clarkb> #link https://review.opendev.org/c/opendev/system-config/+/864060
19:34 <clarkb> This change has been approved. I half expect we need to manually restart the container to have it change behavior though
19:34 <clarkb> I'll try to check on it after lunch today and can manually restart it at that point if necessary
19:34 <clarkb> This should make the etherpad service far more reliable which is great
19:35 <clarkb> #topic Quo vadis Storyboard
19:35 <clarkb> #link https://lists.opendev.org/pipermail/service-discuss/2022-October/000370.html
19:35 <clarkb> I did send that update to the mailing list last week explicitly asking for users that would like to keep using storyboard so that we can plan accordingly
19:35 <clarkb> (and maybe convince them to help maintain it :) )
19:36 <clarkb> I have not seen any movement on that thread since then though
19:36 <fungi> i've added a highlight of that topic to this week's foundation newsletter too, just to get some added visibility
19:36 <clarkb> It does look like the openstack TC is not interested in making any broad openstack wide decisions either. Which means it is unlikely we'll get openstack pushing in any single direction.
19:37 <clarkb> I think we should keep the discussion open for another week then consider feedback collected and use that to make decisions on what we should be doing
19:37 <clarkb> Any other thoughts or concerns about storyboard?
19:38 <fungi> i had a user request in #storyboard this morning, but it was fairly easily resolved
19:38 <fungi> duplicate account, colliding e-mail address
19:38 <fungi> deactivated the old account and removed the e-mail address from it, then account autocreation for the new openid worked
19:40 <clarkb> #topic Vexxhost server rescue behavior
19:40 <clarkb> I did more testing of this and learned a bit
19:40 <clarkb> For normally launched instances resolving the disk label collision does fix things.
19:41 <clarkb> For BFV instances melwitt pointed me at the tempest testing for bfv server rescues and in that testing they set very specific image disk type and bus options
19:41 <clarkb> I suspect that we need an image created with those properties properly set for the volume setup in vexxhost. Then theoretically this would work
19:42 <clarkb> In both cases I think that vexxhost should consider creating a dedicated rescue image. Possibly one for bfv and one for non bfv. But with labels set (or uuid used) and the appropriate flags
19:42 <clarkb> mnaser: ^ I don't think this is urgent, but it is also a nice feature to have. I'd be curious to know if you have any feedback on that as well
19:42 <corvus> it sounds like bfv was something we previously needed but don't anymore; should we migrate to non-bfv?
19:43 <clarkb> corvus: that is probably worth considering as well. I did that with the newly deployed gitea load balancer
19:43 <mnaser> i suggest sticking to bfv, non-bfv means your data is sitting on local storage
19:43 <mnaser> risks are if the hv goes poof the data might be gone, so if it's cattle then you're fine
19:43 <mnaser> but if it's a pet that might be a bit of a bad time
19:43 <clarkb> mnaser: I think for things like gitea and gerrit we would still mount a distinct data volume, but don't necessarily need the disk to be managed that way too. For the load balancer this is definitely a non issue
19:44 <mnaser> oh well in that case when you're factoring it in then you're good
19:44 <fungi> yeah, we would just deploy a new load balancer in a matter of minutes, it has no persistent state whatsoever
19:45 <clarkb> but also I think these concerns are distinct. Server rescue should work, particularly for a public cloud imo, so that users can fix things up themselves.
19:45 <clarkb> then whether or not we boot bfv is something we should consider
19:45 <ianw> definitely an interesting point to add to our launch node docs though?  mostly things are "cattle" but with various degrees of how annoying it would be to restore i guess
19:46 <clarkb> In any case I just wanted to give an update on what I found. rescue can be made to work for non bfv instances on our end and possibly for bfv as well but I'm unsure what to set those image property values to for ceph volumes
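
For context, the tempest coverage melwitt pointed at exercises nova's stable device rescue feature, which keys off the hw_rescue_device and hw_rescue_bus image properties. A hedged sketch of setting them with openstacksdk follows; the cloud name, image name, and chosen values are illustrative guesses rather than a verified recipe for vexxhost's ceph-backed volumes.

    #!/usr/bin/env python3
    # Sketch only: tag a candidate rescue image with the stable-device-rescue
    # properties. The cloud name, image name, and the bus/device values are
    # assumptions for illustration, not tested against vexxhost.
    import openstack

    conn = openstack.connect(cloud="vexxhost")  # clouds.yaml entry name

    image = conn.image.find_image("dedicated-rescue-image",
                                  ignore_missing=False)
    # Extra keyword arguments to update_image are stored as image properties.
    conn.image.update_image(
        image,
        hw_rescue_device="disk",  # present the rescue image as a disk device
        hw_rescue_bus="virtio",   # attached on the virtio bus
    )
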
19:46 <clarkb> #topic Replacing Twitter
19:47 <clarkb> we currently have a twitter account to post our status bot alerts to
19:47 <clarkb> frickler has put this topic up asking if we should consider a switch to fosstodon instead
19:47 <frickler> yes, not urgent for anything, but we should at least prepare imo
19:48 <clarkb> I think that is a reasonable thing to do, but I have no idea what that requires in the status bot code. I assume we'd need a new driver since I doubt mastodon and twitter share an api
19:48 <fungi> is it just a credential/url change for the api in statusbot's integration, or does it need a different api?
19:48 <fungi> yeah, same thing i was wondering
19:48 <ianw> definitely a different api
19:48 <ianw> i had a quick look and not *too* hard
19:49 <fungi> i guess if someone has time to write the necessary bindings for whatever library implements that, i'm okay with it, but i don't use either twitter or mastodon so can't speak to the exodus situation there
19:49 <ianw> i don't feel like we need to abandon ship on twitter, but also adding mastodon seems like very little cost.
19:49 <corvus> there isn't a technical reason to remove twitter support or stop posting there.  (of course, there may be non-technical reasons, but there always have been).  the twitter support itself is a tertiary output (after irc and wiki)
19:50 <clarkb> right we could post to both locations (as long as we can still log in to twitter which has apparently become an issue for 2fa users)
19:50 <fungi> that sums up my position on it as well, yes
19:50 <corvus> and i agree that adding mastodon is desirable if there are folks who would like to receive info there
19:50 <ianw> opendevinfra i think have chosen fosstodon as a host -- it seems reasonable.  it broadly shares our philosophy, only caveat is that it is english only (for moderation purposes)
19:51 <ianw> not that i think we send status in other languages ...
19:51 <clarkb> I would say if you are interested and have the time to do it then go for it :) This is unlikely to be a priority but also something that I don't expect should take long
19:51 <corvus> fosstodon seems like an appropriate place, but note that they currently have a waitlist
19:52 <ianw> yeah, personally i have an account and like it enough to pitch in some $ via the patreon.  it seems like a more sustainable model to pay for things you like
19:53 <clarkb> maybe we can get on the waitlist now so that it is ready when we have the driver written?
19:53 <clarkb> but ya I agree fosstodon seems appropriate
19:53 <corvus> it looks like waitlist processing is not slow
19:53 <corvus> anecdote: https://fosstodon.org/@acmegating waitlisted and approved within a few hours
19:53 <ianw> well i can quickly put in "opendevinfra" as a name, if we like
19:53 <frickler> +1
19:54 <fungi> that matches the twitter account we're using, right?
19:54 <frickler> yes
19:54 <ianw> yep
19:54 <fungi> if so, sgtm
19:54 <ianw> well i'll do that and add it to the usual places, and i think we can make statusbot talk to it pretty quick
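
As a rough idea of the work involved in making statusbot "talk to it": posting to a Mastodon instance is a single authenticated HTTP call to its /api/v1/statuses endpoint. The sketch below is not the eventual statusbot driver; the token handling and example message are placeholders.

    #!/usr/bin/env python3
    # Sketch only: post a status to a Mastodon instance using its REST API.
    # A real statusbot driver would wire this into the existing config and
    # output plumbing; the access token and message here are placeholders.
    import requests

    def post_status(instance_url: str, access_token: str, text: str) -> dict:
        """Post `text` as a new status and return the decoded API response."""
        resp = requests.post(
            f"{instance_url}/api/v1/statuses",
            headers={"Authorization": f"Bearer {access_token}"},
            data={"status": text},
            timeout=30,
        )
        resp.raise_for_status()
        return resp.json()

    if __name__ == "__main__":
        post_status("https://fosstodon.org", "REPLACE_WITH_ACCESS_TOKEN",
                    "Example status alert text")
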
19:54 <clarkb> sounds good
19:55 <clarkb> #topic Open Discussion
19:55 <clarkb> There were a few more things I was hoping to get to that weren't on the agenda. Let's see if we can cover them really quickly
19:55 <clarkb> I've got a change up to upgrade our Gerrit version
19:55 <clarkb> #link https://review.opendev.org/c/opendev/system-config/+/864217 Upgrade Gerrit to 3.5.4
19:56 <clarkb> This change needs its parent too in order to land
19:56 <clarkb> If that lands sometime this week I can be around to restart Gerrit to pick it up at some point
19:56 <clarkb> openstack/ansible-role-zookeeper was renamed to windmill/ansible-role-zookeeper and we've since created a new openstack/ansible-role-zookeeper
19:57 <fungi> yeah, we recently created a new repository for openstack whose name collides with a redirect for an old repository we moved out of openstack (different project entirely, just happens to use the same name). i've got a held node and am going to poke at options
19:57 <clarkb> I didn't expect this to cause problems because we have a bunch of foo/project-config repos but because there was a prior rename we have redirects in place which this runs afoul of
19:57 <clarkb> in addition to fixing this in gitea one way or another we should look into updating our testing to call these problems out
19:58 <clarkb> And finally, nodepool needs newer openstacksdk in order to run under python3.11 (because old sdk uses pythonisms that were deprecated and removed in 3.11). However new openstacksdk previously didn't work with our nodepool and clouds
19:58 <frickler> also there's an interesting git-review patch adding patch set descriptions, which looks useful to me https://review.opendev.org/c/opendev/git-review/+/864098 some concern on whether more sophisticated url mangling might be needed, maybe have a look if you're interested
19:58 <clarkb> corvus has a nodepool test script thing that I'm hoping to try and use to test this without doing a whole nodepool deployment to see if openstacksdk updates have made things better (and if not identify the problems)
19:58 <ianw> heh, well if it can happen it will ... is the only problem really that apache is sending things the wrong way?
19:59 <clarkb> ianw: it's gitea itself redirecting, but ya I think that may be the only problem?
19:59 <fungi> ianw: not apache but gitea
19:59 <clarkb> frickler: interesting, I'm not even sure I know what that feature does in gerrit. I'll have to take a look
19:59 <frickler> and I'd also like to learn other roots' opinions on the Ubuntu FIPS token patch, if I'm in a minority I might be fine with getting outvoted
20:00 <frickler> clarkb: you can see it in the patch, the submitter used their patch version
20:00 <clarkb> frickler: excellent that will help :)
20:00 <frickler> https://review.opendev.org/c/openstack/project-config/+/861457 for fips
20:01 <clarkb> frickler: re FIPS I think that is more a question for openstack
20:01 <fungi> yeah, the main issue with things like gerrit patchset descriptions is that we currently can't add regression tests for newer gerrit features unless we can get our git-review tests able to deploy newer gerrit versions
20:01 <clarkb> I don't think it runs afoul of our expectations from a hosting side
20:01 <corvus> i would have expected a new project creation to invalidate the gitea redirects.  regardless of why it didn't work out, the last time i looked, the redirects were a gitea db entry.  probably can be fixed manually, but if so, then we should remember to record that in our yaml files for repo moves since we have assumed that we should be able to reconstruct the gitea redirect mappings from that data alone.
20:01 <clarkb> corvus: yup ++
20:01 <fungi> corvus: yes, i plan to comment or comment out the relevant entry in opendev/project-config
20:01 <clarkb> frickler: from the hosting side of things this is a big part of why I don't think we should have fips specific images
20:01 <fungi> after i finish playing with the held node to determine options
20:02 <clarkb> frickler: but we already allow jobs to interact with proprietary services (quay is/was for example)
20:02 <clarkb> We are at time now. Feel free to continue discussion in #opendev or on the mailing list. Thank you for your time everyone
20:03 <clarkb> Next week is a big US holiday week but I expect I'll be around through tuesday and probably most of wednesday
20:03 <clarkb> I don't expect to be around much thursday and friday
20:03 <clarkb> #endmeeting
20:03 <opendevmeet> Meeting ended Tue Nov 15 20:03:19 2022 UTC.  Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4)
20:03 <opendevmeet> Minutes:        https://meetings.opendev.org/meetings/infra/2022/infra.2022-11-15-19.01.html
20:03 <opendevmeet> Minutes (text): https://meetings.opendev.org/meetings/infra/2022/infra.2022-11-15-19.01.txt
20:03 <opendevmeet> Log:            https://meetings.opendev.org/meetings/infra/2022/infra.2022-11-15-19.01.log.html
20:03 <fungi> same for me
20:03 <fungi> thanks clarkb!
20:04 <ianw> can't believe it's thanksgiving already again!
20:05 <fungi> we do seem to have a lot of those
