Tuesday, 2022-11-29

*** clarkb is now known as Guest29801:19
*** Guest298 is now known as clarkb01:20
clarkbmeeting time19:00
clarkbI'm actually going to try and end a little early today as I have an appointment to get to after our meeting19:00
fungisgtm19:01
clarkb#startmeeting infra19:01
opendevmeetMeeting started Tue Nov 29 19:01:04 2022 UTC and is due to finish in 60 minutes.  The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot.19:01
opendevmeetUseful Commands: #action #agreed #help #info #idea #link #topic #startvote.19:01
opendevmeetThe meeting name has been set to 'infra'19:01
clarkb#link https://lists.opendev.org/pipermail/service-discuss/2022-November/000383.html Our Agenda19:01
clarkb#topic Announcements19:01
ianwo/19:01
clarkbThere is a board meeting next week December 6 at 2100 UTC and weblate for openstack translations will be discussed if anyone is interested in attending19:02
clarkbThe summit cfp is also open. I noticed that SCaLE's cfp closes friday as well19:03
clarkband FOSDEM is doing things19:03
clarkbanything else to announce?19:04
clarkb#topic Bastion Host Updates19:05
clarkbwith the holiday weekend I've sort of lost where we are at on this19:06
clarkbI think there were changes to manage rax dns with a small script19:06
clarkband openstacksdk updated to fix the networking issue when booting rax nodes19:07
ianwyep there's a stack for updating the node launcher @19:07
ianw#link https://review.opendev.org/q/topic:rax-rdns19:07
fungi#link https://review.opendev.org/865320 Improve launch-node deps and fix script bugs19:07
fungialso that19:07
ianwthat has a tool for updating RAX rdns automatically when launching nodes, and also updates our dns outputs etc.19:08
ianwthere's also19:08
ianw#link https://review.opendev.org/q/topic:bridge-osc19:08
ianwwhich updates the "openstack" command on the bastion host that doesn't currently work19:09
ianwand then 19:09
ianw#link https://review.opendev.org/q/topic:bridge-ansible-update19:09
fungithough in theory we could just deeplink from /usr/local/(s)bin to the osc in the launch env and not manage two separate copies19:09
ianwis a separate stack that updates ansible on the bridge 19:09
ianwfungi: that's what https://review.opendev.org/c/opendev/system-config/+/865606 does :)19:10
fungioh, perfect ;)19:10
clarkbcool. I'll do my best to pull these up after my appointment and review as many as I can19:11
clarkbanything else with the bastion?19:11
ianw#link https://review.opendev.org/c/opendev/system-config/+/86578419:12
ianwis a tool for backing up bits of the host19:12
ianwthat's about it.  i think it's pretty close to being as much of a "normal" host as i think it can be19:13
ianwthere's still19:13
ianw#link https://review.opendev.org/q/topic:prod-bastion-group19:13
clarkband that last link will do the parallel runs right?19:13
ianwto update jobs so they can run in parallel19:14
ianwyep19:14
clarkbI'm thinking we should stabilize as much as we can prior to that as that alone demands a fair bit of attention19:14
ianwagree on that, not high priority19:14
clarkb#topic Upgrading Old Servers19:14
clarkb#link https://etherpad.opendev.org/p/opendev-bionic-server-upgrades Notes19:14
clarkbI don't have anything new to add to this :( I keep finding distractions19:15
clarkb#topic Mailman 319:15
clarkb#link https://etherpad.opendev.org/p/mm3migration Server and list migration notes19:15
clarkb#link https://review.opendev.org/c/opendev/system-config/+/86536119:15
clarkbI believe fungi is just about ready to deploy the new server19:15
clarkbthen it's a matter of announcing and performing the swap over19:15
fungiwell, the server exists, but the inventory change is in the gate now19:16
fungidns is already there19:16
clarkbit just merged :)19:16
fungiperfect19:16
fungiand i've added reverse dns and am making sure it's clear of any blocklists19:17
fungiso assuming the deploy from the inventory addition checks out, we should be able to announce a maintenance window now19:17
clarkbis there anything you need from us at this point?19:18
fungii'll float a draft etherpad in #opendev later today or early tomorrow19:18
clarkbsounds good19:18
fungiwhen would folks want to do the maintenance? monday december 5?19:18
clarkbthat day should work for me19:18
ianw++19:19
fungias for things to cover in the announcement, people wanting to manage list moderation queues and configs or adjust their subscriptions will need to create accounts (but if they use the same address from the configs we've imported then those roles will be linked as soon as they follow the link from the resulting confirmation e-mail)19:20
fungialso we're unable to migrate any held messages from moderation queues, so list moderators will want to do a quick pass over theirs shortly before the maintenance window19:20
fungiother than that, and the brief outage and what the new ip addresses are, is there anything else i should cover in the announcement?19:21
fricklercan we first stop incoming mails to avoid race conditions?19:21
corvusi can do that for zuul on sunday/monday19:21
clarkbfrickler: yes the etherpad link above covers the whole process19:21
clarkband I'm pretty sure stopping incoming mail is part of it?19:22
fricklerbut before the final moderation run?19:22
clarkbthat should force it to queue up and then get through to the new server when dns is updated19:22
clarkbfrickler: oh specifically for moderation. For that I'm not sure19:22
fricklermaybe not so relevant for these low volume lists, but possibly for openstack-discuss19:22
fungithe plan is to stop incoming mails by switching the hostnames for the sites to nonexistent addresses temporarily19:23
fungibecause we can't easily turn off some sites and not others at the mta layer19:23
fungibut also to keep that as brief as possible, just long enough for dns to propagate and the imports to run19:24
fungiand then we'll switch to the proper addresses in dns so messages deferred on the senders' mtas will end up at the correct server19:24
fungifor openstack-discuss, i'll personally be moderating it anyway and can literally check the moderation queue via an /etc/hosts override after dns is swizzled before the import19:25
fricklerok, guess I should've read the etherpad first ;)19:25
fungibut that won't be migrated in this upcoming window, it'll be sometime early next year19:26
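A minimal sketch of how a blocklist check like the one mentioned in this topic is commonly done by hand: DNS blocklists are queried by reversing the address and appending the list's zone, and an answer means the address is listed. This only builds the query name and does no network lookups. The helper name `dnsbl_query_name` and the default zone are assumptions for illustration, not necessarily what was used for the new server.

```python
import ipaddress


def dnsbl_query_name(ip: str, zone: str = "zen.spamhaus.org") -> str:
    """Build the DNS name to look up for a DNSBL check (hypothetical helper).

    The zone default is an assumption; substitute the blocklist you use.
    """
    # reverse_pointer yields e.g. "1.2.0.192.in-addr.arpa" for IPv4 and the
    # reversed-nibble form ending in "ip6.arpa" for IPv6.
    pointer = ipaddress.ip_address(ip).reverse_pointer
    # Drop the trailing two labels ("in-addr.arpa" or "ip6.arpa").
    reversed_part = pointer.rsplit(".", 2)[0]
    return f"{reversed_part}.{zone}"


if __name__ == "__main__":
    # Documentation addresses, not real server addresses.
    print(dnsbl_query_name("192.0.2.1"))
    print(dnsbl_query_name("2001:db8::1"))
```

Resolving the resulting name (for example with `dig` or `socket.gethostbyname`) and getting NXDOMAIN would indicate the address is not listed in that zone.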
clarkb#topic Quo Vadis Storyboard19:27
clarkbI sent the followup to the thread that I promised (a little late sorry)19:27
clarkbtried to summarize the position the opendev team is in and some ideas for improving the status quo19:28
clarkbI'd love to hear feedback (ideally on the thread so that others can easily follow along)19:28
clarkbfrickler did mention that if we're going to invest further in the gitea deployment (and maybe if we don't anyway) we'll need to sort out the gitea fork situation19:28
clarkbI followed that whole thing as it was happening a few weeks ago and didn't feel like we needed to take any drastic action at the time19:29
clarkbBut i agree it is worth being informed and up to date on that situation to ensure we're putting effort in the correct location19:29
fungii guess people are still pushing to fork and it hasn't settled out yet?19:29
clarkbyes I think a fork is in progress, but I'm not sure it is deployable yet19:29
fungidoes it have a name?19:30
frickleryes, my main concern when I saw this was we might build something for gitea and then it turns out we would want/need the fork instead19:30
fricklerforgejo19:30
fricklerhttps://codeberg.org/forgejo/forgejo19:30
fricklernot sure though whether that is _the_ fork or just one of multiple ones19:30
fungithey had to one-up gitea on unpronounceable names19:30
fricklerit's esperanto for "forge" they say19:31
fungii saw19:32
clarkbI don't think we need drastic action today either fwiw19:32
clarkbbut it would be good to evaluate both if we dig deeper into using gitea's extra functionality19:32
fungisome of the folks involved in that have a history of getting very excited about things and then dropping them months later, so i'm not holding my breath yet anyway19:33
clarkb#topic Vexxhost instance rescuing19:34
clarkbThis is mostly on the agenda at this point to try and remind me to find time when jrosser's day overlaps with mine to gather info on the bfv setup they use for rescuing19:34
clarkbthen I can try and set that up in vexxhost and test it. If I can make that work then we can develop a rescue image with all the right settings maybe19:35
clarkbbut no new updates on this since the last time we met19:35
clarkb(holidays will do that)19:35
clarkb#topic Gerrit 3.6 Upgrade19:35
clarkb#link https://etherpad.opendev.org/p/gerrit-upgrade-3.619:35
clarkbianw: I've skimmed this doc a couple times at this point. The main new bit of feedback I've got (which is on the etherpad too) is that the latest 3.5 release which we upgraded to recently was made partially to fix a bunch of issues with copy approvals19:35
clarkbI think it might be worth catching up on the state of copy approvals upstream (just to be sure there aren't any more bug fixes outstanding) then give it a go on our installation?19:36
clarkbthat way if our install finds new issues we have time to work upstream to address them19:36
ianwyeah i agree -- aiui that can be run online just fine right?19:36
ianwit looks likely to be a multiple-hour thing19:37
clarkbianw: yes, it is supposed to be able to run online in the background. Digging through logs for that might be the hard part to double check it was happy19:37
clarkbthe other piece which I don't think we've done is evaluate any other potentially breaking or user visible updates19:37
clarkbjust to try and identify any major things people might complain about19:38
ianwyeah, i need to spin up the node19:38
ianwthen we can poke19:38
clarkb3.7 is looking to be far more problematic too so I'm glad we aren't trying to make that jump yet19:38
clarkbit has currently broken recheck comments for example19:39
ianwif you like, i can try running the copy-approvals when it slows down in a few hours, and monitor it19:39
clarkbianw: I think the first thing is to look at the changelog and open changes for 3.5 to see if we are missing any copy approvals updates19:39
clarkbupdate our image if necessary but then ya running it19:39
corvusclarkb:  upstream bug suggests there should be an easy fix for that in zuul; i'll be investigating it soon19:40
ianw++ i'll check that out this morning and we can sync on it19:40
clarkbcorvus: it is a bit weird to me that they want zuul to address it? it's the comment-added event which has for a decade now included the content of the comment19:40
clarkbcorvus: I mean, it's great if we can work around it but I feel like that breaks a pretty base level expectation19:40
clarkbianw: sounds good. Anything else gerrit 3.6 upgrade related?19:41
corvusassuming the upstream bug is accurate, they've had this backwards compat in place for years at our request, so i'm not ready to ding them on that.19:41
ianwclarkb: nope, let's get that groundwork done then we can decide on update schedules19:42
clarkbsounds good19:42
corvusanyway, give me a chance to actually look into it so i can speak from knowledge :)19:42
clarkbcorvus: ++19:42
clarkb#topic Acme.sh failures19:43
clarkbacme.sh switched to ecc certs/keys by default (from rsa) recently and broke our ability to renew things19:43
clarkbianw fungi and I all poked at it a bit and I think ianw was able to track it down to that specific change and wrote an upstream issue about it19:43
clarkb#link https://github.com/acmesh-official/acme.sh/issues/441619:43
clarkbBasically file paths changed and now acme.sh can't find data it is looking for later19:44
clarkbto address this we've pinned to the previous release: 3.0.5 and the change to do that landed recently19:44
clarkbwe should expect the certs that our cert checker is complaining about to refresh overnight tonight, which should address that (maybe sooner)19:44
clarkbOne thing I wanted to ask is if we should try and explicitly set settings like that to avoid underlying changes impacting us19:45
clarkbwe can set the key type and length and so on explicitly19:45
clarkband that might allow us to avoid pinning?19:45
clarkbThe downside to this is 5 years from now when rsa 2048 is no longer safe <_<19:45
ianwwe could -- this feels like a bit of a corner case because the ecc path seems to be a bit of a separate, opt-in thing, and upstream sort of half-switched it19:46
ianwthey can't really make the tool suddenly start updating certs in a different place on disk (a directory with _ecc appended) because that would seem to break everything19:47
clarkbmaybe we wait for feedback on the issue before deciding on how to move forward post pin19:47
clarkbI just wanted to call the option of being explicit as an alternative to pinning19:47
ianwyeah; i think it's good we've pointed it out -- we've run from dev for several years and this is the first time we've had an issue19:47
ianwso i'll keep an eye, and if we can get back to a point of running from dev I think that's desirable to continue being a canary19:48
ianwwe can always pin to the last known working thing easily19:48
clarkbsounds good and thank you for digging into that yesterday19:48
clarkbI had a note on your debugging change too, not sure if you saw19:49
clarkbthe one that updates driver.sh19:49
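A minimal sketch of the failure mode described in this topic: one step stores cert data under one directory while a later step looks under another. It assumes the `_ecc` directory suffix ianw mentions, and assumes ECC key lengths are written with an `ec-` prefix (for example `ec-256`). `cert_dir` and `path_mismatch` are hypothetical helpers for illustration, not acme.sh functions, and the base path is made up.

```python
from pathlib import PurePosixPath
from typing import Optional, Tuple


def cert_dir(base: str, domain: str, keylength: str) -> PurePosixPath:
    """Directory where a cert for `domain` would live (hypothetical helper)."""
    # Assumption: key lengths starting with "ec-" mean an ECC key, and ECC
    # certs go in a directory with "_ecc" appended to the domain name.
    suffix = "_ecc" if keylength.startswith("ec-") else ""
    return PurePosixPath(base) / f"{domain}{suffix}"


def path_mismatch(
    base: str, domain: str, issued_with: str, looked_up_with: str
) -> Optional[Tuple[PurePosixPath, PurePosixPath]]:
    """Return (written, read) paths if issue and lookup disagree, else None."""
    written = cert_dir(base, domain, issued_with)
    read = cert_dir(base, domain, looked_up_with)
    return None if written == read else (written, read)


if __name__ == "__main__":
    # Issuing silently defaults to ECC while later steps still assume RSA.
    print(path_mismatch("/etc/acme", "example.org", "ec-256", "2048"))
    # Setting the key type explicitly in both places keeps the paths aligned.
    print(path_mismatch("/etc/acme", "example.org", "2048", "2048"))
```

This is the argument for being explicit about key type and length: if every step is given the same setting, a change in the tool's default cannot move the data out from under a later step.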
clarkb#topic Open Discussion19:49
ianwok thanks, will go over those.  just a couple of things that would have made it easier if happens again19:49
clarkb++19:49
clarkbcorvus: ianw's fix to openstacksdk for launch node may be what we need for latest openstacksdk to work with nodepool too so I'll try to reprioritize testing that with your test tool19:50
clarkbalso linaro's new arm cloud appears to be near ready for use. This is being spun up to get us off old hardware that equinix wants to shut down19:50
clarkbI expect we'll need to move the builder and the mirror node manually (but maybe linaro has some magic to move those vms? I doubt it though )19:51
ianwyeah pretty sure that will all be a rebuild19:51
clarkbAnd a reminder that we'll turn off the iweb cloud at the end of the year.19:51
ianwthat's ok, good test of the launch node changes :)19:51
clarkb++19:51
clarkblast call for anything else19:52
fungii've checked the new mm3 server's ip addresses against spamhaus and senderbase, and they're all clean19:54
clarkbexcellent19:55
clarkbThank you all for your time (in the meeting and working on OpenDev)! We'll be back next week. I should look at a calendar soon to see how December holidays and the new year impact our schedule.19:55
fungiannouncement is being drafted in the migration plan etherpad and is nearly complete, so i'll give folks a heads up in #opendev once it's ready to proof19:55
fungithanks clarkb!19:55
clarkb#endmeeting19:55
opendevmeetMeeting ended Tue Nov 29 19:55:27 2022 UTC.  Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4)19:55
opendevmeetMinutes:        https://meetings.opendev.org/meetings/infra/2022/infra.2022-11-29-19.01.html19:55
opendevmeetMinutes (text): https://meetings.opendev.org/meetings/infra/2022/infra.2022-11-29-19.01.txt19:55
opendevmeetLog:            https://meetings.opendev.org/meetings/infra/2022/infra.2022-11-29-19.01.log.html19:55
clarkb5 minutes early :)19:55
funginice19:55
