Tuesday, 2023-03-07

clarkbJust about meeting time19:00
clarkbwe'll get started shortly19:00
clarkb#startmeeting infra19:01
opendevmeetMeeting started Tue Mar  7 19:01:14 2023 UTC and is due to finish in 60 minutes.  The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot.19:01
opendevmeetUseful Commands: #action #agreed #help #info #idea #link #topic #startvote.19:01
opendevmeetThe meeting name has been set to 'infra'19:01
ianwo/19:01
clarkb#link https://lists.opendev.org/archives/list/service-discuss@lists.opendev.org/thread/3ATGLS7XQ2Z2TUC7W4U2747T2XPBR2WL/ Our Agenda19:01
clarkb#topic Announcements19:01
clarkbI didn't think to put this on the agenda but the next virtual PTG and OpenStack release are just a couple/few weeks away19:01
clarkbOpenStack release is March 22 and PTG is March 27-3119:02
clarkbThings to be aware of as we're making changes19:02
clarkb#topic Bastion Host Updates19:03
clarkb#link https://review.opendev.org/q/topic:bridge-backups19:03
clarkbI've reviewed this stack of changes and I think it is ready. That said there were two things I wanted to bring up19:03
clarkbFirst: are we comfortable sticking the private key in the backup itself (fungi might have opinions on this). The alternative is storing it with our portion of the passphrase and keeping it more secret19:04
fungithis is the private key for encrypting the backup?19:04
clarkbSecond: how do we want to assign passphrase portions and ensure that everyone has taken the appropriate one? This only works if we all save our bit and don't all save the same one19:04
clarkbfungi: correct. The private key that encrypted the backup and is itself protected by the split up passphrase19:05
ianwyes it's a trade off between losing the key, or having the wrong one stored19:05
ianwfor distributing the key, i can just generate one on bridge and put usernames against it.  when people have stored it safely, we can delete everything19:05
fungithe usual way shamir is done is for each party to generate and provide part of the key, then there's no risk that people hold onto the wrong parts19:05
clarkbianw: that method works for me19:05
fungibut yeah, i'm fine with either way19:06
ianwfungi: the ssss tool has a 128 character encoding limit.  so we are encoding a 128-character mixed-case password to the private key19:07
clarkbI just wanted to make sure other reviewers considered the private key risk there and were ok with it.19:09
clarkbI think the plan of annotating with usernames and then everyone indicating when they've captured their bit should work well19:09
clarkbAnything else bridge related or should we move on?19:10
ianwnot from me19:10
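
The split-passphrase idea discussed above can be illustrated with a minimal Shamir secret sharing sketch in Python. This is an assumption-laden illustration, not the ssss tool's implementation and not the scheme used in the bridge-backups changes: the prime field, the function names `split_secret` and `recover_secret`, and the 3-of-5 threshold in the usage note are all hypothetical choices made for the example.

```python
import secrets

# Assumption: a Mersenne prime large enough to hold a 128-byte secret.
# The real ssss tool works differently; this is only an illustration.
PRIME = 2**1279 - 1


def split_secret(secret: bytes, threshold: int, shares: int) -> list[tuple[int, int]]:
    """Split `secret` into `shares` points; any `threshold` of them recover it."""
    if not 1 <= threshold <= shares:
        raise ValueError("threshold must be between 1 and the number of shares")
    value = int.from_bytes(secret, "big")
    if value >= PRIME:
        raise ValueError("secret is too large for the chosen field")
    # Random polynomial of degree threshold-1 whose constant term is the secret.
    coeffs = [value] + [secrets.randbelow(PRIME) for _ in range(threshold - 1)]

    def evaluate(x: int) -> int:
        # Horner's rule, reduced modulo PRIME at every step.
        result = 0
        for coeff in reversed(coeffs):
            result = (result * x + coeff) % PRIME
        return result

    return [(x, evaluate(x)) for x in range(1, shares + 1)]


def recover_secret(points: list[tuple[int, int]], length: int) -> bytes:
    """Lagrange-interpolate the polynomial at x=0 to get the secret back."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        # pow(den, -1, PRIME) is the modular inverse (Python 3.8+).
        total = (total + yi * num * pow(den, -1, PRIME)) % PRIME
    return total.to_bytes(length, "big")
```

For example, `split_secret(b"example passphrase", threshold=3, shares=5)` yields five points, and passing any three of them to `recover_secret` with the original length returns the passphrase. This also shows why the concern raised above matters: if two people store the same point, the group holds fewer distinct shares than it thinks.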
clarkb#topic Mailman 319:10
clarkbfungi: any progress creating the sites in django?19:11
funginope. maybe if unrelated things will stop breaking ;)19:11
fungihopefully with the nodepool provider issues almost behind us i can get back to it19:11
clarkb#topic Gerrit Updates19:12
clarkbcool we can continue on then19:12
clarkbianw has created a number of changes and a plan for updating our Gerrit ACLs to avoid items that are deprecated in 3.7 (and won't be accepted in new acl updates)19:12
clarkb#link https://review.opendev.org/q/topic:gerrit-s-r-3.7 Cleaning up deprecated copy conditions in project ACLs19:12
ianwoh my upload may have broken the topic on some changes, i'll put them all back19:13
clarkbthere are a couple of axes in this. The first is the vote copy conditions need updating as well as converting to submit requirements. The other axis is updating regular project ACLs vs All-Projects19:13
clarkbIt would be good for others to review this due to its potential for broad impact. Keep in mind the docs are a bit clunky however ianw has tested on a held node to try and understand behaviors better19:14
clarkbI think things are looking good at this point though and it's just a matter of applying the updates19:14
ianw#link https://lists.opendev.org/archives/list/service-discuss@lists.opendev.org/message/ZOI6Z3M3K45FCX2CNZ6DYM5QBYZBDFP4/19:14
ianwis a summary too19:14
clarkbI do need to rereview the latest updates though19:14
clarkbOnce this is done we're basically ready to start planning a Gerrit 3.7 upgrade too19:15
ianwhttps://104.130.253.50/c/x/test-project/+/3 is All-Projects with s-r's19:16
clarkbianw: looks like that is a 3.7.1 host. Should we apply All-Projects updates to 3.6.4 too?19:16
clarkbthe little view conditions drop down is maybe a bit detail specific but cool to have that info at the change level19:17
ianwthat is also there in 3.6 as well19:17
clarkbneat19:18
ianwi can apply it to 3.6 if you like, but i don't think in this regard anything is different19:18
clarkbthank you for putting this together. The docs definitely didn't make it easy and probably led to more experimenting than should be necessary19:18
clarkbianw: oh! should we be doing any updates to our CI jobs? I already converted Verified to submit requirements there. But we could update All-Projects for code-review too?19:19
ianwyeah, currently in CI we don't set all-projects, so by default we don't have the workflow label19:20
clarkbI don't think its super urgent but that might be a good way to get ongoing coverage of this19:20
fungii wonder if it would make sense to put some of that in all-projects19:20
fungioh, yes that's what you're pointing out19:21
ianwi guess the problem is that it's either editing what's there, or maintaining a complete all-projects for each release19:21
clarkbhrm ya since we already get a precanned version19:21
ianwat the moment we just get the gerrit default version19:21
ianwwhich gives us perfect "bootstrapability" -- but are we ever going to actually start with a completely fresh gerrit?19:22
ianwif the worst happened, we'd restore from backup anyway19:22
clarkbanother thought is that I think code-review is officially defined with function = MaxWithBlock. Maybe we should ask upstream about changing that to a submit requirement and if that will impact any potential future migrations?19:22
clarkbianw: ya I think its mostly about having the test env align with prod as much as possible. Less about automating things19:23
clarkbBut its also not urgent considering we haven't yet needed to do that19:23
clarkbwe'll likely continue to be fine as is19:23
ianwi've fiddled too much now, i feel like the 3.7 started with code-review as a SR in its pre-canned All-Projects19:24
clarkbianw: ah ok so there must already be a migration?19:24
clarkbOr maybe they don't migrate it and expect you to update it by hand before/after 3.7. Some understanding of that might not be a bad idea19:25
ianwactually no, looking at the history of All-Projects19:25
ianw+[label "Code-Review"]19:25
ianw+function = MaxWithBlock19:25
clarkbI'm just worried we'll get to doing our upgrade and it will break because the migration tool doesn't know how to reconcile what we've got and the new thing19:25
ianwInitialized Gerrit Code Review 3.7.1-1-ga57c4bf868-dirty19:25
clarkbya ok so they haven't updated it internally yet, but eventually will19:25
clarkbwe should see if they have a plan for that and if converting ahead of time is in conflict with that plan19:26
ianwbeing 100% like production, i think the cla's might be a problem19:26
ianw(making testing All-Projects 100% like production, i mean)19:26
clarkbbecause we would need to add the clas to testing too?19:27
ianwyeah, i think so19:27
ianw+copyCondition = changekind:NO_CHANGE OR changekind:TRIVIAL_REBASE OR is:MIN19:27
ianwat least that explains where i copied the NO_CHANGE copy-condition -- that's the pre-canned 3.7 copyConditions19:28
ianwfor code-review19:28
clarkbbut ya if others can look it over to make sure we aren't changing behaviors or doing anything unexpected that would be great19:28
clarkbI also wanted to call out the results of the gerrit community meeting19:28
clarkbFor Java 17 they are in no rush to drop Java 11. This means we can stick to Java 11 for now and wait for upstream to fix the issues rather than deploying annoying workarounds19:29
clarkbSounds like part of what drives this is the version of java Google wants to deploy with and that isn't expected to change soon19:29
clarkbFor the ssh thing I was told they would accept a docs update for the secret config option19:29
clarkb#link https://gerrit-review.googlesource.com/c/gerrit/+/362054 document secret Gerrit config option19:30
clarkbAnd NasserG would try to test ianw's fix for the underlying issue in their env which has this problem a lot more than ours19:30
fungitangentially related, sometime i want to pick the brains of people who have messed with automated gerrit setup for our deployment testing and zuul quickstart tests in order to figure out how to get un-stuck with updating the gerrit version we test git-review against, last i tried i was running up against problems bootstrapping user accounts (i think), maybe it was specifically19:30
clarkbThis ended up being fairly productive for us. I'll make an effort to continue attending that meeting each month (it is at 8am pacific on the first thursday of a month)19:30
fungiadministrator bootstrapping for project creation19:30
clarkbfungi: the zuul quickstart stuff is pretty easy to read though. Probably more so than our system-config stuff since the zuul deployment is far more focused19:31
clarkb(we're trying to make something that has attributes similar to our prod setup to check things we care about whereas zuul quickstart is just give me a gerrit)19:32
clarkbI want to say the trick is the development flag19:32
fungiyeah, i want to say there were things the git-review tests need which the zuul quickstart doesn't, and that's where i was stuck, but i need to page all that back in again19:33
fungiwe've got a submitted feature addition to support something in newer gerrit we currently can't run the tests for because of being unable to bootstrap newer gerrit for it19:33
fungioh, right, setting develop mode broke some things about auth19:33
clarkbhuh ssh should still work as normal19:33
clarkbwe do a lot of ssh with dev mode in the system-config job19:33
fungi#link https://review.opendev.org/849419 Upgrade testing to Gerrit 3.4.419:34
fungii'll follow up after the meeting trying to see if i can find where i got stuck with it19:34
clarkb#topic Upgrading Old Servers19:34
clarkbgitea05-08 are now all removed from gerrit replication and not behind the gitea haproxy19:35
clarkb#link https://review.opendev.org/c/opendev/system-config/+/876470 Remove old giteas from config management19:35
clarkbThis change will remove those servers from configuration management so that we can start to delete the old servers19:35
clarkbFor next steps we need to decide how many (if any) additional larger gitea servers we need in the cluster. To help figure this out I've disabled gitea01-04 via the haproxy command socket19:36
clarkbSo far the four new servers have been holding up. But we may need or want additional servers anyway. I'm open to feedback on that.19:36
clarkbThe other next step is moving gitea backups from gitea01 to gitea0919:36
clarkb#link https://review.opendev.org/c/opendev/system-config/+/876471/ should do that19:37
clarkbGood progress here overall. Just a couple of changes to land and then decisions on the remaining old servers and if they need to be replaced. Please let me know what you think if you get a chance to look at this19:37
clarkbThe other two types of servers on my radar for prioritizing replacements are the nameservers and etherpad19:38
clarkbI think I'll do etherpad after the PTG. Since etherpad is heavily relied upon during the PTG I don't want to be rushed or find jammy's kernel makes nodejs sad or something19:38
clarkbIt should be a quick one too. Deploy new server. Take downtime, move database, update dns, done19:38
clarkbianw: I think you mentioned being able to look at nameservers. Have you had a chance to start that yet?19:39
clarkbI think the main thing here is going to be coordinating the changes in DNS itself? And that's something I need to page back in before I start19:39
clarkbI always forget who has what records19:39
ianwnot yet sorry, sniped by iterations on gerrit stuff.  i think it's probably one we want to pre-plan with a checklist, i can come up with that19:40
clarkb++19:40
clarkbdefinitely a multistep process that we need to take and possibly one with hold points to ensure DNS out in the wild isn't relying on stale nameservers19:40
clarkb#topic AFS quotas19:40
clarkbI noticed yesterday a number of afs volumes are nearing their quota limits19:41
clarkbin particular debian-security is down to a couple hundred MB19:41
clarkbubuntu-ports and one (or both?) of the centos volumes are also close.19:41
clarkbShould I take some time to increase the debian-security one by say 20GB today?19:41
clarkbI don't hear any objections so I'll try to get that done before it creates problems19:43
ianw++19:44
ianwi think we determined there wasn't much we could drop19:44
clarkbya I poked around and didn't find anything obvious at least19:44
clarkbI'm a little less worried about the others but I should look at them and give them similar small bumps if appropriate too19:44
fungistill somewhat curious where the extra utilization has cropped up19:44
clarkbfor centos it seems they add packages without always removing packages19:45
clarkbfor example there are several gigabytes of just thunderbird packages19:45
clarkbI suspect similar may happen with debian for security updates?19:45
clarkbwhere they don't remove the security updates from the security mirror once they add them to regular mirrors19:46
fungithe debian increases are a bit odd, those typically get folded into point releases eventually19:46
fungijust wondering why it would have shot up in the past few months19:46
clarkblots of security updates that they don't actually remove when they add them to the point releases?19:47
clarkbthat's my hunch19:47
clarkbBut ya it's a bit annoying things just grow forever even on old stable platforms19:49
clarkbAnyway I'll try to get that done today. I've got an errand to run and school pickup and there is an openinfra foundation board meeting so I may end up getting to it tomorrow19:50
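
The quota bump proposed above (roughly 20GB for debian-security) comes down to simple arithmetic, sketched here in Python. OpenAFS expresses quotas in kilobyte blocks and `fs setquota` takes `-path` and `-max`; the helper names, the example mount path, and the starting quota below are hypothetical and are not taken from the OpenDev configuration.

```python
# OpenAFS quotas are expressed in 1 KiB blocks.
KB_PER_GB = 1024 * 1024


def bumped_quota_kb(current_quota_kb: int, bump_gb: int) -> int:
    """Return the new quota in KB after adding `bump_gb` gigabytes."""
    return current_quota_kb + bump_gb * KB_PER_GB


def setquota_command(path: str, new_quota_kb: int) -> list[str]:
    """Build the argument list for `fs setquota` (not executed here)."""
    return ["fs", "setquota", "-path", path, "-max", str(new_quota_kb)]


if __name__ == "__main__":
    # Hypothetical values: a volume currently capped at 100 GB, bumped by 20 GB.
    current_kb = 100 * KB_PER_GB
    new_kb = bumped_quota_kb(current_kb, 20)
    # Hypothetical path; substitute the real read-write mount point.
    print(" ".join(setquota_command("/afs/.example.org/mirror/debian-security", new_kb)))
```

The script only prints the command so that an admin can review the value before running it by hand; the current quota would normally be read from `fs listquota` first.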
clarkb#topic Scheduling project renames19:50
clarkbfungi: you wanted to discuss this. I think we should wait for the openstack release and ptg to complete but then we're probably wide open?19:51
clarkb(this assumes I have the gitea work done by then and I expect this will be the case)19:51
fungiyes, i concur19:52
clarkbeaster is april 9 which may or may not be a good time to do it19:52
ianwwe'll probably be in a place to upgrade gerrit then, if we want to do both19:52
clarkbgood friday or monday after easter?19:52
clarkbianw: I think we should do them as distinct steps but could share a downtime window19:53
fungi2023-03-22 is the openstack release19:53
ianwyeah i did mean sequentially :)19:53
clarkbI think ianw's part of the world gets easter and adjacent days as holidays19:54
clarkbshould we look at the week after that instead?19:54
clarkb(my kids have a four day weekend the week after so I personally prefer using the easter holiday, but I can be flexible)19:54
ianw15/16/18 are public holidays here19:54
fungifriday 2023-03-31 would be the friday after openstack release week, and yeah 2023-04-07 would be the friday before easter i guess19:54
clarkb(also I don't know who did the scheduling at school but they mess that up)19:54
clarkbfungi: I expect we'll all be tired from the PTG so wouldn't want to do it that week. I'm good with the 7th19:55
clarkbor 10th19:55
clarkbI expect both of those days will be quiet due to holidays19:55
fungii'm travelling later in april, need to go check the dates if we're pushing it to later19:55
clarkbianw: do you have a preference?19:56
ianwmaybe 7th ish?  i'm happy to drive things at a quiet time19:57
clarkbI'm good anytime in April and would prefer to avoid the 13-16th19:57
fungilooks like i'm gone april 10-1619:57
ianwif we have a good checklist i think the issues are minimal19:57
clarkbok why don't we pencil in the 7th and we can refine that as we get things ready19:57
fungithat sounds good to me19:58
clarkb#info April 7th ish for Gerrit downtime to rename projects and do 3.7 upgrade. Specific downtime window to be determined as we get closer to the day of.19:58
clarkb#topic Open Discussion19:58
clarkbYou have one minute for anything else19:59
clarkbSounds like that was it. Thank you everyone20:00
clarkbwe'll be back here same time and location next week.20:00
clarkb#endmeeting20:00
opendevmeetMeeting ended Tue Mar  7 20:00:49 2023 UTC.  Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4)20:00
opendevmeetMinutes:        https://meetings.opendev.org/meetings/infra/2023/infra.2023-03-07-19.01.html20:00
opendevmeetMinutes (text): https://meetings.opendev.org/meetings/infra/2023/infra.2023-03-07-19.01.txt20:00
opendevmeetLog:            https://meetings.opendev.org/meetings/infra/2023/infra.2023-03-07-19.01.log.html20:00
ianwthanks!20:01
