Tuesday, 2022-10-11

*** kopecmartin|sick is now known as kopecmartin		08:08
clarkb	meeting time	19:00
ianw	o/	19:00
fungi	ohai	19:00
clarkb	#startmeeting infra	19:01
opendevmeet	Meeting started Tue Oct 11 19:01:04 2022 UTC and is due to finish in 60 minutes. The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot.	19:01
opendevmeet	Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.	19:01
opendevmeet	The meeting name has been set to 'infra'	19:01
frickler	\o	19:01
clarkb	#link https://lists.opendev.org/pipermail/service-discuss/2022-October/000364.html Our Agenda	19:01
clarkb	#topic Announcements	19:01
clarkb	The PTG is happening next week. Be aware of that as we make changes (best to avoid updating meetpad and etherpad for example)	19:01
clarkb	Also encourage people that have problems to reach out to us directly. In the past we've gotten reports of trouble via a game of telephone and that has been difficult	19:02
clarkb	Also, this morning I spilled a glass of water on my keyboard so I've been in disarray all day. Thankfully it was the desktop keyboard and not the laptop and I've got a spare Model M that has been plugged in	19:03
clarkb	but still it's really weird to type on a new keyboard after having that one for a decade	19:03
fungi	the zuul default ansible version changed to 6 across all of our tenants as of late last week, and we'll be dropping ansible 5 support "real soon" so fixing related problems is time-sensitive	19:03
fungi	probably warrants another announcement if we can nail down a timeline for the ansible 5 removal change	19:04
clarkb	I did at least warn of the ansible 5 removal in the email announcing the switch to 6 by default	19:04
fungi	yep	19:05
clarkb	#topic Topics	19:06
clarkb	#topic Bastion Host Updates	19:06
clarkb	The changes to stop writing console log files on bridge landed yesterday. Looks like there was a small issue getting the flag name correct. ianw do we have an idea yet if that is working as expected?	19:07
fungi	and we still need a similar one for static.o.o right? or is that already up?	19:08
ianw	i just checked and i think so. there hasn't been a log file written since Oct 11 06:09 (UTC) which is about when all the periodic jobs cleared out	19:08
ianw	static has been done the same way, so it looks good too	19:09
clarkb	awesome. That's one less thing to worry about now :)	19:09
ianw	heh yes, thank you for reviews. i think it was good to reach a bit more of a generic solution	19:10
ianw	the "tunnel console things via a socket and the ssh connection" changes are another option that is still on my todo list, and seems like a great thing to look into as well	19:10
ianw	one day ... :)	19:11
clarkb	ya though I think we don't want to expose that on bridge or static due to how the protocol works	19:11
clarkb	since it could be used to read other files?	19:11
clarkb	I think what we've done not exposing it is correct for us	19:11
fungi	is intended for reading additional files	19:11
fungi	designed with that in mind anyway	19:11
clarkb	The other stack of changes in flight here has to do with ansible in a venv	19:12
clarkb	#link https://review.opendev.org/q/topic:bridge-ansible-venv	19:12
corvus	er... the current protocol shouldn't be able to read other files?	19:12
clarkb	corvus: ya I don't think it does, but that was always part of the intention iirc. And I don't want to have to think about undoing any opening of things if that changes	19:13
fungi	i know it was at least a theoretical future use case	19:13
corvus	clarkb: yes, it was originally designed for that, but we haven't implemented it because we haven't figured out how to do it safely	19:13
clarkb	gotcha	19:13
corvus	so if that's the concern -- just future-proofing cool. but if there was a thought that we had a current vulnerability... then i would like to explore that more. :)	19:14
clarkb	oh no I don't think we have a current vulnerability. We're set up to avoid it should the zuul behavior change to what was (I thought anyway) the intended behavior	19:14
corvus	(and there's really 2 protocols here -- there's the websocket/finger protocol of user -> zuul, and the internal protocol of executor -> node; the former is the one we designed to allow reading other files in the future, and the second is what we just changed)	19:15
corvus	(though support for the former probably would need changes like the latter)	19:16
corvus	okay, so i think we're all on the same page that currently there is not the ability to read arbitrary files, but that we like the status quo of explicitly disabling log streaming on bridge because, among other things, that future-proofs us against eventually adding that feature. ya?	19:16
clarkb	++	19:17
corvus	cool, thx and sorry for the diversion. just wanted to make sure we didn't open something we didn't intend to. :)	19:17
clarkb	ianw: for ansible in a venv did you manage to sort out using the first member of a singleton group as the hosts specification?	19:18
ianw	(specifically i was talking about https://review.opendev.org/c/zuul/zuul/+/542469 but let's not go into that further now :)	19:18
ianw	clarkb: thanks ... one step back i just approved the change reviewed by yourself and fungi to move the production ansible into a venv on the current bridge. so i'll watch that in today. that's the "venv" bit of it really	19:19
fungi	and that preps us for being able to use newer ansible, right?	19:20
clarkb	ya and in theory that should just switch over due to symlinking the venv install over to ansible	19:20
clarkb	(that was my thought during review anyway)	19:20
ianw	yep, in theory it's a noop :)	19:20
clarkb	fungi: sort of, we need to upgrade the python installation too (which is where the replacement node comes in and why the other group work is related)	19:20
ianw	the bits on top now are about upgrading to jammy, and abstracting the way we address the bastion host so we can switch the host more easily -- in this case to probably bridge01.opendev.org	19:20
ianw	anyway, i did establish that as a playbook matcher "groupname[0]" does seem to work to address the first member of a group	19:21
corvus	like `- hosts: bridgegroup[0]` means this is a play that runs on the first host in the bridge group?	19:22
corvus	(er, in the group named "bridgegroup"; i was trying to be clear and may have failed :)	19:22
fungi	and group member ordering is guaranteed deterministic (uses the order in which the members are added i guess) right?	19:22
clarkb	ya the idea being we can control what the bridge is in a single place (the bridgegroup group) but then only ever have a single entry in that group	19:22
ianw	yep -- https://review.opendev.org/c/opendev/system-config/+/858476	19:22
clarkb	fungi: the idea is that it would be a singleton group	19:22
clarkb	but to enforce that we would take the first entry everywhere	19:23
fungi	i see	19:23
corvus	why not just let it run on the whole group of 1?	19:23
clarkb	the reason I was concerned with that is it makes the ansible really confusing when you need to address a specific node	19:23
clarkb	like when grabbing the CA files	19:23
clarkb	the ansible you express becomes "create a different CA on every member of the bridge group, but only distribute the CA files for the first group member"	19:24
clarkb	if others prefer that I'm ok with that too, but I found it a bit confusing to read when I reviewed it	19:24
ianw	corvus: one problem i haven't dealt with yet is playbooks/bootstrap-bridge.yaml. that runs both under zuul, where the inventory is setup via the job, and in infra-prod, where the inventory is setup by opendev/base-jobs	19:25
corvus	i'm not sure whether or not i would have the same confusion, but i certainly see your point, and the solution seems good. now that i know the reasoning, i can be on board with that.	19:25
ianw	so basically both have to agree on the name/group. this is a bit annoying for clarkb's note of trying to use a different group name for the initial setup bastion host, and the production version	19:26
ianw	sorry, that wasn't intended for corvus: ... :)	19:26
clarkb	oh hrm if using distinct groups for the top level ansible and nested ansible in CI is problematic I think we can just not do that	19:26
corvus	oh whew cause that's a hard question and i was struggling with that. glad i'm off the hook. :)	19:27
clarkb	it was an idea I had when trying to sort out why the job needed to redefine the group	19:27
ianw	yeah, it is mostly explained in the comment at https://review.opendev.org/c/opendev/system-config/+/858476/9/zuul.d/system-config-run.yaml	19:27
ianw	anyway -- i will keep at it and see what we can come up with; i don't think we need a solution now	19:28
corvus	intuitively, having the group name be the same makes sense to me... so if that's a workable/livable option i would be in favor of that.	19:28
ianw	i think that's where i'm coming back to as well ...	19:29
corvus	and maybe keep a version of that comment explaining that we're using that as a group for the zuul playbook	19:30
fungi	sounds good to me	19:30
clarkb	wfm	19:30
ianw	yes i will definitely do my usual probably-too-verbose commenting on all this :)	19:30
ianw	anyway, I think it's quite likely by this time next meeting we'll have a fully updated bridge, and an easier path when we want to rotate it out next time as well	19:32
clarkb	sounds good. Thank you for working through all the little details of this	19:32
clarkb	#topic Upgrading Bionic Servers	19:32
clarkb	The expected fix for removing the ubuntu user has landed. Now just need to try booting a jammy control plane server again. I'm hoping to give that a go sometime this week.	19:33
clarkb	Sounds like ianw may also give it a go	19:33
clarkb	But other than that I didn't have any new updates here	19:33
fungi	we'll want it before we boot the new listserv at the very least	19:33
clarkb	yup I was thinking I'd find something easy to replace as a guinea pig like a mirror maybe	19:35
clarkb	but probably not until the end of this week	19:35
clarkb	Let's keep moving as the last topic on the agenda is one that deserves discussion before we run out of time	19:36
clarkb	#topic Mailman 3	19:36
clarkb	fungi has edited the extra long strings on the production mailman2 site and has begun the process of copying data for reattempting the mm3 migration on a newly held test node with our forked images	19:36
fungi	new held node for this is 149.202.168.204, built from your container image fork	19:37
fungi	will hopefully kick off a new scripted import on it within the next hour or so	19:37
fungi	depending on how much longer the rsync runs	19:37
clarkb	corvus: we noticed that a child change of https://review.opendev.org/c/opendev/system-config/+/860157 doesn't find the images that change builds. And were wondering if we got the bits wrong for telling zuul about the image	19:37
clarkb	corvus: maybe if you get some time you can take a look at how the new image build jobs and system-config-run-mailman3 job are hooked up with the buildset registry and provides/requires and dependencies	19:37
clarkb	we've worked around it by forcing the node hold change to rebuild the images itself	19:38
clarkb	fungi: anything else you need from the rest of us? I expect it is largely just a wait for test results though	19:38
fungi	we've knocked out about all the remaining todo items, so we're probably ready to talk scheduling for lists.opendev.org and lists.zuul-ci.org production migrations	19:39
corvus	clarkb: let's continue that in #opendev	19:39
fungi	i did want to check a few more urls for possible easy/convenient redirects (things like list description pages which people tend to link in various places)	19:39
fungi	stuff not covered by keeping a copy of the pipermail archives hosted from the new server	19:40
clarkb	corvus: yup don't need to solve that here	19:40
clarkb	fungi: good idea, the existing redirects are probably not much help though as they redirect to content on disk but you probably want url redirects to mm3 urls for those	19:40
fungi	right. i think the list description pages are probably the only thing we really care about redirects for	19:41
fungi	the list indexes for the sites are just served from the root url of each vhost anyway	19:42
fungi	and i'm not too worried about redirecting old admin and moderator interface urls	19:42
clarkb	makes sense	19:42
clarkb	anything else on mm3 before we continue?	19:42
fungi	we should probably also confirm whether we want local logins for users or whether there's a desire to hold this for keycloak integration in order to avoid local credentials in mailman	19:43
fungi	i'm assuming we'd rather get the mm3 migration done and then look at keycloak integration after the fact, but just want to be sure everyone's on the same page there	19:44
clarkb	you can subscribe to lists without creating a user (I did this with upstream mm3)	19:44
fungi	correct	19:44
clarkb	we might even encourage users to do that if they never want to use the web ui for responding to things	19:44
clarkb	but ya I wasn't too worried about a future switch over	19:44
ianw	just off the top of my head, it feels like if we allow local logins and then move to a more generic keycloak, we then have the problem of having to merge the local users too?	19:44
fungi	list admins/moderators will need accounts though, and if someone wants to adjust their subscription preferences they'll need a login	19:44
clarkb	ianw: yes we'd likely need to do that. The good thing is we should have email on both sides to align them at least	19:45
fungi	ianw: we'll have that either way. subscribers technically all have accounts, they just don't necessarily have login info for them unless they go through the password reset	19:45
ianw	ahh ok	19:46
frickler	is the login per list or per site or per installation? for mm2 it was per list iiuc	19:46
clarkb	frickler: it's per installation	19:46
fungi	frickler: right, for mm3 it's system-wide	19:47
fungi	so not just all lists on a given site, but all mailman sites on that server	19:47
fungi	convenient for folks who interact with a lot of lists, especially across multiple domains on the same host	19:47
frickler	so if this is needed to set e.g. digest mode, I think we cannot delay it into the future	19:49
fungi	anyway, i didn't have anything else. we can mull that over, i expect we'll start doing migration scheduling after the ptg	19:49
fungi	frickler: correct	19:49
fungi	basically the options are 1. wait to migrate lists to mm3 until we have keycloak in production the way we want, or 2. migrate to mm3 and then integrate keycloak later and make sure accounts can be linked/merged as needed	19:50
clarkb	right, I think some users will still need to create accounts, but a good chunk of them shouldn't need to which helps simplify things if we want to try and keep them simple like that	19:50
clarkb	I'm fine with 2	19:50
frickler	ack	19:50
fungi	well, to reiterate, the accounts are precreated, whether the users have login info for them or not	19:50
clarkb	fungi: for all users?	19:51
clarkb	I guess the migration doesn't stick to not creating an account if it doesn't need to	19:51
fungi	if they're referenced in a config (admin, mod, existing subscription) then the import process creates their accounts. if they subscribe later an account is created the first time they do so	19:51
clarkb	anyway I think it's fine to migrate them later since in this case we should have the info needed to make associations	19:51
clarkb	also the mailing list is the sort of thing that can probably safely not have single sign on forever	19:52
clarkb	we are running out of time and I do want to get to the last item on the agenda	19:53
clarkb	we can return to this in #opendev if necessary	19:53
fungi	please do	19:53
clarkb	#topic Updating OpenDev's base job nodeset to Jammy	19:53
clarkb	It has been pointed out that OpenDev's base job nodeset is still Focal. Jammy has been out for about half a year now and has a .1 release. It should be stable enough for our jobs	19:53
clarkb	But that opens questions about how we want to communicate and schedule the switch	19:54
frickler	yes, I came across that while looking to upgrade devstack jobs	19:54
clarkb	I was thinking that we should avoid changing it before the PTG since that will just add a distraction during PTG week. But maybe we can do it the week after ish? Basically do a 2 week notice to service-announce and then swap?	19:54
fungi	openstack is actively switching from focal to jammy for testing now that their zed release is done	19:54
frickler	I think we'd want to run some tests with base-test before discussing details of scheduling?	19:54
clarkb	frickler: in the past we've done that (when the infra team managed this all for openstack) and the problem with that is it sets the expectation that we are responsible for making it work for every job	19:55
clarkb	It was the xenial switch or maybe trusty switch that made me never want to do that again.	19:55
clarkb	I think people should test what they are interested in and be explicit where they know they need to be (say for specific versions of python).	19:56
frickler	still we'd need to change base-test in order to allow for that?	19:56
frickler	#link https://review.opendev.org/c/opendev/base-jobs/+/860686 would be the change for that	19:56
clarkb	frickler: no, any job can select the jammy nodeset	19:56
fungi	anything inheriting from our default nodeset which breaks when we change it has the option of overriding the nodeset it uses to the earlier value anyway	19:56
fungi	just as it can be adjusted to use the new value before our planned transition date	19:57
frickler	hmm, true that	19:57
clarkb	I think updating base-test is a good idea to keep it in sync with base. But I don't think that is the method for testing this. base-test is for testing the roles in base	19:57
clarkb	we know they work on jammy because projects like zuul already use jammy	19:57
clarkb	so we don't need to test that base functionality	19:58
corvus	i agree i don't think this needs a base-test cycle since we know that the change won't break all jobs (because we can and have made the change explicitly elsewhere, and zuul performs syntax validation on the change)	19:58
fungi	in my mind, the main questions are when do we plan to switch it/how much advance notice do we want to provide users	19:58
clarkb	fungi: ++	19:58
clarkb	I think we should wait for after the PTG at the very least	19:58
fungi	wait for after the ptg to announce it, or for actually changing it?	19:59
clarkb	actually changing it. Ideally we should announce whatever we decide on real soon now	19:59
frickler	2 week notice should be fine then. announce now	19:59
fungi	sounds good to me	19:59
ianw	++	19:59
clarkb	cool. I can work on a draft for service-announce after lunch today	20:00
clarkb	(I'm happy to send that as I think most others get moderated)	20:00
fungi	however, we should be mindful of the zuul dropping ansible 5 situation as well, and whether we want those to coincide, or be announced together, or not compete	20:00
clarkb	dropping ansible 5 has already been announced but without a hard date. I think it was a week or so from today that zuul had planned to drop ansible 5	20:01
frickler	agree, having a couple of days between them will help in distinguishing failure causes	20:01
clarkb	we will need to manually restart zuul to pick up that change quicker than our weekly restarts. But that is easy to do	20:02
clarkb	(also I don't think anything is using ansible 5 so should be an easy switch)	20:02
clarkb	I'll work on a draft email for all that in a bit	20:02
fungi	thanks!	20:02
clarkb	and we are at time	20:02
ianw	for mine i think it probably gets confusing to combine them as a single change, as they're not really related as such, so agree with doing separately	20:02
clarkb	thanks everyone	20:02
corvus	thanks clarkb	20:02
clarkb	feel free to continue discussion over in #opendev	20:03
clarkb	#endmeeting	20:03
opendevmeet	Meeting ended Tue Oct 11 20:03:05 2022 UTC. Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4)	20:03
opendevmeet	Minutes: https://meetings.opendev.org/meetings/infra/2022/infra.2022-10-11-19.01.html	20:03
opendevmeet	Minutes (text): https://meetings.opendev.org/meetings/infra/2022/infra.2022-10-11-19.01.txt	20:03
opendevmeet	Log: https://meetings.opendev.org/meetings/infra/2022/infra.2022-10-11-19.01.log.html	20:03
fungi	thanks clarkb!	20:03
clarkb	I'm also trying to decide how hard I should try and rescue that keyboard	20:03
clarkb	I have to rip rubber feet off the bottom to get at the screws then somehow work around some clips....	20:03
clarkb	but one thing at a time	20:03
