clarkb | Hello, it is meeting time | 18:59 |
clarkb | we'll get started in a couple of minutes | 18:59 |
fungi | ahoy! | 18:59 |
ianw | o/ | 19:00 |
clarkb | #startmeeting infra | 19:01 |
opendevmeet | Meeting started Tue Oct 4 19:01:15 2022 UTC and is due to finish in 60 minutes. The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot. | 19:01 |
opendevmeet | Useful Commands: #action #agreed #help #info #idea #link #topic #startvote. | 19:01 |
opendevmeet | The meeting name has been set to 'infra' | 19:01 |
clarkb | #link https://lists.opendev.org/pipermail/service-discuss/2022-October/000363.html Our Agenda | 19:01 |
clarkb | #topic Announcements | 19:01 |
clarkb | The OpenStack release is happening this week (tomorrow in fact) | 19:01 |
clarkb | fungi: I think you indicated you would try to be around early tomorrow to keep an eye on things. I'll do my best too | 19:01 |
clarkb | But I don't expect any issues | 19:02 |
fungi | yeah, though i have appointments starting around 14:00 utc | 19:02 |
fungi | so will be less available at that point | 19:02 |
fungi | extra eyes are appreciated | 19:02 |
clarkb | I can probably be around at that point and take over | 19:03 |
clarkb | The other thing to note is that the PTG is in 2 weeks | 19:03 |
clarkb | #topic Bastion Host Changes | 19:04 |
clarkb | let's dive right into the agenda | 19:04 |
clarkb | ianw has made progress on a stack of changes to shift bridge to running ansible out of a venv | 19:04 |
clarkb | #link https://review.opendev.org/q/topic:bridge-ansible-venv | 19:04 |
clarkb | The changes lgtm but please do review them carefully since this is the bastion host | 19:04 |
ianw | yep i need to loop back on your comments, thank you, but it's close | 19:05 |
clarkb | ianw: one thing I noted on one of the changes is that launch node may need different venvs for different clouds in order to have different versions of openstacksdk | 19:05 |
clarkb | It is possible that a good followup to this will be managing launch node venvs for that purpose | 19:05 |
clarkb | And then separately your change to update zuul to disable console log file generation landed in zuul and I think the most recent restart of the cluster picked it up | 19:06 |
clarkb | That means we can configure our jobs to not write those files | 19:06 |
clarkb | #link https://review.opendev.org/c/opendev/system-config/+/855472 | 19:06 |
ianw | yeah; is that mostly cinder/rax? i feel like that's been a pita before, and i saw in scrollback annoyances adding disk to nodepool builders | 19:06 |
ianw | (openstacksdk venvs) | 19:06 |
clarkb | ianw: it's now rax and networking (not sure if nova or neutron is the problem there) | 19:06 |
clarkb | but ya | 19:06 |
clarkb | ianw: re console log writing I have a note there that a second location also needs the update. | 19:07 |
fungi | though on a related note, the openstacksdk maintainers want to add a new pipeline in the openstack tenant of our zuul to ease testing of public clouds | 19:07 |
fungi | (a "post-review" pipeline and associated required label in gerrit to enable/trigger it) | 19:08 |
clarkb | and I've proposed a topic to the openstack tc ptg to discuss not forgetting the sdk is a tool for end users in addition to an internal api tool for openstack clusters | 19:08 |
ianw | ++ | 19:08 |
fungi | i think i'm the only reviewer to have provided them feedback on those changes so far | 19:08 |
ianw | we don't want to have to start another project to smooth out differences in openstacksdk versions ... maybe call it "shade" :) | 19:08 |
clarkb | fungi: I thought I left a comment too | 19:09 |
fungi | ahh, cool | 19:09 |
fungi | i probably missed the update | 19:09 |
clarkb | indicating that I don't think there's a reason to put it in project-config | 19:09 |
clarkb | since project-config doesn't protect the secrets in the way they think it does | 19:10 |
fungi | oh, that part, yeah | 19:10 |
fungi | the pipeline creation still needs to happen in project-config though, as does the acl addition and support for the new review label in our linter | 19:10 |
clarkb | I guess I'm not up to date on why any of that is necessary. I'll have to take another look | 19:11 |
fungi | i can bring up more details when we get to open discussion | 19:11 |
clarkb | but ya infra-root please look over the ansible in venv changes and the console log file disabling change(s). And ianw don't forget the second change needed for that | 19:11 |
clarkb | Anything else to bring up on this topic? | 19:11 |
ianw | yep i'll loop back on that | 19:12 |
ianw | one minor change this revealed in zuul was | 19:12 |
ianw | #link https://review.opendev.org/c/zuul/zuul/+/860062 | 19:12 |
ianw | after i messed up the node allocations. that improves an edge-case error message | 19:12 |
ianw | i think probably the last thing i can do is switch the testing to "bridge.opendev.org" | 19:12 |
ianw | all the changes should have abstracted things such that it should work | 19:13 |
ianw | at that point, i think we're ready (modulo launching focal nodes) to do the switch. it will still be quite a manual process getting secrets etc, but i'm happy to do that | 19:13 |
clarkb | ya and using the symlink into $PATH should make it fairly transparent to all the infra-prod job runs | 19:13 |
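For illustration, a minimal sketch of the venv-plus-symlink arrangement described above, written as Ansible tasks; the venv path and the set of linked entry points are assumptions for the example, not the actual system-config code:

```yaml
# Sketch only: install Ansible into a dedicated venv on bridge and
# symlink its entry points into the default PATH, so infra-prod jobs
# can keep invoking plain `ansible-playbook` unchanged.
- name: Install Ansible into a venv
  ansible.builtin.pip:
    name: ansible
    virtualenv: /usr/ansible-venv        # assumed path
    virtualenv_command: python3 -m venv

- name: Link venv entry points into the default PATH
  ansible.builtin.file:
    src: "/usr/ansible-venv/bin/{{ item }}"
    dest: "/usr/local/bin/{{ item }}"
    state: link
  loop:
    - ansible
    - ansible-playbook
```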
clarkb | #topic Updating Bionic Servers / Launching Jammy Servers | 19:15 |
clarkb | That's a good lead into the next topic | 19:15 |
clarkb | corvus did try to launch the new tracing server on a jammy host but that failed because our base user role couldn't delete the ubuntu user while a process owned by it was still running | 19:15 |
clarkb | I believe what happened there is launch node logged in as the ubuntu user and used it to set up root. Then it logged back in as root and tried to delete the ubuntu user but something was left behind from the original login | 19:16 |
clarkb | #link https://review.opendev.org/c/opendev/system-config/+/860112 Update to launch node to handle jammy hosts | 19:16 |
clarkb | That is one attempt at addressing this. Basically we use userdel --force which won't care if a process is running. Then the end of launch node processing involves a reboot which should clear out any stale processes | 19:16 |
clarkb | The downside to this is that --force has some behaviors we may not want generally which is why I've limited the --force deletion to users associated with the base distro cloud images and not with our regular users | 19:17 |
clarkb | this way failures to remove regular users will bubble up and we can debug them more closely | 19:17 |
clarkb | If we don't like that I think another approach would be to have launch login as ubuntu, set up root, then reboot the host and log back in after a reboot | 19:18 |
corvus | what kind of undesirable behaviors? | 19:18 |
clarkb | the reboot should clear out any stale processes and allow userdel to run as before | 19:18 |
clarkb | corvus: "This option forces the removal of the user account, even if the user is still logged in. It also forces userdel to remove the user's home directory and mail spool, even if another user uses the same home directory or if the mail spool is not owned by the specified user." | 19:18 |
clarkb | corvus: in particular I think we want it to error if a normal user outside of the launch context is logged in or otherwise has processes running | 19:19 |
clarkb | as that is something we should address. In the launch context the ubuntu user isn't something we care about and we'll reboot in a few minutes anyway | 19:19 |
corvus | yep agree. seems like --force is okay (even exactly what we want) for this case, and basically almost never otherwise. | 19:19 |
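As a concrete illustration of the approach clarkb describes, a hedged sketch using Ansible's user module, whose force: true maps to userdel --force; the list of distro users is illustrative, not the actual launch-node role:

```yaml
# Sketch only: force-remove the default cloud-image users. --force
# succeeds even if a process owned by the user is still running, and
# the reboot at the end of launch node cleans up any leftovers.
# Regular users are removed elsewhere without --force so failures
# there still surface for debugging.
- name: Remove default cloud image users
  ansible.builtin.user:
    name: "{{ item }}"
    state: absent
    remove: true   # also delete the home directory
    force: true    # pass --force to userdel
  loop:
    - ubuntu
    - debian
    - centos
```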
clarkb | anyway I expect that with that change landed we can retry a jammy node launch and see if we make more progress there | 19:20 |
clarkb | but also let me know if we want to try a different approach like the early reboot during launch idea | 19:21 |
clarkb | Did anyone else have server upgrade related items for this topic? | 19:21 |
ianw | all sounds good thanks! hopefully we have some new nodes up soon :) if not bridge, the arm64 bits too | 19:22 |
clarkb | #topic Mailman 3 | 19:23 |
clarkb | We continue to make progress. Though things have probably slowed a bit | 19:23 |
clarkb | In particular my efforts to work upstream to improve the images seem to have stalled. | 19:23 |
clarkb | There haven't been any responses to the github issues and PRs so I sent email to the mailman3 users list and the response I got there was that maxking is basically the only person who devs on those and we need to wait for maxking | 19:23 |
clarkb | #link https://review.opendev.org/c/opendev/system-config/+/860157 Forking upstream mm3 images | 19:24 |
clarkb | because of that I finally gave in and pushed ^ to fork the images. | 19:24 |
clarkb | I think this leads us to two major questions: 1) Do we want to fork or just use the images with their existing issues? and 2) If we do want to fork how forked do we want to get? If we do a minimal fork we can more easily resync with upstream if they become active again. But then we need to continue to carry workarounds in our mm3 role and stick to their uid and gid selections. | 19:25 |
clarkb | It is worth noting that I did look at maybe just building our own images based on our python base image stuff. The problem with that is it appears there is a lot of inside knowledge over what versions of things need to be combined together to make a working system | 19:25 |
clarkb | https://lists.mailman3.org/archives/list/mailman-users@mailman3.org/message/H7YK27E4GKG3KNAUPWTV32XWRWPFEU25/ upstream even acknowledges the confusion | 19:26 |
clarkb | For that reason I think we're best off forking or working upstream if we can manage it and then hope upstream curates those lists of specific versions for us | 19:26 |
clarkb | The existing change does a "lightweight" fork fwiw. The only change I made to the images was to install lynx which is necessary for html to text conversion | 19:27 |
clarkb | I don't think we need to decide on any of this right now in the meeting. But I wanted to throw the considerations out there and ask y'all to take a look. Feel free to leave your thoughts on the change and I'll do my best to follow up there | 19:28 |
clarkb | with that out of the way fungi did you have anything to add on the testing side? | 19:28 |
fungi | it seems like a reasonable path forward, and opens us up to adding other fixes | 19:28 |
fungi | i expect we'll want to hold another node with the build using the forked containers, and do another test import | 19:28 |
clarkb | ++ and probably do that after we update the prod fields that are too long for the new db? | 19:29 |
fungi | i also wanted to double-check that we're redirecting some common patterns like list description pages | 19:29 |
fungi | and i was going to fix those three lists with message templates that were too large for the columns in the db and do at least one more import test | 19:29 |
fungi | yes | 19:30 |
fungi | but otherwise we're probably close to scheduling maintenance for some initial site cut-overs | 19:30 |
clarkb | sounds good. Maybe see if we can get feedback on the image fork idea and then hold based on that | 19:31 |
fungi | right | 19:31 |
clarkb | since we may need to make changes to the images | 19:31 |
fungi | and maybe we'll hear back from the upstream image maintainer | 19:31 |
fungi | but at least we have options if not | 19:31 |
clarkb | Anything else? | 19:32 |
fungi | not on my end | 19:32 |
clarkb | #topic Gitea Connectivity Issues | 19:33 |
clarkb | At the end of last week we had several reports from users in europe that had problems with git clones to opendev.org | 19:33 |
clarkb | We were unable to reproduce this from north american isp connections and from our ovh region in france | 19:34 |
clarkb | Ultimately I think we decided it was something between the two endpoints and not something we could fix ourselves. | 19:34 |
clarkb | However | 19:34 |
clarkb | it did expose that our gitea logging no longer correlated connections from haproxy -> apache -> gitea | 19:34 |
clarkb | haproxy -> apache was working fine. The problem was apache -> gitea and that appears to be related to gitea switching http libraries from macaron to go-chi | 19:35 |
clarkb | basically go-chi doesn't handle x-forwarded-for properly to preserve port info and instead the port becomes :0 | 19:35 |
clarkb | We made some changes to stop forwarding x-forwarded-for which forces everything to record the actual ports in use. This mostly works but apache -> gitea does reuse connections for multiple requests which means that it isn't a fully 1:1 mapping now but it is better than what we had on friday | 19:36 |
clarkb | I think we can also force apache to use a new connection for each request but that is probably overkill? | 19:36 |
clarkb | I wanted to bring this up in case anyone had better ideas or concerns with these changes since we tried to get them in quickly last week while debugging | 19:36 |
fungi | the request pipelining is probably more efficient, yeah, i don't think i'd turn it off just to make logs easier to correlate | 19:37 |
clarkb | Sounds like no one has any immediate concerns. | 19:39 |
clarkb | #topic Open Discussion | 19:39 |
clarkb | Zuul will make its 7.0.0 release soon. The next step in the zuul release planning process is to switch opendev to ansible 6 by default to ensure that is working happily. I had asked that we do that after the openstack release. But once openstack releases I think we can make that change | 19:39 |
clarkb | I had a test devstack change up to check ansible 6 on the devstack jobs and that seemed to work happily | 19:40 |
clarkb | https://review.opendev.org/c/openstack/devstack/+/858436 | 19:40 |
clarkb | Now is a good time to test things with ansible 6 if you have any concerns | 19:40 |
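For anyone wanting to check a job ahead of the default flip, a hedged sketch of pinning a single Zuul job to Ansible 6 via the job-level ansible-version attribute (the job name is a hypothetical variant, not an existing job):

```yaml
# Sketch only: run one variant of a job under Ansible 6 before the
# tenant-wide default changes, to catch compatibility issues early.
- job:
    name: devstack-ansible-6   # hypothetical variant
    parent: devstack
    ansible-version: "6"
```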
fungi | #link https://review.opendev.org/859977 Add post-review pipeline | 19:41 |
fungi | that's where most of the discussion i was talking about earlier took place | 19:41 |
ianw | thanks -- slightly related to ansible updates, i think ansible-lint has fixed some issues that were holding us back from upgrading in zuul-jobs, i'll take a look | 19:42 |
fungi | the openstacksdk maintainers want to take advantage of zuul's post-review pipeline flag to run some specific jobs which use secrets but limit them to changes which the core reviewers have okayed | 19:42 |
clarkb | fungi: and looks like they don't want to use gate for that because they don't want the changes to merge at that point necessarily | 19:42 |
fungi | right, the reviewers want build results after checking that it's safe to run those jobs but before approving them | 19:43 |
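For context, a rough sketch of what such a pipeline definition could look like; the pipeline and label names follow the proposal under review here but were still being bikeshedded at this point in the meeting:

```yaml
# Sketch only: an independent pipeline with Zuul's post-review flag
# set so its jobs may use secrets, gated on a dedicated Gerrit label
# that core reviewers apply once they judge a change safe to run.
- pipeline:
    name: post-review
    manager: independent
    post-review: true
    require:
      gerrit:
        approval:
          - Allow-Post-Review: 1
    trigger:
      gerrit:
        - event: comment-added
          approval:
            - Allow-Post-Review: 1
```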
clarkb | it might be worth considering if "Allow-Post-Review" conveys the intent here clearly as this might be a pipeline that is adopted more widely | 19:43 |
fungi | we'd discussed this as a possibility (precisely for the case they bring up, testing with public cloud credentials), so i tried to rehash some of our earlier conversations about that | 19:44 |
clarkb | (typically I'd avoid bikeshedding stuff like that but once it is in gerrit acls it is hard to change) | 19:44 |
fungi | yeah, allow-post-review was merely my best suggestion. what they had before that was even less clear | 19:44 |
corvus | (this use case was an explicit design requirement for zuul, so something like this was anticipated and planned for) | 19:44 |
fungi | something to convey "voting +1 here means it's safe to run post-review pipeline jobs" but small enough to be a gerrit label name | 19:45 |
corvus | in design, i think we called it a "restricted check" pipeline or something like that. | 19:45 |
fungi | that's not terrible | 19:45 |
clarkb | no objections from me to move forward on this. As mentioned this was always something we anticipated might become a useful utility | 19:46 |
fungi | yeah, the previous name they had for it was the "post-check" pipeline (and a corresponding gerrit label of the same name) | 19:46 |
fungi | but i agree bikeshedding on terms at least a little is probably good just because of the cargo cult potential | 19:47 |
corvus | the "post-check" phrasing is slightly confusing to me. | 19:47 |
fungi | yeah, since we already have pipelines in that tenant called post and check | 19:47 |
clarkb | I think my initial concern with "allow-post-review" is it doesn't convey what is being allowed. Just that something is | 19:47 |
fungi | short for allow-post-review-jobs-to-run | 19:48 |
corvus | for the label name, maybe something that conveys "safety" or some level of having been "reviewed"... | 19:48 |
fungi | yes, something along those lines would be good | 19:49 |
fungi | my wordsmithing was simply not getting me all that far | 19:49 |
fungi | everything i came up with was too lengthy | 19:49 |
corvus | yeah, i'm not much help either | 19:49 |
clarkb | ya its a tough one | 19:49 |
clarkb | trigger-zuul-secrets | 19:49 |
fungi | word-soup | 19:49 |
clarkb | indeed | 19:50 |
fungi | anyway, since it's a use case we'd discussed at length, but it's been a while, i just wanted to call those changes to others' attention so they don't go unnoticed | 19:50 |
clarkb | ++ thanks | 19:51 |
fungi | especially since it's also in service of something we've had a bee in our collective bonnet over (loss of old public cloud support in openstacksdk) | 19:51 |
corvus | ++ | 19:51 |
clarkb | I'll give it a couple more minutes for anything else, but then we can probably end about 5 minutes early today | 19:52 |
clarkb | sounds like that is it. Thank you everyone | 19:54 |
clarkb | We'll be back next week | 19:54 |
clarkb | same location and time | 19:54 |
clarkb | #endmeeting | 19:55 |
opendevmeet | Meeting ended Tue Oct 4 19:55:03 2022 UTC. Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4) | 19:55 |
opendevmeet | Minutes: https://meetings.opendev.org/meetings/infra/2022/infra.2022-10-04-19.01.html | 19:55 |
opendevmeet | Minutes (text): https://meetings.opendev.org/meetings/infra/2022/infra.2022-10-04-19.01.txt | 19:55 |
opendevmeet | Log: https://meetings.opendev.org/meetings/infra/2022/infra.2022-10-04-19.01.log.html | 19:55 |
fungi | thanks clarkb! | 19:55 |