Tuesday, 2022-09-06

clarkbMeeting time19:00
clarkbI sent the agenda out a bit late (this morning) due to yesterday's holiday here. But I got one out and there are a few things to cover19:00
fungiahoy!19:00
ianwo/19:00
clarkb#startmeeting infra19:01
opendevmeetMeeting started Tue Sep  6 19:01:04 2022 UTC and is due to finish in 60 minutes.  The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot.19:01
opendevmeetUseful Commands: #action #agreed #help #info #idea #link #topic #startvote.19:01
opendevmeetThe meeting name has been set to 'infra'19:01
clarkb#link https://lists.opendev.org/pipermail/service-discuss/2022-September/000358.html Our Agenda19:01
clarkb#topic Announcements19:01
clarkbNo new announcements but keep in mind that OpenStack and StarlingX are working through their release processes right now19:01
clarkb#topic Bastion Host Updates19:02
clarkbAnything new on this item? I discovered a few minutes ago that newer ansible on bridge would be nice for newer apt module features. But I was able to work around that without updating ansible19:02
clarkbIn particular I think the new features also require python3.8?19:03
clarkbAnd that implies upgrading the server and so on19:03
ianwyeah, i noticed that; i'm hoping the work to move to a venv will be top of todo now19:04
clarkbexcellent I'll do my best to review those changes19:04
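For context, a minimal sketch of the venv approach being discussed, so ansible on bridge can be upgraded independently of the distro packaging (the path and version pin are assumptions, not what the in-progress system-config changes actually use):

```yaml
# Illustrative only: install a newer ansible into a dedicated venv on bridge so
# it can be upgraded independently of the distro packages; the path and version
# pin here are assumptions, not what the in-progress system-config changes use.
- hosts: bridge
  become: true
  tasks:
    - name: Install a newer ansible into its own venv
      ansible.builtin.pip:
        name: ansible>=6
        virtualenv: /opt/ansible-venv
        virtualenv_command: python3 -m venv
```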
clarkb#topic Bionic Server Upgrades19:05
clarkb#link https://etherpad.opendev.org/p/opendev-bionic-server-upgrades Notes on the work that needs to be done.19:05
clarkbMostly keeping this on the agenda to keep it top of mind.19:05
clarkbI don't think any new work has happened on this, but we really should start digging in as we can19:06
clarkb#topic Mailman 319:06
clarkb#link https://review.opendev.org/c/opendev/system-config/+/851248 Change to deploy mm3 server.19:06
clarkbThis change is ready for review now19:06
clarkb#link https://etherpad.opendev.org/p/mm3migration Server and list migration notes19:06
clarkbThank you fungi for doing much of the work to test the migration process on a held test node19:07
clarkbSeems like it works as expected and gave us some good feedback on updates to make to our default list creation19:07
clarkbOne thing we found is that lynx needs to be installed for html to txt email conversion support. That isn't currently happening on the upstream images and I've opened a PR against them to fix it. Unfortunately no word from upstream yet on whether or not they want to accept it19:07
clarkbWorst case we'll build our own images based on theirs and fix that19:08
clarkbThe next things to test are some updates to database connection settings to allow for larger email attachments (and check that mysqldump can backup the resulting database state)19:08
fungiyeah, next phase is to retest openstack-discuss import (particularly the archive) for the db packet size limit tweaks, and also script up what a whole-site migration looks like19:08
ianwoh wow, haven't used lynx in a while!19:08
clarkbWe also need to test the pipermail archive hosting redirects19:09
fungioh, yep that too19:09
clarkbas we'll be hosting old pipermail archives to keep old links alive19:09
clarkb(there isn't a mapping from pipermail to hyperkitty apparently)19:09
fungiprobably also time to start thinking about what other redirects we might want for list info pages and the like19:09
clarkbI think it is a bit early to commit to any conversion of lists.opendev.org but I'm hopeful we'll continue to make enough progress that we do that in the not too distant future19:10
fungisince the old pipermail listinfo pages had different urls19:10
fungimaybe soon after the ptg would be a good timeframe19:10
clarkbfungi: for those links we likely can redirect to the new archives though ya?19:10
fungiyes19:11
clarkbgreat19:11
clarkbMore testing planned tomorrow. Hopefully we'll have good things to report next week :)19:11
fungijust that we have old links to them in many places so not having to fix all of them will be good19:11
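As a rough illustration of the kind of compatibility redirects being discussed (a sketch only: the Apache directives are standard, but the paths, URL patterns, and host group are assumptions rather than the tested configuration):

```yaml
# Sketch only: keep old pipermail archive links alive with a static copy and
# send old listinfo URLs to the new Mailman 3 web UI. Paths, URL patterns and
# the host group are illustrative assumptions.
- hosts: mailman
  become: true
  tasks:
    - name: Install pipermail compatibility redirects
      ansible.builtin.copy:
        dest: /etc/apache2/conf-available/pipermail-compat.conf
        content: |
          # Serve the preserved pipermail archives as static files
          Alias /pipermail /var/www/pipermail
          # Old listinfo pages get redirected to the new list index
          RedirectMatch permanent "^/cgi-bin/mailman/listinfo/(.*)$" "/mailman3/lists/"
```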
clarkbAnything else on this topic?19:11
clarkb++19:11
clarkb#topic Jaeger Tracing Server19:13
clarkbcorvus: Wanted to give you an opportunity to provide any updates here if there are any19:14
corvusah yeah i started this19:14
corvus#link tracing server https://review.opendev.org/85598319:14
corvusdoes not pass tests yet -- just pushed it up eod yesterday19:14
corvushopefully it's basically complete and probably i just typo'd something.19:14
corvusif folks wanted to give it a look, i think it's not too early for that19:15
corvusoh19:15
corvusi'm using the "zk-ca" to generate certs for the clients (zuul) to send data to the server (jaeger)19:16
clarkbcorvus: client authentication certs?19:16
corvustls is optional -- we could rely on iptables only -- but i thought that was better, and seemed a reasonable scope expansion for our private ca19:16
corvusclarkb: yes exactly19:16
clarkbthat seems reasonable. There is a similar concern re meetpad that I'll talk about soon19:16
corvusit's basically the same way we're already using zk-ca for zookeeper, so it seemed reasonable to keep it parallel19:17
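For the general shape of that, here is a sketch of issuing a client certificate from a private CA with plain openssl (the actual zk-ca tooling's interface isn't shown here, and the file names and CN are illustrative):

```yaml
# Sketch of the generic client-cert-from-a-private-CA flow that the zk-ca
# tooling wraps; file names and the CN are illustrative assumptions.
- hosts: localhost
  tasks:
    - name: Create a key and CSR for a zuul component
      ansible.builtin.command:
        cmd: >-
          openssl req -new -newkey rsa:2048 -nodes
          -keyout tracing-client.key -out tracing-client.csr
          -subj /CN=zuul-scheduler
        creates: tracing-client.key

    - name: Sign the CSR with the private CA
      ansible.builtin.command:
        cmd: >-
          openssl x509 -req -in tracing-client.csr
          -CA ca.crt -CAkey ca.key -CAcreateserial
          -out tracing-client.crt -days 365
        creates: tracing-client.crt
```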
corvusthat's all i have -- that's the only interesting new thing that came up19:17
clarkbthank you for the update. I'll try to take a look at the change today19:17
clarkb#topic Fedora 3619:18
clarkbianw: this is another one I feel not caught up on. I believe I saw changes happening, but I'm not sure what the current state is.19:18
clarkbAny chance you can fill us in?19:18
ianwin terms of general deployment, zuul-jobs it's all gtg19:19
ianwdevstack i do not think works, i haven't looked closely yet19:20
ianw#link https://review.opendev.org/c/openstack/devstack/+/85433419:20
clarkbianw: is fedora 35 still there or is that on the way out now?19:20
ianwin terms of getting rid of f35, there are issues with openshift testing 19:20
fricklerit is still there for devstack, but pretty unstable19:21
clarkboh right the client doesn't work on f36 yet19:21
ianwthe openshift testing has two parts too, just to make a bigger yak to shave19:21
fricklerhttps://zuul.openstack.org/builds?job_name=devstack-platform-fedora-latest&skip=019:22
ianwthe client side is one thing.  the "oc" client doesn't run on fedora 36 due to go compatibility issues19:22
ianwi just got a bug update overnight that "it was never supposed to work" or something ...19:22
ianw#link https://issues.redhat.com/browse/OCPBUGS-55919:22
ianw(i haven't quite parsed it)19:23
ianwthe server side of that testing is also a problem -- it runs on centos7 using a PaaS repo that no longer exists19:24
ianwthere is discussion that's been happening about converting this to openshift local, which is crc etc. (i think we discussed this last week)19:25
clarkbYup and we brought it up in the zuul + open infra board discussion earlier today as well19:25
clarkbSounds like there may be some interest from at least one board member in helping if possible19:25
ianwmy concern with that is that it requires a nested-virt 9gb+ VM to run.19:25
fungiwhich we can provide in opendev too, so not entirely a show stopper19:26
ianwwhich we *can* provide via nodepool -- but it limits our testing range19:26
fungiyeah19:26
funginot ideal, certainly19:26
ianwand also, it seems kind of crazy to require this to run a few containers19:26
ianwto test against.  but in general the vibe i get is that nobody else thinks that19:27
ianwso maybe i'm just old and think 640k should be enough for everyone :/19:27
fungiothers probably do, their voices may just not be that loud (yet)19:27
fungior they're resigned to what they feel is unavoidable19:28
ianwthere is some talk of having this thing support "minishift" which sounds like what i described ... a few containers.  but i don't know anything concrete about that19:28
clarkbSounds like we've got some options there we've just got to sort out which is the best one for our needs?19:29
clarkbI guess which is best and viable19:30
ianwwell, i guess the options both kind of suck.  one is to just drop it all, the other is to re-write a bunch of zuul-jobs openshift deployment, etc. jobs that can only run on bespoke (to opendev) nodes19:30
ianwhence i haven't been running at it full speed :)19:31
clarkbWe did open up the idea of those more specialized labels for specific needs like this so I think it is ok to go that route from an OpenDev perspective if people want to push on that19:31
clarkbI can understand the frustration though.19:32
clarkbAnything else on this topic?19:32
fungithere's an open change for a 16vcpu flavor, which would automatically have more ram as well (and we could make a nestedvirt version of that)19:32
ianwyeah, we can commit that.  i kind of figured since it wasn't being pushed the need for it had subsided?19:34
fungithe original desire for it was coming from some other jobs in the zuul tenant too19:34
fungithough i forget which ones precisely19:35
corvusi'd still try out the unit tests on it if it lands19:35
fungiahh, right that was it19:35
corvusit's not as critical as when i wrote it, but it will be again in the future :)19:35
fungiwanting more parallelization for concurrent unit tests19:35
ianwyeah -- i mean i guess my only concern was that people in general find the nodes and give up on smaller vm's and just use that, limiting the testing environments19:36
corvusa 2020's era vm can run them in <10m (and they're reliable)19:36
ianwbut, it's probably me who is out of touch :)19:36
clarkbianw: I think that is still a concern and we should impress upon people that the more people who use that label by default the more contention and slower the rtt will be19:37
fungiianw: conversely, we can't use the preferred flavors in vexxhost currently because they would supply too much ram for our standard flavor, so our quota there is under-utilized19:37
clarkband they become less fault tolerant so you should use it only when a specific need makes it necessary19:37
corvusi'm a big fan of keeping them accessible -- though opendev's definition of accessible is literally 12 years old now, so i'm open to a small amount of moore's law inflation :)19:37
clarkbcorvus: to be fair I only just this year ended up with a laptop with more than 8GB of ram. 8GB was common for a very long time for some reason19:37
fungibasically, vexxhost wants us to use a minimum 32gb ram flavor19:38
corvusclarkb: (yeah, i agree -- that's super weird they stuck on that value so long)19:38
fungiso being able to put that quota to use for something would be better than having it sit there19:38
ianwyeah, i guess it's cracking the door on a bigger question of vm types ...19:38
ianwif we want 8+gb AND nested virt, is vexxhost the only option?  rax is out due to xen19:38
clarkbianw: ovh and inmotion do nested virt too19:39
clarkbinap does not iirc19:39
clarkband no nested virt on arm19:39
fungivexxhost is just the one where we actually don't have access to any "standard" 8gb flavors19:39
corvus(good -- i definitely don't want to rely on only one provider for check/gate jobs)19:40
ianwi know i can look at nodepool config but do we have these bigger vms on those providers too?19:40
ianwor is it vexxhost pinned?19:40
clarkbianw: I think we had them on vexxhost and something else for a while. It may be vexxhost specific right now19:40
corvus#link big vms https://review.opendev.org/84411619:40
fungiianw: we don't have the label defined yet, but the open change adds it to more providers than just vexxhost19:40
clarkbovh gives us specific flavors we would need to ask them about expanding them19:40
clarkbinmotion we control the flavors on and can modify ourselves19:40
corvusi did some research for that and tried to add 16gb everywhere i could ^19:40
clarkboh right I remember that19:41
fungibut also as noted, that's not 16vcpu+nestedvirt, so we'd want another change for that addition19:41
corvuslooks like the commit msg says we need custom flavors added to some providers19:41
corvusso 3 in that change, and potentially another 3 if we add custom flavors19:41
corvus(and yes, that's only looking at ram -- reduce that list for nested)19:42
corvusoh actually that's a 16 vcpu change19:42
ianw++ on the nested virt; as this tool does checks for /dev/kvm and won't even start without it19:42
corvus(not ram)19:42
fungicorvus: yeah, though i think the minimum ram for the 16vcpu flavors was at least 16gb ram (more in some providers)?19:43
corvusso yes, i would definitely think we would want another label for that, but this is a start19:43
corvusfungi: yes that is true19:43
corvus(and that's annotated in the change too)19:43
ianwit sounds like maybe to move this on -- we can do corvus' change first19:43
corvusin that change, i started adding comments for all the flavors -- i think we should do that in nodepool in the future19:43
clarkbYup sounds like we'd need to investigate further, but ya no objections from me. I think we can indicate that because they are larger they limit our capacity and thus should be used sparingly and only when necessary, and take it from there19:43
ianwbut i should take an action item to come up with some sort of spreadsheet-like layout of node types we could provide19:43
corvuseg: `flavor-name: 'A1.16'  # 16vcpu, 16384ram, 320disk`19:43
fungii also like the comment idea, yes19:43
fungihopefully providers don't change what their flavors mean (seems unlikely they would though)19:44
corvusi think they generally have not (but could), so i think it's reasonable for us to document them in our config and assume they mostly won't change19:44
ianwit seems like we probably want to have more options of ram/cpus/nested virt19:44
corvus(and if they do -- our docs will annotate what we thought they were supposed to be, which is also useful!)19:45
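Along those lines, a minimal sketch of what an annotated label/flavor entry could look like in the nodepool config (the label, provider, flavor name and sizes below are made up for illustration):

```yaml
# Illustrative nodepool fragment only; the label, provider, flavor and sizes
# are made up, and the flavor-name comment follows the suggested style above.
labels:
  - name: ubuntu-jammy-16vcpu-nested
    min-ready: 0

providers:
  - name: example-cloud
    cloud: example
    pools:
      - name: main
        max-servers: 10
        labels:
          - name: ubuntu-jammy-16vcpu-nested
            diskimage: ubuntu-jammy
            flavor-name: 'A1.16'  # 16vcpu, 16384ram, 320disk, nested virt
```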
ianw++ and also something to point potential hosting providers at -- since currently i think we just say "we run 8gb vms"19:46
clarkbsounds like we've reached a good conclusion on this topic for the meeting. Let's move on to cover the last few topics before we run out of time.19:46
clarkb#topic Meetpad and Jitsi Meet Updates19:46
clarkbFungi recently updated our jitsi meet configs for our meetpad service to catch up to upstream's latest happenings19:47
clarkbThe motivation behind this was to add a landing page to joining meetings so that firefox would stop auto blocking the auto playing audio19:47
fungithey call it a "pre-join" page19:47
clarkbGood news is that all seems to be working now. Along the way we discovered a few interesting things though.19:47
clarkbFirst is that the upstream :latest docker hub image tag is no longer latest (they stopped updating it about 4 months ago)19:47
clarkbwe now use the :stable tag which seems to correspond most closely to what they were tagging :latest previously19:48
clarkbThe next is that JVB scale out appears to rely on this new colibri websocket system that requires each JVB to be accessible from the nginx serving the jitsi meet site via http(s)19:48
clarkbTo work around that for now we've shut down the jvbs and put them in the emergency file so that meetpad's all in one installation with localhost http connections can function and serve all the video19:49
clarkbI wasn't comfortable running http to remote hosts without knowing what the data sent across that is. I think what we'll end up doing is having the jvb's allocate LE certs and then set it all up with ssl instead19:49
fungithough shutting them down was probably not strictly necessary, it helps us be sure we don't accidentally try to farm a room out to one and then have it break19:50
fungibut worth noting there, it suggests that the separate jvb servers have probably been completely broken and so unused for many months now19:50
clarkbAfter some thought I don't think the change to do all that for the JVBs is that difficult. More that we'll have to test it afterwards and ensure it is working as expected. We basically need to add LE, open firewall ports and set a config value that indicates the hostname to connect to via the proxy.19:50
clarkbfungi: yup that as well19:51
clarkbHopefully we can try fixing the JVBs later this week and do another round of testing so that we're all prepped and ready for the PTG in a month19:51
fungitesting should also be trivial: stop the jvb container on meetpad.o.o and start it on one of the separate jvb servers19:52
fungithat way we can be certain over-the-network communication is used19:52
clarkb++ and in the meantime there shouldn't be any issues using the deployment as is19:52
fungirather than the loopback-connected container19:52
fungiyeah, we're just not scaling out well currently, but as i said we probably haven't actually been for a while anyway19:53
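For reference, a rough sketch of the jvb side of that setup as a docker-compose fragment (the environment variable names are assumptions based on the upstream docker-jitsi-meet images, and the hostnames, secret and ports are illustrative, not the tested OpenDev configuration):

```yaml
# Rough sketch only: a standalone jvb that the meetpad web proxy reaches over
# TLS for colibri websockets. Variable names are assumptions from the upstream
# docker-jitsi-meet images; hostnames, the secret and ports are illustrative.
version: '3'
services:
  jvb:
    image: jitsi/jvb:stable
    environment:
      XMPP_SERVER: meetpad01.opendev.org      # the all-in-one jitsi-meet host
      JVB_AUTH_PASSWORD: changeme
      JVB_PORT: '10000'                       # media still flows over udp
      JVB_WS_DOMAIN: jvb01.opendev.org:443    # how the web proxy reaches this jvb
      JVB_WS_SERVER_ID: jvb01.opendev.org
    ports:
      - '10000:10000/udp'
      - '9090:9090'                           # colibri websocket, fronted by TLS
```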
clarkb#topic Zuul Reboot Playbook Stability19:53
clarkb#link https://review.opendev.org/c/opendev/system-config/+/85617619:53
fungiwe also talked about dumping one of the two dedicated jvb servers and just keeping a single one unless we find out we need more capacity19:53
clarkb#link https://review.opendev.org/c/opendev/system-config/+/85582619:53
clarkbOver the weekend the zuul reboot playbook crashed because it ran into the unattended upgrades apt lock on zm0219:54
fungii want to say that happened previously on one of the other servers as well19:54
clarkbI restarted it and then also noticed that since services were already down on zm02 the graceful stop playbook would similarly crash once it got to zm02 trying to docker exec on those containers19:54
clarkbThe changes above address a logging thing I noticed and should address the apt lock issue as well. I'll try to address the graceful stop thing later today19:55
clarkbWould be good to address that stuff and we should continue to check the zuul components list on monday mornings until we're happy it is running in a stable manner19:55
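As an illustration of one common way to cope with the unattended-upgrades lock in an apt task (a sketch only, not necessarily what the linked changes do; lock_timeout needs a reasonably recent ansible-core):

```yaml
# Sketch only: wait for the dpkg/apt lock instead of failing outright when
# unattended-upgrades happens to hold it. Not necessarily what the linked
# changes implement.
- name: Upgrade packages, waiting out any unattended-upgrades run
  ansible.builtin.apt:
    update_cache: true
    upgrade: dist
    lock_timeout: 600  # seconds to wait for the lock (ansible-core >= 2.12)
```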
corvusclarkb: you mention `latest ansible` -- what about upgrading bridge?19:55
clarkbcorvus: yes that would be required and is something that ianw has started poking at. The plan there is to move things into a virtualenv for ansible first aiui making it easier to manage the ansible install. Then use that to upgrade or redeploy19:56
corvuskk19:56
clarkbcorvus: I think I'm fine punting on that as work is in progress to make that happen and we'll get there19:56
clarkbjust not soon enough for the next weekly run of the zuul reboot playbook19:56
corvusya, just thought to check in on it since i'm not up to date.  thx.19:57
clarkbGood news is all these problems have been mechanical and not pointing out major flaws in our system19:57
clarkb#topic Open Discussion19:57
clarkbJust a couple minutes left. Anything else?19:57
clarkbOne of our backup hosts needs pruning if anyone hasn't done this yet and would like to run the prune script19:58
clarkbfrickler: corvus: ^ It is documented and previously ianw, myself, and then fungi have run through the docs just to make sure we're comfortable with the process19:58
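For anyone unfamiliar with it, pruning a borg repository generally looks something like the task below (a sketch only; the documented OpenDev prune script and its actual retention values are not reproduced here):

```yaml
# Sketch only: a generic borg prune with made-up retention values and repo
# path; the real documented prune procedure is what should actually be run.
- name: Prune old backup archives for one repository
  ansible.builtin.command:
    cmd: >-
      borg prune --list
      --keep-daily 7 --keep-weekly 4 --keep-monthly 6
      /opt/backups/example-server
```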
fungii've done it a couple of times already in the past, but happy to do it again unless someone else wants the honor19:58
fricklerI'll skip this time19:59
ianwi'm happy to do it, will do this afternoon .au time19:59
clarkbfungi: sounds like maybe you're it this time :) thanks19:59
fungiwfm19:59
clarkboh I jumped the gun. Thank you ianw19:59
clarkbby 2 seconds :)19:59
fungithanks ianw!20:00
clarkband we are at time. Thank you everyone for listening to me ramble on IRC20:00
clarkbWe'll be back next week at the same time and location20:00
fungithanks clarkb!20:00
clarkb#endmeeting20:00
opendevmeetMeeting ended Tue Sep  6 20:00:22 2022 UTC.  Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4)20:00
opendevmeetMinutes:        https://meetings.opendev.org/meetings/infra/2022/infra.2022-09-06-19.01.html20:00
opendevmeetMinutes (text): https://meetings.opendev.org/meetings/infra/2022/infra.2022-09-06-19.01.txt20:00
opendevmeetLog:            https://meetings.opendev.org/meetings/infra/2022/infra.2022-09-06-19.01.log.html20:00
fungisame bat time, same bat channel20:00
ianwclarkb: haha i think i derailed things the most today :)20:00
