Tuesday, 2022-09-06

clarkbMeeting time19:00
clarkbI sent the agenda out a bit late (this morning) due to yesterday's holiday here. But I got one out and there are a few things to cover19:00
fungiahoy!19:00
ianwo/19:00
clarkb#startmeeting infra19:01
opendevmeetMeeting started Tue Sep  6 19:01:04 2022 UTC and is due to finish in 60 minutes.  The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot.19:01
opendevmeetUseful Commands: #action #agreed #help #info #idea #link #topic #startvote.19:01
opendevmeetThe meeting name has been set to 'infra'19:01
clarkb#link https://lists.opendev.org/pipermail/service-discuss/2022-September/000358.html Our Agenda19:01
clarkb#topic Announcements19:01
clarkbNo new announcements but keep in mind that OpenStack and StarlingX are working through their release processes right now19:01
clarkb#topic Bastion Host Updates19:02
clarkbAnything new on this item? I discovered a few minutes ago that newer ansible on bridge would be nice for newer apt module features. But I was able to work around that without updating ansible19:02
clarkbIn particular I think the new features also require python3.8?19:03
clarkbAnd that implies upgrading the server and so on19:03
ianwyeah, i noticed that; i'm hoping the work to move to a venv will be top of todo now19:04
clarkbexcellent I'll do my best to review those changes19:04
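For context, a minimal sketch of the venv approach being discussed, so ansible on bridge can be upgraded independently of the distro packaging (the path and version pin are assumptions, not what the in-progress system-config changes actually use):

```yaml
# Illustrative only: install a newer ansible into a dedicated venv on bridge so
# it can be upgraded independently of the distro packages; the path and version
# pin here are assumptions, not what the in-progress system-config changes use.
- hosts: bridge
  become: true
  tasks:
    - name: Install a newer ansible into its own venv
      ansible.builtin.pip:
        name: ansible>=6
        virtualenv: /opt/ansible-venv
        virtualenv_command: python3 -m venv
```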
clarkb#topic Bionic Server Upgrades19:05
clarkb#link https://etherpad.opendev.org/p/opendev-bionic-server-upgrades Notes on the work that needs to be done.19:05
clarkbMostly keeping this on the agenda to keep it top of mind.19:05
clarkbI don't think any new work has happened on this, but we really should start digging in as we can19:06
clarkb#topic Mailman 319:06
clarkb#link https://review.opendev.org/c/opendev/system-config/+/851248 Change to deploy mm3 server.19:06
clarkbThis change is ready for review now19:06
clarkb#link https://etherpad.opendev.org/p/mm3migration Server and list migration notes19:06
clarkbThank you fungi for doing much of the work to test the migration process on a held test node19:07
clarkbSeems like it works as expected and gave us some good feedback on updates to make to our default list creation19:07
clarkbOne thing we found is that lynx needs to be installed for html to txt email conversion support. That isn't currently happening on the upstream images and I've opened a PR against them to fix it. Unfortunately no word from upstream yet on whether or not they want to accept it19:07
clarkbWorst case we'll build our own images based on theirs and fix that19:08
clarkbThe next things to test are some updates to database connection settings to allow for larger email attachments (and check that mysqldump can backup the resulting database state)19:08
fungiyeah, next phase is to retest openstack-discuss import (particularly the archive) for the db packet size limit tweaks, and also script up what a whole-site migration looks like19:08
ianwoh wow, haven't used lynx in a while!19:08
clarkbWe also need to test the pipermail archive hosting redirects19:09
fungioh, yep that too19:09
clarkbas we'll be hosting old pipermail archives to keep old links alive19:09
clarkb(there isn't a mapping from pipermail to hyperkitty apparently)19:09
fungiprobably also time to start thinking about what other redirects we might want for list info pages and the like19:09
clarkbI think it is a bit early to commit to any conversion of lists.opendev.org but I'm hopeful we'll continue to make enough progress that we do that in the not too distant future19:10
fungisince the old pipermail listinfo pages had different urls19:10
fungimaybe soon after the ptg would be a good timeframe19:10
clarkbfungi: for those links we likely can redirect to the new archives though ya?19:10
fungiyes19:11
clarkbgreat19:11
clarkbMore testing planned tomorrow. Hopefully we'll have good things to report next week :)19:11
fungijust that we have old links to them in many places so not having to fix all of them will be good19:11
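As a rough illustration of the kind of compatibility redirects being discussed (a sketch only: the Apache directives are standard, but the paths, URL patterns, and host group are assumptions rather than the tested configuration):

```yaml
# Sketch only: keep old pipermail archive links alive with a static copy and
# send old listinfo URLs to the new Mailman 3 web UI. Paths, URL patterns and
# the host group are illustrative assumptions.
- hosts: mailman
  become: true
  tasks:
    - name: Install pipermail compatibility redirects
      ansible.builtin.copy:
        dest: /etc/apache2/conf-available/pipermail-compat.conf
        content: |
          # Serve the preserved pipermail archives as static files
          Alias /pipermail /var/www/pipermail
          # Old listinfo pages get redirected to the new list index
          RedirectMatch permanent "^/cgi-bin/mailman/listinfo/(.*)$" "/mailman3/lists/"
```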
clarkbAnything else on this topic?19:11
clarkb++19:11
clarkb#topic Jaeger Tracing Server19:13
clarkbcorvus: Wanted to give you an opportunity to provide any updates here if there are any19:14
corvusah yeah i started this19:14
corvus#link tracing server https://review.opendev.org/85598319:14
corvusdoes not pass tests yet -- just pushed it up eod yesterday19:14
corvushopefully it's basically complete and probably i just typo'd something.19:14
corvusif folks wanted to give it a look, i think it's not too early for that19:15
corvusoh19:15
corvusi'm using the "zk-ca" to generate certs for the clients (zuul) to send data to the server (jaeger)19:16
clarkbcorvus: client authentication certs?19:16
corvustls is optional -- we could rely on iptables only -- but i thought that was better, and seemed a reasonable scope expansion for our private ca19:16
corvusclarkb: yes exactly19:16
clarkbthat seems reasonable. There is a similar concern re meetpad that I'll talk about soon19:16
corvusit's basically the same way we're already using zk-ca for zookeeper, so it seemed reasonable to keep it parallel19:17
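For the general shape of that, here is a sketch of issuing a client certificate from a private CA with plain openssl (the actual zk-ca tooling's interface isn't shown here, and the file names and CN are illustrative):

```yaml
# Sketch of the generic client-cert-from-a-private-CA flow that the zk-ca
# tooling wraps; file names and the CN are illustrative assumptions.
- hosts: localhost
  tasks:
    - name: Create a key and CSR for a zuul component
      ansible.builtin.command:
        cmd: >-
          openssl req -new -newkey rsa:2048 -nodes
          -keyout tracing-client.key -out tracing-client.csr
          -subj /CN=zuul-scheduler
        creates: tracing-client.key

    - name: Sign the CSR with the private CA
      ansible.builtin.command:
        cmd: >-
          openssl x509 -req -in tracing-client.csr
          -CA ca.crt -CAkey ca.key -CAcreateserial
          -out tracing-client.crt -days 365
        creates: tracing-client.crt
```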
corvusthat's all i have -- that's the only interesting new thing that came up19:17
clarkbthank you for the update. I'll try to take a look at the change today19:17
clarkb#topic Fedora 3619:18
clarkbianw: this is another one I feel not caught up on. I believe I saw changes happening, but I'm not sure what the current state is.19:18
clarkbAny chance you can fill us in?19:18
ianwin terms of general deployment, zuul-jobs it's all gtg19:19
ianwdevstack i do not think works, i haven't looked closely yet19:20
ianw#link https://review.opendev.org/c/openstack/devstack/+/85433419:20
clarkbianw: is fedora 35 still there or is that on the way out now?19:20
ianwin terms of getting rid of f35, there are issues with openshift testing 19:20
fricklerit is still there for devstack, but pretty unstable19:21
clarkboh right the client doesn't work on f36 yet19:21
ianwthe openshift testing has two parts too, just to make a bigger yak to shave19:21
fricklerhttps://zuul.openstack.org/builds?job_name=devstack-platform-fedora-latest&skip=019:22
ianwthe client side is one thing.  the "oc" client doesn't run on fedora 36 due to go compatibility issues19:22
ianwi just got a bug update overnight that "it was never supposed to work" or something ...19:22
ianw#link https://issues.redhat.com/browse/OCPBUGS-55919:22
ianw(i haven't quite parsed it)19:23
ianwthe server side of that testing is also a problem -- it runs on centos7 using a PaaS repo that no longer exists19:24
ianwthere is discussion that's been happening about converting this to openshift local, which is crc etc. (i think we discussed this last week)19:25
clarkbYup and we brought it up in the zuul + open infra board discussion earlier today as well19:25
clarkbSounds like there may be some interest from at least one board member in helping if possible19:25
ianwmy concern with that is that it requires a nested-virt 9gb+ VM to run.19:25
fungiwhich we can provide in opendev too, so not entirely a show stopper19:26
ianwwhich we *can* provide via nodepool -- but it limits our testing range19:26
fungiyeah19:26
funginot ideal, certainly19:26
ianwand also, it seems kind of crazy to require this to run a few containers19:26
ianwto test against.  but in general the vibe i get is that nobody else thinks that19:27
ianwso maybe i'm just old and think 640k should be enough for everyone :/19:27
fungiothers probably do, their voices may just not be that loud (yet)19:27
fungior they're resigned to what they feel is unavoidable19:28
ianwthere is some talk of having this thing support "minishift" which sounds like what i described ... a few containers.  but i don't know anything concrete about that19:28
clarkbSounds like we've got some options there we've just got to sort out which is the best one for our needs?19:29
clarkbI guess which is best and viable19:30
ianwwell, i guess the options both kind of suck.  one is to just drop it all, the other is to re-write a bunch of zuul-jobs openshift deployment, etc. jobs that can only run on bespoke (to opendev) nodes19:30
ianwhence i haven't been running at it full speed :)19:31
clarkbWe did open up the idea of those more specialized labels for specific needs like this so I think it is ok to go that route from an OpenDev perspective if people want to push on that19:31
clarkbI can understand the frustration though.19:32
clarkbAnything else on this topic?19:32
fungithere's an open change for a 16vcpu flavor, which would automatically have more ram as well (and we could make a nestedvirt version of that)19:32
ianwyeah, we can commit that.  i kind of figured since it wasn't being pushed the need for it had subsided?19:34
fungithe original desire for it was coming from some other jobs in the zuul tenant too19:34
fungithough i forget which ones precisely19:35
corvusi'd still try out the unit tests on it if it lands19:35
fungiahh, right that was it19:35
corvusit's not as critical as when i wrote it, but it will be again in the future :)19:35
fungiwanting more parallelization for concurrent unit tests19:35
ianwyeah -- i mean i guess my only concern was that people in general find the nodes and give up on smaller vm's and just use that, limiting the testing environments19:36
corvusa 2020's era vm can run them in <10m (and they're reliable)19:36
ianwbut, it's probably me who is out of touch :)19:36
clarkbianw: I think that is still a concern and we should impress upon people that the more people who use that label by default the more contention and slower the rtt will be19:37
fungiianw: conversely, we can't use the preferred flavors in vexxhost currently because they would supply too much ram for our standard flavor, so our quota there is under-utilized19:37
clarkband they become less fault tolerant so you should use it only when a specific need makes it necessary19:37
corvusi'm a big fan of keeping them accessible -- though opendev's definition of accessible is literally 12 years old now, so i'm open to a small amount of moore's law inflation :)19:37
clarkbcorvus: to be fair I only just this year ended up with a laptop with more than 8GB of ram. 8GB was common for a very long time for some reason19:37
fungibasically, vexxhost wants us to use a minimum 32gb ram flavor19:38
corvusclarkb: (yeah, i agree -- that's super weird they stuck on that value so long)19:38
fungiso being able to put that quota to use for something would be better than having it sit there19:38
ianwyeah, i guess it's cracking the door on a bigger question of vm types ...19:38
ianwif we want 8+gb AND nested virt, is vexxhost the only option?  rax is out due to xen19:38
clarkbianw: ovh and inmotion do nested virt too19:39
clarkbinap does not iirc19:39
clarkband no nested virt on arm19:39
fungivexxhost is just the one where we actually don't have access to any "standard" 8gb flavors19:39
corvus(good -- i definitely don't want to rely on only one provider for check/gate jobs)19:40
ianwi know i can look at nodepool config but do we have these bigger vms on those providers too?19:40
ianwor is it vexxhost pinned?19:40
clarkbianw: I think we had them on vexxhost and something else for a while. It may be vexxhost specific right now19:40
corvus#link big vms https://review.opendev.org/84411619:40
fungiianw: we don't have the label defined yet, but the open change adds it to more providers than just vexxhost19:40
clarkbovh gives us specific flavors we would need to ask them about expanding them19:40
clarkbinmotion we control the flavors on and can modify ourselves19:40
corvusi did some research for that and tried to add 16gb everywhere i could ^19:40
clarkboh right I remember that19:41
fungibut also as noted, that's not 16vcpu+nestedvirt, so we'd want another change for that addition19:41
corvuslooks like the commit msg says we need custom flavors added to some providers19:41
corvusso 3 in that change, and potentially another 3 if we add custom flavors19:41
corvus(and yes, that's only looking at ram -- reduce that list for nested)19:42
corvusoh actually that's a 16 vcpu change19:42
ianw++ on the nested virt; as this tool does checks for /dev/kvm and won't even start without it19:42
corvus(not ram)19:42
fungicorvus: yeah, though i think the minimum ram for the 16vcpu flavors was at least 16gb ram (more in some providers)?19:43
corvusso yes, i would definitely think we would want another label for that, but this is a start19:43
corvusfungi: yes that is true19:43
corvus(and that's annotated in the change too)19:43
ianwit sounds like maybe to move this on -- we can do corvus' change first19:43
corvusin that change, i started adding comments for all the flavors -- i think we should do that in nodepool in the future19:43
clarkbYup sounds like we'd need to investigate further, but ya no objections from me. I think we can indicate that because they are larger they limit our capacity and thus should be used sparingly and only when necessary, and take it from there19:43
ianwbut i should take an action item to come up with some sort of spreadsheet-like layout of node types we could provide19:43
corvuseg: `flavor-name: 'A1.16'  # 16vcpu, 16384ram, 320disk`19:43
fungii also like the comment idea, yes19:43
fungihopefully providers don't change what their flavors mean (seems unlikely they would though)19:44
corvusi think they generally have not (but could), so i think it's reasonable for us to document them in our config and assume they mostly won't change19:44
ianwit seems like we probably want to have more options of ram/cpus/nested virt19:44
corvus(and if they do -- our docs will annotate what we thought they were supposed to be, which is also useful!)19:45
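Along those lines, a minimal sketch of what an annotated label/flavor entry could look like in the nodepool config (the label, provider, flavor name and sizes below are made up for illustration):

```yaml
# Illustrative nodepool fragment only; the label, provider, flavor and sizes
# are made up, and the flavor-name comment follows the suggested style above.
labels:
  - name: ubuntu-jammy-16vcpu-nested
    min-ready: 0

providers:
  - name: example-cloud
    cloud: example
    pools:
      - name: main
        max-servers: 10
        labels:
          - name: ubuntu-jammy-16vcpu-nested
            diskimage: ubuntu-jammy
            flavor-name: 'A1.16'  # 16vcpu, 16384ram, 320disk, nested virt
```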
ianw++ and also something to point potential hosting providers at -- since currently i think we just say "we run 8gb vms"19:46
clarkbsounds like we've reached a good conclusion on this topic for the meeting. Let's move on to cover the last few topics before we run out of time.19:46
clarkb#topic Meetpad and Jitsi Meet Updates19:46
clarkbFungi recently updated our jitsi meet configs for our meetpad service to catch up to upstream's latest happenings19:47
clarkbThe motivation behind this was to add a landing page to joining meetings so that firefox would stop auto blocking the auto playing audio19:47
fungithey call it a "pre-join" page19:47
clarkbGood news is that all seems to be working now. Along the way we discovered a few interesting things though.19:47
clarkbFirst is that the upstream :latest docker hub image tag is no longer latest (they stopped updating it about 4 months ago)19:47
clarkbwe now use the :stable tag which seems to correspond most closely to what they were tagging :latest previously19:48
clarkbThe next is that JVB scale out appears to rely on this new colibri websocket system that requires each JVB to be accessible from the nginx serving the jitsi meet site via http(s)19:48
clarkbTo work around that for now we've shut down the jvbs and put them in the emergency file so that meetpad's all in one installation with localhost http connections can function and serve all the video19:49
clarkbI wasn't comfortable running http to remote hosts without knowing what the data sent across that is. I think what we'll end up doing is having the jvb's allocate LE certs and then set it all up with ssl instead19:49
fungithough shutting them down was probably not strictly necessary, it helps us be sure we don't accidentally try to farm a room out to one and then have it break19:50
fungibut worth noting there, it suggests that the separate jvb servers have probably been completely broken and so unused for many months now19:50
clarkbAfter some thought I don't think the change to do all that for the JVBs is that difficult. More that we'll have to test it afterwards and ensure it is working as expected. We basically need to add LE, open firewall ports and set a config value that indicates the hostname to connect to via the proxy.19:50
clarkbfungi: yup that as well19:51
clarkbHopefully we can try fixing the JVBs later this week and do another round of testing so that we're all prepped and ready for the PTG in a month19:51
fungitesting should also be trivial: stop the jvb container on meetpad.o.o and start it on one of the separate jvb servers19:52
fungithat way we can be certain over-the-network communication is used19:52
clarkb++ and in the meantime there shouldn't be any issues using the deployment as is19:52
fungirather than the loopback-connected container19:52
fungiyeah, we're just not scaling out well currently, but as i said we probably haven't actually been for a while anyway19:53
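For reference, a rough sketch of the jvb side of that setup as a docker-compose fragment (the environment variable names are assumptions based on the upstream docker-jitsi-meet images, and the hostnames, secret and ports are illustrative, not the tested OpenDev configuration):

```yaml
# Rough sketch only: a standalone jvb that the meetpad web proxy reaches over
# TLS for colibri websockets. Variable names are assumptions from the upstream
# docker-jitsi-meet images; hostnames, the secret and ports are illustrative.
version: '3'
services:
  jvb:
    image: jitsi/jvb:stable
    environment:
      XMPP_SERVER: meetpad01.opendev.org      # the all-in-one jitsi-meet host
      JVB_AUTH_PASSWORD: changeme
      JVB_PORT: '10000'                       # media still flows over udp
      JVB_WS_DOMAIN: jvb01.opendev.org:443    # how the web proxy reaches this jvb
      JVB_WS_SERVER_ID: jvb01.opendev.org
    ports:
      - '10000:10000/udp'
      - '9090:9090'                           # colibri websocket, fronted by TLS
```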
clarkb#topic Zuul Reboot Playbook Stability19:53
clarkb#link https://review.opendev.org/c/opendev/system-config/+/85617619:53
fungiwe also talked about dumping one of the two dedicated jvb servers and just keeping a single one unless we find out we need more capacity19:53
clarkb#link https://review.opendev.org/c/opendev/system-config/+/85582619:53
clarkbOver the weekend the zuul reboot playbook crashed because it ran into the unattended upgrades apt lock on zm0219:54
fungii want to say that happened previously on one of the other servers as well19:54
clarkbI restarted it and then also noticed that since services were already down on zm02 the graceful stop playbook would similarly crash once it got to zm02 trying to docker exec on those containers19:54
clarkbThe changes above address a logging thing I noticed and should address the apt lock issue as well. I'll try to address the graceful stop thing later today19:55
clarkbWould be good to address that stuff and we should continue to check the zuul components list on monday mornings until we're happy it is running in a stable manner19:55
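As an illustration of one common way to cope with the unattended-upgrades lock in an apt task (a sketch only, not necessarily what the linked changes do; lock_timeout needs a reasonably recent ansible-core):

```yaml
# Sketch only: wait for the dpkg/apt lock instead of failing outright when
# unattended-upgrades happens to hold it. Not necessarily what the linked
# changes implement.
- name: Upgrade packages, waiting out any unattended-upgrades run
  ansible.builtin.apt:
    update_cache: true
    upgrade: dist
    lock_timeout: 600  # seconds to wait for the lock (ansible-core >= 2.12)
```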
corvusclarkb: you mention `latest ansible` -- what about upgrading bridge?19:55
clarkbcorvus: yes that would be required and is something that ianw has started poking at. The plan there is to move things into a virtualenv for ansible first aiui making it easier to manage the ansible install. Then use that to upgrade or redeploy19:56
corvuskk19:56
clarkbcorvus: I think I'm fine punting on that as work is in progress to make that happen and we'll get there19:56
clarkbjust not soon enough for the next weekly run of the zuul reboot playbook19:56
corvusya, just thought to check in on it since i'm not up to date.  thx.19:57
clarkbGood news is all these problems have been mechanical and not pointing out major flaws in our system19:57
clarkb#topic Open Discussion19:57
clarkbJust a couple minutes left. Anything else?19:57
clarkbOne of our backup hosts needs pruning if anyone hasn't done this yet and would like to run the prune script19:58
clarkbfrickler: corvus: ^ It is documented and previously ianw, myself, and then fungi have run through the docs just to make sure we're comfortable with the process19:58
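For anyone unfamiliar with it, pruning a borg repository generally looks something like the task below (a sketch only; the documented OpenDev prune script and its actual retention values are not reproduced here):

```yaml
# Sketch only: a generic borg prune with made-up retention values and repo
# path; the real documented prune procedure is what should actually be run.
- name: Prune old backup archives for one repository
  ansible.builtin.command:
    cmd: >-
      borg prune --list
      --keep-daily 7 --keep-weekly 4 --keep-monthly 6
      /opt/backups/example-server
```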
fungii've done it a couple of times already in the past, but happy to do it again unless someone else wants the honor19:58
fricklerI'll skip this time19:59
ianwi'm happy to do it, will do this afternoon .au time19:59
clarkbfungi: sounds like maybe you're it this time :) thanks19:59
fungiwfm19:59
clarkboh I jumped the gun. Thank you ianw19:59
clarkbby 2 seconds :)19:59
fungithanks ianw!20:00
clarkband we are at time. Thank you everyone for listening to me ramble on IRC20:00
clarkbWe'll be back next week at the same time and location20:00
fungithanks clarkb!20:00
clarkb#endmeeting20:00
opendevmeetMeeting ended Tue Sep  6 20:00:22 2022 UTC.  Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4)20:00
opendevmeetMinutes:        https://meetings.opendev.org/meetings/infra/2022/infra.2022-09-06-19.01.html20:00
opendevmeetMinutes (text): https://meetings.opendev.org/meetings/infra/2022/infra.2022-09-06-19.01.txt20:00
opendevmeetLog:            https://meetings.opendev.org/meetings/infra/2022/infra.2022-09-06-19.01.log.html20:00
fungisame bat time, same bat channel20:00
ianwclarkb: haha i think i derailed things the most today :)20:00
