Nick | Message | Time |
---|---|---|
clarkb | Meeting time | 19:00 |
clarkb | I sent the agenda out a bit late (this morning) due to yesterday's holiday here. But I got one out and there are a few things to cover | 19:00 |
fungi | ahoy! | 19:00 |
ianw | o/ | 19:00 |
clarkb | #startmeeting infra | 19:01 |
opendevmeet | Meeting started Tue Sep 6 19:01:04 2022 UTC and is due to finish in 60 minutes. The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot. | 19:01 |
opendevmeet | Useful Commands: #action #agreed #help #info #idea #link #topic #startvote. | 19:01 |
opendevmeet | The meeting name has been set to 'infra' | 19:01 |
clarkb | #link https://lists.opendev.org/pipermail/service-discuss/2022-September/000358.html Our Agenda | 19:01 |
clarkb | #topic Announcements | 19:01 |
clarkb | No new announcements but keep in mind that OpenStack and StarlingX are working through their release processes right now | 19:01 |
clarkb | #topic Bastion Host Updates | 19:02 |
clarkb | Anything new on this item? I discovered a few minutes ago that newer ansible on bridge would be nice for newer apt module features. But I was able to work around that without updating ansible | 19:02 |
clarkb | In particular I think the new features also require python3.8? | 19:03 |
clarkb | And that implies upgrading the server and so on | 19:03 |
ianw | yeah, i noticed that; i'm hoping the work to move to a venv will be top of todo now | 19:04 |
clarkb | excellent I'll do my best to review those changes | 19:04 |
clarkb | #topic Bionic Server Upgrades | 19:05 |
clarkb | #link https://etherpad.opendev.org/p/opendev-bionic-server-upgrades Notes on the work that needs to be done. | 19:05 |
clarkb | Mostly keeping this on the agenda to keep it top of mind. | 19:05 |
clarkb | I don't think any new work has happened on this, but we really should start digging in as we can | 19:06 |
clarkb | #topic Mailman 3 | 19:06 |
clarkb | #link https://review.opendev.org/c/opendev/system-config/+/851248 Change to deploy mm3 server. | 19:06 |
clarkb | This change is ready for review now | 19:06 |
clarkb | #link https://etherpad.opendev.org/p/mm3migration Server and list migration notes | 19:06 |
clarkb | Thank you fungi for doing much of the work to test the migration process on a held test node | 19:07 |
clarkb | Seems like it works as expected and gave us some good feedback on updates to make to our default list creation | 19:07 |
clarkb | One thing we found is that lynx needs to be installed for html to txt email conversion support. That isn't currently happening on the upstream images and I've opened a PR against them to fix it. Unfortunately no word from upstream yet on whether or not they want to accept it | 19:07 |
clarkb | Worst case we'll build our own images based on theirs and fix that | 19:08 |
clarkb | The next things to test are some updates to database connection settings to allow for larger email attachments (and check that mysqldump can backup the resulting database state) | 19:08 |
fungi | yeah, next phase is to retest openstack-discuss import (particularly the archive) for the db packet size limit tweaks, and also script up what a whole-site migration looks like | 19:08 |
ianw | oh wow, haven't used lynx in a while! | 19:08 |
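A minimal sketch of the kind of database tweak under discussion — raising MariaDB's max_allowed_packet so large attachment imports fit — assuming the Mailman 3 stack runs its database as a compose service. The service name, image tag, and values here are illustrative assumptions, not the actual system-config settings:

```yaml
# Hypothetical compose fragment only: bump max_allowed_packet so importing
# large list archives/attachments doesn't hit the default packet size limit.
# Names and values are assumptions, not opendev's real configuration.
services:
  mailman-db:
    image: mariadb:10.6
    command: ["--max-allowed-packet=64M"]
    environment:
      MYSQL_DATABASE: mailman
```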
clarkb | We also need to test the pipermail archive hosting redirects | 19:09 |
fungi | oh, yep that too | 19:09 |
clarkb | as we'll be hosting old pipermail archives to keep old links alive | 19:09 |
clarkb | (there isn't a mapping from pipermail to hyperkitty apparently) | 19:09 |
fungi | probably also time to start thinking about what other redirects we might want for list info pages and the like | 19:09 |
clarkb | I think it is a bit early to commit to any conversion of lists.opendev.org but I'm hopeful we'll continue to make enough progress that we do that in the not too distant future | 19:10 |
fungi | since the old pipermail listinfo pages had different urls | 19:10 |
fungi | maybe soon after the ptg would be a good timeframe | 19:10 |
clarkb | fungi: for those links we likely can redirect to the new archives though ya? | 19:10 |
fungi | yes | 19:11 |
clarkb | great | 19:11 |
clarkb | More testing planned tomorrow. Hopefully we'll have good things to report next week :) | 19:11 |
fungi | just that we have old links to them in many places so not having to fix all of them will be good | 19:11 |
clarkb | Anything else on this topic? | 19:11 |
clarkb | ++ | 19:11 |
clarkb | #topic Jaeger Tracing Server | 19:13 |
clarkb | corvus: Wanted to give you an opportunity to provide any updates here if there are any | 19:14 |
corvus | ah yeah i started this | 19:14 |
corvus | #link tracing server https://review.opendev.org/855983 | 19:14 |
corvus | does not pass tests yet -- just pushed it up eod yesterday | 19:14 |
corvus | hopefully it's basically complete and probably i just typo'd something. | 19:14 |
corvus | if folks wanted to give it a look, i think it's not too early for that | 19:15 |
corvus | oh | 19:15 |
corvus | i'm using the "zk-ca" to generate certs for the clients (zuul) to send data to the server (jaeger) | 19:16 |
clarkb | corvus: client authentication certs? | 19:16 |
corvus | tls is optional -- we could rely on iptables only -- but i thought that was better, and seemed a reasonable scope expansion for our private ca | 19:16 |
corvus | clarkb: yes exactly | 19:16 |
clarkb | that seems reasonable. There is a similar concern re meetpad that I'll talk about soon | 19:16 |
corvus | it's basically the same way we're already using zk-ca for zookeeper, so it seemed reasonable to keep it parallel | 19:17 |
corvus | that's all i have -- that's the only interesting new thing that came up | 19:17 |
clarkb | thank you for the update. I'll try to take a look at the change today | 19:17 |
clarkb | #topic Fedora 36 | 19:18 |
clarkb | ianw: this is another one I feel not caught up on. I believe I saw changes happening, but I'm not sure what the current state is. | 19:18 |
clarkb | Any chance you can fill us in? | 19:18 |
ianw | in terms of general deployment, zuul-jobs it's all gtg | 19:19 |
ianw | devstack i do not think works, i haven't looked closely yet | 19:20 |
ianw | #link https://review.opendev.org/c/openstack/devstack/+/854334 | 19:20 |
clarkb | ianw: is fedora 35 still there or is that on the way out now? | 19:20 |
ianw | in terms of getting rid of f35, there are issues with openshift testing | 19:20 |
frickler | it is still there for devstack, but pretty unstable | 19:21 |
clarkb | oh right the client doesn't work on f36 yet | 19:21 |
ianw | the openshift testing has two parts too, just to make a bigger yak to shave | 19:21 |
frickler | https://zuul.openstack.org/builds?job_name=devstack-platform-fedora-latest&skip=0 | 19:22 |
ianw | the client side is one thing. the "oc" client doesn't run on fedora 36 due to go compatibility issues | 19:22 |
ianw | i just got a bug update overnight that "it was never supposed to work" or something ... | 19:22 |
ianw | #link https://issues.redhat.com/browse/OCPBUGS-559 | 19:22 |
ianw | (i haven't quite parsed it) | 19:23 |
ianw | the server side of that testing is also a problem -- it runs on centos7 using a PaaS repo that no longer exists | 19:24 |
ianw | this is discussion that's been happening about converting this to openshift local which is crc etc. etc. (i think we discussed this last week) | 19:25 |
clarkb | Yup and we brought it up in the zuul + open infra board discussion earlier today as well | 19:25 |
clarkb | Sounds like there may be some interest from at least one board member in helping if possible | 19:25 |
ianw | my concern with that is that it requires a nested-virt 9gb+ VM to run. | 19:25 |
fungi | which we can provide in opendev too, so not entirely a show stopper | 19:26 |
ianw | which we *can* provide via nodepool -- but it limits our testing range | 19:26 |
fungi | yeah | 19:26 |
fungi | not ideal, certainly | 19:26 |
ianw | and also, it seems kind of crazy to require this to run a few containers | 19:26 |
ianw | to test against. but in general the vibe i get is that nobody else thinks that | 19:27 |
ianw | so maybe i'm just old and think 640k should be enough for everyone :/ | 19:27 |
fungi | others probably do, their voices may just not be that loud (yet) | 19:27 |
fungi | or they're resigned to what they feel is unavoidable | 19:28 |
ianw | there is some talk of having this thing support "minishift" which sounds like what i described ... a few containers. but i don't know anything concrete about that | 19:28 |
clarkb | Sounds like we've got some options there we've just got to sort out which is the best one for our needs? | 19:29 |
clarkb | I guess which is best and viable | 19:30 |
ianw | well, i guess the options both kind of suck. one is to just drop it all, the other is to re-write a bunch of zuul-jobs openshift deployment, etc. jobs that can only run on bespoke (to opendev) nodes | 19:30 |
ianw | hence i haven't been running at it full speed :) | 19:31 |
clarkb | We did open up the idea of those more specialized labels for specific needs like this so I think it is ok to go that route from an OpenDev perspective if people want to push on that | 19:31 |
clarkb | I can understand the frustration though. | 19:32 |
clarkb | Anything else on this topic? | 19:32 |
fungi | there's an open change for a 16vcpu flavor, which would automatically have more ram as well (and we could make a nestedvirt version of that) | 19:32 |
ianw | yeah, we can commit that. i kind of figured since it wasn't being pushed the need for it had subsided? | 19:34 |
fungi | the original desire for it was coming from some other jobs in the zuul tenant too | 19:34 |
fungi | though i forget which ones precisely | 19:35 |
corvus | i'd still try out the unit tests on it if it lands | 19:35 |
fungi | ahh, right that was it | 19:35 |
corvus | it's not as critical as when i wrote it, but it will be again in the future :) | 19:35 |
fungi | wanting more parallelization for concurrent unit tests | 19:35 |
ianw | yeah -- i mean i guess my only concern was that people in general find the nodes and give up on smaller vm's and just use that, limiting the testing environments | 19:36 |
corvus | a 2020's era vm can run them in <10m (and they're reliable) | 19:36 |
ianw | but, it's probably me who is out of touch :) | 19:36 |
clarkb | ianw: I think that is still a concern and we should impress upon people that the more people who use that label by default, the more contention and the slower the rtt will be | 19:37 |
fungi | ianw: conversely, we can't use the preferred flavors in vexxhost currently because they would supply too much ram for our standard flavor, so our quota there is under-utilized | 19:37 |
clarkb | and they become less fault tolerant so you should use it only when a specific need makes it necessary | 19:37 |
corvus | i'm a big fan of keeping them accessible -- though opendev's definition of accessible is literally 12 years old now, so i'm open to a small amount of moore's law inflation :) | 19:37 |
clarkb | corvus: to be fair I only just this year ended up with a laptop with more than 8GB of ram. 8GB was common for a very long time for some reason | 19:37 |
fungi | basically, vexxhost wants us to use a minimum 32gb ram flavor | 19:38 |
corvus | clarkb: (yeah, i agree -- that's super weird they stuck on that value so long) | 19:38 |
fungi | so being able to put that quota to use for something would be better than having it sit there | 19:38 |
ianw | yeah, i guess it's cracking the door on a bigger question of vm types ... | 19:38 |
ianw | if we want 8+gb AND nested virt, is vexxhost the only option? rax is out due to xen | 19:38 |
clarkb | ianw: ovh and inmotion do nested virt too | 19:39 |
clarkb | inap does not iirc | 19:39 |
clarkb | and no nested virt on arm | 19:39 |
fungi | vexxhost is just the one where we actually don't have access to any "standard" 8gb flavors | 19:39 |
corvus | (good -- i definitely don't want to rely on only one provider for check/gate jobs) | 19:40 |
ianw | i know i can look at nodepool config but do we have these bigger vms on those providers too? | 19:40 |
ianw | or is it vexxhost pinned? | 19:40 |
clarkb | ianw: I think we had them on vexxhost and something else for a while. It may be vexxhost specific right now | 19:40 |
corvus | #link big vms https://review.opendev.org/844116 | 19:40 |
fungi | ianw: we don't have the label defined yet, but the open change adds it to more providers than just vexxhost | 19:40 |
clarkb | ovh gives us specific flavors we would need to ask them about expanding them | 19:40 |
clarkb | inmotion we control the flavors on and can modify ourselves | 19:40 |
corvus | i did some research for that and tried to add 16gb everywhere i could ^ | 19:40 |
clarkb | oh right I remember that | 19:41 |
fungi | but also as noted, that's not 16vcpu+nestedvirt, so we'd want another change for that addition | 19:41 |
corvus | looks like the commit msg says we need custom flavors added to some providers | 19:41 |
corvus | so 3 in that change, and potentially another 3 if we add custom flavors | 19:41 |
corvus | (and yes, that's only looking at ram -- reduce that list for nested) | 19:42 |
corvus | oh actually that's a 16 vcpu change | 19:42 |
ianw | ++ on the nested virt; this tool checks for /dev/kvm and won't even start without it | 19:42 |
corvus | (not ram) | 19:42 |
fungi | corvus: yeah, though i think the minimum ram for the 16vcpu flavors was at least 16gb ram (more in some providers)? | 19:43 |
corvus | so yes, i would definitely think we would want another label for that, but this is a start | 19:43 |
corvus | fungi: yes that is true | 19:43 |
corvus | (and that's annotated in the change too) | 19:43 |
ianw | it sounds like maybe to move this on -- we can do corvus' change first | 19:43 |
corvus | in that change, i started adding comments for all the flavors -- i think we should do that in nodepool in the future | 19:43 |
clarkb | Yup sounds like we'd need to investigate further, but ya no objections from me. I think we can indicate that because they are larger they limit our capacity and thus should be used sparingly, only when a specific need makes it necessary, and take it from there | 19:43 |
ianw | but i should take an action item to come up with some sort of spreadsheet-like layout of node types we could provide | 19:43 |
corvus | eg: `flavor-name: 'A1.16' # 16vcpu, 16384ram, 320disk` | 19:43 |
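Expanding on that inline example, a hedged sketch of how an annotated larger label might look in a nodepool provider pool. The label, provider, flavor, and diskimage names below are made up for illustration, not taken from the actual nodepool configuration:

```yaml
# Illustrative nodepool fragment only; provider/label/flavor names are
# assumptions. The inline comment records what the flavor meant when added.
labels:
  - name: ubuntu-jammy-16vcpu
    min-ready: 0

providers:
  - name: example-cloud
    pools:
      - name: main
        labels:
          - name: ubuntu-jammy-16vcpu
            flavor-name: 'A1.16'   # 16vcpu, 16384ram, 320disk
            diskimage: ubuntu-jammy
```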
fungi | i also like the comment idea, yes | 19:43 |
fungi | hopefully providers don't change what their flavors mean (seems unlikely they would though) | 19:44 |
corvus | i think they generally have not (but could), so i think it's reasonable for us to document them in our config and assume they mostly won't change | 19:44 |
ianw | it seems like we probably want to have more options of ram/cpus/nested virt | 19:44 |
corvus | (and if they do -- our docs will annotate what we thought they were supposed to be, which is also useful!) | 19:45 |
ianw | ++ and also something to point potential hosting providers at -- since currently i think we just say "we run 8gb vms" | 19:46 |
clarkb | sounds like we've reached a good conclusion on this topic for the meeting. Let's move on to cover the last few topics before we run out of time. | 19:46 |
clarkb | #topic Meetpad and Jitsi Meet Updates | 19:46 |
clarkb | Fungi recently updated our jitsi meet configs for our meetpad service to catch up to upstream's latest happenings | 19:47 |
clarkb | The motivation behind this was to add a landing page when joining meetings so that firefox would stop automatically blocking the auto-playing audio | 19:47 |
fungi | they call it a "pre-join" page | 19:47 |
clarkb | Good news is that all seems to be working now. Along the way we discovered a few interesting things though. | 19:47 |
clarkb | First is that the upstream :latest docker hub image tag is no longer latest (they stopped updating it about 4 months ago) | 19:47 |
clarkb | we now use the :stable tag which seems to correspond most closely to what they were tagging :latest previously | 19:48 |
clarkb | The next is that JVB scale out appears to rely on this new colibri websocket system that requires each JVB to be accessible from the nginx serving the jitsi meet site via http(s) | 19:48 |
clarkb | To work around that for now we've shut down the jvbs and put them in the emergency file so that meetpad's all in one installation with localhost http connections can function and serve all the video | 19:49 |
clarkb | I wasn't comfortable running http to remote hosts without knowing what data is sent across that link. I think what we'll end up doing is having the jvbs allocate LE certs and then set it all up with ssl instead | 19:49 |
fungi | though shutting them down was probably not strictly necessary, it helps us be sure we don't accidentally try to farm a room out to one and then have it break | 19:50 |
fungi | but worth noting there, it suggests that the separate jvb servers have probably been completely broken and so unused for many months now | 19:50 |
clarkb | After some thought I don't think the change to do all that for the JVBs is that difficult. It's more that we'll have to test it afterwards and ensure it is working as expected. We basically need to add LE, open firewall ports and set a config value that indicates the hostname to connect to via the proxy. | 19:50 |
clarkb | fungi: yup that as well | 19:51 |
clarkb | Hopefully we can try fixing the JVBs later this week and do another round of testing so that we're all prepped and ready for the PTG in a month | 19:51 |
fungi | testing should also be trivial: stop the jvb container on meetpad.o.o and start it on one of the separate jvb servers | 19:52 |
fungi | that way we can be certain over-the-network communication is used | 19:52 |
clarkb | ++ and in the meantime there shouldn't be any issues using the deployment as is | 19:52 |
fungi | rather than the loopback-connected container | 19:52 |
fungi | yeah, we're just not scaling out well currently, but as i said we probably haven't actually been for a while anyway | 19:53 |
clarkb | #topic Zuul Reboot Playbook Stability | 19:53 |
clarkb | #link https://review.opendev.org/c/opendev/system-config/+/856176 | 19:53 |
fungi | we also talked about dumping one of the two dedicated jvb servers and just keeping a single one unless we find out we need more capacity | 19:53 |
clarkb | #link https://review.opendev.org/c/opendev/system-config/+/855826 | 19:53 |
clarkb | Over the weekend the zuul reboot playbook crashed because it ran into the unattended-upgrades apt lock on zm02 | 19:54 |
fungi | i want to say that happened previously on one of the other servers as well | 19:54 |
clarkb | I restarted it and then also noticed that since services were already down on zm02 the graceful stop playbook would similarly crash once it got to zm02 trying to docker exec on those containers | 19:54 |
clarkb | The changes above address a logging thing I noticed and should address the apt lock issue as well. I'll try to address the graceful stop thing later today | 19:55 |
clarkb | Would be good to address that stuff and we should continue to check the zuul components list on monday mornings until we're happy it is running in a stable manner | 19:55 |
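For context, a minimal sketch (not the content of the linked changes) of how an Ansible apt task can wait out the unattended-upgrades dpkg lock instead of failing outright, assuming an ansible-core recent enough to support the apt module's lock_timeout parameter:

```yaml
# Hedged example only: wait up to 10 minutes for the dpkg/apt lock held by
# unattended-upgrades before giving up, rather than erroring immediately.
- name: Upgrade packages without racing unattended-upgrades
  ansible.builtin.apt:
    update_cache: true
    upgrade: safe
    lock_timeout: 600
```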
corvus | clarkb: you mention `latest ansible` -- what about upgrading bridge? | 19:55 |
clarkb | corvus: yes that would be required and is something that ianw has started poking at. The plan there is to move things into a virtualenv for ansible first aiui making it easier to manage the ansible install. Then use that to upgrade or redeploy | 19:56 |
corvus | kk | 19:56 |
clarkb | corvus: I think I'm fine punting on that as work is in progress to make that happen and we'll get there | 19:56 |
clarkb | just not soon enough for the next weekly run of the zuul reboot playbook | 19:56 |
corvus | ya, just thought to check in on it since i'm not up to date. thx. | 19:57 |
clarkb | Good news is all these problems have been mechanical and not pointing out major flaws in our system | 19:57 |
clarkb | #topic Open Discussion | 19:57 |
clarkb | Just a couple minutes left. Anything else? | 19:57 |
clarkb | One of our backup hosts needs pruning if anyone hasn't done this yet and would like to run the prune script | 19:58 |
clarkb | frickler: corvus: ^ It is documented and previously ianw, myself, and then fungi have run through the docs just to make sure we're comfortable with the process | 19:58 |
fungi | i've done it a couple of times already in the past, but happy to do it again unless someone else wants the honor | 19:58 |
frickler | I'll skip this time | 19:59 |
ianw | i'm happy to do it, will do this afternoon .au time | 19:59 |
clarkb | fungi: sounds like maybe you're it this time :) thanks | 19:59 |
fungi | wfm | 19:59 |
clarkb | oh I jumped the gun. Thank you ianw | 19:59 |
clarkb | by 2 seconds :) | 19:59 |
fungi | thanks ianw! | 20:00 |
clarkb | and we are at time. Thank you everyone for listening to me ramble on IRC | 20:00 |
clarkb | We'll be back next week at the same time and location | 20:00 |
fungi | thanks clarkb! | 20:00 |
clarkb | #endmeeting | 20:00 |
opendevmeet | Meeting ended Tue Sep 6 20:00:22 2022 UTC. Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4) | 20:00 |
opendevmeet | Minutes: https://meetings.opendev.org/meetings/infra/2022/infra.2022-09-06-19.01.html | 20:00 |
opendevmeet | Minutes (text): https://meetings.opendev.org/meetings/infra/2022/infra.2022-09-06-19.01.txt | 20:00 |
opendevmeet | Log: https://meetings.opendev.org/meetings/infra/2022/infra.2022-09-06-19.01.log.html | 20:00 |
fungi | same bat time, same bat channel | 20:00 |
ianw | clarkb: haha i think i derailed things the most today :) | 20:00 |