Tuesday, 2021-09-14

fungiahoy!19:00
clarkbhello19:00
clarkb#startmeeting infra19:01
opendevmeetMeeting started Tue Sep 14 19:01:13 2021 UTC and is due to finish in 60 minutes.  The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot.19:01
opendevmeetUseful Commands: #action #agreed #help #info #idea #link #topic #startvote.19:01
opendevmeetThe meeting name has been set to 'infra'19:01
ianwo/19:01
clarkb#link http://lists.opendev.org/pipermail/service-discuss/2021-September/000283.html Our Agenda19:01
clarkb#topic Announcements19:01
clarkbI didn't have any announcements. Did anyone else have announcements to share?19:02
fungii don't think i did19:02
clarkb#topic Actions from last meeting19:03
clarkb#link http://eavesdrop.openstack.org/meetings/infra/2021/infra.2021-09-07-19.01.txt minutes from last meeting19:03
clarkbThere were no actions recorded last meeting19:03
* mordred waves to lovely hoomans19:03
clarkb#topic Specs19:03
clarkb#link https://review.opendev.org/c/opendev/infra-specs/+/804122 Prometheus Cacti replacement19:03
clarkbI updated the spec based on some of the feedback I got. Seems everyone is happy with the general plan but one specific thing has come up since I pushed the update19:04
clarkbBasically corvus is pointing out that we shouldn't try to do both node exporter and snmp exporter, as that would double our work; we should commit to one or the other19:04
clarkbI'll try to capture the pros/cons of each really quickly here, but I would appreciate it if y'all could take a look and leave your thoughts on this specific topic19:05
clarkbFor the SNMP exporter the upside is we already run and configure snmpd on all of our instances. This means the only change needed on our instances to collect snmp data is a firewall update to allow the new prometheus server to poll the data.19:05
clarkbThe snmp exporter downside is that we'll have to do a fair bit of configuration to tell the snmp exporter what snmp mibs (is that the right terminology?) to collect and where to map them into prometheus. Then we have to do a bunch of work to set up graphs for that data19:06
fungioids, technically19:06
clarkbFor node exporter the issue is we need to run a new service that doesn't exist in our distros (at least I'm fairly certain there aren't packages for it). We would instead use docker + docker-compose to run this service19:07
fungimibs are collections of oids19:07
clarkbThis means we will need to add docker to a number of systems that don't currently run docker today. OpenAFS, DNS, mailman servers immediately come to mind. This is possible but a bit of work too.19:07
clarkbThe upside to using node exporter is that we use something a bit more ready out of the box to collect server performance metrics, and I'm sure there are preexisting grafana graphs we can borrow from somewhere too19:08
clarkbThat is the gist of it. Please leave your preferences on the spec and I'll followup on that19:08
fungii guess we'd just include the docker role in our base playbook19:08
fungiright?19:09
clarkbPersonally I was leaning towards snmp simply because I thought we hadn't wanted to run docker in places like our dns servers19:09
clarkbfungi: yup and set up docker-compose for node exporter there19:09
fungiare there resource concerns with adding docker to some of those servers?19:09
fricklerdo we really need docker to run node-exporter?19:10
clarkbfrickler: we do if we don't want to reinvent systems/tooling to deploy an up to date version of node exporter19:10
clarkbthere are alternatives but then you're doing a bunch of work to keep a binary blob up to date, which is basically what docker does19:10
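(A minimal docker-compose sketch of the node exporter deployment under discussion; the image tag, service name, and mounts here are illustrative assumptions, not opendev's actual system-config:)

```yaml
# Hypothetical docker-compose.yaml for node_exporter; all details are
# assumptions for illustration, not the eventual implementation.
version: '2'
services:
  node-exporter:
    image: docker.io/prom/node-exporter:latest
    network_mode: host   # expose metrics on the host's port 9100
    pid: host            # needed for host-level process metrics
    restart: always
    volumes:
      - /:/host:ro,rslave
    command:
      - '--path.rootfs=/host'
```

The prometheus server would then just need a scrape job pointed at port 9100 on each host.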
clarkbI definitely don't have a strong opinion on this myself right now and will need to think about it a bit more19:11
frickleryeah, I guess I'll need to do some research, too19:11
clarkbfungi: that is probably the biggest reason not to do this, if dockerd + node exporter consume a bunch of resources. I can probably deploy it locally on my fileserver and see what sort of memory and cpu it consumes19:11
fungiubuntu has prometheus-node-exporter and prometheus-node-exporter-collectors packages, maybe that would be just as good?19:12
fricklerI'm also thinking whether I should add myself as a volunteer, but let me sleep on that idea first19:12
clarkbfungi: but not far enough back in time for our systems iirc19:12
clarkbI thought I looked at that and decided docker was really the only way we could run it with our heterogeneous setup19:13
fungithere's a prometheus-node-exporter on bionic19:13
clarkblooks like focal does have it but the version is quite old (and focal's is quite old too) maybe that was the issue19:13
fungiwe're just about out of the xenial weeds19:13
clarkbYa let's look at this a bit more. Think it over and update the spec. I'm going to continue on in the meeting as we have other stuff to cover and we're a quarter of the way through the hour19:14
clarkb#topic Topics19:14
clarkb#topic Mailman Ansible and Server Upgrades19:15
corvusi don't have a strong opinion on which; i just feel like writing system-config changes for either basically negates the value of the other, so we should try to pick one early19:15
corvus[eot from me; carry on]19:15
clarkbOn Sunday fungi and I upgraded lists.openstack.org and that was quite the adventure19:15
fungicorvus also helped with that19:15
clarkboh right corvus helped out with the mailman stuff at the end19:15
clarkbEverything went well until we tried to boot the Focal kernel on the ancient rax xen pv flavor19:16
corvusvery little;  i made only a brief appearance; :)19:16
clarkbit turns out that xen can't properly decompress the focal kernels because they are compressed with lz419:16
fungicorvus: brief but crucial to the plot19:16
clarkbWe worked around the kernel issue by manually decompressing the kernel using the linux kernel's extract-vmlinux tool, installing grub-xen, then chainbooting to the /boot/xen/pvboot-x86_64.elf that it installs19:17
clarkbWhat that did was tell xen how to find the kernel as well as supply a kernel to it that it doesn't have to decompress19:17
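(For reference, the workaround just described amounts to something like the following; the extract-vmlinux path and the use of uname -r are illustrative assumptions, not the exact commands run on lists.o.o:)

```shell
# Sketch of the pv xen boot workaround; paths are assumptions.
apt-get install grub-xen        # installs /boot/xen/pvboot-x86_64.elf
# extract-vmlinux ships in the kernel headers' scripts directory
/usr/src/linux-headers-$(uname -r)/scripts/extract-vmlinux \
    /boot/vmlinuz-$(uname -r) > /boot/vmlinuz-$(uname -r).decompressed
# then edit the legacy menu.lst so pv-grub chainloads pvboot-x86_64.elf,
# whose grub2 in turn boots the decompressed kernel
```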
clarkbThen we had to fix up our exim, mailman, and apache configs to handle new mailman env var filtering19:17
clarkbWhere we are at right now is that the host is out of the emergency file and ansible is successfully applying the new configs19:18
clarkbBut the kernel situation is still all kinds of bad. We need to decide how we want to ensure that ubuntu isn't going to (re)install a compressed kernel.19:18
funginote that the kernel dance is purely because the lists.o.o server was created in 2013 and has been in-place upgraded continuously since ubuntu 12.04 lts19:18
fungiso it's still running an otherwise unavailable pv flavor in rackspace19:19
clarkbWe can pin the kernel package. We can create a kernel postinst.d hook to decompress the kernel when the kernel updates. We can manually decompress the current kernel whenever we need to update (and use a rescue instance if the host reboots unexpectedly).19:19
fungipv xen loads the kernel from outside the guest domu, while pvhvm works more like a bootable virtual machine similar to kvm19:19
clarkbIn all cases I think we should begin working to replace the server, but there will be some period of time between now and when we are running with a new server where we want to have a working boot setup19:19
corvusoh, the chainloaded kernel can't be compressed?19:20
corvus(i thought maybe the chainloading could get around that)19:20
fungicorvus: nope, because it still has to hand the kernel blob off to the pv xen hypervisor19:20
clarkbcorvus: we did some digging this morning and while we haven't tested it, we found sufficient evidence on mailing lists and web forums that this doesn't work, so we didn't want to try it19:20
clarkbreally all the chain load is doing is finding the correct kernel to hand to xen I think19:20
clarkbbecause it understands grub2 configs19:20
clarkbhttps://unix.stackexchange.com/questions/583714/xen-pvgrub-with-lz4-compressed-kernels covers what is involved in auto decompressing the kernel if we want to do that19:21
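(A hypothetical hook along the lines of that stackexchange answer might look like this; the hook filename and the extract-vmlinux location are assumptions:)

```shell
#!/bin/sh
# Hypothetical /etc/kernel/postinst.d/zz-decompress-kernel: after each
# kernel install, write an uncompressed copy that pv xen can load.
# Kernel postinst hooks receive the version and image path as $1 and $2.
version="$1"
image="$2"
[ -n "$version" ] && [ -n "$image" ] || exit 0
extract="/usr/src/linux-headers-${version}/scripts/extract-vmlinux"
if [ -x "$extract" ]; then
    "$extract" "$image" > "${image}.decompressed"
fi
```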
fungiyeah, it essentially communicates the offset where the kernel blob starts19:21
corvusok.  then i agree, we're in a hole and we should get out of it with a new server19:21
fungiwith "new server" comes a number of questions, like should we take this opportunity to fold in lists.katacontainers.io? should we take this as an opportunity to migrate to mm3 on a new server?19:22
clarkbyup I think we should just accept that is necessary now. Then decide what workaround for the kernel we want to use while we do that new server work19:22
ianwpinning it as is so a power-off situation doesn't become fatal and working on a new server seems best to me19:22
clarkbianw: ya and if we really need to do a kernel update on the server we can do it manually and do the decompress step at the same time19:23
clarkbI'm leaning towards an apt pin myself for this reason. It doesn't prevent us from updating but ensures we do so with care19:23
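(One way to implement such a pin, sketched under the assumption that the standard Ubuntu kernel metapackages are installed on the server:)

```shell
# Hold the kernel metapackages so unattended upgrades can't pull in a
# new (lz4-compressed) kernel image; package names are the usual Ubuntu
# ones and may differ on this server.
apt-mark hold linux-image-generic linux-headers-generic linux-generic
# For a deliberate manual update (followed by decompressing the kernel):
#   apt-mark unhold linux-image-generic linux-headers-generic linux-generic
```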
fricklermaybe too obvious a question, but resizing to a modern flavor isn't supported on rackspace?19:24
clarkbfrickler: ya iirc you could only resize within pv or pvhvm flavors but not across19:24
fungiswitching from pv to pvhvm isn't supported anyway19:24
clarkbBut I guess that is something we could ask? fungi maybe as a followup on the issue you opened?19:24
ianwfungi: it seems sensible to make the migration also be a mm3 migration19:25
fungioh, that trouble ticket is already closed after we worked out how to boot19:25
fungii went back over the current state of semi-official mm3 containers, we'd basically need three containers for the basic components of mm3 (core, hyperkitty, postorius) plus apache and mysql. or we could use the distro packages in focal (it has mm 3.2.2 while latest is 3.3.4)19:25
fungialso there are tools to import mm 2.1 configs to 3.x19:25
clarkbfungi: I think we should confirm we can't switch from pv to pvhvm. I'm fairly certain our image would support both since the menu.lst is where we put the chainload and normal grub boot should ignore that19:25
fungiand import old archives (with some caveats), though we can also serve old pipermail copies of the archives for backward compatibility with existing hyperlinks19:26
ianwfungi: it could basically be stood up completely independently for validation right?  the archives seem the thing that need importing19:26
clarkbianw: fungi: yes and we should be able to use zuul holds for that too19:26
fungiclarkb: yeah, in theory the image we have now could work on a pvhvm flavor, if there's a way to switch it19:26
fungiianw: archives and list configs both need importing, but yes i expect we'd follow our test-based development pattern for building the new mm3 deployment and then just hold a test node19:27
clarkbLet me try and summarize what we seem to be thinking: 1) pin the kernel package on lists.o.o so it doesn't break; manually update the kernel and decompress if necessary. 2) Begin work to upgrade to mm3. 3/4) Determine if we can switch to a pvhvm flavor which boots reliably with modern kernels, or replace the server19:28
clarkbAre there any objections to 1), since getting that sorted sooner rather than later is a good idea?19:28
ianw++ to all from me19:28
fungiyeah, i'm good with all of the above19:28
fungii can set the kernel package hold once the meeting ends19:29
clarkbfungi: thanks. I'd be happy to follow along since I always find those confusing and more experience with them would be good :)19:29
fungihappily19:29
clarkbI'll see if I can do any research into the pv to pvhvm question19:30
clarkband sounds like fungi has already been looking at 2)19:30
fungifor years, but again this week yes19:30
clarkbAnything else on this subject? Concerns or issues you've noticed since the upgrade outside of the above19:31
fungiaside from the kernel issue we also had some changes we needed to make to our tooling around envvars19:31
fungicorvus managed to work out that newer mailman started filtering envvars19:32
fungiso the one we made up for the site dir in our multi-site design was no longer making it through to the config script19:32
fungiand we ended up needing to pivot to a specific envvar it wasn't filtering19:32
fungithis meant refactoring the site hostname to directory mapping into the config script19:33
fungisince we switched from using an envvar which conveyed the directory to one which conveyed the virtual hostname19:33
clarkbright we could've theoretically set the site dir in the HOST env var but that would have been very confusing19:35
clarkband if mailman used the env var for anything else potentially broken19:35
fungiworth noting, since mm3 properly supports distinct hostnames (you can have the same list localpart at multiple domains now and each is distinct) we'll be able to avoid all that complexity with a switch to mm319:35
clarkbhttps://review.opendev.org/c/opendev/system-config/+/808570 has the details if you are interested19:36
clarkbAlright let's move on. We have a few more things to discuss and more than half our time is gone.19:36
clarkb#topic Improving CD throughput19:37
clarkbianw: ^ anything new on this subject since the realization we needed to update periodic pipelines? Sorry I haven't had time to look at this again in a while19:37
ianwumm, things in progress but i got a little sidetracked19:38
ianwi think we have the basis of the dependencies worked out, but i need to rework the changes 19:38
ianwso in short, no, nothing new19:38
clarkbIt might also be good to sketch out what the future of the semaphores looks like in WIP changes just so we can see the end result. But no rush lots to sort out on this stack19:38
ianwyeah it's definitely a "make it work serially first" situation19:39
clarkb#topic Gerrit Account Cleanups19:39
clarkbI have written notes on proposed plans for each user in the comments of review02:~clarkb/gerrit_user_cleanups/audit-results-annotated.yaml19:40
clarkbThere are 33 of these conflicts remaining. If you get a chance to look at the notes I wrote that would be great. fungi has read them over and didn't seem concerned though19:40
clarkbMy intent was to start writing those emails this week and make fixups in a checkout of the repo on review02 but mailing lists and other things have distracted me19:40
clarkbOther than checking the notes I don't really need anything other than time though. This is making decent progress when I get that time19:41
clarkb#topic OpenDev Logo Hosting19:41
clarkbat this point we just need to update paste and gerrit's themes to use the gitea in repo hosted files then we are cleaned up from a gitea upgrade perspective19:41
fungithis seems to be working well so far19:42
clarkbianw: you said you would write those changes, are they up yet?19:42
clarkbfungi: and I agree seems to be working for what we are doing with gitea itself19:42
ianwahh, no those changes aren't up yet.  on my todo19:42
clarkbfeel free to ping me when they go up and I'll review them19:43
clarkb#topic Expanding InMotion cloud deployment19:43
clarkbIt sounds like InMotion is able to give us a few more IPs in order to better utilize the cluster we have19:43
clarkbI'll be working with them Friday morning to work through that. However, right now we are failing to boot instances there and I need to go look at it more in depth19:43
clarkbapparently rabbitmq is fine? and it may be some nova quota mismatch problem19:44
fungineat19:44
clarkbI'll probably go ahead and disable the cloud soon as it isn't booting stuff and I think network changes potentially mean we don't want nodepool running against it anyway19:44
clarkbif anyone else wants to join let me know19:45
clarkbsounds like it will be a conference call configuration meeting19:45
clarkb#topic Scheduling Gerrit Project Renames19:46
clarkbWe've got a few project rename requests now. In addition to starting to think about a time to do those we have discovered some additional questions about the rename process19:46
clarkbWhen we rename projects we should update all of their metadata in gitea so that the issues link and review links all line up19:46
clarkbThis should be doable but requires updates to the rename playbook. Good news is that is tested now :)19:47
clarkbThe other question I had was what do we do with orgs in gitea (and gerrit) when all projects are moved out of them.19:47
clarkbIn particular I'm concerned that deleting an org would break the redirects gitea has for things from foo/bar -> bar/foo if we delete foo/19:47
clarkbcorvus: ^ you've looked at that code before do you have a sense for what might happen there?19:48
fungiin addition to that, it might not be terrible to have a redirect mapping for storyboard.o.o, which could probably just be a flat list in a .htaccess file deployed on the server, built from the renames data we have in opendev/project-config (this could be a nice addition after the containerization work diablo_rojo is hacking on)19:48
corvusclarkb: i don't recall for certain, but i think there is a reasonable chance that it may break as you suspect19:49
clarkbok something to test for sure then19:50
clarkbAs far as scheduling goes I'm wary of trying to do it before the openstack release which happens October 6th ish19:50
fungiyeah, one of the proposed rename changes would empty the osf/ gerrit namespace and thus the osf org in gitea19:50
clarkbBut the week after: October 11 -15 might be a good time to do renames19:51
clarkbI think that is the week before the ptg too?19:51
clarkbProbably a good idea to avoid doing it during the ptg :)19:51
fungiyeah, i'm good with that. i'm taking the friday before then off though19:52
fungi(the 8th)19:52
clarkbok let's pencil in that week and decide on a specific day as we get closer. Also work on doing metadata updates and test org removals19:52
fungiwfm19:53
clarkbIf orgs can't be removed safely that isn't the end of the world and we'll just keep them for redirects19:53
clarkb#topic Open Discussion19:53
clarkbThank you for listening to me for the last hour :) Anything else?19:53
funginothing immediately springs to mind. i'll try to whip up a mm3 spec though19:54
clarkbI'll give it another minute19:55
clarkbThanks everyone! we'll see you here next week same time and location19:56
clarkb#endmeeting19:56
opendevmeetMeeting ended Tue Sep 14 19:56:36 2021 UTC.  Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4)19:56
opendevmeetMinutes:        https://meetings.opendev.org/meetings/infra/2021/infra.2021-09-14-19.01.html19:56
opendevmeetMinutes (text): https://meetings.opendev.org/meetings/infra/2021/infra.2021-09-14-19.01.txt19:56
opendevmeetLog:            https://meetings.opendev.org/meetings/infra/2021/infra.2021-09-14-19.01.log.html19:56
fungithanks clarkb!19:56

Generated by irclog2html.py 2.17.2 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!