Tuesday, 2021-02-23

01:26 *** hamalq has quit IRC
07:12 *** hashar has joined #opendev-meeting
11:57 *** hashar is now known as hasharLunch
16:22 *** hasharLunch is now known as hashar
19:00 <clarkb> anyone else here for the meeting?
19:00 <ianw> o/
19:00 <clarkb> I've been in meetings all morning and have also been trying to catch up on merger things as well as make new ones so I'm a bit behind for this hour. But trying to be ready :)
19:01 <clarkb> #startmeeting infra
19:01 <openstack> Meeting started Tue Feb 23 19:01:16 2021 UTC and is due to finish in 60 minutes.  The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot.
19:01 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
19:01 *** openstack changes topic to " (Meeting topic: infra)"
19:01 <openstack> The meeting name has been set to 'infra'
19:01 <clarkb> #link http://lists.opendev.org/pipermail/service-discuss/2021-February/000185.html Our Agenda
19:02 <clarkb> #topic Announcements
19:02 *** openstack changes topic to "Announcements (Meeting topic: infra)"
19:02 <clarkb> I did not have any announcements
19:02 <clarkb> #topic Actions from last meeting
19:02 *** openstack changes topic to "Actions from last meeting (Meeting topic: infra)"
19:02 <clarkb> #link http://eavesdrop.openstack.org/meetings/infra/2021/infra.2021-02-16-19.01.txt minutes from last meeting
19:03 <clarkb> corvus has an action to unfork our jitsi meet installation
19:03 <clarkb> I haven't seen any changes for that and corvus was busy with the zuul v4 release so I expect that has not happened yet
19:04 <clarkb> corvus: ^ should I readd the action?
19:04 <corvus> yep so sorry
19:05 <clarkb> #action corvus unfork our jitsi meet installation
19:05 <clarkb> #topic Priority Efforts
19:05 *** openstack changes topic to "Priority Efforts (Meeting topic: infra)"
19:05 <clarkb> #topic OpenDev
19:05 *** openstack changes topic to "OpenDev (Meeting topic: infra)"
19:05 <clarkb> fungi: ianw: I guess we hit the missing accounts index lock problem again over the weekend
19:06 <clarkb> lslocks showed it was gone and fungi responded to the upstream gerrit bug pointing out that we saw it again
19:06 <ianw> yes, fungi did all the investigation, but we did restart it sunday/monday during quiet time
19:07 <clarkb> There hasn't been much movement on the bug since I originally filed it. I wonder if we should bring it up on the mailing list to see if anyone else has seen this behavior
19:08 <clarkb> I wonder if we can suggest it try to relock it
19:08 <fungi> and yes, the restart once again mitigated the error
19:08 <clarkb> according to lslocks there is no lock for the path so it isn't like something else has taken the lock away
19:08 <fungi> all i could figure is something is happening on the fs
19:09 <fungi> but i couldn't find any logs to indicate what that might have been
19:10 <clarkb> ok, I guess we continue to monitor it and if we find time bring it up with upstream on the mailing list to see if anyone else suffers this as well
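
For reference, the check fungi and clarkb describe boils down to asking lslocks on the review host whether any process still holds the Lucene write lock for the accounts index. A minimal sketch, assuming a typical Gerrit site layout under /home/gerrit2/review_site (the exact index directory name varies by schema version, so treat the paths as illustrative):

    # Hedged sketch: does any process hold a lock under the accounts index?
    sudo lslocks -o COMMAND,PID,TYPE,PATH | grep 'index/accounts' \
        || echo "no accounts index lock currently held"

    # The Lucene lock file itself, if the naming matches this layout:
    ls -l /home/gerrit2/review_site/index/accounts_*/write.lock
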
19:10 <clarkb> Next up is the account inconsistencies. I have not yet found time to check which of the unhappy accounts are active. But I do still like fungi's idea of generating that list, retiring the others, and sorting out only the subset of active accounts
19:10 <fungi> it's the first time we've seen it in ~4 months
19:10 <clarkb> that should greatly simplify the todo list there
19:11 <fungi> or was it 3? anyway, it's been some time
19:11 <fungi> yeah, it's the "if a tree falls in the forest and nobody's ever going to log into it again anyway" approach ;)
19:13 <clarkb> On the gitea OOMs: I tried to watch manage-projects as it ran yesterday as part of the "all the jobs" run for the zm01.opendev.org deployment. There was a slight jump in resource utilization but things looked happy
19:13 <clarkb> that makes me suspect the "we are our own DoS" theory less
19:13 <clarkb> However, the dstat recording change for system-config-run jobs did eventually land yesterday so we can start trying to look at that sort of info for future updates
19:14 <clarkb> and that applies to all the system-config-run jobs, not just gitea.
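
As context for the dstat recording change, the kind of capture it enables looks roughly like the following; the flags, interval, and output path here are illustrative rather than the actual system-config implementation:

    # Hedged sketch: record CPU/memory/swap/network/disk usage every 10 seconds
    # to a CSV the job can collect as an artifact afterwards.
    nohup dstat --time --cpu --mem --swap --net --disk \
        --output /var/log/dstat-csv.log 10 > /dev/null 2>&1 &

    # After the job, inspect the CSV around the window when manage-projects ran
    # to look for memory spikes.
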
19:14 <clarkb> Any other opendev related items or should we move on?
19:16 <clarkb> #topic Update Config Management
19:16 *** openstack changes topic to "Update Config Management (Meeting topic: infra)"
19:16 <clarkb> Are there any configuration management updates to call out? I haven't seen any, but have had plenty of distractions so may have missed something
19:18 <clarkb> #topic General Topics
19:18 *** openstack changes topic to "General Topics (Meeting topic: infra)"
19:18 <clarkb> #topic OpenAFS Cluster Status
19:18 *** openstack changes topic to "OpenAFS Cluster Status (Meeting topic: infra)"
19:18 <clarkb> Last we checked in on this subject all the servers had their openafs packages upgraded but we were still waiting on operating system upgrades. Anything new on this?
19:18 <ianw> i haven't got to upgrades for this yet
19:19 <clarkb> ok we can move on then
19:19 <clarkb> #topic Bup and Borg
19:19 *** openstack changes topic to "Bup and Borg (Meeting topic: infra)"
19:19 <clarkb> At this point I think this topic might be more of just "Borg" but we're continuing to refine and improve the borg backups
19:20 <ianw> yep i need to do a final sweep through the hosts and make sure the bup jobs have stopped
19:20 <ianw> and then we can shutdown the bup servers and decide what to do with the storage
19:21 <clarkb> in the past we've held on to old bup backup volumes when rotating in new ones. Probably want to keep them around for a bit to ensure we've got that overlap here?
19:22 <ianw> yep, we can keep for a bit.  practically, last time we tried to retrieve anything we'd sorted everything out before the bup processes had even completed extracting a tar :)
19:22 <clarkb> ya, that is a good point
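
The "final sweep" ianw mentions could be done as an ad-hoc Ansible check from bridge; a sketch under the assumptions that the old bup jobs lived in root's crontab and that a suitable inventory group exists (the "backup-clients" group name is a placeholder, not necessarily the real inventory group):

    # Hedged sketch: look for leftover bup cron entries on backed-up hosts and
    # confirm borg is what actually remains.
    ansible backup-clients -b -m shell -a "crontab -l -u root | grep -i bup || true"
    ansible backup-clients -b -m shell -a "crontab -l -u root | grep -i borg || true"
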
19:22 *** hashar has quit IRC
19:23 <clarkb> anything else to add on this item?
19:23 <ianw> nope, next week might be the last time it's a thing of interest :)
19:24 <clarkb> excellent
19:24 <clarkb> #topic Picking up steam on server upgrades
19:24 *** openstack changes topic to "Picking up steam on server upgrades (Meeting topic: infra)"
19:24 <clarkb> I've jumped into trying to upgrade the operating systems under zuul, nodepool, and zookeeper
19:24 *** diablo_rojo has joined #opendev-meeting
19:25 <fungi> thanks!
19:25 <clarkb> so far zm01.opendev.org has been replaced and seems happy. I've been working on replacing 02-08 this morning so expect changes for that after the meeting
19:25 <ianw> ++
19:25 <clarkb> Then my plan is to look at executors, launchers, zookeeper, and zuul scheduler (likely in that order)
19:26 <clarkb> I think that order is roughly from easiest to most difficult and working through the steps will hopefully make the more difficult steps easier :)
19:26 <clarkb> There are other services that need this treatment too. If you've got time and/or interest please jump in too :)
19:26 <clarkb> some of them will require puppet be rewritten to ansible as well. These are likely to be the most painful ones
19:27 <clarkb> but maybe doing that sort of rewrite is more interesting to some
19:28 <clarkb> Anything else to add to this item?
19:28 <fungi> not from me
19:28 <clarkb> #topic Upgrading refstack.o.o
19:28 *** openstack changes topic to "Upgrading refstack.o.o (Meeting topic: infra)"
19:29 <clarkb> ianw: kopecmartin: are there any changes we can help review or updates to the testing here?
19:29 <ianw> last update for me was we put some nodes on hold after finding the testinfra wasn't actually working as well as we'd hoped
19:30 <ianw> there were some unicode errors which i *think* got fixed too
19:31 <clarkb> ya I think some problems were identified in the service itself too (a bonus of better testing)
19:32 <clarkb> Sounds like we're still largely waiting for kopecmartin to figure out what is going on though?
19:33 <ianw> i think so yes; kopecmartin -- let me know if anything needs actioning
19:33 <clarkb> thanks for the update
19:33 <clarkb> #topic Bridge disk space
19:33 *** openstack changes topic to "Bridge disk space (Meeting topic: infra)"
19:34 <clarkb> We're running low on disk space on bridge. I did some quick investigating yesterday and the three locations where we seem to consume the most space are /var/log, /home, and /opt
19:35 <clarkb> I think there may be some cleanup we can do in /var/log/ansible where we've leaked some older log files. /home has miscellaneous content in our various homedirs, maybe we can each take a look and clean up unneeded files? And /opt has a number of disk images on it as well as some stuff for ianw
19:35 <clarkb> mordred: I think the images in /opt were from when you were trying to do builds for focal?
19:36 <clarkb> we ended up not using those iirc because we couldn't consistently build them with nodepool due to the way boot from volume treats images
19:36 <clarkb> should we just clean those up? or maybe remove the raw and vhd versions and keep qcow2?
19:36 <clarkb> (as a side note I used the cloud-provided focal images for the zuul mergers since we seem to have abandoned the build-our-own idea for the time being)
19:37 <ianw> yeah i think they can go
19:38 <clarkb> in any case I suspect we'll run out of disk there in the near future so any cleanup we can do would be great.
19:38 <clarkb> if infra-root can check their homedirs and ianw can look at /opt/ianw, I can take a look at the images and maybe start by removing the raw/vhd copies first
19:38 <fungi> apparently the launch-env in my homedir accounts for 174M
19:39 <fungi> but otherwise all cleaned up now
19:39 <clarkb> thanks!
19:39 <ianw> i can't remember what /opt/ianw was about, i'll clear it out
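
A sketch of the cleanup being discussed, assuming the image copies sit under /opt with .raw/.vhd extensions alongside the .qcow2 originals (the layout is an assumption, so check with the dry run first):

    # Hedged sketch: see what is using the space, then drop raw/vhd image
    # copies while keeping the qcow2 versions.
    sudo du -xsh /var/log /home /opt/* 2>/dev/null | sort -rh | head -n 20

    # Dry run: list what would go away.
    sudo find /opt -maxdepth 2 -type f \( -name '*.raw' -o -name '*.vhd' \) -ls

    # Then remove once the list looks right:
    # sudo find /opt -maxdepth 2 -type f \( -name '*.raw' -o -name '*.vhd' \) -delete
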
19:40 <clarkb> And I think that was all I had on the agenda
19:40 <clarkb> #topic Open Discussion
19:40 *** openstack changes topic to "Open Discussion (Meeting topic: infra)"
19:40 <clarkb> Anything else?
19:40 <fungi> i'm still struggling to get git-review testing working for python 3.9 so i can tag 2.0.0
19:41 <fungi> after discussion yesterday, i may have to rework more of how gerrit is being invoked in the test setup
19:41 <fungi> something is mysteriously causing the default development credentials to not work
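
For context, the "development credentials" refer to Gerrit's development auth mode. A sketch of what initializing a dev-mode test site looks like with the stock warfile; whether git-review's framework actually passes --dev is part of what is being debugged here, and the paths are illustrative:

    # Hedged sketch: initialize a throwaway dev-mode site and confirm the auth
    # type that ends up in its config.
    java -jar gerrit.war init -d /tmp/gerrit-dev-site --batch --dev --no-auto-start
    git config -f /tmp/gerrit-dev-site/etc/gerrit.config auth.type
    # expected output: DEVELOPMENT_BECOME_ANY_ACCOUNT
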
19:42 <ianw> you are using the upstream image right?
19:42 <fungi> official warfile, yes
19:42 <ianw> oh, no, the upstream .jar ... not their container
19:43 <fungi> we could redo git-review's functional tests to use containerized gerrit, but that seemed like a much larger overhaul
19:44 <fungi> right now it's designed to set up a template gerrit site and then start a bunch of parallel per-test gerrits from that in different temporary directories
19:44 <clarkb> there are examples of similar in the gerritlib tests
19:44 <clarkb> and ya you'd probably want to switch it to using a project per test rather than gerrit per test
19:45 <fungi> right, and that gets into a deep overhaul of git-review's testing, which i was trying to avoid right now (i don't really have time for that, but maybe i don't have time for this either)
19:45 <fungi> alternatives are to say we support python 3.9 but not test with it, or say we don't support python 3.9 because we're unable to test with it
19:46 <ianw> it's sort of "we're unable to test git-review" in general ATM right?
19:46 <fungi> or maybe try to get 3.9 tests going on bionic instead of focal
19:47 <fungi> ianw: it's that our gerrit tests rely on gerrit 2.11 which focal's openssh can't connect to
19:47 *** gmann is now known as gmann_lunch
19:47 <fungi> so we test up through python 3.8 just fine
19:47 <ianw> we could connect it to the system-config job; that has figured out the "get a gerrit running" bit ... but doesn't help local testing
19:48 <fungi> also doesn't help the "would have to substantially redo git-review's current test framework design" part
19:48 <mordred> clarkb: oh sorry - re: images - I think the focal images in /opt on bridge are the ones I manually built and uploaded for control plane things?
19:48 <clarkb> mordred: yes, but then we didn't really use them because boot from volume is weird iirc
19:48 <mordred> but - honestly - I don't see any reason to keep them around
19:48 <clarkb> that was the precursor to having nodepool do it, then nodepool did it, then we undid the nodepool
19:49 <fungi> i assumed the path of least resistance was to update the gerrit version we're testing against to one focal can ssh into, but 2.11 was the last version to keep ssh keys in the rdbms, which was how the test account was getting bootstrapped
19:50 <fungi> so gerrit>2.11 means changing how we bootstrap our test user
19:50 <fungi> but as usually happens, that's a rabbit hole to which i have yet to find the bottom
19:51 <clarkb> maybe bad idea: you could vendor an all-users repo state
19:51 <clarkb> and start gerrit with that
19:51 <fungi> that's something i considered, yeah
19:51 <fungi> though we'll need to vendor a corresponding ssh public key as well i suppose
19:51 <fungi> er, public/private keypair
19:51 <clarkb> or edit the repo directly before starting it with a generated value
19:52 <clarkb> but gerritlib and others bootstrap using a dev mode that should work
19:52 <clarkb> just need to sort out why it doesn't
19:52 <fungi> right, i thought working out how to interact with the rest api would be 1. easier than reverse-engineering undocumented notedb structure, and 2. an actual supported stable interface so we don't find ourselves right back here the next time they decide to tweak some implementation detail of the db
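
The REST route being alluded to is most likely Gerrit's account-creation endpoint. A rough sketch, assuming an admin user with an HTTP password already exists; bootstrapping that first account is exactly the open question above, and the credentials, key path, and username are placeholders:

    # Hedged sketch: create the test account over the REST API instead of
    # poking NoteDB directly.
    curl -s -u admin:secret -X PUT \
        -H 'Content-Type: application/json' \
        -d '{"name": "Test User", "email": "test@example.com",
             "ssh_key": "'"$(cat /tmp/test_key.pub)"'",
             "http_password": "testpass"}' \
        http://localhost:8080/a/accounts/testuser
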
19:53 <fungi> anyway, it seems like something about how the test framework is initializing and then later starting gerrit might be breaking dev mode
19:53 <fungi> so that's the next string to tug on
19:54 <fungi> i'll see if it could be as simple as calling java directly instead of trying to use the provided gerrit.sh initscript
19:54 <ianw> i can try actually running it today instead of just making red-herring comments on diffs and see if i can see anything
19:55 <fungi> ianw: a tip is to comment out all the addCleanup() calls and then ask tox to run a single test
19:55 <fungi> i've abused that to get a running gerrit exactly how the tests try to run it
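
In other words, something along these lines; the tox environment and test id are illustrative placeholders, not git-review's actual test names:

    # Hedged sketch: with the addCleanup() calls commented out, run a single
    # test so its Gerrit site and logs are left behind for inspection.
    tox -e py39 -- tests.test_git_review.GitReviewTestCase.test_something
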
19:56 <fungi> right now the setup calls init with --no-auto-start, and then reindex, and then copies the resulting site and runs gerrit from the copies in daemon mode via gerrit.sh, which is rather convoluted
19:57 <fungi> and supplies custom configs to each site copy with distinct tcp ports
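
Condensed to shell, the sequence described here is roughly the following; the paths and port numbers are illustrative:

    # Hedged sketch of the current flow: build a template site, reindex it,
    # then per test copy it, point the copy at unique ports, and start it.
    java -jar gerrit.war init -d /tmp/gerrit-template --batch --no-auto-start
    java -jar gerrit.war reindex -d /tmp/gerrit-template

    cp -a /tmp/gerrit-template /tmp/gerrit-site-1
    git config -f /tmp/gerrit-site-1/etc/gerrit.config sshd.listenAddress '*:29419'
    git config -f /tmp/gerrit-site-1/etc/gerrit.config httpd.listenUrl 'http://*:8082/'
    /tmp/gerrit-site-1/bin/gerrit.sh start
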
19:58 <clarkb> ok, let me know if I can help.
19:58 <clarkb> I may be partially responsible for the old setup :)
19:58 <clarkb> and now we're just about at time
19:58 <clarkb> feel free to continue discussion on the mailing list or in #opendev
19:58 <clarkb> and thank you everyone for your time
19:58 <clarkb> #endmeeting
19:58 *** openstack changes topic to "Incident management and meetings for the OpenDev sysadmins; normal discussions are in #opendev"
19:59 <openstack> Meeting ended Tue Feb 23 19:58:58 2021 UTC.  Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4)
19:59 <openstack> Minutes:        http://eavesdrop.openstack.org/meetings/infra/2021/infra.2021-02-23-19.01.html
19:59 <openstack> Minutes (text): http://eavesdrop.openstack.org/meetings/infra/2021/infra.2021-02-23-19.01.txt
19:59 <openstack> Log:            http://eavesdrop.openstack.org/meetings/infra/2021/infra.2021-02-23-19.01.log.html
19:59 <fungi> thanks clarkb!
20:07 *** gmann_lunch is now known as gmann
20:30 *** hamalq has joined #opendev-meeting
