*** hamalq has quit IRC | 01:26 | |
*** hashar has joined #opendev-meeting | 07:12 | |
*** hashar is now known as hasharLunch | 11:57 | |
*** hasharLunch is now known as hashar | 16:22 | |
clarkb | anyone else here for the meeting? | 19:00 |
ianw | o/ | 19:00 |
clarkb | I've been in meetings all morning and have also been trying to catch up on merger things as well as make new ones so I'm a bit behind for this hour. But trying to be ready :) | 19:00 |
clarkb | #startmeeting infra | 19:01 |
openstack | Meeting started Tue Feb 23 19:01:16 2021 UTC and is due to finish in 60 minutes. The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot. | 19:01 |
openstack | Useful Commands: #action #agreed #help #info #idea #link #topic #startvote. | 19:01 |
*** openstack changes topic to " (Meeting topic: infra)" | 19:01 | |
openstack | The meeting name has been set to 'infra' | 19:01 |
clarkb | #link http://lists.opendev.org/pipermail/service-discuss/2021-February/000185.html Our Agenda | 19:01 |
clarkb | #topic Announcements | 19:02 |
*** openstack changes topic to "Announcements (Meeting topic: infra)" | 19:02 | |
clarkb | I did not have any announcements | 19:02 |
clarkb | #topic Actions from last meeting | 19:02 |
*** openstack changes topic to "Actions from last meeting (Meeting topic: infra)" | 19:02 | |
clarkb | #link http://eavesdrop.openstack.org/meetings/infra/2021/infra.2021-02-16-19.01.txt minutes from last meeting | 19:02 |
clarkb | corvus has an action to unfork our jitsi meet installation | 19:03 |
clarkb | I haven't seen any changes for that and corvus was busy with the zuul v4 release so I expect that has not happened yet | 19:03 |
clarkb | corvus: ^ should I readd the action? | 19:04 |
corvus | yep so sorry | 19:04 |
clarkb | #action corvus unfork our jitsi meet installation | 19:05 |
clarkb | #topic Priority Efforts | 19:05 |
*** openstack changes topic to "Priority Efforts (Meeting topic: infra)" | 19:05 | |
clarkb | #topic OpenDev | 19:05 |
*** openstack changes topic to "OpenDev (Meeting topic: infra)" | 19:05 | |
clarkb | fungi: ianw: I guess we hit the missing accounts index lock problem again over the weekend | 19:05 |
clarkb | lslocks showed it was gone and fungi responded to the upstream gerrit bug pointing out that we saw it again | 19:06 |
ianw | yes, fungi did all the investigation, but we did restart it sunday/monday during quiet time | 19:06 |
clarkb | There hasn't been much movement on the bug since I originally filed it. I wonder if we should bring it up on the mailing list to see if anyone else has seen this behavior | 19:07 |
clarkb | I wonder if we can suggest that it try to reacquire the lock | 19:08 |
fungi | and yes, the restart once again mitigated the error | 19:08 |
clarkb | according to lslocks there is no lock for the path so it isn't like something else has taken the lock away | 19:08 |
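The lslocks check can be approximated by reading /proc/locks directly. The sketch below is illustrative only and is not what was run during the meeting. It assumes the lock in question is a POSIX, flock, or OFD lock, which the kernel lists in /proc/locks as `id type mode access pid MAJ:MIN:INODE start end` with the device numbers in hex. The helper names `parse_proc_locks` and `locks_for_path` are hypothetical. On some filesystems, such as btrfs subvolumes, the device number from `stat()` may not match the one in /proc/locks.

```python
import os


def parse_proc_locks(text):
    """Parse /proc/locks content into a list of dicts, skipping blocked waiters."""
    locks = []
    for line in text.splitlines():
        fields = line.split()
        # Waiters are printed as "N: -> TYPE ..."; only held locks matter here.
        if len(fields) < 8 or fields[1] == "->":
            continue
        # Layout: id, type, mode, access, pid, MAJ:MIN:INODE, start, end
        major_hex, minor_hex, inode = fields[5].split(":")
        locks.append({
            "type": fields[1],
            "access": fields[3],
            "pid": int(fields[4]),
            "dev": (int(major_hex, 16), int(minor_hex, 16)),
            "inode": int(inode),
        })
    return locks


def locks_for_path(path, locks_text=None):
    """Return held locks on the inode behind `path`, roughly what lslocks lists."""
    if locks_text is None:
        with open("/proc/locks") as f:
            locks_text = f.read()
    st = os.stat(path)
    dev = (os.major(st.st_dev), os.minor(st.st_dev))
    return [lock for lock in parse_proc_locks(locks_text)
            if lock["dev"] == dev and lock["inode"] == st.st_ino]
```

Passing the index's lock file path to `locks_for_path` returns an empty list when nothing holds the lock, which matches the situation described above.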
fungi | all i could figure is something is happening on the fs | 19:08 |
fungi | but i couldn't find any logs to indicate what that might have been | 19:09 |
clarkb | ok, I guess we continue to monitor it and if we find time bring it up with upstream on the mailing list to see if anyone else suffers this as well | 19:10 |
clarkb | Next up is the account inconsistencies. I have not yet found time to check which of the unhappy accounts are active. But I do still like fungi's idea of generating that list, retiring the others and sorting out only the subset of active accounts | 19:10 |
fungi | it's the first time we've seen it in ~4 months | 19:10 |
clarkb | that should greatly simplify the todo list there | 19:10 |
fungi | or was it 3? anyway, it's been some time | 19:11 |
fungi | yeah, it's the "if a tree falls in the forest and nobody's ever going to log into it again anyway" approach ;) | 19:11 |
clarkb | With gitea OOMs I tried to watch manage-projects as it ran yesterday as part of the "all the jobs" run for zm01.opendev.org deployment. And there was a slight jump in resource utilization but things looked happy | 19:13 |
clarkb | that makes me less suspicious of the "we are our own DoS" theory | 19:13 |
clarkb | However, the dstat recording change for system-config-run jobs did eventually land yesterday so we can start to try and look at that sort of info for future updates | 19:13 |
clarkb | and that applies to all the system-config-run jobs, not just gitea. | 19:14 |
clarkb | Any other opendev related items or should we move on? | 19:14 |
clarkb | #topic Update Config Management | 19:16 |
*** openstack changes topic to "Update Config Management (Meeting topic: infra)" | 19:16 | |
clarkb | Are there any configuration management updates to call out? I haven't seen any, but have had plenty of distractions so may have missed something | 19:16 |
clarkb | #topic General Topics | 19:18 |
*** openstack changes topic to "General Topics (Meeting topic: infra)" | 19:18 | |
clarkb | #topic OpenAFS Cluster Status | 19:18 |
*** openstack changes topic to "OpenAFS Cluster Status (Meeting topic: infra)" | 19:18 | |
clarkb | Last we checked in on this subject all the servers had their openafs packages upgraded but we were still waiting on operating system upgrades. Anything new on this? | 19:18 |
ianw | i haven't got to upgrades for this yet | 19:18 |
clarkb | ok we can move on then | 19:19 |
clarkb | #topic Bup and Borg | 19:19 |
*** openstack changes topic to "Bup and Borg (Meeting topic: infra)" | 19:19 | |
clarkb | At this point I think this topic might be more of just "Borg" but we're continuing to refine and improve the borg backups | 19:19 |
ianw | yep i need to do a final sweep through the hosts and make sure the bup jobs have stopped | 19:20 |
ianw | and then we can shutdown the bup servers and decide what to do with the storage | 19:20 |
clarkb | in the past we've held on to old bup backup volumes when rotating in new ones. Probably want to keep them around for a bit to ensure we've got that overlap here? | 19:21 |
ianw | yep, we can keep for a bit. practically, last time we tried to retrieve anything we'd sorted everything out before the bup processes had even completed extracting a tar :) | 19:22 |
clarkb | ya, that is a good point | 19:22 |
*** hashar has quit IRC | 19:22 | |
clarkb | anything else to add on this item? | 19:23 |
ianw | nope, next week might be the last time it's a thing of interest :) | 19:23 |
clarkb | excellent | 19:24 |
clarkb | #topic Picking up steam on server upgrades | 19:24 |
*** openstack changes topic to "Picking up steam on server upgrades (Meeting topic: infra)" | 19:24 | |
clarkb | I've jumped into trying to upgrade the operating systems under zuul, nodepool, and zookeeper | 19:24 |
*** diablo_rojo has joined #opendev-meeting | 19:24 | |
fungi | thanks! | 19:25 |
clarkb | so far zm01.opendev.org has been replaced and seems happy. I've been working on replacing 02-08 this morning so expect changes for that after the meeting | 19:25 |
ianw | ++ | 19:25 |
clarkb | Then my plan is to look at executors, launchers, zookeeper, and zuul scheduler (likely in that order) | 19:25 |
clarkb | I think that order is roughly from easiest to most difficult and working through the steps will hopefully make the more difficult steps easier :) | 19:26 |
clarkb | There are other services that need this treatment too. If you've got time and/or interest please jump in too :) | 19:26 |
clarkb | some of them will require puppet be rewritten to ansible as well. These are likely to be the most painful ones | 19:26 |
clarkb | but maybe doing that sort of rewrite is more interesting to some | 19:27 |
clarkb | Anything else to add to this item? | 19:28 |
fungi | not from me | 19:28 |
clarkb | #topic Upgrading refstack.o.o | 19:28 |
*** openstack changes topic to "Upgrading refstack.o.o (Meeting topic: infra)" | 19:28 | |
clarkb | ianw: kopecmartin: are there any changes we can help review or updates to the testing here? | 19:29 |
ianw | last update for me was we put some nodes on hold after finding the testinfra wasn't actually working as well as we'd hoped | 19:29 |
ianw | there was some unicode errors which i *think* got fixed too | 19:30 |
clarkb | ya I think some problems were identified in the service itself too (a bonus for better testing) | 19:31 |
clarkb | Sounds like we're still largely waiting for kopecmartin to figure out what is going on though? | 19:32 |
ianw | i think so yes; kopecmartin -- lmn if anything needs actioning | 19:33 |
clarkb | thanks for the update | 19:33 |
clarkb | #topic Bridge disk space | 19:33 |
*** openstack changes topic to "Bridge disk space (Meeting topic: infra)" | 19:33 | |
clarkb | We're running low on disk space on bridge. I did some quick investigating yesterday and the three locations consuming the most space are /var/log, /home, and /opt | 19:34 |
clarkb | I think there may be some cleanup we can do in /var/log/ansible where we've leaked some older log files. /home has miscellaneous content in our various homedirs, maybe we can each take a look and clean up unneeded files? and /opt seems to have a number of disk images on it as well as some stuff for ianw | 19:35 |
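A rough way to rank directories like /var/log, /home, and /opt by size is sketched below. This is a hypothetical helper, not the method used on bridge. It sums apparent file sizes, similar to `du -s --apparent-size`, so it can differ from the block-based numbers plain `du` reports. Symlinked directories are not followed.

```python
import os


def tree_size(path):
    """Sum apparent sizes of all non-directory entries under `path`.

    Roughly `du -s --apparent-size`; symlinked directories are not followed.
    """
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            try:
                total += os.lstat(os.path.join(root, name)).st_size
            except OSError:
                pass  # file vanished or is unreadable; skip it
    return total


def biggest(paths):
    """Return (path, bytes) pairs, largest first."""
    sizes = [(p, tree_size(p)) for p in paths]
    return sorted(sizes, key=lambda item: item[1], reverse=True)
```

For example, `biggest(["/var/log", "/home", "/opt"])` would rank the three directories mentioned above, and calling it on `/home` subdirectories would show whose homedir to check first.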
clarkb | mordred: I think the images in /opt were when you were trying to do builds for focal? | 19:35 |
clarkb | we ended up not using those iirc because we couldn't consistently build them with nodepool due to the way boot from volume treats images | 19:36 |
clarkb | should we just clean those up? or maybe remove the raw and vhd versions and keep qcow2? | 19:36 |
clarkb | (as a side note I used the cloud provided focal images for zuul mergers since we seemed to abandon the build our own idea for the time being) | 19:36 |
ianw | yeah i think they can go | 19:37 |
clarkb | in any case I suspect we'll run out of disk there in the near future so cleanup that can be made would be great. | 19:38 |
clarkb | if infra-root can check their homedirs and ianw can look at /opt/ianw I can take a look at the images and maybe start by removing the raw/vhd copies first | 19:38 |
fungi | apparently the launch-env in my homedir accounts for 174M | 19:38 |
fungi | but otherwise all cleaned up now | 19:39 |
clarkb | thanks! | 19:39 |
ianw | i can't remember what /opt/ianw was about, i'll clear it out | 19:39 |
clarkb | And I think that was all I had on the agenda | 19:40 |
clarkb | #topic Open Discussion | 19:40 |
*** openstack changes topic to "Open Discussion (Meeting topic: infra)" | 19:40 | |
clarkb | Anything else? | 19:40 |
fungi | i'm still struggling to get git-review testing working for python 3.9 so i can tag 2.0.0 | 19:40 |
fungi | after discussion yesterday, i may have to rework more of how gerrit is being invoked in the test setup | 19:41 |
fungi | something is mysteriously causing the default development credentials to not work | 19:41 |
ianw | you are using the upstream image right? | 19:42 |
fungi | official warfile, yes | 19:42 |
ianw | oh, no, the upstream .jar ... not their container | 19:42 |
fungi | we could redo git-review's functional tests to use containerized gerrit, but that seemed like a much larger overhaul | 19:43 |
fungi | right now it's designed to set up a template gerrit site and then start a bunch of parallel per-test gerrits from that in different temporary directories | 19:44 |
clarkb | there are examples of similar in the gerritlib tests | 19:44 |
clarkb | and ya you'd probably want to switch it to using a project per test rather than gerrit per test | 19:44 |
fungi | right, and that gets into a deep overhaul of git-review's testing, which i was trying to avoid right now (i don't really have time for that, but maybe i don't have time for this either) | 19:45 |
fungi | alternatives are to say we support python 3.9 but not test with it, or say we don't support python 3.9 because we're unable to test with it | 19:45 |
ianw | it's sort of "we're unable to test git-review" in general ATM right? | 19:46 |
fungi | or maybe try to get 3.9 tests going on bionic instead of focal | 19:46 |
fungi | ianw: it's that our gerrit tests rely on gerrit 2.11 which focal's openssh can't connect to | 19:47 |
*** gmann is now known as gmann_lunch | 19:47 | |
fungi | so we test up through python 3.8 just fine | 19:47 |
ianw | we could connect it to the system-config job; that has figured out the "get a gerrit running" bit ... but doesn't help local testing | 19:47 |
fungi | also doesn't help the "would have to substantially redo git-review's current test framework design" part | 19:48 |
mordred | clarkb: oh sorry - re: images - I think the focal images in /opt on bridge are the ones I manually built and uploaded for control plane things? | 19:48 |
clarkb | mordred: yes, but then we didn't really use them because boot from volume is weird iirc | 19:48 |
mordred | but - honestly - I don't see any reason to keep them around | 19:48 |
clarkb | that was the precursor to having nodepool do it, then nodepool did it, then we undid the nodepool | 19:48 |
fungi | i assumed the path of least resistance was to update the gerrit version we're testing against to one focal can ssh into, but 2.11 was the last version to keep ssh keys in the rdbms, which was how the test account was getting bootstrapped | 19:49 |
fungi | so gerrit>2.11 means changing how we bootstrap our test user | 19:50 |
fungi | but as usually happens, that's a rabbit hole to which i have yet to find the bottom | 19:50 |
clarkb | maybe bad idea: you could vendor an all-users repo state | 19:51 |
clarkb | and start gerrit with that | 19:51 |
fungi | that's something i considered, yeah | 19:51 |
fungi | though we'll need to vendor a corresponding ssh public key as well i suppose | 19:51 |
fungi | er, public/private keypair | 19:51 |
clarkb | or edit the repo directly before starting it with a generated value | 19:51 |
clarkb | but gerritlib and others bootstrap using a dev mode that should work | 19:52 |
clarkb | just need to sort out why it doesn't | 19:52 |
fungi | right, i thought working out how to interact with the rest api would be 1. easier than reverse-engineering undocumented notedb structure, and 2. an actual supported stable interface so we don't find ourselves right back here the next time they decide to tweak some implementation detail of the db | 19:52 |
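A REST-based bootstrap of the test user might look roughly like the sketch below. It only builds the request. It assumes an account with admin rights and an HTTP password already exists, which is the chicken-and-egg part still being worked out above, and it assumes HTTP basic auth. It uses Gerrit's documented `PUT /a/accounts/{username}` endpoint with an `AccountInput` body that carries `ssh_key`, and it strips the `)]}'` anti-XSSI prefix Gerrit puts on JSON responses. The function names are hypothetical, and this is not git-review's test code.

```python
import base64
import json
import urllib.parse
import urllib.request

# Gerrit prefixes JSON responses with this line to defeat XSSI.
XSSI_PREFIX = ")]}'"


def create_account_request(base_url, admin_user, admin_http_password,
                           username, ssh_public_key):
    """Build (but do not send) a PUT /a/accounts/{username} request."""
    body = json.dumps({"name": username, "ssh_key": ssh_public_key}).encode("utf-8")
    url = "%s/a/accounts/%s" % (base_url.rstrip("/"),
                                urllib.parse.quote(username, safe=""))
    req = urllib.request.Request(
        url, data=body, method="PUT",
        headers={"Content-Type": "application/json; charset=UTF-8"})
    # Assumes HTTP basic auth with the admin's generated HTTP password.
    token = base64.b64encode(f"{admin_user}:{admin_http_password}".encode()).decode()
    req.add_header("Authorization", "Basic " + token)
    return req


def parse_gerrit_json(raw):
    """Strip Gerrit's XSSI prefix and decode the JSON payload."""
    text = raw.decode("utf-8") if isinstance(raw, bytes) else raw
    if text.startswith(XSSI_PREFIX):
        text = text[len(XSSI_PREFIX):]
    return json.loads(text)
```

Sending it would then be `urllib.request.urlopen(req)` followed by `parse_gerrit_json(resp.read())` on the response.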
fungi | anyway, it seems like something about how the test framework is initializing and then later starting gerrit might be breaking dev mode | 19:53 |
fungi | so that's the next string to tug on | 19:53 |
fungi | i'll see if it could be as simple as calling java directly instead of trying to use the provided gerrit.sh initscript | 19:54 |
ianw | i can try like actually running it today instead of just making red-herring comments on diffs and see if i can see anything | 19:54 |
fungi | ianw: a tip is to comment out all the addCleanup() calls and then ask tox to run a single test | 19:55 |
fungi | i've abused that to get a running gerrit exactly how the tests try to run it | 19:55 |
fungi | right now the setup calls init with --no-auto-start, and then reindex, and then copies the resulting site and runs gerrit from the copies in daemon mode via gerrit.sh, which is rather convoluted | 19:56 |
fungi | and supplies custom configs to each site copy with distinct tcp ports | 19:57 |
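The "copy a template site, give each copy its own ports" step could be sketched as follows. This is a hypothetical illustration, not git-review's actual test code. It assumes the template's `etc/gerrit.config` contains exactly one `httpd.listenUrl` and one `sshd.listenAddress` line, spelled with that capitalization. It rewrites those lines in place rather than appending new ones, because `listenUrl` can take multiple values and an appended copy would leave the template port active as well. Port selection is inherently racy, since another process could take a port before Gerrit binds it.

```python
import re
import shutil
import socket
from pathlib import Path


def free_ports(count):
    """Return `count` distinct TCP ports the kernel reports as unused.

    All sockets stay bound until every port is chosen, so the ports are
    distinct; another process can still grab one before Gerrit starts.
    """
    socks = []
    try:
        for _ in range(count):
            s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
            socks.append(s)
            s.bind(("127.0.0.1", 0))
        return [s.getsockname()[1] for s in socks]
    finally:
        for s in socks:
            s.close()


def set_ports(config_text, http_port, ssh_port):
    """Rewrite httpd.listenUrl and sshd.listenAddress in gerrit.config text."""
    text = re.sub(r"(?m)^([ \t]*listenUrl[ \t]*=[ \t]*).*$",
                  lambda m: f"{m.group(1)}http://127.0.0.1:{http_port}/",
                  config_text)
    return re.sub(r"(?m)^([ \t]*listenAddress[ \t]*=[ \t]*).*$",
                  lambda m: f"{m.group(1)}127.0.0.1:{ssh_port}",
                  text)


def clone_site(template_dir, dest_dir):
    """Copy an initialised template site and give the copy its own ports."""
    shutil.copytree(template_dir, dest_dir)
    http_port, ssh_port = free_ports(2)
    config = Path(dest_dir) / "etc" / "gerrit.config"
    config.write_text(set_ports(config.read_text(), http_port, ssh_port))
    return http_port, ssh_port
```

Each test would call `clone_site` on the shared template, then start Gerrit from the copy on the returned ports.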
clarkb | ok, let me know if I can help. | 19:58 |
clarkb | I may be partially responsible for the old setup :) | 19:58 |
clarkb | and now we're just about at time | 19:58 |
clarkb | feel free to continue discussion on the mailing list or in #opendev | 19:58 |
clarkb | and thank you everyone for your time | 19:58 |
clarkb | #endmeeting | 19:58 |
*** openstack changes topic to "Incident management and meetings for the OpenDev sysadmins; normal discussions are in #opendev" | 19:58 | |
openstack | Meeting ended Tue Feb 23 19:58:58 2021 UTC. Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4) | 19:59 |
openstack | Minutes: http://eavesdrop.openstack.org/meetings/infra/2021/infra.2021-02-23-19.01.html | 19:59 |
openstack | Minutes (text): http://eavesdrop.openstack.org/meetings/infra/2021/infra.2021-02-23-19.01.txt | 19:59 |
openstack | Log: http://eavesdrop.openstack.org/meetings/infra/2021/infra.2021-02-23-19.01.log.html | 19:59 |
fungi | thanks clarkb! | 19:59 |
*** gmann_lunch is now known as gmann | 20:07 | |
*** hamalq has joined #opendev-meeting | 20:30 |
Generated by irclog2html.py 2.17.2 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!