*** hamalq has quit IRC | 00:41 | |
*** sboyron has joined #opendev-meeting | 06:54 | |
*** hashar has joined #opendev-meeting | 08:56 | |
*** hashar has quit IRC | 18:21 | |
*** hamalq has joined #opendev-meeting | 18:38 | |
clarkb | anyone else here for the meeting? we will get started soon | 19:00 |
---|---|---|
zbr | o/ | 19:00 |
ianw | o/ | 19:00 |
clarkb | #startmeeting infra | 19:01 |
openstack | Meeting started Tue Mar 2 19:01:07 2021 UTC and is due to finish in 60 minutes. The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot. | 19:01 |
openstack | Useful Commands: #action #agreed #help #info #idea #link #topic #startvote. | 19:01 |
*** openstack changes topic to " (Meeting topic: infra)" | 19:01 | |
openstack | The meeting name has been set to 'infra' | 19:01 |
clarkb | #link http://lists.opendev.org/pipermail/service-discuss/2021-March/000191.html Our Agenda | 19:01 |
clarkb | #topic Announcements | 19:01 |
*** openstack changes topic to "Announcements (Meeting topic: infra)" | 19:01 | |
clarkb | clarkb out March 23rd, could use a volunteer meeting chair or plan to skip | 19:01 |
clarkb | This didn't make it onto the email I sent, but will be trying to spend time with the kids during their break from school | 19:01 |
clarkb | if you'd like to chair the meeting on the 23rd feel free to let us know and send out a meeting agenda prior to the meeting. Otherwise I think we can likely skip it | 19:02 |
clarkb | #topic Actions from last meeting | 19:02 |
*** openstack changes topic to "Actions from last meeting (Meeting topic: infra)" | 19:02 | |
clarkb | #link http://eavesdrop.openstack.org/meetings/infra/2021/infra.2021-02-23-19.01.txt minutes from last meeting | 19:02 |
clarkb | corvus: are there changes to review to unfork jitsi meet's web component? (I think things continue to be busy with zuul so understood if not) | 19:03 |
clarkb | I'll go ahead and readd the action and we can follow up on it next week | 19:04 |
clarkb | #action corvus unfork jitsi meet | 19:04 |
clarkb | #topic Priority Efforts | 19:04 |
*** openstack changes topic to "Priority Efforts (Meeting topic: infra)" | 19:04 | |
clarkb | #topic OpenDev | 19:04 |
*** openstack changes topic to "OpenDev (Meeting topic: infra)" | 19:04 | |
clarkb | Last week another user showed up requesting account surgery which has bumped the priority on addressing gerrit account inconsistencies back up again | 19:05 |
clarkb | I've been trying to work through that since then | 19:05 |
clarkb | As suggested by fungi I have taken another approach at it which is to try and classify the conflicts based on whether or not one side of the conflict belongs to an inactive account or if the accounts appear to have been unused for significant periods of time | 19:06 |
clarkb | That has produced a list of ~35 accounts that we can go ahead and retire (which I did this morning) and then delete the conflicting external ids from the retired side | 19:06 |
clarkb | I haven't done the external id deletions for all of those accounts yet, but did push up the script I am planning to use for that if people can take a look and see if that seems safe enough | 19:07 |
clarkb | #link https://review.opendev.org/c/opendev/system-config/+/777846 Collecting scripting efforts here | 19:07 |
clarkb | Hoping to get through that chunk of fixes today, then rerun the consistency check for an up to date list of issues which can be fed back into the audit to get up to date classifications on accounts' recent usage | 19:08 |
clarkb | There are a good number of accounts that do appear to have not been used recently. For those I think we can go through the same process as above (either pick an account out of the conflicting set to "win" or retire and remove external ids from all of them) | 19:08 |
clarkb | I did notice that there may be some accounts that are only used to query the server though and my organizing based on code reviews and pushes is probably incomplete | 19:09 |
clarkb | I reached out to weshay|ruck about one of these (a tripleo account) to see if we can better capture those use cases | 19:09 |
clarkb | it continues to feel like slow going, but it is progress and the more I look at things the better I understand them | 19:10 |
clarkb | One thing that occurred to me is that setting accounts inactive is a relatively low cost option. That makes me think we should do this in a staged process where we set the accounts inactive then wait a week or whatever for people to complain (can send email about this too) | 19:10 |
clarkb | then if people complain we reactivate their accounts and move them out of the list, for the rest we remove the external ids and fix the conflicts | 19:11 |
clarkb | anyway that is still a ways away as I want to refine the classifications further once this set is done | 19:11 |
clarkb | Any other OpenDev topics to discuss before we move on? | 19:11 |
ianw | no, but thanks for working on this tricky set of circumstances! :) | 19:12 |
fungi | i'm working on pushing git-review 2.0.0.0 release candidates now to exercise release automation for it in preparation for a new release | 19:13 |
fungi | we've got everything merged at this point which was slated for release | 19:13 |
clarkb | cool, the big change being git-review will require python3? | 19:13 |
fungi | rc1 is in the release pipeline as we speak | 19:13 |
fungi | yes, no more 2.7 support (thanks zbr for the change for that) | 19:14 |
clarkb | #topic General topics | 19:16 |
*** openstack changes topic to "General topics (Meeting topic: infra)" | 19:16 | |
clarkb | #topic OpenAFS cluster status | 19:16 |
*** openstack changes topic to "OpenAFS cluster status (Meeting topic: infra)" | 19:16 | |
clarkb | ianw is adding a third afs db server in order for us to have proper quorum in the cluster | 19:16 |
clarkb | apparently 2 is not enough (not surprising) | 19:16 |
clarkb | ianw: anything additional to add to that? changes to review maybe? | 19:16 |
ianw | yeah, that third server is active and has validated that it works ok with focal, so i'll take on the in-place upgrades we've talked about | 19:17 |
clarkb | excellent | 19:17 |
clarkb | Also I noticed that afs01.dfw's vicepa is fairly full | 19:17 |
ianw | couple of small reviews are https://review.opendev.org/c/opendev/system-config/+/778127 and https://review.opendev.org/c/opendev/system-config/+/778120 | 19:17 |
clarkb | I noticed that a few weeks ago and pushed up some changes to work towards dropping fedora-old (not sure of the exact version) | 19:18 |
clarkb | There are probably other ways we could prune the data set if others have ideas that would be great | 19:18 |
ianw | ahh, ok, i can go through and look for that and deal. fedora is hitting up against our -minimal issues with tools on build hosts, the container-build stuff is working but needs polishing | 19:19 |
clarkb | ianw: ya we have fedora-old, fedora-intermediate, and fedora-current. It's -current that has trouble, most testing seems to be on -intermediate so I think we can drop -old | 19:20 |
clarkb | but if you can double check that and review some of the changes that would probably be good | 19:20 |
ianw | will do | 19:20 |
clarkb | #topic Borg Backups | 19:21 |
*** openstack changes topic to "Borg Backups (Meeting topic: infra)" | 19:21 | |
clarkb | ianw: fungi: any new insight into why gitea db backups pushing to the vexxhost dest has trouble? | 19:21 |
ianw | no, but i have to admit i haven't looked fully. i think i'll try and run the mysqldump a few times and see if that is dying locally | 19:22 |
clarkb | ++ that seems like a good test | 19:22 |
fungi | ahh, yeah i got sidetracked after getting as far as finding the disconnect error in the mariadb logs | 19:22 |
ianw | the fact that it died three days in a row at the same row number seems very suspicious | 19:22 |
clarkb | anything else on this topic? | 19:23 |
ianw | and that the filesystem part doesn't seem to have issues; and no other host is reporting issues | 19:23 |
ianw | nope, otherwise, i've retired the old servers, we have a 1tb drive attached to the RAX host with the latest rotation of bup backups if we require | 19:24 |
clarkb | thank you! | 19:24 |
clarkb | #topic Server Updates | 19:24 |
*** openstack changes topic to "Server Updates (Meeting topic: infra)" | 19:24 | |
clarkb | I've made some progress with zuul server rolling replacements | 19:24 |
clarkb | all the mergers are focal now and the old servers have been cleaned up (though it just occurred to me I still have dns records to clean up) | 19:25 |
clarkb | #link https://review.opendev.org/c/opendev/system-config/+/778227 is the next step for executor replacements | 19:25 |
clarkb | basically if you think the new ze server is happy (from what I can see it is, including tarball publishing jobs to afs) | 19:25 |
clarkb | er if ^ then please help land that change. I'll delete the old server then start doing some replacements in larger batches (3 at a time?) | 19:25 |
clarkb | Anyone else looking at updates other than afs servers, refstack, and zuul? | 19:26 |
ianw | yeah i started on review | 19:26 |
clarkb | oh ya I saw your email to upstream about the mariadb weirdness | 19:27 |
ianw | however we've got ourselves in a bit of a tangle with review01.<openstack|opendev>.org | 19:27 |
ianw | so we have A dns records for review01.opendev.org | 19:27 |
ianw | i proposed removing them for the new server ... https://review.opendev.org/c/opendev/zone-opendev.org/+/777926 | 19:27 |
ianw | i need to spend some time with system-config and see what we can do | 19:28 |
ianw | calling the new server "review02.opendev.org" *may* help a little? | 19:28 |
clarkb | my poor memory says we may have done that for a reason | 19:29 |
clarkb | hrm ya and with the LE records too | 19:29 |
clarkb | ya we use the dns records there to validate the ssl cert on the server :/ | 19:30 |
fungi | git history might point to why we added it | 19:30 |
fungi | but that sounds likely | 19:30 |
ianw | but do we need a cert for review01.opendev.org? | 19:30 |
ianw | i don't feel like anyone is accessing it like that | 19:30 |
clarkb | I think the major reason for it may be for sshfp since we sshfp to review01 for 22 but to review.opendev.org for 29418 | 19:31 |
fungi | #link https://review.opendev.org/744557 Split review's resource records from review01's | 19:31 |
clarkb | and ya maybe we can stop doing a review01 altname and just generate certs for review.opendev and review.openstack | 19:31 |
fungi | sshfp record was breaking ssh access to gerrit's ssh api port | 19:32 |
clarkb | and the sshfp records aren't super important right now iirc | 19:32 |
clarkb | fungi: ya so we moved it to review01 from review | 19:32 |
clarkb | so ya I think we are ok if we reduce the LE tie in and maybe clean up sshfp records too for completeness | 19:33 |
ianw | ok, i can look at that, split 777926 up into two steps | 19:33 |
fungi | makes sense | 19:33 |
clarkb | ianw: then for bootstrapping the new host with ansible we want to do something similar to what review-test did without replication config, etc | 19:33 |
clarkb | anything else on the topic of server upgrades? | 19:34 |
ianw | yep | 19:34 |
ianw | one more thing, what did we decide about review-dev? | 19:34 |
clarkb | ianw: we should clean it up though that hasn't happened yet | 19:34 |
ianw | ok, i'll do that too | 19:34 |
clarkb | might want to double check with mordred and corvus et al that they don't have anything on that server to retain (shouldn't but it was a sandbox for a while) | 19:35 |
clarkb | we also need to get review-test back into ansible but that is probably less urgent | 19:35 |
clarkb | #topic New refstack server | 19:36 |
*** openstack changes topic to "New refstack server (Meeting topic: infra)" | 19:36 | |
clarkb | Looked like there was some new testing being done to sort out some problems? I didn't catch what the current problems are though | 19:36 |
kopecmartin | i have 2 patches up for that | 19:37 |
kopecmartin | #link https://review.opendev.org/c/opendev/system-config/+/776292 | 19:37 |
kopecmartin | when merged, will the held server be updated automatically? | 19:37 |
kopecmartin | I'd like to test it one more time and then let's go to production finally | 19:37 |
ianw | kopecmartin: nope, that's not being ansiblised | 19:37 |
kopecmartin | ok, np, I'll do it manually then | 19:38 |
ianw | however, we could update things on the held bridge and run it manually to confirm without having to run new nodes | 19:38 |
clarkb | ianw: kopecmartin for that first change I think that may be a noop | 19:38 |
kopecmartin | ianw: or that, whatever you say :) | 19:38 |
clarkb | because we are already redirecting everything under / to localhost:8000 | 19:38 |
clarkb | I want to say there is a way to define the refstack api path in refstack itself | 19:38 |
clarkb | api_url =<%= scope.lookupvar("::refstack::params::api_url") %> is what puppet does | 19:39 |
kopecmartin | clarkb: hmm, so maybe that's why refstack server didn't behave as expected when i tested it , because of the '/ to localhost:8000' | 19:39 |
clarkb | I think you may want to set the config such that the api_url has an /api at the end of it | 19:39 |
kopecmartin | yeah yeah, i was playing with the api_url option, but it was ignored and i couldn't figure out why | 19:39 |
kopecmartin | now it makes sense | 19:40 |
clarkb | I think you may also have to set a js config value too | 19:40 |
clarkb | I remember looking at it and leaving some comments recently | 19:40 |
kopecmartin | ok then, let me get back to it and i'll implement updates shortly and ping you back so that it's moving forward | 19:40 |
clarkb | kopecmartin: ianw in the ansible template for refstack config try changing api_url = {{ refstack_url }} to api_url = {{ refstack_url }}/api maybe? | 19:41 |
clarkb | but ya I'm not sure that apache config change will help since it is already sending things to / | 19:41 |
ianw | i thought we did that, but maybe not | 19:41 |
kopecmartin | we did , but it didn't work | 19:41 |
clarkb | I see | 19:41 |
kopecmartin | it seemed like the opt was ignored or something like that | 19:41 |
clarkb | oh interesting the puppet side runs it as a wsgi app: WSGIScriptAlias /api /etc/refstack/app.wsgi | 19:42 |
kopecmartin | therefore I reverted that and put the proxy pass there (as workaround) | 19:42 |
clarkb | so ya maybe the real fix is to switch to using it as wsgi? | 19:42 |
clarkb | that gets awkward with containers though | 19:42 |
clarkb | anyway sounds like you're ahead of me in the debugging so I should get out of the way :) | 19:43 |
clarkb | Anything else on this ? | 19:43 |
kopecmartin | so the WSGIScriptAlias /api /etc/refstack/app.wsgi is an equivalent for the ProxyPass I wrote? | 19:43 |
clarkb | kopecmartin: no, it runs a python wsgi process under apache and does wsgi "proxying" instead | 19:44 |
clarkb | they are similar in some ways but also different | 19:44 |
kopecmartin | ah | 19:45 |
ianw | yeah it seems to be almost running the api bits separately | 19:45 |
clarkb | alright let's move on | 19:47 |
clarkb | #topic Bridge disk use | 19:47 |
*** openstack changes topic to "Bridge disk use (Meeting topic: infra)" | 19:47 | |
clarkb | frickler discovered that /root/.cache is consuming a fair bit of disk. Particularly caches for python entrypoints and pip | 19:47 |
clarkb | does anyone know what caches entrypoints (is it pkg_resources?) and if it is safe to simply remove the entire dir? | 19:48 |
clarkb | I think my concern is that if a python process is running it may rely on that file being present after it has pkg_resourced | 19:48 |
ianw | i think we could just mtime delete anything older than a day though? | 19:49 |
clarkb | ianw: ya we could do that too, but there are so many files I expect the stating for that to be slow. But maybe that is fine | 19:49 |
clarkb | just start it and then wait :) | 19:49 |
ianw | yeah, i was thinking a cron job | 19:49 |
ianw | presumably it's not "leaking" as such, as it's under .cache ... | 19:50 |
clarkb | what is weird is I can't find evidence that this is part of python packaging proper | 19:51 |
ianw | ".cache/python-entrypoints" does not give many hits | 19:52 |
clarkb | I do have a much smaller number of entries on my local system from zuul testing it looks like | 19:52 |
fungi | could it be stevedore? | 19:52 |
mordred | ianw: I do not have anything on review-dev | 19:53 |
clarkb | fungi: ya maybe something in stevedore or ansible pulling in etc | 19:53 |
fungi | or something similar caching entrypoints, anyway | 19:53 |
clarkb | I think it would be worthwhile to try and source it before we go and delete them so that we understand it better (and its expected rate of growth) | 19:53 |
clarkb | I can probably take a look at that after getting this batch of gerrit accounts sorted | 19:53 |
mordred | I think it's stevedore | 19:54 |
mordred | random other hit on the internet: https://github.com/cpoppema/docker-flexget/issues/82 - also mentions stevedore - and I think I remember someone saying something about doing that a while back for performance | 19:54 |
mordred | stevedore/_cache.py: return os.path.join(base_path, 'python-entrypoints') | 19:55 |
clarkb | that looks incredibly suspicious :) | 19:55 |
* mordred puts on his useful-for-the-day hat | 19:55 | |
fungi | i was hoping for something incredibly delicious. i shouldn't have skipped lunch | 19:55 |
clarkb | based on that it should be fine to do a time based clearing, but maybe we should also file a bug | 19:55 |
ianw | and the latest patch is where you can drop a . file to stop it caching | 19:55 |
clarkb | ianw: oh ha someone else already hit this then I bet :) | 19:56 |
ianw | Add possibility to skip caching endpoints to the filesystem when '.disable' file is present in the cache directory. | 19:56 |
clarkb | (the idea of a cache seems like a good one, I wonder why it needs so many cache files though) | 19:56 |
ianw | is that coming from cloud launcher? what exactly is using stevedore? | 19:56 |
clarkb | we have just a few minutes left so one more thing | 19:56 |
clarkb | #topic InMotion OpenStack as a Service | 19:57 |
*** openstack changes topic to "InMotion OpenStack as a Service (Meeting topic: infra)" | 19:57 | |
clarkb | This has ended up towards the bottom of my priority list due to other distractions. I think getting ssl sorted out on this system would still be worthwhile if anyone else wants to take a look (you basically need to figure out how to configure kolla then rerun kolla against the cluster) | 19:57 |
clarkb | I think you can even tell kolla to just make a self signed cert as a first step | 19:58 |
clarkb | anyway I think we are all busy so don't necessarily expect anyone to jump on that, but thought I would mention it so it doesn't get completely forgotten | 19:58 |
clarkb | #topic Open Discussion | 19:58 |
*** openstack changes topic to "Open Discussion (Meeting topic: infra)" | 19:58 | |
clarkb | Anything else in our minute and a half remaining? | 19:58 |
fungi | the git-review 2.0.0.0rc1 tag seems to have worked fine | 19:58 |
clarkb | exciting | 19:59 |
fungi | #link https://pypi.org/project/git-review/2.0.0.0rc1/ | 19:59 |
clarkb | did you want people to install it and use it for a bit or was that mostly to exercise the publishing? | 19:59 |
fungi | i just noticed though that the release notes could be better organized | 19:59 |
fungi | mostly to exercise publishing though we can ask folks to test it briefly | 19:59 |
fungi | #link https://review.opendev.org/778257 will clean up release notes | 19:59 |
clarkb | That is all we have scheduled time for. Thank you everyone and feel free to continue discussion in #opendev | 20:01 |
clarkb | #endmeeting | 20:01 |
*** openstack changes topic to "Incident management and meetings for the OpenDev sysadmins; normal discussions are in #opendev" | 20:01 | |
openstack | Meeting ended Tue Mar 2 20:01:04 2021 UTC. Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4) | 20:01 |
openstack | Minutes: http://eavesdrop.openstack.org/meetings/infra/2021/infra.2021-03-02-19.01.html | 20:01 |
openstack | Minutes (text): http://eavesdrop.openstack.org/meetings/infra/2021/infra.2021-03-02-19.01.txt | 20:01 |
openstack | Log: http://eavesdrop.openstack.org/meetings/infra/2021/infra.2021-03-02-19.01.log.html | 20:01 |
fungi | thanks clarkb! | 20:02 |
*** hashar has joined #opendev-meeting | 20:30 | |
*** sboyron has quit IRC | 21:30 | |
*** hashar has quit IRC | 23:10 |
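
Note on the "Bridge disk use" topic: ianw suggested deleting anything under /root/.cache older than a day by mtime, possibly from a cron job. The sketch below is one hedged way that could look in Python; it is not the script the team used. The function name `prune_old_files` and the one-day default are assumptions taken from the discussion, and the demo runs against a temporary directory rather than the real cache.

```python
import os
import tempfile
import time
from pathlib import Path


def prune_old_files(cache_dir, max_age_seconds=24 * 60 * 60, now=None):
    """Delete files under cache_dir whose mtime is older than max_age_seconds.

    Hypothetical helper: returns the number of files removed.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_seconds
    removed = 0
    # os.walk yields one directory at a time, which helps when the cache
    # holds a very large number of files.
    for root, _dirs, files in os.walk(cache_dir):
        for name in files:
            path = Path(root) / name
            try:
                if path.lstat().st_mtime < cutoff:
                    path.unlink()
                    removed += 1
            except FileNotFoundError:
                # Another process may have removed the file already.
                continue
    return removed


if __name__ == "__main__":
    # Demo against a throwaway directory, not a real cache.
    with tempfile.TemporaryDirectory() as tmp:
        old = Path(tmp) / "old-entry"
        new = Path(tmp) / "new-entry"
        old.write_text("stale")
        new.write_text("fresh")
        two_days_ago = time.time() - 2 * 24 * 60 * 60
        os.utime(old, (two_days_ago, two_days_ago))
        print(prune_old_files(tmp))
        print(sorted(p.name for p in Path(tmp).iterdir()))
```

As clarkb pointed out, stat-ing every file may be slow on a large cache, so a job like this would likely be run off-peak. Whether a running process still needs a recently written cache file is the reason for keeping the age threshold rather than removing the whole directory.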
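
Note on the "OpenDev" topic: clarkb described classifying each Gerrit account conflict by whether one side is inactive or has gone unused for a long time, then retiring those accounts before removing their external ids. The sketch below illustrates that classification only. The `Account` fields, the 365-day threshold, and the function name `classify_conflict` are all assumptions for illustration and do not come from the scripts in change 777846.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional, Tuple


@dataclass
class Account:
    account_id: int                 # hypothetical numeric account id
    active: bool                    # False if the account is already inactive
    last_used: Optional[datetime]   # last review or push seen; None if never


def classify_conflict(
    accounts: List[Account],
    now: datetime,
    unused_after: timedelta = timedelta(days=365),  # assumed threshold
) -> Tuple[List[Account], List[Account], bool]:
    """Split one conflict set into (keep, retire, resolved).

    An account is a retirement candidate if it is inactive, was never used,
    or was last used longer ago than unused_after. The conflict counts as
    resolved when at most one account is left to keep.
    """
    def is_stale(acct: Account) -> bool:
        if not acct.active or acct.last_used is None:
            return True
        return now - acct.last_used > unused_after

    retire = [a for a in accounts if is_stale(a)]
    keep = [a for a in accounts if not is_stale(a)]
    return keep, retire, len(keep) <= 1


if __name__ == "__main__":
    now = datetime(2021, 3, 2)
    conflict = [
        Account(1, True, datetime(2021, 2, 1)),
        Account(2, False, datetime(2021, 2, 1)),
        Account(3, True, datetime(2015, 1, 1)),
    ]
    keep, retire, resolved = classify_conflict(conflict, now)
    print([a.account_id for a in keep], [a.account_id for a in retire], resolved)
```

A `last_used` value built only from reviews and pushes would miss accounts that only query the server, which is the gap clarkb raised in the meeting. That is one reason the staged approach (set inactive, wait for complaints, then remove external ids) is safer than acting on the classification directly.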