*** sboyron has joined #opendev-meeting | 08:05 | |
*** mordred has quit IRC | 11:19 | |
*** mordred has joined #opendev-meeting | 11:29 | |
*** hashar has joined #opendev-meeting | 13:27 | |
*** hashar has quit IRC | 15:10 | |
*** ianw_pto is now known as ianw | 18:59 | |
clarkb | anyone else here for our weekly meeting? | 19:00 |
clarkb | We will get started shortly | 19:00 |
fungi | i'm here | 19:00 |
fungi | is that why i'm here? | 19:00 |
zbr | o/ | 19:00 |
fungi | i guess it is | 19:00 |
clarkb | #startmeeting infra | 19:01 |
ianw | o/ | 19:01 |
openstack | Meeting started Tue Jan 26 19:01:03 2021 UTC and is due to finish in 60 minutes. The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot. | 19:01 |
openstack | Useful Commands: #action #agreed #help #info #idea #link #topic #startvote. | 19:01 |
*** openstack changes topic to " (Meeting topic: infra)" | 19:01 | |
openstack | The meeting name has been set to 'infra' | 19:01 |
clarkb | #link http://lists.opendev.org/pipermail/service-discuss/2021-January/000174.html Our Agenda | 19:01 |
clarkb | #topic Announcements | 19:01 |
*** openstack changes topic to "Announcements (Meeting topic: infra)" | 19:01 | |
clarkb | I had no announcements so lets move along | 19:01 |
clarkb | #topic Actions from last meeting | 19:01 |
*** openstack changes topic to "Actions from last meeting (Meeting topic: infra)" | 19:01 | |
clarkb | #link http://eavesdrop.openstack.org/meetings/infra/2021/infra.2021-01-19-19.01.txt minutes from last meeting | 19:01 |
clarkb | There were three actions recorded. | 19:01 |
clarkb | First up is ianw ensuring that wiki is still getting backed up with the new borg backup setup | 19:01 |
clarkb | I believe this may have happened but will let ianw confirm | 19:02 |
*** diablo_rojo has joined #opendev-meeting | 19:02 | |
ianw | ahh i got a little distracted trying to fit everything into our space available | 19:02 |
diablo_rojo | o/ | 19:02 |
ianw | so the short version is that no, it is not yet backing up to the new servers | 19:03 |
clarkb | got it, I've got the general borg updates on the agenda for later too so we can dig in then | 19:03 |
clarkb | fungi has an action to send an email to the openstack-discuss list asking for config-core assistance | 19:03 |
clarkb | #link https://etherpad.opendev.org/p/tact-sig-2021-rfh is the draft but the email hasn't been sent yet | 19:03 |
fungi | yeah, i was hoping mnaser could take a look first and make sure it covers what he was looking for | 19:04 |
fungi | since it was his topic in last week's meeting and the preceding openstack tc meeting which precipitated it | 19:04 |
fungi | but as far as i'm concerned it's ready to flu | 19:04 |
fungi | fly | 19:04 |
fungi | (please not flu) | 19:05 |
clarkb | and the last action was one for myself to start a puppet -> ansible and xenial upgrade todo list. I got thoroughly sniped by gerrit account inconsistencies and have not done this | 19:05 |
clarkb | #action ianw Backup wiki to new borg servers | 19:05 |
clarkb | #action fungi send https://etherpad.opendev.org/p/tact-sig-2021-rfh once mnaser is happy with it | 19:05 |
clarkb | #action clarkb Write puppet to ansible and xenial upgrade todo list | 19:05 |
clarkb | #topic Priority Efforts | 19:06 |
*** openstack changes topic to "Priority Efforts (Meeting topic: infra)" | 19:06 | |
clarkb | #topic OpenDev | 19:06 |
*** openstack changes topic to "OpenDev (Meeting topic: infra)" | 19:06 | |
clarkb | Nominations for the service coordination position are still open | 19:06 |
clarkb | #link http://lists.opendev.org/pipermail/service-discuss/2021-January/000161.html less than a week remaining. | 19:06 |
clarkb | we've got until the end of the weekend, UTC time | 19:06 |
clarkb | If you are interested but want to learn more or have concerns don't hesitate to reach out | 19:07 |
clarkb | And that takes us to the thing that sniped me. | 19:07 |
clarkb | Last week a user managed to create an interesting gerrit account situation where openid and email conflicts caused gerrit to move an openid from one account to another | 19:07 |
clarkb | I've done a fair bit of digging into this as well as communicating upstream on the repo-discuss list about it and this opened a whole new can of problems | 19:08 |
clarkb | We have inconsistent user groups, user accounts, and account external ids | 19:08 |
clarkb | this is a problem because we can't push external id fixes to gerrit while it is online (to fix that user's problem for example) until all the inconsistencies are dealt with | 19:09 |
clarkb | A workaround to this is that it does appear that we can stop gerrit, modify the external ids directly in git (don't push through gerrit but straight to disk), reindex accounts (and groups?), start gerrit then clear caches (accounts and groups?) | 19:10 |
clarkb | Since this workaround involves downtime, I've been trying to audit the errors to see if we can correct them and do online updates instead | 19:10 |
clarkb | For the group inconsistency it is a single group that has included itself as a subgroup which is a loop. My plan was to just fix that one today via the web ui. | 19:11 |
clarkb | We have about 109 accounts with preferred emails set in All-Users refs/users/XY/ABXY:account.config with no corresponding external id | 19:11 |
clarkb | the vast majority of these are accounts that are inactive or functionally inactive. For them I think we can set them to inactive and remove the preferred email address from account.config and push the update back to correct them | 19:12 |
clarkb | this can be done online because each account has its own ref under refs/users. If you try to push an invalid config to there it should be rejected but we are pushing updates that make them valid | 19:12 |
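A minimal sketch of the audit and per-account fix described above. It assumes account.config is Gerrit's git-config style file with an `[account]` section holding `preferredEmail` and `active` keys, and that the emails attached to each account's external ids have already been collected; both function names are hypothetical, and exact, case-sensitive email matching is an assumption.

```python
def missing_external_ids(preferred, external_emails):
    """Return account ids whose preferredEmail has no external id on that account.

    preferred: {account_id: preferred_email} read from each account.config
    external_emails: {account_id: {emails on that account's external ids}}
    """
    return sorted(
        account_id
        for account_id, email in preferred.items()
        if email not in external_emails.get(account_id, set())
    )


def deactivate_and_clear_email(account_config):
    """Drop preferredEmail and force active = false inside the [account] section."""
    out = []
    in_account = False
    saw_active = False
    for line in account_config.splitlines():
        stripped = line.strip()
        if stripped.startswith("["):
            # Leaving [account] without an explicit active key: add one.
            if in_account and not saw_active:
                out.append("  active = false")
            in_account = stripped == "[account]"
            saw_active = False
            out.append(line)
            continue
        if in_account:
            # git-config keys are case-insensitive, so compare lowercased names.
            key = stripped.split("=", 1)[0].strip().lower()
            if key == "preferredemail":
                continue  # remove the address that has no matching external id
            if key == "active":
                out.append("  active = false")
                saw_active = True
                continue
        out.append(line)
    if in_account and not saw_active:
        out.append("  active = false")
    return "\n".join(out) + "\n"
```

The rewritten file would then be committed and pushed back to that account's own refs/users/XY/ABXY ref, where Gerrit validates it on push.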
clarkb | and finally we have ~642 email addresses in use by multiple external ids | 19:13 |
ianw | so catching up, this has happened because they changed their email in launchpad? | 19:13 |
clarkb | sorting these out is much more complicated because many of them seem to be active accounts. fungi and I were brainstorming around this a bit earlier today and I think we may be able to classify a subset of them (where one account has clearly been unused or underused) and merge it into the other account | 19:13 |
fungi | ianw: for some users, yes | 19:14 |
clarkb | ianw: that seems to be part of it yes | 19:14 |
fungi | it's hard to generalize, because there are a myriad of different sorts of conflicts currently returned by the validation check | 19:14 |
clarkb | the other big problem with the external id conflicts is they are all present in a single ref: refs/meta/external-ids which means we have to fix all of them at once and push that or do the downtime workaround and iterate | 19:14 |
clarkb | and ya I'm only just starting to scratch the surface on these. I think it is possible there are multiple scenarios going on. Including the potential for some users with multiple accounts where they actively use one for ssh and another for https | 19:15 |
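Because every external id lives in the single refs/meta/external-ids ref, finding the email conflicts amounts to grouping external ids by email. The sketch below assumes the note files have already been read out of a checkout of that ref and that each one uses an `[externalId "..."]` section with `accountId` and `email` keys; the helper names are hypothetical.

```python
import re
from collections import defaultdict

# Section header of an external id note, e.g. [externalId "username:jdoe"]
HEADER = re.compile(r'^\[externalId "(?P<key>[^"]+)"\]$')


def parse_external_id(text):
    """Return (external id key, account id, email or None) for one note file."""
    key = account_id = email = None
    for line in text.splitlines():
        line = line.strip()
        match = HEADER.match(line)
        if match:
            key = match.group("key")
        elif "=" in line:
            name, value = (part.strip() for part in line.split("=", 1))
            # git-config keys are case-insensitive.
            if name.lower() == "accountid":
                account_id = int(value)
            elif name.lower() == "email":
                email = value
    return key, account_id, email


def email_conflicts(notes):
    """Map each email claimed by more than one external id to those ids."""
    by_email = defaultdict(list)
    for text in notes:
        key, account_id, email = parse_external_id(text)
        if email:
            by_email[email].append((key, account_id))
    return {email: sorted(ids) for email, ids in by_email.items() if len(ids) > 1}
```

Grouping the output by whether the conflicting ids share an account id would be one way to separate the simpler cases from ones spanning multiple accounts.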
clarkb | review-test:~clarkb/gerrit-consistency-notes/ is where I'm keeping notes and scripts | 19:16 |
corvus | in a different gerrit, i hosed my account by removing the email addr associated with my openid account (i didn't change my openid addr). in short: yes, hard to generalize. | 19:16 |
clarkb | conflicting_email_user_info and preferred-email-classifications are the two areas of distilled info and may be most interesting | 19:16 |
clarkb | it is also worth noting that I have yet to dump the info from prod | 19:17 |
clarkb | it shouldn't be vastly different than -test, but at some point i should do that dump from prod | 19:17 |
clarkb | I haven't done it yet as it isn't super clear to me how costly that check is to the running server. When I run it against -test it takes several minutes to return | 19:17 |
clarkb | Maybe I should fix the groups issue then run the consistency check against prod today? | 19:18 |
clarkb | (it is a rest api request) | 19:18 |
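A sketch of that REST call plus a per-section problem count, assuming Gerrit's `POST /config/server/check.consistency` endpoint, HTTP basic auth under the `/a/` prefix, and `check_*_result` objects carrying a `problems` list in the response. The host, credentials, and helper names are placeholders.

```python
import base64
import json
import urllib.request

# Gerrit prefixes JSON responses with this line to defeat XSSI.
XSSI_PREFIX = ")]}'"


def build_consistency_request(base_url, user, http_password):
    """Build (but do not send) the consistency check request."""
    body = json.dumps({
        "check_accounts": {},
        "check_account_external_ids": {},
        "check_groups": {},
    }).encode()
    token = base64.b64encode(f"{user}:{http_password}".encode()).decode()
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/a/config/server/check.consistency",
        data=body,
        headers={"Content-Type": "application/json", "Authorization": f"Basic {token}"},
        method="POST",
    )


def summarize(raw):
    """Count reported problems per result section of a response body."""
    text = raw.lstrip()
    if text.startswith(XSSI_PREFIX):
        text = text[len(XSSI_PREFIX):]
    info = json.loads(text)
    # Sections with no problems may omit the "problems" key entirely.
    return {section: len(result.get("problems", [])) for section, result in info.items()}
```

Sending it would be something like `summarize(urllib.request.urlopen(req, timeout=900).read().decode())`; the generous timeout reflects the several minutes the check took against -test, and the exact value is a guess.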
clarkb | fungi indicated we could pair up tomorrow and start correcting some of the simpler situations for accounts that have preferred email addrs without external ids | 19:19 |
fungi | yeah, i'm up for that | 19:20 |
clarkb | I guess that is where I'm at on this: fix the group today, run consistency check against prod today if there are no objections, cross check that against -test, fix the simpler cases tomorrow | 19:20 |
fungi | sounds great, thanks for digging into this ball of hair | 19:20 |
clarkb | if people want to take a look at the info I've put together on -test and try to classify the email conflicts or otherwise propose fixes for them I'd be grateful :) | 19:20 |
clarkb | another issue with doing a major fix for 642 emails all at once is that if we get something functionally wrong we'll potentially have a lot of people in a bad spot. vs being able to do this one by one | 19:21 |
clarkb | upstream agreed it is a bug that you can't do it one by one, but for now it is still a bug :/ | 19:21 |
clarkb | That's all I had, happy to answer more questions on the subject if y'all have them | 19:21 |
ianw | is there any way to stop this happening once we fix them? | 19:22 |
fungi | they should no longer happen | 19:22 |
clarkb | ianw: yes, new gerrit doesn't allow it to happen anymore | 19:22 |
fungi | this is an artifact of the beforetime | 19:23 |
clarkb | it does have the issue that the original user had, which is that it can move an openid, so we may have to do surgery on that in the future | 19:23 |
corvus | (and my issue) | 19:23 |
clarkb | but preferred emails lacking external ids and external id email conflicts shouldn't happen to accounts once we fix those | 19:23 |
ianw | ahh, right, excellent | 19:24 |
clarkb | corvus: after the meeting I should catch up with you on that to find out what exactly you edited to cause that (as I think it will be useful to know for editing these fixes) | 19:24 |
fungi | ahh, yes, gerrit still seems capable of getting itself thoroughly confused around external id changes, but it no longer creates new conflicts, just leaves a mess for you to fix | 19:24 |
clarkb | fungi: yes that | 19:24 |
corvus | clarkb: sure -- but to be clear, i caused the problem as a regular user. fixing it required admin. | 19:24 |
clarkb | oh wow | 19:25 |
fungi | older gerrit allowed these inconsistencies, newer gerrit does now, but we were able to upgrade without fixing them, we just can't push changes without fixing them because the push operation wants to validate everything not just what you're changing | 19:25 |
fungi | er, newer gerrit does check for them now | 19:25 |
fungi | you can make changes via the rest api without validating the entire set, however the rest api is currently limited to reading and deleting external-ids | 19:26 |
fungi | it can't create or update | 19:26 |
clarkb | I also don't think it checks for conflicts on login unless it is creating a new account | 19:27 |
clarkb | which means that users in this situation should be fine unless they try to introduce a new conflict | 19:27 |
fungi | yep | 19:28 |
clarkb | which is unfortunate because we are likely to introduce some pain for them when we correct things in our bookkeeping | 19:28 |
fungi | well, presumably it also checks for conflicts if you try to add an address to your account | 19:28 |
fungi | but only checks that the addition doesn't conflict | 19:28 |
clarkb | right | 19:28 |
clarkb | one of my thoughts here is that we set accounts to inactive to see who complains and then work with them to fix things | 19:28 |
clarkb | (and if we do that we can do some aggressive surgery on the external ids to make them pass consistency checking without worrying too much about user impacts. Then fix user impact when they can't log in anymore and do it in a way that makes sense for them) | 19:29 |
clarkb | but that is super overkill | 19:29 |
clarkb | as a timecheck we're halfway through our hour. Let's continue on and we can talk about this in #opendev more as necessary | 19:30 |
clarkb | Next up is testing that Zuul handles WIP changes properly. Has anyone done this yet? | 19:30 |
clarkb | should be simple if we push up a trivial change, mark it wip with the built in state, then approve it and see if zuul enqueues it to the gate | 19:30 |
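One way to script that WIP check, sketched against Gerrit's REST API under the assumption that a change can be marked WIP with `POST /changes/{id}/wip` and approved with Code-Review +2 and Workflow +1 through the review endpoint. The change id and message are illustrative, and this hypothetical helper only builds the requests rather than sending them.

```python
from urllib.parse import quote


def wip_gate_test_plan(change_id):
    """Return (method, path, body) tuples for the WIP-handling check.

    Hypothetical helper: it only builds the requests; sending them (with HTTP
    auth under the /a/ prefix) is left out.
    """
    # Change ids like "project~branch~Ixxxx" must have "/" in the project escaped.
    change = quote(change_id, safe="")
    return [
        # 1. Put the trivial test change into Gerrit's built-in WIP state.
        ("POST", f"/a/changes/{change}/wip", {"message": "testing zuul wip handling"}),
        # 2. Approve it the usual OpenDev way: Code-Review +2 and Workflow +1.
        (
            "POST",
            f"/a/changes/{change}/revisions/current/review",
            {"labels": {"Code-Review": 2, "Workflow": 1}},
        ),
    ]
```

The pass condition is the one described in the meeting: after step 2, Zuul should presumably not enqueue the change into the gate while it is still WIP.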
clarkb | that might make a good distraction from gerrit accounts task I can do later this week too if no one beats me to it | 19:31 |
clarkb | Gerrit 3.3.1 includes a workaround for making Zuul notice recheck comments. There is also a follow-on change to this workaround that changes the event stream data structures to do this in a richer way. Zuul support for that new method (not yet merged in Gerrit) has already landed in Zuul | 19:32 |
clarkb | All of this to say that we should be ok to upgrade Gerrit from a Zuul perspective now. | 19:32 |
clarkb | However, I've now noticed two different users on the gerrit mailing list that have downgraded back to 3.2 after upgrading | 19:33 |
clarkb | I wonder if we should reach out to them and find out what their issues were? | 19:33 |
clarkb | (there is a documented downgrade process which I think is a first) | 19:33 |
clarkb | I also think that upgrading the gerrit server rather than gerrit itself might be a bigger priority right now if we had to order those | 19:34 |
clarkb | #topic Update Config Management | 19:34 |
*** openstack changes topic to "Update Config Management (Meeting topic: infra)" | 19:34 | |
clarkb | There have been updates to the change to deploy refstack with ansible and docker. | 19:35 |
clarkb | I'm not driving that anymore, but trying to help with reviews when I have time | 19:35 |
clarkb | fungi: do you know if there are changes for storyboard docker stuff yet too? | 19:35 |
fungi | no, not yet, other than a bit of planning | 19:35 |
clarkb | #link https://review.opendev.org/c/opendev/system-config/+/705258 refstack dockerization | 19:36 |
clarkb | Any other config management updates to call out? | 19:36 |
fungi | time for that has been split with planning for the storyboard-webclient rewrite framework discussion | 19:36 |
*** hamalq has joined #opendev-meeting | 19:36 | |
clarkb | sounds like that may be it | 19:37 |
clarkb | #topic General topics | 19:37 |
*** openstack changes topic to "General topics (Meeting topic: infra)" | 19:37 | |
clarkb | #topic OpenAFS cluster status | 19:37 |
*** openstack changes topic to "OpenAFS cluster status (Meeting topic: infra)" | 19:37 | |
clarkb | #link https://review.opendev.org/c/opendev/system-config/+/771521 properly install new openafs on xenial openafs clients. | 19:37 |
clarkb | I have been rechecking this change for many days now. It's always something new :) | 19:37 |
fungi | i think the only outstanding problem right now is the wheel builder updates, but it's not clear the reason those jobs are failing is afs-related | 19:38 |
clarkb | ianw: fungi: I thought it would be good to get a quick update on the state of the afs server cluster. Are they all running 1.8.6 now from our ppa? are they out of the emergency file, etc | 19:38 |
ianw | the fileservers are all afs 1.8, the db servers i did not get to before a little PTO last week | 19:39 |
fungi | i haven't touched the db servers, but things have been stable | 19:39 |
ianw | (this week i mean) | 19:39 |
ianw | after that, i think we've decided on in-place focal updates which i can stage with (hopefully) zero downtime by doing one-at-a-time | 19:39 |
fungi | and it's worth noting that not all client systems upgraded to the new packages have been restarted yet, but since issues were predominantly around restarting, that should be okay | 19:40 |
clarkb | fungi: ya and we tested reboots on some prominent clients to ensure the others would likely be ok with a reboot if/when that happened | 19:41 |
clarkb | ianw: and ya I think that was the plan. Thanks for the update | 19:42 |
clarkb | #topic Bup and Borg Backups | 19:42 |
*** openstack changes topic to "Bup and Borg Backups (Meeting topic: infra)" | 19:42 | |
clarkb | We discovered that borg has filled disks somewhat quickly and are now looking at how to more sustainably run backups | 19:42 |
ianw | so yeah, i got sniped trying to get the working set to a more reasonable level | 19:43 |
ianw | the main issue is rotating gzipped sql backups that do not do well with delta updates | 19:43 |
ianw | my proposal is to use borg's feature of streaming in from stdout directly to a separate archive to store plain dumps | 19:44 |
ianw | https://review.opendev.org/c/opendev/system-config/+/771748/4 | 19:44 |
ianw | with some help from mordred with the dump output, we made zero-delta updates of mariadb dumps even more efficient (not incorporated into the changes yet) | 19:45 |
clarkb | #link https://review.opendev.org/c/opendev/system-config/+/771748/4 stream database backups to borg to make it friendly to delta based backups | 19:45 |
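A sketch of the streaming idea: pipe a plain (uncompressed) database dump straight into `borg create`, which reads the archive contents from stdin when the path is `-`, so unchanged regions of the dump can deduplicate against earlier archives instead of being hidden behind gzip. The `mysqldump` flags, repository URL, and archive prefix are assumptions; the actual change (771748) may differ.

```python
import subprocess


def stream_dump_argv(repo, prefix):
    """Build the two commands for a dump | borg pipeline (flags are assumptions)."""
    dump = ["mysqldump", "--single-transaction", "--all-databases"]
    # A path of "-" makes borg create read the archive contents from stdin;
    # borg expands the {now} placeholder in the archive name.
    borg = ["borg", "create", f"{repo}::{prefix}-{{now}}", "-"]
    return dump, borg


def run_stream(repo, prefix):
    """Pipe the plain SQL dump straight into a new borg archive."""
    dump_argv, borg_argv = stream_dump_argv(repo, prefix)
    dump = subprocess.Popen(dump_argv, stdout=subprocess.PIPE)
    try:
        subprocess.run(borg_argv, stdin=dump.stdout, check=True)
    finally:
        # Close our copy of the pipe so mysqldump sees a broken pipe if borg exits early.
        dump.stdout.close()
        rc = dump.wait()
    if rc != 0:
        raise subprocess.CalledProcessError(rc, dump_argv)
```

Keeping the dumps in a separate archive from the filesystem backup, as proposed, would also let them be pruned on their own schedule.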
clarkb | ianw: we are also successfully backing up to one location (out of two) ? | 19:46 |
ianw | yes, vexxhost has run out of space, but rax is larger, and we haven't fully turned off bup | 19:47 |
clarkb | bup is off for review though iirc | 19:47 |
ianw | it is very confusing, which is why i'd like to make it consistent post-haste | 19:47 |
clarkb | ++ | 19:47 |
ianw | ahh yes, indeed | 19:47 |
clarkb | thank you for working on this | 19:47 |
clarkb | Anything else on this subject? | 19:48 |
ianw | no, just i guess reviews on the streaming backup changes | 19:48 |
clarkb | #topic two-review rule impact on low-activity projects | 19:48 |
ianw | there are some trade-offs, we had a small discussion in #opendev; happy to continue the discussion with anyone concerned | 19:48 |
*** openstack changes topic to "two-review rule impact on low-activity projects (Meeting topic: infra)" | 19:48 | |
clarkb | thanks again! | 19:48 |
clarkb | I kept this on the agenda because I wasn't sure we had taken the discussion last week to a conclusion. | 19:49 |
clarkb | My interpretation from last week was that it would be good if we tried to set expectations appropriately (somehow) | 19:49 |
clarkb | and that updating and exposing the things we are working on (like the borg things and gerrit account db inconsistencies) would be helpful | 19:49 |
clarkb | Was there anything else to add to that or concerns we think aren't well captured already? | 19:50 |
fungi | yeah, well, there were two main points. it's (still) okay to approve changes with a single core reviewer in emergencies or if the change is trivial or you're otherwise comfortable taking responsibility for making sure it goes okay, but also that we could be better about declining proposed changes, especially for some of our smaller utility projects and libraries when those changes aren't really in scope | 19:51 |
clarkb | ++ | 19:52 |
ianw | perhaps also we should have a specific section of this meeting "review review" or something, where we more clearly can have people put reviews that seem stalled? | 19:52 |
clarkb | ianw: I'd be happy to try that | 19:53 |
fungi | sure | 19:53 |
clarkb | I can add that to the wiki agenda so I don't forget | 19:53 |
clarkb | #topic InMotion Hosting Bare Metal Cloud | 19:53 |
*** openstack changes topic to "InMotion Hosting Bare Metal Cloud (Meeting topic: infra)" | 19:53 | |
ianw | generally if i've had/have something i add it as an agenda point, but perhaps people feel a little shy to do that | 19:53 |
clarkb | Last week I got pm'd to say the new inmotion cloud resources should be ready for us to try them out | 19:54 |
clarkb | the credentials and contact info are in the usual place if someone wants to try out deploying an openstack cloud | 19:54 |
clarkb | I had hoped to try it out this week but the gerrit stuff happened | 19:54 |
clarkb | and maybe I'll still give it a go just to focus the brain on something else for a bit | 19:54 |
clarkb | but if anyone else is interested feel free to go for it | 19:54 |
clarkb | #topic Open Discussion | 19:55 |
*** openstack changes topic to "Open Discussion (Meeting topic: infra)" | 19:55 | |
clarkb | We have just under 5 minutes for anything that may have been missed | 19:55 |
fungi | unless anyone else wants to review my updates to the opendev.org main page, i suppose i can self-approve them after the meeting | 19:56 |
fungi | #link https://review.opendev.org/769826 Polish the main opendev.org page | 19:56 |
fungi | wanted to get that cleaned up before we start looking at options like linking/embedding statusbot info or an infra donors callout | 19:57 |
clarkb | oh they've been updated since I last reviewed them. That said, looks like you have plenty of reviewers so I wouldn't wait on me | 19:57 |
clarkb | ++ I think they are good improvements overall too | 19:57 |
clarkb | like just for random users | 19:57 |
* fungi considers himself a random user | 19:57 | |
fungi | they don't come much more random than me | 19:58 |
fungi | oh, and a heads up, i'm trying to knock out significant git-review and bindep releases this week | 19:58 |
fungi | will discuss in #opendev after the meeting | 19:58 |
clarkb | thank you for the heads up | 19:58 |
fungi | zbr has been a huge help rescuing old reviews on git-review in particular | 19:59 |
clarkb | and thank you zbr for the help | 19:59 |
clarkb | we are at time | 20:00 |
fungi | thanks as always, clarkb! | 20:00 |
clarkb | thank you everyone! | 20:00 |
clarkb | #endmeeting | 20:00 |
*** openstack changes topic to "Incident management and meetings for the OpenDev sysadmins; normal discussions are in #opendev" | 20:00 | |
openstack | Meeting ended Tue Jan 26 20:00:24 2021 UTC. Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4) | 20:00 |
openstack | Minutes: http://eavesdrop.openstack.org/meetings/infra/2021/infra.2021-01-26-19.01.html | 20:00 |
openstack | Minutes (text): http://eavesdrop.openstack.org/meetings/infra/2021/infra.2021-01-26-19.01.txt | 20:00 |
openstack | Log: http://eavesdrop.openstack.org/meetings/infra/2021/infra.2021-01-26-19.01.log.html | 20:00 |
*** hamalq has quit IRC | 21:21 | |
*** hamalq has joined #opendev-meeting | 21:21 | |
*** sboyron has quit IRC | 22:08 |
Generated by irclog2html.py 2.17.2 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!