Tuesday, 2021-01-26

*** sboyron has joined #opendev-meeting08:05
*** mordred has quit IRC11:19
*** mordred has joined #opendev-meeting11:29
*** hashar has joined #opendev-meeting13:27
*** hashar has quit IRC15:10
*** ianw_pto is now known as ianw18:59
clarkbanyone else here for our weekly meeting?19:00
clarkbWe will get started shortly19:00
fungii'm here19:00
fungiis that why i'm here?19:00
zbro/19:00
fungii guess it is19:00
clarkb#startmeeting infra19:01
ianwo/19:01
openstackMeeting started Tue Jan 26 19:01:03 2021 UTC and is due to finish in 60 minutes.  The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot.19:01
openstackUseful Commands: #action #agreed #help #info #idea #link #topic #startvote.19:01
*** openstack changes topic to " (Meeting topic: infra)"19:01
openstackThe meeting name has been set to 'infra'19:01
clarkb#link http://lists.opendev.org/pipermail/service-discuss/2021-January/000174.html Our Agenda19:01
clarkb#topic Announcements19:01
*** openstack changes topic to "Announcements (Meeting topic: infra)"19:01
clarkbI had no announcements so lets move along19:01
clarkb#topic Actions from last meeting19:01
*** openstack changes topic to "Actions from last meeting (Meeting topic: infra)"19:01
clarkb#link http://eavesdrop.openstack.org/meetings/infra/2021/infra.2021-01-19-19.01.txt minutes from last meeting19:01
clarkbThere were three actions recorded.19:01
clarkbFirst up is ianw ensuring that wiki is still getting backed up with the new borg backup setup19:01
clarkbI believe this may have happened but will let ianw confirm19:02
*** diablo_rojo has joined #opendev-meeting19:02
ianwahh i got a little distracted trying to fit everything into our space available19:02
diablo_rojoo/19:02
ianwso the short version is that no, it is not yet backing up to the new servers19:03
clarkbgot it, I've got the general borg updates on the agenda for later too so we can dig in then19:03
clarkbfungi has an action to send an email to the openstack-discuss list asking for config-core assistance19:03
clarkb#link https://etherpad.opendev.org/p/tact-sig-2021-rfh is the draft but the email hasn't been sent yet19:03
fungiyeah, i was hoping mnaser could take a look first and make sure it covers what he was looking for19:04
fungisince it was his topic in last week's meeting and the preceding openstack tc meeting which precipitated it19:04
fungibut as far as i'm concerned it's ready to flu19:04
fungifly19:04
fungi(please not flu)19:05
clarkband the last action was one for myself to start a puppet -> ansible and xenial upgrade todo list. I got thoroughly sniped by gerrit account inconsistencies and have not done this19:05
clarkb#action ianw Backup wiki to new borg servers19:05
clarkb#action fungi send https://etherpad.opendev.org/p/tact-sig-2021-rfh once mnaser is happy with it19:05
clarkb#action clarkb Write puppet to ansible and xenial upgrade todo list19:05
clarkb#topic Priority Efforts19:06
*** openstack changes topic to "Priority Efforts (Meeting topic: infra)"19:06
clarkb#topic OpenDev19:06
*** openstack changes topic to "OpenDev (Meeting topic: infra)"19:06
clarkbNominations for the service coordination position are still open19:06
clarkb#link http://lists.opendev.org/pipermail/service-discuss/2021-January/000161.html less than a week remaining.19:06
clarkbwe've got through the weekend UTC time19:06
clarkbIf you are interested but want to learn more or have concerns don't hesitate to reach out19:07
clarkbAnd that takes us to the thing that sniped me.19:07
clarkbLast week a user managed to create an interesting gerrit account situation where openids and email conflicts caused a situation where gerrit moved an openid from one account to another19:07
clarkbI've done a fair bit of digging into this as well as communicating upstream on the repo-discuss list about it and this opened a whole new can of problems19:08
clarkbWe have inconsistent user groups, user accounts, and account external ids19:08
clarkbthis is a problem because we can't push external id fixes to gerrit while it is online (to fix that user's problem for example) until all the inconsistencies are dealt with19:09
clarkbA workaround to this is that it does appear that we can stop gerrit, modify the external ids directly in git (don't push through gerrit but straight to disk), reindex accounts (and groups?), start gerrit then clear caches (accounts and groups?)19:10
clarkbSince this workaround involves downtime, I've been trying to audit the errors to see if we can correct them and do online updates instead19:10
clarkbFor the group inconsistency it is a single group that has included itself as a subgroup which is a loop. My plan was to just fix that one today via the web ui.19:11
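The group problem described above (a group that includes itself as a subgroup) is a cycle in the group inclusion graph. A minimal sketch of how such loops could be detected, assuming the group memberships have already been exported into a plain mapping; the function name and the sample group names are hypothetical and not taken from Gerrit:

```python
from typing import Dict, List, Set


def groups_in_cycles(subgroups: Dict[str, List[str]]) -> Set[str]:
    """Return every group that can reach itself by following subgroup links."""

    def reaches_itself(start: str) -> bool:
        # Depth-first walk over the subgroups of `start`.
        stack = list(subgroups.get(start, []))
        seen: Set[str] = set()
        while stack:
            current = stack.pop()
            if current == start:
                return True
            if current in seen:
                continue
            seen.add(current)
            stack.extend(subgroups.get(current, []))
        return False

    return {name for name in subgroups if reaches_itself(name)}


# Hypothetical data: "core" lists itself as a subgroup.
example = {
    "core": ["core"],
    "release": ["ptl"],
    "ptl": [],
}
looping = groups_in_cycles(example)
```

The same walk also catches indirect loops (A includes B, B includes A), which a simple "is the group in its own subgroup list" test would miss.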
clarkbWe have about 109 accounts with preferred emails set in All-Users refs/users/XY/ABXY:account.config with no corresponding external id19:11
clarkbthe vast majority of these are accounts that are inactive or functionally inactive. For them I think we can set them to inactive and remove the preferred email address from account.config and push the update back to correct them19:12
clarkbthis can be done online because each account has its own ref under refs/users. If you try to push an invalid config to there it should be rejected but we are pushing updates that make them valid19:12
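The first class of account problem (a preferred email with no corresponding external id) can be sketched as a simple filter. This assumes the account data has already been extracted from All-Users into records; the `Account` type, its field names, and the sample addresses are hypothetical, and exact string matching of addresses is an assumption:

```python
from typing import List, NamedTuple, Optional


class Account(NamedTuple):
    account_id: int
    preferred_email: Optional[str]
    external_id_emails: List[str]  # emails attached to this account's external ids


def missing_external_id(accounts: List[Account]) -> List[int]:
    """Return ids of accounts whose preferred email has no matching external id."""
    return [
        account.account_id
        for account in accounts
        if account.preferred_email is not None
        and account.preferred_email not in account.external_id_emails
    ]


# Hypothetical sample data.
sample = [
    Account(1, "a@example.org", ["a@example.org"]),  # consistent
    Account(2, "b@example.org", []),                 # preferred email is orphaned
    Account(3, None, ["c@example.org"]),             # no preferred email set
]
orphans = missing_external_id(sample)
```

Each flagged account could then be corrected on its own ref, as described above, without touching any other account.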
clarkband finally we have ~642 email addresses in use by multiple external ids19:13
ianwso catching up, this has happened because they changed their email in launchpad?19:13
clarkbsorting these out is much more complicated because many of them seem to be active accounts. fungi and I were brainstorming around this a bit earlier today and I think we may be able to classify a subset of them (where one account has clearly been unused or underused) and merge it into the other account19:13
fungiianw: for some users, yes19:14
clarkbianw: that seems to be part of it yes19:14
fungiit's hard to generalize, because there are a myriad of different sorts of conflicts currently returned by the validation check19:14
clarkbthe other big problem with the external id conflicts is they are all present in a single ref: refs/meta/external-ids which means we have to fix all of them at once and push that or do the downtime workaround and iterate19:14
clarkband ya I'm only just starting to scratch the surface on these. I think it is possible there are multiple scenarios going on. Including the potential for some users with multiple accounts where they actively use one for ssh and another for https19:15
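The email conflicts discussed above can be listed by grouping external ids by address. A minimal sketch, assuming the external ids have been exported as dictionaries and that a conflict means one address attached to external ids of more than one account; the key names and sample values are hypothetical:

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def conflicting_emails(external_ids: Iterable[dict]) -> Dict[str, List[int]]:
    """Map each email used by more than one account to the sorted account ids."""
    accounts_by_email = defaultdict(set)
    for ext in external_ids:
        email = ext.get("email")
        if email:  # some external ids carry no email at all
            accounts_by_email[email].add(ext["account_id"])
    return {
        email: sorted(ids)
        for email, ids in accounts_by_email.items()
        if len(ids) > 1
    }


# Hypothetical sample data.
sample_ids = [
    {"key": "username:alice", "account_id": 10},
    {"key": "mailto:alice@example.org", "account_id": 10, "email": "alice@example.org"},
    {"key": "openid-a", "account_id": 10, "email": "alice@example.org"},
    {"key": "openid-b", "account_id": 11, "email": "alice@example.org"},
    {"key": "mailto:bob@example.org", "account_id": 12, "email": "bob@example.org"},
]
conflicts = conflicting_emails(sample_ids)
```

A report in this shape makes it easier to sort conflicts into the "one account clearly unused" subset and the cases that need a conversation with the user.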
clarkbreview-test:~clarkb/gerrit-consistency-notes/ is where I'm keeping notes and scripts19:16
corvusin a different gerrit, i hosed my account by removing the email addr associated with my openid account  (i didn't change my openid addr).  in short: yes, hard to generalize.19:16
clarkbconflicting_email_user_info and preferred-email-classifications are the two areas of distilled info and may be most interesting19:16
clarkbit is also worth noting that I have yet to dump the info from prod19:17
clarkbit shouldn't be vastly different than -test, but at some point i should do that dump from prod19:17
clarkbI haven't done it yet as it isn't super clear to me how costly that check is to the running server. When I run it against -test it takes several minutes to return19:17
clarkbMaybe I should fix the groups issue then run the consistency check against prod today?19:18
clarkb(it is a rest api request)19:18
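For context, a hedged sketch of what the consistency check request might look like from Python. It assumes the Gerrit "check consistency" REST endpoint and input field names as documented upstream, leaves authentication out entirely, and uses a placeholder base URL; the helper names are hypothetical. Nothing is sent over the network here, the request is only constructed:

```python
import json
import urllib.request

# Gerrit prefixes JSON responses with this line to defeat XSSI attacks.
MAGIC_PREFIX = ")]}'"


def build_check_request(base_url: str) -> urllib.request.Request:
    """Build (but do not send) a consistency check request."""
    body = json.dumps(
        {
            "check_accounts": {},
            "check_account_external_ids": {},
            "check_groups": {},
        }
    ).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/a/config/server/check.consistency",
        data=body,
        headers={"Content-Type": "application/json; charset=UTF-8"},
        method="POST",
    )


def parse_gerrit_json(raw: str):
    """Strip the magic prefix, if present, and decode the JSON body."""
    if raw.startswith(MAGIC_PREFIX):
        raw = raw[len(MAGIC_PREFIX):]
    return json.loads(raw)


def count_problems(result: dict) -> dict:
    """Count reported problems per check section."""
    return {name: len(section.get("problems", [])) for name, section in result.items()}


request = build_check_request("https://gerrit.example.org")  # placeholder host
```

Because the check takes several minutes against -test, a caller that actually sends this request would want a generous timeout.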
clarkbfungi indicated we could pair up tomorrow and start correct some of the simpler situations for accounts that have preferred email addrs without external ids19:19
fungiyeah, i'm up for that19:20
clarkbI guess that is where I'm at on this: fix the group today, run consistency check against prod today if there are no objections, cross check that against -test, fix the simpler cases tomorrow19:20
fungisounds great, thanks for digging into this ball of hair19:20
clarkbif people want to take a look at the info I've put together on -test and try to classify the email conflicts or otherwise propose fixes for them I'd be grateful :)19:20
clarkbanother issue with doing a major fix for 642 emails all at once is that if we get something functionally wrong we'll potentially have a lot of people in a bad spot. vs being able to do this one by one19:21
clarkbupstream said it is a bug that you can't do it one by one but still a bug :/19:21
clarkbThat's all I had, happy to answer more questions on the subject if y'all have them19:21
ianwis there any way to stop this happening once we fix them?19:22
fungithey should no longer happen19:22
clarkbianw: yes, new gerrit doesn't allow it to happen anymore19:22
fungithis is an artifact of the beforetime19:23
clarkbit does have the issue that the original user had which is it can move an openid so we may have to surgery that in the future19:23
corvus(and my issue)19:23
clarkbbut preferred emails lacking external ids and external id email conflicts shouldn't happen to accounts once we fix those19:23
ianwahh, right, excellent19:24
clarkbcorvus: after the meeting I should catch up with you on that to find out what exactly you edited to cause that (as I think it will be useful to know for editing these fixes)19:24
fungiahh, yes, gerrit still seems capable of getting itself thoroughly confused around external id changes, but it no longer creates new conflicts, just leaves a mess for you to fix19:24
clarkbfungi: yes that19:24
corvusclarkb: sure -- but to be clear, i caused the problem as a regular user.  fixing it required admin.19:24
clarkboh wow19:25
fungiolder gerrit allowed these inconsistencies, newer gerrit does now, but we were able to upgrade without fixing them, we just can't push changes without fixing them because the push operation wants to validate everything not just what you're changing19:25
fungier, newer gerrit does check for them now19:25
fungiyou can make changes via the rest api without validating the entire set, however the rest api is currently limited to reading and deleting external-ids19:26
fungiit can't create or update19:26
clarkbI also don't think it checks for conflicts on login unless it is creating a new account19:27
clarkbwhich means that users in this situation should be fine unless they try to introduce a new conflict19:27
fungiyep19:28
clarkbwhich is unfortunate because we are likely to introduce some pain for them when we correct things in our bookkeeping19:28
fungiwell, presumably it also checks for conflicts if you try to add an address to your account19:28
fungibut only checks that the addition doesn't conflict19:28
clarkbright19:28
clarkbone of my thoughts here is that we set accounts to inactive to see who complains and then work with them to fix things19:28
clarkb(and if we do that we can do some aggressive surgery on the external ids to make them pass consistency checking without worrying too much about user impacts. Then fix user impact when they can't login anymore and do it in a way that makes sense for them)19:29
clarkbbut that is super overkill19:29
clarkbas a timecheck we're halfway through our hour. Let's continue on and we can talk about this in #opendev more as necessary19:30
clarkbNext up is testing that Zuul handles WIP changes properly. Has anyone done this yet?19:30
clarkbshould be simple if we push up a trivial change, mark it wip with the built in state, then approve it and see if zuul enqueues it to the gate19:30
clarkbthat might make a good distraction from gerrit accounts task I can do later this week too if no one beats me to it19:31
clarkbGerrit 3.3.1 includes a workaround for making Zuul notice recheck comments. There is also a followon change to this workaround one that changes event stream data structures to do this in a richer way. Zuul support for that new unlanded method has landed in Zuul19:32
clarkbAll of this to say that we should be ok to upgrade Gerrit from a Zuul perspective now.19:32
clarkbHowever, I've now noticed two different users on the gerrit mailing list that have downgraded back to 3.2 after upgrading19:33
clarkbI wonder if we should reach out to them and find out what their issues were?19:33
clarkb(there is a documented downgrade process which I think is a first)19:33
clarkbI also think that upgrading the gerrit server rather than gerrit itself might be a bigger priority right now if we had to order those19:34
clarkb#topic Update Config Management19:34
*** openstack changes topic to "Update Config Management (Meeting topic: infra)"19:34
clarkbThere have been updates to the change to ansible and docker refstack.19:35
clarkbI'm not driving that anymore, but trying to help with reviews when I have time19:35
clarkbfungi: do you know if there are changes for storyboard docker stuff yet too?19:35
fungino, not yet, other than a bit of planning19:35
clarkb#link https://review.opendev.org/c/opendev/system-config/+/705258 refstack dockerization19:36
clarkbAny other config management updates to call out?19:36
fungitime for that has been split with planning for the storyboard-webclient rewrite framework discussion19:36
*** hamalq has joined #opendev-meeting19:36
clarkbsounds like that may be it19:37
clarkb#topic General topics19:37
*** openstack changes topic to "General topics (Meeting topic: infra)"19:37
clarkb#topic OpenAFS cluster status19:37
*** openstack changes topic to "OpenAFS cluster status (Meeting topic: infra)"19:37
clarkb#link https://review.opendev.org/c/opendev/system-config/+/771521 properly install new openafs on xenial openafs clients.19:37
clarkbI have been rechecking this change for many days now. Its always something new :)19:37
fungii think the only outstanding problem right now is the wheel builder updates, but it's not clear the reason those jobs are failing is afs-related19:38
clarkbianw: fungi: I thought it would be good to get a quick update on the state of the afs server cluster. Are they all running 1.8.6 now from our ppa? are they out of the emergency file, etc19:38
ianwthe fileservers are all afs 1.8, the db servers i did not get to before a little PTO last week19:39
fungii haven't touched the db servers, but things have been stable19:39
ianw(this week i mean)19:39
ianwafter that, i think we've decided on in-place focal updates which i can stage with (hopefully) zero downtime by doing one-at-a-time19:39
fungiand it's worth noting not all client systems upgraded to the new packages have been restarted on them, but since issues were predominantly around restarting, that should be okay19:40
clarkbfungi: ya and we tested reboots on some prominent clients to ensure the others would likely be ok with a reboot if/when that happened19:41
clarkbianw: and ya I think that was the plan. Thanks for the update19:42
clarkb#topic Bup and Borg Backups19:42
*** openstack changes topic to "Bup and Borg Backups (Meeting topic: infra)"19:42
clarkbWe discovered that borg has filled disks somewhat quickly and are now looking at how to more sustainably run backups19:42
ianwso yeah, i got sniped trying to get the working set to a more reasonable level19:43
ianwthe main issue is rotating gzipped sql backups that do not do well with delta updates19:43
ianwmy proposal is to use borg's feature of streaming in from stdout directly to a separate archive to store plain dumps19:44
ianwhttps://review.opendev.org/c/opendev/system-config/+/771748/419:44
ianwwith some help from mordred with the dump output, we made zero-delta updates mariadb even more efficient (not incorporated into changes yet)19:45
clarkb#link https://review.opendev.org/c/opendev/system-config/+/771748/4 stream database backups to borg to make it friendly to delta based backups19:45
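A rough illustration of why compressed SQL dumps deduplicate poorly while plain dumps do well. This is a simplified sketch, not how borg works internally: borg uses content-defined chunking, whereas this example assumes fixed 4096-byte chunks and uses zlib as a stand-in for gzip. The table contents and function names are made up for the demonstration:

```python
import hashlib
import zlib

CHUNK = 4096  # fixed chunk size, a simplification for the demo


def chunk_hashes(data: bytes) -> set:
    """Hash every fixed-size chunk of the data."""
    return {
        hashlib.sha256(data[i:i + CHUNK]).hexdigest()
        for i in range(0, len(data), CHUNK)
    }


def shared_fraction(old: bytes, new: bytes) -> float:
    """Fraction of the old backup's chunks that reappear in the new backup."""
    old_chunks = chunk_hashes(old)
    return len(old_chunks & chunk_hashes(new)) / len(old_chunks)


def make_dump(changed_row: int = -1) -> bytes:
    """Produce a fake SQL dump; one row may carry a different value."""
    rows = []
    for i in range(20000):
        value = "CHANGED" if i == changed_row else "initial"
        rows.append(f"INSERT INTO t VALUES ({i:06d}, '{value}');\n")
    return "".join(rows).encode()


monday = make_dump()
tuesday = make_dump(changed_row=10)  # a single row differs near the start

plain_shared = shared_fraction(monday, tuesday)
compressed_shared = shared_fraction(zlib.compress(monday), zlib.compress(tuesday))
```

In the plain dumps only the chunk containing the changed row differs, so `plain_shared` stays close to 1. In the compressed dumps the one-row change alters the compressed stream itself, so fewer of the old chunks can be reused and `compressed_shared` comes out lower.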
clarkbianw: we are also successfully backing up to one location (out of two) ?19:46
ianwyes, vexxhost has run out of space, but rax is larger, and we haven't fully turned off bup19:47
clarkbbup is off for review though iirc19:47
ianwit is very confusing, which is why i'd like to make it consistent post-haste19:47
clarkb++19:47
ianwahh yes, indeed19:47
clarkbthank you for working on this19:47
clarkbAnything else on this subject?19:48
ianwno, just i guess reviews on the streaming backup changes19:48
clarkb#topic two-review rule impact on low-activity projects19:48
ianwthere are some trade-offs, we had a small discussion in #opendev; happy to continue the discussion with anyone concerned19:48
*** openstack changes topic to "two-review rule impact on low-activity projects (Meeting topic: infra)"19:48
clarkbthanks again!19:48
clarkbI kept this on the agenda because I wasn't sure we had taken the discussion last week to a conclusion.19:49
clarkbMy interpretation from last week was that it would be good if we tried to set expectations appropriately (somehow)19:49
clarkband that updating and exposing the things we are working on (like the borg things and gerrit account db inconsistencies) would be helpful19:49
clarkbWas there anything else to add to that or concerns we think aren't well captured already?19:50
fungiyeah, well, there were two main points. it's (still) okay to approve changes with a single core reviewer in emergencies or if the change is trivial or you're otherwise comfortable taking responsibility for making sure it goes okay, but also that we could be better about declining proposed changes, especially for some of our smaller utility projects and libraries when those changes aren't really in
fungiscope19:51
clarkb++19:52
ianwperhaps also we should have a specific section of this meeting "review review" or something, where we more clearly can have people put reviews that seem stalled?19:52
clarkbianw: I'd be happy to try that19:53
fungisure19:53
clarkbI can add that to the wiki agenda so I don't forget19:53
clarkb#topic InMotion Hosting Bare Metal Cloud19:53
*** openstack changes topic to "InMotion Hosting Bare Metal Cloud (Meeting topic: infra)"19:53
ianwgenerally if i've had/have something i add it as an agenda point, but perhaps people feel a little shy to do that19:53
clarkbLast week I got pm'd to say the new inmotion cloud resources should be ready for us to try them out19:54
clarkbthe credentials and contact info are in the usual place if someone wants to try out deploying an openstack cloud19:54
clarkbI had hoped to try it out this week but the gerrit stuff happened19:54
clarkband maybe I'll still give it a go just to focus the brain on something else for a bit19:54
clarkbbut if anyone else is interested feel free to go for it19:54
clarkb#topic Open Discussion19:55
*** openstack changes topic to "Open Discussion (Meeting topic: infra)"19:55
clarkbWe have just under 5 minutes for anything that may have been missed19:55
fungiunless anyone else wants to review my updates to the opendev.org main page, i suppose i can self-approve them after the meeting19:56
fungi#link https://review.opendev.org/769826 Polish the main opendev.org page19:56
fungiwanted to get that cleaned up before we start looking at options like linking/embedding statusbot info or an infra donors callout19:57
clarkboh they've been updated since I last reviewed them. That said looks like you have plenty of reviewers so I wouldn't wait on me19:57
clarkb++ I think they are good improvements overall too19:57
clarkblike just for random users19:57
* fungi considers himself a random user19:57
fungithey don't come much more random than me19:58
fungioh, and a heads up, i'm trying to knock out significant git-review and bindep releases this week19:58
fungiwill discuss in #opendev after the meeting19:58
clarkbthank you for the heads up19:58
fungizbr has been a huge help rescuing old reviews on git-review in particular19:59
clarkband thank you zbr for the help19:59
clarkbwe are at time20:00
fungithanks as always, clarkb!20:00
clarkbthank you everyone!20:00
clarkb#endmeeting20:00
*** openstack changes topic to "Incident management and meetings for the OpenDev sysadmins; normal discussions are in #opendev"20:00
openstackMeeting ended Tue Jan 26 20:00:24 2021 UTC.  Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4)20:00
openstackMinutes:        http://eavesdrop.openstack.org/meetings/infra/2021/infra.2021-01-26-19.01.html20:00
openstackMinutes (text): http://eavesdrop.openstack.org/meetings/infra/2021/infra.2021-01-26-19.01.txt20:00
openstackLog:            http://eavesdrop.openstack.org/meetings/infra/2021/infra.2021-01-26-19.01.log.html20:00
*** hamalq has quit IRC21:21
*** hamalq has joined #opendev-meeting21:21
*** sboyron has quit IRC22:08
