Tuesday, 2020-11-10

*** hamalq has quit IRC02:44
*** sboyron has joined #opendev-meeting07:11
*** hashar has joined #opendev-meeting07:50
*** SotK has quit IRC09:00
*** SotK has joined #opendev-meeting09:01
*** hashar has quit IRC09:22
*** hashar has joined #opendev-meeting09:28
*** hashar is now known as hasharLunch10:18
*** hasharLunch is now known as hashar12:44
*** hamalq has joined #opendev-meeting17:10
*** hamalq has quit IRC17:10
*** hamalq has joined #opendev-meeting17:11
*** hashar has quit IRC17:59
clarkbanyone else here for the infra meeting?19:01
clarkbI'm trying to juggle a quick set of updates for one of the topics but we'll get things going19:01
clarkb#startmeeting infra19:01
openstackMeeting started Tue Nov 10 19:01:39 2020 UTC and is due to finish in 60 minutes.  The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot.19:01
openstackUseful Commands: #action #agreed #help #info #idea #link #topic #startvote.19:01
*** openstack changes topic to " (Meeting topic: infra)"19:01
openstackThe meeting name has been set to 'infra'19:01
corvuso/19:01
fungiohai19:02
clarkb#link http://lists.opendev.org/pipermail/service-discuss/2020-November/000134.html Our Agenda19:02
ianwo/19:02
clarkb#topic Announcements19:03
*** openstack changes topic to "Announcements (Meeting topic: infra)"19:03
*** diablo_rojo__ has joined #opendev-meeting19:03
diablo_rojo__o/19:03
clarkbWallaby cycle signing key has been activated https://review.opendev.org/76036419:03
clarkbPlease sign if you haven't yet https://docs.opendev.org/opendev/system-config/latest/signing.html19:03
diablo_rojo__o/19:03
clarkbI should find time to do that19:03
fungias long as we have at least a few folks attesting to it, that should be fine. the previous key has also published a signature for it anyway19:04
clarkb#topic Actions from last meeting19:05
*** openstack changes topic to "Actions from last meeting (Meeting topic: infra)"19:05
clarkb#link http://eavesdrop.openstack.org/meetings/infra/2020/infra.2020-11-03-19.01.txt minutes from last meeting19:05
clarkbThere were no recorded actions19:05
clarkb#topic Priority Efforts19:05
*** openstack changes topic to "Priority Efforts (Meeting topic: infra)"19:05
clarkb#topic Update Config Management19:05
*** openstack changes topic to "Update Config Management (Meeting topic: infra)"19:05
clarkbI believe we have an update on mirror-update.opendev.org from ianw and fungi? The reprepro stuff has been converted to ansible and the old puppeted server is no more?19:05
fungithat sounds right to me19:06
ianwyes, all done now, i've removed the old server so it's all opendev.org, all the time :)19:06
clarkbexcellent, thank you for working on that.19:06
clarkbHas the change to do vos release via ssh landed?19:06
ianwyes, i haven't double checked all the runs yet this morning, but the ones i saw last night looked good19:07
fungi758695 merged and was deployed by 05:12:1619:07
clarkbcool. Are there any other puppet conversions to call out?19:07
fungiso in theory any mirror pulses starting after that time should have used it19:07
ianwumm you saw the thing about the afs puppet jobs19:09
ianwi think they have just been broken for ... a long time?19:09
clarkbianw: yup I've pushed up a few changes/patchsets to try and fix the testing on that change19:09
clarkband yes I expect that has always been broken19:09
clarkbjust more noticeable now due to the symlink thing19:09
clarkbianw: if my patches don't work then maybe we should ignore e208 for now in order to get the puppetry happy19:10
ianwok, i think afs is my next challenge to get updated19:10
fungigrafana indicates ~current state (all <4hr old) for our package mirrors19:10
clarkb#topic OpenDev19:11
*** openstack changes topic to "OpenDev (Meeting topic: infra)"19:11
clarkbPreparations for a gerrit 3.2 upgrade are ramping up again19:11
clarkb#link http://lists.opendev.org/pipermail/service-announce/2020-October/000012.html Our announcement for the November 20 - 22 upgrade window19:11
clarkbfungi and I have got review-test upgraded from a ~november 5 prod state19:12
clarkbThe server should be up and useable for testing and other interactions19:12
fungiyep, fully upgraded to 3.219:12
clarkbianw: we are hoping that will help with your jeepyb testing19:12
fungialso usable for demonstrating the ui19:12
ianwahh yes, i can play with the api and see if we can replicate the jeepyb things19:13
fungiand it sounds like the 3.3 release is coming right about the time we planned to upgrade to 3.2, so we should probably plan a separate 3.3 release soon after?19:13
clarkbfungi: I think once we've settled then ya 3.3 should happen quickly after19:13
clarkbI think we've basically decided that the surrogate gerrit idea is neat, but introduces a bit of complexity in knowing what needs to be synced back and forth to end up with a valid upgrade path that way.19:13
fungii suppose we can keep review-test around to also test the 3.3 upgrade if we want19:13
clarkbfungi and I did discover that giving the notedb conversion more threads sped up that process. Still not short but noticeably quicker19:13
clarkbwe gave it 50% more threads and it ran 40% quicker19:14
clarkbI think we plan to double the default thread count when we do the production upgrade19:14
fungiwe might be able to speed it up a bit more still too, though i don't expect much below 4 hours to complete the notedb migration step19:14
clarkbThere are a couple of things that have popped up that I wanted to bring up for wider discussion.19:14
clarkbThe first is that we have confirmed that gerrit does not like updating accounts if they don't have an email set19:15
clarkb#link https://bugs.chromium.org/p/gerrit/issues/detail?id=1365419:15
fungiif we budget ~4 hours on the upgrade plan, i guess we can see where that would leave us in possible timelines19:15
fungioh, yeah, that email-less account behavior strikes me as a bug19:15
clarkbI've filed that upstream bug due to that weird account management behavior. You can create an internal account just fine without an email address, but you cannot then update that account's ssh keys19:15
clarkbyou also can't use a duplicate email address across accounts19:16
fungiyou're allowed to create accounts with no e-mail address, but adding an ssh key to one after the fact throws a weird backtrace into the logs and responds "unavailable"19:16
fungiso probably just a regression19:16
clarkbthis means that if we need to update our admin accounts we may need to set a unique email address on them :/19:16
corvuswe can set "infra-root+foobar" as email for our admin accounts19:16
fungiyeah, that seems like a reasonable workaround19:16
clarkbah cool19:16
ianw++19:17
clarkbfungi: ^ we should probably test that gerrit treats those as unique?19:17
corvusor i guess probably our own email addresses depending on hosting provider19:17
fungisince rackspace's e-mail system we're using does support + addresses as automatic aliases19:17
corvusgmail supports it iirc19:17
clarkband hopefully newer gerrit will just fix the problem19:17
corvus(and my exim/cyrus does)19:17
fungii'm happy to test that gerrit sees those addresses as unique, but i can pretty well guarantee it will19:17
clarkbfungi: thanks19:18
clarkbthat seems like a quick and easy fix so we probably don't need to get into it much further19:18
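
The plus-address workaround discussed above can be sketched roughly as follows. This is only an illustration, not what was actually run against Gerrit: the base mailbox `infra-root@example.org`, the helper names `admin_email` and `assign_admin_emails`, and the case-insensitive uniqueness check are all assumptions made for the example. It derives one `local+tag@domain` address per admin account and refuses to hand out duplicates, since Gerrit does not allow the same email address on two accounts.

```python
def admin_email(base_address: str, tag: str) -> str:
    """Return a plus-addressed variant of base_address (local+tag@domain)."""
    local, sep, domain = base_address.rpartition("@")
    if not sep or not local or not domain:
        raise ValueError(f"not an email address: {base_address!r}")
    # Keep the tag simple so the result stays a single valid local part.
    if not tag or any(ch in tag for ch in "+@ "):
        raise ValueError(f"unsuitable tag: {tag!r}")
    return f"{local}+{tag}@{domain}"


def assign_admin_emails(base_address: str, admins: list[str]) -> dict[str, str]:
    """Map each admin name to its own plus address, rejecting duplicates."""
    emails = {name: admin_email(base_address, name) for name in admins}
    # Compare case-insensitively (an assumption; mail systems differ here).
    if len({addr.lower() for addr in emails.values()}) != len(admins):
        raise ValueError("admin names do not produce unique addresses")
    return emails


if __name__ == "__main__":
    # Hypothetical base mailbox and admin names, for illustration only.
    for name, addr in assign_admin_emails(
        "infra-root@example.org", ["corvus", "fungi"]
    ).items():
        print(name, addr)
```

Putting the shared mailbox name first and the person second keeps these addresses from tab-completing under an individual's name in the Gerrit UI.
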
clarkbThe other thing I wanted to bring up was the set of changes that I've prepped in relation to the upgrade19:18
clarkb#link https://review.opendev.org/#/q/status:open+topic:gerrit-upgrade-prep Changes for before and after the upgrade19:18
clarkbA number of those should be safe to land today and they do not have a WIP19:18
fungithe main reason i would want to avoid having to put e-mail addresses on those accounts is it's just one more thing which can tab complete to them in the gerrit ui and confuse users19:18
clarkbAnother chunk reflect state after the upgrade and are WIP because we shouldn't land them yet19:19
clarkbIt would be great if we could get reviews on the lot of them to sanity check things as well as land as much as we can today19:19
clarkb(or $day before the upgrade)19:19
clarkbOne specific concern I've got is there are ~4 system-config changes that sort of all need to land together because they reflect post upgrade system state, but zuul will run them in sequence19:20
clarkbso I'm wondering how should we manipulate zuul during/after the upgrade to safely run those updates against the updated state19:20
corvusfungi: good point, we should probably avoid "corvus+admin"; infra-root+corvus is better due to tab-complete19:20
clarkbhttps://review.opendev.org/#/c/757155/ https://review.opendev.org/#/c/757625/ https://review.opendev.org/#/c/757156/ https://review.opendev.org/#/c/757176/ are the 4 changes I've identified in this situation19:21
corvusclarkb: bracket with disable/enable jobs change?19:21
clarkbcorvus: ya so I think our options are: disable then enable the jobs entirely, force merge them all before zuul starts, squash them and set it up so that a single job running is fine19:22
clarkbone concern with disabling the jobs then enabling them is I worry I won't manage to sufficiently disable the job since we trigger them in a number of places. But that concern may just be mitigated with sufficient grepping19:22
corvusi agree and force-merge or squashing means less time spinning wheels19:23
clarkbjust before the meeting I discovered that jeepyb wasn't running the gerrit 3.1 and 3.2 image builds as an example of where we've missed things like that previously19:23
fungii'm good with squashing, those changes aren't massive19:24
fungiand they're all for the same repo19:24
clarkbthe changes in system-config that trail the ones I've listed above should all be safe to land as after the fact cleanups19:24
clarkbAnother concern I had was I expect gitea replication to take a day and a half or so based on testing, I don't think we rely on gitea state for our zuul jobs that run ansible, but if we do anywhere can you call that out?19:24
clarkbbecause that is another syncing of the world step that may impact our automated deployments19:25
clarkbbut ya if people can review those changes and think about them from a perspective of how do we land them safely post upgrade that would be great. I'm open to feedback and ideas19:25
clarkbI'm hoping to write up a concrete upgrade plan doc soon (starting tomorrow likely) and we can start to fill in those details19:26
clarkbat this point I think my biggest concern with the upgrade revolves around how do we turn zuul back on safely :)19:26
corvusthe gitea replication lag will probably confuse folks cloning or pulling changes (or using gertty)19:26
*** hashar has joined #opendev-meeting19:27
corvusbut it's happened before, so i think if we include that in the announcement folks can deal19:27
fungithis is also why even if we can get stuff done on saturday we need to say the maintenance is through sunday19:27
clarkbfungi: yup and we have done that19:28
fungi(or early monday as we've been communicating so far)19:28
clarkbanother thought that occurred to me when writing https://review.opendev.org/#/c/762191/1 earlier today is that it feels like we're effectively abandoning review-dev19:28
clarkbShould we try to upgrade review-dev or decide it doesn't work well for us anymore and we need something like review-test going forward?19:28
clarkbI'm hopeful that zuul jobs can fit in there too19:29
fungii had assumed, perhaps incorrectly, that we wouldn't really need review-dev going forward19:29
clarkbfungi: fwiw I don't think that is incorrect, mostly just me realizing today "Oh ya we still have review-dev and these changes will make it sad"19:29
clarkbI think that is ok if one of the todo items here is retire review-dev19:29
clarkbwe can put it in the emergency file in the interim19:29
clarkbreview-test with prod like data has been way more valuable imo19:30
fungiour proliferation of -dev servers predates our increased efficiency at standing up test servers on demand, or even as part of ci19:30
fungiand at some point they become more of a maintenance burden than a benefit19:30
corvusclarkb: ++19:32
clarkbok /me adds put review-dev in stasis to the list19:32
clarkbThe last thing on my talk about gerrit list is that storyboard is still an unknown19:33
clarkbits-storyboard may or may not work is the more specific way of saying that19:33
clarkbfungi: how terrible would it be to set up credentials for review-test against storyboard-dev now and test that integration?19:33
fungiwe're building it into the images, adding credentials for it would be fairly trivial19:33
fungii can give that a go later this week and test it19:34
clarkbthat would be great, thank you19:34
clarkbanyone else have questions or concerns to bring up around the upgrade?19:34
fungii think where it's likely yo fall apart is around commentlinks mapping to the its actions19:35
fungier, likely to fall apart19:35
fungi(talking about its-storyboard plugin integration that is)19:37
clarkb#topic General topics19:38
*** openstack changes topic to "General topics (Meeting topic: infra)"19:38
clarkb#topic PTG Followups19:38
*** openstack changes topic to "PTG Followups (Meeting topic: infra)"19:38
clarkbJust a note that I haven't forgotten these, but the time pressure for the gerrit upgrade has me focusing on that (the downside to having all the things happen in a short period of time)19:38
clarkbI'm hoping tomorrow will be a "writing" day and I'll get an upgrade plan doc written as well as some of these ptg things and not look at failing jobs or code for a bit19:39
clarkb#topic Meetpad not useable from some locations19:39
*** openstack changes topic to "Meetpad not useable from some locations (Meeting topic: infra)"19:39
clarkbI brought this up with Horace and he was willing to help us test it, then I completely spaced on it because last week had a very distracting event going on.19:40
clarkbI'll try pinging horace this evening (my time) to see if there is a good time to test again19:40
clarkbthen hopefully we can narrow this down to corporate firewalls or the great firewall etc19:40
clarkb#topic Bup and Borg Backups19:41
*** openstack changes topic to "Bup and Borg Backups (Meeting topic: infra)"19:41
clarkbWanted to bring this up since there have been recent updates19:41
clarkbIn particular I think we declared bup bankruptcy on etherpad since /root/.bup was using significant disk19:41
clarkband out of that ianw has landed changes to start running borg on all the hosts we back up19:42
clarkbianw: were you happy with the results of those changes?19:42
ianwi was last night on etherpad19:42
ianwi haven't yet gone through all the other hosts but will today19:42
clarkbsounds good19:43
ianwnote per our discussion bup is now off on etherpad, because it was filling up the disk19:43
clarkbI think the biggest change from what we were doing with bup is that borg requires a bit more opt in to what is backed up rather than backing up all of / with exclusions19:43
clarkb(we could set borg to backup / then do exclusions too I suppose)19:44
clarkbwant to call that out as I tripped over it a few times when reasoning about exclusion list updates and the like19:44
ianwanother thing is that the vexxhost backup server has 1tb attached, the rax one 3tb19:45
fungii think if we set a good policy about where we expect important data/state to reside on our systems and then back up those paths, it's fine19:45
clarkbianw: also have we set the borg settings to do append only backups?19:45
clarkbwe had called that out as a desirable feature and now I can't recall if we're setting that or not19:45
ianwyes, we run the remote side with --append-only19:46
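
The difference between the two backup styles mentioned above (opting in specific paths versus backing up all of / with exclusions) can be sketched as a small helper that assembles a `borg create` command line. This is a hedged illustration only: the helper name `borg_create_argv`, the repository location, the archive name and the example paths are all hypothetical and are not taken from the actual system-config roles. The `--append-only` setting is not shown because, as noted above, it is applied on the remote (server) side rather than in the client's create command.

```python
def borg_create_argv(repo, archive, paths, excludes=()):
    """Build an argv list for `borg create REPO::ARCHIVE PATH...`.

    paths    -- directories to include (the opt-in list)
    excludes -- patterns passed via --exclude
    """
    if not paths:
        raise ValueError("borg create needs at least one path to back up")
    argv = ["borg", "create"]
    for pattern in excludes:
        argv += ["--exclude", pattern]
    argv.append(f"{repo}::{archive}")
    argv.extend(paths)
    return argv


if __name__ == "__main__":
    # Hypothetical repository and archive name, for illustration only.
    repo = "borg-user@backup.example.org:/opt/backups/repo"

    # Opt-in style: only the listed paths are backed up.
    print(borg_create_argv(repo, "host-2020-11-10", ["/etc", "/var/backups"]))

    # Whole-filesystem style: back up / and carve out exclusions.
    print(borg_create_argv(repo, "host-2020-11-10", ["/"],
                           excludes=["/proc", "/sys", "/root/.bup"]))
```

With the opt-in style, a new data directory is silently left out until someone adds it to the path list, which is the trade-off called out in the discussion about where important state is expected to live.
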
clarkbgreat, thank you for working on this. Hopefully we end up freeing a lot of local disk that was consumed by /root/.bup as well as handle the python2-less world19:47
clarkbI had a couple other topics (openstackid.org and splitting puppet else up) but I don't think anything has happened on those subjects19:48
clarkb#topic Open Discussion19:48
*** openstack changes topic to "Open Discussion (Meeting topic: infra)"19:48
clarkbtomorrow is a holiday in many parts of the world which is why I'm hoping I can get away with writing documents :)19:49
clarkbif you've got the day off enjoy19:49
corvusianw: there was some discussion in #zuul this morning related to your pypa zuul work; did you see that?  is a tl;dr worthwhile?19:49
ianwcorvus: sure, pypa have shown interest in zuul and i've been working to get a proof-of-concept up19:49
corvusoh sorry, i meant do you want me to summarize the #zuul chat? :)19:50
ianwthe pull request doing some tox testing is @ https://github.com/pypa/pip/pull/910719:50
ianwoh, haha, sure19:50
corvusit was suggested that if we pull more stuff out of the run playbook and put it into pre (eg, ensure-tox etc) it would make the console tab more accessible to folks.  i think that's relevant in your pr since that job is being defined there.  i think avass was going to leave a comment.19:51
corvusbuilding on that, we thought we might look into having zuul default to the console tab rather than the summary tab.  (this item is less immediately relevant)19:52
ianwoh right, yeah i pushed a change to do that in that pr19:53
clarkboh inmotionhosting has reached out to me about possibly providing cloud resources to opendev. I've got an introductory call with them tomorrow to start that conversation19:53
corvusthe overall theme is if we focus on simplifying the run playbook and present the console tab to users, we can immediately present important information to users, increase the signal/noise ratio, and the output may start to seem a little more familiar to folks using other ci tools.19:53
corvusianw: cool, then you're probably ahead of me on this, i had to duck out right after that convo.  :)19:54
ianwcorvus: this is true, as with travis or github, i forget, you get basically your yaml file shown to you in a "console" format19:54
ianwlike you click to open up each step and see the logs19:54
corvusianw: yeah, and we do too, it's just our yaml file is way bigger :)19:55
corvus(and the console tab hides pre/post playbooks by default, so putting "boring" stuff in those is a win for ux [assuming it's appropriate to put them there])19:55
corvusclarkb: neatoh19:56
ianwi'm pretty aware that just using zuul to run tox as 3rd party CI for github isn't a big goal for us ... but i do feel like there's some opportunity to bring pip a little further along here19:56
fungithe tasks like "Run tox testing" which are just role inclusion statements could also be considered noise, i suppose19:57
corvusfungi: yeah, that might be worth a ui re-think19:57
corvus(maybe we can ignore those?)19:57
corvusclarkb: are they a private cloud provider?19:58
fungior maybe "expandable" task results could be more prominent in the ui somehow19:58
clarkbcorvus: yup, the brief intro I got was that they could run an openstack private cloud that we would use19:58
fungibesides just the leading > marker19:58
clarkbWe are just about to our hour time limit. Thank you everyone!19:59
fungithanks clarkb!19:59
corvusclarkb: thx!19:59
clarkbWe'll see you here next week. Probably with another focus on gerrit as that'll be a few days before the planned upgrade20:00
clarkbprobably do a sanity check go no go then too20:00
clarkb#endmeeting20:00
*** openstack changes topic to "Incident management and meetings for the OpenDev sysadmins; normal discussions are in #opendev"20:00
openstackMeeting ended Tue Nov 10 20:00:10 2020 UTC.  Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4)20:00
openstackMinutes:        http://eavesdrop.openstack.org/meetings/infra/2020/infra.2020-11-10-19.01.html20:00
openstackMinutes (text): http://eavesdrop.openstack.org/meetings/infra/2020/infra.2020-11-10-19.01.txt20:00
openstackLog:            http://eavesdrop.openstack.org/meetings/infra/2020/infra.2020-11-10-19.01.log.html20:00
corvusinmotion is in el segundo... i left my wallet in el segundo.20:00
fungimaybe they can help you find it!20:07
*** hashar has quit IRC20:55
*** sboyron has quit IRC23:36
