Tuesday, 2021-07-13

-opendevstatus- NOTICE: Depends-On using https://review.opendev.org URLs are currently not working. This was due to a config change in Zuul that we are reverting and will be restarting Zuul to pick up.17:40
clarkbAnyone else here for the meeting? We will start shortly18:59
clarkbI expect it might be a small crew today as fungi is out and I think diablo_rojo_phone is busy19:00
corvuso/19:00
clarkb#startmeeting infra19:01
opendevmeetMeeting started Tue Jul 13 19:01:04 2021 UTC and is due to finish in 60 minutes.  The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot.19:01
opendevmeetUseful Commands: #action #agreed #help #info #idea #link #topic #startvote.19:01
opendevmeetThe meeting name has been set to 'infra'19:01
clarkb#link http://lists.opendev.org/pipermail/service-discuss/2021-July/000267.html Our Agenda19:01
diablo_rojo_phoneJuuuust made it to Seattle. 19:01
diablo_rojo_phoneSo I may be half paying attention. 19:01
clarkbdiablo_rojo_phone: don't worry about it19:01
clarkb#topic Announcements19:01
ianwo/19:01
clarkbA reminder that the gerrit server will be moving July 18 at 23:00UTC. We'll talk about that in more depth later in the meeting though19:02
clarkbOther than that I dind't have any announcements19:02
clarkbI can't type didn't today19:02
clarkb#topic Actions from last meeting19:02
clarkb#link http://eavesdrop.openstack.org/meetings/infra/2021/infra.2021-07-06-19.01.txt minutes from last meeting19:02
clarkb#action someone write spec to replace Cacti with Prometheus19:02
clarkbThat hasn't happened yet. I'm not too worried about it as we've been focused on updates to other systems. But once we can get up for some air that would be a good thing to look at next19:03
clarkbThere were no other actions recorded that I saw19:03
clarkb#topic Specs Approval19:03
clarkb#link https://review.opendev.org/796156 Supporting communications on our very own Matrix homeserver19:03
clarkbI think this is now in a position where people can review it with enough real world information to make informed decisions19:04
clarkbWe have a matrix homeserver up for opendev.org. We have a test channel on that server. infra-root can invite themselves to that channel using the admin account (details in the typical location) or you can ask corvus, mordred, fungi, or myself to add you19:04
clarkbthough doesn't look like fungi has made it in there yet19:05
clarkbgiven all that do we think we are in a position to put the spec up for approval now? I'm comfortable with that myself19:05
corvus++19:05
clarkbcorvus: ^ you may have input19:05
clarkbconsidering the focus on gerrit things this week and that fungi is not currently around. What about asking for reviews before 7/22 then approving it then if there are no objections19:06
clarkb(gives people a few days after this week to review it)19:06
corvuswfm19:07
clarkbAlright infra-root please review https://review.opendev.org/796156 by 7/2219:07
clarkband feel free to interact with the system that is there to aid your review19:08
clarkb#topic Topics19:08
clarkb#topic review server upgrade19:08
clarkbThis is still scheduled for July 18 at 23:00 UTC19:09
clarkbwe ran into a small hiccup today when it was noticed that depends-on had stopped working. Turns out this was related to switching zuul to talk to review01.opendev.org instead of review.opendev.org. Zuul uses that config to line up the depends-on and determine which are valid19:09
clarkbA revert of that change is in the gate right now and we'll need to restart Zuul once the deploy job for zuul runs for that19:10
clarkbTo handle the DNS update during the migration I think we can force merge the DNS change in gerrit on review02, then manually pull and run the dns deploy playbook on bridge19:10
clarkbNot as elegant but prevents depends-on from being ignored19:10
clarkbianw: ^ not sure if you had caught up on all that scrollback yet but that is the tldr19:10
ianw++ yep, i will re-write the checklist today to account for that19:11
clarkbI also pushed up a change to reduce the TTL on the review.o.o cname record to 300 seconds since updating that will be more important now19:11
clarkbwe should be able to land that one today to get it out of the way19:11
ianwyep, good idea19:12
clarkbI think it would be good to do a resync of review02 today as well. Then we can spin it up with the current gerrit image and make sure everything looks happy19:12
clarkbI have a related item on the next topic, but I'll hold off in case there are other upgrade specific things to go over19:12
clarkboh! have reminders gone out yet? we should send those to the mailing list. The meme is people don't read but we can only do our best :)19:13
corvussend our reminder gifs?19:13
ianwahh, i said i would do that and got sidetracked sorry.  i'll send one in reply to the original now19:13
clarkbianw: thanks!19:13
clarkbcorvus: no, reminders that the server will have a longer than typical outage19:13
clarkbcorvus: but adding gifs is probably a good way to get people to read them :)19:13
clarkbAnything else?19:14
clarkb#topic Gerrit Account Cleanup19:16
clarkbI won't bother with cleaning up the ~176 external ID conflicts that I retired accounts for until after the move19:16
clarkbhowever efoley reached out yesterday after they managed to get themselves into a bad spot with their account.19:16
clarkbThe root cause has been captured in https://bugs.chromium.org/p/gerrit/issues/detail?id=14776 tldr is deleting emails in the web ui is not safe: if you delete the email for your openid it also deletes your openid externalid19:17
clarkbWe can't fix this in a simple way because of the conflicts I have been working to cleanup. What we can do is take advantage of the downtime to push a fix to the externalid records under gerrit then we'll reindex anyway and in theory be happy19:17
clarkbianw: ^ I started looking at the testing and staging of this on review02 today. That led me to create a /home/gerrit2/scratch/ dir where I was going to clone All-Users to and then checkout refs/meta/external-ids to make the necessary edits so they are staged and ready to go (and possibly test them?)19:18
clarkbianw: but I ran into a weird thing: I don't want that dir to be backed up because the refs/meta/external-ids checkout has tons of small files and we already backup the source repo19:19
clarkbianw: is there a better location for me to do that? maybe /tmp? we can figure that out after the meeting too19:19
ianwhrm, yeah the root disk should be big enough to handle it19:20
clarkbBut I am hoping to be able to stage that all up, push the fixes back into review_site/git/All-Users.git after we sync up to current state then maybe have efoley test a login against review02 if we turn it on19:20
clarkbI'll coordinate that with ianw and we can edit the outage doc with what we learn19:20
ianwotherwise we could do something like add an exclude to ~gerrit2/tmp ... that might be a good idea as even on the old server we've acquired random intermediate bits of little value19:20
ianwso working in ~/tmp/<user>/ ... would be good that we know we can always remove those bits (and a signal to us working to remind us to consider it ephemeral and do things another way if we want it persisted)19:21
clarkbnot a bad idea. I actually do that on my personal machines because tmp is small19:22
clarkbAnyway I think /tmp will work for now and we can coordinate the syncing and testing bits later19:23
clarkbAnother odd thing I noticed when doing that is /home/gerrit2 is root:root ownership19:23
clarkbwhich means gerrit2 can't create dirs or files in its own homedir root. I suspect something related to docker containers with that?19:23
clarkbNot critical either, but things like that make me want to turn on the gerrit if we can and ensure it starts up cleanly19:24
clarkb#topic gitea backups failing to one backup target19:24
ianwhrm, i quite possibly did a mkdir of /home/gerrit2 to get the LVM mounted there19:24
clarkbianw: ah19:24
clarkbianw: re gitea backups do we still suspect timeout values in mysql configs?19:25
ianwso that would be an oversight.  i definitely have started it and played with it, so it does minimally work19:25
clarkbcool19:25
ianwumm, last thing was the ipv6 between gitea01 -> backup seemed to not work19:25
clarkboh right this is the vexxhost between regions routing problem19:25
ianwi've reported that to mnaser and i believe an issue was raised, but i haven't followed up since19:25
clarkbok. This topic is on here mostly to remind me to catch up if there are any updates to catch up on. Sounds like we are still waiting for vexxhost19:26
clarkbmaybe we should consider dropping the AAAA record for now?19:26
ianwit seems unfortunate but we could19:27
ianwalso the filesystem component of the backup is working19:27
ianwso it must be falling back to ipv419:27
clarkbI wonder if the streaming setup for the db prevents fallback from working19:27
clarkbbecause the stream gets interrupted19:27
clarkbvs the fs backup which can simply reconnect and then start doing files19:28
ianwafaics borg doesn't log anything of interest relating to that19:29
ianwi'll have to fiddle more, i'll put it on the todo list19:29
clarkbIt seems plausible at least19:29
ianwthe ipv6 may be a red herring to the actual problem19:29
clarkbya19:29
clarkband thanks19:29
ianwit would just be nice to debug one thing at a time :)19:29
clarkb++19:30
clarkb#topic Gitea 1.14.4 upgrade scheduling19:30
clarkb#link https://review.opendev.org/c/opendev/system-config/+/800274 Gitea 1.14.4 upgrade19:30
clarkbI've got this change passing now. It is one of the larger Gitea upgrade changes that we've had I think19:30
clarkbworthy of careful review. There is a link to a held test node running that code too19:30
clarkbGiven everything else happening I'm happy to defer this to next week assuming things settle down a bit :) But if you have time for review this week that would be helpful19:31
clarkbas that way I can address any concerns before we actually do the upgrade19:31
ianw++ i played around and change overall lgtm19:31
clarkb#topic Scheduling Gerrit Project Renames19:32
clarkbWe said we'd do this the week after the server upgrade/move previously. Does anyone have opinions on a specific day for that? Probably Monday 7/26 or Friday 7/30? (I think I'm supposed to not be around on 7/30)19:33
clarkbAny objections to pencilling in 7/26?19:34
clarkbLet's pencil that in then and when fungi returns we can talk about a specific timeframe19:35
clarkbI expect the rename outage to be quite short as we can do online reindexing19:35
clarkb#topic Open Discussion19:35
clarkbAnything else?19:35
ianwI got https://paste01.opendev.org/ up19:36
ianwi have a minor change to db layout to merge, but will then import the old database19:36
ianwif it seems to work, i'm presuming no objections to changing the paste.openstack.org CNAME ?19:37
clarkbsounds good to me19:37
ianwi don't think the service has a bright future, but it should continue chugging along for a while in its container19:38
ianwas with all good web apps, every library it depends on has changed to the point that you basically have to rewrite everything to update it19:39
clarkbfun, I think vexxhost was doing some minor maintenance with it though19:39
ianwyeah, i got into "this bit deprecated from main framework, use this library -- oh, that library is now unmaintained and has a bug that makes it not work with later versions of main framework" loop and gave up19:40
clarkbSounds like that may be about it. I'll go ahead and call the meeting here so that we can proceed with the Zuul restart19:42
clarkbAs always feel free to bring discussion up in #opendev or at service-discuss@lists.opendev.org19:43
clarkbThank you everyone19:43
clarkb#endmeeting19:43
opendevmeetMeeting ended Tue Jul 13 19:43:16 2021 UTC.  Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4)19:43
opendevmeetMinutes:        https://meetings.opendev.org/meetings/infra/2021/infra.2021-07-13-19.01.html19:43
opendevmeetMinutes (text): https://meetings.opendev.org/meetings/infra/2021/infra.2021-07-13-19.01.txt19:43
opendevmeetLog:            https://meetings.opendev.org/meetings/infra/2021/infra.2021-07-13-19.01.log.html19:43
