Tuesday, 2021-07-27

[18:59] <clarkb> Anyone else here for the meeting? We will get started shortly
[19:00] <fungi> yuppp
[19:01] <clarkb> #startmeeting infra
[19:01] <opendevmeet> Meeting started Tue Jul 27 19:01:20 2021 UTC and is due to finish in 60 minutes.  The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot.
[19:01] <opendevmeet> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
[19:01] <opendevmeet> The meeting name has been set to 'infra'
[19:01] <clarkb> #link http://lists.opendev.org/pipermail/service-discuss/2021-July/000270.html Our Agenda
[19:01] <clarkb> #topic Announcements
[19:01] <ianw> o/
[19:01] <clarkb> I have none (though the first real agenda item is likely to produce one)
[19:02] <clarkb> #topic Specs Approval
[19:02] <clarkb> Just a note that I did approve the matrix spec last week
[19:02] <clarkb> There is a series of changes that can be found at topic:matrix to start running an eavesdrop and gerritbot equivalent against our test matrix channel
[19:02] <clarkb> I think I'm mostly up to date on reviews for those but reviews from others are always appreciated as well
[19:03] <fungi> wake up, neo!
[19:03] <clarkb> #topic Actions from last meeting
[19:03] <clarkb> #link http://eavesdrop.openstack.org/meetings/infra/2021/infra.2021-07-20-19.01.txt minutes from last meeting
[19:03] <clarkb> #action someone write spec to replace Cacti with Prometheus
[19:04] <clarkb> Keeping that there as a reminder that we intend on trying to do that. I'm hopeful that maybe next week I'll be able to start poking at that though
[19:04] <clarkb> as well as maybe start swinging back around on the list server upgrades
[19:04] <fungi> promethean cactus
[19:04] <clarkb> #topic Topics
[19:04] <clarkb> Time to dive in
[19:05] <clarkb> #topic Service Coordinator Election
[19:05] * fungi loves the topics topic
[19:05] <fungi> is it that time of half-year again?
[19:05] <clarkb> It occurred to me that we must be getting close to when I said we should do our next service coordinator election and sure enough I had missed when I said we should open nominations by a couple weeks :/
[19:05] <clarkb> the original plan was to open nominations about 2 weeks ago then have an election next week if necessary
[19:05] <fungi> shucks
[19:06] <corvus> i guess you're it for another 6 months then
[19:06] <fungi> stick a crash-test dummy in a chair and nominate it
[19:06] <clarkb> I'm proposing that we open nominations now for two weeks then have an election from August 10-17 if necessary
[19:07] <clarkb> If that seems reasonable I'll go ahead and send email to service-discuss telling people to get on the nominations
[19:07] <clarkb> And figure out some maths for 6 months from now and try harder to not miss it again
[19:07] <fungi> wfm, this is one of those "hot potato" elections where we all try to figure out how to be "not it"
[19:07] <clarkb> (I think in my head elections happening August 1 meant that nothing needed to be done until August)
[19:07] <clarkb> fungi: ianw corvus ya one of yall should take the hot potato :)
[19:08] <clarkb> I don't hear any complaints with that plan. Feel free to bring them up over the next hour or two as I need to finish this meeting and eat lunch before I send that email
[19:08] <clarkb> But if I don't hear anything until then I'll proceed with the proposal above
[19:09] <fungi> yeah, i'll have to decide between getting nothing done for the next six months or getting even less done ;)
[19:10] <clarkb> #topic Review Upgrade
[19:10] <clarkb> I mentioned this in #opendev earlier today, but I had privs bumped up temporarily to do some account cleanups and took advantage of that to check melody
[19:10] <clarkb> melody reports the new server has peaked at 84GB of memory use. I believe that is below the upper limit we have given it.
[19:11] <clarkb> I think this gives us a few opportunities. First we can increase the size of our caches to fill more space. We can also reduce the heap limit provided to the jvm and give more memory to apache and the kernel caches
[19:11] <clarkb> Neither seems super urgent right now but as we gather more of this real world data we can start to think about how to better take advantage of the new server
[19:11] <ianw> yeah iirc we set the java heap limit at 96gb
[19:11] <fungi> might want to check again just before we restart for the renames just to see if the needle has moved
[19:12] <clarkb> fungi: ++
[19:12] <fungi> (because we'll be resetting it at that point)
[19:12] <clarkb> It is also a good double check on the assertion we needed more memory
[19:12] <fungi> is there a formal plan somewhere yet for the rename maintenance i can add that to?
[19:13] <clarkb> This is about double what we were able to provide the server before so it definitely wanted more
[19:13] <clarkb> fungi: not yet, that is on the agenda for later in the meeting to start getting all that together
[19:13] <fungi> no sweat
[19:13] <clarkb> I think the other big thing related to the review upgrade is cleanup of the old review01 server and review-test server
[19:14] <fungi> we also probably have some clearance between the 96gib allocated to the jvm and the 125gib allocated to the vm but should see if we can spot how that's being spent
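Putting the figures from this exchange side by side gives a rough idea of the headroom. This sketch assumes all three numbers are GiB (the log mixes GB and GiB) and treats melody's 84GB peak as heap usage, which the discussion does not confirm; it also ignores non-heap JVM memory such as metaspace and thread stacks.

```python
# Rough memory headroom on the new review server, using the figures quoted in
# the meeting. Assumption: all values are GiB and the 84 GiB peak reported by
# melody is JVM heap usage (the log does not say which).
PEAK_HEAP_USED_GIB = 84   # peak reported by melody
JAVA_HEAP_LIMIT_GIB = 96  # configured heap limit (as recalled by ianw)
VM_MEMORY_GIB = 125       # memory allocated to the VM


def headroom(peak: int, heap_limit: int, vm_total: int) -> dict:
    """Return unused heap and memory left outside the JVM heap, in GiB."""
    return {
        # Room for larger Gerrit caches before reaching the heap limit.
        "unused_heap": heap_limit - peak,
        # Left for apache, the kernel page cache and non-heap JVM use.
        "outside_heap": vm_total - heap_limit,
    }


if __name__ == "__main__":
    print(headroom(PEAK_HEAP_USED_GIB, JAVA_HEAP_LIMIT_GIB, VM_MEMORY_GIB))
```

Under those assumptions that is about 12 GiB of unused heap and about 29 GiB outside the heap, which is the space clarkb's two options (bigger caches vs. a smaller heap) would trade between.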
[19:14] <clarkb> fungi: yup cacti should hopefully give us that info?
[19:14] <clarkb> ianw: mordred and myself have confirmed that we believe we have all the info we want from review-test now. I think we are waiting on fungi to confirm as well.
[19:14] <fungi> oh, jeez, responsibilities
[19:15] <clarkb> ianw: I bring up review-test because I suspect cleaning it up may conflict with cleaning up review01. Do we want to work together on that?
[19:15] <clarkb> I know you mentioned waiting for frickler_pto to return from pto before doing those cleanups as well so this probably isn't urgent. Just let me know how you want to proceed on that and I'll try to avoid stepping on toes
[19:16] <fungi> there's nothing but dotfiles/dotdirs in my homedir now, so nothing i care to keep
[19:16] <ianw> ok, i think i proposed a change to remove review-test bits?
[19:16] <clarkb> ianw: oh maybe you did and I should go review it
[19:16] <ianw> #link https://review.opendev.org/c/opendev/system-config/+/801556
[19:17] <clarkb> great sounds like we're ready to land ^ if it passes review. I've added it to the afternoon review list
[19:17] <fungi> er, i mean nothing but dotfiles/dotdirs in my homedir on the old review server, but same goes for review-test too
[19:17] <ianw> yeah, i assume everyone but frickler_pto has cleaned up on review01 too?
[19:17] <clarkb> fungi: I called you and mordred out on review-test because you were both involved in that server (along with myself) for various tasks
[19:17] <clarkb> ianw: I got what I know is important on review01 from emory. But should do a skim to double check I'm not forgetting anything
[19:18] <fungi> also getting rid of review-test will finally silence the cronjob errors i keep getting e-mailed about, so thumbs-up
[19:18] <fungi> memory?
[19:18] <clarkb> ya sorry memory. I knew I wanted the gerrit user cleanup records
[19:18] <clarkb> as it will help us go back and do surgery if we need to for any of these accounts in the future
[19:18] <fungi> just making sure there wasn't someone new named emory i was unaware of
[19:19] <fungi> i did take a week off, after all
[19:19] <clarkb> #topic Project Renames
[19:20] <fungi> there's just the one so far, right?
[19:20] <clarkb> We've said we'll do these July 30 at 15:00 UTC. I sent email to service-announce announcing this
[19:20] <fungi> and we're aiming to start at 15z friday?
[19:20] <fungi> cool
[19:20] <clarkb> and yup there is a single rename.
[19:20] <fungi> having just one is probably better for testing this out anyway
[19:20] <clarkb> As part of prep work I've been working on a change to test our rename playbook
[19:20] <clarkb> #link https://review.opendev.org/c/opendev/system-config/+/802112 Test the rename playbook
[19:20] <fungi> since it'll be the first rename after our major 3.x upgrade
[19:20] <clarkb> fungi: ++
[19:21] <clarkb> I don't think this testing change needs to land before we do the renames since it seems to show that the playbook is working as is
[19:21] <fungi> but it's helpful to have it as a demonstration we expect this to work
[19:21] <clarkb> There are some weird test framework integration things to sort out. I do intend on making that cleaner and merging the change. Just pointing out it isn't on our critical path here
[19:21] <clarkb> yup
[19:22] <clarkb> Additional prep work that needs doing: 1) review the rename change and ensure it is mergeable 2) push a change to opendev/project-config to record the rename 3) write up a plan in an etherpad
[19:23] <clarkb> for 1) having a couple people do that is good and I intend on being one of them. Does anyone want to do 2) and 3)?
[19:24] <clarkb> I can work on 2) and 3) tomorrow.
[19:24] <clarkb> if someone else wants to do it let me know :)
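For step 2), a quick sanity check on the recorded renames before the maintenance could catch typos early. This is a hypothetical sketch: the list of `old`/`new` pairs and the rules it enforces are illustrative assumptions, not the actual opendev/project-config rename format or the rename playbook's own validation.

```python
# Hypothetical pre-flight check for a batch of repository renames. The
# {"old": ..., "new": ...} shape and the rules below are illustrative
# assumptions, not the real opendev/project-config format.


def validate_renames(renames: list) -> list:
    """Return a list of problems found; an empty list means the batch looks sane."""
    problems = []
    olds = [r["old"] for r in renames]
    news = [r["new"] for r in renames]
    for name in olds + news:
        # Expect "namespace/project" style names.
        namespace, sep, project = name.partition("/")
        if not sep or not namespace or not project:
            problems.append(f"not a namespace/project name: {name}")
    for label, names in (("old", olds), ("new", news)):
        for dupe in sorted({n for n in names if names.count(n) > 1}):
            problems.append(f"duplicate {label} name: {dupe}")
    # A target that is also renamed in the same batch makes ordering matter.
    for name in sorted(set(olds) & set(news)):
        problems.append(f"chained rename involving: {name}")
    for r in renames:
        if r["old"] == r["new"]:
            problems.append(f"no-op rename: {r['old']}")
    return problems
```

With a single rename, as planned for Friday, the check mostly guards against a malformed name; the chained-rename rule only matters for larger batches.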
[19:24] <fungi> out of curiosity, does 802112 actually confirm the group rename worked?
[19:24] <clarkb> fungi: it confirms that the group rename doesn't explode, but doesn't actively check the new group is what is in gerrit
[19:24] <fungi> it just dawned on me to look and while you include a group rename there i don't see it assert the new name exists
[19:25] <clarkb> fungi: we can pretty easily add a testinfra test to check that though
[19:25] <fungi> yeah, okay. not super important for now
[19:25] <fungi> just making sure i wasn't misreading
[19:25] <fungi> fine as a todo for us down the line
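The todo above (asserting that the renamed group exists) could be built on Gerrit's REST API: the list-groups endpoint (`GET /groups/`) returns a JSON object keyed by group name, and Gerrit prefixes its JSON responses with a `)]}'` line. A sketch of the check, assuming the response body is fetched elsewhere, for example with curl through testinfra's `host.run(...).stdout`; the URL, authentication and group names are assumptions.

```python
import json

# Gerrit prefixes JSON REST responses with this line to defeat XSSI.
GERRIT_XSSI_PREFIX = ")]}'"


def parse_gerrit_json(body: str):
    """Strip Gerrit's XSSI prefix (if present) and decode the JSON body."""
    if body.startswith(GERRIT_XSSI_PREFIX):
        body = body[len(GERRIT_XSSI_PREFIX):]
    return json.loads(body)


def group_renamed(list_groups_body: str, old_name: str, new_name: str) -> bool:
    """True if a GET /groups/ response lists new_name and no longer lists old_name.

    The list-groups response is a JSON object keyed by group name.
    """
    groups = parse_gerrit_json(list_groups_body)
    return new_name in groups and old_name not in groups
```

Checking that the old name is gone as well as that the new one exists distinguishes a real rename from a group that was merely created alongside the old one.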
[19:26] <clarkb> The other thing to be aware of is that corvus has some changes up to zuul to change how zuul project secrets are managed
[19:26] <clarkb> If those changes land before friday we'll need to update our process for renaming a project I think
[19:26] <clarkb> I'm hoping that don't land this week :) but be aware of that if they do
[19:26] <clarkb> s/that/they/
[19:26] <fungi> i can't recall, do we actually restart zuul during the renames maintenance?
[19:26] <clarkb> fungi: we do not
[19:27] <corvus> i'm not planning to push for those this week
[19:27] <fungi> then it's really if those land *and* we've restarted the scheduler?
[19:27] <clarkb> fungi: yes
[19:27] <fungi> k
[19:27] <corvus> i'd like to restart soon and make a release and land those afterwards. but even if they do land, it's not a huge deal
[19:27] <corvus> also, if the projects being renamed don't have secrets, it doesn't matter at all
[19:28] <clarkb> ya it's not a huge deal just need to maybe comment out the zuul stuff in the playbook when we run it and manually execute the tools to do it this time around (or update the playbook)
[19:28] <fungi> and the alternative would be to script changes via zkshell?
[19:28] <clarkb> fungi: the changes from corvus include new tools to use instead of zkshell
[19:28] <fungi> oh, right i sort of remember that now ;)
[19:28] <clarkb> zkshell is an option but probably one to avoid
[19:28] <clarkb> corvus: oh that is a good point
[19:28] <fungi> so we'd dump, rename, then import?
[19:29] <clarkb> I'm not sure if this project has secrets but I can double check
[19:29] <clarkb> fungi: or if the copy and delete commands land too then we can copy then delete
[19:29] <fungi> (and cleanup the old copies in zk afterward)
[19:29] <fungi> yeah, that
[19:29] <fungi> okay
[19:29] <clarkb> I really don't expect it to be a major issue, but did want to call it out as I'm not sure how closely others are following that in zuul
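The two options discussed (dump, rename, import versus copy then delete) differ mainly in whether the keys are ever absent during the move. Below is a toy model of copy-then-delete. The dict stands in for Zuul's key storage in ZooKeeper, and every name here is hypothetical; it is not the interface of the new Zuul tools in corvus's changes.

```python
# Toy model of moving a project's Zuul secrets keys on rename. The dict stands
# in for key storage (Zuul keeps these in ZooKeeper); the (connection, project)
# keys and all function names are hypothetical, not Zuul's real tooling.


def copy_keys(store: dict, conn: str, old: str, new: str) -> None:
    """Copy keys to the new project name, refusing to clobber existing keys."""
    src, dst = (conn, old), (conn, new)
    if src not in store:
        raise KeyError(f"no keys stored for {conn}/{old}")
    if dst in store:
        raise ValueError(f"keys already exist for {conn}/{new}")
    store[dst] = store[src]


def delete_keys(store: dict, conn: str, project: str) -> None:
    """Remove the stale copy once the renamed project is known to work."""
    del store[(conn, project)]


def rename_project_keys(store: dict, conn: str, old: str, new: str) -> bool:
    """Copy then delete; return False if nothing was stored for the old name."""
    if (conn, old) not in store:
        return False
    # Copy first so the keys exist under at least one name at every step;
    # only then clean up the old copy.
    copy_keys(store, conn, old, new)
    delete_keys(store, conn, old)
    return True
```

The refusal to overwrite in `copy_keys` is a design choice of the sketch: if a key already exists under the new name, a person should decide which one is correct.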
[19:30] <fungi> so anyway, the lingering question is who will sign up for the 3 rename prep tasks you outlined
[19:30] <clarkb> Other than all that I think we check in Thursday and make sure we're ready to go as far as changes being pushed and mergeable and a documented plan goes
[19:30] <clarkb> fungi: ya I think I have time for them tomorrow if no one else gets started earlier
[19:30] <fungi> i'll definitely do #1, i guess i can also volunteer for #2 if someone else will take #3 (or vice versa if there's a preference)
[19:30] <clarkb> and we can check in thursday to make sure we aren't missing anything
[19:31] <clarkb> fungi: ok I'll coordinate with you tomorrow?
[19:31] <fungi> wfm, i'm around all day
[19:31] <fungi> minus a brief errand at some point
[19:31] <clarkb> #topic gitea01 backups
[19:32] <clarkb> ianw: any word on this from the vexxhost side of things? I had issues copying files from review-test to review02 over ipv6 the other day too, while recording gerrit user cleanups
[19:32] <clarkb> I had to use a -4 flag
[19:33] <ianw> no, apparently an issue was opened but i haven't received any response to a ping about it last week
[19:34] <ianw> i haven't seen mnaser around much even in the vexxhost channel, so it probably is not an effective communication mechanism
[19:34] <ianw> we also rebooted without that helping
[19:34] <clarkb> alright, not much we can do other than dropping AAAA records in dns and I'm not sure we're there yet. Maybe we should do that for the backup servers though to make this particular issue go away?
[19:34] <clarkb> I guess it would be server not servers
[19:35] <ianw> yeah it works to rax
[19:35] <fungi> so far all we know is it's broken between different vexxhost regions, right?
[19:35] <clarkb> fungi: and review-test to review02 in similar fashion
[19:36] <ianw> so we're just getting 24 hour dumps, instead of 12 hour dumps, so probably not super critical
[19:36] <fungi> that lends further credence to the stale route entry in one of the core routers
[19:36] <fungi> theory
[19:37] <clarkb> anyway this continues to be non-critical, but I don't want to forget about it. Hopefully vexxhost is able to dig into it
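clarkb's workaround was the `-4` flag, which scp and ssh both accept to force IPv4. The same idea in a script is to ask the resolver for IPv4 results only, which sidesteps the broken IPv6 path without dropping AAAA records from DNS. A minimal sketch using Python's standard `socket` module; port 22 is an assumed default.

```python
import socket


def ipv4_addresses(host: str, port: int = 22) -> list:
    """Resolve host to IPv4 addresses only, similar in spirit to scp/ssh -4.

    Ignoring AAAA results avoids a broken IPv6 path without touching DNS.
    """
    infos = socket.getaddrinfo(host, port, family=socket.AF_INET,
                               type=socket.SOCK_STREAM)
    # Each entry is (family, type, proto, canonname, sockaddr); for AF_INET
    # the sockaddr is an (address, port) tuple. Deduplicate, keep order.
    addresses = []
    for _family, _type, _proto, _canon, sockaddr in infos:
        if sockaddr[0] not in addresses:
            addresses.append(sockaddr[0])
    return addresses
```

A client would then connect to one of the returned addresses directly, so the IPv6 route never comes into play for that transfer.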
[19:37] <clarkb> #topic PTG Participation
[19:38] <clarkb> I submitted PTG participation for us
[19:38] <fungi> thanks!
[19:38] <clarkb> Signed up for 14:00 - 16:00 UTC Wednesday October 20, 2021
[19:38] <clarkb> That block seemed to be popular last time and the others were not.
[19:38] <clarkb> #link https://openinfra-ptg.eventbrite.com/ Register if you plan to attend
[19:39] <clarkb> If you can make that time block to help answer questions or drive discussion that would be great. I know it is pretty terrible for ianw in particular, and also fairly early for the US west coast
[19:39] <fungi> it's no sweat for me at least
[19:39] <clarkb> I plan to be there. If you can't make it that isn't a big deal as the idea is to help others with opendev questions more so than for us to use the time to collaborate directly
[19:39] <fungi> now i just need to remember to register
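To see why the block is awkward for some attendees, here is the slot converted with fixed UTC offsets. The offsets are assumptions valid for that date (US Pacific on PDT, UTC-7; Australian east coast on AEDT, UTC+11), not a statement of where anyone actually lives.

```python
from datetime import datetime, timedelta, timezone

# Fixed offsets assumed valid on 2021-10-20 (daylight saving time in both
# places). Which zones matter here is an assumption; adjust as needed.
ZONES = {
    "UTC": timezone.utc,
    "US west coast (PDT, UTC-7)": timezone(timedelta(hours=-7)),
    "Australia east coast (AEDT, UTC+11)": timezone(timedelta(hours=11)),
}

# The PTG block we signed up for: 14:00-16:00 UTC, Wednesday October 20, 2021.
START = datetime(2021, 10, 20, 14, 0, tzinfo=timezone.utc)
END = START + timedelta(hours=2)


def local_block(tz: timezone) -> str:
    """Format the PTG block as 'Day HH:MM-HH:MM' in the given timezone."""
    start, end = START.astimezone(tz), END.astimezone(tz)
    return f"{start:%a %H:%M}-{end:%H:%M}"


if __name__ == "__main__":
    for label, tz in ZONES.items():
        print(f"{label}: {local_block(tz)}")
```

Under these offsets the slot is early Wednesday morning on the US west coast and the small hours of Thursday in eastern Australia.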
[19:41] <clarkb> #topic Open Discussion
[19:41] <clarkb> Over the last day and a half I've done a big push on gerrit account email conflict cleanups
[19:41] <clarkb> I think those went well and we are now down to 103 remaining conflicts
[19:42] <fungi> that's amazing
[19:42] <fungi> and soon 102 right?
[19:42] <clarkb> I've put an audit result yaml file and a set of proposed next cleanups on review02 in ~/clarkb/gerrit_user_cleanups/notes if anyone else is able to take a look at those.
[19:42] <clarkb> fungi: it was 105 yesterday and I got it to 103 today after cleaning up dpawlik's account and a straggler from yesterday
[19:42] <ianw> #link https://review.opendev.org/c/opendev/system-config/+/801667
[19:42] <ianw> that's a review to update restart flags to be more consistent
[19:42] <fungi> i haven't looked yet, do you recall how many the new batch will knock out?
[19:42] <clarkb> fungi: it's about 70
[19:42] <fungi> that's a substantial chunk
[19:43] <clarkb> fungi: I think we've whittled it down to about 30 where direct reach out is a good idea
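For readers unfamiliar with the cleanup, an "email conflict" here is roughly one email address associated with more than one Gerrit account. A minimal sketch of finding them, assuming hypothetical `(account_id, email)` pairs as input; it is not the audit tooling in the notes directory on review02, and it compares addresses exactly as given.

```python
from collections import defaultdict

# Hypothetical input: (account_id, email) pairs, e.g. as exported from Gerrit
# account data. This only illustrates what an "email conflict" is; it is not
# the audit script used on review02.


def email_conflicts(pairs):
    """Map each email claimed by more than one account to its sorted account ids."""
    owners = defaultdict(set)
    for account_id, email in pairs:
        owners[email].add(account_id)
    # The same account listing an email twice is not a conflict.
    return {email: sorted(ids) for email, ids in owners.items() if len(ids) > 1}
```

By the numbers above, a batch of about 70 fixes would leave roughly 103 - 70 = 33 conflicts, consistent with the "about 30" left for direct outreach.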
[19:43] <corvus> is anyone else planning on reviewing the matrix changes? i believe we're about ready to run 2 bots, and with those landed, we could start prepping to move the zuul project over, but the changes aren't getting much in the way of votes.
[19:44] <clarkb> corvus: I mentioned them at the beginning of the meeting and asked for reviews. I think I've reviewed them, but would be good for someone other than myself to review them if possible
[19:44] <clarkb> For the remaining ~30 accounts I'd like to push a change that fixes them and gets verified by gerrit directly
[19:44] <clarkb> we'll see what that looks like when I get to ~30
[19:44] <corvus> yeah, i'm looking for a review volunteer :)
[19:46] <clarkb> ++ would be good to have another reviewer
[19:46] <fungi> i intend to try but am cautious not to overcommit and don't want the changes to be held up waiting for my feedback
[19:46] <clarkb> #link https://review.opendev.org/q/topic:matrix matrix bot changes
[19:47] <clarkb> Oh and yuriys jumped on OFTC today and started talking about reviving the inmotion cloud. Sounds like there was a network card problem that caused an interface to drop which cascaded to sad rabbitmq
[19:47] <corvus> fungi: does that mean we should waive the 2x+2 requirement?
[19:47] <clarkb> restarting services seems to make things happier, but we're going to take the opportunity here to upgrade the operating system for kernel patches and update to newer openstack kolla docker images as well
[19:48] <corvus> just so we're clear, these changes have been sitting out there for about 3 weeks, so at this point, i don't think i'm pushing an overly aggressive timeline
[19:48] <fungi> corvus: it means i don't want to say i'm going to review it and then have people hold the changes open waiting for me even if they have enough reviews to merge
[19:48] <fungi> but i'd also be fine merging them with a core reviewer proposing and another core reviewer who isn't me approving
[19:49] <clarkb> in that case maybe we can proceed with the eavesdrop bot landing. Then corvus and I can review tristanC's gerritbot change
[19:49] <fungi> yeah, no objection
[19:49] <fungi> i'm still happy to help support it even though i haven't had time to review
[19:49] <clarkb> ianw: would you like to review any of those or are you good with proceeding?
[19:50] <corvus> there was a previous +2 from mordred on those too; i imagine it's not easy for mordred to keep up with updates since then.
[19:50] <fungi> but very much appreciated!
[19:53] <clarkb> I think that may be it. ianw let us know if you want to review topic:matrix but we'll probably proceed later today/tomorrow if we don't hear otherwise
[19:53] <clarkb> Thank you everyone!
[19:53] <corvus> clarkb: thanks!
[19:53] <clarkb> I'm about to grab lunch but then will be back to send that service coordinator email and review all the things
[19:53] <clarkb> #endmeeting
[19:53] <opendevmeet> Meeting ended Tue Jul 27 19:53:40 2021 UTC.  Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4)
[19:53] <opendevmeet> Minutes:        https://meetings.opendev.org/meetings/infra/2021/infra.2021-07-27-19.01.html
[19:53] <opendevmeet> Minutes (text): https://meetings.opendev.org/meetings/infra/2021/infra.2021-07-27-19.01.txt
[19:53] <opendevmeet> Log:            https://meetings.opendev.org/meetings/infra/2021/infra.2021-07-27-19.01.log.html
[19:53] <fungi> thanks clarkb!
