clarkb | Anyone else here for the meeting? We will get started shortly | 18:59 |
---|---|---|
fungi | yuppp | 19:00 |
clarkb | #startmeeting infra | 19:01 |
opendevmeet | Meeting started Tue Jul 27 19:01:20 2021 UTC and is due to finish in 60 minutes. The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot. | 19:01 |
opendevmeet | Useful Commands: #action #agreed #help #info #idea #link #topic #startvote. | 19:01 |
opendevmeet | The meeting name has been set to 'infra' | 19:01 |
clarkb | #link http://lists.opendev.org/pipermail/service-discuss/2021-July/000270.html Our Agenda | 19:01 |
clarkb | #topic Announcements | 19:01 |
ianw | o/ | 19:01 |
clarkb | I have none (though the first real agenda item is likely to produce one) | 19:01 |
clarkb | #topic Specs Approval | 19:02 |
clarkb | Just a note that I did approve the matrix spec last week | 19:02 |
clarkb | There is a series of changes that can be found at topic:matrix to start running an eavesdrop and gerritbot equivalent against our test matrix channel | 19:02 |
clarkb | I think I'm mostly up to date on reviews for those but reviews from others are always appreciated as well | 19:02 |
fungi | wake up, neo! | 19:03 |
clarkb | #topic Actions from last meeting | 19:03 |
clarkb | #link http://eavesdrop.openstack.org/meetings/infra/2021/infra.2021-07-20-19.01.txt minutes from last meeting | 19:03 |
clarkb | #action someone write spec to replace Cacti with Prometheus | 19:03 |
clarkb | Keeping that there as a reminder that we intend on trying to do that. I'm hopeful that maybe next week I'll be able to start poking at that though | 19:04 |
clarkb | as well as maybe start swinging back around on the list server upgrades | 19:04 |
fungi | promethean cactus | 19:04 |
clarkb | #topic Topics | 19:04 |
clarkb | Time to dive in | 19:04 |
clarkb | #topic Service Coordinator Election | 19:05 |
* fungi loves the topics topic | 19:05 | |
fungi | is it that time of half-year again? | 19:05 |
clarkb | It occurred to me that we must be getting close to when I said we should do our next service coordinator election and sure enough I had missed when I said we should open nominations by a couple weeks :/ | 19:05 |
clarkb | the original plan was to open nominations about 2 weeks ago then have an election next week if necessary | 19:05 |
fungi | shucks | 19:05 |
corvus | i guess you're it for another 6 months then | 19:06 |
fungi | stick a crash-test dummy in a chair and nominate it | 19:06 |
clarkb | I'm proposing that we open nominations now for two weeks then have an election from August 10-17 if necessary | 19:06 |
clarkb | If that seems reasonable I'll go ahead and send email to service-discuss telling people to get on the nominations | 19:07 |
clarkb | And figure out some maths for 6 months from now and try harder to not miss it again | 19:07 |
fungi | wfm, this is one of those "hot potato" elections where we all try to figure out how to be "not it" | 19:07 |
clarkb | (I think in my head elections happening August 1 meant that nothing needed to be done until August) | 19:07 |
clarkb | fungi: ianw corvus ya one of yall should take the hot potato :) | 19:07 |
clarkb | I don't hear any complaints with that plan. Feel free to bring them up over the next hour or two as I need to finish this meeting and eat lunch before I send that email | 19:08 |
clarkb | But if I don't hear anything until then I'll proceed with the proposal above | 19:08 |
fungi | yeah, i'll have to decide between getting nothing done for the next six months or getting even less done ;) | 19:09 |
clarkb | #topic Review Upgrade | 19:10 |
clarkb | I mentioned this in #opendev earlier today, but I had privs bumped up temporarily to do some account cleanups and took advantage of that to check melody | 19:10 |
clarkb | melody reports the new server has peaked at 84GB of memory use. I believe that is below the upper limit we have given it. | 19:10 |
clarkb | I think this gives us a few opportunities. First we can increase the size of our caches to fill more space. We can also reduce the heap limit provided to the jvm and give more memory to apache and the kernel caches | 19:11 |
clarkb | Neither seems super urgent right now but as we gather more of this real world data we can start to think about how to better take advantage of the new server | 19:11 |
ianw | yeah iirc we set the java heap limit at 96gb | 19:11 |
fungi | might want to check again just before we restart for the renames just to see if the needle has moved | 19:11 |
clarkb | fungi: ++ | 19:12 |
fungi | (because we'll be resetting it at that point) | 19:12 |
clarkb | It is also a good double check on the assertion we needed more memory | 19:12 |
fungi | is there a formal plan somewhere yet for the rename maintenance i can add that? | 19:12 |
clarkb | This is about double what we were able to provide the server before so it definitely wanted more | 19:13 |
clarkb | fungi: not yet, that is on the agenda for later in the meeting to start getting all that together | 19:13 |
fungi | no sweat | 19:13 |
clarkb | I think the other big thing related to the review upgrade is cleanup of the old review01 server and review-test server | 19:13 |
fungi | we also probably have some clearance between the 96gib allocated to the jvm and the 125gib allocated to the vm but should see if we can spot how that's being spent | 19:14 |
clarkb | fungi: yup cacti should hopefully give us that info? | 19:14 |
clarkb | ianw: mordred and myself have confirmed that we believe we have all the info we want from review-test now. I think we are waiting on fungi to confirm as well. | 19:14 |
fungi | oh, jeez, responsibilities | 19:14 |
clarkb | ianw: I bring up review-test because I suspect cleaning it up may conflict with cleaning up review01. Do we want to work together on that? | 19:15 |
clarkb | I know you mentioned waiting for frickler_pto to return from pto before doing those cleanups as well so this probably isn't urgent. Just let me know how you want to proceed on that and I'll try to avoid stepping on toes | 19:15 |
fungi | there's nothing but dotfiles/dotdirs in my homedir now, so nothing i care to keep | 19:16 |
ianw | ok, i think i proposed a change to remove review-test bits? | 19:16 |
clarkb | ianw: oh maybe you did and I should go review it | 19:16 |
ianw | #link https://review.opendev.org/c/opendev/system-config/+/801556 | 19:16 |
clarkb | great sounds like we're ready to land ^ if it passes review. I've added it to the afternoon review list | 19:17 |
fungi | er, i mean nothing but dotfiles/dotdirs in my homedir on the old review server, but same goes for review-test too | 19:17 |
ianw | yeah, i assume everyone but frickler_pto has cleaned up on review01 too? | 19:17 |
clarkb | fungi: I called you and mordred out on review-test because you were both involved in that server (along with myself) for various tasks | 19:17 |
clarkb | ianw: I got what I know is important on review01 from emory. But should do a skim to double check I'm not forgetting anything | 19:17 |
fungi | also getting rid of review-test will finally silence the cronjob errors i keep getting e-mailed about, so thumbs-up | 19:18 |
fungi | memory? | 19:18 |
clarkb | ya sorry memory. I knew I wanted the gerrit user cleanup records | 19:18 |
clarkb | as it will help us go back and do surgery if we need to for any of these accounts in the future | 19:18 |
fungi | just making sure there wasn't someone new named emory i was unaware of | 19:18 |
fungi | i did take a week off, after all | 19:19 |
clarkb | #topic Project Renames | 19:19 |
fungi | there's just the one so far, right? | 19:20 |
clarkb | We've said we'll do these July 30 at 15:00UTC. I sent email to service-announce announcing this | 19:20 |
fungi | and we're aiming to start at 15z friday? | 19:20 |
fungi | cool | 19:20 |
clarkb | and yup there is a single rename. | 19:20 |
fungi | having just one is probably better for testing this out anyway | 19:20 |
clarkb | As part of prep work I've been working on a change to test our rename playbook | 19:20 |
clarkb | #link https://review.opendev.org/c/opendev/system-config/+/802112 Test the rename playbook | 19:20 |
fungi | since it'll be the first rename after our major 3.x upgrade | 19:20 |
clarkb | fungi: ++ | 19:20 |
clarkb | I don't think this testing change needs to land before we do the renames since it seems to show that the playbook is working as is | 19:21 |
fungi | but it's helpful to have it as a demonstration we expect this to work | 19:21 |
clarkb | There are some weird test framework integration things to sort out. I do intend on making that cleaner and merging the change. Just pointing out it isn't on our critical path here | 19:21 |
clarkb | yup | 19:21 |
clarkb | Additional prep work that needs doing: 1) review the rename change and ensure it is mergable 2) push a change to opendev/project-config to record the rename 3) write up a plan in an etherpad | 19:22 |
clarkb | for 1) having a couple people do that is good and I intend on being one of them. Does anyone want to do 2) and 3)? | 19:23 |
clarkb | I can work on 2) and 3) tomorrow. | 19:24 |
clarkb | if someone else wants to do it let me know :) | 19:24 |
fungi | out of curiosity, does 802112 actually confirm the group rename worked? | 19:24 |
clarkb | fungi: it confirms that the group rename doesn't explode, but doesn't actively check the new group is what is in gerrit | 19:24 |
fungi | it just dawned on me to look and while you include a group rename there i don't see it assert the new name exists | 19:24 |
clarkb | fungi: we can pretty easily add a testinfra test to check that though | 19:25 |
fungi | yeah, okay. not super important for now | 19:25 |
fungi | just making sure i wasn't misreading | 19:25 |
fungi | fine as a todo for us down the line | 19:25 |
clarkb | The other thing to be aware of is that corvus has some changes up to zuul to change how zuul project secrets are managed | 19:26 |
clarkb | If those changes land before friday we'll need to update our process for renaming a project I think | 19:26 |
clarkb | I'm hoping that don't land this week :) but be aware of that if they do | 19:26 |
clarkb | s/that/they/ | 19:26 |
fungi | i can't recall, do we actually restart zuul during the renames maintenance? | 19:26 |
clarkb | fungi: we do not | 19:26 |
corvus | i'm not planning to push for those this week | 19:27 |
fungi | then it's really if those land *and* we've restarted the scheduler? | 19:27 |
clarkb | fungi: yes | 19:27 |
fungi | k | 19:27 |
corvus | i'd like to restart soon and make a release and land those afterwards. but even if they do land, it's not a huge deal | 19:27 |
corvus | also, if the projects being renamed don't have secrets, it doesn't matter at all | 19:27 |
clarkb | ya it's not a huge deal just need to maybe comment out the zuul stuff in the playbook when we run it and manually execute the tools to do it this time around (or update the playbook) | 19:28 |
fungi | and the alternative would be to script changes via zkshell? | 19:28 |
clarkb | fungi: the changes from corvus include new tools to use instead of zkshell | 19:28 |
fungi | oh, right i sort of remember that now ;) | 19:28 |
clarkb | zkshell is an option but probably one to avoid | 19:28 |
clarkb | corvus: oh that is a good point | 19:28 |
fungi | so we'd dump, rename, then import? | 19:28 |
clarkb | I'm not sure if this project has secrets but I can double check | 19:29 |
clarkb | fungi: or if the copy and delete commands land too then we can copy then delete | 19:29 |
fungi | (and cleanup the old copies in zk afterward) | 19:29 |
fungi | yeah, that | 19:29 |
fungi | okay | 19:29 |
clarkb | I really don't expect it to be a major issue, but did want to call it out as I'm not sure how closely others are following that in zuul | 19:29 |
fungi | so anyway, the lingering question is who will sign up for the rename prep 3 tasks you outlined | 19:30 |
clarkb | Other than all that I think we check in Thursday and make sure we're ready to go as far as changes being pushed and mergable and a documented plan goes | 19:30 |
clarkb | fungi: ya I think I have time for them tomorrow if no one else gets started earlier | 19:30 |
fungi | i'll definitely do #1, i guess i can also volunteer for #2 if someone else will take #3 (or vice versa if there's a preference) | 19:30 |
clarkb | and we can check in thursday to make sure we aren't missing anything | 19:30 |
clarkb | fungi: ok I'll coordinate with you tomorrow? | 19:31 |
fungi | wfm, i'm around all day | 19:31 |
fungi | minus a brief errand at some point | 19:31 |
clarkb | #topic gitea01 backups | 19:31 |
clarkb | ianw: any word from this on the vexxhost side of things? I had issues copying files from review-test to review02 to record gerrit user cleanups over ipv6 the other day too | 19:32 |
clarkb | I had to use a -4 flag | 19:32 |
ianw | no, apparently an issue was opened but i haven't received any response to a ping about it last week | 19:33 |
ianw | i haven't seen mnaser around much even in the vexxhost channel, so it probably is not an effective communication mechanism | 19:34 |
ianw | we also rebooted without that helping | 19:34 |
clarkb | alright, not much we can do other than dropping AAAA records in dns and I'm not sure we're there yet. Maybe we should do that for the backup servers though to make this particular issue go away? | 19:34 |
clarkb | I guess it would be server not servers | 19:34 |
ianw | yeah it works to rax | 19:35 |
fungi | so far all we know is it's broken between different vexxhost regions, right? | 19:35 |
clarkb | fungi: and review-test to review02 in similar fashion | 19:35 |
ianw | so we're just getting 24 hour dumps, instead of 12 hour dumps, so probably not super critical | 19:36 |
fungi | that lends further credence to the stale route entry in one of the core routers | 19:36 |
fungi | theory | 19:36 |
clarkb | anyway this continues to be non critical, but I don't want to forget about it. Hopefully vexxhost is able to dig into it | 19:37 |
clarkb | #topic PTG Participation | 19:37 |
clarkb | I submitted PTG participation for us | 19:38 |
fungi | thanks! | 19:38 |
clarkb | Signed up for 14:00 - 16:00 UTC Wednesday October 20, 2021 | 19:38 |
clarkb | That block seemed to be popular last time and the others were not. | 19:38 |
clarkb | #link https://openinfra-ptg.eventbrite.com/ Register if you plan to attend | 19:38 |
clarkb | If you can make that time block to help answer questions or drive discussion that would be great. But I know it is pretty terrible for ianw in particular. But also fairly early for west coast usa | 19:39 |
fungi | it's no sweat for me at least | 19:39 |
clarkb | I plan to be there. If you can't make it that isn't a big deal as the idea is to help others with opendev questions more so than for us to use the time to collaborate directly | 19:39 |
fungi | now i just need to remember to register | 19:39 |
clarkb | #topic Open Discussion | 19:41 |
clarkb | Over the last day and a half I've done a big push on gerrit account email conflict cleanups | 19:41 |
clarkb | I think those went well and we are now down to 103 remaining conflicts | 19:41 |
fungi | that's amazing | 19:42 |
fungi | and soon 102 right? | 19:42 |
clarkb | I've put an audit result yaml file and a set of proposed next cleanups on review02 in ~/clarkb/gerrit_user_cleanups/notes if anyone else is able to take a look at those. | 19:42 |
clarkb | fungi: it was 105 yesterday and I got it to 103 today after cleaning up dpawlik's account and a straggler from yesterday | 19:42 |
ianw | #link https://review.opendev.org/c/opendev/system-config/+/801667 | 19:42 |
ianw | that's a review to update restart flags to be more consistent | 19:42 |
fungi | i haven't looked yet, do you recall how many the new batch will knock out? | 19:42 |
clarkb | fungi: it's about 70 | 19:42 |
fungi | that's a substantial chunk | 19:42 |
clarkb | fungi: I think we've whittled it down to about 30 where direct reach out is a good idea | 19:43 |
corvus | is anyone else planning on reviewing the matrix changes? i believe we're about ready to run 2 bots, and with those landed, we could start prepping to move the zuul project over, but the changes aren't getting much in the way of votes. | 19:43 |
clarkb | corvus: I mentioned them at the beginning of the meeting and asked for reviews. I think I've reviewed them, but would be good for someone other than myself to review them if possible | 19:44 |
clarkb | For the remaining ~30 accounts I'd like to push a change that fixes them and gets verified by gerrit directly | 19:44 |
clarkb | we'll see what that looks like when I get to ~30 | 19:44 |
corvus | yeah, i'm looking for a review volunteer :) | 19:44 |
clarkb | ++ would be good to have another reviewer | 19:46 |
fungi | i intend to try but am cautious not to overcommit and don't want the changes to be held up waiting for my feedback | 19:46 |
clarkb | #link https://review.opendev.org/q/topic:matrix matrix bot changes | 19:46 |
clarkb | Oh and yuriys jumped on OFTC today and started talking about reviving the inmotion cloud. Sounds like there was a network card problem that caused an interface to drop which cascaded to sad rabbitmq | 19:47 |
corvus | fungi: does that mean we should waive the 2x+2 requirement? | 19:47 |
clarkb | restarting services seems to make things happier, but we're going to take the opportunity here to upgrade the operating system for kernel patches and update to newer openstack kolla docker images as well | 19:47 |
corvus | just so we're clear, these changes have been sitting out there for about 3 weeks, so at this point, i don't think i'm pushing an overly aggressive timeline | 19:48 |
fungi | corvus: it means i don't want to say i'm going to review it and then have people hold the changes open waiting for me even if they have enough reviews to merge | 19:48 |
fungi | but i'd also be fine merging them with a core reviewer proposing and another core reviewer who isn't me approving | 19:48 |
clarkb | in that case maybe we can proceed with the eavesdrop bot landing. Then corvus and I can review tristanC's gerritbot change | 19:49 |
fungi | yeah, no objection | 19:49 |
fungi | i'm still happy to help supporting it even though i haven't had time to review | 19:49 |
clarkb | ianw: would you like to review any of those or are you good with proceeding? | 19:49 |
corvus | there was a previous +2 from mordred on those too; i imagine it's not easy for mordred to keep up with updates since then. | 19:50 |
fungi | but very much appreciated! | 19:50 |
clarkb | I think that may be it. ianw let us know if you want to review topic:matrix but we'll probably proceed later today/tomorrow if we don't hear otherwise | 19:53 |
clarkb | Thank you everyone! | 19:53 |
corvus | clarkb: thanks! | 19:53 |
clarkb | I'm about to grab lunch but then will be back to send that service coordinator email and review all the things | 19:53 |
clarkb | #endmeeting | 19:53 |
opendevmeet | Meeting ended Tue Jul 27 19:53:40 2021 UTC. Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4) | 19:53 |
opendevmeet | Minutes: https://meetings.opendev.org/meetings/infra/2021/infra.2021-07-27-19.01.html | 19:53 |
opendevmeet | Minutes (text): https://meetings.opendev.org/meetings/infra/2021/infra.2021-07-27-19.01.txt | 19:53 |
opendevmeet | Log: https://meetings.opendev.org/meetings/infra/2021/infra.2021-07-27-19.01.log.html | 19:53 |
fungi | thanks clarkb! | 19:53 |
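
During the Project Renames topic, fungi pointed out that the rename playbook test (802112) confirms the group rename runs without exploding but does not assert that the new group name exists. clarkb noted that a testinfra test could check this easily. Below is a minimal sketch of the assertion logic such a test might use. It assumes the check reads the body of Gerrit's List Groups REST call (`GET /a/groups/`), which returns a JSON object keyed by group name and, like all Gerrit JSON responses, begins with the `)]}'` XSSI-protection line. The helper names `parse_gerrit_json` and `group_renamed` are hypothetical and do not come from 802112.

```python
import json

# Gerrit starts every JSON REST response with this line to defeat XSSI;
# it has to be stripped before the rest is handed to a JSON parser.
XSSI_PREFIX = ")]}'"


def parse_gerrit_json(body: str):
    """Strip Gerrit's XSSI prefix (if present) and decode the JSON payload."""
    if body.startswith(XSSI_PREFIX):
        body = body[len(XSSI_PREFIX):]
    return json.loads(body)


def group_renamed(list_groups_body: str, old_name: str, new_name: str) -> bool:
    """Return True if new_name is present and old_name is gone.

    list_groups_body is assumed to be the raw body of GET /a/groups/,
    which maps group names to GroupInfo objects.
    """
    groups = parse_gerrit_json(list_groups_body)
    return new_name in groups and old_name not in groups


if __name__ == "__main__":
    # Hypothetical group names, for illustration only.
    sample = ')]}\'\n{"new-core": {"id": "abc"}, "Administrators": {"id": "def"}}'
    print(group_renamed(sample, "old-core", "new-core"))
```

Inside an actual testinfra test, the body could come from something like `host.run(...)` running curl against the test Gerrit. The endpoint, credentials and group names shown here are assumptions, so treat this as a starting point rather than a drop-in test.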
Generated by irclog2html.py 2.17.2 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!