Tuesday, 2022-09-27

clarkbmeeting time19:00
clarkbit will be a short one since I think it will likely just be a couple of us19:00
clarkb#startmeeting infra19:01
opendevmeetMeeting started Tue Sep 27 19:01:24 2022 UTC and is due to finish in 60 minutes.  The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot.19:01
opendevmeetUseful Commands: #action #agreed #help #info #idea #link #topic #startvote.19:01
opendevmeetThe meeting name has been set to 'infra'19:01
clarkb#link https://lists.opendev.org/pipermail/service-discuss/2022-September/000362.html Our Agenda19:01
clarkb#topic Announcements19:02
clarkbNo announcements. I guess keep in mind the openstack release is happening soon19:03
clarkbthen in a couple? maybe three? weeks the PTG is happening virtually19:03
clarkb#topic Mailman 319:03
fungiyet still moar progress19:04
fungimigration script is now included in the implementation change19:04
fungiand it's been updated to also cover the bits to move the backward-compat copies of the old archives to where we'll serve them from, and relink them properly19:05
fungiwe have a handful of mailing lists with some fields which are too large for their db columns, but i'll fix those in production and then we can do another (perhaps final) full migration test19:05
fungii think we're close to scheduling migration cut-over for lists.opendev.org and lists.zuul-ci.org19:06
clarkbis that the last thing we need to do before holding a node and doing a migration test again?19:06
fungiafaik yes19:06
clarkbsounds great19:06
fungii noted the specific lists and fields at the end of the etherpad in the todo list19:07
fungi#link https://etherpad.opendev.org/p/mm3migration19:07
fungimaybe next meeting we'll be in a position to talk about maintenance scheduling19:08
clarkband the disk filling was due to unbound and not something we expect to be a problem in production right?19:08
clarkbbecause unbound on our test nodes has more verbose logging settings19:08
fungicorrect, we apparently set unbound up for very verbose logging on our test nodes19:08
fungithe risks of holding a test node and letting it sit there for weeks19:09
clarkbexciting and let me know if there is anything else I can help with on this. I've been trying to make sure I help keep it moving forward19:10
clarkbanything else?19:10
fungiyou said the ci failures on the lynx addition pr are bitrot, right?19:10
fungi#link https://github.com/maxking/docker-mailman/pull/55219:10
clarkbyes I'm pretty sure that was an upstream change to mailman3 iirc that broke it19:11
clarkbsomething to do with translations19:11
fungik19:11
clarkbbut that is a good reminder we should probably decide if we need lynx installed and do our own images or try to reach out to upstream more19:11
fungilooks like that repo has merged no new commits to its main branch since june19:12
clarkbI wonder if anyone knows the maintainer19:13
fungianyway, i don't have anything else on this topic19:15
clarkb#topic jaeger tracing server19:15
clarkbThe server is up and running19:15
corvusit's up, lgtm19:15
fungicertainly a thing we like to see in our servers ;)19:15
clarkbwe've also updated the zuul config to send tracing info to the jaeger server19:15
corvuswe're not sending data to it yet, but we should be configured to do so once we restart zuul with the changes that start exporting traces19:15
fungiis that waiting for the weekend, or should we restart sooner?19:16
clarkbrestarting sooner is easy to do with the playbook if the changes have landed to zuul19:17
corvusi'm inclined to just let it happen over the weekend.... but i dunno, maybe a restart of the schedulers before then might be interesting?19:17
corvusthat might get us some traces without restarting the schedulers19:17
fungiwithout restarting the executors?19:18
corvusyep that, sorry :)19:18
fungicool, just making sure i understood!19:18
fungithat sounds fine to me. i can probably do that after dinner if we're ready19:18
corvuswould be a quick sanity check, maybe if there's an issue we can fix it before the weekend.19:19
corvuswell, i don't think the zuul stack has merged up to where i'd like it to yet19:19
fungiwhich changes are still outstanding?19:20
corvusthere's a good chunk approved, but they hit some failures19:20
corvusi just rechecked them19:20
fungiahh19:20
corvus#link https://review.opendev.org/858372 initial tracing change tip19:20
corvusi think a partial (or full) restart after that makes sense19:20
corvusso that's the one to watch for.  i can ping when it lands19:20
fungiokay, so we're clear to restart schedulers once there are images built from 858372?19:21
corvusya.  sounds like a plan19:21
fungi(and tagged latest in promote of course)19:21
fungicool, i've starred that so i should get notifications for it19:21
clarkband once that restart happens we'd expect to see traces in jaeger. Any issue with potentially orphaned traces since all components won't be restarted yet?19:22
clarkb(thinking about this from a rolling upgrade standpoint). I guess the jaeger server might hold onto partial info that we can't view but otherwise it's fine?19:22
corvusi don't think so - in my testing jaeger just shows a little error message as a sort of footnote to the trace19:22
corvusso it knows there's missing info, but it's no big deal19:22
corvusand eventually things start looking more complete19:23
fungii could do a rolling scheduler restart, and then follow it up with a full restart, if that's preferable19:23
fungifull rolling restart of all components i mean19:23
corvusfungi: not worth it for this -- no matter what, there will be partial/orphaned traces.  that's sort of intentional in order to keep the complexity of the changes under control19:23
fungiokay, wfm19:24
corvusalso (schedulers are generally emitting the highest level traces, so we'll just be missing children, not parents, once all the schedulers are restarted)19:24
corvusthen over the next days/weeks, we'll be landing more changes to zuul to add more traces, so data will become richer over time19:25
fungiawesome19:25
corvusi think that's probably it from my pov19:25
clarkbsounds good, thanks19:25
clarkb#topic Nodepool Disk Utilization19:26
fungihopefully we're in the clear here now19:26
fungiat least for a while19:27
clarkbyup you just expanded the disks on both builders to 2TB which is plenty19:27
fungiyeah, i added a 1tb volume to each of nb01 and nb02 and expanded /opt to twice its earlier size19:27
clarkbI wasn't sure where this was when I put the agenda together so wanted to make sure it was brought up19:27
fungiprobably safe to take off the agenda now, unless we want to monitor things for a while19:28
clarkbagreed. I think this was the solution. Thank you for taking care of it19:28
funginp19:28
clarkb#topic Open Discussion19:29
clarkbLooks like https://review.opendev.org/c/zuul/zuul-jobs/+/858961 has the +2's it needs. Are ya'll good with me approving when I can keep an eye on it?19:30
clarkbIf so I can probably do that tomorrow (today I've got to get the kids from school and stuff so my afternoon won't be attached to a keyboard the whole time)19:30
fungiyeah, i say go for it19:30
fungivolume for reprepro mirror of ceph quincy deb packages has been created, i'm just waiting to see if the jobs run clean on 858961 now19:30
clarkboh I was also going to land the rocky 9 arm64 image if we are happy with it19:31
fungiyeah, sounds good, thanks19:31
clarkbthat needs less attention if it goes wrong19:31
clarkbunless it kills a launcher I suppose but our testing should catch that now19:31
fungiwell, a similar change recently did take down nl0319:31
fungior was it nl04?19:31
clarkbnl04. But ya I can approve that one when I've got more consistent time at a keyboard to check things too19:32
fungianyway, there's certainly some associated risk but yes if we're on top of double-checking after it deploys i think that's fine19:32
fungidoes anyone know whether we expect system-config-run-base-ansible-devel to ever start passing again?19:33
clarkbfungi: yes I think the issue is that bridge is too old for new ansible19:34
fungicurrently seems to be breaking because we test with python 3.8 and ansible-core needs >=3.919:34
fungiso that makes sense19:34
clarkbso once we get the venv stuff done and switched over to a newer bridge in theory it will work19:34
fungiawesome19:35
clarkblast call for anything else19:36
* fungi gots nuthin'19:36
fungiother than a gnawing hunger for takeout19:36
clarkb#endmeeting19:36
opendevmeetMeeting ended Tue Sep 27 19:36:52 2022 UTC.  Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4)19:36
opendevmeetMinutes:        https://meetings.opendev.org/meetings/infra/2022/infra.2022-09-27-19.01.html19:36
opendevmeetMinutes (text): https://meetings.opendev.org/meetings/infra/2022/infra.2022-09-27-19.01.txt19:36
opendevmeetLog:            https://meetings.opendev.org/meetings/infra/2022/infra.2022-09-27-19.01.log.html19:36
clarkbthanks everyone!19:36
fungithanks clarkb!19:37
clarkbok now to find lunch and then rereview the rocky9 change even if I don't approve it now19:38
