nick | message | time |
---|---|---|
clarkb | meeting time | 19:00 |
clarkb | it will be a short one since I think it will likely just be a couple of us | 19:00 |
clarkb | #startmeeting infra | 19:01 |
opendevmeet | Meeting started Tue Sep 27 19:01:24 2022 UTC and is due to finish in 60 minutes. The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot. | 19:01 |
opendevmeet | Useful Commands: #action #agreed #help #info #idea #link #topic #startvote. | 19:01 |
opendevmeet | The meeting name has been set to 'infra' | 19:01 |
clarkb | #link https://lists.opendev.org/pipermail/service-discuss/2022-September/000362.html Our Agenda | 19:01 |
clarkb | #topic Announcements | 19:02 |
clarkb | No announcements. I guess keep in mind the openstack release is happening soon | 19:03 |
clarkb | then in a couple? maybe three? weeks the PTG is happening virtually | 19:03 |
clarkb | #topic Mailman 3 | 19:03 |
fungi | yet still moar progress | 19:04 |
fungi | migration script is now included in the implementation change | 19:04 |
fungi | and it's been updated to also cover the bits to move the backward-compat copies of the old archives to where we'll serve them from, and relink them properly | 19:05 |
fungi | we have a handful of mailing lists with some fields which are too large for their db columns, but i'll fix those in production and then we can do another (perhaps final) full migration test | 19:05 |
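For context, a hedged sketch of the kind of pre-flight check this involves; the table and column names below are illustrative assumptions, not the actual Mailman 3 schema:

```sql
-- Hypothetical pre-migration check: find list fields whose values
-- would overflow a VARCHAR(255) column on import. The mailinglist
-- table and description column are assumed names for illustration.
SELECT list_id, CHAR_LENGTH(description) AS len
  FROM mailinglist
 WHERE CHAR_LENGTH(description) > 255;
```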
fungi | i think we're close to scheduling migration cut-over for lists.opendev.org and lists.zuul-ci.org | 19:06 |
clarkb | is that the last thing we need to do before holding a node and doing a migration test again? | 19:06 |
fungi | afaik yes | 19:06 |
clarkb | sounds great | 19:06 |
fungi | i noted the specific lists and fields at the end of the etherpad in the todo list | 19:07 |
fungi | #link https://etherpad.opendev.org/p/mm3migration | 19:07 |
fungi | maybe next meeting we'll be in a position to talk about maintenance scheduling | 19:08 |
clarkb | and the disk filling was due to unbound and not something we expect to be a problem in production right? | 19:08 |
clarkb | because unbound on our test nodes has more verbose logging settings | 19:08 |
fungi | correct, we apparently set unbound up for very verbose logging on our test nodes | 19:08 |
fungi | the risks of holding a test node and letting it sit there for weeks | 19:09 |
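For reference, a minimal unbound.conf sketch of the logging knob being discussed; the exact verbosity values on OpenDev's test nodes are assumptions:

```
server:
    # Assumed test-node setting: high verbosity logs per-query detail
    # and can fill a disk when a held node sits idle for weeks.
    verbosity: 4
    # A production-style setting would be 0 or 1.
```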
clarkb | exciting! let me know if there is anything else I can help with on this. I've been trying to make sure I help keep it moving forward | 19:10 |
clarkb | anything else? | 19:10 |
fungi | you said the ci failures on the lynx addition pr are bitrot, right? | 19:10 |
fungi | #link https://github.com/maxking/docker-mailman/pull/552 | 19:10 |
clarkb | yes, I'm pretty sure it was an upstream change to mailman3 that broke it, iirc | 19:11 |
clarkb | something to do with translations | 19:11 |
fungi | k | 19:11 |
clarkb | but that is a good reminder we should probably decide if we need lynx installed and do our own images or try to reach out to upstream more | 19:11 |
fungi | looks like that repo has merged no new commits to its main branch since june | 19:12 |
clarkb | I wonder if anyone knows the maintainer | 19:13 |
fungi | anyway, i don't have anything else on this topic | 19:15 |
clarkb | #topic jaeger tracing server | 19:15 |
clarkb | The server is up and running | 19:15 |
corvus | it's up, lgtm | 19:15 |
fungi | certainly a thing we like to see in our servers ;) | 19:15 |
clarkb | we've also updated the zuul config to send tracing info to the jaeger server | 19:15 |
corvus | we're not sending data to it yet, but we should be configured to do so once we restart zuul with the changes that start exporting traces | 19:15 |
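As a rough illustration of what "configured to do so" looks like, a hypothetical zuul.conf fragment; the option names and endpoint are assumptions to be verified against the Zuul tracing documentation, not OpenDev's actual settings:

```ini
[tracing]
# Assumed option names; check the Zuul docs before relying on these.
enabled = true
protocol = grpc
# 4317 is the conventional OTLP gRPC port; the hostname is made up.
endpoint = jaeger.example.org:4317
```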
fungi | is that waiting for the weekend, or should we restart sooner? | 19:16 |
clarkb | restarting sooner is easy to do with the playbook if the changes have landed to zuul | 19:17 |
corvus | i'm inclined to just let it happen over the weekend.... but i dunno, maybe a restart of the schedulers before then might be interesting? | 19:17 |
corvus | that might get us some traces without restarting the schedulers | 19:17 |
fungi | without restarting the executors? | 19:18 |
corvus | yep that, sorry :) | 19:18 |
fungi | cool, just making sure i understood! | 19:18 |
fungi | that sounds fine to me. i can probably do that after dinner if we're ready | 19:18 |
corvus | would be a quick sanity check, maybe if there's an issue we can fix it before the weekend. | 19:19 |
corvus | well, i don't think the zuul stack has merged up to where i'd like it to yet | 19:19 |
fungi | which changes are still outstanding? | 19:20 |
corvus | there's a good chunk approved, but they hit some failures | 19:20 |
corvus | i just rechecked them | 19:20 |
fungi | ahh | 19:20 |
corvus | #link https://review.opendev.org/858372 initial tracing change tip | 19:20 |
corvus | i think a partial (or full) restart after that makes sense | 19:20 |
corvus | so that's the one to watch for. i can ping when it lands | 19:20 |
fungi | okay, so we're clear to restart schedulers once there are images built from 858372? | 19:21 |
corvus | ya. sounds like a plan | 19:21 |
fungi | (and tagged latest in promote of course) | 19:21 |
fungi | cool, i've starred that so i should get notifications for it | 19:21 |
clarkb | and once that restart happens we'd expect to see traces in jaeger. Any issue with potentially orphaned traces since all components won't be restarted yet? | 19:22 |
clarkb | (thinking about this from a rolling upgrade standpoint). I guess the jaeger server might hold onto partial info that we can't view, but otherwise it's fine? | 19:22 |
corvus | i don't think so - in my testing jaeger just shows a little error message as a sort of footnote to the trace | 19:22 |
corvus | so it knows there's missing info, but it's no big deal | 19:22 |
corvus | and eventually things start looking more complete | 19:23 |
fungi | i could do a rolling scheduler restart, and then follow it up with a full restart, if that's preferable | 19:23 |
fungi | full rolling restart of all components i mean | 19:23 |
corvus | fungi: not worth it for this -- no matter what, there will be partial/orphaned traces. that's sort of intentional in order to keep the complexity of the changes under control | 19:23 |
fungi | okay, wfm | 19:24 |
corvus | also (schedulers are generally emitting the highest level traces, so we'll just be missing children, not parents, once all the schedulers are restarted) | 19:24 |
corvus | then over the next days/weeks, we'll be landing more changes to zuul to add more traces, so data will become richer over time | 19:25 |
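A minimal OpenTelemetry Python sketch of the parent/child structure corvus describes: the scheduler emits the top-level span, and spans from components that have not been restarted yet are simply missing children. The span names are illustrative, not Zuul's actual ones:

```python
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor, ConsoleSpanExporter

# Wire a tracer provider to a console exporter for demonstration.
provider = TracerProvider()
provider.add_span_processor(BatchSpanProcessor(ConsoleSpanExporter()))
trace.set_tracer_provider(provider)
tracer = trace.get_tracer("demo")

# The outer span stands in for a scheduler-level trace; the nested span
# is the kind of child a not-yet-restarted component would not emit.
with tracer.start_as_current_span("buildset"):
    with tracer.start_as_current_span("build"):
        pass
```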
fungi | awesome | 19:25 |
corvus | i think that's probably it from my pov | 19:25 |
clarkb | sounds good, thanks | 19:25 |
clarkb | #topic Nodepool Disk Utilization | 19:26 |
fungi | hopefully we're in the clear here now | 19:26 |
fungi | at least for a while | 19:27 |
clarkb | yup you just expanded the disks on both builders to 2TB which is plenty | 19:27 |
fungi | yeah, i added a 1tb volume to each of nb01 and nb02 and expanded /opt to twice its earlier size | 19:27 |
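A hedged shell sketch of the expansion steps fungi describes, assuming an LVM layout; the device, volume group, and logical volume names are made up for illustration:

```sh
# Attach the new 1TB volume to the VG, then grow the LV backing /opt.
pvcreate /dev/vdc                      # assumed device for the new volume
vgextend main /dev/vdc                 # "main" VG name is an assumption
lvextend -l +100%FREE /dev/main/opt
resize2fs /dev/main/opt                # grow the (assumed ext4) fs online
```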
clarkb | I wasn't sure where this was when I put the agenda together so I wanted to make sure it was brought up | 19:27 |
fungi | probably safe to take off the agenda now, unless we want to monitor things for a while | 19:28 |
clarkb | agreed. I think this was the solution. Thank you for taking care of it | 19:28 |
fungi | np | 19:28 |
clarkb | #topic Open Discussion | 19:29 |
clarkb | Looks like https://review.opendev.org/c/zuul/zuul-jobs/+/858961 has the +2's it needs. Are y'all good with me approving when I can keep an eye on it? | 19:30 |
clarkb | If so I can probably do that tomorrow (today I've got to get the kids from school and stuff so my afternoon won't be attached to a keyboard the whole time) | 19:30 |
fungi | yeah, i say go for it | 19:30 |
fungi | volume for reprepro mirror of ceph quincy deb packages has been created, i'm just waiting to see if the jobs run clean on 858961 now | 19:30 |
clarkb | oh I was also going to land the rocky 9 arm64 image if we are happy with it | 19:31 |
fungi | yeah, sounds good, thanks | 19:31 |
clarkb | that needs less attention if it goes wrong | 19:31 |
clarkb | unless it kills a launcher I suppose but our testing should catch that now | 19:31 |
fungi | well, a similar change recently did take down nl03 | 19:31 |
fungi | or was it nl04? | 19:31 |
clarkb | nl04. But ya, I can approve that one when I've got more consistent time at a keyboard to check things too | 19:32 |
fungi | anyway, there's certainly some associated risk but yes if we're on top of double-checking after it deploys i think that's fine | 19:32 |
fungi | does anyone know whether we expect system-config-run-base-ansible-devel to ever start passing again? | 19:33 |
clarkb | fungi: yes I think the issue is that bridge is too old for new ansible | 19:34 |
fungi | currently seems to be breaking because we test with python 3.8 and ansible-core needs >=3.9 | 19:34 |
fungi | so that makes sense | 19:34 |
clarkb | so once we get the venv stuff done and switched over to a newer bridge in theory it will work | 19:34 |
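A sketch of the venv approach clarkb mentions, assuming the replacement bridge host ships a newer interpreter; the paths are illustrative:

```sh
# Install ansible-core under a Python >= 3.9 interpreter instead of the
# system 3.8 that ansible-core no longer supports. Paths are assumed.
python3.10 -m venv /opt/ansible-venv
/opt/ansible-venv/bin/pip install ansible-core
/opt/ansible-venv/bin/ansible --version
```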
fungi | awesome | 19:35 |
clarkb | last call for anything else | 19:36 |
* fungi | gots nuthin' | 19:36 |
fungi | other than a gnawing hunger for takeout | 19:36 |
clarkb | #endmeeting | 19:36 |
opendevmeet | Meeting ended Tue Sep 27 19:36:52 2022 UTC. Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4) | 19:36 |
opendevmeet | Minutes: https://meetings.opendev.org/meetings/infra/2022/infra.2022-09-27-19.01.html | 19:36 |
opendevmeet | Minutes (text): https://meetings.opendev.org/meetings/infra/2022/infra.2022-09-27-19.01.txt | 19:36 |
opendevmeet | Log: https://meetings.opendev.org/meetings/infra/2022/infra.2022-09-27-19.01.log.html | 19:36 |
clarkb | thanks everyone! | 19:36 |
fungi | thanks clarkb! | 19:37 |
clarkb | ok now to find lunch and then rereview the rocky9 change even if I don't approve it now | 19:38 |