nick | message | time |
---|---|---|
clarkb | meeting time | 19:00 |
clarkb | it will be a short one since I think it will likely just be a couple of us | 19:00 |
clarkb | #startmeeting infra | 19:01 |
opendevmeet | Meeting started Tue Sep 27 19:01:24 2022 UTC and is due to finish in 60 minutes. The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot. | 19:01 |
opendevmeet | Useful Commands: #action #agreed #help #info #idea #link #topic #startvote. | 19:01 |
opendevmeet | The meeting name has been set to 'infra' | 19:01 |
clarkb | #link https://lists.opendev.org/pipermail/service-discuss/2022-September/000362.html Our Agenda | 19:01 |
clarkb | #topic Announcements | 19:02 |
clarkb | No announcements. I guess keep in mind the openstack release is happening soon | 19:03 |
clarkb | then in a couple? maybe three? weeks the PTG is happening virtually | 19:03 |
clarkb | #topic Mailman 3 | 19:03 |
fungi | yet still moar progress | 19:04 |
fungi | migration script is now included in the implementation change | 19:04 |
fungi | and it's been updated to also cover the bits to move the backward-compat copies of the old archives to where we'll serve them from, and relink them properly | 19:05 |
fungi | we have a handful of mailing lists with some fields which are too large for their db columns, but i'll fix those in production and then we can do another (perhaps final) full migration test | 19:05 |
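For context, a hedged sketch of the kind of pre-flight check this involves; the table and column names below are illustrative assumptions, not the actual Mailman 3 schema:

```sql
-- Hypothetical pre-migration check: find list fields whose values
-- would overflow a VARCHAR(255) column on import. The mailinglist
-- table and description column are assumed names for illustration.
SELECT list_id, CHAR_LENGTH(description) AS len
  FROM mailinglist
 WHERE CHAR_LENGTH(description) > 255;
```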
fungi | i think we're close to scheduling migration cut-over for lists.opendev.org and lists.zuul-ci.org | 19:06 |
clarkb | is that the last thing we need to do before holding a node and doing a migration test again? | 19:06 |
fungi | afaik yes | 19:06 |
clarkb | sounds great | 19:06 |
fungi | i noted the specific lists and fields at the end of the etherpad in the todo list | 19:07 |
fungi | #link https://etherpad.opendev.org/p/mm3migration | 19:07 |
fungi | maybe next meeting we'll be in a position to talk about maintenance scheduling | 19:08 |
clarkb | and the disk filling was due to unbound and not something we expect to be a problem in production right? | 19:08 |
clarkb | because unbound on our test nodes has more verbose logging settings | 19:08 |
fungi | correct, we apparently set unbound up for very verbose logging on our test nodes | 19:08 |
fungi | the risks of holding a test node and letting it sit there for weeks | 19:09 |
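For reference, a minimal unbound.conf sketch of the logging knob being discussed; the exact verbosity values on OpenDev's test nodes are assumptions:

```
server:
    # Assumed test-node setting: high verbosity logs per-query detail
    # and can fill a disk when a held node sits idle for weeks.
    verbosity: 4
    # A production-style setting would be 0 or 1.
```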
clarkb | exciting! let me know if there is anything else I can help with on this. I've been trying to make sure I help keep it moving forward | 19:10 |
clarkb | anything else? | 19:10 |
fungi | you said the ci failures on the lynx addition pr are bitrot, right? | 19:10 |
fungi | #link https://github.com/maxking/docker-mailman/pull/552 | 19:10 |
clarkb | yes, I'm pretty sure it was an upstream change to mailman3 that broke it, iirc | 19:11 |
clarkb | something to do with translations | 19:11 |
fungi | k | 19:11 |
clarkb | but that is a good reminder we should probably decide if we need lynx installed and do our own images or try to reach out to upstream more | 19:11 |
fungi | looks like that repo has merged no new commits to its main branch since june | 19:12 |
clarkb | I wonder if anyone knows the maintainer | 19:13 |
fungi | anyway, i don't have anything else on this topic | 19:15 |
clarkb | #topic jaeger tracing server | 19:15 |
clarkb | The server is up and running | 19:15 |
corvus | it's up, lgtm | 19:15 |
fungi | certainly a thing we like to see in our servers ;) | 19:15 |
clarkb | we've also updated the zuul config to send tracing info to the jaeger server | 19:15 |
corvus | we're not sending data to it yet, but we should be configured to do so once we restart zuul with the changes that start exporting traces | 19:15 |
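As a rough illustration of what "configured to do so" looks like, a hypothetical zuul.conf fragment; the option names and endpoint are assumptions to be verified against the Zuul tracing documentation, not OpenDev's actual settings:

```ini
[tracing]
# Assumed option names; check the Zuul docs before relying on these.
enabled = true
protocol = grpc
# 4317 is the conventional OTLP gRPC port; the hostname is made up.
endpoint = jaeger.example.org:4317
```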
fungi | is that waiting for the weekend, or should we restart sooner? | 19:16 |
clarkb | restarting sooner is easy to do with the playbook if the changes have landed to zuul | 19:17 |
corvus | i'm inclined to just let it happen over the weekend.... but i dunno, maybe a restart of the schedulers before then might be interesting? | 19:17 |
corvus | that might get us some traces without restarting the schedulers | 19:17 |
fungi | without restarting the executors? | 19:18 |
corvus | yep that, sorry :) | 19:18 |
fungi | cool, just making sure i understood! | 19:18 |
fungi | that sounds fine to me. i can probably do that after dinner if we're ready | 19:18 |
corvus | would be a quick sanity check, maybe if there's an issue we can fix it before the weekend. | 19:19 |
corvus | well, i don't think the zuul stack has merged up to where i'd like it to yet | 19:19 |
fungi | which changes are still outstanding? | 19:20 |
corvus | there's a good chunk approved, but they hit some failures | 19:20 |
corvus | i just rechecked them | 19:20 |
fungi | ahh | 19:20 |
corvus | #link https://review.opendev.org/858372 initial tracing change tip | 19:20 |
corvus | i think a partial (or full) restart after that makes sense | 19:20 |
corvus | so that's the one to watch for. i can ping when it lands | 19:20 |
fungi | okay, so we're clear to restart schedulers once there are images built from 858372? | 19:21 |
corvus | ya. sounds like a plan | 19:21 |
fungi | (and tagged latest in promote of course) | 19:21 |
fungi | cool, i've starred that so i should get notifications for it | 19:21 |
clarkb | and once that restart happens we'd expect to see traces in jaeger. Any issue with potentially orphaned traces since all components won't be restarted yet? | 19:22 |
clarkb | (thinking about this from a rolling upgrade standpoint). I guess the jaeger server might hold onto partial info that we can't view, but otherwise it's fine? | 19:22 |
corvus | i don't think so - in my testing jaeger just shows a little error message as a sort of footnote to the trace | 19:22 |
corvus | so it knows there's missing info, but it's no big deal | 19:22 |
corvus | and eventually things start looking more complete | 19:23 |
fungi | i could do a rolling scheduler restart, and then follow it up with a full restart, if that's preferable | 19:23 |
fungi | full rolling restart of all components i mean | 19:23 |
corvus | fungi: not worth it for this -- no matter what, there will be partial/orphaned traces. that's sort of intentional in order to keep the complexity of the changes under control | 19:23 |
fungi | okay, wfm | 19:24 |
corvus | also (schedulers are generally emitting the highest level traces, so we'll just be missing children, not parents, once all the schedulers are restarted) | 19:24 |
corvus | then over the next days/weeks, we'll be landing more changes to zuul to add more traces, so data will become richer over time | 19:25 |
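A minimal OpenTelemetry Python sketch of the parent/child structure corvus describes: the scheduler emits the top-level span, and spans from components that have not been restarted yet are simply missing children. The span names are illustrative, not Zuul's actual ones:

```python
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor, ConsoleSpanExporter

# Wire a tracer provider to a console exporter for demonstration.
provider = TracerProvider()
provider.add_span_processor(BatchSpanProcessor(ConsoleSpanExporter()))
trace.set_tracer_provider(provider)
tracer = trace.get_tracer("demo")

# The outer span stands in for a scheduler-level trace; the nested span
# is the kind of child a not-yet-restarted component would not emit.
with tracer.start_as_current_span("buildset"):
    with tracer.start_as_current_span("build"):
        pass
```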
fungi | awesome | 19:25 |
corvus | i think that's probably it from my pov | 19:25 |
clarkb | sounds good, thanks | 19:25 |
clarkb | #topic Nodepool Disk Utilization | 19:26 |
fungi | hopefully we're in the clear here now | 19:26 |
fungi | at least for a while | 19:27 |
clarkb | yup you just expanded the disks on both builders to 2TB which is plenty | 19:27 |
fungi | yeah, i added a 1tb volume to each of nb01 and nb02 and expanded /opt to twice its earlier size | 19:27 |
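A hedged shell sketch of the expansion steps fungi describes, assuming an LVM layout; the device, volume group, and logical volume names are made up for illustration:

```sh
# Attach the new 1TB volume to the VG, then grow the LV backing /opt.
pvcreate /dev/vdc                      # assumed device for the new volume
vgextend main /dev/vdc                 # "main" VG name is an assumption
lvextend -l +100%FREE /dev/main/opt
resize2fs /dev/main/opt                # grow the (assumed ext4) fs online
```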
clarkb | I wasn't sure where this was when I put the agenda together so I wanted to make sure it was brought up | 19:27 |
fungi | probably safe to take off the agenda now, unless we want to monitor things for a while | 19:28 |
clarkb | agreed. I think this was the solution. Thank you for taking care of it | 19:28 |
fungi | np | 19:28 |
clarkb | #topic Open Discussion | 19:29 |
clarkb | Looks like https://review.opendev.org/c/zuul/zuul-jobs/+/858961 has the +2's it needs. Are y'all good with me approving when I can keep an eye on it? | 19:30 |
clarkb | If so I can probably do that tomorrow (today I've got to get the kids from school and stuff so my afternoon won't be attached to a keyboard the whole time) | 19:30 |
fungi | yeah, i say go for it | 19:30 |
fungi | volume for reprepro mirror of ceph quincy deb packages has been created, i'm just waiting to see if the jobs run clean on 858961 now | 19:30 |
clarkb | oh I was also going to land the rocky 9 arm64 image if we are happy with it | 19:31 |
fungi | yeah, sounds good, thanks | 19:31 |
clarkb | that needs less attention if it goes wrong | 19:31 |
clarkb | unless it kills a launcher I suppose but our testing should catch that now | 19:31 |
fungi | well, a similar change recently did take down nl03 | 19:31 |
fungi | or was it nl04? | 19:31 |
clarkb | nl04. But ya, I can approve that one when I've got more consistent time at a keyboard to check things too | 19:32 |
fungi | anyway, there's certainly some associated risk but yes if we're on top of double-checking after it deploys i think that's fine | 19:32 |
fungi | does anyone know whether we expect system-config-run-base-ansible-devel to ever start passing again? | 19:33 |
clarkb | fungi: yes I think the issue is that bridge is too old for new ansible | 19:34 |
fungi | currently seems to be breaking because we test with python 3.8 and ansible-core needs >=3.9 | 19:34 |
fungi | so that makes sense | 19:34 |
clarkb | so once we get the venv stuff done and switched over to a newer bridge in theory it will work | 19:34 |
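A sketch of the venv approach clarkb mentions, assuming the replacement bridge host ships a newer interpreter; the paths are illustrative:

```sh
# Install ansible-core under a Python >= 3.9 interpreter instead of the
# system 3.8 that ansible-core no longer supports. Paths are assumed.
python3.10 -m venv /opt/ansible-venv
/opt/ansible-venv/bin/pip install ansible-core
/opt/ansible-venv/bin/ansible --version
```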
fungi | awesome | 19:35 |
clarkb | last call for anything else | 19:36 |
* fungi | gots nuthin' | 19:36 |
fungi | other than a gnawing hunger for takeout | 19:36 |
clarkb | #endmeeting | 19:36 |
opendevmeet | Meeting ended Tue Sep 27 19:36:52 2022 UTC. Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4) | 19:36 |
opendevmeet | Minutes: https://meetings.opendev.org/meetings/infra/2022/infra.2022-09-27-19.01.html | 19:36 |
opendevmeet | Minutes (text): https://meetings.opendev.org/meetings/infra/2022/infra.2022-09-27-19.01.txt | 19:36 |
opendevmeet | Log: https://meetings.opendev.org/meetings/infra/2022/infra.2022-09-27-19.01.log.html | 19:36 |
clarkb | thanks everyone! | 19:36 |
fungi | thanks clarkb! | 19:37 |
clarkb | ok now to find lunch and then rereview the rocky9 change even if I don't approve it now | 19:38 |