Nick | Message | Time |
---|---|---|
clarkb | Meeting time! | 19:00 |
fungi | ohai! | 19:00 |
clarkb | #startmeeting infra | 19:01 |
opendevmeet | Meeting started Tue Jul 26 19:01:06 2022 UTC and is due to finish in 60 minutes. The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot. | 19:01 |
opendevmeet | Useful Commands: #action #agreed #help #info #idea #link #topic #startvote. | 19:01 |
opendevmeet | The meeting name has been set to 'infra' | 19:01 |
ianw | o/ | 19:01 |
clarkb | #link https://lists.opendev.org/pipermail/service-discuss/2022-July/000346.html Our Agenda | 19:01 |
clarkb | I had no announcements so I'm just going to dive right into the topic list | 19:01 |
clarkb | #topic Topics | 19:01 |
clarkb | #topic Improving CD throughput | 19:02 |
clarkb | I'm not aware of any changes to this since the last meeting, but wanted to make sure I wasn't overlooking anything important or actionable | 19:02 |
clarkb | Sounds like there aren't any updates from others either | 19:03 |
clarkb | #topic Updating Grafana Management Tooling | 19:03 |
clarkb | #link https://lists.opendev.org/pipermail/service-discuss/2022-July/000342.html | 19:03 |
clarkb | #link https://review.opendev.org/q/topic:grafana-json | 19:04 |
clarkb | Thank you ianw for putting this together | 19:04 |
clarkb | I've managed to review the stack and had a few pieces of feedback but overall I think this looks good | 19:04 |
ianw | yes thanks, i need to respond to your comments | 19:05 |
clarkb | I suspect once we clarify a few of those things a second reviewer should be able to land them if second review is happy too | 19:05 |
ianw | one was about having two jobs; they probably could be combined. one is an explicit syntax check | 19:05 |
clarkb | ya I +2'd them all as my feedback was all minor and I'd be happy to address things in a followup if we decide that addressing those items is a good idea | 19:06 |
clarkb | I guess that is probably all there is to say on this :) second reviewer would be great if anyone else has time | 19:07 |
clarkb | #topic Bastion Host Updates | 19:07 |
clarkb | We discovered recently that Zuul leaks console log streaming artifacts (task log files essentially) in /tmp on bridge | 19:08 |
clarkb | I wrote a simple (probably too simple) change to have a periodic cleanup of those files run on bridge. But ianw had the even better idea of updating zuul to clean up after itself | 19:08 |
clarkb | Considering how long this has been happening I don't think it is an emergency and I can abandon my change while we work to land ianw's fixes to zuul | 19:09 |
clarkb | if anyone is concerned with that plan let me know and I'll work to make my change less bad and it can be a temporary fix | 19:09 |
clarkb | ianw: I also wanted to call out that corvus made a note on the base change of the zuul fix stack that we probably do need tmp reaper functionality in zuul itself too for aborted jobs | 19:09 |
ianw | i just noticed there was some discussion in matrix over adding a periodic cleaner to the zuul-console daemon | 19:09 |
clarkb | yup | 19:10 |
clarkb | I got the impression the current stack can go in as is, but we should look at a followup to close the aborted job gap | 19:10 |
corvus | to be clear, the current behavior is not an oversight in zuul | 19:10 |
clarkb | since the current stack is a strict improvement. It just doesn't fully solve the problem | 19:10 |
ianw | yeah. i guess my concern with that a priori is the same thing that made me feel a bit weird about cleaning it via the cron job, in that it's a global namespace there | 19:10 |
corvus | i think an improvement is fine | 19:10 |
corvus | but it's not like we just forgot to do that | 19:10 |
corvus | we understood that it's nearly impossible to actually remove these files synchronously | 19:11 |
corvus | which is why we expected one of 2 things: either the node disappears, or a tmp reaper/reboot fixes it | 19:11 |
ianw | but we can't really change the name of the file on disk until we are happy enough that there are no zuul_console processes out there looking for the old name | 19:11 |
corvus | ianw's change is an improvement in that it deletes many of the files much of the time, but it's not 100%; the only 100% fix is async tmp cleanup | 19:12 |
clarkb | corvus: right | 19:13 |
clarkb | anyway I think we can improve what we've got for now, then look into further improvement as a followup | 19:13 |
corvus | if we've deleted files in /tmp on bridge, then we've probably got a year of headroom :) | 19:14 |
corvus | hopefully it won't take that long :) | 19:14 |
clarkb | yup I deleted all files older than a month following the rough format used by zuul currently | 19:14 |
clarkb | should be plenty :) | 19:14 |
clarkb | Any other bastion host changes to call out? I think the ansible in a venv work hasn't happened yet as other items have come up | 19:15 |
corvus | ianw: thanks for your work on this -- it's def a good improvement. it also scares me a lot which is why i'm trying to bring up as much info/caveats as possible. | 19:15 |
corvus | not sure if that comes through in text :) | 19:16 |
ianw | corvus: thanks, and yes touching anything related to command:/shell: also worries me :) | 19:17 |
clarkb | #topic Upgrading Bionic servers to Focal/Jammy | 19:18 |
clarkb | #link https://etherpad.opendev.org/p/opendev-bionic-server-upgrades Notes on the work that needs to be done. | 19:18 |
clarkb | I was hoping to spin up a jammy replacement server for something like zp01 late last week then that jammy kernel thing happened | 19:18 |
clarkb | Since then I've also realized that I keep intending on spinning up a prometheus and helping with mailman 3 work. I'm now thinking I'm going to start here by seeing what mailman3 on jammy looks like in CI | 19:19 |
clarkb | I think that kills two birds with one stone as far as spinning up jammy in our configuration management goes. I don't really expect any problems | 19:19 |
fungi | agreed! | 19:20 |
clarkb | But don't let me stop anyone else from chipping away at this either. I think there is enough here to do that we can work it concurrently :) | 19:20 |
ianw | ++ yes taking any steps helps! :) | 19:21 |
clarkb | If you do find jammy differences that are notable please call them out (on the etherpad?) | 19:21 |
ianw | do we have system-config-base jobs on jammy yet? | 19:21 |
ianw | that's probably an easy place to start | 19:22 |
clarkb | ianw: we have some jobs primarily for wheel building iirc. I don't know if that made it as far as system-config-base jobs. But that is a good call out | 19:22 |
clarkb | I can probably look at that this afternoon | 19:22 |
clarkb | #topic Zuul job POST_FAILUREs | 19:24 |
clarkb | I haven't heard anyone complaining about these recently, but we did end up landing the base job update to record log upload target swift locations before uploading | 19:24 |
clarkb | this means if we do start to get reports of these again we can query their logs (on the executor since upload failed) to see where they were uploading to. Then we can check if they are all consistently to a single target | 19:25 |
clarkb | and take the debugging from there. I'm still slightly suspicious the glibc fix may have helped make things better, but only because when we updated glibc the problem seemed to stop being reported. I haven't explicitly gone looking for the problem afterwards | 19:25 |
clarkb | it could also be that the log pruning to reduce the total count of files has made a noticeable impact | 19:27 |
clarkb | #topic Service Coordinator Elections | 19:28 |
clarkb | About 6 months ago I pencilled in August 2-16, 2022 as our nomination period. I think that scheduling continues to work and was going to make sure there were no objections here before sending email about it to the service-discuss list today | 19:29 |
clarkb | I'm still happy for someone else to give it a go too :) | 19:29 |
ianw | ... this is the problem with doing the job too well :) | 19:30 |
clarkb | heh. Are you suggesting I should do worse? :P | 19:30 |
clarkb | I won't send the email immediately, so let me know if you've got any objections to that timing and we can take it from there. Otherwise late today (relative to me) I'll get that sent out | 19:31 |
clarkb | #topic Open Discussion | 19:31 |
clarkb | I'm in day 3 of a ~6-8 day heat wave. I really only ever worry about power outages in weather like this or ice storms. Though things seem to be holding up so far. | 19:33 |
clarkb | I've also got a meeting with the works on arm folks thursday at 8am pacific. | 19:34 |
clarkb | ianw: it would be great to have you, but I don't think it is more important than your sleep :) | 19:34 |
clarkb | I should be able to handle it fine, but happy to forward the email with details to anyone else if interested | 19:34 |
ianw | ok, i guess my input is "we like running jobs on arm, and i think it benefits everyone having them" :) | 19:36 |
clarkb | ++ | 19:36 |
ianw | i think i pulled a few stats on the total number of jobs run, i wonder if there's an exact way to tell? | 19:37 |
clarkb | the graphite/grafana numbers are probably the best we've got | 19:37 |
clarkb | We might also be able to scrape the zuul api but I don't think it is very efficient at collecting stats like that through the rest api | 19:38 |
clarkb | (you'd have to iterate through all the jobs you are interested in) | 19:38 |
ianw | yeah, pointing to that is probably the most compelling thing, since you can see it | 19:38 |
ianw | yep, and things like system-config have two nodes, which doubles usage | 19:39 |
corvus | for a one off, you could run a sql query | 19:39 |
clarkb | corvus: oh good point | 19:39 |
clarkb | and ya I agree that the visual data is always good | 19:39 |
clarkb | Anything else before we call it a meeting? | 19:40 |
fungi | i didn't have anything | 19:41 |
clarkb | Sounds like that may be it. Thank you everyone. | 19:42 |
clarkb | We'll be back here next week | 19:42 |
clarkb | #endmeeting | 19:42 |
opendevmeet | Meeting ended Tue Jul 26 19:42:39 2022 UTC. Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4) | 19:42 |
opendevmeet | Minutes: https://meetings.opendev.org/meetings/infra/2022/infra.2022-07-26-19.01.html | 19:42 |
opendevmeet | Minutes (text): https://meetings.opendev.org/meetings/infra/2022/infra.2022-07-26-19.01.txt | 19:42 |
opendevmeet | Log: https://meetings.opendev.org/meetings/infra/2022/infra.2022-07-26-19.01.log.html | 19:42 |
fungi | thanks! | 19:43 |