clarkb | Meeting time | 19:00 |
---|---|---|
clarkb | my network connection (I suspect local wifi) is acting up so forgive me if I disappear at random points | 19:00 |
ianw | o/ | 19:00 |
clarkb | #startmeeting infra | 19:01 |
opendevmeet | Meeting started Tue Jul 5 19:01:20 2022 UTC and is due to finish in 60 minutes. The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot. | 19:01 |
opendevmeet | Useful Commands: #action #agreed #help #info #idea #link #topic #startvote. | 19:01 |
opendevmeet | The meeting name has been set to 'infra' | 19:01 |
frickler | \o | 19:01 |
clarkb | #link https://lists.opendev.org/pipermail/service-discuss/2022-July/000343.html Our Agenda | 19:01 |
clarkb | The agenda did go out a bit late this time. Yesterday was a holiday and I spent more time outside than anticipated | 19:01 |
clarkb | But we do have an agenda I put together today :) | 19:02 |
clarkb | #topic Announcements | 19:02 |
clarkb | I'm going to miss our next meeting (July 12) | 19:02 |
fungi | i'll be around, but also skipping an occasional meeting is fine by me | 19:03 |
ianw | maybe see if anything drastic comes up by eow, i'll be around too | 19:04 |
clarkb | works for me | 19:04 |
clarkb | Also as of today the web server for our mailman lists redirects to https and should force its use | 19:05 |
clarkb | It seems to be working but making note of that since it is a change | 19:05 |
clarkb | #topic Topics | 19:06 |
clarkb | #topic Improving CD throughput | 19:06 |
clarkb | The fix for the zuul auto upgrade cron appears to have worked. All of the zuul components were running the same version as of early this morning and uptime was a few days | 19:07 |
clarkb | A new manually triggered restart is in progress because zuul's docker image just updated to python3.10 and we wanted to control observation of that | 19:07 |
clarkb | That too seems to be going well ( a couple executors are already running on python3.10) | 19:08 |
fungi | yay! automate all the things | 19:09 |
clarkb | Please call out if you notice anything unexpected with these changes. | 19:09 |
clarkb | #topic Improving Grafana management tooling | 19:10 |
clarkb | ianw: started a thread on this so that we can accumulate any concerns and thoughts there. | 19:11 |
clarkb | #link https://lists.opendev.org/pipermail/service-discuss/2022-July/000342.html | 19:11 |
clarkb | I'm hoping that we can keep the discussion on the mailing list (and in code reviews) as much as possible so that we can run down any issues and concerns without needing to remind ourselves what they are from previous meeting discussions | 19:12 |
fungi | the dashboard preview screenshots in build results are awesome, btw | 19:12 |
clarkb | All that to say, lets try not to dive into details in the meeting today. But if there are specific things that need synchronous comms now is a good time to bring them up. Otherwise please read the email and review the changes and respond there with feedback | 19:12 |
clarkb | ianw: before I continue on anything to call out? | 19:13 |
ianw | nope; all there (and end-to-end tested now :) | 19:14 |
clarkb | thank you for putting it together. I still need to read through it and look at the examples and all that myself | 19:15 |
clarkb | #topic Run a custom url shortener service | 19:15 |
clarkb | frickler: ^ anything new on this item? | 19:15 |
frickler | no, I think we should drop this from the agenda and I'll readd when I make progress | 19:15 |
clarkb | ok will do | 19:15 |
clarkb | #topic Zuul job POST_FAILURES | 19:16 |
clarkb | There are two changes related to this that may aid in debugging. | 19:16 |
clarkb | #link https://review.opendev.org/c/zuul/zuul/+/848014 Report POST_FAILURE job timing to graphite | 19:16 |
clarkb | This first one is probably less important, but zuul currently doesn't report job timing on POST_FAILUREs | 19:16 |
clarkb | having that info might help us better identify trends (like whether these failures are often timeout related) | 19:17 |
clarkb | The other records the swift upload target before we upload which means we should get that info even if the task times out | 19:17 |
clarkb | #link https://review.opendev.org/c/opendev/base-jobs/+/848027 Add remote log store location debugging info to base jobs | 19:17 |
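The idea behind the change above, sketched loosely in Python purely for illustration (the real change is an Ansible update to the base jobs; every name below is invented for the sketch):

```python
# Illustrative sketch only -- not the actual base-job code. The point is to
# record the upload destination *before* the upload starts, so the location
# is already in the captured job output even if the upload later times out.
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("upload-logs")

def upload_logs(files, log_store_url, uploader):
    # Logged first: a timeout inside uploader() can no longer hide where the
    # logs were headed.
    log.info("Uploading %d files to %s", len(files), log_store_url)
    for path in files:
        uploader(path, log_store_url)

# A no-op uploader stands in for the real swift client here.
upload_logs(["job-output.txt"], "https://swift.example.com/logs/abc123",
            lambda path, url: None)
```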
clarkb | This second change deserves care and in depth review to ensure we don't disclose anything we don't want to disclose. But I think the content of the change is safe now and constructed to avoid anything unexpected. jrosser even pasted an example that emulates what the change does to show this | 19:18 |
clarkb | if we are confident in that I think we can go ahead and land the base-test update and check that it does what we want before landing it on the proper base job | 19:18 |
clarkb | I hesitate to approve that myself as I'll be popping out in a day and a half, but if someone else is able to push that over the hump and monitor it that would be great | 19:19 |
fungi | thanks, i meant to follow up on that one. will look again after the meeting | 19:19 |
clarkb | thanks. | 19:19 |
fungi | i plan to be around all evening anyway | 19:19 |
clarkb | fungi: ya my main concern is being able to shepherd the main base job through when we are ready for it. | 19:19 |
clarkb | Other than these two changes I'm not aware of any other fixes beyond our suggestion to the projects to be more careful about what they log. | 19:20 |
clarkb | Avoid deep nesting, avoid duplicating log files either because the same file is copied multiple times by a job or because we're copying stuff off the base OS install that is always identical and so on | 19:20 |
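A quick local helper for spotting the duplication described above in a collected logs tree (a sketch, not part of any job; assumes the logs have already been downloaded to a local directory):

```python
# Sketch: hash every file under a downloaded logs tree and report files with
# identical contents, i.e. candidates for deduplication in the job's log
# collection.
import hashlib
import os
from collections import defaultdict

def find_duplicates(root):
    by_digest = defaultdict(list)
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            with open(path, "rb") as f:
                digest = hashlib.sha256(f.read()).hexdigest()
            by_digest[digest].append(path)
    return {d: paths for d, paths in by_digest.items() if len(paths) > 1}

for digest, paths in find_duplicates("logs").items():
    print(digest[:12], *paths)
```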
fungi | yeah, it's worth reiterating that the vast majority have occurred for two projects in the openstack tenant we know do massive amounts of logging, so are already pushing this to its limits | 19:21 |
clarkb | Is anyone else aware of any changes here? I suspect the problem may have self corrected a bit since there is less complaining about it. But we should be ready if it happens again | 19:21 |
fungi | it's already come and gone at least once | 19:21 |
fungi | so there's probably some transient environmental variable involved | 19:21 |
fungi | (api in some provider getting bogged down, network overloaded near our executors, et cetera) | 19:22 |
fungi | but having more data is the next step before we can refine our theories | 19:22 |
clarkb | ++ | 19:24 |
clarkb | #topic Bastion Host Updates | 19:24 |
clarkb | I wanted to follow up on this to make sure I wasn't missing any progress towards shifting things into a venv | 19:24 |
clarkb | ianw: ^ are there changes for that to call out yet? | 19:24 |
ianw | nope, haven't pushed those yet, will soon | 19:24 |
clarkb | great just double checking | 19:25 |
clarkb | Separately it has got me thinking we should look at bionic -> focal and/or jammy upgrades for servers. In theory this is a lot more straightforward now for anything that has been containerized | 19:25 |
clarkb | On Friday I ran into a fun situation with new setuptools and virtualenv on Jammy not actually creating proper venvs | 19:25 |
clarkb | You end up with a venv that if used installs to the root of the system | 19:26 |
clarkb | effectively a noop venv. Just binaries living at a different path | 19:26 |
ianw | oh there's your problem, you expected setuptools and virtualenv and venv to work :) | 19:26 |
clarkb | Calling that out as it may impact upgrades of services if they rely on virtualenvs on the root system. I think the vast majority of stuff is in containers and shouldn't be affected though | 19:26 |
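One way to sanity-check whether a venv is genuinely isolated rather than the noop case described above; run it with the venv's own interpreter:

```python
# Run with the venv's python (e.g. /path/to/venv/bin/python). In a working
# venv sys.prefix points at the venv directory and differs from
# sys.base_prefix; in the broken case installs end up under the system prefix.
import sys
import sysconfig

print("prefix:       ", sys.prefix)
print("base_prefix:  ", sys.base_prefix)
print("site-packages:", sysconfig.get_paths()["purelib"])
print("isolated venv:", sys.prefix != sys.base_prefix)
```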
clarkb | When I get back from this trip I would like to put together a todo list for these upgrades and start figuring out the scale and scope of them. | 19:27 |
fungi | sounds great | 19:27 |
clarkb | Anyway the weird jammy behavior and discussion about bridge upgrades with a new server or in place got me thinking about the broader need. I'll see what that looks like and put something together | 19:28 |
clarkb | #topic Open Discussion | 19:28 |
fungi | thanks! | 19:28 |
clarkb | That was it for the agenda I put together quickly today. Is there anything else to bring up? | 19:28 |
clarkb | ianw: I think you are working on adding a "proper" CA to our fake SSL certs in test jobs so that we can drop the curl --insecure flag? | 19:29 |
ianw | yes | 19:29 |
ianw | #link https://review.opendev.org/c/opendev/system-config/+/848562 | 19:29 |
ianw | pending zuul; i think that's ready for review now | 19:29 |
clarkb | was there a specific need for that or more just cleaning up the dirty --insecure flag | 19:29 |
ianw | (i only changed comments so i expect CI to work) | 19:29 |
ianw | it actually was a yak shaving exercise that came out of another change | 19:30 |
ianw | #link https://review.opendev.org/c/opendev/system-config/+/845316 | 19:30 |
ianw | (which is also ready for review :) | 19:30 |
fungi | for those who missed the status log addition, all our mailman sites are strictly https now. should be transparent, though clarkb remarked on a possible content cache problem returning stale contents, so keep an eye out | 19:30 |
ianw | that redirects the haproxy logs to a separate file instead of going into syslog on the loadbalancer | 19:30 |
frickler | I saw the xenial builds failing earlier. not sure how much effort to put into debugging and fixing or whether it could be time to retire them | 19:31 |
ianw | however, i wanted to improve end-to-end testing of that, but i'm 99% sure that we don't pass data through the load-balancer during the testing | 19:31 |
fungi | clarkb surmised it's old python stdlib not satisfying sni requirements for pypi | 19:31 |
clarkb | that's just a hunch. We should check if the builds consistently fail. But ya pausing them and putting people on notice about that may be a good idea | 19:32 |
ianw | builds as in nodepool builds? | 19:32 |
fungi | yeah | 19:32 |
clarkb | ianw: ya the nodepool dib builds fail in the chroot trying to install os-testr and failing to find a version of pbr that is good enough | 19:32 |
clarkb | that is behavior we've seen from distutils in the past when SNI is required by pypi but the installer doesn't speak it (which I think is true for xenial but can be double checked) | 19:32 |
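A quick way to test that hunch with the interpreter used inside the image chroot (ssl.HAS_SNI is long-standing stdlib, and modern PyPI endpoints do require SNI):

```python
# Run with the python that pip/easy_install uses inside the chroot. If the
# ssl stack there can't do SNI, requests to pypi.org won't return usable
# package data, which would match the "can't find a good enough pbr" symptom.
import ssl
print("SNI supported:", ssl.HAS_SNI)
print("OpenSSL:      ", ssl.OPENSSL_VERSION)
```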
ianw | ok i'll take a look. first thought is we had workarounds for that but who knows | 19:32 |
frickler | you can nicely spot this by the wiggly graph in https://grafana.opendev.org/d/f3089338b3/nodepool-dib-status?orgId=1 | 19:33 |
fungi | always be on the lookout for wiggly graphs, that's my motto | 19:33 |
clarkb | I do think we should be telling openstack that they should stop relying on the images at this point (we brought it up with them a while back but not sure much movement happened) and start removing our own uses of them | 19:33 |
ianw | ++ | 19:33 |
clarkb | another user was windmill which i have since removed from zuul's tenant config for other reasons | 19:34 |
fungi | i'll have to take a look and see when 18.04 lts last appeared in the openstack pti | 19:34 |
clarkb | fungi: 16.04 is xenial | 19:34 |
fungi | er, right | 19:34 |
fungi | so long ago is the answer to that | 19:34 |
ianw | (one of the things i played with, with the new grafana was a heatmap of dib builds -> https://68acf0be36b12a32a6a5-4c67d681abf13f527cb7d280eb684e4e.ssl.cf2.rackcdn.com/848212/3/check/project-config-grafana/24fe236/screenshots/nodepool-dib-status.png) | 19:35 |
fungi | the openstack pti is published all the way back to the stein release and says 18.04, so worst case it's used by stable/stein grenade jobs | 19:35 |
fungi | i'll check with elod for a second pair of eyes, but i expect we can yank it as far as openstack is officially concerned | 19:36 |
clarkb | fungi: I think frickler put together an etherpad of users | 19:36 |
fungi | oh, all the better | 19:36 |
clarkb | #link https://etherpad.opendev.org/p/ubuntu-xenial-jobs | 19:37 |
frickler | oh, I had forgotten about that :-) | 19:37 |
clarkb | I annotated it with notes at the time. We can probably push on some of those and remove the usages for things like gear | 19:37 |
clarkb | oh that also notes it is only a master branch audit | 19:38 |
fungi | in theory, rocky was the last release to use xenial since it was released august 2018 and we'd have had bionic images long available for stein testing | 19:38 |
clarkb | so ya old openstack may hide extra bits of it | 19:38 |
frickler | master only because I used codesearch, yes | 19:38 |
clarkb | But I do think we are very close to being able to remove it without too much fallout. Getting to that point would be great | 19:39 |
fungi | and grenade jobs aren't supported for branches in extended maintenance, so we should be long clear to drop xenial images in openstack | 19:39 |
frickler | I'll look into devstack(+gate) in more detail | 19:39 |
ianw | fungi: do you want to get back to me and i'm happy to drive an email to openstack with a date and then start working towards it | 19:39 |
fungi | ianw: absolutely, once i hear something from elod (which i expect to be a thumbs-up) i'll let you know | 19:40 |
clarkb | Anything else? to cover? | 19:40 |
ianw | ++, i can then draft something, and i think the way to get this moving is to give ourselves a self-imposed deadline | 19:40 |
fungi | openstack stable/wallaby is the oldest coordinated branch under maintenance now anyway | 19:40 |
clarkb | er I fail at punctuation | 19:40 |
clarkb | Sounds like that may be it. Thank you everyone. I'll let you go to enjoy your morning/evening/$meal :) | 19:43 |
corvus | thanks clarkb ! | 19:44 |
clarkb | Feel free to continue discussion in #opendev or on the mailing list if necessary | 19:44 |
clarkb | #endmeeting | 19:44 |
opendevmeet | Meeting ended Tue Jul 5 19:44:22 2022 UTC. Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4) | 19:44 |
opendevmeet | Minutes: https://meetings.opendev.org/meetings/infra/2022/infra.2022-07-05-19.01.html | 19:44 |
opendevmeet | Minutes (text): https://meetings.opendev.org/meetings/infra/2022/infra.2022-07-05-19.01.txt | 19:44 |
opendevmeet | Log: https://meetings.opendev.org/meetings/infra/2022/infra.2022-07-05-19.01.log.html | 19:44 |
fungi | thanks clarkb! | 19:44 |