Tuesday, 2022-07-05

07:39 *** kopecmartin_ is now known as kopecmartin
19:00 <clarkb> Meeting time
19:00 <clarkb> my network connection (I suspect local wifi) is acting up so forgive me if I disappear at random points
19:00 <ianw> o/
19:01 <clarkb> #startmeeting infra
19:01 <opendevmeet> Meeting started Tue Jul  5 19:01:20 2022 UTC and is due to finish in 60 minutes.  The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot.
19:01 <opendevmeet> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
19:01 <opendevmeet> The meeting name has been set to 'infra'
19:01 <frickler> \o
19:01 <clarkb> #link https://lists.opendev.org/pipermail/service-discuss/2022-July/000343.html Our Agenda
19:01 <clarkb> The agenda did go out a bit late this time. Yesterday was a holiday and I spent more time outside than anticipated
19:02 <clarkb> But we do have an agenda I put together today :)
19:02 <clarkb> #topic Announcements
19:02 <clarkb> I'm going to miss our next meeting (July 12)
19:03 <fungi> i'll be around, but also skipping an occasional meeting is fine by me
19:04 <ianw> maybe see if anything drastic comes up by eow, i'll be around too
19:04 <clarkb> works for me
19:05 <clarkb> Also as of today the web server for our mailman lists redirects to https and should force its use
19:05 <clarkb> It seems to be working, but making note of it since it is a change
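
A quick way to spot-check that redirect behaviour, as a sketch (assuming the public list host lists.opendev.org still answers plain HTTP on port 80 and replies with a redirect):

    # Sketch: confirm plain-HTTP requests to the mailman web server get
    # redirected to HTTPS. The hostname is the public one for our lists.
    import http.client

    conn = http.client.HTTPConnection("lists.opendev.org", 80, timeout=10)
    conn.request("GET", "/")
    resp = conn.getresponse()
    location = resp.getheader("Location", "")
    print(resp.status, location)
    assert resp.status in (301, 302, 308) and location.startswith("https://")
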
19:06 <clarkb> #topic Topics
19:06 <clarkb> #topic Improving CD throughput
19:07 <clarkb> The fix for the zuul auto upgrade cron appears to have worked. All of the zuul components were running the same version as of early this morning and uptime was a few days
19:07 <clarkb> A new manually triggered restart is in progress because zuul's docker image just updated to python3.10 and we wanted to control observation of that
19:08 <clarkb> That too seems to be going well (a couple executors are already running on python3.10)
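
The "everything on the same version" check can be scripted against the Zuul web API; a rough sketch (the /api/components endpoint and its JSON shape are assumptions here, not something stated in the meeting):

    # Sketch: list versions reported by running Zuul components, assuming
    # the API returns {"executor": [{"hostname": ..., "version": ...}], ...}
    import json
    import urllib.request

    with urllib.request.urlopen("https://zuul.opendev.org/api/components") as resp:
        components = json.load(resp)

    versions = {
        entry.get("version")
        for entries in components.values()
        for entry in entries
    }
    print("versions in service:", versions)
    if len(versions) > 1:
        print("not all components are on the same version yet")
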
19:09 <fungi> yay! automate all the things
19:09 <clarkb> Please call out if you notice anything unexpected with these changes.
19:10 <clarkb> #topic Improving Grafana management tooling
19:11 <clarkb> ianw: started a thread on this so that we can accumulate any concerns and thoughts there.
19:11 <clarkb> #link https://lists.opendev.org/pipermail/service-discuss/2022-July/000342.html
19:12 <clarkb> I'm hoping that we can keep the discussion on the mailing list (and in code reviews) as much as possible so that we can run down any issues and concerns without needing to remind ourselves what they are from previous meeting discussions
19:12 <fungi> the dashboard preview screenshots in build results are awesome, btw
19:12 <clarkb> All that to say, let's try not to dive into details in the meeting today. But if there are specific things that need synchronous comms, now is a good time to bring them up. Otherwise please read the email and review the changes and respond there with feedback
19:13 <clarkb> ianw: before I continue, anything to call out?
19:14 <ianw> nope; all there (and end-to-end tested now :)
19:15 <clarkb> thank you for putting it together. I still need to read through it and look at the examples and all that myself
19:15 <clarkb> #topic Run a custom url shortener service
19:15 <clarkb> frickler: ^ anything new on this item?
19:15 <frickler> no, I think we should drop this from the agenda and I'll re-add it when I make progress
19:15 <clarkb> ok will do
19:16 <clarkb> #topic Zuul job POST_FAILURES
19:16 <clarkb> There are two changes related to this that may aid in debugging.
19:16 <clarkb> #link https://review.opendev.org/c/zuul/zuul/+/848014 Report POST_FAILURE job timing to graphite
19:16 <clarkb> This first one is probably less important, but zuul currently doesn't report job timing on POST_FAILUREs
19:17 <clarkb> having that info might help us better identify trends (like whether these failures are often timeout related)
19:17 <clarkb> The other records the swift upload target before we upload, which means we should get that info even if the task times out
19:17 <clarkb> #link https://review.opendev.org/c/opendev/base-jobs/+/848027 Add remote log store location debugging info to base jobs
19:18 <clarkb> This second change deserves care and in-depth review to ensure we don't disclose anything we don't want to disclose. But I think the content of the change is safe now and constructed to avoid anything unexpected. jrosser even pasted an example that emulates what the change does to show this
19:18 <clarkb> if we are confident in that I think we can go ahead and land the base-test update and check that it does what we want before landing it on the proper base job
19:19 <clarkb> I hesitate to approve that myself as I'll be popping out in a day and a half, but if someone else is able to push that over the hump and monitor it that would be great
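
The idea behind 848027, very roughly, is to record where the logs are headed before the upload starts so a later timeout still leaves a breadcrumb. A purely illustrative sketch (not the actual Ansible in base-jobs; the helper and names are made up):

    # Sketch: log the swift endpoint/container/prefix before uploading so a
    # POST_FAILURE or timeout still tells us which log store was in play.
    # Never log credentials, only the location.
    import logging
    import sys

    logging.basicConfig(stream=sys.stdout, level=logging.INFO)
    log = logging.getLogger("upload-logs")


    def upload_logs(endpoint, container, prefix, files):
        log.info("uploading %d files to %s container %s prefix %s",
                 len(files), endpoint, container, prefix)
        for path in files:
            do_upload(endpoint, container, prefix, path)  # hypothetical helper


    def do_upload(endpoint, container, prefix, path):
        pass  # placeholder for the real swift client call
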
19:19 <fungi> thanks, i meant to follow up on that one. will look again after the meeting
19:19 <clarkb> thanks.
19:19 <fungi> i plan to be around all evening anyway
19:19 <clarkb> fungi: ya my main concern is being able to shepherd the main base job through when we are ready for it.
19:20 <clarkb> Other than these two changes I'm not aware of any other fixes, other than our suggestion to the projects to be more careful about what they log.
19:20 <clarkb> Avoid deep nesting, avoid duplicating log files either because the same file is copied multiple times by a job or because we're copying stuff off the base OS install that is always identical, and so on
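
As an example of the kind of pre-flight check a job could run over its log directory before collecting (illustrative only, not an existing role):

    # Sketch: find duplicate files in a log directory by content hash so a
    # job can skip uploading the same bytes more than once.
    import hashlib
    from pathlib import Path


    def find_duplicates(logdir):
        seen = {}
        dupes = []
        for path in sorted(Path(logdir).rglob("*")):
            if not path.is_file():
                continue
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            if digest in seen:
                dupes.append((path, seen[digest]))
            else:
                seen[digest] = path
        return dupes


    for dupe, original in find_duplicates("logs"):  # "logs" is an example path
        print(f"{dupe} duplicates {original}")
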
19:21 <fungi> yeah, it's worth reiterating that the vast majority have occurred for two projects in the openstack tenant we know do massive amounts of logging, so are already pushing this to its limits
19:21 <clarkb> Is anyone else aware of any changes here? I suspect the problem may have self-corrected a bit since there is less complaining about it. But we should be ready if it happens again
19:21 <fungi> it's already come and gone at least once
19:21 <fungi> so there's probably some transient environmental variable involved
19:22 <fungi> (api in some provider getting bogged down, network overloaded near our executors, et cetera)
19:22 <fungi> but having more data is the next step before we can refine our theories
19:24 <clarkb> ++
19:24 <clarkb> #topic Bastion Host Updates
19:24 <clarkb> I wanted to follow up on this to make sure I wasn't missing any progress towards shifting things into a venv
19:24 <clarkb> ianw: ^ are there changes for that to call out yet?
19:24 <ianw> nope, haven't pushed those yet, will soon
19:25 <clarkb> great, just double checking
19:25 <clarkb> Separately it has got me thinking we should look at bionic -> focal and/or jammy upgrades for servers. In theory this is a lot more straightforward now for anything that has been containerized
19:25 <clarkb> On Friday I ran into a fun situation on Jammy with new setuptools and virtualenv not actually creating proper venvs
19:26 <clarkb> You end up with a venv that, if used, installs to the root of the system
19:26 <clarkb> effectively a noop venv. Just binaries living at a different path
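
A quick way to tell whether a freshly created venv is actually isolated is to ask the interpreter inside it where packages would land; a sketch (the venv path is just an example):

    # Sketch: a broken venv reports a system site-packages path instead of
    # one under the venv directory.
    import subprocess

    venv = "/tmp/testenv"  # example path
    out = subprocess.run(
        [f"{venv}/bin/python", "-c",
         "import sys, sysconfig; "
         "print(sys.prefix); print(sysconfig.get_path('purelib'))"],
        capture_output=True, text=True, check=True,
    ).stdout.splitlines()

    prefix, purelib = out
    print("prefix: ", prefix)
    print("purelib:", purelib)
    if not purelib.startswith(venv):
        print("this venv would install packages outside of itself")
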
19:26 <ianw> oh there's your problem, you expected setuptools and virtualenv and venv to work :)
19:26 <clarkb> Calling that out as it may impact upgrades of services if they rely on virtualenvs on the root system. I think the vast majority of stuff is in containers and shouldn't be affected though
19:27 <clarkb> When I get back from this trip I would like to put together a todo list for these upgrades and start figuring out the scale and scope of them.
19:27 <fungi> sounds great
19:28 <clarkb> Anyway the weird jammy behavior and discussion about bridge upgrades (with a new server or in place) got me thinking about the broader need. I'll see what that looks like and put something together
19:28 <clarkb> #topic Open Discussion
19:28 <fungi> thanks!
19:28 <clarkb> That was it for the agenda I put together quickly today. Is there anything else to bring up?
19:29 <clarkb> ianw: I think you are working on adding a "proper" CA to our fake SSL certs in test jobs so that we can drop the curl --insecure flag?
19:29 <ianw> yes
19:29 <ianw> #link https://review.opendev.org/c/opendev/system-config/+/848562
19:29 <ianw> pending zuul; i think that's ready for review now
19:29 <clarkb> was there a specific need for that or more just cleaning up the dirty --insecure flag
19:29 <ianw> (i only changed comments so i expect CI to work)
19:30 <ianw> it actually was a yak shaving exercise that came out of another change
19:30 <ianw> #link https://review.opendev.org/c/opendev/system-config/+/845316
19:30 <ianw> (which is also ready for review :)
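
For context on what dropping --insecure means in practice, a small sketch of trusting the locally generated test CA explicitly instead of turning verification off (the CA path and URL are placeholders, not what the change actually uses):

    # Sketch: verify a test service against a local CA bundle rather than
    # disabling TLS verification.
    import ssl
    import urllib.request

    ctx = ssl.create_default_context(cafile="/etc/ssl/test-ca/ca.crt")  # placeholder path
    with urllib.request.urlopen("https://localhost/", context=ctx, timeout=10) as resp:
        print(resp.status, resp.reason)
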
19:30 <fungi> for those who missed the status log addition, all our mailman sites are strictly https now. should be transparent, though clarkb remarked on a possible content cache problem returning stale contents, so keep an eye out
19:30 <ianw> that redirects the haproxy logs to a separate file instead of going into syslog on the loadbalancer
19:31 <frickler> I saw the xenial builds failing earlier. not sure how much effort to put into debugging and fixing or whether it could be time to retire them
19:31 <ianw> however, i wanted to improve end-to-end testing of that, but i'm 99% sure that we don't pass data through the load-balancer during the testing
19:31 <fungi> clarkb surmised it's old python stdlib not satisfying sni requirements for pypi
19:32 <clarkb> that's just a hunch. We should check if the builds consistently fail. But ya pausing them and putting people on notice about that may be a good idea
19:32 <ianw> builds as in nodepool builds?
19:32 <fungi> yeah
19:32 <clarkb> ianw: ya the nodepool dib builds fail in the chroot trying to install os-testr and failing to find a version of pbr that is good enough
19:32 <clarkb> that is behavior we've seen from distutils in the past when SNI is required by pypi but the installer doesn't speak it (which I think is true for xenial but can be double checked)
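
That SNI hunch can be tested directly with the interpreter inside the chroot; a sketch:

    # Sketch: check whether this interpreter's ssl stack advertises SNI and
    # can complete a TLS handshake with pypi.org.
    import socket
    import ssl

    print("HAS_SNI:", getattr(ssl, "HAS_SNI", False))

    ctx = ssl.create_default_context()
    try:
        with socket.create_connection(("pypi.org", 443), timeout=10) as sock:
            with ctx.wrap_socket(sock, server_hostname="pypi.org") as tls:
                print("handshake ok:", tls.version())
    except (ssl.SSLError, OSError) as exc:
        print("handshake failed:", exc)
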
19:32 <ianw> ok i'll take a look.  first thought is we had workarounds for that but who knows
19:33 <frickler> you can nicely spot this by the wiggly graph in https://grafana.opendev.org/d/f3089338b3/nodepool-dib-status?orgId=1
19:33 <fungi> always be on the lookout for wiggly graphs, that's my motto
19:33 <clarkb> I do think we should be telling openstack that they should stop relying on the images at this point (we brought it up with them a while back but not sure much movement happened) and start removing our own uses of them
19:33 <ianw> ++
19:34 <clarkb> another user was windmill which i have since removed from zuul's tenant config for other reasons
19:34 <fungi> i'll have to take a look and see when 18.04 lts last appeared in the openstack pti
19:34 <clarkb> fungi: 16.04 is xenial
19:34 <fungi> er, right
19:34 <fungi> so long ago is the answer to that
19:35 <ianw> (one of the things i played with, with the new grafana was a heatmap of dib builds -> https://68acf0be36b12a32a6a5-4c67d681abf13f527cb7d280eb684e4e.ssl.cf2.rackcdn.com/848212/3/check/project-config-grafana/24fe236/screenshots/nodepool-dib-status.png)
19:35 <fungi> the openstack pti is published all the way back to the stein release and says 18.04, so worst case it's used by stable/stein grenade jobs
19:36 <fungi> i'll check with elod for a second pair of eyes, but i expect we can yank it as far as openstack is officially concerned
19:36 <clarkb> fungi: I think frickler put together an etherpad of users
19:36 <fungi> oh, all the better
19:37 <clarkb> #link https://etherpad.opendev.org/p/ubuntu-xenial-jobs
19:37 <frickler> oh, I had forgotten about that :-)
19:37 <clarkb> I annotated it with notes at the time. We can probably push on some of those and remove the usages for things like gear
19:38 <clarkb> oh that also notes it is only a master branch audit
19:38 <fungi> in theory, rocky was the last release to use xenial since it was released august 2018 and we'd have had bionic images long available for stein testing
19:38 <clarkb> so ya old openstack may hide extra bits of it
19:38 <frickler> master only because I used codesearch, yes
19:39 <clarkb> But I do think we are very close to being able to remove it without too much fallout. Getting to that point would be great
19:39 <fungi> and grenade jobs aren't supported for branches in extended maintenance, so we should be long clear to drop xenial images in openstack
19:39 <frickler> I'll look into devstack(+gate) in more detail
19:39 <ianw> fungi: do you want to get back to me and i'm happy to drive an email to openstack with a date and then start working towards it
19:40 <fungi> ianw: absolutely, once i hear something from elod (which i expect to be a thumbs-up) i'll let you know
19:40 <clarkb> Anything else? to cover?
19:40 <ianw> ++, i can then draft something, and i think the way to get this moving is to give ourselves a self-imposed deadline
19:40 <fungi> openstack stable/wallaby is the oldest coordinated branch under maintenance now anyway
19:40 <clarkb> er I fail at punctuation
19:43 <clarkb> Sounds like that may be it. Thank you everyone. I'll let you go to enjoy your morning/evening/$meal :)
19:44 <corvus> thanks clarkb !
19:44 <clarkb> Feel free to continue discussion in #opendev or on the mailing list if necessary
19:44 <clarkb> #endmeeting
19:44 <opendevmeet> Meeting ended Tue Jul  5 19:44:22 2022 UTC.  Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4)
19:44 <opendevmeet> Minutes:        https://meetings.opendev.org/meetings/infra/2022/infra.2022-07-05-19.01.html
19:44 <opendevmeet> Minutes (text): https://meetings.opendev.org/meetings/infra/2022/infra.2022-07-05-19.01.txt
19:44 <opendevmeet> Log:            https://meetings.opendev.org/meetings/infra/2022/infra.2022-07-05-19.01.log.html
19:44 <fungi> thanks clarkb!
