Tuesday, 2022-03-01

15:47 *** Guest721 is now known as diablo_rojo_phone
19:00 <clarkb> hello it is meeting time
19:00 <frickler> o/
19:00 <fungi> ohai
19:01 <clarkb> #startmeeting infra
19:01 <opendevmeet> Meeting started Tue Mar  1 19:01:06 2022 UTC and is due to finish in 60 minutes.  The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot.
19:01 <opendevmeet> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
19:01 <opendevmeet> The meeting name has been set to 'infra'
19:01 <clarkb> #link https://lists.opendev.org/pipermail/service-discuss/2022-March/000323.html Our Agenda
19:01 <clarkb> #topic Announcements
19:01 <clarkb> First up, I won't be able to run next week's meeting as I have other meetings
19:01 <ianw> o/
19:02 <fungi> i expect to be in a similar situation
19:02 <clarkb> I've proposed we skip it in the agenda, but if others want a meeting feel free to update the agenda and send it out. I just won't be able to participate
19:02 <frickler> I won't be traveling, but I do have a holiday, so fine with skipping
19:02 <ianw> i can host it, but if fungi is out too it's probably ok to skip
19:03 <clarkb> cool, consider it skipped then
19:03 <clarkb> #topic Actions from last meeting
19:03 <clarkb> #link http://eavesdrop.openstack.org/meetings/infra/2022/infra.2022-02-22-19.01.txt minutes from last meeting
19:03 <clarkb> There were no actions recorded
19:03 <clarkb> #topic Topics
19:03 <clarkb> Time to dive in
19:03 <clarkb> #topic Improving CD throughput
19:03 <clarkb> ianw: Did all the log changes end up landing?
19:04 <clarkb> that info is now available to us via zuul if we add our gpg keys, yeah?
19:04 <ianw> not yet
19:04 <ianw> #link https://review.opendev.org/c/opendev/system-config/+/830784
19:04 <ianw> is the one that turns it on globally -- i was hoping for some extra reviews on that, but i'm happy to babysit it
19:05 <ianw> #link https://review.opendev.org/c/opendev/system-config/+/830785
19:05 <clarkb> ah
19:05 <clarkb> I guess I saw it only on the codesearch jobs
19:05 <ianw> is a doc update that turned into quite a bit more than just how to add your gpg keys, but it's in there
19:05 <clarkb> ya, seemed like good improvements to the docs
19:06 <ianw> i felt like we were all ok with it, but didn't want to single-approve 830784 just in case
19:06 <clarkb> ++
19:06 <clarkb> Good to get lots of eyeballs on changes like this
19:07 <clarkb> Anything else on this topic?
19:07 <fungi> oh, i totally missed 830785, thanks
19:08 <fungi> i approved 830784 now
19:08 <clarkb> #topic Container Maintenance
19:08 <ianw> nope, that's it for now, thanks
19:09 <clarkb> jentoio and I met up last week and discussed what needs to be done to give containers dedicated users. We decided to start by updating the insecure CI registry since it doesn't write to the fs
19:09 <clarkb> That will help get the shape of things in place before we tackle some of the more complicated containers
19:09 <jentoio> I'll be working on zuul-registry this afternoon
19:09 <clarkb> jentoio: thanks again!
19:10 <jentoio> finally allocated some time to focus on it
19:10 <clarkb> so ya, some progress here. Reviews appreciated once we have changes
19:11 <clarkb> #topic Cleaning Up Old Reviews
19:11 <clarkb> #link https://review.opendev.org/q/topic:retirement+status:open Changes to retire all the unused repos.
19:11 <clarkb> We're basically at the point of actually retiring content from the repos. I would appreciate it if we could start picking some of those off. I don't think they are difficult reviews; you should be able to fetch them, ls the repo contents to make sure nothing got left behind, and check the README content. But there are a lot of them
19:12 <clarkb> Once those changes land I'll do bulk abandons for open changes on those repos and then push up the zuul removal cleanups
19:12 <clarkb> as well as gerrit acl updates
19:12 <ianw> ++ sorry, i meant to look at them, will do
19:13 <clarkb> thanks!
19:13 <fungi> yeah, same. i'm going to blame the fact that i had unsubbed from those repos in gertty a while back
19:14 <clarkb> #topic Gitea 1.16
19:15 <clarkb> I think it is worth waiting until 1.16.3 releases and then making a push to upgrade
19:15 <clarkb> https://github.com/go-gitea/gitea/milestone/113 indicates the 1.16.3 release should happen soon. Only one remaining issue left to address
19:15 <clarkb> The reason for this is that the issues we've discovered (diff rendering of images and build inconsistencies with a dep) should both be fixed by 1.16.3
19:16 <clarkb> I think we can hold off on reviewing things for now until 1.16.3 happens. I'll get a change pushed for that and update our hold to inspect it
19:16 <clarkb> Mostly a heads up that I haven't forgotten about this, but waiting until it stabilizes a bit more
19:17 <clarkb> #topic Rocky Linux
19:17 <clarkb> #link https://review.opendev.org/c/zuul/nodepool/+/831108 Need new dib for next steps
19:17 <clarkb> This is a nodepool change that will upgrade dib in our builder images
19:17 <clarkb> This new dib should address new issues found with Rocky Linux builds
19:18 <clarkb> Please review that if you get a chance
19:18 <clarkb> also keep an eye out for any unexpected new glean behavior since it updated a couple of days ago
19:18 <clarkb> #topic Removing airship-citycloud nodepool provider
19:19 <clarkb> This morning I woke up to an email that this provider is going away for us today
19:19 <clarkb> The first two changes to remove it have landed. After this meeting I need to check if nodepool is complaining about a node in that provider that wouldn't delete. I'll run nodepool erase airship-citycloud if so
19:19 <fungi> ericsson had been paying for that citycloud account in support of airship testing, but has decided to discontinue it
19:20 <clarkb> right
19:20 <clarkb> Once nodepool is done we can land the system-config and dns update changes. Then we can delete the mirror node if it hasn't gone away already
19:20 <frickler> do we need to inform the airship team or do they know already?
19:20 <clarkb> frickler: this was apparently discussed with them months ago. They just neglected to tell us :/
19:21 <frickler> ah, o.k.
19:21 <clarkb> I'll keep driving this along today to help ensure it doesn't cause problems for us.
19:21 <clarkb> Thank you for the help with reviews.
19:22 <clarkb> Oh, the mirror node is already in the emergency file to avoid ansible errors if they remove it quicker than we can take it out of our inventory
19:23 <clarkb> #topic zuul-registry bugs
19:24 <clarkb> We've discovered some zuul-registry bugs. Specifically, the way it handles concurrent uploads of the same image layer blobs is broken
19:24 <clarkb> What happens is the second upload notices that the first one is running and exits early. When it exits early, that second uploading client HEADs the object to get its size back (even though it already knows the size) and we get a short read because the first upload isn't completed yet
19:25 <clarkb> then when the second client uses that short size to push a manifest using that blob, it errors because the server validates the input sizes
19:25 <clarkb> #link https://review.opendev.org/c/zuul/zuul-registry/+/831235 Should address these bugs. Please monitor container jobs after this lands.
19:25 <ianw> ++ will review.  i got a little sidetracked with podman issues that got uncovered too
19:25 <clarkb> This change addresses that by forcing all uploads to run to their own completion. Then we handle the data such that it should always be valid when read (using atomic moves on the filesystem and swift eventual consistency stuff)
19:26 <clarkb> There is also a child change that makes the testing of zuul-registry much more concurrent to try and catch these issues
19:26 <clarkb> it caught problems with one of my first attempts at fixing this, so it seems to be helpful already
19:26 <clarkb> In general this entire problem is avoided if you don't push the same image blobs at the same time, which is why we likely haven't noticed until recently
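
As an aside, the atomic-move pattern clarkb mentions can be illustrated with a minimal Python sketch (this assumes a local-filesystem backend and uses hypothetical names; it is not zuul-registry's actual code). The point is that a blob only appears under its final name once it is complete, so readers never observe a short object:

    import os
    import tempfile

    def finalize_blob(upload_chunks, blob_path):
        """Assemble an upload, then atomically publish it at blob_path."""
        blob_dir = os.path.dirname(blob_path)
        os.makedirs(blob_dir, exist_ok=True)
        # Write to a temporary file in the same directory so the final
        # rename stays on one filesystem and os.replace() is atomic.
        fd, tmp_path = tempfile.mkstemp(dir=blob_dir)
        with os.fdopen(fd, "wb") as tmp:
            for chunk in upload_chunks:
                tmp.write(chunk)
        # Readers either see no blob or the complete blob, never a
        # partially written file.
        os.replace(tmp_path, blob_path)
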
19:27 <clarkb> #topic PTG Planning
19:28 <clarkb> I was reminded that the deadline for teams signing up for the PTG is approaching, in about 10 days
19:28 <clarkb> My initial thought was that we could skip this one since the last one was very quiet for us
19:28 <clarkb> But if there is interest in having a block of time or two, let me know and I'm happy to sign up and manage that for us
19:28 <fungi> we've not really had any takers on our office hours sessions in the past
19:29 <clarkb> yup. I think if we did get time this time around we should dedicate it to things we want to cover and not do office hours
19:31 <clarkb> Anyway, no rush on that decision. Let me know in the next week or so and I can get us signed up if we like. But happy to avoid it. I know some of us end up in a lot of other sessions, so having less opendev stuff might be helpful there too
19:31 <clarkb> #topic Open Discussion
19:32 <clarkb> That was what I had on the agenda. Anything else?
19:32 <frickler> any idea regarding the buster ensure-pip issue?
19:33 <clarkb> I've managed to miss this issue.
19:33 <frickler> if you missed it earlier: we break installing tox on py2 with our self-built wheels
19:33 <ianw> frickler: sorry, i only briefly saw your notes, but haven't dug into it yet
19:34 <frickler> because the wheel gets built as a *py2.py3* wheel and we don't serve the metadata to let pip know that it really only works on >=py3.6
19:34 <frickler> we've seen similar issues multiple times and usually worked around them by capping the associated pkgs
19:35 <clarkb> is that because tox doesn't set those flags upstream properly?
19:35 <clarkb> which results in downstream artifacts being wrong?
19:35 <frickler> no, we only serve the file, so pip chooses the package based only on the filename
19:36 <frickler> pluggy-1.0.0-py2.py3-none-any.whl
19:37 <clarkb> right, but when we build it, it shouldn't build a py2 wheel if upstream sets their metadata properly iirc
19:37 <ianw> (i've had a spec out to add metadata but never got around to it ... but it has details https://review.opendev.org/c/opendev/infra-specs/+/703916/1/specs/wheel-modernisation.rst)
19:37 <frickler> of course it is also a bug in that package to not specify the proper settings
19:37 <fungi> yeah, a big part of the problem is that our wheel builder solution assumes that since a wheel can be built on a platform it can be used on it, so we build wheels with python3 and then python2 picks them up
19:37 <frickler> but we can't easily fix that
19:37 <ianw> "Making PEP503 compliant indexes"
19:37 <fungi> and unlike the simple api, we have no way to tell pip those wheels aren't suitable for python2
19:38 <clarkb> ya, so my concern is that if upstream is broken it's hard for us to prevent this. We could address it after the fact one way or another, but if we rely on upstream to specify versions and they get them wrong we'll build bad wheels
19:39 <fungi> yeah, we'd basically need to emulate the pypi simple api, and mark the entries as appropriate only for the python versions they were built with, if we're supporting multiple interpreter versions on a single platform
19:39 <fungi> note this could also happen between different python3 versions if we made multiple versions of python3 available on the same platform, some of which weren't sufficient for wheels we cache for those platforms
19:40 <fungi> i suppose an alternative would be to somehow override universalness, and force them to build specific cp27 or cp36 abis
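
To make the filename problem concrete: everything pip can learn from a bare file listing is encoded in the wheel's compatibility tags, which for this file claim py2 support. A small sketch using the packaging library (an illustration, not part of the wheel builder) shows what pip sees, and what it cannot see:

    from packaging.utils import parse_wheel_filename

    name, version, build, tags = parse_wheel_filename(
        "pluggy-1.0.0-py2.py3-none-any.whl")
    print(name, version)                 # pluggy 1.0.0
    print(sorted(str(t) for t in tags))  # ['py2-none-any', 'py3-none-any']
    # The Requires-Python >=3.6 restriction lives in the wheel's METADATA
    # and in the simple API's data-requires-python attribute; neither is
    # visible from the filename alone, so a py2 pip happily selects it.
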
19:41 <frickler> apart from fixing the general issue, we should also discuss how to short-term fix the zuul-jobs builds
19:41 <clarkb> not all wheels support that iirc, as your C linking is more limited
19:41 <fungi> could be as simple as renaming the files
19:41 <clarkb> frickler: I think my preference for the short term would be to pin the dependency if we know it cannot work
19:41 <clarkb> s/pin/cap/
19:41 <frickler> https://review.opendev.org/c/zuul/zuul-jobs/+/831136 is what is currently blocked
19:42 <frickler> well, the failure happens in our test which does a simple "pip install tox"
19:42 <frickler> we could add "pluggy\<1" to it globally or add a constraints file that does this only for py2.7
19:42 <frickler> I haven't been able to get the latter to work directly without a file
19:43 <frickler> the other option that I've proposed would be to not use the wheel mirror on buster
19:44 <clarkb> ya, I think pip install tox pluggy\<1 is a good way to address it
19:44 <clarkb> then we can address mirroring without it being an emergency
19:44 <fungi> ahh, so pluggy is a dependency of tox?
19:45 <frickler> at least until the latest tox requires pluggy>=1
19:45 <frickler> fungi: yes
19:47 <frickler> o.k., I'll propose a patch for that
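
For reference, a constraints-based fix along the lines frickler describes might look like this (a sketch using a PEP 508 environment marker; the actual patch may differ):

    # constraints.txt (hypothetical): only cap pluggy on python2
    pluggy<1; python_version < "3"

The test would then run something like pip install -c constraints.txt tox, leaving py3 installs unaffected.
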
19:48 <frickler> another thing: I've been considering whether some additions to nodepool might be useful
19:48 <clarkb> like preinstalling tox?
19:48 <frickler> not sure whether that would better be discussed in #zuul, but I wanted to mention them here
19:48 <clarkb> I think we said we'd be ok with that sort of thing as long as we stashed things in virtualenvs to avoid conflicts with the system
19:49 <clarkb> ya, feel free to bring it up here. We can always take it to #zuul later
19:49 <frickler> no, unrelated to that issue. although pre-provisioning nodes might also be interesting
19:49 <frickler> these came up while I'm setting up a local zuul environment to test our (osism) kolla deployments
19:50 <frickler> currently we use terraform and deploy nodes with fixed IPs and with additional volumes
19:50 <ianw> (note we do install a tox venv in nodepool -> https://opendev.org/openstack/project-config/src/branch/master/nodepool/elements/infra-package-needs/install.d/40-install-tox)
19:51 <clarkb> ianw: thanks
19:51 <frickler> so in particular the additional volumes should be easily done in nodepool, just mark them to be deleted together with the server
19:51 <frickler> so then after creation and attaching, nodepool can forget about them
19:51 <clarkb> frickler: ya, there has been talk about managing additional resources in nodepool. The problem is you have to track those resources just like you track instances, because they can be leaked or have "alien" counterparts too
19:52 <clarkb> Unfortunately nodepool can't completely forget due to the leaking
19:52 <clarkb> (cinder is particularly bad about leaking resources)
19:52 <clarkb> But if nodepool were modified to track extra resources that would be doable.
19:53 <clarkb> And I agree this is probably a better discussion for #zuul as it may require some design
19:53 <clarkb> in particular we'll want to think about what this looks like for EBS volumes and other clouds too
19:54 <frickler> o.k., so I need to find some stable homeserver and then move over to #zuul, fine
19:54 <fungi> i don't think cinder was particularly bad about leaking resources, just that the nova api was particularly indifferent to whether cinder successfully deleted things
19:55 <frickler> do you know if someone already tried this, or was it just some idea?
19:55 <clarkb> frickler: spamaps had talked about it at one time but I think it was mostly in the idea stage
19:58 <frickler> o.k., great to hear that I'm not the only one with weird ideas ;)
19:58 <frickler> that would be it from me for now
19:58 <clarkb> We are just about at time. Thank you everyone for joining. Remember, no meeting next week. We'll see you in two weeks.
20:00 <clarkb> #endmeeting
20:00 <opendevmeet> Meeting ended Tue Mar  1 20:00:33 2022 UTC.  Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4)
20:00 <opendevmeet> Minutes:        https://meetings.opendev.org/meetings/infra/2022/infra.2022-03-01-19.01.html
20:00 <opendevmeet> Minutes (text): https://meetings.opendev.org/meetings/infra/2022/infra.2022-03-01-19.01.txt
20:00 <opendevmeet> Log:            https://meetings.opendev.org/meetings/infra/2022/infra.2022-03-01-19.01.log.html
23:32 *** clarkb is now known as Guest985
23:41 *** Guest985 is now known as clarkb
