*** Guest721 is now known as diablo_rojo_phone | 15:47 | |
clarkb | hello it is meeting time | 19:00 |
---|---|---|
frickler | o/ | 19:00 |
fungi | ohai | 19:00 |
clarkb | #startmeeting infra | 19:01 |
opendevmeet | Meeting started Tue Mar 1 19:01:06 2022 UTC and is due to finish in 60 minutes. The chair is clarkb. Information about MeetBot at http://wiki.debian.org/MeetBot. | 19:01 |
opendevmeet | Useful Commands: #action #agreed #help #info #idea #link #topic #startvote. | 19:01 |
opendevmeet | The meeting name has been set to 'infra' | 19:01 |
clarkb | #link https://lists.opendev.org/pipermail/service-discuss/2022-March/000323.html Our Agenda | 19:01 |
clarkb | #topic Announcements | 19:01 |
clarkb | First up I won't be able to run next week's meeting as I have other meetings | 19:01 |
ianw | o/ | 19:01 |
fungi | i expect to be in a similar situation | 19:02 |
clarkb | I've proposed we skip it in the agenda, but if others want a meeting feel free to update the agenda and send it out. I just won't be able to participate | 19:02 |
frickler | I won't travel but have a holiday, so fine with skipping | 19:02 |
ianw | i can host it, but if fungi is out too probably ok to skip | 19:02 |
clarkb | cool consider it skipped then | 19:03 |
clarkb | #topic Actions from last meeting | 19:03 |
clarkb | #link http://eavesdrop.openstack.org/meetings/infra/2022/infra.2022-02-22-19.01.txt minutes from last meeting | 19:03 |
clarkb | There were no actions recorded | 19:03 |
clarkb | #topic Topics | 19:03 |
clarkb | Time to dive in | 19:03 |
clarkb | #topic Improving CD throughput | 19:03 |
clarkb | ianw: Did all the logs changes end up landing? | 19:03 |
clarkb | that info is now available to us via zuul if we add our gpg keys ya? | 19:04 |
ianw | not yet | 19:04 |
ianw | #link https://review.opendev.org/c/opendev/system-config/+/830784 | 19:04 |
ianw | is the one that turns it on globally -- i was hoping for some extra reviews on that but i'm happy to babysit it | 19:04 |
ianw | #link https://review.opendev.org/c/opendev/system-config/+/830785 | 19:05 |
clarkb | ah | 19:05 |
clarkb | I guess I saw it only on the codesearch jobs | 19:05 |
ianw | is a doc update, that turned into quite a bit more than just how to add your gpg keys but it's in there | 19:05 |
clarkb | ya seemed like good improvements to the docs | 19:05 |
ianw | i felt like we were all ok with it, but didn't want to single approve 830784 just in case | 19:06 |
clarkb | ++ | 19:06 |
clarkb | Good to get lots of eyeballs on changes like this | 19:06 |
clarkb | Anything else on this topic? | 19:07 |
fungi | oh, i totally missed 830785, thanks | 19:07 |
fungi | i approved 830784 now | 19:08 |
ianw | nope, that's it for now, thanks | 19:08 |
clarkb | #topic Container Maintenance | 19:08 |
clarkb | jentoio: and I met up last week and discussed what needed to be done for giving containers dedicated users. We decided to look at updating insecure ci registry to start since it doesn't write to the fs | 19:09 |
clarkb | That will help get the shape of things in place before we tackle some of the more complicated containers | 19:09 |
jentoio | I'll be working on zuul-registry this afternoon | 19:09 |
clarkb | jentoio: thanks again! | 19:09 |
jentoio | finally allocated some time to focus on it | 19:10 |
clarkb | so ya some progress here. Reviews appreciated once we have changes | 19:10 |
clarkb | #topic Cleaning Up Old Reviews | 19:11 |
clarkb | #link https://review.opendev.org/q/topic:retirement+status:open Changes to retire all the unused repos. | 19:11 |
clarkb | We're basically at actually retiring content from the repos. I would appreciate it if we could start picking some of those off. I don't think they are difficult reviews: you should be able to fetch them, ls the repo contents to make sure nothing got left behind, and check the README content. But there are a lot of them | 19:11 |
clarkb | Once those changes land I'll do bulk abandons for open changes on those repos and then push up the zuul removal cleanups | 19:12 |
clarkb | as well as gerrit acl updates | 19:12 |
ianw | ++ sorry i meant to look at them, will do | 19:12 |
clarkb | thanks! | 19:13 |
fungi | yeah, same. i'm going to blame the fact that i had unsubbed from those repos in gertty a while back | 19:13 |
clarkb | #topic Gitea 1.16 | 19:14 |
clarkb | I think it is worth waiting until 1.16.3 releases and then make a push to upgrade | 19:15 |
clarkb | https://github.com/go-gitea/gitea/milestone/113 indicates the 1.16.3 release should happen soon. Only one remaining issue to address | 19:15 |
clarkb | The reason for this is that both of the issues we've discovered (diff rendering of images and build inconsistencies with a dep) should be fixed by 1.16.3 | 19:15 |
clarkb | I think we can hold off on reviewing things for now until 1.16.3 happens. I'll get a change pushed for that and update our hold to inspect it | 19:16 |
clarkb | Mostly a heads up that I haven't forgotten about this, but waiting until it stabilizes a bit more | 19:16 |
clarkb | #topic Rocky Linux | 19:17 |
clarkb | #link https://review.opendev.org/c/zuul/nodepool/+/831108 Need new dib for next steps | 19:17 |
clarkb | This is a nodepool change that will upgrade dib in our builder images | 19:17 |
clarkb | This new dib should address new issues found with Rocky Linux builds | 19:17 |
clarkb | Please review that if you get a chance | 19:18 |
clarkb | also keep an eye out for any unexpected new glean behavior since it updated a couple of days ago | 19:18 |
clarkb | #topic Removing airship-citycloud nodepool provider | 19:18 |
clarkb | This morning I woke up to an email that this provider is going away for us today | 19:19 |
clarkb | The first two changes to remove it have landed. After this meeting I need to check if nodepool is complaining about a node in that provider that wouldn't delete. I'll run nodepool erase airship-citycloud if so | 19:19 |
fungi | ericsson had been paying for that citycloud account in support of airship testing, but has decided to discontinue it | 19:19 |
clarkb | right | 19:20 |
clarkb | Once nodepool is done we can land the system-config and dns update changes. Then we can delete the mirror node if it hasn't gone away already | 19:20 |
frickler | do we need to inform the airship team or do they know already? | 19:20 |
clarkb | frickler: this was apparently discussed with them months ago. They just neglected to tell us :/ | 19:20 |
frickler | ah, o.k. | 19:21 |
clarkb | I'll keep driving this along today to help ensure it doesn't cause problems for us. | 19:21 |
clarkb | Thank you for the help with reviews. | 19:21 |
clarkb | Oh the mirror node is already in the emergency file to avoid ansible errors if they remove it quicker than we can take it out of our inventory | 19:22 |
clarkb | #topic zuul-registry bugs | 19:23 |
clarkb | We've discovered some zuul-registry bugs. Specifically the way it handles concurrent uploads of the same image layer blobs is broken | 19:24 |
clarkb | What happens is the second upload notices that the first one is running and exits early. When it exits early, that second uploading client HEADs the object to get its size back (even though it already knows the size) and gets a short read because the first upload isn't completed yet | 19:24 |
clarkb | then when the second client uses that short size to push a manifest using that blob, it errors because the server validates the input sizes | 19:25 |
clarkb | #link https://review.opendev.org/c/zuul/zuul-registry/+/831235 Should address these bugs. Please monitor container jobs after this lands. | 19:25 |
ianw | ++ will review. i got a little sidetracked with podman issues that got uncovered too | 19:25 |
clarkb | This change addresses that by forcing all uploads to run to their own completion. Then we handle the data such that it should always be valid when read (using atomic moves on the filesystem and swift eventual consistency stuff) | 19:25 |
clarkb | There is also a child change that makes the testing of zuul-registry much more concurrent to try and catch these issues | 19:26 |
clarkb | it caught problems with one of my first attempts at fixing this so seems to be helpful already | 19:26 |
clarkb | In general this entire problem is avoided if you don't push the same image blobs at the same time, which is why we likely haven't noticed until recently | 19:26 |
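The fix clarkb describes (every uploader runs to its own completion, then publishes atomically so readers never see a partial blob) can be pictured roughly as below. This is an illustrative sketch only, not zuul-registry's actual code; the function and path names are hypothetical, and real storage backends (e.g. swift) need their own equivalent of the atomic publish.

```python
import os
import tempfile


def put_blob(storage_dir, digest, data):
    """Write an uploaded blob so readers never observe a partial object.

    Each concurrent uploader writes its own temporary file and runs to
    completion; os.replace() then publishes the blob atomically, so a
    HEAD/stat of the final path always reports the full size.
    """
    os.makedirs(storage_dir, exist_ok=True)
    fd, tmp_path = tempfile.mkstemp(dir=storage_dir)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)  # this upload runs to its own completion
        final_path = os.path.join(storage_dir, digest)
        os.replace(tmp_path, final_path)  # atomic publish
        return final_path
    except Exception:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

With this pattern two clients pushing the same layer both do the full write; the last rename wins, and either outcome is a complete, size-consistent blob.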
clarkb | #topic PTG Planning | 19:27 |
clarkb | I was reminded that the deadline for teams signing up to the PTG is approaching in about 10 days | 19:28 |
clarkb | My initial thought was that we could skip this one since last one was very quiet for us | 19:28 |
clarkb | But if there is interest in having a block of time or two let me know and I'm happy to sign up and manage that for us | 19:28 |
fungi | we've not really had any takers on our office hours sessions in the past | 19:28 |
clarkb | yup. I think if we did time this time around we should dedicate it to things we want to cover and not do office hours | 19:29 |
clarkb | Anyway no rush on that decision. Let me know in the next week or so and I can get us signed up if we like. But happy to avoid it. I know some of us end up in a lot of other sessions so having less opendev stuff might be helpful there too | 19:31 |
clarkb | #topic Open Discussion | 19:31 |
clarkb | That was what I had on the agenda. Anything else? | 19:32 |
frickler | any idea regarding the buster ensure-pip issue? | 19:32 |
clarkb | I've managed to miss this issue. | 19:33 |
frickler | if you missed it earlier, we break installing tox on py2 with our self-built wheels | 19:33 |
ianw | frickler: sorry i only briefly saw your notes, but haven't dug into it yet | 19:33 |
frickler | because the wheel gets built as a *py2.py3* wheel and we don't serve the metadata to let pip know that it is really only for Python >=3.6 | 19:34 |
frickler | we've seen similar issues multiple times and usually worked around by capping the associated pkgs | 19:34 |
clarkb | is that because tox doesn't set those flags upstream properly? | 19:35 |
clarkb | which results in downstream artifacts being wrong? | 19:35 |
frickler | no, we only serve the file, so pip chooses the package only based on the filename | 19:35 |
frickler | pluggy-1.0.0-py2.py3-none-any.whl | 19:36 |
clarkb | right but when we build it it shouldn't build a py2 wheel if upstream sets their metadata properly iirc | 19:37 |
ianw | (i've had a spec out to add metadata but never got around to it ... but it has details https://review.opendev.org/c/opendev/infra-specs/+/703916/1/specs/wheel-modernisation.rst) | 19:37 |
frickler | of course it is also a bug in that package to not specify the proper settings | 19:37 |
fungi | yeah, a big part of the problem is that our wheel builder solution assumes that since a wheel can be built on a platform it can be used on it, so we build wheels with python3 and then python2 picks them up | 19:37 |
frickler | but we can't easily fix that | 19:37 |
ianw | "Making PEP503 compliant indexes" | 19:37 |
fungi | and unlike the simple api, we have no way to tell pip those wheels aren't suitable for python 2 | 19:37 |
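To illustrate the filename-only selection frickler and fungi describe: a flat file listing gives pip nothing but the wheel filename, and the wheel naming convention carries compatibility tags but no Requires-Python field. A rough sketch of what the filename alone encodes (plain string parsing for illustration, not pip's actual resolver):

```python
def wheel_filename_tags(filename):
    """Split a wheel filename into its name/version/compatibility tags.

    Per the wheel naming convention the stem is
    {name}-{version}(-{build})?-{python}-{abi}-{platform}.whl --
    note there is no Requires-Python field anywhere in the name.
    """
    stem = filename[: -len(".whl")]
    parts = stem.split("-")
    python_tag, abi_tag, platform_tag = parts[-3:]
    return {
        "name": parts[0],
        "version": parts[1],
        "python": python_tag,
        "abi": abi_tag,
        "platform": platform_tag,
    }


tags = wheel_filename_tags("pluggy-1.0.0-py2.py3-none-any.whl")
# The py2.py3 tag claims Python 2 compatibility even though pluggy
# 1.0.0's metadata declares a Python >=3.6 requirement that pip never
# sees when only the flat file listing is served.
```

PyPI's simple API avoids this by attaching a `data-requires-python` attribute to each file link, which is exactly the metadata the flat listing lacks.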
clarkb | ya so my concern is that if upstream is broken its hard for us to prevent this. We could address it after the fact one way or another, but if we rely on upstream to specify versions and they get them wrong we'll build bad wheels | 19:38 |
fungi | yeah, we'd basically need to emulate the pypi simple api, and mark the entries as appropriate only for the python versions they built with, if we're supporting multiple interpreter versions on a single platform | 19:39 |
fungi | note this could also happen between different python3 versions if we made multiple versions of python3 available on the same platform, some of which weren't sufficient for wheels we cache for those platforms | 19:39 |
fungi | i suppose an alternative would be to somehow override universalness, and force them to build specific cp27 or cp36 abis | 19:40 |
frickler | apart from fixing the general issue, we should also discuss how to shortterm fix zuul-jobs builds | 19:41 |
clarkb | not all wheels support that iirc as your C linking is more limited | 19:41 |
fungi | could be as simple as renaming the files | 19:41 |
clarkb | frickler: I think my preference for the short term would be to pin the dependency if we know it cannot work | 19:41 |
clarkb | s/pin/cap/ | 19:41 |
frickler | https://review.opendev.org/c/zuul/zuul-jobs/+/831136 is what is currently blocked | 19:41 |
frickler | well the failure happens in our test which does a simple "pip install tox" | 19:42 |
frickler | we could add "pluggy\<1" to it globally or add a constraints file that does this only for py2.7 | 19:42 |
frickler | I haven't been able to get the latter to work directly without a file | 19:42 |
frickler | the other option that I've proposed would be to not use the wheel mirror on buster | 19:43 |
clarkb | ya I think pip install tox pluggy\<1 is a good way to address it | 19:44 |
clarkb | then we can address mirroring without it being an emergency | 19:44 |
fungi | ahh, so pluggy is a dependency of tox? | 19:44 |
frickler | at least until latest tox requires pluggy>=1 | 19:45 |
frickler | fungi: yes | 19:45 |
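The constraints-file variant frickler mentions could look something like the fragment below, using a standard pip environment marker so the cap applies only under Python 2.7. The marker syntax is standard pip requirements/constraints syntax, but the exact wiring into the job is hypothetical, not what eventually landed:

```
# constraints.txt -- cap pluggy only where the interpreter is Python 2
pluggy<1; python_version < "3"
```

It would then be applied with something like `pip install -c constraints.txt tox`, leaving Python 3 installs free to take pluggy 1.x.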
frickler | o.k., I'll propose a patch for that | 19:47 |
frickler | another thing, I've been considering whether some additions to nodepool might be useful | 19:48 |
clarkb | like preinstalling tox? | 19:48 |
frickler | not sure whether that would better be discussed in #zuul but I wanted to mention them here | 19:48 |
clarkb | I think we said we'd be ok with that sort of thing as long as we stashed things in virtualenvs to avoid conflicts with the system | 19:48 |
clarkb | ya feel free to bring it up here. We can always take it to #zuul later | 19:49 |
frickler | no, unrelated to that issue. although pre-provisioning nodes might also be interesting | 19:49 |
frickler | these came up while I'm setting up a local zuul environment to test our (osism) kolla deployments | 19:49 |
frickler | currently we use terraform and deploy nodes with fixed IPs and with additional volumes | 19:50 |
ianw | (note we do install a tox venv in nodepool -> https://opendev.org/openstack/project-config/src/branch/master/nodepool/elements/infra-package-needs/install.d/40-install-tox) | 19:50 |
clarkb | ianw: thanks | 19:51 |
frickler | so in particular the additional volumes should be easily done in nodepool, just mark them to be deleted together with the server | 19:51 |
frickler | so then after creation and attaching, nodepool can forget about them | 19:51 |
clarkb | frickler: ya there has been talk about managing additional resources in nodepool. The problem is you have to track those resources just like you track instances because they can be leaked or have "alien" counterparts too | 19:51 |
clarkb | Unfortunately nodepool can't completely forget due to the leaking | 19:52 |
clarkb | (cinder is particularly bad about leaking resources) | 19:52 |
clarkb | But if nodepool were modified to track extra resources that would be doable. | 19:52 |
clarkb | And I agree this is probably a better discussion for #zuul as it may require some design | 19:53 |
clarkb | in particular we'll want to think about what this looks like for EBS volumes and other clouds too | 19:53 |
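The tracking concern clarkb raises boils down to a reconciliation loop: compare the set of resources nodepool believes it owns against what the cloud actually reports, so leaked ("alien") resources can be cleaned up. A pure-set sketch of the idea, not nodepool's implementation:

```python
def reconcile(tracked_ids, cloud_ids):
    """Compare resources we think we own against a cloud listing.

    Anything present in the cloud but no longer tracked has leaked and
    needs cleanup; anything tracked but gone from the cloud is stale
    bookkeeping to drop.
    """
    tracked = set(tracked_ids)
    in_cloud = set(cloud_ids)
    leaked = in_cloud - tracked  # e.g. volumes left behind after a server delete
    stale = tracked - in_cloud   # records for resources already gone
    return leaked, stale


# vol-3 exists in the cloud but is untracked (leaked); vol-1 is tracked
# but no longer exists (stale record).
leaked, stale = reconcile(["vol-1", "vol-2"], ["vol-2", "vol-3"])
```

This is why "mark the volume for deletion and forget it" isn't quite enough: without the tracked set there is nothing to diff the cloud listing against, and leaks go unnoticed.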
frickler | o.k., so I need to find some stable homeserver and then move over to #zuul, fine | 19:54 |
fungi | i don't think cinder was particularly bad about leaking resources, just that the nova api was particularly indifferent to whether cinder successfully deleted things | 19:54 |
frickler | do you know if someone already tried this or was it just some idea? | 19:55 |
clarkb | frickler: spamaps had talked about it at one time but I think it was mostly in the idea stage | 19:55 |
frickler | o.k., great to hear that I'm not the only one with weird ideas ;) | 19:58 |
frickler | that would be it from me for now | 19:58 |
clarkb | We are just about at time. Thank you everyone for joining. Remember no meeting next week. We'll see you in two weeks. | 19:58 |
clarkb | #endmeeting | 20:00 |
opendevmeet | Meeting ended Tue Mar 1 20:00:33 2022 UTC. Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4) | 20:00 |
opendevmeet | Minutes: https://meetings.opendev.org/meetings/infra/2022/infra.2022-03-01-19.01.html | 20:00 |
opendevmeet | Minutes (text): https://meetings.opendev.org/meetings/infra/2022/infra.2022-03-01-19.01.txt | 20:00 |
opendevmeet | Log: https://meetings.opendev.org/meetings/infra/2022/infra.2022-03-01-19.01.log.html | 20:00 |
*** clarkb is now known as Guest985 | 23:32 | |
*** Guest985 is now known as clarkb | 23:41 |