Tuesday, 2023-03-28

-@gerrit:opendev.org- Ian Wienand proposed:06:19
- [zuul/zuul-jobs] 878614: registry-tag-remove: role to delete tags from registry https://review.opendev.org/c/zuul/zuul-jobs/+/878614
- [zuul/zuul-jobs] 878740: promote-container-image: use generic tag removal role https://review.opendev.org/c/zuul/zuul-jobs/+/878740
@iwienand:matrix.org^ corvus Clark i do see what we mean by uploading from the promote role, but i do think the current system has a fair bit of merit in the way it tries to keep the window between code-commit and tag-update as small as possible.  i think that abstracting out the handling of the tags to a role is a practical solution ... ^ implements it.  06:22
@tobias-urdin:matrix.orgfelixedel: hello 👋 thanks for looking through the changes! any more feedback on the dark theme change?09:20
@clarkb:matrix.orgcorvus: before I review ianw's stack at https://review.opendev.org/c/zuul/zuul-jobs/+/878612/ to handle image promotion and tag deletion did you feel strongly about his proposal there? It diverges from what you had originally proposed17:19
@clarkb:matrix.orgI guess I'm happy to review it but don't want to dig in if you are already a hard no17:19
@jim:acmegating.comClark: well, it sounds like what i described as option 1b17:35
@jim:acmegating.comso i agree, it's a solution we can live with.  but it makes the whole thing non-generic. it means that if we move from quay.io to something else, we are right back to where we were, which is we need to implement a registry-specific way of deleting tags.17:38
@jim:acmegating.comi was hoping we could find a way to avoid vendor-specific apis, and i think options 2, 2b, and 2c have reasonable trade-offs to do that.  it's a question of whether we think the fast tag swap is more important than avoiding vendor lock-in.  and it's worth noting that 2b and 2c are ways of potentially accomplishing both.17:42
@jim:acmegating.comhaving said all that, at least separating out the cleanup part makes maintenance and future implementation easier17:43
@jim:acmegating.comClark: to directly answer the question, definitely not a hard no.  worth considering.  there's the trade-offs as i see it ^17:46
@clarkb:matrix.orgI think I'm mostly trying to figure out where to best put my energy to move forward on the registry move. Ianw has written one path forward but as you mention it relies on vendor specific apis to maintain the old workflow we've got.18:02
@clarkb:matrix.orgI or someone else could start work on 2)/2b) but before that happens we should probably agree that we don't want to proceed with what ianw has written?18:02
@jim:acmegating.comwell, i started on 2, with the assumption that we could upgrade to 2b or 2c if we want18:03
@jim:acmegating.comi'm happy to discard the work on 2 if we want to change course and go with 1.  but i stopped where i did so we could discuss it and decide18:03
@clarkb:matrix.orgoh did I miss those changes?18:03
@jim:acmegating.comso... let's not decide by saying that 1 is already written.18:03
@clarkb:matrix.orgI may have. I've been pulled in a bunch of different directions the last few days18:03
@jim:acmegating.comcorvus proposed: [zuul/zuul-jobs] 878538: WIP: Update promote-container-image to copy from intermediate registry18:04
@clarkb:matrix.orgoh right the docs outline change18:04
@jim:acmegating.com(also, ianw's implementation of 1 is incomplete as well, which is fine; i don't think we should run either of these to completion until we, you know, actually decide what we want)18:05
@clarkb:matrix.org++18:05
@clarkb:matrix.orgI think where I'm currently at is that I'd like to avoid the vendor specific APIs if we can and fallback to vendor specific tooling if necessary. Put another way I'd like to push 2 to completion and only fall back to 1 if 2 fails18:06
@clarkb:matrix.org(and we don't have any reason to think 2 would fail at the moment)18:06
@jim:acmegating.comi think the tricky part about that is that 2 will pull in the intermediate registry as a requirement, and once we go down that path, it may be hard to back that out18:07
@jim:acmegating.commight be worth ciphering on whether we can make that optional, or the fallback more graceful or something18:08
@clarkb:matrix.orgone way to make it optional is to rebuild the image in the post merge pipeline18:09
@clarkb:matrix.org(not a great answer, but an option I think)18:09
@jim:acmegating.comso basically say, if you have an intermediate registry, use build, upload, and promote jobs.  if you don't, then just use build and upload jobs.18:09
@jim:acmegating.com * so basically say, if you have an intermediate registry, use build, upload, and promote jobs.  if you don't, then just use build and upload jobs.  ?18:10
@clarkb:matrix.orgyes. Basically no promotion as you'd build and upload post merge and not have an artifact to manage18:10
@jim:acmegating.commaybe we should name the role "container-image-intermediate-registry-promote" when implementing option 2 to leave room for "container-image-registry-promote" as option 1 (so if we end up implementing 2 and we later decide we want 1, we can end up with 4 roles all designed together)18:12
@clarkb:matrix.orgI like that. Gives flexibility too for end users if they have a strong need for the tag updates to happen quickly too18:14
@clarkb:matrix.orgthey could maintain that half of the tooling for example while others used the generic jobs/roles18:14
@jim:acmegating.comcool, let's see what ianw says later18:15
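Editor's sketch of the split corvus and Clark describe above: with an intermediate registry you run build, upload, and promote jobs; without one you only build and upload (post merge), so there is no artifact to promote. The function name and the job labels are hypothetical placeholders for illustration, not actual zuul-jobs job or role names.

```python
# Hypothetical sketch: the labels below are illustrative only and are
# not the names of real zuul-jobs jobs or roles.
def select_image_jobs(has_intermediate_registry: bool) -> list[str]:
    """Return the job sequence for the two setups discussed in the log."""
    # Both setups build and upload an image. Without an intermediate
    # registry this happens post merge, so nothing is left to promote.
    jobs = ["build", "upload"]
    if has_intermediate_registry:
        # The image built before merge is held in the intermediate
        # registry, so a promote step publishes it after merge.
        jobs.append("promote")
    return jobs
```

The trade-off raised in the discussion still applies: the two-job path rebuilds the image after merge, so what gets published is not the exact artifact that was tested.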
@iwienand:matrix.orgunhelpfully i don't have any strong opinion :)  i feel like a tag cleanup role is probably not a bad thing to have in zuul-jobs independent of this anyway.  of the things i looked at, docker, quay, gitlab, artifactory, digitalocean all had a simple endpoint to remove a tag.  google-cloud i could only see being able to do it using the client libraries.  github doesn't seem to map and you have to remove packages or something?  20:58
@iwienand:matrix.org> <@jim:acmegating.com> maybe we should name the role "container-image-intermediate-registry-promote" when implementing option 2 to leave room for "container-image-registry-promote" as option 1 (so if we end up implementing 2 and we later decide we want 1, we can end up with 4 roles all designed together)20:59
I think I was probably coming to this idea too. Have container-image-promote work as it does now, using the "generic" tag removal; but in docs note that you need to use a supported registry
@iwienand:matrix.orgbut also have an option to upload from the intermediate registry if you have that without re-tagging21:00
@iwienand:matrix.orgi tried to update the docs to explain a bit more about the code-merged -> image published latency in https://review.opendev.org/c/zuul/zuul-jobs/+/878612/1 21:01
@iwienand:matrix.orgmy thought was that would have "this can run in one of 3 modes" with this model... 21:02
@jim:acmegating.comsounds like no strong opinions all around, and willingness to experiment.  i'm glad we all like the idea of making room for both roles, and i'm happy to review and merge both.  we'll still need to decide which thing we want to try first with zuul (and also opendev).  but that's also not urgent.21:03
@jim:acmegating.com * sounds like no strong opinions all around, and willingness to experiment.  i'm glad we all like the idea of making room for both roles, and i'm happy to review/author as appropriate and merge both.  we'll still need to decide which thing we want to try first with zuul (and also opendev).  but that's also not urgent.21:03
@iwienand:matrix.orgi would say 878612 is ready for review -- the explanations i've put in there are my understanding of the way it works, so if that's wrong, i'm starting from the wrong place :)21:06
@iwienand:matrix.orghttps://review.opendev.org/c/zuul/zuul-jobs/+/878494/1 is another one that is just docs but tries to pull apart the buildx things a bit.  the whole buildx path is quite impressive when you dig a bit like that :)21:10
@iwienand:matrix.orgClark: https://review.opendev.org/c/zuul/zuul-jobs/+/878487/2 is related; just trying to simplify the buildx path a bit.  per the cl the re i think the multi-stage push was actually fixed by your work a little while ago on atomic uploads to zuul-registry21:11
@jim:acmegating.comianw: i agree with your understanding in 878612 -- i just left some nit-level comments on it21:13
@jim:acmegating.comwell, almost-nit level...i mean, comments that don't radically change the understanding being communicated :)21:13
@clarkb:matrix.orgok cool I'll take a look at reviewing the changes we've got at this point21:13
-@gerrit:opendev.org- James E. Blair https://matrix.to/#/@jim:acmegating.com proposed:21:16
- [zuul/nodepool] 878179: Use a persistent recursive watch for caches https://review.opendev.org/c/zuul/nodepool/+/878179
- [zuul/nodepool] 877431: Use image cache when launching nodes https://review.opendev.org/c/zuul/nodepool/+/877431
- [zuul/nodepool] 877432: Use node cache in node deleter https://review.opendev.org/c/zuul/nodepool/+/877432
- [zuul/nodepool] 877565: Log the reason we decline a request https://review.opendev.org/c/zuul/nodepool/+/877565
@clarkb:matrix.orgunrelated I'm running zuul's test suite locally to compare my python3.10 without x86_64-v3 support to python3.11 with x86_64-v3 support and am noticing a few tests are failing. Many of them seem related to file io somehow? makes me wonder if btrfs or my version of tmpfs is causing problems21:16
-@gerrit:opendev.org- James E. Blair https://matrix.to/#/@jim:acmegating.com proposed:21:16
- [zuul/nodepool] 878178: Vendor persistent recursive watch Kazoo support https://review.opendev.org/c/zuul/nodepool/+/878178
- [zuul/nodepool] 878179: Use a persistent recursive watch for caches https://review.opendev.org/c/zuul/nodepool/+/878179
- [zuul/nodepool] 877431: Use image cache when launching nodes https://review.opendev.org/c/zuul/nodepool/+/877431
- [zuul/nodepool] 877432: Use node cache in node deleter https://review.opendev.org/c/zuul/nodepool/+/877432
- [zuul/nodepool] 877565: Log the reason we decline a request https://review.opendev.org/c/zuul/nodepool/+/877565
@jim:acmegating.comClark: both versions failing?  i have not seen that locally21:17
@clarkb:matrix.orgcorvus: I haven't gotten to 3.11 yet. Failures under 3.10 though. I guess I should run under 3.11 and see if the same tests fail consistently21:18
@clarkb:matrix.org/tmp is a tmpfs here and other things under / are btrfs though21:18
@jim:acmegating.commine are running under ext4 right now21:21
@jim:acmegating.comianw: again 99% agreement with content in 878494 21:27
-@gerrit:opendev.org- Ian Wienand proposed:21:30
- [zuul/zuul-jobs] 878612: promote-image-container: do not delete tags https://review.opendev.org/c/zuul/zuul-jobs/+/878612
- [zuul/zuul-jobs] 878614: registry-tag-remove: role to delete tags from registry https://review.opendev.org/c/zuul/zuul-jobs/+/878614
- [zuul/zuul-jobs] 878740: promote-container-image: use generic tag removal role https://review.opendev.org/c/zuul/zuul-jobs/+/878740
@iwienand:matrix.orgthanks, will loop back after school run21:31
@clarkb:matrix.orgthinking about these test runs more I did some fairly big runs with the warnings cleanup and they were fine I think. Maybe I've got some local test setup problem21:33
@clarkb:matrix.orgI did just rebuild things21:33
-@gerrit:opendev.org- James E. Blair https://matrix.to/#/@jim:acmegating.com proposed: [zuul/zuul] 878725: Check Gerrit submit requirements https://review.opendev.org/c/zuul/zuul/+/878725 21:34
@jim:acmegating.comClark: i just did a full run on a tmpfs and saw no errors21:35
@clarkb:matrix.orgcorvus: ack I've probably got some issue I need to sort out21:36
@clarkb:matrix.orgianw: corvus at the end of https://review.opendev.org/c/zuul/zuul-jobs/+/878612/2/roles/build-container-image/common.rst we document that the temporary change tags will be cleaned up but then in the same change we remove that code (because it isn't reliable). Should we update the bit in the docs indicating the role doesn't currently do tag cleanup?21:45
@clarkb:matrix.organd then I guess expand on that later as we implement the options available to us for working around that?21:45
-@gerrit:opendev.org- Zuul merged on behalf of Ian Wienand: [zuul/zuul-jobs] 878487: build-container-image: directly push with buildx https://review.opendev.org/c/zuul/zuul-jobs/+/878487 21:50
@clarkb:matrix.orgno failures yet under 3.11. I wonder if we've somehow broken 3.10 or as I mentioned I've broken 3.10 on my local setup21:51
@clarkb:matrix.orgInterestingly comparing specific test case runtimes for some of the longer running tests some are faster and some are slower with the supposedly faster python3.11 installation22:05
@clarkb:matrix.orgI probably need to ensure even less stuff is running on the test machine to get consistent comparisons between runs22:05
@clarkb:matrix.organd python3.11 was successful. Runtime total was less than the failed 3.10 case too22:09
@iwienand:matrix.org> <@clarkb:matrix.org> ianw: corvus at the end of https://review.opendev.org/c/zuul/zuul-jobs/+/878612/2/roles/build-container-image/common.rst we document that the temporary change tags will be cleaned up but then in the same change we remove that code (because it isn't reliable). Should we update the bit in the docs indicating the role doesn't currently do tag cleanup?22:14
yeah i removed the skopeo removal because it's actually bad in that it removes the underlying image, but i guess it's intended to be temporary
@iwienand:matrix.orgthe other thing i noticed in some of the output was "docker build" saying it was deprecated.  so we should probably just make the buildx path the default22:15
@clarkb:matrix.orgianw: I think one issue with that is docker pre 23 didn't include buildx by default (and even with 23 if you install from distro packages they don't pull in buildx by default...)22:19
@iwienand:matrix.orgahh i guess we're always using the ce packages that have it22:21
@clarkb:matrix.orgya they have it but you have to explicitly install it even though buildx is the default now22:23
@clarkb:matrix.orgI think they ended up not modifying their package metadata for the new release much compared to the old release so there are some weird inconsistencies compared to the release notes22:24
@clarkb:matrix.orgalso the whole apparmor thing22:24
@clarkb:matrix.orgcorvus: left some thoughts on the gerrit submit requirements change but +2 from me22:24
@clarkb:matrix.orgcorvus: `"failed_modules": {"ansible.legacy.setup": {"failed": true, "module_stderr": "/bin/sh: line 1: /home/clark/src/zuul/zuul/.nox/tests-3-10/bin/python3.10: No such file or directory\\n", "module_stdout": "", "msg": "The module failed to execute correctly, you probably need to set the interpreter.` I think this is the problem I've got. That path does exist though22:44
@clarkb:matrix.orgoh except this is in bwrap so maybe that isn't mounted?22:45
@clarkb:matrix.orgyes I think that is the issue. It is running ansible out of the nox env and ansible dirs are getting bind mounted but not the python nox venv. I don't know why python3.11 would work though22:47
@clarkb:matrix.orghrm python3.11 doesn't seem to add the equivalent 3.11 path to the ro bind paths22:56
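Editor's sketch of the failure mode Clark is describing: inside bwrap, a path only exists if it (or a parent directory) was bind mounted, so an interpreter that exists on the host can still be reported as "No such file or directory" in the sandbox. The helper names are hypothetical and this is not how Zuul builds its bwrap command line; the only real bwrap feature relied on is the `--ro-bind SRC DEST` option. The `/usr` and `/opt/ansible` entries are made-up examples of already-bound paths.

```python
from pathlib import PurePosixPath


def ro_bind_args(paths):
    """Build bwrap arguments binding each path read-only at the same location."""
    args = []
    for path in paths:
        # bwrap's --ro-bind takes a source and a destination path.
        args += ["--ro-bind", str(path), str(path)]
    return args


def visible_in_sandbox(target, bound_paths):
    """Return True if target is at or below one of the bind-mounted paths."""
    target = PurePosixPath(target)
    for bound in bound_paths:
        bound = PurePosixPath(bound)
        if target == bound or bound in target.parents:
            return True
    return False


# Interpreter path taken from the error message earlier in the log.
interpreter = "/home/clark/src/zuul/zuul/.nox/tests-3-10/bin/python3.10"
# Assumed example of paths that are already bound (illustrative only).
bound = ["/usr", "/opt/ansible"]

missing = not visible_in_sandbox(interpreter, bound)
# Binding the nox venv directory as well would make the interpreter reachable.
venv = "/home/clark/src/zuul/zuul/.nox/tests-3-10"
fixed = visible_in_sandbox(interpreter, bound + [venv])
```

A check like this only models the path logic; inspecting the actual bwrap command line from a held test environment remains the way to confirm what was mounted.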
-@gerrit:opendev.org- James E. Blair https://matrix.to/#/@jim:acmegating.com proposed: [zuul/nodepool] 878802: Report leaked resource metrics in statemachine driver https://review.opendev.org/c/zuul/nodepool/+/878802 23:09
-@gerrit:opendev.org- James E. Blair https://matrix.to/#/@jim:acmegating.com proposed: [zuul/zuul] 878725: Check Gerrit submit requirements https://review.opendev.org/c/zuul/zuul/+/878725 23:20
@jim:acmegating.comClark: replied and updated re submit reqs23:20
@jim:acmegating.comClark: re bwrap -- i think there's some special detection of the python venv install path to bind mount it into bwrap?23:21
@clarkb:matrix.orgcorvus: maybe? I haven't found it in the logs at least. The more I dig into this the more I'm confused :/ The next thing I notice is that the job seems to run a playbook against hosts: all but it isn't clear what the inventory looks like. I think I need to hold the entire test env and not have it get cleaned up23:27
@clarkb:matrix.orglooks like there is an env var to do that23:29
@clarkb:matrix.orgcorvus: under python3.11 playbook_0's inventory has python set to auto but python3.10 does not23:40
@clarkb:matrix.orgit also sets the connection to local under 3.11 but not 3.10. I think this is the underlying cause of the failure for me23:40
@clarkb:matrix.orgwith 3.10 it is actually attempting to do ssh which it cannot do23:40
@clarkb:matrix.orgoh hrm the freeze playbook is failing which might be causing that 23:48
@clarkb:matrix.orgso this is a number of failures deep now. elasticsearch reporting test has no ES index because the reporter doesn't run on RETRY_LIMIT? That happens because there is no connection when running playbook_0 and that happens because freeze playbook is failing on that module thing I posted earlier23:49
