Friday, 2023-03-24

-@gerrit:opendev.org- Ian Wienand proposed:00:08
- [zuul/zuul-jobs] 878487: build-container-image: directly push with buildx https://review.opendev.org/c/zuul/zuul-jobs/+/878487
- [zuul/zuul-jobs] 878494: build-container-image: enhance buildx documentation https://review.opendev.org/c/zuul/zuul-jobs/+/878494
@iwienand:matrix.orghopefully 878494 isn't too far from reality and helps the next <del>sucker</del> interested developer 00:09
-@gerrit:opendev.org- Zuul merged on behalf of James E. Blair https://matrix.to/#/@jim:acmegating.com: [zuul/zuul-client] 878289: Publish images to quay.io https://review.opendev.org/c/zuul/zuul-client/+/878289 00:11
@jim:acmegating.comthe upload job worked, but the promote job did not ^00:40
@jim:acmegating.comalso, the docs promote job failed.  that's probably unrelated.  if anyone else wants to look into that, please feel free.00:41
-@gerrit:opendev.org- Ian Wienand proposed: [zuul/zuul-jobs] 878497: test-registry: split docker and container paths https://review.opendev.org/c/zuul/zuul-jobs/+/878497 00:47
@iwienand:matrix.orgcorvus: if you have a sec 878492 should help with the nodepool multiarch build; 878497 is additional testing for it00:48
@iwienand:matrix.org https://zuul.opendev.org/t/zuul/build/fcb9c9516c28479793b4f3bf49ffb40b/console#1/0/2/localhost 00:50
@iwienand:matrix.orgpromote-container-image: Log in to registry failed on the zuul-client promote job00:50
@iwienand:matrix.orgunfortunately no log00:50
@jim:acmegating.comyeah; that command is identical to what i used to copy the repos over00:50
@jim:acmegating.comand the same creds were used to push the image (but it used docker to push, not skopeo)00:50
@jim:acmegating.com(+2 on both changes)00:51
@iwienand:matrix.orgit skipped https://zuul.opendev.org/t/zuul/build/fcb9c9516c28479793b4f3bf49ffb40b/console#1/0/1/localhost 00:52
@iwienand:matrix.orgpromote-container-image: Verify repository permission ... i wonder if that's a clue00:52
@jim:acmegating.comthat's expected, the variable isn't set00:52
@jim:acmegating.comthat's the regex thing that we don't use00:53
@iwienand:matrix.orgoh right, yeah00:53
@iwienand:matrix.orgright, so we know "podman login" worked with the credentials from https://opendev.org/zuul/zuul-jobs/src/commit/b7cf56103e77a0ab13f3ab8599a46299dd8a12ca/roles/upload-container-image/tasks/push.yaml#L2 00:55
@iwienand:matrix.orgsorry, docker login ... specifically that happened at https://zuul.opendev.org/t/zuul/build/1930c4ccef2f48e19b1087996e482b77/console#4/0/3/ubuntu-jammy 00:56
@jim:acmegating.comi wonder if skopeo is trying to write the creds somewhere it can't00:56
@jim:acmegating.comi just opened a zuul-bwrap and it stored it in /run/containers/10001/auth.json ... successfully00:58
@iwienand:matrix.orgi wonder if it might require putting it in a try 3 times loop?  it's a completely fresh executor where nothing was done but try to log in, so not much to make invalid state01:01
@jim:acmegating.comi think it already is01:02
@iwienand:matrix.orgheh, yep01:02
@iwienand:matrix.orgi guess we could go back to the "write a file and use the stdin flags" to get rid of the no_log01:04
@jim:acmegating.comyeah.  sigh.01:05
@iwienand:matrix.orgor re-enqueue it and try and catch it with an strace or something01:05
@jim:acmegating.comi just made a playbook in zuul-bwrap with just that task, without no_log, and the actual creds we should be using and it worked fine01:07
@jim:acmegating.comi'm liking the re-enqueue idea01:07
@iwienand:matrix.orgi have that in history if you want me to try01:07
@jim:acmegating.comi just re-enqueued it (used the web ui)01:08
@jim:acmegating.comit's um, a bit fast to try to catch it with strace i think01:09
@jim:acmegating.comapparently we have < 23 seconds to find out which host it's on...01:09
@jim:acmegating.comanyway, it failed again01:09
@iwienand:matrix.orgyeah, it goes fast.  did you try with same version of skopeo?01:10
@iwienand:matrix.orgalthough we pull from upstream into the images, so it's a very recent version, so unlikely to be that01:10
@jim:acmegating.comi was using zuul-bwrap in the prod environment on ze1101:10
@jim:acmegating.comso should be same everything01:11
@jim:acmegating.comhrm; if it finished in 23 seconds... it didn't retry 3 times with a 30 second delay01:14
@jim:acmegating.commaybe it's an ansible thing we're missing01:15
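The timing argument in the two messages above can be checked with a quick sketch. It assumes a generic retry loop in which a failing step is attempted `attempts` times with `delay` seconds slept between attempts. The function name and this simple model are illustrative only and are not a statement of Ansible's exact `retries`/`delay` semantics.

```python
def min_failure_seconds(attempts: int, delay: float, per_attempt: float = 0.0) -> float:
    """Lower bound on wall-clock time for a step that fails on every attempt."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    # One delay is slept between each pair of consecutive attempts.
    return attempts * per_attempt + (attempts - 1) * delay


observed = 23  # upper bound in seconds on the build duration quoted in the log
expected_floor = min_failure_seconds(attempts=3, delay=30)

# If the observed duration is below the floor, the retry loop cannot have run.
retry_loop_ran = observed >= expected_floor
print(expected_floor, retry_loop_ran)
```

Under these assumptions a build that really retried three times could not finish in under a minute, which is why a 23 second failure points at something failing before the retried task ever ran.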
-@gerrit:opendev.org- Zuul merged on behalf of Ian Wienand: [zuul/zuul-jobs] 878492: push-to-intermediate-registry: look for container_images variable https://review.opendev.org/c/zuul/zuul-jobs/+/878492 01:16
@jim:acmegating.comi think i see it...  gimme a min01:16
-@gerrit:opendev.org- Ian Wienand proposed: [zuul/zuul-jobs] 878497: test-registry: split docker and container paths https://review.opendev.org/c/zuul/zuul-jobs/+/878497 01:20
-@gerrit:opendev.org- James E. Blair https://matrix.to/#/@jim:acmegating.com proposed: [zuul/zuul-jobs] 878499: Move container-image-promote login block https://review.opendev.org/c/zuul/zuul-jobs/+/878499 01:20
@iwienand:matrix.orgsigh, one character can make all the difference01:20
@jim:acmegating.comianw: ^ i suspect we may have some serious holes in those test jobs01:20
@iwienand:matrix.orgoh of course that makes sense. zj_image not defined, ansible goes boom01:21
@iwienand:matrix.orgi think we can probably merge that and re-enqueue?  that seems like the best test01:32
@jim:acmegating.com++01:35
-@gerrit:opendev.org- Zuul merged on behalf of James E. Blair https://matrix.to/#/@jim:acmegating.com: [zuul/zuul-jobs] 878499: Move container-image-promote login block https://review.opendev.org/c/zuul/zuul-jobs/+/878499 01:47
@jim:acmegating.comonce more unto the breach01:51
@jim:acmegating.com"Error loading trust policy: open /etc/containers/policy.json: no such file or directory"01:54
@jim:acmegating.commore mundane errors now01:54
-@gerrit:opendev.org- James E. Blair https://matrix.to/#/@jim:acmegating.com proposed: [zuul/zuul-jobs] 878501: Add --insecure-policy to skopeo promote command https://review.opendev.org/c/zuul/zuul-jobs/+/878501 01:57
@jim:acmegating.comianw: ^01:57
@iwienand:matrix.orglgtm, mine has "insecureAcceptAnything" which seems as good as setting a flag01:58
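The two workarounds mentioned here (a permissive `policy.json` versus the `--insecure-policy` flag) can be sketched side by side. This is a minimal illustration, not the content of change 878501: the helper name `skopeo_copy_cmd`, the choice of the `copy` subcommand, and the use of the `latest` tag as the destination are assumptions; the repository and temporary tag names are taken from elsewhere in this log.

```python
import json

# A permissive trust policy in the containers-policy.json format,
# i.e. what a host might have in /etc/containers/policy.json.
PERMISSIVE_POLICY = {"default": [{"type": "insecureAcceptAnything"}]}
policy_text = json.dumps(PERMISSIVE_POLICY, indent=2)


def skopeo_copy_cmd(src: str, dst: str, insecure_policy: bool = True) -> list[str]:
    """Build (but do not run) a skopeo copy command line."""
    cmd = ["skopeo"]
    if insecure_policy:
        # Global option, so it goes before the subcommand; it makes skopeo
        # accept any image without needing a policy file on disk.
        cmd.append("--insecure-policy")
    cmd += ["copy", f"docker://{src}", f"docker://{dst}"]
    return cmd


cmd = skopeo_copy_cmd(
    "quay.io/zuul-ci/zuul-client:change_878289_latest",
    "quay.io/zuul-ci/zuul-client:latest",
)
print(policy_text)
print(" ".join(cmd))
```

The flag has the advantage that it works on an executor with no `/etc/containers/policy.json` at all, which is the situation the error message describes.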
-@gerrit:opendev.org- Zuul merged on behalf of James E. Blair https://matrix.to/#/@jim:acmegating.com: [zuul/zuul-jobs] 878501: Add --insecure-policy to skopeo promote command https://review.opendev.org/c/zuul/zuul-jobs/+/878501 02:39
@iwienand:matrix.orgcorvus: closer02:45
@iwienand:matrix.org ```02:46
time="2023-03-24T02:45:06Z" level=fatal msg="Failed to delete /v2/zuul-ci/zuul-client/manifests/sha256:b518322c8c4daa704d2d0276c6da36651b947b7547e5df4747c77c57756d8ca7: {\"errors\":[{\"code\":\"UNAUTHORIZED\",\"detail\":{},\"message\":\"access to the requested resource is not authorized\"}]}\n (401 UNAUTHORIZED)"
```
@iwienand:matrix.orgautomationtools has write access, i wonder if that includes delete, or that requires admin?02:49
@iwienand:matrix.org```It is recommend not to use skopeo delete when deleting tags. Instead of that, Quay provides a specific API endpoint to delete tags, that doesn't touch the underlying image. Quay's API endpoint will delete only the specified tag, all other tags that are connected to that image will stay intact. ``` https://access.redhat.com/solutions/4600151 02:55
@jim:acmegating.comthat sounds concerning02:56
@jim:acmegating.comnevertheless, how about we give it admin perms and try again and see what happens?02:56
@jim:acmegating.comno time like the present... :)02:56
@iwienand:matrix.orgi can try, i have it open02:57
@iwienand:matrix.orgtrying to find something that explicitly says what can do what, but so far haven't found it02:57
@iwienand:matrix.orgit's about to start02:58
@jim:acmegating.comwow that really did delete latest didn't it?02:59
@iwienand:matrix.org https://zuul.opendev.org/t/zuul/build/b04e8e6ffa574c618df38bce863176cb/console#1/0/7/localhost 02:59
@jim:acmegating.comit looks like that's interpreted as "delete the thing this tag points to"03:00
@iwienand:matrix.org latest was deleted 03:00
@jim:acmegating.comand also all the tags that point to it03:00
@iwienand:matrix.orgno wonder you need admin :)03:00
@jim:acmegating.comi have (using the web ui) restored the change_878289_latest tag03:01
@jim:acmegating.comso that we can try more things later just by re-running this buildset03:01
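The behaviour observed above (deleting one tag also removed `latest`) can be modelled with a toy registry in which tags are just names pointing at a manifest digest. The class, its method names, and the placeholder digest are hypothetical; this is a sketch of the observed semantics, not of Quay's or skopeo's implementation.

```python
class TinyRegistry:
    """Toy model: a mapping of tag name -> manifest digest."""

    def __init__(self) -> None:
        self.tags: dict[str, str] = {}

    def push(self, tag: str, digest: str) -> None:
        self.tags[tag] = digest

    def delete_manifest_via_tag(self, tag: str) -> list[str]:
        """Resolve the tag to its digest and delete that manifest.

        Every tag pointing at the same digest disappears with it.
        """
        digest = self.tags[tag]
        removed = sorted(t for t, d in self.tags.items() if d == digest)
        for t in removed:
            del self.tags[t]
        return removed

    def untag(self, tag: str) -> None:
        """Remove only the named tag, leaving the manifest and other tags."""
        del self.tags[tag]


digest = "sha256:placeholder"  # hypothetical digest, for illustration only
reg = TinyRegistry()
reg.push("change_878289_latest", digest)
reg.push("latest", digest)  # promotion: same manifest, second tag

removed = reg.delete_manifest_via_tag("change_878289_latest")
print(removed, reg.tags)
```

In this model the cleanup step needs something like `untag`, which corresponds to the tag-only API endpoint described in the quoted Red Hat note rather than a manifest delete.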
@iwienand:matrix.orgthat seems super annoying because you need different tokens to hit the api03:02
@jim:acmegating.comayup.03:02
@jim:acmegating.comi'm going to sign off for the evening.03:02
@iwienand:matrix.orgyeah i'm probably not going to get too far with this, i'll see.  but i guess at least we have an idea what's going on03:03
@iwienand:matrix.orgthe nodepool multi-arch build looks good now at least -> https://zuul.opendev.org/t/zuul/build/4003ae9fc2d84468bfec0c7966ca3920/artifacts 05:16
-@gerrit:opendev.org- Simon Westphahl proposed: [zuul/zuul] 878523: Log commit SHA when getting files from repo https://review.opendev.org/c/zuul/zuul/+/878523 10:36
@flaper87:matrix.orgIs there a way to get the name of the branch that was cloned? The `zuul.change` variable contains the number of the PR (Github) and branch the name of the target branch. I must be missing something because I can't find the variable holding the commit sha or the name of the source branch13:23
@fungicide:matrix.org> <@clarkb:matrix.org> and for some reason it was never a pier. Maybe because a pier implies pilings rather than a solid construction?13:49
my dock is built on pilings, but then so is my house
@fungicide:matrix.orgalso somewhat related, because most of the houses here are named, we decided to call ours "skeleton quay" (i still need to finish working on the nameplate/signage though)13:50
@fungicide:matrix.orgof course, only people who know the correct pronunciation will get the pun13:51
@jim:acmegating.comClark: ianw fungi i think the delete behavior is more or less specified by the protocol... so i think we need to rethink the promote role13:52
@fungicide:matrix.orgthe design i'm working on fits some of the letters into teeth on a skeleton key, so maybe it will be a hint13:52
@jim:acmegating.comhere are some options:13:55
1) don't have the promote role delete the temporary tags. leave them there forever.
1b) don't have the promote role delete the temporary tags, but have some new quay-api specific role come along and delete them later.
2) don't upload the image to quay in the upload role, instead, have the promote role copy the image from the intermediate registry to quay. this makes the promote role a little slower since it's actually moving data.
2b) have the upload role push to a single static tag on quay like "zuul_gate_pipeline_latest" just to get all the layers on quay, and then have the promote role copy from the intermediate registry. should still be relatively fast with all the layers already there.
@jim:acmegating.comi kind of like (2)  (a little more than 2b, that seems a little messy even if it might be the most efficient)13:56
@fungicide:matrix.orgwas there a reason we hadn't been doing #2 previously for dockerhub uploads?13:57
@fungicide:matrix.orgi agree it seems cleaner than the old way13:57
@fungicide:matrix.orgdid it pre-date the existence of our insecure ci registry maybe?13:58
@jim:acmegating.comfungi: so that we can minimize the time delta between when a change merges and the image is published.  basically, it makes gating container images a little squirrely if that delta is large13:58
@jim:acmegating.com(because once a change is merged, if there are any changes in gate that depend on that, zuul is going to assume the image that includes it is already published rather than fetch a speculative image from the intermediate registry)13:59
@fungicide:matrix.orgaha, good point, large image uploads do take time14:00
@fungicide:matrix.orgwhile tags are nearly instantaneous14:00
@jim:acmegating.comi feel like maybe we can assume option (2) will generally be fast enough for us most of the time, and if we have problems, we can "upgrade" from 2 to 2b without major design changes.14:01
@jim:acmegating.com(there's probably an option 2c as well, where we devise a tool of our own that pushes the underlying layers without pushing a tag;  that would be fast and clean)14:02
@jim:acmegating.com(2b and 2c assume that the registry isn't going to go and prune dangling layers between our gate and promote events.  but even if it does, the behavior just degrades to option 2 -- the layers get copied again)14:03
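The trade-off between options 2, 2b and 2c comes down to how many layers the promote step still has to move. A rough sketch, assuming a registry copy only transfers layers the destination does not already hold; the function name and layer identifiers are made up for illustration.

```python
def layers_to_copy(image_layers: list[str], layers_on_target: set[str]) -> list[str]:
    """Layers the promote step would still need to transfer, in image order."""
    return [layer for layer in image_layers if layer not in layers_on_target]


image = ["base", "deps", "app"]  # hypothetical layer identifiers

# Option 2: nothing was uploaded at gate time, so promote moves everything.
option_2 = layers_to_copy(image, set())

# Options 2b / 2c: the gate job already pushed every layer to the target.
option_2b = layers_to_copy(image, {"base", "deps", "app"})

# 2b / 2c after the registry pruned dangling layers: same cost as option 2.
pruned = layers_to_copy(image, set())

print(option_2, option_2b, pruned)
```

This matches the point made in the last message: if pre-seeded layers are pruned, the result is still correct and only the amount of data copied at promote time goes back up.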
@clarkb:matrix.org2) seems reasonable to me14:20
@jim:acmegating.comClark: fungi ianw the existing roles don't actually depend on the intermediate registry.  that's probably another reason the docker roles didn't do #2. so if we do that, we will be adding a dependency on that.  i still think that's the way to go.14:54
@clarkb:matrix.orgcorvus: considering these are new jobs I don't see a reason that would be a problem.14:57
@clarkb:matrix.organd even then this seems like a reasonable compromise 14:57
-@gerrit:opendev.org- James E. Blair https://matrix.to/#/@jim:acmegating.com proposed: [zuul/zuul-jobs] 878538: WIP: Update promote-container-image to copy from intermediate registry https://review.opendev.org/c/zuul/zuul-jobs/+/878538 14:59
@jim:acmegating.comthat ^ is a mostly docs / comments change to sketch out what i think that would look like15:00
-@gerrit:opendev.org- Zuul merged on behalf of Simon Westphahl: [zuul/zuul] 877245: Set cache ltime when branch protection changed https://review.opendev.org/c/zuul/zuul/+/877245 15:10
@clarkb:matrix.orgcorvus: the docs update makes sense to me. I did leave a comment for additional clarification though20:40
@clarkb:matrix.orgno rush on that since this is already incomplete20:40
