pabelanger[m] | okay, 2nd issue | 00:10 |
---|---|---|
pabelanger[m] | changes don't appear to be enqueuing into gate now | 00:11 |
pabelanger[m] | I see the following | 00:11 |
pabelanger[m] | 2021-06-24 00:07:36,153 DEBUG zuul.Pipeline.ansible.gate: [e: 2d2601a0-d480-11eb-9486-73eec78895a0] Change <Change 0x7fb6c81ed0f0 ansible-network/windmill-config 844,b4458df0176c4e7ba802a7ca03f32a530fa5a0e9> does not match pipeline requirement <GithubRefFilter connection_name: github.com statuses: ansible-zuul\[bot\]:ansible/check:success open: True current-patchset: True labels: ['gate']> because RequiredStatuses | 00:11 |
pabelanger[m] | ['ansible-zuul\\[bot\\]:ansible/check:success'] does not match ['ansible-zuul:ansible/check:success'] | 00:11 |
ianw | pabelanger[m]: just clutching, what's with the [bot]? bit | 00:25 |
pabelanger[m] | that's how it worked in 3.9.1 | 00:25 |
pabelanger[m] | okay, dropping the [bot] part, seems to have done it | 00:28 |
pabelanger[m] | trying to confirm now | 00:29 |
pabelanger[m] | ianw: okay, seems to have done it | 00:45 |
pabelanger[m] | I've also noticed we are down to 2 versions of ansible in zuul | 01:13 |
pabelanger[m] | where 2.8 is EOL and 2.9 is in security mode | 01:13 |
pabelanger[m] | guess we need to add 2.10 and 3.0.0 / 4.0.0 support | 01:13 |
corvus | pabelanger: i agree, that looks like a doc bug; it should say that the keystore is used by scheduler+executor | 01:48 |
opendevreview | Paul Belanger proposed zuul/zuul master: Ensure executor also as keystore.password setting https://review.opendev.org/c/zuul/zuul/+/797800 | 04:46 |
*** marios is now known as marios|ruck | 05:01 | |
*** bhagyashris_ is now known as bhagyashris | 05:13 | |
*** rpittau|afk is now known as rpittau | 07:21 | |
*** jpena|off is now known as jpena | 07:36 | |
*** sshnaidm|afk is now known as sshnaidm | 08:45 | |
*** marios is now known as marios|ruck | 09:48 | |
*** bhagyashris_ is now known as bhagyashris | 09:48 | |
opendevreview | Merged zuul/zuul master: Ensure executor also as keystore.password setting https://review.opendev.org/c/zuul/zuul/+/797800 | 10:04 |
tibeer | Hello. I have a question regarding the tenant configuration for untrusted projects. I’m using GitHub repositories, but the default branch name is „main“. For configuration-projects this can be set by „load-branch“, but this does not work for untrusted-projects. I’m getting this message during pushes from GitHub on my build details page: „Error: Project github.com/tibeerorg/zuul_config does not | 11:08 |
tibeer | have the default branch master“. Am I missing something here or might this be a missing feature? | 11:08 |
fungi | tibeer: https://zuul-ci.org/docs/zuul/reference/project_def.html#attr-project.default-branch | 11:27 |
tibeer | thanks a lot! i knew i was missing something | 11:27 |
fungi | no worries, there's a lot of documentation to wade through | 11:31 |
*** jpena is now known as jpena|lunch | 11:39 | |
-opendevstatus- NOTICE: Our Zuul gating CI/CD services will be offline starting around 14:00 UTC (in roughly two hours from now) in order to apply some critical security updates, and is not expected to remain offline for more than 30 minutes. | 12:03 | |
*** jpena|lunch is now known as jpena | 12:36 | |
pabelanger[m] | morning, is there a time frame setup to tag the next release? | 13:18 |
corvus | NOOO! | 13:18 |
corvus | pabelanger, tobiash, ianw: you merged a change to zuul after i tagged 4.6.0 | 13:18 |
*** marios|ruck is now known as marios|ruck|call | 13:19 | |
corvus | zuul-maint: do you have any suggestions as to how we should handle the ensuing security release in 40 minutes? | 13:19 |
tobiash[m] | corvus: wasn't that only a doc change? | 13:19 |
tobiash[m] | I thought that would be ok, sorry for that | 13:19 |
corvus | i spent hours yesterday rebasing the changes and building releases, and all of the git commits were going to be fast-forwarded, and the actual release tag would be on the master branch... | 13:20 |
corvus | i guess i will repeat that process now, and hope i can complete it in 40 minutes | 13:20 |
avass[m] | sounds like the easiest way is to force push master to undo the last change and re-apply that later? | 13:20 |
corvus | avass: then people who have pulled won't be able to fast-forward | 13:21 |
fungi | corvus: i guess if it takes a few minutes longer that's not the end of the world. unfortunate though yes | 13:22 |
mnaser | i like to think a lot of the bigger zuul operators are gonna hangout here anyways | 13:23 |
corvus | okay. i recorded all of my commands from yesterday, so hopefully i can run the same sequence now | 13:23 |
mnaser | and getting an email a few mins later in zuul-announce should be alright :) | 13:23 |
corvus | tristanC: ^ zuul master has moved since yesterday. i need to rebase the patches. is that going to cause a problem for you? | 13:25 |
fungi | i suppose one way to save time would be to not rebuild the 4.6.0 container images and just accept that they're missing that doc change so don't perfectly correspond to what ends up being tagged as 4.6.0 | 13:26 |
tristanC | corvus: why do you have to rebase the patches? | 13:26 |
fungi | tristanC: they'll either be rebased or merged one way or the other | 13:26 |
corvus | and i think it would be prudent for the tag and release artifacts i'm manually publishing to be built from a commit that's actually in the master branch | 13:28 |
tristanC | corvus: but aren't we going to merge those patch through zuul after it is restarted? | 13:29 |
fungi | right, and to not tag a locally-created merge commit | 13:29 |
fungi | tristanC: nope | 13:29 |
fungi | tristanC: per the circulated plan they'll be pushed while our zuul is offline | 13:29 |
fungi | along with the 4.6.0 tag | 13:29 |
corvus | well, we could do either; they will be ff regardless | 13:30 |
tristanC | fungi: i see, but perhaps we don't have to push the tag | 13:30 |
fungi | this is probably not the time to be deciding further deviations from the planned timeline | 13:30 |
tobiash[m] | isn't it the lower cost to force push instead and accepting the non-fast forward for the likely few people who pulled in the last few hours? | 13:31 |
tristanC | corvus: in any case that's not going to cause a problem for us | 13:31 |
fungi | any more than are already necessitated by the fact that the master branch is now different from what was tested and built privately in advance | 13:31 |
corvus | i am adamantly opposed to force-pushing master | 13:31 |
corvus | fungi: i did actually have the 'push changes' step as git-review after restarting zuul; i wasn't planning on running 'git push' for those. | 13:32 |
corvus | but maybe we should 'git push' them instead of 'git review' | 13:32 |
tristanC | corvus: then i think it's ok to publish your existing sdist and container image using HEAD^ + the patch, then once zuul is restarted, propose the patch with git-review | 13:33 |
corvus | since they're effectively merged already, then do any fixups as new changes | 13:33 |
avass[m] | It's probably better to just pick a solution and roll with it, and since there are strong opinions about force pushing master maybe re-doing yesterday's work and sending an announcement that it's gonna take a bit longer is best? | 13:33 |
fungi | corvus: i thought that was for the opendev job config updates you had drafted, otherwise how can you push the tag without pushing the commits it's based on? | 13:33 |
corvus | fungi: that's obvious in retrospect, thanks for reminding me, this is why we have review :) | 13:34 |
corvus | fungi: i would have been surprised at what happened :) | 13:34 |
avass[m] | (instead of having a discussion that is starting to add significantly more time) | 13:34 |
fungi | heh, no worries | 13:34 |
corvus | that makes me think it's even more prudent to rebase, retag, rebuild. | 13:35 |
corvus | fungi: i think pushing the tag and pushing master are still separate things | 13:35 |
fungi | and yes, i think redo the steps from yesterday, rebase, recreate the tag. i still think not rebuilding the container images and accepting they lack the doc change might be acceptable | 13:35 |
corvus | fungi: so maybe we should adjust the plan to push the tag and master at the same time? | 13:36 |
fungi | corvus: that's fair, the commits will exist in the repo initially just not on the master branch, however gerrit may balk at changes pushing the same commits | 13:36 |
fungi | so i do think pushing master at the same time as the tag makes sense | 13:37 |
corvus | i'll rebuild the binaries too | 13:38 |
fungi | and sorry, i had assumed that was implied, but yes you had specific git push commands in the plan and didn't separately push the master branch before the tag in what you had written | 13:38 |
fungi | i missed that in reading through it | 13:38 |
corvus | sdist is built; building container images now | 13:42 |
corvus | pabelanger, tobiash, ianw: i'm sorry, i should have sent out an announcement asking folks to freeze the master branch | 13:47 |
*** marios|ruck|call is now known as marios|ruck | 13:49 | |
pabelanger[m] | yah, my bad. I could have waited until today to push up my change | 13:51 |
pabelanger[m] | did it basically as I went offline to go to bed | 13:51 |
corvus | the images are re-built | 13:52 |
corvus | so i think we're ready to go | 13:52 |
avass[m] | Oh, time to take things offline then | 13:52 |
fungi | thanks! and sorry about the scramble | 13:52 |
tobiash[m] | phew, sorry again for causing such stress | 13:52 |
corvus | i've stopped gerrit's zuul | 13:53 |
fungi | i can stop our scheduler in opendev unless you're already working on it | 13:54 |
corvus | fungi: go for it :) | 13:54 |
fungi | i'll coordinate that in #opendev | 13:54 |
fungi | corvus: it's offline | 13:56 |
corvus | okay, should i start pushing docker images now? | 13:56 |
corvus | i have a slow link, the first one could take a while | 13:57 |
fungi | yeah, go for it | 13:57 |
-opendevstatus- NOTICE: Our Zuul gating CI/CD services are being taken offline now in order to apply some critical security updates, and are not expected to remain offline for more than 30 minutes. | 13:58 | |
corvus | they're laying fiber in my neighborhood soon | 14:08 |
fungi | not soon enough apparently! | 14:08 |
avass[m] | corvus: wow, I don't think I've ever had anything below 100Mb/s :) | 14:09 |
fungi | i have fond memories of 110bps over an acoustic coupler | 14:09 |
corvus | first image is almost done; the next is the executor which has another layer that might take a little bit, then all the rest should be quick | 14:10 |
corvus | oh, i drastically underestimated the size of the executor layer | 14:11 |
fungi | nothing to be done about it now | 14:11 |
corvus | i'll try to work on a time estimate | 14:12 |
fungi | thanks! | 14:12 |
corvus | (but for reference, it looks to be about 2x the data for the first image upload, so as a first approximation, this could take another 30m) | 14:13 |
corvus | eta 20m | 14:18 |
fungi | not too terrible | 14:19 |
corvus | oh, i guess there were a bunch of zeroes; it's finishing early | 14:26 |
fungi | i like zeroes | 14:26 |
fungi | but especially today | 14:26 |
mnaser | (especially at the end of my bank account, the more the better? :P) | 14:26 |
fungi | mnaser: i have a lot of those, but unfortunately they're all after the . | 14:27 |
mnaser | :P | 14:27 |
fungi | or , if in europe | 14:27 |
corvus | docker images pushed | 14:28 |
corvus | pushing git tag and commits now | 14:28 |
fungi | the docker tags are updated now too? if so i'll start pulling in opendev | 14:29 |
mnaser | so 4.6.0 is what we should pull | 14:29 |
avass[m] | I can pull zuul/zuul-executor:4.6.0, but they don't seem to show up in dockerhub yet | 14:30 |
avass[m] | nevermind now they do :) | 14:30 |
fungi | mnaser: 4, 4.6, 4.6.0 or latest | 14:30 |
corvus | git pushed | 14:30 |
fungi | should all be updated according to the plan which was worked out | 14:30 |
corvus | Falson: correct | 14:30 |
corvus | er fungi ^ | 14:30 |
fungi | thanks | 14:30 |
corvus | uploading to pypi | 14:31 |
* mnaser watches pipeline | 14:31 | |
tristanC | https://pypi.org/project/zuul/ says 4.6.0 is available | 14:35 |
corvus | pypi upload complete | 14:35 |
corvus | working on announcement email now | 14:35 |
corvus | announcement sent | 14:35 |
tristanC | and zuul-4.6.0-1.el7 is also now available in the sf-3.6-release repository, software-factory user can now run `sfconfig --update` | 14:36 |
avass[m] | deployed | 14:36 |
opendevreview | James E. Blair proposed zuul/zuul-jobs master: Prevent leaks of buildset registry credentials https://review.opendev.org/c/zuul/zuul-jobs/+/797909 | 14:36 |
pabelanger[m] | starting the upgrade | 14:36 |
fungi | opendev is on its way back up as well | 14:37 |
corvus | changes pushed to zuul-jobs and opendev/base-jobs | 14:37 |
corvus | we'll need to recheck those and merge asap | 14:37 |
gtema | https://zuul-ci.org/docs/zuul/4.6.0/reference/releasenotes.html - returns 404 | 14:37 |
corvus | gtema: yeah, we'll have to catch up doc publishing later; they're in the email for now though | 14:38 |
fungi | gtema: yes, those won't be published just yet, good point | 14:38 |
corvus | i guess we should have said they would eventually be published there :/ | 14:38 |
fungi | there are 6 stories i sent a private reply about asking to confirm before switching to public, but would also like to do that asap | 14:38 |
avass[m] | pabelanger: your change would have been perfect to bump the docs ;) | 14:39 |
gtema | yes, I just followed link from email | 14:39 |
corvus | fungi: i think it's okay to flip the switch now | 14:39 |
fungi | so just to confirm, all 6 of those are okay to be public now? | 14:39 |
corvus | let me check the list | 14:40 |
fungi | thanks, i didn't want to inadvertently disclose something we haven't fixed yet | 14:40 |
fungi | so just wanted a second set of eyes on that list | 14:40 |
corvus | fungi: lgtm | 14:44 |
fungi | thanks, will switch them momentarily | 14:45 |
corvus | i've rechecked the 2 changes i uploaded | 14:45 |
fungi | ahh, thanks, i was about to approve them | 14:45 |
fungi | though will the base-jobs change be able to pass testing? | 14:45 |
fungi | or will it fail on the problems it addresses? | 14:46 |
corvus | i think it will pass | 14:46 |
pabelanger[m] | the biggest unknown for me right now, is potential changes needed in our base jobs. I think we are good, but guess we will know shortly | 14:46 |
corvus | i don't think it uses those secrets | 14:46 |
fungi | ahh, okay | 14:46 |
avass[m] | pabelanger: we've tried to make it as easy to mitigate any issues at least | 14:46 |
corvus | fungi: i think we need to make a corresponding change to openstack/project-config | 14:47 |
tristanC | it seems like our post log playbook is now failing when using `- hosts: "{{ site.fqdn }}"` when site is a secret | 14:47 |
corvus | fungi: i think it has very similar secrets, but the values are slightly different | 14:47 |
corvus | tristanC: you may be able to use the format approach i did in https://review.opendev.org/797910 | 14:48 |
corvus | oh | 14:48 |
corvus | i'm not sure i understand why that isn't working | 14:48 |
avass[m] | no that should work | 14:49 |
avass[m] | I think ? | 14:49 |
tobiash[m] | so far everything looks good in our env including base jobs | 14:49 |
tristanC | corvus: yes, that is surprising, the exception http://paste.openstack.org/show/806922/ | 14:49 |
tristanC | and the "{{ site }}" works in a task attribute before | 14:51 |
pabelanger[m] | deploying 4.6.0 now | 14:51 |
corvus | tristanC, avass: is it because secrets are hostvars now? | 14:52 |
corvus | so it's out of scope there? | 14:53 |
avass[m] | yeah that could be | 14:53 |
tristanC | corvus: that would explain the failure yes, so i guess we can no longer do that | 14:53 |
corvus | i wonder if you can set a fact in a play before it? | 14:53 |
corvus | hrm that's not working for me | 14:54 |
avass[m] | it works if you do `hostvars[<inventory_hostname>].<var>` but not as a set fact | 14:55 |
corvus | avass: ++ | 14:56 |
corvus | tristanC: maybe you can change that to: hosts: "{{ hostvars['localhost'].site.fqdn }}" | 14:56 |
corvus | (with no need to run set_fact) | 14:57 |
mnaser | https://curl.zuul.vexxhost.dev/status | 14:57 |
mnaser | running 4.6.0 | 14:57 |
avass[m] | oh, but only if I set it as a fact in a play just before that play :) | 14:57 |
corvus | avass: right, but the secret "site" should still already be a hostvar, so i think set_fact isn't needed | 14:57 |
avass[m] | corvus: it actually doesn't | 14:58 |
avass[m] | so set_fact is needed | 14:58 |
tristanC | corvus: we can also simply splice the hostname directly in the post playbook | 14:59 |
corvus | if that's easy... :) but i'm certain avass's suggestion of setting the fact and then accessing it as a hostvar will work (i verified that) | 15:00 |
corvus | i'm testing my suggestion of not using set_fact now :) | 15:00 |
corvus | avass: i agree it does not seem to work; that seems weird to me. | 15:01 |
avass[m] | even weirder is that I don't even have to make it cacheable | 15:02 |
corvus | avass: i don't think you need to cache between plays, just between playbooks, right? | 15:02 |
fungi | i think caching is only needed if you want to pass it between different playbooks | 15:02 |
avass[m] | ah | 15:02 |
corvus | avass: oh weird! it's hostvars that's undefined if you don't set_fact | 15:07 |
corvus | that makes no sense | 15:07 |
* corvus < https://matrix.org/_matrix/media/r0/download/matrix.org/rPYNHehoGrbUbDdpWzQwwyNO/message.txt > | 15:08 | |
corvus | that actually works, but not without that first play | 15:08 |
corvus | (cacheable isn't needed) | 15:09 |
tristanC | also https://opendev.org/opendev/base-jobs/src/branch/master/roles/submit-logstash-jobs/tasks/main.yaml#L9 is now failing with `The task includes an option with an undefined variable. The error was: 'None' has no attribute 'get'` | 15:10 |
tristanC | i guess we need an extra `.get('data')` ? | 15:11 |
fungi | tristanC: is that after we merged 797910? | 15:14 |
avass[m] | corvus: weird | 15:14 |
tristanC | fungi: it seems like the issue will still happen after 797910 | 15:18 |
tristanC | i mean i don't see how 797910 would affect the logstash-jobs task | 15:19 |
corvus | https://review.opendev.org/797910 exercised its own fix, so the .format() method of dealing with jinja in secrets looks like it works | 15:21 |
pabelanger[m] | 4.6.0 - https://dashboard.zuul.ansible.com/t/ansible/status | 15:21 |
pabelanger[m] | checking to see if there is any fall out | 15:21 |
corvus | i agree; i don't understand the issue with the logstash playbook | 15:21 |
corvus | oh i think i may know, 1 sec | 15:22 |
corvus | tristanC: oh yes, that's exactly it, .get('data') | 15:23 |
corvus | tristanC: i agree -- this should fix it: "{{ (lookup('file', zuul.executor.result_data_file) | from_json).get('data').get('zuul').get('log_url') }}" | 15:23 |
corvus | tristanC: you want to propose that? | 15:24 |
tristanC | corvus: yes sure, it's https://review.opendev.org/c/opendev/base-jobs/+/797960 | 15:25 |
corvus | already approved | 15:27 |
corvus | tristanC, tobiash: does https://review.opendev.org/797135 look like an easy approval so we can exercise all of zuul's gate jobs? | 15:28 |
tobiash[m] | lgtm | 15:29 |
pabelanger[m] | https://review.opendev.org/c/zuul/zuul-jobs/+/797909/ failed linters | 15:29 |
pabelanger[m] | plus some testing | 15:30 |
pabelanger[m] | looking now | 15:30 |
corvus | it's probably the same error | 15:30 |
opendevreview | James E. Blair proposed zuul/zuul-jobs master: Prevent leaks of buildset registry credentials https://review.opendev.org/c/zuul/zuul-jobs/+/797909 | 15:31 |
fungi | which has now merged | 15:31 |
pabelanger[m] | pull-from-intermediate-registry: Load information from zuul_return seemed to fail | 15:31 |
corvus | pabelanger: ah that's going to be the same issue tristan just found; i'll fix that | 15:33 |
pabelanger[m] | cool | 15:33 |
tristanC | so far the two unexpected failures are using secret in hosts play value and when using zuul result lookup | 15:34 |
pabelanger[m] | corvus: I am guessing, container builds will not work until we land 797909? | 15:36 |
avass[m] | well everyone quit early because of midsummer tomorrow so we have no traffic :) | 15:36 |
pabelanger[m] | must be, seeing retries on our jobs right now | 15:38 |
opendevreview | James E. Blair proposed zuul/zuul-jobs master: Prevent leaks of buildset registry credentials https://review.opendev.org/c/zuul/zuul-jobs/+/797909 | 15:40 |
corvus | pabelanger: retries on container jobs? yeah, i think 797909 will be necessary | 15:43 |
pabelanger[m] | Yah, this is what we are seeing | 15:43 |
pabelanger[m] | The task includes an option with an undefined variable. The error was: 'dict object' has no attribute 'buildset_registry'\n\nThe error appears to be in '/var/lib/zuul/builds/129e3b83a77649db98651863a6122153/trusted/project_1/opendev.org/zuul/zuul-jobs/roles/pull-from-intermediate-registry/tasks/main.yaml': line 2, column 3 | 15:43 |
pabelanger[m] | so, once we land your change, it means 4.6.0 is the min requirement now for container build jobs | 15:44 |
corvus | that agrees with the latest fix i did | 15:44 |
corvus | yep | 15:45 |
corvus | but, also, s/for container build jobs// given how exploitable the issues are | 15:45 |
pabelanger[m] | https://storage.gra.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_8d3/797909/3/check/zuul-jobs-test-registry-buildset-registry/8d305dc/job-output.txt | 15:48 |
pabelanger[m] | same error as above | 15:49 |
tobiash[m] | pabelanger: does that re-read the result data json? | 15:49 |
tobiash[m] | we had that in one project and that file changed its layout | 15:50 |
tobiash[m] | pabelanger: you can put .data in between | 15:51 |
pabelanger[m] | I'm still getting up to speed, will defer to @corvus to update | 15:52 |
opendevreview | Tobias Henkel proposed zuul/zuul-jobs master: Fix reading buildset_registry from results.json https://review.opendev.org/c/zuul/zuul-jobs/+/797963 | 15:53 |
tobiash[m] | corvus, pabelanger I think that should fix it ^ | 15:53 |
tobiash[m] | we had a similar issue in one of our projects | 15:54 |
opendevreview | Tobias Henkel proposed zuul/zuul-jobs master: Fix reading buildset_registry from results.json https://review.opendev.org/c/zuul/zuul-jobs/+/797963 | 15:54 |
corvus | tobiash: that's already in my latest patchset | 15:54 |
corvus | except it should be secret_data, not data | 15:55 |
tobiash[m] | oh, sorry, I didn't follow all the backscroll | 15:55 |
tobiash[m] | then I'll abandon that | 15:55 |
corvus | https://review.opendev.org/797909 is the change | 15:55 |
pabelanger[m] | ah | 15:55 |
pabelanger[m] | so my comment is not right | 15:55 |
pabelanger[m] | however, ansible is failing on that line | 15:55 |
corvus | it's 2 issues -- adding in the extra level of hierarchy, and then actually fixing the security issue by using secret_data. 797909 is attempting to do both | 15:58 |
pabelanger[m] | podman test seems to be failing now too, looks like we are not finding the image on the buildset registry and going upstream | 16:00 |
corvus | if anyone can spot the issue in https://zuul.opendev.org/t/zuul/build/a8622de6cfaf47bb9a8114ffeed1c72e/console please let me know, i don't see it yet | 16:01 |
pabelanger[m] | i'm comparing to https://zuul.opendev.org/t/zuul/build/dc9e40d2ef144c69be6e73af59a091f8/console which looks to be the last good run | 16:04 |
pabelanger[m] | use-buildset-registry: Modify registries.conf doesn't seem to change | 16:06 |
pabelanger[m] | modify_registries_conf not working properly for some reason? | 16:07 |
pabelanger[m] | which means we don't inject the buildset_registry? | 16:07 |
*** rpittau is now known as rpittau|afk | 16:09 | |
pabelanger[m] | yah, then our docker config isn't saved | 16:09 |
corvus | i think i see an issue | 16:10 |
corvus | https://opendev.org/zuul/zuul-jobs/src/branch/master/test-playbooks/registry/test-registry.yaml#L101-L102 | 16:11 |
avass[m] | corvus: was just about to comment that | 16:12 |
corvus | i don't think we can override "zuul.*" vars like that any more | 16:12 |
avass[m] | no, since they're extra vars | 16:12 |
avass[m] | I think I pointed that out while discussing the update on irc a couple of weeks ago :) | 16:12 |
avass[m] | corvus: if those roles need to use `zuul` vars running a nested ansible command is probably the easiest way to test them | 16:13 |
corvus | you may have; unfortunately, i didn't remember that when i was checking on the status of the patches in storyboard | 16:14 |
corvus | avass: yeah, though the restricted ansible in zuul is different than plain ansible | 16:15 |
corvus | so it's a divergence from testing like prod | 16:16 |
avass[m] | most differences are there to protect from running things on localhost right? | 16:16 |
corvus | another alternative would be to pass in variables or default to zuul.* -- if that's safe | 16:16 |
avass[m] | oh, I think it should be safe to add a `vars` entry in the role that defaults to zuul and override it like that | 16:17 |
avass[m] | since it's only possible to override role vars with include_role vars or extra vars | 16:17 |
corvus | right, but is it safe to allow someone to pass in a separate list of previously built artifacts? | 16:18 |
avass[m] | it would only be possible to override it in the same playbook that uses the role, so yes? | 16:19 |
corvus | avass: we're talking about saying "zuulartifacts: zuulartifacts | default(zuul.artifacts)" | 16:20 |
corvus | grr | 16:20 |
avass[m] | corvus: I mean like add-build-sshkey: https://opendev.org/zuul/zuul-jobs/src/branch/master/roles/add-build-sshkey/vars/main.yaml | 16:20 |
corvus | avass: we're talking about saying "zuul_artifacts: zuul_artifacts | default(zuul.artifacts)" | 16:20 |
corvus | actually let me rephrase that to make it clearer | 16:21 |
corvus | "zuul_artifacts: test_zuul_artifacts | default(zuul.artifacts)" | 16:21 |
corvus | so anyone could add test_zuul_artifacts as a job var | 16:21 |
avass[m] | let me push an example :) | 16:21 |
corvus | i understand the mechanics :) | 16:22 |
avass[m] | job vars wouldn't be able to override it | 16:22 |
corvus | avass: okay, i think i see what you're saying | 16:23 |
corvus | you'd have to write a new playbook that invokes the role to override it | 16:23 |
corvus | (i think it would actually be okay for job vars to override this, but i like it better if they can't) | 16:24 |
corvus | i'll work on a new patchset | 16:24 |
opendevreview | Albin Vass proposed zuul/zuul-jobs master: Add _zuul var that can be overriden when including role https://review.opendev.org/c/zuul/zuul-jobs/+/797977 | 16:25 |
avass[m] | that ^ makes it so the only way to override the roles internal `_zuul` is to do exactly what the test does | 16:26 |
opendevreview | James E. Blair proposed zuul/zuul-jobs master: Prevent leaks of buildset registry credentials https://review.opendev.org/c/zuul/zuul-jobs/+/797909 | 16:26 |
corvus | avass: okay i think that ^ is your suggestion but more minimal | 16:27 |
avass[m] | corvus: ++ | 16:27 |
corvus | i want to add a comment to that, but i'll let the builds finish first | 16:28 |
*** jpena is now known as jpena|lunch | 16:30 | |
*** jpena|lunch is now known as jpena | 16:30 | |
*** jpena is now known as jpena|off | 16:31 | |
corvus | better | 16:46 |
corvus | there's an issue with the inception aspect of this: https://zuul.opendev.org/t/zuul/build/4da79c76a392443d870216ef1b564eef/console | 16:46 |
corvus | (the real buildset registry before the fake buildset registry) | 16:46 |
opendevreview | James E. Blair proposed zuul/zuul-jobs master: Prevent leaks of buildset registry credentials https://review.opendev.org/c/zuul/zuul-jobs/+/797909 | 16:48 |
corvus | i don't understand the issue yet; i'm just adding some debug prints there | 16:48 |
corvus | (but zuul-jobs-test-registry-docker did pass, so the zuul.artifacts thing worked) | 16:51 |
corvus | oh i think i get it | 16:52 |
corvus | this is a catch-22 because the real buildset registry is running from a trusted playbook, so it's not running with the secret_data change | 16:53 |
corvus | we should disable these jobs, merge the change, then re-enable them | 16:53 |
corvus | (the existing docker job should give us confidence) | 16:54 |
pabelanger[m] | +1 | 16:54 |
avass[m] | they're currently broken so it can't get worse? :) | 16:54 |
corvus | indeed | 16:54 |
opendevreview | James E. Blair proposed zuul/zuul-jobs master: Prevent leaks of buildset registry credentials https://review.opendev.org/c/zuul/zuul-jobs/+/797909 | 16:56 |
corvus | zuul-maint: ^ expected to pass | 16:56 |
opendevreview | James E. Blair proposed zuul/zuul-jobs master: Re-add buildset-registry jobs https://review.opendev.org/c/zuul/zuul-jobs/+/797986 | 16:57 |
pabelanger[m] | +2 | 16:58 |
pabelanger[m] | anyone else want to +A https://review.opendev.org/c/zuul/zuul-jobs/+/797909 | 17:03 |
fungi | pabelanger[m]: done, just finished reading through it | 17:06 |
corvus | looks like no issues with gerrit's zuul: https://ci.gerritcodereview.com/t/gerrit/build/2f6a0197530e4bedb3cdae66ecccf5fc/console | 17:07 |
fungi | excellent! | 17:07 |
corvus | tristanC: when you have a sec, can you look at https://review.opendev.org/797135 and leave a +2 (but don't approve)? then once the image jobs are done i can approve it to exercise all the gate/promote jobs. | 17:14 |
avass[m] | we really threaded the needle with the security update | 17:14 |
corvus | tristanC: (of course, if you -1 it i can find another small change) | 17:14 |
pabelanger[m] | avass: the hardest part for us was dealing with zookeeper package with SSL support. I know it isn't zuul's issue, but would almost be nice to have a way to opt into ssl zookeeper still | 17:16 |
pabelanger[m] | that said, we managed to get it installed via tarball and it just worked | 17:16 |
fungi | i suppose that's one up-side to java applications basically behaving like containers | 17:17 |
fungi | or virtual machines | 17:17 |
pabelanger[m] | Yup. Once we updated our ansible role to support it, it basically just worked | 17:18 |
corvus | pabelanger: now that it's done, you should be ready for 5.0 :) | 17:18 |
pabelanger[m] | everything else for zuul / nodepool fell into place (minus keystore.password) issue | 17:18 |
pabelanger[m] | looking forward to it | 17:19 |
pabelanger[m] | do we have a timeline on 5.0? Or etherpad for it | 17:21 |
corvus | pabelanger: no, we're just merging changes as fast as we can :) | 17:22 |
opendevreview | Merged zuul/zuul-jobs master: Prevent leaks of buildset registry credentials https://review.opendev.org/c/zuul/zuul-jobs/+/797909 | 17:23 |
corvus | we're almost done removing gearman, and i think the first change that adds a unit test with two schedulers will be ready shortly after that | 17:24 |
corvus | (we won't be ready to run 2 schedulers at that point, but it's still a big milestone and it's within sight :) | 17:25 |
pabelanger[m] | sounds exciting, looking forward to that day for sure | 17:25 |
corvus | i rechecked 797986 | 17:26 |
pabelanger[m] | it was pretty nice to just stop / start nodepool for 3.14.0 -> 4.1.0 upgrade and have no state loss | 17:26 |
corvus | fungi: could i tempt you into a +3 or -1 on https://review.opendev.org/797135 ? | 17:28 |
corvus | i'm assuming tristanC is updating base playbooks | 17:28 |
pabelanger[m] | it would also be worth a discussion to see if we can add support in zuul-executor to use ansible-runner somehow, that way people can use ansible via an execution environment if they want. Given that is what downstream redhat is now shipping for folks | 17:28 |
corvus | (now that the zuul-jobs change has merged, i think we're ready to exercise zuul's gate) | 17:28 |
fungi | sure, toggling back and forth between here and playing whack-a-mole with broken opendev jobs | 17:28 |
corvus | fungi: cool, i'll look into #opendev | 17:30 |
fungi | corvus: that's not a hotfix for the security release... is it urgent? | 17:30 |
corvus | fungi: no, it's the first simple change i could find to merge in order to exercise the gate and publish docs | 17:30 |
fungi | ahh, yes good idea too | 17:31 |
fungi | lgtm, approved | 17:32 |
fungi | hopefully we'll get release notes too | 17:32 |
fungi | so they no longer 404 | 17:32 |
corvus | yeah. we may need to run a docs publish job on the tag for that | 17:32 |
corvus | so we might consider enqueuing the tag and.... i dunno, somehow stopping the docker build jobs? | 17:34 |
corvus | (either that, or decide that we don't care about pushing a new build with the same tag) | 17:35 |
avass[m] | corvus: merge a change that disables everything except documentation and then trigger the release pipeline for 4.6.0 maybe? | 17:36 |
corvus | yeah, i think that would work (i don't think that would run afoul of the branch-contains-tag check) | 17:37 |
opendevreview | James E. Blair proposed zuul/zuul master: Temporarily disable some release jobs https://review.opendev.org/c/zuul/zuul/+/797994 | 17:39 |
opendevreview | James E. Blair proposed zuul/zuul master: Re-enable the release jobs https://review.opendev.org/c/zuul/zuul/+/797995 | 17:39 |
opendevreview | James E. Blair proposed zuul/zuul master: Add a comment about checking zk cert perms https://review.opendev.org/c/zuul/zuul/+/797997 | 17:41 |
pabelanger[m] | Hmm, I am seeing the following in one of our jobs still | 17:44 |
pabelanger[m] | 2021-06-24 17:43:31.215244 | localhost | "msg": "'dict object' has no attribute 'artifacts'" | 17:44 |
pabelanger[m] | https://328deba99b1e154be8f6-b5cb0ed4d1c5e0b8a6ba497ae655d78f.ssl.cf1.rackcdn.com/85/61d77107edefcf45b1cac03e26e9df3452e2551b/check/network-ee-tox-ansible-builder/3cf8c6e/job-output.html#l810 | 17:45 |
corvus | pabelanger: using the 'git' driver for zuul-jobs? we may need to check the poll interval | 17:45 |
corvus | pabelanger: (also, you're running direct from upstream -- not a local fork of zuul-jobs, right?) | 17:45 |
pabelanger[m] | yah, upstream | 17:45 |
pabelanger[m] | let me check config | 17:45 |
pabelanger[m] | I thought we set it up from gerrit | 17:45 |
pabelanger[m] | for depends-on | 17:45 |
corvus | pabelanger: oh, then hrm... if it's git, the default poll interval is 2 hours. but if it's gerrit and you've got stream-events, then it should have been approximately immediate -- but of course would only take effect if the item was enqueued after the change merged | 17:47 |
corvus | pabelanger: it may be working -- we may have just missed something in our fix | 17:47 |
corvus | it may be that we need to set the default value differently | 17:48 |
fungi | workingly unworking | 17:48 |
corvus | yeah, i think that's it; fix in a second | 17:48 |
pabelanger[m] | Hmm, let me recheck | 17:48 |
pabelanger[m] | maybe I was too fast on starting job after the merge | 17:49 |
corvus | nah i think it's a real bug | 17:49 |
pabelanger[m] | it says artifacts not zuul_artifacts | 17:50 |
pabelanger[m] | which is your change for that task | 17:50 |
corvus | pabelanger: but it might be the initializer in vars/main.yaml that's emitting that error | 17:51 |
pabelanger[m] | oh, yah | 17:51 |
pabelanger[m] | I think you are right | 17:51 |
opendevreview | James E. Blair proposed zuul/zuul-jobs master: Fix default value for zuul_artifacts https://review.opendev.org/c/zuul/zuul-jobs/+/798000 | 17:51 |
corvus | zuul's upload-image job just hit a retry; i didn't catch it, but i'd wager it's the same error | 17:51 |
corvus | pabelanger, avass, fungi: ^ one more zuul-jobs fix | 17:53 |
fungi | lgtm, expedited approval | 17:53 |
pabelanger[m] | oh noes | 17:54 |
pabelanger[m] | I think our scheduler just crashed | 17:54 |
fungi | did it log a traceback? | 17:54 |
pabelanger[m] | trying to get it back online first | 17:55 |
pabelanger[m] | I might have an idea | 17:57 |
corvus | (ftr, yes, zuul hit the same .artifacts error) | 17:58 |
fungi | that probably explains some subset of build failures i'm seeing as well which i haven't dug into logs for yet | 17:59 |
corvus | fungi: anything with a buildset registry is still broken | 17:59 |
fungi | that too | 17:59 |
fungi | yes | 17:59 |
corvus | zuul-client tests are broken: https://b2b5edd1153190007318-36597f05313fd3fdc073d75819a76269.ssl.cf1.rackcdn.com/797135/1/gate/zuul-tox-zuul-client/853c2a9/job-output.txt | 18:00 |
fungi | also i know we still have some less-used jobs in opendev which will need updating for things like no longer trying to set ansible_python_interpreter | 18:00 |
corvus | i'll see if i can repro locally | 18:00 |
pabelanger[m] | ExecReload=/bin/kill -HUP $MAINPID | 18:00 |
pabelanger[m] | I believe that did it | 18:01 |
pabelanger[m] | I missed the full-reconfigure change | 18:01 |
fungi | d'oh, in retrospect it might have made sense to at least not die on hup, but... i guess that makes it noticeable at least | 18:01 |
corvus | whew, that should be easy to fix :) | 18:02 |
corvus | pabelanger: you may be able to use "zuul-scheduler smart-reconfigure" instead too | 18:02 |
corvus | will be a lot faster | 18:02 |
fungi | system: "hey, zuul, hang up already" ... zuul: "okay, can do!" | 18:02 |
corvus | (full-reconfigure should basically only be needed if something goes wrong with smart-reconfigure) | 18:02 |
fungi | *click* | 18:02 |
pabelanger[m] | k, will look into smart-reconfigure | 18:04 |
opendevreview | James E. Blair proposed zuul/zuul master: Fix zuul client tests https://review.opendev.org/c/zuul/zuul/+/798004 | 18:10 |
opendevreview | James E. Blair proposed zuul/zuul master: Temporarily disable some release jobs https://review.opendev.org/c/zuul/zuul/+/797994 | 18:11 |
opendevreview | James E. Blair proposed zuul/zuul master: Re-enable the release jobs https://review.opendev.org/c/zuul/zuul/+/797995 | 18:11 |
corvus | looks like a tox-remote error too | 18:11 |
corvus | (also looks like tests just need updating) | 18:12 |
opendevreview | Merged zuul/zuul-jobs master: Fix default value for zuul_artifacts https://review.opendev.org/c/zuul/zuul-jobs/+/798000 | 18:14 |
corvus | zuul-maint: https://opendev.org/zuul/zuul/src/branch/master/tests/remote/test_remote_hostvars.py#L77-L80 | 18:15 |
corvus | the only assertion that test has is something we don't support any more | 18:15 |
corvus | is there something else we should be testing instead? | 18:16 |
corvus | like... we have regular unit tests that make sure normal hostvars work... i think that test really was to make sure that ansible_python_interpreter worked | 18:16 |
fungi | seems that way, i'd personally be fine dropping it | 18:17 |
fungi | added by https://review.openstack.org/637338 "executor: use node python path" | 18:18 |
fungi | yep, exactly why it's in there | 18:19 |
opendevreview | James E. Blair proposed zuul/zuul master: Fix zuul client and remote tests https://review.opendev.org/c/zuul/zuul/+/798004 | 18:26 |
corvus | zuul-maint: that should fix the tox-client and tox-remote jobs | 18:26 |
opendevreview | James E. Blair proposed zuul/zuul master: Temporarily disable some release jobs https://review.opendev.org/c/zuul/zuul/+/797994 | 18:27 |
opendevreview | James E. Blair proposed zuul/zuul master: Re-enable the release jobs https://review.opendev.org/c/zuul/zuul/+/797995 | 18:27 |
corvus | the fastest (yet still safe) way to proceed would be to +3 798004 and 797994 now | 18:27 |
fungi | i've +2'd both, second review would be nice | 18:31 |
corvus | tristanC, tobiash, pabelanger: ^ if you're around | 18:31 |
fungi | if everyone's busy i'm okay with snigle-core approving it | 18:34 |
fungi | single too | 18:34 |
pabelanger[m] | sorry, which ones? | 18:34 |
* fungi wonders for a moment what a snigle might be | 18:35 | |
corvus | https://review.opendev.org/798004 and https://review.opendev.org/797994 | 18:35 |
corvus | really mostly the first one | 18:35 |
corvus | er i just caught a typo in it | 18:35 |
opendevreview | James E. Blair proposed zuul/zuul master: Fix zuul client and remote tests https://review.opendev.org/c/zuul/zuul/+/798004 | 18:36 |
pabelanger[m] | k, +2 | 18:36 |
pabelanger[m] | add +A when ready | 18:36 |
corvus | done | 18:36 |
fungi | oh, typo indeed. i missed that | 18:37 |
corvus | 00]0[]1[1one! | 18:37 |
fungi | eleventy | 18:37 |
opendevreview | James E. Blair proposed zuul/zuul master: Temporarily disable some release jobs https://review.opendev.org/c/zuul/zuul/+/797994 | 18:37 |
pabelanger[m] | also, our container jobs appear to be good now | 18:38 |
corvus | pabelanger: \o/ | 18:38 |
fungi | awesome! | 18:38 |
corvus | okay, those are both in gate now and expected to pass; assuming they don't fail earlier, i think we're idle for about an hour, then we can enqueue the tag | 18:39 |
fungi | looks like most of the urgently-failing things are now under control in opendev | 18:42 |
fungi | thanks everyone for all the help! | 18:43 |
corvus | thanks indeed! | 18:43 |
pabelanger[m] | is anyone able to help out with the following: http://paste.openstack.org/show/806930/ | 19:01 |
pabelanger[m] | if we pre-apply 'gate' label before PR has reported back, zuul will not enqueue the PR any more | 19:03 |
pabelanger[m] | we have to toggle the gate label | 19:03 |
pabelanger[m] | https://github.com/ansible/project-config/blob/master/zuul.d/pipelines.yaml#L34 | 19:03 |
pabelanger[m] | is the pipeline | 19:03 |
avass[m] | pabelanger: I think you need to add the github checks to your gate pipeline triggers | 19:04 |
pabelanger[m] | you mean the check_run event? | 19:05 |
pabelanger[m] | let me compare to https://zuul-ci.org/docs/zuul/reference/drivers/github.html#reference-pipelines | 19:06 |
avass[m] | trying to figure out how to do that, if it's possible to do it | 19:06 |
avass[m] | but I'd expect it to be be possible to a check success somehow | 19:07 |
avass[m] | trigger on a check success* | 19:07 |
pabelanger[m] | yes, we have that | 19:08 |
pabelanger[m] | I wonder if the status syntax is wrong some how | 19:08 |
pabelanger[m] | I had issue with [bot] in it last night and removed it | 19:08 |
avass[m] | pabelanger: I can only see a check success requirement, not a trigger? | 19:08 |
avass[m] | pabelanger: oh, nvm I'm blind | 19:09 |
pabelanger[m] | http://paste.openstack.org/show/806931/ | 19:10 |
pabelanger[m] | should be more help | 19:10 |
pabelanger[m] | because Types [re.compile('pull_request')] doesn't match check_run | 19:11 |
pabelanger[m] | is the part that is confusing me | 19:11 |
avass[m] | pabelanger: should the `event: pull_request` be replaced with `event: check_run` maybe? | 19:18 |
pabelanger[m] | I don't believe so | 19:20 |
pabelanger[m] | tobiash: ^ maybe you have some thoughts | 19:20 |
pabelanger[m] | https://zuul-ci.org/docs/zuul/reference/drivers/github.html#value-pipeline.trigger.%3Cgithub%20source%3E.action.status | 19:20 |
pabelanger[m] | the syntax looks to be right | 19:21 |
avass[m] | there was a change to fix the normalization going on in the github driver some weeks ago | 19:21 |
avass[m] | but I think that was still backwards compatible | 19:21 |
avass[m] | I got the same config with `action: pull_request` but I'm not sure if I've seen it work | 19:22 |
avass[m] | pull_request doesn't have a `status` action: https://docs.github.com/en/developers/webhooks-and-events/webhooks/webhook-events-and-payloads#pull_request | 19:22 |
avass[m] | and it looks like it should be check_run? https://docs.github.com/en/developers/webhooks-and-events/webhooks/webhook-events-and-payloads#check_run | 19:23 |
pabelanger[m] | it is statuses | 19:23 |
pabelanger[m] | statuses: ansible-zuul:ansible/check:success | 19:23 |
pabelanger[m] | in pull_request | 19:23 |
pabelanger[m] | yah, really don't know what is going on | 19:29 |
avass[m] | pabelanger: are you sure that's the correct event you got? | 19:33 |
pabelanger[m] | https://zuul-ci.org/docs/zuul/reference/drivers/github.html#attr-pipeline.require.%3Cgithub%20source%3E.status | 19:33 |
pabelanger[m] | is what I am reading now | 19:33 |
avass[m] | the logs I mean | 19:33 |
pabelanger[m] | yah, its right event | 19:36 |
avass[m] | because that's a check_run event, and explains why pull_request isn't matching in that case | 19:36 |
avass[m] | but I suppose that worked before? | 19:37 |
avass[m] | oh, but didn't github change their api a while ago, and you updated from 3.19? | 19:38 |
pabelanger[m] | yes, this worked properly in 3.19.1 | 19:41 |
corvus | sigh | 19:41 |
pabelanger[m] | but after upgrading I needed to remove [bot] from the status check | 19:41 |
avass[m] | corvus: it does look like that right? | 19:41 |
corvus | pabelanger, avass: pretty sure pull_request and check_run are different events | 19:41 |
corvus | pabelanger: at first glance, i would say that you should ignore that debug message because it doesn't correspond with a trigger you are interested in | 19:42 |
pabelanger[m] | eg: https://github.com/ansible/project-config/commit/a61e3d286890634cce2c3eea2dde50c8ebcd2631#diff-169729fef3c97c20106951e89bf95e0d3bcb2e30c2f1fec93d93a387703fe4bf | 19:42 |
pabelanger[m] | ack | 19:42 |
corvus | somewhere there should be a pull_request event with a "labeled" action | 19:43 |
corvus | pabelanger: but you have a pipeline requirement that the check pipeline succeed | 19:43 |
corvus | https://github.com/ansible/project-config/blob/master/zuul.d/pipelines.yaml#L50 | 19:44 |
pabelanger[m] | yes | 19:44 |
pabelanger[m] | we want check reported back before enqueing into gate | 19:44 |
pabelanger[m] | enqueued* | 19:44 |
corvus | doesn't that mean that it can't be enqueued into gate unless the github check named 'check' has succeeded | 19:44 |
corvus | i thought the problem statement was that it wasn't enqueued if you added the 'gate' label before it reported back | 19:45 |
pabelanger[m] | yes, that is what I would expect to happen. | 19:45 |
pabelanger[m] | yes, but we wouldn't expect it to be enqueued until after the check is reported | 19:45 |
corvus | oh, wait, you're saying you want to add the gate label, and then expect the check success to be the enqueue trigger? | 19:45 |
pabelanger[m] | yes, that is right | 19:45 |
avass[m] | corvus: I think the idea is that it should be possible to add the gate label, and the gate should be triggered by a check success | 19:45 |
pabelanger[m] | so we are not matching properly some how on the check success | 19:45 |
pabelanger[m] | and trying to debug why that is | 19:46 |
corvus | so https://github.com/ansible/project-config/blob/master/zuul.d/pipelines.yaml#L55-L57 is not matching | 19:46 |
pabelanger[m] | exactly | 19:46 |
pabelanger[m] | if I unlabel gate then relabel gate | 19:46 |
pabelanger[m] | it is enqueued | 19:46 |
pabelanger[m] | so our pull_request trigger isn't proper for some reason | 19:47 |
pabelanger[m] | https://github.com/ansible/project-config/commit/a61e3d286890634cce2c3eea2dde50c8ebcd2631#diff-169729fef3c97c20106951e89bf95e0d3bcb2e30c2f1fec93d93a387703fe4bf was the change I made last night, I wonder if we need to keep [bot] for the pull_request status | 19:47 |
avass[m] | yeah so the trigger is wrong, I think `action: pull_request` should be `action: check_run` and the status should probably be something else: | 19:48 |
avass[m] | https://opendev.org/zuul/zuul/src/branch/master/zuul/driver/github/githubconnection.py#L581 | 19:48 |
pabelanger[m] | https://zuul-ci.org/docs/zuul/reference/drivers/github.html#attr-pipeline.require.%3Cgithub%20source%3E.status | 19:48 |
avass[m] | but I don't understand why it was working in the first case then :( | 19:48 |
corvus | pabelanger: it does look like you're reporting via the status api, so i think you would need [bot] there | 19:49 |
corvus | here's a pull_request status event from opendev: Event <GithubTriggerEvent 0x7fe0747b7160 pull_request pypa/pip refs/pull/10065/head status github.com/pypa/pip 10065,7d2406620e12927983d48f2680e4c1b1cfc7f780 delivery: 0963ae1e-d4f0-11eb-93e6-3711742a29fa> for change <Change 0x7fe026c32ac0 pypa/pip 10065,7d2406620e12927983d48f2680e4c1b1cfc7f780> | 19:49 |
corvus | (just so you know they do exist :) | 19:49 |
pabelanger[m] | I would guess https://github.com/ansible/project-config/blob/master/zuul.d/pipelines.yaml#L50 doesn't need to change, since it is working properly | 19:50 |
avass[m] | yeah but what triggered that event? | 19:50 |
corvus | avass: no idea, i'm just saying pull_request is a real event and check_run is the wrong one to look at :) | 19:50 |
avass[m] | looking at pabelangers logs it looks like he's getting a check_run event | 19:50 |
corvus | avass: i'm sure he is, but it's not relevant | 19:51 |
corvus | it's some other system reporting a check run completion | 19:51 |
avass[m] | isn't that what we want? | 19:51 |
avass[m] | just not another system | 19:51 |
corvus | pabelanger: wait, are you reporting via checks api or status? | 19:52 |
corvus | it is the checks api, so you shouldn't need [bot] in either place | 19:53 |
avass[m] | > 2021-06-24 18:51:44,131 DEBUG zuul.GithubConnection: [e: 372f0b90-d51d-11eb-9bff-71269ba4cc73] Scheduling event from github.com: <GithubTriggerEvent 0x7f3240632860 check_run ansible/project-config refs/pull/851/head completed github.com/ansible/project-config 851,7f3f39d1cbd5b1baa3e49032569a87371e5f0a86 delivery: 372f0b90-d51d-11eb-9bff-71269ba4cc73 check_run: ansible-zuul:ansible/check:success> | 19:53 |
avass[m] | I'd expect it to trigger on that ^ | 19:53 |
pabelanger[m] | I believe we are only using check now | 19:53 |
pabelanger[m] | https://github.com/ansible/project-config/blob/master/zuul.d/pipelines.yaml#L25 | 19:53 |
corvus | avass: i agree, that would be a good event to trigger on. pabelanger maybe you should change your trigger to be that instead? | 19:54 |
pabelanger[m] | sorry, which one? | 19:54 |
corvus | i don't think there's an example yaml trigger config for what avass is suggesting | 19:55 |
pabelanger[m] | k, I have to jump into a meeting now, but will try to craft something on the check_run event | 19:56 |
avass[m] | that doesn't really explain how this used to work however :) | 19:56 |
corvus | we should probably entertain the possibility that it hasn't worked for longer than 6 hours | 19:57 |
avass[m] | I was thinking more how it worked in 3.19.1, but I guess something made it compatible back then | 19:58 |
corvus | highlighting in the streaming console looks good now; also the scrollbar is fixed | 20:02 |
opendevreview | Merged zuul/zuul master: Fix zuul client and remote tests https://review.opendev.org/c/zuul/zuul/+/798004 | 20:03 |
corvus | \o/ unblocked! | 20:03 |
corvus | watching the promote jobs for that now | 20:04 |
corvus | i think docs promote may have a bug | 20:04 |
corvus | ah i see | 20:07 |
pabelanger[m] | avass: for 3.19.1 we did status api, and check runs | 20:10 |
opendevreview | James E. Blair proposed zuul/project-config master: Update doc publish secret with python string format https://review.opendev.org/c/zuul/project-config/+/798016 | 20:10 |
pabelanger[m] | but now, we've moved directly to check runs | 20:10 |
corvus | fungi: around for a +3 on https://review.opendev.org/798016 ? | 20:11 |
pabelanger[m] | I guess it would be something like: http://paste.openstack.org/show/806933/ | 20:12 |
corvus | should action be status or complete? | 20:13 |
corvus | pabelanger: you can go into the event history in github and inspect it | 20:13 |
pabelanger[m] | actually... | 20:13 |
pabelanger[m] | yah | 20:13 |
pabelanger[m] | doing that now | 20:13 |
corvus | i'll just self-approve that config change | 20:14 |
opendevreview | Merged zuul/project-config master: Update doc publish secret with python string format https://review.opendev.org/c/zuul/project-config/+/798016 | 20:14 |
corvus | and now i'll re-enqueue 798004,3 in promote | 20:14 |
avass[m] | pabelanger: probably completed: https://opendev.org/zuul/zuul/src/branch/master/zuul/driver/github/githubconnection.py#L530 | 20:14 |
opendevreview | James E. Blair proposed zuul/zuul master: Increase unit test job timeout to 90 minutes https://review.opendev.org/c/zuul/zuul/+/798019 | 20:17 |
corvus | cool, all the promote jobs should work now | 20:17 |
corvus | hopefully the docs site will update in a few minutes after an afs publish cycle | 20:18 |
pabelanger[m] | I can't figure out how to limit the trigger to a specific check run | 20:22 |
corvus | \o/ https://zuul-ci.org/docs/zuul/reference/releasenotes.html is updated | 20:22 |
pabelanger[m] | eg: ansible-zuul:ansible/check:success | 20:22 |
pabelanger[m] | check_runs don't seem to be name for that | 20:22 |
corvus | next we need https://review.opendev.org/797994 to merge, then we can re-enqueue the tag and that should cause the 4.6.0 release notes to publish | 20:23 |
avass[m] | pabelanger: the check run seems to do `slug:name:conclusion`: https://opendev.org/zuul/zuul/src/branch/master/zuul/driver/github/githubconnection.py#L2315 | 20:25 |
fungi | sorry, back now, looking but i guess 798016 has merged already | 20:25 |
avass[m] | not sure what the slug is supposed to be | 20:25 |
pabelanger[m] | was just looking at that | 20:25 |
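The `slug:name:conclusion` shape mentioned above can be illustrated with a minimal Python sketch. It assumes the slug and the conclusion contain no colons; `CheckRunStatus` and `parse_check_run_status` are hypothetical names for illustration, not part of Zuul's GitHub driver.

```python
from typing import NamedTuple


class CheckRunStatus(NamedTuple):
    slug: str        # app that reported the check run, e.g. "ansible-zuul"
    name: str        # check run name, e.g. "ansible/check"
    conclusion: str  # result of the check run, e.g. "success"


def parse_check_run_status(status: str) -> CheckRunStatus:
    """Split a 'slug:name:conclusion' string into its three parts.

    Hypothetical helper: the first colon ends the slug and the last
    colon starts the conclusion, so the name may contain slashes.
    """
    slug, sep1, rest = status.partition(":")
    name, sep2, conclusion = rest.rpartition(":")
    if not (sep1 and sep2 and slug and name and conclusion):
        raise ValueError(f"expected 'slug:name:conclusion', got {status!r}")
    return CheckRunStatus(slug, name, conclusion)


# The value seen in the logs quoted in this discussion.
parsed = parse_check_run_status("ansible-zuul:ansible/check:success")
print(parsed)
```

Splitting a logged value this way makes it easier to see which of the three parts a trigger or requirement fails to match.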
corvus | fungi: thx, i think we're idle for another 1-1.5 hours since the test disable job hit a timeout | 20:27 |
corvus | though if any zuul-maint wants to spend a few seconds reviewing https://review.opendev.org/798019 that could be timely :) | 20:28 |
pabelanger[m] | avass: so http://paste.openstack.org/show/806933/ ? | 20:28 |
pabelanger[m] | err | 20:28 |
pabelanger[m] | http://paste.openstack.org/show/806934/ | 20:28 |
avass[m] | pabelanger: I guess so? not sure if the check_run attribute is correct but that seems reasonable to me | 20:29 |
pabelanger[m] | yah, this isn't documented too well :( | 20:29 |
avass[m] | corvus: I've reviewed it but I'm not a maintainer :) | 20:30 |
avass[m] | but since the testsuite is up to 90 minutes, maybe it's a good idea to start looking at using multiple nodes to speed things up or set up some test avoidance? | 20:30 |
fungi | or look into whether any of the tests are less efficient than they could be | 20:31 |
avass[m] | yeah | 20:31 |
corvus | they're all less efficient than they could be :) | 20:31 |
corvus | the other thing we should consider is that opendev established its standard node size 10 years ago | 20:31 |
pabelanger[m] | avass: I think it is https://github.com/ansible/project-config/pull/852 | 20:32 |
pabelanger[m] | looking at the schema | 20:32 |
pabelanger[m] | we'll find out soon | 20:32 |
corvus | i was genuinely surprised that https://review.opendev.org/797180 did not speed up the tests. and i was equally surprised that https://review.opendev.org/797181 did not slow them down. | 20:33 |
pabelanger[m] | okay, https://github.com/ansible/project-config/pull/852 seems to have done it | 21:05 |
pabelanger[m] | thanks corvus avass | 21:05 |
corvus | pabelanger: \o/ | 21:05 |
avass[m] | pabelanger: np :) | 21:06 |
corvus | pabelanger: prolly can drop lines 58-60 now | 21:06 |
pabelanger[m] | yah, doing clean up now | 21:06 |
corvus | i know i'd come back to that in 3 months and be like "which thing is the one that's supposed to work again?" :) | 21:07 |
pabelanger[m] | next up is to look at the dequeue stuff | 21:07 |
pabelanger[m] | from the example pipeline | 21:07 |
opendevreview | Merged zuul/zuul-jobs master: Re-add buildset-registry jobs https://review.opendev.org/c/zuul/zuul-jobs/+/797986 | 21:28 |
corvus | sigh, another timeout | 21:39 |
corvus | it's worth noting that the unit tests take about 1 hour on most providers; it's only one that's 33% slower | 21:41 |
corvus | i should just re-enqueue changes as soon as they start running jobs in that provider | 22:50 |
fungi | does it seem to be i/o contention? we recently took the limestone provider out of our pool because of that, has it been readded or something? | 22:57 |
corvus | fungi: it's bhs | 22:57 |
ianw | i'm not sure limestone has been re-added. i wasn't aware of node issues there, it was the mirror that was getting stuck in that case | 22:58 |
fungi | ahh, i wonder why ovh-bhs1 would be so much slower than ovh-gra1... maybe they have us on a constrained host aggregate or something and we're competing with our other nodes | 22:59 |
fungi | ianw: yeah, well it seemed to be i/o bandwidth for the mirror in that case, but yes | 23:00 |
opendevreview | Merged zuul/zuul master: Temporarily disable some release jobs https://review.opendev.org/c/zuul/zuul/+/797994 | 23:05 |
corvus | wow, by the skin of our teeth | 23:06 |
corvus | okay, i'm going to enqueue the tag now | 23:10 |
fungi | thanks! | 23:11 |
corvus | fungi: how does this look? docker exec zuul-scheduler_scheduler_1 zuul enqueue-ref --tenant zuul --pipeline release --project opendev.org/zuul/zuul --ref ref/tags/4.6.0 --newrev 487c0ba5f8b2758795bb5e5c8e5bd64777d36524 | 23:17 |
fungi | corvus: that doesn't seem to match the 4.6.0 tag object sha for me | 23:20 |
fungi | corvus: do you not also need --trigger=gerrit? | 23:21 |
corvus | fungi: i think trigger is not necessary | 23:21 |
fungi | anyway, `git show-ref 4.6.0` in my zuul checkout gives me bbafeada02635a4c8b5477ec316c16c132386892 | 23:22 |
corvus | fungi: huh, i just did a git clone and got 487c0ba5f8b2758795bb5e5c8e5bd64777d36524 a second time | 23:23 |
fungi | from show-ref or show? | 23:24 |
* corvus < https://matrix.org/_matrix/media/r0/download/matrix.org/nxIlZtMqIsxcyBWKHFwCuqAT/message.txt > | 23:24 | |
fungi | i'll try a fresh clone | 23:25 |
fungi | oh! you know what, i need to delete 4.6.0 and update again | 23:26 |
fungi | 487c0ba5f8b2758795bb5e5c8e5bd64777d36524 | 23:26 |
corvus | fungi: was the other tag from a test build? | 23:27 |
fungi | yep, okay i agree that command looks correct | 23:27 |
corvus | cool, running enqueue-ref now | 23:27 |
fungi | yes, i had locally tagged 4.6.0 when working out the pypi upload commands for you, then promptly forgot i had done that | 23:27 |
fungi | thanks! | 23:28 |
corvus | hrm, not showing up on status... i'm not sure that ref was right | 23:29 |
corvus | i think i was missing the 's' in refs | 23:29 |
corvus | refs/tags/4.6.0 is what it should be, yeah? | 23:29 |
fungi | oh, yep! | 23:29 |
corvus | yep, there it is now | 23:29 |
corvus | and only the docs job is queued | 23:30 |
fungi | i usually run via docker-compose but a recent working example from my command history is: | 23:30 |
fungi | sudo docker-compose exec scheduler zuul ^Cqueue | 23:30 |
fungi | -ref --tenant=openstack --trigger=gerrit --pipeline=release-post --project=opens | 23:30 |
fungi | tack/releases --ref=refs/heads/master --newrev=2d7fc060b52ee823a7ebe690ea50deab9 | 23:30 |
fungi | (minus the stray newline in there) | 23:31 |
corvus | i should have just checked your history :) | 23:31 |
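The missing `s` in `refs/` above is easy to overlook. As a minimal Python sketch, the `zuul enqueue-ref` arguments could be assembled with a sanity check first. The flags are the ones shown in the commands above; `build_enqueue_ref` and its validation rules are hypothetical and not something the zuul client is known to do.

```python
import re
import shlex


def build_enqueue_ref(tenant, pipeline, project, ref, newrev):
    """Return the argv for 'zuul enqueue-ref' after basic sanity checks.

    Hypothetical helper: the checks below are assumptions made for
    illustration only.
    """
    if not ref.startswith("refs/"):
        raise ValueError(f"ref should start with 'refs/': {ref!r}")
    if not re.fullmatch(r"[0-9a-f]{40}", newrev):
        raise ValueError(f"newrev should be a full 40-character sha: {newrev!r}")
    return [
        "zuul", "enqueue-ref",
        "--tenant", tenant,
        "--pipeline", pipeline,
        "--project", project,
        "--ref", ref,
        "--newrev", newrev,
    ]


cmd = build_enqueue_ref(
    tenant="zuul",
    pipeline="release",
    project="opendev.org/zuul/zuul",
    ref="refs/tags/4.6.0",
    newrev="487c0ba5f8b2758795bb5e5c8e5bd64777d36524",
)
print(shlex.join(cmd))
```

With `ref/tags/4.6.0` (no `s`) the helper raises before anything is run, rather than leaving the status page to show that nothing was enqueued.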
pabelanger[m] | so, given all the fuss with the check_run status moving from check to gate, I think I am just going to make the move to remove the clean check requirements. | 23:33 |
pabelanger[m] | that said, am I missing any obvious downside? | 23:33 |
pabelanger[m] | maybe more gate resets because of poor commits? | 23:33 |
corvus | while waiting, i checked runtimes, and it turns out that bhs is just slightly behind rax in averages; my guess is it just has a bit more variation which is why we see it, but it's probably performing at par | 23:33 |
fungi | pabelanger[m]: do your users frequently approve broken changes which won't merge, and do you tend to have long gate queues? | 23:34 |
fungi | it's that combination which led to openstack relying on it | 23:34 |
corvus | and if you're not in that position, i highly recommend not using it; users will be much happier | 23:35 |
pabelanger[m] | fungi: honestly, I don't think so. | 23:35 |
fungi | corvus: yep, fully agree | 23:35 |
pabelanger[m] | not using clean check? | 23:35 |
corvus | (and you get to benefit from the whole "let the computer do the testing and don't worry about it" thing) | 23:35 |
pabelanger[m] | that would mean, in github, we'd update our branch protection to only require ansible/gate | 23:35 |
corvus | pabelanger: correct i recommend not using clean check unless you have to | 23:36 |
pabelanger[m] | ack | 23:36 |
ianw | i should have realised that overriding ansible_python_interpreter in https://review.opendev.org/c/openstack/openstack-zuul-jobs/+/797808 would fail. i guess the real problem here is that python-path: auto and ansible's detection is failing on debian bullseye | 23:36 |
fungi | pabelanger[m]: yes, basically openstack relies on it to reduce the massive amounts of nodes which get chewed up and spat out on gate resets from failures on very long queues, and broken changes working their way up the queue like a wrecking ball | 23:36 |
pabelanger[m] | okay, I'll enable supercedes: check tomorrow, remove something but not tell people right away | 23:36 |
corvus | ianw: can you set it in nodepool? | 23:36 |
fungi | pabelanger[m]: aha, the "folgers crystals" test | 23:37 |
pabelanger[m] | indeed | 23:37 |
ianw | corvus: yeah, i guess that is the place to do it. it might fix itself with an ansible upgrade in zuul at some point i guess | 23:37 |
corvus | ianw: https://zuul-ci.org/docs/nodepool/configuration.html#attr-diskimages.python-path i think | 23:38 |
ianw | hrm, we already have it set https://opendev.org/openstack/project-config/src/branch/master/nodepool/nodepool.yaml#L249 | 23:39 |
fungi | ianw: i feel like we solved this once for bullseye already | 23:39 |
fungi | ianw: is it nested ansible maybe? | 23:40 |
fungi | https://zuul-ci.org/docs/zuul/4.6.0/reference/releasenotes.html loads for me now. yay! | 23:40 |
corvus | huzzah! eventually consistent releases | 23:41 |
ianw | fungi: it is not nested ... | 23:41 |
opendevreview | James E. Blair proposed zuul/zuul master: Re-enable the release jobs https://review.opendev.org/c/zuul/zuul/+/797995 | 23:42 |
ianw | fungi: maybe we did solve it and i was looking at an old log. https://zuul.opendev.org/t/openstack/builds?job_name=publish-wheel-cache-debian-bullseye shows the timeout but not the job failure trying to install "python-apt" | 23:46 |
ianw | i definitely debugged some sort of log where that was an issue, but maybe i got confused ... | 23:46 |
fungi | ianw: that seems more likely | 23:46 |
fungi | (that it used to be a problem and we solved it in the nodepool config) | 23:47 |
ianw | ahhh! | 23:47 |
ianw | https://zuul.opendev.org/t/openstack/build/5eaf6cdde2524933842c482e956a552d | 23:47 |
ianw | the *arm64* builds seem to have this issue | 23:47 |
fungi | oh | 23:47 |
ianw | that's how i was getting confused. so we probably need to set that on arm64 | 23:47 |
fungi | yup | 23:47 |
fungi | in the nodepool config of course | 23:48 |
pabelanger[m] | so just thinking about supercedes logic, if there is a PR already running in check, and we apply gate. Would we set the status of the github check? or would it be pending for ever? | 23:49 |
pabelanger[m] | the check in the check pipeline I should say | 23:49 |
corvus | pabelanger: i think if you have a dequeue reporter it should close it out | 23:49 |
pabelanger[m] | okay | 23:49 |
ianw | fungi: indeed not there -> https://opendev.org/openstack/project-config/src/branch/master/nodepool/nb03.opendev.org.yaml#L135 ... mystery solved. thought i was going nuts! | 23:50 |
pabelanger[m] | check: cancelled | 23:50 |
pabelanger[m] | I'll test that out | 23:50 |