Thursday, 2021-06-24

pabelanger[m]okay, 2nd issue00:10
pabelanger[m]changes don't appear to be enqueuing into gate now00:11
pabelanger[m]I see the following00:11
pabelanger[m]2021-06-24 00:07:36,153 DEBUG zuul.Pipeline.ansible.gate: [e: 2d2601a0-d480-11eb-9486-73eec78895a0] Change <Change 0x7fb6c81ed0f0 ansible-network/windmill-config 844,b4458df0176c4e7ba802a7ca03f32a530fa5a0e9> does not match pipeline requirement <GithubRefFilter connection_name: github.com statuses: ansible-zuul\[bot\]:ansible/check:success open: True current-patchset: True labels: ['gate']> because RequiredStatuses00:11
pabelanger[m]['ansible-zuul\\[bot\\]:ansible/check:success'] does not match ['ansible-zuul:ansible/check:success']00:11
ianwpabelanger[m]: just clutching, what's with the [bot] bit?00:25
pabelanger[m]that's how it worked in 3.9.100:25
pabelanger[m]okay, dropping the [bot] part, seems to have done it00:28
pabelanger[m]trying to confirm now00:29
pabelanger[m]ianw: okay, seems to have done it00:45
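The mismatch in the log lines above can be reproduced with a small sketch. It assumes the required status is treated as a regular expression and compared against each reported status string; the helper name `status_matches` is made up for illustration and is not Zuul's API.

```python
import re


def status_matches(required_pattern, reported_statuses):
    """Return True if any reported status fully matches the pattern.

    Hypothetical helper for illustration; Zuul's own matching code may differ.
    """
    regex = re.compile(required_pattern)
    return any(regex.fullmatch(status) for status in reported_statuses)


# Status string as it appears in the DEBUG log above.
reported = ["ansible-zuul:ansible/check:success"]

# The old requirement expects a literal "[bot]" suffix on the user name.
old_requirement = r"ansible-zuul\[bot\]:ansible/check:success"
# The requirement after dropping the "[bot]" part.
new_requirement = r"ansible-zuul:ansible/check:success"

print(status_matches(old_requirement, reported))  # False
print(status_matches(new_requirement, reported))  # True
```

The escaped brackets make `[bot]` a required literal, so a status reported without that suffix can never satisfy the old requirement.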
pabelanger[m]I've also noticed we are down to 2 versions of ansible in zuul01:13
pabelanger[m]where 2.8 is EOL and 2.9 is in security mode01:13
pabelanger[m]guess we need to add 2.10 and 3.0.0 / 4.0.0 support01:13
corvuspabelanger: i agree, that looks like a doc bug; it should say that the keystore is used by scheduler+executor01:48
opendevreviewPaul Belanger proposed zuul/zuul master: Ensure executor also as keystore.password setting  https://review.opendev.org/c/zuul/zuul/+/79780004:46
*** marios is now known as marios|ruck05:01
*** bhagyashris_ is now known as bhagyashris05:13
*** rpittau|afk is now known as rpittau07:21
*** jpena|off is now known as jpena07:36
*** sshnaidm|afk is now known as sshnaidm08:45
*** marios is now known as marios|ruck09:48
*** bhagyashris_ is now known as bhagyashris09:48
opendevreviewMerged zuul/zuul master: Ensure executor also as keystore.password setting  https://review.opendev.org/c/zuul/zuul/+/79780010:04
tibeerHello. I have a question regarding the tenant configuration for untrusted projects. I’m using GitHub repositories, but the default branch name is „main“. For configuration-projects this can be set by „load-branch“, but this does not work for untrusted-projects. I’m getting this message during pushes from GitHub on my build details page: „Error: Project github.com/tibeerorg/zuul_config does not 11:08
tibeerhave the default branch master“. Am I missing something here or might this be a missing feature?11:08
fungitibeer: https://zuul-ci.org/docs/zuul/reference/project_def.html#attr-project.default-branch11:27
tibeerthanks a lot! i knew i was missing something11:27
fungino worries, there's a lot of documentation to wade through11:31
*** jpena is now known as jpena|lunch11:39
-opendevstatus- NOTICE: Our Zuul gating CI/CD services will be offline starting around 14:00 UTC (in roughly two hours from now) in order to apply some critical security updates, and is not expected to remain offline for more than 30 minutes.12:03
*** jpena|lunch is now known as jpena12:36
pabelanger[m]morning, is there a time frame setup to tag the next release?13:18
corvusNOOO!13:18
corvuspabelanger, tobiash, ianw: you merged a change to zuul after i tagged 4.6.013:18
*** marios|ruck is now known as marios|ruck|call13:19
corvuszuul-maint: do you have any suggestions as to how we should handle the ensuing security release in 40 minutes?13:19
tobiash[m]corvus: wasn't that only a doc change?13:19
tobiash[m]I thought that would be ok, sorry for that13:19
corvusi spent hours yesterday rebasing the changes and building releases, and all of the git commits were going to be fast-forwarded, and the actual release tag would be on the master branch...13:20
corvusi guess i will repeat that process now, and hope i can complete it in 40 minutes13:20
avass[m]sounds like the easiest way is to force push master to undo the last change and re-apply that later?13:20
corvusavass: then people who have pulled won't be able to fast-forward 13:21
fungicorvus: i guess if it takes a few minutes longer that's not the end of the world. unfortunate though yes13:22
mnaseri like to think a lot of the bigger zuul operators are gonna hang out here anyways13:23
corvusokay.  i recorded all of my commands from yesterday, so hopefully i can run the same sequence now13:23
mnaserand getting an email a few mins later in zuul-announce should be alright :)13:23
corvustristanC: ^ zuul master has moved since yesterday.  i need to rebase the patches.  is that going to cause a problem for you?13:25
fungii suppose one way to save time would be to not rebuild the 4.6.0 container images and just accept that they're missing that doc change so don't perfectly correspond to what ends up being tagged as 4.6.013:26
tristanCcorvus: why do you have to rebase the patches?13:26
fungitristanC: they'll either be rebased or merged one way or the other13:26
corvusand i think it would be prudent for the tag and release artifacts i'm manually publishing to be built from a commit that's actually in the master branch13:28
tristanCcorvus: but aren't we going to merge those patch through zuul after it is restarted?13:29
fungiright, and to not tag a locally-created merge commit13:29
fungitristanC: nope13:29
fungitristanC: per the circulated plan they'll be pushed while our zuul is offline13:29
fungialong with the 4.6.0 tag13:29
corvuswell, we could do either; they will be ff regardless13:30
tristanCfungi: i see, but perhaps we don't have to push the tag13:30
fungithis is probably not the time to be deciding further deviations from the planned timeline13:30
tobiash[m]isn't it the lower cost to force push instead and accepting the non-fast forward for the likely few people who pulled in the last few hours?13:31
tristanCcorvus: in any case that's not going to cause a problem for us13:31
fungiany more than are already necessitated by the fact that the master branch is now different from what was tested and built privately in advance13:31
corvusi am adamantly opposed to force-pushing master13:31
corvusfungi: i did actually have the 'push changes' step as git-review after restarting zuul; i wasn't planning on running 'git push' for those.13:32
corvusbut maybe we should 'git push' them instead of 'git review'13:32
tristanCcorvus: then i think it's ok to publish your existing sdist and container image using HEAD^ + the patch, then once zuul is restarted, propose the patch with git-review13:33
corvussince they're effectively merged already, then do any fixups as new changes13:33
avass[m]It's probably better to just pick a solution and roll with it, and since there are strong opinions about force pushing master maybe re-doing yesterday's work and sending an announcement that it's gonna take a bit longer is best?13:33
fungicorvus: i thought that was for the opendev job config updates you had drafted, otherwise how can you push the tag without pushing the commits it's based on?13:33
corvusfungi: that's obvious in retrospect, thanks for reminding me, this is why we have review :)13:34
corvusfungi: i would have been surprised at what happened :)13:34
avass[m](instead of having a discussion that is starting to add significantly more time)13:34
fungiheh, no worries13:34
corvusthat makes me think it's even more prudent to rebase, retag, rebuild.13:35
corvusfungi: i think pushing the tag and pushing master are still separate things13:35
fungiand yes, i think redo the steps from yesterday, rebase, recreate the tag. i still think not rebuilding the container images and accepting they lack the doc change might be acceptable13:35
corvusfungi: so maybe we should adjust the plan to push the tag and master at the same time?13:36
fungicorvus: that's fair, the commits will exist in the repo initially just not on the master branch, however gerrit may balk at changes pushing the same commits13:36
fungiso i do think pushing master at the same time as the tag makes sense13:37
corvusi'll rebuild the binaries too13:38
fungiand sorry, i had assumed that was implied, but yes you had specific git push commands in the plan and didn't separately push the master branch before the tag in what you had written13:38
fungii missed that in reading through it13:38
corvussdist is built; building container images now13:42
corvuspabelanger, tobiash, ianw: i'm sorry, i should have sent out an announcement asking folks to freeze the master branch13:47
*** marios|ruck|call is now known as marios|ruck13:49
pabelanger[m]yah, my bad. I could have waited until today to push up my change13:51
pabelanger[m]did it basically as I went offline to go to bed13:51
corvusthe images are re-built13:52
corvusso i think we're ready to go13:52
avass[m]Oh, time to take things offline then13:52
fungithanks! and sorry about the scramble13:52
tobiash[m]phew, sorry again for causing such stress13:52
corvusi've stopped gerrit's zuul13:53
fungii can stop our scheduler in opendev unless you're already working on it13:54
corvusfungi: go for it :)13:54
fungii'll coordinate that in #opendev13:54
fungicorvus: it's offline13:56
corvusokay, should i start pushing docker images now?13:56
corvusi have a slow link, the first one could take a while13:57
fungiyeah, go for it13:57
-opendevstatus- NOTICE: Our Zuul gating CI/CD services are being taken offline now in order to apply some critical security updates, and are not expected to remain offline for more than 30 minutes.13:58
corvusthey're laying fiber in my neighborhood soon14:08
funginot soon enough apparently!14:08
avass[m]corvus: wow, I don't think I've ever had anything below 100Mb/s :)14:09
fungii have fond memories of 110bps over an acoustic coupler14:09
corvusfirst image is almost done; the next is the executor which has another layer that might take a little bit, then all the rest should be quick14:10
corvusoh, i drastically underestimated the size of the executor layer14:11
funginothing to be done about it now14:11
corvusi'll try to work on a time estimate14:12
fungithanks!14:12
corvus(but for reference, it looks to be about 2x the data for the first image upload, so as a first approximation, this could take another 30m)14:13
corvuseta 20m14:18
funginot too terrible14:19
corvusoh, i guess there were a bunch of zeroes; it's finishing early14:26
fungii like zeroes14:26
fungibut especially today14:26
mnaser(especially at the end of my bank account, the more the better? :P)14:26
fungimnaser: i have a lot of those, but unfortunately they're all after the .14:27
mnaser:P14:27
fungior , if in europe14:27
corvusdocker images pushed14:28
corvuspushing git tag and commits now14:28
fungithe docker tags are updated now too? if so i'll start pulling in opendev14:29
mnaserso 4.6.0 is what we should pull14:29
avass[m]I can pull zuul/zuul-executor:4.6.0, but they don't seem to show up in dockerhub yet14:30
avass[m]nevermind now they do :)14:30
fungimnaser: 4, 4.6, 4.6.0 or latest14:30
corvusgit pushed14:30
fungishould all be updated according to the plan which was worked out14:30
corvusFalson: correct14:30
corvuser fungi ^14:30
fungithanks14:30
corvusuploading to pypi14:31
* mnaser watches pipeline14:31
tristanChttps://pypi.org/project/zuul/ says 4.6.0 is available14:35
corvuspypi upload complete14:35
corvusworking on announcement email now14:35
corvusannouncement sent14:35
tristanCand zuul-4.6.0-1.el7 is also now available in the sf-3.6-release repository, software-factory users can now run `sfconfig --update`14:36
avass[m]deployed14:36
opendevreviewJames E. Blair proposed zuul/zuul-jobs master: Prevent leaks of buildset registry credentials  https://review.opendev.org/c/zuul/zuul-jobs/+/79790914:36
pabelanger[m]starting the upgrade14:36
fungiopendev is on its way back up as well14:37
corvuschanges pushed to zuul-jobs and opendev/base-jobs14:37
corvuswe'll need to recheck those and merge asap14:37
gtemahttps://zuul-ci.org/docs/zuul/4.6.0/reference/releasenotes.html - returns 40414:37
corvusgtema: yeah, we'll have to catch up doc publishing later; they're in the email for now though14:38
fungigtema: yes, those won't be published just yet, good point14:38
corvusi guess we should have said they would eventually be published there :/14:38
fungithere are 6 stories i sent a private reply about asking to confirm before switching to public, but would also like to do that asap14:38
avass[m]pabelanger: your change would have been perfect to bump the docs ;)14:39
gtemayes, I just followed link from email14:39
corvusfungi: i think it's okay to flip the switch now14:39
fungiso just to confirm, all 6 of those are okay to be public now?14:39
corvuslet me check the list14:40
fungithanks, i didn't want to inadvertently disclose something we haven't fixed yet14:40
fungiso just wanted a second set of eyes on that list14:40
corvusfungi: lgtm14:44
fungithanks, will switch them momentarily14:45
corvusi've rechecked the 2 changes i uploaded14:45
fungiahh, thanks, i was about to approve them14:45
fungithough will the base-jobs change be able to pass testing?14:45
fungior will it fail on the problems it addresses?14:46
corvusi think it will pass14:46
pabelanger[m]the biggest unknown for me right now, is potential changes needed in our base jobs. I think we are good, but guess we will know shortly14:46
corvusi don't think it uses those secrets14:46
fungiahh, okay14:46
avass[m]pabelanger: we've tried to make it as easy to mitigate any issues at least14:46
corvusfungi: i think we need to make a corresponding change to openstack/project-config14:47
tristanCit seems like our post log playbook is now failing when using `- hosts: "{{ site.fqdn }}"` where site is a secret14:47
corvusfungi: i think it has very similar secrets, but the values are slightly different14:47
corvustristanC: you may be able to use the format approach i did in https://review.opendev.org/79791014:48
corvusoh14:48
corvusi'm not sure i understand why that isn't working14:48
avass[m]no that should work14:49
avass[m]I think ?14:49
tobiash[m]so far everything looks good in our env including base jobs14:49
tristanCcorvus: yes, that is surprising, the exception http://paste.openstack.org/show/806922/14:49
tristanCand the "{{ site }}" works in a task attribute before14:51
pabelanger[m]deploying 4.6.0 now14:51
corvustristanC, avass: is it because secrets are hostvars now?14:52
corvusso it's out of scope there?14:53
avass[m]yeah that could be14:53
tristanCcorvus: that would explain the failure yes, so i guess we can no longer do that14:53
corvusi wonder if you can set a fact in a play before it?14:53
corvushrm that's not working for me14:54
avass[m]it works if you do `hostvars[<inventory_hostname>].<var>` but not as a set fact14:55
corvusavass: ++14:56
corvustristanC: maybe you can change that to:   hosts: "{{ hostvars['localhost'].site.fqdn }}"14:56
corvus(with no need to run set_fact)14:57
mnaserhttps://curl.zuul.vexxhost.dev/status14:57
mnaserrunning 4.6.014:57
avass[m]oh, but only if I set it as a fact in a play just before that play :) 14:57
corvusavass: right, but the secret "site" should still already be a hostvar, so i think set_fact isn't needed14:57
avass[m]corvus: it actually doesn't14:58
avass[m]so set_fact is needed14:58
tristanCcorvus: we can also simply splice the hostname directly in the post playbook14:59
corvusif that's easy... :)   but i'm certain avass's suggestion of setting the fact and then accessing it as a hostvar will work (i verified that)15:00
corvusi'm testing my suggestion of not using set_fact now :)15:00
corvusavass: i agree it does not seem to work; that seems weird to me.15:01
avass[m]even weirder is that I don't even have to make it cacheable15:02
corvusavass: i don't think you need to cache between plays, just between playbooks, right?15:02
fungii think caching is only needed if you want to pass it between different playbooks15:02
avass[m]ah15:02
corvusavass: oh weird!  it's hostvars that's undefined if you don't set_fact15:07
corvusthat makes no sense15:07
* corvus < https://matrix.org/_matrix/media/r0/download/matrix.org/rPYNHehoGrbUbDdpWzQwwyNO/message.txt >15:08
corvusthat actually works, but not without that first play15:08
corvus(cacheable isn't needed)15:09
tristanCalso https://opendev.org/opendev/base-jobs/src/branch/master/roles/submit-logstash-jobs/tasks/main.yaml#L9 is now failing with `The task includes an option with an undefined variable. The error was: 'None' has no attribute 'get'`15:10
tristanCi guess we need an extra `.get('data')` ?15:11
fungitristanC: is that after we merged 797910?15:14
avass[m]corvus: weird15:14
tristanCfungi: it seems like the issue will still happen after 79791015:18
tristanCi mean i don't see how 797910 would affect the logstash-jobs task15:19
corvushttps://review.opendev.org/797910 exercised its own fix, so the .format() method of dealing with jinja in secrets looks like it works15:21
pabelanger[m]4.6.0 - https://dashboard.zuul.ansible.com/t/ansible/status15:21
pabelanger[m]checking to see if there is any fall out15:21
corvusi agree; i don't understand the issue with the logstash playbook15:21
corvusoh i think i may know, 1 sec15:22
corvustristanC: oh yes, that's exactly it, .get('data')15:23
corvustristanC: i agree -- this should fix it: "{{ (lookup('file', zuul.executor.result_data_file) | from_json).get('data').get('zuul').get('log_url') }}"15:23
corvustristanC: you want to propose that?15:24
tristanCcorvus: yes sure, it's  https://review.opendev.org/c/opendev/base-jobs/+/79796015:25
corvusalready approved15:27
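The extra `.get('data')` can be illustrated in plain Python. This is a sketch under assumptions: the layout of the result data file (a top-level `data` key alongside `secret_data`) is inferred from this discussion rather than from Zuul's documentation, and `get_log_url` plus the example URL are invented for illustration.

```python
import json


def get_log_url(result_json_text):
    """Extract zuul.log_url from result data nested under a top-level 'data' key.

    The nesting is an assumption based on the discussion above.
    Returns None if any level of the hierarchy is missing.
    """
    node = json.loads(result_json_text)
    for key in ("data", "zuul", "log_url"):
        if not isinstance(node, dict):
            return None
        node = node.get(key)
    return node


# Assumed new layout: returned data sits one level down.
new_layout = json.dumps({
    "data": {"zuul": {"log_url": "https://logs.example.org/build/"}},
    "secret_data": {},
})
# Assumed old layout: "zuul" at the top level.
old_layout = json.dumps({"zuul": {"log_url": "https://logs.example.org/build/"}})

print(get_log_url(new_layout))  # https://logs.example.org/build/
print(get_log_url(old_layout))  # None

# Chaining .get() the old way against the new layout fails on None,
# which corresponds to the "'None' has no attribute 'get'" error in the task.
try:
    json.loads(new_layout).get("zuul").get("log_url")
except AttributeError:
    print("old lookup fails against the new layout")
```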
corvustristanC, tobiash: does https://review.opendev.org/797135 look like an easy approval so we can exercise all of zuul's gate jobs?15:28
tobiash[m]lgtm15:29
pabelanger[m]https://review.opendev.org/c/zuul/zuul-jobs/+/797909/ failed linters15:29
pabelanger[m]plus some testing15:30
pabelanger[m]looking now15:30
corvusit's probably the same error15:30
opendevreviewJames E. Blair proposed zuul/zuul-jobs master: Prevent leaks of buildset registry credentials  https://review.opendev.org/c/zuul/zuul-jobs/+/79790915:31
fungiwhich has now merged15:31
pabelanger[m]pull-from-intermediate-registry: Load information from zuul_return seemed to fail15:31
corvuspabelanger: ah that's going to be the same issue tristan just found; i'll fix that15:33
pabelanger[m]cool15:33
tristanCso far the two unexpected failures are using secret in hosts play value and when using zuul result lookup15:34
pabelanger[m]corvus: I am guessing, container builds will not work until we land 797909?15:36
avass[m]well everyone quit early because of midsummer tomorrow so we have no traffic :)15:36
pabelanger[m]must be, seeing retries on our jobs right now15:38
opendevreviewJames E. Blair proposed zuul/zuul-jobs master: Prevent leaks of buildset registry credentials  https://review.opendev.org/c/zuul/zuul-jobs/+/79790915:40
corvuspabelanger: retries on container jobs?  yeah, i think 797909 will be necessary15:43
pabelanger[m]Yah, this is what we are seeing15:43
pabelanger[m]The task includes an option with an undefined variable. The error was: 'dict object' has no attribute 'buildset_registry'\n\nThe error appears to be in '/var/lib/zuul/builds/129e3b83a77649db98651863a6122153/trusted/project_1/opendev.org/zuul/zuul-jobs/roles/pull-from-intermediate-registry/tasks/main.yaml': line 2, column 315:43
pabelanger[m]so, once we land your change, it means 4.6.0 is the min requirement now for container build jobs15:44
corvusthat agrees with the latest fix i did15:44
corvusyep15:45
corvusbut, also, s/for container build jobs// given how exploitable the issues are15:45
pabelanger[m]https://storage.gra.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_8d3/797909/3/check/zuul-jobs-test-registry-buildset-registry/8d305dc/job-output.txt15:48
pabelanger[m]same error as above15:49
tobiash[m]pabelanger: does that re-read the result data json?15:49
tobiash[m]we had that in one project and that file changed its layout15:50
tobiash[m]pabelanger: you can put .data in between15:51
pabelanger[m]I'm still getting up to speed, will defer to @corvus to update15:52
opendevreviewTobias Henkel proposed zuul/zuul-jobs master: Fix reading buildset_registry from results.json  https://review.opendev.org/c/zuul/zuul-jobs/+/79796315:53
tobiash[m]corvus, pabelanger I think that should fix it ^15:53
tobiash[m]we had a similar issue in one of our projects15:54
opendevreviewTobias Henkel proposed zuul/zuul-jobs master: Fix reading buildset_registry from results.json  https://review.opendev.org/c/zuul/zuul-jobs/+/79796315:54
corvustobiash: that's already in my latest patchset15:54
corvusexcept it should be secret_data, not data15:55
tobiash[m]oh, sorry, I didn't follow all the backscroll15:55
tobiash[m]then I'll abandon that15:55
corvushttps://review.opendev.org/797909 is the change15:55
pabelanger[m]ah15:55
pabelanger[m]so my comment is not right15:55
pabelanger[m]however, ansible is failing on that line15:55
corvusit's 2 issues -- adding in the extra level of hierarchy, and then actually fixing the security issue by using secret_data.  797909 is attempting to do both15:58
pabelanger[m]podman test seems to be failing now too, looks like we are not finding the image on the buildset registry and going upstream16:00
corvusif anyone can spot the issue in https://zuul.opendev.org/t/zuul/build/a8622de6cfaf47bb9a8114ffeed1c72e/console please let me know, i don't see it yet16:01
pabelanger[m]i'm comparing to https://zuul.opendev.org/t/zuul/build/dc9e40d2ef144c69be6e73af59a091f8/console which looks to be the last good run16:04
pabelanger[m]use-buildset-registry: Modify registries.conf doesn't seem to change16:06
pabelanger[m]modify_registries_conf not working properly for some reason?16:07
pabelanger[m]which means we don't inject the buildset_registry?16:07
*** rpittau is now known as rpittau|afk16:09
pabelanger[m]yah, then our docker config isn't saved16:09
corvusi think i see an issue16:10
corvushttps://opendev.org/zuul/zuul-jobs/src/branch/master/test-playbooks/registry/test-registry.yaml#L101-L10216:11
avass[m]corvus: was just about to comment that16:12
corvusi don't think we can override "zuul.*" vars like that any more16:12
avass[m]no, since they're extra vars16:12
avass[m]I think I pointed that out while discussing the update on irc a couple of weeks ago :)16:12
avass[m]corvus: if those roles need to use `zuul` vars running a nested ansible command is probably the easiest way to test them16:13
corvusyou may have; unfortunately, i didn't remember that when i was checking on the status of the patches in storyboard16:14
corvusavass: yeah, though the restricted ansible in zuul is different than plain ansible16:15
corvusso it's a divergence from testing like prod16:16
avass[m]most differences are there to protect from running things on localhost right?16:16
corvusanother alternative would be to pass in variables or default to zuul.* -- if that's safe16:16
avass[m]oh, I think it should be safe to add a `vars` entry in the role that defaults to zuul and override it like that16:17
avass[m]since it's only possible to override role vars with include_role vars or extra vars16:17
corvusright, but is it safe to allow someone to pass in a separate list of previously built artifacts?16:18
avass[m]it would only be possible to override it in the same playbook that uses the role, so yes?16:19
corvusavass: we're talking about saying "zuulartifacts: zuulartifacts | default(zuul.artifacts)"16:20
corvusgrr16:20
avass[m]corvus: I mean like add-build-sshkey: https://opendev.org/zuul/zuul-jobs/src/branch/master/roles/add-build-sshkey/vars/main.yaml16:20
corvusavass: we're talking about saying "zuul_artifacts: zuul_artifacts | default(zuul.artifacts)"16:20
corvusactually let me rephrase that to make it clearer16:21
corvus"zuul_artifacts: test_zuul_artifacts | default(zuul.artifacts)"16:21
corvusso anyone could add test_zuul_artifacts as a job var16:21
avass[m]let me push an example :)16:21
corvusi understand the mechanics :)16:22
avass[m]job vars wouldn't be able to override it16:22
corvusavass: okay, i think i see what you're saying16:23
corvusyou'd have to write a new playbook that invokes the role to override it16:23
corvus(i think it would actually be okay for job vars to override this, but i like it better if they can't)16:24
corvusi'll work on a new patchset16:24
opendevreviewAlbin Vass proposed zuul/zuul-jobs master: Add _zuul var that can be overriden when including role  https://review.opendev.org/c/zuul/zuul-jobs/+/79797716:25
avass[m]that ^ makes it so the only way to override the role's internal `_zuul` is to do exactly what the test does16:26
opendevreviewJames E. Blair proposed zuul/zuul-jobs master: Prevent leaks of buildset registry credentials  https://review.opendev.org/c/zuul/zuul-jobs/+/79790916:26
corvusavass: okay i think that ^ is your suggestion but more minimal16:27
avass[m]corvus: ++16:27
corvusi want to add a comment to that, but i'll let the builds finish first16:28
*** jpena is now known as jpena|lunch16:30
*** jpena|lunch is now known as jpena16:30
*** jpena is now known as jpena|off16:31
corvusbetter16:46
corvusthere's an issue with the inception aspect of this: https://zuul.opendev.org/t/zuul/build/4da79c76a392443d870216ef1b564eef/console16:46
corvus(the real buildset registry before the fake buildset registry)16:46
opendevreviewJames E. Blair proposed zuul/zuul-jobs master: Prevent leaks of buildset registry credentials  https://review.opendev.org/c/zuul/zuul-jobs/+/79790916:48
corvusi don't understand the issue yet; i'm just adding some debug prints there16:48
corvus(but zuul-jobs-test-registry-docker did pass, so the zuul.artifacts thing worked)16:51
corvusoh i think i get it16:52
corvusthis is a catch-22 because the real buildset registry is running from a trusted playbook, so it's not running with the secret_data change16:53
corvuswe should disable these jobs, merge the change, then re-enable them16:53
corvus(the existing docker job should give us confidence)16:54
pabelanger[m]+116:54
avass[m]they're currently broken so it can't get worse? :)16:54
corvusindeed16:54
opendevreviewJames E. Blair proposed zuul/zuul-jobs master: Prevent leaks of buildset registry credentials  https://review.opendev.org/c/zuul/zuul-jobs/+/79790916:56
corvuszuul-maint: ^ expected to pass16:56
opendevreviewJames E. Blair proposed zuul/zuul-jobs master: Re-add buildset-registry jobs  https://review.opendev.org/c/zuul/zuul-jobs/+/79798616:57
pabelanger[m]+216:58
pabelanger[m]anyone else want to +A https://review.opendev.org/c/zuul/zuul-jobs/+/79790917:03
fungipabelanger[m]: done, just finished reading through it17:06
corvuslooks like no issues with gerrit's zuul: https://ci.gerritcodereview.com/t/gerrit/build/2f6a0197530e4bedb3cdae66ecccf5fc/console17:07
fungiexcellent!17:07
corvustristanC: when you have a sec, can you look at https://review.opendev.org/797135 and leave a +2 (but don't approve)? then once the image jobs are done i can approve it to exercise all the gate/promote jobs.17:14
avass[m]we really threaded the needle with the security update17:14
corvustristanC: (of course, if you -1 it i can find another small change)17:14
pabelanger[m]avass: the hardest part for us was dealing with zookeeper package with SSL support. I know it isn't zuul's issue, but would almost be nice to have a way to opt into ssl zookeeper still17:16
pabelanger[m]that said, we managed to get it installed via tarball and it just worked17:16
fungii suppose that's one up-side to java applications basically behaving like containers17:17
fungior virtual machines17:17
pabelanger[m]Yup. Once we updated our ansible role to support it, it basically just worked17:18
corvuspabelanger: now that it's done, you should be ready for 5.0 :)17:18
pabelanger[m]everything else for zuul / nodepool fell into place (minus the keystore.password issue)17:18
pabelanger[m]looking forward to it17:19
pabelanger[m]do we have a timeline on 5.0? Or etherpad for it17:21
corvuspabelanger: no, we're just merging changes as fast as we can :)17:22
opendevreviewMerged zuul/zuul-jobs master: Prevent leaks of buildset registry credentials  https://review.opendev.org/c/zuul/zuul-jobs/+/79790917:23
corvuswe're almost done removing gearman, and i think the first change that adds a unit test with two schedulers will be ready shortly after that17:24
corvus(we won't be ready to run 2 schedulers at that point, but it's still a big milestone and it's within sight :)17:25
pabelanger[m]sounds exciting, looking forward to that day for sure17:25
corvusi rechecked 79798617:26
pabelanger[m]it was pretty nice to just stop / start nodepool for 3.14.0 -> 4.1.0 upgrade and have no state loss17:26
corvusfungi: could i tempt you into a +3 or -1 on https://review.opendev.org/797135 ?17:28
corvusi'm assuming tristanC is updating base playbooks17:28
pabelanger[m]it would also be worth a discussion to see if we can add support in zuul-executor to use ansible-runner somehow, that way people can use ansible via an execution environment if they want. Given that is what downstream redhat is now shipping for folks17:28
corvus(now that the zuul-jobs change has merged, i think we're ready to exercise zuul's gate)17:28
fungisure, toggling back and forth between here and playing whack-a-mole with broken opendev jobs17:28
corvusfungi: cool, i'll look into #opendev17:30
fungicorvus: that's not a hotfix for the security release... is it urgent?17:30
corvusfungi: no, it's the first simple change i could find to merge in order to exercise the gate and publish docs17:30
fungiahh, yes good idea too17:31
fungilgtm, approved17:32
fungihopefully we'll get release notes too17:32
fungiso they no longer 40417:32
corvusyeah.  we may need to run a docs publish job on the tag for that17:32
corvusso we might consider enqueuing the tag and.... i dunno, somehow stopping the docker build jobs?17:34
corvus(either that, or decide that we don't care about pushing a new build with the same tag)17:35
avass[m]corvus: merge a change that disables everything except documentation and then trigger the release pipeline for 4.6.0 maybe?17:36
corvusyeah, i think that would work (i don't think that would run afoul of the branch-contains-tag check)17:37
opendevreviewJames E. Blair proposed zuul/zuul master: Temporarily disable some release jobs  https://review.opendev.org/c/zuul/zuul/+/79799417:39
opendevreviewJames E. Blair proposed zuul/zuul master: Re-enable the release jobs  https://review.opendev.org/c/zuul/zuul/+/79799517:39
opendevreviewJames E. Blair proposed zuul/zuul master: Add a comment about checking zk cert perms  https://review.opendev.org/c/zuul/zuul/+/79799717:41
pabelanger[m]Hmm, I am seeing the following in one of our jobs still17:44
pabelanger[m]2021-06-24 17:43:31.215244 | localhost |   "msg": "'dict object' has no attribute 'artifacts'"17:44
pabelanger[m]https://328deba99b1e154be8f6-b5cb0ed4d1c5e0b8a6ba497ae655d78f.ssl.cf1.rackcdn.com/85/61d77107edefcf45b1cac03e26e9df3452e2551b/check/network-ee-tox-ansible-builder/3cf8c6e/job-output.html#l81017:45
corvuspabelanger: using the 'git' driver for zuul-jobs?  we may need to check the poll interval17:45
corvuspabelanger: (also, you're running direct from upstream -- not a local fork of zuul-jobs, right?)17:45
pabelanger[m]yah, upstream17:45
pabelanger[m]let me check config17:45
pabelanger[m]I thought we set it up from gerrit17:45
pabelanger[m]for depends-on17:45
corvuspabelanger: oh, then hrm...  if it's git, the default poll interval is 2 hours.  but if it's gerrit and you've got stream-events, then it should have been approximately immediate -- but of course would only take effect if the item was enqueued after the change merged17:47
corvuspabelanger: it may be working -- we may have just missed something in our fix17:47
corvusit may be that we need to set the default value differently17:48
fungiworkingly unworking17:48
corvusyeah, i think that's it; fix in a second17:48
pabelanger[m]Hmm, let me recheck17:48
pabelanger[m]maybe I was too fast on starting job after the merge17:49
corvusnah i think it's a real bug17:49
pabelanger[m]it says artifacts not zuul_artifacts17:50
pabelanger[m]which is your change for that task17:50
corvuspabelanger: but it might be the initializer in vars/main.yaml that's emitting that error17:51
pabelanger[m]oh, yah17:51
pabelanger[m]I think you are right17:51
opendevreviewJames E. Blair proposed zuul/zuul-jobs master: Fix default value for zuul_artifacts  https://review.opendev.org/c/zuul/zuul-jobs/+/79800017:51
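(Editor's sketch: the "'dict object' has no attribute 'artifacts'" message above is the kind of error an unguarded lookup produces when a key is absent. The plain-Python analogue below only illustrates falling back to a default value. The function name and sample data are hypothetical, and the actual content of change 798000 is not shown in this log.)

```python
# Hypothetical stand-in for job variables; not taken from zuul-jobs.
def get_artifacts(zuul_vars):
    """Return the artifact list, or an empty list when the key is absent."""
    # dict.get with a default avoids the failure that zuul_vars["artifacts"]
    # would raise when no artifacts have been recorded yet.
    return zuul_vars.get("artifacts", [])


without_artifacts = {"project": "example/project"}
with_artifacts = {"artifacts": [{"name": "image", "url": "https://example.invalid/image"}]}

print(get_artifacts(without_artifacts))
print(len(get_artifacts(with_artifacts)))
```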
corvuszuul's upload-image job just hit a retry; i didn't catch it, but i'd wager it's the same error17:51
corvuspabelanger, avass, fungi: ^ one more zuul-jobs fix17:53
fungilgtm, expedited approval17:53
pabelanger[m]oh noes17:54
pabelanger[m]I think our scheduler just crashed17:54
fungidid it log a traceback?17:54
pabelanger[m]trying to get it back online first17:55
pabelanger[m]I might have an idea17:57
corvus(ftr, yes, zuul hit the same .artifacts error)17:58
fungithat probably explains some subset of build failures i'm seeing as well which i haven't dug into logs for yet17:59
corvusfungi: anything with a buildset registry is still broken17:59
fungithat too17:59
fungiyes17:59
corvuszuul-client tests are broken: https://b2b5edd1153190007318-36597f05313fd3fdc073d75819a76269.ssl.cf1.rackcdn.com/797135/1/gate/zuul-tox-zuul-client/853c2a9/job-output.txt18:00
fungialso i know we still have some less-used jobs in opendev which will need updating for things like no longer trying to set ansible_python_interpreter18:00
corvusi'll see if i can repro locally18:00
pabelanger[m]ExecReload=/bin/kill -HUP $MAINPID18:00
pabelanger[m]I believe that did it18:01
pabelanger[m]I missed the full-reconfigure change18:01
fungid'oh, in retrospect it might have made sense to at least not die on hup, but... i guess that makes it noticeable at least18:01
corvuswhew, that should be easy to fix :)18:02
corvuspabelanger: you may be able to use "zuul-scheduler smart-reconfigure" instead too18:02
corvuswill be a lot faster18:02
fungisystem: "hey, zuul, hang up already" ... zuul: "okay, can do!"18:02
corvus(full-reconfigure should basically only be needed if something goes wrong with smart-reconfigure)18:02
fungi*click*18:02
pabelanger[m]k, will look into smart-reconfigure18:04
opendevreviewJames E. Blair proposed zuul/zuul master: Fix zuul client tests  https://review.opendev.org/c/zuul/zuul/+/79800418:10
opendevreviewJames E. Blair proposed zuul/zuul master: Temporarily disable some release jobs  https://review.opendev.org/c/zuul/zuul/+/79799418:11
opendevreviewJames E. Blair proposed zuul/zuul master: Re-enable the release jobs  https://review.opendev.org/c/zuul/zuul/+/79799518:11
corvuslooks like a tox-remote error too18:11
corvus(also looks like tests just need updating)18:12
opendevreviewMerged zuul/zuul-jobs master: Fix default value for zuul_artifacts  https://review.opendev.org/c/zuul/zuul-jobs/+/79800018:14
corvuszuul-maint: https://opendev.org/zuul/zuul/src/branch/master/tests/remote/test_remote_hostvars.py#L77-L8018:15
corvusthe only assertion that test has is something we don't support any more18:15
corvusis there something else we should be testing instead?18:16
corvuslike... we have regular unit tests that make sure normal hostvars work... i think that test really was to make sure that ansible_python_interpreter worked18:16
fungiseems that way, i'd personally be fine dropping it18:17
fungiadded by https://review.openstack.org/637338 "executor: use node python path"18:18
fungiyep, exactly why it's in there18:19
opendevreviewJames E. Blair proposed zuul/zuul master: Fix zuul client and remote tests  https://review.opendev.org/c/zuul/zuul/+/79800418:26
corvuszuul-maint: that should fix the tox-client and tox-remote jobs18:26
opendevreviewJames E. Blair proposed zuul/zuul master: Temporarily disable some release jobs  https://review.opendev.org/c/zuul/zuul/+/79799418:27
opendevreviewJames E. Blair proposed zuul/zuul master: Re-enable the release jobs  https://review.opendev.org/c/zuul/zuul/+/79799518:27
corvusthe fastest (yet still safe) way to proceed would be to +3 798004 and 797994 now18:27
fungii've +2'd both, second review would be nice18:31
corvustristanC, tobiash, pabelanger: ^ if you're around18:31
fungiif everyone's busy i'm okay with snigle-core approving it18:34
fungisingle too18:34
pabelanger[m]sorry, which ones?18:34
* fungi wonders for a moment what a snigle might be18:35
corvushttps://review.opendev.org/798004 and https://review.opendev.org/79799418:35
corvusreally mostly the first one18:35
corvuser i just caught a typo in it18:35
opendevreviewJames E. Blair proposed zuul/zuul master: Fix zuul client and remote tests  https://review.opendev.org/c/zuul/zuul/+/79800418:36
pabelanger[m]k, +218:36
pabelanger[m]add +A when ready18:36
corvusdone18:36
fungioh, typo indeed. i missed that18:37
corvus00]0[]1[1one!18:37
fungieleventy18:37
opendevreviewJames E. Blair proposed zuul/zuul master: Temporarily disable some release jobs  https://review.opendev.org/c/zuul/zuul/+/79799418:37
pabelanger[m]also, our container jobs appear to be good now18:38
corvuspabelanger: \o/18:38
fungiawesome!18:38
corvusokay, those are both in gate now and expected to pass; assuming they don't fail earlier, i think we're idle for about an hour, then we can enqueue the tag18:39
fungilooks like most of the urgently-failing things are now under control in opendev18:42
fungithanks everyone for all the help!18:43
corvusthanks indeed!18:43
pabelanger[m]is anyone able to help out with the following: http://paste.openstack.org/show/806930/19:01
pabelanger[m]if we pre-apply 'gate' label before PR has reported back, zuul will not enqueue the PR any more19:03
pabelanger[m]we have to toggle the gate label19:03
pabelanger[m]https://github.com/ansible/project-config/blob/master/zuul.d/pipelines.yaml#L3419:03
pabelanger[m]is the pipeline19:03
avass[m]pabelanger: I think you need to add the github checks to your gate pipeline triggers19:04
pabelanger[m]you mean the check_run event?19:05
pabelanger[m]let me compare to https://zuul-ci.org/docs/zuul/reference/drivers/github.html#reference-pipelines19:06
avass[m]trying to figure out how to do that, if it's possible to do it19:06
avass[m]but I'd expect it to be be possible to a check success somehow19:07
avass[m]trigger on a check success*19:07
pabelanger[m]yes, we have that19:08
pabelanger[m]I wonder if the status syntax is wrong some how19:08
pabelanger[m]I had issue with [bot] in it last night and removed it19:08
avass[m]pabelanger: I can only see a check success requirement, not a trigger?19:08
avass[m]pabelanger: oh, nvm I'm blind19:09
pabelanger[m]http://paste.openstack.org/show/806931/19:10
pabelanger[m]should be more help19:10
pabelanger[m]because Types [re.compile('pull_request')] doesn't match check_run19:11
pabelanger[m]is the part that is confusing me19:11
avass[m]pabelanger: should the `event: pull_request` be replaced with `event: check_run` maybe?19:18
pabelanger[m]I don't believe so19:20
pabelanger[m]tobiash: ^ maybe you have some thoughts19:20
pabelanger[m]https://zuul-ci.org/docs/zuul/reference/drivers/github.html#value-pipeline.trigger.%3Cgithub%20source%3E.action.status19:20
pabelanger[m]the syntax looks to be right19:21
avass[m]there was a change to fix the normalization going on in the github driver some weeks ago19:21
avass[m]but I think that was still backwards compatible19:21
avass[m]I got the same config with `action: pull_request` but I'm not sure if I've seen it work19:22
avass[m]pull_request doesn't have a `status` action: https://docs.github.com/en/developers/webhooks-and-events/webhooks/webhook-events-and-payloads#pull_request19:22
avass[m]and it looks like it should be check_run? https://docs.github.com/en/developers/webhooks-and-events/webhooks/webhook-events-and-payloads#check_run19:23
pabelanger[m]it is statuses19:23
pabelanger[m]statuses: ansible-zuul:ansible/check:success19:23
pabelanger[m]in pull_request19:23
pabelanger[m]yah, really don't know what is going on19:29
avass[m]pabelanger: are you sure that's the correct event you got?19:33
pabelanger[m]https://zuul-ci.org/docs/zuul/reference/drivers/github.html#attr-pipeline.require.%3Cgithub%20source%3E.status19:33
pabelanger[m]is what I am reading now19:33
avass[m]the logs I mean19:33
pabelanger[m]yah, its right event19:36
avass[m]because that's a check_run event, and explains why pull_request isn't matching in that case19:36
avass[m]but I suppose that worked before?19:37
avass[m]oh, but didn't github change their api a while ago, and you updated from 3.19?19:38
pabelanger[m]yes, this worked properly in 3.19.119:41
corvuspabelanger, avass: pretty sure pullrequest and checkrun are different events19:41
corvussigh19:41
pabelanger[m]but after upgraded I needed to remove [bot] from the status check19:41
avass[m]corvus: it does look like that right?19:41
corvuspabelanger, avass: pretty sure pull_request and check_run are different events19:41
corvuspabelanger: at first glance, i would say that you should ignore that debug message because it doesn't correspond with a trigger you are interested in19:42
pabelanger[m]eg: https://github.com/ansible/project-config/commit/a61e3d286890634cce2c3eea2dde50c8ebcd2631#diff-169729fef3c97c20106951e89bf95e0d3bcb2e30c2f1fec93d93a387703fe4bf19:42
pabelanger[m]ack19:42
corvussomewhere there should be a pull_request event with a "labeled" action19:43
corvuspabelanger: but you have a a pipeline requirement that the check pipeline succeed19:43
corvushttps://github.com/ansible/project-config/blob/master/zuul.d/pipelines.yaml#L5019:44
pabelanger[m]yes19:44
pabelanger[m]we want check reported back before enqueing into gate19:44
pabelanger[m]enqueued*19:44
corvusdoesn't that mean that it can't be enqueued into gate unless the github check named 'check' has succeeded19:44
corvusi thought the problem statement was that it wasn't enqueued if you added the 'gate' label before it reported back19:45
pabelanger[m]yes, that is what I would expect to happen.19:45
pabelanger[m]yes, but we wouldn't expect it to be enqueued until after the check is reported19:45
corvusoh, wait, you're saying you want to add the gate label, and then expect the check success to be the enqueue trigger?19:45
pabelanger[m]yes, that is right19:45
avass[m]corvus: I think the idea is that it should be possible to add the gate label, and the gate should be triggered by a check success19:45
pabelanger[m]so we are not matching properly some how on the check success19:45
pabelanger[m]and trying to debug why that is19:46
corvusso https://github.com/ansible/project-config/blob/master/zuul.d/pipelines.yaml#L55-L57 is not matching19:46
pabelanger[m]exactly19:46
pabelanger[m]if I unlabel gate then relabel gate19:46
pabelanger[m]it is enqueued19:46
pabelanger[m]so our pull_request trigger isn't proper for some reason19:47
pabelanger[m]https://github.com/ansible/project-config/commit/a61e3d286890634cce2c3eea2dde50c8ebcd2631#diff-169729fef3c97c20106951e89bf95e0d3bcb2e30c2f1fec93d93a387703fe4bf was the change I made last night, I wonder if we need to keep [bot] for the pull_request status19:47
avass[m]yeah so the trigger is wrong, I think `action: pull_request` should be `action: check_run` and the status should probably be something else:19:48
avass[m]https://opendev.org/zuul/zuul/src/branch/master/zuul/driver/github/githubconnection.py#L58119:48
pabelanger[m]https://zuul-ci.org/docs/zuul/reference/drivers/github.html#attr-pipeline.require.%3Cgithub%20source%3E.status19:48
avass[m]but I don't understand why it was working in the first case then :(19:48
corvuspabelanger: it does look like you're reporting via the status api, so i think you would need [bot] there19:49
corvushere's a pull_request status event from opendev:  Event <GithubTriggerEvent 0x7fe0747b7160 pull_request pypa/pip refs/pull/10065/head status github.com/pypa/pip 10065,7d2406620e12927983d48f2680e4c1b1cfc7f780 delivery: 0963ae1e-d4f0-11eb-93e6-3711742a29fa> for change <Change 0x7fe026c32ac0 pypa/pip 10065,7d2406620e12927983d48f2680e4c1b1cfc7f780>19:49
corvus(just so you know they do exist :)19:49
pabelanger[m]I would guess https://github.com/ansible/project-config/blob/master/zuul.d/pipelines.yaml#L50 doesn't need to change, since it is working properly19:50
avass[m]yeah but what triggered that event?19:50
corvusavass: no idea, i'm just saying pull_request is a real event and check_run is the wrong one to look at :)19:50
avass[m]looking at pabelangers logs it looks like he's getting a check_run event19:50
corvusavass: i'm sure he is, but it's not relevant19:51
corvusit's some other system reporting a check run completion19:51
avass[m]isn't that what we want?19:51
avass[m]just not another system19:51
corvuspabelanger: wait, are you reporting via checks api or status?19:52
corvusit is the checks api, so you shouldn't need [bot] in either place19:53
avass[m]> 2021-06-24 18:51:44,131 DEBUG zuul.GithubConnection: [e: 372f0b90-d51d-11eb-9bff-71269ba4cc73] Scheduling event from github.com: <GithubTriggerEvent 0x7f3240632860 check_run ansible/project-config refs/pull/851/head completed github.com/ansible/project-config 851,7f3f39d1cbd5b1baa3e49032569a87371e5f0a86 delivery: 372f0b90-d51d-11eb-9bff-71269ba4cc73 check_run: ansible-zuul:ansible/check:success>19:53
avass[m]I'd expect it to trigger on that ^19:53
pabelanger[m]I believe we are only using check now19:53
pabelanger[m]https://github.com/ansible/project-config/blob/master/zuul.d/pipelines.yaml#L2519:53
corvusavass: i agree, that would be a good event to trigger on.  pabelanger maybe you should change your trigger to be that instead?19:54
pabelanger[m]sorry, which one?19:54
corvusi don't think there's an example yaml trigger config for what avass is suggesting19:55
pabelanger[m]k, I have to jump into a meeting now, but will try to craft something on the check_run event19:56
avass[m]that doesn't really explain how this used to work however :)19:56
corvuswe should probably entertain the possibility that it hasn't worked for longer than 6 hours19:57
avass[m]I was thinking more how it worked in 3.19.1, but I guess something made it compatible back then19:58
corvushighlighting in the streaming console looks good now; also the scrollbar is fixed20:02
opendevreviewMerged zuul/zuul master: Fix zuul client and remote tests  https://review.opendev.org/c/zuul/zuul/+/79800420:03
corvus\o/ unblocked!20:03
corvuswatching the promote jobs for that now20:04
corvusi think docs promote may have a bug20:04
corvusah i see20:07
pabelanger[m]avass: for 3.19.1 we did status api, and check runs20:10
opendevreviewJames E. Blair proposed zuul/project-config master: Update doc publish secret with python string format  https://review.opendev.org/c/zuul/project-config/+/79801620:10
pabelanger[m]but now, we've moved directly to check runs20:10
corvusfungi: around for a +3 on https://review.opendev.org/798016 ?20:11
pabelanger[m]I guess it would be something like: http://paste.openstack.org/show/806933/20:12
corvusshould action be status or complete?20:13
corvuspabelanger: you can go into the event history in github and inspect it20:13
pabelanger[m]actually...20:13
pabelanger[m]yah20:13
pabelanger[m]doing that now20:13
corvusi'll just self-approve that config change20:14
opendevreviewMerged zuul/project-config master: Update doc publish secret with python string format  https://review.opendev.org/c/zuul/project-config/+/79801620:14
corvusand now i'll re-enqueue 798004,3 in promote20:14
avass[m]pabelanger: probably completed: https://opendev.org/zuul/zuul/src/branch/master/zuul/driver/github/githubconnection.py#L53020:14
opendevreviewJames E. Blair proposed zuul/zuul master: Increase unit test job timeout to 90 minutes  https://review.opendev.org/c/zuul/zuul/+/79801920:17
corvuscool, all the promote jobs should work now20:17
corvushopefully the docs site will update in a few minutes after an afs publish cycle20:18
pabelanger[m]I can't figure out how to limit the trigger to a specific check run20:22
corvus\o/ https://zuul-ci.org/docs/zuul/reference/releasenotes.html is updated20:22
pabelanger[m]eg: ansible-zuul:ansible/check:success20:22
pabelanger[m]check_runs don't seem to be name for that20:22
corvusnext we need https://review.opendev.org/797994 to merge, then we can re-enqueue the tag and that should cause the 4.6.0 release notes to publish20:23
avass[m]pabelanger: the check run seems to do `slug:name:conclusion`: https://opendev.org/zuul/zuul/src/branch/master/zuul/driver/github/githubconnection.py#L231520:25
fungisorry, back now, looking but i guess 798016 has merged already20:25
avass[m]not sure what the slug is supposed to be 20:25
pabelanger[m]was just looking at that20:25
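(Editor's sketch: the `slug:name:conclusion` form mentioned above can be illustrated with a few lines of Python. The sample values come from the log lines in this channel. The helper names and the use of `re.fullmatch` are assumptions made for illustration, not Zuul's actual matching code.)

```python
import re


def check_run_status(slug, name, conclusion):
    """Join the three parts in the slug:name:conclusion form seen in the logs."""
    return f"{slug}:{name}:{conclusion}"


def matches(required_pattern, status):
    """Treat the configured status as a regular expression (assumed behaviour)."""
    return re.fullmatch(required_pattern, status) is not None


status = check_run_status("ansible-zuul", "ansible/check", "success")
print(status)

# The pattern without "[bot]" matches the reported status.
print(matches(r"ansible-zuul:ansible/check:success", status))

# The older pattern with an escaped "[bot]" suffix does not.
print(matches(r"ansible-zuul\[bot\]:ansible/check:success", status))
```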
corvusfungi: thx, i think we're idle for another 1-1.5 hours since the test disable job hit a timeout20:27
corvusthough if any zuul-maint wants to spend a few seconds reviewing https://review.opendev.org/798019 that could be timely :)20:28
pabelanger[m]avass: so http://paste.openstack.org/show/806933/ ?20:28
pabelanger[m]err20:28
pabelanger[m]http://paste.openstack.org/show/806934/20:28
avass[m]pabelanger: I guess so? not sure if the check_run attribute is correct but that seems reasonable to me20:29
pabelanger[m]yah, this isn't documented too well :(20:29
avass[m]corvus: I've reviewed it but I'm not a maintainer :)20:30
avass[m]but since the testsuite is up to 90minutes, maybe it's a good idea to start looking at using multiple nodes to speed things up or set up some test avoidance?20:30
fungior look into whether any of the tests are less efficient than they could be20:31
avass[m]yeah20:31
corvusthey're all less efficient than they could be :)20:31
corvusthe other thing we should consider is that opendev established its standard node size 10 years ago20:31
pabelanger[m]avass: I think it is https://github.com/ansible/project-config/pull/85220:32
pabelanger[m]looking at the schema20:32
pabelanger[m]we'll find out soon20:32
corvusi was genuinely surprised that https://review.opendev.org/797180 did not speed up the tests.  and i was equally surprised that https://review.opendev.org/797181 did not slow them down.20:33
pabelanger[m]okay, https://github.com/ansible/project-config/pull/852 seems to have done it21:05
pabelanger[m]thanks corvus avass 21:05
corvuspabelanger: \o/21:05
avass[m]pabelanger: np :)21:06
corvuspabelanger: prolly can drop lines 58-60 now21:06
pabelanger[m]yah, doing clean up now21:06
corvusi know i'd come back to that in 3 months and be like "which thing is the one that's supposed to work again?" :)21:07
pabelanger[m]next up is to look at the dequeue stuff21:07
pabelanger[m]from the example pipeline21:07
opendevreviewMerged zuul/zuul-jobs master: Re-add buildset-registry jobs  https://review.opendev.org/c/zuul/zuul-jobs/+/79798621:28
corvussigh, another timeout21:39
corvusit's worth noting that the unit tests take about 1 hour on most providers; it's only one that's 33% slower21:41
corvusi should just re-enqueue changes as soon as they start running jobs in that provider22:50
fungidoes it seem to be i/o contention? we recently took the limestone provider out of our pool because of that, has it been readded or something?22:57
corvusfungi: it's bhs22:57
ianwi'm not sure limestone has been re-added.  i wasn't aware of node issues there, it was the mirror that was getting stuck in that case22:58
fungiahh, i wonder why ovh-bhs1 would be so much slower than ovh-gra1... maybe they have us on a constrained host aggregate or something and we're competing with our other nodes22:59
fungiianw: yeah, well it seemed to be i/o bandwidth for the mirror in that case, but yes23:00
opendevreviewMerged zuul/zuul master: Temporarily disable some release jobs  https://review.opendev.org/c/zuul/zuul/+/79799423:05
corvuswow, by the skin of our teeth23:06
corvusokay, i'm going to enqueue the tag now23:10
fungithanks!23:11
corvusfungi: how does this look?  docker exec zuul-schedulerscheduler1 zuul enqueue-ref --tenant zuul --pipeline release --project opendev.org/zuul/zuul --ref ref/tags/4.6.0 --newrev 487c0ba5f8b2758795bb5e5c8e5bd64777d3652423:17
corvusfungi: how does this look?  docker exec zuul-scheduler_scheduler_1 zuul enqueue-ref --tenant zuul --pipeline release --project opendev.org/zuul/zuul --ref ref/tags/4.6.0 --newrev 487c0ba5f8b2758795bb5e5c8e5bd64777d3652423:17
fungicorvus: that doesn't seem to match the 4.6.0 tag object sha for me23:20
fungicorvus: do you not also need --trigger=gerrit?23:21
corvusfungi: i think trigger is not necessary23:21
fungianyway, `git show-ref 4.6.0` in my zuul checkout gives me bbafeada02635a4c8b5477ec316c16c13238689223:22
corvusfungi: huh, i just did a git clone and got 487c0ba5f8b2758795bb5e5c8e5bd64777d36524 a second time23:23
fungifrom show-ref or show?23:24
* corvus < https://matrix.org/_matrix/media/r0/download/matrix.org/nxIlZtMqIsxcyBWKHFwCuqAT/message.txt >23:24
fungii'll try a fresh clone23:25
fungioh! you know what, i need to delete 4.6.0 and update again23:26
fungi487c0ba5f8b2758795bb5e5c8e5bd64777d3652423:26
corvusfungi: was the other tag from a test build?23:27
fungiyep, okay i agree that command looks correct23:27
corvuscool, running enqueue-ref now23:27
fungiyes, i had locally tagged 4.6.0 when working out the pypi upload commands for you, then promptly forgot i had done that23:27
fungithanks!23:28
corvushrm, not showing up on status...  i'm not sure that ref was right23:29
corvusi think i was missing the 's' in refs23:29
corvusrefs/tags/4.6.0 is what it should be, yeah?23:29
fungioh, yep!23:29
corvusyep, there it is now23:29
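(Editor's sketch: the missing 's' in `refs/tags/4.6.0` is easy to overlook, so a small guard before building the command can help. The flags below are the ones used in the `zuul enqueue-ref` command quoted above. The function name and the `refs/` prefix check are hypothetical additions for illustration.)

```python
def build_enqueue_ref_args(tenant, pipeline, project, ref, newrev):
    """Build an argument list for `zuul enqueue-ref`, rejecting malformed refs."""
    # Git refs live under "refs/"; "ref/tags/..." is the typo discussed above.
    if not ref.startswith("refs/"):
        raise ValueError(f"ref should start with 'refs/': {ref!r}")
    return [
        "zuul", "enqueue-ref",
        "--tenant", tenant,
        "--pipeline", pipeline,
        "--project", project,
        "--ref", ref,
        "--newrev", newrev,
    ]


args = build_enqueue_ref_args(
    tenant="zuul",
    pipeline="release",
    project="opendev.org/zuul/zuul",
    ref="refs/tags/4.6.0",
    newrev="487c0ba5f8b2758795bb5e5c8e5bd64777d36524",
)
print(" ".join(args))
```

The list form can be passed to `subprocess.run` or prefixed with the `docker exec` invocation appropriate to the deployment.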
corvusand only the docs job is queued23:30
fungii usually run via docker-compose but a recent working example from my command history is:23:30
fungisudo docker-compose exec scheduler zuul ^Cqueue23:30
fungi-ref --tenant=openstack --trigger=gerrit --pipeline=release-post --project=opens23:30
fungitack/releases --ref=refs/heads/master --newrev=2d7fc060b52ee823a7ebe690ea50deab923:30
fungi(minus the stray newline in there)23:31
corvusi should have just checked your history :)23:31
pabelanger[m]so, given all the fuss with the check_run status moving from check to gate, I think I am just going to make the move to remove the clean check requirements.23:33
pabelanger[m]that said, am I missing any obvious downside?23:33
pabelanger[m]maybe more gate resets because of poor commits?23:33
corvuswhile waiting, i checked runtimes, and it turns out that bhs is just slightly behind rax in averages; my guess is it just has a bit more variation which is why we see it, but it's probably performing at par23:33
fungipabelanger[m]: do your users frequently approve broken changes which won't merge, and do you tend to have long gate queues?23:34
fungiit's that combination which led to openstack relying on it23:34
corvusand if you're not in that position, i highly recommend not using it; users will be much happier23:35
pabelanger[m]fungi: honestly, I don't think so.23:35
fungicorvus: yep, fully agree23:35
pabelanger[m]not using clean check?23:35
corvus(and you get to benefit from the whole "let the computer do the testing and don't worry about it" thing)23:35
pabelanger[m]that would mean, in github, we'd update our branch protection to only require ansible/gate23:35
corvuspabelanger: correct i recommend not using clean check unless you have to23:36
pabelanger[m]ack23:36
ianwi should have realised that overriding ansible_python_interpreter in https://review.opendev.org/c/openstack/openstack-zuul-jobs/+/797808 would fail.  i guess the real problem here is that python-path: auto and ansible's detection is failing on debian bullseye23:36
fungipabelanger[m]: yes, basically openstack relies on it to reduce the massive amounts of nodes which get chewed up and spat out on gate resets from failures on very long queues, and broken changes working their way up the queue like a wrecking ball23:36
pabelanger[m]okay, I'll enable supercedes: check tomorrow, remove something but not tell people right away23:36
corvusianw: can you set in in nodepool?23:36
fungipabelanger[m]: aha, the "folgers crystals" test23:37
pabelanger[m]indeed23:37
ianwcorvus: yeah, i guess that is the place to do it.  it might fix itself with an ansible upgrade in zuul at some point i guess23:37
corvusianw: https://zuul-ci.org/docs/nodepool/configuration.html#attr-diskimages.python-path i think23:38
ianwhrm, we already have it set https://opendev.org/openstack/project-config/src/branch/master/nodepool/nodepool.yaml#L24923:39
fungiianw: i feel like we solved this once for bullseye already23:39
fungiianw: is it nested ansible maybe?23:40
fungihttps://zuul-ci.org/docs/zuul/4.6.0/reference/releasenotes.html loads for me now. yay!23:40
corvushuzzah!  eventually consistent releases23:41
ianwfungi: it is not nested ...23:41
opendevreviewJames E. Blair proposed zuul/zuul master: Re-enable the release jobs  https://review.opendev.org/c/zuul/zuul/+/79799523:42
ianwfungi: maybe we did solve it and i was looking at an old log.  https://zuul.opendev.org/t/openstack/builds?job_name=publish-wheel-cache-debian-bullseye shows the timeout but not the job failure trying to install "python-apt"23:46
ianwi definitely debugged some sort of log where that was an issue, but maybe i got confused ...23:46
fungiianw: that seems more likely23:46
fungi(that it used to be a problem and we solved it in the nodepool config)23:47
ianwahhh!23:47
ianwhttps://zuul.opendev.org/t/openstack/build/5eaf6cdde2524933842c482e956a552d23:47
ianwthe *arm64* builds seem to have this issue23:47
fungioh23:47
ianwthat's how i was getting confused.  so we probably need to set that on arm6423:47
fungiyup23:47
fungiin the nodepool config of course23:48
pabelanger[m]so just thinking about supercedes logic, if there is a PR already running in check, and we apply gate. Would we set the status of the github check? or would it be pending for ever?23:49
pabelanger[m]the check in the check pipeline I should say23:49
corvuspabelanger: i think if you have a dequeue reporter it should close it out23:49
pabelanger[m]okay23:49
ianwfungi: indeed not there -> https://opendev.org/openstack/project-config/src/branch/master/nodepool/nb03.opendev.org.yaml#L135 ... mystery solved.  thought i was going nuts!23:50
pabelanger[m]check: cancelled23:50
pabelanger[m]I'll test that out23:50