Friday, 2021-06-25

pabelanger[m]oh, interesting00:25
pabelanger[m]we do have some config errors00:26
pabelanger[m]https://dashboard.zuul.ansible.com/t/ansible/config-errors00:26
pabelanger[m]so we actually setup ansible_connection in our host-vars, not because we use it from the zuul-executor00:26
pabelanger[m]but because we use write-inventory role to create the /etc/ansible/hosts on disk of the node00:26
pabelanger[m]https://github.com/ansible/ansible-zuul-jobs/blob/master/roles/write-inventory-fork/library/write_inventory.py#L2200:27
ianwpabelanger[m]: is that something because you're then running ansible on that node connecting to ... ? something?00:29
pabelanger[m]yah00:29
pabelanger[m]basically we boot network appliances, that won't use ssh connection00:29
pabelanger[m]so we cannot connect to them from zuul-executors00:30
pabelanger[m]but we always boot a 2nd node as the controller00:30
pabelanger[m]so, we use the job host-vars to populate the hosts file on the controller00:30
ianwahh, not so far from system-config & bridge 00:30
pabelanger[m]as an easy way to get nodepool info00:30
pabelanger[m]I'll have to look into unsafe vars00:30
pabelanger[m]and clean check requirement removed00:43
pabelanger[m]lets see if people notice00:44
opendevreviewMerged zuul/zuul master: Re-enable the release jobs  https://review.opendev.org/c/zuul/zuul/+/79799501:03
pabelanger[m]I've been noticing the following since upgrading to 4.6.0: http://paste.openstack.org/show/806940/03:12
pabelanger[m]ansible-collections/community.elastic isn't a project we manage, but we are getting the events03:12
ianwpabelanger[m]: yeah, i think you might be seeing the same anonymous access issue we had03:13
ianwwhich is ... looking03:13
ianwhttps://review.opendev.org/c/zuul/zuul/+/79468803:14
ianwhttps://storyboard.openstack.org/#!/story/200894003:14
ianwdoesn't look like *exactly* the same error but in the ballpark03:14
ianwhrm, maybe not03:15
ianwGET https://api.github.com/search/issues?q=38540417c72f72e1ded3ef799af7e94cbb5b96d5&per_page=100 result: 200,03:15
ianwyou're seeing a 200 result03:16
pabelanger[m]I'll have to look at the event in the morning03:19
ianwyeah i think above is a red-herring sorry03:20
pabelanger[m]that project isn't in our config. So could explain why we cannot find it03:21
pabelanger[m]either way, will follow up tomorrow03:21
*** marios is now known as marios|ruck06:52
*** jpena|off is now known as jpena06:54
*** rpittau|afk is now known as rpittau07:15
mhu1hello, I haven't looked into the code yet, but is it possible the recent security change drastically reduced the amount of variables that are in the zuul inventory?10:11
mhu1here's an example of a recent inventory: https://softwarefactory-project.io/logs/98/22198/6/check/sf-rpm-build/e81c507/zuul-info/inventory.yaml10:11
mhu1compared to a run from yesterday (before updating to 4.6) https://softwarefactory-project.io/logs/98/22198/4/check/sf-rpm-build/262ea42/zuul-info/inventory.yaml10:12
fbomhu1: No change there https://opendev.org/zuul/zuul-jobs/src/branch/master/roles/log-inventory related to the security fix10:19
avass[m]mhu1: I think what happened is that `zuul.*` vara are now extra-vars so they need to be logged separately10:19
mhu1avass[m], okay so they're still usable right?10:20
avass[m]They should be10:20
*** rpittau is now known as rpittau|bbl11:17
*** jpena is now known as jpena|lunch11:42
*** jpena|lunch is now known as jpena12:41
*** rpittau|bbl is now known as rpittau12:56
opendevreviewMatthieu Huin proposed zuul/zuul-jobs master: log-inventory: log zuul vars  https://review.opendev.org/c/zuul/zuul-jobs/+/79808713:37
fungimhu1: oh, i hadn't even noticed that, but yeah absolutely seems to be the case... for example: https://zuul.opendev.org/t/zuul/build/ced400df792446638388cdedbd97609d/log/zuul-info/inventory.yaml13:45
fungithanks, reviewing, i rely on those heavily as well13:45
mhu1fungi, thanks!13:45
corvusthat approach wfm, but i put in an alternative suggestion if we want to maintain more compatibility with the old system13:47
opendevreviewJames E. Blair proposed zuul/zuul master: Put zuul vars back in debug inventory.yaml  https://review.opendev.org/c/zuul/zuul/+/79809213:55
corvusmhu1, fungi: ^ that's the alternative; i'm okay with either.13:56
mhu1corvus, makes more sense, thanks - out of curiosity was there any reason to remove these vars?13:57
fungimhu1: to remove them from the logged file, or are you asking why they were moved out of the normal list of ansible vars?13:57
mhu1fungi, I assume they were moved from the logged file *because* they were moved out of the list of ansible vars?13:58
corvusmhu1: for my part, i would say it was mostly unintentional; i didn't fully consider the implications of zuul vars not being in the debug inventory.  i agree having them logged somewhere is super important :)14:00
fungimhu1: https://storyboard.openstack.org/#!/story/200732714:00
mhu1eh, no worries14:00
corvusmhu1: during development of the patch, they moved around like 3 times.  :)14:00
fungiit was entirely about moving the zuul vars to a higher precedence, nothing to do with whether we logged them14:00
mhu1ok, I understand better now14:01
fungithe disappearance from that inventory.yaml file is more of a side effect of the fact that we didn't include extra-vars in it and they're now extra-vars14:01
corvus++14:03
corvusanyway, i'd say our choice of which change to use should hinge on whether we want to expand the small lie that we tell users by even having this (unused) inventory file at all to include the zuul vars in order to maintain muscle-memory and tool compatability, or if we want to underscore that they've moved to extra vars and are separate.14:04
fungiit seems like the most correct solution would be to do the combined file and rename it to something other than inventory.yaml... but that's an even more drastic change14:07
mhu1also on our side (software factory hat on), it'd be easier to just update our version of zuul-jobs than repackage our rpm for zuul with that fix - but that's just FYI and shouldn't weigh in the balance14:08
mhu1we'll be fine either way14:08
corvus(could also do both; one as an interim step)14:09
corvusmhu1: btw, all those post_failure errors on your change from sf.io -- everything ok?14:09
fungilike put the extra-vars into inventory.yaml, but rename it to something like metadata.yaml for 5.0?14:09
fungior by both you mean the zuul and zuul-jobs changes?14:10
mhu1corvus, I'm investigating right now - we're having slow network issues that don't help but in that specific case I think it's a var we need to adjust post 4.6.014:10
mhu1the upgrade went okay on our side, all things considered, btw14:11
corvusmhu1: good!14:12
opendevreviewMatthieu Huin proposed zuul/zuul-jobs master: log-inventory: log zuul vars  https://review.opendev.org/c/zuul/zuul-jobs/+/79808714:25
corvusmhu1, fungi: see my comment on https://review.opendev.org/79808714:40
*** rpittau is now known as rpittau|afk14:47
mhu1SF 3rd party CI fixed on zuul-jobs btw15:29
*** jpena is now known as jpena|off15:57
*** marios|ruck is now known as marios|out16:04
pabelanger[m]last night, I enabled superceeds check for our gate pipeline on zuul.a.c, for github we set the dequeue trigger to set cancelled for the ansible/check check run. The downside, is that seem to show a red X on the PR16:32
fungieven for cancelled buildsets?16:33
pabelanger[m]looking at https://docs.github.com/en/rest/reference/checks#update-a-check-run I can see 'skipped', and trying to see if that is nice in the UI16:33
pabelanger[m]fungi: yah, let me get PR to show16:34
clarkbcorvus: is https://review.opendev.org/q/hashtag:%22zuul-4.6%22+(status:open%20OR%20status:merged) a set of things to review/land now that 4.6.0 has shipped?16:35
pabelanger[m]https://github.com/ansible/project-config/pull/86316:35
clarkbalso, I'm not sure how all the publication stuff ended up rolling out but does it make sense to do a 4.6.1 in the near future to run zuul through its own publication jobs and make sure those haven't regressed under the updates?16:36
pabelanger[m]1 cancelled and 1 in progress checks16:36
pabelanger[m]it doesn't stop zuul from merging code, because of branch protection but I am sure people will ask why is the job red16:36
fungigah, i always forget for a moment that it won't show you any of that unless you log in16:36
pabelanger[m]like I said, I hope skipped is more friendly16:36
fungi"Some checks were not successful...1 cancelled"16:37
fungium, yeah, that is bizarre that they consider cancelled checks as a failure rather than just resetting to not yet run16:37
pabelanger[m]cancelled, failure, neutral, success, skipped, stale, or timed_out16:38
pabelanger[m]seem to be the values, trying to see what each does in the UI16:38
fungimaybe "neutral" then?16:38
pabelanger[m]https://aws1.discourse-cdn.com/github/original/2X/4/49474b50e50c5848696cdb2ae6f745daa233964a.png16:39
pabelanger[m]that seems to be neutral16:39
clarkbstale might also be appropriate since the gate is making check "stale"16:41
pabelanger[m]You cannot change a check run conclusion to stale, only GitHub can set this.16:42
pabelanger[m]:(16:42
clarkbthough stale probably implies to most people the job needs to rerun16:42
clarkbah well that settles that16:42
pabelanger[m]https://github.community/t/pr-stuck-waiting-for-checks-which-should-be-skipped/1848516:44
pabelanger[m]skipped will grey out, which may be nicer looking16:44
pabelanger[m]and, we do skip them if we move directly into gate16:44
fungithat seems like the best of the available options then16:46
clarkb++16:49
corvus++16:54
corvusclarkb: i think the publication stuff is good and we don't need to do an immediate 4.6.116:55
clarkbcorvus: k16:55
mhu1zuul-maint, some small changes on zuul-client: https://review.opendev.org/c/zuul/zuul-client/+/794553/ and https://review.opendev.org/c/zuul/zuul-client/+/75290916:56
mhu1also I'll be on PTO until July 15th so feel free to take over https://review.opendev.org/c/zuul/zuul-jobs/+/798087 if needed16:57
mhu1and maybe we can look into the admin web ui changes when I am back16:57
corvusmhu1: are you in a position to base-test that one?16:58
mhu1not sure how?16:58
corvusmhu1: https://opendev.org/opendev/base-jobs/src/branch/master/zuul.d/jobs.yaml#L5-L2316:59
corvusclarkb: i just went through the 4.6 hashtag list and left my current thoughts on all of those17:02
clarkbcool I'll try to review the 2 that you +2'd17:03
clarkbcorvus: I see that your part of the world is avoiding this mess of insane tempuratures17:04
mhu1corvus, ah ok got it - not sure we have that clever setup in sf, let me check17:05
corvusmhu1: feel free to use opendev too; config-core/infra-root will generally quickly approve base-test changes in response to irc pings :)17:05
corvusclarkb: yeah, i learned this morning that yours is not; i thought this weekend was going to be bad for us, but looks like we'll scrape by17:06
clarkbyup our all time high is 107. Tomorrow is predicted to be 106, then sunday 113, and monday 10917:06
opendevreviewPaul Belanger proposed zuul/zuul master: Add skipped / neutral statuses to the github driver  https://review.opendev.org/c/zuul/zuul/+/79813117:07
pabelanger[m]^my attempt to add support for check run status17:08
clarkbcorvus: in the azure driver stack there are a couple of changes that can be approved. Should I go ahead and +A those now?17:08
opendevreviewJames E. Blair proposed zuul/zuul master: Replace TreeCache in component registry  https://review.opendev.org/c/zuul/zuul/+/79658217:08
opendevreviewJames E. Blair proposed zuul/zuul master: Add ExecutorApi  https://review.opendev.org/c/zuul/zuul/+/77090217:09
clarkbspecifically the two at the bottom of that stack have two +2s but I held off approving previosuly due to the rechecking and zuul 4.6.0 work17:09
opendevreviewJames E. Blair proposed zuul/zuul master: Change zone handling in ExecutorApi  https://review.opendev.org/c/zuul/zuul/+/78783317:09
opendevreviewJames E. Blair proposed zuul/zuul master: Switch to string constants in BuildRequest  https://review.opendev.org/c/zuul/zuul/+/79184917:09
opendevreviewJames E. Blair proposed zuul/zuul master: Clean up Executor API build request locking and add tests  https://review.opendev.org/c/zuul/zuul/+/78862417:09
corvusclarkb: yes, i think the whole stack is ready now, and approval is appropriate17:09
opendevreviewJames E. Blair proposed zuul/zuul master: Fix race with watches in ExecutorAPI  https://review.opendev.org/c/zuul/zuul/+/79230017:09
corvusthis is a patch bomb is slow motion, btw :/17:09
opendevreviewJames E. Blair proposed zuul/zuul master: Execute builds via ZooKeeper  https://review.opendev.org/c/zuul/zuul/+/78898817:09
opendevreviewJames E. Blair proposed zuul/zuul master: Move build request cleanup from executor to scheduler  https://review.opendev.org/c/zuul/zuul/+/79468717:10
opendevreviewJames E. Blair proposed zuul/zuul master: Handle errors in the executor main loop  https://review.opendev.org/c/zuul/zuul/+/79658317:10
opendevreviewJames E. Blair proposed zuul/zuul master: Shard BuildRequest parameters  https://review.opendev.org/c/zuul/zuul/+/79714917:10
opendevreviewJames E. Blair proposed zuul/zuul master: Compress sharded ZK data  https://review.opendev.org/c/zuul/zuul/+/79715617:10
opendevreviewJames E. Blair proposed zuul/zuul master: Switch to ZooKeeper backed merge result events  https://review.opendev.org/c/zuul/zuul/+/78419517:10
opendevreviewJames E. Blair proposed zuul/zuul master: Switch from global to tenant event queues  https://review.opendev.org/c/zuul/zuul/+/79744017:10
corvusclarkb: i just added 2 changes to the zuul-4.6 hashtag which are also ready to merge17:12
clarkbhttps://review.opendev.org/c/zuul/nodepool/+/781925/ is the next one in the azure stack I should review. I'll try to get to that one and the zuul 4.6.0 hashtag list after lunch17:13
corvusclarkb: sounds good; i tried to keep actual changes in that one small17:13
corvus(but there necessarily were a few)17:13
clarkbya some of the files remained fixed but had their contents replaced with code from the new implementation17:14
opendevreviewPaul Belanger proposed zuul/zuul master: Add skipped / neutral statuses to the github driver  https://review.opendev.org/c/zuul/zuul/+/79813117:21
opendevreviewMatthieu Huin proposed zuul/zuul-jobs master: [DNM] test log-inventory  https://review.opendev.org/c/zuul/zuul-jobs/+/79814217:36
mhu1corvus, ^ hope that'll do the trick - I have to bounce, frozen margaritas are calling17:37
mhu1(well actually dirty diapers are calling but it's not as glamorous)17:37
mhu1see y'all in two weeks17:37
fungihave a good vacation!17:37
opendevreviewMerged zuul/nodepool master: Azure: use rate limiting  https://review.opendev.org/c/zuul/nodepool/+/78185618:14
opendevreviewJames E. Blair proposed zuul/zuul master: Remove unused addResultEvent method  https://review.opendev.org/c/zuul/zuul/+/79754218:21
opendevreviewMerged zuul/nodepool master: Azure: fix race in leaked resources  https://review.opendev.org/c/zuul/nodepool/+/78192418:33
pabelanger[m]is anyone using zuul-storage-proxy yet?18:35
clarkbI think it was tobiash[m] that had the use case for it18:35
corvuspabelanger: it's in use18:37
pabelanger[m]documentation is pretty light, was mostly curious to see how it is working18:37
corvusyeah, i think docs are 'read the code' right now; i believe all users of it are running from containers; i think the implementation is in good shape.18:40
pabelanger[m]is any deployments on public zuul?18:42
pabelanger[m]mostly wanting to see how upload-logs roles look, for rendering the log url18:42
pabelanger[m]more and more getting some pressure to be able to support 'password protected' log files / console18:44
opendevreviewPaul Belanger proposed zuul/zuul master: Add skipped / neutral statuses to the github driver  https://review.opendev.org/c/zuul/zuul/+/79813118:48
pabelanger[m]I shared this last night, but doing so again. http://paste.openstack.org/show/806940/ is about the only new issue we are seeing after upgrading to 4.6.0. Any ideas why project.name is missing?18:53
pabelanger[m]the project in question isn't in our zuul tenant config18:54
clarkbpabelanger[m]: is that project configured to send you github events?18:54
pabelanger[m]yah18:55
clarkbI assume that is what is happening then zuul is failing to find a config for it18:55
pabelanger[m]it is setup at the gorg level18:55
clarkbya so either add a zuul config for the project or just continue to ignore it18:55
pabelanger[m]I'd like to see how to minimize the ERRORs in our log files18:55
fungipabelanger[m]: you need to access zuul[project][name] now18:56
fungiand pass zuul=zuul in a .format() call18:56
pabelanger[m]in the github driver?18:57
fungiassuming this is arising in one of the protected configs18:57
fungioh, hrm18:57
pabelanger[m]no, this is from the scheduler debug logs18:57
pabelanger[m]so, not ansible18:57
fungiyeah, just now looked at the paste, sorry, did a bunch of fixes to project.name references in secrets yesterday and was guessing poorl18:58
fungiy18:58
pabelanger[m]does it make sense to process an event, for a project that doesn't exist in our configuration?18:59
fungiis the app added to that project for some reason? i guess that's why it's sending events?19:00
fungiseems like this came up before19:00
clarkbpabelanger[m]: I think we assume that you won't add the app to projects you don't care about19:00
clarkbso zuul reports an error saying"hey you did something weird here"19:00
clarkbif we just ignored it that would make debugging this when people want the events to be meaningful more difficult19:00
pabelanger[m]there are 2 options for github apps for orgs. At the org level or project level. For github.com/ansible-collections we do it at org level, but not all projects are enrolled in zuul19:02
pabelanger[m]this way, if a project wants to be onboarded, it is one less things an admin needs to click19:03
pabelanger[m]we've done this for a long time19:03
pabelanger[m]just now, in 4.6.0 we are seeing ERRORs spamming the debug logs19:03
pabelanger[m]can look to see what changed19:03
fungipabelanger[m]: and you didn't have that with 4.5.1?19:08
pabelanger[m]not really sure we only ran 4.5.1 for a few hours19:11
pabelanger[m]I can try to see in the logs19:11
clarkbI suspect 614bd40341bfdcc142b9ff19da35d6671b1c0848 introduced the behavior19:11
clarkbit converted event dispatch to the zookeeper queues19:12
clarkbbut that particular code in your traceback doesn't seem to have changed in  a very long time19:15
fungipabelanger[m]: prior to the 4.5.1 upgrade, what version were you running? do you recall?19:17
opendevreviewJeremy Stanley proposed zuul/zuul-jobs master: [DNM] test log-inventory  https://review.opendev.org/c/zuul/zuul-jobs/+/79814219:25
clarkbtobiash[m]: corvus: re https://review.opendev.org/c/zuul/zuul/+/796458 the nodesets are immutable because we use the NodeSet.copy() method? We don't operate on them in place?19:34
clarkboh it may be due to config freezing19:37
corvusclarkb: i'm unaware of any in-place mods, but it'd be good for you to double check :)19:38
clarkbya I'm not finding any yet19:39
clarkband then we lazily retrieve the job nodeset when constructing the gearman request for the job to be run19:41
clarkband that returns the default set in that change otherwise.19:41
pabelanger[m]@fungi 3.19.119:46
clarkbcorvus: and for https://review.opendev.org/c/zuul/zuul/+/796431 the connection name + project name + secret name is enough to uniquely identify every secret right? secrets are project specific so this should be the case (we want to avoid being able to use the cache to retrive someone else's secret)19:47
corvussecrets are global within a tenant -- so a second tenant with the same connection names, project name, and secret name and different secret data is possible19:49
corvusso i think the answer to your question is 'no' that's not enough19:49
clarkbin that case I htink that change needs to also track the tenant info19:49
corvusbut it looks like that change also would include the secret data as a cache key?19:49
clarkboh hrm ya I think that may be the case19:50
clarkband since it is encrypted you'd fail to decrypt later if the wrong data was provided?19:50
corvus(theoretically i guess the encrypted data could be identical while the plaintext data are different (if somehow the two different keys encrypted different data to the same encrypted value).  that seems like a vanishingly small probability though)19:51
corvus(but, moreover, that's not an issue with this change)19:52
clarkbyou think this is adequately safe then because the encrypted data is supplied as part of the cache key?19:54
clarkbwould we need to worry about collision searches with the issue you point out?19:55
clarkbyou could theoretically run a bunch of encryptions searching for a collision19:55
corvusclarkb: as i think more... i think maybe the answer to your original question may be closer to "almost yes" than "no".... connection name and project name are globally unique within a tenant.  two tenants with the same connections and project could have different secrets if they had different zuul.yaml files loaded; but those secrets would still be from and for the same project, and encrypted with the same key.19:57
corvuser sorry let me revise that19:58
corvusclarkb: as i think more... i think maybe the answer to your original question may be closer to "almost yes" than "no".... connection name and project name are globally unique.  two tenants with the same connections and project could have different secrets if they had different zuul.yaml files loaded; but those secrets would still be from and for the same project, and encrypted with the same key.19:58
clarkbgotcha so the level of trust is the same across those19:58
corvus(i had an extra "within a tenant" which negated the point i was trying to make)19:58
corvusyeah19:58
clarkband in this case we may even want a collision in those cases because they key and secrets could be the same19:58
clarkbsplitting by tenant would incrase memory use (which we are trying to avoid) and not be any safer19:59
corvusadd encrypted data to the key, and i think that means you are certain to get exactly only the encrypted data paired with a pointer to the key that you should.19:59
corvusclarkb: re tenant: correct19:59
clarkbok I guess I'll go ahead and approve it then20:01
clarkbcorvus: I'm not sure I fully understand your comment on https://review.opendev.org/c/zuul/zuul/+/795195 are you saying that you'd prefer not to update the xterm dep since the issue with highlighted text has fixed itself?20:12
clarkbfwiw I agree that latest zuul seems to have fixed that in firefox and chrome for me20:12
clarkbbut I think keeping the dep up to date is worthwhile if we have a change to do so20:12
clarkbcorvus: for https://review.opendev.org/c/zuul/nodepool/+/781925/ any concern that changing that driver out from under existing users will break them? I think there are config changes that need to be handled at least. Do we just mark that in release notes and state it is improvement for the future and go for it? If so I think we need a release note update20:19
corvusclarkb: re xterm: it was just a quick note that we should re-examine it since things changed; i agree we should merge it if everything looks good20:20
corvusclarkb: i think there's a reno for the only config item removal (and it has a backwards compat shim in place)20:22
corvusi think everything else is addition20:22
corvusclarkb: so in general, i think it's okay to merge and we don't expect breakage20:23
clarkbok20:24
opendevreviewMerged zuul/zuul master: Use annotated logger for layout related logs  https://review.opendev.org/c/zuul/zuul/+/79722920:34
opendevreviewMerged zuul/zuul master: Add missing attributes to change mgmt events  https://review.opendev.org/c/zuul/zuul/+/79676120:49
clarkbcorvus: I have approved https://review.opendev.org/c/zuul/nodepool/+/781925 but did note a couple of small things on that change that may be worth followups. In particular the stricter adherence to multiples of 512byte page blobs20:58
opendevreviewMerged zuul/zuul master: Cache frozen secrets  https://review.opendev.org/c/zuul/zuul/+/79643121:13
opendevreviewMerged zuul/zuul master: Re-use empty nodeset in job variants  https://review.opendev.org/c/zuul/zuul/+/79645821:16
opendevreviewMerged zuul/nodepool master: Azure: replace driver with state machine driver  https://review.opendev.org/c/zuul/nodepool/+/78192521:57
opendevreviewMerged zuul/nodepool master: Azure: update documentation  https://review.opendev.org/c/zuul/nodepool/+/78192622:08

Generated by irclog2html.py 2.17.2 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!