Wednesday, 2021-08-18

tristanCcorvus: thanks you for the pings, i approved the changes00:20
corvustristanC: thanks!00:24
*** iurygregory_ is now known as iurygregory06:42
*** rpittau|afk is now known as rpittau07:22
*** jpena|off is now known as jpena07:34
opendevreviewIan Wienand proposed zuul/zuul master: [wip] treeview for job overview page  https://review.opendev.org/c/zuul/zuul/+/80495607:35
opendevreviewBenjamin Schanzel proposed zuul/nodepool master: Add Tenant-Scoped Resource Quota  https://review.opendev.org/c/zuul/nodepool/+/80076507:47
opendevreviewBenjamin Schanzel proposed zuul/nodepool master: Add Tenant-Scoped Resource Quota  https://review.opendev.org/c/zuul/nodepool/+/80076507:59
opendevreviewFelix Edel proposed zuul/zuul master: WIP NodeRequest watches  https://review.opendev.org/c/zuul/zuul/+/80496108:08
*** mgoddard- is now known as mgoddard08:20
*** mhuin is now known as mhu08:31
gryfhello!08:48
avass[m]gryf: hi!08:48
gryfI have a question regarding two nodes job (controller and compute1), defined like this: https://opendev.org/openstack/kuryr-kubernetes/src/branch/master/.zuul.d/multinode.yaml#L15-L63 and I'd like to pass something which is generated during stacking on controller devstack to the compute1. is it possible?08:50
gryfI'm looking on https://zuul-ci.org/docs/zuul/reference/jobs.html#return-values but cannot figure out how to expose the data (from first example) on controller. any hints?08:50
gryfis it even possible on job level?08:50
*** dviroel|ruck|out is now known as dviroel|ruck11:12
*** jpena is now known as jpena|lunch11:34
*** sshnaidm|pto is now known as sshnaidm12:25
*** jpena|lunch is now known as jpena12:32
*** bhavikdbavishi1 is now known as bhavikdbavishi12:38
fungigryf: i'm not clear on what you mean by "expose the data" (the example you've linked was just for a job definition)13:02
fungicorvus: okay, this one may be cache-related, and i'm also not sure where to go next debugging it: a project in opendev recently renamed a zuul config file in their git repository, and zuul is reporting config errors for content in the old filename which no longer exists. change https://review.opendev.org/801436 renamed zuul.d/jobs.yaml to zuul.d/devstack-tobiko.yaml weeks ago, but zuul is13:32
fungireporting config errors for zuul.d/jobs.yaml in x/devstack-plugin-tobiko here: https://zuul.opendev.org/t/openstack/config-errors13:32
fungilike it still has the old file cached13:32
opendevreviewMatthieu Huin proposed zuul/zuul master: web UI: user login with OpenID Connect  https://review.opendev.org/c/zuul/zuul/+/73408213:32
fungirelated, another project which uses jobs from there ceased being able to run any builds after we restarted zuul yesterday, getting "Unable to freeze job graph: Job devstack-tobiko is abstract and may not be directly run" and i'm not sure yet whether that's related13:33
fungito the same problem13:33
fungisuggestions welcome from anybody really, i just mentioned corvus because yesterday he mentioned the potential for regressions in the recent config cache work13:35
fungii suspect the second problem is related though because the change which renamed that file also switched the problem job definition from abstract to runnable13:41
fungiso zuul still seeing the old file would explain both symptoms13:41
gryffungi, so I have a job which is multinode. on controller there is devstack spinned up, and as one of the results there is a kubernetes cluster spinned up. I'd like to pass somehow results of the command `kubeadm token create --print-join-command`, which gives something like `kubeadm join 10.0.2.15:6443 --token z4dnzr.ag24bdnghb3nb3o5     --discovery-token-ca-cert-hash13:44
gryfsha256:6a9b90fe4394bbff3b5d88f5aefea3a2b04744a4f4502a1d51992870bd920cb8` on the output. I'd like to pass it somehow to the other node, compute1.13:44
corvusfungi: i think i'd start at the same place -- did we get the event for the change merge?13:46
fungigryf: the two nodes should be able to communicate with one another, worst case you could run the command remotely from the other node via ssh, or put it into a file and scp it to or from the other node13:46
fungicorvus: even though the change merged in july and we've had multiple scheduler restarts since?13:47
fungiand the problem doesn't seem to have appeared until some time between 14:55 utc yesterday and 07:16 utc today13:48
gryffungi, I was thinking about it as a last resort, and was hoping about some mechanism within zuul itself. but it will work aswell. thank you, kind sir :)13:48
corvusfungi: in re your 1st 2 questions: yes.  your last question does change things; i don't know why it would suddenly appear13:49
fungiso we no longer reread configuration on scheduler restart i guess, and that's been the case since some number of weeks13:50
fungiand that's now persisted in a cache (in zk?)13:50
fungigryf: you could probably rely on ansible to save the output as a cached fact and then reference it like a variable in a later playbook, though if that data is sensitive i'm not sure facts are a safe way to pass it around13:52
corvusfungi: a small number of weeks, yes, though we've been partially writing to the cache for a larger number of weeks.13:52
corvusfungi: i can help dig into this in a few hours13:52
fungiokay, so it's conceivable we've cached that config from prior to july 21, at least13:53
gryffungi, I guess that's not the case, since this is all happening within job, so after job is done, it's already too late. I guess, I'd go with the ssh way.13:53
fungicorvus: no hurry, i was just trying to figure out what to be looking at. i'll see if we still have scehduler logs from the timedframe where that file rename merged13:53
pabelanger[m]gryf: you can use set_fact but delegate_to the other host, then the next play can access it. You may want to set cacheable to be safe14:00
pabelanger[m]if the nodes are in the same job / nodeset14:01
pabelanger[m]child jobs, you'll need to use zuul_return14:01
fungicorvus: it hadn't dawned on me that the cache might persist past scheduler restarts, but in retrospect that makes sense and explains the reconfigure on start change (804806)14:03
fungii'm going to conveniently blame the fact that people always seem to want to report these issues while i'm asleep, and so i foolishly start trying to debug them before breakfast14:15
gryfpabelanger[m], can you provide some general job example for this please?14:27
opendevreviewBenjamin Schanzel proposed zuul/nodepool master: Add Tenant-Scoped Resource Quota  https://review.opendev.org/c/zuul/nodepool/+/80076514:35
clarkbgryf: https://opendev.org/zuul/zuul-jobs/src/branch/master/roles/ensure-tox/tasks/main.yaml#L15-L19 that ensures a cacheable fact called tox-executable that later playbooks can use to run tox15:19
clarkbgryf: that is the sort of thing pabelanger[m] is describing15:19
*** jpena is now known as jpena|off15:42
opendevreviewFelix Edel proposed zuul/zuul master: WIP NodeRequest watches  https://review.opendev.org/c/zuul/zuul/+/80496115:44
*** rpittau is now known as rpittau|afk16:06
pabelanger[m]gryf: https://docs.ansible.com/ansible/latest/user_guide/playbooks_delegation.html#delegating-facts16:10
pabelanger[m]actually: https://docs.ansible.com/ansible/latest/user_guide/playbooks_delegation.html#delegating-tasks16:10
clarkbhttps://lwn.net/ml/linux-kernel/xmqq1r6touqi.fsf@gitster.g/ git 2.33 released on Monday and includes a new merge algorithm. This algorithm is expected to become the default in the next release. It is apparently much much faster under certain situations16:34
clarkbWondering if maybe zuul should consider updating its docker images to include git 2.34 whenever that happens as merge times are one of the big time costs in zuul?16:35
fungido we perform default strategy merges now? i know at one point it tried to mimic the limited merge strategy implemented by jgit because of wanting to avoid merging something gerrit couldn't16:36
clarkbfungi: I think it depends on the merge context? but maybe we do it globally16:37
clarkbya strategy is an optional argument to the merge method. Now to see if we ever call it with None set16:38
corvusthe default is 'git merge -s resolve'16:38
corvusif you waint plain 'git merge' from zuul, it's a specific setting16:38
corvuswe could add 'merge-sort'  (lol :) as another strategy though; but i don't know that it makes sense to change zuul's default unless the code review systems change16:40
corvusbut maybe github will because of their close ties to the git folks?16:40
clarkbcorvus: I think when git 2.34 happens zuul.model.MERGER_MERGE will use the new mode16:41
corvusclarkb: yes16:42
clarkbwhich is probably sufficient here16:42
clarkband ya as long as jgit is simpler then resolve is likely best there (and apparently resolve is quite fast too but doesn't succeed as often?)16:42
opendevreviewJames E. Blair proposed zuul/zuul master: Add delete-state command to delete everything from ZK  https://review.opendev.org/c/zuul/zuul/+/80430416:44
corvusclarkb, fungi: apropos the current opendev issue, ^ is a "reset button" for zuul operators16:45
fungineat!16:46
fungibig red switch16:46
gryfcorvus, pabelanger[m], I'll take a look, thanks!16:58
corvusfungi, clarkb: on a first pass through here -- i'm beginning to suspect we may not delete files from the cache in zk, which was fine right up until we started relying on the list of files in zk as canonical on startup.17:17
corvus(we do delete entire projects from the cache, but if we delete files, i'm missing it)17:17
corvusthat still doesn't explain why this worked yesterday.17:18
fungiwhich commit started relying on the file list from zk? could that have maybe merged between our last two scheduler restarts?17:19
corvusfungi: https://review.opendev.org/771463 merged on aug 617:23
corvuswe restarted the next day.  we restarted aug 7, aug 10, and aug 1717:25
fungiso we should have seen this cropping up as soon as the 7th17:25
fungipuzzling17:26
corvusmaybe they merged a config change since then which did not rely on zk for the file cache17:27
corvuss/cache/list/17:27
corvusyes, they did17:28
fungiwell, for the observed symptom, we saw zuul was able to enqueue a change in the check pipeline yesterday, which today it could no longer freeze the job graph for. all open changes they've tried for that same project are seeing that behavior today17:28
corvusso it's possible that when zuul sees a change that touches the config, it gets the list of existing files from the merger via cat jobs, but on startup, it will get the set of files from zk.17:29
corvusso every time they merge a config change they "fix" the problem, temporarily, until the next time we restart the scheduler, at which point it 'breaks" again17:29
fungioh, got it17:29
fungithat explains why they ended up doing https://review.opendev.org/804356 and then thought it solved a problem17:30
fungiall it really did was convince zuul to start using a current configuration there, until the next restart17:31
corvusyep17:31
corvusi think this is enough information for me to start a reproducer test case; i'm going to take a short break, then start working on that.  i may be only intermittently available for the next little bit because this is a heads-down kind of thing.17:31
fungithanks! i agree this seems to explain everything we saw17:31
corvusi'm not going to issue the .1 release i was planning on, because i think this is a more important fix than the other things, and that will take time.17:31
fungiit does seem like a regression worth including in whatever we tag next17:32
fungier, worth including a fix for17:32
Clark[m]I'll have to catch up as I remembered the garden needed tending before the sun showed up. Then I found a present from what I think is a raccoon...17:33
dmsimardI'm already on matrix but was surprised to see the thread on zuul-discuss about moving off of IRC entirely :o17:39
clarkbAgreed that holding off on the .1 makes sense so that this can be included in that release17:53
clarkbdmsimard: that has been the intention for a while, just a matter of getting a homeserver and bots on it sorted out17:54
fungidmsimard: https://zuul-ci.org/docs/zuul/reference/developer/specs/community-matrix.html17:55
dmsimardthanks for the info and the link18:04
corvusclarkb, fungi: i have a test reproducer19:04
corvus(which confirms the theory)19:04
fungicorvus: splendid! thanks for working on the proof19:07
*** dviroel|ruck is now known as dviroel|ruck|out19:07
fungii guess once there's a fix we're also going to want to recommend that deployments flush the state cache (so 804304 will come in handy there)?19:09
Clark[m]Does the flush delete all the secrets too? If so we should proceed carefully :)19:12
fungiit does not19:13
fungiat least the documentation added with it says it doesn't19:13
fungibut worth double-checking the code to make sure when reviewing19:13
corvusfungi, Clark: i think a 'zuul-scheduler full-reconfigure' would be sufficient to fix; but let's check back in on that when i'm done with the fix :)19:21
fungiabsolutely19:22
opendevreviewJames E. Blair proposed zuul/zuul master: Fix error with config cache  https://review.opendev.org/c/zuul/zuul/+/80507820:00
corvusclarkb, fungi, swest, tobiash: ^20:01
clarkbreviewing20:01
corvusactual fix is 7 lines, naturally20:02
fungithat many? ;P20:03
opendevreviewJames E. Blair proposed zuul/zuul master: Fix error with config cache  https://review.opendev.org/c/zuul/zuul/+/80507820:04
corvusps2 only fixes typos in commit message20:04
corvusgrr20:04
opendevreviewJames E. Blair proposed zuul/zuul master: Fix error with config cache  https://review.opendev.org/c/zuul/zuul/+/80507820:05
corvusditto20:05
tobiash[m]wow20:05
corvusto be fair, 2 of the 7 lines are only there because the linter doesn't like long lines :)20:08
corvusand 4 of them are a log line20:09
clarkbcorvus: super small thing noted inline that may be worth another ps if you can take a look20:15
clarkbI +2'd though in case you don't think it is a big deal you can probably +A the change20:15
corvusclarkb: yes we should do that; that actually matches what we do elsewhere20:16
corvuscan anyone think of a reason this should be startswith and not == ?  https://opendev.org/zuul/zuul/src/branch/master/zuul/configloader.py#L240820:22
corvusclarkb: oh, we actually don't append the / in that place either20:24
clarkbcorvus: since extra config files is distinct from extra config dirs I think line 2408 that you call out should be an ==20:25
clarkbwe should only need startswith for the dirs case20:25
corvusyeah.  i feel like that would be better.  i'm going to fix that in this change, mostly because i want the two cases to be consistent, and if that's going to fail tests, i want to know now, before we merge the main fix.20:27
opendevreviewJames E. Blair proposed zuul/zuul master: Fix error with config cache  https://review.opendev.org/c/zuul/zuul/+/80507820:27
corvusclarkb, tobiash: tell me if PS3..PS4 looks correct; i'm going to kick off a full local test run20:28
clarkbcorvus: I think my only concern with what you have is line 2429. What happens if the extra config dirs are passed with a trailing / do we strip that off?20:29
clarkbotherwise you may need to check for a trailing / first? I guess that is true for the code on line 1856 too20:29
corvusclarkb: yes we do https://opendev.org/zuul/zuul/src/branch/master/zuul/configloader.py#L169220:29
clarkbin that case +220:30
corvusit's actually significant in the config specification; 'trailing /' means directory in the yaml file, then we strip it once we store it in the _dirs variable20:30
corvusand it looks like the merger is designed to return only files if you ask for files (it's not going to return a dir that matches a requested file).  that's the only other reason i could think of using the startswith.20:33
corvustobiash: ^ if you have a second for a re-review20:33
*** timburke_ is now known as timburke21:00
*** y2kenny is now known as Guest476621:07
*** Guest4766 is now known as y2kenny21:10
y2kennyWhen dealing with git repository that uses submodule, are there any special configuration needed with Zuul?21:13
corvustobiash: i'm carrying over your +2 and approving that21:15
clarkby2kenny: your jobs needs to be smarter and the coordination between repos that zuul does gets trickier. But at a very base level no21:16
corvusy2kenny: no, but zuul doesn't know anything about them (only enough to not corrupt data).  getting them to work with zuul's speculative execution can be considerable work.21:16
corvus(and depends heavily on the environment they're used in -- what works in some regime may not work in another)21:17
y2kennyok... so the executor will just checkout the parent without any additional 'git submodule' commands and any nesting will need to be handled by the job?21:19
corvusyes21:19
y2kennyok thanks.21:19
opendevreviewMerged zuul/zuul-operator master: Document externalConfig  https://review.opendev.org/c/zuul/zuul-operator/+/80079121:58
opendevreviewMerged zuul/zuul-operator master: Fix config update detection  https://review.opendev.org/c/zuul/zuul-operator/+/80099122:03
opendevreviewMerged zuul/zuul master: Fix error with config cache  https://review.opendev.org/c/zuul/zuul/+/80507822:23
clarkbcorvus: ^ I guess that needs a restart and then the full reconfigur?22:27
corvusyep, i'll get that going over in #opendev22:32
opendevreviewJames E. Blair proposed zuul/zuul master: Fix error with config cache some more  https://review.opendev.org/c/zuul/zuul/+/80508923:42
corvusclarkb, fungi, tobiash, swest: ^ part 2.  i don't think that's as urgent, but i would like to merge and restart with that before making the .1 release23:43
fungithanks, reviewing23:43
opendevreviewIan Wienand proposed zuul/zuul master: web: Jobs: Use TreeView for job overview page  https://review.opendev.org/c/zuul/zuul/+/80495623:51
clarkbyup reviewing23:55

Generated by irclog2html.py 2.17.2 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!