Tuesday, 2018-06-12

openstackgerritTristan Cacqueray proposed openstack-infra/zuul master: web: add /{tenant}/projects and /{tenant}/project/{project} routes  https://review.openstack.org/55097901:05
*** rlandy|rover|bbl is now known as rlandy|rover02:08
*** rlandy|rover has quit IRC02:49
*** ianychoi has joined #zuul03:51
*** Wei_Liu1 has joined #zuul04:37
*** Wei_Liu has quit IRC04:37
*** Wei_Liu1 is now known as Wei_Liu04:37
*** frickler has joined #zuul04:56
*** fbo has quit IRC05:11
*** mhu has quit IRC05:11
*** mhu has joined #zuul05:11
*** fbo has joined #zuul05:11
*** myoung|off has quit IRC05:11
*** myoung has joined #zuul05:17
*** pcaruana has quit IRC05:24
tobiashcorvus: started debugging and it seems that it's just the with_items which causes the id to be none06:09
tobiashnow looking into how to fix that06:10
*** Rohaan has joined #zuul06:10
*** Wei_Liu has quit IRC06:12
*** hashar has joined #zuul06:21
*** Rohaan___ has joined #zuul06:24
*** Rohaan has quit IRC06:24
*** Rohaan___ is now known as Rohaan06:26
*** pcaruana has joined #zuul06:27
tobiashoh actually it's not the loop but the notify which breaks it06:35
*** Wei_Liu has joined #zuul06:36
*** Rohaan has quit IRC06:37
tobiashand also fails with only one node06:37
openstackgerritTobias Henkel proposed openstack-infra/zuul master: DNM: reproducer for 2002528  https://review.openstack.org/57448706:47
tobiashcorvus, clarkb: that should be a minimal reproducer ^06:47
tobiashthe cause is adding a notify to the task06:47
tobiashno idea yet why06:48
tobiashso the v2_playbook_on_task_start seems to be ok, it injects the log id07:03
tobiashhttp://paste.openstack.org/show/723262/07:03
tobiashbut it somehow doesn't reach the task07:03
*** gtema has joined #zuul07:08
*** Wei_Liu has quit IRC07:18
tobiashwhen I override the command action module the zuul_log_id is still there07:21
tobiashoh no, it's actually vanishing between the callback and calling the command action plugin (still on the ansible runner)07:24
tobiashhttp://paste.openstack.org/show/723264/07:30
tobiashhrm, the action module seems to be called twice in that task07:30
tobiashonce with the id and once without07:30
tobiashah, the second is already the handler07:30
*** pcaruana has quit IRC07:34
*** jpena|off is now known as jpena07:35
*** pcaruana has joined #zuul07:49
*** pcaruana has quit IRC07:59
*** Rohaan has joined #zuul08:04
openstackgerritTristan Cacqueray proposed openstack-infra/zuul master: web: add initial GraphQL controller  https://review.openstack.org/57462508:10
*** pcaruana has joined #zuul08:15
*** electrofelix has joined #zuul08:28
*** Wei_Liu has joined #zuul08:29
*** Wei_Liu1 has joined #zuul08:34
openstackgerritTobias Henkel proposed openstack-infra/zuul master: Fix broken command tasks in handlers  https://review.openstack.org/57464108:35
openstackgerritTobias Henkel proposed openstack-infra/zuul master: Fix command tasks with free play strategy  https://review.openstack.org/57464208:35
openstackgerritTobias Henkel proposed openstack-infra/zuul master: Add blocks to the zuul stream test  https://review.openstack.org/57464308:35
tobiashcorvus, clarkb: I have a fix now ^08:35
tobiash:)08:35
*** Wei_Liu has quit IRC08:36
*** Wei_Liu1 is now known as Wei_Liu08:36
tobiashactually two fixes as the free play strategy (does anyone even use it?) was broken as well08:36
*** Rohaan___ has joined #zuul08:49
*** Rohaan has quit IRC08:50
tobiashhrm, looks like the current zuul misbehavior breaks the test which creating of an unwritable /tmp/console-None.log in the test09:07
openstackgerritTobias Henkel proposed openstack-infra/zuul master: Fix broken command tasks in handlers  https://review.openstack.org/57464109:11
openstackgerritTobias Henkel proposed openstack-infra/zuul master: Fix command tasks with free play strategy  https://review.openstack.org/57464209:12
openstackgerritTobias Henkel proposed openstack-infra/zuul master: Add blocks to the zuul stream test  https://review.openstack.org/57464309:12
openstackgerritTobias Henkel proposed openstack-infra/zuul master: Remove failed_when when creating /tmp/console-None.log  https://review.openstack.org/57467209:12
*** Wei_Liu1 has joined #zuul09:14
*** Wei_Liu has quit IRC09:15
*** Wei_Liu1 is now known as Wei_Liu09:15
*** Rohaan___ is now known as Rohaan09:42
RohaantristanC: Hello09:42
tristanCRohaan: hey o/09:43
RohaanSorry for pinging late, actually I had to do sfactory setup all over again.. that took time :(09:44
RohaanRight now I'm at the step where on first demo-project job Zuul reports openshift-test finger://dhcppc2:7979/a576f620dc8b42c6a8edc0c90d664246 : RETRY_LIMIT in 2s09:45
RohaanAfter changing Zuul handler to DEBUG, /var/log/zuul/executor.log shows: https://pastebin.com/93TkAdmR09:46
RohaantristanC: Do you have any idea what could I've missed?09:47
tristanCRohaan: seems like ansible and zuul version mismatch, did you installed both from https://softwarefactory-project.io/draft/zuul-openshift/#orgheadline7 ?09:49
RohaanI think I installed that09:51
Rohaanbut lemme try again :)09:51
tristanCwell ansible-2.5 support is now merged in the master version, so yum update should also works09:54
RohaantristanC: These both packages seem to be installed: https://pastebin.com/ZMs5G6it09:55
tristanCRohaan: and did you applied https://review.openstack.org/#/c/570668/ ?09:57
*** Wei_Liu has quit IRC09:58
RohaantristanC: Yes, those files are modified accordingly10:00
tristanCRohaan: can you try systemctl restart rh-python35-zuul-executor, it seems like "zuul/ansible/callback/zuul_json.py", line 119" doesn't match what is on disk10:05
RohaantristanC: okay10:08
tristanCRohaan: or maybe there is something wrong with the config/playbooks/openshift/pre.yaml file, could you paste its content?10:09
RohaantristanC: https://pastebin.com/A5ngqtbi10:10
RohaantristanC: Shall I try removing and installing ansible?10:23
tristanCRohaan: the version seems correct, i'm trying to reproduce locally. Are the import_tasks files (e.g. prepare-namespace.yaml) in the config project?10:24
Rohaanthey are in /root/config/playbooks/openshift/10:25
Rohaanbuild-project.yaml  deploy-project.yaml  prepare-namespace.yaml  pre.yaml10:26
openstackgerritTobias Henkel proposed openstack-infra/zuul master: Fix broken command tasks in handlers  https://review.openstack.org/57464110:31
openstackgerritTobias Henkel proposed openstack-infra/zuul master: Fix command tasks with free play strategy  https://review.openstack.org/57464210:31
openstackgerritTobias Henkel proposed openstack-infra/zuul master: Add blocks to the zuul stream test  https://review.openstack.org/57464310:31
openstackgerritTobias Henkel proposed openstack-infra/zuul master: Remove failed_when when creating /tmp/console-None.log  https://review.openstack.org/57467210:31
*** Wei_Liu has joined #zuul10:37
Diabelkohello10:39
Diabelkois there somewhere a list of variables I can use for subject for the smtp reporter?10:40
Diabelkohttps://docs.openstack.org/infra/zuul/admin/drivers/smtp.html10:40
DiabelkoI want to put branch name overe there10:40
tristanCRohaan: I'm unable to reproduce your issue, though I suspect our environments to be different because of the manual cherry-picking of the openshift driver. let me prepare new rpm with all the fix needed, it will be easier to reproduce10:44
RohaantristanC: I didn't cherry pick your changes in this vm. I simply copied them from my host machine(where I cherry-picked)10:48
*** jpena is now known as jpena|lunch11:04
*** GonZo2000 has joined #zuul11:17
*** GonZo2000 has quit IRC11:28
*** GonZo2000 has joined #zuul11:48
*** GonZo2000 has quit IRC11:55
*** jpena|lunch is now known as jpena11:58
*** GonZo2000 has joined #zuul12:03
*** GonZo2000 has joined #zuul12:03
*** Rohaan has quit IRC12:07
*** Rohaan___ has joined #zuul12:07
*** Rohaan___ is now known as Rohaan12:08
*** rlandy has joined #zuul12:12
*** rlandy is now known as rlandy|rover12:12
tobiashclarkb: starting to analyse the double enqueue issue now12:34
*** Rohaan has quit IRC12:53
*** pcaruana has quit IRC13:00
*** Wei_Liu has quit IRC13:11
mordredtobiash: \o/ good stack for the callback issue13:14
tobiashmordred: thanks :)13:20
mordredtristanC: re: https://review.openstack.org/#/c/551989/ - did you build with ZUUL_API_URL?13:20
tobiashclarkb: I'm having trouble reproducing the double enqueue issue13:21
tobiashclarkb: do you have some more information/logs for that?13:21
gtemamordred: any comments on https://review.openstack.org/#/c/572829/?13:21
*** dkranz has joined #zuul13:22
mordredgtema: ooh, good. I was looking for that yesterday13:24
Shrewsmight want to 'check experimental' that one13:24
tristanCmordred: no, but does it mean you have to set this when deploying on a sub path?13:26
tristanCmordred: because 573494 was enough to make it work without defining zuul_api_url, the location was constructed out of the basehref13:27
tristanCi don't mind adding the define, but then our zuul-webui won't be usable without the /zuul/ sub-path13:28
mordredtristanC: hrm...13:29
mordredno, you're right - we shouldn't need that just for sub-url if you're not also serving the angular stuff directly with apache13:30
tristanCmordred: i think the const baseHref was ok with this change https://review.openstack.org/#/c/573494/1/web/zuul/zuul.service.ts13:32
mordredtristanC: well - there's two issues there ...13:32
mordredtristanC: this.baseHref is also the result of calling getBaseHrefFromPath(window.location.pathname) - but it also allows being overridden by ZUUL_API_URL13:33
mordred(and removes the / from the end)13:33
mordredso using the local const baseHref shouldn't fix anything. however, I think the t/tenants.html you added might be the thing that was allowing the logic to work13:35
*** pcaruana has joined #zuul13:46
corvusDiabelko: you should be able to use "{change.branch}" (assuming that all the items in that pipeline have branches)14:09
*** pwhalen has quit IRC14:09
*** pwhalen has joined #zuul14:14
*** pwhalen has joined #zuul14:14
corvustobiash, mordred: regarding https://storyboard.openstack.org/#!/story/2002528  --  i see the explanation for the handler task, did you find a reproducer for the second task ("Discover hosts") ?14:14
tobiashcorvus: must have overlooked the second task14:15
tobiashcorvus: do you have a link to the source of the role containing the task?14:16
corvustobiash: http://logs.openstack.org/20/564220/1/check/devstack-multinode/6d88d0c/ara-report/file/aec3daab-8a41-499d-9a1f-e880bf9df974/#line-2614:17
tobiashah found it14:17
mordredcorvus: was just looking at that14:18
tobiashcorvus: that looks more difficult to reproduce :/14:22
corvustobiash: you wrote that blocks could potentially affect this, and that has a block.  can you think of anything related to blocks that might trigger it?14:22
tobiashmaybe, currently trying that14:22
openstackgerritJames E. Blair proposed openstack-infra/zuul master: DNM: reproducer for 2002528  https://review.openstack.org/57448714:23
mordredcorvus: that looks like a good reproducer chunk14:24
corvustobiash: yes, i hit a wall yesterday with it -- i pushed up a playbook+role that was as close as i could yesterday (that was PS1 of that change, it seems that you removed that in PS2, i'm not sure why).  but it still doesn't reproduce the problem.14:25
tobiashcorvus: I was just focused on the error you reproduced with that and trimmed that down to a minimal version14:25
tobiashcorvus: I wasn't aware of a second error14:26
mordredcorvus: I can't see anything in the original source that is different from what you have in the reproducer :(14:26
corvustobiash: i'm sorry, i thought i wrote about it in the bug report14:26
tobiashcorvus: I think you did but I didn't read that close enough14:26
corvusi pushed it up so that you and mordred would have everything i'd done so far14:26
corvus(i say "i" but it was clarkb and i)14:26
tobiashjust spotted the first failure and hunted it down ;)14:27
tobiashand it took me four hours to hunt it down...14:27
corvusi'm worried that if what's there isn't sufficient, it's some strang interaction with previous roles or tasks14:28
tobiashcorvus: I'm trying to cherry pick your original reproducer onto the handler fix14:28
tobiashmaybe that can isolate the second issue14:29
corvusoh hey, it looks like my delegate_to task was called with no log id14:31
tobiashcorvus: you're right, it reproduces a second issue14:31
mordred\o/14:31
corvustobiash: it does?  if you cherry pick it onto the fix, discover hosts has no zuul_log_id?14:32
openstackgerritTobias Henkel proposed openstack-infra/zuul master: DNM: reproducer for 2002528  https://review.openstack.org/57448714:32
tobiashcorvus: that reproduces locally on top of the handler fix ^14:32
corvustobiash: it might be the delegate_to that caused it to fail14:33
tobiashcorvus: http://paste.openstack.org/show/723318/14:33
tobiashdidn't look yet what's causing this14:33
tobiashbut at least it reproduces14:33
*** CrayZee has joined #zuul14:33
corvusyeah, that's delegate_to14:33
tobiashyes14:34
corvusi put that in there because there was a delegate_to task before the "Discover hosts" task.  i didn't expect it to actually fail itself.  it was just to try to get the next block-task to fail.14:34
tobiashcorvus: http://paste.openstack.org/show/723319/14:38
tobiashI wonder if we need an additional callback there14:38
tobiashmaybe the delegated is its own virtual task14:38
mordredyah14:38
mordredtobiash: looking at the base callback class in 2.5 ... I wonder if we should maybe define v2_on_any14:39
mordredtobiash: and use that basically to make sure log id is set?14:39
corvusthat sounds promising :)14:39
tobiashI don't see that callback14:39
mordredlooking in lib/ansible/plugins/callback/__init__.py14:40
tobiashadded it for test and it at least seems to get called14:41
mordredthat's good :)14:41
tobiashI don't know yet if we can do something useful with it ;)14:43
tobiashmordred: on_any is not the solution: http://paste.openstack.org/show/723323/14:47
tobiashit's called like our working callback14:47
mordredtobiash: yuck14:47
corvusi don't understand why it won't work?14:48
tobiashnext step would be to override the command action plugin to check what's arriving there14:48
corvusoh -- you're saying that even on_any isn't being called for the delegated task?14:48
tobiashcorvus: under the assumption that we're missing a callback I would have assumed that on_any is called once more between lines 7 and 814:49
tobiashcorvus: yes14:49
corvusgot it14:49
mordredOH!14:49
openstackgerritMerged openstack-infra/zuul master: Fix broken command tasks in handlers  https://review.openstack.org/57464114:49
openstackgerritMerged openstack-infra/zuul master: Fix command tasks with free play strategy  https://review.openstack.org/57464214:49
mordredtobiash: task.action = shell14:49
mordredif task.action == 'command':14:49
tobiashmordred: where?14:50
mordredwe're getting shell for action now14:50
tobiashyes14:50
tobiashsince 2.514:50
mordredtobiash: in v2_playbook_on_task_start - we do if task.action == 'command':14:50
tobiashthat broke *all* shell tasks on our first rollout ;)14:50
mordredah. crap - am I just looking at a bad local checkout?14:50
* mordred sighs14:50
tobiashmordred: I'm pretty sure I fixed that ;)14:50
tobiashcorvus, mordred: I guess we should override the command action plugin and inspect the content of that further14:53
tobiashbut I have to run now14:53
tobiashmaybe I can assist later14:53
corvustobiash: thanks!14:53
*** mordred has quit IRC14:55
clarkbis it worth an update to https://docs.ansible.com/ansible/2.5/porting_guides/porting_guide_2.5.html upstream for on_handler_start?15:01
*** mordred has joined #zuul15:08
Diabelkocorvus: I'll try that, thanks.15:08
* mordred is back in channel - irc had a sad15:19
clarkbcatching up, we believe we have fixed the configure mirror problem, does that fix the later devstack multinode failure?15:25
clarkbI think that later devstack multinode issue is due to free strategy?15:26
clarkbcorvus: should openstack infra plan on an executor restart here in a few?15:26
corvusclarkb: i don't think free is involved15:27
corvusclarkb: so i don't think we've solved the devstack multinode case15:27
corvusclarkb: we have identified the second trigger in the helm case -- delegated tasks15:28
corvusbut we don't have a solution for that yet15:28
*** CrayZee has quit IRC15:28
clarkbah yup, we set the strategy to lienar in devstack15:29
tobiashcorvus: free should be resolvef15:33
tobiashWith 57464215:34
corvustobiash: right, but none of our bug reports include free15:35
corvusi was trying to tell clarkb that we have not identified the devstack problem15:35
tobiashcorvus: nevermind, you wrote involved and i read resolved...15:35
corvus(we have seen 3 problems, have reproducers for 2 and have fixed 1)15:36
tobiashWhat's the third problem?15:36
corvushandler, delegate, and the unknown devstack problem15:40
Diabelkocorvus: works, luv ya, thx :)15:44
corvusclarkb, tobiash: we could restart our executors with the changes so far and see what happens in the devstack multinode job.  it should still fail because of the extra check that tobiash added, but it may fail more cleanly and we may get more data15:44
corvusclarkb, tobiash: thoough it's possible though that more jobs may fail -- jobs which have two null logs with different permissions will still fail.  jobs with one null log will still succeed.  but jobs with two null logs and the same permissions will now fail.15:45
clarkbya probably better to try and understand it more if possible15:46
tobiashcorvus: with that stack any job with a null log will fail15:46
tobiashBecause that also adds a validation to the module15:47
fungitobiash: dmsimard: https://github.com/ansible/ansible/pull/41414/commits/4e31216 looks like public disclosure to me. i expect we can switch story 2002177 to public?15:47
fungicorvus: ^15:48
dmsimardfungi: your timing is interesting, I was asking in #ansible-devel about that just now15:48
tobiashSo without fixing the other problem actually more jobs might fail15:48
*** mordred has quit IRC15:48
dmsimardfungi: we need to land the zuul fix that tobiash worked on first though15:48
Shrewscorvus: so, quick design question on the static driver changes... I'm thinking it makes sense for the provider manager to be able to register static nodes when its start() method is called during config reconfiguration. Does that sound reasonable, or can you think of a different approach to consider?15:49
corvusdmsimard: do we need to land the zuul fix, or do we need to upgrade ansible?15:49
fungidmsimard: yeah, proof that it's hard to keep an embargo while submitting fixes to a public review system15:49
tobiashcorvus: I assume that we still need the fix in the callbacks15:50
corvusShrews: that sounds reasonable -- it seems in sync with our idea of managing static nodes by changing config files15:50
Shrewscorvus: *nod*. I've toyed with maybe some sort of config-change-notifcation idea, but since start() is called on such changes, seemed logical to reuse that15:51
Shrewsjust need to pass it a zk connection15:51
corvustobiash: how about we revert the callback changes, then do the no_log changes, then work on a more complete stack of callback changes and merge them when we think we have everything covered?15:51
*** mordred has joined #zuul15:51
tobiashcorvus: fine for me15:52
mordredwow. irc is very unhappy for me to day15:53
mordredcorvus: I think that's a good idea15:53
openstackgerritJames E. Blair proposed openstack-infra/zuul master: Revert callback fixes  https://review.openstack.org/57478515:53
openstackgerritJames E. Blair proposed openstack-infra/zuul master: Fix broken command tasks in handlers  https://review.openstack.org/57478615:54
openstackgerritJames E. Blair proposed openstack-infra/zuul master: Fix command tasks with free play strategy  https://review.openstack.org/57478715:54
corvustobiash: you can stack your no_log change on 57478515:54
tobiashok, back @laptop now15:54
tobiashjust a sec15:54
openstackgerritTobias Henkel proposed openstack-infra/zuul master: Fix information disclosure caused by unreachable nodes  https://review.openstack.org/57478815:56
tobiashthere it is15:57
openstackgerritJames E. Blair proposed openstack-infra/zuul master: Add blocks to the zuul stream test  https://review.openstack.org/57464315:58
openstackgerritJames E. Blair proposed openstack-infra/zuul master: Remove failed_when when creating /tmp/console-None.log  https://review.openstack.org/57467215:58
openstackgerritJames E. Blair proposed openstack-infra/zuul master: DNM: reproducer for 2002528  https://review.openstack.org/57448715:58
corvusmordred: want to +W 574785?16:00
corvusi'm looking at 78816:01
openstackgerritTobias Henkel proposed openstack-infra/zuul master: Fix information disclosure caused by unreachable nodes  https://review.openstack.org/57478816:02
tobiashfixed typo in assert ^16:02
mordredtobiash: whoops16:02
tobiashalso just noticed that a few seconds ago ;)16:03
*** aspiers[m] has quit IRC16:08
tobiashcorvus, mordred: that delegate is getting spooky: http://paste.openstack.org/show/723326/16:17
tobiashI duplicated the delegated task into the first play with the expectation that it fails16:17
tobiashbut it doesn't16:17
tobiashI've also overwritten the command action plugin to spit out some logging info16:18
corvusinteresting16:19
mordredwow16:19
mordreduhm16:19
corvusokay, i'm going to go off and try to even more faithfully reproduce the devstack-multinode issue.16:21
mordredtobiash: I wonder if they're making separate task.args objects for each host16:21
tobiashlooks like16:22
mordredtobiash: in the code you have that made that paste - what's the difference between the first and second plays?16:24
tobiashone difference is play vs role16:25
tobiashand the role does stuff with include_role before that16:26
openstackgerritMerged openstack-infra/zuul master: Switch content type of public keys back to text/plain  https://review.openstack.org/57422016:31
openstackgerritMerged openstack-infra/zuul master: Fix signature of overridden methods in LogStreamHandler  https://review.openstack.org/57420416:31
*** aspiers[m] has joined #zuul16:34
openstackgerritMerged openstack-infra/zuul master: Accumulate errors in context managers - part 2  https://review.openstack.org/57402816:47
clarkbmordred: tobiash separate task objects entirely was my guess yesterday16:49
clarkbthis seemed to be backed up by a change that no longer cached all the tasks upfront but instead cached them when running them16:49
clarkb(so the objects are now created per host,task pair I think)16:49
clarkband then the linear strategy only runs the on_task_start callback once16:50
mordredclarkb: ya - thething is we're not getting a callback event triggered for the second task object16:50
clarkbnot once per task,host16:50
mordredah16:50
clarkbthere is a flag to track if the callback was called and is set after calling it once then they check the flag and never call it again16:50
clarkbthis is in the linear strategy module code16:50
tobiashit also seems to have something to do with the include_role task before16:51
tobiashif I disable that it works16:51
clarkbone idea I had was to use task._uuid (this is terrbiel for _ reasons) instead of making our own uuids16:51
openstackgerritJames E. Blair proposed openstack-infra/zuul master: DNM: Devstack "Discover hosts" reproducer (not working) for story 2002528  https://review.openstack.org/57480816:53
mordredclarkb: yah - so - the thing is we need to get the task._uuid into the parameters passed to the remote side16:53
corvusclarkb, tobiash, mordred: ^ i started with the actual playbooks/roles for that job, then only slightly edited them to be no-ops.  still no error.16:54
mordredand we need the uuid to be the same across tasks, because we need to be able to stream the one file across multple tasks16:55
mordredcorvus: ugh16:55
corvusmordred: i thought it was a unique uuid per task?16:55
tobiashcorvus: a task can span multiple hosts16:55
corvusoh, i grok16:55
*** gtema has quit IRC16:56
corvusi guess we have a language problem with describing a single playbook-level task which spans multiple hosts (but has separate constituent host-tasks as viewed in the callback), versus separate playbook-level tasks?16:57
mordredcorvus: ++16:57
mordredthat said ... lemme go try something real quick ...16:58
*** hashar is now known as hasharDinner16:59
*** toabctl has quit IRC17:05
openstackgerritTobias Henkel proposed openstack-infra/zuul master: DNM: delegate reproducer for story 2002528  https://review.openstack.org/57482317:12
tobiashcorvus, mordred: that is a minimal working reproducer of the delegated shell problem ^17:12
tobiashit also contains some attempts to get debugging information17:13
openstackgerritTobias Henkel proposed openstack-infra/zuul master: DNM: delegate reproducer for story 2002528  https://review.openstack.org/57482317:17
openstackgerritTobias Henkel proposed openstack-infra/zuul master: DNM: delegate reproducer for story 2002528  https://review.openstack.org/57482317:23
tobiashnow really minimal17:23
tobiashcorvus, mordred: what's interesting is if I switch the strategy to 'free' the test works17:24
tobiashstrategy linear breaks17:24
*** jpena is now known as jpena|off17:24
mordredtobiash: yah - I'm thinking clarkb is right about it being something the linear strategy is doing differently with callback api calls17:25
clarkbmordred: I dropped a sha1 yseterdy for the change that look suspicious to me. let me find it17:26
tobiashcorvus, mordred: latest run with stack trace in the command action plugin: http://paste.openstack.org/show/723330/17:26
clarkbd6872a7b070d1179e7d76bcda1490bb7003c4574 b107e397cbd75d8e095b08495da2ac2aad3ef96f were the two I idenfitied17:26
*** elyezer has joined #zuul17:34
tobiashclarkb: I suspect the second one rather the first one (as blocks seem to be not needed for the reproducers)17:35
clarkbya that was the one that looked related to not having the same task object for all hosts17:36
*** gtema has joined #zuul17:36
tobiashjust tested against ansible-devel and the issue persists17:38
clarkbtobiash: looking at your traceback I think if we can log the id of the task object in linear.py line 305 or so as well as log each time the on task start is called line 300 or so then we can confirm that they are different and only called the once17:39
clarkb(or step through it with pdb but that is probably more involved)17:40
tobiashcan we override linear.py?17:40
mordredI think so - same as other things - we should be able to make our own linear strategy plugin17:40
clarkbwe can add our own strategy plugins and probably provide a replacement linear17:40
tobiashhm, probably as it's just a plugin17:40
openstackgerritMerged openstack-infra/zuul master: Revert callback fixes  https://review.openstack.org/57478517:42
mordredclarkb: ok. I think the linear strategy thing is making some amount of sense to my brain now17:44
clarkbwe could also just patch ansible directly in the tox install and yolo it17:45
clarkbmordred: thats good beacuse their get next task iterator state machine was making me want to do yardwork yesterday17:45
clarkbbut the yardwork is done now so I don't have it as an alternative task anymore17:46
mordredwell - I mean - conceptually, not all of the specifics17:46
mordredbut it's basically only calling callback task_start when it starts running the task - which then has the loop over hosts thing. for the other strategies, we get task_start on each task/host combo, because they're happening in parallel, so there isn't a 'start' that's better than calling it on each one17:47
*** GonZo2000_ has joined #zuul17:48
clarkbyup17:48
clarkbwhat I haven't been able to udnerstand is if the task object is different for each host17:48
*** GonZo2000 has quit IRC17:48
clarkbbut the behavior implies that it is, first one through works next one through doesn't17:48
mordredjimi|ansible: (bugging you because we found a patch you wrote that might be related) ... we're tracking down an issue in zuul land that we're hitting on 2.5 related to the linear strategy17:48
clarkbalso linear runs them in parallel its just that they are in parallel and wait until all complete before going to the next task17:50
mordredjimi|ansible: from what we can tell, when linear strategy is used, the callback plugin method playbook_on_task_start is only called once for a given task even if it has more than one host17:50
mordredjimi|ansible: there is some debug printing here showing that: http://paste.openstack.org/show/723330/17:50
mordredjimi|ansible: with non-linear, we get a playbook_on_task_start for every task/host combo17:51
mordredjimi|ansible: we are currently using playbok_on_task_start to add something to task.args - but in 2.5 with linear this is breaking because subsequent hosts in the task don't get that (or any other callback method that we can find) called for subsequent hosts17:53
mordredjimi|ansible: is this intended behavior (not calling the callback for each task/host on start but instead once per task)?17:55
tobiashI've hooked into the strategy and _queue_task is already called without the zuul_log_id18:02
*** gtema has quit IRC18:04
tobiashclarkb: I think I see the error: https://github.com/ansible/ansible/blob/stable-2.5/lib/ansible/plugins/strategy/linear.py#L30018:05
tobiashit's calling the callback on a copy of the task18:05
tobiashoh wait, it's only messing with the task name18:06
openstackgerritMerged openstack-infra/zuul master: Fix information disclosure caused by unreachable nodes  https://review.openstack.org/57478818:06
mordredtobiash: look in _get_next_task_lockstep18:07
mordredtobiash: it's the fact that we have a when that makes the task a no-op for some hosts18:07
mordredthose hosts get a noop task created - and I think it's the noop task on which we are setting the zuul_log_id - so it doesn't carry over18:07
clarkboh that would explain it. the tasks would normally be shared except for when it is conditional and replaced by a different task and goes first18:08
mordredyah18:08
corvusthis feels like it should be the cause of the "discover hosts" error too18:09
clarkbcorvus: yup that has a when as well18:09
corvusi found an error in my earlier repro attempt, so i'm still working on that18:10
mordredI think we could write a patch to the strategy plugin that checks to see if the task is a noop task and if so _doesn't_ fire the task_start callback (so that it won't fire the callback until it hits a real task)18:11
mordredI'm not sure if that would be acceptable upstream - but it's worth a stab18:11
corvusomg omg omg omg did it just work?  http://paste.openstack.org/show/723334/18:12
mordredcorvus: I think so!18:12
mordredcorvus: let me write a quick patch to the linear strategy plugin for ansible - then we can see aout making a local override of it with that applied and see if it fixes it?18:12
corvusw00t!  i'll commit that and push it up.  it requires gobs of locally prepared stuff still, but i can start eliminating things now18:12
corvusmordred: cool18:12
openstackgerritTobias Henkel proposed openstack-infra/zuul master: DNM: Add facility for overriding linear strategy  https://review.openstack.org/57484318:13
mordredwot18:13
mordredwoot18:13
openstackgerritJames E. Blair proposed openstack-infra/zuul master: DNM: Devstack "Discover hosts" reproducer for story 2002528  https://review.openstack.org/57480818:13
tobiashcorvus, mordred: that is a facility for overriding the linear strategy in zuul ^18:13
tobiashthat should enable you to try this out within the tox-remote18:14
tobiashmordred: according to my logs the callback is never called with a noop action18:15
tobiashmordred: and the reproducer only works with the include_role in front of the shell task18:16
*** electrofelix has quit IRC18:16
tobiashmordred: if I remove the include_role it doesn't reproduce18:16
jimi|ansiblemordred: yeah it's kind of strategy dependent due to the way linear works18:16
tobiashmordred: so I fear that your idea won't work18:16
corvustobiash, clarkb, mordred: i think the order of the hosts may not be deterministic18:16
corvusif that's so, that may be why we've had so much trouble reproducing these18:17
clarkbya that could explain it since you'd need the nooping host to go first18:17
corvusour nodeset is ordered, but we write it out to the inventory as a dict18:18
corvusi have to go grab lunch.  biab.18:19
tobiashmordred: oh we might have talked about different stuff, you were talking about issue 3 which corvus repruduced?18:20
tobiashI was talking about the delegate issue (issue 2)18:20
tobiashand there the root cause still seems to be some weird side effect maybe inside task caching?18:21
mordredtobiash: http://paste.openstack.org/show/723335 <-- how does that look?18:21
*** wissal has joined #zuul18:21
mordredtobiash: ah yes - I was talking about the other issue18:22
mordredjimi|ansible: nod. we're learning many fun things :)18:22
tobiashmordred: that looks good to me as ansible noob ;)18:23
*** wissal has quit IRC18:24
tobiashmordred: just curious, would we need that zuul_log_id if we switch to the log streaming you prototyped a few months ago?18:36
openstackgerritMerged openstack-infra/zuul master: Fix tox-cover  https://review.openstack.org/57408018:45
mordredtobiash: we would not18:56
*** hasharDinner is now known as hashar19:21
corvusi've confirmed that with my current repro, compute1,controller fails and controller,compute1 does not19:34
corvusi think we can just remove the when: from that task19:35
clarkbthe when is what produces the noop task in linear.py though right?19:36
corvushrm.  removing the 'when' doesn't cause the test to succeed.19:38
corvusmaybe we need to test two things :)19:48
pabelangerIs zuul_return setup to work across job dependencies? Looking to pass artifact information between 2 jobs19:48
corvuspabelanger: yes19:49
corvuspabelanger: https://zuul-ci.org/docs/zuul/user/jobs.html#parent-job-results19:49
pabelangerthanks19:51
corvusthe copy-build-sshkey role seems to be required19:53
pabelangercorvus: is zuul.log_url a special case? I'm having some issues seeing where that data would be passed to the child job19:55
pabelangerah19:56
pabelangerI see19:56
pabelangerAny values other than those in the zuul hierarchy will be supplied as Ansible variables to child jobs.19:56
pabelangerthanks19:56
corvusyep. so zuul.* is the special case19:56
openstackgerritJames E. Blair proposed openstack-infra/zuul master: DNM: Devstack "Discover hosts" reproducer for story 2002528  https://review.openstack.org/57480820:03
corvusclarkb, tobiash, mordred: ^ that's a minimal test for the discover hosts problem:  a shell following an include_role with a register of the user module20:03
*** elyezer has quit IRC20:04
corvusactually, it doesn't require the user module or registering20:06
corvusit appears to be a shell following an include_role (which can do anything)20:06
openstackgerritJames E. Blair proposed openstack-infra/zuul master: DNM: Devstack "Discover hosts" reproducer for story 2002528  https://review.openstack.org/57480820:07
*** aspiers[m] has quit IRC20:50
*** hashar has quit IRC20:55
*** dkranz has quit IRC20:58
corvusi'm going to try to get all of the DNM reproducers into shape20:59
openstackgerritJames E. Blair proposed openstack-infra/zuul master: DNM: reproducer for 2002528  https://review.openstack.org/57448721:03
*** pcaruana has quit IRC21:04
corvusclarkb, tobiash, mordred: i think the delegated issue was actually the include_role issue.  if i remove the include_role from 574823 it works21:05
clarkbcorvus: was the helm case using include_role?21:06
clarkbmaybe that was a level aboev the file I was looking at21:06
corvusclarkb: yep21:07
*** aspiers[m] has joined #zuul21:08
corvusclarkb: it was a few tasks back21:08
corvushttp://logs.openstack.org/55/574055/3/check/openstack-helm-infra-ubuntu/9e9ac15/ara-report/file/e595fcd2-dd71-4195-ae7e-9fc1ba885708/#line-4121:08
corvusline 23 has an include_role21:08
openstackgerritMichael Johnson proposed openstack-infra/zuul-jobs master: Collect the coverage report for npm test jobs  https://review.openstack.org/57026021:27
openstackgerritDavid Shrewsbury proposed openstack-infra/nodepool master: Pass zk connection to ProviderManager.start()  https://review.openstack.org/57489521:38
openstackgerritDavid Shrewsbury proposed openstack-infra/nodepool master: Make a proper driver notification API  https://review.openstack.org/57489621:38
openstackgerritDavid Shrewsbury proposed openstack-infra/nodepool master: Pre-register static nodes  https://review.openstack.org/57489721:38
Shrewscorvus: i think that ^ almost covers it. I want to handle the case mentioned in 574897 commit message. I'll do that tomorrow.21:40
*** dtruong has quit IRC21:41
* Shrews EODs21:44
*** dtruong has joined #zuul21:47
openstackgerritMerged openstack-infra/nodepool master: webapp: fix browser return  https://review.openstack.org/57305321:55
corvusclarkb: i haven't been able to get a simple case of skipping the first host-task and running the second host-task to fail22:03
corvusclarkb: the earlier tests which sometimes failed based on order were after the include_role22:03
clarkbah so maybe it is only include_role at play?22:04
*** myoung is now known as myoung|bbl22:07
corvusclarkb: yeah... if i do include_role followed by a task that runs on the second host, it fails.  include_role followed by a task running on the first host succeeds.  include_role followed by a task that runs on both hosts fails on the second host only.22:20
corvusso i think that explains all the observed behavior22:20
clarkbthat is really weird22:21
corvuswell, explains is not the right word.22:21
clarkbcovers the behavior22:21
corvusi think we know how to make test cases that cover all the observed behavior, yeah :)22:22
openstackgerritJames E. Blair proposed openstack-infra/zuul master: Add test for shell after include_role  https://review.openstack.org/57448722:43
openstackgerritJames E. Blair proposed openstack-infra/zuul master: Remove failed_when when creating /tmp/console-None.log  https://review.openstack.org/57467222:43
corvusclarkb, mordred, tobiash: okay, i think if we insert the necessary fixes before 574487 we should be able to see the results.  i think 574487 has all the necessary test cases now.22:44
clarkbcool, before we EOD do we need to update ansible (reason for earlier revert aiui) then we can add those changes back in?22:45
corvusclarkb: yeah, we should be able to upgrade openstack-infra's install to patch the security bug.  i've put a -2 on the bottom of the missing zuul_log_id stack though to hold it until we have all the fixes staged.  i don't think we should merge any of those until it's all ready.22:47
corvuslet's switch to -infra to talk about the upgrade22:48
clarkbonce 2.5 logging is sorted we should probably make a release22:58
corvusya23:05
*** aspiers[m] has quit IRC23:16
clarkbis the include_role case the only one we don't have a fix for?23:19
*** jimi|ansible has quit IRC23:24
corvusclarkb: i think so -- i think everything else factored out down to that23:30
corvusi'm going to eod shortly, hopefully we can pick this up again tomorrow and figure that out :)23:31
corvusi updated the story with a very brief current summary23:33
clarkbya anytime I try to read ansible code at about 4:30pm my brain finds a whole list of things to do around the house :)23:33
clarkbthis stuff is relatively complicated and if you don't liev in it frequently its not easy to start23:33
clarkbmy skimming of how include_role works is that it is a type of task and the associated action is "include_role"23:34
clarkbbut I don't yet see what it would have a dominoe effect on subsequent tasks23:34
*** aspiers[m] has joined #zuul23:34
*** threestrands has joined #zuul23:36
corvusall things considered, i'm happy with the progress we made today :)23:38

Generated by irclog2html.py 2.15.3 by Marius Gedminas - find it at mg.pov.lt!