openstackgerrit | Tristan Cacqueray proposed openstack-infra/zuul master: web: add /{tenant}/projects and /{tenant}/project/{project} routes https://review.openstack.org/550979 | 01:05 |
---|---|---|
*** rlandy|rover|bbl is now known as rlandy|rover | 02:08 | |
*** rlandy|rover has quit IRC | 02:49 | |
*** ianychoi has joined #zuul | 03:51 | |
*** Wei_Liu1 has joined #zuul | 04:37 | |
*** Wei_Liu has quit IRC | 04:37 | |
*** Wei_Liu1 is now known as Wei_Liu | 04:37 | |
*** frickler has joined #zuul | 04:56 | |
*** fbo has quit IRC | 05:11 | |
*** mhu has quit IRC | 05:11 | |
*** mhu has joined #zuul | 05:11 | |
*** fbo has joined #zuul | 05:11 | |
*** myoung|off has quit IRC | 05:11 | |
*** myoung has joined #zuul | 05:17 | |
*** pcaruana has quit IRC | 05:24 | |
tobiash | corvus: started debugging and it seems that it's just the with_items which causes the id to be none | 06:09 |
tobiash | now looking into how to fix that | 06:10 |
*** Rohaan has joined #zuul | 06:10 | |
*** Wei_Liu has quit IRC | 06:12 | |
*** hashar has joined #zuul | 06:21 | |
*** Rohaan___ has joined #zuul | 06:24 | |
*** Rohaan has quit IRC | 06:24 | |
*** Rohaan___ is now known as Rohaan | 06:26 | |
*** pcaruana has joined #zuul | 06:27 | |
tobiash | oh actually it's not the loop but the notify which breaks it | 06:35 |
*** Wei_Liu has joined #zuul | 06:36 | |
*** Rohaan has quit IRC | 06:37 | |
tobiash | and also fails with only one node | 06:37 |
openstackgerrit | Tobias Henkel proposed openstack-infra/zuul master: DNM: reproducer for 2002528 https://review.openstack.org/574487 | 06:47 |
tobiash | corvus, clarkb: that should be a minimal reproducer ^ | 06:47 |
tobiash | the cause is adding a notify to the task | 06:47 |
tobiash | no idea yet why | 06:48 |
tobiash | so the v2_playbook_on_task_start seems to be ok, it injects the log id | 07:03 |
tobiash | http://paste.openstack.org/show/723262/ | 07:03 |
tobiash | but it somehow doesn't reach the task | 07:03 |
*** gtema has joined #zuul | 07:08 | |
*** Wei_Liu has quit IRC | 07:18 | |
tobiash | when I override the command action module the zuul_log_id is still there | 07:21 |
tobiash | oh no, it's actually vanishing between the callback and calling the command action plugin (still on the ansible runner) | 07:24 |
tobiash | http://paste.openstack.org/show/723264/ | 07:30 |
tobiash | hrm, the action module seems to be called twice in that task | 07:30 |
tobiash | once with the id and once without | 07:30 |
tobiash | ah, the second is already the handler | 07:30 |
*** pcaruana has quit IRC | 07:34 | |
*** jpena|off is now known as jpena | 07:35 | |
*** pcaruana has joined #zuul | 07:49 | |
*** pcaruana has quit IRC | 07:59 | |
*** Rohaan has joined #zuul | 08:04 | |
openstackgerrit | Tristan Cacqueray proposed openstack-infra/zuul master: web: add initial GraphQL controller https://review.openstack.org/574625 | 08:10 |
*** pcaruana has joined #zuul | 08:15 | |
*** electrofelix has joined #zuul | 08:28 | |
*** Wei_Liu has joined #zuul | 08:29 | |
*** Wei_Liu1 has joined #zuul | 08:34 | |
openstackgerrit | Tobias Henkel proposed openstack-infra/zuul master: Fix broken command tasks in handlers https://review.openstack.org/574641 | 08:35 |
openstackgerrit | Tobias Henkel proposed openstack-infra/zuul master: Fix command tasks with free play strategy https://review.openstack.org/574642 | 08:35 |
openstackgerrit | Tobias Henkel proposed openstack-infra/zuul master: Add blocks to the zuul stream test https://review.openstack.org/574643 | 08:35 |
tobiash | corvus, clarkb: I have a fix now ^ | 08:35 |
tobiash | :) | 08:35 |
*** Wei_Liu has quit IRC | 08:36 | |
*** Wei_Liu1 is now known as Wei_Liu | 08:36 | |
tobiash | actually two fixes as the free play strategy (does anyone even use it?) was broken as well | 08:36 |
*** Rohaan___ has joined #zuul | 08:49 | |
*** Rohaan has quit IRC | 08:50 | |
tobiash | hrm, looks like the current zuul misbehavior breaks the test, which creates an unwritable /tmp/console-None.log | 09:07 |
openstackgerrit | Tobias Henkel proposed openstack-infra/zuul master: Fix broken command tasks in handlers https://review.openstack.org/574641 | 09:11 |
openstackgerrit | Tobias Henkel proposed openstack-infra/zuul master: Fix command tasks with free play strategy https://review.openstack.org/574642 | 09:12 |
openstackgerrit | Tobias Henkel proposed openstack-infra/zuul master: Add blocks to the zuul stream test https://review.openstack.org/574643 | 09:12 |
openstackgerrit | Tobias Henkel proposed openstack-infra/zuul master: Remove failed_when when creating /tmp/console-None.log https://review.openstack.org/574672 | 09:12 |
*** Wei_Liu1 has joined #zuul | 09:14 | |
*** Wei_Liu has quit IRC | 09:15 | |
*** Wei_Liu1 is now known as Wei_Liu | 09:15 | |
*** Rohaan___ is now known as Rohaan | 09:42 | |
Rohaan | tristanC: Hello | 09:42 |
tristanC | Rohaan: hey o/ | 09:43 |
Rohaan | Sorry for pinging late, actually I had to do sfactory setup all over again.. that took time :( | 09:44 |
Rohaan | Right now I'm at the step where on first demo-project job Zuul reports openshift-test finger://dhcppc2:7979/a576f620dc8b42c6a8edc0c90d664246 : RETRY_LIMIT in 2s | 09:45 |
Rohaan | After changing Zuul handler to DEBUG, /var/log/zuul/executor.log shows: https://pastebin.com/93TkAdmR | 09:46 |
Rohaan | tristanC: Do you have any idea what I could have missed? | 09:47 |
tristanC | Rohaan: seems like an ansible and zuul version mismatch, did you install both from https://softwarefactory-project.io/draft/zuul-openshift/#orgheadline7 ? | 09:49 |
Rohaan | I think I installed that | 09:51 |
Rohaan | but lemme try again :) | 09:51 |
tristanC | well ansible-2.5 support is now merged in the master version, so yum update should also work | 09:54 |
Rohaan | tristanC: Both of these packages seem to be installed: https://pastebin.com/ZMs5G6it | 09:55 |
tristanC | Rohaan: and did you apply https://review.openstack.org/#/c/570668/ ? | 09:57 |
*** Wei_Liu has quit IRC | 09:58 | |
Rohaan | tristanC: Yes, those files are modified accordingly | 10:00 |
tristanC | Rohaan: can you try systemctl restart rh-python35-zuul-executor, it seems like "zuul/ansible/callback/zuul_json.py", line 119" doesn't match what is on disk | 10:05 |
Rohaan | tristanC: okay | 10:08 |
tristanC | Rohaan: or maybe there is something wrong with the config/playbooks/openshift/pre.yaml file, could you paste its content? | 10:09 |
Rohaan | tristanC: https://pastebin.com/A5ngqtbi | 10:10 |
Rohaan | tristanC: Shall I try removing and installing ansible? | 10:23 |
tristanC | Rohaan: the version seems correct, i'm trying to reproduce locally. Are the import_tasks files (e.g. prepare-namespace.yaml) in the config project? | 10:24 |
Rohaan | they are in /root/config/playbooks/openshift/ | 10:25 |
Rohaan | build-project.yaml deploy-project.yaml prepare-namespace.yaml pre.yaml | 10:26 |
openstackgerrit | Tobias Henkel proposed openstack-infra/zuul master: Fix broken command tasks in handlers https://review.openstack.org/574641 | 10:31 |
openstackgerrit | Tobias Henkel proposed openstack-infra/zuul master: Fix command tasks with free play strategy https://review.openstack.org/574642 | 10:31 |
openstackgerrit | Tobias Henkel proposed openstack-infra/zuul master: Add blocks to the zuul stream test https://review.openstack.org/574643 | 10:31 |
openstackgerrit | Tobias Henkel proposed openstack-infra/zuul master: Remove failed_when when creating /tmp/console-None.log https://review.openstack.org/574672 | 10:31 |
*** Wei_Liu has joined #zuul | 10:37 | |
Diabelko | hello | 10:39 |
Diabelko | is there somewhere a list of variables I can use for subject for the smtp reporter? | 10:40 |
Diabelko | https://docs.openstack.org/infra/zuul/admin/drivers/smtp.html | 10:40 |
Diabelko | I want to put the branch name over there | 10:40 |
tristanC | Rohaan: I'm unable to reproduce your issue, though I suspect our environments are different because of the manual cherry-picking of the openshift driver. let me prepare a new rpm with all the fixes needed, it will be easier to reproduce | 10:44 |
Rohaan | tristanC: I didn't cherry pick your changes in this vm. I simply copied them from my host machine(where I cherry-picked) | 10:48 |
*** jpena is now known as jpena|lunch | 11:04 | |
*** GonZo2000 has joined #zuul | 11:17 | |
*** GonZo2000 has quit IRC | 11:28 | |
*** GonZo2000 has joined #zuul | 11:48 | |
*** GonZo2000 has quit IRC | 11:55 | |
*** jpena|lunch is now known as jpena | 11:58 | |
*** GonZo2000 has joined #zuul | 12:03 | |
*** GonZo2000 has joined #zuul | 12:03 | |
*** Rohaan has quit IRC | 12:07 | |
*** Rohaan___ has joined #zuul | 12:07 | |
*** Rohaan___ is now known as Rohaan | 12:08 | |
*** rlandy has joined #zuul | 12:12 | |
*** rlandy is now known as rlandy|rover | 12:12 | |
tobiash | clarkb: starting to analyse the double enqueue issue now | 12:34 |
*** Rohaan has quit IRC | 12:53 | |
*** pcaruana has quit IRC | 13:00 | |
*** Wei_Liu has quit IRC | 13:11 | |
mordred | tobiash: \o/ good stack for the callback issue | 13:14 |
tobiash | mordred: thanks :) | 13:20 |
mordred | tristanC: re: https://review.openstack.org/#/c/551989/ - did you build with ZUUL_API_URL? | 13:20 |
tobiash | clarkb: I'm having trouble reproducing the double enqueue issue | 13:21 |
tobiash | clarkb: do you have some more information/logs for that? | 13:21 |
gtema | mordred: any comments on https://review.openstack.org/#/c/572829/? | 13:21 |
*** dkranz has joined #zuul | 13:22 | |
mordred | gtema: ooh, good. I was looking for that yesterday | 13:24 |
Shrews | might want to 'check experimental' that one | 13:24 |
tristanC | mordred: no, but does it mean you have to set this when deploying on a sub path? | 13:26 |
tristanC | mordred: because 573494 was enough to make it work without defining zuul_api_url, the location was constructed out of the basehref | 13:27 |
tristanC | i don't mind adding the define, but then our zuul-webui won't be usable without the /zuul/ sub-path | 13:28 |
mordred | tristanC: hrm... | 13:29 |
mordred | no, you're right - we shouldn't need that just for sub-url if you're not also serving the angular stuff directly with apache | 13:30 |
tristanC | mordred: i think the const baseHref was ok with this change https://review.openstack.org/#/c/573494/1/web/zuul/zuul.service.ts | 13:32 |
mordred | tristanC: well - there's two issues there ... | 13:32 |
mordred | tristanC: this.baseHref is also the result of calling getBaseHrefFromPath(window.location.pathname) - but it also allows being overridden by ZUUL_API_URL | 13:33 |
mordred | (and removes the / from the end) | 13:33 |
mordred | so using the local const baseHref shouldn't fix anything. however, I think the t/tenants.html you added might be the thing that was allowing the logic to work | 13:35 |
*** pcaruana has joined #zuul | 13:46 | |
corvus | Diabelko: you should be able to use "{change.branch}" (assuming that all the items in that pipeline have branches) | 14:09 |
*** pwhalen has quit IRC | 14:09 | |
*** pwhalen has joined #zuul | 14:14 | |
*** pwhalen has joined #zuul | 14:14 | |
corvus | tobiash, mordred: regarding https://storyboard.openstack.org/#!/story/2002528 -- i see the explanation for the handler task, did you find a reproducer for the second task ("Discover hosts") ? | 14:14 |
tobiash | corvus: must have overlooked the second task | 14:15 |
tobiash | corvus: do you have a link to the source of the role containing the task? | 14:16 |
corvus | tobiash: http://logs.openstack.org/20/564220/1/check/devstack-multinode/6d88d0c/ara-report/file/aec3daab-8a41-499d-9a1f-e880bf9df974/#line-26 | 14:17 |
tobiash | ah found it | 14:17 |
mordred | corvus: was just looking at that | 14:18 |
tobiash | corvus: that looks more difficult to reproduce :/ | 14:22 |
corvus | tobiash: you wrote that blocks could potentially affect this, and that has a block. can you think of anything related to blocks that might trigger it? | 14:22 |
tobiash | maybe, currently trying that | 14:22 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul master: DNM: reproducer for 2002528 https://review.openstack.org/574487 | 14:23 |
mordred | corvus: that looks like a good reproducer chunk | 14:24 |
corvus | tobiash: yes, i hit a wall yesterday with it -- i pushed up a playbook+role that was as close as i could yesterday (that was PS1 of that change, it seems that you removed that in PS2, i'm not sure why). but it still doesn't reproduce the problem. | 14:25 |
tobiash | corvus: I was just focused on the error you reproduced with that and trimmed that down to a minimal version | 14:25 |
tobiash | corvus: I wasn't aware of a second error | 14:26 |
mordred | corvus: I can't see anything in the original source that is different from what you have in the reproducer :( | 14:26 |
corvus | tobiash: i'm sorry, i thought i wrote about it in the bug report | 14:26 |
tobiash | corvus: I think you did but I didn't read that close enough | 14:26 |
corvus | i pushed it up so that you and mordred would have everything i'd done so far | 14:26 |
corvus | (i say "i" but it was clarkb and i) | 14:26 |
tobiash | just spotted the first failure and hunted it down ;) | 14:27 |
tobiash | and it took me four hours to hunt it down... | 14:27 |
corvus | i'm worried that if what's there isn't sufficient, it's some strange interaction with previous roles or tasks | 14:28 |
tobiash | corvus: I'm trying to cherry pick your original reproducer onto the handler fix | 14:28 |
tobiash | maybe that can isolate the second issue | 14:29 |
corvus | oh hey, it looks like my delegate_to task was called with no log id | 14:31 |
tobiash | corvus: you're right, it reproduces a second issue | 14:31 |
mordred | \o/ | 14:31 |
corvus | tobiash: it does? if you cherry pick it onto the fix, discover hosts has no zuul_log_id? | 14:32 |
openstackgerrit | Tobias Henkel proposed openstack-infra/zuul master: DNM: reproducer for 2002528 https://review.openstack.org/574487 | 14:32 |
tobiash | corvus: that reproduces locally on top of the handler fix ^ | 14:32 |
corvus | tobiash: it might be the delegate_to that caused it to fail | 14:33 |
tobiash | corvus: http://paste.openstack.org/show/723318/ | 14:33 |
tobiash | didn't look yet what's causing this | 14:33 |
tobiash | but at least it reproduces | 14:33 |
*** CrayZee has joined #zuul | 14:33 | |
corvus | yeah, that's delegate_to | 14:33 |
tobiash | yes | 14:34 |
corvus | i put that in there because there was a delegate_to task before the "Discover hosts" task. i didn't expect it to actually fail itself. it was just to try to get the next block-task to fail. | 14:34 |
tobiash | corvus: http://paste.openstack.org/show/723319/ | 14:38 |
tobiash | I wonder if we need an additional callback there | 14:38 |
tobiash | maybe the delegated is its own virtual task | 14:38 |
mordred | yah | 14:38 |
mordred | tobiash: looking at the base callback class in 2.5 ... I wonder if we should maybe define v2_on_any | 14:39 |
mordred | tobiash: and use that basically to make sure log id is set? | 14:39 |
corvus | that sounds promising :) | 14:39 |
tobiash | I don't see that callback | 14:39 |
mordred | looking in lib/ansible/plugins/callback/__init__.py | 14:40 |
tobiash | added it for test and it at least seems to get called | 14:41 |
mordred | that's good :) | 14:41 |
tobiash | I don't know yet if we can do something useful with it ;) | 14:43 |
tobiash | mordred: on_any is not the solution: http://paste.openstack.org/show/723323/ | 14:47 |
tobiash | it's called like our working callback | 14:47 |
mordred | tobiash: yuck | 14:47 |
corvus | i don't understand why it won't work? | 14:48 |
tobiash | next step would be to override the command action plugin to check what's arriving there | 14:48 |
corvus | oh -- you're saying that even on_any isn't being called for the delegated task? | 14:48 |
tobiash | corvus: under the assumption that we're missing a callback I would have assumed that on_any is called once more between lines 7 and 8 | 14:49 |
tobiash | corvus: yes | 14:49 |
corvus | got it | 14:49 |
mordred | OH! | 14:49 |
openstackgerrit | Merged openstack-infra/zuul master: Fix broken command tasks in handlers https://review.openstack.org/574641 | 14:49 |
openstackgerrit | Merged openstack-infra/zuul master: Fix command tasks with free play strategy https://review.openstack.org/574642 | 14:49 |
mordred | tobiash: task.action = shell | 14:49 |
mordred | if task.action == 'command': | 14:49 |
tobiash | mordred: where? | 14:50 |
mordred | we're getting shell for action now | 14:50 |
tobiash | yes | 14:50 |
tobiash | since 2.5 | 14:50 |
mordred | tobiash: in v2_playbook_on_task_start - we do if task.action == 'command': | 14:50 |
tobiash | that broke *all* shell tasks on our first rollout ;) | 14:50 |
mordred | ah. crap - am I just looking at a bad local checkout? | 14:50 |
* mordred sighs | 14:50 | |
tobiash | mordred: I'm pretty sure I fixed that ;) | 14:50 |
tobiash | corvus, mordred: I guess we should override the command action plugin and inspect the content of that further | 14:53 |
tobiash | but I have to run now | 14:53 |
tobiash | maybe I can assist later | 14:53 |
corvus | tobiash: thanks! | 14:53 |
*** mordred has quit IRC | 14:55 | |
clarkb | is it worth an update to https://docs.ansible.com/ansible/2.5/porting_guides/porting_guide_2.5.html upstream for on_handler_start? | 15:01 |
*** mordred has joined #zuul | 15:08 | |
Diabelko | corvus: I'll try that, thanks. | 15:08 |
* mordred is back in channel - irc had a sad | 15:19 | |
clarkb | catching up, we believe we have fixed the configure mirror problem, does that fix the later devstack multinode failure? | 15:25 |
clarkb | I think that later devstack multinode issue is due to free strategy? | 15:26 |
clarkb | corvus: should openstack infra plan on an executor restart here in a few? | 15:26 |
corvus | clarkb: i don't think free is involved | 15:27 |
corvus | clarkb: so i don't think we've solved the devstack multinode case | 15:27 |
corvus | clarkb: we have identified the second trigger in the helm case -- delegated tasks | 15:28 |
corvus | but we don't have a solution for that yet | 15:28 |
*** CrayZee has quit IRC | 15:28 | |
clarkb | ah yup, we set the strategy to linear in devstack | 15:29 |
tobiash | corvus: free should be resolved | 15:33 |
tobiash | With 574642 | 15:34 |
corvus | tobiash: right, but none of our bug reports include free | 15:35 |
corvus | i was trying to tell clarkb that we have not identified the devstack problem | 15:35 |
tobiash | corvus: nevermind, you wrote involved and i read resolved... | 15:35 |
corvus | (we have seen 3 problems, have reproducers for 2 and have fixed 1) | 15:36 |
tobiash | What's the third problem? | 15:36 |
corvus | handler, delegate, and the unknown devstack problem | 15:40 |
Diabelko | corvus: works, luv ya, thx :) | 15:44 |
corvus | clarkb, tobiash: we could restart our executors with the changes so far and see what happens in the devstack multinode job. it should still fail because of the extra check that tobiash added, but it may fail more cleanly and we may get more data | 15:44 |
corvus | clarkb, tobiash: though it's possible that more jobs may fail -- jobs which have two null logs with different permissions will still fail. jobs with one null log will still succeed. but jobs with two null logs and the same permissions will now fail. | 15:45 |
clarkb | ya probably better to try and understand it more if possible | 15:46 |
tobiash | corvus: with that stack any job with a null log will fail | 15:46 |
tobiash | Because that also adds a validation to the module | 15:47 |
fungi | tobiash: dmsimard: https://github.com/ansible/ansible/pull/41414/commits/4e31216 looks like public disclosure to me. i expect we can switch story 2002177 to public? | 15:47 |
fungi | corvus: ^ | 15:48 |
dmsimard | fungi: your timing is interesting, I was asking in #ansible-devel about that just now | 15:48 |
tobiash | So without fixing the other problem actually more jobs might fail | 15:48 |
*** mordred has quit IRC | 15:48 | |
dmsimard | fungi: we need to land the zuul fix that tobiash worked on first though | 15:48 |
Shrews | corvus: so, quick design question on the static driver changes... I'm thinking it makes sense for the provider manager to be able to register static nodes when its start() method is called during config reconfiguration. Does that sound reasonable, or can you think of a different approach to consider? | 15:49 |
corvus | dmsimard: do we need to land the zuul fix, or do we need to upgrade ansible? | 15:49 |
fungi | dmsimard: yeah, proof that it's hard to keep an embargo while submitting fixes to a public review system | 15:49 |
tobiash | corvus: I assume that we still need the fix in the callbacks | 15:50 |
corvus | Shrews: that sounds reasonable -- it seems in sync with our idea of managing static nodes by changing config files | 15:50 |
Shrews | corvus: *nod*. I've toyed with maybe some sort of config-change-notification idea, but since start() is called on such changes, seemed logical to reuse that | 15:51 |
Shrews | just need to pass it a zk connection | 15:51 |
corvus | tobiash: how about we revert the callback changes, then do the no_log changes, then work on a more complete stack of callback changes and merge them when we think we have everything covered? | 15:51 |
*** mordred has joined #zuul | 15:51 | |
tobiash | corvus: fine for me | 15:52 |
mordred | wow. irc is very unhappy for me today | 15:53 |
mordred | corvus: I think that's a good idea | 15:53 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul master: Revert callback fixes https://review.openstack.org/574785 | 15:53 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul master: Fix broken command tasks in handlers https://review.openstack.org/574786 | 15:54 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul master: Fix command tasks with free play strategy https://review.openstack.org/574787 | 15:54 |
corvus | tobiash: you can stack your no_log change on 574785 | 15:54 |
tobiash | ok, back @laptop now | 15:54 |
tobiash | just a sec | 15:54 |
openstackgerrit | Tobias Henkel proposed openstack-infra/zuul master: Fix information disclosure caused by unreachable nodes https://review.openstack.org/574788 | 15:56 |
tobiash | there it is | 15:57 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul master: Add blocks to the zuul stream test https://review.openstack.org/574643 | 15:58 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul master: Remove failed_when when creating /tmp/console-None.log https://review.openstack.org/574672 | 15:58 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul master: DNM: reproducer for 2002528 https://review.openstack.org/574487 | 15:58 |
corvus | mordred: want to +W 574785? | 16:00 |
corvus | i'm looking at 788 | 16:01 |
openstackgerrit | Tobias Henkel proposed openstack-infra/zuul master: Fix information disclosure caused by unreachable nodes https://review.openstack.org/574788 | 16:02 |
tobiash | fixed typo in assert ^ | 16:02 |
mordred | tobiash: whoops | 16:02 |
tobiash | also just noticed that a few seconds ago ;) | 16:03 |
*** aspiers[m] has quit IRC | 16:08 | |
tobiash | corvus, mordred: that delegate is getting spooky: http://paste.openstack.org/show/723326/ | 16:17 |
tobiash | I duplicated the delegated task into the first play with the expectation that it fails | 16:17 |
tobiash | but it doesn't | 16:17 |
tobiash | I've also overwritten the command action plugin to spit out some logging info | 16:18 |
corvus | interesting | 16:19 |
mordred | wow | 16:19 |
mordred | uhm | 16:19 |
corvus | okay, i'm going to go off and try to even more faithfully reproduce the devstack-multinode issue. | 16:21 |
mordred | tobiash: I wonder if they're making separate task.args objects for each host | 16:21 |
tobiash | looks like | 16:22 |
mordred | tobiash: in the code you have that made that paste - what's the difference between the first and second plays? | 16:24 |
tobiash | one difference is play vs role | 16:25 |
tobiash | and the role does stuff with include_role before that | 16:26 |
openstackgerrit | Merged openstack-infra/zuul master: Switch content type of public keys back to text/plain https://review.openstack.org/574220 | 16:31 |
openstackgerrit | Merged openstack-infra/zuul master: Fix signature of overridden methods in LogStreamHandler https://review.openstack.org/574204 | 16:31 |
*** aspiers[m] has joined #zuul | 16:34 | |
openstackgerrit | Merged openstack-infra/zuul master: Accumulate errors in context managers - part 2 https://review.openstack.org/574028 | 16:47 |
clarkb | mordred: tobiash separate task objects entirely was my guess yesterday | 16:49 |
clarkb | this seemed to be backed up by a change that no longer cached all the tasks upfront but instead cached them when running them | 16:49 |
clarkb | (so the objects are now created per host,task pair I think) | 16:49 |
clarkb | and then the linear strategy only runs the on_task_start callback once | 16:50 |
mordred | clarkb: ya - the thing is we're not getting a callback event triggered for the second task object | 16:50 |
clarkb | not once per task,host | 16:50 |
mordred | ah | 16:50 |
clarkb | there is a flag to track if the callback was called and is set after calling it once then they check the flag and never call it again | 16:50 |
clarkb | this is in the linear strategy module code | 16:50 |
tobiash | it also seems to have something to do with the include_role task before | 16:51 |
tobiash | if I disable that it works | 16:51 |
clarkb | one idea I had was to use task._uuid (this is terrible for _ reasons) instead of making our own uuids | 16:51 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul master: DNM: Devstack "Discover hosts" reproducer (not working) for story 2002528 https://review.openstack.org/574808 | 16:53 |
mordred | clarkb: yah - so - the thing is we need to get the task._uuid into the parameters passed to the remote side | 16:53 |
corvus | clarkb, tobiash, mordred: ^ i started with the actual playbooks/roles for that job, then only slightly edited them to be no-ops. still no error. | 16:54 |
mordred | and we need the uuid to be the same across tasks, because we need to be able to stream the one file across multiple tasks | 16:55 |
mordred | corvus: ugh | 16:55 |
corvus | mordred: i thought it was a unique uuid per task? | 16:55 |
tobiash | corvus: a task can span multiple hosts | 16:55 |
corvus | oh, i grok | 16:55 |
*** gtema has quit IRC | 16:56 | |
corvus | i guess we have a language problem with describing a single playbook-level task which spans multiple hosts (but has separate constituent host-tasks as viewed in the callback), versus separate playbook-level tasks? | 16:57 |
mordred | corvus: ++ | 16:57 |
mordred | that said ... lemme go try something real quick ... | 16:58 |
*** hashar is now known as hasharDinner | 16:59 | |
*** toabctl has quit IRC | 17:05 | |
openstackgerrit | Tobias Henkel proposed openstack-infra/zuul master: DNM: delegate reproducer for story 2002528 https://review.openstack.org/574823 | 17:12 |
tobiash | corvus, mordred: that is a minimal working reproducer of the delegated shell problem ^ | 17:12 |
tobiash | it also contains some attempts to get debugging information | 17:13 |
openstackgerrit | Tobias Henkel proposed openstack-infra/zuul master: DNM: delegate reproducer for story 2002528 https://review.openstack.org/574823 | 17:17 |
openstackgerrit | Tobias Henkel proposed openstack-infra/zuul master: DNM: delegate reproducer for story 2002528 https://review.openstack.org/574823 | 17:23 |
tobiash | now really minimal | 17:23 |
tobiash | corvus, mordred: what's interesting is if I switch the strategy to 'free' the test works | 17:24 |
tobiash | strategy linear breaks | 17:24 |
*** jpena is now known as jpena|off | 17:24 | |
mordred | tobiash: yah - I'm thinking clarkb is right about it being something the linear strategy is doing differently with callback api calls | 17:25 |
clarkb | mordred: I dropped a sha1 yesterday for the change that looked suspicious to me. let me find it | 17:26 |
tobiash | corvus, mordred: latest run with stack trace in the command action plugin: http://paste.openstack.org/show/723330/ | 17:26 |
clarkb | d6872a7b070d1179e7d76bcda1490bb7003c4574 b107e397cbd75d8e095b08495da2ac2aad3ef96f were the two I identified | 17:26 |
*** elyezer has joined #zuul | 17:34 | |
tobiash | clarkb: I suspect the second one rather than the first one (as blocks seem to be not needed for the reproducers) | 17:35 |
clarkb | ya that was the one that looked related to not having the same task object for all hosts | 17:36 |
*** gtema has joined #zuul | 17:36 | |
tobiash | just tested against ansible-devel and the issue persists | 17:38 |
clarkb | tobiash: looking at your traceback I think if we can log the id of the task object in linear.py line 305 or so as well as log each time the on task start is called line 300 or so then we can confirm that they are different and only called the once | 17:39 |
clarkb | (or step through it with pdb but that is probably more involved) | 17:40 |
tobiash | can we override linear.py? | 17:40 |
mordred | I think so - same as other things - we should be able to make our own linear strategy plugin | 17:40 |
clarkb | we can add our own strategy plugins and probably provide a replacement linear | 17:40 |
tobiash | hm, probably as it's just a plugin | 17:40 |
openstackgerrit | Merged openstack-infra/zuul master: Revert callback fixes https://review.openstack.org/574785 | 17:42 |
mordred | clarkb: ok. I think the linear strategy thing is making some amount of sense to my brain now | 17:44 |
clarkb | we could also just patch ansible directly in the tox install and yolo it | 17:45 |
clarkb | mordred: that's good because their get next task iterator state machine was making me want to do yardwork yesterday | 17:45 |
clarkb | but the yardwork is done now so I don't have it as an alternative task anymore | 17:46 |
mordred | well - I mean - conceptually, not all of the specifics | 17:46 |
mordred | but it's basically only calling callback task_start when it starts running the task - which then has the loop over hosts thing. for the other strategies, we get task_start on each task/host combo, because they're happening in parallel, so there isn't a 'start' that's better than calling it on each one | 17:47 |
*** GonZo2000_ has joined #zuul | 17:48 | |
clarkb | yup | 17:48 |
clarkb | what I haven't been able to understand is if the task object is different for each host | 17:48 |
*** GonZo2000 has quit IRC | 17:48 | |
clarkb | but the behavior implies that it is, first one through works next one through doesn't | 17:48 |
mordred | jimi|ansible: (bugging you because we found a patch you wrote that might be related) ... we're tracking down an issue in zuul land that we're hitting on 2.5 related to the linear strategy | 17:48 |
clarkb | also linear runs them in parallel, it's just that they wait until all complete before going to the next task | 17:50 |
mordred | jimi|ansible: from what we can tell, when linear strategy is used, the callback plugin method playbook_on_task_start is only called once for a given task even if it has more than one host | 17:50 |
mordred | jimi|ansible: there is some debug printing here showing that: http://paste.openstack.org/show/723330/ | 17:50 |
mordred | jimi|ansible: with non-linear, we get a playbook_on_task_start for every task/host combo | 17:51 |
mordred | jimi|ansible: we are currently using playbook_on_task_start to add something to task.args - but in 2.5 with linear this is breaking because subsequent hosts in the task don't get that (or any other callback method that we can find) called | 17:53 |
mordred | jimi|ansible: is this intended behavior (not calling the callback for each task/host on start but instead once per task)? | 17:55 |
tobiash | I've hooked into the strategy and _queue_task is already called without the zuul_log_id | 18:02 |
*** gtema has quit IRC | 18:04 | |
tobiash | clarkb: I think I see the error: https://github.com/ansible/ansible/blob/stable-2.5/lib/ansible/plugins/strategy/linear.py#L300 | 18:05 |
tobiash | it's calling the callback on a copy of the task | 18:05 |
tobiash | oh wait, it's only messing with the task name | 18:06 |
openstackgerrit | Merged openstack-infra/zuul master: Fix information disclosure caused by unreachable nodes https://review.openstack.org/574788 | 18:06 |
mordred | tobiash: look in _get_next_task_lockstep | 18:07 |
mordred | tobiash: it's the fact that we have a when that makes the task a no-op for some hosts | 18:07 |
mordred | those hosts get a noop task created - and I think it's the noop task on which we are setting the zuul_log_id - so it doesn't carry over | 18:07 |
clarkb | oh that would explain it. the tasks would normally be shared except for when it is conditional and replaced by a different task and goes first | 18:08 |
mordred | yah | 18:08 |
corvus | this feels like it should be the cause of the "discover hosts" error too | 18:09 |
clarkb | corvus: yup that has a when as well | 18:09 |
corvus | i found an error in my earlier repro attempt, so i'm still working on that | 18:10 |
mordred | I think we could write a patch to the strategy plugin that checks to see if the task is a noop task and if so _doesn't_ fire the task_start callback (so that it won't fire the callback until it hits a real task) | 18:11 |
mordred | I'm not sure if that would be acceptable upstream - but it's worth a stab | 18:11 |
corvus | omg omg omg omg did it just work? http://paste.openstack.org/show/723334/ | 18:12 |
mordred | corvus: I think so! | 18:12 |
mordred | corvus: let me write a quick patch to the linear strategy plugin for ansible - then we can see about making a local override of it with that applied and see if it fixes it? | 18:12 |
corvus | w00t! i'll commit that and push it up. it requires gobs of locally prepared stuff still, but i can start eliminating things now | 18:12 |
corvus | mordred: cool | 18:12 |
openstackgerrit | Tobias Henkel proposed openstack-infra/zuul master: DNM: Add facility for overriding linear strategy https://review.openstack.org/574843 | 18:13 |
mordred | wot | 18:13 |
mordred | woot | 18:13 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul master: DNM: Devstack "Discover hosts" reproducer for story 2002528 https://review.openstack.org/574808 | 18:13 |
tobiash | corvus, mordred: that is a facility for overriding the linear strategy in zuul ^ | 18:13 |
tobiash | that should enable you to try this out within the tox-remote | 18:14 |
tobiash | mordred: according to my logs the callback is never called with a noop action | 18:15 |
tobiash | mordred: and the reproducer only works with the include_role in front of the shell task | 18:16 |
*** electrofelix has quit IRC | 18:16 | |
tobiash | mordred: if I remove the include_role it doesn't reproduce | 18:16 |
jimi|ansible | mordred: yeah it's kind of strategy dependent due to the way linear works | 18:16 |
tobiash | mordred: so I fear that your idea won't work | 18:16 |
corvus | tobiash, clarkb, mordred: i think the order of the hosts may not be deterministic | 18:16 |
corvus | if that's so, that may be why we've had so much trouble reproducing these | 18:17 |
clarkb | ya that could explain it since you'd need the nooping host to go first | 18:17 |
corvus | our nodeset is ordered, but we write it out to the inventory as a dict | 18:18 |
corvus | i have to go grab lunch. biab. | 18:19 |
tobiash | mordred: oh we might have talked about different stuff, you were talking about issue 3 which corvus reproduced? | 18:20 |
tobiash | I was talking about the delegate issue (issue 2) | 18:20 |
tobiash | and there the root cause still seems to be some weird side effect maybe inside task caching? | 18:21 |
mordred | tobiash: http://paste.openstack.org/show/723335 <-- how does that look? | 18:21 |
*** wissal has joined #zuul | 18:21 | |
mordred | tobiash: ah yes - I was talking about the other issue | 18:22 |
mordred | jimi|ansible: nod. we're learning many fun things :) | 18:22 |
tobiash | mordred: that looks good to me as ansible noob ;) | 18:23 |
*** wissal has quit IRC | 18:24 | |
tobiash | mordred: just curious, would we need that zuul_log_id if we switch to the log streaming you prototyped a few months ago? | 18:36 |
openstackgerrit | Merged openstack-infra/zuul master: Fix tox-cover https://review.openstack.org/574080 | 18:45 |
mordred | tobiash: we would not | 18:56 |
*** hasharDinner is now known as hashar | 19:21 | |
corvus | i've confirmed that with my current repro, compute1,controller fails and controller,compute1 does not | 19:34 |
corvus | i think we can just remove the when: from that task | 19:35 |
clarkb | the when is what produces the noop task in linear.py though right? | 19:36 |
corvus | hrm. removing the 'when' doesn't cause the test to succeed. | 19:38 |
corvus | maybe we need to test two things :) | 19:48 |
pabelanger | Is zuul_return setup to work across job dependencies? Looking to pass artifact information between 2 jobs | 19:48 |
corvus | pabelanger: yes | 19:49 |
corvus | pabelanger: https://zuul-ci.org/docs/zuul/user/jobs.html#parent-job-results | 19:49 |
pabelanger | thanks | 19:51 |
corvus | the copy-build-sshkey role seems to be required | 19:53 |
pabelanger | corvus: is zuul.log_url a special case? I'm having some issues seeing where that data would be passed to the child job | 19:55 |
pabelanger | ah | 19:56 |
pabelanger | I see | 19:56 |
pabelanger | Any values other than those in the zuul hierarchy will be supplied as Ansible variables to child jobs. | 19:56 |
pabelanger | thanks | 19:56 |
corvus | yep. so zuul.* is the special case | 19:56 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul master: DNM: Devstack "Discover hosts" reproducer for story 2002528 https://review.openstack.org/574808 | 20:03 |
corvus | clarkb, tobiash, mordred: ^ that's a minimal test for the discover hosts problem: a shell following an include_role with a register of the user module | 20:03 |
*** elyezer has quit IRC | 20:04 | |
corvus | actually, it doesn't require the user module or registering | 20:06 |
corvus | it appears to be a shell following an include_role (which can do anything) | 20:06 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul master: DNM: Devstack "Discover hosts" reproducer for story 2002528 https://review.openstack.org/574808 | 20:07 |
*** aspiers[m] has quit IRC | 20:50 | |
*** hashar has quit IRC | 20:55 | |
*** dkranz has quit IRC | 20:58 | |
corvus | i'm going to try to get all of the DNM reproducers into shape | 20:59 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul master: DNM: reproducer for 2002528 https://review.openstack.org/574487 | 21:03 |
*** pcaruana has quit IRC | 21:04 | |
corvus | clarkb, tobiash, mordred: i think the delegated issue was actually the include_role issue. if i remove the include_role from 574823 it works | 21:05 |
clarkb | corvus: was the helm case using include_role? | 21:06 |
clarkb | maybe that was a level above the file I was looking at | 21:06 |
corvus | clarkb: yep | 21:07 |
*** aspiers[m] has joined #zuul | 21:08 | |
corvus | clarkb: it was a few tasks back | 21:08 |
corvus | http://logs.openstack.org/55/574055/3/check/openstack-helm-infra-ubuntu/9e9ac15/ara-report/file/e595fcd2-dd71-4195-ae7e-9fc1ba885708/#line-41 | 21:08 |
corvus | line 23 has an include_role | 21:08 |
openstackgerrit | Michael Johnson proposed openstack-infra/zuul-jobs master: Collect the coverage report for npm test jobs https://review.openstack.org/570260 | 21:27 |
openstackgerrit | David Shrewsbury proposed openstack-infra/nodepool master: Pass zk connection to ProviderManager.start() https://review.openstack.org/574895 | 21:38 |
openstackgerrit | David Shrewsbury proposed openstack-infra/nodepool master: Make a proper driver notification API https://review.openstack.org/574896 | 21:38 |
openstackgerrit | David Shrewsbury proposed openstack-infra/nodepool master: Pre-register static nodes https://review.openstack.org/574897 | 21:38 |
Shrews | corvus: i think that ^ almost covers it. I want to handle the case mentioned in 574897 commit message. I'll do that tomorrow. | 21:40 |
*** dtruong has quit IRC | 21:41 | |
* Shrews EODs | 21:44 | |
*** dtruong has joined #zuul | 21:47 | |
openstackgerrit | Merged openstack-infra/nodepool master: webapp: fix browser return https://review.openstack.org/573053 | 21:55 |
corvus | clarkb: i haven't been able to get a simple case of skipping the first host-task and running the second host-task to fail | 22:03 |
corvus | clarkb: the earlier tests which sometimes failed based on order were after the include_role | 22:03 |
clarkb | ah so maybe it is only include_role at play? | 22:04 |
*** myoung is now known as myoung|bbl | 22:07 | |
corvus | clarkb: yeah... if i do include_role followed by a task that runs on the second host, it fails. include_role followed by a task running on the first host succeeds. include_role followed by a task that runs on both hosts fails on the second host only. | 22:20 |
corvus | so i think that explains all the observed behavior | 22:20 |
clarkb | that is really weird | 22:21 |
corvus | well, explains is not the right word. | 22:21 |
clarkb | covers the behavior | 22:21 |
corvus | i think we know how to make test cases that cover all the observed behavior, yeah :) | 22:22 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul master: Add test for shell after include_role https://review.openstack.org/574487 | 22:43 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul master: Remove failed_when when creating /tmp/console-None.log https://review.openstack.org/574672 | 22:43 |
corvus | clarkb, mordred, tobiash: okay, i think if we insert the necessary fixes before 574487 we should be able to see the results. i think 574487 has all the necessary test cases now. | 22:44 |
clarkb | cool, before we EOD do we need to update ansible (reason for earlier revert aiui) then we can add those changes back in? | 22:45 |
corvus | clarkb: yeah, we should be able to upgrade openstack-infra's install to patch the security bug. i've put a -2 on the bottom of the missing zuul_log_id stack though to hold it until we have all the fixes staged. i don't think we should merge any of those until it's all ready. | 22:47 |
corvus | let's switch to -infra to talk about the upgrade | 22:48 |
clarkb | once 2.5 logging is sorted we should probably make a release | 22:58 |
corvus | ya | 23:05 |
*** aspiers[m] has quit IRC | 23:16 | |
clarkb | is the include_role case the only one we don't have a fix for? | 23:19 |
*** jimi|ansible has quit IRC | 23:24 | |
corvus | clarkb: i think so -- i think everything else factored out down to that | 23:30 |
corvus | i'm going to eod shortly, hopefully we can pick this up again tomorrow and figure that out :) | 23:31 |
corvus | i updated the story with a very brief current summary | 23:33 |
clarkb | ya anytime I try to read ansible code at about 4:30pm my brain finds a whole list of things to do around the house :) | 23:33 |
clarkb | this stuff is relatively complicated and if you don't live in it frequently it's not easy to start | 23:33 |
clarkb | my skimming of how include_role works is that it is a type of task and the associated action is "include_role" | 23:34 |
clarkb | but I don't yet see why it would have a domino effect on subsequent tasks | 23:34 |
*** aspiers[m] has joined #zuul | 23:34 | |
*** threestrands has joined #zuul | 23:36 | |
corvus | all things considered, i'm happy with the progress we made today :) | 23:38 |
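
The symptom discussed above (a task_start callback that fires once per task and sets zuul_log_id in task.args, with only the first host seeing it) can be sketched in plain Python. This is only an illustration of the failure mode, not Ansible's or Zuul's code: `Task`, `on_task_start` and `queue_per_host` are hypothetical names, and the per-host copy is an assumption made for the demo, since the log does not establish the actual root cause.

```python
import copy


class Task:
    """Hypothetical stand-in for a task object; not Ansible's Task class."""

    def __init__(self, name, args=None):
        self.name = name
        self.args = dict(args or {})


def on_task_start(task, log_id):
    """Mimic a callback that injects a log id into the task's args."""
    task.args["zuul_log_id"] = log_id


def queue_per_host(task, hosts, log_id):
    """Give every host its own copy of the task, then fire the callback once.

    Assumption for the demo: the copies are made before the callback runs,
    so only the object handed to the callback ever receives the id.
    """
    per_host = {host: copy.deepcopy(task) for host in hosts}
    on_task_start(per_host[hosts[0]], log_id)
    return {host: t.args.get("zuul_log_id") for host, t in per_host.items()}


result = queue_per_host(Task("shell"), ["controller", "compute1"], "id-1")
```

In this sketch the first host ends up with the id and the second host gets None, which is consistent with the /tmp/console-None.log file name mentioned in change 574672.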