Wednesday, 2018-06-13

*** rlandy|rover is now known as rlandy|rover|bbl00:00
*** elyezer has joined #zuul00:14
*** jimi|ansible has joined #zuul02:24
*** myoung|bbl is now known as myoung02:28
*** dtruong has quit IRC02:45
*** dtruong has joined #zuul02:47
*** rlandy|rover|bbl is now known as rlandy|rover02:59
*** rlandy|rover has quit IRC02:59
*** bhavik1 has joined #zuul03:34
*** dmsimard has quit IRC04:13
*** bhavik1 has quit IRC04:19
*** pcaruana has joined #zuul04:50
*** eliqiao has quit IRC04:52
*** pcaruana has quit IRC05:18
*** bhavik1 has joined #zuul05:45
*** bhavik1 has quit IRC05:49
*** Rohaan has joined #zuul05:58
*** pcaruana has joined #zuul06:06
*** CrayZee has joined #zuul06:22
RohaantristanC: Hi, I was able to run an openshift-test zuul job yesterday06:44
tristanCRohaan: nice!06:46
tristanCA container was built per code review?06:47
RohaanYeah, I saw Builds, DeploymentConfigs being created06:47
RohaanBut I was getting some errors in them06:48
tristanCCould you paste logs?06:48
*** pwhalen_ has joined #zuul06:49
*** pwhalen has quit IRC06:50
*** jpena|off is now known as jpena06:51
RohaantristanC: https://pastebin.com/7j86Ssgr06:52
tristanCRohaan: maybe the timeout is too short, perhaps try changing "retries: 120" into  "retries: 600" in playbooks/openshift/prepare-namespace.yaml06:57
tristanC(you'll have to commit and push that change before recheck)06:57
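(For reference, the kind of retry loop being tuned here looks roughly like the following Ansible task. This is only a sketch assuming the playbook uses the standard retries/delay/until keywords; the task name and command are placeholders, not the actual contents of prepare-namespace.yaml.)
    - name: wait for the pod to be running   # placeholder task; the real one lives in prepare-namespace.yaml
      command: oc get pods --field-selector=status.phase=Running
      register: _pods
      retries: 600     # the value tristanC suggests raising from 120
      delay: 1         # seconds between attempts
      until: "'Running' in _pods.stdout"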
RohaanOne more thing, earlier I used to log in as SF Administrator using admin:userpass credentials, but since this installation I can't :( Is there any way to find out what the admin credentials are?06:57
tristanCRohaan: right, the admin password is now randomized at deployment time, you can get it using  awk '/admin_password/ {print $2}' /etc/software-factory/sfconfig.yaml06:58
tristanCRohaan: change the retries value in playbooks/openshift/deploy-project.yaml too07:00
RohaantristanC: okay07:00
*** gtema has joined #zuul07:01
*** nguyenhai_ has quit IRC07:17
*** threestrands has quit IRC07:21
*** hashar has joined #zuul07:21
RohaantristanC: This time I saw a successful build and deployments, the ImagePullErrs also got resolved, but somehow the build failed07:23
RohaanSome interesting output I found in /var/log/zuul/executor.log: https://pastebin.com/vjZdZ9KL07:24
RohaanMy VM is using only 4GiB of RAM, Could that be a problem?07:24
openstackgerritTobias Henkel proposed openstack-infra/zuul master: Add test for shell after include_role  https://review.openstack.org/57448707:37
tobiashjust added two asserts to verify that streaming also actually works ^07:38
*** CrayZee has quit IRC07:43
openstackgerritTobias Henkel proposed openstack-infra/zuul master: Improve logging of GithubTriggerEvents  https://review.openstack.org/57501407:52
tristanCRohaan: 4GiB should be enough to do poc, but it's rather small, especially if it's hosting openshift too. Anyway you can prevent that executor warning by adding "load_multiplier=6" to the [executor] section of /etc/zuul/zuul.conf (and restart executor after)07:55
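(In zuul.conf terms, the change tristanC describes would be, roughly, the following; a sketch assuming the rest of the [executor] section is left untouched:)
    [executor]
    load_multiplier=6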
RohaantristanC: Hm, okay07:58
tristanCRohaan: oh and in your case, adding a vcpu would help a lot08:10
*** electrofelix has joined #zuul08:16
RohaantristanC: This might be a noob question, Are you talking about increasing CPUs via VirtualBox menu? Or Adding Virtual CPU?08:36
tristanCRohaan: yes adding CPU to the vm08:38
RohaantristanC: I restarted vm after increasing CPUs and RAM, but after restarting zuul services I'm getting `/zuul/api/tenant/local/status: Service Unavailable` in dashboard. I checked and all Zuul services are running, nodepool and Zookeeper are running too. But doing nodepool list gives https://pastebin.com/whf0ANm308:55
tristanCRohaan: did zookeeper start? (e.g. systemctl status zookeeper). Perhaps try restarting it?09:01
RohaantristanC: Yep, works. Thx :)09:06
*** sshnaidm has quit IRC09:06
RohaantristanC: is there any limit in gerrit for patch updates? I'm trying to update a patch; I can see the started job in the Zuul check pipeline, but not in gerrit.09:25
tristanCRohaan: you can push any number of new patchsets or create a new code-review too (depending on whether you amend or commit on top)09:25
openstackgerritTobias Henkel proposed openstack-infra/zuul master: Clarify usage of the zuul_success variable  https://review.openstack.org/57505009:26
RohaanSomehow it only shows up to 4 patchsets. Looking at the openshift logs it seems to be working fine: https://pastebin.com/9nb02aHq . But I haven't seen any green tests yet09:28
tristanCRohaan: you mean zuul didn't comment back on gerrit?09:30
RohaanAnyways I'm starting with a new change09:31
RohaantristanC: I deleted that change and started a new change, on which Zuul has commented.09:32
*** sshnaidm has joined #zuul09:42
RohaantristanC: Build failed :( . This is what /var/log/zuul/executor.log shows: https://pastebin.com/Qid7r04m Any ideas?09:55
tristanCRohaan: I can't tell without looking at openshift resources, it failed because it couldn't get the http server pod name, e.g. these tasks didn't work: https://review.openstack.org/#/c/570669/2/playbooks/openshift/prepare-namespace.yaml L11809:58
tristanCyou'll have to run "oc get pods --field-selector=status.phase=Running" and see if the jsonpath is correct09:58
Rohaanthe above command doesn't seem to be working, it's giving a syntax error10:01
RohaantristanC: oc get pods returns two pods:10:05
Rohaanzuul-merger-server-1-l7xp5   1/1       Running     0          1m10:05
Rohaanzuul-merger-server-2-build   0/1       Completed   0          1m10:05
tristanCRohaan: and what was the syntax err? It's very possible that the openshift-base job could use better tasks, the goal of this one is to get the pod name of the zuul-merger.10:08
tristanCI didn't look much into it, but there must be an easier ansible task to use instead of the jsonpath10:09
RohaantristanC: Error: unknown flag: --field-selector10:09
tristanCRohaan: I've tested this job with openshift v3.9.0+46ff3a0-1810:11
RohaanOops. I'm on v3.6.1+269e828-910:12
RohaanLet me try with that openshift version10:13
*** jpena is now known as jpena|off10:24
*** jpena|off is now known as jpena10:32
RohaantristanC: How can I check for jsonpath in field selector output?10:40
Rohaan[root@dhcppc1 /]# oc get pods --field-selector=status.phase=Running10:41
RohaanNAME                         READY     STATUS    RESTARTS   AGE10:41
Rohaandemo-project-2-build         1/1       Running   0          13s10:41
Rohaanzuul-merger-server-1-hlpd8   1/1       Running   0          1m10:41
tristanCRohaan: nice, you get to the deploy-project.yaml tasks, check L47 of https://review.openstack.org/#/c/570669/2/playbooks/openshift/deploy-project.yaml10:44
tristanCand replace "{{ zuul.project.short_name }}" with demo-project10:44
Rohaanokay10:45
RohaantristanC: Do I need to restart zuul/nodepool? I just did recheck but got same result :(10:54
tristanCRohaan: if you update base jobs and push the commit, then zuul should pick it up, but it may be safer to restart the scheduler and executor to ensure they have the right content10:57
Rohaanah, I forgot :D10:57
RohaantristanC: Were you talking about replacing all occurrences of zuul.project.short_name or just line 47? I still got a failure after the job changes. I only changed line 4711:13
*** elyezer has quit IRC11:14
tristanCRohaan: wait no, dont change the jinja template in the base job, i thought you wanted to manually check the command used in the job11:16
tristanCwhat's the status of the job, did it pass the pre phase?11:16
Rohaanhow can i check that?11:17
tristanCRohaan: on the review, zuul comments back the job result with a link to the logs11:17
Rohaanyeah11:17
RohaanHow can I check whether that has passed pre phase11:18
tristanCRohaan: then check the job-output.txt file, or click the ara folder then the "tasks" button11:18
Rohaanokay11:18
Rohaanyeah post.yaml and pre.yaml11:19
Rohaanit failed in pre11:19
tristanCcould you paste the job-output.txt content then?11:21
RohaantristanC: https://ibb.co/kPgCPd11:22
Rohaanokay11:22
RohaantristanC: https://pastebin.com/zXNpmZ8D11:23
tristanCRohaan: it seems like you need to increase the retries count to 600 in the "wait for project image built" task of https://review.openstack.org/#/c/570669/2/playbooks/openshift/build-project.yaml11:25
tristanCL86 of playbooks/openshift/build-project.yaml11:25
tristanC(and don't forget to commit and push that change)11:25
RohaanYeah :P11:27
Rohaanokay rechecking11:27
* Rohaan prays that job turns green _|_11:28
*** elyezer has joined #zuul11:29
Rohaanfailed :(11:39
tristanCaww, what task failed?11:39
Rohaanget project pod name11:39
tristanCRohaan: did you change the retries limit to a bigger value for this task? (at the bottom of build-project.yaml)11:43
RohaanIts retries value is set to 60011:45
*** jpena is now known as jpena|lunch11:45
tristanCRohaan: if you click on the ara report, then pre.yaml "tasks", and finally on the red rectangle marked with "FAILED", you'll get the detail of that task, could you paste the command being used?11:46
Rohaan- name: get project pod name11:47
Rohaan  command: oc get pods --field-selector=status.phase=Running -o "jsonpath={.items[?(@.metadata.labels.app=='{{ demo-project }}')].metadata.name}"11:47
Rohaan  register: _pod_name11:47
Rohaan  retries: 60011:47
Rohaan  delay: 111:47
Rohaan  until: "zuul.project.short_name in _pod_name.stdout"11:47
*** rlandy has joined #zuul11:49
tristanCRohaan: oh i see, then edit the build-project.yaml file and replace "{{ demo-project }}" with "{{ zuul.project.short_name }}"11:49
*** rlandy is now known as rlandy|rover11:50
tristanCRohaan: in jinja, "{{ }}" means "print that variable", so "{{ demo-project }}" is incorrect. It should either be simply "demo-project" (without the {}), or use the zuul variable11:50
tristanC(Ansible may use jinja templating for any string in playbooks)11:52
RohaantristanC: build-project.yaml or deploy-project.yaml? (There is no demo-project occurrence in build-project.yaml)11:56
tristanCRohaan: yes, I meant deploy-project.yaml11:57
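(Putting the pieces together, the corrected task in deploy-project.yaml would look roughly like the task pasted above with the template fixed; a sketch only, since the rest of the file isn't shown here:)
    - name: get project pod name
      # the jsonpath now uses the zuul variable instead of the broken "{{ demo-project }}"
      command: oc get pods --field-selector=status.phase=Running -o "jsonpath={.items[?(@.metadata.labels.app=='{{ zuul.project.short_name }}')].metadata.name}"
      register: _pod_name
      retries: 600
      delay: 1
      until: "zuul.project.short_name in _pod_name.stdout"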
RohaantristanC: yay, build succeeded \o/12:12
tristanCRohaan: at last! Now you should try a bad change on the demo.py script to make sure CI is failing on purpose12:13
tristanCAlso, now would be a good time to snapshot the vm :-)12:20
RohaantristanC: Done :)12:37
*** nguyenhai has joined #zuul12:39
*** elyezer has quit IRC12:41
mordredtristanC, Rohaan: super cool!12:50
*** pwhalen_ is now known as pwhalen13:05
*** pwhalen has quit IRC13:05
*** pwhalen has joined #zuul13:05
*** dkranz has joined #zuul13:24
*** Rohaan has quit IRC13:28
*** Rohaan has joined #zuul13:43
corvusclarkb, mordred, tobiash: any updates on https://storyboard.openstack.org/#!/story/2002528 since yesterday?13:55
tobiashcorvus: no, I didn't have time today (except adding two asserts which would make your reproducer ready for merge if the issue is fixed)13:56
corvustobiash: ah thanks13:56
tobiashI'm currently fixing a github race13:56
tobiashcorvus: I have a problem with github, which sometimes announces PRs before the head commit is queryable13:56
mordredcorvus: no, I haven't gotten to do anything actually technical yet today (morning of meetings and meeting prep for me)13:57
tobiashthat results in occasional 404 exceptions in the github driver event handler13:57
mordredcorvus: my plan for when that's done is to take http://paste.openstack.org/show/723335, make a PR to ansible with it, then make a patch on top of tobiash's patch to make a local strategy plugin with that logic change13:58
tobiashcorvus: what would you prefer, a local retry mechanism around getCommit or handling the exception and re-enqueueing the event?13:58
corvustobiash: i thought we had a delay in the github driver; what about increasing that?13:59
tobiashcorvus: the delay was removed after solving our caching issues13:59
tobiashcorvus: that is the exception http://paste.openstack.org/show/723388/13:59
tobiashand I'd prefer a retry over a general delay13:59
corvustobiash: hrm.  i'm not sure we should have removed the delay -- we encountered this problem before and that's why it was added.  the same thing is true of the gerrit driver, and we solved it there with the delay.14:00
tobiashbecause it's great to not have this delay :)14:00
tobiashthe problem with a delay is how to tune it to be 'safe enough' without adding too much latency overall14:02
tobiasha retry would just work14:02
tobiashback in the v2 days we also replaced the gerrit delay by a retry which worked great at that time14:04
corvustobiash: the advantage of a delay is that, in a system where it's needed, things work correctly the first time.14:04
*** hashar is now known as hasharAway14:05
tobiashwhereas a retry would be more resilient to intermittent failures other than such races14:05
corvusin a system that's normally delayed, you're sending twice as much network traffic because every other request is a failure14:06
*** jpena|lunch is now known as jpena14:06
tobiashis that the case for gerrit?14:06
tobiashin our case we had sometimes delays, but not that often14:06
corvusin ours, it's fairly continuous14:06
tobiash(gerrit last year)14:06
corvusor at least it was, several gerrit versions ago :)14:07
corvusit simply took gerrit too long to actually commit the transactions involved14:07
corvusi suppose the best approach would be both.  a delay with a back-off triggered by failures and retries.  :)14:08
tobiashthat would be great14:08
corvustobiash: if you feel strongly about avoiding the delay, i'd be okay with a local retry.  i think that's better than re-enqueing the event (which brings up ordering issues).  and maybe later we can add the delay back in and hook it up to the retries?14:09
tobiashbut at least in the github case a simple delay would do it properly (we have probably less than 1% of events that hit this race today)14:09
corvustobiash: i assume you meant 'retry' in that last comment?14:09
tobiashcorvus: oh yes ;)14:10
tobiashcorvus: I was also thinking about the ordering issues, thus I implemented a local retry and am currently trying to craft a test case14:10
corvus++14:11
*** Rohaan has quit IRC14:11
tobiashcorvus: what about gerrit, would you like that combined approach?14:11
corvusyeah, i think it would be great.  it's fascinating to me that both systems have the same flaw.  :)14:12
tobiashyes, in github it's probably by design because of performance14:12
corvusmordred: what's the story with http://paste.openstack.org/show/723335/ ?14:13
tobiashwhen requesting a commit you even get diffs back in the response14:13
tobiashcorvus: also if you have time this week I'd like to discuss a repo-state thing that has bitten us14:15
tobiashmaybe after the ansible things are sorted out?14:16
corvustobiash: sounds good14:16
tobiash:)14:16
*** Rohaan has joined #zuul14:20
openstackgerritDavid Shrewsbury proposed openstack-infra/nodepool master: Make a proper driver notification API  https://review.openstack.org/57489614:22
fungitobiash: corvus: is there a reason we've still got https://storyboard.openstack.org/#!/story/2002177 set to private since https://review.openstack.org/574788 is public (and merged for close to a day now)?14:22
mordredcorvus: https://github.com/ansible/ansible/pull/41495 - made the PR14:24
*** Rohaan has quit IRC14:24
tobiashfungi: the upstream bug is https://bugzilla.redhat.com/show_bug.cgi?id=158885514:24
openstackbugzilla.redhat.com bug 1588855 in vulnerability "ansible: Failed tasks do not honour no_log option allowing for secrets to be disclosed in logs" [Medium,New] - Assigned to security-response-team14:24
tobiashit was private and I got access, so I cannot tell if it's still private14:24
fungitobiash: it's public14:24
fungi(note even our meetbot was able to scrape it)14:24
mordredcorvus: the tl;dr is that the linear strategy plugin makes fake Noop Task objects that it sticks into its task queue for non-matching hosts - but it also only fires the callback method on the first task/host in the queue14:25
tobiashjust saw it :)14:25
fungitobiash: as is the commit message for the change which merged to fix that in ansible14:25
fungior, rather, changelog entry i mean14:25
*** elyezer has joined #zuul14:25
mordredcorvus: so if the first host for a given task doesn't match, the callback gets fired with the fake Noop task, which means setting the parameter on that task in the callback doesn't persist to the real Task14:25
*** dmsimard has joined #zuul14:29
tobiashmordred: fwiw in local tests I never saw the callback handling a noop task14:33
tobiashmordred: so this could be a red herring14:33
openstackgerritTobias Henkel proposed openstack-infra/zuul master: Retry queries for commits  https://review.openstack.org/57513714:47
tobiashfungi: so I think we can make our story public too14:48
fungitobiash: mostly wondering what we risk disclosing in the story at this point since we already merged the fix for it with a relatively detailed commit message14:49
tobiashfungi: it was just late in the evening for me and everyone else working on the other ansible problems14:50
tobiashfungi: so I think just no one thought of opening that story14:50
fungino problem, just making sure i wasn't missing anything there. in openstack, our vulnerability management process is to switch embargoed bug reports to public before pushing any change into public code review14:50
fungiwhich then allows us to reference the report in the commit messages too14:50
tobiashI'll probably write a mail later today14:50
fungiso doing something similar in zuul would make sense to me anyway14:51
fungithanks!14:51
tobiashfungi: maybe we should do that the next time14:51
tobiashthanks for that tip14:51
fungimy pleasure14:51
* tobiash seems to remain the bad security news poster ;)14:52
tobiashhave to run now14:52
pabelangerHmm, an untrusted job can't use zuul_return, well can't delegate to localhost15:20
*** dmsimard has quit IRC15:21
pabelangerwhat are thoughts about changing that? Otherwise, parent / child jobs would all need to be trusted, right?15:29
mordredpabelanger: do you need to delegate_to: localhost? could you just hosts: localhost: tasks: - zuul_return ?15:37
* mordred thinking out loud15:37
pabelangermordred: I think we still block that too, but I can test15:38
mordredpabelanger: I think that is easy to add zuul_return to a localhost whitelist - I think it might be harder to whitelist zuul_return for delegate_to - but I might be wrong and it might be the exact same work15:38
pabelangermordred: yah, doesn't need to be delegate_to, hosts: localhost would be fine15:40
mordred(in any case, I think it's appropriate to allow use of zuul_return on localhost)15:40
mordredcorvus: ^^ yeah?15:40
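(What pabelanger describes would look roughly like this in an untrusted job's playbook; a sketch assuming zuul_return keeps its usual "data" parameter, with the returned keys here being only an example:)
    - hosts: localhost          # run directly on the executor, no delegate_to needed
      tasks:
        - zuul_return:          # would need the whitelisting discussed here to work untrusted
            data:
              zuul:
                log_url: "http://logs.example.com/"   # example key only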
Shrewstobiash: corvus: Am I correct in thinking that pools within the static driver are essentially useless? Because the name for a node in the config must be unique (hostname or IP based on https://zuul-ci.org/docs/nodepool/configuration.html#static-driver), we really don't need the pool concept15:51
pabelangermordred: yah, hosts: localhost same issue. But happy to use that syntax if we are to whitelist zuul_return15:52
mordredpabelanger, corvus: we could also write zuul_return as an action plugin instead of a module so that it's just inherently always something that runs on localhost15:54
tobiashpabelanger: delegate or directly localhost doesn't make any difference regarding whitelisting15:56
tobiashThat probably just needs whitelisting in normal.py15:57
*** myoung is now known as myoung|biaf15:58
tobiashBut as mordred said, an action plugin would be universal and you wouldn't have to take care of any delegate or localhost15:58
pabelangeryah, I like that actually15:58
tobiashShrews: I think you're right15:59
*** weshay is now known as weshay|ruck16:00
*** myoung|biaf is now known as myoung16:03
*** rlandy|rover is now known as rlandy16:06
*** myoung is now known as myoung|biaf16:07
corvusmordred: i agree with all the things you said16:08
openstackgerritDavid Shrewsbury proposed openstack-infra/nodepool master: Pre-register static nodes  https://review.openstack.org/57489716:10
openstackgerritDavid Shrewsbury proposed openstack-infra/nodepool master: Pre-register static nodes  https://review.openstack.org/57489716:10
openstackgerritDavid Shrewsbury proposed openstack-infra/nodepool master: Pre-register static nodes  https://review.openstack.org/57489716:13
corvusmordred, tobiash, clarkb: i've started local debugging of the include issue: http://paste.openstack.org/show/723399/16:15
corvusthat looks like it's making the start callback in the order i'd expect16:16
clarkbcorvus: ok, I'm sort of distracted, but if I can help let me know16:16
clarkbcorvus: check the id() of the task object maybe?16:16
corvusyeah, that's where i'm going next16:17
openstackgerritDavid Shrewsbury proposed openstack-infra/nodepool master: Clarify static node name uniqueness in docs  https://review.openstack.org/57515916:18
tobiashShrews: Just thought a second time about uniqueness of static nodes16:24
tobiashShrews: we hadn't considered that multiple static nodes can be mapped behind a single ip address with different ports (nat use case)16:24
Shrewstobiash: that sounds like new functionality. i'll leave that for someone else to add  ;)16:25
tobiashI think we had this already working16:26
tobiashbut not sure anymore16:26
*** dmsimard has joined #zuul16:26
tobiashwe at least had already nodes behind a nat gateway16:26
openstackgerritDavid Shrewsbury proposed openstack-infra/nodepool master: Clarify static node name uniqueness in docs  https://review.openstack.org/57515916:26
Shrewstobiash: hmm16:27
tobiashmaybe that just magically worked with the dynamic driver?16:27
tobiashI think that just didn't care about uniqueness16:28
Shrewstobiash: the old _checkConcurrency() method didn't take port into consideration. i'd be surprised if that worked16:28
Shrewsand if it did, maybe accidentally16:28
tobiashbut it's just a list with two nodes with same address and same label16:28
Shrewsnothing explicitly pointed it out16:28
tobiashit's walking down the list until the first free node16:28
*** myoung|biaf is now known as myoung|lunch16:29
corvuscan we split name/ip?  name is required to be unique, but ip isn't?16:29
tobiashname/hostname?16:30
tobiashor would ip also accept a dns name?16:30
tobiashbut that split sounds good16:30
tobiashjust checked our production config and we don't use that nat gateway anymore, so that at least wouldn't break me immediately16:31
*** hasharAway is now known as hashar16:34
*** ianychoi has quit IRC16:34
*** ianychoi has joined #zuul16:35
corvushttp://paste.openstack.org/show/723404/  the task executor does in fact get a different task object the second time16:55
openstackgerritPaul Belanger proposed openstack-infra/zuul master: Allow zuul_return in untrusted jobs  https://review.openstack.org/57517316:57
pabelangercorvus: tobiash: mordred: ^first attempt at whitelisting zuul_return16:57
corvusi've confirmed that the iterator in linear.run() gets two different task objects for the failing task17:04
tobiashpabelanger: commented17:05
corvusi think that means we need to understand _get_next_task_lockstep17:05
pabelangertobiash: thanks, let me figure that out17:08
pabelangerany idea how to get the executor-lib path from ansible?17:14
pabelangereg: /tmp/tmpag22d2js/zuul-test/executor-lib/ansible/zuul/ansible/library/zuul_return.py17:14
clarkbzuul knows what it is, right? is it added as a variable by zuul to the zuul dict?17:17
tobiashpabelanger: I think there is no env variable but it writes the library path to the config17:17
tobiashpabelanger: so I guess that ansible has some api to get the library path17:17
clarkbhrm we only get the work root and stuff below that17:18
*** myoung|lunch is now known as myoung17:19
tobiashpabelanger: maybe the path cache contains what you want?17:22
tobiashhttps://github.com/ansible/ansible/blob/devel/lib/ansible/plugins/loader.py#L2317:22
tobiashhm I guess not17:23
*** jpena is now known as jpena|off17:24
mordredcorvus: that's what my ansible PR is about (I just pushed up a second version of it - missed a variable in the first one)17:29
clarkbmordred: I'm the wrong person to review that ansible PR, but I would be concerned that if all the tasks end up being noops then your callback won't run. This maybe makes sense given it's on_start and you never really start the task; however, it is a behavior change I think17:31
fungiokay, this is weird. now the error i'm hitting is that `/etc/ansible/hosts/openstack --list` is broken on puppetmaster.o.o, with a 404 getting raised by https://auth.vexxhost.net/v2.0/tokens17:34
fungier, wrong channel, sorry!17:35
corvusmordred: i'm not sure we're on the same page; are you around now?17:37
*** gtema has quit IRC17:40
tobiashmordred: I ran the test case with the ansible from your branch and it crashed ;)17:41
tobiashhttp://paste.openstack.org/show/723407/17:41
corvusi need to afk for a bit17:42
*** electrofelix has quit IRC17:45
mordredtobiash: excellent!17:48
*** yolanda has quit IRC17:50
mordredcorvus: well, there is a provision for running it on the first task if all the tasks are noop - but also we might not be on the same page as well17:52
openstackgerritPaul Belanger proposed openstack-infra/zuul master: Allow zuul_return in untrusted jobs  https://review.openstack.org/57517318:03
pabelangertobiash: clarkb: ^found a way to get the info18:03
pabelangerthanks for the pointer tobiash18:03
*** openstackgerrit has quit IRC18:04
*** yolanda has joined #zuul18:05
*** myoung is now known as myoung|biab18:08
*** harlowja has joined #zuul18:10
corvusmordred: in the include_role case, there are no no-op tasks18:17
tobiashpabelanger: added another comment18:18
tobiashpabelanger: lgtm otherwise18:18
corvusmordred: it looks like the playbook iterator is actually producing two different task objects for the shell task following the include_role (normally it returns the same task object for both hosts)18:19
corvusthe play compiler returns a single task object for each task.  normally that same object is returned by the iterator and used to run the tasks.  but in the case of the task-after-include_role, the task object the compiler produces is not used for either of the hosts.  something creates two entirely new tasks.  that's as far as i've gotten.18:22
*** openstackgerrit has joined #zuul18:23
openstackgerritPaul Belanger proposed openstack-infra/zuul master: Allow zuul_return in untrusted jobs  https://review.openstack.org/57517318:23
pabelangertobiash: looking18:23
openstackgerritPaul Belanger proposed openstack-infra/zuul master: Allow zuul_return in untrusted jobs  https://review.openstack.org/57517318:31
pabelangertobiash: okay, that should cover future changes^18:31
openstackgerritPaul Belanger proposed openstack-infra/zuul master: Allow zuul_return in untrusted jobs  https://review.openstack.org/57517318:32
tobiashpabelanger: thanks, +218:35
pabelangertobiash: yay18:35
mordredcorvus: ah - gotcha. thank you - that makes sense -- or, at least, the words you said do18:36
tobiashmordred: note that the error only happens if you have an include_role task before18:36
tobiashso that seems to have some weird sort of side effect18:36
pabelangermordred: clarkb: corvus: do you also mind looking at 575173^18:36
*** elyezer has quit IRC18:40
corvuspabelanger: will be happy to after i'm done with the ansible error18:43
clarkbpabelanger: is the default module path not where we install external job defined modules too?18:45
pabelangerclarkb: no, it should be tmp/tmpag22d2js/zuul-test/executor-lib/ansible/zuul/ansible/library for all jobs, which is only used by zuul I think18:48
clarkbpabelanger: where do other modules get installed?18:49
clarkbor maybe that isn't implemented yet?18:49
pabelangerclarkb: I don't think they do, they are loaded directly from their src_root path18:49
clarkbgot it18:50
clarkband those paths are not added to the default module path?18:50
clarkbwe have to tell ansible how to find them somehow and it seems like the module path in config is how one might do it18:50
pabelangerright, that is the value we set at  https://git.zuul-ci.org/cgit/zuul/tree/zuul/executor/server.py#n140018:50
corvusmordred, tobiash: https://github.com/ansible/ansible/blob/devel/lib/ansible/playbook/included_file.py#L15018:54
corvusmordred, tobiash: https://github.com/ansible/ansible/blob/devel/lib/ansible/plugins/strategy/linear.py#L33118:55
mordredcorvus: ooh18:56
corvusthe linear strategy processes the results of the include_role task, which causes new copies of subsequent(?!) tasks to be made... apparently18:56
mordredthat ...18:57
tobiashinteresting18:59
*** hashar is now known as hasharAway19:00
corvusit should only be copying the include_role task19:03
corvusso i'm not sure how the block above it gets copied19:03
corvusmordred, tobiash: we're deep into ansible spaghetti here.  do we need to pause and regroup and see if we should try to solve this another way?19:06
mordredcorvus: yes19:06
tobiashcorvus: you mean maybe switching to the alternative log streaming solution?19:07
corvusi know of two alternate ideas so far:19:08
corvus1) clarkb suggested we could use the task uuid itself (i think that should generally work, except we need to add an inventory host uniqueness to it, in case the same task is run on the same physical host under two different inventory hosts)19:08
corvus2) the unix socket log streaming.  i'm not sure if we've thought about the interaction of that with containers, so if we go down that road, we should give that at least some quick thought.19:09
mordredI do not believe 1 will work - we run in to the same issue19:10
mordred(I investigated 1 for a while yesterday)19:10
corvusah, i guess the task uuid isn't automatically in the module args?19:10
mordredyah - that is the issue19:10
corvusmordred: however, the issue doesn't seem to be that on_task_start isn't being called, it is, it's just getting a different task every time19:11
mordredit turns out that ansible provides VERY minimal things to module args by default ... basically nothing19:11
corvusmordred: so if we update the existing code to do what it's doing now, except, instead of generating the uuid, it just uses task._uuid.... maybe that will work?19:11
mordredcorvus: right- and with the linear strategy, we only get a task_start once per task/host pair19:11
corvusmordred: oooh i see19:12
corvusso indeed it won't be called on the second one19:12
mordredexactly19:12
mordredsame problem as the thing I was chasing, even though the mechanism you found for producing that problem is different19:12
corvusoh, i have a meeting now, have to run.  biab19:13
*** myoung|biab is now known as myoung19:15
*** gtema has joined #zuul19:27
*** gtema has quit IRC19:57
clarkbis there a tldr on how event processing works for github triggers? because I have a ~40 minute old event that is logged as being sent in github, logged in zuul web, but nothing from the scheduler doing anything with it20:30
clarkbreading the code the X-Github-Delivery field is what I should grep for20:37
clarkbbut I don't see any of those logs in the scheduler at all20:38
clarkbweird now it shows up in there20:40
clarkbI guess zuul was just behind on event processing20:40
clarkbI found a bug /me makes a fix20:46
openstackgerritClark Boylan proposed openstack-infra/zuul master: Log github delivery ids properly  https://review.openstack.org/57523120:56
corvusmordred: so regarding the domain socket logging -- does this impact future container work at all?20:58
corvusit shouldn't be an issue with k8s_raw module, nor would it be for a container that ran ssh that we just used with the ssh connection plugin21:00
corvusthat leaves the kubectl connection plugin21:00
clarkbcorvus: mordred what is the requirement for the logger? that it get a unique string for each task? could we just generate a uuid on the remote side when it starts logging to have that?21:01
clarkbbasically does this have to come from the control side of ansible or can we just make up a number21:02
corvusthe current system requires running zuul_stream on the worker.  even if kubectl started zuul_stream, ansible isn't going to be able to connect to it without setting up ingress21:02
corvusso i think that means that the current system won't work with kubectl, so switching to domain sockets won't be a regression21:03
corvusmordred: does that sound right?21:03
*** openstackgerrit has quit IRC21:04
corvusmordred: sorry, s/zuul_stream/zuul_console/ in that ^21:04
corvusclarkb: the zuul_stream callback plugin needs to know the ID that the command module will use on the remote side21:05
corvusclarkb: so today, that's handled by the zuul_stream callback plugin generating an id, sticking it in the arguments for the command task, then using that id internally to start up the stream receiver and request the log stream21:06
corvusclarkb: any piece of information that's shared between the callback plugin on the local side and the command module on the remote side could be used.  the point mordred made earlier is that there isn't really any shared info there by default.  we have to add it ourselves.  and adding it is proving problematic.21:07
clarkbgotcha because it is an explicit request for a specific file from the streamer21:07
corvusclarkb: yep.  and it's an explicit request because we have to keep supplying old files, because the remote side can complete a command, and the local side can start a new one, all before the local side has finished receiving the data from the earlier command.  in other words, they can overlap.21:09
clarkbcould we use the task name for that instead?21:09
corvusthey aren't unique21:09
clarkbnormalize it to fs safe string, both sides have that info iirc21:09
clarkboh for some reason I thought they were21:09
corvusclarkb: but still, the command module doesn't have it.21:10
corvusthe only information the command module has is the arguments passed to it.21:10
corvusclarkb: this is what we have to work with on the remote side: http://paste.openstack.org/show/723414/21:11
clarkbhrm21:11
corvusi'm starting to think we need to revert to 2.321:13
clarkbcould we provide an override of the linear strategy module and call our callback on every task?21:14
clarkbseems like the issue is it is only called the once21:14
clarkbbut if it were called each time then copies of tasks would be ok?21:14
corvusclarkb: that might work21:14
clarkbif ^ doesn't work then ya I'm stumped as to fixing in short period of time21:15
corvusprobably best to think of that as a short term fix21:15
clarkbyup21:15
corvus(we've had users unable to make progress on their work since friday, so this is becoming a serious issue for us)21:16
clarkbI'm not really in a great position to write and test ^ either21:19
clarkbI think a revert is likely doable as I doubt many have updated to ansible 2.5 specific code21:19
corvusclarkb: i'm informally testing that idea real quick21:20
*** hasharAway has quit IRC21:22
corvusclarkb: yeah, i think that might work21:28
corvusclarkb: we just add this to zuul_stream: http://paste.openstack.org/show/723415/21:28
corvusclarkb: and then add that callback after the v2_on_task_start callback in the linear strategy plugin21:29
corvus(except always call it)21:29
clarkboutside of the block that checks ya that21:29
corvusit passes my local testing with my hacked up copy of ansible21:29
corvusi'll try to form that into a real patch21:30
clarkbcool I can review it when its up21:30
*** openstackgerrit has joined #zuul21:54
openstackgerritJames E. Blair proposed openstack-infra/zuul master: Add test for shell after include_role  https://review.openstack.org/57448721:54
openstackgerritJames E. Blair proposed openstack-infra/zuul master: Temporarily override Ansible linear strategy  https://review.openstack.org/57524821:54
corvusit occurred to me that you probably can't see the changes to linear.py easily.. i can split that into two changes real quick21:55
openstackgerritJames E. Blair proposed openstack-infra/zuul master: Temporarily override Ansible linear strategy (2/2)  https://review.openstack.org/57524821:58
openstackgerritJames E. Blair proposed openstack-infra/zuul master: Add test for shell after include_role  https://review.openstack.org/57448721:58
openstackgerritJames E. Blair proposed openstack-infra/zuul master: Temporarily override Ansible linear strategy (1/2)  https://review.openstack.org/57525021:58
corvusclarkb, mordred, tobiash: ^21:58
clarkbwhat calls import ansible strategy plugin? (sorry, reviewing over a sandwich so that isn't the exact name)22:03
*** harlowja has quit IRC22:11
*** harlowja has joined #zuul22:12
corvusclarkb: nothing; it's left over from tobiash's change.  it would be necessary if we needed to import the real plugin.22:18
corvusbut in this case, the method we're changing is most of the plugin anyway, so i just copied it.22:19
*** myoung is now known as myoung|off22:20
*** harlowja has quit IRC22:21
clarkbcorvus: https://review.openstack.org/#/c/575248/2/zuul/ansible/paths.py that is leftover?22:27
clarkboh I see its there to allow us to get at the real one22:27
clarkbbut as you say we just copy the whole thing22:27
corvusyeah, we could remove it since it's unused; i just left it there to ride along with the rest of the framework.22:28
mordredcorvus: yes - I think what you said about domain sockets not being a regression is correct22:34
mordredcorvus: and yes- I think temporarily overriding the whole strategy should give us time to figure out a proper fix22:36
clarkbcorvus: the proposed stack lgtm as short term fix22:36
clarkbI've +2'd those change that needed reviews22:36
mordredsince, it turns out - this one is hairy22:36
corvusmordred: cool, so we'll try to land 575248 for now, and then work on the domain socket as the "real" solution?22:36
mordredyes22:37
mordredI think that's the best bet22:37
corvusoh, 575248's parent isn't going to pass tests because i didn't put the flake8 note in there22:37
corvusi'll fix that in just a sec22:37
corvusmordred: now the real question: what do we need to do to progress the domain socket work?22:38
mordredalso - I thnik for kubectl, we might be able to get the console stream from the k8s api22:38
mordredsince it has a console output endpoint22:38
clarkbthat won't give you the ansible output though22:38
openstackgerritJames E. Blair proposed openstack-infra/zuul master: Temporarily override Ansible linear strategy (1/2)  https://review.openstack.org/57525022:38
clarkbwill it?22:38
openstackgerritJames E. Blair proposed openstack-infra/zuul master: Temporarily override Ansible linear strategy (2/2)  https://review.openstack.org/57524822:38
openstackgerritJames E. Blair proposed openstack-infra/zuul master: Add test for shell after include_role  https://review.openstack.org/57448722:38
clarkbseems like that is the wrong layer22:38
mordredclarkb: oh - you're right - that won't work22:39
mordredcorvus: I think it might be helpful to have a phone call to figure out what to do to finish the domain socket work22:39
corvusmordred: have time now?22:40
mordredI think it's a bit late for my brain to dive in fully - how about first thing your morning? (that way I can go re-read patches and remind myself where we are)22:40
corvussounds good... we can use the conf bridge and clarkb and tobiash (and others) can join if they are around.  and whatever we come up with we can summarize at the end22:41
clarkbI can do 8am tomorrow22:41
clarkbpacific22:41
corvuswfm22:42
corvuslooks like https://review.openstack.org/541434 is the bottom of the stack in question22:42
mordred++22:42
mordredyes - that's it22:43
clarkbdo we want to go ahead and try to merge this current stack nowish? (part of the issue here is node contention with tripleo...)22:44
clarkbwe could direct enqueue to the gate to get zuul stuff in then do executor restarts22:44
corvusclarkb: yeah, i think that's a reasonable course of action22:45
mordredcorvus, clarkb: I've +2'd or +3'd all of them22:46
clarkbcorvus: mordred https://review.openstack.org/#/c/574786/1 and https://review.openstack.org/#/c/574643/4 just need to be approved22:46
clarkbthen we can enqueue to gate I think22:47
clarkbI'll let you approve those since you have reviewed them previously22:47
*** threestrands has joined #zuul22:47
openstackgerritJames E. Blair proposed openstack-infra/zuul master: Allow zuul_return in untrusted jobs  https://review.openstack.org/57517322:48
clarkbcorvus: has -2 on that first one22:48
corvusclarkb: gtg22:49
clarkbok I'll do the enqueue now22:50
corvusmordred: thanks i was about to ask you to look at 575173 if you had time :)22:51
clarkbok enqueued22:53
clarkbcorvus: mordred https://review.openstack.org/575231 that fixes an opsy debugging issue I found today22:53
mordredclarkb: that's super weird - and we should talk to jlk about fixing it in the github library22:54
corvusclarkb: yeah, i believe all the headers are lowercased now because "different web frameworks handle things differently"22:54
mordredoh - wait - nevermind ... that's a thing we construct22:54
clarkbI decided to grep on the prefix of that log string and found a bunch of Nones22:54
mordredpython requests has a caseinsensitivedict that it uses for headers - and I was mistakenly thinking this was based on it22:54
clarkbwhich explained my initial debug confusion of why doesn't this show up in the log22:54
mordredyah22:55
corvusyeah, i think this is partly because of the fact that we ship the headers over the gearman bus22:55
mordredyah22:55
mordredwe _could_ reconstruct them into a CIdict when we pull them out of that args - but that seems like work22:56
corvusnow all our uses are lower case :)22:56
mordred>>> import requests.structures22:57
mordred>>> f = requests.structures.CaseInsensitiveDict({'foo': 'bar'})22:57
mordred>>> f['Foo']22:57
mordred'bar'22:57
mordredjust for reference22:57
clarkbhttps://zuul.openstack.org/stream.html?uuid=bae09735e26f44ed985e53b7b792faea&logfile=console.log zuul running its first kata test that didn't die early due to lack of nested virt22:59
clarkblooks like its still broken but all of the other tests were too, so not sure that says anything22:59
jlkso, is there something I need to fix or...?23:01
clarkbjlk: I don't think so. we just have to handle the lower case version internally since that is what it gets normalized to23:04
jlkokay23:05
openstackgerritMerged openstack-infra/zuul master: Fix broken command tasks in handlers  https://review.openstack.org/57478623:13
openstackgerritMerged openstack-infra/zuul master: Fix command tasks with free play strategy  https://review.openstack.org/57478723:13
openstackgerritMerged openstack-infra/zuul master: Add blocks to the zuul stream test  https://review.openstack.org/57464323:13
openstackgerritMerged openstack-infra/zuul master: Temporarily override Ansible linear strategy (1/2)  https://review.openstack.org/57525023:13
corvusother changes are still running tests23:14
corvusbut they've both passed at least one of the py35 / py36 jobs23:15
openstackgerritMerged openstack-infra/zuul master: Temporarily override Ansible linear strategy (2/2)  https://review.openstack.org/57524823:21
openstackgerritMerged openstack-infra/zuul master: Add test for shell after include_role  https://review.openstack.org/57448723:21
clarkbok I think that those are the changes we wanted for restarting23:21
* clarkb migrates conversation to -infra23:21
openstackgerritMerged openstack-infra/zuul master: Log github delivery ids properly  https://review.openstack.org/57523123:24
