*** rlandy|rover is now known as rlandy|rover|bbl | 00:00 | |
*** elyezer has joined #zuul | 00:14 | |
*** jimi|ansible has joined #zuul | 02:24 | |
*** myoung|bbl is now known as myoung | 02:28 | |
*** dtruong has quit IRC | 02:45 | |
*** dtruong has joined #zuul | 02:47 | |
*** rlandy|rover|bbl is now known as rlandy|rover | 02:59 | |
*** rlandy|rover has quit IRC | 02:59 | |
*** bhavik1 has joined #zuul | 03:34 | |
*** dmsimard has quit IRC | 04:13 | |
*** bhavik1 has quit IRC | 04:19 | |
*** pcaruana has joined #zuul | 04:50 | |
*** eliqiao has quit IRC | 04:52 | |
*** pcaruana has quit IRC | 05:18 | |
*** bhavik1 has joined #zuul | 05:45 | |
*** bhavik1 has quit IRC | 05:49 | |
*** Rohaan has joined #zuul | 05:58 | |
*** pcaruana has joined #zuul | 06:06 | |
*** CrayZee has joined #zuul | 06:22 | |
Rohaan | tristanC: Hi, I was able to run an openshift-test zuul job yesterday | 06:44 |
---|---|---|
tristanC | Rohaan: nice! | 06:46 |
tristanC | A container was built per code review? | 06:47 |
Rohaan | Yeah, I saw Builds, DeploymentConfigs being created | 06:47 |
Rohaan | But I was getting some errors in them | 06:48 |
tristanC | Could you paste logs? | 06:48 |
*** pwhalen_ has joined #zuul | 06:49 | |
*** pwhalen has quit IRC | 06:50 | |
*** jpena|off is now known as jpena | 06:51 | |
Rohaan | tristanC: https://pastebin.com/7j86Ssgr | 06:52 |
tristanC | Rohaan: maybe the timeout is too short, perhaps try changing "retries: 120" into "retries: 600" in playbooks/openshift/prepare-namespace.yaml | 06:57 |
tristanC | (you'll have to commit and push that change before recheck) | 06:57 |
Rohaan | One more thing, earlier I used to login as SF Administrator using admin:userpass credentials. But since this installation I can't :( Is there any way to know how can I know what credentials are there for admin | 06:57 |
tristanC | Rohaan: right, admin password is now randomize at deployment time, you can get it using awk '/admin_password/ {print $2}' /etc/software-factory/sfconfig.yaml | 06:58 |
tristanC | Rohaan: change the retries value in playbooks/openshift/deploy-project.yaml too | 07:00 |
Rohaan | tristanC: okay | 07:00 |
*** gtema has joined #zuul | 07:01 | |
*** nguyenhai_ has quit IRC | 07:17 | |
*** threestrands has quit IRC | 07:21 | |
*** hashar has joined #zuul | 07:21 | |
Rohaan | tristanC: This time I saw successful build and deployments, ImagePullErrs also got resolved. but somehow build failed | 07:23 |
Rohaan | Some interesting output I found in /var/log/zuul/executor.log: https://pastebin.com/vjZdZ9KL | 07:24 |
Rohaan | My VM is using only 4GiB of RAM, Could that be a problem? | 07:24 |
openstackgerrit | Tobias Henkel proposed openstack-infra/zuul master: Add test for shell after include_role https://review.openstack.org/574487 | 07:37 |
tobiash | just added two asserts to verify that also streaming actually works ^ | 07:38 |
*** CrayZee has quit IRC | 07:43 | |
openstackgerrit | Tobias Henkel proposed openstack-infra/zuul master: Improve logging of GithubTriggerEvents https://review.openstack.org/575014 | 07:52 |
tristanC | Rohaan: 4GiB should be enough to do poc, but it's rather small, especially if it's hosting openshift too. Anyway you can prevent that executor warning by adding "load_multiplier=6" to the [executor] section of /etc/zuul/zuul.conf (and restart executor after) | 07:55 |
Rohaan | tristanC: Hm, okay | 07:58 |
tristanC | Rohaan: oh and in your case, adding a vcpu would help a lot | 08:10 |
*** electrofelix has joined #zuul | 08:16 | |
Rohaan | tristanC: This might be a noob question, Are you talking about increasing CPUs via VirtualBox menu? Or Adding Virtual CPU? | 08:36 |
tristanC | Rohaan: yes adding CPU to the vm | 08:38 |
Rohaan | tristanC: I restarted vm after increasing CPUs and RAM, but after restarting zuul services I'm getting `/zuul/api/tenant/local/status: Service Unavailable` in dashboard. I checked and all Zuul services are running, nodepool and Zookeeper are running too. But doing nodepool list gives https://pastebin.com/whf0ANm3 | 08:55 |
tristanC | Rohaan: did zookeeper started? (e.g. systemctl status zookeeper). Perhaps try restarting it? | 09:01 |
Rohaan | tristanC: Yep, works. Thx :) | 09:06 |
*** sshnaidm has quit IRC | 09:06 | |
Rohaan | tristanC: is there any limit in gerrit for patch updates? I'm trying to update patch; I can see started job in Zuul check pipeline, but not in gerrit. | 09:25 |
tristanC | Rohaan: you can push any number of new patchset or create new code-review too (depending on if you ammend or commit on top) | 09:25 |
openstackgerrit | Tobias Henkel proposed openstack-infra/zuul master: Clarify usage of the zuul_success variable https://review.openstack.org/575050 | 09:26 |
Rohaan | Somehow It is only showing till 4 patchsets. Looking at openshift logs it seems to be working fine: https://pastebin.com/9nb02aHq . But I haven't seen any green tests yet | 09:28 |
tristanC | Rohaan: you mean zuul didn't commented back on gerrit? | 09:30 |
Rohaan | Anyways I'm starting with a new change | 09:31 |
Rohaan | tristanC: I deleted that change and started new change on which Zuul has commented. | 09:32 |
*** sshnaidm has joined #zuul | 09:42 | |
Rohaan | tristanC: Build failed :( . This is what /var/log/zuul/executor.log shows: https://pastebin.com/Qid7r04m Any ideas? | 09:55 |
tristanC | Rohaan: I can't tell without looking at openshift resources, it failed because it couldn't get the http server pod name, e.g. these tasks didn't worked: https://review.openstack.org/#/c/570669/2/playbooks/openshift/prepare-namespace.yaml L118 | 09:58 |
tristanC | you'll have to run "oc get pods --field-selector=status.phase=Running" and see if the jsonpath is correct | 09:58 |
Rohaan | above command doesn't seem to be working. giving syntax err | 10:01 |
Rohaan | tristanC: oc get pods returns two pods: | 10:05 |
Rohaan | zuul-merger-server-1-l7xp5 1/1 Running 0 1m | 10:05 |
Rohaan | zuul-merger-server-2-build 0/1 Completed 0 1m | 10:05 |
tristanC | Rohaan: and what was the syntax err? It's very possible that the openshift-base job could use better tasks, the goal of this one is to get the pod name of the zuul-merger. | 10:08 |
tristanC | I didn't looked much into it, but there must be an easier ansible task to use instead of the jsonpath | 10:09 |
Rohaan | tristanC: Error: unknown flag: --field-selector | 10:09 |
tristanC | Rohaan: I've tested this job with openshift v3.9.0+46ff3a0-18 | 10:11 |
Rohaan | Oops. I'm on v3.6.1+269e828-9 | 10:12 |
Rohaan | Let me try with that openshift version | 10:13 |
*** jpena is now known as jpena|off | 10:24 | |
*** jpena|off is now known as jpena | 10:32 | |
Rohaan | tristanC: How can I check for jsonpath in field selector output? | 10:40 |
Rohaan | [root@dhcppc1 /]# oc get pods --field-selector=status.phase=Running | 10:41 |
Rohaan | NAME READY STATUS RESTARTS AGE | 10:41 |
Rohaan | demo-project-2-build 1/1 Running 0 13s | 10:41 |
Rohaan | zuul-merger-server-1-hlpd8 1/1 Running 0 1m | 10:41 |
tristanC | Rohaan: nice, you get to the deploy-project.yaml tasks, check L47 of https://review.openstack.org/#/c/570669/2/playbooks/openshift/deploy-project.yaml | 10:44 |
tristanC | and replace "{{ zuul.project.short_name }}" with demo-project | 10:44 |
Rohaan | okay | 10:45 |
Rohaan | tristanC: Do I need to restart zuul/nodepool? I just did recheck but got same result :( | 10:54 |
tristanC | Rohaan: if you update base jobs and push the commit, then zuul should pick it up, but it may be safer to restart the scheduler and executor to ensure they have the right content | 10:57 |
Rohaan | ah, I forgot :D | 10:57 |
Rohaan | tristanC: Were you talking about replacing zuul.project.short_name's all occurences or just line 47. I still got failure after job changes. I only changed line 47 | 11:13 |
*** elyezer has quit IRC | 11:14 | |
tristanC | Rohaan: wait no, dont change the jinja template in the base job, i thought you wanted to manually check the command used in the job | 11:16 |
tristanC | what's the status of the job, did it pass the pre phase? | 11:16 |
Rohaan | how can i check that? | 11:17 |
tristanC | Rohaan: on the review, zuul comments back the job result with a link to the logs | 11:17 |
Rohaan | yeah | 11:17 |
Rohaan | How can I check whether that has passed pre phase | 11:18 |
tristanC | Rohaan: then check the job-output.txt file, or clic the ara folder then the "tasks" button | 11:18 |
Rohaan | okay | 11:18 |
Rohaan | yeah post.yaml and pre.yaml | 11:19 |
Rohaan | it failed in pre | 11:19 |
tristanC | could you paste de job-output.txt content then? | 11:21 |
Rohaan | tristanC: https://ibb.co/kPgCPd | 11:22 |
Rohaan | okay | 11:22 |
Rohaan | tristanC: https://pastebin.com/zXNpmZ8D | 11:23 |
tristanC | Rohaan: it seems like you need to increase the retries count to 600 in the "wait for project image built" task of https://review.openstack.org/#/c/570669/2/playbooks/openshift/build-project.yaml | 11:25 |
tristanC | L86 of playbooks/openshift/build-project.yaml | 11:25 |
tristanC | (and don't forget to commit and push that change) | 11:25 |
Rohaan | Yeah :P | 11:27 |
Rohaan | okay rechecking | 11:27 |
* Rohaan prays that job turns green _|_ | 11:28 | |
*** elyezer has joined #zuul | 11:29 | |
Rohaan | failed :( | 11:39 |
tristanC | aww, what task failed? | 11:39 |
Rohaan | get project pod name | 11:39 |
tristanC | Rohaan: did you changed the retries limit to a bigger value for this task? (at the bottom of build-project.yam) | 11:43 |
Rohaan | It's retries value is set to 600 | 11:45 |
*** jpena is now known as jpena|lunch | 11:45 | |
tristanC | Rohaan: if you click on the ara report, then pre.yaml "tasks", and finally on the red rectangle marked with "FAILED", you'll get the detail of that task, could you paste the command being used? | 11:46 |
Rohaan | - name: get project pod name | 11:47 |
Rohaan | command: oc get pods --field-selector=status.phase=Running -o "jsonpath={.items[?(@.metadata.labels.app=='{{ demo-project }}')].metadata.name}" | 11:47 |
Rohaan | register: _pod_name | 11:47 |
Rohaan | retries: 600 | 11:47 |
Rohaan | delay: 1 | 11:47 |
Rohaan | until: "zuul.project.short_name in _pod_name.stdout" | 11:47 |
*** rlandy has joined #zuul | 11:49 | |
tristanC | Rohaan: oh i see, then edit the build-project.yaml file and replace "{{ demo-project }}" with "{{ zuul.project.short_name }}" | 11:49 |
*** rlandy is now known as rlandy|rover | 11:50 | |
tristanC | Rohaan: in jinja, "{{ }}" means "print that variable", so "{{ demo-project }}" is incorrect. It should either be simply "demo-project" (without the {}), or use the zuul variable | 11:50 |
tristanC | (Ansible may use jinja templating for any string in playbooks) | 11:52 |
Rohaan | tristanC: build-project.yaml or deploy-project.yaml(There is no demo-project occurence in build-project.yaml) | 11:56 |
tristanC | Rohaan: yes, I meant deploy-project.yaml | 11:57 |
Rohaan | tristanC: yay, build succeeded \o/ | 12:12 |
tristanC | Rohaan: at last! Now you should try a bad change on the demo.py script to make sure CI is failing on purpose | 12:13 |
tristanC | Also, now would be a good time to snapshot the vm :-) | 12:20 |
Rohaan | tristanC: Done :) | 12:37 |
*** nguyenhai has joined #zuul | 12:39 | |
*** elyezer has quit IRC | 12:41 | |
mordred | tristanC, Rohaan: super cool! | 12:50 |
*** pwhalen_ is now known as pwhalen | 13:05 | |
*** pwhalen has quit IRC | 13:05 | |
*** pwhalen has joined #zuul | 13:05 | |
*** dkranz has joined #zuul | 13:24 | |
*** Rohaan has quit IRC | 13:28 | |
*** Rohaan has joined #zuul | 13:43 | |
corvus | clarkb, mordred, tobiash: any updates on https://storyboard.openstack.org/#!/story/2002528 since yesterday? | 13:55 |
tobiash | corvus: no, I didn't have time today (except adding two asserts which would make your reproducer ready for merge if the issue is fixed) | 13:56 |
corvus | tobiash: ah thanks | 13:56 |
tobiash | I'm currently fixing a github race | 13:56 |
tobiash | corvus: I have a problem with github which anounces pr's sometimes before the head commit is querieable | 13:56 |
mordred | corvus: no, I haven't gotten to do anything actually technical yet today (morning of meetings and meeting prep for me) | 13:57 |
tobiash | that results in occational 404 exceptions in the github driver event handler | 13:57 |
mordred | corvus: my plan for when that's done is to take http://paste.openstack.org/show/723335, make a PR to ansible with it, then make a patch on top of tobiash's patch to make a local strategy plugin with that logic change | 13:58 |
tobiash | corvus: what would you prefer, a local retry mechanism around getCommit or handling the exception and reenqueue the event? | 13:58 |
corvus | tobiash: i thought we had a delay in the github driver; what about increasing that? | 13:59 |
tobiash | corvus: the delay was removed after solving our caching issues | 13:59 |
tobiash | corvus: that is the exception http://paste.openstack.org/show/723388/ | 13:59 |
tobiash | and I'd prefer a retry over a general delay | 13:59 |
corvus | tobiash: hrm. i'm not sure we should have removed the delay -- we encountered this problem before and that's why it was added. the same thing is true of the gerrit driver, and we solved it there with the delay. | 14:00 |
tobiash | because it's great to not have this delay :) | 14:00 |
tobiash | the problem with a delay is how to adjust it right to be 'safe enough' without having too much impact on delays | 14:02 |
tobiash | a retry would just work | 14:02 |
tobiash | back in the v2 days we also replaced the gerrit delay by a retry which worked great at that time | 14:04 |
corvus | tobiash: the advantage of a delay is that, in a system where it's needed, things work correctly the first time. | 14:04 |
*** hashar is now known as hasharAway | 14:05 | |
tobiash | where a retry would be more resilient to intermittend failures other than such races | 14:05 |
corvus | in a system that's normally delayed, you're sending twice as much network traffic because every other request is a failure | 14:06 |
*** jpena|lunch is now known as jpena | 14:06 | |
tobiash | is that the case for gerrit? | 14:06 |
tobiash | in our case we had sometimes delays, but not that often | 14:06 |
corvus | in ours, it's fairly continuous | 14:06 |
tobiash | (gerrit last year) | 14:06 |
corvus | or at least it was, several gerrit versions ago :) | 14:07 |
corvus | it simply took gerrit too long to actually commit the transactions involved | 14:07 |
corvus | i suppose the best approach would be both. a delay with a back-off triggered by failures and retries. :) | 14:08 |
tobiash | that would be great | 14:08 |
corvus | tobiash: if you feel strongly about avoiding the delay, i'd be okay with a local retry. i think that's better than re-enqueing the event (which brings up ordering issues). and maybe later we can add the delay back in and hook it up to the retries? | 14:09 |
tobiash | but at least in the github case a simple delay would do it properly (we have probably less than 1% of events that hit this race today) | 14:09 |
corvus | tobiash: i assume you meant 'retry' in that last comment? | 14:09 |
tobiash | corvus: oh yes ;) | 14:10 |
tobiash | corvus: I was also thinking about the ordering issues thus I implemented a local retry and currently trying to craft a test case | 14:10 |
corvus | ++ | 14:11 |
*** Rohaan has quit IRC | 14:11 | |
tobiash | corvus: what about gerrit, would you like that combined approach? | 14:11 |
corvus | yeah, i think it would be great. it's fascinating to me that both systems have the same flaw. :) | 14:12 |
tobiash | yes, in github it's probably by design because of performance | 14:12 |
corvus | mordred: what's the story with http://paste.openstack.org/show/723335/ ? | 14:13 |
tobiash | when requesting a commit you even get diffs back in the response | 14:13 |
tobiash | corvus: also if you have time this week I'd like to discuss a repo-state thing that has bitten us | 14:15 |
tobiash | maybe after the ansible things are sorted out? | 14:16 |
corvus | tobiash: sounds good | 14:16 |
tobiash | :) | 14:16 |
*** Rohaan has joined #zuul | 14:20 | |
openstackgerrit | David Shrewsbury proposed openstack-infra/nodepool master: Make a proper driver notification API https://review.openstack.org/574896 | 14:22 |
fungi | tobiash: corvus: is there a reason we've still got https://storyboard.openstack.org/#!/story/2002177 set to private since https://review.openstack.org/574788 is public (and merged for close to a day now)? | 14:22 |
mordred | corvus: https://github.com/ansible/ansible/pull/41495 - made the PR | 14:24 |
*** Rohaan has quit IRC | 14:24 | |
tobiash | fungi: the upstream bug is https://bugzilla.redhat.com/show_bug.cgi?id=1588855 | 14:24 |
openstack | bugzilla.redhat.com bug 1588855 in vulnerability "ansible: Failed tasks do not honour no_log option allowing for secrets to be disclosed in logs" [Medium,New] - Assigned to security-response-team | 14:24 |
tobiash | it was private and I got access so cannot see if it's still private | 14:24 |
fungi | tobiash: it's public | 14:24 |
fungi | (note even our meetbot was able to scrape it) | 14:24 |
mordred | corvus: the tl;dr is that the linear strategy plugin makes fake Noop Task objects that it sticks into its task queue for non-matching hosts - but it also only fires the callback method on the first task/host in the queue | 14:25 |
tobiash | just saw it :) | 14:25 |
fungi | tobiash: as is the commit message for the change which merged to fix that in ansible | 14:25 |
fungi | or, rather, changelog entry i mean | 14:25 |
*** elyezer has joined #zuul | 14:25 | |
mordred | corvus: so if the first host for a given task doesn't match, the callback gets fired with the fake Noop task, which means setting the parameter on that task in the callback doesn't persist to the real Task | 14:25 |
*** dmsimard has joined #zuul | 14:29 | |
*** dmsimard has joined #zuul | 14:29 | |
tobiash | mordred: fwiw in local tests I never saw the callback handling a noop task | 14:33 |
tobiash | mordred: so this could be a red herring | 14:33 |
openstackgerrit | Tobias Henkel proposed openstack-infra/zuul master: Retry queries for commits https://review.openstack.org/575137 | 14:47 |
tobiash | fungi: so I think we can make our story public too | 14:48 |
fungi | tobiash: mostly wondering what we risk disclosing in the story at this point since we already merged the fix for it with a relatively detailed commit message | 14:49 |
tobiash | fungi: it was just late in the evening for me and everyone else working on the other ansible problems | 14:50 |
tobiash | fungi: so I think just no one thought of opening that story | 14:50 |
fungi | no problem, just making sure i wasn't missing anything there. in openstack, our vulnerability management process is to switch embargoed bug reports to public before pushing any change into public code review | 14:50 |
fungi | which then allows us to reference the report in the commit messages too | 14:50 |
tobiash | I'll probably writing a mail later today | 14:50 |
fungi | so doing something similar in zuul would make sense to me anyway | 14:51 |
fungi | thanks! | 14:51 |
tobiash | fungi: maybe we should do that the next time | 14:51 |
tobiash | thanks for that tipp | 14:51 |
fungi | my pleasure | 14:51 |
* tobiash seems to remain being the bad security news poster ;) | 14:52 | |
tobiash | have to run now | 14:52 |
pabelanger | Hmm, an untrusted job can't use zuul_return, well can't delegate to localhost | 15:20 |
*** dmsimard has quit IRC | 15:21 | |
pabelanger | what are thoughts about changing that? Otherwise, parent / child jobs would all need to be trusted, right? | 15:29 |
mordred | pabelanger: do you need to delegate_to: localhost? could you just hosts: localhost: tasks: - zuul_return ? | 15:37 |
* mordred thinking out loud | 15:37 | |
pabelanger | mordred: I think we still block that too, but I can test | 15:38 |
mordred | pabelanger: I think that is easy to add zuul_return to a localhost whitelist - I think it might be harder to whitelist zuul_return for delegate_to - but I might be wrong and it might be the exact same work | 15:38 |
pabelanger | mordred: yah, doesn't need to be delegate_to, hosts: localhost would be fine | 15:40 |
mordred | (in any case, I think it's appropriate to allow use of zuul_return on localhost) | 15:40 |
mordred | corvus: ^^ yeah? | 15:40 |
Shrews | tobiash: corvus: Am I correct in thinking that pools within the static driver are essentially useless? Because the name for a node in the config must be unique (hostname or IP based on https://zuul-ci.org/docs/nodepool/configuration.html#static-driver), we really don't need the pool concept | 15:51 |
pabelanger | mordred: yah, hosts: localhost same issue. But happy to use that syntax if we are to whitelist zuul_return | 15:52 |
mordred | pabelanger, corvus: we could also write zuul_return as an action plugin instead of a module so that it's just inherently always something that runs on localhost | 15:54 |
tobiash | pabelanger: delegate or directly localhost doesn't make any difference regarding whitelisting | 15:56 |
tobiash | That probably just needs whitelisting in normal.py | 15:57 |
*** myoung is now known as myoung|biaf | 15:58 | |
tobiash | But as mordred said action plugin would be universal and you wouln't have to take care of any delegate or localhost | 15:58 |
pabelanger | yah, I like that actually | 15:58 |
tobiash | Shrews: I think you're right | 15:59 |
*** weshay is now known as weshay|ruck | 16:00 | |
*** myoung|biaf is now known as myoung | 16:03 | |
*** rlandy|rover is now known as rlandy | 16:06 | |
*** myoung is now known as myoung|biaf | 16:07 | |
corvus | mordred: i agree with all the things you said | 16:08 |
openstackgerrit | David Shrewsbury proposed openstack-infra/nodepool master: Pre-register static nodes https://review.openstack.org/574897 | 16:10 |
openstackgerrit | David Shrewsbury proposed openstack-infra/nodepool master: Pre-register static nodes https://review.openstack.org/574897 | 16:10 |
openstackgerrit | David Shrewsbury proposed openstack-infra/nodepool master: Pre-register static nodes https://review.openstack.org/574897 | 16:13 |
corvus | mordred, tobiash, clarkb: i've started local debugging of the include issue: http://paste.openstack.org/show/723399/ | 16:15 |
corvus | that looks like it's making the start callback in the order i'd expect | 16:16 |
clarkb | corvus: ok, I'm sort of distracted, but if I can help let me know | 16:16 |
clarkb | corvus: check the id() of the task object maybe? | 16:16 |
corvus | yeah, that's where i'm going next | 16:17 |
openstackgerrit | David Shrewsbury proposed openstack-infra/nodepool master: Clarify static node name uniqueness in docs https://review.openstack.org/575159 | 16:18 |
tobiash | Shrews: Just thought a second time about uniqueness of static nodes | 16:24 |
tobiash | Shrews: we haven't thought that there can be multiple static nodes be mapped behind a single ip address but different ports (nat use case) | 16:24 |
Shrews | tobiash: that sounds like new functionality. i'll leave that for someone else to add ;) | 16:25 |
tobiash | I think we had this already working | 16:26 |
tobiash | but not sure anymore | 16:26 |
*** dmsimard has joined #zuul | 16:26 | |
tobiash | we at least had already nodes behind a nat gateway | 16:26 |
openstackgerrit | David Shrewsbury proposed openstack-infra/nodepool master: Clarify static node name uniqueness in docs https://review.openstack.org/575159 | 16:26 |
Shrews | tobiash: hmm | 16:27 |
tobiash | maybe that just magically worked with the dynamic driver? | 16:27 |
tobiash | I think that just didn't care about uniqueness | 16:28 |
Shrews | tobiash: the old _checkConcurrency() method didn't take port into consideration. i'd be surprised if that worked | 16:28 |
Shrews | and if it did, maybe accidentally | 16:28 |
tobiash | but it's just a list with two nodes with same address and same label | 16:28 |
Shrews | nothing explicitly pointed it out | 16:28 |
tobiash | it's walking down the list until the first free node | 16:28 |
*** myoung|biaf is now known as myoung|lunch | 16:29 | |
corvus | can we split name/ip? name is required to be unique, but ip isn't? | 16:29 |
tobiash | name/hostname? | 16:30 |
tobiash | or would ip also accept a dns name? | 16:30 |
tobiash | but that split sounds good | 16:30 |
tobiash | just checked our production config and we don't use that nat gateway anymore, so that at least wouldn't break me immediately | 16:31 |
*** hasharAway is now known as hashar | 16:34 | |
*** ianychoi has quit IRC | 16:34 | |
*** ianychoi has joined #zuul | 16:35 | |
corvus | http://paste.openstack.org/show/723404/ the task executor does in fact get a different task object the second time | 16:55 |
openstackgerrit | Paul Belanger proposed openstack-infra/zuul master: Allow zuul_return in untrusted jobs https://review.openstack.org/575173 | 16:57 |
pabelanger | corvus: tobiash: mordred: ^first attempt at whitelisting zuul_return | 16:57 |
corvus | i've confirmed that the iterator in linear.run() gets two different task objects for the failing task | 17:04 |
tobiash | pabelanger: commented | 17:05 |
corvus | i think that means we need to understand _get_next_task_lockstep | 17:05 |
pabelanger | tobiash: thanks, let me figure that out | 17:08 |
pabelanger | any idea how to get the executor-lib path from ansible? | 17:14 |
pabelanger | eg: /tmp/tmpag22d2js/zuul-test/executor-lib/ansible/zuul/ansible/library/zuul_return.py | 17:14 |
clarkb | zuul knows what it is rgiht? is it added as a variable by zuul to the zuul dict? | 17:17 |
tobiash | pabelanger: I think there is no env variable but it writes the library path to the config | 17:17 |
tobiash | pabelanger: so I guess that ansible has some api to get the library path | 17:17 |
clarkb | hrm we only get the work root and stuff below that | 17:18 |
*** myoung|lunch is now known as myoung | 17:19 | |
tobiash | pabelanger: maybe the path cache contains what you want? | 17:22 |
tobiash | https://github.com/ansible/ansible/blob/devel/lib/ansible/plugins/loader.py#L23 | 17:22 |
tobiash | hm I guess not | 17:23 |
*** jpena is now known as jpena|off | 17:24 | |
mordred | corvus: that's what my ansible PR is about (I just pushed up a second version of it - missed a variable in the first one) | 17:29 |
clarkb | mordred: I'm the wrong person to review that ansible PR but I would be concerned that if all the tasks end up being noops then your callback won't run. This maybe makes sense given its on_start and you never really start the task however it is a behavior change I think | 17:31 |
fungi | okay, this is weird. now the error i'm hitting is that `/etc/ansible/hosts/openstack --list` is broken on puppetmaster.o.o, with a 404 getting raised by https://auth.vexxhost.net/v2.0/tokens | 17:34 |
fungi | er, wrong channel, sorry1 | 17:35 |
corvus | mordred: i'm not sure we're on the same page; are you around now? | 17:37 |
*** gtema has quit IRC | 17:40 | |
tobiash | mordred: I ran the test case with the ansible from your branch and it crashed ;) | 17:41 |
tobiash | http://paste.openstack.org/show/723407/ | 17:41 |
corvus | i need to afk for a bit | 17:42 |
*** electrofelix has quit IRC | 17:45 | |
mordred | tobiash: excellent! | 17:48 |
*** yolanda has quit IRC | 17:50 | |
mordred | corvus: well, there is a provision for running it on the first task if all the tasks are noop - but also we might not be on the same page as well | 17:52 |
openstackgerrit | Paul Belanger proposed openstack-infra/zuul master: Allow zuul_return in untrusted jobs https://review.openstack.org/575173 | 18:03 |
pabelanger | tobiash: clarkb: ^found a way to get the info | 18:03 |
pabelanger | thanks for the pointer tobiash | 18:03 |
*** openstackgerrit has quit IRC | 18:04 | |
*** yolanda has joined #zuul | 18:05 | |
*** myoung is now known as myoung|biab | 18:08 | |
*** harlowja has joined #zuul | 18:10 | |
corvus | mordred: in the include_role case, there are no no-op tasks | 18:17 |
tobiash | pabelanger: added another comment | 18:18 |
tobiash | pabelanger: lgtm otherwise | 18:18 |
corvus | mordred: it looks like the playbook iterator is actually producing two different task objects for the shell task following the include_role (normally it returns the same task object for both hosts) | 18:19 |
corvus | the play compiler returns a single task object for each task. normally that same object is returned by the iterator and used to run the tasks. but in the case of the task-after-include_role, the task object the compiler produces is not used for either of the hosts. something creates two entirely new tasks. that's as far as i've gotten. | 18:22 |
*** openstackgerrit has joined #zuul | 18:23 | |
openstackgerrit | Paul Belanger proposed openstack-infra/zuul master: Allow zuul_return in untrusted jobs https://review.openstack.org/575173 | 18:23 |
pabelanger | tobiash: looking | 18:23 |
openstackgerrit | Paul Belanger proposed openstack-infra/zuul master: Allow zuul_return in untrusted jobs https://review.openstack.org/575173 | 18:31 |
pabelanger | tobiash: okay, that should cover future changes^ | 18:31 |
openstackgerrit | Paul Belanger proposed openstack-infra/zuul master: Allow zuul_return in untrusted jobs https://review.openstack.org/575173 | 18:32 |
tobiash | pabelanger: thanks, +2 | 18:35 |
pabelanger | tobiash: yay | 18:35 |
mordred | corvus: ah - gotcha. thank you - that makes sense -- or, at least, the words you said do | 18:36 |
tobiash | mordred: note that the error only happens if you have an include_role task before | 18:36 |
tobiash | so that seems to have some weird sort of side effect | 18:36 |
pabelanger | mordred: clarkb: corvus: do you also mind looking at 575173^ | 18:36 |
*** elyezer has quit IRC | 18:40 | |
corvus | pabelanger: will be happy to after i'm done with the ansible error | 18:43 |
clarkb | pabelanger: is the default module path not where we install external job defined modules too? | 18:45 |
pabelanger | clarkb: no, it should be tmp/tmpag22d2js/zuul-test/executor-lib/ansible/zuul/ansible/library for all jobs, which is only used by zuul I think | 18:48 |
clarkb | pabelanger: where do other modules get installed? | 18:49 |
clarkb | or maybe that isn't implemented yet? | 18:49 |
pabelanger | clarkb: I don't think they do, they are loaded directly from their src_root path | 18:49 |
clarkb | got it | 18:50 |
clarkb | and those paths are not added to the default module path? | 18:50 |
clarkb | we have to tell ansible how to find them somehow and it seems like the moduel path in config is how one might do it | 18:50 |
pabelanger | right, that is the value we set at https://git.zuul-ci.org/cgit/zuul/tree/zuul/executor/server.py#n1400 | 18:50 |
corvus | mordred, tobiash: https://github.com/ansible/ansible/blob/devel/lib/ansible/playbook/included_file.py#L150 | 18:54 |
corvus | mordred, tobiash: https://github.com/ansible/ansible/blob/devel/lib/ansible/plugins/strategy/linear.py#L331 | 18:55 |
mordred | corvus: ooh | 18:56 |
corvus | the linear strategy processes the results of the include_role task, which causes new copies of subsequent(?!) tasks to be made... apparently | 18:56 |
mordred | that ... | 18:57 |
tobiash | interesting | 18:59 |
*** hashar is now known as hasharAway | 19:00 | |
corvus | it should only be copying the include_role task | 19:03 |
corvus | so i'm not sure how the block above it gets copied | 19:03 |
corvus | mordred, tobiash: we're deep into ansible spaghetti here. do we need to pause and regroup and see if we should try to solve this another way? | 19:06 |
mordred | corvus: yes | 19:06 |
tobiash | corvus: you mean maybe switching to the alternative log streaming solution? | 19:07 |
corvus | i know of two alternate ideas so far: | 19:08 |
corvus | 1) clarkb suggested we could use the task uuid itself (i think that should generally work, except we need to add an inventory host uniqueness to it, in case the same task is run on the same physical host under two different inventory hosts) | 19:08 |
corvus | 2) the unix socket log streaming. i'm not sure if we've thought about the interaction of that with containers, so if we go down that road, we should give that at least some quick thought. | 19:09 |
mordred | I do not believe 1 will work - we run in to the same issue | 19:10 |
mordred | (I investigated 1 for a while yesterday) | 19:10 |
corvus | ah, i guess the task uuid isn't automatically in the module args? | 19:10 |
mordred | yah - that is the issue | 19:10 |
corvus | mordred: however, the issue doesn't seem to be that on_task_start isn't being called, it is, it's just getting a different task every time | 19:11 |
mordred | it turns out that ansible provides VERY minimal things to module args by default ... basically nothing | 19:11 |
corvus | mordred: so if we update the existing code to do what it's doing now, except, instead of generating the uuid, it just uses task._uuid.... maybe that will work? | 19:11 |
mordred | corvus: right- and with the linear strategy, we only get a task_start once per task/host pair | 19:11 |
corvus | mordred: oooh i see | 19:12 |
corvus | so indeed it won't be called on the second one | 19:12 |
mordred | exactly | 19:12 |
mordred | same problem as the thing I was chasing, even though the mechanism you found for producing that problem is different | 19:12 |
corvus | oh, i have a meeting now, have to run. biab | 19:13 |
*** myoung|biab is now known as myoung | 19:15 | |
*** gtema has joined #zuul | 19:27 | |
*** gtema has quit IRC | 19:57 | |
clarkb | is there a tldr on how event processing works for github triggers? because I have a ~40 minute old event that is logged as being sent in github, logged in zuul web, but nothing from the scheduler doing anything with it | 20:30 |
clarkb | reading the code the X-Github-Delivery field is what I should grep for | 20:37 |
clarkb | but I don't see any of those logs in the scheduler at all | 20:38 |
clarkb | weird now it shows up in there | 20:40 |
clarkb | I guess zuul was just behind on event processing | 20:40 |
clarkb | I found a bug /me makes a fix | 20:46 |
openstackgerrit | Clark Boylan proposed openstack-infra/zuul master: Log github delivery ids properly https://review.openstack.org/575231 | 20:56 |
corvus | mordred: so regarding the domain socket logging -- does this impact future container work at all? | 20:58 |
corvus | it shouldn't be an issue with k8s_raw module, nor would it be for a container that ran ssh that we just used with the ssh connection plugin | 21:00 |
corvus | that leaves the kubectl connection plugin | 21:00 |
clarkb | corvus: mordred what is the requirement for the logger? that it get a unique string for each task? could we just generate a uuid on the remote side when it starts logging to have that? | 21:01 |
clarkb | basically does this have to come from the control side of ansible or can we just make up a number | 21:02 |
corvus | the current system requires running zuul_stream on the worker. even if kubectl started zuul_stream, ansible isn't going to be able to connect to it without setting up ingress | 21:02 |
corvus | so i think that means that the current system won't work with kubectl, so switching to domain sockets won't be a regression | 21:03 |
corvus | mordred: does that sound right? | 21:03 |
*** openstackgerrit has quit IRC | 21:04 | |
corvus | mordred: sorry, s/zuul_stream/zuul_console/ in that ^ | 21:04 |
corvus | clarkb: the zuul_stream callback plugin needs to know the ID that the command module will use on the remode side | 21:05 |
corvus | clarkb: so today, that's handled by the zuul_stream callback plugin generating an id, sticking it in the arguments for the command task, then using that id internally to start up the stream receiver and request the log stream | 21:06 |
corvus | clarkb: any piece of information that's shared between the callback plugin on the local side and the command module on the remote side could be used. the point mordred made earlier is that there isn't really any shared info there by default. we have to add it ourselves. and adding it is proving problematic. | 21:07 |
clarkb | gotcha because it is an explicit request for a specific file from the streamer | 21:07 |
corvus | clarkb: yep. and it's an explicit request because we have to keep supplying old files, because the remote side can complete a command, and the local side can start a new one, all before the local side has finished receiving the data from the earlier command. in other words, they can overlap. | 21:09 |
clarkb | could we use the task name for that instead? | 21:09 |
corvus | they aren't unique | 21:09 |
clarkb | normalize it to fs safe string, both sides have that info iirc | 21:09 |
clarkb | oh for some reason I thought they were | 21:09 |
corvus | clarkb: but still, the command module doesn't have it. | 21:10 |
corvus | the only information the command module has is the arguments passed to it. | 21:10 |
corvus | clarkb: this is what we have to work with on the remote side: http://paste.openstack.org/show/723414/ | 21:11 |
clarkb | hrm | 21:11 |
corvus | i'm starting to think we need to revert to 2.3 | 21:13 |
clarkb | could we provide an override of the linear strategy module and call our callback on every task? | 21:14 |
clarkb | seems like the issue is it is only called the once | 21:14 |
clarkb | but if it were called each time then copies of tasks would be ok? | 21:14 |
corvus | clarkb: that might work | 21:14 |
clarkb | if ^ doesn't work then ya I'm stumped as to fixing in short period of time | 21:15 |
corvus | probably best to think of that as a short term fix | 21:15 |
clarkb | yup | 21:15 |
corvus | (we've had users unable to make progress on their work since friday, so this is becoming a serious issue for us) | 21:16 |
clarkb | I'm not really in a great position to write and tset ^ either | 21:19 |
clarkb | I think a revert is likely doable as I doubt many have updated to ansible 2.5 specific code | 21:19 |
corvus | clarkb: i'm informally testing that idea real quick | 21:20 |
*** hasharAway has quit IRC | 21:22 | |
corvus | clarkb: yeah, i think that might work | 21:28 |
corvus | clarkb: we just add this to zuul_stream: http://paste.openstack.org/show/723415/ | 21:28 |
corvus | clarkb: and then add that callback after the v2_on_task_start callback in the linear strategy plugin | 21:29 |
corvus | (except always call it) | 21:29 |
clarkb | outside of the block that checks ya that | 21:29 |
corvus | it passes my local testing with my hacked up copy of ansible | 21:29 |
corvus | i'll try to form that into a real patch | 21:30 |
clarkb | cool I can review it when its up | 21:30 |
*** openstackgerrit has joined #zuul | 21:54 | |
openstackgerrit | James E. Blair proposed openstack-infra/zuul master: Add test for shell after include_role https://review.openstack.org/574487 | 21:54 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul master: Temporarily override Ansible linear strategy https://review.openstack.org/575248 | 21:54 |
corvus | it occurred to me that you probably can't see the changes to linear.py easily.. i can split that into two changes real quick | 21:55 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul master: Temporarily override Ansible linear strategy (2/2) https://review.openstack.org/575248 | 21:58 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul master: Add test for shell after include_role https://review.openstack.org/574487 | 21:58 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul master: Temporarily override Ansible linear strategy (1/2) https://review.openstack.org/575250 | 21:58 |
corvus | clarkb, mordred, tobiash: ^ | 21:58 |
clarkb | what calls import ansible stretgey plugin? (sorry reviewing over sandwich so that isnt exact name) | 22:03 |
*** harlowja has quit IRC | 22:11 | |
*** harlowja has joined #zuul | 22:12 | |
corvus | clarkb: nothing; it's left over from tobiash's change. it would be necessary if we needed to import the real plugin. | 22:18 |
corvus | but in this case, the method we're changing is most of the plugin anyway, so i just copied id. | 22:19 |
corvus | it | 22:19 |
*** myoung is now known as myoung|off | 22:20 | |
*** harlowja has quit IRC | 22:21 | |
clarkb | corvus: https://review.openstack.org/#/c/575248/2/zuul/ansible/paths.py that is leftover? | 22:27 |
clarkb | oh I see its there to allow us to get at the real one | 22:27 |
clarkb | but as you say we just copy the whole thing | 22:27 |
corvus | yeah, we could remove it since it's unused; i just left it there to ride along with the rest of the framework. | 22:28 |
mordred | corvus: yes - I think what you said about domain sockets not being a regression is correct | 22:34 |
mordred | corvus: and yes- I think temporarily overriding the whole strategy should give us time to figure out a proper fix | 22:36 |
clarkb | corvus: the proposed stack lgtm as short term fix | 22:36 |
clarkb | I've +2'd those change that needed reviews | 22:36 |
mordred | since, it turns out - this one is hairy | 22:36 |
corvus | mordred: cool, so we'll try to land 575248 for now, and then work on the domain socket as the "real" solution? | 22:36 |
mordred | yes | 22:37 |
mordred | I think that's the best bet | 22:37 |
corvus | oh, 575248's parent isn't going to pass tests because i didn't put the flake8 note in there | 22:37 |
corvus | i'll fix that in just a sec | 22:37 |
corvus | mordred: now the real question: what do we need to do to progress the domain socket work? | 22:38 |
mordred | also - I thnik for kubectl, we might be able to get the console stream from the k8s api | 22:38 |
mordred | since it has a console output endpoint | 22:38 |
clarkb | that won't give you the ansible output though | 22:38 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul master: Temporarily override Ansible linear strategy (1/2) https://review.openstack.org/575250 | 22:38 |
clarkb | will it? | 22:38 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul master: Temporarily override Ansible linear strategy (2/2) https://review.openstack.org/575248 | 22:38 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul master: Add test for shell after include_role https://review.openstack.org/574487 | 22:38 |
clarkb | seems like that is the wrong layer | 22:38 |
mordred | clarkb: oh - you're right - that won't work | 22:39 |
mordred | corvus: I think it might be helpful to have a phone call to figure out what to do to finish the domain socket work | 22:39 |
corvus | mordred: have time now? | 22:40 |
mordred | I think it's a bit late for my brain to dive in fully - how about first thing your morning? (that way I can go re-read patches and remind myself where we are) | 22:40 |
corvus | sounds good... we can use the conf bridge and clarkb and tobiash (and others) can join if they are around. and whatever we come up with we can summarize at the end | 22:41 |
clarkb | I can do 8am tomorrow | 22:41 |
clarkb | pacific | 22:41 |
corvus | wfm | 22:42 |
corvus | looks like https://review.openstack.org/541434 is the bottom of the stack in question | 22:42 |
mordred | ++ | 22:42 |
mordred | yes - that's it | 22:43 |
clarkb | do we want to go ahead and try to merge this current stack nowish? (part of the issue here is node contention with tripleo...) | 22:44 |
clarkb | we could direct enqueue to the gate to get zuul stuff in then do executor restarts | 22:44 |
corvus | clarkb: yeah, i think that's a reasonable course of action | 22:45 |
mordred | corvus, clarkb: I've +2'd or +3'd all of them | 22:46 |
clarkb | corvus: mordred https://review.openstack.org/#/c/574786/1 and https://review.openstack.org/#/c/574643/4 just need to be approved | 22:46 |
clarkb | then we can enqueue to gate I think | 22:47 |
clarkb | I'll let you approve those since you have reviewed them previously | 22:47 |
*** threestrands has joined #zuul | 22:47 | |
openstackgerrit | James E. Blair proposed openstack-infra/zuul master: Allow zuul_return in untrusted jobs https://review.openstack.org/575173 | 22:48 |
clarkb | corvus: has -2 on that first one | 22:48 |
corvus | clarkb: gtg | 22:49 |
clarkb | ok I'll do the enqueue now | 22:50 |
corvus | mordred: thanks i was about to ask you to look at 575173 if you had time :) | 22:51 |
clarkb | ok enqueued | 22:53 |
clarkb | corvus: mordred https://review.openstack.org/575231 that fixes an opsy debugging issue I found today | 22:53 |
mordred | clarkb: that's super weird - and we should talk to jlk about fixing it in the github library | 22:54 |
corvus | clarkb: yeah, i believe all the headers are lowercased now because "different web frameworks handle things differently" | 22:54 |
mordred | oh - wait - nevermind ... that's a thing we construct | 22:54 |
clarkb | I decided to grep on the prefix of that log string and found a bunch of Nones | 22:54 |
mordred | python requests has a caseinsensitivedict that it uses for headers - and I was mistakenly thinking this was based on it | 22:54 |
clarkb | which explained my initial debug confusion of why doesn't this show up in the log | 22:54 |
mordred | yah | 22:55 |
corvus | yeah, i think this is partly because of the fact that we ship the headers over the gearman bus | 22:55 |
mordred | yah | 22:55 |
mordred | we _could_ reconstruct them into a CIdict when we pull them out of that args - but that seems like work | 22:56 |
corvus | now all our uses are lower case :) | 22:56 |
mordred | >>> import requests.structures | 22:57 |
mordred | >>> f = requests.structures.CaseInsensitiveDict({'foo': 'bar'}) | 22:57 |
mordred | >>> f['Foo'] | 22:57 |
mordred | 'bar' | 22:57 |
mordred | just for reference | 22:57 |
clarkb | https://zuul.openstack.org/stream.html?uuid=bae09735e26f44ed985e53b7b792faea&logfile=console.log zuul running its first kata test that didn't die early due to lack of nested virt | 22:59 |
clarkb | looks like its still broken but all of the other tests were too, so not sure that says anything | 22:59 |
jlk | so, is there something I need to fix or...? | 23:01 |
clarkb | jlk: I don't think so. we just have to handle the lower case version internally since that is what it gets normalized to | 23:04 |
jlk | okay | 23:05 |
openstackgerrit | Merged openstack-infra/zuul master: Fix broken command tasks in handlers https://review.openstack.org/574786 | 23:13 |
openstackgerrit | Merged openstack-infra/zuul master: Fix command tasks with free play strategy https://review.openstack.org/574787 | 23:13 |
openstackgerrit | Merged openstack-infra/zuul master: Add blocks to the zuul stream test https://review.openstack.org/574643 | 23:13 |
openstackgerrit | Merged openstack-infra/zuul master: Temporarily override Ansible linear strategy (1/2) https://review.openstack.org/575250 | 23:13 |
corvus | other changes are still running tests | 23:14 |
corvus | but they've both passed at least one of the py35 / py36 jobs | 23:15 |
openstackgerrit | Merged openstack-infra/zuul master: Temporarily override Ansible linear strategy (2/2) https://review.openstack.org/575248 | 23:21 |
openstackgerrit | Merged openstack-infra/zuul master: Add test for shell after include_role https://review.openstack.org/574487 | 23:21 |
clarkb | ok I think that those are the changes we wanted for restarting | 23:21 |
* clarkb migrates conversation to -infra | 23:21 | |
openstackgerrit | Merged openstack-infra/zuul master: Log github delivery ids properly https://review.openstack.org/575231 | 23:24 |
Generated by irclog2html.py 2.15.3 by Marius Gedminas - find it at mg.pov.lt!