Wednesday, 2018-06-13

*** rlandy\|rover is now known as rlandy\|rover\|bbl		00:00
*** elyezer has joined #zuul		00:14
*** jimi\|ansible has joined #zuul		02:24
*** myoung\|bbl is now known as myoung		02:28
*** dtruong has quit IRC		02:45
*** dtruong has joined #zuul		02:47
*** rlandy\|rover\|bbl is now known as rlandy\|rover		02:59
*** rlandy\|rover has quit IRC		02:59
*** bhavik1 has joined #zuul		03:34
*** dmsimard has quit IRC		04:13
*** bhavik1 has quit IRC		04:19
*** pcaruana has joined #zuul		04:50
*** eliqiao has quit IRC		04:52
*** pcaruana has quit IRC		05:18
*** bhavik1 has joined #zuul		05:45
*** bhavik1 has quit IRC		05:49
*** Rohaan has joined #zuul		05:58
*** pcaruana has joined #zuul		06:06
*** CrayZee has joined #zuul		06:22
Rohaan	tristanC: Hi, I was able to run an openshift-test zuul job yesterday	06:44
tristanC	Rohaan: nice!	06:46
tristanC	A container was built per code review?	06:47
Rohaan	Yeah, I saw Builds, DeploymentConfigs being created	06:47
Rohaan	But I was getting some errors in them	06:48
tristanC	Could you paste logs?	06:48
*** pwhalen_ has joined #zuul		06:49
*** pwhalen has quit IRC		06:50
*** jpena\|off is now known as jpena		06:51
Rohaan	tristanC: https://pastebin.com/7j86Ssgr	06:52
tristanC	Rohaan: maybe the timeout is too short, perhaps try changing "retries: 120" into "retries: 600" in playbooks/openshift/prepare-namespace.yaml	06:57
tristanC	(you'll have to commit and push that change before recheck)	06:57
Rohaan	One more thing, earlier I used to login as SF Administrator using admin:userpass credentials. But since this installation I can't :( Is there any way to know how can I know what credentials are there for admin	06:57
tristanC	Rohaan: right, admin password is now randomize at deployment time, you can get it using awk '/admin_password/ {print $2}' /etc/software-factory/sfconfig.yaml	06:58
tristanC	Rohaan: change the retries value in playbooks/openshift/deploy-project.yaml too	07:00
Rohaan	tristanC: okay	07:00
*** gtema has joined #zuul		07:01
*** nguyenhai_ has quit IRC		07:17
*** threestrands has quit IRC		07:21
*** hashar has joined #zuul		07:21
Rohaan	tristanC: This time I saw successful build and deployments, ImagePullErrs also got resolved. but somehow build failed	07:23
Rohaan	Some interesting output I found in /var/log/zuul/executor.log: https://pastebin.com/vjZdZ9KL	07:24
Rohaan	My VM is using only 4GiB of RAM, Could that be a problem?	07:24
openstackgerrit	Tobias Henkel proposed openstack-infra/zuul master: Add test for shell after include_role https://review.openstack.org/574487	07:37
tobiash	just added two asserts to verify that also streaming actually works ^	07:38
*** CrayZee has quit IRC		07:43
openstackgerrit	Tobias Henkel proposed openstack-infra/zuul master: Improve logging of GithubTriggerEvents https://review.openstack.org/575014	07:52
tristanC	Rohaan: 4GiB should be enough to do poc, but it's rather small, especially if it's hosting openshift too. Anyway you can prevent that executor warning by adding "load_multiplier=6" to the [executor] section of /etc/zuul/zuul.conf (and restart executor after)	07:55
Rohaan	tristanC: Hm, okay	07:58
tristanC	Rohaan: oh and in your case, adding a vcpu would help a lot	08:10
*** electrofelix has joined #zuul		08:16
Rohaan	tristanC: This might be a noob question, Are you talking about increasing CPUs via VirtualBox menu? Or Adding Virtual CPU?	08:36
tristanC	Rohaan: yes adding CPU to the vm	08:38
Rohaan	tristanC: I restarted vm after increasing CPUs and RAM, but after restarting zuul services I'm getting `/zuul/api/tenant/local/status: Service Unavailable` in dashboard. I checked and all Zuul services are running, nodepool and Zookeeper are running too. But doing nodepool list gives https://pastebin.com/whf0ANm3	08:55
tristanC	Rohaan: did zookeeper started? (e.g. systemctl status zookeeper). Perhaps try restarting it?	09:01
Rohaan	tristanC: Yep, works. Thx :)	09:06
*** sshnaidm has quit IRC		09:06
Rohaan	tristanC: is there any limit in gerrit for patch updates? I'm trying to update patch; I can see started job in Zuul check pipeline, but not in gerrit.	09:25
tristanC	Rohaan: you can push any number of new patchset or create new code-review too (depending on if you ammend or commit on top)	09:25
openstackgerrit	Tobias Henkel proposed openstack-infra/zuul master: Clarify usage of the zuul_success variable https://review.openstack.org/575050	09:26
Rohaan	Somehow It is only showing till 4 patchsets. Looking at openshift logs it seems to be working fine: https://pastebin.com/9nb02aHq . But I haven't seen any green tests yet	09:28
tristanC	Rohaan: you mean zuul didn't commented back on gerrit?	09:30
Rohaan	Anyways I'm starting with a new change	09:31
Rohaan	tristanC: I deleted that change and started new change on which Zuul has commented.	09:32
*** sshnaidm has joined #zuul		09:42
Rohaan	tristanC: Build failed :( . This is what /var/log/zuul/executor.log shows: https://pastebin.com/Qid7r04m Any ideas?	09:55
tristanC	Rohaan: I can't tell without looking at openshift resources, it failed because it couldn't get the http server pod name, e.g. these tasks didn't worked: https://review.openstack.org/#/c/570669/2/playbooks/openshift/prepare-namespace.yaml L118	09:58
tristanC	you'll have to run "oc get pods --field-selector=status.phase=Running" and see if the jsonpath is correct	09:58
Rohaan	above command doesn't seem to be working. giving syntax err	10:01
Rohaan	tristanC: oc get pods returns two pods:	10:05
Rohaan	zuul-merger-server-1-l7xp5 1/1 Running 0 1m	10:05
Rohaan	zuul-merger-server-2-build 0/1 Completed 0 1m	10:05
tristanC	Rohaan: and what was the syntax err? It's very possible that the openshift-base job could use better tasks, the goal of this one is to get the pod name of the zuul-merger.	10:08
tristanC	I didn't looked much into it, but there must be an easier ansible task to use instead of the jsonpath	10:09
Rohaan	tristanC: Error: unknown flag: --field-selector	10:09
tristanC	Rohaan: I've tested this job with openshift v3.9.0+46ff3a0-18	10:11
Rohaan	Oops. I'm on v3.6.1+269e828-9	10:12
Rohaan	Let me try with that openshift version	10:13
*** jpena is now known as jpena\|off		10:24
*** jpena\|off is now known as jpena		10:32
Rohaan	tristanC: How can I check for jsonpath in field selector output?	10:40
Rohaan	[root@dhcppc1 /]# oc get pods --field-selector=status.phase=Running	10:41
Rohaan	NAME READY STATUS RESTARTS AGE	10:41
Rohaan	demo-project-2-build 1/1 Running 0 13s	10:41
Rohaan	zuul-merger-server-1-hlpd8 1/1 Running 0 1m	10:41
tristanC	Rohaan: nice, you get to the deploy-project.yaml tasks, check L47 of https://review.openstack.org/#/c/570669/2/playbooks/openshift/deploy-project.yaml	10:44
tristanC	and replace "{{ zuul.project.short_name }}" with demo-project	10:44
Rohaan	okay	10:45
Rohaan	tristanC: Do I need to restart zuul/nodepool? I just did recheck but got same result :(	10:54
tristanC	Rohaan: if you update base jobs and push the commit, then zuul should pick it up, but it may be safer to restart the scheduler and executor to ensure they have the right content	10:57
Rohaan	ah, I forgot :D	10:57
Rohaan	tristanC: Were you talking about replacing zuul.project.short_name's all occurences or just line 47. I still got failure after job changes. I only changed line 47	11:13
*** elyezer has quit IRC		11:14
tristanC	Rohaan: wait no, dont change the jinja template in the base job, i thought you wanted to manually check the command used in the job	11:16
tristanC	what's the status of the job, did it pass the pre phase?	11:16
Rohaan	how can i check that?	11:17
tristanC	Rohaan: on the review, zuul comments back the job result with a link to the logs	11:17
Rohaan	yeah	11:17
Rohaan	How can I check whether that has passed pre phase	11:18
tristanC	Rohaan: then check the job-output.txt file, or clic the ara folder then the "tasks" button	11:18
Rohaan	okay	11:18
Rohaan	yeah post.yaml and pre.yaml	11:19
Rohaan	it failed in pre	11:19
tristanC	could you paste de job-output.txt content then?	11:21
Rohaan	tristanC: https://ibb.co/kPgCPd	11:22
Rohaan	okay	11:22
Rohaan	tristanC: https://pastebin.com/zXNpmZ8D	11:23
tristanC	Rohaan: it seems like you need to increase the retries count to 600 in the "wait for project image built" task of https://review.openstack.org/#/c/570669/2/playbooks/openshift/build-project.yaml	11:25
tristanC	L86 of playbooks/openshift/build-project.yaml	11:25
tristanC	(and don't forget to commit and push that change)	11:25
Rohaan	Yeah :P	11:27
Rohaan	okay rechecking	11:27
* Rohaan prays that job turns green _\|_		11:28
*** elyezer has joined #zuul		11:29
Rohaan	failed :(	11:39
tristanC	aww, what task failed?	11:39
Rohaan	get project pod name	11:39
tristanC	Rohaan: did you changed the retries limit to a bigger value for this task? (at the bottom of build-project.yam)	11:43
Rohaan	It's retries value is set to 600	11:45
*** jpena is now known as jpena\|lunch		11:45
tristanC	Rohaan: if you click on the ara report, then pre.yaml "tasks", and finally on the red rectangle marked with "FAILED", you'll get the detail of that task, could you paste the command being used?	11:46
Rohaan	- name: get project pod name	11:47
Rohaan	command: oc get pods --field-selector=status.phase=Running -o "jsonpath={.items[?(@.metadata.labels.app=='{{ demo-project }}')].metadata.name}"	11:47
Rohaan	register: _pod_name	11:47
Rohaan	retries: 600	11:47
Rohaan	delay: 1	11:47
Rohaan	until: "zuul.project.short_name in _pod_name.stdout"	11:47
*** rlandy has joined #zuul		11:49
tristanC	Rohaan: oh i see, then edit the build-project.yaml file and replace "{{ demo-project }}" with "{{ zuul.project.short_name }}"	11:49
*** rlandy is now known as rlandy\|rover		11:50
tristanC	Rohaan: in jinja, "{{ }}" means "print that variable", so "{{ demo-project }}" is incorrect. It should either be simply "demo-project" (without the {}), or use the zuul variable	11:50
tristanC	(Ansible may use jinja templating for any string in playbooks)	11:52
Rohaan	tristanC: build-project.yaml or deploy-project.yaml(There is no demo-project occurence in build-project.yaml)	11:56
tristanC	Rohaan: yes, I meant deploy-project.yaml	11:57
Rohaan	tristanC: yay, build succeeded \o/	12:12
tristanC	Rohaan: at last! Now you should try a bad change on the demo.py script to make sure CI is failing on purpose	12:13
tristanC	Also, now would be a good time to snapshot the vm :-)	12:20
Rohaan	tristanC: Done :)	12:37
*** nguyenhai has joined #zuul		12:39
*** elyezer has quit IRC		12:41
mordred	tristanC, Rohaan: super cool!	12:50
*** pwhalen_ is now known as pwhalen		13:05
*** pwhalen has quit IRC		13:05
*** pwhalen has joined #zuul		13:05
*** dkranz has joined #zuul		13:24
*** Rohaan has quit IRC		13:28
*** Rohaan has joined #zuul		13:43
corvus	clarkb, mordred, tobiash: any updates on https://storyboard.openstack.org/#!/story/2002528 since yesterday?	13:55
tobiash	corvus: no, I didn't have time today (except adding two asserts which would make your reproducer ready for merge if the issue is fixed)	13:56
corvus	tobiash: ah thanks	13:56
tobiash	I'm currently fixing a github race	13:56
tobiash	corvus: I have a problem with github which anounces pr's sometimes before the head commit is querieable	13:56
mordred	corvus: no, I haven't gotten to do anything actually technical yet today (morning of meetings and meeting prep for me)	13:57
tobiash	that results in occational 404 exceptions in the github driver event handler	13:57
mordred	corvus: my plan for when that's done is to take http://paste.openstack.org/show/723335, make a PR to ansible with it, then make a patch on top of tobiash's patch to make a local strategy plugin with that logic change	13:58
tobiash	corvus: what would you prefer, a local retry mechanism around getCommit or handling the exception and reenqueue the event?	13:58
corvus	tobiash: i thought we had a delay in the github driver; what about increasing that?	13:59
tobiash	corvus: the delay was removed after solving our caching issues	13:59
tobiash	corvus: that is the exception http://paste.openstack.org/show/723388/	13:59
tobiash	and I'd prefer a retry over a general delay	13:59
corvus	tobiash: hrm. i'm not sure we should have removed the delay -- we encountered this problem before and that's why it was added. the same thing is true of the gerrit driver, and we solved it there with the delay.	14:00
tobiash	because it's great to not have this delay :)	14:00
tobiash	the problem with a delay is how to adjust it right to be 'safe enough' without having too much impact on delays	14:02
tobiash	a retry would just work	14:02
tobiash	back in the v2 days we also replaced the gerrit delay by a retry which worked great at that time	14:04
corvus	tobiash: the advantage of a delay is that, in a system where it's needed, things work correctly the first time.	14:04
*** hashar is now known as hasharAway		14:05
tobiash	where a retry would be more resilient to intermittend failures other than such races	14:05
corvus	in a system that's normally delayed, you're sending twice as much network traffic because every other request is a failure	14:06
*** jpena\|lunch is now known as jpena		14:06
tobiash	is that the case for gerrit?	14:06
tobiash	in our case we had sometimes delays, but not that often	14:06
corvus	in ours, it's fairly continuous	14:06
tobiash	(gerrit last year)	14:06
corvus	or at least it was, several gerrit versions ago :)	14:07
corvus	it simply took gerrit too long to actually commit the transactions involved	14:07
corvus	i suppose the best approach would be both. a delay with a back-off triggered by failures and retries. :)	14:08
tobiash	that would be great	14:08
corvus	tobiash: if you feel strongly about avoiding the delay, i'd be okay with a local retry. i think that's better than re-enqueing the event (which brings up ordering issues). and maybe later we can add the delay back in and hook it up to the retries?	14:09
tobiash	but at least in the github case a simple delay would do it properly (we have probably less than 1% of events that hit this race today)	14:09
corvus	tobiash: i assume you meant 'retry' in that last comment?	14:09
tobiash	corvus: oh yes ;)	14:10
tobiash	corvus: I was also thinking about the ordering issues thus I implemented a local retry and currently trying to craft a test case	14:10
corvus	++	14:11
*** Rohaan has quit IRC		14:11
tobiash	corvus: what about gerrit, would you like that combined approach?	14:11
corvus	yeah, i think it would be great. it's fascinating to me that both systems have the same flaw. :)	14:12
tobiash	yes, in github it's probably by design because of performance	14:12
corvus	mordred: what's the story with http://paste.openstack.org/show/723335/ ?	14:13
tobiash	when requesting a commit you even get diffs back in the response	14:13
tobiash	corvus: also if you have time this week I'd like to discuss a repo-state thing that has bitten us	14:15
tobiash	maybe after the ansible things are sorted out?	14:16
corvus	tobiash: sounds good	14:16
tobiash	:)	14:16
*** Rohaan has joined #zuul		14:20
openstackgerrit	David Shrewsbury proposed openstack-infra/nodepool master: Make a proper driver notification API https://review.openstack.org/574896	14:22
fungi	tobiash: corvus: is there a reason we've still got https://storyboard.openstack.org/#!/story/2002177 set to private since https://review.openstack.org/574788 is public (and merged for close to a day now)?	14:22
mordred	corvus: https://github.com/ansible/ansible/pull/41495 - made the PR	14:24
*** Rohaan has quit IRC		14:24
tobiash	fungi: the upstream bug is https://bugzilla.redhat.com/show_bug.cgi?id=1588855	14:24
openstack	bugzilla.redhat.com bug 1588855 in vulnerability "ansible: Failed tasks do not honour no_log option allowing for secrets to be disclosed in logs" [Medium,New] - Assigned to security-response-team	14:24
tobiash	it was private and I got access so cannot see if it's still private	14:24
fungi	tobiash: it's public	14:24
fungi	(note even our meetbot was able to scrape it)	14:24
mordred	corvus: the tl;dr is that the linear strategy plugin makes fake Noop Task objects that it sticks into its task queue for non-matching hosts - but it also only fires the callback method on the first task/host in the queue	14:25
tobiash	just saw it :)	14:25
fungi	tobiash: as is the commit message for the change which merged to fix that in ansible	14:25
fungi	or, rather, changelog entry i mean	14:25
*** elyezer has joined #zuul		14:25
mordred	corvus: so if the first host for a given task doesn't match, the callback gets fired with the fake Noop task, which means setting the parameter on that task in the callback doesn't persist to the real Task	14:25
*** dmsimard has joined #zuul		14:29
*** dmsimard has joined #zuul		14:29
tobiash	mordred: fwiw in local tests I never saw the callback handling a noop task	14:33
tobiash	mordred: so this could be a red herring	14:33
openstackgerrit	Tobias Henkel proposed openstack-infra/zuul master: Retry queries for commits https://review.openstack.org/575137	14:47
tobiash	fungi: so I think we can make our story public too	14:48
fungi	tobiash: mostly wondering what we risk disclosing in the story at this point since we already merged the fix for it with a relatively detailed commit message	14:49
tobiash	fungi: it was just late in the evening for me and everyone else working on the other ansible problems	14:50
tobiash	fungi: so I think just no one thought of opening that story	14:50
fungi	no problem, just making sure i wasn't missing anything there. in openstack, our vulnerability management process is to switch embargoed bug reports to public before pushing any change into public code review	14:50
fungi	which then allows us to reference the report in the commit messages too	14:50
tobiash	I'll probably writing a mail later today	14:50
fungi	so doing something similar in zuul would make sense to me anyway	14:51
fungi	thanks!	14:51
tobiash	fungi: maybe we should do that the next time	14:51
tobiash	thanks for that tipp	14:51
fungi	my pleasure	14:51
* tobiash seems to remain being the bad security news poster ;)		14:52
tobiash	have to run now	14:52
pabelanger	Hmm, an untrusted job can't use zuul_return, well can't delegate to localhost	15:20
*** dmsimard has quit IRC		15:21
pabelanger	what are thoughts about changing that? Otherwise, parent / child jobs would all need to be trusted, right?	15:29
mordred	pabelanger: do you need to delegate_to: localhost? could you just hosts: localhost: tasks: - zuul_return ?	15:37
* mordred thinking out loud		15:37
pabelanger	mordred: I think we still block that too, but I can test	15:38
mordred	pabelanger: I think that is easy to add zuul_return to a localhost whitelist - I think it might be harder to whitelist zuul_return for delegate_to - but I might be wrong and it might be the exact same work	15:38
pabelanger	mordred: yah, doesn't need to be delegate_to, hosts: localhost would be fine	15:40
mordred	(in any case, I think it's appropriate to allow use of zuul_return on localhost)	15:40
mordred	corvus: ^^ yeah?	15:40
Shrews	tobiash: corvus: Am I correct in thinking that pools within the static driver are essentially useless? Because the name for a node in the config must be unique (hostname or IP based on https://zuul-ci.org/docs/nodepool/configuration.html#static-driver), we really don't need the pool concept	15:51
pabelanger	mordred: yah, hosts: localhost same issue. But happy to use that syntax if we are to whitelist zuul_return	15:52
mordred	pabelanger, corvus: we could also write zuul_return as an action plugin instead of a module so that it's just inherently always something that runs on localhost	15:54
tobiash	pabelanger: delegate or directly localhost doesn't make any difference regarding whitelisting	15:56
tobiash	That probably just needs whitelisting in normal.py	15:57
*** myoung is now known as myoung\|biaf		15:58
tobiash	But as mordred said action plugin would be universal and you wouln't have to take care of any delegate or localhost	15:58
pabelanger	yah, I like that actually	15:58
tobiash	Shrews: I think you're right	15:59
*** weshay is now known as weshay\|ruck		16:00
*** myoung\|biaf is now known as myoung		16:03
*** rlandy\|rover is now known as rlandy		16:06
*** myoung is now known as myoung\|biaf		16:07
corvus	mordred: i agree with all the things you said	16:08
openstackgerrit	David Shrewsbury proposed openstack-infra/nodepool master: Pre-register static nodes https://review.openstack.org/574897	16:10
openstackgerrit	David Shrewsbury proposed openstack-infra/nodepool master: Pre-register static nodes https://review.openstack.org/574897	16:10
openstackgerrit	David Shrewsbury proposed openstack-infra/nodepool master: Pre-register static nodes https://review.openstack.org/574897	16:13
corvus	mordred, tobiash, clarkb: i've started local debugging of the include issue: http://paste.openstack.org/show/723399/	16:15
corvus	that looks like it's making the start callback in the order i'd expect	16:16
clarkb	corvus: ok, I'm sort of distracted, but if I can help let me know	16:16
clarkb	corvus: check the id() of the task object maybe?	16:16
corvus	yeah, that's where i'm going next	16:17
openstackgerrit	David Shrewsbury proposed openstack-infra/nodepool master: Clarify static node name uniqueness in docs https://review.openstack.org/575159	16:18
tobiash	Shrews: Just thought a second time about uniqueness of static nodes	16:24
tobiash	Shrews: we haven't thought that there can be multiple static nodes be mapped behind a single ip address but different ports (nat use case)	16:24
Shrews	tobiash: that sounds like new functionality. i'll leave that for someone else to add ;)	16:25
tobiash	I think we had this already working	16:26
tobiash	but not sure anymore	16:26
*** dmsimard has joined #zuul		16:26
tobiash	we at least had already nodes behind a nat gateway	16:26
openstackgerrit	David Shrewsbury proposed openstack-infra/nodepool master: Clarify static node name uniqueness in docs https://review.openstack.org/575159	16:26
Shrews	tobiash: hmm	16:27
tobiash	maybe that just magically worked with the dynamic driver?	16:27
tobiash	I think that just didn't care about uniqueness	16:28
Shrews	tobiash: the old _checkConcurrency() method didn't take port into consideration. i'd be surprised if that worked	16:28
Shrews	and if it did, maybe accidentally	16:28
tobiash	but it's just a list with two nodes with same address and same label	16:28
Shrews	nothing explicitly pointed it out	16:28
tobiash	it's walking down the list until the first free node	16:28
*** myoung\|biaf is now known as myoung\|lunch		16:29
corvus	can we split name/ip? name is required to be unique, but ip isn't?	16:29
tobiash	name/hostname?	16:30
tobiash	or would ip also accept a dns name?	16:30
tobiash	but that split sounds good	16:30
tobiash	just checked our production config and we don't use that nat gateway anymore, so that at least wouldn't break me immediately	16:31
*** hasharAway is now known as hashar		16:34
*** ianychoi has quit IRC		16:34
*** ianychoi has joined #zuul		16:35
corvus	http://paste.openstack.org/show/723404/ the task executor does in fact get a different task object the second time	16:55
openstackgerrit	Paul Belanger proposed openstack-infra/zuul master: Allow zuul_return in untrusted jobs https://review.openstack.org/575173	16:57
pabelanger	corvus: tobiash: mordred: ^first attempt at whitelisting zuul_return	16:57
corvus	i've confirmed that the iterator in linear.run() gets two different task objects for the failing task	17:04
tobiash	pabelanger: commented	17:05
corvus	i think that means we need to understand _get_next_task_lockstep	17:05
pabelanger	tobiash: thanks, let me figure that out	17:08
pabelanger	any idea how to get the executor-lib path from ansible?	17:14
pabelanger	eg: /tmp/tmpag22d2js/zuul-test/executor-lib/ansible/zuul/ansible/library/zuul_return.py	17:14
clarkb	zuul knows what it is rgiht? is it added as a variable by zuul to the zuul dict?	17:17
tobiash	pabelanger: I think there is no env variable but it writes the library path to the config	17:17
tobiash	pabelanger: so I guess that ansible has some api to get the library path	17:17
clarkb	hrm we only get the work root and stuff below that	17:18
*** myoung\|lunch is now known as myoung		17:19
tobiash	pabelanger: maybe the path cache contains what you want?	17:22
tobiash	https://github.com/ansible/ansible/blob/devel/lib/ansible/plugins/loader.py#L23	17:22
tobiash	hm I guess not	17:23
*** jpena is now known as jpena\|off		17:24
mordred	corvus: that's what my ansible PR is about (I just pushed up a second version of it - missed a variable in the first one)	17:29
clarkb	mordred: I'm the wrong person to review that ansible PR but I would be concerned that if all the tasks end up being noops then your callback won't run. This maybe makes sense given its on_start and you never really start the task however it is a behavior change I think	17:31
fungi	okay, this is weird. now the error i'm hitting is that `/etc/ansible/hosts/openstack --list` is broken on puppetmaster.o.o, with a 404 getting raised by https://auth.vexxhost.net/v2.0/tokens	17:34
fungi	er, wrong channel, sorry1	17:35
corvus	mordred: i'm not sure we're on the same page; are you around now?	17:37
*** gtema has quit IRC		17:40
tobiash	mordred: I ran the test case with the ansible from your branch and it crashed ;)	17:41
tobiash	http://paste.openstack.org/show/723407/	17:41
corvus	i need to afk for a bit	17:42
*** electrofelix has quit IRC		17:45
mordred	tobiash: excellent!	17:48
*** yolanda has quit IRC		17:50
mordred	corvus: well, there is a provision for running it on the first task if all the tasks are noop - but also we might not be on the same page as well	17:52
openstackgerrit	Paul Belanger proposed openstack-infra/zuul master: Allow zuul_return in untrusted jobs https://review.openstack.org/575173	18:03
pabelanger	tobiash: clarkb: ^found a way to get the info	18:03
pabelanger	thanks for the pointer tobiash	18:03
*** openstackgerrit has quit IRC		18:04
*** yolanda has joined #zuul		18:05
*** myoung is now known as myoung\|biab		18:08
*** harlowja has joined #zuul		18:10
corvus	mordred: in the include_role case, there are no no-op tasks	18:17
tobiash	pabelanger: added another comment	18:18
tobiash	pabelanger: lgtm otherwise	18:18
corvus	mordred: it looks like the playbook iterator is actually producing two different task objects for the shell task following the include_role (normally it returns the same task object for both hosts)	18:19
corvus	the play compiler returns a single task object for each task. normally that same object is returned by the iterator and used to run the tasks. but in the case of the task-after-include_role, the task object the compiler produces is not used for either of the hosts. something creates two entirely new tasks. that's as far as i've gotten.	18:22
*** openstackgerrit has joined #zuul		18:23
openstackgerrit	Paul Belanger proposed openstack-infra/zuul master: Allow zuul_return in untrusted jobs https://review.openstack.org/575173	18:23
pabelanger	tobiash: looking	18:23
openstackgerrit	Paul Belanger proposed openstack-infra/zuul master: Allow zuul_return in untrusted jobs https://review.openstack.org/575173	18:31
pabelanger	tobiash: okay, that should cover future changes^	18:31
openstackgerrit	Paul Belanger proposed openstack-infra/zuul master: Allow zuul_return in untrusted jobs https://review.openstack.org/575173	18:32
tobiash	pabelanger: thanks, +2	18:35
pabelanger	tobiash: yay	18:35
mordred	corvus: ah - gotcha. thank you - that makes sense -- or, at least, the words you said do	18:36
tobiash	mordred: note that the error only happens if you have an include_role task before	18:36
tobiash	so that seems to have some weird sort of side effect	18:36
pabelanger	mordred: clarkb: corvus: do you also mind looking at 575173^	18:36
*** elyezer has quit IRC		18:40
corvus	pabelanger: will be happy to after i'm done with the ansible error	18:43
clarkb	pabelanger: is the default module path not where we install external job defined modules too?	18:45
pabelanger	clarkb: no, it should be tmp/tmpag22d2js/zuul-test/executor-lib/ansible/zuul/ansible/library for all jobs, which is only used by zuul I think	18:48
clarkb	pabelanger: where do other modules get installed?	18:49
clarkb	or maybe that isn't implemented yet?	18:49
pabelanger	clarkb: I don't think they do, they are loaded directly from their src_root path	18:49
clarkb	got it	18:50
clarkb	and those paths are not added to the default module path?	18:50
clarkb	we have to tell ansible how to find them somehow and it seems like the moduel path in config is how one might do it	18:50
pabelanger	right, that is the value we set at https://git.zuul-ci.org/cgit/zuul/tree/zuul/executor/server.py#n1400	18:50
corvus	mordred, tobiash: https://github.com/ansible/ansible/blob/devel/lib/ansible/playbook/included_file.py#L150	18:54
corvus	mordred, tobiash: https://github.com/ansible/ansible/blob/devel/lib/ansible/plugins/strategy/linear.py#L331	18:55
mordred	corvus: ooh	18:56
corvus	the linear strategy processes the results of the include_role task, which causes new copies of subsequent(?!) tasks to be made... apparently	18:56
mordred	that ...	18:57
tobiash	interesting	18:59
*** hashar is now known as hasharAway		19:00
corvus	it should only be copying the include_role task	19:03
corvus	so i'm not sure how the block above it gets copied	19:03
corvus	mordred, tobiash: we're deep into ansible spaghetti here. do we need to pause and regroup and see if we should try to solve this another way?	19:06
mordred	corvus: yes	19:06
tobiash	corvus: you mean maybe switching to the alternative log streaming solution?	19:07
corvus	i know of two alternate ideas so far:	19:08
corvus	1) clarkb suggested we could use the task uuid itself (i think that should generally work, except we need to add an inventory host uniqueness to it, in case the same task is run on the same physical host under two different inventory hosts)	19:08
corvus	2) the unix socket log streaming. i'm not sure if we've thought about the interaction of that with containers, so if we go down that road, we should give that at least some quick thought.	19:09
mordred	I do not believe 1 will work - we run in to the same issue	19:10
mordred	(I investigated 1 for a while yesterday)	19:10
corvus	ah, i guess the task uuid isn't automatically in the module args?	19:10
mordred	yah - that is the issue	19:10
corvus	mordred: however, the issue doesn't seem to be that on_task_start isn't being called, it is, it's just getting a different task every time	19:11
mordred	it turns out that ansible provides VERY minimal things to module args by default ... basically nothing	19:11
corvus	mordred: so if we update the existing code to do what it's doing now, except, instead of generating the uuid, it just uses task._uuid.... maybe that will work?	19:11
mordred	corvus: right- and with the linear strategy, we only get a task_start once per task/host pair	19:11
corvus	mordred: oooh i see	19:12
corvus	so indeed it won't be called on the second one	19:12
mordred	exactly	19:12
mordred	same problem as the thing I was chasing, even though the mechanism you found for producing that problem is different	19:12
corvus	oh, i have a meeting now, have to run. biab	19:13
*** myoung\|biab is now known as myoung		19:15
*** gtema has joined #zuul		19:27
*** gtema has quit IRC		19:57
clarkb	is there a tldr on how event processing works for github triggers? because I have a ~40 minute old event that is logged as being sent in github, logged in zuul web, but nothing from the scheduler doing anything with it	20:30
clarkb	reading the code the X-Github-Delivery field is what I should grep for	20:37
clarkb	but I don't see any of those logs in the scheduler at all	20:38
clarkb	weird now it shows up in there	20:40
clarkb	I guess zuul was just behind on event processing	20:40
clarkb	I found a bug /me makes a fix	20:46
openstackgerrit	Clark Boylan proposed openstack-infra/zuul master: Log github delivery ids properly https://review.openstack.org/575231	20:56
corvus	mordred: so regarding the domain socket logging -- does this impact future container work at all?	20:58
corvus	it shouldn't be an issue with k8s_raw module, nor would it be for a container that ran ssh that we just used with the ssh connection plugin	21:00
corvus	that leaves the kubectl connection plugin	21:00
clarkb	corvus: mordred what is the requirement for the logger? that it get a unique string for each task? could we just generate a uuid on the remote side when it starts logging to have that?	21:01
clarkb	basically does this have to come from the control side of ansible or can we just make up a number	21:02
corvus	the current system requires running zuul_stream on the worker. even if kubectl started zuul_stream, ansible isn't going to be able to connect to it without setting up ingress	21:02
corvus	so i think that means that the current system won't work with kubectl, so switching to domain sockets won't be a regression	21:03
corvus	mordred: does that sound right?	21:03
*** openstackgerrit has quit IRC		21:04
corvus	mordred: sorry, s/zuul_stream/zuul_console/ in that ^	21:04
corvus	clarkb: the zuul_stream callback plugin needs to know the ID that the command module will use on the remode side	21:05
corvus	clarkb: so today, that's handled by the zuul_stream callback plugin generating an id, sticking it in the arguments for the command task, then using that id internally to start up the stream receiver and request the log stream	21:06
corvus	clarkb: any piece of information that's shared between the callback plugin on the local side and the command module on the remote side could be used. the point mordred made earlier is that there isn't really any shared info there by default. we have to add it ourselves. and adding it is proving problematic.	21:07
clarkb	gotcha because it is an explicit request for a specific file from the streamer	21:07
corvus	clarkb: yep. and it's an explicit request because we have to keep supplying old files, because the remote side can complete a command, and the local side can start a new one, all before the local side has finished receiving the data from the earlier command. in other words, they can overlap.	21:09
clarkb	could we use the task name for that instead?	21:09
corvus	they aren't unique	21:09
clarkb	normalize it to fs safe string, both sides have that info iirc	21:09
clarkb	oh for some reason I thought they were	21:09
corvus	clarkb: but still, the command module doesn't have it.	21:10
corvus	the only information the command module has is the arguments passed to it.	21:10
corvus	clarkb: this is what we have to work with on the remote side: http://paste.openstack.org/show/723414/	21:11
clarkb	hrm	21:11
corvus	i'm starting to think we need to revert to 2.3	21:13
clarkb	could we provide an override of the linear strategy module and call our callback on every task?	21:14
clarkb	seems like the issue is it is only called the once	21:14
clarkb	but if it were called each time then copies of tasks would be ok?	21:14
corvus	clarkb: that might work	21:14
clarkb	if ^ doesn't work then ya I'm stumped as to fixing in short period of time	21:15
corvus	probably best to think of that as a short term fix	21:15
clarkb	yup	21:15
corvus	(we've had users unable to make progress on their work since friday, so this is becoming a serious issue for us)	21:16
clarkb	I'm not really in a great position to write and tset ^ either	21:19
clarkb	I think a revert is likely doable as I doubt many have updated to ansible 2.5 specific code	21:19
corvus	clarkb: i'm informally testing that idea real quick	21:20
*** hasharAway has quit IRC		21:22
corvus	clarkb: yeah, i think that might work	21:28
corvus	clarkb: we just add this to zuul_stream: http://paste.openstack.org/show/723415/	21:28
corvus	clarkb: and then add that callback after the v2_on_task_start callback in the linear strategy plugin	21:29
corvus	(except always call it)	21:29
clarkb	outside of the block that checks ya that	21:29
corvus	it passes my local testing with my hacked up copy of ansible	21:29
corvus	i'll try to form that into a real patch	21:30
clarkb	cool I can review it when its up	21:30
*** openstackgerrit has joined #zuul		21:54
openstackgerrit	James E. Blair proposed openstack-infra/zuul master: Add test for shell after include_role https://review.openstack.org/574487	21:54
openstackgerrit	James E. Blair proposed openstack-infra/zuul master: Temporarily override Ansible linear strategy https://review.openstack.org/575248	21:54
corvus	it occurred to me that you probably can't see the changes to linear.py easily.. i can split that into two changes real quick	21:55
openstackgerrit	James E. Blair proposed openstack-infra/zuul master: Temporarily override Ansible linear strategy (2/2) https://review.openstack.org/575248	21:58
openstackgerrit	James E. Blair proposed openstack-infra/zuul master: Add test for shell after include_role https://review.openstack.org/574487	21:58
openstackgerrit	James E. Blair proposed openstack-infra/zuul master: Temporarily override Ansible linear strategy (1/2) https://review.openstack.org/575250	21:58
corvus	clarkb, mordred, tobiash: ^	21:58
clarkb	what calls import ansible stretgey plugin? (sorry reviewing over sandwich so that isnt exact name)	22:03
*** harlowja has quit IRC		22:11
*** harlowja has joined #zuul		22:12
corvus	clarkb: nothing; it's left over from tobiash's change. it would be necessary if we needed to import the real plugin.	22:18
corvus	but in this case, the method we're changing is most of the plugin anyway, so i just copied id.	22:19
corvus	it	22:19
*** myoung is now known as myoung\|off		22:20
*** harlowja has quit IRC		22:21
clarkb	corvus: https://review.openstack.org/#/c/575248/2/zuul/ansible/paths.py that is leftover?	22:27
clarkb	oh I see its there to allow us to get at the real one	22:27
clarkb	but as you say we just copy the whole thing	22:27
corvus	yeah, we could remove it since it's unused; i just left it there to ride along with the rest of the framework.	22:28
mordred	corvus: yes - I think what you said about domain sockets not being a regression is correct	22:34
mordred	corvus: and yes- I think temporarily overriding the whole strategy should give us time to figure out a proper fix	22:36
clarkb	corvus: the proposed stack lgtm as short term fix	22:36
clarkb	I've +2'd those change that needed reviews	22:36
mordred	since, it turns out - this one is hairy	22:36
corvus	mordred: cool, so we'll try to land 575248 for now, and then work on the domain socket as the "real" solution?	22:36
mordred	yes	22:37
mordred	I think that's the best bet	22:37
corvus	oh, 575248's parent isn't going to pass tests because i didn't put the flake8 note in there	22:37
corvus	i'll fix that in just a sec	22:37
corvus	mordred: now the real question: what do we need to do to progress the domain socket work?	22:38
mordred	also - I thnik for kubectl, we might be able to get the console stream from the k8s api	22:38
mordred	since it has a console output endpoint	22:38
clarkb	that won't give you the ansible output though	22:38
openstackgerrit	James E. Blair proposed openstack-infra/zuul master: Temporarily override Ansible linear strategy (1/2) https://review.openstack.org/575250	22:38
clarkb	will it?	22:38
openstackgerrit	James E. Blair proposed openstack-infra/zuul master: Temporarily override Ansible linear strategy (2/2) https://review.openstack.org/575248	22:38
openstackgerrit	James E. Blair proposed openstack-infra/zuul master: Add test for shell after include_role https://review.openstack.org/574487	22:38
clarkb	seems like that is the wrong layer	22:38
mordred	clarkb: oh - you're right - that won't work	22:39
mordred	corvus: I think it might be helpful to have a phone call to figure out what to do to finish the domain socket work	22:39
corvus	mordred: have time now?	22:40
mordred	I think it's a bit late for my brain to dive in fully - how about first thing your morning? (that way I can go re-read patches and remind myself where we are)	22:40
corvus	sounds good... we can use the conf bridge and clarkb and tobiash (and others) can join if they are around. and whatever we come up with we can summarize at the end	22:41
clarkb	I can do 8am tomorrow	22:41
clarkb	pacific	22:41
corvus	wfm	22:42
corvus	looks like https://review.openstack.org/541434 is the bottom of the stack in question	22:42
mordred	++	22:42
mordred	yes - that's it	22:43
clarkb	do we want to go ahead and try to merge this current stack nowish? (part of the issue here is node contention with tripleo...)	22:44
clarkb	we could direct enqueue to the gate to get zuul stuff in then do executor restarts	22:44
corvus	clarkb: yeah, i think that's a reasonable course of action	22:45
mordred	corvus, clarkb: I've +2'd or +3'd all of them	22:46
clarkb	corvus: mordred https://review.openstack.org/#/c/574786/1 and https://review.openstack.org/#/c/574643/4 just need to be approved	22:46
clarkb	then we can enqueue to gate I think	22:47
clarkb	I'll let you approve those since you have reviewed them previously	22:47
*** threestrands has joined #zuul		22:47
openstackgerrit	James E. Blair proposed openstack-infra/zuul master: Allow zuul_return in untrusted jobs https://review.openstack.org/575173	22:48
clarkb	corvus: has -2 on that first one	22:48
corvus	clarkb: gtg	22:49
clarkb	ok I'll do the enqueue now	22:50
corvus	mordred: thanks i was about to ask you to look at 575173 if you had time :)	22:51
clarkb	ok enqueued	22:53
clarkb	corvus: mordred https://review.openstack.org/575231 that fixes an opsy debugging issue I found today	22:53
mordred	clarkb: that's super weird - and we should talk to jlk about fixing it in the github library	22:54
corvus	clarkb: yeah, i believe all the headers are lowercased now because "different web frameworks handle things differently"	22:54
mordred	oh - wait - nevermind ... that's a thing we construct	22:54
clarkb	I decided to grep on the prefix of that log string and found a bunch of Nones	22:54
mordred	python requests has a caseinsensitivedict that it uses for headers - and I was mistakenly thinking this was based on it	22:54
clarkb	which explained my initial debug confusion of why doesn't this show up in the log	22:54
mordred	yah	22:55
corvus	yeah, i think this is partly because of the fact that we ship the headers over the gearman bus	22:55
mordred	yah	22:55
mordred	we _could_ reconstruct them into a CIdict when we pull them out of that args - but that seems like work	22:56
corvus	now all our uses are lower case :)	22:56
mordred	>>> import requests.structures	22:57
mordred	>>> f = requests.structures.CaseInsensitiveDict({'foo': 'bar'})	22:57
mordred	>>> f['Foo']	22:57
mordred	'bar'	22:57
mordred	just for reference	22:57
clarkb	https://zuul.openstack.org/stream.html?uuid=bae09735e26f44ed985e53b7b792faea&logfile=console.log zuul running its first kata test that didn't die early due to lack of nested virt	22:59
clarkb	looks like its still broken but all of the other tests were too, so not sure that says anything	22:59
jlk	so, is there something I need to fix or...?	23:01
clarkb	jlk: I don't think so. we just have to handle the lower case version internally since that is what it gets normalized to	23:04
jlk	okay	23:05
openstackgerrit	Merged openstack-infra/zuul master: Fix broken command tasks in handlers https://review.openstack.org/574786	23:13
openstackgerrit	Merged openstack-infra/zuul master: Fix command tasks with free play strategy https://review.openstack.org/574787	23:13
openstackgerrit	Merged openstack-infra/zuul master: Add blocks to the zuul stream test https://review.openstack.org/574643	23:13
openstackgerrit	Merged openstack-infra/zuul master: Temporarily override Ansible linear strategy (1/2) https://review.openstack.org/575250	23:13
corvus	other changes are still running tests	23:14
corvus	but they've both passed at least one of the py35 / py36 jobs	23:15
openstackgerrit	Merged openstack-infra/zuul master: Temporarily override Ansible linear strategy (2/2) https://review.openstack.org/575248	23:21
openstackgerrit	Merged openstack-infra/zuul master: Add test for shell after include_role https://review.openstack.org/574487	23:21
clarkb	ok I think that those are the changes we wanted for restarting	23:21
* clarkb migrates conversation to -infra		23:21
openstackgerrit	Merged openstack-infra/zuul master: Log github delivery ids properly https://review.openstack.org/575231	23:24

Generated by irclog2html.py 2.15.3 by Marius Gedminas - find it at mg.pov.lt!