mordred | pabelanger: for zuul-jobs since the consumption is intended to be via git rather than releases I'm not sure if reno would be a win - however, I like reno a lot - so maybe it still would be? | 00:00 |
openstackgerrit | Merged openstack-infra/zuul feature/zuulv3: Fix change history cycle detection https://review.openstack.org/485368 | 00:02 |
openstackgerrit | Merged openstack-infra/zuul feature/zuulv3: Add project info to JobDirPlaybook https://review.openstack.org/485273 | 00:02 |
openstackgerrit | Merged openstack-infra/zuul feature/zuulv3: Make playbook run meta info less fragile https://review.openstack.org/485284 | 00:02 |
openstackgerrit | Merged openstack-infra/zuul feature/zuulv3: Add spacer after playbook stats rather than before playbook https://review.openstack.org/485285 | 00:03 |
pabelanger | mordred: ya, I like the idea of reno too | 00:03 |
openstackgerrit | Monty Taylor proposed openstack-infra/zuul feature/zuulv3: Switch from tox-linters to tox-pep8 https://review.openstack.org/485379 | 00:04 |
mordred | pabelanger: ^^ :) | 00:04 |
pabelanger | +2 | 00:05 |
mordred | pabelanger: I'm going to restart the executor to pick up the logging format changes | 00:09 |
mordred | pabelanger: also - if you have a sec, could you look at https://review.openstack.org/#/c/485345/ ? | 00:10 |
mordred | pabelanger: jeblair and Shrews +2'd it earlier, but I had to fix a puppet apply test issue | 00:10 |
pabelanger | mordred: +2, feel free to +3 if you get nobody else | 00:11 |
openstackgerrit | Paul Belanger proposed openstack-infra/zuul-jobs master: Gather facts for tox/run playbook https://review.openstack.org/485380 | 00:16 |
pabelanger | remote: https://review.openstack.org/483987 Create openstack-py35 job with upper-constraints | 00:17 |
pabelanger | mordred: so those 2 are interesting ^. and I think we need to gather_facts for all our playbooks in zuul-jobs by default | 00:17 |
mordred | pabelanger: whyfore? what are we missing? | 00:21 |
* mordred looks at patch | 00:21 | |
pabelanger | mordred: i'm trying to pass ansible_user_dir into an environment variable | 00:21 |
mordred | pabelanger: and we don't have that one without gather_facts? | 00:22 |
pabelanger | right, facts are disabled by default | 00:22 |
pabelanger | in ansible.cfg | 00:22 |
pabelanger | so we always have to opt into them | 00:22 |
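A minimal sketch of the pattern pabelanger is describing, assuming a zuul-jobs style playbook (the task name and environment variable are illustrative, not from the actual change):

```yaml
- hosts: all
  # Facts are disabled in zuul's ansible.cfg, so a playbook has to
  # opt in to make variables like ansible_user_dir available.
  gather_facts: true
  tasks:
    - name: Run tox with the workspace exported
      command: tox -e py35
      environment:
        # Hypothetical variable name; the point is that
        # ansible_user_dir only exists once facts have been gathered.
        ZUUL_WORKSPACE: "{{ ansible_user_dir }}"
```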
mordred | yup | 00:23 |
mordred | pabelanger: I just plopped in a comment about gather_subset ... | 00:23 |
pabelanger | http://logs.openstack.org/27/484027/5/check/openstack-py35/a18c5ba/job-output.txt passes | 00:23 |
mordred | pabelanger: actually - we should have it there | 00:24 |
pabelanger | looking | 00:24 |
mordred | pabelanger: because unittests ... oh, nevermind | 00:24 |
mordred | it's unittests _pre_ that has a gather_facts | 00:24 |
jeblair | pabelanger: you're losing me on that. would you mind explaining the whole problem in the commit message please? | 00:24 |
mordred | http://git.openstack.org/cgit/openstack-infra/zuul-jobs/tree/playbooks/unittests/pre.yaml | 00:24 |
pabelanger | jeblair: sure I can go into more detail | 00:24 |
mordred | jeblair: https://review.openstack.org/#/c/483987/9/zuul.yaml for context | 00:27 |
jeblair | okay, so that's fallout from dropping zuul_workspace_dir | 00:28 |
mordred | yah | 00:28 |
pabelanger | right, ansible_user_dir would be the same thing | 00:29 |
openstackgerrit | Paul Belanger proposed openstack-infra/zuul-jobs master: Gather facts for tox/run playbook https://review.openstack.org/485380 | 00:30 |
pabelanger | okay, some more details | 00:30 |
jeblair | to be fair, hardcoding /home/zuul there, or in a job variable would be about as valid. but i feel like the problem may be likely to occur in a less openstack-specific way in the future. | 00:30 |
jeblair | is it worth considering adding zuul_workspace_dir back? or should we focus on making the facts approach better? | 00:31 |
jeblair | can we cache facts across playbook runs? | 00:31 |
mordred | yah. we could switch gathering from off to on in ansible.cfg, but with gather_subset: !all | 00:31 |
jeblair | mordred: seems like if the bulk of our jobs are doing that anyway (and now are about to twice), that may be a win. | 00:31 |
mordred | gathering = smart | 00:32 |
pabelanger | It is actually an interesting question. aside from the performance issues (I am not sure how bad it is), what is the downside of having ansible facts available by default to jobs? | 00:32 |
mordred | fact_caching = jsonfile | 00:32 |
jeblair | (and who knows, the devstack jobs may end up doing it too, we just haven't gotten there yet) | 00:32 |
mordred | fact_caching_connection = /tmp/facts_cache | 00:32 |
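Collected in one place, the ansible.cfg settings mordred is pasting would look roughly like this (the /tmp cache path is the example from the discussion; the eventual patch put it under the job's .ansible dir instead):

```ini
[defaults]
# "smart" gathers facts only for hosts not already in the cache
gathering = smart
# persist gathered facts as one JSON file per inventory host
fact_caching = jsonfile
fact_caching_connection = /tmp/facts_cache
```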
jeblair | pabelanger: i think the reason we turned facts off was performance issues? | 00:33 |
mordred | pabelanger: yah, it's an extra hit on each play - but in the zuul context I think we could turn on jsonfile fact_caching and it'll likely be pretty fine | 00:33 |
pabelanger | ya, I am not sure why we disabled it actually. But caching of facts does seem like a nice idea | 00:33 |
jeblair | mordred: any concern with having the fact cache be in the untrusted writable work dir? | 00:34 |
jeblair | i guess if a trusted post-playbook relied on a fact, an untrusted main playbook could have overwritten it beforehand... | 00:36 |
jeblair | so caching may be tricky | 00:36 |
mordred | jeblair: we should see what things it caches | 00:36 |
* mordred tests | 00:36 | |
pabelanger | ya, doing the same | 00:37 |
mordred | it does not cache facts set with set_fact | 00:38 |
mordred | so it's only a cache of facts that are gathered by the system | 00:38 |
pabelanger | ya, that makes sense | 00:38 |
jeblair | mordred: but someone could still directly write the file | 00:38 |
pabelanger | fact_caching_connection is a local filesystem path to a writeable directory, so we'd have it outside the src directory on executor? | 00:40 |
jeblair | mordred: oh, i guess we can stick it in jobroot/ansible and that should work | 00:40 |
pabelanger | can bwrap write to jobroot/ansible? | 00:41 |
openstackgerrit | Monty Taylor proposed openstack-infra/zuul feature/zuulv3: Gather facts smartly and cache them https://review.openstack.org/485384 | 00:42 |
mordred | jeblair, pabelanger: I was thinking like that ^^ | 00:42 |
jeblair | pabelanger: yeah, all of jobroot is writable inside of bwrap. the only protection for inside of jobroot but outside of jobroot/work is the ansible module path checks | 00:42 |
pabelanger | jeblair: cool, had to look it up to see | 00:43 |
jeblair | so if you escape ansible, you can't hose the executor, but you can alter the code that's about to run in trusted post playbooks. | 00:43 |
pabelanger | mordred: think you have a typo | 00:43 |
pabelanger | %/.ansible | 00:43 |
mordred | probably | 00:43 |
pabelanger | %s/.ansible | 00:43 |
openstackgerrit | Monty Taylor proposed openstack-infra/zuul feature/zuulv3: Gather facts smartly and cache them https://review.openstack.org/485384 | 00:44 |
jeblair | mordred: do you mean to use ".ansible/" rather than "ansible/"? | 00:44 |
mordred | jeblair: yah - I figured sticking it in the same place as local_tmp and remote_tmp since it's ultimately ephemeral data | 00:45 |
mordred | although I guess I could make it be facts_tmp to match those dir names ... | 00:45 |
jeblair | mordred: i don't think facts_tmp is necessary | 00:46 |
mordred | kk | 00:46 |
jeblair | (the other options are literally "remote_tmp") | 00:47 |
pabelanger | +2 | 00:47 |
mordred | jeblair: while I've got you here: https://review.openstack.org/#/c/485345/ and https://review.openstack.org/#/c/485379/ could both use a quick +A | 00:48 |
jeblair | done | 00:48 |
jeblair | i'm going to eod now | 00:49 |
pabelanger | mordred: have I also mentioned this is pretty fun | 00:51 |
mordred | pabelanger: ++ | 00:52 |
mordred | pabelanger: also - https://review.openstack.org/#/c/262597/ is the reno patch I made for zuul forever ago | 00:52 |
pabelanger | cool, I'll look into it in the morning | 00:53 |
pabelanger | EOD now | 00:53 |
openstackgerrit | Merged openstack-infra/zuul feature/zuulv3: Switch from tox-linters to tox-pep8 https://review.openstack.org/485379 | 00:56 |
*** jkilpatr has quit IRC | 01:11 | |
mordred | http://logs.openstack.org/27/484027/5/check/openstack-py35/8f780a0/job-output.txt latest logging updates applied | 01:18 |
*** harlowja has quit IRC | 01:32 | |
clarkb | https://blog.sileht.net/automate-rebasing-and-merging-of-your-pr-with-pastamaker.html may be of interest to the channel | 02:08 |
mordred | indeed | 02:10 |
*** harlowja has joined #zuul | 02:52 | |
mordred | clarkb: the promise of github seems to be allowing every dev team to write their own PR bot | 03:22 |
clarkb | right | 03:23 |
SpamapS | It's the yak every dev team is dying to shave. | 03:44 |
*** harlowja has quit IRC | 04:18 | |
*** harlowja has joined #zuul | 04:30 | |
openstackgerrit | Merged openstack-infra/zuul-jobs master: Rework tox -e linters https://review.openstack.org/484836 | 05:14 |
*** isaacb has joined #zuul | 05:16 | |
tobiash | sounds like they actually would need BonnyCI... | 05:24 |
*** harlowja has quit IRC | 05:33 | |
*** isaacb has quit IRC | 06:13 | |
*** isaacb has joined #zuul | 06:24 | |
*** bhavik1 has joined #zuul | 06:30 | |
*** amoralej|off is now known as amoralej | 07:26 | |
*** hashar has joined #zuul | 07:41 | |
*** yolanda has quit IRC | 07:58 | |
*** clarkb has quit IRC | 08:04 | |
*** clarkb has joined #zuul | 08:05 | |
*** isaacb has quit IRC | 08:31 | |
*** isaacb has joined #zuul | 08:32 | |
*** yolanda has joined #zuul | 08:53 | |
*** yolanda has quit IRC | 08:54 | |
*** yolanda has joined #zuul | 08:55 | |
*** smyers has quit IRC | 09:04 | |
*** smyers has joined #zuul | 09:18 | |
*** bhavik1 has quit IRC | 10:27 | |
*** xinliang has quit IRC | 10:29 | |
*** xinliang has joined #zuul | 10:41 | |
*** xinliang has quit IRC | 10:41 | |
*** xinliang has joined #zuul | 10:41 | |
*** jkilpatr has joined #zuul | 11:02 | |
*** amoralej is now known as amoralej|lunch | 11:04 | |
*** amoralej|lunch is now known as amoralej | 12:24 | |
*** dkranz has joined #zuul | 13:18 | |
pabelanger | mordred: nice work on console logs! One request: for LOOP [Collect tox logs] in your job-output.txt above, can we prettyprint the dict item to make the output easier to read? | 13:38 |
dmsimard | pabelanger, clarkb, mordred: did you guys just flip the switch when you moved from jenkins to zuul 2.5 ? I forget. Was there a weird period where both of them were running side by side ? We're wondering how to make sure Zuul and Nodepool don't get confused on which target the job is meant to run | 13:39 |
dmsimard | tristanC: ^ | 13:39 |
pabelanger | dmsimard: no, we stopped jenkins and started launchers IIRC. Don't think we ever had both running at same time | 13:40 |
pabelanger | dmsimard: are you seeing an issue? | 13:41 |
pabelanger | mordred: also, it looks like we are missing ok between some tasks: | 13:41 |
pabelanger | 2017-07-20 01:04:44.850458 | TASK [bindep : Look for other-requirements.txt] | 13:41 |
pabelanger | 2017-07-20 01:04:44.871528 | TASK [bindep : Define bindep_file fact] | 13:41 |
pabelanger | would have expected to see ok or skipped there | 13:41 |
dmsimard | pabelanger: we intended to run jenkins and zuul-launcher side by side to allow for a transition period but it looks like when both targets are enabled, zuul and nodepool get confused about which node is requested and allocated | 13:42 |
dmsimard | pabelanger: I can forward you an email since I haven't personally witnessed this | 13:43 |
dmsimard | sent | 13:43 |
pabelanger | right, different image names would be one way to fix that | 13:43 |
dmsimard | pabelanger: that's what I thought too, see internal irc sf channel backlog | 13:45 |
pabelanger | but ya, I wouldn't expect both running at the same time to be good. Going back into irc logs to see what we did. But, I seem to remember we did the roll out dance a few times | 13:45 |
pabelanger | dmsimard: where is your nodepool.yaml file? | 13:45 |
dmsimard | pabelanger: mixture of https://softwarefactory-project.io/r/gitweb?p=config.git;a=blob;f=nodepool/nodepool.yaml;h=ee8c998c4c80b9de8ddf3e5891671ed16f880c9b;hb=HEAD | 13:46 |
dmsimard | and https://softwarefactory-project.io/r/gitweb?p=software-factory/sf-config.git;a=blob;f=ansible/roles/sf-nodepool/templates/_nodepool.yaml.j2;hb=HEAD | 13:46 |
dmsimard | (it's mashed by software factory when running the nodepool config update job) | 13:46 |
dmsimard | this is what nodepool list is giving: https://softwarefactory-project.io/paste/show/xLmGA1UyA1bfi4fGUT2h/ | 13:47 |
pabelanger | dmsimard: Ya, I'd just pin your images to either jenkins or launcher for now | 13:48 |
pabelanger | not across both | 13:48 |
dmsimard | pabelanger: tristanC says that wouldn't work, see #softwarefactory | 13:48 |
pabelanger | then setup a new cloud provider | 13:49 |
pabelanger | and have jenkins and zuulv2.5 only access one or other | 13:49 |
dmsimard | pabelanger: hmm, how would we go about that ? | 13:49 |
pabelanger | both could use the same project, just different names in nodepool.yaml. Like we do for osic-cloud1 | 13:50 |
dmsimard | like, I know how to setup additional providers but how to get only jenkins to consume this cloud and zuul the other ? | 13:50 |
pabelanger | dmsimard: look at osic-cloud1-s3500 and osic-cloud1-s3700 in nodepool.yaml, both are using the same cloud, project | 13:50 |
pabelanger | Oh, hmm | 13:51 |
tristanC | pabelanger: iiuc the issue is because when there are nodes ready, nodepool doesn't check if they are correctly assigned to the requesting target | 13:51 |
tristanC | pabelanger: even if in the logs it seems to correctly create servers and assign them to the right target when requested, we end up with lots of nodes ready in the wrong target | 13:52 |
pabelanger | ya, guess you'd need a 2nd nodepool too | 13:53 |
pabelanger | which didn't know about jenkins | 13:53 |
pabelanger | maybe jeblair or clarkb have an idea on how to work around that with gearman | 13:53 |
tristanC | pabelanger: yes, that would work too | 13:54 |
tristanC | dmsimard: iirc, openstack-infra was running jenkins and zuul-launcher with the same set of jobs | 13:54 |
pabelanger | https://review.openstack.org/#/c/325992/ | 13:59 |
pabelanger | tristanC: ya, you are right, we did run both in production | 14:02 |
pabelanger | I'm remembering now | 14:02 |
pabelanger | however, we wanted jobs to run on both jenkins and zuul-launcher | 14:02 |
pabelanger | as not to interrupt the gate | 14:02 |
pabelanger | I don't think we ever pinned a job just to launcher or jenkins | 14:03 |
pabelanger | because we wanted to ensure the migration between launcher and jenkins worked as expected | 14:03 |
pabelanger | since both were JJB, they both should run on both | 14:03 |
pabelanger | dmsimard: so looping back, why do you want a job to only run on launcher? | 14:04 |
dmsimard | pabelanger: I'm a scaredy cat of flipping the switch 100% from jenkins to launcher without the opportunity to test things first | 14:05 |
pabelanger | dmsimard: right, so you can use both jenkins and launcher per tristanC comment, and make sure everything is 100% across both | 14:06 |
dmsimard | pabelanger: and we're in a bit of a time crunch from a lot of angles so if we don't get that transition period where both are running side by side, we will need to delay the upgrade | 14:06 |
dmsimard | pabelanger: well, running side them side by side was the plan, but this whole discussion is about that not exactly working as intended | 14:07 |
pabelanger | dmsimard: move JJB jobs into another SF instance and test them there first? | 14:07 |
pabelanger | but ya, no easy migration path for this | 14:08 |
dmsimard | pabelanger: requires time and it's a precious resource right now | 14:08 |
dmsimard | especially with folks on PTO left and right | 14:08 |
pabelanger | it took us about 4 weeks to migrate to zuul-launchers and all hands were on deck | 14:09 |
pabelanger | don't want to rush it | 14:09 |
dmsimard | yeah, that is what I am thinking as well -- RDO's deployment is not nearly openstack-infra scale but we still have a considerable amount of jobs, some that are fairly complicated | 14:10 |
dmsimard | so I'm not confident in just flipping the switch | 14:10 |
dmsimard | We'll discuss our options, thanks | 14:10 |
*** isaacb has quit IRC | 14:30 | |
jeblair | yeah, we added zuul into the mix, so with 8 jenkins masters and one zuul launcher, zuul ran 1/9 of the jobs. | 14:44 |
jeblair | if we saw failures, we turned off the zuul launcher, fixed them and repeated. | 14:44 |
jeblair | then as things got gradually better, we added more zuul launchers and reduced the jenkins masters. | 14:45 |
pabelanger | mordred: jeblair: so smart facts and bubblewrap are broken at the moment. Working on fix, this is because the 'zuul' user doesn't exist on my local system when tox is run | 14:58 |
pabelanger | and the ansible connection for localhost is defaulting to 'zuul' user | 14:58 |
pabelanger | http://paste.openstack.org/show/616030/ is the issue | 15:01 |
pabelanger | this also means we'd be gathering facts on executors | 15:02 |
jeblair | why does it default to 'zuul' for localhost? | 15:04 |
pabelanger | jeblair: not sure yet. trying to figure that out | 15:05 |
jeblair | pabelanger: i think this could cause problems for folks running zuul-executor as a different username than, well, whatever thing is causing it to use 'zuul'. :) | 15:05 |
pabelanger | agree, think we exposed a bug | 15:05 |
jeblair | pabelanger: localhost facts probably aren't important; any way to turn it off? | 15:05 |
pabelanger | jeblair: not sure yet, will find out | 15:06 |
Shrews | forgive my ignorance, what are "smart facts" | 15:06 |
Shrews | ? | 15:06 |
pabelanger | ansible will scan a host if it is missing from the fact cache | 15:07 |
pabelanger | http://docs.ansible.com/ansible/intro_configuration.html#gathering | 15:07 |
Shrews | ah | 15:08 |
*** isaacb has joined #zuul | 15:13 | |
*** dmsimard is now known as dmsimard|cave | 15:14 | |
pabelanger | jeblair: SpamapS: so, if I try to run the bwrap command from our tox jobs myself, I get the following error: Can't write data to file /etc/passwd: Bad file descriptor | 15:41 |
pabelanger | any idea why? | 15:41 |
jeblair | pabelanger: zuul creates a new password file as a pipe and hands the fd to bubblewrap. i don't know what could cause a problem with that. can you paste your command and error? | 15:46 |
pabelanger | sure | 15:48 |
pabelanger | http://paste.openstack.org/show/616042/ | 15:48 |
pabelanger | the error above is the only thing I get back | 15:48 |
jeblair | pabelanger: oh, you're copy/pasting a previously run bwrap command | 15:49 |
jeblair | pabelanger: so you're telling bwrap to use a file descriptor that doesn't exist | 15:50 |
pabelanger | jeblair: ya, I'm trying to reproduce the environment from tox | 15:50 |
jeblair | --file 1000 /etc/passwd | 15:50 |
pabelanger | ah | 15:50 |
jeblair | pabelanger: try running the bwrap module as main. it has a main method so you can do testing like that | 15:50 |
jeblair | python zuul/drivers/bubblewrap/__init__.py | 15:51 |
jeblair | that should take care of creating the passwd/group files and run the command you give it in bwrap | 15:51 |
pabelanger | jeblair: ya, I did that and smart facts works | 15:53 |
pabelanger | so, was thinking we might be doing something different somehow | 15:54 |
jeblair | pabelanger: i ran tests with vvv to get the full tb: http://paste.openstack.org/show/616046/ | 15:57 |
pabelanger | jeblair: yup, that is the error I am seeing too from tox | 15:57 |
pabelanger | it looks like zuul isn't in /etc/passwd file | 15:58 |
pabelanger | but trying to confirm | 15:58 |
pabelanger | 2017-07-20 12:00:51,174 zuul.AnsibleJob DEBUG [build: 19831828f32042a7bfddba05a2059dd9] Ansible output: b' "stdout": "pabelanger:x:1000:1000:pabelanger:/tmp/tmpaapfgaf3/zuul-test/19831828f32042a7bfddba05a2059dd9:/bin/bash",' | 16:01 |
pabelanger | so ya, it is using my uid | 16:01 |
pabelanger | not zuul | 16:01 |
jeblair | why does ansible want to look up zuul? | 16:02 |
pabelanger | I think we are running ansible-playbook as zuul user some how | 16:02 |
jeblair | pabelanger: as you pointed out, you don't have a zuul user on your system, and there isn't one in the bwrap passwd file either. | 16:04 |
pabelanger | jeblair: Ya, that is what is driving me crazy. I don't know where it is coming from atm | 16:05 |
jeblair | right, that was my question -- what, specifically, causes ansible to do a pwnam lookup on zuul? | 16:07 |
jeblair | the code in ansible is https://github.com/ansible/ansible/blob/stable-2.3/lib/ansible/module_utils/facts.py#L551 | 16:07 |
jeblair | so it's getting the name to look up from getpass.getuser() | 16:07 |
pabelanger | Oh | 16:09 |
pabelanger | I see it now | 16:09 |
pabelanger | http://git.openstack.org/cgit/openstack-infra/zuul/tree/zuul/executor/server.py?h=feature/zuulv3#n1318 | 16:10 |
jeblair | pabelanger: yep. python getpass looks at LOGNAME first | 16:10 |
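jeblair's point about getpass can be seen directly; this is just an illustration of the stdlib behavior, not zuul code:

```python
import getpass
import os

# getpass.getuser() checks the LOGNAME, USER, LNAME and USERNAME
# environment variables (in that order) before falling back to the
# password database, so a hardcoded LOGNAME wins over the real uid.
os.environ["LOGNAME"] = "zuul"
print(getpass.getuser())  # -> zuul, even if no such account exists
```

This is why ansible's fact code then tries a pwnam lookup on "zuul" and fails inside bwrap, where no such passwd entry exists.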
*** isaacb has quit IRC | 16:11 | |
jeblair | the only reason i know of we hardcode logname to user is so that when we used the default ansible logging module to create the job log, it put "zuul" as the username. | 16:11 |
jeblair | pabelanger: we can probably drop that now. | 16:11 |
pabelanger | jeblair: cool, let me try that | 16:11 |
jeblair | since i think that has entirely been superceded by zuul_stream and friends | 16:12 |
jeblair | pabelanger: the hello-world job is failing the playbook test because we're gathering facts locally, and executing local code is prohibited | 16:23 |
pabelanger | ya, that is what I am debuging now | 16:23 |
jeblair | pabelanger, mordred: that error did not make it into the text console log, only the json one. i think that's a bug. | 16:23 |
jeblair | i'll file a story for that | 16:24 |
pabelanger | so, should we allow facts to be gathered on localhost? | 16:25 |
jeblair | no | 16:25 |
pabelanger | k, I'll see how we can stop smart from doing that | 16:26 |
pabelanger | maybe prime the fact cache with something | 16:26 |
jeblair | mordred: for whenever your next cup of coffee is: https://storyboard.openstack.org/#!/story/2001129 | 16:29 |
clarkb | pabelanger: tristanC I do not know why you'd have nodes assigned to the wrong target. I can imagine that targets "forgetting" nodes so they leak on the nodepool side is possible though | 16:40 |
*** bhavik1 has joined #zuul | 16:50 | |
pabelanger | jeblair: mordred: okay, so if we prime the 'localhost' json file in fact-cache with {"module_setup": true} ansible will not run setup task | 17:02 |
pabelanger | that will stop untrusted playbooks from trying run it on executor | 17:02 |
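A sketch of the priming pabelanger describes; the cache directory here is a stand-in for wherever fact_caching_connection points inside the job dir:

```python
import json
import os
import tempfile

# Stand-in for the job's fact_caching_connection directory.
cache_dir = tempfile.mkdtemp(prefix="fact-cache-")

# The jsonfile cache keeps one file per inventory hostname. Writing
# {"module_setup": true} for localhost makes "smart" gathering treat
# the executor as already scanned, so the setup module never runs there.
with open(os.path.join(cache_dir, "localhost"), "w") as f:
    json.dump({"module_setup": True}, f)

print(open(os.path.join(cache_dir, "localhost")).read())
# -> {"module_setup": true}
```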
*** hashar has quit IRC | 17:03 | |
pabelanger | and trusted playbooks will not have ansible facts I think | 17:03 |
pabelanger | unless we use a different cache for trusted / untrusted? | 17:03 |
jeblair | pabelanger: i don't think we want anything different between trusted/untrusted | 17:04 |
openstackgerrit | Paul Belanger proposed openstack-infra/zuul feature/zuulv3: Remove hardcoded LOGNAME for ansible-playbook https://review.openstack.org/485749 | 17:05 |
pabelanger | jeblair: k, any thoughts about accessing ansible variables on executor with trusted playbook? | 17:07 |
pabelanger | I think if the first playbook that runs today is trusted, localhost fact gets created and other playbooks will use it | 17:08 |
pabelanger | however, if untrusted is first, localhost fact cache will fail | 17:08 |
pabelanger | the other interesting thing is, from an untrusted playbook, it does create the localhost json file in fact-cache, even though the job fails | 17:18 |
SpamapS | jeblair: FYI, you don't need to find the bwrap driver to run its main method. it has a CLI entrypoint called zuul-bwrap :) | 17:21 |
jeblair | SpamapS: oh right! i should put a comment in there to remind myself. :) | 17:24 |
jeblair | pabelanger: i thought you were looking at pre-populating the cache with something so it doesn't run at all, and we would have the same behavior in both places. | 17:25 |
pabelanger | jeblair: right, we can do that. but it means we'd never be able to use ansible variables for a playbook on executor | 17:26 |
pabelanger | want to make sure we are okay with that | 17:27 |
jeblair | pabelanger: until someone comes up with a use case for it, i'm okay with it. | 17:27 |
pabelanger | ok | 17:28 |
* SpamapS sets out to write a disk monitor thing | 17:32 | |
*** artgon has joined #zuul | 17:43 | |
openstackgerrit | Paul Belanger proposed openstack-infra/zuul feature/zuulv3: Gather facts smartly and cache them https://review.openstack.org/485384 | 17:49 |
pabelanger | mordred: jeblair: ^ should be the fix | 17:49 |
*** amoralej is now known as amoralej|off | 17:59 | |
openstackgerrit | James E. Blair proposed openstack-infra/zuul feature/zuulv3: WIP: Check out implicit branch in timer jobs https://review.openstack.org/485329 | 18:00 |
jeblair | pabelanger: lgtm, thanks. let's see what mordred says when he awakens. | 18:02 |
*** harlowja has joined #zuul | 18:06 | |
*** bhavik1 has quit IRC | 18:35 | |
*** amoralej|off is now known as amoralej | 18:59 | |
Shrews | Has anyone started poking at the zuul auto-hold stuff yet? I might start looking into it if not. | 19:05 |
*** isaacb has joined #zuul | 19:20 | |
jeblair | Shrews: not that i'm aware of. there's a story for it with a plan i think should work. and i think if you assign it to yourself, you win. :) | 19:28 |
*** dmsimard|cave is now known as dmsimard | 19:28 | |
pabelanger | ++ to auto-hold, would make debugging new jobs easier too | 19:29 |
Shrews | been skirting the boundaries of zuul code thus far, so it would be my first adventure with it. hopefully it won't end up being over my head | 19:33 |
jeblair | Shrews: i'm happy to help with any questions! | 19:39 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul feature/zuulv3: WIP: Check out implicit branch in timer jobs https://review.openstack.org/485329 | 19:49 |
Shrews | i <3 watching streaming logs | 19:53 |
Shrews | much more readable these days, too | 19:53 |
jeblair | o hrm. the url in the status page isn't switching to the log url when the job is finished. it's supposed to do that. | 19:57 |
*** openstack has joined #zuul | 20:01 | |
pabelanger | can we also display the finger URL on the status page? | 20:01 |
pabelanger | but streaming is cool | 20:02 |
jeblair | yes we can+should | 20:04 |
*** jkilpatr has quit IRC | 20:05 | |
tobiash | jeblair: I also noticed that, but didn't have time yet to look into that | 20:06 |
pabelanger | tobiash: feedback for html stream, have autoscroll happen if you are at the bottom of the browser, but don't auto scroll if you move up a few lines | 20:07 |
pabelanger | I have no idea how you'd do that with JS | 20:07 |
* tobiash too | 20:07 | |
jeblair | pabelanger, mordred, clarkb, SpamapS: i'm working on tidying up things related to non-change jobs. periodic, post, tag, etc. can you take a look at the commit message and the docs for https://review.openstack.org/485329 and let me know if that direction looks good? | 20:08 |
tobiash | pabelanger: I'm a js and html noob so this was poor mans first try on console streaming ;) | 20:08 |
pabelanger | tobiash: well done! much better then I could have done :D | 20:09 |
jeblair | pabelanger, mordred, clarkb, SpamapS: (i know we're going to have jobs where we want to keep using the old shell variables during a transition period. i figure we can have some ansible to do the translations for us and add that to those jobs.) | 20:10 |
*** amoralej is now known as amoralej|off | 20:10 | |
mordred | morning all | 20:10 |
jeblair | mordred: morning! | 20:10 |
pabelanger | jeblair: sure, looking now | 20:10 |
pabelanger | look a mordred | 20:10 |
jeblair | pabelanger, mordred, clarkb, SpamapS: also, the changes to model.py there -- i've got distinct objects for ref, branch, tag, and change now. | 20:12 |
*** isaacb has quit IRC | 20:12 | |
*** openstackgerrit has quit IRC | 20:17 | |
*** isaacb has joined #zuul | 20:19 | |
*** dkranz has quit IRC | 20:22 | |
*** isaacb_ has joined #zuul | 20:22 | |
*** isaacb has quit IRC | 20:25 | |
mordred | jeblair: change looks good in general - but I'm also still waking up | 20:32 |
jeblair | mordred: no takebacks! | 20:34 |
*** dkranz has joined #zuul | 20:35 | |
SpamapS | jeblair: I think we'll all go a little less insane if we can change as little as possible about what gets run by jobs.. so yeah, using the old envvars, perhaps for a long time (till zuul4?) is a good idea. | 20:39 |
jeblair | SpamapS: i agree, though my time horizon is perhaps a bit shorter. the bulk of our jobs won't use the old zuul env variables out of the gate (none of the zuulv3 jobs currently do). but yeah, i don't want to be under time pressure to get the long tail cleaned up before ptg. | 20:43 |
*** openstackgerrit has joined #zuul | 20:45 | |
openstackgerrit | Monty Taylor proposed openstack-infra/zuul-jobs master: gzip console log before uploading it https://review.openstack.org/483611 | 20:45 |
openstackgerrit | Merged openstack-infra/zuul feature/zuulv3: Remove hardcoded LOGNAME for ansible-playbook https://review.openstack.org/485749 | 20:45 |
SpamapS | jeblair: the ones that use them are also going to be the stickiest nastiest ones that do everything we told them not to do ;) | 20:46 |
jeblair | SpamapS: indeed! | 20:47 |
openstackgerrit | Merged openstack-infra/zuul feature/zuulv3: Gather facts smartly and cache them https://review.openstack.org/485384 | 20:47 |
openstackgerrit | Monty Taylor proposed openstack-infra/zuul-jobs master: Simplify bindep logic removing fallback support https://review.openstack.org/482650 | 20:48 |
openstackgerrit | Monty Taylor proposed openstack-infra/zuul-jobs master: Simplify bindep logic removing fallback support https://review.openstack.org/482650 | 20:49 |
pabelanger | Yay, new facts merged | 20:50 |
mordred | pabelanger: I abaondoned your gather-facts change for that job | 20:51 |
pabelanger | mordred: ++ | 20:52 |
mordred | jeblair: if you get a sec: https://review.openstack.org/#/c/485377 | 20:52 |
pabelanger | mordred: caching is also fast | 20:52 |
pabelanger | tried it a few times | 20:52 |
mordred | there's only 3 open zuul-jobs changes: https://review.openstack.org/#/q/project:openstack-infra/zuul-jobs+status:open | 20:52 |
pabelanger | Yup, I can start back up tox role now | 20:53 |
openstackgerrit | Merged openstack-infra/zuul-jobs master: Remove OOM check from tox role https://review.openstack.org/485377 | 20:56 |
pabelanger | mordred: jeblair: I'm thinking we should also remove the 'zero' tests run check. Or is that something we care about for zuul-jobs? Maybe we still want it in openstack? | 20:59 |
jeblair | pabelanger: that one may be good to keep and generally useful. | 21:01 |
pabelanger | jeblair: okay, let me see what the logic would look like | 21:02 |
jeblair | pabelanger: i assume it will have to branch based on testr/nose/etc | 21:03 |
mordred | pabelanger: I think we're probably going to want to have a "did I find a .testrepository? if so, include: testrepository.yaml" | 21:03 |
pabelanger | mordred: ya, that's what I am thinking too | 21:04 |
mordred | pabelanger: you know what - that might want to be in its own role too - so that if someone is writing a job that does things but doesn't use tox (like a job to just run testr without tox or something) that they could add the "check-zero-tests" role to their playbook | 21:05 |
mordred | (or something) | 21:05 |
pabelanger | Agree | 21:06 |
openstackgerrit | Paul Belanger proposed openstack-infra/zuul-jobs master: Remove nodepool DIB specific logic https://review.openstack.org/485824 | 21:07 |
pabelanger | mordred: jeblair: I also wouldn't object if we wanted to restart zuulv3 to pick up smart facts too | 21:08 |
mordred | ++ | 21:09 |
jeblair | pabelanger: wfm | 21:10 |
jeblair | mordred: speaking of restart... did you see this from earlier? https://storyboard.openstack.org/#!/story/2001129 | 21:10 |
jeblair | mordred: i had to read a .json file to figure out what went wrong with a job :( | 21:10 |
jeblair | mordred: (the first play failed but did not log its failure to the job log. the second play *also* failed -- don't be distracted by that. :) | 21:12 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul feature/zuulv3: WIP: Check out implicit branch in timer jobs https://review.openstack.org/485329 | 21:13 |
mordred | jeblair: oh goodie | 21:19 |
*** dkranz has quit IRC | 21:20 | |
mordred | jeblair: cool. I should be able to reproduce that locally pretty easily | 21:21 |
pabelanger | jeblair: should graceful restart work? | 21:21 |
pabelanger | I haven't run the command yet on ze01 | 21:21 |
mordred | jeblair: so - unfortunately I actually CAN'T reproduce that locally | 21:25 |
mordred | oh - wait | 21:26 |
mordred | yup. there it is (I had fact caching turned on - so it wasn't running) | 21:27 |
mordred | jeblair: turns out the fact gathering is the relevant thing | 21:27 |
mordred | although there is another thing that's ugly ... | 21:28 |
jeblair | pabelanger: i think so? | 21:35 |
jeblair | pabelanger: but it might run into the pidfile issue on restart | 21:36 |
pabelanger | k, I'll stop when no jobs are running | 21:41 |
openstackgerrit | Monty Taylor proposed openstack-infra/zuul feature/zuulv3: Improve display of simple error messages https://review.openstack.org/485833 | 21:41 |
mordred | jeblair, pabelanger: that'll fix the story (gah, I should reference it in the commit message - one sec) | 21:42 |
openstackgerrit | Monty Taylor proposed openstack-infra/zuul feature/zuulv3: Improve display of simple error messages https://review.openstack.org/485833 | 21:42 |
mordred | that should fix the story - and also fix error messages that are just an error message from getting json-displayed | 21:43 |
jeblair | mordred: merge conflicts | 21:46 |
mordred | gah | 21:47 |
openstackgerrit | Monty Taylor proposed openstack-infra/zuul feature/zuulv3: Improve display of simple error messages https://review.openstack.org/485833 | 21:47 |
mordred | pabelanger: your item/pprint request is ... harder | 21:47 |
mordred | pabelanger: the problem is that what we're dealing with is that the dict on the ITEM line is the loop key | 21:49 |
mordred | in this case, we're looping over the results from the previous task | 21:49 |
mordred | so there is no "good" thing to display there | 21:49 |
pabelanger | hmm | 21:50 |
pabelanger | k, +3, let's not hold up the patch | 21:52 |
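The display problem mordred describes above - the ITEM line showing the entire registered-result dict when a task loops over results from a previous task - could be softened with a small label-picking helper along these lines. This is a sketch only; the preferred-key order is an assumption for illustration, not what the eventual zuul change does.

```python
def format_loop_item(item):
    """Pick a short, readable label for an Ansible ITEM line.

    When looping over registered results, each item is a large dict
    and printing it verbatim makes the log unreadable.  Prefer a
    human-meaningful key if one is present; otherwise fall back to a
    placeholder rather than dumping the whole structure.
    """
    if isinstance(item, dict):
        for key in ('item', 'name', 'path', 'cmd'):
            if key in item:
                return str(item[key])
        return '<dict>'
    return str(item)
```

As mordred says, there is no universally "good" thing to display for an arbitrary dict, so any such helper is a heuristic.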
pabelanger | mordred: also, using zuul_stream in tox jobs makes it pretty hard to see what ansible is doing. I had to actually disable it in ansible.cfg today | 21:53 |
pabelanger | I am not sure if we have a better story for that | 21:53 |
pabelanger | eg: tox -epy35 on zuul | 21:54 |
jeblair | pabelanger: i turn on KEEP_TEMPDIRS and go look at the job log. | 21:55 |
jeblair | pabelanger: if we want to make that more automatic, we could attach the job log as a subunit attachment | 21:55 |
mordred | pabelanger: can you expand on that? like - what is it that you were missing? (also, I expect that to be better once I get the json formatting thing done) | 21:55 |
pabelanger | jeblair: ya, I did have keep_tempdirs eventually. But attachment might be nice too. | 21:55 |
mordred | actually - lemme do a couple of things ... | 21:56 |
pabelanger | mordred: it was around the gather facts stuff today. So part of it was an ansible traceback, -vvv helped, but the other part was that shell output is no longer displayed in zuul_stream. So, shell: cat /etc/passwd wouldn't return any data for me | 21:57 |
pabelanger | so, I had to stop using zuul_stream and fall back to normal ansible output | 21:57 |
mordred | pabelanger: wait - that's a bug | 21:57 |
mordred | shell output should ABSOLUTELY be displayed in zuul_stream | 21:58 |
pabelanger | Ah, I thought we disabled it for some reason | 21:58 |
mordred | that's like the entire reason it exists | 21:58 |
mordred | is to live-stream shell output :) | 21:58 |
pabelanger | let me test it again here locally with your latest changes | 21:58 |
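The live-streaming mordred describes as zuul_stream's entire reason to exist is essentially a tail-follow loop over the console log written on the node. A simplified sketch of that idea - not Zuul's actual zuul_stream implementation, and `stream_lines` is a hypothetical name:

```python
import time

def stream_lines(path, stop_check, poll=0.1):
    """Follow a file and yield lines as they are written.

    A shell task writes its output to a console log on the node;
    the callback tails that file so the output appears live in the
    job log.  stop_check() tells us the task has finished and no
    more output will arrive.
    """
    with open(path) as f:
        while True:
            line = f.readline()
            if line:
                yield line.rstrip('\n')
            elif stop_check():
                return
            else:
                time.sleep(poll)
```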
jeblair | i worry that we're not all on the same page here | 21:59 |
jeblair | so at the risk of stating things people may already know: | 21:59 |
mordred | cool - please do - and yes, any time that you have to disable something, or hold a temp dir or look in the json file to figure out what went wrong with a job, please point that out | 21:59 |
jeblair | i think pabelanger is concerned with running tests on zuul itself. so not a running production instance of zuul, but rather what's happening inside of a unit test. | 22:00 |
mordred | AH. gotcha | 22:00 |
mordred | thank you | 22:00 |
mordred | we were not on the same page | 22:00 |
pabelanger | jeblair: yes, thanks | 22:01 |
pabelanger | ze01 has been restarted | 22:04 |
openstackgerrit | Paul Belanger proposed openstack-infra/zuul-jobs master: Remove gather_facts: true https://review.openstack.org/485834 | 22:06 |
*** jkilpatr has joined #zuul | 22:07 | |
pabelanger | mordred: we likely can drop the validate-host gather-facts logic too, and maybe just use the fact_cache files instead? | 22:08 |
mordred | pabelanger: yah - you mean the gather_facts at the top of playbooks/unittests/pre.yaml ? | 22:10 |
mordred | oh - you mean the setup call | 22:10 |
pabelanger | ya: http://git.openstack.org/cgit/openstack-infra/zuul-jobs/tree/roles/validate-host/tasks/main.yaml#n16 | 22:11 |
pabelanger | that _should_ be the same as cached facts right? | 22:11 |
pabelanger | we could have a post job to add it into logs folder | 22:11 |
mordred | pabelanger: yup. I agree | 22:11 |
mordred | pabelanger: we can ALSO remove the gather_facts: and gather_subset: lines | 22:12 |
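Reading the cached facts instead of re-running the setup call in validate-host could look roughly like this. A sketch assuming Ansible's jsonfile fact-cache layout of one JSON file per host; `load_cached_facts` is a hypothetical helper, not zuul-jobs code.

```python
import json
import os

def load_cached_facts(fact_cache_dir, hostname):
    """Read one host's facts from Ansible's JSON fact cache.

    With fact caching enabled, the setup module's results are
    written to a per-host JSON file, so a role (or a post playbook
    copying the files into the logs folder) can read them instead
    of calling setup again.
    """
    path = os.path.join(fact_cache_dir, hostname)
    with open(path) as f:
        return json.load(f)
```

This matches pabelanger's point that the cache file _should_ contain the same data the extra setup call produces.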
mordred | pabelanger: (we're also doing validate-host in both base and unittests atm) | 22:12 |
pabelanger | mordred: ya, I see that now too | 22:12 |
pabelanger | bug? | 22:12 |
pabelanger | our logs are looking real good now too | 22:13 |
pabelanger | much nicer to debug with | 22:13 |
openstackgerrit | Merged openstack-infra/zuul feature/zuulv3: Improve display of simple error messages https://review.openstack.org/485833 | 22:20 |
openstackgerrit | Paul Belanger proposed openstack-infra/zuul-jobs master: Remove validate-host from unittests/pre.yaml https://review.openstack.org/485836 | 22:23 |
pabelanger | mordred: openstack-py35 now passing on shade: http://logs.openstack.org/27/484027/5/check/openstack-py35/1b33d30/job-output.txt | 22:23 |
pabelanger | mordred: still need to rework the job based on your feedback still | 22:24 |
jeblair | oO | 22:24 |
pabelanger | also, still | 22:24 |
mordred | woot! | 22:24 |
*** dkranz has joined #zuul | 22:25 | |
pabelanger | mordred: I would have expected TASK [Gathering Facts] to return ubuntu-xenial | ok in that log too. But holding out to restart ze01 again with latest changes | 22:25 |
pabelanger | jeblair: did you mention you were working on a fix for the logs.o.o link on the zuulv3.o.o status page after a job finishes? | 22:28 |
pabelanger | also, https://review.openstack.org/#/c/483611/ could use a review too. gzip our job-output.txt file | 22:29 |
jeblair | pabelanger: no fix, just observed the issue | 22:29 |
openstackgerrit | Merged openstack-infra/zuul-jobs master: gzip console log before uploading it https://review.openstack.org/483611 | 22:33 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul feature/zuulv3: WIP: Check out implicit branch in timer jobs https://review.openstack.org/485329 | 22:37 |
openstackgerrit | Paul Belanger proposed openstack-infra/zuul-jobs master: WIP: Move subunit processing https://review.openstack.org/485840 | 22:49 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul feature/zuulv3: WIP: Check out implicit branch in timer jobs https://review.openstack.org/485329 | 22:55 |
openstackgerrit | Paul Belanger proposed openstack-infra/zuul-jobs master: WIP: Move subunit processing https://review.openstack.org/485840 | 22:56 |
openstackgerrit | Paul Belanger proposed openstack-infra/zuul-jobs master: WIP: Move subunit processing https://review.openstack.org/485840 | 23:03 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul feature/zuulv3: Check out implicit branch in timer jobs https://review.openstack.org/485329 | 23:03 |
jeblair | pabelanger, mordred, SpamapS: ^ that's in final form for review | 23:03 |
jeblair | also, it's 50% docs. :) | 23:04 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul feature/zuulv3: Rename tags job variable jobtags https://review.openstack.org/485845 | 23:06 |
openstackgerrit | Paul Belanger proposed openstack-infra/zuul-jobs master: WIP: Move subunit processing https://review.openstack.org/485840 | 23:07 |
openstackgerrit | Paul Belanger proposed openstack-infra/zuul-jobs master: WIP: Move subunit processing https://review.openstack.org/485840 | 23:24 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul feature/zuulv3: Rename uuid to build https://review.openstack.org/485851 | 23:32 |
* SpamapS getting into nitty gritty of config file changes for a disk monitor | 23:39 | |
SpamapS | This feels like something that would possibly be unique to each executor. | 23:40 |
SpamapS | Though that may also not really be true in practice, since it would be annoying if sometimes your job works, and sometimes it fails, because sometimes it hits the big executor and sometimes the small one. | 23:41 |
pabelanger | jeblair: mordred: going to restart ze01 again to pick up latest logging changes | 23:41 |
mordred | cool | 23:41 |
jeblair | SpamapS: i think we also had the idea of a jobs-per-executor limit. so fine tuning might involve considering those values together. | 23:43 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul feature/zuulv3: Add zuul.items to job vars https://review.openstack.org/485853 | 23:44 |
mordred | jeblair: your stack looks good - but there's a missing word in the docs in the first patch | 23:48 |
SpamapS | jeblair: Right, what would limit an executor from overloading itself now? Nothing? | 23:54 |
jeblair | mordred: ugh. maybe a followup at this point? :) | 23:58 |
openstackgerrit | Monty Taylor proposed openstack-infra/zuul feature/zuulv3: Log items in loops better https://review.openstack.org/485864 | 23:58 |
mordred | jeblair: yah. that's what I was thinking | 23:59 |
mordred | pabelanger: ^^ that should make your item/loop thing from earlier better | 23:59 |
jeblair | SpamapS: indeed. we spun up zl08 recently because we think we overloaded our v2 launchers when running at full capacity (we were at 213 simultaneous jobs, now down to 187) | 23:59 |
pabelanger | okay, ze01 restarted | 23:59 |
Generated by irclog2html.py 2.15.3 by Marius Gedminas - find it at mg.pov.lt!