Thursday, 2017-05-25

jlk	Turns out, when we poke at the dusty corners, we uncover some ugly stuff.	00:06
*** adam_g has quit IRC		00:06
*** adam_g has joined #zuul		00:09
openstackgerrit	Monty Taylor proposed openstack-infra/zuul feature/zuulv3: Put variables into the inventory https://review.openstack.org/467635	00:16
openstackgerrit	Monty Taylor proposed openstack-infra/zuul feature/zuulv3: Write inventory as yaml not ini https://review.openstack.org/467634	00:16
openstackgerrit	Monty Taylor proposed openstack-infra/zuul feature/zuulv3: Display command/shell results in a banner https://review.openstack.org/467603	00:21
openstackgerrit	Monty Taylor proposed openstack-infra/zuul feature/zuulv3: Strip unneeded cmd, changed, stdout and stderr from results https://review.openstack.org/467310	00:21
mordred	jeblair: ^^ I updated that per your review and also responded. relatedly - I think we're only fetching the contents of the first task	00:22
mordred	jeblair: I seem to remember we discovered that was a bug before	00:22
openstackgerrit	Merged openstack-infra/zuul feature/zuulv3: Disable leaked repo check https://review.openstack.org/466810	00:23
mordred	Shrews: ^^ since you've been looking at logging, maybe you'll have an idea? (It'll be easier to spot once the logs aren't spammy)	00:23
jlk	oooh merged the leaked repo.	00:28
jlk	time to recheck all the things.	00:28
mordred	jlk: I just rechecked the first half of your stack	00:29
jlk	^5	00:29
openstackgerrit	Merged openstack-infra/zuul feature/zuulv3: Wait for merger to complete before item is ready https://review.openstack.org/463054	00:44
openstackgerrit	Merged openstack-infra/zuul feature/zuulv3: Add support for requiring github pr head status https://review.openstack.org/449390	00:44
*** tristanC has joined #zuul		01:06
jlk	mordred: you've used betamax in the past, would you have any objections if it were brought into zuul to better test github driver code that interacts with the API? We've hit a number of bugs that were not discoverable until running the actual code against the API instead of our mocked fake stuff.	01:22
jlk	mordred: I'm thinking we can just stuff json events into the webapp, record the zuul responses to interacting with real PRs on github, and then re-use those betamax responses in tests.	01:23
jlk	(similar to how github3.py does things, except they don't deal with webhooks)	01:23
openstackgerrit	Jesse Keating proposed openstack-infra/zuul feature/zuulv3: Implement trigger require/reject-review https://review.openstack.org/463387	01:35
jeblair	jlk, mordred: i'm a fan of betamax for such things	01:39
openstackgerrit	Jesse Keating proposed openstack-infra/zuul feature/zuulv3: Include tenant name in github context https://review.openstack.org/463871	01:39
openstackgerrit	Jesse Keating proposed openstack-infra/zuul feature/zuulv3: Save installation ids to a cache and fetch them per project https://review.openstack.org/463424	01:39
openstackgerrit	Jesse Keating proposed openstack-infra/zuul feature/zuulv3: Catch integration key file not found in zuul https://review.openstack.org/463425	01:39
openstackgerrit	Jesse Keating proposed openstack-infra/zuul feature/zuulv3: Create github client each time it's called. https://review.openstack.org/463421	01:39
openstackgerrit	Jesse Keating proposed openstack-infra/zuul feature/zuulv3: Use integration_id with github https://review.openstack.org/463420	01:39
openstackgerrit	Jesse Keating proposed openstack-infra/zuul feature/zuulv3: Refactor integration key loading to always load https://review.openstack.org/463423	01:39
openstackgerrit	Jesse Keating proposed openstack-infra/zuul feature/zuulv3: Decode JSON body once for requests https://review.openstack.org/463422	01:40
jeblair	jlk: i'm thinking that might be best as a somewhat more traditional unit test of the github driver, yeah? verifying one thing at a time and keeping the fake for the functional tests?	01:41
jlk	maybe? I'm a bit of a novice when it comes to designing tests. Mostly I just dislike that in the github test area we're manufacturing API call/responses and at times we've done it wrong. SO we wind up mostly testing our fake methods instead of testing the real methods.	01:43
jeblair	jlk: (the only reason i'd hesitate about betamax is the thought that the fake needs to be accurate enough for the functional tests, so once we get all the errors out, will betamax still offer something beyond the fake?	01:43
jlk	I've tried to extract the calls to external APIs into small functions but it's not complete.	01:43
* jlk afk for the dinners.		01:44
jeblair	the main thing is that the fake needs to be fully functional so that we can make changes to zuul itself with confidence. also testing the driver with betamax is fine with me (if, at the end of the day, it's still useful)	01:46
jeblair	fwiw, on the gerrit side, we created the fake by copying the data from production responses and then stuffing them into the fake with templated values	01:49
jeblair	betamax might be useful getting a better baseline for the fake	01:49
jlk	oh oh	02:25
jlk	So	02:25
jlk	I can easily fake the events.	02:25
jlk	that we have to have no matter what	02:25
jlk	it's what the driver does in _response_ to the events that matters. Because unlike gerrit, we don't get _all_ the data. Github driver has to make calls back to the API to get more data.	02:25
jlk	and, it makes calls to _set_ data, like status. It's those things that we're doing now with fake calls, that's bypassing parts of the actual driver code. If we used betamax, it'd use the driver code, and we should still be able to inspect that it Did The Right Thing.	02:26
*** adam_g has quit IRC		02:28
*** adam_g has joined #zuul		02:30
SpamapS	jlk: I think jeblair is suggesting that you could make a fake for that side too.	03:26
SpamapS	2017-05-25 03:57:40,836 kazoo.client INFO Connecting to localhost:2181	03:58
SpamapS	2017-05-25 03:57:40,837 kazoo.client WARNING Connection dropped: socket connection error: Connection refused	03:58
SpamapS	the plot thickens	03:58
SpamapS	btw, we have _something_ calling basicConfig during tests	03:58
* SpamapS puts a pin in it to come back to		03:58
SpamapS	I monkeypatched basicConfig out and at least started getting sane output from these fails	03:58
SpamapS	gah n/m on the ZK thing	04:00
* SpamapS has too many things going on at once ;)		04:00
jlk	SpamapS: but... isn't that what betamax does? Creates the fakes for you, based on the real interaction, so you can run it over and over again without needing the real API there?	04:03
SpamapS	jlk: I believe it will parrot things back to you. I don't believe it will handle a two way comm functionally the way the fake gerrit does for instance.	04:06
jlk	what's the difference?	04:06
SpamapS	The fake gerrit keeps some state and will feed it back to you as it was submitted. I don't recall that betamax has a facility for that.	04:06
SpamapS	so you can tell the fake gerrit "add a change. now give me the event stream".. betamax is more "if I add a change, respond with this response"	04:07
SpamapS	but I could be wrong	04:07
SpamapS	my betamax experience was over a year ago and brief	04:08
jlk	ah.	04:12
jlk	hrm.	04:12
jlk	I'm not sure if it's that way or if I'm following	04:12
jlk	but it seemed like you could betamax a multi-call thing and have the back/forth	04:13
jlk	maybe it's because there isn't an "event stream" on the github side per se	04:13
*** bhavik1 has joined #zuul		05:25
SpamapS	jlk: probably worth investigating.	05:38
* SpamapS 's been debugging on the ssh agent.. all the live long day		05:53
*** bhavik1 has quit IRC		07:01
*** DangerousDaren has joined #zuul		07:39
*** Cibo_ has joined #zuul		08:40
*** smyers has quit IRC		09:04
*** Cibo_ has quit IRC		09:09
*** smyers has joined #zuul		09:15
*** bhavik1 has joined #zuul		09:41
*** jkilpatr has quit IRC		10:57
*** jkilpatr has joined #zuul		11:14
*** isaacb has joined #zuul		11:26
*** isaacb has quit IRC		11:30
*** bhavik1 has quit IRC		11:43
openstackgerrit	Tobias Henkel proposed openstack-infra/zuul feature/zuulv3: Use ssh for git-upload-pack https://review.openstack.org/436802	11:48
*** jkilpatr has quit IRC		12:00
*** jkilpatr has joined #zuul		12:01
*** jkilpatr has quit IRC		12:16
*** jkilpatr has joined #zuul		12:43
*** rcarrillocruz has quit IRC		12:46
Shrews	mordred: sorry, an idea on what?	13:05
mordred	Shrews: I _believe_ there is a bug in the consumption of log streaming and writing it to the file on the executor	13:17
mordred	Shrews: I _think_ the bug is that it for $reason only winds up streaming the first shell task and not subsequent ones	13:18
Shrews	hmm	13:18
mordred	Shrews: it's hard to see right now because we also log a whole mess of garbage	13:18
mordred	which https://review.openstack.org/#/c/467310/ will hopefully fix	13:18
Shrews	i still see garbage :(	13:19
mordred	yah. so much garbage	13:19
mordred	we need to land 467310 and 467603 and the logs should be prettier	13:19
mordred	pabelanger: ^^ if you're up - those are for the logging issue	13:20
openstackgerrit	Merged openstack-infra/zuul feature/zuulv3: Re-enable E305 pep8 errors https://review.openstack.org/466286	13:21
*** dkranz has quit IRC		13:23
* SpamapS has resorted to appending debug traces to files to try and find out where the thing is getting stuck		13:36
mordred	SpamapS: print is the ultimate debug tool	13:38
SpamapS	nope	13:38
SpamapS	that gets swallowed	13:38
SpamapS	thanks testr!	13:38
mordred	oh yuck	13:38
SpamapS	(and the problem doesn't happen not in testr)	13:38
mordred	SpamapS: running the test in isolation outside of testr everything is fine?	13:39
SpamapS	so like literally I have a thing that appends to a file at ~/tmp/debug.{pid}	13:39
mordred	wow	13:39
SpamapS	mordred: no one test kills it	13:39
SpamapS	it just gets stuck somewhere and times out	13:39
*** Cibo_ has joined #zuul		13:39
SpamapS	in futex()	13:39
SpamapS	which tells me "in python thread stuffs"	13:40
* mordred hands SpamapS a large bucket into which to put his sads		13:40
mordred	I _hate_ these types of bugs	13:40
mordred	because the sad thing is - they're REAL bugs that will hit actual production at some point	13:41
SpamapS	the good news is I just found the line it's getting stuck at	13:41
mordred	ooh!	13:41
* mordred hands SpamapS a large bucket into which to put his happy		13:41
SpamapS	the bad news is.. now what? :-P	13:41
mordred	SpamapS: have you tried deleting the line?	13:41
mordred	maybe it's just naughty	13:41
SpamapS	mordred: who needs to join threads anyway?	13:42
SpamapS	be an individual	13:42
SpamapS	not a joiner	13:42
mordred	that's what I'm saying	13:43
mordred	SpamapS: so -it's trying to join threads but one of the threads decides it can't because $reason and $reason is the issue	13:43
SpamapS	Right, the threads do try to cooperate a bit	13:43
SpamapS	so we have a running flag	13:44
SpamapS	and we set it to False, and then wait for that to work	13:44
mordred	but if one of the threads is blocked on something, it can't cycle back around to obey the flag	13:44
SpamapS	I think yes	13:44
SpamapS	I think	13:46
mordred	SpamapS: golly - you might have to add ~/tmp/debug.{threadid} files	13:46
SpamapS	I may be wrong	13:46
SpamapS	I think it's possible we're doing a blanket 'wait()'	13:46
SpamapS	and picking up the ssh-agent	13:46
mordred	ahhhhh	13:46
mordred	that is not a bad hypothesis	13:46
SpamapS	hrm no... waitpid	13:47
SpamapS	self.proc.wait() :-P	13:47
mordred	ssh-agent is going in to subprocess right?	13:47
mordred	but I guess there is a controlling execution thread that owns that subprocess handle	13:47
SpamapS	so, I tried it with os.fork/execlp actually	13:48
SpamapS	that doesn't seem to fix it	13:48
* mordred is just being the dumb sounding board saying stupid things in case the trigger a thought btw		13:48
SpamapS	appreciate it	13:48
SpamapS	there are a lot of bad theories bouncing around in my head	13:48
mordred	SpamapS: ok - I have a REALLY stupid idea	13:48
SpamapS	that's exactly what I need	13:49
mordred	SpamapS: but it's REALLY stupid - so bear with me	13:49
mordred	SpamapS: make a completely separate daemon whose job is only to fork and reap ssh-agents. have the executor request a new ssh-agent from the agent daemon rather than forking one itself, and have the executor return the agent when it's done	13:50
SpamapS	Ugh	13:50
SpamapS	I mean yeah	13:50
SpamapS	that's stupid	13:50
mordred	it is	13:50
SpamapS	but	13:50
SpamapS	I'm getting to where I have no idea why having this process running is locking up my python interpreter	13:51
mordred	however - we might be hitting the point where we're at enough layers of subprocesses interleaved with threads that we just don't have the tools to actually find the sticking point	13:51
Shrews	SpamapS: ooh, i cherish the bugs that force me to log to a file	13:51
SpamapS	found spot seems to have been a red herring.. it hasn't locked up there again	13:51
Shrews	o/` Cherish the bugs you have o/`	13:52
SpamapS	Cherish is the word I use to remind me of...	13:53
SpamapS	Shrews's Love	13:53
Shrews	aawwwww	13:53
Shrews	oops, autocorrect. should have been...	13:53
Shrews	eewwwww	13:53
Shrews	mordred: is it ok to filter result._result before calling the super?	13:55
Shrews	need to check what the super does with that...	13:55
* SpamapS has to go begin the child shuttling process		13:55
SpamapS	mordred: I'll ponder on that.. thanks for the dumb idea	13:56
mordred	Shrews: I responded to same question in the review from jeblair	13:58
mordred	Shrews: we have no choice	13:58
Shrews	oh	13:59
mordred	Shrews: well - we do have a choice - if we find that filtering first is unacceptable because it would break something else - we can stop calling super and write our own complete methods	13:59
mordred	Shrews: so figuring out if it's a bad idea _is_ worthwhile	13:59
*** isaacb has joined #zuul		14:00
Shrews	mordred: i'm not immediately seeing anything that would break. However, I'm willing to bet that this is yet another unexpected bastardization of the ansible API that would make bcoca and jimi cringe. :)	14:01
mordred	yah	14:02
mordred	Shrews: that said - we need to put on the list "need defined API for result objects in callback plugins"	14:03
mordred	becauseyou basically can't do _anything_ without hitting private members	14:03
mordred	I know bcoca has said before that they weren't ready to commit to what's there yet	14:03
mordred	which I totally respect	14:03
*** Cibo_ has quit IRC		14:04
Shrews	yep	14:04
mordred	but it seems enough people are writing callback plugins that it's time ready or not	14:04
*** dkranz has joined #zuul		14:18
*** isaacb has quit IRC		14:34
rbergeron	pabelanger: where is beefy being used?	14:48
pabelanger	rbergeron: just some random hacking by dougbtv	14:50
dougbtv	:D	14:51
dougbtv	it's in a cowsay-beefymiracle rpm available in fedora fwiw	14:51
rbergeron	yay	14:56
rbergeron	ahhh	14:56
rbergeron	ah, thanks to rrix and will woods	14:58
* rbergeron wonders if they can be bribed to package it for epel, lol		14:59
dougbtv	+1!	15:13
*** isaacb has joined #zuul		15:17
pabelanger	mordred: clarkb: jeblair: do you mind adding https://review.openstack.org/#/c/452494/ so your review pipeline. Its our nodepool console-log patch	15:24
mordred	pabelanger: we wrote get_server_console in shde to just return none if the API call fails,right?	15:26
pabelanger	let me confirm	15:27
pabelanger	mordred: empty string it looks like	15:28
mordred	pabelanger: ok - cool. so no exceptions	15:38
mordred	pabelanger: patch +2 from me	15:38
*** jasondotstar has quit IRC		15:43
*** leifmadsen has quit IRC		15:45
*** leifmadsen has joined #zuul		15:49
*** jasondotstar has joined #zuul		15:51
openstackgerrit	James E. Blair proposed openstack-infra/zuul feature/zuulv3: Add override-branch property to job repos https://review.openstack.org/467375	15:58
openstackgerrit	James E. Blair proposed openstack-infra/zuul feature/zuulv3: Enable test_one_branch cloner test https://review.openstack.org/467779	15:58
openstackgerrit	James E. Blair proposed openstack-infra/zuul feature/zuulv3: Check out the appropriate branch in executor https://review.openstack.org/467778	15:58
openstackgerrit	James E. Blair proposed openstack-infra/zuul feature/zuulv3: Use the executor cached repos more often https://review.openstack.org/467777	15:58
openstackgerrit	James E. Blair proposed openstack-infra/zuul feature/zuulv3: Return resulting commits from merger https://review.openstack.org/467776	15:58
openstackgerrit	James E. Blair proposed openstack-infra/zuul feature/zuulv3: Add default-branch property to projects https://review.openstack.org/467334	15:58
openstackgerrit	James E. Blair proposed openstack-infra/zuul feature/zuulv3: Rename 'repos' job attribute to 'required-projects' https://review.openstack.org/467376	15:58
openstackgerrit	James E. Blair proposed openstack-infra/zuul feature/zuulv3: Add override-branch property to jobs https://review.openstack.org/467775	15:58
*** DangerousDaren has quit IRC		16:08
mordred	SpamapS, jeblair: topic came up in a completely unrelated conversation - but I thought it might be worth mentioning here	16:21
mordred	SpamapS, jeblair: http://opentracing.io/ ... current effort Adrian Cole (dude who wrote jclouds originally) is working on - it's a spec and a set of implementation for tracing things through distributed systems	16:22
mordred	we have a distributed system that's goal in life is to become even more distributed - thought at least pointing people to that would be nice - on the off chance we happen across the problems they're focused on solving	16:23
SpamapS	heh	16:25
SpamapS	tracing :)	16:25
jeblair	mordred: nod could be interesting. though i'd like to start by making sure our log files are sufficient (and sufficiently tagged) so that an admin doesn't need an extra tool to do basic debugging. but once we have our house in order, might be nice. :)	16:26
SpamapS	I need something like set -x for python right now :-P	16:28
* SpamapS is literally putting a debug append to file on every line in executor server right now		16:28
jeblair	SpamapS: maybe i should poke at that problem with you... what do i do?	16:29
*** bhavik1 has joined #zuul		16:29
jeblair	mordred, jlk: i reviewed a bunch of github changes	16:29
mordred	jeblair: woot. I'll do that next	16:32
SpamapS	jeblair: Would love a second set of eyes. Especially if they're your elf eyes.	16:32
SpamapS	jeblair: the problematic patch is 462712	16:32
SpamapS	jeblair: running tests under testr times out tests reliably. Have tried on single thread, still happens.	16:33
SpamapS	stracing reveals before they get SIGALRM they're in 'futex()'	16:33
jeblair	but running a single test succeeds?	16:33
SpamapS	and looks like they're stuck on joining threads	16:33
mordred	speaking of logging, I just pushed up a shade stack related to logging that ends in https://review.openstack.org/#/c/468095/ which is a document about the loggers we log to. mostly mentioning here in that, in as much as nodepool uses shade, admin info on how to control the logging shade emits can be useful	16:33
SpamapS	jeblair: yes any single test succeeds though it's possible that's a sample size problem	16:34
jeblair	mordred: yeah, we may want to xref that in nodepool docs	16:34
SpamapS	also setting gentle=True deadlocks forever, presumably because the thread that receives SIGALRM is the one stuck in futex()	16:35
mordred	jlk: I'm not going to -1 it - but in https://review.openstack.org/#/c/453844/17/tests/base.py you moved user = 'zuul' to a parameter - but the comment at the top of the method says "we hard set user to zuul" - I think the comment needs changing/removal	16:36
jeblair	SpamapS: neat. my ssg-adent has no "-D" argument	16:37
jeblair	ssh-agent even	16:37
jeblair	ii openssh-client 1:6.6p1-2ubuntu2.8	16:37
jeblair	SpamapS: that's what zuulv3-dev has as well. any chance yours is a gnome thingy?	16:40
SpamapS	whhhhhhhhhat?	16:40
SpamapS	openssh-client: /usr/bin/ssh-agent	16:40
SpamapS	ii openssh-client 1:7.2p2-4ubuntu2.2 amd64 secure shell (SSH) client, for secure access to remote machines	16:41
SpamapS	so -D is new?	16:41
SpamapS	hrm	16:41
jeblair	seems that way	16:41
jeblair	http://paste.openstack.org/show/610657/	16:41
SpamapS	good 'ol BSD people.. daemonizing everything	16:41
SpamapS	so the original patchset did use the daemonized one	16:41
SpamapS	that made me nervous though	16:41
SpamapS	as it makes it harder for us to ensure they get cleaned up	16:41
SpamapS	at least with it in the same pg as zuul-executor in theory when zuul-executor's session ends it will get a HUP and die.	16:42
pabelanger	I was using ssh-agent -k	16:42
pabelanger	to help clean up	16:42
SpamapS	so of course we can track pids and kill	16:42
SpamapS	but the kernel helps us quite a bit if we stay in the same pg	16:42
jeblair	yeah, i really like the foregroud/pgroup option	16:43
SpamapS	but requiring openssh >= 7.x might be a bit much?	16:43
SpamapS	Trusty still has 2 years left.	16:43
clarkb	we already require bubblewrap	16:44
clarkb	or mostly require it	16:44
jeblair	and we're on the verge of being py3 only	16:44
SpamapS	and RHEL6 still has 2 years too :-P	16:44
clarkb	rhel6 is python2.6 which I doubt we care about	16:44
pabelanger	centos-7 lacks zookeeper currently	16:44
SpamapS	so	16:44
SpamapS	agree on that!	16:44
SpamapS	however..	16:44
SpamapS	there's a difference between helpers (zk, bwrap) and core system software (ssh)	16:44
pabelanger	I've been testing on fedora latest, works well	16:44
SpamapS	quite a different thing to run backported ssh	16:45
openstackgerrit	Merged openstack-infra/zuul feature/zuulv3: Adds github triggering from status updates https://review.openstack.org/453844	16:45
clarkb	aren't we talking about running backported kernels?	16:45
clarkb	I don't think you get more core sytstem software than that	16:45
pabelanger	ssh-agent on xenial support -D	16:46
openstackgerrit	Merged openstack-infra/zuul feature/zuulv3: Implement pipeline requirement on github reviews https://review.openstack.org/453845	16:46
pabelanger	and I think we're planning on migrating more things in openstack-infra to xenial	16:46
pabelanger	I just did nodepool-builders this week	16:46
SpamapS	I'm surprised we booted zuulv3-dev on trusty :-P	16:47
pabelanger	easy mode	16:47
clarkb	yes I awnt it if for no other reason than to be systemd everywhere and no longer have to deal with 3 init systems	16:47
mordred	also - isn't zesty already out?	16:47
pabelanger	We could upgrade it to xenial, if people would like to do so.	16:47
mordred	oh - wait - zesty still isn't lts is it?	16:48
SpamapS	zesty is in fact out	16:48
SpamapS	but zesty is a 9 month support release	16:48
jeblair	pabelanger: we'll need to soon anyway for the py3 stuff	16:48
mordred	jeblair: ++	16:48
pabelanger	okay, I can prep zuulv3 upgrade	16:48
mordred	I certainly don't think we should target zuul v3 to trusty	16:48
clarkb	no, but xenial is fine	16:49
mordred	yup	16:49
pabelanger	actually	16:49
pabelanger	can we just inplace distro upgrade it? That there is alot of stuff that is not under puppet right now	16:49
jeblair	pabelanger: well, maybe we should leave it alone and build zuulv3.openstack.org on xenial with puppet?	16:50
pabelanger	we can, just a lot of puppet work atm	16:50
SpamapS	you can borrow BonnyCI's ansible if you want ;-)	16:50
* SpamapS ducks		16:51
jeblair	i'd be down with that :)	16:51
mordred	honestly, if puppet is hurting not helping us to manage the systems, I think that should be considered	16:52
mordred	managing our systems via config management is suppoesd to be a force-multiplier	16:52
SpamapS	ya'll puppet what you need to puppet. :)	16:52
pabelanger	well, we do have some ansible roles today in gate. I'd love to demo some of the stuff I did will windmill testing :)	16:52
mordred	if we're avoiding tasks because too-much-puppet - it's not helping	16:52
pabelanger	that's how I have been rolling zuulv3 locally	16:52
SpamapS	hoist likely has some non-generic code that would make porting it to infra as hard as porting openstack's stuff to v3. :)	16:52
pabelanger	but, didn't want to push the ansible / puppet cfgmgt issue in openstack-infra	16:53
clarkb	I'm not sure that changing cfg mgmt systems magically fixes anything	16:53
Shrews	jeblair: mordred: well, i absolutely am baffled as to why I cannot get the asyncio server to shutdown when I daemonize the thing. It works fine non-daemonized. I'm about to give up on this.	16:53
SpamapS	(hoist being our pirate-themed Ansible for deploying BonnyCI)	16:53
clarkb	(they all have problems and make like miserable in their own ways)	16:53
jeblair	at any rate, one way or another, we do need the production system to be under config management, so i think we should leave the dev machine alone and launch the prod server on xenial with cfgmgt	16:53
mordred	I agree with jeblair	16:53
mordred	I also agree with clarkb - Ido not think there is a magical bullet - but if humans are avoiding the puppet work it's a worthy facet to consider	16:54
mordred	jeblair, jlk: different topic- minor bikeshed: https://review.openstack.org/#/c/460700/14/tests/base.py ... gh_time_format = '%Y-%m-%dT%H:%M:%SZ'	16:54
mordred	I bring this up because I quite literally JUST hit this with shade and nova api	16:54
clarkb	protip never ever read teh ansible service management code if you want to use ansible to manage services	16:54
jlk	So, just so I can keep it straight again, EventFilter deals with data in the actual event (like the branch, the repo, etc.. and the RefFilter is about deeper data in the change, like what votes it has or whether it's the latest ref in the change request and whatever.	16:55
SpamapS	clarkb: to be fair, they have to deal with even more than 3 init systems. :)	16:55
jlk	mordred: I have no skin in that game, I think that was a time format created by adam_g	16:55
mordred	jeblair, jlk: '%Y-%m-%dT%H:%M:%SZ' is a subset of ISO 8601 - but is not full ISO 8601 - are we certain that github uses the subset and will never send richer ISO 8601 values?	16:55
jeblair	jlk: yep	16:55
jlk	er not format, but code	16:55
clarkb	SpamapS: ya tahts actually a big part of my grump. Everywhere else ansible has explicitly decided to avoid abstracting things into generic resources. Except for service. And its terrible	16:55
jlk	mordred: we're not certain about anything with regard to the github API	16:55
clarkb	I think in part because it flies in the face of the general design of the rest of it	16:56
jlk	all we have is empirical evidence	16:56
jlk	oh oh are we griping about the "service" module?	16:56
* jlk hates it		16:56
jlk	clarkb: the "package" module is getting to be like that too, trying to abstract around apt, yum, dnf, etc...	16:57
mordred	jlk: which is a dumb idea	16:58
jlk	I get the desire, but I choose not to embrace it	16:58
mordred	jlk: the original ansible approach of an apt, yum and dnf module is much nicer	16:58
mordred	yup	16:58
jlk	I'd rather use playbook branching logic to call the correct modules.	16:58
mordred	dehann wasn't always right - but he's right on this one	16:59
jlk	agreed	16:59
mordred	well - you have to have branching logic ANYWAY	16:59
SpamapS	clarkb: Agree, would much rather see a systemctl module.	16:59
mordred	because fo package names	16:59
mordred	SpamapS, clarkb: https://docs.ansible.com/ansible/systemd_module.html	16:59
jlk	well..	16:59
jlk	yes	16:59
SpamapS	well there you go	16:59
clarkb	mordred: oh nice	16:59
jlk	you need branching logic for variable data loading	16:59
clarkb	even more reason to just get to systemd everywhere world :)	16:59
jlk	and presumably you always feed the same _variable name_ into the "package" action plugin.	17:00
clarkb	as much as I'm sure it hurts mordred to do that :)	17:00
* SpamapS is now more confused after adding tracing		17:00
mordred	jlk: so - data type is not listed in thier API reference anywyhere for datatimes - but in their graphql docs they have "An ISO-8601 encoded UTC date string." - https://developer.github.com/early-access/graphql/scalar/datetime/	17:00
jlk	variable branching is WAY easier than playbook branching	17:00
SpamapS	thread is exiting... join is not returning	17:00
jlk	mordred: I really need to get a grasp on graphql and see if we should move over to it, but for now we're on the REST API	17:00
mordred	in any case - it may be worth considering just using iso8601 to parse the incoming string for potential forward compat	17:00
jlk	mordred: if that's safe with the not quite 8601 we get now, sure.	17:01
mordred	since it'll parse the subset	17:01
mordred	well - the string above is valid iso 8601	17:01
jlk	my time code skills are REALLY lacking.	17:01
openstackgerrit	David Shrewsbury proposed openstack-infra/zuul feature/zuulv3: WIP: Add web-based console log streaming https://review.openstack.org/463353	17:01
mordred	it's just that there are other things that are also optional and valid	17:01
mordred	jlk: I'll make a follow up patch - I don't think it's worth reworking the existing patch	17:02
*** isaacb has quit IRC		17:02
jeblair	SpamapS: i'll switch to a xenial machine to poke at this for now, and we can defer the ssh version decision for later	17:02
jlk	kay thanks	17:02
jlk	I have to get back to rebasing this stuff, because I screwed up the EventFilter vs RefFilter for some things	17:02
pabelanger	jeblair: mordred: does that mean we want ze01.o.o and zuulv3.o.o ?	17:02
pabelanger	or all in one still	17:02
*** Cibo_ has joined #zuul		17:02
jeblair	pabelanger: let's go ahead and separate	17:02
SpamapS	jeblair: There's an earlier patch set that doesn't use -D ... if that one works rebased on feature/zuulv3, it might be a clue about what's causing the weirdness :-P	17:03
openstackgerrit	David Shrewsbury proposed openstack-infra/zuul feature/zuulv3: WIP: Add web-based console log streaming https://review.openstack.org/463353	17:03
jeblair	SpamapS: true; i'll reserve that line of inquiry for later	17:03
Shrews	mordred: In case there's any chance you might have a clue on this, lines 164-170 of https://review.openstack.org/#/c/463353/13/zuul/streamer.py summarize the situation.	17:04
mordred	jlk: while you're in there- https://review.openstack.org/#/c/46076 is +3 but jeblair left an inline comment	17:06
jlk	alrighty	17:06
Shrews	hah. typical. now that i've publicly given up, i finally find something on the internet: https://bugs.python.org/issue21998	17:07
mordred	Shrews: yay!	17:08
Shrews	from the comments i've read so far, the answer is "don't do that"	17:08
Shrews	so maybe we just have to treat this differently from the other zuul processes	17:09
jeblair	Shrews: where's the daemonization code?	17:09
Shrews	jeblair: i posted the latest in 463353	17:09
Shrews	jeblair: zuul/cmd/streamer.py does the normal zuul stuff	17:10
Shrews	with a little manual forking thrown in	17:11
clarkb	oh Shrews is here. Shrews pabelanger can has review on https://review.openstack.org/#/c/467783/ its the failed image upload records thing	17:11
clarkb	if that looks good on the v3 branch I'd like to port it over to master too so we can get that in soonish	17:11
Shrews	clarkb: yeah, it's on my list. i want to make sure we aren't depeding on those anywhere	17:11
Shrews	pretty sure we aren't, but need to refresh my memory	17:12
clarkb	Shrews: I did some rudimentary checking (grepping for FAILED or was it FAILING), but extra eyes appreciated	17:12
openstackgerrit	Merged openstack-infra/zuul feature/zuulv3: Ensure PRs arent rejected for stale negative reviews https://review.openstack.org/460700	17:12
openstackgerrit	Merged openstack-infra/zuul feature/zuulv3: Represent github change ID in status page by PR number https://review.openstack.org/460716	17:12
jeblair	Shrews: ah, i see it; i was misled by the "TODO: make zuul/cmd/streamer.py" comment, as that appears to be TODONE :)	17:12
openstackgerrit	Merged openstack-infra/zuul feature/zuulv3: Comment on PRs if a remote call to merge a change failed https://review.openstack.org/460762	17:13
Shrews	jeblair: where is that comment?	17:16
jeblair	Shrews: bottom of zuul/streamer.py	17:16
SpamapS	#4 0x000000000049f414 in PyThread_acquire_lock ()	17:16
* SpamapS finds all the lock acquires		17:16
Shrews	hah	17:16
jeblair	Shrews: i'll poke at it after i finish poking at the ssh-agent thing	17:17
SpamapS	oh hrm	17:17
SpamapS	that's lower level	17:17
SpamapS	all the threads are at that spot	17:17
Shrews	jeblair: according to that bug report, the event loop FD appears to be the problem	17:17
Shrews	i suspect i need to play to creating a new event loop	17:19
SpamapS	Shrews: what are you forking for?	17:22
* SpamapS has no context		17:22
Shrews	SpamapS: all zuul processes support being run in the foreground (with -d option) or being daemonized (the default when you run the command). Forking is part of the daemonizing process	17:25
SpamapS	Shrews: so you haven't done any actual work before daemonizing right?	17:25
Shrews	SpamapS: not i, but i can't control what the asyncio lib does	17:26
SpamapS	Seems like moving the event loop init to after daemonizing would be the simplest thing?	17:26
Shrews	SpamapS: right, that's part of asyncio	17:26
Shrews	i wonder if i move the import to much later...	17:27
SpamapS	zomg this is awesome. So I added a crap ton of debug writes to my log files... and of course.. now it's not timing out.	17:27
SpamapS	Shrews: the event loop isn't initialized on import, is it?	17:27
jeblair	Shrews: yeah, if it does something in module import (which is evil... didn't quido write this?), you may need to do that. we have similar late imports due to paramiko	17:27
SpamapS	That would be .. awful.	17:27
jeblair	guido even	17:27
clarkb	Shrews: was it not somethign as simple as keeping fds open?	17:28
SpamapS	import side effects are one of those newbie mistakes I thought	17:28
Shrews	SpamapS: i have to assume so because one does not initialize the event loop	17:28
clarkb	daemonization closes all open fds	17:28
clarkb	but you can whitelist them	17:28
jeblair	i don't think we want any fds open	17:28
jeblair	we want daemonization to happen before any asyncio stuff starts	17:29
Shrews	the fds are part of asyncio internals. i don't want to try to mess with those	17:29
jeblair	right. i'm just saying they shouldn't be open before we daemonize.	17:29
clarkb	well you don't mess with them so much as tell daemonization process to leave them alone (the opposite of messing with them :) )	17:29
clarkb	and ya doing asyncio setup after wards is fine too	17:29
Shrews	i'm going to try creating a new event loop, and late import as a last resort	17:30
SpamapS	I'm not kidding though.. I actually find it hard to believe asyncio initializes the event loop on import. I believe you it does, but it _hurts_ my brain to think they did that on purpose.	17:30
jeblair	well, just speculation at this point. but it would be nice to have a definitive answer.	17:31
SpamapS	since you always have to asyncio.get_event_loop() , that seems like the thing to move after fork.	17:36
Shrews	SpamapS: that is after the fork	17:37
Shrews	absolutely nothing is done with asyncio until after the fork	17:38
Shrews	and creating a new event loop, assuming i'm doing it correctly, doesn't help	17:38
jlk	jeblair: can you help me understand why in gerritmodel.py the EventFilter class deals with required_approvals and reject_approvals, instead of just handling those in RefFilter? Since they deal with the deeper content of the change, shouldn't it be only in the Ref filter?	17:46
SpamapS	Shrews: I think you can get a fresh one with get_event_loop_policy().get_event_loop()	17:49
SpamapS	Shrews: oh, that's kind of what new_event_loop() does actually.	17:50
jeblair	jlk: it's a borderline abstraction violation -- it means "this event matches iff the change already has some approval (wich may not be related to the event)"	17:50
SpamapS	Shrews: set_event_loop(new_event_loop()) maybe?	17:50
Shrews	SpamapS: i tried all of that	17:50
SpamapS	ow	17:51
jlk	jeblair: er.... hrm. How would that play out? I'm trying to determine if I need to make the same violation on github.	17:51
jeblair	jlk: it seems like it should be obsoleted by the pipeline requirement, however, we use it to say "enqueue a change into check if someone adds a workflow +1 vote and jenkins has a -1 vote". normally a workflow +1 would not enqueue in check. and you definitely don't want to require a workflow +1 for check.	17:51
jlk	ah, that feels... gerrit/openstack specific.	17:52
jlk	I'm going to ignore that for now and not duplicate it for github.	17:52
jeblair	jlk: well, our use of it is; i was just giving you an example of how it's different. i think it's okay to ignore for now, but we may end up finding it useful to add later.	17:53
SpamapS	Shrews: so yeah, next step is to delay import of asyncio I guess. :-/	17:53
Shrews	and just tried that (well, delayed the import of the thing that imports asyncio). still no worky	17:54
* Shrews stabs all things python		17:54
jeblair	jlk: (the same thing can map to github pretty easily: enqueue in check if zuul has voted failure already and someone approves the pr nonetheless; essentially, an implied "recheck")	17:54
jlk	so normally a "review" would trigger gate, but gate can't trigger because it requires a zuul status, and zuul has said no. Somehow that scenario manages to trigger the check?	17:56
jlk	thinking	17:56
jlk	so when scheduler tries to match up the event to a pipeline,	17:57
jlk	It won't see the "check" pipeline because the trigger config doesn't match, it has no "review" trigger	17:57
SpamapS	Shrews: what about the websocket library?	18:01
SpamapS	Shrews: perhaps it has import side effects	18:02
SpamapS	reading asyncio's code, I don't think it creates the event loop on import	18:02
Shrews	SpamapS: ah, that's entirely probable	18:03
Shrews	but still, nothing is done with the websocket lib until after fork	18:04
SpamapS	Shrews: import?	18:04
Shrews	it's imported in the same file that i tried delaying import on	18:04
SpamapS	Shrews: is it possible the problem has been misdiagnosed?	18:06
jlk	jeblair: oh hrm, I found something else	18:07
jlk	jeblair: A trigger schema has to account for 'require-approval', and that gets defined in <driver>/<driver>trigger.py, which calls <Driver>EventFilter	18:08
Shrews	SpamapS: i welcome others to investigate. all code is up in 463353 that demonstrates the problem	18:08
jlk	so the EventFilter has to handle the require/reject intput	18:08
jlk	even though it seems it should be handled by RefFilter	18:08
* jlk pulls more hair out		18:08
Shrews	SpamapS: just run 'zuul-web-streamer -c zuul.yaml' and 'kill <child pid>'	18:09
jeblair	jlk: a trigger doesn't have to support require-approval -- i think you just suggested not doing that for github, yeah?	18:09
jlk	I think I was arguing the wrong thing	18:09
jeblair	jlk: successfully though :)	18:09
jlk	jeblair: in my head, I mapped EventFilter to "just look at the content of the event, i.e. the stuff that changed right now to cause the event"	18:10
jlk	So, EventFilter handles things like a comment being added, a label being added, a ref being updated, a vote being added.	18:10
jlk	and I mapped RefFilter to "deeper data about the change itself that must be in a specific set"	18:11
jeblair	jlk: yes, that is, generally, and ideally, the case. except for this one thing. :)	18:11
jlk	so "this label must exist on the change, even if it's not set on this event" or "this vote must exist even if it wasn't set on this event" or things like that.	18:11
SpamapS	Shrews: If I get tired of my current personal threading hell, I'll try your asyncio hell.	18:12
jeblair	jlk: yep. it's just that we also discovered that it was useful to say "this event, but only if this vote exists".	18:12
jlk	the code makes this awkward when you want to express a RefFilter level requirement at the trigger config point	18:12
jlk	yeah, we've got pipeline requirements that generically cover this	18:13
jlk	but I think I see what you're saying	18:13
jlk	pipeline reqs are more like "any of these events are cool, but this must be in place for any of them"	18:13
jlk	and a trigger req is more "this event is _only_ cool in this specific scenario, and _STILL_ the pipeline req must be in place."	18:14
jlk	so I could configure a trigger for check that was an approved review from a writer, but only if there is a failed status from zuul.	18:15
*** Cibo_ has quit IRC		18:15
jlk	so that an approved review from a writer wouldn't trigger if there was a success status from zuul.	18:16
jlk	I think I get it now. I had originally looked at trigger requirements as a way to have a driver specific requirement on a pipeline with multiple drivers, but we fixed that a different way.	18:17
jeblair	jlk: yep!	18:17
jeblair	jlk: yeah, i recall we talked about that earlier, and i think it's no longer necessary to think of them in that way since we made driver-specific pipeline reqs	18:18
jeblair	(it would be awkward to use them that way since you'd have to duplicate them on every trigger)	18:18
jlk	But this does mean now that EventFilter has to deal with deeper data of the change.	18:18
jlk	since a trigger calls EventFilter	18:18
jlk	I think it'll make sense to implement a trigger requirement on status, but not on review.	18:19
jlk	gerrit doesn't really have status vs review, it's all in just differently formatted approvals. So it makes sense for gerrit to have require-approval	18:20
jeblair	that sounds reasonable	18:23
jlk	and this definitely needs a new test case	18:24
openstackgerrit	Paul Belanger proposed openstack-infra/nodepool feature/zuulv3: Drop -e for pip install for devstack plugin.sh https://review.openstack.org/468127	18:37
*** bhavik1 has quit IRC		18:39
Shrews	hrm, ok. seems something about the Ctl-C stops the event loop.	18:40
openstackgerrit	Jesse Keating proposed openstack-infra/zuul feature/zuulv3: Include tenant name in github context https://review.openstack.org/463871	18:44
openstackgerrit	Jesse Keating proposed openstack-infra/zuul feature/zuulv3: Add cachecontrol to requests to github https://review.openstack.org/461587	18:44
openstackgerrit	Jesse Keating proposed openstack-infra/zuul feature/zuulv3: Save installation ids to a cache and fetch them per project https://review.openstack.org/463424	18:44
openstackgerrit	Jesse Keating proposed openstack-infra/zuul feature/zuulv3: Catch integration key file not found in zuul https://review.openstack.org/463425	18:44
openstackgerrit	Jesse Keating proposed openstack-infra/zuul feature/zuulv3: Include exc_info in reporter failure https://review.openstack.org/460765	18:44
openstackgerrit	Jesse Keating proposed openstack-infra/zuul feature/zuulv3: Implement github trigger requirement status https://review.openstack.org/463386	18:44
openstackgerrit	Jesse Keating proposed openstack-infra/zuul feature/zuulv3: Create github client each time it's called. https://review.openstack.org/463421	18:44
openstackgerrit	Jesse Keating proposed openstack-infra/zuul feature/zuulv3: Use integration_id with github https://review.openstack.org/463420	18:44
openstackgerrit	Jesse Keating proposed openstack-infra/zuul feature/zuulv3: Refactor integration key loading to always load https://review.openstack.org/463423	18:44
openstackgerrit	Jesse Keating proposed openstack-infra/zuul feature/zuulv3: Decode JSON body once for requests https://review.openstack.org/463422	18:44
*** rcarrillocruz has joined #zuul		18:46
openstackgerrit	David Shrewsbury proposed openstack-infra/zuul feature/zuulv3: WIP: Add web-based console log streaming https://review.openstack.org/463353	18:51
Shrews	that PS ^^^ captures SIGINT and attempts to shutdown things, but doesn't work as expected, whether daemonized or not	18:52
Shrews	so i guess it ISN'T fork related	18:52
Shrews	progress, at least	18:53
jeblair	Shrews: that looks like it should immediately exit? (it calls start then stop)	18:55
jeblair	er, rather, start then it just exits	18:55
Shrews	jeblair: start() doesn't return	18:55
Shrews	not until the loop stops	18:56
jlk	mordred: jeblair: I think I addressed the stated concerns in those patches. I dropped one more, the trigger level review requirement. Tests appear to pass on the tip of the branch locally. I'm off to find some fish and chips.	18:56
Shrews	jlk: event_loop.stop() should cause the run_forever() call to exit at the next opportunity	18:57
Shrews	err, jeblair ^^^	18:57
jeblair	Shrews: ah, well, ctrl+c only goes to the main thread, so it will interrupt the streamer start() method which is running run_forever	18:57
jeblair	Shrews: what happens if you run that code then send sigterm?	18:58
jeblair	it'll do the same, actually	18:58
Shrews	jeblair: i'm not using threads	18:58
jeblair	Shrews: yeah, but you still get one for free. :)	18:58
Shrews	well, yeah. there's only 1, i meant	18:59
jeblair	Shrews: so ctrl+c will raise an exception inside of run_forever	18:59
Shrews	jeblair: right, but we SHOULD be able to stop it manually	19:00
jeblair	i would not be surprised if that breaks things, so i say it's best to set that aside for now and try a different method of triggering the shutdown	19:00
jeblair	like sigterm, which should call the exit handler then return to processing insed of run_forever	19:00
jeblair	Shrews: so what happens when you start that with "-d", then in another terminal, kill -15 the pid?	19:01
Shrews	jeblair: same thing	19:02
Shrews	event loop is never stopped, even though we've asked it to	19:02
Shrews	jeblair: the answer might be here: https://stackoverflow.com/questions/42628795/indirectly-stopping-a-python-asyncio-event-loop-through-sigterm-has-no-effect/42635128	19:04
Shrews	"Instead of signal.signal() use loop.add_signal_handler()"	19:05
Shrews	I will try that	19:05
jeblair	Shrews: ah, so incidentally, fixing the ctrl-c problem may fix this as well.	19:09
jeblair	Shrews: if you have the streamer start in a thread, and have the start() method return, then wait on signal.pause() in the main thread, it may cause both things to work without the need for loop.add_signal_handler	19:10
Shrews	jeblair: i'll try that next because add_signal_handler does the same thing :/	19:11
*** hashar has joined #zuul		19:42
*** jamielennox is now known as jamielennox\|away		19:56
* SpamapS returns to the ssh-agent salt mine		20:23
SpamapS	hrm	20:27
SpamapS	when I attach to the threads of my running tests that are timing out, I just noticed they're actually really busy in the kazoo threads	20:28
pabelanger	heads up, nodepool dsvm jobs are broken. Working to fix them	20:28
*** jkilpatr has quit IRC		20:29
SpamapS	2017-05-25 20:25:58,064 zuul.zk.ZooKeeper DEBUG ZooKeeper connection: CONNECTED	20:30
SpamapS	2017-05-25 20:25:58,080 zuul.nodepool DEBUG Updating node request <NodeRequest 100-0000000000 <NodeSet OrderedDict([('controller', <Node None controller:invalid-label>)])>>	20:30
SpamapS	2017-05-25 20:25:58,080 zuul.nodepool INFO Submited node request <NodeRequest 100-0000000000 <NodeSet OrderedDict([('controller', <Node None controller:invalid-label>)])>>	20:30
SpamapS	running with 'ttrun -epy27' just prints those 3 lines and then that process is looping hard selecting/recving/sending with zookeeper	20:31
SpamapS	http://paste.openstack.org/show/610675/	20:33
Shrews	SpamapS: that's likely the kazoo keepalive thread	20:37
Shrews	we had to restrict it to a single kazoo connection b/c otherwise it hit the cpu pretty hard with multiple	20:38
Shrews	in nodepool, that is	20:38
Shrews	i think kazoo uses something like 3 threads per connection, iirc	20:39
SpamapS	Yeah that sounds about right. Need at least 1 for watches.	20:39
jeblair	SpamapS: in general, what i've found is that the stop process for ssh isn't robust, and when we hit the wait timeout, errors stopping ssh can cause the executor to raise an exception when stopping and end up aborting the test shutdown. i'm still tracking stuff down, but i'm making progress.	20:39
SpamapS	Shrews: it's an awfully busy thread.	20:39
SpamapS	jeblair: I have some updates that I may not have pushed up where I run .wait() in a thread	20:40
SpamapS	jeblair: so that I can do terminate, then start waiting for it to die, then after a few seconds use kill.	20:40
jeblair	SpamapS: yeah, though normally, we'd expect terminate to be pretty fast, right?	20:42
SpamapS	jeblair: in fact I was just re-reading the subprocess manual..	20:42
SpamapS	it says to be careful with subprocess.PIPE and wait.	20:42
SpamapS	jeblair: terminate should be fast, but wait needs to wait for the actual death of ssh-agent.	20:42
SpamapS	the agent does have to unlink its socket	20:42
SpamapS	so, a few ms, but not instant	20:43
SpamapS	jeblair: I'm not seeing executor exceptions ever though, so you're clearly finding more than I could.	20:43
SpamapS	jeblair: I wonder if I should just send all the fd's to devnull and poll for the socket existence, rather than try to be fancy and select stdout.	20:44
SpamapS	that way we don't have any pipes	20:44
jeblair	SpamapS: i think that's a good idea. i think the buffering, etc, there is a theoretical problem, though i'm not sure it's an actual one. at any rate, a lot of stuff gets simpler if we do that. (also, incidentally, the same is true if we don't run in foreground, so we might still consider running without -D depending on whether we find the pgroup thing useful)	20:46
jeblair	SpamapS: the biggest thing i've found is that stopping is racy, and ExecutorServer.stop() can end up running with self not having a .thread attribute; so i added protection there (and for ssh_agent).	20:47
jeblair	SpamapS: i still have some tests hitting the hard timeout though; so still something else to track down	20:47
SpamapS	jeblair: I'm trying with devnull now.	20:47
*** yolanda has quit IRC		20:49
*** jkilpatr has joined #zuul		20:49
SpamapS	jeblair: also re-reading the stop method, I think it will be more robust with wait() running in a thread. Otherwise we could get stuck there.	20:50
jeblair	SpamapS: i keep seeing something that looks like git commands are getting stuck. i wonder if there is some interaction there with ssh-agent? (i don't understand why -- all the git commands are local)	20:50
SpamapS	oh hm	20:50
SpamapS	I saw that too	20:50
SpamapS	but kind of let it fall by the wayside	20:50
SpamapS	I was wondering if maybe the pipes were open and causing FD issues with git	20:51
SpamapS	still hard timeout with devnull so pipes probably not the culprit	20:51
jeblair	SpamapS: but this only happens sometimes...	20:51
SpamapS	http://paste.ubuntu.com/24659247/	20:52
SpamapS	my current diff	20:52
SpamapS	jeblair: merger runs an update thread, right?	20:53
jeblair	SpamapS: yes, and that's one of the things that can get stuck and block shutdown	20:53
SpamapS	jeblair: so what if that merger thread is concurrently running forks and picks up FDs opened for ssh-agent	20:53
SpamapS	subprocess is supposed to handle this	20:54
SpamapS	by closing all fds	20:54
SpamapS	but I'm grasping for anything to hold on to now	20:54
jeblair	ya	20:54
SpamapS	would be great if I could attach gdb but get actual python line numbers	20:56
jeblair	SpamapS: that used to work with https://wiki.python.org/moin/DebuggingWithGdb but last time i tried (not recently) it was somehow broken	20:59
jeblair	i just ran ps during a failing test and saw a zombie git process	21:00
clarkb	I want to say I got ^ mostly working, where it falls down is when you are in code outside of stdlib iirc	21:01
jeblair	also, amazingly, it was a single test in the foreground. first time that's happened. test_zuul_refs	21:02
clarkb	whcih was my problem when trying to debug the python3.4 segfaults	21:02
*** dkranz has quit IRC		21:02
jeblair	oh, yeah, i just tried it. it's producing some output. let's see if it's helpful	21:04
jeblair	be in the zuul root directory when you run it so it can pick up the local files	21:06
SpamapS	jeblair: I'm trying without -D	21:09
jeblair	http://paste.openstack.org/show/610684/	21:10
jeblair	there's the traceback (via gdb) for the thread stuck on a git operation	21:10
jeblair	its git subprocess is (Z)ombie	21:10
SpamapS	My copious debug logging led me to innerUpdateLoop, but I couldn't crack what was happening there	21:13
SpamapS	specifically updateRepo	21:13
jeblair	SpamapS: http://bugs.python.org/issue12786	21:14
jeblair	(via https://stackoverflow.com/questions/14615462/why-does-communicate-deadlock-when-used-with-multiple-popen-subprocesses)	21:14
jeblair	SpamapS: looks like your fd hunch may have been right	21:14
jeblair	SpamapS: i am running in 2.7; are/were you when you saw the git hangs?	21:15
SpamapS	jeblair: yes, 2.7	21:16
SpamapS	the paint is still a little wet on 3.5 ;)	21:16
jeblair	SpamapS: yeah, that's what i thought too. :) i'm adding close_fds=True to the ssh-agent and re-running now	21:17
SpamapS	but I'm on 2.7.12	21:17
SpamapS	jeblair: reading the python issue.. this definitely sounds like our problem	21:18
Shrews	OMG OMG OMG	21:18
Shrews	i have left the asyncio hell	21:19
jeblair	Shrews: yay!	21:20
SpamapS	jeblair: oh doh.. 3 changed the default	21:20
* SpamapS shoudl have tried 3.5		21:20
SpamapS	running without -D also seems to have had a positive effect I think	21:20
SpamapS	which makes sense, because ssh-agent detaches and exits relatively quickly.	21:20
jeblair	SpamapS: yeah, though then we would have thrown up our hands and wondered why 3 "fixed" it :)	21:20
openstackgerrit	David Shrewsbury proposed openstack-infra/zuul feature/zuulv3: WIP: Add web-based console log streaming https://review.openstack.org/463353	21:20
SpamapS	does anybody else get a spray of logs to stdout just before the tests succeed?	21:21
Shrews	jeblair: SpamapS: ^^^ changed from run_forever to run_until_complete on a Future, where I set the Future (scheduled in the event loop) in our stop()	21:22
SpamapS	Shrews: oh yay	21:22
Shrews	i am now going to drink copious amounts of alcohol	21:22
SpamapS	if close_fds=True solves this issue I may do the same	21:23
SpamapS	jeblair: FYI, the run without -D did still get one deadlocked test process, so it just narrowed the window. I'm re-running with my non -D patch, and close_fds=True	21:24
mordred	Shrews: yay indeed!	21:24
jeblair	Shrews: cool -- though ultimately we shouldn't have any signal stuff in the streamer. did you try the thread idea?	21:24
Shrews	nope. was running down some other last minute hunches	21:25
Shrews	plus i read some scary things about threads and event loops	21:25
Shrews	but i'll look more closely tomorrow	21:26
SpamapS	IIRC, usually one has an event-loop per thread and dispatches things between them using queues.	21:26
Shrews	jeblair: i THINK it will work by moving the signal handling back out of the streamer	21:27
* Shrews tests real quick		21:27
jeblair	yeah, we're not trying to do anything funny with the event loop; we just want to start it in its own thread, and tell it to stop.	21:27
jeblair	and we're going to want it in its own thread too so it follows the pattern of the other services, and so we can start/stop it from tests	21:27
Shrews	jeblair: hrm, nope. gotta be done in the event loop	21:29
jeblair	Shrews: what does?	21:29
Shrews	signal handling	21:29
jeblair	Shrews: did you put it in another thread?	21:29
Shrews	(for the current method that i have working)	21:29
Shrews	no	21:29
Shrews	will try tomorrow	21:29
jeblair	Shrews: right, i don't expect that to work with the current method because of the way signals are delivered. they are always delivered to the main thread. so the trick is to put the event loop in a new (second) thread, so that the main thread just waits for a signal. then it tells the event loop to stop when it gets a signal.	21:30
SpamapS	jeblair: close_fds worked for me!	21:30
jeblair	SpamapS: it's working for me, so far, inasmuch as i'm getting real test timeouts and not alarm clocks :)	21:31
SpamapS	Ran 274 (+5) tests in 455.393s (-103.340s)	21:31
SpamapS	PASSED (id=88, skips=18)	21:31
SpamapS	when running with my non -D patch as well	21:31
SpamapS	already have 4 rogue ssh-agents though	21:32
jeblair	oh, my ssh-adds are failing, that's why all the failures	21:33
SpamapS	I have a patch for that	21:34
SpamapS	I'm going to push up this ssh agent patch as a new change ID so we can see the same things	21:34
jeblair	SpamapS: http://paste.openstack.org/show/610686/	21:35
openstackgerrit	Clint 'SpamapS' Byrum proposed openstack-infra/zuul feature/zuulv3: Add SSH Agent Primitives and usage https://review.openstack.org/468173	21:35
jeblair	SpamapS: that's the other thing we should include	21:35
SpamapS	Ah yeah	21:36
SpamapS	saw those missing threads too which were confusing me	21:36
SpamapS	I thought it was just fallout	21:36
SpamapS	ok 468173 is without -D, and with an attempt at a singleton one-per-test-run-process SSH key generator	21:36
SpamapS	the __del__ approach there produces wonky stuff htough	21:37
SpamapS	though	21:37
SpamapS	Exception TypeError: TypeError("'NoneType' object is not callable",) in <bound method SshKeyResource.__del__ of <tests.base.SshKeyResource object at 0x7f4ea8a23610>> ignored	21:37
jeblair	SpamapS: oh, that also needs initializers, i left that off the diff, but you get the idea	21:37
jeblair	SpamapS: regarding the key -- i was thinking it might be nice to just generate a key and stick it in as a test fixture so we don't have to do that live	21:38
jeblair	SpamapS: we do that for the RSA keys	21:38
*** hashar has quit IRC		21:38
SpamapS	jeblair: oh good idea	21:39
SpamapS	that crossed my mind but this was more fun ;)	21:39
SpamapS	I'll rework around that	21:39
openstackgerrit	James E. Blair proposed openstack-infra/zuul feature/zuulv3: Add SSH Agent Primitives and usage https://review.openstack.org/468173	21:40
SpamapS	oh I had another thought which was that we probably don't need to explicitly call SshAgent.stop in AnsibleJob.stop because it's already called in the finally block of execute	21:40
jeblair	SpamapS: ^ i pushed the .thread attribute safety as a new PS to your new chang ^	21:41
jeblair	SpamapS: oh, even better. :)	21:41
jeblair	SpamapS: let's go with that.	21:41
SpamapS	jeblair: and how about -D vs. not?	21:44
SpamapS	I'm still of two minds	21:44
SpamapS	jeblair: thanks so much for the help on debugging this. I was definitely at the end of the rope.	21:45
SpamapS	I'm going to collect my thoughts and see about putting in a final version of the patch we can be proud of :)	21:45
jeblair	SpamapS: yeah, i'm not sure what the right answer is there yet....	21:49
jeblair	SpamapS: that patch passes tests for me	21:54
jeblair	oh look, my "actually check out the right branches of stuff" patch series is all green. that's 461176 through 467779 if folks want to start taking a look. it should be ready for review.	21:56
jlk	oh	22:04
jeblair	mordred: reviewed your group/inventory changes	22:13
*** jamielennox\|away is now known as jamielennox		22:36
* SpamapS never thought watching tests pass would bring tears to his eyes.. but... today...		22:43
openstackgerrit	Clint 'SpamapS' Byrum proposed openstack-infra/zuul feature/zuulv3: Add SSH Agent Primitives and usage https://review.openstack.org/468173	22:49
SpamapS	jeblair: ^ Let's stick without -D. It makes us work on more platforms and should work well enough.	22:50
SpamapS	I also like that after .communicate() I know the agent is up and working.	22:50
SpamapS	(vs. with -D where I have to select.select on the stdout)	22:51
jeblair	SpamapS: wfm	22:51
* SpamapS restacks bwrap on that		22:52
openstackgerrit	Clint 'SpamapS' Byrum proposed openstack-infra/zuul feature/zuulv3: Add support for bwrap https://review.openstack.org/453851	22:52
mordred	jeblair: thanks! (I forgot to say that before I started working on addressing your reviews a little while ago)	22:53
openstackgerrit	Monty Taylor proposed openstack-infra/zuul feature/zuulv3: Add support for defining groups in nodesets https://review.openstack.org/467611	22:53
openstackgerrit	Monty Taylor proposed openstack-infra/zuul feature/zuulv3: Put variables into the inventory https://review.openstack.org/467635	22:53
openstackgerrit	Monty Taylor proposed openstack-infra/zuul feature/zuulv3: Write inventory as yaml not ini https://review.openstack.org/467634	22:53
mordred	jeblair: hopefully those will be mo better	22:53
mordred	jeblair: and yay on check out the right branches of stuff!	22:54
mordred	jeblair: I believe I've read this stack many times already ... I know I've read the commit message at least multiple times	22:55
SpamapS	doh	23:10
SpamapS	http://paste.openstack.org/show/610690/	23:10
SpamapS	have to muck with the perms	23:10
jeblair	mordred: i just assume you're always saying "thanks!"	23:22
jeblair	SpamapS: ah yeah; we could copy it into the test root and muck it up there	23:22
SpamapS	jeblair: testing that now	23:26
openstackgerrit	Clint 'SpamapS' Byrum proposed openstack-infra/zuul feature/zuulv3: Add support for bwrap https://review.openstack.org/453851	23:29
openstackgerrit	Clint 'SpamapS' Byrum proposed openstack-infra/zuul feature/zuulv3: Add SSH Agent Primitives and usage https://review.openstack.org/468173	23:29
jlk	jeblair: what happens if you have a pipeline that triggers on push, but a requirement of "open"? A push event doesn't have a change that can either be open or closed.	23:32
openstackgerrit	Clark Boylan proposed openstack-infra/nodepool feature/zuulv3: Revert "Use devstack's zookeeper support" https://review.openstack.org/468205	23:32
jeblair	jlk: probably nothing?	23:32
jeblair	jlk: ie, all changes rejected	23:33
jlk	It looks like the "open" attribute to a change only gets created/updated when _updateChange is called	23:33
jlk	so it kind of looks like you'd get a traceback on a missing attribute in the model	23:33
jeblair	jlk: yeah, probably needs tidying up and has never been exercised due to the contradiction	23:35
jlk	I'll put a thumbtack in that	23:35
jlk	(as I'm implementing it for github)	23:35
jlk	Probably need to test a number of things with a push event instead of a change event for requirements and such	23:36
mordred	jeblair: my updates to the inventory group patches are still failing - but not in a super weird way or anything - I'll look at them tomorrow and get them fixed	23:41
jeblair	mordred: yeah, i think the first change is fine; that py35 failure looks unrelated	23:42
jeblair	i rechecked	23:42
jeblair	(but something to keep an eye on)	23:42
jeblair	the second one def has a real problem	23:42
jeblair	mordred: the second one is just some job name mismatches	23:44
openstackgerrit	Clint 'SpamapS' Byrum proposed openstack-infra/zuul feature/zuulv3: Make zuul ansible dir self-contained https://review.openstack.org/468208	23:45
SpamapS	mordred: ^	23:45
SpamapS	I think that makes it work inside the bwrap	23:46
SpamapS	but I'm out of runway for today..	23:46
* SpamapS EOD's		23:46
openstackgerrit	Jesse Keating proposed openstack-infra/zuul feature/zuulv3: Implement github pipepline req of open https://review.openstack.org/468213	23:56

Generated by irclog2html.py 2.14.0 by Marius Gedminas - find it at mg.pov.lt!