openstackgerrit | Clark Boylan proposed openstack-infra/nodepool feature/zuulv3: Drop -e for pip install for devstack plugin.sh https://review.openstack.org/468127 | 00:12 |
openstackgerrit | Merged openstack-infra/nodepool feature/zuulv3: Revert "Use devstack's zookeeper support" https://review.openstack.org/468205 | 00:38 |
openstackgerrit | Paul Belanger proposed openstack-infra/nodepool feature/zuulv3: Update keyscan for python3 compat https://review.openstack.org/468253 | 02:09 |
*** yolanda has joined #zuul | 05:32 | |
*** Cibo_ has joined #zuul | 06:15 | |
tobiash | morning | 06:26 |
tobiash | jeblair: how does zuulv3 choose the branch of the config repos? | 06:27 |
tobiash | I have the following use case and trying to find out if that's already possible or not. | 06:28 |
tobiash | Let's say I have several deployments for different projects. We will have an internal release process of the whole ci system where different deployments could be at different releases. | 06:30 |
tobiash | I'd like to be able to point the different deployments at a common config repo with common roles but pin them to individual branches/tags per deployment. | 06:31 |
tobiash | (independent of the branching model of the project repos themselves) | 06:33 |
*** Cibo_ has quit IRC | 06:40 | |
*** jamielennox is now known as jamielennox|away | 06:48 | |
*** Cibo_ has joined #zuul | 06:52 | |
*** bhavik1 has joined #zuul | 06:53 | |
*** bhavik1 has quit IRC | 06:59 | |
*** Cibo_ has quit IRC | 07:30 | |
*** openstackgerrit has quit IRC | 07:48 | |
*** jkilpatr has quit IRC | 10:30 | |
*** Cibo_ has joined #zuul | 10:34 | |
*** jkilpatr has joined #zuul | 10:52 | |
*** openstackgerrit has joined #zuul | 11:08 | |
openstackgerrit | Tobias Henkel proposed openstack-infra/zuul feature/zuulv3: Test shared pipeline across multiple gerrits https://review.openstack.org/468374 | 11:08 |
*** Cibo_ has quit IRC | 11:14 | |
openstackgerrit | Tobias Henkel proposed openstack-infra/zuul feature/zuulv3: Decode ssh output of gerrit connection https://review.openstack.org/468392 | 12:09 |
*** rcarrillocruz has quit IRC | 12:33 | |
openstackgerrit | David Shrewsbury proposed openstack-infra/nodepool feature/zuulv3: Fix socket.error exception usage https://review.openstack.org/468405 | 12:44 |
Shrews | pabelanger: i noticed this py3.5 failure ^^^ in your job failure on 468253 | 12:45 |
pabelanger | Shrews: thanks, I am looking at the iteration error now too | 12:50 |
pabelanger | trying to see if | 12:50 |
pabelanger | for k in dict.keys() vs for k in dict vs for k in list(dict.keys()) have differences | 12:51 |
pabelanger | I think we could just do: for label in self._submittedRequests | 12:51 |
Shrews | pabelanger: does that work in 2.7? | 12:56 |
Shrews | nah, surely not | 12:58 |
Shrews | heh, it does | 12:59 |
Shrews | TIL | 12:59 |
Shrews | oh, the other error there is us doing bad things with a dict. we loop through it and modify it within the loop. yikes | 13:05 |
Shrews | imma bad programmer | 13:06 |
pabelanger | Shrews: Ya, not sure which one we'd want to go with | 13:10 |
openstackgerrit | David Shrewsbury proposed openstack-infra/nodepool feature/zuulv3: Fix removeCompletedRequests for dict iteration https://review.openstack.org/468409 | 13:13 |
Shrews | let's see if that fixes it. pabelanger, you wanna rebase 468253 on top of that one and see what we get? | 13:13 |
Shrews | maybe that's not necessary though | 13:14 |
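[Editor's aside: a minimal sketch of the two dict-iteration pitfalls discussed above; the dict name and contents are hypothetical, not nodepool's.]

```python
# Hypothetical stand-in for the submitted-requests dict discussed above.
requests = {'label-a': 'req-1', 'label-b': 'req-2'}

# Fine on both py2 and py3: iterating a dict yields its keys, so
# "for label in requests" and "for label in requests.keys()" agree.
for label in requests:
    print(label)

# Broken on py3 (and fragile on py2) because the dict is mutated while
# being iterated -- py3 raises "RuntimeError: dictionary changed size
# during iteration":
#     for label in requests:
#         if finished(label):
#             del requests[label]

# Safe on both: snapshot the keys first, then mutate freely.
for label in list(requests.keys()):
    del requests[label]
```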
openstackgerrit | Tobias Henkel proposed openstack-infra/zuul feature/zuulv3: Use https to clone Github3.py https://review.openstack.org/468413 | 13:15 |
openstackgerrit | Paul Belanger proposed openstack-infra/nodepool feature/zuulv3: Drop -e for pip install for devstack plugin.sh https://review.openstack.org/468127 | 13:17 |
openstackgerrit | Paul Belanger proposed openstack-infra/nodepool feature/zuulv3: Update keyscan for python3 compat https://review.openstack.org/468253 | 13:17 |
Shrews | pabelanger: i meant rebase on 468409, but that's alright. let's see what the experimental job comes back with | 13:18 |
pabelanger | we need 468127 currently | 13:19 |
Shrews | i thought maybe your patches were exposing the problem, but looks like maybe not | 13:19 |
pabelanger | so, if you rebase 468409 on 468127 it should expose the failure | 13:19 |
pabelanger | or 468253 | 13:20 |
pabelanger | that is the tip | 13:20 |
Shrews | ah, thx | 13:20 |
openstackgerrit | David Shrewsbury proposed openstack-infra/nodepool feature/zuulv3: Fix removeCompletedRequests for dict iteration https://review.openstack.org/468409 | 13:21 |
openstackgerrit | Clint 'SpamapS' Byrum proposed openstack-infra/zuul feature/zuulv3: Make zuul ansible dir self-contained https://review.openstack.org/468208 | 13:40 |
openstackgerrit | Clint 'SpamapS' Byrum proposed openstack-infra/zuul feature/zuulv3: Add support for bwrap https://review.openstack.org/453851 | 13:40 |
SpamapS | jeblair: ^ Amusing amendment to the bwrap there, informed by our close_fds find yesterday. Since py2.7 lacks the pass_fds parameter, if we add close_fds=True to our ansible running (not a bad idea) we may become py2.7 incompatible :-P | 13:41 |
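[Editor's aside: a minimal sketch of the incompatibility SpamapS is pointing at, using hypothetical stand-ins for the getent pipes; pass_fds exists only on python 3.]

```python
import os
import subprocess

# Hypothetical stand-ins for the getent passwd/group pipes mentioned
# above; real fds so the example actually runs.
passwd_r, passwd_w = os.pipe()
group_r, group_w = os.pipe()

# py3 only: pass_fds whitelists fds that survive close_fds=True
# (close_fds already defaults to True on py3).
subprocess.Popen(['true'], close_fds=True,
                 pass_fds=[passwd_r, group_r]).wait()

# py2.7 has no pass_fds, so the choice is all-or-nothing:
# close_fds=False leaks every inherited fd into the child.
```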
*** isaacb has joined #zuul | 14:02 | |
SpamapS | http://zuulv3-dev.openstack.org/logs/9275f367a0ae4993965dc4961f94b9b2/ <--- can somebody explain this? | 14:06 |
SpamapS | looks like maybe something went terribly wrong | 14:06 |
jeblair | tobiash: the config repos are assumed to be branchless -- zuul only reads from master there (mostly because considering more than one branch of a config repo becomes very difficult to reason about). i suppose we could have a zuul configuration setting to indicate which single branch (master or something else) of config repos to read, but i would want to be very careful with that so that we don't make it too confusing. | 14:24 |
jeblair | SpamapS: do you mean the fact that there is a lot of test output going to the console there? | 14:25 |
tobiash | jeblair: something like http://paste.openstack.org/show/610740/ ? | 14:27 |
jeblair | tobiash: yep | 14:29 |
tobiash | jeblair: ok, then I'll try this next week | 14:29 |
tobiash | thanks | 14:29 |
*** dkranz has joined #zuul | 14:30 | |
jeblair | tobiash: note that a lot of the self-testing jobs and instant re-configuration in zuulv3 is geared toward something more like a continuous deployment model. so pinning to a tag, etc, will inhibit some of that. | 14:30 |
tobiash | jeblair: I'm aware of that so I'll try to minimize the pinned config parts | 14:31 |
tobiash | jeblair: I'm just thinking of the really basics of the deployment like log collection or similar which would need pinning if I share them between deployments and want to add something which requires a zuul update | 14:32 |
jeblair | tobiash: that makes sense. we're also considering site-local variables, so that even base jobs could have, say, log collection destinations in a config file rather than a repo. | 14:34 |
tobiash | great, I think a carefully chosen (minimal) mixture of both of these two concepts would fit well in our deployment/release model | 14:37 |
Shrews | pabelanger: looks like my patches eliminated the exceptions, but now looks like the nodes is never booted. stuck in the waitfornode loop | 14:41 |
Shrews | s/nodes is/node is/ | 14:41 |
pabelanger | Shrews: ya, I think base64.encodestring() is the issue now | 14:44 |
pabelanger | not python3 compat | 14:44 |
pabelanger | reading docs, it expects a bytes-like object | 14:45 |
pabelanger | and we pass string | 14:45 |
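[Editor's aside: a sketch of the py3 fix being described, with a hypothetical key string; base64's encoders want bytes, not str, and encodestring is a deprecated alias of encodebytes on py3.]

```python
import base64

host_key = 'ssh-rsa AAAAB3Nza...'  # hypothetical server key as str

# py3: base64.b64encode(host_key) would raise TypeError on a str,
# so encode to bytes first...
encoded = base64.b64encode(host_key.encode('utf-8'))

# ...and decode back to str if a text value is needed downstream
# (e.g. data destined for zookeeper).
encoded_str = encoded.decode('ascii')
```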
tobiash | Was there a change in the executor config? Looks like it doesn't know the configured gerrit connections anymore. | 14:51 |
Shrews | pabelanger: hrm, yeah. also, would be nice if we could set min-ready:0 for labels we don't build images for | 14:54 |
jeblair | pabelanger: there's probably some base64 changes in zuul you can reference | 14:57 |
jeblair | tobiash: yeah, i think we may have merged a change that should *only* load the gerrit connections in the executor (ie, not sql reporters). I453754052b75de3d4fac266dc73921c7ce9c075f obviously that shouldn't cause the behavior you're seeing, but maybe there was an issue with it? | 14:59 |
jeblair | tobiash: that's the only recent thing i can think of | 14:59 |
tobiash | jeblair: yepp, reverting this fixes my issue | 14:59 |
jeblair | i guess we're not running it yet :| | 15:01 |
jeblair | i wonder if we need isinstance rather than issubclass | 15:01 |
tobiash | jeblair: so with this change only timer and zuul get configured in the merger | 15:01 |
tobiash | I'll try this out | 15:01 |
jeblair | that sounds like the opposite of the desired behavior | 15:02 |
jeblair | oh yeah, 'driver' in that function is an object instance, not a class object. so i think the fix will be to switch to isinstance | 15:03 |
jeblair | i would have expected this to be caught by tests... :| | 15:04 |
tobiash | changing to isinstance -> timer, zuul, gerrit connections | 15:05 |
tobiash | I'll try to craft a test case | 15:05 |
jeblair | oh, i wonder if the tests aren't passing source_only correctly. maybe we should not make that a default argument. | 15:06 |
jeblair | since it defaults to false, that means the executor in the tests would load everything. | 15:06 |
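[Editor's aside: a minimal illustration of the bug jeblair describes, with hypothetical class names standing in for zuul's driver interfaces.]

```python
class SourceInterface(object):
    """Hypothetical stand-in for zuul's source-driver interface."""


class GerritDriver(SourceInterface):
    pass


driver = GerritDriver()  # in the registry, 'driver' is an instance

# issubclass expects a class object; handing it the instance raises
# "TypeError: issubclass() arg 1 must be a class".
#     issubclass(driver, SourceInterface)

# isinstance is the right check for an object instance:
assert isinstance(driver, SourceInterface)
# ...which is equivalent to checking the instance's class:
assert issubclass(type(driver), SourceInterface)
```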
openstackgerrit | David Shrewsbury proposed openstack-infra/nodepool feature/zuulv3: Do not request nodes for which we have no images https://review.openstack.org/468447 | 15:11 |
Shrews | pabelanger: ^^^ i *think* that will work | 15:11 |
tobiash | jeblair: do the tests have separate connection registries by scheduler and executor? | 15:14 |
jeblair | tobiash: oh, you're right, they don't. | 15:15 |
jeblair | tobiash: i'm not sure i want to change that, so maybe we do just need a separate unit test of this. | 15:15 |
tobiash | jeblair: I think I have a test case (+fix) | 15:33 |
tobiash | will upload shortly | 15:33 |
jeblair | tobiash: cool, thanks :) | 15:33 |
openstackgerrit | Tobias Henkel proposed openstack-infra/zuul feature/zuulv3: Fix missing source connections on merger https://review.openstack.org/468451 | 15:42 |
tobiash | jeblair: ^^^ that's my try | 15:42 |
openstackgerrit | David Shrewsbury proposed openstack-infra/nodepool feature/zuulv3: WIP: Fix base64 encoding of server key https://review.openstack.org/468453 | 15:42 |
Shrews | pabelanger: maybe?? ^^^ | 15:42 |
pabelanger | Shrews: ya, that should work | 15:45 |
pabelanger | just tested locally | 15:45 |
pabelanger | [u'ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQCbqq2UG8bkIcd30BsNt0w9HizdwyB5MMZfBQi8phXaHuKieUEZ1NnXfohIKWBCMC1wYm/XH0rsJOGuolNlDKc12bQ/ibvnNmAKT7H95QHa6QVOwMioeQQPVZcGvs0g6iJOM6bfLn6OJpDQeUMJz9aGJ6JuFeR8/kQsT8s/hXLOxLI58Ac1m5heDZ/8u+GKYW5eB9rXWWKDcYoeWOgs3hsw9rjP2ILW2QjNYAKnv9RPir/CCpa6A2Ydf9Kq5Mr4rdWz1ZP4J/zvpgyIsCrc9pmGPUZ0WkcivxPIk04al71pLKz2ZXX9mtIbmY6sq2L3DBE+vKBpJfj9fi4Zz/4/yqZJ'] | 15:45 |
pabelanger | lets see if zookeeper likes it now | 15:46 |
jeblair | tobiash: lgtm, thanks! | 15:48 |
tobiash | :) | 15:48 |
jeblair | anyone else want to review mordred's log output cleanup patches? 467310 467603 | 15:49 |
* SpamapS will take a look | 15:52 | |
SpamapS | jeblair: would that make the link I pasted easier to read? | 15:52 |
SpamapS | because I honestly have no idea what went wrong there | 15:52 |
jeblair | SpamapS: yes | 15:52 |
SpamapS | "Could not determine repository location of /home/zuul/src/git.openstack.org/openstack-infra/zuul" ? | 15:52 |
jeblair | SpamapS: however, i think mordred identified a further problem in that we might not be getting data back from subsequent command invocations? Shrews, were you looking into that? | 15:53 |
Shrews | no, i think monty was | 15:54 |
jeblair | SpamapS: (it's not clear to me whether we would make things "worse" by landing mordreds fix in that we wouldn't see any output after the first command; though "worse" in this case is definitely subjective) | 15:54 |
openstackgerrit | Merged openstack-infra/zuul feature/zuulv3: Use https to clone Github3.py https://review.openstack.org/468413 | 15:58 |
jeblair | SpamapS: inline question on 468173; otherwise lgtm! | 16:00 |
SpamapS | jeblair: I think the net result is "better" | 16:01 |
jeblair | Shrews: how so? we don't need to test that we can generate an ssh key, right? that seems like it would unnecessarily slow down the tests... | 16:03 |
jeblair | SpamapS: oh! that was a response to the console log thing. :) | 16:04 |
jeblair | ha, sorry | 16:04 |
jeblair | Shrews: and sorry, i s-tab-failed there | 16:04 |
jeblair | we're all kinds of mixed up in irc | 16:05 |
SpamapS | jeblair: indeed, we should add a list of tags to any multiplexed irc conversation. tags=['console'] | 16:06 |
jeblair | yes, and require unique first letters... we may need to use unicode | 16:06 |
mordred | jeblair, Shrews: my thinking was that seeing the issue would be clearer if we land the patches - although I agree that it 'reduces' our ability to see test output at that point | 16:08 |
openstackgerrit | Merged openstack-infra/zuul feature/zuulv3: Strip unneeded cmd, changed, stdout and stderr from results https://review.openstack.org/467310 | 16:09 |
openstackgerrit | Merged openstack-infra/zuul feature/zuulv3: Display command/shell results in a banner https://review.openstack.org/467603 | 16:09 |
jeblair | mordred: wfm | 16:09 |
Shrews | mordred: yeah. i'll review those too | 16:10 |
mordred | k. let's see what new issues I created | 16:10 |
jeblair | i love seeing the vintage of zuul unit tests when they say things like "stable/havana" | 16:12 |
Shrews | pabelanger: eh? http://logs.openstack.org/53/468453/1/experimental/gate-dsvm-nodepool-py35-src-nv/2f1785a/logs/devstacklog.txt.gz#_2017-05-26_15_55_28_164 | 16:12 |
pabelanger | Shrews: need to be rebased on 468127 | 16:14 |
pabelanger | we should see about landing that | 16:14 |
Shrews | gah. i rebased the earlier patches in gerrit, not locally | 16:14 |
Shrews | grrr | 16:14 |
mordred | jeblair: since those are the action plugins we should pick up those changes without a restart, right? | 16:16 |
jeblair | clarkb, pabelanger: i'm okay with 468127, but just for my own edification, why do we need to drop -e there? | 16:17 |
openstackgerrit | David Shrewsbury proposed openstack-infra/nodepool feature/zuulv3: Fix base64 encoding of server key https://review.openstack.org/468453 | 16:17 |
openstackgerrit | David Shrewsbury proposed openstack-infra/nodepool feature/zuulv3: Do not request nodes for which we have no images https://review.openstack.org/468447 | 16:17 |
openstackgerrit | David Shrewsbury proposed openstack-infra/nodepool feature/zuulv3: Fix removeCompletedRequests for dict iteration https://review.openstack.org/468409 | 16:17 |
jeblair | mordred: i think those are the things we copy into a staging area on start, so we will need a restart | 16:18 |
jeblair | (we'll also need a manual git pull and pip install since the server isn't puppeted) | 16:19 |
pabelanger | jeblair: it appears pip -e for python3 writes more data to our git repo, we think maybe some additional setup files. The issue is, setup_develop above is run under devstack, and sudo. So our choice was either use sudo or drop -e. | 16:20 |
pabelanger | happy to do a 3rd option if there is one | 16:21 |
pabelanger | http://logs.openstack.org/53/468453/1/experimental/gate-dsvm-nodepool-py35-src-nv/2f1785a/logs/devstacklog.txt.gz#_2017-05-26_15_55_28_164 is the issue | 16:21 |
mordred | jeblair: ah - ok. cool. I can work on doing that | 16:22 |
mordred | jeblair: it should just be the executor that needs a restart yah? | 16:25 |
jeblair | mordred: yep | 16:27 |
mordred | jeblair, pabelanger: I'm tailing the debug log of the executor right now ... and I'm seeing issues float by about no-route-to-git.o.o | 16:36 |
pabelanger | odd | 16:41 |
*** isaacb has quit IRC | 16:42 | |
jeblair | mordred: there's an ipv6 and an ipv4 check; i think it's okay if the ipv6 check fails if ipv4 works | 16:47 |
mordred | ah - ok | 16:50 |
* SpamapS rebases | 16:52 | |
openstackgerrit | Clint 'SpamapS' Byrum proposed openstack-infra/zuul feature/zuulv3: Make zuul ansible dir self-contained https://review.openstack.org/468208 | 16:52 |
openstackgerrit | Clint 'SpamapS' Byrum proposed openstack-infra/zuul feature/zuulv3: Add support for bwrap https://review.openstack.org/453851 | 16:52 |
openstackgerrit | Clint 'SpamapS' Byrum proposed openstack-infra/zuul feature/zuulv3: Add SSH Agent Primitives and usage https://review.openstack.org/468173 | 16:52 |
pabelanger | ya, we just started doing traceroute / traceroute6 on zuulv3-dev | 16:53 |
openstackgerrit | Merged openstack-infra/nodepool feature/zuulv3: Fix socket.error exception usage https://review.openstack.org/468405 | 17:02 |
openstackgerrit | Merged openstack-infra/nodepool feature/zuulv3: Drop -e for pip install for devstack plugin.sh https://review.openstack.org/468127 | 17:03 |
mordred | jeblair, Shrews: WELP - those patches just flat didn't work at all ... investigating | 17:13 |
mordred | jeblair: is it possible we only do the code staging copy on scheduler restart? | 17:14 |
* mordred goes to look | 17:14 | |
openstackgerrit | Merged openstack-infra/nodepool feature/zuulv3: Update keyscan for python3 compat https://review.openstack.org/468253 | 17:15 |
mordred | jeblair: OH FOR THE LOVE OF ... | 17:16 |
jeblair | mordred: did you do a "pip install -U ." ? | 17:16 |
mordred | jeblair: nope | 17:16 |
mordred | jeblair: I just did the git pull :) | 17:16 |
jeblair | that'll do it | 17:16 |
mordred | I'm going to restart the executor again | 17:17 |
jeblair | ++ | 17:17 |
mordred | maybe THIS time I'll be able to make something break because I write bad code | 17:17 |
mordred | rather than making it no-op because I don't know how to install software | 17:17 |
SpamapS | hrm | 17:19 |
SpamapS | http://paste.openstack.org/show/610752/ | 17:19 |
jeblair | mordred: a universe of possibilities awaits! | 17:19 |
Shrews | oh, what a universe | 17:20 |
clarkb | Shrews: I've approved https://review.openstack.org/#/c/468409/3 but left a comment inline if thats something you want to clean up | 17:20 |
SpamapS | mordred: HeroOps! | 17:20 |
SpamapS | which is also "Her Oops" which I'm thinking is why we haven't seen much uptake on that moniker. ;) | 17:21 |
Shrews | clarkb: ah, yep. will clean it up after helping pabelanger fix the py3.5 things | 17:21 |
jlk | THEY CALL ME, MISTER OOPS! | 17:21 |
jlk | uh, without the comma | 17:22 |
jeblair | oops | 17:22 |
Shrews | capitalization is everything, SpaMaps | 17:22 |
jeblair | SpamapS: [paste] hrm, do you think that's related to/racing with ssh-agent and close_fds? | 17:23 |
jeblair | SpamapS: (or racing with git ops?) | 17:23 |
SpamapS | Glittering marble tile and tranquil aroma therapy with local cannabis incense burning awaits you once you brave this dirt track outside Fontana, CA... THIS, is Spa Maps! | 17:23 |
SpamapS | jeblair: I do.. I think we may need to start passing close_fds=True on some things. But that also might hasten our py3 transition (or we have to double-fork to run bwrap with open fds) | 17:25 |
jeblair | SpamapS: oh, that's py3 too, so everything should be getting close_fds by default | 17:26 |
jeblair | SpamapS: do you set close_fds=False for the bwrap run? | 17:26 |
SpamapS | jeblair: I don't force it no, since I'm letting the caller pass any subprocess.Popen args in. | 17:26 |
clarkb | Shrews: I've approved those nodepool cleanups. Doesn't look like py35 job is happy yet, but I figure those changes can only help | 17:27 |
SpamapS | jeblair: I recently added some code that is py3 only to it though, where if you do pass close_fds=True, we have to use pass_fds=[passwd_r, group_r] so that our getent stuff makes it in. | 17:27 |
jeblair | SpamapS: close_fds defaults to true in py3 though | 17:27 |
openstackgerrit | Merged openstack-infra/nodepool feature/zuulv3: Fix base64 encoding of server key https://review.openstack.org/468453 | 17:27 |
SpamapS | jeblair: right... let me show you the patch.. | 17:28 |
Shrews | clarkb: thx. waiting for the log to see what else is wrong | 17:28 |
SpamapS | jeblair: https://review.openstack.org/#/c/453851/18/zuul/driver/bubblewrap/__init__.py | 17:28 |
SpamapS | https://server/job/tox-py35/0/ <-- handy link | 17:29 |
SpamapS | :-P | 17:29 |
jeblair | SpamapS: Shrews has a fix for that in progress :) | 17:30 |
SpamapS | indeed! | 17:30 |
SpamapS | so the other option btw, is to double-fork | 17:30 |
SpamapS | we could use the zuul-bwrap thing I wrote to do it actually | 17:30 |
jeblair | SpamapS: i'm really confused now. it seems like your code should handle the py35 case... :? | 17:31 |
SpamapS | if we get a close_fds=True in py27, we could run zuul-bwrap, and zuul-bwrap would run bwrap with close_fds=False | 17:31 |
Shrews | i have a fix for what? | 17:31 |
jeblair | Shrews: log streaming | 17:31 |
SpamapS | jeblair: It does handle py35. It does not handle py27 if we start passing close_fds=True anywhere else. These are facts, but I think we got off the context path... | 17:32 |
jeblair | (specifically, a fix for the lack thereof) | 17:32 |
jeblair | SpamapS: i thought you showed me a bad fd error with py3 | 17:32 |
SpamapS | jeblair: in the py35 run that failed, we can presume close_fds=True is the default, and so there may be something else causing our bad file descriptor errors. :-/ | 17:32 |
*** rcarrillocruz has joined #zuul | 17:32 | |
SpamapS | for once, though, this reliably fails when run just as that one test | 17:34 |
jeblair | nice! | 17:34 |
SpamapS | so I can actually employ like, debuggers and stuff | 17:34 |
SpamapS | The quality of my problems is increasing. | 17:34 |
jlk | It's a good problem, Bort | 17:39 |
jlk | Is there a description of what the various gerrit statuses mean? Context is using them as a pipeline requirement. | 17:42 |
pabelanger | http://paste.openstack.org/show/610755/ | 17:45 |
pabelanger | have we changed anything on zuulv3-dev.o.o recently^ | 17:46 |
pabelanger | ya, looks like it was restarted | 17:46 |
mordred | pabelanger: yup | 17:48 |
jeblair | mordred, pabelanger: we should land https://review.openstack.org/468451 | 17:48 |
jeblair | sorry i lost track of that | 17:49 |
mordred | jeblair: oh - why yes we should! | 17:49 |
mordred | +3 | 17:49 |
jeblair | jlk: https://review.openstack.org/Documentation/rest-api-changes.html#change-info is all i see | 17:50 |
jeblair | jlk: it just enumerates them | 17:50 |
jeblair | jlk: but i can describe them more if you have questions | 17:51 |
jlk | I'm trying to determine if I need to map any of that over to github | 17:51 |
jlk | "status" is a completely different thing over there | 17:51 |
jlk | so I was hoping to understand the deeper meaning on the gerrit side | 17:52 |
jeblair | jlk: (the trickiest one is probably submitted, which is gerrit speak for "someone or something (ie zuul) has told gerrit to merge the change and that action has been added to the queue") | 17:52 |
SpamapS | oh man this is getting ugly | 17:52 |
SpamapS | it's definitely still a race | 17:53 |
jeblair | jlk: most closely maps to pull request 'state': open/closed | 17:53 |
SpamapS | http://paste.openstack.org/show/610756/ | 17:53 |
jlk | yeah, and we've already got an "open" boolean | 17:53 |
jeblair | though... how is it that pull requests have only one closed state? shouldn't there be closed-and-merged and closed-unmerged? | 17:53 |
SpamapS | seems like maybe there's a bunch of thread-unsafe stuff | 17:53 |
jlk | well | 17:53 |
jlk | one sec | 17:54 |
jlk | there are multiple data points on github side | 17:55 |
jeblair | SpamapS: there should never be more than one thread dealing with a single jobdir | 17:55 |
jlk | there is "state" which can be open or closed | 17:55 |
SpamapS | jeblair: the problem is pipes and fds | 17:55 |
jeblair | jlk: ah, and 'merged' boolean | 17:55 |
SpamapS | jeblair: there's one global list of open fds for the executor.. and something is closing other threads' fds | 17:55 |
jlk | jeblair: yeah, merged_at | 17:56 |
jlk | or an API call to see if the thing is merged. | 17:56 |
jlk | GET /repos/:owner/:repo/pulls/:number/merge 204 merged, 404 not. | 17:56 |
jeblair | jlk: i see merged_at as well as merged | 17:56 |
jlk | ah right | 17:57 |
jlk | and for even more confusion, there is 'merge_commit_sha' which changes its meaning and value depending on the state of the PR and of the repo configuration | 17:57 |
mordred | I see merged: False for unmerged things | 17:57 |
mordred | maybe | 17:58 |
mordred | or something | 17:58 |
SpamapS | OSError: [Errno 9] Bad file descriptor: 'role_0' | 17:58 |
jeblair | jlk: at any rate... i wouldn't recommend mapping things, per se. but rather just exposing (some of) what github provides. so let 'status' be different between github and gerrit. if we decide we want a merged attribute, we can add it. | 17:59 |
SpamapS | that's weird | 17:59 |
jeblair | SpamapS: yes that is a funny looking file descriptor | 18:00 |
jlk | oh right I wanted those to be different | 18:00 |
jlk | I am exposing the things in github native ways. I just didn't know what all you could expect from "status" on gerrit. | 18:01 |
jeblair | jlk: got it, cool. so basically gerrit status is a combo of github 'state' and 'merged'. | 18:02 |
jeblair | state really being the important one to have | 18:02 |
openstackgerrit | Merged openstack-infra/zuul feature/zuulv3: Fix missing source connections on merger https://review.openstack.org/468451 | 18:02 |
jeblair | (which is covered by 'open' as you point out) | 18:02 |
jlk | On github side I already added a boolean for open, so you can stop processing if a thing is closed. | 18:02 |
jlk | Does it make sense to expose 'merged' as another boolean? | 18:02 |
jlk | I can't think of one immediately, so I'm inclined to not add it. | 18:03 |
jeblair | jlk: same here | 18:03 |
jeblair | add it when it proves useful, i'd say | 18:03 |
jlk | +1 | 18:03 |
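[Editor's aside: a hedged illustration of the mapping jeblair sketches above, treating gerrit status as roughly github's 'state' plus 'merged'; the function and status names are illustrative, not zuul's code.]

```python
def gerrit_like_status(state, merged):
    """Collapse a github PR's (state, merged) pair into a
    gerrit-style status."""
    if state == 'open':
        return 'NEW'
    if merged:
        return 'MERGED'
    return 'ABANDONED'  # closed without merging

assert gerrit_like_status('open', False) == 'NEW'
assert gerrit_like_status('closed', True) == 'MERGED'
assert gerrit_like_status('closed', False) == 'ABANDONED'
```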
SpamapS | __del__ strikes again | 18:04 |
SpamapS | oh | 18:04 |
SpamapS | haha | 18:04 |
* SpamapS sees the bug | 18:04 | |
SpamapS | so.......... | 18:07 |
* SpamapS just found 5 ssh-agents hanging around | 18:07 |
SpamapS | beginning to doubt running them in the background | 18:07 |
SpamapS | they're all from aborted test runs in the last 2 hours :-P | 18:07 |
jeblair | SpamapS: yeah... though they didn't seem to die after aborted test runs with -D either, iirc... | 18:10 |
* jeblair assumes something something testr | 18:11 | |
SpamapS | jeblair: for desktop it's not a big difference. But I'm concerned about the server. | 18:11 |
SpamapS | no it's not testr, it's actually systemd/upstart holding them open | 18:11 |
SpamapS | because all the terminals are in the same process group now | 18:11 |
SpamapS | so your child processes get orphaned to the desktop session manager | 18:12 |
SpamapS | on a server though, if zuul-executor dies, the ssh-agents that it hasn't killed would get killed _if_ it's running as a daemon or systemd service since systemd will make a process group for service units by default | 18:13 |
jeblair | SpamapS: in production, it should only be an issue in the case of an unclean shutdown.... | 18:13 |
SpamapS | agreed, a 'splosion level crash | 18:13 |
jeblair | SpamapS: and since we already make a pid group for the ansible run, an unclean shutdown is already messy there | 18:13 |
SpamapS | what's the thinking behind that btw? | 18:14 |
jeblair | (since all the ansible processes would continue in the same way as ssh-agent, right?) | 18:14 |
jeblair | SpamapS: we have to kill ansible sometimes, and we want it and all its subprocesses to die | 18:14 |
SpamapS | the ansible processes would have their stdout/err connected to pipes though, no? | 18:14 |
SpamapS | so they'd get EPIPE or EBADF on operations | 18:15 |
jlk | jeblair: if zuul gets an event for a change, and queues up that event for processing, and time passes, when it's ready to process that event when it looks at the change to run filters, does it get new data about the change at that point? In gerrit it's _updateChange (with some caching), in github it's getChange. | 18:15 |
jlk | the getChange happens at schedule time, which could be some time after event time, right? | 18:15 |
jeblair | SpamapS: maybe? we can try killing it and see what happens :) | 18:15 |
jeblair | jlk: part of the reason for the caching is so that any time there's new information about a change, we update the singleton object in zuul to reflect the new information | 18:16 |
jeblair | jlk: so in gerrit's case, it will use the most recent data | 18:17 |
jeblair | jlk: iirc, github doesn't implement that yet, but i think it should | 18:17 |
jlk | yeah I think we talked through the caching once before, and I put a pin in implementing a cache for github. | 18:17 |
openstackgerrit | Merged openstack-infra/nodepool feature/zuulv3: Remove unnecessary list() https://review.openstack.org/468489 | 18:17 |
jlk | mostly I wanted to be sure I had the model right, that "fresh" data is grabbed at schedule time | 18:17 |
jeblair | jlk: yep. and it'll go stale until we implement the caching/update thing. then it should get asynchronously updated. | 18:18 |
jlk | it'll "go stale" between the time scheduler decides it needs to be tested and it gets tested? | 18:19 |
jeblair | jlk: yes, or even in the time between when it was enqueued and when it's evaluated | 18:20 |
jlk | what would cause a fresh update at evaluation time? | 18:20 |
jlk | (in the gerrit case) | 18:21 |
jeblair | jlk: in either case, any further update to the change/pr. | 18:22 |
jlk | ooooh wait, I see, because a new event would come in, which would cause an update to the in-memory cached object | 18:23 |
jeblair | jlk: exactly | 18:23 |
jlk | <--- LIGHT BULB | 18:23 |
jeblair | jlk: that might just be a comment-added. it might also be something substantial like someone closed the pr. | 18:23 |
jlk | yup, makes total sense now. I was looking at it from the wrong angle. | 18:23 |
jeblair | in any case, that event should (as a side effect) update the data on the change/pr in memory | 18:24 |
jeblair | and if the scheduler hasn't gotten around to evaluating the original event, once it does, it'll use the new data (so it may well not enqueue a newly updated change in check if someone manages to close it quickly) | 18:25 |
jeblair | or, if it's already running, the next time the pipeline manager passes over that change, it will see that it no longer meets the "open: true" pipeline requirement that it has, and kick it out. | 18:25 |
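[Editor's aside: a simplified sketch of the caching model described above: one object per change, mutated in place as events arrive, so queued events and running pipelines holding a reference see fresh data. Class names are illustrative only.]

```python
class Change(object):
    def __init__(self, number):
        self.number = number
        self.open = True


class ChangeCache(object):
    """Hypothetical stand-in for a connection's change cache."""

    def __init__(self):
        self._changes = {}

    def getChange(self, number):
        if number not in self._changes:
            self._changes[number] = Change(number)
        return self._changes[number]

    def onEvent(self, number, data):
        # Update the existing singleton rather than replacing it, so
        # every holder of a reference sees the new state.
        change = self.getChange(number)
        change.open = data.get('open', change.open)


cache = ChangeCache()
queued = cache.getChange(42)        # enqueued for later evaluation
cache.onEvent(42, {'open': False})  # PR is closed before evaluation
assert queued.open is False         # 'open: true' requirement now fails
```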
SpamapS | Oo, I wonder if we should check for leaked FDs in our test shutdown checks. | 18:28 |
openstackgerrit | Clint 'SpamapS' Byrum proposed openstack-infra/zuul feature/zuulv3: Make zuul ansible dir self-contained https://review.openstack.org/468208 | 18:30 |
openstackgerrit | Clint 'SpamapS' Byrum proposed openstack-infra/zuul feature/zuulv3: Add support for bwrap https://review.openstack.org/453851 | 18:30 |
* SpamapS stops closing other peoples' fds | 18:30 | |
openstackgerrit | Merged openstack-infra/nodepool feature/zuulv3: Cleanup failed upload records https://review.openstack.org/467783 | 18:31 |
openstackgerrit | Merged openstack-infra/nodepool feature/zuulv3: Fetch server console log if ssh connection fails https://review.openstack.org/452494 | 18:35 |
* jlk afk for some MTB riding | 18:52 | |
jeblair | SpamapS: aha! i see what you mean about __del__ there. is it worth thinking about accomplishing that with a context manager? | 18:58 |
jeblair | mordred: did you re-restart zuulv3 yet after the connection fix landed? | 18:58 |
mordred | jeblair: nope. I will now do that | 18:58 |
mordred | heh | 18:59 |
mordred | Successfully installed zuul-2.5.2.dev834 | 18:59 |
mordred | that's a lot of dev patches :) | 18:59 |
jeblair | hehe | 18:59 |
jeblair | mordred: if only it would tell us how many patches less than 3.0 we are :) | 18:59 |
mordred | jeblair: I did service zuul-executor stop and it did not stop | 19:00 |
mordred | jeblair: how much do we care about that right now? | 19:00 |
jeblair | mordred: i don't know what "service zuul-executor stop" does :| | 19:00 |
mordred | me either | 19:01 |
jeblair | mordred: "zuul-executor stop" will, however, dtrt (and that is, ultimately, what service zuul-executor stop should do) | 19:01 |
jeblair | or maybe it should graceful | 19:01 |
jeblair | i dunno | 19:01 |
mordred | nope. still running | 19:01 |
jeblair | mordred: if it did graceful, then that's expected | 19:01 |
jeblair | it will stop eventually | 19:01 |
mordred | ah. ok | 19:01 |
mordred | the last 4 lines from the log are: | 19:02 |
mordred | 2017-05-26 18:59:41,381 DEBUG zuul.ExecutorServer: Stopping | 19:02 |
jeblair | (also, it will be really nice to have rotated logs) | 19:02 |
mordred | 2017-05-26 18:59:41,381 DEBUG zuul.CommandSocket: Accepted socket connection <socket._socketobject object at 0x7efdb0a2f7c0> | 19:02 |
mordred | 2017-05-26 18:59:41,381 DEBUG zuul.CommandSocket: Received _stop from socket | 19:02 |
mordred | 2017-05-26 18:59:41,382 DEBUG zuul.AnsibleJob: Abort: no process is running | 19:02 |
mordred | ++ | 19:02 |
jeblair | so i guess it ran "stop" and not graceful | 19:02 |
jeblair | which means this is unexpected | 19:03 |
jeblair | mordred: i'd sigkill at this point | 19:03 |
mordred | jeblair: ok. just didn't want to lose any valuable debugging if we thought it was valuable | 19:03 |
jeblair | (maybe some side effect of the bug we're fixing?) | 19:03 |
jeblair | mordred: nah | 19:04 |
mordred | jeblair: I'm voting side-effect | 19:04 |
mordred | jeblair: ok - killed and restarted | 19:04 |
jeblair | mordred, Shrews, pabelanger, jlk, jesusaur, SpamapS, clarkb, fungi, anyone else: er i just realized monday is a us holiday; should we cancel the zuul meeting? | 19:21 |
* jeblair is bad at meetings | 19:21 | |
Shrews | Yes | 19:21 |
mordred | jeblair: ++ | 19:22 |
mordred | jeblair: that'll give me another week to accomplish my tasks so that tristanC doesn't dislike me | 19:22 |
clarkb | I'm likely to be afk | 19:23 |
fungi | jeblair: i will gladly attend a zuul meeting monday, but will be just as happy not to need to remember to take a break from holidaying to do so | 19:24 |
* jeblair is also bad at mondays | 19:25 | |
jlk | I'm AFK | 19:28 |
jlk | for Monday | 19:28 |
pabelanger | jeblair: wfm | 19:29 |
* jesusaur will likely be around monday, but not working | 19:34 | |
SpamapS | +1 cancel | 19:44 |
*** jkilpatr has quit IRC | 19:49 | |
openstackgerrit | Clint 'SpamapS' Byrum proposed openstack-infra/zuul feature/zuulv3: Make zuul ansible dir self-contained https://review.openstack.org/468208 | 19:52 |
openstackgerrit | Clint 'SpamapS' Byrum proposed openstack-infra/zuul feature/zuulv3: Add support for bwrap https://review.openstack.org/453851 | 19:52 |
mordred | Shrews, jeblair: I think the hard-restart we did of the zuul executor may have broken something | 19:57 |
mordred | Shrews, jeblair: http://zuulv3-dev.openstack.org/ is not returning and the executor debug log hasn't really done anything in a while | 19:59 |
jeblair | bummer | 20:03 |
jeblair | mordred: http://zuulv3-dev.openstack.org/status.json works | 20:03 |
* Shrews certainly has no idea about executor things | 20:04 | |
mordred | Shrews: well - I was pinging you originally because I had a stuck status page that made it look like everything was queued - so I thought maybe zk/nodepool blah | 20:05 |
mordred | but then I reloaded the status page | 20:05 |
mordred | and got blank | 20:05 |
jeblair | mordred: http://git.openstack.org/cgit/openstack-infra/zuul/tree/etc/status/public_html/index.html?h=feature/zuulv3#n34 | 20:07 |
jeblair | i think that's the problem | 20:07 |
mordred | jeblair: oh - that sure does seem to have wrapped wrong | 20:08 |
jeblair | yeah, i'm assuming line 34 should be on line 33 | 20:08 |
mordred | yup | 20:08 |
*** jkilpatr has joined #zuul | 20:08 | |
jeblair | mordred: i'm knee deep in something i don't want to stash; you want to push up a fix? | 20:08 |
mordred | jeblair: so - that said- the executor debug log is still very not doing anything | 20:08 |
mordred | jeblair: yup. I'm on it | 20:08 |
jeblair | mordred: agreed; lemme look at the status json and see if it should be | 20:09 |
jeblair | yeah, it definitely thinks it should be running jobs | 20:09 |
openstackgerrit | Monty Taylor proposed openstack-infra/zuul feature/zuulv3: Fix bad text wrap in status page https://review.openstack.org/468532 | 20:09 |
mordred | oh! it just did something | 20:10 |
mordred | jeblair: so - the last time it sat there doing nothing, the last line in the log was: | 20:10 |
mordred | 2017-05-26 20:02:01,121 DEBUG zuul.Repo: CreateZuulRef feature/zuulv3/Zdfed0f5092ff42cd8f510ce65be7771d at 7b7520b55df36b62b6800273c16e1200c7c76c03 on <git.Repo "/var/lib/zuul/executor-git/git.openstack.org/openstack-infra/zuul/.git"> | 20:10 |
mordred | it is once again seeming to not do anything - and the last line in the log is now: | 20:11 |
mordred | 2017-05-26 20:10:09,099 DEBUG zuul.Repo: CreateZuulRef feature/zuulv3/Zed63cd22bbdb43b7a2b67a337215a2b2 at 9c702bcb666e17d57f05372eafd68b39a6e8a280 on <git.Repo "/var/lib/zuul/executor-git/git.openstack.org/openstack-infra/zuul/.git"> | 20:11 |
jeblair | yeah, so the executor's merger is doing stuff, but it's apparently not running jobs | 20:11 |
mordred | yah | 20:11 |
jeblair | executor:execute001 | 20:12 |
jeblair | nothing queued | 20:12 |
mordred | jeblair: 14b315d was the commit that was running before - there aren't a ton of things related to interaction in there, but there is some of the work finishing the merger/executor split in the new running executor | 20:15 |
jeblair | mordred: maybe we should restart the scheduler in case something changed with the interaction | 20:16 |
mordred | jeblair: perhaps we're on opposite sides of a shift? | 20:16 |
mordred | yah | 20:16 |
* mordred tries that | 20:16 | |
jeblair | jobs! | 20:17 |
mordred | jeblair: yah. much happier now | 20:17 |
mordred | jeblair: so - let's say it was interface mismatch | 20:18 |
jeblair | wfm | 20:18 |
openstackgerrit | Monty Taylor proposed openstack-infra/zuul feature/zuulv3: Write inventory as yaml not ini https://review.openstack.org/467634 | 20:34 |
openstackgerrit | Monty Taylor proposed openstack-infra/zuul feature/zuulv3: Add a space and a capital letter to success banner https://review.openstack.org/468539 | 20:39 |
mordred | SpamapS, jeblair: ^^ based on looking at the output now, there is a minor edit to the format string | 20:41 |
mordred | http://zuulv3-dev.openstack.org/logs/59f6cd6d3ae84dd2af24d90878f0e964/ansible_log.txt | 20:41 |
pabelanger | clarkb: also means, if we create the same setup in openstackzuul project, this should just work | 20:41 |
clarkb | pabelanger: for ipv6? I think we want to have consistent connectivity and routes | 20:41 |
clarkb | but ya | 20:41 |
mordred | SpamapS, jeblair: I also think we need a filtered on_task_start method too - printing _raw_params is bad | 20:42 |
pabelanger | mordred: ++ | 20:42 |
pabelanger | i noticed that | 20:43 |
mordred | and honestly I think we need to sort out how to not double-timestamp this | 20:43 |
pabelanger | clarkb: oops wrong room | 20:43 |
SpamapS | mordred: can't we just ARA this stuff and be done? ;) | 20:48 |
mordred | SpamapS: :) | 21:00 |
mordred | pabelanger: I feel like a few weeks ago one of us figured out what we thought was wrong with the "stream logs from host only works for first task" issue | 21:01 |
mordred | pabelanger: you don't happen to remember do you? | 21:01 |
pabelanger | mordred: was it something to do with unicode / encoding / decoding? | 21:02 |
pabelanger | cannot remember | 21:02 |
jeblair | mordred: i agree it would be nice not to double timestamp; i'm really not sure how to accomplish it (i don't even mean technically -- i'm not sure what the desired outcome would actually be) | 21:07 |
jeblair | mordred: the inner timestamp is critical for debugging (it's the *actual* timestamp); the outer one can be discarded... *except* that means that the log no longer has a consistent timestamp series or format. | 21:09 |
jeblair | i'm inclined toward that (discarding the outer timestamp, if that's possible). i just worry about whether that makes life difficult for log parsers like osla and grok | 21:10 |
mordred | well - I'm thinking we could change the logger format we emit in the callback plugin (which is what controls the outer one) | 21:10 |
jeblair | ooh yeah | 21:10 |
mordred | to match the format the inner one would have | 21:11 |
jeblair | if we change that to match the command module, and then drop it from content streamed from the command module, we might be set | 21:11 |
mordred | then when we emit the messages from the remote logger, don't append timestamps | 21:11 |
mordred | yup | 21:11 |
mordred | I think that's doable | 21:11 |
jeblair | (also p=871 u=zuul are *really* unnecessary for us) | 21:11 |
mordred | yup | 21:12 |
mordred | they should go away | 21:12 |
mordred | and a few other cleanups | 21:12 |
mordred | pabelanger: I don't either :( | 21:12 |
jeblair | (it's only interesting when the pid changes, but we can easily throw in a "starting new playbook" line) | 21:12 |
mordred | jeblair: indeed | 21:13 |
mordred | jeblair: oh wow - wanna see a fun hack? | 21:14 |
mordred | jeblair: logger = logging.getLogger("p=%s u=%s | " % (mypid, user)) | 21:14 |
mordred | jeblair: they name the logger itself "p=%s u=%s | " % (mypid, user) - then set the format to '%(asctime)s %(name)s %(message)s' | 21:15 |
jeblair | oh, wow | 21:22 |
jeblair | why? | 21:22 |
jeblair | maybe they couldn't think of another way of adding in extra data like that...? | 21:24 |
clarkb | mordred: its too near the weekend for eye bleeding | 21:25 |
mordred | jeblair: I think yes - I think it's because the callback plugin itself doesn't actually know that information, perhaps? | 21:26 |
mordred | (I honestly have no clue) | 21:26 |
jeblair | there are so many ways of manipulating logger data if you use the python logging library correctly... | 21:26 |
clarkb | os.getpid() should work anywhere? | 21:26 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul feature/zuulv3: Release executor jobs on test exit https://review.openstack.org/468553 | 21:27 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul feature/zuulv3: Add LoggerAdapter for executor jobs https://review.openstack.org/468554 | 21:27 |
jeblair | there's one ^ :) | 21:27 |
jeblair | mordred: anyway, we can probably get the logger and reset the formatter? | 21:28 |
mordred | jeblair: yah | 21:28 |
mordred | I'll poke at that on money | 21:28 |
mordred | monday | 21:28 |
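[Editor's aside: for reference, a sketch of the LoggerAdapter approach from the patch above -- the idiomatic alternative to smuggling context through the logger's name; the adapter class and prefix format here are made up.]

```python
import logging
import os

logging.basicConfig(format='%(asctime)s %(message)s')
log = logging.getLogger('zuul.AnsibleJob')
log.setLevel(logging.DEBUG)


class JobLogAdapter(logging.LoggerAdapter):
    """Hypothetical adapter injecting per-job context into messages."""

    def process(self, msg, kwargs):
        return ('[p=%s u=%s] %s' % (self.extra['pid'],
                                    self.extra['user'], msg), kwargs)


job_log = JobLogAdapter(log, {'pid': os.getpid(), 'user': 'zuul'})
job_log.debug('Stopping')  # -> 2017-05-26 ... [p=871 u=zuul] Stopping
```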
openstackgerrit | James E. Blair proposed openstack-infra/zuul feature/zuulv3: Actually check out the branch in the jobdir https://review.openstack.org/468556 | 21:32 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul feature/zuulv3: Enable test_multi_branch cloner/executor test https://review.openstack.org/468557 | 21:32 |
jeblair | 2 cloner tests enabled! ha ha ha | 21:32 |
SpamapS | cloner? | 21:33 |
SpamapS | isn't that a dead thing? | 21:33 |
SpamapS | is zuulv3-dev not running jobs now? | 21:34 |
SpamapS | not seeing it post results to my most recent patches | 21:34 |
jeblair | SpamapS: [cloner] yes, but what it *does* is now handled by the executor. the cloner tests are really detailed and important, so i'm enabling them applied to the executor with minimal auditable changes; i'll rename them at the end. | 21:36 |
jeblair | SpamapS: er, there were issues earlier; i think they were resolved, but i haven't followed up with very recent stuff | 21:36 |
jeblair | SpamapS: anything uploading since 20:17 utc should be okay | 21:38 |
jeblair | er uploaded since | 21:38 |
jeblair | older than that, not so much | 21:38 |
openstackgerrit | Merged openstack-infra/zuul feature/zuulv3: Add a space and a capital letter to success banner https://review.openstack.org/468539 | 21:43 |
SpamapS | jeblair: ah ok I fell into the "problems" window ;) | 21:46 |
* SpamapS will recheck | 21:46 | |
SpamapS | jeblair: also [cloner] ACK | 21:47 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul feature/zuulv3: Verify executor branch checkout https://review.openstack.org/468564 | 22:45 |
* SpamapS hits reviewer exhaustion | 22:46 | |
openstackgerrit | James E. Blair proposed openstack-infra/zuul feature/zuulv3: Enable test_multi_branch cloner/executor test https://review.openstack.org/468557 | 22:47 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul feature/zuulv3: Verify executor branch checkout https://review.openstack.org/468564 | 22:47 |
pabelanger | jeblair: have we committed to no longer having zuul-merger running for zuulv3? | 22:48 |
SpamapS | pabelanger: I believe the thought is that it might be useful to have it doing just merges to offload work from executors. | 22:48 |
SpamapS | tho if the result of that work doesn't make it into the executor and must be repeated, I'm not sure there's value in doing that. | 22:49 |
jeblair | SpamapS: unfortunately, since jobs for a single change can run on multiple executors, we will either need to repeat it or fetch it from the mergers (which requires that they serve git repos (though only privately, not publicly)) | 22:50 |
jeblair | at any rate, i've implemented "repeat it"; if we later decide we'd rather have them fetch, that's fine -- it'll mostly just be a matter of deleting code :) | 22:51 |
SpamapS | I could see just having two ways of configuring it as we learn to scale. | 22:52 |
SpamapS | for big ones like infra... you have the mergers serving to the executors | 22:52 |
SpamapS | for the average one, just have executors merging. | 22:52 |
jeblair | while that's related to the question of whether standalone mergers are worthwhile, it's not the only consideration -- either way, there's significant work that can be offloaded from the executors by having standalone mergers, so i want to keep them around until we see how they perform. | 22:52 |
SpamapS | and I'd rather focus on the average one first, and then when we break it, we have data to guide our optimization | 22:53 |
jeblair | well, we kind of need to run the infra one :) | 22:53 |
SpamapS | what about another way of thinking.. what if mergers could push their merge results to executors, sort of like how gerrit push triggers work | 22:54 |
jeblair | at any rate, both are supported in the code right now, and i think if pabelanger is asking "what should the puppet modules do?" i think the answer is they should continue to support both for now. | 22:54 |
SpamapS | jeblair: what I mean by that is that we haven't run an infra with this scheme, so we don't know that it will break at that scale. | 22:54 |
pabelanger | jeblair: ya, puppet / ansible. I have the code to support both still, was mostly curious if still valid. Thanks for the feedback | 22:55 |
SpamapS | entirely possible the executors will remain blissfully busy and humming doing merges sometimes and running jobs others. | 22:55 |
jeblair | SpamapS: yeah, though that's the same problem with the direction of transfer reversed. it's actually a little trickier since an executor knows where to pull from, but a merger doesn't know where to push. | 22:55 |
SpamapS | jeblair: I was thinking merger would push to all. :-P | 22:55 |
SpamapS | or.. something crazy like.. to AFS | 22:55 |
SpamapS | would that even work | 22:56 |
* SpamapS has no idea | 22:56 | |
jeblair | SpamapS: about scale: yes, that much i agree with; we don't need to push people to run mergers yet; we just need to keep them around in case infra-scale needs them. | 22:56 |
jeblair | SpamapS: (mostly ^ i wanted to clarify that i think we do need to keep that architecture working so we have it ready if infra-scale needs it) | 22:56 |
SpamapS | ACK | 22:56 |
jeblair | SpamapS: pull would have the advantage of only doing work on the machines that need it; i think the access/network concerns are about the same either way | 22:57 |
* SpamapS really would love to be in the position of actually watching it scale or fail to scale. ;) | 22:59 | |
jeblair | (i'm just really happy that we're out of the business of saying "here's how you set up apache to serve git repos"; probably if we did pull, we could just have intra-zuul ssh access or something to make it simpler) | 22:59 |
jeblair | (it has also occurred to me that we could push git blobs over gearman, but i'm trying not to think about that) | 22:59 |
jeblair | the "repeat" approach is dumb, but simple. | 23:00 |
SpamapS | is there such a thing as a git blob? | 23:01 |
SpamapS | I wondered if maybe there was , like a git pack thing, that can be easily created. | 23:01 |
jeblair | SpamapS: well, the stuff that git upload-pack ships around | 23:01 |
pabelanger | ya, ssh would be nice over apache | 23:02 |
SpamapS | I just don't know how those are created | 23:02 |
SpamapS | I guess you know the refs you made with the merge | 23:02 |
jeblair | it could get complicated since it depends on what objects are on both sides of the transaction | 23:02 |
jeblair | probably you'd just assume the executor had all the objects required by the current branch tips, and then send all the ones required by the change not in that set. then the executor could start by updating from origin, then apply the objects from gearman. | 23:04 |
jeblair | ugh. i'm failing to not think about this. :) | 23:04 |
jeblair | SpamapS: you've now gotten me convinced this is worth spending a few minutes poking at. | 23:05 |
SpamapS | https://git-scm.com/docs/git-pack-objects | 23:06 |
SpamapS | yep | 23:06 |
SpamapS | it actually doesn't seem so haphazard | 23:06 |
jeblair | so next thing is: get a list of objects we need to send (so we can pass that to git-pack-objects) | 23:07 |
jeblair | would that just be a list of commits, or is it more complex than that? | 23:08 |
jeblair | (like, would we also have to identify what tree objects are required?) | 23:09 |
SpamapS | jeblair: if you know that the other side has everything except those commits, then it's just those commits. | 23:09 |
SpamapS | But if there are things in those objects' tree that are not present, you need those too to unpack. | 23:09 |
jeblair | SpamapS: so basically: for each new commit object, get the associated tree object. pack the set of commit and tree objects. send it over the wire and unpack. ? | 23:17 |
SpamapS | jeblair: indeed | 23:20 |
* SpamapS is playing localy | 23:21 | |
SpamapS | locally even | 23:21 |
SpamapS | jeblair: so I don't know how to get the updates to refs | 23:21 |
SpamapS | I commit to master, pack up that commit hash, and move it to the other repo with unpack-objects ... and then I can checkout that hash, but master is still at the old hash | 23:22 |
jeblair | SpamapS: that's cool -- i already have code to update all the refs, including branch heads, to point at the right commits | 23:22 |
SpamapS | so yeah seems doable | 23:23 |
SpamapS | and the pack files are pretty efficiently sized.. already gzipped | 23:23 |
SpamapS | so even a large chain of merges would probably keep under the 1MB limit | 23:23 |
jeblair | okay... i'll putz around with doing this with gitpython, and maybe try it out next week... | 23:24 |
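[Editor's aside: a rough sketch of the round trip being discussed, driving the git CLI directly rather than gitpython for brevity; the repo paths and rev range are hypothetical.]

```python
import subprocess


def pack_new_objects(repo, have_tip, want_tip):
    """Pack every object reachable from want_tip but not have_tip."""
    # rev-list --objects emits the commits, trees and blobs needed.
    revs = subprocess.check_output(
        ['git', 'rev-list', '--objects', '%s..%s' % (have_tip, want_tip)],
        cwd=repo)
    # pack-objects reads object names on stdin, writes a pack stream.
    return subprocess.check_output(
        ['git', 'pack-objects', '--stdout'], input=revs, cwd=repo)


def unpack_objects(repo, pack_data):
    """Feed a pack stream into the receiving repo's object store."""
    subprocess.run(['git', 'unpack-objects'],
                   input=pack_data, cwd=repo, check=True)
    # Refs (branch heads, zuul refs) still need updating separately,
    # as noted above.
```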
jeblair | SpamapS: 1MB limit? | 23:24 |
SpamapS | gearman | 23:24 |
jeblair | SpamapS: per packet? | 23:24 |
SpamapS | technically the limit is 4GB but realistically pushing bigger than 1MB through gearman is pretty inefficient. | 23:25 |
SpamapS | gearmand will flat out refuse > 1MB jobs | 23:25 |
SpamapS | dunno if geard does the same | 23:25 |
jeblair | nope, didn't know about a limit :) | 23:26 |
SpamapS | but .. memory usage would obviously get pretty hairy | 23:26 |
SpamapS | it's just an implementation specific limit | 23:26 |
SpamapS | the protocol allows a 32-bit unsigned int for packet length.. that's the only theoretical limit. | 23:26 |
jeblair | SpamapS: but a job can send multiple work_data packets, so we could stream reasonable chunks at a time. | 23:26 |
SpamapS | that's a good point, those are handled one at a time for that reason | 23:26 |
SpamapS | the original usage of that was resizing images on livejournal | 23:27 |
jeblair | (we might even want to go ahead and do that just to avoid starvation) | 23:27 |
SpamapS | yeah I could see just pushing a merge job with a flag that says "get some git objects from work data first" | 23:28 |
SpamapS | err no | 23:28 |
SpamapS | clients don't send work data (doh) | 23:28 |
jeblair | oh good point | 23:28 |
SpamapS | feeling more and more like configuring apache to fetch from mergers is simpler ;) | 23:29 |
jeblair | so if we do merger -> scheduler -> executor, we can only chuck from merger -> scheduler, not from scheduler-> executor.... | 23:29 |
jeblair | *but* | 23:29 |
jeblair | if we have the scheduler ask the merger for data directly (via gearman), that could stream in chunks, and, incidentally, avoid the scheduler seeing a lot of data it doesn't care about. | 23:30 |
SpamapS | if we had a coalescing gearmand we could go in reverse | 23:31 |
jeblair | scheduler -> [merger:merger], returns {merger: merger-name; objects: ...}. then executor -> [merger-name:fetch-objects, objects:...] would be able to stream chunks from the merger to the executor directly (at the cost of requiring the merger to stay online for the duration of the buildset) | 23:32 |
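[Editor's aside: a hedged sketch of the merger side of that scheme, assuming the gear library's worker API (sendWorkData / sendWorkComplete); the queue name and chunk size are made up for illustration.]

```python
import gear

CHUNK = 1024 * 1024  # stream in ~1MB WORK_DATA packets, per the above

worker = gear.Worker('merger-name')
worker.addServer('127.0.0.1')
worker.registerFunction('merger-name:fetch-objects')
worker.waitForServer()

while True:
    job = worker.getJob()           # an executor asking for objects
    pack_data = b'...'              # pack built as in the sketch above
    for i in range(0, len(pack_data), CHUNK):
        job.sendWorkData(pack_data[i:i + CHUNK])  # one WORK_DATA packet
    job.sendWorkComplete()
```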
SpamapS | this feels like a very roundabout way to get around mergers simply serving their completed merges, doesn't it? | 23:33 |
SpamapS | like, not needing tests nodes to contact mergers is a big win. Not needing executors to do the same seems less important. | 23:33 |
jeblair | SpamapS: yes, fetch over http or ssh would be much simpler, but complicates the operational deployment. | 23:33 |
jeblair | SpamapS: agreed, internal access isn't as significant as external access. | 23:34 |
jeblair | SpamapS: how does coalescing help here? | 23:34 |
SpamapS | you know I think it could be done w/o coalesce in this instance | 23:36 |
SpamapS | but basically flip the roles | 23:37 |
SpamapS | have the executors submit a job to a queue name in response to receiving a merge request, and then basically let the mergers serve it like cache requests. | 23:37 |
SpamapS | "do I have these objects already", no, do the merge.. send packfile. | 23:38 |
SpamapS | the coalesce would just be good for avoiding repeating merges | 23:39 |
SpamapS | since if you had several executors request the same objects they could coalesce on a hash of that request and only one merger would do the merge | 23:39 |
jeblair | SpamapS: oh, yeah, i think this is only a win if we only perform the merge once | 23:40 |
SpamapS | but if you want the merges done beforehand, stuffing the result in somewhere addressable and fetching it from there seems like the win. | 23:40 |
jeblair | which is why i was thinking one option is to have the executors ask the merger which performed it for the objects | 23:40 |
SpamapS | I guess you could have a gearman queue per merger and attach the merger name that did the initial merge to the job that the executors get? | 23:40 |
jeblair | yeah that ^ | 23:41 |
SpamapS | That would achieve 1-merge-per-change and would avoid new network pathways | 23:41 |
jeblair | still sends lots of data through gearman, but it's the shortest path version of that. | 23:41 |
SpamapS | and you wouldn't need coalescing to make it efficient since even if multiple executors submitted they'd just get served rapidly in succession | 23:42 |
jeblair | (both sides can be efficient about how they handle the data since it doesn't require intermediate storage) | 23:42 |
jeblair | yep | 23:42 |
SpamapS | (though coalescing _would_ make it more efficient when you get to lots and lots of executors) | 23:42 |
jeblair | if we go with that, we really should look into whether we can switch to C gearmand though. i'm not worried about gear clients bottlenecking here, but this may exceed what we can do with geard. | 23:43 |
SpamapS | I don't think gearman handling lots of data is much of a problem.. it's stateless so the only waste is network bandwidth and a little buffer churn | 23:43 |
SpamapS | And yeah, C gearmand gets you multi-threaded performance. | 23:43 |
jeblair | i feel fairly confident that if there are any remaining problems with gearmand for zuul, we can fix them. :) | 23:44 |
SpamapS | perhaps | 23:44 |
* SpamapS eyes rustygear longingly, but shuns it again | 23:45 | |
SpamapS | so basically we just need 'git pack over gearman' ;-) | 23:45 |
SpamapS | I will point out that whenever users on the ML ask me for solutions like this, I recommend that they just serve that data over http ;) | 23:46 |
jeblair | SpamapS: here's what i have so far (up to the point where we would git-pack): https://etherpad.openstack.org/p/v5FIUHK4hQ | 23:49 |
SpamapS | jeblair: seems legit :) | 23:58 |