Friday, 2017-06-16

clarkb	I just retested and ps says it doesn't use the venv python	00:01
clarkb	I ran 'ansibleenv/bin/ansible localhost -i invetory -m shell -a "sleep 100"	00:01
mordred	jamielennox: biggest tl;dr is that the command module sync was breaking jobs - dont' know _why_ yet - but with the sync patches things break WEIRDLY and without them they don't	00:01
clarkb	then ps -elf \| grep asnible it shows: /usr/bin/python /home/clark/.ansible/tmp/ansible-tmp-1497571250.32-216375718117125/command.py	00:02
mordred	jamielennox: so short-term fix is revert - and then we can dig in to figure out what specifically is broken	00:02
jamielennox	mordred: interesting, do you have an example of weird failure	00:03
clarkb	and it actually forks an sh to run the python to run the ansible. interesting	00:04
mordred	jamielennox: yah - running the tox-pep8 job from openstack-zuul-jobs breaks - one sec	00:04
mordred	jamielennox: http://paste.openstack.org/show/612772/	00:09
mordred	jamielennox: it's super gross to read	00:10
mordred	oh - I hav ea better log of that	00:10
mordred	jamielennox: http://paste.openstack.org/show/612773/	00:10
mordred	jamielennox: I'm running this playbook: http://paste.openstack.org/show/612774/	00:11
mordred	jamielennox: with and ansible config set up to point to the zuul stuff in my local source directory	00:12
mordred	that failure happens with the command stuff applied	00:12
jamielennox	mordred: wow, the command sync causes that? that is weird	00:12
mordred	jamielennox: RIGHT????	00:12
mordred	jamielennox: it has us completely baffled	00:12
SpamapS	I'm lacking context	00:12
*** jkilpatr has quit IRC		00:12
SpamapS	what is "the command sync" ?	00:12
jamielennox	SpamapS: so to make zuul run in python 3 i updated the "command.py" zuul forked to current ansible	00:13
jamielennox	it only matters when you delegate_to: localhost, because then you run in the same python that ansible is running in	00:13
mordred	SpamapS: https://review.openstack.org/474808 contains a revert of the appropriate files	00:14
openstackgerrit	Merged openstack-infra/zuul feature/zuulv3: Revert sync from latest command from ansible https://review.openstack.org/474808	00:14
jamielennox	but it should have been a fairly simple patch applying some str/bytes handling	00:14
mordred	yup. I agree looking at it	00:14
mordred	so I can't WAIT to know what's broken	00:14
mordred	but I must now eat dinner	00:14
openstackgerrit	Merged openstack-infra/zuul feature/zuulv3: Use python2 for ansible on remote hosts https://review.openstack.org/474810	00:14
SpamapS	ehhh	00:16
SpamapS	https://review.openstack.org/#/c/474808/1/zuul/ansible/library/command.py	00:16
SpamapS	line 367	00:16
SpamapS	am I crazy, or are we popening twice?	00:16
SpamapS	or is that just the reverse diff baking my brain?	00:18
jamielennox	SpamapS: ... no you would appear to be reading that correctly ....	00:19
SpamapS	yep	00:19
SpamapS	mordred: ^^^	00:19
SpamapS	I think we can unrevert ;)	00:19
jamielennox	which, lol, would probably be mordred's issue because he's running tox twice in quick succession and so it'd check for a venv, find nothing, try and create one and fail when it exists	00:20
jamielennox	hmm, i wonder if that solves mine as well	00:20
SpamapS	twice, in quick succession, in parallel	00:20
jamielennox	right	00:20
jamielennox	SpamapS: i feel you should get the fixup honours there	00:21
jamielennox	because i looked at that file a fair bit and didn't see that	00:21
openstackgerrit	Clint 'SpamapS' Byrum proposed openstack-infra/zuul feature/zuulv3: Revert "Revert sync from latest command from ansible" https://review.openstack.org/474814	00:21
SpamapS	jamielennox: ^ done :)	00:21
SpamapS	jamielennox: chalk it up to me taking a few days off from zuul'ing, allowing me to come back in with less assumption brain and more "wtf is going on" brain :)	00:22
jamielennox	there is now a crazy amount of undoing debugging i have to do to even test that properly	00:23
SpamapS	jamielennox: ohmy	00:23
jlk	oh neat...	00:23
jlk	I'm getting stumped by the system. Trying to make org/project2 depend on a change in org/project1, but enqueuing isn't working because they don't share a queue. Oops.	00:24
pabelanger	okay, so that does make sense. Because I think that is why apt was failing about getting a lock too	00:26
pabelanger	I thought it was getting run twice, but was not able to actually see it happening	00:26
jlk	When something gets added to the queue, but live=False, what does that mean for test runs? It seems that the added job never actually runs.	00:28
pabelanger	clarkb: guess we figured out lock issue too, ^	00:29
SpamapS	jlk: hm	00:35
jlk	the non-matching queue behavior is right	00:35
SpamapS	jlk: I may be thinking too high level here, but I'd expect things with no shared queue to still be able to depend on one another.	00:35
jlk	as a writer of project2, I want to depend on a change in-flight in project1, but I don't contribute to project1, and zuul doesn't really report on project1, so make _use_ of the content in the git dir, but don't report on the change.	00:36
jlk	yeah they do	00:36
jlk	SpamapS: you can depend on it for the content in the git repo when you test (for independent pipelines), and you can depend on it being merged (and the content) for dependent pipelines.	00:40
jlk	if they _share_ a queue, then an event on project2 that depends on a pr in project1 will cause the pr in project1 to become enqueued ahead and run through the pipeline as well.	00:41
jlk	it's a twisty path :/	00:41
jlk	mostly I'm trying to figure out where the logic branches are and how to properly test them.	00:41
openstackgerrit	Jesse Keating proposed openstack-infra/zuul feature/zuulv3: Implement Depends-On for github [wip] https://review.openstack.org/474401	00:56
jlk	okay got one test minimally working	00:56
openstackgerrit	Tristan Cacqueray proposed openstack-infra/zuul feature/zuulv3: executor: run trusted playbook in a bubblewrap https://review.openstack.org/474460	01:15
openstackgerrit	Tristan Cacqueray proposed openstack-infra/zuul feature/zuulv3: bubblewrap: adds --die-with-parent option https://review.openstack.org/473164	01:15
openstackgerrit	Tristan Cacqueray proposed openstack-infra/zuul feature/zuulv3: config: refactor config get default https://review.openstack.org/474484	01:15
mordred	jamielennox, SpamapS: great find! I have verified that it works	02:19
openstackgerrit	Monty Taylor proposed openstack-infra/zuul feature/zuulv3: Fix unicode encoding gotcha in streamer https://review.openstack.org/474827	02:21
mordred	jamielennox, SpamapS: I also noticed that while debugging the other thing	02:22
openstackgerrit	Tristan Cacqueray proposed openstack-infra/zuul feature/zuulv3: config: refactor config get default https://review.openstack.org/474484	02:25
openstackgerrit	Tristan Cacqueray proposed openstack-infra/zuul feature/zuulv3: executor: add support for custom ansible_port https://review.openstack.org/468710	05:08
openstackgerrit	Tristan Cacqueray proposed openstack-infra/nodepool feature/zuulv3: Add support for custom ssh port https://review.openstack.org/468752	05:08
openstackgerrit	Tristan Cacqueray proposed openstack-infra/nodepool feature/zuulv3: Abstract Nodepool request handling code https://review.openstack.org/468748	05:08
openstackgerrit	Tristan Cacqueray proposed openstack-infra/nodepool feature/zuulv3: Abstract Nodepool provider management code https://review.openstack.org/468749	05:09
openstackgerrit	Tristan Cacqueray proposed openstack-infra/nodepool feature/zuulv3: Collect request handling implementation in an OpenStack driver https://review.openstack.org/468750	05:15
openstackgerrit	Tristan Cacqueray proposed openstack-infra/nodepool feature/zuulv3: Extend Nodepool configuration syntax to support multiple drivers https://review.openstack.org/468751	05:16
openstackgerrit	Tristan Cacqueray proposed openstack-infra/nodepool feature/zuulv3: Implement a static driver for Nodepool https://review.openstack.org/468624	05:16
openstackgerrit	Jesse Keating proposed openstack-infra/zuul feature/zuulv3: Implement Depends-On for github [wip] https://review.openstack.org/474401	05:32
openstackgerrit	Jesse Keating proposed openstack-infra/zuul feature/zuulv3: Implement pipeline requirement of github labels https://review.openstack.org/473962	05:41
openstackgerrit	Tristan Cacqueray proposed openstack-infra/zuul feature/zuulv3: Add support for zuul.d configuration split https://review.openstack.org/473764	06:02
*** hashar has joined #zuul		07:21
*** hashar_ has joined #zuul		09:58
*** hashar has quit IRC		09:59
*** hashar_ is now known as hashar		10:43
*** jkilpatr has joined #zuul		10:59
openstackgerrit	Tristan Cacqueray proposed openstack-infra/nodepool feature/zuulv3: Implement an OpenContainer driver https://review.openstack.org/468753	11:07
*** hashar has quit IRC		11:43
*** hashar has joined #zuul		12:08
*** hashar has quit IRC		12:29
*** hashar has joined #zuul		12:51
openstackgerrit	Merged openstack-infra/zuul feature/zuulv3: Remove item.layout https://review.openstack.org/474188	13:25
openstackgerrit	Merged openstack-infra/zuul feature/zuulv3: Move dependency cycle detection into pipelines https://review.openstack.org/451423	13:27
openstackgerrit	Merged openstack-infra/zuul feature/zuulv3: Revert "Revert sync from latest command from ansible" https://review.openstack.org/474814	13:31
mordred	tristanC: just left a longish feedback on the trusted execution wrapper patch - we should probably make sure jeblair agrees or disagrees with what I said before you do anything about it though	13:53
mordred	tristanC: also - while you're looking at executor related restrictions - there's an in-progress patch that could use being picked up and finished (that I imagine would be a useful feature for SF too)	13:54
mordred	tristanC: https://review.openstack.org/#/c/463022/ - which is about adding disk space limit protections	13:55
tristanC	mordred: thanks, i'll update the review accordingly	14:00
tristanC	mordred: and regarding further restriction, i still need to wrap my head around the potencial issues with the executor...	14:05
tristanC	I don't get how a job could use disk space on the executor node, beside through stdout	14:06
pabelanger	Ya, expsoing a r/w to bubblewrap make me feel tingly. Not sure why. Seems like we should just have job use SSH, and they could either delegate logs to localhost or another server. Then we don't need to toggle bind mounts in bwrap	14:16
tristanC	pabelanger: but if a job can ssh localhost, then what's the point to run the ssh process in bwrap?	14:17
tristanC	pabelanger: also that implies zuul can ssh localhost in the first place	14:18
tristanC	mordred: regarding 463022, I wonder if using a watchdog for diskusage is the way to go, then how about for iops and cpu usage?	14:22
pabelanger	tristanC: So, what I am planning on doing today is add our ssh key to logs.o.o into a trusted job, and that job will then have ssh access to the remote host. You could do the same for localhost. Only that job has access to the key	14:23
pabelanger	yes, there would have to be a step to setup authorize keys	14:24
pabelanger	like we;d do with remote logging server today	14:24
pabelanger	both jobs are dependent on existing configuration	14:24
tristanC	pabelanger: oh well, that would work too, but that's not what the base/post is currently doing	14:25
pabelanger	I think blindly bind mounting any trusted job to /opt/zuul-logs (in your example) would provide less control. In the case of SSH, it depends on ssh key within said job.	14:26
pabelanger	tristanC: right, we delete that	14:26
pabelanger	tristanC: it was just a hack to get logs on zuulv3-dev.o.o	14:26
mordred	tristanC: yah - the disk was the first thing we were worried about because the executor gets used as a log transfer location and we've had jobs in openstack accidentally start transferring huge amounts of logs	14:27
pabelanger	and reason we haven't done it yet, was didn't want to add ssh key for logs.o.o to executor until we knew things were starting to get secure.	14:27
pabelanger	next step is SSL for gearman, hope to do that today	14:27
mordred	tristanC: for the iops/cpu- I think our original thinking was that those are things that can actualy be restricted at the bubblewrap layer	14:27
mordred	but it turns out total disk space used is a bit harder	14:28
pabelanger	https://review.openstack.org/#/c/474726/ removes copy to /opt/zuul-logs too	14:28
tristanC	pabelanger: well I don't mind removing extra bind in trusted wrapper after all, but giving zuul ssh access to localhost feels wrong to me	14:31
pabelanger	Personally, I don't think jobs sould be writing to localhost on executor. I think it should be considered ephemeral by default. That way it makes scaling out much easier	14:33
mordred	pabelanger, clarkb: when we were talking about limiting disk usage at the PTG - is there a reason we decided against using cgroups to limit total bytes written? I mean, I know it's possible that someone could hit that by writing and deleting a ton of small files, but was there a different reason?	14:34
* mordred reads scrollback between tristanC and pabelanger		14:34
pabelanger	mordred: I'll have to defer to clarkb on cgroups, I cannot remember	14:35
tristanC	mordred: I can't tell if bubblewrap will accept such feature, on the other hand runc would provide all the restriction available in Linux...	14:36
mordred	tristanC: yah - the reason we didn't go with runc is that it requires root to run - and we were trying to avoid making the executor need to run as root	14:37
mordred	tristanC: but thanks - that's probably the reason we also didn't use cgroups for disk usage limitation	14:37
jeblair	mordred, tristanC, pabelanger: i'll go look at the bubblewrap patch now...	14:39
mordred	pabelanger: I think we should have a broader conversation about localhost writing/execution and ssh to localhost- I believe there are strong opinions on both sides of that one (I do not believe I'm one of the folks with a strong opinion)	14:39
pabelanger	IIRC, we did discuss using LVM and carving out slices for jobs	14:39
mordred	jeblair: thanks!	14:39
pabelanger	but not sure where we ended up with that	14:39
jeblair	mordred: but while i'm doing that, maybe it's my morning brain, but why is https://review.openstack.org/474827 necessary?	14:39
mordred	pabelanger: yah - I think the problem we ended up with there was efficiency	14:39
mordred	jeblair: when I was testing the execution yesterday without that I would trigger an exception about inability to encode a byte value	14:40
jeblair	mordred: also, requires lvm.	14:40
mordred	jeblair: yah	14:40
mordred	and root	14:40
mordred	but has the same problem as bind-mounting files - you wind up allocating the max disk to each job	14:40
mordred	when most jobs will use FAR less than the max disk	14:40
pabelanger	ya	14:41
jeblair	mordred: i don't understand the u'' thing. i thought with py3.5, we did not need that...	14:41
mordred	jeblair: I believe it might just be because I was running with python2 ansible-playbook	14:41
mordred	jeblair: also - donm't know if you saw in scrollback, but SpamapS and jamielennox found the actual bug in the reverted patch	14:46
mordred	jeblair: which was a duplicate popen call	14:46
jeblair	mordred: yup, nice. sorry you were deprived of your morning debugging :)	14:46
mordred	bah. I have plenty of morning debugging available	14:46
jeblair	pabelanger: i think we're thinking along the same lines, in wanting to restrict the trusted wrapper as much as possible, but i'm not quite as far along as you...	14:46
jeblair	pabelanger: one of the things i'm considering is afs	14:47
jeblair	pabelanger: without bubblewrap, we can obviously have trusted jobs work directly in afs running on the executor host	14:47
mordred	tristanC: oh- I also left some comments on your nodepool provider interface static nodes stack - I think for the static nodes one it'll be really useful for us to shove a mockup of the config into an etherpad	14:48
jeblair	pabelanger: with bubblewrap, but configurable read-write dirs, we might be able to bind mount afs into the bubblewrap? (mordred -- do you think this will work?)	14:48
tristanC	mordred: thanks you for digging through those! I'll add snippet in the good.yaml	14:48
mordred	jeblair: I don't think it should be a problem - that said, we should probably check real quick, because there is some amount of interaction with the kernel module and the krb creds needed	14:49
jeblair	pabelanger: we could, as you suggest, ssh into localhost instead. that seems like we're working around the design of the system. but maybe it's okay for this. afs is weird after all.	14:50
mordred	jeblair: that said - if we took pabelanger's approach and had writing to privileged places on localhost actualy be a thing that happened over ssh instead of bind-mounts - then we could have creds for an AFS-enabled account on localhost, scp things and not have to worry about interactions between bubblewrap and afs/krb	14:51
jeblair	i read pabelanger's suggestion as essentially: the only difference between trusted and untrusted should be the ansible module restriction. i do think that's workable if you have a 'ssh to localhost' option.	14:51
mordred	yah	14:51
jeblair	mordred: yeah, a write to afs job will need 2 creds -- an ssh key for the executor and a krb password	14:52
mordred	jeblair: also- if we have someone running zuul in k8s, the path to afs would likely be running a container with a kvm process in it that ran an afs client and ssh-ing in to that for afs operations	14:52
mordred	jeblair: yup	14:52
jeblair	mordred: i can get behind the pabelanger plan. do you have any reservations?	14:53
mordred	jeblair: I don't. I konw SpamapS was originally concerned about limiting network access from untrusted jobs to localhost and we assuaged that concern by noting that permissions would not allow that ssh anyway	14:53
pabelanger	jeblair: ya, that is a good summary.	14:54
mordred	jeblair: I think that's still taken care of in this model	14:54
mordred	since the ssh key to be able to ssh to localhost would be a secret given to a trusted job	14:54
jeblair	mordred: yep -- "just don't do that" -- don't add an ssh key for localhost :)	14:54
mordred	so the untrusted would still not be able to	14:54
mordred	exactly	14:54
jeblair	mordred: the other "use case" would be the idea of bypassing the secrets system to get a "secret". in other words, having a trusted job just use /etc/my-secret-ssh-key to do something.	14:55
jeblair	without extra access, there's no back-channel way of getting data to jobs. that's probably okay, we'd just be all-in on zuul secrets.	14:56
mordred	jeblair: you know - tristanC's thing adding the extra mapping isn't a ton of code, so we can always add it later	14:56
mordred	jeblair: so maybe let's start with all-in-on-secrets?	14:56
mordred	jeblair: also - we COULD still add the capability but just not use it ourselves	14:57
mordred	jeblair: theuse case you mention could be nice for people who want to run a local zuul on their laptop	14:57
jeblair	mordred: true.	14:57
jeblair	it's a bit coarse grained though, and once you start thinking like that, you'd want it per-job as pabelanger points out, and that gets tricky... :)	14:58
mordred	well - I think if the general case's answer is "use secrets and ssh to localhost"	14:59
mordred	then the coarse grain isn't a problem	14:59
mordred	it's really an admin get-out-of-jail card for anything that's TRULY globally needed	14:59
mordred	if you need a thing per-job, you _must_ go through secrets and ssh	14:59
jeblair	ya.	15:00
jeblair	tristanC also pointed out that we use that trick in the unit tests right now. though, we could probably work around it.	15:02
tristanC	well it's saturday here, i'll update the change with the solution you come up with :-)	15:05
jeblair	tristanC: we'll leave a good comment, thanks! happy saturdaying!	15:05
jeblair	mordred: so are we now thinking we should go with the mordred plan?	15:06
tristanC	thank you, have a good day!	15:06
mordred	jeblair: the pabelanger plan you mean?	15:07
pabelanger	I'm working on the sercret / ssh plan today, I think that's what we decided to do for logs.o.o publishing	15:08
mordred	jeblair: and yah - I think it makes the most sense to me and keeps things in ansible-remote language	15:08
jeblair	mordred: well, you were just saying we could go ahead and add the bind-mount options... which is more leaning toward your suggestion on what tristan already has, right?	15:09
jeblair	mordred: the pabelanger plan, at that point, would just be not to use those options. :)	15:09
mordred	jeblair: oh - yes. I think we should add the bind-mount option that tristanC has written, and then for infra to do the pabelanger plan which is to not use those options	15:09
jeblair	pabelanger: that work for you ^ ?	15:10
mordred	jeblair: (it would also let us continue to use them as a hack for unit-tests for the things where the interaction chain gets kind of deep and where we really could do with a multi-node integration test but aren't there yet)	15:10
pabelanger	jeblair: Sure, I wouldn't expect us to use that setting in openstack-infra like mordred said.	15:12
jeblair	mordred: yeah, if we think we're going to want it at all, i figure we should do it now so we don't have to rework the unit tests.	15:12
pabelanger	I also have no idea how you would protect any trusted job from using that mount	15:12
jeblair	pabelanger: you wouldn't -- it's trusted. :)	15:13
pabelanger	Right :)	15:13
pabelanger	okay, are we okay to bring nl01.o.o back online and restart ze01.o.o to pick up our double popen fix?	15:14
pabelanger	Also, it would be great if we could do SSL on gearman today	15:14
jeblair	pabelanger: ++. do you mind doing that while i breakfast?	15:14
pabelanger	going to ping openstack-infra people	15:14
mordred	pabelanger: ++	15:14
pabelanger	jeblair: sure, np	15:14
mordred	pabelanger: anything you need that I can help unblock?	15:14
pabelanger	mordred: SSL ca right now is the blocker, via openstack-infra	15:15
pabelanger	nl01.o.o and ze01.o.o restarted	15:18
pabelanger	okay, ze01.o.o started now. We seem to be leaking out executor pid on shutdown, which causes the service to fail to start.	15:25
pabelanger	on my list to fix	15:25
jlk	so much scrollback	15:28
openstackgerrit	Monty Taylor proposed openstack-infra/zuul feature/zuulv3: Stop supporting python2 https://review.openstack.org/475009	15:29
jlk	If anybody has time, I'd like a peek at https://review.openstack.org/474401 to make sure I'm going in the right direction.	15:30
mordred	jeblair, pabelanger, jlk, Shrews, SpamapS, clarkb, fungi: ^^ patch here and in project-config for killing py2 support. testing the websockets work is extra difficult keeping support for 2 because of syntax changes even with test skips. let's go ahead and pull the plug	15:31
mordred	jlk: looking	15:31
mordred	tristanC, jamielennox: ^^ sorry, missed you in the ping line	15:32
fungi	everything's working well on py3k at this point though	15:32
fungi	?	15:32
jlk	oooh I'll have to update my docker stuffs	15:32
jlk	but I'm cool w/ that.	15:32
Shrews	i whole-heartedly support killing py2 for the reasons	15:32
pabelanger	+2	15:33
mordred	fungi: yup	15:33
pabelanger	also, zuulv3 SO much more stable this morning: http://zuulv3.openstack.org/	15:33
pabelanger	yay	15:33
jeblair	mordred: in tristanC's patch, why is state_dir involved?	15:34
fungi	mordred: pabelanger: are we still building the zuul-cloner venv w/ python2.7 in dib elements?	15:34
mordred	fungi: we are - but we don't need the zuul-cloner venv in zuul v3	15:35
mordred	jeblair: because state_dir has to get passed to the rw list for bubblewrap - so it's just getting added to the rw map so that there is one mechanism not two	15:35
jeblair	and master branch should stay v2 compatible and tested until we merge v3	15:35
mordred	jeblair: ++	15:35
Shrews	to mordred's point about ws testing, our chosen implementation (autobahn) uses asyncio. to test it, i need to create a client within the unit test, but doing so is hard when py2 barfs on any asyncio-isms	15:35
pabelanger	fungi: mordred: I can confirm I haven't changed our virtualenv for DIB	15:35
jeblair	mordred: i don't see that happening though -- all i see is that it's added to what's already there	15:36
fungi	mordred: good point, jobs no longer invoke zuul-cloner, they declare the repos they need instead	15:36
jlk	Oooh dep cycle detection in pipeline merged, I'll need to rebase my github depends-on	15:36
mordred	jeblair: ah - well, we should make sure that what I said happens - or that state_dir doesn't get added to the mapping	15:36
fungi	(which will get tricky, we have some jobs righth now that decide at runtime whether to install source via z-c checkout or from pypi depending on what change is being tested)	15:37
jeblair	jlk: yeah, it looked correct to me -- there's still follow-on things we'll want to do, like make a driver api method to find cross-connection dependencies, and also report cyclic deps, but it's a good base.	15:37
mordred	fungi: I think that'll actually get easier for them ...	15:37
mordred	fungi: the git repos will all be there at the start of their job, but their logic can just be "use the git repo that's there or install from pip"	15:38
fungi	mordred: oh, yep i guess they can still decide that at runtime regardless, they can just ignore the provided checkouts if they want	15:38
jeblair	mordred, fungi: ++	15:38
jlk	jeblair: okay. I need to sprinkle in a few TODOs, implement the needed-by detection, and for real implement the canMerge method for github ( jamielennox has a patch up to help with that). I hope to have this ready for real review today.	15:38
jeblair	fungi: it makes the related use case, "make this cross-repo unit test work on a dev workstation and in the gate" easier too, because both of those cases are likely to have local git repos. :)	15:38
mordred	fungi: and since the git repo will actually be checked out to the right ref - they can just do things like "pip install {{ zuul_workspace_root }}/src/git.openstack.org/openstack/repo" and not clone at all	15:39
fungi	true, true	15:39
fungi	mordred: looks like the docs job may not be py3k-ready. looking through the failure logs now	15:39
mordred	jeblair, jlk: looking at the gh patches - sake of argument real quick - is Depends-On in the commit the right model? why not Depends-On: in the PR message?	15:40
mordred	fungi: neat! that'll be fun	15:40
fungi	heh	15:40
fungi	sphinx.errors.SphinxWarning: WARNING: Unexpected re	15:40
fungi	turn code 1 from command 'zuul --help'	15:40
jeblair	mordred: for one: so it shows up in the commit log	15:40
* fungi grumbles at stray newline in his paste		15:40
mordred	jeblair: PR message is what's used as the commit message of the merge commit typicaly	15:40
fungi	i agree with sphinx, zuul --help should not exit 1	15:41
mordred	jeblair: although, I suppose if there is no merge commit, then you'd lose it	15:41
jeblair	mordred: both good points. :)	15:41
jlk	mordred: couple reasons. 1) PR message can be edited, and we're not currently handling _anything_ about the PR message. 2) it shows up in the commit log of git, where as the PR message does not.	15:41
jlk	and it also mirrors gerrit which uses the commit message	15:41
jeblair	mordred: another point in favor of commit: it's more specific in a multi-commit branch	15:41
jeblair	like "this commit specifically needs this other thing"	15:42
mordred	yah. those are good points - let's just keep our eye on it as we use it to see if it starts to feel weird	15:42
jeblair	mordred: yep. certainly a legit question.	15:42
jlk	totes	15:43
jlk	We could _add_ looking at the PR message later.	15:43
jlk	but I think we should keep the commit message method	15:43
mordred	kk	15:43
jlk	Thanks for the feedback. I'm afk for a little bit, then back at it.	15:46
SpamapS	2017-06-16 15:33:49.160236 \| sphinx.errors.SphinxWarning: WARNING: Unexpected return code 1 from command 'zuul --help'	15:47
mordred	that seems to be a legit error	15:48
SpamapS	Like, zuul --help doesn't work on py3 now?	15:49
SpamapS	AttributeError: 'dict_keys' object has no attribute 'append'	15:49
SpamapS	yep	15:49
jeblair	weird, it works on zl01	15:50
SpamapS	mordred: you working on a patch?	15:50
mordred	yah	15:50
mordred	it's the dict_keys thing	15:50
SpamapS	Very obvious py3 problem	15:50
SpamapS	choices=self._show_running_jobs_columns().keys().append('ALL'),	15:50
SpamapS	durn views	15:50
openstackgerrit	Monty Taylor proposed openstack-infra/zuul feature/zuulv3: Stop supporting python2 https://review.openstack.org/475009	15:51
mordred	fungi: ^^	15:51
fungi	mordred: okay that's weird... what happened to patchset #2 for that change?	15:53
SpamapS	it went to plaid	15:53
mordred	fungi: it was a py3 bug in the zuul command line client	15:53
fungi	did you abort a push partway or something, maybe?	15:53
mordred	oh - yah	15:53
mordred	I realized halfway-through that I shoujld updated the commit message	15:54
fungi	the sad sad state of gerrit data integrity	15:54
jeblair	there's not not a patchset 2...	15:54
fungi	my point exactly	15:55
fungi	seems like we've seen that behavior before too. you can abort mid-push and gerrit doesn't unroll whatever transactions it started in reserving that patchset, so skips to the next one because it considers it used or something i guess?	15:56
jeblair	oops, my bad, it is broken on ze01. i was on zl01 which is py2. :)	15:56
pabelanger	https://review.openstack.org/#/c/474321/ is an easier view, fixes deprecation warnings on ze01.o.o for our tox/pre.yaml file	15:57
pabelanger	review*	15:57
mordred	pabelanger: can you explain what gather_subset: "!all" does? (the patch looks fine- but now I'm curious)	15:59
pabelanger	mordred: clarkb recommended it for a minimal set of facts to be gathered	15:59
pabelanger	to help speed things up a little	15:59
mordred	ah - so that gathers basic facts but nothing additional	15:59
mordred	cool	15:59
pabelanger	Ya, never knew about it until clarkb suggested it. And so far seems to be what we need for our usage of facts	16:00
mordred	++	16:10
clarkb	ya fact gathering can be quite slow	16:26
*** hashar has quit IRC		16:29
openstackgerrit	Paul Belanger proposed openstack-infra/zuul feature/zuulv3: Add known_hosts to bubblewrap jobir.work_root https://review.openstack.org/473100	16:31
pabelanger	mordred: SpamapS: just a rebase, but maybe jeblair would also like to review^	16:31
jeblair	pabelanger: lgtm!	16:40
openstackgerrit	Merged openstack-infra/zuul feature/zuulv3: Add known_hosts to bubblewrap jobir.work_root https://review.openstack.org/473100	16:51
*** jamielennox has quit IRC		17:03
*** jamielennox has joined #zuul		17:03
*** jlk has quit IRC		17:23
*** jlk has joined #zuul		17:23
*** jlk has quit IRC		17:24
*** jlk has joined #zuul		17:24
jlk	hrm, I need a wee bit of help on this depends-on change again. I'm looking at _getNeededByFromCommit() in gerritconnection.py, I'm trying ot understand what the query is doing here. It's looking for any commit that has in the change ID anywhere in it, and then running the regex locally on the results?	17:26
jeblair	jlk: yes, that was due to a limitation in gerrit searching (we couldn't search for "Depends-On: changeid")	17:27
jeblair	jlk: but we could search for the sha, so we get all those changes, then reduce them to actual depends-on. the set we get back from gerrit will, for example, also include the change itself since it has its own change id in its own commit message.	17:28
jlk	so for github, I could search for any commit that has 'Depends-On' in the message, then check if any PR has that commit in it, and check if the PR is still open, and add it to the needed by list.	17:29
jeblair	jlk: sounds right	17:30
openstackgerrit	James E. Blair proposed openstack-infra/zuul-sphinx master: Initial commit https://review.openstack.org/475041	17:38
jlk	oh light bulb. I'm searching for something that depends on this specific change, not just any change. I could search for "depends-on:" and "org/project/pull/number" to really narrow down results.	17:42
jeblair	jlk: yes; i think you should do that with github. you should in most cases end up with only exactly what you need.	17:45
jeblair	jlk: i think we need to support url depends on	17:46
jlk	yeah, the regex in my wip uses the full URL. e.g.: https://github.com/org/repo/pull/2332	17:46
jeblair	ok cool	17:46
jlk	but I was going to ask for help with the regex, as we probably should support more than just "github.com", for github enterprise, and for people who do https://www.github.com/	17:47
jeblair	i don't think we've achieved consensus in the email thread on whether we should also support github-local syntax... i was going to resume that after ansiblefest. but i do think we need to at least support the full url, so we may as well start with that.	17:47
jlk	to start with, should we only look for deps within a given connection? like when dealing with a change in Connection A, the regex should use connection A's hostname to look in?	17:48
jeblair	(and hey, if i'm wrong about that, it'll be easy to change)	17:48
jeblair	jlk: well, the thing that ultimately needs to happen is for the pipeline manager to drive the process, rather than the connection.	17:48
jlk	agreed, starting with the URL makes sense. It's easier to copy/paste and github itself will translate the full URL into a relative url within the UI	17:49
jeblair	jlk: so, the pipeline manager would ask each connection "hey, does this URL look like something you can handle? ie, is this your hostname?"	17:49
jlk	right, okay, so it sounds like yes, the regex SHOULD be connection hostname specific	17:49
jeblair	jlk: then if the connection says yes, the connection performs the query and the $something (pipeline manager? something in a driver? whatever it takes) should update the change dependencies	17:50
jeblair	jlk: so yeah, sorry if i went too far down the road... :)	17:50
jlk	all good	17:50
jeblair	jlk: i think now and later the actual query will be connection specific, so that's safe.	17:50
jeblair	jlk: and for now, since none of that other stuff i said exists, just doing it all intra-connection, like gerrit currently is, is fine	17:51
jeblair	then we'll build the pipeline superstructure for this and move it into it	17:51
mordred	jeblair: on the zuul-sphinx repo - as a followup patch you should probably set basepython to python3 so that we don't accidentally make a sphinx plugin that would break in zuul's docs build	17:52
jeblair	mordred: ++	17:53
mordred	jlk: so that means you should be able to just do a string substitution with the connection's hostname for now, yeah?	17:54
jeblair	mordred: yep	17:55
jlk	mordred: right, it means the regex we look at the commit messages for, and the search string we hit github for should both be hostname specific to the connection	17:55
* jeblair realizes he is not jlk but answers anyway		17:55
jlk	replace "github.com" with connection.hostname or connection.canonical_hostname	17:56
jeblair	probably .hostname i think	17:56
mordred	yah	17:56
jlk	oh, for github driver that appears to be "git_host" config entry	18:01
jlk	oh screw you github search. "depends-on" should not be matching against "depends" "on"	18:03
jeblair	wow. i mean, they're searching computer programs. it's not like multi-word tokens are rare....	18:04
jlk	they have a method for doing multi word too, they say quote the pair, like "computer program" to match both, instead of computer program to match either	18:05
jlk	but it's taking the - and making it a space	18:05
clarkb	it should treat the other end the same way though	18:06
jlk	crap, this search might be only matching commits on master branches.	18:09
jeblair	jlk: this may end up being slightly more like the gerrit query than would otherwise be ideal then. "depends" "on" "https://github.com/url". then filter locally to get actual "depends-on: url".	18:09
jeblair	what, really?	18:09
jlk	yeah I might have to. I'm seeing some results of people putting "depends on" in the commit message, not our style.	18:09
mordred	jlk: I can't get github to return matches for https://github.com/j2sol/z8s-sandbox/pull/1	18:10
jlk	mordred: that's what I'm discovering too :(	18:10
jeblair	mordred, jlk: do you have a curl command you can paste me?	18:11
jlk	We're using the search interface I think	18:11
mordred	yah - just using the search box in the web ui	18:11
jeblair	probably worth trying the api in case there's a behavior difference	18:11
jlk	"depends on:" https://github.com/j2sol	18:12
jlk	at https://github.com/search	18:12
jlk	yeah I'll get into the API in a minute	18:12
mordred	jeblair: fwiw, https://github.com/j2sol/z8s-sandbox/pull/2 is the pull request that contains a commit with depends-on	18:12
jeblair	curl -H Accept:application/vnd.github.cloak-preview https://api.github.com/search/commits?q=repo:octocat/Spoon-Knife+css	18:13
jeblair	okay, that's their docs example	18:13
mordred	curl -H Accept:application/vnd.github.cloak-preview https://api.github.com/search/commits?q=repo:j2sol/z8s-sandbox+Depends-On	18:14
mordred	gets me no hits	18:14
jeblair	yeah. "zuul" does. "dumb" does not.	18:15
jeblair	confirming the master-branch only behavior in the api.	18:15
jlk	that's going to be a bummer.	18:15
jlk	so...	18:15
jlk	what we could do, since this is mostly a convenience routine, is just look in our own internal cache of changes and the commit messages associated with those changes	18:16
jlk	(since I have a cache of all of them we've seen)	18:16
jeblair	jlk: yeah, that's an option.... though i'm also wondering if "do nothing" would be better. because using the cache means inconsistent behavior. at least "approved changes which depend on unapproved changes will need another approval after their dependencies merge" is explainable.	18:17
mordred	searching commits is wrong	18:17
mordred	curl -H Accept:application/vnd.github.cloak-preview https://api.github.com/search/issues?q=repo:j2sol/z8s-sandbox+"Depends-On:"	18:17
mordred	that works	18:17
jeblair	mordred: of course :)	18:18
jlk	whaaat?	18:18
mordred	pull requests are issues	18:18
jlk	how hte hhell	18:18
jlk	yeah, but	18:18
jlk	oh	18:18
mordred	if we're searching commits - we're asking it to search commits that are in the repo	18:18
jlk	You're finding the pull request message, not the commit message	18:18
mordred	pull reuqests aren't landed yet	18:18
jlk	pull requests have landed SOMEWHERE	18:18
jlk	rather the commit in question exists	18:18
mordred	they have - you just don't know where	18:18
jlk	it's just not on a master branch	18:18
jlk	you do know.	18:18
mordred	yah	18:18
jlk	er, you know via the PR details	18:19
jlk	I really wonder if GraphQL would save us here, but that's a lot of node walking	18:19
jlk	I lvoe that this limitation isn't documented	18:21
mordred	https://stackoverflow.com/questions/31891733/searching-code-in-a-specific-github-branch	18:24
jlk	yeah, that's the indication I got	18:25
mordred	https://stackoverflow.com/questions/43533851/search-for-commits-on-a-branch-other-than-master-branch-in-github "Only the master branch is indexed for search in github.com."	18:25
jeblair	jlk: in https://github.com/j2sol/z8s-sandbox/pull/2 is that first comment automatically generated, or did that come from your commit?	18:26
jeblair	basically, i'm wondering if it's that comment that matches mordred's search, rather than the commit, and what the implications are for that	18:26
mordred	jeblair: yes, I am certain it's that comment athat matches my search	18:27
jlk	jeblair: that's the description of the pull request	18:27
jlk	when opening a PR you can put in a subject, and a description (like a git commit). Often it'll be pre-populated from the git commit itself, if the PR only has one commit on it.	18:28
jlk	but they can be edited after the fact	18:28
jlk	so yeah, his search is finding the PR comment, not the commit message	18:28
jeblair	so if the pr had 3 commits and the depends-on was in the second header, we would likely not find it in a search?	18:28
mordred	that's correct	18:28
mordred	unless the PR author put it in their PR message	18:28
jeblair	welp. you may get your wish, mordred. :)	18:29
jlk	yeah, if the PR has 3 commits, the PR author usually gets a blank box for subject/description	18:29
mordred	jeblair: well, that's sad, because y'all had convinced me commits were better	18:29
jlk	yeah :(	18:30
mordred	but I agree	18:30
mordred	it seems like they're gonna need to go in the PR message if search is going to work	18:30
jeblair	this system is so black-boxy, we should probably test that case, just to confirm	18:30
jlk	oh I can totes do that on PR 1	18:30
jlk	See https://github.com/j2sol/z8s-sandbox/pull/1	18:32
jlk	I just added a second commit there, that has Depends-On in the commit message	18:32
jlk	(and even though the UX hides it, it does actually have the full string "Depends-On: https://github.com/j2sol/z8s-sandbox/pull/2"	18:33
mordred	jeblair: so - remind me again - the purpose of this bit is to be able to take action on changes that depended on a change once we take action on it, right?	18:33
jeblair	jlk: yep, i get no results for that :(	18:33
jlk	but now... watch	18:34
jeblair	mordred: yeah, so when B->A and you approve B, and then go on vacation and then come back and approve A, both of them land.	18:34
jlk	I added a comment with that string	18:34
jeblair	jlk: and i get a result now	18:34
jlk	yeah, so. hrm.	18:35
mordred	jeblair: so - sake of argument- the only values of B we're going to care about will be Bs that will have generated an approval event already at some point in the past, right?	18:36
jlk	If we do this off of PR comments, we just have to handle PR comment edit events, at least to update existing changes.	18:36
jlk	"an approval event" ?	18:36
jeblair	mordred: generally, yes. we can't rely on the cache though.	18:36
jlk	mordred: something that would match a pipeline/trigger requirement?	18:37
mordred	jeblair: right. I don't think the cache is the right idea, it would be something new	18:37
mordred	jlk: yah	18:37
jeblair	mordred: i'm all ears	18:37
mordred	jeblair: but just saying if we think commits are the correct way to do this, and we can't search for them, we could record pending dependencies for later reference	18:37
jlk	okay, yeah for github that'd be a review, a label, a status event	18:37
jlk	mordred: I think that fails if zuul is restarted between events.	18:38
jeblair	mordred: ah, more state; another database.	18:38
mordred	jeblair: not necessarily - it could be something we save/restore along side pipeline state	18:38
mordred	since it's essentially like a deferred entry for a pipeline	18:38
mordred	"this would be enqueued but can't yet be because it's waiting on X"	18:39
jlk	oh you'd want to stash it during graceful restarts?	18:39
jeblair	mordred: yeah, but it's also essentially unbounded in terms of size and time. i'm not sure "only save it during restarts" is that much different than "always save it" for something like that.	18:39
jeblair	(these are, specifically things we expect to stick around for days, if not months. unlike pipeline entries which stick around for minutes or hours)	18:40
mordred	jeblair: indeed.	18:40
jeblair	either way -- the main thrust is "persistent cache of this data", yeah?	18:41
mordred	yah, essentially	18:41
mordred	since it'sa strict subset of PRs we will have seen at some point in time	18:41
jeblair	i'd like to cipher on this for a bit... i propose we all mull it over for a while. perhaps we should defer implementation of reverse depends for a bit? unless jlk wants to proceed with the issue search just for fun, even though we may discard it...	18:43
jlk	well, issue search is essentially dead right now.	18:43
jlk	I can delay implementing the reverse depends.	18:43
mordred	I agree - I think forward depends gets us a good way forward	18:43
jlk	just leave the TODO here.	18:44
mordred	yah	18:44
jeblair	jlk: right, issue search is only viable if we decide to move headers to pr messages rather than commit msgs.	18:44
jlk	oh sorry yeah I see what you mean there. issue search vs commit search. Derp.	18:45
jeblair	so really, we have 2 related things to think about: issue description vs commit, and the memoization idea (that's a better word than persistent cache)	18:45
*** persia has quit IRC		18:47
* jlk off to lunch, and a kid's school function.		19:14
Shrews	jeblair: tristanC: i looked at the first few reviews for tristanC's nodepool abstraction changes. direction seems ok to me. i only worry about changes touching the request handling since that was so difficult to get solid	19:39
Shrews	and i can't quite see how other drivers will use (or not use) that, atm.	19:40
jeblair	Shrews: yeah, i think we'll want to try to minimize driver interaction with that. i confess it's not an area i have thought a lot about yet. it deserves more attention.	19:42
Shrews	jeblair: yeah. i mean, on the one hand, i don't want to see it just plopped into a currently solid system without thinking more about that. but on the other hand, i find it hard to think about that interaction without seeing it in action, so.... ¯\_(ツ)_/¯	19:44
clarkb	iirc one of my comments was that the request handling needs to be a base class with minimal amounts of changes in subclesses	19:50
Shrews	Would kind of be helpful to have a second/separate system running with the changes (and maybe even a static node or two) to observe for a bit	19:50
clarkb	a lot of that got refactored in the later changes, I think if we want to say the later changes are lower priority we will want ot push those improvements down into the static driver	19:50
mordred	Shrews, clarkb: it seems the main things that the openstack driver does that arent' in the base driver are quota and images	20:13
mordred	and then the stay-in-same-az portion of the algorithm	20:14
mordred	so - yah - we may want to think about what it means for those to exist or not exist in a given driver	20:14
clarkb	mordred: off of memory that is right. I think the biggest problem is that request handling and "cloud" drivering got mashed together fairly well in these new drivers. And really we want to have 99% of it pushed into the driver	20:15
clarkb	with just enough smarts in the handler to know which driver to dispatch to	20:15
clarkb	(which I left comments about in the changes, problem was that in a stack of ~6 changes things don't really congeal until the end but we are saying we only want the first change right now, so we want to rebase/refactor to get that congealed stuff into the static driver too)	20:16
mordred	clarkb: yah - that's fair	20:16
mordred	clarkb: I do feel like I'm learning a good deal about the surface area from going through them though, so that's good	20:16
clarkb	I think if we do that and make the separate clear then request handling is a non issue	20:16
clarkb	basically its only an issue because he interfaces eveolve through the set of things we don't want immediately	20:17
fungi	jeblair: jlk: mordred: depends-on in gerrit, being in the commit message, means you automatically get a new commit whenever adding/changing/removing one... if we relied on pr description fields in gh, which can be updated at any point, i suppose we have to re-evaluate them via the api every time we want to use them in case they get removed in a subsequent edit, right?	20:27
jlk	fungi: correct. We'd have to respond to comment edits and cause the in-memory change to get updated with current details.	20:31
jlk	arguably we might need to do that in case somebody typos a "recheck" comment and edits to fix.	20:32
openstackgerrit	Merged openstack-infra/zuul feature/zuulv3: Stop supporting python2 https://review.openstack.org/475009	20:41
pabelanger	zuul-ca should be coming online in next 5mins after puppet run	20:41
pabelanger	I'll generate ssl certs then	20:41
openstackgerrit	Monty Taylor proposed openstack-infra/zuul feature/zuulv3: Add node to README about Python3 https://review.openstack.org/475070	20:42
openstackgerrit	Monty Taylor proposed openstack-infra/zuul feature/zuulv3: Add ROADMAP.rst file https://review.openstack.org/475071	20:42
openstackgerrit	Monty Taylor proposed openstack-infra/zuul feature/zuulv3: Add note to README about Python3 https://review.openstack.org/475070	20:43
openstackgerrit	Monty Taylor proposed openstack-infra/zuul feature/zuulv3: Add ROADMAP.rst file https://review.openstack.org/475071	20:43
openstackgerrit	Merged openstack-infra/zuul-sphinx master: Initial commit https://review.openstack.org/475041	21:11
*** jkilpatr has quit IRC		21:31
openstackgerrit	James E. Blair proposed openstack-infra/zuul feature/zuulv3: Add redhat-rpm-config to rpm build deps https://review.openstack.org/475082	22:31
jeblair	SpamapS, mordred, pabelanger: does this cover what we discussed yesterday about control persist: https://storyboard.openstack.org/#!/story/2001072	23:01
mordred	jeblair: yes	23:02
pabelanger	++	23:10
openstackgerrit	Jesse Keating proposed openstack-infra/zuul feature/zuulv3: Implement Depends-On for github https://review.openstack.org/474401	23:15
jlk	jeblair: jamielennox: SpamapS: mordred: ^^^ is ready for real review.	23:16
*** SotK_ has joined #zuul		23:21
* SpamapS reads		23:21
mordred	jlk: zuul is angry	23:22
jlk	ffffff.	23:22
jlk	looking	23:22
jlk	oh bother. Okay, I can't use self. in the regex that way, because self doesn't exist at that point. grr.	23:23
jlk	Should I move that out of the class def and into __init__, or should I define the string with the blank spot and do the variable substitution at use time?	23:24
*** mmedvede_ has joined #zuul		23:25
*** mmedvede has quit IRC		23:26
*** SotK has quit IRC		23:26
*** mmedvede_ is now known as mmedvede		23:26
mordred	jlk: I'd just move it into the __init__	23:27
*** toabctl has quit IRC		23:28
*** toabctl has joined #zuul		23:30
openstackgerrit	Jesse Keating proposed openstack-infra/zuul feature/zuulv3: Implement Depends-On for github https://review.openstack.org/474401	23:31
jlk	done.	23:31

Generated by irclog2html.py 2.15.3 by Marius Gedminas - find it at mg.pov.lt!