Friday, 2017-06-16

clarkbI just retested and ps says it doesn't use the venv python00:01
clarkbI ran 'ansibleenv/bin/ansible localhost -i invetory -m shell -a "sleep 100"00:01
mordredjamielennox: biggest tl;dr is that the command module sync was breaking jobs - dont' know _why_ yet - but with the sync patches things break WEIRDLY and without them they don't00:01
clarkbthen ps -elf | grep asnible it shows: /usr/bin/python /home/clark/.ansible/tmp/ansible-tmp-1497571250.32-216375718117125/command.py00:02
mordredjamielennox: so short-term fix is revert - and then we can dig in to figure out what specifically is broken00:02
jamielennoxmordred: interesting, do you have an example of weird failure00:03
clarkband it actually forks an sh to run the python to run the ansible. interesting00:04
mordredjamielennox: yah - running the tox-pep8 job from openstack-zuul-jobs breaks - one sec00:04
mordredjamielennox: http://paste.openstack.org/show/612772/00:09
mordredjamielennox: it's super gross to read00:10
mordredoh - I hav ea better log of that00:10
mordredjamielennox: http://paste.openstack.org/show/612773/00:10
mordredjamielennox: I'm running this playbook: http://paste.openstack.org/show/612774/00:11
mordredjamielennox: with and ansible config set up to point to the zuul stuff in my local source directory00:12
mordredthat failure happens with the command stuff applied00:12
jamielennoxmordred: wow, the command sync causes that? that is weird00:12
mordredjamielennox: RIGHT????00:12
mordredjamielennox: it has us completely baffled00:12
SpamapSI'm lacking context00:12
*** jkilpatr has quit IRC00:12
SpamapSwhat is "the command sync" ?00:12
jamielennoxSpamapS: so to make zuul run in python 3 i updated the "command.py" zuul forked to current ansible00:13
jamielennoxit only matters when you delegate_to: localhost, because then you run in the same python that ansible is running in00:13
mordredSpamapS: https://review.openstack.org/474808 contains a revert of the appropriate files00:14
openstackgerritMerged openstack-infra/zuul feature/zuulv3: Revert sync from latest command from ansible  https://review.openstack.org/47480800:14
jamielennoxbut it should have been a fairly simple patch applying some str/bytes handling00:14
mordredyup. I agree looking at it00:14
mordredso I can't WAIT to know what's broken00:14
mordredbut I must now eat dinner00:14
openstackgerritMerged openstack-infra/zuul feature/zuulv3: Use python2 for ansible on remote hosts  https://review.openstack.org/47481000:14
SpamapSehhh00:16
SpamapShttps://review.openstack.org/#/c/474808/1/zuul/ansible/library/command.py00:16
SpamapSline 36700:16
SpamapSam I crazy, or are we popening twice?00:16
SpamapSor is that just the reverse diff baking my brain?00:18
jamielennoxSpamapS: ... no you would appear to be reading that correctly ....00:19
SpamapSyep00:19
SpamapSmordred: ^^^00:19
SpamapSI think we can unrevert ;)00:19
jamielennoxwhich, lol, would probably be mordred's issue because he's running tox twice in quick succession and so it'd check for a venv, find nothing, try and create one and fail when it exists00:20
jamielennoxhmm, i wonder if that solves mine as well00:20
SpamapStwice, in quick succession, in parallel00:20
jamielennoxright00:20
jamielennoxSpamapS: i feel you should get the fixup honours there00:21
jamielennoxbecause i looked at that file a fair bit and didn't see that00:21
openstackgerritClint 'SpamapS' Byrum proposed openstack-infra/zuul feature/zuulv3: Revert "Revert sync from latest command from ansible"  https://review.openstack.org/47481400:21
SpamapSjamielennox: ^ done :)00:21
SpamapSjamielennox: chalk it up to me taking a few days off from zuul'ing, allowing me to come back in with less assumption brain and more "wtf is going on" brain :)00:22
jamielennoxthere is now a crazy amount of undoing debugging i have to do to even test that properly00:23
SpamapSjamielennox: ohmy00:23
jlkoh neat...00:23
jlkI'm getting stumped by the system. Trying to make org/project2 depend on a change in org/project1, but enqueuing isn't working because they don't share a queue.  Oops.00:24
pabelangerokay, so that does make sense. Because I think that is why apt was failing about getting a lock too00:26
pabelangerI thought it was getting run twice, but was not able to actually see it happening00:26
jlkWhen something gets added to the queue, but live=False, what does that mean for test runs? It seems that the added job never actually runs.00:28
pabelangerclarkb: guess we figured out lock issue too, ^00:29
SpamapSjlk: hm00:35
jlkthe non-matching queue behavior is right00:35
SpamapSjlk: I may be thinking too high level here, but I'd expect things with no shared queue to still be able to depend on one another.00:35
jlkas a writer of project2, I want to depend on a change in-flight in project1, but I don't contribute to project1, and zuul doesn't really report on project1, so make _use_ of the content in the git dir, but don't report on the change.00:36
jlkyeah they do00:36
jlkSpamapS: you can depend on it for the content in the git repo when you test (for independent pipelines), and you can depend on it being merged (and the content) for dependent pipelines.00:40
jlkif they _share_ a queue, then an event on project2 that depends on a pr in project1 will cause the pr in project1 to become enqueued ahead and run through the pipeline as well.00:41
jlkit's a twisty path :/00:41
jlkmostly I'm trying to figure out where the logic branches are and how to properly test them.00:41
openstackgerritJesse Keating proposed openstack-infra/zuul feature/zuulv3: Implement Depends-On for github [wip]  https://review.openstack.org/47440100:56
jlkokay got one test minimally working00:56
openstackgerritTristan Cacqueray proposed openstack-infra/zuul feature/zuulv3: executor: run trusted playbook in a bubblewrap  https://review.openstack.org/47446001:15
openstackgerritTristan Cacqueray proposed openstack-infra/zuul feature/zuulv3: bubblewrap: adds --die-with-parent option  https://review.openstack.org/47316401:15
openstackgerritTristan Cacqueray proposed openstack-infra/zuul feature/zuulv3: config: refactor config get default  https://review.openstack.org/47448401:15
mordredjamielennox, SpamapS: great find! I have verified that it works02:19
openstackgerritMonty Taylor proposed openstack-infra/zuul feature/zuulv3: Fix unicode encoding gotcha in streamer  https://review.openstack.org/47482702:21
mordredjamielennox, SpamapS: I also noticed that while debugging the other thing02:22
openstackgerritTristan Cacqueray proposed openstack-infra/zuul feature/zuulv3: config: refactor config get default  https://review.openstack.org/47448402:25
openstackgerritTristan Cacqueray proposed openstack-infra/zuul feature/zuulv3: executor: add support for custom ansible_port  https://review.openstack.org/46871005:08
openstackgerritTristan Cacqueray proposed openstack-infra/nodepool feature/zuulv3: Add support for custom ssh port  https://review.openstack.org/46875205:08
openstackgerritTristan Cacqueray proposed openstack-infra/nodepool feature/zuulv3: Abstract Nodepool request handling code  https://review.openstack.org/46874805:08
openstackgerritTristan Cacqueray proposed openstack-infra/nodepool feature/zuulv3: Abstract Nodepool provider management code  https://review.openstack.org/46874905:09
openstackgerritTristan Cacqueray proposed openstack-infra/nodepool feature/zuulv3: Collect request handling implementation in an OpenStack driver  https://review.openstack.org/46875005:15
openstackgerritTristan Cacqueray proposed openstack-infra/nodepool feature/zuulv3: Extend Nodepool configuration syntax to support multiple drivers  https://review.openstack.org/46875105:16
openstackgerritTristan Cacqueray proposed openstack-infra/nodepool feature/zuulv3: Implement a static driver for Nodepool  https://review.openstack.org/46862405:16
openstackgerritJesse Keating proposed openstack-infra/zuul feature/zuulv3: Implement Depends-On for github [wip]  https://review.openstack.org/47440105:32
openstackgerritJesse Keating proposed openstack-infra/zuul feature/zuulv3: Implement pipeline requirement of github labels  https://review.openstack.org/47396205:41
openstackgerritTristan Cacqueray proposed openstack-infra/zuul feature/zuulv3: Add support for zuul.d configuration split  https://review.openstack.org/47376406:02
*** hashar has joined #zuul07:21
*** hashar_ has joined #zuul09:58
*** hashar has quit IRC09:59
*** hashar_ is now known as hashar10:43
*** jkilpatr has joined #zuul10:59
openstackgerritTristan Cacqueray proposed openstack-infra/nodepool feature/zuulv3: Implement an OpenContainer driver  https://review.openstack.org/46875311:07
*** hashar has quit IRC11:43
*** hashar has joined #zuul12:08
*** hashar has quit IRC12:29
*** hashar has joined #zuul12:51
openstackgerritMerged openstack-infra/zuul feature/zuulv3: Remove item.layout  https://review.openstack.org/47418813:25
openstackgerritMerged openstack-infra/zuul feature/zuulv3: Move dependency cycle detection into pipelines  https://review.openstack.org/45142313:27
openstackgerritMerged openstack-infra/zuul feature/zuulv3: Revert "Revert sync from latest command from ansible"  https://review.openstack.org/47481413:31
mordredtristanC: just left a longish feedback on the trusted execution wrapper patch - we should probably make sure jeblair agrees or disagrees with what I said before you do anything about it though13:53
mordredtristanC: also - while you're looking at executor related restrictions - there's an in-progress patch that could use being picked up and finished (that I imagine would be a useful feature for SF too)13:54
mordredtristanC: https://review.openstack.org/#/c/463022/ - which is about adding disk space limit protections13:55
tristanCmordred: thanks, i'll update the review accordingly14:00
tristanCmordred: and regarding further restriction, i still need to wrap my head around the potencial issues with the executor...14:05
tristanCI don't get how a job could use disk space on the executor node, beside through stdout14:06
pabelangerYa, expsoing a r/w to bubblewrap make me feel tingly. Not sure why. Seems like we should just have job use SSH, and they could either delegate logs to localhost or another server. Then we don't need to toggle bind mounts in bwrap14:16
tristanCpabelanger: but if a job can ssh localhost, then what's the point to run the ssh process in bwrap?14:17
tristanCpabelanger: also that implies zuul can ssh localhost in the first place14:18
tristanCmordred: regarding 463022, I wonder if using a watchdog for diskusage is the way to go, then how about for iops and cpu usage?14:22
pabelangertristanC: So, what I am planning on doing today is add our ssh key to logs.o.o into a trusted job, and that job will then have ssh access to the remote host. You could do the same for localhost. Only that job has access to the key14:23
pabelangeryes, there would have to be a step to setup authorize keys14:24
pabelangerlike we;d do with remote logging server today14:24
pabelangerboth jobs are dependent on existing configuration14:24
tristanCpabelanger: oh well, that would work too, but that's not what the base/post is currently doing14:25
pabelangerI think blindly bind mounting any trusted job to /opt/zuul-logs (in your example) would provide less control.  In the case of SSH, it depends on ssh key within said job.14:26
pabelangertristanC: right, we delete that14:26
pabelangertristanC: it was just a hack to get logs on zuulv3-dev.o.o14:26
mordredtristanC: yah - the disk was the first thing we were worried about because the executor gets used as a log transfer location and we've had jobs in openstack accidentally start transferring huge amounts of logs14:27
pabelangerand reason we haven't done it yet, was didn't want to add ssh key for logs.o.o to executor until we knew things were starting to get secure.14:27
pabelangernext step is SSL for gearman, hope to do that today14:27
mordredtristanC: for the iops/cpu- I think our original thinking was that those are things that can actualy be restricted at the bubblewrap layer14:27
mordredbut it turns out total disk space used is a bit harder14:28
pabelangerhttps://review.openstack.org/#/c/474726/ removes copy to /opt/zuul-logs too14:28
tristanCpabelanger: well I don't mind removing extra bind in trusted wrapper after all, but giving zuul ssh access to localhost feels wrong to me14:31
pabelangerPersonally, I don't think jobs sould be writing to localhost on executor. I think it should be considered ephemeral by default. That way it makes scaling out much easier14:33
mordredpabelanger, clarkb: when we were talking about limiting disk usage at the PTG - is there a reason we decided against using cgroups to limit total bytes written? I mean, I know it's possible that someone could hit that by writing and deleting a ton of small files, but was there a different reason?14:34
* mordred reads scrollback between tristanC and pabelanger14:34
pabelangermordred: I'll have to defer to clarkb on cgroups, I cannot remember14:35
tristanCmordred: I can't tell if bubblewrap will accept such feature, on the other hand runc would provide all the restriction available in Linux...14:36
mordredtristanC: yah - the reason we didn't go with runc is that it requires root to run - and we were trying to avoid making the executor need to run as root14:37
mordredtristanC: but thanks - that's probably the reason we also didn't use cgroups for disk usage limitation14:37
jeblairmordred, tristanC, pabelanger: i'll go look at the bubblewrap patch now...14:39
mordredpabelanger: I think we should have a broader conversation about localhost writing/execution and ssh to localhost- I believe there are strong opinions on both sides of that one (I do not believe I'm one of the folks with a strong opinion)14:39
pabelangerIIRC, we did discuss using LVM and carving out slices for jobs14:39
mordredjeblair: thanks!14:39
pabelangerbut not sure where we ended up with that14:39
jeblairmordred: but while i'm doing that, maybe it's my morning brain, but why is https://review.openstack.org/474827 necessary?14:39
mordredpabelanger: yah - I think the problem we ended up with there was efficiency14:39
mordredjeblair: when I was testing the execution yesterday without that I would trigger an exception about inability to encode a byte value14:40
jeblairmordred: also, requires lvm.14:40
mordredjeblair: yah14:40
mordredand root14:40
mordredbut has the same problem as bind-mounting files - you wind up allocating the max disk to each job14:40
mordredwhen most jobs will use FAR less than the max disk14:40
pabelangerya14:41
jeblairmordred: i don't understand the u'' thing.  i thought with py3.5, we did not need that...14:41
mordredjeblair: I believe it might just be because I was running with python2 ansible-playbook14:41
mordredjeblair: also - donm't know if you saw in scrollback, but SpamapS and jamielennox found the actual bug in the reverted patch14:46
mordredjeblair: which was a duplicate popen call14:46
jeblairmordred: yup, nice.  sorry you were deprived of your morning debugging :)14:46
mordredbah. I have plenty of morning debugging available14:46
jeblairpabelanger: i think we're thinking along the same lines, in wanting to restrict the trusted wrapper as much as possible, but i'm not quite as far along as you...14:46
jeblairpabelanger: one of the things i'm considering is afs14:47
jeblairpabelanger: without bubblewrap, we can obviously have trusted jobs work directly in afs running on the executor host14:47
mordredtristanC: oh- I also left some comments on your nodepool provider interface static nodes stack - I think for the static nodes one it'll be really useful for us to shove a mockup of the config into an etherpad14:48
jeblairpabelanger: with bubblewrap, but configurable read-write dirs, we might be able to bind mount afs into the bubblewrap?  (mordred -- do you think this will work?)14:48
tristanCmordred: thanks you for digging through those! I'll add snippet in the good.yaml14:48
mordredjeblair: I don't think it should be a problem - that said, we should probably check real quick, because there is some amount of interaction with the kernel module and the krb creds needed14:49
jeblairpabelanger: we could, as you suggest, ssh into localhost instead.  that seems like we're working around the design of the system.  but maybe it's okay for this.  afs is weird after all.14:50
mordredjeblair: that said - if we took pabelanger's approach and had writing to privileged places on localhost actualy be a thing that happened over ssh instead of bind-mounts - then we could have creds for an AFS-enabled account on localhost, scp things and not have to worry about interactions between bubblewrap and afs/krb14:51
jeblairi read pabelanger's suggestion as essentially: the only difference between trusted and untrusted should be the ansible module restriction.  i do think that's workable if you have a 'ssh to localhost' option.14:51
mordredyah14:51
jeblairmordred: yeah, a write to afs job will need 2 creds -- an ssh key for the executor and a krb password14:52
mordredjeblair: also- if we have someone running zuul in k8s, the path to afs would likely be running a container with a kvm process in it that ran an afs client and ssh-ing in to that for afs operations14:52
mordredjeblair: yup14:52
jeblairmordred: i can get behind the pabelanger plan.  do you have any reservations?14:53
mordredjeblair: I don't. I konw SpamapS was originally concerned about limiting network access from untrusted jobs to localhost and we assuaged that concern by noting that permissions would not allow that ssh anyway14:53
pabelangerjeblair: ya, that is a good summary.14:54
mordredjeblair: I think that's still taken care of in this model14:54
mordredsince the ssh key to be able to ssh to localhost would be a secret given to a trusted job14:54
jeblairmordred: yep -- "just don't do that" -- don't add an ssh key for localhost :)14:54
mordredso the untrusted would still not be able to14:54
mordredexactly14:54
jeblairmordred: the other "use case" would be the idea of bypassing the secrets system to get a "secret".  in other words, having a trusted job just use /etc/my-secret-ssh-key to do something.14:55
jeblairwithout extra access, there's no back-channel way of getting data to jobs.  that's probably okay, we'd just be all-in on zuul secrets.14:56
mordredjeblair: you know - tristanC's thing adding the extra mapping isn't a ton of code, so we can always add it later14:56
mordredjeblair: so maybe let's start with all-in-on-secrets?14:56
mordredjeblair: also - we COULD still add the capability but just not use it ourselves14:57
mordredjeblair: theuse case you mention could be nice for people who want to run a local zuul on their laptop14:57
jeblairmordred: true.14:57
jeblairit's a bit coarse grained though, and once you start thinking like that, you'd want it per-job as pabelanger points out, and that gets tricky... :)14:58
mordredwell - I think if the general case's answer is "use secrets and ssh to localhost"14:59
mordredthen the coarse grain isn't a problem14:59
mordredit's really an admin get-out-of-jail card for anything that's TRULY globally needed14:59
mordredif you need a thing per-job, you _must_ go through secrets and ssh14:59
jeblairya.15:00
jeblairtristanC also pointed out that we use that trick in the unit tests right now.  though, we could probably work around it.15:02
tristanCwell it's saturday here, i'll update the change with the solution you come up with :-)15:05
jeblairtristanC: we'll leave a good comment, thanks!  happy saturdaying!15:05
jeblairmordred: so are we now thinking we should go with the mordred plan?15:06
tristanCthank you, have a good day!15:06
mordredjeblair: the pabelanger plan you mean?15:07
pabelangerI'm working on the sercret / ssh plan today, I think that's what we decided to do for logs.o.o publishing15:08
mordredjeblair: and yah - I think it makes the most sense to me and keeps things in ansible-remote language15:08
jeblairmordred: well, you were just saying we could go ahead and add the bind-mount options... which is more leaning toward your suggestion on what tristan already has, right?15:09
jeblairmordred: the pabelanger plan, at that point, would just be not to use those options.  :)15:09
mordredjeblair: oh - yes. I think we should add the bind-mount option that tristanC has written, and then for infra to do the pabelanger plan which is to not use those options15:09
jeblairpabelanger: that work for you ^ ?15:10
mordredjeblair: (it would also let us continue to use them as a hack for unit-tests for the things where the interaction chain gets kind of deep and where we really could do with a multi-node integration test but aren't there yet)15:10
pabelangerjeblair: Sure, I wouldn't expect us to use that setting in openstack-infra like mordred said.15:12
jeblairmordred: yeah, if we think we're going to want it at all, i figure we should do it now so we don't have to rework the unit tests.15:12
pabelangerI also have no idea how you would protect any trusted job from using that mount15:12
jeblairpabelanger: you wouldn't -- it's trusted.  :)15:13
pabelangerRight :)15:13
pabelangerokay, are we okay to bring nl01.o.o back online and restart ze01.o.o to pick up our double popen fix?15:14
pabelangerAlso, it would be great if we could do SSL on gearman today15:14
jeblairpabelanger: ++.  do you mind doing that while i breakfast?15:14
pabelangergoing to ping openstack-infra people15:14
mordredpabelanger: ++15:14
pabelangerjeblair: sure, np15:14
mordredpabelanger: anything you need that I can help unblock?15:14
pabelangermordred: SSL ca right now is the blocker, via openstack-infra15:15
pabelangernl01.o.o and ze01.o.o restarted15:18
pabelangerokay, ze01.o.o started now. We seem to be leaking out executor pid on shutdown, which causes the service to fail to start.15:25
pabelangeron my list to fix15:25
jlkso much scrollback15:28
openstackgerritMonty Taylor proposed openstack-infra/zuul feature/zuulv3: Stop supporting python2  https://review.openstack.org/47500915:29
jlkIf anybody has time, I'd like a peek at https://review.openstack.org/474401 to make sure I'm going in the right direction.15:30
mordredjeblair, pabelanger, jlk, Shrews, SpamapS, clarkb, fungi: ^^ patch here and in project-config for killing py2 support. testing the websockets work is extra difficult keeping support for 2 because of syntax changes even with test skips. let's go ahead and pull the plug15:31
mordredjlk: looking15:31
mordredtristanC, jamielennox: ^^ sorry, missed you in the ping line15:32
fungieverything's working well on py3k at this point though15:32
fungi?15:32
jlkoooh I'll have to update my docker stuffs15:32
jlkbut I'm cool w/ that.15:32
Shrewsi whole-heartedly support killing py2 for the reasons15:32
pabelanger+215:33
mordredfungi: yup15:33
pabelangeralso, zuulv3 SO much more stable this morning: http://zuulv3.openstack.org/15:33
pabelangeryay15:33
jeblairmordred: in tristanC's patch, why is state_dir involved?15:34
fungimordred: pabelanger: are we still building the zuul-cloner venv w/ python2.7 in dib elements?15:34
mordredfungi: we are - but we don't need the zuul-cloner venv in zuul v315:35
mordredjeblair: because state_dir has to get passed to the rw list for bubblewrap - so it's just getting added to the rw map so that there is one mechanism not two15:35
jeblairand master branch should stay v2 compatible and tested until we merge v315:35
mordredjeblair: ++15:35
Shrewsto mordred's point about ws testing, our chosen implementation (autobahn) uses asyncio. to test it, i need to create a client within the unit test, but doing so is hard when py2 barfs on any asyncio-isms15:35
pabelangerfungi: mordred: I can confirm I haven't changed our virtualenv for DIB15:35
jeblairmordred: i don't see that happening though -- all i see is that it's *added* to what's already there15:36
fungimordred: good point, jobs no longer invoke zuul-cloner, they declare the repos they need instead15:36
jlkOooh dep cycle detection in pipeline merged, I'll need to rebase my github depends-on15:36
mordredjeblair: ah - well, we should make sure that what I said happens - or that state_dir doesn't get added to the mapping15:36
fungi(which will get tricky, we have some jobs righth now that decide at runtime whether to install source via z-c checkout or from pypi depending on what change is being tested)15:37
jeblairjlk: yeah, it looked correct to me -- there's still follow-on things we'll want to do, like make a driver api method to find cross-connection dependencies, and also report cyclic deps, but it's a good base.15:37
mordredfungi: I think that'll actually get easier for them ...15:37
mordredfungi: the git repos will all be there at the start of their job, but their logic can just be "use the git repo that's there or install from pip"15:38
fungimordred: oh, yep i guess they can still decide that at runtime regardless, they can just ignore the provided checkouts if they want15:38
jeblairmordred, fungi: ++15:38
jlkjeblair: okay. I need to sprinkle in a few TODOs, implement the needed-by detection, and for real implement the canMerge method for github ( jamielennox  has a patch up to help with that). I hope to have this ready for real review today.15:38
jeblairfungi: it makes the related use case, "make this cross-repo unit test work on a dev workstation and in the gate" easier too, because *both* of those cases are likely to have local git repos.  :)15:38
mordredfungi: and since the git repo will actually be checked out to the right ref - they can just do things like "pip install {{ zuul_workspace_root }}/src/git.openstack.org/openstack/repo" and not clone at all15:39
fungitrue, true15:39
fungimordred: looks like the docs job may not be py3k-ready. looking through the failure logs now15:39
mordredjeblair, jlk: looking at the gh patches - sake of argument real quick - is Depends-On in the commit the right model? why not Depends-On: in the PR message?15:40
mordredfungi: neat! that'll be fun15:40
fungiheh15:40
fungisphinx.errors.SphinxWarning: WARNING: Unexpected re15:40
fungiturn code 1 from command 'zuul --help'15:40
jeblairmordred: for one: so it shows up in the commit log15:40
* fungi grumbles at stray newline in his paste15:40
mordredjeblair: PR message is what's used as the commit message of the merge commit typicaly15:40
fungii agree with sphinx, zuul --help should not exit 115:41
mordredjeblair: although, I suppose if there is no merge commit, then you'd lose it15:41
jeblairmordred: both good points.  :)15:41
jlkmordred: couple reasons. 1) PR message can be edited, and we're not currently handling _anything_ about the PR message. 2) it shows up in the commit log of git, where as the PR message does not.15:41
jlkand it also mirrors gerrit which uses the commit message15:41
jeblairmordred: another point in favor of commit: it's more specific in a multi-commit branch15:41
jeblairlike "this commit specifically needs this other thing"15:42
mordredyah. those are good points - let's just keep our eye on it as we use it to see if it starts to feel weird15:42
jeblairmordred: yep.  certainly a legit question.15:42
jlktotes15:43
jlkWe could _add_ looking at the PR message later.15:43
jlkbut I think we should keep the commit message method15:43
mordredkk15:43
jlkThanks for the feedback. I'm afk for a little bit, then back at it.15:46
SpamapS2017-06-16 15:33:49.160236 | sphinx.errors.SphinxWarning: WARNING: Unexpected return code 1 from command 'zuul --help'15:47
mordredthat seems to be a legit error15:48
SpamapSLike, zuul --help doesn't work on py3 now?15:49
SpamapSAttributeError: 'dict_keys' object has no attribute 'append'15:49
SpamapSyep15:49
jeblairweird, it works on zl0115:50
SpamapSmordred: you working on a patch?15:50
mordredyah15:50
mordredit's the dict_keys thing15:50
SpamapSVery obvious py3 problem15:50
SpamapS    choices=self._show_running_jobs_columns().keys().append('ALL'),15:50
SpamapSdurn views15:50
openstackgerritMonty Taylor proposed openstack-infra/zuul feature/zuulv3: Stop supporting python2  https://review.openstack.org/47500915:51
mordredfungi: ^^15:51
fungimordred: okay that's weird... what happened to patchset #2 for that change?15:53
SpamapSit went to plaid15:53
mordredfungi: it was a py3 bug in the zuul command line client15:53
fungidid you abort a push partway or something, maybe?15:53
mordredoh - yah15:53
mordredI realized halfway-through that I shoujld updated the commit message15:54
fungithe sad sad state of gerrit data integrity15:54
jeblairthere's not not a patchset 2...15:54
fungimy point exactly15:55
fungiseems like we've seen that behavior before too. you can abort mid-push and gerrit doesn't unroll whatever transactions it started in reserving that patchset, so skips to the next one because it considers it used or something i guess?15:56
jeblairoops, my bad, it *is* broken on ze01.  i was on zl01 which is py2.  :)15:56
pabelangerhttps://review.openstack.org/#/c/474321/ is an easier view, fixes deprecation warnings on ze01.o.o for our tox/pre.yaml file15:57
pabelangerreview*15:57
mordredpabelanger: can you explain what gather_subset: "!all"  does? (the patch looks fine- but now I'm curious)15:59
pabelangermordred: clarkb recommended it for a minimal set of facts to be gathered15:59
pabelangerto help speed things up a little15:59
mordredah - so that gathers basic facts but nothing additional15:59
mordredcool15:59
pabelangerYa, never knew about it until clarkb suggested it. And so far seems to be what we need for our usage of facts16:00
mordred++16:10
clarkbya fact gathering can be quite slow16:26
*** hashar has quit IRC16:29
openstackgerritPaul Belanger proposed openstack-infra/zuul feature/zuulv3: Add known_hosts to bubblewrap jobir.work_root  https://review.openstack.org/47310016:31
pabelangermordred: SpamapS: just a rebase, but maybe jeblair would also like to review^16:31
jeblairpabelanger: lgtm!16:40
openstackgerritMerged openstack-infra/zuul feature/zuulv3: Add known_hosts to bubblewrap jobir.work_root  https://review.openstack.org/47310016:51
*** jamielennox has quit IRC17:03
*** jamielennox has joined #zuul17:03
*** jlk has quit IRC17:23
*** jlk has joined #zuul17:23
*** jlk has quit IRC17:24
*** jlk has joined #zuul17:24
jlkhrm, I need a wee bit of help on this depends-on change again. I'm looking at _getNeededByFromCommit() in gerritconnection.py, I'm trying ot understand what the query is doing here. It's looking for any commit that has in the change ID anywhere in it, and then running the regex locally on the results?17:26
jeblairjlk: yes, that was due to a limitation in gerrit searching (we couldn't search for "Depends-On: changeid")17:27
jeblairjlk: but we could search for the sha, so we get all those changes, then reduce them to actual depends-on.  the set we get back from gerrit will, for example, also include the change itself since it has its own change id in its own commit message.17:28
jlkso for github, I could search for any commit that has 'Depends-On' in the message, then check if any PR has that commit in it, and check if the PR is still open, and add it to the needed by list.17:29
jeblairjlk: sounds right17:30
openstackgerritJames E. Blair proposed openstack-infra/zuul-sphinx master: Initial commit  https://review.openstack.org/47504117:38
jlkoh light bulb. I'm searching for something that depends on this specific change, not just any change. I could search for "depends-on:" and "org/project/pull/number" to really narrow down results.17:42
jeblairjlk: yes; i think you should do that with github.  you should in most cases end up with only exactly what you need.17:45
jeblairjlk: i think we need to support url depends on17:46
jlkyeah, the regex in my wip uses the full URL. e.g.: https://github.com/org/repo/pull/233217:46
jeblairok cool17:46
jlkbut I was going to ask for help with the regex, as we probably should support more than just "github.com", for github enterprise, and for people who do https://www.github.com/17:47
jeblairi don't think we've achieved consensus in the email thread on whether we should *also* support github-local syntax... i was going to resume that after ansiblefest.  but i do think we need to at least support the full url, so we may as well start with that.17:47
jlkto start with, should we only look for deps within a given connection?  like when dealing with a change in Connection A, the regex should use connection A's hostname to look in?17:48
jeblair(and hey, if i'm wrong about that, it'll be easy to change)17:48
jeblairjlk: well, the thing that *ultimately* needs to happen is for the pipeline manager to drive the process, rather than the connection.17:48
jlkagreed, starting with the URL makes sense. It's easier to copy/paste and github itself will translate the full URL into a relative url within the UI17:49
jeblairjlk: so, the pipeline manager would ask each connection "hey, does this URL look like something you can handle?  ie, is this your hostname?"17:49
jlkright, okay, so it sounds like yes, the regex SHOULD be connection hostname specific17:49
jeblairjlk: then if the connection says yes, the connection performs the query and the $something (pipeline manager?  something in a driver? whatever it takes) should update the change dependencies17:50
jeblairjlk: so yeah, sorry if i went too far down the road... :)17:50
jlkall good17:50
jeblairjlk: i think now and later the actual query will be connection specific, so that's safe.17:50
jeblairjlk: and for now, since none of that other stuff i said exists, just doing it all intra-connection, like gerrit currently is, is fine17:51
jeblairthen we'll build the pipeline superstructure for this and move it into it17:51
mordredjeblair: on the zuul-sphinx repo - as a followup patch you should probably set basepython to python3 so that we don't accidentally make a sphinx plugin that would break in zuul's docs build17:52
jeblairmordred: ++17:53
mordredjlk: so that means you should be able to just do a string substitution with the connection's hostname for now, yeah?17:54
jeblairmordred: yep17:55
jlkmordred: right, it means the regex we look at the commit messages for, and the search string we hit github for should both be hostname specific to the connection17:55
* jeblair realizes he is not jlk but answers anyway17:55
jlkreplace "github.com" with connection.hostname  or connection.canonical_hostname17:56
jeblairprobably .hostname i think17:56
mordredyah17:56
jlkoh, for github driver that appears to be "git_host" config entry18:01
jlkoh screw you github search. "depends-on" should not be matching against "depends" "on"18:03
jeblairwow.  i mean, they're searching *computer programs*.  it's not like multi-word tokens are rare....18:04
jlkthey have a method for doing multi word too, they say quote the pair, like "computer program" to match both, instead of computer program   to match either18:05
jlkbut it's taking the -  and making it a space18:05
clarkbit should treat the other end the same way though18:06
jlkcrap, this search might be only matching commits on master branches.18:09
jeblairjlk: this may end up being slightly more like the gerrit query than would otherwise be ideal then.  "depends" "on" "https://github.com/url".  then filter locally to get actual "depends-on: url".18:09
jeblairwhat, really?18:09
jlkyeah I might have to. I'm seeing some results of people putting "depends on" in the commit message, not our style.18:09
mordredjlk: I can't get github to return matches for https://github.com/j2sol/z8s-sandbox/pull/118:10
jlkmordred: that's what I'm discovering too :(18:10
jeblairmordred, jlk: do you have a curl command you can paste me?18:11
jlkWe're using the search interface I think18:11
mordredyah - just using the search box in the web ui18:11
jeblairprobably worth trying the api in case there's a behavior difference18:11
jlk"depends on:" https://github.com/j2sol18:12
jlkat https://github.com/search18:12
jlkyeah I'll get into the API in a minute18:12
mordredjeblair: fwiw, https://github.com/j2sol/z8s-sandbox/pull/2 is the pull request that contains a commit with depends-on18:12
jeblaircurl -H Accept:application/vnd.github.cloak-preview https://api.github.com/search/commits?q=repo:octocat/Spoon-Knife+css18:13
jeblairokay, that's their docs example18:13
mordredcurl -H Accept:application/vnd.github.cloak-preview https://api.github.com/search/commits?q=repo:j2sol/z8s-sandbox+Depends-On18:14
mordredgets me no hits18:14
jeblairyeah.  "zuul" does.  "dumb" does not.18:15
jeblairconfirming the master-branch only behavior in the api.18:15
jlkthat's going to be a bummer.18:15
jlkso...18:15
jlkwhat we could do, since this is mostly a convenience routine, is just look in our own internal cache of changes and the commit messages associated with those changes18:16
jlk(since I have a cache of all of them we've seen)18:16
jeblairjlk: yeah, that's an option.... though i'm also wondering if "do nothing" would be better.  because using the cache means inconsistent behavior.  at least "approved changes which depend on unapproved changes will need another approval after their dependencies merge" is explainable.18:17
mordredsearching commits is wrong18:17
mordredcurl -H Accept:application/vnd.github.cloak-preview https://api.github.com/search/issues?q=repo:j2sol/z8s-sandbox+"Depends-On:"18:17
mordredthat works18:17
jeblairmordred: of course :)18:18
jlkwhaaat?18:18
mordredpull requests are issues18:18
jlkhow hte hhell18:18
jlkyeah, but18:18
jlkoh18:18
mordredif we're searching commits - we're asking it to search commits that are in the repo18:18
jlkYou're finding the pull request message, not the commit message18:18
mordredpull reuqests aren't landed yet18:18
jlkpull requests have landed SOMEWHERE18:18
jlkrather the commit in question exists18:18
mordredthey have - you just don't know where18:18
jlkit's just not on a master branch18:18
jlkyou do know.18:18
mordredyah18:18
jlker, you know via the PR details18:19
jlkI really wonder if GraphQL would save us here, but that's a lot of node walking18:19
jlkI lvoe that this limitation isn't documented18:21
mordredhttps://stackoverflow.com/questions/31891733/searching-code-in-a-specific-github-branch18:24
jlkyeah, that's the indication I got18:25
mordredhttps://stackoverflow.com/questions/43533851/search-for-commits-on-a-branch-other-than-master-branch-in-github "Only the master branch is indexed for search in github.com."18:25
jeblairjlk: in https://github.com/j2sol/z8s-sandbox/pull/2  is that first comment automatically generated, or did that come from your commit?18:26
jeblairbasically, i'm wondering if it's that comment that matches mordred's search, rather than the commit, and what the implications are for that18:26
mordredjeblair: yes, I am certain it's that comment athat matches my search18:27
jlkjeblair: that's the description of the pull request18:27
jlkwhen opening a PR you can put in a subject, and a description (like a git commit). Often it'll be pre-populated from the git commit itself, if the PR only has one commit on it.18:28
jlkbut they can be edited after the fact18:28
jlkso yeah, his search is finding the PR comment, not the commit message18:28
jeblairso if the pr had 3 commits and the depends-on was in the second header, we would likely not find it in a search?18:28
mordredthat's correct18:28
mordredunless the PR author put it in their PR message18:28
jeblairwelp.  you may get your wish, mordred.  :)18:29
jlkyeah, if the PR has 3 commits, the PR author usually gets a blank box for subject/description18:29
mordredjeblair: well, that's sad, because y'all had convinced me commits were better18:29
jlkyeah :(18:30
mordredbut I agree18:30
mordredit seems like they're gonna need to go in the PR message if search is going to work18:30
jeblairthis system is so black-boxy, we should probably test that case, just to confirm18:30
jlk oh I can totes do that on PR 118:30
jlkSee https://github.com/j2sol/z8s-sandbox/pull/118:32
jlkI just added a second commit there, that has Depends-On in the commit message18:32
jlk(and even though the UX hides it, it does actually have the full string "Depends-On: https://github.com/j2sol/z8s-sandbox/pull/2"18:33
mordredjeblair: so - remind me again - the purpose of this bit is to be able to take action on changes that depended on a change once we take action on it, right?18:33
jeblairjlk: yep, i get no results for that :(18:33
jlkbut now... watch18:34
jeblairmordred: yeah, so when B->A and you approve B, and then go on vacation and then come back and approve A, both of them land.18:34
jlkI added a comment with that string18:34
jeblairjlk: and i get a result now18:34
jlkyeah, so. hrm.18:35
mordredjeblair: so - sake of argument- the only values of B we're going to care about will be Bs that will have generated an approval event already at some point in the past, right?18:36
jlkIf we do this off of PR comments, we just have to handle PR comment edit events, at least to update existing changes.18:36
jlk"an approval event" ?18:36
jeblairmordred: generally, yes.  we can't rely on the cache though.18:36
jlkmordred: something that would match a pipeline/trigger requirement?18:37
mordredjeblair: right. I don't think the cache is the right idea, it would be something new18:37
mordredjlk: yah18:37
jeblairmordred: i'm all ears18:37
mordredjeblair: but just saying if we think commits are the correct way to do this, and we can't search for them, we could record pending dependencies for later reference18:37
jlkokay, yeah for github that'd be a review, a label, a status event18:37
jlkmordred: I think that fails if zuul is restarted between events.18:38
jeblairmordred: ah, more state; another database.18:38
mordredjeblair: not necessarily - it could be something we save/restore along side pipeline state18:38
mordredsince it's essentially like a deferred entry for a pipeline18:38
mordred"this would be enqueued but can't yet be because it's waiting on X"18:39
jlkoh you'd want to stash it during graceful restarts?18:39
jeblairmordred: yeah, but it's also essentially unbounded in terms of size and time.  i'm not sure "only save it during restarts" is that much different than "always save it" for something like that.18:39
jeblair(these are, specifically things we expect to stick around for days, if not months.  unlike pipeline entries which stick around for minutes or hours)18:40
mordredjeblair: indeed.18:40
jeblaireither way -- the main thrust is "persistent cache of this data", yeah?18:41
mordredyah, essentially18:41
mordredsince it'sa strict subset of PRs we will have seen at some point in time18:41
jeblairi'd like to cipher on this for a bit... i propose we all mull it over for a while.  perhaps we should defer implementation of reverse depends for a bit?  unless jlk wants to proceed with the issue search just for fun, even though we may discard it...18:43
jlkwell, issue search is essentially dead right now.18:43
jlkI can delay implementing the reverse depends.18:43
mordredI agree - I think forward depends gets us a good way forward18:43
jlkjust leave the TODO here.18:44
mordredyah18:44
jeblairjlk: right, issue search is only viable if we decide to move headers to pr messages rather than commit msgs.18:44
jlkoh sorry yeah I see what you mean there. issue search vs commit search. Derp.18:45
jeblairso really, we have 2 related things to think about: issue description vs commit, and the memoization idea (that's a better word than persistent cache)18:45
*** persia has quit IRC18:47
* jlk off to lunch, and a kid's school function.19:14
Shrewsjeblair: tristanC: i looked at the first few reviews for tristanC's nodepool abstraction changes. direction seems ok to me. i only worry about changes touching the request handling since that was so difficult to get solid19:39
Shrewsand i can't quite see how other drivers will use (or not use) that, atm.19:40
jeblairShrews: yeah, i think we'll want to try to minimize driver interaction with that.  i confess it's not an area i have thought a lot about yet.  it deserves more attention.19:42
Shrewsjeblair: yeah. i mean, on the one hand, i don't want to see it just plopped into a currently solid system without thinking more about that. but on the other hand, i find it hard to think about that interaction without seeing it in action, so....  ¯\_(ツ)_/¯19:44
clarkbiirc one of my comments was that the request handling needs to be a base class with minimal amounts of changes in subclesses19:50
ShrewsWould kind of be helpful to have a second/separate system running with the changes (and maybe even a static node or two) to observe for a bit19:50
clarkba lot of that got refactored in the later changes, I think if we want to say the later changes are lower priority we will want ot push those improvements down into the static driver19:50
mordredShrews, clarkb: it seems the main things that the openstack driver does that arent' in the base driver are quota and images20:13
mordredand then the stay-in-same-az portion of the algorithm20:14
mordredso - yah - we may want to think about what it means for those to exist or not exist in a given driver20:14
clarkbmordred: off of memory that is right. I think the biggest problem is that request handling and "cloud" drivering got mashed together fairly well in these new drivers. And really we want to have 99% of it pushed into the driver20:15
clarkbwith just enough smarts in the handler to know which driver to dispatch to20:15
clarkb(which I left comments about in the changes, problem was that in a stack of ~6 changes things don't really congeal until the end but we are saying we only want the first change right now, so we want to rebase/refactor to get that congealed stuff into the static driver too)20:16
mordredclarkb: yah - that's fair20:16
mordredclarkb: I do feel like I'm learning a good deal about the surface area from going through them though, so that's good20:16
clarkbI think if we do that and make the separate clear then request handling is a non issue20:16
clarkbbasically its only an issue because he interfaces eveolve through the set of things we don't want immediately20:17
fungijeblair: jlk: mordred: depends-on in gerrit, being in the commit message, means you automatically get a new commit whenever adding/changing/removing one... if we relied on pr description fields in gh, which can be updated at any point, i suppose we have to re-evaluate them via the api every time we want to use them in case they get removed in a subsequent edit, right?20:27
jlkfungi: correct. We'd have to respond to comment edits and cause the in-memory change to get updated with current details.20:31
jlkarguably we might need to do that in case somebody typos a "recheck" comment and edits to fix.20:32
openstackgerritMerged openstack-infra/zuul feature/zuulv3: Stop supporting python2  https://review.openstack.org/47500920:41
pabelangerzuul-ca should be coming online in next 5mins after puppet run20:41
pabelangerI'll generate ssl certs then20:41
openstackgerritMonty Taylor proposed openstack-infra/zuul feature/zuulv3: Add node to README about Python3  https://review.openstack.org/47507020:42
openstackgerritMonty Taylor proposed openstack-infra/zuul feature/zuulv3: Add ROADMAP.rst file  https://review.openstack.org/47507120:42
openstackgerritMonty Taylor proposed openstack-infra/zuul feature/zuulv3: Add note to README about Python3  https://review.openstack.org/47507020:43
openstackgerritMonty Taylor proposed openstack-infra/zuul feature/zuulv3: Add ROADMAP.rst file  https://review.openstack.org/47507120:43
openstackgerritMerged openstack-infra/zuul-sphinx master: Initial commit  https://review.openstack.org/47504121:11
*** jkilpatr has quit IRC21:31
openstackgerritJames E. Blair proposed openstack-infra/zuul feature/zuulv3: Add redhat-rpm-config to rpm build deps  https://review.openstack.org/47508222:31
jeblairSpamapS, mordred, pabelanger: does this cover what we discussed yesterday about control persist: https://storyboard.openstack.org/#!/story/200107223:01
mordredjeblair: yes23:02
pabelanger++23:10
openstackgerritJesse Keating proposed openstack-infra/zuul feature/zuulv3: Implement Depends-On for github  https://review.openstack.org/47440123:15
jlkjeblair: jamielennox: SpamapS: mordred: ^^^ is ready for real review.23:16
*** SotK_ has joined #zuul23:21
* SpamapS reads23:21
mordredjlk: zuul is angry23:22
jlkffffff.23:22
jlklooking23:22
jlkoh bother. Okay, I can't use self. in the regex that way, because self doesn't exist at that point.  grr.23:23
jlkShould I move that out of the class def and into __init__, or should I define the string with the blank spot and do the variable substitution at use time?23:24
*** mmedvede_ has joined #zuul23:25
*** mmedvede has quit IRC23:26
*** SotK has quit IRC23:26
*** mmedvede_ is now known as mmedvede23:26
mordredjlk: I'd just move it into the __init__23:27
*** toabctl has quit IRC23:28
*** toabctl has joined #zuul23:30
openstackgerritJesse Keating proposed openstack-infra/zuul feature/zuulv3: Implement Depends-On for github  https://review.openstack.org/47440123:31
jlkdone.23:31

Generated by irclog2html.py 2.15.3 by Marius Gedminas - find it at mg.pov.lt!