tristanC | SpamapS: that would definitely be nice to have, do you think nodepool could/should do that (manage speculative image)? | 00:31 |
---|---|---|
tristanC | SpamapS: i'm afraid the ansible kubectl connection plugin only seems to work with the raw/shell modules... | 00:35 |
tristanC | SpamapS: perhaps another workflow would be to let zuul-executor build and push images to k8s, and then write the tests as part of a Job: https://kubernetes.io/docs/concepts/workloads/controllers/jobs-run-to-completion/ | 00:37 |
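
[For context, a minimal sketch of the kind of Kubernetes Job tristanC is pointing at; the image name and test command are illustrative assumptions, not taken from the discussion:]

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: speculative-unit-tests
spec:
  backoffLimit: 0                # do not retry; let the CI system interpret the failure
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: tox
          # hypothetical image built and pushed by the executor from the speculative state
          image: registry.example.com/speculative/myproject:change-12345
          command: ["tox", "-e", "py35"]
```
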
tristanC | but then again, we'll lose the capability to write test definitions in Ansible | 00:39 |
tristanC | SpamapS: and what about lxd, why would it be a better choice? | 00:41 |
clarkb | I think you'd build and push the image as a pre step | 00:56 |
clarkb | since it will have access to the git repos there | 00:56 |
tristanC | clarkb: maybe zuul could pass the speculative git repos as part of the requestNodes() call :) | 00:57 |
clarkb | tristanC: one of the redesign goals of v3 was to avoid needing to serve git repos though | 00:58 |
clarkb | I think ideally anything requiring speculatively merged code runs on the executor side of things | 00:59 |
tristanC | well the test instances do require the speculatively merged code... so instead of building the environment once with nodepool or a "containerpool" service, we end up pushing the merged code to each environment | 01:02 |
tristanC | thinking about a kernel ci, it would be nice to have a glance image with the speculative kernel so that it could run tests in parallel | 01:04 |
clarkb | for that I think you just install it and reboot as part of job? | 01:05 |
clarkb | potentially with an earlier job building it | 01:05 |
clarkb | especially since the kernel may not boot at all | 01:06 |
clarkb | so you'd want that job side to record it properly | 01:06 |
tristanC | clarkb: yes, that's exactly what jeblair suggested :) | 01:06 |
clarkb | having built kernels that nuked grub you definitely want a way to account for that | 01:07 |
tristanC | we could even get a post-nova-console.yml play | 01:07 |
tristanC | clarkb: i am mostly interested in the kernsec/bwrap/oci stack test, for which an aki/ari/ami would work fine | 01:10 |
tristanC | i mean, as a thought experiment, it's an interesting workflow to address | 01:12 |
clarkb | ya though I don't think it needs to be complicated, just install the kernel and reboot | 01:13 |
clarkb | if it doesn't come back treat it as a failure. if it comes back test it | 01:13 |
tristanC | yes, thanks for that suggestion, that would easily work indeed | 01:14 |
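
[A minimal sketch of the install-and-reboot workflow clarkb suggests, as an Ansible play; the package path and timeout are assumptions, not something agreed on above:]

```yaml
- hosts: all
  tasks:
    - name: Install the speculatively built kernel
      become: true
      yum:
        name: /tmp/kernel-speculative.rpm   # hypothetical artifact from an earlier build job
        state: present

    - name: Reboot into the new kernel
      become: true
      shell: sleep 2 && reboot
      async: 1
      poll: 0

    - name: Fail the job if the node never comes back with the new kernel
      wait_for_connection:
        delay: 10
        timeout: 600
```
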
SpamapS | tristanC: nodepool would build the image that the pre job FROM's | 01:40 |
openstackgerrit | Rui Chen proposed openstack-infra/nodepool feature/zuulv3: Fix nodepool cmd TypeError when no arguemnts https://review.openstack.org/519582 | 01:44 |
SpamapS | tristanC: and lxd is meant to run containers that act like VMs. | 02:02 |
tristanC | SpamapS: iiuc, lxd doesn't provide an api, so you would still have to manage host instances (which is what the oci driver does) | 02:10 |
tristanC | oh my bad, it does have a rest api | 02:17 |
tristanC | SpamapS: anyway i'm not convinced lxd is a better choice over oci/k8s, at least not for tests like tox, go test, rpmbuild... Those seem to work fine with the containerized sshd trick | 02:25 |
tristanC | it seems more comparable to an openstack instance, e.g. when you need the whole system stack | 02:28 |
tristanC | well i never used lxd so i may be missing something. anyway it looks like a good nodepool driver candidate | 02:43 |
*** threestrands has quit IRC | 06:03 | |
*** bhavik has joined #zuul | 06:47 | |
openstackgerrit | Ian Wienand proposed openstack-infra/zuul-jobs master: Ignore missing .tox/env/logs directorys for copy https://review.openstack.org/521436 | 06:52 |
openstackgerrit | Ian Wienand proposed openstack-infra/zuul-jobs master: Ignore missing .tox/env/logs directories for copy https://review.openstack.org/521436 | 06:55 |
*** xinliang has quit IRC | 07:06 | |
*** xinliang has joined #zuul | 07:18 | |
*** hashar has joined #zuul | 08:33 | |
*** bhavik has quit IRC | 09:53 | |
*** electrofelix has joined #zuul | 09:55 | |
*** bhavik has joined #zuul | 10:03 | |
*** bhavik1 has joined #zuul | 10:37 | |
*** bhavik has quit IRC | 10:39 | |
*** bhavik1 is now known as bhavik | 10:39 | |
*** bhavik has quit IRC | 10:55 | |
openstackgerrit | Merged openstack-infra/zuul-jobs master: Ignore missing .tox/env/logs directories for copy https://review.openstack.org/521436 | 10:56 |
*** kmalloc has joined #zuul | 11:01 | |
*** openstackgerrit has quit IRC | 11:32 | |
*** jkilpatr has quit IRC | 11:52 | |
*** jkilpatr has joined #zuul | 12:24 | |
rcarrillocruz | http://38.145.34.35/logs/ansible-networking/check/github.com/rcarrillocruz-org/ansible-fork/2/1e759d67fe084e1297cab4fba9440cd1/run-openvswitch-integration-tests/logs/job-output.json | 12:32 |
rcarrillocruz | \o/ | 12:32 |
rcarrillocruz | first job run of openvswitch ansible modules on my testing zuulv3 server | 12:33 |
rcarrillocruz | hooked with GH | 12:33 |
SpamapS | rcarrillocruz: congrats! | 13:52 |
SpamapS | tristanC: also sorry for getting all engineer-crit on your thing. I think it's really cool to have a k8s option for zuul. :) | 13:53 |
tristanC | SpamapS: heh, no offense taken :) i'm very new to docker/k8s and your suggestions are much appreciated | 14:01 |
tristanC | sorry if i sounded defensive | 14:02 |
SpamapS | not at all | 14:18 |
SpamapS | just woke up and realized I had only been critical | 14:18 |
SpamapS | And really I just want to boil it down to the thinnest thing possible. | 14:19 |
SpamapS | sshd is certainly thinner than a whole VM :) | 14:19 |
*** openstackgerrit has joined #zuul | 14:22 | |
openstackgerrit | Monty Taylor proposed openstack-infra/zuul-jobs master: Add general sphinx and reno jobs and role https://review.openstack.org/521142 | 14:22 |
openstackgerrit | Monty Taylor proposed openstack-infra/zuul-jobs master: Add general sphinx and reno jobs and role https://review.openstack.org/521142 | 14:23 |
leifmadsen | mordred: jeblair: fyi pabelanger helped me last Thursday/Friday, and I was able to get far enough to run a "hello world"! | 14:36 |
leifmadsen | so I'll be circling back around in the next week or two, and building out a cleaned up set of docs/notes for a "Zuul From Scratch" (I'm thinking about avoiding use of "quickstart" in the docs, because it's not really very quick or light :)) | 14:36 |
leifmadsen | once I get that far, I'll propose some changes to the existing feature/zuulv3 branch | 14:37 |
mordred | leifmadsen: \o/ | 15:00 |
leifmadsen | indeed heh | 15:02 |
openstackgerrit | Monty Taylor proposed openstack-infra/zuul-jobs master: Make build-python-release job https://review.openstack.org/513925 | 15:10 |
openstackgerrit | Monty Taylor proposed openstack-infra/zuul-jobs master: Remove old python-sdist job https://review.openstack.org/513926 | 15:10 |
rcarrillocruz | folks, not finding many docs about how to stream job consoles on zuul. What is needed? web-console up and zuul_console spawned on each job on the executor, what else? | 15:15 |
openstackgerrit | Monty Taylor proposed openstack-infra/zuul-jobs master: Half-Revert "Revert "Add ensure-reno and ensure-babel roles"" https://review.openstack.org/521558 | 15:24 |
tristanC | rcarrillocruz: what is web-console? you meant zuul-web right? | 15:24 |
*** bhavik1 has joined #zuul | 15:24 | |
rcarrillocruz | erm, yeah | 15:25 |
rcarrillocruz | zuul-web sorry | 15:25 |
tristanC | not sure what's wrong with your setup, but that should be enough. here is an apache conf to sit in front of zuul-web: https://review.openstack.org/#/c/505452/9/zuul/web/static/README | 15:26 |
tristanC | well, without that patch cherry-picked, change it to ProxyPassMatch /console-stream ws://localhost:9000/console-stream nocanon retry=0 | 15:27 |
openstackgerrit | Monty Taylor proposed openstack-infra/zuul-jobs master: Half-Revert "Revert "Add ensure-reno and ensure-babel roles"" https://review.openstack.org/521558 | 15:27 |
*** bhavik1 has quit IRC | 15:39 | |
rcarrillocruz | hmm, k, was mixing up the webapp *and* zuul-web | 15:41 |
*** jkilpatr has quit IRC | 16:13 | |
*** jkilpatr has joined #zuul | 16:29 | |
*** bhavik1 has joined #zuul | 16:39 | |
*** hashar is now known as hasharAway | 16:45 | |
*** bhavik1 has quit IRC | 16:50 | |
*** bhavik1 has joined #zuul | 16:50 | |
tobiash | ianw: I left an answer on https://review.openstack.org/#/c/520855 | 16:53 |
openstackgerrit | Merged openstack-infra/nodepool feature/zuulv3: Fix nodepool cmd TypeError when no arguemnts https://review.openstack.org/519582 | 16:56 |
*** bhavik1 has quit IRC | 16:59 | |
openstackgerrit | Monty Taylor proposed openstack-infra/zuul-jobs master: Add general sphinx and reno jobs and role https://review.openstack.org/521142 | 17:01 |
openstackgerrit | Monty Taylor proposed openstack-infra/zuul-jobs master: Update fetch sphinx output to use sphinx vars https://review.openstack.org/521590 | 17:01 |
mordred | jeblair: if you didn't see in the scrollback, electrofelix was asking questions in #openstack-infra about the ability to push merges from zuul and how adding that might or might not relate to merger/executor interactions ... I thought that was a good jeblair conversation :) | 17:08 |
*** jkilpatr has quit IRC | 17:09 | |
*** jkilpatr has joined #zuul | 17:10 | |
mordred | jeblair: also - important to note - there is what I think is an issue with override-checkout not working as expected | 17:15 |
mordred | jeblair: we disabled tox-py35-on-zuul from zuul-jobs as it was bombing out due to zuul master being checked out: https://review.openstack.org/#/c/521096/ and I failed at figuring out what was happening | 17:16 |
jeblair | mordred: does that affect override-branch too? (or have we removed it?) | 17:17 |
electrofelix | mordred: thanks, I keep forgetting this is the best room for zuul discussions, I'll break my habit at some point | 17:17 |
mordred | jeblair: well, first stab at fixing was "switch from override-branch to override-checkout" | 17:17 |
mordred | jeblair: http://logs.openstack.org/42/521142/1/check/tox-py35-on-zuul/81bff30/ is a log from a failure | 17:18 |
mordred | jeblair: but switching from one to the other did not have any impact | 17:18 |
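
[For reference, a sketch of how override-checkout is expressed in a job definition; the names below are illustrative and not the actual tox-py35-on-zuul configuration being debugged:]

```yaml
- job:
    name: tox-py35-on-zuul
    parent: tox-py35
    required-projects:
      - name: openstack-infra/zuul
        # check out this ref of the required project instead of the branch
        # zuul would otherwise select
        override-checkout: feature/zuulv3
```
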
jeblair | electrofelix: i caught up on the earlier infra discussion -- but what's the issue you're trying to solve? | 17:18 |
jeblair | (i think i need just a little more context) | 17:18 |
electrofelix | We build artifacts during the gate that we don't want to rebuild afterwards, because what we built is what actually passed tests | 17:19 |
jeblair | mordred: ok, so it seems to affect both. did this use to work and something changed? | 17:19 |
electrofelix | some teams like to attach the metadata of the git repos to those artifacts | 17:19 |
jeblair | electrofelix: okay, so this may actually touch on two related issues: | 17:19 |
electrofelix | but the commit SHA1 won't necessarily line up subsequently, because it corresponds to the proposed merge zuul made for testing rather than the actual merge | 17:20 |
electrofelix | jeblair: need to switch from office to home, so maybe hold off discussion until tomorrow and I'll ping you again about it | 17:21 |
jeblair | electrofelix: 1) using the result of the initial zuul merger calculation in the executors (rather than repeating the action in the executors), as well as 2) pushing the result of the initial zuul merger calculation to the target branch rather than having gerrit/github perform the merge. | 17:21 |
jeblair | electrofelix: sounds good | 17:21 |
*** electrofelix has quit IRC | 17:21 | |
mordred | jeblair: yes, I believe it worked at some point in the past as that was our initial thing to make sure zuul-jobs changes didn't break unittest jobs | 17:25 |
mordred | jeblair: THAT SAID - it's possible we never actually validated that it was running zuulv3's unittests and not zuulv2s' | 17:25 |
mordred | jeblair: I'm also not sure, even if it WAS running zuul v2's unittests - that that would break due to the patch in question | 17:25 |
jeblair | mordred: yeah, i guess that sort of thing could slip by (were we running py3 tests on v2?) | 17:26 |
mordred | jeblair: but there were too many variables in the air and I was more focused on helping get releasenotes jobs fixed, so I didn't diagnose deeply | 17:26 |
jeblair | mordred: thanks, i'll try to dig in shortly | 17:27 |
openstackgerrit | Monty Taylor proposed openstack-infra/zuul-jobs master: Add general sphinx and reno jobs and role https://review.openstack.org/521142 | 17:41 |
jeblair | mordred: there's a good chance we forgot to restart the executors after adding support for override-checkout. that may explain why override-checkout isn't working, but it would not explain why override-branch wasn't. | 17:46 |
jeblair | (override-checkout change merged nov 1, a random executor start time is oct 31) | 17:47 |
jeblair | Shrews: in shutting down openstack's executors, i see that 2 of them are stuck at the following: | 18:05 |
jeblair | sendto(7, "59.676215 | wheel-mirror-ubuntu-"..., 4096, 0, NULL, 0 | 18:05 |
jeblair | zuul-exec 9708 zuul 7u IPv6 3547817819 0t0 TCP ze04.openstack.org:finger->zuulv3.openstack.org:34988 (ESTABLISHED) | 18:06 |
jeblair | Shrews: it looks like it's stuck transmitting data to the multiplexer | 18:06 |
openstackgerrit | Monty Taylor proposed openstack-infra/zuul-jobs master: Add support for warning-is-error to sphinx role https://review.openstack.org/521618 | 18:07 |
jeblair | Shrews: these were last restarted on nov 10 and nov 15, so they should be running the old process code | 18:07 |
jeblair | (though i suspect this would be the same in either case -- process or threading) | 18:07 |
mordred | tristanC: what version of angular is required? (I just realized that we didn't add an entry into etc/status/fetch-dependencies.sh) | 18:13 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul-jobs master: DNM: test tox-py35-on-zuul https://review.openstack.org/521623 | 18:14 |
mordred | dmsimard: do you perhaps know the answer ^^ ? | 18:15 |
openstackgerrit | Monty Taylor proposed openstack-infra/zuul feature/zuulv3: Add angular to fetch-dependencies.sh https://review.openstack.org/521625 | 18:23 |
mordred | tristanC, dmsimard: ^^ based on looking at xstatic packages and also storyboard-webclient depends I'm guessing that's the version you were using | 18:25 |
dmsimard | There's no angular in ARA | 18:30 |
dmsimard | Missing context, let me read back.. | 18:30 |
dmsimard | looking at https://softwarefactory-project.io/r/gitweb?p=scl/zuul-distgit.git;a=blob;f=zuul.spec;h=bfaaa32b7bb1b61ed84fa5695f665f05bd9af0b8;hb=HEAD there's no obvious version for it.. seems like it's being pulled from either in-tree or elsewhere | 18:35 |
dmsimard | This is the angular.js file from a test deployment http://paste.openstack.org/raw/626839/ .. there's no version header in it. Something about 1.3.7 but that looks to be about error handling instead. | 18:41 |
openstackgerrit | Monty Taylor proposed openstack-infra/zuul-jobs master: Add support for warning-is-error to sphinx role https://review.openstack.org/521618 | 18:53 |
openstackgerrit | Monty Taylor proposed openstack-infra/zuul-jobs master: Add general sphinx and reno jobs and role https://review.openstack.org/521142 | 19:09 |
dmsimard | mordred, SpamapS: fyi along the lines of our discussion last week, I'm starting a thread on openstack-dev around leveraging zuul v3 jobs/roles/playbooks outside OpenStack -- it's focused around migrating some "downstream" TripleO jobs to Zuul v3 but I feel like the discussion will be worthwhile to see what we can do, what we can't, what works and what doesn't | 19:20 |
dmsimard | I feel the need to mention it since you might filter out [TripleO] threads :) | 19:20 |
mordred | dmsimard: sweet | 19:22 |
jeblair | dmsimard: sounds cool :) | 19:23 |
dmsimard | jeblair: oh, we started a pad to document issues with running zuul-jobs outside the gate: https://etherpad.openstack.org/p/downstream-zuul-jobs-issues | 19:24 |
dmsimard | there's not much there yet, we only scratched the surface | 19:24 |
dmsimard | We were also discussing how we can safely include zuul configuration from zuul-jobs/openstack-zuul-jobs/etc without incurring things like potential syntax failures due to clashing names or whatever else. | 19:25 |
jeblair | dmsimard: cool, when things settle down, we should probably establish a storyboard tag or something for issues like that | 19:25 |
dmsimard | So far we're using parameters like shadow and selective includes | 19:25 |
jeblair | cool, it'll be good to know whether those work as intended or need further work | 19:26 |
dmsimard | I was surprised to even see that those were available in the first place, it means someone thought about the use case | 19:28 |
dmsimard | so whoever added that in, +++ | 19:28 |
tobiash | dmsimard: I added a point to your etherpad | 19:47 |
dmsimard | tobiash: yeah, I haven't quite figured out that one yet | 19:49 |
jlk | tristanC: SpamapS haven't read full backlog, but I also was thinking in the direction of a kubectl driver for ansible, so that you can do kubectl exec type things and not need ssh inside the images. | 19:49 |
tobiash | I looked a bit into that some weeks ago but didn't have time for really working on that yet | 19:49 |
jlk | I see "ssh" as a byproduct of how openstack VMs work | 19:50 |
jlk | and if they're not necessary to execute things inside the contained (vm or otherwise) environment, then all the better. | 19:50 |
pabelanger | jlk: the comment that interested me from tristanC was how you would do a synchronize task for logs from a container? | 19:51 |
jlk | synchronize still works with the docker module | 19:52 |
jlk | the docker connection module for Ansible, which just uses docker exec. Presumably the same sort of thing can work on k8s | 19:53 |
jlk | granted, I haven't looked close enough at _how_ it works for Docker, but I've used it before. | 19:53 |
pabelanger | kk | 19:53 |
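
[A minimal sketch of the docker connection plugin workflow jlk describes, assuming a running container registered in the inventory as "test-node"; the test command and log path are hypothetical, and whether synchronize pulls logs this way is jlk's claim above, not verified here:]

```yaml
- hosts: test-node
  vars:
    ansible_connection: docker    # tasks run via "docker exec" instead of ssh
  tasks:
    - name: Run the tests inside the container
      command: tox -e py35

    - name: Pull the logs back out of the container
      synchronize:
        mode: pull
        src: /var/log/tests/
        dest: /tmp/logs/
```
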
*** hasharAway is now known as hashar | 20:00 | |
openstackgerrit | Andreas Jaeger proposed openstack-infra/zuul-jobs master: Half-Revert "Revert "Add ensure-reno and ensure-babel roles"" https://review.openstack.org/521558 | 20:07 |
openstackgerrit | Monty Taylor proposed openstack-infra/zuul-jobs master: Add general sphinx and reno jobs and role https://review.openstack.org/521142 | 20:15 |
mordred | jlk: ++ to using native k8s/docker connection for the things - I believe flaper87 wrote a kubectl module for ansible - but I think in this case we'd need a kubectl connection plugin, yeah? | 20:19 |
clarkb | in fairness ssh is a byproduct of how ansible works :P | 20:21 |
dmsimard | what's a native k8s connection ? I know openshift has "oc rsh" but I haven't done much pure k8s | 20:21 |
clarkb | if we used some other rpc system we'd use whatever protocol that speaks over | 20:22 |
dmsimard | on openshift if you do "oc rsh <pod>" it opens a shell in the pod, not sure what it does behind the scenes (or how it would pick one of the containers in the pod) | 20:23 |
dmsimard | openshift-ansible probably has a module/plugin for that actually | 20:23 |
clarkb | (it also happens to be the case that jenkins used ssh as well, so that wasn't a change for us) | 20:23 |
clarkb | (but nothing about it is openstack vm specific) | 20:23 |
dmsimard | yeah doesn't look like openshift-ansible folks have something like that yet but it'd be "insanely cool" | 20:26 |
dmsimard | the different upstream connection plugins are here https://github.com/ansible/ansible/tree/devel/lib/ansible/plugins/connection | 20:27 |
openstackgerrit | Monty Taylor proposed openstack-infra/zuul-jobs master: Add general sphinx and reno jobs and role https://review.openstack.org/521142 | 20:49 |
mordred | clarkb, dmsimard: yah - I think it's actually "ssh is a byproduct of the fact that we're currently using VMs that look like computers for our test nodes" - due to the connection plugin support, if we produce test nodes that are not intended to be connected to over ssh, that should not be a blocker | 20:59 |
mordred | we're already aware of at least win_rm for the windows nodes | 21:00 |
*** jkilpatr_ has joined #zuul | 21:00 | |
openstackgerrit | James E. Blair proposed openstack-infra/zuul feature/zuulv3: Normalize daemon process handling https://review.openstack.org/517381 | 21:00 |
*** jkilpatr has quit IRC | 21:01 | |
mordred | so yah - being able to support docker connection plugin for docker container build nodes or a theoretical kubectl connection plugin for k8s pods seems like a good way to handle those as we grow support for them | 21:01 |
*** jkilpatr_ has quit IRC | 21:01 | |
jeblair | reminder, meeting is in 1 hour (this may have changed in reference to your local time for folks in the usa) | 21:01 |
mordred | jeblair: nice daemon process handling cleanup patch | 21:03 |
clarkb | mordred: ya I just think it's important to decouple that from openstack, nothing in openstack says you have to do it that way and in fact you could run windows on openstack if you wanted and do whatever it is you do with windows for example. | 21:05 |
jeblair | yeah, in working with leifmadsen i was able to see what worked well and didn't in zuul/nodepool. nodepool's handling of pidfiles wins because of the way they use paths, and zuul's default logging config wins -- that's something we should work on porting to nodepool. | 21:05 |
clarkb | mordred: the determining factor for us is ansible (and prior to that was jenkins) | 21:06 |
mordred | clarkb: I agree - it's not caused by openstack ... | 21:06 |
mordred | clarkb: but I think it's just as important to point out that it's not actually driven by ansible either, as ansible has plenty of non-ssh connection plugins | 21:07 |
mordred | it is the combination of the fact that we are running ssh capable VMs and that is the default ansible connection mechanism | 21:07 |
clarkb | mordred: right thats fine we could use whatever ansible supports | 21:07 |
mordred | yah | 21:07 |
clarkb | it does ssh by default so we've ended up there by default | 21:08 |
mordred | yup | 21:08 |
clarkb | jenkins too fwiw, there were non ssh methods | 21:08 |
mordred | and I think it's 100% the right choice of how to connect to linux-based things that look and behave like multi-user/multi-process computers | 21:08 |
mordred | (whether those come from bare metal, vms or containers) | 21:09 |
clarkb | ya | 21:09 |
jeblair | okay, i'm going to go out on a limb here... is there anything new happening in this discussion, or are we just repeating the discussion we have every few weeks? | 21:09 |
jeblair | i'm asking because i'd like for us to, as a group, avoid spinning our wheels | 21:10 |
jeblair | these discussions take a *lot* of time and attention | 21:10 |
jeblair | and this is an important topic | 21:10 |
clarkb | if nothing else, I think it shows that we may have confusion that ansible is what drives this | 21:10 |
jeblair | and we're not going to be able to address it fully right now. we do have an item (maybe more than one item) on the roadmap to deal with non-vm test hosts, after the release | 21:10 |
clarkb | zuul doesn't really care, nodepool doesn't really care and openstack definitely doesn't care. Ansible is what determines this so maybe we need to write that down "if you want to connect to something not sshd then you need to make ansible talk to it" | 21:10 |
jeblair | is there a way we can structure our work so that we can say "yeah, we totally know this is a thing, let's work on it later?" | 21:11 |
jeblair | or do we need to further delay the zuulv3 release so that we can work on this now? | 21:11 |
clarkb | I don't think we need to work on it, its not something that zuul has to be directly aware of I don't think (but I could be wrong about that) | 21:12 |
jeblair | this is what i was trying to accomplish with a roadmap. have a place where we say we know what we're going to do in the future, but it's in the future. now we have boring^Wexciting things like "actually make system usable and release" to do :) | 21:12 |
clarkb | it would just be documentation for zuul users that says "make ansible do it" | 21:12 |
clarkb | regardless of what the system is (windows, k8s, docker, etc) | 21:13 |
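
[A sketch of clarkb's point in inventory terms: the connection mechanism is a per-node Ansible setting rather than something zuul itself needs to know about; the host names below are illustrative:]

```yaml
all:
  hosts:
    vm-node:
      ansible_connection: ssh      # the current default for VM-like nodes
    windows-node:
      ansible_connection: winrm
    container-node:
      ansible_connection: docker
```
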
jeblair | clarkb: well, i think this needs a significant amount of thought and discussion but i'm not prepared to have it now, and i'd like other folks to have the space to focus on near-term goals. | 21:14 |
jeblair | so how can we do that? or am i wrong? do we just need to accept that we need to solve containers right now? | 21:14 |
jeblair | basically, in my mind, the roadmap is flexible -- we can change it. but if we agree about it, let's stick to it. | 21:15 |
clarkb | I don't personally think we need to solve containers right now (we've not needed them in ~5 years). I do think it is something people are interested in tackling though and if there isn't any direction for them that maybe isn't a great thing | 21:15 |
jeblair | i'm happy to work on that, but i'm afraid we don't have enough people signed up to do the things we need to get to the v3.0 release. | 21:16 |
leifmadsen | isn't this the type of discussion we have every 6 months? :) | 21:17 |
leifmadsen | i.e. isn't that what devcons are for? | 21:17 |
mordred | jeblair: the main thing I was trying to communicate by responding to this particular incarnation of the topic was to remind folks that support for non-ssh connection plugins is going to be important no matter what the thing on the other end of the connection winds up being - and I think from a zuul pov much of that doesn't have a ton to do with containers vs. non-containers | 21:17 |
jeblair | leifmadsen: yes, i'd like to get us synchronized with that. hopefully in the next cycle we can take advantage of it more. | 21:17 |
jeblair | leifmadsen: we did indeed sketch out our roadmap at the last one | 21:17 |
jeblair | leifmadsen: i'm wondering if it has meaning. :) | 21:18 |
leifmadsen | depends where it is, and if you're working from it | 21:18 |
leifmadsen | and if everyone knows what it is, and that it's being worked from :) | 21:18 |
jeblair | leifmadsen: indeed | 21:19 |
leifmadsen | also, it's good to document these things as you go for sure, because you'll forget all this stuff when you're building an agenda for the next devcon :) | 21:19 |
leifmadsen | so a place to document these discussions so they can be useful and had, then documented, then added to roadmap, is a good place to be | 21:20 |
jeblair | leifmadsen: that's another reason to put the roadmap in storyboard i suppose | 21:20 |
leifmadsen | is it just in an etherpad right now? :) | 21:20 |
jeblair | leifmadsen: it was an etherpad, became an email, and i believe at the last meeting pre-summit, clarkb and i signed up to make it be in storyboard | 21:21 |
leifmadsen | yea, so basically, it doesn't exist :) | 21:21 |
jeblair | leifmadsen: not sure why not -- we're all on the email list | 21:21 |
leifmadsen | I don't consider etherpads and email documentation :D | 21:21 |
leifmadsen | email lists are incredibly hard to go and lookup information. The ideal place, in my head, is a link to a sane location from the README | 21:22 |
jeblair | it's not documentation, it's meant to be a discussion, and something to agree on | 21:22 |
leifmadsen | it's documentation | 21:22 |
leifmadsen | you're documenting a plan | 21:22 |
leifmadsen | otherwise, you're just pontificating | 21:22 |
jeblair | leifmadsen: i think there's a miscommunication here | 21:22 |
jeblair | leifmadsen: i think it's really important for us to discuss something like this, which is why i started it in person, and online, at the summit in an etherpad | 21:22 |
leifmadsen | I'm not sure there is :) | 21:22 |
leifmadsen | yes, I agree | 21:23 |
jeblair | leifmadsen: then followed up on the mailing list to make sure even more people were included | 21:23 |
leifmadsen | what I'm saying, is your roadmap is not yet documented | 21:23 |
leifmadsen | if it's only on an email list and an etherpad | 21:23 |
jeblair | leifmadsen: it will eventually end up somewhere. at our last meeting before the summit, we discussed whether it should be in a readme in repo or in storyboard | 21:23 |
jeblair | leifmadsen: and we decided to put it in storyboard | 21:23 |
jeblair | leifmadsen: i think that should make you happy. but i suggest your criticism is unwarranted. | 21:23 |
leifmadsen | sure, that works, and then you can link to it from a README so that people getting to the project know where to look | 21:24 |
leifmadsen | unwarranted? | 21:24 |
leifmadsen | ummm ok | 21:24 |
leifmadsen | I'm not intending my tone to be harsh | 21:24 |
jeblair | here's the email: http://lists.openstack.org/pipermail/openstack-infra/2017-November/005657.html | 21:24 |
pabelanger | re:containers, it does seem to be something people want before adopting zuulv3. But by the same token, we are seeing requests to tag zuulv3 now. So it feels like a catch-22 in that aspect. But agreed, we need to stabilize things more before new features | 21:25 |
openstackgerrit | Monty Taylor proposed openstack-infra/zuul-jobs master: Add general sphinx and reno jobs and role https://review.openstack.org/521142 | 21:26 |
leifmadsen | everyone wants all the things all the time | 21:26 |
leifmadsen | just gotta have your must haves, then move along with your life; development is never ending | 21:26 |
mordred | jeblair: having just now gone to look at the roadmap again real quick, I notice that "nodepool backends" is in "long term / design", which I think we may want to at least partially reconsider, as I know getting windows nodes is important to tobiash | 21:26 |
jeblair | mordred: that's a backend? | 21:27 |
mordred | jeblair: the part of that that I think may be worth considering on the pre-3.0 roadmap is ensuring that we can support backends that don't use ssh ... as I could imagine that there might be some sort of weird breaking change associated with that | 21:27 |
jeblair | mordred: i don't see why that can't be a forward-compatible thing in v3.1? | 21:28 |
pabelanger | I could sign up for 'demonstrate openstack-infra reporting on github' (gtest-org project) and 'add command socket to scheduler and merger for consistent start/stop' items myself | 21:29 |
jeblair | leifmadsen: yep, we have to make compromises | 21:29 |
mordred | it might be able to be, for sure- but I know that talking through the win_rm auth needs with tobiash from a nodepool->zuul perspective exposed a few design assumptions | 21:29 |
jeblair | pabelanger: thank you so much! | 21:29 |
jeblair | mordred: could maybe someone write that up as an email or something? | 21:29 |
mordred | jeblair: sure | 21:30 |
mordred | jeblair: like, I don't think we need to have other nodepool backends fully implemented - it's more "before we release a 3.0, can we sanity check that we're not including something that would make doing so extra hard/awkward" | 21:30 |
mordred | jeblair: but happy to write that up as an email to the list | 21:30 |
jeblair | mordred: if there is something breaking, i agree we should look at it early. but i think our goal should be to polish up the thing that we are basically running, and do the minimum to ensure we're not backing ourselves into a corner later, and defer the rest. | 21:31 |
mordred | jeblair: yes, I agree with that 100% | 21:31 |
jeblair | (because "we're running a thing but telling no one else to run it because doing so sucks" is not a state we should be in long-term) | 21:31 |
mordred | jeblair: I mostly want to make sure we're not backing ourselves into a corner on this one | 21:31 |
mordred | jeblair: ++ | 21:31 |
clarkb | jeblair: I've reviewed the daemon change. I like it but there are a couple of things that I think need to be addressed | 21:39 |
jeblair | clarkb: thanks, i'll look at that. i also just noticed it collided with another patch that landed ahead of it. that's why pep8 failed. i'll have to untangle that too. | 21:40 |
jeblair | clarkb: i'm having trouble following your static comment | 21:42 |
jeblair | clarkb: i call "Executor().main()", so that's an instance method | 21:43 |
clarkb | jeblair: when you call main() you are doing so as ClassName.main() not ClassObject.main() | 21:43 |
jeblair | clarkb: nah, there's a () after Executor | 21:43 |
clarkb | oh am I just completely blind? because that may be too | 21:44 |
jeblair | it's just anonymous | 21:44 |
clarkb | ya I'm just blind. So I think the only other thing is making sure you don't need a pidfile when running in non-daemon mode | 21:44 |
jeblair | clarkb: cool. i agree with that and will implement in next rev | 21:44 |
clarkb | and I think that with: pass will lock the file | 21:44 |
jeblair | clarkb: yeah, it's intended to lock-then-unlock | 21:45 |
jlk | jeblair: I think the "new thing" that's happened with "how do we container" is that a driver implementation was submitted | 21:47 |
jeblair | jlk: oh, this is tristanC's change? | 21:48 |
jeblair | 521356? | 21:49 |
jlk | that's the one that spurred the conversation. I have the tab open to read it, but I haven't yet. I glean from the conversation that it implements / relies upon sshd inside the container | 21:49 |
jeblair | jlk: i suspect that's going to spawn the design discussion that we all know we need to have | 21:50 |
jeblair | so it still may be worth asking ourselves, is it most beneficial to have this now, or would it be better to defer it until we're closer to that point on the roadmap. or do we need to change the roadmap. | 21:50 |
jeblair | all of those '.' should be '?'. :) | 21:51 |
jlk | well yeah | 21:51 |
jlk | I'd like to contribute to this discussion/feature, because I have use cases, and some time to dedicate to it. But if y'all aren't ready for it, then now's not the time | 21:51 |
jeblair | well, it's more "us" than "ya'll" i hope :) | 21:52 |
jlk | that's... entirely fair. I've been checked out lately and I slipped phrasing | 21:52 |
jeblair | to be fair "us aren't ready for it" isn't great phrasing either. heh. :) | 21:53 |
jeblair | i'm apparently englishing poorly today | 21:54 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul feature/zuulv3: Normalize daemon process handling https://review.openstack.org/517381 | 21:55 |
openstackgerrit | Monty Taylor proposed openstack-infra/zuul-jobs master: Add general sphinx and reno jobs and role https://review.openstack.org/521142 | 21:59 |
clarkb | meeting time now right? | 21:59 |
jeblair | yep! | 22:04 |
jeblair | in #openstack-meeting-alt | 22:04 |
pabelanger | I'll only be able to attend first 30mins today | 22:04 |
jeblair | jlk, SpamapS, leifmadsen, Shrews, dmsimard: ^ | 22:05 |
*** hashar has quit IRC | 22:06 | |
jeblair | mordred: ^ | 22:07 |
leifmadsen | I can't make 5pm meetings | 22:09 |
leifmadsen | so won't be there :) | 22:09 |
*** jlk has quit IRC | 22:31 | |
*** jlk has joined #zuul | 22:32 | |
*** jlk has quit IRC | 22:32 | |
*** jlk has joined #zuul | 22:32 | |
openstackgerrit | Monty Taylor proposed openstack-infra/zuul-jobs master: Add general sphinx and reno jobs and role https://review.openstack.org/521142 | 22:38 |
dmsimard | oh hey just got my notification that the zuul meeting is in 10 minutes from now .. :/ | 22:50 |
openstackgerrit | Merged openstack-infra/zuul-jobs master: Make build-python-release job https://review.openstack.org/513925 | 22:56 |
ianw | dmsimard: just be sure not to change the timeline, don't want to have a marty mcfly situation on our hands :) | 23:00 |
clarkb | jeblair: re zuul config breakages the one I'm most familiar with is the parent to final job one so I'll start with that | 23:01 |
clarkb | jeblair: one of the release jobs that only runs on tags was marked final. Neutron/Horizon then parented that job in order to add required-projects to the job. This merged with no errors. The problem then only came up when a tag was made and zuul said no, that job is final | 23:02 |
clarkb | this is less than ideal because it could be quite a bit of time between jobs. Maybe we handle this not with zuul self checking internally but by running a job that does an actual compile of the jobs? | 23:02 |
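
[A sketch of the configuration shape clarkb describes; the job names are illustrative. Per the description above, the child's modification is only rejected when the job is frozen at run time, i.e. when the tag event finally arrives:]

```yaml
- job:
    name: release-python-tarball
    final: true
    run: playbooks/release/run.yaml

- job:
    name: neutron-release-tarball
    parent: release-python-tarball     # rejected at run time because the parent is final
    required-projects:
      - openstack/neutron-lib
```
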
dmsimard | jlk: the support for different versions of Ansible doesn't necessarily have to come out of Zuul.. for example for ARA integration testing I just install a "nested" ansible of the desired version on the target nodes | 23:03 |
dmsimard | jlk: because I want to test that ara works with different version of ansible/py27/py35/etc | 23:03 |
clarkb | I'm guessing that the "binding" of final happens far too late in the compile process for the simpler syntax checking to catch it | 23:03 |
jeblair | clarkb: yeah, that's a run-time error which is impossible (or at least very difficult) to detect in advance | 23:03 |
dmsimard | jlk: however, it sort of sucks because then the "zuul" ara report contains just one large command instead of the granular tasks | 23:04 |
jlk | dmsimard: right. That was my intent. I don't think you should be able to select which executor your job runs from based on the ansible version on the executor | 23:04 |
jlk | otherwise I think it feels like being overly dependent on the fact that Zuul runs Ansible under the hood, making it that much harder to change (if we ever changed) | 23:04 |
clarkb | jeblair: I'm digging through gerrit changes to find the error that prevented zuul from starting now | 23:05 |
jeblair | clarkb: basically, you can construct combinations of variants that could theoretically trigger or not trigger such problems. so it's hard to detect in advance. | 23:05 |
dmsimard | jlk: I still think, however, that "forcing" the upgrade from 2.3 to 2.4 on users is dangerous | 23:05 |
clarkb | jeblair: https://review.openstack.org/#/c/519949/ is the change | 23:05 |
clarkb | https://review.openstack.org/#/c/520205/ is another that we may want to look at as far as pre merge testing goes | 23:05 |
dmsimard | jlk: historically there has always been issues between "major" versions (2.0 -> 2.1 -> 2.2 -> 2.3 -> 2.4) | 23:06 |
jlk | yeah, do you pin that to major zuul versions? | 23:06 |
dmsimard | people usually pin ansible for that reason | 23:06 |
clarkb | we've had issues with minor updates too fwiw | 23:06 |
dmsimard | clarkb: yeah, but they're not as common | 23:06 |
jeblair | clarkb: there is perhaps a simple case where you could say that all variants for a certain job are final and therefore there would be an error. but that's hard, and it's half a solution, so i worry about whether it's a good idea. | 23:06 |
dmsimard | they've gotten better at testing but their internal API is not stable | 23:06 |
jlk | yeah there's two concerns at play | 23:07 |
jlk | will Zuul's integration with Ansible continue to work | 23:07 |
clarkb | jeblair: ya thinking it might be better to have a job that compiles the zuul config outside the running zuul? | 23:07 |
clarkb | jlk: if that makes sense? | 23:07 |
jlk | and will end user playbooks continue to work as expected | 23:07 |
jeblair | clarkb: well, the thing is that the job doesn't exist until it's run. it's based on variants matching a specific change/ref/etc | 23:08 |
dmsimard | clarkb: my idea around testing base jobs was along these lines, run a nested zuul (complete with executor, with a second node to be used in the upcoming static nodepool driver) and then do a "manual" zuul enqueue of the real job | 23:08 |
jlk | Heisenjob | 23:08 |
dmsimard | clarkb: but the problem with that is that you still need to load like 2000 repositories worth of configuration | 23:09 |
clarkb | dmsimard: ya | 23:09 |
clarkb | jeblair: gotcha | 23:09 |
openstackgerrit | Monty Taylor proposed openstack-infra/zuul-jobs master: Update fetch sphinx output to use sphinx vars https://review.openstack.org/521590 | 23:09 |
dmsimard | clarkb: but then the nested zuul also doesn't have the private keys to decrypt the secrets | 23:09 |
dmsimard | which is totally okay, because otherwise that'd allow people to peek at things | 23:10 |
clarkb | I wonder, could we do a lint type check that just walks up the parent tree for finals? | 23:10 |
clarkb | jeblair: ^ | 23:10 |
clarkb | maybe thats the case you mean where all variants are final | 23:10 |
jeblair | clarkb: that's the thing i've probably been unable to express clearly -- even the parent hierarchy isn't fully determined until the job runs | 23:11 |
clarkb | jeblair: got it | 23:11 |
openstackgerrit | Monty Taylor proposed openstack-infra/zuul-jobs master: Add support for warning-is-error to sphinx role https://review.openstack.org/521618 | 23:11 |
jeblair | if you say "parent: foo" and "foo" has N variants, we don't know which will apply. it could be anywhere from 0-N. | 23:11 |
jeblair | (if 0 apply, the job doesn't run) | 23:12 |
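
[A sketch of the variant situation jeblair describes; job, branch, and nodeset names are illustrative. Which "foo" variant (if any) "bar" inherits from depends on the change being tested, so it is only known at run time:]

```yaml
- job:
    name: foo
    branches: master
    nodeset: ubuntu-xenial

- job:
    name: foo
    branches: stable/pike
    nodeset: ubuntu-trusty

- job:
    name: bar
    parent: foo      # resolves to 0, 1, or 2 of the variants above depending on the change
```
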
clarkb | ok let's ignore that one for now because it's relatively minor and not easily fixable. https://review.openstack.org/#/c/519949/1 is the fix for the thing that prevented zuul from starting | 23:12 |
clarkb | I think ^ is more important as it impacts the ability to run zuul | 23:12 |
clarkb | (also if anyone is wondering we think we tracked back the source of the OOM to puppet openstack pushing a ton of job changes all at once which I think is a known issue, we asked OSA to push them slowly) | 23:13 |
jeblair | clarkb: i suspect that error came in due to mordred's git surgery? | 23:15 |
openstackgerrit | Monty Taylor proposed openstack-infra/zuul-jobs master: Add support for warning-is-error to sphinx role https://review.openstack.org/521618 | 23:15 |
jeblair | clarkb: i was wondering, thanks | 23:15 |
clarkb | jeblair: ya I think .zuul.yaml must've come from os-c-c | 23:15 |
clarkb | but that should still have failed pre merge no? | 23:15 |
jeblair | clarkb: we should be in much better shape than when osa did it originally, but still, lots of job changes can use lots of ram | 23:15 |
jeblair | clarkb: i think mordred did git surgery to create that branch and push it directly | 23:16 |
clarkb | ah | 23:16 |
clarkb | that may be the piece I am missing | 23:16 |
jeblair | it is likely that caused zuul to get stuck on an old config as well | 23:16 |
jeblair | (it would have kept running with the last config it was able to fully load) | 23:16 |
clarkb | lesson here then is be very careful with force merges | 23:16 |
* mordred reads | 23:17 | |
jeblair | one thing that will make this very specific case better: we plan to drop the project name from in-repo config files | 23:17 |
mordred | AH. derp | 23:17 |
mordred | yah. that's my bad for sure | 23:18 |
jeblair | but the general problem of dealing with erroneous configs remains. i think we'll eventually have to do something like have zuul automatically remove projects with broken configs. once we have the dashboard, there will be a nice place to have a big red warning that that has happened. | 23:18 |
mordred | we can actually delete that branch already ... it was just there to prevent gerrit from creating thousands of gerrit changes | 23:18 |
jeblair | (that could have lots of follow-on effects, like breaking other projects, but i don't think there's anything else that could be done) | 23:19 |
mordred | in fact, how about I go ahead and delete it now | 23:19 |
mordred | jeblair: ++ | 23:19 |
SpamapS | sorry to miss the meeting.. had a long drive and we got moving late :-P | 23:46 |
jeblair | no meeting while driving! :) | 23:46 |
SpamapS | FYI, I use my zuul to test our ansible deployment stuff, which is pinned at ansible 2.1 | 23:56 |
SpamapS | I just install ansible in a virtualenv on a node and run it. | 23:56 |
SpamapS | Which is good, because I wouldn't want to mix the concerns of Zuul with the concerns of my k8s deploying ansible code. | 23:57 |
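
[A minimal sketch of the nested-ansible approach SpamapS describes, pinning the version independently of the Ansible zuul itself runs; the virtualenv path, version range, repo checkout, and playbook name are all hypothetical:]

```yaml
- hosts: all
  tasks:
    - name: Install the pinned Ansible into a virtualenv on the node
      pip:
        name: "ansible>=2.1,<2.2"
        virtualenv: /home/zuul/nested-ansible

    - name: Run the deployment playbook with the nested Ansible
      command: /home/zuul/nested-ansible/bin/ansible-playbook site.yml
      args:
        chdir: /home/zuul/src/deployment-repo
```
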