Wednesday, 2017-06-28

00:18 *** dmsimard has quit IRC
00:20 *** dmsimard has joined #zuul
00:29 <mordred> jeblair: yes - that was my thinking - the string there is a thing for a log entry
00:29 <mordred> jeblair: however, I don't feel strongly one way or the other
00:35 <mordred> jeblair: also - yes, your playbook in the etherpad looks right to me
01:13 *** xinliang_ has quit IRC
01:24 *** xinliang_ has joined #zuul
02:33 *** timrc has quit IRC
02:42 *** timrc has joined #zuul
02:54 <xinliang_> tristanC: I figured out the reason why my CI won't trigger jobs. I was confused by the two projects: openstack-dev/ci-sandbox and openstack-dev/sandbox
02:55 <xinliang_> My CI is listening to the ci-sandbox project, but I added the recheck comment to the sandbox project, so it didn't trigger a job ;-)
02:55 <xinliang_> now it is working, thanks.
02:57 <tristanC> xinliang_: nice! good to hear :-)
02:57 <xinliang_> :-)
03:00 <tristanC> xinliang_: you might want to set up the zuul status web page, it's quite handy for checking what's going on. For example using http://git.openstack.org/cgit/openstack-infra/puppet-zuul/tree/templates/zuul.vhost.erb
03:01 <tristanC> and copying zuul/etc/status/public_html over to /var/lib/zuul/www
03:02 <xinliang_> tristanC: sounds great! will try to set it up
03:27 <xinliang_> tristanC: The zuul web is already set up by puppet.
03:29 <xinliang_> tristanC: One other question: do you know of any way to run CI jobs on bare metal machines?
03:30 <xinliang_> currently, all the openstack CI jobs are running on VMs, right?
03:33 <tristanC> xinliang_: with jenkins, you can set up a jenkins slave on bare metal, and if it has the right node label, then it will pick up zuul jobs
03:34 <tristanC> xinliang_: with zuul-launcher, you can set up custom nodes in zuul.conf so that jobs get scheduled directly to a specific node
03:35 <tristanC> xinliang_: and yes, currently all the openstack CI jobs run on ephemeral VMs, so that the environment is identical between runs
03:41 <xinliang_> tristanC: thanks, I heard that openstack CI has migrated away from jenkins, so jobs are only managed by zuul, right?
03:42 <tristanC> xinliang_: yes, zuul 2.5 has a zuul-launcher service that can execute jenkins-job-builder jobs in place of jenkins
03:43 <tristanC> xinliang_: and the next version (zuul v3) will replace the jjb definitions with ansible playbooks
03:44 <xinliang_> that sounds good, which makes the ci system simpler
03:44 <xinliang_> And if we run jobs on bare metal machines, is there any upstream tool for managing the metal machines?
03:46 <tristanC> not afaik, you'll have to build something custom, with ironic for example
03:47 <xinliang_> ok
03:47 <tristanC> at least you can prevent a slave from running more than one job using the OFFLINE_NODE_WHEN_COMPLETE job parameter
03:48 <clarkb> nodepool works with ironic and nova
03:48 <clarkb> people are using it that way for some ci
03:49 <xinliang_> clarkb: great! good to hear this.
03:50 <tristanC> clarkb: oh, didn't know that, good to know!
05:09 *** lennyb has quit IRC
05:21 *** lennyb has joined #zuul
05:51 *** isaacb has joined #zuul
06:31 *** isaacb has quit IRC
08:37 *** isaacb has joined #zuul
08:47 *** hashar has joined #zuul
10:00 *** jkilpatr has quit IRC
10:34 <openstackgerrit> Jamie Lennox proposed openstack-infra/zuul feature/zuulv3: Common threading layer  https://review.openstack.org/478466
10:58 *** jkilpatr has joined #zuul
11:41 *** jkilpatr has quit IRC
11:52 *** dkranz has quit IRC
12:01 *** jkilpatr has joined #zuul
12:11 *** jkilpatr has quit IRC
12:12 *** jkilpatr has joined #zuul
12:28 *** jkilpatr has quit IRC
13:10 <openstackgerrit> Jamie Lennox proposed openstack-infra/zuul feature/zuulv3: Common threading layer  https://review.openstack.org/478466
13:27 *** dkranz has joined #zuul
14:44 *** rcarrill1 is now known as rcarrillocruz
14:58 *** isaacb has quit IRC
15:37 *** jkilpatr has joined #zuul
15:56 <jeblair> mordred: i think we're going to run into tobiash's role-repo problem soon.  i think we hadn't noticed it because of our openstack-zuul-jobs/openstack-zuul-roles split, but as we move roles into zuul-jobs and project-config, we're going to need to name those same repos as role repos.
16:04 <mordred> jeblair: yay!
16:04 <jeblair> mordred: so i was thinking i might take a stab at implementing that rather than writing it up as a story :)
16:05 <jeblair> mordred: can you take a look at 478313 and 478315 ?
16:05 <jeblair> i'd like to merge those and 478311 and restart the executor and see if we can't get log publishing going
16:08 <mordred> jeblair: ++
16:09 <mordred> jeblair: for 478313 - as a temp workaround while you fix the tobiash bug, we could also just add openstack-infra/zuul-jobs to the roles list of the base job
16:09 <jeblair> mordred: yeah i think we'll need to do that
16:10 <mordred> jeblair: I can do that as a followup patch
16:11 <jeblair> mordred: cool
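For reference, adding a repo to a job's roles list in zuul v3 looks roughly like this (a sketch; the actual base-job definition in project-config may differ):

    - job:
        name: base
        roles:
          # Make the Ansible roles in openstack-infra/zuul-jobs available
          # to this job and everything that inherits from it.
          - zuul: openstack-infra/zuul-jobs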
16:11 <mordred> jeblair: oh - I just realized something - hosts: all roles: collect-logs - is the collect-logs role handling overlapping log file names?
16:11 <jeblair> mordred: it will handle them by clobbering them right now.  :)
16:11 <mordred> :)
16:11 * mordred will file story for that
16:12 <tobiash> mordred: you might want to encode inventory_hostname into the destination dir
16:12 <jeblair> mordred: so that's an interesting thing -- do we want the collect-logs to ... what tobiash said :)
16:12 <jeblair> mordred, tobiash: maybe yes, if there's more than one host, but no if there's only one host
16:13 <jeblair> (that way you don't have to navigate to logs/ubuntu-xenial/subunit.html for a unit test)
16:14 <tobiash> jeblair: that should be possible with ansible
16:14 <mordred> https://storyboard.openstack.org/#!/story/2001092
16:14 <mordred> funny - I wrote all of those things into the story :)
16:15 <jeblair> it's like we're all on the same page :)
16:16 <mordred> there's probably a third case - which is "what if a job wants to do pre-rationalization" - like I could imagine devstack maybe wanting the logs from the 'controller' node to show up in the base logs dir as "the logs" - but still have subdirs for each additional node
16:16 <mordred> which I don't think the base job needs to know about
16:16 <mordred> but maybe there should be a way for a job to do some log collection and then say "I've already done this work, please do not collect logs for me"
16:18 <mordred> I added a note about that to the story
16:18 <jeblair> mordred, tobiash: or... perhaps we leave the role as is -- it copies every logs/ dir from every host, which is fine for the single-host case.  and if you write a multi-node job, well, it's not going to do anything by default.  you just expect to pre-rationalize them.  ie, devstack may pull all of its logs into the logs/ dir on the controller, in whatever layout it wants.
16:19 <tobiash> jeblair: that would also be an option
16:19 <jeblair> (i mean, a job has to *put* something into the logs/ dir on the node -- so just say that the job needs to be responsible for understanding what it's putting there)
16:21 <mordred> jeblair: I like that as an answer to the follow-up about "what to do about advanced jobs like devstack" ... I kind of like the system automatically handling multi-node jobs with host directory logs
16:22 <mordred> since it's kind of a neat way to support multi-node - and if the rationalize step is just "is there content in only one hostname dir in the local logs dir? if so, collapse" - then devstack can simply move all of the logs to the 'controller' node
16:23 <jeblair> mordred: yeah, if we implement it that way, we may have the best of both worlds.  if you accidentally or on purpose have content in logs/ on more than one host, we rationalize it for you.  if you know what you're doing, you don't end up in that position and we do nothing.
16:24 <mordred> ++
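A minimal Ansible sketch of the collect-then-collapse idea just agreed on (paths and the log_root variable are illustrative, not the actual collect-logs role): pull logs/ from every node into a per-hostname directory on the executor, then flatten when only one host produced anything:

    # Pull each node's logs/ into a per-hostname directory.
    - hosts: all
      tasks:
        - name: Collect logs from each node
          synchronize:
            mode: pull
            src: "{{ ansible_user_dir }}/logs/"
            dest: "{{ log_root }}/{{ inventory_hostname }}/"

    # Collapse the per-hostname layer when only one host had logs.
    - hosts: localhost
      tasks:
        - name: Find per-hostname log directories
          find:
            paths: "{{ log_root }}"
            file_type: directory
          register: host_dirs

        - name: Flatten single-host logs into the top of log_root
          shell: mv {{ host_dirs.files[0].path }}/* {{ log_root }}/
          when: host_dirs.files | length == 1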
* tobiash is heading home now16:24
*** bhavik1 has joined #zuul16:30
*** bhavik1 has quit IRC16:32
dmsimardThe zuul executor nodes, are they localized to each region or is it a global zuul cluster somewhere ?16:34
dmsimardhttps://github.com/openstack-infra/system-config/blob/master/hiera/common.yaml suggests it's a common cluster somewhere16:35
mordreddmsimard: common cluster16:36
mordreddmsimard: the communication between scheduler and executors is over gear - communication between regions from executors to nodes is over ssh - which is better suited to WAN traffic16:37
dmsimardmordred: fair, I guess the hidden question behind that was did you ever consider localizing more parts of the infrastructure (such as logs) -- cause that common cluster (for example) could end up pulling logs from an EMEA region to USA16:38
dmsimardI'm not really concerned about things like data privacy laws, but more around latency and throughput16:38
dmsimardI guess decentralizing the infrastructure would have a cost on complexity/management16:39
dmsimard(not just in software but in human resources)16:39
mordreddmsimard: it has come up in conversation a few times (for openstack, we do, in fact, pull logs from EMEA to USA) - but doing so would require teaching zuul more about clouds and regions, and at the moment that's not information zuul takes in to account when scheduling16:40
dmsimardyeah, that logic lives in nodepool.16:40
mordreddmsimard: since it's been fine so far at infra-scale, the idea hasn't really bubbled up to the point of being a problem that needs solving16:40
dmsimardalright, thanks :)16:40
clarkbfwiw our bandwidth between emea and north america is really great16:41
clarkbso as long as you aren't sensitive to latency you tend to not have problems16:41
clarkbour clouds in europe seem to have really solid network setups (ovh and citycloud)16:42
dmsimardyeah, it's also okay not to prematurely optimize things until there's a real problem16:42
dmsimardif it works at infra scale it's good enough for me :)16:43
mordreddmsimard: :)16:43
mordreddmsimard: it's nice that we have a stupily big version for data isn't it? :)16:43
clarkb(afs is sensitive to latency so I've got a couple ideas to test in order to make that better, but currently focused on gerrit upgrade items)16:43
dmsimardmordred: the 14x1TB cinder volume over LVM is still a bit mind boggling :)16:44
clarkbdmsimard: and its too small!16:44
clarkb:)16:44
dmsimardclarkb: I know right, part of why decentralizing that would perhaps ease the burden :p16:45
dmsimardbut fungi said it's a WIP to move to a bigger node/provider (at which point does it make sense to scale horizontally instead?)16:45
clarkbpotentially. We want to switch providers to avoid volume maintenance taking out the entire fs as first priority16:46
clarkbbut sounds like there may be some room for scaling up too16:46
clarkbthe problem with 14 1TB volumes under a single fs is that anytime that cloud does maintenance on their volumes at least one of our volumes seems to be affected16:47
clarkbs/the/a/16:47
dmsimardyeah that's awful16:47
*** hashar has quit IRC16:48
fungiit's an outage magnet, more or less16:49
fungicoupled with needing a day or more outage to our ci system to perform a full fsck when that happens16:50
fungiso, so, so many tiny files16:50
*** jkilpatr has quit IRC16:50
mordreddmsimard: one of the biggest issues with de-centralized log storage is that we'd lose a global index for the logs - knowing the change id would no longer be enough to find the build logs - you'd also have to know what cloud it ran on16:51
fungidecentralized log storage seems fine to me as long as there's a centralized index16:51
mordreddmsimard: this, of course, is solvable, as all things are - mostly just pointing out one of the issues that can make it slightly more complex to deal with16:51
mordredfungi: yup16:51
jeblairsomeone said something about a zuul dashboard... :)16:51
fungiif only the thing deciding where the logs go knew where the logs went...16:52
clarkbjust tested to reassure myself and we get about 10MBps from gra1 to dfw on a 57MB file16:52
jeblairspeaking of logs, i'm going to try restarting the executor now16:52
clarkbwhich isn't bad for crossing an ocean. Better than my home connection16:52
jeblairand i will restart the scheduler16:57
17:04 <rcarrillocruz> hey folks, noob question: I have a nodepool node stuck in a locked state and can't delete it. I assume this can be unlocked with some zookeeper command?
17:05 <Shrews> rcarrillocruz: weird. what state does ZK report the node in?
17:05 <Shrews> READY, BUILDING, etc
17:05 <rcarrillocruz> deleting
17:06 <Shrews> rcarrillocruz: hmm, is the cleanup thread running?
17:06 <rcarrillocruz> should be, cos I can delete other nodes just fine
17:06 <rcarrillocruz> but can't this one
17:07 <jlk> o/
17:08 <Shrews> would be good if we could figure out how it got into this state
17:08 <openstackgerrit> James E. Blair proposed openstack-infra/zuul feature/zuulv3: Create a new logger for gerrit IO  https://review.openstack.org/478566
17:09 <rcarrillocruz> Shrews: can't really tell how i got there, any hint how to unlock the node to delete it tho?
17:10 <Shrews> rcarrillocruz: do you have zk-shell?
17:10 <rcarrillocruz> let me see
17:10 <rcarrillocruz> nope, is that on pypi?
17:10 <Shrews> https://github.com/rgs1/zk_shell
17:10 <rcarrillocruz> yah, pip installing it
17:11 <jeblair> mordred: speedy +3 on 478570 pls?
17:11 <rcarrillocruz> lol, it installs twitter libs ?
17:11 <rcarrillocruz> anyway, i'm in the zk-shell *shell* now Shrews
17:12 <rcarrillocruz> oh neat
17:12 <rcarrillocruz> so
17:12 <rcarrillocruz> zookeeper has a tree structure
17:12 <Shrews> rcarrillocruz: using that, just rm the znode
17:12 <jeblair> wait
17:12 <rcarrillocruz> ls nodepool/nodes
17:13 <jeblair> do not rm the node
17:13 <jeblair> give me a second to find some debug commands
17:14 * rcarrillocruz waits
17:14 <mordred> jeblair: done
17:14 <jeblair> (it's never okay for a nodepool admin to have to use zk-shell, so we need to debug any time it happens)
17:14 <jeblair> rcarrillocruz: run 'dump' and paste the output please
17:16 <jeblair> you can actually run that without zk-shell
17:17 <jeblair> echo dump|nc localhost 2181
17:17 <jeblair> will work
17:17 <jeblair> https://zookeeper.apache.org/doc/trunk/zookeeperAdmin.html#The+Four+Letter+Words
17:17 <jeblair> (but zk-shell also supports it)
17:17 <rcarrillocruz> there you go
17:17 <rcarrillocruz> https://paste.fedoraproject.org/paste/FmmFyItw~8KpVWFZIVz0wA
17:19 <jeblair> so it is the poolworker that holds the lock
17:20 <jeblair> rcarrillocruz: are you running with debug level logging?
17:22 <rcarrillocruz> i'm not running nodepool in the foreground with a debug level set, if that's what you're asking; it's installed with an ansible role and started with systemd scripts. I can tail launcher-debug.log
17:23 <jeblair> rcarrillocruz: no, i just meant debug logs -- if "DEBUG" entries show up in launcher-debug.log, that's what i'm looking for
17:23 <rcarrillocruz> yeah, they show up
17:23 <jeblair> rcarrillocruz: can you grep "00166" in launcher-debug.log?  it will also have a server UUID on the "Waiting for server" line.  also grep for that UUID and pastebin both please?
17:25 <rcarrillocruz> http://paste.openstack.org/show/613957/
17:25 <rcarrillocruz> that's the last hit
17:25 <rcarrillocruz> otoh
17:25 <rcarrillocruz> i spotted an error
17:25 <rcarrillocruz> sec, i'll paste
17:26 <rcarrillocruz> http://paste.openstack.org/show/613958/
17:26 <jeblair> rcarrillocruz: perfect, thanks!
17:27 <rcarrillocruz> sorry for the rightmost dots, i'm in a tmux session, some other user must have a different window size
17:27 *** Shuo has joined #zuul
17:27 <jeblair> rcarrillocruz, Shrews: i wonder, since we have a stuck ephemeral node, if the best solution might be to restart the launcher rather than manually delete the zk node?  Shrews, what do you think?
17:28 <Shrews> that would work
17:28 <jeblair> rcarrillocruz: oh, that traceback is useful, though i'm not convinced that's the problem
17:28 <jeblair> i think it may be from a different thread
17:29 <jeblair> rcarrillocruz: were there any more interesting log entries around 2017-06-27 17:01:12,489 ?
17:29 <rcarrillocruz> let me look
17:29 <Shrews> rcarrillocruz: did you shutdown/restart or otherwise do something to zookeeper while nodepool was running?
17:29 <jeblair> (that was the "deleting ZK node" line)
17:35 *** bhavik1 has joined #zuul
17:42 *** bhavik1 has quit IRC
17:43 <Shuo> What's zuul v3's dependency on nodepool? Can a zuul environment work without it?
17:43 <jlk> Zuul relies on nodepool to provide nodes on which to execute the tests
17:44 <jlk> s/tests/jobs/
17:44 <SpamapS> Shuo: I'd say that zuulv3 needs _something_ to be on the other side of ZK. Nodepool is currently the only thing that implements that, so it's an effective dependency, but you could, in theory, write something else to do nodepool's job.
17:44 <jlk> Without nodepool, there are very limited things Zuul can do
17:45 <SpamapS> I've oft wondered whether, when we get to a place of stability in that interface, we could write smaller shims for clouds that have their own nodepool-ish things. Like AWS autoscale groups, for instance.
17:46 <SpamapS> sans nodepool, you can only do noop jobs.. I think?
17:48 <mordred> SpamapS: yah - although those are also likely good backend modules for nodepool itself
17:49 <mordred> a heat driver got brought up the other day, for instance - but there are some design questions we'll need to think about in such cases
17:51 <tobiash> I think it could be possible to run jobs without a nodeset locally on the executor
17:51 <jeblair> yeah, there's enough overhead to the protocol and resource management that i think it's going to be useful to think of most species of that problem as another nodepool driver (possibly even an out-of-tree driver in the future).  but, yeah, if we run into a problem from a different genus, that might be a good option for just writing to the zk api.
17:52 <jeblair> tobiash: correct; a job with no nodes can still do things on the executor.  that could come in handy for things like "trigger this remote webhook".
17:52 <Shuo> SpamapS: I wonder if something like Kubernetes/Mesos could be placed in nodepool's role
17:52 <tobiash> that would restrict the jobs to trusted jobs (not sure it makes any sense for a nodepool-less deployment)
17:53 <jeblair> tobiash: untrusted jobs run on the executor too and can still be useful.  like that webhook example.  it doesn't need to be trusted.
17:53 <Shuo> I am asking this because we are not an OpenStack shop, nor a VM shop (in our data center)
17:53 <jlk> SpamapS: you can't even do a "noop" job without nodepool. That doesn't work outside of the test runner
17:54 <mordred> Shuo: writing a k8s/mesos driver for nodepool is definitely a thing that we should do
17:54 <tobiash> jeblair: ah, right, missed things like that
17:55 <jeblair> Shuo: yeah, kubernetes, etc, support in nodepool is on the roadmap, but we don't know what it will look like yet.  mesos may be easier.  :)
17:55 <jlk> I think there's some code needed to allow zuul to run a job without a node
17:55 <Shuo> jlk: if my plan is to avoid the dependency on nodepool (and, say, I can bear with a static resource capacity at this moment), can I have some kind of setup?
17:55 <jeblair> jlk: i think it should work -- that's actually the code path for about 85% of the tests :)
17:56 <jeblair> Shuo: we plan on using nodepool to manage static resources as well.  there are actually patches in progress for that.
17:56 <jlk> jeblair: oh? I'd like to know more! I thought it was special-cased for the test runner.
17:56 <mordred> Shuo: static resource capacity is another thing that should come from nodepool - there isn't really any intent to modify zuul to not need nodepool - however, tristanC is working on the patches to allow nodepool to have statically defined resources
17:56 <mordred> Shuo: as that's a thing many people need
17:57 <Shuo> jeblair: that would be great because my current thought is to make a simple cluster to make the case for zuul.
17:57 <mordred> Shuo: ++
17:57 <jeblair> jlk: i *think* it's just an empty node request.  it might still go out to nodepool to be satisfied (which is silly).  let's test it later after i get logs working.  :)
17:57 <jlk> fair enough
18:00 <SpamapS> jlk: oh, fake nodepool to the rescue I guess ;-)
18:00 <SpamapS> funny story...
18:00 <SpamapS> I think if you had nodeset-less jobs..
18:00 <SpamapS> you could just scale out executors
18:00 <SpamapS> with "whatever method you like"
18:00 <Shuo> jeblair: "let's test it later after i get logs working.  :)" sounds like a very quick ETA? My time horizon is around 5 weeks :-)
18:00 <SpamapS> and if you don't want to let job definers define their host OS... that would be a viable Zuul v3 use case.
18:01 <Shuo> mordred: glad this resonates :-)
18:01 <SpamapS> like if all you want to let them do is run stuff that will also be on the executor.. I see no problem with that.
18:01 <SpamapS> (it's not something I want... but.. I can see a case for it)
18:02 <jlk> I could see a case for it
18:02 <jlk> zuul as a relay agent to other systems
18:02 <jlk> handle ingest, gate keeping, and reporting, but farm "work" out to somewhere else.
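A sketch of what such a nodeset-less job could look like in zuul v3 configuration (the job and playbook names are hypothetical); with an empty nodeset, the run playbook executes only on the executor:

    - job:
        name: notify-webhook
        # No nodes requested from nodepool: the playbook runs on the
        # executor itself, as discussed above.
        nodeset:
          nodes: []
        run: playbooks/notify-webhook.yaml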
Shuojeblair: "mesos may be easier.  :)" if that's something I can manage to contribute to (bear in mind that I don't know any zuul internals), I'd love to take someone to my plate.18:03
jlkShuo: that'd be more of a nodepool contribution18:03
Shrewstfw you spend an hour debugging a client/server interaction to realize a lack of \n caused all of your problems18:03
Shuojlk: do we have an up-to-date nodepool archtiecture diagram?18:04
jlkShuo: I do not know the answer to that18:04
Shuojeblair: do we have an up-to-date zuul (working with nodepool) architecture diagram?18:04
jeblairShuo: nope18:05
Shrewswell, we have this (https://github.com/openstack-infra/nodepool/blob/feature/zuulv3/doc/source/devguide.rst) though we've let that get slightly out of date. but it is close18:05
Shrewsand only covers the builder18:05
ShuoShrews: thanks.18:06
jeblairShuo: we're just now working out what the internal nodepool architecture is for multiple drivers18:06
Shrewsoh, yeah, def no diagrams for the drivers18:06
Shuojeblair: not meant to interrupt or demand any commitment, but if documentation is something that I can help on, I'd love to help on as I hope to make zuul internally well discussed.18:07
ShuoShrews: thanks for sharing. btw, are you working on nodepool specifically?18:08
jeblairShuo: you may be interested in taking a look at 463328 and its 12 child changes18:08
jeblairShuo: https://review.openstack.org/46332818:08
ShrewsShuo: i rewrote most parts of nodepool to implement jeblair's v3 specs. not actively working on any changes to it right now.18:09
*** jkilpatr has joined #zuul18:09
* SpamapS now wonders about something like a set of ansible modules that manage mesos things that would just run on the executor.18:10
openstackgerritJames E. Blair proposed openstack-infra/zuul feature/zuulv3: Make sure playbook and role repos are updated  https://review.openstack.org/47762718:10
mordredSpamapS: yah - passing in mesos/k8s endpoint information as part of the inventory is definitely a thing thathas come up in the discussions we've had so far when we've looked in to container workloads intended for such systems - there's a few fun and complex issues that come up once the whiteboard comes out that it's going to be a fun conversation when we all have the brainspace to tackle it18:13
jeblairSpamapS: it would work, but i don't think it's a good fit for most kinds of jobs.  we gave up on that architecture with jenkins 6 years ago when openstack was 1/200th the size it is now.  managing resources separately from job execution makes a lot of things better at all scales.18:14
ShuoShrews: reading the pointer you shared now. maybe a dump question, can you give an example of "BuildWorker" (is that a single machine or is that an endpoint of a driver such as mesos master)?18:15
jeblairmordred, SpamapS: we need to land https://review.openstack.org/477627 and restart the executor again.  we're running into the problem that fixes.18:16
mordredjeblair: +2 from me - SpamapS you got a sec to look at it?18:16
* SpamapS looks18:19
ShrewsShuo: that diagram is not useful to you in relation to drivers. It's for the nodepool-builder (nodepool/builder.py). You would be interested in nodepool-launcher (nodepool/launcher.py).18:19
SpamapSso many updates18:20
SpamapSmordred: I feel like maybe mesos and k8s include better resource management themselves than jenkins + openstacks. But I could be wrong.18:21
* SpamapS hears Dak ... STAY ON TARGET...18:21
SpamapSI +3'd 477627, but do we need to kick it to get it into the gate?18:22
SpamapSah no, it's just rechecking18:22
mordredSpamapS: I'm not necessarily agreeing or disagreeing with that - I'm saying there's a bunch of other bits that get tricky, and it'll be a fun problem to work on in a few months because I think solving it will be valuable to many people18:23
SpamapSmordred: Right.. first let's get these torpedos into the exhaust port.18:29
mordredSpamapS: ++18:31
openstackgerritMerged openstack-infra/zuul feature/zuulv3: Make sure playbook and role repos are updated  https://review.openstack.org/47762718:32
Shuojeblair: "https://review.openstack.org/463328" looks like a great start point for me. How do I start picking it up?18:35
jeblairShuo: i'm not sure i understand the question18:36
Shuojeblair: https://review.openstack.org/463328 is an active CL under review, how should I jump in and start working on it?18:38
openstackgerritTobias Henkel proposed openstack-infra/zuul feature/zuulv3: Fix and test report urls for unknown failures  https://review.openstack.org/46921418:46
jlkoh crap, I think I need to change my docker file for python3 needs now18:46
jeblairShuo: feel free to log into gerrit and submit comments; there are 12 patches after that one too18:48
Shuojeblair: thanks. leave for lunch for now, will be back and try to catch up in the afternoon. by now...18:49
openstackgerritDavid Shrewsbury proposed openstack-infra/zuul feature/zuulv3: WIP: Add web-based console log streaming  https://review.openstack.org/46335318:53
*** Shuo has quit IRC19:06
jeblairmordred: i'm not seeing any output from the post playbook: http://paste.openstack.org/show/613976/19:09
jeblairmordred: i'm re-running with verbose19:09
mordredjeblair: hrm. let me look in just a sec19:18
*** Guest28796 is now known as mgagne19:19
*** mgagne has quit IRC19:19
*** mgagne has joined #zuul19:19
20:20 *** Shuo has joined #zuul
20:21 <jeblair> i'm going to temporarily stop the nodepool launcher as a poor substitute for lacking autohold
20:28 *** Shuo has quit IRC
20:28 *** Shuo has joined #zuul
20:44 <mordred> jeblair: heya - sorry - I had a phone call with dkranz - any current debugging state I should start looking at? or just start from scratch?
20:45 <jeblair> mordred: nah, verbose didn't give any more output; i'm going to "hold" a node ^ so i can poke
20:45 <mordred> jeblair: kk
20:48 <jeblair> mordred: while i'm waiting -- i just realized that pabelanger implemented a log pull in the tox job -- so i think he was imagining that any jobs with interesting logs would do their own pulls back to the executor, whereas i imagined they would move things into ~/logs on the node.  i'm not sure my idea is better.  we may want to go with the pabelanger plan for a bit and see what that looks like.
20:50 <clarkb> jeblair: with proper secrets they don't have to use the executor as a staging area, right?
20:50 <clarkb> seems like avoiding that if possible would be good for bandwidth reasons
20:52 <jeblair> clarkb: technically... yeah, we could rsync directly from node to logserver; however, that means the node has some kind of credential access (not sure what ansible does there... agent forwarding?) after the node has run an untrusted job, so i get itchy about that.
20:52 <jeblair> clarkb: v2.5 does the staging thing and we haven't really noticed :)
20:53 <clarkb> that's true, and we staged with jenkins for years too (that is how scp worked)
20:58 <jeblair> mordred: there's some output (like the "PLAY" and "TASK" lines) going to the ZUUL_JOB_OUTPUT_FILE that isn't ending up in the zuul log because it's not going to stdout; i think that's normal -- but did we discuss maybe having everything go to stdout when we use -vvv?
21:03 <jeblair> in ansible, i want to have a "when:" conditional based on whether an attribute/key/whatever is defined.  in other words, sometimes "zuul.newrev" will be defined, sometimes it won't.  "when: zuul.newrev" throws an error when it doesn't exist... what's the right way to do that?
21:03 <mordred> jeblair: maybe? I mean, we can definitely do that
21:03 <jeblair> maybe jinja with a default value...?
21:04 <mordred> yah
21:04 <mordred> when: zuul.newrev | default(None) or something like that
21:05 <jeblair> [WARNING]: when statements should not include jinja2 templating delimiters such as {{ }} or {% %}. Found: {{ zuul.newrev | default(None) }}
21:05 <jeblair> i think it *worked* since it's just a warning....
21:07 <jeblair> oh, it just doesn't want {{}} because it's already a jinja string
21:07 <jeblair> yeah, so just dropping that works
21:07 <Shuo> jeblair: can you add me to the reviewers list? I don't seem to be able to publish my comment on https://review.openstack.org/#/c/463328/ for some reason (though I logged onto gerrit)
21:07 <jeblair> Shuo: that shouldn't be necessary -- what happens when you try to publish it?
21:08 <jeblair> mordred: apparently "zuul.newrev is defined" is also a thing
21:08 <mordred> jeblair: that seems clearer
21:08 <jeblair> mordred: ya, hard to argue with that :)
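As a playbook fragment, the form jeblair settled on looks like this (the task itself is illustrative):

    - name: Act only on ref-updated events where zuul.newrev exists
      debug:
        msg: "New revision: {{ zuul.newrev }}"
      # A bare "when: zuul.newrev" errors when the key is absent;
      # "is defined" handles both the present and absent cases.
      when: zuul.newrev is defined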
21:09 <Shuo> jeblair: I have written a full comment, and can 'save' it (now it's a draft in my web view) but can't publish -- sorry, long time no use gerrit :-)
21:10 *** dkranz has quit IRC
21:10 <jeblair> Shuo: after you do that, there's a "Reply..." button at the top of the main change screen; that should publish your comments
21:13 <jeblair> mordred: i don't see a way to set the user with add_host, but we need to log in as 'jenkins' to static.openstack.org (which we add to the inventory with add_host)
21:15 <jeblair> mordred: oh, wait, maybe i just add an ansible_user variable and it does it... trying
21:15 <Shuo> jeblair: done
21:15 <jeblair> Shuo: thanks!
21:15 <mordred> jeblair: yes - should be ansible_ssh_user iirc
21:16 <mordred> nope. ansible_user - ansible_ssh_user is the old one
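So the inventory addition under discussion would look something like this (host and user are from the conversation; the group name is illustrative):

    - name: Add the log server to the inventory
      add_host:
        name: static.openstack.org
        groups: logservers
        # Extra key=value arguments to add_host become host variables;
        # ansible_user sets the login user (ansible_ssh_user is the
        # older, deprecated spelling).
        ansible_user: jenkins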
21:16 <Shuo> also, sent an email to your @redhat.com, please take a look once you get a chance...
21:16 <Shuo> jeblair: also, sent an email to your @redhat.com, please take a look once you get a chance...
21:18 <jeblair> Shuo: will do
21:18 <Shuo> jeblair: thanks!
21:23 <jeblair> mordred: w00t! i got http://logs.openstack.org/76/478576/2/check/31feea4/logs/ to happen with a bunch of manual changes.  :)  patching now.
21:28 <jeblair> 21:26 < openstackgerrit> James E. Blair proposed openstack-infra/project-config master: Fix base post playbook  https://review.openstack.org/478639
21:28 <jeblair> 21:28 < openstackgerrit> James E. Blair proposed openstack-infra/openstack-zuul-roles master: Fix several issues with upload-logs role  https://review.openstack.org/478646
21:29 <jeblair> mordred: ^ can you +3 those?
21:36 <Shuo> I'd like to share this with the zuul community (see the 2nd reply to the first comment), I really feel zuul has huge potential to grow
21:36 <Shuo> https://lwn.net/Articles/702177/
21:51 <mordred> jeblair: on it - also - did you have to patch zuul itself? or just the playbooks?
21:51 <mordred> jeblair: and if it was just playbooks - do we need to address debuggability? or are these just hard because they deal with the log publication system, so chicken-and-egg?
21:53 <jeblair> mordred: i think both.  i think sending all output to stdout on -vvv will help.  as will the cmdline zuul-executor idea.
21:53 <jeblair> and, of course, autohold
21:54 <jeblair> i think with those 3 things, this would be human-debuggable.
21:54 <mordred> ++
22:02 <jlk> Interesting question time
22:02 <jlk> wait, no. n/m
22:04 <SpamapS> Shuo: interesting article
22:04 * SpamapS is reading it
22:04 <SpamapS> they completely missed that gerrit has keyboard shortcuts, for one.. and git-review..
22:04 <SpamapS> and "it's hard to do local testing" .. ???!!!
22:04 <SpamapS> the UI has the clone instructions right there (not to mention gertty just does it for you)
22:04 <SpamapS> weird
22:05 <jeblair> you can't argue with kernel developers
22:05 <jeblair> 2017-06-28 22:03:02,939 DEBUG zuul.AnsibleJob: [build: 4f9c2f7d45e9428ba1f09964d3cb9e33] Ansible output: b'fatal: [static.openstack.org]: UNREACHABLE! => {"changed": false, "msg": "SSH Error: data could not be sent to remote host \\"static.openstack.org\\". Make sure this host can be reached over ssh", "unreachable": true}'
22:05 <jeblair> okay that's new
22:06 <mordred> jeblair: that doesn't seem ideal
22:07 <SpamapS> Shuo: "if only zuul scaled well" .. but no supporting evidence. I'd say Zuul scales pretty darn well.
22:07 <mordred> SpamapS: someone said zuul doesn't scale?
22:07 <mordred> SpamapS: are they high?
22:07 <jeblair> mordred: apparently we did, in the openstack-infra channel
22:08 <mordred> we did?
22:08 <jeblair> so says a person on the internet
22:08 <mordred> wow
22:08 <mordred> I mean
22:08 <mordred> people on the internet are often wrong
22:08 <mordred> but that's about the wrongest I've heard anyone be in quite some time
22:09 <mordred> it may not (yet) scale DOWN as well as we'd like - but I think we prove just about every day that it scales UP
22:09 <jeblair> this is from last october, too, after zuul achieved its highest scaling point :)
22:10 <mordred> oh - that's from prometheanfire - that's really weird, he's much more clued in than that
22:10 * mordred walks away from the lwn article
22:11 <SpamapS> yeah
22:11 <SpamapS> lwn is like that
22:12 <jeblair> this is one of those "is everything i read in the newspaper about things that i'm not expert in *this* inaccurate?" moments
22:13 <jeblair> mordred, SpamapS: could it be that we have to add static.o.o host keys to known_hosts?
22:14 <mordred> jeblair: it's worth trying an ssh as zuul to jenkins@static to see - are you on and can you try that?
22:14 <clarkb> jeblair: I'm pretty sure it's all that inaccurate, yes
22:15 <mordred> jeblair: there is definitely no entry in the zuul user's known_hosts
22:15 <jlk> jeblair: so I was just introducing zuul to somebody, and they kind of stumbled over the project definition schema, and were a bit confused/irritated that the pipelines weren't under their own key, to make it much more clear what was a pipeline, instead of just taking anything that isn't one of the reserved keywords in a project hash
22:16 <jlk> jeblair: they expected something more like https://review.openstack.org/#/c/463328/5/doc/source/project-config.rst
22:16 <jlk> er
22:16 <jlk> https://gist.github.com/caphrim007/40ec478f0ab2e233ea804f22b8531d99
22:17 <mordred> wow, how fascinating. I never want additional deepness in my yaml :)
22:19 <mordred> jeblair: oh - duh. we create a known_hosts in the job dir - so we'd need to be able to pass in additional known_hosts entries
22:19 <mordred> jeblair: or maybe start with the zuul user's known_hosts and append to it?
22:23 <jlk> mordred: yeah, he's totally unfamiliar with zuul, so he thought "check" was a specific keyword for the project, not implicitly a pipeline name (because it didn't match any other keys in the schema)
22:23 <mordred> jlk: nod
22:24 <mordred> jlk: I was actually just chatting with someone earlier today and "how does a new user discover what pipelines are available" came up - which may or may not be related
22:24 <jlk> somewhat related, yeah
22:24 <jlk> comes down to "the admin has to have documented them"
22:25 <jlk> until we do something like an API where you can ask zuul what pipelines exist
22:27 <mordred> this is the lovely area in between "the system is flexible and lets you define any pipeline name you want" and "as a noob user, what are the names and what do they mean"
22:27 <clarkb> ++ on not adding unnecessary depth to yaml. I think that happens a lot because people don't understand the datastructure itself
22:27 <clarkb> the datastructure is part of the semantics here and we should take advantage of that
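For reference, the two shapes being compared look roughly like this (job names are made up, and the nested form is only along the lines of what the gist proposes). In the flat form zuul v3 uses, any key that isn't a reserved keyword is treated as a pipeline name:

    # Current flat form: "check" is implicitly a pipeline name.
    - project:
        name: org/example
        check:
          jobs:
            - example-tox-py35

    # Nested alternative: pipelines grouped under an explicit key.
    - project:
        name: org/example
        pipelines:
          check:
            jobs:
              - example-tox-py35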
22:32 <mordred> clarkb: sure - but also I think we should not ignore reported confusion from new users
22:33 <clarkb> definitely
22:37 <Shuo> mordred: to reply to "SpamapS: someone said zuul doesn't scale?", this is exactly why I feel it is worth speaking out my thoughts here https://review.openstack.org/#/c/463328/
22:37 <jeblair> jlk: yeah, i could probably be convinced we should move it down a level; we've had to do similar things elsewhere (job graph), this is probably about the last place we have something like that...
22:40 <Shuo> SpamapS: I was not saying zuul does not scale, and I think the person making such a comment probably did not spend time doing research; that's why I felt what I commented on James's CL might add huge value to zuul https://review.openstack.org/#/c/463328/
22:43 <mordred> Shuo: totally! fwiw I did not think you were saying that - and I agree
22:46 <SpamapS> Shuo: indeed, we were just reacting to the comment violently. It is not directed at you, and I like where you're going. :)
22:51 <jeblair> mordred: yeah, if i re-run my command, i get asked to accept the key, and if i say no, i get the same error
22:51 <jeblair> mordred: so i think it's known_hosts
22:52 <mordred> jeblair: yah
22:52 <jeblair> mordred: i think i'll just append to it in the playbook for now (since that's where the host is being added, it makes sense to keep those things together)
22:52 <mordred> jeblair: kk. you have access to the host_key in the playbook?
22:53 <jeblair> mordred: well, it's got a bunch of hardcoded stuff in there right now: username and host, so i can just add this.
22:53 <mordred> jeblair: ++
22:53 <jeblair> something something parameterize later
22:53 <jeblair> sweet, there's a known_hosts module
22:53 <mordred> jeblair: ya - seems like either an option to seed the per-job known_hosts file with ~zuul/.ssh/known_hosts - or to pass in one or more entries to inject
22:53 <mordred> jeblair: \o/
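With the known_hosts module, the append could look roughly like this (the key value is a placeholder, not the real static.openstack.org host key):

    - name: Accept the log server's host key
      known_hosts:
        path: "{{ ansible_user_dir }}/.ssh/known_hosts"
        name: static.openstack.org
        # Placeholder; the real entry would come from the server's
        # /etc/ssh/ssh_host_rsa_key.pub.
        key: "static.openstack.org ssh-rsa AAAAB3...example"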
23:00 <jlk> crap, I think I hit some python3 issues in the github code
23:02 <jlk> TypeError: a bytes-like object is required, not 'str'
23:03 <clarkb> yup
23:03 <clarkb> you'll need to encode whatever is str to bytes
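clarkb's point, in REPL form (values are illustrative): Python 3 keeps text (str) and bytes strictly separate, and anything headed for the wire or a WSGI body has to be bytes:

    >>> payload = '{"action": "opened"}'   # str (text)
    >>> body = payload.encode('utf-8')     # bytes, as a WSGI body must be
    >>> import json
    >>> json.loads(body.decode('utf-8'))   # decode back to str to parse
    {'action': 'opened'}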
23:08 <jeblair> mordred: 23:07 < openstackgerrit> James E. Blair proposed openstack-infra/project-config master: Add static.o.o key to known_hosts for log publisher  https://review.openstack.org/478672
23:09 <jlk> wtf, I can't even access the object
23:10 <jlk> (Pdb) p request.json
23:10 <jlk> *** TypeError: a bytes-like object is required, not 'str'
23:12 <mordred> jlk: request.data I believe is where the raw data is, if that's a requests object
23:12 <jlk> it's not, it's a webob
23:12 <jlk> <class 'webob.request.Request'>
23:13 <mordred> jlk: ah. well - request.body I think?
23:13 <jlk> (Pdb) request.body
23:13 <jlk> *** TypeError: a bytes-like object is required, not 'str'
23:14 <jlk> webob's page claims to support python3. I'm skeptical
23:14 <mordred> jlk: where in this code is this hitting you - do you know?
23:14 <openstackgerrit> Monty Taylor proposed openstack-infra/zuul feature/zuulv3: Add ability to auto-generate simple one-line shell playbooks  https://review.openstack.org/478675
23:14 <jlk> it's in the githubconnection.py file, when it's trying to handle a webhook event
23:15 <jlk> it's where we try to get the content as a json dict object
23:15 <mordred>             json_body = request.json_body
23:15 <mordred> that bit?
23:15 <jlk> these are all internal methods of webob.
23:15 <jlk> yeah
23:16 <mordred> jlk: will it let you print request.headerlist?
23:16 <mordred> or request.headers
23:17 <jlk> yeah
23:17 <jlk> ugh, python3, how do I quickly show the keys of a thing...
23:17 <clarkb> jlk: foo.__dict__ ?
23:17 <mordred> jlk: list(foo.keys())
23:17 <clarkb> oh, actual keys of a dict
23:18 <mordred> jlk: specifically curious if the incoming payload from gh had a 'Content-type' header set
23:18 <jlk>  'CONTENT_TYPE': 'application/json'
23:18 <jlk> I'm sending it with curl
23:18 <mordred> or charset
23:18 <mordred> ah
23:19 <jlk> maybe I need to add a charset?
23:19 <clarkb> content-encoding is I think what you want
23:19 <jamielennox> jlk: this one? http://paste.openstack.org/show/613989/
23:19 <jlk> jamielennox: the bottom one, yes
23:20 <mordred> jlk: yah - my hunch is that you need to add a charset in your curl
23:20 <jamielennox> ok - but not caused by the same thing then?
23:20 <mordred> jlk: my reading of the webob docs is that it needs a charset set so that it knows what to do with encoding
23:20 <jamielennox> whatever is making repository == None
23:20 <jamielennox> webob should set a default charset?
23:20 <jlk> this I think is my first time running this all in python3, I may be up for a while :(
23:21 <jlk> github doesn't send that in the headers
23:21 <mordred> jlk: you should be able to pass "content-type: application/json; charset=utf8" and webob claims it'll do the right thing
23:21 <mordred> hrm
23:21 <jlk> http://paste.openstack.org/show/613990/
23:22 <mordred> jlk: excellent
23:23 <jlk> yeah, I can't select others.
23:25 <jlk> default_body_encoding is set to 'UTF-8' by default; it exists to allow users to get/set the Response object using .text, even if no charset has been set for the Content-Type.
23:25 <jlk> The default_content_type is used as the default for the Content-Type header that is returned on the response. It is text/html.
23:25 <jlk> oh, that's content-type, not charset :/
23:25 <mordred> yah - default_body_encoding seems like the right thing
23:26 <jamielennox> charset should be its own property on webob, you shouldn't need to manually do the headers
23:27 <jlk> the object knows it's application/json
23:27 <jlk> and it thinks the charset is 'UTF-8'
23:45 <jeblair> i think we should drop the --keep-jobdir argument to the executor and replace it with an IPC like verbose.  so we'd run "zuul-executor keep" or something to enable it.
23:46 <jeblair> (it's difficult to start the executor with the init system and pass an argument like that)
23:46 <mordred> jeblair: ++
23:51 <openstackgerrit> James E. Blair proposed openstack-infra/zuul feature/zuulv3: Replace --keep-jobdirs with an IPC  https://review.openstack.org/478682
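Under that change, job-dir retention becomes a runtime command rather than a startup flag, something along the lines of (the first command is the one proposed above; the off switch is an assumption):

    zuul-executor keep      # start retaining job dirs for debugging
    zuul-executor nokeep    # assumed counterpart to turn retention off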
23:51 <jeblair> 2017-06-28 23:49:30.347020 | localhost | ERROR: Results: => {"changed": false, "failed": true, "msg": "Failed to write to file /root/.ssh/known_hosts: [Errno 2] No such file or directory: '/root/.ssh/tmpS8tEVy'"}
23:51 <jeblair> why's it trying to write there?
23:51 <jlk> I am so confused
