clarkb | the old behavior was to not build an image if it didn't have a target output format | 00:51 |
openstackgerrit | Merged openstack-infra/nodepool: Add image_name to UploadWorker INFO message https://review.openstack.org/404891 | 01:15 |
*** jamielennox is now known as jamielennox|away | 01:34 | |
pabelanger | sweet | 01:35 |
pabelanger | with nodepool passing the md5 / sha256 files into shade, uploads start immediately now | 01:36 |
pabelanger | much faster | 01:36 |
pabelanger | 1h20mins to build debian-jessie | 01:37 |
*** Shuo has quit IRC | 02:08 | |
*** rcarrillocruz has quit IRC | 02:12 | |
*** rcarrillocruz has joined #zuul | 02:16 | |
*** jamielennox|away is now known as jamielennox | 02:54 | |
*** saneax-_-|AFK is now known as saneax | 02:56 | |
*** jamielennox is now known as jamielennox|away | 03:17 | |
*** jamielennox|away is now known as jamielennox | 03:26 | |
*** jamielennox is now known as jamielennox|away | 03:47 | |
*** jamielennox|away is now known as jamielennox | 04:33 | |
*** saneax is now known as saneax-_-|AFK | 04:36 | |
*** saneax-_-|AFK is now known as saneax | 06:45 | |
*** willthames has quit IRC | 07:03 | |
*** willthames has joined #zuul | 07:16 | |
*** abregman has joined #zuul | 07:27 | |
*** Cibo_ has quit IRC | 07:31 | |
*** herlo has quit IRC | 08:42 | |
*** herlo has joined #zuul | 08:44 | |
*** hogepodge has quit IRC | 09:11 | |
*** hogepodge has joined #zuul | 09:12 | |
*** hogepodge has quit IRC | 09:31 | |
*** hogepodge has joined #zuul | 09:32 | |
*** openstack has joined #zuul | 10:06 | |
*** hashar has joined #zuul | 10:32 | |
openstackgerrit | Merged openstack-infra/nodepool: Test rotation of builds in nodepool-builder https://review.openstack.org/400004 | 10:45 |
openstackgerrit | Merged openstack-infra/zuul: Don't merge post-merge items https://review.openstack.org/404903 | 11:10 |
openstackgerrit | Merged openstack-infra/zuul: Define the internal noop job https://review.openstack.org/404864 | 11:14 |
openstackgerrit | Tobias Henkel proposed openstack-infra/zuul: Cloner: Better infrastructure failure handling https://review.openstack.org/403559 | 11:33 |
openstackgerrit | Tristan Cacqueray proposed openstack-infra/zuul: Keep existing loggers with fileConfig https://review.openstack.org/405333 | 12:12 |
*** abregman has quit IRC | 12:53 | |
*** jamielennox is now known as jamielennox|away | 13:00 | |
*** abregman has joined #zuul | 14:18 | |
*** abregman has quit IRC | 14:23 | |
*** abregman has joined #zuul | 14:23 | |
*** abregman has quit IRC | 14:25 | |
*** abregman has joined #zuul | 14:25 | |
*** saneax is now known as saneax-_-|AFK | 14:57 | |
pabelanger | morning | 15:00 |
pabelanger | nb01.o.o did its thing again last night | 15:01 |
pabelanger | all images uploaded | 15:01 |
pabelanger | Shrews: so, looks like you are right. nodepool-builder is currently pinning a CPU, even though the system is idle. We should be running the recent changes to the build / upload worker intervals | 15:03 |
pabelanger | http://cacti.openstack.org/cacti/graph.php?action=view&local_graph_id=63748&rra_id=all | 15:04 |
Shrews | pabelanger: i read those as mostly idle, no? | 15:10 |
pabelanger | ah | 15:11 |
pabelanger | I didn't link the right one | 15:11 |
pabelanger | http://cacti.openstack.org/cacti/graph.php?action=view&local_graph_id=63747&rra_id=all | 15:11 |
pabelanger | load is >1 all the time | 15:11 |
pabelanger | and it is nodepool-builder | 15:11 |
pabelanger | must be looping some place | 15:11 |
Shrews | pabelanger: how many build and upload threads? | 15:13 |
pabelanger | 1 builder, 16 uploads | 15:13 |
pabelanger | however, we might be able to halve the upload workers now | 15:13 |
Shrews | ok, so each thread sleeps 10s between checks. with that many, chances are one of them is doing something every few seconds. that's my best guess | 15:14 |
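The worker model Shrews describes here, where each thread sleeps 10 seconds between checks, can be sketched roughly as follows (a generic illustration of the polling pattern, not nodepool's actual worker code):

```python
import threading

class PollingWorker(threading.Thread):
    """Generic worker thread that wakes periodically to look for work."""

    POLL_INTERVAL = 10  # seconds between checks, per the discussion above

    def __init__(self):
        super().__init__(daemon=True)
        self._stop_event = threading.Event()

    def check_for_work(self):
        pass  # placeholder: a real worker would poll ZooKeeper here

    def run(self):
        # Event.wait() doubles as the sleep; setting the event wakes the
        # thread immediately and ends the loop.
        while not self._stop_event.wait(self.POLL_INTERVAL):
            self.check_for_work()

    def stop(self):
        self._stop_event.set()
```

With 16+ such threads, some thread is statistically waking every second or so even when the system is idle, which matches the "mostly idle but never quiet" load being discussed.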
Shrews | i have to head out for a quick back adjustment. bbiab | 15:14 |
openstackgerrit | Jan Hruban proposed openstack-infra/zuul: Allow to fetch git packs via SSH https://review.openstack.org/405441 | 15:20 |
pabelanger | Shrews: looks like it might be our connection to zookeeper | 15:20 |
pabelanger | when I enable DEBUG logging for kazoo.client, and 1 upload / build worker, I see a spike to 100% | 15:21 |
pabelanger | so, with 16 threads, we must be hammering zk | 15:21 |
pabelanger | ya, looks like it | 15:22 |
Shrews | pabelanger: ah yeah. The kazoo client has its own threads for keepalive things. Not sure if that can be adjusted, or if we should even try. | 15:39 |
Shrews | Can look when I get back | 15:39 |
*** abregman has quit IRC | 16:01 | |
jeblair | well, there are 311 threads running, which seems surprisingly high | 16:05 |
jeblair | and apparently we don't support sigusr2 there so i accidentally killed it | 16:06 |
jeblair | however, it seems to have stopped logging around 15:24 | 16:07 |
pabelanger | jeblair: I reduced kazoo logging back to INFO and stopped / started nodepool-builder. So, we're currently idle | 16:09 |
pabelanger | and nothing to log | 16:09 |
jeblair | pabelanger: any idea why i'm not able to restart it? | 16:11 |
jeblair | ah there's the pid file | 16:11 |
pabelanger | odd, | 16:11 |
jeblair | ok it's running again | 16:11 |
pabelanger | never seen that | 16:11 |
jeblair | the pid file still existed | 16:11 |
jeblair | still running 311 threads, so at least it's not leaking, but jeepers that's a lot. | 16:13 |
pabelanger | I'm sure we can decrease our upload workers now, maybe 8 | 16:13 |
jeblair | pabelanger: why would we want to? | 16:14 |
pabelanger | save some threads? | 16:14 |
jeblair | pabelanger: well, i'd rather find out why we're spinning our wheels doing nothing | 16:14 |
pabelanger | ack | 16:14 |
pabelanger | even with 2 upload workers, it was still spiking to 100% CPU | 16:15 |
jeblair | i'm going to add the stack dump handler to it | 16:16 |
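A stack-dump signal handler of the kind jeblair mentions adding here typically looks something like this (a generic Python sketch of the technique, not the actual nodepool change):

```python
import signal
import sys
import threading
import traceback

def stack_dump_handler(signum, frame):
    """On SIGUSR2, print a stack trace for every live thread."""
    # Map thread idents to names so the dump is readable.
    names = {t.ident: t.name for t in threading.enumerate()}
    for ident, stack in sys._current_frames().items():
        print("Thread: %s (%d)" % (names.get(ident, "unknown"), ident))
        traceback.print_stack(stack)

signal.signal(signal.SIGUSR2, stack_dump_handler)
```

Once registered, `kill -USR2 <pid>` dumps every thread's stack to the logs, which is exactly what is needed to find out what 300+ threads are actually doing.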
pabelanger | Also, working on sorting our default outputs, for example: | 16:20 |
pabelanger | current output (unsorted): http://paste.openstack.org/show/591153/ | 16:21 |
pabelanger | updated output (sorted by, provider, image, age): http://paste.openstack.org/show/591152/ | 16:21 |
pabelanger | 2nd should be how we do it today in nodepool | 16:21 |
jeblair | oooooh | 16:22 |
pabelanger | another thing we could do is actually expose sorting options to CLI | 16:22 |
jeblair | the threads are because we did not centralize the provider managers | 16:23 |
jeblair | i should have caught that :( | 16:23 |
jeblair | so we have (builders+uploaders+cleanup)*providers | 16:24 |
jeblair | and there are 3 zk threads for each of those | 16:30 |
pabelanger | remote: https://review.openstack.org/405482 Sort output of image-list | 16:30 |
pabelanger | if we are happy with that, I'll add another patch for dib-image-list | 16:31 |
jeblair | pabelanger: you don't want to sort by build,upload? | 16:34 |
jeblair | er build,provider,upload i guess | 16:34 |
*** hashar has quit IRC | 16:34 | |
pabelanger | let me see what that looks like | 16:34 |
pabelanger | At min, I think we want provider then image | 16:36 |
pabelanger | then we can do build / upload | 16:37 |
pabelanger | that gives a nice format too | 16:37 |
jeblair | pabelanger: maybe ask in -infra? | 16:37 |
pabelanger | will do | 16:37 |
EmilienM | I have a stupid question | 16:43 |
EmilienM | a few months ago, I asked if with current version of zuul we could mix a zuul layout with files constraint and projects | 16:44 |
EmilienM | example: I want a job run only if it's for a specific project OR/AND some specific files in a project | 16:44 |
EmilienM | last time I asked, the answer was no, you can only do one or the other | 16:44 |
jeblair | EmilienM: is there more to your question? | 16:48 |
openstackgerrit | Paul Belanger proposed openstack-infra/nodepool: Sort output of image-list https://review.openstack.org/405482 | 16:50 |
EmilienM | jeblair: no | 16:50 |
EmilienM | jeblair: I'm not sure I explained correctly | 16:50 |
EmilienM | let me know if it's clear | 16:50 |
jeblair | EmilienM: well, i understand the question you asked a few months ago (was it that long?). i just don't understand what you're asking now (you didn't actually ask a question just now) -- are you asking if anything has changed? | 16:51 |
jeblair | and sorry, i'm not trying to be dense, i really don't know if that's what you want, or if you have some follow-up question... | 16:52 |
EmilienM | I would like to know if situation has changed and if now, I can create a zuul layout that says "run this job if we touch this file or/and this project" | 16:53 |
jeblair | EmilienM: no -- it's not likely to change before zuulv3, where you will be able to do that. zuulv2 is mostly frozen for new features. | 16:54 |
Shrews | jeblair: "did not centralize the provider managers"... Help me understand that a bit more? Not sure what that means/entails since current code just uses ProviderManager class methods | 16:54 |
Shrews | want to understand what I did wrong | 16:54 |
EmilienM | jeblair: fair enough, thanks! | 16:55 |
jeblair | EmilienM: np, and sorry. i'm working as fast as i can. :) | 16:55 |
Shrews | for i in range(1, 100): workers.append(jeblair.copy()) | 16:57 |
EmilienM | jeblair: no, thanks, sorry I didn't express it well :) | 16:58 |
jeblair | Shrews: the provider manager is meant to be a global object that all threads use to contact a single provider (it's a shade taskmanager) | 16:58 |
jeblair | Shrews: having said that... i'm not sure it's super important in this instance, since we're not actually doing that much cloud communication | 16:58 |
jeblair | Shrews: and i may have been off-base when i blamed it for the number of threads -- it actually shouldn't be spawning any threads | 16:59 |
Shrews | jeblair: right. not sure where in the code we break that usage model | 16:59 |
openstackgerrit | James E. Blair proposed openstack-infra/nodepool: Add stack dump handler to builder https://review.openstack.org/405529 | 17:00 |
jeblair | Shrews: well, if we were to use it as intended, NodePoolBuilder would contain the provider managers, and the workers would use those copies, rather than the workers having their own copies | 17:00 |
Shrews | jeblair: do those copies come along with the config object then? | 17:02 |
Shrews | no intentional copying that i see | 17:02 |
Shrews | ah, i think so | 17:03 |
jeblair | Shrews: yeah, when you reconfigure the providermanagers, they get attached to the config, so it's the loadconfig/provider_manager.reconfigure that's happening in each worker thread that causes them to get their own | 17:04 |
Shrews | gotcha | 17:04 |
jeblair | however, let's not move that just yet :) | 17:04 |
jeblair | it may be advantageous for us to leave it as-is, since this way we can multiplex uploads to a single provider | 17:05 |
jeblair | i am still eyeing the zk connection as something we may want to centralize. i think it is responsible for 54 threads. | 17:06 |
jeblair | but i think the way we should proceed is to land 405529 and run that on nb01 and see what all those threads actually are | 17:07 |
Shrews | 54? wow | 17:07 |
jeblair | Shrews: yeah, i count 3 threads locally for each zk connection, and nb01 has 18 workers | 17:08 |
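The thread arithmetic here checks out (worker counts taken from the nb01 figures mentioned earlier in the log; the 3-threads-per-connection figure is jeblair's local observation of kazoo's internal threads):

```python
# Thread-count estimate for nb01: 1 build worker + 16 upload workers +
# 1 cleanup worker, each holding its own ZooKeeper connection, with
# roughly 3 kazoo-internal threads observed per connection.
build_workers = 1
upload_workers = 16
cleanup_workers = 1
zk_threads_per_connection = 3

workers = build_workers + upload_workers + cleanup_workers
zk_threads = workers * zk_threads_per_connection
print(workers, zk_threads)  # 18 54
```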
jeblair | i have to run an errand... Shrews, pabelanger: if you can land 405529 while i'm gone, i'd be grateful | 17:10 |
Shrews | centralizing the zk connection seems like a bad idea. it would serialize everything, including lock management | 17:10 |
Shrews | jeblair: i can +1 it, but can't land it ;) | 17:10 |
jeblair | Shrews: oh, would it actually serialize access? | 17:13 |
pabelanger | looking | 17:14 |
pabelanger | +2 | 17:17 |
pabelanger | 3 more hours until builds for today start | 17:20 |
pabelanger | should be flexing the cleanup worker today | 17:20 |
pabelanger | as we'll be rotating images | 17:20 |
Shrews | jeblair: i'm actually not sure which parts of the client are thread safe and which aren't, or if we'd need to switch to the async api | 17:20 |
Shrews | we'll have to explore it | 17:20 |
Shrews | transactions aren't, but we aren't using those so far | 17:22 |
openstackgerrit | Merged openstack-infra/nodepool: Have an ending line-feed on the generated id_rsa.pub file https://review.openstack.org/383496 | 17:32 |
openstackgerrit | Paul Belanger proposed openstack-infra/nodepool: Sort output of image-list https://review.openstack.org/405482 | 18:15 |
*** morgan is now known as morgred | 18:20 | |
*** morgred is now known as morgan | 18:21 | |
*** saneax-_-|AFK is now known as saneax | 18:23 | |
*** abregman has joined #zuul | 18:30 | |
jesusaur | I'm trying to piece together how zuul works in a post-jenkins world; how does zuul-launcher know to update job definitions? | 18:44 |
*** bhavik1 has joined #zuul | 18:46 | |
pabelanger | jesusaur: zuul-launcher reconfigure | 18:50 |
pabelanger | when JJB folder changes, we Exec that in puppet | 18:50 |
pabelanger | zuul will then reload JJB into memory | 18:50 |
jesusaur | pabelanger: thanks :) | 18:51 |
pabelanger | jesusaur: np | 18:51 |
pabelanger | jesusaur: there are a few commands for zuul-launcher now, everything from graceful restart to drop nodes | 18:51 |
Shrews | jesusaur: wow, 405529 had some epic fails (other than the pep8 thing). not sure what happened there | 18:53 |
Shrews | err, jeblair | 18:53 |
Shrews | sorry jesusaur | 18:53 |
jesusaur | no worries | 18:54 |
jlk | something else we've noticed in the new zuul land | 18:56 |
jlk | and maybe this is because we've got gearman running on its own | 18:56 |
jlk | but if we do a full restart of zuul-launcher, it seems to lose track of existing job nodes, and I have to rm the nodepool nodes so that nodepool launches them again and re-registers. | 18:57 |
jlk | DOes that sound right to anybody? | 18:57 |
pabelanger | yup | 18:57 |
pabelanger | so, you can do 2 things | 18:57 |
pabelanger | each time you restart zuul, delete all your online nodes | 18:58 |
pabelanger | or | 18:58 |
pabelanger | check_job_registration=False | 18:58 |
pabelanger | http://docs.openstack.org/infra/zuul/zuul.html#gearman | 18:58 |
pabelanger | IIRC, that will stop the NOT_REGISTERED message | 18:59 |
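The option pabelanger points at lives in the `[gearman]` section of zuul.conf (per the docs page linked above); a minimal fragment might look like this (server address is illustrative):

```ini
[gearman]
server=gearman.example.org
check_job_registration=false
```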
jlk | What are the implications of turning job registration off? | 18:59 |
pabelanger | zuul will try to launch a node | 18:59 |
jlk | what we see isn't a "NOT_REGISTERED" scenario, but instead sort of an infinite wait | 18:59 |
pabelanger | Hmm | 18:59 |
jlk | and as soon as the node shows up, the job goes through | 19:00 |
jlk | (new node) | 19:00 |
pabelanger | right | 19:00 |
pabelanger | 1 sec | 19:00 |
pabelanger | jlk: http://docs.openstack.org/infra/system-config/zuul.html#restarts | 19:00 |
pabelanger | you should follow that process for restarts | 19:00 |
pabelanger | it should outline how to handle the missing registrations | 19:01 |
pabelanger | you can also release your nodes back to nodepool too, before you stop | 19:01 |
jlk | kk | 19:02 |
pabelanger | actually, that is only zuul-launcher | 19:02 |
jlk | hopefully we don't have to restart nearly all that much | 19:02 |
pabelanger | my mistake | 19:02 |
pabelanger | jlk: Ya, we usually schedule them | 19:03 |
pabelanger | jlk: look into zuul-launcher graceful | 19:03 |
pabelanger | give that a try | 19:03 |
pabelanger | that should help | 19:03 |
jlk | alrighty | 19:04 |
jeblair | when zuul-launcher stops, it returns all nodes to nodepool | 19:04 |
jeblair | pabelanger, Shrews: my change didn't land :( | 19:05 |
pabelanger | jeblair: oh, I didn't approve | 19:05 |
pabelanger | sorry about that | 19:05 |
jeblair | pabelanger: i did, but it didn't pass tests | 19:05 |
Shrews | jeblair: yep. been looking at those failures, but... geez | 19:06 |
jlk | looks like graceful just does a stop | 19:06 |
jlk | and then need to start it back up again | 19:06 |
pabelanger | jlk: it keeps running jobs alive, so when we run it, it takes about 3 hours | 19:06 |
pabelanger | Oh, yes | 19:06 |
jeblair | jlk: fyi, none of this is relevant in zuulv3 | 19:06 |
jlk | jeblair: fair enough. Just trying to get a handle on running something to fully understand the changes of v3 | 19:07 |
*** abregman has quit IRC | 19:07 | |
jeblair | jlk: yep, just wanted to make sure you were aware so could plan accordingly :) | 19:07 |
pabelanger | we do have a playbook to handle some of this too today | 19:08 |
pabelanger | thankfully, restarts are minimal right now | 19:09 |
jlk | we're in somewhat deep dev mode, so we've got frequent code changes, so restarts are more common place for us currently | 19:10 |
pabelanger | http://git.openstack.org/cgit/openstack-infra/system-config/tree/playbooks/hard_restart_zuul_launchers.yaml and http://git.openstack.org/cgit/openstack-infra/system-config/tree/playbooks/restart_zuul_launchers.yaml are what we have today | 19:10 |
jeblair | pabelanger, Shrews: i see the problem with the tests | 19:14 |
jeblair | patch shortly | 19:14 |
pabelanger | mordred: ya, also not a fan of the magic numbers, open to suggestions | 19:15 |
jeblair | pabelanger: ^ what did i miss? | 19:20 |
pabelanger | jeblair: 405482 | 19:22 |
jeblair | pabelanger: wow i don't understand that | 19:23 |
clarkb | pabelanger: jeblair I think a more readable way to do that is change the loop orders | 19:24 |
pabelanger | jeblair: ya, some magic for sure | 19:24 |
jeblair | i think the loops are in the right order | 19:24 |
clarkb | oh ya its image build provider upload | 19:24 |
jeblair | we could add sorted() calls to each one, or change the zk methods to return them sorted | 19:25 |
pabelanger | Ya, I think might be cleaner | 19:25 |
pabelanger | I can work on that | 19:25 |
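The tuple-key sort being discussed can be sketched with plain `sorted()` calls; the field names below are illustrative only, not nodepool's actual data model:

```python
# Hypothetical image-upload rows; real nodepool rows come from ZooKeeper.
uploads = [
    {"provider": "rax-ord", "image": "ubuntu-xenial", "age": 3600},
    {"provider": "infracloud", "image": "ubuntu-xenial", "age": 7200},
    {"provider": "infracloud", "image": "debian-jessie", "age": 100},
]

# Sort by provider, then image name, then age -- a tuple key gives the
# multi-level ordering in one pass.
rows = sorted(uploads, key=lambda u: (u["provider"], u["image"], u["age"]))
for r in rows:
    print(r["provider"], r["image"], r["age"])
```

Whether the `sorted()` call belongs in the CLI output code or in the zk methods themselves is exactly the trade-off jeblair raises above.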
openstackgerrit | James E. Blair proposed openstack-infra/nodepool: Add stack dump handler to builder https://review.openstack.org/405529 | 19:25 |
jeblair | pabelanger, Shrews: ^ | 19:25 |
pabelanger | jeblair: because I don't know: why did you need to update tests/__init__.py? | 19:28 |
Shrews | b/c he added the self.name attribute | 19:29 |
Shrews | to the workers | 19:29 |
pabelanger | ah, I see | 19:29 |
pabelanger | the final whitelist check | 19:30 |
pabelanger | thanks | 19:30 |
jeblair | they were slipping in under "Thread-" earlier ("apscheduler thread pool") | 19:30 |
*** openstackgerrit has quit IRC | 19:32 | |
*** openstackgerrit has joined #zuul | 19:34 | |
*** openstackgerrit has quit IRC | 19:36 | |
*** bhavik1 has quit IRC | 19:38 | |
*** saneax is now known as saneax-_-|AFK | 19:40 | |
*** openstackgerrit has joined #zuul | 19:41 | |
openstackgerrit | James E. Blair proposed openstack-infra/zuul: Fix variants not picking up negative matches. https://review.openstack.org/399871 | 19:41 |
openstackgerrit | James E. Blair proposed openstack-infra/zuul: Add test for variant override https://review.openstack.org/399871 | 19:42 |
*** openstackgerrit has quit IRC | 19:43 | |
jeblair | SpamapS: i think i found one error in the tests you're adding in 399871. when i correct that and run it with my fix, it works. can you take a close look at that and make sure my analysis is correct? | 19:44 |
dmsimard | rcarrillocruz: o/ where can I see those ansible based jobs run for devstack you've been working on ? | 19:48 |
jeblair | dmsimard: https://review.openstack.org/#/q/topic:zuulv3+project:openstack-infra/devstack-gate | 19:49 |
jeblair | dmsimard: the first one has merged already | 19:51 |
mordred | we're working through transitioning d-g stepwise in small chunks | 19:52 |
SpamapS | jeblair: ditto here ;) | 19:52 |
*** openstackgerrit has joined #zuul | 19:52 | |
openstackgerrit | Merged openstack-infra/nodepool: Add stack dump handler to builder https://review.openstack.org/405529 | 19:52 |
dmsimard | yeah, I'm trying to see where the ansible bits are running | 19:52 |
* dmsimard looks | 19:52 | |
*** jamielennox|away is now known as jamielennox | 19:53 | |
jeblair | dmsimard: note the callback plugin, so the output is changed slightly from standard | 19:53 |
dmsimard | jeblair: yeah, in fact I was curious to see what it would look like if I threw ARA in there | 19:55 |
jeblair | dmsimard: d-g is self testing, so please do... maybe you could put a change at the end of rcarrillocruz's stack so it has something to chew on; that could be really useful :) | 19:56 |
clarkb | keep in mind we do index the text logs for aggregate processing so we can't replace those, but having a thing alongside for specific ansible stuff could be nice | 19:58 |
jeblair | clarkb: yeah, i'm thinking at the very least a lot of the "setting up host this should take 10 minutes" stuff gets a lot better | 19:58 |
dmsimard | clarkb: I actually want a way to export reports in a elasticsearch-friendly manner at some point in time so we can index fields and such. ARA can already sort-of do that through cliff which supports json output for the CLI but it's not straightforward to crawl through all the results and dump them right now. | 20:00 |
pabelanger | come on time, move faster. 90mins until our next round of builds on nb01.o.o | 20:00 |
clarkb | ya it's also not necessary to solve right away as long as you keep the old text logs too since we already process those | 20:01 |
dmsimard | So instead of having logstash parse through console log output, you'd have real groks/filters/fields to work with | 20:01 |
pabelanger | clarkb: also, no upload failures yesterday | 20:01 |
clarkb | pabelanger: at all? seems like a list i saw had failures | 20:02 |
jeblair | clarkb, pabelanger: maybe our cloud providers haven't blacklisted our new builder ip. | 20:02 |
clarkb | the sort output one | 20:02 |
pabelanger | clarkb: those were 2 days ago | 20:02 |
clarkb | ah | 20:02 |
pabelanger | jeblair: ha | 20:02 |
rcarrillocruz | dmsimard: we were discussing yesterday about dropping role defaults and have a central vars file to put all d-g variables | 20:05 |
rcarrillocruz | https://review.openstack.org/#/c/404974/ | 20:05 |
rcarrillocruz | clarkb: ^ | 20:05 |
rcarrillocruz | just in case you start ansiblying other stuff | 20:05 |
clarkb | dmsimard: also we have tons of real fields :) it just happens that one of them starts as the entire message. But we inject a bunch of job info too | 20:05 |
rcarrillocruz | if we agree on ^, i suggest you put your vars on it | 20:05 |
dmsimard | rcarrillocruz: what's the last patch in your tree ? | 20:05 |
rcarrillocruz | clarkb when you get a chance, have a look, i pasted last night a paste link with how it looks | 20:05 |
dmsimard | rcarrillocruz: I'll send a poc patch with ara at the end just to see what it looks like | 20:06 |
rcarrillocruz | https://review.openstack.org/#/c/404243/ , although that one i need to iterate, it's wip now | 20:06 |
dmsimard | clarkb: fair, not mutually exclusive -- porque no los dos :p | 20:06 |
pabelanger | +2 | 20:08 |
pabelanger | makes sense | 20:08 |
pabelanger | jeblair: Shrews: anything I should be working on for nodepool? | 20:09 |
adam_g | jeblair: so I've been banging on test_scheduler:test_build_configuration, and am wondering why merge_mode is hard-coded to MERGER_MERGE_RESOLVE in v3 at https://git.openstack.org/cgit/openstack-infra/zuul/tree/zuul/model.py?h=feature/zuulv3 | 20:10 |
jeblair | pabelanger: the sigusr2 patch landed, can you restart the builder? then we can get an idea of what those threads are | 20:10 |
pabelanger | jeblair: sure | 20:10 |
pabelanger | just waiting for git to update | 20:11 |
rcarrillocruz | erm, thx for the topic change jeblair | 20:11 |
jeblair | adam_g: that should just be the default, but it should be overridable in configuration; i don't know if that made it into configloader, if not, we should add it. | 20:11 |
pabelanger | jeblair: restarted | 20:12 |
adam_g | jeblair: configurable on the project? | 20:13 |
adam_g | oh, merge-mode | 20:13 |
jeblair | adam_g: yep, like "project: nova\n merge-mode: something" | 20:13 |
jeblair | yeah | 20:13 |
jeblair | pabelanger: i just sent it sigusr2 | 20:13 |
pabelanger | see it in debug logs | 20:14 |
dmsimard | jeblair, rcarrillocruz: there you go, let's wait and see: https://review.openstack.org/#/c/405613/ | 20:14 |
pabelanger | that's a lot of threads | 20:14 |
clarkb | rcarrillocruz: what does the second arg to default() do there? | 20:14 |
clarkb | all the docs I can find are for single arg default() | 20:14 |
adam_g | jeblair: ok yeah, doesn't look like its configurable ATM. will see about adding it and adding some tests for it | 20:15 |
jeblair | adam_g: cool, thanks! that's an easy one for us to overlook since right now, we're using the default exclusively | 20:15 |
jeblair | but definitely need to support the others | 20:15 |
jeblair | yeah, so we do have one thread for each provider for each worker | 20:16 |
clarkb | http://jinja.pocoo.org/docs/dev/templates/#default finally found the function signature | 20:17 |
jeblair | pabelanger, Shrews: ah, right because the taskmanager itself *is* a thread | 20:17 |
jeblair | so probably what we want to do is either decide that we want to serialize all access to a provider from a given process, or if we want to keep it parallel, stop using taskmanagers here | 20:18 |
jeblair | i lean toward option 2 | 20:19 |
jeblair | i'm going to grab some lunch | 20:19 |
rcarrillocruz | clarkb: lookup doesn't return None | 20:20 |
rcarrillocruz | but an empty string | 20:20 |
rcarrillocruz | thus we need the false thing | 20:21 |
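This is what the second argument clarkb asked about does: with the boolean flag set, falsy values like the empty string a lookup returns for a missing variable also fall through to the default, not just undefined variables. A minimal Jinja2 illustration:

```python
from jinja2 import Environment

env = Environment()

# Plain default() only fires for undefined variables, so an empty
# string passes straight through.
print(env.from_string("{{ '' | default('fallback') }}").render())

# With the boolean flag (second argument) set, falsy values such as ''
# also trigger the fallback -- "the false thing" mentioned above.
print(env.from_string("{{ '' | default('fallback', true) }}").render())
```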
rcarrillocruz | dmsimard: \o/ excited to have ARA! | 20:21 |
* rcarrillocruz goes for dinner, be back shortly | 20:21 | |
clarkb | rcarrillocruz: I thnik if mordred and jeblair are happy with 404974 we can go ahead and approve that then rebase things like https://review.openstack.org/#/c/402107/8 onto it | 20:22 |
rcarrillocruz | doesn't return None if var is not in environment, that is | 20:22 |
clarkb | rcarrillocruz: ya | 20:22 |
Shrews | pabelanger: uh, *shrug*. anything you want to pick up from storyboard is fine with me. i may experiment with using shared ZK connections for a bit, in case we decide to go that route (though I'm not certain how necessary it is yet) | 20:24 |
pabelanger | Shrews: k, let me check the board | 20:25 |
*** openstack has joined #zuul | 20:46 | |
pabelanger | 20mins, fedora-23 build starts | 20:58 |
dmsimard | rcarrillocruz: neat: http://logs.openstack.org/13/405613/2/check/gate-tempest-dsvm-neutron-full-ubuntu-xenial/507dbc5/logs/ara/playbook/97db8f0e-56f8-4925-91a5-710a19e4ea7b/host/localhost/index.html | 21:08 |
jeblair | very cool | 21:09 |
pabelanger | ya, that looks nice | 21:09 |
dmsimard | jeblair: so basically I wonder how this would work from a Zuul perspective -- like, could we get the whole thing recorded "first party" by Zuul when the "end user" job leverages Ansible ? | 21:10 |
dmsimard | I had done this a long time ago: https://review.openstack.org/#/c/330874/ but I sort of stopped there because what Zuul did with Ansible at that time was still fairly limited | 21:11 |
*** rcarrillocruz has quit IRC | 21:12 | |
pabelanger | I would imagine it being some sort of post playbook | 21:12 |
jeblair | dmsimard: yeah, i think so. probably by making it easy to add callback plugins. | 21:12 |
jeblair | dmsimard: we're going to have a pretty big focus on roles in v3 -- would it be possible to group the tasks by roles? | 21:12 |
jeblair | pabelanger: there are those two nodepool v3 stories i filed yesterday | 21:13 |
dmsimard | jeblair: more or less... last I checked, Ansible doesn't really pass the concept of roles to callbacks -- files are recorded and ARA can filter by files afterwards. Ex: http://logs.openstack.org/13/405613/2/check/gate-tempest-dsvm-neutron-full-ubuntu-xenial/507dbc5/logs/ara/playbook/97db8f0e-56f8-4925-91a5-710a19e4ea7b/file/3f036130-6be1-428f-97c5-d7e19f46d878/index.html | 21:14 |
jeblair | pabelanger, Shrews: i'm going to look into avoiding the taskmanagers | 21:14 |
pabelanger | jeblair: ack, let me get started on a test to reproduce it | 21:14 |
dmsimard | jeblair: the closest I have right now is to be able to filter by play properly (like, if there is a 1-to-1 relationship between plays and roles..) | 21:15 |
jeblair | dmsimard: got it -- the filter next to the role file is useful | 21:15 |
clarkb | dmsimard: firefox gets really unhappy with that | 21:15 |
clarkb | (thinks its a run away script) | 21:15 |
jeblair | clarkb: wfm | 21:15 |
dmsimard | clarkb: it does ? let me look | 21:15 |
clarkb | jeblair: could be my local browser then | 21:15 |
jeblair | dmsimard: yeah, we *could* do 1:1 here, but i don't know that we would end up doing that generally in v3 | 21:15 |
clarkb | (it eventually works thouhg) | 21:15 |
dmsimard | clarkb: ok. | 21:16 |
dmsimard | jeblair: a 1:1 relationship is an ugly constraint to work with, I'll ask the ansible devs if there's anything exposed that I could use cause I would definitely love that. | 21:16 |
jeblair | cool | 21:16 |
jeblair | mordred, Shrews: ^ fyi | 21:17 |
dmsimard | jeblair: back to my zuul question, though -- does Zuul fork out to another ansible process on the nodepool node? Or does it sort of run the whole thing straight from the ansible launcher? | 21:17 |
* dmsimard not familiar enough with new zuul architecture | 21:17 | |
jeblair | dmsimard: from the launcher | 21:17 |
pabelanger | jeblair: okay, I didn't comment since I didn't know how much work was involved. | 21:17 |
dmsimard | Ok, so if ARA would run on the launcher and everything else until the end is ansible, we could do something like http://status.openstack.org/openstack-health/#/ basically. | 21:18 |
pabelanger | I think how we do stackviz today works, assuming you can get access to all the data from the launcher | 21:18 |
jeblair | dmsimard: i think we would want to package up the results and publish them somewhere in a post playbook as pabelanger suggested | 21:19 |
jeblair | (launchers aren't meant to serve end-users) | 21:20 |
dmsimard | Yeah, stackviz is great -- ARA can generate a static version too. At some level of scale it gets sort of absurd, though. Like OpenStack-Ansible's gate has >6k tasks which ends up being >10k files and ~50MB gzipped .. all from a 6MB sqlite database file. | 21:20 |
clarkb | (which is basically how this test worked) | 21:20 |
pabelanger | indeed, that would be pretty slick if we did that | 21:20 |
clarkb | dmsimard: all the more reason to distribute it ? | 21:20 |
dmsimard | clarkb: but we don't /need/ to generate the static version | 21:20 |
jeblair | dmsimard: ah, yeah, sure we can copy the sqlite file over to something that can serve it | 21:21 |
dmsimard | It's a flask app that's super lightweight when using sqlite/mysql | 21:21 |
dmsimard | jeblair: yeah, I discussed with mtreinish to sort of retrieve the sqlite files the way he retrieves the tempest results | 21:21 |
clarkb | dmsimard: right but what happens when an openstack ansible change fails a bunch of tests and all of a sudden you have to render all of those at once off a single server | 21:21 |
pabelanger | clarkb: so, have a central ARA server that will generate things and archive them off some place? | 21:21 |
clarkb | pabelanger: dmsimard is suggesting that we have the launcher possibly host the contents | 21:22 |
pabelanger | Ah | 21:22 |
dmsimard | not really, hang on, let's start again | 21:22 |
jeblair | dmsimard: well, we will have different constraints with v3, in that we will be able to push things rather than pull, but still we'll have the idea of launcher collects data, then *something* is shipped off to *somewhere* at the end of the job. | 21:22 |
jeblair | (that could mean shipping a small data bundle to a central ara flask server) | 21:23 |
dmsimard | What's important about the launchers is to have the callback configured and saving stuff. The stuff can be saved to a central mysql server (not unlike openstack-health) directly through sqlalchemy -- or saved locally inside a sqlite database that'd be pushed/pulled and imported in a central place (again, like openstack-health) | 21:24 |
dmsimard | Generating the static version of the app is nuts at any kind of serious scale; beyond ~4k tasks it's not reasonable .. at least currently. So I'd tend to run the database driven flask app instead of generating the static files. | 21:24 |
jeblair | oh if the callback can stream to mysql server that's probably even easier then :) | 21:25 |
dmsimard | This runs off of the flask app with sqlite: http://46.231.133.111/ | 21:25 |
dmsimard | jeblair: yeah, it's just sqlalchemy .. http://ara.readthedocs.io/en/latest/configuration.html#ara-database | 21:25 |
dmsimard | it's magic*alchemy | 21:25 |
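[The database setting dmsimard links above is a SQLAlchemy-style connection URL, so pointing the callback at a central MySQL server instead of a local sqlite file is a configuration change rather than a code change. A minimal stand-in for that URL dispatch, using only the stdlib — the table and names here are illustrative, not ARA's actual schema or setting:]

```python
# Sketch of the idea behind ARA's database setting: the backend is
# selected by a SQLAlchemy-style connection URL, so swapping a local
# sqlite file for a central mysql server is configuration, not code.
# Names below are illustrative; see ara.readthedocs.io for the real
# setting. Only the sqlite "driver" is implemented in this sketch.
import sqlite3
from urllib.parse import urlparse

def open_backend(db_url):
    """Tiny stand-in for what SQLAlchemy does with a connection URL."""
    parsed = urlparse(db_url)
    scheme = parsed.scheme.split("+")[0]   # "mysql+pymysql" -> "mysql"
    if scheme == "sqlite":
        return sqlite3.connect(parsed.path or ":memory:")
    raise NotImplementedError("no driver in this sketch for %r" % scheme)

# The callback would record each task result into whichever backend
# the URL selects; an in-memory sqlite keeps the sketch self-contained.
conn = open_backend("sqlite://")
conn.execute("CREATE TABLE tasks (name TEXT, status TEXT)")
conn.execute("INSERT INTO tasks VALUES ('setup', 'ok')")
count = conn.execute("SELECT COUNT(*) FROM tasks").fetchone()[0]
```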
clarkb | dmsimard: one benefit of the static generation is you pay the cost once for any single report and that cost is eaten by the test node itself. But if it's unwieldy for a web browser to handle I can see how that wouldn't work well | 21:25 |
jeblair | cool, so i feel pretty confident we can fairly easily slip this in to v3 when the time comes | 21:26 |
dmsimard | At scale, it's not a web browser issue .. I mean, you're still browsing one page at a time, right ? It just takes a long time to generate the static version and it's a stupid amount of files. | 21:26 |
clarkb | dmsimard: gotcha so the 50MB isn't a single file that the browser has to uncompress | 21:27 |
jeblair | (since the callback will run on the launcher, security concerns are lessened) | 21:27 |
clarkb | it's many little files | 21:27 |
clarkb | dmsimard: how long does it take? | 21:27 |
dmsimard | clarkb: right, it's 50MB in ~10k files for the openstack-ansible playbooks. 15 minutes on my laptop. | 21:27 |
clarkb | gotcha | 21:27 |
dmsimard | clarkb: it's otherwise a 6MB sqlite file that's instantaneously loaded | 21:27 |
dmsimard | This is probably the heaviest page from OSA: http://46.231.133.111/playbook/293c1e05-066d-44a6-b7e1-11cc4fc4a1ef/ | 21:27 |
clarkb | dmsimard: and you don't see that high cost reflected on dynamic loads of a centralized server? | 21:28 |
jeblair | dmsimard: have you tested what it might look like with 3600000 runs in a single db? | 21:28 |
clarkb | right ^ being the next thing :) | 21:28 |
dmsimard | clarkb: if there's no one browsing it, there won't be any load | 21:28 |
jeblair | (that's ~6 months for us) | 21:28 |
*** rcarrillocruz has joined #zuul | 21:28 | |
dmsimard | jeblair: Nope, I have no clue. tbh there are probably a fair bit of improvements to do at that kind of scale. Some features that are definitely not in yet. Searching, paging, etc. | 21:29 |
dmsimard | I'm only one guy and this is a side project though :( | 21:29 |
jhesketh | Morning | 21:29 |
jeblair | dmsimard: well, searching/paging probably isn't too important off the bat -- we can deep-link job results into it | 21:29 |
clarkb | there is definitely a trade off between generating things once at high cost vs generating only what you need on demand at potentially high cost | 21:29 |
dmsimard | doesn't require "generating" on demand | 21:29 |
clarkb | dmsimard: sure it does | 21:30 |
dmsimard | it's flask/jinja rendered in real time, database driven | 21:30 |
clarkb | dmsimard: you have to serve http and js with db data | 21:30 |
dmsimard | I don't see that as a potentially high cost | 21:30 |
clarkb | dmsimard: if it takes 15 minutes to do that for one job the total cost of all those pages is ~15 minutes | 21:30 |
clarkb | dmsimard: now if you get users to browse that job several times its still 15 minutes | 21:30 |
dmsimard | No, it's not | 21:31 |
clarkb | in the static generation case | 21:31 |
dmsimard | yeah, for static. | 21:31 |
jeblair | clarkb: though not all jobs get looked at | 21:31 |
clarkb | in the dynamic case it depends on how many pages they hit | 21:31 |
clarkb | jeblair: right | 21:31 |
dmsimard | Otherwise it's 0 minutes or 1 second for one page or something :p | 21:31 |
dmsimard | (dynamic) | 21:31 |
clarkb | my point being depending on browsing habits one may be better than the other | 21:31 |
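[The static-vs-dynamic trade-off clarkb describes can be put in rough numbers. The 15-minute static-generation figure comes from the discussion above; the per-page dynamic render cost is an assumed illustration, not a measurement:]

```python
# Back-of-the-envelope comparison of the two rendering strategies
# discussed above. The 15-minute figure for statically generating
# OSA's ~10k pages comes from the chat; the 1-second dynamic
# per-page render cost is an assumption for illustration.
STATIC_COST_S = 15 * 60    # generate every page once, up front
DYNAMIC_COST_S = 1.0       # render one page on demand (assumed)

def total_cost(page_views, static=True):
    """Total CPU-seconds spent on a report for a given number of views."""
    return STATIC_COST_S if static else page_views * DYNAMIC_COST_S

# Dynamic rendering is cheaper until views pass the break-even point;
# with these numbers that's 900 page views per job report.
break_even = STATIC_COST_S / DYNAMIC_COST_S
```

[As jeblair notes below, most job results are never looked at, which is what makes the dynamic approach attractive despite the repeated per-view cost.]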
mordred | yah - I think it'll be a thing where poking at it a few different ways will likely teach us a lot | 21:32 |
jeblair | if i had to guess, from a global carbon footprint perspective, the central server would probably be a win for us :) | 21:32 |
dmsimard | That said, this is part of the reason why I submitted https://review.openstack.org/#/c/397773/ | 21:33 |
clarkb | dmsimard: oh cool is that ready? | 21:33 |
dmsimard | It wfm on logs-dev | 21:33 |
clarkb | awesome will review | 21:33 |
rcarrillocruz | mordred , jeblair : you good with +A https://review.openstack.org/#/c/404974/ | 21:34 |
rcarrillocruz | if so, i'll rebase the chain on it and remove role defaults | 21:34 |
dmsimard | clarkb: ty :) | 21:35 |
pabelanger | doh, misread our DIB schedule, another 45mins until fedora-23 start on nb01.o.o | 21:35 |
clarkb | there are other benefits to consider too, for example rendering in job means you can self test changes to the hosting. Downside to that is old content won't get bugfixed when you push changes; only new things will. etc. I think one of openstack-health's biggest issues has just been the pure quantity of data | 21:36 |
clarkb | and since it tries to provide a global view we can't as easily split that workload out onto the test jobs themselves | 21:36 |
clarkb | but "we have too much data in the mysql db" is probably a good problem to have as far as problems go | 21:37 |
pabelanger | that's what I like about stackviz, the in job rendering. So I hope we could also do the same with ARA, even if we did a central approach | 21:37 |
*** abregman has joined #zuul | 21:38 | |
jeblair | rcarrillocruz, mordred: i +3 404974 but mordred should still see it | 21:39 |
rcarrillocruz | thx | 21:40 |
* mordred looking | 21:40 | |
mordred | yah - looks great | 21:41 |
openstackgerrit | James E. Blair proposed openstack-infra/nodepool: Don't use taskmanagers in builder https://review.openstack.org/405663 | 21:56 |
jeblair | mordred, pabelanger, Shrews: ^ that should get rid of the bulk of our idle threads. even with that change, we can still rework the builder so that all the workers share providermanagers, and if we do that, it's actually a simple boolean switch for whether they serialize cloud api calls or not. | 21:58 |
openstackgerrit | James E. Blair proposed openstack-infra/nodepool: Don't use taskmanagers in builder https://review.openstack.org/405663 | 22:00 |
*** openstackgerrit has quit IRC | 22:03 | |
mordred | jeblair: that patch makes sense - and image uploads are certainly not what TaskManagers exist to help with | 22:04 |
rcarrillocruz | mordred: mind re-reviewing https://review.openstack.org/#/c/401975/ pls, I addressed clarkb's comments | 22:19 |
SpamapS | jeblair: indeed, 399871 looks correct now, thanks for spotting that. I'm pushing a new patch that fixes pep8. | 22:47 |
jeblair | SpamapS: thanks, sorry i missed the pep8 | 22:47 |
*** openstackgerrit has joined #zuul | 22:48 | |
openstackgerrit | K Jonathan Harker proposed openstack-infra/nodepool: Write per-label nodepool demand info to statsd https://review.openstack.org/246037 | 22:48 |
openstackgerrit | Clint 'SpamapS' Byrum proposed openstack-infra/zuul: Add test for variant override https://review.openstack.org/399871 | 22:49 |
*** abregman has quit IRC | 23:03 | |
openstackgerrit | Jamie Lennox proposed openstack-infra/nodepool: Accept user-home in config validator https://review.openstack.org/404519 | 23:25 |
jeblair | adam_g: https://storyboard.openstack.org/#!/story/2000785 | 23:43 |
jeblair | SpamapS: take a look at https://storyboard.openstack.org/#!/board/39 | 23:46 |
jeblair | SpamapS: i wrote a quick script that uses the storyboard api to semi-automatically manage that board | 23:46 |
jeblair | SpamapS: the rules are: merged items are removed from the board, inprogress/review must show up in "in progress" or "blocked" (in progress is the default). todo must show up in "new", "backlog" or "todo" ("new" is the default). | 23:48 |
jeblair | SpamapS: so basically, a "todo" task it hasn't seen before will be added to "new". we can move that to backlog or todo. when someone grabs it, it will be moved to "in progress" automatically. we can then move it to/from blocked as needed. when it merges, it's removed from the board. | 23:49 |
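[The lane rules jeblair describes are mechanical enough to sketch. A hypothetical model of just the placement logic — the actual storyboard API calls are omitted, and lane names follow the chat:]

```python
# Sketch of the board-maintenance rules described above: merged tasks
# leave the board; inprogress/review tasks must sit in "in progress"
# or "blocked" (default "in progress"); todo tasks must sit in "new",
# "backlog" or "todo" (default "new"). Manual lane moves within the
# allowed set are preserved. Storyboard API interaction is omitted.
VALID_LANES = {
    "inprogress": {"in progress", "blocked"},
    "review": {"in progress", "blocked"},
    "todo": {"new", "backlog", "todo"},
}
DEFAULT_LANE = {
    "inprogress": "in progress",
    "review": "in progress",
    "todo": "new",
}

def place_task(status, current_lane=None):
    """Return the lane a task belongs in, or None to drop it."""
    if status == "merged":
        return None                  # merged items leave the board
    if current_lane in VALID_LANES[status]:
        return current_lane          # manual placement is respected
    return DEFAULT_LANE[status]      # otherwise fall back to default

lane = place_task("todo")            # a fresh todo task lands in "new"
```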
jeblair | (could easily have a merge lane as well, but i'm not sure how useful it is) | 23:49 |
jeblair | Zara, SotK: ^ fyi | 23:52 |
openstackgerrit | Jamie Lennox proposed openstack-infra/zuul: Re-enable requirement vote tests https://review.openstack.org/401061 | 23:54 |
openstackgerrit | Jamie Lennox proposed openstack-infra/zuul: Re-enable requirement status tests https://review.openstack.org/401062 | 23:54 |
openstackgerrit | Jamie Lennox proposed openstack-infra/zuul: Re-enable requirement reject tests https://review.openstack.org/401063 | 23:54 |
Generated by irclog2html.py 2.14.0 by Marius Gedminas - find it at mg.pov.lt!