Friday, 2016-11-11

clarkb	jeblair: Shrews I am reviewing https://review.openstack.org/#/c/383962/5/nodepool/tests/test_zk.py and one thing that stands out to me is we reuse the chrooted path but not the connection. Tests shouldn't interfere with each other because they each get a different chroot, but is it possible for the connections for a single test to interfere with each other?	00:02
clarkb	I think that's a very long-winded way to ask "should we be using the zk fixture's connection in tests to avoid things fighting?"	00:03
clarkb	oh we are making a second connection to test lock behavior nevermind	00:04
clarkb	(so any interference is intentional)	00:04
Shuo_	clarkb: is there zuul installation doc (more ops oriented)	00:04
clarkb	Shuo_: http://docs.openstack.org/infra/zuul/quick-start.html#install-zuul is what we have I think	00:05
openstackgerrit	Ian Wienand proposed openstack-infra/nodepool: Handle exception from image upload https://review.openstack.org/396441	00:10
openstackgerrit	Ian Wienand proposed openstack-infra/nodepool: Separate image upload logs into separate logger https://review.openstack.org/396442	00:10
jeblair	clarkb: i think that's correct.	00:10
clarkb	just starting to chew on https://review.openstack.org/#/c/394592/9/nodepool/zk.py and I feel like I need a schema/model written down somewhere	00:18
clarkb	quick digging in nodepool doesn't show one, is that something maybe we can add (or if I am blind help me find it?)	00:19
clarkb	for example I don't know what attributes should be valid in the objects there etc	00:19
Shrews	clarkb: the schema is "modeled" in the spec (http://specs.openstack.org/openstack-infra/infra-specs/specs/nodepool-zookeeper-workers.html), but having someone document it properly is probably a good idea	00:23
clarkb	Shrews: ya but that doesn't include eg the valid states and so on. It's a good high level overview but was hoping for something a bit more in-depth without reverse engineering it	00:25
clarkb	or what the lock node is (which has shown up)	00:25
clarkb	anyways I have to go grocery shopping so have to put reviewing on hold (I reviewed the rest of the stack which is far less chewy, feel free to move ahead on the stack without me)	00:26
Shrews	clarkb: the states and lock nodes were pretty dynamic as i worked my way through it, but i think we can properly document them, too, now	00:26
Shrews	just not sure how to make clear docs that do that. if anyone has good doc skillz, would be a good chance to contribute	00:27
clarkb	maybe using a lang like voluptuous to describe the schema?	00:29
openstackgerrit	James E. Blair proposed openstack-infra/nodepool: Re-enable test_image_list_empty https://review.openstack.org/396449	00:29
openstackgerrit	Ian Wienand proposed openstack-infra/nodepool: Separate image upload logs into separate logger https://review.openstack.org/396442	00:35
jeblair	clarkb: that's nearly a 1:1 with the current db sqlalchemy model. we could probably throw some docstrings in there for the properties and states, but it's an internal api, so i don't think we need to polish it for external consumption.	00:38
jeblair	Shrews: ^	00:38
Shrews	jeblair: still might be useful for the dev internals doc: https://github.com/openstack-infra/nodepool/blob/feature/zuulv3/doc/source/devguide.rst	00:39
Shrews	which reminds me, that needs updating	00:43
jeblair	Shrews, clarkb, ianw: i'm wondering if we want to keep the image-upload command.	00:43
jeblair	currently, it submits a job over gearman to the builder to trigger an upload	00:44
jeblair	this is not necessary in the zk model because the zk builder aggressively tries to upload any images that need it	00:44
jeblair	the only use would be by an operator who manually logged into a builder node to run it locally	00:44
*** jamielennox is now known as jamielennox\|away		00:44
jeblair	while the builder daemon was not running	00:44
jeblair	i'm not sure we have an actual use for that (especially since we can configure nodepool to perform image work without launching nodes on a provider)	00:45
jeblair	(just to be clear -- i'm talking about the "nodepool image-upload" CLI command)	00:46
Shrews	well	00:46
Shrews	if a build were done by hand, then the ZK data would be missing	00:47
Shrews	so it wouldn't be uploaded automagically	00:47
Shrews	i'm not sure how a manual command could make that work, even in that case.	00:48
Shrews	uploading is very dependent on build zk info... so if we need that functionality, some thought about making that work is needed	00:49
*** Shuo_ has quit IRC		00:49
Shrews	i don't think we need it, since we have the build request API	00:50
Shrews	uploading would happen automatically in that case	00:50
* Shrews votes to remove image-upload		00:51
ianw	it seems logical to roll uploading into building	01:04
clarkb	one reason manual splitting of the two is nice is uploads are super slow	01:09
clarkb	and sometimes you just want to make osic happen first	01:09
clarkb	as it's fastest with most quota	01:09
openstackgerrit	Ian Wienand proposed openstack-infra/nodepool: Add option to force image delete https://review.openstack.org/396478	03:00
ianw	that's just the image-delete --force for v3	03:02
*** harlowja has quit IRC		03:33
*** bhavik1 has joined #zuul		04:10
*** harlowja has joined #zuul		04:38
*** bcoca has quit IRC		04:41
*** phschwartz has quit IRC		04:47
*** harlowja has quit IRC		05:51
*** harlowja has joined #zuul		05:52
*** yolanda has quit IRC		07:41
*** yolanda has joined #zuul		07:41
*** harlowja has quit IRC		08:03
*** abregman has joined #zuul		08:47
*** abregman has quit IRC		08:58
*** openstackgerrit has quit IRC		09:48
*** openstackgerrit has joined #zuul		09:49
*** abregman has joined #zuul		09:50
*** abregman is now known as abregman\|nb		10:07
*** abregman\|nb is now known as abregman\|afk		11:28
*** bhavik1 has quit IRC		12:10
*** pabelanger has quit IRC		12:57
*** pabelanger has joined #zuul		12:57
pabelanger	I think it's fine to remove it now, but with the option to add it back in if needed	13:21
*** bcoca has joined #zuul		13:32
*** abregman\|afk is now known as abregman		14:07
*** abregman is now known as abregman\|afk		14:12
*** herlo has quit IRC		14:55
*** herlo has joined #zuul		14:55
*** abregman\|afk has quit IRC		15:22
timrc	Good morning!	15:40
timrc	So I'm not sure what the exact plans for console logging are for zuulv3, but has anyone looked at kafka for this? Seems like we could get a lot out of the box here... there's a kafka-console-producer that streams stdout to kafka and then things like kafka-websocket-consumer which can read that stream (provided it knows the topic) to html5	15:44
timrc	It also uses zookeeper, sounds like a match made in heaven, really :) jk	15:45
mordred	morning timrc !	15:46
mordred	so - amongst things that are important as we start poking at v3 console logging is that we need to be very careful to not need to install stuff on the build nodes	15:48
openstackgerrit	Merged openstack-infra/nodepool: Add option to force image delete https://review.openstack.org/396388	15:48
timrc	Oh well that most likely invalidates kafka :)	15:49
mordred	the current approach involves spinning up a little tiny daemon that just knows how to cat the console logs to a port directly - I think for v3 we likely want to hook in to that with an ansible callback plugin somehow to get a holistic view (the console logs and the rest of the ansible logs are split currently, which isn't a great user experience)	15:49
mordred	however - if we're serving logs from the launcher/workers instead of the build nodes, then it's possible that something like the kafka thing you're talking about could be useful	15:50
mordred	since we don't have the same concern there	15:50
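A minimal sketch of the "little tiny daemon" idea mordred describes: a process that only knows how to cat a console log to whoever connects on a port. This is not Zuul's actual daemon; the log path, the port number, and the names `send_log`, `ConsoleHandler` and `serve` are all assumptions made for illustration. It uses only the Python standard library, which fits the constraint of not installing anything extra on the build nodes.

```python
import socketserver

# Hypothetical values, chosen only for this sketch.
CONSOLE_LOG = "/tmp/console.log"
CONSOLE_PORT = 8888


def send_log(path, sock, chunk_size=4096):
    """Copy the file at `path` to the connected socket, chunk by chunk."""
    with open(path, "rb") as log:
        while True:
            chunk = log.read(chunk_size)
            if not chunk:
                break
            sock.sendall(chunk)


class ConsoleHandler(socketserver.BaseRequestHandler):
    """For each connection, send the current log contents and hang up."""

    def handle(self):
        send_log(CONSOLE_LOG, self.request)


def serve(host="0.0.0.0", port=CONSOLE_PORT):
    """Run the daemon until it is interrupted."""
    with socketserver.TCPServer((host, port), ConsoleHandler) as server:
        server.serve_forever()
```

The sketch sends the file once and closes the connection; it does not follow the log as it grows, and it does not merge in the rest of the ansible output, which is the gap the callback plugin discussion is about.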
pabelanger	Ya, would be great if we could leverage ansible callback somehow	15:50
mordred	pabelanger: yah - the real tricky part is that ansible callback isn't _really_ set up for streaming as much as it is reporting events being done when they're done	15:51
timrc	So something like https://github.com/ansible/ansible/blob/devel/lib/ansible/plugins/callback/log_plays.py ?	15:51
mordred	pabelanger: I mean, it's definitely what we need to do - but something is going to be ugly in there	15:51
timrc	Could nc back to the launcher and then do something with it from there	15:51
mordred	timrc: yah - exactly	15:52
pabelanger	mordred: Right, maybe something we need to submit back to ansible to make better?	15:52
timrc	I'm not sure how threading works with ansible modules or if that even matters.	15:52
mordred	pabelanger: indeed	15:52
mordred	timrc: I think in the callback plugin we can use threads ourselves without any issues - it'll have been launched in a subprocess	15:53
bcoca	mordred: as long as it does not update any vars ...	15:54
mordred	bcoca: ++	15:54
timrc	Reminds me of a time I tried using a generator in a multithreaded Python app :)	15:56
bcoca	i just gave you a gun, you are the one that loaded it, pointed at your foot and started pulling the trigger ...	15:57
*** herlo has quit IRC		15:58
timrc	Hey it _felt_ intuitive and it immediately became obvious what was happening, so I learned something about Python itself that day :)	16:01
bcoca	well, at least you have all your toes	16:04
timrc	Then there was this other time... jk	16:05
*** herlo has joined #zuul		16:05
mordred	timrc: that's why we call you 8-toe-tim	16:06
timrc	mordred: I like that folk think zuul is something new :)	16:06
timrc	re: that interview you just did	16:06
mordred	timrc: ikr?	16:06
timrc	timmy-two-toe	16:06
mordred	timrc: timmy-two-toe is a much better name	16:07
openstackgerrit	Paul Belanger proposed openstack-infra/zuul: Update webapp status json to support tenants https://review.openstack.org/391681	16:11
openstackgerrit	Paul Belanger proposed openstack-infra/zuul: Re-enable test_time_database test https://review.openstack.org/396684	16:11
*** abregman has joined #zuul		16:29
*** abregman is now known as abregman\|nb		16:35
*** abregman\|nb is now known as abregman		16:43
*** abregman is now known as abregman\|afk		17:03
Shrews	jeblair: i don't see how 396422 will work	17:05
Shrews	jeblair: in the _buildImage() method, image.name is used in the basename	17:05
jeblair	Shrews: except that image is really a diskimage	17:07
jeblair	Shrews: so with that change, we use diskimage name in both places	17:07
Shrews	oh, different sources	17:09
openstackgerrit	Paul Belanger proposed openstack-infra/zuul: Re-enable test_repo_deleted test https://review.openstack.org/396703	17:12
*** abregman\|afk is now known as abregman\|nb		17:14
jeblair	Shrews: let me take another stab at trying to resolve the image/diskimage confusion	17:17
jeblair	(it's really chafing on me)	17:17
jeblair	(not to change that patch -- if i succeed, i'll build on to the existing stack)	17:18
Shrews	jeblair: k. i'm stepping out for lunch anyway	17:20
openstackgerrit	Paul Belanger proposed openstack-infra/zuul: Re-enable test_check_smtp_pool test https://review.openstack.org/396707	17:21
* timrc looks on with envy as he works on refactoring his jjb repo		17:39
openstackgerrit	James E. Blair proposed openstack-infra/nodepool: Remove snapshot support https://review.openstack.org/396719	18:04
*** harlowja has joined #zuul		18:55
openstackgerrit	Paul Belanger proposed openstack-infra/zuul: [WIP] Implement tracking of launch attempts for jobs https://review.openstack.org/395056	19:07
openstackgerrit	Caleb Boylan proposed openstack-infra/nodepool: Fix subnode deletion https://review.openstack.org/370455	19:12
*** persia has quit IRC		19:34
openstackgerrit	James E. Blair proposed openstack-infra/nodepool: Remove snapshot support https://review.openstack.org/396719	19:35
openstackgerrit	James E. Blair proposed openstack-infra/nodepool: Remove diskimage parameter from config https://review.openstack.org/396749	19:35
jeblair	Shrews, clarkb: ^ the first one is substantial, but 'easy' since it's just deleting a bunch of code we don't plan on using. the second builds on that to remove the image->diskimage indirection in the config file which has been causing so much confusion for us. it should be a backwards compatible change as long as your diskimage and provider image names match (ours do). we will just need to remove the extra keys from our config after this lands.	19:38
openstackgerrit	Paul Belanger proposed openstack-infra/zuul: [WIP] Implement tracking of launch attempts for jobs https://review.openstack.org/395056	20:01
*** abregman\|nb is now known as abregman\|afk		20:37
*** _ari_ has quit IRC		20:44
*** _ari_ has joined #zuul		20:44
*** hashar has joined #zuul		20:47
*** _ari_ has quit IRC		20:56
*** _ari_ has joined #zuul		21:01
*** _ari_ has quit IRC		21:14
*** _ari_ has joined #zuul		21:19
*** harlowja has quit IRC		21:32
*** hashar has quit IRC		21:43
openstackgerrit	Paul Belanger proposed openstack-infra/zuul: Re-enable test_disable_at test https://review.openstack.org/396785	21:52
openstackgerrit	Paul Belanger proposed openstack-infra/zuul: Re-enable test_disable_at test https://review.openstack.org/396785	21:59
openstackgerrit	Paul Belanger proposed openstack-infra/zuul: Re-enable test_live_reconfiguration test https://review.openstack.org/393488	21:59
openstackgerrit	Paul Belanger proposed openstack-infra/zuul: Re-enable test_crd_check_reconfiguration test https://review.openstack.org/396788	22:02
jeblair	pabelanger: retry change lgtm except a couple of suggestions for improving the test.	22:22
pabelanger	jeblair: great, looking	22:25
openstackgerrit	Paul Belanger proposed openstack-infra/zuul: Add attempts logic for jobs https://review.openstack.org/395056	22:52
pabelanger	jeblair: ^ updated. I also exposed the ability to configure the attempt per job. If you don't like that, I can remove it	22:53
openstackgerrit	Paul Belanger proposed openstack-infra/zuul: Add attempts logic for jobs https://review.openstack.org/395056	22:53
jeblair	pabelanger: lgtm. metajob .* can act as a global default if needed.	22:55
pabelanger	jeblair: cool	22:57
clarkb	would it be terrible if I asked that the test set a non default attempts to make sure that plumbing works? and also we should probably check in the test that we report failure as a result of running out of attempts? (maybe that is already checked and I am missing it)	22:59
jeblair	i think those are both good ideas	23:00
clarkb	(and for UI it might be simpler to understand if it's not zero indexed and change the comparison to >= attempts, but I can go either way on that so meh)	23:00
pabelanger	sure, I can update it	23:00
*** harlowja has joined #zuul		23:01
jeblair	pabelanger, clarkb: yeah, let's go full-on pedant here: if the word in the config file is 'attempts', it should be one based and >=. if it is retries it should be 0 based and >.	23:01
clarkb	jeblair: +1	23:01
clarkb	(you might have to update the tries dict initial value too to make that work)	23:02
jeblair	actually, i don't know if i got the >/>= right in that, but you get the idea.	23:02
* pabelanger nods		23:02
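A small sketch of the naming distinction jeblair draws above, under one consistent reading of it (he notes himself that he may not have the >/>= exactly right). It is not Zuul's code; `runs_with_attempts` and `runs_with_retries` are hypothetical helpers that count how many times a job that always fails would run under each convention.

```python
def runs_with_attempts(attempts):
    """'attempts' is one-based: the limit counts every run, the first included."""
    run_number = 0
    while True:
        run_number += 1  # this run is number 1, 2, 3, ...
        # The job runs here and, in this sketch, always fails.
        if run_number >= attempts:
            return run_number  # limit reached, report failure


def runs_with_retries(retries):
    """'retries' is zero-based: the limit counts only re-runs after the first."""
    runs = 0
    retries_used = 0
    while True:
        runs += 1
        # The job runs here and, in this sketch, always fails.
        retries_used += 1  # the next run would be retry number `retries_used`
        if retries_used > retries:
            return runs  # no retries left, report failure
```

With a limit of 3, the first helper returns 3 runs and the second returns 4, so `attempts: 3` and `retries: 2` describe the same behavior. That difference is why the word used in the config file matters.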
* jeblair hopes to avoid another 'rate' situation :)		23:02
clarkb	yes rate is sort of what I had in mind with that comment	23:03
clarkb	it's documented so you can figure it out, but...	23:03
jeblair	switch to v3 would be a good time to change that	23:04
jeblair	SpamapS: I spent some time with https://review.openstack.org/393544 and some test scripts and left some comments	23:06
SpamapS	jeblair: cool, glad you were able to play with it. I'm mostly just a huge inheritance hater, so the first sign of trouble with it has me removing it. :)	23:17
SpamapS	jeblair: pretty sure you can still do bytes for anything going in, and if you use a non Text* on the other side, you'll get bytes back.	23:18
SpamapS	I'll see if I can construct some func tests like that and look at how we might make the API simpler.	23:19
openstackgerrit	Paul Belanger proposed openstack-infra/zuul: Add attempts logic for jobs https://review.openstack.org/395056	23:23
pabelanger	clarkb: ^ how is that?	23:23
jeblair	SpamapS: i'm hoping the idea of not removing the tool of inheritance from gear users is enough to paper over our differences there :) truth is -- gear was designed pretty heavily for inheritance (with the expectation that any slightly complex worker would end up subclassing Worker for example). so it seems weird to break that model here.	23:25
openstackgerrit	Paul Belanger proposed openstack-infra/zuul: Add attempts logic for jobs https://review.openstack.org/395056	23:25
jeblair	SpamapS: in my py3 client test, it did not like me giving bytes to a TextJob.	23:26
clarkb	pabelanger: is the comparison in https://review.openstack.org/#/c/395056/8/zuul/launcher/gearman.py still correct?	23:28
clarkb	first time it's 1 > 3, then 2 > 3, then 3 > 3 fails so only 2 attempts?	23:28
clarkb	(it's weird because the check is in launch not in the oncompleted handling)	23:29
clarkb	though reading the test it shouldn't pass if ^ is correct	23:30
clarkb	since it's 2 passes for merge and test2, then 4 fails for test1	23:30
SpamapS	jeblair: Yeah, I gave up on inheritance because I was being too generic. I can perhaps just define the attributes and arguments that get textified and keep inheritance.	23:34
SpamapS	jeblair: Either way, I'll make tests that take bytes, because the idea is for it to do the Text->bytes on the way in, and bytes->Text on the way out.. but not to care if you already give it bytes.	23:35
jeblair	SpamapS: yeah	23:36
clarkb	pabelanger: and the tests do pass so I am slightly confused	23:36
jeblair	SpamapS: what do you think of the 'job names are always utf8' idea?	23:36
SpamapS	jeblair: Like, even in the non-text client/worker?	23:37
jeblair	yep	23:37
SpamapS	It might frustrate users of alternate encodings.	23:37
SpamapS	Don't know how much we care.	23:37
SpamapS	But they could get around that by not passing strings in.	23:38
jeblair	SpamapS: basically, it would never occur to me that a job name would be anything other than utf8, so for ease of use in the lib (including for people with otherwise binary payloads), i'm willing to go out on a small limb and be opinionated on that.	23:38
pabelanger	clarkb: Ya, compare is correct. By default zuul will launch the job 3 times, and on the 4th time is where we hit the limit. So, we had 3 attempts and failed	23:39
SpamapS	jeblair: as long as we allow bytes it's a work-aroundable change.	23:39
jeblair	SpamapS: (as usual, it's the out direction that's more of a problem -- you're right about passing bytes in, but i'm also saying that a WorkerJob and TextWorkerJob should both have a utf8 string as their name attr)	23:40
clarkb	pabelanger: right but you start job and set attempts to 1, job stops you set attempts to 2, and now you are off by one I think but test says not so trying to reconcile that in my tracing	23:40
clarkb	pabelanger: oh! its because we don't then increment on the next job start we only increment on job finishes	23:40
clarkb	pabelanger: now I understand	23:40
pabelanger	right	23:40
SpamapS	jeblair: yeah, I think that's theoretically a problem, but I don't know if it truly would matter.	23:40
clarkb	pabelanger: +2	23:40
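The trace clarkb works through above can be written down as a short simulation. This is a sketch based only on what is said in the discussion, not the code under review: the counter is assumed to start at 1, the `tries > attempts` check is assumed to happen at launch time, and the counter is assumed to be incremented only when a run finishes. `simulate_launches` is a hypothetical name.

```python
def simulate_launches(attempts=3):
    """Trace launch-time checks for a job that fails every time.

    Returns (runs, checks), where checks lists each (tries, limit_hit) pair
    evaluated at launch time.
    """
    tries = 1  # assumed initial value of the counter
    runs = 0
    checks = []
    while True:
        limit_hit = tries > attempts  # the check lives in launch
        checks.append((tries, limit_hit))
        if limit_hit:
            return runs, checks  # report failure instead of launching
        runs += 1   # the job runs and fails
        tries += 1  # incremented on completion, not on the next launch
```

With `attempts=3` the checks are 1 > 3, 2 > 3, 3 > 3 and 4 > 3, and only the last one is true, so the job runs three times and the fourth launch is where the limit is hit, as pabelanger describes.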
pabelanger	yay	23:41
jeblair	SpamapS: agreed	23:41
pabelanger	clarkb: should be able to bring the change on line next week I think	23:41
clarkb	jeblair: SpamapS: couldn't you theoretically use integer names?	23:42
clarkb	I'd have to go reread the protocol spec	23:42
SpamapS	you can	23:42
SpamapS	you can use anything except \0	23:42
jeblair	so yes, i'm in the unusual position of advocating for a deviation from the spec :)	23:43
clarkb	to me it seems easy enough to recomment text interaction with protocol and provide the tools to do so, while still having a "raw" version under the hood for people too	23:44
jeblair	basically, this is my argument based on a theoretical jpeg encoding worker: http://paste.openstack.org/show/588987/	23:44
clarkb	*recommend	23:44
SpamapS	We COULD make the current implementation "BytesClient" and "BytesWorker"...	23:44
SpamapS	and then Client and Worker add the job name thing	23:44
SpamapS	and TextClient and TextWorker also do string conversions on arguments.	23:45
SpamapS	so you'd technically break API for the tiny tiny tiny section of theoretical users who are in fact using non-utf8 function names with gear	23:45
SpamapS	but you'd give them an out (Bytes*)	23:45
jeblair	SpamapS: i think that would work	23:46
SpamapS	and then python3 users who want ease of port will either be covered for their =='s, or go to Text* for their payloads that they expect to be strings.	23:46
SpamapS	Seems like the simplest "harm reduction strategy".	23:46
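The layering SpamapS proposes can be sketched with plain inheritance. This is an illustration of the idea only, not the gear library's API: the classes below are hypothetical stand-ins that happen to reuse the names from the discussion, and `to_bytes` is an assumed helper. Each layer accepts either text or bytes on the way in and differs only in what it hands back.

```python
def to_bytes(value):
    """Encode text as UTF-8; pass bytes through untouched."""
    if isinstance(value, str):
        return value.encode("utf-8")
    return bytes(value)


class BytesJob:
    """Raw layer: name and arguments both come back as bytes."""

    def __init__(self, name, arguments):
        self._name = to_bytes(name)
        self._arguments = to_bytes(arguments)

    @property
    def name(self):
        return self._name

    @property
    def arguments(self):
        return self._arguments


class Job(BytesJob):
    """Middle layer: the name is UTF-8 text, the payload stays binary."""

    @property
    def name(self):
        return self._name.decode("utf-8")


class TextJob(Job):
    """Top layer: name and arguments are both UTF-8 text."""

    @property
    def arguments(self):
        return self._arguments.decode("utf-8")
```

In this sketch a worker with a binary payload, like the theoretical jpeg encoder, can compare `Job("encode_jpeg", b"\xff\xd8").name == "encode_jpeg"` without decoding anything itself, while a job named `b"\x01"` would still be possible through `BytesJob`.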
clarkb	having used binary protocols in other places for things like radios in airplanes, it feels really weird to me to not support the binary aspects of a binary protocol	23:47
SpamapS	and now that I'm thinking about arguments and name separately, it makes the whole tamale simpler to code.	23:47
jeblair	clarkb: your phrase 'text interaction with protocol' makes me realise that the gearman admin protocol pretty much assumes job names will be printable text.	23:47
SpamapS	clarkb: Having supported hundreds of gearman users, I haven't found one yet that used anything but ascii.	23:47
clarkb	and we even had implementations written in lua which has no Integer type...	23:48
SpamapS	Like I totally get you, but we _are_ supporting the weird case of using invalid UTF-8. We're just breaking API for them.	23:48
clarkb	SpamapS: right I have no issue with making it easier to use utf8 if you assert it	23:49
jeblair	clarkb, SpamapS: so i'm going to continue to believe there's an implicit assumption that the job name is utf8 at best (or, more realistically as SpamapS just said, ascii). and it would be totally okay for us to do that. however, this BytesJob/Job/TextJob solution still works, so if that's what you want to do, okay. we should probably annotate BytesJob with "this is probably a bad idea". :)	23:49
clarkb	and since under the hood it must be binary anyways, I think we can just build it in layers	23:49
jeblair	(since we've just now made an easy api for people to have non-utf8 job names)	23:50
SpamapS	I don't think it's a bad idea.	23:50
clarkb	yup I think that spread of options is fine and fine with telling people that using human readable strings for job names is recommended	23:50
jeblair	SpamapS: it's not a bad idea to have '\x01' as a job name?	23:50
SpamapS	It's just highly irregular and I personally have not seen it in the 7 years I've been developing, using, and supporting gearman.	23:50
clarkb	one reason people might do \x01 as a job name is if they want to be silly fancy with optimizing job dispatch using jump tables and such	23:51
jeblair	SpamapS: it will not look right when you issue the 'status' admin command	23:51
SpamapS	jeblair: no, it's not. It's weird. It's not going to be easy. But there may very well be a use case. :-P	23:51
clarkb	(not sure that's a good idea but it's the sort of thing people seem to do in the world)	23:51
SpamapS	jeblair: hah yeah, it's going to go all wonk	23:52
jeblair	SpamapS: maybe we can negotiate down to "not recommended for general use"? :)	23:52
SpamapS	some people don't even know the admin protocol exists :p	23:52
SpamapS	"crazy pants"	23:52
jeblair	you drive a hard bargain	23:52
SpamapS	I personally love wearing my crazy pants.	23:52
SpamapS	I'll think hard about it, and submit something with the next patch round.	23:53
jeblair	(to be fair, the admin protocol also implies <tab> would be a bad idea^W^W^W crazy pants for a job name)	23:53
SpamapS	but discouragement is definitely the goal	23:53
jeblair	SpamapS: that's like my motto	23:53
SpamapS	API docs: discouraging crazypants since 1974	23:54
* jeblair (nerdsniped) attempts to find earliest api documentation		23:54
SpamapS	hook.. line.. und sinkah	23:55
