Friday, 2016-11-11

clarkbjeblair: Shrews I am reviewing https://review.openstack.org/#/c/383962/5/nodepool/tests/test_zk.py and one thing that stands out at me is we reuse the chrooted path but not the connection. Tests shouldn't interfere with each other bceause they each get a different chroot, but is it possible for the connections for a single test to interfere with each other?00:02
clarkbI think thats a very long winded way to ask "should we be using the zk fixture's connection in tests to avoid things fighting?"00:03
clarkboh we are making a second connection to test lock behavior nevermind00:04
clarkb(so any interference is intentional()00:04
Shuo_clarkb: is there zuul installation doc (more ops oriented)00:04
clarkbShuo_: http://docs.openstack.org/infra/zuul/quick-start.html#install-zuul is what we have I think00:05
openstackgerritIan Wienand proposed openstack-infra/nodepool: Handle exception from image upload  https://review.openstack.org/39644100:10
openstackgerritIan Wienand proposed openstack-infra/nodepool: Separate image upload logs into separate logger  https://review.openstack.org/39644200:10
jeblairclarkb: i think that's correct.00:10
clarkbjust starting to chew on https://review.openstack.org/#/c/394592/9/nodepool/zk.py and I feel like I need a schema/model written down somewhere00:18
clarkbquick digging in nodepool doesn't show one, is that something maybe we can add (or if I am blind help me find it?)00:19
clarkbfor example I don't know what attributes should be valid in the objects there etc00:19
Shrewsclarkb: the schema is "modeled" in the spec (http://specs.openstack.org/openstack-infra/infra-specs/specs/nodepool-zookeeper-workers.html), but having someone document it proprely is probably a good idea00:23
clarkbShrews: ya but that doesn't include eg the valid states and so on. Its a good high level overview but was hoping for something a bit more indepth without reverse engineering it00:25
clarkbor what the lock node is (which ahs shown up)00:25
clarkbanyways I have to go grocery shopping so have to put reviewing on hold (I revidewed the rest of the stack whcih is far less chewy, feel free to move head on the stack without me)00:26
Shrewsclarkb: the states and lock nodes were pretty dynamic as i worked my way through it, but i think we can properly document them, too, now00:26
Shrewsjust not sure how to make clear docs that do that. if anyone has good doc skillz, would be a good chance to contribute00:27
clarkbmaybe using a lang like voluptuous to describe the schema?00:29
openstackgerritJames E. Blair proposed openstack-infra/nodepool: Re-enable test_image_list_empty  https://review.openstack.org/39644900:29
openstackgerritIan Wienand proposed openstack-infra/nodepool: Separate image upload logs into separate logger  https://review.openstack.org/39644200:35
jeblairclarkb: that's nearly a 1:1 with the current db sqlalchemy model.  we could probably throw some docstrings in there for the properties and states, but it's an internal api, so i don't think we need to polish it for external consumption.00:38
jeblairShrews: ^00:38
Shrewsjeblair: still might be useful for the dev internals doc: https://github.com/openstack-infra/nodepool/blob/feature/zuulv3/doc/source/devguide.rst00:39
Shrewswhich reminds me, that needs updating00:43
jeblairShrews, clarkb, ianw: i'm wondering if we want to keep the image-upload command.00:43
jeblaircurrently, it submits a job over gearman to the builder to trigger an upload00:44
jeblairthis is not necessary in the zk model because the zk builder aggressively tries to upload any images that need it00:44
jeblairthe only use would be by an operator who manually logged into a builder node to run it locally00:44
*** jamielennox is now known as jamielennox|away00:44
jeblairwhile the builder daemon was not running00:44
jeblairi'm not sure we have an actual use for that (especially since we can configure nodepool to perform image work without launching nodes on a provider)00:45
jeblair(just to be clear -- i'm talking about the "nodepool image-upload" CLI command)00:46
Shrewswell00:46
Shrewsif a build were done by hand, then the ZK data would be missing00:47
Shrewsso it wouldn't be uploaded automagically00:47
Shrewsi'm not sure how a manual command could make that work, even in that case.00:48
Shrewsuploading is very dependent on build zk info... so if we need that functionality, some thought about making that work is needed00:49
*** Shuo_ has quit IRC00:49
Shrewsi don't think we need it, since we have the build request API00:50
Shrewsuploading would happen automatically in that case00:50
* Shrews votes to remove image-upload00:51
ianwit seems logical to roll uploading into building01:04
clarkbone reason manual splitting of the two is nice is uploads are super slow01:09
clarkband sometimes you just eant to make osic happen first01:09
clarkbas its fastest with most quota01:09
openstackgerritIan Wienand proposed openstack-infra/nodepool: Add option to force image delete  https://review.openstack.org/39647803:00
ianwthat's just the image-delete --force for v303:02
*** harlowja has quit IRC03:33
*** bhavik1 has joined #zuul04:10
*** harlowja has joined #zuul04:38
*** bcoca has quit IRC04:41
*** phschwartz has quit IRC04:47
*** harlowja has quit IRC05:51
*** harlowja has joined #zuul05:52
*** yolanda has quit IRC07:41
*** yolanda has joined #zuul07:41
*** harlowja has quit IRC08:03
*** abregman has joined #zuul08:47
*** abregman has quit IRC08:58
*** openstackgerrit has quit IRC09:48
*** openstackgerrit has joined #zuul09:49
*** abregman has joined #zuul09:50
*** abregman is now known as abregman|nb10:07
*** abregman|nb is now known as abregman|afk11:28
*** bhavik1 has quit IRC12:10
*** pabelanger has quit IRC12:57
*** pabelanger has joined #zuul12:57
pabelangerI think it's fine to remove it now, but with the option to add it back in if needed13:21
*** bcoca has joined #zuul13:32
*** abregman|afk is now known as abregman14:07
*** abregman is now known as abregman|afk14:12
*** herlo has quit IRC14:55
*** herlo has joined #zuul14:55
*** herlo has joined #zuul14:55
*** abregman|afk has quit IRC15:22
timrcGood morning!15:40
timrcSo I'm not sure what the exact plans for console logging are for zuulv3, but has anyone looked at kafka for this? Seems like we could get a lot out of it the box here... there's a kafka-console-producer that streams stdout to kafka and then things like kafka-websocket-consumer which will can read that stream (provided it knows the topic) to html515:44
timrcIt also uses zookeeper, sounds like a match made in heaven, really :) jk15:45
mordredmorning timrc !15:46
mordredso - amongst things that are important as we start poking at v3 console logging is that we need to be very careful to not need to install stuff on the build nodes15:48
openstackgerritMerged openstack-infra/nodepool: Add option to force image delete  https://review.openstack.org/39638815:48
timrcOh well that most likely invalidates kafka :)15:49
mordredthe current approach involves spinning up a little tiny daemon that just knows how to cat the console logs to a port directly - I think for v3 we likely want to hook in to that with an ansible callback plugin somehow to get a holistic view (the console logs and the rest of the ansible logs are split currently, which isn't a great user experience)15:49
mordredhowever - if we're serving logs from the launcher/workers instead of the build nodes, then it's possible that something like the kafka thing you're talking about could be useful15:50
mordredsince we don't have the same concern there15:50
pabelangerYa, would be great if we could leverage ansible callback somehow15:50
mordredpabelanger: yah - the real tricky part is that ansible callback isn't _really_ set up for streaming as much as it is reporting events being done when they're done15:51
timrcSo something like https://github.com/ansible/ansible/blob/devel/lib/ansible/plugins/callback/log_plays.py ?15:51
mordredpabelanger: I mean, it's definitely what we need to do - but something it going to be ugly in there15:51
timrcCould nc back to the launcher and then do something with it from there15:51
mordredtimrc: yah - exactly15:52
pabelangermordred: Right, maybe something we need to submit back to ansible to make better?15:52
timrcI'm not sure how threading works with ansible modules or if that even matters.15:52
mordredpabelanger: indeed15:52
mordredtimrc: I think in the callback plugin we can use threads ourselves without any issues - it'll have been launched in a subprocess15:53
bcocamordred: as long as it does not update any vars ...15:54
mordredbcoca: ++15:54
timrcReminds me of a time I tried using a generator in a multithreaded Python app :)15:56
bcocai just gave you a gun, you are the one that loaded it, pointed at your foot and started puling the trigger ...15:57
*** herlo has quit IRC15:58
timrcHey it _felt_ intuitive and it immediately became obvious what was happening, so I learned something about Python itself that day :)16:01
bcocawell, at least you have all your toes16:04
timrcThen there was this other time... jk16:05
*** herlo has joined #zuul16:05
*** herlo has joined #zuul16:05
mordredtimrc: that's why we call you 8-toe-tim16:06
timrcmordred: I like that folk think zuul is something new :)16:06
timrcre: that interivew you just did16:06
mordredtimrc: ikr?16:06
timrctimmy-two-toe16:06
mordredtimrc: timmy-two-toe is a much better name16:07
openstackgerritPaul Belanger proposed openstack-infra/zuul: Update webapp status json to support tenants  https://review.openstack.org/39168116:11
openstackgerritPaul Belanger proposed openstack-infra/zuul: Re-enable test_time_database test  https://review.openstack.org/39668416:11
*** abregman has joined #zuul16:29
*** abregman is now known as abregman|nb16:35
*** abregman|nb is now known as abregman16:43
*** abregman is now known as abregman|afk17:03
Shrewsjeblair: i don't see how 396422 will work17:05
Shrewsjeblair: in the _buildImage() method, image.name is used in the basename17:05
jeblairShrews: except that image is really a diskimage17:07
jeblairShrews: so with that change, we use diskimage name in both places17:07
Shrewsoh, different sources17:09
openstackgerritPaul Belanger proposed openstack-infra/zuul: Re-enable test_repo_deleted test  https://review.openstack.org/39670317:12
*** abregman|afk is now known as abregman|nb17:14
jeblairShrews: let me take another stab at trying to resolve the image/diskimage confusion17:17
jeblair(it's *really* chafing on me)17:17
jeblair(not to change that patch -- if i succeed, i'll build on to the existing stack)17:18
Shrewsjeblair: k. i'm stepping out for lunch anyway17:20
openstackgerritPaul Belanger proposed openstack-infra/zuul: Re-enable test_check_smtp_pool test  https://review.openstack.org/39670717:21
* timrc looks on with envy as he works on refactoring his jjb repo17:39
openstackgerritJames E. Blair proposed openstack-infra/nodepool: Remove snapshot support  https://review.openstack.org/39671918:04
*** harlowja has joined #zuul18:55
openstackgerritPaul Belanger proposed openstack-infra/zuul: [WIP] Implement tracking of launch attempts for jobs  https://review.openstack.org/39505619:07
openstackgerritCaleb Boylan proposed openstack-infra/nodepool: Fix subnode deletion  https://review.openstack.org/37045519:12
*** persia has quit IRC19:34
openstackgerritJames E. Blair proposed openstack-infra/nodepool: Remove snapshot support  https://review.openstack.org/39671919:35
openstackgerritJames E. Blair proposed openstack-infra/nodepool: Remove diskimage parameter from config  https://review.openstack.org/39674919:35
jeblairShrews, clarkb: ^ the first one is substantial, but 'easy' since it's just deleting a bunch of code we don't plan on using.  the second builds on that to remove the image->diskimage indirection in the config file which has been causing so much confusion for us.  it should be a backwards compatible change as long as your diskimage and provider image names match (ours do).  we will just need to remove the extra keys from our config after this lands.19:38
openstackgerritPaul Belanger proposed openstack-infra/zuul: [WIP] Implement tracking of launch attempts for jobs  https://review.openstack.org/39505620:01
*** abregman|nb is now known as abregman|afk20:37
*** _ari_ has quit IRC20:44
*** _ari_ has joined #zuul20:44
*** hashar has joined #zuul20:47
*** _ari_ has quit IRC20:56
*** _ari_ has joined #zuul21:01
*** _ari_ has quit IRC21:14
*** _ari_ has joined #zuul21:19
*** harlowja has quit IRC21:32
*** hashar has quit IRC21:43
openstackgerritPaul Belanger proposed openstack-infra/zuul: Re-enable test_disable_at test  https://review.openstack.org/39678521:52
openstackgerritPaul Belanger proposed openstack-infra/zuul: Re-enable test_disable_at test  https://review.openstack.org/39678521:59
openstackgerritPaul Belanger proposed openstack-infra/zuul: Re-enable test_live_reconfiguration test  https://review.openstack.org/39348821:59
openstackgerritPaul Belanger proposed openstack-infra/zuul: Re-enable test_crd_check_reconfiguration test  https://review.openstack.org/39678822:02
jeblairpabelanger: retry change lgtm except a couple of suggestions for improving the test.22:22
pabelangerjeblair: great, looking22:25
openstackgerritPaul Belanger proposed openstack-infra/zuul: Add attempts logic for jobs  https://review.openstack.org/39505622:52
pabelangerjeblair: ^ updated. I also exposed the ability to configure the attempt per job.  If you don't like that, I can remove it22:53
openstackgerritPaul Belanger proposed openstack-infra/zuul: Add attempts logic for jobs  https://review.openstack.org/39505622:53
jeblairpabelanger: lgtm.  metajob .* can act as a global default if needed.22:55
pabelangerjeblair: cool22:57
clarkbwould it be terrible if I asked that the test set a non default attempts to make sure that plumbing works? and also we should probably check in the test that we report failure as a result of running out of attempts? (maybe that is already checked and I am missing it)22:59
jeblairi think those are both good ideas23:00
clarkb(and for UI it might be simpler to understand if its not zero indexed and change the comparison to >= attempts, but I can go either way on that so meh)23:00
pabelangersure, I can update it23:00
*** harlowja has joined #zuul23:01
jeblairpabelanger, clarkb: yeah, let's go full-on pedant here:  if the word in the config file is 'attempts', it should be one based and >=.  if it is retries it should be 0 based and >.23:01
clarkbjeblair: +123:01
clarkb(you might have to update the tries dict initial value too to make that work)23:02
jeblairactually, i don't know if i got the >/>= right in that, but you get the idea.23:02
* pabelanger nods23:02
* jeblair hopes to avoid another 'rate' situation :)23:02
clarkbyes rate is sort of what I had in mind with that comment23:03
clarkbits documented so you can figure it out, but...23:03
jeblairswitch to v3 would be a good time to change that23:04
jeblairSpamapS: I spent some time with https://review.openstack.org/393544 and some test scripts and left some comments23:06
SpamapSjeblair: cool, glad you were able to play with it. I'm mostly just a huge inheritance hater, so the first sign of trouble with it has me removing it. :)23:17
SpamapSjeblair: pretty sure you can still do bytes for anything going in, and if you use a non Text* on the other side, you'll get bytes back.23:18
SpamapSI'll see if I can construct some func tests like that and look at how we might make the API simpler.23:19
openstackgerritPaul Belanger proposed openstack-infra/zuul: Add attempts logic for jobs  https://review.openstack.org/39505623:23
pabelangerclarkb: ^ how is that?23:23
jeblairSpamapS: i'm hoping the idea of not removing the tool of inheritence from gear users is enough to paper over our differences there :)  truth is -- gear was designed pretty heavily for inheritance (with the expectation that any slightly complex worker would end up subclassing Worker for example).  so it seems weird to break that model here.23:25
openstackgerritPaul Belanger proposed openstack-infra/zuul: Add attempts logic for jobs  https://review.openstack.org/39505623:25
jeblairSpamapS: in my py3 client test, it did not like me giving bytes to a TextJob.23:26
clarkbpabelanger: is the comparison in https://review.openstack.org/#/c/395056/8/zuul/launcher/gearman.py still correct?23:28
clarkbfirst time its 1 > 3, then 2 > 3, then 3 > 3 fails so only 2 attempts?23:28
clarkb(its weird because the check is in launch not in the oncompleted handling23:29
clarkbthough reading the test it shouldn't pass if ^ si correct23:30
clarkbsince its 2 passes for merge and test2, then 4 fails for test123:30
SpamapSjeblair: Yeah, I gave up on inheritance because I was being too generic. I can perhaps just define the attributes and arguments that get textified and keep inheritance.23:34
SpamapSjeblair: Either way, I'll make tests that take bytes, because the idea is for it to do the Text->bytes on the way in, and bytes->Text on the way out.. but not to care if you already give it bytes.23:35
jeblairSpamapS: yeah23:36
clarkbpabelanger: and the tests do pass so I am slightly confused23:36
jeblairSpamapS: what do you think of the 'job names are always utf8' idea?23:36
SpamapSjeblair: Like, even in the non-text client/worker?23:37
jeblairyep23:37
SpamapSIt might frustrate users of alternate encodings.23:37
SpamapSDon't know how much we care.23:37
SpamapSBut they could get around that by not passing strings in.23:38
jeblairSpamapS: basically, it would never occur to me that a job name would be anything other than utf8, so for ease of use in the lib (including for people with otherwise binary payloads), i'm willing to go out on a small limb and be opinionated on that.23:38
pabelangerclarkb: Ya, compare is correct. By default zuul will launch the job 3 times, and on the 4th time is where we hit the limit. So, we had 3 attempts and failed23:39
SpamapSjeblair: as long as we allow bytes it's a work-aroundable change.23:39
jeblairSpamapS: (as usual, it's the out direction that's more of a problem -- you're right about passing bytes in, but i'm also saying that a WorkerJob and TextWorkerJob should both have a utf8 string as their name attr)23:40
clarkbpabelanger: right but you start job and set attempts to 1, job stops you set attempts to 2, and now you are off by one I think but test says not so trying to reconcile that in my tracing23:40
clarkbpabelanger: oh! its because we don't then increment on the next job start we only increment on job finishes23:40
clarkbpabelanger: now I udnerstand23:40
pabelangerright23:40
SpamapSjeblair: yeah, I think that's theoretically a problem, but I don't know if it truly would matter.23:40
clarkbpabelanger: +223:40
pabelangeryay23:41
jeblairSpamapS: agreed23:41
pabelangerclarkb: should be able to bring the change on line next week I think23:41
clarkbjeblair: SpamapS: couldn't you theoretically use integer names?23:42
clarkbI'd have to go reread the protocol spec23:42
SpamapSyou can23:42
SpamapSyou can use anything except \023:42
jeblairso yes, i'm in the unusual position of advocating for a deviation from the spec :)23:43
clarkbto me it seems easy enough to recomment text interaction with protocol and provide the tools to do so, while still having a "raw" version under the hood for people too23:44
jeblairbasically, this is my argument based on a theoretical jpeg encoding worker: http://paste.openstack.org/show/588987/23:44
clarkb*recommend23:44
SpamapSWe COULD make the current implementation "BytesClient" and "BytesWorker"...23:44
SpamapSand then Client and Worker add the job name thing23:44
SpamapSand TextClient and TextWorker also do string conversions on arguments.23:45
SpamapSso you'd technically break API for the tiny tiny tiny section of theoretical users who are in fact using non-utf8 function names with gear23:45
SpamapSbut you'd give them an out (Bytes*)23:45
jeblairSpamapS: i think that would work23:46
SpamapSand then python3 users who want ease of port will either be covered for their =='s, or go to Text* for their payloads that they expect to be strings.23:46
SpamapSSeems like the simplest "harm reduction strategy".23:46
clarkbhaving used binary protocols in other places for things like radios in airplanes, it feels really weird to me to not support the bianry aspects of a binary protocol23:47
SpamapSand now that I'm thinking about arguments and name separately, it makes the whole tamale simpler to code.23:47
jeblairclarkb: your phrase 'text interaction with protocol' makes me realise that the gearman admin protocol pretty much assumes job names will be printable text.23:47
SpamapSclarkb: Having supported hundreds of gearman users, I haven't found one yet that used anything but ascii.23:47
clarkband we even had implemetnations written in lua which has no Integer type...23:48
SpamapSLike I totally get you, but we _are_ supporting the weird case of using invalid UTF-8, We're just breaking API for them.23:48
clarkbSpamapS: right I have no issue with making it easier to use utf8 if you assert it23:49
jeblairclarkb, SpamapS: so i'm going to continue to believe there's an implicit assumption that the job name is utf8 at best (or, more realistically as SpamapS just said, ascii).  and it would be totally okay for us to do that.  however, this BytesJob/Job/TextJob solution still works, so if that's what you want to do, okay.  we should probably annotate BytesJob with "this is probably a bad idea".  :)23:49
clarkband since under the hood it must be binary anyways, I thinkwe can just build it in layers23:49
jeblair(since we've just now made an easy api for people to have non-utf8 job names)23:50
SpamapSI don't think it's a bad idea.23:50
clarkbyup I think that spread of options is fine and fine with telling people that using human readable strings for job names is recommended23:50
jeblairSpamapS: it's not a bad idea to have '\x01' as a job name?23:50
SpamapSIt's just highly irregular and I personally have not seen it in the 7 years I've been developing, using, and supporting gearman.23:50
clarkbone reason people might do \x01 as a job name is if they want to be silly fancty with optimizing job dispatch using jump tables and such23:51
jeblairSpamapS: it will not look right when you issue the 'status' admin command23:51
SpamapSjeblair: no, it's not. It's weird. It's not going to be easy. But there may very well be a use case. :-P23:51
clarkb(not sure thats a good idea but its the sort of thing people seem to do in the world)23:51
SpamapSjeblair: hah yeha, it's going to go all wonk23:52
jeblairSpamapS: maybe we can negotiate down to "not recommended for general use"? :)23:52
SpamapSsome people don't even know the admin protocol exists :p23:52
SpamapS"crazy pants"23:52
jeblairyou drive a hard bargain23:52
SpamapSI personally love wearing my crazy pants.23:52
SpamapSI'll think hard about it, and submit something with the next patch round.23:53
jeblair(to be fair, the admin protocol also implies <tab> would be a bad idea^W^W^W crazy pants for a job name)23:53
SpamapSbut discouragement is definitely the goal23:53
jeblairSpamapS: that's like my motto23:53
SpamapSAPI docs: discouraging crazypants since 197423:54
* jeblair (nerdsniped) attempts to find earliest api documentation23:54
SpamapShook.. line.. und sinkah23:55

Generated by irclog2html.py 2.14.0 by Marius Gedminas - find it at mg.pov.lt!