Wednesday, 2017-02-15

00:01 <jeblair> Shrews: wfm (another approach might be to build a set while iterating, then check that the sets match, but i don't think that's necessary).
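The alternative jeblair mentions can be sketched in a few lines. This is a hypothetical illustration, not code from the nodepool change under review; `process` is a stand-in for whatever work the loop actually does.

```python
# Hypothetical sketch of the alternative jeblair mentions: collect
# items into a set while iterating, then check that the sets match.
def process(item):
    # stand-in for the real per-item work
    pass

def iterate_and_verify(iterable, expected):
    seen = set()
    for item in iterable:
        process(item)        # whatever work the loop does
        seen.add(item)       # build the set while iterating
    return seen == expected  # check that the sets match
```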
00:02 <Shrews> yeah, didn't think that was necessary
00:02 *** jamielennox_ has joined #zuul
00:04 *** saneax-_-|AFK has joined #zuul
00:06 *** jamielennox_ is now known as jamielennox
01:43 <SpamapS> oy
01:43 <SpamapS> I've gotten nowhere on this
01:43 <SpamapS> 2 hours in and I basically have spent 8 minutes actually debugging :-P
02:11 <openstackgerrit> David Shrewsbury proposed openstack-infra/nodepool feature/zuulv3: Implement node cleanup  https://review.openstack.org/433736
02:15 <openstackgerrit> David Shrewsbury proposed openstack-infra/nodepool feature/zuulv3: Re-enable devstack test job  https://review.openstack.org/431649
02:15 <Shrews> If 649 passes, I'm spending tomorrow drinking
02:18 <openstackgerrit> David Shrewsbury proposed openstack-infra/nodepool feature/zuulv3: Re-enable devstack test job  https://review.openstack.org/431649
02:32 *** Cibo_ has joined #zuul
03:06 *** Zara_ has joined #zuul
03:06 *** adam_g_ has joined #zuul
03:06 *** gundalow_ has joined #zuul
03:06 *** jkt_ has joined #zuul
03:06 *** nibz has joined #zuul
03:07 *** cinerama` has joined #zuul
03:07 *** Cibo_ has quit IRC
03:07 *** Zara has quit IRC
03:07 *** adam_g has quit IRC
03:07 *** Cibo has quit IRC
03:07 *** cinerama has quit IRC
03:07 *** nibalizer has quit IRC
03:07 *** gundalow has quit IRC
03:07 *** jkt has quit IRC
03:07 *** adam_g_ is now known as adam_g
03:08 *** Cibo has joined #zuul
03:20 *** saneax-_-|AFK is now known as saneax
03:22 *** saneax is now known as saneax-_-|AFK
03:40 <openstackgerrit> David Shrewsbury proposed openstack-infra/nodepool feature/zuulv3: Implement node cleanup  https://review.openstack.org/433736
03:45 <openstackgerrit> David Shrewsbury proposed openstack-infra/nodepool feature/zuulv3: Implement node cleanup  https://review.openstack.org/433736
03:46 <openstackgerrit> David Shrewsbury proposed openstack-infra/nodepool feature/zuulv3: Re-enable devstack test job  https://review.openstack.org/431649
04:01 <Shrews> jeblair: I think we need some more smarts built around the min-ready code path. If an image never becomes available for a provider, nodepoold will just keep generating requests for nodes of that type. We can see that here: http://logs.openstack.org/49/431649/8/check/gate-dsvm-nodepool/c4b26fa/logs/screen-nodepool.txt.gz
04:02 <Shrews> jeblair: I'm too tired to come up with a solution for that, so hopefully you have some thoughts.
04:03 <Shrews> jeblair: Also, I'm not clear on why those images aren't available in the gate-dsvm-nodepool job there. But, again... tired
04:03 * Shrews enters sleep mode
06:31 <openstackgerrit> Tobias Henkel proposed openstack-infra/zuul feature/zuulv3: Fix missing mutex release when aborting builds  https://review.openstack.org/432211
06:39 *** saneax-_-|AFK is now known as saneax
07:25 *** isaacb has joined #zuul
07:51 *** isaacb has quit IRC
07:58 *** saneax is now known as saneax-_-|AFK
08:00 *** isaacb has joined #zuul
08:37 *** Cibo_ has joined #zuul
08:42 *** Cibo_ has quit IRC
08:43 *** saneax-_-|AFK is now known as saneax
08:51 *** Cibo_ has joined #zuul
08:59 *** gundalow_ is now known as gundalow
09:08 *** hashar has joined #zuul
09:16 *** bhavik1 has joined #zuul
09:51 *** Zara_ is now known as Zara
09:56 *** rzetikx has joined #zuul
09:57 *** rzetikx has quit IRC
10:14 *** isaacb has quit IRC
10:15 *** isaacb has joined #zuul
10:25 *** isaacb has quit IRC
10:33 *** bhavik1 has quit IRC
11:22 *** isaacb has joined #zuul
11:32 *** jkt_ is now known as jkt
11:35 <openstackgerrit> Joshua Hesketh proposed openstack-infra/zuul feature/zuulv3: Merge branch 'master' into feature/zuulv3  https://review.openstack.org/434232
11:43 *** Cibo_ has quit IRC
12:11 *** isaacb has quit IRC
12:25 *** hashar has quit IRC
12:34 <openstackgerrit> Joshua Hesketh proposed openstack-infra/zuul feature/zuulv3: Merge branch 'master' into feature/zuulv3  https://review.openstack.org/434232
12:58 *** hashar has joined #zuul
13:02 *** isaacb has joined #zuul
13:03 <openstackgerrit> Merged openstack-infra/zuul feature/zuulv3: Remove TODO comment that seems to be done  https://review.openstack.org/433961
13:04 <openstackgerrit> Merged openstack-infra/zuul feature/zuulv3: Re-enable test_footer_message  https://review.openstack.org/430486
13:04 <openstackgerrit> Merged openstack-infra/zuul feature/zuulv3: Drop test_node_label  https://review.openstack.org/430473
13:05 <openstackgerrit> Merged openstack-infra/zuul feature/zuulv3: Re-enable test_client_enqueue_ref test  https://review.openstack.org/393887
13:10 <openstackgerrit> David Shrewsbury proposed openstack-infra/nodepool feature/zuulv3: Re-enable devstack test job  https://review.openstack.org/431649
13:10 <openstackgerrit> David Shrewsbury proposed openstack-infra/nodepool feature/zuulv3: Implement node cleanup  https://review.openstack.org/433736
13:13 <openstackgerrit> David Shrewsbury proposed openstack-infra/nodepool feature/zuulv3: Re-enable devstack test job  https://review.openstack.org/431649
13:13 <openstackgerrit> David Shrewsbury proposed openstack-infra/nodepool feature/zuulv3: Implement node cleanup  https://review.openstack.org/433736
13:32 <openstackgerrit> David Shrewsbury proposed openstack-infra/nodepool feature/zuulv3: WIP: Re-enable devstack test job  https://review.openstack.org/431649
13:39 <openstackgerrit> David Shrewsbury proposed openstack-infra/nodepool feature/zuulv3: WIP: Re-enable devstack test job  https://review.openstack.org/431649
13:56 <openstackgerrit> David Shrewsbury proposed openstack-infra/nodepool feature/zuulv3: WIP: Re-enable devstack test job  https://review.openstack.org/431649
13:57 <Shrews> sorry for the noise. trying to track down a lock issue
14:05 *** isaacb has quit IRC
14:05 <openstackgerrit> Merged openstack-infra/nodepool feature/zuulv3: Set Node image_id and launcher attributes  https://review.openstack.org/433242
14:06 *** isaacb has joined #zuul
14:08 <openstackgerrit> Merged openstack-infra/nodepool feature/zuulv3: Add generator API method for node iteration  https://review.openstack.org/433252
14:08 <openstackgerrit> Merged openstack-infra/nodepool feature/zuulv3: Disconnect from ZooKeeper at shutdown  https://review.openstack.org/433919
14:13 <openstackgerrit> Tobias Henkel proposed openstack-infra/zuul feature/zuulv3: Fix missing mutex release when aborting builds  https://review.openstack.org/432211
14:13 <openstackgerrit> Merged openstack-infra/nodepool feature/zuulv3: Remove subnodes from nodepool  https://review.openstack.org/432403
14:24 <openstackgerrit> David Shrewsbury proposed openstack-infra/nodepool feature/zuulv3: WIP: Re-enable devstack test job  https://review.openstack.org/431649
14:33 *** jianghuaw has joined #zuul
14:43 <openstackgerrit> Paul Belanger proposed openstack-infra/zuul feature/zuulv3: Add reporter for Federated Message Bus (fedmsg)  https://review.openstack.org/426861
14:52 <openstackgerrit> David Shrewsbury proposed openstack-infra/nodepool feature/zuulv3: WIP: Re-enable devstack test job  https://review.openstack.org/431649
14:59 *** isaacb has quit IRC
15:10 *** isaacb has joined #zuul
15:23 <jeblair> Shrews: how about we have the launcher peek at zk and only submit a min-ready request if there's a ready image for any provider?  that wouldn't catch all the cases, but it would catch the most common one.
15:24 <jeblair> Shrews, pabelanger: and i think those images which aren't available in the devstack job are the ones that are paused, so we aren't building them.
15:25 <Shrews> jeblair: that was the first thought i had. but yeah, doesn't catch the other cases (quota exceeded, provider problems)
15:26 <Shrews> jeblair: another thought i had was to add a failure code/reason to the NodeRequest and add some sort of backoff algorithm based on the fail code
15:26 <Shrews> but trying to track down this node lock issue so i haven't been able to think too much about either of those solutions
15:27 <jeblair> Shrews: yeah, or we could consider moving the min-ready supply into the providerworker itself (if i have quota, and i have an image ready, and there is a label that's under min-ready, build one)
15:28 *** saneax is now known as saneax-_-|AFK
15:28 <jeblair> Shrews: (iow, bypass the request process)
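The provider-side min-ready idea jeblair floats above (have quota, have a ready image, label under min-ready, so build one, bypassing the request process) can be sketched as a pure decision function. Every name here is illustrative, not from the nodepool codebase:

```python
# Illustrative sketch (not nodepool code) of supplying min-ready from
# the provider worker itself: launch a node for a label only when the
# provider has quota left, has a ready image for it, and the label is
# currently below its min-ready count. One launch per label per pass.
def labels_to_launch(labels, ready_counts, image_ready, quota_left):
    """labels: {label_name: min_ready}; returns label names to launch."""
    to_launch = []
    for name, min_ready in labels.items():
        if len(to_launch) >= quota_left:
            break  # no quota left for more launches
        if not image_ready.get(name):
            continue  # image never became available in this provider
        if ready_counts.get(name, 0) < min_ready:
            to_launch.append(name)
    return to_launch
```

This sidesteps the failure mode Shrews saw at 04:01: a provider with no ready image simply declines to launch, instead of the main thread generating an endless stream of unsatisfiable requests.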
15:34 *** yolanda has quit IRC
15:35 *** yolanda has joined #zuul
15:44 <openstackgerrit> David Shrewsbury proposed openstack-infra/nodepool feature/zuulv3: WIP: Re-enable devstack test job  https://review.openstack.org/431649
15:56 *** abregman has joined #zuul
15:56 *** abregman has quit IRC
15:58 *** abregman has joined #zuul
16:00 <openstackgerrit> David Shrewsbury proposed openstack-infra/nodepool feature/zuulv3: Implement node cleanup  https://review.openstack.org/433736
16:09 *** isaacb has quit IRC
16:10 *** morgan_ is now known as morgan
16:14 <Shrews> jeblair: ok, think i found it. It's a timing issue in check_devstack_plugin.sh. The node is marked as READY, but isn't unlocked until shortly after when the request is marked FULFILLED. The 'delete' command is trying to lock the node and set the DELETE state during the period it is still locked (which is very short). I'm adding a 5 second wait time to the lock attempt in the 'delete' command.
16:15 <openstackgerrit> David Shrewsbury proposed openstack-infra/nodepool feature/zuulv3: Re-enable devstack test job  https://review.openstack.org/431649
16:16 <Shrews> s/wait time/lock timeout/
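The fix Shrews describes (retry the lock attempt for a few seconds instead of failing immediately, since a READY node stays locked only for a very short window) can be sketched generically. A plain `threading.Lock` stands in here for the real ZooKeeper node lock; the function name and retry interval are illustrative:

```python
import threading
import time

# Generic sketch of the 'delete' fix: keep retrying the lock attempt
# until a short timeout expires, tolerating the brief window in which
# a READY node is still locked by the launcher. A threading.Lock is a
# stand-in for the actual ZooKeeper node lock.
def acquire_with_timeout(lock, timeout=5.0, interval=0.1):
    deadline = time.monotonic() + timeout
    while True:
        if lock.acquire(blocking=False):
            return True   # got the lock; safe to set the DELETE state
        if time.monotonic() >= deadline:
            return False  # node stayed locked past the timeout
        time.sleep(interval)
```

As Shrews notes at 16:23, in normal use the lock is free and the call returns immediately; the timeout only matters in the race window.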
16:18 <jeblair> Shrews: i wonder if we should add a 'locked' field to the node list command, and have 'waitfornode' also wait for the node to be unlocked?
16:21 <Shrews> jeblair: hah. i did that to debug this. but i didn't keep it because it would be slightly racy. we'd have to store the field as None just before we actually unlock it.
16:22 <Shrews> jeblair: oh, my method added a field to Node. maybe you're just suggesting trying the lock in the list command itself
16:22 <jeblair> Shrews: oh, i wasn't necessarily suggesting a new field, but rather a check to see if a lock node exists.  that's almost equally racy of course.
16:22 <jeblair> ya
16:22 <Shrews> yes
16:22 <jeblair> but you know, it's as accurate as we can get, and it's probably useful info for an operator anyway
16:23 <Shrews> i'll work something up
16:23 <Shrews> but i think the delete lock timeout is still worthwhile. it should mostly be instantaneous for normal use
16:23 <jeblair> yeah
16:23 *** cinerama` is now known as cinerama
16:29 <pabelanger> jeblair: re: 433235 where should I be looking to query all nodes?
16:30 <jeblair> pabelanger: there's a getNodes method on the zk object, and Shrews recently added an iterator (i'd recommend using that)
16:31 <jeblair> Shrews: and maybe you want to take a look at my comment on 433235 before pabelanger proceeds?
16:31 <pabelanger> jeblair: thanks for the pointer
16:32 <pabelanger> sure, I can wait
16:48 <Shrews> jeblair: pabelanger: yeah, that sounds like a good idea there
16:48 <Shrews> pabelanger: zk.nodeIterator() is what you want
16:49 *** abregman has quit IRC
16:59 *** Cibo_ has joined #zuul
17:05 <openstackgerrit> David Shrewsbury proposed openstack-infra/nodepool feature/zuulv3: Add lock state to node listing  https://review.openstack.org/434397
17:14 <openstackgerrit> Paul Belanger proposed openstack-infra/nodepool feature/zuulv3: Re-enable test_diskimage_build_only test  https://review.openstack.org/433265
17:14 <openstackgerrit> Paul Belanger proposed openstack-infra/nodepool feature/zuulv3: Re-enable test_dib_upload_fail test  https://review.openstack.org/433235
17:14 <openstackgerrit> Paul Belanger proposed openstack-infra/nodepool feature/zuulv3: Re-enable test_image_upload_fail test  https://review.openstack.org/433270
17:14 <openstackgerrit> Paul Belanger proposed openstack-infra/nodepool feature/zuulv3: Re-enable working test_builder.py tests  https://review.openstack.org/433262
17:14 <pabelanger> jeblair: Shrews: okay, I've updated 433235.
17:14 <pabelanger> for loop based on http://stackoverflow.com/questions/3345785/getting-number-of-elements-in-an-iterator-in-python
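The counting idiom from that Stack Overflow question (presumably what the patch uses; the patch itself isn't shown here) consumes the iterator and adds one per element:

```python
# Counting idiom from the linked Stack Overflow question: consume the
# iterator, summing 1 per element. Note this exhausts the iterator, so
# an iterator such as zk.nodeIterator() can't be reused afterward.
def count_elements(iterator):
    return sum(1 for _ in iterator)
```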
17:19 <mordred> jeblair: fwiw, I'm tracking down shade's gate being broken by what I think is a new nova microversion. I do not believe this will impact my ability to get logging done - but just FYI as to why there are so many shade patches flying atm
17:23 <Shrews> mordred: oh joy
17:23 <mordred> yah. tell me about it
17:23 <mordred> such things will be made better once we're microversion aware - but that's not going to happen in the next couple of weeks
17:23 <jeblair> mordred: ack.  also: *ackthpth*.
17:24 <mordred> jeblair: I wholeheartedly agree
17:25 <clarkb> mordred: if you aren't microversion aware wouldn't you just use the base 2.0 api and not worry about changes?
17:26 <mordred> clarkb: nope. not being microversion aware and using novaclient means we get latest
17:26 <clarkb> mordred: couldn't you just hardcode the 2.0 version with novaclient though
17:26 <clarkb> I guess that's what I am suggesting, if not microversion aware then the safest option is the base option
17:26 <mordred> possibly - but since we've been using 'latest' so far, I'd need to verify that we didn't break things
17:27 <mordred> by 'downgrading' in some places - it's probably fine - and probably a good idea
17:47 *** Cibo_ has quit IRC
17:52 *** hashar has quit IRC
17:57 *** saneax-_-|AFK is now known as saneax
18:13 *** jamielennox is now known as jamielennox|away
18:48 *** saneax is now known as saneax-_-|AFK
18:54 <SpamapS> jeblair: FYI, had a great discussion with some networking folks about a feature they'd like to see in Zuul: https://storyboard.openstack.org/#!/story/2000869
18:58 <mordred> SpamapS: neat. is that mostly so that they can start responding to issues as they roll in?
18:58 <SpamapS> mordred: yes, some of the jobs take 5 hours, some 9 hours.
18:59 <mordred> nod
18:59 <SpamapS> I did wonder if they could have post-1, post-2, post-3, and just let the comments from each pipeline feed into the others.
18:59 <clarkb> that info would be good to add to the story (use cases are often really handy when designing features)
18:59 <mordred> SpamapS: it's _possible_ (although I'll let jeblair tell me I'm wrong) that that could be considered a case of fail-fast - we've talked before about being able to notice and do something more quickly if some of the jobs associated with something fail
19:01 <clarkb> like maybe you only want to know if something fails but otherwise aggregate successes
19:01 <mordred> where $do_something might be vague/hand-wavey/configurable ... but the "let the dev know that the job failed pep8 before the entire devstack job is done running" - or "reparent to the nearest non-failing change if we notice that" or whatnot
19:01 <mordred> clarkb: yah
19:01 <clarkb> (which is personally what I'd want in that use case)
19:01 <SpamapS> clarkb: good point, added
19:01 <mordred> but there are a few different potential aspects to "set of jobs take a while, action/reporting before the total set is done is desirable" for many values of each of those things
19:02 <SpamapS> Seems like the simplest thing to do is just allow reporters for jobs
19:10 <mordred> SpamapS: possibly so. I bring up the other thing to raise the possibility that this is actually related to a few other features that have been desired for a while, so something that takes them all into account might be warranted
19:10 <mordred> or they may be completely unrelated
19:29 <jeblair> mordred, SpamapS: yeah, i think we can make pipeline reporting a bit more granular -- note that the current design is mostly based on traditional gerrit because a partial report is expensive for a human to parse in gerrit (each partial report is another message).  we mask some of that in openstack, but it doesn't necessarily make it less so.  however, there's now a test results plugin for gerrit which we hope to start using once we upgrade.  that greatly reduces the (mostly human) costs associated with partial reports.
19:29 <jeblair> i'm not sure where on the spectrum github falls, but that's something to keep in mind too
19:30 <SpamapS> jeblair: github has test statuses... and those can be [1runner]:[Nstatuses]
19:31 <SpamapS> jeblair: in this case, _they want the partial result to bother a human_
19:32 <SpamapS> because it's the longer part of their pipeline and thus they need to fail fast and get fixes into the pipeline as fast as possible, so it's worth the noise.
19:32 <jeblair> SpamapS: yep
19:32 <SpamapS> They may also want successes to trigger humans
19:33 <jeblair> i think that will not be too hard to accomplish without major changes
19:36 <jeblair> i'd describe this as 'partial reports' (jobs keep running and issue a final report) vs 'fail fast' (all jobs aborted when first fails and there's a single report)
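jeblair's two modes can be contrasted in a small sketch. This is an illustration of the distinction only, not Zuul's implementation; all names and the report format are made up:

```python
# Illustration (not Zuul code) of jeblair's distinction: 'partial
# reports' let every job run and report each result as it lands;
# 'fail fast' aborts the remaining jobs and emits a single report as
# soon as the first job fails.
def run_pipeline(jobs, fail_fast, report):
    """jobs: list of (name, callable-returning-bool); returns results."""
    results = {}
    for name, job in jobs:
        ok = job()
        results[name] = ok
        if fail_fast:
            if not ok:
                report('FAILURE: %s' % name)  # single early report
                break                         # abort remaining jobs
        else:
            report('%s: %s' % (name, 'SUCCESS' if ok else 'FAILURE'))
    return results
```

In the fail-fast mode the slow jobs after the first failure never run, which is exactly the time saving the networking folks are after with their multi-hour jobs.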
19:40 <SpamapS> yeah, two different things
19:44 *** hashar has joined #zuul
19:48 <jeblair> SpamapS: cool, i left notes on potential implementation in the story
19:48 <SpamapS> sweet
19:49 <SpamapS> Sitting here with some people who bounced off Zuul a year ago (from OpenNFV and OpenDaylight), so if you have questions for them, let me know
19:52 <mordred> SpamapS: I think we mostly want to let them know we value them as humans and that their concerns are all things that are front and center in the v3 work - so we hope they don't stay bounced off
19:53 <SpamapS> That's basically what I said: we weren't really a community for not-OpenStack people yet, but we are now, and we want to help.
20:03 <mordred> ++
20:03 <mordred> SpamapS: also tell them I'd love to buy them all drinks next time we're colocated :)
20:36 *** jamielennox|away is now known as jamielennox
21:23 <openstackgerrit> Antoine Musso proposed openstack-infra/zuul master: Expose commitMessage as a Change attribute  https://review.openstack.org/222791
21:44 <openstackgerrit> Merged openstack-infra/zuul feature/zuulv3: Run zuul v3 launcher  https://review.openstack.org/433347
21:47 <openstackgerrit> Merged openstack-infra/zuul feature/zuulv3: Inherit playbooks and modify job variance  https://review.openstack.org/430483
21:47 <openstackgerrit> Merged openstack-infra/zuul feature/zuulv3: Split merger and launcher git roots  https://review.openstack.org/430456
21:55 <jhesketh> Morning
22:02 <jeblair> jhesketh: good morning!  can you take a look at my response on 433964?
22:02 <jhesketh> sure
22:03 <jhesketh> jeblair: yep, makes sense
22:03 <jeblair> cool
22:03 <openstackgerrit> Merged openstack-infra/zuul feature/zuulv3: Re-enable test_queue_names  https://review.openstack.org/429883
22:05 <openstackgerrit> James E. Blair proposed openstack-infra/zuul feature/zuulv3: Remove misleading log statements  https://review.openstack.org/433970
22:05 <openstackgerrit> James E. Blair proposed openstack-infra/zuul feature/zuulv3: Squelch ref-replication gerrit warnings  https://review.openstack.org/433959
22:05 <openstackgerrit> James E. Blair proposed openstack-infra/zuul feature/zuulv3: Add Git driver  https://review.openstack.org/433942
22:05 <openstackgerrit> James E. Blair proposed openstack-infra/zuul feature/zuulv3: Remove gearman settled check  https://review.openstack.org/433943
22:05 <openstackgerrit> James E. Blair proposed openstack-infra/zuul feature/zuulv3: Add (minimal) support for topic-changed event  https://review.openstack.org/433966
22:05 <openstackgerrit> James E. Blair proposed openstack-infra/zuul feature/zuulv3: Rename zuul-server zuul-scheduler  https://review.openstack.org/433980
22:05 <openstackgerrit> James E. Blair proposed openstack-infra/zuul feature/zuulv3: Correct getGitwebUrl  https://review.openstack.org/433964
22:05 <jeblair> the git driver change collided with the split repo paths change, so that's a fix and rebase
22:06 <jeblair> i'm going to self re-approve that since it's not a substantial change
22:30 <openstackgerrit> Merged openstack-infra/zuul feature/zuulv3: Add Git driver  https://review.openstack.org/433942
22:32 *** hashar has quit IRC
22:32 <openstackgerrit> Merged openstack-infra/zuul feature/zuulv3: Remove gearman settled check  https://review.openstack.org/433943
22:43 <jeblair> Shrews: i left two comments on 433736
22:56 <pabelanger> 64 bytes from nl01.openstack.org (23.253.92.28): icmp_seq=1 ttl=61 time=0.721 ms
22:57 <pabelanger> waiting for puppet to run now
23:09 <jeblair> pabelanger, mordred, jhesketh: i kind of want to disable the behind_dequeue test until we have a chance to dig into it
23:09 <jeblair> i think it's going to burn too much of our time this week and next
23:10 <pabelanger> no objections here
23:10 <jhesketh> that sounds sensible to me
23:11 <openstackgerrit> James E. Blair proposed openstack-infra/zuul feature/zuulv3: Remove misleading log statements  https://review.openstack.org/433970
23:11 <openstackgerrit> James E. Blair proposed openstack-infra/zuul feature/zuulv3: Squelch ref-replication gerrit warnings  https://review.openstack.org/433959
23:11 <openstackgerrit> James E. Blair proposed openstack-infra/zuul feature/zuulv3: Add (minimal) support for topic-changed event  https://review.openstack.org/433966
23:11 <openstackgerrit> James E. Blair proposed openstack-infra/zuul feature/zuulv3: Rename zuul-server zuul-scheduler  https://review.openstack.org/433980
23:11 <openstackgerrit> James E. Blair proposed openstack-infra/zuul feature/zuulv3: Correct getGitwebUrl  https://review.openstack.org/433964
23:11 <openstackgerrit> James E. Blair proposed openstack-infra/zuul feature/zuulv3: Temporarily disable test_dependent_behind_dequeue  https://review.openstack.org/434553
23:11 <jeblair> pabelanger, jhesketh: okay, that's the stack rebased on 434553; if you could +3 that one i'd appreciate it :)
23:12 <pabelanger> Feb 15 23:11:51 nl01 puppet-user[20255]: Finished catalog run in 5.62 seconds
23:12 <pabelanger> cool, puppet running on nl01.o.o now
23:13 <mordred> jeblair: ++
23:13 <jhesketh> done
23:14 <pabelanger> done also
23:20 <Shrews> jeblair: ack on ProviderManager comment, though the main NodePool thread does not have managers. They're down in the ProviderWorker threads. I'm going to have to rework moving them to the main thread and providing access for the child threads.
23:21 <pabelanger> jhesketh: Shrews: mind a review on https://review.openstack.org/#/c/433235/
23:23 <openstackgerrit> Merged openstack-infra/zuul feature/zuulv3: Temporarily disable test_dependent_behind_dequeue  https://review.openstack.org/434553
23:24 <openstackgerrit> Merged openstack-infra/zuul feature/zuulv3: Squelch ref-replication gerrit warnings  https://review.openstack.org/433959
23:25 <openstackgerrit> Merged openstack-infra/zuul feature/zuulv3: Correct getGitwebUrl  https://review.openstack.org/433964
23:25 <openstackgerrit> Merged openstack-infra/zuul feature/zuulv3: Add (minimal) support for topic-changed event  https://review.openstack.org/433966
23:26 <openstackgerrit> Merged openstack-infra/nodepool feature/zuulv3: Re-enable test_dib_upload_fail test  https://review.openstack.org/433235
23:26 <openstackgerrit> Merged openstack-infra/zuul feature/zuulv3: Remove misleading log statements  https://review.openstack.org/433970
23:26 <openstackgerrit> Merged openstack-infra/zuul feature/zuulv3: Rename zuul-server zuul-scheduler  https://review.openstack.org/433980
23:27 <openstackgerrit> Merged openstack-infra/nodepool feature/zuulv3: Re-enable working test_builder.py tests  https://review.openstack.org/433262
23:28 <openstackgerrit> Merged openstack-infra/nodepool feature/zuulv3: Re-enable test_diskimage_build_only test  https://review.openstack.org/433265
23:28 <openstackgerrit> Merged openstack-infra/nodepool feature/zuulv3: Re-enable test_image_upload_fail test  https://review.openstack.org/433270
23:31 <pabelanger> merges!
23:42 <jeblair> Shrews: ah, i may have crossed wires with the old code which used Nodepool.getProviderManager
23:43 <jeblair> Shrews: so if you decide you want them on the main nodepool object, you can probably re-use much of that
23:43 <jeblair> Shrews: alternatively, we could move deleting to be something that the providerworker kicks off
23:44 <jeblair> Shrews: (in its main run loop, it could assign handlers, complete handlers, start deleters)
23:46 <jeblair> jhesketh: 432211 is a forward-port of a patch you have +2'd on master if you want to +3 it on v3
23:46 <Shrews> jeblair: no, i don't want each providerworker iterating through all nodes continuously. i'll move the managers up a level
23:49 <openstackgerrit> Merged openstack-infra/zuul master: Fix missing mutex release when aborting builds  https://review.openstack.org/384980
23:49 <jeblair> Shrews: makes sense

Generated by irclog2html.py 2.14.0 by Marius Gedminas - find it at mg.pov.lt!