Tuesday, 2017-02-07

openstackgerritJames E. Blair proposed openstack-infra/zuul feature/zuulv3: Fix race in waitUntilSettled  https://review.openstack.org/42993900:05
openstackgerritJames E. Blair proposed openstack-infra/zuul feature/zuulv3: Fix race in waitUntilSettled  https://review.openstack.org/42993900:06
jeblairi fixed that by reading test debug logs \o/00:06
clarkbjeblair: so the count of events would be off?00:10
clarkber count of completed events00:10
jeblairclarkb: well, correct at the time, but when a nodepool request completes, it makes a new event.  so we need to make sure we check events after checking for outstanding nodepool requestss.00:12
jeblairbasically, there's a right order to check that race-free00:12
clarkbright, I mean on the assertion side after waiting for settled things were failing beacuse numebrs came out wrong?00:13
clarkb(trying to understand how the failures manifested)00:13
jeblairclarkb: ah -- the failure i saw was premature test ending, because a waitForSettled returned true before jobs ran00:13
jeblairso yeah, any of the assertions could fail (pipelines not empty, jobs not run, etc)00:14
openstackgerritMerged openstack-infra/zuul feature/zuulv3: Implement job aborting  https://review.openstack.org/42630600:17
openstackgerritMerged openstack-infra/zuul feature/zuulv3: Set job class attributes in __init__  https://review.openstack.org/42885900:18
openstackgerritMerged openstack-infra/zuul feature/zuulv3: Run pre and post playbooks  https://review.openstack.org/42749400:18
openstackgerritMerged openstack-infra/zuul feature/zuulv3: Add SourceContext class  https://review.openstack.org/42784600:18
openstackgerritMerged openstack-infra/zuul feature/zuulv3: Report exceptions in launcher  https://review.openstack.org/42849900:18
openstackgerritMerged openstack-infra/zuul feature/zuulv3: Add some gearman related debugging  https://review.openstack.org/42853000:19
openstackgerritMerged openstack-infra/zuul feature/zuulv3: Use ZUUL_TEST_ROOT in launcher  https://review.openstack.org/42839100:19
openstackgerritMerged openstack-infra/zuul feature/zuulv3: Add inheritance debugging  https://review.openstack.org/42906500:19
openstackgerritJames E. Blair proposed openstack-infra/zuul feature/zuulv3: Re-enable test_mutex  https://review.openstack.org/42912200:21
openstackgerritJames E. Blair proposed openstack-infra/zuul feature/zuulv3: Re-enable test_mutex  https://review.openstack.org/42912200:21
jeblairadam_g: ^ updated that to include playbooks (i ran tests/make_playbooks.py)00:22
adam_gjeblair: ah cool, thanks00:34
*** pleia2_ is now known as pleia201:04
*** nt has quit IRC05:08
*** saneax-_-|AFK is now known as saneax06:45
*** jamielennox is now known as jamielennox|away07:59
*** jamielennox|away is now known as jamielennox08:06
openstackgerritOleksandr Karpenko proposed openstack-infra/zuul master: create directory inside state_dir/times if needed  https://review.openstack.org/43011208:17
*** hashar has joined #zuul08:18
*** abregman has joined #zuul08:32
*** abregman is now known as abregman|afk10:31
*** abregman|afk is now known as abregman12:00
*** Cibo_ has quit IRC12:22
*** Cibo_ has joined #zuul12:22
*** saneax is now known as saneax-_-|AFK12:53
*** nt has joined #zuul12:59
openstackgerritMonty Taylor proposed openstack-infra/nodepool master: Consume Task and TaskManager from shade  https://review.openstack.org/41475913:55
pabelangermorning14:38
pabelangerjeblair: mordred: Shrews: any ideas of how many (new) servers we want for the PTG?14:51
pabelangerwill they eventually become production servers or is this just for testing things?14:52
*** bhavik1 has joined #zuul15:17
jeblairpabelanger: why don't we do an all-in-one zuulv3 server?  and maybe a second server for the nodepool launcher?  i'd guess the nodepool launcher could become production later.  the zuulv3 server might end up getting wiped, or we might morph it into the main zuulv3-scheduler server.15:33
pabelangerjeblair: WFM15:34
*** Cibo_ has quit IRC15:40
*** Cibo_ has joined #zuul15:45
*** Cibo_ has quit IRC15:50
*** Cibo_ has joined #zuul15:51
*** Cibo_ has quit IRC15:55
*** bhavik1 has quit IRC15:55
*** Cibo_ has joined #zuul16:17
*** openstackgerrit has quit IRC16:35
jeblairpabelanger: comments on 42988317:06
*** Cibo_ has quit IRC17:09
*** Cibo_ has joined #zuul17:10
pabelangerjeblair: great, thanks for the comment. In fact, we no longer need __eq__ in the Project object. I'm assuming we can drop it too?17:14
*** openstackgerrit has joined #zuul17:16
openstackgerritPaul Belanger proposed openstack-infra/zuul feature/zuulv3: Re-enable test_queue_names  https://review.openstack.org/42988317:16
pabelangerupdates^17:16
jeblairpabelanger: yes17:18
*** Cibo_ has quit IRC17:19
openstackgerritJames E. Blair proposed openstack-infra/zuul feature/zuulv3: Re-enable multiple gerrit connection test  https://review.openstack.org/40669917:20
openstackgerritJames E. Blair proposed openstack-infra/zuul feature/zuulv3: Re-enable multiple gerrit connection test  https://review.openstack.org/40669917:20
*** Cibo_ has joined #zuul17:20
jeblairjamielennox: ^ updated to add newly required playbooks17:20
openstackgerritJames E. Blair proposed openstack-infra/zuul feature/zuulv3: Re-enable test_abandoned_not_timer  https://review.openstack.org/42798517:23
openstackgerritJames E. Blair proposed openstack-infra/zuul feature/zuulv3: Re-enable test_abandoned_not_timer  https://review.openstack.org/42798517:24
jeblairSpamapS: ^ updated to add newly required playbooks17:24
*** Cibo_ has quit IRC17:27
*** abregman has quit IRC17:35
*** saneax-_-|AFK is now known as saneax17:36
pabelangerokay, nodepool-launcher server should be ready for review: https://review.openstack.org/#/q/topic:infra-ptg-pike17:40
jeblairpabelanger, mordred: see comments on 42985017:42
jeblairpabelanger: w00t!17:43
pabelangerjeblair: sure, I'll dive more into it shortly17:48
SpamapSjeblair: oh thanks.. I've been staring at the problems with that, btw. I have a few local changes that get it further, but ultimately I think something happens that makes it generate NullChange instead of FakeChange.17:50
SpamapSstarting to get a little cross-eyed at the fakes17:51
jeblairSpamapS: i think the timer is expected to generate nullchange objects?17:52
SpamapSjeblair: right, but those don't interact as well with the fakes in v3 for some reason. THat's where my debugging is at right now.17:55
*** hashar has quit IRC18:00
openstackgerritDavid Shrewsbury proposed openstack-infra/nodepool feature/zuulv3: Assign node set to node requests  https://review.openstack.org/42842818:15
Shrewsjeblair: I believe I addressed all of your concerns now for ^^^^18:16
jeblairShrews: yeah, lgtm18:17
Shrewsjeblair: and the hang I experienced yesterday had to do with moving storeNode() to the NodeLaunch thread and not giving the thread time to complete from the poll actions18:17
Shrewsthat was a doozy18:17
openstackgerritDavid Shrewsbury proposed openstack-infra/nodepool feature/zuulv3: Add 'hostname-format' to provider config section  https://review.openstack.org/42984118:25
Shrewsjeblair: and the bike is sheddededed ^^^18:25
pabelangerShrews: jeblair: left comment on 429841 about the usage of 'template'18:29
Shrewspabelanger: yeah, i don't 'template' there either. can easily remove it if there is agreement18:31
pabelangerShrews: so we do in fact remove it in nodepool.yaml for openstack-infra. Since we are refactoring now, we can likely drop it, but want to make sure everybody is cool with it18:37
*** saneax is now known as saneax-_-|AFK18:58
jheskethMorning19:01
openstackgerritClint 'SpamapS' Byrum proposed openstack-infra/zuul feature/zuulv3: Re-enable test_abandoned_not_timer  https://review.openstack.org/42798519:32
SpamapSjeblair: oo, progress. Now with your playbook, _one_ job actually does get held in builds. :)19:35
SpamapShttp://paste.openstack.org/show/598008/19:38
SpamapSmy new fail19:38
SpamapSException: Node <Node 0000000000 static:ubuntu-xenial> is not locked19:39
SpamapShttp://paste.openstack.org/show/598009/ is better19:40
SpamapSShrews: around?19:40
ShrewsSpamapS: infra meeting atm19:40
SpamapSoooooooh19:40
SpamapSderp19:40
SpamapSShrews: ok, when you're available, I need to try and understand if and why my test might be trying to use unlocked nodes19:49
*** hashar has joined #zuul19:51
ShrewsSpamapS: ok. kinda sorta freeish now19:57
ShrewsSpamapS: is there a review #?20:00
SpamapSShrews: I've been digging through, I think it's a logic error in the launcher.20:00
SpamapSthere may be a path in where acceptNodes isn't called before useNode20:00
SpamapSShrews: yeah 42798520:01
SpamapSI think an exception elsewhere is causing the problem.20:01
ShrewsSpamapS: so, what zuul is supposed to do when it gets a FULFILLED node request is go through the node set and lock each one, then set its state to IN_USE.20:03
SpamapSShrews: yeah this isn't nodepool's problem at all20:03
SpamapSit's a race20:03
ShrewsSpamapS: but, i'm not familiar enough with the zuul code to say if it is doing that.20:04
SpamapSI put a big sleep in right before the failing assertion and the test passes20:04
SpamapSShrews: but thanks anyway :)20:04
Shrewsnot suggesting nodepool is the issue. zuul is supposed to do those things *after* nodepool changes the request to FULFILLED20:05
Shrewsbut if it is a race, that's really going to be fun  :)20:05
SpamapSwell but there's no nodepool running anyway20:06
SpamapSjeblair: did you have a patch stack addressing the test races you were seeing? I may need to rebase on it.20:06
ShrewsSpamapS: i do recall jeblair putting something up for waitUntilSettled20:07
ShrewsSpamapS: https://review.openstack.org/42993920:08
SpamapSlet's see if we rebase on that what happens..20:14
jeblairSpamapS: that merged, and i rebased your change on the branch tip which includes it in order to get the playbooks20:16
jeblairSpamapS: from the logs, it looks like zuul is canceling a build during a reconfiguration (and so returns the nodeset), but then tries to start the build again (and tries to re-use the existing, already returned nodeset).20:27
jeblairSpamapS: there's probably at least two bugs here: 1) a job canceled in reconfiguration shouldn't be relaunched; 2) the job shouldn't have been canceled during reconfiguration.20:29
*** ianw has quit IRC21:07
jeblairSpamapS: i see why the job is canceled -- it does not match the newly defined job because the newly defined job is in a different repo.  basically, the updateConfigLayout trick used in that test isn't going to work anymore.  probably we instead want to make a live configuration change to the repo with the configuration rather than changing to a different repo.21:08
*** harlowja has quit IRC21:09
jeblairSpamapS: the re-use of the nodeset is a separate issue; i don't know if we're going to have a test that covers that or not.  we can reproduce that on its own by having a job with the same name before and after reconfiguration, but different attributes.21:10
*** harlowja has joined #zuul21:35
*** jamielennox is now known as jamielennox|away21:42
Shrewsjeblair: What do you think we should do with READY nodes that have been created for a node request, but that request was removed while we launched them? Leave the nodes for other node requests, or delete them?21:44
ShrewsFor common node types, it would be more efficient to leave them. But for rarely used node types, they might linger for a while, using resources.21:45
clarkbShrews: current behavior us they linger until the cleanup timwout21:46
Shrewsclarkb: that's the approach I favor, too21:47
pabelangerreminds me of https://review.openstack.org/#/c/83980/ something I had issues with when I had limited cloud resources21:47
Shrewspabelanger: interesting. in fact, i'm not sure min_ready is something that makes sense in the new v3 world since allocation is now per-request21:49
jeblairShrews, clarkb: ++21:49
jeblairShrews: i think we still want to support min-ready so that we can be more responsive21:49
jeblair(i don't think we have to have it by the ptg :)21:50
Shrewsjeblair: we may need some thought around that then. which node types do we build upfront?21:50
Shrewsperhaps that logic already exists in nodepool though. haven't seen that bit of it yet21:51
clarkbShrews: ya its label specofic currwbtly21:52
jeblairShrews: yeah, min-ready is attached to a label, so nodepool can just periodically check that the min-ready levels are satisfied, and if not, spin up some more nodes21:52
jeblairShrews: that's the "watermark" in "watermark_sleep"21:52
jeblairit refills the pool21:52
Shrewsmmm, code simile21:52
pabelangernever did finish that patch, ended up abandoning it21:53
jeblairShrews: to your original question, i think "leave them" for now is the most compatible with where we want to go in the future21:55
pabelangerokay, early EOD for me.  If people would like to review https://review.openstack.org/#/q/topic:infra-ptg-pike I can start spinning up nodepool-launcher for tomorrow.  zuulv3-dev, still needs some work21:56
openstackgerritDavid Shrewsbury proposed openstack-infra/nodepool feature/zuulv3: Unallocate new nodes if request is pulled  https://review.openstack.org/43045321:58
Shrewsooh, that's incomplete, actually22:00
openstackgerritDavid Shrewsbury proposed openstack-infra/nodepool feature/zuulv3: Unallocate new nodes if request is pulled  https://review.openstack.org/43045322:03
Shrewsthat should do it22:03
openstackgerritJames E. Blair proposed openstack-infra/zuul feature/zuulv3: Split merger and launcher git roots  https://review.openstack.org/43045622:05
jeblairpabelanger: reviewed!22:16
*** hashar has quit IRC22:16
*** jamielennox|away is now known as jamielennox22:42
openstackgerritJames E. Blair proposed openstack-infra/zuul feature/zuulv3: Split merger and launcher git roots  https://review.openstack.org/43045622:51
openstackgerritAdam Gandelman proposed openstack-infra/zuul feature/zuulv3: Drop test_node_label  https://review.openstack.org/43047323:20
jeblairadam_g: is there an included python functions test fixture file we can delete with that?23:29
adam_gjeblair: oh yes, one min23:31
SpamapSjeblair: Thanks! Makes perfect sense.. and I remember thinking "I wonder if trying to switch repos is going to screw things up?" but forgot that while debugging. :-P23:33
openstackgerritAdam Gandelman proposed openstack-infra/zuul feature/zuulv3: Drop test_node_label  https://review.openstack.org/43047323:49
openstackgerritClint 'SpamapS' Byrum proposed openstack-infra/zuul feature/zuulv3: Re-enable test_abandoned_not_timer  https://review.openstack.org/42798523:52
SpamapSjeblair: ^ so the latest version uses the upstream repo and copyies it into layout-idle. During reconfigure, this aborts the jobs, which I believe is the exact bug the test is trying to avoid.23:52
SpamapSor, not avoid.. but.. it's the bad behavior that we want to make sure isn't happening23:53

Generated by irclog2html.py 2.14.0 by Marius Gedminas - find it at mg.pov.lt!