Friday, 2016-11-18

SpamapSjeblair: should we make it automatically ignore /COMMIT_MSG ?00:00
SpamapSI say that, because that will always show up in every change.00:00
SpamapSand would we even go so far as to ignore it in change_matchers.MatchAllFiles ?00:01
adam_ghey. so digging into some tests, looking through test_success_pattern. it looks {success, failure}-pattern has been replaced by {success, failure}-url, is that correct?00:02
jeblairSpamapS: i think there was some stuff to handle that at some point, though i may have broken it.  git spelunking for when i created irrelevant-files may be in order.  regardless, yeah, i think whatever is necessary to remove commit_msg from consideration is the right thing.  i think the complexity may come with merge commits?  in that case the *only* file is commit_msg...00:05
SpamapSjeblair: with merge commits, I'd think we'd be applying file matchers to the children..?00:08
jeblairadam_g: yes -- and i *think* that's just a name change00:09
jeblairadam_g: ('cause what's a success-pattern? :)00:09
adam_gjeblair: is there anything to be considered wrt backward compat of job config or is that out the window for v3?00:10
jeblairadam_g: out the window00:10
adam_g\o/00:10
jeblair(it's so different, the answer will involve phrases like 'conversion guide' and 'translation tool')00:11
jeblairSpamapS: this is among the reasons skip-if is problematic.  we don't really get that information from gerrit, and i think extracting it would be non-trivial.  i think the existing solution just tried to get out of the way and let the job run on a merge commit, which is probably the safest simple thing.00:12
SpamapSahh, seems dbe6fab14f446b1929b2b30c31a18d489d16b272 had the logic I've been thinking of00:12
jeblairSpamapS: yep.  that's desirable.  did i break that?00:12
SpamapSjeblair: I believe so yes.00:13
jeblairyay tests00:13
SpamapS\o/00:13
SpamapSSo I think there are really now 2 things to test00:13
SpamapS1) On a regular change with file changes, based on irrelevant-files, does the job run/not run.00:14
SpamapS2) On a merge commit, is irrelevant-files ignored entirely?00:14
SpamapSIs that too simplistic?00:14
jeblairSpamapS: that sounds right.  as i continue to think about this, i wonder what would have happened before if a merge commit *did* have a file (i think this happens when it resolves a conflict).  i wonder if in that case, the normal skip-if would apply and we might end up not running a job we should (because the merge commit fixed a conflict in an irrelevant file, but a child change touched a relevant file).  that behavior sounds wrong and yours ...00:17
jeblair... sounds right.00:17
jeblairSpamapS: (but i bet the old logic didn't check whether it was a merge commit, it just narrowly handled the case of commit with no files)00:18
jeblairSpamapS: if you want to be bug-compatible with v2 on that and backlog improving it, that would be okay.  :)00:19
SpamapSjeblair: indeed. Well I think first I'll test that irrelevant-files works on regular changes.. and then add another commit that tests merges.00:19
SpamapSand yeah, if the logic gets hairy, I'd be fine with an "expected fail" test.00:19
SpamapS"We know this is broke.. we'll decide when we let it un-break"00:19
SpamapSjeblair: actually you did not break the /COMMIT_MSG assumption00:22
SpamapSjeblair: something else is still letting jobs run even when they match. :-/00:22
jeblairSpamapS: yay me? :)00:25
SpamapSyes yay you00:26
* SpamapS is in pdb hell00:26
jeblairi get a momentary reprieve until you find out what i *did* break00:26
SpamapSI'm still leaning toward me breaking things in my test harness00:32
SpamapSbut.. that's just my self deprecating side00:32
SpamapSlooking like maybe a pretty deep logic problem in variant inheritance... getting closer anyway00:37
openstackgerritJames E. Blair proposed openstack-infra/nodepool: Remove builds that are no longer in use  https://review.openstack.org/39930700:38
jeblairclarkb, ianw: ^ an actual tested process for removing an image (and provider)!00:39
jeblairSpamapS: certain to be my fault if so; sic me on it as soon as your ready (though i have EOD brain now, so probably tomorrow at earliest).00:40
jeblair*you're* ready (told you; eod brain)00:41
clarkbjeblair: thanks will try to read in the morning.currently dealing with twins that have fevers do not recommend00:43
SpamapSjeblair: running into similar trouble here. :-)00:44
adam_gjeblair: if you've got anything left in that EOD brain.. any tips on whats happening at https://git.openstack.org/cgit/openstack-infra/zuul/tree/zuul/launcher/server.py?h=feature/zuulv3#n370 and why that url is stomping on the urls configured on the job?00:45
openstackgerritJames E. Blair proposed openstack-infra/nodepool: Use constants for zk states  https://review.openstack.org/39930900:57
jeblairadam_g: that should be the url that gets used if there is no success-url (or failure-url) defined.  ie, the normal behavior is "use whatever url the launcher gave us".00:58
jeblairadam_g: realistically, in v3, the launcher should not be responsible for that, so if you wanted to rip that part out, feel free.  :)00:59
adam_gjeblair: ok, yeah, it looks to be stomping on the defined success-url AFAICS00:59
jeblair(ripping it out would probably be a distraction though, since apparently there's a logic reversal; you may just want to backlog that idea)01:00
adam_gwell, ill certainly be digging thru more tomorrow to figure out whats going on. the test seems to be legit failing now on the wrong message being reported. still dont fully grok what goes on once the job is launched there.01:01
jeblairadam_g: that is strange.  it looks like Item.formatJobResult dtrt.  that should be what's called by Reporter._formatItemReportJobs to produce that part of the report01:04
jeblairadam_g: oh!01:06
jeblairadam_g: there is an exception handler in there that will hide an exception in formatting the url; i bet that's it01:06
* adam_g takes a look01:07
jeblairhttps://git.openstack.org/cgit/openstack-infra/zuul/tree/zuul/model.py?h=feature/zuulv3#n93901:07
adam_ghmm. the job seems not to have the message or url params configured at that point01:09
adam_gback to the top01:09
ianw"The repository 'http://mirror.regionone.osic-cloud1.openstack.org/ubuntu xenial Release' is not signed."01:14
*** mgagne has quit IRC06:18
*** mgagne has joined #zuul06:21
*** mgagne is now known as Guest5228506:21
*** jamielennox is now known as jamielennox|away06:36
*** willthames has quit IRC07:02
openstackgerritClint 'SpamapS' Byrum proposed openstack-infra/zuul: Refactor skip-if tests to use irrelevant-files  https://review.openstack.org/39941507:47
SpamapSjeblair: so, when you're around... in writing the test above^ I had some weird variant stuff just by doing the configs a different way. If I defined the job in the 'jobs' section and then mentioned it in the projects section.. I got two variants, one of which did not have the irrelevant-files section, and thus, which would still run.07:48
*** jamielennox|away is now known as jamielennox08:03
*** openstackgerrit has quit IRC08:03
*** openstackgerrit has joined #zuul08:04
*** hashar has joined #zuul08:10
*** hogepodge has quit IRC08:15
openstackgerritJoshua Hesketh proposed openstack-infra/zuul: Fix missing mutex release when aborting builds  https://review.openstack.org/38498009:15
*** willthames has joined #zuul09:55
*** willthames has quit IRC10:34
*** rcarrillocruz has joined #zuul10:53
*** rcarrillocruz has quit IRC11:39
*** rcarrillocruz has joined #zuul11:40
*** abregman|afk has quit IRC12:16
*** hogepodge has joined #zuul12:31
*** abregman has joined #zuul12:42
*** abregman is now known as abregman|afk12:57
Shrewsjeblair: pabelanger: Still injecting caffeine into my system, but I don't think the test in 399230 is valid.13:13
Shrewsleft a comment13:14
*** abregman|afk is now known as abregman13:49
pabelangermorning13:58
pabelangerShrews: looking13:58
*** lindycoder has joined #zuul14:00
*** abregman has quit IRC14:03
*** abregman has joined #zuul14:15
pabelangerShrews: also left a comment14:20
*** rcarrillocruz has quit IRC14:30
Zaraheh, I should probably write to the list in a bit-- if you're having problems with storyboard, and filing an issue about storyboard itself isn't possible, please feel free to ask us about stuff in #storyboard; it might be that we've done a bad job explaining something, or that something is still in-progress, but we can only improve things when we know what isn't working.14:43
Zaraapi issues should be logged at: https://storyboard.openstack.org/#!/project/456 , ui issues at: https://storyboard.openstack.org/#!/project/45714:46
*** rcarrillocruz has joined #zuul14:51
*** abregman is now known as abregman|afk14:52
Zaraoh, and now we have boartty (thanks, jeblair)! https://storyboard.openstack.org/#!/project/85414:58
Zara(so if the issue for the web ui would be 'everything', maybe try that, though it's still young.)14:59
ShrewsZara: I assume that is posted in response to my email response to SpamapS. It's difficult to create an issue when it's the entire UX that is tripping me up.14:59
ShrewsAs someone new to using it, it is not intuitive as to relationships between Dashboards, and Worklists, and Boards, and Stories and Projects (as a start to my issues).15:00
Shrewsbut, happy to chat outside of #zuul if you want more details15:01
ZaraShrews: yeah, it's to the whole thread really, and thank you! I've got to vanish for a meeting but I'll be back shortly :), generally #storyboard is the place to go if filing an issue via the ui itself isn't possible15:01
SpamapSShrews: what response btw? Haven't read my list mail lately15:08
ShrewsSpamapS: just replied a little bit ago15:08
SpamapSAh, I'll go peek15:09
SpamapSShrews: we definitely need people to assign themselves and in many instances add tasks (like on the test re-enablement story)15:12
SpamapSBut you can adjust just squawk at me and I can do it for you if the UI is truly annoying you.15:12
SpamapSs/adjust/also/. DYAC15:12
*** abregman|afk has quit IRC15:22
*** abregman has joined #zuul15:29
openstackgerritPaul Belanger proposed openstack-infra/nodepool: Add tests for enable / disabled labels  https://review.openstack.org/39964215:36
pabelangerShrews: jeblair: ^ that is going to fail, but wanted to first see if that is the correct way to disable a image from being built.  min-ready was the mentioned in zuulv215:37
openstackgerritRicardo Carrillo Cruz proposed openstack-infra/zuul: Re-model the job auth  https://review.openstack.org/39964515:38
rcarrillocruzweee15:39
rcarrillocruzi'm back to doing zuul things15:39
rcarrillocruzslowly getting settled in Granada15:39
pabelangerzuul all the things15:39
rcarrillocruz(helo folks!)15:39
pabelangerrcarrillocruz: how did your talk go? Was it recorded?15:39
rcarrillocruzi hope not!15:40
rcarrillocruzit was ok i guess15:40
rcarrillocruzi mean15:40
rcarrillocruzzuul is super dense15:40
rcarrillocruzfor people not from openstack land15:40
rcarrillocruzcan be hard to grasp concepts15:40
rcarrillocruzi saw quite a few 'i understand nothing' faces15:40
rcarrillocruzi heard you going to talk at openstack canada day ?15:41
mordredrcarrillocruz: yup. I know those faces well15:41
rcarrillocruzpabelanger: ^15:41
rcarrillocruz:D :D15:41
pabelangerrcarrillocruz: yup, next Tuesday. Infra-cloud15:41
rcarrillocruzawesome15:42
rcarrillocruzyolanda told me she'll meet you there15:42
rcarrillocruzshe has some customer meeting i believe in montreal15:42
pabelangerhaha, ya. I think we'll have some snow for her too15:42
yolandapabelanger, neh... i looked at the weather reports and is not that bad15:43
yolandagoing to go shopping anyway, i really don't have clothes for cold weather, just some ski equipment15:43
pabelangerWarm jacket an shoes for sure15:44
*** abregman is now known as abregman|afk15:44
rcarrillocruzit's going to take a while for me to get used to Granada15:44
rcarrillocruz0-1 degrees now15:44
rcarrillocruzin Marbella is like 2015:44
rcarrillocruzsigh15:44
mordredrcarrillocruz: oy15:45
mordredrcarrillocruz: why did you move away from marbella? it's so purty there :)15:45
rcarrillocruzinorite? my wife got an offer to open a new branch (she's an optics shop manager) , and life cost here is WAY CHEAPER to Marbella. Marbella is actually the first or second most expensive city, we need a new house and we'd have to go to very far outskirts to get something decent15:48
rcarrillocruzgood news is that we're just 1h:30 drive from there, so we can go whenever on weekends15:48
rcarrillocruzmordred , SpamapS : just going a round over the zuul v3 tasks, https://storyboard.openstack.org/#!/story/2000772 is complete no?15:50
SpamapSrcarrillocruz: no idea15:51
rcarrillocruzor jeblair if you're around ^15:51
rcarrillocruzi don't see anything that would prevent a playbook from running, we already transform JJB to ansible playbook on-the-fly now15:52
openstackgerritPaul Belanger proposed openstack-infra/nodepool: Add tests for enable / disabled labels  https://review.openstack.org/39964215:54
pabelangerShrews: jeblair: ^ I believe that restores v2 functionality for disabling image-builds15:54
openstackgerritPaul Belanger proposed openstack-infra/nodepool: Add tests for enable / disabled labels  https://review.openstack.org/39964215:58
mordredrcarrillocruz: nod. having recently moved to a much cheaper place from a much more expensive place - I can tell you it's worth it :)15:58
pabelanger++15:59
rcarrillocruzyeah, i will also literally pick my son with a 4min walk at kindergarten, whereas in Marbella you need a car for EVERYTHING15:59
pabelangerplan on doing that too, once we sell our condo15:59
rcarrillocruzhere public transport is also a thing, Marbella is non-existant, buses are terrible15:59
mordredrcarrillocruz: for https://storyboard.openstack.org/#!/story/200077216:02
mordredrcarrillocruz: in v2.5 we do the jjb transform, but in v3 we need to find playbooks/roles in paths and run those16:02
rcarrillocruzah, gotcha16:02
mordredso there will be some work to take what's in 2.5 and expand/improve it16:02
mordredI think jeblair or jhesketh started the work of forward-porting the 2.5 code into v3 already16:03
jeblairyeah, the forward porting is done.  someone needs to actually write the "get a playbook and run it" code.16:06
jeblairwe also need to deal with roles, but that spec update is still in-progress.16:06
jeblairShrews, pabelanger: responded on 39923016:08
openstackgerritJames E. Blair proposed openstack-infra/nodepool: Use constants for zk states  https://review.openstack.org/39930916:09
openstackgerritJames E. Blair proposed openstack-infra/nodepool: Remove builds that are no longer in use  https://review.openstack.org/39930716:09
Shrewsjeblair: but, getMostRecentImageUpload() only works per provider. You never actually verify anything about the removed provider16:13
openstackgerritPaul Belanger proposed openstack-infra/nodepool: Add tests for enable / disabled labels  https://review.openstack.org/39964216:13
Shrewsjeblair: "The main thing we're doing here is making sure that fake-image is gone from fake-provider2" ... there's nothing there to verify that other than the waitForImageDeletion16:14
Shrewsjeblair: oh, maybe that's enough. now i see the intent16:14
Shrewsdidn't understand the assertEqual there, but the added explanation makes sense now16:15
Shrewsjeblair: +1'd16:16
jeblairShrews: w00t.  maybe i should throw a comment around rando extra-credit test asserts like that.  :)16:18
Shrewsjeblair: i suggested that in the +1, but not a reason to -1 for now16:19
ZaraShrews: okay, back from meeting, though getting on a plane in a bit! As it happens, I've long been concerned about that issue; I started some work toward drafting a guide to how things relate; https://storyboard.openstack.org/#!/story/2000667 is the beginning of an overview. I planned to do more as we got more input on what was confusing, make it prettier, and move it somewhere more visible. *really* the ui lay16:22
Zaraout itself should convey how things relate, but that's a bigger task.16:22
Shrewsjeblair: 399307 ... we haven't yet defined __eq__ for ImageBuild. Need to add that16:23
Zara(and I feel bad for being distracting in #zuul so suggest we continue the conversation in #storyboard)16:24
ShrewsZara: let's pick it up after plane things16:25
Zaranp, and thanks. :)16:26
jeblairShrews: oh, i changed the construction of builds_to_keep to filter down from all builds, so that will actually be object identity matching16:27
openstackgerritPaul Belanger proposed openstack-infra/nodepool: Add tests for enable / disabled labels  https://review.openstack.org/39964216:27
Shrewsjeblair: hrm, and just realized... we added provider for equality in ImageUpload, but we also need image name. Same for ImageBuild. *sigh*16:27
Shrewsjeblair: ah16:27
jeblairShrews: (i was adding a third 'list of builds' so i thought i'd go ahead and do that to simplify things so we could do set operations, etc)16:27
jeblairsaves some zk traffic and nixes that race condition too16:28
*** hashar is now known as hasharAway16:29
Shrewsjeblair: gotcha. i'll work up a change for that ImageUpload equality thing16:29
jeblaircool16:31
openstackgerritPaul Belanger proposed openstack-infra/nodepool: Add test for provider max-servers enable / disable  https://review.openstack.org/39968116:39
pabelangerjeblair: Shrews: clarkb: that should add coverage when we need to disable a provider (removal for nodepool) and only do uploads (max-servers 0)16:39
pabelangerwhich is broken today :)16:40
clarkbpabelanger: I thought that was working and its max-servers -1, don't do uploads which doesn't work16:41
openstackgerritDavid Shrewsbury proposed openstack-infra/nodepool: Use image name in ImageUpload equality  https://review.openstack.org/39968316:42
pabelangerclarkb: Ya, I call that disabled provider, but max-server -1 is covered now16:43
jeblairpabelanger: left a comment on that16:45
clarkbpabelanger: ok I guess I don't understand what is broken about max-servers: 0 today16:46
pabelangerjeblair: replied16:48
jeblairclarkb, pabelanger: i think we have an option to perhaps have a clearer way to disable images -- simply remove the label section entirely, which is what i did in the test in https://review.openstack.org/39930716:49
pabelangerclarkb: I should have been more clear, sorry about that. max-servers: 0 works correctly today, max-servers: -1 is broken. 399681 adds coverage for both now16:49
jeblairoh this is max-servers.16:49
jeblairi don't think we ever did max-servers=-116:49
pabelangerI thought we did16:49
pabelangerlet me double check16:49
clarkbya pretty sure that <0 max servers implied no image builds for a time16:55
jeblairthought that was only min-ready16:56
pabelanger375448 was the change to master branch, but didn't get tests working16:56
clarkbI want to say it could be toggled provider wide with max-servers, but maybe I crossed the wired there16:57
jeblairclarkb, pabelanger: i'm leaning toward making this not about numbers but more about data structure.  examples: https://etherpad.openstack.org/p/HCNcnSkBdj16:58
jeblair(that works in the v3 branch)16:58
pabelangerlooking16:59
jeblairbasically, if you don't want an image in a provider, remove the 'image' stanza for that image from that provider.17:00
clarkbya that works17:00
jeblairif you don't want an image *at all*, remove all traces of it except the diskimage section (which is needed to tell the builder that it's reponsible for it; once the image is removed, you can remove the diskimage section)17:00
jeblairwe might even be able to smooth that last bit out, actually, with a bit more work (a builder could check its local disk for an image and see that it is responsible for it even if it doesn't show up in its config)17:01
pabelangerjeblair: clarkb: sure, we can do that17:02
openstackgerritClint 'SpamapS' Byrum proposed openstack-infra/zuul: Refactor skip-if tests to use irrelevant-files  https://review.openstack.org/39941517:13
jeblairpabelanger, clarkb: if we also want to support max-servers=-1, we can last 681 but should make a corresponding change to the delete obsolete uploads method so that it deletes any already uploaded images17:14
openstackgerritPaul Belanger proposed openstack-infra/nodepool: Add test for provider uploads enable / disable  https://review.openstack.org/39968117:17
pabelangerjeblair: clarkb: ^ wasn't had to implement the etherpad, it works out of box :)17:18
jeblairpabelanger: well, that is what i spend yesterday doing :)17:19
Shrewsanyone care to +A 399230? already two +2's17:19
pabelangerjeblair: I do like max-server: -1, as it would be a one line change to disable uploads, images: [] works, but more line of code change17:19
pabelangerjeblair: Yay17:19
clarkbjeblair: https://review.openstack.org/#/c/399307/ being the important one right?17:20
jeblairclarkb: yeah, that's the delete-image-builds change17:20
clarkbI will start reviewing that stack in a bit, then its off to making a new zanata server17:21
openstackgerritMerged openstack-infra/nodepool: Remove obsolete uploaded images  https://review.openstack.org/39923017:24
openstackgerritMerged openstack-infra/nodepool: Add check for uploads to deleteBuild()  https://review.openstack.org/39921317:25
openstackgerritMerged openstack-infra/nodepool: Update test_dib_image_delete to validate delete happened  https://review.openstack.org/39919617:25
openstackgerritMerged openstack-infra/nodepool: Remove image delete tests from test_nodepool  https://review.openstack.org/39865717:26
openstackgerritMerged openstack-infra/nodepool: Add image addition test to builder  https://review.openstack.org/39929417:26
SpamapSjeblair: btw, in https://review.openstack.org/399415 .. .My trouble with variants may have been PBKAC..  at first, I had it like this: http://paste.openstack.org/show/589742/17:31
SpamapSjeblair: and what I ended up with was a variant that had no irrelevant-files section and ran every time17:32
SpamapSSeems like anyone might make that mistake.. _OR_ it's a bug and I shouldn't have had that variant. I'm just not sure.17:32
jeblairSpamapS: that's a bug :)17:33
*** Guest52285 is now known as mgagne17:33
*** mgagne has quit IRC17:33
*** mgagne has joined #zuul17:33
jeblairwe shouldn't override things unless they are explicitly set17:33
SpamapSjeblair: k, I'll write it up as a test17:33
jeblairSpamapS: this should probably be a unit test in test_model17:38
jeblair(they are lighter weight and don't run the scheduler)17:39
clarkbjeblair: not sure if you noticed but looks like the table prefix for tests change is still unhappy17:39
SpamapSjeblair: good idea.. should be easier to iterate on that too.17:39
jeblairclarkb: blah17:39
jeblairclarkb: that one is going to be tricky to solve -- it's basically because we're creating a second database object (because we have the main nodepool daemon and the command process running in the same python interpreter)17:43
jeblair(so we're setting up the mappers twice)17:44
clarkbmaybe we should explore the common lock file appraoch? That should be straightforward /tmp/nodepooldbdroplockfile (or put it in the db itself)?17:44
*** rcarrillocruz has quit IRC17:47
mordredok. I just found a thing that wins the prize for being the most absurd programming advice EVER: https://dev.to/dawranliou/never-write-for-loops-again17:50
jeblairmordred: never write with vowels?  only write with punctuation? (hi perl!)17:53
clarkbmordred: if that was rewritten as "use tools that solve common iteration problems well" I would be on board17:53
clarkbjeblair: I enjoyed the use of set differences in that one change btw :)17:54
jeblairyeah, we're a stone's throw away from 'use a functional programming language'17:54
openstackgerritPaul Belanger proposed openstack-infra/nodepool: Re-enable TestWebApp tests  https://review.openstack.org/39971617:54
openstackgerritJames E. Blair proposed openstack-infra/nodepool: Use an external lockfile when dropping the test database  https://review.openstack.org/39971717:56
jeblairclarkb: ^17:56
openstackgerritPaul Belanger proposed openstack-infra/nodepool: Re-enable TestWebApp tests  https://review.openstack.org/39971617:56
clarkbjeblair: thanks17:57
jeblairSpamapS: yeah, i think looking in Layout._createJobTree, it looks like we're assuming that we're going to run at least one variant, so we don't really have the idea that a more specific variant might end up telling us *not* to run the job.  basically, this algorithm works for branch selection, but not deciding whether to run the job in the first place.18:01
jeblair(the algorithm is roughly "find the first variant that matches, use that, but also mutate it with any subsequent variants which also match".  so in this case, the variant without irrelevant-files matches after the one with irrelevant files failed to match)18:03
clarkbjeblair: we might want to make it support TMPDIR, but +2 as is (I think we can make it support TMPDIR if that ever beomces a problem)18:04
jeblairclarkb: let's put that on the roadmap for nodepool v12.  ;)18:05
SpamapSjeblair: yeah, might need to think about order of negative vs. positive filtering18:12
jeblairSpamapS: yeah.  also i'd like to amend my earlier comment -- i think this fails for branch selection in the same way (if you had a job that globally said only run on stable/foo, and thet added that to a project, it would probably run on all changes).18:13
jeblair(which is nice!  it means the problem is generalized!)18:13
SpamapSyeah maybe not as corner case as I originally thought18:14
openstackgerritPaul Belanger proposed openstack-infra/nodepool: Remove waitForBuiltImages() from nodepool.py  https://review.openstack.org/39972718:25
openstackgerritPaul Belanger proposed openstack-infra/nodepool: Remove waitForBuiltImages() / JobTracker() from nodepool.py  https://review.openstack.org/39972718:29
pabelangerShrews: don't want to step on toes, but should I start helping in test_commands to enable tests?18:30
Shrewspabelanger: working on image-delete now. feel free to tackle any others18:31
pabelangerack18:31
pabelangerlet me get a drink first18:31
Shrewsi think things have settled down enough to hit em again18:31
jeblairSpamapS: this is curious -- in ProjectTemplateParser._parseJobTree if you instantiate a job on a project-pipeline and do not give it any arguments, it adds the first instance of the job directly to the tree.  so in your exmaple, it should have added the main job definition, and *not* created a new variant without the matcher.18:32
Shrewspabelanger: jeblair: image-delete is going to require provider, image name, and upload ID (only takes an id argument now). I plan to just have 3 required args for those18:33
Shrewsimage-list already provides those values18:34
jeblairSpamapS: so i don't actually know why that failed.  even if we find and fix that, i think there might still be a problem with unwanted broadening of jobs in this manner (if you supply, say, an extra variable to a job in a project-pipeline jobtree, you would end up getting a blank variant instead of the main job.  we might be able to solve that by inheriting from the main job in that case)18:35
jeblair(where the 'main job' is 'the variant of the job that appeared first in the config')18:36
jeblairShrews: ++18:36
SpamapSjeblair: I know it's sort of fundamental to your vision of jobs, but for the record, complexity is why I shy from inheritance. I rather like composition and macros. It's the old crufty C programmer in me. ;-)18:36
SpamapSThat said18:37
SpamapSthis should not be complex18:37
SpamapSJust need to feed it a good matrix of test scenarios.18:37
clarkbfwiw composition and macros worked poorly for us in the jjb world18:37
jeblairSpamapS: yes, this is designed to be very simple for users.  it may require some brain bending on our parts to implement it, but we need to take the hit because of what clarkb said.  :)18:38
jeblairhopefully, the hit won't be that hard once we have, as you say, good test scenarios18:38
jeblairif anyone hasn't seen it lately, this is what we're working to avoid: http://git.openstack.org/cgit/openstack-infra/project-config/tree/zuul/layout.yaml#n124618:40
jeblairbasically lines 1181 through 3560 of that file are completely incomprehensible by a human, and, at this point, likely never to be removed from the file even when no longer needed.18:41
jeblairthat's 2400 lines of ad-hoc regular expression based rules applied to jobs, most of which are on average about 7000 lines away from where they're used.18:42
SpamapSyes, we are aligned on all points. now if i can just get more than 2 minutes at the keyboard ... #ParentLife18:43
jeblairpabelanger, clarkb: so where are we on max-servers=-1?  i think i argued it wasn't necessary because you can pull the image sections from the provider, and pabelanger argued that it could be useful if you want to temporarily disable a provider with a simple one-liner change.  i think i buy that -- that both methods are useful (pabelangers for when you want to suspend service, mine for when you want to end service).18:47
jeblairpabelanger, clarkb: do we want to support both things?18:47
jeblair(part of me wants only one way to do this for simplicity, but i recognize we end up flipping these bits in production a lot)18:48
pabelangerjeblair: clarkb: I don't mind trying it the new way, and skip max-server: -1 for now. Maybe revisit it in the future if images: [] is too much code churn.18:48
clarkbgit and git revert make it easy to do that18:52
clarkbyes its more lines changed but logically its the same sort of change18:52
openstackgerritPaul Belanger proposed openstack-infra/nodepool: Enable test_delete / test_delete_now tests  https://review.openstack.org/39973418:53
jeblairokay, so no more -1s for now then18:54
pabelanger++18:54
pabelangerShrews: jeblair: is it too early to talk about timeline for rolling nodepool zookeeper into production? Do you have something in mind?18:56
clarkbI was thinking about ^ a bit and since uploads haev been so unreliable recently I think we may want some way of prepopulating the zk db with current data when we do that18:57
clarkband that way we don't spend 2 weeks trying to get our images rebuilt and uploaded18:57
jeblairpabelanger: i think it can happen real soon now :)18:57
jeblairclarkb: we could do that, or we could run a zk builder in parallel (maybe finally split it into its own machine like we've been threatening to) then make the prod switch when we have image quorum?18:59
pabelangermordred: do you want to review https://review.openstack.org/#/c/399717 and +3? Adds lockfile to the database drop issue18:59
pabelangerYa, I think the idea of the split19:00
pabelangerI've started on puppet manifests for that19:00
pabelangerand need to revist it19:00
pabelangerrevisit*19:00
jeblairclarkb: (the db->zk migration idea isn't too hard; it would require not only the db migration, but also moving the diskimage files on disk since they are named differently)19:00
clarkbjeblair: ya tahts another option but I think it will take ~2 weeks to reach that19:01
clarkbwhcih may not be the worst thing considering the long holiday coming up19:01
jeblairclarkb: well, the new builder is far more agressive about uploading images19:01
pabelangerthat would actually, be really cool. first run zk nodepool-builder, before we upgrade nodepool.o.o, it should just upload our images across all providers19:01
clarkbjeblair: ya, but the clouds are much more aggressive at failing recently too... :(19:01
jeblairit totally rickrolls the old one19:01
clarkbapparently rax just times out now19:02
jeblairokay, maybe migration is the way to go then so we don't hate ourselves19:02
clarkbbut yes that would be a realtively easy way to transition, let it run on the side for a bit, once it is ready update nodepoold (well the 3 daemons)19:02
jeblairclarkb: two daemons!19:03
jeblairclarkb: (we get to drop the 'images' daemon)19:03
jeblair(or, i guess we could keep it just for the webapp)19:03
clarkbor attach webapp to one of the other two19:03
clarkbbut ya19:03
pabelangerhttps://review.openstack.org/#/c/356484/ can be approved, for puppet-nodepool.  moves ::diskimage_builder to builder.pp19:08
pabelangergoing to rebase the other changes19:08
openstackgerritDavid Shrewsbury proposed openstack-infra/nodepool: Re-enable test: test_image_delete_invalid  https://review.openstack.org/39973919:09
Shrewsi think it makes sense to run parallel since we don't really have experience with ZK in production. if we find problems, we don't break the world while we figure it out19:12
jeblairi find that ^ fairly convincing.  clarkb?19:16
clarkbya I think tahts a good reason to do it that way19:16
clarkband if we find the clouds are being annoying maybe thats a good time to dig into that more19:16
jeblairyeah, we'll have some space to poke at that too19:17
Shrewsugh, my 739 change is dependent on two different changes19:21
Shrewscan anyone +3 https://review.openstack.org/399683 ?19:23
jeblairi'm going to approve 717 first19:23
jeblairsince the db drop just tanked another series of changes19:24
Shrewsjeblair: yeah. one of those has my 2nd dependency19:25
* Shrews takes a walk. bbiab19:25
jeblairShrews: i approved 683 behind it; when they merge i'll recheck the others19:25
jeblairShrews: fun fact though, in the future -- while it's usually easy enough to just restack changes to have the right dependency graph, you can actually use Depends-On within the same repo if, for instance, you don't want to restack either of the two forks of the tree your change needs.19:27
jeblair(i thought about doing that yesterday, but one of the series i needed was all -2 anyway so i just restacked)19:28
openstackgerritMerged openstack-infra/nodepool: Use an external lockfile when dropping the test database  https://review.openstack.org/39971719:29
openstackgerritMerged openstack-infra/nodepool: Use image name in ImageUpload equality  https://review.openstack.org/39968319:31
*** abregman|afk has quit IRC19:39
Shrewsjeblair: oh, hah. didn't even consider that.19:53
Shrewsbtw, smoke from forest fires travels a loooong way. almost difficult to be outside19:53
mordredShrews: you have forest fires in NC in November?19:57
Shrewsmordred: oh, there are several burning right now near Asheville19:58
Shrewsmordred: http://wildfiretoday.com/2016/11/17/wildfire-smoke-forecast-for-november-17-2016/20:00
openstackgerritMerged openstack-infra/nodepool: Remove builds that are no longer in use  https://review.openstack.org/39930720:00
mordredShrews: wow20:00
Shrewsthat area is at the worst of the drought scale20:01
Shrewshttp://www.ncdrought.org/20:01
openstackgerritPaul Belanger proposed openstack-infra/nodepool: Remove waitForBuiltImages() / JobTracker() from nodepool.py  https://review.openstack.org/39972720:03
Shrewspabelanger: btw, i'm changing test_image_delete_snapshot to test_image_delete, just waiting for some merges first20:03
pabelangerShrews: same, working on hold tests20:04
pabelangerthey are an easy enable20:04
* Shrews installs his new Red Hat branded laptop camera covers20:08
openstackgerritMerged openstack-infra/nodepool: Use constants for zk states  https://review.openstack.org/39930920:10
openstackgerritDavid Shrewsbury proposed openstack-infra/nodepool: Re-enable test: test_image_delete_invalid  https://review.openstack.org/39973920:14
openstackgerritDavid Shrewsbury proposed openstack-infra/nodepool: Add test_image_delete test  https://review.openstack.org/39975720:15
openstackgerritPaul Belanger proposed openstack-infra/nodepool: Re-enable test_hold test  https://review.openstack.org/39975820:15
openstackgerritDavid Shrewsbury proposed openstack-infra/nodepool: Add test_image_delete test  https://review.openstack.org/39975720:18
*** bhavik has joined #zuul20:21
ShrewsSo, image-update... is that the same as requesting a new DIB image build and re-upload? Or just to re-upload using the exiting image?20:21
Shrewspabelanger: clarkb: ^^^20:21
Shrewss/exiting/existing/20:22
pabelangerrebuild the image and upload to provider20:22
pabelangerhttp://docs.openstack.org/infra/nodepool/operation.html#command-line-tools20:22
pabelangerI don't use it often20:22
pabelangerI usually do, dib-image-build, image-upload as 2 steps20:22
Shrewsoh, the -h output did not include such detail20:22
clarkbya workflow tends to be ^ because you only need one build then upload to 20 regions20:23
pabelangerbut, think we should keep it20:23
Shrewshrm, we are already covered by 'image-build' then, since the upload happens automatically. yeah?20:24
pabelangerin fact, I'd likely use it more if image-update all ubuntu-trusty, didn't try to upload an image to tripleo-test-cloud-rh1, since it doesn't have the image today20:24
Shrewswell, current builder wouldn't try to upload to that if the image isn't in the provider config20:25
Shrewsi think we can remove the image-upload option then, if I understand correctly20:26
pabelangerRight, I don't think an issue with zuulv320:26
Shrewsnodepool image-build   <-- builds and uploads at the next scheduler interval20:27
*** bhavik has quit IRC20:27
Shrewswell, builds immediately, then the upload will happen at the next interval (which is currently very quick)20:27
pabelangerYa, a little different workflow from today, since we do that manually (both steps)20:28
pabelangersince we have 24 hour loop20:28
pabelangerShrews: what about image-upload, how are we using that?20:29
pabelangererr20:30
pabelanger1 sec20:30
SpamapSjeblair: I feel like the trouble is that when a job inherits from another one, we need to _not_ consider the parent as a potential match anymore.20:30
Shrewspabelanger: image-upload is not needed now20:30
pabelangerSpamapS: Ya, image-upload.20:30
pabelangerSpamapS: sorry20:30
SpamapSyou got S'd20:30
* SpamapS fistpumps20:30
SpamapStake that20:30
SpamapS;)20:30
pabelangerShrews: yes, that20:30
pabelangerso, image-upload / image-build would go away20:30
pabelangerat the cost of not being able to manually trigger them20:31
Shrewspabelanger: no, image-build stays, but it replaces the image-build/image-update combo20:31
pabelangerah20:31
pabelangerwhat about dib-image-build?20:31
pabelangererr20:32
pabelangerbrain is mush20:32
pabelangerokay, I understand now20:32
Shrewsi don't see that one20:32
Shrewshah20:32
pabelangerShrews: I'm happy to try things with out them in the new world order20:34
openstackgerritDavid Shrewsbury proposed openstack-infra/nodepool: Remove image-update based tests  https://review.openstack.org/39976420:37
Shrewsso, i guess image-upload is also unnecessary now20:39
Shrewsthat would be the last of the gearman based commands20:40
pabelangerright20:40
Shrewsw00t20:40
openstackgerritDavid Shrewsbury proposed openstack-infra/nodepool: Remove image-update based tests  https://review.openstack.org/39976420:41
pabelangerShrews: looking at 399757, that's a lot of args :)  would it be possible to update that to use the image name?  nodepool image-delete template-fake-image-1479501568 ?20:42
pabelangerI guess that would be the Provider Image Name20:42
Shrewspabelanger: nope20:42
pabelangerl20:43
pabelangerk20:43
Shrewspabelanger: well, if we want to pull all uploads, then compare image.external_name20:43
Shrewswe'd still need build-id20:43
Shrewsand upload-id20:43
pabelangerokay, will have to try it and see how it works20:44
jeblairSpamapS: note that we have actual inheritance (where jobs with different names are related to each other -- job: name: foo; parent: bar) and also the things i've been calling variants (where one job out of several with the same name is chosen to run).  it's potentially confusing because behind the scenes they both use the 'inheritFrom' method to copy attributes around.20:44
Shrewsthough, i guess in reality, the same image for a build wouldn't be successfully upload more than once20:44
jeblairpabelanger, Shrews, clarkb: the little voice in my head saying maybe we should make a unique id for uploads is getting louder.20:45
Shrewsjeblair: yeah20:46
clarkb++20:46
* Shrews is starting to hear it too20:46
openstackgerritDavid Shrewsbury proposed openstack-infra/nodepool: Remove image-upload command and tests  https://review.openstack.org/39976620:46
* clarkb is hoping that latest commit to ansible PR will actually apss tests and we can get this silly bug fixed20:46
Shrewsjeblair: i suggest allowing the current changes through, then considering how that would look afterwards20:48
jeblairShrews: agreed20:48
jeblairbrainstorming on that: a) redesign the zk storage hierarchy so it isn't deep (all uploads go into one container); b) have a global counter for upload ids (like an autoincrement table in postgres); c) generate uuids (they're longer but copy/pastable)20:48
jeblair[that's in increasing order of preference for me]20:49
Shrews(a) would touch quite a bit of code20:50
jeblairyeah, and aside from this, i think it's a good schema design20:50
Shrews(b) we could repurpose sequence znodes20:50
pabelangerlooking into the nodepool.tests.test_builder.TestNodePoolBuilder.test_image_removal failures20:51
Shrews(c) we could keep the status quo, but add uuids to the znode data. but now some APIs would have to do more ZK queries20:52
Shrewspabelanger: eh? where?20:52
pabelangerhttp://logs.openstack.org/58/399758/1/check/nodepool-coverage-db-ubuntu-xenial/b8776e3/console.html20:52
clarkbI like uuids because they are easy to reason about20:52
pabelangerShrews: it is the race you warned about20:53
clarkbI think a) and b) would require extra brain processing around changing that code that c) wouldn't20:53
Shrewspabelanger: ah, yeah. perhaps we need a loop around getBuild()20:55
pabelangerShrews: agreed, doing that now20:55
jeblairShrews: yeah, with (c) the lack of indexing would hurt, but if we only use it for the CLIs it probably wouldn't be that bad.20:58
Shrewsjeblair: oh, well that's a good point20:58
clarkbwoot tests pass now on ansible sync retry fix21:00
clarkbnow I can get lunch21:00
openstackgerritPaul Belanger proposed openstack-infra/nodepool: Better protect for race condition in waitForBuildDeletion()  https://review.openstack.org/39977021:03
pabelangerShrews: ^more protection21:03
Shrewshrm, not quite sure why 399757 failed21:07
Shrewslooks like one of the threads abnormally aborted. if that's the cleanup thread, i could see the timeout21:09
Shrewspabelanger: is that call to wait_for_threads() in that loop necessary there? looking at the code for that method, i'm not entirely certain we need that for testing the builder21:12
Shrewslike, don't need it in any of the tests21:12
pabelangerlet me check21:12
jeblairShrews: it should be harmless21:12
Shrewsok. i'm not 100% certain exactly what that's doing anyway21:14
jeblairShrews: (it doesn't look like it's at issue in that test anyway, since it didn't time out inside it)21:14
Shrewsall of my changes have failed with that delete timeout, though the test passes locally. weird21:16
jeblairShrews: it makes sure that any threads that are running to create nodes, etc, have completed (since a thread running one of those will change the state).  essentially it helps make sure the system is stable before we inspect it.  it's generally not at issue for many builder tests, but any that incidentally wait for a node to be built would need it.  easier to just have it.21:17
jeblairShrews: was that image uploaded twice?21:18
jeblairwith the same number?21:18
Shrewsjeblair: i don't see how21:20
jeblairShrews: i don't either, but i ask because i see two lines like 2016-11-18 20:27:56.026391 | INFO [nodepool.builder.UploadWorker] Image build 0000000001 in fake-provider is ready21:21
Shrewswell that is interesting21:21
Shrews!21:21
jeblair(and if there were two uploads, it would cause the failure).  first thing to eliminate is whether that's just log duplication.21:22
Shrewsthis smells like a race21:22
Shrewstwo threads uploading the same image21:22
jeblairShrews: we may need to turn zk logging back on and throw this at zuul a few times21:23
jeblairShrews: do you want to do that?  i want to make some logging changes so we can discern the different upload threads.21:24
Shrewsjeblair: hold on a sec...21:24
Shrewsjeblair: i think we need a check inside the upload lock code to make sure some other thread hasn't already done the work21:25
Shrewsjeblair: b/c we check if it has been uploaded, if not, upload it (w/o doing the check in the critical section again)21:26
* Shrews codes the fix21:28
jeblairShrews: oh!  they're not doing it at the same time, they're doing it sequentially :)21:29
*** hasharAway is now known as hashar21:29
Shrewsright21:29
jeblairmakes sense21:29
jeblairShrews: hrm, well why would they get the same upload id then?21:30
Shrewsjeblair: the build id is the same (that's what is in the log)21:31
hasharhello :)21:32
hasharFound a nice abuse of Depends-On on our setup a patch with 58 depends-on meta headers!21:34
jeblairShrews: doh.21:34
jeblairhashar: do you mean someone was being mischevious, or they had a really complex thing?21:35
hasharcomplex thing21:35
jeblaircool.  did it work? :)21:35
hasharthat is a breaking change in MediaWiki itself, it has been made to depend on all the ptch that updates the plugins21:35
hasharthink about Wordpress suddenly moving to python and depending on every single wordpress plugins that do the switch :D21:35
hasharZuul is still processing21:36
hasharI am confident it will eventually reach the end!21:36
jeblairyeah, that'll require some churn... how many mergers do you have?21:36
hashar1 :D21:36
hasharit is merely doing Running query change:Ixxxxx21:36
jeblairah yeah.21:37
hasharand yeah I havent thought about all the merge:merge jobs that are going to be send21:37
SpamapSjeblair: ah yes that did confuse me21:37
SpamapSbut that may be because I had only 2 hours of sleep last night21:37
hashar(I dont think too many depends-on need a fix. I  reported it because I find it a creative use of the feature)21:38
openstackgerritDavid Shrewsbury proposed openstack-infra/nodepool: Add secondary upload check  https://review.openstack.org/39977421:38
openstackgerritJames E. Blair proposed openstack-infra/nodepool: Log each worker thread separately  https://review.openstack.org/39977521:38
Shrewsjeblair: ^^^21:38
jeblairSpamapS: that is less than the surgeon general recommends before starting work on zuul21:38
jeblairShrews: ^^^ :)21:39
* SpamapS has always been a rebel21:39
jeblairShrews: that's quite the race condition too, since that's a non-blocking lock.21:40
openstackgerritDavid Shrewsbury proposed openstack-infra/nodepool: Add test_image_delete test  https://review.openstack.org/39975721:42
Shrewsgah21:42
openstackgerritDavid Shrewsbury proposed openstack-infra/nodepool: Add test_image_delete test  https://review.openstack.org/39975721:44
openstackgerritDavid Shrewsbury proposed openstack-infra/nodepool: Remove image-update based tests  https://review.openstack.org/39976421:44
openstackgerritDavid Shrewsbury proposed openstack-infra/nodepool: Re-enable test: test_image_delete_invalid  https://review.openstack.org/39973921:44
openstackgerritDavid Shrewsbury proposed openstack-infra/nodepool: Remove image-upload command and tests  https://review.openstack.org/39976621:44
Shrewsrebased those on the race fix21:45
jeblairpabelanger, Shrews, clarkb: https://review.openstack.org/399770 failed with the db drop.  it was running the lockfile code.  :(21:45
jeblairpabelanger, Shrews, clarkb: maybe we should lockfile the schema *creation* as well?21:45
Shrews*sigh*21:46
Shrewsi hate databases21:46
openstackgerritJames E. Blair proposed openstack-infra/nodepool: Also use a lockfile when creating the schema  https://review.openstack.org/39977721:47
jeblairalmost gone21:47
*** rcarrillocruz has joined #zuul22:04
pabelangerjeblair: worth a try22:04
Shrewspabelanger: that failure on 770 is all cray cray22:09
ShrewsOk, I am ahead of schedule and already 2 beers into my weekend. I should neither review nor code anymore. C U all Monday.22:11
pabelangerShrews: ack, enjoy22:18
openstackgerritMerged openstack-infra/nodepool: Add secondary upload check  https://review.openstack.org/39977422:30
*** hashar has quit IRC22:31
*** lindycoder has quit IRC22:31
openstackgerritMerged openstack-infra/nodepool: Also use a lockfile when creating the schema  https://review.openstack.org/39977722:38
jeblair23:29 < openstackgerrit> James E. Blair proposed openstack-infra/infra-specs: Zuul v3: Add section on secrets  https://review.openstack.org/38628123:29
*** rcarrillocruz has quit IRC23:46

Generated by irclog2html.py 2.14.0 by Marius Gedminas - find it at mg.pov.lt!