Thursday, 2016-11-17

SpamapShrm00:04
SpamapSso why was job_has_change removed from ZuulTestCase ?00:04
* SpamapS can't find the commit that did it00:04
clarkbSpamapS: git log -p is magical way to find commits that do things00:05
clarkb(I use it far too much)00:05
SpamapSclarkb: indeed, I forgot about that00:06
clarkbpabelanger: I rechecked that stack because I selfishly want to see integration tests work there. Will approve once I get taht data back00:06
clarkbpabelanger: I mean assuming the tests check out like initial code review does00:06
jeblairSpamapS: looks like I654a269d37c0bc323ed73afa68a73ddd558be7e200:06
SpamapSindeed00:06
SpamapSso, the problem is, there are some tests that don't have any builds00:06
SpamapSbecause they're testing the queue00:07
SpamapSso they just have a gear.Job00:07
SpamapSand need to go ahead and load the json, the same way that function did00:07
jeblairSpamapS: have an example handy?00:08
SpamapSjeblair: test_failed_change_at_head_with_queue00:08
SpamapSspecifically asserts that there are no builds yet00:08
SpamapSI think it may be worth writing a new version of job_has_changes that is much smaller than the old one00:09
SpamapSmaybe call it queue_job_has_changes to differentiate00:09
SpamapSjeblair: and I understand that the things expected to be in those jobs changes.. like the name, and some of the content00:14
jeblairSpamapS: yeah, i think we pass along 'items' as a parameter to the build function, so you should be able to inspect that00:14
jeblairSpamapS: (items contains the instructions on what changes to include)00:15
jeblairSpamapS: yes, the _gearman_ job name is no longer interesting (it is simply 'build'), so you'll want to look for the 'job' argument of the job.00:17
jeblairin clase that was clear as mud00:17
openstackgerritMerged openstack-infra/nodepool: Re-enable test_job_end_event test  https://review.openstack.org/39849300:18
SpamapSindeed, I had moved past that one00:18
SpamapSok yeah items is what I need00:18
jeblairSpamapS: the gearman job named 'build' has arguments telling the launcher what to run.  one of those arguments is 'job' which is the zuul job name (eg, 'project-test1').  another argument is 'items' which contains the list of changes to be merged for the test.00:18
SpamapSThis is actually the only place that uses job_has_changes with the gear job tho00:19
pabelangerclarkb: sounds good00:19
jeblairSpamapS: it may not be worth it then00:19
SpamapSYeah exactly.. just going to do the extraction there00:19
jeblairSpamapS: also, that seems like a slightly extraneous check.  you could probably drop it entirely and i don't think we'd miss it.00:20
jeblair(that single specific assertion, i mean)00:21
openstackgerritMerged openstack-infra/nodepool: Re-enable test_job_auto_hold_* tests  https://review.openstack.org/39849500:21
openstackgerritMerged openstack-infra/nodepool: Remove SnapshotImage from the database  https://review.openstack.org/39860200:26
pabelangerclarkb: excellent, all green!00:27
clarkbyup00:28
openstackgerritPaul Belanger proposed openstack-infra/nodepool: Delete test_handle_dib_build_gear_disconnect test  https://review.openstack.org/39865500:29
openstackgerritClint 'SpamapS' Byrum proposed openstack-infra/zuul: Re-enable test_failed_change_at_head_with_queue  https://review.openstack.org/39865600:30
SpamapSjeblair: ^ another one bites the dust00:30
jeblairpow!00:30
openstackgerritMerged openstack-infra/nodepool: Replace snap_image with cloud_image  https://review.openstack.org/39863800:31
openstackgerritPaul Belanger proposed openstack-infra/nodepool: Delete image related tests from test_nodepool  https://review.openstack.org/39865700:32
pabelangerjeblair: clarkb: okay, last tests from test_nodepool that were skipped.00:33
pabelanger^00:33
jeblairpabelanger: do those first 2 tests have equivalents in test_builder?00:37
jeblairor, i guess test_command00:38
clarkbI was going to ask, maybe we should at least stub out the things we need todo in the other places?00:38
clarkb(help us to not forget)00:38
jeblairwell, if they aren't covered, i'd say we shouldn't land that change -- it should have the add + delete.00:38
jeblairbut they may be covered00:39
clarkbI am good with that too00:39
openstackgerritMerged openstack-infra/zuul: Re-enable test_failed_change_at_head_with_queue  https://review.openstack.org/39865600:39
pabelangerjeblair: not yet, but I can work on that tomorrow morning for test_builder00:40
pabelangerthen add them to the current patch00:40
clarkbpabelanger: maybe wip the change in the mean time?00:41
clarkbdoes https://review.openstack.org/#/c/398655/1/nodepool/tests/test_nodepool.py have a semi similar "test builder losing connectivity to zk" type test?00:41
pabelangerclarkb: yup, did already00:41
pabelangerclarkb: same, I'll get a replacement up too00:42
pabelangerI'll WIP00:42
clarkbthanks00:42
jeblairi'm inclined to think that the cli tests are probably sufficient for the image delete tests00:42
jeblairbut if someone wants to argue we need additional tests for that, i'll listen :)00:42
clarkbjeblair: well I think earlier they were saying that the cli tests should only check that zk gets updated and we need to test the background deletion separately? If thats teh case I don't think the cli tests are sufficient00:43
clarkb(I may have misparsed that conversation)00:43
pabelangerI think it would be good to check the disk to see if the image is actually removed, for a unit test.00:45
jeblairclarkb: well, that would be one approach.  if we went that way, then yes, we would need a second test to test the actual deletion.  however, i would suggest that a cli test that verified the deletion happened would be good.00:45
clarkbjeblair: ya I think master's cli tests work that way00:45
clarkbso happy to set it up in that manner still00:45
jeblair(also, i think it would be extra work to build the test the other way -- you would have to shut down the builder before you ran the command to prevent it racing)00:47
jeblairclarkb, pabelanger: so i have +1d 657 with that comment.00:48
clarkbok00:48
clarkbjeblair: re morgan_'s comment in -infra, is ZUUL_CHANGE expected to stick around long term?00:48
jeblairthe zk disconnect is a good question.  it's not a 1:1 mapping here...00:48
pabelangerjeblair: ack00:48
clarkbjeblair: ya I think the rough mapping is you restart the builder in the middle of a build, does it properly update zk on startup00:49
jeblair(since that was testing a gearman disconnect while a running gearman job handle was still open)00:49
jeblairbut certainly the idea of testing zk disconnects is a good one00:49
jeblairclarkb: and yeah, that's the most likely case and one we should start with00:50
clarkbwhat that test is guarding against is that we update databse state properly when things get restarted00:50
openstackgerritIan Wienand proposed openstack-infra/nodepool: Separate image upload logs into separate logger  https://review.openstack.org/39751600:52
*** Shuo_ has quit IRC00:56
openstackgerritPaul Belanger proposed openstack-infra/nodepool: Remove image delete tests from test_nodepool  https://review.openstack.org/39865701:02
pabelangerjeblair: clarkb: So, https://review.openstack.org/#/c/398418/ test_dib_image_delete will check the state if the images. So, that gives us coverage for the tests removed in test_nodepool01:02
pabelangerI'd still like to add a follow up test to make sure the image is removed from disk01:02
jeblairpabelanger: that test will race the builder to actually delete it.  so we should either modify that test to wait for the deletion to happen, or stop the builder before running the command (and then probably start it again and wait for deletion)01:27
*** bhavik1 has joined #zuul04:32
*** abregman has joined #zuul06:23
*** Cibo has joined #zuul06:41
*** bhavik1 has quit IRC08:27
*** abregman has quit IRC08:28
*** abregman has joined #zuul08:30
*** abregman has quit IRC08:40
*** abregman has joined #zuul08:53
*** hashar has joined #zuul09:34
*** openstackgerrit has quit IRC09:48
*** openstackgerrit has joined #zuul09:49
*** Cibo has quit IRC10:07
*** hashar is now known as hasharLunch11:33
*** hasharLunch is now known as hasharAway12:29
*** abregman has quit IRC13:43
openstackgerritDavid Shrewsbury proposed openstack-infra/nodepool: Re-enable test: test_config_validate  https://review.openstack.org/39842713:53
openstackgerritDavid Shrewsbury proposed openstack-infra/nodepool: Re-enable test: test_image_build  https://review.openstack.org/39843713:58
openstackgerritDavid Shrewsbury proposed openstack-infra/nodepool: Re-enable tests: test_job_create, test_job_delete  https://review.openstack.org/39844014:01
openstackgerritDavid Shrewsbury proposed openstack-infra/nodepool: Remove test: test_snapshot_image_update  https://review.openstack.org/39844514:01
SpamapSShrews: know anyting about 'irrelevant-files' replacing 'skip-if' ?14:22
SpamapSI'm struggling to see how the new vision handles different irrelevant-files in different projects.14:23
ShrewsSpamapS: i do not understand your words, so I'm going to go with "no"14:23
SpamapSShrews: apparently 'irrelevant-files' has replaced 'skip-if'14:23
SpamapSbut I'm not sure I understand how that's supposed to work to define per-job per-project skips14:24
ShrewsSpamapS: ah, a zuul change, I assume. I have been too engrossed with nodepool to follow much of the zuul changes14:27
SpamapSwell you're no help then14:28
Shrewsthis is often true14:31
Shrewspabelanger: any chance nodepool.tests.test_nodepool.TestNodepool.test_node_vhd_image_StringException was one of the new tests you enabled?14:32
Shrewspabelanger: http://logs.openstack.org/37/398437/3/check/gate-nodepool-python27-db-ubuntu-xenial/0087592/testr_results.html.gz14:32
Shrewserr, minus the "_StringException" there14:33
Shrewsah, yep14:34
SpamapSmordred jeblair see above.. I am looking at the change from skip-if to irrelevant-files. In the old test_parse_skip_if test, we test a situation where one job is fed by two projects, with different skip-if's in each project. I don't see how that's possible with irrelevant-files.14:38
mordredSpamapS: I am also no help14:44
SpamapSmordred: crikey mate.14:45
mordredSpamapS: I do have a cup of coffee though - so I might become useful once I've consumed it14:45
SpamapSmordred: yeah, I'm interested in understanding how integration jobs work in the new world order. But I hear the stompy-stomp of little feat overhead, so it's time to leave the basement and tend to the little ones for a while.14:48
mordredSpamapS: stompy stompy!14:48
olaphstompy, indeed15:02
pabelangerShrews: Yes, that was enabled yesterday. Looks like it is flapping?15:08
Shrewspabelanger: we'll have to keep an eye on it15:09
pabelangerShrews: Oh, so this is a long bug that clarkb mentioned yesterday. We try to drop the database on clean up, but it fails.  For now, we can recheck but we should keep track of the failures, so we can revisit to debug15:09
mordredpabelanger, Shrews: it's also worth noting that the db is going away eventually, so unless it gets flappity flappity, it may not be worth solving15:10
mordredbut yes on keeping track of it15:11
openstackgerritAndreas Jaeger proposed openstack-infra/nodepool: Add tools/test-setup.sh  https://review.openstack.org/39907915:34
jeblairSpamapS: if an irrelevant files should apply to all projects, it should be placed on the bare job definition.  if it only applies to one project, then it should be placed on the job variant definition attached to the project-pipeline.15:48
jeblairSpamapS: eg: http://paste.openstack.org/show/589613/15:50
*** abregman has joined #zuul16:06
*** abregman has quit IRC16:06
openstackgerritMerged openstack-infra/nodepool: Re-enable test: test_config_validate  https://review.openstack.org/39842716:13
SpamapSjeblair: what about a job that is triggered by two different projects, each of which need a different set of "irrelevant-files" defined.17:04
SpamapSjeblair: so, like, an integration job in which nova wants to ignore 'contrib/*' but keystone actually wants to run if files in contrib/* change.17:05
jeblairSpamapS: they'll each have a job variant defined on their project-pipeline17:05
SpamapSOh those are no longer the same job?17:05
jeblairSpamapS: 'variant' is a term of art here meaning 'the same job but with different preconditions'17:06
jeblairSpamapS: so the job is still named 'foo' in both cases, and the content of the job is the same, but the decision about whether and when to run the job is different in a variant17:07
jeblairSpamapS: (variants can also alter the behavior by selecting different node types, or set different ansible variables)17:07
SpamapSjeblair: Ok, I want to update the docs as I refactor the tests.. so I am trying to fully understand.17:08
SpamapSAnd I truly do not at this point.17:08
SpamapSbut that's probably because I haven't thought about it for very long.17:08
* SpamapS will fake it, type something up, and we can edit together in review17:09
jeblairSpamapS: okay.  i've avoided updating the docs so far because it's still quite a bit in flux (not all of the behaviors are fully defined yet), and honestly, the docs need a rewrite anyway.  but we can certainly start to put some words down.  :)17:11
jeblairSpamapS: there's some discussion of this in the v3 spec here: http://specs.openstack.org/openstack-infra/infra-specs/specs/zuulv3.html#jobs17:12
jeblair(though i used the word 'aspects' instead of 'variant' there.  i'm open to which is the better term.  :)17:12
SpamapSjeblair: Well for me, if something is totally different, like this... I need some prose.17:12
SpamapSDoesn't have to be in docs/*17:12
SpamapSI could put it in the test description.17:12
pabelangerjeblair: Shrews: Just a heads up, running into some issues validating an image is delete by nodepool-builder: http://paste.openstack.org/show/589624/ This is from testtools.run nodepool.tests.test_commands.TestNodepoolCMD.test_dib_image_delete, only change I made was to wait for a non-existent image to come online.  Going to be working on this for the next little while17:13
SpamapSjeblair: makes me quite nervous that such a fundamental thing is still considered to be in flux.17:13
SpamapSI'd rather we be wrong at this point.17:13
SpamapSthan be confused. :-P17:13
rbergeron(thinking out loud) -- I was just mailing Will Thames, the fine maintainer of ansible-lint who just happens to work for Shadowman in IT / engineering services ... was thinking maybe ansible-lint might be a useful thing to run on the jobs/playbooks themselves when they are written / submitted.17:16
rbergeroni know it's a ways off and for all i know this could already be happening anyway on other things, but... figured i'd mention it :)17:16
rbergeron</squirrel>17:18
pabelangerrbergeron: Yup, some projects already do that today. The Ansible Openstack team has some jobs already in place for that, I think ansible-role-cloud-launcher too.  I also do it for some playbooks / roles I am hosting upstream17:19
rbergeronpabelanger: awesomesauce. was just thinking it might be useful when all the jobs are ansibley. :)17:21
pabelanger++17:22
SpamapSrbergeron: thank you, I didn't know there was a well maintained ansible-lint ... I have many uses for such a thing.17:22
gundalowhttps://github.com/willthames/ansible-review & https://github.com/willthames/ansible-lint17:23
rbergeronyeah, those. gundalow on the ball :)17:23
pabelangerI haven't done ansible-review yet17:23
pabelangerbut ansible-lint is great17:23
gundalowditto17:23
rbergeronSpamapS: will came out to ansiblefest in london last year; he was on our list of folks to sponsor and we were like, "let's see who he works for" and ... oh hey, he works here! it was pretty amusing. :)17:26
openstackgerritMerged openstack-infra/nodepool: Log image delete exception as exception  https://review.openstack.org/39859617:31
openstackgerritAndreas Jaeger proposed openstack-infra/nodepool: Add tools/test-setup.sh  https://review.openstack.org/39907917:42
pabelangerShrews: have a min?17:57
pabelangertrying to figure out why I am getting an exception calling zk.deleteBuild()17:58
pabelangerhttp://paste.openstack.org/show/589629/17:58
openstackgerritAndreas Jaeger proposed openstack-infra/nodepool: Add tools/test-setup.sh  https://review.openstack.org/39907917:58
Shrewspabelanger: not atm. Out at lunch waiting for someone17:58
Shrewspabelanger: what nodes are left under that zk build path?18:00
Shrewspabelanger: you'll need to use zk-shell to look probably18:00
* Shrews wonders if a lock node remains18:01
pabelangerShrews: http://paste.openstack.org/show/589630/18:02
pabelangerya, looks like a lock18:02
pabelangertrying to delete build 000000000118:03
pabelangerlet see if I can find it18:03
Shrewspabelanger: OK, then that's a logic error in the code18:03
pabelangerShrews: thanks18:04
ShrewsCan think of a couple of ways to deal with that, but will need to wait until I get back to my computer18:04
pabelangersure, np. Enjoy lunch18:04
openstackgerritJames E. Blair proposed openstack-infra/nodepool: Add printZKTree debug method for tests  https://review.openstack.org/39917218:19
openstackgerritJames E. Blair proposed openstack-infra/nodepool: Add test_provider_addition to builder  https://review.openstack.org/39917318:19
jeblairpabelanger, Shrews: ^ that works.  i'm going to work on the corresponding provider removal test now, which we can use to explore the concerns we had yesterday about deleting uploaded images when we remove a provider18:20
openstackgerritJames E. Blair proposed openstack-infra/nodepool: Add tools/test-setup.sh  https://review.openstack.org/39907918:24
openstackgerritJames E. Blair proposed openstack-infra/nodepool: Add tools/test-setup.sh  https://review.openstack.org/39907918:26
openstackgerritAndreas Jaeger proposed openstack-infra/nodepool: Add tools/test-setup.sh  https://review.openstack.org/39917718:33
pabelangerjeblair: Shrews: what do you think about adding waitForBuildDelete() (name open to bikeshed) as a helper function to watch the filesystem to make sure our builds are actually deleted?18:48
jeblairpabelanger: maybe it can check zk and the filesystem?  but yes.18:54
pabelangerk18:57
Shrewsjeblair: me likey19:22
*** hasharAway is now known as hashar19:22
openstackgerritPaul Belanger proposed openstack-infra/nodepool: [WIP] Check that builds are removed from disk  https://review.openstack.org/39919619:25
pabelangerjeblair: Shrews: This is what I've come up with so far to check that builds are removed from disk, however the test currently fails19:25
pabelangerbecause getBuild() isn't empty and deleteBuild() throws an exception (still has a lock)19:26
pabelangerwant to make sure I'm on the right path before digging into the lock19:26
Shrewspabelanger: well, that is slightly racey to begin with (deleting ondisk image and the zk node are not atomic)19:32
Shrewsbut should rarely hit that, i would think (famous last words)19:33
Shrewspabelanger: so, 2 thoughts: we could do the recursive thing you brought up yesterday, which would solve that problem. But that opens the possibility of us accidentally deleting the upload data if we haven't properly done that aspect yet.19:34
*** abregman has joined #zuul19:34
Shrewspabelanger: 2nd thing, we could manually delete the lock after all uploads have been deleted19:34
pabelangerya, let me see where we are missing the lock before we do recursive, I think that is worth the effort19:35
Shrewsthe second would involve deleting the locks for each provider19:35
Shrewsactually, the entire "provider/XYZ/images" path19:36
Shrewspabelanger: ooh, 3rd option19:37
Shrewsput a check in to deleteBuild() that does the verification that no uploads exist. if they do, throw an exception. if not, delete recursively19:37
Shrewsi like #319:37
Shrewspabelanger: i can code that up real quick.. gimme a min19:39
pabelangerokay19:39
Shrewspabelanger: unless you were eager to do that?19:39
pabelangerShrews: I'll have to step away shortly for a meeting, so I can do that when I get back. Or happy to let you do so19:40
Shrewspabelanger: i'll toss the code up and rebase your test on top of it19:41
pabelangerWFM19:41
*** openstackgerrit has quit IRC19:48
*** openstackgerrit has joined #zuul19:49
jeblairpabelanger, Shrews: cool -- i think i'm very close to having the remove provider test working, which does require some changes in deleting.  i think i can push that up after my lunch.  so hopefully around then we can look at both that and the lock issue together (since they are all around the same bit of code).19:56
openstackgerritDavid Shrewsbury proposed openstack-infra/nodepool: [WIP] Check that builds are removed from disk  https://review.openstack.org/39919620:03
openstackgerritDavid Shrewsbury proposed openstack-infra/nodepool: Add check for uploads to deleteBuild()  https://review.openstack.org/39921320:03
Shrewspabelanger: jeblair: ^^^20:03
Shrewscrossing my fingers pabelanger's new test passes now20:03
Shrewsgah. wrong log level20:05
openstackgerritDavid Shrewsbury proposed openstack-infra/nodepool: [WIP] Check that builds are removed from disk  https://review.openstack.org/39919620:07
openstackgerritDavid Shrewsbury proposed openstack-infra/nodepool: Add check for uploads to deleteBuild()  https://review.openstack.org/39921320:07
*** abregman is now known as abregman|afk20:08
Shrewspabelanger: your test passes now! i had a stupid pep8 failure to get the -1 from jenkins though20:35
openstackgerritDavid Shrewsbury proposed openstack-infra/nodepool: Add check for uploads to deleteBuild()  https://review.openstack.org/39921320:35
openstackgerritDavid Shrewsbury proposed openstack-infra/nodepool: [WIP] Check that builds are removed from disk  https://review.openstack.org/39919620:36
* Shrews needs to alias 'git review' to 'tox -epep8 ; git review'20:37
timrcI use vim-spf13 and it auto-includes an inline pep8 check which is handy.20:43
timrcShrews: But would you want tox -e pep8 && git review so it short circuits?20:44
pabelangerand back20:44
Shrewstimrc: it was a half-jest, not subject to checks for accuratenesss :)20:44
* timrc crawls back into his hole20:45
openstackgerritJames E. Blair proposed openstack-infra/nodepool: Add test_provider_addition to builder  https://review.openstack.org/39917320:45
openstackgerritJames E. Blair proposed openstack-infra/nodepool: Add printZKTree debug method for tests  https://review.openstack.org/39917220:46
pabelangerShrews: Yay, that is awesome20:46
openstackgerritJames E. Blair proposed openstack-infra/nodepool: Remove obsolete uploaded images  https://review.openstack.org/39923020:46
jeblairpabelanger, Shrews: okay, that's there ^  (the other 2 are rebased on branch tip).  i'll look at what you've been up to now.  :)20:47
Shrewsjeblair: looking20:48
Shrewsjeblair: i think you need to rebase on 39843720:50
Shrewsbecause that assertEqual() of the images won't really work w/o it20:50
Shrewshttps://review.openstack.org/#/c/399230/1/nodepool/tests/test_builder.py20:51
jeblairShrews, pabelanger: two questions:  1) would it be safer to just delete the lock right before the build delete?  2) don't we need to continue to trap NoNodeError?20:51
Shrewsjeblair: 1) there would be a lock for each provider, 2) no, b/c i added an exists() check above20:52
jeblairShrews: but this could be racing with other builders, right?20:52
Shrewsjeblair: hmm, fair point20:54
Shrewsas far as point #2. i don't think #1 makes a difference, does it?20:54
jeblairShrews: yeah, i think we can ignore #1 for now.  i lean toward needing #2 because of other builders.20:56
jeblairShrews: re 437 -- hrm, yes i was counting on it and thought it merged.  now i don't know why the upload deletion works at all (i changed it to use imageupload.build_id)20:57
Shrewsyeah. actually, that exists check added is not needed20:57
Shrewsjeblair: also, i just realized that __eq__ doesn't take provider into account, so we could get some false equality there20:58
jeblairShrews: oh, right!  437 only adds eq.  build_id *did* already merge.20:58
jeblairso, yeah, that all makes sense.  i will rebase on 437 and improve it.20:58
Shrews++20:58
pabelangerShrews: Haha, we need to update test_dib_image_delete now because deleted won't be in the output. Which now make sense based on the previous patch20:59
pabelangerhttp://logs.openstack.org/96/399196/4/check/gate-nodepool-python27-db-ubuntu-xenial/bcf992a/testr_results.html.gz20:59
jeblairwow, there are *two* ...437 changes in my zuulv3 dashboard :)20:59
openstackgerritDavid Shrewsbury proposed openstack-infra/nodepool: Add check for uploads to deleteBuild()  https://review.openstack.org/39921321:00
jeblairpabelanger: can we paint the bikeshed 'waitforBuildDeletion'?21:00
pabelangerWFM21:00
jeblairpabelanger: i just wrote 'waitforUploadDeletion'.  reckon they should match.21:01
pabelangerYup21:01
jeblairer waitForBuildDeletion/waitForImageDeletion.21:01
Shrewspabelanger: we need to add a wait there in test_dib_image_delete, since that, too, is racey21:01
pabelangerya21:02
Shrewsthis is why i didn't tackle any more test_commands tests. lots of moving targets atm21:03
Shrewswheeee21:03
jeblairpabelanger: we're going to need to reconcile 399196 and 399230 -- we both touched _cleanupProvider.21:03
Shrewsjeblair: are you adding provider to __eq__ in your improvement?21:03
pabelangerjeblair: Ya, I can rebase a top of yours21:03
jeblairShrews: yes, i was planning on doing that21:04
Shrewsossum21:04
jeblairpabelanger: i normalized to passing in a provider object.  we can switch mine to accept provider_name, but is there a reason you went with that?21:05
jeblairpabelanger: oh, i see another thing we can converge on -- i saved the images dir on the test object because i needed that for the configuration reload.  you saved the nodepool builder object and used the imagesdir attribute on its config.21:06
jeblairpabelanger: we could use either approach to solve both problems21:07
pabelangerjeblair: I found we only used the provider_name, and there was a 2 bugs where we just passed provider. So switching it to provider_name seems to make sense.  However, don't mind reverting to original way and using provider.name21:07
openstackgerritDavid Shrewsbury proposed openstack-infra/nodepool: Add check for uploads to deleteBuild()  https://review.openstack.org/39921321:08
jeblairpabelanger: i have a *slight* preference for my approach of saving the directory since it involves less reaching into the nodepool builder.  (i mean, i think that's fine to do, but in this case, i think it's easy enough to do without reaching into the ._config object).21:08
pabelangerjeblair: sure, lets use the original way21:09
pabelangerand use your directory too21:09
openstackgerritDavid Shrewsbury proposed openstack-infra/nodepool: Add check for uploads to deleteBuild()  https://review.openstack.org/39921321:09
jeblairpabelanger: give me a second before you rebase21:10
pabelangerk21:10
jeblairpabelanger: i'm rebasing and have a conflict21:10
openstackgerritMerged openstack-infra/nodepool: Catch all upload exceptions  https://review.openstack.org/39814821:10
jeblairoh let's do this21:10
jeblairpabelanger: can you +3 398437 ?21:11
jeblairi need that to be merged with branch tip before i can do this21:11
pabelangerdone21:13
openstackgerritDavid Shrewsbury proposed openstack-infra/nodepool: Fix getMostRecentImageUpload return docstring  https://review.openstack.org/39924321:13
Shrews^^^ fixes a suspected copy-pasta21:13
jheskethMorning21:23
pabelangero/21:24
Shrewsdarn you jenkins21:25
Shrewsactually, darn you mysql21:25
Shrewspabelanger: 437 hit the db timeout again21:25
Shrewsrechecked21:27
openstackgerritJames E. Blair proposed openstack-infra/nodepool: Add provider_name to ImageUpload  https://review.openstack.org/39925021:28
pabelangerShrews: :(21:28
jeblairthat's problematic....21:29
jeblairpabelanger, Shrews: i'm going to rebase 437 on branch tip21:29
pabelangerack21:29
jeblairsince we've hit the timeout, the cost is sunk already, and this will let us restack everything21:29
Shrewsai'ight21:29
openstackgerritJames E. Blair proposed openstack-infra/nodepool: Re-enable test: test_image_build  https://review.openstack.org/39843721:30
openstackgerritJames E. Blair proposed openstack-infra/nodepool: Re-enable tests: test_job_create, test_job_delete  https://review.openstack.org/39844021:31
openstackgerritJames E. Blair proposed openstack-infra/nodepool: Add provider_name to ImageUpload  https://review.openstack.org/39925021:31
openstackgerritJames E. Blair proposed openstack-infra/nodepool: Remove test: test_snapshot_image_update  https://review.openstack.org/39844521:31
openstackgerritJames E. Blair proposed openstack-infra/nodepool: Add test_provider_addition to builder  https://review.openstack.org/39917321:31
openstackgerritJames E. Blair proposed openstack-infra/nodepool: Add printZKTree debug method for tests  https://review.openstack.org/39917221:31
openstackgerritJames E. Blair proposed openstack-infra/nodepool: Remove obsolete uploaded images  https://review.openstack.org/39923021:32
jeblairpabelanger: okay i think you can rebase your changes on that21:33
jeblairpabelanger: i think that would include 213 and 230, right?21:33
pabelanger213 and 19621:34
jeblairyep, that's right, sorry.21:34
Shrewsjeblair: comments on 39925021:35
jeblairShrews: rgr will do21:35
openstackgerritJames E. Blair proposed openstack-infra/nodepool: Remove obsolete uploaded images  https://review.openstack.org/39923021:37
openstackgerritJames E. Blair proposed openstack-infra/nodepool: Add provider_name to ImageUpload  https://review.openstack.org/39925021:37
openstackgerritJames E. Blair proposed openstack-infra/nodepool: Add test_provider_addition to builder  https://review.openstack.org/39917321:37
openstackgerritJames E. Blair proposed openstack-infra/nodepool: Add printZKTree debug method for tests  https://review.openstack.org/39917221:37
jeblairpabelanger: sorry, one more rebase there ^21:37
pabelangernp21:37
*** jamielennox is now known as jamielennox|away21:40
jeblairi'll be back in 1521:41
openstackgerritMerged openstack-infra/nodepool: Fix getMostRecentImageUpload return docstring  https://review.openstack.org/39924321:42
openstackgerritPaul Belanger proposed openstack-infra/nodepool: Update test_dib_image_delete to validate delete happened  https://review.openstack.org/39919621:56
openstackgerritPaul Belanger proposed openstack-infra/nodepool: Add check for uploads to deleteBuild()  https://review.openstack.org/39921321:56
pabelangerjeblair: Shrews: okay, I think that fixes our race condition and confirms our build has been removed21:56
clarkbok https://github.com/ansible/ansible/issues/18281 now has a pull request and more thoughts on the problem based on what I have learned21:57
Shrewspabelanger: is that assert_listed() the equivalent of "not listed"22:00
pabelangerShrews: ya, is there a better way to do that?22:00
Shrewsah, just saw the comment. nah, just seems odd to with the method name: "assert_listed, but not really"22:01
Shrews"odd to with" ?? me no speaky goodly22:02
openstackgerritPaul Belanger proposed openstack-infra/nodepool: Update test_dib_image_delete to validate delete happened  https://review.openstack.org/39919622:02
Shrews399250 hit the db timeout again. this is too problematic. we'll have to fix that, i believe22:06
*** lindycoder has joined #zuul22:07
mordredShrews: yah. that's :( but I agree22:07
Shrewshas anyone looked into it before?22:08
openstackgerritMerged openstack-infra/nodepool: Re-enable test: test_image_build  https://review.openstack.org/39843722:08
openstackgerritMerged openstack-infra/nodepool: Re-enable tests: test_job_create, test_job_delete  https://review.openstack.org/39844022:08
jeblairclarkb: ^ what do you know about the db timeout thing?22:09
openstackgerritMerged openstack-infra/nodepool: Remove test: test_snapshot_image_update  https://review.openstack.org/39844522:09
jeblairit looks like a deadlock22:09
clarkbjeblair: https://review.openstack.org/#/c/280989/ is the stack where I was trying to work it out22:10
jeblair'drop database' is running but sitting there until the test timeout22:10
clarkbjeblair: the parent of that change tries to dump a bunch of extra mysql debugging data but none of the data I was able to collect amde it clear why that was happening22:10
jeblair(after the actual test ran and completed in a few seconds)22:10
clarkband yes I think its somethign else holding open the db22:10
clarkbmaybe a stray thread or similar22:10
clarkbthere are probably different and more better things that could be added to https://review.openstack.org/#/c/278146/7/nodepool/tests/__init__.py that would help more22:11
Shrewsdo the tests create a schema for each test? or is a single schema shared among them?22:11
clarkbbut my familiarity with debugging mysql deadlocks is minimal22:11
clarkbShrews: schema per test22:11
Shrewswell that's very odd then22:11
clarkbShrews: and at the end of each test it attempts to drop the schema22:11
clarkbwhcih is what is hanging22:11
clarkbjeblair: I think I also tested that if I set the test timeout to some ridiculous number it never actually manages to drop the db22:12
clarkbjeblair: pretty sure I did that locally22:12
jeblairi wonder how well mysql is tested/protected against deadlocks in drop database (it's possible we're the only people who tried to drop 8 schemas at the same time?)22:12
clarkblet it sit there for an hour before deciding it wasn't going to happen22:12
clarkbjeblair: we do have a knack for finding all the bugs22:13
jeblairunfortunately, all the runners are difference processes, so it would not be easy to serialize across them... :/22:14
jeblairdifferent22:14
jeblairclarkb, Shrews: we (ugh) could (maybe) try to prefix the table names instead of per-test schemas.22:15
clarkbor do schema per process and reduce contention on drop dbs22:16
Shrewsjeblair: or use sqlite!22:16
Shrewsmordred: ya know, we wouldn't have this problem if mysql supported transactions22:16
* Shrews hides22:16
jeblairShrews: nodepool does not even *remotely* think about working on sqlite.  :)22:17
jeblair(we used to.  it did not last long)22:17
mordredShrews: yah. adding transaction support sounds great22:17
jeblair(nodepool has ~1000 concurrent open connections to mysql, each writing to one row in the same table)22:18
Shrewsmy bet is on innodb doing some processing that's taking a while22:18
jeblairShrews: but more than an hour?22:19
Shrewsan hour? oh22:19
jeblairShrews: yeah, clarkb let this run locally for an hour once22:19
clarkbits reproduceable locally22:20
clarkbI think I set the timeout var to big number then did while tox -epy2722:20
clarkband ya it sat there for a long time once I hit it22:20
openstackgerritMartin Roy proposed openstack-infra/zuul: Add a mutex release upon cancelling a job via gearman  https://review.openstack.org/39927422:21
lindycoder^ i'm not sure how to tag this as resolving this : https://storyboard.openstack.org/#!/story/200065722:22
pabelangerjeblair: looks like 399230 is missing node_two_provider.yaml22:23
clarkbjeblair: Shrews mordred I definitely felt like I got in over my head once the first set of basic debugging didn't lead anywhere. Some of you are mysql experts, I am not one of them :)22:24
openstackgerritJames E. Blair proposed openstack-infra/zuul: Add a mutex release upon cancelling a job via gearman  https://review.openstack.org/39927422:24
jeblairlindycoder: ^ like that!  and thanks! :)22:24
clarkbbut the issue also seemed to mostly go away22:24
*** jamielennox|away is now known as jamielennox22:24
clarkbwee timing races are the best22:24
lindycoderjeblair, thanks!22:24
jeblairand look, i did it right and the bot did its thing!  https://storyboard.openstack.org/#!/story/200065722:25
lindycoderamazing!22:25
jeblairpabelanger: oh, node_two_provider_remove22:26
pabelangerOh, yes22:27
openstackgerritJames E. Blair proposed openstack-infra/nodepool: Update test_dib_image_delete to validate delete happened  https://review.openstack.org/39919622:28
openstackgerritJames E. Blair proposed openstack-infra/nodepool: Remove obsolete uploaded images  https://review.openstack.org/39923022:28
openstackgerritJames E. Blair proposed openstack-infra/nodepool: Add check for uploads to deleteBuild()  https://review.openstack.org/39921322:28
openstackgerritJames E. Blair proposed openstack-infra/nodepool: Add provider_name to ImageUpload  https://review.openstack.org/39925022:28
openstackgerritJames E. Blair proposed openstack-infra/nodepool: Add test_provider_addition to builder  https://review.openstack.org/39917322:28
openstackgerritJames E. Blair proposed openstack-infra/nodepool: Add printZKTree debug method for tests  https://review.openstack.org/39917222:28
jeblairpabelanger: fixed, rebased whole stack22:28
pabelangerYay22:28
pabelangerdanke22:28
lindycoderjeblair, tests passes, do i need to add anything to the PR to your opinion?22:35
jeblairlindycoder: nope, should be all set.  i just need to spend a few minutes thinking about it with a fresh mind (mutexes are hard)22:36
*** hashar has quit IRC22:37
lindycoderSure thing, an alternative to that would be to keep the mutex handling logic in the scheduler and have gearman call a new onBuildCancelled event that manages that22:41
SpamapSadam_g: make sure to check https://storyboard.openstack.org/#!/story/2000773 before you pick up a test. It's been kept pretty up to date22:41
SpamapSShrews: innodb doing things slow?22:42
SpamapSShrews: not possible! :)22:42
lindycodersorry i gotta leave for today i'll be back tomorrow22:43
*** lindycoder has quit IRC22:44
openstackgerritPaul Belanger proposed openstack-infra/nodepool: Remove image delete tests from test_nodepool  https://review.openstack.org/39865722:44
openstackgerritJames E. Blair proposed openstack-infra/nodepool: Use a table prefix instead of per-test schema  https://review.openstack.org/39928622:48
jeblairShrews, clarkb, mordred: <shrug> ^22:49
openstackgerritMerged openstack-infra/nodepool: Add provider_name to ImageUpload  https://review.openstack.org/39925023:00
openstackgerritMerged openstack-infra/nodepool: Add printZKTree debug method for tests  https://review.openstack.org/39917223:01
*** willthames has joined #zuul23:13
openstackgerritMerged openstack-infra/nodepool: Add test_provider_addition to builder  https://review.openstack.org/39917323:17
openstackgerritJames E. Blair proposed openstack-infra/nodepool: Use a table prefix instead of per-test schema  https://review.openstack.org/39928623:28
jeblairokay, that should actually pass tests now.  the docs for this method are an interesting read:  http://docs.sqlalchemy.org/en/latest/orm/mapping_api.html#sqlalchemy.orm.clear_mappers23:29
jeblairclarkb, mordred: ^23:29
mordredjeblair: interesting23:38
jeblairmordred, clarkb, Shrews: i thought of an alternative to that, if we want to keep the schemas -- we could have a well-known lockfile used by the tests.  so the cleanup method locks the lockfile on disk first before executing the drop schema.  [we could probably do something similar in the database itself, actually].23:39
jeblairhonestly, i don't care too much since it should all be going away in the not-too-distant future23:40
openstackgerritJames E. Blair proposed openstack-infra/nodepool: Add image addition test to builder  https://review.openstack.org/39929423:41
mordredjeblair: yah - that's my feeling too- it's going away soon, so whatever fixes it23:59
SpamapSjeblair: so.. irrelevant-files ...23:59

Generated by irclog2html.py 2.14.0 by Marius Gedminas - find it at mg.pov.lt!