Friday, 2016-12-02

openstackgerritMerged openstack-infra/zuul: Re-enable test_success_pattern as test_success_url  https://review.openstack.org/40045500:10
*** jamielennox is now known as jamielennox|away00:17
*** jamielennox|away is now known as jamielennox00:23
*** jamielennox is now known as jamielennox|away00:46
*** jamielennox|away is now known as jamielennox00:53
pabelangerokay, fedora-23-0000000004 didn't delete properly01:58
pabelangerhttp://paste.openstack.org/show/591193/01:58
pabelangerwe'll have to debug it in the morning01:58
clarkbpabelanger: it waits forall the uploads to delete first right?02:02
pabelangerclarkb: yup, they look to be in the deleting state02:02
pabelangerlet me get output02:02
clarkbbut they arestillthere02:02
clarkbeg they have tobe removed02:02
pabelangerright, let me check02:03
pabelangerhttp://paste.openstack.org/show/591194/02:03
pabelangerchecking infracloud02:03
clarkb(nit saying this is correct but ^ may explain it)02:03
*** persia has quit IRC02:06
pabelangerinfracloud looks good02:06
clarkbpabelanger: meaning the image isnt in infracloud?02:07
pabelangerright02:07
clarkb(but it still has a node in zk)02:07
pabelangerI see 4 fedora-24 images02:07
pabelangerand times look correct02:08
*** persia has joined #zuul02:09
pabelangerYa, 0004 is not in infracloud-vanilla02:09
pabelangerbut still exists in ZK02:09
pabelangerso, out of sync02:09
openstackgerritPaul Belanger proposed openstack-infra/nodepool: Add log message when deleting images  https://review.openstack.org/40574102:27
*** jamielennox is now known as jamielennox|away03:11
*** jamielennox|away is now known as jamielennox03:20
*** saneax-_-|AFK is now known as saneax03:27
*** jamielennox is now known as jamielennox|away03:30
*** jamielennox|away is now known as jamielennox03:37
*** saneax is now known as saneax-_-|AFK03:58
*** bhavik1 has joined #zuul05:19
*** openstackgerrit has quit IRC06:33
*** jamielennox is now known as jamielennox|away07:18
*** saneax-_-|AFK is now known as saneax07:43
*** jamielennox|away is now known as jamielennox07:44
*** openstackgerrit has joined #zuul07:57
openstackgerritTobias Henkel proposed openstack-infra/zuul: tools: Add git-repo support script  https://review.openstack.org/40597607:57
*** openstackgerrit has quit IRC08:03
Zarajeblair: ah, cool, I'm guessing those are all tasks in stories tagged v3, which is why automatic lanes can't be used (yet)? I'm glad the api makes it possible to work around that way for now! (and yeah, the more content in the board, the slower things will be, so I'd recommend only using a 'merged' lane if you need to)09:31
*** openstackgerrit has joined #zuul09:40
openstackgerritTristan Cacqueray proposed openstack-infra/zuul: Read all Gerrit events from poll interruption  https://review.openstack.org/40601909:40
*** hashar has joined #zuul09:53
openstackgerritTobias Henkel proposed openstack-infra/zuul: tools: Add git-repo support script  https://review.openstack.org/40597611:49
*** Cibo_ has joined #zuul12:32
*** bhavik1 has quit IRC13:21
*** saneax is now known as saneax-_-|AFK13:38
*** dmsimard is now known as dmsimard|pto13:39
openstackgerritTristan Cacqueray proposed openstack-infra/zuul: Don't getChange on source not triggering a change  https://review.openstack.org/40612914:01
pabelangermorning14:10
pabelangerimages builds / uploads worked for nb01.o.o14:10
pabelangerlooking into clean up this morning14:10
Shrewspabelanger: let me know if i can help.14:28
pabelangerShrews: I think I see the issue, upload.external_name is None14:29
pabelangerso, we are raising an exception14:29
pabelangerhowever14:29
Shrewsoh neat14:29
pabelangerhttp://paste.openstack.org/show/591246/14:29
pabelangerthat is the traceback14:30
pabelangerso, if we clean up TypeError, I think nodepool-builder will do the right thing14:30
Shrewshrm, how can we delete the image through the manager without an external name? Is that an error propogated from somewhere else?14:31
Shrewsshould we be using external_id instead?14:31
Shrews(that code was copied straight from the older builder code)14:32
pabelangerShrews: I think this is left over crud from your delete patch14:32
pabelangerin fact, we don't have any images that are uploaded14:32
pabelangerbut14:32
pabelanger{'build_id': u'0000000003', 'external_name': None, 'state_time': 1480674727, '_state': u'deleting', 'image_name': u'ubuntu-xenial', 'provider_name': 'ovh-bhs1', '_id': u'0000000001', 'external_id': None}14:32
pabelangerthat is the current value14:32
Shrewsso we probably just need an "if external_name:" before the deleteImage call14:35
pabelangerthink so14:35
*** Cibo_ has quit IRC14:38
Shrewspabelanger: add it to 405741?14:42
pabelangerShrews: sure14:44
*** jamielennox is now known as jamielennox|away14:45
openstackgerritPaul Belanger proposed openstack-infra/nodepool: Handle delete exception when external_name is empty  https://review.openstack.org/40615114:46
pabelangeroops14:46
openstackgerritPaul Belanger proposed openstack-infra/nodepool: Handle delete exception when external_name is empty  https://review.openstack.org/40574114:47
pabelangerokay, left a comment about how this could happen14:48
pabelangerhowever, concerned we could leak images14:48
pabelangerbut, in our case, I think we just have bad data in zookeeper14:50
pabelangerI'm also going to stop nodepool-builder, as we are hammering our clouds trying to delete images that don't exist14:52
Shrews++14:53
pabelangerback shortly, running an errand15:03
*** jhesketh has quit IRC15:04
*** jhesketh has joined #zuul15:06
*** openstackgerrit has quit IRC15:33
clarkbpabelanger: also this is what the alien image list is for, to find those leaks if they happen15:55
*** openstackgerrit has joined #zuul16:23
openstackgerritPaul Belanger proposed openstack-infra/nodepool: Handle delete exception when external_name is empty  https://review.openstack.org/40574116:23
clarkbpabelanger: maybe ^ should come with a test?16:24
pabelangerclarkb: Sure, was thinking that too16:25
clarkbI would let it run with the fakes to successfully upload, then edit the zk entry to set externals to None then see that it gets cleaned up16:25
pabelangerYa, that works16:25
*** saneax-_-|AFK is now known as saneax16:39
*** saneax is now known as saneax-_-|AFK16:55
pabelangerokay17:01
pabelangerreproduced the issue17:01
*** bhavik1 has joined #zuul17:06
pabelangerremote:   https://review.openstack.org/405741 Make sure we clean up diskimages with invalid external_name17:10
pabelangerclarkb: ^ when you have time, see what you think of the test17:10
pabelangerthat should be close to what happened yesterday17:10
pabelangerjeblair: Shrews: ^too17:11
clarkbpabelanger: I'll admit I don't immediately grok when the test starts setting the rebuild age to 2 days but then later we only decrement by one day17:15
clarkbpabelanger: oh ya I think thats not necesary in this test?17:17
pabelangerclarkb: I wanted to emulate 2 days passed for our build times. So, first image build is 172800, 2nd 86400, and 3rd would be now()17:17
*** hashar has quit IRC17:17
clarkbok, maybe a comment explaining that might be good17:18
pabelangersure17:18
clarkbpabelanger: ok posted comments. Mostly nits but also I think the assertion can be more robust17:22
pabelangerlooking17:22
pabelangerclarkb: re: 3rd comment, imageDelete would not happen in this case, since we don't have an external_name. Suggestions on how to check we actually don't issue the deleteImage() call17:28
clarkbpabelanger: I think what you are looking for is that the upload no longer exists in zk17:28
pabelangerya, we can do that17:28
clarkbbasically get uploads 0000001 should be None17:29
clarkbor [] etc17:29
pabelangeryes17:29
SpamapSjeblair: that's amazing!17:37
SpamapSjeblair: (the auto board you made)17:37
SpamapSjeblair: will it get confused if we add some stories as well? I like the idea of having epic stories in 'todo' that invite people to click through and _add_ tasks, like the test adding one.17:38
openstackgerritPaul Belanger proposed openstack-infra/nodepool: Make sure we clean up diskimages with invalid external_name  https://review.openstack.org/40574117:44
jeblairSpamapS: nope, that will be fine -- it ignores stories17:46
openstackgerritPaul Belanger proposed openstack-infra/nodepool: Make sure we clean up diskimages with invalid external_name  https://review.openstack.org/40574117:47
pabelangerokay, commands to hopefully explain the madness17:47
pabelangercomments*17:47
jeblairSpamapS: i can point the script at board 41 if you want17:47
pabelangerjeblair: Shrews: do you mind reviewing 405741 again17:53
jeblairnp17:59
SpamapSjeblair: sounds like a good idea to me. Let me add a new lane.18:03
SpamapSjeblair: ok, It has a New, and I removed Done since the script will just cull the finished things18:05
SpamapSjeblair: maybe we can put your script in the zuulv3 branch under a contrib dir or something?18:06
jeblairSpamapS: good idea, will do.18:17
jeblair18:17 < openstackgerrit> James E. Blair proposed openstack-infra/infra-specs: Zuul v3: use Zookeeper for Nodepool-Zuul protocol  https://review.openstack.org/30550618:18
jeblairShrews, jhesketh, mordred, clarkb: ^ i have updated that based on comments and things we've learned so far... if you can take a look at that today or monday, that would be great -- that should let us get it on the infra agenda tuesday, and hopefully have it in place by the time we're ready to start on it.18:20
mordredjeblair: looks good on first read - I'm going to give it another read before voting to make sure18:29
jeblairmordred: cool, thx.  i added extra detail (mostly because i had to think a little more deeply to answer questions/comments and didn't want it to go to waste) but hopefully it's not too much.18:30
mordredno - I thnk it's great detail - it makes it easier to read18:31
mordredsort of like how the extended lord of the rings movies actually feel shorter18:31
Shrewspabelanger: technically, you should be locking the upload before calling storeImageUpload (similar to storeBuild), but I can't mentally trace what issues not doing so in that test might cause18:32
jeblairthe two towers is so different18:32
clarkbShrews: pabelanger in the test as long as the builder has reached steady state there shouldn't be any contention for the resources, however doing the locking probably not a bad idea just for sanity18:34
Shrewsclarkb: cleanup thread does locking18:34
clarkbShrews: but only when it makes changes which happens after these updates right?18:34
clarkbbut ya better/easier to just lock and not worry than try to figure out if it is safe not to18:35
pabelangersure, I can lock18:36
pabelangergive me a sec18:36
openstackgerritPaul Belanger proposed openstack-infra/nodepool: Make sure we clean up diskimages with invalid external_name  https://review.openstack.org/40574118:41
pabelangerShrews: does that look right?^18:51
pabelangerah, you added +1 cool18:51
pabelangerjeblair: clarkb: okay, if you don't mind reviewing 405741 again, should be able to merge and start nodepool-builder again on nb01.o.o18:52
pabelangerand things cleanup magically18:52
pabelanger:)18:52
ShrewsThis pretty simple change has been seeking some love for a while: https://review.openstack.org/40429218:54
pabelangerI'll work on the sorted output of image-list once nodepool-builder is restarted18:58
jlkjust a quick question, is there any active work ongoing for github integration with zv3?18:59
jeblairjlk: nope, it's deferred until we have the first stage working18:59
jlk(we're discussing some workflow stuff and github limitations right now)18:59
jlkjeblair: okay thanks.19:00
openstackgerritMerged openstack-infra/nodepool: Add znode stat structure to build/upload objects  https://review.openstack.org/40429219:04
*** morgan has quit IRC19:34
*** TheJulia has quit IRC19:34
openstackgerritMerged openstack-infra/nodepool: Make sure we clean up diskimages with invalid external_name  https://review.openstack.org/40574119:43
pabelangerrestarting nodepool-builder19:50
pabelangeryay19:50
pabelangerthings are happening19:50
pabelangerhttp://paste.openstack.org/show/591281/19:51
pabelangercurrent image-list: http://paste.openstack.org/show/591282/19:52
jeblairthat's nicer19:55
pabelangeroh, haha19:55
pabelangerchecksum still exist19:55
pabelangerneed to add them into cleanup worker19:55
jeblair:)19:55
pabelangerwe still have some cruf in dib-image-list: http://paste.openstack.org/show/591284/19:56
pabelangerwill have to see what is going on there19:56
jeblairpabelanger: may i suggest adding a remove function to dibimagefile?  that way we centralize all the things needed to remove a "file" from the filesystem?19:56
pabelangerjeblair: sure, that makes sense19:56
jeblairand yeah, i agree those 'deleting' ubunte-precise images should be gone (within a minute or so after starting i would think)19:59
jeblairpabelanger: did the underlying files for those builds get deleted (or never created)?20:00
jeblairpabelanger: if so, the builder won't know it's responsible and won't delete them20:00
pabelangerjeblair: never created, we have diskimage-builder failures20:00
pabelangerhad*20:01
jeblairso yeah, i think we will need to use zk shell for that20:01
pabelangerlikely so20:01
pabelangerI plan on adding more testing around disk-image-create to make sure we don't leak info like that, haven't looped back to it20:02
pabelangerI should make a story for it20:02
jeblairand probably make sure we handle that case better in the future (remove the znode if the build fails).  or, use the builder name to determine whether a builder is responsible20:02
pabelangerlet me see how to do it in zk-shell20:03
jeblair(to summarize: only the responsible builder can delete the image build znode because it must also delete the file from disk.  currently, whether the file is on disk is used to determine whether a given builder is responsible.  so the case where a failure happens and no file is created needs to be handled somehow)20:04
clarkbIf its set to failed the builder that built it should be able to safely remove the znode regardless of on disk situationright?20:04
clarkbah20:04
clarkbthat answered my uestion as ifyou knew Iwould ask20:04
jeblair:)20:05
jeblairclarkb: right.  so we either have the builder remove the znode immediately after failure (reduces the edge case down to "the builder is forcefully killed before the file is created"), or use the builder name in the znode as authoritative (makes unique and unchanging hostnames "important").20:05
jeblairpabelanger: ^ some copy/paste material for the story :)20:06
clarkbmaybe use a nodepool install specific identifer that defaults to hostname?20:07
pabelangerack20:07
clarkbthen we can manage logical ownership separately20:07
jeblairya20:10
jeblairi think in our case it will always be okay to use hostname (until we decide to store builds on $networkfilesystem -- then the configured logical id makes sense)20:11
*** hashar has joined #zuul20:12
pabelangerokay, I've cleaned up the cruf in zk-shell20:15
pabelangerhave to run an errand, I'll address ^ once I get back20:16
openstackgerritDavid Shrewsbury proposed openstack-infra/nodepool: Delete hard upload failures from current builds  https://review.openstack.org/40634220:16
Shrewspabelanger: jeblair: Is something like that ^^^ what you guys had in mind?20:17
*** TheJulia has joined #zuul20:19
*** morgan has joined #zuul20:25
Shrewsthe only thing about that change is that you lose a record of the failure, unless you examine logs20:25
Shrewsthough that error should be very rare now, actually20:28
clarkbShrews: isnt that error dependent on the cloud?20:33
Shrewsclarkb: we've only seen that happen on older versions where we weren't trapping the exception20:33
clarkbor is this a caseof host instance reboots?20:33
clarkbgotcha20:34
Shrewsi've since fixed that20:34
Shrewsso it's a "clean up our oops from before" and "should it somehow happen again, clean that up too"20:34
clarkbya20:35
clarkbmaybe we want an explicit log for it?20:35
clarkbsomewhere around line 362 of builder.py20:35
openstackgerritDavid Shrewsbury proposed openstack-infra/nodepool: Delete hard upload failures from current builds  https://review.openstack.org/40634220:37
Shrewsgah20:38
openstackgerritDavid Shrewsbury proposed openstack-infra/nodepool: Delete hard upload failures from current builds  https://review.openstack.org/40634220:38
Shrewsclarkb: ^^^ log added20:38
clarkbcool will post proper review when done eating lunch20:39
*** bhavik1 has quit IRC20:39
clarkbbut looks fine at first glance20:39
openstackgerritAdam Gandelman proposed openstack-infra/zuul: Re-enable merge-mode config option and add more tests  https://review.openstack.org/40636121:06
clarkbShrews: in this change why are we restricting this cleanup to to builds_to_keep?21:21
clarkbShrews: is it not possible for a build record to have a non ready state and also have upload in uploading state?21:22
Shrewsclarkb: because a non-ready state build should be deleted by the cleanup thread. these are already handled in that case21:26
clarkbShrews: even if there are outstanding records in a non delete state?21:29
Shrewsclarkb: yep, unless one of those is a most recent upload21:30
Shrewsbut an aged out build should have already been cleaned up by this new code while it was current, anyway21:31
clarkbkk21:33
jeblairlgtm21:56
Shrewshrm, there was a bug there, but surprised the test did not catch it22:08
Shrewsah, found it22:17
*** jamielennox|away is now known as jamielennox22:22
openstackgerritRicardo Carrillo Cruz proposed openstack-infra/zuul: WIP Add per-repo public and private keys  https://review.openstack.org/40638222:28
openstackgerritRicardo Carrillo Cruz proposed openstack-infra/zuul: WIP Add per-repo public and private keys  https://review.openstack.org/40638222:30
Shrewsi'm impressed at how much that patch just does NOT work22:41
pabelangerShrews: jeblair: just noticed, fedora-24 currently has 2 uploading for the same provider: http://paste.openstack.org/show/591299/22:47
pabelangerI believe we are only upload 1 image right now, so possible it is a stale status22:47
openstackgerritRicardo Carrillo Cruz proposed openstack-infra/zuul: WIP Add per-repo public and private keys  https://review.openstack.org/40638222:48
Shrewspabelanger: it would have to be. uploading is locked. either one is from a failure, or perhaps you kill it during a upload22:48
Shrewsmy last review should get rid of that, but it currently doesn't work22:49
pabelangerShrews: Oh, that is possible. I might have stopped nodepool-builder while an upload was in process22:49
pabelangerk22:49
openstackgerritRicardo Carrillo Cruz proposed openstack-infra/zuul: WIP Add per-repo public and private keys  https://review.openstack.org/40638222:51
ShrewsSo... VERY WEIRD zk behavior, but when i fix that code to actually delete the 'uploading' node, a new build ID is generated for that build, but it isn't from something that the builder does23:03
jeblairefriday23:03
jeblairShrews: if you want to push up a non-working patchset, i'd be happy to poke at it.23:04
jeblair23:07 < openstackgerrit> James E. Blair proposed openstack-infra/infra-specs: Zuul v3: update with Ansible role information  https://review.openstack.org/38132923:07
openstackgerritDavid Shrewsbury proposed openstack-infra/nodepool: Delete hard upload failures from current builds  https://review.openstack.org/40634223:07
Shrewsjeblair: run that single test with that ^^^ and look at the FOUND BUILDS log lines23:08
Shrewsthe JSON errors are thrown because build ID 000000002 shows up, but there is no data there to parse23:09
jeblaircool, i see that behavior.  i will eat some chocolate and ponder.23:10
openstackgerritAdam Gandelman proposed openstack-infra/zuul: Re-enable merge-mode config option and add more tests  https://review.openstack.org/40636123:10
Shrewsjeblair: let me know when your eyes start to bleed, as mine have. i will go do dinner type things for a while23:14
*** hashar has quit IRC23:44
* SpamapS trying to figure out where test_rerun_on_error gets the 'RUN_ERROR' state at all23:53
SpamapSjeblair: I only ever see 'RUN_ERROR' as a build state in the zuul test suite. Is it possible this test is an old vestige of the pre-2.5 days and some aspect of jenkins?23:56
SpamapSlike I can't even see how anything other than the test suite itself arrives at this RUN_ERROR result23:56

Generated by irclog2html.py 2.14.0 by Marius Gedminas - find it at mg.pov.lt!