Tuesday, 2016-12-06

*** Cibo_ has joined #zuul00:50
Shrewspabelanger: i have a suspicion about that random cleanup failure. waiting for it to reproduce to confirm, but i think what's happening is that sometimes the upload state_time for 001 and 002 is the same, so 001 could sort before 002 in the upload recency table, so we see it as one of the 2 most recent uploads (not 002 and 003)03:47
Shrews(state_time is a rounded number because we int() it). getting things to happen quick enough, and at just the right time, is why it would be so hard to recreate03:49
Shrewsjeblair: maybe we shouldn't int() the state_time?03:51
jamielennoxhey, just before i sink any more time into the tests is zuultrigger expected to work in the same way ?04:11
*** mordred has quit IRC04:26
*** greghaynes has quit IRC04:26
*** Shrews has quit IRC04:27
*** phschwartz has quit IRC04:27
*** phschwartz has joined #zuul05:00
*** Shrews has joined #zuul05:03
*** mordred has joined #zuul05:04
*** greghaynes has joined #zuul05:04
*** anteaya has quit IRC05:16
*** mordred has quit IRC05:18
*** Shrews has quit IRC05:18
*** Shrews has joined #zuul05:26
*** mordred has joined #zuul05:27
*** anteaya has joined #zuul05:29
*** saneax-_-|AFK is now known as saneax07:11
*** bstinson has quit IRC07:23
*** mordred has quit IRC07:25
*** bstinson has joined #zuul07:28
*** phschwartz has quit IRC07:29
*** Shrews has quit IRC07:30
*** mordred has joined #zuul07:34
*** phschwartz has joined #zuul07:34
*** Shrews has joined #zuul07:35
*** abregman has joined #zuul08:17
*** willthames has quit IRC09:57
*** hashar has joined #zuul10:06
*** Zara_ is now known as Zara10:10
*** jamielennox is now known as jamielennox|away11:44
*** abregman has quit IRC12:04
*** Cibo_ has quit IRC12:16
*** abregman has joined #zuul13:02
Shrewspabelanger: so, yep, looks like state_time issue. we could add a .5s sleep between 001 and 002 creation in the test to get rid of that14:24
openstackgerritDavid Shrewsbury proposed openstack-infra/nodepool: Fix race in test_image_rotation  https://review.openstack.org/40753914:36
*** Cibo_ has joined #zuul14:37
Shrewspabelanger: ^^^  had to use a full second b/c rounding up or down isn't performed14:38
*** saneax is now known as saneax-_-|AFK14:44
openstackgerritMerged openstack-infra/nodepool: Delete hard upload failures from current builds  https://review.openstack.org/40634215:04
Shrewsoh, w00t for that merging15:05
pabelangerclarkb: Ya looking at it now15:06
clarkbpabelanger: sorry I wasn't able to track it down further15:07
Shrewspabelanger: i'm going to fix the merge failures of our test fixes15:07
pabelangerShrews: ack15:08
pabelangerclarkb: I waa thinking about it more last night, maybe we should just touch the md5 / sha256 files, since we are mocking uploads to shade15:08
pabelangerclarkb: going to try that this morning and see if there is a difference15:08
pabelangeror use static md5 / sha256 contents15:08
clarkbpabelanger: well that will fail the if md5 check15:08
clarkbpabelanger: but you could put arbitrary content in them for sure15:09
pabelangerYa, that15:09
clarkbya maybe give that a try15:09
openstackgerritMerged openstack-infra/zuul: Update storyboard links in README  https://review.openstack.org/40721215:09
openstackgerritDavid Shrewsbury proposed openstack-infra/nodepool: Fix race in test_image_rotation  https://review.openstack.org/40753915:11
openstackgerritDavid Shrewsbury proposed openstack-infra/nodepool: Fix race condition in test_image_rotation_invalid_external_name  https://review.openstack.org/40713915:11
openstackgerritPaul Belanger proposed openstack-infra/nodepool: Add --checksum support to disk-image-create  https://review.openstack.org/40641115:13
*** saneax-_-|AFK is now known as saneax15:30
pabelangerokay, running latest version of nodepool-builder15:31
pabelangerjust seen15:31
pabelanger2016-12-06 15:28:52,190 INFO nodepool.builder.CleanupWorker.0: Removing failed upload record: <ImageUpload {'build_id': u'0000000009', 'stat': ZnodeStat(czxid=955502, mzxid=955502, ctime=1481033836083, mtime=1481033836083, version=0, cversion=0, aversion=0, ephemeralOwner=0, dataLength=92, numChildren=0, pzxid=955502), 'external_name': None, 'state_time': 1481033836, 'image_name': u'centos-7', 'state':15:31
pabelangeru'uploading', 'provider_name': 'rax-ord', 'external_id': None, 'id': u'0000000002'}>15:31
pabelangerlooks like the clean up worked15:32
mordred\o/15:34
clarkbpabelanger: looks like coverage passed on 406411 but base py27 did not15:35
pabelangerclarkb: ya, that is the race in https://review.openstack.org/#/c/407139/15:35
clarkbgotcha15:35
pabelangerwe should land that stack this morning, Shrews added another patch too15:36
*** abregman has quit IRC15:36
*** abregman has joined #zuul15:37
Shrewspabelanger: fyi, took 487 runs overnight to reproduce that with the additional debug info i added  :(15:41
pabelangerShrews: Wow, where is the log for that?15:42
Shrewspabelanger: i have it locally15:42
pabelangerokay15:42
Shrewslemme see if i can find it again and paste for you15:42
Shrewspabelanger: http://paste.openstack.org/show/591555/15:45
Shrewsline 504 was key15:45
Shrewsbut you can see the recency table included 001 and not 00215:46
pabelangerYa, see that15:47
*** saneax is now known as saneax-_-|AFK15:47
Shrewsi'm going to try keeping the state_time precision and see what breaks. using int() on it is a carry over from the old code and i'm not convinced that's necessary15:49
pabelangerack15:50
*** Cibo_ has quit IRC16:17
mordredShrews: I'd love it if keeping precision would let us not have a sleep(1) in the tetss16:40
Shrewsmordred: me too. testing it now16:42
jeblairi think there was actually a sleep(1) or maybe even a sleep(2) in the original nodepool snapshot code to make sure that each snapshot had a unique timestamp.  so this is an improvement.  :)  but yeah, i think adding the sleep in the test is okay, but if we don't need it, even better.16:42
pabelangerSo, doing some ops things on nb01.o.o, I just deleted the latest ubuntu-xenial image in infracloud-vanilla.  nodepool deleted the image, but then proceeded to upload that image again. Obviously this is a change to nodepool gearman but open an interesting issue, I don't think we have a way to roll back to the previous uploaded image16:45
jeblairpabelanger: pause16:45
jeblairpabelanger: pause then delete16:45
pabelangeryes16:45
pabelangerokay, so lets try that16:46
pabelangerHmm, actually, don't think we have pause for uploads yet, just builds16:46
pabelangerlet me double check16:46
jeblair(we may also want a pause CLI)16:46
Shrewsok, i think not removing precision will "just work" for us, making my sleep(1) patch unnecessary16:47
pabelangerya, CLI pause could be useful too16:47
jeblairShrews: well, i +3d it so that other changes don't hit the race.  would you like me to revoke that, or do you just want to revert + change?16:48
Shrewsjeblair: i'll revert it in my next review16:48
jeblaircool16:48
mordredjeblair: would a "rollback" cli call be worthwhile?16:49
jeblairmordred: the sequence there would be: "nodepool rollback <something>; fix; nodepool unpause <something>".  versus "nodepool pause <something>; nodepool delete <something>; fix; nodepool unpause <something>"16:52
mordrednod16:52
jeblairmordred: i have a slight preference for the 3 step process on account of it's a bit more clear what's happening and what needs to be un-done when things are fixed16:52
jeblairer, i guess that wasn't clear.  i meant the second process.  :)16:53
openstackgerritDavid Shrewsbury proposed openstack-infra/nodepool: Do not truncate state_time precision  https://review.openstack.org/40760616:53
jeblairmordred: i'm pretty sure we will need the individual commands anyway... we can allways add rollback if we think it's useful.16:53
mordredjeblair: yah. I follow - I think the main usecase I was thinking of was "there is a content problem in all of the ubuntu-xenial images we want to delete all of them in all providers and wait until tomorrows rebuilds before we upload new images"16:54
mordredbut even that is likely not the right way to think about that _Really_16:55
jeblairmordred: yeah... i think that deleting the *image* might still work that way...16:56
jeblairpabelanger: you deleted the *upload* from infracloud-vanilla, but not the base image, right?16:57
jeblairmordred: i guess i should say that deleting the *build* might still work that way16:57
* jeblair increases terminology precision16:57
jeblairmordred: oh, no that wouldn't either.  not the latest build.16:58
jeblairso in both cases, something needs to be paused.  since we agressively try to build and upload.16:58
pabelangerjeblair: can you rephrase?16:58
jeblairpabelanger: your test deleted the upload from the cloud, not the diskimage, right?16:59
mordredyah - I think such a rollback would need to for all providers: pause, delete upload, delete build, unpause - I guess the question is "is this a situation where just rebuilding now would fix the issue" or "do you need to pause/delete then take manual fixing steps"16:59
pabelangerjeblair: right, we have 2 images uploaded, I did image-delete newest upload17:00
openstackgerritMerged openstack-infra/nodepool: Fix race condition in test_image_rotation_invalid_external_name  https://review.openstack.org/40713917:00
jeblairmordred: exactly.  so if an immediate rebuild will do, just delete the dib image.  if work needs to happen first, pause/delete/unpause17:00
openstackgerritMerged openstack-infra/nodepool: Fix race in test_image_rotation  https://review.openstack.org/40753917:00
jeblair(this is all stuff that we will put in the nodepool docs :)17:01
mordredyah17:01
jeblairpabelanger, Shrews: can you weigh in on https://review.openstack.org/404452 ?17:33
pabelangerAh, I've been meaning to test that17:34
jeblairwill probably need to update the checksum patch if we land it first17:34
openstackgerritJames E. Blair proposed openstack-infra/nodepool: Delete builds when diskimage removed from config  https://review.openstack.org/40042117:36
openstackgerritPaul Belanger proposed openstack-infra/nodepool: Add pause option for image uploads  https://review.openstack.org/40762017:37
jeblairpabelanger: i just noticed the existing pause test was in test_commands.  can we move those to test_builder (and rework them to avoid using the cli?)17:39
*** abregman has quit IRC17:40
pabelangerjeblair: sure, I think we should keep test_dib_image_build_pause there, since we have some pause logic in nodepoolcmd.py17:42
pabelangerbut other could move17:42
jeblairpabelanger: agreed.  the next 2 can move i think.17:43
jeblairpabelanger: i +2d that change if you want to do the move of both of those tests as a followup17:43
pabelangeryes17:43
pabelangerI'll reworkthem17:43
pabelangersure17:43
pabelangerwow, our --checksum patch is finding all the things today17:46
pabelangerhttp://logs.openstack.org/11/406411/9/check/nodepool-coverage-ubuntu-xenial/b9dba9a/console.html17:46
pabelangerthat one is new17:46
pabelangerlooks like we lost our connection to zookeeper17:46
pabelangerShrews: ^ might be interested in that one17:47
jeblair2016-12-06 17:38:11.376907 | DEBUG [gear.Client.nodepool] Processing input on <gear.Connection 0x7f9cb42b3150 host: localhost port: 33769>17:49
jeblair2016-12-06 17:38:40.174067 | INFO [gear.Client.nodepool] Received admin data <gear.AdminRequest 0x7f9cb429f450 command: status>17:49
jeblairpabelanger: ^ that timestamp gap is really suspicious; like the host was unresponsive for 29 seconds17:49
Shrewspabelanger: looks like it never even started17:50
pabelangerjeblair: So, that is about the 3rd time we've seen a gap of 20-30 seconds. I'm starting to wonder if we are starving the CPU or something17:50
Shrewsjeblair: will review that other one later. Not at a computer now17:51
openstackgerritMerged openstack-infra/nodepool: Fail on bad options to fake-image-create  https://review.openstack.org/40445217:55
*** hashar is now known as hasharEat18:02
openstackgerritPaul Belanger proposed openstack-infra/nodepool: Add --checksum support to disk-image-create  https://review.openstack.org/40641118:02
openstackgerritJames E. Blair proposed openstack-infra/nodepool: Don't use taskmanagers in builder  https://review.openstack.org/40566318:10
openstackgerritJames E. Blair proposed openstack-infra/nodepool: Fix zookeeper config in test fixtures  https://review.openstack.org/40763218:10
jeblairpabelanger, mordred: ^ that test fix could potentially cause some test flakiness, though i don't know if we've actually seen it in zuul runs18:11
jeblairer, sorry, the fix could potentially fix it, not cause it.  :)18:11
pabelangeryay for fixes18:12
pabelangerpretty happy how much we are finding just from our unit tests18:12
jeblairoh, the timestamps in the test aren't real -- they are just when testr outputs everything at the end.  i think we will need to update the formatter to get proper timestamps.18:14
pabelangerAh, that explains a lot18:15
jeblairi'll push a change for that18:16
openstackgerritJames E. Blair proposed openstack-infra/nodepool: Include timestamps in test logs  https://review.openstack.org/40764018:21
jeblairpabelanger, mordred: ^18:22
SpamapSjeblair: Shrews could you confirm that my statement is correct in the comments of https://storyboard.openstack.org/#!/story/2000812 ?18:23
pabelangerjeblair: clarkb: --checksum patch is green again: https://review.openstack.org/#/c/406411/ if you don't mind adding it back into your review queue18:31
clarkbI will take a peak once I get this xenialification change for designate passing and pushed18:32
pabelangerWFM18:32
pabelangerI'm going to look at launching nb02.o.o18:32
jeblairSpamapS: looking18:37
jeblairSpamapS: actually i don't think it is addressed.  i think 811 addressed the thing that brought it to our attention, but when we tried to manually fix the bug described by 811, we noticed that running a manual image-delete didn't actually delete the image, which is what 812 is going on about.  :)18:41
jeblairSpamapS: i'll push up a change with a failing test for that18:41
openstackgerritJames E. Blair proposed openstack-infra/nodepool: Make test_image_delete fail  https://review.openstack.org/40764918:43
openstackgerritJames E. Blair proposed openstack-infra/zuul: Add roadmap to README  https://review.openstack.org/40721318:56
openstackgerritJames E. Blair proposed openstack-infra/zuul: Add update storyboard script  https://review.openstack.org/40722918:57
jeblairpabelanger: regarding 406411 -- i thought we were going to put a remove method on the dibimagefile?18:59
pabelangerjeblair: I still can, thats what clarkb and I can up with now.  But, if you don't mind providing some guidance, I'll update it again19:02
jeblairpabelanger: i left a comment19:03
clarkboh I was just trying to model the object attributes with the class, but making removal a method too sounds good to me19:03
*** hasharEat is now known as hashar19:05
pabelangerwoah19:06
pabelanger| ubuntu-xenial-0000000010  | ubuntu-xenial  | nb02    |           | building | 00:00:01:57 |19:06
pabelangernb02 is our winner today for xenial19:07
pabelangerI had just started it too19:07
*** jamielennox|away is now known as jamielennox19:07
Shrewspabelanger: oh, it's active already. neat.19:28
pabelangerya, it scooped up the build right away19:29
openstackgerritJesse Keating proposed openstack-infra/zuul: Add support for github enterprise  https://review.openstack.org/40767519:33
openstackgerritMerged openstack-infra/nodepool: Include timestamps in test logs  https://review.openstack.org/40764019:33
*** pleia2 has quit IRC19:34
*** pleia2 has joined #zuul19:34
*** toabctl has quit IRC19:35
*** toabctl has joined #zuul19:36
SpamapSjeblair: ah! ok, I'll mark 812 as in progress then19:36
SpamapSoh n/m you did that :)19:37
openstackgerritPaul Belanger proposed openstack-infra/nodepool: Add --checksum support to disk-image-create  https://review.openstack.org/40641119:46
pabelangerjeblair: okay, I think I captured your request^. Not sure passing log is the right approach I took, but will update based on comments19:47
pabelangerAlso seeing some cross clean up across nb01 and nb02, each helping to remove images after we've uploaded a new one19:48
openstackgerritJesse Keating proposed openstack-infra/zuul: Add support for github enterprise  https://review.openstack.org/40767519:49
openstackgerritMerged openstack-infra/nodepool: Do not truncate state_time precision  https://review.openstack.org/40760619:56
pabelangergoing to be calling it early today, need to dadops at home today. Everybody is sick20:00
pabelangerI'll pick up the rest of my patches in the morning, unless somebody else beats me to it20:00
clarkbugh everyone is sick is my new ugh20:01
clarkbwe are all recovering from something20:01
pabelangerhowever, nb01.o.o and nb02.o.o are both processing things, if people want to poke at them and look at logs20:01
openstackgerritJesse Keating proposed openstack-infra/zuul: Add support for github enterprise  https://review.openstack.org/40767520:04
openstackgerritDavid Shrewsbury proposed openstack-infra/nodepool: Fix image-delete command  https://review.openstack.org/40764920:05
Shrewsjeblair: ^^^ usurped your failure demonstration review20:06
Shrewspabelanger: how would i poke at them and their logs?20:06
Shrewsor is that for folks with special powers?20:06
Shrewspabelanger: also, the state_time fix merged20:07
clarkbwe should be hosting the image build logs and I thought there were changes to host the service logs that ianw wrote too20:10
clarkbShrews: look in and around http://nb01.openstack.org20:10
ianwclarkb: we don't expose the upload logs ... we *could*, but i just wasn't 100% certain they were sane to expose20:11
Shrewsclarkb: all i see are dib build logs. i'm not interested in those20:11
clarkbianw: gotcha20:11
ianwShrews: yeah, we only output build logs.  the other logs we don't just to avoid leaking info in them20:13
clarkbI know that ksa is supposed to br safe now iy scrubs passowrds20:15
ianwyeah, the way we deploy though, with the "bring in the latest of everything automatically" model, means what's true now might not be in 5 minutes :)  so that would be my concern20:17
mordredclarkb: fwiw, I am pretty satisfied with the level of scrubbing done by ksa20:20
morganif we missed some scrubbing (not in debug mode), let us know20:38
morganwe work very hard to keep secure data scrubbed in ksa20:38
openstackgerritDavid Shrewsbury proposed openstack-infra/nodepool: Fix image-delete command  https://review.openstack.org/40764920:38
morganeven in debug mode we do a reasonable job20:38
morganjust less effort because debug is meant to expose actual data20:39
clarkbah ok so maybe only expose info and above20:40
clarkband call that a reasonable compromise between conservative and useful20:40
mordredclarkb: yah - and honestly, we cannot handle ksa debug logging in our logs20:41
mordredit's SUPER HUGE20:41
mordred(that's where we log every REST interaction and their payloads)20:41
harlowjamordred hey21:06
harlowjacan u get the zookeeper package into epel21:06
harlowjatalk to some folks there...21:06
harlowjahttps://review.openstack.org/#/c/406878/ (kolla folks trying to use it)21:06
harlowjabut it doesn't exist (still...)21:06
harlowjause your SUPER powers to make that happen?21:07
mordreduh21:08
mordredharlowja: what possible powers do you think I have WRT epel?21:08
harlowjaall of them21:08
harlowjau are TREMENDOUS21:08
harlowjau are the one21:09
harlowjalol21:09
mordredharlowja: :)21:11
jeblairShrews: cool, thanks!21:12
mordredrbergeron: ^^ you know fancy special people I think ... this is a thing that is about to be more important with zuul - anything sane we can do to help that?21:12
jeblairmordred exercises his superpower21:12
Shrewsjeblair: may i ask you a question re: 400421?21:13
jeblairShrews: but of course21:13
Shrewsjeblair: i'm not 100% certain why you removed the checks against config.images_in_use in builder.py21:13
harlowjamordred thxs the one21:14
Shrewsjeblair: i mean, they still seem like something we want to do in order to not build images that are in the config, but not in use21:15
jeblairShrews: well, i believe i was thinking at the time that we would want to build them even if they are not in use.  but i neglected to recall that we would not know what format(s) to build.  so perhaps i should add those back?21:16
Shrewsjeblair: i think the images_in_use is based only on the 'labels' section (something i did not have from my test .yaml earlier today and i hit this code)21:17
Shrewsso maybe we know the formats?21:17
jeblairShrews: the formats come from the provider:images: section21:17
Shrewsright, so if we have provider:images, and diskimages:, but no labels:, they would not be "in use" and we COULD still build, but do we want to?21:18
jeblairbasically, 'diskimages:' causes an image to be built, 'providers:images:' causes an image to get uploaded, and 'labels:' cause nodes to be created21:18
jeblairmy idea was to make things ^ that simple21:19
jeblairhowever, the format thing makes it complicated21:19
jeblairso it really needs to be:21:19
Shrewsjeblair: ah, that's been a source of confusion for me (the relationship b/w those things)21:19
jeblair'diskimages:' causes an image to be built (as long as there's at least one 'providers:images:' section to tell us the format), 'providers:images:' causes an image to get uploaded, and 'labels:' cause nodes to be created21:19
jeblairShrews: fortunately, we accidentally simplified it (there was a more convoluted relationship between labels and provider-images)21:20
Shrewsjeblair: cool. so removing those checks doesn't guarantee we have provider:images: (i don't believe), so maybe we should keep them21:21
Shrewsbut that would be a really odd config21:21
jeblairShrews: true, though what i really did there was to remove the "in use by labels" check and add in a new "in use by provider-images" check.  so maybe i should replace the old checks with the new check.21:21
jeblairto clarify...21:22
Shrewsjeblair: that would make sense21:22
Shrewsto my feable mind, at least21:22
Shrewsjeblair: thanks for the clarification21:23
jeblairShrews: in config.py i replaced images_in_use with diskimage.in_use -- so i'm thinking that the old "don't build images that aren't in use by labels" checks could reasonably be replaced by "don't built images that don't show up in provider-images"21:23
Shrewsyup21:23
Shrewsthat's much more logical than the original, tbh21:24
jeblairShrews: i could actually implement that as "if diskimage.formats" which might express the issue a little more clearly21:25
Shrewsthat's fine too21:26
openstackgerritJames E. Blair proposed openstack-infra/nodepool: Delete builds when diskimage removed from config  https://review.openstack.org/40042121:27
jeblairShrews: okay, that only affects checkimageforscheduledimageupdates; checkproviderimageupload simply won't even be called in that case, and cleanupimage already has a corresponding check21:28
*** willthames has joined #zuul21:29
Shrewsjeblair: +121:30
jeblairpabelanger: i'm sorry.  i see the problems you're talking about with PS11 of 406411.  given those, i think i now prefer ps10.  i think what i'm envisioning would mean substantial changes to dibimagefile and how it's used, so it might be best to just hold off.21:51
jeblairShrews: i'm inclined to go with ps10 of 406411 over ps11 ^ can you take a look and let me know if you have a preference?21:52
Shrewsjeblair: yep. will look a bit later21:55
rbergeronumm22:08
rbergeronmordred: will have to read up, gimme a bit22:09
rbergeronshrews: hai i'm in your city22:09
openstackgerritJesse Keating proposed openstack-infra/zuul: Add support for github enterprise  https://review.openstack.org/40767522:09
Shrewsrbergeron: ohai!22:10
Shrewsrbergeron: will likely see you at thursday night dinner22:11
rbergeronyay!22:12
rbergeronmordred: oh god, zk isnt in epel?22:12
openstackgerritJames E. Blair proposed openstack-infra/nodepool: Fix zookeeper config in test fixture  https://review.openstack.org/40763222:28
openstackgerritJames E. Blair proposed openstack-infra/nodepool: Don't use taskmanagers in builder  https://review.openstack.org/40566322:28
openstackgerritMerged openstack-infra/nodepool: Add pause option for image uploads  https://review.openstack.org/40762022:32
openstackgerritJames E. Blair proposed openstack-infra/nodepool: Pluralize zk nodes with children  https://review.openstack.org/40773622:32
openstackgerritMerged openstack-infra/nodepool: Fix image-delete command  https://review.openstack.org/40764922:34
mordredrbergeron: so it seems?22:40
*** harlowja has quit IRC22:43
*** harlowja has joined #zuul22:43
*** hashar has quit IRC22:46
ianwharlowja: there is a rather large thread on that on internal rhos chat22:46
harlowjaianw gets me some popcorn22:46
ianwthe gist is, zookeeper is not really anyone's priority22:47
ianwbut we have a few options22:47
harlowjai tried to make it a customer request when i was at yahoo22:47
harlowjai think that bumped the priority22:47
harlowjabut guess not high enough :-P22:47
ianwi think it's fair to say that in terms of rhos, java dependencies are not wanted22:47
ianwand focus there is really on tooz and using i think etcd22:48
* harlowja wonders how that works for zuulv30022:48
Shrewsjeblair: what are the ps11 concerns you mentioned?22:48
harlowjaianw is the java fear really reasonable anymore?22:49
ianwharlowja: tooz?  it doesn't, it's zk specific22:49
ianwi dunno, but that's the vibe :)22:49
harlowjaweird22:49
mordredianw: that's gonna be all sorts of fun when zuul v3 is ready if it only really works on Ubuntu22:50
mordredianw: I guess nobody cares too much about mesos/kafka either?22:50
* mordred mostly trying to sort out in which ways he should go prod which people22:51
jeblairShrews: the logging is weird, for one.22:51
harlowjathe best way mordred22:51
harlowjau should prod the best way22:51
* Shrews hands mordred his freshly sharpened prodding stick22:51
*** yolanda has quit IRC22:52
*** yolanda has joined #zuul22:54
* mordred just starts stabbing, hoping that he hits someone22:54
ianwmordred: i'm going to try making some time with ggilles and we'll try bringing back some of the fedora stuff into his COPR repo and see just how far off it is22:54
ianwif it is a complete cluster f- of dependencies that's not promising, but if it mostly works, we might have a path to EPEL22:55
mordredsweet! let me know if there's any way I can be useful - although I'm guessing the liklihood of that is fairly low22:56
Shrewsjeblair: yeah, that is weird. The way we use mostly class methods of that class make its usage odd, IMO. i think i like ps10, too22:56
Shrewsjeblair: another thought... change from_image_id() to find the extra files, then use that and delete them in deleteLocalBuild()22:58
jeblairShrews: that sounds promising -- do you want to take a stab at that?22:59
Shrewsjeblair: sure. but tomorrow. :)22:59
jeblairk22:59
*** saneax-_-|AFK is now known as saneax23:03
*** yolanda has quit IRC23:04
*** jamielennox is now known as jamielennox|away23:05
*** jamielennox|away is now known as jamielennox23:06
SpamapSianw: to be fair, ZK is the least badly behaved java program out there in terms of dependencies on bad parts of java.23:08
SpamapSit's not maintained by the usual java crowd that vendors the world, only uses oracle java, and burns wings off flies for fun.23:09
jamielennoxjeblair: so i spent a while playing with the zuultrigger test yesterday - is this expected to work differently in v3 or am i misunderstanding it?23:09
jeblairjamielennox: oh, let me take a look23:10
jamielennoxjeblair: so triggers used to get maintained via scheduler and a scheduler is per source, now triggers get maintained per pipeline23:10
jamielennoxand so one pipeline getting triggered from another is never getting the messages23:10
*** yolanda has joined #zuul23:13
jeblairjamielennox: i don't think the end behavior is expected to change23:15
jeblairjamielennox: there's only one scheduler (that's basically the main loop)23:15
jeblairjamielennox: the idea of the zuul trigger is that it has some special methods (eg onChangeMerged) that are called any time interesting events happen, they put synthetic events into the event queue so that any pipeline can end up matching those events in more or less the normal way23:17
jamielennoxjeblair: ok, so the current failure is basically: https://github.com/openstack-infra/zuul/blob/feature/zuulv3/zuul/manager/__init__.py#L33123:18
jamielennoxself.sched.triggers is never populated, triggers live at pipeline.triggers23:19
jeblairjamielennox: ah, gotcha23:19
jamielennoxso self.sched is global and so triggers can be cross pipeline23:19
jamielennoxthere is a note from jhesketh here: https://github.com/openstack-infra/zuul/blob/feature/zuulv3/zuul/scheduler.py#L240 that i'm not sure applies any more23:20
jamielennoxit seems to be relative to loading/unloading which seems to be removed in v323:21
jamielennoxthe easy solve is to add back all triggers to sched.triggers as well23:22
*** yolanda has quit IRC23:22
jamielennoxi tried replacing that block with simply send a global parent merged to the scheduler but the handling gets interesting23:23
jeblairjamielennox: yeah, but i think that might be the only remaining use of that, so i'm thinking maybe we should be getting it/them another awy23:23
jamielennoxjeblair: ++23:26
jamielennoxit seems odd to have to iterate triggers all triggers to get them started and this is the only place it applies, hence the stop and ask23:27
jeblairjamielennox: there are a couple of things that strike me as odd here; let me try to collect thoughts.  back in a few.23:28
jamielennoxjeblair: no worries, i can find other things if you want to mull that one for a while, just thought i'd ask23:28
*** yolanda has joined #zuul23:35
jeblairjamielennox: take a look at https://etherpad.openstack.org/p/MFU7KKGIWH and let me know if that makes sense...23:42
jeblairclarkb: can you look at that quickly?  i'm especially interested what you think about option 2 there...23:43
*** yolanda has quit IRC23:43
* clarkb catches up23:43
jeblairjhesketh: ^ also23:44
clarkbI think the long term plan was to rely on gerrit checking for merge conflicts. However, do other tools like github support it? if not we may want to keep it as an option for zuul23:45
jheskethjeblair: looking23:45
jeblairclarkb: yeah, i have a slight worry that as soon as we remove it, we'll add it back, either for this or something else :|23:46
clarkbthat said maybe we can just keep the centralized triggerers and have them farm out to the pipelines?23:46
mordredI believe github does support reporting mergability23:46
mordredhttps://github.com/ansible/ansible/pull/18785 shows "This branch has no conflicts with the base branch"23:47
clarkbcool23:47
mordred  "mergeable": true,23:48
mordredfrom https://developer.github.com/v3/pulls/#get-a-single-pull-request (near the bottom of the very large json blob)23:48
clarkbjeblair: like maybe have both things where there are singletons that handle the critical sections that need "locking" but otherwise abstract out into trigger per pipeline?23:48
clarkbI dunno just talking out loud23:48
clarkber thinking out loud23:48
jeblairi think the parent-change-enqueued thing was to pull in cross-repo "depends-on" with unshared queues23:49
jeblairit's possible that we still want that, and the only reason we aren't using it is that $someone forgot to write a project-config change to turn it on23:50
jeblairyeah, the commit message suggests that was the reason for adding parent-change-enqueued23:51
jeblairso i'm leaning toward option 1, independent of whether we continue to use project-change-merged23:52
jheskethjeblair: I'd lean towards option 1 as even if we find a way not to use  zuul-triggers for now, they are a useful feature we may find outselves wanting in the future23:54
jeblairjhesketh: yeah, that would not surprise me23:54
jheskethalso if not us, others23:54
jamielennoxjeblair: so my opinion is that it's strange to have triggers handling the onChanged event anyway23:56
jamielennoxthis would seem to be a core part of zuul and something that doesn't need to live out in triggers23:56
jamielennoxzuul core can emit the event every time and then triggers are the things that listen and handle it23:56
jamielennoxwhich i think is option 123:57
jeblairjamielennox: i agree.  so i think that's how we should implement #1 -- except that i do think that the 'core' should check with all the triggers to see if the event will be used since it's expensive to create these events.23:57
clarkbjeblair: basically only emit the event if something is subscribed to it23:58
clarkb(in pub sub terms)23:58
jeblairclarkb: yep23:58
jeblair(project-change-merged in particular is crazy -- it's "enqueue an event for every open change for this project".  it can take a very long time to run on a cold cache)23:58
jamielennoxwhy is the expense in event creation vs just add the event and let the triggers decided if they want to handle it23:58
jamielennoxi mean it's less parent-change at that point and just emit a change-merged event and have the trigger determine if it has anything that needs to be re-enqueued based on that change going in23:59

Generated by irclog2html.py 2.14.0 by Marius Gedminas - find it at mg.pov.lt!