Saturday, 2016-12-03

jeblairSpamapS: that is a state that only exists in the test suite.  that's basically testing that sendWorkFail() causes the job to run again.  in <=2.5 the fake build has a run_error attribute, and if it is set, it will send WORK_FAIL over gearman, but also add a history entry (to the list of build histories the fake job runner in the tests use to keep track of what has run) with RUN_ERROR so that it's easy to check that happened.00:04
jeblairSpamapS: it looks like the update to FakeBuild for v3 doesn't actually sendWorkFail anymore00:05
jeblairso i suspect the solution will be to figure out how to cause that to happen when the run_error flag is set00:05
SpamapSjeblair: so RUN_ERROR still happens00:08
SpamapSjeblair: but retries do not00:08
SpamapSjeblair: everything just gets flipped to SKIPPED00:08
jeblairSpamapS: right -- since RUN_ERROR is just intra-test communication -- the real thing that should be happening is the WORK_FAIL that results from setting run_error00:09
SpamapSso now trying to figure out why :-P00:09
SpamapSI'm having a hard time even mapping this to any concrete understanding of zuul. :-/00:09
SpamapSperhaps need to bore the hole a bit deeper into head.00:10
SpamapSjeblair: I think I'm grasping. So when a work fail gets received by the client, there's no 'result', so LaunchClient.onBuildCompleted should see that and set retry = True00:12
jeblairSpamapS: exactly00:12
SpamapSMight be the first time I've seen WORK_FAIL used right btw. ;)00:13
jeblairSpamapS: tbh, it was my second try.  :)00:13
SpamapSit's not altogether complicated, but it's just not always obvious that it's just a work complete by a different name. :)00:14
SpamapSmany users expect gearman to retry the job automatically when it's used00:14
jeblairSpamapS: i note that the current launcher in v3 doesn't actually send work_fail unless there's a problem with the job.  i think it's missing a big giant exception handler in _launch to send it.  i think the solution might be to add that, then have RecordingLaunchServer.runAnsible raise an exception if build.run_error is true.00:15
jeblairSpamapS: it's also fine to sendWorkComplete as long as result is None; it's the same thing either way.  i think that is actually what happens with the v2.5 launcher (might be good to examine that since it's battle-hardened).00:17
SpamapSjeblair: I think you're about 1 layer ahead of me. Was just wondering how I bubble that into the launcher00:18
clarkbjeblair: in v2.5 None signifies the worker disappeared or otherwise went away iirc00:18
SpamapSif a worker disappears, geard should send the work to another worker00:18
SpamapS(that's what gearmand will do anyway)00:18
jeblairterm collision here -- i think clarkb meant zuul worker00:19
clarkbyes the thing formerly called a slave00:19
SpamapSack00:19
SpamapSso the gearman worker will see that and sendWorkFail() which is empty result.00:20
SpamapSclarkb: the test I'm trying to re-enable is testing that those get retried00:20
clarkbya so I think in v2.5 it was changed to be success, fail, or nil and nil signifies something happened such that I don't actually have data00:21
clarkblike "slave" went away00:22
SpamapSthat's how I read it yes00:22
SpamapSalso gets that if somehow the worker recieved a job for a queue/function name that zuul doesn't understand00:23
SpamapSs/worker/launcher/00:23
SpamapSwhich is kind of a '# this should never happen' branch00:23
openstackgerritPaul Belanger proposed openstack-infra/nodepool: Make waitForBuildDeletion() more robust  https://review.openstack.org/40641100:26
openstackgerritPaul Belanger proposed openstack-infra/nodepool: Add checksum support to fake-image-create  https://review.openstack.org/40641200:26
SpamapSoh I actually think its' WORK_EXCEPTION not WORK_FAIL that happens in 2.500:26
jeblairSpamapS: yeah, i think you're right; that seems a reasonable thing to keep.00:28
* SpamapS almost figuring it out00:31
pabelangernext round of fedora-23 image uploads started00:32
SpamapSmmmmmmmmmm I think I got it down to a 5 line patch00:37
SpamapSand I think it might actually be correct00:37
* SpamapS will prove that to himself if he can write intellgible prose in the commit message00:37
SpamapSbtw now that tests are turned back on.. running the whole thing slams my laptop00:38
* SpamapS kinda likes that00:38
SpamapSRan 214 (+5) tests in 195.157s (-6.854s)00:40
openstackgerritJames E. Blair proposed openstack-infra/nodepool: Delete hard upload failures from current builds  https://review.openstack.org/40634200:41
jeblairShrews: ^ found it.  you're going to like it.  :)00:41
jeblairShrews: (i wonder if we could start passing in ImageBuild objects to avoid that sort of thing)00:42
clarkbSpamapS: now run it single process and appreciate how much faster it is that we run multiple workers00:42
SpamapSclarkb: right? sent my box to a load of 700:42
jeblairSpamapS: yes, it's doing lots of Important Work!00:42
jeblairSpamapS: oh, i should probably make a note of this in the dev guide -- you may be interested in setting "ZUUL_TEST_ROOT=/tmpfs"00:43
jeblairor wherever you keep a tmpfs00:44
jeblairlots of git hdd thrashing.00:44
openstackgerritClint 'SpamapS' Byrum proposed openstack-infra/zuul: Re-enable TestScheduler.test_rerun_on_error  https://review.openstack.org/40641600:44
SpamapSjeblair: my SSD laughs at your tmpfs00:45
SpamapSbut good idea :)00:45
SpamapSanyway, 406416 is the result00:46
jeblairi actually did it to extend the life of my ssd (probably by several hours)00:46
* SpamapS likes that the test remains unchanged00:46
SpamapSthere's a lot of really heated debate on tmpfs and ssds and swap lately00:46
SpamapSbut ultimatley, tmpfs for /tmp on laptops _should_ give you a bit less writing to the SSD. :)00:47
jeblairyeah, that's a no-go for me the way i use tmp00:47
SpamapS              total        used        free      shared  buff/cache   available00:47
SpamapSMem:          15747        3205        6742         561        5798       1153600:47
SpamapSSwap:         16075           0       1607500:47
SpamapSespecially when you have 16GB :-P00:47
openstackgerritPaul Belanger proposed openstack-infra/nodepool: Make sure we also cleanup checksum files  https://review.openstack.org/40641700:47
SpamapSjeblair: well technically, /tmp is supposed to be for _small_ temp files. Anything that needs to be spooled should be in /var/tmp00:48
clarkbmy tumbleweed install did a really weird setup for /00:49
clarkbit made subvolumes for everything00:50
pabelangereep, looks like we have a race condition in nodepool now00:50
pabelangerhttp://logs.openstack.org/11/406411/1/check/nodepool-coverage-ubuntu-xenial/399fa53/console.html00:50
jeblairSpamapS: i'm 200% more productive with the shorter tmp name00:50
pabelangerI'll poke at it for monday00:51
clarkbpabelanger: the coverage job can tickle them extra good too due to the instrumentation code making stuff run at much different speeds than typical00:51
pabelangerindeed00:52
SpamapSjeblair: actually that's poor practice in a multi-user system :)01:10
SpamapSjeblair: not that your laptop is multi-user :)01:10
SpamapSjeblair: you're much better off with ~/tmp01:10
SpamapSwhich is what I've trained myself to use01:10
clarkbI have one of those too01:11
clarkbused to be beacuse ubuntu though that only encrypting the homedir was smart for some reason01:11
clarkbnow I do it because it has actually helped me keep my homedir cleaner by starting stuff there if I know I want to rm it later01:12
Shrewsjeblair: wow04:13
Shrewsjeblair: if we passed the object in, we could do type checking to prevent such programming errors. (have i mentioned i hate the weak typing of python?)04:15
SpamapSShrews: eh.. python is strongly typed06:44
SpamapS"A strongly-typed programming language is one in which each type of data (such as integer, character, hexadecimal, packed decimal, and so forth) is predefined as part of the programming language and all constants or variables defined for a given program must be described with one of the data types."06:44
*** bhavik1 has joined #zuul11:12
*** willthames has quit IRC11:25
*** jamielennox is now known as jamielennox|away12:22
*** bhavik1 has quit IRC13:30
*** dmsimard|pto is now known as dmsimard13:35
*** harlowja has quit IRC14:40
mordredstrong/weak and static/dynamic == different axes. I agree with Shrews that the dynamic typing of python annoys me. which is funny - there was a time in my life when I really liked it16:44
SpamapSIt's truly a double edged sword though.18:12
SpamapSover time the two will support around the same velocity18:13
SpamapSbut agility in early dev is high with dynamic typing18:13
SpamapSso you get a nice steep early functionality spike18:13
SpamapSbut later as you move into maturity.. you find yourself chasing typing problems and it's definitely annoying18:14
fungianother zuul-related question for people who enjoy answering things on ask.o.o: https://ask.openstack.org/question/9989618:19
fungiwonder if we need an faq about why zuul enforces strict serialization and that it doesn't support atomic cross-repo merges18:19
SpamapSI'd be interested in the background behind why that's not supported (I think I know, but I'm not sure enough to answer)18:31
fungimy understanding is that zuul can't guarantee they all merge at the exact same moment, and if gerrit allows it to merge one and then refuses to let it merge another... you could end up in a bad situation18:45
fungithough there are likely other technical reasons as well18:46
fungi(and then there are the theoretical reasons behind not supporting it, in that it promotes poor development patterns, but that's more of a judgment call)18:47
jeblairit would not be much code for zuul to support that, and we probably will as an option in v3.  but we don't want to enable it for openstack because we value the strict sequencing which aids in CD.18:47
jeblair(in order to support it, we would have zuul push to gerrit for merges rather than depend on the gerrit submission queue)18:47
fungigood point18:52
fungithat would at least mitigate the situation i was concerned about18:52
jeblairyeah, the code is basically there (that was part of the idea of the mergers -- they even already have a push method), we just (intentionally) don't have a way to turn it on.18:53
fungii bet we could also stop crippling the git merge calls to allow the normal suite of strategies18:55
zarojeblair: have you taken a look at the Gerrit batch plugin.  im wondering whether there's any benefit using that over zuul mergers?18:56
fungionce we no longer have to rely on gerrit actually being capable of merging on its own18:56
jeblairfungi: yeah, aside from us possibly changing our minds and allowing simultaneous cross-dependent change merges, we might want to enable that to deal with the octomerge issue18:57
fungihttps://gerrit.googlesource.com/plugins/batch/ "...building and previewing sets of proposed updates to multiple projects/branches/refs that should be applied together..."18:57
fungineat18:57
jeblairzaro: no -- we might be able to use it, but also, we're doing just about all the work needed anyway.18:57
jeblairso i'm not sure it would save us anything18:58
fungialso with zuul-mergers we can scale pretty limitlessly18:58
fungiwhereas the batch plugin would likely put a lot of additional load on gerrit18:59
*** jamielennox|away is now known as jamielennox20:08
zarosorry i had to drop off.21:28
zarofungi: i think the idea is that load is mostly from cloning the repo and pushing back.21:30
zarofungi: therefore when zuul cloning it's putting load on gerrit.  with the batch plugin it's all done on server so no cloning needed, just the merges so might be less load.21:31
zarofungi: actually maybe not, since zuul clones from the cgit repos? so maybe there's not much benefit.21:38

Generated by irclog2html.py 2.14.0 by Marius Gedminas - find it at mg.pov.lt!