Thursday, 2017-06-22

openstackgerritTristan Cacqueray proposed openstack-infra/zuul feature/zuulv3: requirements: update ansible version  https://review.openstack.org/47522801:10
*** lennyb has quit IRC03:40
*** lennyb has joined #zuul03:51
tristanCtobiash: how about using command: rsync instead of the synchronize module?04:19
tobiashtristanC: at least the shell module also didn't work delegated to localhost04:59
tobiashbut that's ok as I currently have a working workaround05:00
jamielennoxtobiash: you have a workaround to command on localhost?05:05
tobiashjamielennox: not for command but for pulling files05:06
jamielennoxi haven't figured out exactly what fails on mine, but currently there is no output from the command05:06
tobiashjamielennox: the output is missing as ansible tries to use py3 when delegated to localhost and many modules don't work yet with py305:06
tobiash(e.g. synchronize, shell, command,...)05:07
jamielennoxah ok, pull didn't seem like a problem, it was me trying to rsync from the executor to the log server which i was running via command05:07
jamielennoxtobiash: at least command should though, i patched it up the other day05:07
tobiashjamielennox: didn't try command, just shell and synchronize05:08
jamielennoxtobiash: command and shell are basically the same thing05:08
tobiashah, ok05:08
jamielennoxsame code run, a few flags set05:08
tobiashjamielennox: for pulling files I currently use a combination of find and fetch modules05:09
jamielennoxtobiash: synchronize seemed to be pulling ok for me05:12
jamielennoxwith mode: pull05:12
jamielennoxi'm not sure how much of that runs on client vs server in that mode05:12
tobiashhm, for me that didn't work (failed without any helpful messages)05:12
jamielennoxi haven't been able to debug it far05:16
*** smyers has quit IRC05:21
openstackgerritTobias Henkel proposed openstack-infra/zuul feature/zuulv3: Fix and test report urls for unknown failures  https://review.openstack.org/46921405:24
*** smyers has joined #zuul05:24
gundalowZuul people at AnsibleFest we have booked "Riverview 9" at 11am. The room is on the UK 2nd floor, i.e. the floor above where the main tracks are07:21
jlkgundalow: what's the topics?07:58
pabelangergundalow: thanks08:00
gundalowjlk: I'd like to know the next steps for Zuul + Ansible integration. Especially anything that will need Ansible Core Team people so I can add it into the roadmap08:02
jlkCool. I'll be there.08:03
*** hashar has joined #zuul08:06
openstackgerritJesse Keating proposed openstack-infra/zuul feature/zuulv3: Add a test to verify push reports only set status  https://review.openstack.org/47640008:09
jlkjamielennox: So when working on ^ I ran across something that may bite us. I think our code is trying to create a log path to use with the github status we report, and I think it's trying to use a PR number as part of that URL (and maybe the internal log path). But a push event doesn't have an associated PR.08:11
pabelangerremote:   https://review.openstack.org/476403 Allow secrets for check pipeline08:17
pabelangerso, I think ^ is going to be an issue.08:17
pabelangerI'm not sure we want secrets in check today for any untrusted job, but we do need it for trusted atm08:19
pabelangerssh private key for logs.o.o to publish job logs08:19
jlkoh eww08:26
pabelangerI guess we could move log publishing into a new pipeline08:36
jamielennoxlog publishing is a trusted job, so it runs outside of bwrap on the system08:45
jamielennoxso if you have an ssh key on the executor you can scp your logs to publish08:45
jamielennoxeg (because i have it open) https://github.com/BonnyCI/project-config/blob/master/playbooks/roles/base/tasks/post.yaml08:45
jamielennoxnote, that's the delegate_to that currently doesn't work - but it used to08:45
pabelangerjamielennox: right, we could do that but that means we need to manage keys with puppet.  We are also going to run trusted in bwrap shortly too: https://review.openstack.org/#/c/474460/08:46
pabelangerso, with that in mind, we could just store key as secret and have zuul just pass it08:47
pabelangerand we talked about that last week, and part of the reason we added tls to gearman08:47
jamielennoxpabelanger: what's the benefit of being trusted if you're still in a bwrap08:51
pabelangerjamielennox: usage of lookups / ansible things08:51
jamielennoxalso yea, but we're already doing some key management - isn't the first key (the one nodepool puts on it) already going to need to be on the executor?08:52
pabelangerya, today we need the ssh key for nodepool. but I feel a job could manage the SSH key for logs.o.o under zuulv3. I guess if we think that is going to be an issue, we'll just fall back to having puppet manage that key08:55
jamielennoxso i've wondered a bit about that, particularly if you were uploading logs to an api of some sort instead of a straight ssh08:57
jamielennoxwhether you could use like a one time private key from hashicorp vault or something that would also tie the job identifier to the key08:57
jamielennoxbut all of these things will require some sort of trusted key on the executor08:58
pabelangerright, swift is a good example. we'd want a trusted job to publish to swift. I don't see us storing something directly on the executor for that, the job would have that info in a secret and we'd just use it09:00
*** hashar has quit IRC09:01
*** smyers has quit IRC09:03
*** hashar has joined #zuul09:04
jamielennoxright, but temp_urls work because you can create them with your token/username+password09:04
jamielennoxyou'll always have a bootstrap problem of how do you know who can create the upload approval token09:05
jamielennoxbut that's what the trusted jobs have been for09:05
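A hedged sketch of the temp URL mechanism being discussed, using python-swiftclient's utility; the account path, object path, and key below are made-up placeholders, and the temp-url key is assumed to already be set as metadata on the Swift account.

```python
# Hedged sketch of the Swift temp URL idea mentioned above; all names here
# are placeholders, and the temp-url key must already exist on the account.
from swiftclient.utils import generate_temp_url

temp_url_key = 'secret-temp-url-key'                 # e.g. held as a Zuul secret
path = '/v1/AUTH_example/logs/job-1234/index.html'   # hypothetical object path

# Valid for one hour and PUT-only, so a job can upload logs but cannot read
# or delete anything else in the container.
url = generate_temp_url(path, 3600, temp_url_key, 'PUT')
print(url)
```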
*** hashar_ has joined #zuul09:09
*** hashar has quit IRC09:09
*** smyers has joined #zuul09:15
*** bhavik1 has joined #zuul09:23
jlkgundalow: The rooms aren't labeled, but we've taken the one straight across from the woman's restroom09:38
*** hashar_ has quit IRC09:40
jlkmordred: jeblair: we're upstairs in one of the two Ansible rooms, floor 2 above the main snack area of Ansiblefest.09:46
*** bhavik1 has quit IRC10:00
jeblairpabelanger: yeah, i don't think we want to allow secrets in check.  i think we'll have to have the log key installed on the executor and use it from a trusted playbook.  when we switch to using bubblewrap for trusted, we'll bind-mount that key in as well.  i'd like to be able to use secrets for this, but i think we're going to have to revisit some things, so let's not hold up progress on jobs.10:23
pabelangerjeblair: okay, I'll do that now in puppet10:24
jeblair(in particular, secrets are currently given to an entire job, and we'll have to think about whether we can make them available to just a single playbook within a job)10:28
pabelangerjeblair: so, we do not set zuul_changes today in zuul dict() in inventory file. We do have the legacy ZUUL_CHANGES variable, but I think we are (should now?) removing them10:55
mordredjamielennox, tobiash: do you have access to logs or anything that shows the issues with synchronize and python3?11:01
mordredI'm sitting with toshio looking at the issue11:02
*** jkilpatr has joined #zuul11:04
*** hashar has joined #zuul11:17
tobiashmordred: I haven't figured out how to get logs for that issue11:42
tobiashjust see that ansible starts a py3 session11:42
tobiashand after that there is nothing in the executor log11:42
mordredahh. kk. I think I can try to reproduce that11:43
*** hashar has quit IRC11:46
*** hashar has joined #zuul11:46
tobiashmordred: http://paste.openstack.org/show/613370/11:48
tobiashthat's what I got with verbose executor debug logging11:48
*** hashar has quit IRC11:53
*** hashar has joined #zuul12:03
*** hashar has quit IRC12:18
*** hashar has joined #zuul12:19
mordredtobiash: awesome! thank you!12:52
mordredoh fun12:52
mordredpabelanger: "test/runner/ansible-test sanity --local --python 2.7" is the way to run ansible-test without sourcing hacking/env-setup12:54
mordredpabelanger: --local tells it to use the local env not a virtualenv12:54
pabelanger++12:55
mordredtobiash: can you paste me your playbook with synchronize?14:03
tobiashmordred: let me check14:03
tobiashmordred: did much rebasing during debugging, have to check reflog...14:05
tobiashmordred: http://paste.openstack.org/show/613402/14:08
mordredtobiash: awesome! thank you14:09
tobiashprerequisite is that the main or pre job at least creates the logs dir in zuul_workspace_root14:09
tobiash(in theory if it would work)14:10
mordredtobiash: and which task broke?14:10
mordredthe publish logs?14:10
tobiashthe synchronize14:11
tobiash(collect node logs)14:11
tobiashmordred: the publish works14:11
tobiashthe create log dirs also works14:12
mordredkk. thanks (we're trying to make it break here so we can debug)14:12
tobiash:)14:14
tobiashmordred: additional info is that I'm already using the patch to run trusted jobs in bwrap too14:18
tobiashif that has an impact on your debugging14:19
mordredtobiash: but you had the issue before you deployed that right?14:21
tobiashmordred: I think, but I'm not sure anymore14:22
tobiashI was in a "trial and error until it works"-mode...14:23
tobiashand the result was a workaround with find and fetch modules as I wasn't able to really debug this...14:24
mordredtobiash: yah - we need better debugging structure around that :(14:55
mordredtobiash: and ... we can't make it break here :(14:55
mordredtobiash: but we're still trying things14:55
* SpamapS catching up with all of you in GMT15:32
jeblairtobiash: we've tried running a playbook with that synchronize task inside of bubblewrap under py3.6 and still can't reproduce.  if you have a chance to try this procedure to get more info, that would be great: run zuul-executor with --keep-jobdir, run a job, grab the bubblewrap command from the logs, then run it manually from a shell.15:33
* SpamapS points at zuul-bwrap as a possible aid to debugging15:34
SpamapSI'm sure there's context that has been missed. Have we totally thrown out the idea of just running ansible-playbook using python2.7 ?15:35
jeblairtobiash: if that fails, then edit the ansible.cfg to remove zuul_stream and run it again, hopefully if there is a problem where zuul_stream is eating output, that will show more15:36
jeblairtobiash: also, i forgot to mention: run 'zuul-executor verbose' before running the job to get -vvv output15:36
jlkI think there were some concerns about the streaming code in python 215:36
mordredSpamapS: well - we've mostly been focused on trying to reproduce the issue while we had the ansible core team here15:37
mordredSpamapS: and have been unable to do so - although we understand basically everything there is to know about synchronize now :)15:37
SpamapShah15:37
mordredSpamapS: general consensus is that there should be no issues with synchronize in python3 since the majority of it is run in the action plugin which is core-not-module15:38
SpamapSthat's exactly what I was hoping we could avoid tying our progress to.15:38
SpamapSahh15:38
SpamapSso maybe it's not python3's fault15:38
mordredyah15:38
SpamapSthat explains why the debug steps were taken15:38
mordredSpamapS: we've also got context in folks heads, so if we can get it to happen again and get a reproducible case, we should be able to debug and figure out what's up15:40
SpamapSyeah I don't even have context on when it was seen15:41
*** hashar is now known as hasharAway15:43
*** tobiash_ has joined #zuul16:28
tobiash_jeblair, mordred: thanks, will try tomorrow again16:28
tobiash_I'm already @home without my work laptop16:29
tobiash_the log I posted earlier was already with verbose logging16:30
tobiash_maybe it could be also a different issue16:33
tobiash_how does zuul setup these module overrides?16:33
SpamapStobiash_: ansible.cfg loads them as plugins IIRC16:34
tobiash_there might also be a possibility of a leftover of earlier module overrides?16:34
tobiash_I'll clean up that directory and retry the synchronize again tomorrow16:35
tobiash_btw, will this stay like this or is there a plan to integrate localhost restrictions in ansible itself?16:38
SpamapStobiash_: that's a really long long way away.. but I think it's the right path16:44
SpamapShave a "safe mode"16:44
SpamapSit would need to be designed generically which might get complicated16:45
tobiash_ok, so more a topic for ansible 3 and zuul 516:47
tobiash_what comes to my mind now: does zuul inhibit loading custom modules shipped alongside the playbooks?16:51
tobiash_like http://blog.toast38coza.me/custom-ansible-module-hello-world/16:54
Shrewsugh, i think my signal.pause() is affecting asyncio happenings16:59
SpamapStobiash_: I believe that's disabled in untrusted mode by the action plugin restrictions. Could be wrong.17:44
SpamapSShrews: Highly likely. Signals and async are always dicey.17:44
tobiash_ok17:48
*** tobiash_ has left #zuul18:11
*** tobiash__ has joined #zuul18:11
*** tobiash__ has quit IRC18:11
*** tobiash__ has joined #zuul18:12
*** tobiash__ has quit IRC18:12
*** tobiash_ has joined #zuul18:12
clarkbShrews: I am following up on the we don't delete uploads on process restart thing from yesterday. The current clean up worker checks if the upload is in progress by attempting to get the lock for that upload. If it gets the lock then it isn't in progress and should be deleted. Do we not release the locks when disconnecting from zk?18:46
clarkbShrews: our init script uses --signal 9, which may be part of it. I wonder if we can gracefully terminate and release locks on sigterm18:50
clarkbShrews: that said, our locks are expected to survive disconnects? or should our lock grabbing code check if whoever has the lock is connected before failing?18:50
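An illustrative sketch (not nodepool's actual cleanup code) of the check clarkb describes: attempt to take the upload's lock without blocking, and treat success as "nothing is uploading". The znode path and hostname are made-up examples.

```python
# Illustrative sketch of the cleanup-worker check described above: getting
# the lock without blocking means no builder is mid-upload for this record.
from kazoo.client import KazooClient

zk = KazooClient(hosts='zk.example.org:2181')  # hostname is a placeholder
zk.start()

# The znode path is only an example of where an upload record's lock might live.
upload_lock = zk.Lock('/nodepool/images/example-image/builds/0000000001/'
                      'providers/example-provider/images/0000000001/lock')

if upload_lock.acquire(blocking=False):
    try:
        print('no holder: upload is not in progress, safe to clean up')
    finally:
        upload_lock.release()
else:
    print('lock is held: upload still in progress, skip it')

zk.stop()
```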
clarkbharlowja: do you know re ^ and http://kazoo.readthedocs.io/en/latest/api/recipe/lock.html ?18:53
clarkbharlowja: if a zk client holds a lock via kazoo's lock recipe there and then disconnects, is that lock still held by that client?18:54
clarkbI would expect not but that appears to be the behavior we are seeing18:54
harlowjame too, though potentially after the zk session timeout18:54
harlowjai didn't think there was an explicit on-disconnect release of all locks (or try to before disconnecting)18:55
harlowjalocks are basically https://github.com/python-zk/kazoo/blob/master/kazoo/recipe/lock.py#L213-L21418:57
clarkbharlowja: ok it does look like the records that were there yesterday (that I wanted to go away) have gone away on a query today18:57
clarkbharlowja: so the session timeout may be it18:58
harlowjaya18:58
harlowjacheck whatever that is set to18:58
clarkbharlowja: I don't see us setting it explicitly so I think whatever kazoo sets it to19:00
SpamapSI thought locks had an ephemeral component19:01
SpamapSso if the session goes away -> lock released19:01
SpamapSand that was like, why zk is good19:01
clarkbSpamapS: that is the behavior I expected but it definitely appears like that isn't the case based on nodepool builder behavior19:01
SpamapSso yeah, session timeout is very likely the thing you are seeing.19:01
harlowjasee https://github.com/python-zk/kazoo/blob/master/kazoo/recipe/lock.py#L213-L21419:02
harlowja`node = self.client.create(self.create_path, self.data,19:02
harlowjaephemeral=True, sequence=True)`19:02
harlowjathats pretty much what a lock is, ha19:02
harlowjaan ephemeral node with sequencing19:02
SpamapSright19:02
SpamapSso until the session times out, lock will be held19:02
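A sketch of the behaviour under discussion: a kazoo lock is backed by an ephemeral sequence znode, so it disappears when the ZooKeeper session ends - immediately on a clean stop(), or only after the session timeout if the connection is never closed. The path and identifier are illustrative.

```python
# Sketch: ephemeral lock node lifetime is tied to the ZooKeeper session.
from kazoo.client import KazooClient

# timeout= is the requested session timeout in seconds (kazoo's default is 10).
zk = KazooClient(hosts='zk.example.org:2181', timeout=10.0)
zk.start()

lock = zk.Lock('/example/lock', identifier='builder-01')  # path is illustrative
lock.acquire()

# kill -9 here: the ephemeral lock node lives on until the server expires the
# session.  A graceful shutdown instead does this, releasing the lock at once:
zk.stop()
```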
clarkboh I bet I know what is going on19:02
SpamapSclarkb: and yeah, --signal 9 seems like a poor choice. What's up with that?19:02
clarkbbecause we use sigkill we don't get to send the tcp fun19:02
clarkb*fin19:03
SpamapSyeah19:03
clarkbwhich means zk has to timeout19:03
SpamapSTCP timeout, even worse19:03
clarkbSpamapS: I don't know why we use sigkill19:03
harlowjaSpamapS ya agreed, if the only way an app can die is via signal -9, prob app should be redesigned :-P19:03
clarkbbut if we sigtermed I think python would gracefully kill that for us19:03
SpamapSwe should be sending SIGTERM and waiting a bit before SIGKILL19:03
SpamapSwhere's this init script?19:03
clarkbit is likely that way because we run daemon threads and don't mark them daemon?19:03
SpamapSstart-stop-daemon will do this for you automatically.19:04
harlowjathere are many reasons19:04
harlowjaa daemon thread really isn't a good answer imho either :-P19:04
clarkbSpamapS: yes this is start stop daemon19:04
clarkbharlowja: it is if this thing is literally supposed to run forever19:04
clarkbbut you still want to be able to stop it19:04
harlowjasorry, i mean, u can use them, but u still need a way to signal to those threads to stop in a nice-happy-fashion19:04
clarkbsomething somewhere is going to be in a tight loop19:04
harlowjanice-happy-death preferred19:04
clarkbharlowja: sure19:04
SpamapSyeah, should be using start-stop-daemon --retry 10  or something19:05
harlowjawhich usually evolves into breaking up your work into small tasks, and then in between tasks checking if some kind of event has been set (ie, for the thread to stop)19:05
harlowjablah blah19:05
harlowjalol19:05
clarkbSpamapS: https://git.openstack.org/cgit/openstack-infra/puppet-nodepool/tree/files/nodepool-builder.init19:06
clarkbwe do actually appear to have a graceful stop in the nodepool builder too19:06
SpamapSargh, cgit has no annotate?19:07
clarkbthat marks each running worker to shut down, waits for them, then disconnects from zk19:07
clarkbI think I see the patch needed to make this work19:08
clarkbnodepool.cmd.builder handles sigint as a stop19:09
clarkbSpamapS: so ya switching the init script to sigint and wait before sigkill should do it19:09
clarkbSpamapS: I can review that really quick if you are wanting to push it. Otherwise I'll sort out the right start-stop-daemon incantation19:10
clarkbyou know what the problem is going to be? it's the block on image upload19:11
clarkbbecause that can take a few hours19:11
clarkbso we might separately have to make that die gracefully on sigint19:11
clarkbyup looks like upload image and image build are both synchronous with their workers19:14
clarkband we join() the workers on stop()19:14
Shrewsimage build spawns a subprocess that should receive any signals sent, so should terminate rather quickly. image uploading doesn't work that way.19:17
clarkbah ok so upload is the one to worry about here19:17
Shrewsthis should pretty much follow the design of the older nodepool in this area19:18
clarkbya, difference is that older nodepool didn't rely on graceful connection closes to properly update staet in the database19:18
clarkb(beacuse there was no database)19:18
Shrews(sorry, was out getting new tires. yay fun)19:18
clarkbso old nodepool was fine with a forceful kill19:19
clarkbShrews: what do you think about joining only the build workers and not the uploads?19:19
clarkbthats the easy mode fix for this I think19:19
clarkb(and roughly equivalent to the behavior with kill -9 going on now)19:19
clarkbmy only concern doing that is upload holds a lock with the zk client and stop() will attempt to stop the client and kill it under the upload19:21
clarkbwill that raise in the context manager properly?19:21
Shrewswell, i'm not worried about the lock handling. that's automatic19:23
Shrewsi think what would happen is those records will just remain in the UPLOADING state and get cleaned up at some point after restart19:24
clarkbShrews: yup. That is what seems to happen today. The problem today is we kill the tcp connection to zk without letting zk know so we have to wait for that session to time out. Whereas if we were to call nb.stop() properly with sigint we'd close the zk connection which should unlock things at that point19:25
clarkbbut for that to be effective we can't wait for three hours for a thread to join. I think I see some other related bugs and will push up a patch shortly and we can look at this with a concrete change19:25
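A generic sketch of the approach described above, wiring SIGINT/SIGTERM to a graceful stop so the ZooKeeper session is closed instead of left to time out; the class and method names are hypothetical, not the real nodepool builder code.

```python
# Generic sketch (names are hypothetical): signals drive a graceful stop().
import signal
import threading

class BuilderDaemon(object):
    def __init__(self):
        self._stop_event = threading.Event()

    def stop(self, signum=None, frame=None):
        # Called from the signal handler: ask workers to wind down.
        self._stop_event.set()

    def run(self):
        while not self._stop_event.is_set():
            self._stop_event.wait(1)  # placeholder for the real work loop
        # At this point: join the build workers, skip joining hours-long
        # upload workers, and close the ZK connection so locks are released.

daemon = BuilderDaemon()
signal.signal(signal.SIGINT, daemon.stop)
signal.signal(signal.SIGTERM, daemon.stop)
daemon.run()
```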
Shrewsclarkb: are you finding the zookeeper session timeout too long?19:27
clarkbShrews: I think that is why those uploading records for centos hung around forever yesterday. Either they were waiting on the zk session timeout or the tcp connection to timeout on the zk side or both19:28
Shrewsclarkb: those hung around because they were part of the latest build19:28
Shrewswe needed two more builds to age them out19:28
Shrews(2 most recent builds and their zk records hang around)19:29
SpamapSI'm a little confused19:29
SpamapSnot looking at the code19:29
SpamapSupload threads or upload subprocesses?19:29
Shrewsthe problem with those was that we were not cleaning up not-in-progress UPLOADING records for the most recent builds19:29
clarkbSpamapS: threads19:29
SpamapSbut Shrews just said processes.19:30
SpamapSoh19:30
SpamapSthat's build19:30
ShrewsSpamapS: image builds are subprocess19:30
SpamapSk19:30
clarkbShrews: if you look at the cleanup code it will delete uploads in progress if nothing holds their lock19:30
clarkbShrews: and that is independent of age19:30
SpamapSthen yeah, don't join those19:30
SpamapSexit() should take care of them.19:30
clarkbSpamapS: yup19:30
Shrewsclarkb: nope. 1 sec...19:31
SpamapSclarkb: and the start-stop-daemon incantation is just --retry <timeoutvalue> instead of --signal 919:31
Shrewsclarkb: https://github.com/openstack-infra/nodepool/blob/master/nodepool/builder.py#L441-L44219:31
clarkbShrews: https://git.openstack.org/cgit/openstack-infra/nodepool/tree/nodepool/builder.py#n37619:31
Shrewsthat little bit right there keeps us from calling cleanupProvider19:31
SpamapS              If timeout is specified instead of schedule, then the schedule signal/timeout/KILL/timeout is used, where signal is the signal specified with --signal.19:32
clarkbSpamapS: thats a build not an upload19:32
clarkber Shrews ^19:32
clarkbShrews: the code I linked is called before the section you point out and is explicitly for builds we want to keep19:33
clarkbShrews: basically it is making sure that the uploads corresponding to builds we keep are curated19:33
clarkbSpamapS: so we just have to change the signal and set a timeout?19:35
Shrewsclarkb: yeah, you're right, i was in the wrong place yesterday19:38
Shrewsclarkb: perhaps we should adjust the session timeout in zookeeper server if it's too long?19:39
clarkbShrews: we could do that too, but right now we aren't able to gracefully shutdown the builder because uploads can take hours19:39
clarkbShrews: which is why I think we use sigkill in the init script19:40
clarkbso fixing that first seems like a good idea19:40
clarkbthen if we still have problems we can adjust the session timeout19:40
Shrewsi honestly didn't expect it to be very long, the docs for it are vague, measuring in "ticks": https://zookeeper.apache.org/doc/trunk/zookeeperProgrammers.html#ch_zkSessions19:40
clarkbShrews: ya I think SpamapS may be right that the tcp timeout is actually at fault because we don't gracefully close the tcp connection19:41
clarkbso zk has to hang around and wait for tcp to give up19:41
Shrewsah, that would make sense19:41
Shrewswhat is the tcp timeout? anyone recall?19:43
clarkbI don't I think it is defined by the kernel by default19:46
Shrewsclarkb: hrm, a quick test of a 'kill -9' on a kazoo script holding lock shows the lock disappear relatively quickly19:54
clarkbShrews: it's entirely possible that there are more bugs in the cleanup code too resulting in the upload hanging around19:55
clarkb(eg it's not just the lock we have to worry about)19:56
Shrewsyeah19:56
clarkbbut I think either way we should avoid using sigkill and instead use sigint19:56
clarkbright now stop() is essentially dead code but it shouldn't be19:56
openstackgerritClark Boylan proposed openstack-infra/nodepool feature/zuulv3: Use private attribute lookup interanlly in builder.py  https://review.openstack.org/47667719:56
openstackgerritClark Boylan proposed openstack-infra/nodepool feature/zuulv3: Don't join image upload workers on stop()  https://review.openstack.org/47667819:56
clarkbthere I wrote lots of words to explain this19:56
Shrewsclarkb: oh, is it possible that the original nb* node that built the image was taken offline?20:00
clarkbShrews: no, I just restarted the service but the node was up the whole time20:01
openstackgerritJoshua Harlow proposed openstack-infra/nodepool master: Use threading.Event for quicker shutdowns  https://review.openstack.org/47668320:02
harlowjamy contribution ^20:03
harlowjalol20:03
Shrewsclarkb: hrm... so, i think we might have some hostname issues. 'nodepool dib-image-list' shows builds that have 'nb04' AND 'nb04.openstack.org' as the builder hostname. that could cause problems20:04
Shrewsi do not know how that could happen, but changing the hostname of a builder is a definite thing that jeblair and I have considered as a potential issue20:05
openstackgerritJoshua Harlow proposed openstack-infra/nodepool master: Use threading.Event for quicker shutdowns  https://review.openstack.org/47668320:05
harlowjaprob a pattern that should be used more ^20:05
Shrewswhich brings up SpamapS' idea of writing a unique id to disk and using that instead of hostnames20:06
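A rough sketch of the "write a unique id to disk" idea credited to SpamapS above, keying the builder on a persisted UUID instead of a hostname that may or may not be an fqdn; the state file location is an arbitrary example.

```python
# Sketch of the unique-id-on-disk idea; the path below is an arbitrary example.
import os
import uuid

def get_builder_id(state_file='/var/lib/nodepool/builder.id'):
    """Return a stable per-builder id, generating and persisting it on first use."""
    if os.path.exists(state_file):
        with open(state_file) as f:
            return f.read().strip()
    builder_id = str(uuid.uuid4())
    with open(state_file, 'w') as f:
        f.write(builder_id)
    return builder_id
```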
clarkbShrews: I think that may have been related to pabelanger? I recall that he noticed hostname was fqdn in one cloud and just hostname in the other. Looks like they have been changed to both be just hostname20:07
clarkbShrews: happens because different cloud images and metadata retrieval/application methods apply that inconsistently20:07
Shrewsharlowja: ooh, i like that. though i don't think using that around the SUSPEND_WAIT_TIME portion of code is correct20:12
harlowjawhy not20:12
harlowjaprob don't want to loop forever waiting for zk to comeback up?20:13
Shrewsharlowja: oh wait. yeah, that works too20:13
Shrewsnm20:13
harlowja:)20:13
harlowjaya, shutting down things quickly is hard, lol20:13
harlowjahttps://github.com/google/python-subprocess32#subprocess32 is also nice, in that it allows the call to timeout20:14
harlowjaits the backport from py3.320:14
harlowjai've done stuff like run(cmd, on_timeout_cb=...)20:14
harlowjaso that u can like stop subprocesses that u start if u your app is dying/getting shutdown20:15
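A sketch of the stock timeout support in subprocess32 (the python 3.3 backport linked above); run(cmd, on_timeout_cb=...) is harlowja's own wrapper, this only shows the underlying primitive it builds on, and the command is an example.

```python
# Sketch of subprocess32's timeout support (same API as python 3.3+ subprocess).
import subprocess32 as subprocess   # on python 3, plain "import subprocess" works

proc = subprocess.Popen(['disk-image-create', 'fedora-minimal'])  # example command
try:
    proc.wait(timeout=3600)          # give the build up to an hour
except subprocess.TimeoutExpired:
    proc.terminate()                 # shutting down: take the child with us
    proc.wait()
```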
clarkbin this case seconds is fine, but hours isn't :)20:16
harlowjaup to u20:16
harlowjaa lot of the time i see time.sleep in thread stuff it makes me think a threading.Event is really what's wanted (so the timeout can finish and the thread can die), but anyways, the more u know :-P20:17
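A generic sketch of the time.sleep to threading.Event swap being discussed (not the actual nodepool patch): wait() times out just like sleep() would, but set() wakes the thread immediately, so shutdown doesn't have to ride out the full interval.

```python
# Generic sketch: Event.wait() instead of time.sleep() for interruptible waits.
import threading

class PeriodicWorker(threading.Thread):
    def __init__(self, interval=60):
        super(PeriodicWorker, self).__init__()
        self._interval = interval
        self._stop_event = threading.Event()

    def stop(self):
        self._stop_event.set()   # wakes any pending wait() right away

    def run(self):
        while not self._stop_event.is_set():
            self._do_work()
            # Returns after `interval` seconds, or immediately once stop() is called.
            self._stop_event.wait(self._interval)

    def _do_work(self):
        pass  # placeholder for the periodic task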
Shrewsharlowja: no, i definitely like the use of that. i think we can avoid setting very short timeout intervals in our tests using that20:18
harlowjaprob20:18
harlowjahttps://docs.python.org/3/library/threading.html#barrier-objects is the other stuff that is useful20:18
harlowjaespecially for the code that waits for Xthreads to start20:18
harlowjai've got a version that i can extract that is equivalent, somewhere in taskflow, lol20:18
harlowjaya, https://github.com/openstack/taskflow/blob/master/taskflow/types/latch.py#L2220:19
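A small example of the threading.Barrier pattern harlowja mentions (python 3 only; the taskflow latch he links is a rough python 2 equivalent): the main thread only proceeds once every worker has checked in.

```python
# Example of threading.Barrier for "wait until all X threads have started".
import threading

NUM_WORKERS = 3
barrier = threading.Barrier(NUM_WORKERS + 1)  # the workers plus the main thread

def worker(n):
    print('worker %d started' % n)
    barrier.wait()        # announce "I'm up" and wait for everyone else
    # ... real work would begin here ...

threads = [threading.Thread(target=worker, args=(i,)) for i in range(NUM_WORKERS)]
for t in threads:
    t.start()

barrier.wait()   # blocks until all workers have reached their barrier.wait()
print('all workers running')
for t in threads:
    t.join()
```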
Shrewsi don't think we have need of barriers (thankfully)20:19
harlowjak20:19
*** jkilpatr has quit IRC20:20
Shrewsharlowja: it unfortunately doesn't work :(  debugging now20:28
harlowjaimpassible20:28
harlowjalol20:28
clarkbharlowja: does threading.event force a context switch?20:30
clarkbharlowja: I am assuming it does if it sleeps under the covers20:30
harlowjayes20:30
harlowjapy2.7 def had time.sleeps, lol20:30
harlowjai think they made it slightly better in py320:30
harlowjahttps://hg.python.org/cpython/file/v2.7.11/Lib/threading.py#l34320:31
harlowja3.x they got rid of that loop, lol20:31
harlowjafrom what i remember20:32
harlowjaor at least moved it down a level...20:33
Shrewsharlowja: found the bug. the running prop needs to do "not is_set"20:34
harlowjaah poops20:34
Shrewsharlowja: i'll push up the fix. trying some other things with the tests, too20:34
harlowjathought i did that, ha20:34
harlowjakk20:34
harlowjathx20:34
harlowjanot high priority, i just saw this, and had urge to change it20:34
harlowjaback to writing peer reviews :-P20:34
clarkbharlowja: I just know sleep() gets used a lot because it is the documented way to force a context switch20:34
harlowjaya20:35
*** jkilpatr has joined #zuul20:40
openstackgerritDavid Shrewsbury proposed openstack-infra/nodepool master: Use threading.Event for quicker shutdowns  https://review.openstack.org/47668320:42
openstackgerritDavid Shrewsbury proposed openstack-infra/nodepool master: Use threading.Event for quicker shutdowns  https://review.openstack.org/47668320:45
Shrewsharlowja: bug fixed and cleaned up a test. I also removed that TODO b/c I don't feel we need to bring that kind of complexity in for that.20:46
Shrewsharlowja: thanks!20:46
harlowjaShrews np :)20:47
harlowjathe hard part imho becomes when u start to have a thing that u call into that u want to pass that event into to stop that thing early20:48
harlowjau start to end up with coroutines and eventlet at some point, lol20:49
harlowjalol20:49
harlowja(or something like eventlet/asyncio...)20:49
Shrewsharlowja: ugh, please don't mention asyncio. i'm using it for some of our websocket streaming stuff and it's causing me to have nightmares20:49
harlowjalol20:49
harlowjai bet20:49
harlowjai would have just liked if eventlet (at least the concept) was mainlined into python20:50
harlowjai think that would have gone over better20:50
harlowjabut meh, is what it is, lol20:50
openstackgerritDavid Shrewsbury proposed openstack-infra/nodepool feature/zuulv3: Use threading.Event for quicker shutdowns  https://review.openstack.org/47670520:52
Shrewsthe feature/zuulv3 cherry pick ^^^20:52
Shrewswant to carry that over to the launcher process, too20:53
*** dmellado has quit IRC21:00
*** dmellado has joined #zuul21:03
openstackgerritDavid Shrewsbury proposed openstack-infra/nodepool feature/zuulv3: Use Event for faster shutdown of launcher  https://review.openstack.org/47670921:07
harlowjacools21:32
*** dmellado has quit IRC22:33
*** dmellado has joined #zuul22:37
*** hasharAway has quit IRC22:48
*** dmellado has quit IRC23:03
*** dmellado has joined #zuul23:06
*** dmellado has quit IRC23:15
*** dmellado has joined #zuul23:18
*** dmellado has quit IRC23:31
jlkadam_g: thanks for the comments re GHE. Are you all running another zuul somewhere?23:43
*** dmellado has joined #zuul23:51
SpamapSjlk: they are23:52
SpamapSjlk: They're toying with a BonnyCI for an IBM internal project23:52
jlkneat!23:53
SpamapSalong with mestery23:53
SpamapSso it's all in the family :)23:53
jlkcraziness.23:54
*** dmellado has quit IRC23:55
jamielennoxlol23:57
jamielennoxwell, it shows value for bonny long term in ibm23:58
