Friday, 2017-03-10

*** hashar has quit IRC  [00:18]
<openstackgerrit> John L. Villalovos proposed openstack-infra/zuul master: Only depend-on open changes  https://review.openstack.org/254957  [00:20]
*** jamielennox is now known as jamielennox|away  [00:36]
*** jamielennox|away is now known as jamielennox  [00:37]
<openstackgerrit> Merged openstack-infra/zuul master: Only depend-on open changes  https://review.openstack.org/254957  [00:40]
*** saneax is now known as saneax-_-|AFK  [00:55]
<openstackgerrit> James E. Blair proposed openstack-infra/zuul feature/zuulv3: WIP: Improve job dependencies using graph instead of tree  https://review.openstack.org/443973  [01:26]
<openstackgerrit> James E. Blair proposed openstack-infra/zuul feature/zuulv3: Update test config for job graph syntax  https://review.openstack.org/444055  [01:26]
<jeblair> that's an auto-generated update to the test config files.  [01:27]
<openstackgerrit> Jesse Keating proposed openstack-infra/zuul feature/zuulv3: Support for github commit status  https://review.openstack.org/444060  [01:47]
<jamielennox> in zuul 2.5, is there a way from the CLI to stop a job and mark it failed?  [03:24]
*** saneax-_-|AFK is now known as saneax  [04:09]
*** saneax is now known as saneax-_-|AFK  [05:51]
*** saneax-_-|AFK is now known as saneax  [05:59]
*** saneax is now known as saneax-_-|AFK  [06:27]
*** bhavik1 has joined #zuul  [07:16]
*** saneax-_-|AFK is now known as saneax  [07:20]
*** Cibo has joined #zuul  [07:42]
*** Cibo_ has quit IRC  [07:45]
*** Cibo has quit IRC  [07:46]
*** dmsimard has quit IRC  [08:12]
*** bhavik1 has quit IRC  [08:14]
*** dmsimard has joined #zuul  [08:35]
*** hashar has joined #zuul  [10:05]
*** Cibo has joined #zuul  [10:17]
*** Cibo has quit IRC  [10:22]
*** openstackgerrit has quit IRC  [10:33]
*** bhavik1 has joined #zuul  [10:41]
*** bhavik1 has quit IRC  [11:02]
*** saneax is now known as saneax-_-|AFK  [11:28]
*** Cibo has joined #zuul  [12:29]
*** mptacekx has joined #zuul  [12:30]
<mptacekx> Hi zuul, can someone please suggest a good nodepool version for zuul v2.5.x?  [12:31]
*** hashar is now known as rahsah  [12:32]
<mptacekx> we tried the latest nodepool, 0.4.0, but we're seeing some strange behavior, like VMs building/deleting in never-ending loops from time to time ...  [12:32]
*** Cibo has quit IRC  [12:34]
*** yolanda has quit IRC  [12:46]
*** yolanda has joined #zuul  [13:06]
<pabelanger> jamielennox: no CLI command  [13:12]
<pabelanger> mptacekx: we are using 0.4.0 in production for openstack, what do the logs say?  [13:12]
<pabelanger> sounds like your DIBs are failing to build, they get deleted, and start again  [13:13]
<mptacekx> pabelanger: actually it's about VMs. we noticed that from time to time they are in the building state even though they have already been available and manually accessible for several minutes. sometimes it helps if I connect to them manually; then nodepool either passes its check or the VM goes to the delete state, so some change is triggered  [13:15]
<pabelanger> mptacekx: check the debug log for nodepool, it will say why it is deleting the VMs  [13:17]
<mptacekx> pabelanger: usually it fails on OpenStackCloudTimeout: Timeout waiting for the server to come up.  [13:19]
<mptacekx> and occasionally on Exception: Unable to run ready script  [13:19]
<mptacekx> how often should nodepool be checking that?  [13:20]
<pabelanger> mptacekx: so you are hitting 2 issues. first, you fail to boot a VM on your cloud.  For that, you'll need to debug the cloud side and see why that is; it could be lack of IPs, resources, etc.  [13:21]
<pabelanger> as for the ready-script, if you have nodepool set up to use it, it will be run for every node launch.  Again, you'll have to debug the output of your ready-script to see what it is doing; common issues are networking to the VM and possible DNS issues from the VM to the internet  [13:23]
<mptacekx> pabelanger: I think it's a nodepool issue; the VM is spawned properly and I can access it myself, but nodepool can't. It's not a functional problem blocking all attempts; it happens e.g. on 1 of 5 VMs when they are spawned in parallel  [13:24]
*** Cibo_ has joined #zuul  [13:24]
<pabelanger> mptacekx: do you mind posting your debug logs? That will tell us if it is nodepool or the cloud  [13:25]
<pabelanger> it is possible you might need to update your clouds.yaml file  [13:25]
*** hashar has joined #zuul  [13:26]
<pabelanger> also, nodepool is hard on clouds, so launching 5 VMs might be an issue too.  An easy way to test that is to limit jobs to a single VM to start  [13:26]
<mptacekx> this might help a lot, do you know how to do that? I think it's simply failing when too many attempts are in parallel  [13:27]
<pabelanger> mptacekx: the max-servers setting  [13:27]
*** Cibo_ has quit IRC  [13:28]
<mptacekx> pabelanger: this will just set a hard limit on the number of servers; I wondered if there is any way to tell nodepool not to spawn 5 VMs at once  [13:29]
*** yolanda has quit IRC  [13:29]
<mptacekx> or are you suggesting to slowly increase the max-servers limit to avoid that?  [13:29]
*** yolanda has joined #zuul  [13:30]
<pabelanger> right, I would decrease max-servers for now, to see if you are having cloud issues.  [13:30]
<pabelanger> but yes, it is a hard limit for the cloud  [13:30]
<pabelanger> otherwise, just have nodepool keep doing what it does; it is pretty aggressive about booting nodes. Eventually it will boot something :)  [13:31]
<mptacekx> thanks, I will explore that option. There is definitely something wrong in the cloud, but I thought there was a second issue in nodepool itself, since, as I mentioned, it times out accessing a VM which is normally accessible. Is there any way to debug that apart from the logs in /var/log/nodepool/*?  [13:33]
<mptacekx> All I have is the nodepool timeout in those logs  [13:33]
<pabelanger> mptacekx: we also have a rate setting you could try playing with. Time, in seconds, to wait between operations for a provider  [13:33]
<mptacekx> pabelanger: rate setting? please elaborate a little more  [13:34]
<pabelanger> mptacekx: that to me sounds like a networking issue; once the VM is online, nodepool cannot SSH into it.  [13:34]
<pabelanger> mptacekx: see: https://docs.openstack.org/infra/nodepool/configuration.html#provider  [13:35]
<mptacekx> how often does nodepool retry? each minute?  [13:35]
<pabelanger> trying what?  [13:35]
<mptacekx> for a VM to pass the ssh check and turn from the build state to the ready state  [13:35]
<pabelanger> look at boot-timeout, I believe the default is 60 secs  [13:36]
<mptacekx> thanks a lot, I will play with all of that stuff  [13:37]
<pabelanger> sure, np  [13:37]
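
(The knobs discussed above -- max-servers, rate, and boot-timeout -- all live under a provider entry in nodepool's YAML config. A minimal sketch of a tuned provider stanza, loaded with Python just to show the shape; the provider and cloud names and the values are invented, and the exact keys should be checked against the nodepool configuration docs linked above.)

    import yaml

    # Hypothetical nodepool.yaml fragment illustrating the settings discussed
    # above; "my-cloud" is a made-up provider that must match clouds.yaml.
    provider_yaml = """
    providers:
      - name: my-cloud
        cloud: my-cloud
        rate: 1.0           # seconds to wait between API operations
        boot-timeout: 120   # raise from the 60-second default if boots are slow
        max-servers: 1      # start small to rule out cloud-side capacity issues
    """

    config = yaml.safe_load(provider_yaml)
    print(config["providers"][0]["max-servers"])
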
<dmsimard> So I never (ever) wrote a spec before, but I took a shot at writing one for "a job reporting interface" in Zuul -- hope it makes sense: https://review.openstack.org/#/c/444088/  [13:48]
*** openstackgerrit has joined #zuul  [14:13]
<openstackgerrit> Paul Belanger proposed openstack-infra/zuul feature/zuulv3: Add reporter for Federated Message Bus (fedmsg)  https://review.openstack.org/426861  [14:13]
<openstackgerrit> David Shrewsbury proposed openstack-infra/nodepool feature/zuulv3: Re-enable test_alien_list_fail and alien_list cmd  https://review.openstack.org/443714  [14:58]
<mordred> dmsimard: nice  [15:02]
<openstackgerrit> Merged openstack-infra/nodepool feature/zuulv3: Add 'requestor' to NodeRequest model  https://review.openstack.org/443151  [15:07]
<jeblair> dmsimard: thanks, that looks good.  i'll reply with some comments soon.  can you point me at an ara server install i can browse?  [15:08]
<dmsimard> jeblair: ara is available in openstack-ansible and kolla-ansible jobs already -- let me give you a few examples  [15:08]
<jeblair> dmsimard: no, i meant one running on a server, not the statically-generated site  [15:09]
<openstackgerrit> Merged openstack-infra/nodepool feature/zuulv3: Add back statsd reporting  https://review.openstack.org/443605  [15:09]
<openstackgerrit> Merged openstack-infra/nodepool feature/zuulv3: Remove old/dead classes  https://review.openstack.org/443644  [15:09]
<dmsimard> jeblair: sure, but there's no difference, though  [15:09]
<dmsimard> there's no disparity in features  [15:09]
<jeblair> dmsimard: well, i wondered what you see when you go to "http://ara.example.com/" :)  [15:10]
<openstackgerrit> David Shrewsbury proposed openstack-infra/nodepool feature/zuulv3: Remove job_list, job_create, job_delete cmds/tests  https://review.openstack.org/444344  [15:10]
<dmsimard> jeblair: I mean, there's http://ara-demo.dmsimard.com/  [15:10]
<dmsimard> although hang on, that one isn't up to date  [15:10]
<openstackgerrit> Merged openstack-infra/nodepool feature/zuulv3: Add leaked instance cleanup  https://review.openstack.org/443690  [15:11]
<jeblair> mordred: can you do a pass of zuul changes please?  some of mine have been sitting for a few days.  [15:12]
<jeblair> mordred: also, feedback on 443973 and 443985 would be appreciated  [15:13]
<openstackgerrit> Monty Taylor proposed openstack-infra/nodepool feature/zuulv3: Stop json-encoding the nodepool metadata  https://review.openstack.org/410812  [15:16]
<mordred> jeblair: yup - was just doing the walk on the nodepool changes  [15:16]
<jeblair> mordred: w00t  [15:17]
*** mptacekx has quit IRC  [15:17]
<dmsimard> jeblair: okay, I updated http://ara-demo.dmsimard.com/ to the latest version (there had been some bugfixes/improvements since I last updated it)  [15:17]
<mordred> jeblair: gah. I could have sworn I reviewed some of these already  [15:22]
<jeblair> mordred, Shrews: in 410812 i wonder if, instead of removing snapshot_image_id, we should update it to record the zookeeper image upload id?  that's what it actually is in the current version (recall 'snapshot ~= upload')  [15:24]
<openstackgerrit> David Shrewsbury proposed openstack-infra/nodepool feature/zuulv3: Re-enable test_image_upload_fail  https://review.openstack.org/444349  [15:24]
<jeblair> mordred, Shrews: otoh, since there's an arbitrary limit on the number of metadata fields, perhaps we should drop it.  [15:27]
<pabelanger> Shrews: yay for statsd things  [15:27]
<Shrews> jeblair: i had a similar question on node_id. i don't currently populate it. should we?  [15:27]
<Shrews> pabelanger: i'm sure we'll need to adjust the keys  [15:28]
<openstackgerrit> Merged openstack-infra/zuul feature/zuulv3: Add a test for a broken config on startup  https://review.openstack.org/441499  [15:31]
<pabelanger> Shrews: we can do some testing this morning, if you like. Get some data into graphite.o.o then update dashboards as needed  [15:31]
<Shrews> pabelanger: i'll leave that for you if you want. i'm going to re-enable all the things, then FINALLY remove mysql   \o/  [15:32]
<pabelanger> Shrews: sure, I'll see if the code is landed and restart nl01.o.o  [15:33]
<pabelanger> also, Yay for database removal  [15:33]
<jeblair> Shrews: the old leak algorithm used it to look up the node record to see if it's known.  you switched to scanning all of the nodes and checking server id.  if we switched back to the old method, we would end up retrieving fewer znode records (since we would only pull the ones for our provider).  if we want to stick with the full scan, we can remove it.  [15:33]
<jeblair> pabelanger: please don't have new nodepool send stats to statsd if it has the same provider names as production nodepool.  [15:34]
<openstackgerrit> David Shrewsbury proposed openstack-infra/nodepool feature/zuulv3: Re-enable test_nodepool_osc_config_reload  https://review.openstack.org/444356  [15:34]
<jeblair> pabelanger: we'll end up overwriting our production data.  [15:34]
<jeblair> pabelanger: as a solution, we can either adopt the clarkb method of using different provider names (and performing extra image uploads), or you can configure a statsd prefix so all the new stuff goes to a different place  [15:36]
<Shrews> jeblair: hrm, lemme think that one over. checking external_id does seem safer, but using the node id would be more efficient.  [15:36]
<pabelanger> jeblair: Ah, right. good call.  [15:36]
<pabelanger> let me see how to configure a prefix  [15:36]
<jeblair> pabelanger: 'grep -i statsd_prefix' i think will turn up some things  [15:37]
<openstackgerrit> David Shrewsbury proposed openstack-infra/nodepool feature/zuulv3: Re-enable TestWebApp tests  https://review.openstack.org/444358  [15:42]
<pabelanger> jeblair: looks like export STATSD_PREFIX is a thing  [15:44]
<pabelanger> reading up more now  [15:44]
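
(For reference, a sketch of the env-var approach pabelanger lands on here, assuming the Python 'statsd' client package, whose env defaults read STATSD_HOST/STATSD_PORT/STATSD_PREFIX; the host and prefix values below are made up.)

    import os

    # Hypothetical settings so the zuulv3 launcher's keys land away from the
    # production nodepool hierarchy in graphite.
    os.environ.setdefault("STATSD_HOST", "graphite.example.org")
    os.environ.setdefault("STATSD_PREFIX", "nodepool-zuulv3")

    # The statsd package builds a client from those variables at import time.
    from statsd.defaults.env import statsd

    # Emitted as "nodepool-zuulv3.nodepool.launch.ready" rather than the
    # production "nodepool.launch.ready" key.
    statsd.incr("nodepool.launch.ready")
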
<openstackgerrit> Merged openstack-infra/zuul feature/zuulv3: Provide file locations of config syntax errors  https://review.openstack.org/441606  [15:54]
<openstackgerrit> Merged openstack-infra/zuul feature/zuulv3: Clarify Job/Build/BuildSet docstrings  https://review.openstack.org/435948  [15:56]
<mordred> jeblair: the job graph change looks good to me - other than not passing tests - but who needs tests  [16:00]
<mordred> jeblair: +1 on the spec change  [16:02]
<openstackgerrit> Monty Taylor proposed openstack-infra/nodepool feature/zuulv3: Rename osc to occ in tests  https://review.openstack.org/444383  [16:05]
<jeblair> mordred: cool, i've just about finished cleaning up the tests  [16:06]
*** yolanda has quit IRC  [16:06]
<mordred> jeblair: awesome  [16:06]
<mordred> Shrews: ^^ I just submitted a meaningless followup to one of your patches  [16:06]
*** yolanda has joined #zuul  [16:06]
<pabelanger> mordred: soooo, how much longer until we can finger zuulv3 :)  [16:07]
<mordred> pabelanger: oh right. I need to finish those patches  [16:07]
<Shrews> mordred: stellar  [16:07]
<pabelanger> mordred: I'm happy to help too  [16:08]
*** yolanda has quit IRC  [16:11]
<Shrews> hrm... why is this node failing and locked? http://logs.openstack.org/44/444344/1/check/gate-dsvm-nodepool/7ab80aa/console.html#_2017-03-10_16_05_48_329818  [16:16]
*** yolanda has joined #zuul  [16:18]
<Shrews> ok, this should not be happening: http://logs.openstack.org/44/444344/1/check/gate-dsvm-nodepool/7ab80aa/logs/screen-nodepool.txt.gz#_2017-03-10_15_48_15_721  [16:19]
<mordred> Shrews: I blame jaypipes  [16:20]
<Shrews> mordred: how random  [16:23]
<Shrews> pabelanger: you may want to hold off on updating nl01 until we investigate this odd failure  [16:23]
<pabelanger> Shrews: ack  [16:24]
<Shrews> and i'm a bit stumped atm  [16:24]
<Shrews> oh! i know. it doesn't have an external id yet, so it's racing  [16:25]
<Shrews> wheeeeee  [16:26]
*** yolanda has quit IRC  [16:26]
<Shrews> i blame EVERYONE else for not catching my mistake  [16:26]
*** yolanda has joined #zuul  [16:27]
<Shrews> i think we'll have to do jeblair's suggestion to store node_id in metadata and check that against ZK  [16:27]
<pabelanger> Shrews: we should also think about merging master into feature/zuulv3 for nodepool. Will pick up some testing fixes specifically. eg: we shouldn't be building fedora-25 for the dsvm job  [16:28]
<mordred> pabelanger: with the deletions Shrews has been doing - it might be easier at this point to cherry-pick relevant testing fixes  [16:28]
<mordred> pabelanger: but it's worth a try for sure  [16:29]
<openstackgerrit> James E. Blair proposed openstack-infra/zuul feature/zuulv3: WIP: Improve job dependencies using graph instead of tree  https://review.openstack.org/443973  [16:29]
<openstackgerrit> James E. Blair proposed openstack-infra/zuul feature/zuulv3: Update test config for job graph syntax  https://review.openstack.org/444055  [16:29]
<pabelanger> mordred: agree  [16:29]
<Shrews> also, we could skip instances whose state is BUILDING  [16:30]
<jeblair> Shrews: aha.  i don't recall if that's the reason we did that originally or not.  it may be.  at any rate, when you make that change, it's probably worth a comment so that we don't have to learn this lesson (yet?) again.  :)  [16:33]
<mordred> jeblair: the best lessons are the lessons you learn over and over again  [16:34]
<openstackgerrit> Paul Belanger proposed openstack-infra/nodepool feature/zuulv3: Fix fedora 25 pause bug with devstack  https://review.openstack.org/444400  [16:34]
<pabelanger> mordred: Shrews: we actually just need ^  [16:34]
<mordred> ++  [16:35]
<Shrews> jeblair: k  [16:35]
<Shrews> Got a lunch appointment now, so will put up a fix when I return  [16:36]
<mordred> Shrews: silly food eating  [16:36]
<Shrews> mordred: it is needed energy-prep for tonight's game  [16:37]
<mordred> Shrews: yeah. good point  [16:38]
<mordred> I need to find more energy for that myself  [16:38]
<mordred> since each team won on their home court so far - I guess the question is - is NYC truly a second home court for Duke?  [16:38]
<jeblair> pabelanger: tobiash_ has some good comments on 438281.  since that's what people seem to favor, do you want to address those and we can move the tox jobs along?  [16:41]
<jeblair> mordred: it's just like duke to have a second home in ny.  [16:42]
<mordred> jeblair: where else would they store all of their i-banker alums?  [16:48]
*** Cibo_ has joined #zuul  [16:50]
<jeblair> mordred, tobiash_: 443976 (graph) + 444055 (test config file updates) pass tests when combined together now.  [16:52]
<pabelanger> jeblair: toabctl: thanks, left reply / question.  [16:52]
<jeblair> so i think i'd like to get some provisional reviews on the first one and the spec, and then we're ready to merge, i'll squash them.  [16:53]
<jeblair> *when* we're ready to merge, that is  [16:53]
<mordred> jeblair: fwiw, I do not think squashing those two will make review harder - since there is only one file shared between the two  [16:59]
<openstackgerrit> Paul Belanger proposed openstack-infra/zuul feature/zuulv3: Add generic tox job (multiple playbooks)  https://review.openstack.org/438281  [17:02]
<openstackgerrit> Paul Belanger proposed openstack-infra/zuul feature/zuulv3: Add generic tox job (multiple playbooks)  https://review.openstack.org/438281  [17:03]
<jeblair> mordred: true... though there are a lot of files in the second one.  [17:03]
<mordred> that is true. there are a lot of files in the second one  [17:03]
<jeblair> pabelanger: are we going to fix the 'all' problem?  [17:04]
<openstackgerrit> Paul Belanger proposed openstack-infra/zuul feature/zuulv3: Add generic tox job (multiple playbooks)  https://review.openstack.org/438281  [17:04]
*** Cibo_ has quit IRC  [17:05]
<jeblair> pabelanger: looks like it.  :)  some of those still have xenial though  [17:05]
<pabelanger> jeblair: yes, let me find the error message  [17:05]
<openstackgerrit> Paul Belanger proposed openstack-infra/zuul feature/zuulv3: Add generic tox job (multiple playbooks)  https://review.openstack.org/438281  [17:06]
<pabelanger> actually, ^ should expose the problem in the ansible log  [17:06]
<pabelanger> it will be interesting moving forward on the tox jobs; my thought is that setting hosts: ubuntu-xenial in the playbook means a little more control over which hosts jobs run on.  I think it's possible now for a project to override the nodeset for the tox-py27 job and run on centos-7, let's say  [17:09]
<pabelanger> I guess the same is true in that projects can redefine the job and use a new playbook  [17:09]
<jeblair> pabelanger: we can set "final: true" on any job where we don't want people to override things like nodesets  [17:10]
<pabelanger> great  [17:11]
<jeblair> (though, in general, zuulv3 is a bit more trusting that people want to do the right thing)  [17:11]
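
(A minimal sketch of the pinning jeblair describes, assuming Zuul v3's 'final' job attribute; the job name and nodeset below just mirror the tox-py27 / ubuntu-xenial example from the discussion, not any project's real config, and the nodeset syntax may have differed at this stage of zuulv3.)

    import yaml

    # Hypothetical zuul.yaml fragment: with final set, projects using this job
    # cannot override attributes such as the nodeset.
    job_yaml = """
    - job:
        name: tox-py27
        final: true
        nodeset:
          nodes:
            - name: ubuntu-xenial
              label: ubuntu-xenial
    """

    print(yaml.safe_load(job_yaml)[0]["job"]["final"])
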
<pabelanger> ya  [17:12]
<pabelanger> either way, Yay, we have playbooks  [17:12]
*** hashar has quit IRC  [17:20]
<openstackgerrit> James E. Blair proposed openstack-infra/nodepool feature/zuulv3: Remove allocator  https://review.openstack.org/444425  [17:23]
<openstackgerrit> James E. Blair proposed openstack-infra/nodepool feature/zuulv3: Remove jenkins_manager  https://review.openstack.org/444427  [17:25]
<jeblair> Shrews: approved https://review.openstack.org/443714 with comments  [17:26]
<pabelanger> +3 on deletes too  [17:28]
<openstackgerrit> Merged openstack-infra/nodepool feature/zuulv3: Re-enable test_alien_list_fail and alien_list cmd  https://review.openstack.org/443714  [17:29]
<openstackgerrit> Merged openstack-infra/nodepool feature/zuulv3: Remove job_list, job_create, job_delete cmds/tests  https://review.openstack.org/444344  [17:31]
<openstackgerrit> Merged openstack-infra/nodepool feature/zuulv3: Re-enable test_image_upload_fail  https://review.openstack.org/444349  [17:32]
<pabelanger> jeblair: Shrews: we have 5 locked ready nodes on nl01.o.o, do you mind looking?  [17:32]
<jeblair> sure thing  [17:32]
<openstackgerrit> Merged openstack-infra/nodepool feature/zuulv3: Re-enable test_nodepool_osc_config_reload  https://review.openstack.org/444356  [17:33]
<openstackgerrit> Merged openstack-infra/nodepool feature/zuulv3: Re-enable TestWebApp tests  https://review.openstack.org/444358  [17:33]
<openstackgerrit> Merged openstack-infra/nodepool feature/zuulv3: Rename osc to occ in tests  https://review.openstack.org/444383  [17:34]
<jeblair> pabelanger: http://paste.openstack.org/show/602292/  [17:35]
<jeblair> pabelanger: the dump command has a section where it shows you what sessions hold ephemeral nodes  [17:35]
<jeblair> pabelanger: you can see that all the node locks are held by session 0x15aa4882759000c  [17:35]
<jeblair> pabelanger: and conveniently, that session also holds this node: /nodepool/launchers/nl01-5544-ProviderWorker.infracloud-chocolate  [17:37]
<jeblair> pabelanger: we know that launchers, when they come online, create an ephemeral node with their name so they know which other launchers are active  [17:37]
<jeblair> pabelanger: but it's convenient for us because we can see that it is the launcher which holds those node locks, without having to track down that session id some other way :)  [17:37]
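
(The 'dump' jeblair pastes above is ZooKeeper's four-letter admin command, which lists client sessions together with the ephemeral znodes -- including locks -- that each session owns. A hedged sketch of pulling the same output with kazoo; the host/port are assumptions, and the full report only comes from the ZK leader.)

    from kazoo.client import KazooClient

    zk = KazooClient(hosts="127.0.0.1:2181")  # hypothetical ZooKeeper endpoint
    zk.start()
    # Maps each session id to the ephemeral paths it holds, e.g. node locks
    # and the launcher's registration znode.
    print(zk.command(b"dump"))
    zk.stop()
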
<jeblair> pabelanger: it rather looks like the launcher has frozen?  [17:38]
<jeblair> pabelanger: i don't see any log entries for the past 30 mins  [17:38]
<mordred> jeblair: I think launchers freezing sounds unfun  [17:38]
<jeblair> (btw, we should rename nodepoold to nodepool-launcher)  [17:39]
<mordred> jeblair: ++  [17:39]
<jeblair> i'm going to sigusr2 it to get a stack dump (i hope)  [17:39]
<jeblair> yay, that worked  [17:39]
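
(A rough sketch of the kind of SIGUSR2 handler being relied on here -- zuul and nodepool install something along these lines to dump every thread's stack without stopping the daemon; this is not the project's exact code.)

    import signal
    import sys
    import threading
    import traceback

    def stack_dump_handler(signum, frame):
        # Print a traceback for every live thread so a hung daemon can be
        # inspected in place.
        names = {t.ident: t.name for t in threading.enumerate()}
        for ident, stack in sys._current_frames().items():
            print("Thread %s (%s):" % (names.get(ident, "?"), ident),
                  file=sys.stderr)
            traceback.print_stack(stack, file=sys.stderr)

    signal.signal(signal.SIGUSR2, stack_dump_handler)
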
<jeblair> okay, first of all -- did we port over that paramiko fix to v3? :)  [17:40]
<pabelanger> not sure  [17:41]
* jeblair sorts relevant/irrelevant threads  [17:41]
<SpamapS> mordred: jeblair: fyi, I'm about to start writing a launcher security spec in earnest. I hope to have a 1st draft shortly. If you have points you think we haven't discussed, now's a good time to poke them into my brain.  [17:42]
<openstackgerrit> Paul Belanger proposed openstack-infra/nodepool feature/zuulv3: Add destructor to SSHClient  https://review.openstack.org/444433  [17:43]
<pabelanger> cherry-pick of the paramiko fix ^  [17:44]
<jeblair> SpamapS: ++. i *think* we hit all the high points in chats at the ptg.  [17:44]
<jeblair> pabelanger: thanks!  [17:44]
<SpamapS> jeblair: me too, just making sure so I can reduce edits. :)  [17:47]
<openstackgerrit> James E. Blair proposed openstack-infra/nodepool feature/zuulv3: Handle exception edge cases in node launching  https://review.openstack.org/444437  [17:54]
<pabelanger> jeblair: so, back to locks and zookeeper. If I understand correctly, for some reason nodepool-launcher didn't unlock the node. Which is what I see in the logs too  [17:55]
<jeblair> Shrews, pabelanger: i see the problem.  the providerworker is responsible for determining when a launch is complete and releasing the node locks.  however, it does that *after* starting new launches and, if needed, pausing new launches.  the way it pauses is to busy-wait right in the middle of the code path between starting new launches and finalizing completed ones.  in other words: it is paused indefinitely, waiting for nodes to be released.  ...  [17:58]
<jeblair> ... they never will be, because everyone is waiting on it to release them.  [17:58]
<jeblair> Shrews, pabelanger: in other other words, we may need to move the work that happens in _removeCompletedHandlers out of the ProviderWorker thread.  [17:59]
<jeblair> i will work on a test case for this  [17:59]
<mordred> jeblair: nice catch  [18:01]
<pabelanger> I see  [18:02]
<openstackgerrit> Merged openstack-infra/nodepool feature/zuulv3: Remove allocator  https://review.openstack.org/444425  [18:11]
<Shrews> jeblair: ick  [18:24]
<jeblair> Shrews: i'm making good progress on the test (it will take a while due to the complications you pointed out previously), but i haven't started on a solution yet.  [18:25]
<Shrews> that's such a silly mistake. Can't think of a good solution off the top of my head  [18:28]
<Shrews> maybe instead of waiting to fulfill the node set, short-circuit the fulfillment and release the node set we have, trying again later?  [18:30]
<Shrews> jeblair: also, regarding your comments on 443714... removing the database stuff and unused files was going to be last on my list after re-enabling all tests.  [18:34]
<jeblair> Shrews: that will cause large node requests (where large means >1) to starve at the expense of smaller ones when all providers are near quota  [18:34]
<Shrews> ah yeah, don't want that  [18:35]
<Shrews> pabelanger: oops, i re-enabled the image upload fail test and forgot we needed this: https://review.openstack.org/435481  [18:39]
<Shrews> mordred: any chance you can +3 https://review.openstack.org/435481 for us?  [18:39]
<mordred> Shrews: yup  [18:46]
<openstackgerrit> Merged openstack-infra/nodepool feature/zuulv3: Disable CleanupWorker thread for test_image_upload_fail  https://review.openstack.org/435481  [18:52]
<Shrews> jeblair: congratulations. you encountered our first configuration file disagreement test failure: http://logs.openstack.org/27/444427/1/check/nodepool-coverage-ubuntu-xenial/185e0db/console.html#_2017-03-10_17_30_07_610345  [18:54]
<Shrews> UploadWorker working from the old config, CleanupWorker working from the new  [18:54]
<Shrews> i don't know how to eliminate that totally  [18:55]
<openstackgerrit> James E. Blair proposed openstack-infra/nodepool feature/zuulv3: Add a failing test of node assignment at quota  https://review.openstack.org/444462  [18:55]
<jeblair> Shrews: ^  [18:56]
<openstackgerrit> Jesse Keating proposed openstack-infra/zuul feature/zuulv3: Merge pull requests from github reporter  https://review.openstack.org/444463  [18:59]
* SpamapS learning about the dark corners of cgroups, containers, and selinux.  [19:04]
<Shrews> jeblair: can haz node_quota.yaml file?  [19:04]
<mordred> SpamapS: fun for you!  [19:05]
<jeblair> derp  [19:05]
<openstackgerrit> James E. Blair proposed openstack-infra/nodepool feature/zuulv3: Add a failing test of node assignment at quota  https://review.openstack.org/444462  [19:06]
<jeblair> Shrews: ^  [19:06]
<dmsimard> SpamapS: even the bright corners of those are scary, good luck sir  [19:06]
<SpamapS> dmsimard: so true :)  [19:09]
<jeblair> Shrews: i think we may want shared locks on builds for that.  get a (shared) read lock on an image to perform an upload, get a write lock on the image to delete it.  [19:15]
<Shrews> jeblair: yeah, well... kazoo doesn't have that  [19:15]
<jeblair> Shrews: https://zookeeper.apache.org/doc/r3.1.2/recipes.html#Shared+Locks  [19:15]
<jeblair> doesn't look too hard  [19:16]
<Shrews> jeblair: https://github.com/python-zk/kazoo/pull/306  [19:16]
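
(For context, a sketch of the usage jeblair is after, assuming kazoo eventually grows the shared-lock recipe from that PR with the ReadLock/WriteLock names settled on later in this log; these classes are not in a released kazoo at the time of this conversation, and the image path below is invented.)

    from kazoo.client import KazooClient
    from kazoo.recipe.lock import ReadLock, WriteLock  # assumed future recipe

    zk = KazooClient(hosts="127.0.0.1:2181")
    zk.start()

    build = "/nodepool/images/ubuntu-xenial/builds/0000000001"  # hypothetical

    # Any number of uploads may hold the shared read lock at the same time.
    with ReadLock(zk, build):
        pass  # perform an image upload

    # Deleting the build takes the exclusive write lock, so it waits for all
    # in-flight uploads (readers) to finish first.
    with WriteLock(zk, build):
        pass  # delete the image build

    zk.stop()
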
<Shrews> at least mordred approved the PR!  :)  [19:17]
<jeblair> i read that as mordred volunteering to fix it  [19:17]
<mordred> heh  [19:18]
<jeblair> after lunch i will look at what is required to get that pr into shape  [19:18]
<mordred> jeblair: I think mostly it's just about yelling at harlowja right?  [19:19]
<jeblair> (before lunch, i will try to figure out how to use github)  [19:19]
<harlowja> lol  [19:19]
<harlowja> whatt  [19:19]
<mordred> jeblair: good luck ... it's a pretty shitty ui  [19:19]
<Shrews> harlowja: the kazoo shared locks thing came up again  [19:20]
<mordred> harlowja: we've hit the point where https://github.com/python-zk/kazoo/pull/306 is gonna be important - so jeblair is going to work on it  [19:20]
<harlowja> uh oh  [19:20]
<harlowja> sweet  [19:20]
<pabelanger> I too like to live dangerously  [19:20]
<harlowja> mordred: bbangert is more senior in that library than me, so once he's happy i'm happy  [19:20]
<harlowja> then i can click merge  [19:20]
<harlowja> lol  [19:20]
<mordred> jeblair: the fun part is that you can't submit patches to that PR - you'll have to pull those patches into your own repo and put up a completely different pr  [19:20]
<jeblair> i what?  [19:21]
<jeblair> how do i collaborate?  [19:21]
<mordred> jeblair: you don't  [19:21]
<mordred> you fork  [19:21]
<jeblair> but i thought github was about collaborating with people?  [19:21]
<mordred> github is all about celebrating the individual ego  [19:21]
<harlowja> none of that collaboration crap  [19:21]
<harlowja> lol  [19:21]
<jeblair> harlowja: you may get an email with a diff.  [19:21]
<harlowja> not sure if i can update someone else's PR either  [19:21]
<harlowja> lol  [19:21]
<Shrews> jeblair: that seems like a LHF task. i'd personally rather see you on zuul things and maybe get a volunteer for the kazoo patch  [19:22]
<mordred> nope. the only way to update a PR is to push more patches to the branch the PR is a request to merge  [19:22]
<Shrews> jeblair: on the other hand, i'm almost done with nodepool, so....  [19:22]
<mordred> Shrews offers to jump on the GH grenade  [19:22]
<harlowja> i already jumped on the manifesto grenade  [19:23]
<harlowja> lol  [19:23]
<Shrews> well, i'm hoping to NOT have to... just would rather see jeblair's scarce time better used  [19:23]
<Shrews> doesn't SpamapS have some new resources to point at things like that?  ;)  [19:25]
<Shrews> jeblair: mordred: SpamapS: I propose we put this as a topic for the next zuul meeting and see who has the time to see that PR through. Unless Jim just REALLY wants to work on it.  [19:31]
<jeblair> Shrews: i'm curious enough to spend a few minutes on it, but i'll give myself a short timeout.  :)  [19:31]
<Shrews> jeblair: fair enough  [19:31]
*** bhavik1 has joined #zuul  [19:33]
<jeblair> "test_dirty_sock"  [19:34]
*** bhavik1 has quit IRC  [19:37]
<jeblair> Shrews: the only ideas coming to my head so far for the deadlock are: 1) have a new thread-per-provider to handle the nodelauncher poll/cleanup.  2) set an attribute on the providerworker indicating we are paused so we can proceed through the main loop without accepting new requests.  [19:37]
<Shrews> jeblair: yeah. i'm going to experiment with #2  [19:39]
<Shrews> jeblair: wow. just what timing issues has your https://review.openstack.org/444427 review exposed? New failures each run  [19:43]
* Shrews tries hard to avoid looking at two problems at once  [19:43]
<SpamapS> wha hoo?  [19:44]
<SpamapS> interesting  [19:51]
<SpamapS> lxc in its original form is EOL  [19:51]
<SpamapS> lxc 2.0 is just lxd in local-socket-comm-only mode.  [19:51]
<SpamapS> (well, technically LXC 1.x is supported until 2019, but that's effectively dead to me :)  [19:51]
<openstackgerrit> K Jonathan Harker proposed openstack-infra/nodepool master: Write per-label nodepool demand info to statsd  https://review.openstack.org/246037  [19:59]
<jeblair> Shrews: that was the removal of a file that isn't used or even loaded by anything, carefully crafted to expose subtle timing errors.  apparently.  [20:07]
<SpamapS> https://review.openstack.org/444495 <-- Security spec 1st draft  [20:33]
<SpamapS> It's pretty light on details. I think we'll want to have subject matter experts weigh in on things.  [20:33]
<mordred> jesusaur: ^^ your patch there - Shrews just reworked statsd reporting in the v3 nodepool fwiw  [20:40]
<pabelanger> mordred: jeblair: speaking of statsd, any reason not to +3 444363? currently has 2 +2  [20:42]
<mordred> pabelanger: nope  [20:43]
*** hashar has joined #zuul  [20:45]
<jesusaur> Shrews: how drastically has nodepool statsd reporting changed?  [20:47]
<Shrews> jesusaur: take a look at the StatsReporter class in nodepool.py  [20:47]
<jeblair> well, the main thing relevant to that change is that the allocator is completely gone  [20:53]
<Shrews> jeblair: I think I have a fix for the deadlock. But the leaked instance race keeps biting me, so I'm going to now fix that and put that up ahead of it.  [21:09]
<jeblair> kk  [21:12]
<Shrews> jeblair: actually, i'll just put out what i have and rebase your review when i have it  [21:13]
<openstackgerrit> David Shrewsbury proposed openstack-infra/nodepool feature/zuulv3: Fix failure of node assignment at quota  https://review.openstack.org/444462  [21:13]
<Shrews> jeblair: ^^^  [21:13]
<Shrews> jeblair: tl;dr, I've made the NodeRequestHandler code re-entrant so that it maintains state across calls to run()  [21:14]
<Shrews> jeblair: when ProviderWorker is paused, it just drains the handlers that are waiting on nodes until they're all finished  [21:15]
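
(A very rough sketch -- not nodepool's real classes -- of the shape of that fix: rather than busy-waiting in the middle of the loop while still holding node locks, the worker records that it is paused and keeps draining completed handlers on every pass, so the locks everyone else is waiting on do get released.)

    class ProviderWorkerSketch:
        """Illustrative only; the helper names here are hypothetical."""

        def __init__(self):
            self.paused = False
            self.request_handlers = []   # handlers still launching nodes

        def _accept_new_requests(self):
            # Hypothetical: pull node requests from ZooKeeper and start
            # launches; return False once the provider is at quota.
            return True

        def _remove_completed_handlers(self):
            # Poll each handler; completed ones unlock and release their nodes.
            self.request_handlers = [
                h for h in self.request_handlers if not h.poll()
            ]

        def run_once(self):
            if not self.paused:
                # Hitting quota pauses intake instead of busy-waiting here.
                self.paused = not self._accept_new_requests()
            # Drain even while paused: this is what releases the locks the
            # rest of the system is waiting on, letting the pause end.
            self._remove_completed_handlers()
            if self.paused and not self.request_handlers:
                self.paused = False
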
* jeblair pauses kazoo timer and context switches  [21:16]
<Shrews> mordred: i have no idea why your nodepool metadata change is in merge conflict. it applied cleanly for me on top of current feature/zuulv3  [21:25]
<jeblair> Shrews: i get it.  i left a couple comments.  [21:31]
<Shrews> jeblair: thx  [21:39]
<openstackgerrit> David Shrewsbury proposed openstack-infra/nodepool feature/zuulv3: Stop json-encoding the nodepool metadata  https://review.openstack.org/410812  [21:41]
<openstackgerrit> David Shrewsbury proposed openstack-infra/nodepool feature/zuulv3: Use node ID for instance leak detection  https://review.openstack.org/444508  [21:41]
<Shrews> mordred: rebased for you  [21:41]
<mordred> Shrews: thanks!  [21:45]
<openstackgerrit> Jesse Keating proposed openstack-infra/zuul feature/zuulv3: support github pull reqeust labels  https://review.openstack.org/444511  [21:47]
<jeblair> harlowja: while i'm in there, would you prefer RLock/WLock to be called ReadLock/WriteLock?  [21:48]
<harlowja> doesn't matter to me  [21:48]
* jeblair paints bikesheds already painted  [21:49]
<harlowja> match python threading stuff?  [21:49]
<harlowja> that'd be fine with me  [21:49]
<harlowja> oh wait, python doesn't have it  [21:49]
<harlowja> match the other library i made that does have it, lol  [21:49]
<harlowja> ReadLock/WriteLock matches more of what i did there  [21:50]
<harlowja> in https://github.com/harlowja/fasteners/blob/master/fasteners/lock.py#L100  [21:50]
<jeblair> cool, i like that better.  i have obtained validation.  :)  [21:50]
<harlowja> u can go read the upstream python change for that  [21:50]
<harlowja> they bikeshedded all over that  [21:50]
<harlowja> http://bugs.python.org/issue8800 from what i remember  [21:51]
<harlowja> 'Seems to have fizzled out due to the intense amount of bikeshedding required.'  [21:51]
<harlowja> lol  [21:51]
<harlowja> http://bugs.python.org/issue8800#msg274795 (wasn't kidding)  [21:52]
<jeblair> wow, a complete example of the law of triviality!  https://en.wikipedia.org/wiki/Law_of_triviality  [21:53]
<jeblair> the term references the act of killing an idea with trivial arguments  [21:53]
<openstackgerrit> David Shrewsbury proposed openstack-infra/nodepool feature/zuulv3: Fix failure of node assignment at quota  https://review.openstack.org/444462  [21:53]
<jeblair> i think we usually use it to inoculate ourselves against that  [21:54]
<harlowja> ya, so i read over that bug a long time ago, was like wtf, and then just made something  [21:54]
<harlowja> too much bikeshed for me  [21:54]
<harlowja> lol  [21:54]
<harlowja> feels bad that such a patch started in 2010  [21:55]
<harlowja> :(  [21:55]
<harlowja> poor author  [21:55]
<harlowja> sometimes on such threads i want to punch the other commenters in the nuts and tell them to have some feelings  [21:57]
<harlowja> but i can't say such things on such threads  [21:57]
<harlowja> lol  [21:57]
<Shrews> Ok, I have fixes up for the big flaws found today. Going to call it a week and prepare for the sportsball things. Night all  [22:05]
<jeblair> harlowja: okay, clear your schedule.  i'm preparing a pr for you.  :)  [22:06]
<jeblair> Shrews: goodnight! happy sportsball!  [22:06]
<jeblair> harlowja: as soon as i figure out how.  you may have a few minutes.  :)  [22:07]
<harlowja> lol  [22:07]
<harlowja> counting down  [22:07]
<jeblair> harlowja: https://github.com/python-zk/kazoo/pull/419   make magic happen!  [22:11]
<harlowja> yes sir  [22:11]
<openstackgerrit> James E. Blair proposed openstack-infra/nodepool feature/zuulv3: Store a pointer to the paused node request handler  https://review.openstack.org/444520  [22:33]
<jeblair> Shrews: +2 with an option ^  [22:34]
<openstackgerrit> Merged openstack-infra/nodepool feature/zuulv3: Add destructor to SSHClient  https://review.openstack.org/444433  [22:37]
<jeblair> harlowja: hrm, it looks like the election tests are hanging  [22:44]
<harlowja> poopie  [22:44]
<jeblair> that doesn't make sense to me; i'm looking into it  [22:44]
<harlowja> travis and this stuff has always been, let's say, unreliable  [22:44]
<harlowja> so it may be travis's fault, but may not, ha  [22:44]
<harlowja> but ya, looks like election something or other  [22:45]
<harlowja> afaik election stuff is just using a lock  [22:45]
<jeblair> i think i can reproduce it locally, and i'm pretty sure i had a full working run before starting  [22:45]
<harlowja> k  [22:45]
<harlowja> https://github.com/python-zk/kazoo/blob/master/kazoo/recipe/election.py#L53-L54  [22:46]
<harlowja> lol  [22:46]
<harlowja> that whole recipe is funny  [22:46]
<harlowja> lol  [22:46]
<jeblair> harlowja: derp, i think i see the issue  [22:46]
<harlowja> almost feels like it shouldn't exist  [22:46]
<harlowja> since it adds about no logic to the lock recipe  [22:46]
<harlowja> lol  [22:46]
<jeblair> harlowja: okay, fixed locally...  [22:52]
<jeblair> harlowja: while i'm here -- when i looked at the diff, i noticed a couple of minor things i can fix up...  [22:52]
<harlowja> u can do it  [22:52]
<jeblair> harlowja: there used to be a Lock._NODE_NAME.  the previous patch got rid of that in favor of passing a variable to the constructor; i'm making it a class attribute again, but i inadvertently called it Lock.node_name.  would you prefer Lock.node_name, Lock.NODE_NAME, or Lock._NODE_NAME?  (note that the subclasses override this variable, but otherwise we don't expect users to touch it).  [22:54]
<jeblair> harlowja: https://github.com/python-zk/kazoo/pull/419/files#diff-a08f51f50ea54f2f8138ab6045dc59c0L72  [22:55]
<jeblair> for context  [22:55]
<jeblair> and https://github.com/python-zk/kazoo/pull/419/files#diff-a08f51f50ea54f2f8138ab6045dc59c0R406  [22:55]
<harlowja> hmmmm  [23:00]
<harlowja> i'll let u pick  [23:00]
<jeblair> i'll go with _NODE_NAME and hope that conveys "protected class attribute" :)  [23:02]
<harlowja> wfm  [23:02]
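
(For illustration, the "protected class attribute overridden by subclasses" shape being settled on -- a sketch, not kazoo's actual code.)

    class Lock:
        # Prefix for the lock contender znodes; subclasses override it.
        _NODE_NAME = "__lock__"

    class WriteLock(Lock):
        _NODE_NAME = "__lock__"    # writers contend like plain exclusive locks

    class ReadLock(Lock):
        _NODE_NAME = "__rlock__"   # readers are distinguishable from writers
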
<jeblair> harlowja: https://github.com/python-zk/kazoo/pull/419/ updated  [23:03]
<jeblair> harlowja: builds are passing this time (with the exception of gevent, which is failing to install; pretty sure that's not my fault).  [23:06]
<harlowja> kk  [23:06]
*** rahsah has quit IRC  [23:11]
<openstackgerrit> Merged openstack-infra/nodepool feature/zuulv3: Remove jenkins_manager  https://review.openstack.org/444427  [23:21]
*** saneax-_-|AFK is now known as saneax  [23:30]
<mordred> harlowja: assuming that PR from jeblair is good, how long do you think it would take to land and get into a release?  [23:38]
<harlowja> once benbangert, i guess, checks it?  [23:39]
<harlowja> then i can make a release pretty quickly  [23:39]
<mordred> cool!  [23:39]
* mordred hands harlowja a pie  [23:39]
<harlowja> now u just have to accept my manifesto  [23:39]
<harlowja> lol  [23:39]
* harlowja takes pie before it gets taken away  [23:40]
<harlowja> lol  [23:40]
<mordred> :)  [23:40]
* harlowja runs away with pie  [23:40]
* mordred starts handing harlowja thousands of pies  [23:41]
* harlowja dies  [23:41]
<mordred> o noes!  [23:42]
* rbergeron recommends not overdosing nice humans on the pie  [23:46]
<mordred> MOAR PIE FOR EVERYONE  [23:53]
