Wednesday, 2014-11-26

gusharlowja: yep, looks good.   That reminds me that I still need to go back and audit the code I wrote yesterday to make sure all the flows are using consistent names so they can be composed usefully...00:18
openstackgerritJoshua Harlow proposed openstack/taskflow: Add *basic* scope visibility constraints (WIP)  https://review.openstack.org/13724500:20
harlowjagus ^ also00:20
harlowjaseems to work on a local test, ha00:20
harlowjaship it :-P00:23
vishydoes anyone here have any idea why periodics fail to fire while oslo.messaging is attempting to reconnect?00:24
harlowjavishy using rabbit?00:30
*** dims_ has quit IRC00:30
vishyyup00:30
vishyi’m seeing a failure which has happened before where suddenly a worker is processing messages00:31
vishybut not doing anything with them00:31
vishyin this case it happened after a 5 min network outage00:31
harlowjahmmm, using oslo.messaging master or a released version?00:32
harlowjacause i think that code just recently changed00:32
harlowja* https://github.com/openstack/oslo.messaging/commit/973301aa7000:32
harlowjamaybe something changed there that shouldn't have (if u are using that)00:33
harlowjasileht might have some ideas to00:35
vishy@harlowja http://paste.openstack.org/show/138493/00:36
vishyicehouse00:37
vishyso that is an example of where it is hung00:37
vishyugh darn it recovered00:37
*** ViswaV_ has quit IRC00:38
*** ViswaV has joined #openstack-oslo00:39
harlowja:(00:42
*** ViswaV has quit IRC00:44
vishyharlowja: ok great00:49
vishyso connecting to the eventlet backdoor and doing a print greenthreads causes it to recover00:49
vishyfun00:49
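For context, the "print greenthreads" (pgt) helper exposed by the eventlet backdoor is roughly equivalent to walking the garbage collector for live greenlets and printing their stacks. A minimal sketch of that idea (assuming only that the greenlet package is importable, which it is wherever eventlet is installed):

    import gc
    import traceback

    import greenlet

    def print_greenthread_stacks():
        # Dump the stack of every live greenlet that currently has a frame,
        # similar in spirit to pgt() in the backdoor shell.
        for obj in gc.get_objects():
            if isinstance(obj, greenlet.greenlet) and obj.gr_frame:
                print('---- greenlet %r ----' % obj)
                traceback.print_stack(obj.gr_frame)

Connecting to the backdoor also forces a hub switch, which is probably why the stuck greenthreads recover as a side effect of running it.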
harlowjaseems like this loop https://github.com/openstack/oslo.messaging/blob/master/oslo/messaging/_drivers/amqpdriver.py#L27200:50
*** alexpilotti has quit IRC00:51
vishyit looks like it was stuck here:00:51
vishyhttp://paste.openstack.org/show/138506/00:51
vishy@harlowja it looks like that while loop doesn’t ever yield to other greenthreads00:52
harlowjaya00:52
vishywhich might be what hangs the periodics00:52
vishyand then they somehow get into an endless loop where they don’t recover00:52
vishylet me try adding a sleep in there00:52
harlowjakk00:52
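The starvation being described can be reproduced outside oslo.messaging in a few lines of eventlet; everything below is illustrative and stands in for the amqpdriver wait/poll loop, not the real code:

    import time

    import eventlet
    eventlet.monkey_patch()

    def periodic():
        while True:
            print('periodic tick')
            eventlet.sleep(1)

    def busy_loop(seconds, cooperative):
        # Stand-in for a while-True consume/reconnect loop.
        deadline = time.time() + seconds
        while time.time() < deadline:
            if cooperative:
                eventlet.sleep(0)   # a zero-second sleep hands control back to the hub

    eventlet.spawn(periodic)
    eventlet.sleep(0)                   # let periodic() start ticking
    busy_loop(5, cooperative=False)     # ticks stop until the loop exits

With cooperative=True the ticks keep flowing, which is the effect that adding a sleep (or any other eventlet trampoline) to the loop is meant to have.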
harlowjaor at https://github.com/openstack/oslo.messaging/blob/stable/icehouse/oslo/messaging/_drivers/amqpdriver.py#L22000:53
harlowjaif that timeout is None or something that might be bad00:54
harlowjait seems like other greenthreads there are also doing self._queues[msg_id].get(block=True, timeout=timeout), which i'm not sure about; is that the same queue everyone is blocking on?00:55
vishyharlowja: weird there is another spot that is responsible for reconnect00:55
vishyand it does have a time.sleep00:55
harlowjavishy i also wasn't aware that more than 1 greenthread runs periodic tasks (but i might be wrong)01:00
harlowjaso if 1 periodic task is sucked up running that loop01:00
vishyyeah it has a spawn_n in loopingcall01:01
cburgessvishy: I'm with harlowja on this. I thought the periodics run in the main loop which will block on RPC reconnect.01:01
vishyso each loopingcall runs one01:01
*** yamahata has joined #openstack-oslo01:01
vishyi think the periodic tasks is a single call01:01
harlowjaso that equates to 1 thread?01:01
* harlowja checking what nova is doing01:01
cburgessOh you are saying it reconnects but the task never unblocks, sorry just read your thread trace.01:01
vishyso the issue is reconnect happens but periodics don’t start and the process is completely screwed01:02
vishymessages get pulled off of the queue but don’t make any progress01:02
vishyi noticed during the reconnect loop periodics are not firing01:02
vishyi thought it might be related01:02
vishythe other odd thing is logging in to backdoor and pgt()01:02
cburgessNormal orchestration requests also get pulled off but don't do any work?01:02
vishyseems to unblock the periodics01:02
vishyyup01:03
cburgessDamn ok I feel like we saw this once a few months ago.01:03
cburgessTrying to remember what we determined it was.01:03
vishyi got a launch command and it said trying to start01:03
vishybut doesn’t actually do anything01:03
vishylike it doesn’t ever get into the libvirt thread01:03
vishyi strongly suspect this old issue: https://bitbucket.org/eventlet/eventlet/pull-request/29/fix-use-of-semaphore-with-tpool-issue-137/diff01:04
vishyhanging on the logging semaphore or the libvirt tpool01:04
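A hedged illustration of the pattern behind that eventlet issue, not a reliable reproducer: code pushed into a native thread via tpool (the way nova wraps libvirt calls) that touches a lock also used by greenthreads, the logging module's lock being the classic case, can end up deadlocking because the green side never gets scheduled to release it.

    import logging

    from eventlet import tpool

    LOG = logging.getLogger(__name__)

    def native_side():
        # Runs in a real OS thread via tpool; LOG.info() acquires the logging
        # lock, and mixing that lock between tpool's native threads and
        # monkeypatched greenthreads is the kind of interaction the issue
        # describes.
        LOG.info('called from a tpool thread')

    tpool.execute(native_side)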
vishybut i’m still a bit confused why the amqp sleep isn’t allowing the periodic tasks to make progress01:05
*** stevemar has quit IRC01:05
*** takedakn has joined #openstack-oslo01:05
cburgessThat link isn't loading.01:06
vishyyay atlassian01:07
vishywhole thing is down https://bitbucket.org/eventlet/eventlet/issues01:07
cburgessLOL01:07
cburgessRAD01:07
cburgessI vaguely remember that one though.01:07
cburgesseventlet 0.13 fixed it or something as I recall.01:07
vishygoogle cache01:08
vishyhttp://webcache.googleusercontent.com/search?q=cache:H57Vyv3erXEJ:https://bitbucket.org/eventlet/eventlet/pull-request/29/fix-use-of-semaphore-with-tpool-issue-137+&cd=1&hl=en&ct=clnk&gl=us01:08
vishyno it has not been fixed01:08
cburgessOh this one from comstud from way back when.01:08
vishyyup01:08
vishyand rax is still using a forked eventlet afaik01:08
vishywith his fix in01:09
*** takedakn has quit IRC01:10
*** amotoki has joined #openstack-oslo01:10
vishyis it possible that somehow that does not have a monkeypatched time?01:11
vishyi don’t see how that could happen01:11
cburgessI guess it's theoretically possible.01:11
cburgessDoes it happen frequently? You could test the object and log something if its not patched.01:12
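One cheap way to test that suspicion, assuming a reasonably recent eventlet (the patcher module has exposed is_monkey_patched for a long time):

    import time

    import eventlet.patcher

    for mod in ('time', 'thread', 'socket'):
        if not eventlet.patcher.is_monkey_patched(mod):
            print('%s is NOT monkey patched' % mod)

    # For time specifically: the patched sleep is a plain Python function,
    # while the stdlib one shows up as <built-in function sleep>.
    print(time.sleep)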
harlowjai wonder if the periodic task gets sucked into the following (never getting its reply for some reason)01:13
harlowjahttps://github.com/openstack/oslo.messaging/blob/stable/icehouse/oslo/messaging/_drivers/amqpdriver.py#L25201:13
* harlowja just speculating01:13
vishyholy crap that works01:13
cburgessWhat works?01:13
vishyperiodcs are still going at the moment01:13
vishyoh wait01:13
vishyso yeah the periodics are hanging waiting for replies01:14
vishythey run initially01:14
harlowjaya, do they get locked into that loop?01:14
vishyah so that’s it01:14
vishythe periodic processor gets stuck waiting for a reply from conductor01:14
harlowjaya, which i guess drops on the floor (due to reconnect)01:14
harlowjaand then spins01:15
harlowjamaybe even becoming 'Ok, we're the thread responsible for polling the connection'01:15
harlowja:-/01:15
cburgessYeah that _poll_connection function doesn't seem to have sleep or yielf.01:15
cburgesser yield01:15
cburgesshttps://github.com/openstack/oslo.messaging/blob/stable/icehouse/oslo/messaging/_drivers/amqpdriver.py#L20101:15
harlowjaya, its almost like we don't want periodic tasks to get stuck in that part01:15
harlowjathey shouldn't become the 'thread responsible for polling the connection'01:16
cburgessAgreed01:16
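The hazard being agreed on here generalizes: whichever waiter wins the shared lock becomes the connection poller, and if it then blocks without a timeout nobody behind it can make progress. A self-contained sketch of that shape (illustrative names, not the driver code):

    from eventlet import queue
    from eventlet import semaphore

    poll_lock = semaphore.Semaphore()
    reply_queue = queue.Queue()

    def wait_for_reply():
        # Whichever greenthread acquires the lock becomes "the thread
        # responsible for polling the connection"; if it then blocks forever
        # on a dead connection, every other waiter queues up behind it.
        with poll_lock:
            return reply_queue.get(block=True)   # no timeout: may never return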
vishyi suspect it is locking trying to send the message actually01:16
harlowjaleave it to some other thread, ha01:16
vishylemme look01:16
cburgessCould be..01:16
vishyso i have three threads stuck in reconnect01:17
cburgessSo the rabbit server is down right now?01:18
cburgessOr unavailable?01:18
vishyone in service_update01:18
vishyreport state01:18
vishywhich makes sense01:18
vishynow in this case the recovery happens on that one01:18
vishybecause my services were happily reporting state01:18
vishyone in update_available_resource01:19
vishythat one is the one that never recovers01:19
cburgessOh....01:19
cburgessCan you send a link to the reconnect function you are stuck in?01:20
vishyits here01:20
vishy File "/usr/lib/python2.7/dist-packages/oslo/messaging/_drivers/impl_rabbit.py", line 580, in reconnect01:20
vishy    self._connect(params)01:20
vishybut when it does reconnect01:21
*** tsekiyam_ has joined #openstack-oslo01:21
cburgessThat line number is a bit off from what is in the stable/icehouse01:21
vishyi think it is stuck in loopingcall here:01:21
vishyhttp://paste.openstack.org/show/138506/01:21
vishymaybe i need to update my oslo.messaging?01:22
vishylemme look at a diff01:22
vishyi’m using 1.3.0-0ubuntu1~cloud001:22
harlowjait'd be interesting to see whatever the following outputs01:23
harlowjaLOG.debug('Dynamic looping call %(func_name)s sleeping '01:23
harlowja'for %(idle).02f seconds',01:23
harlowja{'func_name': repr(self.f), 'idle': idle})01:23
harlowjaif u think its stuck there01:23
*** tsekiyama has quit IRC01:23
vishyi added an import so that would move the lines by 201:24
vishyto try switching time.sleep for greenthread.sleep01:25
cburgessOK thats harmless then.01:25
*** tsekiyam_ has quit IRC01:25
vishyharloja that stops printing01:25
vishy*harlowja01:25
harlowjahmmm01:26
vishyi did get a WARNING nova.openstack.common.loopingcall [-] task run outlasted interval by 426.364903 sec01:26
vishyon one of them01:26
harlowjai do wonder if stuff is just stuck in https://github.com/openstack/oslo.messaging/blob/stable/icehouse/oslo/messaging/_drivers/amqpdriver.py#L25001:27
*** mtanino has quit IRC01:27
harlowjain that durn loop01:27
vishythat is often the last message that prints01:27
vishyi don’t see that loop in the pgt output01:28
vishyany thoughts on how i could test for that?01:28
harlowjaFile "/usr/lib/python2.7/dist-packages/oslo/messaging/_drivers/amqpdriver.py", line 280, in wait ?01:28
harlowjahmmm01:29
*** dimsum__ has joined #openstack-oslo01:30
cburgessLog on entrance and exit of the loop?01:30
vishywait a sec01:31
vishyif the dynamic looping call function throws an exception01:32
vishydoesn’t it just exit01:32
harlowjaseems so, maybe that too01:33
harlowjabut i think u get a 'LOG.exception(_LE('in dynamic looping call'))'01:33
vishyi’m not seeing that message01:35
harlowjaya01:35
vishyok so it isn’t an exception01:36
harlowjait'd be interesting to know the timeout that 'message = self.waiters.get(msg_id, timeout)' has01:36
harlowjaor whatever timeout it thinks it's using (hopefully not None)01:36
*** dimsum__ has quit IRC01:36
*** dimsum__ has joined #openstack-oslo01:37
harlowjaalthough the default that seems to be passed along if not provided is timeout=None01:37
harlowjahmmm01:37
*** denis_makogon has quit IRC01:37
harlowjawhich i guess becomes self.conf.rpc_response_timeout01:38
harlowjabut it could be X times of that, seeing all the places where that seems to be used01:38
vishyharlowja: added some logs to that message01:40
vishy* method01:40
vishyso we will see01:40
harlowjacools01:40
vishynot seeing logs at all01:41
vishyI don’t think it is making calls01:41
*** dimsum__ has quit IRC01:41
vishyi think the updates are casts01:41
vishyso it isn’t getting stuck there01:42
harlowjahmmm01:42
harlowjait seems though that update_available_resource calls into the conductor, so i think that's a call; but i guess this isn't it01:43
harlowjapretty sure most of stuff using conductor is call01:43
harlowjaanyway01:44
harlowjaanyway u can get more traces from when this happens?01:45
harlowjawonder if a pattern will emerge01:45
harlowjawonder/hope, ha01:45
harlowjaits interesting how so much goes through the RPC layer nowadays01:46
harlowjaprobably be seeing more stuff like this i think01:46
harlowja*especially under reconnects, disconnects, partitions...01:50
vishyok so this is interesting01:50
vishythis is the last trace01:51
vishyhttp://paste.openstack.org/show/138536/01:51
vishyso it tries to reconnect with timeouts a bunch01:51
cburgessvishy: How are you simulating the failure? Killing rabbit?01:52
vishyno, dropping traffic to rabbit01:52
vishyin both directions01:52
vishybut this definitely reproed01:52
cburgessOK random lark... try setting kombu_reconnect_delay to 1001:53
vishymy report state seems to be ok in this case01:53
vishybut my other thread failed01:53
vishyso i’m suspecting that this is the issue01:53
cburgessWhat is?01:53
vishytwo threads are attempting to publish to conductor01:53
*** dimsum__ has joined #openstack-oslo01:53
vishythey both attempt to redeclare the exchange01:54
vishythe first succeeds01:54
vishybut the second fails01:54
vishyand it doesn’t recover01:54
cburgessOH01:54
cburgessOh01:54
cburgessThere is a bug about this.01:54
cburgessWhere the hell did I see this bug...01:54
cburgesshttps://bugs.launchpad.net/neutron/+bug/131872101:55
vishyi remember a bug about needing to redeclare01:55
vishylooks like that never merged01:56
cburgessNope it didn't01:56
cburgessBut pretty sure its the same bug.01:56
harlowjaanyone know the historical reason that oslo.messaging just doesn't have 1 dispatch thread (that can also do reconnects and such) instead of having many threads that seem to try to do it together? (just out of curiosity)01:59
vishycburgess: so not sure02:00
vishyso i have two greenthreads logging that they are trying to reconnect02:00
vishyone of them gets a timeout02:01
vishythen the other one gets an ioerror02:01
vishythey both log the error properly02:01
vishybut then only one tries to reconnect02:01
cburgessWell similar02:01
vishythe other one is just sitting there02:01
vishyok i have the hang finally02:06
harlowja??02:06
vishythis is where it is hung02:07
vishyhttp://paste.openstack.org/show/138542/02:07
harlowjaso its in that loop of death02:08
harlowjahttps://github.com/openstack/oslo.messaging/blob/stable/icehouse/oslo/messaging/_drivers/amqpdriver.py#L265 ?02:08
harlowjasomehow the periodic task got to take over 'Ok, we're the thread responsible for polling the connection'?02:08
vishyno it is hung in send02:09
vishynot receive02:09
cburgessOK I have to run but let me know how this turns out.02:09
harlowjahmmm, from that paste send -> wait (the loop?) -> _poll_connection -> consume02:10
vishyyeah02:10
vishyweird02:11
vishyi added logging there02:11
vishyand i didn’t see any log messages02:11
harlowjaodd02:11
harlowjaleftover pyc file or something?02:11
harlowjathere are 2 branches of that loop, only log one branch?02:11
harlowjaor both?02:11
harlowjathere's also nested while True that it can get stuck in02:12
harlowjaa few of those02:12
harlowjaif the timeout is the default 60; and it does that spinning it really becomes 60 * infinite  from what i can tell (not knowing this code so well)02:13
vishyi definitely put logs everywhere02:14
harlowjaodd02:15
vishymaybe the pyc didn’t get regenerated02:15
vishyi have four logs in that wait method and i didn’t see any of them get hit at any point02:15
harlowjaa few more while True loops in https://github.com/openstack/oslo.messaging/blob/stable/icehouse/oslo/messaging/_drivers/amqpdriver.py#L20102:15
harlowjaweird02:15
vishyok got it logging now02:17
vishysweet02:17
harlowjawoot02:17
harlowjalog baby log02:18
harlowjaha02:18
vishyok i see the log saying it02:19
vishy2014-11-26 02:18:00.833 29972 WARNING oslo.messaging._drivers.amqpdriver [-] GOT LOCK FOR d9d2dec9dc88496fba77cd42f1a4f79a with timeout: 12002:19
vishyi have three separate methods waiting on reconnect02:19
harlowjawith one or 2 spinning?02:20
vishyso i suspect that it is when it passes the 120 seconds that it fails02:20
harlowjapossibly02:20
vishybecause if i reconnect quickly it seems to recover ok02:20
harlowjathe resource tracker does have a lot of '@utils.synchronized(COMPUTE_RESOURCE_SEMAPHORE)' on it02:21
harlowjaalthough if one of those threads that has the lock gets promoted to the 'thread responsible for polling the connection' that would seem bad02:21
harlowjasince at that point nobody else will get said lock02:21
harlowjauntil that thread stops being the 'thread responsible for polling the connection'02:22
vishyor it could be  a race02:22
vishycuz it recovered ok this time02:22
harlowjakeep on trying, maybe adding the logging stuff in screwed it up due to timing changes02:22
vishyhttp://paste.openstack.org/show/138558/02:23
vishythat is very interesting02:23
vishyit looked like all three threads that were waiting there attempted to process all three message responses02:24
harlowja:-/02:24
* harlowja makes me wonder why oslo.messaging has so many dispatch threads (vs just one)02:24
*** raildo_ has joined #openstack-oslo02:24
harlowjaanyway, gotta head out, let me know what u find02:25
harlowjaseems to be getting closer (maybe)02:25
harlowjaha02:25
harlowja*not like mean 'ha'02:26
harlowjalol02:26
vishyok02:26
vishywell i will try to repro again02:26
openstackgerritJoshua Harlow proposed openstack/taskflow: Add *basic* scope visibility constraints (WIP)  https://review.openstack.org/13724502:26
vishywith the logging in02:26
vishyhaven’t had it repro with logging in02:26
vishyso.. add a sleep :)02:26
harlowjaya, sucky part is if it never occurs with logging on due to eventlet02:26
harlowjabut lets see02:26
vishyrecovered 3/3 so far02:27
harlowja:(02:27
harlowjaya, so there's your problem, run more with debug logging02:27
harlowjaha02:27
vishyso looks like it is this issue as I suspected: https://bitbucket.org/eventlet/eventlet/issue/137/use-of-threading-locks-causes-deadlock02:30
*** ftcjeff has joined #openstack-oslo02:33
*** raildo_ has quit IRC02:35
*** dimsum__ has quit IRC02:52
*** raildo_ has joined #openstack-oslo02:57
*** raildo_ has quit IRC03:24
*** harlowja is now known as harlowja_away03:29
*** ftcjeff has quit IRC03:37
*** ftcjeff has joined #openstack-oslo03:41
*** ftcjeff has quit IRC03:42
*** dimsum__ has joined #openstack-oslo03:52
*** ftcjeff has joined #openstack-oslo03:55
*** dimsum__ has quit IRC03:57
*** amrith is now known as _amrith_04:04
*** ftcjeff has quit IRC04:08
*** zzzeek has quit IRC04:52
*** harlowja_at_home has joined #openstack-oslo05:08
*** harlowja_at_home has quit IRC05:12
*** stevemar has joined #openstack-oslo05:13
*** arnaud___ has quit IRC05:21
vishyharlowja_away: so that fix does not fix the issue05:27
vishywhich makes me feel better05:27
*** stevemar has quit IRC05:36
openstackgerritOpenStack Proposal Bot proposed openstack/oslo.db: Imported Translations from Transifex  https://review.openstack.org/13668406:01
*** stevemar has joined #openstack-oslo06:01
*** k4n0 has joined #openstack-oslo06:03
openstackgerritOpenStack Proposal Bot proposed openstack/oslo.utils: Imported Translations from Transifex  https://review.openstack.org/13656606:13
*** arnaud___ has joined #openstack-oslo06:30
*** takedakn has joined #openstack-oslo06:31
*** stevemar has quit IRC06:34
openstackgerritJoshua Harlow proposed openstack/taskflow: Add *basic* scope visibility constraints  https://review.openstack.org/13724506:45
*** takedakn has quit IRC06:58
*** ajo has joined #openstack-oslo07:08
*** exploreshaifali has joined #openstack-oslo07:18
*** e0ne has joined #openstack-oslo07:38
*** e0ne has quit IRC07:38
*** jaypipes has quit IRC07:50
openstackgerritJens Rosenboom proposed openstack/oslo.messaging: Fix reconnect race condition with RabbitMQ cluster  https://review.openstack.org/10315707:56
*** dtantsur|afk is now known as dtantsur08:13
*** oomichi has quit IRC08:17
*** yamahata has quit IRC08:18
fricklervishy: I reanimated the re-declare patch from https://review.openstack.org/103157 before seeing your discussion on this issue.08:32
fricklervishy: Did you check whether that patch might help mitigate the issues you are seeing?08:32
*** andreykurilin_ has joined #openstack-oslo08:35
vishyi haven’t checked yet08:51
vishybut i don’t think it will08:51
vishyfrickler: it seems to be getting stuck on a recv that never times out08:51
vishynasty08:51
*** andreykurilin_ has quit IRC08:58
*** andreykurilin_ has joined #openstack-oslo08:59
*** amotoki has quit IRC09:11
*** i159 has joined #openstack-oslo09:13
*** e0ne has joined #openstack-oslo09:16
*** ihrachyshka has joined #openstack-oslo09:19
*** vigneshvar has joined #openstack-oslo09:20
*** denis_makogon has joined #openstack-oslo09:26
*** pblaho has joined #openstack-oslo09:43
*** yamahata has joined #openstack-oslo09:44
*** andreykurilin_ has quit IRC09:52
*** subscope has quit IRC10:08
openstackgerritMerged openstack/oslo.concurrency: Add a TODO for retrying pull request #20  https://review.openstack.org/13722510:20
fricklerjenkins seems to fail with: 'oslo.middleware' is not in global-requirements.txt -- is this a known issue?10:26
*** andreykurilin_ has joined #openstack-oslo10:27
*** subscope has joined #openstack-oslo10:34
*** yamahata has quit IRC10:43
*** yamahata has joined #openstack-oslo10:45
*** arnaud___ has quit IRC10:48
jokke_ttx: ping11:05
ttxjokke_: pong11:06
jokke_ttx: sorry, wrong channel, will pm you11:06
*** exploreshaifali has quit IRC11:20
ihrachyshkafrickler: a link to review?11:26
openstackgerritMatthew Gilliard proposed openstack/oslo.log: Remove NullHandler  https://review.openstack.org/13733811:45
*** denis_makogon has quit IRC11:49
*** dmakogon_ is now known as denis_makogon11:49
*** denis_makogon_ has joined #openstack-oslo11:49
*** viktors|afk is now known as viktors11:52
jokke_anyone here who could help me with oslo.messaging?11:54
*** denis_makogon_ has quit IRC12:04
viktorshi! Does anybody know the current state of the gate-tempest-dsvm-neutron-src-*-icehouse gate jobs?12:15
viktorsjd__: hi! Maybe you know something about ^^12:20
viktors?12:20
fricklerihrachyshka: https://review.openstack.org/10315712:25
ihrachyshkafrickler: hm. I wonder what those jobs are for. they include 'neutron' in their name, and icehouse, and neutron definitely hasn't used oslo.messaging till Juno12:27
fricklerthe icehouse test has a different error from tempest about floating IPs12:29
fricklerbut I think none of these could be related to the patch12:29
ihrachyshkafrickler: as for oslo.middleware issue for juno+ jobs, I guess we'll need to add it to requirements repo for Juno branch12:30
ihrachyshkafrickler: devstack checks that all oslo.messaging requirements are present in Juno requirements repo, and I guess oslo.middleware didn't get there, yet12:30
*** hemanthm has left #openstack-oslo12:32
fricklerihrachyshka: so can you fix that? or who takes care of these gate things?12:34
fricklerall the latest reviews at https://review.openstack.org/#/q/oslo.messaging,n,z seem to share these three failures12:36
ihrachyshkafrickler: see https://review.openstack.org/#/c/134727/12:37
ihrachyshkafrickler: I guess it's meant to fix similar failures, though I wonder whether the patch is actually correct and will solve your particular issue12:38
ihrachyshkafrickler: I've asked Dave to consider your failures in comments, let's see what he has to say12:38
frickleryeah, that seems to be related, so I will wait for that, thx12:41
*** exploreshaifali has joined #openstack-oslo13:04
*** alexpilotti has joined #openstack-oslo13:22
jd__viktors: I don't13:22
viktorsjd__: ok, got it13:23
*** vigneshvar_ has joined #openstack-oslo13:31
*** alexpilotti has quit IRC13:32
*** vigneshvar has quit IRC13:34
*** jaypipes has joined #openstack-oslo13:36
*** gordc has joined #openstack-oslo13:42
*** yamahata has quit IRC13:50
*** dimsum__ has joined #openstack-oslo13:51
*** yamahata has joined #openstack-oslo13:51
*** vigneshvar_ has quit IRC13:55
openstackgerritDavanum Srinivas (dims) proposed openstack/oslo.vmware: Switch to use requests/urllib3 and enable cacert validation  https://review.openstack.org/12195613:56
*** _amrith_ is now known as amrith13:58
*** sigmavirus24_awa is now known as sigmavirus2414:08
openstackgerritMerged openstack/oslo-incubator: Don't log missing policy.d as a warning  https://review.openstack.org/13719114:14
*** exploreshaifali has quit IRC14:15
*** jgrimm is now known as zz_jgrimm14:32
*** kgiusti has joined #openstack-oslo14:49
*** zzzeek has joined #openstack-oslo14:53
openstackgerritDavanum Srinivas (dims) proposed openstack/oslo.vmware: Switch to use requests/urllib3 and enable cacert validation  https://review.openstack.org/12195615:12
*** exploreshaifali has joined #openstack-oslo15:19
*** mtanino has joined #openstack-oslo15:22
*** dimsum__ has quit IRC15:28
*** stevemar has joined #openstack-oslo15:28
*** dimsum__ has joined #openstack-oslo15:29
viktorsdhellmann: hi!15:30
dhellmannviktors: hi. Looks like we have some persistent gate issues. :-(15:35
viktorsdhellmann: are you talking about gate-tempest-dsvm-src-*-icehouse job?15:37
dhellmannviktors: yes15:37
dhellmannviktors: I was out yesterday; is someone working on that?15:37
* dhellmann is still catching up on the backlog of email15:38
viktorsdhellmann: I looked into project meeting log - http://eavesdrop.openstack.org/meetings/project/2014/project.2014-11-25-21.01.log.html15:38
dhellmannI'm just reading that now15:39
*** alexpilotti has joined #openstack-oslo15:39
*** tsekiyama has joined #openstack-oslo15:55
*** pblaho_ has joined #openstack-oslo16:01
*** dimsum__ is now known as dims16:03
viktorsdhellmann: can we merge this patch to unblock gates - https://review.openstack.org/#/c/136856/ ?16:03
dhellmannviktors: see the backlog in #openstack-dev -- I am working on a patch to pin the oslo requirements in the icehouse branch16:04
*** pblaho has quit IRC16:04
viktorsdhellmann: ok, thanks!16:04
*** bogdando has quit IRC16:09
*** bogdando has joined #openstack-oslo16:11
*** kgiusti has quit IRC16:16
*** pblaho_ is now known as pblaho16:16
*** k4n0 has quit IRC16:30
*** pblaho_ has joined #openstack-oslo16:31
viktorsdhellmann: as for stable requirements - what should we do with requirements, which were missed in Icehouse16:33
viktors?16:33
*** pblaho has quit IRC16:35
dhellmannviktors: I'm not sure what you mean?16:44
viktorsdhellmann: patch https://review.openstack.org/#/c/136207/ fails on icehouse gates with error 'doc8' is not a global requirement but it should be,something went wrong16:45
viktorsthat's true, because doc8 was added to global requirements in juno16:45
dhellmannviktors: yeah, I think I need to cap whatever is calling for that to a lower number for icehouse16:45
dhellmannviktors: do you know what is adding that requirement?16:47
viktorsdhellmann: see https://github.com/openstack/requirements/commit/83b3a598558d1ce51d81d3dd0c5dd219f96f8c8416:47
dhellmannviktors: ok, so taskflow at least and possibly some others. I'll work out which version of taskflow added that and lower the cap16:49
*** pblaho_ has quit IRC16:49
*** viktors is now known as viktors|afk16:53
silehtdhellmann, viktors|afk I know that oslo.config and oslo.messaging are broken too16:54
silehtby the introduction of docs8 and oslo.middleware16:55
silehtdhellmann, icehouse/juno compat-jobs for the master branch can be removed when the stable requirements have been merged16:56
*** i159 has quit IRC16:59
*** mfedosin has quit IRC17:05
ihrachyshkadhellmann: may I ask oslo team to prioritize https://review.openstack.org/136999 which blocks oslo.middleware consumption for neutron?17:08
openstackgerritSergey Skripnick proposed openstack/oslo-incubator: Add ConnectionError exception  https://review.openstack.org/13741217:16
vishyharlowja_away: so still no luck in fixing the issue17:21
*** e0ne has quit IRC17:28
*** andreykurilin_ has quit IRC17:34
*** kgiusti has joined #openstack-oslo17:38
dimsihrachyshka: lgtm17:44
ihrachyshkadims: thanks sir!17:45
*** arnaud___ has joined #openstack-oslo17:48
openstackgerritDavanum Srinivas (dims) proposed openstack/oslo.vmware: Switch to use requests/urllib3 and enable cacert validation  https://review.openstack.org/12195618:02
*** arnaud___ has quit IRC18:03
*** harlowja_away is now known as harlowja18:04
harlowjavishy durn, i thought u found it :)18:04
harlowjavishy did adding logs basically just cause eventlet to switch more and then it doesn't happen?18:05
harlowjadhellmann the lower cap i think u put for taskflow is ok with me18:05
dhellmannharlowja: k, thanks18:08
harlowjanp18:09
harlowjasileht yt18:09
*** e0ne has joined #openstack-oslo18:13
*** dtantsur is now known as dtantsur|afk18:13
*** exploreshaifali has quit IRC18:13
openstackgerritJoshua Harlow proposed openstack/taskflow: Add *basic* scope visibility constraints  https://review.openstack.org/13724518:15
*** e0ne has quit IRC18:18
*** harlowja has quit IRC18:18
*** harlowja has joined #openstack-oslo18:19
*** e0ne has joined #openstack-oslo18:20
openstackgerritDarragh Bailey proposed openstack-dev/hacking: Add import check for try/except wrapped imports  https://review.openstack.org/13428818:24
*** e0ne has quit IRC18:29
*** arnaud___ has joined #openstack-oslo18:34
*** _honning_ has joined #openstack-oslo18:38
*** andreykurilin_ has joined #openstack-oslo18:39
*** e0ne has joined #openstack-oslo18:40
*** harlowja_ has joined #openstack-oslo18:41
*** e0ne has quit IRC18:44
*** harlowja has quit IRC18:45
openstackgerritJoshua Harlow proposed openstack/oslo.messaging: Add a decorator that logs enter/exit on wait() (no merge)  https://review.openstack.org/13744318:48
harlowja_vishy ^ might be useful for u18:48
harlowja_if u just use that patch, and take all the other logging out, i wonder if that would show the issue again (and still show useful info)18:49
harlowja_can place it on other methods, if u feel thats useful18:49
openstackgerritJoshua Harlow proposed openstack/oslo.messaging: Add a decorator that logs enter/exit on wait() (no merge)  https://review.openstack.org/13744318:52
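As a rough sketch of what an enter/exit logging decorator like the one proposed above could look like (illustrative only, not the contents of review 137443):

    import functools
    import logging

    LOG = logging.getLogger(__name__)

    def log_enter_exit(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            LOG.debug('entering %s', func.__name__)
            try:
                return func(*args, **kwargs)
            finally:
                LOG.debug('exiting %s', func.__name__)
        return wrapper

Wrapping wait() this way shows whether a greenthread enters the method and never leaves, with less timing perturbation than logging every pass through the inner loop.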
vishyharlowja_: it isn’t in the loop18:56
*** yamahata has quit IRC18:59
vishyharlowja_: it gets stuck doing a recv and never times out19:05
vishythen the other thread hits and doesn’t get the lock so it can’t ever make progress19:05
openstackgerritMerged openstack/oslo-incubator: Add middleware.catch_errors shim for Kilo  https://review.openstack.org/13699919:05
harlowja_vishy kk, so it gets a timeout passed in, but doesn't use it apparently?19:09
harlowja_or recv just broken, lol19:09
*** _honning_ has quit IRC19:10
*** andreykurilin_ has quit IRC19:13
vishyapparently19:14
harlowja_:-19:16
harlowja_:-/19:16
*** exploreshaifali has joined #openstack-oslo19:37
vishyharlowja_: so the main executor thread calls drain events without a timeout :(19:40
harlowja_:(19:40
harlowja_pew pew pew19:40
harlowja_that sux19:40
*** ihrachyshka has quit IRC19:41
*** ihrachyshka has joined #openstack-oslo19:43
openstackgerritJoshua Harlow proposed openstack/oslo.messaging: Have the timeout decrement inside the wait() method  https://review.openstack.org/13745619:47
harlowja_vishy ^ probably is a good thing to have also19:47
harlowja_but won't address your main concern i think19:47
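The idea behind having the timeout decrement inside wait() is to track a deadline once and only wait for whatever remains on each pass, instead of restarting the full timeout every iteration. A minimal sketch of that pattern (illustrative, not the 137456 diff):

    import time

    class DecrementingTimeout(object):
        def __init__(self, duration):
            self._deadline = time.time() + duration

        def remaining(self):
            # How much of the original budget is left; never negative.
            return max(0.0, self._deadline - time.time())

    # Inside a wait()-style loop (pseudocode):
    #   timer = DecrementingTimeout(rpc_response_timeout)
    #   while True:
    #       if not timer.remaining():
    #           raise SomeTimeoutError()
    #       reply = reply_queue.get(block=True, timeout=timer.remaining())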
vishyso i found the issue19:48
vishyi think19:48
harlowja_cools19:49
vishyharlowja_: https://github.com/openstack/oslo.messaging/blob/master/oslo/messaging/_executors/impl_eventlet.py#L8719:50
harlowja_poll without timeout :(19:50
vishythe main executor polls (which does a drain event) with no timeout19:50
harlowja_durn19:51
vishyso yeah it hangs on reconnect19:56
*** dims has quit IRC19:56
vishyand none of the other threads can process events because they don’t have the lock19:56
vishycburgess: ^^19:57
harlowja_:(19:58
openstackgerritJoshua Harlow proposed openstack/oslo.messaging: Have the timeout decrement inside the wait() method  https://review.openstack.org/13745620:01
vishyharlowja_: so it looks like a timeout param was recently added to the poll method for the trollius work20:02
vishythe question is what should the timeout be for that call?20:02
vishyshould it be the same as the rpc timeout or do we need a new param20:02
harlowja_i don't think its rpc timeout related, just seems to be more poll related20:03
harlowja_rpc seems seperate (hopefully 137456 helps with that one)20:03
*** ihrachyshka has quit IRC20:04
*** denis_makogon_ has joined #openstack-oslo20:05
*** e0ne has joined #openstack-oslo20:06
*** ihrachyshka has joined #openstack-oslo20:08
*** e0ne has quit IRC20:17
*** denis_makogon has quit IRC20:19
*** denis_makogon_ is now known as denis_makogon20:19
*** dmakogon_ has joined #openstack-oslo20:19
*** andreykurilin_ has joined #openstack-oslo20:20
openstackgerritJoshua Harlow proposed openstack/oslo.messaging: Add a listener provided default poll() timeout  https://review.openstack.org/13746720:28
harlowja_vishy ^20:28
harlowja_seems reasonable...20:28
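A self-contained sketch of the listener-side idea: always poll with a bounded timeout so a dead connection surfaces as a periodic timeout the loop can react to, rather than an indefinite block. The names and the 30-second value below are illustrative, not the actual 137467 change:

    from eventlet import queue

    DEFAULT_POLL_TIMEOUT = 30   # illustrative default, not the real one

    def run_loop(incoming, handle):
        # incoming: an eventlet queue.Queue of received messages
        while True:
            try:
                msg = incoming.get(block=True, timeout=DEFAULT_POLL_TIMEOUT)
            except queue.Empty:
                # Timed out: a chance to notice a dead connection, reconnect,
                # or shut down cleanly instead of blocking forever.
                continue
            handle(msg)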
openstackgerritJoshua Harlow proposed openstack/oslo.messaging: Add a listener provided default poll() timeout  https://review.openstack.org/13746720:34
openstackgerritJoshua Harlow proposed openstack/oslo.messaging: Add a listener provided default poll() timeout  https://review.openstack.org/13746720:36
*** rpodolyaka1 has quit IRC20:36
*** andreykurilin_ has quit IRC20:37
*** exploreshaifali has quit IRC20:37
*** ajo has quit IRC20:58
*** ajo has joined #openstack-oslo21:05
*** tsekiyam_ has joined #openstack-oslo21:06
*** ajo has quit IRC21:09
*** tsekiyama has quit IRC21:09
*** e0ne has joined #openstack-oslo21:09
*** mtanino has quit IRC21:11
*** tsekiyam_ has quit IRC21:11
*** e0ne has quit IRC21:13
*** andreykurilin_ has joined #openstack-oslo21:19
openstackgerritThiago Paiva Brito proposed openstack/oslo-incubator: Improving docstrings for policy API  https://review.openstack.org/13747621:21
openstackgerritThiago Paiva Brito proposed openstack/oslo-incubator: Improving docstrings for policy API  https://review.openstack.org/13747621:32
*** kgiusti has quit IRC21:39
*** kgiusti has joined #openstack-oslo21:42
*** ajo has joined #openstack-oslo21:44
openstackgerritJoshua Harlow proposed openstack/oslo.messaging: Add a listener provided default poll() timeout  https://review.openstack.org/13746721:44
*** kgiusti has quit IRC21:51
*** ihrachyshka has quit IRC21:52
*** ajo has quit IRC21:56
*** ajo has joined #openstack-oslo21:59
*** alexpilotti has quit IRC22:15
*** ajo has quit IRC22:18
openstackgerritJoshua Harlow proposed openstack/taskflow: Add *basic* scope visibility constraints  https://review.openstack.org/13724522:35
openstackgerritBen Nemec proposed openstack/oslo.concurrency: Add external lock fixture  https://review.openstack.org/13151722:39
*** alexpilotti has joined #openstack-oslo22:45
*** stevemar has quit IRC22:50
*** gordc has quit IRC22:55
*** tsekiyama has joined #openstack-oslo23:01
vishyharlowja_: 1 sec may not be long enough, I got a timeout in normal operation when starting the service23:09
vishyTimeout: Timeout while waiting on RPC response - topic: "<unknown>", RPC method: "<unknown>" info: "<unknown>"23:09
harlowja_hmmm, ideally u'd get a timeout, then the loop would continue23:10
harlowja_and try again and again?23:10
vishyyeah it did continue23:11
vishybut i don’t think it should timeout unless there is a failure23:11
*** kgiusti has joined #openstack-oslo23:11
vishyI’m guessing the greenlet switching can cause more than 1 second23:11
vishyor something23:11
vishyjust set it to 12023:11
vishyseems ok so far23:11
vishytesting to see if it fixes the issue23:11
harlowja_kk23:11
harlowja_can also handle the exception better; instead of invoking the logic in excutils.forever_retry_uncaught_exceptions to retry23:12
harlowja_or just mess around with poll() itself23:12
harlowja_although if it is timing out, thats good23:13
harlowja_it'd be interesting to try https://review.openstack.org/#/c/137456/ also23:14
harlowja_which imho does more of the 'correct' thing for threads that get stuck in the wait() method23:15
vishyharlowja_: well partial victory23:19
vishyit does raise a timeout properly23:19
harlowja_yippe23:19
harlowja_lol23:19
vishyevery 120 seconds...23:19
vishyso the socket itself is no longer sending data23:20
vishywhich means it isn’t reconnecting properly23:20
harlowja_onwards toward the next victory/battle!23:20
harlowja_ha23:20
harlowja_what'd the reconnect do, just never really do it and give up?23:22
*** andreykurilin_ has quit IRC23:22
harlowja_i wonder if https://github.com/openstack/oslo.messaging/commit/973301aa7 then fixed a bunch of that23:23
harlowja_i don't think u are running master/that commit though23:23
vishyharlowja_: was that stuff in juno?23:26
harlowja_not afaik23:26
vishyI could try upgrading to master23:26
harlowja_authored 9 days ago so i don't think so23:27
vishyalthough i’m sure there will be new dependencies that will break23:27
vishysomething about the reconnect logic seems to be a bit borked23:27
harlowja_ya, its complicated, so that doesn't help either :-/23:27
harlowja_which is why i'm wondering if that commit fixes it23:27
harlowja_since it seemed to have overhauled that whole thing23:28
* harlowja_ crosses fingers, ha23:28
vishyi suspect it will be somewhat difficult to use that due to dependencies but i will try it23:51
*** tsekiyama has quit IRC23:52
*** sigmavirus24 is now known as sigmavirus24_awa23:53
