Friday, 2018-02-02

*** liyi has joined #senlin00:06
*** liyi has quit IRC00:06
*** fabian has joined #senlin01:01
*** fabian is now known as chenyb401:01
*** pzchen has joined #senlin01:23
*** openstackgerrit has joined #senlin01:36
*** ChanServ sets mode: +v openstackgerrit01:36
openstackgerritXueFeng Liu proposed openstack/senlin master: Updated from global requirements  https://review.openstack.org/53984201:36
*** liyi has joined #senlin01:56
*** zhenglingwu has joined #senlin02:26
*** zhenglingwu has quit IRC02:30
*** liyi has quit IRC02:31
*** liyi has joined #senlin02:35
*** liyi has quit IRC02:35
*** liyi has joined #senlin02:35
openstackgerritJames E. Blair proposed openstack/python-senlinclient master: Zuul: Remove project name  https://review.openstack.org/54022103:03
*** zhenglingwu has joined #senlin03:11
openstackgerritchenpengzi proposed openstack/senlin-tempest-plugin master: Update OpenStack Style Commandments address  https://review.openstack.org/54023603:31
*** pzchen_ has joined #senlin04:18
*** pzchen has quit IRC04:18
*** chenyb4 has quit IRC04:18
*** chenyb4 has joined #senlin04:18
*** ceryx has quit IRC04:36
*** jmlowe_ has quit IRC04:40
*** jmlowe has joined #senlin04:41
*** chenyb4 has quit IRC04:51
*** chenyb4 has joined #senlin04:52
*** jmlowe has quit IRC04:54
*** jmlowe has joined #senlin04:55
*** dtruong_ has joined #senlin04:59
*** dtruong_ has quit IRC05:00
openstackgerritOpenStack Proposal Bot proposed openstack/senlin master: Updated from global requirements  https://review.openstack.org/54025305:06
*** jmlowe has quit IRC05:15
*** jmlowe has joined #senlin05:17
*** dtruong_ has joined #senlin05:29
*** jmlowe has quit IRC05:42
*** jmlowe has joined #senlin05:43
XueFengQiming05:50
XueFengI repaired this patch. https://review.openstack.org/#/c/539842/05:51
*** jmlowe has quit IRC05:57
*** jmlowe has joined #senlin06:00
openstackgerritchenyb4 proposed openstack/python-senlinclient master: Add functional create, update, delete tests  https://review.openstack.org/54026306:05
openstackgerritchenyb4 proposed openstack/python-senlinclient master: Add functional create, update, delete tests  https://review.openstack.org/54026306:08
*** jmlowe has quit IRC06:10
*** jmlowe has joined #senlin06:12
dtruong_qiming: is it possible there is a race condition in this code: https://github.com/openstack/senlin/blob/master/senlin/db/sqlalchemy/api.py#L42406:18
QimingHi, dtruong_, it is possible06:19
dtruong_we have seen multiple attach policy actions, launched simultaneously on the same cluster, lead to an error where senlin attempts to insert the lock for the same cluster into the cluster_lock table06:20
QimingMaybe we should add something like line 1062 in the same file06:20
dtruong_i'm sure that would be enough to eliminate the problem we are seeing06:22
*** jmlowe has quit IRC06:22
QimingIn theory, there shouldn't be any race condition, but sqlalchemy plus oslo.db abstract too many things away; we had a difficult time getting the in-memory objects synchronized with DB contents06:23
dtruong_it appears like in our case two attach policy actions hit line 424 at the same time06:23
*** jmlowe has joined #senlin06:23
dtruong_and since nobody has a lock on the cluster, both will try to insert the cluster lock into the cluster_lock table06:24
QimingIf that happens, it means the Lock python object wasn't properly updated when the lock status changed06:24
QimingOne of them will fail, right?06:24
dtruong_yes, the second one will fail with an exception06:24
QimingThen it should retry automatically06:25
dtruong_saying that it is trying to add an entry with duplicate primary key06:25
QimingSigh, sounds like the session_for_write() call is not trustworthy then ...06:26
dtruong_we don't see any retry.  the sql error is going up all the way to the caller of attach policy API06:26
QimingThe retry logic is here: https://github.com/openstack/senlin/blob/master/senlin/engine/senlin_lock.py#L4106:27
dtruong_is the session_for_write supposed to lock out any other thread/process from writing to the DB?06:27
QimingWhat else would you guess from the call then?06:27
QimingIt is supposed to give us an exclusive write session to the db06:28
QimingIf it is not doing that, it is useless06:28
dtruong_ok.  i'm not too familiar with those APIs06:28
QimingAs I just mentioned, the sqlalchemy and oslo.db abstractions are hiding too many details06:29
QimingWe did receive some complaints about lock synchronization06:29
dtruong_the retry is not happening right now because line 435 is throwing an exception06:29
dtruong_let me find the stack trace06:30
QimingAh, okay, the retry logic is not handling exceptions06:31
QimingIn that case, I'd suggest trying to add the annotation we used on line 1062 to see if it helps06:32
dtruongFeb  2 00:55:51 sandboxController senlin-engine[12568]: 2018-02-02 00:55:51.162 ERROR senlin.engine.actions.base [req-8632e7c0-e731-4c69-8dbb-1f700de80ba8 None None] Unexpected exception occurred during action CLUSTER_DETACH_POLICY (4ec24ba3-fd33-483a-8cf4-4b6f34324b48) execution: (pymysql.err.IntegrityError) (1062, u"Duplicate entry '80315a11-23d3-4fd3-b0ae-e124542d20cf' for key 'PRIMARY'")06:33
dtruong[SQL: u'INSERT INTO cluster_lock (cluster_id, action_ids, semaphore) VALUES (%(cluster_id)s, %(action_ids)s, %(semaphore)s)'] [parameters: {'action_ids': '["4ec24ba3-fd33-483a-8cf4-4b6f34324b48"]', 'cluster_id': '80315a11-23d3-4fd3-b0ae-e124542d20cf', 'semaphore': -1}] (Background on this error at: http://sqlalche.me/e/gkpj): DBDuplicateEntry: (pymysql.err.IntegrityError) (1062, u"Duplicate entry '80315a11-23d3-4fd3-b0ae-e124542d20cf' for key '06:33
dtruongPRIMARY'") [SQL: u'INSERT INTO cluster_lock (cluster_id, action_ids, semaphore) VALUES (%(cluster_id)s, %(action_ids)s, %(semaphore)s)'] [parameters: {'action_ids': '["4ec24ba3-fd33-483a-8cf4-4b6f34324b48"]', 'cluster_id': '80315a11-23d3-4fd3-b0ae-e124542d20cf', 'semaphore': -1}] (Background on this error at: http://sqlalche.me/e/gkpj)06:33
dtruongFeb  2 00:55:51 sandboxController senlin-engine[12568]: 2018-02-02 00:55:51.162 TRACE senlin.engine.actions.base Traceback (most recent call last):06:33
dtruongFeb  2 00:55:51 sandboxController senlin-engine[12568]: 2018-02-02 00:55:51.162 TRACE senlin.engine.actions.base   File "/opt/stack/senlin/senlin/engine/actions/base.py", line 499, in ActionProc06:33
dtruongFeb  2 00:55:51 sandboxController senlin-engine[12568]: 2018-02-02 00:55:51.162 TRACE senlin.engine.actions.base     result, reason = action.execute()06:33
dtruongFeb  2 00:55:51 sandboxController senlin-engine[12568]: 2018-02-02 00:55:51.162 TRACE senlin.engine.actions.base   File "/opt/stack/senlin/senlin/engine/actions/cluster_action.py", line 1101, in execute06:33
dtruongFeb  2 00:55:51 sandboxController senlin-engine[12568]: 2018-02-02 00:55:51.162 TRACE senlin.engine.actions.base     forced)06:34
dtruongFeb  2 00:55:51 sandboxController senlin-engine[12568]: 2018-02-02 00:55:51.162 TRACE senlin.engine.actions.base   File "/opt/stack/senlin/senlin/engine/senlin_lock.py", line 58, in cluster_lock_acquire06:34
dtruongFeb  2 00:55:51 sandboxController senlin-engine[12568]: 2018-02-02 00:55:51.162 TRACE senlin.engine.actions.base     owners = cl_obj.ClusterLock.acquire(cluster_id, action_id, scope)06:34
dtruongFeb  2 00:55:51 sandboxController senlin-engine[12568]: 2018-02-02 00:55:51.162 TRACE senlin.engine.actions.base   File "/opt/stack/senlin/senlin/objects/cluster_lock.py", line 32, in acquire06:34
*** jmlowe has quit IRC06:34
dtruongFeb  2 00:55:51 sandboxController senlin-engine[12568]: 2018-02-02 00:55:51.162 TRACE senlin.engine.actions.base     return db_api.cluster_lock_acquire(cluster_id, action_id, scope)06:34
dtruongFeb  2 00:55:51 sandboxController senlin-engine[12568]: 2018-02-02 00:55:51.162 TRACE senlin.engine.actions.base   File "/opt/stack/senlin/senlin/db/api.py", line 129, in cluster_lock_acquire06:34
dtruongFeb  2 00:55:51 sandboxController senlin-engine[12568]: 2018-02-02 00:55:51.162 TRACE senlin.engine.actions.base     return IMPL.cluster_lock_acquire(cluster_id, action_id, scope)06:34
dtruongFeb  2 00:55:51 sandboxController senlin-engine[12568]: 2018-02-02 00:55:51.162 TRACE senlin.engine.actions.base   File "/opt/stack/senlin/senlin/db/sqlalchemy/api.py", line 436, in cluster_lock_acquire06:34
dtruongFeb  2 00:55:51 sandboxController senlin-engine[12568]: 2018-02-02 00:55:51.162 TRACE senlin.engine.actions.base     return lock.action_ids06:34
dtruongFeb  2 00:55:51 sandboxController senlin-engine[12568]: 2018-02-02 00:55:51.162 TRACE senlin.engine.actions.base   File "/usr/lib/python2.7/contextlib.py", line 24, in __exit__06:34
dtruongFeb  2 00:55:51 sandboxController senlin-engine[12568]: 2018-02-02 00:55:51.162 TRACE senlin.engine.actions.base     self.gen.next()06:34
dtruongFeb  2 00:55:51 sandboxController senlin-engine[12568]: 2018-02-02 00:55:51.162 TRACE senlin.engine.actions.base   File "/usr/local/lib/python2.7/dist-packages/oslo_db/sqlalchemy/enginefacade.py", line 1036, in _transaction_scope06:34
*** dtruong_ has quit IRC06:36
dtruongok, i will try adding the annotation and see if that fixes the problem06:36
QimingHope the sqlalchemy/oslo.db layer can detect this and handle it06:37
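The duplicate-key failure discussed above can be reproduced in miniature. The sketch below is not Senlin's code: it assumes Python's built-in sqlite3 in place of MySQL and oslo.db, a simplified two-column cluster_lock table, and hypothetical helper names (make_db, acquire_cluster_lock). Instead of checking for a lock and then inserting, it inserts first and lets the primary key arbitrate; the loser of the race catches the integrity error and reads back the current owner, so the exception does not travel up to the API caller.

```python
import sqlite3


def make_db():
    """Create an in-memory database with a simplified cluster_lock table.

    Hypothetical schema: the table in the log above also carries
    action_ids and semaphore columns, omitted here for brevity.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE cluster_lock ("
        "cluster_id TEXT PRIMARY KEY, action_id TEXT NOT NULL)"
    )
    return conn


def acquire_cluster_lock(conn, cluster_id, action_id):
    """Return the action id that owns the lock after this attempt.

    The INSERT is attempted unconditionally. If another writer already
    inserted a row for cluster_id, the primary key constraint raises
    sqlite3.IntegrityError, which is handled here instead of propagating.
    """
    try:
        with conn:  # commits on success, rolls back on exception
            conn.execute(
                "INSERT INTO cluster_lock (cluster_id, action_id) "
                "VALUES (?, ?)",
                (cluster_id, action_id),
            )
        return action_id
    except sqlite3.IntegrityError:
        # Lost the race: report whoever holds the lock right now.
        row = conn.execute(
            "SELECT action_id FROM cluster_lock WHERE cluster_id = ?",
            (cluster_id,),
        ).fetchone()
        # None means the lock was released in between; caller may retry.
        return row[0] if row else None


if __name__ == "__main__":
    db = make_db()
    print(acquire_cluster_lock(db, "cluster-1", "action-a"))
    print(acquire_cluster_lock(db, "cluster-1", "action-b"))
```

A caller would compare the returned owner with its own action id and retry later when they differ, which is roughly the role the retry logic in senlin_lock.py is described as playing in the conversation.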
*** dtruong_ has joined #senlin06:39
*** jmlowe has joined #senlin06:48
*** dtruong_ has quit IRC06:56
*** jmlowe has quit IRC07:00
*** jmlowe has joined #senlin07:07
*** jmlowe has quit IRC07:17
*** AlexeyAbashkin has joined #senlin07:18
*** jmlowe has joined #senlin07:29
*** jmlowe has quit IRC07:46
*** AlexeyAbashkin has quit IRC07:47
*** jmlowe has joined #senlin07:52
*** AlexeyAbashkin has joined #senlin07:55
*** jmlowe has quit IRC07:59
*** zhenguo has joined #senlin07:59
*** jmlowe has joined #senlin08:23
*** jmlowe has quit IRC08:39
*** jmlowe has joined #senlin08:56
*** jmlowe has quit IRC09:12
*** jmlowe has joined #senlin09:17
openstackgerritchenyb4 proposed openstack/senlin master: Fix node creates the specified cluster error  https://review.openstack.org/54032009:17
*** liyi has quit IRC09:19
openstackgerritchenyb4 proposed openstack/senlin master: Fix node creates the specified cluster error  https://review.openstack.org/54032009:20
*** jmlowe has quit IRC09:26
*** jmlowe has joined #senlin09:48
*** jmlowe has quit IRC10:04
chenyb4hi, Qiming XueFeng https://review.openstack.org/#/c/540320/ the unit tests fail with a different error.10:08
*** AlexeyAbashkin has quit IRC10:14
*** AlexeyAbashkin has joined #senlin10:14
openstackgerritAndreas Jaeger proposed openstack/senlin master: Remove _static from releasenotes  https://review.openstack.org/54033010:15
Qimingchenyb4, the gate is bad10:19
QimingIt is trying to use a higher version of openstacksdk10:19
QimingYou can check from the error log10:19
QimingFrom openstack review, you can see openstacksdk version bump requests have been abandoned several times10:20
chenyb4ok10:20
*** AlexeyAbashkin has quit IRC10:23
*** AlexeyAbashkin has joined #senlin10:23
*** pzchen_ has quit IRC10:24
*** jmlowe has joined #senlin10:25
*** chenyb4 has quit IRC10:25
*** jmlowe has quit IRC10:38
*** test_123 has joined #senlin10:50
*** liyi has joined #senlin10:51
*** test_123 has left #senlin10:51
*** glance_ has joined #senlin10:52
*** liyi has quit IRC10:55
*** liyi has joined #senlin11:39
*** liyi has quit IRC11:44
*** jmlowe has joined #senlin12:15
*** chenyb4 has joined #senlin12:26
*** jmlowe has quit IRC12:30
*** jmlowe has joined #senlin12:38
*** chenyb4 has quit IRC12:41
openstackgerritMerged openstack/senlin master: Remove _static from releasenotes  https://review.openstack.org/54033012:43
*** jmlowe has quit IRC12:48
*** jmlowe has joined #senlin12:51
*** jmlowe has quit IRC13:03
*** jmlowe has joined #senlin13:04
*** jmlowe has quit IRC13:14
*** jmlowe has joined #senlin13:15
*** jmlowe has quit IRC13:27
*** jmlowe has joined #senlin13:29
*** liyi has joined #senlin13:43
*** liyi has quit IRC13:47
*** chenyb4 has joined #senlin14:20
*** chenyb4 has quit IRC15:23
*** liyi has joined #senlin16:02
*** liyi has quit IRC16:06
*** AlexeyAbashkin has quit IRC16:14
*** AlexeyAbashkin has joined #senlin16:56
*** liyi has joined #senlin17:21
*** liyi has quit IRC17:25
*** AlexeyAbashkin has quit IRC17:39
*** jkuei has joined #senlin18:14
jkueiHi XueFeng, it seems like we are experiencing the database locking problem on cluster_lock from bug #1681620. When we use our terraform senlin plugin to spin up multiple nodes or attach multiple policies concurrently, it runs into the database locking problem, whereas if we spin those up serially everything is ok18:19
openstackbug 1681620 in senlin "ACTION failed when can't get the cluster lock" [Critical,Fix committed] https://launchpad.net/bugs/1681620 - Assigned to XueFeng Liu (jonnary-liu)18:19
*** liyi has joined #senlin18:20
jkueiaccording to the error, it is a primary key constraint violation during insertion into the cluster_lock table.18:22
*** liyi has quit IRC18:25
jkueito clarify, it is NOT a database lock itself, but a failed primary key constraint on the cluster_lock table18:59
*** AlexeyAbashkin has joined #senlin19:11
*** liyi has joined #senlin19:30
*** liyi has quit IRC19:35
*** AlexeyAbashkin has quit IRC19:58
*** AlexeyAbashkin has joined #senlin20:04
*** AlexeyAbashkin has quit IRC20:19
*** liyi has joined #senlin20:20
*** liyi has quit IRC20:24
*** liyi has joined #senlin21:19
*** liyi has quit IRC21:24
eanderssonjkuei, o/22:02
jkueiHalla Hej eandersson!22:10
*** jkuei has quit IRC23:02
*** jkuei has joined #senlin23:05
