*** liyi has joined #senlin | 00:06 | |
*** liyi has quit IRC | 00:06 | |
*** fabian has joined #senlin | 01:01 | |
*** fabian is now known as chenyb4 | 01:01 | |
*** pzchen has joined #senlin | 01:23 | |
*** openstackgerrit has joined #senlin | 01:36 | |
*** ChanServ sets mode: +v openstackgerrit | 01:36 | |
openstackgerrit | XueFeng Liu proposed openstack/senlin master: Updated from global requirements https://review.openstack.org/539842 | 01:36 |
---|---|---|
*** liyi has joined #senlin | 01:56 | |
*** zhenglingwu has joined #senlin | 02:26 | |
*** zhenglingwu has quit IRC | 02:30 | |
*** liyi has quit IRC | 02:31 | |
*** liyi has joined #senlin | 02:35 | |
*** liyi has quit IRC | 02:35 | |
*** liyi has joined #senlin | 02:35 | |
openstackgerrit | James E. Blair proposed openstack/python-senlinclient master: Zuul: Remove project name https://review.openstack.org/540221 | 03:03 |
*** zhenglingwu has joined #senlin | 03:11 | |
openstackgerrit | chenpengzi proposed openstack/senlin-tempest-plugin master: Update OpenStack Style Commandments address https://review.openstack.org/540236 | 03:31 |
*** pzchen_ has joined #senlin | 04:18 | |
*** pzchen has quit IRC | 04:18 | |
*** chenyb4 has quit IRC | 04:18 | |
*** chenyb4 has joined #senlin | 04:18 | |
*** ceryx has quit IRC | 04:36 | |
*** jmlowe_ has quit IRC | 04:40 | |
*** jmlowe has joined #senlin | 04:41 | |
*** chenyb4 has quit IRC | 04:51 | |
*** chenyb4 has joined #senlin | 04:52 | |
*** jmlowe has quit IRC | 04:54 | |
*** jmlowe has joined #senlin | 04:55 | |
*** dtruong_ has joined #senlin | 04:59 | |
*** dtruong_ has quit IRC | 05:00 | |
openstackgerrit | OpenStack Proposal Bot proposed openstack/senlin master: Updated from global requirements https://review.openstack.org/540253 | 05:06 |
*** jmlowe has quit IRC | 05:15 | |
*** jmlowe has joined #senlin | 05:17 | |
*** dtruong_ has joined #senlin | 05:29 | |
*** jmlowe has quit IRC | 05:42 | |
*** jmlowe has joined #senlin | 05:43 | |
XueFeng | Qiming | 05:50 |
XueFeng | I repaired this patch. https://review.openstack.org/#/c/539842/ | 05:51 |
*** jmlowe has quit IRC | 05:57 | |
*** jmlowe has joined #senlin | 06:00 | |
openstackgerrit | chenyb4 proposed openstack/python-senlinclient master: Add functional create, update, delete tests https://review.openstack.org/540263 | 06:05 |
openstackgerrit | chenyb4 proposed openstack/python-senlinclient master: Add functional create, update, delete tests https://review.openstack.org/540263 | 06:08 |
*** jmlowe has quit IRC | 06:10 | |
*** jmlowe has joined #senlin | 06:12 | |
dtruong_ | qiming: is it possible there is a race condition in this code: https://github.com/openstack/senlin/blob/master/senlin/db/sqlalchemy/api.py#L424 | 06:18 |
Qiming | Hi, dtruong_, it is possible | 06:19 |
dtruong_ | we have seen errors when multiple attach policy actions are launched simultaneously on the same cluster: senlin attempts to insert the lock for the same cluster into the cluster_lock table | 06:20 |
Qiming | Maybe we should add something like line 1062 in the same file | 06:20 |
dtruong_ | i'm sure that would be enough to eliminate the problem we are seeing | 06:22 |
*** jmlowe has quit IRC | 06:22 | |
Qiming | In theory, there shouldn't be any race condition, but sqlalchemy plus oslo.db abstract too many things away; we had a difficult time getting the in-memory objects synchronized with DB contents | 06:23 |
dtruong_ | it appears like in our case two attach policy actions hit line 424 at the same time | 06:23 |
*** jmlowe has joined #senlin | 06:23 | |
dtruong_ | and since nobody has a lock on the cluster, both will try to insert the cluster lock into the cluster_lock table | 06:24 |
Qiming | If that happens, it means the Lock python object wasn't properly updated when the lock status changed | 06:24 |
Qiming | One of them will fail, right? | 06:24 |
dtruong_ | yes, the second one will fail with an exception | 06:24 |
Qiming | Then it should retry automatically | 06:25 |
dtruong_ | saying that it is trying to add an entry with duplicate primary key | 06:25 |
Qiming | Sigh, sounds like the session_for_write() call is not trustworthy then ... | 06:26 |
dtruong_ | we don't see any retry. the sql error is going up all the way to the caller of attach policy API | 06:26 |
Qiming | The retry logic is here: https://github.com/openstack/senlin/blob/master/senlin/engine/senlin_lock.py#L41 | 06:27 |
dtruong_ | is the session_for_write supposed to lock out any other thread/process from writing to the DB? | 06:27 |
Qiming | What else would you guess from the call then? | 06:27 |
Qiming | It is supposed to give us an exclusive write session to the db | 06:28 |
Qiming | If it is not doing that, it is useless | 06:28 |
dtruong_ | ok. i'm not too familiar with those APIs | 06:28 |
Qiming | As I just mentioned, the sqlalchemy and oslo.db abstractions are hiding too many details | 06:29 |
Qiming | We did receive some complaints about lock synchronization | 06:29 |
dtruong_ | the retry is not happening right now because line 435 is throwing an exception | 06:29 |
dtruong_ | let me find the stack trace | 06:30 |
Qiming | Ah, okay, the retry logic is not handling exceptions | 06:31 |
Qiming | In that case, I'd suggest trying to add the annotation we used on line 1062 and seeing if it helps | 06:32 |
dtruong | Feb 2 00:55:51 sandboxController senlin-engine[12568]: 2018-02-02 00:55:51.162 ERROR senlin.engine.actions.base [req-8632e7c0-e731-4c69-8dbb-1f700de80ba8 None None] Unexpected exception occurred during action CLUSTER_DETACH_POLICY (4ec24ba3-fd33-483a-8cf4-4b6f34324b48) execution: (pymysql.err.IntegrityError) (1062, u"Duplicate entry '80315a11-23d3-4fd3-b0ae-e124542d20cf' for key 'PRIMARY'") | 06:33 |
dtruong | [SQL: u'INSERT INTO cluster_lock (cluster_id, action_ids, semaphore) VALUES (%(cluster_id)s, %(action_ids)s, %(semaphore)s)'] [parameters: {'action_ids': '["4ec24ba3-fd33-483a-8cf4-4b6f34324b48"]', 'cluster_id': '80315a11-23d3-4fd3-b0ae-e124542d20cf', 'semaphore': -1}] (Background on this error at: http://sqlalche.me/e/gkpj): DBDuplicateEntry: (pymysql.err.IntegrityError) (1062, u"Duplicate entry '80315a11-23d3-4fd3-b0ae-e124542d20cf' for key ' | 06:33 |
dtruong | PRIMARY'") [SQL: u'INSERT INTO cluster_lock (cluster_id, action_ids, semaphore) VALUES (%(cluster_id)s, %(action_ids)s, %(semaphore)s)'] [parameters: {'action_ids': '["4ec24ba3-fd33-483a-8cf4-4b6f34324b48"]', 'cluster_id': '80315a11-23d3-4fd3-b0ae-e124542d20cf', 'semaphore': -1}] (Background on this error at: http://sqlalche.me/e/gkpj) | 06:33 |
dtruong | Feb 2 00:55:51 sandboxController senlin-engine[12568]: 2018-02-02 00:55:51.162 TRACE senlin.engine.actions.base Traceback (most recent call last): | 06:33 |
dtruong | Feb 2 00:55:51 sandboxController senlin-engine[12568]: 2018-02-02 00:55:51.162 TRACE senlin.engine.actions.base File "/opt/stack/senlin/senlin/engine/actions/base.py", line 499, in ActionProc | 06:33 |
dtruong | Feb 2 00:55:51 sandboxController senlin-engine[12568]: 2018-02-02 00:55:51.162 TRACE senlin.engine.actions.base result, reason = action.execute() | 06:33 |
dtruong | Feb 2 00:55:51 sandboxController senlin-engine[12568]: 2018-02-02 00:55:51.162 TRACE senlin.engine.actions.base File "/opt/stack/senlin/senlin/engine/actions/cluster_action.py", line 1101, in execute | 06:33 |
dtruong | Feb 2 00:55:51 sandboxController senlin-engine[12568]: 2018-02-02 00:55:51.162 TRACE senlin.engine.actions.base forced) | 06:34 |
dtruong | Feb 2 00:55:51 sandboxController senlin-engine[12568]: 2018-02-02 00:55:51.162 TRACE senlin.engine.actions.base File "/opt/stack/senlin/senlin/engine/senlin_lock.py", line 58, in cluster_lock_acquire | 06:34 |
dtruong | Feb 2 00:55:51 sandboxController senlin-engine[12568]: 2018-02-02 00:55:51.162 TRACE senlin.engine.actions.base owners = cl_obj.ClusterLock.acquire(cluster_id, action_id, scope) | 06:34 |
dtruong | Feb 2 00:55:51 sandboxController senlin-engine[12568]: 2018-02-02 00:55:51.162 TRACE senlin.engine.actions.base File "/opt/stack/senlin/senlin/objects/cluster_lock.py", line 32, in acquire | 06:34 |
*** jmlowe has quit IRC | 06:34 | |
dtruong | Feb 2 00:55:51 sandboxController senlin-engine[12568]: 2018-02-02 00:55:51.162 TRACE senlin.engine.actions.base return db_api.cluster_lock_acquire(cluster_id, action_id, scope) | 06:34 |
dtruong | Feb 2 00:55:51 sandboxController senlin-engine[12568]: 2018-02-02 00:55:51.162 TRACE senlin.engine.actions.base File "/opt/stack/senlin/senlin/db/api.py", line 129, in cluster_lock_acquire | 06:34 |
dtruong | Feb 2 00:55:51 sandboxController senlin-engine[12568]: 2018-02-02 00:55:51.162 TRACE senlin.engine.actions.base return IMPL.cluster_lock_acquire(cluster_id, action_id, scope) | 06:34 |
dtruong | Feb 2 00:55:51 sandboxController senlin-engine[12568]: 2018-02-02 00:55:51.162 TRACE senlin.engine.actions.base File "/opt/stack/senlin/senlin/db/sqlalchemy/api.py", line 436, in cluster_lock_acquire | 06:34 |
dtruong | Feb 2 00:55:51 sandboxController senlin-engine[12568]: 2018-02-02 00:55:51.162 TRACE senlin.engine.actions.base return lock.action_ids | 06:34 |
dtruong | Feb 2 00:55:51 sandboxController senlin-engine[12568]: 2018-02-02 00:55:51.162 TRACE senlin.engine.actions.base File "/usr/lib/python2.7/contextlib.py", line 24, in __exit__ | 06:34 |
dtruong | Feb 2 00:55:51 sandboxController senlin-engine[12568]: 2018-02-02 00:55:51.162 TRACE senlin.engine.actions.base self.gen.next() | 06:34 |
dtruong | Feb 2 00:55:51 sandboxController senlin-engine[12568]: 2018-02-02 00:55:51.162 TRACE senlin.engine.actions.base File "/usr/local/lib/python2.7/dist-packages/oslo_db/sqlalchemy/enginefacade.py", line 1036, in _transaction_scope | 06:34 |
*** dtruong_ has quit IRC | 06:36 | |
dtruong | ok, i will try adding the annotation and see if that fixes the problem | 06:36 |
Qiming | Hope the sqlalchemy/oslo.db layer can detect this and handle it | 06:37 |
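Editor's sketch of the failure mode discussed above, not Senlin's actual code. The log does not show the annotation on line 1062, so the `retry_on_duplicate` decorator below is a hypothetical stand-in for it. The example uses the standard-library `sqlite3` module instead of MySQL/oslo.db. The `before_insert` hook is also an assumption, added only so the gap between the check and the insert can be exercised deterministically. The table layout follows the INSERT statement in the traceback.

```python
import functools
import json
import sqlite3


def retry_on_duplicate(max_retries=3):
    """Hypothetical decorator: re-run the call when an INSERT hits a duplicate key."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(max_retries + 1):
                try:
                    return func(*args, **kwargs)
                except sqlite3.IntegrityError:
                    # Another action inserted the lock row first; give up only
                    # after the last allowed attempt.
                    if attempt == max_retries:
                        raise
        return wrapper
    return decorator


def make_db():
    """Create an in-memory table shaped like the INSERT in the traceback."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE cluster_lock ("
        "cluster_id TEXT PRIMARY KEY, action_ids TEXT, semaphore INTEGER)"
    )
    return conn


@retry_on_duplicate(max_retries=3)
def cluster_lock_acquire(conn, cluster_id, action_id, before_insert=None):
    """Check-then-insert lock acquisition; returns the list of lock owners."""
    row = conn.execute(
        "SELECT action_ids FROM cluster_lock WHERE cluster_id = ?",
        (cluster_id,),
    ).fetchone()
    if row is not None:
        # Lock already held: report the current owners to the caller.
        return json.loads(row[0])
    if before_insert is not None:
        before_insert()  # test hook: lets a competitor slip in here
    with conn:  # commits on success, rolls back on exception
        conn.execute(
            "INSERT INTO cluster_lock (cluster_id, action_ids, semaphore) "
            "VALUES (?, ?, -1)",
            (cluster_id, json.dumps([action_id])),
        )
    return [action_id]
```

Without the decorator, a competitor that inserts between the SELECT and the INSERT makes the second caller fail with a duplicate-key error, which matches the symptom in the stack trace. With it, the second attempt re-reads the table, sees the existing row, and returns the real owner, so the caller's normal "lock is held by someone else" path can take over.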
*** dtruong_ has joined #senlin | 06:39 | |
*** jmlowe has joined #senlin | 06:48 | |
*** dtruong_ has quit IRC | 06:56 | |
*** jmlowe has quit IRC | 07:00 | |
*** jmlowe has joined #senlin | 07:07 | |
*** jmlowe has quit IRC | 07:17 | |
*** AlexeyAbashkin has joined #senlin | 07:18 | |
*** jmlowe has joined #senlin | 07:29 | |
*** jmlowe has quit IRC | 07:46 | |
*** AlexeyAbashkin has quit IRC | 07:47 | |
*** jmlowe has joined #senlin | 07:52 | |
*** AlexeyAbashkin has joined #senlin | 07:55 | |
*** jmlowe has quit IRC | 07:59 | |
*** zhenguo has joined #senlin | 07:59 | |
*** jmlowe has joined #senlin | 08:23 | |
*** jmlowe has quit IRC | 08:39 | |
*** jmlowe has joined #senlin | 08:56 | |
*** jmlowe has quit IRC | 09:12 | |
*** jmlowe has joined #senlin | 09:17 | |
openstackgerrit | chenyb4 proposed openstack/senlin master: Fix node creates the specified cluster error https://review.openstack.org/540320 | 09:17 |
*** liyi has quit IRC | 09:19 | |
openstackgerrit | chenyb4 proposed openstack/senlin master: Fix node creates the specified cluster error https://review.openstack.org/540320 | 09:20 |
*** jmlowe has quit IRC | 09:26 | |
*** jmlowe has joined #senlin | 09:48 | |
*** jmlowe has quit IRC | 10:04 | |
chenyb4 | hi, Qiming XueFeng https://review.openstack.org/#/c/540320/ the unit tests show a different error. | 10:08 |
*** AlexeyAbashkin has quit IRC | 10:14 | |
*** AlexeyAbashkin has joined #senlin | 10:14 | |
openstackgerrit | Andreas Jaeger proposed openstack/senlin master: Remove _static from releasenotes https://review.openstack.org/540330 | 10:15 |
Qiming | chenyb4, the gate is bad | 10:19 |
Qiming | It is trying to use a higher version of openstacksdk | 10:19 |
Qiming | You can check from the error log | 10:19 |
Qiming | From openstack review, you can see openstacksdk version bump requests have been abandoned several times | 10:20 |
chenyb4 | ok | 10:20 |
*** AlexeyAbashkin has quit IRC | 10:23 | |
*** AlexeyAbashkin has joined #senlin | 10:23 | |
*** pzchen_ has quit IRC | 10:24 | |
*** jmlowe has joined #senlin | 10:25 | |
*** chenyb4 has quit IRC | 10:25 | |
*** jmlowe has quit IRC | 10:38 | |
*** test_123 has joined #senlin | 10:50 | |
*** liyi has joined #senlin | 10:51 | |
*** test_123 has left #senlin | 10:51 | |
*** glance_ has joined #senlin | 10:52 | |
*** liyi has quit IRC | 10:55 | |
*** liyi has joined #senlin | 11:39 | |
*** liyi has quit IRC | 11:44 | |
*** jmlowe has joined #senlin | 12:15 | |
*** chenyb4 has joined #senlin | 12:26 | |
*** jmlowe has quit IRC | 12:30 | |
*** jmlowe has joined #senlin | 12:38 | |
*** chenyb4 has quit IRC | 12:41 | |
openstackgerrit | Merged openstack/senlin master: Remove _static from releasenotes https://review.openstack.org/540330 | 12:43 |
*** jmlowe has quit IRC | 12:48 | |
*** jmlowe has joined #senlin | 12:51 | |
*** jmlowe has quit IRC | 13:03 | |
*** jmlowe has joined #senlin | 13:04 | |
*** jmlowe has quit IRC | 13:14 | |
*** jmlowe has joined #senlin | 13:15 | |
*** jmlowe has quit IRC | 13:27 | |
*** jmlowe has joined #senlin | 13:29 | |
*** liyi has joined #senlin | 13:43 | |
*** liyi has quit IRC | 13:47 | |
*** chenyb4 has joined #senlin | 14:20 | |
*** chenyb4 has quit IRC | 15:23 | |
*** liyi has joined #senlin | 16:02 | |
*** liyi has quit IRC | 16:06 | |
*** AlexeyAbashkin has quit IRC | 16:14 | |
*** AlexeyAbashkin has joined #senlin | 16:56 | |
*** liyi has joined #senlin | 17:21 | |
*** liyi has quit IRC | 17:25 | |
*** AlexeyAbashkin has quit IRC | 17:39 | |
*** jkuei has joined #senlin | 18:14 | |
jkuei | Hi XueFeng, it seems like we are experiencing the database locking problem for cluster_lock from bug #1681620. When we use our terraform senlin plugin to spin up multiple nodes or attach multiple policies concurrently, it runs into the database locking problem, whereas if we spin those up serially everything is ok | 18:19 |
openstack | bug 1681620 in senlin "ACTION failed when can't get the cluster lock" [Critical,Fix committed] https://launchpad.net/bugs/1681620 - Assigned to XueFeng Liu (jonnary-liu) | 18:19 |
*** liyi has joined #senlin | 18:20 | |
jkuei | according to the error, it is a primary key constraint violation during insertion into the cluster_lock table. | 18:22 |
*** liyi has quit IRC | 18:25 | |
jkuei | to clarify, it is NOT a database lock itself, but a failed primary key constraint on the cluster_lock table | 18:59 |
*** AlexeyAbashkin has joined #senlin | 19:11 | |
*** liyi has joined #senlin | 19:30 | |
*** liyi has quit IRC | 19:35 | |
*** AlexeyAbashkin has quit IRC | 19:58 | |
*** AlexeyAbashkin has joined #senlin | 20:04 | |
*** AlexeyAbashkin has quit IRC | 20:19 | |
*** liyi has joined #senlin | 20:20 | |
*** liyi has quit IRC | 20:24 | |
*** liyi has joined #senlin | 21:19 | |
*** liyi has quit IRC | 21:24 | |
eandersson | jkuei, o/ | 22:02 |
jkuei | Halla Hej eandersson! | 22:10 |
*** jkuei has quit IRC | 23:02 | |
*** jkuei has joined #senlin | 23:05 |