*** Qiming has quit IRC | 00:38 | |
*** zzxwill has quit IRC | 01:14 | |
openstackgerrit | zzxwill proposed openstack/python-senlinclient: Correct some typos https://review.openstack.org/303944 | 01:23 |
*** Qiming has joined #senlin | 01:34 | |
*** yanyanhu has joined #senlin | 01:38 | |
*** elynn has joined #senlin | 02:00 | |
*** elynn has quit IRC | 02:05 | |
*** elynn has joined #senlin | 02:05 | |
Qiming | elynn, seems the session_for_write() thing is not reliable | 02:06 |
Qiming | still some parameters to tune, maybe? | 02:06 |
elynn | There are still some potential race conditions. | 02:06 |
openstackgerrit | Merged openstack/python-senlinclient: Correct some typos https://review.openstack.org/303944 | 02:07 |
Qiming | so we cannot treat session_for_write as a safe, atomic way for the db api? | 02:07 |
Qiming | what's the difference between obj.save(session) and session.add(obj)? | 02:07 |
elynn | I think so, but I think we cannot lock the db just through db_api | 02:07 |
elynn | session.add() is for new records | 02:07 |
Qiming | the latter was seen in the oslo.db doc, the former is what we have been using | 02:07 |
elynn | obj.save() is for updates. | 02:08 |
Qiming | I'm not sure that is true | 02:08 |
Qiming | cannot understand why two nodes being added to the same cluster are getting the same index value | 02:09 |
*** ddeja has quit IRC | 02:09 | |
elynn | save() will not flush the session before it is saved to the db, I think. | 02:10 |
elynn | If they both go into https://github.com/openstack/senlin/blob/master/senlin/db/sqlalchemy/api.py#L162 at the same time, their index will be the same. | 02:11 |
elynn | I think. | 02:11 |
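A minimal sketch of the interleaving elynn describes, assuming a simplified stand-in table `cluster(id, next_index)` and using the stdlib `sqlite3` module rather than Senlin's oslo.db/SQLAlchemy stack. Both connections read the counter before either writes it back, so both claim the same index (a classic lost update). The schema, SQL strings and function names are hypothetical.

```python
import os
import sqlite3
import tempfile

# Hypothetical stand-in for Senlin's cluster table: only the counter matters.
SCHEMA = "CREATE TABLE cluster (id TEXT PRIMARY KEY, next_index INTEGER)"
READ = "SELECT next_index FROM cluster WHERE id = 'c1'"
WRITE = "UPDATE cluster SET next_index = ? WHERE id = 'c1'"


def make_db():
    """Create a throwaway on-disk SQLite DB so two connections can share it."""
    fd, path = tempfile.mkstemp(suffix=".db")
    os.close(fd)
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    conn.execute("INSERT INTO cluster VALUES ('c1', 1)")
    conn.commit()
    conn.close()
    return path


def racy_interleaving(path):
    """Replay the bad interleaving by hand: read, read, write, write."""
    a = sqlite3.connect(path)  # "engine A" adding a node
    b = sqlite3.connect(path)  # "engine B" adding a node to the same cluster
    idx_a = a.execute(READ).fetchone()[0]  # both read before anyone writes...
    idx_b = b.execute(READ).fetchone()[0]
    a.execute(WRITE, (idx_a + 1,))  # ...then each writes back its own value + 1
    a.commit()
    b.execute(WRITE, (idx_b + 1,))
    b.commit()
    final = a.execute(READ).fetchone()[0]
    a.close()
    b.close()
    return idx_a, idx_b, final


path = make_db()
print(racy_interleaving(path))  # (1, 1, 2): both nodes got index 1
os.remove(path)
```

Each engine computes the new value in Python from a copy it read earlier, which is exactly what lets the second write silently overwrite the first.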
*** yuanying has quit IRC | 02:12 | |
*** yuanying has joined #senlin | 02:12 | |
Qiming | so we need to check if the cluster_next_index is "successful" and "retry", if the session_for_write is a real DB transaction | 02:13 |
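One way to read "check if it succeeded and retry" is optimistic concurrency (compare-and-swap): the UPDATE only matches if the counter still holds the value that was read, and a zero row count means another writer got there first. A minimal sketch with stdlib `sqlite3`, assuming the same hypothetical `cluster(id, next_index)` table; `next_index_cas` and `max_retries` are illustrative names, not Senlin code.

```python
import sqlite3


def next_index_cas(conn, cluster_id, max_retries=5):
    """Claim the next index with compare-and-swap and retry on conflict."""
    for _ in range(max_retries):
        (current,) = conn.execute(
            "SELECT next_index FROM cluster WHERE id = ?", (cluster_id,)
        ).fetchone()
        # Only matches the row if nobody changed next_index since we read it.
        cur = conn.execute(
            "UPDATE cluster SET next_index = ? WHERE id = ? AND next_index = ?",
            (current + 1, cluster_id, current),
        )
        conn.commit()  # end this attempt's transaction either way
        if cur.rowcount == 1:
            return current  # the index is ours
        # rowcount == 0: another writer bumped the counter first; re-read.
    raise RuntimeError("could not allocate an index for %r" % cluster_id)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cluster (id TEXT PRIMARY KEY, next_index INTEGER)")
conn.execute("INSERT INTO cluster VALUES ('c1', 1)")
conn.commit()
print([next_index_cas(conn, "c1") for _ in range(3)])  # [1, 2, 3]
```

Committing after every attempt matters: it ends the transaction, so the next SELECT can see a competing writer's change instead of an old snapshot.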
yanyanhu | Qiming, I'm testing the membership issue | 02:15 |
elynn | You mean outside db_api? | 02:16 |
Qiming | not sure | 02:16 |
Qiming | the previous DB isolation hacks are all going away when we switch to the context manager and reader/writer thing | 02:16 |
yanyanhu | didn't happen locally without that patch | 02:16 |
Qiming | yanyanhu, it is rare | 02:17 |
yanyanhu | yes | 02:17 |
yanyanhu | let me make more tests | 02:17 |
Qiming | but there is a possibility, as the gate revealed | 02:17 |
yanyanhu | right, need to figure out what is the reason | 02:17 |
elynn | I think this problem is caused by enabling concurrency in the functional tests, which exposes some potential race conditions. | 02:18 |
yanyanhu | elynn, but that happened inside a single test case | 02:18 |
Qiming | cluster membership one is not that case | 02:18 |
yanyanhu | no clue in engine log | 02:19 |
Qiming | it is telling us that the db transaction isolation is very fragile now | 02:19 |
yanyanhu | Qiming, yes, agree | 02:19 |
Qiming | still need to inject auto_commit and expire_on_commit options I guess | 02:19 |
elynn | hmm, indeed, yes | 02:20 |
Qiming | or even the 'isolation_level' option if that does not solve the problem | 02:20 |
yanyanhu | maybe it happens once in a hundred runs, but it can happen | 02:20 |
Qiming | yep, that is very annoying and very dangerous | 02:21 |
elynn | yes, should find a way to lock write session. | 02:21 |
elynn | I should go through oslo_db docs | 02:21 |
yanyanhu | Qiming, elynn, I have some new understanding about the relationship between Rally and Tempest. If you guys have time, I want to make a call with you to make a quick sync about the next step for senlin stress/scenario test support. | 02:23 |
elynn | Do you think this patch will help? https://github.com/openstack/keystone/commit/9535c0908465b626b01fccec034905196f78dc1a | 02:24 |
elynn | yanyanhu, I want to talk about it with you, How about noon at starbucks? | 02:24 |
yanyanhu | elynn, that's ok for me | 02:25 |
Qiming | ok | 02:25 |
Qiming | http://stackoverflow.com/questions/2334824/how-to-increase-a-counter-in-sqlalchemy | 02:25 |
elynn | hmm, interesting | 02:26 |
elynn | Let's have a try. | 02:27 |
Qiming | http://stackoverflow.com/questions/4167568/how-to-do-atomic-increment-decrement-with-elixir-sqlalchemy | 02:27 |
Qiming | the key of the second one is the combination of the filter and update operation | 02:27 |
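A minimal sketch of the filter-plus-update idea from the second link, using stdlib `sqlite3` and an assumed `cluster(id, next_index)` table. The increment happens inside a single UPDATE (roughly the statement SQLAlchemy's `Query.update()` emits when given a column expression such as `Cluster.next_index + 1`), so no Python-side value can go stale between the read and the write. The table and function names are hypothetical.

```python
import sqlite3


def next_index_atomic(conn, cluster_id):
    """Claim the next index with a single in-database increment."""
    with conn:  # commit on success, roll back on exception
        # Filter and update in one statement: the database does the "+ 1",
        # so no Python-side copy of the counter can go stale.
        cur = conn.execute(
            "UPDATE cluster SET next_index = next_index + 1 WHERE id = ?",
            (cluster_id,),
        )
        if cur.rowcount != 1:
            raise LookupError("no cluster %r" % cluster_id)
        # Same write transaction: we see our own increment, and SQLite's
        # write lock keeps other writers out until the commit.
        (new_value,) = conn.execute(
            "SELECT next_index FROM cluster WHERE id = ?", (cluster_id,)
        ).fetchone()
    return new_value - 1  # the claimed index is the pre-increment value


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cluster (id TEXT PRIMARY KEY, next_index INTEGER)")
conn.execute("INSERT INTO cluster VALUES ('c1', 1)")
conn.commit()
print([next_index_atomic(conn, "c1") for _ in range(3)])  # [1, 2, 3]
```

This also keeps the transaction as short as possible, in line with the point about not carrying Python values across the read and the write.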
elynn | ok, so the key is to do a db transaction as quick as we can. | 02:28 |
Qiming | avoid introducing python values or objects | 02:29 |
Qiming | two possibilities | 02:30 |
Qiming | the session.commit() wasn't called quickly enough in the session_for_write() logic | 02:31 |
Qiming | another possibility is the commit() wasn't flushed into db because the object in question wasn't expired immediately | 02:32 |
elynn | session.commit() will be called automatically after the with session_for_write block, I think. | 02:32 |
Qiming | if that is the case, session_for_write() is not preventing a reentrance into the transaction logic | 02:33 |
elynn | From what we observed, that might be true, but I'm not sure why | 02:34 |
Qiming | so elynn | 02:43 |
Qiming | from http://stackoverflow.com/questions/21653319/sqlalchemy-concurrency-update-issue | 02:43 |
Qiming | and http://stackoverflow.com/questions/26356131/when-a-sqlalchemy-concurrent-updates-the-same-record-what-is-wrong | 02:43 |
Qiming | I think we should go query.with_for_update() | 02:44 |
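`Query.with_for_update()` asks the database for `SELECT ... FOR UPDATE`, a pessimistic row lock held until the transaction ends. SQLite has no `FOR UPDATE` clause, so this sketch swaps in its closest analogue, `BEGIN IMMEDIATE`, which takes the database write lock before the read. The idea it illustrates is the same: a second writer waits (or, with `timeout=0`, fails at once) instead of reading a value that is about to go stale. The table, column and function names are assumptions.

```python
import os
import sqlite3
import tempfile


def next_index_locked(conn, cluster_id):
    """Pessimistic variant: lock first, then read-modify-write.

    Expects a connection opened with isolation_level=None so that the
    explicit BEGIN/COMMIT below are the only transaction control.
    """
    conn.execute("BEGIN IMMEDIATE")  # take the write lock before reading
    try:
        (current,) = conn.execute(
            "SELECT next_index FROM cluster WHERE id = ?", (cluster_id,)
        ).fetchone()
        conn.execute(
            "UPDATE cluster SET next_index = ? WHERE id = ?",
            (current + 1, cluster_id),
        )
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise
    return current


def demo():
    """Show that a second writer is shut out while the first holds the lock."""
    fd, path = tempfile.mkstemp(suffix=".db")
    os.close(fd)
    # timeout=0: fail immediately instead of waiting for the lock
    a = sqlite3.connect(path, isolation_level=None, timeout=0)
    b = sqlite3.connect(path, isolation_level=None, timeout=0)
    a.execute("CREATE TABLE cluster (id TEXT PRIMARY KEY, next_index INTEGER)")
    a.execute("INSERT INTO cluster VALUES ('c1', 1)")
    a.execute("BEGIN IMMEDIATE")  # A holds the write lock...
    try:
        b.execute("BEGIN IMMEDIATE")  # ...so B cannot start a write transaction
        blocked = False
    except sqlite3.OperationalError:  # "database is locked"
        blocked = True
    a.execute("ROLLBACK")  # A releases the lock
    claimed = [next_index_locked(b, "c1"), next_index_locked(a, "c1")]
    a.close()
    b.close()
    os.remove(path)
    return blocked, claimed


print(demo())  # (True, [1, 2])
```

As the second link notes, holding locks like this can lead to deadlocks when several rows are locked in different orders; a single counter row per call avoids that.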
elynn | Let me look at these links | 02:44 |
Qiming | the second link did mention the possibility of deadlocks, but that is not a concern in our context | 02:45 |
Qiming | I'm so curious/confused why such a problem wasn't discussed on the mailing list | 02:45 |
Qiming | or ... people have all found some secret sauces of different taste? | 02:46 |
openstackgerrit | Merged openstack/senlin: Catch DBDuplicateEntry Error during cred_create https://review.openstack.org/303204 | 02:46 |
Qiming | sigh, this is really not an easy task: http://skien.cc/blog/2014/01/15/sqlalchemy-and-race-conditions-implementing/ | 02:49 |
elynn | They have their small secrets. | 02:49 |
elynn | I would guess they solve the atomic issue outside of db_api. | 02:50 |
Qiming | using python lock? | 02:50 |
Qiming | that would be an overkill | 02:50 |
Qiming | db consistency is the foundation | 02:51 |
Qiming | unless ... you don't trust the db you are using, which is true in some contexts where you are using some distributed KV stores | 02:51 |
Qiming | here is another article: http://rachbelaid.com/handling-race-condition-insert-with-sqlalchemy/ | 02:52 |
*** yuanying has quit IRC | 02:52 | |
Qiming | even insertion has some problems, but this post is a good reference for the credential thing we were looking at | 02:53 |
elynn | I'm going through the docs of oslo.db hoping I can find some way to solve this http://docs.openstack.org/developer/oslo.db/api/oslo.db.sqlalchemy.session.html | 02:55 |
Qiming | okay | 02:58 |
Qiming | i'd suggest you read at least the last post | 02:59 |
Qiming | it is very inspiring | 02:59 |
elynn | yes, I'm reading. | 02:59 |
elynn | Try, roll back, and get again? | 03:07 |
Qiming | what do you mean? | 03:10 |
Qiming | that solution of 'get_or_create_link' looks like a pretty clean one for cred_create_or_update() | 03:11 |
elynn | I mean the logic from the blog you pasted: it tries to create, catches the error, does a session rollback and then tries to get the existing record. It's a little complicated, but the blog does show the workflow of how the race condition happens. | 03:11 |
Qiming | your current workaround, catching DBDuplicateEntry, looks similar | 03:12 |
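A minimal sketch of the insert-first, read-on-conflict pattern from the get_or_create article, using stdlib `sqlite3` and a hypothetical `credential` table keyed on `(user_id, project_id)`. Here the driver's `sqlite3.IntegrityError` plays the role that oslo.db's `DBDuplicateEntry` plays in Senlin: the losing request rolls back and returns the row the winner created. `cred_get_or_create` is an illustrative name, not Senlin's db api.

```python
import sqlite3

# Hypothetical credential table: one row per (user, project) pair.
SCHEMA = ("CREATE TABLE credential (user_id TEXT, project_id TEXT, cred TEXT, "
          "PRIMARY KEY (user_id, project_id))")


def cred_get_or_create(conn, user_id, project_id, cred):
    """Try the INSERT first; on a duplicate key, roll back and read the winner's row.

    Returns (stored_cred, created).
    """
    try:
        with conn:  # commit on success, roll back on error
            conn.execute(
                "INSERT INTO credential (user_id, project_id, cred) VALUES (?, ?, ?)",
                (user_id, project_id, cred),
            )
        return cred, True
    except sqlite3.IntegrityError:
        # A concurrent request created the row first; use the existing one.
        (existing,) = conn.execute(
            "SELECT cred FROM credential WHERE user_id = ? AND project_id = ?",
            (user_id, project_id),
        ).fetchone()
        return existing, False


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
print(cred_get_or_create(conn, "u1", "p1", "trust-1"))  # ('trust-1', True)
print(cred_get_or_create(conn, "u1", "p1", "trust-2"))  # ('trust-1', False)
```

Relying on the unique key lets the database arbitrate the race, rather than checking for existence first and inserting afterwards, which reopens the same window.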
elynn | Not so sure which is better. | 03:13 |
Qiming | as for the index problem | 03:13 |
Qiming | I'd suggest we try the with_for_update() hack | 03:13 |
Qiming | according to this: http://docs.sqlalchemy.org/en/rel_0_9/orm/query.html#sqlalchemy.orm.query.Query.with_for_update | 03:14 |
Qiming | with_for_update() is superseding the with_lockmode() call | 03:14 |
elynn | Let's test it. | 03:15 |
Qiming | em | 03:15 |
openstackgerrit | Ethan Lynn proposed openstack/senlin: [WIP] Fix for cluster_next_index db call https://review.openstack.org/304371 | 03:17 |
*** yuanying has joined #senlin | 03:50 | |
*** elynn has quit IRC | 04:21 | |
*** zzxwill has joined #senlin | 05:17 | |
*** elynn has joined #senlin | 05:44 | |
*** elynn has quit IRC | 05:49 | |
*** elynn has joined #senlin | 05:49 | |
*** zzxwill has quit IRC | 06:19 | |
elynn | Hi Qiming | 06:26 |
elynn | Do we need to apply with_for_update for all update operations? | 06:27 |
openstackgerrit | Ethan Lynn proposed openstack/senlin: Rename tempest_tests to tempest https://review.openstack.org/303264 | 06:36 |
Qiming | elynn, I'd suggest we think twice | 06:36 |
Qiming | it is ... after all ... a hack | 06:37 |
elynn | ok, so for now just apply for cluster_next_index. | 06:37 |
Qiming | until we find another trace of concurrency error | 06:38 |
openstackgerrit | Ethan Lynn proposed openstack/senlin: Use with_for_update in cluster_next_index db call https://review.openstack.org/304371 | 06:42 |
openstackgerrit | Qiming Teng proposed openstack/senlin: Clarify some guidelines on contribution https://review.openstack.org/304413 | 07:08 |
openstackgerrit | Qiming Teng proposed openstack/senlin: Release note for API versioning https://review.openstack.org/304414 | 07:11 |
openstackgerrit | Qiming Teng proposed openstack/senlin: Release note for trust creation concurrency https://review.openstack.org/304416 | 07:15 |
openstackgerrit | Qiming Teng proposed openstack/senlin: Release note for the event generation bug fix https://review.openstack.org/304419 | 07:19 |
openstackgerrit | Qiming Teng proposed openstack/senlin: Reorg user documentation to be references https://review.openstack.org/304423 | 07:29 |
openstackgerrit | Merged openstack/python-senlinclient: Updated from global requirements https://review.openstack.org/303006 | 07:41 |
*** zzxwill has joined #senlin | 07:56 | |
openstackgerrit | Qiming Teng proposed openstack/senlin: Initial framework for user tutorial doc https://review.openstack.org/304444 | 08:24 |
*** elynn has quit IRC | 08:33 | |
*** elynn has joined #senlin | 08:36 | |
*** zzxwill has quit IRC | 08:39 | |
*** zzxwill has joined #senlin | 08:40 | |
openstackgerrit | Qiming Teng proposed openstack/senlin: Fix problems in glossary https://review.openstack.org/304450 | 08:42 |
*** elynn_ has joined #senlin | 08:44 | |
*** zzxwill has quit IRC | 08:47 | |
*** elynn has quit IRC | 08:48 | |
*** zzxwill has joined #senlin | 09:07 | |
*** zzxwill has quit IRC | 09:10 | |
*** zzxwill has joined #senlin | 09:15 | |
*** openstackgerrit has quit IRC | 09:17 | |
*** openstackgerrit has joined #senlin | 09:18 | |
*** zzxwill has quit IRC | 09:20 | |
openstackgerrit | Qiming Teng proposed openstack/senlin: Clarify some guidelines on contribution https://review.openstack.org/304413 | 09:23 |
*** yanyanhu has quit IRC | 09:24 | |
*** shu-mutou is now known as shu-mutou-AFK | 09:35 | |
openstackgerrit | Merged openstack/senlin: Clarify some guidelines on contribution https://review.openstack.org/304413 | 09:42 |
*** openstackstatus has quit IRC | 09:57 | |
*** openstack has joined #senlin | 10:02 | |
*** zzxwill has joined #senlin | 10:03 | |
*** Qiming has quit IRC | 10:14 | |
*** zzxwill has quit IRC | 10:23 | |
*** zzxwill has joined #senlin | 10:25 | |
*** elynn_ has quit IRC | 10:28 | |
*** zzxwill has quit IRC | 10:34 | |
*** zzxwill has joined #senlin | 10:34 | |
*** zzxwill has quit IRC | 10:48 | |
*** zzxwill has joined #senlin | 10:50 | |
*** zzxwill has quit IRC | 11:13 | |
*** Qiming has joined #senlin | 11:22 | |
*** zzxwill has joined #senlin | 11:57 | |
*** openstack has quit IRC | 12:04 | |
*** openstack has joined #senlin | 12:08 | |
*** zzxwill has quit IRC | 12:10 | |
openstackgerrit | Merged openstack/senlin: Release note for API versioning https://review.openstack.org/304414 | 12:43 |
openstackgerrit | Merged openstack/senlin: Release note for trust creation concurrency https://review.openstack.org/304416 | 12:43 |
openstackgerrit | Merged openstack/senlin: Release note for the event generation bug fix https://review.openstack.org/304419 | 12:43 |
openstackgerrit | Merged openstack/senlin: Reorg user documentation to be references https://review.openstack.org/304423 | 12:43 |
openstackgerrit | Qiming Teng proposed openstack/senlin: Fix problems in glossary https://review.openstack.org/304450 | 12:44 |
*** cschulz_ has joined #senlin | 12:47 | |
*** elynn has joined #senlin | 12:55 | |
*** yanyanhu has joined #senlin | 12:58 | |
*** zzxwill has joined #senlin | 13:01 | |
*** zzxwill has quit IRC | 13:08 | |
*** zzxwill has joined #senlin | 13:09 | |
*** elynn has quit IRC | 13:12 | |
*** elynn has joined #senlin | 13:13 | |
*** zzxwill has quit IRC | 13:14 | |
*** zzxwill has joined #senlin | 13:19 | |
*** zzxwill has quit IRC | 13:23 | |
*** zzxwill has joined #senlin | 13:29 | |
openstackgerrit | Liuqing Jing proposed openstack/senlin-dashboard: Fix empty `Timestamp` column in cluster/node event tables https://review.openstack.org/302772 | 13:50 |
*** jhesketh has left #senlin | 13:57 | |
*** elynn has quit IRC | 14:00 | |
yanyanhu | hi, Qiming, about the approval process, I think using policy to support it is better since policy is supposed to be customized by user/developer | 14:01 |
yanyanhu | but if we embed such a mechanism into the senlin scaling request, it's difficult to make it flexible enough to meet everyone's demands | 14:02 |
Qiming | the approval is not supposed to be done in minutes | 14:05 |
Qiming | it will be processed by human in the middle | 14:05 |
cschulz_ | The issue with policy is that it is after the action is started which causes a problem with locking | 14:05 |
yanyanhu | yes, just a trade-off between refactoring policy mechanism and redesign request process | 14:06 |
Qiming | exactly, our policy definition is a rule set to be checked when an action is about to be executed | 14:06 |
cschulz_ | One way we could do it is to have a policy make an approval request and then cancel the action. | 14:07 |
cschulz_ | The response from approval would be to create a special scaling request. | 14:07 |
Qiming | yes, cschulz_, that is in fact turning the scaling request into two steps | 14:08 |
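A rough sketch of the two-step flow cschulz_ describes: the first (unapproved) request is cancelled by a policy check that records an approval request, and the approver's response becomes a second, ordinary scaling request. Everything here (`ScaleAction`, `ApprovalPolicy`, `pre_op`, `approve`) is hypothetical and is not Senlin's policy API; a real approval flow would also need to re-evaluate whether the scaling is still needed by the time it is granted.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class ScaleAction:
    cluster_id: str
    count: int
    approved: bool = False  # only the follow-up request carries approval
    status: str = "READY"


@dataclass
class ApprovalPolicy:
    # (cluster_id, count) pairs waiting for a human decision
    pending: List[Tuple[str, int]] = field(default_factory=list)

    def pre_op(self, action: ScaleAction) -> bool:
        """Return True if the action may proceed."""
        if action.approved:
            return True  # step 2: an ordinary, approved scaling request
        # step 1: record an approval request and cancel; nothing is changed yet
        self.pending.append((action.cluster_id, action.count))
        action.status = "CANCELLED"
        return False


def approve(policy: ApprovalPolicy, index: int) -> ScaleAction:
    """Simulate the approver's response: turn a pending request into a new action."""
    cluster_id, count = policy.pending.pop(index)
    return ScaleAction(cluster_id, count, approved=True)


policy = ApprovalPolicy()
first = ScaleAction("c1", 3)
print(policy.pre_op(first), first.status)  # False CANCELLED
second = approve(policy, 0)
print(policy.pre_op(second), policy.pending)  # True []
```

Because the first action is cancelled rather than held open, no cluster lock is kept while a human takes minutes (or longer) to respond.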
yanyanhu | so the policy does some work in the background | 14:08 |
Qiming | the second one is the one we can handle today | 14:08 |
Qiming | the first step, doesn't look like a real request yet | 14:09 |
yanyanhu | can action timeout provide some help here? | 14:09 |
cschulz_ | The issue is that the condition that generated the original request may have changed. | 14:09 |
cschulz_ | So there really needs to be a re-evaluation of the situation at the time of the approval | 14:09 |
Qiming | em | 14:10 |
Qiming | that is an issue | 14:10 |
Qiming | but we cannot avoid that, right? | 14:10 |
yanyanhu | yes | 14:10 |
yanyanhu | we can't | 14:10 |
Qiming | if the request is delayed for 20 minutes for approval | 14:10 |
yanyanhu | this problem always exists | 14:10 |
cschulz_ | That is why my thinking was leaning toward doing all the decision process outside before ever issuing any requests to Senlin | 14:11 |
yanyanhu | I agree with cschulz_ | 14:11 |
Qiming | yep | 14:11 |
yanyanhu | since this kind of processing is actually out of senlin's scope | 14:11 |
Qiming | so long as the user knows what will happen | 14:11 |
yanyanhu | even out of ceilometer/monasca's scope I guess | 14:11 |
cschulz_ | Makes it much simpler as far as Senlin is concerned; it just needs a secure way for external entities to tell Senlin to take actions. | 14:12 |
cschulz_ | e.g. Zaqar | 14:12 |
yanyanhu | yep | 14:12 |
Qiming | okay, that makes perfect sense | 14:12 |
Qiming | senlin doesn't have to know the source of the trigger | 14:12 |
cschulz_ | Nope | 14:12 |
Qiming | I'm thinking if we should improve our scaling policy | 14:13 |
Qiming | say, if a request is about adding more than 10 VMs at a time, we should send a message/alert | 14:13 |
cschulz_ | I think we need to have a notification policy | 14:13 |
yanyanhu | that's a good idea | 14:14 |
Qiming | report this to the boss, :) | 14:14 |
cschulz_ | To send a message on Zaqar, and a Zaqar Receiver | 14:14 |
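A minimal sketch of the notify-but-proceed check discussed above: scale-outs larger than a threshold (10 nodes, from the example) trigger a message, for instance to a Zaqar queue, without blocking the action. The function name, the threshold constant and the `notify` callable are hypothetical stand-ins, not an existing Senlin policy or Zaqar client API.

```python
from typing import Callable

ALERT_THRESHOLD = 10  # "more than 10 VMs at a time", from the discussion


def check_scale_out(count: int, notify: Callable[[str], None],
                    threshold: int = ALERT_THRESHOLD) -> bool:
    """Send a message for unusually large scale-outs, then let them proceed."""
    if count > threshold:
        notify("scale-out of %d nodes exceeds threshold %d" % (count, threshold))
    return True  # notification only: this check never blocks the action


messages = []
check_scale_out(11, messages.append)
check_scale_out(10, messages.append)
print(messages)  # ['scale-out of 11 nodes exceeds threshold 10']
```

Passing `notify` in as a callable keeps the check independent of the transport, so the same logic could publish to a queue, a log, or a test list.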
Qiming | yep | 14:14 |
cschulz_ | Am working on a Zaqar client for Senlin | 14:14 |
Qiming | speaking of emitting notifications | 14:14 |
Qiming | I'm thinking of unifying that into the senlin event module | 14:15 |
yanyanhu | may also need to extend event? | 14:15 |
yanyanhu | :) | 14:15 |
Qiming | use the same event interface | 14:15 |
Qiming | allow more than one backend for the event, configurable for deployers | 14:15 |
cschulz_ | Where can I look into that? Any documentation? | 14:15 |
yanyanhu | yes | 14:15 |
Qiming | one backend is db, one is zaqar queue, one is oslo_notification | 14:15 |
yanyanhu | like ceilometer sampling | 14:16 |
yanyanhu | no document for senlin event I guess... | 14:16 |
yanyanhu | as of now | 14:16 |
Qiming | cschulz_, http://git.openstack.org/cgit/openstack/senlin/tree/senlin/engine/event.py | 14:16 |
yanyanhu | wow | 14:17 |
yanyanhu | it is there :) | 14:17 |
yanyanhu | oh, just code... | 14:17 |
cschulz_ | thanks. I read the code. | 14:17 |
Qiming | each function is basically constructing an Event object and serializing that into the DB | 14:17 |
yanyanhu | will leave. talk to you guys later | 14:18 |
Qiming | we can extend that, with the DB as just one of the backends | 14:18 |
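A minimal sketch of an event module with pluggable backends: each event is built once and fanned out to every deployer-configured backend, with the database becoming just one of them. The class names, the `BACKENDS` registry and the event fields are hypothetical; real Zaqar or oslo.messaging backends would replace the in-memory stand-ins.

```python
import json
from typing import Dict, List


class DBBackend:
    """Stand-in for today's behaviour: persist each event (here, in a list)."""

    def __init__(self) -> None:
        self.rows: List[Dict] = []

    def dump(self, event: Dict) -> None:
        self.rows.append(event)


class MessageBackend:
    """Stand-in for a message-based backend (e.g. a Zaqar queue or notifier)."""

    def __init__(self) -> None:
        self.sent: List[str] = []

    def dump(self, event: Dict) -> None:
        self.sent.append(json.dumps(event, sort_keys=True))


# Registry that a deployer-facing config option could select from.
BACKENDS = {"database": DBBackend, "message": MessageBackend}


def load_backends(names: List[str]) -> List:
    return [BACKENDS[name]() for name in names]


def emit(backends: List, level: str, oid: str, action: str, reason: str) -> Dict:
    """Build the event once, then fan it out to every configured backend."""
    event = {"level": level, "oid": oid, "action": action, "reason": reason}
    for backend in backends:
        backend.dump(event)
    return event


backends = load_backends(["database", "message"])
emit(backends, "INFO", "node-1", "NODE_CREATE", "Node created")
print(backends[1].sent)
```

Keeping a single `emit` entry point means callers do not change when a deployer adds or removes a backend; only the configured list does.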
cschulz_ | Qiming, your Notes availability times seem to be in the US time zone | 14:18 |
Qiming | bye yanyanhu | 14:18 |
cschulz_ | bye yanyanhu | 14:18 |
yanyanhu | bye | 14:18 |
*** yanyanhu has quit IRC | 14:18 | |
Qiming | hahah | 14:18 |
Qiming | my day time is mostly empty, but my night time is ... busy, overlapping meetings | 14:19 |
cschulz_ | Your availability appears on my Notes to be from now to EOD EST. Do you work all night in Beijing? | 14:20 |
*** zzxwill has quit IRC | 14:26 | |
Qiming | no, cschulz_, I sleep | 14:30 |
cschulz_ | So Notes is wrong. That's no surprise | 14:30 |
Qiming | okay, I see | 14:31 |
Qiming | I was trying to schedule something during the coming summit | 14:31 |
Qiming | so ... the best way to do that is to switch my notes time zone, schedule them, then switch back | 14:31 |
Qiming | I forgot to switch back I think | 14:32 |
cschulz_ | OK. makes sense. You got any time tomorrow? Morning your time? | 14:32 |
Qiming | yes | 14:32 |
cschulz_ | let me know what works for you. I'll send invite. | 14:37 |
*** zzxwill has joined #senlin | 14:40 | |
Qiming | ok | 14:42 |
*** zzxwill has quit IRC | 14:46 | |
*** zzxwill has joined #senlin | 15:02 | |
*** jdandrea_ has joined #senlin | 15:10 | |
*** jdandrea has quit IRC | 15:14 | |
*** openstackgerrit has quit IRC | 15:18 | |
*** openstackgerrit has joined #senlin | 15:19 | |
*** jdandrea_ is now known as jdandrea | 15:25 | |
*** zzxwill has quit IRC | 15:47 | |
*** Qiming has quit IRC | 16:22 | |
*** yuanying has quit IRC | 19:04 | |
*** nk2527 has left #senlin | 19:11 | |
*** cschulz_ has quit IRC | 20:38 | |
*** yuanying has joined #senlin | 23:18 | |
*** Qiming has joined #senlin | 23:35 | |
*** yuanying has quit IRC | 23:36 |
Generated by irclog2html.py 2.14.0 by Marius Gedminas - find it at mg.pov.lt!