Monday, 2017-04-10

*** zhurong has joined #senlin00:33
*** yanyanhu has joined #senlin01:36
*** dixiaoli has joined #senlin01:38
*** yanyanhu has quit IRC01:40
*** zhurong has quit IRC01:58
*** zhurong has joined #senlin02:10
*** yanyanhu has joined #senlin02:20
openstackgerritQiming Teng proposed openstack/senlin-dashboard master: DNM: Use mock for unit test instead of mox3  https://review.openstack.org/45511303:31
*** catintheroof has joined #senlin03:43
*** catintheroof has quit IRC03:51
openstackgerritMerged openstack/senlin master: Add unit test for action sqlarchemy api.  https://review.openstack.org/45495404:17
openstackgerritXueFeng Liu proposed openstack/senlin master: Add cidr, ip_version to sub_net create  https://review.openstack.org/45513506:50
openstackgerritRUIJIE YUAN proposed openstack/senlin master: revise action data dumping  https://review.openstack.org/45242507:16
openstackgerritQiming Teng proposed openstack/senlin-dashboard master: DNM: Use mock for unit test instead of mox3  https://review.openstack.org/45511307:18
openstackgerritQiming Teng proposed openstack/senlin-dashboard master: DNM: Use mock for unit test instead of mox3  https://review.openstack.org/45511307:44
XueFeng QiMing, onlin? local tempest prepare has been finished. Next will add create/delete to each test cause.07:53
XueFengonline07:53
Qiminggreat! thx!07:54
XueFengmy pleasure07:54
XueFengAnd after these test causes can passed in local.07:55
XueFengWe can remove KEEP_LOCALRC=1 in gate?07:55
*** ruijie has joined #senlin08:08
openstackgerritRUIJIE YUAN proposed openstack/senlin master: dump data to action.outputs for deletion process  https://review.openstack.org/45515108:18
QimingXueFeng, I'm not quite sure about that08:18
Qimingmaybe yanyanhu knows more08:19
yanyanhuhi08:19
*** elynn has joined #senlin08:20
yanyanhuXueFeng, this flag is for ensuring our local setting on devstack take effect during the test08:20
yanyanhulast time we checked the job template, this flag is required. Not sure whether it is still needed(I guess so?)08:21
yanyanhuhi, XueFeng, any special requirement for removing it?08:23
XueFenghi yanyanhu, KEEP_LOCAL=1 use devstack auth. And I am supporting temptest use dynamical auth.08:28
yanyanhudevstack auth, you mean?08:29
XueFengI think KEEP_LOCALRC=1 will user,project... which we installed in devstack08:30
yanyanhuXueFeng, you mean tempest will use those users/projects pre-created during the devstack installation?08:31
XueFengYep08:31
XueFengIn /opt/stack/tempest# vi etc/tempest.conf08:32
XueFeng[auth]08:32
XueFenguse_dynamic_credentials = True08:32
XueFenguse_dynamic_credentials = True is default08:33
XueFengAnd if we use this conf, we can't passs senlin tempest08:34
openstackgerritMerged openstack/senlin master: Add cidr, ip_version to sub_net create  https://review.openstack.org/45513508:34
XueFengWe will get No keypair08:34
yanyanhuXueFeng, yes, I think tempest will use dynamic user credential by default08:34
XueFengfor example08:34
XueFengAnd last time08:34
yanyanhuXueFeng, I think on keypair is because that keypair is created by other user(e.g. admin) and thus will be invisible for those users tempest uses08:35
XueFengWhen Ocata release08:35
yanyanhuon/no08:35
XueFengWe got fail in senlin gate08:35
yanyanhuso a better solution could be creating keypair before using it in tempest test :)08:36
XueFengAnd use KEEP_LOCALRC=1 to avoid08:36
XueFengSo now, I am supporting  use_dynamic_credentials = True08:36
XueFengfro senlin tempest08:36
XueFengfor08:36
XueFengyes, yanyanhu08:37
yanyanhuhi, XueFeng, KEEP_LOCALRC was added last time because some change in devstack side make our local configuration been overridden and thus failed our test jobs08:37
yanyanhuso it was added to ensure those configuration we added in pre_test_hook takes effect08:38
yanyanhuuse_dynamic_credentials is default True I guess?08:38
yanyanhuafter tempest is installed08:39
yanyanhuI did notice there were some users/projects left in my system when some of my tempest test failed locally :)08:39
yanyanhufor creating keypair, we did similar thing before08:40
XueFengYes08:40
yanyanhulet me find some example for you08:40
XueFengI know this.08:40
Qimingthe reason we added 'KEEP_LOCALRC=1' last time was that we cannot customize senlin.conf when setting up tempest08:40
Qimingit is not related to authentication as I see it08:40
XueFengOK08:41
Qimingwe wanted to switch cloud_backend to 'openstack_test' for api and functional tests, but the customization was overridden due to devstack-lib logics08:41
XueFengOk, so  these are two problems.08:42
yanyanhuhttp://git.openstack.org/cgit/openstack/senlin/tree/senlin/tests/tempest/integration/test_nova_server_cluster.py#n2708:42
yanyanhuhere is the logic to create keypair before creating nova server in integration test08:43
XueFengYes, this is create by admin08:43
XueFengBut for api and functional test we can't pass now in loacl.08:44
yanyanhuI guess it should be created by the user that tempest uses to perform the integration test?08:44
XueFengyes08:44
yanyanhunot admin08:44
XueFenghttps://review.openstack.org/#/c/453670/08:45
yanyanhuXueFeng, you mean the functional test can't pass if it is performed locally?08:45
XueFengSo some patches was add08:45
XueFengYes, this may begain from Ocata Release08:46
XueFenghttps://bugs.launchpad.net/senlin/+bug/167862508:46
openstackLaunchpad bug 1678625 in senlin "senlin local temptest fail" [Undecided,In progress] - Assigned to XueFeng Liu (jonnary-liu)08:46
ruijiethe network 'private' created by tempest that 'shared' property is set to 'False' by default, is that still fine?08:46
yanyanhuif so, consider to change is_admin_manager=True to False?08:47
XueFengThe  is_admin_manager is add by me:)08:47
yanyanhuI guess we don't have to create keypair as admin user?08:47
yanyanhuyes, I know that :)08:47
XueFengalso network, subnet08:48
yanyanhuI didn't noticed that it changes the default behavior of integration08:48
XueFengSo next will use these methods.08:48
ruijieits not shared by talents is not admin ?08:48
XueFengyanyanhu, I have test passed in may local08:49
yanyanhuruijie, ideally, we should create our own network for integration test :)08:49
yanyanhuas what we are doing for keypair08:49
XueFengNot submit the last patch08:49
ruijieagreed yanyanhu :)08:50
yanyanhuXueFeng, ok. So I feel we'd better avoid using admin user during tempest test08:50
XueFengYes08:50
yanyanhutempest itself will handle the credential creation/usage during the test : )08:52
XueFengYes08:52
yanyanhuin opposite, we create every resource we need for testing08:52
yanyanhuand delete them after test is done08:52
XueFengThis is way we failed in current senlin version08:52
XueFengI am sloving this problem08:52
yanyanhugreat, thanks :)08:53
XueFengkeypair ,network, subnet api maybe changed when Ocata relase.08:53
yanyanhuyes08:54
yanyanhuthey could be08:54
XueFengAnd then it cause senlin gate fail in February08:54
XueFengThen we use https://review.openstack.org/#/c/437851/ to slove08:55
yanyanhuhi, XueFeng, that fix is not for API change in keypair/network or compute08:56
XueFengOK08:56
yanyanhuit is for a change in devstack which makes our local configuration failed to takes effect during gate installation08:56
yanyanhujust as Qiming mentioned, we will revise the "cloud_backend=openstack_test" option for functional test08:57
yanyanhuthis change is added in pre_test_hook.sh08:57
XueFengGot it. Thanks yanyanhu, Qiming08:58
yanyanhuso they are two different problems :)08:58
yanyanhuone is about fixing gate by add 'KEEP_LOCALRC=1'08:58
xuhaiweiHi guys, sorry to bother you, I got this error when trying to deploy senlin resource from heat, I tried to create two policies(scalein/scaleout) and attach them to one cluster, but the first ACTION didn't release the lock though it has completed, this is the error log from senlin-engine  http://paste.openstack.org/show/606016/  and the error log from heat-engine http://paste.openstack.org/show/606018/08:58
yanyanhuanother is about better designing our tempest test case  :)08:59
*** shu-mutou is now known as shu-mutou-AWAY08:59
xuhaiweithis is the heat template I am using  http://paste.openstack.org/show/606019/09:00
xuhaiweisorry to interrupt09:00
xuhaiweiCan anyone take a look of it? yanyanhu, elynn, Qiming, Xuefeng09:03
yanyanhuxuhaiwei, sounds like a bug. So the first action didn't release the cluster lock?09:03
yanyanhuafter it succeeded09:03
xuhaiweiyanyanhu, yes it seems, the sencond POLICY_ATTACH action's status is ready, but didn't excuted09:04
Qiminglooks like a db concurrency error09:05
Qimingthis should never occur because basically you are inserting two records to the cluster_policy table09:06
Qimingif the lock wasn't released by the first action, you may want to check senlin-engine log09:07
elynnHi xuhaiwei09:07
xuhaiweiQiming, I have pasted senlin-engine log above09:07
xuhaiweielynn: is there any mistake in template?09:08
elynnseems some lock issue in senlin....09:09
Qimingthe template looks fine to me09:09
elynnThe design in senlin resource is execute action one by one, so might need to dig deeper in senlin side..09:10
elynnI guess.09:10
xuhaiweican anyone recreate the error in your environment?09:10
elynnxuhaiwei: Do you encounter this everytime?09:10
xuhaiweielynn: yes, every time09:10
elynnhmm...09:10
elynnLet me check heat codes first...09:11
Qimingsenlin side retry logic was removed ....09:13
xuhaiweiI tried it again, error is still there, but the error log is different09:13
xuhaiwei2017-04-10 09:12:37.198 ERROR senlin.engine.senlin_lock [req-12813ee9-2974-4afd-b8e9-883aeb208814 None None] Cluster is already locked by action [u'8d4dfd7e-5c83-4426-9b6b-3284a70f3377'], action deeb1567-5e1e-4426-b0c9-dbd710603a6e failed grabbing the lock09:13
xuhaiweithis is all the error log in senlin-engine09:14
Qimingyep, that is normal09:14
Qimingtwo requests arrived at senlin attempting to lock the same cluster09:15
Qimingone of them will fail09:15
Qimingbecause the other takes some time to release it09:15
Qiminghttp://git.openstack.org/cgit/openstack/senlin/commit/senlin/engine/senlin_lock.py?id=b6e8d758a0e33547a3fd78c434337e80b0a826fa09:16
xuhaiweiQiming, the first action is finished, it will release the lock to the action which is in 'READY' status, isn't it?09:17
Qimingyes09:17
Qimingbut the second action is supposed to do a RETRY09:17
XueFeng       else:  # result == self.RES_RETRY:09:17
XueFeng            status = self.READY09:17
XueFeng            # Action failed at the moment, but can be retried09:17
XueFeng            # We abandon it and then notify other dispatchers to execute it09:17
XueFeng            ao.Action.abandon(self.context, self.id)09:17
XueFengMaybe here we can optimize09:18
XueFengthe notify hasn't been send immidately. And we will try this action when next action come.09:19
Qimingproblem is here: http://git.openstack.org/cgit/openstack/senlin/tree/senlin/engine/actions/base.py#n47909:20
Qimingwe somehow assigned a action.RES_ERROR to the result variable during exception handling on line 48209:21
Qimingthat line should be removed09:21
Qiminghopefully, the finally statement on line 491 will get executed and set the action to RES_RETRY09:21
Qiminglooks like action.execute() raised an exception here ?09:22
elynnseems like so.09:24
*** dixiaoli has quit IRC09:24
Qimingso we are not properly getting a RES_RETRY return value09:24
Qimingokay, here: http://git.openstack.org/cgit/openstack/senlin/tree/senlin/engine/actions/cluster_action.py#n100209:26
Qimingwe didn't expect cluster_lock_acquire to raise exceptions09:26
*** dixiaoli has joined #senlin09:26
yanyanhuhi, Qiming, in case there are some errors happen during the action execution and the status of some resources have been changed, if we simplly let the action to retry, more serious error could happen09:26
Qimingso we only checked the returned value09:26
xuhaiweiQiming, it seems no exceptions happened in action.execute()09:26
yanyanhuI mean simply removing line482 could cause some problems09:27
Qimingxuhaiwei, yanyanhu, check execute() method in cluster_action09:27
Qimingline 100209:27
yanyanhuyes, we should catch and handle exception here09:27
Qimingthe cluster_lock_acquire may actuall throw an exception, in addition to return some meaningful value09:28
Qimingwe failed to catch it09:28
yanyanhuyep09:28
Qimingif we catch it here, it is a RES_RETRY09:28
Qimingand it is safe09:28
yanyanhuyes09:28
yanyanhuit should be a try in this case09:28
xuhaiweiQiming, yanyanhu, I tested it, after catch the exception you said, it worked09:32
Qimingcool09:33
Qiminga different fix would be keep it at the db level09:33
yanyanhuQiming, +109:33
yanyanhuthat is better09:33
xuhaiweianyone is writing patch for this?09:36
Qimingxuhaiwei as far as I know, :)09:36
xuhaiweiQiming, any hint on fixing it at db layer?09:37
Qimingstudying09:37
Qimingto minimize the potential impact to other code09:38
yanyanhuxuhaiwei, take this as an example :)09:38
yanyanhuhttp://git.openstack.org/cgit/openstack/senlin/tree/senlin/db/sqlalchemy/api.py#n86309:38
QimingI'd suggest we wrap the call here: http://git.openstack.org/cgit/openstack/senlin/tree/senlin/db/sqlalchemy/api.py#n43309:39
xuhaiweiQiming , yanyanhu, thanks , I will try to fix it09:40
Qimingwith a try: ... catch db_exc.DBDuplicateEntry:09:40
xuhaiweiit seems DBDuplicateEntry is not happened every time09:41
Qimingem, right09:42
QimingDBDuplicateEntry is a guard against DB level concurrency09:42
elynnThe weird thing here is that https://github.com/openstack/senlin/blob/master/senlin/db/sqlalchemy/api.py#L425 , two query seems happened at the same time.09:42
elynnThen they both insert the lock.09:43
elynnQiming's way it a way to fix.09:43
QimingI don't have a good idea what the return value should be even if you caught the exception there09:43
elynnShould be None09:43
elynnI guess.09:43
QimingNone is okay09:44
Qimingbut have to double check the senlin_lock module and see if None is properly handled, right?09:44
Qimingno, should return []09:45
Qimingor else, you will get 'None is not iterable' or similar errors in senlin_lock09:46
elynnOh, yes, you are right.09:46
*** yanyanhu has quit IRC09:46
Qimingthen it means cluster_lock_acquire from senlin_lock module will return False if lock has been grabbed09:47
Qimingwe don't need to change ClusterAction.execute() to add exception handling09:47
Qiminghowever, removing line 482 here is still recommended, xuhaiwei : http://git.openstack.org/cgit/openstack/senlin/tree/senlin/engine/actions/base.py#n48209:48
xuhaiweiQiming, ok, will test it09:50
xuhaiweiQiming, by remove line 482, the error is still remained09:54
Qimingline 482 should be removed although it is not the key issue of the bug09:56
elynnxuhaiwei: To fix your problem: catch DBDuplicateEntry and return [] at http://git.openstack.org/cgit/openstack/senlin/tree/senlin/db/sqlalchemy/api.py#n43310:15
xuhaiweielynn: it worked? I will test it, thank you10:16
xuhaiweielynn: I am afraid the exception is not happend there10:16
elynnit didn't work?10:19
xuhaiweielynn: yes, exception didn't happen there10:21
elynnMaybe I misunderstand how session work.10:21
xuhaiweiI will go home now, see you guys10:22
elynnFrom your log, it is happened in cluster_lock_acquire xuhaiwei10:22
elynnxuhaiwei: ttl10:23
xuhaiweiyes10:23
*** elynn has quit IRC10:43
*** dixiaoli has quit IRC10:48
*** zhurong has quit IRC12:26
*** catintheroof has joined #senlin12:36
*** openstackgerrit has quit IRC13:33
*** openstackgerrit has joined #senlin13:34
*** ChanServ sets mode: +v openstackgerrit13:34
openstackgerritXueFeng Liu proposed openstack/senlin master: Add prepare and cleanup for nova server local tempest  https://review.openstack.org/45532513:34
openstackgerritXueFeng Liu proposed openstack/senlin master: Add prepare and cleanup to all needed test cases  https://review.openstack.org/45533113:59
openstackgerritXueFeng Liu proposed openstack/senlin master: Add prepare and cleanup to all needed test cases  https://review.openstack.org/45533114:01
openstackgerritXueFeng Liu proposed openstack/senlin master: Add prepare and cleanup to all needed test cases  https://review.openstack.org/45533114:20
*** sweeneyb has quit IRC21:22
*** catintheroof has quit IRC21:25
*** catintheroof has joined #senlin22:27

Generated by irclog2html.py 2.14.0 by Marius Gedminas - find it at mg.pov.lt!