Thursday, 2016-04-21

*** rossella_s has quit IRC01:03
*** rossella_s has joined #openstack-ha01:04
*** hoangcx has joined #openstack-ha01:32
*** sasukeh has quit IRC01:35
*** sasukeh has joined #openstack-ha01:56
*** masahito has quit IRC02:32
*** masahito has joined #openstack-ha02:43
*** masahito has quit IRC02:58
*** masahito has joined #openstack-ha03:00
*** sasukeh has quit IRC03:02
*** masahito has quit IRC03:38
*** masahito has joined #openstack-ha03:39
*** masahito has quit IRC03:39
*** moiz has joined #openstack-ha03:51
*** hoangcx_ has joined #openstack-ha03:59
*** beekhof has quit IRC04:00
*** hoangcx has quit IRC04:00
*** sasukeh has joined #openstack-ha04:05
*** sasukeh has quit IRC04:15
*** sasukeh has joined #openstack-ha04:25
*** hoangcx_ has quit IRC04:45
*** masahito has joined #openstack-ha04:45
*** beekhof has joined #openstack-ha04:46
*** hoangcx has joined #openstack-ha04:48
*** rossella_s has quit IRC05:03
*** rossella_s has joined #openstack-ha05:04
moizmasahito: I have set up the Masakari controller on the openstack controller, and the host, process & instance monitors on the openstack compute nodes.05:12
moizmasahito: all the processes are running. However, the evacuations are not happening.05:13
masahitomoiz: hi05:13
masahitomoiz: I read your problem.05:13
masahitomoiz: first of all, none of the monitor processes has a host fencing feature.05:14
masahitomoiz: so you need to set up an RA for nova-compute to fence the host when nova-compute goes down.05:15
masahitomoiz: processmonitor only disables nova-compute on its host if processmonitor fails to restart processes listed in proc.list.05:16
moizokay so from pacemaker side, i need to configure nova-compute RA, fence-nova & ipmilan for each compute node05:18
moizso pacemaker is responsible for detecting nova-compute as down & then fencing it automatically05:18
masahitomoiz: yes it is. if you want to fence node when nova-compute goes down.05:19
moizif i don't want to fence the compute node via pacemaker, and maybe write my own script which calls fence_compute directly, is it possible?05:19
moizbecause last time i configured nova-compute over pacemaker 1.1.12rc4, it didn't work.05:20
moizpacemaker kept giving me 'not installed' errors for nova-compute on compute nodes, which doesn't make sense as they are installed and running on the compute nodes.05:21
openstackgerritchen.xing proposed openstack/ha-guide: Add a note of virtual node  https://review.openstack.org/30823705:22
masahitoit means you don't want to fence the node when nova-compute goes down, but want to fence the node when some crash happens on the node, right?05:23
masahitoif so, yes.05:24
masahitowrite your fencing script for pacemaker based on your usecase of fencing.05:25
moizyes. okay great. so where does masakari come in? i need to understand the work flow from when libvirt goes down till evacuation.05:26
moizas i understand, the masakari process monitor detects libvirt down, tells the controller, waits for fencing & calls the evacuation API. is this correct?05:26
masahitoyes05:27
masahitosorry, no05:27
masahitofor evacuation.05:27
moizwhat is the correct workflow for evacuation? please explain.05:29
masahitoa host goes down, pacemaker running on another host detects the host down, pacemaker marks the host OFFLINE, hostmonitor detects the host down and sends it to controller, and then the controller waits for fencing and calls the evacuation API05:30
masahitolong steps :-)05:30
masahitofollowing steps are processmonitor's step:05:31
*** mjura has joined #openstack-ha05:32
masahitoprocess monitor detects libvirt down, tells the controller, and then the controller disables nova-compute running on the host where libvirt is down05:32
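The two paths masahito describes can be sketched as a small dispatch function. This is only an illustration of the ordering given in the conversation, not Masakari's code: the event names `host_down` and `process_down` and the action names are hypothetical.

```python
# Hypothetical event and action names, chosen for illustration only.
def controller_actions(event):
    """Return the controller's steps, in order, for a monitor notification."""
    if event == "host_down":
        # hostmonitor path: wait until the failed host is fenced, then evacuate
        return ["wait_for_fencing", "call_evacuate_api"]
    if event == "process_down":
        # processmonitor path: only disable nova-compute on the affected host
        return ["disable_nova_compute"]
    raise ValueError("unknown event: %s" % event)


print(controller_actions("host_down"))
print(controller_actions("process_down"))
```

Under these assumptions, a libvirt failure reported by processmonitor never reaches the evacuation step; only a host-down report from hostmonitor does.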
moiz1. Process monitor: does exactly the same. i have noticed this on my setup05:33
moiz2. Hostmonitor: pacemaker sets the remote node (compute node) as OFFLINE. which host monitor will detect it as down? because hostmonitor will die along with the compute node.05:34
moizmy host monitors are running on compute nodes only. i have 1 compute running all 3 monitors, and 1 compute running all 3 monitors that is the RESERVED HOST. openstack controller is only running masakari controller (not monitors)05:35
*** nkrinner has joined #openstack-ha05:35
*** mjura has quit IRC05:39
*** mjura has joined #openstack-ha05:39
masahitoto clarify, you have 2 compute nodes: one is a compute for VMs and has all 3 monitors, the other is a compute for the RESERVED HOST and has all 3 monitors.05:40
masahitoand05:40
masahitowhere are full stack pacemaker and pacemaker-remote deployed?05:41
moizyes05:41
moizcompute nodes are running pacemaker-remote05:41
moizopenstack controller node is running full stack pacemaker05:42
moizand i have added remote nodes in the pacemaker cluster (on the controller)05:42
masahitooh, got it.05:42
masahitoso can you see the pacemaker status of all nodes from the reserved host?05:43
masahitoby crm_mon command05:43
moizthat's another thing i was going to mention. crm_mon would run on the controller node where the pacemaker stack is. and on the compute nodes only pacemaker remote is installed, the clients are not installed there. and i was looking at masakari scripts & it's calling crm_mon on the compute node05:45
masahitoright, I think that's the root cause.05:45
moizi need to install pacemaker clients on the compute nodes. crmsh ?05:46
masahitohostmonitor relies on output of crm_mon. so you need to install crm command onto remote node when you use pacemaker-remote05:46
masahitoyes05:46
moizokay let me try05:46
masahitoI think. but I don't remember exact package name.05:47
moizdone. apt-get install crmsh05:47
moizits working now. i can see the full cluster info on the compute nodes05:47
moizincluding the reserved host05:47
masahitogreat!05:48
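Since hostmonitor relies on crm_mon output, the kind of parsing involved can be sketched as follows. This is a minimal illustration, not hostmonitor's actual logic (which is a shell script not shown in this log). The sample text is hypothetical: it imitates the bracketed node-status lines of one-shot crm_mon output, and `controller1` is an invented node name.

```python
import re

# Hypothetical sample imitating crm_mon node-status lines; not captured output.
SAMPLE_CRM_MON = """\
Online: [ controller1 ]
RemoteOnline: [ compute2-b1 ]
RemoteOFFLINE: [ compute1-t4 ]
"""


def parse_node_states(crm_mon_text):
    """Map each node name to 'online' or 'offline' from bracketed status lines."""
    states = {}
    pattern = re.compile(
        r"^\s*(?:Remote)?(Online|OFFLINE):\s*\[(.*?)\]\s*$", re.MULTILINE
    )
    for label, names in pattern.findall(crm_mon_text):
        for name in names.split():
            states[name] = label.lower()
    return states


states = parse_node_states(SAMPLE_CRM_MON)
offline = sorted(name for name, state in states.items() if state == "offline")
print(offline)
```

Without the crm_mon client on the compute node there is no such output to read, which fits the root cause identified above.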
masahitooh.05:48
masahitolet me tell you one important thing05:48
masahitoif possible, could you copy corosync.conf from the full stack pacemaker node onto the remote host?05:49
moizyes i can do it05:49
masahitohostmonitor detects which cluster the monitor belongs to by parsing the config.05:50
moizgot it.05:50
moizone last thing. at this moment, i dont want pacemaker to monitor the nova-compute process & fence the compute node. Can i see evacuations happening using masakari? how?05:51
masahitoonly seeing logs now.05:54
masahitoor check the number of reserved hosts.05:54
moizif i unplug my compute node, the hostmonitor on the RESERVED node will detect that another compute node is down and will tell the controller, and the controller will call the evacuate api for the down compute node. is this correct?05:57
moizmeanwhile when i unplug, pacemaker will also mark the compute node (remote) as offline.05:58
masahitoright05:58
moizgreat.05:58
moizi am going to try it :-)05:58
*** moizarif has joined #openstack-ha06:07
*** moiz has quit IRC06:08
moizarifmasahito: you mentioned a 5 mins convergence period for evacuations. why 5 mins? is it dependent on the # of VMs on the host?06:15
moizarifokay i unplugged the compute node. this is what happened:06:18
moizarif1. nova-compute  got disabled06:19
moizarif2. Pacemaker marked the compute node as OFFLINE (crm_mon)06:19
moizarif3. its been 10 mins now and still no evacuations.06:19
moizarifcontroller logs say:06:21
moizarifApr 21 06:09:34 juju-machine-3-lxc-3 masakari-controller(22126): INFO: <_MainThread(MainThread, started 140292292380480)> Recieved notification : {    "id": "2f27dcd3-00c0-45e0-8214-6ea8308d8eb9",    "type": "nodeStatus",    "regionID": "serverstack",    "hostname": "compute1-t4",    "uuid": "",    "time": "20160421060934",    "eventID": "1",    "eventType": "1",    "detail": "02",06:21
moizarif"startTime": "20160421060934",    "endTime": null,    "tzname": "'UTC', 'UTC'",    "daylight": "0",    "cluster_port": ""}'06:21
moizarifApr 21 06:09:34 juju-machine-3-lxc-3 masakari-controller(22126): INFO: <_MainThread(MainThread, started 140292292380480)> {u'eventID': u'1', u'hostname': u'compute1-t4', u'uuid': u'', u'eventType': u'1', u'regionID': u'serverstack', u'cluster_port': u'', u'detail': u'02', u'daylight': u'0', u'tzname': u"'UTC', 'UTC'", u'startTime': u'20160421060934', u'time': u'20160421060934', u'endTime':06:21
moizarifNone, u'type': u'nodeStatus', u'id': u'2f27dcd3-00c0-45e0-8214-6ea8308d8eb9'}'06:21
moizarifApr 21 06:09:34 juju-machine-3-lxc-3 masakari-controller(22126): INFO: <Thread(Thread-4, started 140292101179136)> Disable nova-compute on compute1-t4'06:21
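The notification pasted above is plain JSON, so its fields can be inspected directly. The sketch below uses only the values from the log (with the stray trailing quote dropped); reading the `time` field as `YYYYMMDDHHMMSS` is an assumption based on its shape and the matching log timestamp.

```python
import json
from datetime import datetime

# Field values copied from the controller log above.
RAW = '''{"id": "2f27dcd3-00c0-45e0-8214-6ea8308d8eb9", "type": "nodeStatus",
 "regionID": "serverstack", "hostname": "compute1-t4", "uuid": "",
 "time": "20160421060934", "eventID": "1", "eventType": "1", "detail": "02",
 "startTime": "20160421060934", "endTime": null, "tzname": "'UTC', 'UTC'",
 "daylight": "0", "cluster_port": ""}'''

notification = json.loads(RAW)

# Assumed format: four-digit year, then month, day, hour, minute, second.
when = datetime.strptime(notification["time"], "%Y%m%d%H%M%S")

print(notification["hostname"], when.isoformat(), repr(notification["cluster_port"]))
```

The empty `cluster_port` is the field that matters for the discussion that follows.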
*** sasukeh has quit IRC06:21
masahitomoizarif: 5 mins was just our requirements.06:24
moizarifnothing in the logs of process & host monitor on the reserved host06:24
masahitomoizarif: what we need to be sure of is that the down host has been fenced, so it waits 5 mins.06:25
masahitoquestions06:25
masahito1. the host name 'compute1-t4' is equal to the name in crm_mon output?06:26
moizarifyes06:26
masahito2. when you register reserved host by CLI, what did you specify in cluster_port?06:27
moizarifpython /home/ubuntu/masakari/masakari-controller/utils/reserve_host_manage.py --mode add  --port "172.30.1.205:5405" --host compute2-b1 --db-user root --db-password 123  --db-host 127.0.0.106:27
moizarifi used this command. dont think i added any cluster port06:28
masahitogot it.06:29
moizarif5405 port06:29
masahitoI think the difference between the command --port "172.30.1.205:5405" and the notification "cluster_port": ""}' causes the evacuation failure.06:30
masahitoI think it's a weak point in Masakari06:31
masahitoMasakari06:31
moizarifmy mysql output for reserve_list table:06:31
moizarifmysql> select * from reserve_list;06:31
moizarif+----+---------------------+-----------+---------------------+---------+-------------------+-------------+06:31
moizarif| id | create_at           | update_at | delete_at           | deleted | cluster_port      | hostname    |06:31
moizarif+----+---------------------+-----------+---------------------+---------+-------------------+-------------+06:31
moizarif|  1 | 2016-04-20 09:59:19 | NULL      | 2016-04-20 11:55:51 |       1 | 172.30.1.205:5405 | compute2-b1 |06:31
moizarif+----+---------------------+-----------+---------------------+---------+-------------------+-------------+06:31
moizarif1 row in set (0.00 sec)06:31
*** sasukeh has joined #openstack-ha06:32
masahitoMasakari determines whether the host reported by hostmonitor is in the same cluster by comparing cluster_port values.06:32
masahitobut the cluster_port in a notification is generated based on corosync.conf06:33
masahitowe need to improve it.06:33
moizarifso what can be a temporary workaround for this that i can use?06:34
masahitoso a workaround for it is you add reserved host with --cluster_port "".06:34
moizarifgot it. let me try this out06:34
masahitoinstead of 172.30.1.205:540506:34
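The mismatch masahito describes can be illustrated with a small model. It assumes the controller picks reserved hosts by simple string equality on `cluster_port`; that is an assumption for illustration, since the real lookup is not shown in this log, and `matching_reserved_hosts` is a hypothetical helper.

```python
# Hypothetical model of the lookup: a reserved host qualifies only if its stored
# cluster_port equals the cluster_port carried by the notification.
def matching_reserved_hosts(reserve_list, notification):
    return [
        row["hostname"]
        for row in reserve_list
        if row["cluster_port"] == notification["cluster_port"]
    ]


# cluster_port is empty in the notification seen in the controller log.
notification = {"hostname": "compute1-t4", "cluster_port": ""}

registered_with_port = [{"hostname": "compute2-b1", "cluster_port": "172.30.1.205:5405"}]
registered_with_empty = [{"hostname": "compute2-b1", "cluster_port": ""}]

print(matching_reserved_hosts(registered_with_port, notification))   # []
print(matching_reserved_hosts(registered_with_empty, notification))  # ['compute2-b1']
```

Under that assumption, registering the reserved host with an empty port lets it match the notification, which is the workaround suggested here.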
*** sasukeh has quit IRC06:39
*** rsjethani has joined #openstack-ha06:43
rsjethanihi masahito06:44
masahitorsjethani: hi06:45
rsjethaniI have a few questions regarding masakari06:45
masahitorsjethani: ok, go ahead.06:46
rsjethaniok, first of all why are host monitor and process monitor written in shell script instead of python?06:47
masahitobecause we are using those as an RA of the local pacemaker.06:48
masahitoand both call linux commands, like crm_mon etc., so shell script suits them.06:51
rsjethaniBut the RA interface is language independent06:51
*** sasukeh has joined #openstack-ha06:51
*** hoangcx_ has joined #openstack-ha06:53
rsjethanihttp://www.linux-ha.org/wiki/Resource_agents06:53
rsjethaniTopic "Implementation" says the RA just needs to have a predefined interface but06:54
*** hoangcx has quit IRC06:54
rsjethaniThe reason I am saying this is because shell scripts are hard to follow and maintain.06:55
rsjethaniAlso we want to get masakari under openstack where the primary language is python06:55
masahitoyes, we had options. but we've decided to implement it by shell script.06:59
rsjethaniok :)06:59
masahitoyes, I know.06:59
masahitoagreed, it's hard to maintain.07:00
rsjethaniSo where does host monitor run?07:00
masahitoon compute node? Is that answer for your question?07:00
rsjethaniyes I am trying to understand how and where host monitor runs in a system where we have say three compute nodes and one master/controller node07:02
masahitoin that case, hostmonitor should run on all 3 compute nodes.07:03
rsjethaniand masakari-controller will run on the controller node right?07:04
masahitoright07:04
rsjethaniok07:05
rsjethaniAnother Question: why do we need masakari-controller?07:09
rsjethaniIMO we can make all three components independent services07:09
rsjethanijust like nova ,glance etc07:10
rsjethanilet HM make its own decisions. Same goes for IM07:10
masahitoTo coordinate the handling of all errors, especially race conditions, we introduced masakari-controller07:12
rsjethaniCan you give an example of a race condition? thanks07:13
masahitofor example, if the host goes down while instance monitor is rebuilding VM, when should it call evacuate API?07:14
masahitoFrom outside of Nova, we can't stop rebuild steps in Nova.07:14
*** dgurtner has joined #openstack-ha07:15
rsjethaniThanks masahito. I will look further into masakari and come back here :)07:16
*** permalac has quit IRC07:19
moizarifmasahito: the command: python reserve_host_manage.py --mode add --cluster_port "172.40.1.205:5405" --host compute2-b1 --db-user root --db-password 123 --db-host 127.0.0.107:29
moizarifgives: reserve_host_manage.py: error: unrecognized arguments: --cluster_port 172.40.1.205:540507:30
*** moiz has joined #openstack-ha07:35
*** dileepr has quit IRC07:39
masahitosorry, --port is correct07:43
masahitoI meant --port ""07:44
*** jpena|off is now known as jpena07:44
*** masahito has quit IRC07:46
*** moiz_ has joined #openstack-ha07:51
*** moiz has quit IRC07:53
*** hoangcx_ has quit IRC07:56
*** hoangcx has joined #openstack-ha07:56
*** sasukeh has quit IRC08:02
*** moizarif has quit IRC08:03
*** haukebruno has joined #openstack-ha08:13
*** masahito has joined #openstack-ha08:22
*** markvoelker has quit IRC08:27
aspiersrsjethani: are you coming to Austin?08:29
rsjethaniHi aspiers08:44
rsjethaniNo I won't be there :(08:44
aspiers:(08:44
rsjethaniBut my colleagues will be there08:45
aspiersrsjethani: then watch for the video of https://www.openstack.org/summit/austin-2016/summit-schedule/events/732708:45
rsjethaniok08:45
*** rossella_s has quit IRC09:03
*** rossella_s has joined #openstack-ha09:06
*** markvoelker has joined #openstack-ha09:27
*** markvoelker has quit IRC09:32
*** moiz_ has quit IRC09:46
*** masahito has quit IRC09:52
*** sasukeh has joined #openstack-ha09:57
*** hoangcx has quit IRC10:32
*** rsjethani has quit IRC10:40
*** rsjethani has joined #openstack-ha10:40
*** dgurtner has quit IRC10:47
*** dgurtner has joined #openstack-ha10:47
*** sasukeh has quit IRC11:14
*** ChanServ changes topic to "OpenStack HA | next meeting in Austin! 12:30pm Expo Hall 5, table with ClusterLabs sign"11:35
*** jpena is now known as jpena|lunch11:36
aspiershttp://clusterlabs.org/pipermail/users/2016-April/002753.html11:41
*** mjura has quit IRC11:48
*** mjura has joined #openstack-ha12:04
*** markvoelker has joined #openstack-ha12:17
*** yan-gao has quit IRC12:19
*** yan-gao has joined #openstack-ha12:21
*** yan-gao has quit IRC12:29
*** yan-gao has joined #openstack-ha12:29
*** mjura has quit IRC12:38
*** mjura has joined #openstack-ha12:51
*** jpena|lunch is now known as jpena12:59
*** sasukeh has joined #openstack-ha13:38
*** rsjethani has quit IRC13:50
*** kgaillot has joined #openstack-ha13:56
*** sasukeh has quit IRC14:20
*** mjura has quit IRC15:16
*** sigmavirus24_awa is now known as sigmavirus2415:33
*** sasukeh has joined #openstack-ha15:41
*** dgurtner has quit IRC15:46
*** sasukeh has quit IRC15:46
*** openstackgerrit has quit IRC15:48
*** openstackgerrit has joined #openstack-ha15:49
*** sasukeh has joined #openstack-ha15:52
*** sasukeh has quit IRC16:09
*** jpena is now known as jpena|off16:58
*** serverascode has quit IRC17:03
*** rossella_s has quit IRC17:03
*** rossella_s has joined #openstack-ha17:04
*** serverascode has joined #openstack-ha17:05
*** sasukeh has joined #openstack-ha17:10
*** sasukeh has quit IRC17:16
*** rossella_s has quit IRC17:19
*** rossella_s has joined #openstack-ha17:20
*** FL1SK has quit IRC17:23
*** sigmavirus24 is now known as sigmavirus24_awa17:55
*** hoonetorg has joined #openstack-ha17:56
*** jpokorny has joined #openstack-ha17:58
*** sigmavirus24_awa is now known as sigmavirus2418:09
*** haukebruno has quit IRC18:11
*** sasukeh has joined #openstack-ha18:11
*** sasukeh has quit IRC18:16
*** sasukeh has joined #openstack-ha19:12
*** sasukeh has quit IRC19:18
*** FL1SK has joined #openstack-ha19:24
*** sigmavirus24 is now known as sigmavirus24_awa20:09
*** sigmavirus24_awa is now known as sigmavirus2420:13
*** sasukeh has joined #openstack-ha20:15
*** sasukeh has quit IRC20:19
*** sasukeh has joined #openstack-ha21:41
*** sasukeh has quit IRC21:46
*** sigmavirus24 is now known as sigmavirus24_awa22:35
*** sasukeh has joined #openstack-ha22:42
*** vuntz has quit IRC22:43
*** vuntz has joined #openstack-ha22:44
*** kgaillot has quit IRC22:45
*** sasukeh has quit IRC22:47
*** markvoelker has quit IRC23:20
*** sasukeh has joined #openstack-ha23:34
*** sasukeh has quit IRC23:39
