Monday, 2016-04-18

*** masahito has joined #openstack-ha00:45
*** smoriya_ has joined #openstack-ha02:27
*** moiz has joined #openstack-ha04:09
moizhey guys.04:37
moizi am trying out masakari approach on my canonical setup04:37
moizi have started with 1 controller (nova lxc) and 1 compute + 1 reserved host04:38
moizmy process & host monitors are running04:38
moizhowever when on the controller i start masakari-controller it doesn't start up04:38
moizsays " * masakari is not running"04:39
moizthe db is setup & the database & tables are also present04:39
moizi also dont see any logs on the controller04:39
*** smoriya_ has quit IRC05:00
moizis there a requirement for masakari controller to run on the same host where keystone, mysql & nova are present?05:06
*** rsjethani has joined #openstack-ha05:52
*** rsjethani has quit IRC06:00
masahitomoiz: what version are you using?06:14
masahitomoiz: I encountered same situation when I forgot installing SQLAlchemy-Utils.06:17
masahitomoiz: now the package is listed on requirements.txt06:17
moizi am using the 1.0.006:28
moizand i have SQLAlchemy-Utils installed06:29
masahito1.0.0 doesn't use SQLAlchemy-Utils, so it's not related...06:35
masahitohmm....06:35
masahitoOr does masakari-controller have access permission to the directory?06:35
masahitos/access permission/write permission/06:36
moizwhich directory?06:36
moizone more thing. i am getting this error in masakari-controller logs:06:38
moizApr 18 06:08:24  masakari(12031): ERROR: --MonitoringMessage--ID:[RecoveryController_0003]An error during initializing masakari controller: Could not find a suitable e$06:38
moizApr 18 06:08:24  masakari(12031): ERROR: <class 'keystoneauth1.exceptions.discovery.VersionNotAvailable'>'06:38
moizApr 18 06:08:24  masakari(12031): ERROR: Could not find a suitable endpoint for client version: 3'06:38
moizApr 18 06:08:24  masakari(12031): ERROR: --MonitoringMessage--ID:[RecoveryController_0003]An error during initializing masakari controller: Could not find a suitable endpoint for client version: 3'06:38
masahitothis error is masakari-controller's log  :-)06:38
moizyes06:39
moizdo keystone & nova need to be on the same machine for masakari-controller to work?06:39
masahitomasakari assumes your openstack is using KeystoneV3 for authentication now06:39
masahitobecause our system, which is designed for masakari's first deploy, uses keystoneV3.06:40
masahitokeystone & nova don't need to be on the same machine for the controller.06:41
moizokay got it06:41
moizand i believe i have keystone v2 on my system06:42
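
The version mismatch above (masakari expecting Keystone v3 while the deployment exposes v2) can be spotted early by looking at the configured auth URL. The following is only a minimal sketch: the function name and the example URLs are hypothetical and not part of masakari, and it assumes the API version appears as the last path segment of the URL.

```python
from urllib.parse import urlparse


def keystone_api_version(auth_url):
    """Guess the Keystone API version from the last path segment of auth_url.

    Returns 'v2.0' or 'v3', or None when the URL carries no version segment.
    Hypothetical helper for illustration; not part of masakari or keystoneauth1.
    """
    path = urlparse(auth_url).path.rstrip("/")
    last_segment = path.rsplit("/", 1)[-1]
    if last_segment in ("v2.0", "v3"):
        return last_segment
    return None


if __name__ == "__main__":
    # Example URLs are assumptions, not taken from the log.
    for url in ("http://controller:5000/v2.0", "http://controller:5000/v3/"):
        print(url, "->", keystone_api_version(url))
```

A check like this only inspects the URL string; it does not confirm that the endpoint actually serves that version.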
moizalso. running create_database.sh gives me /usr/bin/python: No module named masakari-controller.db06:55
moizbut running python create_tables.py gives "Successfully created tables"06:56
masahitocould you check if your db has vm_ha database?07:01
masahitoit's a database masakari-controller uses07:01
moizyes it exists07:02
masahitook, no problem ;)07:02
*** pcaruana has joined #openstack-ha07:15
*** mjura has joined #openstack-ha07:17
*** beekhof has joined #openstack-ha07:29
*** dgurtner has joined #openstack-ha07:34
*** haukebruno has joined #openstack-ha07:44
*** dgurtner has quit IRC07:46
*** rossella_s has joined #openstack-ha07:48
*** dgurtner has joined #openstack-ha07:48
*** dgurtner has quit IRC07:48
*** dgurtner has joined #openstack-ha07:48
*** rmart04 has joined #openstack-ha07:55
moizmasahito: i have resolved the issue. masakari controller is now running :)07:58
moizmasahito: on the compute node my process and host monitors are also running but the instancemonitor is not starting up " * masakari-instancemonitor is not running "07:59
moizthe 15868 port is also in listening state on the controller07:59
moizno logs for instancemonitor yet. any idea ?08:01
aspiersmorning08:01
aspiersor evening08:01
masahitoaspiers: hi08:02
aspiers-> #openstack-meeting08:02
masahitomoiz: I solved https://github.com/ntt-sic/masakari/pull/23 for instancemonitor08:06
masahitocould you check it?08:06
masahitoit's really tiny patch08:06
aspiersmasahito, moiz: are you joining the meeting?08:07
moizlet me try it08:11
moizaspiers: okay i am there08:11
*** jpena|off is now known as jpena08:49
moizmasahito: the patch is already in place.08:54
moizp.s. i got the new version from 1.1.0: https://github.com/ntt-sic/masakari/releases/tag/1.1.008:54
masahitopython-daemon and python-libvirt are already installed, aren't they?08:57
masahitoIIRC, both are installed when nova is installed.08:58
moizyes09:04
moizthey are installed09:04
masahitohmm...09:05
masahitono logs, no process. right?09:05
moizyes.09:06
moizlogs are there for host & process monitors only09:06
*** Deng has joined #openstack-ha09:07
moizand in the processmonitor.log i see: 2016-04-18 09:05:48 Compute0-B3 process_status_checker.sh:  down process id_no : 0209:10
masahitoit means processmonitor fails to start or restart instancemonitor. it's right log in that case.09:11
moizwhich process is down in this case? id_no: 02 ?09:18
moizokay i need to look why instancemonitor is failing. i have rechecked the .conf files and they are okay09:19
masahitothe number is basing on the number in masakari-processmonitor.conf09:20
aspiersmasahito: are there any reasons not to replace processmonitor with pacemaker process monitoring? IIRC one point was that if nova-compute doesn't start, Pacemaker won't fence - is that right?09:21
masahitoyes09:25
masahitoadditionally, disable the nova-compute09:26
masahitothere were no pacemaker professional when we developed it, so we developed it.09:27
aspiersmasahito: I think it should be possible to fix that in pacemaker09:28
masahitoyap.09:28
aspiersyou could have a custom nova-compute-service resource09:28
aspiersand nova-compute would depend on it09:28
aspierswhen nova-compute gets stopped, nova-compute-service would do nova service-disable09:29
masahitooh, s/no pacemaker professional/no pacemaker professional in our team/09:29
aspiers:)09:29
aspiersmasahito: also, is masakari HA? e.g. what if the daemon crashes?09:30
masahitowe use pacemaker for the controller, hostmonitor and processmonitor.09:31
aspierscool09:31
aspiersmasahito: and it retries failed evacuations?09:32
aspiersmasahito: how does it know evacuation succeeded?09:32
masahitoaspiers: controller has evacuation status in its db.09:33
masahitoaspiers: so it retries evacuation or wait evacuation after controller's failover.09:33
aspiershow does it get evacuation status?09:34
aspiersI think it's possible for resurrection to fail after nova evacuate API succeeds09:34
aspierse.g. if nova-scheduler dies at the wrong time09:34
aspiersI was discussing this with ddeja on Friday09:35
masahitousually check instance status using Nova API, or IIRC wait evacuation timeout set in masakari.conf.09:39
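
The check described above (poll the instance status, give up after a timeout) could look roughly like the sketch below. This is not masakari's actual implementation: `get_state` is a hypothetical callable standing in for a Nova API query that returns a `(status, host)` tuple, and the status string, timeout and interval values are assumptions.

```python
import time


def wait_for_evacuation(get_state, source_host, timeout=60.0, interval=1.0,
                        sleep=time.sleep, clock=time.monotonic):
    """Poll until the instance is ACTIVE on a host other than source_host.

    get_state is a hypothetical callable returning (status, host); in a real
    deployment it would wrap a Nova API call. Returns True on success and
    False once the timeout has elapsed.
    """
    deadline = clock() + timeout
    while True:
        status, host = get_state()
        # Success requires both a healthy status and a different host,
        # since the evacuate call returning OK does not prove the rebuild worked.
        if status == "ACTIVE" and host != source_host:
            return True
        if clock() >= deadline:
            return False
        sleep(interval)


if __name__ == "__main__":
    # Fake state sequence for demonstration only.
    states = iter([("REBUILDING", "compute0"), ("ACTIVE", "compute1")])
    print(wait_for_evacuation(lambda: next(states), "compute0",
                              timeout=5, interval=0))
```

Checking the host as well as the status addresses the case aspiers raises, where the evacuate API call succeeds but the instance never comes up elsewhere.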
moizmasahito: my db has started receiving node status notifications and i can see them in notification_list table.09:42
moizahhhh09:46
moizfound it09:46
moizImportError: No module named httplib209:46
moizsolved it09:46
moizinstancemonitor working now09:46
masahitomoiz: great09:47
moizi have a suggestion: i found out this when i manually ran python file for masakari_instancemonitor.py in python /opt/masakari/instancemonitor/masakari_instancemonitor.py09:47
moizand i couldnt find this issue any where else09:47
moizmasakari should point this out and generate logs even if any service is failing to run09:48
moizlike i should have got this error in instancemonitor.log09:48
masahitomoiz: oh, you're right.09:50
masahitoif possible, could you report issue in masakari repo?09:51
moizcan you share me the link?09:51
masahitohttps://github.com/ntt-sic/masakari/issues09:51
masahitoit helps us not to forget these.09:52
moizokay i will do it.09:52
moizalso faced this same issue while trying to run masakari controller.09:52
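
The suggestion above, that a service should log a missing dependency instead of failing silently, can be sketched as a start-up check. This is an illustrative sketch in Python 3 and not code from the masakari repository; the function names are hypothetical, and the module list passed in is up to the caller.

```python
import importlib.util
import logging


def missing_modules(names):
    """Return the top-level module names from `names` that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def check_dependencies(names, logger):
    """Log one error per missing module; return True when all are present."""
    missing = missing_modules(names)
    for name in missing:
        logger.error("required module not found: %s", name)
    return not missing


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    log = logging.getLogger("instancemonitor")
    # httplib2 is the module that was missing in the log above.
    if not check_dependencies(["httplib2"], log):
        raise SystemExit(1)
```

Running such a check before the daemon detaches would put the error in the service log rather than only on the console of a manual run.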
aspiersmasahito: so masakari checks that the instance started on another host?09:57
aspiersmasahito: also, does masakari use force-down API?09:58
masahitoaspiers: for first one, yes. for second one, no.10:02
masahitoaspiers: I wrote evacuation steps in doc. please see this: https://github.com/ntt-sic/masakari/blob/master/docs/evacuation_patterns.md10:03
aspiersok thanks!10:03
moizmasahito: https://github.com/ntt-sic/masakari/issues/3510:07
masahitomoiz: thanks!10:08
moizmasahito: now masakari is setup, masakari controller & DB on openstack controller & host/process/instance monitors on compute node & 1 reserve node configured.10:19
moizpacemaker and pacemaker_remote are also setup and running10:19
moizwhat are the next steps, adding stonith resource for compute node in pacemaker ?10:19
moizand test evacuations based on setup written on github?10:19
aspiersmasahito: what is the "resized" status?10:20
masahitomoiz: yes10:20
masahitoaspiers: it's in resizing operation by Nova resize API that enables user to change instance flavor size.10:21
aspiersmasahito: but it's also mentioned for hostmonitor host down event?10:22
aspiersresizing a host??10:22
aspiersmasahito: also, can it evacuate multiple hosts in parallel?10:23
masahitoaspiers: this row says controller receives host down events while the instance on the down host is in 'resized' state.10:24
aspiersoh10:24
masahitoaspiers: multiple means multi hosts in one pacemaker cluster or multi hosts in different pacemaker clusters?10:25
aspiersone cluster10:25
masahitoI think it can if you set reserved hosts more than 2.10:26
aspiersmasahito: can you choose which VMs are HA? or does it always make all HA?10:32
aspiersmasahito: or choose per compute-node, or per AZ, or per project?10:32
masahitoaspiers: yes, you can.10:34
aspiersmasahito: which one?10:34
masahitoaspiers: if a VM has metadata with key "HA-Enable" and value "OFF", the VM isn't evacuated.10:34
aspiersahh, nice10:35
aspierswhat about per-project?10:35
*** dgurtner has quit IRC10:35
masahitoper-vms10:35
aspiersis it possible to set defaults per project?10:35
aspiersI guess that's an OpenStack question10:35
masahitono10:35
aspiersok10:35
moizmasahito: what about VMs belonging to different tenants residing on the same compute node? are all VMs evacuated in this case?10:37
masahitomoiz: all are evacuated. it doesn't care tenants to evacuate.10:39
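
The per-VM opt-out described above can be expressed as a small filter. This is a sketch under assumptions, not masakari's code: the function names and the instance dictionaries are hypothetical, and treating the match on "OFF" as exact and case-sensitive is an assumption the log does not confirm.

```python
def should_evacuate(metadata):
    """Return False only when the metadata sets HA-Enable to OFF.

    Instances without the key are evacuated, matching the described default.
    The exact, case-sensitive comparison is an assumption.
    """
    return metadata.get("HA-Enable") != "OFF"


def instances_to_evacuate(instances):
    """Filter hypothetical instance dicts, regardless of tenant."""
    return [inst for inst in instances
            if should_evacuate(inst.get("metadata", {}))]


if __name__ == "__main__":
    # Sample data is invented for illustration.
    sample = [
        {"name": "vm-a", "tenant": "t1", "metadata": {}},
        {"name": "vm-b", "tenant": "t2", "metadata": {"HA-Enable": "OFF"}},
    ]
    print([inst["name"] for inst in instances_to_evacuate(sample)])
```

The filter ignores the tenant field on purpose, since all VMs on the failed host are evacuated unless they opt out individually.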
aspiersmasahito: which platforms/versions does masakari currently support?10:46
masahitoaspiers: ubuntu14.04 is sure to be supported.10:47
masahitosamP tested it on up-to-date CentOS.10:48
aspierswhich CentOS?10:48
aspiersmasahito: which Pacemaker version on 14.04?10:49
masahitoCent710:49
aspiersok10:49
masahitodefault version on 14.04 is 1.1.10, so if you want to use pacemaker-remote on it you need to build it locally.10:49
aspiersok10:50
masahitoI heard 16.04 will support 1.1.1410:50
masahitobtw, I'll leave here in a few mins. If you have another question, please mail me or see you tomorrow ;-)10:51
aspiersok :)10:51
aspiersmasahito: I will send you URL to review slide deck10:51
masahitoaspiers: got it.10:51
*** masahito has quit IRC10:54
moizdefault on 14.04 is 1.1.10. i am using 1.1.12 in order to use masakari with pacemaker remote nodes10:59
*** serverascode_ has joined #openstack-ha11:02
*** g3ek- has joined #openstack-ha11:04
*** ljjjustin_ has joined #openstack-ha11:05
*** ljjjustin has quit IRC11:05
*** serverascode has quit IRC11:06
*** zehua has quit IRC11:06
*** g3ek has quit IRC11:06
*** g3ek- is now known as g3ek11:06
*** ljjjustin_ is now known as ljjjustin11:06
*** zehua has joined #openstack-ha11:08
*** serverascode_ is now known as serverascode11:12
*** dgurtner has joined #openstack-ha11:23
*** markvoelker has joined #openstack-ha12:19
*** jpena is now known as jpena|lunch12:36
*** Deng has quit IRC12:44
*** smoriya_afk has joined #openstack-ha12:53
haukebrunoaspiers, crowbar is crowbar, right? I read about crowbar and opencrowbar and digital rebar13:18
* haukebruno is confused13:18
*** dgurtner has quit IRC13:21
aspierserrr13:22
aspiersyes, crowbar is crowbar?!13:22
aspiershaukebruno: opencrowbar is dead13:22
*** dgurtner has joined #openstack-ha13:23
haukebrunoand what is that digital rebar thing?13:23
aspiershaukebruno: digital rebar is a rewrite of Crowbar which no longer focuses on OpenStack deployment13:23
haukebrunogood or evil? or dead too?13:23
haukebrunoah ok13:23
haukebrunoso -> it is crowbar13:23
aspiersby RackN, startup of ex-Dell employees13:23
aspiersrebar is NOT Crowbar13:23
haukebrunoah no. I mean: then it - the solution - for me could be crowbar, not rebar, not opencrowbar13:25
aspiershaukebruno: oh I see13:29
aspiershaukebruno: yes, definitely :)13:30
*** kgaillot has joined #openstack-ha13:34
*** jpena|lunch is now known as jpena13:49
*** moiz has quit IRC13:51
*** sigmavirus24_awa is now known as sigmavirus2414:03
*** nkrinner has quit IRC14:10
*** dgurtner has quit IRC14:43
*** dgurtner has joined #openstack-ha14:45
*** mjura has quit IRC15:20
*** haukebruno has quit IRC16:04
*** raginbajin has quit IRC16:05
*** raginbajin has joined #openstack-ha16:10
*** pcaruana has quit IRC16:21
*** dgurtner has quit IRC16:23
*** rmart04 has quit IRC16:29
*** rossella_s has quit IRC17:03
*** rossella_s has joined #openstack-ha17:04
*** jpena is now known as jpena|off17:07
*** sigmavirus24 is now known as sigmavirus24_awa17:26
*** sigmavirus24_awa is now known as sigmavirus2417:28
*** dgurtner has joined #openstack-ha17:39
*** dgurtner has quit IRC17:59
*** rmart04 has joined #openstack-ha18:15
*** rmart04 has quit IRC18:20
*** moiz has joined #openstack-ha18:54
*** raginbajin has quit IRC19:16
*** FL1SK has quit IRC19:19
*** raginbajin has joined #openstack-ha19:21
*** moiz has quit IRC21:01
*** rossella_s has quit IRC21:03
*** rossella_s has joined #openstack-ha21:04
*** FL1SK has joined #openstack-ha21:20
*** dileepr has joined #openstack-ha21:53
*** sigmavirus24 is now known as sigmavirus24_awa22:20
*** markvoelker has quit IRC22:28
*** kgaillot has quit IRC22:56
*** markvoelker has joined #openstack-ha23:28
*** markvoelker has quit IRC23:33
