ignaziocassano1 | hello everyone | 14:32 |
---|---|---|
ignaziocassano1 | Can anyone help me with masakari configuration on kolla? | 14:33 |
yoctozepto | hi ignaziocassano1, I may | 14:34 |
ignaziocassano1 | yoctozepto thanks. It seems masakari on kolla wallaby requires pacemaker. Is that right? | 14:35 |
yoctozepto | masakari always requires pacemaker now if you want to monitor the hosts as opposed to instances only | 14:35 |
ignaziocassano1 | ok | 14:35 |
yoctozepto | we are planning to support consul | 14:35 |
yoctozepto | kolla ansible has the role to deploy pacemaker for you | 14:36 |
yoctozepto | and integrates it with masakari | 14:36 |
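The setup described above maps to two `globals.yml` flags in kolla-ansible Wallaby; a minimal sketch (values illustrative):

```yaml
# /etc/kolla/globals.yml (fragment)
enable_masakari: "yes"
# deploys the corosync/pacemaker containers and wires masakari-hostmonitor to them
enable_hacluster: "yes"
```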
ignaziocassano1 | When I try to install the hacluster role it gives me the following error: TASK [hacluster : Ensure config directories exist] ************************************************************** failed: [tst2-osctrl01 -> localhost] (item=hacluster-corosync) => {"ansible_loop_var": "item", "changed": false, "item": "hacluster-corosync", "msg": "There was an issue creating /etc/kolla/config/hacluster-corosync as requested: [Errno 13] Permission | 14:36 |
ignaziocassano1 | I cannot find any documentation on installing hacluster with kolla | 14:37 |
ignaziocassano1 | I would be grateful if you could send me some documentation link | 14:37 |
ignaziocassano1 | Let me start with some easier steps first | 14:39 |
ignaziocassano1 | As you mentioned there are two configurations: one is the instance monitor only | 14:40 |
ignaziocassano1 | The other one is the compute node monitor: is that right? | 14:41 |
ignaziocassano1 | In the first case (instance monitor only) I should have the masakari host monitor and masakari engine on controllers and the monitor on compute nodes | 14:42 |
ignaziocassano1 | Then what must I do to enable HA for one instance? | 14:43 |
yoctozepto | you can disable pacemaker by setting enable_hacluster to no | 14:54 |
yoctozepto | you can also set the masakari-hostmonitor to be empty to not deploy one | 14:55 |
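Concretely, the instance-monitoring-only variant suggested above would look roughly like this (the `masakari-hostmonitor` group name is assumed to match kolla-ansible's standard multinode inventory):

```ini
# in the multinode inventory, leave the hostmonitor group empty
# (and set enable_hacluster: "no" in globals.yml)
[masakari-hostmonitor]
# intentionally empty - no hostmonitor containers will be deployed
```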
yoctozepto | interesting though that it gives you an error | 14:55 |
yoctozepto | I think we might be missing a ``become`` somewhere | 14:56 |
ignaziocassano1 | It is strange because I can deploy all roles but not hacluster | 14:57 |
yoctozepto | yeah, I believe it's some edge case | 15:00 |
yoctozepto | are you using the latest stable/wallaby? | 15:01 |
yoctozepto | the issue does not seem to exist there | 15:01 |
yoctozepto | there is ``become`` set there, as everywhere else | 15:01 |
ignaziocassano1 | yes, I am using the latest wallaby | 15:03 |
ignaziocassano1 | in my multinode file now I inserted: | 15:04 |
ignaziocassano1 | [control] # These hostnames must be resolvable from your deployment host tst2-osctrl01 ansible_user=ansible ansible_become=true tst2-osctrl02 ansible_user=ansible ansible_become=true tst2-osctrl03 ansible_user=ansible ansible_become=true | 15:04 |
ignaziocassano1 | Sorry | 15:05 |
ignaziocassano1 | tst2-osctrl01 ansible_user=ansible ansible_become=true | 15:05 |
ignaziocassano1 | Can I set a global variable rather than specify it for each node ? | 15:05 |
ignaziocassano1 | OK | 15:12 |
ignaziocassano1 | setting them with environment variables works fine | 15:12 |
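Besides environment variables, the per-host repetition from the earlier paste can also be avoided with group variables in the inventory itself, a standard ansible feature:

```ini
[control]
# These hostnames must be resolvable from your deployment host
tst2-osctrl01
tst2-osctrl02
tst2-osctrl03

# applied to every host in [control]
[control:vars]
ansible_user=ansible
ansible_become=true
```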
yoctozepto | there is ``become`` at the task level and it works for me; perhaps there is some ansible quirk in play but I can't imagine | 15:13 |
ignaziocassano1 | Now the playbook worked | 15:14 |
ignaziocassano1 | Please, how can I verify if hacluster is working? | 15:14 |
ignaziocassano1 | I was wrong ....I deployed only masakari, not pacemaker | 15:17 |
yoctozepto | you can issue ``crm_mon -1`` in the pacemaker's container | 15:17 |
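Assuming kolla's default container name for pacemaker (`hacluster_pacemaker` is an assumption), that check is:

```shell
# one-shot, non-interactive cluster status
docker exec hacluster_pacemaker crm_mon -1
```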
ignaziocassano1 | Must I download the image for pacemaker? | 15:17 |
yoctozepto | no, I meant the deployed one | 15:19 |
ignaziocassano1 | So, after I deployed only masakari, how can I test HA on an instance? | 15:26 |
yoctozepto | ah, you can create a test instance | 15:29 |
yoctozepto | and e.g. kill the qemu process forcibly from the host | 15:29 |
yoctozepto | this will trigger the masakari actions on it | 15:29 |
yoctozepto | the instance should come back up | 15:29 |
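A sketch of that test from the compute host; the UUID placeholder and the assumption that the instance UUID appears on the qemu command line are mine:

```shell
# UUID of the test instance (from `openstack server show`)
INSTANCE_UUID=aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee

# kill the backing qemu process forcibly; the masakari
# instancemonitor on this host should notice and send a notification
sudo pkill -9 -f "$INSTANCE_UUID"
```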
ignaziocassano1 | I tried | 15:30 |
ignaziocassano1 | the instance remains stopped | 15:30 |
ignaziocassano1 | I do not know if I must configure something else or tag the instance with some property | 15:31 |
yoctozepto | you want to either enable https://docs.openstack.org/masakari/latest/configuration/config.html#instance_failure.process_all_instance | 15:33 |
yoctozepto | process_all_instances | 15:33 |
yoctozepto | in instance_failure in masakari.conf | 15:33 |
yoctozepto | OR set the metadata on VMs | 15:33 |
yoctozepto | the default is HA_Enabled | 15:34 |
yoctozepto | so setting HA_Enabled to True will enable HA protection per VM | 15:34 |
yoctozepto | you can customise it | 15:34 |
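The two alternatives above, sketched (the server name is a placeholder):

```shell
# Option A: per-VM protection via metadata
openstack server set --property HA_Enabled=True my-test-vm

# Option B: protect every instance; in masakari.conf:
#   [instance_failure]
#   process_all_instances = True
```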
ignaziocassano1 | I used the HA_Enabled property on instance but it did not work | 15:34 |
yoctozepto | did you set it to True? | 15:35 |
ignaziocassano1 | yes | 15:35 |
yoctozepto | did you check the masakari logs? did anything happen? | 15:35 |
yoctozepto | the instance monitor on the compute host should spot the issue | 15:35 |
ignaziocassano1 | I am going to check | 15:39 |
ignaziocassano1 | the host where I killed the instance has the following name: tst2-kvm02 | 15:48 |
ignaziocassano1 | the instance monitor reports: Client Error for url: http://10.102.119.194:15868/v1/notifications, Host with name tst2-kvm02 could not be found. | 15:49 |
yoctozepto | ah, you first need to create a HA segment in masakari and add protected hosts to it | 15:50 |
ignaziocassano1 | thanks . I try | 15:53 |
ignaziocassano1 | It asks: Reserved, Type, Control Attributes and On Maintenance | 15:56 |
ignaziocassano1 | What can I put in them? | 15:56 |
yoctozepto | reserved is whether you want the host to act as a target for failures, this is not too robust | 16:00 |
yoctozepto | type should always be compute | 16:00 |
yoctozepto | control attributes is ignored for now; mostly people write ssh in there | 16:01 |
yoctozepto | `on maintenance` would put the host in maintenance mode to ignore notifications on failures there | 16:01 |
yoctozepto | this is as the name suggests - for maintenance | 16:01 |
ignaziocassano1 | Must I have a segment for each host? | 16:01 |
yoctozepto | no, you can have one that spans all the hosts | 16:02 |
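With the masakari OSC plugin, the segment setup discussed above looks roughly like this (the segment name and the `auto` recovery method are illustrative choices):

```shell
# one failover segment spanning all compute hosts
openstack segment create test-segment auto COMPUTE

# add each compute host: <name> <type> <control_attributes> <segment>
openstack segment host create tst2-kvm01 COMPUTE SSH test-segment
openstack segment host create tst2-kvm02 COMPUTE SSH test-segment
```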
ignaziocassano1 | it restarts on the same node :-( | 16:05 |
yoctozepto | yeah, because it's the instance that failed, not the node | 16:06 |
yoctozepto | this is expected | 16:06 |
ignaziocassano1 | so if I want to simulate a crash of the node, can I try to reboot it? | 16:08 |
yoctozepto | no, you should forcibly power it off | 16:08 |
yoctozepto | but you need hostmonitor and pacemaker for that to work | 16:08 |
ignaziocassano1 | OK | 16:08 |
ignaziocassano1 | I will stop it with the Dell iDRAC | 16:08 |
ignaziocassano1 | I powered off the node but the instance still shows as running on it | 16:13 |
yoctozepto | yes, I told you you need hostmonitor and pacemaker for that | 16:17 |
ignaziocassano1 | I installed them | 16:18 |
ignaziocassano1 | I enabled hacluster: yes in globals.yaml and I pulled pacemaker images | 16:19 |
ignaziocassano1 | I deployed with -t masakari,hacluster without errors | 16:20 |
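Those steps correspond roughly to the following commands (the inventory filename `multinode` is assumed):

```shell
kolla-ansible -i multinode pull -t masakari,hacluster
kolla-ansible -i multinode deploy -t masakari,hacluster
```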
yoctozepto | have you checked the cluster health with that crm_mon I suggested? | 16:24 |
yoctozepto | also, hostmonitor should now observe the host failure | 16:24 |
ignaziocassano1 | I missed the crm_mon command you suggested, sorry | 16:27 |
ignaziocassano1 | I usually use pcs command | 16:27 |
ignaziocassano1 | Cluster Summary: * Stack: corosync * Current DC: tst2-osctrl02 (version 2.0.3-4b1f869f0f) - partition with quorum * Last updated: Mon Jun 21 18:27:33 2021 * Last change: Mon Jun 21 17:38:50 2021 by root via cibadmin on tst2-osctrl01 * 5 nodes configured * 2 resource instances configured Node List: * Online: [ tst2-osctrl01 tst2-osctrl02 tst2-osctrl03 ] * RemoteOnline: [ tst2-kvm02 ] * RemoteOFFLINE: [ tst2-kvm01 ] | 16:28 |
ignaziocassano1 | yes | 16:28 |
ignaziocassano1 | crm_mon lists one node as remote offline | 16:28 |
ignaziocassano1 | So, I stopped node 1 and it became remote offline but the instance did not restart on the other node | 16:34 |
ignaziocassano1 | The crm_mon output shows only remote nodes and controller nodes. No other resources are needed? | 16:36 |
yoctozepto | no, it's all right | 16:39 |
yoctozepto | what about hostmonitor logs | 16:39 |
yoctozepto | they should notice the host going down | 16:39 |
yoctozepto | and send a notification | 16:39 |
yoctozepto | just like instance monitor did | 16:39 |
ignaziocassano1 | tst2-kvm01' is 'offline' (current: 'offline'). | 16:43 |
ignaziocassano1 | Exception caught: 'NoneType' object is not iterable: TypeError: 'NoneType' object is not iterable | 16:44 |
ignaziocassano1 | the above is the host-monitor | 16:44 |
ignaziocassano1 | it detected the node went down | 16:44 |
ignaziocassano1 | but it gave errors | 16:44 |
ignaziocassano1 | Thanks for your help. I will retry tomorrow | 16:55 |
ignaziocassano | Hello, I am facing some issue with masakari hacluster wallaby on kolla ansible. Please, can the openstack-discuss mailing list be used for sending logs? | 18:58 |
Generated by irclog2html.py 2.17.2 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!