Monday, 2021-06-21

ignaziocassano1hello everyone14:32
ignaziocassano1Can anyone help me with the masakari configuration on kolla?14:33
yoctozeptohi ignaziocassano1, I may14:34
ignaziocassano1yoctozepto thanks. It seems masakari on kolla wallaby requires pacemaker. Is that right?14:35
yoctozeptomasakari always requires pacemaker now if you want to monitor the hosts as opposed to instances only14:35
ignaziocassano1ok14:35
yoctozeptowe are planning to support consul14:35
yoctozeptokolla ansible has the role to deploy pacemaker for you14:36
yoctozeptoand integrates it with masakari14:36
ignaziocassano1When I try to install the hacluster role it gives me the following error: TASK [hacluster : Ensure config directories exist] ************************************************************** failed: [tst2-osctrl01 -> localhost] (item=hacluster-corosync) => {"ansible_loop_var": "item", "changed": false, "item": "hacluster-corosync", "msg": "There was an issue creating /etc/kolla/config/hacluster-corosync as requested: [Errno 13] Permission 14:36
ignaziocassano1I cannot find any documentation on installing hacluster with kolla14:37
ignaziocassano1I would be grateful if you could send me some documentation link14:37
ignaziocassano1Let me start with easier steps first14:39
ignaziocassano1As you mentioned there are two configurations: one is the instance monitor only14:40
ignaziocassano1The other one is the compute node monitor: is that right?14:41
ignaziocassano1In the first case (instance monitor only) I should have masakari host monitor and masakari engine on controllers and monitor on compute nodes14:42
ignaziocassano1Then what must I do to enable HA for one instance?14:43
yoctozeptoyou can disable pacemaker by setting enable_hacluster to no14:54
yoctozeptoyou can also set the masakari-hostmonitor to be empty to not deploy one14:55
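Editor's sketch of the instance-monitor-only setup just described, written as a shell snippet that puts the overrides in a scratch file for review. ``enable_hacluster`` is the option named in the chat; ``enable_masakari`` and the ``/etc/kolla/globals.yml`` location are assumptions about a typical Kolla Ansible deployment, so check them against your own files.

```shell
# Write the overrides to a scratch file first so they can be reviewed.
overrides="$(mktemp)"
cat > "$overrides" <<'EOF'
# Deploy Masakari (assumed to be enabled already in this deployment).
enable_masakari: "yes"
# Skip Pacemaker/Corosync; only instance failures are handled without them.
enable_hacluster: "no"
EOF
echo "Review $overrides, then merge it into /etc/kolla/globals.yml (assumed path)."
```

Emptying the masakari-hostmonitor inventory group, as suggested above, is a separate inventory change and is not shown here.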
yoctozeptointeresting though that it gives you an error14:55
yoctozeptoI think we might be missing a ``become`` somewhere14:56
ignaziocassano1It is strange because I can deploy all roles but not hacluster14:57
yoctozeptoyeah, I believe it's some edge case15:00
yoctozeptoare you using the latest stable/wallaby?15:01
yoctozeptothe issue does not seem to exist there15:01
yoctozepto``become`` is set there, as everywhere15:01
ignaziocassano1yes, I am using the latest wallaby15:03
ignaziocassano1in my multinode file I have now inserted:15:04
ignaziocassano1[control] # These hostname must be resolvable from your deployment host tst2-osctrl01  ansible_user=ansible ansible_become=true tst2-osctrl02  ansible_user=ansible ansible_become=true tst2-osctrl03  ansible_user=ansible ansible_become=true15:04
ignaziocassano1Sorry15:05
ignaziocassano1tst2-osctrl01  ansible_user=ansible ansible_become=true15:05
ignaziocassano1Can I set a global variable rather than specify it for each node ?15:05
ignaziocassano1OK15:12
ignaziocassano1setting them with environment variables works fine15:12
yoctozeptothere is ``become`` at the task level and it works for me; perhaps there is some ansible quirk in play but I can't imagine15:13
ignaziocassano1Now the playbook worked15:14
ignaziocassano1Please, how can I verify whether hacluster is working?15:14
yoctozeptoyou can issue ``crm_mon -1`` in the pacemaker's container15:17
ignaziocassano1I was wrong... I deployed only masakari, not pacemaker15:17
ignaziocassano1Must I download the image for pacemaker?15:17
yoctozeptono, I meant the deployed one15:19
ignaziocassano1So, now that I have deployed only masakari, how can I test HA on an instance?15:26
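Editor's sketch of the ``crm_mon -1`` check mentioned above, wrapped in a small shell function. The container name ``hacluster_pacemaker`` and the use of ``docker`` as the container engine are assumptions about the deployment; ``PACEMAKER_CONTAINER`` is a hypothetical variable introduced only to make the name overridable.

```shell
# Print a one-shot Pacemaker cluster status from inside the deployed container.
# The default container name is an assumption; override it if yours differs.
cluster_status() {
    docker exec "${PACEMAKER_CONTAINER:-hacluster_pacemaker}" crm_mon -1
}
```

Calling ``cluster_status`` on a controller prints the node list once and exits, which is enough to see which nodes are online.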
yoctozeptoah, you can create a test instance15:29
yoctozeptoand e.g. kill the qemu process forcibly from the host15:29
yoctozeptothis will trigger the masakari actions on it15:29
yoctozeptothe instance should come back up15:29
ignaziocassano1I tried15:30
ignaziocassano1the instance remains stopped15:30
ignaziocassano1I do not know if I must configure something else or tag the instance with some property15:31
yoctozeptoyou want to either enable https://docs.openstack.org/masakari/latest/configuration/config.html#instance_failure.process_all_instance15:33
yoctozeptoprocess_all_instances15:33
yoctozeptoin instance_failure in masakari.conf15:33
yoctozeptoOR set the metadata on VMs15:33
yoctozeptothe default is HA_Enabled15:34
yoctozeptoso setting HA_Enabled as True will enable HA protection per VM15:34
yoctozeptoyou can customise it15:34
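Editor's sketch of the two alternatives just listed, as shell helpers. The section and option names (``instance_failure``, ``process_all_instances``) and the ``HA_Enabled`` key come from the chat; the function names are hypothetical, and where the fragment has to be placed so that it ends up in the deployed masakari.conf depends on your deployment.

```shell
# Option 1: protect every instance. Prints the masakari.conf fragment to merge.
print_instance_failure_conf() {
    cat <<'EOF'
[instance_failure]
process_all_instances = true
EOF
}

# Option 2: protect a single instance through its metadata.
# $1 is the server name or ID.
enable_ha_for_server() {
    openstack server set --property HA_Enabled=True "$1"
}
```

For example, ``enable_ha_for_server test-vm`` tags a hypothetical instance named ``test-vm``. Only one of the two options is needed.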
ignaziocassano1I used the HA_Enabled property on the instance but it did not work15:34
yoctozeptodid you set it to True?15:35
ignaziocassano1yes15:35
yoctozeptodid you check the masakari logs? did anything happen?15:35
yoctozeptothe instance monitor on the compute host should spot the issue15:35
ignaziocassano1I am going to check15:39
ignaziocassano1the host where I killed the instance has the following name: tst2-kvm0215:48
ignaziocassano1the instance monitor reports: Client Error for url: http://10.102.119.194:15868/v1/notifications, Host with name tst2-kvm02 could not be found.15:49
yoctozeptoah, you first need to create a HA segment in masakari and add protected hosts to it15:50
ignaziocassano1thanks. I will try15:53
ignaziocassano1It asks for: Reserved, Type, Control Attribute and On Maintenance15:56
ignaziocassano1What can I put in them?15:56
yoctozeptoreserved is whether you want the host to act as a target for failures, this is not too robust16:00
yoctozeptotype should always be compute16:00
yoctozeptocontrol attribute is ignored for now; most people write ssh in there16:01
yoctozepto`on maintenance` would put the host in maintenance mode, so notifications on failures there are ignored16:01
yoctozeptothis is as the name suggests - for maintenance16:01
ignaziocassano1Must I have a segment for each host?16:01
yoctozeptono, you can have one that spans all the hosts16:02
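Editor's sketch of creating one segment that spans all compute hosts, using the ``openstack segment`` commands from the Masakari client plugin. The host type ``compute`` and the ``ssh`` control attribute follow the advice above; the ``auto`` recovery method, the function name and the segment name are assumptions chosen for illustration.

```shell
# Create one segment and register every given compute host in it.
# $1 = segment name, remaining arguments = compute host names.
protect_hosts() {
    segment="$1"
    shift
    # Positional arguments: <name> <recovery_method> <service_type>
    openstack segment create "$segment" auto compute || return 1
    for host in "$@"; do
        # Positional arguments: <name> <type> <control_attributes> <segment>
        openstack segment host create "$host" compute ssh "$segment" || return 1
    done
}
```

With the hosts from this conversation, that would be ``protect_hosts kvm-segment tst2-kvm01 tst2-kvm02``, where ``kvm-segment`` is a hypothetical name. The host names have to match what the monitors report, otherwise the "Host with name ... could not be found" error seen earlier comes back.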
ignaziocassano1it restarts on the same node :-(16:05
yoctozeptoyeah, because it's the instance that failed, not the node16:06
yoctozeptothis is expected16:06
ignaziocassano1so if I want to simulate a crash of the node, can I try to reboot it?16:08
yoctozeptono, you should forcibly power it off16:08
yoctozeptobut you need hostmonitor and pacemaker for that to work16:08
ignaziocassano1OK16:08
ignaziocassano1I will stop it with the Dell iDRAC16:08
ignaziocassano1I powered off the node but the instance still shows as running on it16:13
yoctozeptoyes, I told you you need hostmonitor and pacemaker for that16:17
ignaziocassano1I installed them16:18
ignaziocassano1I enabled hacluster: yes in globals.yaml and I pulled the pacemaker images16:19
ignaziocassano1I deployed with -t masakari,hacluster without errors16:20
yoctozeptohave you checked the cluster health with that crm_mon I suggested?16:24
yoctozeptoalso, hostmonitor should now observe the host failure16:24
ignaziocassano1I missed the crm_mon command you suggested, sorry16:27
ignaziocassano1I usually use the pcs command16:27
ignaziocassano1Cluster Summary:   * Stack: corosync   * Current DC: tst2-osctrl02 (version 2.0.3-4b1f869f0f) - partition with quorum   * Last updated: Mon Jun 21 18:27:33 2021   * Last change:  Mon Jun 21 17:38:50 2021 by root via cibadmin on tst2-osctrl01   * 5 nodes configured   * 2 resource instances configured  Node List:   * Online: [ tst2-osctrl01 tst2-osctrl02 tst2-osctrl03 ]   * RemoteOnline: [ tst2-kvm02 ]   * RemoteOFFLINE: [ tst2-kvm01 ]  A16:28
ignaziocassano1yes16:28
ignaziocassano1crm_mon lists one remote node as offline16:28
ignaziocassano1So, I stopped node 1 and it became remote offline, but the instance did not restart on the other node16:34
ignaziocassano1The crm_mon output shows only remote nodes and controller nodes. Are no other resources needed?16:36
yoctozeptono, it's all right16:39
yoctozeptowhat about hostmonitor logs16:39
yoctozeptothey should notice the host going down16:39
yoctozeptoand send a notification16:39
yoctozeptojust like instance monitor did16:39
ignaziocassano1tst2-kvm01' is 'offline' (current: 'offline').16:43
ignaziocassano1Exception caught: 'NoneType' object is not iterable: TypeError: 'NoneType' object is not iterable16:44
ignaziocassano1the above is from the host-monitor16:44
ignaziocassano1it detected that the node went down16:44
ignaziocassano1but it gave errors16:44
ignaziocassano1Thanks for your help. I will retry tomorrow16:55
ignaziocassanoHello, I am facing some issues with masakari hacluster wallaby on kolla ansible. Please, can the openstack-discuss mailing list be used to send logs?18:58
