ignaziocassano1 | hello everyone | 14:32 |
---|---|---|
ignaziocassano1 | Can anyone help me with masakari configuration on kolla? | 14:33 |
yoctozepto | hi ignaziocassano1, I may | 14:34 |
ignaziocassano1 | yoctozepto thanks. It seems masakari on kolla wallaby requires pacemaker. Is that right? | 14:35 |
yoctozepto | masakari always requires pacemaker now if you want to monitor the hosts as opposed to instances only | 14:35 |
ignaziocassano1 | ok | 14:35 |
yoctozepto | we are planning to support consul | 14:35 |
yoctozepto | kolla ansible has the role to deploy pacemaker for you | 14:36 |
yoctozepto | and integrates it with masakari | 14:36 |
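The setup described above maps to two `globals.yml` flags in kolla-ansible Wallaby; a minimal sketch (values illustrative):

```yaml
# /etc/kolla/globals.yml (fragment)
enable_masakari: "yes"
# deploys the corosync/pacemaker containers and wires masakari-hostmonitor to them
enable_hacluster: "yes"
```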
ignaziocassano1 | When I try to install the hacluster role it gives me the following error: TASK [hacluster : Ensure config directories exist] ************************************************************** failed: [tst2-osctrl01 -> localhost] (item=hacluster-corosync) => {"ansible_loop_var": "item", "changed": false, "item": "hacluster-corosync", "msg": "There was an issue creating /etc/kolla/config/hacluster-corosync as requested: [Errno 13] Permission | 14:36 |
ignaziocassano1 | I cannot find any documentation on installing hacluster with kolla | 14:37 |
ignaziocassano1 | I would be grateful if you could send me some documentation link | 14:37 |
ignaziocassano1 | Let me start with some easier steps first | 14:39 |
ignaziocassano1 | As you mentioned there are two configurations: one is the instance monitor only | 14:40 |
ignaziocassano1 | The other one is the compute node monitor: is that right? | 14:41 |
ignaziocassano1 | In the first case (instance monitor only) I should have the masakari host monitor and masakari engine on controllers and the monitor on compute nodes | 14:42 |
ignaziocassano1 | Then what must I do to enable HA for one instance? | 14:43 |
yoctozepto | you can disable pacemaker by setting enable_hacluster to no | 14:54 |
yoctozepto | you can also set the masakari-hostmonitor to be empty to not deploy one | 14:55 |
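Concretely, the instance-monitoring-only variant suggested above would look roughly like this (the `masakari-hostmonitor` group name is assumed to match kolla-ansible's standard multinode inventory):

```ini
# in the multinode inventory, leave the hostmonitor group empty
# (and set enable_hacluster: "no" in globals.yml)
[masakari-hostmonitor]
# intentionally empty - no hostmonitor containers will be deployed
```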
yoctozepto | interesting though that it gives you an error | 14:55 |
yoctozepto | I think we might be missing a ``become`` somewhere | 14:56 |
ignaziocassano1 | It is strange because I can deploy all roles but not hacluster | 14:57 |
yoctozepto | yeah, I believe it's some edge case | 15:00 |
yoctozepto | are you using the latest stable/wallaby? | 15:01 |
yoctozepto | the issue does not seem to exist there | 15:01 |
yoctozepto | there is ``become`` set there, as everywhere else | 15:01 |
ignaziocassano1 | yes, I am using the latest wallaby | 15:03 |
ignaziocassano1 | in my multinode file now I inserted: | 15:04 |
ignaziocassano1 | [control] # These hostnames must be resolvable from your deployment host tst2-osctrl01 ansible_user=ansible ansible_become=true tst2-osctrl02 ansible_user=ansible ansible_become=true tst2-osctrl03 ansible_user=ansible ansible_become=true | 15:04 |
ignaziocassano1 | Sorry | 15:05 |
ignaziocassano1 | tst2-osctrl01 ansible_user=ansible ansible_become=true | 15:05 |
ignaziocassano1 | Can I set a global variable rather than specify it for each node ? | 15:05 |
ignaziocassano1 | OK | 15:12 |
ignaziocassano1 | setting them with environment variables works fine | 15:12 |
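Besides environment variables, the per-host repetition from the earlier paste can also be avoided with group variables in the inventory itself, a standard ansible feature:

```ini
[control]
# These hostnames must be resolvable from your deployment host
tst2-osctrl01
tst2-osctrl02
tst2-osctrl03

# applied to every host in [control]
[control:vars]
ansible_user=ansible
ansible_become=true
```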
yoctozepto | there is ``become`` at the task level and it works for me; perhaps there is some ansible quirk in play but I can't imagine | 15:13 |
ignaziocassano1 | Now the playbook worked | 15:14 |
ignaziocassano1 | Please, how can I verify if hacluster is working? | 15:14 |
ignaziocassano1 | I was wrong ....I deployed only masakari, not pacemaker | 15:17 |
yoctozepto | you can issue ``crm_mon -1`` in the pacemaker's container | 15:17 |
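Assuming kolla's default container name for pacemaker (`hacluster_pacemaker` is an assumption), that check is:

```shell
# one-shot, non-interactive cluster status
docker exec hacluster_pacemaker crm_mon -1
```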
ignaziocassano1 | Must I download the image for pacemaker? | 15:17 |
yoctozepto | no, I meant the deployed one | 15:19 |
ignaziocassano1 | So, after I deployed only masakari, how can I test HA on an instance? | 15:26 |
yoctozepto | ah, you can create a test instance | 15:29 |
yoctozepto | and e.g. kill the qemu process forcibly from the host | 15:29 |
yoctozepto | this will trigger the masakari actions on it | 15:29 |
yoctozepto | the instance should come back up | 15:29 |
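A sketch of that test from the compute host; the UUID placeholder and the assumption that the instance UUID appears on the qemu command line are mine:

```shell
# UUID of the test instance (from `openstack server show`)
INSTANCE_UUID=aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee

# kill the backing qemu process forcibly; the masakari
# instancemonitor on this host should notice and send a notification
sudo pkill -9 -f "$INSTANCE_UUID"
```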
ignaziocassano1 | I tried | 15:30 |
ignaziocassano1 | the instance remains stopped | 15:30 |
ignaziocassano1 | I do not know if I must configure something else or tag the instance with some property | 15:31 |
yoctozepto | you want to either enable https://docs.openstack.org/masakari/latest/configuration/config.html#instance_failure.process_all_instance | 15:33 |
yoctozepto | process_all_instances | 15:33 |
yoctozepto | in instance_failure in masakari.conf | 15:33 |
yoctozepto | OR set the metadata on VMs | 15:33 |
yoctozepto | the default is HA_Enabled | 15:34 |
yoctozepto | so setting HA_Enabled to True will enable HA protection per VM | 15:34 |
yoctozepto | you can customise it | 15:34 |
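The two alternatives above, sketched (the server name is a placeholder):

```shell
# Option A: per-VM protection via metadata
openstack server set --property HA_Enabled=True my-test-vm

# Option B: protect every instance; in masakari.conf:
#   [instance_failure]
#   process_all_instances = True
```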
ignaziocassano1 | I used the HA_Enabled property on instance but it did not work | 15:34 |
yoctozepto | did you set it to True? | 15:35 |
ignaziocassano1 | yes | 15:35 |
yoctozepto | did you check the masakari logs? did anything happen? | 15:35 |
yoctozepto | the instance monitor on the compute host should spot the issue | 15:35 |
ignaziocassano1 | I am going to check | 15:39 |
ignaziocassano1 | the host where I killed the instance has the following name: tst2-kvm02 | 15:48 |
ignaziocassano1 | the instance monitor reports: Client Error for url: http://10.102.119.194:15868/v1/notifications, Host with name tst2-kvm02 could not be found. | 15:49 |
yoctozepto | ah, you first need to create a HA segment in masakari and add protected hosts to it | 15:50 |
ignaziocassano1 | thanks . I try | 15:53 |
ignaziocassano1 | It asks: Reserved, Type, Control Attributes and On Maintenance | 15:56 |
ignaziocassano1 | What can I put in them? | 15:56 |
yoctozepto | reserved is whether you want the host to act as a target for failures, this is not too robust | 16:00 |
yoctozepto | type should always be compute | 16:00 |
yoctozepto | control attributes is ignored for now; mostly people write ssh in there | 16:01 |
yoctozepto | `on maintenance` would put the host in maintenance mode to ignore notifications on failures there | 16:01 |
yoctozepto | this is as the name suggests - for maintenance | 16:01 |
ignaziocassano1 | Must I have a segment for each host? | 16:01 |
yoctozepto | no, you can have one that spans all the hosts | 16:02 |
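With the masakari OSC plugin, the segment setup discussed above looks roughly like this (the segment name and the `auto` recovery method are illustrative choices):

```shell
# one failover segment spanning all compute hosts
openstack segment create test-segment auto COMPUTE

# add each compute host: <name> <type> <control_attributes> <segment>
openstack segment host create tst2-kvm01 COMPUTE SSH test-segment
openstack segment host create tst2-kvm02 COMPUTE SSH test-segment
```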
ignaziocassano1 | it restarts on the same node :-( | 16:05 |
yoctozepto | yeah, because it's the instance that failed, not the node | 16:06 |
yoctozepto | this is expected | 16:06 |
ignaziocassano1 | so if I want to simulate a crash of the node, can I try to reboot it? | 16:08 |
yoctozepto | no, you should forcibly power it off | 16:08 |
yoctozepto | but you need hostmonitor and pacemaker for that to work | 16:08 |
ignaziocassano1 | OK | 16:08 |
ignaziocassano1 | I will stop it with the Dell iDRAC | 16:08 |
ignaziocassano1 | I powered off the node but the instance still shows as running on it | 16:13 |
yoctozepto | yes, I told you you need hostmonitor and pacemaker for that | 16:17 |
ignaziocassano1 | I installed them | 16:18 |
ignaziocassano1 | I enabled hacluster: yes in globals.yaml and I pulled pacemaker images | 16:19 |
ignaziocassano1 | I deployed with -t masakari,hacluster without errors | 16:20 |
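Those steps correspond roughly to the following commands (the inventory filename `multinode` is assumed):

```shell
kolla-ansible -i multinode pull -t masakari,hacluster
kolla-ansible -i multinode deploy -t masakari,hacluster
```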
yoctozepto | have you checked the cluster health with that crm_mon I suggested? | 16:24 |
yoctozepto | also, hostmonitor should now observe the host failure | 16:24 |
ignaziocassano1 | I missed the crm_mon command you suggested, sorry | 16:27 |
ignaziocassano1 | I usually use pcs command | 16:27 |
ignaziocassano1 | Cluster Summary: * Stack: corosync * Current DC: tst2-osctrl02 (version 2.0.3-4b1f869f0f) - partition with quorum * Last updated: Mon Jun 21 18:27:33 2021 * Last change: Mon Jun 21 17:38:50 2021 by root via cibadmin on tst2-osctrl01 * 5 nodes configured * 2 resource instances configured Node List: * Online: [ tst2-osctrl01 tst2-osctrl02 tst2-osctrl03 ] * RemoteOnline: [ tst2-kvm02 ] * RemoteOFFLINE: [ tst2-kvm01 ] | 16:28 |
ignaziocassano1 | yes | 16:28 |
ignaziocassano1 | crm_mon lists one node as remote offline | 16:28 |
ignaziocassano1 | So, I stopped node 1 and it became remote offline but the instance did not restart on the other node | 16:34 |
ignaziocassano1 | The crm_mon output shows only remote nodes and controller nodes. No other resources are needed? | 16:36 |
yoctozepto | no, it's all right | 16:39 |
yoctozepto | what about hostmonitor logs | 16:39 |
yoctozepto | they should notice the host going down | 16:39 |
yoctozepto | and send a notification | 16:39 |
yoctozepto | just like instance monitor did | 16:39 |
ignaziocassano1 | tst2-kvm01' is 'offline' (current: 'offline'). | 16:43 |
ignaziocassano1 | Exception caught: 'NoneType' object is not iterable: TypeError: 'NoneType' object is not iterable | 16:44 |
ignaziocassano1 | the above is the host-monitor | 16:44 |
ignaziocassano1 | it detected the node went down | 16:44 |
ignaziocassano1 | but it gave errors | 16:44 |
ignaziocassano1 | Thanks for your help. I will retry tomorrow | 16:55 |
ignaziocassano | Hello, I am facing some issue with masakari hacluster wallaby on kolla ansible. Please, can the openstack-discuss mailing list be used for sending logs? | 18:58 |
Generated by irclog2html.py 2.17.2 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!