Friday, 2021-07-16

07:59 *** yoctozepto[m] is now known as Guest994
08:24 *** bhagyashris_ is now known as bhagyashris|ruck
11:40 <Cheese> We've noticed that sometimes/always a host is put into maintenance mode (`openstack segment host list <oursegment>`), but I can't find any documentation on what this actually does.
11:41 <Cheese> Is it just a flag that gets set as an indicator (and it's quite easy to set it to False again), or is there something else we should be doing to make the host 'normal' again? This is after a host is brought back up after an error; we made a crm resource to reboot a host if it has an issue.
11:52 <yoctozepto> Cheese: this is to prevent flapping; if a host goes unhealthy, Masakari expects the operator to ensure it truly is healthy and unset the flag
11:55 <Cheese> yoctozepto: Thanks, I think that's what I wanted to hear :) So doing `openstack segment host update <segment> <host> --on_maintenance False` is all that needs to be done, if we are happy the host is healthy. Good.
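
For anyone scripting the step discussed above, here is a minimal Python sketch that wraps the same `openstack segment host update ... --on_maintenance False` command quoted in the log. The function names and the `dry_run` switch are hypothetical helpers introduced for illustration, not part of Masakari or its client. The sketch assumes the `openstack` CLI with the Masakari plugin is installed and credentials are already loaded in the environment. It does not check host health; that remains the operator's job.

```python
import subprocess
from typing import List


def build_clear_maintenance_cmd(segment: str, host: str) -> List[str]:
    """Build the argv that unsets the on_maintenance flag for one host.

    Mirrors the command quoted in the log:
    openstack segment host update <segment> <host> --on_maintenance False
    """
    return [
        "openstack", "segment", "host", "update",
        segment, host,
        "--on_maintenance", "False",
    ]


def clear_maintenance(segment: str, host: str, dry_run: bool = True) -> List[str]:
    """Print the command (dry run) or execute it; return the argv either way."""
    cmd = build_clear_maintenance_cmd(segment, host)
    if dry_run:
        # Only show what would be run; nothing is changed.
        print(" ".join(cmd))
        return cmd
    # check=True raises CalledProcessError if the CLI exits non-zero.
    subprocess.run(cmd, check=True)
    return cmd
```

Calling `clear_maintenance("<segment>", "<host>")` with the default `dry_run=True` only prints the command, which makes it easy to review before running it for real with `dry_run=False`.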
11:56 <Cheese> We have also noticed that a host will be rebooted/STONITH will happen, but the instances will not always be migrated. It has happened sometimes, but sometimes not. Not many details, sorry. Could this be a timing issue?
11:57 <Cheese> For example, the host is getting rebooted too quickly, when it should be down longer, in order for the instances to be fully migrated by masakari?
12:12 <yoctozepto> Cheese: it could be that the masakari engine failed to evacuate due to host being up again, yes; you should check the engine logs
12:35 <Cheese> Thanks, we had a feeling that could possibly be the case!
13:01 <Cheese> Increasing our `stonith:ipmilan` resource to have an `op_params delay` of 200 seconds (was previously 60) didn't seem to make any difference. The bad host is getting rebooted, but the VMs are all shut off, not getting migrated. Strange.
13:01 <Cheese> More log file reading I guess!
14:52 <Cheese> It seems increasing the delay did work, but we have an issue that we can't seem to resolve:
14:54 <Cheese> When a host reboots, it rejoins our cluster automatically and becomes online again. Is there a way to prevent it from automatically rejoining the cluster, so that it waits until we take action ourselves to allow it to rejoin?
15:14 <yoctozepto> Cheese: you might want to not reboot the host in the first place ;-)
15:14 <yoctozepto> but if that is not feasible, I guess you could make sure that neither pacemaker nor nova compute start automatically
15:14 <yoctozepto> that should solve the "I'm back again" issue
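
One way to act on the suggestion above is to disable the relevant services at boot with `systemctl disable`. This is a rough Python sketch only. The unit names `pacemaker` and `nova-compute` are assumptions: they differ between distributions (for example `openstack-nova-compute` on some), compute nodes may run `pacemaker_remote` instead, and containerised deployments manage these services differently. The helper names and the `dry_run` switch are hypothetical.

```python
import subprocess
from typing import List, Sequence

# ASSUMPTION: systemd unit names; adjust for your distribution/deployment.
DEFAULT_UNITS = ("pacemaker", "nova-compute")


def build_disable_cmds(units: Sequence[str] = DEFAULT_UNITS) -> List[List[str]]:
    """Build one `systemctl disable <unit>` argv per unit.

    `disable` stops the unit from starting at boot; it does not stop a
    unit that is currently running.
    """
    return [["systemctl", "disable", unit] for unit in units]


def disable_autostart(units: Sequence[str] = DEFAULT_UNITS,
                      dry_run: bool = True) -> List[List[str]]:
    """Print the commands (dry run) or execute them; return the argv list."""
    cmds = build_disable_cmds(units)
    for cmd in cmds:
        if dry_run:
            # Only show what would be run; nothing is changed.
            print(" ".join(cmd))
        else:
            # Needs root privileges; raises CalledProcessError on failure.
            subprocess.run(cmd, check=True)
    return cmds
```

With autostart disabled, a rebooted host stays out of the cluster until an operator starts the services by hand, which is the "wait until we take action" behaviour asked about.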
15:15 <Cheese> Funny, we hadn't considered that (off vs reboot). Investigating that right now, thanks. Noticed in the previous logs you were on vacation, good to have you back, hope it was a good time
