*** yoctozepto[m] is now known as Guest994 | 07:59 | |
*** bhagyashris_ is now known as bhagyashris|ruck | 08:24 | |
Cheese | We've noticed that sometimes/always a host is put into maintenance mode (`openstack segment host list <oursegment>`), but I can't find any documentation on what this actually does. | 11:40 |
---|---|---|
Cheese | Is it just a flag that gets set as an indicator (and it's quite easy to set it to False again) -- or is there something else we should be doing to make the host 'normal' again? -- This is after a host is brought back up after an error, we made a crm resource to reboot a host if it has an issue. | 11:41 |
yoctozepto | Cheese: this is to prevent flapping; if a host goes unhealthy, Masakari expects the operator to ensure it truly is healthy and unset the flag | 11:52 |
Cheese | yoctozepto: Thanks, I think that's what I wanted to hear :) So doing `openstack segment host update <segment> <host> --on_maintenance False` is all that needs to be done, if we are happy the host is healthy. Good | 11:55 |
Cheese | We have also noticed that a host will be rebooted/STONITH will happen, but the instances are not always migrated. It has happened sometimes, but sometimes not. Not many details, sorry. Could this be a timing issue? | 11:56 |
Cheese | For example, the host is getting rebooted too quickly, when it should be down longer, in order for the instances to be fully migrated by masakari? | 11:57 |
yoctozepto | Cheese: it could be that the masakari engine failed to evacuate due to host being up again, yes; you should check the engine logs | 12:12 |
Cheese | Thanks, we had a feeling that could possibly be the case! | 12:35 |
Cheese | Increasing our `stonith:ipmilan` resource to have a `op_params delay` of 200 seconds (was previously 60) didn't seem to make any difference. The bad host is getting rebooted, but the VMs are all shutoff, not getting migrated. Strange. | 13:01 |
Cheese | More log file reading I guess! | 13:01 |
Cheese | It seems increasing the delay did work - but we have an issue that we can't seem to resolve: | 14:52 |
Cheese | When a host reboots, it rejoins our cluster automatically and becomes online again. Is there a way to prevent it from automatically rejoining the cluster, so that it waits until we take action ourselves to allow it to rejoin? | 14:54 |
yoctozepto | Cheese: you might want to not reboot the host in the first place ;-) | 15:14 |
yoctozepto | but if that is not feasible, I guess you could make sure that neither pacemaker nor nova compute start automatically | 15:14 |
yoctozepto | that should solve the "i'm back again" issue | 15:14 |
Cheese | Funny, we hadn't considered that (off vs reboot). Investigating that right now - thanks. Noticed in the previous logs you were on vacation, good to have you back, hope it was a good time | 15:15 |
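
The two operator actions discussed above (clearing the maintenance flag once the host is confirmed healthy, and keeping a rebooted host from rejoining on its own) could be sketched roughly as below. This is a minimal sketch, not a tested procedure: the `openstack segment host update` call is the one quoted in the log, while the helper names, the `DRY_RUN` switch, the `NOVA_COMPUTE_UNIT` variable and the systemd unit names are assumptions. The nova compute unit name in particular differs between distributions, and containerised deployments do not use these units at all.

```shell
#!/bin/sh
# Hypothetical helper: prints the command by default (DRY_RUN=1),
# and only executes it when DRY_RUN=0 is set explicitly.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Clear the Masakari maintenance flag after the operator has verified
# that the host is healthy. $1 = segment, $2 = host.
clear_maintenance() {
    run openstack segment host update "$1" "$2" --on_maintenance False
}

# Stop pacemaker and nova compute from starting at boot, so a rebooted
# host stays out of the cluster until an operator starts them by hand.
# Unit names are assumptions; override NOVA_COMPUTE_UNIT as needed.
disable_autostart() {
    run systemctl disable pacemaker
    run systemctl disable "${NOVA_COMPUTE_UNIT:-nova-compute}"
}
```

With the default dry-run setting, `clear_maintenance <segment> <host>` only prints the command it would run, which makes it easy to review before setting `DRY_RUN=0` on a real host.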