Friday, 2021-07-16

*** yoctozepto[m] is now known as Guest994		07:59
*** bhagyashris_ is now known as bhagyashris|ruck		08:24
Cheese	We've noticed that sometimes/always a host is put into maintenance mode (`openstack segment host list <oursegment>`), but I can't find any documentation on what this actually does.	11:40
Cheese	Is it just a flag that gets set as an indicator (and it's quite easy to set it to False again) -- or is there something else we should be doing to make the host 'normal' again? -- This is after a host is brought back up after an error, we made a crm resource to reboot a host if it has an issue.	11:41
yoctozepto	Cheese: this is to prevent flapping; if a host goes unhealthy, Masakari expects the operator to ensure it truly is healthy and unset the flag	11:52
Cheese	yoctozepto: Thanks, I think that's what I wanted to hear :) So doing `openstack segment host update <segment> <host> --on_maintenance False` is all that needs to be done, if we are happy the host is healthy. Good	11:55
Cheese	We have also noticed that a host will be rebooted/STONITH will happen, but the instances are not always migrated. It has happened sometimes, but sometimes not. Not many details, sorry. Could this be a timing issue?	11:56
Cheese	For example, the host is getting rebooted too quickly, when it should be down longer, in order for the instances to be fully migrated by masakari?	11:57
yoctozepto	Cheese: it could be that the masakari engine failed to evacuate due to host being up again, yes; you should check the engine logs	12:12
Cheese	Thanks, we had a feeling that could possibly be the case!	12:35
Cheese	Increasing our `stonith:ipmilan` resource to have an `op_params delay` of 200 seconds (was previously 60) didn't seem to make any difference. The bad host is getting rebooted, but the VMs are all shutoff, not getting migrated. Strange.	13:01
Cheese	More log file reading I guess!	13:01
Cheese	It seems increasing the delay did work - but we have an issue that we can't seem to resolve:	14:52
Cheese	When a host reboots, it rejoins our cluster automatically and becomes online again. Is there a way to prevent it from automatically rejoining the cluster, so that it waits until we take action ourselves to allow it to rejoin?	14:54
yoctozepto	Cheese: you might want to not reboot the host in the first place ;-)	15:14
yoctozepto	but if that is not feasible, I guess you could make sure that neither pacemaker nor nova compute start automatically	15:14
yoctozepto	that should solve the "i'm back again" issue	15:14
Cheese	Funny, we hadn't considered that (off vs reboot). Investigating that right now - thanks. Noticed in the previous logs you were on vacation, good to have you back, hope it was a good time	15:15
