Tuesday, 2022-05-17

__ministrytoday I also hit the error above: senlin-health-manager loads clusters incorrectly, which causes high load on keystone.01:34
__ministryI run senlin-health-manager on just 1 node. senlin-health-manager was automatically loading the cluster many times.01:35
eandersson__ministry: What version are you running?01:53
eanderssonAlso, do you see any SQL errors anywhere?01:53
eanderssonIt does not have to be on the health manager01:53
__ministryI installed senlin from git at commit: f99412750cc068a302150c27074818f57bcdbdba01:56
__ministrylet me see, I will query the registry interval in SQL.01:57
eanderssonSo based on the logs it looks like it fails to report to the database for some reason02:08
eanderssonSomething must be locking up the database so that the services cannot report their own health02:09
eanderssonDo you see anything like02:12
eandersson> Breaking locks for dead engine02:12
eanderssonIn the logs? probably in central02:12
eanderssonYou may be able to fix this by increasing the periodic_interval in your config02:12
eanderssonhttps://docs.openstack.org/senlin/ocata/configuration.html#DEFAULT.periodic_interval02:13
eanderssonMaybe try to double it to give it more time to recover safely02:14
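
[Editor's sketch] The suggestion above, doubling periodic_interval, could be tried roughly as follows. This is only an illustration using Python's standard configparser on an in-memory sample; the starting value of 60 is an assumption based on the 60-second figure mentioned later in this conversation, and double_periodic_interval is a hypothetical helper, not part of Senlin.

```python
import configparser
import io

# Assumed sample config; a real senlin.conf will contain many more options.
SAMPLE_CONF = """\
[DEFAULT]
periodic_interval = 60
"""


def double_periodic_interval(conf_text: str) -> str:
    """Return the config text with DEFAULT.periodic_interval doubled."""
    # interpolation=None so '%' characters in other values are left alone.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)
    current = parser["DEFAULT"].getint("periodic_interval")
    parser["DEFAULT"]["periodic_interval"] = str(current * 2)
    out = io.StringIO()
    parser.write(out)
    return out.getvalue()


new_conf = double_periodic_interval(SAMPLE_CONF)
print(new_conf)
```

In practice the value would simply be edited by hand in the config file and the services restarted; the helper only makes the "double it" step explicit.
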
opendevreviewErik Olof Gunnar Andersson proposed openstack/senlin master: Removed two previously unused config options  https://review.opendev.org/c/openstack/senlin/+/84202002:20
__ministryyep. I edited the code to log db_registries and self.registries. Let me follow it.02:48
eanderssonbtw are you still running 32 workers? Might be worth trying to cut that down.02:58
eanderssonEach worker needs to update its health in the database02:58
eanderssonIf you have 32 workers for all services times 3 that is a lot of updates02:59
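
[Editor's sketch] A rough back-of-the-envelope version of the point above. It assumes "times 3" means three services with 32 workers each, and it assumes a 60-second reporting interval (the figure mentioned later in this conversation); heartbeat_load is a hypothetical helper used only for the arithmetic.

```python
def heartbeat_load(workers_per_service: int, services: int, interval_seconds: int):
    """Return (total workers, health updates per second) for the given setup."""
    total_workers = workers_per_service * services
    # Each worker is assumed to write one health update per interval.
    updates_per_second = total_workers / interval_seconds
    return total_workers, updates_per_second


total, per_second = heartbeat_load(workers_per_service=32, services=3, interval_seconds=60)
print(f"{total} workers, about {per_second:.1f} health updates per second")
```

Cutting the worker count or lengthening the interval lowers the number of health writes proportionally under these assumptions.
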
__ministryyep. I ran 32 workers on one node.03:00
opendevreviewErik Olof Gunnar Andersson proposed openstack/senlin master: Added protection against premature service cleanups  https://review.opendev.org/c/openstack/senlin/+/84202503:11
eanderssonI wonder if there are just too many health checks here.03:14
eanderssonEach worker will basically update its own health every 60 seconds, and if your database is under load maybe it's just falling behind03:14
eanderssonAnother theory could be that the worker is too busy under load and not able to update its own health fast enough03:15
eanderssonCan you let me know if you see any logs from here03:16
eanderssonhttps://github.com/openstack/senlin/blob/3b0f21972be0bb067e8c4391a6b77aa8815a0ca2/senlin/conductor/service.py#L15503:16
eanderssonThe logs would be under the conductor03:16
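
[Editor's sketch] Both theories above (the database falling behind, or a busy worker reporting late) end the same way: a service's last health update looks too old to whatever performs the cleanup. The sketch below illustrates that kind of staleness check in general terms. The function name is_stale and the grace_factor of 2.0 are assumptions made for illustration, not Senlin's actual code or threshold.

```python
from datetime import datetime, timedelta, timezone


def is_stale(updated_at: datetime, now: datetime,
             periodic_interval: int = 60, grace_factor: float = 2.0) -> bool:
    """Treat a service as dead if its last update is older than the grace window."""
    grace = timedelta(seconds=periodic_interval * grace_factor)
    return (now - updated_at) > grace


now = datetime(2022, 5, 17, 3, 16, tzinfo=timezone.utc)
on_time = now - timedelta(seconds=90)    # one late report, still inside the window
too_late = now - timedelta(seconds=150)  # delayed past the window

print(is_stale(on_time, now))
print(is_stale(too_late, now))
```

With a check like this, a healthy but slow worker can be misclassified as dead whenever its report is delayed past the window, which is why a longer interval or fewer workers might help.
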
opendevreviewErik Olof Gunnar Andersson proposed openstack/senlin master: Added protection against premature service cleanups  https://review.opendev.org/c/openstack/senlin/+/84202503:19
opendevreviewErik Olof Gunnar Andersson proposed openstack/senlin master: Added protection against premature service cleanups  https://review.opendev.org/c/openstack/senlin/+/84202503:21
opendevreviewErik Olof Gunnar Andersson proposed openstack/senlin master: Move service health update_at check to sql query  https://review.opendev.org/c/openstack/senlin/+/84202603:58
opendevreviewErik Olof Gunnar Andersson proposed openstack/senlin master: Added protection against premature service cleanups  https://review.opendev.org/c/openstack/senlin/+/84202503:59
opendevreviewErik Olof Gunnar Andersson proposed openstack/senlin master: Move service health update_at check to sql query  https://review.opendev.org/c/openstack/senlin/+/84202604:00
eanderssondtruong: If you have some time over. The above are just some general ideas on improving things that could go "wrong". ^04:00
eanderssonActually. I misunderstood how that code worked.05:38
opendevreviewErik Olof Gunnar Andersson proposed openstack/senlin master: [WIP] Fix issues with service cleaning  https://review.opendev.org/c/openstack/senlin/+/84202506:05
opendevreviewErik Olof Gunnar Andersson proposed openstack/senlin master: [WIP] Fix issues with service cleaning  https://review.opendev.org/c/openstack/senlin/+/84202506:32
opendevreviewErik Olof Gunnar Andersson proposed openstack/senlin master: [WIP] Fix issues with service cleaning  https://review.opendev.org/c/openstack/senlin/+/84202507:06
opendevreviewErik Olof Gunnar Andersson proposed openstack/senlin master: [WIP] Fix issues with service cleaning  https://review.opendev.org/c/openstack/senlin/+/84202507:16
opendevreviewErik Olof Gunnar Andersson proposed openstack/senlin master: [WIP] Fix issues with service cleaning  https://review.opendev.org/c/openstack/senlin/+/84202507:19
opendevreviewErik Olof Gunnar Andersson proposed openstack/senlin master: [WIP] Fix issues with service cleaning  https://review.opendev.org/c/openstack/senlin/+/84202507:38
opendevreviewErik Olof Gunnar Andersson proposed openstack/senlin master: [WIP] Fix issues with service cleaning  https://review.opendev.org/c/openstack/senlin/+/84202507:46
opendevreviewErik Olof Gunnar Andersson proposed openstack/senlin master: [WIP] Fix issues with service cleaning  https://review.opendev.org/c/openstack/senlin/+/84202507:54
opendevreviewErik Olof Gunnar Andersson proposed openstack/senlin master: Fixed service manage not cleaning properly  https://review.opendev.org/c/openstack/senlin/+/84202508:34
eandersson__ministry: Not sure if this fixes your issue, but was able to identify some issues with the current code. https://review.opendev.org/c/openstack/senlin/+/84202508:36
opendevreviewErik Olof Gunnar Andersson proposed openstack/senlin master: Fixed services not cleaning properly on startup  https://review.opendev.org/c/openstack/senlin/+/84202508:44
__ministryyep. let me test this code10:02
eandersson__ministry: Let me know if you run into any issues. I don't know if this will solve your issue, but I am pretty sure there is a bug with how we clean up dead services that this fix addresses.19:48
