opendevreview | Gregory Thiemonge proposed openstack/octavia-dashboard master: Additional VIP support in the LB creation form https://review.opendev.org/c/openstack/octavia-dashboard/+/835176 | 07:20 |
---|---|---|
opendevreview | Omer Schwartz proposed openstack/octavia-tempest-plugin master: Add TERMINATED_HTTPS listener API tests https://review.opendev.org/c/openstack/octavia-tempest-plugin/+/893066 | 09:56 |
racosta | Hi johnsom, I read your comments on this thread: https://bugzilla.redhat.com/show_bug.cgi?id=1964361 | 11:36 |
racosta | It was more related to the DB update process, but what about the health manager for the heartbeat_udp, any tuning recommendations? | 11:36 |
racosta | I'm seeing something like this: | 11:36 |
racosta | octavia-health-manager[xxx]: WARNING octavia.amphorae.drivers.health.heartbeat_udp [-] Amphora ID health message was processed too slowly: 1023.0624372959137s! The system may be overloaded or otherwise malfunctioning. This heartbeat has been ignored and no update was made to the amphora health entry. THIS IS NOT GOOD. | 11:37 |
racosta | Sometimes it takes 1019.3910622596741s and other times it takes 14.932192087173462s :( | 11:39 |
gthiemonge | 1023 sec, wow | 11:42 |
racosta | yeah... I'm trying to figure out the issues that are causing constant errors on LBs. The 1023 seconds is off the curve, the average is some time around 10-15 sec | 11:46 |
opendevreview | Omer Schwartz proposed openstack/octavia-tempest-plugin master: Add TERMINATED_HTTPS listener API tests https://review.opendev.org/c/openstack/octavia-tempest-plugin/+/893066 | 12:02 |
johnsom | racosta That is really bad. It should finish that in 0.001s, there is something seriously wrong with your database server. | 16:01 |
racosta | johnsom, is there any recommended config tuning to avoid problems with the database and/or oslo messaging, since it depends on the availability of other infrastructure nodes? | 16:56 |
johnsom | That level of poor performance is far beyond tuning. I would check the load levels on your DB, the IO backlog, etc. There is most likely something very unhealthy with your DB. I have seen this when people put 32 containers, including the DB, on one host and the IO levels were too high for the hardware. There are also cases where people are using the wrong maintenance or backup commands that lock the whole DB. | 16:59 |
racosta | I saw that the health check (UDPStatusGetter method) needs to persist the health to the amphora_health table and the stats to the listener_statistics table every time it gets the check... The DB's availability is very important here! | 17:15 |
johnsom | Yes, there is a heartbeat every 10s by default. The amphora has six chances to land a heartbeat on one of the HMs before it is marked as failed and recovery processes start. | 17:21 |
opendevreview | Merged openstack/octavia master: Add octavia-grenade-slurp CI job https://review.opendev.org/c/openstack/octavia/+/860221 | 17:24 |
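
The heartbeat arithmetic johnsom describes can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Octavia code: the function names are hypothetical, the 10 s interval and six attempts come from the conversation, and treating the heartbeat interval as the "processed too slowly" limit is an assumption about how the health manager decides to ignore a heartbeat.

```python
# Illustrative sketch only; these helpers are hypothetical and not part of Octavia.
HEARTBEAT_INTERVAL_S = 10.0  # default heartbeat period mentioned in the chat
HEARTBEAT_ATTEMPTS = 6       # heartbeats an amphora may miss before failover


def failover_window_s(interval_s=HEARTBEAT_INTERVAL_S, attempts=HEARTBEAT_ATTEMPTS):
    """Seconds an amphora can go without a recorded heartbeat before failover."""
    return interval_s * attempts


def is_processed_too_slowly(processing_s, limit_s=HEARTBEAT_INTERVAL_S):
    """Assumption: a heartbeat is discarded when processing exceeds the interval."""
    return processing_s > limit_s


if __name__ == "__main__":
    print("failover window:", failover_window_s(), "seconds")
    # Processing times quoted in the conversation, plus the expected ~0.001 s.
    for seconds in (0.001, 14.932192087173462, 1023.0624372959137):
        verdict = "ignored" if is_processed_too_slowly(seconds) else "recorded"
        print(f"{seconds:.3f}s -> heartbeat {verdict}")
```

Under these assumptions, both the 1023 s outlier and the 10-15 s average reported above would cause heartbeats to be dropped, so an amphora could run through its whole window without a single recorded update even though it is healthy.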