Thursday, 2023-08-31

opendevreview: Gregory Thiemonge proposed openstack/octavia-dashboard master: Additional VIP support in the LB creation form  https://review.opendev.org/c/openstack/octavia-dashboard/+/835176 [07:20]
opendevreview: Omer Schwartz proposed openstack/octavia-tempest-plugin master: Add TERMINATED_HTTPS listener API tests  https://review.opendev.org/c/openstack/octavia-tempest-plugin/+/893066 [09:56]
racosta: Hi johnsom, I read your comments on this thread: https://bugzilla.redhat.com/show_bug.cgi?id=1964361 [11:36]
racosta: It was more related to the DB update process, but what about the health manager for the heartbeat_udp, any tuning recommendations? [11:36]
racosta: I'm seeing something like this: [11:36]
racosta: octavia-health-manager[xxx]: WARNING octavia.amphorae.drivers.health.heartbeat_udp [-] Amphora ID health message was processed too slowly: 1023.0624372959137s! The system may be overloaded or otherwise malfunctioning. This heartbeat has been ignored and no update was made to the amphora health entry. THIS IS NOT GOOD. [11:37]
racosta: Sometimes it takes 1019.3910622596741s and other times it takes 14.932192087173462s :( [11:39]
gthiemonge: 1023 sec, wow [11:42]
racosta: yeah... I'm trying to figure out the issues that are causing constant errors on LBs. The 1023 seconds is off the curve, the average is some time around 10-15 sec [11:46]
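
[Editor's sketch] The warning quoted above says the heartbeat "has been ignored" because it was processed too slowly. A minimal Python sketch of that kind of staleness check follows. It assumes the delay is compared against the heartbeat interval (10 s by default, per johnsom later in this log). The names is_stale and HEARTBEAT_INTERVAL are hypothetical and are not taken from Octavia's code.

```python
import time

# Assumption: the threshold equals the heartbeat interval (10 s by default).
HEARTBEAT_INTERVAL = 10.0


def is_stale(recv_time, now=None, interval=HEARTBEAT_INTERVAL):
    """Return True if a heartbeat waited `interval` seconds or more before processing.

    recv_time: UNIX timestamp at which the UDP packet was received.
    now:       UNIX timestamp at which processing starts (defaults to time.time()).
    """
    if now is None:
        now = time.time()
    return (now - recv_time) >= interval


# Delays reported in this log, measured from a receive time of 0.
for delay in (1023.0624372959137, 14.932192087173462, 0.001):
    print(delay, is_stale(0.0, now=delay))
```

Under this assumption both reported delays (about 1023 s and about 15 s) would be discarded, while a 0.001 s delay would be accepted.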
opendevreview: Omer Schwartz proposed openstack/octavia-tempest-plugin master: Add TERMINATED_HTTPS listener API tests  https://review.opendev.org/c/openstack/octavia-tempest-plugin/+/893066 [12:02]
johnsom: racosta That is really bad. It should finish that in 0.001s, there is something seriously wrong with your database server. [16:01]
racosta: johnsom, is there any recommended config tuning to avoid problems with the database and/or oslo messaging, since it depends on the availability of other infrastructure nodes? [16:56]
johnsom: That level of poor performance is far beyond tuning, I would check the load levels on your DB, the IO backlog, etc. There is something very unhealthy with your DB most likely. I have seen this when people put 32 containers, including the DB, on one host and the IO levels were too high for the hardware. There are also cases where people are using the wrong maintenance or backup commands that lock the whole DB. [16:59]
racosta: I saw that the health check (UDPStatusGetter method) needs to persist the health to the amphora_health table and stats to the listener_statistics table every time it gets the check... DB's availability is very important here! [17:15]
johnsom: Yes, there is a heartbeat every 10s by default. The amphora has six chances to land a heartbeat on one of the HMs before it is marked as failed and recovery processes start. [17:21]
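
[Editor's sketch] The arithmetic behind "six chances" can be written out as below. The 10 s interval comes from johnsom's message. The 60 s timeout is an assumption implied by six chances at 10 s each. The function name heartbeat_chances is hypothetical.

```python
def heartbeat_chances(heartbeat_interval=10, heartbeat_timeout=60):
    """Number of heartbeats an amphora can send before it is considered failed.

    heartbeat_interval: seconds between heartbeats (10 by default, per the log).
    heartbeat_timeout:  seconds without a recorded heartbeat before failover
                        (assumed to be 60, i.e. six intervals).
    """
    if heartbeat_interval <= 0:
        raise ValueError("heartbeat_interval must be positive")
    return heartbeat_timeout // heartbeat_interval


print(heartbeat_chances())  # 60 // 10 = 6
```

If every heartbeat is discarded because it takes 10-15 s to process, none of those six chances is recorded, which would be consistent with the constant LB errors racosta describes.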
opendevreview: Merged openstack/octavia master: Add octavia-grenade-slurp CI job  https://review.opendev.org/c/openstack/octavia/+/860221 [17:24]
