Thursday, 2023-08-31

opendevreview	Gregory Thiemonge proposed openstack/octavia-dashboard master: Additional VIP support in the LB creation form https://review.opendev.org/c/openstack/octavia-dashboard/+/835176	07:20
opendevreview	Omer Schwartz proposed openstack/octavia-tempest-plugin master: Add TERMINATED_HTTPS listener API tests https://review.opendev.org/c/openstack/octavia-tempest-plugin/+/893066	09:56
racosta	Hi johnsom, I read your comments on this thread: https://bugzilla.redhat.com/show_bug.cgi?id=1964361	11:36
racosta	It was more related to the DB update process, but what about the health manager for the heartbeat_udp, any tuning recommendations?	11:36
racosta	I'm seeing something like this:	11:36
racosta	octavia-health-manager[xxx]: WARNING octavia.amphorae.drivers.health.heartbeat_udp [-] Amphora ID health message was processed too slowly: 1023.0624372959137s! The system may be overloaded or otherwise malfunctioning. This heartbeat has been ignored and no update was made to the amphora health entry. THIS IS NOT GOOD.	11:37
racosta	Sometimes it takes 1019.3910622596741s and other times it takes 14.932192087173462s :(	11:39
gthiemonge	1023 sec, wow	11:42
racosta	yeah... I'm trying to figure out the issues that are causing constant errors on LBs. The 1023 seconds is off the curve, the average is some time around 10-15 sec	11:46
opendevreview	Omer Schwartz proposed openstack/octavia-tempest-plugin master: Add TERMINATED_HTTPS listener API tests https://review.opendev.org/c/openstack/octavia-tempest-plugin/+/893066	12:02
johnsom	racosta That is really bad. It should finish that in 0.001s, there is something seriously wrong with your database server.	16:01
racosta	johnsom, Is there any recommended config tuning to avoid problems with database and/or oslo messaging? since it depends on other infrastructure nodes availability	16:56
johnsom	That level of poor performance is far beyond tuning. I would check the load levels on your DB, the IO backlog, etc. There is most likely something very unhealthy with your DB. I have seen this when people put 32 containers, including the DB, on one host and the IO levels were too high for the hardware. There are also cases where people are using the wrong maintenance or backup commands that lock the whole DB.	16:59
racosta	I saw that the health check (UDPStatusGetter method) needs to persist the health to the amphora_health table and the stats to the listener_statistics table every time it gets the check... The DB's availability is very important here!	17:15
johnsom	Yes, there is a heartbeat every 10s by default. The amphora has six chances to land a heartbeat on one of the HMs before it is marked as failed and recovery processes start.	17:21
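The arithmetic behind johnsom's last message can be sketched roughly as follows. This is only an illustration of the numbers quoted in the log (a 10 s heartbeat and six chances), not Octavia's own code; the function names `failover_window` and `missed_heartbeats` are hypothetical, and the mapping of these values to the `heartbeat_interval` and `heartbeat_timeout` options in the `[health_manager]` config section is an assumption.

```python
# Values quoted in the discussion: one heartbeat every 10 s, six chances.
# Assumption: these correspond to heartbeat_interval and heartbeat_timeout
# in the [health_manager] section of octavia.conf.
HEARTBEAT_INTERVAL_S = 10
CHANCES = 6


def failover_window(interval_s: float, chances: int) -> float:
    """Seconds an amphora can go without an accepted heartbeat
    before it would be treated as failed (hypothetical helper)."""
    return interval_s * chances


def missed_heartbeats(processing_delay_s: float, interval_s: float) -> int:
    """Whole heartbeat intervals that elapse while a single message
    is still being processed (hypothetical helper)."""
    return int(processing_delay_s // interval_s)


if __name__ == "__main__":
    window = failover_window(HEARTBEAT_INTERVAL_S, CHANCES)
    print(f"failure window: {window} s")
    # Processing delays reported by racosta earlier in the log.
    for delay in (1023.0624372959137, 14.932192087173462):
        n = missed_heartbeats(delay, HEARTBEAT_INTERVAL_S)
        print(f"{delay:.1f} s delay spans {n} heartbeat interval(s)")
```

Under these assumptions, a 1023 s processing delay is many times longer than the whole window in which an amphora has to land a heartbeat, which is consistent with the advice that this is a database health problem rather than something to tune.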
opendevreview	Merged openstack/octavia master: Add octavia-grenade-slurp CI job https://review.opendev.org/c/openstack/octavia/+/860221	17:24
