Monday, 2021-10-18

10:10 *** mnasiadka_ is now known as mnasiadka
18:07 <masterpe[m]> hi
20:31 <johnsom> masterpe[m] Hello
21:00 <masterpe[m]> johnsom: we are facing a bug with Octavia; we are running Ussuri. When we send an update to two load balancers via Magnum and Heat, Octavia takes a database lock, and because of that lock the transition fails for one of the two.
21:01 <masterpe[m]> I have proposed "openstack/magnum master: Octavia race condition with multiple lb-ers" https://review.opendev.org/c/openstack/magnum/+/814476 to make sure that only one update is sent at a time.
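[Editor's note: a minimal sketch of the "one update at a time" idea being discussed, not the actual Magnum change. It assumes openstacksdk, a placeholder cloud name, and hypothetical load balancer IDs; it simply waits for each load balancer to return to ACTIVE before touching the next one.]

    import time
    import openstack

    def wait_for_active(conn, lb_id, timeout=600, interval=5):
        # Poll the load balancer until it leaves its PENDING_* state.
        deadline = time.time() + timeout
        while time.time() < deadline:
            lb = conn.load_balancer.get_load_balancer(lb_id)
            if lb.provisioning_status == "ACTIVE":
                return lb
            time.sleep(interval)
        raise TimeoutError(f"load balancer {lb_id} did not become ACTIVE")

    conn = openstack.connect(cloud="mycloud")   # "mycloud" is a placeholder clouds.yaml entry
    for lb_id in ["lb-id-1", "lb-id-2"]:        # hypothetical IDs
        conn.load_balancer.update_load_balancer(lb_id, description="updated")
        wait_for_active(conn, lb_id)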
21:01 <johnsom> Sounds like a bug in Heat. The API is giving you back a retryable HTTP code, right?
21:01 <masterpe[m]> But the question is whether this is the right path.
21:01 <masterpe[m]> No, it fails hard
21:02 <johnsom> Can you provide logs? We don't do table locks in Octavia.
21:03 <johnsom> The Octavia API log should tell us what we need to know.
21:03 <masterpe[m]> mnaser proposed https://review.opendev.org/c/openstack/heat/+/793980 to solve it, but it got declined.
21:05 <masterpe[m]> I'm not behind my desk. I will do some digging and create a bug tomorrow morning.
21:05 <johnsom> Ok, yeah, it's unlikely there is an API bug in Octavia; it has had stress testing.
21:06 <johnsom> Not saying it's not possible something has happened, but it's unlikely.
21:07 <johnsom> Yeah, so that Heat patch shows it is returning the correct 503 code.
21:10 <masterpe[m]> The end result is that the load balancer gets stuck in PENDING_UPDATE
21:10 <johnsom> Nope, can't happen
21:11 <masterpe[m]> So a bug needs to be created on https://storyboard.openstack.org/#!/project/openstack/octavia ?
21:13 <johnsom> Well, from the patch you linked, it should be on Heat, but if you want you can create one on Octavia; it will just probably get closed as not-a-bug.
21:14 <johnsom> See, the LB can't get stuck in PENDING_*, especially from the quota check: https://github.com/openstack/octavia/blob/stable/ussuri/octavia/api/v2/controllers/load_balancer.py#L521
21:14 <johnsom> The whole LB record will be removed
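[Editor's note: a quick way to check whether any load balancer really is left in a PENDING_* state, sketched with openstacksdk; the cloud name is a placeholder.]

    import openstack

    conn = openstack.connect(cloud="mycloud")  # placeholder clouds.yaml entry
    for lb in conn.load_balancer.load_balancers():
        if lb.provisioning_status.startswith("PENDING_"):
            print(f"{lb.name} ({lb.id}) is in {lb.provisioning_status}")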
21:19 <masterpe[m]> johnsom: are you saying that https://review.opendev.org/c/openstack/magnum/+/814476 can be a solution to our problem?
21:20 <johnsom> I don't think that change is necessary.
21:20 <johnsom> https://review.opendev.org/c/openstack/heat/+/793980 is likely about the bug in Heat. It should honor the 503 correctly.
21:23 <johnsom> Ugh, part of the problem in Heat appears to be that they are not using the SDK; they are hijacking the OpenStack Client code.
21:24 <johnsom> But, anyway, Heat should retry on a 503 per the HTTP standard.
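[Editor's note: an illustrative retry-on-503 helper, not Heat's actual code; it uses the requests library, and the retry count and backoff values are arbitrary.]

    import time
    import requests

    def put_with_retry(url, json=None, headers=None, retries=5, backoff=2):
        for attempt in range(retries):
            resp = requests.put(url, json=json, headers=headers)
            if resp.status_code != 503:
                return resp
            # Honor a numeric Retry-After header if the server sent one,
            # otherwise wait progressively longer between attempts.
            delay = int(resp.headers.get("Retry-After", backoff * (attempt + 1)))
            time.sleep(delay)
        return resp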
21:30 <masterpe[m]> I'm probably saying something stupid, but shouldn't Octavia be resilient so that it would not send the 503 in the first place?
21:32 <johnsom> It does retry and wait already, but for whatever reason your DB is slow or the oslo setting for the deadlock retry timeout is too low, so we give up and respond to the user immediately.
21:33 <johnsom> It's odd that you even see a 503 with just two creates on a project. Something seems slow.
21:33 <johnsom> It appears it defaults to 20 retries: https://docs.openstack.org/oslo.db/latest/reference/opts.html#database.db_max_retries
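[Editor's note: the option mentioned above comes from oslo.db and lives in the [database] section of octavia.conf; the value below is just the documented default, shown for illustration.]

    [database]
    # Maximum retries on a connection or deadlock error before the error is
    # raised; set to -1 for an infinite retry count (oslo.db default is 20).
    db_max_retries = 20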
21:36 <masterpe[m]> Thanks, the databases are running on NVMe's, but we have some load on the APIs; I will check. Thanks for the hints and help. I will hop into bed now. Have a good night/evening/afternoon/morning.
21:38 <johnsom> o/
