Friday, 2023-12-15

tobias-urdingthiemonge: trying to wrap my head around this, how could this: Batch updating members: old='[]', new='[None]'09:12
tobias-urdinever happend here: https://github.com/openstack/octavia/blob/stable/yoga/octavia/controller/queue/v2/endpoints.py#L12809:13
tobias-urdinand could that indicate an issue I see with being stuck in PENDING_UPDATE when a lot of batch members updates is done09:13
<gthiemonge> tobias-urdin: looking... [09:16]
<gthiemonge> tobias-urdin: I think I know this one [09:17]
<gthiemonge> tobias-urdin: https://bugs.launchpad.net/octavia/+bug/2036156 [09:17]
<gthiemonge> hm, maybe it's time to cut a new bugfix release for yoga [09:19]
<tobias-urdin> yeah, I was looking at that one, I think it is already released in yoga 10.0.1 [09:25]
<tobias-urdin> but I didn't understand from the error messages in the bug report whether it's the same issue [09:26]
<tobias-urdin> thanks, I don't think we have that, so I will try with it [09:28]
<tobias-urdin> the PENDING_UPDATE hang seems to be unrelated: an invalid TLS cert container caused the listener to get stuck in PENDING_UPDATE because the certificate was either invalid or not found [09:44]
<tobias-urdin> https://paste.opendev.org/show/bxsnaKgk096PIiQagFDp/ – I always wondered why there is no FAILED or UPDATE_FAILED provisioning_status on resources [09:48]
<opendevreview> Lukas Piwowarski proposed openstack/octavia-tempest-plugin master: Add backup member tests https://review.opendev.org/c/openstack/octavia-tempest-plugin/+/897564 [09:58]
<gthiemonge> tobias-urdin: yeah, we should have a try/except block to set the state to ERROR if one of those lines triggers an exception https://opendev.org/openstack/octavia/src/branch/stable/yoga/octavia/controller/worker/v2/controller_worker.py#L748-L757 [10:43]
<gthiemonge> that said, the certs should have been checked in the API [10:45]
<opendevreview> Tobias Urdin proposed openstack/octavia master: Catch exceptions in listener update https://review.opendev.org/c/openstack/octavia/+/903753 [13:47]
<tobias-urdin> gthiemonge: yeah, thinking about it, isn't the above ^ valid? It does the same in put() that is done in post(); otherwise, when the certificate is invalid, we get stuck in PENDING_UPDATE. I guess that's what happened here: the code path makes it possible to get stuck in PENDING_UPDATE when updating the listener, but not upon creation [13:48]
<tobias-urdin> (stuck in PENDING_UPDATE since we don't roll back?) [13:48]
<tobias-urdin> more context: since somewhere we made it possible to introduce an invalid certificate through the API, the failure to delete is expected, because I had updated the LB+listener+pool+l7policy from PENDING_UPDATE -> ACTIVE so that I could delete them [13:50]
<tobias-urdin> but I could never delete it until I unset the certificate with `openstack loadbalancer listener unset --default-tls-container-ref`; then PENDING_UPDATE went to ACTIVE, since that's where the error was, and I could delete everything [13:51]
<gthiemonge> tobias-urdin: hmm, AFAIK the `with session.begin()` block should implicitly handle the rollback, so as far as I can tell, your patch doesn't change the behavior [13:53]
<gthiemonge> "call_provider" sends the request to the worker; if the exception is triggered in the worker, the API doesn't catch it [13:54]
<gthiemonge> "sends the request (via the RPC)" [13:54]
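To illustrate gthiemonge's point: a transaction around the API's database update can only roll back errors raised inside the API process. Once the request has been handed to the worker over RPC (assumed here to be a fire-and-forget cast), a failure in the worker never reaches the API, so nothing moves the object out of PENDING_UPDATE. The sketch below simulates this with a dict and `queue.Queue`; all names (`api_update_listener`, `worker_update_listener`, the `invalid` ref prefix) are hypothetical stand-ins, not Octavia code.

```python
import queue

# Hypothetical stand-ins for the Octavia DB and RPC layer, not real Octavia code.
db = {"listener-1": "ACTIVE"}
rpc_queue = queue.Queue()


def api_update_listener(listener_id, tls_ref):
    """API side: commit PENDING_UPDATE, then hand the work to the worker."""
    db[listener_id] = "PENDING_UPDATE"  # committed before the worker runs
    rpc_queue.put((listener_id, tls_ref))  # fire-and-forget, like an RPC cast
    return 202  # the API answers "Accepted" without waiting for the worker


def worker_update_listener(listener_id, tls_ref):
    """Worker side: fails when the (hypothetical) certificate lookup fails."""
    if tls_ref.startswith("invalid"):
        raise ValueError(f"certificate {tls_ref} not found")
    db[listener_id] = "ACTIVE"


def run_worker_once():
    listener_id, tls_ref = rpc_queue.get()
    try:
        worker_update_listener(listener_id, tls_ref)
    except ValueError as exc:
        # The error is only logged on the worker side; nobody resets the status.
        print(f"worker error: {exc}")


status = api_update_listener("listener-1", "invalid-ref")
run_worker_once()
print(status, db["listener-1"])  # 202 PENDING_UPDATE
```

The API reported success while the listener stays in PENDING_UPDATE, which matches the symptom described in this log.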
<tobias-urdin> feels like somewhere the API allowed the broken certificate to be introduced without rolling provisioning_status back to ACTIVE or setting it to ERROR [13:54]
<gthiemonge> tobias-urdin: do you know what could be wrong with your certificate? [13:56]
<tobias-urdin> I'm not sure, I don't think I have access to it, or whether it's even still there; I only have the logs from when I tried to delete it https://paste.opendev.org/show/bxsnaKgk096PIiQagFDp/ [14:01]
<tobias-urdin> let me see if I can find the PUT call for the listener, to check whether your suspicion is correct [14:01]
<tobias-urdin> gthiemonge: I don't have a request ID for the listener post() call, but after the "Creating listener" line in the log I don't see any errors [14:20]
<tobias-urdin> about 50 minutes later, there is this listener put() call: https://paste.opendev.org/show/bS6QHyAXZj5eoe2iudDh/ [14:20]
<tobias-urdin> if the cert is invalid it shouldn't be sent to the amphora provider, but it looks like that happens (last line); wouldn't it make more sense to set PENDING_UPDATE back to ACTIVE and reject the change, or set the status to ERROR instead? [14:22]
<tobias-urdin> what happened was that the LB, listener, pool and l7policy got stuck in PENDING_UPDATE; it was solved by unsetting the TLS container ref on the listener, after which everything went back to ACTIVE and could be deleted [14:23]
<tobias-urdin> I get that the secret could be updated without Octavia knowing about it, leaving the cert invalid, but when resources are stuck in PENDING_UPDATE, people feel more like "the service is broken", because their input data was accepted and there is no mention of an error anywhere [14:24]
<gthiemonge> yes, I agree, it needs to be fixed; most of the code in controller_worker.py should be in a try/except block that updates the provisioning_status in case of errors [14:32]
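A minimal sketch of the try/except pattern gthiemonge describes, assuming the worker wraps each update so that a failure always records a final provisioning_status instead of leaving objects in PENDING_UPDATE. `FakeRepo`, `update_listener` and `apply_update` are hypothetical names. In this sketch the listener goes to ERROR and the load balancer goes back to ACTIVE so it stays manageable; which objects should end up in ERROR is a design choice assumed here, not taken from Octavia.

```python
import logging

LOG = logging.getLogger(__name__)

ACTIVE, ERROR, PENDING_UPDATE = "ACTIVE", "ERROR", "PENDING_UPDATE"


class FakeRepo:
    """Hypothetical in-memory stand-in for the provisioning_status columns."""

    def __init__(self):
        self.status = {}

    def update(self, obj_id, provisioning_status):
        self.status[obj_id] = provisioning_status


def update_listener(repo, lb_id, listener_id, apply_update):
    """Apply an update; never leave the objects in PENDING_UPDATE on failure."""
    try:
        apply_update()
    except Exception:
        LOG.exception("Failed to update listener %s", listener_id)
        repo.update(listener_id, ERROR)  # surface the failure to the user
        repo.update(lb_id, ACTIVE)  # keep the load balancer manageable
        raise
    repo.update(listener_id, ACTIVE)
    repo.update(lb_id, ACTIVE)


def failing_update():
    # Simulates the TLS container lookup failing inside the worker.
    raise RuntimeError("TLS container not found")


repo = FakeRepo()
repo.update("lb-1", PENDING_UPDATE)
repo.update("listener-1", PENDING_UPDATE)
try:
    update_listener(repo, "lb-1", "listener-1", failing_update)
except RuntimeError:
    pass  # the worker would log it; the statuses are already final
print(repo.status)  # {'lb-1': 'ACTIVE', 'listener-1': 'ERROR'}
```

Re-raising after updating the statuses keeps the original error visible in the worker logs while the user sees ERROR through the API.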
<opendevreview> Tobias Urdin proposed openstack/octavia master: Ensure tls containers is validated https://review.opendev.org/c/openstack/octavia/+/903759 [14:49]
<tobias-urdin> gthiemonge: ^ maybe this is it: if the listener was created in the POST /v2/lbaas/loadbalancers call, its default_tls_container_ref would not have been verified, because the load balancer create path goes through _graph_create() in the listener code [15:02]
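A minimal sketch of the check discussed here: when a load balancer is created together with its listeners (the "graph create" path), each listener's TLS container refs would be validated before anything is stored or sent to the provider, just as a standalone listener create is expected to do. `graph_create`, `validate_tls_refs`, `ValidationError` and the `cert_exists` callback (standing in for a secret-store lookup) are hypothetical names, not Octavia's actual code; only the attribute names `default_tls_container_ref` and `sni_container_refs` come from the listener API.

```python
class ValidationError(Exception):
    """Hypothetical stand-in for the API's validation exception."""


def validate_tls_refs(listener, cert_exists):
    """Reject the listener if any of its TLS container refs cannot be loaded."""
    refs = []
    if listener.get("default_tls_container_ref"):
        refs.append(listener["default_tls_container_ref"])
    refs.extend(listener.get("sni_container_refs") or [])
    for ref in refs:
        if not cert_exists(ref):
            raise ValidationError(f"Could not retrieve certificate: {ref}")


def graph_create(lb_body, cert_exists):
    """Validate every listener in a load balancer tree before creating anything."""
    for listener in lb_body.get("listeners") or []:
        validate_tls_refs(listener, cert_exists)
    # Only after validation would objects be written to the DB and sent to the provider.
    return "accepted"


known_refs = {"good-ref"}  # hypothetical refs that the secret store can resolve
body = {"listeners": [{"default_tls_container_ref": "bad-ref"}]}
try:
    graph_create(body, known_refs.__contains__)
except ValidationError as exc:
    print(exc)  # Could not retrieve certificate: bad-ref
```

Rejecting the request up front means the API returns an error to the user instead of accepting data that later leaves the objects in PENDING_UPDATE.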
<tobias-urdin> I'm asking the user whether the listener was created through POST /v2/lbaas/loadbalancers or POST /v2/lbaas/listeners (and whether default_tls_container_ref was included in the POST or set afterwards with PUT /v2/lbaas/listeners/<id>) [15:02]
<opendevreview> Michael Johnson proposed openstack/octavia-tempest-plugin master: DNM: Testing with enable scope set to True https://review.opendev.org/c/openstack/octavia-tempest-plugin/+/902096 [23:43]
