*** redrobot2 is now known as redrobot | 05:58 | |
maysams | hello folks, quick question about octavia lbs with ovn provider | 14:38 |
---|---|---|
maysams | I just noticed an LB that is ACTIVE, but with pools in ERROR state and listeners in PENDING_UPDATE state. Shouldn't the LB be in some PENDING_* state or ERROR? | 14:38 |
gthiemonge | maysams: if a listener is PENDING_UPDATE, the load balancer should also be PENDING_UPDATE (both are set at the same time by the Octavia API) | 14:41 |
maysams | gthiemonge: oh, interesting. So I'm observing a different behavior, I will try to gather more info around that LB | 14:43 |
maysams | gthiemonge++ | 14:43 |
gthiemonge | maysams: not an ovn-provider expert, but this doesn't look good: https://opendev.org/openstack/ovn-octavia-provider/src/branch/master/ovn_octavia_provider/helper.py#L1380-L1388 | 14:45 |
gthiemonge | maysams: on error, in pool_delete, the provisioning_status of the pool is set to ERROR and the LB is set to ACTIVE, but the listener is unchanged | 14:45 |
gthiemonge | https://opendev.org/openstack/octavia/src/branch/master/octavia/api/v2/controllers/pool.py#L512-L514 | 14:46 |
gthiemonge | ^ this call marks the LB and the listeners in PENDING_UPDATE state | 14:47 |
maysams | right, the scenario I see with kuryr is the following: kuryr is trying to delete a lb member, but it can't since the lb pool is in error and the listener is pending_update | 14:48 |
maysams | it fails with conflict | 14:48 |
gthiemonge | there was probably a previous error that triggered this invalid state | 14:49 |
maysams | seems the logs were rotated :/ | 14:55 |
gthiemonge | #startmeeting Octavia | 16:00 |
opendevmeet | Meeting started Wed Nov 3 16:00:29 2021 UTC and is due to finish in 60 minutes. The chair is gthiemonge. Information about MeetBot at http://wiki.debian.org/MeetBot. | 16:00 |
opendevmeet | Useful Commands: #action #agreed #help #info #idea #link #topic #startvote. | 16:00 |
opendevmeet | The meeting name has been set to 'octavia' | 16:00 |
johnsom | o/ | 16:00 |
gthiemonge | hi! | 16:00 |
gthiemonge | #topic Announcements | 16:02 |
gthiemonge | well I don't have any announcements | 16:02 |
gthiemonge | anyone? | 16:02 |
johnsom | I don't | 16:03 |
gthiemonge | #topic Brief progress reports / bugs needing review | 16:03 |
gthiemonge | I proposed a fix for a problem with some revert functions: | 16:03 |
gthiemonge | #link https://review.opendev.org/c/openstack/octavia/+/815973 | 16:04 |
gthiemonge | in some tasks, in the revert function, we set the load balancer to ERROR and I believe that it is not good, because the provisioning_status of the LB acts as a lock on the resource | 16:04 |
gthiemonge | and in those revert functions, we unlock the LB too early, which may cause race conditions | 16:05 |
johnsom | Yep, unlocking too early | 16:05 |
gthiemonge | only the revert function of the first task of a flow (such as LoadBalancerToErrorOnRevertTask) should set a LB to ERROR in my opinion | 16:05 |
gthiemonge | I opened a story about the issue I got: | 16:05 |
gthiemonge | #link https://storyboard.openstack.org/#!/story/2009652 | 16:05 |
johnsom | Yeah, it should roll up to the capstone task | 16:05 |
gthiemonge | please note that an additional patch will be required for releases <= stable/wallaby | 16:06 |
gthiemonge | because some tasks were removed from Xena (spare pool) | 16:07 |
gthiemonge | and FYI I also started using centos 9 stream amphora images. It works in my local env, but not in the CI | 16:08 |
johnsom | Nice | 16:08 |
johnsom | On the Octavia front I have only been doing bug fixes | 16:09 |
johnsom | sigh, reviews that is | 16:11 |
gthiemonge | #topic 200 vs 202 return codes in the API | 16:11 |
johnsom | Yeah, this is my topic item | 16:12 |
johnsom | https://review.opendev.org/c/openstack/octavia/+/816393 | 16:12 |
johnsom | This patch raised an interesting issue. No idea how it went this long without being caught. | 16:12 |
johnsom | Though I'm pretty sure we talked about this long ago. Maybe rm_work remembers the discussion | 16:12 |
johnsom | So, the API reference says that all of the PUT methods return 202 status codes. | 16:13 |
johnsom | This makes sense given that the updates are asynchronous, i.e. need to go update the certs in the amps. | 16:13 |
johnsom | However, the actual API code is returning a 200 code for these calls. | 16:14 |
johnsom | #link https://review.opendev.org/c/openstack/octavia/+/816393 | 16:14 |
johnsom | If you need a reference to the meanings: | 16:14 |
johnsom | #link https://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html | 16:14 |
johnsom | The patch proposes simply changing the API reference to show 200's | 16:14 |
johnsom | However, I raised the question of whether the API should really be returning 202 since they are async methods. | 16:15 |
johnsom | Thoughts? | 16:16 |
gthiemonge | that's a good topic :D | 16:17 |
johnsom | lol, yes | 16:17 |
gthiemonge | AFAICT I see a lot of good reasons to reply 200 to these calls, and a lot of good reasons to reply 202 | 16:17 |
johnsom | Yeah, one could argue for 200 as I think the response includes the updated fields (though PENDING_UPDATE status). | 16:18 |
johnsom | You could argue 202 because they are in PENDING_UPDATE and not yet actually applied to the LBs | 16:19 |
gthiemonge | one concern raised in the review about fixing the code and not fixing the doc is that changing the code may break some clients/sdks | 16:19 |
johnsom | 202 is kind of a signal that you should poll for status updates | 16:19 |
johnsom | Yeah, changing status codes is.... ugly | 16:19 |
johnsom | I don't think it will break openstacksdk or our client | 16:20 |
johnsom | Both should be looking for a 2xx code and not specific sub-codes | 16:20 |
gthiemonge | I didn't find any clients/sdks in openstack that use the return code | 16:21 |
gthiemonge | i'm lazy so I would recommend fixing the doc :D | 16:24 |
johnsom | Yeah, this is one that is... unfortunate. | 16:24 |
johnsom | Maybe add it to the v3 api list | 16:24 |
gthiemonge | johnsom: but if you think that 202 is more appropriate, let's fix the code | 16:24 |
johnsom | I think the community should decide. I would like to hear what rm_work thinks too | 16:25 |
gthiemonge | do we have a v3 api todo list? | 16:25 |
johnsom | We should | 16:25 |
johnsom | Maybe we should create a wiki page | 16:26 |
johnsom | With a big warning at the top that v3 is not planned any time soon | 16:26 |
gthiemonge | +1 | 16:26 |
johnsom | Ok, maybe we should table this topic and hope for more community feedback | 16:27 |
gthiemonge | yeah, async feedback | 16:28 |
gthiemonge | johnsom: thanks for raising this issue | 16:28 |
gthiemonge | #topic Open Discussion | 16:28 |
gthiemonge | Any other topics today? | 16:28 |
johnsom | I don't have anything | 16:29 |
gthiemonge | Ok! | 16:29 |
gthiemonge | Thanks everyone! | 16:29 |
gthiemonge | #endmeeting | 16:29 |
opendevmeet | Meeting ended Wed Nov 3 16:29:30 2021 UTC. Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4) | 16:29 |
opendevmeet | Minutes: https://meetings.opendev.org/meetings/octavia/2021/octavia.2021-11-03-16.00.html | 16:29 |
opendevmeet | Minutes (text): https://meetings.opendev.org/meetings/octavia/2021/octavia.2021-11-03-16.00.txt | 16:29 |
opendevmeet | Log: https://meetings.opendev.org/meetings/octavia/2021/octavia.2021-11-03-16.00.log.html | 16:29 |
johnsom | o/ | 16:29 |
opendevreview | Gregory Thiemonge proposed openstack/octavia master: Enable taskflow retry feature when waiting for compute https://review.opendev.org/c/openstack/octavia/+/816535 | 16:55 |
maysams | gthiemonge: hello again, getting back to the issue we chatted about earlier. I only found a call of "Sending lb create to octavia provider" nothing extra after that | 17:26 |
maysams | gthiemonge: regardless of it being with ovn provider the lb should have moved to pending_update state right? | 17:27 |
johnsom | That means Octavia handed the request off to the provider driver. | 17:27 |
maysams | right and I couldn't find some useful info on the provider logs | 17:28 |
johnsom | Yes, that happens. Octavia sets it to PENDING_UPDATE and hands it off to the provider driver. In the case of OVN, I think it immediately sets it ACTIVE as it doesn't do much state management. Not sure, as I'm not familiar with the OVN internals | 17:28 |
johnsom | Yeah, I don't know that OVN logs much. | 17:28 |
maysams | all right, thanks | 17:29 |
johnsom | You can ask for help in the neutron channel, they own the OVN provider and know more about it than most folks here. | 17:29 |
maysams | okay | 17:29 |
rm_work | Ugh yeah, I think we are now stuck with the current contract (returning 200) | 20:02 |
rm_work | Kinda sucks but there’s always v3 :D | 20:02 |
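
A rough sketch of the client-side behavior touched on in the 200 vs 202 topic: accept any 2xx code for a PUT, then poll while the resource is still PENDING_UPDATE. The names `is_success`, `wait_for_final_state` and the caller-supplied `get_status` callable are hypothetical and used only for illustration; this is not taken from openstacksdk or the Octavia client.

```python
import time
from typing import Callable

# Assumption: an update settles in one of these provisioning statuses.
FINAL_STATES = {"ACTIVE", "ERROR"}


def is_success(status_code: int) -> bool:
    """Treat any 2xx code as success, so 200 and 202 are handled alike."""
    return 200 <= status_code < 300


def wait_for_final_state(
    get_status: Callable[[], str],
    max_polls: int = 10,
    delay: float = 1.0,
) -> str:
    """Poll get_status() until it leaves the PENDING_* states.

    get_status is a hypothetical callable that returns the current
    provisioning_status string of the resource being updated.
    """
    for _ in range(max_polls):
        status = get_status()
        if status in FINAL_STATES:
            return status
        time.sleep(delay)
    raise TimeoutError("resource still pending after %d polls" % max_polls)


if __name__ == "__main__":
    # Simulated status sequence standing in for repeated GET calls.
    simulated = iter(["PENDING_UPDATE", "PENDING_UPDATE", "ACTIVE"])
    if is_success(202):
        print(wait_for_final_state(lambda: next(simulated), delay=0))
```

A client written this way would not notice whether the API answers 200 or 202, which matches the expectation voiced in the meeting that clients should check for a 2xx code rather than a specific sub-code.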