*** yadnesh|away is now known as yadnesh | 05:28 | |
*** yadnesh is now known as yadnesh|afk | 08:09 | |
*** yadnesh|afk is now known as yadnesh | 08:53 | |
QG | Hello all ! | 15:23 |
QG | Have you ever seen a member stuck in PENDING_UPDATE because of 'taskflow.conductors.backends.impl_executor taskflow.exceptions.Duplicate: Atoms with duplicate names found: ['octavia-mark-member-active-indb-0129470e-40f8-4bce-9ed9-b5d4badc3f26']'? | 15:23 |
johnsom | Oh! That is a coding error. What version of Octavia? | 15:25 |
QG | We are running Yoga version | 15:25 |
johnsom | Can you provide the log snippet at paste.openstack.org? Logs back to the start of the call that raised this would be helpful. | 15:25 |
johnsom | Really curious how that could happen given the UUID there. Is jobboard enabled in this deployment? | 15:26 |
QG | yes jobboard is enabled | 15:27 |
QG | and we have also this patch : https://review.opendev.org/c/openstack/octavia/+/838438 | 15:28 |
gthiemonge | maybe a task that is executed on 2 controllers at the same time? | 15:29 |
gthiemonge | https://review.opendev.org/c/openstack/octavia/+/838438 | 15:30 |
gthiemonge | ^ this commit should be backported | 15:30 |
johnsom | Maybe..... That would be odd though. Typically that error means the flow compile failed, like the flow was built wrong. But yeah, I am wondering if this isn't a jobboard side effect we haven't seen yet. | 15:30 |
johnsom | QG getting those logs would be super helpful so we can link them to a storyboard. | 15:31 |
QG | yep i'm trying to anonymize the log | 15:32 |
johnsom | Ack, thanks! | 15:33 |
QG | https://paste.opendev.org/show/b9hNswobCUpqnjMQd3uq/ | 15:40 |
QG | this is the logs from the worker-0 | 15:42 |
QG | and we have two workers | 15:42 |
gthiemonge | "old='[]', new='[None]', updated='[]'" this is a bit weird | 15:43 |
QG | the other one at the same time was logging this : https://paste.opendev.org/show/b1hK1bFh18WeSQydYIlt/ | 15:43 |
gthiemonge | updated='['0129470e-40f8-4bce-9ed9-b5d4badc3f26', '0129470e-40f8-4bce-9ed9-b5d4badc3f26'] | 15:46 |
gthiemonge | same id | 15:46 |
QG | so Octavia tried to update the same member twice | 15:47 |
gthiemonge | QG: you don't have the content of the API call? | 15:49 |
johnsom | gthiemonge Yeah, and that is the ID that is the problem. So, I think we just found the RCA on that bug. | 15:49 |
johnsom | That should have been de-duplicated | 15:50 |
QG | gthiemonge: let me find it and upload it to paste.openstack.org | 15:50 |
gthiemonge | johnsom: maybe the API call updated the same member twice? | 15:51 |
johnsom | yeah, but we should either de-dup that or throw them back an error. | 15:51 |
gthiemonge | yes of course | 15:51 |
johnsom | I vaguely remember a bug like this in the past. I'm searching to see if it was fixed, but not backported | 15:51 |
johnsom | Nope, I didn't find any patches, maybe it is just a story | 15:53 |
gthiemonge | based on those logs, I'm going to try to reproduce it | 15:56 |
gthiemonge | #startmeeting Octavia | 16:01 |
opendevmeet | Meeting started Wed Nov 9 16:01:04 2022 UTC and is due to finish in 60 minutes. The chair is gthiemonge. Information about MeetBot at http://wiki.debian.org/MeetBot. | 16:01 |
opendevmeet | Useful Commands: #action #agreed #help #info #idea #link #topic #startvote. | 16:01 |
opendevmeet | The meeting name has been set to 'octavia' | 16:01 |
gthiemonge | Hi Folks | 16:01 |
matfechner | o/ | 16:01 |
tweining | o/ | 16:01 |
pyjou | o/ | 16:01 |
QG | o/ | 16:01 |
johnsom | o/ | 16:01 |
gthiemonge | #topic Announcements | 16:03 |
gthiemonge | ** Antelope-1 milestone | 16:03 |
gthiemonge | next week is Antelope-1, we planned to get some RFEs merged by this milestone (amphora vertical scaling) | 16:04 |
gthiemonge | #link https://review.opendev.org/q/topic:amp-cpu-pinning+is:open | 16:04 |
gthiemonge | so please review/test these commits | 16:05 |
tweining | if you need assistance wrt. testing feel free to ping me | 16:06 |
gthiemonge | tweining: thanks | 16:07 |
gthiemonge | any other announcements folks? | 16:07 |
gthiemonge | ok | 16:08 |
gthiemonge | #topic CI Status | 16:08 |
gthiemonge | no update this week | 16:09 |
gthiemonge | #topic Brief progress reports / bugs needing review | 16:09 |
gthiemonge | I haven't spent a lot of time on upstream stuff, I'm working on Octavia tests downstream, I hope I will focus more on upstream tasks next week | 16:10 |
johnsom | I am still working on secrets consumers. It was a bigger job than I expected. | 16:10 |
gthiemonge | #topic Open Discussion | 16:14 |
gthiemonge | anything folks? | 16:15 |
tweining | no really | 16:16 |
tweining | *not | 16:16 |
johnsom | Nothing here | 16:16 |
gthiemonge | ok, that was a quick one | 16:17 |
gthiemonge | thank you! | 16:17 |
gthiemonge | #endmeeting | 16:17 |
opendevmeet | Meeting ended Wed Nov 9 16:17:04 2022 UTC. Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4) | 16:17 |
opendevmeet | Minutes: https://meetings.opendev.org/meetings/octavia/2022/octavia.2022-11-09-16.01.html | 16:17 |
opendevmeet | Minutes (text): https://meetings.opendev.org/meetings/octavia/2022/octavia.2022-11-09-16.01.txt | 16:17 |
opendevmeet | Log: https://meetings.opendev.org/meetings/octavia/2022/octavia.2022-11-09-16.01.log.html | 16:17 |
QG | gthiemonge: this is the call on api side : | 16:18 |
QG | https://paste.opendev.org/show/b7Gqp6BAfYUy4uQLLs2v/ | 16:18 |
gthiemonge | QG: ok, we don't have the json params. I'm trying to forge an API call that triggers those duplicate member IDs | 16:21 |
tweining | btw, did we miss backporting https://review.opendev.org/c/openstack/octavia/+/838438 ? | 16:36 |
gthiemonge | we didn't miss it :D we still have a lot of fixes from Z to backport, but there's also a long list of backports to review, I didn't want to create too many bp at the same time | 16:37 |
tweining | yeah, I see your comment above now | 16:37 |
*** yadnesh is now known as yadnesh|away | 16:42 | |
gthiemonge | taskflow.exceptions.Duplicate: Atoms with duplicate names found: ['octavia-mark-member-active-indb-039c16a3-5b2f-4385-aad2-a541288c10ca'] | 16:43 |
gthiemonge | OK, I reproduced it | 16:43 |
johnsom | Nice! | 16:44 |
gthiemonge | QG: johnsom: so I think the only way to trigger it is to update the same member twice in a PUT call (like https://paste.opendev.org/show/bKQTYSBpBEpKYEmVWTtb/) | 17:00 |
johnsom | yeah, so maybe missing validation that the call isn't updating the same member twice in the same call? | 17:02 |
QG | gthiemonge: GG ! | 17:06 |
gthiemonge | QG: do you know which client/application sent this request? | 17:07 |
gthiemonge | because if we fix it, the API will return a 400 (ValidationException error) | 17:07 |
QG | gthiemonge: Yep it's | 17:08 |
QG | HashiCorp Terraform/1.3.4 (+https://www.terraform.io) Terraform Plugin SDK/2.10.1 gophercloud/2.0.0" | 17:08 |
gthiemonge | ok | 17:09 |
QG | uhhhh there is something i don't understand | 17:10 |
QG | in https://paste.opendev.org/show/b7Gqp6BAfYUy4uQLLs2v/ I put two lines, one from the apache log and one from octavia-api, but they are in fact a single call | 17:11 |
QG | let me upload more logs | 17:12 |
johnsom | Yeah, the request flows through those | 17:12 |
QG | https://paste.opendev.org/show/bcptBtJxNERLcXNt17su/ | 17:18 |
QG | so i added the field wsgi for apache log | 17:18 |
johnsom | What is your question? | 17:20 |
johnsom | This looks like a pretty normal terraform log | 17:20 |
QG | from the logs there is no call for updating the same member twice in a PUT | 17:21 |
johnsom | So, this call: PUT /v2.0/lbaas/pools/a7fecc48-c125-4b9a-bff4-cbef08b6326e/members HTTP/1.1" status: 202 len: - time: 775(ms) | 17:22 |
QG | I thought you deduced that from the fact that there was a duplicate log line when there was one for octavia-api and one for apache | 17:22 |
johnsom | This is a batch member update call. The JSON body of this PUT had the duplicate members in it | 17:22 |
johnsom | Oh, nope, it was this: | 17:22 |
johnsom | updated='['0129470e-40f8-4bce-9ed9-b5d4badc3f26', '0129470e-40f8-4bce-9ed9-b5d4badc3f26'] | 17:23 |
gthiemonge | yep | 17:23 |
QG | oh ok thanks | 17:23 |
johnsom | From this snippet: https://paste.opendev.org/show/b1hK1bFh18WeSQydYIlt/ | 17:23 |
QG | do you advise us to log the body as well? | 17:23 |
johnsom | It was the worker log | 17:23 |
gthiemonge | which is also the id of the duplicate atom | 17:23 |
johnsom | Generally you don't want to, it's a lot of data. | 17:24 |
johnsom | The worker will generally log what you need to know out of the request | 17:24 |
gthiemonge | but terraform or gophercloud probably sends an invalid request | 17:24 |
johnsom | Yeah, it has had a history of issues. Like originally it didn't honor the 409's | 17:25 |
gthiemonge | QG: do you have the logs (DEBUG) from terraform? https://github.com/terraform-provider-openstack/terraform-provider-openstack/blob/131410572775d76c1dac5cec0d39d9bdde65f4ab/openstack/resource_openstack_lb_members_v2.go#L211 | 17:26 |
QG | no, unfortunately; a customer did that | 17:28 |
gthiemonge | because here it triggers a bug in octavia, and once we have a fix, they will hit a bug in terraform | 17:31 |
QG | I will ask the customer whether they have the DEBUG logs from terraform | 17:35 |
johnsom | I would argue updating the same member twice in one API call is a bug in terraform | 17:35 |
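
The validation discussed above (reject a batch member update that names the same member twice, so the API answers with a 400 instead of the worker failing on duplicate atom names) could look roughly like the sketch below. It is not Octavia's code. `DuplicateMemberError`, `find_duplicate_members` and `validate_batch_update` are hypothetical names, and keying members on `address` plus `protocol_port` is an assumption about how a batch entry maps to an existing member; the logs in this discussion only show the duplicated member ID.

```python
from collections import Counter


class DuplicateMemberError(ValueError):
    """Hypothetical stand-in for the API validation error (HTTP 400)."""


def member_key(member):
    # Assumption: a batch entry is identified by its address and port.
    return (member["address"], member["protocol_port"])


def find_duplicate_members(members, key=member_key):
    """Return the keys that appear more than once, in first-seen order."""
    counts = Counter(key(m) for m in members)
    return [k for k, n in counts.items() if n > 1]


def validate_batch_update(members, key=member_key):
    """Raise if the batch body would update the same member twice."""
    duplicates = find_duplicate_members(members, key)
    if duplicates:
        raise DuplicateMemberError(
            "Members listed more than once in batch update: %r" % duplicates
        )


# Example body with the same address/port listed twice (made-up values).
batch_body = [
    {"address": "192.0.2.10", "protocol_port": 80},
    {"address": "192.0.2.11", "protocol_port": 80},
    {"address": "192.0.2.10", "protocol_port": 80, "weight": 5},
]

try:
    validate_batch_update(batch_body)
    rejected = False
except DuplicateMemberError:
    rejected = True
```

Rejecting the request, rather than silently de-duplicating it, matches the 400 behaviour mentioned in the discussion and leaves it to the client (here terraform/gophercloud) to send a consistent body.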