*** vishalmanchanda has quit IRC | 00:09 | |
*** sapd1 has quit IRC | 00:37 | |
*** zzzeek has quit IRC | 01:16 | |
*** zzzeek has joined #openstack-lbaas | 01:17 | |
*** jamesdenton has quit IRC | 01:34 | |
*** jamesden_ has joined #openstack-lbaas | 01:34 | |
*** zzzeek has quit IRC | 02:52 | |
*** zzzeek has joined #openstack-lbaas | 02:55 | |
*** rcernin has quit IRC | 03:16 | |
*** psachin has joined #openstack-lbaas | 03:32 | |
*** ianychoi has quit IRC | 03:37 | |
*** rcernin has joined #openstack-lbaas | 03:37 | |
*** rcernin has quit IRC | 03:39 | |
*** rcernin has joined #openstack-lbaas | 03:39 | |
*** vishalmanchanda has joined #openstack-lbaas | 04:15 | |
*** jamesden_ has quit IRC | 04:24 | |
*** jamesdenton has joined #openstack-lbaas | 04:25 | |
*** jamesdenton has quit IRC | 05:09 | |
*** jamesdenton has joined #openstack-lbaas | 05:10 | |
*** xgerman_ has quit IRC | 06:59 | |
*** rcernin has quit IRC | 07:43 | |
*** openstackgerrit has joined #openstack-lbaas | 08:17 | |
openstackgerrit | Gregory Thiemonge proposed openstack/octavia-tempest-plugin master: Add new scenario test to create LB in specific AZ https://review.opendev.org/c/openstack/octavia-tempest-plugin/+/695349 | 08:17 |
*** zzzeek has quit IRC | 08:19 | |
*** zzzeek has joined #openstack-lbaas | 08:21 | |
*** rpittau|afk is now known as rpittau | 08:24 | |
*** rcernin has joined #openstack-lbaas | 08:28 | |
*** rcernin has quit IRC | 08:35 | |
openstackgerrit | Gregory Thiemonge proposed openstack/octavia master: Properly validate AZ-profile networks https://review.opendev.org/c/openstack/octavia/+/726494 | 08:41 |
*** zzzeek has quit IRC | 08:55 | |
*** zzzeek has joined #openstack-lbaas | 08:57 | |
*** rcernin has joined #openstack-lbaas | 09:10 | |
*** rcernin has quit IRC | 09:10 | |
*** rcernin has joined #openstack-lbaas | 09:10 | |
*** rcernin has quit IRC | 09:20 | |
*** rcernin has joined #openstack-lbaas | 09:29 | |
*** rcernin has quit IRC | 10:01 | |
*** rcernin has joined #openstack-lbaas | 10:04 | |
*** rcernin has quit IRC | 10:24 | |
*** rcernin has joined #openstack-lbaas | 10:48 | |
openstackgerrit | Gregory Thiemonge proposed openstack/octavia-tempest-plugin master: Fix two-node job configuration https://review.opendev.org/c/openstack/octavia-tempest-plugin/+/773888 | 10:56 |
*** priteau has joined #openstack-lbaas | 11:38 | |
*** rcernin has quit IRC | 11:39 | |
*** rcernin has joined #openstack-lbaas | 11:41 | |
*** rcernin has quit IRC | 11:46 | |
*** rcernin has joined #openstack-lbaas | 12:04 | |
*** rcernin has quit IRC | 12:09 | |
*** rcernin has joined #openstack-lbaas | 12:41 | |
*** rcernin has quit IRC | 12:46 | |
*** rcernin has joined #openstack-lbaas | 13:14 | |
*** zzzeek has quit IRC | 13:33 | |
*** irclogbot_1 has quit IRC | 13:33 | |
*** openstackgerrit has quit IRC | 13:33 | |
*** irclogbot_2 has joined #openstack-lbaas | 13:36 | |
*** jamesdenton has quit IRC | 13:36 | |
*** jamesdenton has joined #openstack-lbaas | 13:39 | |
*** rcernin has quit IRC | 15:05 | |
*** rpittau is now known as rpittau|afk | 15:08 | |
*** sapd1 has joined #openstack-lbaas | 15:38 | |
*** psachin has quit IRC | 15:59 | |
johnsom | FYI, I have put a -1 hold on the octavia RC1 release patches that have been proposed. | 16:08 |
johnsom | This will allow time for any bug fixes we want to get into RC1. I think we should plan to have those merged by Wednesday next week. I.e. we will make the call at the weekly meeting. | 16:09 |
*** xgerman_ has joined #openstack-lbaas | 16:31 | |
*** xgerman_ is now known as xgerman | 16:32 | |
*** jamesdenton has quit IRC | 19:07 | |
*** jamesdenton has joined #openstack-lbaas | 19:08 | |
*** jamesdenton has quit IRC | 19:29 | |
*** jamesdenton has joined #openstack-lbaas | 19:29 | |
*** vishalmanchanda has quit IRC | 19:35 | |
*** xgerman has quit IRC | 20:34 | |
*** servagem has quit IRC | 21:19 | |
*** jamesdenton has quit IRC | 21:30 | |
*** jamesdenton has joined #openstack-lbaas | 21:32 | |
rm_work | hmmm looks like Terraform openstack provider doesn't support monitor_port for octavia members :( | 21:32 |
*** rcernin has joined #openstack-lbaas | 22:03 | |
*** rouk has joined #openstack-lbaas | 22:39 | |
rouk | johnsom: you around? its late on a friday, so i dont expect much. wondering if you are aware of any pool update deadlocks that might happen? | 22:41 |
rouk | got a pool with no members, stuck in pending_update, which cant be modified. | 22:42 |
rouk | which is odd because theres no errors from the amphora-agent, nor octavia in general... just... silence. | 22:43 |
rouk | sent the pool update, crickets. | 22:43 |
johnsom | Hi | 22:44 |
rouk | cant reproduce it either, other LBs work fine. | 22:44 |
johnsom | Hmm, two possibilities I can think of: | 22:44 |
rouk | bugging after hours cause i dont wanna destroy what could be a bug, but i dont have any evidence of something wrong to actually make a report, heh. | 22:45 |
johnsom | 1. Your rabbit queue lost the message. The API will set up the pool in the DB (after validation and all of that), set the object to PENDING_UPDATE, then put the message on the queue. If the workers never get the message, it could be "stuck". The solution is to use durable queues and HA with rabbit. | 22:46 |
johnsom | 2. Someone ran kill -9 on, or pulled the power on, a worker that was actively working on the provisioning request. This means the code was pulled out from under us, so we could not clean up. | 22:47 |
rouk | both of those sound... pretty unlikely. | 22:47 |
johnsom | The fix for #2 is flow resumption, which we are working on. It's working, but we are still working through bugs. | 22:47 |
rouk | the amphora-agent got calls from the worker at the same time as the request came in | 22:48 |
johnsom | Debugging, I would start by looking through the worker logs for the pool ID. See if it ever came off the queue. | 22:48 |
johnsom | Ah, so the controller had started talking to the amphora about the pool? | 22:48 |
rouk | there is a backend for the pool id, yes. | 22:49 |
johnsom | Ok, so we can eliminate the queue issue. | 22:49 |
johnsom | So, I would find the worker that has the pool ID in the log and start looking through what the flow was doing at the time. | 22:50 |
rouk | Sending Pool bb274384-33e7-42cd-a957-6c13dd2fb321 batch member update to provider amphora | 22:50 |
johnsom | I'm happy to look at log files for you if you would like | 22:50 |
rouk | is the only thing in my logs, though. | 22:50 |
rouk | crickets after that. | 22:50 |
johnsom | Ah, so it was a batch member call that was in action. hmmm, maybe rm_work would have an idea? | 22:51 |
rouk | yeah, the pool create went through, and the backend is made, just the member batch add went silent. | 22:52 |
rouk | from what i can see, anyway. | 22:52 |
johnsom | Any chance you are willing to share the worker log from that "Sending pool batch member update" line with me? | 22:53 |
johnsom | You could PM me a private paste.openstack.org link if you are concerned about the content. | 22:54 |
rouk | i can share whatever... but that log message was from octavia-api | 22:55 |
johnsom | Oh, wrong log. So that is the message that it went on the queue. What about on the worker logs? | 22:55 |
rouk | no mention of the pool. | 22:55 |
johnsom | Hmm, do you run your workers with debug logging on? | 22:56 |
rouk | dont believe so, no. | 22:57 |
rouk | Member batch update is a noop, returning early. | 22:58 |
rouk | that happens right after. | 22:58 |
rouk | just looking at the ~10 minutes of all octavia messages around that previous log. | 22:58 |
johnsom | I would have expected this line to be in one of your worker logs: https://github.com/openstack/octavia/blob/master/octavia/controller/queue/v1/endpoints.py#L117 | 22:58 |
rouk | marks lb active, 27 seconds later, sends pool update, then no-op | 22:59 |
rouk | would the no-op exit before that? | 22:59 |
johnsom | This message? https://github.com/openstack/octavia/blob/master/octavia/api/drivers/noop_driver/driver.py#L153 | 22:59 |
rouk | im not sure what a no-op would mean, sounds... redundant? | 22:59 |
johnsom | Ah, I see the bug | 23:00 |
rouk | "Member batch update is a noop, returning early." doesnt match that syntax, no. | 23:00 |
*** jamesdenton has quit IRC | 23:00 | |
johnsom | https://opendev.org/openstack/octavia/src/branch/master/octavia/api/drivers/amphora_driver/v1/driver.py#L297 | 23:00 |
rouk | grepping the code i dont see the message, hard to grep code without the context, heh. | 23:01 |
rouk | so... an update of no contents will deadlock? | 23:01 |
johnsom | So, somehow it decided that there were no changes to apply so it just returned. However, it didn't unlock the record back to ACTIVE. | 23:01 |
johnsom | I wonder how it got a batch update with no changes???? hmmm | 23:01 |
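To illustrate the failure mode johnsom describes, here is a minimal toy sketch in Python. It is not the Octavia code: the `Pool` class, `send_to_worker`, and both `batch_update_*` functions are hypothetical names made up for illustration. It only assumes what the conversation states, namely that the API marks the record PENDING_UPDATE before dispatching, and that the no-op path returned without setting it back to ACTIVE.

```python
from dataclasses import dataclass, field

ACTIVE = "ACTIVE"
PENDING_UPDATE = "PENDING_UPDATE"


@dataclass
class Pool:
    # Hypothetical in-memory stand-in for a pool record in the DB.
    members: set = field(default_factory=set)
    provisioning_status: str = ACTIVE


def send_to_worker(pool, desired):
    # Stand-in for the queue plus worker: apply the change, then unlock.
    pool.members = set(desired)
    pool.provisioning_status = ACTIVE


def batch_update_buggy(pool, desired):
    pool.provisioning_status = PENDING_UPDATE  # record is locked first
    if set(desired) == pool.members:
        # No changes to apply: returns early, record stays PENDING_UPDATE.
        return
    send_to_worker(pool, desired)


def batch_update_fixed(pool, desired):
    pool.provisioning_status = PENDING_UPDATE
    if set(desired) == pool.members:
        # No changes to apply: unlock the record before returning.
        pool.provisioning_status = ACTIVE
        return
    send_to_worker(pool, desired)
```

Calling `batch_update_buggy(Pool(), [])` leaves the toy pool in PENDING_UPDATE with nothing left to move it back, which matches the "silence" rouk observed; the fixed variant restores ACTIVE on the same input.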
*** jamesdenton has joined #openstack-lbaas | 23:02 | |
johnsom | I wish rm_work was around, he knows the batch code better than I do. | 23:02 |
johnsom | So, your fix is to update the LB records in the DB back to ACTIVE. | 23:03 |
rouk | yeah, i can do that, now that ive shown you it. | 23:03 |
johnsom | Yeah, I'm going to open a story (bug) for this | 23:03 |
rouk | our staff keep using random things like terraform and gophercloud to do things... so its pulling teeth to get api calls, heh. | 23:05 |
rouk | trying to see if i can get the exact call they did. | 23:05 |
johnsom | Yeah, that can be "fun" | 23:06 |
rouk | bane of my existence. "it doesnt work" okay what error did you get "idk, it just says failed", okay what request id? "it doesnt say", okay what did you send? "idk, terraform generated it" | 23:07 |
johnsom | Yeah, the "terraform said failed", "why?", "Don't know" is one I'm familiar with.... | 23:08 |
johnsom | https://storyboard.openstack.org/#!/story/2008731 | 23:08 |
johnsom | That is a nasty bug. We should get that fixed as soon as possible and try to get it in the Wallaby release. | 23:09 |
johnsom | terraform probably just sent the batch update multiple times or something. "just to confirm it" lol | 23:09 |
rouk | (and backport it to U/V, heh) | 23:10 |
johnsom | Right | 23:10 |
rouk | at least its the first time ive seen it, so far. | 23:10 |
rouk | and we have 150 LBs | 23:10 |
johnsom | It should be pretty straightforward to fix | 23:10 |
johnsom | Yeah, that is an old feature, so you are lucky! | 23:10 |
rouk | old feature? i just see a PUT to pools, should i be telling them to use a better path? let me read the api spec... | 23:11 |
johnsom | I just meant it has been in the code for a long time | 23:12 |
johnsom | That bug has been there since queens | 23:13 |
rouk | multiple submits sounds like terraform. | 23:13 |
johnsom | So over three years | 23:13 |
rouk | http://paste.openstack.org/show/BxwfqZS3YAtUxNVAg5rX/ | 23:14 |
rouk | heres the exact script they ran, not the api calls... but this is as much as ill get. | 23:14 |
johnsom | rouk Thanks for reporting it and getting us the information we need to fix it. | 23:18 |
rouk | i edited the paste http://paste.openstack.org/show/IpJIIKaNTBKtzMDvI666/ they missed a line, this... ExtractErr might be calling it again. | 23:20 |
rouk | so yeah, probably a double call. | 23:20 |
rouk | but alright, have a good weekend, thanks for the help | 23:21 |
johnsom | Sure, NP | 23:21 |
*** tkajinam has quit IRC | 23:27 | |
*** jamesdenton has quit IRC | 23:30 | |
*** jamesdenton has joined #openstack-lbaas | 23:31 | |
rm_work | yeah, off | 23:40 |
rm_work | *oof | 23:40 |
*** stand has quit IRC | 23:47 |