Friday, 2021-03-19

*** vishalmanchanda has quit IRC00:09
*** sapd1 has quit IRC00:37
*** zzzeek has quit IRC01:16
*** zzzeek has joined #openstack-lbaas01:17
*** jamesdenton has quit IRC01:34
*** jamesden_ has joined #openstack-lbaas01:34
*** zzzeek has quit IRC02:52
*** zzzeek has joined #openstack-lbaas02:55
*** rcernin has quit IRC03:16
*** psachin has joined #openstack-lbaas03:32
*** ianychoi has quit IRC03:37
*** rcernin has joined #openstack-lbaas03:37
*** rcernin has quit IRC03:39
*** rcernin has joined #openstack-lbaas03:39
*** vishalmanchanda has joined #openstack-lbaas04:15
*** jamesden_ has quit IRC04:24
*** jamesdenton has joined #openstack-lbaas04:25
*** jamesdenton has quit IRC05:09
*** jamesdenton has joined #openstack-lbaas05:10
*** xgerman_ has quit IRC06:59
*** rcernin has quit IRC07:43
*** openstackgerrit has joined #openstack-lbaas08:17
openstackgerritGregory Thiemonge proposed openstack/octavia-tempest-plugin master: Add new scenario test to create LB in specific AZ  https://review.opendev.org/c/openstack/octavia-tempest-plugin/+/69534908:17
*** zzzeek has quit IRC08:19
*** zzzeek has joined #openstack-lbaas08:21
*** rpittau|afk is now known as rpittau08:24
*** rcernin has joined #openstack-lbaas08:28
*** rcernin has quit IRC08:35
openstackgerritGregory Thiemonge proposed openstack/octavia master: Properly validate AZ-profile networks  https://review.opendev.org/c/openstack/octavia/+/72649408:41
*** zzzeek has quit IRC08:55
*** zzzeek has joined #openstack-lbaas08:57
*** rcernin has joined #openstack-lbaas09:10
*** rcernin has quit IRC09:10
*** rcernin has joined #openstack-lbaas09:10
*** rcernin has quit IRC09:20
*** rcernin has joined #openstack-lbaas09:29
*** rcernin has quit IRC10:01
*** rcernin has joined #openstack-lbaas10:04
*** rcernin has quit IRC10:24
*** rcernin has joined #openstack-lbaas10:48
openstackgerritGregory Thiemonge proposed openstack/octavia-tempest-plugin master: Fix two-node job configuration  https://review.opendev.org/c/openstack/octavia-tempest-plugin/+/77388810:56
*** priteau has joined #openstack-lbaas11:38
*** rcernin has quit IRC11:39
*** rcernin has joined #openstack-lbaas11:41
*** rcernin has quit IRC11:46
*** rcernin has joined #openstack-lbaas12:04
*** rcernin has quit IRC12:09
*** rcernin has joined #openstack-lbaas12:41
*** rcernin has quit IRC12:46
*** rcernin has joined #openstack-lbaas13:14
*** zzzeek has quit IRC13:33
*** irclogbot_1 has quit IRC13:33
*** openstackgerrit has quit IRC13:33
*** irclogbot_2 has joined #openstack-lbaas13:36
*** jamesdenton has quit IRC13:36
*** jamesdenton has joined #openstack-lbaas13:39
*** rcernin has quit IRC15:05
*** rpittau is now known as rpittau|afk15:08
*** sapd1 has joined #openstack-lbaas15:38
*** psachin has quit IRC15:59
johnsomFYI, I have put a -1 hold on the octavia RC1 release patches that have been proposed.16:08
johnsomThis will allow time for any bug fixes we want to get into RC1. I think we should plan to have those merged by Wednesday next week. I.e. we will make the call at the weekly meeting.16:09
*** xgerman_ has joined #openstack-lbaas16:31
*** xgerman_ is now known as xgerman16:32
*** jamesdenton has quit IRC19:07
*** jamesdenton has joined #openstack-lbaas19:08
*** jamesdenton has quit IRC19:29
*** jamesdenton has joined #openstack-lbaas19:29
*** vishalmanchanda has quit IRC19:35
*** xgerman has quit IRC20:34
*** servagem has quit IRC21:19
*** jamesdenton has quit IRC21:30
*** jamesdenton has joined #openstack-lbaas21:32
rm_workhmmm looks like Terraform openstack provider doesn't support monitor_port for octavia members :(21:32
*** rcernin has joined #openstack-lbaas22:03
*** rouk has joined #openstack-lbaas22:39
roukjohnsom: you around? its late on a friday, so i dont expect much. wondering if you are aware of any pool update deadlocks that might happen?22:41
roukgot a pool with no members, stuck in pending_update, which cant be modified.22:42
roukwhich is odd because theres no errors from the amphora-agent, nor octavia in general... just... silence.22:43
rouksent the pool update, crickets.22:43
johnsomHi22:44
roukcant reproduce it either, other LBs work fine.22:44
johnsomHmm, two possibilities I can think of:22:44
roukbugging after hours cause i dont wanna destroy what could be a bug, but i dont have any evidence of something wrong to actually make a report, heh.22:45
johnsom1. Your rabbit queue lost the message. The API will set up the pool in the DB (after validation and all of that), set the object in PENDING_UPDATE, then put the message on the queue. If the workers don't ever get the message, it could be "stuck". The solution is to use durable queues and HA with rabbit.22:46
johnsom2. Someone ran kill -9 on, or pulled the power on, a worker that was actively working on the provisioning request. This means the code was pulled from under us, so we could not clean up.22:47
roukboth of those sound... pretty unlikely.22:47
johnsomFix on #2 is flow resumption, which we are working on. It's working, but we are working through bugs.22:47
roukthe amphora-agent got calls from the worker at the same time as the request came in22:48
johnsomDebugging, I would start by looking through the worker logs for the pool ID. See if it ever came off the queue.22:48
johnsomAh, so the controller had started talking to the amphora about the pool?22:48
roukthere is a backend for the pool id, yes.22:49
johnsomOk, so we can eliminate the queue issue.22:49
johnsomSo, I would find the worker that has the pool ID in the log and start looking through what the flow was doing at the time.22:50
roukSending Pool bb274384-33e7-42cd-a957-6c13dd2fb321 batch member update to provider amphora22:50
johnsomI'm happy to look at log files for you if you would like22:50
roukis the only thing in my logs, though.22:50
roukcrickets after that.22:50
johnsomAh, so it was a batch member call that was in action. hmmm, maybe rm_work would have an idea?22:51
roukyeah, the pool create went through, and the backend is made, just the member batch add went silent.22:52
roukfrom what i can see, anyway.22:52
johnsomAny chance you are willing to share the worker log from that "Sending pool batch member update" line with me?22:53
johnsomYou could PM me a private paste.openstack.org link if you are concerned about the content.22:54
rouki can share whatever... but that log message was from octavia-api22:55
johnsomOh, wrong log. So that is the message that it went on the queue. What about on the worker logs?22:55
roukno mention of the pool.22:55
johnsomHmm, do you run your workers with debug logging on?22:56
roukdont believe so, no.22:57
roukMember batch update is a noop, returning early.22:58
roukthat happens right after.22:58
roukjust looking at the ~10 minutes of all octavia messages around that previous log.22:58
johnsomI would have expected this line to be in one of your worker logs: https://github.com/openstack/octavia/blob/master/octavia/controller/queue/v1/endpoints.py#L11722:58
roukmarks lb active, 27 seconds later, sends pool update, then no-op22:59
roukwould the no-op exit before that?22:59
johnsomThis message? https://github.com/openstack/octavia/blob/master/octavia/api/drivers/noop_driver/driver.py#L15322:59
roukim not sure what a no-op would mean, sounds... redundant?22:59
johnsomAh, I see the bug23:00
rouk"Member batch update is a noop, returning early." doesnt match that syntax, no.23:00
*** jamesdenton has quit IRC23:00
johnsomhttps://opendev.org/openstack/octavia/src/branch/master/octavia/api/drivers/amphora_driver/v1/driver.py#L29723:00
roukgrepping the code i dont see the message, hard to grep code without the context, heh.23:01
roukso... an update of no contents will deadlock?23:01
johnsomSo, somehow it decided that there were no changes to apply so it just returned. However, it didn't unlock the record back to ACTIVE.23:01
johnsomI wonder how it got a batch update with no changes????  hmmm23:01
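[Editor's sketch] The failure mode johnsom describes above (an early return on a no-op batch update that never unlocks the record) can be illustrated with a toy model. This is only a sketch under assumptions: the `Pool` class and the `enqueue_for_worker`, `batch_update_buggy`, and `batch_update_fixed` functions are hypothetical names invented for illustration and are not Octavia's actual code or the eventual fix.

```python
from dataclasses import dataclass, field

ACTIVE = "ACTIVE"
PENDING_UPDATE = "PENDING_UPDATE"


@dataclass
class Pool:
    """Hypothetical stand-in for a pool record; not Octavia's real model."""
    pool_id: str
    members: set = field(default_factory=set)
    provisioning_status: str = ACTIVE


def enqueue_for_worker(pool, new_members):
    """Pretend worker: applies the change, then unlocks the record."""
    pool.members = set(new_members)
    pool.provisioning_status = ACTIVE


def batch_update_buggy(pool, new_members):
    # The record is locked before the change is handed off.
    pool.provisioning_status = PENDING_UPDATE
    if set(new_members) == pool.members:
        # Nothing to change: returns early, but the record stays locked.
        return
    enqueue_for_worker(pool, new_members)


def batch_update_fixed(pool, new_members):
    pool.provisioning_status = PENDING_UPDATE
    if set(new_members) == pool.members:
        # Nothing to change: unlock the record before returning.
        pool.provisioning_status = ACTIVE
        return
    enqueue_for_worker(pool, new_members)


stuck = Pool("pool-a")
batch_update_buggy(stuck, [])      # empty pool, empty update: a no-op
print(stuck.provisioning_status)   # PENDING_UPDATE

ok = Pool("pool-b")
batch_update_fixed(ok, [])
print(ok.provisioning_status)      # ACTIVE
```

In this toy model the only difference between the two functions is the status reset on the no-op path, which matches the symptom rouk reported: a pool with no members left in PENDING_UPDATE with no errors logged anywhere.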
*** jamesdenton has joined #openstack-lbaas23:02
johnsomI wish rm_work was around, he knows the batch code better than I do.23:02
johnsomSo, your fix is to update the LB records in the DB back to ACTIVE.23:03
roukyeah, i can do that, now that ive shown you it.23:03
johnsomYeah, I'm going to open a story (bug) for this23:03
roukour staff keep using random things like terraform and gophercloud to do things... so its pulling teeth to get api calls, heh.23:05
rouktrying to see if i can get the exact call they did.23:05
johnsomYeah, that can be "fun"23:06
roukbane of my existence. "it doesnt work" okay what error did you get "idk, it just says failed", okay what request id? "it doesnt say", okay what did you send? "idk, terraform generated it"23:07
johnsomYeah, the "terraform said failed", "why?", "Don't know" is one I'm familiar with....23:08
johnsomhttps://storyboard.openstack.org/#!/story/200873123:08
johnsomThat is a nasty bug. We should get that fixed as soon as possible and try to get it in the Wallaby release.23:09
johnsomterraform probably just sent the batch update multiple times or something.  "just to confirm it" lol23:09
rouk(and backport it to U/V, heh)23:10
johnsomRight23:10
roukat least its the first time ive seen it, so far.23:10
roukand we have 150 LBs23:10
johnsomIt should be pretty straightforward to fix23:10
johnsomYeah, that is an old feature, so you are lucky!23:10
roukold feature? i just see a PUT to pools, should i be telling them to use a better path? let me read the api spec...23:11
johnsomI just meant it has been in the code for a long time23:12
johnsomThat bug has been there since queens23:13
roukmultiple submits sounds like terraform.23:13
johnsomSo over three years23:13
roukhttp://paste.openstack.org/show/BxwfqZS3YAtUxNVAg5rX/23:14
roukheres the exact script they ran, not the api calls... but this is as much as ill get.23:14
johnsomrouk Thanks for reporting it and getting us the information we need to fix it.23:18
rouki edited the paste http://paste.openstack.org/show/IpJIIKaNTBKtzMDvI666/ they missed a line, this... ExtractErr might be calling it again.23:20
roukso yeah, probably a double call.23:20
roukbut alright, have a good weekend, thanks for the help23:21
johnsomSure, NP23:21
*** tkajinam has quit IRC23:27
*** jamesdenton has quit IRC23:30
*** jamesdenton has joined #openstack-lbaas23:31
rm_workyeah, off23:40
rm_work*oof23:40
*** stand has quit IRC23:47
