colin- | i'm getting that feedback from the driver even on IPs i believe are not in use and that don't have valid ARP responses, any idea why that might be? false positive maybe? | 00:29 |
---|---|---|
johnsom | Not likely. Can you do a "openstack port list" and search for the IP? | 00:30 |
rm_work | johnsom: i am officially nunned | 00:35 |
johnsom | Well, I was the one that did it, so you should be throwing nun my way | 00:36 |
rm_work | did I escape nun-shaming for https://review.openstack.org/634302 immediately following agreeing that there was no time to deal with it before Stein? :P | 00:36 |
rm_work | ah | 00:36 |
rm_work | i just figured it was me :P | 00:36 |
johnsom | Sigh | 00:37 |
rm_work | it was only an hour or two distraction while i was building devstacks anyway <_< | 00:37 |
rm_work | i had just forgotten | 00:37 |
johnsom | colin- FYI, that message is coming straight from neutron here: https://github.com/openstack/octavia/blob/master/octavia/network/drivers/neutron/allowed_address_pairs.py#L414 | 00:37 |
johnsom | Ok, I have decided I need to override whole methods in openstacksdk to make failover work. Joy. Fun for tomorrow. | 00:39 |
johnsom | It was kindly silently passing the failover test even though it didn't even call the API | 00:39 |
johnsom | All because it's hard coded to always expect a header or body change in a PUT call. If nothing changed, return success and don't bother making the call! | 00:40 |
johnsom | lol | 00:40 |
johnsom | I need to stop early today, off to see W. Kamau Bell. He is in town tonight. | 00:42 |
xgerman | have fun! | 00:58 |
rm_work | wat | 00:59 |
rm_work | that's .... wat | 00:59 |
rm_work | ok. whelp | 00:59 |
colin- | thanks johnsom that will help figure out what's going on | 01:16 |
colin- | going to pause here for the day, appreciate the quick replies | 01:16 |
*** Swami has quit IRC | 01:32 | |
*** Dinesh_Bhor has joined #openstack-lbaas | 02:14 | |
*** hongbin has joined #openstack-lbaas | 02:29 | |
*** sapd1 has joined #openstack-lbaas | 02:34 | |
*** psachin has joined #openstack-lbaas | 02:45 | |
*** sapd1 has quit IRC | 02:50 | |
*** dims has quit IRC | 02:53 | |
*** ramishra has joined #openstack-lbaas | 03:57 | |
*** Dinesh_Bhor has quit IRC | 03:58 | |
*** Dinesh_Bhor has joined #openstack-lbaas | 04:08 | |
*** hongbin has quit IRC | 05:24 | |
*** numans has joined #openstack-lbaas | 05:43 | |
*** AlexStaf has joined #openstack-lbaas | 05:58 | |
*** AlexStaf has quit IRC | 06:14 | |
*** ramishra has quit IRC | 06:48 | |
*** psachin has quit IRC | 07:04 | |
*** pcaruana has joined #openstack-lbaas | 07:19 | |
*** ramishra has joined #openstack-lbaas | 07:23 | |
*** psachin has joined #openstack-lbaas | 07:33 | |
*** rpittau has joined #openstack-lbaas | 07:59 | |
*** yamamoto has quit IRC | 08:00 | |
*** yamamoto has joined #openstack-lbaas | 08:01 | |
openstackgerrit | Carlos Goncalves proposed openstack/octavia-tempest-plugin master: Add octavia-v2-dsvm-scenario-fedora-latest job https://review.openstack.org/600381 | 08:06 |
*** ccamposr has joined #openstack-lbaas | 08:21 | |
*** mkuf_ has quit IRC | 08:48 | |
*** rcernin has joined #openstack-lbaas | 08:57 | |
openstackgerrit | Reedip proposed openstack/octavia-tempest-plugin master: Modify Member tests for Provider Drivers https://review.openstack.org/598476 | 08:58 |
*** salmankhan has joined #openstack-lbaas | 09:14 | |
*** mkuf_ has joined #openstack-lbaas | 09:17 | |
*** yamamoto has quit IRC | 09:18 | |
*** salmankhan1 has joined #openstack-lbaas | 09:26 | |
*** salmankhan has quit IRC | 09:26 | |
*** salmankhan1 is now known as salmankhan | 09:26 | |
*** pcaruana has quit IRC | 09:30 | |
*** salmankhan has quit IRC | 09:34 | |
*** salmankhan has joined #openstack-lbaas | 09:37 | |
*** pcaruana has joined #openstack-lbaas | 09:42 | |
*** mkuf_ has quit IRC | 09:48 | |
*** mkuf_ has joined #openstack-lbaas | 09:58 | |
*** Dinesh_Bhor has quit IRC | 10:01 | |
*** Dinesh_Bhor has joined #openstack-lbaas | 10:06 | |
*** yamamoto has joined #openstack-lbaas | 10:41 | |
*** yamamoto has quit IRC | 10:51 | |
openstackgerrit | Merged openstack/octavia-tempest-plugin master: Adds flavor profile API tests https://review.openstack.org/630411 | 10:51 |
openstackgerrit | Merged openstack/octavia-tempest-plugin master: Adds flavor API tests https://review.openstack.org/630804 | 10:51 |
*** Dinesh_Bhor has quit IRC | 11:01 | |
*** rcernin has quit IRC | 11:12 | |
*** pcaruana has quit IRC | 12:05 | |
*** pcaruana has joined #openstack-lbaas | 12:19 | |
*** psachin has quit IRC | 12:21 | |
*** pcaruana|afk| has joined #openstack-lbaas | 12:25 | |
*** pcaruana has quit IRC | 12:26 | |
*** pcaruana|afk| is now known as pcaruana | 12:27 | |
openstackgerrit | Merged openstack/octavia-tempest-plugin master: Modify Member tests for Provider Drivers https://review.openstack.org/598476 | 13:01 |
*** trown|outtypewww is now known as trown | 13:03 | |
*** pcaruana has quit IRC | 13:13 | |
*** yamamoto has joined #openstack-lbaas | 13:19 | |
*** pcaruana has joined #openstack-lbaas | 13:20 | |
*** dims has joined #openstack-lbaas | 13:41 | |
*** celebdor has joined #openstack-lbaas | 13:43 | |
*** dims has quit IRC | 14:38 | |
*** dims has joined #openstack-lbaas | 14:44 | |
*** dims has quit IRC | 15:01 | |
tomtom001 | johnsom, xgerman: thanks so much for your help, I've just about got Octavia ACTIVE_STANDBY working 100%!! | 15:05 |
tomtom001 | One question is that the /0.5/info url keeps timing out when it re-builds after a failover: http://paste.openstack.org/show/744395/ I think it is not waiting long enough but I cannot find the setting to make it wait for the new instance to be up. Can you tell me which setting that is? | 15:06 |
*** sapd1 has joined #openstack-lbaas | 15:09 | |
*** dims has joined #openstack-lbaas | 15:14 | |
*** dims has quit IRC | 15:19 | |
*** dims has joined #openstack-lbaas | 15:20 | |
*** yamamoto has quit IRC | 15:23 | |
*** yamamoto has joined #openstack-lbaas | 15:23 | |
*** yamamoto has quit IRC | 15:23 | |
openstackgerrit | Corey Bryant proposed openstack/octavia master: Add missing import octavia/opts.py https://review.openstack.org/634441 | 15:51 |
*** sapd1 has quit IRC | 15:57 | |
*** pcaruana has quit IRC | 15:59 | |
johnsom | tomtom001 Do you get an ERROR log there or just those WARNING messages? | 16:04 |
johnsom | Those warnings are normal while we wait for nova to finish booting the service VM. | 16:04 |
tomtom001 | johnsom: yeah and eventually I get this(full traceback): http://paste.openstack.org/show/744401/ | 16:11 |
johnsom | tomtom001 Ok, so that means the amphora image is bad. There should be a log message with the explanation of when there was an error inside the amphora. | 16:12 |
tomtom001 | johnsom: wow, it's the same image I'm using for all the LB's, they work without error. It's only on failover that I see this. | 16:17 |
tomtom001 | I will check the amphora image log though | 16:17 |
tomtom001 | johnsom: so you're saying look at amphora.log inside amphora? | 16:18 |
tomtom001 | just to be clear | 16:18 |
johnsom | Ok, so that is interesting actually. Any chance you can paste 20 lines before and 20 lines after that traceback? | 16:19 |
johnsom | Well, the controller should capture a "why" log message for that internal error. Sometimes it doesn't and then we would need to look inside the amphora.log and the syslog/messages log inside the amphora. | 16:20 |
tomtom001 | Yes let me get them for you. | 16:21 |
*** yamamoto has joined #openstack-lbaas | 16:26 | |
*** yamamoto has quit IRC | 16:34 | |
tomtom001 | johnsom: ok, here is the octavia-worker log: | 16:37 |
tomtom001 | |__Atom 'octavia-failover-amphora-flow-octavia-create-amp-for-lb-subflow-octavia-create-amphora-indb' {'intention': 'EXECUTE', 'state': 'SUCCESS', 'requires': {}, 'provides': u'be18776e-9b14-49dd-8119-c2f0c538dfd3'} | 16:37 |
tomtom001 | |__Flow 'octavia-failover-amphora-flow-octavia-create-amp-for-lb-subflow' | 16:37 |
tomtom001 | |__Atom 'octavia-failover-amphora-flow-octavia-get-amphora-for-lb-subflow-octavia-mapload-balancer-to-amphora' {'intention': 'EXECUTE', 'state': 'SUCCESS', 'requires': {'server_group_id': u'e0af734a-5f76-4a87-a476-f4d1a1959b35', 'loadbalancer_id': u'8e92efeb-e2a3-4a1c-b61a-2b47f67de93b'}, 'provides': None} | 16:37 |
tomtom001 | |__Flow 'octavia-failover-amphora-flow-octavia-get-amphora-for-lb-subflow' | 16:37 |
tomtom001 | |__Atom 'octavia.controller.worker.tasks.database_tasks.GetAmphoraDetails' {'intention': 'EXECUTE', 'state': 'SUCCESS', 'requires': {'amphora': <octavia.common.data_models.Amphora object at 0x7f05cb143490>}, 'provides': <octavia.common.data_models.Amphora object at 0x7f05c85f1a90>} | 16:38 |
tomtom001 | |__Atom 'octavia.controller.worker.tasks.database_tasks.MarkAmphoraDeletedInDB' {'intention': 'EXECUTE', 'state': 'SUCCESS', 'requires': {'amphora': <octavia.common.data_models.Amphora object at 0x7f05cb143490>}, 'provides': None} | 16:38 |
tomtom001 | |__Atom 'octavia.controller.worker.tasks.network_tasks.WaitForPortDetach' {'intention': 'EXECUTE', 'state': 'SUCCESS', 'requires': {'amphora': <octavia.common.data_models.Amphora object at 0x7f05cb143490>}, 'provides': None} | 16:38 |
tomtom001 | |__Atom 'octavia.controller.worker.tasks.compute_tasks.ComputeDelete' {'intention': 'EXECUTE', 'state': 'SUCCESS', 'requires': {'amphora': <octavia.common.data_models.Amphora object at 0x7f05cb143490>}, 'provides': None} | 16:38 |
tomtom001 | |__Atom 'octavia.controller.worker.tasks.database_tasks.MarkAmphoraHealthBusy' {'intention': 'EXECUTE', 'state': 'SUCCESS', 'requires': {'amphora': <octavia.common.data_models.Amphora object at 0x7f05cb143490>}, 'provides': None} | 16:38 |
tomtom001 | |__Atom 'octavia.controller.worker.tasks.database_tasks.MarkAmphoraPendingDeleteInDB' {'intention': 'EXECUTE', 'state': 'SUCCESS', 'requires': {'amphora': <octavia.common.data_models.Amphora object at 0x7f05cb143490>}, 'provides': None} | 16:38 |
johnsom | Yeah, we don't need the tree, it's actually before that | 16:38 |
tomtom001 | |__Atom 'octavia.controller.worker.tasks.lifecycle_tasks.AmphoraToErrorOnRevertTask' {'intention': 'EXECUTE', 'state': 'SUCCESS', 'requires': {'amphora': <octavia.common.data_models.Amphora object at 0x7f05cb143490>}, 'provides': None} | 16:38 |
tomtom001 | |__Flow 'octavia-failover-amphora-flow': InternalServerError: Internal Server Error | 16:38 |
tomtom001 | 2019-02-01 16:34:00.973 30045 ERROR octavia.controller.wor | 16:38 |
tomtom001 | ok | 16:38 |
tomtom001 | sorry, copy paste didn't work | 16:38 |
tomtom001 | johnsom: octavia-worker.log: http://paste.openstack.org/raw/744403/ | 16:38 |
tomtom001 | got it that time. | 16:38 |
tomtom001 | johnsom: Ok, here is right before the tree started: http://paste.openstack.org/show/744408/ | 16:40 |
johnsom | The line I am looking for should read: ERROR 'Amphora agent returned unexpected result code %s with response %s' | 16:41 |
johnsom | Lol, ok. Yeah, it didn't get it. "due to: Internal Server Error: InternalServerError: Internal Server Error" | 16:41 |
johnsom | Not helpful. sigh, we need to work on that more. | 16:42 |
johnsom | Ok, so finding the root cause is going to mean saving the amphora, going into /var/log/syslog (/var/log/messages on Red Hat style systems), and searching for amphora-agent | 16:42 |
tomtom001 | oh ok, yeah I just grepped for it and the phrase you're looking for doesn't exist | 16:42 |
johnsom | Yeah, the controller didn't capture the true error, just the repeated internal error string | 16:43 |
tomtom001 | ok, yeah I'll go onto the amphora instance and look for the error there. | 16:43 |
johnsom | Now, cleanup may automatically delete the amphora that had the problem. If that is the case, I can help you with the configuration change that would allow it to remain | 16:43 |
tomtom001 | ok, yeah it does just loop endlessly, what config change can I make? | 16:44 |
johnsom | Ok, this is for testing only. It disables ALL of the error handling and cleanup. So don't set this on production, etc. | 16:44 |
tomtom001 | agreed will not implement in production | 16:45 |
johnsom | Believe it or not, we did have an operator do it in production.... So, I try to be very explicit about it. | 16:45 |
tomtom001 | thank you, it can happen. | 16:46 |
johnsom | Change this setting to True: https://docs.openstack.org/octavia/latest/configuration/configref.html#task_flow.disable_revert | 16:46 |
johnsom | Then restart your controller that is getting the "internal error" message. CW if you are manually failing over, HM if it is automatic | 16:46 |
johnsom | Everything will run the same, but when it hits the error it will not attempt to repair or clean up the resources. | 16:47 |
openstackgerrit | Merged openstack/octavia-tempest-plugin master: Adds provider API tests https://review.openstack.org/631105 | 16:56 |
*** ramishra has quit IRC | 16:56 | |
johnsom | Yay, more tests merging! | 16:57 |
tomtom001 | johnsom: Here is the syslog from amphora: http://paste.openstack.org/show/744409/ | 17:01 |
tomtom001 | johnsom: here is the haproxy.log from amphora: http://paste.openstack.org/show/744410/ | 17:02 |
johnsom | Hmmm, that was fixed a long time ago.... Let me dig for that fix | 17:02 |
*** salmankhan1 has joined #openstack-lbaas | 17:04 | |
tomtom001 | is this an amphora image bug, or one in octavia? | 17:04 |
johnsom | Amphora image | 17:04 |
*** salmankhan has quit IRC | 17:04 | |
*** salmankhan1 is now known as salmankhan | 17:04 | |
tomtom001 | ok, thanks | 17:05 |
johnsom | tomtom001 It was this bug: https://review.openstack.org/#/c/577238/ | 17:05 |
johnsom | It was fixed in Rocky and backported to queens here: https://review.openstack.org/578014 | 17:06 |
tomtom001 | ok, thank you | 17:06 |
colin- | johnsom: worked out the explicit VIP address, thanks for the help | 17:08 |
johnsom | Ok cool. Something had it in use you didn't know about? | 17:08 |
colin- | yes :) | 17:11 |
*** rpittau has quit IRC | 17:20 | |
*** trown is now known as trown|lunch | 17:41 | |
*** salmankhan has quit IRC | 17:48 | |
*** salmankhan has joined #openstack-lbaas | 17:53 | |
*** gcheresh has joined #openstack-lbaas | 18:27 | |
*** salmankhan has quit IRC | 18:28 | |
*** ccamposr has quit IRC | 18:46 | |
tomtom001 | johnsom: I have applied the patch and now everything is functioning perfectly in my octavia deployment. Thank you for your assistance!! | 18:47 |
johnsom | Sure, NP. Glad you are up and running | 18:47 |
*** trown|lunch is now known as trown | 18:58 | |
*** gcheresh has quit IRC | 19:40 | |
*** salmankhan has joined #openstack-lbaas | 19:49 | |
openstackgerrit | Merged openstack/octavia master: Add missing import octavia/opts.py https://review.openstack.org/634441 | 20:41 |
colin- | is there an easy way to get detail about a listener in an ERROR provisioning state? | 21:10 |
colin- | so far i'm going only on: ERROR | 21:11 |
colin- | Provisioning failed | 21:11 |
johnsom | Yeah, the additional details are in the controller worker or health manager log file | 21:12 |
colin- | got it | 21:12 |
*** celebdor has quit IRC | 21:32 | |
*** salmankhan has quit IRC | 21:36 | |
*** trown is now known as trown|outtypewww | 21:42 | |
cgoncalves | johnsom, do you have any flavor patch open that touches the spare pool area? running master, HK can't create amps | 22:28 |
cgoncalves | 'revert' method on 'octavia.controller.worker.tasks.compute_tasks.CertComputeCreate==1.0' requires ['flavor'] but no other entity produces said requirements | 22:28 |
cgoncalves | MissingDependencies: 'execute' method on 'octavia.controller.worker.tasks.compute_tasks.CertComputeCreate==1.0' requires ['flavor'] but no other entity produces said requirements | 22:28 |
johnsom | Yes I do | 22:29 |
cgoncalves | oh, I see it | 22:29 |
cgoncalves | https://review.openstack.org/#/c/632594/ | 22:29 |
johnsom | Yep, that is the one | 22:29 |
cgoncalves | thanks. I'll just rollback to a pre-flavor hash and continue for now | 22:33 |
johnsom | Or you can grab the top of the chain and test the last two patches.... grin | 22:35 |
johnsom | https://review.openstack.org/#/c/632842 | 22:35 |
cgoncalves | well, what I'd like to do would end up helping validate your spare pool patch | 22:37 |
cgoncalves | how receptive are folks with another new tempest job? :) | 22:38 |
johnsom | Yeah... spares, are .... ummm... | 22:38 |
johnsom | I think it might have to be a multi-node as I don't think we can have enough amps booted on a single node job. | 22:39 |
johnsom | I still need to figure out why the ipv6 tests randomly fail on the multi-node tests. My guess is there is a neutron issue with ipv6, but haven't run it to ground yet | 22:40 |
*** KeithMnemonic has quit IRC | 22:40 | |
cgoncalves | 2 amps should do it: 1 associated to a LB, 1 in spare pool. test would be verifying the one in spare pool takes over when active is failed over | 22:40 |
johnsom | I was thinking it might peak at 3 | 22:43 |
cgoncalves | correct, it might. we can do multi-node if we have to | 22:49 |
*** takamatsu has quit IRC | 23:26 |
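
The client behaviour johnsom describes at 00:39-00:40 (an update that reports success without sending anything when no header or body field has changed, so a body-less failover PUT never reaches the API) can be illustrated with a toy model. This is a minimal sketch and not openstacksdk's actual code: `FakeSession`, `ToyResource`, and the `force` argument are hypothetical names invented for the illustration, and the URL is only a placeholder.

```python
class FakeSession:
    """Hypothetical stand-in for an HTTP session; it only records PUT calls."""

    def __init__(self):
        self.calls = []

    def put(self, url, json=None):
        self.calls.append((url, json))
        return 202


class ToyResource:
    """Toy resource with dirty-field tracking (illustrative, not the real SDK)."""

    def __init__(self, url):
        self.url = url
        self._dirty = {}

    def set(self, key, value):
        self._dirty[key] = value

    def commit(self, session, force=False):
        # The pattern described in the log: nothing changed, so return
        # without ever contacting the API.
        if not self._dirty and not force:
            return None
        status = session.put(self.url, json=dict(self._dirty))
        self._dirty.clear()
        return status


session = FakeSession()
failover = ToyResource("/v2/lbaas/loadbalancers/LB_ID/failover")  # placeholder path

skipped = failover.commit(session)              # no dirty fields: no request sent
forced = failover.commit(session, force=True)   # body-less PUT is actually sent
```

With this shape, a test that only checks for "no exception" passes even though `session.calls` stays empty, which matches the silently passing failover test mentioned in the log. Overriding the method (here modelled by `force=True`) is one way to make the call go out.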
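
The log search from 16:41-16:43 (looking in the controller log for the amphora agent's "unexpected result code" message, and finding only the repeated internal error string) can be sketched as a small filter. This is a sketch under assumptions: the message template is the one johnsom quotes, while `find_agent_errors` and the sample lines are made up for the example and are not real Octavia output.

```python
import re

# Template quoted in the log:
# 'Amphora agent returned unexpected result code %s with response %s'
AGENT_ERROR = re.compile(
    r"Amphora agent returned unexpected result code (\S+) with response (.*)"
)


def find_agent_errors(lines):
    """Return (result_code, response) pairs for every matching log line."""
    hits = []
    for line in lines:
        match = AGENT_ERROR.search(line)
        if match:
            hits.append((match.group(1), match.group(2).strip()))
    return hits


# Invented sample lines, for illustration only.
sample = [
    "ERROR due to: Internal Server Error: InternalServerError: Internal Server Error",
    "ERROR Amphora agent returned unexpected result code 500 with response example failure",
]

errors = find_agent_errors(sample)
```

An empty result corresponds to the situation in the log, where the controller did not capture the underlying error and the amphora's own syslog had to be inspected instead.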