Tuesday, 2018-08-14

*** rcernin_ has joined #openstack-lbaas00:02
*** rcernin has quit IRC00:03
*** rcernin has joined #openstack-lbaas00:29
*** rcernin has quit IRC00:29
*** rcernin has joined #openstack-lbaas00:30
abaindurBTW it would be nice if we could specify the network names/sec groups/flavors by name as we can with the ssh key and image tag00:31
*** rcernin_ has quit IRC00:32
*** longkb has joined #openstack-lbaas00:37
*** longkb has quit IRC00:57
*** korean101 has quit IRC00:57
*** korean101 has joined #openstack-lbaas01:04
*** abaindur has quit IRC01:42
*** abaindur has joined #openstack-lbaas01:42
*** abaindur has quit IRC01:45
*** abaindur has joined #openstack-lbaas01:46
*** hongbin has joined #openstack-lbaas02:01
*** longkb has joined #openstack-lbaas02:12
*** hongbin has quit IRC03:39
*** abaindur has quit IRC04:06
*** abaindur has joined #openstack-lbaas04:06
*** KeithMnemonic has quit IRC04:08
*** celebdor has joined #openstack-lbaas04:19
*** abaindur has quit IRC04:43
*** abaindur has joined #openstack-lbaas04:45
*** abaindur has quit IRC05:25
*** pcaruana has joined #openstack-lbaas06:44
*** rcernin has quit IRC07:02
*** openstackgerrit has joined #openstack-lbaas07:07
openstackgerritYang JianFeng proposed openstack/python-octaviaclient master: Add l7policy and l7rule to octavia quota  https://review.openstack.org/59156807:07
*** salmankhan has joined #openstack-lbaas07:29
*** salmankhan has quit IRC07:33
openstackgerritYang JianFeng proposed openstack/python-octaviaclient master: Add l7policy and l7rule to octavia quota  https://review.openstack.org/59156807:55
*** openstackstatus has quit IRC08:12
*** ktibi has joined #openstack-lbaas08:13
openstackgerritMin Sun proposed openstack/octavia-dashboard master: Cannot update ssl certificate when update listener  https://review.openstack.org/55031308:14
openstackgerritYang JianFeng proposed openstack/octavia master: [WIP] Add quota support to octavia's l7policy and l7rule  https://review.openstack.org/59062008:36
*** salmankhan has joined #openstack-lbaas09:10
*** salmankhan1 has joined #openstack-lbaas09:15
*** salmankhan has quit IRC09:16
*** salmankhan1 is now known as salmankhan09:16
openstackgerritYang JianFeng proposed openstack/octavia master: Add quota support to octavia's l7policy and l7rule  https://review.openstack.org/59062009:22
*** openstackstatus has joined #openstack-lbaas09:41
*** ChanServ sets mode: +v openstackstatus09:41
*** yboaron has joined #openstack-lbaas09:46
openstackgerritYang JianFeng proposed openstack/octavia master: Add quota support to octavia's l7policy and l7rule  https://review.openstack.org/59062009:49
*** yboaron_ has joined #openstack-lbaas10:02
*** yboaron has quit IRC10:05
*** yboaron_ has quit IRC10:38
*** yboaron_ has joined #openstack-lbaas10:59
*** savvas has joined #openstack-lbaas11:56
*** amuller has joined #openstack-lbaas12:08
*** amuller has quit IRC12:08
*** longkb has quit IRC12:13
mnaserrm_work: it is a processpoolexecutor12:27
mnaserhttp://paste.openstack.org/show/728008/ => ps auxf output12:28
*** celebdor has quit IRC12:28
mnaserrm_work: http://paste.openstack.org/show/728009/ output of how many threads are up12:29
mnaserit looks like health manager is only 2 processes12:31
mnaserand i shouldnt have 20?12:31
mnaseryeah all my other deploys are 2 processes12:32
mnaser61 more processes12:34
savvashi johnsom how are you today12:36
mnaserok according to lsof, it's the udp listener that ends up with all these threads12:36
savvasdo you have any idea why this would (consistently) happen? http://paste.openstack.org/show/728010/12:36
cgoncalvessavvas, multiple images with same name. xgerman_ could probably help you better as that seems to be ansible-octavia12:54
savvashe's on vacation :)12:55
savvashe recommended I reach out to johnsom12:55
cgoncalvesI could have a look at ansible-octavia but not just right now, sorry12:55
savvasthat's alright, if you do get a chance let me know12:56
savvasthe playbook successfully completes a second time but skips that part. It also doesn't seem to create the api endpoints properly12:57
savvasI get response on port 9876 but v2.0/lbaas/loadbalancers leads to a not found page12:57
mnaseri have a feeling that this is the root cause -- https://github.com/openstack/octavia/commit/98484332c4ff7deeebd749d0f5cd36b7679cd1bc#diff-0fb9e31059c066ef1f145f6a872f699313:08
*** amuller has joined #openstack-lbaas13:34
devfazmnaser: are you using ubuntu?13:41
mnaserdevfaz: ubuntu amphoras, centos controller13:41
devfazmnaser: had similar issue with ubuntu python-pkg of octavia. Try to use only "pip install ..." without any pkged versions13:42
mnaserdevfaz: how many octavia-health-manager processes do you have13:42
cgoncalvessavvas, I thought that task ("Get curremt image id") was from openstack-ansible-os_octavia but I can't find any references to it13:42
devfazhttps://pastebin.com/MH7BzVPp13:43
mnaserdevfaz: so it looks like there's two proceses only13:47
mnaserwhich means you're probably running pre that commit13:47
devfaz%prog 3.0.0.0b4.dev1913:48
savvasyes cgoncalves, for me that gets executed when running os-octavia-install.yml13:48
*** KeithMnemonic has joined #openstack-lbaas13:48
savvaswhich is part of Openstack Ansible13:48
devfazmnaser: docker image was built 5 days ago from master.13:49
devfazmnaser: I will rebuild with current master. give me a minute.13:49
mnaserdevfaz: thats weird, i have a ton more processes..13:49
mnaserthis machine has 40 cores though13:49
savvashttp://paste.openstack.org/show/728014/ this is the playbook cgoncalves13:49
devfazmnaser: python -m pip install -c https://raw.githubusercontent.com/openstack/requirements/master/upper-constraints.txt -r requirements.txt13:49
devfazmnaser: try this to update/fix your python-module dep.13:50
mnaseri cant really just do that, this is a deployment done via openstack ansible13:50
devfaztook me a while to detect this.13:50
mnaserso i would need to make sure it happens cleanly13:50
*** fnaval has joined #openstack-lbaas13:56
cgoncalvessavvas, ok, so I see that is on stable/queens or earlier. things have changed substantially in recent master14:14
cgoncalvessavvas, the os_image_facts ansible module can only return facts of a single image and it is confused because there are more than one with that image name14:15
cgoncalvesprobably openstack-ansible-os_octavia misses a task that deletes existing amphora image before uploading new one14:17
cgoncalvescertainly os_image_facts could be extended to allow retrieving latest created image (and retrieval by tag name)14:18
cgoncalvesI have an idea... borrow code from octavia-undercloud role in tripleo-common :)14:21
savvasyee I was trying to let this run as much out of the box as possible, since we want to use and maintain it in multiple locations14:27
cgoncalvessavvas, there's a task at the end to delete old image from glance. how did you end up with multiple ones?14:34
*** ktibi has quit IRC14:57
*** yboaron_ has quit IRC15:17
*** savvas has quit IRC15:36
*** pcaruana has quit IRC16:02
*** rpittau has quit IRC16:09
openstackgerritMichal Rostecki proposed openstack/octavia master: devstack: Define packages for (open)SUSE  https://review.openstack.org/59177416:17
openstackgerritMerged openstack/octavia-dashboard master: Cannot update ssl certificate when update listener  https://review.openstack.org/55031316:22
*** Swami has joined #openstack-lbaas17:10
colby_Do I need to change the network driver to work with neutron? Should it be: allowed_address_pairs_driver or noop? My setup is not creating the VIP port, so I'm trying to troubleshoot17:30
rm_workmnaser: hmmm, yeah umm, do you run neutron-lbaas also or just octavia? is the status update queue possibly turned on by accident?17:53
mnaserrm_work: octavia only17:53
rm_workyeah so in that case17:53
mnaserwhat is the status update queue?17:54
rm_workdefinitely check config to make sure that's off17:54
mnaserdoesnt health-manager only take care of udp polls?17:54
mnaserit looks like it spawns two processes, one that does udp and the other one that picks up the requests17:54
rm_workthe event stream17:55
rm_worklet me see where it's configured17:55
rm_workhttps://github.com/openstack/octavia/blob/stable/queens/etc/octavia.conf#L77-L8317:55
rm_workthat can get turned on accidentally by some deployment tools i think17:55
rm_workif they once assumed neutron-lbaas support17:55
rm_workand it will cause issues17:55
rm_worksimilar to what you're seeing17:56
rm_workbut it's a longshot maybe17:56
rm_workif that IS enabled tho, then that'd be an easy fix :)17:56
mnaserrm_work: hm, its enabled with osa octavia18:00
mnaseroctavia_sync_provisioning_status that is18:00
rm_workwhat about the driver18:01
mnaseroctavia_event_streamer: False and octavia_sync_provisioning_status: false ?18:01
mnaserwhat do you mean what about the driver?18:01
rm_workevent_streamer_driver = noop_event_streamer18:01
rm_workor queue_event_streamer18:01
mnaserwhat does the event_streamer_driver control18:01
rm_workbadness18:02
rm_worklook in the config that's provisioned18:02
rm_workon one of your HM controllers18:02
rm_workand see if event_streamer_driver is set18:02
rm_workI THINK the `sync_provisioning_status` option is ignored as long as it's on the noop driver18:02
rm_workbut if you are on the `queue_event_streamer` then that *is* a problem, fix that and prolly it'll resolve itself18:03
rm_workyou may also be wrecking some poor RMQ somewhere lol18:03
rm_worknormally it is used to stream status change events to neutron-lbaas, but if you don't have neutron-lbaas, it is problematic18:03
rm_workit writes to a queue that never empties and everything goes to shit T_T18:04
rm_workif they are both False tho, prolly it's not that18:04
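A minimal sketch of the check rm_work describes: parse the rendered octavia.conf from a health manager host and report the two settings that cause trouble without neutron-lbaas. It assumes both options live in the [health_manager] section, as in the sample config linked above, and treats noop_event_streamer and False as the safe values for an Octavia-only deployment; the function name is hypothetical.

```python
import configparser


def event_streamer_problems(conf_text: str) -> list[str]:
    """Return the [health_manager] settings that should be turned off
    when Octavia runs without neutron-lbaas (hypothetical helper)."""
    # interpolation=None: real configs may contain '%' (e.g. in passwords);
    # strict=False: tolerate repeated keys that oslo.config accepts.
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(conf_text)
    problems = []

    # A missing option falls back to the value recommended in the channel.
    driver = parser.get("health_manager", "event_streamer_driver",
                        fallback="noop_event_streamer")
    if driver != "noop_event_streamer":
        problems.append(f"event_streamer_driver = {driver}")

    # getboolean() understands True/False, yes/no, on/off and 1/0.
    if parser.getboolean("health_manager", "sync_provisioning_status",
                         fallback=False):
        problems.append("sync_provisioning_status is enabled")
    return problems


# The values mnaser later finds on his health manager host:
sample = """
[health_manager]
event_streamer_driver = queue_event_streamer
sync_provisioning_status = True
"""
print(event_streamer_problems(sample))
# → ['event_streamer_driver = queue_event_streamer', 'sync_provisioning_status is enabled']
```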
*** salmankhan has quit IRC18:07
johnsommnaser sync_provisioning_status is your problem, shut that off.  German made a mistake in the OSA role18:08
johnsomcolby_ It needs to be allowed_address_pairs to have things created in neutron.18:10
rm_workyeah waiting for mnaser to tell me if the driver is actually enabled on his HM's config18:11
johnsomEven with the noop event driver, that sync setting will do bad things18:12
rm_workah i thought it was ignore18:12
johnsomWell saw it right away in you deployment, he had noop event, but left that provisioning sync on in the role and it hosed up the HM.18:12
rm_workwhere did mnaser go lol :P18:13
mnaserin a call, one sec :p18:14
johnsomI don't know how he does everything he does as it is....18:14
rm_worklol18:14
rm_workright18:14
colby_johnsom Thanks. So Im seeing some weirdness. So when I create the load balancer Im specifying the subnet_id and the network_id. But the ip address that it comes up with is not part of the subnet18:15
colby_2018-08-14 18:00:46.018 38226 DEBUG octavia.controller.worker.controller_worker [-] Task 'octavia.controller.worker.tasks.network_tasks.AllocateVIP' (bdfcf0dc-a4b2-4f59-a34d-ca083ef79ba7) transitioned into state 'RUNNING' from state 'PENDING' _task_receiver /usr/lib/python2.7/site-packages/taskflow/listeners/logging.py:19418:16
colby_2018-08-14 18:00:46.018 38226 DEBUG octavia.controller.worker.tasks.network_tasks [-] Allocate_vip port_id ed4220ca-2860-4597-bd0c-1448e0b49d0a, subnet_id 0f09cbd9-711d-4582-ac02-aa64ccd038f2,ip_address 198.51.100.1 execute /usr/lib/python2.7/site-packages/octavia/controller/worker/tasks/network_tasks.py:32318:16
colby_2018-08-14 18:00:46.018 38226 INFO octavia.network.drivers.neutron.allowed_address_pairs [-] Port ed4220ca-2860-4597-bd0c-1448e0b49d0a already exists. Nothing to be done.18:16
colby_2018-08-14 18:00:46.212 38226 DEBUG neutronclient.v2_0.client [-] Error message: {"NeutronError": {"message": "Port ed4220ca-2860-4597-bd0c-1448e0b49d0a could not be found.", "type": "PortNotFound", "detail": ""}} _handle_fault_response /usr/lib/python2.7/site-packages/neutronclient/v2_0/client.py:25818:16
johnsomcolby_ Do you get the right thing with just specifying subnet_id?18:16
colby_no It still gets the same weird ip18:17
johnsomAre you using neutron-lbaas or native Octavia?18:17
colby_native octavia18:17
johnsomGood. Hmmm, which version of Octavia?18:17
colby_pike: 1.0.2-1 (centos)18:18
colby_the ip on that subnet should be 192.168.2.0/2418:19
johnsomOn create are you specifying an IP or just the network/subnet?18:21
colby_tried just subnet and subnet & network18:22
colby_like so: openstack loadbalancer create --name lb15 --vip-subnet-id 0f09cbd9-711d-4582-ac02-aa64ccd038f2 --vip-network-id aed09c75-74e0-447a-936e-7362ad77597b18:23
johnsomOk, can you pastebin a "openstack subnet show 0f09cbd9-711d-4582-ac02-aa64ccd038f2" and "openstack network show aed09c75-74e0-447a-936e-7362ad77597b"?18:24
colby_sure18:24
colby_https://pastebin.com/9qUg8zSq18:26
johnsomWhen you first ran the load balancer create command, did the return data show that IP as well? 198.51.100.118:29
colby_yes18:29
colby_does the api use noop? just seeing that in the logs: 2018-08-14 18:00:32.754 1350542 DEBUG octavia.network.drivers.noop_driver.driver [req-c15a1e74-74a1-48f4-878d-7080e150ea7b e28435e0a66740968c523e6376c57f68 18882d9c32ba42aeaa33c4703ad84b2c - default default] Network NoopManager no-op, get_network network_id aed09c75-74e0-447a-936e-7362ad77597b get_network /usr/lib/python2.7/site-packages/octavia/network/drivers/noop_driver/driver.py:11918:31
johnsomThat is your problem. You are getting a "fake" IP from the noop driver.  No-Op means it does no operations against other services, such as neutron/nova/glance/etc.18:32
johnsomI was just opening a window to see if that was a no-op range18:32
colby_oh, I did not see a driver option for the API config, just the worker18:32
rm_workit's all the same config18:32
rm_workyou should be using an identical config for all processes, really18:33
colby_ah gotcha.18:33
colby_ok that makes it more complicated since I use the puppet modules. At least I know what I need to do now though18:33
johnsomYeah, we did not do a good job on that section of the config, it is confusing that other processes use stuff in the "controller_worker" section18:34
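Since the API process reads the [controller_worker] driver options too, a config file that only sets them on the worker host leaves the API on a different driver. A minimal sketch that flags driver options which are unset or point at a no-op driver in one host's config; it treats an unset option as suspicious because colby_'s API log above shows the no-op network driver in use on the API host. The option names are the ones discussed here; the helper names are hypothetical.

```python
import configparser
from typing import Optional

# rm_work's advice above: use one identical config for every Octavia process.
DRIVER_OPTIONS = ("amphora_driver", "compute_driver", "network_driver")


def driver_settings(conf_text: str) -> dict[str, Optional[str]]:
    """Read the driver options from [controller_worker] (None if unset)."""
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(conf_text)
    return {opt: parser.get("controller_worker", opt, fallback=None)
            for opt in DRIVER_OPTIONS}


def noop_or_missing(conf_text: str) -> list[str]:
    """Driver options that are unset or explicitly set to a no-op driver."""
    return [opt for opt, value in driver_settings(conf_text).items()
            if value is None or "noop" in value]


api_host_conf = "[api_settings]\nbind_host = 0.0.0.0\n"
print(noop_or_missing(api_host_conf))
# → ['amphora_driver', 'compute_driver', 'network_driver']
```

Running it against each host's rendered file makes a mismatch like the one above visible before it shows up as a fake 198.51.100.x VIP.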
rm_workhmm, which puppet modules?18:34
colby_the openstack ones18:35
rm_workis there something which has a bad idea about how to do octavia config deployment?18:35
colby_I just need to include with the api server and NOT manage packages and not enable the service18:35
rm_workopenstack has puppet modules? i guess i am out of the loop lol18:35
rm_worki thought everyone used ansible18:35
colby_haha yea. We use puppet/foreman for all our deployment18:36
*** sapd1 has quit IRC18:38
mnaserok18:58
mnaserim done18:58
colby_Ok, progress. It's creating the port on the correct subnet/network now, but it's not creating a port on the octavia management net. I'll dig some more on that18:58
mnaserjohnsom, rm_work: ok i'm checking if its enabled now18:59
mnaserevent_streamer_driver = queue_event_streamer19:03
mnasersync_provisioning_status = True19:03
johnsomOuch, double trouble19:04
rm_workyeah that's it19:04
rm_worknoop and False plz19:04
johnsommnaser Use the defaults: https://github.com/openstack/octavia/blob/master/etc/octavia.conf#L9119:04
johnsomWith both of those on I am surprised you made it this long....19:05
mnaserlol19:05
mnaseroctavia_sync_provisioning_status/octavia_event_streamer both True19:05
mnaserin stable/queens19:05
mnasershould we fix that?19:05
rm_workdefault?!19:06
rm_workmnaser: i think they have always defaulted to False ...19:06
rm_worksince basically ever19:06
rm_workit must be whatever deployment system19:07
mnaserhttps://github.com/openstack/openstack-ansible-os_octavia/blob/stable/queens/defaults/main.yml#L281-L28619:07
mnasernegatory19:07
mnaserfor openstack ansible, sorry19:07
rm_workyeah19:07
rm_workso yes, as johnsom was saying, I guess a mistake was made setting that up in OSA19:07
rm_workif we can fix it, maybe we should19:07
johnsomI thought German did fix those in OSA, but if they are still there, yes, it should be fixed19:07
mnaseri kinda wanna push a stable/pike only fix19:07
rm_workwell, I think in Pike most people ran it behind n-lbaas still19:08
rm_workthough technically it can run standalone19:08
mnaseramateurs19:08
mnaser:P19:08
rm_worklol19:08
mnaserhttps://github.com/openstack/openstack-ansible-os_octavia/commit/c8f4f275a1e9968b1f049c022639adadc4333b6319:08
mnaserthis commit added it19:08
rm_workhey I still think ALL ya'll are amateurs, for running stable branches :P19:08
johnsomYeah, it could be that pike OSA still defaults to neutron-lbaas19:08
mnaseri think what i want to do is19:08
mnaserif octavia_v1 is false and octavia_v2 is true then octavia_event_streamer/octavia_sync_provisioning_status = false else true19:09
rm_workyeah19:09
rm_worki was gonna suggest something like that19:09
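mnaser's proposed rule, written out as a truth table. This is only a Python sketch of the boolean logic with a hypothetical function name; the actual change belongs in the OSA role's Ansible defaults.

```python
def sync_enabled(octavia_v1: bool, octavia_v2: bool) -> bool:
    """Value for octavia_event_streamer / octavia_sync_provisioning_status
    under mnaser's rule: False for a v2-only deployment, True otherwise."""
    return not (octavia_v2 and not octavia_v1)


for v1 in (True, False):
    for v2 in (True, False):
        print(f"v1={v1!s:5} v2={v2!s:5} -> sync={sync_enabled(v1, v2)}")
```

Taken literally, the rule also turns the sync on when both APIs are disabled. Keying it on the v1 flag alone, as the patch title ("for V1 API") suggests, differs only in that corner case.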
johnsomeh gad, it's both true in queens too.19:09
johnsommnaser +119:09
johnsomWell, +2, I'm a core there19:09
mnasersorry19:09
mnaseri meant stable/queens19:10
mnaseri assume queens didnt default to v2 only19:10
mnaserin osa19:10
johnsomYep, it looks like queens has v1 enabled19:10
mnaserill do the patch that enables it19:10
mnaserand backport it through19:11
johnsomThanks.19:11
mnaserjohnsom: https://review.openstack.org/591829 Enable event streamer and provisioning status sync for V1 API19:17
mnaserrm_work: ^19:17
jitekaHey guys, I'm still trying to get my LB working in Queens using ubuntu based amphora image xenial19:34
jitekaProblem : I'm getting "Failed to bring up eth1" when octavia-worker contact amphora Rest API on /0.5/plug/vip/<vip address>19:34
jitekaI disabled revert in [task_flow] to get a chance to SSH in and troubleshoot the amphora, and I realised 2 things19:34
jiteka1. BACKUP amphora node is not yet configured when the LB failed to be created due to that ifup problem on MASTER19:34
jitekaand when I'm doing my cURL I just get {"message":"OK","details":"VIP 10.63.68.30 plugged on interface eth1"}19:34
jitekaamphora-agent : ::ffff:10.63.8.9 - - [14/Aug/2018:17:16:41 +0000] "POST /0.5/plug/vip/10.63.68.30 HTTP/1.1" 202 70 "-" "curl/7.29.0"19:34
jiteka2. The MASTER amphora node that failed is left half-configured, so to avoid {"message":"Interface already exists"} on a cURL to /plug/vip19:34
jitekaI deleted the netns amphora-agent then tried again and got a 202 OK too on that node19:34
jitekaBut when it fails the first time on the master, the interface cfg for eth1 looks like this:19:34
jitekahttp://paste.openstack.org/show/728039/19:34
jitekaand when it works on the cURL command it looks like this :19:34
jitekahttp://paste.openstack.org/show/728040/19:34
jitekaeth1 is no longer static but gets its config from DHCP19:34
jitekaAny idea, suggestion why this is happening ?19:34
johnsomjiteka Did you get a chance to do the line by line test I recommended before my vacation?19:41
johnsomMy guess is that overlapping host route is not making ubuntu happy19:41
jitekanot sure how to proceed19:41
johnsomOk, let me give you steps19:42
johnsomIn the amphora:19:42
johnsomip netns exec amphora-haproxy bash19:42
johnsom(you need to be root)19:42
jitekaok19:42
johnsomifconfig eth1:0 down19:42
jitekaI  remember you showed me that19:42
johnsomifconfig eth1 down19:42
johnsomThose may fail, that is ok19:42
johnsomThen ifconfig eth1 up19:43
johnsomifconfig eth1:0 up19:43
johnsomOne or more of those might fail too, that is also ok19:43
johnsomThen go through the extra config lines for the host routes:19:43
johnsomip route add -net 169.254.169.254/32 gw 10.63.68.2 dev eth119:44
jitekaeth1:0 fail yes :19:44
jitekaCannot assign requested address19:44
johnsomip route add -net 198.74.40.0/24 gw 10.63.68.2 dev eth119:44
johnsomip route add -net 10.0.0.0/8 gw 10.63.68.2 dev eth119:44
johnsomip route add -net 172.16.0.0/12 gw 10.63.68.2 dev eth119:44
jitekaError: ??? prefix is expected rather than "-net".19:44
johnsomOh, right, those are the other command syntax, one second19:44
johnsomremove the "ip" from the front of those19:45
johnsomroute add -net 12.148.80.0/22 gw 10.63.68.2 dev eth119:45
johnsom^^^ that was the last in the top list19:45
johnsom /sbin/ip route add 10.63.68.0/22 dev eth1 src 10.63.68.30 scope link table 119:45
johnsom /sbin/ip route add default via 10.63.68.1 dev eth1 onlink table 119:45
jitekaNetwork is unreachable19:45
johnsom /sbin/ip route add 169.254.169.254/32 via 10.63.68.2 dev eth1 onlink table 119:46
johnsomWhich line?19:46
jiteka12.148.80.0/2219:46
jiteka/sbin/ip route add default via 10.63.68.1 dev eth1 onlink table 1 (OK)19:47
johnsomOk, darn, the interface must not have been up19:47
johnsom /sbin/ip route add 198.74.40.0/24 via 10.63.68.2 dev eth1 onlink table 119:47
johnsom /sbin/ip route add 10.0.0.0/8 via 10.63.68.2 dev eth1 onlink table 119:47
johnsom /sbin/ip route add 172.16.0.0/12 via 10.63.68.2 dev eth1 onlink table 119:47
johnsom /sbin/ip route add 12.148.80.0/22 via 10.63.68.2 dev eth1 onlink table 119:48
johnsom /sbin/ip rule add from 10.63.68.30/32 table 1 priority 10019:48
johnsomMy suspicion is that last line is going to fail19:48
jitekahttp://paste.openstack.org/show/728041/19:49
jitekano hard fail here19:49
jitekaat least not visible in command output19:50
johnsomHmmm, ok, so it must be in the first block.19:50
jiteka/sbin/ip route list19:51
jitekait should give me the list of rules I just added right ?19:51
johnsomExcept that top block where you got that error about network not reachable.19:51
johnsomDid you get that on all of the lines or just the first one?19:51
jitekahttp://paste.openstack.org/show/728042/19:53
jitekainvalid argument on :19:53
jiteka/sbin/ip route add 10.63.68.0/22 dev eth1 src 10.63.68.30 scope link table 119:53
johnsomCan you do an openstack subnet show again for that subnet? I want to look at the host routes configured there19:57
jitekasure19:57
jitekadoing it now19:58
jitekahttp://paste.openstack.org/show/728044/19:58
johnsomWell, I'm not sure what the issue is. I think it's the overlapping host route, but not sure.20:01
johnsomLet me build a devstack, create a matching subnet and give it a go local so I can poke at this without the IRC latency. grin20:01
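johnsom's overlapping-host-route suspicion can be checked mechanically. A sketch using Python's ipaddress module that lists which host routes overlap the VIP subnet itself; the routes are the ones from the ip route commands above (next hop 10.63.68.2), and the function name is hypothetical. An overlap only shows that the condition exists, not that it is the cause.

```python
import ipaddress


def overlapping_host_routes(subnet_cidr, host_routes):
    """Return destinations from (destination, nexthop) pairs that overlap subnet_cidr."""
    subnet = ipaddress.ip_network(subnet_cidr)
    return [dest for dest, _nexthop in host_routes
            if ipaddress.ip_network(dest).overlaps(subnet)]


routes = [
    ("169.254.169.254/32", "10.63.68.2"),
    ("198.74.40.0/24", "10.63.68.2"),
    ("10.0.0.0/8", "10.63.68.2"),
    ("172.16.0.0/12", "10.63.68.2"),
    ("12.148.80.0/22", "10.63.68.2"),
]
print(overlapping_host_routes("10.63.68.0/22", routes))  # → ['10.0.0.0/8']
```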
*** amuller has quit IRC20:02
jitekajohnsom: thanks, I will be on my way home soon but will resume on this on thursday (bank holiday tomorrow in France)20:02
johnsomOk, if you have an open story for this I can post my results there.20:03
johnsomhttps://storyboard.openstack.org/#!/dashboard/stories20:03
johnsomIt will take me about a half hour to stack and get that subnet configured, etc.20:03
jitekaahaha yes I know that devstack is taking time20:04
jitekaI anticipated that :D20:04
colby_Will disable_revert = true leave the amphora in place on error? The worker is having trouble connecting to the amphora on port 9443. I'm able to ssh to the instance but it gets destroyed right away before I can troubleshoot.20:05
jitekahmm what could I put here on that story as "Task title"20:07
johnsomcolby_ Yes. You can also stop the health manager process to stop it from being failed over to a new vm20:08
johnsomjiteka "load balancer create fails with host routes"20:08
johnsomShould do the trick20:08
jitekahttps://storyboard.openstack.org/#!/story/200344120:10
johnsomThank you. I will post my findings there20:10
jitekaI think also that I will be in that configuration :20:13
jitekahttps://bugs.launchpad.net/octavia/+bug/148827920:13
openstackLaunchpad bug 1488279 in octavia "Amphora fails with member on same subnet as vip" [Critical,Fix released]20:13
jitekabecause I'm trying to put my VIP on the same subnet as future listeners20:13
jitekaI don't see that issue yet20:13
jitekabut I see that status is "Fix released"20:14
johnsomYeah, that bug was fixed a long time ago.20:16
jitekacolby_: it actually leaves it in PENDING_CREATE, so afterwards, when I need to delete it, I have to go into the DB to update the octavia.load_balancer row20:16
jitekaupdate load_balancer set provisioning_status = 'ERROR' where id = '<lb_uuid>'20:16
jitekacolby_: the API prevents you from deleting an LB with provisioning_status = PENDING_CREATE20:17
johnsomPENDING_CREATE will go to ERROR once the retries have given up. It's a configuration setting. The default is super long for folks using bad hypervisors like virtualbox.  They will either go to ERROR or ACTIVE.20:17
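The worst-case time a load balancer can sit in PENDING_CREATE is roughly the retry count multiplied by the retry interval. A sketch of that arithmetic; the option names in the docstring ([haproxy_amphora] connection_max_retries and connection_retry_interval) are an assumption to verify against your release's octavia.conf, and the numbers below are only an example.

```python
def max_wait_seconds(max_retries: int, retry_interval_s: float) -> float:
    """Rough upper bound on how long the controller keeps retrying an amphora
    before giving up and moving the load balancer to ERROR.

    Assumption: the relevant settings are [haproxy_amphora]
    connection_max_retries and connection_retry_interval; time spent
    inside each connection attempt is ignored.
    """
    return max_retries * retry_interval_s


# Example values only, not necessarily your release's defaults:
print(max_wait_seconds(300, 5) / 60, "minutes")  # → 25.0 minutes
```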
jitekajohnsom: ok, that's good news20:17
colby_yea Ive had to do that before. Thanks.20:17
johnsomYou have to be careful doing that with the DB as if it's in PENDING_* that means one of the controllers is actively working on that load balancer and you could get things in an inconsistent state by changing the status while a controller has ownership of the object.20:18
johnsomYou will get at least a bunch of errors in the log.20:19
colby_ok so looks like my problem is just a timeout maybe. I can telnet to port 9443 from the worker node now that the instance is staying. so maybe the amphora just was not up all the way when the worker is trying to connect.20:20
*** salmankhan has joined #openstack-lbaas20:20
johnsomcolby_ With a good cloud it should take about 30 seconds to come up.  Is this running on bare metal or in a hypervisor?20:21
colby_hypervisor20:21
johnsomWhich one?20:22
colby_kvm20:22
johnsomOk, so you are running kvm inside kvm?20:22
colby_oh no20:23
colby_its spinning up amphora on our compute nodes20:23
colby_compute nodes are baremetal running kvm20:23
johnsomOn the compute host, can you look in /proc/cpuinfo and see if you find vmx or svm?20:23
colby_vmx20:25
colby_these are dual Intel(R) Xeon(R) CPU E5-2697 v420:25
johnsomOk, yeah, so it should boot up and be ready in about 30 seconds.20:25
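The /proc/cpuinfo check can be scripted: Intel VT-x appears as the vmx flag and AMD-V as svm on the flags lines. A minimal sketch that takes the file's text as input so it can be tested without a real host; the function name is hypothetical.

```python
def has_hw_virt(cpuinfo_text: str) -> bool:
    """True if a 'flags' line lists vmx (Intel VT-x) or svm (AMD-V).

    Note: smx (Intel Safer Mode Extensions) is a different feature.
    """
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags" and {"vmx", "svm"} & set(value.split()):
            return True
    return False


# On a compute host:
# with open("/proc/cpuinfo") as f:
#     print(has_hw_virt(f.read()))
```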
colby_the instance spins up quickly20:25
colby_I can ssh in...in less than 30 seconds20:26
johnsomHmm, so it's not likely the timeout unless you have changed the defaults to something low20:26
rm_worksounds like a bad return issue like we had a long time ago20:51
rm_workwhere the agent responded badly on the initial startup if you hit it too soon20:52
colby_I get: 2018-08-14 20:46:26.119 6118 WARNING octavia.amphorae.drivers.haproxy.rest_api_driver [-] Could not connect to instance. Retrying.: ConnectionError: HTTPSConnectionPool(host='10.10.0.7', port=9443): Max retries exceeded with url: /0.5/plug/vip/192.168.2.8 (Caused by NewConnectionError('<requests.packages.urllib3.connection.VerifiedHTTPSConnection object at 0x7fa4c86c1650>: Failed to establish a new connection: [Errno 111] Connection refused20:52
colby_',))20:52
colby_Here are the logs of the amphora-agent: https://pastebin.com/gXXPdPch20:53
colby_the timestamps are about the same so maybe its taking longer to get the agent up than the worker is expecting20:54
johnsomThis is not normal: [CRITICAL] WORKER TIMEOUT20:54
johnsomIt sounds like gunicorn is crashing20:54
colby_it starts another worker after that20:55
colby_and that one doesn't crash20:55
rm_workyeah so prolly the initial one crashes during the request20:55
rm_workand that kills it20:55
rm_workbecause the controller GETS something, and it's bad20:55
johnsomHmm, I don't see those20:55
colby_sorry I didn't paste those. Ill redo20:55
rm_workyeah crashes or hangs20:56
rm_worki wonder if it's hanging on the config action20:56
rm_worklike, on an interface up or something20:56
colby_https://pastebin.com/Zt0vShuJ20:56
colby_the ssl negotiation errors are me testing via telnet and curl from the worker20:57
rm_workthe issue is almost certainly that timeout tho20:57
colby_oops pastebin had a captcha I didn't see20:58
colby_now that link should work20:58
rm_workwe only try repeatedly to get a connection if we get a timeout/refused... if we get like a partial data return that's broken, we don't retry20:58
rm_workwe bail and recycle the amp20:58
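rm_work's description of the controller behaviour (keep retrying while the connection is refused or times out, give up at once on a broken reply) as a generic Python sketch. It illustrates the pattern only and is not Octavia's rest_api_driver code; BadResponse and call_with_retries are hypothetical names. Under rm_work's theory, a gunicorn worker that dies mid-request produces the second kind of failure, which is not retried.

```python
import time


class BadResponse(Exception):
    """Hypothetical: the agent answered, but with something unusable."""


def call_with_retries(request, max_retries=3, interval_s=0.0):
    """Call request(), retrying only on connection-level failures.

    ConnectionRefusedError/TimeoutError are retried up to max_retries
    attempts in total; anything else, including BadResponse, propagates
    immediately, which is the "bail and recycle the amp" case.
    """
    if max_retries < 1:
        raise ValueError("max_retries must be at least 1")
    for attempt in range(1, max_retries + 1):
        try:
            return request()
        except (ConnectionRefusedError, TimeoutError):
            if attempt == max_retries:
                raise
            time.sleep(interval_s)
```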
rm_workit worked, i just had to captcha :P20:58
colby_How can I test using curl (and using the certs) to see what kind of response I get21:01
rm_worki don't have the command handy, but i remember basically i just had to go through the curl manpage and pick all the cert related stuff21:02
colby_sorry yea I meant which cert exactly do I need to connect to the amphora. I have split CA certs setup21:02
rm_workoh21:02
rm_workyou validate using the cert that is the CA used to create the amp cert21:03
rm_workIIRC21:03
rm_workbut i've never actually done split-cert lol21:03
colby_The method is not allowed for the requested URL21:14
colby_is that due to certificate issue21:14
colby_?21:14
colby_trying to connect to: https://10.10.0.7:9443/0.5/plug/vip/192.168.2.821:14
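"The method is not allowed for the requested URL" is the standard Flask/Werkzeug 405 text, and curl sends GET unless told otherwise, while the amphora-agent log jiteka pasted shows this endpoint being called with POST. A hypothetical Python helper that assembles a matching curl command line for a split-CA setup; every path, the certificate hostname, and the JSON body are placeholders, since the body the agent expects for plug/vip is not shown in this log. A successful plug/vip reconfigures the amphora's networking, so only try it on an amphora you are already debugging.

```python
import shlex


def plug_vip_curl(amp_ip, vip, client_cert, client_key, server_ca,
                  cert_hostname, body_json, port=9443, api_version="0.5"):
    """Build a curl argv for POST /<api_version>/plug/vip/<vip> (hypothetical helper)."""
    url = f"https://{cert_hostname}:{port}/{api_version}/plug/vip/{vip}"
    return [
        "curl", "-X", "POST",
        "--cert", client_cert,   # client certificate the agent should accept
        "--key", client_key,     # matching private key
        "--cacert", server_ca,   # CA that signed the amphora's server certificate
        # Map the certificate's hostname to the amphora's IP so TLS
        # hostname verification has a name to match.
        "--resolve", f"{cert_hostname}:{port}:{amp_ip}",
        "-H", "Content-Type: application/json",
        "-d", body_json,
        url,
    ]


argv = plug_vip_curl("10.10.0.7", "192.168.2.8",
                     "/etc/octavia/certs/client.pem",     # placeholder
                     "/etc/octavia/certs/client.key",     # placeholder
                     "/etc/octavia/certs/server_ca.pem",  # placeholder
                     "AMPHORA-CERT-NAME",                 # placeholder
                     "{}")                                # placeholder body
print(shlex.join(argv))
```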
*** abaindur has joined #openstack-lbaas21:21
abaindurjohnsom: or anyone else that can answer this question for me: What exactly is the bind_ip and controller_ip_port_list in the [health_manager] for, and what is it to be set to?21:23
*** salmankhan has quit IRC21:23
abainduris it the IP of the host/physical machine on which the octavia health-manager service is running?21:24
*** salmankhan has joined #openstack-lbaas21:24
abaindurand why is it in a list format? Is this assuming if we have health-manager running on multiple hosts, the conf value on every host should be the same and include the public IPs of every host running health-manager?21:25
colby_controller_ip_port_list is a list of the hosts running health_manager. The amphora use this list to connect to as far as I can tell21:26
abaindurFor example we have health-manager and worker running on 2 hosts, suppose with management IP 10.4.0.2 and 10.4.0.3 (this is the IP of physical machine, not at all related to the octavia LB network)21:26
colby_I have 2 health managers so I have both in there21:26
abaindurbind_ip would be different on each host - it would be 10.4.0.2 on one, 10.4.0.3 on the other21:27
colby_correct21:27
colby_the list would include both for the amphora21:27
abaindurand "controller_ip_port_list = 10.4.0.2:5555, 10.4.0.3:5555" - on both the hosts .conf files?21:27
colby_I believe so (I could be wrong. That's how I did it and I don't get errors any more)21:28
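The layout abaindur and colby_ converge on (each host binds its own address, every host lists all health managers) as a small Python sketch that renders the [health_manager] lines per host. The IPs and port 5555 come from abaindur's example; the helper name is hypothetical. It assumes, as the discussion implies, that the amphorae send heartbeats to the listed addresses, so those addresses must be reachable from the amphorae.

```python
def health_manager_section(host_ip, all_hosts, port=5555):
    """Render the [health_manager] lines for one controller host (hypothetical helper).

    bind_ip is this host's own address; controller_ip_port_list is the same
    on every host because it is the list handed to the amphorae.
    """
    targets = ", ".join(f"{ip}:{port}" for ip in all_hosts)
    return ("[health_manager]\n"
            f"bind_ip = {host_ip}\n"
            f"bind_port = {port}\n"
            f"controller_ip_port_list = {targets}\n")


hosts = ["10.4.0.2", "10.4.0.3"]
for ip in hosts:
    print(health_manager_section(ip, hosts))
```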
abaindurwhat did you also set for the heartbeat_key?21:28
abaindurDoes this really matter...? are we just supposed to set it to something random?21:29
colby_random key21:29
colby_random string rather21:29
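For heartbeat_key, any sufficiently random string works; presumably every controller needs the same value, since the key is used to sign and check heartbeat messages. A sketch using Python's secrets module; the 32-byte default length is an arbitrary choice, not an Octavia requirement, and the function name is hypothetical.

```python
import secrets


def new_heartbeat_key(num_bytes: int = 32) -> str:
    """Return a random hex string for [health_manager] heartbeat_key."""
    return secrets.token_hex(num_bytes)  # 2 hex characters per byte


print(f"heartbeat_key = {new_heartbeat_key()}")
```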
abaindurhmm ok. right now i have set the bind_ip and controller_ip_port_list to 127.0.0.1. We have it only on one host21:29
abaindurbut i dont see health-manager doing anything in the logs.21:30
abaindurhowever, the loadbalancer is working21:30
abaindurwhat is also strange is the provisioning_status | ACTIVE21:30
abaindurbut the operating_status shows as offline21:30
abaindurIs this due to health-manager not doing anything, because of the wrong IP used for bind_ip/controller_ip_port_list (127.0.0.1)?21:31
abaindurthe LB I deployed was without a healthmonitor21:31
colby_Im not sure on that. Im still having trouble getting the lb to spin up successfully. Im having issues with the worker connecting to the amphora21:37
abaindurwhat is the issue?21:39
colby_worker cant connect to the amphora21:40
abainduri had mentioned the LB network needs to basically be a provider network, from what i could tell21:40
colby_it times out. Its not a firewall issue. I can telnet to the port from the worker21:40
abaindursince the octavia worker runs as a process on the host, it will use the physical machines' networking/routing table21:40
abaindurso here we have a host, which needs to be able to talk to a VM. for us, that means it cant be an isolated tenant network, it has to be a provider network21:41
colby_I can connect from command line so the process should be able to fine. I think there is something more going on21:41
colby_the amphora worker crashes21:42
abaindursecurity group of the amphora?21:42
abaindurWe just put it on an allowall sec group that allows all traffic21:42
abainduris the flavor large enough? im giving my amp 1 cpu, 2 GB, and 10GB disk21:42
colby_no I can connect to 9443 from the worker fine.21:42
colby_1cpu 1GB Ram21:43
colby_I was going to try curl from the worker to duplicate what it tried to do but Im having trouble doing it due to hostname mismatch21:44
colby_using --resolve didn't help either21:45
*** salmankhan has quit IRC22:04
abaindurcolby_: how do you know the amphora crashes?22:04
abaindurmaybe then it is problem with your disk image when you built it22:04
colby_no it seems to happen when the worker makes the connection attempt22:08
colby_the amphora worker times out, dies then starts another worker22:09
colby_which is why I wanted to try and test a connection with curl to see if I see any issues there22:09
*** fnaval has quit IRC22:16
colin-what is the most direct way to determine the nova ID(s) of amphora agent for a given loadbalancer in the cli?22:27
johnsomAs an admin you can use the amphora admin API22:37
colin-found what i needed that way, thanks johnsom22:48
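Here is a sketch of the admin-API route johnsom mentions. The amphora records returned by Octavia's v2 amphora admin API include `loadbalancer_id` and `compute_id` fields, and `compute_id` is the nova server ID. The endpoint path and the `loadbalancer_id` query filter below are assumptions about that API. Both helpers are hypothetical, and the body they parse would come from an authenticated admin request.

```python
from urllib.parse import urlencode


def amphorae_url(octavia_endpoint: str, lb_id: str) -> str:
    """Build the (assumed) admin URL that lists amphorae for one load balancer."""
    query = urlencode({"loadbalancer_id": lb_id})
    return f"{octavia_endpoint.rstrip('/')}/v2/octavia/amphorae?{query}"


def compute_ids_for_lb(amphorae_body: dict, lb_id: str) -> list:
    """Return the nova (compute) IDs of the amphorae serving lb_id.

    amphorae_body is the decoded JSON response, expected to look like
    {"amphorae": [{"loadbalancer_id": ..., "compute_id": ...}, ...]}.
    """
    return [
        amp["compute_id"]
        for amp in amphorae_body.get("amphorae", [])
        # Filter client-side too, in case the server ignored the query filter.
        if amp.get("loadbalancer_id") == lb_id and amp.get("compute_id")
    ]
```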
abaindurspeaking of, what is the [amphora_agent] section used for?22:48
johnsomabaindur those are settings rendered inside the amphora-agent configuration file. Ignore it in your octavia.conf22:52
johnsomIt's an oddity with how our agent is setup and oslo config.  I removed them from the sample octavia.conf, but people kept putting them back, so I stopped fighting it22:53
abainduryea we left it blank. didnt see any issues, the LB was working22:57
abaindurjohnsom: btw did you see my msgs above about the controller_ip_port_list and our operating_status showing as OFFLINE? even though the LB is working23:00
abaindurprovisioning_status shows as  ACTIVE23:01
abaindurright now i have set the bind_ip and controller_ip_port_list to 127.0.0.1. We have it only on one host23:01
abaindurso I suspect its because the amphora VM cant heartbeat to the health-manager? we see nothing in the health-manager logs23:02
johnsomNo I didn't see it. I have four conversations going at the same time right now23:02
johnsomSo, yeah, exactly, you are telling the amphora to send their heartbeat to themselves, so we never get status or stats from those amphora.23:02
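Here is a minimal sketch of the check this implies. It flags any loopback entry in `controller_ip_port_list`, because the amphorae send their heartbeats to those addresses and 127.0.0.1 points each amphora back at itself. It assumes the usual comma-separated `ip:port` format, with 5555 as the health manager's default port. `loopback_entries` is a hypothetical helper, not part of Octavia.

```python
import ipaddress


def loopback_entries(controller_ip_port_list: str) -> list:
    """Return the ip:port entries that point at a loopback address.

    Accepts the comma-separated form, e.g. "192.0.2.10:5555, 192.0.2.11:5555";
    IPv6 entries are expected in brackets, e.g. "[::1]:5555".
    """
    bad = []
    for item in controller_ip_port_list.split(","):
        item = item.strip()
        if not item:
            continue
        host, _, port = item.rpartition(":")
        int(port)  # raises ValueError if the port part is missing or not numeric
        if ipaddress.ip_address(host.strip("[]")).is_loopback:
            bad.append(item)
    return bad


print(loopback_entries("127.0.0.1:5555"))  # ['127.0.0.1:5555']
```

On a single-host setup, `bind_ip` and `controller_ip_port_list` would presumably both need that host's address on the network the amphorae can actually reach.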
colby_So any ideas on how to test the amphora connection. I got curl working but get a method not allowed to 0.5/plug/vip/192.168.2.2323:05
abaindurwhats your .conf file look like?23:05
abaindurMaybe its your certs?23:05
colby_mine is complicated since I have split CA cert (one for client and one for server)23:06
abaindurdoesnt the worker talk to the amphora over https REST api. so if you can ping/telnet working to the amphora VM from the host and its on an allow all security group, sounds like it cant authenticate?23:07
abaindurtry the same certs...?23:07
colby_yes, Which is why I was trying to use curl with the certs23:07
colby_to mimic the worker23:07
johnsomcolby_ So, after you set all that up in the config for the new certs, did you build a new amp?  Those certs are loaded at amp boot time23:07
colby_you mean the image?23:08
abaindurbuild a new amp == build an entire new disk image? or just create a LB to spin up a new AMP23:08
johnsomNo, not the image, at "openstack loadbalancer create" time23:08
colby_yes Ive made many attempts to spin up new lb with the certs in place correctly23:09
abaindurwhy dont you try w/ the same certs23:10
abainduri just ran the sample script in the repo23:10
abaindurthen scp -r the output folder onto my host23:10
johnsomcolby_ Sorry, too many conversations going on, so refocusing here23:10
johnsomcolby_ Are you getting SSL errors in the syslog or amphora-agent log on the amp?23:11
colby_not from when the worker tried to connect23:11
colby_it just had that timeout23:11
johnsomOk, so these "Invalid request from ip=::ffff:10.10.0.8: [SSL: SSL_HANDSHAKE_FAILURE] ssl handshake " were from your manual tests?23:12
colby_correct23:12
colby_when I use curl pointing to my certs I dont get an ssl error either23:13
johnsomAnd all of your retries are "[Errno 111] Connection refused"23:13
johnsom?23:13
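`[Errno 111]` is `ECONNREFUSED` on Linux. It means the TCP connection attempt was actively rejected, which usually indicates that nothing was listening on the port at that moment, for example while the amphora agent was still starting. That is a different failure from a TLS problem. The hypothetical `tcp_probe` below is a telnet-style check that tells refused, timed-out, and open ports apart.

```python
import errno
import socket


def tcp_probe(host: str, port: int, timeout: float = 5.0) -> str:
    """Attempt a plain TCP connect (no TLS), like telnet.

    Returns "open", "timeout", or the errno name of the failure
    (e.g. "ECONNREFUSED" for errno 111 on Linux).
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except socket.timeout:
        return "timeout"
    except OSError as exc:
        # Map the numeric errno to its symbolic name when one exists.
        return errno.errorcode.get(exc.errno, str(exc))
```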
colby_I just get a method not allowed23:13
johnsomYeah, GET won't work on that path.23:13
johnsomIf you do /info you would get data23:13
johnsomoh, well 0.5/info23:14
colby_{"haproxy_version":"1.6.3-1ubuntu0.1","api_version":"0.5","hostname":"amphora-eb2ba854-d38f-4100-b5dc-578c1db556ac"}23:15
colby_so thats working from curl23:15
johnsomSo, yeah, you can get there just fine...23:15
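The curl test above can be approximated in Python to mimic the worker more closely. This sketch makes three assumptions. It uses port 9443 and the `0.5` API path seen in this conversation. It expects a single PEM file holding the controller's client certificate and unencrypted private key. It also assumes the amphora's server certificate is issued for the amphora ID rather than its IP, which would explain the hostname mismatch seen with plain curl. `fetch_amphora_info` and `parse_info_response` are hypothetical helpers, not the Octavia driver's code.

```python
import json
import socket
import ssl


def fetch_amphora_info(amp_ip, amp_id, client_cert, server_ca,
                       port=9443, api_version="0.5", timeout=10.0):
    """GET /<api_version>/info from an amphora agent over mutual TLS."""
    ctx = ssl.create_default_context(ssl.Purpose.SERVER_AUTH, cafile=server_ca)
    # Raises ssl.SSLError here if the local cert/key file is unusable.
    ctx.load_cert_chain(certfile=client_cert)
    request = (f"GET /{api_version}/info HTTP/1.0\r\n"
               f"Host: {amp_ip}:{port}\r\n\r\n").encode("ascii")
    chunks = []
    with socket.create_connection((amp_ip, port), timeout=timeout) as raw:
        # Verify the server certificate against the amphora ID, not the IP.
        with ctx.wrap_socket(raw, server_hostname=amp_id) as tls:
            tls.sendall(request)
            while True:
                data = tls.recv(4096)
                if not data:  # HTTP/1.0: server closes after the response
                    break
                chunks.append(data)
    return parse_info_response(b"".join(chunks))


def parse_info_response(raw: bytes) -> dict:
    """Split a raw HTTP/1.0 response and decode its JSON body."""
    head, _, body = raw.partition(b"\r\n\r\n")
    status_line = head.split(b"\r\n", 1)[0].decode("ascii")
    if " 200 " not in status_line + " ":
        raise RuntimeError(f"unexpected status: {status_line}")
    return json.loads(body)
```

If `load_cert_chain` raises `ssl.SSLError` before any connection is made, the problem is the local certificate or key file, not the network.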
colby_yea I only see 2 logged failures with error 11123:16
colby_or rather there are only 2 log entries about it both failing with 11123:16
colby_in the worker logs23:16
johnsomCan you pastebin the controller log from before it starts trying to plug the VIP until it errors or, 2-3 screens worth of the WARNING messages?23:16
colby_the controller worker log right?23:17
johnsomyes please23:17
colby_https://pastebin.com/qBKpnQeF23:20
colby_I removed the private key from the logs :-)23:20
colby_I have to run but please let me know if you notice anything23:21
johnsomSure, NP. Darn it, I had those hidden, but we must have changed the task names23:21
johnsomcolby_ Yeah, you have a bad cert or config on the controller side23:22
abaindurhow do you know? (ps, no affiliation with colby, just curious)23:23
johnsomError: [('PEM routines', 'PEM_read_bio', 'no start line'), ('SSL routines', 'SSL_CTX_use_PrivateKey_file', 'PEM lib')]23:23
johnsomThe private key file is bad, missing, or the passphrase is wrong23:24
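`PEM_read_bio ... no start line` means OpenSSL did not find the PEM `-----BEGIN ...-----` line it expected, in this case for a private key. A rough, text-only check of the controller-side file is sketched below. It assumes the file is meant to contain both the client certificate and its key. It cannot verify a passphrase; it can only detect that the key is encrypted. `describe_pem` and `key_problem` are illustrative names.

```python
import re
from typing import Optional

# Matches PEM header lines such as "-----BEGIN RSA PRIVATE KEY-----".
_BEGIN = re.compile(r"^-----BEGIN ([A-Z0-9 ]+)-----\s*$", re.MULTILINE)


def describe_pem(text: str) -> dict:
    """Count certificate and private-key blocks and flag encrypted keys."""
    labels = _BEGIN.findall(text)
    keys = [label for label in labels if label.endswith("PRIVATE KEY")]
    return {
        "certificates": labels.count("CERTIFICATE"),
        "private_keys": len(keys),
        # PKCS#8 encrypted keys use their own label; traditional OpenSSL keys
        # carry a "Proc-Type: 4,ENCRYPTED" header instead.
        "key_encrypted": ("ENCRYPTED PRIVATE KEY" in keys
                          or "Proc-Type: 4,ENCRYPTED" in text),
    }


def key_problem(text: str) -> Optional[str]:
    """Return a short description of an obvious key problem, or None."""
    info = describe_pem(text)
    if info["private_keys"] == 0:
        return "no PRIVATE KEY block: OpenSSL would report 'no start line'"
    if info["key_encrypted"]:
        return "private key is encrypted: a passphrase must be supplied"
    return None
```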
*** Swami has quit IRC23:45

Generated by irclog2html.py 2.15.3 by Marius Gedminas - find it at mg.pov.lt!