*** rcernin_ has joined #openstack-lbaas | 00:02 | |
*** rcernin has quit IRC | 00:03 | |
*** rcernin has joined #openstack-lbaas | 00:29 | |
*** rcernin has quit IRC | 00:29 | |
*** rcernin has joined #openstack-lbaas | 00:30 | |
abaindur | BTW it would be nice if we could specify the network names/sec groups/flavors by name as we can with the ssh key and image tag | 00:31 |
*** rcernin_ has quit IRC | 00:32 | |
*** longkb has joined #openstack-lbaas | 00:37 | |
*** longkb has quit IRC | 00:57 | |
*** korean101 has quit IRC | 00:57 | |
*** korean101 has joined #openstack-lbaas | 01:04 | |
*** abaindur has quit IRC | 01:42 | |
*** abaindur has joined #openstack-lbaas | 01:42 | |
*** abaindur has quit IRC | 01:45 | |
*** abaindur has joined #openstack-lbaas | 01:46 | |
*** hongbin has joined #openstack-lbaas | 02:01 | |
*** longkb has joined #openstack-lbaas | 02:12 | |
*** hongbin has quit IRC | 03:39 | |
*** abaindur has quit IRC | 04:06 | |
*** abaindur has joined #openstack-lbaas | 04:06 | |
*** KeithMnemonic has quit IRC | 04:08 | |
*** celebdor has joined #openstack-lbaas | 04:19 | |
*** abaindur has quit IRC | 04:43 | |
*** abaindur has joined #openstack-lbaas | 04:45 | |
*** abaindur has quit IRC | 05:25 | |
*** pcaruana has joined #openstack-lbaas | 06:44 | |
*** rcernin has quit IRC | 07:02 | |
*** openstackgerrit has joined #openstack-lbaas | 07:07 | |
openstackgerrit | Yang JianFeng proposed openstack/python-octaviaclient master: Add l7policy and l7rule to octavia quota https://review.openstack.org/591568 | 07:07 |
*** salmankhan has joined #openstack-lbaas | 07:29 | |
*** salmankhan has quit IRC | 07:33 | |
openstackgerrit | Yang JianFeng proposed openstack/python-octaviaclient master: Add l7policy and l7rule to octavia quota https://review.openstack.org/591568 | 07:55 |
*** openstackstatus has quit IRC | 08:12 | |
*** ktibi has joined #openstack-lbaas | 08:13 | |
openstackgerrit | Min Sun proposed openstack/octavia-dashboard master: Cannot update ssl certificate when update listener https://review.openstack.org/550313 | 08:14 |
openstackgerrit | Yang JianFeng proposed openstack/octavia master: [WIP] Add quota support to octavia's l7policy and l7rule https://review.openstack.org/590620 | 08:36 |
*** salmankhan has joined #openstack-lbaas | 09:10 | |
*** salmankhan1 has joined #openstack-lbaas | 09:15 | |
*** salmankhan has quit IRC | 09:16 | |
*** salmankhan1 is now known as salmankhan | 09:16 | |
openstackgerrit | Yang JianFeng proposed openstack/octavia master: Add quota support to octavia's l7policy and l7rule https://review.openstack.org/590620 | 09:22 |
*** openstackstatus has joined #openstack-lbaas | 09:41 | |
*** ChanServ sets mode: +v openstackstatus | 09:41 | |
*** yboaron has joined #openstack-lbaas | 09:46 | |
openstackgerrit | Yang JianFeng proposed openstack/octavia master: Add quota support to octavia's l7policy and l7rule https://review.openstack.org/590620 | 09:49 |
*** yboaron_ has joined #openstack-lbaas | 10:02 | |
*** yboaron has quit IRC | 10:05 | |
*** yboaron_ has quit IRC | 10:38 | |
*** yboaron_ has joined #openstack-lbaas | 10:59 | |
*** savvas has joined #openstack-lbaas | 11:56 | |
*** amuller has joined #openstack-lbaas | 12:08 | |
*** amuller has quit IRC | 12:08 | |
*** longkb has quit IRC | 12:13 | |
mnaser | rm_work: it is a processpoolexecutor | 12:27 |
mnaser | http://paste.openstack.org/show/728008/ => ps auxf output | 12:28 |
*** celebdor has quit IRC | 12:28 | |
mnaser | rm_work: http://paste.openstack.org/show/728009/ output of how many threads are up | 12:29 |
mnaser | it looks like health manager is only 2 processes | 12:31 |
mnaser | and i shouldnt have 20? | 12:31 |
mnaser | yeah all my other deploys are 2 processes | 12:32 |
mnaser | 61 more processes | 12:34 |
savvas | hi johnsom how are you today | 12:36 |
mnaser | ok according to lsof, it's the udp listener that ends up with all these threads | 12:36 |
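For reference, a minimal shell sketch of the check being run here (process name pattern and the default heartbeat port 5555 are assumptions, not taken from the pastes above):
    # count octavia-health-manager processes and their threads
    ps auxf | grep '[o]ctavia-health-manager'
    ps -eLf | grep '[o]ctavia-health-manager' | wc -l
    # see which process owns the UDP heartbeat socket
    sudo lsof -nP -iUDP:5555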
savvas | do you have any idea why this would (consistently happen)? http://paste.openstack.org/show/728010/ | 12:36 |
cgoncalves | savvas, multiple images with same name. xgerman_ could probably help you better as that seems to be ansible-octavia | 12:54 |
savvas | he's on vacation :) | 12:55 |
savvas | he recommended I reach out to johnsom | 12:55 |
cgoncalves | I could have a look at ansible-octavia but not just right now, sorry | 12:55 |
savvas | that's alright, if you do get a chance let me know | 12:56 |
savvas | the playbook successfully completes a second time but skips that part. It also doesn't seem to create the api endpoints properly | 12:57 |
savvas | I get a response on port 9876 but /v2.0/lbaas/loadbalancers leads to a not found page | 12:57 |
mnaser | i have a feeling that this is the root cause -- https://github.com/openstack/octavia/commit/98484332c4ff7deeebd749d0f5cd36b7679cd1bc#diff-0fb9e31059c066ef1f145f6a872f6993 | 13:08 |
*** amuller has joined #openstack-lbaas | 13:34 | |
devfaz | mnaser: are you using ubuntu? | 13:41 |
mnaser | devfaz: ubuntu amphoras, centos controller | 13:41 |
devfaz | mnaser: had a similar issue with the ubuntu python pkg of octavia. Try to use only "pip install ..." without any packaged versions | 13:42 |
mnaser | devfaz: how many octavia-health-manager processes do you have | 13:42 |
cgoncalves | savvas, I thought that task ("Get curremt image id") to be from openstack-ansible-os_octavia but I can't find any references | 13:42 |
devfaz | https://pastebin.com/MH7BzVPp | 13:43 |
mnaser | devfaz: so it looks like there's only two processes | 13:47 |
mnaser | which means you're probably running pre that commit | 13:47 |
devfaz | %prog 3.0.0.0b4.dev19 | 13:48 |
savvas | yes cgoncalves, for me that gets executed when running os-octavia-install.yml | 13:48 |
*** KeithMnemonic has joined #openstack-lbaas | 13:48 | |
savvas | which is part of Openstack Ansible | 13:48 |
devfaz | mnaser: docker-image was built 5 days ago from master. | 13:49 |
devfaz | mnaser: I will rebuild with current master. give me a minute. | 13:49 |
mnaser | devfaz: thats weird, i have a ton more processes.. | 13:49 |
mnaser | this machine has 40 cores though | 13:49 |
savvas | http://paste.openstack.org/show/728014/ this is the playbook cgoncalves | 13:49 |
devfaz | mnaser: python -m pip install -c https://raw.githubusercontent.com/openstack/requirements/master/upper-constraints.txt -r requirements.txt | 13:49 |
devfaz | mnaser: try this to update/fix your python-module dep. | 13:50 |
mnaser | i cant really just do that, this is a deployment done via openstack ansible | 13:50 |
devfaz | took me a while to detect this. | 13:50 |
mnaser | so i would need to make sure it happens cleanly | 13:50 |
*** fnaval has joined #openstack-lbaas | 13:56 | |
cgoncalves | savvas, ok, so I see that is on stable/queens or earlier. things have changed substantially in recent master | 14:14 |
cgoncalves | savvas, the os_image_facts ansible module can only return facts of a single image and it is confused because there are more than one with that image name | 14:15 |
cgoncalves | probably openstack-ansible-os_octavia is missing a task that deletes the existing amphora image before uploading a new one | 14:17 |
cgoncalves | certainly os_image_facts could be extended to allow retrieving latest created image (and retrieval by tag name) | 14:18 |
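A sketch of the failing pattern cgoncalves describes (module name from the discussion, variable name hypothetical): os_image_facts resolves exactly one image, so a lookup by name breaks once two amphora images share that name:
    - name: Get current image id
      os_image_facts:
        image: "{{ amphora_image_name }}"   # ambiguous when more than one image has this name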
cgoncalves | I have an idea... borrow code from octavia-undercloud role in tripleo-common :) | 14:21 |
savvas | yee I was trying to let this run as much out of the box as possible, since we want to use and maintain it in multiple locations | 14:27 |
cgoncalves | savvas, there's a task at the end to delete old image from glance. how did you end up with multiple ones? | 14:34 |
*** ktibi has quit IRC | 14:57 | |
*** yboaron_ has quit IRC | 15:17 | |
*** savvas has quit IRC | 15:36 | |
*** pcaruana has quit IRC | 16:02 | |
*** rpittau has quit IRC | 16:09 | |
openstackgerrit | Michal Rostecki proposed openstack/octavia master: devstack: Define packages for (open)SUSE https://review.openstack.org/591774 | 16:17 |
openstackgerrit | Merged openstack/octavia-dashboard master: Cannot update ssl certificate when update listener https://review.openstack.org/550313 | 16:22 |
*** Swami has joined #openstack-lbaas | 17:10 | |
colby_ | Do I need to change the network driver to work with neutron? Should it be: allowed_address_pairs_driver or noop? My setup is not creating the vip port so Im trying to troubleshoot | 17:30 |
rm_work | mnaser: hmmm, yeah umm, do you run neutron-lbaas also or just octavia? is the status update queue possibly turned on by accident? | 17:53 |
mnaser | rm_work: octavia only | 17:53 |
rm_work | yeah so in that case | 17:53 |
mnaser | what is the status update queue? | 17:54 |
rm_work | definitely check config to make sure that's off | 17:54 |
mnaser | doesnt health-manager only take care of udp polls? | 17:54 |
mnaser | it looks like it spawns two processes, one that does udp and the other one that picks up the requests | 17:54 |
rm_work | the event stream | 17:55 |
rm_work | let me see where it's configured | 17:55 |
rm_work | https://github.com/openstack/octavia/blob/stable/queens/etc/octavia.conf#L77-L83 | 17:55 |
rm_work | that can get turned on accidentally by some deployment tools i think | 17:55 |
rm_work | if they once assumed neutron-lbaas support | 17:55 |
rm_work | and it will cause issues | 17:55 |
rm_work | similar to what you're seeing | 17:56 |
rm_work | but it's a longshot maybe | 17:56 |
rm_work | if that IS enabled tho, then that'd be an easy fix :) | 17:56 |
mnaser | rm_work: hm, its enabled with osa octavia | 18:00 |
mnaser | octavia_sync_provisioning_status that is | 18:00 |
rm_work | what about the driver | 18:01 |
mnaser | octavia_event_streamer: False and octavia_sync_provisioning_status: false ? | 18:01 |
mnaser | what do you mean what about the driver? | 18:01 |
rm_work | event_streamer_driver = noop_event_streamer | 18:01 |
rm_work | or queue_event_streamer | 18:01 |
mnaser | what does the event_streamer_driver control | 18:01 |
rm_work | badness | 18:02 |
rm_work | look in the config that's provisioned | 18:02 |
rm_work | on one of your HM controllers | 18:02 |
rm_work | and see if event_streamer_driver is set | 18:02 |
rm_work | I THINK the `sync_provisioning_status` option is ignored as long as it's on the noop driver | 18:02 |
rm_work | but if you are on the `queue_event_streamer` then that *is* a problem, fix that and prolly it'll resolve itself | 18:03 |
rm_work | you may also be wrecking some poor RMQ somewhere lol | 18:03 |
rm_work | normally it is used to stream status change events to neutron-lbaas, but if you don't have neutron-lbaas, it is problematic | 18:03 |
rm_work | it writes to a queue that never empties and everything goes to shit T_T | 18:04 |
rm_work | if they are both False tho, prolly it's not that | 18:04 |
*** salmankhan has quit IRC | 18:07 | |
johnsom | mnaser sync_provisioning_status is your problem, shut that off. German made a mistake in the OSA role | 18:08 |
johnsom | colby_ It needs to be allowed_address_pairs to have things created in neutron. | 18:10 |
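For reference, the controller-side driver settings being discussed look roughly like this in octavia.conf (a sketch of the usual non-noop values; check the sample config shipped with your release):
    [controller_worker]
    network_driver = allowed_address_pairs_driver   # create real VIP ports in neutron
    compute_driver = compute_nova_driver
    amphora_driver = amphora_haproxy_rest_driver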
rm_work | yeah waiting for mnaser to tell me if the driver is actually enabled on his HM's config | 18:11 |
johnsom | Even with the noop event driver, that sync setting will do bad things | 18:12 |
rm_work | ah i thought it was ignored | 18:12 |
johnsom | Well, I saw it right away in that deployment: he had the noop event driver, but left that provisioning sync on in the role and it hosed up the HM. | 18:12 |
rm_work | where did mnaser go lol :P | 18:13 |
mnaser | in a call, one sec :p | 18:14 |
johnsom | I don't know how he does everything he does as it is.... | 18:14 |
rm_work | lol | 18:14 |
rm_work | right | 18:14 |
colby_ | johnsom Thanks. So Im seeing some weirdness. So when I create the load balancer Im specifying the subnet_id and the network_id. But the ip address that it comes up with is not part of the subnet | 18:15 |
colby_ | 2018-08-14 18:00:46.018 38226 DEBUG octavia.controller.worker.controller_worker [-] Task 'octavia.controller.worker.tasks.network_tasks.AllocateVIP' (bdfcf0dc-a4b2-4f59-a34d-ca083ef79ba7) transitioned into state 'RUNNING' from state 'PENDING' _task_receiver /usr/lib/python2.7/site-packages/taskflow/listeners/logging.py:194 | 18:16 |
colby_ | 2018-08-14 18:00:46.018 38226 DEBUG octavia.controller.worker.tasks.network_tasks [-] Allocate_vip port_id ed4220ca-2860-4597-bd0c-1448e0b49d0a, subnet_id 0f09cbd9-711d-4582-ac02-aa64ccd038f2,ip_address 198.51.100.1 execute /usr/lib/python2.7/site-packages/octavia/controller/worker/tasks/network_tasks.py:323 | 18:16 |
colby_ | 2018-08-14 18:00:46.018 38226 INFO octavia.network.drivers.neutron.allowed_address_pairs [-] Port ed4220ca-2860-4597-bd0c-1448e0b49d0a already exists. Nothing to be done. | 18:16 |
colby_ | 2018-08-14 18:00:46.212 38226 DEBUG neutronclient.v2_0.client [-] Error message: {"NeutronError": {"message": "Port ed4220ca-2860-4597-bd0c-1448e0b49d0a could not be found.", "type": "PortNotFound", "detail": ""}} _handle_fault_response /usr/lib/python2.7/site-packages/neutronclient/v2_0/client.py:258 | 18:16 |
johnsom | colby_ Do you get the right thing with just specifying subnet_id? | 18:16 |
colby_ | no It still gets the same weird ip | 18:17 |
johnsom | Are you using neutron-lbaas or native Octavia? | 18:17 |
colby_ | native octavia | 18:17 |
johnsom | Good. Hmmm, which version of Octavia? | 18:17 |
colby_ | pike: 1.0.2-1 (centos) | 18:18 |
colby_ | the ip on that subnet should be 192.168.2.0/24 | 18:19 |
johnsom | On create are you specifying an IP or just the network/subnet? | 18:21 |
colby_ | tried just subnet and subnet & network | 18:22 |
colby_ | like so: openstack loadbalancer create --name lb15 --vip-subnet-id 0f09cbd9-711d-4582-ac02-aa64ccd038f2 --vip-network-id aed09c75-74e0-447a-936e-7362ad77597b | 18:23 |
johnsom | Ok, can you pastebin a "openstack subnet show 0f09cbd9-711d-4582-ac02-aa64ccd038f2" and "openstack network show aed09c75-74e0-447a-936e-7362ad77597b"? | 18:24 |
colby_ | sure | 18:24 |
colby_ | https://pastebin.com/9qUg8zSq | 18:26 |
johnsom | When you first ran the load balancer create command, did the return data show that IP as well? 198.51.100.1 | 18:29 |
colby_ | yes | 18:29 |
colby_ | does the api use noop? just seeing that in the logs: 2018-08-14 18:00:32.754 1350542 DEBUG octavia.network.drivers.noop_driver.driver [req-c15a1e74-74a1-48f4-878d-7080e150ea7b e28435e0a66740968c523e6376c57f68 18882d9c32ba42aeaa33c4703ad84b2c - default default] Network NoopManager no-op, get_network network_id aed09c75-74e0-447a-936e-7362ad77597b get_network /usr/lib/python2.7/site-packages/octavia/network/drivers/noop_driver/driver.py:119 | 18:31 |
johnsom | That is your problem. You are getting a "fake" IP from the noop driver. No-Op means it does no operations against other services, such as neutron/nova/glance/etc. | 18:32 |
johnsom | I was just opening a window to see if that was a no-op range | 18:32 |
colby_ | oh I did not see a driver option for API config just worker | 18:32 |
rm_work | it's all the same config | 18:32 |
rm_work | you should be using an identical config for all processes, really | 18:33 |
colby_ | ah gotcha. | 18:33 |
colby_ | ok that makes it more complicated since I use the puppet modules. At least I know what I need to do now though | 18:33 |
johnsom | Yeah, we did not do a good job on that section of the config, it is confusing that other processes use stuff in the "controller_worker" section | 18:34 |
rm_work | hmm, which puppet modules? | 18:34 |
colby_ | the openstack ones | 18:35 |
rm_work | is there something which has a bad idea about how to do octavia config deployment? | 18:35 |
colby_ | I just need to include with the api server and NOT manage packages and not enable the service | 18:35 |
rm_work | openstack has puppet modules? i guess i am out of the loop lol | 18:35 |
rm_work | i thought everyone used ansible | 18:35 |
colby_ | haha yea. We use puppet/foreman for all our deployment | 18:36 |
*** sapd1 has quit IRC | 18:38 | |
mnaser | ok | 18:58 |
mnaser | im done | 18:58 |
colby_ | Ok progress. Its creating the port on the correct subnet/network now. But its not creating a port on the octavia management net. Ill dig some more on that | 18:58 |
mnaser | johnsom, rm_work: ok i'm checking if its enabled now | 18:59 |
mnaser | event_streamer_driver = queue_event_streamer | 19:03 |
mnaser | sync_provisioning_status = True | 19:03 |
johnsom | Ouch, double trouble | 19:04 |
rm_work | yeah that's it | 19:04 |
rm_work | noop and False plz | 19:04 |
johnsom | mnaser Use the defaults: https://github.com/openstack/octavia/blob/master/etc/octavia.conf#L91 | 19:04 |
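The defaults johnsom points to amount to this in octavia.conf (sketch; see the linked sample config for the authoritative section and comments):
    [health_manager]
    event_streamer_driver = noop_event_streamer   # only use queue_event_streamer with neutron-lbaas
    sync_provisioning_status = False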
johnsom | With both of those on I am surprised you made it this long.... | 19:05 |
mnaser | lol | 19:05 |
mnaser | octavia_sync_provisioning_status/octavia_event_streamer both True | 19:05 |
mnaser | in stable/queens | 19:05 |
mnaser | should we fix that? | 19:05 |
rm_work | default?! | 19:06 |
rm_work | mnaser: i think they have always defaulted to False ... | 19:06 |
rm_work | since basically ever | 19:06 |
rm_work | it must be whatever deployment system | 19:07 |
mnaser | https://github.com/openstack/openstack-ansible-os_octavia/blob/stable/queens/defaults/main.yml#L281-L286 | 19:07 |
mnaser | negatory | 19:07 |
mnaser | for openstack ansible, sorry | 19:07 |
rm_work | yeah | 19:07 |
rm_work | so yes, as johnsom was saying, I guess a mistake was made setting that up in OSA | 19:07 |
rm_work | if we can fix it, maybe we should | 19:07 |
johnsom | I thought German did fix those in OSA, but if they are still there, yes, it should be fixed | 19:07 |
mnaser | i kinda wanna push a stable/pike only fix | 19:07 |
rm_work | well, I think in Pike most people ran it behind n-lbaas still | 19:08 |
rm_work | though technically it can run standalone | 19:08 |
mnaser | amateurs | 19:08 |
mnaser | :P | 19:08 |
rm_work | lol | 19:08 |
mnaser | https://github.com/openstack/openstack-ansible-os_octavia/commit/c8f4f275a1e9968b1f049c022639adadc4333b63 | 19:08 |
mnaser | this commit added it | 19:08 |
rm_work | hey I still think ALL ya'll are amateurs, for running stable branches :P | 19:08 |
johnsom | Yeah, it could be that pike OSA still defaults to neutron-lbaas | 19:08 |
mnaser | i think what i want to do is | 19:08 |
mnaser | if octavia_v1 is false and octavia_v2 is true then octavia_event_streamer/octavia_sync_provisioning_status = false else true | 19:09 |
rm_work | yeah | 19:09 |
rm_work | i was gonna suggest something like that | 19:09 |
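A minimal sketch of what mnaser is proposing for the role defaults (not the actual patch; variable names taken from the discussion): keep the event streamer and provisioning status sync on only when the v1 (neutron-lbaas) path is still enabled:
    octavia_event_streamer: "{{ octavia_v1 | bool or not (octavia_v2 | bool) }}"
    octavia_sync_provisioning_status: "{{ octavia_v1 | bool or not (octavia_v2 | bool) }}"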
johnsom | egad, they're both true in queens too. | 19:09 |
johnsom | mnaser +1 | 19:09 |
johnsom | Well, +2, I'm a core there | 19:09 |
mnaser | sorry | 19:09 |
mnaser | i meant stable/queens | 19:10 |
mnaser | i assume queens didnt default to v2 only | 19:10 |
mnaser | in osa | 19:10 |
johnsom | Yep, it looks like queens has v1 enabled | 19:10 |
mnaser | ill do the patch that enables it | 19:10 |
mnaser | and backport it through | 19:11 |
johnsom | Thanks. | 19:11 |
mnaser | johnsom: https://review.openstack.org/591829 Enable event streamer and provisioning status sync for V1 API | 19:17 |
mnaser | rm_work: ^ | 19:17 |
jiteka | Hey guys, I'm still trying to get my LB working in Queens using ubuntu based amphora image xenial | 19:34 |
jiteka | Problem : I'm getting "Failed to bring up eth1" when octavia-worker contact amphora Rest API on /0.5/plug/vip/<vip address> | 19:34 |
jiteka | I disabled revert with taskflow to get a chance to ssh and troubleshoot the amphora, and I realised 2 things | 19:34 |
jiteka | 1. BACKUP amphora node is not yet configured when the LB failed to be created due to that ifup problem on MASTER | 19:34 |
jiteka | and when I'm doing my cURL I just get {"message":"OK","details":"VIP 10.63.68.30 plugged on interface eth1"} | 19:34 |
jiteka | amphora-agent : ::ffff:10.63.8.9 - - [14/Aug/2018:17:16:41 +0000] "POST /0.5/plug/vip/10.63.68.30 HTTP/1.1" 202 70 "-" "curl/7.29.0" | 19:34 |
jiteka | 2. The MASTER amphora node that failed is left half-configured, so to avoid {"message":"Interface already exists"} on the cURL to /plug/vip | 19:34 |
jiteka | I deleted the netns amphora-agent then tried again and got a 202 OK too on that node | 19:34 |
jiteka | But when it fail the first time on the master, interface cfg for eth1 look like this : | 19:34 |
jiteka | http://paste.openstack.org/show/728039/ | 19:34 |
jiteka | and when it works on the cURL command it looks like this : | 19:34 |
jiteka | http://paste.openstack.org/show/728040/ | 19:34 |
jiteka | eth1 is no longer static but getting its config from dhcp | 19:34 |
jiteka | Any idea or suggestion why this is happening? | 19:34 |
johnsom | jiteka Did you get a chance to do the line by line test I recommended before my vacation? | 19:41 |
johnsom | My guess is that overlapping host route is not making ubuntu happy | 19:41 |
jiteka | not sure how to proceed | 19:41 |
johnsom | Ok, let me give you steps | 19:42 |
johnsom | In the amphora: | 19:42 |
johnsom | ip netns exec amphora-haproxy bash | 19:42 |
johnsom | (you need to be root) | 19:42 |
jiteka | ok | 19:42 |
johnsom | ifconfig eth1:0 down | 19:42 |
jiteka | I remember you showed me that | 19:42 |
johnsom | ifconfig eth1 down | 19:42 |
johnsom | Those may fail, that is ok | 19:42 |
johnsom | Then ifconfig eth1 up | 19:43 |
johnsom | ifconfig eth1:0 up | 19:43 |
johnsom | One or more of those might fail too, that is also ok | 19:43 |
johnsom | Then go through the extra config lines for the host routes: | 19:43 |
johnsom | ip route add -net 169.254.169.254/32 gw 10.63.68.2 dev eth1 | 19:44 |
jiteka | eth1:0 fail yes : | 19:44 |
jiteka | Cannot assign requested address | 19:44 |
johnsom | ip route add -net 198.74.40.0/24 gw 10.63.68.2 dev eth1 | 19:44 |
johnsom | ip route add -net 10.0.0.0/8 gw 10.63.68.2 dev eth1 | 19:44 |
johnsom | ip route add -net 172.16.0.0/12 gw 10.63.68.2 dev eth1 | 19:44 |
jiteka | Error: ??? prefix is expected rather than "-net". | 19:44 |
johnsom | Oh, right, those are the other command syntax, one second | 19:44 |
johnsom | remove the "ip" from the front of those | 19:45 |
johnsom | route add -net 12.148.80.0/22 gw 10.63.68.2 dev eth1 | 19:45 |
johnsom | ^^^ that was the last in the top list | 19:45 |
johnsom | /sbin/ip route add 10.63.68.0/22 dev eth1 src 10.63.68.30 scope link table 1 | 19:45 |
johnsom | /sbin/ip route add default via 10.63.68.1 dev eth1 onlink table 1 | 19:45 |
jiteka | Network is unreachable | 19:45 |
johnsom | /sbin/ip route add 169.254.169.254/32 via 10.63.68.2 dev eth1 onlink table 1 | 19:46 |
johnsom | Which line? | 19:46 |
jiteka | 12.148.80.0/22 | 19:46 |
jiteka | /sbin/ip route add default via 10.63.68.1 dev eth1 onlink table 1 (OK) | 19:47 |
johnsom | Ok, darn, the interface must not have been up | 19:47 |
johnsom | /sbin/ip route add 198.74.40.0/24 via 10.63.68.2 dev eth1 onlink table 1 | 19:47 |
johnsom | /sbin/ip route add 10.0.0.0/8 via 10.63.68.2 dev eth1 onlink table 1 | 19:47 |
johnsom | /sbin/ip route add 172.16.0.0/12 via 10.63.68.2 dev eth1 onlink table 1 | 19:47 |
johnsom | /sbin/ip route add 12.148.80.0/22 via 10.63.68.2 dev eth1 onlink table 1 | 19:48 |
johnsom | /sbin/ip rule add from 10.63.68.30/32 table 1 priority 100 | 19:48 |
johnsom | My suspicion is that last line is going to fail | 19:48 |
jiteka | http://paste.openstack.org/show/728041/ | 19:49 |
jiteka | no hard fail here | 19:49 |
jiteka | at least not visible in command output | 19:50 |
johnsom | Hmmm, ok, so it must be in the first block. | 19:50 |
jiteka | /sbin/ip route list | 19:51 |
jiteka | it should give me the list of routes I just added, right? | 19:51 |
johnsom | Except that top block where you got that error about network not reachable. | 19:51 |
johnsom | Did you get that on all of the lines or just the first one? | 19:51 |
jiteka | http://paste.openstack.org/show/728042/ | 19:53 |
jiteka | invalid argument on : | 19:53 |
jiteka | /sbin/ip route add 10.63.68.0/22 dev eth1 src 10.63.68.30 scope link table 1 | 19:53 |
johnsom | Can you do an openstack subnet show again for that subnet? I want to look at the host routes configured there | 19:57 |
jiteka | sure | 19:57 |
jiteka | doing it now | 19:58 |
jiteka | http://paste.openstack.org/show/728044/ | 19:58 |
johnsom | Well, I'm not sure what the issue is. I think it's the overlapping host route, but not sure. | 20:01 |
johnsom | Let me build a devstack, create a matching subnet and give it a go locally so I can poke at this without the IRC latency. grin | 20:01 |
*** amuller has quit IRC | 20:02 | |
jiteka | johnsom: thanks, I will be on my way home soon but will resume on this on thursday (bank holiday tomorrow in France) | 20:02 |
johnsom | Ok, if you have an open story for this I can post my results there. | 20:03 |
johnsom | https://storyboard.openstack.org/#!/dashboard/stories | 20:03 |
johnsom | It will take me about a half hour to stack and get that subnet configured, etc. | 20:03 |
jiteka | ahaha yes I know that devstack is taking time | 20:04 |
jiteka | I anticipated that :D | 20:04 |
colby_ | will disable_revert = true leave the amphora in place on error? The worker is having trouble connecting to the amphora on port 9443. Im able to ssh to the instance but it gets destroyed right away before I can troubleshoot. | 20:05 |
jiteka | hmm what could I put here on that story as "Task title" | 20:07 |
johnsom | colby_ Yes. You can also stop the health manager process to stop it from being failed over to a new vm | 20:08 |
johnsom | jiteka "load balancer create fails with host routes" | 20:08 |
johnsom | Should do the trick | 20:08 |
jiteka | https://storyboard.openstack.org/#!/story/2003441 | 20:10 |
johnsom | Thank you. I will post my findings there | 20:10 |
jiteka | I think also that I will be in that configuration : | 20:13 |
jiteka | https://bugs.launchpad.net/octavia/+bug/1488279 | 20:13 |
openstack | Launchpad bug 1488279 in octavia "Amphora fails with member on same subnet as vip" [Critical,Fix released] | 20:13 |
jiteka | because I'm trying to get my VIP on the same subnet as future listeners | 20:13 |
jiteka | I don't see that issue yet | 20:13 |
jiteka | but I see that status is "fixed released" | 20:14 |
jiteka | fix | 20:14 |
johnsom | Yeah, that bug was fixed a long time ago. | 20:16 |
jiteka | colby_: it actually leaves it in PENDING_CREATE, so afterwards when I need to delete it I have to go into the DB and update the octavia.load_balancer row | 20:16 |
jiteka | update load_balancer set provisioning_status = 'ERROR' where id = '<lb_uuid>' | 20:16 |
jiteka | colby_: the API prevents you from deleting a lb with provisioning status = PENDING_CREATE | 20:17 |
johnsom | PENDING_CREATE will go to ERROR once the retries have given up. It's a configuration setting. The default is super long for folks using bad hypervisors like virtualbox. They will either go to ERROR or ACTIVE. | 20:17 |
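The retry window johnsom refers to is controlled by options like these (a sketch of the usual octavia.conf defaults; verify against your release):
    [haproxy_amphora]
    connection_max_retries = 300      # attempts before the flow gives up and reverts/marks ERROR
    connection_retry_interval = 5     # seconds between attempts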
jiteka | johnsom: ok that's a good news | 20:17 |
colby_ | yea Ive had to do that before. Thanks. | 20:17 |
johnsom | You have to be careful doing that with the DB as if it's in PENDING_* that means one of the controllers is actively working on that load balancer and you could get things in an inconsistent state by changing the status while a controller has ownership of the object. | 20:18 |
johnsom | You will get at least a bunch of errors in the log. | 20:19 |
colby_ | ok so looks like my problem is just a timeout maybe. I can telnet to port 9443 from the worker node now that the instance is staying. so maybe the amphora just was not up all the way when the worker is trying to connect. | 20:20 |
*** salmankhan has joined #openstack-lbaas | 20:20 | |
johnsom | colby_ With a good cloud it should take about 30 seconds to come up. Is this running on bare metal or in a hypervisor? | 20:21 |
colby_ | hypervisor | 20:21 |
johnsom | Which one? | 20:22 |
colby_ | kvm | 20:22 |
johnsom | Ok, so you are running kvm inside kvm? | 20:22 |
colby_ | oh no | 20:23 |
colby_ | its spinning up amphora on our compute nodes | 20:23 |
colby_ | compute nodes are baremetal running kvm | 20:23 |
johnsom | On the compute host, can you look in /proc/cpuinfo and see if you find vmx or svm? | 20:23 |
colby_ | vmx | 20:25 |
colby_ | these are dual Intel(R) Xeon(R) CPU E5-2697 v4 | 20:25 |
johnsom | Ok, yeah, so it should boot up and be ready in about 30 seconds. | 20:25 |
colby_ | the instance spins up quickly | 20:25 |
colby_ | I can ssh in...in less than 30 seconds | 20:26 |
johnsom | Hmm, so it's not likely the timeout unless you have changed the defaults to something low | 20:26 |
rm_work | sounds like a bad return issue like we had a long time ago | 20:51 |
rm_work | where the agent responded badly on the initial startup if you hit it too soon | 20:52 |
colby_ | I get: 2018-08-14 20:46:26.119 6118 WARNING octavia.amphorae.drivers.haproxy.rest_api_driver [-] Could not connect to instance. Retrying.: ConnectionError: HTTPSConnectionPool(host='10.10.0.7', port=9443): Max retries exceeded with url: /0.5/plug/vip/192.168.2.8 (Caused by NewConnectionError('<requests.packages.urllib3.connection.VerifiedHTTPSConnection object at 0x7fa4c86c1650>: Failed to establish a new connection: [Errno 111] Connection refused | 20:52 |
colby_ | ',)) | 20:52 |
colby_ | Here are the logs of the amphora-agent: https://pastebin.com/gXXPdPch | 20:53 |
colby_ | the timestamps are about the same so maybe its taking longer to get the agent up than the worker is expecting | 20:54 |
johnsom | This is not normal: [CRITICAL] WORKER TIMEOUT | 20:54 |
johnsom | It sounds like gunicorn is crashing | 20:54 |
colby_ | it starts another worker after that | 20:55 |
colby_ | and that one doesn't crash | 20:55 |
rm_work | yeah so prolly the initial one crashes during the request | 20:55 |
rm_work | and that kills it | 20:55 |
rm_work | because the controller GETS something, and it's bad | 20:55 |
johnsom | Hmm, I don't see those | 20:55 |
colby_ | sorry I didn't paste those. Ill redo | 20:55 |
rm_work | yeah crashes or hangs | 20:56 |
rm_work | i wonder if it's handing on the config action | 20:56 |
rm_work | *hanging | 20:56 |
rm_work | like, on an inteface up or something | 20:56 |
colby_ | https://pastebin.com/Zt0vShuJ | 20:56 |
colby_ | the ssl negotiation errors are me testing via telnet and curl from the worker | 20:57 |
rm_work | the issue is almost certainly that timeout tho | 20:57 |
colby_ | oops pastebin had a captcha I didn't see | 20:58 |
colby_ | now that link should work | 20:58 |
rm_work | we only try repeatedly to get a connection if we get a timeout/refused... if we get like a partial data return that's broken, we don't retry | 20:58 |
rm_work | we bail and recycle the amp | 20:58 |
rm_work | it worked, i just had to captcha :P | 20:58 |
colby_ | How can I test using curl (and using the certs) to see what kind of response I get | 21:01 |
rm_work | i don't have the command handy, but i remember basically i just had to go through the curl manpage and pick all the cert related stuff | 21:02 |
colby_ | sorry yea I meant which cert exactly do I need to connect to the amphora. I have split CA certs setup | 21:02 |
rm_work | oh | 21:02 |
rm_work | you validate using the cert that is the CA used to create the amp cert | 21:03 |
rm_work | IIRC | 21:03 |
rm_work | but i've never actually done split-cert lol | 21:03 |
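A minimal curl sketch for the split-CA setup (all paths are assumptions for illustration): verify the amphora's server cert with the server CA, present the client cert/key signed by the client CA, and use /0.5/info as a safe GET target:
    curl --cacert /etc/octavia/certs/server_ca.cert.pem \
         --cert   /etc/octavia/certs/client.cert.pem \
         --key    /etc/octavia/certs/client.key.pem \
         https://10.10.0.7:9443/0.5/info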
colby_ | The method is not allowed for the requested URL | 21:14 |
colby_ | is that due to certificate issue | 21:14 |
colby_ | ? | 21:14 |
colby_ | trying to connect to: https://10.10.0.7:9443/0.5/plug/vip/192.168.2.8 | 21:14 |
*** abaindur has joined #openstack-lbaas | 21:21 | |
abaindur | johnsom: or anyone else that can answer this question for me: What exactly is the bind_ip and controller_ip_port_list in the [health_manager] for, and what is it to be set to? | 21:23 |
*** salmankhan has quit IRC | 21:23 | |
abaindur | is it the IP of the host/physical machine on which the octavia health-manager service is running? | 21:24 |
*** salmankhan has joined #openstack-lbaas | 21:24 | |
abaindur | and why is it in a list format? Is this assuming if we have health-manager running on multiple hosts, the conf value on every host should be the same and include the public IPs of every host running health-manager? | 21:25 |
colby_ | controller_ip_port_list is a list of the hosts running health_manager. The amphorae use this list to connect to them, as far as I can tell | 21:26 |
abaindur | For example we have health-manager and worker running on 2 hosts, suppose with management IP 10.4.0.2 and 10.4.0.3 (this is the IP of physical machine, not at all related to the octavia LB network) | 21:26 |
colby_ | I have 2 health managers so I have both in there | 21:26 |
abaindur | bind_ip would be different on each host - it would be 10.4.0.2 on one, 10.4.0.3 on the other | 21:27 |
colby_ | correct | 21:27 |
colby_ | the list would include both for the amphora | 21:27 |
abaindur | and "controller_ip_port_list = 10.4.0.2:5555, 10.4.0.3:5555" - on both the hosts .conf files? | 21:27 |
colby_ | I believe so (I could be wrong. That's how I did it and I don't get errors any more) | 21:28 |
abaindur | what did you also set for the heartbeat_key? | 21:28 |
abaindur | Does this really matter...? are we just supposed to set it to something random? | 21:29 |
colby_ | random key | 21:29 |
colby_ | random string rather | 21:29 |
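Putting that together, each controller's octavia.conf would carry something like this (a sketch using the example IPs from above; the heartbeat_key is any random string shared by every controller and baked into the amphorae):
    [health_manager]
    bind_ip = 10.4.0.2                                        # this host's management IP (10.4.0.3 on the other node)
    bind_port = 5555
    controller_ip_port_list = 10.4.0.2:5555, 10.4.0.3:5555    # identical list on every controller
    heartbeat_key = some-long-random-string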
abaindur | hmm ok. right now i have set the bind_ip and controller_ip_port_list to 127.0.0.1. We have it only on one host | 21:29 |
abaindur | but i dont see health-manager doing anything in the logs. | 21:30 |
abaindur | however, the loadbalancer is working | 21:30 |
abaindur | what is also strange is the provisioning_status | ACTIVE | 21:30 |
abaindur | but the operating_status shows as offline | 21:30 |
abaindur | Is this due to health-manager not doing anything, which is due to the wrong IP used for bind_ip/controller_ip_port_list (127.0.0.1)? | 21:31 |
abaindur | the LB I deployed was without a healthmonitor | 21:31 |
colby_ | Im not sure on that. Im still having trouble getting the lb to spin up successfully. Im having issues with the worker connecting to the amphora | 21:37 |
abaindur | what is the issue? | 21:39 |
colby_ | worker cant connect to the amphora | 21:40 |
abaindur | i had mentioned the LB network needs to basically be a provider network, from what i could tell | 21:40 |
colby_ | it times out. Its not a firewall issue. I can telnet to the port from the worker | 21:40 |
abaindur | since the octavia worker runs as a process on the host, it will use the physical machine's networking/routing table | 21:40 |
abaindur | so here we have a host, which needs to be able to talk ot a VM. for us, that means it cant be an isolated tenant network, it has to be a provider network | 21:41 |
colby_ | I can connect from the command line so the process should be able to just fine. I think there is something more going on | 21:41 |
colby_ | the amphora worker crashes | 21:42 |
abaindur | security group of the amphora? | 21:42 |
abaindur | We just put it on an allowall sec group that allows all traffic | 21:42 |
abaindur | is the flavor large enough? im giving my amp 1 cpu, 2 GB, and 10GB disk | 21:42 |
colby_ | no I can connect to 9443 from the worker fine. | 21:42 |
colby_ | 1cpu 1GB Ram | 21:43 |
colby_ | I was going to try curl from the worker to duplicate what it tried to do but Im having trouble doing it due to hostname mismatch | 21:44 |
colby_ | using --resolve didn't help either | 21:45 |
*** salmankhan has quit IRC | 22:04 | |
abaindur | colby_: how do you know the amphora crashes? | 22:04 |
abaindur | maybe then it is a problem with your disk image from when you built it | 22:04 |
colby_ | no it seems to happen when the worker makes the connection attempt | 22:08 |
colby_ | the amphora worker times out, dies then starts another worker | 22:09 |
colby_ | which is why I wanted to try and test a connection with curl to see if I see any issues there | 22:09 |
*** fnaval has quit IRC | 22:16 | |
colin- | what is the most direct way to determine the nova ID(s) of amphora agent for a given loadbalancer in the cli? | 22:27 |
johnsom | As an admin you can use the amphora admin API | 22:37 |
colin- | found what i needed that way, thanks johnsom | 22:48 |
abaindur | speaking of, what is the [amphora_agent] section used for? | 22:48 |
johnsom | abaindur those are settings rendered inside the amphora-agent configuration file. Ignore it in your octavia.conf | 22:52 |
johnsom | It's an oddity with how our agent is setup and oslo config. I removed them from the sample octavia.conf, but people kept putting them back, so I stopped fighting it | 22:53 |
abaindur | yea we left it blank. didnt see any issues, the LB was working | 22:57 |
abaindur | johnsom: btw did you see my msgs above about the controller_ip_port_list and our operating_status showing as OFFLINE? even though the LB is working | 23:00 |
abaindur | provisioning_status shows as ACTIVE | 23:01 |
abaindur | right now i have set the bind_ip and controller_ip_port_list to 127.0.0.1. We have it only on one host | 23:01 |
abaindur | so I suspect its because the amphora VM cant heartbeat to the health-manager? we see nothing in the health-manager logs | 23:02 |
johnsom | No I didn't see it. I have four conversations going at the same time right now | 23:02 |
johnsom | So, yeah, exactly, you are telling the amphora to send their heartbeat to themselves, so we never get status or stats from those amphora. | 23:02 |
colby_ | So any ideas on how to test the amphora connection? I got curl working but get a method not allowed on /0.5/plug/vip/192.168.2.23 | 23:05 |
abaindur | whats your .conf file look like? | 23:05 |
abaindur | Maybe its your certs? | 23:05 |
colby_ | mine is complicated since I have split CA cert (one for client and one for server) | 23:06 |
abaindur | doesnt the worker talk to the amphora over the https REST api? so if you can ping/telnet to the amphora VM from the host and its on an allow-all security group, sounds like it cant authenticate? | 23:07 |
abaindur | try the same certs...? | 23:07 |
colby_ | yes, Which is why I was trying to use curl with the certs | 23:07 |
colby_ | to mimic the worker | 23:07 |
johnsom | colby_ So, after you set all that up in the config for the new certs, did you build a new amp? Those certs are loaded at amp boot time | 23:07 |
colby_ | you mean the image? | 23:08 |
abaindur | build a new amp == build an entire new disk image? or just create a LB to spin up a new AMP | 23:08 |
johnsom | No, not the image, at "openstack loadbalancer create" time | 23:08 |
colby_ | yes Ive made many attempts to spin up new lb with the certs in place correctly | 23:09 |
abaindur | why dont you try w/ the same certs | 23:10 |
abaindur | i just ran the sample script in the repo | 23:10 |
abaindur | then scp -r the output folder onto my host | 23:10 |
johnsom | colby_ Sorry, too many conversations going on, so refocusing here | 23:10 |
johnsom | colby_ Are you getting SSL errors in the syslog or amphora-agent log on the amp? | 23:11 |
colby_ | not from when the worker tried to connect | 23:11 |
colby_ | it just had that timeout | 23:11 |
johnsom | Ok, so these "Invalid request from ip=::ffff:10.10.0.8: [SSL: SSL_HANDSHAKE_FAILURE] ssl handshake " were from your manual tests? | 23:12 |
colby_ | correct | 23:12 |
colby_ | when I use curl pointing to my certs I dont get an ssl error either | 23:13 |
johnsom | And all of your retries are "[Errno 111] Connection refused" | 23:13 |
johnsom | ? | 23:13 |
colby_ | I just get a method not allowed | 23:13 |
johnsom | Yeah, GET won't work on that path. | 23:13 |
johnsom | If you do /info you would get data | 23:13 |
johnsom | oh, well 0.5/info | 23:14 |
colby_ | {"haproxy_version":"1.6.3-1ubuntu0.1","api_version":"0.5","hostname":"amphora-eb2ba854-d38f-4100-b5dc-578c1db556ac"} | 23:15 |
colby_ | so thats working from curl | 23:15 |
johnsom | So, yeah, you can get there just fine... | 23:15 |
colby_ | yea I only see 2 logged failures with error 111 | 23:16 |
colby_ | or rather there are only 2 log entries about it both failing with 111 | 23:16 |
colby_ | in the worker logs | 23:16 |
johnsom | Can you pastebin the controller log from before it starts trying to plug the VIP until it errors or, 2-3 screens worth of the WARNING messages? | 23:16 |
colby_ | the controller worker log right:? | 23:17 |
johnsom | yes please | 23:17 |
colby_ | https://pastebin.com/qBKpnQeF | 23:20 |
colby_ | I removed the private key from the logs :-) | 23:20 |
colby_ | I have to run but please let me know if you notice anything | 23:21 |
johnsom | Sure, NP. Darn it, I had those hidden, but we must have changed the task names | 23:21 |
johnsom | colby_ Yeah, you have a bad cert or config on the controller side | 23:22 |
abaindur | how do you know? (ps, no affiliation with colby, just curious) | 23:23 |
johnsom | Error: [('PEM routines', 'PEM_read_bio', 'no start line'), ('SSL routines', 'SSL_CTX_use_PrivateKey_file', 'PEM lib')] | 23:23 |
johnsom | The private key file is bad, missing, or the passphrase is wrong | 23:24 |
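A quick sanity check on the controller-side material that error points at (a sketch; the file names are assumptions, use whatever your config references):
    # does the private key parse, and is the passphrase right?
    openssl rsa -in /etc/octavia/certs/client.key -check -noout
    # do cert and key belong together? the two digests should match
    openssl x509 -in /etc/octavia/certs/client.cert -noout -modulus | openssl md5
    openssl rsa  -in /etc/octavia/certs/client.key  -noout -modulus | openssl md5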
*** Swami has quit IRC | 23:45 |