*** rcernin_ has joined #openstack-lbaas | 00:02 | |
*** rcernin has quit IRC | 00:03 | |
*** rcernin has joined #openstack-lbaas | 00:29 | |
*** rcernin has quit IRC | 00:29 | |
*** rcernin has joined #openstack-lbaas | 00:30 | |
abaindur | BTW it would be nice if we could specify the network names/sec groups/flavors by name as we can with the ssh key and image tag | 00:31 |
*** rcernin_ has quit IRC | 00:32 | |
*** longkb has joined #openstack-lbaas | 00:37 | |
*** longkb has quit IRC | 00:57 | |
*** korean101 has quit IRC | 00:57 | |
*** korean101 has joined #openstack-lbaas | 01:04 | |
*** abaindur has quit IRC | 01:42 | |
*** abaindur has joined #openstack-lbaas | 01:42 | |
*** abaindur has quit IRC | 01:45 | |
*** abaindur has joined #openstack-lbaas | 01:46 | |
*** hongbin has joined #openstack-lbaas | 02:01 | |
*** longkb has joined #openstack-lbaas | 02:12 | |
*** hongbin has quit IRC | 03:39 | |
*** abaindur has quit IRC | 04:06 | |
*** abaindur has joined #openstack-lbaas | 04:06 | |
*** KeithMnemonic has quit IRC | 04:08 | |
*** celebdor has joined #openstack-lbaas | 04:19 | |
*** abaindur has quit IRC | 04:43 | |
*** abaindur has joined #openstack-lbaas | 04:45 | |
*** abaindur has quit IRC | 05:25 | |
*** pcaruana has joined #openstack-lbaas | 06:44 | |
*** rcernin has quit IRC | 07:02 | |
*** openstackgerrit has joined #openstack-lbaas | 07:07 | |
openstackgerrit | Yang JianFeng proposed openstack/python-octaviaclient master: Add l7policy and l7rule to octavia quota https://review.openstack.org/591568 | 07:07 |
*** salmankhan has joined #openstack-lbaas | 07:29 | |
*** salmankhan has quit IRC | 07:33 | |
openstackgerrit | Yang JianFeng proposed openstack/python-octaviaclient master: Add l7policy and l7rule to octavia quota https://review.openstack.org/591568 | 07:55 |
*** openstackstatus has quit IRC | 08:12 | |
*** ktibi has joined #openstack-lbaas | 08:13 | |
openstackgerrit | Min Sun proposed openstack/octavia-dashboard master: Cannot update ssl certificate when update listener https://review.openstack.org/550313 | 08:14 |
openstackgerrit | Yang JianFeng proposed openstack/octavia master: [WIP] Add quota support to octavia's l7policy and l7rule https://review.openstack.org/590620 | 08:36 |
*** salmankhan has joined #openstack-lbaas | 09:10 | |
*** salmankhan1 has joined #openstack-lbaas | 09:15 | |
*** salmankhan has quit IRC | 09:16 | |
*** salmankhan1 is now known as salmankhan | 09:16 | |
openstackgerrit | Yang JianFeng proposed openstack/octavia master: Add quota support to octavia's l7policy and l7rule https://review.openstack.org/590620 | 09:22 |
*** openstackstatus has joined #openstack-lbaas | 09:41 | |
*** ChanServ sets mode: +v openstackstatus | 09:41 | |
*** yboaron has joined #openstack-lbaas | 09:46 | |
openstackgerrit | Yang JianFeng proposed openstack/octavia master: Add quota support to octavia's l7policy and l7rule https://review.openstack.org/590620 | 09:49 |
*** yboaron_ has joined #openstack-lbaas | 10:02 | |
*** yboaron has quit IRC | 10:05 | |
*** yboaron_ has quit IRC | 10:38 | |
*** yboaron_ has joined #openstack-lbaas | 10:59 | |
*** savvas has joined #openstack-lbaas | 11:56 | |
*** amuller has joined #openstack-lbaas | 12:08 | |
*** amuller has quit IRC | 12:08 | |
*** longkb has quit IRC | 12:13 | |
mnaser | rm_work: it is a processpoolexecutor | 12:27 |
mnaser | http://paste.openstack.org/show/728008/ => ps auxf output | 12:28 |
*** celebdor has quit IRC | 12:28 | |
mnaser | rm_work: http://paste.openstack.org/show/728009/ output of how many threads are up | 12:29 |
mnaser | it looks like health manager is only 2 processes | 12:31 |
mnaser | and i shouldnt have 20? | 12:31 |
mnaser | yeah all my other deploys are 2 processes | 12:32 |
mnaser | 61 more processes | 12:34 |
savvas | hi johnsom how are you today | 12:36 |
mnaser | ok according to lsof, it's the udp listener that ends up with all these threads | 12:36 |
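For reference, a minimal shell sketch of the check being run here (process name pattern and the default heartbeat port 5555 are assumptions, not taken from the pastes above):
    # count octavia-health-manager processes and their threads
    ps auxf | grep '[o]ctavia-health-manager'
    ps -eLf | grep '[o]ctavia-health-manager' | wc -l
    # see which process owns the UDP heartbeat socket
    sudo lsof -nP -iUDP:5555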
savvas | do you have any idea why this would (consistently happen)? http://paste.openstack.org/show/728010/ | 12:36 |
cgoncalves | savvas, multiple images with same name. xgerman_ could probably help you better as that seems to be ansible-octavia | 12:54 |
savvas | he's on vacation :) | 12:55 |
savvas | he recommended I reach out to johnsom | 12:55 |
cgoncalves | I could have a look at ansible-octavia but not just right now, sorry | 12:55 |
savvas | that's alright, if you do get a chance let me know | 12:56 |
savvas | the playbook successfully completes a second time but skips that part. It also doesn't seem to create the api endpoints properly | 12:57 |
savvas | I get a response on port 9876 but /v2.0/lbaas/loadbalancers leads to a not found page | 12:57 |
mnaser | i have a feeling that this is the root cause -- https://github.com/openstack/octavia/commit/98484332c4ff7deeebd749d0f5cd36b7679cd1bc#diff-0fb9e31059c066ef1f145f6a872f6993 | 13:08 |
*** amuller has joined #openstack-lbaas | 13:34 | |
devfaz | mnaser: are you using ubuntu? | 13:41 |
mnaser | devfaz: ubuntu amphoras, centos controller | 13:41 |
devfaz | mnaser: had a similar issue with the ubuntu python pkg of octavia. Try to use only "pip install ..." without any packaged versions | 13:42 |
mnaser | devfaz: how many octavia-health-manager processes do you have | 13:42 |
cgoncalves | savvas, I thought that task ("Get curremt image id") to be from openstack-ansible-os_octavia but I can't find any references | 13:42 |
devfaz | https://pastebin.com/MH7BzVPp | 13:43 |
mnaser | devfaz: so it looks like there's only two processes | 13:47 |
mnaser | which means you're probably running pre that commit | 13:47 |
devfaz | %prog 3.0.0.0b4.dev19 | 13:48 |
savvas | yes cgoncalves, for me that gets executed when running os-octavia-install.yml | 13:48 |
*** KeithMnemonic has joined #openstack-lbaas | 13:48 | |
savvas | which is part of Openstack Ansible | 13:48 |
devfaz | mnaser: docker-image was built 5 days ago from master. | 13:49 |
devfaz | mnaser: I will rebuild with current master. give me a minute. | 13:49 |
mnaser | devfaz: thats weird, i have a ton more processes.. | 13:49 |
mnaser | this machine has 40 cores though | 13:49 |
savvas | http://paste.openstack.org/show/728014/ this is the playbook cgoncalves | 13:49 |
devfaz | mnaser: python -m pip install -c https://raw.githubusercontent.com/openstack/requirements/master/upper-constraints.txt -r requirements.txt | 13:49 |
devfaz | mnaser: try this to update/fix your python-module dep. | 13:50 |
mnaser | i cant really just do that, this is a deployment done via openstack ansible | 13:50 |
devfaz | took me a while to detect this. | 13:50 |
mnaser | so i would need to make sure it happens cleanly | 13:50 |
*** fnaval has joined #openstack-lbaas | 13:56 | |
cgoncalves | savvas, ok, so I see that is on stable/queens or earlier. things have changed substantially in recent master | 14:14 |
cgoncalves | savvas, the os_image_facts ansible module can only return facts of a single image and it is confused because there are more than one with that image name | 14:15 |
cgoncalves | probably openstack-ansible-os_octavia is missing a task that deletes the existing amphora image before uploading a new one | 14:17 |
cgoncalves | certainly os_image_facts could be extended to allow retrieving latest created image (and retrieval by tag name) | 14:18 |
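A sketch of the failing pattern cgoncalves describes (module name from the discussion, variable name hypothetical): os_image_facts resolves exactly one image, so a lookup by name breaks once two amphora images share that name:
    - name: Get current image id
      os_image_facts:
        image: "{{ amphora_image_name }}"   # ambiguous when more than one image has this name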
cgoncalves | I have an idea... borrow code from octavia-undercloud role in tripleo-common :) | 14:21 |
savvas | yee I was trying to let this run as much out of the box as possible, since we want to use and maintain it in multiple locations | 14:27 |
cgoncalves | savvas, there's a task at the end to delete old image from glance. how did you end up with multiple ones? | 14:34 |
*** ktibi has quit IRC | 14:57 | |
*** yboaron_ has quit IRC | 15:17 | |
*** savvas has quit IRC | 15:36 | |
*** pcaruana has quit IRC | 16:02 | |
*** rpittau has quit IRC | 16:09 | |
openstackgerrit | Michal Rostecki proposed openstack/octavia master: devstack: Define packages for (open)SUSE https://review.openstack.org/591774 | 16:17 |
openstackgerrit | Merged openstack/octavia-dashboard master: Cannot update ssl certificate when update listener https://review.openstack.org/550313 | 16:22 |
*** Swami has joined #openstack-lbaas | 17:10 | |
colby_ | Do I need to change the network driver to work with neutron? Should it be: allowed_address_pairs_driver or noop? My setup is not creating the vip port so Im trying to troubleshoot | 17:30 |
rm_work | mnaser: hmmm, yeah umm, do you run neutron-lbaas also or just octavia? is the status update queue possibly turned on by accident? | 17:53 |
mnaser | rm_work: octavia only | 17:53 |
rm_work | yeah so in that case | 17:53 |
mnaser | what is the status update queue? | 17:54 |
rm_work | definitely check config to make sure that's off | 17:54 |
mnaser | doesnt health-manager only take care of udp polls? | 17:54 |
mnaser | it looks like it spawns two processes, one that does udp and the other one that picks up the requests | 17:54 |
rm_work | the event stream | 17:55 |
rm_work | let me see where it's configured | 17:55 |
rm_work | https://github.com/openstack/octavia/blob/stable/queens/etc/octavia.conf#L77-L83 | 17:55 |
rm_work | that can get turned on accidentally by some deployment tools i think | 17:55 |
rm_work | if they once assumed neutron-lbaas support | 17:55 |
rm_work | and it will cause issues | 17:55 |
rm_work | similar to what you're seeing | 17:56 |
rm_work | but it's a longshot maybe | 17:56 |
rm_work | if that IS enabled tho, then that'd be an easy fix :) | 17:56 |
mnaser | rm_work: hm, its enabled with osa octavia | 18:00 |
mnaser | octavia_sync_provisioning_status that is | 18:00 |
rm_work | what about the driver | 18:01 |
mnaser | octavia_event_streamer: False and octavia_sync_provisioning_status: false ? | 18:01 |
mnaser | what do you mean what about the driver? | 18:01 |
rm_work | event_streamer_driver = noop_event_streamer | 18:01 |
rm_work | or queue_event_streamer | 18:01 |
mnaser | what does the event_streamer_driver control | 18:01 |
rm_work | badness | 18:02 |
rm_work | look in the config that's provisioned | 18:02 |
rm_work | on one of your HM controllers | 18:02 |
rm_work | and see if event_streamer_driver is set | 18:02 |
rm_work | I THINK the `sync_provisioning_status` option is ignored as long as it's on the noop driver | 18:02 |
rm_work | but if you are on the `queue_event_streamer` then that *is* a problem, fix that and prolly it'll resolve itself | 18:03 |
rm_work | you may also be wrecking some poor RMQ somewhere lol | 18:03 |
rm_work | normally it is used to stream status change events to neutron-lbaas, but if you don't have neutron-lbaas, it is problematic | 18:03 |
rm_work | it writes to a queue that never empties and everything goes to shit T_T | 18:04 |
rm_work | if they are both False tho, prolly it's not that | 18:04 |
*** salmankhan has quit IRC | 18:07 | |
johnsom | mnaser sync_provisioning_status is your problem, shut that off. German made a mistake in the OSA role | 18:08 |
johnsom | colby_ It needs to be allowed_address_pairs to have things created in neutron. | 18:10 |
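For reference, the controller-side driver settings being discussed look roughly like this in octavia.conf (a sketch of the usual non-noop values; check the sample config shipped with your release):
    [controller_worker]
    network_driver = allowed_address_pairs_driver   # create real VIP ports in neutron
    compute_driver = compute_nova_driver
    amphora_driver = amphora_haproxy_rest_driver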
rm_work | yeah waiting for mnaser to tell me if the driver is actually enabled on his HM's config | 18:11 |
johnsom | Even with the noop event driver, that sync setting will do bad things | 18:12 |
rm_work | ah i thought it was ignored | 18:12 |
johnsom | Well, I saw it right away in that deployment: he had the noop event driver, but left that provisioning sync on in the role and it hosed up the HM. | 18:12 |
rm_work | where did mnaser go lol :P | 18:13 |
mnaser | in a call, one sec :p | 18:14 |
johnsom | I don't know how he does everything he does as it is.... | 18:14 |
rm_work | lol | 18:14 |
rm_work | right | 18:14 |
colby_ | johnsom Thanks. So Im seeing some weirdness. So when I create the load balancer Im specifying the subnet_id and the network_id. But the ip address that it comes up with is not part of the subnet | 18:15 |
colby_ | 2018-08-14 18:00:46.018 38226 DEBUG octavia.controller.worker.controller_worker [-] Task 'octavia.controller.worker.tasks.network_tasks.AllocateVIP' (bdfcf0dc-a4b2-4f59-a34d-ca083ef79ba7) transitioned into state 'RUNNING' from state 'PENDING' _task_receiver /usr/lib/python2.7/site-packages/taskflow/listeners/logging.py:194 | 18:16 |
colby_ | 2018-08-14 18:00:46.018 38226 DEBUG octavia.controller.worker.tasks.network_tasks [-] Allocate_vip port_id ed4220ca-2860-4597-bd0c-1448e0b49d0a, subnet_id 0f09cbd9-711d-4582-ac02-aa64ccd038f2,ip_address 198.51.100.1 execute /usr/lib/python2.7/site-packages/octavia/controller/worker/tasks/network_tasks.py:323 | 18:16 |
colby_ | 2018-08-14 18:00:46.018 38226 INFO octavia.network.drivers.neutron.allowed_address_pairs [-] Port ed4220ca-2860-4597-bd0c-1448e0b49d0a already exists. Nothing to be done. | 18:16 |
colby_ | 2018-08-14 18:00:46.212 38226 DEBUG neutronclient.v2_0.client [-] Error message: {"NeutronError": {"message": "Port ed4220ca-2860-4597-bd0c-1448e0b49d0a could not be found.", "type": "PortNotFound", "detail": ""}} _handle_fault_response /usr/lib/python2.7/site-packages/neutronclient/v2_0/client.py:258 | 18:16 |
johnsom | colby_ Do you get the right thing with just specifying subnet_id? | 18:16 |
colby_ | no It still gets the same weird ip | 18:17 |
johnsom | Are you using neutron-lbaas or native Octavia? | 18:17 |
colby_ | native octavia | 18:17 |
johnsom | Good. Hmmm, which version of Octavia? | 18:17 |
colby_ | pike: 1.0.2-1 (centos) | 18:18 |
colby_ | the ip on that subnet should be 192.168.2.0/24 | 18:19 |
johnsom | On create are you specifying an IP or just the network/subnet? | 18:21 |
colby_ | tried just subnet and subnet & network | 18:22 |
colby_ | like so: openstack loadbalancer create --name lb15 --vip-subnet-id 0f09cbd9-711d-4582-ac02-aa64ccd038f2 --vip-network-id aed09c75-74e0-447a-936e-7362ad77597b | 18:23 |
johnsom | Ok, can you pastebin a "openstack subnet show 0f09cbd9-711d-4582-ac02-aa64ccd038f2" and "openstack network show aed09c75-74e0-447a-936e-7362ad77597b"? | 18:24 |
colby_ | sure | 18:24 |
colby_ | https://pastebin.com/9qUg8zSq | 18:26 |
johnsom | When you first ran the load balancer create command, did the return data show that IP as well? 198.51.100.1 | 18:29 |
colby_ | yes | 18:29 |
colby_ | does the api use noop? just seeing that in the logs: 2018-08-14 18:00:32.754 1350542 DEBUG octavia.network.drivers.noop_driver.driver [req-c15a1e74-74a1-48f4-878d-7080e150ea7b e28435e0a66740968c523e6376c57f68 18882d9c32ba42aeaa33c4703ad84b2c - default default] Network NoopManager no-op, get_network network_id aed09c75-74e0-447a-936e-7362ad77597b get_network /usr/lib/python2.7/site-packages/octavia/network/drivers/noop_driver/driver.py:119 | 18:31 |
johnsom | That is your problem. You are getting a "fake" IP from the noop driver. No-Op means it does no operations against other services, such as neutron/nova/glance/etc. | 18:32 |
johnsom | I was just opening a window to see if that was a no-op range | 18:32 |
colby_ | oh I did not see a driver option for API config just worker | 18:32 |
rm_work | it's all the same config | 18:32 |
rm_work | you should be using an identical config for all processes, really | 18:33 |
colby_ | ah gotcha. | 18:33 |
colby_ | ok that makes it more complicated since I use the puppet modules. At least I know what I need to do now though | 18:33 |
johnsom | Yeah, we did not do a good job on that section of the config, it is confusing that other processes use stuff in the "controller_worker" section | 18:34 |
rm_work | hmm, which puppet modules? | 18:34 |
colby_ | the openstack ones | 18:35 |
rm_work | is there something which has a bad idea about how to do octavia config deployment? | 18:35 |
colby_ | I just need to include with the api server and NOT manage packages and not enable the service | 18:35 |
rm_work | openstack has puppet modules? i guess i am out of the loop lol | 18:35 |
rm_work | i thought everyone used ansible | 18:35 |
colby_ | haha yea. We use puppet/foreman for all our deployment | 18:36 |
*** sapd1 has quit IRC | 18:38 | |
mnaser | ok | 18:58 |
mnaser | im done | 18:58 |
colby_ | Ok progress. Its creating the port on the correct subnet/network now. But its not creating a port on the octavia management net. Ill dig some more on that | 18:58 |
mnaser | johnsom, rm_work: ok i'm checking if its enabled now | 18:59 |
mnaser | event_streamer_driver = queue_event_streamer | 19:03 |
mnaser | sync_provisioning_status = True | 19:03 |
johnsom | Ouch, double trouble | 19:04 |
rm_work | yeah that's it | 19:04 |
rm_work | noop and False plz | 19:04 |
johnsom | mnaser Use the defaults: https://github.com/openstack/octavia/blob/master/etc/octavia.conf#L91 | 19:04 |
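The defaults johnsom points to amount to this in octavia.conf (sketch; see the linked sample config for the authoritative section and comments):
    [health_manager]
    event_streamer_driver = noop_event_streamer   # only use queue_event_streamer with neutron-lbaas
    sync_provisioning_status = False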
johnsom | With both of those on I am surprised you made it this long.... | 19:05 |
mnaser | lol | 19:05 |
mnaser | octavia_sync_provisioning_status/octavia_event_streamer both True | 19:05 |
mnaser | in stable/queens | 19:05 |
mnaser | should we fix that? | 19:05 |
rm_work | default?! | 19:06 |
rm_work | mnaser: i think they have always defaulted to False ... | 19:06 |
rm_work | since basically ever | 19:06 |
rm_work | it must be whatever deployment system | 19:07 |
mnaser | https://github.com/openstack/openstack-ansible-os_octavia/blob/stable/queens/defaults/main.yml#L281-L286 | 19:07 |
mnaser | negatory | 19:07 |
mnaser | for openstack ansible, sorry | 19:07 |
rm_work | yeah | 19:07 |
rm_work | so yes, as johnsom was saying, I guess a mistake was made setting that up in OSA | 19:07 |
rm_work | if we can fix it, maybe we should | 19:07 |
johnsom | I thought German did fix those in OSA, but if they are still there, yes, it should be fixed | 19:07 |
mnaser | i kinda wanna push a stable/pike only fix | 19:07 |
rm_work | well, I think in Pike most people ran it behind n-lbaas still | 19:08 |
rm_work | though technically it can run standalone | 19:08 |
mnaser | amateurs | 19:08 |
mnaser | :P | 19:08 |
rm_work | lol | 19:08 |
mnaser | https://github.com/openstack/openstack-ansible-os_octavia/commit/c8f4f275a1e9968b1f049c022639adadc4333b63 | 19:08 |
mnaser | this commit added it | 19:08 |
rm_work | hey I still think ALL ya'll are amateurs, for running stable branches :P | 19:08 |
johnsom | Yeah, it could be that pike OSA still defaults to neutron-lbaas | 19:08 |
mnaser | i think what i want to do is | 19:08 |
mnaser | if octavia_v1 is false and octavia_v2 is true then octavia_event_streamer/octavia_sync_provisioning_status = false else true | 19:09 |
rm_work | yeah | 19:09 |
rm_work | i was gonna suggest something like that | 19:09 |
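A minimal sketch of what mnaser is proposing for the role defaults (not the actual patch; variable names taken from the discussion): keep the event streamer and provisioning status sync on only when the v1 (neutron-lbaas) path is still enabled:
    octavia_event_streamer: "{{ octavia_v1 | bool or not (octavia_v2 | bool) }}"
    octavia_sync_provisioning_status: "{{ octavia_v1 | bool or not (octavia_v2 | bool) }}"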
johnsom | egad, they're both true in queens too. | 19:09 |
johnsom | mnaser +1 | 19:09 |
johnsom | Well, +2, I'm a core there | 19:09 |
mnaser | sorry | 19:09 |
mnaser | i meant stable/queens | 19:10 |
mnaser | i assume queens didnt default to v2 only | 19:10 |
mnaser | in osa | 19:10 |
johnsom | Yep, it looks like queens has v1 enabled | 19:10 |
mnaser | ill do the patch that enables it | 19:10 |
mnaser | and backport it through | 19:11 |
johnsom | Thanks. | 19:11 |
mnaser | johnsom: https://review.openstack.org/591829 Enable event streamer and provisioning status sync for V1 API | 19:17 |
mnaser | rm_work: ^ | 19:17 |
jiteka | Hey guys, I'm still trying to get my LB working in Queens using ubuntu based amphora image xenial | 19:34 |
jiteka | Problem : I'm getting "Failed to bring up eth1" when octavia-worker contact amphora Rest API on /0.5/plug/vip/<vip address> | 19:34 |
jiteka | I disabled revert with taskflow to get a chance to ssh and troubleshoot the amphora, and I realised 2 things | 19:34 |
jiteka | 1. BACKUP amphora node is not yet configured when the LB failed to be created due to that ifup problem on MASTER | 19:34 |
jiteka | and when I'm doing my cURL I just get {"message":"OK","details":"VIP 10.63.68.30 plugged on interface eth1"} | 19:34 |
jiteka | amphora-agent : ::ffff:10.63.8.9 - - [14/Aug/2018:17:16:41 +0000] "POST /0.5/plug/vip/10.63.68.30 HTTP/1.1" 202 70 "-" "curl/7.29.0" | 19:34 |
jiteka | 2. The MASTER amphora node that failed is left half-configured, so to avoid {"message":"Interface already exists"} on the cURL to /plug/vip | 19:34 |
jiteka | I deleted the netns amphora-agent then tried again and got a 202 OK too on that node | 19:34 |
jiteka | But when it fail the first time on the master, interface cfg for eth1 look like this : | 19:34 |
jiteka | http://paste.openstack.org/show/728039/ | 19:34 |
jiteka | and when it works on the cURL command it looks like this : | 19:34 |
jiteka | http://paste.openstack.org/show/728040/ | 19:34 |
jiteka | eth1 is no longer static but getting its config from dhcp | 19:34 |
jiteka | Any idea or suggestion why this is happening? | 19:34 |
johnsom | jiteka Did you get a chance to do the line by line test I recommended before my vacation? | 19:41 |
johnsom | My guess is that overlapping host route is not making ubuntu happy | 19:41 |
jiteka | not sure how to proceed | 19:41 |
johnsom | Ok, let me give you steps | 19:42 |
johnsom | In the amphora: | 19:42 |
johnsom | ip netns exec amphora-haproxy bash | 19:42 |
johnsom | (you need to be root) | 19:42 |
jiteka | ok | 19:42 |
johnsom | ifconfig eth1:0 down | 19:42 |
jiteka | I remember you showed me that | 19:42 |
johnsom | ifconfig eth1 down | 19:42 |
johnsom | Those may fail, that is ok | 19:42 |
johnsom | Then ifconfig eth1 up | 19:43 |
johnsom | ifconfig eth1:0 up | 19:43 |
johnsom | One or more of those might fail too, that is also ok | 19:43 |
johnsom | Then go through the extra config lines for the host routes: | 19:43 |
johnsom | ip route add -net 169.254.169.254/32 gw 10.63.68.2 dev eth1 | 19:44 |
jiteka | eth1:0 fail yes : | 19:44 |
jiteka | Cannot assign requested address | 19:44 |
johnsom | ip route add -net 198.74.40.0/24 gw 10.63.68.2 dev eth1 | 19:44 |
johnsom | ip route add -net 10.0.0.0/8 gw 10.63.68.2 dev eth1 | 19:44 |
johnsom | ip route add -net 172.16.0.0/12 gw 10.63.68.2 dev eth1 | 19:44 |
jiteka | Error: ??? prefix is expected rather than "-net". | 19:44 |
johnsom | Oh, right, those are the other command syntax, one second | 19:44 |
johnsom | remove the "ip" from the front of those | 19:45 |
johnsom | route add -net 12.148.80.0/22 gw 10.63.68.2 dev eth1 | 19:45 |
johnsom | ^^^ that was the last in the top list | 19:45 |
johnsom | /sbin/ip route add 10.63.68.0/22 dev eth1 src 10.63.68.30 scope link table 1 | 19:45 |
johnsom | /sbin/ip route add default via 10.63.68.1 dev eth1 onlink table 1 | 19:45 |
jiteka | Network is unreachable | 19:45 |
johnsom | /sbin/ip route add 169.254.169.254/32 via 10.63.68.2 dev eth1 onlink table 1 | 19:46 |
johnsom | Which line? | 19:46 |
jiteka | 12.148.80.0/22 | 19:46 |
jiteka | /sbin/ip route add default via 10.63.68.1 dev eth1 onlink table 1 (OK) | 19:47 |
johnsom | Ok, darn, the interface must not have been up | 19:47 |
johnsom | /sbin/ip route add 198.74.40.0/24 via 10.63.68.2 dev eth1 onlink table 1 | 19:47 |
johnsom | /sbin/ip route add 10.0.0.0/8 via 10.63.68.2 dev eth1 onlink table 1 | 19:47 |
johnsom | /sbin/ip route add 172.16.0.0/12 via 10.63.68.2 dev eth1 onlink table 1 | 19:47 |
johnsom | /sbin/ip route add 12.148.80.0/22 via 10.63.68.2 dev eth1 onlink table 1 | 19:48 |
johnsom | /sbin/ip rule add from 10.63.68.30/32 table 1 priority 100 | 19:48 |
johnsom | My suspicion is that last line is going to fail | 19:48 |
jiteka | http://paste.openstack.org/show/728041/ | 19:49 |
jiteka | no hard fail here | 19:49 |
jiteka | at least not visible in command output | 19:50 |
johnsom | Hmmm, ok, so it must be in the first block. | 19:50 |
jiteka | /sbin/ip route list | 19:51 |
jiteka | it should give me the list of routes I just added, right? | 19:51 |
johnsom | Except that top block where you got that error about network not reachable. | 19:51 |
johnsom | Did you get that on all of the lines or just the first one? | 19:51 |
jiteka | http://paste.openstack.org/show/728042/ | 19:53 |
jiteka | invalid argument on : | 19:53 |
jiteka | /sbin/ip route add 10.63.68.0/22 dev eth1 src 10.63.68.30 scope link table 1 | 19:53 |
johnsom | Can you do an openstack subnet show again for that subnet? I want to look at the host routes configured there | 19:57 |
jiteka | sure | 19:57 |
jiteka | doing it now | 19:58 |
jiteka | http://paste.openstack.org/show/728044/ | 19:58 |
johnsom | Well, I'm not sure what the issue is. I think it's the overlapping host route, but not sure. | 20:01 |
johnsom | Let me build a devstack, create a matching subnet and give it a go locally so I can poke at this without the IRC latency. grin | 20:01 |
*** amuller has quit IRC | 20:02 | |
jiteka | johnsom: thanks, I will be on my way home soon but will resume on this on thursday (bank holiday tomorrow in France) | 20:02 |
johnsom | Ok, if you have an open story for this I can post my results there. | 20:03 |
johnsom | https://storyboard.openstack.org/#!/dashboard/stories | 20:03 |
johnsom | It will take me about a half hour to stack and get that subnet configured, etc. | 20:03 |
jiteka | ahaha yes I know that devstack is taking time | 20:04 |
jiteka | I anticipated that :D | 20:04 |
colby_ | will disable_revert = true leave the amphora in place on error? The worker is having trouble connecting to the amphora on port 9443. Im able to ssh to the instance but it gets destroyed right away before I can troubleshoot. | 20:05 |
jiteka | hmm what could I put here on that story as "Task title" | 20:07 |
johnsom | colby_ Yes. You can also stop the health manager process to stop it from being failed over to a new vm | 20:08 |
johnsom | jiteka "load balancer create fails with host routes" | 20:08 |
johnsom | Should do the trick | 20:08 |
jiteka | https://storyboard.openstack.org/#!/story/2003441 | 20:10 |
johnsom | Thank you. I will post my findings there | 20:10 |
jiteka | I think also that I will be in that configuration : | 20:13 |
jiteka | https://bugs.launchpad.net/octavia/+bug/1488279 | 20:13 |
openstack | Launchpad bug 1488279 in octavia "Amphora fails with member on same subnet as vip" [Critical,Fix released] | 20:13 |
jiteka | because I'm trying to get my VIP on the same subnet as future listeners | 20:13 |
jiteka | I don't see that issue yet | 20:13 |
jiteka | but I see that status is "fixed released" | 20:14 |
jiteka | fix | 20:14 |
johnsom | Yeah, that bug was fixed a long time ago. | 20:16 |
jiteka | colby_: it actually leaves it in PENDING_CREATE, so afterwards when I need to delete it I have to go into the DB and update the octavia.load_balancer row | 20:16 |
jiteka | update load_balancer set provisioning_status = 'ERROR' where id = '<lb_uuid>' | 20:16 |
jiteka | colby_: the API prevents you from deleting a lb with provisioning status = PENDING_CREATE | 20:17 |
johnsom | PENDING_CREATE will go to ERROR once the retries have given up. It's a configuration setting. The default is super long for folks using bad hypervisors like virtualbox. They will either go to ERROR or ACTIVE. | 20:17 |
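The retry window johnsom refers to is controlled by options like these (a sketch of the usual octavia.conf defaults; verify against your release):
    [haproxy_amphora]
    connection_max_retries = 300      # attempts before the flow gives up and reverts/marks ERROR
    connection_retry_interval = 5     # seconds between attempts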
jiteka | johnsom: ok that's a good news | 20:17 |
colby_ | yea Ive had to do that before. Thanks. | 20:17 |
johnsom | You have to be careful doing that with the DB as if it's in PENDING_* that means one of the controllers is actively working on that load balancer and you could get things in an inconsistent state by changing the status while a controller has ownership of the object. | 20:18 |
johnsom | You will get at least a bunch of errors in the log. | 20:19 |
colby_ | ok so looks like my problem is just a timeout maybe. I can telnet to port 9443 from the worker node now that the instance is staying. so maybe the amphora just was not up all the way when the worker is trying to connect. | 20:20 |
*** salmankhan has joined #openstack-lbaas | 20:20 | |
johnsom | colby_ With a good cloud it should take about 30 seconds to come up. Is this running on bare metal or in a hypervisor? | 20:21 |
colby_ | hypervisor | 20:21 |
johnsom | Which one? | 20:22 |
colby_ | kvm | 20:22 |
johnsom | Ok, so you are running kvm inside kvm? | 20:22 |
colby_ | oh no | 20:23 |
colby_ | its spinning up amphora on our compute nodes | 20:23 |
colby_ | compute nodes are baremetal running kvm | 20:23 |
johnsom | On the compute host, can you look in /proc/cpuinfo and see if you find vmx or svm? | 20:23 |
colby_ | vmx | 20:25 |
colby_ | these are dual Intel(R) Xeon(R) CPU E5-2697 v4 | 20:25 |
johnsom | Ok, yeah, so it should boot up and be ready in about 30 seconds. | 20:25 |
colby_ | the instance spins up quickly | 20:25 |
colby_ | I can ssh in...in less than 30 seconds | 20:26 |
johnsom | Hmm, so it's not likely the timeout unless you have changed the defaults to something low | 20:26 |
rm_work | sounds like a bad return issue like we had a long time ago | 20:51 |
rm_work | where the agent responded badly on the initial startup if you hit it too soon | 20:52 |
colby_ | I get: 2018-08-14 20:46:26.119 6118 WARNING octavia.amphorae.drivers.haproxy.rest_api_driver [-] Could not connect to instance. Retrying.: ConnectionError: HTTPSConnectionPool(host='10.10.0.7', port=9443): Max retries exceeded with url: /0.5/plug/vip/192.168.2.8 (Caused by NewConnectionError('<requests.packages.urllib3.connection.VerifiedHTTPSConnection object at 0x7fa4c86c1650>: Failed to establish a new connection: [Errno 111] Connection refused | 20:52 |
colby_ | ',)) | 20:52 |
colby_ | Here are the logs of the amphora-agent: https://pastebin.com/gXXPdPch | 20:53 |
colby_ | the timestamps are about the same so maybe its taking longer to get the agent up than the worker is expecting | 20:54 |
johnsom | This is not normal: [CRITICAL] WORKER TIMEOUT | 20:54 |
johnsom | It sounds like gunicorn is crashing | 20:54 |
colby_ | it starts another worker after that | 20:55 |
colby_ | and that one doesn't crash | 20:55 |
rm_work | yeah so prolly the initial one crashes during the request | 20:55 |
rm_work | and that kills it | 20:55 |
rm_work | because the controller GETS something, and it's bad | 20:55 |
johnsom | Hmm, I don't see those | 20:55 |
colby_ | sorry I didn't paste those. Ill redo | 20:55 |
rm_work | yeah crashes or hangs | 20:56 |
rm_work | i wonder if it's handing on the config action | 20:56 |
rm_work | *hanging | 20:56 |
rm_work | like, on an inteface up or something | 20:56 |
colby_ | https://pastebin.com/Zt0vShuJ | 20:56 |
colby_ | the ssl negotiation errors are me testing via telnet and curl from the worker | 20:57 |
rm_work | the issue is almost certainly that timeout tho | 20:57 |
colby_ | oops pastebin had a captcha I didn't see | 20:58 |
colby_ | now that link should work | 20:58 |
rm_work | we only try repeatedly to get a connection if we get a timeout/refused... if we get like a partial data return that's broken, we don't retry | 20:58 |
rm_work | we bail and recycle the amp | 20:58 |
rm_work | it worked, i just had to captcha :P | 20:58 |
colby_ | How can I test using curl (and using the certs) to see what kind of response I get | 21:01 |
rm_work | i don't have the command handy, but i remember basically i just had to go through the curl manpage and pick all the cert related stuff | 21:02 |
colby_ | sorry yea I meant which cert exactly do I need to connect to the amphora. I have split CA certs setup | 21:02 |
rm_work | oh | 21:02 |
rm_work | you validate using the cert that is the CA used to create the amp cert | 21:03 |
rm_work | IIRC | 21:03 |
rm_work | but i've never actually done split-cert lol | 21:03 |
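A minimal curl sketch for the split-CA setup (all paths are assumptions for illustration): verify the amphora's server cert with the server CA, present the client cert/key signed by the client CA, and use /0.5/info as a safe GET target:
    curl --cacert /etc/octavia/certs/server_ca.cert.pem \
         --cert   /etc/octavia/certs/client.cert.pem \
         --key    /etc/octavia/certs/client.key.pem \
         https://10.10.0.7:9443/0.5/info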
colby_ | The method is not allowed for the requested URL | 21:14 |
colby_ | is that due to certificate issue | 21:14 |
colby_ | ? | 21:14 |
colby_ | trying to connect to: https://10.10.0.7:9443/0.5/plug/vip/192.168.2.8 | 21:14 |
*** abaindur has joined #openstack-lbaas | 21:21 | |
abaindur | johnsom: or anyone else that can answer this question for me: What exactly is the bind_ip and controller_ip_port_list in the [health_manager] for, and what is it to be set to? | 21:23 |
*** salmankhan has quit IRC | 21:23 | |
abaindur | is it the IP of the host/physical machine on which the octavia health-manager service is running? | 21:24 |
*** salmankhan has joined #openstack-lbaas | 21:24 | |
abaindur | and why is it in a list format? Is this assuming if we have health-manager running on multiple hosts, the conf value on every host should be the same and include the public IPs of every host running health-manager? | 21:25 |
colby_ | controller_ip_port_list is a list of the hosts running health_manager. The amphorae use this list to connect to them, as far as I can tell | 21:26 |
abaindur | For example we have health-manager and worker running on 2 hosts, suppose with management IP 10.4.0.2 and 10.4.0.3 (this is the IP of physical machine, not at all related to the octavia LB network) | 21:26 |
colby_ | I have 2 health managers so I have both in there | 21:26 |
abaindur | bind_ip would be different on each host - it would be 10.4.0.2 on one, 10.4.0.3 on the other | 21:27 |
colby_ | correct | 21:27 |
colby_ | the list would include both for the amphora | 21:27 |
abaindur | and "controller_ip_port_list = 10.4.0.2:5555, 10.4.0.3:5555" - on both the hosts .conf files? | 21:27 |
colby_ | I believe so (I could be wrong. That's how I did it and I don't get errors any more) | 21:28 |
abaindur | what did you also set for the heartbeat_key? | 21:28 |
abaindur | Does this really matter...? are we just supposed to set it to something random? | 21:29 |
colby_ | random key | 21:29 |
colby_ | random string rather | 21:29 |
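Putting that together, each controller's octavia.conf would carry something like this (a sketch using the example IPs from above; the heartbeat_key is any random string shared by every controller and baked into the amphorae):
    [health_manager]
    bind_ip = 10.4.0.2                                        # this host's management IP (10.4.0.3 on the other node)
    bind_port = 5555
    controller_ip_port_list = 10.4.0.2:5555, 10.4.0.3:5555    # identical list on every controller
    heartbeat_key = some-long-random-string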
abaindur | hmm ok. right now i have set the bind_ip and controller_ip_port_list to 127.0.0.1. We have it only on one host | 21:29 |
abaindur | but i dont see health-manager doing anything in the logs. | 21:30 |
abaindur | however, the loadbalancer is working | 21:30 |
abaindur | what is also strange is the provisioning_status | ACTIVE | 21:30 |
abaindur | but the operating_status shows as offline | 21:30 |
abaindur | Is this due to health-manager not doing anything, which is due to the wrong IP used for bind_ip/controller_ip_port_list (127.0.0.1)? | 21:31 |
abaindur | the LB I deployed was without a healthmonitor | 21:31 |
colby_ | Im not sure on that. Im still having trouble getting the lb to spin up successfully. Im having issues with the worker connecting to the amphora | 21:37 |
abaindur | what is the issue? | 21:39 |
colby_ | worker cant connect to the amphora | 21:40 |
abaindur | i had mentioned the LB network needs to basically be a provider network, from what i could tell | 21:40 |
colby_ | it times out. Its not a firewall issue. I can telnet to the port from the worker | 21:40 |
abaindur | since the octavia worker runs as a process on the host, it will use the physical machine's networking/routing table | 21:40 |
abaindur | so here we have a host, which needs to be able to talk ot a VM. for us, that means it cant be an isolated tenant network, it has to be a provider network | 21:41 |
colby_ | I can connect from the command line so the process should be able to just fine. I think there is something more going on | 21:41 |
colby_ | the amphora worker crashes | 21:42 |
abaindur | security group of the amphora? | 21:42 |
abaindur | We just put it on an allowall sec group that allows all traffic | 21:42 |
abaindur | is the flavor large enough? im giving my amp 1 cpu, 2 GB, and 10GB disk | 21:42 |
colby_ | no I can connect to 9443 from the worker fine. | 21:42 |
colby_ | 1cpu 1GB Ram | 21:43 |
colby_ | I was going to try curl from the worker to duplicate what it tried to do but Im having trouble doing it due to hostname mismatch | 21:44 |
colby_ | using --resolve didn't help either | 21:45 |
*** salmankhan has quit IRC | 22:04 | |
abaindur | colby_: how do you know the amphora crashes? | 22:04 |
abaindur | maybe then it is a problem with your disk image from when you built it | 22:04 |
colby_ | no it seems to happen when the worker makes the connection attempt | 22:08 |
colby_ | the amphora worker times out, dies then starts another worker | 22:09 |
colby_ | which is why I wanted to try and test a connection with curl to see if I see any issues there | 22:09 |
*** fnaval has quit IRC | 22:16 | |
colin- | what is the most direct way to determine the nova ID(s) of amphora agent for a given loadbalancer in the cli? | 22:27 |
johnsom | As an admin you can use the amphora admin API | 22:37 |
colin- | found what i needed that way, thanks johnsom | 22:48 |
abaindur | speaking of, what is the [amphora_agent] section used for? | 22:48 |
johnsom | abaindur those are settings rendered inside the amphora-agent configuration file. Ignore it in your octavia.conf | 22:52 |
johnsom | It's an oddity with how our agent is setup and oslo config. I removed them from the sample octavia.conf, but people kept putting them back, so I stopped fighting it | 22:53 |
abaindur | yea we left it blank. didnt see any issues, the LB was working | 22:57 |
abaindur | johnsom: btw did you see my msgs above about the controller_ip_port_list and our operating_status showing as OFFLINE? even though the LB is working | 23:00 |
abaindur | provisioning_status shows as ACTIVE | 23:01 |
abaindur | right now i have set the bind_ip and controller_ip_port_list to 127.0.0.1. We have it only on one host | 23:01 |
abaindur | so I suspect its because the amphora VM cant heartbeat to the health-manager? we see nothing in the health-manager logs | 23:02 |
johnsom | No I didn't see it. I have four conversations going at the same time right now | 23:02 |
johnsom | So, yeah, exactly, you are telling the amphora to send their heartbeat to themselves, so we never get status or stats from those amphora. | 23:02 |
colby_ | So any ideas on how to test the amphora connection? I got curl working but get a method not allowed on /0.5/plug/vip/192.168.2.23 | 23:05 |
abaindur | whats your .conf file look like? | 23:05 |
abaindur | Maybe its your certs? | 23:05 |
colby_ | mine is complicated since I have split CA cert (one for client and one for server) | 23:06 |
abaindur | doesnt the worker talk to the amphora over the https REST api? so if you can ping/telnet to the amphora VM from the host and its on an allow-all security group, sounds like it cant authenticate? | 23:07 |
abaindur | try the same certs...? | 23:07 |
colby_ | yes, Which is why I was trying to use curl with the certs | 23:07 |
colby_ | to mimic the worker | 23:07 |
johnsom | colby_ So, after you set all that up in the config for the new certs, did you build a new amp? Those certs are loaded at amp boot time | 23:07 |
colby_ | you mean the image? | 23:08 |
abaindur | build a new amp == build an entire new disk image? or just create a LB to spin up a new AMP | 23:08 |
johnsom | No, not the image, at "openstack loadbalancer create" time | 23:08 |
colby_ | yes Ive made many attempts to spin up new lb with the certs in place correctly | 23:09 |
abaindur | why dont you try w/ the same certs | 23:10 |
abaindur | i just ran the sample script in the repo | 23:10 |
abaindur | then scp -r the output folder onto my host | 23:10 |
johnsom | colby_ Sorry, too many conversations going on, so refocusing here | 23:10 |
johnsom | colby_ Are you getting SSL errors in the syslog or amphora-agent log on the amp? | 23:11 |
colby_ | not from when the worker tried to connect | 23:11 |
colby_ | it just had that timeout | 23:11 |
johnsom | Ok, so these "Invalid request from ip=::ffff:10.10.0.8: [SSL: SSL_HANDSHAKE_FAILURE] ssl handshake " were from your manual tests? | 23:12 |
colby_ | correct | 23:12 |
colby_ | when I use curl pointing to my certs I dont get an ssl error either | 23:13 |
johnsom | And all of your retries are "[Errno 111] Connection refused" | 23:13 |
johnsom | ? | 23:13 |
colby_ | I just get a method not allowed | 23:13 |
johnsom | Yeah, GET won't work on that path. | 23:13 |
johnsom | If you do /info you would get data | 23:13 |
johnsom | oh, well 0.5/info | 23:14 |
colby_ | {"haproxy_version":"1.6.3-1ubuntu0.1","api_version":"0.5","hostname":"amphora-eb2ba854-d38f-4100-b5dc-578c1db556ac"} | 23:15 |
colby_ | so thats working from curl | 23:15 |
johnsom | So, yeah, you can get there just fine... | 23:15 |
colby_ | yea I only see 2 logged failures with error 111 | 23:16 |
colby_ | or rather there are only 2 log entries about it both failing with 111 | 23:16 |
colby_ | in the worker logs | 23:16 |
johnsom | Can you pastebin the controller log from before it starts trying to plug the VIP until it errors or, 2-3 screens worth of the WARNING messages? | 23:16 |
colby_ | the controller worker log right:? | 23:17 |
johnsom | yes please | 23:17 |
colby_ | https://pastebin.com/qBKpnQeF | 23:20 |
colby_ | I removed the private key from the logs :-) | 23:20 |
colby_ | I have to run but please let me know if you notice anything | 23:21 |
johnsom | Sure, NP. Darn it, I had those hidden, but we must have changed the task names | 23:21 |
johnsom | colby_ Yeah, you have a bad cert or config on the controller side | 23:22 |
abaindur | how do you know? (ps, no affiliation with colby, just curious) | 23:23 |
johnsom | Error: [('PEM routines', 'PEM_read_bio', 'no start line'), ('SSL routines', 'SSL_CTX_use_PrivateKey_file', 'PEM lib')] | 23:23 |
johnsom | The private key file is bad, missing, or the passphrase is wrong | 23:24 |
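A quick sanity check on the controller-side material that error points at (a sketch; the file names are assumptions, use whatever your config references):
    # does the private key parse, and is the passphrase right?
    openssl rsa -in /etc/octavia/certs/client.key -check -noout
    # do cert and key belong together? the two digests should match
    openssl x509 -in /etc/octavia/certs/client.cert -noout -modulus | openssl md5
    openssl rsa  -in /etc/octavia/certs/client.key  -noout -modulus | openssl md5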
*** Swami has quit IRC | 23:45 |