Tuesday, 2019-03-19

00:11 *** sapd1_x has joined #openstack-lbaas
00:15 *** luksky has quit IRC
00:21 *** sapd1_x has quit IRC
00:51 *** abaindur has joined #openstack-lbaas
00:55 *** abaindur has quit IRC
01:02 *** abaindur has joined #openstack-lbaas
01:03 *** abaindur has quit IRC
01:03 *** abaindur has joined #openstack-lbaas
01:04 *** lemko has quit IRC
01:08 *** ricolin has joined #openstack-lbaas
01:12 *** hongbin has joined #openstack-lbaas
02:44 *** ricolin has quit IRC
03:52 *** sapd1_x has joined #openstack-lbaas
04:04 *** sapd1_x has quit IRC
04:12 *** hongbin has quit IRC
04:18 *** ramishra has joined #openstack-lbaas
05:07 *** vishalmanchanda has joined #openstack-lbaas
05:11 <vishalmanchanda> johnsom: Hi, I've had a query regarding your patch [1], may I know why we need both horizon-nodejs4-jobs and horizon-nodejs10-jobs in the octavia plugin?
05:11 <vishalmanchanda> [1] https://review.openstack.org/#/c/643630/
05:23 <johnsom> vishalmanchanda: Hi. So our old jobs used nodejs 4. However, that is very old now, so we want to test with the current Long Term Support (LTS) version of nodejs, version 10
05:23 *** ricolin has joined #openstack-lbaas
05:42 <vishalmanchanda> johnsom: yes, my question was related to keeping nodejs4. IIUC it's running on xenial and we've migrated to bionic, so why are we still keeping that job?
05:58 <johnsom> vishalmanchanda: We consider Stein the transition release where both xenial and bionic should be supported. In Train, nodejs 4 will not matter.
06:00 <vishalmanchanda> johnsom: ok. Thanks for the info.
06:15 *** lemko has joined #openstack-lbaas
06:19 *** ivve has joined #openstack-lbaas
06:43 *** ricolin has quit IRC
06:47 *** gcheresh has joined #openstack-lbaas
06:50 *** mkuf has joined #openstack-lbaas
07:13 <openstackgerrit> OpenStack Proposal Bot proposed openstack/octavia-dashboard master: Imported Translations from Zanata  https://review.openstack.org/644496
07:14 *** pcaruana has joined #openstack-lbaas
07:33 *** pcaruana has quit IRC
07:34 *** pcaruana has joined #openstack-lbaas
07:56 *** rpittau|afk is now known as rpittau
08:03 *** ramishra has quit IRC
08:06 *** ramishra has joined #openstack-lbaas
08:07 *** luksky has joined #openstack-lbaas
08:16 *** luksky has quit IRC
08:24 *** abaindur has quit IRC
08:58 *** ricolin has joined #openstack-lbaas
09:19 *** luksky has joined #openstack-lbaas
09:23 *** ramishra has quit IRC
09:35 *** ramishra_ has joined #openstack-lbaas
09:36 *** ramishra_ has quit IRC
09:36 *** ramishra_ has joined #openstack-lbaas
10:12 <openstackgerrit> Merged openstack/octavia-dashboard master: Imported Translations from Zanata  https://review.openstack.org/644496
10:24 <dulek> cgoncalves: Hi! Do I need to enable o-da even in Amphora jobs in Kuryr?
10:25 <cgoncalves> dulek, not currently, no
10:25 <dulek> cgoncalves: So just for OVN?
10:26 <cgoncalves> dulek, yes and any other 3rd party provider driver
10:26 <dulek> cgoncalves: Okay, nice, thanks!
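For reference, "enabling o-da" in a devstack-based job comes down to a couple of local.conf lines; a minimal sketch using the Octavia devstack plugin service names (the plugin URL and the OVN provider entry are illustrative assumptions, not taken from the Kuryr jobs discussed here):

    [[local|localrc]]
    enable_plugin octavia https://github.com/openstack/octavia
    # o-da is the driver agent that third-party providers such as OVN talk to
    enable_service octavia o-api o-cw o-hm o-hk o-da

    [[post-config|/etc/octavia/octavia.conf]]
    [api_settings]
    # provider list shown here is an assumption for illustration
    enabled_provider_drivers = amphora:The Octavia Amphora driver,ovn:OVN provider driver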
10:50 *** yamamoto has quit IRC
10:56 *** yamamoto has joined #openstack-lbaas
11:00 *** yamamoto has quit IRC
11:05 *** yamamoto has joined #openstack-lbaas
11:05 *** yamamoto has quit IRC
11:06 *** yamamoto has joined #openstack-lbaas
11:07 *** psachin has joined #openstack-lbaas
11:24 *** rcernin has quit IRC
11:54 *** psachin has quit IRC
12:07 *** luksky has quit IRC
12:16 <openstackgerrit> Margarita Shakhova proposed openstack/octavia master: Support create amphora instance from volume based.  https://review.openstack.org/570505
12:20 *** luksky has joined #openstack-lbaas
12:29 *** luksky has quit IRC
12:38 *** trown|outtypewww is now known as trown
12:42 *** luksky has joined #openstack-lbaas
13:09 *** yamamoto has quit IRC
13:12 <rm_work> johnsom: was the plan for even the amphora driver to go through o-da eventually?
13:15 <cgoncalves> rm_work, IIRC I read somewhere it is. I'd like it
13:22 <rm_work> It seems like it'd be "fair"
13:23 <rm_work> Otherwise we have a kind of... First class / second class provider arrangement
13:23 <rm_work> And multiple code paths
13:23 <rm_work> I guess that's probably planned as a follow up?
13:23 <cgoncalves> right
13:23 <cgoncalves> planned: most likely yes. assigned: unsure ;)
13:26 <rm_work> It's something I'd not mind doing but ffff no idea if I'm gonna have time for that much
13:27 <rm_work> We'll see
13:28 *** yamamoto has joined #openstack-lbaas
13:31 <rm_work> Do we have any patches we need for RC1?
13:31 <cgoncalves> https://etherpad.openstack.org/p/octavia-priority-reviews
14:13 *** sapd1_x has joined #openstack-lbaas
14:14 <rm_work> Will see if I can do some reviews today... :/
14:46 <openstackgerrit> Carlos Goncalves proposed openstack/octavia master: Add RHEL 8 amphora support  https://review.openstack.org/638581
14:49 *** yamamoto has quit IRC
14:50 *** fnaval has joined #openstack-lbaas
14:55 *** yamamoto has joined #openstack-lbaas
14:55 *** yamamoto has quit IRC
14:56 *** yamamoto has joined #openstack-lbaas
14:58 *** yamamoto has quit IRC
14:58 *** luksky has quit IRC
14:58 *** yamamoto has joined #openstack-lbaas
14:58 *** yamamoto has quit IRC
14:59 *** yamamoto has joined #openstack-lbaas
15:03 *** yamamoto has quit IRC
15:05 *** vishalmanchanda has quit IRC
15:12 <openstackgerrit> Margarita Shakhova proposed openstack/octavia master: Support create amphora instance from volume based.  https://review.openstack.org/570505
15:19 *** sapd1_x has quit IRC
15:23 *** roukoswarf has joined #openstack-lbaas
15:28 <openstackgerrit> Nir Magnezi proposed openstack/octavia master: Add RHEL 8 amphora support  https://review.openstack.org/638581
15:34 *** gcheresh has quit IRC
15:39 *** yamamoto has joined #openstack-lbaas
15:45 *** yamamoto has quit IRC
15:48 <openstackgerrit> Carlos Goncalves proposed openstack/octavia master: Add RHEL 8 amphora support  https://review.openstack.org/638581
15:51 *** luksky has joined #openstack-lbaas
15:55 <openstackgerrit> Nir Magnezi proposed openstack/octavia master: Add RHEL 8 amphora support  https://review.openstack.org/638581
16:13 *** trown is now known as trown|lunch
16:15 *** ramishra_ has quit IRC
16:31 *** ivve has quit IRC
16:34 <roukoswarf> johnsom: how crazy of an LB count has octavia/amphora been tested at?
16:35 <roukoswarf> having some downtime on adding listeners to an LB with like... 6 listeners, 6 pools, 2 members-ish each
16:35 <johnsom> roukoswarf What are you seeing? That seems pretty small
16:35 <xgerman> rm_work: ran our biggest cluster followed by eandersson
16:36 <johnsom> Well, there are others too.
16:36 <xgerman> :-)
16:36 <roukoswarf> a user is adding a new listener, and then every LB's provisioning status went to error, then went down, then eventually just the one that got a listener stayed down, but the other LBs came back
16:37 <roukoswarf> ah actually, the LBs never went down, but it did take the provisioning status of every listener in that LB to down
16:38 <roukoswarf> sorry, error, not down, same result
16:38 <johnsom> Yeah, we try to make sure they still pass traffic even if there is some kind of provisioning error.
16:38 <roukoswarf> so added a new listener, then every listener which previously worked is now down
16:38 <johnsom> What does your controller worker log show as to why it went to error?
16:40 <roukoswarf> which log?
16:40 <johnsom> The worker process log
16:40 <johnsom> o-cw if you are using devstack
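Where that worker log lives depends on the deployment; a short sketch (the devstack unit name is the usual one, the kolla-ansible path is an assumption):

    sudo journalctl -u devstack@o-cw.service -e                 # devstack: worker runs as a systemd unit
    sudo tail -f /var/log/kolla/octavia/octavia-worker.log      # kolla-ansible (path assumed)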
16:43 <roukoswarf> both of the amphoras are listed as down now... and one is completely not even an instance anymore.
16:43 <roukoswarf> still checking logs
16:44 <johnsom> While you look for logs, I will boot up an LB with that scenario and see what I get.
16:45 <roukoswarf> what influences the amphora status? master and backup both listed as down
16:45 <roukoswarf> never had an amphora blow up
16:45 <roukoswarf> is there a way to re-provision the amphoras?
16:45 <johnsom> Yeah, me either
16:46 <xgerman> the failover command will under certain conditions re-provision
16:46 <roukoswarf> both the active and backup died
16:47 <roukoswarf> but... the backup is totally deleted, not an instance anymore, and the active is "error", but the instance is up
16:48 <johnsom> Yeah, that is what should happen. It attempted to repair the backup, but it failed somehow (nova issue?), so it stopped.
16:48 <roukoswarf> so how do i give it a kick to see if it fails again?
16:48 <johnsom> Ok I have an LB with six listeners, each with a pool, each with two members.
16:49 <roukoswarf> all he did was add another listener, and it took the entire active-backup pair down
16:49 <johnsom> The failover commands are what you need.
16:49 <johnsom> Ah, active/standby. I'm running single. Let me test this, then I will restart with act/stdby.
16:50 <johnsom> Yeah, no issue with listener #7.
16:51 <roukoswarf> also, every member is an "external" member because this user is doing insanity with ip pairs
16:51 *** henriqueof2 has quit IRC
16:51 <roukoswarf> but that should be the same code path, no?
16:54 *** luksky has quit IRC
16:55 <johnsom> Correct, same code
16:55 <johnsom> Ok, I'm up to 10 listeners, each with a pool, each with two members. No errors.
16:55 <johnsom> I will start over with active/standby
16:56 <roukoswarf> health manager is failing out the LB due to: WARNING octavia.controller.healthmanager.health_drivers.update_db [-] Amphora 676aaa72-7c61-4c67-8b1b-4b9ed48dab0e health message reports 5 listeners when 6 expected
16:57 <roukoswarf> it seems; looking for why there's a mismatch
16:57 <johnsom> Yeah, that is just a warning and can happen during the provisioning process. Not an error
17:01 *** ricolin has quit IRC
17:04 <johnsom> Yeah, up to seven under active/standby as well. All is working as expected
17:07 *** luksky has joined #openstack-lbaas
17:07 <johnsom> Ok, up to 10
17:09 <johnsom> https://www.irccloud.com/pastebin/YR5nUy12/
17:10 <roukoswarf> http://paste.openstack.org/show/gKyWjSbM1ZuQqQDG4cfM/
17:10 <roukoswarf> these are the only errors we have that i can find, but they are not for the right LBs
17:10 <roukoswarf> the failed-out LBs are .50 and .35
17:10 *** rpittau is now known as rpittau|afk
17:11 <roukoswarf> no mention of either failed LB id either
17:12 <johnsom> What version of Octavia are you using in the image?
17:12 <roukoswarf> you mean the amphora?
17:13 <roukoswarf> i built everything on the git branch for rocky
17:13 <johnsom> Yeah, what version of Octavia did you use to create the image?
17:13 <roukoswarf> 3.0.2
17:14 <johnsom> This is the troubling error: Amphora agent returned unexpected result code 500 with response {u'http_code': 500, u'error': u"a bytes-like object is required, not 'str'"}
17:14 <roukoswarf> well, that's the worker/controller.
17:14 <johnsom> It implies there is something wrong with the image.
17:14 <roukoswarf> looks like a python2 vs python3 problem
17:15 <johnsom> Right. However we test with both. This is why I'm wondering what is up
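The quoted 500 is the classic Python 3 bytes-vs-str mismatch; a quick way to reproduce the same message outside Octavia (plain interpreter one-liners, nothing Octavia-specific):

    # Python 3 refuses a str write to a binary file handle with exactly this message
    python3 -c "open('/tmp/t', 'wb').write('x')"
    # TypeError: a bytes-like object is required, not 'str'
    # Python 2 accepts the same call, which is how py2-only testing can miss it
    python2 -c "open('/tmp/t', 'wb').write('x')"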
17:15 <roukoswarf> what's the easy way to check the octavia version in the built image?
17:16 <roukoswarf> controller is 3.0.2, but do i just check the agent's pip env?
17:16 <johnsom> Well, you can ssh into the instance and cd /opt/amphora-agent and do a "git log"
17:17 <johnsom> commit ae26cf9ebf7986148f9be2ddcc67cc185e88c7e0
17:17 <johnsom> There is also a way to manually query the amp agent, but that is harder to do because of the security
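A sketch of that check, assuming the default Ubuntu amphora image; the SSH key is whatever was configured for the deployment, shown here as a placeholder:

    ssh -i <amphora_ssh_key> ubuntu@<amphora_lb_network_ip>     # default login user on the Ubuntu image
    cd /opt/amphora-agent && git log -1 --format='%H'           # commit the agent code was built from
    pip3 show octavia | grep -i '^version'                      # package version, e.g. 3.1.0.dev166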
17:17 <roukoswarf> octavia==3.1.0.dev166 inside the image, quite ahead...
17:18 <roukoswarf> image is commits up to jan 9
17:19 <roukoswarf> so should we build an image and lock it to 3.0.2? is the mismatch the problem?
17:19 <johnsom> It should not be.
17:20 <roukoswarf> how do i get the images back up so i can stop blocking people while i troubleshoot this? how do i trigger a rebuild of master/backup?
17:21 <johnsom> So, if you can reproduce this, I can give you configurations that will cause it to not delete the failed instance so you can look at the logs. It basically disables all of the error handling code. I just can't reproduce this locally. I'm running python3 in the amp.
17:21 <johnsom> https://developer.openstack.org/api-ref/load-balancer/v2/index.html#failover-amphora
17:21 <johnsom> and
17:22 <roukoswarf> python3 is being used in the amphora, the controller is py2
17:22 <johnsom> https://developer.openstack.org/api-ref/load-balancer/v2/index.html#failover-a-load-balancer
17:22 <roukoswarf> it seems
17:22 <johnsom> That should not matter, we do everything over a JSON REST API
17:23 <roukoswarf> is there no openstack api command to wrap this rest api?
17:24 <johnsom> yes, you can use the openstack CLI
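The CLI equivalents of those two API calls (the amphora failover subcommand may require a newer python-octaviaclient than a stock Rocky install):

    openstack loadbalancer failover <LB_ID>                 # fail over the whole load balancer
    openstack loadbalancer amphora failover <AMPHORA_ID>    # fail over a single amphora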
17:24 <roukoswarf> failover errors out due to being in error state.
17:25 <johnsom> Ah, yeah, that bug was recently fixed.
17:26 <johnsom> This is the rocky backport: https://review.openstack.org/#/c/643005/
17:26 <roukoswarf> so... what do
17:27 <roukoswarf> amphoras are "immutable", and the LB can't failover
17:27 <roukoswarf> not sure where to find why these nodes are failed, those python errors aren't on the failed nodes.
17:28 <johnsom> yeah, all of the details are in the logs. The create listener error should be in the worker log, the failover error should be in the health manager log.
17:29 <johnsom> Umm, you can try setting the status to ACTIVE manually in the DB, then using the failover
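What that manual reset looks like, assuming the default 'octavia' database and table names; this bypasses the normal state machine, so use it with care:

    mysql octavia -e "UPDATE load_balancer SET provisioning_status='ACTIVE' WHERE id='<LB_ID>';"
    openstack loadbalancer failover <LB_ID>    # retry the failover once the LB is out of ERROR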
17:32 *** luksky has quit IRC
17:34 <roukoswarf> i'm hoping the next version (after rocky) won't blow up?
17:35 <johnsom> Well, rocky doesn't blow up for us
17:35 <roukoswarf> well i can't really make syntax errors myself unless i edited the code, which i didn't
17:36 <roukoswarf> it's a kolla-ansible deploy of openstack.
17:36 <johnsom> Yeah, that is what I don't understand, why you are getting those.
17:36 <roukoswarf> i could provide configs
17:36 <johnsom> Can you provide the commit ID from your amphora image?
17:36 <roukoswarf> if you wanted to lab them tog
17:37 <roukoswarf> well, the amphora is running on py3, which shouldn't get that error, it's the 3.0.2 controller that seems to have blown a gasket
17:38 <roukoswarf> 7675c40d3024f275666240e4c2ada44813d0e535 commit on amphora-agent
17:38 <johnsom> The error you pasted was clearly something wrong inside the amphora image.
17:38 <johnsom> Thanks
17:41 <johnsom> Ok, so that is a "master" branch amphora-agent.
17:41 <roukoswarf> yep
17:43 *** trown|lunch is now known as trown
17:44 <roukoswarf> i can build images to a specific commit/version, if that's my fix for my irregularity
17:45 <johnsom> Do you know the exact command line they used to create the listener?
17:46 <johnsom> Ah, looking through the rest of that log snippet, they were creating a pool.
17:47 <johnsom> So, the options they used for the pool would be helpful
17:49 <roukoswarf> they have just been poking around in horizon, i could try and dig for the settings
17:49 <roukoswarf> need to get the blown LB up first though
17:51 <roukoswarf> so i just set the LB to up in the db and then failover the LB and it should bring both the active and backup back?
17:54 <roukoswarf> just want to confirm before i do further damage, seems like an odd way to trigger a reprovision of a LB
17:59 <johnsom> Yes, it should cycle through the amphora and attempt a repair. There is a risk, using LB failover, that it may interrupt traffic that is likely still passing through the LB.
17:59 <roukoswarf> well everything is down currently
18:00 <johnsom> It's not passing traffic still?
18:00 <roukoswarf> adding a listener killed the entire stack
18:00 <roukoswarf> nope, no ports listening on the vip
18:00 <johnsom> Hmmm, that seems more like a failure outside of Octavia.
18:01 <johnsom> Well, then, you have nothing to make worse I guess.
18:01 <roukoswarf> well, other LBs are fine
18:01 <roukoswarf> the one LB that had a listener added blew up both amphoras and all listeners
18:07 *** ivve has joined #openstack-lbaas
18:16 <roukoswarf> so i set everything to online/active on the lb, then failed it over, and now there's a master and a standalone on the same vip...
18:17 <roukoswarf> logs show another 500 error
18:18 <roukoswarf> same state i was in before, so this specific LB setup crashes the amphora setup?
18:19 <roukoswarf> http://sprunge.us/DBBRRl
18:19 <johnsom> Can you ssh into one of the amphora and check the syslog/messages for an amphora-agent error
18:19 <johnsom> ?
18:19 <roukoswarf> sure
18:22 <roukoswarf> http://paste.openstack.org/show/6JQAWhzNSBYExhlVb571/
18:23 <roukoswarf> cannot fork?
18:25 <roukoswarf> only one haproxy is failing though, that i can tell.
18:35 <johnsom> Hmm, yeah, that is odd. Is there an error in the haproxy.log?
18:36 <roukoswarf> it's the fork issue... debugged it
18:36 <roukoswarf> haproxy will throw a fork error instead of ooming
18:36 <johnsom> Yeah, it throws that for a few reasons
18:37 <roukoswarf> strange though, cause i had 200mb of ram free, it wanted more.
18:37 <roukoswarf> even while running it, it still had 200mb of ram free... must have some kind of minfree, which is silly to throw a fork error on...
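A hedged checklist for the "cannot fork" symptom inside an amphora (generic commands; the haproxy log location depends on the image's rsyslog setup):

    free -m                                        # memory headroom when haproxy tries to fork
    sudo grep -iE 'cannot fork|out of memory' /var/log/syslog /var/log/haproxy.log
    sudo ip netns exec amphora-haproxy ss -lnt     # which listeners are actually bound in the VIP namespace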
18:38 <johnsom> How much RAM are you giving the amps? Are you doing a lot of TLS offload?
18:38 <roukoswarf> 0 tls offload, 1gb of ram
18:39 <roukoswarf> it's all tcp proxy, not even http
18:39 <johnsom> This is the ubuntu image right?
18:39 <roukoswarf> yes
18:39 <johnsom> Yeah, TCP doesn't use much
18:41 <johnsom> paste finally loaded, wondering why the str error isn't there from the amphora-agent
18:43 <roukoswarf> doubt the str error is related to this specific haproxy crash, but yeah
18:44 <johnsom> Well, if it caused one of the files to be written out wrong, it might
18:45 <roukoswarf> it was happening on LBs that never went down
18:45 <roukoswarf> but... maybe they are close on ram and failed intermittently?
18:45 <johnsom> FYI, here is my amp memory with one LB: KiB Mem :  1015812 total,   756360 free,   114248 used,   145204 buff/cache
19:10 *** abaindur has joined #openstack-lbaas
19:33 <roukoswarf> johnsom: if the health monitor is happy, and the member is happy, why would the listener still be error? just a delay in update?
19:33 <johnsom> Provisioning status error or operating status error?
19:34 <roukoswarf> provisioning, sorry.
19:34 <roukoswarf> does provisioning status need a manual kick on a failover rebuild?
19:35 <johnsom> provisioning status is the status of actions the controller was taking. Operating is the status as measured or observed.
19:35 <johnsom> A listener in prov status ERROR means the last time the controller attempted to interact with that listener, the process failed. For example, if neutron had a problem and the controller could not reach the listener.
19:36 <johnsom> A successful failover will clear the provisioning status errors.
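Both status types are visible from the CLI, which makes the distinction easier to watch during a failover:

    openstack loadbalancer show <LB_ID> -c provisioning_status -c operating_status
    openstack loadbalancer status show <LB_ID>     # full tree: listeners, pools, members, health monitors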
19:37 <roukoswarf> even if we enable/disable a health monitor, it flips the provisioning status to active, lol.
19:40 <johnsom> Yeah, because it did a round trip to the amp and pushed out another configuration. So if that succeeded, i.e. neutron is working again, it's back to Active.
19:40 <roukoswarf> yep, thanks for all the help
20:14 *** trown is now known as trown|outtypewww
20:18 *** yamamoto has joined #openstack-lbaas
20:22 *** yamamoto has quit IRC
20:54 *** luksky has joined #openstack-lbaas
22:01 *** abaindur has quit IRC
22:03 *** abaindur has joined #openstack-lbaas
22:08 *** pcaruana has quit IRC
22:16 *** celebdor has joined #openstack-lbaas
22:16 *** rcernin has joined #openstack-lbaas
23:12 *** fnaval has quit IRC
23:20 *** fnaval has joined #openstack-lbaas
23:21 *** luksky has quit IRC
23:22 <rm_work> xgerman: did you want to refactor https://review.openstack.org/#/c/613685/ on top of https://review.openstack.org/642914 which I just +W'd?
23:22 <rm_work> might still be able to make RC1... though not sure, we're at the line
23:23 <xgerman> ok, will refactor that one on top
23:23 <rm_work> i think it addresses the last concerns cgoncalves had
23:23 <johnsom> Wow, you are bold, +W'ing a flow change....
23:23 <johnsom> We just had to fix/revert one of those
23:23 <xgerman> that one is LOW impact
23:24 <johnsom> grin, ok
23:24 <johnsom> Ugh, tracking down a status bug my last patch uncovered.... Fun times, not
23:26 <johnsom> Summary: pool: PENDING_UPDATE, lb: ACTIVE, list: ACTIVE
23:27 *** abaindur has quit IRC
23:27 <rm_work> yeah it passed my head-parser
23:27 <rm_work> so i'm pretty sure it's good :D
23:28 <rm_work> hmm :/
23:28 <rm_work> somehow the pool is escaping our state machine?
23:28 <rm_work> BTW I'm starting to move patches that are not ready yet out of the "Merged by 19th" section into "next cycle"
23:28 <johnsom> Ok
23:29 <johnsom> Yeah, somewhere we aren't marking the pool back to active at the end of a flow.
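One hedged way to spot that symptom directly, assuming the default 'octavia' database names (a pool left behind in a PENDING_* state after its flow finished):

    mysql octavia -e "SELECT id, provisioning_status FROM pool WHERE provisioning_status LIKE 'PENDING%';"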
23:29 <rm_work> not ready yet == zuul -1 that is clearly not an intermittent issue
23:29 <rm_work> or heavyweight refactors that we just don't need in an RC
23:30 *** abaindur has joined #openstack-lbaas
23:31 <rm_work> you really think you'll make it in for RC1? lol
23:31 <johnsom> I think we need to get this lifecycle bug fix in.
23:31 <rm_work> i mean, it's not *done*, and gate time alone...
23:32 <johnsom> Thursday is the hard cut
23:32 <rm_work> what TIME thursday?
23:32 <johnsom> I should have it fixed up today
23:32 <johnsom> Well, the patch is fine, but we have another bug it exposed
23:32 <rm_work> lol ok, if you really want it in, I'll try :)
23:33 <johnsom> Ugh, that neutron-lbaas proxy API gate is blocking an A10 patch on neutron-lbaas.
23:34 <xgerman> ?
23:34 <johnsom> https://review.openstack.org/#/c/639571/
23:36 *** henriqueof has joined #openstack-lbaas
23:41 <xgerman> must have forgotten how to read logs. http://logs.openstack.org/71/639571/1/gate/neutron-lbaasv2-dsvm-api-proxy/a673ee2/logs/screen-n-api.txt.gz ends timestamp-wise way before the test runs
23:42 <xgerman> disregard
23:56 *** celebdor has quit IRC
