Friday, 2020-07-10

00:15  *** wuchunyang has quit IRC
00:16  *** wuchunyang has joined #openstack-lbaas
00:17  *** yamamoto has joined #openstack-lbaas
01:02  *** wuchunyang has quit IRC
01:03  <openstackgerrit> Anushka Singh proposed openstack/octavia master: Refactoring amphora stats driver interface  https://review.opendev.org/737111
01:10  <aannuusshhkkaa> johnsom, we fixed 2/3 issues you had raised on https://review.opendev.org/737111...
01:23  *** yamamoto has quit IRC
01:26  *** yamamoto has joined #openstack-lbaas
01:30  *** yamamoto has quit IRC
01:32  *** tkajinam has quit IRC
01:32  *** tkajinam has joined #openstack-lbaas
01:32  <openstackgerrit> Merged openstack/octavia-tempest-plugin master: Change to use memory_tracker variable  https://review.opendev.org/704202
01:34  *** wuchunyang has joined #openstack-lbaas
01:48  *** ianychoi_ has quit IRC
01:50  *** ianychoi_ has joined #openstack-lbaas
02:27  *** armax has joined #openstack-lbaas
02:51  *** yamamoto has joined #openstack-lbaas
03:05  *** yamamoto has quit IRC
03:06  *** yamamoto has joined #openstack-lbaas
03:40  *** wuchunyang has quit IRC
03:45  *** wuchunyang has joined #openstack-lbaas
04:08  *** coreycb has quit IRC
04:08  *** headphoneJames has quit IRC
04:08  *** rm_work has quit IRC
04:08  *** nicolasbock has quit IRC
04:08  *** armax has quit IRC
04:08  *** KeithMnemonic has quit IRC
04:08  *** bcafarel has quit IRC
04:08  *** laerling has quit IRC
04:08  *** oklhost has quit IRC
04:08  *** dayou has quit IRC
04:08  *** zigo has quit IRC
04:08  *** gthiemonge has quit IRC
04:08  *** andy_ has quit IRC
04:08  *** cgoncalves has quit IRC
04:08  *** amotoki has quit IRC
04:08  *** dulek has quit IRC
04:08  *** f0o has quit IRC
04:08  *** ramishra has quit IRC
04:08  *** dasp_ has quit IRC
04:08  *** gmann has quit IRC
04:08  *** emccormick has quit IRC
04:08  *** dougwig has quit IRC
04:08  *** mnaser has quit IRC
04:08  *** squarebracket has quit IRC
04:08  *** ianychoi_ has quit IRC
04:08  *** jmccrory has quit IRC
04:08  *** njohnston has quit IRC
04:08  *** wuchunyang has quit IRC
04:08  *** rpittau has quit IRC
04:08  *** osmanlicilegi has quit IRC
04:08  *** zzzeek has quit IRC
04:08  *** brtknr has quit IRC
04:08  *** mloza has quit IRC
04:08  *** trident has quit IRC
04:08  *** tobberydberg_ has quit IRC
04:08  *** eandersson has quit IRC
04:08  *** sorrison has quit IRC
04:08  *** zetaab has quit IRC
04:08  *** openstackgerrit has quit IRC
04:08  *** frickler has quit IRC
04:08  *** johnthetubaguy has quit IRC
04:08  *** vesper11 has quit IRC
04:08  *** fyx has quit IRC
04:08  *** NobodyCam has quit IRC
04:08  *** jrosser has quit IRC
04:08  *** aannuusshhkkaa has quit IRC
04:08  *** JayF has quit IRC
04:08  *** mugsie has quit IRC
04:08  *** stingrayza has quit IRC
04:08  *** jamespage has quit IRC
04:08  *** kklimonda has quit IRC
04:08  *** dmsimard has quit IRC
04:08  *** dosaboy has quit IRC
04:08  *** TMM has quit IRC
04:08  *** michchap has quit IRC
04:08  *** devfaz has quit IRC
04:08  *** numans has quit IRC
04:08  *** yamamoto has quit IRC
04:08  *** servagem has quit IRC
04:08  *** colin- has quit IRC
04:08  *** dtruong has quit IRC
04:08  *** beisner has quit IRC
04:08  *** hemanth_n has quit IRC
04:08  *** irclogbot_3 has quit IRC
04:08  *** haleyb has quit IRC
04:08  *** logan- has quit IRC
04:08  *** kevinz has quit IRC
04:08  *** lxkong has quit IRC
04:08  *** johnsom has quit IRC
04:08  *** tkajinam has quit IRC
04:08  *** xgerman has quit IRC
04:08  *** andrein has quit IRC
04:08  *** stevenglasford has quit IRC
04:14  *** ramishra has joined #openstack-lbaas
04:14  *** squarebracket has joined #openstack-lbaas
04:14  *** mnaser has joined #openstack-lbaas
04:14  *** dougwig has joined #openstack-lbaas
04:14  *** emccormick has joined #openstack-lbaas
04:14  *** gmann has joined #openstack-lbaas
04:14  *** dasp_ has joined #openstack-lbaas
04:14  *** amotoki has joined #openstack-lbaas
04:14  *** cgoncalves has joined #openstack-lbaas
04:14  *** andy_ has joined #openstack-lbaas
04:14  *** gthiemonge has joined #openstack-lbaas
04:14  *** nicolasbock has joined #openstack-lbaas
04:14  *** rm_work has joined #openstack-lbaas
04:14  *** headphoneJames has joined #openstack-lbaas
04:14  *** coreycb has joined #openstack-lbaas
04:14  *** frickler has joined #openstack-lbaas
04:14  *** openstackgerrit has joined #openstack-lbaas
04:14  *** zetaab has joined #openstack-lbaas
04:14  *** sorrison has joined #openstack-lbaas
04:14  *** eandersson has joined #openstack-lbaas
04:14  *** johnthetubaguy has joined #openstack-lbaas
04:14  *** tobberydberg_ has joined #openstack-lbaas
04:14  *** trident has joined #openstack-lbaas
04:14  *** zzzeek has joined #openstack-lbaas
04:14  *** mugsie has joined #openstack-lbaas
04:14  *** JayF has joined #openstack-lbaas
04:14  *** aannuusshhkkaa has joined #openstack-lbaas
04:14  *** jrosser has joined #openstack-lbaas
04:14  *** NobodyCam has joined #openstack-lbaas
04:14  *** vesper11 has joined #openstack-lbaas
04:14  *** jmccrory has joined #openstack-lbaas
04:14  *** ianychoi_ has joined #openstack-lbaas
04:14  *** njohnston has joined #openstack-lbaas
04:14  *** osmanlicilegi has joined #openstack-lbaas
04:14  *** rpittau has joined #openstack-lbaas
04:14  *** numans has joined #openstack-lbaas
04:14  *** devfaz has joined #openstack-lbaas
04:14  *** michchap has joined #openstack-lbaas
04:14  *** TMM has joined #openstack-lbaas
04:14  *** dosaboy has joined #openstack-lbaas
04:14  *** dmsimard has joined #openstack-lbaas
04:14  *** kklimonda has joined #openstack-lbaas
04:14  *** jamespage has joined #openstack-lbaas
04:14  *** stingrayza has joined #openstack-lbaas
04:14  *** yamamoto has joined #openstack-lbaas
04:14  *** tkajinam has joined #openstack-lbaas
04:14  *** servagem has joined #openstack-lbaas
04:14  *** fyx has joined #openstack-lbaas
04:14  *** lxkong has joined #openstack-lbaas
04:14  *** johnsom has joined #openstack-lbaas
04:14  *** xgerman has joined #openstack-lbaas
04:14  *** andrein has joined #openstack-lbaas
04:14  *** stevenglasford has joined #openstack-lbaas
04:14  *** beisner has joined #openstack-lbaas
04:14  *** hemanth_n has joined #openstack-lbaas
04:14  *** kevinz has joined #openstack-lbaas
04:14  *** colin- has joined #openstack-lbaas
04:14  *** f0o has joined #openstack-lbaas
04:14  *** dulek has joined #openstack-lbaas
04:14  *** dtruong has joined #openstack-lbaas
04:14  *** irclogbot_3 has joined #openstack-lbaas
04:14  *** haleyb has joined #openstack-lbaas
04:14  *** logan- has joined #openstack-lbaas
04:15  *** brtknr has joined #openstack-lbaas
04:15  *** mloza has joined #openstack-lbaas
04:15  *** armax has joined #openstack-lbaas
04:15  *** laerling has joined #openstack-lbaas
04:15  *** KeithMnemonic has joined #openstack-lbaas
04:15  *** bcafarel has joined #openstack-lbaas
04:15  *** oklhost has joined #openstack-lbaas
04:15  *** dayou has joined #openstack-lbaas
04:15  *** zigo has joined #openstack-lbaas
04:15  *** coreycb has quit IRC
04:16  *** nicolasbock has quit IRC
04:16  *** beisner has quit IRC
04:16  *** gmann has quit IRC
04:16  *** mnaser has quit IRC
04:16  *** fyx has quit IRC
04:16  *** coreycb has joined #openstack-lbaas
04:18  *** beisner has joined #openstack-lbaas
04:18  *** fyx has joined #openstack-lbaas
04:19  *** gmann has joined #openstack-lbaas
04:21  *** nicolasbock has joined #openstack-lbaas
04:35  *** yamamoto has quit IRC
04:38  *** yamamoto has joined #openstack-lbaas
05:11  *** gcheresh has joined #openstack-lbaas
05:25  *** vishalmanchanda has joined #openstack-lbaas
05:32  *** gcheresh has quit IRC
05:51  *** wuchunyang has joined #openstack-lbaas
06:22  *** ianychoi_ has quit IRC
06:23  *** ianychoi_ has joined #openstack-lbaas
06:25  *** tkajinam has quit IRC
06:26  *** tkajinam has joined #openstack-lbaas
06:37  *** also_stingrayza has joined #openstack-lbaas
06:39  *** stingrayza has quit IRC
06:47  *** also_stingrayza is now known as stingrayza
06:48  *** wuchunyang has quit IRC
06:53  *** wuchunyang has joined #openstack-lbaas
07:20  *** ataraday_ has joined #openstack-lbaas
07:25  *** wuchunyang has quit IRC
07:57  *** maciejjozefczyk has joined #openstack-lbaas
08:00  *** gcheresh has joined #openstack-lbaas
08:24  *** gcheresh has quit IRC
08:26  *** yamamoto has quit IRC
08:27  *** cgoncalves has quit IRC
08:28  *** yamamoto has joined #openstack-lbaas
08:29  *** cgoncalves has joined #openstack-lbaas
08:29  *** yamamoto has quit IRC
08:29  *** yamamoto has joined #openstack-lbaas
08:30  *** gcheresh has joined #openstack-lbaas
08:38  <ataraday_> cgoncalves, Hi! I was looking into making the grenade job do an amphora -> amphorav2 upgrade, and there is an issue with that. I found that I can pass some settings via grenade_devstack_localrc, but only to new setups: https://review.opendev.org/#/c/737993/6/zuul.d/amphorav2-jobs.yaml@107
08:38  <ataraday_> But I cannot pass post-configs with that
08:39  <ataraday_> I tried some options and checked other projects, but cannot find anything that helps
08:44  <cgoncalves> ataraday_, hello! yeah, grenade does not support post-config settings like normal jobs do. you will have to have a per-release-specific upgrade script
08:45  <cgoncalves> ref: https://docs.openstack.org/grenade/latest/readme.html#theory-of-upgrade (see the last bullet item in that section)
08:45  <cgoncalves> an example of a patch I have open: https://review.opendev.org/#/c/738017/4/devstack/upgrade/from-ussuri/upgrade-octavia
08:46  <cgoncalves> ataraday_, although I'm not sure we need such a script if we go forward with aliasing "amphora" to "amphorav2"
08:47  <ataraday_> cgoncalves, great, I'll look into it
08:48  <ataraday_> mmm, this was about adding experimental jobs
08:50  <ataraday_> maybe I should not add the grenade job for now
08:53  *** born2bake has joined #openstack-lbaas
08:55  *** gcheresh has quit IRC
09:02  <cgoncalves> ataraday_, you could propose a patch with the alias
09:04  <cgoncalves> it should be trivial; see the "octavia" to "amphora" alias: https://github.com/openstack/octavia/blob/master/setup.cfg#L59-L60
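For reference, the aliasing discussed here is just a duplicate entry-point line in setup.cfg pointing two provider names at the same driver class. A minimal sketch of what the proposed amphorav2 -> amphora alias could look like (the module paths shown are illustrative, not copied from the actual patch):

    [entry_points]
    octavia.api.drivers =
        # existing alias: "octavia" maps to the v1 "amphora" driver
        octavia = octavia.api.drivers.amphora_driver.v1.driver:AmphoraProviderDriver
        # proposed: point "amphora" at the v2 driver, keeping "amphorav2" working
        amphora = octavia.api.drivers.amphora_driver.v2.driver:AmphoraProviderDriver
        amphorav2 = octavia.api.drivers.amphora_driver.v2.driver:AmphoraProviderDriver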
09:07  <ataraday_> I guess making the amphorav2 -> amphora alias is the final step, and we need the experimental jobs before that to verify we can do it :)
09:09  <cgoncalves> right, so a depends-on patch would work
09:10  <cgoncalves> 1) propose the alias patch (ignore CI results), 2) set depends-on on the experimental jobs patch for CI validation
09:22  <ataraday_> OK, but then this job will make sense only against this change.. maybe this is fine, I will add a comment about it.
09:22  <ataraday_> Thanks a lot!
09:39  <openstackgerrit> Ann Taraday proposed openstack/octavia master: Add experimental amphorav2 jobs  https://review.opendev.org/737993
09:40  <cgoncalves> we can run some experiments to see how it goes :)
09:40  *** dosaboy has quit IRC
09:41  <openstackgerrit> Ann Taraday proposed openstack/octavia master: Alias change amphorav2 -> amphora  https://review.opendev.org/740432
09:43  <cgoncalves> ataraday_, not sure I follow your comment in https://review.opendev.org/#/c/739053/3/octavia/common/base_taskflow.py. retryMaskFilter is in both v2.controller_worker and in base_taskflow. are you saying we only need it in one place?
09:44  <ataraday_> cgoncalves, no, I mean we need it in both places: in v2.controller_worker and in base_taskflow
09:45  <ataraday_> with the jobboard enabled or disabled, the logs come from different start points
09:45  *** wuchunyang has joined #openstack-lbaas
09:49  <ataraday_> cgoncalves, https://review.opendev.org/#/c/647406/106/octavia/controller/worker/v2/controller_worker.py@48 it was dropped from v2.controller_worker
09:49  <cgoncalves> ataraday_, oh, I see, the log filter was later removed from v2.controller_worker.
09:49  <cgoncalves> yep
09:49  <ataraday_> sorry for the confusion
09:50  <cgoncalves> ataraday_, still, v2.controller_worker imports base_taskflow, so the log filter will still be applied, no?
09:50  <cgoncalves> ah, no. never mind
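The retryMaskFilter being discussed is a standard Python logging filter that suppresses taskflow's noisy per-retry log records. A minimal sketch of the pattern (class name, logger name, and matched text are illustrative, not Octavia's exact code):

    import logging

    class RetryMaskFilter(logging.Filter):
        """Drop taskflow retry records that would otherwise flood the logs."""

        def filter(self, record):
            # Returning False suppresses the record; True passes it through.
            return 'retry' not in record.getMessage().lower()

    # Registered in each entry point (base_taskflow and the v2
    # controller_worker), since with the jobboard enabled or disabled
    # the logs originate from different start points.
    logging.getLogger('taskflow').addFilter(RetryMaskFilter())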
09:53  <openstackgerrit> Gregory Thiemonge proposed openstack/octavia-tempest-plugin master: WIP SCTP traffic scenario tests  https://review.opendev.org/738643
09:53  *** yamamoto has quit IRC
09:55  *** gcheresh has joined #openstack-lbaas
10:08  *** wuchunyang has quit IRC
10:08  *** yamamoto has joined #openstack-lbaas
10:08  *** yamamoto has quit IRC
10:09  *** yamamoto has joined #openstack-lbaas
10:11  *** yamamoto has quit IRC
10:12  *** yamamoto has joined #openstack-lbaas
10:20  *** gcheresh has quit IRC
10:41  *** spatel has joined #openstack-lbaas
10:46  *** spatel has quit IRC
10:48  *** pck has quit IRC
10:51  *** pck has joined #openstack-lbaas
11:19  *** dosaboy has joined #openstack-lbaas
11:19  *** dosaboy has quit IRC
11:19  *** dosaboy has joined #openstack-lbaas
11:23  *** ramishra has quit IRC
11:27  *** ramishra has joined #openstack-lbaas
11:30  *** yamamoto has quit IRC
11:40  *** yamamoto has joined #openstack-lbaas
11:42  *** yamamoto has quit IRC
11:52  *** pck has quit IRC
11:54  *** pck has joined #openstack-lbaas
12:00  *** yamamoto has joined #openstack-lbaas
12:12  *** gcheresh has joined #openstack-lbaas
12:12  *** pck has quit IRC
12:13  *** pck has joined #openstack-lbaas
12:14  <openstackgerrit> Ann Taraday proposed openstack/octavia master: Alias change amphorav2 -> amphora  https://review.opendev.org/740432
12:18  *** yamamoto has quit IRC
12:28  *** yamamoto has joined #openstack-lbaas
12:42  *** spatel has joined #openstack-lbaas
12:47  *** spatel has quit IRC
12:53  *** yamamoto has quit IRC
13:05  *** yamamoto has joined #openstack-lbaas
13:07  *** gcheresh has quit IRC
13:09  *** mnaser has joined #openstack-lbaas
13:11  *** jamesdenton has joined #openstack-lbaas
13:11  <devfaz> hi, anyone here able to help us get some load balancers back to "normal"? We have amphoras in ERROR state and are unable to failover.
13:12  *** yamamoto has quit IRC
13:27  *** irclogbot_3 has quit IRC
13:27  *** kevinz has quit IRC
13:29  *** irclogbot_0 has joined #openstack-lbaas
13:29  *** hemanth_n has quit IRC
13:30  *** hemanth_n_ has joined #openstack-lbaas
13:30  *** TrevorV has joined #openstack-lbaas
13:30  *** haleyb has quit IRC
13:30  *** logan- has quit IRC
13:30  <devfaz> we would like to remove an amphora from the database (the instance already got removed) and just let octavia create a new one. If we just try to failover an amphora we run into different issues, e.g. "unable to attach port" to the new amphora; then we removed the port => "Port: NULL not found"; then we created a vrrp_port as described here http://eavesdrop.openstack.org/irclogs/%23openstack-lbaas/%23openstack-lbaas.2017-11-02.log.html#t2017-11-02T11:07:45 - but now we are getting "subnet_id: Null", ... is there an easy way to just tell octavia: hey, drop this amphora and create a new one with a new vrrp_port?
13:30  *** gmann has quit IRC
13:32  *** gmann has joined #openstack-lbaas
13:32  *** logan- has joined #openstack-lbaas
13:38  *** yamamoto has joined #openstack-lbaas
13:41  <openstackgerrit> Merged openstack/octavia master: Stop to use the __future__ module.  https://review.opendev.org/732880
13:42  *** yamamoto has quit IRC
13:52  *** yamamoto has joined #openstack-lbaas
14:16  <openstackgerrit> Gregory Thiemonge proposed openstack/octavia master: Deny the creation of L7Policies in TCP or UDP listeners  https://review.opendev.org/740478
15:09  *** gcheresh has joined #openstack-lbaas
15:10  *** yamamoto has quit IRC
15:15  *** vishalmanchanda has quit IRC
15:16  *** sapd1_x has joined #openstack-lbaas
15:17  *** ataraday_ has quit IRC
15:37  *** tkajinam has quit IRC
15:57  *** gcheresh has quit IRC
15:58  *** gcheresh has joined #openstack-lbaas
16:16  *** armax has quit IRC
16:19  *** gcheresh has quit IRC
16:20  *** armax has joined #openstack-lbaas
16:50  *** dmellado has joined #openstack-lbaas
17:04  *** dmellado has quit IRC
17:08  *** dmellado has joined #openstack-lbaas
17:26  *** dmellado has quit IRC
17:36  *** armax has joined #openstack-lbaas
17:38  *** armax has quit IRC
18:44  *** sapd1_x has quit IRC
19:22  *** spatel has joined #openstack-lbaas
19:23  *** spatel has quit IRC
19:23  *** spatel has joined #openstack-lbaas
19:30  *** spatel has quit IRC
19:36  <rm_work> hmm, just a heads up, I am debugging an issue around some session persistence config causing LBs to ERROR
19:37  <rm_work> in my cloud, so cent8 amps and minor patching, but not anything that should interfere; will follow up when I have some idea what's up
19:38  <johnsom> "not anything that should interfere" lol
19:42  <rm_work> it's pretty minimal now
19:43  <rm_work> nova scheduling patch, and a patch to force the cent8 amps to actually ARP properly on boot
20:01  *** TrevorV has quit IRC
20:02  *** gcheresh has joined #openstack-lbaas
20:25  *** maciejjozefczyk has quit IRC
20:41  <rm_work> johnsom: ok it's super weird
20:41  <cgoncalves> rm_work, could you please revisit https://review.opendev.org/#/c/738246/ (nested virt for CI patch)
20:42  <johnsom> rm_work Theme of my day
20:42  <rm_work> one of my amps gets
20:42  <rm_work> [2020-07-10 20:35:55 +0000] [1090] [DEBUG] Ignoring connection reset
20:42  <rm_work> and then won't respond for a while
20:42  <rm_work> then [2020-07-10 20:38:55 +0000] [1031] [CRITICAL] WORKER TIMEOUT (pid:1090)
20:43  <rm_work> and then it exits and starts a new worker, and the new worker just gets constant SSL/socket errors
20:43  <johnsom> This is haproxy? <missing some context>
20:43  <rm_work> this is the agent
20:43  <rm_work> LBs getting stuck in PENDING
20:43  <rm_work> and eventually ERROR
20:43  <rm_work> (after timeout)
20:44  <johnsom> yeah, go to ERROR. Ok, so this is the gunicorn worker
20:44  <rm_work> yes
20:44  <johnsom> This rings a bell, I'm just not sure which one yet.
20:45  <rm_work> oh huh
20:45  <rm_work> [2020-07-10 20:34:10 +0000] [1090] [DEBUG] PUT /1.0/loadbalancer/083a6861-8ec9-47cb-81e6-6b03dbf45a1f/reload
20:45  <rm_work> ::ffff:10.249.23.94 - - [10/Jul/2020:20:35:40 +0000] "PUT /1.0/loadbalancer/083a6861-8ec9-47cb-81e6-6b03dbf45a1f/reload HTTP/1.1" 500 377 "-" "Octavia HaProxy Rest Client/0.5 (https://wiki.openstack.org/wiki/Octavia)"
20:46  <rm_work> right before it starts doing this
20:46  <johnsom> Yeah, is it memory pressure in the amp?
20:47  <rm_work> http://paste.openstack.org/show/3AbgVxoWHvxUwpf6N2Di/
20:47  *** KeithMnemonic has quit IRC
20:47  <rm_work> no
20:48  <cgoncalves> could it be the haproxy memory bug gthiemonge has been working on? because you mentioned session persistence
20:48  <johnsom> Is there a "failed" haproxy config file? You know, the one that it saves if haproxy doesn't like the config?
20:49  <rm_work> looking
20:49  <johnsom> Yeah, that is why I asked about the memory pressure
20:49  <rm_work> no
20:49  <rm_work> and the other amp took the config correctly
20:49  <rm_work> both amps seem to be around:
20:49  <rm_work>               total        used        free      shared  buff/cache   available
20:49  <rm_work> Mem:          979Mi       315Mi       540Mi       6.0Mi       123Mi       529Mi
20:49  <rm_work> seems not bad
20:49  <cgoncalves> rm_work, https://storyboard.openstack.org/#!/story/2007794
20:50  <johnsom> Yeah, plenty
20:50  <rm_work> i can try changing the connection limit tho
20:50  <johnsom> I doubt it is related
20:51  <johnsom> Can you paste the systemd service file for haproxy?
20:51  <johnsom> Really I'm looking for the peer ID string
20:52  <rm_work> heh
20:52  <cgoncalves> hah
20:52  <rm_work> wait, where is that
20:52  <rm_work> yeah, it always happens when I add a member...
20:52  <rm_work> i am trying to confirm that also
20:54  <johnsom> You can also get the string I want with "ps -ef | grep haproxy"
20:54  <johnsom> Should be after the -L
20:54  <rm_work> ah
20:54  <rm_work> lEyka8jttt6jkyQiPHSB6AvUwU0
20:54  <rm_work> no -
20:54  <johnsom> Ok, bummer, it's not that
20:55  <johnsom> what does the haproxy log file have? Anything interesting?
20:55  <rm_work> ah, i do see something in there, was just looking
20:55  <rm_work> interesting
20:56  <rm_work> FD limit issues
20:56  <rm_work> on the third member, i think
20:56  <rm_work> http://paste.openstack.org/show/795804/
20:56  <rm_work> maxconn related also
20:57  <johnsom> Nope, it's the usage output.
20:57  <johnsom> The FD stuff is always there
20:57  <johnsom> So, that peer ID has to be the problem.
20:58  <cgoncalves> wouldn't we see a second usage output if it was a bad peer ID?
20:58  <johnsom> Hmm, there is a "cannot fork" in there
20:58  <johnsom> Maybe it is memory. Try dropping the max connections on the listener down to 50k
20:59  <rm_work> wait, so we ALWAYS have an FD problem?
20:59  <johnsom> Yeah, it always whines about the FD limit and drops it down to whatever the instance can handle
21:00  <johnsom> Something that is fixed in 2.x haproxy versions, btw
21:02  <johnsom> That usage output makes me wonder though. I don't think we see that when it's just the memory being too low
21:05  <rm_work> hmmm, this is weird tho
21:05  <rm_work> i created a third member and it worked fine...
21:06  <rm_work> so i deleted it, which worked
21:06  <rm_work> and recreated it
21:06  <rm_work> and now it broke one amp again
21:06  <cgoncalves> maxconn issue
21:07  <rm_work> memory is totally fine tho
21:07  <johnsom> Give it a shot though, I'm leaning that way as well
21:07  <cgoncalves> rm_work, it is until haproxy tries to reload
21:08  <rm_work> k
21:08  *** gcheresh has quit IRC
21:08  <johnsom> You could check syslog and see if there are oom logs, but I don't think there always are
21:08  <rm_work> i mean, it then succeeds at loading haproxy again right after
21:08  <rm_work> and memory is NOW fine
21:08  <rm_work> but the amp agent is still totally busted
21:08  <rm_work> and therefore the amp is broken
21:09  <cgoncalves> rm_work, it works fine if you don't reload too fast
21:09  <rm_work> ok sooooo
21:09  <rm_work> why is the amp agent dead
21:09  <rm_work> and spewing connection errors
21:09  <rm_work> so haproxy failed to load -- ok
21:09  <rm_work> but now the amp agent can't accept connections?
21:10  <rm_work> how would one affect the other that way
21:10  <johnsom> I have some theories on that. my guess is the systemctl restart was hanging, which let the controller time out and close the connection, thus the connection reset, but gunicorn is still waiting on systemctl to give up or whatever
21:11  <rm_work> trying a manual restart of the amp agent to see if it comes back
21:11  <johnsom> check if there is a systemctl in the process list
21:12  <rm_work> ok, well, i restarted the agent
21:12  <rm_work> and the amp is back to ACTIVE
21:12  <rm_work> and now the LB is good
21:12  <rm_work> "restart" didn't work, had to do stop/start BTW
21:15  <rm_work> yeah
21:15  <rm_work> /bin/systemctl reload haproxy-143b63ec-6058-437f-8eb6-112380a612e4.service
21:15  <rm_work> it's stuck on the reload and timing out gunicorn, you're correct
21:15  <rm_work> so gunicorn doesn't handle that well and just ... breaks?
21:15  <rm_work> and can't recover
21:16  <johnsom> Well, we have one worker configured (for good reasons) and that one worker is locked up with systemd dumbness
21:16  <rm_work> hmm
21:17  <rm_work> and systemd NEVER times out?
21:17  <rm_work> ah no, it is gone now
21:17  <rm_work> but the agent is still borked, I think
21:17  <johnsom> It does, but probably waits longer than gunicorn
21:17  <rm_work> so once that's timed out, gunicorn should start responding, right?
21:19  <johnsom> Yeah, we see it killing the worker, but I don't know what it does when a new worker is started; does it just run the request again?
21:19  <rm_work> doesn't seem to
21:23  <johnsom> So out of curiosity, did lowering the listener connection limit help?
21:25  <rm_work> still doing a bunch of testing to make sure i can 100% replicate
21:25  <rm_work> i think i'm about confident
21:35  <johnsom> Now that we are on python3 we could add a timeout to that systemd call
21:41  *** shtepanie has joined #openstack-lbaas
21:42  <aannuusshhkkaa> johnsom: are you around?
21:43  <johnsom> aannuusshhkkaa I am
21:44  <aannuusshhkkaa> so we were wondering if we should create a new table for the new metrics we are adding
21:45  <johnsom> Those are the CPU and RAM?
21:45  <shtepanie> yep, and later on we plan on adding things like load averages and disk usage
21:46  <aannuusshhkkaa> and probably some related to active connections and network bandwidth etc...
21:46  <johnsom> Hmm, do you need them stored for your use case?
21:46  <aannuusshhkkaa> as opposed to?
21:47  <johnsom> Well, we could collect them from the amphora, pass them to the metrics driver(s), and that is all.
21:47  <johnsom> We are not planning to add those to the API, so I don't know if they need to be stored or simply passed to the metrics driver(s)
21:49  <aannuusshhkkaa> hmm okay..
21:50  <aannuusshhkkaa> so we won't need a new API either, right?
21:50  <johnsom> We store the current metrics because they are exposed via the API, so if someone asks for the stats we don't have to wait for a message or poll the amphora. But these metrics I don't think we are planning to add to the API; we are just going to use them and/or send them somewhere.
21:52  <aannuusshhkkaa> so we just query the amphora for the NEW metrics, say every 10 seconds, and use them however we want?
21:53  <johnsom> Well, the amphora is going to send them, right? we aren't adding polling.
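For context, the split johnsom describes (collect in the amphora, hand the data to pluggable drivers) matches the stats driver interface being refactored in https://review.opendev.org/737111. A rough sketch of what such a driver interface could look like; class and method names here are illustrative, not the final interface:

    from abc import ABC, abstractmethod


    class StatsDriverBase(ABC):
        """Receives metrics reported by amphorae and does something with them."""

        @abstractmethod
        def update_stats(self, stats_list):
            """Handle a batch of stats measurements.

            A DB-backed driver would persist them so the API can serve
            stats immediately; a metrics-export driver might instead
            forward them to an external system without storing anything.
            """


    class LogStatsDriver(StatsDriverBase):
        """Toy driver: just log what was received."""

        def update_stats(self, stats_list):
            for stats in stats_list:
                print('listener %s: %s active connections' %
                      (stats['listener_id'], stats['active_connections']))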
21:54  <shtepanie> we want our customer to be able to get the metrics when they ask for them, so wouldn't we fall into the first scenario? and if we don't add them to the API, the customer would just have to wait for the next message to get the stats?
21:54  <johnsom> Hmm, we didn't write up a spec, did we rm_work?
21:55  <rm_work> we did not.... though i expected we WOULD expose those via the API somehow
21:55  <rm_work> I realize they're not always totally generic
21:56  <johnsom> Yeah, so maybe our use cases are very different. I thought we were updating the metrics and adding the ability to have metrics drivers, which would take the data and do something with it.
21:56  <rm_work> yes, both
21:56  <rm_work> but they're separate things
21:56  <rm_work> and if we pass that data to the metrics drivers, the update_db driver needs to store it too...
21:56  <johnsom> Yeah, the issues with the API are: 1. users aren't supposed to know amphorae even exist, let alone the memory or CPU allocated to them.
21:57  <rm_work> right, been thinking about that
21:57  <johnsom> 2. we would have to store those
21:57  <rm_work> so if we didn't create a new table and added them to the listener stats table... there would be a little bit of data duplication on the amp driver, but it could at least allow other providers to store things granularly enough
21:57  <johnsom> 3. then expose more "amphora" stuff
21:58  <johnsom> 4. Is it going to be consistent with different amphora images, etc.?
21:58  <rm_work> well, they all have a concept of CPU/RAM
21:58  <rm_work> and if it's percentages, then... pretty consistent
21:58  <johnsom> Not really per-listener or per-LB though
21:59  <johnsom> This just feels like exposing the sausage making; it's ugly, and customers really only want the finished product.
22:00  <rm_work> yeah, i mean we will already have to make funky decisions like "do we store just the MAX of the usage values, since we have two amps returning data?"
22:00  <rm_work> you say that, but we've tried to convince our customer that all they need is active connections and an estimate of the maximum we support
22:01  <rm_work> and that has not been an accepted answer, they want to see CPU/RAM data
22:01  <johnsom> Yeah, you just can't add that to the listener or lb api. It wouldn't make any sense
22:02  <rm_work> it wouldn't necessarily make sense to SHOW on the listener stats api call
22:02  <johnsom> The only place you could add it is to create a new "amphora stats" API
22:04  <rm_work> yeah, that's an option
22:04  <rm_work> I mean, it COULD be displayed on the loadbalancer stats
22:04  <rm_work> because I believe every provider will have these stats, even the hardware ones? wouldn't an F5 have CPU/RAM usage?
22:04  <johnsom> So you have an LB with five amphorae on it. Do you average?
22:04  <rm_work> like i said, Peak
22:05  <rm_work> Max
22:05  <johnsom> No, F5 doesn't have this
22:05  <johnsom> F5 would have a few thousand load balancers all sharing a CPU and RAM
22:06  <rm_work> ah, I didn't think F5 did multi-tenant
22:06  <rm_work> i guess it makes sense tho
22:06  <rm_work> i've just never seen it deployed that way
22:07  <aannuusshhkkaa> looks like F5 does? https://techdocs.f5.com/kb/en-us/products/big-ip_analytics/manuals/product/analytics-implementations-12-1-0/7.html
22:07  <aannuusshhkkaa> they have cpu and ram usage.. correct me if I am wrong!
22:08  <johnsom> aannuusshhkkaa Yeah, that is exactly what I am saying. The only metrics for cpu/disk/ram on the appliances are for thousands of load balancers. It's not per-load-balancer like the amphora.
22:09  *** rcernin has joined #openstack-lbaas
22:10  <aannuusshhkkaa> ouuu okay
22:10  <rm_work> hmm, it does have them?
22:10  <rm_work> and i guess it depends on the deployment
22:10  <johnsom> On F5 a load balancer is a "virtual server". The appliances are expensive, so you stack as many virtual servers (load balancers) on each appliance as you can.
22:11  <aannuusshhkkaa> and if one fails, does it mean the others are about to fail too?
22:12  <johnsom> Yeah, it is shared fate, but typically you have them in an HA pair and it fails over to the other appliance.
22:13  <rm_work> so you would -2 adding system statistics to LoadbalancerStats in any form?
22:13  <aannuusshhkkaa> gotcha, then using peak (max) totally makes sense, right?
22:14  <johnsom> So every customer would see 95%, basically. No matter if their load balancer was idle.
22:16  <aannuusshhkkaa> i guess a false positive indicating failure is safer than a false negative in this case, right?
22:16  <rm_work> mixed bag -- don't necessarily want a ton of customers coming to us and complaining that their LB is always over 50% capacity and they want a new one
22:17  <johnsom> Just to give you an idea, this is the cheapest F5 appliance: https://www.softchoice.com/catalog/en-us/network-devices-f5-big-ip-iseries-local-traffic-manager-i2800-load-balancing-device-F5Networks-UX5251
22:17  <rm_work> lol yeah
22:17  <aannuusshhkkaa> whaaaaat!!!!!
22:18  <aannuusshhkkaa> rm_work, right.. so what do we do then? about the mixed bag?
22:20  *** dmellado has joined #openstack-lbaas
22:20  <johnsom> Oh, ha, I was wrong, there is one for $16,000
22:22  <rm_work> well, one option is we tell our customer "sorry, we know you want CPU/RAM stats, but... you don't get that. Trust us that the meaningful metric is current active connections, and use that."
22:22  <rm_work> which I tried to do in our last meeting and was ignored
22:23  <rm_work> or rather, told "no, that isn't good enough"
22:23  <rm_work> but we may just have to be more forceful
22:23  <johnsom> Yeah, so given that for pretty much every other driver those are shared resources across tenants, I don't think it is viable to add them to the load balancer or listener stats.
22:24  <johnsom> If you want to make something like that public, you could, I guess, come up with a bogo-mips kind of thing. It's not really CPU or RAM, but "bogo-load".
22:25  <johnsom> Most drivers would probably do number of connections over max
22:26  <rm_work> right, a "capacity units" measurement, 0-1
22:26  <rm_work> something like that
22:26  <rm_work> and yeah, we would do connections over our estimated max
22:27  <johnsom> We already have that to some degree, as the listener will go degraded if the connection level goes too high
22:31  <aannuusshhkkaa> how do we check that currently? just based on the number of active connections?
22:32  <johnsom> Yes. You set your maximum number of connections when creating the listener. Then we currently collect the number of current active connections.
22:33  <johnsom> HAProxy also notifies us via the "FULL" state: https://github.com/openstack/octavia/blob/master/octavia/controller/healthmanager/health_drivers/update_db.py#L289
22:33  <johnsom> That means connections are getting queued on the front end
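The linked update_db code translates the status haproxy reports for each listener and member into Octavia operating statuses; FULL is the signal that the frontend hit its connection limit and new connections are queuing. A simplified sketch of that kind of mapping, consistent with the member-count rules johnsom describes below but not the exact Octavia code:

    # haproxy status strings as seen in the amphora health message (illustrative)
    FULL = 'FULL'

    def listener_operating_status(listener_status, members_down, members_total):
        """Roughly how a listener's operating status could be derived."""
        if listener_status == FULL:
            # Frontend is at maxconn; connections are queuing.
            return 'DEGRADED'
        if members_total and members_down == members_total:
            # No members left to serve traffic.
            return 'ERROR'
        if members_down:
            # e.g. one of five members not responding.
            return 'DEGRADED'
        return 'ONLINE'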
22:40  <aannuusshhkkaa> aah gotcha
22:41  <aannuusshhkkaa> so the health of an LB is determined based on current connections / max number of connections?
22:41  <johnsom> Among the other status issues that could lead to a degraded or error operational state.
22:42  <rm_work> basically everything boils down to how many connections can be open
22:42  <rm_work> CPU/RAM is all just there so we can have more open connections
22:42  <johnsom> +1 to that
22:42  <rm_work> basically an operator should know, in their environment, how many connections is a theoretical max
22:43  <rm_work> and then that should be the "100%"
22:43  <aannuusshhkkaa> johnsom, what are the other "status issues"?
22:43  <johnsom> Like the number of member servers that are down, etc.
22:44  <aannuusshhkkaa> okay, so would we have to incorporate those as well in determining the health of the LB?
22:44  <johnsom> So if you have a pool with five member servers, and one is not responding, that would be a degraded state as well. If all are not responding, then you are in ERROR.
22:45  <shtepanie> going back a little, but if HAProxy notifies us of a "FULL" state, is it also possible to notify us of an "almost full" type of state? some sort of warning when we're starting to get close to a full state?
22:45  <johnsom> Yeah, we already do. That is what the operational status is for
22:45  <shtepanie> ahh okay
22:46  <aannuusshhkkaa> do we already have something that would warn us when the LB is at 50% or 75% capacity?
22:48  <johnsom> Not really percentages. Really, CPU and RAM won't tell you that either.
22:48  <aannuusshhkkaa> yeap yeap..
22:49  <aannuusshhkkaa> so if we want to find that out, what would we use?
22:50  <johnsom> I mean, a user can calculate the percentage using the active connections. We will say when you have reached your connection capacity limit. The part that is harder is deciding on the correct "MAX" to set.
22:50  <rm_work> yeah, MAX has to be determined with performance testing per cloud
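In other words, the "capacity units" idea from earlier reduces to a simple ratio once an operator has benchmarked a realistic maximum for their cloud. A sketch, where the max value is a made-up example and must come from per-cloud performance testing:

    # Example only: the per-cloud max must come from performance testing.
    ESTIMATED_MAX_CONNECTIONS = 50000

    def capacity_units(active_connections):
        """Return load as a 0-1 'capacity units' fraction, capped at 1.0."""
        return min(active_connections / ESTIMATED_MAX_CONNECTIONS, 1.0)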
22:51  <aannuusshhkkaa> don't we already calculate the max for each cloud?
22:54  <johnsom> google "openstack octavia performance"; a few lines down is a guide I wrote a long time ago that I guess is now published. It has a list of some of the factors that go into why getting a "MAX" is hard.
22:56  <aannuusshhkkaa> alrighty.. i'll take a look
22:59  <rm_work> yeah, it has to do with your network hardware setup, your compute hosts, and maybe a couple other things
22:59  <aannuusshhkkaa> https://developer.rackspace.com/docs/private-cloud/rpc/master/rpc-octavia-internal/octavia-perf-guide/ is this the link?
22:59  <rm_work> yeah
22:59  <aannuusshhkkaa> okay
23:02  <johnsom> rm_work So back to the haproxy reload issue. The systemd docs suck, so it's not clear which of those timeouts matters on a reload call. Worse yet, they imply the timeout is 100ms, which can't be true. Any luck testing out the connection limit change on the listener?
23:02  <aannuusshhkkaa> so there are about 18 factors that contribute to the performance according to that page.. do we collect that data? if so, where can i find it?
23:03  <johnsom> At least that many factors...
23:03  <rm_work> yeah, seems to be the connection limit thing
23:03  <aannuusshhkkaa> yeah.. maybe if we look at the logs, we can come up with a formula to determine LB capacity units..
23:03  <rm_work> guess the memory pressure is too ephemeral
23:04  <johnsom> Yeah, great. ok. So we have been talking about this internally for a while. Thus the patch Greg posted, but last I looked it needed work.
23:06  <johnsom> Basically, with the "unlimited" -1, we translated that into 1,000,000 connections. With the memory allocation up front, that is a sizable amount. Now, using the current reload mechanism for hitless reloads, haproxy starts a secondary process, or, depending on how often the reloads come in, more.
23:06  <johnsom> What seems to be a bit new is how long the old processes stick around.
23:10  <johnsom> In general, what should happen is: if haproxy doesn't have enough memory, it should fail, and systemd *should* kill it and restart it. So the pain would only be that it was a non-hitless reload, but the amp should continue fine. However, that isn't happening right. I found a bug in systemd that was causing restarts to not fire in the version centos has. That *should* be fixed, at least in RHEL.
23:11  <johnsom> The gunicorn issue.... that one i'm not sure about. If we could set a timeout for the systemd reload, that would be great. The only other thought is setting a timeout on the subprocess call in python.
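The timeout johnsom mentions is straightforward now that the agent is Python 3 only: subprocess.run accepts a timeout and raises if the child does not finish in time. A sketch of bounding the systemctl reload call this way; the function name and the 120-second value are arbitrary examples, not the agent's actual code:

    import subprocess

    def reload_haproxy(service_name, timeout=120):
        """Reload an haproxy systemd unit, giving up instead of hanging forever."""
        try:
            subprocess.run(
                ['/usr/bin/systemctl', 'reload', service_name],
                check=True, timeout=timeout)
        except subprocess.TimeoutExpired:
            # Surface an error instead of wedging the single gunicorn
            # worker, which is the failure mode debugged above.
            raise RuntimeError('systemctl reload of %s timed out' % service_name)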
23:21  <rm_work> hmm
23:21  <rm_work> weird that even a "restart" on the agent didn't actually stop/start the agent properly either
23:21  <rm_work> it was kinda... stuck?
23:21  <rm_work> I think
