*** ysandeep|out is now known as ysandeep|ruck | 04:02 | |
*** ysandeep|ruck is now known as ysandeep|ruck|afk | 04:38 | |
*** ysandeep|ruck|afk is now known as ysandeep|ruck | 05:08 | |
*** ysandeep|ruck is now known as ysandeep|ruck|afk | 06:18 | |
opendevreview | Vishal Manchanda proposed openstack/octavia-dashboard master: Migrate to AngularJS v1.8.2 https://review.opendev.org/c/openstack/octavia-dashboard/+/846176 | 06:35 |
*** ysandeep|ruck|afk is now known as ysandeep|ruck | 07:11 | |
*** ysandeep|ruck is now known as ysandeep|lunch | 08:15 | |
*** ysandeep|lunch is now known as ysandeep|ruck | 10:16 | |
*** ysandeep|ruck is now known as ysandeep|brb | 12:52 | |
*** ysandeep|brb is now known as ysandeep|ruck | 13:01 | |
opendevreview | Tom Weininger proposed openstack/octavia master: Proposal for Amphora vertical scaling https://review.opendev.org/c/openstack/octavia/+/848105 | 14:00 |
opendevreview | Tom Weininger proposed openstack/octavia master: Add element for TuneD and Tuna https://review.opendev.org/c/openstack/octavia/+/848637 | 14:00 |
*** ysandeep|ruck is now known as ysandeep|out | 14:39 | |
opendevreview | Tom Weininger proposed openstack/octavia master: Add element for TuneD and Tuna https://review.opendev.org/c/openstack/octavia/+/848637 | 15:14 |
spatel | johnsom Hi | 20:04 |
spatel | around? | 20:04 |
johnsom | spatel Hi | 20:05 |
spatel | I have haproxy related question, i think you can help me out here :) - https://paste.opendev.org/show/bhM1FR1Y0fzOyaSPexiN/ | 20:05 |
spatel | I have 5 backend servers and single source IP | 20:06 |
spatel | why am i running out of ports even though i have a large enough local port range? | 20:06 |
johnsom | Are you benchmarking or normal traffic? | 20:06 |
spatel | I am benchmarking haproxy | 20:07 |
spatel | we have built in load tester for my customer application | 20:07 |
spatel | I am generating 50k connections to haproxy but as soon as it hits 35k-ish the problems start and haproxy.log fills up with errors | 20:08 |
johnsom | Yeah, ok, so the kernel will put the ports in TIME_WAIT for a period after use. When benchmarking you are making a lot of short connections (unless you use HTTP keepalive), which can use up the ports, and they will be stuck in "TIME_WAIT". | 20:08 |
johnsom | You can use lsof to see those | 20:09 |
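The TIME_WAIT buildup johnsom describes can also be checked straight from /proc, which is much faster than lsof at tens of thousands of sockets (a minimal sketch; on Linux, the hex state 06 in /proc/net/tcp means TIME-WAIT):

```shell
# Count sockets sitting in TIME-WAIT: the connection state is the hex
# value in column 4 of /proc/net/tcp, and 06 means TIME-WAIT.
tw_count=$(awk 'NR > 1 && $4 == "06"' /proc/net/tcp | wc -l)
echo "TIME-WAIT sockets: $tw_count"

# The ephemeral range the kernel draws source ports from:
read low high < /proc/sys/net/ipv4/ip_local_port_range
echo "usable source ports: $((high - low + 1))"
```

`lsof -n | grep WAIT`, as used later in the log, reports the same sockets but walks every open file on the system to do it.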
spatel | I didn't notice any errors in dmesg (kernel logs related to local_port_range) | 20:09 |
johnsom | https://sysctl-explorer.net/net/ipv4/tcp_tw_reuse/ | 20:10 |
spatel | lsof -u haproxy -n | grep WAIT | 20:10 |
spatel | empty output | 20:11 |
johnsom | Don't use -u, just lsof -n | 20:11 |
johnsom | Technically they aren't owned by haproxy at that point. | 20:11 |
johnsom | HAProxy has let them go | 20:11 |
spatel | my load-tester has keepalive, which sends ping packets to keep the connections alive.. | 20:11 |
johnsom | Is it TCP keepalive or HTTP keepalive? They are different | 20:12 |
spatel | lsof -n | grep WAIT | 20:12 |
spatel | empty output | 20:12 |
johnsom | And the benchmark is running? | 20:12 |
spatel | lsof -n | grep EST | wc -l | 20:13 |
spatel | 151781 | 20:13 |
spatel | I have 15k ESTABLISHED connections | 20:13 |
spatel | as an experiment i added an alias IP and configured haproxy to use that ip; then i can hit 50k connections without issue | 20:14 |
johnsom | Yeah, it supports multiple source IPs | 20:15 |
spatel | I am just curious: i have 5 backend servers, so it should not require more source IPs | 20:16 |
johnsom | Can you try "echo 1 > /proc/sys/net/ipv4/tcp_tw_reuse"? | 20:16 |
spatel | ok and then run loadtest? | 20:16 |
johnsom | Yeah, it's all about the source ports. From your pastebin, you have 55,000, so it should be ok. | 20:16 |
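The arithmetic behind spatel's intuition that 5 backends should not need more source IPs can be sketched with the figures from the conversation (the per-backend multiplier assumes source ports are chosen per destination, which not every client stack or benchmark tool does):

```shell
# An outbound connection is identified by the 4-tuple
# (src IP, src port, dst IP, dst port), so in principle each distinct
# backend can reuse the whole ephemeral port range.
# Figures from the log: ~55,000 usable ports, 5 backend servers.
ports=55000
backends=5
echo "theoretical max concurrent outbound connections: $((ports * backends))"
# If the tool (or kernel) picks source ports without regard to the
# destination, the effective ceiling collapses back to ~55k per source
# IP, which - minus TIME-WAIT sockets - matches the ~35k wall.
```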
johnsom | Yeah | 20:17 |
spatel | loadtest started | 20:17 |
johnsom | Default is 0, so you can set it back if you wish | 20:17 |
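Checking and flipping the sysctl johnsom points at can be done as follows (a sketch; the write needs root, and newer kernels default this to 2, meaning loopback-only, rather than 0):

```shell
# Read the current setting: 0 = off, 1 = reuse TIME-WAIT sockets for
# new outbound connections, 2 = loopback-only (newer kernels' default).
tw_reuse=$(cat /proc/sys/net/ipv4/tcp_tw_reuse)
echo "net.ipv4.tcp_tw_reuse = $tw_reuse"

# Enabling it as suggested in the log (root required, runtime only):
#   echo 1 > /proc/sys/net/ipv4/tcp_tw_reuse
# To persist across reboots:
#   echo 'net.ipv4.tcp_tw_reuse = 1' > /etc/sysctl.d/99-lb-tuning.conf
#   sysctl --system
```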
spatel | yep! lets see | 20:17 |
johnsom | The other thing to note, the health checks will also take a port | 20:17 |
johnsom | But, that should be small | 20:18 |
spatel | Yes that is correct | 20:18 |
spatel | :) | 20:18 |
spatel | my end goal is to run 1million connection :D | 20:18 |
johnsom | That is a lot, so you will need to do some tuning | 20:18 |
spatel | but currently testing with 50k and then try 100k --- 200k -- 500k (baby steps) | 20:19 |
johnsom | Also, do you need 1 million connections or 1 million requests? They are different metrics. Connections is heavily weighted toward issues with the TCP setup | 20:20 |
spatel | Yes we are planning to play with more options for 1mil | 20:20 |
johnsom | https://www.haproxy.com/blog/haproxy-forwards-over-2-million-http-requests-per-second-on-a-single-aws-arm-instance/ | 20:20 |
johnsom | If you haven't seen it | 20:20 |
spatel | i am seeing error in logs no free ports :( | 20:20 |
spatel | /proc/sys/net/ipv4/tcp_tw_reuse didn't help | 20:21 |
johnsom | Hmmm. The other thing that will be an issue for you is the TLS offload and the logging. | 20:21 |
spatel | I will deal with TLS later but currently not able to run 50k :) | 20:22 |
spatel | lets try to pass 50k first | 20:22 |
johnsom | TLS is going to need some hardware help, so make sure AESNI is available, but with that connection level, you might need to go all the way to QAT offload. | 20:22 |
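A quick way to confirm the AESNI point from inside the VM (a sketch; the "aes" flag name is what x86 exposes, and the openssl comparison command is a common follow-up, not something from the log):

```shell
# The "aes" CPU flag means AES-NI is exposed to this (virtual) machine;
# without it, TLS termination falls back to a much slower software path.
aesni=$(grep -q -w aes /proc/cpuinfo && echo yes || echo no)
echo "AES-NI available: $aesni"

# To see the practical difference, benchmark the accelerated cipher:
#   openssl speed -evp aes-128-gcm
```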
spatel | the plan is to deploy multiple haproxy instances behind DNS round-robin | 20:22 |
spatel | in worst case scenario | 20:23 |
johnsom | Yeah, no need for session persistence, it works fine | 20:23 |
spatel | i want to check capacity of single HA proxy first | 20:23 |
spatel | based on that number i can go with N number of haproxy behind DNS | 20:23 |
johnsom | My (old haproxy version) benchmark is around 35k rps per core | 20:24 |
spatel | hmm | 20:24 |
spatel | i have 10 core VM | 20:24 |
spatel | 4 core assigned to haproxy | 20:24 |
johnsom | Ok, check one more thing, if you restart haproxy, in the log file are there comments about the maximum number of open files? Any error/warnings? | 20:26 |
spatel | my open file limit is 10million | 20:28 |
spatel | i did all kinds of tuning... :( | 20:28 |
spatel | spent the last 2 days googling to make it work | 20:28 |
johnsom | Yeah | 20:29 |
johnsom | So, you say you have 15k in established, what are those for? Is that during the benchmark? | 20:30 |
johnsom | Or just normal situation | 20:30 |
johnsom | ? | 20:30 |
spatel | 15k was current value when i run that command | 20:31 |
johnsom | The benchmark doesn't have some kind of idle connection pool thing that is actually causing a problem does it? | 20:31 |
johnsom | Some protocols will hold open a channel "in case" there is another request, which the benchmark is likely not using since it's trying to maximize connections | 20:32 |
spatel | when i start the load it ramps up from 0 to 50k over a 5 min interval and keeps all connections active with a ping-based keepalive (it's all the magic of my application loadtester) | 20:32 |
spatel | the same loadtest tool works against an F5 load-balancer | 20:32 |
spatel | on F5 we hit 1 million without issue.. i am trying to mimic on Haproxy | 20:33 |
johnsom | Check your F5 config for "forced close" or similar | 20:33 |
johnsom | setting | 20:33 |
johnsom | I don't remember what it's called | 20:33 |
johnsom | For TCP traffic, it's related to FIN and RST packets | 20:34 |
johnsom | What protocol are you using over TCP? | 20:35 |
spatel | TCP | 20:35 |
johnsom | Try adding "option nolinger" to your defaults section in haproxy.conf | 20:37 |
spatel | hmm | 20:38 |
johnsom | Oh, actually, just set it in the backend section | 20:38 |
spatel | let me try that | 20:38 |
spatel | i can set on both place :) | 20:38 |
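A sketch of where the option lands in haproxy.cfg (the section and server names below are illustrative, not from the log). `option nolinger` makes haproxy reset sessions on close instead of letting sockets linger through the normal FIN teardown, trading possible loss of unsent data for far fewer sockets stuck in TIME-WAIT:

```
defaults
    mode    tcp
    option  nolinger        # global default, as first suggested

backend app_servers         # hypothetical backend name
    option  nolinger        # or scope it to the backend only
    server  srv1 10.0.0.11:8080 check
    server  srv2 10.0.0.12:8080 check
```

The haproxy documentation cautions against this option in normal operation precisely because it can truncate in-flight data, so it is best treated as a benchmark-scenario tool.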
johnsom | I have a suspicion this is an issue with the protocol you are running on top of TCP. | 20:39 |
spatel | hmm | 20:39 |
spatel | let me try some option and see | 20:39 |
johnsom | You might try an alternative benchmark tool, like ethr or something and see if the results are different. | 20:41 |
johnsom | Ok, play around with it. If the CPU isn't maxed out for HAProxy, it's something other than haproxy that is limiting it. | 20:43 |
johnsom | I am on vacation this week, so not around a lot. But let me know how it goes. | 20:43 |
johnsom | Also, the folks in https://haproxy.slack.com/ might also have other ideas for you | 20:44 |
spatel | My big question: if haproxy is running out of ports then the kernel should log that in dmesg (but in my case it's not) | 20:44 |
johnsom | Not really, that is an error that the kernel will return to the requesting application | 20:45 |
johnsom | From the error it sure seemed like reuse was the answer, this is common, but since that didn't work, I am wondering if it's a benchmark tool issue related to the FIN/RST settings like nolinger | 20:46 |
spatel | I am going to try option nolinger (just waiting for my current loadtest to finish otherwise it will be mess to clean up all lingering stuff) | 20:47 |
Generated by irclog2html.py 2.17.3 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!