*** ysandeep|out is now known as ysandeep|ruck | 04:02 | |
*** ysandeep|ruck is now known as ysandeep|ruck|afk | 04:38 | |
*** ysandeep|ruck|afk is now known as ysandeep|ruck | 05:08 | |
*** ysandeep|ruck is now known as ysandeep|ruck|afk | 06:18 | |
opendevreview | Vishal Manchanda proposed openstack/octavia-dashboard master: Migrate to AngularJS v1.8.2 https://review.opendev.org/c/openstack/octavia-dashboard/+/846176 | 06:35 |
*** ysandeep|ruck|afk is now known as ysandeep|ruck | 07:11 | |
*** ysandeep|ruck is now known as ysandeep|lunch | 08:15 | |
*** ysandeep|lunch is now known as ysandeep|ruck | 10:16 | |
*** ysandeep|ruck is now known as ysandeep|brb | 12:52 | |
*** ysandeep|brb is now known as ysandeep|ruck | 13:01 | |
opendevreview | Tom Weininger proposed openstack/octavia master: Proposal for Amphora vertical scaling https://review.opendev.org/c/openstack/octavia/+/848105 | 14:00 |
opendevreview | Tom Weininger proposed openstack/octavia master: Add element for TuneD and Tuna https://review.opendev.org/c/openstack/octavia/+/848637 | 14:00 |
*** ysandeep|ruck is now known as ysandeep|out | 14:39 | |
opendevreview | Tom Weininger proposed openstack/octavia master: Add element for TuneD and Tuna https://review.opendev.org/c/openstack/octavia/+/848637 | 15:14 |
spatel | johnsom Hi | 20:04 |
spatel | around? | 20:04 |
johnsom | spatel Hi | 20:05 |
spatel | I have haproxy related question, i think you can help me out here :) - https://paste.opendev.org/show/bhM1FR1Y0fzOyaSPexiN/ | 20:05 |
spatel | I have 5 backend servers and single source IP | 20:06 |
spatel | why am i running out of ports even though i have a large enough local port range? | 20:06 |
johnsom | Are you benchmarking or normal traffic? | 20:06 |
spatel | I am benchmarking haproxy | 20:07 |
spatel | we have built in load tester for my customer application | 20:07 |
spatel | I am generating 50k connections to haproxy but as soon as it hits 35k-ish the problems start and haproxy.log fills up with errors | 20:08 |
johnsom | Yeah, ok, so the kernel will put the ports in TIME_WAIT for a period after use. When benchmarking you are making a lot of short connections (unless you use HTTP keepalive), which can use up the ports, and they will be stuck in "TIME_WAIT". | 20:08 |
johnsom | You can use lsof to see those | 20:09 |
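The TIME_WAIT buildup johnsom describes can also be checked straight from /proc, which is much faster than lsof at tens of thousands of sockets (a minimal sketch; on Linux, the hex state 06 in /proc/net/tcp means TIME-WAIT):

```shell
# Count sockets sitting in TIME-WAIT: the connection state is the hex
# value in column 4 of /proc/net/tcp, and 06 means TIME-WAIT.
tw_count=$(awk 'NR > 1 && $4 == "06"' /proc/net/tcp | wc -l)
echo "TIME-WAIT sockets: $tw_count"

# The ephemeral range the kernel draws source ports from:
read low high < /proc/sys/net/ipv4/ip_local_port_range
echo "usable source ports: $((high - low + 1))"
```

`lsof -n | grep WAIT`, as used later in the log, reports the same sockets but walks every open file on the system to do it.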
spatel | I didn't notice any errors in dmesg (kernel logs related to local_port_range) | 20:09 |
johnsom | https://sysctl-explorer.net/net/ipv4/tcp_tw_reuse/ | 20:10 |
spatel | lsof -u haproxy -n | grep WAIT | 20:10 |
spatel | empty output | 20:11 |
johnsom | Don't use -u, just lsof -n | 20:11 |
johnsom | Technically they aren't owned by haproxy at that point. | 20:11 |
johnsom | HAProxy has let them go | 20:11 |
spatel | my load-tester has keepalive, which sends ping packets to keep the connections alive.. | 20:11 |
johnsom | Is it TCP keepalive or HTTP keepalive? They are different | 20:12 |
spatel | lsof -n | grep WAIT | 20:12 |
spatel | empty output | 20:12 |
johnsom | And the benchmark is running? | 20:12 |
spatel | lsof -n | grep EST | wc -l | 20:13 |
spatel | 151781 | 20:13 |
spatel | I have 15k ESTABLISHED connections | 20:13 |
spatel | as an experiment i added an alias IP and configured haproxy to use that ip; then i can hit 50k connections without issue | 20:14 |
johnsom | Yeah, it supports multiple source IPs | 20:15 |
spatel | I am just curious: i have 5 backend servers, so it should not require more source IPs | 20:16 |
johnsom | Can you try "echo 1 > /proc/sys/net/ipv4/tcp_tw_reuse"? | 20:16 |
spatel | ok and then run loadtest? | 20:16 |
johnsom | Yeah, it's all about the source ports. From your pastebin, you have 55,000, so it should be ok. | 20:16 |
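The arithmetic behind spatel's intuition that 5 backends should not need more source IPs can be sketched with the figures from the conversation (the per-backend multiplier assumes source ports are chosen per destination, which not every client stack or benchmark tool does):

```shell
# An outbound connection is identified by the 4-tuple
# (src IP, src port, dst IP, dst port), so in principle each distinct
# backend can reuse the whole ephemeral port range.
# Figures from the log: ~55,000 usable ports, 5 backend servers.
ports=55000
backends=5
echo "theoretical max concurrent outbound connections: $((ports * backends))"
# If the tool (or kernel) picks source ports without regard to the
# destination, the effective ceiling collapses back to ~55k per source
# IP, which - minus TIME-WAIT sockets - matches the ~35k wall.
```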
johnsom | Yeah | 20:17 |
spatel | loadtest started | 20:17 |
johnsom | Default is 0, so you can set it back if you wish | 20:17 |
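Checking and flipping the sysctl johnsom points at can be done as follows (a sketch; the write needs root, and newer kernels default this to 2, meaning loopback-only, rather than 0):

```shell
# Read the current setting: 0 = off, 1 = reuse TIME-WAIT sockets for
# new outbound connections, 2 = loopback-only (newer kernels' default).
tw_reuse=$(cat /proc/sys/net/ipv4/tcp_tw_reuse)
echo "net.ipv4.tcp_tw_reuse = $tw_reuse"

# Enabling it as suggested in the log (root required, runtime only):
#   echo 1 > /proc/sys/net/ipv4/tcp_tw_reuse
# To persist across reboots:
#   echo 'net.ipv4.tcp_tw_reuse = 1' > /etc/sysctl.d/99-lb-tuning.conf
#   sysctl --system
```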
spatel | yep! lets see | 20:17 |
johnsom | The other thing to note, the health checks will also take a port | 20:17 |
johnsom | But, that should be small | 20:18 |
spatel | Yes that is correct | 20:18 |
spatel | :) | 20:18 |
spatel | my end goal is to run 1million connection :D | 20:18 |
johnsom | That is a lot, so you will need to do some tuning | 20:18 |
spatel | but currently testing with 50k and then try 100k --- 200k -- 500k (baby steps) | 20:19 |
johnsom | Also, do you need 1 million connections or 1 million requests? They are different metrics. Connections is heavily weighted toward issues with the TCP setup | 20:20 |
spatel | Yes we are planning to play with more options for 1mil | 20:20 |
johnsom | https://www.haproxy.com/blog/haproxy-forwards-over-2-million-http-requests-per-second-on-a-single-aws-arm-instance/ | 20:20 |
johnsom | If you haven't seen it | 20:20 |
spatel | i am seeing error in logs no free ports :( | 20:20 |
spatel | /proc/sys/net/ipv4/tcp_tw_reuse didn't help | 20:21 |
johnsom | Hmmm. The other thing that will be an issue for you is the TLS offload and the logging. | 20:21 |
spatel | I will deal with TLS later but currently not able to run 50k :) | 20:22 |
spatel | lets try to pass 50k first | 20:22 |
johnsom | TLS is going to need some hardware help, so make sure AESNI is available, but with that connection level, you might need to go all the way to QAT offload. | 20:22 |
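A quick way to confirm the AESNI point from inside the VM (a sketch; the "aes" flag name is what x86 exposes, and the openssl comparison command is a common follow-up, not something from the log):

```shell
# The "aes" CPU flag means AES-NI is exposed to this (virtual) machine;
# without it, TLS termination falls back to a much slower software path.
aesni=$(grep -q -w aes /proc/cpuinfo && echo yes || echo no)
echo "AES-NI available: $aesni"

# To see the practical difference, benchmark the accelerated cipher:
#   openssl speed -evp aes-128-gcm
```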
spatel | the plan is to deploy multiple haproxy instances behind DNS round-robin | 20:22 |
spatel | in worst case scenario | 20:23 |
johnsom | Yeah, no need for session persistence, it works fine | 20:23 |
spatel | i want to check capacity of single HA proxy first | 20:23 |
spatel | based on that number i can go with N number of haproxy behind DNS | 20:23 |
johnsom | My (old haproxy version) benchmark is around 35k rps per core | 20:24 |
spatel | hmm | 20:24 |
spatel | i have 10 core VM | 20:24 |
spatel | 4 core assigned to haproxy | 20:24 |
johnsom | Ok, check one more thing, if you restart haproxy, in the log file are there comments about the maximum number of open files? Any error/warnings? | 20:26 |
spatel | my open file limit is 10million | 20:28 |
spatel | i did all kinds of tuning... :( | 20:28 |
spatel | spent the last 2 days googling to make it work | 20:28 |
johnsom | Yeah | 20:29 |
johnsom | So, you say you have 15k in established, what are those for? Is that during the benchmark? | 20:30 |
johnsom | Or just normal situation | 20:30 |
johnsom | ? | 20:30 |
spatel | 15k was current value when i run that command | 20:31 |
johnsom | The benchmark doesn't have some kind of idle connection pool thing that is actually causing a problem does it? | 20:31 |
johnsom | Some protocols will hold open a channel "in case" there is another request, which the benchmark is likely not using since it's trying to maximize connections | 20:32 |
spatel | when i start the load it ramps up from 0 to 50k over a 5 min interval and keeps all connections active with a ping-based keepalive (it's all the magic of my application loadtester) | 20:32 |
spatel | the same loadtest tool works against an F5 load-balancer | 20:32 |
spatel | on F5 we hit 1 million without issue.. i am trying to mimic on Haproxy | 20:33 |
johnsom | Check your F5 config for "forced close" or similar | 20:33 |
johnsom | setting | 20:33 |
johnsom | I don't remember what it's called | 20:33 |
johnsom | For TCP traffic, it's related to FIN and RST packets | 20:34 |
johnsom | What protocol are you using over TCP? | 20:35 |
spatel | TCP | 20:35 |
johnsom | Try adding "option nolinger" to your defaults section in haproxy.conf | 20:37 |
spatel | hmm | 20:38 |
johnsom | Oh, actually, just set it in the backend section | 20:38 |
spatel | let me try that | 20:38 |
spatel | i can set on both place :) | 20:38 |
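A sketch of where the option lands in haproxy.cfg (the section and server names below are illustrative, not from the log). `option nolinger` makes haproxy reset sessions on close instead of letting sockets linger through the normal FIN teardown, trading possible loss of unsent data for far fewer sockets stuck in TIME-WAIT:

```
defaults
    mode    tcp
    option  nolinger        # global default, as first suggested

backend app_servers         # hypothetical backend name
    option  nolinger        # or scope it to the backend only
    server  srv1 10.0.0.11:8080 check
    server  srv2 10.0.0.12:8080 check
```

The haproxy documentation cautions against this option in normal operation precisely because it can truncate in-flight data, so it is best treated as a benchmark-scenario tool.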
johnsom | I have a suspicion this is an issue with the protocol you are running on top of TCP. | 20:39 |
spatel | hmm | 20:39 |
spatel | let me try some option and see | 20:39 |
johnsom | You might try an alternative benchmark tool, like ethr or something and see if the results are different. | 20:41 |
johnsom | Ok, play around with it. If the CPU isn't maxed out for HAProxy, it's something other than haproxy that is limiting it. | 20:43 |
johnsom | I am on vacation this week, so not around a lot. But let me know how it goes. | 20:43 |
johnsom | Also, the folks in https://haproxy.slack.com/ might also have other ideas for you | 20:44 |
spatel | My big question: if haproxy is running out of ports then the kernel should log that in dmesg (but in my case it's not) | 20:44 |
johnsom | Not really, that is an error that the kernel will return to the requesting application | 20:45 |
johnsom | From the error it sure seemed like reuse was the answer, this is common, but since that didn't work, I am wondering if it's a benchmark tool issue related to the FIN/RST settings like nolinger | 20:46 |
spatel | I am going to try option nolinger (just waiting for my current loadtest to finish otherwise it will be mess to clean up all lingering stuff) | 20:47 |
Generated by irclog2html.py 2.17.3 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!