Thursday, 2020-12-03

*** noonedeadpunk has quit IRC  00:08
*** noonedeadpunk has joined #openstack-lbaas  00:10
*** cgoncalves has quit IRC  00:28
*** gregwork has quit IRC  00:48
*** cgoncalves has joined #openstack-lbaas  01:06
*** armax has quit IRC  01:29
*** rcernin has quit IRC  01:58
*** rcernin has joined #openstack-lbaas  01:58
*** ramishra_ has joined #openstack-lbaas  02:08
*** xgerman has quit IRC  02:56
*** rcernin has quit IRC  03:06
openstackgerrit: wu.chunyang proposed openstack/octavia master: Add notifications specification documents  https://review.opendev.org/c/openstack/octavia/+/727915  03:16
*** rcernin has joined #openstack-lbaas  03:26
*** rcernin has quit IRC  03:30
*** rcernin has joined #openstack-lbaas  03:30
*** sapd1_x has joined #openstack-lbaas  03:33
*** psachin has joined #openstack-lbaas  03:59
*** lemko has quit IRC  04:25
*** lemko has joined #openstack-lbaas  04:25
*** gcheresh has joined #openstack-lbaas  05:32
*** sapd1_x has quit IRC  05:40
openstackgerrit: zhangboye proposed openstack/octavia master: Replace deprecated UPPER_CONSTRAINTS_FILE variable  https://review.opendev.org/c/openstack/octavia/+/765240  05:48
openstackgerrit: Merged openstack/octavia-tempest-plugin master: Add HTTP/2 support to the Go test server  https://review.opendev.org/c/openstack/octavia-tempest-plugin/+/758617  06:39
openstackgerrit: Merged openstack/octavia master: Add amphora_id in store params for failover_amphora  https://review.opendev.org/c/openstack/octavia/+/760380  06:45
openstackgerrit: Ann Taraday proposed openstack/octavia master: Alias change amphorav2 -> amphora  https://review.opendev.org/c/openstack/octavia/+/740432  07:20
*** rpittau|afk is now known as rpittau  07:27
openstackgerrit: wu.chunyang proposed openstack/octavia master: Add default value for enabled column in l7rule table  https://review.opendev.org/c/openstack/octavia/+/761283  07:27
*** sapd1_x has joined #openstack-lbaas  07:29
openstackgerrit: wu.chunyang proposed openstack/octavia master: Add notifications specification documents  https://review.opendev.org/c/openstack/octavia/+/727915  07:49
rm_work: interesting one -- an amp went down because a HV went down... and it took MANY HOURS to actually fail over (config is set to the default 60s heartbeat stale time)  08:04
rm_work: recorded this error at the time I assume it was first stale:  08:04
rm_work: octavia/controller/worker/v1/tasks/amphora_driver_tasks.py:execute:84 Failed to update listeners on amphora c37d547f-bacb-48c3-8bab-41254bba4945. Skipping this amphora as it is failing to update due to: contacting the amphora timed out  08:08
rm_work: and I see a ton of the timeouts going back for ~12 minutes (120 retries at 5s each, also the default config, it seems)  08:13
rm_work: so... the amp went down, became stale in the DB, and then... why would it be trying to connect and timing out? O_o  08:13
rm_work: after that timeout it seems it moved on to another step? and failed timeouts for another ~10.5m (120*5) until it got this:  08:16
rm_work: octavia/controller/worker/v1/tasks/amphora_driver_tasks.py:execute:136 Failed to reload listeners on amphora c37d547f-bacb-48c3-8bab-41254bba4945. Skipping this amphora as it is failing to reload due to: contacting the amphora timed out  08:16
rm_work: AH, it seems that both of the amphorae on the LB failed at almost the same time (may have been on the same HV... damn soft-AA)  08:21
rm_work: possibly the single-amp failover process doesn't handle that case super well?  08:21
rm_work: one succeeded but the other most certainly did not  08:21
*** luksky has joined #openstack-lbaas  08:22
rm_work: OH I SEE (I think)  08:24
rm_work: So both went down at approximately the same time. One of them went stale first. HM tried to fail over that amp. It succeeded, but took a really long time because it had to time out on two steps attempting to update the other amp.  08:25
rm_work: those timeout failures caused the other amp (which was also down) to go to ERROR  08:26
rm_work: somehow, about 2h45m later, it actually DID get picked up as stale again??? was it marked busy somehow for that time? unclear  08:28
rm_work: at which point it completed failover in short order  08:28
*** redrobot has quit IRC  08:41
*** rcernin has quit IRC  08:46
*** vishalmanchanda has joined #openstack-lbaas  08:47
rm_work: nevermind, figured out the correct timeline  08:50
rm_work: 6:26 -- amp1 HV dies  08:51
rm_work: 6:27 -- amp1 goes stale, attempts to fail over, but can't connect to amp2 to update the haproxy/vrrp peer configs  08:51
rm_work: 6:39 -- times out on first task (update listeners)  08:53
rm_work: 6:49 -- times out on second task (reload listeners)  08:53
rm_work: 6:50 -- amp1 failover complete  08:54
rm_work: 9:32 -- amp2 HV dies  08:54
rm_work: 9:33 -- amp2 goes stale, fails over  08:54
rm_work: 9:35 -- amp2 failover complete  08:54
rm_work: so, the issue was that amp2 was not connectable during the amp1 failover, even though it should have been up, as it was sending heartbeats just fine and the HV *was* up  08:55
rm_work: amp2 was in ERROR status for the intervening period until the heartbeats actually did fail, and then it was replaced correctly  08:55
rm_work: MEANWHILE, the user was reporting intermittent 502 errors -- I can only guess that without an updated vrrp config on amp2, it thought it was still supposed to be the MASTER and so it kept gARPing, but so did amp1-new, and the routes were flipping back and forth constantly? would that cause a 502 in the case where it happened at exactly the right time (between packets on a keepalive connection)?  08:57
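[editor's note] The stall rm_work walks through above follows from the defaults he mentions (60s heartbeat stale time, 120 connection retries at 5s each). A rough back-of-the-envelope sketch of the arithmetic, assuming both amphora-contact tasks exhaust their full retry window (the option names are the octavia.conf settings these defaults come from; the values are just the ones quoted in the discussion):

```python
# Defaults referenced in the discussion above (octavia.conf option names):
HEARTBEAT_TIMEOUT = 60         # [health_manager] heartbeat_timeout, seconds
CONNECTION_MAX_RETRIES = 120   # [haproxy_amphora] connection_max_retries
CONNECTION_RETRY_INTERVAL = 5  # [haproxy_amphora] connection_retry_interval, seconds

# Each task that tries to reach the unreachable peer amp blocks for the
# full retry window before giving up and moving on to the next step.
retry_window = CONNECTION_MAX_RETRIES * CONNECTION_RETRY_INTERVAL  # 600 s
blocked_tasks = 2  # "update listeners" (6:27-6:39), then "reload listeners" (6:39-6:49)

total_seconds = HEARTBEAT_TIMEOUT + blocked_tasks * retry_window
print(total_seconds / 60)  # -> 21.0, in line with the observed ~24 min 6:26 -> 6:50 window
```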
*** rcernin has joined #openstack-lbaas  09:00
*** zzzeek has quit IRC  09:02
*** rcernin has quit IRC  09:04
*** zzzeek has joined #openstack-lbaas  09:04
*** ataraday_ has joined #openstack-lbaas  09:07
lxkong: has anyone seen this error during failover before? https://dpaste.com/C68CMVQ6A#wrap  09:38
lxkong: after upgrading octavia from train to ussuri.  09:39
lxkong: The load balancer info: https://dpaste.com/7HQRKPWL8  09:39
lxkong: in ussuri, when failover fails, the new amphora is removed, so there's no chance to log in and check  09:40
*** luksky has quit IRC  09:51
lxkong: well, the pool in the load balancer has session persistence configured  10:11
lxkong: the peer section is empty: https://dpaste.com/GV6V84PHV  10:22
lxkong: hmm... seems like the issue has been fixed upstream recently  11:01
lxkong: using the latest master, the amphora can be successfully failed over  11:01
lxkong: found it: https://review.opendev.org/q/change:I923accd73e0c9cadc91c115157c576432f428622  11:16
*** sapd1 has quit IRC  11:17
*** zzzeek has quit IRC  11:17
*** zzzeek has joined #openstack-lbaas  11:19
*** luksky has joined #openstack-lbaas  11:22
*** sapd1_x has quit IRC  11:27
*** sapd1_x has joined #openstack-lbaas  11:33
*** spatel has joined #openstack-lbaas  11:34
*** sapd1_x has quit IRC  11:38
*** spatel has quit IRC  11:39
*** ramishra_ has quit IRC  11:40
*** psachin has quit IRC  11:45
*** ramishra has joined #openstack-lbaas  11:59
openstackgerrit: Ann Taraday proposed openstack/octavia master: Alias change amphorav2 -> amphora  https://review.opendev.org/c/openstack/octavia/+/740432  12:01
openstackgerrit: Ann Taraday proposed openstack/octavia master: Alias change amphorav2 -> amphora  https://review.opendev.org/c/openstack/octavia/+/740432  12:03
*** ramishra has quit IRC  12:09
*** ramishra has joined #openstack-lbaas  12:16
*** zzzeek has quit IRC  13:09
*** zzzeek has joined #openstack-lbaas  13:11
*** mugsie has quit IRC  13:32
*** TrevorV has joined #openstack-lbaas  13:38
*** ramishra has quit IRC  13:45
*** ramishra has joined #openstack-lbaas  14:04
*** ataraday_ has quit IRC  14:51
*** laerling has joined #openstack-lbaas  15:17
*** redrobot has joined #openstack-lbaas  15:35
*** spatel has joined #openstack-lbaas  15:40
spatel: johnsom: hey!  15:40
johnsom: rm_work: Your frankinetwork makes it hard to say definitively, but if the user got a 502 from the load balancer it was reachable, but a backend server may have become unreachable while it was servicing the request.  15:40
johnsom: spatel Hi  15:40
spatel: I have affinity SINGLE for octavia, but when I build a LB, by default it is creating two amphorae somehow  15:41
rm_work: Hmm  15:41
johnsom: Do you have a spares pool configured?  15:42
spatel: johnsom: let me collect more logs etc. thought I'd just ask you in case anything changed recently that I am not aware of.  15:42
spatel: spares pool? I didn't do any special configuration (everything is default)  15:43
johnsom: Check your config file for the spares setting and make sure it is not configured  15:43
spatel: looking..  15:44
johnsom: spare_amphora_pool_size  15:45
spatel: spare_amphora_pool_size = 1  15:45
johnsom: That is why, it is booting a spare amp  15:45
johnsom: Set that to zero  15:45
spatel: how is this extra amphora different than full HA mode?  15:46
spatel: what is the use of having the spare_amphora_pool_size setting?  15:46
johnsom: They are unconfigured and can be used when creating a new load balancer  15:47
adeberg: faster recovery, I believe  15:47
johnsom: Well, in very limited situations due to nova issues. We have marked it deprecated in recent releases  15:47
spatel: johnsom: you are saying this option will be deprecated in a future release, right?  15:49
johnsom: The idea was to speed creation and failover by having the VM already booted.  15:49
johnsom: Yes  15:49
spatel: good to know, so I don't spend my time on that one. :)  15:49
johnsom: https://docs.openstack.org/octavia/latest/configuration/configref.html#house_keeping.spare_amphora_pool_size  15:49
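[editor's note] For anyone hitting the same surprise, a minimal sketch of the fix johnsom describes, with the section and option name taken from the docs link above:

```ini
# octavia.conf -- the (now deprecated) spares pool; at 0, no idle spare
# amphora is pre-booted, so a SINGLE-topology LB boots exactly one amp
[house_keeping]
spare_amphora_pool_size = 0
```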
spatel: +1  15:53
openstackgerrit: Merged openstack/octavia stable/ussuri: Fix load balancers with failed amphora failover  https://review.opendev.org/c/openstack/octavia/+/763732  15:53
openstackgerrit: Merged openstack/octavia stable/stein: Fix missing cronie package in RHEL-based image builds  https://review.opendev.org/c/openstack/octavia/+/764890  15:54
openstackgerrit: Merged openstack/octavia stable/train: Fix missing cronie package in RHEL-based image builds  https://review.opendev.org/c/openstack/octavia/+/764889  15:54
spatel: johnsom: does octavia support an nginx amphora?  15:54
johnsom: No  15:54
spatel: :(  15:54
johnsom: Bad licensing issues, and no one developed one  15:55
*** armax has joined #openstack-lbaas  15:55
johnsom: Plus, why?  15:55
spatel: currently we are running nginx using a tcp stream socket  15:55
spatel: not sure if haproxy supports that protocol  15:56
johnsom: Yes it does  15:56
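[editor's note] As a rough illustration of johnsom's point: the nginx `stream {}` style of plain TCP proxying maps onto HAProxy's `mode tcp`. The listener name, port, and backend addresses below are made up for the example:

```
# haproxy.cfg sketch -- L4/TCP passthrough, the rough equivalent of an
# nginx "stream" server block (names and addresses are hypothetical)
listen tcp_app
    mode tcp
    bind *:9000
    balance roundrobin
    server member1 10.0.0.10:9000 check
    server member2 10.0.0.11:9000 check
```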
spatel: hmm, I think I need to ask a developer to try out haproxy to validate the functionality  15:57
spatel: Is there any load-testing or benchmark report available for octavia to verify how many TPS it can handle with standard hardware?  15:58
spatel: I am going to benchmark, but I need some baseline report to compare my results against  15:59
johnsom: Ha, well, that is a moving target and highly dependent on the underlying cloud  15:59
johnsom: If you google, there is a page that will come up for an old version  16:00
spatel: Ok, I will try to find them.  16:00
johnsom: It was around 30,000 for an older amp, 1 core, 1gbps  16:01
spatel: 30k TPS with SSL.. that is a freaking awesome number  16:01
johnsom: No, that was not with TLS  16:01
spatel: ah  16:02
johnsom: With TLS you will want your nova to pass through the encryption acceleration cpu functions, and you may need to bump the RAM for the amp  16:03
spatel: currently I am using the public centos8 amphora, and lots of docs say build your own, so is there really an advantage to building your own amphora image?  16:03
spatel: Yes, for TLS we need the AES flag on the CPU with openssl support to use that flag  16:04
johnsom: Well, we don't ship images, so everyone builds their own  16:04
johnsom: Our amps will use the extensions if they are there  16:04
spatel: what is the advantage of getting one from a public place vs building our own? (I believe we can add some custom stuff if we build in-house)  16:05
johnsom: I guess that you have current bits. We provide scripts that make it quick and easy to build the image  16:06
johnsom: We, as the OpenStack community, do not ship prebuilt images for production use.  16:07
johnsom: Some vendors do, however  16:07
johnsom: Over the years there has been some advantage to building custom to get a newer version of HAProxy than the distros shipped. But right now I think we are pulling in 2.x, so in good shape  16:10
spatel: johnsom: thanks  16:21
johnsom: Sure, np  16:21
spatel: johnsom: does the amphora support SRIOV instances for performance?  16:22
johnsom: We have not yet added the scheduling hints to flavors to support that. It can, but the dev work has not been done yet.  16:23
johnsom: Are you interested in QAT SRIOV or nic SRIOV?  16:24
spatel: nic SRIOV  16:25
spatel: 80% of my workload runs on sriov instances, so I'm looking for that support if it's required to run a high-performance haproxy LB  16:25
spatel: what is QAT?  16:26
johnsom: Encryption and compression offload  16:27
spatel: Not there yet.  16:28
johnsom: Yeah, the underlying networking is usually the bottleneck for the amps  16:28
spatel: we use nic SRIOV for low-latency networking  16:28
spatel: virtio is really bad for a moderate workload  16:29
spatel: I did benchmarking and found virtio only supports 200kpps vs sriov supporting 1.5mpps  16:29
johnsom: I have seen up to 14gbps through an amp, TCP, but it was same-host  16:30
johnsom: Well, there is a lot of tuning that can be done as well  16:30
spatel: I always benchmark based on PPS rate  16:30
spatel: This is my Trex result for a standard virtio vm - https://asciinema.org/a/qXPA48Kc7deILJJrObtF2i3ZT  16:31
spatel: This is the SR-IOV Trex result - https://asciinema.org/a/376367  16:32
johnsom: There are still a lot of use cases where using a hardware provider with Octavia is the right answer.  16:32
spatel: ACTIVE+ACTIVE will solve all those issues :)  16:33
johnsom: Well, it was going to target them for sure. Now with the HAProxy 2.x amps we can vertically scale by adding cpu cores as well, which will also provide a good bump  16:36
johnsom: There is more tuning planned for that as well, but sadly my focus is on other internal projects at the moment  16:37
*** sapd1_x has joined #openstack-lbaas  16:37
spatel: adding more cpu means changing the flavor, right?  16:38
johnsom: Well, you would create an Octavia flavor, unless you want all of your lbs to have more cores  16:40
spatel: +1  16:40
spatel: Can I pass properties via flavor to select SINGLE vs ACTIVE-STANDBY?  16:41
johnsom: Yes  16:42
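[editor's note] A sketch of what that looks like with the amphora provider's `loadbalancer_topology` flavor capability; the flavor/profile/subnet names here are invented for the example, so check the flavor docs for your release:

```
# Define a flavor profile that pins the topology, then expose it as a flavor
openstack loadbalancer flavorprofile create \
    --name single-profile --provider amphora \
    --flavor-data '{"loadbalancer_topology": "SINGLE"}'
openstack loadbalancer flavor create \
    --name single --flavorprofile single-profile --enable

# Create a one-amp LB using it (subnet name is hypothetical)
openstack loadbalancer create --name lb1 --flavor single --vip-subnet-id my-subnet
```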
spatel: nice! let me go back to my lab for more testing :) thank you johnsom  16:46
johnsom: Sure, NP  16:47
*** rpittau is now known as rpittau|afk  16:49
*** luksky has quit IRC  16:54
*** sapd1_x has quit IRC  17:25
openstackgerrit: Merged openstack/octavia stable/stein: Map cloud-guest-utils to cloud-utils-growpart for Red Hat distros.  https://review.opendev.org/c/openstack/octavia/+/764894  17:43
openstackgerrit: Merged openstack/octavia stable/train: Map cloud-guest-utils to cloud-utils-growpart for Red Hat distros.  https://review.opendev.org/c/openstack/octavia/+/764893  18:27
*** vishalmanchanda has quit IRC  19:22
*** beagles has quit IRC  19:28
*** b3nt_pin has joined #openstack-lbaas  19:29
*** b3nt_pin is now known as beagles  19:29
*** gcheresh has quit IRC  19:35
*** luksky has joined #openstack-lbaas  19:42
*** rcernin has joined #openstack-lbaas  19:57
*** rcernin has quit IRC  20:23
*** xgerman has joined #openstack-lbaas  20:31
openstackgerrit: Merged openstack/octavia stable/ussuri: Fix missing cronie package in RHEL-based image builds  https://review.opendev.org/c/openstack/octavia/+/764888  20:34
openstackgerrit: Merged openstack/octavia stable/ussuri: Fix load balancers with failed amphora failover  https://review.opendev.org/c/openstack/octavia/+/756903  20:54
openstackgerrit: Merged openstack/octavia master: Remove re-import of octavia-lib constants  https://review.opendev.org/c/openstack/octavia/+/763437  20:54
*** rcernin has joined #openstack-lbaas  20:58
*** rcernin has quit IRC  21:03
*** TrevorV has quit IRC  21:04
*** openstackgerrit has quit IRC  21:08
*** rcernin has joined #openstack-lbaas  21:29
*** ccamposr has joined #openstack-lbaas  21:41
*** ccamposr__ has quit IRC  21:43
*** spatel has quit IRC  22:16
*** ccamposr__ has joined #openstack-lbaas  22:35
*** ccamposr has quit IRC  22:38
*** luksky has quit IRC  23:03
*** tkajinam has quit IRC  23:03
*** tkajinam has joined #openstack-lbaas  23:04
*** openstackgerrit has joined #openstack-lbaas  23:11
openstackgerrit: Merged openstack/octavia stable/victoria: Map cloud-guest-utils to cloud-utils-growpart for Red Hat distros.  https://review.opendev.org/c/openstack/octavia/+/764891  23:11

Generated by irclog2html.py 2.17.2 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!