Friday, 2020-07-10

00:15  *** wuchunyang has quit IRC
00:16  *** wuchunyang has joined #openstack-lbaas
00:17  *** yamamoto has joined #openstack-lbaas
01:02  *** wuchunyang has quit IRC
01:03  <openstackgerrit> Anushka Singh proposed openstack/octavia master: Refactoring amphora stats driver interface  https://review.opendev.org/737111
01:10  <aannuusshhkkaa> johnsom, we fixed 2/3 issues you had raised on https://review.opendev.org/737111...
01:23  *** yamamoto has quit IRC
01:26  *** yamamoto has joined #openstack-lbaas
01:30  *** yamamoto has quit IRC
01:32  *** tkajinam has quit IRC
01:32  *** tkajinam has joined #openstack-lbaas
01:32  <openstackgerrit> Merged openstack/octavia-tempest-plugin master: Change to use memory_tracker variable  https://review.opendev.org/704202
01:34  *** wuchunyang has joined #openstack-lbaas
01:48  *** ianychoi_ has quit IRC
01:50  *** ianychoi_ has joined #openstack-lbaas
02:27  *** armax has joined #openstack-lbaas
02:51  *** yamamoto has joined #openstack-lbaas
03:05  *** yamamoto has quit IRC
03:06  *** yamamoto has joined #openstack-lbaas
03:40  *** wuchunyang has quit IRC
03:45  *** wuchunyang has joined #openstack-lbaas
04:08  *** coreycb has quit IRC
04:08  *** headphoneJames has quit IRC
04:08  *** rm_work has quit IRC
04:08  *** nicolasbock has quit IRC
04:08  *** armax has quit IRC
04:08  *** KeithMnemonic has quit IRC
04:08  *** bcafarel has quit IRC
04:08  *** laerling has quit IRC
04:08  *** oklhost has quit IRC
04:08  *** dayou has quit IRC
04:08  *** zigo has quit IRC
04:08  *** gthiemonge has quit IRC
04:08  *** andy_ has quit IRC
04:08  *** cgoncalves has quit IRC
04:08  *** amotoki has quit IRC
04:08  *** dulek has quit IRC
04:08  *** f0o has quit IRC
04:08  *** ramishra has quit IRC
04:08  *** dasp_ has quit IRC
04:08  *** gmann has quit IRC
04:08  *** emccormick has quit IRC
04:08  *** dougwig has quit IRC
04:08  *** mnaser has quit IRC
04:08  *** squarebracket has quit IRC
04:08  *** ianychoi_ has quit IRC
04:08  *** jmccrory has quit IRC
04:08  *** njohnston has quit IRC
04:08  *** wuchunyang has quit IRC
04:08  *** rpittau has quit IRC
04:08  *** osmanlicilegi has quit IRC
04:08  *** zzzeek has quit IRC
04:08  *** brtknr has quit IRC
04:08  *** mloza has quit IRC
04:08  *** trident has quit IRC
04:08  *** tobberydberg_ has quit IRC
04:08  *** eandersson has quit IRC
04:08  *** sorrison has quit IRC
04:08  *** zetaab has quit IRC
04:08  *** openstackgerrit has quit IRC
04:08  *** frickler has quit IRC
04:08  *** johnthetubaguy has quit IRC
04:08  *** vesper11 has quit IRC
04:08  *** fyx has quit IRC
04:08  *** NobodyCam has quit IRC
04:08  *** jrosser has quit IRC
04:08  *** aannuusshhkkaa has quit IRC
04:08  *** JayF has quit IRC
04:08  *** mugsie has quit IRC
04:08  *** stingrayza has quit IRC
04:08  *** jamespage has quit IRC
04:08  *** kklimonda has quit IRC
04:08  *** dmsimard has quit IRC
04:08  *** dosaboy has quit IRC
04:08  *** TMM has quit IRC
04:08  *** michchap has quit IRC
04:08  *** devfaz has quit IRC
04:08  *** numans has quit IRC
04:08  *** yamamoto has quit IRC
04:08  *** servagem has quit IRC
04:08  *** colin- has quit IRC
04:08  *** dtruong has quit IRC
04:08  *** beisner has quit IRC
04:08  *** hemanth_n has quit IRC
04:08  *** irclogbot_3 has quit IRC
04:08  *** haleyb has quit IRC
04:08  *** logan- has quit IRC
04:08  *** kevinz has quit IRC
04:08  *** lxkong has quit IRC
04:08  *** johnsom has quit IRC
04:08  *** tkajinam has quit IRC
04:08  *** xgerman has quit IRC
04:08  *** andrein has quit IRC
04:08  *** stevenglasford has quit IRC
04:14  *** ramishra has joined #openstack-lbaas
04:14  *** squarebracket has joined #openstack-lbaas
04:14  *** mnaser has joined #openstack-lbaas
04:14  *** dougwig has joined #openstack-lbaas
04:14  *** emccormick has joined #openstack-lbaas
04:14  *** gmann has joined #openstack-lbaas
04:14  *** dasp_ has joined #openstack-lbaas
04:14  *** amotoki has joined #openstack-lbaas
04:14  *** cgoncalves has joined #openstack-lbaas
04:14  *** andy_ has joined #openstack-lbaas
04:14  *** gthiemonge has joined #openstack-lbaas
04:14  *** nicolasbock has joined #openstack-lbaas
04:14  *** rm_work has joined #openstack-lbaas
04:14  *** headphoneJames has joined #openstack-lbaas
04:14  *** coreycb has joined #openstack-lbaas
04:14  *** frickler has joined #openstack-lbaas
04:14  *** openstackgerrit has joined #openstack-lbaas
04:14  *** zetaab has joined #openstack-lbaas
04:14  *** sorrison has joined #openstack-lbaas
04:14  *** eandersson has joined #openstack-lbaas
04:14  *** johnthetubaguy has joined #openstack-lbaas
04:14  *** tobberydberg_ has joined #openstack-lbaas
04:14  *** trident has joined #openstack-lbaas
04:14  *** zzzeek has joined #openstack-lbaas
04:14  *** mugsie has joined #openstack-lbaas
04:14  *** JayF has joined #openstack-lbaas
04:14  *** aannuusshhkkaa has joined #openstack-lbaas
04:14  *** jrosser has joined #openstack-lbaas
04:14  *** NobodyCam has joined #openstack-lbaas
04:14  *** vesper11 has joined #openstack-lbaas
04:14  *** jmccrory has joined #openstack-lbaas
04:14  *** ianychoi_ has joined #openstack-lbaas
04:14  *** njohnston has joined #openstack-lbaas
04:14  *** osmanlicilegi has joined #openstack-lbaas
04:14  *** rpittau has joined #openstack-lbaas
04:14  *** numans has joined #openstack-lbaas
04:14  *** devfaz has joined #openstack-lbaas
04:14  *** michchap has joined #openstack-lbaas
04:14  *** TMM has joined #openstack-lbaas
04:14  *** dosaboy has joined #openstack-lbaas
04:14  *** dmsimard has joined #openstack-lbaas
04:14  *** kklimonda has joined #openstack-lbaas
04:14  *** jamespage has joined #openstack-lbaas
04:14  *** stingrayza has joined #openstack-lbaas
04:14  *** yamamoto has joined #openstack-lbaas
04:14  *** tkajinam has joined #openstack-lbaas
04:14  *** servagem has joined #openstack-lbaas
04:14  *** fyx has joined #openstack-lbaas
04:14  *** lxkong has joined #openstack-lbaas
04:14  *** johnsom has joined #openstack-lbaas
04:14  *** xgerman has joined #openstack-lbaas
04:14  *** andrein has joined #openstack-lbaas
04:14  *** stevenglasford has joined #openstack-lbaas
04:14  *** beisner has joined #openstack-lbaas
04:14  *** hemanth_n has joined #openstack-lbaas
04:14  *** kevinz has joined #openstack-lbaas
04:14  *** colin- has joined #openstack-lbaas
04:14  *** f0o has joined #openstack-lbaas
04:14  *** dulek has joined #openstack-lbaas
04:14  *** dtruong has joined #openstack-lbaas
04:14  *** irclogbot_3 has joined #openstack-lbaas
04:14  *** haleyb has joined #openstack-lbaas
04:14  *** logan- has joined #openstack-lbaas
04:15  *** brtknr has joined #openstack-lbaas
04:15  *** mloza has joined #openstack-lbaas
04:15  *** armax has joined #openstack-lbaas
04:15  *** laerling has joined #openstack-lbaas
04:15  *** KeithMnemonic has joined #openstack-lbaas
04:15  *** bcafarel has joined #openstack-lbaas
04:15  *** oklhost has joined #openstack-lbaas
04:15  *** dayou has joined #openstack-lbaas
04:15  *** zigo has joined #openstack-lbaas
04:15  *** coreycb has quit IRC
04:16  *** nicolasbock has quit IRC
04:16  *** beisner has quit IRC
04:16  *** gmann has quit IRC
04:16  *** mnaser has quit IRC
04:16  *** fyx has quit IRC
04:16  *** coreycb has joined #openstack-lbaas
04:18  *** beisner has joined #openstack-lbaas
04:18  *** fyx has joined #openstack-lbaas
04:19  *** gmann has joined #openstack-lbaas
04:21  *** nicolasbock has joined #openstack-lbaas
04:35  *** yamamoto has quit IRC
04:38  *** yamamoto has joined #openstack-lbaas
05:11  *** gcheresh has joined #openstack-lbaas
05:25  *** vishalmanchanda has joined #openstack-lbaas
05:32  *** gcheresh has quit IRC
05:51  *** wuchunyang has joined #openstack-lbaas
06:22  *** ianychoi_ has quit IRC
06:23  *** ianychoi_ has joined #openstack-lbaas
06:25  *** tkajinam has quit IRC
06:26  *** tkajinam has joined #openstack-lbaas
06:37  *** also_stingrayza has joined #openstack-lbaas
06:39  *** stingrayza has quit IRC
06:47  *** also_stingrayza is now known as stingrayza
06:48  *** wuchunyang has quit IRC
06:53  *** wuchunyang has joined #openstack-lbaas
07:20  *** ataraday_ has joined #openstack-lbaas
07:25  *** wuchunyang has quit IRC
07:57  *** maciejjozefczyk has joined #openstack-lbaas
08:00  *** gcheresh has joined #openstack-lbaas
08:24  *** gcheresh has quit IRC
08:26  *** yamamoto has quit IRC
08:27  *** cgoncalves has quit IRC
08:28  *** yamamoto has joined #openstack-lbaas
08:29  *** cgoncalves has joined #openstack-lbaas
08:29  *** yamamoto has quit IRC
08:29  *** yamamoto has joined #openstack-lbaas
08:30  *** gcheresh has joined #openstack-lbaas
08:38  <ataraday_> cgoncalves, Hi! I was looking into making the grenade job do an amphora -> amphorav2 upgrade, and there is an issue with that. I found that I can pass some settings via grenade_devstack_localrc, but only to new setups: https://review.opendev.org/#/c/737993/6/zuul.d/amphorav2-jobs.yaml@107
08:38  <ataraday_> But I cannot pass post-configs with that
08:39  <ataraday_> I tried some options and checked other projects, but cannot find anything that helps
08:44  <cgoncalves> ataraday_, hello! yeah, grenade does not support post-config settings like normal jobs do. you will have to have a per-release-specific upgrade script
08:45  <cgoncalves> ref: https://docs.openstack.org/grenade/latest/readme.html#theory-of-upgrade (see the last bullet item in that section)
08:45  <cgoncalves> an example of a patch I have open: https://review.opendev.org/#/c/738017/4/devstack/upgrade/from-ussuri/upgrade-octavia
08:46  <cgoncalves> ataraday_, although I'm not sure we need such a script if we go forward with aliasing "amphora" to "amphorav2"
08:47  <ataraday_> cgoncalves, great, I'll look into it
08:48  <ataraday_> mmm, this was about adding experimental jobs
08:50  <ataraday_> maybe I should not add the grenade job for now
08:53  *** born2bake has joined #openstack-lbaas
08:55  *** gcheresh has quit IRC
09:02  <cgoncalves> ataraday_, you could propose a patch with the alias
09:04  <cgoncalves> it should be trivial; see the "octavia" to "amphora" alias: https://github.com/openstack/octavia/blob/master/setup.cfg#L59-L60
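For reference, the aliasing discussed here is just a duplicate entry-point line in setup.cfg pointing two provider names at the same driver class. A minimal sketch of what the proposed amphorav2 -> amphora alias could look like (the module paths shown are illustrative, not copied from the actual patch):

    [entry_points]
    octavia.api.drivers =
        # existing alias: "octavia" maps to the v1 "amphora" driver
        octavia = octavia.api.drivers.amphora_driver.v1.driver:AmphoraProviderDriver
        # proposed: point "amphora" at the v2 driver, keeping "amphorav2" working
        amphora = octavia.api.drivers.amphora_driver.v2.driver:AmphoraProviderDriver
        amphorav2 = octavia.api.drivers.amphora_driver.v2.driver:AmphoraProviderDriver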
09:07  <ataraday_> I guess making the amphorav2 -> amphora alias is the final step, and we need the experimental jobs before that to verify we can do it :)
09:09  <cgoncalves> right, so a depends-on patch would work
09:10  <cgoncalves> 1) propose the alias patch (ignore CI results), 2) set depends-on on the experimental jobs patch for CI validation
09:22  <ataraday_> OK, but then this job will make sense only against this change.. maybe this is fine, I will add a comment about it.
09:22  <ataraday_> Thanks a lot!
09:39  <openstackgerrit> Ann Taraday proposed openstack/octavia master: Add experimental amphorav2 jobs  https://review.opendev.org/737993
09:40  <cgoncalves> we can run some experiments to see how it goes :)
09:40  *** dosaboy has quit IRC
09:41  <openstackgerrit> Ann Taraday proposed openstack/octavia master: Alias change amphorav2 -> amphora  https://review.opendev.org/740432
09:43  <cgoncalves> ataraday_, not sure I follow your comment in https://review.opendev.org/#/c/739053/3/octavia/common/base_taskflow.py. retryMaskFilter is in both v2.controller_worker and in base_taskflow. are you saying we only need it in one place?
09:44  <ataraday_> cgoncalves, no, I mean we need it in both places: in v2.controller_worker and in base_taskflow
09:45  <ataraday_> with the jobboard enabled or disabled, the logs come from different start points
09:45  *** wuchunyang has joined #openstack-lbaas
09:49  <ataraday_> cgoncalves, https://review.opendev.org/#/c/647406/106/octavia/controller/worker/v2/controller_worker.py@48 it was dropped from v2.controller_worker
09:49  <cgoncalves> ataraday_, oh, I see, the log filter was later removed from v2.controller_worker.
09:49  <cgoncalves> yep
09:49  <ataraday_> sorry for the confusion
09:50  <cgoncalves> ataraday_, still, v2.controller_worker imports base_taskflow, so the log filter will still be applied, no?
09:50  <cgoncalves> ah, no. never mind
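The retryMaskFilter being discussed is a standard Python logging filter that suppresses taskflow's noisy per-retry log records. A minimal sketch of the pattern (class name, logger name, and matched text are illustrative, not Octavia's exact code):

    import logging

    class RetryMaskFilter(logging.Filter):
        """Drop taskflow retry records that would otherwise flood the logs."""

        def filter(self, record):
            # Returning False suppresses the record; True passes it through.
            return 'retry' not in record.getMessage().lower()

    # Registered in each entry point (base_taskflow and the v2
    # controller_worker), since with the jobboard enabled or disabled
    # the logs originate from different start points.
    logging.getLogger('taskflow').addFilter(RetryMaskFilter())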
09:53  <openstackgerrit> Gregory Thiemonge proposed openstack/octavia-tempest-plugin master: WIP SCTP traffic scenario tests  https://review.opendev.org/738643
09:53  *** yamamoto has quit IRC
09:55  *** gcheresh has joined #openstack-lbaas
10:08  *** wuchunyang has quit IRC
10:08  *** yamamoto has joined #openstack-lbaas
10:08  *** yamamoto has quit IRC
10:09  *** yamamoto has joined #openstack-lbaas
10:11  *** yamamoto has quit IRC
10:12  *** yamamoto has joined #openstack-lbaas
10:20  *** gcheresh has quit IRC
10:41  *** spatel has joined #openstack-lbaas
10:46  *** spatel has quit IRC
10:48  *** pck has quit IRC
10:51  *** pck has joined #openstack-lbaas
11:19  *** dosaboy has joined #openstack-lbaas
11:19  *** dosaboy has quit IRC
11:19  *** dosaboy has joined #openstack-lbaas
11:23  *** ramishra has quit IRC
11:27  *** ramishra has joined #openstack-lbaas
11:30  *** yamamoto has quit IRC
11:40  *** yamamoto has joined #openstack-lbaas
11:42  *** yamamoto has quit IRC
11:52  *** pck has quit IRC
11:54  *** pck has joined #openstack-lbaas
12:00  *** yamamoto has joined #openstack-lbaas
12:12  *** gcheresh has joined #openstack-lbaas
12:12  *** pck has quit IRC
12:13  *** pck has joined #openstack-lbaas
12:14  <openstackgerrit> Ann Taraday proposed openstack/octavia master: Alias change amphorav2 -> amphora  https://review.opendev.org/740432
12:18  *** yamamoto has quit IRC
12:28  *** yamamoto has joined #openstack-lbaas
12:42  *** spatel has joined #openstack-lbaas
12:47  *** spatel has quit IRC
12:53  *** yamamoto has quit IRC
13:05  *** yamamoto has joined #openstack-lbaas
13:07  *** gcheresh has quit IRC
13:09  *** mnaser has joined #openstack-lbaas
13:11  *** jamesdenton has joined #openstack-lbaas
13:11  <devfaz> hi, anyone here able to help us get some load balancers back to "normal"? We have amphoras in ERROR state and are unable to failover.
13:12  *** yamamoto has quit IRC
13:27  *** irclogbot_3 has quit IRC
13:27  *** kevinz has quit IRC
13:29  *** irclogbot_0 has joined #openstack-lbaas
13:29  *** hemanth_n has quit IRC
13:30  *** hemanth_n_ has joined #openstack-lbaas
13:30  *** TrevorV has joined #openstack-lbaas
13:30  *** haleyb has quit IRC
13:30  *** logan- has quit IRC
13:30  <devfaz> we would like to remove an amphora from the database (the instance already got removed) and just let octavia create a new one. If we just try to failover an amphora we run into different issues, e.g. "unable to attach port" to the new amphora; then we removed the port => "Port: NULL not found"; then we created a vrrp_port as described here http://eavesdrop.openstack.org/irclogs/%23openstack-lbaas/%23openstack-lbaas.2017-11-02.log.html#t2017-11-02T11:07:45 - but now we are getting "subnet_id: Null", ... is there an easy way to just tell octavia: hey, drop this amphora and create a new one with a new vrrp_port?
13:30  *** gmann has quit IRC
13:32  *** gmann has joined #openstack-lbaas
13:32  *** logan- has joined #openstack-lbaas
13:38  *** yamamoto has joined #openstack-lbaas
13:41  <openstackgerrit> Merged openstack/octavia master: Stop to use the __future__ module.  https://review.opendev.org/732880
13:42  *** yamamoto has quit IRC
13:52  *** yamamoto has joined #openstack-lbaas
14:16  <openstackgerrit> Gregory Thiemonge proposed openstack/octavia master: Deny the creation of L7Policies in TCP or UDP listeners  https://review.opendev.org/740478
15:09  *** gcheresh has joined #openstack-lbaas
15:10  *** yamamoto has quit IRC
15:15  *** vishalmanchanda has quit IRC
15:16  *** sapd1_x has joined #openstack-lbaas
15:17  *** ataraday_ has quit IRC
15:37  *** tkajinam has quit IRC
15:57  *** gcheresh has quit IRC
15:58  *** gcheresh has joined #openstack-lbaas
16:16  *** armax has quit IRC
16:19  *** gcheresh has quit IRC
16:20  *** armax has joined #openstack-lbaas
16:50  *** dmellado has joined #openstack-lbaas
17:04  *** dmellado has quit IRC
17:08  *** dmellado has joined #openstack-lbaas
17:26  *** dmellado has quit IRC
17:36  *** armax has joined #openstack-lbaas
17:38  *** armax has quit IRC
18:44  *** sapd1_x has quit IRC
19:22  *** spatel has joined #openstack-lbaas
19:23  *** spatel has quit IRC
19:23  *** spatel has joined #openstack-lbaas
19:30  *** spatel has quit IRC
19:36  <rm_work> hmm, just a heads up, I am debugging an issue around some session persistence config causing LBs to ERROR
19:37  <rm_work> in my cloud, so cent8 amps and minor patching, but not anything that should interfere; will follow up when I have some idea what's up
19:38  <johnsom> "not anything that should interfere" lol
19:42  <rm_work> it's pretty minimal now
19:43  <rm_work> nova scheduling patch, and a patch to force the cent8 amps to actually ARP properly on boot
20:01  *** TrevorV has quit IRC
20:02  *** gcheresh has joined #openstack-lbaas
20:25  *** maciejjozefczyk has quit IRC
20:41  <rm_work> johnsom: ok it's super weird
20:41  <cgoncalves> rm_work, could you please revisit https://review.opendev.org/#/c/738246/ (nested virt for CI patch)
20:42  <johnsom> rm_work Theme of my day
20:42  <rm_work> one of my amps gets
20:42  <rm_work> [2020-07-10 20:35:55 +0000] [1090] [DEBUG] Ignoring connection reset
20:42  <rm_work> and then won't respond for a while
20:42  <rm_work> then [2020-07-10 20:38:55 +0000] [1031] [CRITICAL] WORKER TIMEOUT (pid:1090)
20:43  <rm_work> and then it exits and starts a new worker, and the new worker just gets constant SSL/socket errors
20:43  <johnsom> This is haproxy? <missing some context>
20:43  <rm_work> this is the agent
20:43  <rm_work> LBs getting stuck in PENDING
20:43  <rm_work> and eventually ERROR
20:43  <rm_work> (after timeout)
20:44  <johnsom> yeah, go to ERROR. Ok, so this is the gunicorn worker
20:44  <rm_work> yes
20:44  <johnsom> This rings a bell, I'm just not sure which one yet.
20:45  <rm_work> oh huh
20:45  <rm_work> [2020-07-10 20:34:10 +0000] [1090] [DEBUG] PUT /1.0/loadbalancer/083a6861-8ec9-47cb-81e6-6b03dbf45a1f/reload
20:45  <rm_work> ::ffff:10.249.23.94 - - [10/Jul/2020:20:35:40 +0000] "PUT /1.0/loadbalancer/083a6861-8ec9-47cb-81e6-6b03dbf45a1f/reload HTTP/1.1" 500 377 "-" "Octavia HaProxy Rest Client/0.5 (https://wiki.openstack.org/wiki/Octavia)"
20:46  <rm_work> right before it starts doing this
20:46  <johnsom> Yeah, is it memory pressure in the amp?
20:47  <rm_work> http://paste.openstack.org/show/3AbgVxoWHvxUwpf6N2Di/
20:47  *** KeithMnemonic has quit IRC
20:47  <rm_work> no
20:48  <cgoncalves> could it be the haproxy memory bug gthiemonge has been working on? because you mentioned session persistence
20:48  <johnsom> Is there a "failed" haproxy config file? You know, the one that it saves if haproxy doesn't like the config?
20:49  <rm_work> looking
20:49  <johnsom> Yeah, that is why I asked about the memory pressure
20:49  <rm_work> no
20:49  <rm_work> and the other amp took the config correctly
20:49  <rm_work> both amps seem to be around:
20:49  <rm_work>               total        used        free      shared  buff/cache   available
20:49  <rm_work> Mem:          979Mi       315Mi       540Mi       6.0Mi       123Mi       529Mi
20:49  <rm_work> seems not bad
20:49  <cgoncalves> rm_work, https://storyboard.openstack.org/#!/story/2007794
20:50  <johnsom> Yeah, plenty
20:50  <rm_work> i can try changing the connection limit tho
20:50  <johnsom> I doubt it is related
20:51  <johnsom> Can you paste the systemd service file for haproxy?
20:51  <johnsom> Really I'm looking for the peer ID string
20:52  <rm_work> heh
20:52  <cgoncalves> hah
20:52  <rm_work> wait, where is that
20:52  <rm_work> yeah, it always happens when I add a member...
20:52  <rm_work> i am trying to confirm that also
20:54  <johnsom> You can also get the string I want with "ps -ef | grep haproxy"
20:54  <johnsom> Should be after the -L
20:54  <rm_work> ah
20:54  <rm_work> lEyka8jttt6jkyQiPHSB6AvUwU0
20:54  <rm_work> no -
20:54  <johnsom> Ok, bummer, it's not that
20:55  <johnsom> what does the haproxy log file have? Anything interesting?
20:55  <rm_work> ah, i do see something in there, was just looking
20:55  <rm_work> interesting
20:56  <rm_work> FD limit issues
20:56  <rm_work> on the third member, i think
20:56  <rm_work> http://paste.openstack.org/show/795804/
20:56  <rm_work> maxconn related also
20:57  <johnsom> Nope, it's the usage output.
20:57  <johnsom> The FD stuff is always there
20:57  <johnsom> So, that peer ID has to be the problem.
20:58  <cgoncalves> wouldn't we see a second usage output if it was a bad peer ID?
20:58  <johnsom> Hmm, there is a "cannot fork" in there
20:58  <johnsom> Maybe it is memory. Try dropping the max connections on the listener down to 50k
20:59  <rm_work> wait, so we ALWAYS have an FD problem?
20:59  <johnsom> Yeah, it always whines about the FD limit and drops it down to whatever the instance can handle
21:00  <johnsom> Something that is fixed in 2.x haproxy versions, btw
21:02  <johnsom> That usage output makes me wonder though. I don't think we see that when it's just the memory being too low
21:05  <rm_work> hmmm, this is weird tho
21:05  <rm_work> i created a third member and it worked fine...
21:06  <rm_work> so i deleted it, which worked
21:06  <rm_work> and recreated it
21:06  <rm_work> and now it broke one amp again
21:06  <cgoncalves> maxconn issue
21:07  <rm_work> memory is totally fine tho
21:07  <johnsom> Give it a shot though, I'm leaning that way as well
21:07  <cgoncalves> rm_work, it is until haproxy tries to reload
21:08  <rm_work> k
21:08  *** gcheresh has quit IRC
21:08  <johnsom> You could check syslog and see if there are oom logs, but I don't think there always are
21:08  <rm_work> i mean, it then succeeds at loading haproxy again right after
21:08  <rm_work> and memory is NOW fine
21:08  <rm_work> but the amp agent is still totally busted
21:08  <rm_work> and therefore the amp is broken
21:09  <cgoncalves> rm_work, it works fine if you don't reload too fast
21:09  <rm_work> ok sooooo
21:09  <rm_work> why is the amp agent dead
21:09  <rm_work> and spewing connection errors
21:09  <rm_work> so haproxy failed to load -- ok
21:09  <rm_work> but now the amp agent can't accept connections?
21:10  <rm_work> how would one affect the other that way
21:10  <johnsom> I have some theories on that. my guess is the systemctl restart was hanging, which let the controller time out and close the connection, thus the connection reset, but gunicorn is still waiting on systemctl to give up or whatever
21:11  <rm_work> trying a manual restart of the amp agent to see if it comes back
21:11  <johnsom> check if there is a systemctl in the process list
21:12  <rm_work> ok, well, i restarted the agent
21:12  <rm_work> and the amp is back to ACTIVE
21:12  <rm_work> and now the LB is good
21:12  <rm_work> "restart" didn't work, had to do stop/start BTW
21:15  <rm_work> yeah
21:15  <rm_work> /bin/systemctl reload haproxy-143b63ec-6058-437f-8eb6-112380a612e4.service
21:15  <rm_work> it's stuck on the reload and timing out gunicorn, you're correct
21:15  <rm_work> so gunicorn doesn't handle that well and just ... breaks?
21:15  <rm_work> and can't recover
21:16  <johnsom> Well, we have one worker configured (for good reasons) and that one worker is locked up with systemd dumbness
21:16  <rm_work> hmm
21:17  <rm_work> and systemd NEVER times out?
21:17  <rm_work> ah no, it is gone now
21:17  <rm_work> but the agent is still borked, I think
21:17  <johnsom> It does, but probably waits longer than gunicorn
21:17  <rm_work> so once that's timed out, gunicorn should start responding, right?
21:19  <johnsom> Yeah, we see it killing the worker, but I don't know what it does when a new worker is started; does it just run the request again?
21:19  <rm_work> doesn't seem to
21:23  <johnsom> So out of curiosity, did lowering the listener connection limit help?
21:25  <rm_work> still doing a bunch of testing to make sure i can 100% replicate
21:25  <rm_work> i think i'm about confident
21:35  <johnsom> Now that we are on python3 we could add a timeout to that systemd call
21:41  *** shtepanie has joined #openstack-lbaas
21:42  <aannuusshhkkaa> johnsom: are you around?
21:43  <johnsom> aannuusshhkkaa I am
21:44  <aannuusshhkkaa> so we were wondering if we should create a new table for the new metrics we are adding
21:45  <johnsom> Those are the CPU and RAM?
21:45  <shtepanie> yep, and later on we plan on adding things like load averages and disk usage
21:46  <aannuusshhkkaa> and probably some related to active connections and network bandwidth etc...
21:46  <johnsom> Hmm, do you need them stored for your use case?
21:46  <aannuusshhkkaa> as opposed to?
21:47  <johnsom> Well, we could collect them from the amphora, pass them to the metrics driver(s), and that is all.
21:47  <johnsom> We are not planning to add those to the API, so I don't know if they need to be stored or simply passed to the metrics driver(s)
21:49  <aannuusshhkkaa> hmm okay..
21:50  <aannuusshhkkaa> so we won't need a new API either, right?
21:50  <johnsom> We store the current metrics because they are exposed via the API, so if someone asks for the stats we don't have to wait for a message or poll the amphora. But these metrics I don't think we are planning to add to the API; we are just going to use them and/or send them somewhere.
21:52  <aannuusshhkkaa> so we just query the amphora for the NEW metrics, say every 10 seconds, and use them however we want?
21:53  <johnsom> Well, the amphora is going to send them, right? we aren't adding polling.
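For context, the split johnsom describes (collect in the amphora, hand the data to pluggable drivers) matches the stats driver interface being refactored in https://review.opendev.org/737111. A rough sketch of what such a driver interface could look like; class and method names here are illustrative, not the final interface:

    from abc import ABC, abstractmethod


    class StatsDriverBase(ABC):
        """Receives metrics reported by amphorae and does something with them."""

        @abstractmethod
        def update_stats(self, stats_list):
            """Handle a batch of stats measurements.

            A DB-backed driver would persist them so the API can serve
            stats immediately; a metrics-export driver might instead
            forward them to an external system without storing anything.
            """


    class LogStatsDriver(StatsDriverBase):
        """Toy driver: just log what was received."""

        def update_stats(self, stats_list):
            for stats in stats_list:
                print('listener %s: %s active connections' %
                      (stats['listener_id'], stats['active_connections']))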
21:54  <shtepanie> we want our customer to be able to get the metrics when they ask for them, so wouldn't we fall into the first scenario? and if we don't add them to the API, the customer would just have to wait for the next message to get the stats?
21:54  <johnsom> Hmm, we didn't write up a spec, did we rm_work?
21:55  <rm_work> we did not.... though i expected we WOULD expose those via the API somehow
21:55  <rm_work> I realize they're not always totally generic
21:56  <johnsom> Yeah, so maybe our use cases are very different. I thought we were updating the metrics and adding the ability to have metrics drivers, which would take the data and do something with it.
21:56  <rm_work> yes, both
21:56  <rm_work> but they're separate things
21:56  <rm_work> and if we pass that data to the metrics drivers, the update_db driver needs to store it too...
21:56  <johnsom> Yeah, the issues with the API are: 1. users aren't supposed to know amphorae even exist, let alone the memory or CPU allocated to them.
21:57  <rm_work> right, been thinking about that
21:57  <johnsom> 2. we would have to store those
21:57  <rm_work> so if we didn't create a new table and added them to the listener stats table... there would be a little bit of data duplication on the amp driver, but it could at least allow other providers to store things granularly enough
21:57  <johnsom> 3. then expose more "amphora" stuff
21:58  <johnsom> 4. Is it going to be consistent with different amphora images, etc.?
21:58  <rm_work> well, they all have a concept of CPU/RAM
21:58  <rm_work> and if it's percentages, then... pretty consistent
21:58  <johnsom> Not really per-listener or per-LB though
21:59  <johnsom> This just feels like exposing the sausage making; it's ugly, and customers really only want the finished product.
22:00  <rm_work> yeah, i mean we will already have to make funky decisions like "do we store just the MAX of the usage values, since we have two amps returning data?"
22:00  <rm_work> you say that, but we've tried to convince our customer that all they need is active connections and an estimate of the maximum we support
22:01  <rm_work> and that has not been an accepted answer, they want to see CPU/RAM data
22:01  <johnsom> Yeah, you just can't add that to the listener or lb api. It wouldn't make any sense
22:02  <rm_work> it wouldn't necessarily make sense to SHOW on the listener stats api call
22:02  <johnsom> The only place you could add it is to create a new "amphora stats" API
22:04  <rm_work> yeah, that's an option
22:04  <rm_work> I mean, it COULD be displayed on the loadbalancer stats
22:04  <rm_work> because I believe every provider will have these stats, even the hardware ones? wouldn't an F5 have CPU/RAM usage?
22:04  <johnsom> So you have an LB with five amphorae on it. Do you average?
22:04  <rm_work> like i said, Peak
22:05  <rm_work> Max
22:05  <johnsom> No, F5 doesn't have this
22:05  <johnsom> F5 would have a few thousand load balancers all sharing a CPU and RAM
22:06  <rm_work> ah, I didn't think F5 did multi-tenant
22:06  <rm_work> i guess it makes sense tho
22:06  <rm_work> i've just never seen it deployed that way
22:07  <aannuusshhkkaa> looks like F5 does? https://techdocs.f5.com/kb/en-us/products/big-ip_analytics/manuals/product/analytics-implementations-12-1-0/7.html
22:07  <aannuusshhkkaa> they have cpu and ram usage.. correct me if I am wrong!
22:08  <johnsom> aannuusshhkkaa Yeah, that is exactly what I am saying. The only metrics for cpu/disk/ram on the appliances are for thousands of load balancers. It's not per-load-balancer like the amphora.
22:09  *** rcernin has joined #openstack-lbaas
22:10  <aannuusshhkkaa> ouuu okay
22:10  <rm_work> hmm, it does have them?
22:10  <rm_work> and i guess it depends on the deployment
22:10  <johnsom> On F5 a load balancer is a "virtual server". The appliances are expensive, so you stack as many virtual servers (load balancers) on each appliance as you can.
22:11  <aannuusshhkkaa> and if one fails, does it mean the others are about to fail too?
22:12  <johnsom> Yeah, it is shared fate, but typically you have them in an HA pair and it fails over to the other appliance.
22:13  <rm_work> so you would -2 adding system statistics to LoadbalancerStats in any form?
22:13  <aannuusshhkkaa> gotcha, then using peak (max) totally makes sense, right?
22:14  <johnsom> So every customer would see 95%, basically. No matter if their load balancer was idle.
22:16  <aannuusshhkkaa> i guess a false positive indicating failure is safer than a false negative in this case, right?
22:16  <rm_work> mixed bag -- don't necessarily want a ton of customers coming to us and complaining that their LB is always over 50% capacity and they want a new one
22:17  <johnsom> Just to give you an idea, this is the cheapest F5 appliance: https://www.softchoice.com/catalog/en-us/network-devices-f5-big-ip-iseries-local-traffic-manager-i2800-load-balancing-device-F5Networks-UX5251
22:17  <rm_work> lol yeah
22:17  <aannuusshhkkaa> whaaaaat!!!!!
22:18  <aannuusshhkkaa> rm_work, right.. so what do we do then? about the mixed bag?
22:20  *** dmellado has joined #openstack-lbaas
22:20  <johnsom> Oh, ha, I was wrong, there is one for $16,000
22:22  <rm_work> well, one option is we tell our customer "sorry, we know you want CPU/RAM stats, but... you don't get that. Trust us that the meaningful metric is current active connections, and use that."
22:22  <rm_work> which I tried to do in our last meeting and was ignored
22:23  <rm_work> or rather, told "no, that isn't good enough"
22:23  <rm_work> but we may just have to be more forceful
22:23  <johnsom> Yeah, so given that for pretty much every other driver those are shared resources across tenants, I don't think it is viable to add them to the load balancer or listener stats.
22:24  <johnsom> If you want to make something like that public, you could, I guess, come up with a bogo-mips kind of thing. It's not really CPU or RAM, but "bogo-load".
22:25  <johnsom> Most drivers would probably do number of connections over max
22:26  <rm_work> right, a "capacity units" measurement, 0-1
22:26  <rm_work> something like that
22:26  <rm_work> and yeah, we would do connections over our estimated max
22:27  <johnsom> We already have that to some degree, as the listener will go degraded if the connection level goes too high
22:31  <aannuusshhkkaa> how do we check that currently? just based on the number of active connections?
22:32  <johnsom> Yes. You set your maximum number of connections when creating the listener. Then we currently collect the number of current active connections.
22:33  <johnsom> HAProxy also notifies us via the "FULL" state: https://github.com/openstack/octavia/blob/master/octavia/controller/healthmanager/health_drivers/update_db.py#L289
22:33  <johnsom> That means connections are getting queued on the front end
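The linked update_db code translates the status haproxy reports for each listener and member into Octavia operating statuses; FULL is the signal that the frontend hit its connection limit and new connections are queuing. A simplified sketch of that kind of mapping, consistent with the member-count rules johnsom describes below but not the exact Octavia code:

    # haproxy status strings as seen in the amphora health message (illustrative)
    FULL = 'FULL'

    def listener_operating_status(listener_status, members_down, members_total):
        """Roughly how a listener's operating status could be derived."""
        if listener_status == FULL:
            # Frontend is at maxconn; connections are queuing.
            return 'DEGRADED'
        if members_total and members_down == members_total:
            # No members left to serve traffic.
            return 'ERROR'
        if members_down:
            # e.g. one of five members not responding.
            return 'DEGRADED'
        return 'ONLINE'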
22:40  <aannuusshhkkaa> aah gotcha
22:41  <aannuusshhkkaa> so the health of an LB is determined based on current connections / max number of connections?
22:41  <johnsom> Among the other status issues that could lead to a degraded or error operational state.
22:42  <rm_work> basically everything boils down to how many connections can be open
22:42  <rm_work> CPU/RAM is all just there so we can have more open connections
22:42  <johnsom> +1 to that
22:42  <rm_work> basically an operator should know, in their environment, how many connections is a theoretical max
22:43  <rm_work> and then that should be the "100%"
22:43  <aannuusshhkkaa> johnsom, what are the other "status issues"?
22:43  <johnsom> Like the number of member servers that are down, etc.
22:44  <aannuusshhkkaa> okay, so would we have to incorporate those as well in determining the health of the LB?
22:44  <johnsom> So if you have a pool with five member servers, and one is not responding, that would be a degraded state as well. If all are not responding, then you are in ERROR.
22:45  <shtepanie> going back a little, but if HAProxy notifies us of a "FULL" state, is it also possible to notify us of an "almost full" type of state? some sort of warning when we're starting to get close to a full state?
22:45  <johnsom> Yeah, we already do. That is what the operational status is for
22:45  <shtepanie> ahh okay
22:46  <aannuusshhkkaa> do we already have something that would warn us when the LB is at 50% or 75% capacity?
22:48  <johnsom> Not really percentages. Really, CPU and RAM won't tell you that either.
22:48  <aannuusshhkkaa> yeap yeap..
22:49  <aannuusshhkkaa> so if we want to find that out, what would we use?
22:50  <johnsom> I mean, a user can calculate the percentage using the active connections. We will say when you have reached your connection capacity limit. The part that is harder is deciding on the correct "MAX" to set.
22:50  <rm_work> yeah, MAX has to be determined with performance testing per cloud
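In other words, the "capacity units" idea from earlier reduces to a simple ratio once an operator has benchmarked a realistic maximum for their cloud. A sketch, where the max value is a made-up example and must come from per-cloud performance testing:

    # Example only: the per-cloud max must come from performance testing.
    ESTIMATED_MAX_CONNECTIONS = 50000

    def capacity_units(active_connections):
        """Return load as a 0-1 'capacity units' fraction, capped at 1.0."""
        return min(active_connections / ESTIMATED_MAX_CONNECTIONS, 1.0)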
22:51  <aannuusshhkkaa> don't we already calculate the max for each cloud?
22:54  <johnsom> google "openstack octavia performance"; a few lines down is a guide I wrote a long time ago that I guess is now published. It has a list of some of the factors that go into why getting a "MAX" is hard.
22:56  <aannuusshhkkaa> alrighty.. i'll take a look
22:59  <rm_work> yeah, it has to do with your network hardware setup, your compute hosts, and maybe a couple other things
22:59  <aannuusshhkkaa> https://developer.rackspace.com/docs/private-cloud/rpc/master/rpc-octavia-internal/octavia-perf-guide/ is this the link?
22:59  <rm_work> yeah
22:59  <aannuusshhkkaa> okay
23:02  <johnsom> rm_work So back to the haproxy reload issue. The systemd docs suck, so it's not clear which of those timeouts matters on a reload call. Worse yet, they imply the timeout is 100ms, which can't be true. Any luck testing out the connection limit change on the listener?
23:02  <aannuusshhkkaa> so there are about 18 factors that contribute to the performance according to that page.. do we collect that data? if so, where can i find it?
23:03  <johnsom> At least that many factors...
23:03  <rm_work> yeah, seems to be the connection limit thing
23:03  <aannuusshhkkaa> yeah.. maybe if we look at the logs, we can come up with a formula to determine LB capacity units..
23:03  <rm_work> guess the memory pressure is too ephemeral
23:04  <johnsom> Yeah, great. ok. So we have been talking about this internally for a while. Thus the patch Greg posted, but last I looked it needed work.
23:06  <johnsom> Basically, with the "unlimited" -1, we translated that into 1,000,000 connections. With the memory allocation up front, that is a sizable amount. Now, using the current reload mechanism for hitless reloads, haproxy starts a secondary process, or, depending on how often the reloads come in, more.
23:06  <johnsom> What seems to be a bit new is how long the old processes stick around.
23:10  <johnsom> In general, what should happen is: if haproxy doesn't have enough memory, it should fail, and systemd *should* kill it and restart it. So the pain would only be that it was a non-hitless reload, but the amp should continue fine. However, that isn't happening right. I found a bug in systemd that was causing restarts to not fire in the version centos has. That *should* be fixed, at least in RHEL.
23:11  <johnsom> The gunicorn issue.... that one i'm not sure about. If we could set a timeout for the systemd reload, that would be great. The only other thought is setting a timeout on the subprocess call in python.
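The timeout johnsom mentions is straightforward now that the agent is Python 3 only: subprocess.run accepts a timeout and raises if the child does not finish in time. A sketch of bounding the systemctl reload call this way; the function name and the 120-second value are arbitrary examples, not the agent's actual code:

    import subprocess

    def reload_haproxy(service_name, timeout=120):
        """Reload an haproxy systemd unit, giving up instead of hanging forever."""
        try:
            subprocess.run(
                ['/usr/bin/systemctl', 'reload', service_name],
                check=True, timeout=timeout)
        except subprocess.TimeoutExpired:
            # Surface an error instead of wedging the single gunicorn
            # worker, which is the failure mode debugged above.
            raise RuntimeError('systemctl reload of %s timed out' % service_name)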
23:21  <rm_work> hmm
23:21  <rm_work> weird that even a "restart" on the agent didn't actually stop/start the agent properly either
23:21  <rm_work> it was kinda... stuck?
23:21  <rm_work> I think
