Thursday, 2019-02-07

[00:00] <colin-> perhaps it'd be useful for me to phrase another question that is bouncing around in my head here
[00:00] <colin-> is there a meaningful upper limit on how many amps two HMs can health-check effectively before octavia is the only thing that can live on the box?
[00:00] <johnsom> So, the question is what is the right mix of HMs vs. workers for a 650-amp cloud
[00:00] <rm_work> also, i ran my stuff in kubernetes for the API/workers
[00:00] <colin-> yeah, just want to understand the ratio better
[00:00] <rm_work> and the HMs on their own VMs
[00:01] <rm_work> you're sharing one box for all services?
[00:01] <colin-> no, there are two discrete hosts in this example
[00:01] <rm_work> right, but each of those two is running o-api, o-cw, o-hm
[00:02] <rm_work> and o-hk
[00:02] <rm_work> ?
[00:02] <colin-> yes
[00:02] <rm_work> k
[00:02] <rm_work> so, yeah, i found that o-api, o-hk, and o-cw were all very lightweight
[00:02] <rm_work> so that's probably fine for those
[00:03] <rm_work> i would probably put the HMs on a series of different boxes
[00:03] <openstackgerrit> Ian Wienand proposed openstack/octavia master: [dnm] testing pip-and-virtualenv centos fix  https://review.openstack.org/635371
[00:03] <rm_work> but that's just me ;P
[00:04] <colin-> not ruling that out yet, but it isn't an attractive option
[00:04] <rm_work> i think... well, i definitely ran them in different quantities, so
[00:04] <rm_work> 4x API, 2x CW, 1x HK, 6x HM
[00:05] <rm_work> octavia services work really well in k8s BTW :P
[00:05] <johnsom> I mean, the work it is doing is very DB-bound. I would look at the logs and figure out how long each heartbeat takes to process, and how many per second you are seeing. That should give you an idea of how to tune the number of workers.
[00:05] <johnsom> We run them in lxc containers
[00:05] <colin-> have a sincere interest in that, rm_work, but will probably have to circle back
[00:05] <colin-> k8s is a parallel effort consuming octavia at the moment
[00:06] <colin-> don't want to get too chicken-and-eggy
[00:06] <colin-> but i'm interested in running them there too for the ease of operation it offers
[00:06] <colin-> understood, johnsom, that's a good suggestion
[00:06] <rm_work> napkin math might be something like: projected number of LBs * 2 (for active-standby amp count) / health interval = number of amp messages per second
[00:07] <colin-> johnsom: you run the HMs in lxc containers?
[00:07] <rm_work> but i think you got that far
[00:07] <johnsom> yes
[00:07] <colin-> neat
[00:07] <johnsom> All of the control plane is in lxc containers
[00:08] <colin-> did the project ever consider using the existing messaging infrastructure to pass amp health back to the HM?
[00:08] <colin-> amqp
[00:08] <colin-> or would participating in that have used up too many resources on the amp
[00:08] <rm_work> the question is, what is "X" in the equation: number of amp messages per second / X (the max messages per second that a single HM can handle) = number of HMs
[00:09] <rm_work> it would take down any AMQP i can think of :P
[00:09] <rm_work> lol
[00:09] <rm_work> RMQ is a horrible point of failure already
[00:09] <colin-> yeah, that captures what i'm trying to figure out pretty well i think, rm_work
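
A rough sketch of that napkin math (illustrative only: the 2-amps-per-LB figure assumes active-standby topology, the 10-second interval is the default heartbeat_interval, and max_mps_per_hm stands in for the unknown "X" being discussed):

    import math

    def amp_messages_per_second(num_lbs, amps_per_lb=2, heartbeat_interval=10):
        # Each amphora emits one heartbeat per interval.
        return num_lbs * amps_per_lb / heartbeat_interval

    def health_managers_needed(num_lbs, max_mps_per_hm, amps_per_lb=2,
                               heartbeat_interval=10):
        # Divide the total heartbeat rate by "X", the rate one HM can process.
        mps = amp_messages_per_second(num_lbs, amps_per_lb, heartbeat_interval)
        return math.ceil(mps / max_mps_per_hm)

    # e.g. 650 amps == 325 active-standby LBs -> 65 heartbeats/sec in total
    print(amp_messages_per_second(325))                      # 65.0
    print(health_managers_needed(325, max_mps_per_hm=200))   # 1
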
[00:09] <colin-> oh ok
[00:09] <johnsom> Ah yes, some of us have experience with attempting to use rabbit for the health heartbeats. It ended very, very poorly for that team.
[00:09] <rm_work> using RMQ ends very poorly for most teams for any project
[00:09] <rm_work> lol
[00:10] <colin-> do you guys feel like HM resource utilization constitutes a meaningful ceiling for your octavia implementations in your clouds?
[00:10] <rm_work> unless your messaging rate is very low
[00:10] <rm_work> actually yes
[00:10] <rm_work> but that's why it's scalable :P
[00:11] <rm_work> johnsom's figures showed handling several thousand amps per HM
[00:11] <johnsom> Right, it scales horizontally for the HMs. The DB is probably the limiting factor for how many amps a single deployment can handle. We had some thoughts on that as well, but we haven't worked on it yet.
[00:11] <rm_work> but i am not sure if that was "per second" or "amps total, given some interval"
[00:12] <colin-> ok, if it's helpful as feedback, the difference between several thousand and 650 would be extremely meaningful for me
[00:12] <colin-> (fwiw)
[00:12] <rm_work> yes, health was specifically split out into its own simple table with no FK relations, to be put in something like redis, possibly, in the long term
[00:12] <johnsom> Yeah, it was amps total, on the standard 10-second interval
[00:12] <rm_work> but we never got to it
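
For context, the standalone health table rm_work describes is intentionally tiny, which is what would make a later swap to something like redis feasible. A sketch of that shape in SQLAlchemy (column names are illustrative placeholders, not the authoritative Octavia schema):

    import sqlalchemy as sa

    # A minimal heartbeat table with no FK relations: one row per amphora,
    # touched on every heartbeat the health manager processes.
    metadata = sa.MetaData()
    amphora_health = sa.Table(
        'amphora_health', metadata,
        sa.Column('amphora_id', sa.String(36), primary_key=True),
        sa.Column('last_update', sa.DateTime, nullable=False),
        sa.Column('busy', sa.Boolean, nullable=False, default=False),
    )
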
[00:12] <colin-> will work on what we've discussed to try and improve that, as well as arrive at a better ratio
[00:12] <rm_work> ok so
[00:12] <colin-> i've gone to 10s in config now, so new amps will use that
[00:12] <rm_work> 100-200 mps?
[00:13] <rm_work> i think the default *is* 10s?
[00:13] <colin-> 3s
[00:13] <colin-> oh wait
[00:13] <colin-> that's the wrong value, sorry
[00:13] <rm_work> so if you assume you can handle 200 messages per second on a single HM, you can work that math backwards
[00:13] <colin-> heartbeat_interval — Type: integer, Default: 10
[00:13] <colin-> you're right
[00:14] <rm_work> 200 (max mps) * 10 (interval) / 2 (amps per LB) = 1000 LBs per HM
[00:15] <rm_work> I think i did that right
[00:15] <rm_work> assuming active-standby
[00:15] <colin-> (true for us)
[00:16] <rm_work> so if you want to support 100,000 LBs, you need 100 HMs? which seems... a little ridiculous, maybe, yes
[00:16] <rm_work> i'd hope we could get that "max mps" number a bit higher
[00:16] <rm_work> and yeah, part of it would probably be switching to a faster backend
[00:17] <rm_work> though i think we have a problem there because of the amount of non-health data we write/pull in order to actually do that operation
[00:17] <rm_work> we basically pull the whole LB and write to a bunch of different tables to capture member statuses
[00:17] <johnsom> You are missing the # of workers in that formula
[00:17] <rm_work> err?
[00:19] <johnsom> Well, each HM can have a configurable # of worker processes. It defaults to as many cores as you have, but theirs is capped at 8 per HM
[00:19] <rm_work> err, yeah, true
[00:19] <rm_work> was yours "1" in your tests?
[00:19] <johnsom> 4
[00:19] <rm_work> ok, so we're assuming linear scaling there?
[00:20] <rm_work> so maybe 400 mps for 8 workers?
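
Folding the worker count into the same napkin math, under the assumption of roughly linear scaling per worker (the ~200 messages/sec at 4 workers is the rough figure implied above, not a benchmark):

    def lbs_per_hm(mps_per_worker, workers, heartbeat_interval=10, amps_per_lb=2):
        # Max sustainable heartbeats/sec times the interval gives amps per HM;
        # divide by amps per LB (2 for active-standby) to get LBs per HM.
        return mps_per_worker * workers * heartbeat_interval // amps_per_lb

    mps_per_worker = 200 // 4   # ~200 mps was discussed in the context of 4 workers
    print(lbs_per_hm(mps_per_worker, workers=4))   # 1000 LBs per HM
    print(lbs_per_hm(mps_per_worker, workers=8))   # 2000 LBs per HM
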
[00:28] <colin-> suggesting twice as many LBs?
[00:28] <colin-> hm
[00:28] <rm_work> yes, and 8 was below the limit you REALLY could run, right?
[00:28] <rm_work> assuming these have their own box
[00:29] <colin-> in this example the hosts are also operating other openstack control plane services, so not their own box
[00:29] <rm_work> ah
[00:32] <colin-> i chose 8 for health_update_threads and status_update_threads somewhat arbitrarily; it can be increased, but when i noticed much-higher-than-expected utilization, capping it there was a quick measure to make sure it didn't run away in case it was a near-term issue
[00:33] <colin-> so i'm not super committed to that value, outside of wanting to have some control over how this behaves
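
For reference, the knobs being discussed live in the [health_manager] section of octavia.conf; the values below simply mirror the ones mentioned in this conversation, not recommended settings:

    [health_manager]
    # Seconds between heartbeats sent by each amphora (the default is 10).
    heartbeat_interval = 10
    # Worker processes for heartbeat handling and status updates; both default
    # to the number of host CPUs, capped here at 8 as described above.
    health_update_threads = 8
    status_update_threads = 8
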
[06:41] <openstackgerrit> Erik Olof Gunnar Andersson proposed openstack/octavia master: Cleaning up logging  https://review.openstack.org/635439
[06:45] <openstackgerrit> Erik Olof Gunnar Andersson proposed openstack/octavia master: Cleaning up logging  https://review.openstack.org/635439
[06:47] <eandersson> johnsom, rm_work ^ couldn't help myself after I found a bug in the logging
[06:57] <johnsom> Please feel free to help us any time
[07:18] <rm_work> Indeed!
[07:24] <eandersson> johnsom, is there a reason DriverManager is called every time a call is made? seems like a waste
[07:24] <eandersson> e.g. https://github.com/openstack/octavia/blob/master/octavia/controller/healthmanager/health_drivers/update_db.py#L191
[07:26] <rm_work> I think you may be correct, it probably could be moved into the class
[07:26] <rm_work> And just loaded once
[07:26] <rm_work> Try it? :D
[07:28] <rm_work> Though that's only called in the case of a zombie, which should be rare, so I doubt it matters too much
[07:29] <eandersson> Yea - was looking into it, was more worried about the heartbeat_udp ones, but it looks like they are just called once
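
The change being sketched here, roughly: build the stevedore DriverManager once (e.g. in __init__) and reuse it, instead of re-resolving the entry point on every call. A simplified illustration, not the actual Octavia patch; the namespace and driver name should be treated as placeholders:

    from stevedore import driver as stevedore_driver

    class UpdateHealthDb(object):
        def __init__(self):
            # Resolve the amphora driver entry point a single time at
            # construction, rather than on every heartbeat.
            self.amphora_driver = stevedore_driver.DriverManager(
                namespace='octavia.amphora.drivers',
                name='amphora_haproxy_rest_driver',
                invoke_on_load=True,
            ).driver

        def _handle_zombie_amphora(self, amphora_id):
            # Reuse the cached driver instead of instantiating a new
            # DriverManager each time this (rare) path fires.
            driver = self.amphora_driver
            ...
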
[08:01] <openstackgerrit> Erik Olof Gunnar Andersson proposed openstack/octavia master: Ensure drivers only need to be loaded once  https://review.openstack.org/635447
[08:02] <eandersson> rm_work, ^ :p
[09:49] <openstackgerrit> Carlos Goncalves proposed openstack/octavia-tempest-plugin master: DNM: Add octavia-v2-dsvm-scenario-fedora-latest job  https://review.openstack.org/600381
[14:50] <openstackgerrit> Nir Magnezi proposed openstack/octavia master: Encrypt certs and keys  https://review.openstack.org/627064
[14:51] <openstackgerrit> Vadim Ponomarev proposed openstack/octavia master: Fix check redirect pool for creating a fully populated load balancer.  https://review.openstack.org/635167
[15:18] <openstackgerrit> Vadim Ponomarev proposed openstack/octavia master: Fix check redirect pool for creating a fully populated load balancer.  https://review.openstack.org/635167
[16:30] <johnsom> Yeah, that call should be very rare, only firing when nova has failures deleting.
[16:53] <openstackgerrit> boden proposed openstack/neutron-lbaas master: stop using common db mixin methods  https://review.openstack.org/635570
[17:45] <openstackgerrit> Michael Johnson proposed openstack/octavia master: Updates Octavia to support octavia-lib  https://review.openstack.org/613709
[17:50] <openstackgerrit> Michael Johnson proposed openstack/octavia master: Updates Octavia to support octavia-lib  https://review.openstack.org/613709
[18:45] <openstackgerrit> Merged openstack/neutron-lbaas master: Improve performance on get and create/update/delete requests  https://review.openstack.org/635076
*** trown|lunch is now known as trown19:15
[19:45] <openstackgerrit> Merged openstack/octavia master: Fix flavors support when using spares pool  https://review.openstack.org/632594
[21:30] <eandersson> johnsom, I found a few more that I tried to fix, but I think tempest is changing the driver during runtime
[21:31] <johnsom> Yeah, some of the driver loaders are intended to pick up changes, especially when we bring those into the flavors
[21:32] <johnsom> Like the provider driver loader; that should pick up the new code if a provider driver update is deployed.
[21:35] <rm_work> yeah, only really the amp-health stuff needs to be "optimized" heavily
[21:35] <rm_work> everywhere else I would err on the side of leaving it to reload as much as it wants
[21:36] <johnsom> I think the only thing that loads there is the zombie stuff, which should fire very rarely
[21:38] <openstackgerrit> Erik Olof Gunnar Andersson proposed openstack/octavia master: [WIP] Ensure drivers only need to be loaded once  https://review.openstack.org/635447
[21:41] <eandersson> Are you saying that any of these would be configured outside of a config file?
[21:41] <eandersson> Outside of testing?
[21:42] <eandersson> Because I can't imagine a world where a driver wouldn't be coming from a config file.
[21:42] <johnsom> Well, for example, provider drivers can be loaded by installing a python module.
[21:43] <eandersson> Sure - but I would assume that they would still be configured in a config file
[21:43] <eandersson> and not loaded at runtime after installing said python module?
[21:43] <johnsom> We are also migrating a number of configuration settings to be configurable via flavors instead of the config file. The config file will be the default, but flavors may override that setting.
[21:43] <eandersson> Interesting
[21:44] <johnsom> You can enable/disable provider drivers via the config file, but if it's enabled, simply installing a new version would start using it.
[21:44] <eandersson> Seems risky
[21:44] <johnsom> Why would we need to restart the whole API process just to upgrade a provider driver?
[21:45] <eandersson> to upgrade a provider driver you most certainly need to restart your process
[21:45] <johnsom> Well, note, flavors can only be created by an operator. They are not intended to be created by users
[21:45] <eandersson> since python will cache the python code
[21:45] <eandersson> installing a new one I might understand
[21:45] <eandersson> but even that is pretty crazy
[21:45] <johnsom> By calling stevedore, it will pick up new code that is deployed
[21:49] <rm_work> yes
[21:49] <rm_work> and also for barbican, we need new clients for different auth
[21:49] <rm_work> IIRC
[21:49] <rm_work> so we make new barbican clients a lot
[21:49] <rm_work> again, IIRC
[21:54] <eandersson> I tested stevedore and it does not reload new changes
[21:54] <eandersson> I mean, of course, if you load X and then change it to Y it will work
[21:55] <eandersson> but if you load X and make changes to X, they won't take effect until I restart the app
[21:56] <eandersson> but anyway, I abandoned that change
[21:59] <johnsom> Did you change the module version?
[22:00] <johnsom> I didn't have a problem with what you had, but I also don't think we want to make all of them cached inside our code.
[22:07] <eandersson> Yep - tested installing a new version
[22:07] <eandersson> The only way to get around it in python is to call reload on the module
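
A minimal illustration of what eandersson measured: once a module is imported, a second import (or a stevedore entry-point lookup that resolves to the same module) returns the cached object from sys.modules, so code changes on disk are only picked up after an explicit reload:

    import importlib
    import json  # stand-in for an already-imported driver module

    # Re-importing returns the same cached module object; nothing is re-read
    # from disk.
    assert importlib.import_module('json') is json

    # Only an explicit reload re-executes the module's source.
    importlib.reload(json)
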
[22:15] <openstackgerrit> Michael Johnson proposed openstack/octavia master: Updates Octavia to support octavia-lib  https://review.openstack.org/613709
[22:25] <openstackgerrit> Carlos Goncalves proposed openstack/octavia master: WIP: Update pylint version  https://review.openstack.org/635236
[22:34] <eandersson> If you guys have a blueprint, or make one on this, I wouldn't mind doing some testing and/or providing some feedback on those.
[22:34] <eandersson> I am not trying to be a negative nancy :p
[22:35] <johnsom> eandersson: For which? provider drivers? flavors?
[22:35] <eandersson> both
[22:35] <johnsom> Provider drivers: https://docs.openstack.org/octavia/latest/contributor/specs/version1.1/enable-provider-driver.html
[22:35] <johnsom> Flavors: https://docs.openstack.org/octavia/latest/contributor/specs/version1.0/flavors.html
[22:35] <eandersson> Thanks johnsom
[22:36] <johnsom> Those are both fairly old now; they merged in Pike and Queens.
[22:36] <johnsom> Providers landed in Rocky, but we are refining them in Stein. Flavors has landed in Stein
[22:37] <johnsom> Both have admin guide docs pages as well
[22:37] <eandersson> Yea - great work on that btw
[22:38] <eandersson> I am excited about it... let's just hope our vendors are equally excited haha
[22:38] <johnsom> I know two vendors already have code working internally
[22:38] <eandersson> but then again, haproxy is great
[22:43] <johnsom> cgoncalves: That looks like it was fun....
[22:43] <johnsom> I need to look at what those rules you disabled are
[22:57] <rm_work> this is my favorite test, because it's just so *me*: https://github.com/openstack/octavia/blob/master/octavia/tests/functional/db/test_repositories.py#L446
[22:57] <johnsom> lol, yeah, I saw that again recently. Good stuff
[22:58] <rm_work> does the passive-aggressiveness come through? :P
[22:58] <johnsom> I need to make another "why are we skipping tests" pass to see if there are more to clean up
[23:00] <johnsom> We have 7 functionals in skip mode...
