colin- | perhaps it'd be useful for me to phrase another question that is bouncing around in my head here | 00:00 |
---|---|---|
colin- | is there a meaningful upper limit on how many amps two HMs can health check effectively before octavia is the only thing that can live on the box? | 00:00 |
johnsom | So, the question is what is the right mix of HMs vs. workers for a 650 amp cloud | 00:00 |
rm_work | also, i ran my stuff in kubernetes for the API/workers | 00:00 |
colin- | yeah just want to understand the ratio better | 00:00 |
rm_work | and the HM on their own VMs | 00:00 |
rm_work | you're sharing one box for all services? | 00:01 |
colin- | no, there are two discrete hosts in this example | 00:01 |
rm_work | right but each of those two is running o-api, o-cw, o-hm | 00:01 |
rm_work | and o-hk | 00:02 |
rm_work | ? | 00:02 |
colin- | yes | 00:02 |
rm_work | k | 00:02 |
rm_work | so, yeah, i found that o-api o-hk o-cw were all very lightweight | 00:02 |
rm_work | so that's probably fine for those | 00:02 |
rm_work | i would probably put the HMs on a series of different boxes | 00:03 |
openstackgerrit | Ian Wienand proposed openstack/octavia master: [dnm] testing pip-and-virtualenv centos fix https://review.openstack.org/635371 | 00:03 |
rm_work | but that's just me ;P | 00:03 |
colin- | not ruling that out yet but it isn't an attractive option | 00:04 |
rm_work | i think... well, i definitely ran them in different quantities, so | 00:04 |
rm_work | 4x API, 2x CW, 1x HK, 6x HM | 00:04 |
rm_work | octavia services work really well in k8s BTW :P | 00:05 |
johnsom | I mean, the work it is doing is very DB bound. I would look at the logs and figure out how long each heartbeat takes to process, and how many per second you are seeing. That should give you an idea of how to tune the number of workers. | 00:05 |
johnsom | We run them in lxc containers | 00:05 |
colin- | have a sincere interest in that rm_work but will probably have to circle back | 00:05 |
colin- | k8s is a parallel effort consuming octavia at the moment | 00:05 |
colin- | don't want to get too chicken and eggy | 00:06 |
colin- | but i'm interested in running them there too for the ease of operation it offers | 00:06 |
colin- | understood johnsom that's a good suggestion | 00:06 |
rm_work | napkin math might be something like: Projected number of LBs * 2 (for active-standby amp count) / health interval = number of amp messages per second | 00:06 |
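A rough check of rm_work's napkin formula, using the 650-amp figure mentioned above (a sketch only; the 10-second interval is the default discussed later in the conversation, and 650 amps is taken to mean ~325 active-standby load balancers):

```python
# Heartbeat messages per second, per rm_work's napkin math above.
# 650 amphorae ~= 325 active-standby load balancers; interval assumed to be
# the 10-second default discussed later in this log.
lbs = 325
amps_per_lb = 2            # active-standby
heartbeat_interval = 10    # seconds

messages_per_second = lbs * amps_per_lb / heartbeat_interval
print(messages_per_second)  # 65.0 heartbeats/sec for a 650-amp cloud
```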
colin- | johnsom: you run the HMs in lxc containers? | 00:07 |
rm_work | but i think you got that far | 00:07 |
johnsom | yes | 00:07 |
colin- | neat | 00:07 |
johnsom | All of the control plane is in lxc containers | 00:07 |
colin- | did the project ever consider using the existing messaging infrastructure to pass amp health back to HM? | 00:08 |
colin- | amqp | 00:08 |
colin- | or would participating in that have used up too much res on the amp | 00:08 |
rm_work | the question is, what is "X" in the equation: number of amp messages per second / X (max messages per second that a single HM can handle) = number of HMs | 00:08 |
rm_work | it would take down any AMQP i can think of :P | 00:09 |
rm_work | lol | 00:09 |
rm_work | RMQ is a horrible point of failure already | 00:09 |
colin- | yeah that captures what i'm trying to figure out pretty well i think rm_work | 00:09 |
colin- | oh ok | 00:09 |
johnsom | Ah yes, some of us have experience with attempting to use rabbit for the health heartbeats. It ended very very poorly for that team. | 00:09 |
rm_work | using RMQ ends very poorly for most teams for any project | 00:09 |
rm_work | lol | 00:09 |
colin- | do you guys feel like HM resource utilization constitutes a meaningful ceiling for your octavia implementations in your clouds? | 00:10 |
rm_work | unless your messaging rate is very low | 00:10 |
rm_work | actually yes | 00:10 |
rm_work | but that's why it's scalable :P | 00:10 |
rm_work | johnsom's figures showed handling several thousand amps per HM | 00:11 |
johnsom | Right, it scales horizontally for the HMs. The DB is probably the limiting factor of how many amps a single deployment can handle. We had some thoughts on that as well, but we haven't worked on that yet. | 00:11 |
rm_work | but i am not sure if that was "per second" or "amps total, given some interval" | 00:11 |
colin- | ok, if it's helpful as feedback the difference between several thousand and 650 would be extremely meaningful for me | 00:12 |
colin- | (fwiw) | 00:12 |
rm_work | yes, health was specifically split out into its own simple table with no FK relations, to be put in something like redis possibly in the long term | 00:12 |
johnsom | Yeah, it was amps on the standard 10 second interval | 00:12 |
rm_work | but we never got to it | 00:12 |
colin- | will work on what we've discussed to try and improve that as well as arrive at a better ratio | 00:12 |
rm_work | ok so | 00:12 |
colin- | i've gone to 10s in config now so new amps will use that | 00:12 |
rm_work | 100-200 mps? | 00:12 |
rm_work | i think the default *is* 10s? | 00:13 |
colin- | 3s | 00:13 |
colin- | oh wait | 00:13 |
colin- | that's the wrong value, sorry | 00:13 |
rm_work | so if you assume you can handle 200 messages per second on a single HM, you can work that math backwards | 00:13 |
colin- | heartbeat_interval (Type: integer, Default: 10) | 00:13 |
colin- | you're right | 00:13 |
rm_work | 200 (mps max) * 10 (interval) / 2 (amps per LB) = 1000 LBs per HM | 00:14 |
rm_work | I think i did that right | 00:15 |
rm_work | assuming active-standby | 00:15 |
colin- | (true for us) | 00:15 |
rm_work | so if you want to support 100,000 LBs, you need 100 HMs? which seems... a little ridiculous maybe, yes | 00:16 |
rm_work | i'd hope we could get that "max mps" number a bit higher | 00:16 |
rm_work | and yeah, part of it would probably be switching to a faster backend | 00:16 |
rm_work | though i think we have a problem there because of the amount of non-health data we write/pull in order to actually do that operation | 00:17 |
rm_work | we basically pull the whole LB and write to a bunch of different tables to capture member statuses | 00:17 |
johnsom | You are missing the # workers in that formula | 00:17 |
rm_work | err? | 00:17 |
johnsom | Well, each HM can have a configurable # of worker processes. It defaults to as many cores as you have, but theirs is capped at 8 per HM | 00:19 |
rm_work | err yeah true | 00:19 |
rm_work | was yours "1" in your tests? | 00:19 |
johnsom | 4 | 00:19 |
rm_work | ok so we're assuming linear scaling there? | 00:19 |
rm_work | so maybe 400mps for 8 workers? | 00:20 |
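Putting the whole napkin calculation together (a sketch under the assumptions stated above: roughly 200 messages/sec for an HM running 4 workers, and linear scaling with worker count; neither number is a measured limit):

```python
# Napkin math for sizing health managers, using the figures from the
# discussion above: ~200 msg/s per 4-worker HM, assumed to scale linearly.
def hms_needed(total_lbs, amps_per_lb=2, interval=10,
               mps_per_worker=50, workers_per_hm=8):
    # Heartbeats arriving per second across the whole cloud.
    total_mps = total_lbs * amps_per_lb / interval
    # Capacity of one HM, assuming linear scaling with worker count.
    hm_capacity_mps = mps_per_worker * workers_per_hm
    return total_mps / hm_capacity_mps

# 8 workers -> ~400 msg/s -> ~2000 active-standby LBs per HM,
# i.e. ~50 HMs for 100,000 LBs instead of the ~100 estimated for 4 workers.
print(hms_needed(100_000))                     # 50.0
print(hms_needed(100_000, workers_per_hm=4))   # 100.0
```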
*** fnaval has quit IRC | 00:25 | |
colin- | suggesting twice as many LBs? | 00:28 |
colin- | hm | 00:28 |
rm_work | yes, and 8 was below the limit you REALLY could run, right? | 00:28 |
rm_work | assuming these have their own box | 00:28 |
colin- | in this example the hosts are also operating other openstack control plane services so not their own box | 00:29 |
rm_work | ah | 00:29 |
colin- | i chose 8 for health_update_threads and status_update_threads somewhat arbitrarily; it can be increased, but when i noticed much higher than expected utilization, capping it there was a quick measure to ensure it didn't run away in case it was a near-term issue | 00:32 |
colin- | so i'm not super committed to that value outside of wanting to have some control over how this behaves | 00:33 |
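For reference, a sketch of the octavia.conf values being described here; the option names are the ones quoted in the conversation, the values mirror colin-'s setup, and the [health_manager] section placement is an assumption:

```ini
# Illustrative excerpt only; values taken from the discussion above,
# section name assumed to be [health_manager].
[health_manager]
heartbeat_interval = 10       # seconds between amphora heartbeats (default 10)
health_update_threads = 8     # capped at 8 as a precaution, per colin-
status_update_threads = 8
```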
*** yamamoto has quit IRC | 01:41 | |
*** yamamoto has joined #openstack-lbaas | 02:05 | |
*** psachin has joined #openstack-lbaas | 02:57 | |
*** rcernin has joined #openstack-lbaas | 03:07 | |
*** Dinesh_Bhor has joined #openstack-lbaas | 03:08 | |
*** Dinesh_Bhor has quit IRC | 03:30 | |
*** Dinesh_Bhor has joined #openstack-lbaas | 03:39 | |
*** ramishra has joined #openstack-lbaas | 04:14 | |
*** rcernin has quit IRC | 04:35 | |
*** ramishra has quit IRC | 04:45 | |
*** Dinesh_Bhor has quit IRC | 04:48 | |
*** Dinesh_Bhor has joined #openstack-lbaas | 05:00 | |
*** ramishra has joined #openstack-lbaas | 05:14 | |
*** gcheresh_ has joined #openstack-lbaas | 05:22 | |
*** ramishra has quit IRC | 05:29 | |
*** ramishra has joined #openstack-lbaas | 05:30 | |
*** gcheresh_ has quit IRC | 05:30 | |
*** gcheresh_ has joined #openstack-lbaas | 06:26 | |
openstackgerrit | Erik Olof Gunnar Andersson proposed openstack/octavia master: Cleaning up logging https://review.openstack.org/635439 | 06:41 |
openstackgerrit | Erik Olof Gunnar Andersson proposed openstack/octavia master: Cleaning up logging https://review.openstack.org/635439 | 06:45 |
eandersson | johnsom, rm_work ^ couldn't help myself after I found a bug in the logging | 06:47 |
johnsom | Please feel free to help us any time | 06:57 |
*** pcaruana has joined #openstack-lbaas | 07:02 | |
*** yamamoto has quit IRC | 07:15 | |
*** yamamoto has joined #openstack-lbaas | 07:16 | |
rm_work | Indeed! | 07:18 |
eandersson | johnsom, is there a reason DriverManager is called every time a call is made? seems like a waste | 07:24 |
eandersson | e.g. https://github.com/openstack/octavia/blob/master/octavia/controller/healthmanager/health_drivers/update_db.py#L191 | 07:24 |
rm_work | I think you may be correct, probably could be moved into the class | 07:26 |
rm_work | And just loaded once | 07:26 |
rm_work | Try it? :D | 07:26 |
rm_work | Though that's only called in the case of a zombie, which should be rare, so I doubt it matters too much | 07:28 |
eandersson | Yea - was looking into it, was more worried about the heartbeat_udp ones, but looks like they are just called once | 07:29 |
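A minimal sketch of what "moving it into the class" could look like: cache the stevedore-loaded driver when the handler is created instead of building a DriverManager on every call. The class, namespace, and method names below are placeholders for illustration, not Octavia's actual code:

```python
# Hypothetical before/after for loading a stevedore driver once per handler
# instead of once per call. Namespace and driver name are placeholders.
from stevedore import driver as stevedore_driver


class UpdateHandler(object):
    def __init__(self):
        # Loaded once when the handler is created, then reused for every call.
        self._driver = stevedore_driver.DriverManager(
            namespace='example.drivers',
            name='example_driver',
            invoke_on_load=True,
        ).driver

    def handle(self, message):
        # Previously the DriverManager lookup would happen here, per message.
        return self._driver.process(message)
```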
*** AlexStaf has joined #openstack-lbaas | 07:35 | |
*** yboaron has quit IRC | 07:56 | |
openstackgerrit | Erik Olof Gunnar Andersson proposed openstack/octavia master: Ensure drivers only need to be loaded once https://review.openstack.org/635447 | 08:01 |
eandersson | rm_work, ^ :p | 08:02 |
*** Emine has joined #openstack-lbaas | 08:02 | |
*** rpittau has joined #openstack-lbaas | 08:06 | |
*** ccamposr has joined #openstack-lbaas | 08:22 | |
*** celebdor has joined #openstack-lbaas | 08:47 | |
*** yboaron has joined #openstack-lbaas | 09:00 | |
*** sapd1 has joined #openstack-lbaas | 09:21 | |
*** yamamoto has quit IRC | 09:38 | |
*** sapd1 has quit IRC | 09:40 | |
*** yamamoto has joined #openstack-lbaas | 09:47 | |
openstackgerrit | Carlos Goncalves proposed openstack/octavia-tempest-plugin master: DNM: Add octavia-v2-dsvm-scenario-fedora-latest job https://review.openstack.org/600381 | 09:49 |
*** oanson has joined #openstack-lbaas | 09:51 | |
*** salmankhan has joined #openstack-lbaas | 10:28 | |
*** salmankhan has quit IRC | 10:33 | |
*** salmankhan has joined #openstack-lbaas | 10:33 | |
*** salmankhan1 has joined #openstack-lbaas | 10:38 | |
*** salmankhan has quit IRC | 10:38 | |
*** salmankhan1 is now known as salmankhan | 10:38 | |
*** Dinesh_Bhor has quit IRC | 10:41 | |
*** yamamoto has quit IRC | 11:37 | |
*** yamamoto has joined #openstack-lbaas | 11:40 | |
*** yamamoto has quit IRC | 11:45 | |
*** yamamoto has joined #openstack-lbaas | 11:46 | |
*** ccamposr has quit IRC | 11:49 | |
*** yamamoto has quit IRC | 11:50 | |
*** yamamoto has joined #openstack-lbaas | 12:27 | |
*** ramishra has quit IRC | 13:29 | |
*** yboaron_ has joined #openstack-lbaas | 13:32 | |
*** ramishra has joined #openstack-lbaas | 13:33 | |
*** yboaron has quit IRC | 13:35 | |
*** KeithMnemonic has joined #openstack-lbaas | 13:48 | |
*** ramishra has quit IRC | 13:48 | |
*** yboaron_ has quit IRC | 14:08 | |
*** yboaron_ has joined #openstack-lbaas | 14:13 | |
*** psachin has quit IRC | 14:26 | |
*** irclogbot_1 has joined #openstack-lbaas | 14:31 | |
openstackgerrit | Nir Magnezi proposed openstack/octavia master: Encrypt certs and keys https://review.openstack.org/627064 | 14:50 |
openstackgerrit | Vadim Ponomarev proposed openstack/octavia master: Fix check redirect pool for creating a fully populated load balancer. https://review.openstack.org/635167 | 14:51 |
openstackgerrit | Vadim Ponomarev proposed openstack/octavia master: Fix check redirect pool for creating a fully populated load balancer. https://review.openstack.org/635167 | 15:18 |
*** AlexStaf has quit IRC | 15:19 | |
*** gcheresh_ has quit IRC | 15:23 | |
*** yboaron_ has quit IRC | 15:35 | |
*** yboaron_ has joined #openstack-lbaas | 15:36 | |
*** yboaron_ has quit IRC | 16:04 | |
*** yboaron_ has joined #openstack-lbaas | 16:05 | |
*** yboaron_ has quit IRC | 16:05 | |
*** pcaruana has quit IRC | 16:21 | |
johnsom | Yeah, that call should be very rare, only firing when nova has failures deleting. | 16:30 |
openstackgerrit | boden proposed openstack/neutron-lbaas master: stop using common db mixin methods https://review.openstack.org/635570 | 16:53 |
*** rpittau has quit IRC | 17:02 | |
*** salmankhan has quit IRC | 17:36 | |
*** trown is now known as trown|lunch | 17:38 | |
openstackgerrit | Michael Johnson proposed openstack/octavia master: Updates Octavia to support octavia-lib https://review.openstack.org/613709 | 17:45 |
openstackgerrit | Michael Johnson proposed openstack/octavia master: Updates Octavia to support octavia-lib https://review.openstack.org/613709 | 17:50 |
*** aojea has joined #openstack-lbaas | 18:03 | |
openstackgerrit | Merged openstack/neutron-lbaas master: Improve performance on get and create/update/delete requests https://review.openstack.org/635076 | 18:45 |
*** trown|lunch is now known as trown | 19:15 | |
openstackgerrit | Merged openstack/octavia master: Fix flavors support when using spares pool https://review.openstack.org/632594 | 19:45 |
*** aojea has quit IRC | 20:26 | |
*** blake has joined #openstack-lbaas | 20:35 | |
*** celebdor has quit IRC | 20:37 | |
eandersson | johnsom, I found a few more that I tried to fix, but I think tempest is changing the driver at runtime | 21:30 |
johnsom | Yeah, some of the driver loaders are intended to pick up changes. Especially when we bring those into the flavors | 21:31 |
johnsom | Like the provider driver loader, that should pick up the new code if a provider driver update is deployed. | 21:32 |
rm_work | yeah, only really the amp-health stuff needs to be "optimized" heavily | 21:35 |
rm_work | everywhere else I would err on the side of leaving it to reload as much as it wants | 21:35 |
johnsom | I think the only thing that loads there is the zombie stuff, which, should fire very rarely | 21:36 |
openstackgerrit | Erik Olof Gunnar Andersson proposed openstack/octavia master: [WIP] Ensure drivers only need to be loaded once https://review.openstack.org/635447 | 21:38 |
eandersson | Are you saying that any of these would be configured outside of a config file? | 21:41 |
eandersson | Outside of testing? | 21:41 |
eandersson | Because I can't imagine a world where a driver wouldn't be coming from a config file. | 21:42 |
johnsom | Well, for example, provider drivers can be loaded by installing a python module. | 21:42 |
eandersson | Sure - but I would assume that they would be configured in a config file still | 21:43 |
eandersson | and not loaded at runtime after installing said python file? | 21:43 |
eandersson | *module | 21:43 |
johnsom | We are also migrating a number of configuration settings to be configurable via flavors instead of the config file. The config file will be default, but flavors may override that setting. | 21:43 |
eandersson | Interesting | 21:43 |
johnsom | You can enable/disable provider drivers via the config file, but if it's enabled, simply installing a new version would start using it. | 21:44 |
eandersson | Seems risky | 21:44 |
johnsom | Why would we need to restart the whole API process just to upgrade a provider driver? | 21:44 |
eandersson | to upgrade a provider driver you most certainly need to restart your process | 21:45 |
johnsom | Well, note, flavors can only be created by an operator. They are not intended to be created by users | 21:45 |
eandersson | since python will cache the python code | 21:45 |
eandersson | installing a new one I might understand | 21:45 |
eandersson | but even that is pretty crazy | 21:45 |
johnsom | By calling stevedore, it will pick up new code that is deployed | 21:45 |
rm_work | yes | 21:49 |
rm_work | and also for barbican, we need new clients for different auth | 21:49 |
rm_work | IIRC | 21:49 |
rm_work | so we make new barbican clients a lot | 21:49 |
rm_work | again, IIRC | 21:49 |
eandersson | I tested stevedore and it does not reload new changes | 21:54 |
eandersson | I mean of course if you load X, and then change it to Y it will work | 21:54 |
*** blake has quit IRC | 21:55 | |
eandersson | but if you load X and make changes to X it won't take effect until I restart the app | 21:55 |
eandersson | but anyway I abandoned that change | 21:56 |
johnsom | Did you change module version? | 21:59 |
johnsom | I didn't have a problem with what you had, but I also don't think we want to make all of them cached inside our code. | 22:00 |
eandersson | Yep - tested installing a new version | 22:07 |
eandersson | The only way to get around it in python is to call reload on the file | 22:07 |
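A small, self-contained illustration of eandersson's point (fake_driver is a throwaway module created only for this demo): a module already imported by a running process stays cached, so new code on disk is not picked up until an explicit reload.

```python
# Python caches imported modules, so a running process keeps executing the
# old code until importlib.reload is called on the module object.
import importlib
import pathlib
import sys
import tempfile

sys.dont_write_bytecode = True        # keep the demo free of stale .pyc files
tmp = pathlib.Path(tempfile.mkdtemp())
(tmp / "fake_driver.py").write_text("VERSION = 1\n")
sys.path.insert(0, str(tmp))

import fake_driver
print(fake_driver.VERSION)            # 1

# "Upgrade" the module on disk while the process keeps running.
(tmp / "fake_driver.py").write_text("VERSION = 2\n")

import fake_driver                    # no-op: sys.modules already has the module
print(fake_driver.VERSION)            # still 1

importlib.reload(fake_driver)         # re-executes the module from current source
print(fake_driver.VERSION)            # 2
```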
openstackgerrit | Michael Johnson proposed openstack/octavia master: Updates Octavia to support octavia-lib https://review.openstack.org/613709 | 22:15 |
openstackgerrit | Carlos Goncalves proposed openstack/octavia master: WIP: Update pylint version https://review.openstack.org/635236 | 22:25 |
eandersson | If you guys have a blueprint, or make one on this, I wouldn't mind doing some testing and/or providing some feedback on those. | 22:34 |
eandersson | I am not trying to be a negative nancy :p | 22:34 |
johnsom | eandersson For which? provider drivers? flavors? | 22:35 |
eandersson | both | 22:35 |
johnsom | Provider drivers: https://docs.openstack.org/octavia/latest/contributor/specs/version1.1/enable-provider-driver.html | 22:35 |
johnsom | Flavors: https://docs.openstack.org/octavia/latest/contributor/specs/version1.0/flavors.html | 22:35 |
eandersson | Thanks johnsom | 22:35 |
johnsom | Those are both fairly old now, Pike and queens merged. | 22:36 |
johnsom | Providers landed in Rocky, but we are refining in Stein. Flavors has landed in Stein | 22:36 |
johnsom | Both have admin guide docs pages as well | 22:37 |
eandersson | Yea - great work on that btw | 22:37 |
eandersson | I am excited about it.. lets just hope our vendors are equally excited haha | 22:38 |
johnsom | I know two vendors already have code working internally | 22:38 |
eandersson | but then again haproxy is great | 22:38 |
johnsom | cgoncalves That looks like it was fun.... | 22:43 |
johnsom | I need to look at what those rules are you disabled | 22:43 |
rm_work | this is my favorite test, because it's just so *me*: https://github.com/openstack/octavia/blob/master/octavia/tests/functional/db/test_repositories.py#L446 | 22:57 |
johnsom | lol, yeah I saw that again recently. Good stuff | 22:57 |
rm_work | does the passive-aggressiveness come through? :P | 22:58 |
johnsom | I need to make another "why are we skipping tests" pass to see if there are more to clean up | 22:58 |
*** Emine has quit IRC | 22:59 | |
johnsom | We have 7 functionals in skip mode... | 23:00 |
*** takamatsu has quit IRC | 23:19 | |
*** celebdor has joined #openstack-lbaas | 23:27 |