Thursday, 2019-02-07

[00:00] <colin-> perhaps it'd be useful for me to phrase another question that is bouncing around in my head here
[00:00] <colin-> is there a meaningful upper limit on how many amps two HMs can health-check effectively before octavia is the only thing that can live on the box?
[00:00] <johnsom> So, the question is what is the right mix of HMs vs. workers for a 650-amp cloud
[00:00] <rm_work> also, i ran my stuff in kubernetes for the API/workers
[00:00] <colin-> yeah, just want to understand the ratio better
[00:00] <rm_work> and the HMs on their own VMs
[00:01] <rm_work> you're sharing one box for all services?
[00:01] <colin-> no, there are two discrete hosts in this example
[00:01] <rm_work> right, but each of those two is running o-api, o-cw, o-hm
[00:02] <rm_work> and o-hk
[00:02] <rm_work> ?
[00:02] <colin-> yes
[00:02] <rm_work> k
[00:02] <rm_work> so, yeah, i found that o-api, o-hk, and o-cw were all very lightweight
[00:02] <rm_work> so that's probably fine for those
[00:03] <rm_work> i would probably put the HMs on a series of different boxes
[00:03] <openstackgerrit> Ian Wienand proposed openstack/octavia master: [dnm] testing pip-and-virtualenv centos fix  https://review.openstack.org/635371
[00:03] <rm_work> but that's just me ;P
[00:04] <colin-> not ruling that out yet, but it isn't an attractive option
[00:04] <rm_work> i think... well, i definitely ran them in different quantities, so
[00:04] <rm_work> 4x API, 2x CW, 1x HK, 6x HM
[00:05] <rm_work> octavia services work really well in k8s BTW :P
[00:05] <johnsom> I mean, the work it is doing is very DB-bound. I would look at the logs and figure out how long each heartbeat takes to process, and how many per second you are seeing. That should give you an idea of how to tune the number of workers.
[00:05] <johnsom> We run them in lxc containers
[00:05] <colin-> have a sincere interest in that, rm_work, but will probably have to circle back
[00:05] <colin-> k8s is a parallel effort consuming octavia at the moment
[00:06] <colin-> don't want to get too chicken-and-eggy
[00:06] <colin-> but i'm interested in running them there too for the ease of operation it offers
[00:06] <colin-> understood, johnsom, that's a good suggestion
[00:06] <rm_work> napkin math might be something like: projected number of LBs * 2 (for active-standby amp count) / health interval = number of amp messages per second
[00:07] <colin-> johnsom: you run the HMs in lxc containers?
[00:07] <rm_work> but i think you got that far
[00:07] <johnsom> yes
[00:07] <colin-> neat
[00:07] <johnsom> All of the control plane is in lxc containers
[00:08] <colin-> did the project ever consider using the existing messaging infrastructure to pass amp health back to the HM?
[00:08] <colin-> amqp
[00:08] <colin-> or would participating in that have used up too many resources on the amp
[00:08] <rm_work> the question is, what is "X" in the equation: number of amp messages per second / X (the max messages per second that a single HM can handle) = number of HMs
[00:09] <rm_work> it would take down any AMQP i can think of :P
[00:09] <rm_work> lol
[00:09] <rm_work> RMQ is a horrible point of failure already
[00:09] <colin-> yeah, that captures what i'm trying to figure out pretty well i think, rm_work
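
A rough sketch of that napkin math (illustrative only: the 2-amps-per-LB figure assumes active-standby topology, the 10-second interval is the default heartbeat_interval, and max_mps_per_hm stands in for the unknown "X" being discussed):

    import math

    def amp_messages_per_second(num_lbs, amps_per_lb=2, heartbeat_interval=10):
        # Each amphora emits one heartbeat per interval.
        return num_lbs * amps_per_lb / heartbeat_interval

    def health_managers_needed(num_lbs, max_mps_per_hm, amps_per_lb=2,
                               heartbeat_interval=10):
        # Divide the total heartbeat rate by "X", the rate one HM can process.
        mps = amp_messages_per_second(num_lbs, amps_per_lb, heartbeat_interval)
        return math.ceil(mps / max_mps_per_hm)

    # e.g. 650 amps == 325 active-standby LBs -> 65 heartbeats/sec in total
    print(amp_messages_per_second(325))                      # 65.0
    print(health_managers_needed(325, max_mps_per_hm=200))   # 1
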
[00:09] <colin-> oh ok
[00:09] <johnsom> Ah yes, some of us have experience with attempting to use rabbit for the health heartbeats. It ended very, very poorly for that team.
[00:09] <rm_work> using RMQ ends very poorly for most teams for any project
[00:09] <rm_work> lol
[00:10] <colin-> do you guys feel like HM resource utilization constitutes a meaningful ceiling for your octavia implementations in your clouds?
[00:10] <rm_work> unless your messaging rate is very low
[00:10] <rm_work> actually yes
[00:10] <rm_work> but that's why it's scalable :P
[00:11] <rm_work> johnsom's figures showed handling several thousand amps per HM
[00:11] <johnsom> Right, it scales horizontally for the HMs. The DB is probably the limiting factor for how many amps a single deployment can handle. We had some thoughts on that as well, but we haven't worked on it yet.
[00:11] <rm_work> but i am not sure if that was "per second" or "amps total, given some interval"
[00:12] <colin-> ok, if it's helpful as feedback, the difference between several thousand and 650 would be extremely meaningful for me
[00:12] <colin-> (fwiw)
[00:12] <rm_work> yes, health was specifically split out into its own simple table with no FK relations, to be put in something like redis, possibly, in the long term
[00:12] <johnsom> Yeah, it was amps total, on the standard 10-second interval
[00:12] <rm_work> but we never got to it
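
For context, the standalone health table rm_work describes is intentionally tiny, which is what would make a later swap to something like redis feasible. A sketch of that shape in SQLAlchemy (column names are illustrative placeholders, not the authoritative Octavia schema):

    import sqlalchemy as sa

    # A minimal heartbeat table with no FK relations: one row per amphora,
    # touched on every heartbeat the health manager processes.
    metadata = sa.MetaData()
    amphora_health = sa.Table(
        'amphora_health', metadata,
        sa.Column('amphora_id', sa.String(36), primary_key=True),
        sa.Column('last_update', sa.DateTime, nullable=False),
        sa.Column('busy', sa.Boolean, nullable=False, default=False),
    )
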
[00:12] <colin-> will work on what we've discussed to try and improve that, as well as arrive at a better ratio
[00:12] <rm_work> ok so
[00:12] <colin-> i've gone to 10s in config now, so new amps will use that
[00:12] <rm_work> 100-200 mps?
[00:13] <rm_work> i think the default *is* 10s?
[00:13] <colin-> 3s
[00:13] <colin-> oh wait
[00:13] <colin-> that's the wrong value, sorry
[00:13] <rm_work> so if you assume you can handle 200 messages per second on a single HM, you can work that math backwards
[00:13] <colin-> heartbeat_interval — Type: integer, Default: 10
[00:13] <colin-> you're right
[00:14] <rm_work> 200 (max mps) * 10 (interval) / 2 (amps per LB) = 1000 LBs per HM
[00:15] <rm_work> I think i did that right
[00:15] <rm_work> assuming active-standby
[00:15] <colin-> (true for us)
[00:16] <rm_work> so if you want to support 100,000 LBs, you need 100 HMs? which seems... a little ridiculous, maybe, yes
[00:16] <rm_work> i'd hope we could get that "max mps" number a bit higher
[00:16] <rm_work> and yeah, part of it would probably be switching to a faster backend
[00:17] <rm_work> though i think we have a problem there because of the amount of non-health data we write/pull in order to actually do that operation
[00:17] <rm_work> we basically pull the whole LB and write to a bunch of different tables to capture member statuses
[00:17] <johnsom> You are missing the # of workers in that formula
[00:17] <rm_work> err?
[00:19] <johnsom> Well, each HM can have a configurable # of worker processes. It defaults to as many cores as you have, but theirs is capped at 8 per HM
[00:19] <rm_work> err, yeah, true
[00:19] <rm_work> was yours "1" in your tests?
[00:19] <johnsom> 4
[00:19] <rm_work> ok, so we're assuming linear scaling there?
[00:20] <rm_work> so maybe 400 mps for 8 workers?
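
Folding the worker count into the same napkin math, under the assumption of roughly linear scaling per worker (the ~200 messages/sec at 4 workers is the rough figure implied above, not a benchmark):

    def lbs_per_hm(mps_per_worker, workers, heartbeat_interval=10, amps_per_lb=2):
        # Max sustainable heartbeats/sec times the interval gives amps per HM;
        # divide by amps per LB (2 for active-standby) to get LBs per HM.
        return mps_per_worker * workers * heartbeat_interval // amps_per_lb

    mps_per_worker = 200 // 4   # ~200 mps was discussed in the context of 4 workers
    print(lbs_per_hm(mps_per_worker, workers=4))   # 1000 LBs per HM
    print(lbs_per_hm(mps_per_worker, workers=8))   # 2000 LBs per HM
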
[00:28] <colin-> suggesting twice as many LBs?
[00:28] <colin-> hm
[00:28] <rm_work> yes, and 8 was below the limit you REALLY could run, right?
[00:28] <rm_work> assuming these have their own box
[00:29] <colin-> in this example the hosts are also operating other openstack control plane services, so not their own box
[00:29] <rm_work> ah
[00:32] <colin-> i chose 8 for health_update_threads and status_update_threads somewhat arbitrarily; it can be increased, but when i noticed much-higher-than-expected utilization, capping it there was a quick measure to make sure it didn't run away in case it was a near-term issue
[00:33] <colin-> so i'm not super committed to that value, outside of wanting to have some control over how this behaves
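
For reference, the knobs being discussed live in the [health_manager] section of octavia.conf; the values below simply mirror the ones mentioned in this conversation, not recommended settings:

    [health_manager]
    # Seconds between heartbeats sent by each amphora (the default is 10).
    heartbeat_interval = 10
    # Worker processes for heartbeat handling and status updates; both default
    # to the number of host CPUs, capped here at 8 as described above.
    health_update_threads = 8
    status_update_threads = 8
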
[06:41] <openstackgerrit> Erik Olof Gunnar Andersson proposed openstack/octavia master: Cleaning up logging  https://review.openstack.org/635439
[06:45] <openstackgerrit> Erik Olof Gunnar Andersson proposed openstack/octavia master: Cleaning up logging  https://review.openstack.org/635439
[06:47] <eandersson> johnsom, rm_work ^ couldn't help myself after I found a bug in the logging
[06:57] <johnsom> Please feel free to help us any time
[07:18] <rm_work> Indeed!
[07:24] <eandersson> johnsom, is there a reason DriverManager is called every time a call is made? seems like a waste
[07:24] <eandersson> e.g. https://github.com/openstack/octavia/blob/master/octavia/controller/healthmanager/health_drivers/update_db.py#L191
[07:26] <rm_work> I think you may be correct, it probably could be moved into the class
[07:26] <rm_work> And just loaded once
[07:26] <rm_work> Try it? :D
[07:28] <rm_work> Though that's only called in the case of a zombie, which should be rare, so I doubt it matters too much
[07:29] <eandersson> Yea - was looking into it, was more worried about the heartbeat_udp ones, but it looks like they are just called once
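
The change being sketched here, roughly: build the stevedore DriverManager once (e.g. in __init__) and reuse it, instead of re-resolving the entry point on every call. A simplified illustration, not the actual Octavia patch; the namespace and driver name should be treated as placeholders:

    from stevedore import driver as stevedore_driver

    class UpdateHealthDb(object):
        def __init__(self):
            # Resolve the amphora driver entry point a single time at
            # construction, rather than on every heartbeat.
            self.amphora_driver = stevedore_driver.DriverManager(
                namespace='octavia.amphora.drivers',
                name='amphora_haproxy_rest_driver',
                invoke_on_load=True,
            ).driver

        def _handle_zombie_amphora(self, amphora_id):
            # Reuse the cached driver instead of instantiating a new
            # DriverManager each time this (rare) path fires.
            driver = self.amphora_driver
            ...
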
[08:01] <openstackgerrit> Erik Olof Gunnar Andersson proposed openstack/octavia master: Ensure drivers only need to be loaded once  https://review.openstack.org/635447
[08:02] <eandersson> rm_work, ^ :p
[09:49] <openstackgerrit> Carlos Goncalves proposed openstack/octavia-tempest-plugin master: DNM: Add octavia-v2-dsvm-scenario-fedora-latest job  https://review.openstack.org/600381
[14:50] <openstackgerrit> Nir Magnezi proposed openstack/octavia master: Encrypt certs and keys  https://review.openstack.org/627064
[14:51] <openstackgerrit> Vadim Ponomarev proposed openstack/octavia master: Fix check redirect pool for creating a fully populated load balancer.  https://review.openstack.org/635167
[15:18] <openstackgerrit> Vadim Ponomarev proposed openstack/octavia master: Fix check redirect pool for creating a fully populated load balancer.  https://review.openstack.org/635167
[16:30] <johnsom> Yeah, that call should be very rare, only firing when nova has failures deleting.
[16:53] <openstackgerrit> boden proposed openstack/neutron-lbaas master: stop using common db mixin methods  https://review.openstack.org/635570
[17:45] <openstackgerrit> Michael Johnson proposed openstack/octavia master: Updates Octavia to support octavia-lib  https://review.openstack.org/613709
[17:50] <openstackgerrit> Michael Johnson proposed openstack/octavia master: Updates Octavia to support octavia-lib  https://review.openstack.org/613709
[18:45] <openstackgerrit> Merged openstack/neutron-lbaas master: Improve performance on get and create/update/delete requests  https://review.openstack.org/635076
*** trown|lunch is now known as trown19:15
[19:45] <openstackgerrit> Merged openstack/octavia master: Fix flavors support when using spares pool  https://review.openstack.org/632594
[21:30] <eandersson> johnsom, I found a few more that I tried to fix, but I think tempest is changing the driver during runtime
[21:31] <johnsom> Yeah, some of the driver loaders are intended to pick up changes, especially when we bring those into the flavors
[21:32] <johnsom> Like the provider driver loader; that should pick up the new code if a provider driver update is deployed.
[21:35] <rm_work> yeah, only really the amp-health stuff needs to be "optimized" heavily
[21:35] <rm_work> everywhere else I would err on the side of leaving it to reload as much as it wants
[21:36] <johnsom> I think the only thing that loads there is the zombie stuff, which should fire very rarely
[21:38] <openstackgerrit> Erik Olof Gunnar Andersson proposed openstack/octavia master: [WIP] Ensure drivers only need to be loaded once  https://review.openstack.org/635447
[21:41] <eandersson> Are you saying that any of these would be configured outside of a config file?
[21:41] <eandersson> Outside of testing?
[21:42] <eandersson> Because I can't imagine a world where a driver wouldn't be coming from a config file.
[21:42] <johnsom> Well, for example, provider drivers can be loaded by installing a python module.
[21:43] <eandersson> Sure - but I would assume that they would still be configured in a config file
[21:43] <eandersson> and not loaded at runtime after installing said python module?
[21:43] <johnsom> We are also migrating a number of configuration settings to be configurable via flavors instead of the config file. The config file will be the default, but flavors may override that setting.
[21:43] <eandersson> Interesting
[21:44] <johnsom> You can enable/disable provider drivers via the config file, but if it's enabled, simply installing a new version would start using it.
[21:44] <eandersson> Seems risky
[21:44] <johnsom> Why would we need to restart the whole API process just to upgrade a provider driver?
[21:45] <eandersson> to upgrade a provider driver you most certainly need to restart your process
[21:45] <johnsom> Well, note, flavors can only be created by an operator. They are not intended to be created by users
[21:45] <eandersson> since python will cache the python code
[21:45] <eandersson> installing a new one I might understand
[21:45] <eandersson> but even that is pretty crazy
[21:45] <johnsom> By calling stevedore, it will pick up new code that is deployed
[21:49] <rm_work> yes
[21:49] <rm_work> and also for barbican, we need new clients for different auth
[21:49] <rm_work> IIRC
[21:49] <rm_work> so we make new barbican clients a lot
[21:49] <rm_work> again, IIRC
[21:54] <eandersson> I tested stevedore and it does not reload new changes
[21:54] <eandersson> I mean, of course, if you load X and then change it to Y it will work
[21:55] <eandersson> but if you load X and make changes to X, they won't take effect until I restart the app
[21:56] <eandersson> but anyway, I abandoned that change
[21:59] <johnsom> Did you change the module version?
[22:00] <johnsom> I didn't have a problem with what you had, but I also don't think we want to make all of them cached inside our code.
[22:07] <eandersson> Yep - tested installing a new version
[22:07] <eandersson> The only way to get around it in python is to call reload on the module
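
A minimal illustration of what eandersson measured: once a module is imported, a second import (or a stevedore entry-point lookup that resolves to the same module) returns the cached object from sys.modules, so code changes on disk are only picked up after an explicit reload:

    import importlib
    import json  # stand-in for an already-imported driver module

    # Re-importing returns the same cached module object; nothing is re-read
    # from disk.
    assert importlib.import_module('json') is json

    # Only an explicit reload re-executes the module's source.
    importlib.reload(json)
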
[22:15] <openstackgerrit> Michael Johnson proposed openstack/octavia master: Updates Octavia to support octavia-lib  https://review.openstack.org/613709
[22:25] <openstackgerrit> Carlos Goncalves proposed openstack/octavia master: WIP: Update pylint version  https://review.openstack.org/635236
[22:34] <eandersson> If you guys have a blueprint, or make one on this, I wouldn't mind doing some testing and/or providing some feedback on those.
[22:34] <eandersson> I am not trying to be a negative nancy :p
[22:35] <johnsom> eandersson: For which? provider drivers? flavors?
[22:35] <eandersson> both
[22:35] <johnsom> Provider drivers: https://docs.openstack.org/octavia/latest/contributor/specs/version1.1/enable-provider-driver.html
[22:35] <johnsom> Flavors: https://docs.openstack.org/octavia/latest/contributor/specs/version1.0/flavors.html
[22:35] <eandersson> Thanks johnsom
[22:36] <johnsom> Those are both fairly old now; they merged in Pike and Queens.
[22:36] <johnsom> Providers landed in Rocky, but we are refining them in Stein. Flavors has landed in Stein
[22:37] <johnsom> Both have admin guide docs pages as well
[22:37] <eandersson> Yea - great work on that btw
[22:38] <eandersson> I am excited about it... let's just hope our vendors are equally excited haha
[22:38] <johnsom> I know two vendors already have code working internally
[22:38] <eandersson> but then again, haproxy is great
[22:43] <johnsom> cgoncalves: That looks like it was fun....
[22:43] <johnsom> I need to look at what those rules you disabled are
[22:57] <rm_work> this is my favorite test, because it's just so *me*: https://github.com/openstack/octavia/blob/master/octavia/tests/functional/db/test_repositories.py#L446
[22:57] <johnsom> lol, yeah, I saw that again recently. Good stuff
[22:58] <rm_work> does the passive-aggressiveness come through? :P
[22:58] <johnsom> I need to make another "why are we skipping tests" pass to see if there are more to clean up
[23:00] <johnsom> We have 7 functionals in skip mode...
