Wednesday, 2019-02-06

johnsomYes.  Thanks for the patch00:01
eanderssonAlso affects Octavia btw00:02
*** celebdor has quit IRC00:02
johnsomYeah, I know we have a few regressions still with the octavia API00:02
johnsomHope to tackle the basic ones in Stein.00:03
*** yamamoto has quit IRC00:04
*** Emine has quit IRC00:23
*** salmankhan has quit IRC00:35
*** Swami has quit IRC00:57
rm_workDo we?01:31
rm_workI made a gate for that...01:31
johnsomFor the API performance?01:34
rm_workAh you mean performance regressions01:34
johnsomyes01:34
*** dims has quit IRC01:40
*** dims has joined #openstack-lbaas02:11
*** Dinesh_Bhor has joined #openstack-lbaas02:15
*** dims has quit IRC02:25
*** yamamoto has joined #openstack-lbaas02:27
*** yamamoto has quit IRC02:32
*** dims has joined #openstack-lbaas02:33
openstackgerritMichael Johnson proposed openstack/octavia-lib master: Fix some py3 byte string issues  https://review.openstack.org/635087 02:37
*** psachin has joined #openstack-lbaas03:04
openstackgerritErik Olof Gunnar Andersson proposed openstack/neutron-lbaas master: Improve performance on get and create requests  https://review.openstack.org/635076 03:32
*** yamamoto has joined #openstack-lbaas03:33
openstackgerritErik Olof Gunnar Andersson proposed openstack/neutron-lbaas master: Improve performance on get and create/update/delete requests  https://review.openstack.org/635076 03:34
*** ramishra has joined #openstack-lbaas04:06
*** ramishra has quit IRC04:16
*** ramishra has joined #openstack-lbaas04:17
*** yamamoto has quit IRC06:01
*** yamamoto has joined #openstack-lbaas06:10
*** ramishra has quit IRC06:51
*** ramishra has joined #openstack-lbaas06:51
*** ramishra has quit IRC07:00
*** ramishra has joined #openstack-lbaas07:01
*** jmccrory has quit IRC07:06
*** jmccrory has joined #openstack-lbaas07:06
cgoncalvesrm_work, yes, I did but stopped seeing this https://code.visualstudio.com/assets/docs/python/unit-testing/editor-adornments-unittest.png somehow yesterday07:29
*** pcaruana has joined #openstack-lbaas07:29
rm_workhmmm07:29
cgoncalvesvscode can still detect tests and I can pick which ones to run07:29
cgoncalvesit's running now actually07:29
rm_workoh that's neat. if you can see it i guess07:29
rm_workso how did you configure it to use the right venv for the testing?07:30
rm_workI configured the venv and such and it uses it for code completion and linting....07:30
rm_workbut when i try to run tests, it doesn't use it?07:30
rm_workand normally it says no tests discovered, i had to manually hack at it to even get it to try to run some07:30
cgoncalves"python.unitTest.unittestEnabled": true,07:31
cgoncalves"python.unitTest.pyTestEnabled": false,07:31
cgoncalves"python.unitTest.nosetestsEnabled": false,07:31
cgoncalvesas per https://code.visualstudio.com/docs/python/unit-testing 07:31
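The exchange above boils down to a settings.json along these lines; the interpreter path and test directory are placeholders rather than values from the chat, and the python.unitTest.* keys are the setting names used in the extension docs linked above:

```json
{
    // Point the extension at the project venv (path is an example/assumption).
    "python.pythonPath": "${workspaceFolder}/.venv/bin/python",

    // Enable only the unittest runner, as pasted above.
    "python.unitTest.unittestEnabled": true,
    "python.unitTest.pyTestEnabled": false,
    "python.unitTest.nosetestsEnabled": false,

    // Tell test discovery where to look (start dir and pattern are examples).
    "python.unitTest.unittestArgs": [
        "-v",
        "-s", "./octavia/tests/unit",
        "-p", "test_*.py"
    ]
}
```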
openstackgerritErik Olof Gunnar Andersson proposed openstack/neutron-lbaas master: Improve performance on get and create/update/delete requests  https://review.openstack.org/635076 07:31
rm_workok yeah... and then did yours find any?07:31
cgoncalvesbtw, spare pool job passed https://review.openstack.org/#/c/634988/ :)07:32
rm_workthat's what I did, and it was like "no tests detected"07:32
cgoncalvesrm_work, yes07:32
rm_work"please configure test locations"07:32
rm_workhmm07:32
cgoncalveshttps://snag.gy/qOPtRn.jpg 07:33
*** yboaron has joined #openstack-lbaas07:39
*** yamamoto has quit IRC07:39
*** yamamoto has joined #openstack-lbaas07:41
*** yamamoto has quit IRC07:41
*** yamamoto has joined #openstack-lbaas07:41
*** gcheresh has joined #openstack-lbaas07:46
*** Emine has joined #openstack-lbaas07:49
*** gcheresh_ has joined #openstack-lbaas07:53
*** gcheresh has quit IRC07:53
*** rpittau has joined #openstack-lbaas08:06
*** AlexStaf has joined #openstack-lbaas08:07
*** Emine has quit IRC08:12
*** ramishra has quit IRC08:55
*** AlexStaf has quit IRC08:56
*** ramishra has joined #openstack-lbaas08:57
*** celebdor has joined #openstack-lbaas09:04
*** takamatsu_ has joined #openstack-lbaas09:57
*** takamatsu has quit IRC09:57
*** takamatsu_ has quit IRC10:00
*** takamatsu_ has joined #openstack-lbaas10:03
*** Emine has joined #openstack-lbaas10:14
*** Emine has quit IRC10:18
*** Emine has joined #openstack-lbaas10:21
*** yamamoto has quit IRC10:23
*** psachin has quit IRC10:24
*** salmankhan has joined #openstack-lbaas10:27
*** salmankhan has quit IRC10:28
*** salmankhan has joined #openstack-lbaas10:29
cgoncalvesAll: FYI, proposed release of Octavia stable/queens 2.0.4 -- https://review.openstack.org/#/c/635122/10:29
*** AlexStaf has joined #openstack-lbaas10:32
*** salmankhan has quit IRC10:35
nmagnezicgoncalves, thanks for that!10:46
*** psachin has joined #openstack-lbaas10:50
*** Emine has quit IRC10:58
*** salmankhan has joined #openstack-lbaas10:59
*** celebdor has quit IRC11:25
*** yamamoto has joined #openstack-lbaas11:32
*** takamatsu_ has quit IRC11:48
*** takamatsu_ has joined #openstack-lbaas11:52
*** yamamoto has quit IRC11:56
*** celebdor has joined #openstack-lbaas12:03
*** Emine has joined #openstack-lbaas12:06
*** Dinesh_Bhor has quit IRC12:24
*** takamatsu_ has quit IRC12:24
*** takamatsu has joined #openstack-lbaas12:24
*** yamamoto has joined #openstack-lbaas12:37
*** ccamposr has joined #openstack-lbaas13:12
*** ccamposr has quit IRC13:26
*** trown|outtypewww is now known as trown13:35
*** yamamoto has quit IRC14:00
*** yamamoto has joined #openstack-lbaas14:00
*** yamamoto has quit IRC14:00
*** yamamoto has joined #openstack-lbaas14:01
*** yamamoto has quit IRC14:05
*** yamamoto has joined #openstack-lbaas14:06
*** psachin has quit IRC14:10
openstackgerritVadim Ponomarev proposed openstack/octavia master: Fix check redirect pool for creating a fully populated load balancer.  https://review.openstack.org/635167 14:34
*** fnaval has joined #openstack-lbaas15:36
openstackgerritBernhard M. Wiedemann proposed openstack/python-octaviaclient master: Make the documentation reproducible  https://review.openstack.org/635194 15:42
*** gcheresh_ has quit IRC15:50
cgoncalvesZuul is experiencing some issues. it has a long queue of events to process, it seems. infra team is aware16:00
johnsomYeah, just saw that16:05
*** ramishra has quit IRC16:28
cgoncalvesZuul is back to normal. queue is empty16:38
*** AlexStaf has quit IRC16:52
*** celebdor has quit IRC16:57
*** pcaruana has quit IRC17:19
-openstackstatus- NOTICE: Any changes failed around 16:30 UTC today with a review comment from Zuul like "ERROR Unable to find playbook" can be safely rechecked; this was an unanticipated side effect of our work to move base job definitions between configuration repositories.17:27
*** rpittau has quit IRC17:34
*** gcheresh_ has joined #openstack-lbaas17:36
*** trown is now known as trown|lunch17:45
cgoncalvesrm_work, have you managed to run unit tests in vscode?17:46
openstackgerritMerged openstack/python-octaviaclient master: Make the documentation reproducible  https://review.openstack.org/635194 17:48
*** gcheresh_ has quit IRC17:53
cgoncalveswould anyone object to lifting the pylint version constraint from ==1.9.2 to >=1.9.2? 1.9.2 doesn't support python 3.718:04
cgoncalveshttps://github.com/openstack/octavia/commit/0322cbc5c38838648253827610d44e71162978e5 18:04
cgoncalves^ this was the change that bumped to 1.9.218:05
johnsomShould be fine18:05
openstackgerritCarlos Goncalves proposed openstack/octavia master: Update pylint version  https://review.openstack.org/635236 18:13
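For context, the change under discussion is a one-line edit in test-requirements.txt; roughly what cgoncalves proposes would look like this (the trailing license comment is assumed, and the patch that actually merged may have picked a different pin):

```
-pylint==1.9.2  # GPLv2
+pylint>=1.9.2  # GPLv2
```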
*** trown|lunch is now known as trown18:55
*** salmankhan has quit IRC19:06
rm_workcgoncalves: no, not yet19:32
cgoncalvesrm_work, weird. I started vscode settings and workspace from scratch and it works19:58
rm_worki might have to wipe all my settings and try again?19:59
rm_workmaybe i did something wrong19:59
cgoncalvesmake sure "python.unitTest.unittestEnabled": true is set19:59
cgoncalvesand disable pyTestEnabled and nosetestsEnabled19:59
cgoncalvesyou may have to restart vscode20:00
johnsom#startmeeting Octavia20:00
openstackMeeting started Wed Feb  6 20:00:04 2019 UTC and is due to finish in 60 minutes.  The chair is johnsom. Information about MeetBot at http://wiki.debian.org/MeetBot.20:00
openstackUseful Commands: #action #agreed #help #info #idea #link #topic #startvote.20:00
*** openstack changes topic to " (Meeting topic: Octavia)"20:00
openstackThe meeting name has been set to 'octavia'20:00
johnsomHi folks20:00
cgoncalvesgood time of the day20:00
nmagnezio/20:00
johnsom#topic Announcements20:00
*** openstack changes topic to "Announcements (Meeting topic: Octavia)"20:00
johnsomWe have one month before feature freeze for Stein.  Slightly less for the libraries.20:01
johnsomThanks to everyone that has been helping with reviews.20:01
johnsomOther than that, I don't think I have any announcements. Anyone else?20:01
johnsom#topic Brief progress reports / bugs needing review20:02
*** openstack changes topic to "Brief progress reports / bugs needing review (Meeting topic: Octavia)"20:02
johnsomI worked on updating the openstack SDK for our recent new features. All of those patches are up for review and have one +2.20:03
johnsomI also spent some time on stable/queens patches which have now merged.20:03
*** celebdor has joined #openstack-lbaas20:04
johnsomThank you to Carlos for posting a release patch for that.20:04
johnsomHopefully it will go out today.20:04
nmagneziYeah this tag will include some important patches20:04
johnsomCurrently I am working on the octavia-lib patch. I have a few more things to do on it, but making progress updating it.20:04
johnsomYes, it has a number of important fixes.20:05
rm_worko/20:05
cgoncalvesthe -centos job got broken by https://review.openstack.org/#/c/633141/20:05
johnsomOnce that is done I'm going to focus on code reviews so we can get those features merged in Stein.20:05
johnsomJoy. Are you working with Ian on getting that fixed?20:06
cgoncalvesI didn't have time today to look at it. I just pinged Ian on IRC. timezones make it difficult to sync. I'll try tomorrow my morning20:07
johnsomOk20:07
johnsomAny other updates today?20:07
cgoncalvesamphora spare pool: currently broken in master, will be fixed by https://review.openstack.org/#/c/632594/ and is being successfully tested by a new tempest scenario + job https://review.openstack.org/#/c/634988/20:07
cgoncalvesI would like to have the spare pool job in queens and rocky too20:08
johnsomDoes that mean you are going to +2/+W https://review.openstack.org/#/c/632594/ ?20:09
cgoncalvesand a friendly reminder to johnsom and rm_work to revisit https://review.openstack.org/#/c/627058/ if their time permits20:09
cgoncalvesI can upvote, sure. it passes the job so... ;)20:09
*** Emine has quit IRC20:10
johnsomThank you for the reminder20:10
johnsomOk, if there aren't any other updates, I will move on20:11
johnsom#topic Open Discussion20:11
*** openstack changes topic to "Open Discussion (Meeting topic: Octavia)"20:11
johnsomAny other topics today?20:11
johnsomPretty light agenda this week20:11
johnsomAlso note, the summit/PTG discounts start going away at the end of the month, so make sure to ping your managers...20:12
cgoncalvesnot a discussion per se, but just a thank you to everyone who submitted talk proposals about or related to LBaaS/Octavia for the Summit in Denver!20:13
johnsomWell, if there aren't any other topics this week we can close out the meeting.20:13
johnsomYes, pretty good turn out for Octavia related talks. I hope they get accepted.20:14
johnsomOk, thanks folks! have a great week.20:15
johnsom#endmeeting20:15
*** openstack changes topic to "Discussions for Octavia | Stein priority review list: https://etherpad.openstack.org/p/octavia-priority-reviews"20:15
openstackMeeting ended Wed Feb  6 20:15:09 2019 UTC.  Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4)20:15
openstackMinutes:        http://eavesdrop.openstack.org/meetings/octavia/2019/octavia.2019-02-06-20.00.html 20:15
openstackMinutes (text): http://eavesdrop.openstack.org/meetings/octavia/2019/octavia.2019-02-06-20.00.txt 20:15
openstackLog:            http://eavesdrop.openstack.org/meetings/octavia/2019/octavia.2019-02-06-20.00.log.html 20:15
rm_workyeah so it says to wait until you get a speaker code and use that over these codes for the summit -- but i don't know if my talks will be accepted yet, hopefully will know before the deadline for these discount codes :P20:21
rm_workalso it referenced one that i should use instead if i went to the denver PTG? which I did, but not sure i got that code yet, did you guys?20:21
cgoncalvesI got a 50% off registration code for being a contributor and another 80% code for having attended the last Denver PTG20:25
johnsomYeah, got both of those e-mails as well20:26
johnsomThe 80% one for the last PTG had the subject : "Invitation & Discount Registration to Open Infrastructure Summit & PTG Denver" and arrived January 17th for me.20:27
johnsomIf you can't find it, email summitreg@openstack.org20:27
rm_workk will look, prolly got it and just forgot20:45
rm_workyep, i did20:47
rm_workk20:47
rm_workso, hopefully speaker codes happen before the 27th20:48
rm_workoh nice, no meals at the summit this time20:49
cgoncalvesthere was an email in the past where the foundation wrote they could make refunds if one ends up getting a speaker code20:49
cgoncalvesnot sure it's still valid20:50
cgoncalveswhat!20:50
rm_workpart of me kinda hates the buffet stuff, because i like to go out and get good food with folks, and also i tend to miss the timing a lot on those :P20:50
rm_workstill meals at the PTG tho20:50
cgoncalveshmm, well, maybe it is for the best actually. meals at the last events were not that great20:51
rm_workdenver PTG food was great20:52
rm_workbut the summit food is sometimes meh20:52
rm_workwtflol21:37
rm_worki have deleted every .vscode i can find, homedir and project dir, and deleted the application and unzipped it fresh from the downloaded zipfile21:37
rm_workand it STILL has settings somewhere it's reading21:38
colin-seeing health-manager processes with like, ~20% cpu utilization21:58
colin-that seems weirdly high21:59
*** celebdor has quit IRC22:08
colin-thinking of lowering  health_check_interval to see if it has a positive impact22:15
colin-locked health_update_threads and stats_update_threads to 8 each to try and keep it isolated to just a few of the cores on the host22:17
openstackgerritErik Olof Gunnar Andersson proposed openstack/neutron-lbaas master: Improve performance on get and create/update/delete requests  https://review.openstack.org/635076 22:33
openstackgerritErik Olof Gunnar Andersson proposed openstack/neutron-lbaas master: Improve performance on get and create/update/delete requests  https://review.openstack.org/635076 22:40
eanderssonCleaning up the git message ^ :p22:40
rm_workcolin-: which version do you run again?22:42
rm_workand yeah, i always see some amount of utilization -- the health manager *is* busy22:42
rm_workso i kinda expect it22:42
rm_workcgoncalves: HA! finally found where the settings are22:42
rm_work"$HOME/Library/Application Support/Code/User/settings.json"22:43
rm_workremoved the whole Code directory and now i can try setting this up again22:43
eanderssonrm_work, rocky22:44
rm_workhmm22:44
rm_workeandersson: what have you done with JudeC btw22:44
eanderssonHe works on Senlin now :p22:44
colin-he is lurking around here on freenode somewhere22:44
rm_workah :D22:44
cgoncalvesCtrl+, will show settings. you can click on "open settings (json)" (upper right corner). that gives you autocomplete + documentation22:45
rm_worki was wondering if you locked him away in a basement or something22:45
colin-is that effective in your experience :)?22:45
eanderssonWe try, but he keeps escaping22:45
rm_workcolin-: not especially, he seems to work best when he has access to good food and some small amount of sunlight22:46
eanderssonWe sit 30 feet away from the cafeteria22:46
eanderssonso he has access to food at least :D22:46
colin-and still chooses to exist on energy drinks lol22:48
colin-will try with health_check_interval increased and see if it has any positive impact22:50
rm_workwhat interval were you using?22:51
rm_workand how many amps do you have?22:51
colin-it was unset so 322:51
rm_workah i think my interval was 1022:51
colin-~650 amphorae22:52
rm_workso ~217 per second to handle messages for22:52
rm_workhow many HM processes are in your rotation?22:53
colin-19 total, of which 8 are exhibiting the high CPU usage22:54
colin-consistent with the health_update_threads and stats_update_threads i set above22:54
colin-not sure what determines how many HMs i have22:54
rm_workwow, 19, that is a lot actually :P22:56
rm_workso each one with ~12/s22:56
rm_workyeah, 20% CPU seems a little high for that22:56
rm_workonly some were doing it? interesting22:56
colin-the higher CPU? yeah but i'm not surprised given that i told those two vars to only use 8 "threads"22:58
colin-i think before i set that the others were also using more22:58
colin-interestingly, increasing the check interval doesn't seem to have had any positive impact on the CPU utilization per process22:59
colin-was pretty confident it would22:59
rm_workah23:04
rm_workso yeah23:04
rm_workthat config is out on the amps, and was set at creation time23:04
colin-hehe23:04
rm_workso until you re-roll all your amps... they're all going to be on 3s or whatever23:04
colin-i see23:04
rm_workor until you get the new amp-config api23:05
eanderssonoh - I wish we had set that earlier :D23:05
rm_workand then do a reconfig on all of them23:05
rm_worklol yes23:05
rm_worksorry :P23:05
colin-any other properties related to this in the same category that come to mind?23:05
rm_workhmmm23:05
eanderssonbtw I know you love lbaas, but rm_work can you review https://review.openstack.org/#/c/635076/ ?23:05
colin-that we might want to also enforce on new resources23:05
rm_workthe HM address list is also static23:05
colin-got it23:06
rm_workso yeah be careful about those23:06
rm_workuse FLIPs if possible so you won't lose them23:06
rm_workbecause old amps will start failing if too many of their IPs change23:06
colin-wanted to use FQDNs but it was not meant to be ;)23:06
rm_workyeah lol23:06
rm_worksorry23:06
rm_workno DNS :P23:06
rm_workwoah, that was a weird repeat23:07
rm_workeandersson: what is this project? do we support this? :P23:07
eanderssonI think it's the Octavia replacement23:07
rm_work<_<23:07
eandersson:D23:07
eanderssonto be fair we need the same patch for Octavia as well =]23:08
rm_workdoes this really make that huge of a perf difference?23:08
eandersson40s down to 1s23:08
rm_workit's loading a single LB O_o23:08
eanderssonfor creating members23:08
rm_workhow does that take 40s23:08
eanderssonIf you have 400 members23:08
rm_workerg23:09
rm_worki don't see any test changes?23:09
rm_workor new tests? >_>23:09
eanderssonSo I created 200 members in my lab and it went from 24s to 1s23:09
rm_workseems this stuff is either untested or badly tested, lol23:09
eanderssonWell the old tests are more than enough for this23:09
rm_workhmm23:09
eanderssonsame as https://review.openstack.org/#/c/568361/23:09
eanderssonbut tbh I don't really need it merged23:09
eanderssonI just need to fix it internally, and want to make sure there isn't anything crazy going on23:10
eanderssonand if the community benefits from it thats great!23:10
rm_workseems fine23:10
rm_worki just like to see coverage not drop23:10
rm_workonly increase! :P23:10
eanderssonsame of course =]23:10
eanderssonI mean if this was Octavia I would fully agree23:10
rm_workgenerally any CR with no changes to tests is a red flag23:10
rm_workbut yeah, since i give just about zero care-units about n-lbaas...23:11
rm_workthe code itself seems fine so long as the tempest stuff is passing23:11
eanderssonIt's not really changing any underlying functionality, and nothing that the existing tests don't already cover23:11
rm_workwhich it seems they were23:11
rm_workso there you go23:12
eandersson:p23:12
eanderssonIf I had more time I would add more tests though23:12
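The patch itself is linked above and its internals aren't covered in this discussion; purely as an illustration of the class of change that typically turns a 24-40 second request on a 200-400 member load balancer into ~1 second, the usual culprit is an N+1 lazy-load pattern addressed with eager loading. The model and relationship names below are hypothetical, not taken from the patch:

```python
from sqlalchemy.orm import subqueryload

def get_loadbalancer(session, lb_model, lb_id):
    # lb_model is the mapped LoadBalancer class; .listeners and .pools are
    # assumed relationship attributes. Eager loading pulls the LB and its
    # child collections in a handful of queries instead of one extra SELECT
    # per member/pool/listener, which is what makes large objects take tens
    # of seconds to serialize.
    return (session.query(lb_model)
            .options(subqueryload(lb_model.listeners),
                     subqueryload(lb_model.pools))
            .filter(lb_model.id == lb_id)
            .one())
```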
colin-hey, so on the topic of the health-manager: i feel like i have a pretty clear picture now of how it's operating in the control plane, but i would like to better understand how it could be improved aside from using less CPU (for example). should i be expressing more or less config to it as our needs grow generally?23:13
eanderssonYou can never have enough coverage23:13
colin-what is the expected posture for it?23:14
colin-(how many HMs, is that a factor of another value?)23:14
rm_workso, my strategy was to have enough HMs that we wouldn't see more than ~50 messages per second to any one23:15
rm_workand spread them out on the infrastructure enough that we wouldn't see huge outages23:15
rm_workbut there's also math about how many you have and what your interval / threshold is23:15
colin-was it a consideration ever for the amps to just report their health back? push model v pull?23:15
johnsomGeez, look away and there is a huge scrollback23:15
colin-hope that doesn't sound judge-y just trying to picture it mentally23:16
rm_workif you have 6, and 2 go down, if your interval is 10 and your threshold is 20, then it's possible to get spurious failovers (just as a simple example) if it tries both down HMs in a row23:16
rm_workand by "down" i mean, network unroutable / HV dies / whatever23:16
johnsomAlso make sure you have the HM performance patch.  Not sure if you deployment is up to date or not.23:16
rm_workit *is* push23:16
rm_workthat's what's happening23:16
rm_workthe HMs are the push destination23:16
eanderssonhttps://github.com/openstack/octavia/commit/8c54a226308b2d74c77090e7998100209268694f ?23:17
rm_workthe amps push their updates via UDP packets on the set interval, round-robin across the list of HM ip/ports23:17
johnsomeandersson Yes23:17
rm_workthe HMs just process the amp's health reporting23:17
eanderssonYea we have that one23:17
colin-understood rm_work thanks for clarifying that for me23:18
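A minimal sketch of the push model rm_work describes (illustrative only, not Octavia's actual amphora health daemon; build_payload stands in for whatever assembles the heartbeat message):

```python
import itertools
import socket
import time

def heartbeat_loop(hm_endpoints, build_payload, interval=10):
    """Send one UDP heartbeat per interval, round-robin across the HM list."""
    targets = itertools.cycle(hm_endpoints)        # e.g. [("192.0.2.10", 5555), ...]
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    while True:
        ip, port = next(targets)
        sock.sendto(build_payload(), (ip, port))   # fire and forget, no reply expected
        time.sleep(interval)
```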
colin-all of that just processing the reports? i'm surprised23:18
johnsomOh, since you have neutron-lbaas you probably have the status sync stuff enabled. That will put a higher load on it too23:18
colin-it must be doing more than i realize23:18
colin-no, we don't set not setting  sync_provisioning_status johnsom23:18
colin-whoops sorry didn't realize i'd started that sentence23:19
rm_worklol23:19
eanderssonman, more .format() calls in logging statements, I thought I fixed all of those :p23:19
johnsomAre you using the event streamer though?  (separate setting)23:19
rm_workhmmm so you use n-lbaas but DON'T have the sync status?23:19
rm_workuhhh23:19
rm_workthat seems ... non-viable23:19
rm_worknothing would ever go ACTIVE in n-lbaas so you'd never be able to do anything with LBs you create23:20
johnsomWe should probably try to make a hacking check for that logging issue23:20
eanderssonWe don't connect lbaas and octavia23:20
rm_workah it's two different deployments?23:20
johnsomAh, ok23:20
colin-yes23:20
rm_workinteresting. guess that makes sense :P23:20
eanderssonsemi offtopic but I'll throw in another patch to change things like this to be lazy-loaded23:21
eandersson> LOG.debug('Health Update finished in: {0} seconds'.format(23:21
eanderssonNo need to build strings we don't use :p23:22
rm_workah heh23:22
rm_workyeah that was one of mine23:22
rm_workand yeah that's fair23:22
rm_worki tend to discount the cycles needed for logging stuff23:23
eanderssonyea - I mean.. so minor :P23:23
eanderssonbut if it is done often enough it adds up23:23
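The cleanup eandersson describes is the standard lazy-interpolation idiom: pass the arguments to the logger instead of pre-building the string, so nothing gets formatted unless a handler will actually emit the record. A minimal sketch:

```python
import logging

LOG = logging.getLogger(__name__)
update_time = 0.042  # elapsed seconds (example value)

# Eager: the string is always built, even when DEBUG is disabled.
LOG.debug('Health Update finished in: {0} seconds'.format(update_time))

# Lazy: interpolation only happens if the record is actually emitted.
LOG.debug('Health Update finished in: %s seconds', update_time)
```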
colin-the sense i'm getting from the conversation is that what i'm observing is mostly within expectations, is that right?23:23
rm_workhonestly, it's hard for me to say what a good expectation is23:26
rm_workwhat i thought was "normal baseline" actually turned out to be insanity23:26
rm_work... a couple of different times23:26
colin-ok23:27
openstackgerritMerged openstack/octavia master: Improve local hacking checks test coverage  https://review.openstack.org/629955 23:28
colin-i'm concerned about this because it correlates directly with the size of the fleet and right now i don't feel like i have a lot of control over it23:29
colin-any tips for shoring up confidence about managing the resource needs of the HM beyond this many amps?23:30
johnsomcolin- Some time we should talk about what you are seeing and what you have configured.23:37
johnsomcolin- Also, I have an HM stress tool that you can use to simulate some levels of load. It's how I tested the performance patch.23:37
johnsomI didn't follow the how scroll back, so don't have all of the details.23:38
johnsomIt is total crap code as I slapped it together, but it does work: https://github.com/johnsom/stresshm23:38
johnsomI could push a few thousand amps per HM on my desktop VM using that tool.23:39
colin-cool, thanks. the best recap i can offer is just that i noticed higher than expected (to me) CPU utilization on some of the health-manager processes (there were 19 total and 8 of them were at ~30% CPU utilization checking on ~650 amps at a default interval of 3s)23:40
rm_workyeah, seeing that was one of the times i had to re-evaluate my baseline for "normal"23:40
johnsomI couldn't really stress the HM beyond that as my test VM couldn't spawn enough stress threads23:40
johnsomIt's only some of the HMs?23:40
colin-8 of them because i manually set health_update_threads and  stats_update_threads to 8 when trying to get a handle on overall CPU utilization23:41
colin-at least i think that's why it's only 8 of them23:41
johnsomSo, a few things...23:42
johnsomThis setting: https://docs.openstack.org/octavia/latest/configuration/configref.html#health_manager.health_check_interval 23:42
johnsomWhich defaults to 3 is how often the health check thread polls the DB to find stale amphora records.23:42
colin-oh23:43
colin-heartbeat frequency what i wanted, then?23:43
johnsomThere are basically two functions to the HM: 1. it polls the DB looking for missing/stale amphora. 2. It receives the health heartbeats23:43
johnsomThis is the interval between heartbeats from the amps: https://docs.openstack.org/octavia/latest/configuration/configref.html#health_manager.heartbeat_interval 23:43
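Pulled together, the knobs discussed in this thread live in octavia.conf under [health_manager]; the interval values below are the defaults cited above, the thread counts are colin-'s overrides, and the controller list addresses are placeholders:

```ini
[health_manager]
# Controller side: how often the HM polls the DB for stale amphora records.
health_check_interval = 3

# Amphora side (rendered into the amp at boot): seconds between UDP heartbeats.
heartbeat_interval = 10

# Worker pools that process incoming heartbeats and stats on the controller.
health_update_threads = 8
stats_update_threads = 8

# HM ip:port endpoints the amps round-robin their heartbeats across.
controller_ip_port_list = 192.0.2.10:5555, 192.0.2.11:5555
```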
colin-figured i had a 50/50 shot, oh well23:45
colin-appreciate the clarification on that thanks23:45
johnsomSure. Now the other issue is that as you have added HMs, only the newly booted amps pick up the new list. So you may have a fleet that is going to be hot on an older list of HMs.23:45
johnsomI just posted patches that let you fix that without failovers.23:46
johnsomThis one: https://review.openstack.org/#/c/632842/23:46
johnsomBasically you would update the controller list in the CWs, then call this API across your amps to have them update the controller list.23:47
colin-the number of hosts in that list isn't actually changing for me, it was at two previously and continues to be there (running the octavia services on two hosts in parallel)23:48
johnsomSo with your numbers, you should only have around 34 amps per HM, that is super low.23:48
colin-i don't follow, how did you derive that23:49
johnsomUmm, now I am slightly confused. You said you had 19 HMs running right? or do you mean threads there and not processes?23:49
colin-was referring to processes of /usr/local/openstack/bin/octavia-health-manager in ps output23:50
colin-how about you?23:50
johnsomOh, ok, so two hosts running HM, but they each have a bunch of processes. Got it.23:50
colin-yes23:50
johnsomWe deploy at least three HM hosts.23:51
colin-do you have hosts that only run HM services?23:51
colin-just curious23:51
johnsomNo, they are running ~20 containers with various control plane processes in them23:52
colin-ok, sounds familiar23:52
johnsomOk, so you have 325 amps per HM instance. That is some load, but not anything super high. With only 8 workers, yeah, I would expect you to have some load. Those eight are always going to be busy.23:54
colin-any advice for scaling that meaningfully beyond just trying a higher value and seeing how the control plane reacts? would like to be more deliberate than that23:56
johnsomYeah, we did the math on this back when rm_work was doing his deployment.  Let me dig around.23:57
johnsomIt's a bit hard as it's dependent on your hardware and, most importantly, the DB performance.23:57
rm_workah yeah lol he said health_check_interval earlier and I even read it as heartbeat_interval i think23:58
colin-no harm done i knew it was a toss up when i picked it, was eyeballing heartbeat but wasn't sure which heart it was discussing ;p23:58
rm_workalso: wow23:58
rm_workyeah i totally did not get what you meant23:58
rm_worki thought you had 19 HMs running23:58
rm_workI ran 623:58
rm_workbut you only actually run 223:59
colin-yeah that wasn't super clear sorry23:59
johnsomYeah, capped at 8 workers each23:59
rm_workso that's actually ~109 per HM23:59
colin-right23:59
rm_workper second23:59
rm_workthat's a lot busier23:59
rm_workI would definitely run more than two23:59
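A back-of-the-envelope sketch of the per-host heartbeat rate being discussed, using both the 3 s spacing assumed earlier in the thread and the default 10 s heartbeat_interval johnsom linked:

```python
def heartbeats_per_second_per_host(amps, heartbeat_interval, hm_hosts):
    # Each amp sends one heartbeat per interval, spread evenly across hosts.
    return amps / heartbeat_interval / hm_hosts

print(heartbeats_per_second_per_host(650, 3, 2))    # ~108/s per host (thread's assumption)
print(heartbeats_per_second_per_host(650, 10, 2))   # ~32.5/s per host (default 10 s interval)
print(heartbeats_per_second_per_host(650, 3, 19))   # ~11.4/s (earlier mistaken 19-host read)
```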
