Wednesday, 2019-02-06

johnsomYes.  Thanks for the patch00:01
eanderssonAlso affects Octavia btw00:02
*** celebdor has quit IRC00:02
johnsomYeah, I know we have a few regressions still with the octavia API00:02
johnsomHope to tackle the basic ones in Stein.00:03
*** yamamoto has quit IRC00:04
*** Emine has quit IRC00:23
*** salmankhan has quit IRC00:35
*** Swami has quit IRC00:57
rm_workDo we?01:31
rm_workI made a gate for that...01:31
johnsomFor the API performance?01:34
rm_workAh you mean performance regressions01:34
johnsomyes01:34
*** dims has quit IRC01:40
*** dims has joined #openstack-lbaas02:11
*** Dinesh_Bhor has joined #openstack-lbaas02:15
*** dims has quit IRC02:25
*** yamamoto has joined #openstack-lbaas02:27
*** yamamoto has quit IRC02:32
*** dims has joined #openstack-lbaas02:33
openstackgerritMichael Johnson proposed openstack/octavia-lib master: Fix some py3 byte string issues  https://review.openstack.org/635087 02:37
*** psachin has joined #openstack-lbaas03:04
openstackgerritErik Olof Gunnar Andersson proposed openstack/neutron-lbaas master: Improve performance on get and create requests  https://review.openstack.org/635076 03:32
*** yamamoto has joined #openstack-lbaas03:33
openstackgerritErik Olof Gunnar Andersson proposed openstack/neutron-lbaas master: Improve performance on get and create/update/delete requests  https://review.openstack.org/635076 03:34
*** ramishra has joined #openstack-lbaas04:06
*** ramishra has quit IRC04:16
*** ramishra has joined #openstack-lbaas04:17
*** yamamoto has quit IRC06:01
*** yamamoto has joined #openstack-lbaas06:10
*** ramishra has quit IRC06:51
*** ramishra has joined #openstack-lbaas06:51
*** ramishra has quit IRC07:00
*** ramishra has joined #openstack-lbaas07:01
*** jmccrory has quit IRC07:06
*** jmccrory has joined #openstack-lbaas07:06
cgoncalvesrm_work, yes, I did but stopped seeing this https://code.visualstudio.com/assets/docs/python/unit-testing/editor-adornments-unittest.png somehow yesterday07:29
*** pcaruana has joined #openstack-lbaas07:29
rm_workhmmm07:29
cgoncalvesvscode can still detect tests and I can pick which ones to run07:29
cgoncalvesit's running now actually07:29
rm_workoh that's neat. if you can see it i guess07:29
rm_workso how did you configure it to use the right venv for the testing?07:30
rm_workI configured the venv and such and it uses it for code completion and linting....07:30
rm_workbut when i try to run tests, it doesn't use it?07:30
rm_workand normally it says no tests discovered, i had to manually hack at it to even get it to try to run some07:30
cgoncalves"python.unitTest.unittestEnabled": true,07:31
cgoncalves"python.unitTest.pyTestEnabled": false,07:31
cgoncalves"python.unitTest.nosetestsEnabled": false,07:31
cgoncalvesas per https://code.visualstudio.com/docs/python/unit-testing 07:31
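The exchange above boils down to a settings.json along these lines; the interpreter path and test directory are placeholders rather than values from the chat, and the python.unitTest.* keys are the setting names used in the extension docs linked above:

```json
{
    // Point the extension at the project venv (path is an example/assumption).
    "python.pythonPath": "${workspaceFolder}/.venv/bin/python",

    // Enable only the unittest runner, as pasted above.
    "python.unitTest.unittestEnabled": true,
    "python.unitTest.pyTestEnabled": false,
    "python.unitTest.nosetestsEnabled": false,

    // Tell test discovery where to look (start dir and pattern are examples).
    "python.unitTest.unittestArgs": [
        "-v",
        "-s", "./octavia/tests/unit",
        "-p", "test_*.py"
    ]
}
```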
openstackgerritErik Olof Gunnar Andersson proposed openstack/neutron-lbaas master: Improve performance on get and create/update/delete requests  https://review.openstack.org/635076 07:31
rm_workok yeah... and then did yours find any?07:31
cgoncalvesbtw, spare pool job passed https://review.openstack.org/#/c/634988/ :)07:32
rm_workthat's what I did, and it was like "no tests detected"07:32
cgoncalvesrm_work, yes07:32
rm_work"please configure test locations"07:32
rm_workhmm07:32
cgoncalveshttps://snag.gy/qOPtRn.jpg 07:33
*** yboaron has joined #openstack-lbaas07:39
*** yamamoto has quit IRC07:39
*** yamamoto has joined #openstack-lbaas07:41
*** yamamoto has quit IRC07:41
*** yamamoto has joined #openstack-lbaas07:41
*** gcheresh has joined #openstack-lbaas07:46
*** Emine has joined #openstack-lbaas07:49
*** gcheresh_ has joined #openstack-lbaas07:53
*** gcheresh has quit IRC07:53
*** rpittau has joined #openstack-lbaas08:06
*** AlexStaf has joined #openstack-lbaas08:07
*** Emine has quit IRC08:12
*** ramishra has quit IRC08:55
*** AlexStaf has quit IRC08:56
*** ramishra has joined #openstack-lbaas08:57
*** celebdor has joined #openstack-lbaas09:04
*** takamatsu_ has joined #openstack-lbaas09:57
*** takamatsu has quit IRC09:57
*** takamatsu_ has quit IRC10:00
*** takamatsu_ has joined #openstack-lbaas10:03
*** Emine has joined #openstack-lbaas10:14
*** Emine has quit IRC10:18
*** Emine has joined #openstack-lbaas10:21
*** yamamoto has quit IRC10:23
*** psachin has quit IRC10:24
*** salmankhan has joined #openstack-lbaas10:27
*** salmankhan has quit IRC10:28
*** salmankhan has joined #openstack-lbaas10:29
cgoncalvesAll: FYI, proposed release of Octavia stable/queens 2.0.4 -- https://review.openstack.org/#/c/635122/10:29
*** AlexStaf has joined #openstack-lbaas10:32
*** salmankhan has quit IRC10:35
nmagnezicgoncalves, thanks for that!10:46
*** psachin has joined #openstack-lbaas10:50
*** Emine has quit IRC10:58
*** salmankhan has joined #openstack-lbaas10:59
*** celebdor has quit IRC11:25
*** yamamoto has joined #openstack-lbaas11:32
*** takamatsu_ has quit IRC11:48
*** takamatsu_ has joined #openstack-lbaas11:52
*** yamamoto has quit IRC11:56
*** celebdor has joined #openstack-lbaas12:03
*** Emine has joined #openstack-lbaas12:06
*** Dinesh_Bhor has quit IRC12:24
*** takamatsu_ has quit IRC12:24
*** takamatsu has joined #openstack-lbaas12:24
*** yamamoto has joined #openstack-lbaas12:37
*** ccamposr has joined #openstack-lbaas13:12
*** ccamposr has quit IRC13:26
*** trown|outtypewww is now known as trown13:35
*** yamamoto has quit IRC14:00
*** yamamoto has joined #openstack-lbaas14:00
*** yamamoto has quit IRC14:00
*** yamamoto has joined #openstack-lbaas14:01
*** yamamoto has quit IRC14:05
*** yamamoto has joined #openstack-lbaas14:06
*** psachin has quit IRC14:10
openstackgerritVadim Ponomarev proposed openstack/octavia master: Fix check redirect pool for creating a fully populated load balancer.  https://review.openstack.org/635167 14:34
*** fnaval has joined #openstack-lbaas15:36
openstackgerritBernhard M. Wiedemann proposed openstack/python-octaviaclient master: Make the documentation reproducible  https://review.openstack.org/635194 15:42
*** gcheresh_ has quit IRC15:50
cgoncalvesZuul is experiencing some issues. it has a long queue of events to process, it seems. infra team is aware16:00
johnsomYeah, just saw that16:05
*** ramishra has quit IRC16:28
cgoncalvesZuul is back to normal. queue is empty16:38
*** AlexStaf has quit IRC16:52
*** celebdor has quit IRC16:57
*** pcaruana has quit IRC17:19
-openstackstatus- NOTICE: Any changes failed around 16:30 UTC today with a review comment from Zuul like "ERROR Unable to find playbook" can be safely rechecked; this was an unanticipated side effect of our work to move base job definitions between configuration repositories.17:27
*** rpittau has quit IRC17:34
*** gcheresh_ has joined #openstack-lbaas17:36
*** trown is now known as trown|lunch17:45
cgoncalvesrm_work, have you managed to run unit tests in vscode?17:46
openstackgerritMerged openstack/python-octaviaclient master: Make the documentation reproducible  https://review.openstack.org/635194 17:48
*** gcheresh_ has quit IRC17:53
cgoncalveswould anyone object to lifting the pylint version constraint from ==1.9.2 to >=1.9.2? 1.9.2 doesn't support python 3.718:04
cgoncalveshttps://github.com/openstack/octavia/commit/0322cbc5c38838648253827610d44e71162978e5 18:04
cgoncalves^ this was the change that bumped to 1.9.218:05
johnsomShould be fine18:05
openstackgerritCarlos Goncalves proposed openstack/octavia master: Update pylint version  https://review.openstack.org/635236 18:13
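For context, the change under discussion is a one-line edit in test-requirements.txt; roughly what cgoncalves proposes would look like this (the trailing license comment is assumed, and the patch that actually merged may have picked a different pin):

```
-pylint==1.9.2  # GPLv2
+pylint>=1.9.2  # GPLv2
```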
*** trown|lunch is now known as trown18:55
*** salmankhan has quit IRC19:06
rm_workcgoncalves: no, not yet19:32
cgoncalvesrm_work, weird. I started vscode settings and workspace from scratch and it works19:58
rm_worki might have to wipe all my settings and try again?19:59
rm_workmaybe i did something wrong19:59
cgoncalvesmake sure "python.unitTest.unittestEnabled": true is set19:59
cgoncalvesand disable pyTestEnabled and nosetestsEnabled19:59
cgoncalvesyou may have to restart vscode20:00
johnsom#startmeeting Octavia20:00
openstackMeeting started Wed Feb  6 20:00:04 2019 UTC and is due to finish in 60 minutes.  The chair is johnsom. Information about MeetBot at http://wiki.debian.org/MeetBot.20:00
openstackUseful Commands: #action #agreed #help #info #idea #link #topic #startvote.20:00
*** openstack changes topic to " (Meeting topic: Octavia)"20:00
openstackThe meeting name has been set to 'octavia'20:00
johnsomHi folks20:00
cgoncalvesgood time of the day20:00
nmagnezio/20:00
johnsom#topic Announcements20:00
*** openstack changes topic to "Announcements (Meeting topic: Octavia)"20:00
johnsomWe have one month before feature freeze for Stein.  Slightly less for the libraries.20:01
johnsomThanks to everyone that has been helping with reviews.20:01
johnsomOther than that, I don't think I have any announcements. Anyone else?20:01
johnsom#topic Brief progress reports / bugs needing review20:02
*** openstack changes topic to "Brief progress reports / bugs needing review (Meeting topic: Octavia)"20:02
johnsomI worked on updating the openstack SDK for our recent new features. All of those patches are up for review and have one +2.20:03
johnsomI also spent some time on stable/queens patches which have now merged.20:03
*** celebdor has joined #openstack-lbaas20:04
johnsomThank you to Carlos for posting a release patch for that.20:04
johnsomHopefully it will go out today.20:04
nmagneziYeah this tag will include some important patches20:04
johnsomCurrently I am working on the octavia-lib patch. I have a few more things to do on it, but making progress updating it.20:04
johnsomYes, it has a number of important fixes.20:05
rm_worko/20:05
cgoncalvesthe -centos job got broken by https://review.openstack.org/#/c/633141/20:05
johnsomOnce that is done I'm going to focus on code reviews so we can get those features merged in Stein.20:05
johnsomJoy. Are you working with Ian on getting that fixed?20:06
cgoncalvesI didn't have time today to look at it. I just pinged Ian on IRC. timezones make it difficult to sync. I'll try tomorrow my morning20:07
johnsomOk20:07
johnsomAny other updates today?20:07
cgoncalvesamphora spare pool: currently broken in master, will be fixed by https://review.openstack.org/#/c/632594/ and is being successfully tested by a new tempest scenario + job https://review.openstack.org/#/c/634988/20:07
cgoncalvesI would like to have the spare pool job in queens and rocky too20:08
johnsomDoes that mean you are going to +2/+W https://review.openstack.org/#/c/632594/ ?20:09
cgoncalvesand a friendly reminder to johnsom and rm_work to revisit https://review.openstack.org/#/c/627058/ if their time permits20:09
cgoncalvesI can upvote, sure. it passes the job so... ;)20:09
*** Emine has quit IRC20:10
johnsomThank you for the reminder20:10
johnsomOk, if there aren't any other updates, I will move on20:11
johnsom#topic Open Discussion20:11
*** openstack changes topic to "Open Discussion (Meeting topic: Octavia)"20:11
johnsomAny other topics today?20:11
johnsomPretty light agenda this week20:11
johnsomAlso note, the summit/PTG discounts start going away at the end of the month, so make sure to ping your managers...20:12
cgoncalvesnot a discussion per se, but just a thank you to everyone who submitted talk proposals about or related to LBaaS/Octavia for the Summit in Denver!20:13
johnsomWell, if there aren't any other topics this week we can close out the meeting.20:13
johnsomYes, pretty good turn out for Octavia related talks. I hope they get accepted.20:14
johnsomOk, thanks folks! have a great week.20:15
johnsom#endmeeting20:15
*** openstack changes topic to "Discussions for Octavia | Stein priority review list: https://etherpad.openstack.org/p/octavia-priority-reviews"20:15
openstackMeeting ended Wed Feb  6 20:15:09 2019 UTC.  Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4)20:15
openstackMinutes:        http://eavesdrop.openstack.org/meetings/octavia/2019/octavia.2019-02-06-20.00.html 20:15
openstackMinutes (text): http://eavesdrop.openstack.org/meetings/octavia/2019/octavia.2019-02-06-20.00.txt 20:15
openstackLog:            http://eavesdrop.openstack.org/meetings/octavia/2019/octavia.2019-02-06-20.00.log.html 20:15
rm_workyeah so it says to wait until you get a speaker code and use that over these codes for the summit -- but i don't know if my talks will be accepted yet, hopefully will know before the deadline for these discount codes :P20:21
rm_workalso it referenced one that i should use instead if i went to the denver PTG? which I did, but not sure i got that code yet, did you guys?20:21
cgoncalvesI got a 50% off registration code for being a contributor and another 80% code for having attended the last Denver PTG20:25
johnsomYeah, got both of those e-mails as well20:26
johnsomThe 80% one for the last PTG had the subject : "Invitation & Discount Registration to Open Infrastructure Summit & PTG Denver" and arrived January 17th for me.20:27
johnsomIf you can't find it, email summitreg@openstack.org20:27
rm_workk will look, prolly got it and just forgot20:45
rm_workyep, i did20:47
rm_workk20:47
rm_workso, hopefully speaker codes happen before the 27th20:48
rm_workoh nice, no meals at the summit this time20:49
cgoncalvesthere was an email in the past where the foundation wrote they could make refunds if one ends up getting a speaker code20:49
cgoncalvesnot sure it's still valid20:50
cgoncalveswhat!20:50
rm_workpart of me kinda hates the buffet stuff, because i like to go out and get good food with folks, and also i tend to miss the timing a lot on those :P20:50
rm_workstill meals at the PTG tho20:50
cgoncalveshmm, well, maybe it is for the best actually. meals at the last events were not that great20:51
rm_workdenver PTG food was great20:52
rm_workbut the summit food is sometimes meh20:52
rm_workwtflol21:37
rm_worki have deleted every .vscode i can find, homedir and project dir, and deleted the application and unzipped it fresh from the downloaded zipfile21:37
rm_workand it STILL has settings somewhere it's reading21:38
colin-seeing health-manager processes with like, ~20% cpu utilization21:58
colin-that seems weirdly high21:59
*** celebdor has quit IRC22:08
colin-thinking of lowering  health_check_interval to see if it has a positive impact22:15
colin-locked health_update_threads and stats_update_threads to 8 each to try and keep it isolated to just a few of the cores on the host22:17
openstackgerritErik Olof Gunnar Andersson proposed openstack/neutron-lbaas master: Improve performance on get and create/update/delete requests  https://review.openstack.org/635076 22:33
openstackgerritErik Olof Gunnar Andersson proposed openstack/neutron-lbaas master: Improve performance on get and create/update/delete requests  https://review.openstack.org/635076 22:40
eanderssonCleaning up the git message ^ :p22:40
rm_workcolin-: which version do you run again?22:42
rm_workand yeah, i always see some amount of utilization -- the health manager *is* busy22:42
rm_workso i kinda expect it22:42
rm_workcgoncalves: HA! finally found where the settings are22:42
rm_work"$HOME/Library/Application Support/Code/User/settings.json"22:43
rm_workremoved the whole Code directory and now i can try setting this up again22:43
eanderssonrm_work, rocky22:44
rm_workhmm22:44
rm_workeandersson: what have you done with JudeC btw22:44
eanderssonHe works on Senlin now :p22:44
colin-he is lurking around here on freenode somewhere22:44
rm_workah :D22:44
cgoncalvesCtrl+, will show settings. you can click on "open settings (json)" (upper right corner). that gives you autocomplete + documentation22:45
rm_worki was wondering if you locked him away in a basement or something22:45
colin-is that effective in your experience :)?22:45
eanderssonWe try, but he keeps escaping22:45
rm_workcolin-: not especially, he seems to work best when he has access to good food and some small amount of sunlight22:46
eanderssonWe sit 30 feet away from the cafeteria22:46
eanderssonso he has access to food at least :D22:46
colin-and still chooses to exist on energy drinks lol22:48
colin-will try with health_check_interval increased and see if it has any positive impact22:50
rm_workwhat interval were you using?22:51
rm_workand how many amps do you have?22:51
colin-it was unset so 322:51
rm_workah i think my interval was 1022:51
colin-~650 amphorae22:52
rm_workso ~217 per second to handle messages for22:52
rm_workhow many HM processes are in your rotation?22:53
colin-19 total, of which 8 are exhibiting the high CPU usage22:54
colin-consistent with the health_update_threads and stats_update_threads i set above22:54
colin-not sure what determines how many HMs i have22:54
rm_workwow, 19, that is a lot actually :P22:56
rm_workso each one with ~12/s22:56
rm_workyeah, 20% CPU seems a little high for that22:56
rm_workonly some were doing it? interesting22:56
colin-the higher CPU? yeah but i'm not surprised given that i told those two vars to only use 8 "threads"22:58
colin-i think before i set that the others were also using more22:58
colin-interestingly, increasing the check interval doesn't seem to have had any positive impact on the CPU utilization per process22:59
colin-was pretty confident it would22:59
rm_workah23:04
rm_workso yeah23:04
rm_workthat config is out on the amps, and was set at creation time23:04
colin-hehe23:04
rm_workso until you re-roll all your amps... they're all going to be on 3s or whatever23:04
colin-i see23:04
rm_workor until you get the new amp-config api23:05
eanderssonoh - I wish we had set that earlier :D23:05
rm_workand then do a reconfig on all of them23:05
rm_worklol yes23:05
rm_worksorry :P23:05
colin-any other properties related to this in the same category that come to mind?23:05
rm_workhmmm23:05
eanderssonbtw I know you love lbaas, but rm_work can you review https://review.openstack.org/#/c/635076/ ?23:05
colin-that we might want to also enforce on new resources23:05
rm_workthe HM address list is also static23:05
colin-got it23:06
rm_workso yeah be careful about those23:06
rm_workuse FLIPs if possible so you won't lose them23:06
rm_workbecause old amps will start failing if too many of their IPs change23:06
colin-wanted to use FQDNs but it was not meant to be ;)23:06
rm_workyeah lol23:06
rm_worksorry23:06
rm_workno DNS :P23:06
rm_workwoah, that was a weird repeat23:07
rm_workeandersson: what is this project? do we support this? :P23:07
eanderssonI think it's the Octavia replacement23:07
rm_work<_<23:07
eandersson:D23:07
eanderssonto be fair we need the same patch for Octavia as well =]23:08
rm_workdoes this really make that huge of a perf difference?23:08
eandersson40s down to 1s23:08
rm_workit's loading a single LB O_o23:08
eanderssonfor creating members23:08
rm_workhow does that take 40s23:08
eanderssonIf you have 400 members23:08
rm_workerg23:09
rm_worki don't see any test changes?23:09
rm_workor new tests? >_>23:09
eanderssonSo I created 200 members in my lab and it went from 24s to 1s23:09
rm_workseems this stuff is either untested or badly tested, lol23:09
eanderssonWell the old tests are more than enough for this23:09
rm_workhmm23:09
eanderssonsame as https://review.openstack.org/#/c/568361/23:09
eanderssonbut tbh I don't really need it merged23:09
eanderssonI just need to fix it internally, and want to make sure there isn't anything crazy going on23:10
eanderssonand if the community benefits from it thats great!23:10
rm_workseems fine23:10
rm_worki just like to see coverage not drop23:10
rm_workonly increase! :P23:10
eanderssonsame of course =]23:10
eanderssonI mean if this was Octavia I would fully agree23:10
rm_workgenerally any CR with no changes to tests is a red flag23:10
rm_workbut yeah, since i give just about zero care-units about n-lbaas...23:11
rm_workthe code itself seems fine so long as the tempest stuff is passing23:11
eanderssonIt's not really changing any underlying functionality, and nothing that the existing tests don't already cover23:11
rm_workwhich it seems they were23:11
rm_workso there you go23:12
eandersson:p23:12
eanderssonIf I had more time I would add more tests though23:12
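The patch itself is linked above and its internals aren't covered in this discussion; purely as an illustration of the class of change that typically turns a 24-40 second request on a 200-400 member load balancer into ~1 second, the usual culprit is an N+1 lazy-load pattern addressed with eager loading. The model and relationship names below are hypothetical, not taken from the patch:

```python
from sqlalchemy.orm import subqueryload

def get_loadbalancer(session, lb_model, lb_id):
    # lb_model is the mapped LoadBalancer class; .listeners and .pools are
    # assumed relationship attributes. Eager loading pulls the LB and its
    # child collections in a handful of queries instead of one extra SELECT
    # per member/pool/listener, which is what makes large objects take tens
    # of seconds to serialize.
    return (session.query(lb_model)
            .options(subqueryload(lb_model.listeners),
                     subqueryload(lb_model.pools))
            .filter(lb_model.id == lb_id)
            .one())
```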
colin-hey, so on the topic of the health-manager: i feel like i have a pretty clear picture now of how it's operating in the control plane, but i would like to better understand how it could be improved aside from using less CPU (for example). should i be expressing more or less config to it as our needs grow generally?23:13
eanderssonYou can never have enough coverage23:13
colin-what is the expected posture for it?23:14
colin-(how many HMs, is that a factor of another value?)23:14
rm_workso, my strategy was to have enough HMs that we wouldn't see more than ~50 messages per second to any one23:15
rm_workand spread them out on the infrastructure enough that we wouldn't see huge outages23:15
rm_workbut there's also math about how many you have and what your interval / threshold is23:15
colin-was it a consideration ever for the amps to just report their health back? push model v pull?23:15
johnsomGeez, look away and there is a huge scrollback23:15
colin-hope that doesn't sound judge-y just trying to picture it mentally23:16
rm_workif you have 6, and 2 go down, if your interval is 10 and your threshold is 20, then it's possible to get spurious failovers (just as a simple example) if it tries both down HMs in a row23:16
rm_workand by "down" i mean, network unroutable / HV dies / whatever23:16
johnsomAlso make sure you have the HM performance patch.  Not sure if you deployment is up to date or not.23:16
rm_workit *is* push23:16
rm_workthat's what's happening23:16
rm_workthe HMs are the push destination23:16
eanderssonhttps://github.com/openstack/octavia/commit/8c54a226308b2d74c77090e7998100209268694f ?23:17
rm_workthe amps push their updates via UDP packets on the set interval, round-robin across the list of HM ip/ports23:17
johnsomeandersson Yes23:17
rm_workthe HMs just process the amp's health reporting23:17
eanderssonYea we have that one23:17
colin-understood rm_work thanks for clarifying that for me23:18
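A minimal sketch of the push model rm_work describes (illustrative only, not Octavia's actual amphora health daemon; build_payload stands in for whatever assembles the heartbeat message):

```python
import itertools
import socket
import time

def heartbeat_loop(hm_endpoints, build_payload, interval=10):
    """Send one UDP heartbeat per interval, round-robin across the HM list."""
    targets = itertools.cycle(hm_endpoints)        # e.g. [("192.0.2.10", 5555), ...]
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    while True:
        ip, port = next(targets)
        sock.sendto(build_payload(), (ip, port))   # fire and forget, no reply expected
        time.sleep(interval)
```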
colin-all of that just processing the reports? i'm surprised23:18
johnsomOh, since you have neutron-lbaas you probably have the status sync stuff enabled. That will put a higher load on it too23:18
colin-it must be doing more than i realize23:18
colin-no, we don't set not setting  sync_provisioning_status johnsom23:18
colin-whoops sorry didn't realize i'd started that sentence23:19
rm_worklol23:19
eanderssonman, more .format() calls in logging statements, I thought I fixed all of those :p23:19
johnsomAre you using the event streamer though?  (separate setting)23:19
rm_workhmmm so you use n-lbaas but DON'T have the sync status?23:19
rm_workuhhh23:19
rm_workthat seems ... non-viable23:19
rm_worknothing would ever go ACTIVE in n-lbaas so you'd never be able to do anything with LBs you create23:20
johnsomWe should probably try to make a hacking check for that logging issue23:20
eanderssonWe don't connect lbaas and octavia23:20
rm_workah it's two different deployments?23:20
johnsomAh, ok23:20
colin-yes23:20
rm_workinteresting. guess that makes sense :P23:20
eanderssonsemi offtopic but I'll throw in another patch to change things like this to be lazy-loaded23:21
eandersson> LOG.debug('Health Update finished in: {0} seconds'.format(23:21
eanderssonNo need to build strings we don't use :p23:22
rm_workah heh23:22
rm_workyeah that was one of mine23:22
rm_workand yeah that's fair23:22
rm_worki tend to discount the cycles needed for logging stuff23:23
eanderssonyea - I mean.. so minor :P23:23
eanderssonbut if it is done often enough it adds up23:23
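The cleanup eandersson describes is the standard lazy-interpolation idiom: pass the arguments to the logger instead of pre-building the string, so nothing gets formatted unless a handler will actually emit the record. A minimal sketch:

```python
import logging

LOG = logging.getLogger(__name__)
update_time = 0.042  # elapsed seconds (example value)

# Eager: the string is always built, even when DEBUG is disabled.
LOG.debug('Health Update finished in: {0} seconds'.format(update_time))

# Lazy: interpolation only happens if the record is actually emitted.
LOG.debug('Health Update finished in: %s seconds', update_time)
```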
colin-the sense i'm getting from the conversation is that what i'm observing is mostly within expectations, is that right?23:23
rm_workhonestly, it's hard for me to say what a good expectation is23:26
rm_workwhat i thought was "normal baseline" actually turned out to be insanity23:26
rm_work... a couple of different times23:26
colin-ok23:27
openstackgerritMerged openstack/octavia master: Improve local hacking checks test coverage  https://review.openstack.org/629955 23:28
colin-i'm concerned about this because it correlates directly with the size of the fleet and right now i don't feel like i have a lot of control over it23:29
colin-any tips for shoring up confidence about managing the resource needs of the HM beyond this many amps?23:30
johnsomcolin- Some time we should talk about what you are seeing and what you have configured.23:37
johnsomcolin- Also, I have an HM stress tool that you can use to simulate some levels of load. It's how I tested the performance patch.23:37
johnsomI didn't follow the how scroll back, so don't have all of the details.23:38
johnsomIt is total crap code as I slapped it together, but it does work: https://github.com/johnsom/stresshm23:38
johnsomI could push a few thousand amps per HM on my desktop VM using that tool.23:39
colin-cool, thanks. the best recap i can offer is just that i noticed higher than expected (to me) CPU utilization on some of the health-manager processes (there were 19 total and 8 of them were at ~30% CPU utilization checking on ~650 amps at a default interval of 3s)23:40
rm_workyeah, seeing that was one of the times i had to re-evaluate my baseline for "normal"23:40
johnsomI couldn't really stress the HM beyond that as my test VM couldn't spawn enough stress threads23:40
johnsomIt's only some of the HMs?23:40
colin-8 of them because i manually set health_update_threads and  stats_update_threads to 8 when trying to get a handle on overall CPU utilization23:41
colin-at least i think that's why it's only 8 of them23:41
johnsomSo, a few things...23:42
johnsomThis setting: https://docs.openstack.org/octavia/latest/configuration/configref.html#health_manager.health_check_interval 23:42
johnsomWhich defaults to 3 is how often the health check thread polls the DB to find stale amphora records.23:42
colin-oh23:43
colin-heartbeat frequency what i wanted, then?23:43
johnsomThere are basically two functions to the HM: 1. it polls the DB looking for missing/stale amphora. 2. It receives the health heartbeats23:43
johnsomThis is the interval between heartbeats from the amps: https://docs.openstack.org/octavia/latest/configuration/configref.html#health_manager.heartbeat_interval 23:43
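Pulled together, the knobs discussed in this thread live in octavia.conf under [health_manager]; the interval values below are the defaults cited above, the thread counts are colin-'s overrides, and the controller list addresses are placeholders:

```ini
[health_manager]
# Controller side: how often the HM polls the DB for stale amphora records.
health_check_interval = 3

# Amphora side (rendered into the amp at boot): seconds between UDP heartbeats.
heartbeat_interval = 10

# Worker pools that process incoming heartbeats and stats on the controller.
health_update_threads = 8
stats_update_threads = 8

# HM ip:port endpoints the amps round-robin their heartbeats across.
controller_ip_port_list = 192.0.2.10:5555, 192.0.2.11:5555
```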
colin-figured i had a 50/50 shot, oh well23:45
colin-appreciate the clarification on that thanks23:45
johnsomSure. Now the other issue is that as you have added HMs, only the newly booted amps pick up the new list. So you may have a fleet that is going to be hot on an older list of HMs.23:45
johnsomI just posted patches that let you fix that without failovers.23:46
johnsomThis one: https://review.openstack.org/#/c/632842/23:46
johnsomBasically you would update the controller list in the CWs, then call this API across your amps to have them update the controller list.23:47
colin-the number of hosts in that list isn't actually changing for me, it was at two previously and continues to be there (running the octavia services on two hosts in parallel)23:48
johnsomSo with your numbers, you should only have around 34 amps per HM, that is super low.23:48
colin-i don't follow, how did you derive that23:49
johnsomUmm, now I am slightly confused. You said you had 19 HMs running right? or do you mean threads there and not processes?23:49
colin-was referring to processes of /usr/local/openstack/bin/octavia-health-manager in ps output23:50
colin-how about you?23:50
johnsomOh, ok, so two hosts running HM, but they each have a bunch of processes. Got it.23:50
colin-yes23:50
johnsomWe deploy at least three HM hosts.23:51
colin-do you have hosts that only run HM services?23:51
colin-just curious23:51
johnsomNo, they are running ~20 containers with various control plane processes in them23:52
colin-ok, sounds familiar23:52
johnsomOk, so you have 325 amps per HM instance. That is some load, but not anything super high. With only 8 workers, yeah, I would expect you to have some load. Those eight are always going to be busy.23:54
colin-any advice for scaling that meaningfully beyond just trying a higher value and seeing how the control plane reacts? would like to be more deliberate than that23:56
johnsomYeah, we did the math on this back when rm_work was doing his deployment.  Let me dig around.23:57
johnsomIt's a bit hard as it's dependent on your hardware and, most importantly, the DB performance.23:57
rm_workah yeah lol he said health_check_interval earlier and I even read it as heartbeat_interval i think23:58
colin-no harm done i knew it was a toss up when i picked it, was eyeballing heartbeat but wasn't sure which heart it was discussing ;p23:58
rm_workalso: wow23:58
rm_workyeah i totally did not get what you meant23:58
rm_worki thought you had 19 HMs running23:58
rm_workI ran 623:58
rm_workbut you only actually run 223:59
colin-yeah that wasn't super clear sorry23:59
johnsomYeah, capped at 8 workers each23:59
rm_workso that's actually ~109 per HM23:59
colin-right23:59
rm_workper second23:59
rm_workthat's a lot busier23:59
rm_workI would definitely run more than two23:59
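A back-of-the-envelope sketch of the per-host heartbeat rate being discussed, using both the 3 s spacing assumed earlier in the thread and the default 10 s heartbeat_interval johnsom linked:

```python
def heartbeats_per_second_per_host(amps, heartbeat_interval, hm_hosts):
    # Each amp sends one heartbeat per interval, spread evenly across hosts.
    return amps / heartbeat_interval / hm_hosts

print(heartbeats_per_second_per_host(650, 3, 2))    # ~108/s per host (thread's assumption)
print(heartbeats_per_second_per_host(650, 10, 2))   # ~32.5/s per host (default 10 s interval)
print(heartbeats_per_second_per_host(650, 3, 19))   # ~11.4/s (earlier mistaken 19-host read)
```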
