johnsom | Yes. Thanks for the patch | 00:01 |
eandersson | Also affects Octavia btw | 00:02 |
*** celebdor has quit IRC | 00:02 | |
johnsom | Yeah, I know we have a few regressions still with the octavia API | 00:02 |
johnsom | Hope to tackle the basic ones in Stein. | 00:03 |
*** yamamoto has quit IRC | 00:04 | |
*** Emine has quit IRC | 00:23 | |
*** salmankhan has quit IRC | 00:35 | |
*** Swami has quit IRC | 00:57 | |
rm_work | Do we? | 01:31 |
rm_work | I made a gate for that... | 01:31 |
johnsom | For the API performance? | 01:34 |
rm_work | Ah you mean performance regressions | 01:34 |
johnsom | yes | 01:34 |
*** dims has quit IRC | 01:40 | |
*** dims has joined #openstack-lbaas | 02:11 | |
*** Dinesh_Bhor has joined #openstack-lbaas | 02:15 | |
*** dims has quit IRC | 02:25 | |
*** yamamoto has joined #openstack-lbaas | 02:27 | |
*** yamamoto has quit IRC | 02:32 | |
*** dims has joined #openstack-lbaas | 02:33 | |
openstackgerrit | Michael Johnson proposed openstack/octavia-lib master: Fix some py3 byte string issues https://review.openstack.org/635087 | 02:37 |
*** psachin has joined #openstack-lbaas | 03:04 | |
openstackgerrit | Erik Olof Gunnar Andersson proposed openstack/neutron-lbaas master: Improve performance on get and create requests https://review.openstack.org/635076 | 03:32 |
*** yamamoto has joined #openstack-lbaas | 03:33 | |
openstackgerrit | Erik Olof Gunnar Andersson proposed openstack/neutron-lbaas master: Improve performance on get and create/update/delete requests https://review.openstack.org/635076 | 03:34 |
*** ramishra has joined #openstack-lbaas | 04:06 | |
*** ramishra has quit IRC | 04:16 | |
*** ramishra has joined #openstack-lbaas | 04:17 | |
*** yamamoto has quit IRC | 06:01 | |
*** yamamoto has joined #openstack-lbaas | 06:10 | |
*** ramishra has quit IRC | 06:51 | |
*** ramishra has joined #openstack-lbaas | 06:51 | |
*** ramishra has quit IRC | 07:00 | |
*** ramishra has joined #openstack-lbaas | 07:01 | |
*** jmccrory has quit IRC | 07:06 | |
*** jmccrory has joined #openstack-lbaas | 07:06 | |
cgoncalves | rm_work, yes, I did, but somehow stopped seeing this https://code.visualstudio.com/assets/docs/python/unit-testing/editor-adornments-unittest.png yesterday | 07:29 |
*** pcaruana has joined #openstack-lbaas | 07:29 | |
rm_work | hmmm | 07:29 |
cgoncalves | vscode can still detect tests and I can pick them | 07:29 |
cgoncalves | it's running now actually | 07:29 |
rm_work | oh that's neat. if you can see it i guess | 07:29 |
rm_work | so how did you configure it to use the right venv for the testing? | 07:30 |
rm_work | I configured the venv and such and it uses it for code completion and linting.... | 07:30 |
rm_work | but when i try to run tests, it doesn't use it? | 07:30 |
rm_work | and normally it says no tests discovered, i had to manually hack at it to even get it to try to run some | 07:30 |
cgoncalves | "python.unitTest.unittestEnabled": true, | 07:31 |
cgoncalves | "python.unitTest.pyTestEnabled": false, | 07:31 |
cgoncalves | "python.unitTest.nosetestsEnabled": false, | 07:31 |
cgoncalves | as per https://code.visualstudio.com/docs/python/unit-testing | 07:31 |
openstackgerrit | Erik Olof Gunnar Andersson proposed openstack/neutron-lbaas master: Improve performance on get and create/update/delete requests https://review.openstack.org/635076 | 07:31 |
rm_work | ok yeah... and then did yours find any? | 07:31 |
cgoncalves | btw, spare pool job passed https://review.openstack.org/#/c/634988/ :) | 07:32 |
rm_work | that's what I did, and it was like "no tests detected" | 07:32 |
cgoncalves | rm_work, yes | 07:32 |
rm_work | "please configure test locations" | 07:32 |
rm_work | hmm | 07:32 |
cgoncalves | https://snag.gy/qOPtRn.jpg | 07:33 |
*** yboaron has joined #openstack-lbaas | 07:39 | |
*** yamamoto has quit IRC | 07:39 | |
*** yamamoto has joined #openstack-lbaas | 07:41 | |
*** yamamoto has quit IRC | 07:41 | |
*** yamamoto has joined #openstack-lbaas | 07:41 | |
*** gcheresh has joined #openstack-lbaas | 07:46 | |
*** Emine has joined #openstack-lbaas | 07:49 | |
*** gcheresh_ has joined #openstack-lbaas | 07:53 | |
*** gcheresh has quit IRC | 07:53 | |
*** rpittau has joined #openstack-lbaas | 08:06 | |
*** AlexStaf has joined #openstack-lbaas | 08:07 | |
*** Emine has quit IRC | 08:12 | |
*** ramishra has quit IRC | 08:55 | |
*** AlexStaf has quit IRC | 08:56 | |
*** ramishra has joined #openstack-lbaas | 08:57 | |
*** celebdor has joined #openstack-lbaas | 09:04 | |
*** takamatsu_ has joined #openstack-lbaas | 09:57 | |
*** takamatsu has quit IRC | 09:57 | |
*** takamatsu_ has quit IRC | 10:00 | |
*** takamatsu_ has joined #openstack-lbaas | 10:03 | |
*** Emine has joined #openstack-lbaas | 10:14 | |
*** Emine has quit IRC | 10:18 | |
*** Emine has joined #openstack-lbaas | 10:21 | |
*** yamamoto has quit IRC | 10:23 | |
*** psachin has quit IRC | 10:24 | |
*** salmankhan has joined #openstack-lbaas | 10:27 | |
*** salmankhan has quit IRC | 10:28 | |
*** salmankhan has joined #openstack-lbaas | 10:29 | |
cgoncalves | All: FYI, proposed release of Octavia stable/queens 2.0.4 -- https://review.openstack.org/#/c/635122/ | 10:29 |
*** AlexStaf has joined #openstack-lbaas | 10:32 | |
*** salmankhan has quit IRC | 10:35 | |
nmagnezi | cgoncalves, thanks for that! | 10:46 |
*** psachin has joined #openstack-lbaas | 10:50 | |
*** Emine has quit IRC | 10:58 | |
*** salmankhan has joined #openstack-lbaas | 10:59 | |
*** celebdor has quit IRC | 11:25 | |
*** yamamoto has joined #openstack-lbaas | 11:32 | |
*** takamatsu_ has quit IRC | 11:48 | |
*** takamatsu_ has joined #openstack-lbaas | 11:52 | |
*** yamamoto has quit IRC | 11:56 | |
*** celebdor has joined #openstack-lbaas | 12:03 | |
*** Emine has joined #openstack-lbaas | 12:06 | |
*** Dinesh_Bhor has quit IRC | 12:24 | |
*** takamatsu_ has quit IRC | 12:24 | |
*** takamatsu has joined #openstack-lbaas | 12:24 | |
*** yamamoto has joined #openstack-lbaas | 12:37 | |
*** ccamposr has joined #openstack-lbaas | 13:12 | |
*** ccamposr has quit IRC | 13:26 | |
*** trown|outtypewww is now known as trown | 13:35 | |
*** yamamoto has quit IRC | 14:00 | |
*** yamamoto has joined #openstack-lbaas | 14:00 | |
*** yamamoto has quit IRC | 14:00 | |
*** yamamoto has joined #openstack-lbaas | 14:01 | |
*** yamamoto has quit IRC | 14:05 | |
*** yamamoto has joined #openstack-lbaas | 14:06 | |
*** psachin has quit IRC | 14:10 | |
openstackgerrit | Vadim Ponomarev proposed openstack/octavia master: Fix check redirect pool for creating a fully populated load balancer. https://review.openstack.org/635167 | 14:34 |
*** fnaval has joined #openstack-lbaas | 15:36 | |
openstackgerrit | Bernhard M. Wiedemann proposed openstack/python-octaviaclient master: Make the documentation reproducible https://review.openstack.org/635194 | 15:42 |
*** gcheresh_ has quit IRC | 15:50 | |
cgoncalves | Zuul is experiencing some issues. it has a long queue of events to process, it seems. infra team is aware | 16:00 |
johnsom | Yeah, just saw that | 16:05 |
*** ramishra has quit IRC | 16:28 | |
cgoncalves | Zuul is back to normal. queue is empty | 16:38 |
*** AlexStaf has quit IRC | 16:52 | |
*** celebdor has quit IRC | 16:57 | |
*** pcaruana has quit IRC | 17:19 | |
-openstackstatus- NOTICE: Any changes failed around 16:30 UTC today with a review comment from Zuul like "ERROR Unable to find playbook" can be safely rechecked; this was an unanticipated side effect of our work to move base job definitions between configuration repositories. | 17:27 | |
*** rpittau has quit IRC | 17:34 | |
*** gcheresh_ has joined #openstack-lbaas | 17:36 | |
*** trown is now known as trown|lunch | 17:45 | |
cgoncalves | rm_work, have you managed to run unit tests in vscode? | 17:46 |
openstackgerrit | Merged openstack/python-octaviaclient master: Make the documentation reproducible https://review.openstack.org/635194 | 17:48 |
*** gcheresh_ has quit IRC | 17:53 | |
cgoncalves | would anyone object to lifting the pylint version constraint from ==1.9.2 to >=1.9.2? 1.9.2 doesn't support python 3.7 | 18:04 |
cgoncalves | https://github.com/openstack/octavia/commit/0322cbc5c38838648253827610d44e71162978e5 | 18:04 |
cgoncalves | ^ this was the change that bumped to 1.9.2 | 18:05 |
johnsom | Should be fine | 18:05 |
openstackgerrit | Carlos Goncalves proposed openstack/octavia master: Update pylint version https://review.openstack.org/635236 | 18:13 |
*** trown|lunch is now known as trown | 18:55 | |
*** salmankhan has quit IRC | 19:06 | |
rm_work | cgoncalves: no, not yet | 19:32 |
cgoncalves | rm_work, weird. I started vscode settings and workspace from scratch and it works | 19:58 |
rm_work | i might have to wipe all my settings and try again? | 19:59 |
rm_work | maybe i did something wrong | 19:59 |
cgoncalves | make sure "python.unitTest.unittestEnabled": true is set | 19:59 |
cgoncalves | and disable pyTestEnabled and nosetestsEnabled | 19:59 |
cgoncalves | you may have to restart vscode | 20:00 |
johnsom | #startmeeting Octavia | 20:00 |
openstack | Meeting started Wed Feb 6 20:00:04 2019 UTC and is due to finish in 60 minutes. The chair is johnsom. Information about MeetBot at http://wiki.debian.org/MeetBot. | 20:00 |
openstack | Useful Commands: #action #agreed #help #info #idea #link #topic #startvote. | 20:00 |
*** openstack changes topic to " (Meeting topic: Octavia)" | 20:00 | |
openstack | The meeting name has been set to 'octavia' | 20:00 |
johnsom | Hi folks | 20:00 |
cgoncalves | good time of the day | 20:00 |
nmagnezi | o/ | 20:00 |
johnsom | #topic Announcements | 20:00 |
*** openstack changes topic to "Announcements (Meeting topic: Octavia)" | 20:00 | |
johnsom | We have one month before feature freeze for Stein. Slightly less for the libraries. | 20:01 |
johnsom | Thanks to everyone that has been helping with reviews. | 20:01 |
johnsom | Other than that, I don't think I have any announcements. Any one else? | 20:01 |
johnsom | #topic Brief progress reports / bugs needing review | 20:02 |
*** openstack changes topic to "Brief progress reports / bugs needing review (Meeting topic: Octavia)" | 20:02 | |
johnsom | I worked on updating the openstack SDK for our recent new features. All of those patches are up for review and have one +2. | 20:03 |
johnsom | I also spent some time on stable/queens patches which have now merged. | 20:03 |
*** celebdor has joined #openstack-lbaas | 20:04 | |
johnsom | Thank you to Carlos for posting a release patch for that. | 20:04 |
johnsom | Hopefully it will go out today. | 20:04 |
nmagnezi | Yeah this tag will include some important patches | 20:04 |
johnsom | Currently I am working on the octavia-lib patch. I have a few more things to do on it, but making progress updating it. | 20:04 |
johnsom | Yes, it has a number of important fixes. | 20:05 |
rm_work | o/ | 20:05 |
cgoncalves | the -centos job got broken by https://review.openstack.org/#/c/633141/ | 20:05 |
johnsom | Once that is done I'm going to focus on code reviews so we can get those features merged in Stein. | 20:05 |
johnsom | Joy. Are you working with Ian on getting that fixed? | 20:06 |
cgoncalves | I didn't have time today to look at it. I just pinged Ian on IRC. timezones make it difficult to sync. I'll try tomorrow morning, my time | 20:07 |
johnsom | Ok | 20:07 |
johnsom | Any other updates today? | 20:07 |
cgoncalves | amphora spare pool: currently broken in master, will be fixed by https://review.openstack.org/#/c/632594/ and is being successfully tested by a new tempest scenario + job https://review.openstack.org/#/c/634988/ | 20:07 |
cgoncalves | I would like to have the spare pool job in queens and rocky too | 20:08 |
johnsom | Does that mean you are going to +2/+W https://review.openstack.org/#/c/632594/ ? | 20:09 |
cgoncalves | and a friendly reminder to johnsom and rm_work to revisit https://review.openstack.org/#/c/627058/ if their time permits | 20:09 |
cgoncalves | I can upvote, sure. it passes the job so... ;) | 20:09 |
*** Emine has quit IRC | 20:10 | |
johnsom | Thank you for the reminder | 20:10 |
johnsom | Ok, if there aren't any other updates, I will move on | 20:11 |
johnsom | #topic Open Discussion | 20:11 |
*** openstack changes topic to "Open Discussion (Meeting topic: Octavia)" | 20:11 | |
johnsom | Any other topics today? | 20:11 |
johnsom | Pretty light agenda this week | 20:11 |
johnsom | Also note, the summit/PTG discounts start going away at the end of the month, so make sure to ping your managers... | 20:12 |
cgoncalves | not a discussion per se, but I'd just like to thank everyone who submitted talk proposals about and related to LBaaS/Octavia to the Summit in Denver! | 20:13 |
johnsom | Well, if there aren't any other topics this week we can close out the meeting. | 20:13 |
johnsom | Yes, pretty good turnout for Octavia-related talks. I hope they get accepted. | 20:14 |
johnsom | Ok, thanks folks! have a great week. | 20:15 |
johnsom | #endmeeting | 20:15 |
*** openstack changes topic to "Discussions for Octavia | Stein priority review list: https://etherpad.openstack.org/p/octavia-priority-reviews" | 20:15 | |
openstack | Meeting ended Wed Feb 6 20:15:09 2019 UTC. Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4) | 20:15 |
openstack | Minutes: http://eavesdrop.openstack.org/meetings/octavia/2019/octavia.2019-02-06-20.00.html | 20:15 |
openstack | Minutes (text): http://eavesdrop.openstack.org/meetings/octavia/2019/octavia.2019-02-06-20.00.txt | 20:15 |
openstack | Log: http://eavesdrop.openstack.org/meetings/octavia/2019/octavia.2019-02-06-20.00.log.html | 20:15 |
rm_work | yeah so it says to wait until you get a speaker code and use that over these codes for the summit -- but i don't know if my talks will be accepted yet, hopefully will know before the deadline for these discount codes :P | 20:21 |
rm_work | also it referenced one that i should use instead if i went to the denver PTG? which I did, but not sure i got that code yet, did you guys? | 20:21 |
cgoncalves | I got a 50% off registration code for being a contributor and another 80% code for having attended the last Denver PTG | 20:25 |
johnsom | Yeah, got both of those e-mails as well | 20:26 |
johnsom | The 80% one for the last PTG had the subject : "Invitation & Discount Registration to Open Infrastructure Summit & PTG Denver" and arrived January 17th for me. | 20:27 |
johnsom | If you can't find it, email summitreg@openstack.org | 20:27 |
rm_work | k will look, prolly got it and just forgot | 20:45 |
rm_work | yep, i did | 20:47 |
rm_work | k | 20:47 |
rm_work | so, hopefully speaker codes happen before the 27th | 20:48 |
rm_work | oh nice, no meals at the summit this time | 20:49 |
cgoncalves | there was an email in the past where the foundation wrote they could make refunds if one ends up getting a speaker code | 20:49 |
cgoncalves | not sure it's still valid | 20:50 |
cgoncalves | what! | 20:50 |
rm_work | part of me kinda hates the buffet stuff, because i like to go out and get good food with folks, and also i tend to miss the timing a lot on those :P | 20:50 |
rm_work | still meals at the PTG tho | 20:50 |
cgoncalves | hmm, well, maybe it is for the best actually. meals in last events were not that great | 20:51 |
rm_work | denver PTG food was great | 20:52 |
rm_work | but the summit food is sometimes meh | 20:52 |
rm_work | wtflol | 21:37 |
rm_work | i have deleted every .vscode i can find, homedir and project dir, and deleted the application and unzipped it fresh from the downloaded zipfile | 21:37 |
rm_work | and it STILL has settings somewhere it's reading | 21:38 |
colin- | seeing health-manager processes with like, ~20% cpu utilization | 21:58 |
colin- | that seems weirdly high | 21:59 |
*** celebdor has quit IRC | 22:08 | |
colin- | thinking of lowering health_check_interval to see if it has a positive impact | 22:15 |
colin- | locked health_update_threads and stats_update_threads to 8 each to try and keep it isolated to just a few of the cores on the host | 22:17 |
openstackgerrit | Erik Olof Gunnar Andersson proposed openstack/neutron-lbaas master: Improve performance on get and create/update/delete requests https://review.openstack.org/635076 | 22:33 |
openstackgerrit | Erik Olof Gunnar Andersson proposed openstack/neutron-lbaas master: Improve performance on get and create/update/delete requests https://review.openstack.org/635076 | 22:40 |
eandersson | Cleaning up the git message ^ :p | 22:40 |
rm_work | colin-: which version do you run again? | 22:42 |
rm_work | and yeah, i always see some amount of utilization -- the health manager *is* busy | 22:42 |
rm_work | so i kinda expect it | 22:42 |
rm_work | cgoncalves: HA! finally found where the settings are | 22:42 |
rm_work | "$HOME/Library/Application Support/Code/User/settings.json" | 22:43 |
rm_work | removed the whole Code directory and now i can try setting this up again | 22:43 |
eandersson | rm_work, rocky | 22:44 |
rm_work | hmm | 22:44 |
rm_work | eandersson: what have you done with JudeC btw | 22:44 |
eandersson | He works on Senlin now :p | 22:44 |
colin- | he is lurking around here on freenode somewhere | 22:44 |
rm_work | ah :D | 22:44 |
cgoncalves | Ctrl+, will show settings. you can click on "open settings (json)" (upper right corner). that gives you autocomplete + documentation | 22:45 |
rm_work | i was wondering if you locked him away in a basement or something | 22:45 |
colin- | is that effective in your experience :)? | 22:45 |
eandersson | We try, but he keeps escaping | 22:45 |
rm_work | colin-: not especially, he seems to work best when he has access to good food and some small amount of sunlight | 22:46 |
eandersson | We sit 30 feet away from the cafeteria | 22:46 |
eandersson | so he has access to food at least :D | 22:46 |
colin- | and still chooses to exist on energy drinks lol | 22:48 |
colin- | will try with health_check_interval increased and see if it has any positive impact | 22:50 |
rm_work | what interval were you using? | 22:51 |
rm_work | and how many amps do you have? | 22:51 |
colin- | it was unset so 3 | 22:51 |
rm_work | ah i think my interval was 10 | 22:51 |
colin- | ~650 amphorae | 22:52 |
rm_work | so ~217 per second to handle messages for | 22:52 |
rm_work | how many HM processes are in your rotation? | 22:53 |
colin- | 19 total, of which 8 are exhibiting the high CPU usage | 22:54 |
colin- | consistent with the health_update_threads and stats_update_threads i set above | 22:54 |
colin- | not sure what determines how many HMs i have | 22:54 |
rm_work | wow, 19, that is a lot actually :P | 22:56 |
rm_work | so each one with ~12/s | 22:56 |
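For reference, a rough sketch of the arithmetic rm_work is doing here (numbers from the conversation; this assumes heartbeats spread evenly across HM processes):

```python
# Back-of-the-envelope heartbeat load per health-manager process.
def heartbeats_per_process(num_amps, heartbeat_interval_s, num_processes):
    """Average heartbeat messages each HM process handles per second,
    assuming an even round-robin spread."""
    return (num_amps / heartbeat_interval_s) / num_processes

# ~650 amphorae reporting every 3 seconds, spread over 19 processes:
print(heartbeats_per_process(650, 3, 19))  # ~11.4 messages/sec per process
```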
rm_work | yeah, 20% CPU seems a little high for that | 22:56 |
rm_work | only some were doing it? interesting | 22:56 |
colin- | the higher CPU? yeah but i'm not surprised given that i told those two vars to only use 8 "threads" | 22:58 |
colin- | i think before i set that the others were also using more | 22:58 |
colin- | interestingly, increasing the check interval doesn't seem to have had any positive impact on the CPU utilization per process | 22:59 |
colin- | was pretty confident it would | 22:59 |
rm_work | ah | 23:04 |
rm_work | so yeah | 23:04 |
rm_work | that config is out on the amps, and was set at creation time | 23:04 |
colin- | hehe | 23:04 |
rm_work | so until you re-roll all your amps... they're all going to be on 3s or whatever | 23:04 |
colin- | i see | 23:04 |
rm_work | or until you get the new amp-config api | 23:05 |
eandersson | oh - I wish we had set that earlier :D | 23:05 |
rm_work | and then do a reconfig on all of them | 23:05 |
rm_work | lol yes | 23:05 |
rm_work | sorry :P | 23:05 |
colin- | any other properties related to this in the same category that come to mind? | 23:05 |
rm_work | hmmm | 23:05 |
eandersson | btw I know you love lbaas, but rm_work can you review https://review.openstack.org/#/c/635076/ ? | 23:05 |
colin- | that we might want to also enforce on new resources | 23:05 |
rm_work | the HM address list is also static | 23:05 |
colin- | got it | 23:06 |
rm_work | so yeah be careful about those | 23:06 |
rm_work | use FLIPs if possible so you won't lose them | 23:06 |
rm_work | because old amps will start failing if too many of their IPs change | 23:06 |
colin- | wanted to use FQDNs but it was not meant to be ;) | 23:06 |
rm_work | yeah lol | 23:06 |
rm_work | sorry | 23:06 |
rm_work | no DNS :P | 23:06 |
rm_work | woah, that was a weird repeat | 23:07 |
rm_work | eandersson: what is this project? do we support this? :P | 23:07 |
eandersson | I think it's the Octavia replacement | 23:07 |
rm_work | <_< | 23:07 |
eandersson | :D | 23:07 |
eandersson | to be fair we need the same patch for Octavia as well =] | 23:08 |
rm_work | does this really make that huge of a perf difference? | 23:08 |
eandersson | 40s down to 1s | 23:08 |
rm_work | it's loading a single LB O_o | 23:08 |
eandersson | for creating members | 23:08 |
rm_work | how does that take 40s | 23:08 |
eandersson | If you have 400 members | 23:08 |
rm_work | erg | 23:09 |
rm_work | i don't see any test changes? | 23:09 |
rm_work | or new tests? >_> | 23:09 |
eandersson | So I created 200 members in my lab and it went from 24s to 1s | 23:09 |
rm_work | seems this stuff is either untested or badly tested, lol | 23:09 |
eandersson | Well the old tests are more than enough for this | 23:09 |
rm_work | hmm | 23:09 |
eandersson | same as https://review.openstack.org/#/c/568361/ | 23:09 |
eandersson | but tbh I don't really need it merged | 23:09 |
eandersson | I just need to fix it internally, and want to make sure there isn't anything crazy going on | 23:10 |
eandersson | and if the community benefits from it thats great! | 23:10 |
rm_work | seems fine | 23:10 |
rm_work | i just like to see coverage not drop | 23:10 |
rm_work | only increase! :P | 23:10 |
eandersson | same of course =] | 23:10 |
eandersson | I mean if this was Octavia I would fully agree | 23:10 |
rm_work | generally any CR with no changes to tests is a red flag | 23:10 |
rm_work | but yeah, since i give just about zero care-units about n-lbaas... | 23:11 |
rm_work | the code itself seems fine so long as the tempest stuff is passing | 23:11 |
eandersson | It's not really changing any underlying functionality, and nothing that the existing tests don't already cover | 23:11 |
rm_work | which it seems they were | 23:11 |
rm_work | so there you go | 23:12 |
eandersson | :p | 23:12 |
eandersson | If I had more time I would add more tests thou | 23:12 |
colin- | hey, so on the topic of the health-manager: i feel like i have a pretty clear picture now of how it's operating in the control plane, but i would like to better understand how it could be improved aside from using less CPU (for example). should i be expressing more or less config to it as our needs grow, generally? | 23:13 |
eandersson | You can never have enough coverage | 23:13 |
colin- | what is the expected posture for it? | 23:14 |
colin- | (how many HMs, is that a factor of another value?) | 23:14 |
rm_work | so, my strategy was to have enough HMs that we wouldn't see more than ~50 messages per second to any one | 23:15 |
rm_work | and spread them out on the infrastructure enough that we wouldn't see huge outages | 23:15 |
rm_work | but there's also math about how many you have and what your interval / threshold is | 23:15 |
colin- | was it a consideration ever for the amps to just report their health back? push model v pull? | 23:15 |
johnsom | Geez, look away and there is a huge scrollback | 23:15 |
colin- | hope that doesn't sound judge-y just trying to picture it mentally | 23:16 |
rm_work | if you have 6, and 2 go down, if your interval is 10 and your threshold is 20, then it's possible to get spurious failovers (just as a simple example) if it tries both down HMs in a row | 23:16 |
rm_work | and by "down" i mean, network unroutable / HV dies / whatever | 23:16 |
johnsom | Also make sure you have the HM performance patch. Not sure if you deployment is up to date or not. | 23:16 |
rm_work | it *is* push | 23:16 |
rm_work | that's what's happening | 23:16 |
rm_work | the HMs are the push destination | 23:16 |
eandersson | https://github.com/openstack/octavia/commit/8c54a226308b2d74c77090e7998100209268694f ? | 23:17 |
rm_work | the amps push their updates via UDP packets on the set interval, round-robin across the list of HM ip/ports | 23:17 |
johnsom | eandersson Yes | 23:17 |
rm_work | the HMs just process the amp's health reporting | 23:17 |
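A toy illustration of that push model; the real amphora agent also HMAC-signs and encodes its health payload, and the endpoints below are made up (5555 is the HM listener's usual default port):

```python
import itertools
import socket
import time

def push_heartbeats(hm_endpoints, payload: bytes, interval_s: float):
    """Send UDP heartbeats round-robin across the configured HM list."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    for host, port in itertools.cycle(hm_endpoints):
        sock.sendto(payload, (host, port))  # fire-and-forget; loss is tolerated
        time.sleep(interval_s)

# Example (hypothetical addresses):
# push_heartbeats([("203.0.113.5", 5555), ("203.0.113.6", 5555)], b"<health msg>", 10)
```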
eandersson | Yea we have that one | 23:17 |
colin- | understood rm_work thanks for clarifying that for me | 23:18 |
colin- | all of that just processing the reports? i'm surprised | 23:18 |
johnsom | Oh, since you have neutron-lbaas you probably have the status sync stuff enabled. That will put a higher load on too | 23:18 |
colin- | it must be doing more than i realize | 23:18 |
colin- | no, we don't set not setting sync_provisioning_status johnsom | 23:18 |
colin- | whoops sorry, didn't realize i'd started that sentence | 23:19 |
rm_work | lol | 23:19 |
eandersson | man more format in logging statements, I thought I fixed all of those :p | 23:19 |
johnsom | Are you using the event streamer though? (separate setting) | 23:19 |
rm_work | hmmm so you use n-lbaas but DON'T have the sync status? | 23:19 |
rm_work | uhhh | 23:19 |
rm_work | that seems ... non-viable | 23:19 |
rm_work | nothing would ever go ACTIVE in n-lbaas so you'd never be able to do anything with LBs you create | 23:20 |
johnsom | We should probably try to make a hacking check for that logging issue | 23:20 |
eandersson | We don't connect lbaas and octavia | 23:20 |
rm_work | ah it's two different deployments? | 23:20 |
johnsom | Ah, ok | 23:20 |
colin- | yes | 23:20 |
rm_work | interesting. guess that makes sense :P | 23:20 |
eandersson | semi offtopic but I'll throw in another patch to change things like this to be lazy-loaded | 23:21 |
eandersson | > LOG.debug('Health Update finished in: {0} seconds'.format( | 23:21 |
eandersson | No need to build strings we don't use :p | 23:22 |
rm_work | ah heh | 23:22 |
rm_work | yeah that was one of mine | 23:22 |
rm_work | and yeah that's fair | 23:22 |
rm_work | i tend to discount the cycles needed for logging stuff | 23:23 |
eandersson | yea - I mean.. so minor :P | 23:23 |
eandersson | but if it is done often enough it adds up | 23:23 |
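The difference being discussed, shown side by side: str.format() builds the message unconditionally, while passing args to the logger defers interpolation until a handler actually emits the record:

```python
import logging

LOG = logging.getLogger(__name__)
elapsed = 0.042  # illustrative value

# Eager: the string is built even when DEBUG logging is disabled.
LOG.debug('Health Update finished in: {0} seconds'.format(elapsed))

# Lazy: interpolation only happens if the record is actually emitted.
LOG.debug('Health Update finished in: %s seconds', elapsed)
```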
colin- | the sense i'm getting from the conversation is that what i'm observing is mostly within expectations, is that right? | 23:23 |
rm_work | honestly, it's hard for me to say what a good expectation is | 23:26 |
rm_work | what i thought was "normal baseline" actually turned out to be insanity | 23:26 |
rm_work | ... a couple of different times | 23:26 |
colin- | ok | 23:27 |
openstackgerrit | Merged openstack/octavia master: Improve local hacking checks test coverage https://review.openstack.org/629955 | 23:28 |
colin- | i'm concerned about this because it correlates directly with the size of the fleet and right now i don't feel like i have a lot of control over it | 23:29 |
colin- | any tips for shoring up confidence about managing the resource needs of the HM beyond this many amps? | 23:30 |
johnsom | colin- Some time we should talk about what you are seeing and what you have configured. | 23:37 |
johnsom | colin- Also, I have an HM stress tool that you can use to simulate some levels of load. It's how I tested the performance patch. | 23:37 |
johnsom | I didn't follow the whole scrollback, so don't have all of the details. | 23:38 |
johnsom | It is total crap code as I slapped it together, but it does work: https://github.com/johnsom/stresshm | 23:38 |
johnsom | I could push a few thousand amps per HM on my desktop VM using that tool. | 23:39 |
colin- | cool, thanks. the best recap i can offer is just that i noticed higher than expected (to me) CPU utilization on some of the health-manager processes (there were 19 total and 8 of them were at ~30% CPU utilization checking on ~650 amps at a default interval of 3s) | 23:40 |
rm_work | yeah, seeing that was one of the times i had to re-evaluate my baseline for "normal" | 23:40 |
johnsom | I couldn't really stress the HM beyond that as my test VM couldn't spawn enough stress threads | 23:40 |
johnsom | It's only some of the HMs? | 23:40 |
colin- | 8 of them because i manually set health_update_threads and stats_update_threads to 8 when trying to get a handle on overall CPU utilization | 23:41 |
colin- | at least i think that's why it's only 8 of them | 23:41 |
johnsom | So, a few things... | 23:42 |
johnsom | This setting: https://docs.openstack.org/octavia/latest/configuration/configref.html#health_manager.health_check_interval | 23:42 |
johnsom | Which defaults to 3 is how often the health check thread polls the DB to find stale amphora records. | 23:42 |
colin- | oh | 23:43 |
colin- | heartbeat frequency what i wanted, then? | 23:43 |
johnsom | There are basically two functions to the HM: 1. it polls the DB looking for missing/stale amphora. 2. It receives the health heartbeats | 23:43 |
johnsom | This is the interval between heartbeats from the amps: https://docs.openstack.org/octavia/latest/configuration/configref.html#health_manager.heartbeat_interval | 23:43 |
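A simplified sketch of the two roles johnsom describes; the helper functions are hypothetical stand-ins, not Octavia's actual code, and the interval values match the ones mentioned in this conversation:

```python
import time

HEALTH_CHECK_INTERVAL = 3   # HM-side: how often to poll the DB for stale amps
HEARTBEAT_INTERVAL = 10     # amp-side: how often each amphora sends a heartbeat

def find_stale_amphorae():
    """Hypothetical stand-in for the DB query returning amps whose last
    heartbeat is older than the failover threshold."""
    return []

def trigger_failover(amp):
    """Hypothetical stand-in for kicking off an amphora failover."""

def health_check_loop():
    # Role 1: poll the DB every health_check_interval looking for stale amps.
    # Role 2 (separate listener threads, not shown) receives the UDP
    # heartbeats the amps push every heartbeat_interval seconds.
    while True:
        for amp in find_stale_amphorae():
            trigger_failover(amp)
        time.sleep(HEALTH_CHECK_INTERVAL)
```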
colin- | figured i had a 50/50 shot, oh well | 23:45 |
colin- | appreciate the clarification on that thanks | 23:45 |
johnsom | Sure. Now the other issue is that as you have added HMs, only the newly booted amps pick up the new list. So you may have a fleet that is going to be hot on an older list of HMs. | 23:45 |
johnsom | I just posted patches that let you fix that without failovers. | 23:46 |
johnsom | This one: https://review.openstack.org/#/c/632842/ | 23:46 |
johnsom | Basically you would update the controller list in the CWs, then call this API across your amps to have them update the controller list. | 23:47 |
colin- | the number of hosts in that list isn't actually changing for me, it was at two previously and continues to be there (running the octavia services on two hosts in parallel) | 23:48 |
johnsom | So with your numbers, you should only have around 34 amps per HM, that is super low. | 23:48 |
colin- | i don't follow, how did you derive that | 23:49 |
johnsom | Umm, now I am slightly confused. You said you had 19 HMs running right? or do you mean threads there and not processes? | 23:49 |
colin- | was referring to processes of /usr/local/openstack/bin/octavia-health-manager in ps output | 23:50 |
colin- | how about you? | 23:50 |
johnsom | Oh, ok, so two hosts running HM, but they each have a bunch of processes. Got it. | 23:50 |
colin- | yes | 23:50 |
johnsom | We deploy at least three HM hosts. | 23:51 |
colin- | do you have hosts that only run HM services? | 23:51 |
colin- | just curious | 23:51 |
johnsom | No, they are running ~20 containers with various control plane processes in them | 23:52 |
colin- | ok, sounds familiar | 23:52 |
johnsom | Ok, so you have 325 amps per HM instance. That is some load, but not anything super high. With only 8 workers, yeah, I would expect you to have some load. Those eight are always going to be busy. | 23:54 |
colin- | any advice for scaling that meaningfully beyond just trying a higher value and seeing how the control plane reacts? would like to be more deliberate than that | 23:56 |
johnsom | Yeah, we did the math on this back when rm_work was doing his deployment. Let me dig around. | 23:57 |
johnsom | It's a bit hard as it's dependent on your hardware, and most importantly the DB performance. | 23:57 |
rm_work | ah yeah lol he said health_check_interval earlier and I even read it as heartbeat_interval i think | 23:58 |
colin- | no harm done i knew it was a toss up when i picked it, was eyeballing heartbeat but wasn't sure which heart it was discussing ;p | 23:58 |
rm_work | also: wow | 23:58 |
rm_work | yeah i totally did not get what you meant | 23:58 |
rm_work | i thought you had 19 HMs running | 23:58 |
rm_work | I ran 6 | 23:58 |
rm_work | but you only actually run 2 | 23:59 |
colin- | yeah that wasn't super clear sorry | 23:59 |
johnsom | Yeah, capped at 8 workers each | 23:59 |
rm_work | so that's actually ~109 per HM | 23:59 |
colin- | right | 23:59 |
rm_work | per second | 23:59 |
rm_work | that's a lot busier | 23:59 |
rm_work | I would definitely run more than two | 23:59 |