harlowja | klindgren ya, the question is how to get there from where we are now | 00:01 |
---|---|---|
harlowja | that is the mega-change that i'm unsure how to get to :-P | 00:01 |
*** arnoldje has joined #openstack-performance | 00:07 | |
*** arnoldje has quit IRC | 01:05 | |
*** arnoldje has joined #openstack-performance | 01:16 | |
*** paco20151113 has joined #openstack-performance | 01:23 | |
*** arnoldje has quit IRC | 01:28 | |
*** markvoelker has joined #openstack-performance | 01:37 | |
*** mriedem_away is now known as mriedem | 01:59 | |
*** mwagner has joined #openstack-performance | 02:03 | |
*** mriedem has quit IRC | 02:07 | |
kun_huang | ha | 02:43 |
kun_huang | harlowja: "i think we (yahoo) really want to switch osprofiler to not ceilometer" what's yahoo's alternative to ceilometer? | 02:44 |
harlowja | something internal that uses hbase or tsdb or something | 02:45 |
harlowja | as u can imagine metrics gathering and such existed for many years before ceilometer | 02:45 |
harlowja | many many years | 02:45 |
harlowja | so it's ummm, sorta hard to say, use this ceilometer thing, when that other stuff has existed for years and is being actively worked on | 02:46 |
harlowja | and the fact that ceilometer has (had?) scale issues doesn't make that sell any easier.. | 02:47 |
harlowja | i think kun_huang http://www.slideshare.net/HBaseCon/ecosystem-session-6/6 references what exists (or what is being built or something) | 02:48 |
harlowja | probably other slides somewhere too | 02:49 |
harlowja | but ya, that's the gist of the reasoning... | 02:50 |
*** bapalm has quit IRC | 03:06 | |
*** harshs has quit IRC | 03:06 | |
*** harshs has joined #openstack-performance | 03:16 | |
*** arnoldje has joined #openstack-performance | 03:33 | |
*** harshs has quit IRC | 03:46 | |
kun_huang | harlowja: I'm sorry, I didn't receive notification of your message... | 04:03 |
kun_huang | harlowja: thank you first, does your team use redis to store some perf data? | 04:04 |
boris-42 | harlowja: so we will do mongodb and influxdb and elasticsearch | 04:05 |
boris-42 | harlowja: for osprofiler | 04:05 |
boris-42 | harlowja: so it will work well | 04:05 |
*** dims has quit IRC | 04:06 | |
kun_huang | boris-42: DinaBelova said there is a performance team in mirantis | 04:09 |
kun_huang | are you in that team? I'm curious about how this team works with other teams in mirantis | 04:13 |
kun_huang | we(huawei) are running a public cloud, but we don't have such a performance team to solve issues | 04:14 |
kun_huang | they solve every performance issue by deploying more nodes | 04:14 |
*** harshs has joined #openstack-performance | 04:46 | |
*** swann has joined #openstack-performance | 04:56 | |
*** serverascode has quit IRC | 05:00 | |
*** swann_ has quit IRC | 05:00 | |
*** mgagne has quit IRC | 05:00 | |
*** mgagne has joined #openstack-performance | 05:00 | |
*** serverascode has joined #openstack-performance | 05:03 | |
klindgren | when you can print servers.... | 05:06 |
kun_huang | klindgren: ? | 05:07 |
klindgren | huawei is, among many other things, a server vendor... no? We at least tested some SKUs from you guys. So when you can literally make servers, just adding more boxes is probably at some point the easiest solution. | 05:10 |
kun_huang | klindgren: huawei has many product lines: switches, servers, cloud computing, storage like EMC's, an OS forked from SUSE, cellphones..... | 05:12 |
kun_huang | klindgren: I'm in cloud computing... | 05:12 |
*** rpodolyaka1 has joined #openstack-performance | 05:17 | |
*** rpodolyaka1 has quit IRC | 05:22 | |
*** rpodolyaka1 has joined #openstack-performance | 05:42 | |
*** aswadr has joined #openstack-performance | 05:54 | |
*** harshs has quit IRC | 06:11 | |
*** aswadr has quit IRC | 06:55 | |
*** rpodolyaka1 has quit IRC | 06:56 | |
boris-42 | kun_huang: so actually we have 2 teams related to performance/scalability | 06:57 |
boris-42 | kun_huang: One is QA team another one is RnD | 06:57 |
boris-42 | kun_huang: so we are testing stuff on a daily basis and trying to figure out the issues; after that we involve the component teams, who fix the issues we found in our product/upstream | 06:58 |
kun_huang | you have only TWO teams, that is the point. There are many QA teams and RnD teams in huawei | 07:02 |
kun_huang | with geographic and political competition, too | 07:03 |
*** harlowja_at_home has joined #openstack-performance | 07:03 | |
kun_huang | and it's not strange to me that I get no performance topics from the public cloud side | 07:03 |
kun_huang | the QA team I will meet these days looks like "open guys", and I could learn something | 07:04 |
*** rpodolyaka1 has joined #openstack-performance | 07:07 | |
*** itsuugo has joined #openstack-performance | 07:07 | |
*** arnoldje has quit IRC | 07:08 | |
*** harlowja_at_home has quit IRC | 07:17 | |
*** rpodolyaka1 has quit IRC | 07:23 | |
*** rpodolyaka1 has joined #openstack-performance | 07:25 | |
*** itsuugo has quit IRC | 08:06 | |
*** itsuugo has joined #openstack-performance | 08:07 | |
*** aojea has joined #openstack-performance | 08:12 | |
*** itsuugo has quit IRC | 08:14 | |
*** rpodolyaka1 has quit IRC | 08:47 | |
*** xek has joined #openstack-performance | 09:03 | |
*** rpodolyaka1 has joined #openstack-performance | 09:23 | |
*** rpodolyaka1 has quit IRC | 09:43 | |
*** rpodolyaka1 has joined #openstack-performance | 09:45 | |
*** markvoelker has quit IRC | 10:05 | |
*** paco20151113 has quit IRC | 10:12 | |
*** aojea has quit IRC | 10:49 | |
*** rpodolyaka1 has quit IRC | 10:50 | |
*** itsuugo has joined #openstack-performance | 11:06 | |
*** markvoelker has joined #openstack-performance | 11:06 | |
*** dims has joined #openstack-performance | 11:08 | |
*** markvoelker has quit IRC | 11:11 | |
*** redixin has joined #openstack-performance | 11:21 | |
*** rmart04 has joined #openstack-performance | 11:23 | |
*** rpodolyaka1 has joined #openstack-performance | 11:57 | |
*** rpodolyaka1 has quit IRC | 11:59 | |
*** rpodolyaka1 has joined #openstack-performance | 12:09 | |
*** itsuugo has quit IRC | 12:19 | |
*** itsuugo has joined #openstack-performance | 12:19 | |
*** markvoelker has joined #openstack-performance | 12:37 | |
*** markvoelker has quit IRC | 12:42 | |
*** itsuugo has quit IRC | 12:53 | |
*** markvoelker has joined #openstack-performance | 13:27 | |
*** regXboi has joined #openstack-performance | 13:34 | |
*** rpodolyaka1 has quit IRC | 13:35 | |
*** msemenov has joined #openstack-performance | 13:52 | |
*** itsuugo has joined #openstack-performance | 13:54 | |
*** rpodolyaka1 has joined #openstack-performance | 13:56 | |
*** mdorman has joined #openstack-performance | 14:09 | |
*** itsuugo has quit IRC | 14:20 | |
*** itsuugo has joined #openstack-performance | 14:20 | |
*** mriedem has joined #openstack-performance | 14:32 | |
*** rvasilets___ has joined #openstack-performance | 14:48 | |
*** ozamiatin has joined #openstack-performance | 14:49 | |
*** rohanion has joined #openstack-performance | 14:51 | |
*** mriedem has quit IRC | 14:55 | |
DinaBelova | harlowja - did you have a chance to wake up? :) | 14:57 |
*** manand has joined #openstack-performance | 14:57 | |
DinaBelova | probably no :) | 14:59 |
DinaBelova | #startmeeting Performance Team | 15:00 |
openstack | Meeting started Tue Nov 17 15:00:02 2015 UTC and is due to finish in 60 minutes. The chair is DinaBelova. Information about MeetBot at http://wiki.debian.org/MeetBot. | 15:00 |
openstack | Useful Commands: #action #agreed #help #info #idea #link #topic #startvote. | 15:00 |
openstack | The meeting name has been set to 'performance_team' | 15:00 |
DinaBelova | hello folks! | 15:00 |
rvasilets___ | o/ | 15:00 |
rohanion | Hi! | 15:00 |
ozamiatin | o/ | 15:00 |
kun_huang | good evening :) | 15:00 |
kun_huang | o/ | 15:00 |
DinaBelova | kun_huang - good evening sir | 15:00 |
DinaBelova | so today's agenda | 15:00 |
boris-42 | hi | 15:00 |
DinaBelova | #link https://wiki.openstack.org/wiki/Meetings/Performance#Agenda_for_next_meeting | 15:00 |
DinaBelova | there was a complaint last time that there was not enough time to fill it | 15:01 |
DinaBelova | although this time it doesn't look very big either :) | 15:01 |
DinaBelova | so let's start with action items | 15:01 |
DinaBelova | #topic Action Items | 15:01 |
DinaBelova | last time we had two action items | 15:01 |
DinaBelova | #1 was about filling the etherpad https://etherpad.openstack.org/p/rally_scenarios_list with information about Rally scenarios used | 15:02 |
DinaBelova | in your companies :) | 15:02 |
DinaBelova | well, it looks like nothing has changed since previous meeting | 15:02 |
DinaBelova | :( | 15:02 |
*** AugieMena has joined #openstack-performance | 15:02 | |
DinaBelova | I really hoped augiemena3, Kristian_, patrykw_ will fill it | 15:02 |
DinaBelova | although I do not see them here today | 15:03 |
kun_huang | I know Kevin had shared a topic about rally and neutron's control plane benchmarking | 15:03 |
*** ikhudoshyn has joined #openstack-performance | 15:03 | |
DinaBelova | kun_huang - oh, that's cool | 15:03 |
DinaBelova | do you have link to that info? | 15:03 |
*** mriedem has joined #openstack-performance | 15:03 | |
kun_huang | a topic in tokyo, wait a minute | 15:04 |
AugieMena | Dina - my bad, should have filled in with some info | 15:04 |
* mriedem joins late | 15:04 | |
* dims waves hi | 15:04 | |
kun_huang | #link https://www.youtube.com/watch?v=a0qlsH1hoKs | 15:04 |
DinaBelova | #action everyone (who uses Rally for OpenStack testing inside your companies) fill etherpad https://etherpad.openstack.org/p/rally_scenarios_list with used scenarios | 15:04 |
DinaBelova | AugieMena - :) | 15:04 |
DinaBelova | please spend some time on this etherpad filling | 15:04 |
DinaBelova | if we want to create a standard it'll be useful to collect some info beforehand | 15:05 |
DinaBelova | mriedem, dims o/ | 15:05 |
DinaBelova | @kun_huang thank you sir | 15:05 |
DinaBelova | lemme take a quick look | 15:05 |
DinaBelova | ah, that's video | 15:05 |
DinaBelova | so after the meeting :) | 15:05 |
DinaBelova | #action DinaBelova go through the https://www.youtube.com/watch?v=a0qlsH1hoKs | 15:05 |
kun_huang | no problem | 15:05 |
DinaBelova | ok, cool | 15:06 |
DinaBelova | so one more action item was on Kristian_ | 15:06 |
*** llu-laptop has joined #openstack-performance | 15:06 | |
*** amaretskiy has joined #openstack-performance | 15:06 | |
DinaBelova | he promised to collect the information about Rally blanks inside ATT | 15:06 |
DinaBelova | it looks like he was not able to join us today | 15:06 |
DinaBelova | #action DinaBelova ping Kristian_ about internal ATT Rally feedback gathering | 15:07 |
DinaBelova | so it looks like we went through the action items :) | 15:07 |
DinaBelova | just once more time - please fill https://etherpad.openstack.org/p/rally_scenarios_list | 15:07 |
DinaBelova | that will be super useful for future recommendations / methodologies creation | 15:08 |
*** pkoniszewski has joined #openstack-performance | 15:08 | |
*** dansmith has joined #openstack-performance | 15:08 | |
DinaBelova | I guess we may go to the next topic | 15:08 |
DinaBelova | #topic Nova-conductor performance issues | 15:08 |
DinaBelova | #link https://etherpad.openstack.org/p/remote-conductor-performance | 15:08 |
*** alaski has joined #openstack-performance | 15:08 | |
boris-42 | DinaBelova: can we return to the previous topic? | 15:08 |
DinaBelova | boris-42 heh :) | 15:09 |
DinaBelova | I dunno how to make that easy using the bot controls | 15:09 |
dansmith | #undo | 15:09 |
DinaBelova | thanks! | 15:09 |
DinaBelova | #undo | 15:09 |
openstack | Removing item from minutes: <ircmeeting.items.Link object at 0xaf28d90> | 15:09 |
DinaBelova | boris-42 - feel free | 15:09 |
boris-42 | dansmith: nice | 15:09 |
boris-42 | DinaBelova: so we (Rally team) recently started working on a certification task | 15:10 |
*** bauzas has joined #openstack-performance | 15:10 | |
boris-42 | #link https://github.com/openstack/rally/tree/master/certification/openstack | 15:10 |
*** claudiub has joined #openstack-performance | 15:10 | |
DinaBelova | boris-42 - sadly I do not have much info about this initiative | 15:10 |
DinaBelova | lemme take a quick look | 15:10 |
boris-42 | DinaBelova: which is a much better way to share your experience | 15:10 |
* bauzas waves | 15:10 | |
boris-42 | DinaBelova: than just using etherpads | 15:11 |
DinaBelova | boris-42 - that may be cool | 15:11 |
DinaBelova | so it's some task creation on cloud validation | 15:11 |
boris-42 | DinaBelova: so basically it's single task that accepts few arguments about cloud and should generate proper load and test everything that you specified | 15:11 |
DinaBelova | boris-42 - a-ha, cool | 15:11 |
boris-42 | DinaBelova: so basically it's executable etherpad | 15:12 |
DinaBelova | ok, so that may be very useful for this purpose | 15:12 |
boris-42 | DinaBelova: that you are trying to collect | 15:12 |
DinaBelova | thank you sir | 15:12 |
DinaBelova | we may definitely use it | 15:12 |
mriedem | so rally as defcore? | 15:12 |
*** atuvenie_ has joined #openstack-performance | 15:12 | |
kun_huang | boris-42: Has mirantis team used this feature? | 15:12 |
*** abalutoiu has joined #openstack-performance | 15:12 | |
boris-42 | mriedem: so nope | 15:12 |
regXboi | mriedem: I'm trying to wrap my head around that :) | 15:12 |
*** lpetrut has joined #openstack-performance | 15:13 | |
DinaBelova | #info we may use https://github.com/openstack/rally/tree/master/certification/openstack to collect information about Rally scenarios used in various companies | 15:13 |
boris-42 | mriedem: it's a pain in the neck to use rally to validate OpenStack | 15:13 |
DinaBelova | kun_huang - I did not hear about this frankly speaking | 15:13 |
boris-42 | mriedem: because you need to create such a task and it usually takes 2-3 weeks | 15:13 |
DinaBelova | kun_huang - but as boris-42 said this initiative is fairly new | 15:13 |
boris-42 | mriedem: so we decided to create it once and avoid duplication of effort | 15:14 |
DinaBelova | boris-42 - very useful, thank you sir | 15:14 |
boris-42 | mriedem: our goal is not to say whether it is openstack or not | 15:14 |
boris-42 | kun_huang: so we just recently made it | 15:14 |
boris-42 | kun_huang: I know about only 1 usage and there were a bunch of issues that I am going to address soon | 15:14 |
mriedem | ok, maybe the readme there needs more detail | 15:15 |
DinaBelova | ok, very cool. thanks boris-42! something else to mention here? | 15:15 |
boris-42 | mriedem: what you would like to see there | 15:15 |
boris-42 | mriedem: ? | 15:15 |
mriedem | what it is and what it's used for | 15:15 |
dims | boris-42 why is it called "certification" then? :) | 15:15 |
mriedem | note that i'm not a rally user | 15:15 |
mriedem | right, 'certification' makes me think defcore | 15:15 |
DinaBelova | :) | 15:15 |
dims | y | 15:15 |
AugieMena | would someone provide a one-liner on what the purpose of it is? | 15:16 |
kun_huang | I would like to say that is some kind of task template | 15:16 |
kun_huang | tasks template | 15:16 |
DinaBelova | AugieMena - single task to check all OpenStack cloud. And you may fill it with all scenarios you like | 15:16 |
DinaBelova | kun_huang - is that accurate? | 15:16 |
boris-42 | AugieMena: just that it will put proper load and SLA on your cloud | 15:16 |
rvasilets___ | I guess to run one big task against the cloud and to see measurements of different resources | 15:17 |
boris-42 | dims: nope not scenarios | 15:17 |
boris-42 | DinaBelova: nope not scenarios | 15:17 |
kun_huang | DinaBelova: my understanding | 15:17 |
boris-42 | dims: sorry | 15:17 |
DinaBelova | boris-42 :) | 15:17 |
AugieMena | so how will it help make it easier to gather info about what Rally scenarios various companies are using? | 15:17 |
boris-42 | It's a single task that contains a bunch of subtasks that will test the specified services with proper load (based on size & quality of cloud) and proper SLA | 15:18 |
DinaBelova | AugieMena - boris-42 just proposed to create these lists in form of these "certification" tasks to be able to run them | 15:18 |
AugieMena | OK, I see | 15:18 |
DinaBelova | ack! | 15:19 |
boris-42 | AugieMena: a separate scenario doesn't mean anything | 15:19 |
boris-42 | AugieMena: without its arguments, context, runner.... | 15:19 |
DinaBelova | boris-42 - moving forward? :) | 15:19 |
boris-42 | DinaBelova: there is still one question | 15:19 |
DinaBelova | boris-42 - go ahead :) | 15:20 |
*** claudiub has quit IRC | 15:20 | |
boris-42 | dims: so certification is picked because it's like "Rally certification of your cloud" | 15:20 |
kun_huang | boris-42: DinaBelova pls make a note to describe rally's certification work, blogs or slides... I will help to understand | 15:20 |
boris-42 | dims: it certifies the scalability & performance of everything.. | 15:20 |
* DinaBelova guesses boris-42 meant Dina | 15:20 | |
AugieMena | boris-42 - ok, understand the need to provide specifics about arguments used in the scenarios | 15:21 |
DinaBelova | #idea describe rally's certification work, blogs or slides - kun_huang can help with it | 15:21 |
dims | boris-42 : i understand, some link to the official certification activities would help evangelize this better. you will get this question asked again and again :) | 15:21 |
*** kashyap has joined #openstack-performance | 15:22 | |
boris-42 | dims: ) | 15:22 |
DinaBelova | dims - yep, documentation is everything here :) | 15:22 |
boris-42 | dims: honestly we can rename this directory to anything | 15:22 |
boris-42 | dims: but personally I don't like word validation because validation is what Tempest is doing | 15:22 |
boris-42 | =) | 15:22 |
rvasilets___ | ) | 15:23 |
rvasilets___ | or not doing) | 15:23 |
*** arnoldje has joined #openstack-performance | 15:23 | |
DinaBelova | rvasilets___ :) | 15:23 |
DinaBelova | ok, anything else here? | 15:23 |
DinaBelova | ok, moving forward | 15:24 |
DinaBelova | #topic Nova-conductor performance issues | 15:24 |
DinaBelova | ok, so some historical info | 15:24 |
*** andreykurilin__ has joined #openstack-performance | 15:24 | |
DinaBelova | during the Tokyo summit several operators including GoDaddy (ping klindgren) mentioned issues observed around nova-conductor | 15:25 |
DinaBelova | #link https://etherpad.openstack.org/p/remote-conductor-performance | 15:25 |
DinaBelova | Rackspace mentioned it as well | 15:25 |
* klindgren waves | 15:25 | |
DinaBelova | so it was decided it'll be cool idea to investigate this issue | 15:25 |
DinaBelova | currently all known info is collected in the etherpad ^^ | 15:26 |
DinaBelova | SpamapS has started the investigation of the issue on the local lab | 15:26 |
DinaBelova | afaik he had to switch to something else yesterday, so not sure if anything new has happened | 15:26 |
mriedem | i'd be interested to know if moving to oslo.db >= 1.12 helps anything | 15:27 |
rpodolyaka1 | why would it? | 15:27 |
dansmith | also, is everyone still using mysqldb-python in these tests? | 15:27 |
mriedem | dansmith: right | 15:27 |
mriedem | b/c oslo.db < 1.12 | 15:27 |
dansmith | mriedem: is that a yes, or agreement with the question? | 15:27 |
mriedem | rpodolyaka1: oslo.db 1.12 switched to pymysql | 15:27 |
rpodolyaka1 | oslo.db >= 1.12 does not mean they use pymysql | 15:27 |
mriedem | dansmith: that's agreement | 15:27 |
mriedem | and yes | 15:28 |
rpodolyaka1 | it's only used in oslo.db tests | 15:28 |
rpodolyaka1 | it's up to operator to specify the connection string | 15:28 |
dansmith | right | 15:28 |
mriedem | ooo | 15:28 |
rpodolyaka1 | you may use mysql-python as well | 15:28 |
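The point rpodolyaka1 makes here is that the MySQL driver is picked by the SQLAlchemy connection URL the operator configures (the `[database] connection` option in nova.conf), not by the installed oslo.db version. A minimal sketch of reading the driver out of such a URL; the helper name `mysql_driver` is hypothetical, and it assumes SQLAlchemy's convention that a bare `mysql://` URL uses the MySQLdb (MySQL-Python) driver:

```python
from urllib.parse import urlsplit


def mysql_driver(connection: str) -> str:
    """Return the DBAPI driver a SQLAlchemy-style MySQL URL selects (hypothetical helper)."""
    # "mysql+pymysql://..." has the scheme "mysql+pymysql": dialect, then driver.
    scheme = urlsplit(connection).scheme
    dialect, _, driver = scheme.partition("+")
    if dialect != "mysql":
        raise ValueError(f"not a MySQL connection URL: {connection!r}")
    # Assumption: with no "+driver" part, SQLAlchemy falls back to MySQLdb.
    return driver or "mysqldb"


print(mysql_driver("mysql+pymysql://nova:secret@db.example.org/nova"))
print(mysql_driver("mysql://nova:secret@db.example.org/nova"))
```

So moving to PyMySQL is a matter of changing the URL prefix to `mysql+pymysql://`, which is why an oslo.db upgrade alone does not switch a deployment over.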
mriedem | have we deprecated mysql-python? | 15:28 |
DinaBelova | and afaik Rackspace fixed this (or probably looking like this) issue by moving back to MySQL-Python | 15:28 |
rpodolyaka1 | mriedem: I think we actually run the unit tests for it in oslo.db | 15:29 |
dansmith | DinaBelova: I think you're conflating two things there | 15:29 |
mriedem | rax has an out of tree change (that's also a DNM in nova) for direct sql for some db APIs | 15:29 |
alaski | DinaBelova: rackspace went back to an out of tree db api | 15:29 |
mriedem | this is what rax has https://review.openstack.org/#/c/243822/ | 15:29 |
DinaBelova | dansmith - probably, I just remember conversation on Tokyo summit about issue like that | 15:29 |
alaski | essentially dropping sqlalchemy for some calls | 15:29 |
DinaBelova | alaski - a-ha, thank you sir | 15:29 |
DinaBelova | thanks dansmith, mriedem | 15:30 |
mriedem | it'd also be good to know what the conductor/compute ratios are | 15:30 |
DinaBelova | klindgren ^^ | 15:30 |
mriedem | there is some info in the etherpad | 15:31 |
rpodolyaka1 | mriedem: e.g. https://review.openstack.org/#/c/246198/ , there is a separate gate job for mysql-python | 15:31 |
mriedem | rpodolyaka1: so why isn't that deprecated? we want people to move to pymysql don't we? | 15:31 |
alaski | that being said, we are using mysqldb | 15:31 |
DinaBelova | mriedem - yeah, conductor service with 20 workers per server (2 servers, 16 cores per server), 250 HV in the cell | 15:31 |
klindgren | Do you want to see if oslo.db >= 1.12 works better ? Or if pymysql works better | 15:32 |
mriedem | klindgren: pymysql | 15:32 |
klindgren | right now 20 computes * 3 servers | 15:32 |
rpodolyaka1 | mriedem: we let them decide which one they want to use | 15:32 |
klindgren | 2 servers are 16 core boxes, one is an 8 core box | 15:32 |
mriedem | but that requires at least oslo.db >= 1.12 if i'm understanding the change history correctly | 15:32 |
dansmith | klindgren: so 2.5 conductor boxes for 20 computes? | 15:32 |
klindgren | 20 conductors* | 15:32 |
mriedem | rpodolyaka1: yeah but mysql-python is not python 3 compliant and has known issues with eventlet right? | 15:32 |
klindgren | for 250 computes | 15:33 |
*** harlowja_at_home has joined #openstack-performance | 15:33 | |
dansmith | klindgren: that's waaaay low | 15:33 |
rpodolyaka1 | mriedem: right, but as rax experience shows, pymysql does not shine on busy clouds :( | 15:33 |
dansmith | rpodolyaka1: I don't think that's what their experience shows | 15:33 |
rpodolyaka1 | anyway, are we sure that's a bottleneck? | 15:33 |
mriedem | rpodolyaka1: i think those are unrelated | 15:33 |
harlowja_at_home | \o | 15:33 |
DinaBelova | rpodolyaka1 - not yet, sir. Investigation in progress, we're just collecting the ideas of where to look at | 15:33 |
alaski | rpodolyaka1: rax hasn't tried pymysql yet. it's on our backlog to test but we don't have any data on it | 15:34 |
DinaBelova | harlowja_at_home - morning sir! | 15:34 |
mriedem | rpodolyaka1: rax uses mysqldb b/c their direct-to-mysql change uses mysql-python | 15:34 |
mriedem | https://review.openstack.org/#/c/243822/ | 15:34 |
klindgren | rpodolyaka1, I am getting Model server went away errors randomly from nova-computes | 15:34 |
harlowja_at_home | DinaBelova, hi! :) | 15:34 |
rpodolyaka1 | alaski: mriedem: ah, I must have confused them with someone else then. I was pretty sure someone blamed pymysql for causing the load on nova-conductors. and that mysql-python was a solution | 15:35 |
DinaBelova | SpamapS wanted to check if switching to some other JSON lib will help, and I'm going to work on this issue as well (probably start tomorrow) | 15:35 |
dansmith | rpodolyaka1: I'm pretty sure not | 15:35 |
klindgren | dansmith, what would you recommend as the ratio of servers dedicated to nova-conductor to nova-compute? | 15:35 |
rpodolyaka1 | ok | 15:35 |
mriedem | DinaBelova: unless you're on python 2.6, i don't know that the json change in oslo.serialization will make a difference | 15:35 |
alaski | rpodolyaka1: we blame sqlalchemy right now :) but are hopeful that pymysql will be better | 15:35 |
rpodolyaka1 | haha | 15:36 |
dansmith | klindgren: it all depends on your environment and your load.. but I just want to clarify.. above you seemed to confuse a few things | 15:36 |
dims | alaski lol | 15:36 |
dansmith | klindgren: 250 computes and how many physical conductor machines running how many workers? | 15:36 |
DinaBelova | mriedem - well, SpamapS is experimenting here, I'll probably start with some meaningful profiling | 15:36 |
rpodolyaka1 | ++ | 15:36 |
DinaBelova | if I'm able to reproduce it | 15:36 |
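For the profiling step, Python's standard `cProfile` and `pstats` modules are enough to see where time goes inside a single call. A minimal sketch; `fake_conductor_request` is a hypothetical stand-in for whatever conductor code path is being measured, and hooking this into a running nova-conductor process is not shown:

```python
import cProfile
import io
import pstats


def profile_call(func, *args, top=10, **kwargs):
    """Run func under cProfile; return (result, report of the top entries by cumulative time)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)
    buf = io.StringIO()
    # Sort by cumulative time so expensive call chains (e.g. DB access) surface first.
    pstats.Stats(profiler, stream=buf).sort_stats("cumulative").print_stats(top)
    return result, buf.getvalue()


def fake_conductor_request(n):
    # Hypothetical stand-in for a conductor code path under investigation.
    return sum(i * i for i in range(n))


result, report = profile_call(fake_conductor_request, 1000)
print(report)
```

For a whole script, `python -m cProfile -o out.prof script.py` writes the same statistics to a file that `pstats.Stats("out.prof")` can load later.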
klindgren | 3 physical boxes one server has 8 cores the others have 16 | 15:36 |
klindgren | running 20 workers each | 15:36 |
dansmith | klindgren: so three total boxes for 250 computes, right? | 15:36 |
klindgren | yep | 15:37 |
dansmith | klindgren: right, so that's insanely low, IMHO | 15:37 |
mriedem | plus, you have $workers > ncpu on those conductor boxes, | 15:37 |
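The numbers klindgren gives (three conductor hosts, two with 16 cores and one with 8, each running 20 workers, for 250 computes) can be put side by side in a short sketch. It assumes nothing beyond those figures; `ConductorHost` and `summarize` are illustrative names, not Nova code:

```python
from dataclasses import dataclass


@dataclass
class ConductorHost:
    cores: int    # physical cores on the conductor box
    workers: int  # nova-conductor worker processes on that box


def summarize(hosts, computes):
    """Basic capacity ratios for conductor hosts serving `computes` compute nodes."""
    total_workers = sum(h.workers for h in hosts)
    return {
        "hosts_per_compute": len(hosts) / computes,
        "computes_per_worker": computes / total_workers,
        # A value above 1.0 means more workers than cores on that box.
        "workers_per_core": [h.workers / h.cores for h in hosts],
    }


# Figures klindgren reports for the GoDaddy cell.
GODADDY = [ConductorHost(16, 20), ConductorHost(16, 20), ConductorHost(8, 20)]
print(summarize(GODADDY, 250))
```

For this deployment that works out to 3 hosts for 250 computes (about 1.2%), roughly 4.2 computes per conductor worker, and 1.25 to 2.5 workers per core, which is the oversubscription mriedem points out. As dansmith says, there is no magic number; these ratios only describe one environment.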
dansmith | klindgren: and the answer is: keep increasing conductor boxes until the load is manageable :) | 15:37 |
klindgren | thats a pretty shit answer | 15:37 |
dansmith | mriedem: well, with mysqldb you have to have that | 15:37 |
klindgren | imho | 15:37 |
DinaBelova | klindgren :D | 15:37 |
kun_huang | hah | 15:37 |
dansmith | klindgren: so run some conductors on every compute if you want | 15:38 |
mriedem | dansmith: although local conductor is now deprecated | 15:38 |
dansmith | klindgren: the load is all the same, conductor just concentrates it on a much smaller number of boxes if you choose it to be small | 15:38 |
DinaBelova | dansmith - heh, afair conductors were created to avoid local conductoring? | 15:38 |
dansmith | mriedem: sure, but they can still run conductor on compute if they don't want upgrades to work | 15:38 |
klindgren | fyi this environment has always been remote conductor | 15:38 |
klindgren | and load only started being an issue | 15:38 |
rpodolyaka1 | klindgren: can you run nova-conductor under cProfile on one of the nodes? We haven't seen anything like that on our 200-compute nodes deployments | 15:38 |
klindgren | when we went to kilo | 15:38 |
dansmith | klindgren: are you still on kilo? | 15:38 |
klindgren | "still" | 15:39 |
klindgren | liberty *just* came out | 15:39 |
dansmith | klindgren: that's an important detail, maybe you're just experiencing the load of the flavor migrations | 15:39 |
mriedem | hmmm, flavor migrations in kilo maybe? | 15:39 |
dansmith | klindgren: that's a hugely important data point :) | 15:39 |
klindgren | we ran all the flavor migration commands after upgrade | 15:39 |
dansmith | klindgren: right, but there is still overhead | 15:40 |
klindgren | btw all of this is in the etherpad | 15:40 |
dansmith | klindgren: and it turned out to be higher than we expected even after the migrations were done | 15:40 |
alaski | even after migrations we still saw overhead as well | 15:40 |
DinaBelova | dansmith, mriedem - yep, these details are in the etherpad as well :) | 15:40 |
dansmith | klindgren: but it's gone in liberty because the migration is complete | 15:40 |
dansmith | DinaBelova: I've read the etherpad and didn't get the impression this was just a kilo thing | 15:40 |
DinaBelova | dansmith, ok | 15:40 |
mriedem | DinaBelova: klindgren: i don't see anything about flavor migrations in the etherpad | 15:41 |
DinaBelova | mriedem - I meant kilo-based cloud | 15:41 |
dansmith | DinaBelova: I see that they say they started getting alarms after kilo, but the rest of the text makes it sound like this has always been a problem and just now tipped over the edge | 15:41 |
mriedem | yeah, i just added the notes on the flavor migrations | 15:42 |
dansmith | klindgren: so I think you should add some more capacity for conductors until you move to liberty, at which time you'll probably be able to drop it back down | 15:42 |
DinaBelova | mriedem thanks! | 15:42 |
mriedem | fyi on the flavor migrations for kilo upgrade https://wiki.openstack.org/wiki/ReleaseNotes/Kilo#Upgrade_Notes_2 | 15:42 |
dansmith | klindgren: going forward, we have some better machinery to help us avoid the continued overhead once everything is upgraded | 15:42 |
*** itsuugo has quit IRC | 15:43 | |
DinaBelova | ok, so any other points for investigators to look at (except flavor migrations and JSON libs)? // not mentioning some profiling to find the real bottleneck // | 15:43 |
dansmith | klindgren: and also, the flavor migration was about the largest migration we could have done, so it almost can't be worse in the future | 15:43 |
dansmith | DinaBelova: I don't think there is a bottleneck to find, it sounds like | 15:43 |
mriedem | DinaBelova: i'm always curious about rogue periodic tasks in the compute nodes hitting the db too often and pulling too many instances | 15:43 |
dansmith | DinaBelova: I think this is likely due to flavor migrations we were doing in kilo and nothing more | 15:43 |
dansmith | DinaBelova: conductor-specific bottlenecks I mean | 15:43 |
mriedem | but roge periodic tasks pulling too much data could also mean you need to purge your db | 15:43 |
mriedem | *rogue | 15:43 |
alaski | dansmith: not conductor specific bottlenecks, but there are db bottlenecks which conductor amplifies | 15:44 |
DinaBelova | dansmith - that may well be the answer; I just want to reproduce the same situation klindgren is seeing, confirm it's about flavor migrations, and check everything is ok on liberty | 15:44 |
dansmith | alaski: yes, totes | 15:44 |
DinaBelova | that is also an answer | 15:45 |
DinaBelova | not to mention that something interesting may be found in what alaski has mentioned | 15:45 |
DinaBelova | ok, cool. | 15:45 |
dansmith | I shouldn't have said "no bottleneck to find" I meant that I think the kilo-centric bit that is the immediate problem is flavor migrations | 15:45 |
DinaBelova | dansmith, yep, gotcha | 15:46 |
dansmith | I'm also amazed that they _were_ fine with 2.5 conductor boxes for 250 computes | 15:46 |
klindgren | is it possible to turn off flavor migrations under kilo to see if things get better? | 15:46 |
dansmith | klindgren: not really, no | 15:47 |
mriedem | not configurable, it happens in the code | 15:47 |
DinaBelova | klindgren, suffer :) | 15:47 |
dansmith | klindgren: we can have a back alley chat about some hacking you can do if you want | 15:47 |
klindgren | dansmith, can you provide what in your mind is an acceptable conductor -> compute ratio? | 15:47 |
dansmith | klindgren: and if I may say, the next time you hit some spike when you roll to a release, please come to the nova channel and raise it :) | 15:47 |
DinaBelova | dansmith - if klindgren is ok with trying some code hacking, I suppose this session will be very useful | 15:48 |
dansmith | klindgren: as I said, there is no magic number.. 1% is much lower than I would have expected would be reasonable for anyone, but you're proving it's doable, which also points to there being no magic number :) | 15:48 |
DinaBelova | #idea check if issue GoDaddy is facing is related to the flavor migrations or just to the too low conductor/compute ratio | 15:49 |
DinaBelova | klindgren - are you interested in the hacking session dansmith has proposed? | 15:49 |
dansmith | I think it's also worth pointing out, | 15:49 |
dansmith | since my answer was "shit" about having enough boxes to handle the load, | 15:50 |
bauzas | maybe running a GMR ? | 15:50 |
dansmith | that conductor separate from computes is mostly an upgrade story | 15:50 |
harlowja_at_home | just out of curiosity, since there isn't a magic number, have any companies shared their conductor ratios with the world? then we could derive a 'suggested' number from those shared values...? | 15:50 |
klindgren | technically 2 -> 250 was working as well. Adding another physical box didn't actually fix anything, it just resulted in burning cpu on that server as well. | 15:50 |
dansmith | if you don't care about that, you can run a few conductor workers on every compute and distribute the load everywhere | 15:50 |
DinaBelova | dansmith thanks for the note | 15:51 |
klindgren | I mean if local conductor is deprecated - and remote conductor is an upgrade story - people are going to need to know a conductor to compute ratio that is "safe" | 15:51 |
DinaBelova | harlowja_at_home - did not hear about that :( | 15:51 |
dansmith | klindgren: 100% is safe | 15:51 |
klindgren | otherwise people are going to be blowing up their cloud | 15:51 |
harlowja_at_home | :-/ | 15:51 |
DinaBelova | klindgren - probably we need to write an email to the operators mailing list | 15:51 |
dansmith | klindgren: let me ask you a question.. how many api nodes should everyone run? | 15:52 |
DinaBelova | and try to find out what ratio other folks have | 15:52 |
klindgren | then un-deprecate local-conductor, because obviously remote-conductor is not well planned out | 15:52 |
harlowja_at_home | DinaBelova, i'd like that | 15:52 |
DinaBelova | #action DinaBelova klindgren compose an email to the operators list and find out what conductor/compute ratios are used | 15:53 |
mriedem | can you do rolling upgrades with cells though? i thought not. | 15:53 |
DinaBelova | dansmith - well, I guess there is no right answer here :) | 15:53 |
dansmith | DinaBelova: right, that's what I'm trying to get at.. if I never create/destroy nodes, I can use one api worker for 250 computes :) | 15:53 |
dansmith | s/nodes/instances/ | 15:54 |
DinaBelova | dansmith :D | 15:54 |
klindgren | it's almost always been possible in the past to run n-1 in cells | 15:54 |
manand | while we are on the subject of ratios, is this something we should look at across other components as well, such as the network node to compute ratio, etc.? | 15:54 |
dansmith | klindgren: just so you know, we think that's crazy :) | 15:54 |
alaski | klindgren: that has been by chance though. there's no code to ensure it works | 15:54 |
DinaBelova | manand - yep, great note | 15:54 |
dansmith | whether or not it works :) | 15:54 |
DinaBelova | ok, folks, we've spent a lot of time on this item | 15:55 |
mriedem | reminds me of the rpc compat bug in the cells code i saw last week... | 15:55 |
dansmith | yeah | 15:55 |
DinaBelova | it looks like we'll return to it after the meeting | 15:55 |
DinaBelova | so let's move forward, as we're running out of time | 15:55 |
DinaBelova | #topic OSProfiler weekly update | 15:55 |
DinaBelova | ok, so last time we agreed that if we want to use osprofiler for tracing/profiling needs, we need to #1 fix it and #2 make it better | 15:56 |
DinaBelova | harlowja_at_home has created an etherpad | 15:56 |
DinaBelova | #link https://etherpad.openstack.org/p/perf-zoom-zoom | 15:56 |
harlowja_at_home | i put some code up for an idea of a different notifier that just uses files!! :-P | 15:56 |
harlowja_at_home | more zoom zoom | 15:56 |
harlowja_at_home | lol | 15:56 |
DinaBelova | harlowja_at_home - yep, saw it | 15:57 |
DinaBelova | and I left a comment - lemme create a change regarding https://github.com/openstack/osprofiler/blob/master/doc/specs/in-progress/multi_backend_support.rst first | 15:57 |
DinaBelova | so that we don't end up with two drivers just for backward compatibility | 15:57 |
DinaBelova | so in short - I was able to make osprofiler work ok with ceilometer events | 15:58 |
harlowja_at_home | cool | 15:58 |
DinaBelova | it's limited for now, and some ceilometer work needs to be done | 15:58 |
DinaBelova | one of the Ceilo devs will work on it | 15:58 |
DinaBelova | and I've moved on to the https://github.com/openstack/osprofiler/blob/master/doc/specs/in-progress/multi_backend_support.rst task | 15:58 |
*** atuvenie_ has quit IRC | 15:58 | |
DinaBelova | harlowja_at_home - I'll ping you once I push the change to gerrit | 15:58 |
harlowja_at_home | kk | 15:58 |
harlowja_at_home | thx | 15:58 |
DinaBelova | so you'll be able to rebase your code | 15:58 |
DinaBelova | np | 15:58 |
harlowja_at_home | sounds good to me | 15:59 |
DinaBelova | boris-42 - did you have a chance to update the osprofiler -> oslo spec? | 15:59 |
DinaBelova | for mitaka? | 15:59 |
DinaBelova | a-ha, I see, not yet | 15:59 |
DinaBelova | #action boris-42 update osprofiler spec to fit Mitaka cycle | 15:59 |
DinaBelova | ok, so we ran out of time | 16:00 |
DinaBelova | any last questions to mention? | 16:00 |
DinaBelova | thank you guys! | 16:00 |
harlowja_at_home | boris-42, where are u! | 16:00 |
harlowja_at_home | come in boris! | 16:00 |
harlowja_at_home | lol | 16:00 |
DinaBelova | :D | 16:00 |
DinaBelova | #endmeeting | 16:00 |
harlowja_at_home | :) | 16:00 |
openstack | Meeting ended Tue Nov 17 16:00:40 2015 UTC. Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4) | 16:00 |
openstack | Minutes: http://eavesdrop.openstack.org/meetings/performance_team/2015/performance_team.2015-11-17-15.00.html | 16:00 |
openstack | Minutes (text): http://eavesdrop.openstack.org/meetings/performance_team/2015/performance_team.2015-11-17-15.00.txt | 16:00 |
openstack | Log: http://eavesdrop.openstack.org/meetings/performance_team/2015/performance_team.2015-11-17-15.00.log.html | 16:00 |
*** rohanion has quit IRC | 16:02 | |
*** mriedem has quit IRC | 16:06 | |
*** llu-laptop has quit IRC | 16:08 | |
*** mriedem has joined #openstack-performance | 16:11 | |
*** mwagner has quit IRC | 16:18 | |
*** harlowja_at_home has quit IRC | 16:23 | |
*** ozamiatin has quit IRC | 16:23 | |
*** rpodolyaka1 has quit IRC | 16:24 | |
*** rpodolyaka1 has joined #openstack-performance | 16:30 | |
*** dansmith has left #openstack-performance | 16:32 | |
*** pkoniszewski has quit IRC | 16:36 | |
*** rmart04 has quit IRC | 16:45 | |
DinaBelova | swann - are you around? | 16:47 |
*** harshs has joined #openstack-performance | 16:58 | |
*** mriedem is now known as mriedem_meeting | 16:59 | |
*** mwagner has joined #openstack-performance | 17:03 | |
*** AugieMena has quit IRC | 17:37 | |
*** amaretskiy has quit IRC | 17:37 | |
*** mriedem_meeting is now known as mriedem | 17:39 | |
*** rpodolyaka1 has quit IRC | 17:55 | |
*** markvoelker_ has joined #openstack-performance | 17:57 | |
*** lpetrut1 has joined #openstack-performance | 17:58 | |
*** lpetrut has quit IRC | 17:58 | |
*** lpetrut1 is now known as lpetrut | 17:58 | |
*** rvasilets___ has quit IRC | 17:59 | |
*** markvoelker has quit IRC | 18:00 | |
*** xek has quit IRC | 18:00 | |
*** lpetrut has quit IRC | 18:02 | |
*** abalutoiu has quit IRC | 18:09 | |
*** harshs has quit IRC | 18:22 | |
*** dims has quit IRC | 18:37 | |
*** lpetrut has joined #openstack-performance | 18:39 | |
*** dims has joined #openstack-performance | 18:40 | |
*** mriedem has quit IRC | 18:42 | |
*** mriedem has joined #openstack-performance | 18:44 | |
*** andreykurilin__ has quit IRC | 18:46 | |
*** itsuugo has joined #openstack-performance | 18:48 | |
*** boris-42 has quit IRC | 18:48 | |
*** lpetrut has quit IRC | 18:52 | |
*** itsuugo has quit IRC | 18:53 | |
*** lpetrut has joined #openstack-performance | 18:53 | |
*** itsuugo has joined #openstack-performance | 18:58 | |
*** harshs has joined #openstack-performance | 19:02 | |
*** itsuugo has quit IRC | 19:03 | |
*** manand has quit IRC | 19:10 | |
*** ozamiatin has joined #openstack-performance | 19:44 | |
*** itsuugo has joined #openstack-performance | 19:46 | |
*** itsuugo has quit IRC | 20:07 | |
*** regXboi has quit IRC | 20:11 | |
*** harshs has quit IRC | 20:13 | |
*** itsuugo has joined #openstack-performance | 20:34 | |
*** ozamiatin has quit IRC | 20:46 | |
*** rmart04 has joined #openstack-performance | 20:50 | |
*** rmart04 has left #openstack-performance | 20:52 | |
*** dims_ has joined #openstack-performance | 21:00 | |
*** dims has quit IRC | 21:02 | |
*** itsuugo has quit IRC | 21:07 | |
*** lpetrut has quit IRC | 21:12 | |
*** itsuugo has joined #openstack-performance | 21:17 | |
*** itsuugo has quit IRC | 21:21 | |
*** itsuugo has joined #openstack-performance | 21:21 | |
*** rpodolyaka1 has joined #openstack-performance | 21:23 | |
*** rpodolyaka1 has quit IRC | 21:32 | |
*** rpodolyaka1 has joined #openstack-performance | 21:44 | |
*** dims_ has quit IRC | 21:51 | |
*** rpodolyaka1 has quit IRC | 21:52 | |
*** dims has joined #openstack-performance | 21:57 | |
*** rpodolyaka1 has joined #openstack-performance | 22:06 | |
*** rpodolyaka1 has quit IRC | 22:15 | |
*** harshs has joined #openstack-performance | 22:16 | |
*** mriedem has quit IRC | 22:53 | |
*** mwagner has quit IRC | 22:55 | |
*** itsuugo has quit IRC | 22:59 | |
*** dims has quit IRC | 23:00 | |
*** dims has joined #openstack-performance | 23:01 | |
*** dims_ has joined #openstack-performance | 23:04 | |
*** dims has quit IRC | 23:07 | |
*** arnoldje has quit IRC | 23:16 | |
*** mwagner has joined #openstack-performance | 23:31 | |
*** redixin has quit IRC | 23:41 |
Generated by irclog2html.py 2.14.0 by Marius Gedminas - find it at mg.pov.lt!