harlowja_ | boris-42 whats up dawg | 00:21 |
---|---|---|
*** mdorman has quit IRC | 00:24 | |
SpamapS | klindgren: I'm taking a look at how much better it might get with msgpack | 00:47 |
klindgren | kk - from the server we just added to handle conductor loads: http://paste.ubuntu.com/13243560/ | 00:51 |
klindgren | ^^ All day everyday | 00:54 |
SpamapS | klindgren: I'd be interested to see what the rate of messages going through rabbitmq is. | 00:55 |
SpamapS | klindgren: guessing it is very high. | 00:55 |
klindgren | 500 - 1k messages/s | 00:55 |
klindgren | and thats just for nova since its just the nova-cell | 00:56 |
SpamapS | right, so, 1000/s, if each json blob must be deserialized, turned into an object, dealt with, answer serialized.. | 00:59 |
SpamapS | guessing kilo added something that makes each message deserialized twice or uses a different, less performant deserializer. | 00:59 |
harlowja_ | ya, i guess it becomes a question of which part of 'json blob must be deserialized, turned into an object, dealt with, answer serialized' is the problem | 01:08 |
harlowja_ | SpamapS might be easy just to plugin https://github.com/openstack/oslo.serialization/blob/master/oslo_serialization/msgpackutils.py#L336 | 01:09 |
harlowja_ | ^ which is msgpack | 01:09 |
harlowja_ | + some nice python extensions to handle more python types | 01:09 |
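As a rough illustration of what "plugging in" msgpackutils could mean at the payload level, a minimal sketch is below, assuming the `dumps()`/`loads()` entry points and a made-up dict shaped vaguely like an RPC payload. The actual oslo.messaging wiring (discussed later in the log) is the harder part and is not shown here.

```python
# Sketch only: compare jsonutils and msgpackutils on a hypothetical payload.
# Both modules expose dumps()/loads(); msgpackutils' handler registry covers
# extra Python types (datetime, UUID, ...) that plain msgpack would reject.
from oslo_serialization import jsonutils, msgpackutils

payload = {"instance_uuid": "fake-uuid", "power_state": 1, "memory_mb": 2048}

json_blob = jsonutils.dumps(payload)         # what the RPC layer does today
msgpack_blob = msgpackutils.dumps(payload)   # candidate replacement

print(len(json_blob), len(msgpack_blob))     # msgpack is typically smaller on the wire
print(msgpackutils.loads(msgpack_blob))      # round-trips the same structure
```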
SpamapS | actually | 01:09 |
SpamapS | before that | 01:09 |
SpamapS | https://gist.github.com/lightcatcher/1136415 | 01:09 |
SpamapS | this suggests that we're doing it wrong | 01:09 |
SpamapS | favoring built in python json | 01:10 |
SpamapS | instead of going to faster external libs | 01:10 |
SpamapS | especially for serializing | 01:10 |
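The gist's point can be reproduced with a quick timeit loop like the sketch below: serialize a message-sized dict with the stdlib json module and with faster third-party encoders, if any happen to be installed. The sample payload is made up to roughly resemble a conductor RPC call; absolute numbers will vary with interpreter and library versions.

```python
# Rough micro-benchmark: stdlib json vs. optional faster encoders for dumps().
import json
import timeit

candidates = {"stdlib json": json}
for name in ("simplejson", "ujson"):
    try:
        candidates[name] = __import__(name)
    except ImportError:
        pass  # skip encoders that aren't installed

# Made-up payload, loosely shaped like a conductor RPC call.
message = {"method": "object_class_action_versions",
           "args": {"objname": "Instance", "objmethod": "get_by_uuid",
                    "args": ["fake-uuid"], "kwargs": {}},
           "version": "3.0"}

for name, lib in candidates.items():
    secs = timeit.timeit(lambda: lib.dumps(message), number=100000)
    print("%-12s %.3fs for 100k dumps" % (name, secs))
```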
SpamapS | harlowja_: plugging in msgpack is what I'm going to do if I can get devstack to actually work | 01:11 |
* SpamapS starts on a fresh vm | 01:11 | |
harlowja_ | ya, SpamapS just plugin https://github.com/openstack/oslo.serialization/blob/master/oslo_serialization/msgpackutils.py | 01:11 |
harlowja_ | i made that just for u | 01:11 |
klindgren | what version of oslo.messaging is msgpack available on? | 01:11 |
harlowja_ | lol | 01:11 |
klindgren | or oslo.serialization | 01:12 |
harlowja_ | oslo.serialization, ummm, i forget when i made that | 01:12 |
harlowja_ | some version ago, lol | 01:12 |
SpamapS | looks like it's been there since what, january? | 01:12 |
harlowja_ | SpamapS it handles https://github.com/openstack/oslo.serialization/blob/master/oslo_serialization/msgpackutils.py#L288 those native types | 01:12 |
klindgren | running: oslo.serialization==1.4.0 | 01:12 |
harlowja_ | perhaps | 01:12 |
harlowja_ | oslo.messaging using msgpack though will require more work | 01:12 |
harlowja_ | mehdi was trying to add that in, but I don't think it got in | 01:12 |
harlowja_ | https://review.openstack.org/#/c/151300/ ----> abandoned :( | 01:13 |
SpamapS | oops, time to go fetch kids | 01:13 |
* SpamapS disappears | 01:13 | |
klindgren | kk | 01:13 |
klindgren | eitherway 1.4.0 doesn't have it :-D | 01:13 |
klindgren | it is in 1.5 | 01:14 |
harlowja_ | k | 01:14 |
*** paco20151113 has joined #openstack-performance | 01:26 | |
*** arnoldje has joined #openstack-performance | 02:14 | |
*** arnoldje has quit IRC | 02:30 | |
*** boris-42 has quit IRC | 02:36 | |
*** pasquier-s has quit IRC | 02:37 | |
*** ozamiatin has joined #openstack-performance | 02:38 | |
*** pasquier-s has joined #openstack-performance | 02:40 | |
*** ozamiatin has quit IRC | 02:41 | |
*** boris-42 has joined #openstack-performance | 02:43 | |
*** harshs has joined #openstack-performance | 03:28 | |
*** markvoelker has quit IRC | 03:39 | |
*** badari has quit IRC | 04:01 | |
*** markvoelker has joined #openstack-performance | 04:40 | |
*** markvoelker has quit IRC | 04:44 | |
*** boris-42 has quit IRC | 04:48 | |
*** harshs has quit IRC | 05:09 | |
*** harshs has joined #openstack-performance | 06:18 | |
*** harshs has quit IRC | 06:39 | |
*** markvoelker has joined #openstack-performance | 06:41 | |
*** markvoelker has quit IRC | 06:46 | |
*** ozamiatin has joined #openstack-performance | 07:17 | |
*** ozamiatin has quit IRC | 07:52 | |
*** ozamiatin has joined #openstack-performance | 08:10 | |
*** itsuugo has joined #openstack-performance | 08:37 | |
*** markvoelker has joined #openstack-performance | 08:41 | |
*** ozamiatin has quit IRC | 08:42 | |
*** rmart04 has joined #openstack-performance | 08:42 | |
*** markvoelker has quit IRC | 08:46 | |
*** itsuugo has quit IRC | 08:48 | |
*** itsuugo has joined #openstack-performance | 08:48 | |
*** amaretskiy has joined #openstack-performance | 09:27 | |
*** itsuugo has quit IRC | 09:29 | |
*** markvoelker has joined #openstack-performance | 10:02 | |
*** markvoelker has quit IRC | 10:07 | |
*** itsuugo has joined #openstack-performance | 10:29 | |
*** ozamiatin has joined #openstack-performance | 10:32 | |
*** aojea has joined #openstack-performance | 10:33 | |
*** itsuugo has quit IRC | 10:33 | |
*** redixin has joined #openstack-performance | 10:45 | |
*** ozamiatin has quit IRC | 10:46 | |
*** ozamiatin has joined #openstack-performance | 10:51 | |
*** aojea has quit IRC | 11:02 | |
*** paco20151113 has quit IRC | 11:05 | |
*** itsuugo has joined #openstack-performance | 11:08 | |
*** itsuugo has quit IRC | 11:52 | |
*** itsuugo has joined #openstack-performance | 12:00 | |
*** markvoelker has joined #openstack-performance | 12:34 | |
*** redixin has quit IRC | 12:37 | |
*** markvoelker has quit IRC | 12:39 | |
*** itsuugo has quit IRC | 12:45 | |
*** rvasilets has joined #openstack-performance | 12:50 | |
*** rvasilets has quit IRC | 13:38 | |
*** redixin has joined #openstack-performance | 13:41 | |
*** itsuugo has joined #openstack-performance | 13:45 | |
*** xek has quit IRC | 13:49 | |
*** itsuugo has quit IRC | 13:50 | |
*** rmart04 has quit IRC | 13:56 | |
*** rmart04 has joined #openstack-performance | 13:57 | |
*** xek has joined #openstack-performance | 14:18 | |
*** markvoelker has joined #openstack-performance | 14:34 | |
*** badari has joined #openstack-performance | 14:35 | |
*** markvoelker has quit IRC | 14:39 | |
*** markvoelker has joined #openstack-performance | 14:42 | |
*** mriedem has joined #openstack-performance | 14:59 | |
*** itsuugo has joined #openstack-performance | 15:00 | |
*** itsuugo has quit IRC | 15:01 | |
*** itsuugo has joined #openstack-performance | 15:02 | |
*** regXboi has joined #openstack-performance | 15:04 | |
*** arnoldje has joined #openstack-performance | 15:31 | |
*** mdorman has joined #openstack-performance | 15:49 | |
*** harshs has joined #openstack-performance | 15:49 | |
*** klindgren_ has joined #openstack-performance | 16:00 | |
*** klindgren has quit IRC | 16:01 | |
*** boris-42 has joined #openstack-performance | 16:02 | |
*** harshs has quit IRC | 16:10 | |
*** itsuugo has quit IRC | 16:11 | |
*** rmart04 has quit IRC | 16:21 | |
*** itsuugo has joined #openstack-performance | 16:23 | |
*** harshs has joined #openstack-performance | 16:28 | |
*** harshs has quit IRC | 16:33 | |
*** harlowja_at_home has joined #openstack-performance | 16:38 | |
*** itsuugo has quit IRC | 16:55 | |
*** itsuugo has joined #openstack-performance | 17:26 | |
*** harshs has joined #openstack-performance | 17:32 | |
*** amaretskiy has quit IRC | 17:36 | |
*** mriedem is now known as mriedem_lunch | 17:45 | |
*** ozamiatin has quit IRC | 17:45 | |
*** rmart04 has joined #openstack-performance | 17:47 | |
*** rmart04 has quit IRC | 17:47 | |
*** itsuugo has quit IRC | 17:47 | |
*** rmart04 has joined #openstack-performance | 17:48 | |
*** rmart04 has quit IRC | 17:48 | |
SpamapS | klindgren_: so I'm playing with nova-conductor by slamming it with nova boot/list commands | 18:41 |
SpamapS | klindgren_: in a small scale, nova-api ends up chewing up all of the CPU | 18:41 |
SpamapS | klindgren_: I suspect you have _many_ API's compared to your few conductors. Yes? | 18:42 |
klindgren_ | SpamapS, k I would say most of our stuff is requests for metadata | 18:42 |
klindgren_ | its roughly the same ratio honestly | 18:42 |
klindgren_ | 3 physical servers running about 40 api services | 18:42 |
SpamapS | but the nova-api's aren't egging 32 CPU's? | 18:42 |
SpamapS | pegging | 18:42 |
klindgren_ | but our conductor load isn't coming from boots | 18:43 |
SpamapS | klindgren_: have you considered configdrive... ;) | 18:43 |
klindgren_ | IE we only boot like 10 vm's an hour or so | 18:43 |
klindgren_ | we are using config drive | 18:43 |
SpamapS | oh, where are the metadata reqs coming from? | 18:43 |
klindgren_ | well people run puppet | 18:43 |
SpamapS | You mean like, other apps pulling it out? | 18:43 |
klindgren_ | puppet uses facter | 18:44 |
klindgren_ | with the ec2 metadata turned on by default | 18:44 |
SpamapS | ah so you use configdrive, but you allow metadata service | 18:44 |
SpamapS | ACK | 18:44 |
klindgren_ | yea | 18:44 |
SpamapS | ok let me cook up a rally scenario for metadata gets then | 18:44 |
klindgren_ | we also may have done some of this to ourselves, we have a cronjob running in the vm's that polls metadata every 10 minutes or so | 18:45 |
klindgren_ | but we put a random offset on the cronjob and we turned on memcache for the metadata services | 18:46 |
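For context, the poller being described amounts to something like the sketch below: an in-guest loop (or cron job) that fetches EC2-style metadata every ~10 minutes with a random start offset so a whole fleet doesn't hit nova-api-metadata in lockstep. The URL path and interval here are illustrative, not their actual configuration.

```python
# Hypothetical in-guest metadata poller with a random offset ("jitter").
import random
import time
import urllib.request

METADATA_URL = "http://169.254.169.254/latest/meta-data/instance-id"
INTERVAL = 600  # ~10 minutes, as mentioned above

def poll_forever():
    time.sleep(random.uniform(0, INTERVAL))  # the random offset on the cronjob
    while True:
        try:
            with urllib.request.urlopen(METADATA_URL, timeout=5) as resp:
                resp.read()
        except OSError:
            pass  # metadata service unreachable; try again next cycle
        time.sleep(INTERVAL)
```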
SpamapS | honestly, metadata should be really fast | 18:47 |
SpamapS | kind of surprised neutron-metadata-agent doesn't cache it (and the lookup by MAC) | 18:47 |
SpamapS | klindgren_: oh so wait, memcache _is_ caching it, weird | 18:48 |
klindgren_ | but I would say the people who set puppet to reach out to the puppetmaster every 2 minutes are a bigger load, as puppet will run facter each time it runs | 18:48 |
klindgren_ | we aren't running neutron-metadata-agent | 18:48 |
SpamapS | oh? | 18:48 |
klindgren_ | we are running nova-metadata on every compute node | 18:48 |
SpamapS | that is interesting | 18:48 |
SpamapS | I like that better honestly. :) | 18:49 |
klindgren_ | we run with flat networking | 18:49 |
SpamapS | of course you do. :) | 18:49 |
klindgren_ | I should say flat provider networks | 18:49 |
klindgren_ | kiss :-D | 18:49 |
SpamapS | Right I understand, thats, IMO, the only sane way to play. | 18:49 |
SpamapS | (We're building infra-cloud the same) | 18:49 |
klindgren_ | cool - yea we have been happy with it | 18:50 |
SpamapS | let the tenants fend for themselves! ;) | 18:50 |
SpamapS | its the internet yo | 18:50 |
klindgren_ | most of our tenants don't want to know about networking, or they dont care | 18:50 |
SpamapS | interesting, rally has no built in metadata scenario | 18:51 |
SpamapS | this might explain something. ;) | 18:51 |
harlowja_at_home | why are u guys kissing | 18:51 |
harlowja_at_home | thats weird | 18:51 |
harlowja_at_home | ha | 18:51 |
klindgren_ | K.I.S.S** | 18:51 |
harlowja_at_home | :-p | 18:51 |
SpamapS | harlowja_at_home: its the internet. Get used to seeing stuff you can't explain. | 18:51 |
harlowja_at_home | :) | 18:51 |
klindgren_ | Those that do want to create their own networks/routers think that somehow that will give them resource isolation from other people because its not "shared" | 18:52 |
* klindgren_ sighs | 18:52 | |
SpamapS | klindgren_: do we make them wear hats, or signs? | 18:53 |
harlowja_at_home | lol | 18:53 |
SpamapS | "I booted my server on an isolated tenant network and all I got was pwned and then they sent me this t-shirt I don't know how they got my address" | 18:53 |
SpamapS | might be a tad long for a t-shirt | 18:54 |
klindgren_ | Like, if I created my own network and router, no one else uses it and I will not have performance problems, because all of this stuff is dedicated solely to me | 18:54 |
SpamapS | maybe sweatpants and they can write that on the butt | 18:54 |
SpamapS | klindgren_: riight.. its a real, dedicated imaginary overlayed network! | 18:54 |
klindgren_ | Even though in this case the issue was the firewall ran out of outbound NAT connections because someone was being an idiot | 18:54 |
klindgren_ | and it impacted the entire internal network | 18:55 |
SpamapS | ok, so it looks like rally has no benchmark anywhere for metadata | 18:55 |
SpamapS | I sense an opportunity. ;) | 18:55 |
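A metadata scenario could be a fairly small Rally plugin. The sketch below follows the current rally.task plugin layout (the interface has changed across Rally releases, so treat the class and decorator names as assumptions) and simply issues GETs against a configurable metadata endpoint, which would have to be reachable from wherever the Rally runner executes.

```python
# Hypothetical Rally scenario plugin: hammer a metadata endpoint with GETs.
import requests

from rally.task import scenario


@scenario.configure(name="NovaMetadata.get_instance_metadata")
class GetInstanceMetadata(scenario.Scenario):
    """Issue a burst of metadata GETs to exercise nova-api-metadata."""

    def run(self, endpoint="http://169.254.169.254/latest/meta-data/",
            requests_count=100):
        for _ in range(requests_count):
            requests.get(endpoint, timeout=5)
```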
SpamapS | klindgren_: when you say you run 'nova-metadata' on all computes, do you mean you just run nova-api on them (and redirect packets for link-local to it) ? | 18:57 |
SpamapS | oh nova-api-metadata is what you meant | 18:57 |
klindgren_ | yea -sorry nova-api-metadata on all compute nodes | 18:58 |
klindgren_ | with 169.254.169.254 bound to loopback | 18:58 |
SpamapS | np :) | 18:58 |
klindgren_ | and the iptables rule to redirect traffic from that to the local metadata host | 18:58 |
klindgren_ | we used to run metadata centralized on a few servers per flat network | 18:59 |
klindgren_ | but that was a physical resource waste | 18:59 |
SpamapS | indeed | 19:00 |
*** badari_ has joined #openstack-performance | 19:00 | |
klindgren_ | we actually used to run nova-api-metadata, neutron-dhcp and glance on dedicated servers per flat network, but moved to running some centralized glance servers, with metadata/dhcp getting moved onto the computes | 19:00 |
klindgren_ | dhcp only runs on a few hosts, since neutron tips over if you run it on every host | 19:01 |
harlowja_at_home | SpamapS, when are u (ibm) building out that megacloud of yours?? | 19:01 |
harlowja_at_home | in progress? | 19:02 |
SpamapS | harlowja_at_home: always | 19:03 |
*** badari has quit IRC | 19:03 | |
harlowja_at_home | whats the node count so far ;) | 19:03 |
harlowja_at_home | do share, haha | 19:04 |
*** mriedem_lunch is now known as mriedem | 19:04 | |
*** badari_ is now known as badari | 19:13 | |
*** ozamiatin has joined #openstack-performance | 19:27 | |
*** boris-42 has quit IRC | 19:28 | |
*** itsuugo has joined #openstack-performance | 19:37 | |
*** ozamiatin has quit IRC | 20:05 | |
*** harlowja_at_home has quit IRC | 20:10 | |
*** klindgren__ has joined #openstack-performance | 21:05 | |
*** klindgren_ has quit IRC | 21:06 | |
*** klindgren__ is now known as klindgren | 21:28 | |
*** med_ has joined #openstack-performance | 21:29 | |
SpamapS | klindgren: ok, with fake driver you can easily reproduce conductor-slapping with just showing/listing instances | 22:26 |
SpamapS | I only have 2 cores in my VM, and 2 nova-conductors, and they're eating up all the CPU that nova-api and nova-compute don't... | 22:26 |
klindgren | Side note - with neutron, metadata calls *really* hammer the neutron api as well | 22:27 |
SpamapS | I'll try the stupid thing, and just see if I can get oslo.serialization to use one of the faster json things | 22:27 |
klindgren | seems like every request for a metadata value grabs information on the fixed port from neutron | 22:27 |
klindgren | IE if I query for hostname - a call is still made to neutron for the port of the VM | 22:28 |
klindgren | or Availability zone | 22:28 |
SpamapS | klindgren: thats probably just making it worse. I'm running with nova-net (trying to isolate from neutron issues) | 22:29 |
klindgren | at least as I have read the metadata code | 22:30 |
* klindgren is not a python dev | 22:30 | |
SpamapS | also, I'm doing a rally test where I create 10000 instances then list after each one.. | 22:30 |
SpamapS | so it's getting worse, and worse, and worse | 22:30 |
SpamapS | oh actually no it isn't, it's doing create, list, delete | 22:32 |
SpamapS | so I should do create(10000), and then test showing all of them | 22:34 |
*** regXboi has quit IRC | 22:39 | |
SpamapS | oh doh, it was creating them and listing longer and longer lists | 22:46 |
* SpamapS forgot --all-tenants | 22:46 | |
SpamapS | | auth_url http://192.168.122.60:5000/v2.0 | 0.00113987922668 | | 22:47 |
SpamapS | | GET /servers/detail?all_tenants=1 | 5.68457818031 | | 22:47 |
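The timings above can be reproduced outside Rally with a few lines of python-novaclient; the sketch below assumes the usual keystoneauth1 session setup, and the credentials and auth URL are placeholders for a devstack-style deployment.

```python
# Rough reproduction of the all-tenants list timing as server count grows.
import time

from keystoneauth1 import loading, session
from novaclient import client

loader = loading.get_plugin_loader("password")
auth = loader.load_from_options(
    auth_url="http://192.168.122.60:5000/v2.0",  # from the paste above
    username="admin", password="secret", project_name="admin")
nova = client.Client("2.1", session=session.Session(auth=auth))

start = time.time()
servers = nova.servers.list(search_opts={"all_tenants": 1})
print("listed %d servers in %.2fs" % (len(servers), time.time() - start))
```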
notmorgan | ooooooh i see a SpamapS and a harlowja_ | 22:53 |
SpamapS | notmorgan: o/ | 22:53 |
notmorgan | SpamapS: seriously?! 5.6? | 22:53 |
notmorgan | ugh. | 22:53 |
SpamapS | notmorgan: thats with 347 servers | 22:53 |
notmorgan | that seems.. annoying | 22:53 |
SpamapS | with 45 it was 2.5s | 22:54 |
notmorgan | still, we *should* do better than that | 22:54 |
SpamapS | loads better | 22:54 |
SpamapS | I'm profiling without changing anything now | 22:54 |
SpamapS | just profiling conductor | 22:54 |
SpamapS | though api could probably use a profile too | 22:54 |
notmorgan | oh. ok. so it sucks but the suck is mostly upfront | 22:54 |
*** mriedem has quit IRC | 22:54 | |
notmorgan | if 45 is 2.5 and approx 350 is 5.6, it's a lot of upfront suck | 22:54 |
SpamapS | well conductor is effectively a proxy | 22:54 |
notmorgan | yeah | 22:54 |
SpamapS | a REALLY heavy proxy | 22:54 |
notmorgan | i need to spend some time digging back into conductor. | 22:54 |
SpamapS | notmorgan: 10 servers was 1.5 | 22:55 |
SpamapS | notmorgan: so I think there's some log(N) scaling problems too | 22:55 |
SpamapS | good job keystone responding crazy fast. ;) | 22:55 |
notmorgan | SpamapS: yay Keystone isn't the suck point | 22:56 |
notmorgan | SpamapS: in this case... | 22:56 |
notmorgan | SpamapS: though tomorrow, i'm sure it will be | 22:57 |
notmorgan | for another thing | 22:57 |
SpamapS | wow, this is interesting | 22:57 |
SpamapS | boot and show seems to be _MORE_ painful than boot and list | 22:57 |
notmorgan | wait, what? | 22:57 |
notmorgan | how... how is ... how is that a thing? | 22:57 |
SpamapS | notmorgan: likely show shows more | 22:57 |
SpamapS | moar | 22:57 |
SpamapS | and more json, more packets, more messages... | 22:58 |
notmorgan | oh, wait it's a really nasssssty set of joins | 22:58 |
notmorgan | too | 22:58 |
notmorgan | not in SQL, but effectively | 22:58 |
SpamapS | yeah, so even though they're single key reads | 22:58 |
notmorgan | yah. icky | 22:58 |
klindgren | just saying nova list --all-tenants on your cloud takes over a minute | 22:58 |
klindgren | s/your/our | 22:58 |
SpamapS | klindgren: yeah not surprised. ;) | 22:59 |
SpamapS | and to be clear, thats probably not a great idea anyway | 22:59 |
klindgren | so we got pretty good at either directing people to give us the uuid of the vm - to troubleshoot issues | 22:59 |
klindgren | or | 22:59 |
klindgren | list --all-tenants --<other modifier> | 22:59 |
klindgren | like name or ip | 23:00 |
SpamapS | name lookups are pretty fast | 23:00 |
notmorgan | klindgren: that doesn't really surprise me | 23:00 |
notmorgan | klindgren: that is a massive set of cross record lookups | 23:00 |
notmorgan | SpamapS: ++ on it not being a great idea | 23:01 |
klindgren | not sure how rackspace would ever be able to do a nova list --all-tenants | 23:01 |
notmorgan | klindgren: they don't. | 23:01 |
klindgren | without waiting a few hours | 23:01 |
notmorgan | klindgren: remember they are heavily cell based too | 23:01 |
SpamapS | Thats one of those things where everything should timeout at 5s no matter what | 23:01 |
SpamapS | "you are doing something stupid" | 23:01 |
SpamapS | "or something is broken" | 23:01 |
notmorgan | SpamapS: "E_STUPID_QUESTION_TO_ASK_AN_API" | 23:02 |
klindgren | SpamapS, except when you need to find orphaned resources | 23:02 |
SpamapS | hm.. cProfile didn't write me a report | 23:02 |
klindgren | because deletion is what - yolo | 23:02 |
SpamapS | klindgren: yeah, I"m not saying we _can_ do it | 23:03 |
SpamapS | just that we should :) | 23:03 |
klindgren | IE give me a list of all vm's and tenants and compare the list of tenants to keystone and show me which ones are no longer around | 23:03 |
klindgren | true | 23:03 |
notmorgan | klindgren: so, in the case of orphans, honestly, this is a case where a direct DB access is better [today] | 23:03 |
SpamapS | Also thats the kind of thing that works well against a readonly slave. | 23:03 |
notmorgan | klindgren: and i'm really ok with side-band management of things like that | 23:03 |
notmorgan | SpamapS: ++ | 23:03 |
SpamapS | so you have the admin-helper-api instance that only has access to RO slaves. | 23:04 |
notmorgan | SpamapS: that could be useful too | 23:04 |
notmorgan | i'm also ok with the API being for acting on available resources, orphans are not an end-user concern in most cases. | 23:04 |
notmorgan | just from a pure philosophical standpoint | 23:05 |
klindgren | yep yep re: stand-alone admin helper api | 23:05 |
klindgren | also lets you do upgrades easier, IE you can test before allowing people back in | 23:05 |
SpamapS | interesting | 23:06 |
SpamapS | I restarted my conductors and got this on a few instances | 23:06 |
SpamapS | | fault | {"message": "Timed out waiting for a reply to message ID 97c0189ea28f49899c885274e1661e6b", "code": 500, "details": " File \"/opt/stack/nova/nova/compute/manager.py\", line 366, in decorated_function | | 23:06 |
notmorgan | SpamapS: thats an interesting error | 23:06 |
klindgren | we see that all the time when we restart services | 23:06 |
SpamapS | perhaps that timeout is a bit too low? | 23:07 |
klindgren | its because conductor either isn't listening on its channel yet, or the compute node hasn't created the channel yet | 23:07 |
*** badari has quit IRC | 23:10 | |
notmorgan | SpamapS: don't think it's actually a timeout. it just acts as though it was. | 23:11 |
*** arnoldje has quit IRC | 23:12 | |
SpamapS | ossum | 23:12 |
*** mdorman has quit IRC | 23:14 | |
*** mdorman has joined #openstack-performance | 23:14 | |
SpamapS | hrm.. so far can't get the profiler to display its results | 23:15 |
SpamapS | aha | 23:33 |
SpamapS | adding cProfile to guru meditation works | 23:33 |
SpamapS | I may have to figure out a way to make that permanent... as it is QUITE handy to be able to turn profiling on and off | 23:33 |
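The guru meditation hook itself isn't shown in the log, but the general shape of a turn-it-on-and-off profiler is a signal handler that toggles cProfile, roughly as in the sketch below. The signal choice and report size are illustrative and could collide with the guru meditation report's own handler; a real hook would piggyback on that report instead.

```python
# Minimal sketch: toggle cProfile on a long-running worker via a signal.
# The first signal starts profiling; the second stops it and prints the top
# functions by cumulative time.
import cProfile
import pstats
import signal

_profiler = None


def _toggle_profile(signum, frame):
    global _profiler
    if _profiler is None:
        _profiler = cProfile.Profile()
        _profiler.enable()
    else:
        _profiler.disable()
        pstats.Stats(_profiler).sort_stats("cumulative").print_stats(20)
        _profiler = None


signal.signal(signal.SIGUSR2, _toggle_profile)
```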
*** mdorman has quit IRC | 23:36 | |
SpamapS | seems to spend a lot of time in str.join | 23:37 |
SpamapS | http://paste.openstack.org/show/478854/ | 23:40 |
SpamapS | hm | 23:41 |
SpamapS | I think I'm only profiling the parent | 23:41 |
* SpamapS tries the workers | 23:41 | |
SpamapS | http://paste.openstack.org/show/478855/ | 23:45 |
SpamapS | there's a worker | 23:45 |
SpamapS | not super helpful :-P | 23:47 |
*** harshs has quit IRC | 23:47 | |
notmorgan | SpamapS: well, it is better than nothing...but yeah not super interesting | 23:53 |