Thursday, 2018-12-20

*** jaypipes_ has joined #openstack-placement01:18
*** mriedem has quit IRC01:18
*** jaypipes has quit IRC01:20
*** tetsuro has joined #openstack-placement02:00
*** bhagyashris has joined #openstack-placement02:04
openstackgerritJack Ding proposed openstack/nova-specs master: Select cpu model from a list of cpu models  https://review.openstack.org/62095903:03
*** tetsuro has quit IRC04:09
*** sean-k-mooney_ has joined #openstack-placement04:45
*** sean-k-mooney has quit IRC04:45
*** tetsuro has joined #openstack-placement05:39
*** mgagne has quit IRC06:33
*** mgagne has joined #openstack-placement06:40
*** tetsuro has quit IRC07:04
*** gryf is now known as _gryf07:15
*** _gryf is now known as gryf07:16
*** cdent has joined #openstack-placement07:42
*** avolkov has joined #openstack-placement07:49
*** tetsuro has joined #openstack-placement07:50
*** helenafm has joined #openstack-placement08:14
openstackgerritYongli He proposed openstack/nova-specs master: add spec "show-server-numa-topology"  https://review.openstack.org/61225608:14
*** tetsuro_ has joined #openstack-placement08:46
*** tetsuro has quit IRC08:48
*** bhagyashris has quit IRC09:06
*** tetsuro_ has quit IRC09:22
*** ttsiouts has joined #openstack-placement09:26
*** takashin has left #openstack-placement09:30
*** e0ne has joined #openstack-placement09:53
*** e0ne has quit IRC10:29
*** e0ne has joined #openstack-placement10:30
*** ttsiouts has quit IRC11:10
*** ttsiouts has joined #openstack-placement11:11
*** ttsiouts has quit IRC11:15
*** fanzhang has joined #openstack-placement11:19
fanzhanghi, team placement. Recently we used rally to run a concurrency test booting 1000 vms in total with concurrency=45 and times=25. Only very few vms (around 4/1000) got a NoValidHost error. From the scheduler log we got http://paste.openstack.org/show/737782/ , but the placement service is up and healthy. Any ideas why this happens and how to solve it?11:23
cdentfanzhang: thanks, reading the paste11:24
cdentfanzhang: is this with master, or an earlier release? What web server set up are you using with placement11:25
cdentThat error message suggests that there's a proxy in there somewhere, and the proxy is overloaded (perhaps apache2 in front of mod wsgi?)11:26
fanzhangcdent thanks, we are currently using queens-17.0.3 and apache httpd for web server11:26
fanzhangyes, we have haproxy in front of 3 controller nodes11:26
cdentif that's the case you'll want to see where one of those request is dying: at haproxy or at apache11:27
cdentand adjust the configuration on those11:27
cdentplacement itself should be capable of scaling horizontally as much as you like, but the things in front of it need to have their configurations adjusted so they can accept or queue socket connections fast enough to handle what you are doing11:27
cdentThe BadStatusLine is almost always that a proxy gave up11:28
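The "proxy gave up" failure mode behind BadStatusLine can be reproduced locally. This is a minimal Python 3 sketch, not anything from nova: the helper names and the request path are hypothetical. A throwaway TCP listener reads one request and closes the socket without answering, roughly what an overloaded proxy does. On Python 3 the client raises http.client.RemoteDisconnected, which subclasses BadStatusLine; on Python 2.7 (which this queens deployment appears to run, judging by the python2.7 path in the log) httplib raises BadStatusLine directly.

```python
import http.client
import socket
import threading


def serve_once_and_hang_up(listener: socket.socket) -> None:
    """Accept one connection, read the request, then close without replying.

    Stands in (hypothetically) for an overloaded proxy that gives up.
    """
    conn, _ = listener.accept()
    with conn:
        conn.recv(65536)  # consume the request, send nothing back


def request_through_flaky_peer() -> str:
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # let the OS pick a free port
    listener.listen(1)
    port = listener.getsockname()[1]
    worker = threading.Thread(target=serve_once_and_hang_up, args=(listener,))
    worker.start()
    client = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
    try:
        client.request("GET", "/allocation_candidates")
        client.getresponse()
        return "got a response"
    except http.client.BadStatusLine as exc:
        # RemoteDisconnected is a BadStatusLine subclass on Python 3, so an
        # empty reply from the peer lands here.
        return type(exc).__name__
    finally:
        client.close()
        worker.join()
        listener.close()


if __name__ == "__main__":
    print(request_through_flaky_peer())
```

The client cannot tell from this exception which hop closed the socket, which is why the proxy logs are the next place to look.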
cdentI've had some luck using different mpm modules with apache, depending on the environment11:29
fanzhangthanks. Earlier today we checked the haproxy logs and filtered them manually by time. It looks like haproxy was fine, and it got a response from placement with http code 200 and len(body) 16622.11:30
cdentAnother possibility is that the nova-scheduler process was unable to deal and dropped the connection before reading the results (from the haproxy)11:33
cdentare you running more than one nova-scheduler process?11:33
fanzhangyes, one nova-scheduler on each controller node11:34
fanzhang3 nova-scheduler processes in total11:34
fanzhangoh, btw, there are many warnings in nova-scheduler logs, like 2018-12-20 15:04:50.073 95480 DEBUG nova.api.openstack.placement.wsgi_wrapper [req-2ec0e1b7-2e1f-4d1d-86a0-352b9059890e dae40408cd614d948d8b6756d698e18b 7e64659205d1432c9a49b94534db5344 - default default] Placement API returning an error response: Inventory changed while attempting to allocate: Another thread concurrently updated the data. Please retry your update call_func11:35
fanzhang /usr/lib/python2.7/site-packages/nova/api/openstack/placement/wsgi_wrapper.py:3111:35
cdentthat's excpected11:35
cdentsorry, expected11:35
cdentwhat's happening there is that the state of the resource providers and their inventory is changing while you are making allocations, and if your view and the server's view differ, the server will tell you with a 409 response11:36
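The 409 described here follows an optimistic-concurrency pattern: each resource provider carries a generation, a write is accepted only if the caller's generation still matches, and a caller that loses the race typically re-reads and retries. Below is a toy in-memory model of that pattern. Provider, ConcurrentUpdate and allocate_with_retry are hypothetical names, not placement's code; real placement does the generation comparison in its database.

```python
from dataclasses import dataclass


class ConcurrentUpdate(Exception):
    """Stands in for placement's 409 'Another thread concurrently updated the data'."""


@dataclass
class Provider:
    """Hypothetical in-memory resource provider with a generation counter."""
    total: int
    used: int = 0
    generation: int = 0

    def allocate(self, amount: int, expected_generation: int) -> None:
        # Compare-and-swap: refuse the write if someone else changed the provider
        if expected_generation != self.generation:
            raise ConcurrentUpdate()
        if self.used + amount > self.total:
            raise ValueError("insufficient capacity")
        self.used += amount
        self.generation += 1  # every successful write bumps the generation


def allocate_with_retry(provider, amount, max_attempts=4, interferer=None):
    """Re-read the generation and retry on conflict; return the winning attempt."""
    for attempt in range(1, max_attempts + 1):
        seen = provider.generation  # fresh view of the provider each attempt
        if interferer is not None:
            interferer(attempt)  # simulate another writer racing us
        try:
            provider.allocate(amount, seen)
            return attempt
        except ConcurrentUpdate:
            continue
    raise ConcurrentUpdate(f"gave up after {max_attempts} attempts")
```

With another writer sneaking in only on the first attempt, the second attempt succeeds; a conflict is only fatal if the race is lost on every retry.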
cdentthat handling gets a bit more clean and clear in rocky11:36
fanzhangok... I see, so it's an expected conflict exception, right? About the 1st question, I googled a lot and assumed it's the 'server' that killed the connection, but I have no clue why.11:37
*** tetsuro has joined #openstack-placement11:38
cdentfanzhang: yeah, the notion of 'server' gets pretty messy when two proxies are involved. since you're seeing that the response is making it back to the haproxy, that means it is either dying there or at nova-scheduler11:40
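One way to tell whether haproxy, the backend or the client (here nova-scheduler) ended a session: HAProxy's HTTP log records a four-character termination state after the status code and byte count, and its first character says who ended it (C client, S server, P proxy, R resource exhaustion, c/s client/server timeout, - normal completion). A logged 200 does not by itself prove the client read the whole response. This sketch assumes the default `option httplog` field order; the regex, the function name and the sample line in the usage note are illustrative, so adapt them to your own log format.

```python
import re

# Assumed layout: HAProxy's default "option httplog" fields. Only the part up
# to the termination state is parsed; any syslog prefix before it is skipped.
HTTPLOG = re.compile(
    r"(?P<client>\S+) \[(?P<accept_date>[^\]]+)\] (?P<frontend>\S+) "
    r"(?P<backend>[^/\s]+)/(?P<server>\S+) (?P<timers>\S+) "
    r"(?P<status>-?\d+) (?P<bytes>\+?\d+) \S+ \S+ (?P<term>\S{4}) "
)

# First character of the termination state: who or what ended the session.
WHO_ENDED_IT = {
    "C": "client aborted (e.g. nova-scheduler hung up)",
    "S": "server aborted or refused (apache/placement side)",
    "P": "proxy aborted the session (haproxy itself)",
    "R": "proxy ran out of a resource (sockets, memory, ...)",
    "c": "client-side timeout expired",
    "s": "server-side timeout expired",
    "-": "normal completion",
}


def classify(line):
    """Return server, status, bytes and termination verdict for one line, or None."""
    m = HTTPLOG.search(line)
    if m is None:
        return None
    term = m.group("term")
    return {
        "server": m.group("server"),
        "status": int(m.group("status")),
        "bytes": int(m.group("bytes").lstrip("+")),  # "+" appears with logasap
        "termination": term,
        "verdict": WHO_ENDED_IT.get(term[0], "other flag, see the HAProxy docs"),
    }
```

For a made-up line such as `10.0.0.11:53210 [20/Dec/2018:15:04:50.073] placement placement/ctl2 0/0/1/2310/2311 200 16622 - - CD-- 310/310/45/15/0 0/0 "GET /allocation_candidates HTTP/1.1"`, the verdict would point at the client even though the status is 200; `----` would mean haproxy saw a normal completion.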
cdentsince you're doing scale work with placement, you might find this (rather old) blog posting on the topic interesting: https://anticdent.org/placement-scale-fun.html11:41
*** cdent_ has joined #openstack-placement11:44
fanzhangcdent thanks, will read it later. But since we're not doing scale work right now, could you share some suggestions we could at least try, to make the assumptions above clearer? Like finding out whether it's haproxy or nova-scheduler doing the 'dirty' work?11:44
*** cdent has quit IRC11:46
*** cdent_ is now known as cdent11:46
fanzhangin case you missed the last reply: thanks, will read it later. But since we're not doing scale work right now, could you share some suggestions we could at least try, to make the assumptions above clearer? Like finding out whether it's haproxy or nova-scheduler doing the 'dirty' work?11:47
cdentfanzhang: it's hard to guess what is going on. it sounds like you've already traced request ids on all the services and narrowed it down somewhat. If you can narrow a bit more that would be good.11:49
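Following one request across services by id is easy to script. This sketch assumes the oslo.log-style prefix visible in the scheduler excerpt in this log (timestamp, pid, level, logger, then `[req-...]`) and that the same id appears in every log passed in. In practice that means a propagated global request id, which is what fanzhang mentions working on; otherwise each service tends to log its own local `req-` id. The function name and the source labels are hypothetical.

```python
import re
from datetime import datetime

# Assumed oslo.log-style prefix, as in the scheduler excerpt:
# "2018-12-20 15:04:50.073 95480 DEBUG nova.api... [req-... ] message"
PREFIX = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) (\d+) (\w+) (\S+) "
)


def trace(request_id, sources):
    """Collect (time, source, level, logger) for lines mentioning request_id, oldest first.

    `sources` maps a label (e.g. a log file name) to an iterable of lines.
    """
    hits = []
    for source, lines in sources.items():
        for line in lines:
            if request_id not in line:
                continue
            m = PREFIX.match(line)
            if m is None:
                continue  # continuation lines (tracebacks etc.) have no prefix
            when = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S.%f")
            hits.append((when, source, m.group(3), m.group(4)))
    hits.sort()  # chronological across all sources
    return hits
```

Passing open file objects as the iterables works too; the interesting output is the gap between the last scheduler line and the first placement line for a failing request.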
cdentyou might also try implementing the change in this patch: https://review.openstack.org/#/c/159382/11:49
cdentto see if the problem is an overloaded nova-scheduler. Do you have any data on the cpu load on the machines where these services are running?11:50
*** ttsiouts has joined #openstack-placement11:52
fanzhangeach controller node has 48 cpus and 512G ram. cpu load is high because some folks are currently running other concurrency tests; right now the load average is 15.59, 17.90, 12.42.11:53
fanzhangfrom the zabbix dashboard, at the time we hit the warning, average cpu utilization on the 3 nodes was 16, 17 and 2011:55
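A quick way to read the load averages quoted above is to divide them by the CPU count: a value near 1.0 means roughly one runnable task per CPU. For the 48-CPU controller that gives about 0.32, 0.37 and 0.26, so by this measure the hosts were not saturated overall, although a single busy process can still be pinned at 100% of one core. The helper name is hypothetical.

```python
import os


def per_cpu_load(load_averages, cpus):
    """Normalize 1/5/15-minute load averages by CPU count (~1.0 means every CPU busy)."""
    return tuple(round(load / cpus, 2) for load in load_averages)


if __name__ == "__main__":
    # Figures reported in the channel for one 48-CPU controller
    print(per_cpu_load((15.59, 17.90, 12.42), 48))
    # Same check on the local machine (os.getloadavg is Unix-only)
    if hasattr(os, "getloadavg"):
        print(per_cpu_load(os.getloadavg(), os.cpu_count() or 1))
```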
fanzhangyes, I traced a lot by request id, and we are currently trying to get the global request id working. The patch above looks useful, thanks so much cdent :)12:01
fanzhangmy head is a total mess right now... I will go through your advice again and grab a cup of coffee :(12:06
cdentI know how that can be :)12:09
fanzhangthanks again cdent12:10
cdentIf you get a bit further and have more questions, I should be around, or post to the mailing list. I think it basically comes down to the fact that at a certain point something is giving up because there are too many active tcp sockets in action12:10
cdentit's not a specific problem with placement, but rather the whole system12:11
fanzhangyes, we've tuned the whole openstack cluster several times, it's really not easy...12:12
cdentthere are so many moving parts12:13
fanzhangoh, another question just came up: https://bugs.launchpad.net/openstack-manuals/+bug/1761649, any ideas why we should disable `option httpchk` for nova-placement in haproxy.cfg?12:13
openstackLaunchpad bug 1761649 in openstack-manuals "HAProxy in openstackhaguide nova placement api" [Medium,Triaged]12:13
cdenthmmm, I wasn't aware of that bug, let me read through it12:16
*** tssurya has joined #openstack-placement12:18
fanzhangwe just hit this bug: after adding `option httpchk` to haproxy.cfg, all placement services went crazy. Simply removing it helps, and when I googled, here it is.12:19
cdentIt's unclear how that would be related, unless the issue with the bad status line is causing the httpchk to invalidate remotes incorrectly12:20
cdentbut again: that all suggests to me that the haproxy is itself overloaded12:21
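cdent's hypothesis is that failing health checks could be pulling backends out of rotation. HAProxy debounces checks with `rise` (consecutive successes needed to bring a server back, default 2) and `fall` (consecutive failures needed to take it out, default 3). The toy state machine below models only that hysteresis; BackendHealth is a hypothetical name, not HAProxy code. Under overload, checks that time out can make servers flap in and out of rotation, concentrating traffic on whatever remains. Separately, `option httpchk` with no arguments sends `OPTIONS /` and counts only 2xx and 3xx answers as success, so a backend that rejects that method fails every check; whether that applies to placement in this deployment is not established in the log.

```python
from dataclasses import dataclass


@dataclass
class BackendHealth:
    """Hypothetical model of HAProxy's rise/fall health-check hysteresis."""
    rise: int = 2    # consecutive successes needed to mark a DOWN server UP
    fall: int = 3    # consecutive failures needed to mark an UP server DOWN
    up: bool = True
    streak: int = 0  # consecutive results contradicting the current state

    def record(self, check_ok: bool) -> bool:
        """Feed one check result; return whether the server is UP afterwards."""
        if check_ok == self.up:
            self.streak = 0  # result agrees with current state: reset
        else:
            self.streak += 1
            threshold = self.fall if self.up else self.rise
            if self.streak >= threshold:
                self.up = not self.up  # flip state, start counting afresh
                self.streak = 0
        return self.up
```

Three timed-out checks in a row take a server out; two good ones bring it back, so intermittent overload produces exactly the in-and-out pattern that makes a pool look unstable.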
fanzhanghmmm, really need to take a break here. Thanks so much for your time cdent :)12:23
fanzhangyou really help a lot :)12:23
cdentyou're welcome. good luck.12:24
*** e0ne has quit IRC13:19
*** e0ne has joined #openstack-placement13:33
*** mriedem has joined #openstack-placement13:38
*** helenafm has quit IRC13:59
*** e0ne has quit IRC14:11
*** e0ne_ has joined #openstack-placement14:11
openstackgerritMerged openstack/nova-specs master: Spec: Support filtering by forbidden aggregate  https://review.openstack.org/60335214:30
*** ttsiouts has quit IRC14:33
*** ttsiouts has joined #openstack-placement14:34
*** ttsiouts has quit IRC14:38
*** ttsiouts has joined #openstack-placement14:43
*** helenafm has joined #openstack-placement14:53
openstackgerritMerged openstack/nova-specs master: Expose virtual device tags in REST API  https://review.openstack.org/39393014:55
*** mriedem is now known as mriedem_afk15:11
*** tssurya_ has joined #openstack-placement15:16
*** tssurya has quit IRC15:18
*** evrardjp has quit IRC15:37
*** evrardjp has joined #openstack-placement15:42
*** efried has joined #openstack-placement16:26
*** tetsuro has quit IRC16:27
*** mriedem_afk is now known as mriedem16:32
*** tssurya_ has quit IRC16:52
*** avolkov has quit IRC16:52
*** gibi is now known as gibi_off16:55
*** helenafm has quit IRC16:55
*** e0ne_ has quit IRC16:59
*** rubasov has quit IRC17:01
openstackgerritJack Ding proposed openstack/nova-specs master: Select cpu model from a list of cpu models  https://review.openstack.org/62095917:13
*** tssurya_ has joined #openstack-placement17:19
*** cdent has quit IRC17:20
*** tssurya_ is now known as tssurya17:32
*** rubasov has joined #openstack-placement17:33
openstackgerritsean mooney proposed openstack/nova-specs master: Add spec for sriov live migration  https://review.openstack.org/60511617:41
*** sean-k-mooney_ is now known as sean-k-mooney17:44
*** ttsiouts has quit IRC17:50
*** ttsiouts has joined #openstack-placement17:51
*** ttsiouts has quit IRC17:55
*** efried has quit IRC17:59
*** efried has joined #openstack-placement17:59
*** avolkov has joined #openstack-placement18:00
*** avolkov has quit IRC18:13
*** jaypipes_ is now known as jaypipes18:28
*** avolkov has joined #openstack-placement18:34
*** e0ne has joined #openstack-placement18:49
*** tssurya has quit IRC19:07
*** e0ne_ has joined #openstack-placement19:25
*** e0ne has quit IRC19:25
*** N3l1x has joined #openstack-placement19:47
*** takashin has joined #openstack-placement20:51
*** e0ne_ has quit IRC21:21
openstackgerritMerged openstack/nova-specs master: Select cpu model from a list of cpu models  https://review.openstack.org/62095922:22
*** N3l1x has quit IRC22:26
