Tuesday, 2018-11-27

*** tetsuro has joined #openstack-placement00:04
*** mriedem_away has quit IRC00:23
*** takashin has joined #openstack-placement02:19
openstackgerritTetsuro Nakamura proposed openstack/placement master: Add alembic version stamp capability to the DB  https://review.openstack.org/62021605:44
openstackgerritTetsuro Nakamura proposed openstack/placement master: Add alembic version stamp capability to the DB  https://review.openstack.org/62021606:01
openstackgerritTetsuro Nakamura proposed openstack/placement master: Add alembic version stamp capability to the DB  https://review.openstack.org/62021607:01
*** tssurya has joined #openstack-placement08:13
*** tssurya has quit IRC08:49
*** cdent has joined #openstack-placement09:01
*** tssurya has joined #openstack-placement09:01
openstackgerritMerged openstack/placement master: Documentation cleanup: front page  https://review.openstack.org/61927309:47
cdenthuzzah09:48
*** takashin has left #openstack-placement10:04
openstackgerritMerged openstack/placement master: Add a doc describing a quick live environment  https://review.openstack.org/61334310:22
openstackgerritChris Dent proposed openstack/placement master: Allow placement to start without a config file  https://review.openstack.org/61904910:38
cdentgibi: fixed that missing word, thanks10:38
cdentgibi: can you have a look at https://review.openstack.org/#/c/619121/ ? It's pretty minor changes but may help with figuring out the race that is happening on https://review.openstack.org/#/c/617941/10:39
*** sean-k-mooney has quit IRC11:04
*** sean-k-mooney has joined #openstack-placement11:08
gibicdent: thanks for the update, plugged the +2 back11:31
cdentthanks!11:31
gibicdent: I've queued https://review.openstack.org/#/c/617941/ for review as well11:31
cdentdouble thanks!11:31
cdentI have the nova functional tests running in an infinite loop (breaking on error) trying to make the race happen and I'm not making much progress :(11:32
gibicdent: I can try to run the test in the background too, see if I'm more lucky11:33
cdenthmm, just got a failure. 'ServerGroup policy is not supported: ServerGroupAntiAffinityFilter not configured' which suggests CONF is getting messed up, which is not surprising (this doesn't appear to have anything to do with placement itself, but perhaps with conf being migrated back and forth)11:35
gibicdent: I saw that group failure before so that is definitely a thing11:37
cdenta different thing :(11:37
gibiserver group checks are using global state so I'm also not surprised11:37
cdentso that particular problem, at least, is unlikely to be driven by the placement changes if it's been around before?11:38
gibicdent: yes, I saw that before the placement separation became a reality11:39
cdentk11:39
* cdent thinks11:39
gibicdent: I think this is the place causing the group failure https://github.com/openstack/nova/blob/1a1ea8e2aa66a2654e6cc141c735e47bbd8c4fef/nova/scheduler/utils.py#L80511:39
gibicdent: nasty globals11:39
cdentewww11:39
cdentyeah11:39
gibicdent: did you run the nova functional on https://review.openstack.org/#/c/617941 with or without https://review.openstack.org/#/c/619121/ ?11:42
cdentboth11:42
cdentright now I'm running with11:42
gibiack11:43
gibiI'll first try without11:43
cdenthmm. I can break that server group stuff regularly. I'll look into that... later11:45
openstackgerritTetsuro Nakamura proposed openstack/placement master: Add alembic version stamp capability to the DB  https://review.openstack.org/62021611:45
gibiit wasn't frequent enough in my env to push me towards trying to fix it11:46
gibicdent: ... and at the first functional run I now hit the group race :)11:47
* cdent facepalms11:48
cdentIt looks like 'class ServerGroupTestBase' ought to be doing some cleanup on those globals. Some of the tests mock them, but not all of them.11:55
* cdent tries it12:11
*** tetsuro has quit IRC12:13
* cdent needs more hardware12:14
cdentlots more hardware12:14
* gibi is lucky enough to have access to a machine in the OPNFV lab that has 88 x86 cores12:20
cdentwow. nice. I max out at 1612:21
cdentand since I've got that one in the infinite loop, I'm running some other tests on my laptop12:21
cdentwhere I have 412:21
cdentoh great. the more I poke at this, the worse it gets. I've got server group tests failing regularly, on nova master12:22
sean-k-mooneygibi: hehe when i was still at intel i had a couple of those :) i miss my 88 core 192GB ram compute nodes12:25
cdentluxury12:26
sean-k-mooneygibi: also 88 core machines really show why defaulting service workers to $(nproc) is a dumb idea12:26
cdentquite12:26
cdentit's a dumb idea in any situation12:26
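
One possible alternative to a bare `$(nproc)` default, sketched under assumptions: the function name `default_workers` and the cap of 8 are hypothetical and not taken from any OpenStack service. It only illustrates bounding the worker count so an 88-core host does not spawn 88 workers per service.

```python
import os


def default_workers(cpu_count=None, cap=8):
    """Return a worker count bounded by `cap` (a hypothetical limit)."""
    if cpu_count is None:
        # os.cpu_count() may return None when the count is unknown.
        cpu_count = os.cpu_count() or 1
    # Never fewer than one worker, never more than the cap.
    return max(1, min(cpu_count, cap))
```

With these assumptions, `default_workers(88)` yields 8 while `default_workers(4)` still yields 4.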
cdentsean-k-mooney: if you have a clean nova master lying around can you do a 'tox -efunctional test_get_groups_all_projects' and let me know if it is happy?12:27
sean-k-mooneyi had 30% idle cpu usage in the server because of 88 gnocchi metric collectors for like 5 mins till i deleted gnocchi12:27
* cdent spins up a few more vms12:28
sean-k-mooneysure, i can try it; unfortunately it will be running on my personal hardware or a vm since i don't have beefy servers anymore12:29
cdentno problem, I just want to confirm that the issue I'm seeing is just me12:30
* cdent also needs more spindles12:31
sean-k-mooneyit ran without errors12:33
sean-k-mooneywant me to check out a patch and run it again?12:33
sean-k-mooneycdent: full output in case that helps http://paste.openstack.org/show/736081/12:35
cdentnope that's fine, thanks for doing that12:36
sean-k-mooneyno worries happy to help12:36
sean-k-mooneyis that the test that is racing with the placement fixture?12:37
cdentno, I was going down the rabbit hole with regard to the server group tests, trying to get their own racing out of the picture12:39
gibicdent: I've tried running the tests in the same order as they were run in the failed gate job but it does not reproduce the problem. So I think it is not just two interfering test cases12:41
cdentgibi: yeah. it's...weird12:42
cdentgibi: when comparing master and the placement fixture branch, the ServerGroup tests are much more likely to fail on the latter13:01
cdentwhich is yet more weird13:01
gibicdent: your patch removes a lot of tests but most of them are unit tests or gabbits so, yeah weird13:03
gibicdent: btw I managed to produce another test failure with https://review.openstack.org/#/c/617941/ without the placement change. See http://paste.openstack.org/show/736086/13:04
cdentoh really13:05
cdentthat is useful13:05
cdentyeah, so that's the original problem, before I tried adding the fixture tidy up patch as a depends-on13:06
cdentif we can't cause that to happen with the depends-on then maybe it helps :)13:06
gibiOK, I will start running the patch with its dependency13:06
cdentthanks for doing this gibi, I think I might go crazy if I kept on with this stuff solo.13:08
gibicdent: running nova functional with the placement deps still fails with TransactionFactory is already started: http://paste.openstack.org/show/736088/13:33
cdentgibi: I guess that's to be expected13:54
cdentany ideas?13:54
sean-k-mooneyyou should not get the TransactionFactory error anymore after the run_once decorator13:55
gibicdent: I thought that your placement deps was fixing this but I read the patch and now I'm not sure13:56
cdentsean-k-mooney: we're resetting that, on purpose13:56
sean-k-mooneyoh ok13:56
cdentsean-k-mooney: we have to because we need up to 3 different engine-types in the same process13:56
cdentit's a mess13:57
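
To illustrate the exchange above, here is a generic run-once decorator with an explicit reset. This is a sketch written for this log, not the decorator from nova or placement, and `configure_engine` plus the engine type strings are hypothetical. It shows why resetting on purpose reopens the door to reconfiguration within one process.

```python
import functools


def run_once(func):
    """Call `func` only the first time; later calls return that result."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        if not wrapper.called:
            wrapper.result = func(*args, **kwargs)
            wrapper.called = True
        return wrapper.result

    def reset():
        # Deliberately allow the next call to run the function again.
        wrapper.called = False
        wrapper.result = None

    wrapper.called = False
    wrapper.result = None
    wrapper.reset = reset
    return wrapper


configured = []  # records every real configuration, for illustration


@run_once
def configure_engine(engine_type):
    """Hypothetical stand-in for one-time database engine setup."""
    configured.append(engine_type)
    return engine_type
```

A second call with a different engine type is silently ignored until `configure_engine.reset()` is called, after which the setup runs again. That is useful when one process needs several engine types across tests, and it is also how already-started state can resurface.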
sean-k-mooneyi assume the different engine types are for different tests?13:59
cdentyes14:02
cdentgibi: responded to your comments on the fixture adjustments. I hadn't really expected them to fix the current issues. It was more of a wild guess, since I already had that code around for a few days and I was hoping (in a useless way) that having fewer globals would make a difference14:10
gibicdent: OK, thanks, I misunderstood the goal of the placement patch a bit14:12
cdenton the ServerGroup stuff I think the issue may be with policy file handling14:15
*** mriedem has joined #openstack-placement14:18
cdentgibi: I think one of the several factors here _may_ be policy handling having global conf in itself14:23
cdentbut I'm not really clear. Unfortunately I have to do an internal thing before the end of day tomorrow so I need to drop this now, if you figure something out, feel free to fix it, or leave your notes on the changes14:25
gibicdent: ack, I also can't promise too much progress14:25
cdent:)14:25
cdentI keep hitting another variable and falling in a hole, so maybe after a break I'll figure out a way to narrow things14:26
gibi:)14:26
cdentmriedem or dansmith you may have thoughts on https://review.openstack.org/#/c/620216/14:28
tssuryaefried: so once we disable the whole refreshing, the only call to placement during periodic update would be this periodic checker https://github.com/openstack/nova/blob/1e823f21997018bcd197057ebd4d6207a5c54403/nova/compute/resource_tracker.py#L781 for allocations14:28
cdent(using stamp after migration)14:28
cdentback after a while14:29
* cdent waves14:29
tssuryaefried: I do see jaypipes's comment https://github.com/openstack/nova/blob/1e823f21997018bcd197057ebd4d6207a5c54403/nova/compute/resource_tracker.py#L1237 about this being "sucky code"14:29
*** cdent has quit IRC14:29
tssuryaefried: but yea probably it's a good idea to have that query every 60secs to keep it consistent14:29
efriedtssurya: Catching up...14:41
efriedtssurya: Yeah, it looks like that will still happen.14:50
efriedIt should be noted that there's a pretty sharp dividing line between [providers, inventories, traits, aggregates] and [consumers/instances, allocations] in terms of how and where we query, store, and use the information in the resource tracker.14:50
efriedTo wit: we cache the former in the ProviderTree object in the report client, let the virt driver muck with them (via update_provider_tree), and assume they won't change out of band.14:50
efriedWhereas the latter we don't cache, and we treat it as much more dynamic and subject to changing, e.g. due to migrations of various forms.14:50
efriedThat said, once we flush the last vestige of "doubling allocations" (I think that's in the evacuate path? gibi?) we may be able to do away with some of this "sucky" code.14:51
mriedemwe also double allocations on resize to same host14:52
mriedemala https://review.openstack.org/#/c/619123/14:52
efriedmriedem: ack, thx. We're on the road to fixing that, yah?14:53
efriedthough I guess we haven't figured out how we're gonna do it yet14:53
gibiefried: yepp, the only real doubling is during evacuation the rest is using the migration_uuid as a consumer for the dest host allocation14:54
gibimriedem: resize to same host still uses two different consumers14:54
gibimriedem: as far as I know14:54
efriedbut it's still effectively doubling the allocation, because both consumers are on the same host.14:54
gibiefried: from host perspective it is doubled from consumer perspective it is not :)14:55
gibiefried: but I agree14:55
mriedemefried: well i've got the functional regression recreate there, and the bug reported, with hacky ideas in the bug report, but i'm not actively working on fixing that yet14:55
mriedemgibi: yeah correct14:55
gibiso evac doubles from consumer perspective but not from host perspective. The resize to same host doubles it from host perspective but not from consumer perspective. what a nice complete coverage of possibilities :)14:57
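
The two kinds of doubling summarized above can be made concrete with a small sketch. The data is invented for illustration (host names, consumer names and the 2 VCPU figure are assumptions), and `usage_by` is a hypothetical helper, not a placement or nova API.

```python
from collections import defaultdict


def usage_by(allocations, key):
    """Sum allocated VCPUs grouped by 'consumer' or by 'host'."""
    totals = defaultdict(int)
    for alloc in allocations:
        totals[alloc[key]] += alloc["vcpu"]
    return dict(totals)


# Evacuation: one consumer (the instance) holds allocations on two hosts.
evacuate = [
    {"consumer": "instance-1", "host": "src", "vcpu": 2},
    {"consumer": "instance-1", "host": "dest", "vcpu": 2},
]

# Resize to same host: two consumers (the instance and the migration
# record) each hold an allocation on the same host.
resize_same_host = [
    {"consumer": "instance-1", "host": "host-a", "vcpu": 2},
    {"consumer": "migration-1", "host": "host-a", "vcpu": 2},
]
```

Grouping `evacuate` by consumer gives 4 VCPUs for `instance-1` while each host sees only 2; grouping `resize_same_host` by host gives 4 VCPUs on `host-a` while each consumer holds only 2.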
*** ttsiouts has joined #openstack-placement15:58
*** dansmith has quit IRC16:02
*** dansmith has joined #openstack-placement16:02
openstackgerritMerged openstack/placement master: Clean up and clarify tox.ini  https://review.openstack.org/61171916:14
*** ttsiouts has quit IRC17:06
*** ttsiouts has joined #openstack-placement17:07
*** ttsiouts has quit IRC17:11
openstackgerritJack Ding proposed openstack/nova-specs master: [WIP] Flavor Extra Spec and Image Properties Validation  https://review.openstack.org/61854217:12
*** cdent has joined #openstack-placement17:16
openstackgerritChris Dent proposed openstack/placement master: Start a contributor goals document  https://review.openstack.org/61881117:26
cdentgibi: I've got a demo on the lower-constraints thing that shows the reason for the install command17:39
openstackgerritChris Dent proposed openstack/placement master: Correct lower-constraints.txt and the related tox job  https://review.openstack.org/61455917:43
openstackgerritArtom Lifshitz proposed openstack/nova-specs master: Re-propose numa-aware-live-migration spec  https://review.openstack.org/59958717:49
*** tssurya has quit IRC18:50
* cdent watches --until-failure not fail19:12
edleafeNot failing is failure?19:18
cdentI'm unsure on how much I need to convince myself19:19
cdentand when I do, I then need to convince myself in the other direction19:20
mriedemthis might make you feel better https://review.openstack.org/#/c/617662/19:42
cdenthurrah19:49
cdentwow, that took a long time, but failed20:10
cdentwhich disproves one hypothesis20:11
mriedemno jaybird huh20:32
cdenti'm not even in a maze of twisty passages, I'm in one of those smelly ponds full of stank and ugh20:42
efriedmriedem: while I've got it in front of me, would you please hit https://review.openstack.org/#/c/619299/ so we don't somehow forget it before the release? Easy peasy.20:58
mriedemwhy does that depend on a grenade change?21:01
cdentmriedem: that can probably go away now21:02
cdentIt was part of trying to figure out the issue with swift21:02
mriedemok so totally unrelated21:02
cdentno,21:03
cdentthe only way it could be properly tested was if tempest and grenade were running21:03
cdentand those were not working until my swift fix21:03
cdentso it is unrelated _now_21:03
cdentbut it wasn't then21:03
cdentso if one of the two of you can clean it out, that would be awesome, as I'm in a deep hole21:03
openstackgerritMerged openstack/placement master: Add integrated-gate-py35 template to .zuul.yaml  https://review.openstack.org/61756521:22
mriedemdansmith: fyi placement is now gating on tempest/devstack and grenade ^21:23
dansmithcool21:23
mriedemwe might want to send a thing to the ML to let people know that devstack is now using extracted placement...21:24
mriedemdevstack and grenade21:24
mriedemin case weird issues crop up21:24
mriedemwho's it?21:24
mriedemi guess i can do it21:26
mriedemefried: cdent: ok comments inline on https://review.openstack.org/#/c/619299/21:33
cdentefried: i'm not going to be able to get to that until tomorrow if you feel inclined to do it today. I agree with what mriedem says21:36
mriedemi was going to push up a change to drop [keystone] and fix the missing [keystone_authtoken] entry in the config docs21:36
mriedemthen i think i can just tweak the commit message and such and we're happy21:36
cdentoh if you're happy to do that, then awesome21:37
mriedemit's better than reviewing specs21:37
cdenttru21:38
cdentthe thing I'm doing now will likely need to base off that as it's moaning about duplicated config21:39
openstackgerritMatt Riedemann proposed openstack/placement master: Remove keystoneauth1 opts from placement config group  https://review.openstack.org/61929921:46
openstackgerritMatt Riedemann proposed openstack/placement master: Remove keystoneauth1 opts from placement config group  https://review.openstack.org/61929921:56
openstackgerritMatt Riedemann proposed openstack/placement master: Remove [keystone] config options from placement  https://review.openstack.org/62041221:56
efriedcdent: Are we going to need ksa adapter opts for anything from placement? I wouldn't have thought so, right?22:08
cdentit doesn't talk out, so I reckon no22:08
efriedmriedem: make https://review.openstack.org/620412 bigger, see comment.22:08
efriedcdent: ^22:08
cdentyeah, agree22:09
mriedemit's just never enough is it22:12
efriedmriedem, meet world.22:12
cdentless code mmm good22:13
efried++22:13
efriedWho wrote that pos module anyway?22:13
mriedemyeah yeah i'll do it after i'm done shitting on something else atm22:13
openstackgerritMatt Riedemann proposed openstack/placement master: Remove [keystone] config options from placement  https://review.openstack.org/62041222:25
efried+2, yay.22:32
*** mriedem has quit IRC23:46
