Monday, 2019-08-12

*** e0ne has joined #openstack-placement06:16
*** e0ne has quit IRC06:17
*** e0ne has joined #openstack-placement07:49
*** helenafm has joined #openstack-placement07:55
*** cdent has joined #openstack-placement08:12
*** cdent has quit IRC08:33
*** cdent has joined #openstack-placement08:33
*** openstackgerrit has quit IRC08:45
*** irclogbot_3 has quit IRC08:49
*** irclogbot_1 has joined #openstack-placement08:52
*** tssurya has joined #openstack-placement09:40
*** artom has joined #openstack-placement10:17
cdentgibi, stephenfin : two more performance fixes10:55
cdenteasy : https://review.opendev.org/67564810:55
cdentharder: https://review.opendev.org/67560610:56
cdentthat second one has a rather significant impact10:56
*** vdrok has quit IRC12:14
*** vdrok has joined #openstack-placement12:14
*** edleafe has joined #openstack-placement12:29
*** jroll has quit IRC13:01
*** jroll has joined #openstack-placement13:02
*** efried_pto is now known as efried13:09
*** mriedem has joined #openstack-placement13:18
*** belmoreira has joined #openstack-placement14:53
*** tssurya has quit IRC15:18
cdentefried: when you get a moment, this might blow off the top of your head: https://review.opendev.org/67560615:21
*** tssurya has joined #openstack-placement15:21
efriedcdent: on my list15:21
cdentcool, thanks15:21
efriedbrain still grinding into gear today. Been a bit... off lately.15:21
efriedand I know that one's going to require brain.15:21
cdenti know how it can be when brain being off15:22
*** openstackgerrit has joined #openstack-placement15:30
openstackgerritMerged openstack/placement master: Use expanding bindparam in get_traits_by_provider_tree  https://review.opendev.org/67564815:30
*** belmoreira has quit IRC15:44
*** helenafm has quit IRC15:53
*** e0ne has quit IRC16:01
gibicdent: left comments in https://review.opendev.org/#/c/67560616:07
cdentthanks16:09
openstackgerritChris Dent proposed openstack/placement master: Avoid duplicate ProviderSummary in _merge_candidates  https://review.opendev.org/67560616:28
cdentgibi: fixed ^16:28
*** dklyle_ is now known as dklyle16:34
efriedcdent: How can I run one gabbi test and have it print out the UUIDs from os.environ?16:36
efried(in a way that I can tell which keys belong to which values)16:36
cdentefried: need slightly more context16:37
efriedOkay16:37
efriedI think https://review.opendev.org/#/c/675606/3/placement/objects/allocation_candidate.py@934 is very wrong, and I'm trying to prove it16:37
efriedso I reverted it, and it fails with KeyError, which I'm sure is why you changed it16:38
cdentyes16:38
efriedso when I get there I want to print out some of the provider UUIDs to prove that that key belongs to a not-root provider, which is what we're assuming it is when it's not found in the dict.16:38
efriedjust thought of another way to do that...16:39
cdentyeah, I wouldn't bother with gabbi for htat16:39
efried(from the code, not from the gabbi16:39
efried)16:39
efriedum, still16:40
cdentthe different here is that parent_uuid_by_rp_uuid has different stuff in it now that I've changed the algorithm16:41
cdentless stuff16:41
efriedYes, I understand that.16:41
cdentit used to have _all_ the stuff16:41
cdent(even though it shouldn't have)16:41
efriedIn the cases you were getting KeyError, it's because you're no longer processing cn116:41
efriedAnd you're assuming numa0 and numa1 are roots16:42
efriedI just need to prove it16:42
cdentthat may b16:43
cdentif so, we need a test for it, because all the tests currently pass16:43
cdentand we need to make sure that dict gets the right stuff16:43
efriedtests do *not* pass16:43
efriedunless you do that .get()16:43
efriedwhich is bypassing the bug16:43
efriedThe test cases that cover that path, albeit somewhat incidentally, are those where we return results that don't involve the root provider.16:44
cdentwhat i mean is that we need a more explicit test16:44
cdentit may be as simple populating the parent_uuid thing somewhere else in the loop16:45
cdentor more16:45
cdentyou accept the rest of the problems, yes, that too much data was involved?16:45
cdentalso, my change to use get is still returning the same actual value, just not a literal value in the dict16:46
cdentso I'm not sure...16:46
efriedsay what??16:47
efried"returning the same actual value" - it's returning None when the key is absent.16:47
efriedthere is no "same actual value"16:47
efriedI must be misunderstanding you.16:47
efriedYes, I accept that we needn't process psums that aren't involved in the request - as long as we make sure to process all the psums in all the *trees* that have providers involved in the request.16:48
cdentI _thought_, but maybe not, that parent_uuid is None16:48
efriedparent_uuid is None only for roots.16:49
efriedor, should be.16:49
cdentyes16:49
efriedotherwise it should have a value16:49
cdentyes16:49
efriedswhy I'm trying to print out cn1's uuid16:49
* cdent nods16:49
cdenthowever16:49
efriedprinting uuids for cn1 and cn2 to prove that the one we're getting KeyError on isn't one of them.16:49
cdentif the tests pass when I force None there, that means that the tests already expected None there16:50
cdentthat is they expected the recursion to hit its base case16:50
cdentso what's happening is that parent_uuid_by_rp_uuid is sparse16:50
* cdent is speculating16:51
* cdent we will wait while efried doesn't further exploration.16:51
efriedRight, I'll accept that we should have an existing same_subtree test go deeper to confirm that the result changes when we're assuming numa0/1 is the root, which ought to be the case.16:51
cdentI'm in the midst of a tedious email, will rejoin this in a moment16:52
efriedbecause that ought to be the impact. We're using that recursive thingy to calculate same_subtree-ness16:52
efriedso if we're getting too short a tree, we should get a different result.16:52
efriedbut whatever test(s) we're doing would happen to get the same result either way.16:52
efriedwhich probably means we need a test with a deeper tree, where the subtree root is at level 3 instead of level 2.16:52
cdenti reckon root of the subtree needs to be resourcelesss with a resourceless gap between it and anything providing resources in the same tree16:56
cdenta parent without resources having a parent without resources16:56
efriedf, summary.resource_provider.name is empty16:58
efriednot16:58
efriedhelpful16:58
efriedI feel really stupid that I can't f'ing figure out how to printf wtf I need here.16:59
cdentmaybe you're trying to be too focused16:59
cdentjust print all the stuff16:59
efriedthat's what I thought I was doing, but without 'name' it's pretty hard for me to tell which provider is which.17:00
cdentline 538 the name isn't being set because we don't have it17:02
cdent(i guess to save space)17:03
efriedfwiw it's not being set on the arr either17:04
efried(not sure if those are coming from the same place, prolly are)17:04
cdenti think it all comes from creating provider_ids, which was optimized (by jay I think) for space17:04
cdentyou can LOG.debug when you start the fixtures with the data you care about, printing a mapping of name to uuid17:05
efriedI tried that17:06
efriedthe log didn't show up17:06
cdentbecause OS_DEBUG isn't true and a test didn't fail17:07
cdentyou can either change it to warning or some other high level, or force the test to fail17:07
efriedI failed the test, and printed it as info.17:07
efried"when you start the fixtures" -> I was doing it at the end of the fixture setup, is that what you meant?17:08
efriedI added name to the psum17:08
efriedso now I can see that the KeyError is indeed coming off of cn1's UUID17:09
efriedwhich means this will work until we have a candidate that doesn't involve the root or first tier down.17:10
cdentwe should see about adding that as a test17:10
cdentbecause we don't know for certain that what you said is true, only that something that doesn't have a parent is not in the dict17:11
cdentyou're likely right17:11
cdentthat is: it not having a parent, in this instance, is correct, so we need a setup where we see the problem where it does17:12
efriedyes. I'm trying to formulate that in words on the review. Then we get to decide who writes that test.17:14
efriedBecause I'm pretty loaded down, so I would like it to be not-me if possible.17:14
efriedJust need to make someone understand what the test needs to look like.17:14
cdenti can probably make it happen if you do that ^17:14
efriedcdent: I left the review. I haven't codified precisely what that test needs to look like, because I don't really know yet. Did I give enough information for someone to experiment until found?17:17
cdentI _think_ so17:19
cdentit at least gives me enough to mess about with and see what I can break17:19
efriedcdent: all that said...17:21
efriedI'm not sure that part of the optimization is the necessary part to reduce from 21000 to 700017:21
cdentit was necessary to remove carrying around ids17:22
cdent(or psums)17:22
cdentyou have to look at something to get rp ids17:22
efriedI suspect it would reduce from, like, 7000 to 6000 or something, by eliminating the summaries for providers not involved in the request17:22
efriedwhich, as noted, might not be the right thing to do17:22
efried(Are we still returning all summaries for the whole tree around providers involved in the request?)17:23
cdentwe have tests for that, yes?17:23
cdentI _absolutely_ rely on the tests when doing this stuff. If we have gaps, we need to fill them.17:24
cdents/absolutely/solely/17:24
efriedwe should, yeah17:24
efriedanyway, what I'm saying is...17:24
efriedI think the reduction from 21000 to 7000 happened by virtue of removing the summaries as part of the return values you're slinging around between methods.17:24
cdentyes, that's a given17:25
efried...and relying solely on the cache in rw_ctx17:25
cdentthe chunk of code that is setting the parent_by dict is a result of that change, not a helper fo rit17:25
efriedwell then, a) let's split that change out, because I can get behind that being right, and b) then we can rationalize whether this other thing actually saves us anything, and maybe we don't need to "fix" this "problem" at all.17:25
cdentwe do17:26
cdentI already established that somewhere in the process17:26
cdentnot that I can recall it today17:26
efriedBased on some of the comment residue, it looks like you went through some stages in the middle17:26
efriedand I don't doubt that this may have been necessary in one of those stages.17:27
efriedBut I don't think it's necessary right now.17:27
cdenthmmm17:27
cdenti'll look, but I'm not convinced17:27
efriedI think if you cut L737 back to previous algorithm, but use psums from rw_ctx instead of from candidates (cause they're no longer there)17:27
efried...then you'll have achieved the 21->7 reduction17:27
cdentrw_ctx has all provider summaries17:28
efriedyes17:28
cdentwe are in one of multiple calls to merge17:28
efriedyes17:28
cdentso that's not good enought17:28
cdenti mean, if you want to go that way just so we can merge stuff in stages, I suppose that's an option, but it's far from ideal17:31
efriedI think it's L383 and 453 that effects the 21->7 reduction.17:31
efriedHow hard is it for you to prove the 21000 vs 7000 bit?17:31
efriedI mean, can you set that test up pretty easily?17:31
cdentit's not simply about proving the 21000 vs 7000 bit17:31
cdentthat was an offshoot of trying to fix that merge_candidates was doing too much work17:32
cdentthe goal is to make merge_candidates do less work17:32
cdentif it happened to be less expensive to mess with 21000 redundant objects I'd be happy with that17:32
cdentbut it's not17:32
cdentbut there's more to it than that17:32
cdentfurther, without the change I made we wouldn't have discovered that we've got a testing gap17:34
cdent(thigns would have worked by accident of data structure, not intent)17:34
cdentin any case: I will work on this some more and attempt to make it better on several dimensions17:34
efriedso17:34
cdentbut unlikely tonight17:35
efriedwhat we want to do is17:36
efriedmake _merge_candidates only iterate over the providers involved in the request17:36
efriedbut we're pretty sure that's going to have to include tree members, or at least ancestors, if we want to be building parent_uuid_by_rp_uuid17:36
efriedbut17:36
efriedwhat we could do is build that dict ^ just once, outside of _merge_candidates, based on the (whole) psums dict from rw_ctx.17:37
efriedand then the reduction of the loop in _merge_candidates makes total sense.17:37
cdentyes, that's effectively what I saw above when I said something like "we could build the dict somewhere else"17:37
efriedbecause we're only relying on it to do capacity checks17:37
cdents/saw/said/17:37
efriedwhich can be legitimately restricted to single providers.17:38
efriedokay, neat.17:38
efriedIf we did that, we wouldn't need that .get(), and I would be happy to acknowledge the testing gap with a TODO.17:39
cdentI think it's really a case of: come up with the broken test17:39
cdentfix it17:39
cdentno17:39
cdentmuch better to fill the gap17:39
cdentand fix it17:39
efriedas you wish. I'm just saying if the code were structured that way, I would be able to convince myself by inspection that it's still correct.17:40
cdentright, but I can't structure code the "right way" without having a test that shows me the wrong way17:40
efriedbecause we would have effectively not changed the algorithm.17:40
cdentbecause as far as the tests are cocncerned, right now, I've done it the right way. even the "best way"17:41
cdentbut you've found a hole17:41
cdentthat's the important part17:41
efriedokay.17:41
cdentconvincing by inspection gives me shivers17:42
*** tssurya has quit IRC17:42
* cdent dines17:49
*** cdent has quit IRC17:49
*** e0ne has joined #openstack-placement20:26
*** efried has quit IRC20:39
*** efried has joined #openstack-placement20:40
*** artom has quit IRC20:46
*** gryf has quit IRC20:58
*** e0ne has quit IRC21:49
*** mriedem is now known as mriedem_afk22:03
*** spatel has joined #openstack-placement22:14
*** spatel has quit IRC22:37
*** mriedem_afk has quit IRC23:19
*** takashin has joined #openstack-placement23:38
*** spatel has joined #openstack-placement23:54
*** alex_xu has quit IRC23:57
*** spatel has quit IRC23:59
*** alex_xu has joined #openstack-placement23:59

Generated by irclog2html.py 2.15.3 by Marius Gedminas - find it at mg.pov.lt!