Wednesday, 2018-11-28

*** tetsuro has joined #openstack-placement00:11
*** sean-k-mooney has quit IRC00:44
*** sean-k-mooney has joined #openstack-placement00:58
openstackgerritMerged openstack/placement master: Remove keystoneauth1 opts from placement config group  https://review.openstack.org/61929902:56
openstackgerritChris Dent proposed openstack/placement master: Remove [keystone] config options from placement  https://review.openstack.org/62041203:08
openstackgerritChris Dent proposed openstack/placement master: WIP: Stop using global oslo_config  https://review.openstack.org/61912103:08
*** cdent has quit IRC03:28
*** diga has joined #openstack-placement03:47
*** diga has quit IRC04:49
*** e0ne has joined #openstack-placement05:37
*** e0ne has quit IRC05:42
*** e0ne has joined #openstack-placement05:44
*** e0ne has quit IRC05:45
*** alex_xu has quit IRC07:50
*** e0ne has joined #openstack-placement07:54
*** alex_xu has joined #openstack-placement07:55
openstackgerritTetsuro Nakamura proposed openstack/placement master: Add alembic version stamp capability to the DB  https://review.openstack.org/62021608:01
openstackgerritTetsuro Nakamura proposed openstack/placement master: Set root_provider_id in the database  https://review.openstack.org/61912608:04
*** e0ne has quit IRC08:37
*** tssurya has joined #openstack-placement08:41
*** tetsuro has quit IRC09:03
openstackgerritDebo Zhang proposed openstack/placement master: Fix typo  https://review.openstack.org/62054109:21
openstackgerritGhanshyam Mann proposed openstack/nova-specs master: Spec for API inconsistency cleanup  https://review.openstack.org/60396909:22
openstackgerritGhanshyam Mann proposed openstack/nova-specs master: Spec for API inconsistency cleanup  https://review.openstack.org/60396909:23
*** ttsiouts has joined #openstack-placement09:59
*** ttsiouts has quit IRC10:02
*** ttsiouts has joined #openstack-placement10:03
*** ttsiouts has quit IRC10:03
*** ttsiouts has joined #openstack-placement10:03
*** ttsiouts has quit IRC10:06
*** ttsiouts has joined #openstack-placement10:07
*** cdent has joined #openstack-placement10:11
*** ttsiouts has quit IRC10:12
cdentmorning gibi10:12
*** ttsiouts has joined #openstack-placement10:13
gibicdent: morning10:13
*** ttsiouts has quit IRC10:14
*** e0ne has joined #openstack-placement10:14
*** ttsiouts has joined #openstack-placement10:15
*** ttsiouts has quit IRC10:18
*** ttsiouts has joined #openstack-placement10:19
*** ttsiouts has quit IRC10:21
*** ttsiouts has joined #openstack-placement10:22
*** ttsiouts has quit IRC10:22
openstackgerritMerged openstack/placement master: Remove [keystone] config options from placement  https://review.openstack.org/62041210:44
cdentgibi: I fixed the two easy to fix test failures on the nova using external placement change but I'm basically stumped at this point. I'm running the placement tests in a loop now, just to be sure they don't have anything lingering, but I don't think that's a factor.10:46
cdentSo I'm now really sure how to proceed from here, other than the eventlet poking you're planning to do. Any other ideas?10:47
cdentI'll get to the alembic and lock fixes on the placement side in a while10:48
cdentFor giggles i ran the placement tests --uniti-failure. I got to >100000 tests and gave up11:06
*** ttsiouts has joined #openstack-placement11:13
gibicdent: I lost the morning runs due to a mistake. I thought I run the test with a lock in @run_once but I added -r to the tox command that re-coloned placement in the env :/11:16
gibiin theory run_once in in an eventlet environment allows calling the wrapped func more than once, if during the first call an eventlet switch happens11:17
gibias run_once set the wrapped.called = True after the func returned11:17
gibi(or raised)11:18
*** ttsiouts has quit IRC11:18
openstackgerritBrin Zhang proposed openstack/nova-specs master: Support for changing deleted_on_termination after boot  https://review.openstack.org/58033611:33
gibicdent: I've also tried to record who calls the configure method but as soon as I add some extra code to that place the problem disappears  so I was not able to catch the first caller when there are more than one calls :/ Most probalby adding more code there changes the timing of the race11:37
cdentgibi: ugh. This is _so_ much fun...11:41
gibicdent: we will learn from it... if we figure out what is wrong11:42
cdentI wish we could turn eventlet all the way off in the nova tests. It's the source of much pain.11:42
cdentOh yeah, the learning is good11:42
*** e0ne has quit IRC12:09
*** ttsiouts has joined #openstack-placement12:11
*** ttsiouts has quit IRC12:12
*** ttsiouts has joined #openstack-placement12:13
*** ttsiouts has quit IRC12:17
gibijust to increase the fun factor with PS 25 I have been running the tests without failure in the last 40 minutes12:27
* cdent shivers12:29
*** jaypipes has joined #openstack-placement12:31
cdentmoin jaypipes12:31
jaypipescdent: hey Chris, how are ya? :)12:33
cdentI'm swinging from disappointed and meh about everything to omg, we're fixing it!12:34
jaypipescdent: heh, I'm not swinging.12:35
jaypipescdent: just seem to be in a permanent state of disappointment of late.12:35
cdentdid you see my removing global config email?12:35
cdentI included you on it beause it might shine a small bit of light12:35
cdentI'm sure I'll swing the other way soon enough, but for now it's a brief moment of nice12:36
jaypipescdent: heh, just sent a reply. great stuff. will review today.12:37
cdentI've fixed the "need to be fixed bits" and am just now making sure it is all kosher12:37
cdentthanks12:38
jaypipesnp, exciting to see it.12:38
gibicdent: OK, I just needed to be patient. Now failed again12:38
*** ttsiouts has joined #openstack-placement12:38
cdentgibi: gah!12:39
cdentI don't really have any ideas. Do you have a working theory of what's going on?12:40
openstackgerritChris Dent proposed openstack/placement master: Stop using global oslo_config  https://review.openstack.org/61912112:41
cdentthat's ^ passing everything locally now, including the opportunistic tests, we'll see what tempest thinks of it12:41
gibicdent: this was still without to lock in run_once, so now I'm adding the lock and running again12:42
cdentah, okay12:42
cdentthat's a relief12:42
cdentI thought you were saying that the lock didn't help12:42
gibicdent: now I have a baseline that ps 25 still fails, and needs at least and hour of continuous run to reproduec12:42
cdenthow many total tests does an hour produce on your beastly machine?12:43
gibi24 full functional run12:45
jaypipescdent: I'm unfortunately unfamiliar with the tox siblings stuff. Does adding a required-projects in the .zuul.yaml essentially require setting tox_install_siblings? Also, what specifically does tox_install_siblings *do*? :) Looking at the YAML for the ansible task here doesn't really indicate anything: https://github.com/openstack-infra/zuul-jobs/blob/master/roles/tox/tasks/siblings.yaml. Plus, googling for "tox siblings" gives me results12:53
jaypipesfor "toxic siblings" which is funny, but non-useful.12:53
cdentjaypipes: ah yeah, it is all a bit obscure. basically it goes like this:12:54
cdentrequired projects lists the things that zuul is going to git clone early in its process12:54
*** ttsiouts has quit IRC12:56
cdentif tox siblings is turned on, then at some stage in the process zuul will look at the packages that are in the virtualenv for the jobs it is running and if any of them are provided by and of the things is had cloned early, it will re-install those packages into the virtenv, from the local git repo12:56
*** ttsiouts has joined #openstack-placement12:56
cdentthis means that you will either get master or whateven the depends-on version of that thing (in this case placement) in the test environment12:56
jaypipescdent: got it. thanks for the excellent explanation!13:03
*** e0ne has joined #openstack-placement13:05
jaypipescdent: +269, -17988 <-- nice.13:07
*** ttsiouts has quit IRC13:12
*** ttsiouts has joined #openstack-placement13:13
*** ttsiouts has quit IRC13:18
*** ttsiouts has joined #openstack-placement13:33
jaypipescdent: sorry, getting back to a review on the nova patch for removing placement stuff... could you explain to me what the from __future__ import absolute_import accomplishes in https://review.openstack.org/#/c/617941/25/nova/tests/functional/api_paste_fixture.py?13:35
jaypipescdent: is there some placement-y module that is shadowing one of the imports there?13:36
cdentjaypipes: catching up, was eating13:40
* cdent reads13:40
jaypipescdent: ah, maybe the fixtures...13:41
cdentyeah, it's the fixtures13:41
cdentI moved some fixtures out of nova/test/fixtures into that dir13:41
jaypipescdent: hmm, ok. I thought absolute_import was the default >= py27?13:42
cdentwell, if you don't have that line, in 27, it will try to load the wrong thing, so I guess not?13:42
*** diga has joined #openstack-placement13:42
jaypipescdent: :)13:42
cdentI recall reading something about it, but the details escape me13:43
cdentjaypipes: https://bugs.python.org/issue1449413:49
jaypipescdent: on https://review.openstack.org/#/c/617941/25/nova/tests/functional/regressions/test_bug_1679750.py, is that what you want long-term, or is that more of a temporary solution? My guess is you're trying to get full separation of placement modules right? so, not import'ing anything from placement directly into nova tests, yeah?13:50
jaypipescdent: ah, cool, good hunting on the absolute import thing13:50
cdentjaypipes: that probalby ought to be a short term thing, but that test is "special" and I couldn't think of a good way to do it more cleanly13:51
jaypipescdent: yeah, figured as much, just wanted to check with ya13:51
gibicdent: I tried the locking and I still fail. I now think that run_once has some hidden bug itself that we are overlooking. I will try to remove run_once and implement a minimalistic guard around content of configure() directly13:52
cdentgibi: :( Thanks for pluggin away.13:53
cdentgood that you have some meaty hardware. Are there any avenues I should explore?13:53
gibicdent: this is now so strange that I cannot stop. fortunately I have still an idea (the removeal of the complex run_once) to try13:53
cdentjaypipes: if/when it needs a respin, I can stick a TODO in there some where13:54
cdent"now so strange that I cannot stop" is my new tshirt13:54
gibicdent: if you can put a fresh eye the run_once decorator and the reset() function in the fixtures that calls the configure.reset() maybe you can spot something. Besides that I don't know13:55
*** diga has quit IRC13:56
cdentk, I'll try to do some thinking, but after last night I'm not sure I can promise much today13:56
gibicdent: no worries most probablty we have already looked too much into this.13:57
cdentin related news, I was running nova master in an --until-fail and it just did13:58
cdentI'll see if that lends a clue13:58
jaypipescdent: not a prob14:03
cdentsean-k-mooney: you around? you might have some ideas on the test failures gibi and I are seeing when using external placement with nova functional tests14:07
*** ttsiouts has quit IRC14:08
*** ttsiouts has joined #openstack-placement14:09
*** ttsiouts_ has joined #openstack-placement14:11
sean-k-mooneyi am14:12
sean-k-mooneyim not sure if i will but i can take a look if you send me a link14:12
jaypipescdent: https://review.openstack.org/#/c/617941/ reviewed. good stuff. not +1 b/c of a question, but close.14:13
cdentit's this patch, sean-k-mooney , which if you run forever will eventually have errors as indicated in the pastes that gibi has inked from it: https://review.openstack.org/#/c/617941/14:13
*** ttsiouts has quit IRC14:13
cdentthanks jaypipes, looking14:13
cdentjaypipes: there's commentary on that topic between mriedem and on ps 21 and at http://eavesdrop.openstack.org/irclogs/%23openstack-nova/%23openstack-nova.2018-11-26.log.html#t2018-11-26T15:44:3214:15
sean-k-mooneycdent: gibi so calling reset currently does nothing except clear the ran flag as there is no cleanup handeler to actully reset the garded transaction factory14:15
cdentbut basically: yeah, we need to figure out that out, jaypipes14:15
cdentsean-k-mooney: there is in the Database fixture placement/tests/fixtures.py14:16
sean-k-mooneycdent: are you pasing in write contexts via decoratrors anywhere14:16
* sean-k-mooney looks14:16
cdentyes, we covered that before14:17
cdentwhen making run once, there all over the place, idiomatic14:17
sean-k-mooneythen reset wont work14:17
sean-k-mooneythe decorator will never be rebound14:17
cdentsean-k-mooney: have a look at the fixture, it works fine on the placement side14:17
cdentwhere it seems to go wrong is when eventlet gets involved once nova is also involved14:18
*** guilhermesp has joined #openstack-placement14:19
sean-k-mooneyhttps://github.com/openstack/placement/blob/master/placement/tests/fixtures.py#L43 even with dispose i think that might be undefied behavior14:20
cdentwhen that code was added zzzeek was in the mix. It was the only thing that worked in a situation where we wanted three different potential engines-types in the same process14:21
sean-k-mooneybut in any case you are saying this works fine in placement but then when the same fixture is used in nova its failing14:22
cdentit's racing. rarely.14:22
* cdent finds a recent paste14:22
sean-k-mooneywell that kind of make sense14:23
cdentline 102 on http://paste.openstack.org/show/736292/14:23
*** mriedem has joined #openstack-placement14:23
*** ttsiouts_ has quit IRC14:23
sean-k-mooneythis should have a lock around it i think https://github.com/openstack/placement/blob/master/placement/util.py#L42614:23
cdentI think gibi tried some form of locking, but I'm not sure if was right there. gibi?14:23
*** ttsiouts has joined #openstack-placement14:24
sean-k-mooneywe proably need to lock around https://github.com/openstack/placement/blob/master/placement/util.py#L426-L435 and https://github.com/openstack/placement/blob/master/placement/util.py#L448-L45214:25
cdenton the placement side it's not an issue because we never go co-operative. If we were in the mood for another huge change, getting evently out of the nova tests would be handy, but I haven't got the stamina14:25
sean-k-mooneythis was writtend with the expection that this was single threaded but eventlest could case a yeild in the invocation of func14:26
sean-k-mooneylet me try a small patch with a lock in those two places and see if we can fix the race14:27
cdentcool, thanks14:27
cdentping me when you've got something and I can try testing it14:28
sean-k-mooneycdent: is there a simple way for me to test it locally by the way or just run the functional tests repetatedly and see if they fail14:28
cdentif you could base it on https://review.openstack.org/#/c/619121/ that's probably best14:28
*** ttsiouts has quit IRC14:28
cdenti learned the following from gibi: tox -e functional -r -- --random --until-failure14:29
sean-k-mooneydoes random jsut run tests in a random order?14:29
sean-k-mooneycdent: i can base it on that on the placement side but should i base it on https://review.openstack.org/#/c/617941/ on the nova side?14:31
cdentyes14:31
cdentyou shouldn't need any changes on the nova side, just the placement side, but in order to use the placement side you need that nova patch14:31
sean-k-mooneyoh your right in that case ill checkout https://review.openstack.org/#/c/617941/ and use a local copy of placement14:32
* gibi sitting on a meeting14:33
gibicdent, sean-k-mooney: I tried locking this way http://paste.openstack.org/show/736318/14:36
gibicdent, sean-k-mooney: and also tried and extra lock around fixtures.reset()14:37
sean-k-mooney.. that was basically what i was going to do.14:37
gibicdent, sean-k-mooney: and I was still able to reproduce the problem14:37
*** ttsiouts has joined #openstack-placement14:39
sean-k-mooneygibi: that ptache should have prevent any potential race14:40
cdentI wonder if our understanding of the problem is way off and we are following a red herring. I haven't got any alternate ideas though.14:40
gibisean-k-mooney: yeah, that was my thought as well. I'm now trying to replace the complex run_once with a simple if (and a lock) in the configure() call as I feel the run_one is more complex than I'm happy with14:41
sean-k-mooneyi dont think that will help14:41
gibisean-k-mooney: I have no better idea right now so I try this14:41
sean-k-mooneyi dont think this is an issue with the run_once decorator * onece the lock is added14:41
sean-k-mooneysure14:42
sean-k-mooneyi do have once question14:42
sean-k-mooneydo we guarentee that withing hte test runner that we will not interleave two tests in the same process14:42
cdentsean-k-mooney: with eventlet in place all bets are off, no?14:43
gibisean-k-mooney: I tend to assume that one test per process runs at once14:43
sean-k-mooneycdent: that what i was getting too with eventlet im not sure gibis assumetion woudl hold14:44
gibicould there be a not joined eventlet lingering around from a previous test while the next tests start runnig?14:44
sean-k-mooneyi dont know. but coudl we move reset to the setUP function14:48
sean-k-mooneyso call it here14:48
sean-k-mooneyhttps://github.com/openstack/placement/blob/master/placement/tests/fixtures.py#L6814:48
cdentsean-k-mooney: probably worth trying14:50
openstackgerritBalazs Gibizer proposed openstack/placement master: DNM: replace run_once with inline code  https://review.openstack.org/62061714:51
cdentsean-k-mooney: presumably to be most tidy we'd need reset both at the setUp and cleanUp14:52
sean-k-mooneyi guess. doing it isn setup is say ill start form a know good place. doing it is cleanup is i made a mess let me fix that for you14:53
cdentright, but we can't predict what nova will do so returning to a default state at the end is probably best14:54
sean-k-mooneywhere are you useing the placement fixture in nova by the way14:55
sean-k-mooneyhere https://review.openstack.org/#/c/617941/25/nova/tests/functional/fixtures.py?14:55
cdentall over the place, but notably with any test that tries to schedule an instance (thus many places)14:56
sean-k-mooneywhat i ment to ask is how are teh setup and cleanup functions called14:56
cdentthe fixture is used in the usual self.useFixture(...) way14:57
sean-k-mooneyah they are called like this https://review.openstack.org/#/c/617941/25/nova/tests/functional/regressions/test_bug_1550919.py@14914:57
cdentmy conection to gerrit is interrupted somewhere in the middle14:58
sean-k-mooneyoh well it was jsut self.useFixture(func_fixtures.PlacementFixture()) in the tests setUp function14:59
gibisean-k-mooney, cdent: fyi, I've started running the functional test in loop with https://review.openstack.org/620617 I expect to run in 2 x 24 full functional runs at least15:06
cdenti'm trying a --until-failure on reset() in both setup and cleanup15:06
openstackgerritChris Dent proposed openstack/placement master: Stop using global oslo_config  https://review.openstack.org/61912115:14
cdentmeh, reset() on both sides does not fix it, but still probably a useful thing to have15:26
gibicdent: my trial still runs (now running the 11th of full functional run)15:33
cdent15:34
* cdent crosses fingers15:34
openstackgerritMatt Riedemann proposed openstack/nova-specs master: Support volume-backed server rebuild  https://review.openstack.org/53240715:59
cdentmriedem: are you aware of any devstack merge fallout? I haven't heard any.16:09
mriedemnot yet16:10
gibicdent: the test run passed the 30th run without failure16:15
cdentgibi: a) yay! b) my reset experiment above was likely insufficient, going to try a slight modification16:15
*** ttsiouts has quit IRC16:17
*** ttsiouts has joined #openstack-placement16:18
cdenton b) the config is in init not setUp, that's wrong16:18
* cdent tries again16:18
*** ttsiouts has quit IRC16:22
*** ttsiouts has joined #openstack-placement16:28
*** ttsiouts has quit IRC16:39
*** ttsiouts has joined #openstack-placement16:40
*** ttsiouts has quit IRC16:41
*** tssurya has quit IRC16:45
cdent"wrong" even if it doesn't fix the problem16:46
cdentgibi: your change on https://review.openstack.org/#/c/620617/ isn't going to cause a failure, it will only warn, or move the failure to somewhere else, correct? Which might be fine.16:55
gibicdent: it logs the same warning as the run_once decorator logged if the configure was called more than once16:56
gibicdent: my intention was to simply move the behaviour from run_once directly into configure()16:57
cdentthe thing in the previous fails is not a warning, it's an exception raised by oslo_db: http://paste.openstack.org/show/736292/ (line 102)16:57
cdentbecause configure was called16:57
cdentwhat your change does is make it so configure won't be called, and then logs a warning16:57
gibicdent: what I really changed is that I'm not depending on yet another global(ish) flag 'called' but depending on the internal of sqlaclhemy to know if configure was called before16:58
cdentyes, that's probably a nice thing, but we haven't got the same failure surface now, so we don't know if we've "fixed it", because "it" is now different16:58
gibicdent: in my mind when run_once let the configure to be called the second time was a bug16:58
cdentright, but the way the placement side is set up, if configure is attempted to be called a second time, that's a mistake16:59
cdentwhich we should know about16:59
cdentwe don't want to simply say "oh you tried, that's okay"17:00
gibicdent: I can make the else: branch raise in configure17:00
cdentit's probably okay in the context of a running service, where the error showed up before, but in tests we want to know17:00
cdentso it's confusing...17:00
gibibtw the test just passed the 48th time without problem17:01
cdentyeah, my point is: the problem can't happen with your code17:01
*** e0ne has quit IRC17:02
cdentwhether that means we are masking something else is unclear17:02
gibiOK, so during testing we want to fail if somebody called configure twice17:03
gibiwithout reset()17:03
cdentunless we've explicit done the reset17:03
cdentyeah17:04
gibiyeah17:04
cdent:)17:04
gibiOK, I can tune the code tomorrow to fail on the else branch17:04
cdentit's appearing like the change I have in progress is going to change the timing of things as well, making the race less likely, and is a right fix, so I'll push that up too.17:05
cdentwe're going to end up with lots of fixed and better stuff and never be quite sure which thing(s) fixed it17:05
openstackgerritBalazs Gibizer proposed openstack/placement master: DNM: replace run_once with inline code  https://review.openstack.org/62061717:05
cdentbut better is good17:05
gibiI will run this during the night ^^17:06
cdentcool, thanks17:07
openstackgerritChris Dent proposed openstack/placement master: Stop using global oslo_config  https://review.openstack.org/61912117:10
openstackgerritChris Dent proposed openstack/placement master: DNM: replace run_once with inline code  https://review.openstack.org/62061717:11
openstackgerritMerged openstack/placement master: Fix typo  https://review.openstack.org/62054117:11
gibicdent: already the first run failed with the ValueError I added17:14
gibicdent: but now in 20 places17:15
cdentdata!17:15
cdentgibi: try rebasing on my updated change, which fixes the Database fixture to be more correct17:15
gibicdent: this is the failure now http://paste.openstack.org/show/736328/17:16
gibicdent: do you mean using the ps 3 from my change that you rebased17:16
gibi?17:16
sean-k-mooneywhat would it take to kill the global and just pass the context manger down to things that need it17:16
sean-k-mooneyim assuming it not trivial but long term that might be for the best17:17
cdentgibi: yes, it moves when configure is called, to a more correct place17:17
cdent(after reset)17:17
gibicdent: OK17:17
gibibut now I have to really leave17:17
gibitalk to you tomorrow17:17
cdentsean-k-mooney: I think it would still be needed for the opportunistic tests, so much of the facade stuff expects a sort-of-global engine17:18
cdentand it is a nice convenience and the "only" reason we are having difficulty is because we are trying to do an until now abnormal thing with cross-project functional tests17:20
*** e0ne has joined #openstack-placement19:07
*** e0ne has quit IRC19:15
* cdent taps out19:17
*** cdent has quit IRC19:17
*** e0ne has joined #openstack-placement19:17
*** e0ne has quit IRC19:18
openstackgerritMatt Riedemann proposed openstack/nova-specs master: Support volume-backed server rebuild  https://review.openstack.org/53240719:40
*** e0ne has joined #openstack-placement20:19
*** e0ne has quit IRC20:20
*** openstackgerrit has quit IRC20:36
*** e0ne has joined #openstack-placement20:40
*** e0ne has quit IRC20:49
*** e0ne has joined #openstack-placement21:21
*** e0ne has quit IRC21:21
*** e0ne has joined #openstack-placement21:26
*** e0ne has quit IRC21:27
*** takashin has joined #openstack-placement21:50
*** openstackgerrit has joined #openstack-placement21:51
openstackgerritMerged openstack/nova-specs master: Support volume-backed server rebuild  https://review.openstack.org/53240721:51
*** N3l1x has quit IRC23:17

Generated by irclog2html.py 2.15.3 by Marius Gedminas - find it at mg.pov.lt!