Monday, 2019-09-16

*** ociuhandu has joined #openstack-nova00:31
*** ociuhandu has quit IRC00:35
*** brinzhang has quit IRC00:48
*** gbarros has quit IRC00:58
*** ccamacho has quit IRC01:09
*** slaweq_ has joined #openstack-nova01:11
*** slaweq_ has quit IRC01:16
*** factor has quit IRC01:23
*** factor has joined #openstack-nova01:25
*** factor has quit IRC01:28
*** factor has joined #openstack-nova01:28
*** factor has quit IRC01:31
*** factor has joined #openstack-nova01:31
*** brinzhang has joined #openstack-nova01:31
*** lbragstad has quit IRC01:39
*** markvoelker has joined #openstack-nova01:53
*** markvoelker has quit IRC01:58
*** factor has quit IRC02:03
*** factor has joined #openstack-nova02:04
*** factor has quit IRC02:07
*** factor has joined #openstack-nova02:07
*** larainema has joined #openstack-nova02:08
*** dannins has joined #openstack-nova02:09
*** factor has quit IRC02:10
*** factor has joined #openstack-nova02:10
*** dklyle has quit IRC02:11
*** david-lyle has joined #openstack-nova02:11
*** factor has quit IRC02:14
*** factor has joined #openstack-nova02:15
openstackgerritMerged openstack/nova master: Deprecate CONF.workarounds.enable_numa_live_migration  https://review.opendev.org/64002102:17
*** lbragstad has joined #openstack-nova02:18
*** yaawang has quit IRC02:45
*** idlemind has quit IRC02:46
*** yedongcan has joined #openstack-nova02:53
*** rcernin has quit IRC02:59
*** yaawang has joined #openstack-nova03:00
*** mkrai has joined #openstack-nova03:06
openstackgerritya.wang proposed openstack/nova master: Fix typor of cpu model when check CPU compatibility  https://review.opendev.org/68226703:06
*** slaweq_ has joined #openstack-nova03:11
*** slaweq_ has quit IRC03:15
*** markvoelker has joined #openstack-nova03:24
openstackgerritLuyao Zhong proposed openstack/nova master: objects: use all_things_equal from objects.base  https://review.opendev.org/68139703:25
*** markvoelker has quit IRC03:29
*** psachin has joined #openstack-nova03:31
*** factor has quit IRC03:37
*** factor has joined #openstack-nova03:38
*** factor has quit IRC03:41
*** factor has joined #openstack-nova03:41
*** factor has quit IRC03:42
*** factor has joined #openstack-nova03:43
*** factor has quit IRC03:45
*** factor has joined #openstack-nova03:46
*** rcernin has joined #openstack-nova03:46
*** jhesketh has joined #openstack-nova03:50
openstackgerritBoxiang Zhu proposed openstack/nova master: Make evacuation respects anti-affinity rule  https://review.opendev.org/64996303:56
*** udesale has joined #openstack-nova04:11
*** factor has quit IRC04:49
*** factor has joined #openstack-nova04:50
*** factor has quit IRC04:52
*** factor has joined #openstack-nova04:53
*** factor has quit IRC04:56
*** factor has joined #openstack-nova04:57
*** Luzi has joined #openstack-nova04:59
*** jawad_axd has joined #openstack-nova04:59
*** macz has joined #openstack-nova05:09
*** pcaruana has joined #openstack-nova05:11
*** etp has joined #openstack-nova05:14
*** macz has quit IRC05:17
*** slaweq_ has joined #openstack-nova05:36
*** ratailor has joined #openstack-nova05:44
*** HagunKim has joined #openstack-nova05:44
*** nnsingh has joined #openstack-nova05:45
*** boxiang has joined #openstack-nova05:45
nnsinghHi all, I have one question: why is this file named .yaml.txt "https://github.com/openstack/placement/blob/master/etc/placement/README-policy.yaml.txt"? What is the reason behind this?05:46
*** zhubx has quit IRC05:47
openstackgerritgaryk proposed openstack/nova master: Deconstruct the mother of all locks  https://review.opendev.org/68224205:56
openstackgerritBoxiang Zhu proposed openstack/nova master: Fix live migration break group policy simultaneously  https://review.opendev.org/65196905:59
*** janki has joined #openstack-nova06:01
*** factor has quit IRC06:02
*** mkrai has quit IRC06:11
*** Garyx has quit IRC06:22
*** Garyx has joined #openstack-nova06:22
*** owalsh has quit IRC06:23
*** logan- has quit IRC06:23
*** owalsh has joined #openstack-nova06:23
*** markvoelker has joined #openstack-nova06:24
*** logan- has joined #openstack-nova06:26
*** markvoelker has quit IRC06:30
*** janki has quit IRC06:39
*** rha has joined #openstack-nova06:44
*** trident has quit IRC06:48
*** slaweq_ is now known as slaweq06:52
*** mjozefcz|away has joined #openstack-nova06:54
*** rpittau|afk is now known as rpittau06:55
*** trident has joined #openstack-nova06:57
*** damien_r has joined #openstack-nova06:58
*** janki has joined #openstack-nova07:00
*** trident has quit IRC07:03
*** trident has joined #openstack-nova07:12
*** mkrai has joined #openstack-nova07:17
*** macz has joined #openstack-nova07:18
*** macz has quit IRC07:23
*** mjozefcz|away is now known as mjozefcz07:26
openstackgerritBhagyashri Shewale proposed openstack/nova master: Ignore root_gb for BFV in simple tenant usage API  https://review.opendev.org/61262607:27
*** ivve has joined #openstack-nova07:32
*** ralonsoh has joined #openstack-nova07:33
*** jangutter has joined #openstack-nova07:34
*** FlorianFa has quit IRC07:36
*** ociuhandu has joined #openstack-nova07:41
*** FlorianFa has joined #openstack-nova07:41
*** ociuhandu has quit IRC07:42
*** ttsiouts has joined #openstack-nova07:47
*** panda has quit IRC07:56
*** janki has quit IRC07:57
*** janki has joined #openstack-nova07:57
*** panda has joined #openstack-nova07:58
*** xek_ has joined #openstack-nova08:05
*** ricolin has joined #openstack-nova08:09
*** mkrai has quit IRC08:10
*** lpetrut has joined #openstack-nova08:10
*** xek_ has quit IRC08:19
*** ociuhandu has joined #openstack-nova08:20
*** markvoelker has joined #openstack-nova08:27
*** luksky has joined #openstack-nova08:30
*** markvoelker has quit IRC08:32
*** nnsingh has quit IRC08:34
*** derekh has joined #openstack-nova08:38
*** ociuhandu has quit IRC08:51
*** panda is now known as panda|ruck08:59
*** kaisers has quit IRC09:05
*** kaisers has joined #openstack-nova09:08
openstackgerritBrin Zhang proposed openstack/nova-specs master: Allow specify user to reset password  https://review.opendev.org/68230209:12
*** mgoddard has quit IRC09:26
*** mgoddard has joined #openstack-nova09:28
*** dtantsur|afk is now known as dtantsur09:32
*** yedongcan has quit IRC09:37
openstackgerritArthur Dayne proposed openstack/nova-specs master: Proposal for a safer noVNC console with password authentication  https://review.opendev.org/62312009:39
*** jaosorior has joined #openstack-nova09:45
*** ricolin has quit IRC09:52
*** ociuhandu has joined #openstack-nova10:04
*** xek_ has joined #openstack-nova10:06
*** ttsiouts has quit IRC10:10
*** ttsiouts has joined #openstack-nova10:10
*** ttsiouts has quit IRC10:14
donnydsean-k-mooney: that was exactly what I was looking for. Thanks a bunch10:22
*** markvoelker has joined #openstack-nova10:28
*** udesale has quit IRC10:28
*** ociuhandu has quit IRC10:33
*** markvoelker has quit IRC10:33
sean-k-mooneydonnyd: you should be aware that it does not always work. on some distros there is a script that shuts down any running vms on host reboot. if the compute agent is not stopped first it might notice this and mark the vm as shut down. the libvirt vm shutdown service file is there to prevent filesystem corruption by gracefully shutting down the vms instead of SIGKILLing them10:34
sean-k-mooneyso if it does cause you issues then you have to consider whether you should disable the service file or not.10:35
sean-k-mooneythere is also another config option in nova to disable reporting of the vm state in the db10:37
sean-k-mooneyi think disabling that will also prevent this issue, but then if the guest does a poweroff it won't be reflected in the nova status10:38
sean-k-mooneyunless they do it via the api.10:38
*** osmanlicilegi has joined #openstack-nova10:40
openstackgerritMerged openstack/nova master: Parse vpmem related flavor extra spec  https://review.opendev.org/67845610:40
sean-k-mooneypmem is almost done... then the PCPU series can merge and all the pending features should have landed10:43
*** ociuhandu has joined #openstack-nova10:45
*** ociuhandu has quit IRC10:50
*** tesseract has joined #openstack-nova10:51
*** ttsiouts has joined #openstack-nova10:54
*** artom has joined #openstack-nova10:59
*** ociuhandu has joined #openstack-nova11:01
*** ociuhandu has quit IRC11:03
*** ociuhandu has joined #openstack-nova11:04
*** tesseract has quit IRC11:09
*** ociuhandu has quit IRC11:10
*** avolkov has joined #openstack-nova11:10
*** ociuhandu has joined #openstack-nova11:12
osmanlicilegigreetings. i'm having a problem with nova-conductor: it gives "errno 111 econnrefused" while trying to connect to rabbitmq. i know it's not a network/firewall issue because all other rabbitmq related services work like a charm. i had the same problem with nova-api and the root cause was monkey patching, but this time it's not, because nova-conductor does not use monkey patching. anybody had a similar11:14
osmanlicilegiissue before?11:14
aspiersin case anyone didn't get a chance to review my draft SEV blog post last week, here is a link which will last 24 hours (I think) https://blog.adamspiers.org/?p=1871&preview=1&_ppp=dbc2fbd3ce11:19
sean-k-mooneythe conductor should be using monkey patching11:19
artomosmanlicilegi, you'll have better luck in #openstack, operators hang out there, this channel is for development11:19
artomosmanlicilegi, see /topic :)11:19
sean-k-mooneyosmanlicilegi: i.e. it should be using eventlet11:19
sean-k-mooneybut i have not seen that so i don't know how to help11:19
*** macz has joined #openstack-nova11:19
*** mjozefcz has quit IRC11:20
*** mjozefcz has joined #openstack-nova11:20
lyarwoodartom: https://review.opendev.org/#/c/672595/ - do you think you'll have time today to work on this and potentially break out the per n-cpu service connection setting into another change?11:20
lyarwoodartom: if not I should be able to get to it this afternoon11:20
artomlyarwood, so... Yes, but!11:21
lyarwoodBut!11:21
lyarwoodNothing good ever follows but ;)11:22
artomBut! https://review.opendev.org/#/c/681060/ is in the gate11:22
sean-k-mooneywe might want to hold off on doing any upstream work until https://review.opendev.org/#/q/topic:bp/cpu-resources+(status:open+OR+status:merged) lands11:22
artomlyarwood, which does the same thing, right? At least for the hostname part11:22
artomFor the connection I need to play around with it some more11:22
artomlyarwood, wait, I misunderstood you, didn't I?11:24
*** macz has quit IRC11:24
lyarwoodartom: it does, I'd just rather do this in a helper method in the fixture instead of in a loop per test as in that series11:24
artomYou just need the heterogeneous computes helper11:24
lyarwoodartom: and again I really need something I can backport for a few fixes11:24
lyarwoodartom: yeah  pretty much11:24
artomlyarwood, well, I have to stack on top of https://review.opendev.org/#/c/681060/11 regardless11:25
sean-k-mooneyaspiers: the assertion that you can't snoop if you have physical access is a little strong. sev and mktme both do not encrypt cache content, so if you have physical access to the server and snoop the cache you can see the unencrypted state of whatever the vm is doing11:25
artomOtherwise we'll conflict11:25
artomBut yeah11:25
*** dtantsur is now known as dtantsur|bbl11:26
sean-k-mooneyaspiers: you can't read its ram but you could construct a view of it by recording the reads/writes to cache11:26
lyarwoodsean-k-mooney: sorry, re you're earlier comment, are you also suggesting avoiding POSTing new stuff at all still? I've been holding off for the last week anyway but thought we were past the bulk of it now.11:27
artomI think what sean-k-mooney's saying is that the SEV work was *completely* pointless >;)11:27
lyarwoodyour*11:27
sean-k-mooneylyarwood: i personally am holding off posting new stuff until stephen's changes land11:27
sean-k-mooneyor we decide to punt it11:28
sean-k-mooneyi don't want to take gate time from it11:28
lyarwoodack I'll continue to hold off then, wasn't sure if you were just talking about landing new stuff or posting.11:28
sean-k-mooneyif it's small, sure. if you have 10 pending patches for different things, i don't know11:28
sean-k-mooneytoday is the cut off to not need an FFE i think11:29
lyarwoodkk11:29
sean-k-mooneyartom: :) not quite, but as a person who previously worked at a hardware company i'm very careful about claims of hardware-related security features11:31
sean-k-mooneyartom: also at this point any limitations of the technology are well within amd's court to go fix. aspiers' work is quite good11:32
sean-k-mooneyaspiers: the only real feedback i would give is that the images are a little hard to read without clicking on them, the commandline ones more than anything else11:34
sean-k-mooneythe content looks good to me11:34
*** jawad_ax_ has joined #openstack-nova11:37
*** ociuhandu has quit IRC11:38
*** ccamacho has joined #openstack-nova11:39
*** jawad_axd has quit IRC11:40
openstackgerritya.wang proposed openstack/nova master: Fix typor of cpu model when check CPU compatibility  https://review.opendev.org/68226711:42
*** udesale has joined #openstack-nova11:45
*** ttsiouts has quit IRC11:58
*** xek_ has quit IRC12:00
*** ociuhandu has joined #openstack-nova12:00
*** ociuhandu has quit IRC12:00
*** ociuhandu has joined #openstack-nova12:01
*** ttsiouts has joined #openstack-nova12:05
*** luksky has quit IRC12:09
*** dave-mccowan has joined #openstack-nova12:13
*** mjozefcz has quit IRC12:21
*** mjozefcz has joined #openstack-nova12:22
bauzasFWIW, I'm a bit on and off today, working back on the placement audit command12:26
*** lbragstad has quit IRC12:32
*** ratailor has quit IRC12:34
*** rcernin has quit IRC12:35
*** etp has quit IRC12:40
efriedo/ nova!12:43
efriedready for a fun-filled day of rechecks?!12:43
sean-k-mooneyat least we are down to 2 series12:45
sean-k-mooneythere is 1 patch needed for vpmem, + https://review.opendev.org/#/q/topic:bp/cpu-resources+(status:open+OR+status:merged) + 2 patches for forbidden aggregates i think12:46
efriedI'm starting to think there may actually be something wrong.12:46
efriedforbidden aggs is in.12:46
sean-k-mooneyoh ok12:47
sean-k-mooneywith the pmem stuff12:47
efriedAnd i think there's one for numalm, but not an important one12:47
efriedpmem and cpu-resources12:47
efriedare the big ones12:47
sean-k-mooneyi was ignoring the functional tests for numalm12:47
efriedright12:47
sean-k-mooneythe actual feature has landed, just tests are left12:47
* efried counts...12:47
efried13 rechecks on the top cpu-resources patch12:48
efriedcascading effect of -2s in the gate from a prior patch failing.12:48
efriedmostly12:48
sean-k-mooneyya that was what i was about to say12:49
efriedI think we have a race in select_destinations though12:49
efriedbeen seeing quite a few like this https://storage.gra1.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_778/674895/42/check/openstack-tox-py36/7787cd5/testr_results.html.gz12:49
sean-k-mooneymaybe, or that test is just not deterministic12:51
efriedyeah, either way12:52
efriedit's not just that one test12:52
efriedI saw several similar failures with that same select_destinations unequal12:52
efriedI didn't dig all the way into them (lack of appropriate tooling on my phone)12:52
sean-k-mooneyall in the fallback tests12:52
efriedbut thinking maybe I should at this point.12:52
efriedno, I remember seeing one in a pre-existing test (I think)12:53
efriedi.e. regression12:53
sean-k-mooneywe might not be mocking some global state in the tests?12:53
sean-k-mooneythat said the fallback code is very new so there might be something there12:54
stephenfinefried: I'm thinking we should stop rechecking the two few patches for now12:56
stephenfin*top12:56
efriedwhyzat?12:56
efriedoo, I have a local repro12:56
efriedrunning the whole test class12:57
efriednova.tests.unit.scheduler.test_scheduler.SchedulerManagerTestCase12:57
efriedsean-k-mooney: ^12:57
sean-k-mooneyok then it is likely either a shared state issue or a bug in the code12:57
stephenfinOn account of having to recheck again if anything lower gets in. Better just get those lower ones in12:57
sean-k-mooneybut if that reproduces it locally that makes it easier to figure out12:57
*** luksky has joined #openstack-nova12:57
stephenfinHmm, I've seen that as well, actually12:58
stephenfinOkay, that sounds like a bug. That's testing the retry logic12:58
* stephenfin pulls down to try recreate12:58
efriedstephenfin: if it's not affecting queue times -- which it probably isn't at this point -- then it's better to have the top ones kicked out by merge-failed -2s than to have to wait for them to pass the check pipeline again once the lower ones are in, imho12:59
*** rcernin has joined #openstack-nova12:59
stephenfinthat's a fair point12:59
sean-k-mooneyefried: im just about to grab lunch but ill dig into it when i get back if ye have not figured it out by then12:59
efriedstephenfin, sean-k-mooney: I definitely remember seeing the failure on a patch lower in the series at some point.12:59
efriedI don't remember which one now12:59
efriedwhere was that fallback introduced, again?12:59
stephenfinefried: https://review.opendev.org/#/c/671801/13:00
sean-k-mooneyya thats the patch that updated those tests too13:00
stephenfinintermittently failing tests are the _best_13:00
sean-k-mooneybrb13:01
*** markvoelker has joined #openstack-nova13:04
*** shilpasd has joined #openstack-nova13:04
shilpasdEric: finally, after so many rechecks, the isolate agg patches are merged, tnx for your extended support13:05
shilpasdGibi: tnx for the follow-up patch and the review on the isolate agg patches13:06
efriedshilpasd: Your work on this feature is really appreciated. Please pass our compliments on to your colleagues as well.13:06
*** mriedem has joined #openstack-nova13:08
shilpasddansmith: stephenfin: takashin: thanks for review13:08
shilpasdefried: sure thanks13:08
*** markvoelker has quit IRC13:09
efriedstephenfin: First thing that jumps out is test_select_destination_with_pcpu_fallback_disabled, which is a test, is calling test_select_destination_with_4_3_client, which is also a test. I.e. both are going to run "simultaneously", one being a subset of the other.13:13
efried...which should be fine, as long as neither is doing anything global13:13
stephenfinthey don't _look_ like they're doing anything global13:14
kashyapprint domain_caps[arch][machine_type]._os.loader.enums13:18
kashyapOops13:18
*** eharney has joined #openstack-nova13:18
kashyap(Disregard that, please)13:18
*** KeithMnemonic has joined #openstack-nova13:18
*** macz has joined #openstack-nova13:20
* efried adds to kashyap's fbi file13:21
kashyapefried: Mired in parsing some XML bits and testing :D13:21
efriedsuuuure13:21
kashyapNot my day today...13:21
efriedstephenfin: I agree. I'm looking through the other tests in that class; the problem could just as easily be caused by one of *them* doing something global.13:23
*** macz has quit IRC13:25
-openstackstatus- NOTICE: The Gerrit service on review.opendev.org will be offline briefly starting at 14:00 UTC (that's roughly 30 minutes from now) for maintenance: http://lists.openstack.org/pipermail/openstack-discuss/2019-September/009064.html13:29
*** beekneemech is now known as bnemec13:30
*** xek has joined #openstack-nova13:35
*** Luzi has quit IRC13:38
*** tbachman has joined #openstack-nova13:40
*** jangutter has quit IRC13:40
*** jangutter has joined #openstack-nova13:42
mriedemgibi: a few comments in this bug fix of yours https://review.opendev.org/#/c/666857/13:44
stephenfinefried: something is messing with flags13:44
efriedstephenfin: Because CONF is global mebbe?13:44
stephenfinquite possibly, yeah13:45
*** pcaruana has quit IRC13:45
stephenfin'CONF.workarounds.disable_fallback_pcpu_query' is intermittently 'False' despite me setting it to 'True' in the test13:45
efriedstephenfin: I can reproduce reliably locally when I skip test discovery, and reliably *not* reproduce when I let discovery happen (which runs the tests slower). So definitely racy.13:48
*** rcernin has quit IRC13:49
*** jangutter has quit IRC13:53
efriedstephenfin: Yeah, self.flags is just CONF.set_override13:54
stephenfinand CONF is global13:54
efriedstephenfin: use oslo_config.fixture.Config13:55
stephenfinThis is hardly the first time we've got positive and negative tests for something config opt-driven though, is it?13:55
sean-k-mooneywell13:56
sean-k-mooneyyou wont have two tests running at the same time in the same process13:56
sean-k-mooneyso its fine to set things with set flags13:56
sean-k-mooneyas long as you already set the correct state up in the test13:56
efriedsean-k-mooney: same process yes, same thread no, right?13:56
sean-k-mooneyno13:56
artomWait, has oslo_config.fixture.Config finally fixed the global CONF problem? I guess the fixture is not per-test, so no.13:57
sean-k-mooneytox runs tests in multiple processes not threads13:57
stephenfinwell, stestr13:57
efriedartom: ...no, it looks like it still by default uses global CONF. So much for that idea.13:57
sean-k-mooneyyes, point is each worker is its own process that runs its own python interpreter13:58
sean-k-mooneyand only 1 test is running in each worker at a time13:58
sean-k-mooneyso tests only share global state with other tests that ran in that worker13:58
stephenfinwe're using ConfFixture in our base test class in nova/test.py13:58
sean-k-mooneybut only one test is running at a time13:58
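The model described above (self.flags being an override on a process-wide CONF, with tests in one worker running one at a time) can be sketched with the standard library alone. This is a minimal illustration, not nova's or oslo.config's implementation: the dict-based CONF, the flags helper and both test names are hypothetical stand-ins.

```python
import unittest

# Hypothetical process-wide config, standing in for a global CONF object.
CONF = {"disable_fallback_pcpu_query": False}


class BaseTestCase(unittest.TestCase):
    def flags(self, **overrides):
        """Override global options for this test and restore them afterwards."""
        for name, value in overrides.items():
            # Capture the current value now so the cleanup restores it.
            self.addCleanup(CONF.__setitem__, name, CONF[name])
            CONF[name] = value


class FallbackTests(BaseTestCase):
    def test_a_fallback_disabled(self):
        self.flags(disable_fallback_pcpu_query=True)
        self.assertTrue(CONF["disable_fallback_pcpu_query"])

    def test_b_default(self):
        # Runs after test_a in the same process; the override must be gone.
        self.assertFalse(CONF["disable_fallback_pcpu_query"])


# Run both tests sequentially in this process, as a single worker would.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(FallbackTests)
result = unittest.TestResult()
suite.run(result)
```

As long as every override is paired with a cleanup, sequential tests in one process do not see each other's settings, which is why a global CONF on its own would not explain the failure being chased here.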
stephenfinthen wtf is happening here? :-D14:00
sean-k-mooneyi'm just back from lunch so i can start running it in a debugger14:00
sean-k-mooneyye have narrowed it down to the config option, yes?14:01
stephenfinThat's what I'm seeing anyway14:01
stephenfinIt should be True but it's False instead14:01
sean-k-mooneyand ye are seeing it only when the test class is run14:01
*** gbarros has joined #openstack-nova14:01
sean-k-mooneyrather than the test directly14:02
*** markvoelker has joined #openstack-nova14:02
stephenfinyeah, I'm seeing it intermittently but efried notes that it's consistent if you run without test discovery14:02
efriedwhich gels with the problematic result, which is expecting from GET /a_c {normal results} and is getting {normal results + fallback results}14:02
stephenfinso 'tox -e py27 -- -n nova/tests/unit/scheduler/test_scheduler.py'14:02
*** jawad_ax_ has quit IRC14:02
efriedstephenfin, sean-k-mooney: fwiw I'm running it from within the venv as14:03
efriedstestr run -n nova.tests.unit.scheduler.test_scheduler.SchedulerManagerTestCase14:03
efriedhitting it every time14:03
sean-k-mooneyya i can run the full file in the debugger too although im not sure if that will show anything specifically14:03
efried^^ is just the one test class14:03
-openstackstatus- NOTICE: The Gerrit service on review.opendev.org is offline briefly for maintenance: http://lists.openstack.org/pipermail/openstack-discuss/2019-September/009064.html14:05
*** ChanServ changes topic to "The Gerrit service on review.opendev.org is offline briefly for maintenance: http://lists.openstack.org/pipermail/openstack-discuss/2019-September/009064.html"14:05
sean-k-mooneyyep it failed for me too14:05
mriedemi haven't been following but is there a new gate failure?14:05
mriedemand if so, do we have a bug and e-r query tracking it?14:05
sean-k-mooneyno14:05
sean-k-mooneyit's an intermittent failure of one of the tests in the PCPU series14:06
*** pcaruana has joined #openstack-nova14:07
*** ttsiouts has quit IRC14:08
*** ttsiouts has joined #openstack-nova14:09
*** janki has quit IRC14:09
*** ttsiouts has quit IRC14:09
*** ttsiouts has joined #openstack-nova14:09
efriedstephenfin: So I've narrowed it down, by commenting out all the tests in the suite *except* test_select_destination_with_pcpu_fallback_disabled and test_select_destination_with_pcpu_fallback I still get the repro; commenting out either of those and I'm fine.14:10
efriedI also factored out the 4_3 thing so the former ^ only runs the one test14:10
efriedI also commented out the self.flags in the latter ^ because it's setting the default -- and still get the repro14:10
efriedso those two tests are banging on each other somehow.14:10
*** markvoelker has quit IRC14:11
sean-k-mooneywhy are we calling a test method directly, by the way, instead of factoring out the common code into a helper method?14:11
*** markvoelker has joined #openstack-nova14:12
efriedsean-k-mooney: that was the first thing I mentioned. But it doesn't make a difference, cause it's also the first thing I tried :P14:12
stephenfinsean-k-mooney: Because the theory was that the test end result and assertions should be identical under both circumstances14:13
stephenfinditto14:13
sean-k-mooneyyes but i was wondering if we were sharing mocks by not doing it14:13
stephenfinpossibly. I tried reseting the 'select_destinations' mock to no avail14:14
artomWhy does discovery affect it, though? Discovery doesn't run any tests, does it?14:15
stephenfinI think it's just ordering14:15
efriedno, it just makes the tests run much more quickly14:15
efriedyeah14:15
artomAh, so a timing issue14:15
efriedit seems as though discovery even slows down the running of the tests themselves14:15
efriedyes14:16
sean-k-mooneywell it's not a timing issue when we are leaking shared state14:16
artomIt causes things to not run at the same time14:16
sean-k-mooneyno14:16
sean-k-mooneybut it may change the order14:16
efriedI would have thought it would do all discovery, come up with a list of tests, and then run 'em, and by the time you got to that last thing it was the same as if you did no discovery. But clearly that's not how it's happening.14:16
efriedyeah, or possibly ordering I guess14:16
* efried tries...14:16
stephenfinefried: I've noticed that all the other mock assertions pass too14:16
stephenfinso maybe it's not the global conf :-\14:17
*** markvoelker has quit IRC14:17
artomstephenfin, but... you said you've observed an option being False when it should be True...14:18
efriedstephenfin: waitwait14:18
efriedyou're saying GET /a_c is called the appropriate number of times??14:18
stephenfinThat's what I'm seeing. Put the 'select_destinations.assert_called_once_with' to the end14:19
stephenfinin select_destinations.assert_called_once_with14:19
stephenfinsorry, test_select_destination_with_4_3_client14:19
sean-k-mooneyi have it open in the debugger now so i'll check14:19
stephenfinassert_called_once_with checks that things are called exactly once, right?14:19
sean-k-mooneyyes its called once14:20
sean-k-mooneystephenfin: it checks it's called exactly once and has the correct args14:21
sean-k-mooneyso yes14:21
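For reference, the behaviour of assert_called_once_with confirmed above can be checked with a few lines of unittest.mock. This is only a sketch: the mock name and its arguments are illustrative and are not the real scheduler call.

```python
from unittest import mock

# Hypothetical stand-in for the mocked scheduler call; not nova code.
select_destinations = mock.Mock(return_value=["host1"])

select_destinations("ctxt", ["alloc_req_1"])

# Passes: exactly one call, with exactly these arguments.
select_destinations.assert_called_once_with("ctxt", ["alloc_req_1"])

# Same call count but different expected arguments: the comparison fails.
try:
    select_destinations.assert_called_once_with(
        "ctxt", ["alloc_req_1", "fallback"])
    args_mismatch_detected = False
except AssertionError:
    args_mismatch_detected = True

# A second call makes the "once" part fail even with matching arguments.
select_destinations("ctxt", ["alloc_req_1"])
try:
    select_destinations.assert_called_once_with("ctxt", ["alloc_req_1"])
    count_mismatch_detected = False
except AssertionError:
    count_mismatch_detected = True
```

So a failure in this assertion can come from either the call count or the argument comparison, which matches what the debugger shows in this conversation: one call, but the arguments differ.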
efriedI put a call_count assertion just in case14:21
*** mjozefcz has quit IRC14:22
stephenfinmoving the mocks inline and using context managers instead of function decorators doesn't help14:23
sean-k-mooneyif i put the select_destinations.assert_called_once_with( call at the end all the rest pass14:24
sean-k-mooneyits failing in assert_called_with when its comparing the args14:27
stephenfinefried, sean-k-mooney: got it14:32
efriedtell14:32
*** BjoernT has joined #openstack-nova14:33
stephenfinhttps://review.opendev.org/#/c/671801/50/nova/scheduler/manager.py@19114:33
stephenfinwe're using extend, which modifies a list in place14:33
stephenfinand our mock is returning 'fakes.ALLOC_REQS'14:33
efriedoy vay14:34
stephenfinso that's getting modified by the non-disabled fallback test14:34
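The failure mode just described (a mock returning a module-level list, and the code under test calling extend on it) can be reproduced in isolation. The sketch below is a simplified stand-in, not the nova scheduler code; ALLOC_REQS, select_destinations and get_candidates are hypothetical names used only for illustration.

```python
from unittest import mock

# Hypothetical module-level fixture, standing in for fakes.ALLOC_REQS.
ALLOC_REQS = [{"name": "normal"}]


def select_destinations(get_candidates, fallback_enabled):
    """Simplified stand-in for the code under test."""
    alloc_reqs = get_candidates()
    if fallback_enabled:
        # list.extend mutates alloc_reqs in place, and alloc_reqs is the very
        # object the mock returned, i.e. the module-level ALLOC_REQS list.
        alloc_reqs.extend([{"name": "fallback"}])
    return alloc_reqs


get_candidates = mock.Mock(return_value=ALLOC_REQS)

# "Test" 1: fallback enabled, grows the shared fixture as a side effect.
first = select_destinations(get_candidates, fallback_enabled=True)

# "Test" 2: fallback disabled, yet it sees the entry leaked by test 1.
second = select_destinations(get_candidates, fallback_enabled=False)
```

If the two calls ran in the opposite order, the second one would see only the normal entry, which is consistent with the failure showing up only under some test orderings.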
efriedstephenfin: I think I saw those globals only used by this one test suite?14:34
stephenfincorrect14:34
efriedso make 'em instance vars, for future safety14:35
stephenfinwdym?14:35
stephenfinI was going to do 'fakes.ALLOC_REQS[:]'14:35
stephenfinthough I could add a 'get_fake_alloc_reqs' helper too14:36
efriedI mean, you could deepcopy 'em to fix this problem, but it's just going to bite us in the ass again later, somewhere else.14:36
efriedglobal test artifacts bad14:36
stephenfinv bad.14:36
efriedfirst 205 lines of fakes, bad.14:36
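The difference between the slice copy and the deepcopy mentioned above can be seen in a small example. The data is made up for illustration. A slice copy is enough to protect the outer list from extend or append, but the nested dicts are still shared with the original.

```python
import copy

# Hypothetical fixture data, shaped loosely like a list of allocation requests.
original = [{"allocations": {"rp1": {"VCPU": 1}}}]

shallow = original[:]            # new outer list, same inner dicts
deep = copy.deepcopy(original)   # new outer list and new inner dicts

# Growing the shallow copy does not touch the original list...
shallow.append({"allocations": {"rp2": {"PCPU": 1}}})

# ...but mutating a nested dict through it does leak into the original.
shallow[0]["allocations"]["rp1"]["VCPU"] = 99

# The deep copy is fully independent.
deep[0]["allocations"]["rp1"]["VCPU"] = 7
```

Either copy would stop the extend problem discussed here; only the deep copy also guards against a later test mutating the nested structures.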
*** ChanServ changes topic to "Current runways: https://etherpad.openstack.org/p/nova-runways-train -- This channel is for Nova development. For support of Nova deployments, please use #openstack."14:37
-openstackstatus- NOTICE: The Gerrit outage portion of the current maintenance is complete and the service is back on line, however reindexing for renamed repositories is still underway and some Zuul job fixes are in the process of being applied14:37
*** macz has joined #openstack-nova14:37
efriedstephenfin: I guess for now you could just make it a helper method that returns a fresh new copy every time14:37
efriedbut if it were me, I would make it completely fresh14:38
efrieddef get_fake_alloc_reqs():14:38
efried    return { $everything }14:38
efriedrather than14:38
efriedEVIL_GLOBAL14:38
efrieddef get_fake_alloc_reqs():14:38
efried    return some_unreliable_copy_method(EVIL_GLOBAL)14:38
stephenfingotcha14:39
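The helper suggested above, which builds the fake data from scratch on every call instead of copying a global, might look roughly like this. The contents of the returned list are placeholders, and the wiring through side_effect is one possible way to hand each caller its own object; neither is taken from the actual patch.

```python
from unittest import mock


def get_fake_alloc_reqs():
    """Return a brand-new list of fake allocation requests on every call."""
    # Placeholder data; the real fixture contents are not shown in this log.
    return [
        {"allocations": {"rp1": {"resources": {"VCPU": 1, "MEMORY_MB": 512}}}},
        {"allocations": {"rp2": {"resources": {"VCPU": 1, "MEMORY_MB": 512}}}},
    ]


# side_effect calls the helper on each invocation, so no two callers share
# a list (return_value would hand out the same object every time).
get_candidates = mock.Mock(
    side_effect=lambda *args, **kwargs: get_fake_alloc_reqs())

first = get_candidates()
first.extend([{"allocations": {"rp3": {"resources": {"PCPU": 1}}}}])

second = get_candidates()
```

Because the literal is evaluated inside the function, there is no module-level object left for one test to modify behind another test's back.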
stephenfincoming right up14:39
efriedthis is gonna be partway down the series, yah?14:39
stephenfinyeah, just before that patch. I'll do it separately14:39
efriedstephenfin: you're going to have to fix the patch anyway, so might as well do it in place, nah?14:39
efrieddon't try to fix the race afterward14:40
stephenfinI meant add the helper function in a precursor patch and modify the intermittently failing patch to use it14:40
efriedtbc, the patch in question is "Add support for translating CPU policy extra specs, image meta"14:40
stephenfinbut I can combine too14:40
efriedyeah, just combine, help me justify a fast approve14:41
stephenfinack14:41
*** TxGirlGeek has joined #openstack-nova14:47
openstackgerritStephen Finucane proposed openstack/nova master: Add support for translating CPU policy extra specs, image meta  https://review.opendev.org/67180114:47
openstackgerritStephen Finucane proposed openstack/nova master: fakelibvirt: Make 'Connection.getHostname' unique  https://review.opendev.org/68106014:47
openstackgerritStephen Finucane proposed openstack/nova master: libvirt: Mock 'libvirt_utils.file_open' properly  https://review.opendev.org/68106114:47
openstackgerritStephen Finucane proposed openstack/nova master: Add reshaper for PCPU  https://review.opendev.org/67489514:47
stephenfinefried: ^14:47
stephenfinI'll remove the rest of those global fakes now (separate patch)14:48
efried++14:48
openstackgerritMatt Riedemann proposed openstack/nova master: Centralize volume create code during boot from volume  https://review.opendev.org/68237814:48
mriedem^ is a simple refactor split off from https://review.opendev.org/#/c/541420/ which has been around since february of 2018,14:48
mriedemand is important if y'all ever want to drop the legacy volume attachment compat code14:49
efriedstephenfin: +A, and re+W up the pile. Nice work, thank you.14:50
*** belmoreira has joined #openstack-nova14:55
*** belmoreira has quit IRC14:55
*** belmoreira has joined #openstack-nova14:58
*** shilpasd has quit IRC14:58
stephenfinmriedem: done14:59
*** mlavalle has joined #openstack-nova14:59
efriedmriedem: one adjustment requested pls15:00
mriedemdoing it15:00
efriedalex_xu: yt?15:00
openstackgerritMatt Riedemann proposed openstack/nova master: Centralize volume create code during boot from volume  https://review.opendev.org/68237815:06
openstackgerritMatt Riedemann proposed openstack/nova master: WIP: Create volume attachment during boot from volume in compute  https://review.opendev.org/54142015:06
*** Sundar has joined #openstack-nova15:07
*** openstackgerrit has quit IRC15:08
*** xek has quit IRC15:08
efriedmriedem: are we merging stuff like that ^ at this point or waiting for ussuri to fork?15:09
mriedemwhich one? the refactor is trivial and i've added the latter to https://etherpad.openstack.org/p/nova-train-release-todo15:09
*** BjoernT_ has joined #openstack-nova15:10
mriedemas i said, it's been around forever without much core review outside melwitt15:10
mriedemthe mox->mock stuff in the tests blew it all up15:10
*** BjoernT has quit IRC15:10
mriedembut if we ever want to migrate off the legacy volume attach code, we need to be creating all volumes with the new style attachment stuff15:10
mriedemiow, the longer we wait, the bigger the data migration is going to be15:10
mriedeme.g. https://review.opendev.org/#/c/549130/15:11
mriedemi don't expect to get ^ into train at this point15:11
mriedemnor is it probably the only way to skin that cat15:11
mriedemat some point in the future we can add a nova-status upgrade check and fail to start if you haven't migrated old bdm records15:12
mriedemand then drop all that compat code15:12
mriedemmelwitt: do you want to send this through? https://review.opendev.org/#/c/677736/15:15
melwittI do. thanks15:16
*** ivve has quit IRC15:20
*** dtantsur|bbl is now known as dtantsur15:23
*** damien_r has quit IRC15:23
*** openstackgerrit has joined #openstack-nova15:26
openstackgerritBalazs Gibizer proposed openstack/nova master: Follow up for the bandwidth series  https://review.opendev.org/68238915:26
gibimriedem, efried: two smallish nits for the bandwidth series. I have two other testing enhancements on my TODO list that I plan to propose this week. None of these is critical, just nice to have.15:26
mriedemgibi: i got the fup15:27
*** damien_r has joined #openstack-nova15:27
gibimriedem: thanks15:27
*** damien_r has quit IRC15:27
gibimriedem: and thanks for the review on the bug fix https://review.opendev.org/#/c/666857/ I will respin that15:27
bauzasgibi: I can help you :)15:36
*** lbragstad has joined #openstack-nova15:41
*** luksky has quit IRC15:44
*** gyee has joined #openstack-nova15:47
*** ccamacho has quit IRC15:52
*** TxGirlGeek has quit IRC15:52
gibibauzas: thanks15:56
*** xek has joined #openstack-nova15:57
*** ccamacho has joined #openstack-nova15:58
*** ttsiouts has quit IRC15:58
*** cfriesen has joined #openstack-nova16:03
*** mjozefcz has joined #openstack-nova16:05
*** belmoreira has quit IRC16:08
*** markvoelker has joined #openstack-nova16:13
*** larainema has quit IRC16:14
*** lpetrut has quit IRC16:16
*** lpetrut has joined #openstack-nova16:16
*** slaweq has quit IRC16:17
*** slaweq has joined #openstack-nova16:17
*** markvoelker has quit IRC16:18
*** damien_r has joined #openstack-nova16:20
*** ccamacho has quit IRC16:27
*** lpetrut has quit IRC16:31
*** ociuhandu has quit IRC16:31
*** udesale has quit IRC16:32
*** markvoelker has joined #openstack-nova16:36
*** markvoelker has quit IRC16:40
*** ociuhandu has joined #openstack-nova16:45
*** jmlowe has quit IRC16:46
*** ozzzo has joined #openstack-nova16:47
*** ociuhandu has quit IRC16:50
*** TxGirlGeek has joined #openstack-nova16:53
*** JamesBenson has joined #openstack-nova16:56
*** rpittau is now known as rpittau|afk16:59
*** derekh has quit IRC17:00
*** slaweq has quit IRC17:01
*** jmlowe has joined #openstack-nova17:04
*** gbarros has quit IRC17:05
*** slaweq has joined #openstack-nova17:08
*** dtantsur is now known as dtantsur|afk17:14
*** ralonsoh has quit IRC17:16
*** xek_ has joined #openstack-nova17:16
*** gbarros has joined #openstack-nova17:18
*** xek has quit IRC17:18
*** psachin has quit IRC17:25
openstackgerritEric Fried proposed openstack/nova master: objects: use all_things_equal from objects.base  https://review.opendev.org/68139717:26
*** TxGirlGe_ has joined #openstack-nova17:26
*** mjozefcz has quit IRC17:27
*** TxGirlGeek has quit IRC17:28
openstackgerritMerged openstack/nova master: Deprecate the XenAPIDriver  https://review.opendev.org/68073217:39
*** ccamacho has joined #openstack-nova17:53
*** jmlowe has quit IRC17:57
*** gbarros has quit IRC18:13
openstackgerritMerged openstack/nova master: libvirt: Fix service-wide pauses caused by un-proxied libvirt calls  https://review.opendev.org/67773618:20
*** luksky has joined #openstack-nova18:24
sean-k-mooneycool that ^ will make mdbooth happy18:25
sean-k-mooneyby the way can i get a second +2+w on https://review.opendev.org/#/c/670585/18:27
sean-k-mooneyi really want to ensure that is deprecated in train so we can remove it in Ussuri18:27
sean-k-mooneyas in like 2 weeks if not sooner18:27
*** ociuhandu has joined #openstack-nova18:28
sean-k-mooneydansmith: mriedem: melwitt: could one of ye take a look when ye have time ^18:29
*** ociuhandu has quit IRC18:36
*** markvoelker has joined #openstack-nova18:37
*** munimeha1 has joined #openstack-nova18:41
*** markvoelker has quit IRC18:42
*** xek has joined #openstack-nova18:42
*** xek_ has quit IRC18:45
*** openstackgerrit has quit IRC18:52
*** ozzzo has quit IRC18:56
*** nweinber has joined #openstack-nova18:56
*** Sundar has quit IRC19:07
*** zhubx has joined #openstack-nova19:10
*** boxiang has quit IRC19:12
*** ozzzo has joined #openstack-nova19:22
*** ccamacho has quit IRC19:31
*** mmethot_ has quit IRC19:41
*** mmethot_ has joined #openstack-nova19:42
*** TxGirlGe_ has quit IRC19:45
*** gbarros has joined #openstack-nova19:45
*** igordc has joined #openstack-nova19:51
*** TxGirlGeek has joined #openstack-nova20:02
mriedemsean-k-mooney: commented,20:03
mriedemand added moshe and adrianc to it20:03
*** mmethot_ has quit IRC20:04
sean-k-mooneythanks. if stephen does not address the docs bugs i'll do them tomorrow20:04
*** mmethot_ has joined #openstack-nova20:04
sean-k-mooneymriedem: assuming no objection from moshe or adrianc are you generally ok with this?20:05
mriedemi can't say i'm very familiar with the pci passthrough whitelist dev name stuff20:06
mriedemor how much it's used20:06
sean-k-mooneyit's an alternative to using the pci address or vendor id and product id20:07
sean-k-mooneythe issue with it is if you restart the compute agent while it's passed to a vm the device won't be found20:07
sean-k-mooneywe recently prevented the compute agent from removing in-use devices from the db20:08
efriedmriedem: guessing that az fail is yet another global test var race. We should go on a crusade to murder all of those.20:08
sean-k-mooneybut we didn't always, also the name can change after you release the VF/PF from the vm20:08
sean-k-mooneyso the main issue is it's not reliable and we keep getting bugs downstream that we can't fix20:09
sean-k-mooneyso we just want to stop supporting it20:09
mriedemefried: is this a recent failure? b/c i'm not sure what is global about this one and it's been around awhile (the test that is)20:10
efriedmriedem: from what I could tell, it started hitting on 9/1020:10
efriedmriedem: I'm looking at the delta in test_aggregates.py for I9ab9d7d65378be564b3731b5227ede8cece71bef20:10
efriedhttps://review.opendev.org/#/c/671075/21/nova/tests/functional/test_aggregates.py20:11
*** mmethot_ has quit IRC20:12
*** mmethot_ has joined #openstack-nova20:12
mriedemhmm, yeah probably related to that series20:13
artomsean-k-mooney, https://review.opendev.org/#/c/682435/320:13
efriednever mind, that one only merged yesterday.20:14
artomWhich means that upstream whitebox has now "caught up" and can be used in tests20:14
efriedand its predecessor the day before20:14
efriedso that's not the culprit.20:14
mriedemefried: so from the error it looks like the actual failure is not really the issue,20:14
mriedemit's that we failed to put the instance on the host20:14
mriedemb/c of the az filter20:14
efriedyup20:14
artomNext step: update https://review.opendev.org/#/c/656890/ to move whitebox under openstack-qa/20:14
sean-k-mooneyartom: ok you didn't need to merge them for me to test with them20:14
sean-k-mooneybut ok20:14
* artom runs off to pick up daughter20:14
mriedemthe boot server function is just waiting for the server to exit BUILD status, which it does when it changes to ERROR status20:14
*** luksky has quit IRC20:16
*** luksky has joined #openstack-nova20:17
mriedemefried: so if i had to guess,20:23
mriedemchanging the base class on the test to ProviderUsageBaseTestCase but not cleaning up a bunch of the duplicate setup means we have multiple fixtures and multiple copies of the same conductor/api/scheduler services, and the api syncs information to the scheduler about aggregates, which is probably not multi-scheduler aware in the functional tests,20:24
mriedemso it's probably intermittent b/c sometimes we hit the scheduler process that knows about the az metadata and sometimes we don't20:24
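As a toy model of the race mriedem is guessing at here (not nova code; `FakeScheduler`, `sync_aggregate`, and the host and AZ names are made up for illustration): if aggregate metadata reaches only one of two scheduler instances, the result of an AZ-filtered request depends on which instance handles it.

```python
class FakeScheduler:
    """Hypothetical stand-in holding per-process aggregate state."""

    def __init__(self):
        self.host_az = {}  # host -> availability zone known to this process

    def sync_aggregate(self, host, az):
        # Models the API pushing aggregate info to this scheduler only.
        self.host_az[host] = az

    def select_hosts(self, hosts, requested_az):
        # AZ filter: keep hosts this scheduler believes are in the AZ.
        return [h for h in hosts if self.host_az.get(h) == requested_az]


informed, stale = FakeScheduler(), FakeScheduler()
informed.sync_aggregate("host1", "az1")  # only one copy gets the update

for name, sched in (("informed", informed), ("stale", stale)):
    result = sched.select_hosts(["host1", "host2"], "az1")
    print(name, "->", result or "NoValidHost")
```

The same request succeeds on one instance and finds no valid host on the other, which is the kind of intermittent ERROR status the test would see.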
efriedmm. so why hitting since the 10th20:25
mriedemi can put up a patch to cleanup that duplicate setup20:25
mriedemefried: the first change to hit it in logstash was the one you pointed at20:25
*** nweinber has quit IRC20:25
mriedem67107520:25
mriedemand then it was just a recheck grind20:26
efriedyeah...20:26
mriedemsee how many times someone named Eric Fried blindly rechecked it :)20:26
mriedemi'll put up a patch20:26
efriedIf only I had full debuggability from my phone.20:26
mriedemor make shilpa responsible for investigating the failures and rechecking them20:27
mriedembut whatever, done is done20:27
*** nweinber has joined #openstack-nova20:28
efriedmriedem: looks to me like everything (except the super()) through the self.computes = {} can be removed, yah?20:30
mriedemyup, just did it, running the tests20:30
efriedk, just prepping for review :P20:31
efried+2 "that's how I would have done it"20:31
*** TxGirlGeek has quit IRC20:35
*** ociuhandu has joined #openstack-nova20:36
efriedtests pass locally for me20:37
*** openstackgerrit has joined #openstack-nova20:37
openstackgerritMatt Riedemann proposed openstack/nova master: Remove redundancies from AggregateRequestFiltersTest.setUp  https://review.opendev.org/68247520:37
efriedmriedem: two more lines20:38
*** markvoelker has joined #openstack-nova20:38
*** pcaruana has quit IRC20:38
mriedemack20:38
efriedmriedem:  why only partial-bug?20:39
mriedemb/c i don't know if it's the root fix20:40
mriedemhence the "at least rule it out" comment20:40
mriedemif we see the hits drop off in e-r by next week we can close the bug20:40
efriedthis way requires us to remember to go back at the bug :)20:40
efriedbut okay20:40
mriedemi will when i cleanup old e-r queries20:40
openstackgerritMatt Riedemann proposed openstack/nova master: Remove redundancies from AggregateRequestFiltersTest.setUp  https://review.opendev.org/68247520:41
*** ociuhandu has quit IRC20:41
efriedthanks for the quick fix mriedem <high five>20:42
mriedem<down low>20:43
*** markvoelker has quit IRC20:43
mriedemi wonder if the fake messaging driver doesn't do rpc fanout cast properly, if we should have some sort of fixture that blows up if you try to start more than one non-nova-compute service in a functional test20:45
mriedemsince that's twice in 2 weeks that we've had some issue from functional test inheritance doubling up on fixtures and stuff20:45
sean-k-mooneyare you suggesting only allowing 1 compute service in the functional tests20:46
efriedno20:46
sean-k-mooneyor having a fixture to detect when you have more than one but have not set things up correctly20:46
efriedhe's suggesting allowing only one *non* nova-compute service.20:46
*** BjoernT_ has quit IRC20:46
efriedwe clearly need multiple compute services for many tests.20:46
efriedbut we should never need more than one superconductor20:47
sean-k-mooneyoh so only one conductor or scheduler20:47
efriedright20:47
openstackgerritMatt Riedemann proposed openstack/nova master: Remove SchedulerReportClient from AggregateRequestFiltersTest  https://review.opendev.org/68248020:47
*** trident has quit IRC20:47
mriedemcorrect, one controller service20:47
mriedemone of each20:47
*** markvoelker has joined #openstack-nova20:49
mriedemi think multiple apis and conductors are probably ok since those are stateless20:50
mriedembut the scheduler has some stateful crap in the HostManager20:51
mriedemlike aggregate info20:51
*** xek has quit IRC20:52
mriedembut yeah the test was definitely starting 2 schedulers20:52
mriedemb'2019-09-16 15:10:19,491 INFO [nova.service] Starting scheduler node (version 19.1.0)'20:52
mriedemb'2019-09-16 15:10:20,156 INFO [nova.service] Starting scheduler node (version 19.1.0)'20:52
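The guard being discussed, a check that blows up when a functional test starts a second scheduler while still allowing several compute services, might look roughly like the sketch below. It is a stdlib-only illustration: `ServiceStartGuard`, `register_start`, and `DuplicateServiceError` are hypothetical names, not nova's fixture API, and it is not the code from either proposed patch.

```python
class DuplicateServiceError(AssertionError):
    """Raised when a test starts a second copy of a singleton service."""


class ServiceStartGuard:
    """Hypothetical guard: count service starts and reject duplicates.

    Compute services are exempt because many tests need several of them.
    """

    def __init__(self, singletons=("scheduler",)):
        self.singletons = set(singletons)
        self.started = {}  # service name -> number of successful starts

    def register_start(self, name):
        count = self.started.get(name, 0) + 1
        if name in self.singletons and count > 1:
            raise DuplicateServiceError(
                "service %r started %d times in one test" % (name, count))
        self.started[name] = count


guard = ServiceStartGuard()
guard.register_start("compute")
guard.register_start("compute")    # fine: multiple computes are allowed
guard.register_start("scheduler")  # fine: first scheduler
try:
    guard.register_start("scheduler")  # second scheduler is rejected
except DuplicateServiceError as exc:
    print("blocked:", exc)
```

Limiting the singleton set to the scheduler reflects the point above that the scheduler holds state (aggregate info in the HostManager) while the API and conductor are described as stateless.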
*** eharney has quit IRC20:53
*** trident has joined #openstack-nova20:59
*** markvoelker has quit IRC21:00
efriedmriedem: what was the other bug with the duplicated services?21:01
mriedemduplicated placement fixture21:01
mriedemsec21:01
mriedemhttps://github.com/openstack/nova/commit/5e1b096894f6de4cfbca254cf74dcfcf56358ea5#diff-5befe429f14247314e6ca487aa4e13bd21:01
efriedthank you sir21:02
efriedI'm posting a poison patch now21:02
mriedemheh i was just writing my commit message ofr one21:02
mriedem*for21:02
efriedoh, no bug for I057a07c8d0b880c8d09fc2e618ce1f7fc885beda ?21:02
mriedemguess not21:02
efriedhm, if that one was placement fixture, my fix isn't going to hit it.21:03
openstackgerritMatt Riedemann proposed openstack/nova master: Do not allow mutiple nova-scheduler workers  https://review.opendev.org/68248521:03
mriedemefried: ^ is what i was thinking of21:04
efriedwhoah21:04
efriedMine is way dumber, but more complete, and faster.21:04
openstackgerritEric Fried proposed openstack/nova master: Only allow one non-compute service in tests  https://review.opendev.org/68248621:05
efriedmriedem: ^21:05
*** JamesBenson has quit IRC21:06
*** lpetrut has joined #openstack-nova21:06
mriedemyou're going to at least have failures on tests that intentionally start multiple api fixtures with different projects and roles (admin vs non-admin, project1 and project2 for filtering, etc)21:09
sean-k-mooneyif we have multi cell tests then those would also fail correct21:10
*** panda|ruck has quit IRC21:10
efriedyeah, I guess I thought the cell conductors would have different names, but that doesn't make sense.21:11
sean-k-mooneyso we need a way to allow it21:11
mriedemthe api/conductor/scheduler services don't get registered in the cell dbs21:11
mriedemso i don't think that matters here21:11
*** nweinber has quit IRC21:11
mriedembut i'm also not sure how sophisticated our fixture stuff is to know where to create those services (they should go into cell0 but might not)21:12
mriedemthe CellDatabase fixture has a default of cell1 so the stuff probably just gets created there21:12
mriedemsean-k-mooney: also btw i added https://review.opendev.org/#/c/670585/2 to https://etherpad.openstack.org/p/nova-train-release-todo21:12
mriedemso we don't forget21:12
sean-k-mooneyok cool thanks. it would not be the end of the world if it slipped but i'm hoping that we can drop support for this downstream before our next LTS21:13
sean-k-mooneyso in 18-24 months when people start using it we don't have to support it21:14
*** lpetrut has quit IRC21:14
* sean-k-mooney is surprised how many people are still running newton and wont upgrade21:14
* sean-k-mooney and expect it to work and get new features21:15
artomsean-k-mooney, are you really though? :P21:15
sean-k-mooneyonly by the expect it to work and get new features bit21:15
sean-k-mooneythey can choose one but not both if they want something that old21:16
*** panda has joined #openstack-nova21:16
mriedemthey expect you to make it work and provide new features when they shovel out the $$$ to your sales guys21:17
mriedemand gals21:17
mriedemand then the shit rolls down hill to you, the developer21:17
sean-k-mooneyya well we try to do both but when you can't backport api, db or object changes adding features that aren't bug fixes to old releases is21:18
sean-k-mooneychallenging21:18
efriedmriedem: I don't see anywhere we start cell conductor services. What would that look like?21:19
efriedOr do we do it implicitly when we start a compute service in a specific cell?21:20
sean-k-mooneyif we are starting only one conductor then we are running it in the non super conductor topology21:20
openstackgerritEric Fried proposed openstack/nova master: Only allow one scheduler service in tests  https://review.opendev.org/68248621:21
sean-k-mooneyi don't know if we create multiple conductors in the current functional tests but the cross cell migration code might have some?21:21
efriedwe don't have any cross-cell migration code21:22
efriedtssssssss21:22
sean-k-mooneyi meant there might be a case in mriedem's series21:23
*** gbarros has quit IRC21:23
sean-k-mooneywe might be relying on https://github.com/openstack/nova/blob/master/nova/tests/functional/integrated_helpers.py#L11821:24
mriedemefried: we don't do the whole super conductor / cell conductor thing in functional tests21:24
mriedemit's all just one21:24
sean-k-mooneyto start the conductor in the functional test21:24
mriedemefried: meaning we get away with shit in functional tests that we wouldn't in devstack21:25
mriedeme.g. we do stuff like have the CheatingSerializer so it all looks like one RPC21:25
sean-k-mooneyi think devstack has the ability to deploy in the legacy mode too but i don't know if that's tested still so it could be broken21:25
mriedemwhich has been a problem in my cross-cell series with functional tests21:25
mriedemsean-k-mooney: all grenade jobs are non-superconductor21:26
sean-k-mooneyah ok. is that still true for the new zuulv3 version?21:26
mriedemi would expect so but haven't looked21:27
sean-k-mooneyi tried using that to do a grenade version of my numa jobs but i found i could not set local.conf differently for new/old nodes21:27
sean-k-mooneylocalrc yes21:27
sean-k-mooneybut not local.conf21:27
sean-k-mooneyso i could not do the config overrides i needed21:27
sean-k-mooneyto be fair i don't know if normal grenade supports that but it was something i found when i tried to use it21:28
*** markvoelker has joined #openstack-nova21:35
sean-k-mooneymriedem: by the way i'm going to spend some time this week trying to refine the nfv job into something we can have more permanently and also creating an ovs-dpdk job. is this something we would consider merging before RC1 or should i target U with the hope of maybe backporting.21:37
sean-k-mooneyinitially i want to run them as periodic jobs/via experimental but if i get multiple providers i would like to consider adding them to check if they seem stable21:38
mriedemi would say don't get the cart (backports) before the horse (actually getting something working on master)21:38
sean-k-mooneywell i had a dpdk job working a while ago then it broke because fedora21:39
sean-k-mooneyhttps://review.opendev.org/#/c/656580/21:39
*** Sundar has joined #openstack-nova21:39
sean-k-mooneyand the nfv job works. but i want to get them working properly before going anywhere near nova check21:40
sean-k-mooneyhence start with a nightly periodic job21:40
*** markvoelker has quit IRC21:45
efriedmriedem: any chance of nailing down test_walk_versions this week?21:46
*** rcernin has joined #openstack-nova21:47
mriedemefried: i lost track of that21:47
mriedemweren't you trying to pull the mysql logs or something for debug?21:48
efriedyeah, I think I eventually succeeded in doing that but they didn't show anything interesting.21:48
efriedmriedem: https://review.opendev.org/#/c/678051/21:48
sean-k-mooneylooks like the pcpu reshaper went into merge conflict with something21:49
efriedmriedem: the bad one has this additional message https://zuul.opendev.org/t/openstack/build/f9b92d66ade145a195b996708cd66c28/log/mysql/error.log#20221:50
efriedand a few more of the final 'Aborted connection' lines21:50
efriedbut otherwise they look the same to me.21:50
efriedsean-k-mooney: what makes you say that? The yellow dot?21:50
efriedoh21:50
sean-k-mooneyno the "Patch in Merge Conflict"21:51
efriedwell wtf21:51
*** TxGirlGeek has joined #openstack-nova21:51
efriednothing has merged lately21:51
sean-k-mooneywe merged a few patches like an hour ago21:51
mriedemi've noticed sometimes gerrit will say something is in merge conflict and then i have a trivial conflict-free local rebase21:51
sean-k-mooneyprobably the service wide pause thing21:52
sean-k-mooneyya21:52
sean-k-mooneyshall i pull it down and rebase against master?21:52
efriedno21:53
sean-k-mooneyok21:53
efriedthen we lose all of our +Vs on the rest of the series.21:53
efriedNext time a bottom one fails in the gate and -2s the whole pile, maybe we can do that.21:53
sean-k-mooneyi was just about to check that21:53
efriedmriedem: So should we be getting all those "Aborted connection" things in the first place?21:54
mriedemefried: that i don't know21:54
efrieddoes that mean we're not shutting down the db gracefully?21:54
efriedand regardless, should that matter?21:54
mriedemthe wacky names are just the random db names21:54
sean-k-mooneythere are no patches in the gate for nova. they are all in check. but i'll let it wait till the morning21:54
efriedis this a job for zzzeek?21:54
mriedemidk about that, but zzzeek might be able to tell us what to turn on for debugging21:55
efriedsean-k-mooney: right, about 12 of the 15 patches have cleared the check queue at this point. The bottom one is waiting; once it completes, the whole swatch will get shoved into the gate pipeline.21:55
mriedemwe're also using some deprecated opportunistic db test fixture stuff from oslo.db and i don't know if moving off the deprecated stuff would help us21:55
efriedmriedem: well, couldn't hurt. What are you talking about specifically?21:55
mriedemit's been awhile since i looked, sec21:56
sean-k-mooneyya ok then let's hold off until the patches below it have merged21:57
efriedsean-k-mooney: and tbh I'm not super worried about a merge conflict on that 15th patch. We've got at least a dozen rechecks to go before we get there.21:57
sean-k-mooney... hopefully not but we do need to kill some of the random failures21:58
*** slaweq has quit IRC21:58
efriedwell hopefully we killed two of them today21:58
sean-k-mooneyya and mriedem killed 1?2?  yesterday21:58
*** eharney has joined #openstack-nova21:58
efriedI took a couple of samples this morning and determined that the Nth patch takes ~N rechecks to merge.21:58
sean-k-mooneyfriday i can't remember21:58
sean-k-mooney:(21:59
efriedmriedem: test_fixtures.PostgresqlOpportunisticFixture and test_fixtures.MySQLOpportunisticFixture ?21:59
mriedemefried: was fixed already https://review.opendev.org/#/c/609352/22:00
*** gbarros has joined #openstack-nova22:00
efriedokay22:01
sean-k-mooneymriedem: by the way i got an error running the sql migrate unit tests http://paste.openstack.org/show/776829/22:02
sean-k-mooneygiven the rest passed i'm going to push this soon but should tox not install that22:03
sean-k-mooneyi guess it could have moved22:03
sean-k-mooneyi assume ibm_db_sa is a sqlalchemy plugin for the ibm db backend and it's just not installed by tox for some reason22:04
mriedemumm, that should just be skipped22:05
mriedemwell, or,22:06
mriedemtox should have pulled it in https://github.com/openstack/sqlalchemy-migrate/blob/master/test-requirements-py2.txt#L122:06
sean-k-mooneythe py27 version also requires me to install the mysql headers to compile the client...22:06
mriedemhttps://github.com/openstack/sqlalchemy-migrate/blob/master/test-requirements-py3.txt#L122:06
sean-k-mooneyi was running py3622:06
sean-k-mooneywhich also should install it22:07
mriedemright22:07
mriedemi don't have that problem locally22:07
mriedemare you hitting some pypi mirror?22:07
sean-k-mooneyno22:07
sean-k-mooneybut i can recreate the env and see what happens22:08
*** spatel has joined #openstack-nova22:08
sean-k-mooneythe rest of the tests pass fine22:08
sean-k-mooneyi assume they just use sqlite22:08
sean-k-mooneythat's interesting, it's not in pip freeze in the tox env22:10
sean-k-mooneyoh there is no explicit py36 target22:11
sean-k-mooneyand it won't look for requirement-py3.txt by default22:11
sean-k-mooneyi might update the tox file in a separate patch.22:12
*** spatel has quit IRC22:12
efriedmriedem: https://storage.bhs1.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/logs_51/678051/4/check/nova-tox-collect-py37/f9b92d6/testr_results.html.gz looks like the root cause is some kind of timeout. If we could figure out where, we could probably extend it a bit and make the problem go away...?22:14
efriedmmph, it's a socket read, probably a real deadlock.22:15
mriedemsean-k-mooney: do you have this? https://review.opendev.org/#/c/659388/22:17
efriedactually, it's the test timeout22:17
mriedemefried: right,22:17
mriedemthat's a red herring22:17
sean-k-mooneymriedem: apparently not but i cloned this on friday22:18
mriedemi think we hit some db failure, switch eventlet context, and then never fail the test outright but it times out22:18
mriedemsean-k-mooney: that migrate patch merged awhile back22:18
sean-k-mooneymriedem: oh i cloned it from github so looks like the github sync for this is broken22:18
mriedemsean-k-mooney: did you clone from github or opendev?22:18
efriedthe time stamps are pretty funky on the log messages leading up to the exception.22:18
mriedemefried: yup22:18
mriedemsean-k-mooney: yeah you need opendev22:19
efriednearly 12 minutes elapse22:19
sean-k-mooneyya ok ill rebase22:19
sean-k-mooneyam should i still clean up the tox file separately22:19
mriedemefried: b/c https://github.com/openstack/nova/blob/master/nova/tests/unit/db/test_migrations.py#L7222:19
sean-k-mooneywe done actually still support py 26 or py 33/34 right22:19
sean-k-mooney*don't22:19
mriedemsean-k-mooney: how about you clone from the current repo before asking questions22:20
efried4 * 160 = 11m40s, yup, that's right on.22:20
mriedemthe timeouts on those were bumped a couple of years ago for a valid reason at the time (slow nodes i think)22:20
*** mlavalle has quit IRC22:20
mriedemhttps://github.com/openstack/nova/commit/71d6333f855e139894f497fc120487895a1d66ce22:20
mriedembut the thread switch thing kills us22:21
*** gbarros has quit IRC22:22
efriedthis is always on limestone-regionone??22:23
efriedthat... seems like a thing22:25
efriedis it because limestone has slower nodes consistently?22:25
efriedor because of some particular config over there?22:25
sean-k-mooneyare the limestone jobs generally slower or is it just this one thing that is failing22:25
efriedcould we... increase the timeout s'more and see if it goes away? Not the greatest answer, but overall gate time wasted would be less22:25
sean-k-mooneyif the overall job is slow then it could be that it's just slower in general22:26
efriedsean-k-mooney: I dunno, before digging into that I was going to see if anyone knew off the top.22:26
efriednot sure how I would look into that, other than clicking through a zillion different builds22:26
mriedemyes it's always limestone22:26
mriedemidk if they are slower or what, you'd have to ask in infra22:26
*** markvoelker has joined #openstack-nova22:26
mriedemthere is probably some way to find average job time per node provider and nova openstack-tox-py36 job or something22:27
sean-k-mooneyinfra have many grafana dashboards that track things but i don't know if they have a build time per region one22:27
sean-k-mooneyyou could probably figure it out from logstash22:27
mriedemi think awhile back i had wondered if those nodes used some different mysql binary or something22:27
mriedemfor everything to be not crazy though it should be the same ubuntu 18.04 image and such regardless of node provider22:28
sean-k-mooneythey should all use the nodepool image and then hit the infra mirrors which are available via afs in all regions right22:28
sean-k-mooneyso it should be the same22:28
efriedhttp://zuul.openstack.org/builds?job_name=openstack-tox-py27&branch=master&project=openstack%2Fnova but this table doesn't show the node provider...22:29
sean-k-mooneyi think infra has been uploading its own image nightly since the zuul v2 days to not have random things break in different clouds22:29
logan-o/ from limestone22:30
efriedthis slow bastard happened on rax https://zuul.opendev.org/t/openstack/build/38181c3314284545943b38b52c72f8d7/log/job-output.txt22:30
sean-k-mooneylogan-: they are trying to debug a test that seems to be hitting a timeout22:31
sean-k-mooneylogan-: but only (mostly?) happens in limestone-regionone22:32
alex_xuefried: I'm here22:33
sean-k-mooneylogan-: so we were wondering if limestone is generally slower than the other providers and hitting the timeout more often or if it could be a config/cloud issue22:33
sean-k-mooneylogan-: e.g. is there something different that could make it fail more often on limestone or is that just a red herring22:33
efriedalex_xu: Can you, or Rui, or luyao, or someone please formulate a bullet for vpmem for the train cycle highlights in a followon to https://review.opendev.org/#/c/681943/ ?22:33
alex_xuefried: got it, will do today22:34
efriedalex_xu: If you seed the message, I'll scrub the grammar etc. Thanks.22:34
alex_xuefried: thanks22:34
logan-i'm not sure if our testcloud is slower than other nodepool providers. certainly the image would be the same as any other nodepool node. looking thru the log.. i'm curious if the job hangs somewhere specific, or if its just slower in general. and if our nodes are just showing up slower on average i'd like to know so we can figure out if it is h/w or config related.22:35
efriedlogan-: The thing we're trying to nail down is http://logstash.openstack.org/#/dashboard/file/logstash.json?query=message:%5C%22sqlalchemy.exc.InterfaceError:%20(pymysql.err.InterfaceError)%20(0,%20'')%5C%22%20AND%20tags:%5C%22console%5C%22%20AND%20voting:1&from=864000s22:36
efriedwe consistently see the test suite time out on the same 1-3 tests22:36
*** macz has quit IRC22:37
*** macz has joined #openstack-nova22:38
efriedlogan-: and if you punch the `node_provider` checkbox on the left-hand side next to the result list, you see it's *always* limestone-regionone22:39
*** avolkov has quit IRC22:40
logan-https://storage.bhs1.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/logs_51/678051/4/check/nova-tox-collect-py37/f9b92d6/mysql/error.log are any of these messages at the bottom pertinent?22:40
openstackgerritMatt Riedemann proposed openstack/nova master: Create volume attachment during boot from volume in compute  https://review.opendev.org/54142022:42
mriedemlogan-: we're not sure, we need zzzeek's help for that22:42
mriedemthose are definitely related to the opportunistic db tests that fail,22:43
mriedemthe db is a random db name created by the test fixture,22:44
mriedemopenstack_citest is setup by a script on the node22:44
mriedemhttps://github.com/openstack/nova/blob/master/tools/test-setup.sh does the mysql/postgresql setup22:44
efriedlogan-: this might be what you were already looking at, but I did a compare of the mysql error log on a "good" run and a "bad" one, and didn't see significant differences.22:45
*** munimeha1 has quit IRC22:45
efriedlogan-: see bottom-most comment on https://review.opendev.org/#/c/678051/22:45
logan-yep, that's what i was wondering, thanks22:45
efriedalthough the bad one is taking way longer...22:47
efriedbut dunno if that's cause or effect22:47
logan-to answer the general question -- I don't know of anything significantly different on our nodes. they're pretty standard dual proc intel systems with ssds. I wonder if there's something in particular with this test that is wrecking the node somehow.22:47
efriedmhm, mriedem here's an interesting data point: in the "good" one, all those "abort" messages come through in under a minute. In the "bad" one, the first 16 or so are under a minute, then we start getting long delays.22:48
efriednow, IIUC, everything before those "abort" messages is just setup, and the "abort" things are happening during the actual meat of the test. So I guess that's not really surprising.22:49
openstackgerritMatt Riedemann proposed openstack/nova master: Create volume attachment during boot from volume in compute  https://review.opendev.org/54142022:50
efriedIt would be painful, but maybe if we knew what operations were being run, it would give us a clue as to what's happening when it starts to grind down.22:53
*** tkajinam has joined #openstack-nova22:54
efriedanybody know how to turn on like trace logs to see the sql statements themselves as they're issued?22:54
openstackgerritMatt Riedemann proposed openstack/nova master: DNM: Stop using volume_api.initialize_connection  https://review.opendev.org/68250822:54
mriedemefried: i want to say https://docs.openstack.org/nova/latest/configuration/config.html#database.connection_debug or22:55
mriedemhttps://docs.openstack.org/nova/latest/configuration/config.html#database.connection_trace22:55
alex_xuefried: https://review.opendev.org/68250922:57
efriedalex_xu: cool, thank you.22:58
alex_xuefried: np!22:58
efriedalex_xu: since this is market-y, we probably don't want to mention the generic resource framework22:58
alex_xuefried: ah, got it22:58
efriedthat's a thing only devs would care about, at least until it reaps perf benefits via placement, which is a ways off.22:58
efriedalex_xu: perhaps kill that sentence and instead explain a bit more what vpmem is22:59
alex_xuefried: sev mentioned that, that is why I follow22:59
efriedoh? looking...22:59
efriedalex_xu: yeah, I'm looking for something corresponding to "to protect users against attackers or rogue administrators snooping on23:00
efried    their workloads when using the libvirt compute driver"23:00
alex_xuefried: got it23:00
alex_xumriedem: you are super fast23:00
efriedSo like "Added $feature for $benefit"23:00
* mriedem holsters gun23:01
mriedem"Added VPMEMs for Intel to sell hardware."23:01
mriedemdid it for you :)23:01
sean-k-mooneyadd vpmem so the nsa can keep all your data in "ram" to track you faster23:02
mriedem"added vpmems so alex can continue working upstream"23:02
mriedemi kid i kid23:03
* mriedem has already made alex angry by 7am23:03
mriedemi think as long as you have HPC in there somewhere it will hit enough marketing bells to satisfy the entry23:04
mriedem"Support virtual persistent memory devices for HPC workloads when using the libvirt driver.":23:04
mriedemsomething like that23:04
efried"edge"23:04
sean-k-mooneythe primary usecase is really something like "vPMEM support was introduced to allow big data workloads to retain more data in persistent memory reducing the total cost of ownership for big data clouds"23:04
sean-k-mooneyya hpc is the other vertical that will likely use it most23:05
openstackgerritEric Fried proposed openstack/nova master: DNM: Try to repro bug 1823251 with mysql logs  https://review.opendev.org/67805123:05
openstackbug 1823251 in OpenStack Compute (nova) "Spike in TestNovaMigrationsMySQL.test_walk_versions/test_innodb_tables failures since April 1 2019 on limestone-regionone" [High,Confirmed] https://launchpad.net/bugs/182325123:05
efriedmriedem, logan-: Howza ^23:05
mriedem"other vertical"? someone get this guy a c-suite office23:05
mriedemefried: it's dinner o'clock for me23:06
efriedme too23:06
sean-k-mooneyits tuesday for me so im going to sleep soon o/23:06
openstackgerritEric Fried proposed openstack/nova master: DNM: Try to repro bug 1823251 with mysql logs  https://review.opendev.org/67805123:07
openstackbug 1823251 in OpenStack Compute (nova) "Spike in TestNovaMigrationsMySQL.test_walk_versions/test_innodb_tables failures since April 1 2019 on limestone-regionone" [High,Confirmed] https://launchpad.net/bugs/182325123:07
*** mriedem has quit IRC23:07
alex_xuwhatever that is worth to celebrate :)23:11
*** hoonetorg has quit IRC23:11
*** tbachman has quit IRC23:14
*** Sundar has quit IRC23:14
*** hoonetorg has joined #openstack-nova23:25
*** eharney has quit IRC23:25
*** gbarros has joined #openstack-nova23:34
*** gbarros has quit IRC23:37
*** tbachman has joined #openstack-nova23:48
*** gbarros has joined #openstack-nova23:49
*** luksky has quit IRC23:58
