Tuesday, 2019-04-02

*** mgoddard has quit IRC00:06
*** wolverineav has joined #openstack-nova00:07
*** wolverineav has quit IRC00:12
*** tetsuro has joined #openstack-nova00:13
*** mgoddard has joined #openstack-nova00:22
*** antonym has joined #openstack-nova00:26
mnasermelwitt: https://review.openstack.org/#/c/649204/2 I don't think I have time to actually get this to merge, but I tried working on it and I think it works.. so threw it up there if you wanna get it merged or if someone else wants to pick it up00:28
melwittmnaser: I don't think that helps with the brokenness from novnc though00:29
mnasermelwitt: oh yeah it doesn't, but I did the shuffling so I figured if someone was interested in making that code base a bit neater.. :P00:30
melwittself.path won't have the token in it by the time it gets to the plugin00:30
*** tetsuro has quit IRC00:31
melwittmnaser: ok. I did similar locally to come to the same conclusion. the only thing I notice offhand is I think you removed the scheme validation, at best you'd have to have it run in a different order than before (currently it's scheme, token, the rest). with the plugin it would have to be token, scheme, the rest00:33
*** tetsuro has joined #openstack-nova00:35
*** hongbin has joined #openstack-nova00:37
*** wolverineav has joined #openstack-nova00:46
*** tiendc has joined #openstack-nova00:47
*** wolverineav has quit IRC00:51
*** markvoelker has joined #openstack-nova00:55
*** igordc has quit IRC01:02
*** luksky has quit IRC01:08
*** ricolin has joined #openstack-nova01:29
*** whoami-rajat has joined #openstack-nova01:31
*** brinzhang has joined #openstack-nova01:35
*** BjoernT has quit IRC01:37
*** brinzhang has quit IRC01:47
*** brinzhang has joined #openstack-nova01:48
*** BjoernT has joined #openstack-nova01:49
*** BjoernT has quit IRC01:51
*** BjoernT has joined #openstack-nova01:52
*** lbragstad has joined #openstack-nova02:02
*** nicolasbock has quit IRC02:05
*** rcernin_ has joined #openstack-nova02:05
*** rcernin has quit IRC02:06
*** mrhillsman_afk is now known as mrhillsman02:10
*** rcernin_ has quit IRC02:12
*** openstackgerrit has joined #openstack-nova02:13
openstackgerritBrin Zhang proposed openstack/nova-specs master: Specifying az when restore shelved server  https://review.openstack.org/62468902:13
*** rcernin has joined #openstack-nova02:15
*** wolverineav has joined #openstack-nova02:44
openstackgerritTakashi NATSUME proposed openstack/nova master: WIP: Add a live migration regression test  https://review.openstack.org/64120002:53
*** takashin has joined #openstack-nova02:54
*** cfriesen has quit IRC02:55
*** hongbin has quit IRC02:57
*** hongbin has joined #openstack-nova02:58
*** BjoernT has quit IRC03:05
*** hongbin has quit IRC03:12
*** phasespace has quit IRC03:14
*** psachin has joined #openstack-nova03:14
*** hongbin has joined #openstack-nova03:14
*** samueldmq has quit IRC03:15
openstackgerritSeyeong Kim proposed openstack/nova stable/rocky: Share snapshot image membership with instance owner  https://review.openstack.org/64385303:19
*** wolverineav has quit IRC03:22
*** wolverineav has joined #openstack-nova03:23
*** hongbin has quit IRC03:31
*** spsurya has joined #openstack-nova03:31
*** brinzhang has quit IRC03:45
*** brinzhang has joined #openstack-nova03:46
*** cfriesen has joined #openstack-nova04:04
*** udesale has joined #openstack-nova04:11
*** krypto has joined #openstack-nova04:27
*** wolverineav has quit IRC04:28
*** wolverineav has joined #openstack-nova04:32
*** alex_xu has quit IRC04:46
*** alex_xu has joined #openstack-nova04:53
*** ileixe has joined #openstack-nova04:55
*** ratailor has joined #openstack-nova04:59
*** wolverineav has quit IRC05:01
openstackgerritTakashi NATSUME proposed openstack/nova master: Add a live migration regression test  https://review.openstack.org/64120005:12
*** cfriesen has quit IRC05:15
openstackgerritSeyeong Kim proposed openstack/nova stable/rocky: Share snapshot image membership with instance owner  https://review.openstack.org/64385305:23
*** lbragstad has quit IRC05:33
openstackgerritSeyeong Kim proposed openstack/nova stable/rocky: Share snapshot image membership with instance owner  https://review.openstack.org/64385305:34
*** pcaruana has joined #openstack-nova05:35
*** tbachman has quit IRC05:38
*** tbachman has joined #openstack-nova05:39
*** ratailor has quit IRC05:45
*** ratailor has joined #openstack-nova05:50
*** tbachman has quit IRC05:54
*** markvoelker has quit IRC05:58
openstackgerritBrin Zhang proposed openstack/nova-specs master: Specifying az when restore shelved server  https://review.openstack.org/62468905:59
*** openstackgerrit has quit IRC06:09
*** sridharg has joined #openstack-nova06:12
*** openstackgerrit has joined #openstack-nova06:20
openstackgerritArtem Vasilyev proposed openstack/nova master: systemd detection result caching nit fixes  https://review.openstack.org/64922906:20
kaisersmdbooth: Yep, saw that. Will follow up06:22
openstackgerritArtem Vasilyev proposed openstack/nova master: systemd detection result caching nit fixes  https://review.openstack.org/64922906:24
*** markvoelker has joined #openstack-nova06:29
*** mdbooth_ has joined #openstack-nova06:35
*** mdbooth has quit IRC06:39
*** ivve has joined #openstack-nova06:43
*** slaweq has joined #openstack-nova06:44
*** dpawlik has joined #openstack-nova06:44
*** tesseract has joined #openstack-nova07:02
*** belmoreira has joined #openstack-nova07:05
*** tosky has joined #openstack-nova07:09
*** kashyap has quit IRC07:10
*** awalende has joined #openstack-nova07:11
*** luksky has joined #openstack-nova07:15
*** tssurya has joined #openstack-nova07:19
openstackgerritTakashi NATSUME proposed openstack/nova master: Fix a deprecation warning  https://review.openstack.org/64923407:21
*** ircuser-1 has quit IRC07:23
*** rpittau|afk is now known as rpittau07:23
*** ccamacho has joined #openstack-nova07:24
*** krypto has quit IRC07:32
*** yan0s has joined #openstack-nova07:34
*** ccamacho has quit IRC07:42
*** balszoll has joined #openstack-nova07:43
*** ccamacho has joined #openstack-nova07:49
*** helenafm has joined #openstack-nova07:50
*** ralonsoh has joined #openstack-nova07:51
*** kashyap has joined #openstack-nova07:54
*** takashin has left #openstack-nova08:00
*** sidx64_ has joined #openstack-nova08:03
*** tetsuro has quit IRC08:12
*** ttsiouts has joined #openstack-nova08:14
openstackgerritMerged openstack/nova master: Pass --nic when creating servers in evacuate integration test script  https://review.openstack.org/64903608:15
*** wolverineav has joined #openstack-nova08:16
*** wolverineav has quit IRC08:20
*** tkajinam has quit IRC08:21
*** tetsuro has joined #openstack-nova08:22
*** zbr|pto is now known as zbr08:23
*** sidx64_ has quit IRC08:26
*** tetsuro has quit IRC08:29
*** cdent has joined #openstack-nova08:29
*** xek has joined #openstack-nova08:30
*** priteau has joined #openstack-nova08:34
*** tetsuro has joined #openstack-nova08:36
*** derekh has joined #openstack-nova08:41
openstackgerritMichael Still proposed openstack/nova master: Remove fake_libvirt_utils from connection tests.  https://review.openstack.org/64255708:46
openstackgerritMichael Still proposed openstack/nova master: Remove fake_libvirt_utils from snapshot tests.  https://review.openstack.org/64255808:46
openstackgerritMichael Still proposed openstack/nova master: Make privsep.chown mocking for libvirt snapshot tests less magic.  https://review.openstack.org/64213408:46
openstackgerritMichael Still proposed openstack/nova master: Remove fake_libvirt_utils from virt driver tests.  https://review.openstack.org/64389408:46
openstackgerritMichael Still proposed openstack/nova master: Remove fake_libvirt_utils from libvirt imagebackend tests.  https://review.openstack.org/64389508:46
openstackgerritMichael Still proposed openstack/nova master: Remove remaining vestiges of fake_libvirt_utils from unit tests.  https://review.openstack.org/64389608:46
openstackgerritMichael Still proposed openstack/nova master: Remove fake_libvirt_utils users in functional testing.  https://review.openstack.org/64479308:46
openstackgerritMichael Still proposed openstack/nova master: Remove usused umask argument to virt.libvirt.utils.write_to_file  https://review.openstack.org/64508608:46
openstackgerritMichael Still proposed openstack/nova master: Remove write_to_file.  https://review.openstack.org/64508708:46
*** manjeets has quit IRC08:56
openstackgerritYongli He proposed openstack/nova master: Clean up orphan instances virt driver  https://review.openstack.org/64891208:57
openstackgerritYongli He proposed openstack/nova master: Clean up orphan instances  https://review.openstack.org/62776508:57
*** manjeets has joined #openstack-nova08:57
*** manjeets has quit IRC09:05
*** davidsha has joined #openstack-nova09:10
*** rcernin has quit IRC09:11
*** ccamacho has quit IRC09:14
*** davidsha has quit IRC09:22
*** balszoll has quit IRC09:26
*** sapd1_x has joined #openstack-nova09:26
*** davidsha has joined #openstack-nova09:28
openstackgerritSeyeong Kim proposed openstack/nova stable/rocky: Share snapshot image membership with instance owner  https://review.openstack.org/64385309:31
*** dtantsur|afk is now known as dtantsur09:33
openstackgerritKashyap Chamarthy proposed openstack/nova-specs master: Re-propose the spec to allow specifying a list of CPU models  https://review.openstack.org/64203009:36
*** manjeets has joined #openstack-nova09:38
*** tbachman has joined #openstack-nova09:40
*** maciejjozefczyk has quit IRC09:42
*** Sundar has joined #openstack-nova09:43
*** maciejjozefczyk has joined #openstack-nova09:45
*** sapd1_x has quit IRC09:46
*** zigo has joined #openstack-nova09:55
*** ccamacho has joined #openstack-nova10:00
*** priteau has quit IRC10:07
*** priteau has joined #openstack-nova10:12
openstackgerritMichael Still proposed openstack/nova master: Style corrections for privsep usage.  https://review.openstack.org/64861510:13
openstackgerritMichael Still proposed openstack/nova master: Hacking N362: Don't abbrev/alias privsep import  https://review.openstack.org/64919010:13
openstackgerritMichael Still proposed openstack/nova master: Improve test coverage of nova.privsep.path.  https://review.openstack.org/64860110:13
openstackgerritMichael Still proposed openstack/nova master: Improve test coverage of nova.privsep.fs.  https://review.openstack.org/64860210:13
openstackgerritMichael Still proposed openstack/nova master: Improve test coverage of nova.privsep.fs, continued.  https://review.openstack.org/64860310:13
openstackgerritMichael Still proposed openstack/nova master: Add test coverage for nova.privsep.libvirt.  https://review.openstack.org/64861610:13
openstackgerritMichael Still proposed openstack/nova master: Add test coverage for nova.privsep.qemu.  https://review.openstack.org/64919110:13
openstackgerritMichael Still proposed openstack/nova master: Privsepify ipv4 forwarding enablement.  https://review.openstack.org/63543110:13
openstackgerritMichael Still proposed openstack/nova master: Remove unused FP device creation and deletion methods.  https://review.openstack.org/63543310:13
openstackgerritMichael Still proposed openstack/nova master: Privsep the ebtables modification code.  https://review.openstack.org/63543510:13
openstackgerritMichael Still proposed openstack/nova master: Move adding vlans to interfaces to privsep.  https://review.openstack.org/63543610:13
openstackgerritMichael Still proposed openstack/nova master: Move iptables rule fetching and setting to privsep.  https://review.openstack.org/63650810:13
openstackgerritMichael Still proposed openstack/nova master: Move dnsmasq restarts to privsep.  https://review.openstack.org/63928010:13
openstackgerritMichael Still proposed openstack/nova master: Move router advertisement daemon restarts to privsep.  https://review.openstack.org/63928110:13
openstackgerritMichael Still proposed openstack/nova master: Move calls to ovs-vsctl to privsep.  https://review.openstack.org/63928210:13
openstackgerritMichael Still proposed openstack/nova master: Move setting of device trust to privsep.  https://review.openstack.org/63928310:13
openstackgerritMichael Still proposed openstack/nova master: Move final bridge commands to privsep.  https://review.openstack.org/63958010:13
openstackgerritMichael Still proposed openstack/nova master: Cleanup the _execute shim in nova/network.  https://review.openstack.org/63958110:13
*** wolverineav has joined #openstack-nova10:17
*** ttsiouts has quit IRC10:20
*** ttsiouts has joined #openstack-nova10:21
*** wolverineav has quit IRC10:21
*** ttsiouts has quit IRC10:26
sean-k-mooneystephenfin: bauzas care to hit this https://review.openstack.org/#/c/622972/1610:30
*** tbachman has quit IRC10:47
stephenfindone10:49
sean-k-mooney:) thank you10:50
NewBrucesean-k-mooney - DM’ing you with an update re: the port binding issue when live migrating RDO -> OSA10:54
sean-k-mooneyNewBruce: oh cool so you made some progress root causing the interaction issues10:54
*** nicolasbock has joined #openstack-nova10:58
NewBruceso, i filled that with a bunch of extra debugs - if im migrating from compute28 -> compute29 (both OSA and successful)11:05
NewBruce2019-02-12 13:09:33.458 59488 INFO nova.network.neutronv2.api [req-8e14a273-fc18-461e-a2de-b8bad835dcd8 a3bee416cf67420995855d602d2bccd3 a564613210ee43708b8a7fc6274ebd63 - default default] [BRUCE](A-8-2): _setup_migration_port_profile: host_id = cc-compute29-kna111:05
*** tiendc has quit IRC11:05
*** erlon_ has joined #openstack-nova11:06
*** ttsiouts has joined #openstack-nova11:06
*** _alastor_ has quit IRC11:08
*** udesale has quit IRC11:10
jaypipesstephenfin: you around? I'm having a hell of a time trying to run tests in nova now that tox requirements have changed.11:12
jaypipesstephenfin: nothing I seem to do will get me out of the hell that is this: [jaypipes@uberbox nova]$ tox -efunctional11:12
jaypipesERROR: tox version is 2.5, required is at least 3.1.111:12
jaypipesI've sudo -H pip install -U pip tox, I've apt purge'd python-tox11:12
jaypipesand still /usr/local/bin/tox persists and is pointing somewhere old11:13
jaypipesand I have no idea why this crap has to just suddenly break :)11:13
*** janki has joined #openstack-nova11:14
jaypipesstephenfin: cdent has graciously helped me. needed to `sudo -H pip uninstall tox && sudo -H pip install tox && reload bash...`11:18
sean-k-mooneyjaypipes: ya i was just going to say that you probably need to uninstall first11:32
sean-k-mooneyi had more or less the same issue too, as i started with tox from my package manager and needed to swap to pip later11:32
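The mismatch jaypipes hit above (tox 2.5 found, at least 3.1.1 required) can be checked before running the tests. The following is only a minimal Python sketch, assuming plain dotted numeric version strings; the helper names are hypothetical and not part of tox or nova:

```python
import shutil

REQUIRED = "3.1.1"  # minimum version quoted in the error message above


def version_tuple(version):
    """Turn a dotted numeric version such as '3.1.1' into (3, 1, 1)."""
    return tuple(int(part) for part in version.split("."))


def meets_minimum(found, required=REQUIRED):
    """Return True when `found` is at least `required`."""
    return version_tuple(found) >= version_tuple(required)


def tox_on_path():
    """Return the tox executable the shell would pick up, or None."""
    return shutil.which("tox")


if __name__ == "__main__":
    # A stale /usr/local/bin/tox would show up here even after a reinstall.
    print("tox resolved to:", tox_on_path())
```

Printing the resolved path is the useful part for the situation described: it shows whether an old copy is still shadowing the newly installed one.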
*** cdent has quit IRC11:36
*** ttsiouts has quit IRC11:44
*** ttsiouts has joined #openstack-nova11:45
*** tetsuro has quit IRC11:48
*** tetsuro has joined #openstack-nova11:49
*** ttsiouts has quit IRC11:49
*** ttsiouts has joined #openstack-nova11:53
*** phasespace has joined #openstack-nova11:54
stephenfinjaypipes: Good to hear you got sorted. FWIW, the current guidelines suggest not using 'sudo pip install' since it's way too likely to break distro packages. You need to prepend (or append, I don't recall the order) '~/.local/bin' to PATH then 'pip install --user'12:03
stephenfinjaypipes: fwiw, I was reluctant to merge the patches that required tox 3.1.1+ but I think we decided the benefits outweighed the costs12:03
* stephenfin uses system tox, reno, etc. wherever possible12:04
*** tbachman has joined #openstack-nova12:05
* sean-k-mooney avoids system packages whenever possible and prefers pip packages or developer repos/ppas over distro ones in general12:08
sean-k-mooneythat said it depends on what the thing is that im installing12:08
sean-k-mooneystephenfin: can i get your input on http://logs.openstack.org/33/647733/2/check/nova-tox-functional/72500de/testr_results.html.gz12:08
sean-k-mooneyour functional notification tests are asserting behavior of the payloads as json dicts12:09
sean-k-mooney1st this feels wrong at first glance to call these functional tests but i have not looked at the code to see how they work so ill put that aside for a minute12:10
*** sapd1_x has joined #openstack-nova12:10
sean-k-mooneyshould i a.) update these to use the new version of the object b.) convert them to do assertion using the objects or c.) force them to use the old version somehow?12:11
sean-k-mooneyby the way these are semi valid failures12:13
sean-k-mooneyas i am updating the allowed values in a field in the image metadata object but unlike everywhere else in nova we apparently version bump the composed object in this specific case.12:15
sean-k-mooneyhttps://review.openstack.org/#/c/647733/12:15
*** manjeets has quit IRC12:16
*** wolverineav has joined #openstack-nova12:17
jaypipesthx stephenfin12:21
*** wolverineav has quit IRC12:22
*** markvoelker has quit IRC12:25
*** markvoelker has joined #openstack-nova12:25
sean-k-mooneyjaypipes: by the way i dont know if you have time to review the last two patches in the sriov migration blueprint https://review.openstack.org/#/q/topic:bp/libvirt-neutron-sriov-livemigration+(status:open)12:26
jaypipessean-k-mooney: hmm, my favorite topics, merged into one.12:26
sean-k-mooneyit would be nice to get that squared away before the ptg12:26
sean-k-mooneyhaha all that is missing is numa12:27
jaypipesand FPGAs.12:28
sean-k-mooneylive migration with numa affined fpga exposed by sriov passthrough... at some point it just gets easier to move the damn server12:29
sean-k-mooneyit is one of the upsides of ironic12:30
artomOr you know, register a new corporation and buy them new machines.12:30
sean-k-mooneyim just going to pop out to grab lunch so ill brb12:32
*** jmlowe has quit IRC12:34
openstackgerritArtem Vasilyev proposed openstack/nova master: systemd detection result caching nit fixes  https://review.openstack.org/64922912:37
*** tbachman has quit IRC12:39
*** brinzhang has quit IRC12:39
*** tbachman has joined #openstack-nova12:41
stephenfinsean-k-mooney: They look like valid errors to me. I'm guessing we should update the notification samples but gibi_off or mriedem are probably the people to ask12:42
*** belmoreira has quit IRC12:43
*** belmoreira has joined #openstack-nova12:47
*** eharney has joined #openstack-nova12:50
*** mriedem has joined #openstack-nova12:52
*** cdent has joined #openstack-nova12:55
*** tetsuro has quit IRC12:57
mriedemcdent: question inline https://review.openstack.org/#/c/649068/12:58
*** dikonoor has joined #openstack-nova13:04
* cdent goes to look13:06
*** mriedem has quit IRC13:08
*** mriedem has joined #openstack-nova13:09
*** ygk_12345 has joined #openstack-nova13:09
kashyapCan anyone remind me again, has upstream Git master opened for Train, yet?  (/me is dazed after PTO)13:14
sean-k-mooneyyes13:14
sean-k-mooneylike a week or two ago13:14
kashyapThanks13:15
sean-k-mooneyit opens after RC1 but we dont tend to merge big things for a few weeks after13:15
gibi_offsean-k-mooney: for notifications we only emit the latest version13:17
gibi_offsean-k-mooney: so please update the sample file according to the change in the object13:17
gibi_offsean-k-mooney: the tests are functional as they call the nova API and assert if notifications are received13:18
sean-k-mooneygibi_off: ok but would they not avoid this issue if they parsed the notification into a python object and then checked that13:18
sean-k-mooneygibi_off: also if you're off today you're doing it wrong :) but thanks ill update it13:19
kashyapefried: When you get a moment, since Train has forked, might want to put this through: https://review.openstack.org/#/c/641981/13:19
efriedkashyap: done, thanks for the reminder.13:20
gibi_offsean-k-mooney: the parsed object would have different version13:21
gibi_offsean-k-mooney: also nova cannot assume how the notifications are parsed by the consumer so we assert the json we emit13:21
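gibi_off's point, that the tests compare the emitted JSON itself against a stored sample rather than a parsed object, can be illustrated roughly as below. This is a hedged sketch only: the payload name, version and data are made up for illustration, and `payload_mismatches` is a hypothetical helper, not nova's test code. Only the `nova_object.*` key names follow the usual versioned-object serialization.

```python
import json

# Hypothetical sample; real notification samples in nova look different.
SAMPLE = json.loads("""
{
    "nova_object.name": "ExamplePayload",
    "nova_object.version": "1.1",
    "nova_object.data": {"hw_architecture": "x86_64"}
}
""")


def payload_mismatches(emitted, sample):
    """Return the sorted top-level keys whose values differ from the sample."""
    keys = set(emitted) | set(sample)
    return sorted(key for key in keys if emitted.get(key) != sample.get(key))


if __name__ == "__main__":
    # An emitter still on the old version fails against the updated sample.
    stale = dict(SAMPLE, **{"nova_object.version": "1.0"})
    print(payload_mismatches(stale, SAMPLE))
```

Because the comparison is on the raw dictionary, a version bump in the object shows up as a mismatch until the sample file is updated, which is the behaviour described in the conversation.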
sean-k-mooneyah good points13:21
kashyapefried: Tack!13:22
*** lbragstad has joined #openstack-nova13:22
kashyap(Swedish for "thanks", if people don't want to look it up :D)13:22
openstackgerritMatt Riedemann proposed openstack/nova stable/stein: Add functional regression test for bug 1669054  https://review.openstack.org/64931913:25
openstackbug 1669054 in OpenStack Compute (nova) "RequestSpec.ignore_hosts from resize is reused in subsequent evacuate" [Medium,In progress] https://launchpad.net/bugs/1669054 - Assigned to Matt Riedemann (mriedem)13:25
openstackgerritMatt Riedemann proposed openstack/nova stable/stein: Do not persist RequestSpec.ignore_hosts  https://review.openstack.org/64932013:25
*** lpetrut has joined #openstack-nova13:26
*** priteau has quit IRC13:28
*** trident has quit IRC13:30
*** ratailor has quit IRC13:32
*** trident has joined #openstack-nova13:33
*** hongbin has joined #openstack-nova13:35
jaypipesmnaser: around? can you please execute both of the queries in this pastebin and show me the output please? http://paste.openstack.org/show/748718/13:36
*** awalende has quit IRC13:37
openstackgerritMatt Riedemann proposed openstack/nova stable/rocky: Add functional regression test for bug 1669054  https://review.openstack.org/64932513:37
openstackbug 1669054 in OpenStack Compute (nova) stein "RequestSpec.ignore_hosts from resize is reused in subsequent evacuate" [Medium,In progress] https://launchpad.net/bugs/1669054 - Assigned to Matt Riedemann (mriedem)13:37
openstackgerritMatt Riedemann proposed openstack/nova stable/rocky: Do not persist RequestSpec.ignore_hosts  https://review.openstack.org/64932613:37
*** awalende has joined #openstack-nova13:37
*** BjoernT has joined #openstack-nova13:39
*** BjoernT has quit IRC13:39
*** awalende has quit IRC13:42
*** helenafm has quit IRC13:44
*** bbowen__ has joined #openstack-nova13:48
yonglihemriedem: alex_xu:  patch split, and new unit test added. and... i need fix 2 of them. https://review.openstack.org/#/c/627765/13.  thanks.13:49
*** awaugama has joined #openstack-nova13:49
*** beagles is now known as beagles_dentist13:50
alex_xuyonglihe: sorry for not getting to it for a while, will try tomorrow13:51
yonglihethanks, great news for me.13:52
*** eharney_ has joined #openstack-nova13:52
efriedkashyap: I actually knew that :)13:53
kashyapefried: Nice.  Then I can try a few more phrases later on, then :D13:53
efriedIn Swedish? That would be pretty much the extent of it for me. Hit me with some other languages though13:54
*** eharney has quit IRC13:55
efriedkashyap: full disclosure, I wasn't sure of the spelling. The same pronunciation has the same meaning (but slightly different spellings) in Swedish, Danish, and Norwegian.13:55
kashyapefried: Yeah, you're right -- they're all North Germanic languages13:55
kashyapSo to put the spelling thing to rest: Tack (Swedish), Takk (Norwegian), and Tak (Danish) :D13:56
*** BjoernT has joined #openstack-nova13:56
ygk_12345hi all13:57
ygk_12345i am having issues with spinning up vms on a compute node. they are forever in the scheduling state13:58
ygk_12345it is rocky setup OSA13:58
*** lpetrut has quit IRC14:00
kashyapygk_12345: Hi, as noted in PM, I think you might get better responses on the more generic #openstack channel, where more admins / operators might hang out.14:04
efriedmriedem: Here's an interesting one: http://logstash.openstack.org/#dashboard/file/logstash.json?query=message%3A%5C%22Background%20on%20this%20error%20at%3A%20http%3A%2F%2Fsqlalche.me%2Fe%2Frvf5%5C%2214:05
efried(e.g. http://logs.openstack.org/50/638150/7/check/openstack-tox-py27/345d1d1/testr_results.html.gz)14:05
efriedSomething happens every day around 1am that makes sqla race like hell.14:05
*** tssurya has quit IRC14:05
*** ttsiouts has quit IRC14:07
*** ttsiouts has joined #openstack-nova14:08
*** sapd1_x has quit IRC14:08
*** ttsiouts has quit IRC14:10
*** ttsiouts has joined #openstack-nova14:10
artomsean-k-mooney, hah, midair collision :) In any case, thanks for the sanity check14:12
artomsean-k-mooney++14:12
artomDoh, wrong channel14:12
*** tssurya has joined #openstack-nova14:14
ygk_12345any nova expert here ?14:15
ygk_12345tried in #openstack channel but no one could help me14:15
*** smcginnis_pto is now known as smcginnis14:16
efriedygk_12345: Have you had a look in the logs yet?14:17
ygk_12345efried: yes14:17
ygk_12345efried: it doesn't seem to indicate that the build started14:18
ygk_12345efried: http://paste.openstack.org/show/748717/14:18
ygk_12345efried: it is in the scheduling state forever14:18
*** wolverineav has joined #openstack-nova14:18
efriedygk_12345: What about the n-sch and/or n-cpu logs? Anything appear to be hanging/repeating in there?14:20
ygk_12345efried: nothing this is the log I got so far14:20
ygk_12345efried: I see this "claim resources in the placement API for instance c2093503-d2d0-4401-8956-6a68a7d6e0dc claim_resources /openstack/venvs/nova-18.1.5/lib/python2.7/site-packages/nova/scheduler/utils.py:934"14:21
openstackgerritChris Dent proposed openstack/nova master: Don't report 'exiting' when mapping cells  https://review.openstack.org/64934014:22
*** sapd1_x has joined #openstack-nova14:22
*** mlavalle has joined #openstack-nova14:22
*** helenafm has joined #openstack-nova14:23
*** wolverineav has quit IRC14:23
efriedygk_12345: that was in the n-sch log presumably. Nothing after that?14:25
cdentygk_12345: did you discover hosts in your cells?14:25
ygk_12345cdent: efried yes14:25
ygk_12345cdent: how to validate it ?14:25
ygk_12345cdent: regarding cells ?14:25
*** dpawlik has quit IRC14:26
openstackgerritStephen Finucane proposed openstack/nova-specs master: Standardize CPU resource tracking  https://review.openstack.org/55508114:27
cdentlisting your hypervisors14:27
cdentI think you need to look at your n-cpu logs14:27
ygk_12345cdent: all the hypervisors are enabled14:27
mriedemefried: i think that's this http://status.openstack.org/elastic-recheck/#179336414:27
ygk_12345cdent: where to look at the n-cpu logs ?14:28
cdentygk_12345: it depends on how you installed your openstack, but somewhere you have a nova-compute process, on the host which is or manages your hypervisor14:28
ygk_12345cdent: i see only nova-compute log in the hypervisor. Also I cant find any entry with the vm uuid in the nova-compute log14:29
cdentnova-compute log and n-cpu log are the same thing14:29
cdentthe name depends on how things were installed14:30
ygk_12345cdent: ok then I dont find any vm uuid entries there and all seems to be fine on the hypervisor14:30
cdentygk_12345: then either in the conductor log or the scheduler log there should be some kind of error, after the timestamp of the "claim resources"14:32
cdentgrepping for the vm uuid won't be sufficient, you need to look through the log for an error or warning14:32
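cdent's suggestion, to scan for any error or warning after the "claim resources" timestamp instead of grepping for the VM UUID, might look something like this in Python. It is a sketch that assumes the default oslo.log line layout seen earlier in this log (date, time, pid, level, logger, ...); the function name is hypothetical.

```python
from datetime import datetime

LEVELS = ("WARNING", "ERROR", "CRITICAL")


def problems_after(lines, start):
    """Yield log lines at WARNING or above whose timestamp is >= start."""
    for line in lines:
        fields = line.split(None, 4)
        if len(fields) < 4:
            continue
        try:
            stamp = datetime.strptime(
                " ".join(fields[:2]), "%Y-%m-%d %H:%M:%S.%f")
        except ValueError:
            continue  # traceback or continuation line without a timestamp
        if stamp >= start and fields[3] in LEVELS:
            yield line


if __name__ == "__main__":
    sample = [
        "2019-04-02 14:20:00.000 100 INFO nova.scheduler.utils [-] claim",
        "2019-04-02 14:21:05.123 100 ERROR nova.conductor.manager [-] timeout",
    ]
    for hit in problems_after(sample, datetime(2019, 4, 2, 14, 20)):
        print(hit)
```

Filtering by level and time rather than by UUID is what would surface a line such as the MessagingTimeout that turns up in the conductor log a little later.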
openstackgerritHamdy Khader proposed openstack/nova master: Fix port update of host_id in case of baremetal instance  https://review.openstack.org/64934514:33
ygk_12345cdent: I find this in conductor log "Setting instance to ERROR state.: MessagingTimeout: Timed out waiting for a reply to message ID ef508c01e69841ae9f84356b7463165c"14:35
ygk_12345cdent: Failed to compute_task_build_instances: Timed out waiting for a reply to message ID ef508c01e69841ae9f84356b7463165c: MessagingTimeout: Timed out waiting for a reply to message ID ef508c01e69841ae9f84356b7463165c14:35
sean-k-mooneyygk_12345: well that is the down call to the compute to spawn the instance14:36
sean-k-mooneyi think14:36
cdentygk_12345: okay that's progress (in the sense of useful info). either your messaging bus (rabbitmq) is too busy or your conductor and compute host are unable to talk to one another over that bus14:36
sean-k-mooneyso on the compute you should be able to grep for ef508c01e69841ae9f84356b7463165c14:37
cdentthis might mean the compute node is misconfigured14:37
*** ivenszambrano has joined #openstack-nova14:37
ygk_12345cdent: sean-k-mooney let me check14:37
sean-k-mooneycdent: as in listening to the wrong exchange14:37
cdentygk_12345: I'm sorry but I've got to go, good luck with it14:37
*** cdent has quit IRC14:37
ygk_12345sean-k-mooney: i dont find any log entry on the compute node with that message ID14:38
ygk_12345sean-k-mooney: do u suspect its the rabbitmq issue with compute node ?14:39
sean-k-mooneythat would seem like the next most likely option14:39
ygk_12345sean-k-mooney: how to check that ?14:39
ygk_12345sean-k-mooney: but I dont think we made any changes to rabbitmq14:40
sean-k-mooneyfirst you need to check the amqp settings in your nova.conf on both the conductor and compute node and make sure they are the same14:40
sean-k-mooneythe next thing to do would be to look at the topic queue and see if you can see the pending message14:40
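The first check sean-k-mooney describes, confirming that the conductor and compute node point at the same message bus, could be sketched as follows. This assumes the setting in question is `transport_url` in the `[DEFAULT]` section of each nova.conf and that the files are simple enough for Python's `configparser`; the function names and example URLs are hypothetical.

```python
import configparser


def transport_url(conf_text):
    """Read transport_url from the [DEFAULT] section of nova.conf-style text."""
    # interpolation=None keeps '%' in passwords literal; strict=False
    # tolerates repeated options, which real config files sometimes have.
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(conf_text)
    return parser.defaults().get("transport_url")


def same_transport(conductor_conf, compute_conf):
    """Return True when both configs define the same transport_url."""
    conductor_url = transport_url(conductor_conf)
    compute_url = transport_url(compute_conf)
    return conductor_url is not None and conductor_url == compute_url


if __name__ == "__main__":
    conductor = "[DEFAULT]\ntransport_url = rabbit://nova:pw@10.0.0.1:5672//nova\n"
    compute = "[DEFAULT]\ntransport_url = rabbit://nova:pw@10.0.0.2:5672//nova\n"
    print(same_transport(conductor, compute))
```

A plain string comparison is deliberately strict here; two URLs that list the same cluster members in a different order would be reported as different and need a manual look.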
*** ccamacho has quit IRC14:41
ygk_12345sean-k-mooney: i see that all the rabbit servers are pingable from the compute node14:41
mnaserjaypipes: around right now, ill do the explain14:44
ygk_12345sean-k-mooney: any idea ?14:44
mnaserjaypipes: http://paste.openstack.org/show/748727/14:46
sean-k-mooneyygk_12345: have you logged into rabbitmq to see if the messages are still in the exchange queues14:46
ygk_12345sean-k-mooney: i dont see any queues around when I did list_queues14:47
ygk_12345sean-k-mooney: there r three rabbit containers14:47
sean-k-mooneywell presumably you are using the clustering plugin so from any of them you should see the same view14:48
ygk_12345sean-k-mooney: yes haproxy14:48
sean-k-mooneyhaproxy is not the same thing14:48
ygk_12345sean-k-mooney: i see no error messages in the compute log14:48
sean-k-mooneyhaproxy sits in front of rabbitmq and load-balances across the rabbit instances. but the 3 rabbitmq instances also need to be in a cluster.14:49
ygk_12345sean-k-mooney: yes it is openstack-ansible setup14:49
dansmiththis sounds like it's very not dev discussion? maybe you guys could move to #openstack?14:50
sean-k-mooneyperhaps, although ygk_12345 i think the openstack ansible channel might be able to help more14:51
sean-k-mooneythere is obviously some issue between the compute node and the conductor after the compute node filled up its disk  and it appears to be related to rabbitmq14:52
ygk_12345sean-k-mooney: ok14:53
*** liuyulong has joined #openstack-nova14:53
*** ygk_12345 has quit IRC14:54
mnaserbtw, novnc has broken us and refuses to revert the thing that broke us: https://github.com/novnc/noVNC/pull/122014:56
mnaserso CI currently installs novnc from packaging but anyone using the actual latest novnc will be broken14:56
*** janki has quit IRC14:56
*** sridharg has quit IRC14:56
*** ileixe has quit IRC14:57
*** lpetrut has joined #openstack-nova14:59
dansmithmnaser: so it looks like we need to do something on our end then14:59
jaypipesmnaser: that's what I was afraid of... thanks.15:00
mnaserdansmith: yeah, im not sure how we "do something on our end" because of the design architecture, it doesn't give you a way to pass that info, unless we implement our own vnc.html and store it in repo and serve it.. overlaid on top of the vnc code, it starts to be iffy15:01
mnaserI pinned the novnc release in openstack ansible to avoid this but yeah.15:01
mnaserjaypipes: :< you're welcome15:01
dansmithmnaser: yeah, I'm guessing that is what they're saying, that we should provide our own implementation of the client (html) if we're going to do the auth part15:02
*** cfriesen has joined #openstack-nova15:02
mnaserdansmith: the weird thing is that the auth part, they have their own "token" stuff, including something called token_plugins which you can implement15:02
mnaserbut even if you implement a token plugin, you can't even use novnc without rewriting things.. I don't get it.15:03
dansmithwithout rewriting an html file you mean right?15:04
*** yan0s has quit IRC15:04
dansmithpassing the token in path means we just get the token at the websocket url, right?15:06
dansmiththat'd be the right place to do the auth, so why is that a problem?15:06
*** tbachman_ has joined #openstack-nova15:07
*** tbachman has quit IRC15:08
*** tbachman_ is now known as tbachman15:08
dansmith(looks further) yeah, that's where we're getting the token for our server side websocket15:08
dansmithso I'm not sure why that's not a solution15:09
openstackgerritMatt Riedemann proposed openstack/nova stable/queens: Add functional regression test for bug 1669054  https://review.openstack.org/64936215:10
openstackbug 1669054 in OpenStack Compute (nova) stein "RequestSpec.ignore_hosts from resize is reused in subsequent evacuate" [Medium,In progress] https://launchpad.net/bugs/1669054 - Assigned to Matt Riedemann (mriedem)15:10
openstackgerritMatt Riedemann proposed openstack/nova stable/queens: Do not persist RequestSpec.ignore_hosts  https://review.openstack.org/64936315:10
kashyapefried++15:13
kashyap(Hmm, should probably get a karma bot in here.)15:13
*** lpetrut has quit IRC15:14
*** beagles_dentist is now known as beagles15:15
dansmithno, we shouldn't.15:17
mnaserdansmith: what if we did ?path=token -- thoughts?15:19
mnaserthen we can drop that whole path parsing thing and just grab all query params15:20
dansmithmnaser: you mean path=$token instead of path=token=$token?15:20
mnaseryeah15:20
kashyapdansmith: Was only joking; where is your characteristic sense of humor...15:20
dansmithyou don't want to defeat the ability to actually use a path as part of that and have a proxy in between do you?15:20
dansmithpath=$token seems far more hacky than actually providing a legit url as the path15:21
mnaseryeah, that is valid15:21
mnaserthe only concern I have now is *hopefully* that works for both old and new novnc15:22
* mnaser doesn't have time right now to follow up on a lot of this15:22
dansmithnot sure why that matters, since we control our package versions (and it's up to the distro to match), but as far as I can tell, they didn't change the path behavior15:22
dansmithor claim not to15:22
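[editor's sketch] The two URL shapes discussed above can be handled by one lookup: take `token` from the request's own query string if it is there, otherwise look inside the nested URL carried in the `path` parameter. This is only an illustration of the idea, not nova's websocketproxy code; the function name `extract_token` and the example paths are assumptions.

```python
from urllib.parse import parse_qs, urlparse


def extract_token(request_path):
    """Return the console token from a request path, or None.

    Hypothetical helper: checks ?token=<token> first, then a token nested
    in the 'path' query parameter, e.g. ?path=%3Ftoken%3D<token>.
    """
    query = parse_qs(urlparse(request_path).query)
    token = query.get("token", [None])[0]
    if token is None and "path" in query:
        # parse_qs has already percent-decoded the nested value once,
        # so it can be parsed as a (relative) URL in its own right.
        nested = urlparse(query["path"][0])
        token = parse_qs(nested.query).get("token", [None])[0]
    return token
```

Because the nested value is treated as a real URL, a proxy can still put an actual path in front of the query (`path=websockify%3Ftoken%3D...`), which is the property dansmith wanted to keep.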
*** belmorei_ has joined #openstack-nova15:22
*** belmorei_ has quit IRC15:23
*** eharney_ has quit IRC15:23
mnaserdansmith: nova's CI actually installs novnc from distro pkgs which is probably why this wasn't caught15:25
mnaserand yeah, the path behavior seems to always have been there and continued to be there15:25
dansmithoh? I thought we had an u-c for it15:25
mnasernope, discovered this yesterday15:25
mnaserwe have u-c for Websockify probably15:26
mnaserbut novnc isn't even a python package so yeah15:26
*** helenafm has quit IRC15:26
mnaserdansmith: https://github.com/openstack-dev/devstack/blob/358cc122c3a6d30bf043b3e478790fd2773e9a88/.zuul.yaml#L22015:26
dansmithokay15:27
dansmithyeah I guess that makes sense actually15:27
sean-k-mooneymnaser: oh you're right, we set NOVNC_FROM_PACKAGE=True15:27
openstackgerritMohammed Naser proposed openstack/nova master: wip: start using ?path=%3Ftoken%3D=<token>  https://review.openstack.org/64937215:35
mnaserI don't have time to follow up on this too much but I guess its a start for someone to go through and start a discussion15:35
*** ivve has quit IRC15:35
dansmithseems like melwitt has some interest at least15:36
*** BjoernT has quit IRC15:36
mnaseryeah, so maybe that's the initial copy pasta which will probably fail15:37
* mnaser needs to go back to the fine art of openstack upgrades15:37
cfriesensean-k-mooney: do you know if nova is supposed to handle PCI passthrough for Intel QAT devices?15:39
sean-k-mooneyyes15:39
sean-k-mooneyit has worked since before icehouse15:39
sean-k-mooneyyou can do both pf and vf passthrough15:39
sean-k-mooneyi think intel QAT devices were perhaps the first use case that pci passthrough was enabled for. i know nics came after it15:40
cfriesensean-k-mooney: we're hitting a weird issue where they're hitting the SRIOV_VF clause in LibvirtDriver._get_device_type() and failing in pci_utils.get_ifname_by_pci_address().   Are we configuring something wrong?15:41
sean-k-mooneycfriesen: do you have a physnet set in the pci whitelist for that device15:42
cfriesensean-k-mooney: not sure, can check.  Should it have one?15:42
sean-k-mooneyno15:42
sean-k-mooneythat should only be set on nics15:42
sean-k-mooneyQAT we expect the type of the device to be PCI for both pf and vf in the pcimanager15:43
openstackgerritColleen Murphy proposed openstack/nova stable/stein: Move create of ComputeAPI object in websocketproxy  https://review.openstack.org/64937415:43
openstackgerritColleen Murphy proposed openstack/nova stable/rocky: Move create of ComputeAPI object in websocketproxy  https://review.openstack.org/64937515:43
sean-k-mooneybut it's possible that the code we added for the bandwidth based scheduling is causing issues15:43
*** sapd1_x has quit IRC15:44
sean-k-mooneycfriesen: https://github.com/openstack/nova/blob/master/nova/pci/request.py#L16-L3915:44
cfriesensean-k-mooney: sweet, thanks15:45
sean-k-mooney"product_id": "0443" is the VF and "product_id": "0442" is the pf15:46
*** ccamacho has joined #openstack-nova15:46
sean-k-mooneycfriesen: yep QAT was the thing that introduced pci passthrough https://github.com/openstack/nova/commit/fe67148234dba42468793f33c2ca83ce0616e824 so it really should work still15:47
cfriesensean-k-mooney: I expect it's a config issue.15:48
sean-k-mooneycfriesen: it's possible, but the pci_utils.get_ifname_by_pci_address call was introduced in stein for bandwidth based scheduling so there could be a bug. i don't have qat devices to test with15:49
*** hamzy has quit IRC15:49
sean-k-mooneywell the function existed before, we just use it in more places now15:49
efriedsean-k-mooney: this look kosher to you: https://review.openstack.org/#/c/635533/15:50
efrieddansmith: ^ if you please?15:51
sean-k-mooneyxen is not really my thing but ill take a look15:51
*** krypto has joined #openstack-nova15:51
efriedsean-k-mooney: it's not a xen thing, really an ssl thing.15:52
*** belmoreira has quit IRC15:53
dansmithit's not an ssl thing, it's a python/oslo thing AFAICT15:54
sean-k-mooneyya its processutils exit code checking15:55
sean-k-mooneybut yes on second look it looks sane to me15:55
sean-k-mooneyefried: is there any chance that the password could be logged to standard error if openssl did not like it15:56
openstackgerritEric Fried proposed openstack/nova master: Add minimum value in max_concurrent_live_migrations  https://review.openstack.org/64830215:57
efriedsean-k-mooney: I would seriously hope there is no ssl command in existence that will echo back a password to you :)15:58
efrieddansmith: only peripherally. They're just changing from "anything on stderr means we should fail" to "nonzero return code means we should fail".15:58
dansmith...right15:58
*** BjoernT has joined #openstack-nova15:59
sean-k-mooneyefried: yes16:00
dansmithefried: the patch asserts that before, writing anything to stderr would cause it to raise, even if it exited with zero, right?16:00
dansmithI don't see where in oslo that behavior happens16:00
dansmith(it'd be pretty dumb, which is why I'm curious)16:00
sean-k-mooneydansmith: https://github.com/openstack/oslo.concurrency/blob/master/oslo_concurrency/processutils.py#L414-L42416:01
efrieddansmith: It wasn't in oslo, they were triggering the exception here (in nova) based on stderr being nonempty.16:02
dansmithsean-k-mooney: that's not what I'm asking about16:02
dansmithefried: oh? I thought they were asserting that was oslo...16:02
dansmithoh RuntimeError, I see16:02
sean-k-mooneythey are asserting that if openssl wrote anything to stderr then nova would raise the runtime error16:03
*** ttsiouts has quit IRC16:04
dansmithefried: that makes it much more openssl-related than I thought, I just zeroed in on the process handling.. so I dunno, probably need someone much more familiar with openssl to validate those semantics16:04
sean-k-mooneyi'm guessing that if openssl ever decided to add a deprecation warning or something this caused issues for them16:04
*** ttsiouts has joined #openstack-nova16:04
dansmithright, I definitely get that16:04
sean-k-mooneyit looks like they just want to be a little more graceful and assume openssl follows the standard unix convention that an exit code of 0 means it succeeded16:05
sean-k-mooneywhich i think is reasonable16:05
sean-k-mooneydid the bug have an explicit example16:05
dansmithsean-k-mooney: yes, that's all obvious :)16:06
sean-k-mooneyyes, so the patch is sane; yes to efried's original question16:07
sean-k-mooneythe bug was reported as a deprecation warning as i guessed16:07
sean-k-mooney|RuntimeError: OpenSSL error: *** WARNING : deprecated key derivation used.16:07
dansmiththe original code didn't look at the return code for something security-related and so changing that behavior is potentially pretty impactful16:07
sean-k-mooney|Using -iter or -pbkdf2 would be better.16:07
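[editor's sketch] The behaviour change under review can be shown with two small wrappers: one that fails on any stderr output (the old check) and one that fails only on a nonzero exit code (the new check). This is not the nova patch itself; the function names are made up, and a Python child process stands in for `openssl` so the example runs anywhere.

```python
import subprocess
import sys

# Stand-in for a command that prints a warning to stderr but exits 0,
# like openssl emitting the deprecated-key-derivation warning.
WARN_BUT_SUCCEED = [
    sys.executable, "-c",
    "import sys; sys.stderr.write('*** WARNING : deprecated\\n'); print('ok')",
]


def run_strict(cmd):
    """Old-style check: raise if anything at all was written to stderr."""
    proc = subprocess.run(cmd, capture_output=True, text=True)
    if proc.stderr:
        raise RuntimeError("OpenSSL error: %s" % proc.stderr)
    return proc.stdout


def run_by_exit_code(cmd):
    """New-style check: raise only when the exit code is nonzero."""
    proc = subprocess.run(cmd, capture_output=True, text=True)
    if proc.returncode != 0:
        raise RuntimeError(
            "command failed (%d): %s" % (proc.returncode, proc.stderr))
    return proc.stdout
```

With `WARN_BUT_SUCCEED`, `run_strict` raises while `run_by_exit_code` returns the output. That is exactly the gap dansmith is worried about: the second form trusts the tool to exit nonzero on every real failure.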
*** ttsiouts has quit IRC16:08
sean-k-mooneythat is possibly true, yes. in this specific instance of the RuntimeError that was raised i think it's safe, but you are concerned that there could be other cases where it would not be16:08
*** ttsiouts has joined #openstack-nova16:08
*** tssurya has quit IRC16:09
dansmithI'm saying they used to fail if anything was written to stderr and now they won't16:09
sean-k-mooneyi would hope openssl would not exit with code 0 for anything other than success, but i don't know that for certain even if i believe it very likely16:09
dansmithand depending on what is going on here, that could be, you know, a big deal16:09
sean-k-mooneyyes16:09
dansmithdepends on the command and what is going on16:09
sean-k-mooneywell we can see the command it fixed16:10
dansmithI'm saying someone needs to go make that determination, IMHO and not just blindly approve this16:10
sean-k-mooneywe are encrypting input text with a shared key using aes-128-cbc16:10
sean-k-mooneyfair im still wondering why we are doing this via the shell in the first place16:11
*** imacdonn has joined #openstack-nova16:12
*** wolverineav has joined #openstack-nova16:19
openstackgerritMatt Riedemann proposed openstack/nova stable/pike: Fix functional tests for USE_NEUTRON  https://review.openstack.org/64938516:22
openstackgerritMatt Riedemann proposed openstack/nova stable/pike: Add functional regression test for bug 1669054  https://review.openstack.org/64938616:22
openstackbug 1669054 in OpenStack Compute (nova) stein "RequestSpec.ignore_hosts from resize is reused in subsequent evacuate" [Medium,In progress] https://launchpad.net/bugs/1669054 - Assigned to Matt Riedemann (mriedem)16:22
openstackgerritMatt Riedemann proposed openstack/nova stable/pike: Do not persist RequestSpec.ignore_hosts  https://review.openstack.org/64938716:22
openstackgerritMerged openstack/nova master: Correct lower-constraints.txt and the related tox job  https://review.openstack.org/62297216:23
openstackgerritMerged openstack/nova master: Adding tests to demonstrate bug #1821824  https://review.openstack.org/64795716:23
openstackbug 1821824 in OpenStack Compute (nova) "Forbidden traits in flavor properties don't work" [Undecided,In progress] https://launchpad.net/bugs/1821824 - Assigned to Magnus Bergman (magnusbe)16:23
dansmithlooks like dimitri did a fairly decent analysis of the openssl code, which is good, but I also wonder if we couldn't just ignore lines that start with "WARNING" or something from the stderr and retain some of the original behavior16:24
*** rpittau is now known as rpittau|afk16:24
dansmithanyway, I don't really have time to dig into it super deep16:24
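[editor's sketch] The alternative floated above, dropping warning lines from stderr and still failing on anything else, might look like the filter below. The function name and the choice to strip leading `*` characters are assumptions made for illustration only.

```python
def significant_stderr(stderr):
    """Return stderr lines that are neither blank nor WARNING-prefixed.

    Hypothetical helper: leading '*' and spaces are stripped before the
    prefix test so that '*** WARNING : ...' counts as a warning line.
    """
    kept = []
    for line in stderr.splitlines():
        if not line.strip():
            continue
        if line.lstrip("* ").startswith("WARNING"):
            continue
        kept.append(line)
    return kept
```

One caveat: the warning quoted earlier in the log spans two lines, and the second ("Using -iter or -pbkdf2 would be better.") has no WARNING prefix, so a line-prefix filter alone would still treat that message as an error.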
*** dtantsur is now known as dtantsur|afk16:24
openstackgerritMerged openstack/nova master: Add placement as required project to functional py36 and 37  https://review.openstack.org/64906816:25
*** wolverineav has quit IRC16:25
*** ccamacho has quit IRC16:32
openstackgerritMerged openstack/nova master: libvirt: Use 'writeback' QEMU cache mode when 'none' is not viable  https://review.openstack.org/64198116:34
*** ircuser-1 has joined #openstack-nova16:35
*** BjoernT has quit IRC16:35
cfriesensean-k-mooney: apparently nova-compute chokes on startup with this particular QAT hardware even with no whitelist/alias entries in nova.conf.16:35
*** _alastor_ has joined #openstack-nova16:37
*** BjoernT has joined #openstack-nova16:37
*** _alastor_ has quit IRC16:39
*** _alastor_ has joined #openstack-nova16:39
sean-k-mooneycfriesen: ok then this is likely related to gibi_off's change to auto lookup the netdev name for bandwidth based scheduling16:41
cfriesensean-k-mooney: yeah, confirmed that we're dying in the code gibi_off added in Dec 2018.16:42
cfriesensean-k-mooney: we're hitting this with the standard embedded Intel QAT, so it's going to cause grief with standard hardware16:44
sean-k-mooneyit's probably this bit https://github.com/openstack/nova/commit/c02e213d507c830427a86d6a4bb4f7a2f5158590#diff-f4019782d93a196a0d026479e6aa61b1R593816:45
cfriesensean-k-mooney: the issue is that there is no "net" in the device path (i.e. /sys/bus/pci/devices/<pci_addr>/net)16:45
sean-k-mooneyya16:46
sean-k-mooneyso https://github.com/openstack/nova/blob/c02e213d507c830427a86d6a4bb4f7a2f5158590/nova/virt/libvirt/driver.py#L5938-L594016:46
sean-k-mooneyshould only be executed for VF that are network devices16:46
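[editor's sketch] The distinction cfriesen describes, a VF with no `net` directory under its sysfs device path, can be tested directly. The helper name `has_netdev` and the overridable `sysfs_root` parameter are assumptions for illustration; nova's own lookup lives in `nova/pci/utils.py`.

```python
import os


def has_netdev(pci_addr, sysfs_root="/sys/bus/pci/devices"):
    """Return True if the PCI device exposes a 'net' directory in sysfs.

    NIC functions have /sys/bus/pci/devices/<pci_addr>/net/<ifname>;
    non-NIC devices such as QAT VFs have no 'net' directory at all.
    """
    return os.path.isdir(os.path.join(sysfs_root, pci_addr, "net"))
```

A caller could use a check like this to skip the interface-name lookup for non-NIC VFs rather than letting the lookup raise.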
*** davidsha has quit IRC16:47
sean-k-mooneythat should be a simple fix16:47
sean-k-mooneybut we will need to land it in RC2 or backport to stein before thursday to include it in the GA release16:47
sean-k-mooneycfriesen: we are expecting qat to hit the final return however16:49
sean-k-mooneyso there is something else going on16:49
cfriesensean-k-mooney: there are VFs for this device, so I was assuming we're enumerating the VFs16:50
*** BjoernT has quit IRC16:50
sean-k-mooneyya but we are only meant to report it as type SRIOV_VF if it's a nic16:50
sean-k-mooneyall non-nic VFs are meant to be TYPE_PCI16:51
cfriesensean-k-mooney: where is that code?16:51
sean-k-mooneyi'm looking for it now, but it's the only thing that prevented you getting a qat device instead of a nic VF when you have a neutron port of vnic_type direct in the past16:53
*** dikonoor has quit IRC16:55
*** amodi has quit IRC16:58
cfriesensean-k-mooney: it kind of looks like _get_pcidev_info() is calling self._host.device_lookup_by_name() to get the XML for the device.  Is it possible libvirt is doing something different?17:01
openstackgerritArtem Vasilyev proposed openstack/nova master: systemd detection result caching nit fixes  https://review.openstack.org/64922917:03
sean-k-mooneyis that where you are having the failure?17:03
sean-k-mooneycfriesen: can you post a copy of the error to paste.openstack.org17:04
cfriesensean-k-mooney: yeah, nova-compute startup.   here's the starlingx bug, the nova stuff is partway down: https://bugs.launchpad.net/starlingx/+bug/182193817:04
openstackLaunchpad bug 1821938 in StarlingX "No nova hypervisor can be enabled on workers with QAT devices" [High,Triaged]17:05
*** hamzy has joined #openstack-nova17:05
sean-k-mooneycfriesen: thanks17:05
*** ttsiouts has quit IRC17:05
cfriesensean-k-mooney: extra info: http://paste.openstack.org/show/748734/17:05
*** ttsiouts has joined #openstack-nova17:06
sean-k-mooneyya so this is not failing because of https://github.com/openstack/nova/blob/master/nova/virt/libvirt/driver.py#L606717:06
cfriesenwe don't have the resources to fix this in the near future, got other stuff to deal with17:06
sean-k-mooneyits failing because of https://github.com/openstack/nova/blob/master/nova/virt/libvirt/driver.py#L604717:06
sean-k-mooneybecause it does not have a netdev17:06
cfriesensean-k-mooney: correct, but you were wondering why we were going down the VF path for things that weren't nics17:07
sean-k-mooneywhich is because of gibis change17:07
sean-k-mooneyya but the other filtering i'm thinking of could be elsewhere17:07
sean-k-mooneyi think we just need to put a guard around that call17:07
cfriesensean-k-mooney: looks like _get_device_capabilities() also assumes that SRIOV_VF  is a NIC17:08
sean-k-mooneyyes it does17:09
sean-k-mooneyalthough it is reading from libvirt17:09
sean-k-mooneyinstead of sysfs17:09
sean-k-mooneyso it's probably fine17:09
*** dpawlik has joined #openstack-nova17:09
cfriesensean-k-mooney: _get_pcinet_info calls get_net_name_by_vf_pci_address()17:10
cfriesenso I think it'll choke17:10
*** ttsiouts has quit IRC17:10
sean-k-mooneyill quickly hack something up one sec17:10
*** eharney has joined #openstack-nova17:11
sean-k-mooneycfriesen: https://github.com/openstack/nova/blob/master/nova/virt/libvirt/driver.py#L6008-L6010 i think will guard it for that case17:12
sean-k-mooneyactually no it won't17:12
*** ralonsoh has quit IRC17:12
sean-k-mooneyactually it should be fine17:14
sean-k-mooneyhttps://github.com/openstack/nova/blob/master/nova/pci/utils.py#L20517:14
sean-k-mooneythe exception is caught internally17:14
cfriesenah, yes17:14
sean-k-mooneyam is that the case in the starlingx code?17:15
cfriesenshould be, but we were choking earlier in _get_device_type()17:15
*** dpawlik has quit IRC17:15
sean-k-mooneyoh i see the issue17:16
sean-k-mooneywe are doing17:16
sean-k-mooney                        'parent_ifname':17:16
sean-k-mooney                            pci_utils.get_ifname_by_pci_address(17:16
sean-k-mooneypci_address, pf_interface=True),17:16
sean-k-mooneyso we are calling pci_utils.get_ifname_by_pci_address directly so we don't catch the exception17:17
cfriesenyep17:17
sean-k-mooneywhereas the other code calls get_net_name_by_vf_pci_address which does17:17
*** ricolin has quit IRC17:17
sean-k-mooneyok this is a simple fix, i'll do it now and mark it as closing the bug17:17
sean-k-mooneyefried: i'm going to try and fix https://bugs.launchpad.net/starlingx/+bug/1821938 can we land it in stein?17:18
openstackLaunchpad bug 1821938 in StarlingX "No nova hypervisor can be enabled on workers with QAT devices" [High,Triaged]17:18
cfriesenI think one of my coworkers is going to open a nova bug17:18
sean-k-mooneyok ill add the starlingx bug as a related bug so17:18
sean-k-mooneyor do ye want to fix it17:18
cfriesengo for it17:19
*** erlon has joined #openstack-nova17:20
cfriesenI think my guy is having lunch. :)  I'll send you the nova bug when I get the number.17:20
*** artom has quit IRC17:22
stephenfinmriedem: https://review.openstack.org/62694917:24
* stephenfin -> home17:24
openstackgerritJared Winborne proposed openstack/nova master: Leave the brackets on Ceph Monitor IPv6 addresses for libguestfs  https://review.openstack.org/64940517:26
*** KH-Jared has joined #openstack-nova17:28
*** jmlowe has joined #openstack-nova17:28
KH-JaredI fully expect my change didn't follow some proper practice on the change I just submitted, guess I just have to wait and find out what that is at this point17:31
*** wolverineav has joined #openstack-nova17:34
*** amodi has joined #openstack-nova17:36
openstackgerritsean mooney proposed openstack/nova master: gracefuly handel none nic VFs  https://review.openstack.org/64940917:37
*** eharney has quit IRC17:37
sean-k-mooneycfriesen: ^17:37
*** jmlowe has quit IRC17:37
mriedemKH-Jared: commented17:37
sean-k-mooneyi need to run the unit tests and see if any fail and also add a new one17:38
KH-Jaredty mriedem17:39
sean-k-mooneymriedem: not sure if you were following but ^ is a fix for https://bugs.launchpad.net/starlingx/+bug/182193817:39
openstackLaunchpad bug 1821938 in StarlingX "No nova hypervisor can be enabled on workers with QAT devices" [High,Triaged]17:39
mriedemsean-k-mooney: i wasn't17:39
sean-k-mooneythink we could land it in stein if i get it ready soon17:39
mriedemidk wtf a qat device is17:39
sean-k-mooneyintel's QuickAssist crypto card17:40
mriedemlooks like something gibi_off should be aware of17:40
sean-k-mooneyit was the first pci passthrough device we supported in nova17:40
sean-k-mooneymriedem: ya so gibi_off added the auto parent interface name lookup feature i suggested for the bandwidth based scheduling17:41
sean-k-mooneybut we missed that it should have handled VFs that were not nics17:41
sean-k-mooneyso it raises an exception if you have a pci device that supports sriov but is not a nic17:41
sean-k-mooneylike a QAT device or a GPU that uses sriov like amd do17:42
mriedemwell i don't know about all that wackiness but i know you need a test and you could avoid blanket ignoring Exception if you changed to handle PciDeviceNotFoundById17:43
mriedemand add a comment about why it's ok to ignore17:43
mriedemrun a spellchecker on your commit message as well :)17:44
sean-k-mooneyyep ill do all of the above. i was copying what we do here more or less https://github.com/openstack/nova/blame/2384c41b781a84de98d0932f44d4b3c544c3fe3d/nova/pci/utils.py#L20517:44
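[editor's sketch] The shape of the fix being reviewed here, catching the specific not-found exception around the direct lookup instead of a blanket `Exception`, with a comment explaining why it is safe to ignore, could look roughly like this. It is not the actual patch: the exception class below is a local stand-in for nova's `PciDeviceNotFoundById`, and `lookup_parent_ifname` and its `get_ifname` argument are hypothetical.

```python
class PciDeviceNotFoundById(Exception):
    """Local stand-in for the nova exception of the same name."""


def lookup_parent_ifname(pci_address, get_ifname):
    """Return the parent interface name for a VF, or None for non-NIC VFs.

    get_ifname is any callable with the (pci_address, pf_interface=...)
    calling convention quoted in the log; it is injected so the sketch
    stays self-contained.
    """
    try:
        return get_ifname(pci_address, pf_interface=True)
    except PciDeviceNotFoundById:
        # Non-NIC VFs (for example QAT) have no netdev, so there is no
        # parent interface name to report. That is expected, not an error.
        return None
```

Returning `None` from inside the helper also covers cfriesen's later suggestion of doing the assignment inside the `try` block rather than pre-assigning and testing afterwards.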
*** tbachman has quit IRC17:45
mriedemyou also need a nova bug for that and tag it with stein-rc-potential17:47
mriedemand inform the PTL17:47
sean-k-mooneyyes cfriesen or one of his coworkers is filing the bug but i guess i can just do that, and i pinged efried but i think he is away or having lunch17:48
*** BjoernT has joined #openstack-nova17:48
sean-k-mooneyor ignoring me that is valid too17:48
mriedemhe's out cracking skulls over lunch break i'm sure17:48
*** MasterofJOKers has quit IRC17:48
*** MasterofJOKers has joined #openstack-nova17:49
sean-k-mooneycool, i'll get all of this done in the next hour or so. i'm assuming this qualifies for the rc/stein release, by the way17:49
dansmiththis is stein ptl anyway right?17:49
sean-k-mooneyoh that would be melwitt then17:50
efriedI'm back17:50
efriedsean-k-mooney: I'm not stable anyway17:50
efriedtake that however you like17:50
dansmithnice17:50
sean-k-mooneyefried: cfriesen/starlingx noticed you can't start nova compute on hosts with qat integrated into the cpu/chipset17:51
efriedclearly we just need to get rid of the pci subsystem17:51
sean-k-mooneyclearly. the fix is trivial and i'm cleaning it up now. i'll ping people when it's all ready17:52
KH-Jaredtrying to make sure I'm handling my change in the best way. I saw two options for making the addresses happy for libguestfs without changing how they were provided to libvirt: make stripping the brackets optional or add them back if it looked like IPv6. Adding them back seemed easier but improper, since it would be removing and adding the brackets for no purpose, so I was going to go with trying to leave the brackets,17:52
KH-Jaredoptionally17:52
dansmithsean-k-mooney: and you're going to spell check the snot out of it right?17:52
dansmithsean-k-mooney: maybe two or three times just to be sure?17:52
sean-k-mooneyyes :)17:52
*** ivenszambrano has quit IRC17:54
efriedsean-k-mooney: So you want that bug assigned to you?17:54
*** tbachman has joined #openstack-nova17:54
sean-k-mooneywell i or cfriesen will file a nova one and ya it can be assigned to me17:54
sean-k-mooneythe current bug is against starlingx17:55
*** psachin has quit IRC17:55
efriedsean-k-mooney: You can add 'affects' to the same bug, nah?17:55
mordredmriedem: https://review.openstack.org/626949 - patch to osc regarding live migration and arguments17:56
efriedI guess the nova fix has to be ported to the stx fork?17:56
*** hongbin has quit IRC17:57
*** hongbin has joined #openstack-nova17:57
mriedemmordred: yeah i added you since i know you're at least aware of there being a few changes for that same issue17:57
sean-k-mooneyso i was thinking about just adding nova as a component of the same bug but im not sure how that works with the rc potential tag17:57
mriedemmordred: L18 https://etherpad.openstack.org/p/DEN-osc-compute-api-gaps17:57
*** BjoernT has quit IRC17:58
*** bbowen_ has joined #openstack-nova18:00
openstackgerritArtem Vasilyev proposed openstack/nova master: systemd detection result caching nit fixes  https://review.openstack.org/64922918:01
*** eharney has joined #openstack-nova18:02
*** jonspw has joined #openstack-nova18:02
mriedemsomeone tell me how this makes sense:18:02
mriedempike: https://github.com/openstack/nova/blame/1c6f99dc9aacaea78242561df35957bb711c6161/nova/tests/unit/conductor/test_conductor.py#L136018:02
mriedemocata: https://github.com/openstack/nova/blame/9b76fc7a0a1afbd9f2cd0d5786c37138c1b820f1/nova/tests/unit/conductor/test_conductor.py#L137718:02
mriedemthe code is different, but the commit on the left in the blame is the same18:02
*** bbowen__ has quit IRC18:02
dansmithcode looks the same to me18:03
dansmithwait, I opened the same thing twice :D18:03
mriedemheh18:04
mriedemoh i know,18:04
mriedemsomething just removed code which is why it's not showing up in the pike blame18:04
dansmithyeah, looks like it18:05
openstackgerritArtem Vasilyev proposed openstack/nova master: systemd detection result caching nit fixes  https://review.openstack.org/64922918:05
dansmithrequest_spec={} et al18:05
mriedemyeah18:05
*** wolverineav has quit IRC18:06
mriedembingo18:07
mriedemhttps://github.com/openstack/nova/commit/e211fca55a11c80058d5d78e31dc3ad466d7edfd#diff-df5e04ccff7072ded89c488a5649639e18:07
mriedemhttps://www.youtube.com/watch?v=YqAyz1coj4418:08
dansmithheh18:08
*** cdent has joined #openstack-nova18:09
*** BjoernT has joined #openstack-nova18:11
*** jmlowe has joined #openstack-nova18:11
*** wolverineav has joined #openstack-nova18:14
*** wolverineav has quit IRC18:14
*** artom has joined #openstack-nova18:14
openstackgerritArtom Lifshitz proposed openstack/nova master: Docs: emulator threads: clarify expected behavior  https://review.openstack.org/64941618:14
artomstephenfin, sean-k-mooney ^^ from this morning's downstream discussion18:15
sean-k-mooneyoh the Horror https://review.openstack.org/#/c/649416/1/doc/source/user/flavors.rst@57618:15
openstackgerritMerged openstack/nova master: Do not persist RequestSpec.ignore_hosts  https://review.openstack.org/64751218:16
sean-k-mooneyfirst glance ignoring the space it looks fine18:17
sean-k-mooneyill wait for the docs job to finish18:17
*** wolverineav has joined #openstack-nova18:20
*** BjoernT_ has joined #openstack-nova18:20
*** BjoernT has quit IRC18:22
openstackgerritMatt Riedemann proposed openstack/nova stable/ocata: Add functional regression test for bug 1669054  https://review.openstack.org/64941918:25
openstackbug 1669054 in OpenStack Compute (nova) stein "RequestSpec.ignore_hosts from resize is reused in subsequent evacuate" [Medium,In progress] https://launchpad.net/bugs/1669054 - Assigned to Matt Riedemann (mriedem)18:25
openstackgerritMatt Riedemann proposed openstack/nova stable/ocata: Do not persist RequestSpec.ignore_hosts  https://review.openstack.org/64942018:25
*** BjoernT_ has quit IRC18:26
*** tbachman has quit IRC18:27
*** BjoernT has joined #openstack-nova18:27
*** cdent has quit IRC18:30
openstackgerritArtem Vasilyev proposed openstack/nova master: systemd detection result caching nit fixes  https://review.openstack.org/64922918:31
*** spsurya has quit IRC18:32
*** tbachman has joined #openstack-nova18:33
openstackgerritMatt Riedemann proposed openstack/nova stable/stein: Error out migration when confirm_resize fails  https://review.openstack.org/64942118:33
openstackgerritsean mooney proposed openstack/nova master: gracefully handle non-nic VFs  https://review.openstack.org/64940918:35
*** tbachman_ has joined #openstack-nova18:36
*** bbowen__ has joined #openstack-nova18:37
*** tbachman has quit IRC18:37
*** tbachman_ is now known as tbachman18:37
*** mdbooth has joined #openstack-nova18:39
*** bbowen_ has quit IRC18:39
*** mdbooth_ has quit IRC18:42
openstackgerritEric Fried proposed openstack/nova master: docs: Rework all things metadata'y  https://review.openstack.org/64073018:47
*** wolverineav has quit IRC18:49
*** wolverineav has joined #openstack-nova18:51
*** tbachman has quit IRC18:53
*** wolverineav has quit IRC18:55
*** artom has quit IRC18:58
*** dpawlik has joined #openstack-nova18:58
*** wolverineav has joined #openstack-nova18:59
*** dpawlik has quit IRC19:02
*** wolverineav has quit IRC19:02
*** wolverineav has joined #openstack-nova19:03
*** gmann is now known as gmann_afk19:04
*** tesseract has quit IRC19:04
openstackgerritsean mooney proposed openstack/nova master: Libvirt: gracefully handle non-nic VFs  https://review.openstack.org/64940919:07
sean-k-mooneymelwitt: efried i think ^ is now correct?19:08
*** wolverineav has quit IRC19:08
sean-k-mooneyi have added nova to the existing bug and added the stein-rc-potential tag but if you would like a new bug filed i can do that too.19:09
sean-k-mooneyim going to have dinner but if you want any changes let me know19:10
efriedsean-k-mooney: assuming stx doesn't care about our stein-rc-potential tag, I think this should be fine.19:10
efriedThanks sean-k-mooney19:10
sean-k-mooneycfriesen: any comment on ^19:10
cfriesenlooking19:11
sean-k-mooneyi think it should be ok but we should discuss it at the ptg too. e.g. how to track cross project bugs like this between openstack and starlingx19:11
*** owalsh_ has joined #openstack-nova19:15
*** owalsh has quit IRC19:16
cfriesensean-k-mooney: looks reasonable, but why not put the result['parent_ifname'] assignment in the try block and get rid of the initial assignment to None and the conditional?19:16
openstackgerritGhanshyam Mann proposed openstack/nova-specs master: Spec for API policy updates  https://review.openstack.org/54785019:18
cfriesenefried: one of our people in charge thinks adding nova (and the tag) to that bug is a good solution for tracking joint issues19:21
efriedwfm. mriedem may have a stronger opinion.19:21
cfriesen(i.e. StarlingX, just to be clear)19:21
*** erlon_ has quit IRC19:22
efriedcfriesen: what's not clear to me is whether anything needs to be done on the stx side at all.19:22
cfriesenon our end it'll likely just be picking up a new load once it's fixed in nova and validating the issue is gone19:22
sean-k-mooneycfriesen: ya i think from a source code point of view it will be a cherry-pick or rebase19:25
sean-k-mooneythere is obviously the testing aspect too.19:25
cfriesensean-k-mooney: with your fix we're seeing logs every minute: nova.pci.utils [req-d9c8620e-7990-4b3c-a6b0-88a131852e47 - - - - -] No net device was found for VF 0000:3d:02.2: PciDeviceNotFoundById: PCI device 0000:3d:02.2 not found19:26
*** BjoernT has quit IRC19:27
sean-k-mooneythat is from the other capabilities function19:27
sean-k-mooneynot adding a second message every minute was why i added the pass in the except block instead of logging19:27
cfriesenmight make sense to quiet that down, but that's not quite so urgent.19:29
sean-k-mooneycfriesen: that comes from here https://github.com/openstack/nova/blob/master/nova/pci/utils.py#L224-L22519:29
cfriesenit's 48 logs every minute19:29
*** bbobrov has quit IRC19:30
sean-k-mooneycfriesen: ya that has been there for 2 or 3 releases19:30
*** awaugama has quit IRC19:30
sean-k-mooneyim going to work on a followup to fix some comments i notice i can remove that warning too.19:30
sean-k-mooneyor make it a debug message19:30
cfriesendebug might be good19:30
sean-k-mooneythere is nothing that an operator can do to silence it if they have non-nic VFs currently, and it's not helpful in that case19:31
cfriesenagreed19:31
*** bbobrov has joined #openstack-nova19:31
sean-k-mooneyanyway the fact that your nova-compute agent didnt die means the fix is at least minimally working.19:32
sean-k-mooneyi might look at optimising this code a bit too. currently we call _get_pcidev_info from _get_pci_passthrough_devices before applying the pci whitelist so we look at way more devices than we need to19:35
*** dpawlik has joined #openstack-nova19:43
*** wolverineav has joined #openstack-nova19:44
*** jmlowe has quit IRC19:47
*** dpawlik has quit IRC19:47
*** dpawlik has joined #openstack-nova19:48
*** wolverineav has quit IRC19:49
openstackgerritMatt Riedemann proposed openstack/nova master: Fix comment in test_attach_with_multiattach_fails_not_available  https://review.openstack.org/64944019:50
openstackgerritMerged openstack/nova master: Fix a deprecation warning  https://review.openstack.org/64923419:52
*** dpawlik has quit IRC19:53
efriedmriedem: How about this one: http://logstash.openstack.org/#dashboard/file/logstash.json?query=message%3A%5C%22ModuleNotFoundError%3A%20No%20module%20named%20'memcache'%5C%2219:54
efriedSeems to hit several different jobs, but always grenade19:54
efriedmriedem: https://review.openstack.org/#/c/649096/ ?19:56
mriedemefried: known issue19:57
mriedemyeah related to that bug19:58
mriedemi need to update the e-r query, or you can19:58
mriedemto include build_name:"neutron-grenade-multinode" OR build_name:"neutron-grenade-dvr-multinode"19:58
efriedmriedem: there's also grenade-py319:58
efriedlet me try19:58
mriedemthe query is already restricted to just grenade-py319:58
mriedemso need message:"..." AND tags:"screen" AND (build_name:"grenade-py3" OR build_name:"..."...)19:59
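The query shape mriedem describes can be sketched in Python. This is only an illustration of how the clauses combine: the helper name `build_er_query` is hypothetical, and the message text is a placeholder because the real query's message is elided ("...") in the discussion.

```python
def build_er_query(message, tag, build_names):
    """Combine a message match, a tag and several job names into one query string."""
    # Each job name becomes its own build_name clause, joined with OR.
    jobs = " OR ".join('build_name:"%s"' % name for name in build_names)
    # The OR group is parenthesised so it binds as a single AND term.
    return 'message:"%s" AND tags:"%s" AND (%s)' % (message, tag, jobs)


# Placeholder message (assumption); the job names are the ones mentioned above.
query = build_er_query(
    "ModuleNotFoundError",
    "screen",
    ["grenade-py3", "neutron-grenade-multinode", "neutron-grenade-dvr-multinode"],
)
print(query)
```

Keeping the job names in a list means adding another grenade job to the query is a one-item change.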
efriedk, hadn't pulled it up yet.19:59
mriedemmelwitt: dansmith: duh duh duh https://bugs.launchpad.net/devstack/+bug/182287319:59
openstackLaunchpad bug 1822873 in devstack "stack fails if NOVA_NUM_CELLS > 1 and n-novnc enabled" [Undecided,New]19:59
efriedthe bug says the module not found is etcd19:59
mriedemefried: it's a whole bunch of packages,19:59
*** wolverineav has joined #openstack-nova19:59
mriedemi spent half of yesterday digging into that19:59
dansmithmriedem: duh duh duhntcare20:00
efriedokay, I see the message is just ModuleNotFound, cool.20:00
mriedembut it's also just busted networks getting to pypi20:00
mriedemefried: just need to update the existing query for http://status.openstack.org/elastic-recheck/#182089220:00
efriedmriedem: on it20:00
*** markmcclain has quit IRC20:01
* mriedem gives more money to mnaser to create a new vm to stack w/o n-novnc20:01
*** wolverineav has quit IRC20:01
*** wolverineav has joined #openstack-nova20:01
mnasermriedem: if you launch it in sjc1, you'd be launching it against stein :)20:02
*** jmlowe has joined #openstack-nova20:03
*** bbowen__ has quit IRC20:04
efriedmriedem: Okay, so would there have been a better way for me to determine that this was 1820892 ?20:06
*** xek has quit IRC20:08
*** BjoernT has joined #openstack-nova20:12
*** igordc has joined #openstack-nova20:15
melwittmriedem: dammit novnc20:17
*** artom has joined #openstack-nova20:17
openstackgerritArtom Lifshitz proposed openstack/nova master: Docs: emulator threads: clarify expected behavior  https://review.openstack.org/64941620:17
*** markvoelker has quit IRC20:23
eanderssonDoes anyone recall the reason why the functionality to search based on metadata was removed from the api?20:29
eandersson> search_options["metadata"] = '{"my_key" : "bla" }'20:29
eanderssonhttps://review.openstack.org/#/c/408571/20:29
eanderssonAlso, is the alternative to use tags?20:29
*** pcaruana has quit IRC20:30
*** whoami-rajat has quit IRC20:30
*** hamzy has quit IRC20:41
efriedeandersson: I have no idea, but did you look at the spec? http://specs.openstack.org/openstack/nova-specs/specs/ocata/implemented/add-whitelist-for-server-list-filter-sort-parameters.html20:43
efriedThat appears to answer the first question. alex_xu could probably answer the second.20:44
mriedemmnaser: i only see ca-ymq-220:45
mnaserhuh?  sjc1 is a separate region20:45
mnaserOS_REGION_NAME=sjc120:46
eanderssonI can't find any good documentation for tags, but maybe I just suck at googling :D20:46
eanderssonnova cli has things like > nova server-tag-add20:48
eanderssonbut openstackcli has no mention of tags20:48
eandersson> https://docs.openstack.org/python-openstackclient/latest/cli/command-objects/server.html#server-create20:49
mriedemmnaser: oh i thought it was a zone20:49
mriedemeandersson: openstack server set i think20:49
mriedemhttps://docs.openstack.org/python-openstackclient/latest/cli/command-objects/server.html#server-set20:49
mriedemoh nvm20:49
mriedemno tags support there either yet20:49
mriedemconsulting https://etherpad.openstack.org/p/compute-api-microversion-gap-in-osc20:49
mriedemhttps://review.openstack.org/#/c/569386/20:50
eanderssonThat explains it20:50
mriedemeandersson: if you're going to be in denver https://www.openstack.org/summit/denver-2019/summit-schedule/events/23665/closing-compute-api-feature-gaps-in-the-openstack-cli20:51
eanderssonI will20:52
mriedemi've started the etherpad for that session https://etherpad.openstack.org/p/DEN-osc-compute-api-gaps20:52
*** dpawlik has joined #openstack-nova20:54
*** dpawlik has quit IRC20:58
*** amodi has quit IRC21:01
*** ceryx has joined #openstack-nova21:03
*** priteau has joined #openstack-nova21:04
*** erlon has quit IRC21:06
*** slaweq has quit IRC21:07
NewBrucehej mriedem - we look to have stumbled on a bug in Neutron's Port Binding API, in a particular area of the code which you will probably be more intimately familiar with than I21:09
NewBrucehttps://bugs.launchpad.net/nova/+bug/1822884 (cc sean-k-mooney)21:10
openstackLaunchpad bug 1822884 in OpenStack Compute (nova) "live migration fails due to port binding duplicate key entry in post_live_migrate" [Undecided,New]21:10
openstackgerritmelanie witt proposed openstack/nova stable/stein: Add doc on VGPU allocs and inventories for nrp  https://review.openstack.org/64945421:12
NewBruceif you have a minute, id love your opinion - im trying to decide how much more debugging is worth before we move over to cold migrations21:12
*** bbowen__ has joined #openstack-nova21:14
*** wolverineav has quit IRC21:16
openstackgerritArtom Lifshitz proposed openstack/nova master: Docs: emulator threads: clarify expected behavior  https://review.openstack.org/64941621:16
mriedemNewBruce: i don't know what the differences are between RDO and OSA which would cause issues here21:17
mriedemas you said, "This is unexpected, as even in the RDO to RDO case, both nodes are Rocky and so the new process should be in use."21:18
NewBrucehrmm, no - that seems to be our sticking point - i was chatting with sean-k-mooney and also mnaser on this21:18
mriedemare you sure you're not hitting a case where one node is really queens?21:18
*** wolverineav has joined #openstack-nova21:18
*** wolverineav has quit IRC21:18
*** wolverineav has joined #openstack-nova21:19
NewBruceyeah, absolutely sure - as I am using a single test source node which is absolutely Rocky as well as the target21:19
mriedemis 'Port Bindings Extended' showing up in the neutron api extension list in both the rdo and osa case?21:21
mriedemif rdo is rocky, i don't know why rdo->rdo would use the old flow21:22
mriedemunless the 'Port Bindings Extended' neutron extension is not showing up21:22
NewBrucei can confirm that on the target it is enabled21:23
mriedemthe live migration task should only do the new flow with the 2nd deactivated port binding if (1) the neutron 'Port Bindings Extended' API extension is available, and (2) both the source and dest compute service versions are >= 3721:23
mriedem*3521:23
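The two conditions mriedem lists can be written as a small predicate. This is a sketch of the rule as stated in the discussion, not nova's actual code: the function name and the constant are hypothetical, and 35 is the corrected threshold given above.

```python
# Minimum nova-compute service version quoted in the discussion (illustrative).
MIN_COMPUTE_VERSION = 35


def uses_new_port_binding_flow(extension_available, source_version, dest_version):
    """Return True only when both conditions for the new flow hold.

    (1) the neutron 'Port Bindings Extended' API extension is available, and
    (2) both source and dest compute service versions are >= the minimum.
    """
    return (
        extension_available
        and source_version >= MIN_COMPUTE_VERSION
        and dest_version >= MIN_COMPUTE_VERSION
    )


# Extension present and both computes at 35: new flow expected.
print(uses_new_port_binding_flow(True, 35, 35))
# One compute still at service version 30: old flow expected.
print(uses_new_port_binding_flow(True, 30, 35))
```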
*** slaweq has joined #openstack-nova21:24
eanderssonmriedem, solution was to just list all VMs in the project and then just filter on metadata :p21:24
*** ttsiouts has joined #openstack-nova21:24
eanderssonuntil tags is supported everywhere (openstackclient, terraform etc)21:24
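A minimal sketch of the workaround eandersson describes: list everything, then filter on metadata on the client side. It assumes each server is a plain dict with a "metadata" mapping; the sample servers and the helper name `filter_by_metadata` are made up for illustration, and no real client library is called.

```python
def filter_by_metadata(servers, wanted):
    """Keep servers whose metadata contains every key/value pair in `wanted`."""
    return [
        server
        for server in servers
        if all(server.get("metadata", {}).get(k) == v for k, v in wanted.items())
    ]


# Hypothetical result of listing all VMs in the project.
servers = [
    {"name": "vm-a", "metadata": {"my_key": "bla"}},
    {"name": "vm-b", "metadata": {"my_key": "other"}},
    {"name": "vm-c", "metadata": {}},
]

matches = filter_by_metadata(servers, {"my_key": "bla"})
print([server["name"] for server in matches])
```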
NewBruceyeah, so we looked at that as well;21:25
NewBruce| cc-compute10-kna1                     | nova-compute       |      35 |21:25
NewBruce| cc-compute29-kna1                     | nova-compute       |      35 |21:25
NewBrucesource = 10 (RDO Rocky) Target = 29 (OSA Rocky)21:25
*** derekh has quit IRC21:26
NewBrucehowever we do have other compute nodes in the same environment which have not been upgraded to Rocky yet; (service version 30)21:26
mriedemNewBruce: do you have [upgrade_levels]/compute pinned to queens or something?21:27
openstackgerritJared Winborne proposed openstack/nova master: Leave the brackets on Ceph Monitor IPv6 addresses for libguestfs  https://review.openstack.org/64940521:28
*** wolverineav has quit IRC21:28
*** slaweq has quit IRC21:28
NewBruce[upgrade_levels]21:29
NewBrucecompute = auto21:29
*** krypto has quit IRC21:29
mriedemso that will pin to the lowest nova-compute service version in the deployment while you're upgrading,21:29
mriedemso if you have queens computes, it will be using the queens rpc versions and backlevel the migrate_data object21:29
mriedemwhich likely is dropping the vifs information for the 2nd deactivated port binding21:30
NewBruceAha….21:30
mriedemthat's my guess anyway21:30
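The "auto" behaviour mriedem describes, pinning to the lowest nova-compute service version in the deployment, amounts to taking a minimum. The sketch below only illustrates that idea with the service versions quoted above; the function name is hypothetical, the third hostname is invented, and nova's real lookup is not shown.

```python
def auto_pin_version(service_versions):
    """Return the lowest compute service version, per the description of 'auto'."""
    return min(service_versions)


# 35/35 are the two Rocky nodes from the output above; 30 stands for a compute
# that has not been upgraded yet (hostname is a placeholder).
versions = {
    "cc-compute10-kna1": 35,
    "cc-compute29-kna1": 35,
    "not-yet-upgraded-compute": 30,
}

print(auto_pin_version(versions.values()))
```

With a single version-30 compute left in the deployment, the pin lands on 30 even when both ends of a migration are at 35.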
*** wolverineav has joined #openstack-nova21:30
NewBruceok, so can we override that?21:30
NewBruceagain, mnaser / sean-k-mooney suggested that might be the case and to upgrade the entire compute - which we were in the process of, however another error in a node caused a bit of strife due to broken libevent so’s in OVS so we halted21:31
mriedemwell you can pin the compute rpc api version to a specific release (queens) or even rpc api version (5.0) but that could break things where the controller is sending versions of objects to older queens computes that won't understand those changes21:32
NewBrucealso, we have other sites where we have performed the same procedure, without issue21:32
mriedemultimately it looks like https://github.com/openstack/nova/blob/stable/rocky/nova/conductor/tasks/live_migrate.py#L41 is faulty in that it doesn't account for pinned RPC versions21:32
NewBrucei have a test machine, so i can override to at least test it21:32
mriedemdansmith: yeah ^?21:32
*** luksky has quit IRC21:32
mriedemNewBruce: the kill switch would be to disable the 'Port Bindings Extended' neutron API extension until you're fully upgraded to rocky21:33
*** BjoernT has quit IRC21:33
dansmithmriedem: if conductor is new enough then the new objects are fine21:33
NewBrucewhich would force into the old flow21:33
mriedemdansmith: sounds like conductor in this case is rocky but some computes are queens21:33
dansmithmriedem: which is fine21:33
NewBrucedansmith correct21:33
mriedembecause of the bounce back thingy?21:33
dansmithyes21:33
dansmiththe rpc pin has nothing to do with the objects21:33
dansmithit's only really method signatures21:33
mriedemsure21:33
mriedembut,21:34
*** slaweq has joined #openstack-nova21:34
mriedemconductor is doing some stuff in neutron based on what the compute service versions are,21:34
mriedemand then setting fields in the objects that get passed to compute,21:34
*** wolverineav has quit IRC21:34
*** wolverineav has joined #openstack-nova21:34
mriedemand then compute has logic that is based on if those new fields are set or not,21:34
mriedemwhich it sounds like they might not be21:34
mriedemtldr compute is probably not doing the right thing21:35
dansmithwell, if the backport is right then it should be okay, unless the conductor is doing something against neutron that the compute can't possibly handle properly21:35
dansmithbut if so, we broke upgrades21:36
mriedemmnaser: did you hit https://bugs.launchpad.net/nova/+bug/1822884 when upgrading to rocky and doing live migrations with mixed computes (queens and rocky?)21:38
openstackLaunchpad bug 1822884 in OpenStack Compute (nova) "live migration fails due to port binding duplicate key entry in post_live_migrate" [Undecided,New]21:38
*** slaweq has quit IRC21:38
mriedemmnaser: or do you upgrade neutron after nova?21:38
mnaserwe upgrade neutron after nova usually mriedem21:38
mnaserI’m very intimately familiar with that bug though.21:38
mriedemfirst i've heard of it :/21:39
mriedemmnaser: ok but that's why you didn't hit it21:39
mnaserI’ve been trying to help NewBruce nail it down forever.21:39
mnaserBut I’ve kinda struggled at finding the code path to replicate it21:39
*** BjoernT has joined #openstack-nova21:39
mnaserThere are other environments where there is mixed compute AND Rocky network and this doesn’t happen21:40
mnaserRight NewBruce ?21:40
NewBruceyeah thats right mnaser21:40
mriedemright he said, "2. OSA -> OSA uses the new flow (two entries which are cleaned up)"21:40
NewBrucethis is the first site we’ve seen it on...21:40
NewBrucemriedem right; so if i grab a VM on an OSA node, and migrate it to an OSA node, watch ml2_port_bindings - ill see one port, briefly two ports, the profiles change, and then the second port entry removed21:41
NewBruceas described in the blueprint21:41
NewBrucetesting RDO - RDO, i don’t see that behaviour - just a single port entry in ml2_port_bindings throughout the entire migration21:42
mnasermriedem: NewBruce runs a few regions and this one is the only one where it works. With both queens to queens21:42
mnaserShit, what’s upgrade levels set to in the rdo notes?21:42
mnaserNodes21:42
mnaserSorry. I’m on mobile.21:43
NewBrucemnaser auto21:43
NewBrucemriedem and i chatted above that will pin it to the lowest version, as you suspected previously21:43
mnaserok, so yeah, I’m pretty torn on why it would actually do that in one environment and not another.21:43
mriedem"all control nodes and net nodes are running OSA (Rocky), some compute  are running RDO (Queens), some are RDO (Rocky) and the remaining are OSA  (Rocky)."21:43
NewBrucei'll double check some other regions and see if they are different there21:44
mnaserNewBruce: by any chance maybe the other regions aren’t pinned to lower version maybe?21:44
mnaserAnd so this is why we don’t catch it?21:44
NewBrucemriedem correct / mnaser ill double check21:44
NewBrucequick (not exhaustive) check, upgrade levels is auto21:46
NewBrucewe have the same mix of service versions there as well (30 / 35)21:47
mnaserYet that issue somehow doesn’t happen there21:49
mnaserSo off21:49
mnaserOdd21:49
mriedemNewBruce: it fails in _post_live_migration right/21:50
mriedem?21:50
mnaser I think so. Rather than deleting the old binding and activating the new one, it tries to update the port binding21:51
NewBrucemriedem correct21:51
NewBrucemnaser correct21:51
mnaserWhich is what it would be with old port binding method.21:52
mriedemright in the old method there is just one port binding21:52
mriedemand we change the host on it21:52
NewBrucemriedem and thats the exact behavior we see in RDO - RDO21:55
mnaserSo rocky to rocky and it does old port binding21:55
mnaserNewBruce: can you restart a nova-compute with rdo and debug=true and double check the value of upgrade_levels in the output on startup of Oslo CFC21:56
mnaserCfg21:56
mnaserIn case rdo is doing weird stuff21:56
NewBrucesure21:57
sean-k-mooneysound like ye are making some progress on this21:57
*** wolverineav has quit IRC21:57
NewBruce[upgrade_levels]21:57
NewBrucecompute = auto21:57
*** tbachman has joined #openstack-nova21:58
NewBrucedebug=true21:58
sean-k-mooneyis the theory that RDO and OSA are not using the same compute RPC version due to their configs even though they are running more or less the same code22:00
*** wolverineav has joined #openstack-nova22:00
sean-k-mooneyand as a result RDO is using the old mechanism while osa is using the new mechanism?22:00
*** wolverineav has quit IRC22:00
*** wolverineav has joined #openstack-nova22:01
NewBrucemnaser22:03
NewBrucenova-compute.log:2019-04-02 23:59:08.448 10483 DEBUG oslo_service.service [req-4f4874a7-a967-49d1-a643-c59b856c5c61 - - - - -] upgrade_levels.compute         = auto log_opt_values /usr/lib/python2.7/site-packages/oslo_config/cfg.py:303222:03
mnaserI mean I dug through the conductor code a lot and it checks the source and dest to be above or at a certain later22:03
mnaserLevel22:04
mnaserAs far as I know the conductor creates the port bindings22:04
sean-k-mooneynot in all cases22:04
sean-k-mooneyit can be created by the compute node22:04
mnaserOh really?  So I think in the new flow, the new compute node creates it right?22:05
mriedemthere is already an active port binding for the source host when you start the live migration,22:05
mriedemin the new flow, conductor will create an inactive port binding for the dest host22:05
mriedemand saves information about that new dest host port binding on the LiveMigrateData.vifs field that gets passed around to the computes22:06
sean-k-mooneyhttps://github.com/openstack/nova/blob/master/nova/conductor/tasks/live_migrate.py#L280-L28422:06
mnaserRight. That’s what I thought. I even asked NewBruce to set the “dont live migrate until vif is plugged” option that was introduced in rocky as false and moved to true in Stein22:06
sean-k-mooneyyes we do the version check then bind the ports on the destination22:06
mriedemmnaser: that's unrelated to this22:07
mriedemi mean it was part of the same bp22:07
mriedembut doesn't rely on the active/inactive port bindings22:07
sean-k-mooneymnaser: in the old flow, in post live migrate on dest, the compute node updated the port binding22:07
mriedemcorrect that is the call here https://github.com/openstack/nova/blob/stable/rocky/nova/compute/manager.py#L683622:08
mnaserNewBruce: can you update conductor to perhaps output the content of every statement in the if that sean-k-mooney linked?22:08
mnaserAnd do an rdo to rdo migration and see which one of them is evaluating to false and not doing port bindings?22:09
mriedemthat code in post_live_migration_at_destination won't update the port binding host if it's already the specified host https://github.com/openstack/nova/blob/stable/rocky/nova/network/neutronv2/api.py#L308822:09
mnaserCause rdo to rdo seems to do old school bindings from what I understand22:09
mriedemwhich is why it's a no-op with the new flow22:09
mriedembecause we've already activated the dest host port binding before we get to https://github.com/openstack/nova/blob/stable/rocky/nova/network/neutronv2/api.py#L308822:09
NewBrucemnaser yeah, thats no problem (to update the conductor and test)22:10
sean-k-mooneyright but if we create an inactive binding on the dest and then somehow end up running the old code in post live migrate, the host it gets back from neutron will be the source node as that will still be the active binding, right22:11
mriedemNewBruce: in case you're not familiar, this is the code on the source compute that activates the dest port binding once we've switched to post-copy22:11
mriedemhttps://github.com/openstack/nova/blob/stable/rocky/nova/compute/manager.py#L1136-L116022:11
sean-k-mooneyso if we are crossing the streams here that check might be a little weird22:11
NewBrucemriedem right; yeah i have filled that with a ton of debug messages as well :D (have the logs saved down in my diaries if they become useful)22:12
mnaserI think once we find out why rdo-rdo uses old flow even if upgrade levels are auto.. it might help22:13
cfriesenwhen doing spec re-approvals for T for stuff that didn't make it in S, do I just copy from stein/approved into train/approved?22:14
mordredmriedem: cool - thanks for the context on the migrate22:14
sean-k-mooneymriedem: that code assumes we get the lifecycle event22:15
*** priteau has quit IRC22:15
mriedemcfriesen: https://specs.openstack.org/openstack/nova-specs/readme.html#previously-approved-specifications22:15
sean-k-mooneyif we dont we wont activate it.22:15
sean-k-mooneyuntil post live migration, so maybe that could be what's happening in the RDO to OSA flow22:15
cfriesenmriedem: you mean I'm supposed to read the readme?22:15
cfriesen:)22:16
mriedemsean-k-mooney: that code is just a best effort to try and activate the dest host port binding to reduce network downtime, as the comment says, "Otherwise the ports are bound to the destination host in post_live_migration_at_destination."22:16
*** wolverineav has quit IRC22:16
sean-k-mooneyya but if we dont activate here then https://github.com/openstack/nova/blob/stable/rocky/nova/network/neutronv2/api.py#L3088 will be true22:16
mnasernot gonna lie, bugs like this makes me wish we had one agent that did it all :P22:17
mnaserI’ll walk myself out22:17
mriedemsean-k-mooney: sure, and honestly i thought that was still supposed to work22:17
mriedemi don't know enough about what's happening within neutron to cause that duplicate primary key error22:17
sean-k-mooneyit probably should work but i don't think we have tested it.22:18
mriedemthe new flow deals with port bindings on the port bindings resources in the port bindings API, the old flow just deals with the port resource and its binding:host_id attribute22:18
NewBrucei will post a large log dump into the launchpad… brb22:18
mnasermriedem: nova tries to update the old binding to point to the new host, but because a new binding already exists, it blows up22:18
mnaserCause you can’t have a binding with same port/host combo22:18
sean-k-mooneywe might be able to reproduce the neutron issue with a functional test22:19
mriedemmnaser: yeah, i just thought neutron was handling that for us22:19
mnaseri guess maybe that’s where the bug lives22:19
*** wolverineav has joined #openstack-nova22:19
mriedemi.e. i thought if we changed the ports binding:host_id value and there was already a port binding for that host, but was inactive, neutron would automatically activate it22:19
sean-k-mooneycreate a port binding, create an inactive port binding on another host, try to update the original binding to the destination instead of activating22:19
sean-k-mooneymriedem: i dont think it does that22:20
mnaserYep that would be a reproducer sean-k-mooney22:20
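The reproducer steps sean-k-mooney lists can be modelled with a toy in-memory table. This is not neutron code and makes no claim about neutron's schema beyond what mnaser says above, that a port cannot have two bindings for the same port/host combination; the class, method and host names are all hypothetical.

```python
class DuplicateBinding(Exception):
    """Raised when a (port, host) pair would appear twice in the toy table."""


class ToyBindingTable:
    """Toy stand-in for a port binding table keyed on (port_id, host)."""

    def __init__(self):
        self._rows = {}  # (port_id, host) -> status

    def create(self, port_id, host, status):
        key = (port_id, host)
        if key in self._rows:
            raise DuplicateBinding(key)
        self._rows[key] = status

    def update_host(self, port_id, old_host, new_host):
        """Old flow: repoint the existing binding at a different host."""
        new_key = (port_id, new_host)
        if new_key in self._rows:
            # An inactive binding for the destination already exists.
            raise DuplicateBinding(new_key)
        self._rows[new_key] = self._rows.pop((port_id, old_host))

    def activate(self, port_id, old_host, new_host):
        """New flow: drop the source binding and activate the destination one."""
        del self._rows[(port_id, old_host)]
        self._rows[(port_id, new_host)] = "ACTIVE"

    def rows(self):
        return dict(self._rows)


def reproduce():
    """Run the three steps and report whether the old-flow update collides."""
    table = ToyBindingTable()
    table.create("port-1", "source", "ACTIVE")    # existing active binding
    table.create("port-1", "dest", "INACTIVE")    # new flow created this
    try:
        table.update_host("port-1", "source", "dest")  # old flow runs anyway
    except DuplicateBinding:
        return "duplicate"
    return "ok"


print(reproduce())
```

In this model the collision only occurs when the two flows are mixed: calling `activate` instead of `update_host` in the last step leaves a single active row for the destination.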
mriedemsean-k-mooney: hell i could just push a patch to comment this out https://github.com/openstack/nova/blob/stable/rocky/nova/compute/manager.py#L1155 and our live migration job should blow up22:20
* mriedem pushes22:20
mriedemNewBruce: while you're at it, dump the libvirtd and qemu-kvm package versions for both rdo and osa rocky nodes in the bu22:21
mriedem*bug22:21
NewBruce(i’ve just added to the ml2_port_binding trace to the launchpad as well as a debug version of the logs)22:22
openstackgerritChris Friesen proposed openstack/nova-specs master: Re-propose emulated virtual TPM spec to train  https://review.openstack.org/64946322:25
*** rcernin has joined #openstack-nova22:25
*** ttsiouts has quit IRC22:27
openstackgerritMatt Riedemann proposed openstack/nova master: DNM: Test theory about bug 1822884  https://review.openstack.org/64946422:28
openstackbug 1822884 in OpenStack Compute (nova) "live migration fails due to port binding duplicate key entry in post_live_migrate" [Undecided,New] https://launchpad.net/bugs/182288422:28
mriedemmnaser: NewBruce: sean-k-mooney: ^ we'll find out22:28
*** ttsiouts has joined #openstack-nova22:28
mnaserLet’s wait and see22:29
NewBrucemriedem posted version info22:29
sean-k-mooneyi might be able to reproduce something on the neutron side too if i extend https://github.com/openstack/neutron/blob/master/neutron/tests/fullstack/test_ports_rebind.py but honestly i'm not sure that neutron has multi-service functional tests like nova has, where we use a fake message bus and run everything in the one process22:30
NewBrucemriedem will test it out22:30
mriedemNewBruce: our ci system will test it22:30
mriedemif that's the bug, the nova-live-migration job should explode22:31
* mnaser has maintenance from 12am to 3am. M22:31
mriedemNewBruce: and it does look like your libvirt/qemu versions are different between rdo and osa so i wonder if that has something to do with the event getting sent or not22:31
mnaserI’ll head off for a lil bit to doze and catch up on this buffer22:31
mriedemreminds me that i still have https://review.openstack.org/#/c/594527/22:32
*** ttsiouts has quit IRC22:32
mriedemand https://review.openstack.org/#/c/594139/122:33
NewBruceyep, id better get some shut eye soon too and check back in the morning. mnaser do you still want the debug from sean-k-mooney post earlier?22:33
NewBrucemriedem any value in testing the service values? / disabling binding-extended22:34
NewBruce?22:34
mriedemhmm, well we have a nova-grenade-live-migration job but would need to have that running on a stable/rocky change because then one compute would be queens and one would be rocky, or we could just pin rpc to queens in a job on master...22:35
sean-k-mooneyi don't think neutron provides a way to disable binding extended via config so it would need a code change22:35
mriedemsean-k-mooney: there is no api for that?22:35
mriedemi guess not https://developer.openstack.org/api-ref/network/v2/index.html#id522:35
sean-k-mooneyi have looked before ill check but i dont think so22:35
*** BjoernT has quit IRC22:38
NewBrucei think we can run an upgrade across all the compute nodes anyway; the issue we had with libevent seems to be isolated and easily tested for …. ok, let's see how that job comes out anyway22:38
NewBrucecheers22:38
*** slaweq has joined #openstack-nova22:38
mriedemposted https://review.openstack.org/649470 for the upgrade_levels/compute=queens pin simulation22:38
openstackgerritEric Fried proposed openstack/nova master: WIP: Update tests from fake_libvirt_util mocks  https://review.openstack.org/64947122:39
sean-k-mooneymriedem: we expect this to show up in both the nova-live-migration and the grenade version right?22:41
sean-k-mooneyactually i'll check it in the morning or i'll fall asleep watching devstack22:42
sean-k-mooneynight o/22:42
*** slaweq has quit IRC22:43
openstackgerritMerged openstack/nova-specs master: Re-propose emulated virtual TPM spec to train  https://review.openstack.org/64946322:45
*** tkajinam has joined #openstack-nova22:55
*** gmann_afk is now known as gmann23:11
*** hongbin has quit IRC23:17
mriedemooooo hot dog i've got a devstack single node env with 2 non-cell0 cells, 2 computes on cell1 and 1 on cell223:19
mriedemand it's pretty easy23:19
*** tosky has quit IRC23:24
*** mlavalle has quit IRC23:33
openstackgerritMerged openstack/nova master: libvirt: vzstorage: Use 'writeback' QEMU cache mode  https://review.openstack.org/64337623:41
openstackgerritMerged openstack/nova master: libvirt: smbfs: Use 'writeback' QEMU cache mode  https://review.openstack.org/64337723:42
openstackgerritMerged openstack/nova master: Fix comment in test_attach_with_multiattach_fails_not_available  https://review.openstack.org/64944023:42
*** wolverineav has quit IRC23:45
mriedemheh just ran into bug 1781286 again23:47
openstackbug 1781286 in OpenStack Compute (nova) "CantStartEngineError in cell conductor during reschedule - get_host_availability_zone up-call" [Medium,Triaged] https://launchpad.net/bugs/178128623:47
mriedemwe should maybe think about fixing that...23:47
mriedemalso, if things fail during server create rescheduling in conductor, chances are pretty good we don't set the instance to error status and it's stuck in build status23:48
