Monday, 2019-07-15

*** sapd1_x has joined #openstack-nova00:20
*** sapd1_x has quit IRC00:33
*** imacdonn has quit IRC01:15
*** imacdonn has joined #openstack-nova01:15
openstackgerritArthur Dayne proposed openstack/python-novaclient stable/rocky: Add Python 3 Train unit tests  https://review.opendev.org/67074901:26
openstackgerritya.wang proposed openstack/nova master: Add compatibility checks for CPU mode and CPU models and extra flags  https://review.opendev.org/67029901:27
openstackgerritya.wang proposed openstack/nova master: Support report multi CPU model traits  https://review.opendev.org/67030001:27
openstackgerritya.wang proposed openstack/nova master: Add release note  https://review.opendev.org/67044101:27
*** yedongcan has joined #openstack-nova02:20
*** factor has joined #openstack-nova02:51
*** Spencer_Yu has joined #openstack-nova03:10
*** tbachman has quit IRC03:13
Spencer_Yuhttps://bugs.launchpad.net/nova/+bug/1836141  Some failed actions leave remnants behind when resizing an instance, which causes the next resize to fail.03:19
openstackLaunchpad bug 1836141 in OpenStack Compute (nova) "vm resize failed due to the remains left by failed actions" [Undecided,New]03:19
*** whoami-rajat has joined #openstack-nova03:24
*** psachin has joined #openstack-nova03:25
*** Spencer_Yu has quit IRC03:36
*** markvoelker has joined #openstack-nova03:37
*** ricolin has joined #openstack-nova03:43
*** sapd1_x has joined #openstack-nova03:48
*** ricolin has quit IRC03:49
*** udesale has joined #openstack-nova03:49
*** ricolin has joined #openstack-nova03:50
*** sapd1_x has quit IRC04:05
*** tbachman has joined #openstack-nova04:28
openstackgerritGhanshyam Mann proposed openstack/nova master: DRY get_flavor in flavor manage tests  https://review.opendev.org/66828104:37
*** sapd1 has quit IRC05:03
*** yaawang has quit IRC05:13
*** yaawang has joined #openstack-nova05:13
*** shilpasd has joined #openstack-nova05:16
*** rcernin has quit IRC05:27
*** Luzi has joined #openstack-nova05:44
*** ivve has quit IRC05:58
*** ratailor has joined #openstack-nova06:02
*** ratailor_ has joined #openstack-nova06:05
*** ratailor has quit IRC06:07
*** ircuser-1 has quit IRC06:15
*** luksky11 has joined #openstack-nova06:23
*** damien_r has joined #openstack-nova06:24
*** damien_r has quit IRC06:28
*** dpawlik has joined #openstack-nova06:31
*** yedongcan has quit IRC06:31
*** yedongcan has joined #openstack-nova06:33
*** pcaruana has joined #openstack-nova06:36
*** pcaruana has quit IRC06:40
*** rcernin has joined #openstack-nova06:44
*** whoami-rajat has quit IRC06:50
*** geekinutah has quit IRC06:50
*** rouk has quit IRC06:50
*** zbr has quit IRC06:50
*** Hazelesque has quit IRC06:50
*** fungi has quit IRC06:50
*** amodi has quit IRC06:50
*** ajo has quit IRC06:50
*** dtantsur|afk has quit IRC06:50
*** ildikov has quit IRC06:50
*** NobodyCam has quit IRC06:50
*** mugsie has quit IRC06:50
*** Kevin_Zheng has quit IRC06:50
*** niceplace_ has quit IRC06:50
*** melwitt has quit IRC06:50
*** yikun has quit IRC06:50
*** rajinir has quit IRC06:50
*** jhesketh has quit IRC06:50
openstackgerritBrin Zhang proposed openstack/nova-specs master: Proposal for a safer noVNC console with password authentication  https://review.opendev.org/62312006:54
*** ivve has joined #openstack-nova06:57
*** hoonetorg has quit IRC07:02
*** xek has joined #openstack-nova07:08
*** damien_r has joined #openstack-nova07:08
*** ttsiouts has joined #openstack-nova07:08
*** hemna has quit IRC07:13
*** hemna has joined #openstack-nova07:15
*** Kevin_Zheng has joined #openstack-nova07:16
*** melwitt has joined #openstack-nova07:16
*** niceplace_ has joined #openstack-nova07:16
*** yikun has joined #openstack-nova07:16
*** rajinir has joined #openstack-nova07:16
*** jhesketh has joined #openstack-nova07:16
*** whoami-rajat has joined #openstack-nova07:16
*** geekinutah has joined #openstack-nova07:16
*** rouk has joined #openstack-nova07:16
*** zbr has joined #openstack-nova07:16
*** Hazelesque has joined #openstack-nova07:16
*** fungi has joined #openstack-nova07:16
*** amodi has joined #openstack-nova07:16
*** ajo has joined #openstack-nova07:16
*** ildikov has joined #openstack-nova07:16
*** NobodyCam has joined #openstack-nova07:16
*** mugsie has joined #openstack-nova07:16
*** openstackgerrit has quit IRC07:18
*** rpittau|afk is now known as rpittau07:19
*** panda has quit IRC07:19
*** hoonetorg has joined #openstack-nova07:19
*** panda has joined #openstack-nova07:21
*** tbachman has quit IRC07:22
*** tbachman has joined #openstack-nova07:23
*** maciejjozefczyk has joined #openstack-nova07:23
*** slaweq has joined #openstack-nova07:25
*** tbachman has quit IRC07:28
*** ttsiouts has quit IRC07:36
*** ttsiouts has joined #openstack-nova07:36
*** ccamacho has joined #openstack-nova07:38
*** ricolin_ has joined #openstack-nova07:39
*** ttsiouts has quit IRC07:41
*** ricolin has quit IRC07:42
*** ttsiouts has joined #openstack-nova07:46
*** tssurya has joined #openstack-nova07:52
*** helenafm has joined #openstack-nova07:53
*** jangutter has joined #openstack-nova07:55
*** lpetrut has joined #openstack-nova08:00
*** cdent has joined #openstack-nova08:05
alex_xuefried: bauzas need your help on the direction https://review.opendev.org/#/q/status:open+project:openstack/nova+branch:master+topic:claim_for_instance08:23
*** Kevin_Zheng has quit IRC08:26
*** niceplace_ has quit IRC08:26
*** melwitt has quit IRC08:26
*** yikun has quit IRC08:26
*** rajinir has quit IRC08:26
*** jhesketh has quit IRC08:26
*** whoami-rajat has quit IRC08:27
*** geekinutah has quit IRC08:27
*** rouk has quit IRC08:27
*** zbr has quit IRC08:27
*** Hazelesque has quit IRC08:27
*** fungi has quit IRC08:27
*** amodi has quit IRC08:27
*** ajo has quit IRC08:27
*** ildikov has quit IRC08:27
*** NobodyCam has quit IRC08:27
*** mugsie has quit IRC08:27
*** panda has quit IRC08:29
*** panda has joined #openstack-nova08:31
*** Hazelesque has joined #openstack-nova08:35
*** NobodyCam has joined #openstack-nova08:35
*** geekinutah has joined #openstack-nova08:35
*** rouk has joined #openstack-nova08:36
*** whoami-rajat has joined #openstack-nova08:36
*** ajo has joined #openstack-nova08:36
*** niceplace has joined #openstack-nova08:36
*** melwitt has joined #openstack-nova08:36
*** fungi has joined #openstack-nova08:36
*** zbr has joined #openstack-nova08:37
*** yikun has joined #openstack-nova08:37
*** mugsie has joined #openstack-nova08:37
*** jhesketh has joined #openstack-nova08:37
*** davidsha has joined #openstack-nova08:52
*** sean-k-mooney has quit IRC08:52
*** rcernin has quit IRC08:55
*** sean-k-mooney has joined #openstack-nova08:56
*** ralonsoh has joined #openstack-nova09:02
*** lpetrut has quit IRC09:10
sean-k-mooneystephenfin: do you know the history behind why we went from testr to ostestr to stestr? I'm wondering if we should consider reverting to testr or look at finding a non-subunit-based alternative09:38
*** derekh has joined #openstack-nova09:38
sean-k-mooneyhttps://blog.kortar.org/?p=37009:40
sean-k-mooneyfound the history ^09:41
stephenfinsean-k-mooney: I was pretty sure testr was subunit-based, tbh09:42
sean-k-mooneyit is09:42
sean-k-mooneythat does not mean it has the same bug09:43
*** ricolin_ has quit IRC09:47
cdentsean-k-mooney: are you talking about the overflow bug with too much logging? or something else?09:57
*** rpittau has quit IRC10:03
sean-k-mooneycdent: ya we are now hitting it downstream10:04
sean-k-mooneyRHEL8 ships newer versions of some libs than upper-constraints uses for Stein10:04
sean-k-mooneycdent: so we get more deprecation warnings than upstream10:05
cdentthe same bug is present in testr and ostestr (which is a testr wrapper). My understanding is the issue is in testtools and/or subunit not testr or stestr10:05
cdentand it really comes down to a binary data structure being too small10:05
cdentit should be per test10:05
sean-k-mooneycdent: well yes and no10:06
*** johnsom has quit IRC10:06
*** rpittau has joined #openstack-nova10:06
sean-k-mooneyit is in subunit parser10:06
sean-k-mooneybut the subunit protocol10:06
cdentso finding and turning off the deprecation warnings (or simply fixing them) is probably the easiest thing10:06
sean-k-mooneyallows large packets to be fragmented10:06
sean-k-mooneyi think the issue is we are not fragmenting the stream10:06
sean-k-mooneycdent: ya that is how we are fixing it upstream10:07
*** johnsom has joined #openstack-nova10:07
sean-k-mooneybut downstream that is more of a challenge10:07
sean-k-mooneyit either requires backporting things that may not be backported upstream or downstream only patches10:07
* cdent wonders if we could or should make deprecation warnings 'once'10:08
sean-k-mooneycdent: i was wondering if there was a way to redirect all deprecation warnings to a separate file so that they do not end up in the test output10:13
*** lxkong has quit IRC10:13
sean-k-mooneywe should be able to install a python logging filter i think but i haven't tried10:14
*** lxkong has joined #openstack-nova10:14
cdentyeah, that does appear to be the case10:15
sean-k-mooneythe trick would be using a generic enough regex to match deprecations without redirecting real errors. we would also want the gate jobs to copy the deprecation file so that we can actually track how big it is and squash them10:18
*** ttsiouts has quit IRC10:19
*** ttsiouts has joined #openstack-nova10:19
sean-k-mooneyit looks like we already filter a bunch of warnings here https://github.com/openstack/nova/blob/master/nova/tests/fixtures.py#L815-L87710:22
sean-k-mooneycdent: actually https://github.com/openstack/nova/blob/master/nova/tests/fixtures.py#L820 should be making deprecation warnings print once10:24
*** ttsiouts has quit IRC10:24
sean-k-mooneybut maybe that is per test?10:24
sean-k-mooneyi would have assumed it would be per logger but maybe we don't use the warning filter in all places we should10:24
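The 'once' idea discussed above can be sketched with Python's standard `warnings` module alone. This is a minimal illustration, not the nova fixture linked above; the helper names `emit_deprecations` and `count_shown` and the warning text are made up for the example.

```python
import warnings


def emit_deprecations(n):
    """Emit the same DeprecationWarning n times (stand-in for a noisy library)."""
    for _ in range(n):
        warnings.warn("old_api() is deprecated", DeprecationWarning)


def count_shown(action, n=5):
    """Return how many of n identical warnings get through under a filter action."""
    # record=True collects warnings in a list instead of printing them, and
    # the context manager restores the previous filters on exit.
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter(action, DeprecationWarning)
        emit_deprecations(n)
    return len(caught)
```

Under the "always" action every repeat is recorded, while "once" lets only the first one through. How that bookkeeping behaves when a fixture reinstalls the filter for each test is interpreter-dependent, so the "maybe that is per test?" guess would need checking against the real fixture.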
cdentcould be that it is not turned on everywhere? there are some tests that don't use the usual base?10:26
sean-k-mooneythe test that is exploding downstream is the test_instace_action functional test which runs a bunch of tests internally https://github.com/openstack/nova/blob/7279d6fa009c6e276188bcad0ad5a1832849a4f9/nova/tests/functional/notification_sample_tests/test_instance.py#L357-L37710:27
sean-k-mooneyit is not directly using the fixture but I'll look at its inheritance tree10:28
sean-k-mooneymaybe we don't use it in the functional tests10:28
gibisean-k-mooney: doesn't this solve the problem https://review.opendev.org/#/c/656844/ ?10:29
sean-k-mooneyi can check if we have that on OSP 15, which is Stein, but migi was saying the backports he has tried so far do not fix it for us10:30
sean-k-mooneywe don't have auto import of backports so it's possible we are missing that one10:31
* gibi goes get some food10:32
sean-k-mooneygibi: ya we have that and no it does not fix it for us10:36
*** mdbooth has joined #openstack-nova10:37
*** luksky11 has quit IRC10:38
*** geekinutah has quit IRC10:39
sean-k-mooneycdent: looks like the warning filter is installed for the functional tests10:43
sean-k-mooneyso i guess we are still logging too much10:43
*** sapd1_x has joined #openstack-nova10:44
sean-k-mooneypart of the issue is i think stestr is creating a log stream per worker not per test10:44
*** ttsiouts has joined #openstack-nova10:56
*** sapd1_x has quit IRC11:04
*** luksky11 has joined #openstack-nova11:14
*** ratailor_ has quit IRC11:17
*** tesseract has joined #openstack-nova11:20
*** sapd1 has joined #openstack-nova11:24
*** pcaruana has joined #openstack-nova11:32
*** psachin has quit IRC11:36
*** udesale has quit IRC11:39
*** udesale has joined #openstack-nova11:40
*** tesseract has quit IRC11:48
*** weshay|rover is now known as weshay11:51
*** tesseract has joined #openstack-nova11:51
*** needssleep is now known as TheJulia12:04
*** cdent has quit IRC12:12
*** ttsiouts has quit IRC12:15
*** ttsiouts has joined #openstack-nova12:15
*** ttsiouts has quit IRC12:18
*** ttsiouts has joined #openstack-nova12:18
*** hongda has joined #openstack-nova12:20
*** hongda has quit IRC12:24
*** ricolin_ has joined #openstack-nova12:30
*** ricolin__ has joined #openstack-nova12:31
*** ricolin_ has quit IRC12:35
*** hongda has joined #openstack-nova12:35
hongdaHello everyone. Can you help to review: https://review.opendev.org/#/c/670016/ and https://review.opendev.org/#/c/669867/ ?  They tried to fix live-migration failure when token expires.    Thanks a lot. XD12:56
*** lpetrut has joined #openstack-nova12:58
sean-k-mooneywe would have to fix this on master first before https://review.opendev.org/#/c/670016/ can be applied13:00
sean-k-mooneywhy is that stable only by the way13:00
efriedalex_xu: Talk to me13:01
*** mdbooth has quit IRC13:05
sean-k-mooneyhongda: I'm not sure we should just blindly use the admin context13:05
sean-k-mooneywe could, but it would probably be better to only use the admin context if the token had expired13:06
*** jmlowe has quit IRC13:07
sean-k-mooneyhongda: it also looks like you are reusing an old bug that was closed in 201713:07
efriedsean-k-mooney: I'm not paying much attention here, but service auth was made for a situation where you start off with a user token for a long-running operation and then it expires somewhere in the middle.13:07
sean-k-mooneyhongda: it would be better to file a new one and reference it13:07
efriedwrap the user auth in a service auth and you're good.13:07
efriedwhat service is this for?13:07
sean-k-mooneyefried: hongda patches https://review.opendev.org/#/c/669867/113:08
sean-k-mooneyand this stable only patch https://review.opendev.org/#/c/670016/13:08
sean-k-mooneyits related to https://bugs.launchpad.net/nova/+bug/164745113:08
openstackLaunchpad bug 1647451 in OpenStack Compute (nova) newton "Post live migration step could fail due to auth errors" [Medium,Fix committed] - Assigned to Lee Yarwood (lyarwood)13:08
sean-k-mooneyefried: so instead of just creating an admin client here https://review.opendev.org/#/c/669867/1/nova/network/neutronv2/api.py  we should use the service auth thing right13:09
efriedsean-k-mooney: I'm going to have to take a closer look at this a bit later. What's the operation that starts this flow?13:10
efriedi.e. why does the user token have the opportunity to expire before list_ports?13:10
sean-k-mooneyefried: i literally just started looking at this 10 mins ago. but i believe it's the admin doing openstack server migrate --live near the end of the lifetime of the token13:11
sean-k-mooneyand it expires midway13:11
*** openstackgerrit has joined #openstack-nova13:11
openstackgerritMerged openstack/python-novaclient master: Remove deprecated methods and properties  https://review.opendev.org/66776213:11
sean-k-mooneyor you know the live migration just takes a while13:12
efriedat a glance, it appears as though the rest of the stuff in this flow is using admin auth13:12
sean-k-mooneyin either case it expires by the time it gets to post live migrate13:12
efriedbut I don't want to discount the possibility that the list_ports is being done under user auth specifically to guard against someone kicking off this operation when they shouldn't.13:12
sean-k-mooneywell live-migrate is an admin only op so it's probably fine13:12
*** belmoreira has joined #openstack-nova13:13
efriedokay. There's some more opportunities to reuse the admin client in this flow as well.13:14
efriedLet me come back to this after my mtg13:14
sean-k-mooneysure i was more worried about breaking the api request id tracking stuff13:14
sean-k-mooneybut i guess that will be preseved in the keystone context so it prably fine13:15
sean-k-mooney*proably13:15
*** boxiang has joined #openstack-nova13:15
efriedsean-k-mooney: If get_client with admin=True is breaking the global request ID, then that's broken all over the place. Would be a separate issue, if it's an issue at all.13:18
sean-k-mooneyefried: it probably isn't a problem13:18
efriednope, I'm looking at get_client itself now and it's properly handling the global request ID.13:19
sean-k-mooneybut that is just what i wanted to confirm before i +/-1'd and reviewed it properly13:19
sean-k-mooneyefried: aren't you meant to be in a meeting :P13:19
* sean-k-mooney ignores the fact im on a emea wide meeting too13:20
*** yedongcan has left #openstack-nova13:20
efriedYeah, if we're happy that this is supposed to be an admin-only flow anyway, then this change makes sense to me.13:21
openstackgerritSurya Seetharaman proposed openstack/nova master: API microversion 2.75: Add 'power-update' external event  https://review.opendev.org/64561113:22
artomI don't think I'm doing this right13:23
artomI'm trying to see if test_server_connectivity_cold_migration_revert started failing recently-ish13:23
sean-k-mooneylife? openstack? irc?13:23
artomsean-k-mooney, I mean, yes, but:13:23
artomHere's my logstash query: http://logstash.openstack.org/#dashboard/file/logstash.json?query=message%3A%5C%22test_server_connectivity_cold_migration_revert*FAILED%5C%22%20and%20tags%3A%5C%22job-output.txt%5C%2213:23
openstackgerritMerged openstack/python-novaclient master: Deprecate cells v1 and extension commands and APIs  https://review.opendev.org/66959713:24
openstackgerritMerged openstack/python-novaclient master: Add a guide to add a new microversion support  https://review.opendev.org/66700213:24
artomLooks like I got the wildcard wrong - but I need it, because there's a timestamp in there13:24
artomFor example: tempest.scenario.test_network_advanced_server_ops.TestNetworkAdvancedServerOps.test_server_connectivity_cold_migration_revert [221.049469s] ... FAILED13:24
artomContext is, we may have merged https://review.opendev.org/#/c/663405/ too soon13:26
artomThat recently un-skipped test is failing pretty consistently (though not 100%) on https://review.opendev.org/#/c/668631/13:27
openstackgerritSurya Seetharaman proposed openstack/python-novaclient master: API microversion 2.75: Add 'power-update' external event  https://review.opendev.org/66679213:27
sean-k-mooney too soon in that it's still broken or we need to recheck things13:27
artomBut everything looks right from the Nova events POV, so maybe there's something else, and we need to re-skip test_server_connectivity_cold_migration_revert13:27
artomBut step 1 is determining whether those failures started happening on all runs right after we merged the un-skip patch, or whether it's just my patch that's causing trouble13:28
artomHence the logstash query13:28
sean-k-mooneyya13:29
sean-k-mooneyill see if i can hack something to work too quickly13:29
*** jmlowe has joined #openstack-nova13:29
sean-k-mooneyartom: it looks like your query is being ignored more or less13:31
artomsean-k-mooney, I know :(13:31
*** mdbooth has joined #openstack-nova13:31
sean-k-mooneydo you have an example failure13:32
*** hemna has quit IRC13:33
artomsean-k-mooney, http://logs.openstack.org/31/668631/7/check/tempest-slow-py3/a16d7d9/job-output.txt.gz#_2019-07-14_15_36_51_14504013:33
sean-k-mooneythanks13:33
artomThe message operator, it's per line, right?13:34
sean-k-mooneyyes it should be13:34
*** betherly has joined #openstack-nova13:35
sean-k-mooneywell yes and no13:35
*** belmoreira has quit IRC13:35
sean-k-mooneythe log stream should chunk it per new line13:35
sean-k-mooneysorry not per line but per log output13:35
sean-k-mooneye.g. if you log a multi-line message the log/parser/stream will stream that full log message to logstash in one go13:36
*** hemna has joined #openstack-nova13:37
artomsean-k-mooney, oh, I think it splits on the _13:38
*** belmoreira has joined #openstack-nova13:38
*** maciejjozefczyk_ has joined #openstack-nova13:39
*** boxiang has quit IRC13:39
*** betherly has quit IRC13:40
*** maciejjozefczyk has quit IRC13:40
sean-k-mooneyit shouldn't13:41
sean-k-mooneyit might be but it shouldn't13:41
alex_xuefried: looking for some feedback on this direction https://review.opendev.org/#/q/status:open+project:openstack/nova+branch:master+topic:claim_for_instance before I go further.13:41
efriedartom: (still) not paying a lot of attention here, but there ought to be lots of good examples in the elastic-recheck project13:41
artomefried, ack, thanks13:42
efriedalex_xu: Okay, I saw a bunch of patches come in this morning. Were you planning to put up a spec for this?13:42
sean-k-mooneyya https://github.com/openstack-infra/elastic-recheck/tree/master/queries i was looking at those13:42
sean-k-mooneyartom: ^13:42
artomsean-k-mooney, yeah... and I appear to be doing everything right :(13:44
sean-k-mooneyi don't see any example of actually matching on the tempest output13:45
artomWait, do I need to capitalize AND?13:45
sean-k-mooneymaybe13:45
sean-k-mooneyyes13:45
*** eharney has joined #openstack-nova13:45
artom*facepalm*13:46
artomOK, now it's turning up nothing13:46
sean-k-mooneyartom: http://logstash.openstack.org/#dashboard/file/logstash.json?query=message%3A%5C%22tempest.scenario.test_network_advanced_server_ops.TestNetworkAdvancedServerOps.test_server_connectivity_cold_migration_revert%5C%22%20AND%20message%3A%5C%22FAILED%5C%22%20AND%20tags%3A%5C%22job-output.txt%5C%2213:47
sean-k-mooneyset it to 30days13:47
*** amodi has joined #openstack-nova13:47
sean-k-mooneyand things show up13:47
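The working query above can be rebuilt programmatically. The sketch below uses only the Python standard library and mirrors the escaping visible in that URL (quotes backslash-escaped, then percent-encoded); whether the dashboard strictly requires that escaping is an assumption, and the helper names `build_query` and `encode_for_dashboard` are hypothetical.

```python
from urllib.parse import quote


def build_query(test_name, status="FAILED", tag="job-output.txt"):
    """Join quoted terms with an upper-case AND, as in the query that worked."""
    terms = [
        f'message:"{test_name}"',
        f'message:"{status}"',
        f'tags:"{tag}"',
    ]
    return " AND ".join(terms)


def encode_for_dashboard(query):
    """Backslash-escape double quotes, then percent-encode everything reserved."""
    escaped = query.replace('"', '\\"')
    return quote(escaped, safe="")
```

Splitting the test name and the FAILED marker into two `message:` terms avoids needing a wildcard for the timing field that sits between them in the job output.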
sean-k-mooneythe last failure was about an hour ago13:48
sean-k-mooneylooks like it started to show up on the 13th13:49
artomsean-k-mooney, that sounds about right13:50
artomMeans https://review.opendev.org/#/c/663405/ is causing it13:50
*** belmoreira has quit IRC13:51
stephenfinmelwitt: If you're around today, could you take a look at this doc fix? https://review.opendev.org/#/c/670125/13:51
artomAll on different changes, too13:52
sean-k-mooneythe first failure was https://review.opendev.org/#/c/627765/13:52
sean-k-mooneyit might be caused by https://review.opendev.org/#/c/663405/ but it looks like a neutron issue to me13:53
artomsean-k-mooney, it has to be "caused" by https://review.opendev.org/#/c/663405/ because before that landed we just didn't run that test :)13:55
alex_xuefried: if we need a spec, I can write up one13:55
openstackgerritStephen Finucane proposed openstack/nova master: Integrate 'pre-commit'  https://review.opendev.org/66551813:56
sean-k-mooneywell we dont know that https://review.opendev.org/#/c/663405/ would fix the issue in that test13:56
sean-k-mooneywe speculated that it should13:56
artomsean-k-mooney, you mean https://review.opendev.org/#/c/667177/?13:56
sean-k-mooneysorry https://review.opendev.org/#/c/663405/ is the tempest change13:56
artomYeah, we thought it'd be fine13:56
artomApparently not13:57
artomAnd like I said, I checked logs for 2 failing runs, everything seems OK from the Nova POV13:57
sean-k-mooneyright so https://review.opendev.org/#/c/667177/ is likely not enough13:57
artomWe correctly wait for plug-time events, we received them, we finish booting up the guest13:57
sean-k-mooneyit looks like the neutron l3 agent has not correctly set up the floating ip13:57
sean-k-mooneyand that is why the ssh connection is not working13:57
*** jmlowe has quit IRC13:57
artomsean-k-mooney, ping is failing13:58
artomBut with no working fip we'd expect both ping and SSH to fail13:58
sean-k-mooneythe vm has got an ip from dhcp13:58
sean-k-mooneyso it has network connectivity13:58
sean-k-mooneybut since the ping/ssh are failing then that means its a floating ip issue13:58
artomI'll start by filing a bug, we can start skipping that test again13:58
artomAnd then work on a solution in its own time13:59
*** belmoreira has joined #openstack-nova13:59
artomShould I file it under Neutron then?14:00
sean-k-mooneyi think the neutron folks are aware of this14:00
efriedalex_xu: when looking through the patches, is there anything specific you want me to be aware of etc?14:00
sean-k-mooneyam yes but i would check with them first14:00
sean-k-mooneyartom: this is the important bit http://logs.openstack.org/31/668631/7/check/tempest-slow-py3/a16d7d9/job-output.txt.gz#_2019-07-14_15_36_51_30795614:00
sean-k-mooneyit sent a select to the dhcp server to confirm it received the ip14:01
sean-k-mooneythen we try to ssh in and it fails14:01
*** jmlowe has joined #openstack-nova14:01
sean-k-mooneyactually it connects a little later  http://logs.openstack.org/31/668631/7/check/tempest-slow-py3/a16d7d9/job-output.txt.gz#_2019-07-14_15_36_51_31150014:02
sean-k-mooneyartom: ok so it's the ping after the revert that is failing14:06
*** yonglihe has joined #openstack-nova14:06
artomsean-k-mooney, yeah.14:07
alex_xuefried: I added a device manager and want to hear your thoughts: is it something we need, or am I overcomplicating it?14:07
artomsean-k-mooney, so my kids woke up (late sleepers), I'll be doing dad stuff for a bit, I'll check back in once I'm at the office14:07
sean-k-mooneythe difference i am seeing is we don't seem to be trying to ssh before we ping in the final case14:07
*** mdbooth has quit IRC14:07
sean-k-mooneyartom: sure no worries14:08
yonglihesean-k-mooney, I still haven't fully got the unit test working for 'orphan cleanup'.14:08
yonglihebut still working on that.14:08
*** mdbooth has joined #openstack-nova14:09
artomsean-k-mooney, I guess I'll file the bug for now, in Neutron. We can always change component later.14:10
yonglihe'Add server sub-resource topology API' is on run queue. I fixed the merge conflict. Unit tests should pass; the current zuul failure does not seem to be my fault: https://review.opendev.org/#/c/621476/.   (it seems to be hitting the image saving test case)14:11
yongliheIt hit Bug 1737634:  http://status.openstack.org/elastic-recheck/index.html#173763414:20
openstackbug 1713163 in tempest "duplicate for #1737634 test_delete_saving_image fails because image hasn't transitioned to SAVING" [Medium,Confirmed] https://launchpad.net/bugs/171316314:20
*** belmoreira has quit IRC14:26
*** artom has quit IRC14:26
*** tesseract has quit IRC14:28
*** dpawlik has quit IRC14:29
*** belmoreira has joined #openstack-nova14:30
*** tesseract has joined #openstack-nova14:30
*** ivve has quit IRC14:36
*** dklyle has joined #openstack-nova14:38
*** belmoreira has quit IRC14:39
*** TxGirlGeek has joined #openstack-nova14:39
*** irclogbot_1 has quit IRC14:41
*** belmoreira has joined #openstack-nova14:45
*** ericyoung has quit IRC14:49
*** ericyoung has joined #openstack-nova14:49
*** mlavalle has joined #openstack-nova14:49
*** irclogbot_2 has joined #openstack-nova14:51
*** Luzi has quit IRC14:52
*** belmoreira has quit IRC14:52
*** rouk has quit IRC14:53
*** belmoreira has joined #openstack-nova14:55
*** betherly has joined #openstack-nova14:55
*** beekneemech is now known as bnemec15:01
*** tesseract has quit IRC15:02
efrieddansmith: http://eavesdrop.openstack.org/irclogs/%23openstack-nova/%23openstack-nova.2019-07-11.log.html#t2019-07-11T13:39:3115:03
efriedLibvirtDriver.power_off() calls self._destroy() which calls guest.poweroff() which calls self._domain.destroy()15:03
*** tesseract has joined #openstack-nova15:03
efriedThat is, unless I'm missing something, the domain XML is destroyed at power off, not during the startup.15:04
*** luksky11 has quit IRC15:04
dansmithefried: I'm not sure what you're saying or asking15:05
efrieddansmith: When we were talking about doing device (specifically vpmem) claims via the virt driver15:06
dansmithefried: when we go into a reboot operation, before we do anything to the guest, we can look at the existing xml.. then if we need to un/redefine it, we can do that with the context we just learned15:06
*** hemna has quit IRC15:07
efrieddansmith: Okay, but if I do a stop, then "wait a while", then start again, I've lost that information.15:07
dansmithno15:07
dansmithyou do not lose the xml across a stop/start15:07
dansmitha libvirt destroy operation does not lose data, it stops the instance (old xen terminology)15:07
dansmithan undefine operation drops the stored definition of the guest15:07
efriedself._domain.destroy() <== does this lose the data?15:08
dansmithno15:08
efriedoh15:08
efriedinteresting choice of name15:08
dansmithit comes from xen15:08
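The destroy-versus-undefine distinction dansmith describes above can be modelled with a toy in-memory class. This is a hypothetical sketch for illustration only and is NOT the libvirt API; the class `ToyHypervisor` and its methods are made up and simply mirror the semantics stated in the chat: destroy stops the guest and keeps the stored definition, undefine drops the stored definition.

```python
class ToyHypervisor:
    """Hypothetical in-memory model of persistent guests; not the libvirt API."""

    def __init__(self):
        self.defined = {}     # name -> stored XML (persistent definitions)
        self.running = set()  # names of guests that are currently active

    def define(self, name, xml):
        """Store a persistent definition for a guest."""
        self.defined[name] = xml

    def start(self, name):
        """Start a guest from its stored definition."""
        if name not in self.defined:
            raise KeyError(f"no stored definition for {name}")
        self.running.add(name)

    def destroy(self, name):
        """Hard stop: the guest stops running, its stored definition is kept."""
        self.running.discard(name)

    def undefine(self, name):
        """Drop the stored definition of the guest."""
        self.defined.pop(name, None)
```

In this model a stop followed by a later start still finds the XML, which is the point being made about not losing data across stop/start; only `undefine` makes a subsequent `start` fail.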
efriedalex_xu: ^15:08
alex_xudansmith: why is the start instance action building xml...15:10
dansmithalex_xu: to update the instance with any stored data or config changes that may have happened15:11
*** ricolin__ has quit IRC15:11
dansmithsince nova is the authority for a lot of things (like config drive or nic mac address, etc)15:11
*** ricolin_ has joined #openstack-nova15:11
alex_xuah..15:12
efriedso why do we need to store things like PCI devs, NUMA topo, etc. in the db at all?15:12
*** ttsiouts has quit IRC15:13
dansmithbecause we have apis for them?15:13
*** ttsiouts has joined #openstack-nova15:13
sean-k-mooneyefried: because we need to assign specific devices, cpus to instances and we don't persist the libvirt xml15:14
dansmithbecause we need aggregated views of that data without contacting every node in the cluster for each sort of thing15:14
sean-k-mooneyso if we don't store it in the db we don't know what ones are used if an instance is not running15:14
alex_xusean-k-mooney: we were just talking about libvirt actually persisting everything15:14
sean-k-mooneyalso we need it for the filters15:15
sean-k-mooneyalex_xu: that would cause other issues on upgrade of libvirt versions and nova15:15
dansmithsean-k-mooney: not sure that's true (that we can't get it from libvirt from stored domains),15:15
dansmithbut the operations we need to do would be incredibly expensive if we had to collect "who is using what right now" by contacting every node15:15
sean-k-mooneydansmith: well if we delete the domain on power off we need to reserve the cores so that we don't violate the requirement by booting a new vm15:16
efriedoperations like scheduling, determining available resources...15:16
sean-k-mooneyif we don't undefine the domain on power off15:16
dansmithsean-k-mooney: again, we do not delete the domain on poweroff15:16
sean-k-mooneythen it's possible we don't need to store them for that15:16
sean-k-mooneybut we would still want to have it for the filter15:16
*** burt has quit IRC15:16
sean-k-mooneywe do on power on because it calls hard reboot15:17
efriedSo in the utopian future where all of that is done properly in placement, the scheduler wouldn't need that info in the db and we could theoretically get rid of it.15:17
dansmithright, we need that info stored because we've promised a cluster-wide API for them, and because the filters need to be able to survey the entire landscape efficiently15:17
sean-k-mooneywe might not on power off i didn't check15:17
sean-k-mooneye.g. ill take your word for it15:17
*** ttsiouts has quit IRC15:18
efrieddansmith: so, for a new resource we're tracking, specifically vpmem for this conversation,15:20
efriedsince we're tracking the inventory properly in placement,15:20
efriedthat means only the virt driver needs to know the connection between an allocation and a specific vpmem namespace,15:20
efriedso there's no reason to store vpmems in the database15:20
dansmithif that is doable (i.e. we don't have to store them for some other reason) then I think that's ideal, yes15:21
efriedand,15:21
efriedsince we can recover the vpmem information from the domain xml on operations other than migrations (see below), there's no need to store the vpmem info on the Instance either.15:21
*** ttsiouts has joined #openstack-nova15:21
efriedSo the only place we actually need the information is in a migrate context15:21
dansmithdo we need it in a migration context?15:21
alex_xuyes15:22
efriednot sure15:22
dansmiththe two ends of a migration are storing the machine-local context in their own libvirts15:22
dansmithmaybe for the revert case?15:22
efriedalex_xu: why would we need it in migration context?15:22
alex_xuI need to copy the source pmem data to the dest pmem. I need the dest pmem device path15:22
dansmithwait15:22
dansmithso for cold migration you're going to migrate the data as well?15:22
alex_xuyes15:23
efriedthere's a conf option15:23
alex_xuyes, it is configurable by the extra spec15:23
dansmithhow is that going to work? write it to disk, consider it a disk image to move and then write it back?15:23
sean-k-mooneydansmith: not by default15:23
alex_xuhttps://review.opendev.org/#/c/634556/12/nova/privsep/libvirt.py15:23
dansmithI *love* this snowflake feature you guys have come up with15:23
sean-k-mooneywe default to not copying the data but we can optionally copy it15:23
alex_xu^ here it is, ssh...15:23
dansmithalex_xu: so you're going to add another scp operation?15:24
alex_xuyes...15:24
dansmithgross15:24
alex_xuit is something we hate...15:24
*** hemna has joined #openstack-nova15:24
efriedWhat's the alternative? Block migrations etc. Which is worse?15:24
dansmithso if/when we get to the point of being able to use libvirt to move the images over the TLS tunnel, we're stuck scp'ing this thing?15:25
dansmithefried: not promising that this data is persistent across moves is one option15:25
dansmithefried: isn't this being added as a snowflake "you can never live migrate" feature?15:25
openstackgerritBalazs Gibizer proposed openstack/nova master: nova-manage: heal port allocations  https://review.opendev.org/63795515:25
efriedI thought live migrate was supported15:25
sean-k-mooneydansmith: in the livemigration case qemu/libvirt can copy the data15:25
alex_xuit supports lm15:26
sean-k-mooneyfor cold migrate we have to do it but maybe we can make a libvirt feature request?15:26
dansmithokay, I thought it wasn't going to be able to move that data15:26
alex_xui'm ok without copy for resize15:26
efriedhttp://specs.openstack.org/openstack/nova-specs/specs/train/approved/virtual-persistent-memory.html#live-migration15:27
sean-k-mooneyalex_xu: the issue with scp/rsync for resize is it wont work for cross cell resize15:27
efriedapparently libvirt moves it along with RAM15:27
sean-k-mooneyalex_xu: e.g. for cross cell resize we can't assume the compute nodes have network connectivity15:27
alex_xusean-k-mooney: yes, we are going to stop cross cell resize in the initial proposal15:27
*** igordc has joined #openstack-nova15:27
sean-k-mooneyat least not in the edge deployment case15:27
dansmithlive migration is transparent to the user, so that's good, and resize is not, so I think it's reasonable to say that there's no data copy when you resize15:28
dansmithyou could resize to a flavor with/without pmem too,15:28
*** maciejjozefczyk_ has quit IRC15:28
dansmithso not copying would cover that case in both directions15:28
*** maciejjozefczyk_ has joined #openstack-nova15:28
efriedper the spec, "reduction" in vpmem would not be allowed15:28
dansmithbecause of this15:28
dansmithI have to jump on a call now,15:29
*** hemna has quit IRC15:29
dansmithbut I'm really unhappy with us adding another "just scp it across" thing15:29
efriedalex_xu: Sounds like the spec update https://review.opendev.org/669970 needs to be rethought :)15:29
alex_xuyea, will update again15:29
alex_xuI think the copy for resize can be removed15:30
efried& cold migration15:30
efried(which is the same thing?)15:30
dansmiththey are the same thing15:30
* efried shoots self in face15:30
*** hemna has joined #openstack-nova15:31
*** gyee has joined #openstack-nova15:31
alex_xuremove the copy, remove the vpmem field15:31
efrieddoesn't the VM see the vpmem as persistent storage, though?15:31
sean-k-mooneyefried: not really15:31
efriedwon't it freak out if it boots and the data is gone?15:31
efriedokay, then yeah, that vastly simplifies things.15:31
sean-k-mooneyit sees it as ram/dimms15:31
efriedunfortunate that we already merged https://review.opendev.org/#/c/662697/ -- seems as though we won't be using that?15:32
sean-k-mooneythe data is meant to be persistent15:32
alex_xuand most of the use cases are for cache, so it is ok15:32
sean-k-mooneybut you should really store your data somewhere else and keep the working set in the pmem15:32
sean-k-mooneyya most workloads use it as high capacity scratch space for operating on a subset of the data but the long term storage of the data should be in a cinder volume15:33
efriedI've put a hold on https://review.opendev.org/#/c/634548/ for now15:36
*** lpetrut has quit IRC15:36
*** ivve has joined #openstack-nova15:40
gibiefried: thanks for catching bug in https://review.opendev.org/#/c/637955/ I've fixed it.15:51
efriedgibi: cool15:51
efriedgot a nice backlog today, but hopefully I can get back around to it.15:51
efriedgibi: since you're around, would you mind pushing https://review.opendev.org/#/c/657464/ ?15:51
gibiefried: on it15:52
openstackgerritStephen Finucane proposed openstack/nova master: Update supported transports for iscsi connector  https://review.opendev.org/52444315:52
efriedthanks15:52
*** hemna has quit IRC15:59
efriedsean-k-mooney: is rebuild admin-only?16:00
sean-k-mooneyno16:00
sean-k-mooneyrebuild and resize can both be done by tenants16:00
efriedsean-k-mooney: okay, so back to that thing we were looking at earlier, it looks like setup_networks_on_host is called in rebuild and resize flows as well as migrate16:01
*** maciejjozefczyk_ has quit IRC16:02
efriedhm, unless teardown=True only happens on migrations...16:03
openstackgerritMerged openstack/os-resource-classes master: Propose FPGA and PGPU resource classes  https://review.opendev.org/65746416:03
efrieddoes _confirm_resize only happen on migration-y resizes?16:03
*** damien_r has quit IRC16:03
sean-k-mooneyefried: it happens on resize or cold migration16:04
sean-k-mooneyin both cases we go into resize_verify and you have to do resize --confirm16:05
efriedrats, nother meeting, will come back to this...16:05
sean-k-mooneyefried: stephenfin is adding an openstack server migrate --confirm as syntactic sugar but it's the same call underneath16:06
stephenfinthat threw me too, fwiw16:07
stephenfin(whether cold migrations needed to be confirmed or not)16:07
*** ttsiouts has quit IRC16:07
sean-k-mooneystephenfin: well that depends on your config settings16:07
sean-k-mooneybut ya16:07
*** ttsiouts has joined #openstack-nova16:08
sean-k-mooneyi hit that oddity back in havana so at this point i don't even think about them differently anymore.16:08
sean-k-mooney(cold migration vs resize)16:08
*** tssurya has quit IRC16:11
*** ttsiouts has quit IRC16:13
*** belmoreira has quit IRC16:30
*** rpittau is now known as rpittau|afk16:30
*** artom has joined #openstack-nova16:39
*** artom has quit IRC16:43
*** helenafm has quit IRC16:49
*** ricolin_ has quit IRC16:50
*** davidsha has quit IRC16:51
*** udesale has quit IRC16:56
*** cdent has joined #openstack-nova16:58
*** belmoreira has joined #openstack-nova16:58
*** belmoreira has quit IRC17:02
*** derekh has quit IRC17:03
*** lpetrut has joined #openstack-nova17:03
*** hongda has quit IRC17:06
*** igordc has quit IRC17:07
*** artom has joined #openstack-nova17:15
bbobrovsean-k-mooney: hey17:17
bbobrovsean-k-mooney: regarding your comment to https://review.opendev.org/#/c/638680/24/nova/virt/libvirt/driver.py17:17
* sean-k-mooney clicks17:18
bbobrovsean-k-mooney: do you have a traceback how it blows up?17:18
sean-k-mooneyi have fixed it in https://review.opendev.org/#/c/670189/17:18
sean-k-mooneybbobrov: basically the old code would use q35 if there was no default for an architecture17:19
sean-k-mooneywhich is invalid if you have sparc or many other arch specific qemu-* packages installed17:20
sean-k-mooneyso libvirt would raise an exception saying q35 is not supported by emulator X17:20
bbobrovsean-k-mooney: understood, thanks. Don't we have a ci job to catch this kind of stuff?17:21
sean-k-mooneythis was not caught as this is currently not used outside of the tests and the test mock out all calls to libvirt17:21
sean-k-mooneyso it was passing because of invalid test data17:22
sean-k-mooneyif you tried to call that function in the agent then you got tracebacks17:22
*** ralonsoh has quit IRC17:22
sean-k-mooneyon ubuntu 18.04 they package all the emulators by default and install them when you install qemu and qemu-kvm17:23
bbobrovok, thanks. I'll review 670189 then17:23
sean-k-mooneyso i would have expected this to break but only when we added code to call this17:23
sean-k-mooneyi think the signature of get_domain_capabilities is probably not what we want but i have fixed up the function without changing it for now to not break the sev code and allow me to continue with my own work17:25
*** altlogbot_1 has quit IRC17:25
*** irclogbot_2 has quit IRC17:26
sean-k-mooneythe api signature of _get_domain_capabilities is actually more useful IMO.17:26
bbobrovsean-k-mooney: how should it look then? Maybe i could quickly fix the sev code17:26
bbobrovsean-k-mooney: and rebase on top of the fix17:26
sean-k-mooneyi think we should be able to optionally pass the arch and machine type to it and have it query them if they're not cached17:27
sean-k-mooneyhttps://review.opendev.org/#/c/670189/3/nova/virt/libvirt/host.py@67417:27
sean-k-mooneyhave it default to arch=None mtype=None and return all of them17:27
sean-k-mooneybut it depend on how we will be using it17:28
sean-k-mooneyif we look up data by the arch only then its fine17:28
sean-k-mooneyif we look it up by arch and mtype there is no guarantee the combination will be in the dict17:29
*** irclogbot_2 has joined #openstack-nova17:29
*** TxGirlGeek has quit IRC17:29
*** altlogbot_1 has joined #openstack-nova17:29
sean-k-mooneyi can work with the api as it is for now as i just need it to create traits17:30
sean-k-mooneybut if i was using this in the driver the current interface would be limiting17:30
*** altlogbot_1 has quit IRC17:31
sean-k-mooneyideally the libvirt driver would never need to call _get_domain_capabilities directly but since the mtype can be set in the image it's possible that you will not find it in the cached copy and would need to17:31
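A minimal sketch of the interface sean-k-mooney describes above, assuming nothing about nova's real code: the class name `DomainCapsCache`, the `query` callable, and the `default_pairs` argument are all hypothetical stand-ins, and `query` takes the place of whatever actually asks libvirt for domain capabilities. With no arguments it returns everything known for the host's default (arch, machine type) pairs; with both arguments it returns that one combination, querying on a cache miss.

```python
class DomainCapsCache:
    """Hypothetical cache of domain capabilities keyed by arch and mtype."""

    def __init__(self, query, default_pairs):
        # query(arch, mtype) -> capabilities object; assumed to be the
        # only place that talks to the hypervisor.
        self._query = query
        # (arch, mtype) pairs to populate when the caller asks for "all".
        self._default_pairs = list(default_pairs)
        self._cache = {}  # {arch: {mtype: caps}}

    def _lookup(self, arch, mtype):
        by_mtype = self._cache.setdefault(arch, {})
        if mtype not in by_mtype:
            # Cache miss, e.g. a machine type set in the image metadata.
            by_mtype[mtype] = self._query(arch, mtype)
        return by_mtype[mtype]

    def get(self, arch=None, mtype=None):
        if arch is None and mtype is None:
            # Default call: make sure the defaults are cached, then
            # return a shallow copy of everything we know about.
            for default_arch, default_mtype in self._default_pairs:
                self._lookup(default_arch, default_mtype)
            return {a: dict(m) for a, m in self._cache.items()}
        if arch is None or mtype is None:
            raise ValueError("pass both arch and mtype, or neither")
        return self._lookup(arch, mtype)
```

In this shape a caller that only needs traits can call `get()` and iterate, while a caller holding an image-supplied machine type can call `get(arch, mtype)` without first checking whether the combination is already in the dict.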
*** TxGirlGeek has joined #openstack-nova17:32
*** irclogbot_2 has quit IRC17:32
*** lpetrut has quit IRC17:33
*** irclogbot_3 has joined #openstack-nova17:39
*** altlogbot_2 has joined #openstack-nova17:40
*** dpawlik has joined #openstack-nova17:41
*** tesseract has quit IRC17:45
*** dpawlik has quit IRC17:52
*** dpawlik has joined #openstack-nova17:52
*** igordc has joined #openstack-nova18:05
*** brault has joined #openstack-nova18:36
*** panda has quit IRC18:38
*** panda has joined #openstack-nova18:40
*** hemna has joined #openstack-nova18:40
efriedbbobrov: Hi, since you're here, are you working on rebasing the SEV series?18:40
*** brault has quit IRC18:41
*** tbachman has joined #openstack-nova18:42
*** brault has joined #openstack-nova18:44
*** jmlowe has quit IRC18:46
*** brault has quit IRC18:51
openstackgerritBoris Bobrov proposed openstack/nova master: Provide HW_CPU_X86_AMD_SEV trait when SEV is supported  https://review.opendev.org/63868019:03
openstackgerritBoris Bobrov proposed openstack/nova master: Add extra spec parameter and image property for memory encryption  https://review.opendev.org/66442019:03
openstackgerritBoris Bobrov proposed openstack/nova master: Extract SEV-specific bits on host detection  https://review.opendev.org/63633419:03
openstackgerritBoris Bobrov proposed openstack/nova master: Add <launchSecurity> and <driver iommu='on' /> to config.py  https://review.opendev.org/63631819:03
openstackgerritBoris Bobrov proposed openstack/nova master: Apply SEV-specific guest config when SEV is required  https://review.opendev.org/64456519:03
openstackgerritBoris Bobrov proposed openstack/nova master: Enable booting of libvirt guests with AMD SEV memory encryption  https://review.opendev.org/66661619:03
bbobrovefried: here is the answer :)19:03
efriedblam!19:04
efriedthanks bbobrov19:04
bbobrovi will reply to the comments now19:04
artomWhat happened to aspiers?19:04
*** belmoreira has joined #openstack-nova19:21
efriedwas wondering same19:21
*** maciejjozefczyk_ has joined #openstack-nova19:22
*** lee1 has joined #openstack-nova19:23
*** lee1 is now known as lyarwood19:23
*** jmlowe has joined #openstack-nova19:25
*** maciejjozefczyk_ has quit IRC19:28
*** luksky11 has joined #openstack-nova19:34
openstackgerritEric Fried proposed openstack/nova master: Use Adapter global_request_id kwarg  https://review.opendev.org/67090719:35
cdentefried: the semaphore thing is indirectly related to https://bugs.launchpad.net/nova/+bug/1835958 (power states not under the same lock, but in the realm of nova-compute performance)19:35
openstackLaunchpad bug 1835958 in OpenStack Compute (nova) "Nova sync power state on large clusters causes poor performance" [Undecided,New]19:35
efriedcdent: Only slightly related, I've been talking to alex_xu about moving more stuff under that semaphore.19:37
efriedspecifically driver-specific claim stuff19:37
efriedNot sure if you were around for those conversations.19:37
cdentnot that I'm aware of19:37
cdentit's a problem for high throughput nova-computes (like in vmware where the nova-compute is a chokepoint, rather than a member of a nice bit of parallelism like in a big kvm cloud)19:38
*** belmoreira has quit IRC19:40
efriedcdent: well, the plan is to start delegating virt-specific claimage to the virt driver itself. I could see where vmware could take its own semaphore on a specific internal nodeything and background the real claim job, so the next one could come in and get started (but on a different nodeything).19:40
efriedand it would be none of RT's business.19:40
efriedthe more we move claimables into placement, the more we can delegate the claim logic for same to the virt driver.19:41
efriedso e.g. PCI devices would eventually become virt driver business.19:42
efriedand - as we were discussing this morning - not even stored in the db anymore.19:42
cdentwell that would be lovely19:44
*** eharney has quit IRC19:49
slaweqhi, is there any n-meta-api service expert here? Can You take a look at https://bugs.launchpad.net/neutron/+bug/1836642 from Nova PoV? Thx in advance :)19:57
openstackLaunchpad bug 1836642 in OpenStack Compute (nova) "Metadata responses are very slow sometimes" [Undecided,New]19:57
efriedartom: catching up, are you filing an elastic-recheck profile for bug 1836595 ?20:01
openstackbug 1836595 in neutron "test_server_connectivity_cold_migration_revert failing" [Undecided,New] https://launchpad.net/bugs/183659520:01
*** betherly has quit IRC20:02
artomefried, I could - should I be? :)20:02
artomefried, I'm kinda hoping https://review.opendev.org/#/c/670848/ would merge though20:02
artomSpeaking of which, gmann ^^ :)20:03
artom(If you're around, not sure about your TZ)20:03
efriedartom: This is interesting, the failure on that skip patch http://logs.openstack.org/48/670848/1/check/neutron-tempest-dvr/ed2b81c/testr_results.html.gz20:05
artomefried, looks like feral packets to me20:06
efriedconflict deleting allocations, looks like it's tempest itself that's being a placement client. Without having looked at the code, one wonders whether it should be retrying20:06
efriedferal packets? how so?20:06
artomYeah, tempest makes a point not to use any Python clients20:06
artomefried, heh, a lazy way of saying "something unrelated I can't be arsed to debug" :P20:07
artomSo it'll do the GET requests itself20:07
efriedthere is no placement client, my point is that this isn't tempest calling something in nova that's talking to placement and getting a conflict.20:07
efriedthis is tempest talking directly to placement20:07
artomAnd that's bad?20:07
efriedit means there's likely an opportunity to harden the tempest code so this failure doesn't happen anymore.20:08
* efried clones tempest for possibly the second time ever...20:08
artom*facepalm*, oh just got what you're saying20:08
artomTempest shouldn't be deleting allocations itself, Nova should be doing it20:09
artomI think that response is from Nova though20:09
*** altlogbot_2 has quit IRC20:10
efriedmmm, yes, it looks like you're right.20:11
artomThat test is really confusing20:11
efriedif the error is coming from nova, it means we have a bug in nova.20:13
*** altlogbot_1 has joined #openstack-nova20:13
artomWell, looks like tempest is creating a server, not waiting for it to become ACTIVE, then immediately deleting it20:16
artomSo I suspect that makes Nova hit a race between the build process and the delete request20:16
cdentartom: that sounds right20:16
artomShort term that can be "fixed" by making Tempest wait20:17
artomLonger term I guess we'll need to stick a lock in Nova somewhere20:17
cdentplacement log entries are near http://logs.openstack.org/48/670848/1/check/neutron-tempest-dvr/ed2b81c/controller/logs/screen-placement-api.txt.gz#_Jul_15_17_27_35_28372020:18
cdenthandling for the req id starts here http://logs.openstack.org/48/670848/1/check/neutron-tempest-dvr/ed2b81c/controller/logs/screen-n-cpu.txt.gz#_Jul_15_17_27_33_96826420:20
artomActually from the little I know about placement it should handle those kinds of races with the generation thing, right?20:20
artomSo maybe Nova just needs to handle Placement error better20:20
efriedthe generation thing is exactly what's happening here20:20
efriedalloc deletion was specifically set up so that you couldn't race deletion with some other consumer op20:20
efriedso this guard is doing exactly what it's supposed to, and it's the overarching operation that's broken. As you say, trying to delete while creating.20:21
cdent"instance disappeared during build" http://logs.openstack.org/48/670848/1/check/neutron-tempest-dvr/ed2b81c/controller/logs/screen-n-cpu.txt.gz#_Jul_15_17_27_34_80692520:21
efriedonly question is where to fix it.20:21
efriedit would be possible to redrive the alloc delete until it works. But that's still potentially racy if the other op is trying to create the alloc at the same time.20:22
efriedthere's a couple races here actually.20:23
artom@instance_state(ACTIVE) for delete :D20:23
efriedThe other one is if the deletion goes through before the allocation is even created20:23
efriedit will not raise an exception (it returns False, but the caller doesn't check that) so we'll end up with a leaked allocation20:25
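One way to picture the two races discussed above, as a hedged sketch rather than nova's or placement's actual API: `AllocationConflict`, `delete_allocation`, and `delete_once` are hypothetical names. Retrying addresses the generation conflict, and logging the False return makes the "deleted before the allocation existed" case visible instead of silently ignored. As efried notes, retrying alone is still racy if the build is creating the allocation at the same time.

```python
import logging

LOG = logging.getLogger(__name__)


class AllocationConflict(Exception):
    """Hypothetical: the consumer generation changed under us."""


def delete_allocation(delete_once, attempts=3):
    """Call delete_once() until it stops raising AllocationConflict.

    delete_once is a hypothetical callable that returns True if an
    allocation was deleted and False if there was nothing to delete.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            deleted = delete_once()
        except AllocationConflict:
            if attempt == attempts:
                # Out of retries: let the caller decide what to do.
                raise
            continue
        if not deleted:
            # Do not silently ignore this: the allocation may not exist
            # yet, and could be created (and leaked) after we return.
            LOG.warning("no allocation found to delete; it may leak later")
        return deleted
```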
cdentefried, artom if you end up creating a bug about this, please let me know what it is so I can follow along, tomorrow20:26
* cdent waves goodnight20:26
*** cdent has quit IRC20:26
artomefried, mind doing it? You have a better grasp of that stuff, and I'm off in 30 minutes anyways, daycare duty20:27
efriedack20:27
*** pcaruana has quit IRC20:29
*** slaweq has quit IRC20:31
*** dpawlik has quit IRC20:34
*** TxGirlGeek has quit IRC20:35
*** BjoernT has joined #openstack-nova20:37
*** eharney has joined #openstack-nova20:41
*** BjoernT_ has joined #openstack-nova20:41
*** BjoernT has quit IRC20:42
*** TxGirlGeek has joined #openstack-nova20:46
artom*snerk*20:48
artomhttps://bugs.launchpad.net/nova/+bug/183620420:49
openstackLaunchpad bug 1836204 in OpenStack Compute (nova) "The allocation of VGPU has race problem" [High,Triaged] - Assigned to Alex Xu (xuhj)20:49
artomI guess it hates blacks20:49
artom*shakes head* I'm really sorry20:49
*** xek has quit IRC20:52
*** artom has quit IRC21:07
*** whoami-rajat has quit IRC22:04
*** BjoernT_ has quit IRC22:07
*** luksky11 has quit IRC22:20
*** icarusfactor has joined #openstack-nova22:22
*** factor has quit IRC22:22
*** ircuser-1 has joined #openstack-nova22:23
*** factor has joined #openstack-nova22:32
*** factor has quit IRC22:33
*** icarusfactor has quit IRC22:34
*** tbachman has quit IRC22:50
*** tbachman has joined #openstack-nova22:52
efrieddansmith: Do you feel we need a bp/spec for claim_for_instance?22:57
efriedthere should be no API, db, object, conf, upgrade, or doc impacts22:58
*** tbachman has joined #openstack-nova22:58
*** tkajinam has joined #openstack-nova22:59
*** rcernin has joined #openstack-nova23:29
*** TxGirlGeek has quit IRC23:45
