Monday, 2020-04-20

*** brinzhang has joined #openstack-nova00:20
*** ociuhandu has joined #openstack-nova00:21
*** ociuhandu has quit IRC00:25
*** threestrands has joined #openstack-nova00:35
*** maohongbo has quit IRC00:46
*** slaweq has joined #openstack-nova00:50
*** brinzhang has quit IRC00:58
*** brinzhang has joined #openstack-nova01:01
*** brinzhang has quit IRC01:28
*** brinzhang has joined #openstack-nova01:29
*** sapd1 has quit IRC01:52
*** sapd1 has joined #openstack-nova02:11
*** ociuhandu has joined #openstack-nova02:26
*** ociuhandu has quit IRC02:36
*** ociuhandu has joined #openstack-nova02:37
*** ociuhandu has quit IRC02:42
*** huaqiang has joined #openstack-nova02:42
*** ociuhandu has joined #openstack-nova02:55
*** jhesketh has joined #openstack-nova02:56
openstackgerritQitao proposed openstack/nova master: Use unittest.mock instead of third party mock  https://review.opendev.org/72114502:59
*** ociuhandu has quit IRC03:05
*** kevinz has joined #openstack-nova03:07
*** ircuser-1 has quit IRC03:09
*** mkrai has joined #openstack-nova03:15
*** psachin has joined #openstack-nova03:37
*** ociuhandu has joined #openstack-nova03:48
*** ociuhandu has quit IRC03:53
*** ratailor has joined #openstack-nova04:21
*** evrardjp has quit IRC04:23
*** udesale has joined #openstack-nova04:38
*** ratailor has quit IRC04:38
*** mkrai has quit IRC04:44
*** ratailor has joined #openstack-nova04:44
*** mkrai has joined #openstack-nova04:45
*** tkajinam has quit IRC05:12
*** tkajinam has joined #openstack-nova05:13
*** iokiwi has quit IRC05:16
*** iokiwi has joined #openstack-nova05:17
*** yaawang_ has quit IRC05:18
*** yaawang_ has joined #openstack-nova05:20
*** mkrai has quit IRC05:41
*** evrardjp has joined #openstack-nova05:41
*** mkrai has joined #openstack-nova05:42
*** vishalmanchanda has joined #openstack-nova05:46
*** dpawlik has joined #openstack-nova06:09
*** nightmare_unreal has joined #openstack-nova06:32
*** ociuhandu has joined #openstack-nova07:00
*** maciejjozefczyk has joined #openstack-nova07:02
*** ttsiouts has joined #openstack-nova07:11
*** rpittau|afk is now known as rpittau07:12
*** tesseract has joined #openstack-nova07:12
*** breizhkoala has joined #openstack-nova07:13
*** ttsiouts has quit IRC07:14
*** ttsiouts_ has joined #openstack-nova07:14
*** threestrands_ has joined #openstack-nova07:18
*** ralonsoh has joined #openstack-nova07:18
*** threestrands_ has quit IRC07:18
*** threestrands has quit IRC07:21
bauzasgood morning Nova07:24
gibigood morning Nova07:24
lyarwoodmorning morning07:25
*** tobias-urdin has joined #openstack-nova07:27
*** amodi has quit IRC07:45
*** udesale_ has joined #openstack-nova07:47
*** udesale has quit IRC07:50
lyarwoodhttps://review.opendev.org/#/c/718964/ & https://review.opendev.org/#/c/701756/ could use stable core review so I can cut the release ahead of RC if anyone has time today btw07:56
*** ratailor has quit IRC08:05
*** xek has joined #openstack-nova08:05
*** ratailor has joined #openstack-nova08:05
*** ociuhandu has quit IRC08:07
*** ratailor has quit IRC08:07
*** ratailor has joined #openstack-nova08:07
*** ratailor has quit IRC08:10
openstackgerritLee Yarwood proposed openstack/nova master: WIP nova-next: Start testing the q35 machine type  https://review.opendev.org/70870108:10
*** ccamacho has joined #openstack-nova08:12
*** mkrai has quit IRC08:15
*** mkrai_ has joined #openstack-nova08:15
*** ratailor has joined #openstack-nova08:29
*** ratailor_ has joined #openstack-nova08:38
*** ratailor has quit IRC08:42
*** ociuhandu has joined #openstack-nova08:43
bauzaslyarwood: any idea why https://review.opendev.org/699291 is not backported ? https://review.opendev.org/#/q/topic:bug/1855927+(status:open+OR+status:merged)08:48
*** ociuhandu has quit IRC08:48
*** ttsiouts_ has quit IRC08:48
bauzaslyarwood: ah, nevermind, found the backport https://review.opendev.org/#/c/701756/208:49
bauzassomething is actually weird with those changes, trying to untangle it08:52
*** ttsiouts has joined #openstack-nova08:53
bauzasah-ah, okay, got it08:54
bauzashttps://review.opendev.org/#/q/topic:bug/1856925+(status:open+OR+status:merged) vs. https://review.opendev.org/#/q/topic:bug/1855927+%28status:open+OR+status:merged%2908:54
*** vishalmanchanda has quit IRC08:56
*** tosky has joined #openstack-nova09:02
lyarwoodbauzas: sorry was head down on something downstream09:02
* lyarwood reads09:02
bauzaslyarwood: no worries, commenting with a +109:02
* bauzas sorted out all this mess09:02
lyarwoodright because https://review.opendev.org/#/c/699291/ hasn't landed yet09:03
lyarwoodsorry09:03
lyarwooddoes anyone know if we dump the compute service version somewhere in the logs during n-cpu startup?09:04
* lyarwood is trying to debug a weird stable/queens issue downstream and assumes there's an older compute in the env but can't prove it09:04
bauzaslyarwood: I just proposed to provide another revision of https://review.opendev.org/#/c/701756/2 that would be in a separate branch and just isolated09:05
toskylyarwood: you can check the package version, I guess09:06
bauzasbecause that's confusing09:06
*** ratailor has joined #openstack-nova09:06
bauzaslyarwood: AFAIR we do provide the compute service version, lemme grab you the logs09:06
*** ratailor_ has quit IRC09:06
*** psachin has quit IRC09:09
bauzaslyarwood: you can get the compute package version here which allows you to get the compute service version by looking at the code https://zuul.opendev.org/t/openstack/build/96486cc66f7542318bbcfc4a43784d56/log/controller/logs/screen-n-cpu.txt#86109:10
*** avolkov has joined #openstack-nova09:11
lyarwoodbauzas: do we not dump anything in the API or scheduler about the compute versions they are aware of09:12
lyarwoodbauzas: I'm specifically trying to find the service version, not the package version btw.09:12
lyarwoodbauzas: https://github.com/openstack/nova/blob/stable/queens/nova/compute/api.py#L3976-L3998 for this09:12
bauzaslyarwood: yup, I understood your question09:12
lyarwoodbauzas: I'm thinking that there's an older compute still registered somewhere that's causing that to use the old legacy path09:13
bauzasbut AFAIR, we don't expose the service versions, just the package versions09:13
lyarwoodkk thanks09:14
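A minimal sketch of the situation lyarwood suspects above: if the API picks its code path from the minimum version across all registered nova-compute service records, one stale record for a dead compute is enough to keep it on the legacy path. Everything here (the ServiceRecord shape, LEGACY_CUTOFF, the helper names) is hypothetical and only illustrates the idea; it is not the code at the linked stable/queens lines.

```python
from dataclasses import dataclass


@dataclass
class ServiceRecord:
    """Hypothetical stand-in for one row of the services table."""
    host: str
    binary: str
    version: int
    deleted: bool = False


# Hypothetical cutoff: computes below this need the legacy code path.
LEGACY_CUTOFF = 24


def _registered_computes(services):
    """Non-deleted nova-compute records, whether the service is up or not."""
    return [s for s in services if s.binary == "nova-compute" and not s.deleted]


def minimum_compute_version(services):
    """Lowest version among registered nova-compute records (0 if none)."""
    versions = [s.version for s in _registered_computes(services)]
    return min(versions) if versions else 0


def use_legacy_path(services):
    # A single old record drags the minimum below the cutoff.
    return minimum_compute_version(services) < LEGACY_CUTOFF


def stale_computes(services):
    """Hosts whose compute record is older than the cutoff."""
    return sorted(s.host for s in _registered_computes(services)
                  if s.version < LEGACY_CUTOFF)
```

Returning the offending hosts, not just the boolean, is what makes this useful for debugging: it points at the record that needs cleaning up, which fits lyarwood's later point that the DB may be the only real way to tell.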
bauzaslyarwood: this being said, the version field of the Service object is maybe expose thru the REST API09:15
bauzasexposed*09:15
* bauzas goes looking at the api-ref09:16
bauzasmeh, nvm09:16
bauzasthis would be a nova-manage thing09:17
*** psachin has joined #openstack-nova09:17
bauzasat least we don't return it to the API https://docs.openstack.org/api-ref/compute/?expanded=list-compute-services-detail#id36809:18
*** ociuhandu has joined #openstack-nova09:18
bauzaslyarwood: I think I found something interesting for your problem09:21
*** dtantsur|afk is now known as dtantsur09:21
bauzaslyarwood: the logs can emit some information about an old service when it starts https://github.com/openstack/nova/blob/master/nova/service.py#L72-L8109:22
*** ratailor has quit IRC09:23
*** ratailor has joined #openstack-nova09:23
*** gregwork has quit IRC09:24
*** rcernin has quit IRC09:26
*** ociuhandu has quit IRC09:32
*** ociuhandu has joined #openstack-nova09:33
nightmare_unrealhello, is there a way to get nova CLI output in JSON? e.g. openstack server list -f json gives output in JSON, but there is no such option (-f/--format) in the nova CLI09:34
gibibauzas, lyarwood: the version notifications contain the service version https://github.com/openstack/nova/blob/master/doc/notification_samples/common_payloads/ServiceStatusPayload.json#L1209:38
bauzasgibi: ah good point, forgot it09:39
lyarwoodgibi: ah! thanks09:39
*** ociuhandu has quit IRC09:39
lyarwoodgibi: but that's only for active services right?09:39
openstackgerritSylvain Bauza proposed openstack/nova master: Allocate mdevs when resizing or reverting resize  https://review.opendev.org/71274109:39
* bauzas wonders if he was drunk when he uploaded his last bit of https://review.opendev.org/#/c/712741/09:39
lyarwoodgibi: I'm assuming these computes are dead tbh so maybe the db is the only real way to tell09:40
bauzasI had to modify again something I fixed locally09:40
bauzasfor some reason, my editor provided me a stale version of an old file09:40
bauzasso my last commit reverted another fix I made09:41
bauzasstrange...09:41
bauzas(I have to be cautious now)09:41
bauzasAtom--09:41
gibilyarwood: if you can interact with the service via the REST API then you can get the above notification09:44
gibilike when you disable it09:44
lyarwoodack thanks09:45
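A minimal sketch of reading the service version out of the notification gibi points to, assuming the payload is a versioned-notification dict with its fields under nova_object.data (as in the linked ServiceStatusPayload sample). The function names are hypothetical. As lyarwood notes, this only helps for services that still emit notifications, not for dead computes.

```python
def versions_from_payloads(payloads):
    """Map host -> service version for nova-compute ServiceStatusPayloads."""
    versions = {}
    for payload in payloads:
        # Versioned notification payloads nest their fields here.
        data = payload["nova_object.data"]
        if data.get("binary") == "nova-compute":
            versions[data["host"]] = data["version"]
    return versions


def lowest_version_host(payloads):
    """Return (host, version) of the oldest compute seen, or None."""
    versions = versions_from_payloads(payloads)
    if not versions:
        return None
    host = min(versions, key=versions.get)
    return host, versions[host]
```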
bauzasgibi: lyarwood: I don't think it's a crucial and secret information bit to expose the service object version thru the logs09:46
lyarwoodbauzas: for admins no I guess not09:47
gibibauzas: agree, this is not a secret for admins09:48
bauzasjust sayin', we could log something there https://github.com/openstack/nova/blob/stable/queens/nova/service.py#L166 (for DEBUG purposes)09:48
gibisure09:48
bauzaswe already do https://github.com/openstack/nova/blob/stable/queens/nova/service.py#L15809:48
bauzaslyarwood: feel free to file a patch ;)09:49
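A minimal sketch of the kind of DEBUG line bauzas suggests adding at service start-up, using only the standard logging module. The function name and message wording are hypothetical; in nova the value would presumably come from the service version constant (assumed here to be SERVICE_VERSION in nova/objects/service.py) rather than a parameter.

```python
import logging

LOG = logging.getLogger("nova.service")


def log_service_version(binary, host, version):
    # DEBUG level: handy when hunting for stale computes, silent otherwise.
    # Lazy %-style args: the message is only formatted if DEBUG is enabled.
    LOG.debug("Starting %s on %s with service version %d",
              binary, host, version)
```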
*** rcernin has joined #openstack-nova09:51
*** vishalmanchanda has joined #openstack-nova09:53
brinzhanggood morning all09:57
brinzhanggibi: I am not sure which is true, but I replied something in https://review.opendev.org/#/c/720670/09:58
brinzhangIndeed, the nova-cinder and nova-neutron interactions give different results if the xxxclient raises an exception09:59
*** mkrai has joined #openstack-nova10:00
*** ratailor has quit IRC10:00
*** ratailor has joined #openstack-nova10:02
*** mkrai_ has quit IRC10:02
bauzasstephenfin: gibi: fwiw, we said we could let it go https://review.opendev.org/#/c/712741/10:03
bauzas(don't leave it frozen)10:03
bauzasokay, it was a terrible play of words10:04
bauzas=> []10:04
brinzhangbauzas: a nit inline ^10:06
bauzasbrinzhang: can't see your comment :)10:07
gibibauzas: I will reply after lunch10:07
brinzhangbauzas: my network is so slow..10:07
bauzasthanks migi and brinzhang10:07
gibibrinzhang: same to you10:07
bauzasoh shit, wrong mix10:07
brinzhanggibi: thanks10:08
bauzasthanks gibi10:08
bauzasmigi, gibi, dammit10:08
*** ociuhandu has joined #openstack-nova10:08
bauzasah, sad, he's not connected here10:08
brinzhangbauzas: done10:08
bauzasbrinzhang: ack, good, can respin since Zuul hasn't replied yet10:09
openstackgerritSylvain Bauza proposed openstack/nova master: Allocate mdevs when resizing or reverting resize  https://review.opendev.org/71274110:11
*** breizhkoala has quit IRC10:13
*** rpittau is now known as rpittau|bbl10:15
brinzhangbauzas: thanks for the quick update ^10:16
brinzhangbauzas: stephenfin working on change "import mock" to "from unittest import mock", https://review.opendev.org/#/c/714676/310:16
brinzhangbauzas: does this [1] need to change? Or should we wait for this to merge, then have stephenfin update that patch? [1] https://review.opendev.org/#/c/712741/6/nova/tests/functional/libvirt/test_vgpu.py@1710:17
bauzasI honestly think this is a rathole :)10:17
brinzhangbauzas: yeah,  I think so10:19
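For reference, the switch being discussed is mostly a one-line import change: unittest.mock (in the standard library since Python 3.3) offers essentially the same API as the third-party mock package. A minimal sketch; the Driver class is hypothetical.

```python
# Before (third-party package, an extra test requirement):
#   import mock
# After (standard library, same patch/assert API):
from unittest import mock


class Driver:
    """Hypothetical class standing in for code under test."""

    def spawn(self):
        return self._create_domain()

    def _create_domain(self):
        raise RuntimeError("would talk to libvirt")


def test_spawn_uses_create_domain():
    # patch.object swaps the method for a MagicMock only inside the block.
    with mock.patch.object(Driver, "_create_domain",
                           return_value="dom") as fake:
        assert Driver().spawn() == "dom"
        fake.assert_called_once_with()
```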
brinzhangbauzas: https://review.opendev.org/#/c/712741/6/nova/virt/libvirt/driver.py@10111 does this need someone to check? I saw you added a ? in it10:20
brinzhangothers looks good to me10:22
*** ociuhandu has quit IRC10:22
bauzasbrinzhang: not sure I understand your question ?10:32
bauzasbrinzhang: do you mean that the comment is confusing ?10:33
bauzasb/c it's a question ?10:33
brinzhangyes,10:33
brinzhangthat should be a note, right?10:33
bauzasahah, no, it's just something like 'verify if we need to assign some mdevs"10:34
bauzasif it was a question, it would be a FIXME or a TODO10:34
brinzhangbauzas: ah, yes, I think you missed a TODO or FIXME tag10:35
bauzasbrinzhang: again, no10:36
bauzasit wasn't a question for others10:36
brinzhangYou only put a question there with no extra detail; I think the detail needs to be added, doesn't it?10:37
bauzasbrinzhang: it's not really a question for others, it's just explain what the method does10:38
bauzasit just explains* sorry10:38
brinzhangit's ok, it really confuses me, maybe I should take it seriously.10:39
brinzhangthanks bauzas ^10:39
bauzasI can provide a FUP if you want10:41
*** derekh has joined #openstack-nova10:54
gibibrinzhang: responded in https://review.opendev.org/#/c/72067010:56
gibibauzas: I have started looking at https://review.opendev.org/#/c/712741/ just now10:57
brinzhanggibi: I think your point makes sense, agreed, thanks11:00
brinzhangthis is an invalid bug11:03
*** songwenping has joined #openstack-nova11:17
*** ttsiouts has quit IRC11:20
*** tkajinam has quit IRC11:21
songwenpinggibi: Hi. I am working on the nova-cyborg interaction now, and submitted this patch https://review.opendev.org/#/c/720670/11:22
gibisongwenping: hi11:22
songwenpingWe don't show the ARQ ID in the dashboard yet.11:22
songwenpingBut I think we will show it like a cinder volume.11:23
songwenpingSo should we handle the cyborg exception after showing it?11:24
gibisongwenping: still, expecting the end user to _know_ where and how to clean up after a seemingly _successfull_ server delete operations feels bad11:24
gibibasically after every server delete the end user would need to check the cyborg API to know if the ARQs are freed up or not11:24
gibiI don't like that11:25
brinzhanggibi, songwenping: agree with gibi, if many resources are leaked in Cyborg, it will be a lot of work to clean up11:26
brinzhangbut the Cinder logic has the same issue.11:26
brinzhangmaybe we should have some logic to deal with this, or it should be dealt with in Cinder and/or Cyborg11:27
*** sean-k-mooney has joined #openstack-nova11:29
*** ttsiouts has joined #openstack-nova11:30
songwenpinggibi: Yeah, leaking many resources in the system is indeed a problem. I just want to stay consistent with the cinder logic.11:33
gibisongwenping: what is the use case you want to solve? you mentioned deploy and undeploy cyborg. there I think before undeploy the admin needs to clean up the cyborg users. Also mentioned failure in cyborg. If that failure is intermittent (e.g service restart or network interrupt) then I think end user needs to retry the delete. if the cyborg failure is static then that is a cyborg but to be fixed11:36
gibis/but/bug/11:37
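A minimal sketch of the two delete behaviours being debated, assuming a hypothetical delete_arqs callable and AcceleratorServiceError exception (not nova's or cyborg's real names). The first logs and carries on, the log-and-continue approach compared to the cinder path above; the second lets the failure surface so the user can retry the delete, which is what gibi argues for.

```python
import logging

LOG = logging.getLogger(__name__)


class AcceleratorServiceError(Exception):
    """Hypothetical error raised when the accelerator API call fails."""


def delete_server_log_and_continue(server, delete_arqs):
    # The server is marked deleted even if its ARQs were not freed,
    # so they leak and the user has to find and clean them up by hand.
    try:
        delete_arqs(server["uuid"])
    except AcceleratorServiceError:
        LOG.warning("Failed to delete ARQs for server %s", server["uuid"])
    server["deleted"] = True


def delete_server_propagate(server, delete_arqs):
    # The delete fails visibly; after an intermittent problem (service
    # restart, network blip) the user simply retries the delete.
    delete_arqs(server["uuid"])
    server["deleted"] = True
```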
*** mkrai has quit IRC11:37
*** sapd1 has quit IRC11:42
*** songwenping_ has joined #openstack-nova11:42
songwenping_gibi: I want to solve the second use case.11:43
*** songwenping has quit IRC11:44
brinzhanggibi: I give you use case from my customer11:45
brinzhangs/use case/ a use case11:45
brinzhanggibi: Due to the system upgrade, the cyborg service cannot be started. If the user wants to clear the instance that contains ‘accel: device_profile_name’ in the flavor, the instance will be in an error state and cannot release scarce resources such as GPU and FPGA. If that is the user's only resource, it may also be considered for manual cleaning. This is common for small customers.11:45
brinzhangOf course, this is a scene of its edge.11:47
gibibrinzhang: so during an upgrade some of the control plane services are still up (e.g. nova) but some of them are down (e.g cyborg)11:47
brinzhangThis may be a treatment, but it is not so perfect.11:48
gibibut if cyborg is down then who the user could ever free up FPGA resources?11:48
gibis/who/how/11:48
gibialso even if it is freed up it cannot be used again as cyborg is down11:49
brinzhangmaybe that can be done in the db, this is perhaps the worst case11:49
*** dpawlik has quit IRC11:50
*** dpawlik has joined #openstack-nova11:50
gibisorry, but in my mind, if the cyborg service is down then the user cannot and should not do anything with resources managed by cyborg11:51
brinzhanggibi: I just put forward such a scenario, and I agree with you, your reasoning is right11:51
gibithen we agree that this use case is not valid :)11:52
songwenping_agree with gibi.11:52
*** dpawlik has quit IRC11:52
*** dpawlik has joined #openstack-nova11:53
brinzhangyes, but I think I also need to think about how to deal with this scenario; we did encounter this situation.11:53
*** dpawlik has quit IRC11:54
*** dpawlik has joined #openstack-nova11:54
brinzhanganother way: power off the server, migrate its instance, then redeploy OpenStack11:55
brinzhang in a new region11:55
*** ociuhandu has joined #openstack-nova11:57
gibihonestly I don't see why does your deployment need to support manipulating FPGAs while cyborg service is doewn11:57
gibidown11:57
*** belmoreira has joined #openstack-nova12:01
brinzhangyes, it doesn't make sense. Goodbye gibi, hope you have a good day ^12:01
*** ociuhandu has quit IRC12:02
*** rpittau|bbl is now known as rpittau12:03
gibibrinzhang: have a nice afternoon12:05
*** ttsiouts has quit IRC12:10
bauzasgibi: ack thanks12:12
bauzasbrinzhang: if you want, ping me tomorrow for discussing about something about reviews12:12
bauzasbrinzhang: (UTC+2 here)12:12
*** ttsiouts has joined #openstack-nova12:12
*** ttsiouts has quit IRC12:17
*** jangutter has joined #openstack-nova12:18
*** ratailor has quit IRC12:20
*** ttsiouts has joined #openstack-nova12:22
*** derekh has quit IRC12:24
*** ttsiouts has quit IRC12:27
*** ttsiouts has joined #openstack-nova12:32
*** udesale_ has quit IRC12:34
gibibauzas: will there be a FUP for the comments from lyarwood in https://review.opendev.org/#/c/712118/ ?12:40
bauzasgibi: haven't seen them yet12:40
bauzasgibi: FWIW, in https://review.opendev.org/#/c/712118/ like I said in a comment, this change is no longer needed for https://review.opendev.org/#/c/712741/12:41
*** Luzi has joined #openstack-nova12:42
gibibauzas: will you then remove it from the series?12:42
bauzasgibi: for the comment nits, sure I can do it in a FUP12:42
bauzasgibi: we can merge it since I already provided a ML thread for out-of-tree drivers maintainers12:42
gibibauzas: yes, comment nits are totally FUPable but if the whole patch is not needed then it is even better12:42
bauzasgibi: or wait, I'll rebase this one on top of https://review.opendev.org/#/c/712741/ and just provide a new revision for the nits12:43
bauzaswill be done in 1 min12:43
gibibut then the allocation would be an unused param12:43
gibiisn't it?12:43
*** ttsiouts has quit IRC12:44
*** ttsiouts has joined #openstack-nova12:45
bauzasgibi: for finish_revert_migration() yes12:46
bauzasgibi: to clarify, I'll just provide the new series12:46
bauzasand people can discuss on the opportunity to merge https://review.opendev.org/#/c/712118/ if nothing uses the new param or not12:46
gibibauzas: OK, I will check the new series12:47
bauzasgibi: should be done in 5 mins, just verifying unittests and functests because of a minor merge conflict12:47
gibicool12:49
*** priteau has joined #openstack-nova12:50
*** derekh has joined #openstack-nova12:59
francoispgibi hello, we would need an external reviewer (outside of RH) to check on https://review.opendev.org/#/c/669674/ , would you have time to have a look?13:00
bauzasgibi: excellent concern FWIW https://review.opendev.org/#/c/712741/6/nova/tests/functional/libvirt/test_vgpu.py@4513:00
*** artom has joined #openstack-nova13:00
gibifrancoisp: based on a recent agreement you only need to keep the two company rule for high impact changes, anything that13:03
gibiinvolves a microversion, service version, rpc version, or database13:03
gibimigration.13:03
gibifrancoisp: but sure I will look at that bugfix13:03
gibibauzas: honestly I failed to prove that it can actually cause any problem, but if you can add something to the setUp to reset the test object level variable that could scratch my itch13:04
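A minimal sketch of what gibi is asking for: state stored on a class is shared by every test that touches it, so rebinding it in setUp guarantees each test starts clean whatever order the tests run in. The class and test names are hypothetical, not the ones in test_vgpu.py.

```python
import unittest


class FakeMdevTracker:
    # Class-level state: shared by all instances and survives between tests.
    created = []


class VGPUResizeTest(unittest.TestCase):
    def setUp(self):
        super().setUp()
        # Rebind to a fresh list so no test sees another test's leftovers.
        FakeMdevTracker.created = []

    def test_records_mdev(self):
        FakeMdevTracker.created.append("mdev-1")
        self.assertEqual(["mdev-1"], FakeMdevTracker.created)

    def test_starts_empty(self):
        # Would fail after test_records_mdev without the reset in setUp.
        self.assertEqual([], FakeMdevTracker.created)
```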
openstackgerritSylvain Bauza proposed openstack/nova master: Allocate mdevs when resizing or reverting resize  https://review.opendev.org/71274113:04
openstackgerritSylvain Bauza proposed openstack/nova master: Pass allocations to virt drivers when reverting resize  https://review.opendev.org/71211813:04
francoispthanks very much gibi13:05
gibifrancoisp: for reference http://lists.openstack.org/pipermail/openstack-discuss/2020-March/013553.html13:05
*** irclogbot_0 has quit IRC13:06
lyarwoodgibi: I asked for additional review outside of RH in francoisp's case as that change impacts all callers to cinder across all virt drivers.13:07
francoispok thanks gibi, that makes sense, otherwise you would get overwhelmed13:07
gibilyarwood: I see that is reasonable13:07
*** irclogbot_0 has joined #openstack-nova13:08
*** mriedem has joined #openstack-nova13:09
*** kevinz has quit IRC13:09
*** kevinz has joined #openstack-nova13:14
bauzasgibi: if you don't mind reapplying your +2 on the vgpu resize change given the only change was due to a merge conflict resolution https://review.opendev.org/#/c/712741/6..713:22
*** ttsiouts has quit IRC13:22
* bauzas goes working on the prelude section13:22
*** lbragstad has quit IRC13:24
*** lbragstad has joined #openstack-nova13:27
gibibauzas: done. and thanks for writing the prelude13:27
bauzasta13:27
*** ociuhandu has joined #openstack-nova13:30
gibifrancoisp, lyarwood: +A-d the cinder retry13:31
*** ttsiouts has joined #openstack-nova13:32
francoispok great, thank you gibi13:32
lyarwoodyup thanks gibi13:32
*** psachin has quit IRC13:34
*** ociuhandu has quit IRC13:35
*** ttsiouts has quit IRC13:37
*** lbragstad_ has joined #openstack-nova13:37
openstackgerritjayaditya gupta proposed openstack/nova master: Support for --force flag for nova-manage placement heal_allocations command  https://review.opendev.org/71539513:37
*** lbragstad has quit IRC13:39
*** hoonetorg has quit IRC13:40
*** ttsiouts has joined #openstack-nova13:42
*** ttsiouts has quit IRC13:45
*** ttsiouts has joined #openstack-nova13:45
nightmare_unrealmriedem: thanks for the review :) I have made the changes you suggested, but I am still facing 1 issue. It seems the allocated RAM (bogus RAM) won't change whether you call heal_allocations with or without the force flag :/ I have added comments for it https://review.opendev.org/#/c/715395/1013:47
nightmare_unrealThanks13:47
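A minimal sketch of the behaviour the --force flag is meant to add, and of the kind of test mriedem asked for: without the flag, an instance that already has allocations is skipped; with it, allocations are recomputed from the flavor and overwritten, which is what should replace the bogus RAM value. The dict standing in for placement and all names here are hypothetical, not nova-manage's code.

```python
def heal_allocations(placement, instances, force=False):
    """Return the uuids whose allocations were (re)written.

    placement: hypothetical {instance_uuid: {resource_class: amount}} dict.
    """
    healed = []
    for inst in instances:
        if placement.get(inst["uuid"]) and not force:
            continue  # default: leave existing allocations alone
        # Recompute from the flavor; with force this overwrites bogus values.
        placement[inst["uuid"]] = {
            "VCPU": inst["flavor"]["vcpus"],
            "MEMORY_MB": inst["flavor"]["memory_mb"],
        }
        healed.append(inst["uuid"])
    return healed
```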
*** eharney has joined #openstack-nova13:52
*** tkajinam has joined #openstack-nova13:53
-openstackstatus- NOTICE: Zuul is temporarily offline; service should be restored in about 15 minutes.13:59
*** hoonetorg has joined #openstack-nova13:59
*** ociuhandu has joined #openstack-nova14:00
*** mkrai has joined #openstack-nova14:03
gmannmelwitt: stephenfin gibi seems like johnthetubaguy is not online. How should we proceed on these last bits to merge, as the 23rd is the hard string freeze - https://review.opendev.org/#/q/topic:bp/policy-defaults-refresh+status:open14:05
gibigmann: I'm on a call I will ping back in an hour14:06
gibibut overall I can try to spend 1 hour on those today14:06
gmanngibi: thanks14:08
*** sapd1 has joined #openstack-nova14:10
*** songwenping_ has quit IRC14:16
*** hongbin has joined #openstack-nova14:21
openstackgerritsean mooney proposed openstack/nova-specs master: move implemented spec for train  https://review.opendev.org/70627614:26
openstackgerritsean mooney proposed openstack/nova-specs master: move implemented spec for train  https://review.opendev.org/70627614:27
*** ociuhandu has quit IRC14:29
mriedemnightmare_unreal: that's the point of the feature, correct? if it's not working you're going to need to debug it.14:29
mriedembut that's why i asked for that kind of test14:30
nightmare_unrealyeaah14:31
*** dklyle has joined #openstack-nova14:31
*** ociuhandu has joined #openstack-nova14:35
openstackgerritsean mooney proposed openstack/nova-specs master: move implemented spec for ussuri  https://review.opendev.org/72127814:35
*** hemna_ has quit IRC14:35
*** ociuhandu has quit IRC14:40
kashyaplyarwood: Hey, do you have the reproducer for that 'q35' thing on Ubuntu?14:40
* kashyap goes to check the nova-next WIP job URL...14:41
kashyapAh, you updated this morning14:41
kashyapOkay, it looks like a nudge.  (Because, I don't see any 'diff' b/n 3..4: https://review.opendev.org/#/c/708701/3..4/.zuul.yaml)14:42
*** mlavalle has joined #openstack-nova14:57
*** tkajinam has quit IRC14:59
lyarwoodkashyap: sorry was hacking away on something downstream15:02
lyarwoodkashyap: I've just rebased that today to see if it still reproduces15:02
lyarwoodkashyap: I don't have anything written up, I only manually reproduced it before.15:02
kashyaplyarwood: No problem; I asked on the change15:02
kashyap(I don't count on instant responses :))15:02
*** mgariepy has joined #openstack-nova15:03
*** dpawlik has quit IRC15:07
*** sapd1 has quit IRC15:14
*** mkrai has quit IRC15:22
*** mkrai has joined #openstack-nova15:22
*** hemna_ has joined #openstack-nova15:23
*** hemna_ has quit IRC15:24
*** vishalmanchanda has quit IRC15:26
*** sapd1 has joined #openstack-nova15:26
*** hongbin has quit IRC15:28
gibigmann: I'm +2 on the remaining policy changes.15:33
*** gyee has joined #openstack-nova15:40
*** ttsiouts has quit IRC15:43
*** happyhemant has joined #openstack-nova15:45
*** amodi has joined #openstack-nova15:46
*** hoonetorg has quit IRC15:46
-openstackstatus- NOTICE: Gerrit will be restarted to correct a misconfiguration which caused some git mirrors to have outdated references.15:47
gmanngibi: thanks. Should I revise this per the comment, if you are online and can re-+2? - https://review.opendev.org/#/c/720129/715:48
*** ttsiouts has joined #openstack-nova15:51
*** hoonetorg has joined #openstack-nova15:59
*** KeithMnemonic has joined #openstack-nova16:00
gibigmann: if you respin it then I can re +216:00
gibigmann: but I might be slower during my evening16:01
gmanngibi: cool.  dojng16:01
gmanndoing16:01
gibicool16:01
*** ganso has quit IRC16:03
*** hongbin has joined #openstack-nova16:05
*** ganso has joined #openstack-nova16:05
*** rpittau is now known as rpittau|afk16:07
openstackgerritGhanshyam Mann proposed openstack/nova master: Add docs and releasenotes for BP policy-defaults-refresh  https://review.opendev.org/72012916:10
gmanngibi: ^^16:10
gibilooking16:10
openstackgerritBalazs Gibizer proposed openstack/nova master: Add docs and releasenotes for BP policy-defaults-refresh  https://review.opendev.org/72012916:12
gibijust fixed a missing verb  in the same sentence ^^16:12
gibibut +216:12
*** Luzi has quit IRC16:15
gmannthanks16:19
*** ttsiouts has quit IRC16:22
*** dtantsur is now known as dtantsur|afk16:32
*** yaawang has joined #openstack-nova16:33
*** yaawang_ has quit IRC16:33
*** evrardjp has quit IRC16:35
*** evrardjp has joined #openstack-nova16:35
*** ociuhandu has joined #openstack-nova16:36
*** ttsiouts has joined #openstack-nova16:38
*** csatari has quit IRC16:54
*** ttsiouts has quit IRC16:54
*** csatari has joined #openstack-nova16:55
*** ociuhandu has quit IRC16:56
*** hemna has joined #openstack-nova16:58
*** derekh has quit IRC17:03
*** sapd1 has quit IRC17:07
*** ociuhandu has joined #openstack-nova17:10
stephenfingibi: If you can hit https://review.opendev.org/717884 https://review.opendev.org/719100 and https://review.opendev.org/720042 then we're done with policy, afaict17:17
*** tesseract has quit IRC17:18
openstackgerritMerged openstack/nova stable/train: libvirt: Calculate disk_over_committed for raw instances  https://review.opendev.org/71896417:18
* stephenfin -> 🐕🚶17:18
*** priteau has quit IRC17:19
*** ociuhandu has quit IRC17:22
*** portdirect has quit IRC17:30
*** portdirect has joined #openstack-nova17:30
*** ttsiouts has joined #openstack-nova17:34
openstackgerritMerged openstack/nova master: Add retry to cinder API calls related to volume detach  https://review.opendev.org/66967417:36
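A minimal sketch of the retry idea behind the merged change, using only the standard library. It is not the change's code: the decorator, TransientAPIError and the attempt/delay values are assumptions, and the real change may retry on different exceptions or use a different helper.

```python
import functools
import time


class TransientAPIError(Exception):
    """Hypothetical stand-in for a retryable API failure."""


def retry(attempts=3, delay=1.0, retry_on=(TransientAPIError,)):
    """Call the wrapped function up to `attempts` times, sleeping between tries."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, attempts + 1):
                try:
                    return func(*args, **kwargs)
                except retry_on:
                    if attempt == attempts:
                        raise  # out of attempts: surface the last error
                    time.sleep(delay)
        return wrapper
    return decorator
```

Only the listed exception types are retried; anything else propagates on the first attempt, so a genuine error (e.g. a missing volume) is not masked by repeated calls.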
*** ttsiouts has quit IRC17:39
*** ralonsoh has quit IRC17:39
*** evrardjp has quit IRC17:44
*** evrardjp has joined #openstack-nova17:49
*** billkgr has joined #openstack-nova17:52
*** maciejjozefczyk_ has joined #openstack-nova17:53
*** maciejjozefczyk has quit IRC17:53
*** happyhemant has quit IRC17:55
*** hemna has quit IRC17:58
*** hemna has joined #openstack-nova17:59
*** maciejjozefczyk_ has quit IRC18:01
*** ircuser-1 has joined #openstack-nova18:01
*** nightmare_unreal has quit IRC18:02
openstackgerritGhanshyam Mann proposed openstack/nova master: Fix the followup comment of policy doc  https://review.opendev.org/72132218:06
*** billkgr has quit IRC18:07
*** ttsiouts has joined #openstack-nova18:11
*** billkgr has joined #openstack-nova18:12
*** ociuhandu has joined #openstack-nova18:15
artomgmann, wait, are we trying to land that policy doc before RC?18:18
artomDidn't mean to sabotage that - but then my -1 carries less weight than gibi's +2, so :)18:19
*** ociuhandu has quit IRC18:20
*** ociuhandu has joined #openstack-nova18:20
*** hongbin has quit IRC18:21
gmannartom: yeah before RC. I am fixing your comment in follow up along with stephenfin comments -https://review.opendev.org/72132218:23
openstackgerritMerged openstack/nova master: Introduce scope_types in servers attributes Policies  https://review.opendev.org/71972918:24
*** mkrai has quit IRC18:24
*** mkrai_ has joined #openstack-nova18:24
*** ttsiouts has quit IRC18:26
*** ttsiouts has joined #openstack-nova18:26
*** hongbin has joined #openstack-nova18:26
*** hongbin has quit IRC18:27
openstackgerritMerged openstack/nova master: Add new default roles in servers attributes policies  https://review.opendev.org/71973018:33
openstackgerritMerged openstack/nova master: Add test coverage of existing remaining servers policies  https://review.opendev.org/72010418:34
openstackgerritMerged openstack/nova master: Introduce scope_types in remaining servers Policies  https://review.opendev.org/72010618:34
openstackgerritMerged openstack/nova master: Add new default roles in remaining servers policies  https://review.opendev.org/72011618:34
*** jrosser has quit IRC18:41
*** jrosser has joined #openstack-nova18:42
*** slaweq has quit IRC18:50
openstackgerritGhanshyam Mann proposed openstack/nova master: Fix the followup comment of policy doc  https://review.opendev.org/72132218:54
gmannstephenfin: artom i fixed the policy doc comment in followup, please check - https://review.opendev.org/#/c/721322/18:54
*** slaweq has joined #openstack-nova18:56
*** ociuhandu has quit IRC18:57
*** ociuhandu has joined #openstack-nova18:58
*** jdillaman has joined #openstack-nova19:00
*** slaweq_ has joined #openstack-nova19:02
*** ociuhandu has quit IRC19:03
*** slaweq has quit IRC19:05
*** ttsiouts has quit IRC19:06
*** belmoreira has quit IRC19:10
openstackgerritMerged openstack/nova master: Fix follow up comments on policy work  https://review.opendev.org/71783519:13
openstackgerritMerged openstack/nova master: Pass allocations to virt drivers when resizing  https://review.opendev.org/58908519:17
*** ttsiouts has joined #openstack-nova19:42
*** ttsiouts has quit IRC19:44
*** ttsiouts has joined #openstack-nova19:44
*** grandchild has joined #openstack-nova19:47
*** ttsiouts has quit IRC19:49
*** ociuhandu has joined #openstack-nova19:50
mnasersean-k-mooney: sorry to ping you here but I don't know what other channel to find you in -- happy to hear your thoughts on https://review.opendev.org/#/c/720107/3 :)19:56
sean-k-mooneymnaser: I'm usually in nova, neutron, placement, kolla and sometimes infra or oslo19:57
*** ociuhandu has quit IRC19:57
sean-k-mooneybut ya ill take a look now19:58
mnasersean-k-mooney: fair :) whois showed a lot less than those today :P19:58
sean-k-mooneywhois sean-k-mooney19:58
sean-k-mooneyit has a few, but ya, so, container images19:59
sean-k-mooneyfun19:59
sean-k-mooneyI'm not sure it's fair to describe kolla images as system images (e.g. LXC style), but they are not that lightweight either20:00
*** ccamacho has quit IRC20:01
zigoWhat's blocking this backport patch ? https://review.opendev.org/#/c/711233/20:04
zigoThe bug https://bugs.launchpad.net/nova/+bug/1788014 is causing real life troubles and a fix would be really nice.20:04
openstackLaunchpad bug 1788014 in OpenStack Compute (nova) rocky "when live migration fails due to a internal error rollback is not handeled correctly." [Medium,In progress] - Assigned to Elod Illes (elod-illes)20:04
zigoWe had all sorts of downtime due to it, and lots of head-scratching until we understood what was going on...20:05
melwittelod: question for your morrow ^20:08
*** ociuhandu has joined #openstack-nova20:10
*** grandchild has quit IRC20:13
*** ttsiouts has joined #openstack-nova20:18
*** ttsiouts has quit IRC20:28
*** ttsiouts has joined #openstack-nova20:28
*** ociuhandu has quit IRC20:30
*** billkgr has quit IRC20:31
*** xek has quit IRC20:33
*** ociuhandu has joined #openstack-nova21:03
*** ociuhandu has quit IRC21:08
*** jangutter has quit IRC21:16
*** avolkov has quit IRC21:31
*** dosaboy has quit IRC21:46
*** dosaboy has joined #openstack-nova22:02
*** ociuhandu has joined #openstack-nova22:04
*** rcernin has quit IRC22:05
*** rcernin has joined #openstack-nova22:06
*** mriedem has left #openstack-nova22:11
*** ociuhandu has quit IRC22:18
*** ociuhandu has joined #openstack-nova22:18
*** tosky has quit IRC22:19
*** ociuhandu has quit IRC22:23
*** ttsiouts has quit IRC22:27
*** abaindur has joined #openstack-nova22:34
*** jangutter has joined #openstack-nova22:39
*** tkajinam has joined #openstack-nova22:43
*** jangutter has quit IRC22:45
*** abaindur has quit IRC22:46
*** abaindur has joined #openstack-nova22:46
abaindurHello, I have a question about post-copy live migration. What happens if live_migration_permit_post_copy is only set in nova-compute on some hypervisors? Does it need to be the same across every host?22:47
sean-k-mooneymnaser: this is my counter-proposal https://review.opendev.org/#/c/720107/3/goals/proposed/container-images.rst@1422:47
*** ttsiouts has joined #openstack-nova23:04
*** ttsiouts has quit IRC23:09
sean-k-mooneyabaindur: I think it is based on the source node23:11
sean-k-mooneybut we don't test it, so it should be the same on all nodes, but it might work if it's different23:12
abaindurwould there be any issues if we migrated from a source host that had post-copy enabled to a destination host that didn't?23:12
abaindurwe want to give it a shot, but only wanted to run it on a subset of hypervisors23:12
abaindursean-k-mooney: one other question about live migration: the reason we are moving to post-copy is that we're seeing significant downtime (15-30+ seconds) during live migration. It always seems to start when the VM is Paused on the source/Resumed on the dest, and connectivity comes back shortly after the port-binding activate call is made and the port is plugged on the host23:14
abaindurWe thought that maybe giving post-copy a shot would help, since it would give us the benefit of this fix: https://opendev.org/openstack/nova/commit/1f48d3d83b4d5f6f9cd96ee06d2fc005635c1ff923:15
abaindurBut are there any known issues around pre-copy live migration? The bulk of the time seems to be taken up in the _post_copy_live_migration() function on the source host. For example, it took 18+ seconds from the start of that function until the port-binding activate call was sent to neutron23:16
abaindursorry, not _post_copy_live_migration(). I meant the _post_live_migration() function23:18
*** threestrands has joined #openstack-nova23:18
*** slaweq_ has quit IRC23:21
sean-k-mooneyabaindur: libvirt will check if the qemu and libvirt on each host support it23:22
sean-k-mooneyand only enable it if both do, I believe23:22
sean-k-mooneyso in principle I don't think it would have a negative effect; just be aware that you would see different behavior migrating to a host with it enabled vs migrating from a host with it enabled23:23
sean-k-mooneyI don't recall off the top of my head which config we check to enable it, but I believe it would have asymmetric behavior, as I think we only check one of them23:24
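A minimal sketch of the asymmetry described above. It assumes, as sean-k-mooney suspects but does not confirm, that only the source host's `[libvirt] live_migration_permit_post_copy` flag is consulted, while libvirt/QEMU on both hosts must support post-copy. `HostMigrationConfig` and `post_copy_used` are hypothetical names for illustration, not nova code.

```python
from dataclasses import dataclass


@dataclass
class HostMigrationConfig:
    """Hypothetical per-host view of the settings relevant to post-copy."""
    permit_post_copy: bool               # stands in for [libvirt] live_migration_permit_post_copy
    hypervisor_supports_post_copy: bool  # libvirt/QEMU capability on that host


def post_copy_used(source: HostMigrationConfig, dest: HostMigrationConfig) -> bool:
    """Decide whether a migration from source to dest may switch to post-copy.

    Assumption: only the source's config flag matters, but both hypervisors
    must support post-copy. The destination's flag is deliberately ignored.
    """
    return (
        source.permit_post_copy
        and source.hypervisor_supports_post_copy
        and dest.hypervisor_supports_post_copy
    )


enabled = HostMigrationConfig(permit_post_copy=True, hypervisor_supports_post_copy=True)
disabled = HostMigrationConfig(permit_post_copy=False, hypervisor_supports_post_copy=True)

# Same pair of hosts, opposite directions, different outcome.
print(post_copy_used(enabled, disabled))  # True
print(post_copy_used(disabled, enabled))  # False
```

Under this assumption, enabling the flag on only a subset of hypervisors means the outcome depends on the migration direction, which is the behavior difference to watch for when testing.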
*** slaweq_ has joined #openstack-nova23:25
sean-k-mooneyabaindur: for what it's worth, the port binding events should work with or without post-copy23:25
sean-k-mooney_post_live_migration is the function that cleans up the image on the source node and finishes any work required on the dest23:27
sean-k-mooneyif you are using a nova and neutron that do not support multiple port bindings, _post_live_migration_at_dest is where we will do the port binding23:28
sean-k-mooneybut in a nova that supports the neutron multiple port binding API, we will pre-bind the port on the source and activate it either in response to the live migration event or at the start of _post_live_migration23:29
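The multiple-port-binding flow can be pictured as a port carrying one binding per host, with exactly one of them ACTIVE at a time. This toy model (hypothetical `PortBindings` class, not the Neutron client API) assumes the destination binding is created INACTIVE ahead of the migration and that activating it deactivates the source binding; in Neutron's binding-extended API that step is the `PUT /v2.0/ports/{port_id}/bindings/{host}/activate` call.

```python
from enum import Enum


class BindingStatus(Enum):
    ACTIVE = "ACTIVE"
    INACTIVE = "INACTIVE"


class PortBindings:
    """Toy model of one port with a binding per host (not the Neutron client API)."""

    def __init__(self, source_host: str) -> None:
        # Before migration the port is bound, and active, on the source host only.
        self.bindings = {source_host: BindingStatus.ACTIVE}

    def create_inactive(self, host: str) -> None:
        # Pre-binding: the destination binding exists but does not carry traffic yet.
        self.bindings[host] = BindingStatus.INACTIVE

    def activate(self, host: str) -> None:
        # Activating one host's binding deactivates every other binding.
        if host not in self.bindings:
            raise KeyError(f"no binding for host {host!r}")
        for other in self.bindings:
            self.bindings[other] = BindingStatus.INACTIVE
        self.bindings[host] = BindingStatus.ACTIVE

    def active_host(self) -> str:
        return next(h for h, s in self.bindings.items() if s is BindingStatus.ACTIVE)


port = PortBindings("src-compute")
port.create_inactive("dest-compute")  # done ahead of the migration
print(port.active_host())             # src-compute
port.activate("dest-compute")         # the step whose timing is discussed here
print(port.active_host())             # dest-compute
```

The host names are placeholders. The point of the model is that the slow part of the migration is the gap before `activate` is called, not the binding creation itself.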
abainduryea, but we noticed that in pre-copy live-migration mode, the port binding activate call is sent towards the end of that function. And connectivity is disrupted when the VM is Paused.23:29
abaindurhttps://github.com/openstack/nova/blob/stable/rocky/nova/compute/manager.py#L693123:29
sean-k-mooneyya so are you using ovs with the ovs firewall driver by any chance23:29
*** slaweq_ has quit IRC23:29
abaindurand here is where the port binding activation call is invoked: https://github.com/openstack/nova/blob/stable/rocky/nova/compute/manager.py#L700523:29
sean-k-mooneyor are you using iptables23:29
abaindurbetween those 2 lines of code, we noticed nova taking 18+ seconds: 2 seconds for volume cleanup, a few more seconds for disconnection, 5 sec for get_instance_nw_info, 5 more sec for compute_utils.notify_about_instance_action, etc...23:30
abaindurOnce the port binding activate call came, another 3-5 seconds for the port to be plugged by the OVS agent23:31
abaindurwe're using iptables23:31
sean-k-mooneyon what release23:31
abaindurRocky23:31
sean-k-mooneythe iptables firewall should have less downtime than the openvswitch one23:31
sean-k-mooneyas we can pre-plug the port and have neutron wire it up while we are waiting for migration to happen at the libvirt level.23:32
sean-k-mooneywhen using the ovs firewall driver, because libvirt recreates the ovs port, it takes longer as neutron has to do it twice23:33
abainduras mentioned, we're seeing non-trivial downtime in that _post_live_migration() function on the src host, between the log "'_post_live_migration() is started.." (or when VM is Paused), and when neutron receives the port-binding /activate call at https://github.com/openstack/nova/blob/stable/rocky/nova/compute/manager.py#L700523:33
sean-k-mooneyabaindur: yes, but before the multiple port binding change it used to be even longer23:33
abaindurso is that downtime then to be expected or unavoidable? :(23:34
sean-k-mooneyno, it's avoidable23:34
sean-k-mooneyso this is the feature you are trying to use https://specs.openstack.org/openstack/nova-specs/specs/rocky/implemented/neutron-new-port-binding-api.html23:34
sean-k-mooneyI did not think this required post-copy, but let me double-check23:34
abaindurwell, that's why we were considering trying out post-copy - to see if it speeds up when nova makes the port binding call23:35
sean-k-mooneyok so yes if you want the quick setup we are waiting for VIR_DOMAIN_EVENT_SUSPENDED_POSTCOPY23:35
abaindurpost-copy seems to trigger that call based on VIR_DOMAIN_EVENT states23:35
abaindurpre-copy waits for that _post_live_migration() on the host to invoke network_api.migrate_instance_start(), which seems to take a while for us23:36
sean-k-mooneythat will allow us to activate the port binding from the source earlier than _post_live_migration23:36
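The difference between the two modes comes down to what triggers the activate call. A hedged sketch, using a hypothetical `MigrationEvent` enum as a simplification of the libvirt lifecycle events: in post-copy mode the suspended-for-post-copy event (`VIR_DOMAIN_EVENT_SUSPENDED_POSTCOPY`) can trigger activation early, while in pre-copy mode no event triggers it and activation waits for `_post_live_migration()` to call `network_api.migrate_instance_start()`. `activate_on_event` is illustrative, not nova's actual handler.

```python
from enum import Enum, auto


class MigrationEvent(Enum):
    """Hypothetical simplification of the libvirt lifecycle events seen by nova."""
    PAUSED = auto()               # guest paused for the final switch-over
    POSTCOPY_STARTED = auto()     # e.g. VIR_DOMAIN_EVENT_SUSPENDED_POSTCOPY
    MIGRATION_COMPLETED = auto()


def activate_on_event(event: MigrationEvent, post_copy: bool) -> bool:
    """Return True if this event alone should trigger activating the dest binding.

    Assumption: only the post-copy event triggers early activation; in pre-copy
    mode activation is left to _post_live_migration, so every event returns False.
    """
    return post_copy and event is MigrationEvent.POSTCOPY_STARTED


print(activate_on_event(MigrationEvent.POSTCOPY_STARTED, post_copy=True))  # True
print(activate_on_event(MigrationEvent.PAUSED, post_copy=False))           # False
```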
abaindurRight, that's why we will try post-copy mode. But was wondering if the downtime/delay we are seeing with pre-copy is expected or unavoidable?23:37
sean-k-mooneyabaindur: ya, so it is a little strange that your _post_live_migration function takes so long to complete23:37
sean-k-mooneyI would not expect _post_live_migration to take multiple seconds23:37
abaindurwhat are the downsides to post-copy besides the VM needing to be rebooted if there's a live migration error?23:38
abaindurand page faults may slow down the VM as memory needs to be copied over the network?23:39
*** ttsiouts has joined #openstack-nova23:39
sean-k-mooneythat is the main one. if there is a network outage while it's still in the post-copy phase, then the VM will crash23:39
sean-k-mooneyyes, page faults across the network might, but all writes happen locally23:39
sean-k-mooneyso the VM will only pause if it needs to read uncopied data23:39
sean-k-mooneyand once it's copied, local updates to it will happen in the dest memory23:40
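A toy model of the post-copy memory behavior discussed above: the guest already runs on the destination, touching a page that has not been copied yet stalls on a fetch from the source, and pages that are already local are updated in destination memory only. This is an illustration, not QEMU's implementation. Background page pushing is not modeled, and writes are treated as partial-page updates, so a missing page is fetched before it is modified (an assumption of this sketch). `PostCopyMemory` is a hypothetical name.

```python
class PostCopyMemory:
    """Toy model of guest memory on the destination during post-copy."""

    def __init__(self, source_pages: dict[int, bytearray]) -> None:
        self.source = source_pages               # pages not yet copied, still on the source
        self.local: dict[int, bytearray] = {}    # pages already present on the destination
        self.remote_faults = 0                   # accesses that had to wait on the network

    def _ensure_local(self, page: int) -> bytearray:
        if page not in self.local:
            # Page fault: the vCPU stalls while the page is fetched from the source.
            self.remote_faults += 1
            self.local[page] = bytearray(self.source.pop(page))
        return self.local[page]

    def read(self, page: int, offset: int) -> int:
        return self._ensure_local(page)[offset]

    def write(self, page: int, offset: int, value: int) -> None:
        # Once the page is local, updates touch destination memory only.
        self._ensure_local(page)[offset] = value


mem = PostCopyMemory({0: bytearray(4096), 1: bytearray(4096)})
mem.read(0, 10)          # first touch of page 0: one remote fault
mem.write(0, 10, 7)      # page 0 is local now: no fault
print(mem.read(0, 10))   # 7
print(mem.remote_faults) # 1
```

Every remote fault is a point where the guest depends on the source host still being reachable, which is why a network outage during this phase is fatal to the VM.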
sean-k-mooneyas you pointed out we are activating the port here https://github.com/openstack/nova/blob/stable/rocky/nova/compute/manager.py#L700523:40
sean-k-mooneywhich happens near the start of the function, so the delay is likely related to cinder performance23:40
sean-k-mooneyabaindur: have you tried live migrating VMs that don't have cinder volumes?23:41
abainduryea, we tried both volume-backed and ephemeral VMs23:41
sean-k-mooneydid you see the same delay?23:41
abaindurpretty much23:42
abaindurtimed the volume code with some logs of our own, here's what we observed:23:42
abaindur1.913 seconds for _get_instance_block_device_info and self.driver.post_live_migration23:42
abaindurabout 0.4 sec for self.driver.get_volume_connector(instance)23:42
abaindur2.659 seconds for self.volume_api.terminate_connection()23:42
abaindur3.28 seconds for network_info = self.network_api.get_instance_nw_info(ctxt, instance)23:42
abaindur5.019 seconds for self._notify_about_instance_usage(ctxt, instance, "live_migration._post.start", network_info=network_info)23:43
abainduranother almost 5 seconds nova spent just making the port binding activate API call, seems to be spending time in keystone and oslo_concurrency.lockutils code23:43
sean-k-mooneyis that absolute time or time for each function?23:43
abainduryea, we just added logs throughout that post_live_migration() function23:43
abaindurbefore/after each of those calls to see what was taking so long23:43
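The ad-hoc timing abaindur describes (logs before and after each call) can be wrapped in a small context manager. A generic sketch using the standard `logging` module rather than nova's own logging setup; `log_duration` and the label strings are hypothetical, and the `time.sleep` stands in for whichever call is being measured.

```python
import logging
import time
from contextlib import contextmanager

LOG = logging.getLogger(__name__)


@contextmanager
def log_duration(label: str, timings: dict[str, float]):
    """Log and record the wall-clock time spent inside the with-block."""
    start = time.monotonic()
    try:
        yield
    finally:
        elapsed = time.monotonic() - start
        timings[label] = elapsed
        LOG.info("%s took %.3f seconds", label, elapsed)


logging.basicConfig(level=logging.INFO)
timings: dict[str, float] = {}
with log_duration("get_instance_nw_info", timings):
    time.sleep(0.05)  # placeholder for the real call being timed
print(sorted(timings))  # ['get_instance_nw_info']
```

Recording per-step durations, rather than only absolute timestamps, avoids the ambiguity about whether each figure is cumulative.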
sean-k-mooneyok, so I subtract those numbers to get the time for each23:44
sean-k-mooneythat is still very slow23:44
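Assuming the figures above are per-step durations rather than cumulative timestamps, and taking the approximate values as 0.4 s and 5.0 s, they add up as follows. The total is of the same order as the 18+ seconds reported earlier; whether the final ~5 s of the activate call falls inside that window depends on where the timestamps were taken.

```python
# Per-step durations (seconds) reported above for one migration.
# Assumption: each figure is the duration of that step, not a cumulative timestamp.
step_durations = {
    "_get_instance_block_device_info + driver.post_live_migration": 1.913,
    "driver.get_volume_connector": 0.4,      # "about 0.4 sec"
    "volume_api.terminate_connection": 2.659,
    "network_api.get_instance_nw_info": 3.28,
    "_notify_about_instance_usage(live_migration._post.start)": 5.019,
    "port binding activate API call": 5.0,   # "another almost 5 seconds"
}

total = sum(step_durations.values())
print(f"total: {total:.3f} s")  # total: 18.271 s
print(max(step_durations, key=step_durations.get))  # the notification step, 5.019 s
```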
abaindurfor example, in one live migration, we saw: VM Paused at 25:06.19223:44
sean-k-mooneyabaindur: are you using memcache for your keystone auth tokens?23:44
abaindurby time we saw nova-compute make the port binding call, it was at: 35:24.69423:45
abaindurVM Paused at 35:06.192  *23:46
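Taking the corrected pause timestamp, the gap between the VM pausing and nova-compute making the port binding call works out as below. The timestamps carry no hour, so they are parsed as minutes:seconds.milliseconds only.

```python
from datetime import datetime

# Timestamps quoted above (minutes:seconds.milliseconds; no hour given).
paused = datetime.strptime("35:06.192", "%M:%S.%f")
port_binding_call = datetime.strptime("35:24.694", "%M:%S.%f")

gap = (port_binding_call - paused).total_seconds()
print(f"{gap:.3f} s")  # 18.502 s
```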
*** jangutter has joined #openstack-nova23:46
sean-k-mooneyabaindur: can you check your nova.conf and see if you have https://zuul.opendev.org/t/openstack/build/dcde79801a624c25b195a46ead7af562/log/controller/logs/etc/nova/nova-cpu_conf.txt#62-6823:46
sean-k-mooneyalso, 10 seconds to bind the port in pre-copy mode is too high, so there is something else hurting performance, which is why I'm suspecting you do not have keystone caching configured correctly.23:49
*** jangutter has quit IRC23:52
abaindurOk no, memcache_servers is not set...23:52
abaindurthis is for nova compute on the hypervisor side right?23:52
sean-k-mooneyhttps://github.com/openstack/nova/blob/stable/rocky/nova/compute/manager.py#L6982-L7008 could be safely moved to https://github.com/openstack/nova/blob/stable/rocky/nova/compute/manager.py#L6939 by the way, and in the libvirt case https://github.com/openstack/nova/blob/stable/rocky/nova/compute/manager.py#L6933 would be fine too.23:53
sean-k-mooneyum, it will be used on both the controller and computes23:53
sean-k-mooneyso you are missing23:54
sean-k-mooney[keystone_authtoken]23:54
sean-k-mooneymemcached_servers = localhost:1121123:54
sean-k-mooneyor rather, your actual memcache servers23:54
abainduryeah, I don't see that config opt ever ok23:54
sean-k-mooneythis will be used to cache auth tokens for every API call we make23:54
abaindurI don't see it ever set*23:55
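A quick way to check a config file for the option sean-k-mooney points at. This sketch uses the standard `configparser` module and only looks at `[keystone_authtoken] memcached_servers`, not any other caching section. The function name and sample contents are illustrative assumptions, and for a real host you would pass in the text of that host's nova.conf (its path varies by deployment). oslo.config files are INI-like, so `strict=False` is used to tolerate repeated keys and `interpolation=None` to leave `%` characters alone.

```python
import configparser


def keystone_token_cache_configured(conf_text: str) -> bool:
    """True if [keystone_authtoken] memcached_servers is set to a non-empty value."""
    # interpolation=None: leave '%' in values alone; strict=False: tolerate repeated keys.
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(conf_text)
    value = parser.get("keystone_authtoken", "memcached_servers", fallback="")
    return bool(value.strip())


sample = """
[keystone_authtoken]
memcached_servers = localhost:11211
"""
print(keystone_token_cache_configured(sample))         # True
print(keystone_token_cache_configured("[DEFAULT]\n"))  # False
```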
sean-k-mooneyabaindur: so at least on the controller side it has a significant impact on the API23:58
sean-k-mooneyhttps://bugs.launchpad.net/nova/+bug/183664223:58
openstackLaunchpad bug 1836642 in neutron "Metadata responses are very slow sometimes" [High,Incomplete] - Assigned to Slawek Kaplonski (slaweq)23:58
sean-k-mooneywe address this problem in the gate by turning on caching with https://github.com/openstack/devstack/commit/d33cdd01f83b891b010e0fd238f1816910f3fd7723:58
abaindurI don't see it on the controller either23:59
sean-k-mooneyI am not sure if it is used by the compute node, but I think it will be whenever it is calling cinder, neutron or placement23:59
sean-k-mooneyabaindur: it's optional23:59
sean-k-mooneybut it improves performance a lot23:59

Generated by irclog2html.py 2.15.3 by Marius Gedminas - find it at mg.pov.lt!