Friday, 2024-01-19

opendevreviewTakashi Natsume proposed openstack/nova-specs master: Create specs directory for 2024.2 Dalmatian  https://review.opendev.org/c/openstack/nova-specs/+/906073 00:36
opendevreviewAmit Uniyal proposed openstack/nova stable/2023.2: Updates glance fixture for create image  https://review.opendev.org/c/openstack/nova/+/906088 05:23
opendevreviewAmit Uniyal proposed openstack/nova stable/2023.2: Fixes: bfv vm reboot ends up in an error state.  https://review.opendev.org/c/openstack/nova/+/906089 05:23
opendevreviewAmit Uniyal proposed openstack/nova master: Added context manager for instance lock  https://review.opendev.org/c/openstack/nova/+/873648 05:44
opendevreviewAmit Uniyal proposed openstack/nova master: Disconnecting volume from the compute host  https://review.opendev.org/c/openstack/nova/+/877446 05:44
opendevreviewAmit Uniyal proposed openstack/nova master: Removed explicit call to delete attachment  https://review.opendev.org/c/openstack/nova/+/891289 05:44
bauzas_colby: I'm here now but let's discuss this when you're back, I have some questions (saw your pastebin)08:32
opendevreviewAmit Uniyal proposed openstack/nova stable/2023.2: Fixes: bfv vm reboot ends up in an error state.  https://review.opendev.org/c/openstack/nova/+/906089 09:19
opendevreviewmelanie witt proposed openstack/nova master: libvirt: Configure and teardown ephemeral encryption secrets  https://review.opendev.org/c/openstack/nova/+/826754 09:55
opendevreviewmelanie witt proposed openstack/nova master: imagebackend: Add support to libvirt_info for LUKS based encryption  https://review.opendev.org/c/openstack/nova/+/826755 09:55
opendevreviewmelanie witt proposed openstack/nova master: Add encryption support to convert_image  https://review.opendev.org/c/openstack/nova/+/870934 09:55
opendevreviewmelanie witt proposed openstack/nova master: Support create with ephemeral encryption for qcow2  https://review.opendev.org/c/openstack/nova/+/870932 09:55
opendevreviewmelanie witt proposed openstack/nova master: Support resize with ephemeral encryption for qcow2  https://review.opendev.org/c/openstack/nova/+/870933 09:55
opendevreviewmelanie witt proposed openstack/nova master: Add hw_ephemeral_encryption_secret_uuid image property  https://review.opendev.org/c/openstack/nova/+/870935 09:55
opendevreviewmelanie witt proposed openstack/nova master: Add encryption support to qemu-img rebase  https://review.opendev.org/c/openstack/nova/+/870936 09:55
opendevreviewmelanie witt proposed openstack/nova master: Support snapshot with ephemeral encryption for qcow2  https://review.opendev.org/c/openstack/nova/+/870937 09:55
opendevreviewmelanie witt proposed openstack/nova master: Support rebuild and unshelve with ephemeral encryption  https://review.opendev.org/c/openstack/nova/+/870939 09:55
opendevreviewmelanie witt proposed openstack/nova master: Support rescue with ephemeral encryption  https://review.opendev.org/c/openstack/nova/+/873675 09:55
opendevreviewmelanie witt proposed openstack/nova master: libvirt: make <encryption> a subelement of <source>  https://review.opendev.org/c/openstack/nova/+/905515 09:55
opendevreviewmelanie witt proposed openstack/nova master: WIP Support migration with ephemeral encryption  https://review.opendev.org/c/openstack/nova/+/905512 09:55
opendevreviewmelanie witt proposed openstack/nova master: Reject (resize|rebuild) API requests with conflicting encryption  https://review.opendev.org/c/openstack/nova/+/904240 09:55
opendevreviewmelanie witt proposed openstack/nova master: libvirt: Introduce support for qcow2 with LUKS  https://review.opendev.org/c/openstack/nova/+/772273 09:55
opendevreviewmelanie witt proposed openstack/nova master: libvirt: Introduce support for raw with LUKS  https://review.opendev.org/c/openstack/nova/+/884313 09:55
opendevreviewmelanie witt proposed openstack/nova master: libvirt: Introduce support for rbd with LUKS  https://review.opendev.org/c/openstack/nova/+/889912 09:55
opendevreviewSylvain Bauza proposed openstack/nova master: libvirt: Cap with max_instances GPU types  https://review.opendev.org/c/openstack/nova/+/899625 10:46
opendevreviewSylvain Bauza proposed openstack/nova master: vgpu: Allow device_addresses to not be set  https://review.opendev.org/c/openstack/nova/+/902084 10:46
opendevreviewSylvain Bauza proposed openstack/nova master: WIP: add coverage for registered dyn opts  https://review.opendev.org/c/openstack/nova/+/902771 10:46
opendevreviewSylvain Bauza proposed openstack/nova master: FUP: add coverage for registered dyn opts  https://review.opendev.org/c/openstack/nova/+/902771 10:54
*** tosky_ is now known as tosky12:50
opendevreviewTobias Urdin proposed openstack/nova master: Fail nova-compute startup when RP cannot be created  https://review.opendev.org/c/openstack/nova/+/904381 13:18
opendevreviewTobias Urdin proposed openstack/nova master: libvirt: set remaining to 0 when no disk to migrate  https://review.opendev.org/c/openstack/nova/+/873846 13:25
opendevreviewTobias Urdin proposed openstack/nova master: libvirt: set remaining to 0 when no disk to migrate  https://review.opendev.org/c/openstack/nova/+/873846 13:25
opendevreviewTobias Urdin proposed openstack/nova master: Fix wrong nova-manage command in upgrade check  https://review.opendev.org/c/openstack/nova/+/880819 13:29
opendevreviewTobias Urdin proposed openstack/nova master: Remove libvirt tunnelled migration  https://review.opendev.org/c/openstack/nova/+/879021 13:33
opendevreviewTobias Urdin proposed openstack/nova master: Remove libvirt tunnelled migration  https://review.opendev.org/c/openstack/nova/+/879021 13:36
opendevreviewTobias Urdin proposed openstack/nova master: Fail nova-compute startup when RP cannot be created  https://review.opendev.org/c/openstack/nova/+/904381 14:28
opendevreviewTobias Urdin proposed openstack/nova master: Fail nova-compute startup when RP cannot be created  https://review.opendev.org/c/openstack/nova/+/904381 14:38
opendevreviewMerged openstack/nova master: [ironic] Partition & use cache for list_instance*  https://review.opendev.org/c/openstack/nova/+/900831 14:48
bauzasgibi: thanks a lot for the mdev series on both bugfix series14:52
bauzasgibi: are you available for discussing about https://review.opendev.org/c/openstack/nova/+/845757/3/nova/virt/libvirt/driver.py#8563 ?14:52
bauzas(just replied)14:58
gibibauzas: o/.14:59
bauzastl;dr: we continue to support VGPU allocations reshape in our code14:59
bauzasand yeah, that's crazy.14:59
gibiso the reshape test needs a from state and to create a from state we need to support some old behavior?14:59
bauzasgibi: well, the better would be to cut from tree all the reshape code15:00
bauzasthat's what I replied15:00
bauzasI'm just about to file a blueprint about it15:00
bauzaspaperwork-wise, I feel brave enough to have a 6th effort on VGPU (and a 6th potential conflict) that would drop the whole reshape code (and tests)15:01
bauzasand then amend my bugfix to not care about non-nested RPs15:01
bauzasbut see, that's chicken and eggs here15:02
gibican we decouple the test from the reshape support? E.g. can we prepare the from state purely in placement / db and then let the reshape code run on that state?15:02
gibi(I'm not suggesting to do the reshape removal in the current cycle)15:02
bauzasgibi: not sure I get your point, are you asking me to modify my patch to assert that the allocated RP is always a child RP ?15:03
bauzasand then, just fix the test to not fail 15:03
gibi1) on your patch I only asked for a tracker for the cleanup so you don't need to change anything there15:03
gibi2) when thinking about the removal of the `allocated_rp.parent_uuid is None:` conditional15:04
gibiI assume that you need that conditional because the functional reshape test sets up the from state (with VGPU on root RP) while nova is running and therefore you need the conditional15:05
gibiis it so?15:05
gibiOR, is _allocate_mdevs called during reshape itself?15:06
gibiand during reshape you need the conditional15:07
gibianyhow this is not super important now. We have a tracker, we will get back to this when we have time to clean up15:07
bauzassorry, was filing the blueprint15:09
bauzashttps://blueprints.launchpad.net/nova/+spec/drop-vgpu-reshape-support 15:09
gibithanks!15:10
bauzasyeah, that's correct, the reshape test assumes to create a mdev *before* the reshape15:10
bauzasso we can reshape the allocation15:10
bauzashence the crazy conditional15:10
gibiOK. so hypothetically we could try to set up the from state purely in DB / placement and only start up the nova-compute service after it. So we won't try to create an mdev with nova in the from state15:11
gibibut I guess it is better use of our time to remove the whole reshape support15:12
bauzasI would need to read the functional test in question but I get your point15:12
bauzasthat's actually maybe something to test quickly15:12
bauzasI can write a POC on top of that patch15:12
gibiif you have time :)15:13
bauzasas a FUP, that'd remove the logic and then we could see what exactly to do in the functest15:13
bauzasgibi: I can try to spend a couple of hours on that, I'm pretty done with implementing stuff now15:13
bauzasmy left tasks are mostly rebases and reviews15:13
bauzas+ the mtty patch15:14
bauzasgibi: thanks for your idea, I'll try it ;)15:14
* bauzas goes to school15:14
gibicool :)15:15
bauzassean-k-mooney: tbc, you're -1.9999 because of the documentation for https://review.opendev.org/c/openstack/nova/+/845757 ?16:10
bauzasI'm asking for that due to the fact you're OK with the functest https://review.opendev.org/c/openstack/nova/+/845747/3 16:12
bauzaswhat I could do is try to test with two different custom rcs16:12
bauzasso we could document that this way16:12
kashyaptobias-urdin: Thanks for the respin here: https://review.opendev.org/c/openstack/nova/+/879021. Will look on Mon.  (I'm buried in another context now)16:24
opendevreviewSylvain Bauza proposed openstack/nova master: Support multiple allocations for vGPUs  https://review.opendev.org/c/openstack/nova/+/845757 16:40
bauzassean-k-mooney: updated based on your concerns of the portability of group_policy ^16:44
bauzasI still think we want to document this as this is simpler for operators if they don't give a shit about other nested resource classes 16:45
bauzasif you look at the generated docs, they will see a big red warning about using group_policy16:45
sean-k-mooneybauzas: i think we should be removing all support for group_policy in the flavor17:08
sean-k-mooneyand in placement in a new placement microversion17:09
sean-k-mooneyisolate between specific groups is fine17:09
sean-k-mooneybut not how it currently works17:09
sean-k-mooneybauzas: regarding the functional test i would personally prefer if you provided functional tests using custom resource classes and/or traits too17:14
bauzashave you seen my new revision ?17:14
sean-k-mooneyof the followup patch yes im still -1.99917:15
bauzasI tend to disagree with you 17:15
sean-k-mooneymaybe -1.95 you did at least add the other options17:15
bauzasbecause we have a gap that we need to solve17:16
bauzasI'm just fixing some tech debt where we were only looking at one allocation17:16
bauzasand I'm just documenting ways to alleviate the problem17:16
sean-k-mooneyim ok with you documenting the group_policy way if its not the first way we present and its not the one we recommend17:16
bauzasthe group policy is really the simplest way for ops that don't care about qos-bandwidth or other nested resources17:17
bauzasand there are big warnings in the docs explaining the limitations of that approach17:17
sean-k-mooneysure but that does not mean they should use it even if they dont have those constraints17:17
bauzasso your -1.95 seems very nitpicky honestly17:17
sean-k-mooneyits the way that will cause them the most upgrade pain17:17
sean-k-mooneywould you prefer i put it to -2 then17:17
bauzasputting -2 for docs seems a bit harsh for me17:18
bauzasI can just remove the whole doc 17:18
sean-k-mooneyits not about the doc17:18
bauzasand leave the patch17:18
bauzasI can even drop the functional test17:18
sean-k-mooneyits about recommending something that we know will cause upgrade pain in the future17:18
sean-k-mooneythe func test is fine17:19
bauzasdo you understand that all of your concerns are about the doc file ?17:19
bauzasnot about the python module?17:19
bauzaslike I said, I can just drop the doc and leave ops find the way they prefer by themselves17:19
sean-k-mooneyso i think it would be nice to have other func tests as well but you have enough to merge the patch i think17:19
sean-k-mooneyso yes its all about the doc17:20
bauzasso, I'll split the patch17:20
bauzasI'll just address the python fix in one change17:20
sean-k-mooneyno17:20
bauzasand we'll discuss the docs things in a separate change17:20
sean-k-mooneyplease keep it in two17:20
sean-k-mooneyyou can add a third for the doc if you like17:20
bauzasthat's what I'm suggesting17:21
bauzas1/ functest17:21
sean-k-mooneybut im just asking you to swap 20 lines of docs17:21
bauzas2/ patch17:21
bauzas3/ doc17:21
bauzasso your concern is about the future and the fact that we persist flavors right?17:21
sean-k-mooneyyes17:21
bauzasI'm just trying to understand the concern correctly17:22
bauzassay one day another great feature would provide other nested RPs17:22
bauzasand ops eagerly wanting it17:22
sean-k-mooneyyep17:22
sean-k-mooneyor endusers17:22
bauzasthose new children would be on separate RPs from the GPU ones, right?17:23
sean-k-mooneyit might not be the operator17:23
sean-k-mooneythats not the issue17:23
sean-k-mooneyif we use isolate you cant have 2 request_groups from the same rp17:23
sean-k-mooneyso pci in placement for example17:23
sean-k-mooneyyou wont be able to have 2 vfs from the same host nic17:23
bauzasI get it17:24
bauzasisolate applies to all resource groups17:24
sean-k-mooneyyep17:24
bauzasso the problem is really with prefilters then ?17:24
sean-k-mooneyno17:24
bauzasbecause the flavors wouldn't express those17:24
bauzaspci in placement is a good strawman, let's continue to use it for the example17:24
sean-k-mooneythe problem is with the placement api17:25
sean-k-mooneywell it also applies to cyborg or neutron ports with resource requests17:25
bauzassay I have VGPUS and PCI 17:25
sean-k-mooneysure17:25
bauzasthe problem arises when I want to ask for both, right?17:25
sean-k-mooneyyes17:25
bauzasand what if I'm asking for a single resource ?17:26
bauzas(resource *class)17:26
sean-k-mooneyyes17:26
sean-k-mooneyso pci in placement is a less good example because we dont support that with neutron ports yet. but ill give you an example17:27
bauzasyou agree that's not a problem if I'm only asking for VGPU *or* PCI, right ?17:27
sean-k-mooneyopenstack flavor set vgpu_2 --property "resources1:VGPU=1" \17:27
sean-k-mooney                                 --property "resources2:VGPU=1" \17:27
sean-k-mooney                                 --property "group_policy=isolate"17:27
sean-k-mooneyif we add --property "pci_passthrough:alias=my_vf:2"17:28
sean-k-mooneythen if we only have one device with my_vf available17:28
bauzasI got it17:28
bauzashence my warning 17:28
sean-k-mooneythen isolate will break that17:28
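[editor's sketch] The effect described above can be illustrated with a toy model. This is not placement's implementation: the provider names (pgpu_a, pgpu_b, nic_pf), the CUSTOM_VF resource class and the allocation_candidates() helper are all hypothetical, and the model only assumes that group_policy=isolate forces every numbered request group onto a different resource provider.

```python
from itertools import product

# Hypothetical inventory: resource provider name -> {resource class: free units}.
INVENTORY = {
    "pgpu_a": {"VGPU": 4},
    "pgpu_b": {"VGPU": 4},
    "nic_pf": {"CUSTOM_VF": 4},
}


def allocation_candidates(inventory, groups, group_policy):
    """Return every way to assign one provider to each request group."""
    found = []
    for combo in product(sorted(inventory), repeat=len(groups)):
        # "isolate": no two request groups may land on the same provider.
        if group_policy == "isolate" and len(set(combo)) < len(combo):
            continue
        # Sum up what each provider would have to supply for this assignment.
        needed = {}
        for provider, group in zip(combo, groups):
            for resource_class, amount in group.items():
                key = (provider, resource_class)
                needed[key] = needed.get(key, 0) + amount
        if all(inventory[p].get(rc, 0) >= amount
               for (p, rc), amount in needed.items()):
            found.append(combo)
    return found


two_vgpus = [{"VGPU": 1}, {"VGPU": 1}]
two_vgpus_two_vfs = two_vgpus + [{"CUSTOM_VF": 1}, {"CUSTOM_VF": 1}]

print(allocation_candidates(INVENTORY, two_vgpus, "isolate"))
print(allocation_candidates(INVENTORY, two_vgpus_two_vfs, "isolate"))
print(allocation_candidates(INVENTORY, two_vgpus_two_vfs, "none"))
```

In this model the first call returns the two orderings of (pgpu_a, pgpu_b), the second returns nothing because both VF groups would need the single nic_pf provider, and the third returns four candidates once the policy is relaxed to none.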
bauzas.. warning:: Be careful when using ``group_policy`` as this policy is global to all the resources request. If you ask for other resources but only VGPUs in your flavor, that ``isolate`` policy will also apply for other ResourceProviders.17:28
bauzasthat will be written in solid red17:29
sean-k-mooneyand it also breaks AggregateInstanceExtraSpecsFilter and ComputeCapabilitiesFilter 17:29
bauzaswritten too17:29
bauzas.. note:: If you use ``AggregateInstanceExtraSpecsFilter`` filter, you also need to configure your aggregates metadata with the ``group_policy`` values.17:29
bauzas(well, I can add the mention to computecapabilitiesFilter for sure)17:29
sean-k-mooneyand neutron QoS ports and cyborg integration if the cyborg request profile has multiple devices 17:30
bauzasthat's covered by the warning section17:30
sean-k-mooneycan you see why an approach that breaks 5 different features is not something i want to recommend people use as the first option17:30
bauzasI'm ready to bet that only a very small fraction of users and ops care about any of that stuff17:30
sean-k-mooneysure so we can document it for them but not as the first way we recommend17:31
bauzasthis isn't breaking the feature, this is a limitation saying "you can't and shouldn't request multiple vgpus per instance if you want to create a flavor that asks for more than vgpus"17:31
sean-k-mooneyall im asking you to do is move lines 132-156 after lines 158-18117:31
bauzasif you really want to mix resources, do other ways17:32
bauzaslet's talk about the alternatives17:32
bauzascustom traits17:32
bauzaswhat you're proposing requires ops to define distinct traits between GPUs17:32
bauzaseven if they share the same type17:33
bauzasthe only difference would be "I'm GPU A" while the other would be "I'm GPU B"17:33
bauzasthat's not great, right?17:33
sean-k-mooneywell since this bugfix is really a feature then yes17:33
sean-k-mooneywithout introducing a new feature to explicitly support selecting vgpus from different parent physical devices17:34
sean-k-mooneyand making that work transparently you have to reuse one of our supported features to achieve the same goal17:34
bauzasthe real fix is to drop the silly warning https://review.opendev.org/c/openstack/nova/+/845757/4/nova/virt/libvirt/driver.py#b8541 and just provide a loop17:35
bauzaswe could improve that for sure in a blueprint17:35
bauzaswe discussed that years ago and we said that it would require some better placement syntax17:36
bauzasbefore that, there is a quickwin17:36
sean-k-mooneywe dont support booting a vm with multiple gpus upstream today. it can be done because we dont block it but this is a new feature17:36
bauzastoday, there's a gap that prevents you from requesting two allocations17:37
sean-k-mooneyright and its not supported as a result17:37
sean-k-mooneyupstream and downstream17:37
bauzaspas-ha[m]: this is Friday but I'd appreciate your thoughts https://review.opendev.org/c/openstack/nova/+/845757/ 17:38
bauzaspas-ha[m]: we can talk on Monday17:38
bauzassean-k-mooney: in the meantime, I'll split https://review.opendev.org/c/openstack/nova/+/845757/ in two17:39
bauzasand just leave the doc discussion aside17:39
sean-k-mooneysure im ok with merging the reproducer and fix then following up with the doc. can you also add a short release note to the fix patch17:44
opendevreviewSylvain Bauza proposed openstack/nova master: Support multiple allocations for vGPUs  https://review.opendev.org/c/openstack/nova/+/845757 17:44
opendevreviewSylvain Bauza proposed openstack/nova master: document how to ask for more a single vGPU  https://review.opendev.org/c/openstack/nova/+/906151 17:44
bauzassean-k-mooney: if I go with a relnotes that would be a fixes section just mentioning that now you can request vGPUs using multiple resource groups17:45
sean-k-mooneybauzas: so another way to do this which we can disucss monday17:45
sean-k-mooneyis we could have a weigher that prefers allocation candidates that have resources from different RPs17:45
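[editor's sketch] One way to picture the weigher idea, under loud assumptions: this is not nova's weigher API, the helper names distinct_providers() and prefer_spread() are hypothetical, and an allocation candidate is modelled simply as a dict of provider name to the resources taken from it.

```python
def distinct_providers(candidate, resource_class="VGPU"):
    """Count how many providers in the candidate supply the resource class."""
    return sum(
        1 for resources in candidate.values()
        if resources.get(resource_class, 0) > 0
    )


def prefer_spread(candidates, resource_class="VGPU"):
    """Order candidates so those spread over more providers come first."""
    # sorted() is stable, so candidates with equal spread keep their order.
    return sorted(
        candidates,
        key=lambda c: distinct_providers(c, resource_class),
        reverse=True,
    )


# Hypothetical candidates: both VGPUs from one pGPU, or one from each pGPU.
candidates = [
    {"pgpu_a": {"VGPU": 2}},
    {"pgpu_a": {"VGPU": 1}, "pgpu_b": {"VGPU": 1}},
]
best = prefer_spread(candidates)[0]
print(sorted(best))
```

Unlike group_policy=isolate, a preference of this kind only reorders candidates, so a request that can only be satisfied from a single provider would still be schedulable.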
sean-k-mooneybauzas: yep exactly just short and sweet17:46
bauzascool17:46
spatelsean-k-mooney hey! 17:46
spatelquick question, does snapshot include memory footprint also ? 17:47
_colbybauzas: Thank you for your help. So I removed all the resource providers except those listed in the nova config for the device types and now nova is spinning up vgpu instances correctly. It's even creating the mdevs on its own. So I guess it was trying to create an mdev for a device not listed in the config17:47
bauzas_colby: good to know17:48
bauzas_colby: while you're here, I'd like your opinion17:48
sean-k-mooneyspatel: no17:49
sean-k-mooneyspatel: snapshots are of the root disk only17:49
sean-k-mooneyfor image backed instances17:49
bauzas_colby: I provided a patch that would allow you to request VGPU allocations with different resource groups https://review.opendev.org/c/openstack/nova/+/906151/1 17:49
bauzas_colby: the main concern is how to correctly express the request and my personal take is that in general VGPU flavors only ask for VGPUs17:50
bauzas_colby: do you confirm that you're not interested in mixing vgpu resources and other resources like pci passthrough devices in the same flavor ?17:51
spatelsean-k-mooney good to know :) thanks 17:56
_colbybauzas: we have a hypervisor that is strictly for passthrough and some that use vGPU. On the vGPU host we offer a vGPU option with 100% of the GPU though.17:58
_colbyIs it even possible to do passthrough with vgpu? With passthrough the nvidia drivers on the hypervisor need to be disabled17:59
sean-k-mooney_colby: not passthrough of a vgpu18:00
bauzas_colby: and you wouldn't be interested in requesting qos-bandwidth ports or cyborg ?18:00
_colbyah I see18:00
sean-k-mooneywe mean booting a vm with a sriov nic  + a vgpu18:00
_colbynot on our system no18:00
bauzas_colby: the context is that I'm documenting some usage18:00
bauzashttps://review.opendev.org/c/openstack/nova/+/906151/1/doc/source/admin/virtual-gpu.rst#136 18:01
bauzasif you start using some extra spec parameter called 'group_policy', you may not be able to request other things18:01
bauzasbut you can continue to request for a single vGPU *and* other things tho18:01
bauzasthe alternative is to do a lot of configuration by providing distinct traits https://review.opendev.org/c/openstack/nova/+/906151/1/doc/source/admin/virtual-gpu.rst#158 18:02
bauzasbut that sounds nearly impracticable for ops to me18:02
_colbybauzas: interesting. For our use case this should be fine. We have not actually had many requests for multiple vgpus, and we dont offer any special NIC options at this time.18:05
_colbyWe have HPC systems for users who need things like that. This is more for research groups to use for their projects.18:05
opendevreviewSylvain Bauza proposed openstack/nova master: Support multiple allocations for vGPUs  https://review.opendev.org/c/openstack/nova/+/845757 18:06
opendevreviewSylvain Bauza proposed openstack/nova master: document how to ask for more a single vGPU  https://review.opendev.org/c/openstack/nova/+/906151 18:06
bauzassean-k-mooney: added the relnote ^18:06
bauzassean-k-mooney: _colby also provided some insights about their VGPU usages18:06
bauzas_colby: thanks for your valuable feedback18:08
bauzas_colby: fwiw, we have a couple of improvements in flight this cycle and you may be interested in getting those18:08
sean-k-mooneyyep i saw18:09
bauzashttps://review.opendev.org/q/topic:%22bug/2041519%22+AND+(is:open+OR+is:merged) 18:09
sean-k-mooney_colby even with the patch that bauzas is providing to fix this current bug i would really treat multiple vgpus as experimental18:09
bauzas_colby: the above link would give you the SRIOV fix 18:10
bauzassean-k-mooney: I'm cool with documenting that18:10
sean-k-mooneybauzas: have you tested the move operations with multiple vgpus. like cold migrate and/or your future live migration support18:10
bauzas(the fact that's more an unsupported feature than a real thing)18:11
bauzassean-k-mooney: no tbc18:11
bauzasalthough I don't see any reason why it wouldn't work 18:11
sean-k-mooneyi would certainly feel more comfortable saying  "multiple vgpus in the same vm is experimental, please report bugs"18:11
bauzassean-k-mooney: that's why I'm offering to drop the doc18:12
bauzasthe relnote seems enough to me18:12
bauzasif people want to use it, cool 18:12
sean-k-mooneylets discuss that next week. you should go start your weekend18:13
bauzasthe patches are all ready now18:13
bauzaslet's see what CI is saying18:13
bauzasbtw. it occurred to me that our jobs have become flaky again on weird things18:13
sean-k-mooneyya a little 18:20
sean-k-mooneyanything in particular18:20
sean-k-mooneyi have seen some nova-lvm failures18:20
opendevreviewJay Faulkner proposed openstack/nova stable/2023.2: [ironic] Partition & use cache for list_instance*  https://review.opendev.org/c/openstack/nova/+/906155 21:11
*** obrest is now known as obre21:15
