Wednesday, 2024-02-21

*** haleyb is now known as haleyb|out00:38
opendevreviewTakashi Kajinami proposed openstack/nova master: SEV; Add TODO note about iommu setting done in recent QEMU  https://review.opendev.org/c/openstack/nova/+/90963503:59
opendevreviewTakashi Kajinami proposed openstack/nova master: SEV: Add TODO note about iommu setting done in recent QEMU  https://review.opendev.org/c/openstack/nova/+/90963504:01
adam-metal3Hi all! I am new to using the openstack sdk and in general new to most openstack components; after reading the nova and openstacksdk docs I still have a question and hope someone could give some direction. If I understand correctly, multiple volumes can be attached to a VM and the VM can be booted from a volume. 07:47
adam-metal3Could there be a scenario where multiple bootable volumes are attached to a VM, and if yes, how could I tell via the openstacksdk which volume was used for booting in this scenario?07:47
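One way to answer this, as a minimal sketch with openstacksdk (the cloud name and server id are placeholders, and root_device_name is an extended attribute that is by default only visible to admins), is to compare each volume attachment's device path against the server's root device:

    import openstack

    conn = openstack.connect(cloud="mycloud")          # placeholder clouds.yaml entry
    server = conn.compute.get_server("my-server-id")   # placeholder server id

    # OS-EXT-SRV-ATTR:root_device_name (e.g. /dev/vda) is exposed by the SDK
    # as server.root_device_name.
    root_dev = server.root_device_name

    for attachment in conn.compute.volume_attachments(server):
        # attachment.device is the path the volume is presented at in the guest
        if attachment.device == root_dev:
            print("boot volume:", attachment.volume_id)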
LarsErikPjamespage: Hi! sorry if I'm getting annoying :P Any news on a SRU for zed containing nova 26.2.1? 👼07:52
opendevreviewmelanie witt proposed openstack/nova master: WIP Support encrypted backing files for qcow2  https://review.opendev.org/c/openstack/nova/+/90796108:35
opendevreviewmelanie witt proposed openstack/nova master: Support cross cell resize with ephemeral encryption for qcow2  https://review.opendev.org/c/openstack/nova/+/90959508:35
opendevreviewmelanie witt proposed openstack/nova master: libvirt: Introduce support for raw with LUKS  https://review.opendev.org/c/openstack/nova/+/88431308:35
opendevreviewmelanie witt proposed openstack/nova master: libvirt: Introduce support for rbd with LUKS  https://review.opendev.org/c/openstack/nova/+/88991208:35
opendevreviewRajesh Tailor proposed openstack/nova master: Add support for showing requested az in output  https://review.opendev.org/c/openstack/nova/+/90456808:59
opendevreviewTakashi Kajinami proposed openstack/nova master: SEV: Add TODO note about iommu setting done in recent QEMU  https://review.opendev.org/c/openstack/nova/+/90963509:29
mnasiadkaHello, I've raised https://bugs.launchpad.net/nova/+bug/2054437 - is that maybe by design or is that an actual bug?10:24
*** mklejn_ is now known as mklejn10:32
opendevreviewDmitry Tantsur proposed openstack/nova master: ironic: clean up references to memory_mb/cpus/local_gb  https://review.opendev.org/c/openstack/nova/+/87841811:18
* bauzas is back, was on PTO this morning12:30
gibiUggla: sean-k-mooney: bauzas: discussing an edge case in the manila series13:36
gibiwhen the user detaches a share from a VM via the REST API and this is the last user of the share on the compute13:36
gibithe share is unmounted13:37
gibilets assume that the unmounting fails on the compute13:37
gibishould nova delete the share mapping and leak the mount13:37
sean-k-mooney... the detach instance action should go to error13:37
gibiOR13:37
sean-k-mooneyi think13:37
sean-k-mooneyand allow detach again via the api13:38
gibishould nova put the mapping to ERROR and let the user retry13:38
sean-k-mooneyalternatively we can have a periodic13:38
sean-k-mooneyi think letting the user retry is good13:38
gibithen we are on the same page13:38
sean-k-mooneybut nova could also retry if we wanted in a periodic13:38
gibiperiodic can be added separately yes13:38
Uggla@sean-k-mooney, ok but it means the VM cannot be restarted until the error on the share is gone.13:39
sean-k-mooneythe periodic only comes to mind as the share being mounted may mean that the data can be accessed on the host13:39
sean-k-mooneyand if there was a vm breakout it might expose more data to that malicious guest than would otherwise be there13:39
Uggla@sean-k-mooney, if I'm not wrong you wanted to be able to start the vm even if the share is in an error state for debugging. Am i wrong ?13:42
bauzassorry, I was preparing a meeting13:44
* bauzas looking above13:44
bauzasgibi: I would agree with sean-k-mooney, the instance action should be an error13:46
bauzasso the user would need to detach the share again13:46
sean-k-mooneyi think we could allow starting the vm but not attach the errored share to it13:46
sean-k-mooneyor we could prevent that and require you to do the detach again before starting13:47
sean-k-mooneyim not sure what the preferred approach is here13:47
bauzassean-k-mooney: if we are not able to correctly unmount the share on the compute, then we should continue to have a share mapping13:48
bauzasI think Uggla's series doesn't support already mounted shares without having share mappings13:48
bauzasUggla: right?13:48
sean-k-mooneythats fine13:48
sean-k-mooneywe can keep the vm in stopped with the share attached13:49
sean-k-mooneyand the instance action in error13:49
sean-k-mooneyand then the user can either start the vm or try the detach again13:49
bauzasyeah13:49
sean-k-mooneydepending on what they want 13:49
bauzasif they start the instance, they will have the share in it13:49
sean-k-mooneyif that is the approach we take then we don't need a periodic13:49
UgglaI could add a force detach option that would allow the instance to start? (it would leak the share but avoid db surgery)13:50
sean-k-mooneyyep13:50
sean-k-mooneyUggla: please no :) we have scars from any time we add force13:50
Ugglaok13:50
bauzasUggla: I can't remember, does your series support looking at shares that are already mounted?13:50
sean-k-mooneyi think if we just start with the share mounted as if you never tried the detach then we are good13:50
sean-k-mooneywe just need to make sure that we don't update the vm unless the host unmount works13:51
Ugglabauzas, yes you will be able to see the share in error state.13:51
sean-k-mooneyUggla: the share does not need to be in error13:51
sean-k-mooneythe share can stay in attached/in-use 13:51
sean-k-mooneythe instance action should be in error13:51
sean-k-mooneyoh actually13:52
sean-k-mooneydetach is not an instance action in the normal case13:52
sean-k-mooneydo we have an instance action event for detach13:52
sean-k-mooneyif we are going to boot the vm with the share attached then the share attachment state should not be error13:53
Ugglasean-k-mooney, yes, there will be an instance action for detach.13:53
bauzasUggla: I need to look again at the spec then13:53
sean-k-mooneyack, then ya only that needs to go to error13:53
sean-k-mooneyeverything else can keep its previous status13:53
bauzasbecause I think users shouldn't know whether the share is already mounted or not13:53
sean-k-mooneycorrect13:53
sean-k-mooneywe should not leak that13:53
sean-k-mooneywe should just report we failed to detach the share from the vm13:54
bauzastbc, I thought the detail was:  13:54
sean-k-mooneyand unmounting it is part of that internally but not something we should tell the user via the api13:54
bauzasas a user, I want to attach an instance to a share13:54
bauzasit will create a share mapping13:54
bauzaseventually, when I'm starting, then nova looks at the share13:55
bauzasif the share is already mounted, cool13:55
bauzasif the share needs to be mounted, all cool too, it does it now13:55
bauzasright?13:55
sean-k-mooneythats my expectation13:55
sean-k-mooneythe user contract is if a share mapping exists between an instance and a share13:56
Ugglabauzas, no. The user needs to ensure the share is in active status (meaning it is mounted and the VM and share are linked)13:56
sean-k-mooneyif we start the instance then nova will do whatever is required to ensure the vm starts with the share13:56
bauzasUggla: okay, hence my needing to look at your spec13:57
UgglaI mean if you start the VM and the share is still in "attaching" status then it will fail.13:57
bauzasUggla: when is the share checked whether it's mounted ?13:57
bauzasinactive == mounted, while active == mounted, right ?13:57
sean-k-mooneyUggla: that sounds wrong to me13:57
bauzasdoh13:57
bauzasinactive == unmounted13:57
bauzasthere it goes, opening https://specs.openstack.org/openstack/nova-specs/specs/2024.1/approved/libvirt-virtiofs-attach-manila-shares.html13:58
sean-k-mooneythe user should not have to poll the share mapping status to start the vm13:58
sean-k-mooneywhen the vm is stopped and we attach a share13:58
sean-k-mooneywe should be able to do start immediately13:58
gibibut then there will be a race between the mount RPC and the start RPC13:59
bauzas"Mount operation will be done when the share is not mounted on the compute host. If a previous share would have been mounted on the compute host for another server, then it will attempt to mount it and a warning will be logged that the share was already mounted."13:59
Ugglabauzas, nothing = nothing, attaching = attach in progress, inactive = share mounted and ready, active = VM up and share readable13:59
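For reference, a rough sketch of those share-mapping states as described here (illustrative only; the names follow the discussion rather than the exact DB values, and the error state is the one proposed above for a failed detach):

    class ShareMappingStatus:
        # the "nothing"/phi state is simply the absence of a share mapping row
        ATTACHING = "attaching"   # attach requested, DB row created
        INACTIVE = "inactive"     # share mounted and ready, VM not yet using it
        ACTIVE = "active"         # VM up and the share readable from the guest
        ERROR = "error"           # last operation (e.g. unmount on detach) failed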
bauzasit doesn't say when we mount13:59
sean-k-mooneywell the mount api call can only return after the mapping is created in the db13:59
sean-k-mooneyso there should not be a race13:59
sean-k-mooneyor rather there does not need to be a race14:00
gibithe race would be about the host mount not about the DB14:00
sean-k-mooneyright but spawn should be ensuring the mount14:00
bauzasUggla: nothing == nothing, that's the "phi" you have in the spec ?14:00
gibibut we have a separate RPC to ensure the mount14:00
Ugglayep.14:00
gibiit is not the spawn that does the host mount14:00
bauzasgibi: when is called that RPC method ?14:01
sean-k-mooneythat sounds incorrect to me14:01
bauzasshit, I need to go in meeting14:01
gibibauzas: during the attach REST API call14:01
bauzasah, now I remember, that's the async RPC call14:01
gibiif I understood Sean correctly they want to do the host mount during spawn instead14:02
sean-k-mooneyyep14:02
bauzasI'm fine whenever14:02
sean-k-mooneyi would also expect the share to be unmounted when we stop the vm for what its worth14:02
bauzasmy point is that the share mount can be done whenever we want, at least *before* we start14:02
sean-k-mooneybecause i would expect hard reboot to unmount it and remount it14:03
sean-k-mooneysince we normally use hard reboot to fix things like that14:03
bauzassean-k-mooney: you can attach the same mount multiple times14:03
sean-k-mooneythats fine14:03
sean-k-mooneythe mount and unmount could be based on the first/last active vm14:03
sean-k-mooneyor better each instance could have its own mount and we could avoid reusing it. that's what i originally wanted in the spec14:04
bauzassean-k-mooney: fwiw, in the spec it says that an instance stop does unmount the share14:04
sean-k-mooneyto have the same share mounted multiple times in separate locations14:04
sean-k-mooneybut i know that is not what is in the code14:04
bauzasso it's already fine14:04
bauzas(if the implementation is the same)14:04
sean-k-mooneyso a few weeks ago i was surprised14:04
gibiI guess the RPC during the attach REST API call is created to prepare for the future when a share can be mounted live without a stop/start cycle.14:05
sean-k-mooneywhen the share was not mounted under the instance state dir14:05
sean-k-mooneybecause i was originally expecting the share not to be shared between instances on the same host14:05
bauzasI think the word "inactive" is correct14:05
bauzasnothing is using the share14:05
bauzasbut we still have the reservation14:05
sean-k-mooneybauzas: the active/inactive state of a share mapping can't be mapped to whether it's mounted on the host14:06
bauzasI hope so14:06
sean-k-mooneysince if we are sharing the mount point between vms it would depend on whether other vms on the host are using it14:06
bauzasyeah, inactive just means that the instance doesn't use the share but it's still 'mapped' to it14:06
bauzaswhether the share is umounted or not, that's something internal14:07
bauzasas you say, the share could be continued to be mounted, because used by other instances14:07
bauzasnow, back to the original question from gibi,14:07
bauzasthere are two possibilities14:08
sean-k-mooneyfor me active means we have done what is required in nova and manila so that the vm can use it if you start the vm14:08
Ugglasean-k-mooney, no active means VM is up and uses the share.14:08
sean-k-mooneythen it has no extra info14:09
bauzasIMHO, a detach action that eventually needs to unmount and cannot do so should put the instance action in error14:09
sean-k-mooneythat is what instance status=active means14:09
bauzasor we need a periodic14:09
bauzasbut honestly, I think we should say the truth14:09
sean-k-mooneythe active status of a share mapping should not depend on the vm status (active vs powered off)14:10
bauzasthat's written in the spec14:10
sean-k-mooneythese are two different logical resources in the api14:10
gibibauzas: it is not about the instance action but about the state of the share mapping. We can keep it inactive or we can put it in error.14:11
bauzasgibi: I got that, lemme summarize the exception handling case14:12
bauzasuser stops a VM, share mapping goes inactive14:12
sean-k-mooneylooking at the Share mapping status diagram in the spec14:13
bauzaswhen it stops the VM, some internal call is doing two actions :14:13
bauzas1/ detach the share from the VM14:13
sean-k-mooneyi can see where you were going with it but it does not follow the same conventions as cinder or neutron ports14:13
bauzas2/ unmount the share if not used elsewhere14:13
gibibauzas: OK so you suggest to do the unmount during stop, not during the detach REST API call14:14
sean-k-mooneywell the spec says we do the unmount during stop14:14
bauzasnah, sorry14:14
bauzasI'm waaaay confused14:14
bauzasuser has to do two actions then14:14
bauzasstop then detach14:14
bauzasright?14:14
Ugglayep14:14
bauzasand detach does two things14:15
bauzas(internally)14:15
bauzasone is updating the guest XML14:15
gibinope14:15
bauzasthe other is to umount the share14:15
gibithe vm is stopped no need to update the XML14:15
Ugglagibi ++14:15
bauzasmy head is full of vGPU crap for now14:15
gibithe XML is generated during start based on the share_mappings14:16
bauzasI have to reload that context14:16
sean-k-mooneythe vm is stopped meaning the share is also unmounted already14:16
gibinot14:16
gibitoday14:16
sean-k-mooneyin the spec that's what the diagram states14:16
gibitoday stop is not unmounting the share from the host14:16
gibiOK, then that is a gap between the spec and the code14:16
sean-k-mooneywell that is not in line with the spec then14:16
sean-k-mooneyyep14:16
sean-k-mooneywhat's in the spec i think aligns with what i would like to see14:17
gibido we want to move the host unmount to stop?14:17
sean-k-mooneyif we do that then detach cant fail in this edge case14:17
gibiand do we want to fail the stop action if unmount fails?14:17
bauzasdo we all agree on the fact that unmount is not strictly required ?14:18
sean-k-mooneyand i believe unmounting the share (if no other vm is using it) in stop is also more secure14:18
gibibauzas: do you mean, is it OK to leak mounts?14:18
sean-k-mooneybauzas: no14:18
gibiI'm not sure what happens with a mount if the export on the manila side is removed14:18
bauzasumount is not strictly required if we still have a share mapping, right ?14:18
gibibauzas: if you mean we have a record of the mount so we can do the unmount later, then yes14:19
sean-k-mooneyas far as im concerned the attach and detach rpcs should not try to mount/unmount the share14:19
bauzasgibi: I'm going back to my original statement : I don't want to have share leaks14:19
bauzasgibi: if we're not able to guarantee that some share mapping state will continue to track shares that still reside on the compute, then we need to error the detach call14:20
sean-k-mooneyfrom the spec14:20
sean-k-mooney"""Share attachment/detachment can only be done if the VM state is STOPPED or ERROR. These are operations only on the database, and no RPC calls will be required to the compute API. This is an intentional design for this spec. As a result, this could lead to situation where the VM start operation fails as an underlying share attach fails."""14:20
sean-k-mooneythat ^ is what the code should be doing14:20
gibibauzas: I agree14:20
bauzasthe phi-state in the diagram actually needs to be explained more14:21
gibiOK. so if we follow that statement from the spec then both the mount and unmount RPCs are unnecessary14:21
sean-k-mooneycorrect 14:21
sean-k-mooneyits just a cell db update14:21
bauzasI have a vgpu meeting in 8 mins so I'll need to bail out shortly14:21
sean-k-mooneyto create a row or delete it 14:21
gibithis also means we will do the host mount during start and unmount during stop / delete14:22
sean-k-mooneyyep14:22
sean-k-mooneyif start fails to mount we go to error where you can either retry or detach the share14:22
gibiso if the mount / umount fails start or stop action can fail and put the VM state in ERROR14:22
sean-k-mooneyyep14:22
sean-k-mooneythats what we approved14:22
sean-k-mooneyand all the later assumptions of the spec are based on that14:23
gibiUggla: do you have a reasoning why we diverged from the spec in this regard? Were there any late findings during implementation?14:23
gibior in other words, what issues do you foresee if we do what the spec says and move the host mount / unmount?14:25
Ugglagibi, yes but I do not remember well. I guess the reason was simpler error management and simplifying later online attach.14:27
sean-k-mooneywell this discussion shows that the error management is more complex14:28
gibionline attach is a good topic. When we have that in libvirt we will need an RPC during the attach REST API call. 14:28
sean-k-mooneyyes, although i do not think that is relevant here14:29
bauzasI think the intent was to let the user restart its instance if it was erroring due to transient issues14:29
sean-k-mooneyyes we will need an rpc but that rpc would only be successful if the share was both mounted and attached to the vm14:29
sean-k-mooneyit should likely revert the share mounting on the host if the libvirt attach failed14:30
sean-k-mooneyso you can try attach again14:30
bauzasthat said, there are billing implications on a stop action that would fail due to some internal problem14:30
sean-k-mooneywe explicitly allow detach in ERROR14:30
sean-k-mooneyto make sure that is not a problem14:30
bauzasI don't think our public cloud providers would like their users to continue to be charged because they have some network issues14:30
gibiI think bauzas refers to stop failing not detach failing14:30
sean-k-mooneyif stop fails you can detach and then stop again14:30
bauzasgibi: correct14:31
gibiahh14:31
gibiyou cannot detach if it is not stopped14:31
sean-k-mooneythe spec says you can do it in the error state14:31
sean-k-mooneyalso from the ordering the vm would be stopped then the detach would fail14:31
sean-k-mooney*then the unmount would fail14:32
gibiso stop fails, puts the VM in ERROR, we allow detach in ERROR, so user does the detach, detach deletes the mapping, we lost the mapping while the unmount never happened14:32
sean-k-mooneyso it should be safe to allow the detach to update the db.14:32
sean-k-mooneyyep and that's where a periodic would come in14:33
sean-k-mooneyto allow that to be healed14:33
* bauzas on a meeting by now14:34
sean-k-mooneysame14:34
gibithen lets park this for after the meeting14:34
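To summarize the flow the discussion converges on, here is a rough, purely hypothetical pseudocode sketch (the function and field names are illustrative, not nova's real internals): detach only touches the database, and the host-side mount/unmount happens on start/stop, with a failed unmount keeping the mapping so nothing leaks.

    # Hypothetical sketch only; names are illustrative, not nova's real API.
    class InvalidState(Exception):
        pass

    class UnmountFailed(Exception):
        pass

    def detach_share(instance, share_mapping, db):
        # Per the spec, detach is allowed only while the VM is STOPPED or in
        # ERROR, and it is purely a DB update; no RPC to the compute is needed.
        if instance.vm_state not in ("stopped", "error"):
            raise InvalidState("detach requires a stopped or errored instance")
        db.delete_share_mapping(share_mapping)

    def stop_instance(instance, host, db):
        host.power_off(instance)
        for mapping in db.share_mappings(instance):
            # only unmount when no other instance on this host uses the share
            if db.share_in_use_elsewhere(mapping.share_id, host=host):
                continue
            try:
                host.unmount(mapping.export_location)
            except UnmountFailed:
                # keep the mapping so the mount is not leaked; mark the stop
                # action as errored and let the user (or a periodic task) retry
                db.record_action_error(instance, action="stop")
                raise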
opendevreviewTakashi Kajinami proposed openstack/os-vif master: Drop wrong stacklevel  https://review.opendev.org/c/openstack/os-vif/+/90968215:05
tkajinam^^^ I noticed I've made this mistake in a few places... it'd be nice if this can be merged and included in the final release for 2024.1 ...15:23
opendevreviewRajesh Tailor proposed openstack/nova master: Add support for showing requested az in output  https://review.opendev.org/c/openstack/nova/+/90456815:29
opendevreviewStephen Finucane proposed openstack/nova-specs master: Add openapi spec  https://review.opendev.org/c/openstack/nova-specs/+/90944818:06
artomSo, I tried running with CPU power management defaulted to on, and got this: https://review.opendev.org/c/openstack/whitebox-tempest-plugin/+/909785/120:05
artomInteresting libvirt failures launching, for example, with an emulator thread policy20:05
artomI wonder if the job is doing something wrong, or are there bugs...20:06
sean-k-mooney is that using isolate20:09
artom<cputune>20:10
artom<vcpupin vcpu='0' cpuset='6'/>20:10
artom<vcpupin vcpu='1' cpuset='5'/>20:10
artom<emulatorpin cpuset='4'/>20:10
artom</cputune>20:10
artomThat looks like `share`20:10
sean-k-mooneyits only share if cpu_shared_set=420:11
sean-k-mooneycpu_dedicated_set = 4-620:11
sean-k-mooneythat is isolate20:11
sean-k-mooneyhttps://763b7c31d44017973024-0677f910aec9b59957afc606ff2e533f.ssl.cf2.rackcdn.com/909785/1/check/whitebox-devstack-multinode/532b307/compute/logs/etc/nova/nova-cpu_conf.txt20:12
sean-k-mooneyso yes there is a bug, it's not onlining the isolate cores20:12
sean-k-mooneythere will be 1 and it's included in the instance numa topology20:12
artomThat's on the compute20:12
artomcpu_dedicated_set = 4-620:12
artomcpu_shared_set = 2,320:12
sean-k-mooneyoh this was on the controller20:13
artomNo you got it right, it's the compute20:13
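So the relevant compute settings are (real option names, values as quoted from the linked nova-cpu_conf.txt; they live in the [compute] section):

    [compute]
    cpu_dedicated_set = 4-6
    cpu_shared_set = 2,3

With emulatorpin on core 4 coming from the dedicated set rather than the shared set, that matches the isolate emulator-thread policy.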
sean-k-mooneyso ya sounds like a trivial bug with isolate20:13
sean-k-mooneythat should be simple to reproduce and fix20:13
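For reference, a reproducer only needs a flavor with the isolate emulator-thread policy; something like the following (the extra spec keys are the real ones, the flavor name and sizes are arbitrary, matching the 2 vCPU / 64 MB guest seen in the logs below):

    openstack flavor create isolate-repro --vcpus 2 --ram 64 --disk 1 \
        --property hw:cpu_policy=dedicated \
        --property hw:emulator_threads_policy=isolate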
artomI think there are others20:14
artomIt's failing with a similar error on NUMA live migration...20:15
sean-k-mooneythat is the only one that can use a core from the cpu dedicated set that is not for a vcpu20:15
artomBut I need to track down the exact instance that it's live migrating...20:15
sean-k-mooneyso isolate is the only way we have a non-vcpu thread pinned to the cpu_dedicated_set20:16
sean-k-mooneyfor numa live migration this would cause issues if we don't have Rene's fix for the cpu pinning20:16
artomWhich we don't20:17
sean-k-mooneywell we should merge and backport that fix20:17
sean-k-mooneyfor now can you file a bug for isolate20:17
sean-k-mooneyhttps://review.opendev.org/c/openstack/nova/+/87777320:18
sean-k-mooneyso ya still open20:18
artomYeah, I guess he's concentrating on the manila stuff20:18
sean-k-mooneywhich is fair20:19
* artom tries depending on it to see if it reduces the number of failures20:20
sean-k-mooneybut that's why that is broken20:20
sean-k-mooneyit should, assuming it's not in merge conflict20:20
artomI'll see you in two hours ;)20:21
sean-k-mooneyi was going to say i wont be around then....20:21
artomWhich is entirely legitimate20:21
sean-k-mooneybut when has that happened when i planned20:21
artomThat's between you and your sense of discipline ;)20:22
sean-k-mooneyand my stomach20:23
sean-k-mooneyfor some reason it has notions that i should feed it at least once a day20:23
artomPfft, what does it think it is20:24
sean-k-mooneyso the bug is here by the way https://github.com/openstack/nova/blob/master/nova/virt/libvirt/cpu/api.py#L10520:25
sean-k-mooneyi could have sworn instance.numa_topology.cpu_pinning also had the isolated core for emulator threads20:25
sean-k-mooneywhen using that policy20:26
artomMore in power_up(), but yeah20:26
artomBoth, really20:26
sean-k-mooneyso based on the docstring https://github.com/openstack/nova/blob/master/nova/objects/instance_numa.py#L291-L29620:28
sean-k-mooneyit should be all the cores for the instance20:29
artomWhen has a docstring ever been wrong ;)20:29
sean-k-mooneyi said this should be easy but this behaviour is buried deep in the cpu pinning code20:39
sean-k-mooneylooking at it quickly https://github.com/openstack/nova/blob/3209f6551652cff7bef0b9d9719ab940dd05a0f8/nova/virt/hardware.py#L2693-L270920:39
sean-k-mooneyi think it should be working...20:40
artomYeah, same, the more I look, the more I'm lost20:40
sean-k-mooneyas far as i can tell we are correctly reserving 1 additional core in the vm's numa node 0 for the isolated thread20:40
sean-k-mooneyand then storing it in the same set/dict as all the rest20:41
sean-k-mooneythat's what     if instance_cell.cpuset_reserved:20:41
sean-k-mooney                pinned_cpus |= instance_cell.cpuset_reserved20:41
sean-k-mooneyis doing 20:41
sean-k-mooneyanyway first step, file a bug20:42
sean-k-mooneysecond step, create a reproducer20:42
artomSo at least for that same instance20:43
artomWith the emulator thread on host CPU 420:43
artomIt logs this:20:44
artomFeb 21 19:15:30.167111 [compute/logs/screen-n-cpu.txt] np0036828693 nova-compute[42254]: DEBUG nova.virt.hardware [None req-ec45061f-e9a4-4b02-9354-0cb390bd28cf tempest-EmulatorThreadTest-1184416592 tempest-EmulatorThreadTest-1184416592-project-member] Selected cores for pinning: [(0, 6), (1, 5)], in cell 0 {{(pid=42254) _pack_instance_onto_cores /opt/stack/nova/nova/virt/hardware.py:905}}20:44
sean-k-mooneyso im sus that https://github.com/openstack/nova/blob/3209f6551652cff7bef0b9d9719ab940dd05a0f8/nova/virt/hardware.py#L2689-L269420:44
sean-k-mooneydoes not have a bug20:44
sean-k-mooneybut i can't fully say why i feel like that20:44
artomSo it doesn't log the "extra" emulator thread core20:45
sean-k-mooneyright20:46
artomAnd doesn't pin it at https://github.com/openstack/nova/blob/3209f6551652cff7bef0b9d9719ab940dd05a0f8/nova/virt/hardware.py#L90820:46
sean-k-mooneyjust doing an or20:46
sean-k-mooneypinned_cpus |= instance_cell.cpuset_reserved20:46
sean-k-mooneywould work if cpuset_reserved is disjoint from instance_cell.cpu_pinning.values()20:46
sean-k-mooneywhich it should be20:46
sean-k-mooneybut...20:47
sean-k-mooneydo you see this debug message https://github.com/openstack/nova/blob/3209f6551652cff7bef0b9d9719ab940dd05a0f8/nova/virt/hardware.py#L663-L67320:48
sean-k-mooneyPacking an instance onto a set of sibling ...20:48
artomFeb 21 19:15:30.165439 [compute/logs/screen-n-cpu.txt] np0036828693 nova-compute[42254]: DEBUG nova.virt.hardware [None req-ec45061f-e9a4-4b02-9354-0cb390bd28cf tempest-EmulatorThreadTest-1184416592 tempest-EmulatorThreadTest-1184416592-project-member] Packing an instance onto a set of siblings:     host_cell_free_siblings: [{6}, set(), {5}, {4}, set()]    instance_cell: InstanceNUMACell(cpu_pinning_raw=None,cpu_policy='dedi20:49
artomcated',cpu_thread_policy=None,cpu_topology=<?>,cpuset=set([]),cpuset_reserved=None,id=0,memory=64,pagesize=None,pcpuset=set([0,1]))    host_cell_id: 0    threads_per_core: 1    num_cpu_reserved: 1 {{(pid=42254) _pack_instance_onto_cores /opt/stack/nova/nova/virt/hardware.py:663}}20:49
sean-k-mooneyok so ya its asking for the isolated core20:50
artomWell, we know the policy itself works20:50
artomBut is it recording the extra pinned core in a way that the CPU power management code can then power on/off20:50
sean-k-mooneyso we should run https://github.com/openstack/nova/blob/3209f6551652cff7bef0b9d9719ab940dd05a0f8/nova/virt/hardware.py#L754-L78220:50
sean-k-mooneyit's recording it in cpuset_reserved20:51
sean-k-mooneywhich is then |= with the other cores20:51
artomThe comment on that says that it's reserved for the hypervisor20:51
artomIn the object20:51
sean-k-mooneyin this case hypervisor means qemu20:52
sean-k-mooneyits reserved for the qemu emulator thread20:52
artomHah20:52
artomSo should the power management code be powering down/up cpuset_reserved as well?20:52
artomWithout going through the whole paperwork, we can do a quick POC and Depends-on in whitebox to see20:53
sean-k-mooneyit should already be included in pinned_cpus20:53
sean-k-mooneybut yes20:53
sean-k-mooneyit should be powering up the emulator thread core20:53
sean-k-mooneydo you see this message https://github.com/openstack/nova/blob/3209f6551652cff7bef0b9d9719ab940dd05a0f8/nova/virt/hardware.py#L776-L77720:53
artomIt just checks cpu_pinning...20:54
sean-k-mooney yep20:54
sean-k-mooneyFeb 21 19:15:30.166825 np0036828693 nova-compute[42254]: INFO nova.virt.hardware [None req-ec45061f-e9a4-4b02-9354-0cb390bd28cf tempest-EmulatorThreadTest-1184416592 tempest-EmulatorThreadTest-1184416592-project-member] Computed NUMA topology reserved pCPUs: usable pCPUs: [[], [], [4]], reserved pCPUs: {4}20:55
sean-k-mooneybut notice anything odd20:55
sean-k-mooneywe have these two logs back to back20:55
sean-k-mooneyFeb 21 19:15:30.166472 np0036828693 nova-compute[42254]: INFO nova.virt.hardware [None req-ec45061f-e9a4-4b02-9354-0cb390bd28cf tempest-EmulatorThreadTest-1184416592 tempest-EmulatorThreadTest-1184416592-project-member] Computed NUMA topology CPU pinning: usable pCPUs: [[6], [5], [4]], vCPUs mapping: [(0, 6), (1, 5)]20:55
sean-k-mooneyFeb 21 19:15:30.166825 np0036828693 nova-compute[42254]: INFO nova.virt.hardware [None req-ec45061f-e9a4-4b02-9354-0cb390bd28cf tempest-EmulatorThreadTest-1184416592 tempest-EmulatorThreadTest-1184416592-project-member] Computed NUMA topology reserved pCPUs: usable pCPUs: [[], [], [4]], reserved pCPUs: {4}20:55
sean-k-mooneyit calculated both20:56
sean-k-mooneyand we should be merging them20:56
sean-k-mooneybut then it prints 20:57
sean-k-mooneySelected cores for pinning: [(0, 6), (1, 5)], in cell 0 {{(pid=42254) _pack_instance_onto_cores /opt/stack/nova/nova/virt/hardware.py:905}}20:57
sean-k-mooneywhich is form here https://github.com/openstack/nova/blob/3209f6551652cff7bef0b9d9719ab940dd05a0f8/nova/virt/hardware.py#L905-L90620:58
sean-k-mooneyartom: tldr adding cpuset_reserved to power on and power off would fix it20:59
artomYeah, lemme try a quick hax of that20:59
sean-k-mooneybut i think based on the code we were expecting that not to be required21:00
sean-k-mooneyso maybe we need to update the doc string or the property to include cpuset_reserved21:00
sean-k-mooneyi.e. if you made https://github.com/openstack/nova/blob/3209f6551652cff7bef0b9d9719ab940dd05a0f8/nova/objects/instance_numa.py#L291-L296 also include it21:02
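A rough sketch of that option (not the actual patch; cpu_pinning and cpuset_reserved are real fields on the instance NUMA objects, everything else is illustrative): fold the reserved emulator-thread core into the set of host CPUs the topology reports, so the power-management code onlines and offlines it as well.

    # Illustrative sketch only, not the real nova change.
    @property
    def cpu_pinning(self):
        """All host CPUs the instance is pinned to, including any core
        reserved for emulator threads by the isolate policy."""
        cpus = set()
        for cell in self.cells:            # InstanceNUMACell objects
            if cell.cpu_pinning:           # vCPU -> pCPU mapping
                cpus |= set(cell.cpu_pinning.values())
            if cell.cpuset_reserved:       # pCPU(s) reserved for emulator threads
                cpus |= cell.cpuset_reserved
        return cpus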
sean-k-mooneyok im going to finish there let me know how your hacking goes21:02
artomIt's done, just need to push and retry whitebox21:02
opendevreviewArtom Lifshitz proposed openstack/nova master: POC: online dedicated emulator threads as well  https://review.opendev.org/c/openstack/nova/+/90979521:03
artomHrmm, I can't Depends-on different Nova changes, can I?21:03
sean-k-mooneyyou can21:04
sean-k-mooneyyou can depend on 2 different patches in the same repo21:05
sean-k-mooneyyou normally don't want to21:05
artomReally? And Zuul does like a magic merge of them?21:05
sean-k-mooneyyep21:05
sean-k-mooneyso as long as they don't have merge conflicts between each other it will work21:05
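For example, the whitebox change's commit message could carry two footers like this (illustrative, pointing at the two nova changes mentioned in this discussion):

    Depends-On: https://review.opendev.org/c/openstack/nova/+/877773
    Depends-On: https://review.opendev.org/c/openstack/nova/+/909795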
artomDayum, I'm impressed21:05
artomOK, https://review.opendev.org/c/openstack/nova/+/909795 can run while I pick up the kids from school21:06
sean-k-mooneywe normally don't do that because it's way more fragile but it works in limited cases21:06
opendevreviewArtom Lifshitz proposed openstack/nova master: POC: online dedicated emulator threads as well  https://review.opendev.org/c/openstack/nova/+/90979521:07
sean-k-mooneyartom: cell is spelt with a c21:08
sean-k-mooneyhttps://review.opendev.org/c/openstack/nova/+/909795/1/nova/objects/instance_numa.py#30121:08
artomDepends what you're celling21:09
sean-k-mooney:)21:09
opendevreviewPavlo Shchelokovskyy proposed openstack/nova master: Fix device type when booting from ISO image  https://review.opendev.org/c/openstack/nova/+/90961121:09
opendevreviewPavlo Shchelokovskyy proposed openstack/nova master: Fix device type when booting from ISO image  https://review.opendev.org/c/openstack/nova/+/90961123:08

Generated by irclog2html.py 2.17.3 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!