Tuesday, 2018-10-23

imacdonntonyb: around?00:10
tonybimacdonn: Kinda ... in meetings (yes plural) ;P00:11
*** rcernin has joined #openstack-nova00:11
imacdonntonyb: heh, ok .. whenever you get a chance, take a look at this backport .... https://review.openstack.org/61170100:11
tonybimacdonn: Will do.  I'm thinking I might ask on the operators list before we merge00:12
tonybIt00:12
imacdonntonyb: fair enough - thanks!00:12
*** rcernin_ has quit IRC00:12
*** rcernin has quit IRC00:13
*** rcernin has joined #openstack-nova00:14
*** mlavalle has quit IRC00:32
*** takashin has joined #openstack-nova00:33
openstackgerritMerged openstack/nova stable/pike: Handle volume API failure in _post_live_migration  https://review.openstack.org/61109300:43
*** tetsuro has joined #openstack-nova00:45
*** gouthamr has joined #openstack-nova00:47
*** brinzhang has joined #openstack-nova00:50
*** tetsuro has quit IRC01:06
*** tetsuro_ has joined #openstack-nova01:06
*** mrsoul has quit IRC01:21
*** spatel has joined #openstack-nova01:22
*** imacdonn has quit IRC01:23
*** imacdonn has joined #openstack-nova01:23
*** markvoelker has joined #openstack-nova01:25
*** hongbin has joined #openstack-nova01:27
*** tiendc has joined #openstack-nova01:29
*** hongbin has quit IRC01:36
openstackgerritMerged openstack/nova master: Zuul: Update barbican experimental job  https://review.openstack.org/61014101:37
*** hongbin has joined #openstack-nova01:37
*** hongbin_ has joined #openstack-nova01:41
*** Dinesh_Bhor has joined #openstack-nova01:42
*** hongbin has quit IRC01:43
*** trungnv has joined #openstack-nova01:44
*** mhen has quit IRC01:44
Dinesh_BhorHi All, Is there a recommended/official API rate limiter for Nova?01:46
*** mhen has joined #openstack-nova01:47
*** TuanDA has joined #openstack-nova01:49
*** idlemind has joined #openstack-nova01:53
*** Dinesh_Bhor has quit IRC01:59
*** hongbin has joined #openstack-nova02:05
*** hongbin_ has quit IRC02:07
openstackgerritMerged openstack/nova master: Remove the extensions framework from wsgi.py  https://review.openstack.org/60709202:12
*** lei-zh has joined #openstack-nova02:22
* alex_xu begin the spec review day02:22
openstackgerritMerged openstack/nova master: Remove duplicate legacy-tempest-dsvm-multinode-full job  https://review.openstack.org/61093102:24
*** lei-zh has quit IRC02:30
*** lei-zh1 has joined #openstack-nova02:30
*** Dinesh_Bhor has joined #openstack-nova02:35
*** trungnv has quit IRC02:36
*** TuanDA has quit IRC02:36
*** trungnv has joined #openstack-nova02:36
*** TuanDA has joined #openstack-nova02:36
openstackgerritFan Zhang proposed openstack/nova master: Retry after hitting libvirt error VIR_ERR_OPERATION_INVALID in live migration.  https://review.openstack.org/61227202:48
*** psachin has joined #openstack-nova02:57
*** dave-mccowan has quit IRC02:57
openstackgerritSeyeong Kim proposed openstack/nova master: Enable connection_info refresh for new-style attachments  https://review.openstack.org/57900402:59
*** mikeoschen has joined #openstack-nova03:00
*** munimeha1 has quit IRC03:01
openstackgerritMerged openstack/nova stable/rocky: Use nova-consoleauth only if workaround enabled  https://review.openstack.org/61067303:03
openstackgerritMerged openstack/nova master: conductor: Recreate volume attachments during a reschedule  https://review.openstack.org/58707103:10
*** Dinesh_Bhor has quit IRC03:37
*** lei-zh1 has quit IRC03:39
*** hongbin has quit IRC03:52
*** udesale has joined #openstack-nova03:58
openstackgerritSam Morrison proposed openstack/nova stable/rocky: Fix up compute rpcapi version for pike release  https://review.openstack.org/61256104:02
openstackgerritSam Morrison proposed openstack/nova stable/queens: Fix up compute rpcapi version for pike release  https://review.openstack.org/61256204:03
*** Dinesh_Bhor has joined #openstack-nova04:42
openstackgerritSeyeong Kim proposed openstack/nova master: Enable connection_info refresh for new-style attachments  https://review.openstack.org/57900404:51
*** lei-zh1 has joined #openstack-nova04:54
*** janki has joined #openstack-nova04:59
*** spatel has quit IRC05:02
*** ratailor has joined #openstack-nova05:18
*** jiaopengju has quit IRC05:20
*** jiaopengju has joined #openstack-nova05:23
*** mikeoschen has quit IRC05:23
*** tbachman has quit IRC05:33
*** tbachman has joined #openstack-nova05:37
*** ralonsoh has joined #openstack-nova05:40
*** ttsiouts has quit IRC05:42
*** ttsiouts has joined #openstack-nova05:43
*** Luzi has joined #openstack-nova05:44
*** ttsiouts has quit IRC05:47
*** takashin has left #openstack-nova05:48
*** spsurya has joined #openstack-nova05:48
*** tetsuro_ has quit IRC05:50
*** jangutter has joined #openstack-nova06:03
*** hamdyk has joined #openstack-nova06:06
*** maciejjozefczyk has quit IRC06:09
*** adrianc has joined #openstack-nova06:26
*** adrianc_ has joined #openstack-nova06:26
*** ccamacho has joined #openstack-nova06:31
*** slaweq has joined #openstack-nova06:43
*** moshele has joined #openstack-nova06:50
*** Dinesh_Bhor has quit IRC06:52
*** pcaruana has joined #openstack-nova06:56
aperevalovHello, community! Do you have a functional test for Direct ports? Does this test require HW with SR-IOV support? I can't find anything similar in the neutron-tempest-test.07:01
bauzasgood morning Nova07:02
openstackgerritSeyeong Kim proposed openstack/nova master: Enable connection_info refresh for new-style attachments  https://review.openstack.org/57900407:02
*** jaosorior has quit IRC07:02
*** jaosorior has joined #openstack-nova07:05
*** maciejjozefczyk has joined #openstack-nova07:05
*** rcernin has quit IRC07:06
*** rpittau has quit IRC07:07
*** rpittau has joined #openstack-nova07:07
*** alexchadin has joined #openstack-nova07:10
*** sahid has joined #openstack-nova07:16
*** Dinesh_Bhor has joined #openstack-nova07:16
*** helenafm has joined #openstack-nova07:20
*** lei-zh1 has quit IRC07:24
*** lei-zh1 has joined #openstack-nova07:24
openstackgerritJan Gutter proposed openstack/nova-specs master: Spec to implement generic HW offloads for os-vif  https://review.openstack.org/60761007:40
openstackgerritAdrian Chiris proposed openstack/nova master: WIP - SRIOV live migration  https://review.openstack.org/61262007:44
openstackgerritAdrian Chiris proposed openstack/nova master: WIP - SRIOV live migration  https://review.openstack.org/61262008:02
*** bhagyashris has joined #openstack-nova08:07
openstackgerritPooja Jadhav proposed openstack/nova master: Ignore root_gb for BFV in simple tenant usage API  https://review.openstack.org/61262608:15
*** k_mouza has joined #openstack-nova08:15
*** pvradu has joined #openstack-nova08:16
*** ttsiouts has joined #openstack-nova08:17
*** dtantsur|afk is now known as dtantsur08:22
*** mvkr has quit IRC08:26
*** derekh has joined #openstack-nova08:29
openstackgerritMerged openstack/nova master: Convert legacy-tempest-dsvm-neutron-src-oslo.versionedobjects job  https://review.openstack.org/61027108:29
*** moshele has quit IRC08:30
*** gouthamr has quit IRC08:32
openstackgerritElod Illes proposed openstack/nova master: Transform compute_task notifications  https://review.openstack.org/48262908:37
*** k_mouza has quit IRC08:41
openstackgerritElod Illes proposed openstack/nova master: Transform compute_task notifications  https://review.openstack.org/48262908:43
*** k_mouza has joined #openstack-nova08:43
*** priteau has joined #openstack-nova08:45
*** moshele has joined #openstack-nova08:50
*** cdent has joined #openstack-nova08:51
*** mvkr has joined #openstack-nova08:51
*** Dinesh_Bhor has quit IRC08:54
*** moshele has quit IRC09:00
*** k_mouza has quit IRC09:06
*** k_mouza has joined #openstack-nova09:07
*** gouthamr has joined #openstack-nova09:11
*** tetsuro has joined #openstack-nova09:14
*** bzhao__ has quit IRC09:20
*** jpena|off is now known as jpena09:22
*** alex_xu has quit IRC09:23
*** lei-zh1 has quit IRC09:28
*** k_mouza has quit IRC09:28
*** alex_xu has joined #openstack-nova09:29
bauzasstephenfin: good morning09:33
bauzasstephenfin: do we have some documentation about PCI NUMA policies in https://docs.openstack.org/nova/queens/admin/adv-config.html ?09:33
bauzasstephenfin: or shall I refer to the implemented spec https://specs.openstack.org/openstack/nova-specs/specs/queens/implemented/share-pci-between-numa-nodes.html ?09:33
bauzasstephenfin: context is me trying to update https://review.openstack.org/#/c/552924/10/specs/stein/approved/numa-topology-with-rps.rst09:33
*** k_mouza has joined #openstack-nova09:35
*** mvkr has quit IRC09:36
*** bhagyashris has quit IRC09:39
*** mvkr has joined #openstack-nova09:50
stephenfinbauzas: We should have. Sec09:51
*** udesale has quit IRC09:51
*** udesale has joined #openstack-nova09:52
*** abhi89 has joined #openstack-nova09:55
stephenfinbauzas: This is all we have https://docs.openstack.org/nova/rocky/configuration/config.html#pci09:55
stephenfinWe should probably expand that out. As such though, the spec looks like the best option09:56
abhi89Hi.. whenever nova service runs a command using sudo, does this make a call to unix_chkpwd program which is part of pam_unix module?09:58
*** pvradu has quit IRC10:00
*** pvradu has joined #openstack-nova10:00
openstackgerritElod Illes proposed openstack/nova master: Transform scheduler.select_destinations notification  https://review.openstack.org/50850610:01
*** TuanDA has quit IRC10:04
*** pvradu_ has joined #openstack-nova10:05
*** Dinesh_Bhor has joined #openstack-nova10:07
*** sahid has quit IRC10:08
*** adrianc has quit IRC10:08
*** pvradu has quit IRC10:09
*** adrianc_ has quit IRC10:09
*** pvc_ has joined #openstack-nova10:15
pvc_hi bauzas can you help me regarding vgpu?10:15
*** Dinesh_Bhor has quit IRC10:15
pvc_http://paste.openstack.org/show/732763/10:15
bauzasstephenfin: ack, thanks10:16
bauzasstephenfin: anyway, it's just for explaining that it won't be modified by my spec :)10:16
*** cdent has quit IRC10:16
pvc_bauzas hi10:16
*** brinzh has joined #openstack-nova10:17
*** brinzhang has quit IRC10:20
pvc_can help me regarding vgpu bauzas?10:22
bauzaspvc_: I'm pretty busy today with specs reviews and writing, can we discuss other days ?10:22
bauzaspvc_: anyway, looking at your paste, doesn't seem related to nova at all10:23
pvc_noted on this i cant install the driver10:23
pvc_thank you10:23
*** fanzhang has quit IRC10:26
stephenfinOh, today is spec review day. Oops10:32
pvc_can i ask if i need the vfio-pci or not?10:36
pvc_this one NVIDIA-Linux-x86_64-390.72-vgpu-kvm.run?10:38
*** cdent has joined #openstack-nova10:43
*** mrch has joined #openstack-nova10:43
*** moshele has joined #openstack-nova10:45
*** adrianc_ has joined #openstack-nova10:45
*** adrianc has joined #openstack-nova10:45
*** tbachman has quit IRC10:47
pvc_how can i add now a vgpu to use by my instance?10:50
sean-k-mooneypvc_: you need to add a resource class request to your instance flavor10:54
sean-k-mooneyif you do then the libvirt driver will automatically add it to your vm10:54
sean-k-mooneythat said, that assumes your host has an nvidia vgpu-capable graphics card and you have the correct drivers on the host to enable it10:55
sean-k-mooneypvc_: e.g. your card needs to be on this list https://docs.nvidia.com/grid/gpus-supported-by-vgpu.html10:56
sean-k-mooneypvc_: if your card is not on that list you cannot use the vgpu support and can only use pci passthrough instead to dedicate the entire gpu to the guest10:57
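(For reference, the flavor change being described would look roughly like this; the flavor name and sizes are illustrative, not taken from the log:)

    openstack flavor create --vcpus 4 --ram 4096 --disk 40 vgpu-small
    openstack flavor set vgpu-small --property resources:VGPU=1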
pvc_i have this mdev10:57
*** erlon has joined #openstack-nova10:58
pvc_sean-k-mooney http://paste.openstack.org/show/732798/10:58
sean-k-mooneythose are the mdev types, not mdevs, but you need to choose one to enable in the nova config10:59
pvc_i already have this wait11:00
pvc_enabled_vgpu_types = nvidia-16011:01
pvc_i add it on overcloud-compute11:01
pvc_Exceeded maximum number of retries. Exhausted all hosts available for retrying build failures for instance 4c6de80f-cc4d-47c9-a63d-f72a5584ecb6.11:02
openstackgerritSylvain Bauza proposed openstack/nova-specs master: Proposes NUMA topology with RPs  https://review.openstack.org/55292411:03
pvc_do i need to do something sean-k-mooney?11:03
sean-k-mooneypvc_: overcloud compute: are you doing a triple-o deployment? if so im assuming the compute node is an ironic baremetal node11:03
pvc_yes it is a tripleo-deployment11:03
*** ttsiouts has quit IRC11:04
sean-k-mooneylooking at the docs https://docs.openstack.org/nova/queens/admin/virtual-gpu.html i dont see anything specifically11:05
pvc_i already set my flavor http://paste.openstack.org/show/732799/11:06
openstackgerritMerged openstack/nova master: Rename tempest-nova job to follow conventions  https://review.openstack.org/61223011:08
*** yikun has quit IRC11:09
pvc_libvirtError: internal error: qemu unexpectedly closed the monitor: 2018-10-23T11:07:33.700541Z qemu-kvm: -device vfio-pci,id=hostdev0,sysfsdev=/sys/bus/mdev/devices/9d0df47b-1fc5-4868-bd39-3cf301918e78,bus=pci.0,addr=0x5: vfio error: 9d0df47b-1fc5-4868-bd39-3cf301918e78: error getting device from group 58: Input/output error11:09
pvc_sean-k-mooney http://paste.openstack.org/show/732800/11:10
sean-k-mooneypvc_: that look like you have having issue with your iommu config11:10
openstackgerritMerged openstack/nova master: Use assertRegex instead of assertRegexpMatches  https://review.openstack.org/61160811:11
sean-k-mooneyjust to check a few things: you are using kvm as the hypervisor, correct11:11
sean-k-mooneycan you show me your kernel commandline11:11
sean-k-mooneyyou should have intel_iommu=on and iommu=pt set11:11
sean-k-mooneysimilarly you should have vt-d enabled in the bios as you would for sriov11:12
pvc_http://paste.openstack.org/show/732802/ sean-k-mooney11:12
*** tetsuro has quit IRC11:12
sean-k-mooneybut my guess is that the slot the gpu is in is sharing an iommu group with another device11:13
sean-k-mooneypvc_: add iommu=pt to the host as a starting point.11:13
pvc_wait11:13
pvc_i add it then reboot11:13
sean-k-mooneyafter you regenerate your grub file yes11:14
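(A rough sketch of the grub step being described, assuming a grub2-based CentOS 7 host like the compute node in this conversation; the config path differs on UEFI systems:)

    # append the IOMMU parameters to GRUB_CMDLINE_LINUX in /etc/default/grub, e.g.
    #   GRUB_CMDLINE_LINUX="... intel_iommu=on iommu=pt"
    grub2-mkconfig -o /boot/grub2/grub.cfg
    reboot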
pvc_i dont need the vfio_pci right?11:14
*** udesale has quit IRC11:14
sean-k-mooneyyou do11:14
pvc_vfio-pci module?11:14
sean-k-mooneymdev stands for vfio mediated device11:14
pvc_i disable it since i cannot install the NVIDIA-kvm driver11:15
sean-k-mooneyif you disable it then kvm cannot pass through any device that depends on the kernel vfio framework, which includes mdevs11:16
pvc_okay i'll set it back11:16
sean-k-mooneydid you follow instruction that said you should disable it from nvidia?11:16
pvc_but my compute node already have the nvidia driver11:16
pvc_because earlier i cant install the nvidia-driver11:16
pvc_after disabled the vfio_pci, i successfully installed it11:17
pvc_i will get it back11:17
sean-k-mooneyso looking at https://images.nvidia.com/content/grid/pdf/GRID-vGPU-User-Guide.pdf in section 3.1 it does not say anything about disabling vfio so i would guess that is the issue11:18
pvc_i will get it back sean-k-mooney wait11:18
pvc_so install the driver then enable it. I enable it first after installing the driver11:19
pvc_sean-k-mooney can i add all the available vgpu on the nova.conf?11:22
pvc_then create a flavor that have 1 VGPU, 2 VGPU, 3 VGPU?11:23
sean-k-mooneypvc_: you can only have 1 vgpu per vm currently and you can only enable 1 vgpu mdev type per physical host11:25
sean-k-mooneypart of the limitation comes from libvirt and part from nvidia. multiple mdevs can be attached to the same instance but it's a rather new addition to the kernel and things have not really matured yet in libvirt/qemu11:26
pvc_so i cannot use the 24gpus on my physical host?11:26
janguttersean-k-mooney: minor correction, intel_iommu=on is required, and "iommu=pt" is discouraged.11:27
*** panda is now known as panda|lunch11:27
janguttersean-k-mooney: iommu=pt used to be required when DPDK hadn't set vfio-pci as it's default transport yet, and people used uio.11:28
*** ratailor has quit IRC11:29
sean-k-mooneyjangutter: yes but i still recommend iommu=pt as the iommu is picky sometimes and i find you hit fewer corner cases if you limit its scope to passthrough devices11:29
pvc_ERROR nova.compute.manager [instance: 014fcc5f-660b-40df-92ed-9f4587993fa7] Verify all devices in group 58 are bound to vfio-<bus> or pci-stub and not already in use11:30
sean-k-mooneypvc_: yes so as i said previously i think your error is the gpu is sharing an iommu group with another device11:31
sean-k-mooneypvc_: all devices in a iommu group must use the same kernel driver11:31
janguttersean-k-mooney: heh, iommu=pt is one of the worst-named options. The "passthrough" mapping it enables is a way to 'bypass' the IOMMU by creating a 1:1 memory map for the PCI space.11:31
pvc_how can i check that sean-k-mooney?11:31
sean-k-mooneyjangutter: yes and that 1:1 mapping fixes so manny things :P11:31
jangutterpvc_: yeah, you have to hand off _all_ the devices in an iommu group, and some platforms can't divide between them.11:32
sean-k-mooneypvc_: am you can find this in sysfs11:32
pvc_sysfs?11:32
sean-k-mooneylet me see if i can remember11:32
pvc_thank you11:32
pvc_jangutter i cannot use the 12vgpus of my 1 tesla?11:32
pvc_only 1 vgpus?11:32
*** jpena is now known as jpena|lunch11:33
jangutterpvc_: it depends, the chipset may allow you to pass the entire card at once, but not a portion of it.11:33
sean-k-mooneypvc_:  you can but if your tesla shares an iommu group with a nic then they both need to be bound to vfio-pci or pci-stub11:34
jangutterpvc_: the kernel documentation (low level warning) is at: https://www.kernel.org/doc/Documentation/vfio.txt11:34
pvc_thankyou jangutter, how can i do that sean-k-mooney?11:34
pvc_sorry this is my first time using vgpu, im using just the pci-passthroigh11:34
sean-k-mooneyfirst we need to see what is in iommu group 58 in  /sys/class/iommu/11:35
jangutterpvc_: you can do something like: ls -l /sys/bus/pci/devices/0000:06:0d.0/iommu_group/devices11:35
jangutterpvc_: where the pci address is obviously yours.11:35
jangutterpvc_: that's a list of PCI devices in one group11:35
jangutterpvc: they _all_ have to be passed through together, or it will fail.11:36
pvc_lrwxrwxrwx. 1 root root 0 Oct 23 11:36 0000:06:00.0 -> ../../../../devices/pci0000:00/0000:00:02.0/0000:06:00.011:36
jangutterAlso check /sys/kernel/iommu_groups/58 ?11:37
openstackgerritElod Illes proposed openstack/nova master: Transform scheduler.select_destinations notification  https://review.openstack.org/50850611:37
pvc_979d010e-17a3-4ac9-987a-565f9ba4b4a611:38
pvc_[root@overcloud-novacompute-0 iommu]# ls /sys/kernel/iommu_groups/58/devices/ 979d010e-17a3-4ac9-987a-565f9ba4b4a611:38
jangutterpvc_: interesting... I haven't seen a UUID there yet. can you ls -l it?11:39
sean-k-mooneyjangutter: my guess is the uuid is a mdev uuid11:39
pvc_http://paste.openstack.org/show/732803/11:39
pvc_mdev bus types plus driver http://paste.openstack.org/show/732805/11:41
*** janki has quit IRC11:41
*** claudiub has joined #openstack-nova11:41
claudiubheyo. since it's spec review day, could you take a look again at the live-resize one? https://review.openstack.org/#/c/141219/11:42
jangutterpvc_, sean-k-mooney: should the PCI passthrough libxml element set "managed=true"?11:43
janguttersean-k-mooney: managed=true generally means that it will auto-bind vfio-pci to the device before attempting passthrough?11:43
bauzasjangutter, sean-k-mooney: context ?11:43
sean-k-mooneyjangutter: sorry where was the libvirt xml i missed that11:43
*** ttsiouts has joined #openstack-nova11:43
sean-k-mooneybauzas: trying to help pvc_ with their vgpu issue11:43
bauzasand?11:44
sean-k-mooneybauzas: pvc_ is seeing a weird iommu error11:44
jangutterhang on, looking up the libxml doc.11:44
bauzaswe don't manage the iommu group11:44
*** cdent has quit IRC11:45
jangutter(rofl) s/libxml/libvirt/11:45
sean-k-mooneybauzas: yes that is managed by the uefi and kernel11:45
*** eharney has joined #openstack-nova11:45
*** markvoelker has quit IRC11:45
bauzasoh wait11:46
janguttersean-k-mooney: when doing https://libvirt.org/formatdomain.html#elementsHostDevSubsys <--- there's a 'managed=yes' element in the xml. if that's set it will do the vfio-pci binding for you.11:46
bauzasis pvc_ doing PXI11:46
sean-k-mooneybauzas: pvc_ is getting Verify all devices in group 58 are bound to vfio-<bus> or pci-stub and not already in use11:46
bauzaspci passthrough?11:46
pvc_im using vgpu bauzas11:47
sean-k-mooneybauzas: no, pvc_ is trying to do vgpu passthrough not pci11:47
sean-k-mooneybauzas: this is their flavor http://paste.openstack.org/show/732799/11:47
bauzassec, GPU passthrough?11:47
sean-k-mooneyand pvc_ has set the gpu type in the config to nvidia-16011:47
bauzasI'm confused11:48
jangutterpvc_: what does "readlink /sys/bus/pci/devices/0000:06:00.0" say?11:48
bauzasvirtual GPU or GPU passthrough?11:48
*** ttsiouts has quit IRC11:48
sean-k-mooneybauzas: pvc_ has a tesla gp100 and is trying to use the mdev based virtual gpu11:48
bauzasthen don't do vfio bus11:49
bauzasor pci stub11:49
bauzasjust use the nvidia kernel module11:50
jangutteraaah, the penny drops.11:50
sean-k-mooneypvc_: can you provide the libvirt xml that nova generated so we can see what its doing11:50
pvc_i add an option of  options vfio-pci ids=10de:15f8 should i disable this?11:50
pvc_then new error is occured11:50
bauzassean-k-mooney : I think pvc_ is mixing two different things11:50
pvc_2018-10-23 11:49:46.754 7 WARNING nova.virt.libvirt.driver [req-a3570604-eed3-4f8f-a244-202ac2b92b7d - - - - -] Error from libvirt while getting description of instance-00000001: [Error Code 42] Domain not found: no domain with matching uuid '2ac6c395-5f92-4e9e-a52a-cf90b9d551c5' (instance-00000001): libvirtError: Domain not found: no domain with matching uuid '2ac6c395-5f92-4e9e-a52a-cf90b9d551c5' (instance-00000001)11:50
openstackgerritJim Rollenhagen proposed openstack/nova-specs master: Use conductor groups to partition nova-compute services for Ironic  https://review.openstack.org/60970911:50
sean-k-mooneypvc_: you should not be disabling or forcing vfio-pci11:51
sean-k-mooneyyou should allow it to be loaded if needed but otherwise do not set any options for the vfio kernel module at all11:52
pvc_http://paste.openstack.org/show/732806/11:53
bauzasyup this11:54
sean-k-mooneypvc_: the physical gpu needs to be bound to the nvidia grid driver and the mdevs will be created via the vfio framework in the kernel, but you should not force the pgpu to use vfio-pci11:54
pvc_i need to remove that?11:54
bauzasyup it's only for GPU passthrough11:54
pvc_okay wait11:54
bauzashence my confusion11:54
pvc_i'll remove11:54
sean-k-mooneybauzas: im guessing the driver in use should be nvidia_vgpu_vfio or nvidia, correct11:54
sean-k-mooneyprobably nvidia_vgpu_vfio11:55
pvc_i reboot again the hypervisor wait11:55
bauzassean-k-mooney, I don't remember the module name but yeah something like that11:56
pvc_bauzas is it possible to use the 12vgpus of my gpu?11:57
bauzasFWIW, I'll have connection issues this afternoon due to some planned outage in my street11:57
sean-k-mooneywell of the 3: nouveau, nvidia_vgpu_vfio, nvidia. nouveau is the opensource driver for the pgpu, nvidia is the binary driver from nvidia for the same, so that just leaves nvidia_vgpu_vfio11:58
sean-k-mooneypvc_: that depends on the mdev type you selected. but you should be able to; however you can only request 1 vgpu per guest currently11:59
pvc_06:00.0 3D controller [0302]: NVIDIA Corporation GP100GL [Tesla P100 PCIe 16GB] [10de:15f8] (rev a1)         Subsystem: NVIDIA Corporation Device [10de:118f]         Kernel driver in use: nvidia12:00
pvc_it's already nvidia12:00
pvc_http://paste.openstack.org/show/732807/12:00
sean-k-mooneypvc_: yes but it may need to be nvidia_vgpu_vfio. nvidia is just the normal binary driver for using the gpu on the host12:01
sean-k-mooneybest thing to do is try and boot a vm and see what happens12:01
pvc_same error :( http://paste.openstack.org/show/732808/12:02
*** ttsiouts has joined #openstack-nova12:02
pvc_http://paste.openstack.org/show/732808/ bauzas and sean-k-mooney12:02
*** spatel has joined #openstack-nova12:04
sean-k-mooneypvc_: in this case try unbinding the card from the nvidia driver and bind it to nvidia_vgpu_vfio12:04
pvc_noted on this12:04
pvc_i add this on module12:04
pvc_options nvidia_vgpu_vfio ids=10de:15f812:05
sean-k-mooneycan you bind it by hand instead of via the module file to test it12:05
pvc_how can i bind it? i'm sorry im not done it before12:05
bauzasthat's super weird12:05
pvc_this is my conf12:06
pvc_http://paste.openstack.org/show/732809/12:06
*** dave-mccowan has joined #openstack-nova12:07
bauzaspvc_ did you remove the nouveau driver ?12:07
sean-k-mooneyecho  06:00.0 | sudo tee /sys/bus/pci/drivers/nvidia/unbind12:07
sean-k-mooneyecho  06:00.0 | sudo tee /sys/bus/pci/drivers/nvidia_vgpu_vfio/bind12:07
pvc_[root@overcloud-novacompute-0 nova]# lsmod | grep nou [root@overcloud-novacompute-0 nova]#12:07
pvc_yes bauzas12:07
pvc_tee: /sys/bus/pci/drivers/nvidia/unbind: No such device12:08
*** spatel has quit IRC12:08
pvc_I install this driver12:09
pvc_NVIDIA-Linux-x86_64-390.72-vgpu-kvm.run12:09
bauzasI need to drop, planned outage here12:09
sean-k-mooneypvc_:  from http://paste.openstack.org/show/732807/ that should have been the driver in use12:10
pvc_i use this 0000:06:00.012:10
pvc_tee: /sys/bus/pci/drivers/nvidia_vgpu_vfio/bind: No such file or directory12:10
pvc_it is already unbind12:10
pvc_no nvidia_vgpu_vfio on drivers12:11
pvc_just nvidia12:11
*** janki has joined #openstack-nova12:14
sean-k-mooneypvc_: this is the latest version of the nvidia vgpu user guide https://docs.nvidia.com/grid/5.0/pdf/grid-vgpu-user-guide.pdf i think you need to go back through it, and section 4.2 specifically12:14
pvc_sean-k-mooney im using a ubuntu image with img_hide_hypervisor_id='true'12:14
sean-k-mooneypvc_: i asked this earlier but is the compute node a physical server or a vm12:15
pvc_compute node is a physical server12:15
pvc_on docs it said the grid driver12:15
pvc_but this is the driver i installed12:15
pvc_installed NVIDIA-Linux-x86_64-390.72-vgpu-kvm.run12:16
pvc_i have this grid driver NVIDIA-Linux-x86_64-390.75-grid.run12:16
sean-k-mooneypvc_: oh ok i think i understand the issue then12:17
sean-k-mooneyyou install the guest driver on the host12:17
pvc_yes sean-k-mooney12:17
pvc_to enable this12:17
pvc_ /sys/class/mdev_bus/*/mdev_supported_types12:17
pvc_i install the driver on my compute node ( baremetal server ) im using a tripleo-deployment12:18
*** tbachman has joined #openstack-nova12:20
aperevalovhello, do nova or neutron have a functional test for direct (SR-IOV) ports (something like a tempest test)?12:22
pvc_sean-k-mooney what will i do then?12:22
sean-k-mooneyaperevalov: i dont believe so in upstream tempest. neutron may have fullstack tests but in general our sriov testing is limited12:23
*** abhi89 has quit IRC12:23
*** tbachman_ has joined #openstack-nova12:24
*** cdent has joined #openstack-nova12:24
sean-k-mooneypvc_: this seems to be a driver issue, not a nova one. there is little more advice i can give other than to follow the nvidia docs from start to finish exactly12:24
aperevalovsean-k-mooney: I assume, if such functional test exists it uses real HW with SR-IOV, but not simulators or emulators.12:25
*** tbachman has quit IRC12:25
*** tbachman_ is now known as tbachman12:25
pvc_so you think i installed the grid one?12:25
*** janki has quit IRC12:25
pvc_my baremetal is centos 712:25
sean-k-mooneyaperevalov: so the fact we can't use simulators/emulators for sriov is the main reason we have such limited testing12:25
*** janki has joined #openstack-nova12:25
sean-k-mooneypvc_: yes you need to install the grid one12:25
sean-k-mooneyat least i think so.12:26
pvc_okay wait12:26
sean-k-mooneyaperevalov: i recently looked at using the netdevsim kernel module for sriov testing but it only emulates the kernel netdevs; it does not emulate the pci devices or virtual functions so we cant use it for testing12:27
*** janki has quit IRC12:27
aperevalovsean-k-mooney: I checked netdevsim on the latest kernel too; there is no device to put into a docker container. The docker container is because I'm doing this research for kuryr-kubernetes.12:28
aperevalovsean-k-mooney: yes, netdevsim is based on its own bus, not pci. I also found an attempt to add SR-IOV support to QEMU's pci emulation.12:29
sean-k-mooneyaperevalov: yes, so there is a netdev but no backing pci device. so you can use ip link and allocate vfs via sysfs but they dont show up on the virtual pci bus12:30
sean-k-mooneyaperevalov: yes i have seen that in the past but that wont help us with testing, as we would need the hosting vms provided by the cloud providers to emulate pci devices that support sriov12:31
aperevalovsean-k-mooney: but it was postponed due to lack of working qemu drivers; the initial author suggested copying e1000, but it wasn't in working condition.12:31
*** udesale has joined #openstack-nova12:31
sean-k-mooneyaperevalov: so first qemu would have to be extended, then libvirt and then nova. once that is done we would need cloud providers to provide vms with virtual sriov-capable nics, then we could start using it in the upstream gate12:32
sean-k-mooneyaperevalov: effectively we rely on third-party cis to test sriov. either intel's or mellanox's ci12:33
sean-k-mooneyintel's ci was intended to have significantly more sriov testing than it currently has but that never happened12:33
*** janki has joined #openstack-nova12:34
aperevalovsean-k-mooney, I see it's a long way, and it seems netdevsim (just a kernel module) looks like the easiest option (if the kernel community agrees to bind it to the pci bus).12:34
sean-k-mooneyaperevalov: most people dont know that sr-iov is a specification from the PCI-SIG. i dont think the current netdevsim module is technically a conforming sriov implementation without the pci emulation12:35
pvc_sean-k-mooney ls: cannot access /sys/class/mdev_bus/*/mdev_supported_types: No such file or directory12:36
sean-k-mooneyaperevalov: that said it's a netdev simulator, not an sriov simulator, so they skipped the bits they did not need for their own testing12:36
*** markvoelker has joined #openstack-nova12:36
pvc_i reboot again to reflect the new driver12:36
*** markvoelker has quit IRC12:37
*** jchhatbar has joined #openstack-nova12:38
aperevalovsean-k-mooney: ok, I'll talk with the authors of netdevsim. Thank you for the information. BTW is the intel or mellanox ci publicly available, or does that work go through their teams involved in the openstack community?12:38
*** jamesdenton has joined #openstack-nova12:39
*** janki has quit IRC12:40
*** jpena|lunch is now known as jpena12:40
aperevalovsean-k-mooney: I think moshele knows it.12:40
*** brinzh has quit IRC12:40
pvc_hi sean-k-mooney12:41
mosheleaperevalov: I know what?12:41
pvc_after installign the grid driver the mdev is gone12:41
sean-k-mooneyso i used to work at intel and one of my roles there was product owner of the intel nfv ci. it used to fall to the upstream teams to identify which features needed ci testing and either add it or request that the team maintaining it add it to their backlog12:41
pvc_ls: cannot access /sys/class/mdev_bus/*/mdev_supported_types: No such file or directory12:41
pvc_sean-k-mooney any ideas? ls: cannot access /sys/class/mdev_bus/*/mdev_supported_types: No such file or directory12:41
pvc_after installing the grid driver only not the vgpu12:42
sean-k-mooneyim not sure how the intel nfv ci currently works but if you reach out to the new maintainer im sure they will respond12:42
mosheleaperevalov: The Mellanox CI is public but not part of the openstack community. and I think intel CI is the same, because it depend on nic vendor12:45
sean-k-mooneypvc_: sorry, not really. as i said this seems to be an nvidia driver issue. i do not have access to the hardware to test it myself so beyond reading their docs there is not much more advice i can give12:45
pvc_but sean-k-mooney when i install the kvm-vgpu i can list the mdev12:46
aperevalovmoshele: does it use in gerrit integration tests by zuul?12:47
mosheleaperevalov: we run tempest scenarios and configure the tempest to use vnic_type=direct12:48
mosheleaperevalov:  see https://github.com/openstack/tempest/blob/master/tempest/config.py#L62812:49
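(The tempest option moshele is pointing at is port_vnic_type; a minimal tempest.conf sketch, assuming it lives in the [network] group as in the linked config.py:)

    [network]
    port_vnic_type = direct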
sean-k-mooneymoshele: oh when was that option added?12:50
openstackgerritStephen Finucane proposed openstack/nova stable/rocky: fixtures: Track volume attachments within CinderFixtureNewAttachFlow  https://review.openstack.org/61248512:50
sean-k-mooneythat is useful to know about12:50
openstackgerritStephen Finucane proposed openstack/nova stable/rocky: Add regression test for bug#1784353  https://review.openstack.org/61248612:50
openstackgerritStephen Finucane proposed openstack/nova stable/rocky: conductor: Recreate volume attachments during a reschedule  https://review.openstack.org/61248712:50
mosheleaperevalov: long time ago12:50
moshelesean-k-mooney: long time ago12:50
*** jchhatbar is now known as janki12:50
pvc_vfio_iommu_type1.allow_unsafe_interrupts=112:51
pvc_sean-k-mooney vfio_iommu_type1.allow_unsafe_interrupts=1?12:51
sean-k-mooneymoshele: cool. aperevalov the intel ci does something similar12:51
sean-k-mooneyaperevalov: the intel ci uses the standard scenario tests but adds extra flavor extra specs for cpu pinning, hugepages, numa topology, etc.12:52
aperevalovsean-k-mooney: If I understood correctly, kuryr-kubernetes (tempest tests) could also be run there?12:52
sean-k-mooneypvc_: are you getting a message in dmesg?  vfio_iommu_type1.allow_unsafe_interrupts=1 is specifically for working around old buggy hardware12:52
sean-k-mooneyaperevalov: the intel nfv ci does not load the kuryr-kubernetes tempest module or deploy tempest12:53
sean-k-mooneyat least it didnt when i was involved with it12:53
pvc_Oct 23 12:53:40 overcloud-novacompute-0 journal: 2018-10-23 12:53:40.708+0000: 3128: warning : virDomainAuditHostdev:424 : Unexpected hostdev type while en12:53
pvc_Oct 23 12:53:40 overcloud-novacompute-0 journal: libvirt: QEMU Driver error : Requested operation is not valid: domain is not running12:53
sean-k-mooney* or deploy kuryr-kubernetes12:53
sean-k-mooneypvc_: i dont really have time to continue debugging, sorry. i need to update some reviews and catch up on spec review today12:54
pvc_2018-10-23 12:41:10.331+0000: 3216: error : virPCIDeviceNew:1787 : Device 0003:01:05.1 not found: could not access /sys/bus/pci/devices/0003:01:05.1/config12:56
pvc_virPidFileAcquirePath:422 : Failed to acquire pid file '/var/run/libvirtd.pid': Resource temporarily unavailable12:57
pvc_sean-k-mooney there  is an issue on my nova_libvirt13:02
sean-k-mooneypvc_: ok but that is not a nova issue. it's either a libvirt or a docker/tripleo issue, assuming you can access /sys/bus/pci/devices/0003:01:05.1/config from the host.13:05
* jroll glances at channel topic13:06
pvc_there is no 0003:01:01.1 sean13:06
*** bnemec has joined #openstack-nova13:08
*** psachin has quit IRC13:10
*** cdent has quit IRC13:12
*** cdent has joined #openstack-nova13:14
pvc_sean-k-mooney is libvirtd not running is not an issue?13:15
efriedbauzas: https://review.openstack.org/#/c/612497/ <== provider config yaml file, split out from the device passthrough spec (with some of jaypipes' Rocky provider config file mixed in)13:17
bauzasefried: ack13:19
bauzasI have some planned outage this EU afternoon hence me being a bit afk but will look later tonight13:19
pvc_bauzas  Failed to acquire pid file '/var/run/libvirtd.pid': Resource temporarily unavailable :(13:20
pvc_bauzas fio error: cad68f60-930c-4d9b-b954-3e0cd855651e: error getting device from group 58: Input/output error13:20
pvc_anyone can help?13:33
*** mriedem has joined #openstack-nova13:34
*** adrianc_ has quit IRC13:35
*** adrianc has quit IRC13:35
*** boden has joined #openstack-nova13:42
*** awaugama has joined #openstack-nova13:42
*** boden has left #openstack-nova13:42
*** liuyulong has joined #openstack-nova13:48
*** mlavalle has joined #openstack-nova13:52
pvc_hi sean-k-moone do i need to hide the hypervisor of the image?13:53
pvc_hi sean-k-mooney   do i need to hide the hypervisor of the image?13:53
alex_xujaypipes: for https://review.openstack.org/#/c/555081, are you saying that the user must specify guest numa topology when using resources:PCPU=1 or resources:VCPU=113:53
sean-k-mooneyalex_xu: im not suer if cpu pinning auto creates a numa toplogy today but it is does not its one of the few numa specifc things that does not13:54
pvc_sean-k-mooney i have an error on my XML13:56
pvc_2bf12bf5 - default default] Error launching a defined domain with XML: <domain type='kvm'>13:56
alex_xusean-k-mooney: yes, I also think that. If the flavor doesn't include any guest numa topo, then we will get a None value for the InstanceNUMATopology object. But jaypipes still wants to use InstanceNUMATopology to store the cpu pinning. that is my confusion.13:57
sean-k-mooneyalex_xu: well, cpus have numa affinity so i would be fine with saying if you request pinning you now have a numa topology of 1 numa node for the vm unless you set a numa topology explicitly13:58
sean-k-mooneyalex_xu: we do this for hugepages13:58
sean-k-mooneypersonally i have normally argued against that but we have too much precedent to change it at this point13:59
openstackgerritDan Smith proposed openstack/nova master: Make CellDatabases fixture reentrant  https://review.openstack.org/61166513:59
openstackgerritDan Smith proposed openstack/nova master: Modify get_by_cell_and_project() to get_not_deleted_by_cell_and_projects()  https://review.openstack.org/60766313:59
openstackgerritDan Smith proposed openstack/nova master: Minimal construct plumbing for nova list when a cell is down  https://review.openstack.org/56778513:59
openstackgerritDan Smith proposed openstack/nova master: Refactor scatter-gather utility to return exception objects  https://review.openstack.org/60793413:59
openstackgerritDan Smith proposed openstack/nova master: Return a minimal construct for nova show when a cell is down  https://review.openstack.org/59165813:59
openstackgerritDan Smith proposed openstack/nova master: Return a minimal construct for nova service-list when a cell is down  https://review.openstack.org/58482913:59
pvc_is that related sean-k-mooney you think?13:59
sean-k-mooneypvc_: i dont know but im busy with 3 other things. i do not have time to help futher sorry.14:00
*** edmondsw has joined #openstack-nova14:00
alex_xusean-k-mooney: yea, that should be ok, that is just a clarification I asked for on the spec, since it isn't clear about that14:01
*** cdent has quit IRC14:01
sean-k-mooneyalex_xu: for what it's worth the free cpus are already tracked in the numa topology blob in the nova db so i dont think jay was proposing changing that14:02
alex_xujaypipes: ^ probably that is what I'm asking: do you plan to change the guest without a numa topo to a single numa cell topo14:02
jaypipesalex_xu: *currently* there is no way for a user to get pinned CPUs without the instance_extra.numa fields containing a serialized blob of InstanceNUMATopology object.14:03
jaypipesalex_xu: because, as you know, we couple the CPU pinning, memory page and NUMA topology stuff all together in the InstanceNUMATopology object :(14:03
sean-k-mooneyjaypipes: do we currently invent a single numa node topology today? it's been too long since i looked at the details of that code to remember that off the top of my head14:03
jaypipesalex_xu: the cpu-resource-tracking spec proposes absolutely no changes to any of that.14:04
*** panda|lunch is now known as panda14:04
sean-k-mooneyjaypipes: i have your spec on my list to review but i assumed we would still continue to do whatever we do today on that front14:04
jaypipessean-k-mooney: mriedem has basically shot down the possibility of cpu-resource-tracking happening in stein anyway, so I haven't been spending much time on it. :(14:05
alex_xujaypipes: yes, but your spec didn't say I must specify HW:NUMA_xxx stuff when I use resources:PCPU=114:05
mriedemonce again i am the killer of all hopes and dreams and kittens14:05
jrollalways and forever14:06
mriedemif others think we can pull that change off in stein and are planning on devoting review time to it, then go nuts14:06
jaypipesalex_xu: I was asked by bauzas to take all NUMA stuff out of the cpu-resource-tracking spec so he could address it in his numa spec.14:06
*** Luzi has quit IRC14:06
alex_xujaypipes: and what does CONF.shared_cpu_set mean? is it that the VCPUs will be pinned to the CPU set of CONF.shared_cpu_set, and then must I specify HW:NUMA_xxx with resources:VCPU=1?14:06
jaypipesalex_xu: I don't understand your question. could you rephrase?14:08
alex_xulet me try :)14:08
jaypipesalex_xu: CONF.cpu_shared_set already exists, btw14:08
sean-k-mooneyalex_xu: jaypipes so just looking at https://github.com/openstack/nova/blob/297de7fb9fbabe74b5305ef0aa82e196d5f48d5e/nova/virt/hardware.py#L1543-L1554 we create a single node numa topology for the guest today if using pinning unless you say otherwise14:09
jaypipessean-k-mooney: right, because we've coupled CPU pinning and NUMA together into the InstanceNUMAToplogy object.14:09
sean-k-mooneyalex_xu: so if you just set resources:PCPU=1 then i would assume we would create a single numa topology14:09
*** moshele has quit IRC14:10
sean-k-mooneyjaypipes: ya. if you want to decouple them and fix it im happy with that idea too14:10
jaypipessean-k-mooney: mriedem would never approve such a gigantic change. :P14:10
alex_xusean-k-mooney: no, we return early at https://github.com/openstack/nova/blob/297de7fb9fbabe74b5305ef0aa82e196d5f48d5e/nova/virt/hardware.py#L1538, actually it is None14:10
sean-k-mooneyalex_xu: that's for shared cpus14:11
jaypipesalex_xu: resources=VCPU:1 does not equal cpu_policy:shared14:11
sean-k-mooneypinned cpus have cpu_policy==dedicated14:11
jaypipesalex_xu: just another example of terrible coupling in this interface :(14:12
jaypipesif cpu_policy == fields.CPUAllocationPolicy.SHARED:14:12
jaypipes^^ that is not the same as resources=VCPU:114:12
alex_xujaypipes: your spec is about decoupling cpu pinning and numa. so conf.cpu_shared_set defines the pcpus on which the shared VCPUs run. if conf.cpu_shared_set=7-15, does it mean nova-compute will pin the guest vcpus to physical cpus 7 to 15?14:12
sean-k-mooneyalex_xu: it will float them over that range14:13
sean-k-mooneybut it wont 1:1 pin the shared cpus14:13
jaypipesalex_xu: no, my spec is not about decoupling CPU pinning and NUMA... my spec is about handling the allocation of dedicated CPUs in a deliberate way. My spec does not touch assignment of host processor to guest vCPU thread, which is what you are referring to.14:13
sean-k-mooneythat will be left to the kernel14:13
alex_xusean-k-mooney: jaypipes so for the request resources:VCPU=1, this VCPU can still run on the pcpus defined in conf.cpu_dedicated_set...14:15
sean-k-mooneyalex_xu: with jays spec, no. if i remember correctly jay was proposing deprecating the hw:cpu_policy extra spec and VCPU would correspond to the shared set and the PCPU resource would come from the dedicated set14:16
jaypipesalex_xu: if the virt driver isn't changed to assign one of the dedicated host CPUs, yep. But from placement (and resource tracking) perspective, we don't care about that. All we care about is that some amount of dedicated (or shared) CPU resources are being deducted from the appropriate inventory of that class of resource (either VCPU or PCPU)14:17
sean-k-mooneyalex_xu: jays spec is basically describing how we will keep a tally count of PCPU and VCPU in placement14:18
sean-k-mooneythe assignment of vms to host dedicated or shared cpu sets will be handled by the virt driver, not placement, using the existing numa topology blob in the nova db as we do today14:19
sean-k-mooneyplacement will just make sure we have enough cpus to fulfill the request without tracking which ones are free14:19
sean-k-mooneythats the virt driver / resource tracker's job14:19
jaypipessean-k-mooney: and yes, you're right that my spec proposes deprecating the cpu_policy extra spec.14:19
sean-k-mooneyi think i left a comment about maybe using it to translate the flavor VCPU field into resources:VCPU=X or resources:PCPU=X to ease transition, but long term it would no longer be needed14:21
alex_xuah....I probably I see...give me more seconds...14:21
*** janki has quit IRC14:21
mriedemis tpatil intel?14:24
mriedemoh NTT14:24
*** tiendc has quit IRC14:24
openstackgerritArtom Lifshitz proposed openstack/nova stable/rocky: Move live_migration.pre.start to the start of the method  https://review.openstack.org/61271414:25
openstackgerritArtom Lifshitz proposed openstack/nova stable/rocky: Ensure attachment cleanup on failure in driver.pre_live_migration  https://review.openstack.org/61271514:25
artommriedem, ^^ it has begun *dun dun dun*14:25
pvc_hi anyone14:26
pvc_how can i remove a pci devices?14:26
pvc_nova_libvirt searching for it but it is not existing14:26
*** cdent has joined #openstack-nova14:27
openstackgerritMatthew Booth proposed openstack/nova master: Fix test bug when host doesn't have /etc/machine-id  https://review.openstack.org/61271714:28
*** alexchadin has quit IRC14:30
openstackgerritMatthew Booth proposed openstack/nova master: Add regression test for bug 1550919  https://review.openstack.org/59173314:32
openstackbug 1550919 in OpenStack Compute (nova) "[Libvirt]Evacuate fail may cause disk image be deleted" [Medium,In progress] https://launchpad.net/bugs/1550919 - Assigned to Matthew Booth (mbooth-9)14:32
alex_xusean-k-mooney: jaypipes with that spec, the request with resources:PCPU=1 and without any HW:NUMA_.. stuff, that vcpu is also floating on all the pcpus?14:32
alex_xusince that spec is only about the counting pcpu and vcpu...14:33
openstackgerritMatthew Booth proposed openstack/nova master: Don't delete disks on shared storage during evacuate  https://review.openstack.org/57884614:33
*** medberry has joined #openstack-nova14:34
mriedempvc_: just fyi, today is a spec review sprint in nova so most people are busy with that. you could try asking your questions in the #openstack or #openstack-operators channels. for pci passthrough questions i'd normally direct you to sahid or cfriesen or moshele but none of them are online right now.14:35
mriedemi'd also think that excluding the pci devices you don't want to expose from https://docs.openstack.org/nova/latest/configuration/config.html#pci.passthrough_whitelist would work, but i don't know a lot about that code14:35
mriedempvc_: you could also post a question to the openstack-dev mailing list14:35
mriedemif this is a common problem and we don't have documentation for it, then we should have a docs bug14:36
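(A sketch of the kind of nova.conf entry mriedem is referring to; the vendor/product IDs here are the Tesla P100 IDs quoted earlier in the log, and only devices matching a whitelist entry are exposed for passthrough:)

    [pci]
    passthrough_whitelist = {"vendor_id": "10de", "product_id": "15f8"}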
pvc_thank you so much14:36
pvc_i will do that14:36
mriedemyw14:36
jaypipesalex_xu: yes14:38
jaypipesalex_xu: since the hw:numa_xxx tags are currently the only way to trigger any pinning behaviour.14:39
sean-k-mooneywell not quite you can use hw:cpu_policy14:39
alex_xujaypipes: ok, i see now, then in the future, we want that case work correctly, right?14:39
jaypipesalex_xu: and since my spec doesn't propose any changes to that, then the existing behaviour if an instance does not have the hw:numa_xxx specs means its vCPU threads float over whatever host processors are in CONF.cpu_shared_set.14:39
alex_xusean-k-mooney: yea, without hw_cpu_polciy also14:40
sean-k-mooneyalex_xu: you should assume that if you have resources:PCPU=X the virt driver will pin those cores, but how it does that is not really related to jays spec14:40
jaypipessean-k-mooney: you should NOT assume that.14:40
sean-k-mooneyjaypipes: why? that was the prerequisite for deprecating hw:cpu_policy14:41
jaypipessean-k-mooney: the only thing that guarantees assignment to a particular host CPU is the presence of hw:numa_xxx specs14:41
sean-k-mooneythe hw:numa_xxx specs today do not do that14:41
jaypipessean-k-mooney: that is a change that the virt driver will need to make, yes. but that change isn't part of my spec...14:41
*** hamdyk has quit IRC14:43
alex_xuso...probably we need to document somewhere for the user that resources:PCPU doesn't mean you get a dedicated cpu for your guest...14:43
sean-k-mooneyalex_xu: if we deprecate hw:cpu_policy we dont need to, because it will. if we dont, then yes14:44
sean-k-mooneyi guess we should document it in either case, but the point being that the only way to remove hw:cpu_policy is to either make resources:PCPU mean the virt driver will pin you or add a trait for pinned cpus, but that seems dumb14:47
sean-k-mooneyfrom a placement point of view it does not care if you are pinned or not14:48
sean-k-mooneyits just a resource class; the fact that we are giving it special semantics is a nova thing, not placement14:48
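(A rough sketch of the flavor and config side of the cpu-resource-tracking proposal discussed above; this is the proposed usage from the spec under review, not something implemented at the time, and cpu_dedicated_set in particular did not exist yet. The option group shown is an assumption:)

    # compute node nova.conf (cpu_shared_set already exists; cpu_dedicated_set is proposed)
    [compute]
    cpu_shared_set = 7-15
    cpu_dedicated_set = 0-6

    # flavors would then request the resource classes directly
    openstack flavor set shared-flavor --property resources:VCPU=4
    openstack flavor set pinned-flavor --property resources:PCPU=4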
alex_xusean-k-mooney: I see now14:49
*** ccamacho has quit IRC14:57
*** hamzy has quit IRC15:04
melwitt15:14
*** ccamacho has joined #openstack-nova15:17
*** ccamacho has quit IRC15:17
*** k_mouza has quit IRC15:18
*** k_mouza has joined #openstack-nova15:18
alex_xusean-k-mooney: jaypipes thanks for helping me understand correctly, I see now. I'll leave my vote at 0 for now; still not sure we should let resources:PCPU work like that, it's confusing for the end user. maybe we should set the cpu policy to dedicated in the numa implementation when the flavor only has resources:PCPU, but then we deprecate and remove the cpu_policy extra spec. so not sure15:24
mriedemKevin_Zheng: the detach/attach root volume on stopped instance spec might have applications in the rescue a volume-backed instance spec https://review.openstack.org/#/c/532410/15:24
mriedemjust FYI15:24
* alex_xu ends his spec review day...15:24
mriedemalex_xu: o/15:24
alex_xumriedem: enjoy~15:25
mriedemoh you know i will :)15:27
openstackgerritMatt Riedemann proposed openstack/nova master: Migrate "reboot an instance" user guide docs  https://review.openstack.org/61273015:28
*** dtantsur is now known as dtantsur|brb15:32
*** hamzy has joined #openstack-nova15:38
*** mvkr has quit IRC15:39
*** liuyulong is now known as liuyulong|away15:41
pvc_hi guys can i ask? So if ever i want to use vgpu i can just use one vgpu on enabled_vgpu_types = nvidia-35, then how about its performance if i launch 5 instance on that flavor?15:41
*** ivve has joined #openstack-nova15:42
mriedembauzas: ^15:44
melwittdansmith: could you take a look at this bug fix for a wrong pike compute rpc api alias causing problems with rolling upgrade pike => queens? https://review.openstack.org/612231 needs a +W15:44
pvc_is there a way that i can use all the vgpus of my gpu considering this on docs As of the Queens release, Nova only supports a single type. If more than one vGPU type is specified (as a comma-separated list), only the first one will be used.15:49
dansmithmelwitt: yep done, looks legit15:53
melwittdansmith: ty15:54
*** helenafm has quit IRC15:56
melwittpvc_: I can't comment on the performance but the limitation on vGPU types is about having multiple enabled_vgpu_types on one compute host. at present, you can't have more than one type on the same compute host15:56
melwittwe are working on adding support for multiple types this cycle15:57
pvc_thank you melwitt, i can launch many instance on flavor with gpu resources but im worried on the performance since we will install an gpu application on it.15:58
sean-k-mooneypvc_: be aware that supporting multiple vGPU types on the same host will not enable multiple vgpus to be consumed by a single vm16:02
pvc_i login to instance then query the nvidia, may i know if this is the driver that is need to be show? 0:05.0 VGA compatible controller [0300]: NVIDIA Corporation Device [10de:15f8] (rev a1)16:03
*** tbachman has quit IRC16:04
openstackgerritJan Gutter proposed openstack/os-vif master: Extend port profiles with datapath offload type  https://review.openstack.org/57208116:05
*** udesale has quit IRC16:06
*** tbachman has joined #openstack-nova16:06
*** mrch has quit IRC16:09
*** ttsiouts has quit IRC16:16
*** gyee has joined #openstack-nova16:16
*** ttsiouts has joined #openstack-nova16:17
*** ttsiouts has quit IRC16:21
*** cdent has quit IRC16:33
melwittdansmith: I've been meaning to ask you if you could review our cycle priorities doc where I've written down themes https://review.openstack.org/60980716:36
pvc_melwitt but it is possible to launch many instance on that flavor even if it just only one right?16:37
*** pvradu_ has quit IRC16:37
melwittpvc_: yes, it is. it's just that all the instances on the same compute host will have to use the same gpu type, for example nvidia-3516:38
*** cdent has joined #openstack-nova16:38
*** moshele has joined #openstack-nova16:38
dansmithmriedem: especially given it's spec day, could you circle back to this at some point? https://review.openstack.org/#/c/609709/316:38
mriedemdo i have to?16:39
pvc_so that is an issue on performance, so if ever i have 24 gpus on my compute node i can only use 1 right?16:39
dansmithmelwitt: ack16:40
*** k_mouza has quit IRC16:40
dansmithmriedem: no, but you were the last to -1 it, I asked a question which you never answered, and jroll has updated it16:40
mriedemok i hadn't seen, nor was looking16:41
mriedemi didn't pay attention to any of this at the PTG, so if there are more details from the ptg on alternatives and such, those should be in the spec, including caveats about what happens if the conductor group for a node is updated (as i had several questions about that)16:42
mriedemit sounds like, well that might work automagically, or it might not, shrug16:42
mriedemand i guess we don't want anything in the hypervisors API for this because it would be too ironic specific...16:44
melwittpvc_: hm, looks like you're right, it says only one vGPU per instance. I don't know why that is limited https://docs.openstack.org/nova/latest/admin/virtual-gpu.html#configure-a-flavor-controller16:44
melwittbauzas ^16:44
sean-k-mooneymelwitt: its limited because libvirt cannot support multiple vgpus on a single instance currently16:45
melwittsean-k-mooney: thanks16:45
pvc_yes but the problem is when we launch an instance it will use the same vgpu type enabled in the nova.conf, is there no way to use another vgpu type?16:45
sean-k-mooneypvc_: that is not how that works16:46
pvc_it's just for sharing sean-k-mooney?16:46
sean-k-mooneywhat you are enabling in the nova.conf is the type of vgpu16:46
dansmithmriedem: there are probably sequencing recommendations for when you change the mappings, but I think those are docs and don't need to be in the spec in great detail, IMHO. I don't think the behaviors are really any different than just spinning up a new or stopping a hashring-balanced compute today16:46
pvc_so if i use nvida-21116:46
sean-k-mooneyyou are not enabling a specific mdev instance, just the type of mdev that will be allocated when a vm requests a vgpu16:47
pvc_so in terms of performance it is okay sean-k-mooney?16:47
dansmithmriedem: if there's more that needs to be here, let's tell them what that is and let them move on16:47
sean-k-mooneypvc_: mdevs are not shared between instances. so the performance of the vgpus will be determined by the mdev type you select16:48
pvc_so if ever i have 3 mdev types ( nvidia-1, nvidia-2, nvidia-3), i use the nvidia-1 on nova.conf. What will happen to the 2 mdev types?16:48
sean-k-mooneyhigher performance mdev types consume more physical resources and allow fewer vgpus to be created, so it's a tradeoff16:49
sean-k-mooneyif you set nvidia-1 then nova will only create mdevs of the nvidia-1 type16:49
sean-k-mooneythere are strict rules about if and how mdev types can be mixed16:50
sean-k-mooneythey are vendor specific so initially we chose not to allow mixing them16:50
pvc_is that okay to be use for many instances?16:51
sean-k-mooneyyes, we will limit you to the number of mdevs that the enabled type reports it can allocate16:51
sean-k-mooneythere is no oversubscription or sharing of vgpus16:52
pvc_so if ever i have 24 mdev types i can launch 24 instance, correct me if im wrong.16:52
sean-k-mooneypvc_: no16:53
sean-k-mooneyan mdev type is like a nova flavor. it corresponds to a set of physical resources from the gpu. in the nvidia case it's a set of max resolutions, cuda cores, and vRAM16:54
sean-k-mooneythis is all detailed in the nvidia documentation16:54
pvc_thank you sean-k-mooney16:55
pvc_so if i have a chance to launch an instance is it okay and nova will tell me if ever i cannot allocate omre16:55
sean-k-mooneyyes, you will get a no valid host error if all vgpus have been consumed16:55
sean-k-mooneynova will read the number of allocations that can be made from the mdev type specified by the nova.conf and report that to placement16:56
pvc_i understand now, it is okay if you only specifiy one enabled_vgpu_types on nova.conf. But by default it will consume the mdev types. Thank you :)16:57
pvc_But by default it will consume the all mdev types*16:57
pvc_am i right?16:57
sean-k-mooneypvc_: no, if you don't specify an enabled vgpu type then no vgpus will be consumable16:58
pvc_i already specify one enabled type (for example nvidia-160)16:58
*** derekh has quit IRC17:00
pvc_thank you sean-k-mooney. :)17:00
sean-k-mooneypvc_: you should look at https://docs.nvidia.com/grid/latest/grid-vgpu-user-guide/index.html#vgpu-types-tesla-p100 to determine how your tesla P100 can be subdivided17:01
pvc_noted on this. thanks for your help. :)17:03
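For reference, a minimal sketch of the configuration discussed above. The type name nvidia-160 is just the example pvc_ mentioned (check /sys/class/mdev_bus/*/mdev_supported_types on the compute host for the types your card exposes), and vgpu-flavor is a hypothetical flavor name:

    # /etc/nova/nova.conf on the compute node
    [devices]
    enabled_vgpu_types = nvidia-160

    # flavor requesting a vGPU (only one per instance is currently supported)
    openstack flavor set vgpu-flavor --property resources:VGPU=1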
*** pvradu has joined #openstack-nova17:04
openstackgerritMerged openstack/nova-specs master: Document Stein review priorities  https://review.openstack.org/60980717:07
*** auggy has joined #openstack-nova17:08
*** pvradu has quit IRC17:09
*** hamzy has quit IRC17:12
*** hamzy has joined #openstack-nova17:12
*** dtantsur|brb is now known as dtantsur17:13
melwittdtantsur: I've been meaning to ask you a question about this ironic bug we fixed around RC time https://bugs.launchpad.net/nova/+bug/1787910 in comment #2 you mentioned the regression broke the ironic-inspector CI upstream. do you happen to know why the ironic-tempest-dsvm-ipa-wholedisk-bios-agent_ipmitool-tinyipa job we run in nova did not break the same way?17:17
openstackLaunchpad bug 1787910 in OpenStack Compute (nova) rocky "OVB overcloud deploy fails on nova placement errors" [High,Fix committed] - Assigned to Matt Riedemann (mriedem)17:17
dtantsurmelwitt: looking17:17
*** lbragstad is now known as lbragstad_f00d17:17
melwittthanks. I don't know the differences between the ironic-inspector job and our job17:18
dtantsurmelwitt: okay, so tripleo got broken because it still used disk/ram filters with ironic17:18
melwitttrying to learn if there's a test coverage gap we can close to catch more issues17:18
dtantsuras to inspector, I don't remember why exactly I mentioned that. but there is at least one difference17:18
dtantsurironic CI does not set vcpus/memory_mb on nodes, which leads to them not being exposed to nova for some time17:19
dtantsurironic-inspector, as part of its functioning, discovers these properties and sets them17:19
sean-k-mooneydtantsur: i was under the impression we ended up keeping the disk/ram filter for that reason in rocky and it was going to be fixed in ironic this cycle17:19
dtantsursean-k-mooney: I don't think it was for this reason, but I may be missing something17:20
dtantsurfor ironic these properties (and filters) are optional since.. mmm.. pike? maybe queens17:20
melwittdtantsur: thanks for the pointer17:21
dtantsurnp17:21
sean-k-mooneydtantsur: we definitely had a bug in the RC period related to them during the rocky release. mriedem do you remember what the bug related to ironic and the disk/ram filter was?17:21
dtantsursean-k-mooney: the link melwitt posted above? yeah, essentially disk/ram filter stopped working for ironic. since nobody cares about this case any more, we just ended up fixing tripleo to not enable these filters.17:22
dtantsurno further investigation was done IIUC17:22
sean-k-mooneydtantsur: maybe, not sure17:23
melwittyeah, looks like it, use of the filters resulted in NoValidHost because of the bug17:24
melwittso probably the failure to update_available_resource was showing up in our ironic job, but was hidden because things otherwise worked (without ram/disk filters), which seems like it would be unexpected17:25
sean-k-mooneyi think the only reason to use the disk/ram filter today would be if you were using the caching scheduler since it does not use placement17:25
*** pvc_ has quit IRC17:26
dtantsuryep17:27
*** dtantsur is now known as dtantsur|afk17:29
openstackgerritsean mooney proposed openstack/nova-specs master: Add spec for sriov live migration  https://review.openstack.org/60511617:31
melwittlooks like there were two different bugs in this bug. one was the update_available_resource failure (which wasn't caught by any CI) and then the core/ram/disk filter problem which was "unfixable", that is, the only way out was to stop configuring deployments to use the filters17:31
melwittit just so happened that because tripleo CI was failing because it was using the core/ram/disk filters, they also noticed the update_available_resource failure in the logs17:33
sean-k-mooneybrb17:33
*** jpena is now known as jpena|off17:34
openstackgerritMerged openstack/nova-specs master: Detach and attach boot volumes - Stein  https://review.openstack.org/60062817:34
*** pvradu has joined #openstack-nova17:42
*** lbragstad_f00d is now known as lbragstad17:45
mriedemefried: do you/anyone care about this anymore? https://review.openstack.org/#/c/560174/ it was mostly just historical documentation right?17:52
*** pvradu has quit IRC17:53
*** ralonsoh has quit IRC17:57
*** hamzy has quit IRC17:59
*** hamzy has joined #openstack-nova18:00
*** moshele has quit IRC18:12
*** medberry has quit IRC18:18
*** hamzy has quit IRC18:23
*** hamzy has joined #openstack-nova18:24
efriedmriedem: I'm not sure. The information is probably still useful. I guess the last sentence would need to be updated to reflect how we actually solved it. That's why I hadn't abandoned it yet.18:27
openstackgerritSundar Nadathur proposed openstack/nova-specs master: Nova Cyborg interaction specification.  https://review.openstack.org/60395518:27
openstackgerritArtom Lifshitz proposed openstack/nova stable/queens: Move live_migration.pre.start to the start of the method  https://review.openstack.org/61277318:33
openstackgerritArtom Lifshitz proposed openstack/nova stable/queens: Ensure attachment cleanup on failure in driver.pre_live_migration  https://review.openstack.org/61277418:33
efriedmriedem: abandoned18:35
melwitthuh, looks like it's no longer possible to filter logstash by n-cpu log type only. I guess not enough people used it18:35
*** irclogbot_4 has joined #openstack-nova18:35
artomHuh, stable/pike is going to be very problematic for ^^18:37
mriedemmelwitt: tags:"screen-n-cpu.txt"18:38
melwittmriedem: filters out my result when I do18:38
artomOh wait, our downstream bug is OSP13/queens, so we're good18:39
melwittoh wait18:39
melwittmriedem: user error, my bad18:39
edleafecdent: ^^ Got all the tests passing locally \o/18:40
edleafedoh! fat fingers ^^18:40
melwittI want to create an e-r query and I'm rusty18:41
mriedemwell, there are specs to be reviewed if you wanted to do that instead :)18:41
melwittI'm doing that too18:41
melwittI thought this would be quicker than it's being18:42
*** panda has quit IRC18:45
*** panda has joined #openstack-nova18:45
melwittbah, there's already a query for this. just e-r hasn't posted anything on it18:49
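For anyone as rusty on elastic-recheck as melwitt mentions being, a query is just a small YAML file in the elastic-recheck repo named after the bug number, roughly like the sketch below (the bug number and message text here are invented for illustration):

    # queries/1234567.yaml
    query: >-
      message:"Timed out during update_available_resource" AND
      tags:"screen-n-cpu.txt"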
mriedemjroll: you might want to take a gander at this ironic volume-backed resize/cold migrate spec https://review.openstack.org/#/c/449155/18:51
mriedemi haven't been through it in quite awhile18:51
mriedembut i'm also not very ironically inclined18:51
*** cfriesen has joined #openstack-nova18:52
mriedemseems that tuba guy would also care about this18:52
*** pvc has joined #openstack-nova18:53
pvcsean-k-mooney suddenly root@test-vgpu:/home/ubuntu# nvidia-smi No devices were found18:53
pvcso sad18:53
cfriesensince it's spec review day, I'd appreciate some eyes on https://review.openstack.org/#/c/571111/19:01
cfriesen(the emulated TPM spec)19:01
pvcnervermind me19:02
cfriesenI think the open questions are whether we want to support CRB at this point (and if so how to ask for it), and whether we need to explicitly call out what happens for non-x86 architectures or leave that for the implementation.19:04
*** edleafe_ has joined #openstack-nova19:15
*** edleafe_ has quit IRC19:16
*** pcaruana has quit IRC19:23
*** david-lyle has joined #openstack-nova19:27
*** dklyle has quit IRC19:28
*** lbragstad has quit IRC19:50
*** lbragstad has joined #openstack-nova19:53
*** david-lyle is now known as dklyle19:53
mriedemjaypipes: are there any plans for this? https://review.openstack.org/#/c/529135/19:58
sean-k-mooneymriedem: by the way, sorry to be so negative on https://review.openstack.org/#/c/612500 i understand why huawei and zte want this but i really don't think it's a viable option20:02
cdentany particularly exciting specs to look at?20:03
sean-k-mooneyi don't know, is https://review.openstack.org/#/c/600016/ worth reading?20:04
cdentsean-k-mooney: jay likes it, eric's less sure. I think if I update it to include the use case that eric's describing, it's a useful feature, but not likely to happen in stein20:05
jaypipesmriedem: that's another one I got tired of fighting about.20:05
artomWouldn't that be more of a placement spec at this point?20:05
jaypipesmriedem: would be nice to get a real solution for it. apparently Oath has a whole code series for that backported against Ocata that I am supposed to figure out what is wrong with. :(20:06
*** openstackgerrit has quit IRC20:06
sean-k-mooneycdent: i have not read it yet but my first reaction to "GET /resource_providers?having=VCPU" was: is that not already a thing?20:06
sean-k-mooneye.g. i just assumed there was an efficient way to say give me all the resource providers with X resource class20:07
cdentnope, you can't easily list _any_ resource provider that has a particular class of resource, as it will leave out providers that are full20:07
sean-k-mooneyah so it's the "and are not full" bit that is missing, right20:07
sean-k-mooneywell, and "may or may not be full"20:08
mriedemsean-k-mooney: it's not my spec20:08
sean-k-mooneye.g. just list all the RPs that have X regardless of their fullness20:08
cdentexactly20:08
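A sketch of the difference being discussed; the resources= form is the existing placement API, while the having= parameter is only a proposal from cdent's spec:

    # today: returns only providers with capacity for one more VCPU
    GET /resource_providers?resources=VCPU:1

    # proposed: returns every provider with VCPU inventory, full or not
    GET /resource_providers?having=VCPU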
cdentartom: placement specs is not a thing yet20:09
sean-k-mooneymriedem: ya i kind of assumed you just reproposed it after the spec repo rebase20:09
sean-k-mooneymriedem: but did you not say huawei wanted this back in dublin20:09
mriedemhuawei wants sriov bond yes20:09
mriedem*huawei's customers want sriov bond20:09
sean-k-mooneymriedem: ya that's what i meant, rather than that spec specifically20:10
mriedembut i can't get the network product guys internally to clearly describe the requirements they have against nova20:11
artomcdent, oh. Um, should it be, with the split and everything?20:11
mriedemso i can't really compare what they want, or their proposed solutions, to the zte spec20:11
*** mvkr has joined #openstack-nova20:12
mriedemexcept they all say "something has to do the bond within the guest" hand wave hand wave20:12
sean-k-mooneymriedem: ya i know that feeling. the zte solution is very non-cloudy20:12
cdentartom: the idea is that it will likely happen once the split is deemed complete. We have a list of things to complete before we declare that20:12
*** spatel has joined #openstack-nova20:12
mriedemjaypipes: you need to get your build request user metadata scheduler filter thing done first20:13
sean-k-mooneymriedem: what is often missed is, sure you can do the bond in the guest, but unless you tell neutron about it so it can configure the top of rack switch you're screwed if you want to use lacp20:13
artomcdent, fair enough, thanks :) And reading the commit message, you just wanted to stash https://review.openstack.org/#/c/600016/ somewhere20:13
*** tbachman has quit IRC20:13
jaypipesmriedem: yep.20:14
cdentartom: yup20:14
jaypipesmriedem: right after I shoot myself in the head from looking at Chef recipe bullshit.20:14
melwittmriedem: just so I'm clear, does the initial allocation ratios spec cover the ability to set per aggregate allocation ratios? https://review.openstack.org/552105 I mean, I know it's not called out in the spec, but does the spec allow it to work as a side effect?20:17
*** medberry has joined #openstack-nova20:17
melwittor is there more work or another spec we would need to restore per aggregate allocation ratio abilities?20:18
sean-k-mooneymelwitt: i think that would be a different spec. but if the desire for aggregate allocation ratios was just to be able to set it via an api then placement does that20:19
melwittyeah, that's what I mean, if it's already possible to set allocation ratio per aggregate in placement, then I guess after the spec I linked is implemented, users will have what they need to do it20:20
sean-k-mooneymelwitt: well no, that's not possible, but you can programmatically set an allocation ratio via the api. you would have to loop over the RPs in the placement aggregate and set them via the placement api to get the same effect as nova's old aggregate allocation ratios20:21
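A rough sketch of that loop against the placement REST API, assuming you already have an admin token and the aggregate uuid; the endpoint, token and uuid below are placeholders, and member_of requires placement microversion 1.3 or later:

    import requests

    PLACEMENT = "http://controller:8778/placement"  # placeholder endpoint
    HEADERS = {
        "X-Auth-Token": "ADMIN_TOKEN",              # placeholder token
        "OpenStack-API-Version": "placement 1.19",
    }
    AGG_UUID = "REPLACE-WITH-AGGREGATE-UUID"

    def set_aggregate_ratio(resource_class="VCPU", ratio=1.5):
        # list every resource provider that is a member of the aggregate
        rps = requests.get(
            "%s/resource_providers" % PLACEMENT,
            params={"member_of": "in:%s" % AGG_UUID},
            headers=HEADERS).json()["resource_providers"]
        for rp in rps:
            inv_url = "%s/resource_providers/%s/inventories" % (PLACEMENT, rp["uuid"])
            body = requests.get(inv_url, headers=HEADERS).json()
            if resource_class not in body["inventories"]:
                continue  # provider has no inventory of this class; skip it
            body["inventories"][resource_class]["allocation_ratio"] = ratio
            # the PUT reuses the resource_provider_generation we just read,
            # so a concurrent update fails with a 409 instead of racing silently
            requests.put(inv_url, json=body, headers=HEADERS).raise_for_status()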
mriedemmelwitt: no20:22
mriedemmelwitt: you're looking for https://review.openstack.org/54468320:22
mriedemwhich doesn't have an owner20:22
mriedemand is contentious in implementation20:22
melwitturgh20:22
sean-k-mooneycdent: misusing your spec to help with ^20:23
sean-k-mooneyany opinion on /resource_providers?having=VCPU&member-of=Y20:23
sean-k-mooneyso that in osc-placement we could say something like "set allocation ratio for all resource class X in aggregate Y"20:24
mriedemmelwitt: i think the summary on https://review.openstack.org/#/c/544683/ is that it's all possible outside of nova; adding it within nova was desirable as a proxy since we have a host aggregates API already with rbac, which placement didn't have until rocky. in dublin we said a simple meta CLI could be added to osc-placement to do this for people, but that didn't happen. but since we mirror the compute host aggregates to20:24
mriedemplacement aggregates, this would build on that for mirroring the allocation ratios as well if *_allocation_ratio was set on a compute host aggregate in nova.20:24
melwittok, so that's what the "proxy" talk was about. so the proxy would still have to set a ratio per RP, there's no concept of an aggregate spanning ratio20:27
sean-k-mooneymriedem: the meta cli being something like "openstack placement allocation ratio set 1.0 --class VCPU --mem-of <my aggregate>"20:27
mriedemmelwitt: correct20:27
*** openstackgerrit has joined #openstack-nova20:27
openstackgerritMerged openstack/nova-specs master: Add .idea folder to .gitignore  https://review.openstack.org/58161120:27
mriedemresource provider aggregates in placement are just a group of linked providers, there is no metadata about the aggregate20:27
mriedemso the upside to nova doing the proxy is you get back to what we had before we broke this, and you only have to set aggregate allocation ratios in one place, rather than both nova and placement separately.20:28
melwitthm. trying to think how the old filters worked, how did operators express the per aggregate ratio /me looks for it20:28
mriedemthe downside is it's more proxy stuff in the compute API20:28
mriedemthey do it with metadata on the host aggregate20:28
melwittok, I understand now20:29
mriedemhttps://docs.openstack.org/nova/latest/admin/configuration/schedulers.html#aggregateramfilter20:29
mriedem"If the host is in more than one aggregate and thus more than one value is found, the minimum value will be used."20:29
mriedemi don't know if jay's spec deals with that20:29
melwittthinking...20:32
melwittif you're only interested in per aggregate ratios, then you'd only need to set them in placement and not nova, I think20:32
melwittso maybe the scenario of "set them in both nova and placement separately" would be a rare one20:33
jaypipesmelwitt: as long as nova doesn't overwrite them over and over again..20:34
melwittyou're either going to be setting them in nova, or in placement, depending on whether you want to do it per compute (or via conf) or per aggregate (or via API)20:34
mriedemright, so if you manage this via the api you don't set the config overrides (iweb case)20:34
mriedemif you're cern, you only use config and not the api20:34
mriedembut we don't support both20:34
mriedemif set, config takes precedence20:34
melwittright. and if you're oath, you'd use only the api and not the conf20:34
jaypipesmriedem, melwitt: my light at the end of this tunnel is that jules is making kielbasa and cabbage for dinner, which is one of my favorites. yum.20:35
mriedemi can smell your house from here20:35
melwittok, I see. based on this, I think maybe we shouldn't try to proxy. it doesn't seem like it would buy much even for users20:35
jaypipesmriedem: you'll be able to smell it in the morning too.20:35
mriedemhi-o20:35
jaypipeslol20:35
mriedemmelwitt: well you probably want to run that by mgagne20:35
melwittand penick20:36
melwittyeah, I'm just speculating here. will definitely want to talk to them and see if they would be ok with a placement CLI that can fan out to aggregate RPs20:36
melwittor if being able to set the aggregate metadata is really that valuable to them20:37
jaypipesmriedem: how will melwitt get ahold of penick, though?20:37
jaypipesmriedem: he is a tough cookie to track down.20:37
mriedempaper airplane?20:37
jaypipeshehe20:37
melwittyeah, that part will be tough. I don't know him that well20:37
jaypipesmelwitt: you can set five upon him.20:38
jaypipesfor some reason I always think of He-Man's Attack Cat when I picture Five...20:38
mriedemmelwitt: it is important to mgagne because they allow non-admin users access to set host aggregate allocation ratios via the API20:38
melwittjaypipes: lol battlecat, that's awesome20:38
mriedemthat's why the RBAC need was big in placement for mgagne20:38
jaypipesmelwitt: battlecat, right :)20:39
melwittI had a battlecat, so cool20:39
melwittjaypipes: have you watched that "toys that made us" show on netflix? they have one that has the he-man toy line. I found it really interesting20:40
*** hamzy has quit IRC20:40
*** hamzy has joined #openstack-nova20:40
melwittmriedem: ahhh ok20:40
jaypipesmelwitt: hmm, no... sounds like it would be good to watch, though!20:42
melwittthere's only like 4 episodes, I watched em all20:42
*** ivve has quit IRC20:42
melwittthe more I think about it, I'm like proxying isn't so terrible either is it? being that we're already mirroring aggregates anyway. setting the ratios would be a small extra step, it seems. but if placement has RBAC, then would mgagne be happy... hrm.20:44
jaypipesall that hard work I did this morning from 3:30am - 8am in reducing my browser tab count by 63 tabs has been slowly destroyed again.20:49
melwittyou were working at 3:30am??20:50
mriedemcdent: if you have people internally that still care about these https://review.openstack.org/#/c/552190/ https://review.openstack.org/#/c/549067/ you might want to kick them to update them for stein20:51
cdentmriedem: i've kicked them several times to no avail20:52
mriedemshall i drop the abandon hammer then?20:52
mriedemthat way you're not the bad guy20:52
*** spatel has quit IRC20:52
cdentmriedem: meh, I'd let them ride a bit longer, might still be a chance20:52
jaypipesmelwitt: yeah, couldn't sleep.20:53
mriedemovercommitting a dedicated pcpu... https://review.openstack.org/#/c/599957/ seems...odd20:53
jaypipesBTW, I have COMPLETELY failed in my sean-k-mooney spellchecker powers today.20:54
mriedemisn't that an oxymoron?20:54
mriedemoxymoron = overcommitted dedicated pcpu20:54
melwittthinking about that hurts my brain20:54
jaypipesmriedem: yeah, it is, and I chatted with tpatil about that in Denver20:54
melwittdefinitely sounds contradictory20:55
jaypipesmriedem: mostly they just need the whole "allow a single compute host to have dedicated stuff and non-dedicated stuff on the same box" thing20:55
jaypipesmriedem: I doubt Tushar will follow up on that spec.20:55
mriedemand then PCPU inventory has an allocation_ratio>1.0?20:55
*** ttsiouts has joined #openstack-nova20:56
mriedemb/c this spec https://review.openstack.org/#/c/599957/ is all about many new config options20:56
mriedemwhich kinda sucks20:56
melwittthey're saying two of the options (cpu_dedicated_set, cpu_shared_set) come from a different spec though, `Standardize CPU resource tracking`21:00
*** mchlumsky has quit IRC21:01
melwittso only the cpu_pinning_allocation_ratio would be new. which really I guess means pcpu_allocation_ratio, whereas the existing cpu_allocation_ratio technically means vcpu_allocation_ratio?21:02
*** erlon has quit IRC21:03
melwittso that they separate handling of vcpu vs pcpu21:03
*** cdent has quit IRC21:03
melwittalso interesting, it says based on the prereq spec https://review.openstack.org/#/c/555081 that PCPU inventory will be created with hardcoded allocation ratio of 1.0 and they want to be able to change it/overcommit it. so wouldn't that just be a call to the placement API?21:05
melwittbut I guess they want to be able to set it the same way as is possible for VCPU21:06
melwittvia nova.conf21:06
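Roughly what the options under discussion would look like in nova.conf; cpu_dedicated_set/cpu_shared_set come from the not-yet-approved "Standardize CPU resource tracking" spec and cpu_pinning_allocation_ratio from the spec linked above, so the names, config section and defaults are all still up for review:

    [compute]
    cpu_dedicated_set = 2-17              # host CPUs for pinned (PCPU) guests
    cpu_shared_set = 18-31                # host CPUs for floating (VCPU) guests
    cpu_pinning_allocation_ratio = 2.0    # the contested overcommit knob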
*** priteau has quit IRC21:09
*** dklyle has quit IRC21:19
*** dklyle has joined #openstack-nova21:20
mriedemedleafe: all sorts of API gross for you to munch on here https://review.openstack.org/#/c/580336/421:36
*** spsurya has quit IRC21:38
*** slaweq has quit IRC21:39
*** awaugama has quit IRC21:39
*** rtjure has quit IRC21:39
cfriesenI think my question about the "overcommit PCPUs" idea is what does it buy you that a low CPU overcommit ratio (with non-dedicated cpus) wouldn't?21:41
sean-k-mooneyjaypipes: haha was there a particular phrase that exhausted the spellchecking ability today :)21:46
sean-k-mooneycfriesen: overcommitting pinned cpus i think would be less objectionable but the fact we chose hw:cpu_policy=dedicated|shared and not hw:cpu_policy=pinned|floating makes me dislike the idea21:48
edleafemriedem: gee thanks!21:49
cfriesensean-k-mooney: once you have more than one instance using a cpu, it's shared.  I don't see what this would buy us compared to just using a shared CPU21:52
sean-k-mooneycfriesen: well in very limited cases it may improve your performance, as pinning will result in numa affinity for legacy reasons, but i would personally prefer hw:cpu_policy=shared + hw:numa_nodes=121:54
cfriesensean-k-mooney: agreed.  And if you were using hugepages you'd get the numa affinity anyways.21:55
sean-k-mooneycfriesen: also since we are going to be using cpu_dedicated_set for allocating realtime cpus i think this is likely to break that also21:56
cfriesenwait...how is a realtime cpu different from a regular dedicated cpu?21:57
sean-k-mooneycfriesen: for realtime we set the "nice" value, or whatever the priority value is on linux, to realtime so it does not get preempted if you have a realtime kernel21:58
sean-k-mooneycfriesen: we don't set the thread priority for dedicated cpus21:58
*** claudiub has quit IRC22:02
cfriesensean-k-mooney: that's what I thought, just making sure.  if you've already got a "dedicated" cpu with nothing else running on it, what's the benefit of making the vcpu task realtime?22:02
cfriesen(unless the host hasn't properly moved all kernel work off that cpu)22:02
sean-k-mooneybasically ^22:03
sean-k-mooneyor there is some background process on the host that is not confined properly22:03
sean-k-mooneybut very little22:03
cfriesenbut if the host hasn't moved kernel work off that cpu, and you run the qemu task as realtime, don't you risk priority inversion anyways if the guest is doing a busy-loop and never letting other tasks run?22:05
*** slaweq has joined #openstack-nova22:05
jaypipessean-k-mooney: no, just distracted generally :)22:05
sean-k-mooneycfriesen: one, if it's actually a kernel thread then the kernel will run it if it needs to22:06
*** tbachman has joined #openstack-nova22:06
*** ttsiouts has quit IRC22:06
sean-k-mooneycfriesen: second, if it's a user thread the kernel can reschedule it i think if the vm is in a busy loop22:06
mriedemmelwitt: can we abandon https://review.openstack.org/#/c/509042/ or are there plans to update that?22:06
*** ttsiouts has joined #openstack-nova22:06
sean-k-mooneycfriesen: i think we tell people don't use this unless you have set up your host properly for realtime workloads and properly isolated the cores22:07
melwittmriedem: I'd like to update it but we need allocation ownership concepts in placement else it's moot22:07
mriedemso is anyone driving that dependency?22:07
sean-k-mooneycfriesen: at least i tell people don't use the realtime feature unless you have set up the host properly22:07
cfriesensean-k-mooney: on hosts with the RT kernel a bunch of kernel things get run in schedulable threads22:08
melwittmriedem: no, just saying, that's why it's stuck22:08
melwittand maybe, I guess I could ask jaypipes because I thought I saw mention of the idea of an owner attribute in some other spec22:09
mriedemok, so....if it's stuck, and no one is working on unstucking it, and it's not high enough priority to are, should we just abandon22:09
sean-k-mooneycfriesen: on https://review.openstack.org/#/c/599957 i suggested just adding a hw:cpu_policy=pinned which would pin the vm to one of the shared cpu set cores instead. does that sound better than overcommitting dedicated cpus to you?22:09
mriedem*care22:09
openstackgerritMerged openstack/nova-specs master: Dynamically find releases for move-implemented-specs  https://review.openstack.org/59262822:09
sean-k-mooneycfriesen: oh ya but you have to use isolcpus too if you are using the realtime core extra specs in nova correctly22:10
cfriesensean-k-mooney: I think you'd need to pin each vCPU in the VM to one of the shared cpu set cores.22:10
sean-k-mooneycfriesen: yes that is what i was suggesting22:10
melwittmriedem: for the record, I care about it a lot but I can't argue that allocation ownership in placement is a priority given everything else that's going on. I can abandon it on that basis22:10
cfriesensean-k-mooney: isolcpus means no scheduling, so only works with explicit pinning and one vcpu per pcpu.22:10
*** ttsiouts has quit IRC22:11
sean-k-mooneycfriesen: yep22:11
cfriesensean-k-mooney: yeah, so pinning to shared cpu set cores makes more sense to me22:11
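For anyone following along, the realtime behaviour sean-k-mooney describes is requested through flavor extra specs along these lines; the flavor name and mask value are only examples, with the mask excluding vCPUs 0-1 from realtime scheduling so the emulator threads have somewhere to run:

    openstack flavor set rt.large \
      --property hw:cpu_policy=dedicated \
      --property hw:cpu_realtime=yes \
      --property hw:cpu_realtime_mask=^0-1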
*** eharney has quit IRC22:13
sean-k-mooneycfriesen: ok, that comment is in their spec, but while i kind of get this use case i also think it's not a great one.22:18
cfriesenit's purely a small performance optimization compared to just using "shared"22:20
cfriesenwhich is valid, but it makes the resource tracking really messy22:21
sean-k-mooneycfriesen: well in tushar's case they want to have more oversubscription with the same performance i think22:21
cfriesensean-k-mooney: their performance document is comparing against "shared"22:21
sean-k-mooneyyes but they are not comparing against shared with hw:numa_nodes=1 are they22:22
sean-k-mooneyalso where is the doc?22:22
cfriesennope.22:22
cfriesenlinked at the bottom of the spec22:22
*** rcernin has joined #openstack-nova22:24
openstackgerritMerged openstack/nova-specs master: fix tox python3 overrides  https://review.openstack.org/57979322:24
sean-k-mooneynot to kill this entirely, but the other thing that makes me uncomfortable about this is that by doing this you increase the risk of spectre or l1tf22:24
sean-k-mooneyin a public cloud env you are now allowing two instances that could be from different tenants to be pinned to the same core where they will context switch22:25
*** bnemec has quit IRC22:25
*** mriedem has quit IRC22:26
cfriesensean-k-mooney: agreed, that seems sketchy.  ideally you'd want to ensure only instances from the same tenant were permanently pinned together22:27
sean-k-mooneyya which is a pain22:27
*** slaweq has quit IRC22:38
*** hamzy has quit IRC22:59
*** threestrands has joined #openstack-nova23:02
*** slaweq has joined #openstack-nova23:11
*** idlemind has quit IRC23:38
*** slaweq has quit IRC23:45
*** gyee has quit IRC23:46
