Wednesday, 2021-03-10

*** vishalmanchanda has quit IRC00:05
*** k_mouza has joined #openstack-nova00:05
*** k_mouza has quit IRC00:10
*** brinzhang has joined #openstack-nova00:22
*** tosky has quit IRC00:36
*** sean-k-mooney1 has joined #openstack-nova00:40
*** sean-k-mooney has quit IRC00:41
*** martinkennelly has quit IRC00:41
*** macz_ has joined #openstack-nova01:04
*** sapd1 has quit IRC01:05
*** macz_ has quit IRC01:09
*** mlavalle has quit IRC01:27
*** iurygregory_ has joined #openstack-nova01:30
*** dviroel_ has joined #openstack-nova01:31
*** iurygregory has quit IRC01:38
*** dviroel has quit IRC01:38
*** ttx has quit IRC01:38
*** spatel_ has joined #openstack-nova01:39
*** dviroel_ is now known as dviroel01:39
*** macz_ has joined #openstack-nova01:40
*** macz_ has quit IRC01:45
*** ttx has joined #openstack-nova01:47
*** spatel_ has quit IRC02:10
*** hamalq has quit IRC02:14
*** macz_ has joined #openstack-nova02:22
*** macz_ has quit IRC02:27
openstackgerritJinsheng Zhang proposed openstack/nova-specs master: Add nova support ironic instance port group network metadata spec  https://review.opendev.org/c/openstack/nova-specs/+/77964402:33
*** spatel_ has joined #openstack-nova03:05
*** jamesdenton has joined #openstack-nova03:42
*** jamesdenton has quit IRC03:50
*** jamesdenton has joined #openstack-nova03:51
*** k_mouza has joined #openstack-nova03:56
*** gyee has quit IRC03:58
*** k_mouza has quit IRC04:00
*** links has joined #openstack-nova04:08
*** macz_ has joined #openstack-nova04:23
*** macz_ has quit IRC04:27
*** vishalmanchanda has joined #openstack-nova04:59
*** ratailor has joined #openstack-nova05:09
*** spatel_ has quit IRC05:43
*** macz_ has joined #openstack-nova05:50
*** macz_ has quit IRC05:55
*** jamesdenton has quit IRC05:57
*** jamesden_ has joined #openstack-nova05:57
*** sapd1_x has quit IRC06:13
*** whoami-rajat has joined #openstack-nova06:23
*** LinPeiWen has quit IRC07:13
*** ralonsoh has joined #openstack-nova07:21
*** khomesh24 has joined #openstack-nova07:29
*** LinPeiWen has joined #openstack-nova07:32
*** slaweq has quit IRC07:42
*** ociuhandu has joined #openstack-nova07:45
*** slaweq has joined #openstack-nova07:46
*** dklyle has quit IRC07:48
*** ociuhandu has quit IRC07:51
*** slaweq has quit IRC07:52
*** slaweq has joined #openstack-nova07:54
openstackgerritMerged openstack/nova stable/ussuri: Default user_id when not specified in check_num_instances_quota  https://review.opendev.org/c/openstack/nova/+/77721707:54
openstackgerritMerged openstack/nova stable/ussuri: Add regression test for bug 1914777  https://review.opendev.org/c/openstack/nova/+/77721807:55
openstackbug 1914777 in OpenStack Compute (nova) victoria "Possible race condition between n-cpu and n-api when deleting a building instance" [High,In progress] https://launchpad.net/bugs/1914777 - Assigned to melanie witt (melwitt)07:55
openstackgerritMerged openstack/nova stable/ussuri: Handle instance = None in _local_delete_cleanup  https://review.opendev.org/c/openstack/nova/+/77721908:02
*** ociuhandu has joined #openstack-nova08:05
*** lpetrut has joined #openstack-nova08:06
*** xinranwang has joined #openstack-nova08:08
gibigood morning08:09
*** tesseract has joined #openstack-nova08:13
*** ociuhandu has quit IRC08:16
*** ociuhandu has joined #openstack-nova08:17
*** ociuhandu has quit IRC08:17
*** ociuhandu has joined #openstack-nova08:18
*** brinzhang_ has joined #openstack-nova08:20
*** brinzhang has quit IRC08:23
*** ociuhandu has quit IRC08:23
yonglihegood morning08:27
*** rpittau|afk is now known as rpittau08:27
*** ociuhandu has joined #openstack-nova08:28
*** tesseract has quit IRC08:29
*** tesseract has joined #openstack-nova08:31
openstackgerritYongli He proposed openstack/nova master: Smartnic support - cyborg drive  https://review.opendev.org/c/openstack/nova/+/77136208:40
openstackgerritYongli He proposed openstack/nova master: smartnic support - new vnic type  https://review.opendev.org/c/openstack/nova/+/77136308:40
openstackgerritYongli He proposed openstack/nova master: smartnic support  https://review.opendev.org/c/openstack/nova/+/75894408:40
*** tosky has joined #openstack-nova08:46
*** MrClayPole has quit IRC08:47
openstackgerritAdit Sarfaty proposed openstack/nova master: Retry on vmware create_vm when it fails  https://review.opendev.org/c/openstack/nova/+/76458608:47
*** MrClayPole has joined #openstack-nova08:48
*** andrewbonney has joined #openstack-nova09:03
*** derekh has joined #openstack-nova09:09
*** lucasagomes has joined #openstack-nova09:11
*** sapd1 has joined #openstack-nova09:16
bauzasgood morning09:20
* bauzas wonders whether the gate is impacted by the OVH SBG fab issue09:21
yonglihesean-k-mooney: please refer to the patch comments, I really hope they address those concerns.  for the first patch the commit message was changed, thanks.09:25
*** LinPeiWen has quit IRC09:26
gibibauzas: jobs are runnig this morning as far as I see09:31
bauzasgibi: I haven't seen jobs running in any sbg 1  to 4 dcs09:36
bauzasbut /me gives hugs to OVH folks around09:37
bauzasmy backups are impacted, but meh ;)09:37
*** martinkennelly has joined #openstack-nova09:47
*** rcernin has quit IRC09:54
*** jangutter_ has joined #openstack-nova09:55
*** jangutter has quit IRC09:59
*** nightmare_unreal has joined #openstack-nova10:12
openstackgerritStephen Finucane proposed openstack/nova master: libvirt: Remove dead error handling code  https://review.opendev.org/c/openstack/nova/+/77970410:15
openstackgerritLee Yarwood proposed openstack/nova master: WIP zuul: Replace grenade and nova-grenade-multinode with grenade-multinode  https://review.opendev.org/c/openstack/nova/+/77888510:19
*** swp20 has joined #openstack-nova10:21
*** ociuhandu has quit IRC10:23
*** ociuhandu has joined #openstack-nova10:24
*** ociuhandu has quit IRC10:31
*** ociuhandu has joined #openstack-nova10:31
stephenfinkashyap: Can you take a look at https://review.opendev.org/c/openstack/nova/+/779304/ today, please?10:32
stephenfinlyarwood: Seen this before? https://zuul.opendev.org/t/openstack/build/a19ff22927a045df9f8982d7ef678238/log/controller/logs/screen-n-cpu.txt#1636710:36
*** masterpe has quit IRC10:36
stephenfinWARNING: Failed to get udev device handler for device /dev/sda1.\n  /dev/sda15: stat failed: No such file or directory\n  Path /dev/sda15 no longer valid for device(8,15)\n  /dev/sda15: stat failed: No such file or directory\n  Path /dev/sda15 no longer valid for device(8,15)\n  ...10:36
stephenfinartom: Look what I see in some build logs -> This host appears to have multiple sockets per NUMA node. The `socket` PCI NUMA affinity will not be supported.10:38
stephenfinartom: from https://zuul.opendev.org/t/openstack/build/a19ff22927a045df9f8982d7ef678238/log/controller/logs/screen-n-cpu.txt10:38
*** jangutter has joined #openstack-nova10:42
lyarwoodstephenfin: yeah /me finds the bug10:42
lyarwoodstephenfin: https://bugs.launchpad.net/cinder/+bug/190178310:43
openstackLaunchpad bug 1901783 in Cinder "volume delete fails because cinder-rootwrap lvs fails with exit code 139" [Undecided,Triaged]10:43
stephenfinoh, fun. This one is happening in nova (via privsep)10:44
lyarwoodoh really?10:44
lyarwoodsorry I didn't actually click through10:44
*** LinPeiWen has joined #openstack-nova10:44
* lyarwood looks10:44
stephenfinall good10:44
stephenfinhttps://a59f59e90e7507f14599-2cecd17ff9ea5f4d02c12f6d5fc5aedd.ssl.cf1.rackcdn.com/776681/7/check/nova-lvm/a19ff22/controller/logs/screen-n-cpu.txt10:44
stephenfinsearch for 8405c8d1-a8ed-4aed-8bfd-972ac6ab9c4310:44
lyarwoodah right it's the nova-lvm job10:44
lyarwoodthat makes sense10:44
stephenfin(the instance)10:44
stephenfinyup, agreed10:45
*** jangutter_ has quit IRC10:45
kashyapstephenfin: Mornin, I have it open; will definitely look today.  (Just finishing something more time-constrained.)10:47
stephenfinkashyap: Cool. Just remember feature freeze is tomorrow10:47
stephenfinso this is also pretty time constrained10:47
kashyapstephenfin: Oh, yeah.  Darn10:48
kashyapstephenfin: I see that you've read the messy/complicated situation here: https://bugzilla.redhat.com/show_bug.cgi?id=1929357#c510:49
openstackbugzilla.redhat.com bug 1929357 in libvirt "UEFI: Provide a way how to configure different combinations of secure boot enabled/disabled and keys enrolled/not enrolled" [Medium,New] - Assigned to phrdina10:49
*** k_mouza has joined #openstack-nova10:49
stephenfinyup10:49
kashyapSigh, so we have to do custom-parsing still10:50
stephenfinthe libvirt auto-configure feature isn't ready for primetime so I've side-stepped it and gone straight to the (QEMU) source10:50
stephenfinYes, but I think what we've done is definitely better than the previous hardcoded list10:50
stephenfinso it's an improvement, just not as much of an improvement as we'd hoped for10:51
kashyapstephenfin: What do you mean "straight to the source"?  You mean using directly the firmware descriptor files shipped by OVMF/EDK2?10:51
stephenfinyes10:51
kashyapstephenfin: Agree, on removing the hard-coded list10:51
kashyapstephenfin: A detail: depending on the distro, they ship them as part of either the EDK2 or the OVMF package.  I did it for Fedora; and filed bugs for Debian and Ubuntu: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=93226910:52
openstackDebian bug 932269 in ovmf "Ship the firmware "descriptor files" as part of the 'ovmf' package" [Normal,Fixed]10:52
stephenfinI checked. Ubuntu 20.04 has them, as does Debian now10:52
kashyapExcellent; by now they should10:53
stephenfinAnd I found references to them for SUSE10:53
stephenfinAll in the standard locations too, thankfully10:53
kashyapstephenfin: Yes, I've worked w/ SUSE folks too in the past to get it worked in10:53
stephenfinbauzas: Friendly reminder that the UEFI secure boot series is all green now, save for errant gate failures, and ready for your attention10:58
artomstephenfin, is that... bad?11:00
stephenfinIt's unexpected, no?11:01
artomWell, from the nodepool VMs, where we would expect everything to be in one node and socket, yeah11:01
stephenfinThough I guess the gate jobs are VMs and therefore not really representative of real hardware11:01
*** masterpe has joined #openstack-nova11:01
artomstephenfin, well, see here: https://zuul.opendev.org/t/openstack/build/a19ff22927a045df9f8982d7ef678238/log/controller/logs/screen-n-cpu.txt#892-90111:03
artomIt put each CPU in its own socket11:03
artomSo my expectation is clearly wrong :)11:03
stephenfin \o/11:04
*** jangutter has quit IRC11:07
*** dtantsur|afk is now known as dtantsur11:07
*** jangutter has joined #openstack-nova11:07
*** belmoreira has joined #openstack-nova11:08
*** k_mouza has quit IRC11:15
*** k_mouza has joined #openstack-nova11:15
*** jangutter_ has joined #openstack-nova11:18
*** macz_ has joined #openstack-nova11:19
*** sapd1 has quit IRC11:21
*** jangutter has quit IRC11:22
*** macz_ has quit IRC11:24
bauzasstephenfin: yup, I was looking at other series, but I'll review them again this afternoon11:30
*** iurygregory_ is now known as iurygregory11:33
yonglihesean-k-mooney: I tried to work out your concern; that bug is about LM/resize, which is out of scope.  refer to the comments, there are some ideas about that.11:40
*** jangutter has joined #openstack-nova11:43
yonglihesean-k-mooney: I did not dig into that bug,  just to say: it seems the data is out of sync somehow,  so changing how we store the data won't help? (correct me if I'm wrong)11:43
*** jangutter_ has quit IRC11:46
*** mgariepy has quit IRC11:49
*** derekh has quit IRC11:52
*** derekh has joined #openstack-nova11:54
alex_xusean-k-mooney1: yonglihe I looked those two bugs, hope I understand that correctly https://review.opendev.org/c/openstack/nova/+/771363/11/nova/objects/network_request.py#4211:54
*** macz_ has joined #openstack-nova12:00
*** macz_ has quit IRC12:06
sean-k-mooney1artom: nova defaults to creating 1 socket per vcpu12:07
sean-k-mooney1so in the gate vms you will have 8 sockets and 1 numa node generated implicitly by qemu/libvirt12:08
sean-k-mooney1even for non-numa vms12:08
sean-k-mooney1stephenfin: by the way, passing the function from driver.py to host.py goes away in a later patch where i actually move the functions to host.py12:09
stephenfinsean-k-mooney1: could you do that first?12:09
sean-k-mooney1we dont actully use the function that take the callback until that patch12:09
* stephenfin goes for lunch - bbiab12:09
sean-k-mooney1maybe but it will be a bit of a pain to split back out the patches12:09
sean-k-mooney1i can do that, it's more a question of time12:10
sean-k-mooney1artom: would you be ok removing the numa topology object from the pci tracker and adding in a numa to socket map instead12:11
*** ociuhandu has quit IRC12:12
sean-k-mooney1although if we have to account for the many-sockets-to-1-numa-node layout of the gate i guess we need to rethink that a bit12:12
*** ociuhandu has joined #openstack-nova12:12
sean-k-mooney1in the gate all pci devices will be connected to the first socket12:12
*** ociuhandu has quit IRC12:12
sean-k-mooney1we only have 1 pcie root12:12
sean-k-mooney1and there is also only 1 numa node by default12:13
*** ociuhandu has joined #openstack-nova12:13
sean-k-mooney1a hack would be to have that numa to socket dict be int->list[int] and for you to just use the first socket from the list12:13
sean-k-mooney1we did discuss that at one point12:14
*** sean-k-mooney1 is now known as sean-k-mooney12:16
sean-k-mooneyalex_xu: its the same bug, just the upstream and downstream versions12:17
sean-k-mooneyalex_xu: the bug is not related to claiming the pci devices12:17
sean-k-mooneyalex_xu: we correctly claim the pci device, the issue is we dont update the neutron port profile correctly as part of unshelve12:18
*** ociuhandu has quit IRC12:18
sean-k-mooneyalex_xu: and we generate the xml with the pci address stored in the profile12:18
sean-k-mooneyso we use the pci address from its previous host12:18
sean-k-mooneyso the vm we are unshelving is the one using the incorrect device12:19
alex_xusean-k-mooney: yes, so for cyborg, we should update the port with new arq uuid12:19
sean-k-mooneyyes12:19
alex_xuso I'm thinking we need the network_request.arq_uuid again to pass that new arq from conductor to compute12:20
sean-k-mooneyand we should update the profile with the claimed pci address for sriov too we just dont today12:20
alex_xuyea12:20
sean-k-mooneyalex_xu: well network_request is not stored in the db right12:20
alex_xuit is fine, we don't need the old arq12:20
sean-k-mooneyim uncomfortable with having the neutron port binding be the only place we persist that12:20
sean-k-mooneywe need it for things like hard reboot no?12:21
alex_xuwe don't need it, I checked that also; we build network_info from neutron directly12:21
sean-k-mooneywe do and store it in the network info cache12:21
sean-k-mooneymy point is i dont think we should be using the value stored in neutron to generate the xml12:22
sean-k-mooneywe should be trying to do it without using the network info cache12:22
sean-k-mooneyso that users cant accidentally alter the port profile in a way that can break nova12:22
alex_xubut all the action is using that network info, even for boot instance12:22
sean-k-mooneyright and that is a design flaw12:23
sean-k-mooneythat info can be corrupted12:23
sean-k-mooneyi have seen it happen in customer deployments many times12:23
alex_xuI'm thinking, if the user did that, doesn't it mean something else has already gone wrong12:24
sean-k-mooneygenerally it can be recovered but we should really treat that as write-only12:24
sean-k-mooneyits possible yes12:24
alex_xuso binding profile is only for neutron use, but nova should have its own copy12:25
alex_xuthe hard thing is, if the nova copy is different from the neutron one, what should nova do12:25
sean-k-mooneybinding profile was intended to be used to pass info from nova to neutron in one direction12:25
sean-k-mooneyalex_xu: use the nova one12:26
artomsean-k-mooney, yeah, that is a good idea12:26
alex_xuok12:26
alex_xuthen correct the neutron info12:26
artomAnd to internalize the function that generates that map into the NUMATopology object itself12:26
sean-k-mooneyalex_xu: https://github.com/openstack/neutron-lib/blob/master/neutron_lib/api/definitions/portbindings.py#L31-L3412:26
artomNot sure we have enough time before FF though12:26
sean-k-mooneyalex_xu: correct the fields that differ, ideally12:26
sean-k-mooneybut we cant do that based on the info cache we have today12:27
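For illustration, a rough sketch of what "update the port binding profile with the device claimed on the new host" could look like; the helper name is made up and the profile keys (pci_slot, arq_uuid) are examples of what was discussed here, not a guaranteed contract.

```python
# Illustrative only, not Nova's actual unshelve code. `neutron` is assumed to
# be a python-neutronclient-style client exposing update_port(port_id, body).
def refresh_port_binding_profile(neutron, port_id, claimed_pci_address,
                                 new_arq_uuid=None):
    profile = {'pci_slot': claimed_pci_address}
    if new_arq_uuid:
        profile['arq_uuid'] = new_arq_uuid
    # Overwrite the stale profile left over from the previous host so the
    # generated domain XML points at the device claimed on this host.
    neutron.update_port(port_id, {'port': {'binding:profile': profile}})
```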
artomsean-k-mooney, also, if those _filter functions are really meant to be purely functional, then @classmethod is not the way to do it - you put them outside the class entirely :)12:27
artomI'll see what I can hack up today12:27
sean-k-mooneyartom: why? class methods are perfectly valid ways to do that12:27
alex_xufor yongli's patch, should we block it for that issue?12:28
sean-k-mooneyyou could put them outside the class entirely or even make them static12:28
alex_xuI feel it is hard for yongli to fix that, since it is a broader issue12:28
sean-k-mooneyalex_xu: hmm, no, we could proceed and clean this up next cycle12:28
alex_xuyea12:28
*** jraju__ has joined #openstack-nova12:28
sean-k-mooneyalex_xu: im just uncomfortable relying on that12:28
sean-k-mooneyi was hoping we could ask cyborg for the arq using the neutron port uuid12:29
sean-k-mooneyand have cyborg be the source of truth12:29
*** links has quit IRC12:29
alex_xubut there is nothing we can do about it now. I don't think system metadata is a good place, since, as we said before, we shouldn't add anything to it because it is without version control12:29
sean-k-mooneyalex_xu: actually i thought the preference was to use it for things like this over adding more tables12:30
sean-k-mooneyalex_xu: we can have version control by using ovos12:30
sean-k-mooneyit allows us to store additional info without db migrations12:30
sean-k-mooneybut ok lets put this issue aside for now and maybe add a todo12:31
sean-k-mooneyalex_xu: can we maybe add a test to validate unshelve and ensure that the port profile is updated with the new arq?12:31
*** ratailor has quit IRC12:31
alex_xusean-k-mooney: I don't think yongli implemented that, I remember he said unshelve is out of scope for the spec12:32
sean-k-mooneydidnt we add support for shelve with cyborg recently12:32
sean-k-mooneyif its not supported with neutron ports then we would need to block it, yes12:33
alex_xuI think when yongli's spec merged, the shelve with cyborg hadn't merged yet12:33
alex_xuI think the missing part is updating the new arq uuid12:34
sean-k-mooneyyep12:34
sean-k-mooneyso we allow shelve now https://github.com/openstack/nova/blob/master/nova/compute/api.py#L416912:34
sean-k-mooneyif the compute service version is new enough12:35
sean-k-mooneyalex_xu: ok what we can do is proceed as it is12:36
sean-k-mooneyand when i submit the patch to block shelve for sriov instances with a workaround config option i can include cyborg too12:37
sean-k-mooneyin fact it will do it for both automatically12:37
alex_xuok, so we want an explicit block for shelving the instance?12:37
sean-k-mooneyi was just going to check if any of the port vnic types were in VNIC_TYPES_DIRECT_PASSTHROUGH which the cyborg ones are12:38
* alex_xu is trying to figure out what cyborg unshelve do12:38
sean-k-mooneyyes because a vm from one tenant can break another tenant by unshelving12:38
sean-k-mooneyso like numa migration i want to block shelve for all instances using neutron sriov ports, with a [workaround]/allow_unsafe_shelve to re-enable it, and backport that12:39
sean-k-mooneythen i plan to fix it as a bug12:39
sean-k-mooneyso we can fix shelve for both sriov and cyborg by ensuring the port profile is updated for the new arq12:40
sean-k-mooneyso i planned to return a 403 permission denied12:40
sean-k-mooneyunless you had [workaround]/allow_unsafe_shelve=true or you had a fixed version of nova12:41
sean-k-mooneyalex_xu: does that sound reasonable?12:41
sean-k-mooneygmann: ^ that should be ok from the api point of view, yes?12:41
sean-k-mooneygmann: the existing issue for sriov ports is a long term bug so i dont want to have to make an api microversion bump for that12:42
sean-k-mooney403 can be returned from this already and i think that should be valid?12:42
alex_xusean-k-mooney: or we could have a check for whether the instance has a port with a device profile? then we wouldn't need a workaround option12:43
sean-k-mooneygmann: the current block we have uses https://github.com/openstack/nova/blob/ab07507e5cfce6232fef373d07ff92ea704541da/nova/exception.py#L158-L15912:43
*** ociuhandu has joined #openstack-nova12:44
sean-k-mooneyalex_xu: i want the workaround for people that are using this with sriov today12:44
sean-k-mooneyalex_xu: i know that it currently does the wrong thing and you can end up using the wrong device12:44
sean-k-mooneywhich is why i want to block it12:44
sean-k-mooneybut we have had customers manually update the port profile and hard reboot the instance to match the claimed device12:45
sean-k-mooneyits a horrible hack but they really wanted to keep shelve12:45
alex_xusean-k-mooney: ok, got it12:45
sean-k-mooneyso if they want to shoot themselves in the foot then that is what the workaround is for12:45
alex_xuok12:46
sean-k-mooneyalex_xu: if you're ok with handling the shelve issue this way i guess im ok with using the requested_networks for now12:47
sean-k-mooneyand the port profile12:47
alex_xusean-k-mooney: yes, I'm ok with that12:47
alex_xuthen we need to wait for yonglihe to come back, he is going to pick his boy up from school :)12:47
*** ociuhandu has quit IRC12:48
sean-k-mooneyill go link this conversation to the gerrit review12:49
alex_xucool12:49
sean-k-mooneyim more or less ok with the patch other than that, but still want to see if gibi is ok with defining the constants before they are needed12:50
*** ociuhandu has joined #openstack-nova12:51
alex_xuyea, definitely need gibi's feedback12:51
sean-k-mooneywe may also need some extra test coverage for the vnic_types, i havent fully reviewed the last patch12:51
*** ociuhandu has quit IRC13:02
*** ociuhandu has joined #openstack-nova13:05
openstackgerritLee Yarwood proposed openstack/nova master: WIP zuul: Replace grenade and nova-grenade-multinode with grenade-multinode  https://review.opendev.org/c/openstack/nova/+/77888513:07
*** nicolasbock has quit IRC13:11
*** nicolasbock has joined #openstack-nova13:11
*** mgariepy has joined #openstack-nova13:12
sean-k-mooneyartom: so since your pci socket policy patch conflicts with both my port numa patch and the vdpa series, we probably should set them up in a chain13:18
*** ociuhandu has quit IRC13:19
artomsean-k-mooney, sure. Which one of yours is more likely to land at this point?13:19
artomI guess put that one on top of my socket stuff?13:19
sean-k-mooneyvdpa obviously goes at the end since that is the least likely to be ready13:19
artomYeah, what I thought :)13:19
sean-k-mooneyya the port numa one13:19
*** ociuhandu has joined #openstack-nova13:20
artomI'm in the process of adding some patches to my series, I'll let you know when it's pushed?13:20
sean-k-mooneywould you be ok putting your patch on top of the port numa one? if i rebase it on top of master to pick up the patches of yours that already merged13:20
sean-k-mooneyhum i guess yours already had 2 +2s but you need to change it for the multi socket thing13:21
sean-k-mooneyi dont think there are outstanding questions left for mine.13:22
artomsean-k-mooney, mind linking your series?13:22
sean-k-mooneyits one patch for the port numa policies https://review.opendev.org/c/openstack/nova/+/77379213:22
artomAh, there's only the one? https://review.opendev.org/c/openstack/nova/+/77379213:22
sean-k-mooneyya i should have a second for numa vswitch but i wont get to that until next cycle so just one for sriov13:23
*** brinzhang_ has quit IRC13:23
artomThere aren't serious conflicts there, though13:23
sean-k-mooneyi was going to say maybe defer to stephenfin and gibi for order13:23
artomMaybe some trivial "context code" in the tests13:23
sean-k-mooneycorrect its minor13:23
*** brinzhang_ has joined #openstack-nova13:24
sean-k-mooneyjust dont want them to fight in the gate13:24
sean-k-mooneysince one of the two will merge first13:24
sean-k-mooneyand the other needs to be respun13:24
artomsean-k-mooney, lemme drop the kids off13:24
artomBack in ~1 hour13:24
*** ociuhandu has quit IRC13:25
*** jangutter_ has joined #openstack-nova13:25
sean-k-mooneysure ill wait for stephenfin  or gibi  to comment and ill do whatever they say13:26
sean-k-mooneyif they say put user first ill rebase on top of https://review.opendev.org/c/openstack/nova/+/77277913:26
*** ociuhandu has joined #openstack-nova13:27
*** jangutter has quit IRC13:30
*** ociuhandu has quit IRC13:33
*** sapd1 has joined #openstack-nova13:38
stephenfinsean-k-mooney: what's the question?13:43
*** jangutter_ has quit IRC13:44
sean-k-mooneystephenfin: artoms socket patch conflicts with both my port numa and vdpa patches so i think we need to stack them13:44
sean-k-mooneystephenfin: question is port numa first or socket13:45
sean-k-mooneywith vdpa last13:45
stephenfinsocket13:45
stephenfinit's closest to merge13:45
*** jangutter has joined #openstack-nova13:45
stephenfinin fact it's approved13:45
sean-k-mooneyright but it cant merge since its broken13:45
stephenfinhuh?13:46
sean-k-mooneyartom needs to fix support for multiple sockets on one numa node13:46
sean-k-mooneyby making it a list and taking the first socket13:46
sean-k-mooneyto make it work in the gate vms13:46
stephenfincan't that be a follow-up?13:46
stephenfinas it's an edge condition13:46
sean-k-mooneyi guess, but i also am not happy with the numa topology object being passed as a copy to the pci tracker13:46
stephenfinis there real hardware with that design, out of curiosity?13:46
sean-k-mooneywe should have passed a dict of numa:socket13:47
stephenfinyeah, neither was I but I couldn't find a better way13:47
stephenfinthat not much better13:47
sean-k-mooneywell we should have passed it to the function13:47
stephenfinit's still information that can get out of sync13:47
sean-k-mooneythat cant13:47
sean-k-mooneywe dont supprot memory hotplug on the compute host13:47
stephenfinand the NUMATopology can, for what we need?13:48
sean-k-mooneyso the numa node to sockt mapping wont get out of daty13:48
stephenfin*what we need it for there?13:48
sean-k-mooneystephenfin: its a copy so the pinned cpus and mempages wil be out of date13:48
stephenfinbut we don't use that, right?13:48
stephenfinso we're simply passing too much information, some of which will get out-of-date13:48
sean-k-mooneyright but we dont have any comment stating that it will be incorrect and if we started trying to use it it could break13:48
sean-k-mooneystephenfin: yep13:49
stephenfinokay, fair13:49
stephenfinI'm happy to look at alternatives so13:49
sean-k-mooneyi would have preferred not to store it in the class, and to pass it to the filter instead, even if that involved passing it down the layers, but the too-much-data issue is my main concern13:49
stephenfinbut I wouldn't block on it, because of how close feature freeze is13:49
*** ociuhandu has joined #openstack-nova13:49
stephenfinyeah, artom and I went over that in the review13:50
sean-k-mooneyya so we could reduce the amount of data we pass in a followup13:50
stephenfinI believe him when he said it was too ugly13:50
sean-k-mooneyi just dislike the fact i need to change my mental model13:50
sean-k-mooneythe filters were not meant to use any data not passed into them13:51
sean-k-mooneywhich is why they were class methods13:51
sean-k-mooneyto signal that13:51
stephenfindid you see my other idea?13:51
stephenfinfrom https://review.opendev.org/c/openstack/nova/+/774149/1213:51
stephenfin"Spitballing. How about adding 'host_socket' and maybe 'host_cell' attributes to 'InstanceNUMACell'? The latter isn't really necessary, since we use 'id' for that right now, but it would let us decouple this in the future (it's a confusing design, IMO). That former would let us avoid passing host NUMA information through to the PCI manager."13:51
sean-k-mooneynope i only saw this after the patches were merged13:51
stephenfinWell then, thoughts on that? Unnecessary duplication or potentially useful?13:52
stephenfinhost_cell should probably read host_node too13:52
sean-k-mooneywell i originally wanted to put the socket in the extra_info dict in the pcidevice13:52
sean-k-mooneyso we did not need the lookup13:52
sean-k-mooneybut let me read that a few times13:53
sean-k-mooneystephenfin: hmm, adding host_socket to InstanceNUMACell wont help unfortunately13:53
sean-k-mooneystephenfin: we dont have the socket of the pci device13:54
sean-k-mooneyif we did that and we added host_socket to the pci extra info then yes13:54
yonglihesean-k-mooney, alex_xu gibi: thanks,  guys.13:54
stephenfinoh, right13:54
stephenfindarn13:54
stephenfinthen nvm13:55
sean-k-mooneyso what artom has merged in the first two patches works, its just got sharp edges13:55
sean-k-mooneyand if we reduced it to a dict of numa node to list of sockets13:56
sean-k-mooneyand then just did mapping[cell.id][0]13:56
sean-k-mooneyi think that would be enough13:56
sean-k-mooneyand we could do that in a follow up13:56
sean-k-mooneyso add a property or function to the host numa object to return that mapping dict and pass that in to the pci_tracker when we construct it13:57
sean-k-mooneythat also gets rid of some of the nested loops13:57
sean-k-mooneystephenfin: ill rebase my port patch on his then but what do you think of ^13:58
stephenfinwfm13:58
stephenfinbut as a follow-up, IMO13:58
stephenfinjust to de-risk the whole enterprise13:58
sean-k-mooneysure13:59
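As a sketch of the follow-up idea above (a dict of NUMA node id to list of socket ids, taking the first socket), with hypothetical helper names rather than Nova's real API:

```python
# Minimal sketch of the numa-node-to-sockets mapping discussed above; the
# function names and input shape are assumptions for illustration only.
from collections import defaultdict


def build_numa_to_sockets(host_cpus):
    """Build {numa_cell_id: [socket_ids]} from (cell_id, socket_id) pairs."""
    mapping = defaultdict(list)
    for cell_id, socket_id in host_cpus:
        if socket_id not in mapping[cell_id]:
            mapping[cell_id].append(socket_id)
    return dict(mapping)


def socket_for_cell(mapping, cell_id):
    # In the gate's many-sockets-per-node topology, just take the first socket.
    return mapping[cell_id][0]


# e.g. a gate VM: one NUMA node backed by 8 single-core sockets
mapping = build_numa_to_sockets([(0, s) for s in range(8)])
assert socket_for_cell(mapping, 0) == 0
```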
*** spatel_ has joined #openstack-nova14:01
*** ociuhandu has quit IRC14:01
*** ociuhandu has joined #openstack-nova14:01
kashyapstephenfin: I'm looking at the UEFI code; I'm commenting there as I go along (to facilitate parallel processing).  I see that you're already neck-deep in another discussion here; no rush14:03
kashyapstephenfin: Another thing on my mind (maybe you've already done it somewhere, and I need to catch up): a "guard" / check that denies secure boot for non-q35 machine types -- it's a q35-only feature14:04
* kashyap keeps lookin14:04
stephenfinkashyap: I don't need to do that explicitly14:06
stephenfinbut I do, to be safe14:06
stephenfinthe firmware metadata files won't report support for pc-i440fx-* machine types with the secure-boot feature14:06
stephenfinso we'll hit https://review.opendev.org/c/openstack/nova/+/779302/2/nova/virt/libvirt/host.py#150614:06
stephenfinsee also line 1499 in that14:07
stephenfinone of the reasons I invested so much time in beefing up the FakeLibvirtFixture was to "automate" as much of this as possible14:07
kashyapstephenfin: Indeed, the JSON files report, correctly so, only for pc-q35-* (ugly name; but it's QEMU's "historical decision")14:07
stephenfincorrect14:07
kashyapstephenfin: I see.  Yeah, I've seen your work fly-by on the test / fixtures stuff.  Fine work14:08
stephenfinwith that being said https://review.opendev.org/c/openstack/nova/+/776681/7/nova/virt/libvirt/driver.py#583714:08
stephenfinThat's mostly for test purposes14:08
stephenfinactually, no, I remember - it's so we can decide whether to enable secure boot or not in the 'optional' case14:09
*** jamesden_ is now known as jamesdenton14:09
stephenfinif we didn't have that, we'd attempt to enable it and fail14:09
kashyapstephenfin: Ah, I didn't get to the _check_secure_boot_support() method yet; looks good14:09
kashyapYeah, we don't want to enable Secure Boot by "default" (many people tend to opt out of the UEFI complexity, for good reasons)14:12
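A rough sketch of reading the QEMU firmware descriptor files mentioned above and checking whether any of them advertise secure-boot for a given machine type; this is not Nova's implementation, and the search path can vary by distro.

```python
# Field names follow QEMU's firmware descriptor (firmware.json) format; the
# function and its behaviour are illustrative assumptions, not Nova's code.
import fnmatch
import glob
import json


def find_secure_boot_firmware(machine_type, arch='x86_64',
                              search_dir='/usr/share/qemu/firmware'):
    for path in sorted(glob.glob(f'{search_dir}/*.json')):
        with open(path) as fh:
            desc = json.load(fh)
        if 'secure-boot' not in desc.get('features', []):
            continue
        for target in desc.get('targets', []):
            if target.get('architecture') != arch:
                continue
            # Machine entries are globs like "pc-q35-*"; pc-i440fx-* types
            # simply won't match a secure-boot capable descriptor.
            if any(fnmatch.fnmatch(machine_type, m)
                   for m in target.get('machines', [])):
                return desc['mapping']['executable']['filename']
    return None  # no descriptor reports secure-boot for this machine type
```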
*** jrollen is now known as jroll14:13
sean-k-mooneystephenfin: artom summarised that here https://review.opendev.org/c/openstack/nova/+/772779/17#message-744974dfe27df3782bd2e6c066aa1b241fc7136a14:14
sean-k-mooneyill be afk until the top of the hour and ill rebase my patches then14:14
gibistephenfin, sean-k-mooney, artom: fixing the sharp edges in artoms socket policy series as a followup works for me. I can review both that and the port numa policy when they are put in order. The order does not matter to me I have the intention to push both through :)14:14
sean-k-mooney:)14:16
gibiand we have more than a day! :D14:16
bauzasI'm switching away a bit from stephenfin's uefi secure boot series while 50% of the changes are now approved14:22
bauzaswho wants reviews here?14:22
bauzassean-k-mooney: vDPA series or the policy one ?14:23
* bauzas offers time 14:23
bauzasgibi: nothing crucial for your eyes ?14:23
stephenfingibi: plus a week for them to actually get through the gate /o\14:23
stephenfinsigh14:23
stephenfinit really hates my db migration series14:23
bauzasstephenfin: well, no worries if they are accepted before FF14:23
bauzasif they are in the gate, we can recheck if needed14:24
artomstephenfin, sean-k-mooney, so one thing that hadn't occurred to me: what we could do is still store something in self (let's say self.numa_topology for now), but access it from consume_requests() and supports_requests(), which are already at the object level, and pass it to the _filter() methods, and we can keep them classmethods that way14:24
bauzaseven if we're past the feature freeze14:24
artomBut to be honest, I'm burnt out on the thing. It's good enough as is, I'll remove my -1, merge, and that way folks are free to concentrate on other stuff before FF14:25
artomInstead of endlessly revisiting this14:25
stephenfinack14:27
gibistephenfin: I will recheck the db series into W or at least "die" trying :)14:27
sean-k-mooneyartom: ya that would have worked.14:27
openstackgerritStephen Finucane proposed openstack/nova master: WIP: Change default policy for GET '/os-hypervisors' API  https://review.opendev.org/c/openstack/nova/+/76579814:28
sean-k-mooneyi have a ptg topic on reworking the pci handling and fixing bugs14:28
artomsean-k-mooney, you have *all* the PTG topics ;)14:28
sean-k-mooneyartom: lets think about it as part of that next cycle14:28
stephenfingibi: gmann: That's my approach to the 'os-hypervisors as project admin' change ^14:28
sean-k-mooneyartom: ya i know i feel bad for adding them but we might just skip some14:28
stephenfingibi: gmann: tbh, for what it gives us I'm thinking it's less and less a good idea14:28
sean-k-mooneyartom: more a list of what i know we should address right now than what we plan on addressing next cycle14:29
stephenfinbauzas: ta for the reviews, btw :)14:29
gibiartom: as I said, the refactor is follow-up work for me. Do we know that the socket - numa cardinality issue on the gate is just gate test node specific and fixable later?14:29
bauzasstephenfin: sorry for stopping but I want to zap here14:29
sean-k-mooneyartom: i plan to stick through or otherwise note which ones wont be worked on in xena after we discuss downstream what we have capacity to work on14:30
bauzasso, which one to pick ?14:30
artomgibi, the thing stephenfin pointed out? It's normal, our nodepool VMs are set up that way14:30
stephenfinbauzas: all good. chances are something will fail in the gate so no point bulk approving them14:30
artomgibi, the new code correctly noped out when it saw there were multiple sockets per NUMA node14:30
bauzasstephenfin: that's the usual game14:30
gibiartom: does it also mean that we cannot test this now on the gate in tempest?14:31
sean-k-mooneygibi: it can happen on really old hardware with a frontside bus. its unlikely to happen on any modern (made in the last 12 years) hardware14:31
bauzasgibi: again, nothing to point out specifically ?14:31
bauzasif not, I'll choose sean-k-mooney's vDPA stuff as it's HPC14:31
artomgibi, I mean we never could we don't have PCI devices :)14:31
gibisean-k-mooney: thanks, so it is just gate env issue, cool14:31
sean-k-mooneygibi: multiple sockets per numa node implies the memory controller was on the motherboard, not the cpu14:31
gibiartom: correct, sorry :)14:31
sean-k-mooneyintel moved it to the cpu 12 years ago with nehalem14:31
gibiOK, then I'm fine with the socket policy series as is14:32
bauzasthat reminds me, any barebone server you could recommend me for running an openstack env ?14:32
artomgibi, cool, thanks!14:32
gibilet's merge it and do whathever refactor you agree on as a followup14:32
sean-k-mooneybauzas: i got one from ebay for 1600 or so but i also have older ones for about 50014:32
gibiit also means that Sean's port policy patch should go on top14:33
sean-k-mooneybauzas: there are lots you can get second hand depending on your budget14:33
bauzassean-k-mooney: woah that's expensive14:33
sean-k-mooneybauzas: if you dont have an attic get a tower server14:33
gmannstephenfin: on shelve block: by this you mean you will return this in new code based on config right ? "the current block we have uses https://github.com/openstack/nova/blob/ab07507e5cfce6232fef373d07ff92ea704541da/nova/exception.py#L158-L159"14:33
gmannsean-k-mooney: ^^14:33
gmannnot stephenfin14:33
bauzassean-k-mooney: I have a cave, but it's already fully packed so a barebone would get my preference14:33
gmannstephenfin: ack, sorry i missed that part of filtering. I will check that14:34
bauzasa tower is nice but too huge for the place I'd put it in there14:34
sean-k-mooneybauzas: i have to run but we can talk about it later if you want; you can get a cheaper system if you're happy to go older, in the 200-500 price point14:34
bauzassean-k-mooney: I'd appreciate it14:34
bauzasI honestly don't need a lot of power14:35
bauzasI just want rapid disks and RAM14:35
bauzasfor spinning VMs14:35
sean-k-mooney gmann  hmm, it would return that for vms with cyborg ports and a similar new exception for vms with sriov ports14:35
bauzasI was both frustrated and ashamed by gibi testing my own series :p14:35
sean-k-mooneygmann: the new one would also just inherit from forbidden14:35
stephenfingmann: Ack. Like I said, I don't really like it myself. Relying on this obscure aggregate metadata property that was previously only used for a specific filter and optional pre-filter feels wrong14:35
sean-k-mooneygmann: so no change in error code14:35
gibibauzas: no worries. I'm lucky as I have access to a lab with decent hardware (even with SRIOV capable NICs)14:36
bauzasgibi: that's what I honestly'd like to have14:36
stephenfingmann: If it were me, I'd probably just drop the idea and remove that idea from the spec but perhaps there are people who really want this? :-D14:36
sean-k-mooneybut ya basically return that for shelve based on vnic_type in (direct, direct-physical ...), i.e. all the sriov or cyborg vnic types14:36
gmannsean-k-mooney: +1, as long as it inherits from the forbidden error, the wsgi layer will handle it as 403; otherwise you need to handle it explicitly in the API controller.14:37
sean-k-mooneyok ill add you to the patch to review if that is ok when i get time to write it14:37
gmannsean-k-mooney: and yes 403 return for error cases does not need microversion as it is already generic return code as you mentioned.14:37
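A hedged sketch of the shelve guard being proposed above; the vnic type tuple, the exception, and the option name are stand-ins for whatever the eventual patch uses, not constants Nova ships.

```python
# Illustrative only: names here are assumptions, not Nova's real constants.
DIRECT_PASSTHROUGH_VNIC_TYPES = ('direct', 'direct-physical',
                                 'accelerator-direct')


class ShelveNotSupported(Exception):
    """Stand-in for an exception mapped to HTTP 403 by the API layer."""


def check_shelve_allowed(ports, allow_unsafe_shelve=False):
    """Reject shelve for instances with SR-IOV/Cyborg style ports unless the
    operator opted in via a [workarounds]-style flag."""
    if allow_unsafe_shelve:
        return
    for port in ports:
        vnic_type = port.get('binding:vnic_type')
        if vnic_type in DIRECT_PASSTHROUGH_VNIC_TYPES:
            raise ShelveNotSupported(
                'shelve is not supported for instances with '
                '%s ports' % vnic_type)
```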
yonglihegibi: hope you have some bandwidth. I can fix the comments during your night.  wish me luck :-)14:37
gibiyonglihe: I plan to get to the smartnic series before I leave for today14:38
gibiyonglihe: have a nice night14:38
gmannstephenfin: but if we drop we again go back to the problem of using PROJECT_ADMIN for POST /server API on specific host14:38
stephenfingmann: That's a fair point14:38
yonglihegibi: thanks have a good day.14:39
stephenfingmann: In that case, we can persist with it. I wasn't sure of the use case but that sounds like a good one.14:39
* gibi goes to look at the os-hypervisor policy patch14:40
*** fnordahl has quit IRC14:42
gmannstephenfin: 1 comment: for the case where there is no 'filter_tenant_id' in the host's aggregate, we can return that host for project admin14:49
stephenfinoh, that makes things more useful14:49
stephenfinisn't that a lot of information to expose though?14:50
stephenfinMaybe not. They'll just see the summary view so ID, hostname, state, and status14:51
gibithat would basically mean that all project admin in a non tenant separated cloud see every compute14:51
gmannyeah14:51
gibido we allow project admins to boot on all the computes in that case directly?14:51
gibiI mean what exactly this means?14:52
gibi15:38 < gmann> stephenfin: but if we drop we again go back to the problem of using14:52
gmannby 'all' do you mean all, or only the hosts/computes with no 'filter_tenant_id'?14:52
gibi               PROJECT_ADMIN for POST /server API on specific host14:52
gibidoes the above mean that we want to allow project admins to boot on specific compute host?14:53
stephenfinI'm looking and we already do14:53
gmannif it passes the filter_tenant_id filter (host matches filter_tenant_id, or there is a host with no filter_tenant_id) then that compute can be allowed, and 403 for the rest14:53
gibihm, that feels powerful14:53
stephenfinbut gmann's comment there says we should change that to system admin14:54
gibiespecially in a non tenant separated cloud14:54
*** ociuhandu has quit IRC14:54
gmannstephenfin: no, i mean system reader get everything, and project admin only go with those filter_tenant_id filters14:54
*** ociuhandu has joined #openstack-nova14:54
stephenfinisn't that what I've done?14:55
gmannmay be we have to check if context is scoped to system or project14:55
gmannthis one? https://review.opendev.org/c/openstack/nova/+/765798/5..6/nova/api/openstack/compute/hypervisors.py#19714:55
gmannand you will keep default policy as SYSTEM_READER?14:56
stephenfinoh, are you talking about the context I use in my call to 'objects.Aggregate.get_by_metadata'?14:57
stephenfinI'm changing the 'os_compute_api:os-hypervisors:list' policy to project admin or system reader14:58
stephenfinbut you'll note that check doesn't pass a context argument14:58
stephenfinso it will never pass for a project admin, right?14:58
stephenfinwhile it will pass for a system admin or system reader14:59
stephenfinsince they don't need context14:59
stephenfinit's a bit of a hack, I'll admit, but it seemed logical to me14:59
gmannstephenfin:  it will pass with project admin too right with ' 'os_compute_api:os-hypervisors:list' policy to project admin or system reader'14:59
gmannit will fail for project member/reader15:00
stephenfinthe check will fail (non-fatally) for project admins, which means we'll enter that block15:00
gmannbut with this it would fail right?> -https://review.opendev.org/c/openstack/nova/+/765798/6/nova/policies/hypervisors.py#3715:01
gmann*would not15:01
*** jraju__ has quit IRC15:01
stephenfinI'm confused. Let's start from the top15:02
stephenfinThe policy has been changed from "system reader" to "system reader or project admin"15:02
stephenfinWe check that here https://review.opendev.org/c/openstack/nova/+/765798/5..6/nova/api/openstack/compute/hypervisors.py#27715:02
*** ociuhandu has quit IRC15:03
stephenfinIf e.g. a project reader tries to access this API, they'll get a HTTP 40315:03
*** ociuhandu has joined #openstack-nova15:03
stephenfinwe then check the *same policy* again here https://review.opendev.org/c/openstack/nova/+/765798/5..6/nova/api/openstack/compute/hypervisors.py#19715:03
stephenfinonly this time we do not provide "target={'project_id': context.project_id}"15:04
gmannyeah 277 protect from project member/reader15:04
stephenfinyou must provide that to pass any project checks15:04
gmannahh, i did not notice the target15:04
gmanngot it, little confusing to read15:04
stephenfinso that check will return False for project members but true for system admin/member/reader15:04
stephenfin*project admins15:05
gmannyeah15:05
stephenfinyeah, like I said, it's a bit of hack15:05
*** zzzeek has quit IRC15:05
stephenfinthat's what I was trying to call out with this comment15:05
stephenfin# don't pass project information, forcing the check to be a system15:05
stephenfin# admin check15:05
gmannor simply we can check context.system_scope == 'all' or whatever the exact value is15:05
stephenfinI should have said "don't provide project information as part of the target"15:05
stephenfinthat would be simpler :)15:06
*** zzzeek has joined #openstack-nova15:06
stephenfinwhat happens if we don't have scoping enabled though?15:06
stephenfinwould that result in admins not being able to list all hypervisors?15:06
gmannhttps://github.com/openstack/oslo.context/blob/0d02866365bc8b779aef9ebd0b79a52c96ae40e5/oslo_context/context.py#L25115:07
gmannhumm that is tricky, in the old way we allowed listing all hypervisors15:08
gmannso as long as we support the deprecated policy it will pass at 277, and @197 if you can check "if CONF.oslo_policy.enforce_scope and context.system_scope == 'all'" then we can support the old way too15:09
gmanni mean only enable that check for CONF.oslo_policy.enforce_scope=true15:09
gmannif false then we do the old way to keep compatibility15:10
gmannmeaning return all hypervisors15:10
stephenfinYup, good idea15:11
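A toy version of the double policy check and the enforce_scope fallback described above; the context object and the rule name are simplified stand-ins, not Nova's real objects.

```python
# Simplified stand-ins for illustration; not the actual Nova/oslo.policy code.
class FakeContext:
    def __init__(self, project_id, is_admin=False, system_scope=None):
        self.project_id = project_id
        self.is_admin = is_admin
        self.system_scope = system_scope  # 'all' for a system-scoped token

    def can(self, rule, target=None, fatal=True):
        if self.system_scope == 'all':
            return True  # system-scoped callers pass with or without a target
        allowed = self.is_admin and target is not None
        if not allowed and fatal:
            raise PermissionError(rule)
        return allowed


def list_hypervisors(context, hypervisors, enforce_scope=True):
    # First check: both project admins and system readers get past the API.
    context.can('os-hypervisors:list',
                target={'project_id': context.project_id})
    # Same rule again with no target: only system scope passes, so a project
    # admin falls through to the filtered/legacy handling below.
    if context.can('os-hypervisors:list', fatal=False):
        return hypervisors
    if not enforce_scope:
        return hypervisors  # keep the old behaviour for compatibility
    return []  # project-admin path: tenant-isolation filtering would go here
```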
stephenfinokay, so back to the response if 'filter_tenant_id' is not set on any aggregate15:11
stephenfinare you suggesting we return all hypervisors for project admins if 'filter_tenant_id=project_id' is not present on any aggregate?15:12
stephenfinthat would allow users to know all compute nodes in a given deployment15:12
stephenfinwhich seems too much15:12
gmannyeah, as filter_tenant_id is not there on the aggregate, the scheduler filter aggregate_multitenancy_isolation allows any project to boot an instance,15:14
gmannhttps://github.com/openstack/nova/blob/0e7cd9d1a95a30455e3c91916ece590454235e0e/nova/scheduler/filters/aggregate_multitenancy_isolation.py#L5415:14
*** __ministry1 has joined #openstack-nova15:14
gmanni mean that is how a host is kept open for all projects15:14
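A paraphrase in code of the behaviour gmann points at (a host is only closed off when one of its aggregates sets filter_tenant_id and the project is not listed); this is a sketch of the semantics, not the filter's actual implementation.

```python
# Sketch of the aggregate_multitenancy_isolation semantics discussed above;
# the input shape (list of {'filter_tenant_id': [...]} dicts) is an assumption.
def host_passes_tenant_isolation(host_aggregates_metadata, project_id):
    allowed = set()
    restricted = False
    for metadata in host_aggregates_metadata:
        tenant_ids = metadata.get('filter_tenant_id') or []
        if tenant_ids:
            restricted = True
            allowed.update(tenant_ids)
    # No filter_tenant_id on any of the host's aggregates: open to everyone.
    return (not restricted) or project_id in allowed


assert host_passes_tenant_isolation([{}], 'demo')            # open host
assert not host_passes_tenant_isolation(
    [{'filter_tenant_id': ['other']}], 'demo')               # isolated host
```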
gibiA project admin in a non tenant isolated cloud should not be allowed to explicitly select any target host, I think15:14
stephenfinthere's a big difference between being able to run an instance on any host and being able to get information on all those hosts though15:15
gibigmann: the filters allow using hosts, but for me the question is: does the project admin need to know which host is which?15:15
gibistephenfin: +115:15
stephenfinwe don't even expose the host that the instance is on for non-admins15:15
stephenfinas in the 'OS-EXT-SRV-ATTR:hypervisor_hostname' attribute15:16
gmannfor this use case,  only the hostname is needed for the create server request15:17
gmannyeah for non-admin we will block list hypervisor or create instance on specific host15:17
gmanngibi has a good point: 'A project admin in a non tenant isolated cloud should not be allowed to explicitly select any target host, I think'15:18
stephenfinI just checked. It fails on a DevStack deployment for me15:18
gmanna non tenant isolated cloud is another thing to consider15:18
stephenfin. devstack/openrc15:18
stephenfin$ openstack --os-compute-api-version 2.latest server create --flavor m1.tiny --image cirros-0.5.1-x86_64-disk --network adec0800-859b-4aee-bf51-75eecb7aacf2 --hypervisor-hostname devstack-1 --wait test-server15:18
stephenfinPolicy doesn't allow compute:servers:create:requested_destination to be performed. (HTTP 403) (Request-ID: req-094bc901-ca54-4c9d-80dc-36f21b9d30ee)15:18
stephenfinSo unless an admin changes that policy, this information isn't very helpful15:19
stephenfineven for the tenant isolated cloud15:20
stephenfinAm I missing something?15:20
gmannyeah this one https://github.com/openstack/nova/blob/master/nova/policies/servers.py#L18915:20
stephenfinAh, wait, I am. 'source devstack/openrc' won't set things up for a project admin15:22
stephenfinit'll use the demo user15:22
stephenfinso I could wrangle things to use a project admin and it should pass since it's actually using this policy https://github.com/openstack/nova/blob/master/nova/policies/servers.py#L20515:23
*** ociuhandu has quit IRC15:23
stephenfinbut the TODO there indicates that the use of PROJECT_ADMIN is a mistake that we need to fix15:23
stephenfinand we shouldn't really be letting project admins select their host15:23
stephenfinat least that's how I read that TODO15:24
gmannyeah that is the use case that ended up with the hypervisors policy change15:24
*** ociuhandu has joined #openstack-nova15:24
gmannfirst we discussed letting the system user create the instance for a project that needs a vm on a specific host, as a project admin cannot get the host info15:24
gmannbut later we agreed to do the policy change in the hypervisor API itself and let project admin list hosts15:25
gmannIn the current situation, with default policy, unless a user gets both system and project scope in their token they cannot boot an instance on a specific host.15:26
gmannuse the system scope power to get the host info and then the project admin power to boot the instance on that host15:26
*** ociuhandu has quit IRC15:26
*** ociuhandu has joined #openstack-nova15:26
stephenfinOkay, so this is an alternative to a new microversion for the 'POST /server' API?15:27
gmannyes15:27
stephenfinInstead of adding e.g. a 'project_id' field to the request body for that15:27
stephenfinGotcha15:27
gmannexactly15:27
stephenfinSo my gut says for this to happen, we'd have to insist on tenant-isolation15:28
gmanngibi: stephenfin but if we want to be more secure, I think returning no hosts if there is no matching 'filter_tenant_id' is also fine for me15:29
gmannand if we get any use case to allow non-'filter_tenant_id=project_id' hosts then we can  see15:29
gmanni mean the current way stephenfin is doing it.15:30
stephenfinI think that's necessary. Listing all hosts in a deployment is too much power15:30
*** __ministry1 has quit IRC15:30
gibiI agree15:31
gmannyeah, and it contradicts our new concept of system vs project ...15:31
stephenfin(Continuing to think out loud) Even with that though, it's still odd that we will now allow users to list some hypervisors but we won't show them the 'OS-EXT-SRV-ATTR:hypervisor_hostname' attribute15:31
stephenfinBut maybe not as odd as having to ask a system admin to create an instance for you15:31
stephenfingmann: This is a tricky problem :)15:31
gmannstephenfin: humm, yeah but let it be the strict use case of 'you want to create a server on a host so you can get the hypervisor list, but nowhere else will we return host info'15:32
gibihow I see 1) if there is strict tenant isolation and I'm the admin of that tenant then it is OK that I can see the hosts dedicated to my tenant and also it is OK that I can specify which host a VM lands on 2) if there is no strict tenant isolation then a project admin should not see more hypervisor info than a project member and also should not be able to specify which host a VM lands on15:33
gmann+115:33
gibiin short, if there is isolation then inside an isolated bubble the project admin can do whatever15:33
gibiI don't care15:33
gibibut as soon as tenants interact15:34
sean-k-mooneygibi: i can live with that. i was leaning to be more generous and say if there is no strict isolation for this tenant15:34
gibithe project admin should not have super power15:34
sean-k-mooneythen it can see all hosts that are not isolated15:34
stephenfingibi: so should we add another check to 'POST /servers'?15:34
gmanncan we add this info in api-ref? this will clarify the usage for this API.15:34
sean-k-mooneysince that would be the set of host they could boot on15:34
gmannat least by seeing how confusing those API combinations are15:34
stephenfinif I'm a project admin, the host I request must belong to an aggregate I'm isolated to15:34
gibiagree ^^15:35
sean-k-mooneyrequiring isolation is ok too15:35
gmannstephenfin: gibi for POST server we do not need it, as they do not know the host info15:35
stephenfingmann: you could conceivably guess15:35
*** jamesdenton has quit IRC15:35
gmannohk, on the aggregate check for POST, yes15:35
gmannyes15:35
stephenfinWhat do we return? HTTP 403?15:35
stephenfinI'm not sure if this warrants a microversion or not. Policy changes like this are weird15:36
sean-k-mooneyi think that is what the existing policy would return15:36
*** jamesdenton has joined #openstack-nova15:36
sean-k-mooneyif a project_member did it15:36
openstackgerritDan Smith proposed openstack/nova master: Make nova-ceph-multistore use policy.yaml  https://review.opendev.org/c/openstack/nova/+/77981515:36
gmannyeah 403 is better, if they do not pass the policy permission or the host does not belong to/allow them15:36
stephenfinnot for a project admin though15:36
stephenfincurrently they can guess a hostname and request that they be booted on that15:37
gmannyeah for project admin, policy allow15:37
sean-k-mooneystephenfin: ya and the filter will prevent it15:37
sean-k-mooneyif the host is not in their set15:37
stephenfinright, but it didn't before15:37
sean-k-mooneywell assuming you're using it15:37
stephenfinso that's a change in behavior15:37
gmannor by bribing the system user :)15:37
stephenfinergo, what do we respond with and does this need a microversion15:37
sean-k-mooneystephenfin: no, the aggregate tenant isolation filter or the placement version would block the boot if isolation was configured15:37
stephenfinbut not if it wasn't15:38
sean-k-mooneycorrect15:38
gmannfor non isolated cloud, it is change in behavior15:38
sean-k-mooneyif it was not then even as a normal user you can guess the host by using the AZ:host syntax15:38
sean-k-mooneyi dont think we need to do the check on server create15:39
sean-k-mooneywe should leave that up to placement/the scheduler filters15:39
sean-k-mooneyfor /os-hypervisors we should check15:39
gmannbut for non isolated cloud, anyone can boot on host if they know host name?15:39
sean-k-mooneywith an az yes15:40
sean-k-mooneyyou can just do --availability-zone $az_name:$hostname15:40
gmannbut  we are restricting in list hypervisors, do not return if no tenant isolation15:40
sean-k-mooneyi think --host is admin only?15:40
sean-k-mooneygmann: i was originally suggesting returning all hypervisors if there is no isolation, for what it's worth15:41
gmannyeah but with current discussion it seem that is too much info15:41
sean-k-mooneyya it seems to have pivoted15:42
stephenfinsean-k-mooney: '--hypervisor-hostname' is project-admin scoped https://github.com/openstack/nova/blob/master/nova/policies/servers.py#L21715:42
sean-k-mooneymy view on this is its the admins choice to grant project admin to a tenant15:42
gmannIMO, if we restrict the hypervisor list then we should do the same for boot instance also, otherwise we are opening a loophole in our API15:42
sean-k-mooneyso if they care about restricting the view they should configure isolation15:42
sean-k-mooneyif they dont then they dont15:42
sean-k-mooneygmann: we are restricting the hypervisor list using the scheduler metadata15:43
*** dklyle has joined #openstack-nova15:43
sean-k-mooneygmann: so if you have isolation configured you get it for free15:43
stephenfincan we defer this to Xena?15:43
gmannsean-k-mooney: and what we discussed now is to restrict it if there is no tenant isolation too15:43
stephenfinthis is starting to feel like it warrants its own spec15:44
gmannyeah may be we should.15:44
sean-k-mooneyyep so we should keep them consistent15:44
*** lpetrut has quit IRC15:44
gmannand let discuss in PTG on all cases15:44
stephenfingiven the security implications and discussion around API changes15:44
sean-k-mooneyeither restrict both or dont15:44
stephenfin*changes to other APIs15:44
sean-k-mooneywhen no isolation is configured for the tenant15:44
gmannand we can discuss host info in GET server also for such project admin15:44
stephenfinyeah, I'm in favour of the same behavior for both listing hypervisors and creating servers on a particular hypervisor15:45
gmannyeah, same behavior is needed whatever we agreed to15:45
sean-k-mooneygmann: getting hypervisor_hostname is probably valid if you can do the hypervisor list15:45
gmannyeah, and i think we return that in PUT/REBUILD server API also which also we can discuss15:46
*** ociuhandu has quit IRC15:46
*** ociuhandu has joined #openstack-nova15:46
sean-k-mooneyit basically comes down to this: should project admins be aware of the hypervisors that their vms can run on15:46
gmannso 1. we can update the current spec to remove the policy change 2. create a new spec for Xena and discuss in PTG  ?15:46
sean-k-mooneyif yes then they should see it on all apis15:46
sean-k-mooneywhere its relevant15:47
gmanngibi: sean-k-mooney stephenfin ^^ hope it is ok to update spec at this stage as this is to remove the things we agreed to do15:48
sean-k-mooneyand then there is the question of whether it should require tenant isolation or not15:48
gmannnot on adding anything new15:48
sean-k-mooneygmann: we can update specs at any time to reflect reality15:48
stephenfingmann: Yeah, I'll update the spec and add an item to the PTG to discuss this15:48
gmann+1, thanks15:48
sean-k-mooneygmann: this wont get moved into implemented anyway15:48
gmannsean-k-mooney: oh why?15:49
gibigmann: yeah, spec update and a new spec is OK to me15:49
*** ociuhandu has quit IRC15:49
sean-k-mooneygmann: well its not done right15:49
gmannstephenfin:gibi sean-k-mooney i can create the new xena spec for PTG15:49
gibigmann: thanks15:49
*** ociuhandu has joined #openstack-nova15:49
stephenfinthanks15:49
sean-k-mooneyor is enough of the spec done to mark the Wallaby one as implemented?15:49
gibigmann: and if no discussion happens on the spec before the PTG, then I suggest raising this topic at the PTG too15:49
gmannsean-k-mooney: actually that is a special case we want to handle in that spec; otherwise the original 'standardize the hypervisor API' spec is done15:49
gmanngibi: ack, makes sense15:50
sean-k-mooneyoh, it's part of that15:50
sean-k-mooneysorry, yeah15:50
gibisean-k-mooney: I put the os-hypervisor API modernization bp to implemented as I got from stephenfin that the policy part is just a nice-to-have15:50
sean-k-mooneyso that spec can move to implemented and then the Xena one for the edge case15:50
gmannyeah15:50
sean-k-mooneycool15:50
gibiOK, we are in wild agreement15:51
*** jangutter_ has joined #openstack-nova15:51
sean-k-mooneyit would appear so. it's funny how that often looks like an argument :)15:52
*** jangutter has quit IRC15:55
*** macz_ has joined #openstack-nova15:57
*** macz_ has quit IRC15:57
*** macz_ has joined #openstack-nova15:58
*** LinPeiWen has quit IRC16:00
*** derekh has quit IRC16:00
openstackgerritStephen Finucane proposed openstack/nova-specs master: Remove policy changes from modernize-os-hypervisors-api spec  https://review.opendev.org/c/openstack/nova-specs/+/77982116:00
stephenfingmann, gibi: ^16:00
stephenfinalso added to the PTG doc16:01
gibistephenfin: on it16:02
gibiand thanks16:02
*** derekh has joined #openstack-nova16:03
*** amodi has quit IRC16:05
*** spatel_ has quit IRC16:08
*** amodi has joined #openstack-nova16:08
*** ociuhandu has quit IRC16:15
gmannstephenfin: thanks, lgtm16:18
*** spatel_ has joined #openstack-nova16:19
*** gyee has joined #openstack-nova16:22
*** ociuhandu has joined #openstack-nova16:24
*** ociuhandu has quit IRC16:24
*** ociuhandu has joined #openstack-nova16:25
*** ociuhandu has quit IRC16:29
*** jangutter has joined #openstack-nova16:35
*** ociuhandu has joined #openstack-nova16:35
*** ociuhandu has quit IRC16:37
*** ociuhandu has joined #openstack-nova16:37
*** takamatsu has quit IRC16:37
openstackgerritMerged openstack/nova-specs master: Remove policy changes from modernize-os-hypervisors-api spec  https://review.opendev.org/c/openstack/nova-specs/+/77982116:38
*** jangutter_ has quit IRC16:39
*** ociuhandu has quit IRC16:51
dansmithbauzas: I never saw any replies to my comments on the rpc bump patch16:55
dansmithbauzas: did you not notice, or just think they were all stupid? :)16:56
bauzasdansmith: I forgot to look at it, my bad.16:56
dansmithokay16:56
bauzasdansmith: but I quickly saw them16:56
bauzasreviews ftw atm16:56
dansmithokay none of them are critical obviously16:56
bauzasdansmith: I can look at them now while you're on16:56
bauzas(at least once my weekly meeting is done)16:57
dansmithokay16:57
*** lucasagomes has quit IRC16:58
*** mgariepy has quit IRC16:59
*** ociuhandu has joined #openstack-nova17:00
*** khomesh24 has quit IRC17:00
*** derekh has quit IRC17:02
*** derekh has joined #openstack-nova17:02
*** rpittau is now known as rpittau|afk17:05
*** mlavalle has joined #openstack-nova17:06
stephenfingmann: Any chance you could take a look at https://review.opendev.org/c/openstack/nova/+/778550 ?17:09
stephenfinIt's pretty late in the day, but it's a fairly simple microversion17:09
gmannstephenfin: sure, I will check, maybe tomorrow; I have to go out this afternoon.17:12
stephenfinsweet, thanks :)17:12
*** dtantsur is now known as dtantsur|afk17:15
*** xinranwang has quit IRC17:18
openstackgerritStephen Finucane proposed openstack/nova master: api: Rename 'parameter_types.hostname' -> 'fqdn'  https://review.opendev.org/c/openstack/nova/+/77854917:22
openstackgerritStephen Finucane proposed openstack/nova master: api: Add support for 'hostname' parameter  https://review.opendev.org/c/openstack/nova/+/77855017:22
*** jangutter_ has joined #openstack-nova17:26
stephenfinsean-k-mooney: okay, got to the end of the series. I see what you mean about patch ordering17:28
stephenfinare you working on that series rn?17:29
sean-k-mooneynot yet but soon17:30
sean-k-mooneyI'm going to move the neutron port one now17:30
sean-k-mooneyon top of artom's change17:30
sean-k-mooneythen go back to vDPA17:30
*** jangutter has quit IRC17:31
* gibi leaves for today17:31
sean-k-mooneygibi: o/ enjoy your evening17:31
stephenfincool, mind if I take a quick swing at it?17:32
stephenfinI think I know what you're going for17:32
sean-k-mooneyif you want go for it17:32
sean-k-mooneyI tried to break it out at first to be logically split, but then later when I got the real hardware I found I needed to fix some things, and some of those fixes ended up in the last patch17:33
sean-k-mooneyI would like to bring those fixes to the correct patch17:33
*** tesseract has quit IRC17:34
*** sapd1 has quit IRC17:34
bauzasdansmith: okay, I looked at all your comments and thanks for them17:38
*** takamatsu has joined #openstack-nova17:38
bauzasdansmith: you made a great point, which is that I made extra work for free by trying to continue to support 5.017:38
bauzasinstead of 5.1217:38
dansmithbauzas: well, not "for free", it clearly cost you something :D17:38
bauzasdansmith: I probably misunderstood that we were pinning to the least supported version from all computes, which is *not* 5.017:39
dansmith(but I know what you mean)17:39
dansmithbauzas: well, I think you aren't able to faithfully support 5.0 anyway, was my point17:39
bauzasyeah17:39
bauzasthe point is, we're entering the RC period and touching my change to remove stuff could require some further work from me17:40
bauzasso maybe let's just pretend we can support 5.0, which is impossible anyway17:40
dansmithyes, although this is pretty much the time to be doing that17:41
dansmithbut as you wish17:41
bauzasI'll first reply to the other comments and do a quick respin17:41
dansmithack17:41
bauzasand if we have time, I can look at removing the unnecessary bits17:41
bauzaswhich should simplify my change17:42
bauzasbut here, baby steps17:42
bauzasto secure the RPC bump anyway17:42
dansmithsounds good17:42
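To make the pinning discussion above concrete, a hedged sketch of the usual oslo.messaging pattern; the method name, version numbers and the new argument are invented for illustration, not taken from the patch under review:

    import oslo_messaging as messaging

    def call_some_method(client: messaging.RPCClient, ctxt, **kwargs):
        # Cap the outgoing call at what every compute in the deployment can
        # accept; degrade gracefully if the pin is below the newest version
        # that introduced a new argument.
        version = '5.13'  # hypothetical newest version
        if not client.can_send_version(version):
            version = '5.12'
            kwargs.pop('new_argument', None)  # hypothetical new parameter
        cctxt = client.prepare(version=version)
        return cctxt.call(ctxt, 'some_method', **kwargs)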
sean-k-mooneydansmith: since you're here and talking about this topic17:43
sean-k-mooneydansmith: when can we do OVO major version bumps?17:44
dansmithyou guys talking about me being here now makes me sad :/17:44
sean-k-mooneydo they have to align with RPC bumps or not17:44
dansmiththey do not17:44
sean-k-mooneywell it was actually a time zone reference, not availability17:44
dansmithI did an instance bump long ago, let me show you my scars...17:44
*** mgariepy has joined #openstack-nova17:45
dansmithbut if you find that, it should be a good map of how to do it for the worst case, and the newer backport manifest stuff likely makes it easier nowadays17:45
sean-k-mooneyok, that is fine; so we have a few comments on some other objects, like the compute_node object, to do X in 2.017:45
bauzasI remember the instance major bump pain17:45
dansmithinstance is like the worst case though, compute node is probably easier since we don't pass it around everywhere17:46
sean-k-mooneyyeah, I was more wondering what the rules were around it17:46
bauzasgood point, but still something difficult17:46
bauzassean-k-mooney: the idea is, you have to understand which services are using the object17:46
bauzasand which version of it17:46
sean-k-mooneyyep, we can check what the max version supported on the dest is, right17:47
sean-k-mooneyand then we backlevel before sending17:47
bauzasit's not just the compute17:47
dansmithsean-k-mooney: the rules are similar to the rpc interface, but you have to be able to support both versions of the object across the boundary.. compute has to support the 1.x object in case another compute sends it (although likely not the case for compute_node),17:47
bauzasor the object being passed over the wire17:47
dansmithand of course conductor has to be able to support both17:47
bauzasand other services also have to understand the new minimum version17:48
sean-k-mooneyyep, that is more or less what I was assuming regarding supporting both17:48
bauzasif they get the object17:48
bauzasI mean, if they rehydrate it17:48
dansmithyup17:48
bauzassean-k-mooney: which OVO object are you considering bumping?17:48
sean-k-mooneyit's somewhat bounded now that we have the check for the min compute service version17:49
bauzascomputenode is also used by a lot of services, hence my concern17:49
sean-k-mooneye.g. that it now enforces an N+1 max delta17:49
sean-k-mooneybauzas: well, I'm not necessarily suggesting we do that one next cycle17:49
sean-k-mooneyjust trying to figure out when we are allowed to17:49
bauzasyup, gotcha17:50
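As a reference point for the bump rules above, a hedged toy example of the oslo.versionedobjects backlevel hook; this is not nova's real ComputeNode, just the shape of the mechanism being discussed:

    from oslo_utils import versionutils
    from oslo_versionedobjects import base as ovo_base
    from oslo_versionedobjects import fields


    @ovo_base.VersionedObjectRegistry.register
    class ExampleNode(ovo_base.VersionedObject):
        # Pretend 1.1 added 'socket'; a 2.0 bump would drop long-deprecated
        # fields for good once every service is known to speak 2.x.
        VERSION = '1.1'

        fields = {
            'hostname': fields.StringField(),
            'socket': fields.IntegerField(nullable=True),
        }

        def obj_make_compatible(self, primitive, target_version):
            # Called before handing the object to an older peer: strip
            # fields that the target version does not know about.
            super(ExampleNode, self).obj_make_compatible(
                primitive, target_version)
            target = versionutils.convert_version_to_tuple(target_version)
            if target < (1, 1):
                primitive.pop('socket', None)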
bauzasshould we now discuss raising the API minimums, just for fun?17:50
bauzasREST* APIs17:50
bauzas:)17:50
sean-k-mooneywell, we have a TODO to stop declaring the numa_topology object as a string and declare it as an OVO field17:50
dansmithbauzas: it is, but not passed around between them as much right?17:50
sean-k-mooneyfrom like Juno17:51
sean-k-mooneybauzas: so 'remove in 2.0' came up when I reviewed artom's patches17:51
sean-k-mooneywhich is why it was on my mind, since we added another one of those comments17:51
bauzasdansmith: trying to think about it17:52
bauzasdansmith: as a nested object, can't promise it isn't used17:52
sean-k-mooneygiven that TODO is 5 years old https://github.com/openstack/nova/blob/ab07507e5cfce6232fef373d07ff92ea704541da/nova/objects/compute_node.py#L84-L8617:52
dansmithwhat other object includes computenode in it?17:52
bauzasgood question17:52
dansmithbauzas: you said "as a nested object" ...17:53
bauzasdansmith: the Service object at least, I remember writing the nesting17:53
dansmithokay but I don't think that gets passed around either17:54
bauzashah, and the RequestSpec one17:54
dansmithreqspec has computenode in it?17:54
bauzasthe SchedulerRetries one17:54
bauzasit has a list of cn objects in it17:54
sean-k-mooneydo we use that anymore?17:55
bauzaswhich itself is nested in the spec object17:55
sean-k-mooneyI thought we don't do retries since placement17:55
sean-k-mooneywe use alternate hosts17:55
dansmiththat seems weird because those are from different DBs, but maybe we just jam them in there to pass around for the retries?17:55
sean-k-mooneydansmith: we should check if it's still needed; I think that might have been for when we used the RetryFilter17:56
bauzassean-k-mooney: good point, it's no longer used17:56
dansmithyeah17:56
dansmithwell, I certainly hadn't remembered that we do that, but changing it would mean more bumps, so..17:56
dansmithhowever,17:56
dansmithI'm not sure it's really likely to be a problem17:56
dansmithit's sourced from the control services, so if they put 1.x in there until everyone is upgraded,17:57
dansmithit should be fine17:57
sean-k-mooneyyeah, again I just raised this because I was trying to understand when we can address the 'do X in the next major version' comments that are in some objects17:57
sean-k-mooneymany of which have been there for years at this point17:58
sean-k-mooneyhttps://github.com/openstack/nova/commit/1337890ace918fa2555046c01c8624be014ce2d8 was the instance one, so I guess reviewing that is a good start.18:01
sean-k-mooneyoh yeah, I remember the obj_relationships stuff18:02
sean-k-mooneyoh I see, we had _BaseInstance, InstanceV1 and InstanceV2 classes for a cycle, kind of like the proxy for RPC18:05
sean-k-mooneythen that patch actually dropped V1 after we were sure everything supported V218:06
sean-k-mooneyI vaguely remember this happening: https://github.com/openstack/nova/commit/713d8cb0777afb9fe4f665b9a40cac894b04aacb added 2 objects.18:07
*** takamatsu has quit IRC18:10
*** andrewbonney has quit IRC18:19
*** tbarron has joined #openstack-nova18:37
*** ralonsoh has quit IRC18:41
*** nightmare_unreal has quit IRC18:44
*** zul has quit IRC18:54
*** mgariepy has quit IRC19:03
*** mgariepy has joined #openstack-nova19:06
*** ociuhandu has quit IRC19:12
*** ociuhandu has joined #openstack-nova19:12
openstackgerritStephen Finucane proposed openstack/nova master: add constants for vnic type vdpa  https://review.opendev.org/c/openstack/nova/+/77047419:18
openstackgerritStephen Finucane proposed openstack/nova master: objects: Add 'VDPA' to 'PciDeviceType'  https://review.opendev.org/c/openstack/nova/+/77748119:18
openstackgerritStephen Finucane proposed openstack/nova master: libvirt: Add vDPA nodedev parsing  https://review.opendev.org/c/openstack/nova/+/77053319:18
openstackgerritStephen Finucane proposed openstack/nova master: libvirt: Add guest generation for vDPA  https://review.opendev.org/c/openstack/nova/+/77053219:18
openstackgerritStephen Finucane proposed openstack/nova master: pci: Add vDPA vnic to PCI request mapping and filtering  https://review.opendev.org/c/openstack/nova/+/77835019:18
openstackgerritStephen Finucane proposed openstack/nova master: add hw:mlock extra spec  https://review.opendev.org/c/openstack/nova/+/77834719:18
openstackgerritStephen Finucane proposed openstack/nova master: libvirt: Move PCI host device parsing to 'host'  https://review.opendev.org/c/openstack/nova/+/77985119:18
stephenfinsean-k-mooney: there you go19:18
*** ociuhandu has quit IRC19:19
*** ociuhandu has joined #openstack-nova19:20
sean-k-mooneythanks. I'll take a look and test it out shortly.19:22
*** ociuhandu has quit IRC19:22
*** ociuhandu has joined #openstack-nova19:23
openstackgerritStephen Finucane proposed openstack/nova master: objects: Add 'VDPA' to 'PciDeviceType'  https://review.opendev.org/c/openstack/nova/+/77748119:31
openstackgerritStephen Finucane proposed openstack/nova master: libvirt: Move PCI host device parsing to 'host'  https://review.opendev.org/c/openstack/nova/+/77985119:31
openstackgerritStephen Finucane proposed openstack/nova master: libvirt: Add vDPA nodedev parsing  https://review.opendev.org/c/openstack/nova/+/77053319:31
openstackgerritStephen Finucane proposed openstack/nova master: libvirt: Add guest generation for vDPA  https://review.opendev.org/c/openstack/nova/+/77053219:31
openstackgerritStephen Finucane proposed openstack/nova master: pci: Add vDPA vnic to PCI request mapping and filtering  https://review.opendev.org/c/openstack/nova/+/77835019:31
openstackgerritStephen Finucane proposed openstack/nova master: add hw:mlock extra spec  https://review.opendev.org/c/openstack/nova/+/77834719:31
*** brinzhang0 has joined #openstack-nova19:33
*** ociuhandu_ has joined #openstack-nova19:34
*** dpawlik6 has joined #openstack-nova19:34
*** macz_ has quit IRC19:35
*** macz_ has joined #openstack-nova19:36
*** mugsie__ has joined #openstack-nova19:37
*** tosky_ has joined #openstack-nova19:37
*** benj_- has joined #openstack-nova19:37
*** dosaboy_ has joined #openstack-nova19:38
*** whoami-rajat has quit IRC19:40
*** artom has quit IRC19:42
*** ociuhandu has quit IRC19:42
*** brinzhang_ has quit IRC19:42
*** tosky has quit IRC19:42
*** mjturek has quit IRC19:42
*** dpawlik has quit IRC19:42
*** benj_ has quit IRC19:42
*** dosaboy has quit IRC19:42
*** mugsie has quit IRC19:42
*** Underknowledge has quit IRC19:42
*** dpawlik6 is now known as dpawlik19:42
*** Underknowledge has joined #openstack-nova19:43
*** irclogbot_0 has quit IRC19:44
*** jamesdenton has quit IRC19:45
*** mugsie__ is now known as mugsie19:47
*** jamesdenton has joined #openstack-nova19:48
*** belmoreira has quit IRC19:52
*** belmoreira has joined #openstack-nova19:58
*** artom has joined #openstack-nova20:01
openstackgerritMerged openstack/nova master: libvirt: Stop passing around virt_type, caps  https://review.opendev.org/c/openstack/nova/+/77568920:12
openstackgerritMerged openstack/nova master: libvirt: Add missing type hints  https://review.opendev.org/c/openstack/nova/+/77568820:13
*** tosky_ is now known as tosky20:16
openstackgerritsean mooney proposed openstack/nova master: Support per port numa policies with SR-IOV  https://review.opendev.org/c/openstack/nova/+/77379220:23
sean-k-mooneyartom: ^ I'm having trouble getting the socket test I added here working https://review.opendev.org/c/openstack/nova/+/773792/10/nova/tests/functional/libvirt/test_pci_sriov_servers.py#125420:24
*** takamatsu has joined #openstack-nova20:25
artomsean-k-mooney, I wonder if it's because I set the 'socket' attribute on the pools in my filter method20:31
artomAnd the pools that get passed to remove_device() don't have it20:31
artomSo it fails the equality test20:31
sean-k-mooneyit's there in the host state object: {"physnets": [], "tunneled": false}, "nova_object.changes": ["physnets", "tunneled"]}, "socket": 1},20:32
sean-k-mooneyactually I'll copy this to a pastebin; that's not what I wanted20:32
sean-k-mooneyhttp://paste.openstack.org/show/803443/20:34
artomHrmm, yeah, if I add:20:34
artom            del pool_keys['devices']20:34
artom            if 'socket' in pool_keys:20:34
artom                del pool_keys['socket']20:34
artomIn _find_pool()20:34
artomIt makes the tests pass20:34
artomOK, I have to run20:34
artomHopefully that helped a little bit, talk in ~1 hour20:34
sean-k-mooneyok, yeah, that might be the issue20:35
sean-k-mooneybut if it is, your current patch might break neutron SR-IOV ports? I'll do some testing locally and try to see if it's just a functest issue or a real one20:36
sean-k-mooneythe other policies do work, however, so I don't know; it might just be an edge case I need to fix to make both work correctly20:37
artomI suspect it's a logic error in my patches20:38
artomThough I do wonder how come my own functional tests passed...?20:38
* artom -> really run20:38
*** tosky has quit IRC20:49
*** tosky has joined #openstack-nova20:49
*** derekh has quit IRC20:52
*** k_mouza has quit IRC20:57
*** ociuhandu_ has quit IRC20:59
*** dklyle has quit IRC21:10
*** ociuhandu has joined #openstack-nova21:12
*** mtreinish has joined #openstack-nova21:15
sean-k-mooneyartom: I get the error on real hardware too.21:15
*** martinkennelly has quit IRC21:20
*** belmoreira has quit IRC21:20
artomsean-k-mooney, I wonder if it has something to do with VFs and PFs?21:31
*** flaviof has quit IRC21:31
*** flaviof has joined #openstack-nova21:31
sean-k-mooneyI'm not sure21:34
sean-k-mooneydid you test your code on real hardware?21:34
sean-k-mooneyhmm, if I do some hacking in /sys to put all my NICs on NUMA node 0 I can boot again21:35
sean-k-mooneybasically echo 0 | sudo tee /sys/class/net/eth*/device/numa_node21:36
sean-k-mooneythen stop nova compute and libvirt fully and start them again21:36
artomI didn't21:36
*** vdrok has quit IRC21:37
sean-k-mooneywhen that was -1 it was failing; I had to add pool_keys.pop('socket', None)21:37
sean-k-mooneyto find_pool21:37
*** vdrok has joined #openstack-nova21:37
sean-k-mooneyI'll try removing that again21:37
sean-k-mooneyI can test it with a PCI alias too, I guess; I just need to spend some time to do it21:39
*** NobodyCam has quit IRC21:39
*** NobodyCam has joined #openstack-nova21:40
artomThat's the thing, I wasn't touching /sys, or indeed the PCI devices21:40
artomBesides calculating the socket and saving it in pool['socket']21:41
sean-k-mooneyyeah, well this is what it is printing for the final resource view21:41
sean-k-mooneyPciDevicePool(count=1,numa_node=0,product_id='10c9',tags={dev_type='type-PF',physical_network='public'},vendor_id='8086'), PciDevicePool(count=7,numa_node=0,product_id='10ca',tags={dev_type='type-VF',parent_ifname='eth1',physical_network='public'},vendor_id='8086')]21:41
sean-k-mooneyso the socket is not being added there incorrectly21:41
*** dklyle has joined #openstack-nova21:42
*** ociuhandu has quit IRC21:42
sean-k-mooneya PCI dev looks more or less normal too21:43
sean-k-mooney{"dev_id": "pci_0000_01_11_3", "address": "0000:01:11.3", "product_id": "10ca", "vendor_id": "8086", "numa_node": 0, "label": "label_8086_10ca", "dev_type": "type-VF", "parent_addr": "0000:01:00.1", "parent_ifname": "eth1", "capabilities": {"network": ["rx", "tx", "sg", "tso", "gso", "gro", "rxvlan", "txvlan", "txudptnl"]}}21:43
sean-k-mooneystrange, it seems to be working without the pop('socket', None)21:45
artomsean-k-mooney, hrmm, I really think it's the parent PF issue21:45
artomIt doesn't have the socket21:45
*** TheJulia has quit IRC21:45
sean-k-mooneyyeah maybe, although it should21:45
*** TheJulia has joined #openstack-nova21:45
sean-k-mooneywell in the same way the VF will21:45
artomThe logging I added to remove_device() and _find_pool() logs what I expect with your tests21:45
artomBut not with my functional tests21:46
*** johnsom has quit IRC21:46
artomMeaning - it's never called, because in my func tests it only tests PFs (or "standard" PCI)21:46
artom(Not sure which)21:46
*** johnsom has joined #openstack-nova21:46
artomBy testing with VFs and their parent PFs, you've uncovered an issue21:46
sean-k-mooneystandard is not a PF21:46
sean-k-mooneystandard is a device that does not support SR-IOV21:47
artomBut it has no parent, is my point21:47
sean-k-mooneya PF is type-PF21:47
sean-k-mooneystandard is type-PCI21:47
artomThe point is not the type, it's the parent device (or lack thereof)21:47
sean-k-mooneycorrect, you do not have to mark the parent as unavailable21:48
sean-k-mooneyas we do with VFs21:48
sean-k-mooneyor the child as unavailable with PFs21:48
sean-k-mooneyso you are not calling _handle_device_dependents21:49
sean-k-mooneyin your tests21:49
artomNope :)21:49
artomNot that it was intentional21:49
sean-k-mooneywell, you might be, but it's a no-op for you21:49
artomOr that I was aware it existed21:50
sean-k-mooneyyou are calling it21:50
sean-k-mooneybut it does nothing for STANDARD/type-PCI21:50
artomYeah, if I do         for pool in pools:21:51
artom            if 'socket' in pool:21:51
artom                del pool['socket']21:51
artomAt the end of my new filter21:52
artomIt fixes your func tests21:52
artomI guess you've made your point about being purely functional and not having side effects ;)21:52
sean-k-mooneymaking it a pure function may or may not have caught it, depending on whether we wrote the test right21:53
artomI think the "best" solution would be to not set pool['socket'] altogether, and come at this from another angle: figure out which NUMA nodes are allowed, and filter on those21:54
artomNo side effects21:54
sean-k-mooneywell you should not be setting pool sockets21:55
sean-k-mooneyand you don't need to if you do what was in my last comment21:55
*** slaweq has quit IRC21:55
sean-k-mooneyhttps://review.opendev.org/c/openstack/nova/+/772779/17#message-744974dfe27df3782bd2e6c066aa1b241fc7136a21:56
sean-k-mooneythe body of your filter could basically be this, right?21:57
sean-k-mooney vm_sockets = [mappings[cell.id]  for  cell in numa_cells]21:57
sean-k-mooney   pools_to_remove = []21:57
sean-k-mooney   for pool in pools:21:57
sean-k-mooney       pool_socket = mappings.get(pool['numa_node'])21:57
sean-k-mooney       if not (pci_socket and pool_socket in vm_sockets):21:57
sean-k-mooney          pools_to_remove += pool21:57
sean-k-mooney return [pool for pool in pools if pool not in pools_to_remove]21:57
artomThere are some unbound variables in there, so I can't quite grok it21:57
artomBut I can come up with something equivalent21:57
*** masayukig has quit IRC21:58
sean-k-mooneywhere mappings is basically {numa_node: [socket_id, socket_id]}21:58
*** masayukig has joined #openstack-nova21:58
sean-k-mooneyartom: mappings is the only one, right?21:58
artompci_socket as well21:58
sean-k-mooneyoh that is pool_socket21:59
sean-k-mooneyI forgot to rename it21:59
sean-k-mooneyartom: although maybe that should be22:00
sean-k-mooney if not pool_socket or pool_socket in vm_sockets:22:00
sean-k-mooneyfor the socket policy you don't allow a device with no NUMA node, correct?22:01
sean-k-mooneyso that should be an 'or', not an 'and'22:01
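Pulling the pieces of the exchange above together, a hedged, cleaned-up sketch of the filter shape being proposed; the names (pools, numa_cells, numa_node_to_socket) are stand-ins for illustration, not nova's exact internals:

    def filter_pools_by_socket(pools, numa_cells, numa_node_to_socket):
        # Sockets the instance's NUMA cells actually landed on.
        vm_sockets = {
            numa_node_to_socket[cell.id]
            for cell in numa_cells
            if cell.id in numa_node_to_socket
        }
        # Keep pools whose socket matches; pools with an unknown NUMA node
        # (and hence unknown socket) are dropped, per the 'or' above.
        return [
            pool for pool in pools
            if numa_node_to_socket.get(pool.get('numa_node')) in vm_sockets
        ]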
*** spatel_ has quit IRC22:04
artomsean-k-mooney, well, I did a thing, and it made all tests pass.22:06
artomIt's brute force and dumb, but "easy" to understand22:06
artomShall I push?22:06
sean-k-mooneyartom: so I can "fix" this in my patch by adding pool_keys.pop('socket', None) to _find_pool22:06
artomIt'll kick my last patch out of the gate, and we'll have to get +2s and +As again tomorrow22:07
sean-k-mooneyI don't know; it's technically broken, but you have a follow-up, or I can fix it in my patch22:07
sean-k-mooneywe probably should kick it out of the gate, as much as I hate that22:08
artomYeah, it's the correct thing to do22:08
artomAs is, it's broken with any parent-having device22:08
sean-k-mooneyya22:08
*** vishalmanchanda has quit IRC22:08
openstackgerritArtom Lifshitz proposed openstack/nova master: pci: implement the 'socket' NUMA affinity policy  https://review.opendev.org/c/openstack/nova/+/77277922:09
openstackgerritArtom Lifshitz proposed openstack/nova master: Support per port numa policies with SR-IOV  https://review.opendev.org/c/openstack/nova/+/77379222:09
sean-k-mooneyso the minimal fix is to just add one line, pool_keys.pop('socket', None), to _find_pool22:09
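For readers following along, a hedged sketch of where that one-line fix sits; this is a simplified stand-in for nova's pool matching, not the real stats.py code:

    def _find_pool(pools, dev_pool):
        # Match a device's pool keys against the tracked pools, ignoring
        # bookkeeping keys so the equality check still succeeds.
        for pool in pools:
            pool_keys = pool.copy()
            pool_keys.pop('devices', None)
            # The minimal fix under discussion: 'socket' is only a transient
            # annotation from the NUMA affinity filter, so it must not take
            # part in the comparison.
            pool_keys.pop('socket', None)
            if pool_keys == dev_pool:
                return pool
        return None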
*** ociuhandu has joined #openstack-nova22:10
sean-k-mooneyah, I see what you did22:10
sean-k-mooneyare you still OK with rewriting it again in the follow-up?22:11
sean-k-mooneythat should work as is though22:11
sean-k-mooneyI'll test it now22:11
artomsean-k-mooney, yeah, we can improve in a follow up22:12
artomFor now, do the simple stupid thing that's easy to understand and review, and works ;)22:12
sean-k-mooneyok, I have the unit and func tests running for those locally now. I'm going to go get something to eat, and when I get back I'll test booting real VMs and I'll comment on the review22:15
sean-k-mooneythen I'll rebase the vDPA change on top of both patches and test those22:16
*** zzzeek has quit IRC22:16
artomThat's... optimistic ;)22:16
artomThough I suppose if that was the only outstanding issue with the port NUMA policies, might as well aim for the moon ;)22:16
sean-k-mooneyI have some nits from stephen that it looks like I missed, but yeah, I think it's the only issue22:17
*** zzzeek has joined #openstack-nova22:18
sean-k-mooneythese https://review.opendev.org/c/openstack/nova/+/773792/11#message-8d43910738d79b4f530fd6b4176074ad6f53c54322:18
*** ociuhandu has quit IRC22:18
openstackgerritMerged openstack/os-resource-classes master: Fix hacking min version to 3.0.1  https://review.opendev.org/c/openstack/os-resource-classes/+/72755722:20
*** rcernin has joined #openstack-nova22:27
*** k_mouza has joined #openstack-nova22:42
*** k_mouza has quit IRC22:43
*** k_mouza has joined #openstack-nova22:43
*** k_mouza has quit IRC22:47
*** k_mouza has joined #openstack-nova22:53
*** k_mouza has quit IRC22:57
dansmithmelwitt: easy stats: https://review.opendev.org/c/openstack/nova/+/77981522:58
dansmithconverting the single-line glance policy I write from old JSON to new YAML, so I can write to it more easily from a child job to change other stuff22:59
melwitt+223:01
dansmithm'thanks23:01
*** k_mouza has joined #openstack-nova23:28
*** rpittau|afk has quit IRC23:37
*** rpittau|afk has joined #openstack-nova23:37
*** aarents has quit IRC23:38
openstackgerritsean mooney proposed openstack/nova master: Support per port numa policies with SR-IOV  https://review.opendev.org/c/openstack/nova/+/77379223:38
*** k_mouza has quit IRC23:55
*** k_mouza has joined #openstack-nova23:59
