openstackgerrit | wangxu proposed openstack/cyborg master: Fix deployable get all sort issue and unit test failure. https://review.openstack.org/590075 | 02:50 |
---|---|---|
*** pliu has joined #openstack-cyborg | 08:03 | |
openstackgerrit | ShaoHe Feng proposed openstack/cyborg master: support sub provider https://review.openstack.org/585146 | 08:39 |
openstackgerrit | ShaoHe Feng proposed openstack/cyborg master: support sub provider https://review.openstack.org/585146 | 11:04 |
openstackgerrit | ShaoHe Feng proposed openstack/cyborg master: support sub provider https://review.openstack.org/585146 | 11:07 |
openstackgerrit | ShaoHe Feng proposed openstack/cyborg master: support sub provider https://review.openstack.org/585146 | 12:09 |
*** shaohe_feng has joined #openstack-cyborg | 13:13 | |
*** shaohe_feng has quit IRC | 13:18 | |
*** jaypipes has quit IRC | 14:06 | |
*** jaypipes has joined #openstack-cyborg | 14:10 | |
*** guhcampos has joined #openstack-cyborg | 14:34 | |
*** guhcampos has quit IRC | 14:41 | |
*** guhcampos has joined #openstack-cyborg | 14:42 | |
*** alex_xu has quit IRC | 15:24 | |
*** Sundar_ has joined #openstack-cyborg | 20:39 | |
*** guhcampos has quit IRC | 20:40 | |
Sundar_ | Hi Eric and Sean | 20:40 |
efried | Sundar_: hold on, Sean's not here yet... | 20:41 |
*** sean-k-mooney has joined #openstack-cyborg | 20:43 | |
efried | Sundar_, sean-k-mooney: Absolutely agree libvirt-XML-specific code should live in the os-acc driver and/or plugin, NOT in nova. | 20:45 |
Sundar_ | Sean, I am just saying that the os-acc provides device-specific info to Nova virt drivers depending on the hypervisor. E.g., for libvirt + PCI, it may provide an XML snippet for a PCI device, but does not modify the VM's domain XML. That is left to the libvirt driver. | 20:46 |
Sundar_ | efried: Agreed | 20:46 |
sean-k-mooney | efried: only if the os-acc driver is part of nova and not cyborg | 20:46 |
Sundar_ | efried: Agreed | 20:46 |
efried | "part of nova and not cyborg" - the driver stands alone. It talks to both nova and cyborg, but it's not really "part of" either. | 20:47 |
sean-k-mooney | what is the argument for allowing arbitrary xml to be inserted from an external lib that is loaded dynamically at runtime? | 20:47 |
sean-k-mooney | efried: this exact usecase was forbidden for both os-brick and os-vif | 20:48 |
Sundar_ | sean-k-mooney: It is not necessarily arbitrary: the return value, including the schema, can be specified and checked by the virt driver. But it allows for more extensibility -- environment/device-specific variations can be handled better | 20:49 |
sean-k-mooney | Sundar_: how do you know how to assign a virtual pci address for the accelerator within the guest? | 20:50 |
sean-k-mooney | just as an example | 20:50 |
Sundar_ | sean-k-mooney: The analogy with os-vif cannot be taken too far. The os-vif handles a small number of plugins (Linux bridge, OVS, etc.) and it is intertwined with virt driver logic in practice. With os-acc, we will have far more variation in devices. Also, we do not want to repeat the coupling and legacy practices of os-vif | 20:51 |
sean-k-mooney | Sundar_: you are going to end up coupling the drivers to hypervisors in a way that is highly fragile | 20:52 |
Sundar_ | sean-k-mooney: For the virtual PCI, the os-acc plugin/driver will get it in a device-specific way and, depending on the hypervisor (e.g. libvirt+qemu), can construct the needed snippets | 20:52 |
sean-k-mooney | Sundar_: which would mean i would have to pass the image and flavor to you in addition to the list of all interfaces | 20:53 |
Sundar_ | sean-k-mooney: "Coupling the drivers to hypervisors" -- can you make that more concrete with an example or specific scenario? | 20:53 |
sean-k-mooney | well for a start the fact that os-acc needs to know you are using libvirt at all is a failing in my view | 20:53 |
sean-k-mooney | why can we not define a data model using oslo versioned objects that will be returned by os-acc, and have each of the virt drivers interpret them internally? | 20:54 |
Sundar_ | For the specific case of libvirt and PCI, I don't see the issue with this. But I am not sure if that works for all devices and hypervisors. Power seems quite different, for example. Maybe efried can chime in. | 20:56 |
efried | Yeah, sean-k-mooney, platform-gnostic code is like half of the raison d'être for os-acc. | 20:58 |
Sundar_ | As long as the XML snippet is well-defined and constrained, what is the issue? | 20:58 |
efried | sean-k-mooney: Because e.g. the 'plug' operation for libvirt entails modifying xml, whereas on power it involves issuing a REST command to a proprietary API. | 20:58 |
sean-k-mooney | efried: plug does not modify xml | 20:59 |
sean-k-mooney | os-vif cant by design. | 20:59 |
efried | what does, then? | 20:59 |
efried | the nova virt driver? | 20:59 |
sean-k-mooney | the libvirt driver | 20:59 |
efried | okay, so be it. | 20:59 |
efried | what about discovery? | 21:00 |
efried | who's responsible for that? | 21:00 |
sean-k-mooney | os-vif is invoked with an ovo constructed by the driver to wire up the backend | 21:00 |
efried | Again, libvirt => lspci (I think); Power => REST call. | 21:00 |
sean-k-mooney | efried: currently the libvirt driver also | 21:00 |
sean-k-mooney | i had proposed having a call in os-vif for discovery | 21:00 |
sean-k-mooney | it was pushed back on however, so we can discover what plugins are installed but that's about it | 21:01 |
sean-k-mooney | efried: in os-vif the specific example of modifying the hypervisor code came up for smartnic. | 21:02 |
sean-k-mooney | sorry i should retype that, but what i meant to say originally: having os-vif generate xml so that new nics could be added was proposed | 21:02 |
sean-k-mooney | it was also rejected | 21:03 |
Sundar_ | sean-k-mooney: Please look at #link https://libvirt.org/formatdomain.html#elementsHostDevSubsys . How would the virt driver determine XML elements like <boot order='1'/> or <rom bar=...> ? | 21:03 |
Sundar_ | Those are device-specific and configuration-specific | 21:03 |
sean-k-mooney | Sundar_: today we already do that. at least part of it. | 21:04 |
sean-k-mooney | we would pass that infomation from neutron | 21:04 |
sean-k-mooney | in the vif_binding details | 21:04 |
efried | how does neutron know? | 21:05 |
efried | based on the ml2 thingy, or the plugin? | 21:05 |
sean-k-mooney | ml2 in general. | 21:05 |
Sundar_ | sean-k-mooney: That results in a double translation: first you have to fill in a VIF, and then translate that to another syntax. Since the final syntax is also well-defined, we may as well go to that directly | 21:06 |
sean-k-mooney | so for example ml2/ovs detect if ovs is kernel or dpdk and select vhost-user or a tap device | 21:06 |
Sundar_ | As devices and requirements both evolve, it would be faster to change os-acc alone rather than both os-acc and all virt drivers in Nova | 21:07 |
sean-k-mooney | Sundar_: yes, it's called the bridge pattern; it allows both implementations to vary independently | 21:07 |
Sundar_ | We are not preventing independent evolution of the virt driver by providing XML snippets or equivalent | 21:08 |
Sundar_ | Restating: As devices and requirements both evolve, it would be faster to change os-acc alone rather than both os-acc and all virt drivers in Nova | 21:08 |
sean-k-mooney | Sundar_: will os-acc take into account the numa affinity of the ram and cpus selected by the driver as part of its xml generation? | 21:09 |
sean-k-mooney | Sundar_: will it also account for the fact that if we are using libvirt/qemu instead of libvirt/kvm we don't provide numa affinity? | 21:09 |
Sundar_ | That goes to the heart of it: NUMA affinity etc. would be stated in a device-specific way, and presumably will be in Cyborg or os-acc. Nova does not want to deal with such device-specific config | 21:10 |
sean-k-mooney | they are both libvirt but there is more coupling to the internals of the libvirt driver than i think you are hoping for | 21:10 |
Sundar_ | In your scheme, where would NUMA affinity for, say, a Quick Assist device be specified? | 21:11 |
sean-k-mooney | nova | 21:11 |
Sundar_ | But I thought Nova developers stated loud and clear that they don't want to handle devices and their configurations | 21:12 |
sean-k-mooney | unless we are talking about moving all passthrough, pci, vgpu, pmem, sriov out of nova and out of the resource tracker into a single lib that is hypervisor agnostic i don't see how it reduces complexity | 21:13 |
sean-k-mooney | there is a difference between physical device management and virtualisation of the same | 21:14 |
sean-k-mooney | efried: what is your perspective? | 21:14 |
Sundar_ | Back at the Ireland PTG, it was stated in one of the Nova sessions that handling PCI devices in general could be out of Nova. There is no clear roadmap for migrating device handling out of Nova, but I believe that's the general direction that people want to move | 21:15 |
sean-k-mooney | device handling yes, hypervisor xml generation no | 21:17 |
efried | I'd like nova to be able to handle *scheduling* of devices generically like any other resource. But doing real stuff like attaching them to VMs, programming them, carving physical into virtual, configuring QoS, etc. - all that is going to need to be platform-specific (and sometimes device-specific code). There's no way around that. | 21:17 |
sean-k-mooney | and also i had suggested doing that back in denver by refactoring the pci code in the resource tracker into a separate lib with an ovo interface and no rest or user facing api. | 21:17 |
efried | So it's going to need to be done in the virt driver, or some plugin. | 21:17 |
sean-k-mooney | efried: attaching them to vms however is something the hypervisor will do at the end of the day, no? | 21:19 |
sean-k-mooney | the discovery is going to be device specific. in some cases the discovery will only be possible via the hypervisor | 21:20 |
sean-k-mooney | but in the latter case that is because we are using something like hyperv or vsphere where we cannot run code on the hypervisor | 21:21 |
Sundar_ | There are lots of touch points between devices and CPU/memory, and they are going to grow and evolve over time. Having a restrictive API could constrain or slow such evolution. For example, shared virtual memory along the lines of CAPI | 21:21 |
sean-k-mooney | Sundar_: why does shared virtual memory between device and guest require you to generate the xml? | 21:22 |
Sundar_ | By shared virtual memory, I don't mean DMA, but coherent memory where the device is participating in the processor's cache coherence protocol. AFAIK, that is not standard yet in libvirt. It is going to require further evolution. That will presumably result in some XML tags | 21:24 |
sean-k-mooney | Sundar_: looking at https://www.ibm.com/developerworks/aix/library/au-aix-achieve-performance-capi/index.html it looks like ivshmem | 21:24 |
efried | sean-k-mooney: "run code on the hypervisor" is not something nova should ever, ever do. and by nova I mean "outside of the virt driver". The virt driver can do it; or some platform-specific plugin can do it. | 21:24 |
sean-k-mooney | efried: yes i am fine with that. | 21:25 |
sean-k-mooney | efried: when i was referring to running code on the hypervisors i was referring to the nova-compute agent and how it is typically deployed on the hypervisor with libvirt | 21:27 |
sean-k-mooney | i'm aware for ironic and other hypervisors that the compute agent is not always colocated with the hypervisor | 21:27 |
sean-k-mooney | although i also tend to reserve the term hypervisor for the software layer and not the physical server that the vm/compute context executes on | 21:28 |
efried | By definition, the compute service runs on a system that knows how to talk to the hypervisor. Whether that's by exec'ing commands or communicating over a REST API. And either way, it's the responsibility of platform-specific code to do whatever that is. | 21:29 |
Sundar_ | efried: What is your take w.r.t Power/CAPI/IBM technologies on passing just the PCI address (or mediated device UUID, or some such) from os-acc to the virt driver? | 21:30 |
sean-k-mooney | efried: yes, and by said definition the nova compute agent should only interact with the physical system through the api exposed by the hypervisor that the driver is talking to, though there are some exceptions to that in the libvirt case | 21:31 |
sean-k-mooney | Sundar_: in the CAPI case, if this was libvirt kvm, the CAPI protocol would be implemented between qemu and the device, not libvirt, and would require passing the memory regions to qemu. | 21:32 |
Sundar_ | How would such device-specific data like memory regions be communicated to qemu? | 21:33 |
sean-k-mooney | Sundar_: we map a dma region in a similar fashion for vhost user. in that case qemu passes the virt to physical page translation table to dpdk via the unix socket | 21:33 |
efried | Sundar_: Passing just the <identifier> to the virt driver for what operation? Plug? That works as long as there's some other place at which any necessary configuration has been done. Which again would need to be platform-specific code. | 21:33 |
sean-k-mooney | efried: vhost-user is using a shared memory interface via dma to hugepages without ever passing that memory mapping to nova | 21:34 |
sean-k-mooney | the mappings are negotiated between the qemu frontend and the dpdk backend | 21:35 |
Sundar_ | My model is that Cyborg discovers devices and their properties. There needs to be some way to communicate their properties and requirements to the hypervisor. We could extend the VAN with specific fields for each such property/requirement, or ... | 21:35 |
Sundar_ | we could use whatever the corresponding hypervisor already expects, as long as it is standard | 21:35 |
sean-k-mooney | how many different protocols do you expect to have to support? | 21:38 |
sean-k-mooney | we will have mmio, ivshmem and capi, but they seem to take a similar approach to doing mmio via pci bars | 21:40 |
sean-k-mooney | we will have vfio-mediated devices | 21:41 |
sean-k-mooney | we will have pci passthrough of different kinds. | 21:41 |
Sundar_ | That is because PCI MMIO is standardized. Whereas newer things like OpenCAPI and device-coherent SVM are all relatively new and likely to grow. The short answer is, I don't know how many such protocols there will be, nor do I think that anybody can tell you for sure today | 21:41 |
sean-k-mooney | Sundar_: so you are proposing creating a library that needs to support not just multiple protocols, but multiple hypervisors, and multiple versions of those hypervisors, instead of creating an abstraction layer containing the common datatypes | 21:43 |
Sundar_ | Going back to Nova's philosophy, I think Nova wants to be agnostic of devices. Even things like NUMA affinity of devices may need to be modularized to minimize Nova involvement | 21:43 |
Sundar_ | 'support multiple hypervisors' -- yes, by using plugins/drivers. The existing Nova virt drivers are not (or should not be) device-aware. The os-acc plugins are expected to fill that need | 21:46 |
sean-k-mooney | https://specs.openstack.org/openstack/nova-specs/specs/newton/implemented/os-vif-library.html#proposed-change if your spec does not already contain a detailed breakdown of the split of responsibilities as was done for os-vif, can i ask that you add one? | 21:47 |
sean-k-mooney | Sundar_: i think the only interface between openstack and the hypervisor should be via the nova virt driver | 21:48 |
sean-k-mooney | at least when it comes to the virtualisation of the guest | 21:49 |
Sundar_ | Yes, ultimately the Nova virt driver handles all hypervisor-related actions for launching an instance. The os-acc (or its plugins) are helpers -- specific areas are delegated by Nova virt to os-acc. | 21:52 |
Sundar_ | The os-acc should not directly modify the instance XML, for instance | 21:52 |
Sundar_ | The split of ownership can be added to the spec. Sure. | 21:53 |
Sundar_ | Does that address your concern? | 21:53 |
sean-k-mooney | Sundar_: if the nova-virt driver calls os-acc and is returned an ovo, which the virt driver then uses to generate the xml or the equivalent for another driver, then yes. | 21:54 |
sean-k-mooney | but only if there is a clear separation between preparing the device and generating the hypervisor-specific configuration input | 21:54 |
Sundar_ | I am wondering how the separation would be so clear-cut for device/cpu/memory interaction, such as conveying device properties for NUMA etc. while keeping Nova device-agnostic. Anyways, I need to join a call now. We can touch base later. | 22:02 |
Sundar_ | Thanks, efried and sean-k-mooney | 22:03 |
*** sean-k-mooney has left #openstack-cyborg | 22:11 | |
*** efried has quit IRC | 23:31 | |
*** efried has joined #openstack-cyborg | 23:31 |
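
Note: the split sean-k-mooney argues for (os-acc returns a hypervisor-neutral object, and each virt driver turns it into its own configuration) can be sketched roughly as below. This is only an illustration under stated assumptions: `PciAttachInfo`, `libvirt_hostdev_xml` and `rest_attach_payload` are hypothetical names, a plain dataclass stands in for an oslo versioned object, and the REST payload shape is invented for the example. Only the `<hostdev>` element layout follows the libvirt domain XML format linked in the log.

```python
from dataclasses import dataclass
from typing import Optional
import xml.etree.ElementTree as ET


@dataclass(frozen=True)
class PciAttachInfo:
    """Hypothetical hypervisor-neutral record that os-acc might return.

    A stand-in for an oslo versioned object; not a real os-acc or Nova class.
    """
    domain: int
    bus: int
    slot: int
    function: int
    numa_node: Optional[int] = None


def libvirt_hostdev_xml(info: PciAttachInfo) -> str:
    """Libvirt-side translation: the virt driver, not os-acc, builds the XML."""
    hostdev = ET.Element(
        "hostdev", {"mode": "subsystem", "type": "pci", "managed": "yes"}
    )
    source = ET.SubElement(hostdev, "source")
    ET.SubElement(source, "address", {
        "domain": f"0x{info.domain:04x}",
        "bus": f"0x{info.bus:02x}",
        "slot": f"0x{info.slot:02x}",
        "function": f"0x{info.function:x}",
    })
    return ET.tostring(hostdev, encoding="unicode")


def rest_attach_payload(info: PciAttachInfo) -> dict:
    """Hypothetical translation for a driver that attaches via a REST call.

    The payload keys are made up for illustration only.
    """
    address = (
        f"{info.domain:04x}:{info.bus:02x}:{info.slot:02x}.{info.function:x}"
    )
    return {"pci_address": address, "numa_node": info.numa_node}
```

Both functions consume the same `PciAttachInfo`, so the device-side library and the per-hypervisor code can change independently, which is the bridge-pattern point made at 21:07. Sundar_'s counterpoint still applies: any device property that the neutral object lacks would need a new field before a driver could use it.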