Thursday, 2018-08-09

openstackgerritwangxu proposed openstack/cyborg master: Fix deployable get all sort issue and unit test failure.  https://review.openstack.org/59007502:50
*** pliu has joined #openstack-cyborg08:03
openstackgerritShaoHe Feng proposed openstack/cyborg master: support sub provider  https://review.openstack.org/58514608:39
openstackgerritShaoHe Feng proposed openstack/cyborg master: support sub provider  https://review.openstack.org/58514611:04
openstackgerritShaoHe Feng proposed openstack/cyborg master: support sub provider  https://review.openstack.org/58514611:07
openstackgerritShaoHe Feng proposed openstack/cyborg master: support sub provider  https://review.openstack.org/58514612:09
*** shaohe_feng has joined #openstack-cyborg13:13
*** shaohe_feng has quit IRC13:18
*** jaypipes has quit IRC14:06
*** jaypipes has joined #openstack-cyborg14:10
*** guhcampos has joined #openstack-cyborg14:34
*** guhcampos has quit IRC14:41
*** guhcampos has joined #openstack-cyborg14:42
*** alex_xu has quit IRC15:24
*** Sundar_ has joined #openstack-cyborg20:39
*** guhcampos has quit IRC20:40
Sundar_Hi Eric and Sean20:40
efriedSundar_: hold on, Sean's not here yet...20:41
*** sean-k-mooney has joined #openstack-cyborg20:43
efriedSundar_, sean-k-mooney: Absolutely agree libvirt-XML-specific code should live in the os-acc driver and/or plugin, NOT in nova.20:45
Sundar_Sean, I am just saying that the os-acc provides device-specific info to Nova virt drivers depending on the hypervisor. For example, for libvirt + PCI, it may provide an XML snippet for a PCI device, but does not modify the VM's domain XML. That is left to the libvirt driver.20:46
Sundar_efried: Agreed20:46
sean-k-mooneyefried: only if the os-acc driver is part of nova and not cyborg20:46
Sundar_efried: Agreed20:46
efried"part of nova and not cyborg" - the driver stands alone. It talks to both nova and cyborg, but it's not really "part of" either.20:47
sean-k-mooneywhat is the argument for allowing arbitrary xml to be inserted from an external lib that is loaded dynamically at runtime20:47
sean-k-mooneyefried: this exact usecase was forbidden for both os-brick and os-vif20:48
Sundar_sean-k-mooney: It is not necessarily arbitrary: the return value, including the schema, can be specified and checked by the virt driver. But it allows for more extensibility -- environment/device-specific variations can be handled better20:49
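A minimal sketch of the kind of check described in the message above, where the virt driver validates a snippet returned by a plugin before using it. The function name `validate_hostdev_snippet` and the allowlist are hypothetical and are not taken from os-acc or Nova; only a handful of libvirt `<hostdev>` child elements are listed for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical allowlist: the only tags this virt driver agrees to accept.
ALLOWED_TAGS = {"hostdev", "source", "address", "boot", "rom"}


def validate_hostdev_snippet(snippet: str) -> ET.Element:
    """Parse a plugin-supplied snippet and reject anything outside the allowlist."""
    root = ET.fromstring(snippet)
    if root.tag != "hostdev":
        raise ValueError(f"unexpected root element: {root.tag}")
    # iter() walks the root and every descendant element.
    for elem in root.iter():
        if elem.tag not in ALLOWED_TAGS:
            raise ValueError(f"element not allowed: {elem.tag}")
    return root


good = (
    "<hostdev mode='subsystem' type='pci' managed='yes'>"
    "<source><address domain='0x0000' bus='0x06' slot='0x02' function='0x0'/></source>"
    "</hostdev>"
)
bad = "<hostdev><seclabel/></hostdev>"
```

Here `validate_hostdev_snippet(good)` returns the parsed element, while `validate_hostdev_snippet(bad)` raises `ValueError` because `<seclabel>` is not on the allowlist. A real check would presumably also constrain attributes and values, not just tag names.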
sean-k-mooneySundar_: how do you know how to assign virtual pci addresses for accelerators within the guest20:50
sean-k-mooneyjust as an example20:50
Sundar_sean-k-mooney: The analogy with os-vif cannot be taken too far. The os-vif handles a small number of plugins (Linux bridge, OVS, etc.) and it is intertwined with virt driver logic in practice. With os-acc, we will have far more variation in devices. Also, we do not want to repeat the coupling and legacy practices of os-vif20:51
sean-k-mooneySundar_: you are going to end up coupling the drivers to hypervisors in a way that is highly fragile20:52
Sundar_sean-k-mooney: For the virtual PCI, the os-acc plugin/driver will get it in a device-specific way and, depending on the hypervisor (e.g. libvirt+qemu), can construct the needed snippets20:52
sean-k-mooneySundar_: which would mean i would have to pass the image and flavor to you in addition to the list of all interfaces20:53
Sundar_sean-k-mooney: "Coupling the drivers to hypervisors" -- can you make that more concrete with an example or specific scenario?20:53
sean-k-mooneywell for a start the fact that os-acc needs to know you are using libvirt at all is a failing in my view20:53
sean-k-mooneywhy can we not define a data model using oslo versioned objects that will be returned by os-acc and have each of the virt drivers interpret them internally20:54
Sundar_For the specific case of libvirt and PCI, I don't see the issue with this. But I am not sure if that works for all devices and hypervisors. Power seems quite different, for example. Maybe efried can chime in.20:56
efriedYeah, sean-k-mooney, platform-gnostic code is like half of the raison d'être for os-acc.20:58
Sundar_As long as the XML snippet is well-defined and constrained, what is the issue?20:58
efriedsean-k-mooney: Because e.g. the 'plug' operation for libvirt entails modifying xml, whereas on power it involves issuing a REST command to a proprietary API.20:58
sean-k-mooneyefried: plug does not modify xml20:59
sean-k-mooneyos-vif can't by design.20:59
efriedwhat does, then?20:59
efriedthe nova virt driver?20:59
sean-k-mooneythe libvirt driver20:59
efriedokay, so be it.20:59
efriedwhat about discovery?21:00
efriedwho's responsible for that?21:00
sean-k-mooneyos-vif is invoked with an ovo constructed by the driver to wire up the backend21:00
efriedAgain, libvirt => lspci (I think); Power => REST call.21:00
sean-k-mooneyefried: currently the libvirt driver also21:00
sean-k-mooneyi had proposed having a call in os-vif for discovery21:00
sean-k-mooneyit was pushed back on however, so we can discover what plugins are installed but that's about it21:01
sean-k-mooneyefried: in os-vif the specific example of modifying the hypervisor code came up for smartnic.21:02
sean-k-mooneysorry i should retype that, but what i meant to say originally: having os-vif generate xml so that new nics could be added was proposed21:02
sean-k-mooneyit was also rejected21:03
Sundar_sean-k-mooney: Please look at #link https://libvirt.org/formatdomain.html#elementsHostDevSubsys . How would the virt driver determine XML elements like <boot order='1'/> or <rom bar=...> ?21:03
Sundar_Those are device-specific and configuration-specific21:03
sean-k-mooneySundar_: today we already do that. at least part of it.21:04
sean-k-mooneywe would pass that information from neutron21:04
sean-k-mooneyin the vif_binding details21:04
efriedhow does neutron know?21:05
efriedbased on the ml2 thingy, or the plugin?21:05
sean-k-mooneyml2 in general.21:05
Sundar_sean-k-mooney: That results in a double translation: first you have to fill in a VIF, and then translate that to another syntax. Since the final syntax is also well-defined, we may as well go to that directly21:06
sean-k-mooneyso for example ml2/ovs detects if ovs is kernel or dpdk and selects vhost-user or a tap device21:06
Sundar_As devices and requirements both evolve, it would be faster to change os-acc alone rather than both os-acc and all virt drivers in Nova21:07
sean-k-mooneySundar_: yes it's called the bridge pattern; it allows both implementations to vary independently21:07
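For readers unfamiliar with the bridge pattern mentioned above, a small generic sketch follows. All class names are hypothetical and do not correspond to os-vif, os-acc, or Nova code; the point is only that the device-side hierarchy holds a reference to a hypervisor-side hierarchy, so either side can gain new subclasses without touching the other.

```python
from abc import ABC, abstractmethod


class HypervisorBackend(ABC):
    """Implementor side: one subclass per hypervisor (hypothetical names)."""

    @abstractmethod
    def attach(self, device_id: str) -> str:
        ...


class LibvirtBackend(HypervisorBackend):
    def attach(self, device_id: str) -> str:
        return f"libvirt: add hostdev for {device_id}"


class PowerBackend(HypervisorBackend):
    def attach(self, device_id: str) -> str:
        return f"power: REST call for {device_id}"


class Accelerator:
    """Abstraction side: holds a backend instead of subclassing per hypervisor."""

    def __init__(self, device_id: str, backend: HypervisorBackend):
        self.device_id = device_id
        self.backend = backend

    def plug(self) -> str:
        return self.backend.attach(self.device_id)


class Fpga(Accelerator):
    """A refined abstraction; it adds device behaviour without knowing the backend."""

    def plug(self) -> str:
        return super().plug() + " (after programming bitstream)"
```

With this shape, `Fpga("fpga0", PowerBackend()).plug()` and `Accelerator("qat0", LibvirtBackend()).plug()` both work, and adding a third backend or a third device type needs one new class rather than one per combination.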
Sundar_We are not preventing independent evolution of the virt driver by providing XML snippets or equivalent21:08
Sundar_Restating: As devices and requirements both evolve, it would be faster to change os-acc alone rather than both os-acc and all virt drivers in Nova21:08
sean-k-mooneySundar_: will os-acc take into account the numa affinity of the ram and cpus selected by the driver as part of its xml generation21:09
sean-k-mooneySundar_: will it also account for the fact that if we are using libvirt/qemu instead of libvirt/kvm we don't provide numa affinity21:09
Sundar_That goes to the heart of it: NUMA affinity etc. would be stated in a device-specific way, and presumably will be in Cyborg or os-acc. Nova does not want to deal with such device-specific config21:10
sean-k-mooneythey are both libvirt but there is more coupling to the internals of the libvirt driver than i think you are hoping for21:10
Sundar_In your scheme, where would NUMA affinity for, say, a Quick Assist device be specified?21:11
sean-k-mooneynova21:11
Sundar_But I thought Nova developers stated loud and clear that they don't want to handle devices and their configurations21:12
sean-k-mooneyunless we are talking about moving all passthrough, pci, vgpu, pmem, sriov out of nova and out of the resource tracker into a single lib that is hypervisor agnostic i don't see how it reduces complexity21:13
sean-k-mooneythere is a difference between physical device management and virtualisation of the same21:14
sean-k-mooneyefried: what is your perspective21:14
Sundar_Back at the Ireland PTG, it was stated in one of the Nova sessions that handling PCI devices in general could be out of Nova. There is no clear roadmap for migrating device handling out of Nova, but I believe that's the general direction that people want to move21:15
sean-k-mooneydevice handling yes, hypervisor xml generation no21:17
efriedI'd like nova to be able to handle *scheduling* of devices generically like any other resource. But doing real stuff like attaching them to VMs, programming them, carving physical into virtual, configuring QoS, etc. - all that is going to need to be platform-specific (and sometimes device-specific code). There's no way around that.21:17
sean-k-mooneyand also i had suggested doing that back in denver by refactoring the pci code in the resource tracker into a separate lib with an ovo interface and no rest or user facing api.21:17
efriedSo it's going to need to be done in the virt driver, or some plugin.21:17
sean-k-mooneyefried: attaching them to vms however is something the hypervisor will do at the end of the day no?21:19
sean-k-mooneythe discovery is going to be device specific. in some cases the discovery will only be possible via the hypervisor21:20
sean-k-mooneybut in the latter case that is because we are using something like hyperv or vsphere where we cannot run code on the hypervisor21:21
Sundar_There are lots of touch points between devices and CPU/memory, and they are going to grow and evolve over time. Having a restrictive API could constrain or slow such evolution. For example, shared virtual memory along the lines of CAPI21:21
sean-k-mooneySundar_: why does shared virtual memory between device and guest require you to generate the xml21:22
Sundar_By shared virtual memory, I don't mean DMA, but coherent memory where the device is participating in the processor's cache coherence protocol. AFAIK, that is not standard yet in libvirt. It is going to require further evolution. That will presumably result in some XML tags21:24
sean-k-mooneySundar_: looking at https://www.ibm.com/developerworks/aix/library/au-aix-achieve-performance-capi/index.html it looks like ivshmem21:24
efriedsean-k-mooney: "run code on the hypervisor" is not something nova should ever, ever do. and by nova I mean "outside of the virt driver". The virt driver can do it; or some platform-specific plugin can do it.21:24
sean-k-mooneyefried: yes i am fine with that.21:25
sean-k-mooneyefried: when i was referring to running code on the hypervisors i was referring to the nova-compute agent and how it is typically deployed on the hypervisor with libvirt21:27
sean-k-mooneyi'm aware for ironic and other hypervisors that the compute agent is not always colocated with the hypervisor21:27
sean-k-mooneyalthough i also tend to reserve the term hypervisor for the software layer and not the physical server that the vm/compute context executes on21:28
efriedBy definition, the compute service runs on a system that knows how to talk to the hypervisor. Whether that's by exec'ing commands or communicating over a REST API. And either way, it's the responsibility of platform-specific code to do whatever that is.21:29
Sundar_efried: What is your take w.r.t Power/CAPI/IBM technologies on passing just the PCI address (or mediated device UUID, or some such) from os-acc to the virt driver?21:30
sean-k-mooneyefried: yes and by said definition the nova compute agent should only interact with the physical system through the api exposed by the hypervisor that driver is talking to, though there are some exceptions to that in the libvirt case21:31
sean-k-mooneySundar_: in the CAPI case if this was libvirt kvm, the CAPI protocol would be implemented between qemu and the device, not libvirt, and would require passing the memory regions to qemu.21:32
Sundar_How would such device-specific data like memory regions be communicated to qemu?21:33
sean-k-mooneySundar_: we map a dma region in a similar fashion for vhost user. in that case qemu passes the virt to physical page translation table to dpdk via the unix socket21:33
efriedSundar_: Passing just the <identifier> to the virt driver for what operation? Plug? That works as long as there's some other place at which any necessary configuration has been done. Which again would need to be platform-specific code.21:33
sean-k-mooneyefried: vhost-user is using a shared memory interface via dma to hugepages without ever passing that memory mapping to nova21:34
sean-k-mooneythe mappings are negotiated between the qemu frontend and the dpdk backend21:35
Sundar_My model is that Cyborg discovers devices and their properties. There needs to be some way to communicate their properties and requirements to the hypervisor. We could extend the VAN with specific fields for each such property/requirement, or ...21:35
Sundar_we could use whatever the corresponding hypervisor already expects, as long as it is standard21:35
sean-k-mooneyhow many different protocols do you expect to have to support21:38
sean-k-mooneywe will have mmio, ivshmem and capi, but they seem to take a similar approach to do mmio via pci bars21:40
sean-k-mooneywe will have vfio-mediated devices21:41
sean-k-mooneywe will have pci passthough of different kinds.21:41
Sundar_That is because PCI MMIO is standardized. Whereas newer things like OpenCAPI and device-coherent SVM are all relatively new and likely to grow. The short answer is, I don't know how many such protocols there will be, nor do I think that anybody can tell you for sure today21:41
sean-k-mooneySundar_: so you are proposing creating a library that needs to support not just multiple protocols, but multiple hypervisors, and multiple versions of those hypervisors, instead of creating an abstraction layer containing the common datatypes21:43
Sundar_Going back to Nova's philosophy, I think Nova wants to be agnostic of devices. Even things like NUMA affinity of devices may need to be modularized to minimize Nova involvement21:43
Sundar_'support multiple hypervisors' -- yes, by using plugins/drivers. The existing Nova virt drivers are not (or should not be) device-aware. The os-acc plugins are expected to fill that need21:46
sean-k-mooneyhttps://specs.openstack.org/openstack/nova-specs/specs/newton/implemented/os-vif-library.html#proposed-change if your spec does not already contain a detailed breakdown of the split of responsibilities as was done for os-vif can i ask that you add one21:47
sean-k-mooneySundar_: i think the only interface between openstack and the hypervisor should be via the nova virt driver21:48
sean-k-mooneyat least when it comes to the virtualisation of the guest21:49
Sundar_Yes, ultimately the Nova virt driver handles all hypervisor-related actions for launching an instance. The os-acc (or its plugins) are helpers -- specific areas are delegated by Nova virt to os-acc.21:52
Sundar_The os-acc should not directly modify the instance XML, for instance21:52
Sundar_The split of ownership can be added to the spec. Sure.21:53
Sundar_Does that address your concern?21:53
sean-k-mooneySundar_: if the nova-virt driver calls os-acc and is returned an ovo, and then the virt driver uses that to generate the xml or the equivalent for another driver, then yes.21:54
sean-k-mooneybut only if there is a clear separation between preparing the device and generating the hypervisor-specific configuration input21:54
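The split described in the two messages above could look roughly like the following sketch. It is illustrative only: `PciAcceleratorInfo` and `libvirt_hostdev_xml` are hypothetical names, and a plain dataclass stands in for an oslo.versionedobjects class so the example has no dependencies. The library side returns hypervisor-neutral data, and only the libvirt-side function knows about `<hostdev>` XML.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass(frozen=True)
class PciAcceleratorInfo:
    """Hypothetical hypervisor-neutral description handed back to the virt driver."""
    domain: int
    bus: int
    slot: int
    function: int


def libvirt_hostdev_xml(info: PciAcceleratorInfo) -> str:
    """Libvirt-driver side: turn the neutral object into a <hostdev> element."""
    hostdev = ET.Element("hostdev", mode="subsystem", type="pci", managed="yes")
    source = ET.SubElement(hostdev, "source")
    ET.SubElement(
        source,
        "address",
        domain=f"0x{info.domain:04x}",
        bus=f"0x{info.bus:02x}",
        slot=f"0x{info.slot:02x}",
        function=f"0x{info.function:x}",
    )
    return ET.tostring(hostdev, encoding="unicode")


# Example: a device at PCI address 0000:06:02.0
snippet = libvirt_hostdev_xml(PciAcceleratorInfo(domain=0, bus=6, slot=2, function=0))
```

Another virt driver would consume the same `PciAcceleratorInfo` and translate it into its own call, for example a REST request, without the library ever knowing which hypervisor is in use. Settings such as boot order, ROM BAR or NUMA affinity, which the discussion leaves unresolved, are deliberately not modelled here.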
Sundar_I am wondering how the separation would be so clear-cut for device/cpu/memory interaction, such as conveying device properties for NUMA etc. while keeping Nova device-agnostic. Anyways, I need to join a call now. We can touch base later.22:02
Sundar_Thanks, efried and sean-k-mooney22:03
*** sean-k-mooney has left #openstack-cyborg22:11
*** efried has quit IRC23:31
*** efried has joined #openstack-cyborg23:31
