openstackgerrit | Li Liu proposed openstack/cyborg master: Added bitstream metadata standardization spec https://review.openstack.org/558265 | 02:25 |
---|---|---|
*** evin has joined #openstack-cyborg | 06:16 | |
*** xinran_ has joined #openstack-cyborg | 06:17 | |
*** xinran_ has quit IRC | 10:26 | |
*** xinran_ has joined #openstack-cyborg | 11:33 | |
*** edleafe has quit IRC | 12:23 | |
*** mszwed has quit IRC | 12:23 | |
*** edleafe has joined #openstack-cyborg | 12:26 | |
*** mszwed has joined #openstack-cyborg | 12:26 | |
*** egallen has joined #openstack-cyborg | 12:47 | |
*** circ-user-c3hdH has joined #openstack-cyborg | 13:39 | |
*** xinran_ has quit IRC | 13:43 | |
*** shaohe_feng_ has joined #openstack-cyborg | 13:47 | |
*** zhipeng has joined #openstack-cyborg | 13:53 | |
*** evin has quit IRC | 13:54 | |
*** xinran_ has joined #openstack-cyborg | 13:56 | |
*** Sundar has joined #openstack-cyborg | 13:57 | |
*** NokMikeR has joined #openstack-cyborg | 13:57 | |
*** Li_Liu has joined #openstack-cyborg | 13:59 | |
Li_Liu | #info Li Liu | 13:59 |
zhipeng | #startmeeting openstack-cyborg | 13:59 |
openstack | Meeting started Wed Apr 25 13:59:21 2018 UTC and is due to finish in 60 minutes. The chair is zhipeng. Information about MeetBot at http://wiki.debian.org/MeetBot. | 13:59 |
openstack | Useful Commands: #action #agreed #help #info #idea #link #topic #startvote. | 13:59 |
*** openstack changes topic to " (Meeting topic: openstack-cyborg)" | 13:59 | |
openstack | The meeting name has been set to 'openstack_cyborg' | 13:59 |
zhipeng | #topic Roll Call | 13:59 |
*** openstack changes topic to "Roll Call (Meeting topic: openstack-cyborg)" | 13:59 | |
zhipeng | #info Howard | 13:59 |
NokMikeR | #info Mike | 13:59 |
Sundar | #info Sundar | 13:59 |
Sundar | Hi Howard and Mike | 14:00 |
NokMikeR | Hello Sundar, Howard et al. | 14:00 |
circ-user-c3hdH | #info Helloway | 14:00 |
zhipeng | Hello everyone :) | 14:00 |
Li_Liu | #info Li Liu | 14:00 |
Li_Liu | Hi guys | 14:01 |
shaohe_feng_ | #info shaohe | 14:03 |
shaohe_feng_ | hi all | 14:03 |
zhipeng | hi | 14:03 |
zhipeng | let's start then | 14:04 |
zhipeng | #topic KubeCon EU ResMgmt WG preparation | 14:05 |
*** openstack changes topic to "KubeCon EU ResMgmt WG preparation (Meeting topic: openstack-cyborg)" | 14:05 | |
zhipeng | #link https://docs.google.com/document/d/1j3vrG6BgE0hUDs2e-1ZUegKN4W4Adb1B6oJ6j-4kyPU/edit?usp=sharing | 14:05 |
xinran_ | Hi all | 14:05 |
zhipeng | so I think I mentioned that for this year's planning I want to be able to align what we have done here | 14:05 |
zhipeng | with the k8s community | 14:05 |
zhipeng | KubeCon EU is around the corner next week, and it would be a great place to start participating | 14:06 |
edleafe | #info edleafe | 14:06 |
zhipeng | Sundar, could you share some status on the Resource Management WG in k8s? | 14:06 |
Sundar | Sure | 14:06 |
Li_Liu | that's great | 14:06 |
Sundar | We started participating last year, with a document describing the FPGA structure and use cases | 14:07 |
Sundar | The main thing to note is that the FPGA structural model -- with regions, accelerators, local memory etc. -- is the same regardless of the orchestration framework -- OpenStack, K8s etc. | 14:07 |
Sundar | Also, the use cases defined in the Cyborg/Nova spec stay the same -- FPGA as a Service, Accelerated Function as a Service, etc. | 14:08 |
Sundar | The main difference is in the set of mechanisms available | 14:08 |
Sundar | In OpenStack, we have the notion of nested Resource Providers (nRPs), which provides a natural tree structure that matches many device topologies | 14:09 |
Sundar | The data models and resource handling in K8s are still evolving | 14:09 |
Sundar | What we have now is the device plugin mechanism: there is a standard API by which the kubelet can invoke a plugin for a category of devices | 14:10 |
Sundar | The plugin advertises a resource name, e.g. intel.com/fpga-a10, and lists the devices corresponding to that. There is also a provision to update that list over time, and to report the health of each device | 14:11 |
Sundar | Based on this information, when a pod spec asks for a resource, the standard K8s scheduler picks a node and informs the kubelet on that node | 14:12 |
Sundar | The kubelet then invokes another API on the device plugin to allocate a device of the requested type and prepare it | 14:12 |
*** Chuck_ has joined #openstack-cyborg | 14:12 | |
Sundar | After that, the kubelet invokes a container runtime (e.g. Docker) through the CRI with an OCI runtime spec | 14:13 |
Sundar | This basic mechanism does not include the nested structure of FPGAs, and we have been discussing how to fit that in | 14:14 |
Sundar | However, there are many options: we can use Custom Resource Definitions (CRDs) https://kubernetes.io/docs/concepts/api-extension/custom-resources/#customresourcedefinitions | 14:14 |
zhipeng | and vGPU as well, I suppose? | 14:14 |
Sundar | Howard, yes, I think vGPUs, esp. of different types, will also require further consideration | 14:15 |
Sundar | CRDs are essentially custom resource classes: we can instantiate resources for a CRD. | 14:15 |
Sundar | There are also two ongoing proposals for including resource classes: | 14:16 |
Sundar | The first one is: https://docs.google.com/document/d/1qKiIVs9AMh2Ua5thhtvWqOqW0MSle_RV3lfriO1Aj6U/edit# | 14:16 |
Sundar | An alternative proposal for resource classes is at: https://docs.google.com/document/d/1666PPUs4Lz56TqKygcy6mXkNazde-vwA7q4e5H92sUc/edit# | 14:17 |
Sundar | Their stated goals and non-goals are not exactly the same. | 14:18 |
Li_Liu | We don't have access to those 2 Google docs.. | 14:18 |
Sundar | Li_Liu, these are supposed to be public docs -- please ask for access | 14:18 |
Li_Liu | ok, just did | 14:18 |
zhipeng | Sundar, there is a PR on the Resource API | 14:19 |
zhipeng | does this correlate to Jiaying's or Vish's? | 14:19 |
Sundar | IMHO, a lot of the discussion comes from a GPU background. For FPGAs, we are trying to get alignment within Intel first | 14:19 |
Sundar | Zhipeng, Jiaying's proposal came first -- but I need to look at the PR before confirming | 14:20 |
zhipeng | okey :) | 14:21 |
Sundar | So, in short, our task is to find a way of handling FPGAs, without having the benefit of a tree-structured data model, but handling the same FPGA structures and usage models | 14:21 |
NokMikeR | is the vGPU vendor-specific? i.e. tied to a particular driver implementation? | 14:21 |
Li_Liu | Sundar, does the k8s community have any plans to add the tree-structure data model in the near future? | 14:23 |
Sundar | NokMikeR, the discussions I have seen have been centered on Nvidia's vGPU types, though not necessarily phrased in vendor-specific terms | 14:23 |
NokMikeR | in other words, how do you differentiate the features of one vGPU vs. another if the underlying features of the real GPUs are different - or are they abstracted somehow? | 14:23 |
Sundar | Li_Liu, None that I am aware of. | 14:23 |
NokMikeR | Sundar: ok, thought so re: Nvidia GPUs. | 14:24 |
Sundar | NokMikeR, in OpenStack, the answer is clearer: the device itself exposes different vGPU types as traits, and their capacities as units of a generic accelerator RC | 14:24 |
Sundar | Cyborg needs to handle GPUs and FPGAs of course. But, IMHO, there is enough attention on GPUs :) It is FPGAs that need further thought :) | 14:25 |
zhipeng | Sundar, Jiaying's proposal, as far as I understand, still tries to modify the k8s core functionality? | 14:25 |
shaohe_feng_ | Sundar: did we decide to only support one vendor's GPU or FPGA in this release, without nested providers? | 14:26 |
Sundar | Zhipeng, yes, it requires changes on controller side and kubelet changes | 14:26 |
Sundar | Shaohe: for Rocky, I was proposing to include only one device of a particular type: one GPU or one FPGA. But, based on feedback, we have to relax it to multiple devices of the same type, i.e., | 14:28 |
Sundar | you could have 2 GPUs of the same type, 2 FPGAs of the same type etc. | 14:28 |
zhipeng | Sundar, if we, say, propose a CRD type of kube-cyborg thing, will it make sense to the Res Mgmt WG people? | 14:28 |
zhipeng | meaning that similar to OpenStack | 14:29 |
zhipeng | we view accelerators as not part of the general compute infra | 14:29 |
zhipeng | and have its own model and scheduling process if needed | 14:29 |
Sundar | Zhipeng, we can propose CRDs, but the exact workflows will matter. | 14:30 |
Li_Liu | zhipeng, are you saying that, similar to what we did with Cyborg, we carve a piece out of K8s? | 14:30 |
zhipeng | I think the main pain point is still the scheduler extension | 14:30 |
zhipeng | which Derek also mentioned at KubeCon last Dec | 14:31 |
zhipeng | Li_Liu, essentially an out-of-band controller for accelerators | 14:31 |
shaohe_feng_ | Sundar: will Cyborg support nested providers in the Rocky release? | 14:31 |
zhipeng | shaohe_feng_ I think Placement won't support it | 14:32 |
shaohe_feng_ | zhipeng: Got it. | 14:32 |
zhipeng | but the way we are modeling it is very close to nRP, correct me if I'm wrong Li_Liu | 14:32 |
Sundar | Zhipeng, I was also advocating a scheduler extension. But apparently it is not popular within the community. There is a proposal to revamp the scheduler itself: https://docs.google.com/document/d/1NskpTHpOBWtIa5XsgB4bwPHz4IdRxL1RNvdjI7RVGio/edit# | 14:32 |
Sundar | So, the scheduler, as well as its extension APIs, may change | 14:32 |
zhipeng | well CRDs are generally great for API aggregation, but complex for resource related functionalities | 14:34 |
Sundar | Here is a possible way to get to a few basic cases without anything fancy (this is not fully agreed upon, please take this as an option, not a plan): | 14:34 |
zhipeng | like binding the resource to the pod | 14:34 |
zhipeng | since the process is external via CRD | 14:34 |
Sundar | •Publish each region type as a resource. E.g. intel.com/fpga-dcp, intel.com/fpga-vg | 14:35 |
Sundar | •The pod spec asks for a region type as a resource, and also specifies a bitstream ID. That could be a label | 14:35 |
Sundar | •An admission controller inserts an init container on seeing an FPGA resource. | 14:35 |
Sundar | •The scheduler picks a node based on the requested region type (and ignores the bitstream ID). | 14:35 |
Sundar | •The init container pulls the bitstream with that ID from a bitstream repository (mechanism TBD) and programs the selected device. | 14:36 |
Sundar | I have heard that, if we give a higher security context to the init container for programming, it may affect other containers in the same pod. I am still trying to find evidence for that | 14:37 |
zhipeng | lol this is just too complicated | 14:37 |
zhipeng | thx Sundar, I think we have a good understanding of the status quo | 14:38 |
Sundar | Zhipeng, more complicated than other proposals out there? ;) | 14:38 |
zhipeng | anyone else have questions regarding k8s? | 14:38 |
zhipeng | if you are attending KubeCon we could meet f2f, and give them hell XD | 14:38 |
Sundar | lol | 14:39 |
* NokMikeR braces for impact | 14:39 | |
Li_Liu | if you guys have any dial-in-able meeting during kubecon, please loop us in | 14:41 |
shaohe_feng_ | yes, loop us in | 14:41 |
zhipengh[m] | Okey, I will give a holler if a bridge is available | 14:42 |
*** zhipeng has quit IRC | 14:42 | |
zhipengh[m] | Seems like my PC IRC client just died | 14:42 |
zhipengh[m] | #topic Sub team arrangements | 14:43 |
*** zhipeng has joined #openstack-cyborg | 14:45 | |
zhipeng | phew | 14:46 |
Sundar | :) | 14:46 |
zhipeng | cell phone IRC bouncer crashed just now | 14:46 |
zhipeng | moving on | 14:46 |
zhipeng | #topic subteam arrangements | 14:47 |
*** openstack changes topic to "subteam arrangements (Meeting topic: openstack-cyborg)" | 14:47 | |
zhipeng | okey so given recent events, I think it is necessary to reorg the subteams | 14:47 |
zhipeng | and also encourage subteam to organize their specific meetings | 14:48 |
zhipeng | for specific topics | 14:48 |
zhipeng | so I would suggest shaohe to help lead the driver subteam | 14:48 |
zhipeng | work with our Xilinx and Lenovo colleagues on FPGA and GPU drivers in Rocky | 14:49 |
shaohe_feng_ | Ok. | 14:49 |
zhipeng | Li Liu help lead the doc team, to work with our CMCC member and others to make documentation as good as your spec :) | 14:49 |
Li_Liu | sure | 14:49 |
zhipeng | I will keep on the release mgmt side | 14:49 |
zhipeng | shaohe_feng_, you can sync up with Chuck_ on a meeting time better suited to the US West Coast | 14:50 |
zhipeng | mainly China morning times I guess | 14:50 |
shaohe_feng_ | zhipeng: who is Chuck_? | 14:51 |
Chuck_ | I am Chuck_ :-) | 14:51 |
Chuck_ | Hi Shaohe, this is Chuck from Xilinx | 14:51 |
Sundar | lol | 14:51 |
shaohe_feng_ | Chuck_: hello | 14:51 |
Chuck_ | I work on US West Coast time | 14:51 |
zhipeng | I will add Chuck_ to our WeChat group as well | 14:51 |
zhipeng | talk in Chinese :) | 14:52 |
Li_Liu | count me in for those driver meetings, shaohe_feng_ | 14:52 |
Li_Liu | :) | 14:52 |
Chuck_ | yes, look forward to working with you. | 14:52 |
shaohe_feng_ | Li_Liu: OK. | 14:53 |
zhipeng | and subteams, plz send reports to the mailing list; you can decide whether they are bi-weekly or weekly | 14:53 |
zhipeng | or monthly even | 14:53 |
zhipeng | up to you | 14:53 |
Sundar | Is it all WeChat in Chinese then? ;) I can join if that helps | 14:53 |
zhipeng | it is for the China region devs :P | 14:54 |
zhipeng | all in Chinese and crazy emoticons :P | 14:54 |
Li_Liu | WeChat has a translation feature tho :) | 14:54 |
shaohe_feng_ | Sundar: we speak Chinese there. :) | 14:54 |
shaohe_feng_ | Sundar: you can learn Chinese. | 14:54 |
Sundar | OK :) My daughters learnt Mandarin. I should have joined them | 14:55 |
zhipeng | haha will learn a lot | 14:55 |
Sundar | :) Do we have alignment on what use cases we will deliver in Rocky? | 14:56 |
zhipeng | CERN HPC PoC could be the one for GPU | 14:57 |
Sundar | Can we say we will deliver AFaaS with pre-programmed FPGAs, and FPGA as a Service with request-time programming? These are the simplest ones, and many customers want them | 14:57 |
Sundar | Zhipeng, yes, GPU POC too | 14:57 |
zhipeng | Sundar that is something we should be able to deliver | 14:57 |
zhipeng | for FPGA | 14:57 |
shaohe_feng_ | Sundar: oh, will we have the same RC name for vGPU, FPGA and other accelerators in the Rocky release? | 14:57 |
Sundar | Shaohe, yes, our agreement with Nova is to use a generic RC for all accelerators | 14:58 |
Sundar | as in the spec | 14:59 |
Sundar | Do we have kosamara here? | 14:59 |
Sundar | Any input on the spec from GPU perspective? | 14:59 |
shaohe_feng_ | Sundar: But I'm worried about doing this without nested providers. | 14:59 |
Sundar | Shaohe, without nRP, we will apply the traits etc. to the compute node RP | 15:00 |
Sundar | Do you see problems in doing that? | 15:00 |
shaohe_feng_ | how do we distinguish vGPU and FPGA in one host? | 15:01 |
*** zhipeng has quit IRC | 15:01 | |
*** edleafe has quit IRC | 15:01 | |
*** mszwed has quit IRC | 15:01 | |
*** circ-user-c3hdH has quit IRC | 15:01 | |
*** adreznec has quit IRC | 15:01 | |
*** ChanServ has quit IRC | 15:01 | |
*** circ-user-c3hdH has joined #openstack-cyborg | 15:01 | |
*** adreznec has joined #openstack-cyborg | 15:01 | |
*** ChanServ has joined #openstack-cyborg | 15:01 | |
*** barjavel.freenode.net sets mode: +o ChanServ | 15:01 | |
Li_Liu | hello? | 15:02 |
*** zhipeng has joined #openstack-cyborg | 15:02 | |
NokMikeR | welcome back | 15:02 |
Li_Liu | everyone quit? | 15:02 |
NokMikeR | Netsplit, a glitch in the matrix. | 15:02 |
Sundar | All traits will get mixed in the compute node. So, a flavor asking for resources:CUSTOM_ACCELERATOR=1 and trait:CUSTOM_GPU_<foo> will come to Cyborg, which needs to choose a GPU based on its Deployables | 15:02 |
Sundar | Li_Liu, I am still here :) | 15:02 |
*** circ-user-c3hdH has quit IRC | 15:03 | |
shaohe_feng_ | Sundar: for example, we put FPGA traits and GPU traits on the host provider. | 15:03 |
*** edleafe has joined #openstack-cyborg | 15:03 | |
*** mszwed has joined #openstack-cyborg | 15:03 | |
zhipeng | okey this is a crazy night | 15:03 |
shaohe_feng_ | Sundar: and there is one GPU and one FPGA | 15:03 |
Sundar | Sorry, I have another call at this time | 15:03 |
shaohe_feng_ | first, we consume one FPGA | 15:04 |
Sundar | Shaohe, can we continue another time? | 15:04 |
shaohe_feng_ | Sundar: OK. | 15:04 |
zhipeng | okey moving on | 15:04 |
zhipeng | #topic critical rocky spec update | 15:04 |
*** openstack changes topic to "critical rocky spec update (Meeting topic: openstack-cyborg)" | 15:04 | |
*** ttk2[m] has quit IRC | 15:05 | |
zhipeng | xinran_, are you still around? | 15:05 |
xinran_ | Yes I’m here | 15:05 |
zhipeng | could you provide a brief update on the quota spec? | 15:06 |
*** zhipengh[m] has quit IRC | 15:06 | |
xinran_ | Ok. As we discussed with xiyuan, I think we should do the usage part first, so I want to split the spec into two parts. | 15:09 |
xinran_ | What do you think | 15:09 |
zhipeng | makes sense | 15:11 |
zhipeng | have you already split it into two parts? | 15:11 |
xinran_ | Not yet; will update the usage part this week | 15:12 |
zhipeng | sounds great :) | 15:12 |
zhipeng | #action xinran to update the quota spec into two parts and complete the usage one first | 15:13 |
xinran_ | For the limit part, I think it still needs more discussion with xiyuan | 15:13 |
zhipeng | no problem | 15:13 |
Li_Liu | So we will have 2 specs on this instead of 1, right? | 15:13 |
zhipeng | yep | 15:13 |
zhipeng | but the limit one should be rather simple | 15:13 |
zhipeng | since we will utilize a lot of things Keystone has already designed | 15:14 |
Li_Liu | I see | 15:14 |
xinran_ | Yes I think so | 15:15 |
zhipeng | okey another thing is that the os-acc spec will need more time | 15:15 |
zhipeng | so I suggest we relax the deadline for proposal on that one to the June MS2 | 15:16 |
zhipeng | if by then we still can't get it landed, I will block it for Rocky, but it could land first thing in Stein :) | 15:16 |
Li_Liu | zhipeng, let me know if you need some help on that one | 15:16 |
zhipeng | sounds reasonable | 15:16 |
zhipeng | ? | 15:16 |
zhipeng | Li_Liu sure :) | 15:17 |
Li_Liu | since I think I have some thoughts on it | 15:17 |
zhipeng | #agreed os-acc spec extended to MS2 for approval | 15:17 |
zhipeng | Li_Liu no problem, feel free to share it | 15:17 |
zhipeng | okey moving on | 15:18 |
zhipeng | #topic open patches/bugs | 15:18 |
*** openstack changes topic to "open patches/bugs (Meeting topic: openstack-cyborg)" | 15:18 | |
zhipeng | I will push up the fix for mutable-config this week | 15:18 |
zhipeng | maybe combined with the fix the Lenovo folks provided, which I had blocked over a trivial fix | 15:20 |
zhipeng | any other issues on this topic? | 15:21 |
zhipeng | okey then | 15:22 |
zhipeng | #topic AoB | 15:22 |
*** openstack changes topic to "AoB (Meeting topic: openstack-cyborg)" | 15:22 | |
zhipeng | any other business | 15:22 |
xinran_ | what is AoB...... | 15:23 |
xinran_ | Ah i know! | 15:23 |
zhipeng | xinran_ good :) | 15:24 |
xinran_ | ;) | 15:25 |
zhipeng | okey if there are no other topics | 15:26 |
zhipeng | let's conclude the meeting today | 15:26 |
zhipeng | thx for the great conversation :) | 15:26 |
zhipeng | #endmeeting | 15:26 |
*** openstack changes topic to "OpenStack Cyborg Project Discussion" | 15:26 | |
openstack | Meeting ended Wed Apr 25 15:26:31 2018 UTC. Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4) | 15:26 |
openstack | Minutes: http://eavesdrop.openstack.org/meetings/openstack_cyborg/2018/openstack_cyborg.2018-04-25-13.59.html | 15:26 |
openstack | Minutes (text): http://eavesdrop.openstack.org/meetings/openstack_cyborg/2018/openstack_cyborg.2018-04-25-13.59.txt | 15:26 |
openstack | Log: http://eavesdrop.openstack.org/meetings/openstack_cyborg/2018/openstack_cyborg.2018-04-25-13.59.log.html | 15:26 |
NokMikeR | thanks all, bye | 15:26 |
*** NokMikeR has quit IRC | 15:27 | |
*** zhipeng has quit IRC | 15:28 | |
*** Chuck_ has quit IRC | 15:50 | |
*** ttk2[m] has joined #openstack-cyborg | 16:12 | |
*** zhipengh[m] has joined #openstack-cyborg | 16:22 | |
*** xinran_ has quit IRC | 17:33 | |
*** evin has joined #openstack-cyborg | 17:34 | |
*** shaohe_feng_ has quit IRC | 17:36 | |
*** Li_Liu has quit IRC | 18:37 | |
*** Sundar has quit IRC | 18:57 | |
*** evin has quit IRC | 20:19 | |
*** openstackgerrit has quit IRC | 21:20 |
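
The device plugin flow Sundar describes at 14:10-14:13 (a plugin advertises a resource name plus a device list with health, the scheduler picks a node, the kubelet asks the plugin to allocate) can be modeled as a toy. This is a sketch under stated assumptions, not the real interface: the actual device plugin is a gRPC service the kubelet calls (with RPCs such as ListAndWatch and Allocate), while the names ToyDevicePlugin, schedule and kubelet_allocate below are hypothetical. The final CRI/OCI container-launch step is omitted.

```python
from dataclasses import dataclass, field

HEALTHY = "Healthy"
UNHEALTHY = "Unhealthy"


@dataclass
class Device:
    id: str
    health: str = HEALTHY


@dataclass
class ToyDevicePlugin:
    """Hypothetical stand-in for one node's device plugin."""
    resource_name: str  # e.g. "intel.com/fpga-a10"
    devices: list = field(default_factory=list)
    allocated: set = field(default_factory=set)

    def list_devices(self):
        # Loosely mirrors ListAndWatch: report each device ID with its health.
        return [(d.id, d.health) for d in self.devices]

    def free_healthy(self):
        return [d.id for d in self.devices
                if d.health == HEALTHY and d.id not in self.allocated]

    def allocate(self, count):
        # Loosely mirrors Allocate: reserve devices for a container.
        free = self.free_healthy()
        if len(free) < count:
            raise RuntimeError("not enough healthy devices")
        chosen = free[:count]
        self.allocated.update(chosen)
        return chosen


def schedule(nodes, resource_name, count):
    """Return the first node advertising enough free, healthy devices."""
    for node_name, plugin in nodes.items():
        if (plugin.resource_name == resource_name
                and len(plugin.free_healthy()) >= count):
            return node_name
    return None


def kubelet_allocate(nodes, node_name, count):
    """The kubelet on the chosen node asks its plugin for devices."""
    return nodes[node_name].allocate(count)


nodes = {
    "node-1": ToyDevicePlugin("intel.com/fpga-a10", [Device("fpga0", UNHEALTHY)]),
    "node-2": ToyDevicePlugin("intel.com/fpga-a10", [Device("fpga0"), Device("fpga1")]),
}
node = schedule(nodes, "intel.com/fpga-a10", 1)
devs = kubelet_allocate(nodes, node, 1)
print(node, devs)  # node-2 ['fpga0']
```

The unhealthy device keeps node-1 from being chosen. As Sundar notes at 14:14, nothing in this flat model captures FPGA regions or bitstreams.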
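
The option Sundar lists at 14:35-14:36 (region type as a resource, bitstream ID as a label, an admission controller inserting an init container) can be sketched as a mutation over a pod spec held in plain Python dicts. This illustrates one option that was explicitly not agreed upon. The label key, programmer image and BITSTREAM_ID variable are hypothetical names invented here. How the init container gets device access, and with what security context, is left open, as in the discussion. A real mutating admission webhook would receive the pod inside an AdmissionReview request rather than through a direct call.

```python
import copy

# Region types published as extended resources (names from the discussion).
FPGA_REGION_RESOURCES = {"intel.com/fpga-dcp", "intel.com/fpga-vg"}

# Hypothetical names, invented for this sketch.
BITSTREAM_LABEL = "fpga.example.com/bitstream-id"
PROGRAMMER_IMAGE = "example.com/fpga-programmer:latest"


def requested_fpga_region(pod):
    """Return the first FPGA region resource any container asks for, or None."""
    for container in pod["spec"].get("containers", []):
        limits = container.get("resources", {}).get("limits", {})
        for name in limits:
            if name in FPGA_REGION_RESOURCES:
                return name
    return None


def mutate_pod(pod):
    """Add an init container that will program the FPGA with the labeled bitstream.

    Pods without an FPGA region request are returned unchanged.
    """
    region = requested_fpga_region(pod)
    if region is None:
        return pod
    bitstream_id = pod["metadata"].get("labels", {}).get(BITSTREAM_LABEL)
    if bitstream_id is None:
        raise ValueError("FPGA pod has no bitstream ID label")
    mutated = copy.deepcopy(pod)  # leave the caller's object untouched
    programmer = {
        "name": "fpga-programmer",
        "image": PROGRAMMER_IMAGE,
        # The init container would fetch this bitstream from a repository
        # (mechanism TBD) and program the device; device access and
        # security context are deliberately left out here.
        "env": [{"name": "BITSTREAM_ID", "value": bitstream_id}],
    }
    mutated["spec"].setdefault("initContainers", []).insert(0, programmer)
    return mutated


pod = {
    "metadata": {"name": "accel-app", "labels": {BITSTREAM_LABEL: "bs-1234"}},
    "spec": {"containers": [{
        "name": "app",
        "image": "example.com/app:1.0",
        "resources": {"limits": {"intel.com/fpga-dcp": 1}},
    }]},
}
mutated = mutate_pod(pod)
print(mutated["spec"]["initContainers"][0]["env"])
# [{'name': 'BITSTREAM_ID', 'value': 'bs-1234'}]
```

The scheduler would only see the intel.com/fpga-dcp limit. The label plays no part in node selection, which matches the "ignores the bitstream ID" step.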
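
shaohe_feng_'s unfinished example (15:03-15:04) points at the risk of putting all traits on the compute node RP when there are no nRPs. The toy sketch below assumes one node holding one GPU and one FPGA, both counted under a single generic RC (CUSTOM_ACCELERATOR, as in Sundar's 15:02 message), with hypothetical trait names CUSTOM_GPU_FOO and CUSTOM_FPGA_BAR. The classes are illustrative stand-ins, not the Placement or Cyborg APIs.

```python
from dataclasses import dataclass, field


@dataclass
class ComputeNodeRP:
    """Hypothetical flat provider: one CUSTOM_ACCELERATOR inventory, one trait set."""
    total: int
    used: int = 0
    traits: set = field(default_factory=set)

    def matches(self, amount, required_traits):
        # Placement-style check: enough free units and all required traits present.
        return (self.total - self.used >= amount
                and required_traits <= self.traits)


@dataclass
class Deployable:
    """Hypothetical Cyborg-side record for one physical device."""
    name: str
    trait: str
    in_use: bool = False


def cyborg_pick(deployables, required_trait):
    """Pick a concrete free device carrying the trait, or None."""
    for d in deployables:
        if not d.in_use and d.trait == required_trait:
            return d
    return None


rp = ComputeNodeRP(total=2, traits={"CUSTOM_GPU_FOO", "CUSTOM_FPGA_BAR"})
deployables = [Deployable("gpu0", "CUSTOM_GPU_FOO"),
               Deployable("fpga0", "CUSTOM_FPGA_BAR")]

# First request: an FPGA is consumed on both sides.
first = cyborg_pick(deployables, "CUSTOM_FPGA_BAR")
first.in_use = True
rp.used += 1

# Second FPGA request: the flat check still passes...
flat_ok = rp.matches(1, {"CUSTOM_FPGA_BAR"})
# ...but no free FPGA is left among the Deployables.
second = cyborg_pick(deployables, "CUSTOM_FPGA_BAR")
print(flat_ok, second)  # True None
```

The flat check still passes after the only FPGA is taken. The FPGA trait stays on the node, and one unit of CUSTOM_ACCELERATOR (the GPU) remains, so Cyborg, choosing from its Deployables, has to reject the request. With nested RPs, each device would presumably be its own child provider with its own inventory and traits, so Placement itself would filter the node out.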
Generated by irclog2html.py 2.15.3 by Marius Gedminas - find it at mg.pov.lt!