Wednesday, 2018-04-25

openstackgerritLi Liu proposed openstack/cyborg master: Added bitstream metadata standardization spec  https://review.openstack.org/558265 02:25
*** evin has joined #openstack-cyborg06:16
*** xinran_ has joined #openstack-cyborg06:17
*** xinran_ has quit IRC10:26
*** xinran_ has joined #openstack-cyborg11:33
*** edleafe has quit IRC12:23
*** mszwed has quit IRC12:23
*** edleafe has joined #openstack-cyborg12:26
*** mszwed has joined #openstack-cyborg12:26
*** egallen has joined #openstack-cyborg12:47
*** circ-user-c3hdH has joined #openstack-cyborg13:39
*** xinran_ has quit IRC13:43
*** shaohe_feng_ has joined #openstack-cyborg13:47
*** zhipeng has joined #openstack-cyborg13:53
*** evin has quit IRC13:54
*** xinran_ has joined #openstack-cyborg13:56
*** Sundar has joined #openstack-cyborg13:57
*** NokMikeR has joined #openstack-cyborg13:57
*** Li_Liu has joined #openstack-cyborg13:59
Li_Liu#info Li Liu13:59
zhipeng#startmeeting openstack-cyborg13:59
openstackMeeting started Wed Apr 25 13:59:21 2018 UTC and is due to finish in 60 minutes.  The chair is zhipeng. Information about MeetBot at http://wiki.debian.org/MeetBot.13:59
openstackUseful Commands: #action #agreed #help #info #idea #link #topic #startvote.13:59
*** openstack changes topic to " (Meeting topic: openstack-cyborg)"13:59
openstackThe meeting name has been set to 'openstack_cyborg'13:59
zhipeng#topic Roll Call13:59
*** openstack changes topic to "Roll Call (Meeting topic: openstack-cyborg)"13:59
zhipeng#info Howard13:59
NokMikeR#info Mike13:59
Sundar#info Sundar13:59
SundarHi Howard and Mike14:00
NokMikeRHello Sundar, Howard et al.14:00
circ-user-c3hdH#info Helloway14:00
zhipengHello everyone :)14:00
Li_Liu#info Li Liu14:00
Li_LiuHi guys14:01
shaohe_feng_#info shaohe14:03
shaohe_feng_hi all14:03
zhipenghi14:03
zhipenglet's start then14:04
zhipeng#topic KubeCon EU ResMgmt WG preparation14:05
*** openstack changes topic to "KubeCon EU ResMgmt WG preparation (Meeting topic: openstack-cyborg)"14:05
zhipeng#link https://docs.google.com/document/d/1j3vrG6BgE0hUDs2e-1ZUegKN4W4Adb1B6oJ6j-4kyPU/edit?usp=sharing 14:05
xinran_Hi all14:05
zhipengso i think I mentioned that for this year's planning I want to be able to align what we have done here14:05
zhipengwith the k8s community14:05
zhipengKubeCon EU is around the corner next week, and it would be a great place to start participating14:06
edleafe#info edleafe14:06
zhipengSundar could you help share some status about the resource mgmt wg in k8s ?14:06
SundarSure14:06
Li_Liuthat's great14:06
SundarWe started participating last year, with a document describing the FPGA structure and use cases14:07
SundarThe main thing to note is that the FPGA structural model -- with regions, accelerators, local memory etc. -- is the same independent of orch framework -- OpenStack, K8s etc14:07
SundarAlso, the use cases defined in the Cyborg/Nova spec stay the same -- FPGA as a Service, Accelerated Function as a Service, etc.14:08
SundarThe main difference is in the set of mechanisms available14:08
SundarIn OpenStack, we have the notion of nested Resource Providers (nRPs), which provides a natural tree structure that matches many device topologies14:09
SundarThe data models and resource handling in K8s are still evolving14:09
SundarWhat we have now is the device plugin mechanism: there is a standard API by which the kubelet can invoke a plugin for a category of devices14:10
SundarThe plugin advertises a resource name, e.g. intel.com/fpga-a10, and lists the devices corresponding to that. There is also a provision to update that list over time, and to report the health of each device14:11
SundarBased on this information, when a pod spec asks for a resource, the standard K8s scheduler picks a node and informs the kubelet on that node14:12
SundarThe kubelet then invokes another API on the device plugin to allocate a device of the requested type and prepare it14:12
*** Chuck_ has joined #openstack-cyborg14:12
SundarAfter that, the kubelet invokes a container runtime (e.g. Docker) through the CRI with an OCI runtime spec14:13
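The device plugin flow described above (advertise a resource name, list healthy devices, let the scheduler pick a node, then have the kubelet ask the plugin to allocate) can be sketched as a small simulation. This is only an illustration: the class and function names are hypothetical and do not mirror the real kubelet or device plugin gRPC API, and the "first node with a free device" rule is a simplifying assumption.

```python
# Illustrative simulation only; NOT the real kubelet / device plugin gRPC API.
from dataclasses import dataclass, field


@dataclass
class DevicePlugin:
    """Hypothetical stand-in for the device plugin running on one node."""
    resource_name: str
    devices: dict = field(default_factory=dict)   # device id -> healthy?
    allocated: set = field(default_factory=set)

    def list_devices(self):
        """Advertise healthy, unallocated device ids for this resource."""
        return [dev for dev, healthy in sorted(self.devices.items())
                if healthy and dev not in self.allocated]

    def allocate(self):
        """Reserve one free device and return its id (the kubelet's call)."""
        free = self.list_devices()
        if not free:
            raise RuntimeError("no free device for " + self.resource_name)
        self.allocated.add(free[0])
        return free[0]


def schedule(nodes, resource_name):
    """Pick the first node whose plugin advertises a free matching device."""
    for node_name, plugin in nodes.items():
        if plugin.resource_name == resource_name and plugin.list_devices():
            return node_name
    return None


nodes = {
    "node-a": DevicePlugin("intel.com/fpga-a10", {"fpga0": False}),  # unhealthy
    "node-b": DevicePlugin("intel.com/fpga-a10", {"fpga0": True, "fpga1": True}),
}
chosen = schedule(nodes, "intel.com/fpga-a10")
device = nodes[chosen].allocate()
```

Note that the resource is a flat name with a device list, which is why the nested FPGA structure discussed next does not fit without extra machinery.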
SundarThis basic mechanism does not include the nested structure of FPGAs, and we have been discussing how to fit that in14:14
SundarHowever, there are many options: we can use Custom Resource Definitions (CRDs) https://kubernetes.io/docs/concepts/api-extension/custom-resources/#customresourcedefinitions 14:14
zhipengand vGPU as well I suppose ?14:14
SundarHoward, yes, I think vGPUs, esp. of different types, will also require further consideration14:15
SundarCRDs are essentially custom resource classes: we can instantiate resources for a CRD.14:15
SundarThere are also two ongoing proposals for including resource classes:14:16
SundarThe first one is: https://docs.google.com/document/d/1qKiIVs9AMh2Ua5thhtvWqOqW0MSle_RV3lfriO1Aj6U/edit# 14:16
SundarAn alternative proposal for resource classes is at: https://docs.google.com/document/d/1666PPUs4Lz56TqKygcy6mXkNazde-vwA7q4e5H92sUc/edit# 14:17
SundarTheir stated goals and non-goals are not exactly the same.14:18
Li_LiuWe don't have access to those 2 google docs..14:18
SundarLi_Liu, these are supposed to be public dos -- please ask for access14:18
Sundar*docs14:18
Li_Liuok, just did14:18
zhipengSundar there is a PR on Resource API14:19
zhipengdoes this correlate to jiaying's or vish's ?14:19
SundarIMHO, a lot of the discussion comes from a GPU background. For FPGAs, we are trying to get alignment within Intel first14:19
SundarZhipeng, Jiaying's proposal came first -- but I need to look at the PR before confirming14:20
zhipengokey :)14:21
SundarSo, in short, our task is to find a way of handling FPGAs, without having the benefit of a tree-structured data model, but handling the same FPGA structures and usage models14:21
NokMikeRis the vGPU vendor specific? ie tied to a particular driver implementation?14:21
Li_LiuSundar, does the k8s community have any plans to add the tree-structure data model in the near future?14:23
SundarNokMikeR, the discussions I have seen have been centered on Nvidia's vGPU types, though not necessarily phrased in vendor-specific terms14:23
NokMikeRin other words how do you differentiate the features on one vGPU vs another if the underlying features in the real gpu are different - or are they abstracted somehow?14:23
SundarLi_Liu, None that I am aware of.14:23
NokMikeRSundar: ok thought so re: nvid gpus.14:24
SundarNokMikeR, in OpenStack, the answer is clearer: the device itself exposes different vGPU types as traits, and their capacities as units of a generic accelerator RC14:24
SundarCyborg needs to handle GPUs and FPGAs of course. But, IMHO, there is enough attention on GPUs :) It is FPGAs that need further thought :)14:25
zhipengSundar Jiaying's proposal, as far as I understand, still tries to modify the k8s core functionality ?14:25
shaohe_feng_Sundar: did we decide to only support one vendor's GPU or FPGA in this release, without nested providers?14:26
SundarZhipeng, yes, it requires changes on controller side and kubelet changes14:26
SundarShaohe: for Rocky, I was proposing to include only one device of a particular type: one GPU or one FPGA. But, based on feedback, we have to relax it to multiple devices of the same type, i.e.,14:28
Sundaryou could have 2 GPUs of the same type, 2 FPGAs of the same type etc.14:28
zhipengSundar if we say, propose a CRD type of kube-cyborg thing, will it make sense to the res mgmt wg people ?14:28
zhipengmeaning that similar to OpenStack14:29
zhipengwe view accelerator not part of the general compute infra14:29
zhipengand have its own model and scheduling process if needed14:29
SundarZhipeng, we can propose CRDs, but the exact workflows will matter.14:30
Li_Liuzhipeng, you are saying, similar to what we did to Cyborg, we cut a piece out from K8S?14:30
zhipengI think the main pain point is still at scheduler extension14:30
zhipengwhich Derek also mentioned at KubeCon last Dec14:31
zhipengLi_Liu essentially an out-of-band controller for accelerators14:31
shaohe_feng_Sundar: will cyborg support nested providers in the Rocky release?14:31
zhipengshaohe_feng_ I think Placement won't support it14:32
shaohe_feng_zhipeng: Got it.14:32
zhipengbut the way we are modeling it is very close to nrp, correct me if i'm wrong Li_Liu14:32
SundarZhipeng, I was also advocating a scheduler extension. But apparently it is not popular within the community. There is a proposal to revamp the scheduler itself: https://docs.google.com/document/d/1NskpTHpOBWtIa5XsgB4bwPHz4IdRxL1RNvdjI7RVGio/edit# 14:32
SundarSo, the scheduler, as well as its extension APIs, may change14:32
zhipengwell CRDs are generally great for API aggregation, but complex for resource related functionalities14:34
SundarHere is a possible way to get to a few basic cases without anything fancy (this is not fully agreed upon, please take this as an option, not a plan):14:34
zhipenglike binding the resource to the pod14:34
zhipengsince the process is external via CRD14:34
Sundar•Publish each region type as a resource. E.g. intel.com/fpga-dcp, intel.com/fpga-vg.  •The pod spec asks for a region type as a resource, and also specifies a bitstream ID. That could be a label.  •An admission controller inserts an init container on seeing a FPGA resource. •The scheduler picks a node based on the requested region type (and ignores the bitstream ID). •The init container pulls the bitstream wi14:34
SundarAh, that didn't come out well in IRC -- let me re-type14:35
Sundar•Publish each region type as a resource. E.g. intel.com/fpga-dcp, intel.com/fpga-vg14:35
Sundar•The pod spec asks for a region type as a resource, and also specifies a bitstream ID. That could be a label14:35
Sundar•An admission controller inserts an init container on seeing an FPGA resource.14:35
Sundar•The scheduler picks a node based on the requested region type (and ignores the bitstream ID).14:35
Sundar•The init container pulls the bitstream with that ID from a bitstream repository (mechanism TBD) and programs the selected device.14:36
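The option outlined in the bullets above can be sketched as the mutation an admission controller might apply to a pod. This is a rough illustration under stated assumptions: the label key `example.com/bitstream-id`, the init container name and image, and the `BITSTREAM_ID` environment variable are all hypothetical; only the region resource names (`intel.com/fpga-dcp`, `intel.com/fpga-vg`) come from the discussion. How the init container fetches and programs the bitstream is left out, since that mechanism is TBD.

```python
# Hypothetical sketch of the admission step; label key, image and env var are assumptions.
import copy

FPGA_PREFIX = "intel.com/fpga-"                # region types named in the discussion
BITSTREAM_LABEL = "example.com/bitstream-id"   # hypothetical label key


def requested_fpga_regions(pod):
    """Collect FPGA region resource names requested by any container."""
    regions = []
    for container in pod["spec"]["containers"]:
        limits = container.get("resources", {}).get("limits", {})
        regions += [name for name in limits if name.startswith(FPGA_PREFIX)]
    return regions


def mutate(pod):
    """Return a patched copy with a programming init container, if applicable."""
    bitstream = pod["metadata"].get("labels", {}).get(BITSTREAM_LABEL)
    if not requested_fpga_regions(pod) or bitstream is None:
        return pod  # nothing to do: no FPGA region or no bitstream ID
    patched = copy.deepcopy(pod)
    patched["spec"].setdefault("initContainers", []).append({
        "name": "fpga-programmer",                      # hypothetical
        "image": "example.com/fpga-programmer:latest",  # hypothetical
        "env": [{"name": "BITSTREAM_ID", "value": bitstream}],
    })
    return patched


pod = {
    "metadata": {"name": "demo", "labels": {BITSTREAM_LABEL: "bs-1234"}},
    "spec": {"containers": [{
        "name": "app",
        "image": "example.com/app:latest",
        "resources": {"limits": {"intel.com/fpga-dcp": 1}},
    }]},
}
patched = mutate(pod)
```

The scheduler only sees the region resource in `limits`; the bitstream ID travels as a label and is consumed solely by the init container, matching the "ignores the bitstream ID" step.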
SundarI have heard that, if we give a higher security context to the init container for programming, it may affect other containers in the same pod. I am still trying to find evidence for that14:37
zhipenglol this is just too complicated14:37
zhipengthx Sundar I think we have a good understanding of the status qup14:38
zhipengquo14:38
SundarZhipeng, more complicated than other proposals out there? ;)14:38
zhipenganyone else got questions regarding k8s ?14:38
zhipengif you are attending KubeCon we could meet f2f, and give them hell XD14:38
Sundarlol14:39
* NokMikeR braces for impact14:39
Li_Liuif you guys have any dial-in-able meeting during kubecon, please loop us in14:41
shaohe_feng_yes, loop us in14:41
zhipengh[m]Okey I will give a howler if a bridge is available14:42
*** zhipeng has quit IRC14:42
zhipengh[m]Seems like my PC irc client just died14:42
zhipengh[m]#topic Sub team arrangements14:43
*** zhipeng has joined #openstack-cyborg14:45
zhipengphew14:46
Sundar:)14:46
zhipengcell phone irc bouncer crashed just now14:46
zhipengmoving on14:46
zhipeng#topic subteam arrangments14:47
*** openstack changes topic to "subteam arrangments (Meeting topic: openstack-cyborg)"14:47
zhipengokey so given recent events, I think it is necessary to reorg the subteams14:47
zhipengand also encourage subteam to organize their specific meetings14:48
zhipengfor specific topics14:48
zhipengso I would suggest shaohe to help lead the driver subteam14:48
zhipengwork with our Xilinx and lenovo colleagues on FPGA and GPU driver in Rocky14:49
shaohe_feng_Ok.14:49
zhipengLi Liu help lead the doc team, to work with our CMCC member and others to make documentation as good as your spec :)14:49
Li_Liusure14:49
zhipengI will keep on the release mgmt side14:49
zhipengshaohe_feng_ you can sync up with Chuck_ on a meeting time more suited for US west coast14:50
zhipengmainly China morning times I guess14:50
shaohe_feng_zhipeng: what is Chuck_?14:51
Chuck_I am Chuck_ :-)14:51
Chuck_Hi Shaohe, this is Chuck from Xilinx14:51
Sundarlol14:51
shaohe_feng_Chuck_:  hello14:51
Chuck_I work from US west time14:51
zhipengI will add Chuck_ into our wechat group as well14:51
zhipengtalk in Chinese :)14:52
Li_Liucount me in for those driver meetings shaohe_feng_14:52
Li_Liu:)14:52
Chuck_yes, look forward to working with you.14:52
shaohe_feng_Li_Liu: OK.14:53
zhipengand subteam plz send report to the mailing list, you can decide whether it is bi-weekly or weekly14:53
zhipengor monthly even14:53
zhipengup to you14:53
SundarIs it all WeChat in Chinese then? ;) I can join if that helps14:53
zhipengit is for the China region devs :P14:54
zhipengall in Chinese and crazy emoticons :P14:54
Li_LiuWeChat has a translation feature tho :)14:54
shaohe_feng_Sundar: we speak  Chinese there. :)14:54
shaohe_feng_Sundar: you can learn Chinese.14:54
SundarOK :) My daughters learnt Mandarin. I should have joined them14:55
zhipenghaha will learn a lot14:55
Sundar:) Do we have alignment on what use cases we will deliver in Rocky?14:56
zhipengCERN HPC PoC could be the one for GPU14:57
SundarCan we say we will deliver AFaaS pre-programmed, and FPGA aaS with request time programming? These are the simplest ones, and many customers want that14:57
SundarZhipeng, yes, GPU POC too14:57
zhipengSundar that is something we should be able to deliver14:57
zhipengfor FPGA14:57
shaohe_feng_Sundar: oh, will we have the same RC name for vGPU and FPGA, and other accelerators in the Rocky release?14:57
SundarShaohe, yes our agreement with Nova is to use a generic RC for all accelerators14:58
Sundaras in the spec14:59
SundarDo we have kosamara here?14:59
SundarAny input on the spec from GPU perspective?14:59
shaohe_feng_Sundar: But I'm worried about going without nested providers.14:59
SundarShaohe, without nRP, we will apply the traits etc. to the compute node RP15:00
SundarDo you see problems in doing that?15:00
shaohe_feng_how do we distinguish vGPU and FPGA in one host?15:01
*** zhipeng has quit IRC15:01
*** edleafe has quit IRC15:01
*** mszwed has quit IRC15:01
*** circ-user-c3hdH has quit IRC15:01
*** adreznec has quit IRC15:01
*** ChanServ has quit IRC15:01
*** circ-user-c3hdH has joined #openstack-cyborg15:01
*** adreznec has joined #openstack-cyborg15:01
*** ChanServ has joined #openstack-cyborg15:01
*** barjavel.freenode.net sets mode: +o ChanServ15:01
Li_Liuhello?15:02
*** zhipeng has joined #openstack-cyborg15:02
NokMikeRwelcome back15:02
Li_Liueveryone quit?15:02
NokMikeRnet split a glitch in the matrix.15:02
SundarAll traits will get mixed in the compute node. So, a flavor asking for resource:CUSTOM_ACCELERATOR=1 and trait:CUSTOM_GPU_<foo> will come to cyborg which needs to choose a GPU based on its Deployables15:02
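A minimal sketch of the selection Sundar describes: with all traits mixed on the compute node RP, Cyborg has to pick the right device from its own Deployables. The `resources:` and `trait:...=required` key spelling follows Nova flavor extra specs; the deployable dictionaries, the `free` flag, the concrete trait names and the function names are hypothetical and are not Cyborg's actual Deployable object or API.

```python
# Hypothetical sketch; deployable dicts and function names are illustrative only.
def parse_flavor(extra_specs):
    """Split flavor extra specs into resource amounts and required traits."""
    resources, traits = {}, set()
    for key, value in extra_specs.items():
        kind, _, name = key.partition(":")
        if kind == "resources":
            resources[name] = int(value)
        elif kind == "trait" and value == "required":
            traits.add(name)
    return resources, traits


def pick_deployables(deployables, extra_specs):
    """Choose free deployables that carry every required trait."""
    resources, traits = parse_flavor(extra_specs)
    wanted = resources.get("CUSTOM_ACCELERATOR", 0)
    matches = [d for d in deployables if d["free"] and traits <= d["traits"]]
    if len(matches) < wanted:
        raise LookupError("not enough matching accelerators on this host")
    return matches[:wanted]


# One FPGA and one GPU on the same host, both counted under the generic RC.
deployables = [
    {"name": "fpga0", "traits": {"CUSTOM_FPGA_FOO"}, "free": True},
    {"name": "gpu0", "traits": {"CUSTOM_GPU_FOO"}, "free": True},
]
extra_specs = {
    "resources:CUSTOM_ACCELERATOR": "1",
    "trait:CUSTOM_GPU_FOO": "required",
}
selected = pick_deployables(deployables, extra_specs)
```

In this sketch the generic resource class only says how many accelerators are wanted; the trait is what separates the GPU from the FPGA on the same host.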
SundarLi_Liu, I am still here :)15:02
*** circ-user-c3hdH has quit IRC15:03
shaohe_feng_Sundar: for example, we put FPGA traits and GPU traits on the host provider.15:03
*** edleafe has joined #openstack-cyborg15:03
*** mszwed has joined #openstack-cyborg15:03
zhipengokey this is a crazy night15:03
shaohe_feng_Sundar: and there is one GPU and one FPGA15:03
SundarSorry, i have another call at this time15:03
shaohe_feng_firstly we consume one FPGA15:04
SundarShaohe, can we continue another time?15:04
shaohe_feng_Sundar: OK.15:04
zhipengokey moving on15:04
zhipeng#topic critical rocky spec update15:04
*** openstack changes topic to "critical rocky spec update (Meeting topic: openstack-cyborg)"15:04
*** ttk2[m] has quit IRC15:05
zhipengxinran_ are you still around ?15:05
xinran_Yes I’m here15:05
zhipengcould you provide a brief update on the quota spec ?15:06
*** zhipengh[m] has quit IRC15:06
xinran_Ok.  Like we discussed with xiyuan, I think we should do the usage part first, so I want to separate the spec into two parts.15:09
xinran_What do you think15:09
zhipengmakes sense15:11
zhipenghave you already updated it as two parts ?15:11
xinran_Not yet, will update the usage part this week15:12
zhipengsounds great :)15:12
zhipeng#action xinran to update the quota spec into two parts and complete the usage one first15:13
xinran_For the limit part, i think it still needs more discussion with xiyuan15:13
zhipengno problem15:13
Li_LiuSo we will have 2 specs on this instead of 1, right?15:13
zhipengyep15:13
zhipengbut the limit one should be rather simple15:13
zhipengsince we will utilize a lot of things Keystone has already designed15:14
Li_LiuI see15:14
xinran_Yes I think so15:15
zhipengokey another thing is that the os-acc spec will need more time15:15
zhipengso I suggest we relax the deadline for proposal on that one to the June MS215:16
zhipengif by then we still could not get it landed, then I will block it for Rocky but it could land first thing in Stein :)15:16
Li_Liuzhipeng, let me know if you need some help on that one15:16
zhipengsounds reasonable15:16
zhipeng?15:16
zhipengLi_Liu sure :)15:17
Li_Liusince I think I have some thoughts on it15:17
zhipeng#agreed os-acc spec extended to MS2 for approval15:17
zhipengLi_Liu no problem, feel free to share it15:17
zhipengokey moving on15:18
zhipeng#topic open patches/bugs15:18
*** openstack changes topic to "open patches/bugs (Meeting topic: openstack-cyborg)"15:18
zhipengI will push up the fix for mutable-config this week15:18
zhipengmaybe combined with the fix lenovo folk has provided but blocked by me due to trivial fix15:20
zhipengany other issues on this topic ?15:21
zhipengokey then15:22
zhipeng#topic AoB15:22
*** openstack changes topic to "AoB (Meeting topic: openstack-cyborg)"15:22
zhipengany other business15:22
xinran_what is AoB......15:23
xinran_Ah i know!15:23
zhipengxinran_ bien :)15:24
xinran_;)15:25
zhipengokey if there are no other topics15:26
zhipenglet's conclude the meeting today15:26
zhipengthx for the great conversation :)15:26
zhipeng#endmeeting15:26
*** openstack changes topic to "OpenStack Cyborg Project Discussion"15:26
openstackMeeting ended Wed Apr 25 15:26:31 2018 UTC.  Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4)15:26
openstackMinutes:        http://eavesdrop.openstack.org/meetings/openstack_cyborg/2018/openstack_cyborg.2018-04-25-13.59.html 15:26
openstackMinutes (text): http://eavesdrop.openstack.org/meetings/openstack_cyborg/2018/openstack_cyborg.2018-04-25-13.59.txt 15:26
openstackLog:            http://eavesdrop.openstack.org/meetings/openstack_cyborg/2018/openstack_cyborg.2018-04-25-13.59.log.html 15:26
NokMikeRthanks all, bye15:26
*** NokMikeR has quit IRC15:27
*** zhipeng has quit IRC15:28
*** Chuck_ has quit IRC15:50
*** ttk2[m] has joined #openstack-cyborg16:12
*** zhipengh[m] has joined #openstack-cyborg16:22
*** xinran_ has quit IRC17:33
*** evin has joined #openstack-cyborg17:34
*** shaohe_feng_ has quit IRC17:36
*** Li_Liu has quit IRC18:37
*** Sundar has quit IRC18:57
*** evin has quit IRC20:19
*** openstackgerrit has quit IRC21:20
