Wednesday, 2017-06-07

*** crushil has joined #openstack-cyborg01:40
*** crushil has quit IRC02:26
*** crushil has joined #openstack-cyborg02:31
*** sekelso has quit IRC03:13
*** sekelso has joined #openstack-cyborg03:22
*** sekelso has quit IRC03:30
*** sekelso has joined #openstack-cyborg03:38
*** crushil has quit IRC03:38
*** crushil has joined #openstack-cyborg03:40
*** crushil has quit IRC03:40
*** sekelso has quit IRC04:16
*** joseppc has quit IRC07:16
*** jkilpatr has joined #openstack-cyborg11:10
*** joseppc has joined #openstack-cyborg11:49
*** mikeH has joined #openstack-cyborg12:06
*** sekelso has joined #openstack-cyborg13:20
*** skelso has joined #openstack-cyborg13:22
*** sekelso has quit IRC13:25
*** NokMikeR has joined #openstack-cyborg13:47
*** zhipeng_ has joined #openstack-cyborg13:47
*** crushil has joined #openstack-cyborg13:49
*** skelso has quit IRC13:52
*** skelso has joined #openstack-cyborg14:03
NokMikeRany meeting today?14:07
zhipeng_yes I just sent out the email to the openstack-dev14:09
zhipeng_weekly meeting as usual14:09
NokMikeRok thanks14:09
*** zhipeng_ has quit IRC14:18
*** zhipeng_ has joined #openstack-cyborg14:18
*** joseppc has quit IRC14:52
crushil\o14:59
jkilpatro/14:59
zhipeng_hey15:00
zhipeng_let's staaaart the longest irc meeting ever15:00
jkilpatrcan't be as bad as last week15:00
zhipeng_#startmeeting openstack-cyborg15:00
openstackMeeting started Wed Jun  7 15:00:56 2017 UTC and is due to finish in 60 minutes.  The chair is zhipeng_. Information about MeetBot at http://wiki.debian.org/MeetBot.15:00
openstackUseful Commands: #action #agreed #help #info #idea #link #topic #startvote.15:00
openstackThe meeting name has been set to 'openstack_cyborg'15:01
zhipeng_hahaha15:01
zhipeng_let's hope so15:01
zhipeng_okey so quick update from my side15:01
zhipeng_on the api/db patch15:01
zhipeng_#topic BP discussion15:01
zhipeng_#link https://review.openstack.org/#/c/445814/15:01
zhipeng_so ChrisD reviewed it and commented that there is an ongoing discussion on traits15:02
zhipeng_we might consider aligning our design with it15:02
zhipeng_originally, the placement resource provider was meant for just compute node15:02
jkilpatrI was looking over that, care to summarize?15:02
zhipeng_sure I'm putting my thoughts together now15:03
zhipeng_so now the placement team sees the pitfall in that15:03
zhipeng_since for example for shared storage (external arrays I would suppose)15:03
zhipeng_if you only count the storage side of things on the compute node15:04
zhipeng_your resource provider will never correctly reflect the required traits15:04
jkilpatrso this is an issue with accelerators that may be shared between many computes?15:04
zhipeng_the resource provider should reflect the shared storage arrays, rather than only local disks15:04
zhipeng_no, I think this is an issue for accelerators as a whole15:05
jkilpatrhow so?15:05
zhipeng_since if the resource provider only identifies with the compute node15:05
zhipeng_we could wind up with the same problem as we have now, since accelerator characteristics are bundled with the compute characteristics15:05
zhipeng_well we could have our own resource class for sure, but that does not solve the problem15:06
zhipeng_nova scheduler asks the placement api to provide all the necessary resources15:06
zhipeng_and for Cyborg, one of the important goals is that accelerators are treated as first-class citizens15:07
zhipeng_meaning that we should have individual resource providers for accelerators15:07
zhipeng_from the email link Chris provided, there is an etherpad documenting the "Plan B"15:08
jkilpatrok so the issue is that if we have a 'gpu' resource provider it's dependent on computes in a way that resource providers aren't supposed to be.15:09
zhipeng_which I liked very much, it works on extending the current nested resource provider definition to a more relaxed one with multiple resource providers15:09
zhipeng_yes exactly15:09
zhipeng_the scheduling decision would still largely depend on the regular compute features, since we are just part of the traits15:09
crushilinteresting15:09
zhipeng_so back to the "Plan B", the current nested resource provider model is designed primarily for stuff like NUMA nodes15:10
zhipeng_where you got this parent-child relationship15:10
crushilSo, how does that change our implementation?15:10
zhipeng_the Plan B extends the scope to be more general, meaning for Cyborg use cases15:10
zhipeng_we could have a resource provider for each and every accelerator15:11
zhipeng_(if they deemed important for the workload)15:11
zhipeng_crushil the change is that15:11
zhipeng_our DB design has to align with the proposed nested resource provider/trait design15:11
zhipeng_at least DB schemas15:12
zhipeng_so that when the cyborg agent populates our inventory to the placement api15:12
zhipeng_it could understand it correctly15:12
crushilOk, what about the other specs?15:13
zhipeng_not concerned that much :)15:14
crushilgotcha15:14
zhipeng_So I'm thinking we might need two DB schemas15:14
zhipeng_the current one in the spec patch, could be used for the discovery phase15:15
zhipeng_that is when the user starts the cyborg service and then the agent/driver does the discovery/pre-config15:15
zhipeng_collect what we have, on the host15:15
zhipeng_the second set of schema needs to be aligned with nested resource provider15:16
zhipeng_to interact with placement api and eventually nova-scheduler15:16
zhipeng_for the VM to select the correct accelerator resource15:16
jkilpatrso we need to maintain two parallel db's for each purpose or do you mean we want to change the format in a future release?15:17
zhipeng_what I'm thinking is that we don't have exhaustive knowledge on the hardware now15:18
zhipeng_therefore we keep a separate DB schema, the host side one should be more extendable or more abstract15:18
zhipeng_But on another thought15:19
zhipeng_it might be just too complex .....15:19
zhipeng_what do you guys think15:19
jkilpatrI think we should try and keep one db as much as possible, I don't want to try and maintain parallel sets of data15:19
zhipeng_that makes sense15:19
crushilI agree, having multiple DBs is just clunky15:20
zhipeng_in that case we will just use the resource provider schema,I will follow up with Chris to see which one I should use15:21
zhipeng_the current one or the proposed one15:21
jkilpatrsounds good.15:22
jkilpatrAnything else on that subject?15:22
zhipeng_nope15:22
zhipeng_anything else from you guys on the open spec ?15:23
crushilnope15:23
zhipeng_great15:23
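
The nested resource provider idea discussed above (each accelerator as its own provider under a compute node, carrying its own inventory and traits) can be sketched roughly as follows. This is only an in-memory illustration, not the placement API or the Cyborg DB schema: the `ResourceProvider` class, the `find_providers` helper, and the `CUSTOM_*` resource class and trait names are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Iterator, List, Optional, Set


@dataclass
class ResourceProvider:
    """Hypothetical in-memory stand-in for a placement resource provider."""
    name: str
    # resource class name -> total amount offered by this provider
    inventory: Dict[str, int] = field(default_factory=dict)
    traits: Set[str] = field(default_factory=set)
    # parent/children are left out of repr and comparison to avoid cycles
    parent: Optional["ResourceProvider"] = field(
        default=None, repr=False, compare=False)
    children: List["ResourceProvider"] = field(
        default_factory=list, repr=False, compare=False)

    def add_child(self, child: "ResourceProvider") -> "ResourceProvider":
        """Nest a provider (e.g. an accelerator) under this one."""
        child.parent = self
        self.children.append(child)
        return child

    def walk(self) -> Iterator["ResourceProvider"]:
        """Yield this provider and every provider nested below it."""
        yield self
        for child in self.children:
            yield from child.walk()


def find_providers(root: ResourceProvider, resource_class: str,
                   amount: int, required_traits: Iterable[str]) -> List[str]:
    """Return names of providers in the tree that can satisfy the request."""
    required = set(required_traits)
    return [
        rp.name
        for rp in root.walk()
        if rp.inventory.get(resource_class, 0) >= amount
        and required <= rp.traits
    ]


# A compute node with two accelerators modeled as separate child providers.
compute = ResourceProvider("compute-1", {"VCPU": 16, "MEMORY_MB": 65536})
compute.add_child(
    ResourceProvider("compute-1-gpu-0", {"CUSTOM_GPU": 1}, {"CUSTOM_VENDOR_A"}))
compute.add_child(
    ResourceProvider("compute-1-fpga-0", {"CUSTOM_FPGA": 1}, {"CUSTOM_VENDOR_B"}))

matches = find_providers(compute, "CUSTOM_GPU", 1, ["CUSTOM_VENDOR_A"])
print(matches)
```

Because each accelerator has its own inventory and traits here, a request for an accelerator is matched against the accelerator provider rather than against traits bundled into the compute node.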
zhipeng_#topic initial code development15:23
zhipeng_so, any roadblocks15:24
jkilpatrbeen trying to understand oslo rpc and message passing and start structuring the conductor/agent15:24
zhipeng_sounds like a great start :)15:24
crushilI have created stubs and I will push them up by the end of the week15:25
zhipeng_great !15:25
jkilpatrcrushil, sounds good.15:25
zhipeng_let's do small pieces like Justin suggested15:26
crushilI will fill them out rebased on top of the API and agent patches15:26
jkilpatrso a lot of what we will be doing involves rpc between different components, so people with integrating parts need to talk to each other about interfaces15:26
jkilpatrI don't think we should be too worried about a stable internal interface15:26
zhipeng_yes I agree15:26
zhipeng_oslo.messaging could provide everything we need15:27
jkilpatrwell sometimes we need rpc for example the driver should be called by the agent over rpc I'm thinking (we could invoke directly but I'm not sure if I want to do that)15:27
zhipeng_i think it should be done over rpc15:29
zhipeng_unless, we gave driver restful apis ?15:29
jkilpatrI don't think that's the right application here. Our internal code needs to be more tightly integrated than restfulness allows.15:30
zhipeng_yep15:30
*** rushil has joined #openstack-cyborg15:30
zhipeng_so rpc should be fine here15:31
zhipeng_i think at the moment, it is agent talking to the generic driver15:31
zhipeng_later on, we should design something like the neutron ml2 driver interface15:31
zhipeng_that every driver, vendor or not, implements the interface which rpc calls will go through15:32
zhipeng_ in a rather standard way15:32
rushilOk. So, are we going to follow the neutron model vs the nova/cinder model?15:33
zhipeng_i think more like the neutron model15:33
zhipeng_for out-of-tree drivers15:33
rushilBut isn't that too complicated15:34
zhipeng_cinder and nova are mostly in-tree maintained drivers15:34
zhipeng_it won't be too complicated for us i think15:34
zhipeng_neutron is complicated because they have to define the type drivers and mechanism drivers15:34
rushilWell, cinder has out of tree drivers based on whether you have CI or not15:34
zhipeng_I think in-tree drivers also require the CI15:35
zhipeng_otherwise the cinder team removes your driver15:35
rushilNo, they just make it unsupported i.e. move it out of tree15:35
zhipeng_for us, as long as it is PCIe communicated devices, the driver interface won't be too complicated15:35
zhipeng_but if we need to support extra protocols, that is where things will get wild15:36
zhipeng_rushil ah okey15:36
rushilOk. I just want to make sure we don't make things more complicated than they should be15:36
zhipeng_yes that is always our goal15:36
jkilpatrI can agree on a standard rpc interface but that's less complicated than I think you are making it out to be.15:36
zhipeng_we even wanted to skip the conductor :P15:36
jkilpatrand I nearly got away with it too!15:37
zhipeng_jkilpatr haha15:37
rushilLol15:37
zhipeng_rushil the cyborg ml2 driver would be modeled from your generic driver implementation :P15:39
rushilI wouldn't call it ml2 driver though15:40
zhipeng_of course we will have another name for it15:40
zhipeng_aluminum drivers :P15:41
zhipeng_for cyborg robots15:41
rushilHehe15:41
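
The driver interface idea from the discussion above (every driver, vendor or not, implements one interface that the agent's RPC calls go through) might look something like the sketch below. It is an assumption-laden illustration only: `AcceleratorDriver`, `FakeGenericDriver`, `Agent`, and the message layout are hypothetical names, and the dispatch is done in-process with plain dicts rather than through oslo.messaging.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List


class AcceleratorDriver(ABC):
    """Hypothetical common interface every driver would implement."""

    @abstractmethod
    def discover(self) -> List[Dict[str, Any]]:
        """Return a description of each accelerator found on the host."""

    @abstractmethod
    def attach(self, device_id: str, instance_id: str) -> bool:
        """Attach one accelerator to an instance; return True on success."""


class FakeGenericDriver(AcceleratorDriver):
    """Stand-in for a generic driver; reports one made-up device."""

    def __init__(self) -> None:
        self._attached: Dict[str, str] = {}

    def discover(self) -> List[Dict[str, Any]]:
        return [{"id": "acc-0", "type": "generic"}]

    def attach(self, device_id: str, instance_id: str) -> bool:
        self._attached[device_id] = instance_id
        return True


class Agent:
    """Dispatches RPC-style messages to a registered driver by name."""

    # Only methods that are part of the driver interface may be invoked.
    ALLOWED = {"discover", "attach"}

    def __init__(self) -> None:
        self._drivers: Dict[str, AcceleratorDriver] = {}

    def register(self, name: str, driver: AcceleratorDriver) -> None:
        if not isinstance(driver, AcceleratorDriver):
            raise TypeError("driver must implement AcceleratorDriver")
        self._drivers[name] = driver

    def handle(self, message: Dict[str, Any]) -> Any:
        """Look up the target driver and call the requested method."""
        method = message["method"]
        if method not in self.ALLOWED:
            raise ValueError(f"unsupported method: {method}")
        driver = self._drivers[message["driver"]]
        return getattr(driver, method)(**message.get("args", {}))


agent = Agent()
agent.register("generic", FakeGenericDriver())

devices = agent.handle({"driver": "generic", "method": "discover"})
ok = agent.handle({
    "driver": "generic",
    "method": "attach",
    "args": {"device_id": "acc-0", "instance_id": "vm-1"},
})
print(devices, ok)
```

Keeping the agent unaware of any driver-specific code is the point of the design: a vendor driver only has to subclass the interface and be registered under a name.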
jkilpatrAnyways I'll try have a stub up this week (conductor) and then agent next week.15:42
jkilpatrdepends on how other tasks go for me.15:42
rushiljkilpatr: Cool15:43
zhipeng_sounds great, i got another colleague working on cyborg this week, so api code will be developed in parallel15:43
rushilAwesome15:44
zhipeng_hopefully once we have settled the spec, the initial code will come out15:44
zhipeng_and we could iterate over it15:44
zhipeng_#topic AoB15:44
zhipeng_any other topics15:44
rushilBtw our group at Lenovo sent out initial emails to vendors to get their drivers aligned with cyborg15:44
zhipeng_wow15:45
zhipeng_that is awesome15:45
rushilI'll keep you guys posted on that15:45
zhipeng_could you disclose the vendor names for now ?15:45
zhipeng_or should we wait until later15:45
rushilThe usual suspects15:46
zhipeng_e.g ?15:46
rushilNvidia, AMD15:46
rushilAnd smaller ones like Micron15:47
zhipeng_cool !15:47
rushilI'll let y'all know when they are committed to contributing code15:47
zhipeng_great :)15:47
zhipeng_okey if there are no other topics, we go to the usual long slumber ~~15:50
zhipeng_will try to remember to close the meeting an hour later15:50
crushilCool, thanks zhipeng_15:51
*** NokMikeR has quit IRC15:51
*** rushil has quit IRC15:53
*** jkilpatr has left #openstack-cyborg15:56
*** skelso has quit IRC16:12
*** skelso has joined #openstack-cyborg16:13
*** joseppc has joined #openstack-cyborg16:13
*** skelso has quit IRC16:30
*** skelso has joined #openstack-cyborg16:30
*** jkilpatr has joined #openstack-cyborg16:40
*** zhipeng_ has quit IRC16:59
*** zhipeng_ has joined #openstack-cyborg17:00
zhipeng_#endmeeting17:00
openstackMeeting ended Wed Jun  7 17:00:56 2017 UTC.  Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4)17:00
openstackMinutes:        http://eavesdrop.openstack.org/meetings/openstack_cyborg/2017/openstack_cyborg.2017-06-07-15.00.html17:00
openstackMinutes (text): http://eavesdrop.openstack.org/meetings/openstack_cyborg/2017/openstack_cyborg.2017-06-07-15.00.txt17:01
openstackLog:            http://eavesdrop.openstack.org/meetings/openstack_cyborg/2017/openstack_cyborg.2017-06-07-15.00.log.html17:01
*** zhipeng_ has quit IRC17:01
*** skelso has quit IRC19:02
*** skelso has joined #openstack-cyborg19:02
*** skelso has quit IRC19:11
-openstackstatus- NOTICE: The Gerrit service on review.openstack.org is being restarted now to clear some excessive connection counts while we debug the intermittent request failures reported over the past few minutes20:05
*** skelso has joined #openstack-cyborg20:13
*** skelso has quit IRC20:24
*** skelso has joined #openstack-cyborg20:25
*** mikeH has quit IRC21:15
*** crushil has quit IRC21:31
*** crushil has joined #openstack-cyborg21:31
*** crushil has quit IRC21:36
*** jkilpatr has quit IRC22:34
*** skelso has quit IRC22:40
*** openstack has joined #openstack-cyborg23:13
*** mikeH has joined #openstack-cyborg23:37
*** skelso has quit IRC23:37
*** mikeH has quit IRC23:55
