Wednesday, 2017-05-17

00:01 *** crushil has quit IRC
00:02 *** crushil has joined #openstack-cyborg
00:32 *** crushil has quit IRC
00:41 *** crushil has joined #openstack-cyborg
00:50 *** crushil has quit IRC
01:38 *** skelso has joined #openstack-cyborg
01:40 *** sekelso has joined #openstack-cyborg
01:43 *** skelso has quit IRC
02:17 *** sekelso has quit IRC
02:18 *** sekelso has joined #openstack-cyborg
02:20 *** skelso has joined #openstack-cyborg
02:20 *** skelso has quit IRC
02:21 *** skelso has joined #openstack-cyborg
02:22 *** sekelso has quit IRC
02:55 *** crushil has joined #openstack-cyborg
03:14 *** skelso has quit IRC
03:31 *** crushil has quit IRC
10:38 *** jkilpatr has quit IRC
10:53 -openstackstatus- NOTICE: gerrit is being restarted to help stuck git replication issues
11:00 *** jkilpatr has joined #openstack-cyborg
11:30 *** mikeH has joined #openstack-cyborg
12:53 *** skelso has joined #openstack-cyborg
13:12 *** kriskend_ has joined #openstack-cyborg
13:59 *** crushil has joined #openstack-cyborg
14:05 *** skelso has quit IRC
14:52 *** cdent has joined #openstack-cyborg
14:56 <jkilpatr> morning everyone
14:57 <crushil> Morning jkilpatr
15:00 <crushil> \o
15:00 <crushil> Meeting?
15:02 *** zhipeng has joined #openstack-cyborg
15:03 <jkilpatr> meeting, you know, zhipeng
15:03 <jkilpatr> his internet probably decided not to wake up this morning.
15:03 <zhipeng> #startmeeting openstack-cyborg
15:03 <openstack> Meeting started Wed May 17 15:03:31 2017 UTC and is due to finish in 60 minutes.  The chair is zhipeng. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:03 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:03 <openstack> The meeting name has been set to 'openstack_cyborg'
15:03 <jkilpatr> ah, speak of the devil.
15:03 <zhipeng> just got connected
15:03 <zhipeng> :P
15:03 <jkilpatr> ok then, I had a list of topics to go over
15:04 <jkilpatr> unless someone wants to bring other things up before I steamroll ahead?
15:04 * cdent is here to lurk
15:04 <zhipeng> #topic BP discussion
15:04 <zhipeng> jkilpatr, go ahead
15:05 <jkilpatr> zhipeng, your outlined Cyborg API doesn't actually have an attach API call. What did you plan to do, catch some property of a booting instance?
15:07 <zhipeng> as far as I could tell, I did include the attach API?
15:07 <jkilpatr> nova *can* do PCI hotplug, supposedly, which would make live attachment *possible*, but I have no grasp on the gap between "it should work" and "it actually will"
15:07 * jkilpatr is checking the commit again, please hold
15:07 * jkilpatr hold music ends
15:08 <jkilpatr> so we have GET, POST, PUT, DELETE, but they all seem to be for managing the database of accelerators. You can PUT to update an accelerator spec, but what if I have an instance I need to attach it to?
15:09 <zhipeng> okay, for VM instances I still think we would have to look to Nova for the actual attachment operation
15:09 <zhipeng> actually Jay and I discussed this today
15:10 <zhipeng> unless we have identified a set of properties for the accelerator connection to the host
15:10 <zhipeng> meaning that we have an os-brick-like library
15:11 <zhipeng> unless we have that, we could just assume it is a regular PCIe attachment
15:12 <zhipeng> I think for most of the use cases we have at the moment, PCIe would be the most common case, and Nova currently supports that
15:13 <jkilpatr> I think we need to come to a workflow conclusion sooner rather than later. How does the user ask for a VM with a specific accelerator? Do they post to us and then we talk to Nova, do they post to Nova and we watch for some tag, etc.
15:13 <jkilpatr> should there be like a 'user workflow' spec? I guess it goes into the API
15:14 <zhipeng> workflow is kind of an end-to-end thing
15:14 <zhipeng> I personally would suggest that for the moment we take the Cinder approach, and expand later based upon this
15:15 <jkilpatr> that meaning live attachment?
15:15 <zhipeng> because this might be the only way that we don't create a major impact on the existing implementations
15:15 <zhipeng> not necessarily live attachment
15:15 <zhipeng> just attach/detach ops in general
15:16 <zhipeng> for VM instances
15:16 <zhipeng> since you will need the instance id and host id anyway
15:16 <zhipeng> and Nova has all of these
15:16 <jkilpatr> ok then, we have a booted VM we need to do the attachment for; we can set up Nova to do the passthrough and then reboot the instance. Not sure how Nova feels about it, but if it doesn't work I don't think they would be opposed to us making that work
15:16 <jkilpatr> but also Nova does support live PCI attachment, so we can/should just use that
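[A Cinder-style attach/detach, as discussed above, might look roughly like the following sketch. The endpoint shape and every field name here are illustrative assumptions, not taken from the API spec patch under review; the only grounded parts are that an attach needs the instance id and host id, which zhipeng notes Nova already has.]

```python
def attach_request(accelerator_uuid, instance_id, host_id):
    """Body for a hypothetical POST /v1/accelerators/{uuid}/attachments,
    modeled on Cinder's volume attach. All field names are assumptions."""
    return {
        "accelerator_uuid": accelerator_uuid,
        "instance_id": instance_id,    # Nova instance to attach to
        "host_id": host_id,            # compute node hosting the device
        "attach_type": "pci-passthrough",
    }

def detach_request(accelerator_uuid, instance_id):
    """Body for the matching detach call."""
    return {"accelerator_uuid": accelerator_uuid,
            "instance_id": instance_id}
```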
15:17 <zhipeng> yes, if Nova indeed supports that
15:17 <jkilpatr> https://wiki.openstack.org/wiki/Nova/pci_hotplug
15:17 <jkilpatr> this is a stub right now, so some support
15:17 <zhipeng> we just don't differentiate in Cyborg
15:18 <jkilpatr> the driver actually handles this of course; we just run attach on a live instance that was spawned with a flavor or some other indicator that says "cyborg: TeslaP100"
15:18 <zhipeng> yes
15:19 <jkilpatr> because we have to make sure Nova gets it to the right spot first, so we need to feed the placement API live knowledge, and then the instance needs to call the resource name we fed the placement API
15:19 <jkilpatr> maybe we can have a command in cyborg that will help you make flavors, but that's for later.
15:19 <zhipeng> hotplug could be the trait when we use the placement API
15:20 <zhipeng> so when we schedule, it knows it needs a compute node with the hotplug feature
15:20 <jkilpatr> wouldn't we want a bunch of traits
15:20 <zhipeng> yep I guess so :)
15:21 <jkilpatr> like gpu, with cuda support, with hotplug support... so on and so forth
15:21 <zhipeng> we could start with really basic and simple ones
15:21 <jkilpatr> this is why we need a flavor creation wizard in cyborg, but once again, later.
15:21 <zhipeng> yes
15:21 <jkilpatr> ok, anyone have comments on these ideas? things that they might want that this won't cover?
15:22 <jkilpatr> zhipeng, I'm going to put a comment on your API patch as a reminder to add Cinder-like attach/detach to the spec, sound good?
15:23 <zhipeng> which I thought was already done in the current patch?
15:23 <cdent> it would be great to see, at some point, a narration of the expected end-to-end flow from a user's standpoint, if it doesn't already exist. Including how various services will be touched.
15:24 <zhipeng> cdent, we have a flow chart in our BOS presentation, but it's rather rudimentary at the moment
15:24 <jkilpatr> cdent, we have a decent idea of how we want it to work, but I expect some things will change as we get into the nitty gritty of placement problems
15:24 <cdent> sure, change is the nature of this stuff :)
15:24 <jkilpatr> zhipeng, ok, so if I want to attach an accelerator to an instance, what do I do? Do I PUT to update an accelerator spec with a new instance ID to attach to?
15:24 <zhipeng> :)
15:25 <jkilpatr> cdent, I think I'll put up a user workflow spec later today, just so that we keep track of all of this better.
15:25 <cdent> \o/
15:25 <cdent> can you add me as a reviewer on that when it is up, so I get some email to remind me to look?
15:26 <zhipeng> just as we drew for our presentation: the user uses the Cyborg service to complete the discovery phase, and Cyborg finishes interacting with placement to advertise the accelerator inventory
15:26 <jkilpatr> will do, what's your email?
15:26 <cdent> cdent@anticdent.org is me on gerrit
15:27 <zhipeng> then the user just requests to create an instance on a compute node with the corresponding accelerator trait
15:27 <zhipeng> if the traits include hotplug, then maybe it will be a live attachment
15:27 <jkilpatr> I really don't think we're going to get away with one trait per accelerator; users will probably bundle them into flavors, but instead of being tied to a list of whitelisted PCI devices these flavors can be much more general.
15:27 <zhipeng> which means the user could attach the accelerator after VM creation
15:28 <zhipeng> I was told by Jay today that traits are per resource provider
15:29 <zhipeng> so it would mostly be one trait per compute node
15:29 <jkilpatr> I'll have to look at it in detail, I was watching the summit presentation again today.
15:29 <zhipeng> or, if we have vGPUs or FPGA virtual functions
15:29 <zhipeng> then it would be a nested resource provider
15:29 <zhipeng> and we could have traits on the virtual functions
15:29 <zhipeng> but anyway, it does not tie to a specific accelerator
15:30 <zhipeng> it just depends on how you model your accelerators into resource providers
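[The "traits are per resource provider" point above can be sketched in a few lines of plain Python: a provider exposes a set of traits, and scheduling selects providers offering every required trait. The `CUSTOM_*` names follow placement's custom-trait naming convention but are placeholders, and real trait matching happens inside the placement service, not client-side like this.]

```python
def providers_with_traits(providers, required):
    """Return names of resource providers exposing all required traits."""
    return [name for name, traits in providers.items()
            if set(required) <= set(traits)]

# Two compute-node providers; only one advertises hotplug support.
providers = {
    "compute-1": {"CUSTOM_GPU", "CUSTOM_CUDA"},
    "compute-2": {"CUSTOM_GPU", "CUSTOM_CUDA", "CUSTOM_HOTPLUG"},
}

providers_with_traits(providers, ["CUSTOM_GPU", "CUSTOM_HOTPLUG"])
# → ["compute-2"]
```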
15:30 <zhipeng> cdent, please correct me if I'm wrong :P
15:30 <jkilpatr> we're going to need to be careful with that.
15:31 <zhipeng> #link https://pbs.twimg.com/media/DAAUxEWUAAAV6zA.jpg
15:31 <cdent> zhipeng: that looks mostly correct, but I'm only partially paying attention :(
15:31 <zhipeng> cdent, no problemo :)
15:31 <zhipeng> as long as I don't make any extremely wrong claims :)
15:32 <jkilpatr> so, implementation. What can we start and when?
15:32 <zhipeng> as soon as we freeze the specs
15:32 <zhipeng> I suppose we should all go ahead and start coding
15:32 <crushil> Can we focus on closing out the specs first though?
15:32 <zhipeng> yes
15:32 <jkilpatr> ok then, that's a plan.
15:33 <zhipeng> okay, then for the API spec
15:33 <zhipeng> #link https://review.openstack.org/445814
15:33 <zhipeng> any other questions?
15:34 <jkilpatr> I just posted a comment there, otherwise I'm happy enough
15:34 <zhipeng> okay
15:34 <jkilpatr> um, should we pick a database tech? what's available already: sql, mongo... redis (not sure about that one)
15:35 <crushil> MariaDB?
15:35 <zhipeng> #action jkilpatr to post a reminder comment, the api spec patch is LGTM
15:35 <zhipeng> I think we could just use mysql
15:35 <jkilpatr> MariaDB == mysql, except when it doesn't
15:36 <jkilpatr> I think openstack ships with Maria right now
15:36 <zhipeng> yes
15:36 <crushil> yup
15:36 <zhipeng> next up, agent spec
15:36 <zhipeng> #link https://review.openstack.org/#/c/446091/
15:37 <jkilpatr> looks like most people are happy with it.
15:37 <zhipeng> #info Jay Pipes suggests the agent could directly interact with the placement API, instead of going through the conductor
15:37 <zhipeng> #link https://pbs.twimg.com/media/DAAUtyoUMAAXQXI.jpg
15:37 <jkilpatr> so from the summit preso, all the computes already talk to the placement API themselves
15:37 <zhipeng> but I guess we don't need to reflect that in the agent spec
15:37 <jkilpatr> so it's designed to scale well like that
15:38 <jkilpatr> I'd prefer to be explicit, I'll patch it into my spec today
15:38 <crushil> We should reflect that in the agent spec
15:39 <zhipeng> jkilpatr, yes, and for the implementation, Jay suggests we could just directly copy nova/scheduler/client/report.py
15:39 <zhipeng> since it is basically REST calls between the agent and the placement API
15:39 <jkilpatr> that's the sort of laziness I can get behind.
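[What a report.py-style client would send is a plain REST payload. The sketch below builds the body of a placement `PUT /resource_providers/{uuid}/inventories` call in the shape nova's report client uses; the `CUSTOM_FPGA` resource class is an illustrative placeholder, and no HTTP is performed here.]

```python
def inventory_payload(generation, resource_class, total,
                      reserved=0, min_unit=1, max_unit=1,
                      step_size=1, allocation_ratio=1.0):
    """Body for PUT /resource_providers/{uuid}/inventories.
    `generation` is the provider generation placement uses to detect
    concurrent updates; the inner keys mirror placement's inventory schema."""
    return {
        "resource_provider_generation": generation,
        "inventories": {
            resource_class: {
                "total": total,
                "reserved": reserved,
                "min_unit": min_unit,
                "max_unit": max_unit,
                "step_size": step_size,
                "allocation_ratio": allocation_ratio,
            }
        },
    }
```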
15:39 <zhipeng> XD
15:39 <crushil> lol
15:40 <zhipeng> #agreed jkilpatr to do a quick update on the agent spec to reflect jaypipes' comment, then the agent spec patch is LGTM
15:40 <zhipeng> okay, next up, generic driver
15:40 <zhipeng> #link https://review.openstack.org/#/c/447257/
15:41 <zhipeng> any more comments
15:41 <zhipeng> looks fine to me
15:41 <crushil> jkilpatr, any other comments on your end? I have tried to address all of your and Roman's comments in the patch
15:41 <jkilpatr> what about detecting accelerators? discovery has to be handled by someone, do we want drivers to have a discovery call?
15:42 <jkilpatr> I like the rest of the API list for it, good job
15:43 <crushil> I can add that to the list. What would the flow be for discovery, though?
15:43 <zhipeng> I think discovery is already part of the spec?
15:43 <zhipeng> see line 121
15:43 <jkilpatr> ah yup, just not in the other list
15:44 <crushil> It's not part of the API list
15:44 <jkilpatr> crushil, the flow (which I think you should add into your spec, or maybe me into the agent spec)
15:44 <jkilpatr> is: the agent on first startup says "hey, I've never been started up before, let's call discover for all my drivers"
15:44 <jkilpatr> whatever returns true it lists and sends to the conductor to store in the db as possible accelerators
15:45 <jkilpatr> later on, operators can call discover to do this again and add new accelerators.
15:45 <jkilpatr> as a note, I think accelerators should get added in a "not ready" state, with the operator having to tell cyborg to go install drivers; otherwise we risk bad endings installing software on live clouds
15:45 <jkilpatr> more things to add to the agent spec
15:46 <zhipeng> agree
15:46 <crushil> +1
15:46 <crushil> Makes sense, but should we add it to the driver spec or the agent spec, or both?
15:47 <zhipeng> I think both, because discovery is directly triggered by the agent looping over the drivers, right?
15:47 <jkilpatr> crushil, the driver spec just needs "on discovery, return whether the accelerator exists or not"; the agent is the one that will call discovery, then wait for the operator to call the API to move the accelerator into 'ready' before calling the install-driver function.
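[The discovery flow jkilpatr describes can be sketched as: drivers expose a `discover` call, the agent records what it finds as "not ready", and an operator-triggered call installs the driver and flips the state to "ready". All class and method names below are assumptions for illustration; neither spec had fixed an interface at this point.]

```python
class FakeDriver:
    """Stand-in for a Cyborg vendor driver (names are assumed)."""
    def __init__(self, found):
        self._found = found

    def discover(self):
        return list(self._found)       # accelerators present on the host

    def install_driver(self, accel):
        accel["driver_installed"] = True

def agent_first_boot(drivers):
    """First startup: call discover on every driver, store results as
    'not ready' so nothing gets installed on a live cloud automatically."""
    db = {}
    for drv in drivers:
        for name in drv.discover():
            db[name] = {"state": "not ready", "driver": drv,
                        "driver_installed": False}
    return db

def operator_make_ready(db, name):
    """Operator-triggered transition: install the driver, then mark ready."""
    accel = db[name]
    accel["driver"].install_driver(accel)
    accel["state"] = "ready"
```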
15:48 <zhipeng> yep
15:48 <jkilpatr> um, speaking of message passing
15:48 <jkilpatr> most of this should be done over message passing
15:48 <jkilpatr> rabbitmq/oslo.messaging fine?
15:48 <crushil> Yup, that is the OpenStack standard
15:48 <zhipeng> yep
15:50 <zhipeng> #agreed crushil to update the driver spec to include the discovery interface, and jkilpatr to update the agent spec to reflect the related operations; otherwise it is LGTM
15:50 <zhipeng> moving along, next up, interaction https://review.openstack.org/#/c/448228/
15:50 <zhipeng> #link https://review.openstack.org/#/c/448228/
15:51 <zhipeng> I think we still need more work on this
15:51 <zhipeng> first of all, thanks to gryf for working on this on his own time
15:51 <ttk2[m]> I think this is where most of the workflow stuff is hidden right now.
15:52 <ttk2[m]> Oh, this is jkilpatr, moved to my phone.
15:53 <zhipeng> yes
15:53 <zhipeng> we should continue to work on the spec, but I don't think it will block our implementation
15:54 <zhipeng> any thoughts?
15:56 <zhipeng> and ttk2[m], I think you could just work with Roman on this patch to illustrate the workflow
15:56 <zhipeng> and also have cdent review it
15:56 <crushil> Yeah, makes sense. But we need to have a cutoff date to finish the spec
15:57 <zhipeng> we slipped the Apr 15th one rather quickly lol, but yeah, I agree we need another cutoff date
15:57 <zhipeng> what is the m2 deadline for Pike?
15:58 <crushil> June 9
15:59 <zhipeng> I think we could just use that for all the non-LGTM specs per today's meeting
15:59 <ttk2[m]> Ok then. Can we comment on the specs with that deadline?
15:59 <crushil> We should close out all the other specs sooner though
15:59 <ttk2[m]> I feel like we should make a point of moving info out of meetings and into specs so we don't lose it in the black hole of IRC logs.
15:59 <crushil> +1
16:00 <zhipeng> +1
16:01 <zhipeng> at least for all the LGTM specs, I will merge those by the end of this week
16:01 <zhipeng> #agreed set June 9th as a hard cut-off date for all the remaining specs, including cyborg-nova interaction
16:01 <zhipeng> next up, conductor spec
16:02 <zhipeng> #link https://review.openstack.org/#/c/463316/
16:02 <zhipeng> I think I will post some review comments, mostly on the wording
16:02 <zhipeng> but this should be a simple one for us to freeze this week
16:03 <ttk2[m]> Agreed. It's pretty much just glue code.
16:04 <zhipeng> #agreed after some polishing, conductor spec LGTM this week
16:04 <zhipeng> the last one in the queue, not a spec patch though
16:04 <zhipeng> #link https://review.openstack.org/#/c/461220/
16:04 <zhipeng> could folks just give it a +1 so that I could merge it? it is mostly housekeeping stuff
16:07 <gryf> I have mixed feelings about that
16:08 * gryf just joined
16:08 <zhipeng> gryf, which topic?
16:08 <gryf> nacsa.tgz in a repo
16:08 <gryf> it doesn't sound right
16:09 <zhipeng> we just hosted it in the sandbox
16:09 <zhipeng> we could even move it out to an individual repo later on
16:09 <gryf> well, yeah
16:09 <zhipeng> but we did have extensive discussion on that matter
16:09 <zhipeng> with moshe and his team
16:09 <gryf> but it will affect the size of the repository
16:10 <zhipeng> then I think maybe we could move the sandbox out to an individual repo, such as cyborg-sandbox
16:11 <zhipeng> so that it won't affect the cyborg project repo itself
16:11 <gryf> yes, I think that's the better solution
16:11 <gryf> also
16:11 <gryf> I'd like to avoid keeping binary blobs in the repository
16:12 <ttk2[m]> Agreed.
16:12 <zhipeng> that's fine with me :)
16:12 <zhipeng> but we do need to merge it first, and then move it out
16:12 *** cdent has left #openstack-cyborg
16:12 <zhipeng> due process
16:12 <gryf> so the perfect solution would be to unpack it, and make a commit which moves the entire work into its own directory. what do you think?
16:12 <zhipeng> nah, that won't be necessary
16:13 <zhipeng> I think we just move it to another repo, just for the records
16:13 <zhipeng> we won't do any releases, for example, for the cyborg-sandbox
16:13 <ttk2[m]> Um, if we merge it, it's in the repo history forever.
16:13 <zhipeng> it just sits there
16:13 <zhipeng> no, we could move it out
16:13 <zhipeng> and we need to move out the specs later as well
16:14 <zhipeng> cyborg-spec will be the standalone repo to store all the specs
16:14 <ttk2[m]> I don't have super strong feelings. But I'd like to keep binaries out of the repo
16:14 <gryf> ttk2[m], +1
16:14 <zhipeng> I have no problem either
16:14 <zhipeng> but let's just follow a procedure and get it done
16:16 <zhipeng> sounds reasonable to everyone?
16:18 <gryf> zhipeng, what exactly do you mean by following procedure?
16:19 <zhipeng> have it first in the current cyborg repo, and then move it out to a separate one
16:19 <gryf> I'm against it. as ttk2[m] said - if we merge it, it stays forever.
16:19 <zhipeng> why??
16:19 <gryf> it's git :>
16:19 <ttk2[m]> Because history
16:20 <zhipeng> you're saying we couldn't even move the specs out?
16:20 <adreznec> merging it will permanently increase the repo size because the artifact will remain in the history forever
16:21 <zhipeng> okay, understood
16:21 <gryf> zhipeng, we can, but they will be available if someone would like to go back in time (in history), and nothing prevents him from doing so :D
16:21 <zhipeng> then I will abandon the patch and directly submit it to the separate repo instead
16:21 <zhipeng> does this sound reasonable?
16:21 <gryf> unless we do some rebase stuff on the repo itself, but I'm not aware if that is good practice
16:21 <gryf> yup
16:22 <zhipeng> #agreed abandon the nacsa sandbox patch and directly submit it to a separate repo
16:22 <adreznec> gryf: yeah, you basically have to use a rebase or git filter-branch to remove it, but that'll break everyone's checked-out repos since you're rewriting history... so not typically good practice
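[adreznec's point can be seen concretely in a throwaway repo: a blob deleted in a later commit still lives in history, and only a history rewrite such as `git filter-branch` actually drops it. The file name `nacsa.tgz` comes from the discussion; everything else below is a local demo, not a recommended procedure for a shared repo.]

```shell
export FILTER_BRANCH_SQUELCH_WARNING=1
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email demo@example.com
git config user.name demo
head -c 1024 /dev/zero > nacsa.tgz          # stand-in for the binary blob
git add nacsa.tgz
git commit -qm 'add sandbox tarball'
git rm -q nacsa.tgz
git commit -qm 'move tarball out of the repo'
# gone from the worktree, but still reachable through history:
git rev-list --objects --all | grep -q nacsa.tgz && echo "still in history"
# rewriting history is what actually drops it -- and what breaks clones:
git filter-branch -f --index-filter \
  'git rm --cached -q --ignore-unmatch nacsa.tgz' HEAD >/dev/null 2>&1
git for-each-ref --format='%(refname)' refs/original |
  while read -r ref; do git update-ref -d "$ref"; done
git reflog expire --expire=now --all
git gc -q --prune=now
git rev-list --objects --all | grep -q nacsa.tgz || echo "blob purged"
```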
16:23 <zhipeng> okay, we got many things settled :)
16:23 <zhipeng> #topic CI discussion
16:23 <zhipeng> as I understand it, ttk2[m] and gryf have had some discussion on the CI setup
16:23 <zhipeng> do we have any preference now?
16:23 <gryf> adreznec, yeah.
16:24 <gryf> zhipeng, it was mostly a very high-level discussion
16:25 <gryf> we have to have some concrete implementation first
16:25 <zhipeng> well then, on a high level, any directions that we want to follow up on :)
16:25 <zhipeng> okay
16:25 <zhipeng> but having vendors provide a third-party CI env would always be a good idea
16:25 <zhipeng> baremetal or vm, is that correct?
16:25 <gryf> we can figure that out later
16:26 <zhipeng> sure
16:26 <zhipeng> #topic AoB
16:26 <zhipeng> any other topics?
16:26 <ttk2[m]> Keep up the good work guys.
16:27 <zhipeng> that would be a good note for our meeting to end on :)
16:28 <zhipeng> ok, thanks guys, let's end the meeting for today
16:28 <zhipeng> #endmeeting
16:28 <openstack> Meeting ended Wed May 17 16:28:25 2017 UTC.  Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4)
16:28 <openstack> Minutes:        http://eavesdrop.openstack.org/meetings/openstack_cyborg/2017/openstack_cyborg.2017-05-17-15.03.html
16:28 <openstack> Minutes (text): http://eavesdrop.openstack.org/meetings/openstack_cyborg/2017/openstack_cyborg.2017-05-17-15.03.txt
16:28 <openstack> Log:            http://eavesdrop.openstack.org/meetings/openstack_cyborg/2017/openstack_cyborg.2017-05-17-15.03.log.html
16:30 *** zhipeng has quit IRC
16:37 *** mikeH has quit IRC
17:37 *** crushil has quit IRC
17:44 *** crushil has joined #openstack-cyborg
18:01 *** davidjzhao has joined #openstack-cyborg
18:05 *** davidjzhao has quit IRC
18:27 *** crushil has quit IRC
18:50 *** jkilpatr has quit IRC
18:54 *** jkilpatr has joined #openstack-cyborg
19:03 *** crushil has joined #openstack-cyborg
19:49 *** crushil has quit IRC
20:04 *** jkilpatr has quit IRC
20:06 *** crushil has joined #openstack-cyborg
20:08 *** jkilpatr has joined #openstack-cyborg
21:35 *** kriskend_ has quit IRC
22:07 *** jkilpatr has quit IRC
22:24 *** jkilpatr has joined #openstack-cyborg
23:32 *** crushil has quit IRC
23:52 *** kriskend_ has joined #openstack-cyborg
23:58 *** kriskend_ has quit IRC

Generated by irclog2html.py 2.14.0 by Marius Gedminas - find it at mg.pov.lt!