*** crushil has quit IRC | 00:01 | |
*** crushil has joined #openstack-cyborg | 00:02 | |
*** crushil has quit IRC | 00:32 | |
*** crushil has joined #openstack-cyborg | 00:41 | |
*** crushil has quit IRC | 00:50 | |
*** skelso has joined #openstack-cyborg | 01:38 | |
*** sekelso has joined #openstack-cyborg | 01:40 | |
*** skelso has quit IRC | 01:43 | |
*** sekelso has quit IRC | 02:17 | |
*** sekelso has joined #openstack-cyborg | 02:18 | |
*** skelso has joined #openstack-cyborg | 02:20 | |
*** skelso has quit IRC | 02:20 | |
*** skelso has joined #openstack-cyborg | 02:21 | |
*** sekelso has quit IRC | 02:22 | |
*** crushil has joined #openstack-cyborg | 02:55 | |
*** skelso has quit IRC | 03:14 | |
*** crushil has quit IRC | 03:31 | |
*** jkilpatr has quit IRC | 10:38 | |
-openstackstatus- NOTICE: gerrit is being restarted to help stuck git replication issues | 10:53 | |
*** jkilpatr has joined #openstack-cyborg | 11:00 | |
*** mikeH has joined #openstack-cyborg | 11:30 | |
*** skelso has joined #openstack-cyborg | 12:53 | |
*** kriskend_ has joined #openstack-cyborg | 13:12 | |
*** crushil has joined #openstack-cyborg | 13:59 | |
*** skelso has quit IRC | 14:05 | |
*** cdent has joined #openstack-cyborg | 14:52 | |
jkilpatr | morning everyone | 14:56 |
---|---|---|
crushil | Morning jkilpatr | 14:57 |
crushil | \o | 15:00 |
crushil | Meeting? | 15:00 |
*** zhipeng has joined #openstack-cyborg | 15:02 | |
jkilpatr | meeting, you know zhipeng | 15:03 |
jkilpatr | his internet probably decided not to wake up this morning. | 15:03 |
zhipeng | #startmeeting openstack-cyborg | 15:03 |
openstack | Meeting started Wed May 17 15:03:31 2017 UTC and is due to finish in 60 minutes. The chair is zhipeng. Information about MeetBot at http://wiki.debian.org/MeetBot. | 15:03 |
openstack | Useful Commands: #action #agreed #help #info #idea #link #topic #startvote. | 15:03 |
openstack | The meeting name has been set to 'openstack_cyborg' | 15:03 |
jkilpatr | ah, speak of the devil. | 15:03 |
zhipeng | just got connected | 15:03 |
zhipeng | :P | 15:03 |
jkilpatr | ok then I had a list of topics to go over | 15:03 |
jkilpatr | unless someone wants to bring other things up before I steamroll head? | 15:04 |
* cdent is here to lurk | 15:04 | |
jkilpatr | ahead* | 15:04 |
zhipeng | #topic BP discussion | 15:04 |
zhipeng | jkilpatr go ahead | 15:04 |
jkilpatr | zhipeng, your outlined Cyborg api doesn't actually have an attach api call. What did you plan to do, catch some property of a booting instance? | 15:05 |
zhipeng | as far as I could tell I did include the attach api ? | 15:07 |
jkilpatr | nova *can* do pci hotplug, supposedly. which would make live attachment *possible* but I have no grasp on the gap between "it should work" and "it actually will" | 15:07 |
* jkilpatr is checking the commit again please hold | 15:07 | |
* jkilpatr hold music ends | 15:07 | |
jkilpatr | so we have get, post, put, delete but they all seem to be for managing the database of accelerators, you can put to update an accelerator spec but what if I have an instance I need to attach it to? | 15:08 |
zhipeng | Okey for VM instance, I still think we would have to look to Nova for the actual attachment operation | 15:09 |
zhipeng | actually Jay and I discussed this today | 15:09 |
zhipeng | unless we have identified a set of properties for the accelerator connection to the host | 15:10 |
zhipeng | meaning that we have a os-brick like library | 15:10 |
zhipeng | unless we have that, we could just assume it is a regular PCI-e attachment | 15:11 |
zhipeng | I think for most of the use cases we have at the moment, PCIe would be the most usual case and Nova currently supports that | 15:12 |
jkilpatr | I think we need to come to a workflow conclusion sooner rather than later, how does the user ask for a vm with a specific accelerator, do they post to us and then we talk to nova, do they post to nova and we watch for some tag etc | 15:13 |
jkilpatr | should there be like a 'user workflow' spec? I guess it goes into the api | 15:13 |
zhipeng | workflow is kinda an end-to-end thing | 15:14 |
zhipeng | I personally would suggest for the moment, we take the Cinder approach, and expand later based upon this | 15:14 |
jkilpatr | that meaning live attachment? | 15:15 |
zhipeng | because this might be the only way that we don't create a major impact on the existing implementations | 15:15 |
zhipeng | not necessarily live-attachment | 15:15 |
zhipeng | but just attach/detach ops in general | 15:15 |
zhipeng | for VM instances | 15:16 |
zhipeng | since you will need the instance id and host id anyway | 15:16 |
zhipeng | and nova got all these | 15:16 |
jkilpatr | ok then, we have a booted vm we need to do the attachment, we can setup nova to do the passthrough and then reboot the instance, not sure how nova feels about it but if it doesn't work I don't think they would be opposed to us making that work | 15:16 |
jkilpatr | but also nova does support live pci attachment, so we can/should just use that | 15:16 |
zhipeng | yes if Nova indeed could support that | 15:17 |
jkilpatr | https://wiki.openstack.org/wiki/Nova/pci_hotplug | 15:17 |
jkilpatr | this is a stub right now, so some support | 15:17 |
zhipeng | we just don't differentiate in Cyborg | 15:17 |
jkilpatr | the driver actually handles this of course, we just run attach on a live instance that was spawned with a flavor or some other indicator that says "cyborg: TeslaP100" | 15:18 |
zhipeng | yes | 15:18 |
jkilpatr | because we have to make sure Nova gets it to the right spot first, so we need to have the placement api fed live knowledge and then the instance needs to call the resource name we fed the placement api | 15:19 |
jkilpatr | maybe we can have a command in cyborg that will help you make flavors, but that's for later. | 15:19 |
zhipeng | hotplug could be the trait when we use placement api | 15:19 |
zhipeng | so when we schedule it, it knows it needs a compute node with hotplug feature | 15:20 |
jkilpatr | wouldn't we want a bunch of traits | 15:20 |
zhipeng | yep I guess so :) | 15:20 |
jkilpatr | like gpu, with cuda support, with hotplug support .... so on and so forth | 15:21 |
zhipeng | we could start with really basic and simple ones | 15:21 |
jkilpatr | this is why we need a flavor creation wizard in cyborg but once again, later. | 15:21 |
zhipeng | yes | 15:21 |
jkilpatr | ok anyone have comments on these ideas? things that they might want that this won't cover? | 15:21 |
jkilpatr | zhipeng, I'm going to put a comment on your api patch as a reminder to add cinder like attach/detach to the spec sound good? | 15:22 |
zhipeng | which I thought is already done in the current patch ? | 15:23 |
cdent | it would be great to see, at some point, a narration of the expected end to end flow from a user's standpoint, if it doesn't already exist. Including how various services will be touched. | 15:23 |
zhipeng | cdent we got a flow chart in our BOS presentation, but rather rudimentary at the moment | 15:24 |
jkilpatr | cdent, we have a decent idea of how we want it to work but I expect some things will change as we get into the nitty gritty of placement problems | 15:24 |
cdent | sure, change is the nature of this stuff :) | 15:24 |
jkilpatr | zhipeng, ok so if I want to attach an accelerator to an instance what do I do? Do I put to update an accelerator spec with a new instance ID to attach to? | 15:24 |
zhipeng | :) | 15:24 |
jkilpatr | cdent, I think I'll put up a user workflow spec later today, just so that we keep track of all of this better. | 15:25 |
cdent | \o/ | 15:25 |
cdent | can you add me as a review on that when it is up, so I get some email to remind me to look? | 15:25 |
zhipeng | just as we drew for our presentation, after the user uses the Cyborg service to complete the discovery phase and Cyborg finishes interacting with placement to advertise the accelerator inventory | 15:26 |
jkilpatr | will do, whats your email? | 15:26 |
cdent | cdent@anticdent.org is me on gerrit | 15:26 |
zhipeng | then the user just requests to create an instance on a compute node with the corresponding accelerator trait | 15:27 |
zhipeng | if the trait includes hotplug, then maybe it will be a live attachment | 15:27 |
jkilpatr | I really don't think we're going to get away with one trait per accelerator, users will probably bundle them into flavors, but instead of being tied to a list of whitelisted pci devices these flavors can be much more general. | 15:27 |
zhipeng | which means user could attach the accelerator after VM creation | 15:27 |
zhipeng | I was told by jay today that traits are per resource provider | 15:28 |
zhipeng | so it would mostly be one trait per compute node | 15:29 |
jkilpatr | I'll have to look at it in detail, I was watching the summit presentation again today. | 15:29 |
zhipeng | or we got vGPUs or FPGA virtual functions | 15:29 |
zhipeng | then it would be nested resource provider | 15:29 |
zhipeng | and we could have trait on the virtual functions | 15:29 |
zhipeng | but anyways it does not tie to a specific accelerator | 15:29 |
zhipeng | it just depends on how you model your accelerators into resource providers | 15:30 |
zhipeng | cdent plz correct me if i'm wrong :P | 15:30 |
jkilpatr | we're going to need to be careful with that. | 15:30 |
zhipeng | #link https://pbs.twimg.com/media/DAAUxEWUAAAV6zA.jpg | 15:31 |
cdent | zhipeng: that looks mostly correct, but I'm only partially paying attention :( | 15:31 |
zhipeng | cdent no problemo :) | 15:31 |
zhipeng | as long as I don't make any extremely wrong claims :) | 15:31 |
jkilpatr | so implementation. What can we start and when? | 15:32 |
zhipeng | as soon as we freeze the specs | 15:32 |
zhipeng | i suppose we should all go ahead and start coding | 15:32 |
crushil | Can we focus on closing out the specs first though? | 15:32 |
zhipeng | yes | 15:32 |
jkilpatr | ok then that's a plan. | 15:32 |
zhipeng | okey, then for api spec | 15:33 |
zhipeng | #link https://review.openstack.org/445814 | 15:33 |
zhipeng | any other questions ? | 15:33 |
jkilpatr | I just posted a comment there, otherwise I'm happy enough | 15:34 |
zhipeng | okey | 15:34 |
jkilpatr | um should we pick a database tech? what's available already: sql, mongo... redis (not sure about that one) | 15:34 |
crushil | MariaDB? | 15:35 |
zhipeng | #action jkilpatr to post a reminder comment, the api spec patch is LGTM | 15:35 |
zhipeng | i think we could just use mysql | 15:35 |
jkilpatr | MariaDB == mysql except when it doesnt | 15:35 |
jkilpatr | I think openstack ships with Maria right now | 15:36 |
zhipeng | yes | 15:36 |
crushil | yup | 15:36 |
zhipeng | next up, agent spec | 15:36 |
zhipeng | #link https://review.openstack.org/#/c/446091/ | 15:36 |
jkilpatr | looks like most people are happy with it. | 15:37 |
zhipeng | #info Jay Pipes suggests the agent could directly interact with placement api, instead of going through conductor | 15:37 |
zhipeng | #link https://pbs.twimg.com/media/DAAUtyoUMAAXQXI.jpg | 15:37 |
jkilpatr | so from the summit preso all the computes already talk to the placement api themselves | 15:37 |
zhipeng | but I guess we don't need to reflect that in the agent spec | 15:37 |
jkilpatr | so it's designed to scale well like that | 15:37 |
jkilpatr | I'd prefer to be explicit, I'll patch it into my spec today | 15:38 |
crushil | We should reflect that in the agent spec | 15:38 |
zhipeng | jkilpatr yes, and for implementation, Jay suggests we could directly just copy nova/scheduler/client/report.py | 15:39 |
zhipeng | since it is basically rest calls between agent and placement api | 15:39 |
jkilpatr | that's the sort of laziness I can get behind. | 15:39 |
zhipeng | XD | 15:39 |
crushil | lol | 15:39 |
zhipeng | #agreed jkilpatr do a quick update on agent spec to reflect jaypipes comment, then the agent spec patch LGTM | 15:40 |
zhipeng | okey, next up, generic driver | 15:40 |
zhipeng | #link https://review.openstack.org/#/c/447257/ | 15:40 |
zhipeng | any more comments | 15:41 |
zhipeng | looks fine to me | 15:41 |
crushil | jkilpatr, Any other comments on your end? I have tried to address all of your and Roman's comments in the patch | 15:41 |
jkilpatr | what about detect accelerator? discovery has to be handled by someone, do we want drivers to have a discovery call? | 15:41 |
jkilpatr | I like the rest of the api list for it, good job | 15:42 |
crushil | I can add that to the list. What would be the flow though for discovery? | 15:43 |
zhipeng | i think discovery already part of the spec ? | 15:43 |
zhipeng | see line 121 | 15:43 |
jkilpatr | ah yup just not in the other list | 15:43 |
crushil | It's not part of the API list | 15:44 |
jkilpatr | crushil, the flow (which I think you should add into your spec or maybe me in to the agent spec) | 15:44 |
jkilpatr | is agent on first startup says "hey I've never been started up before, lets call discover for all my drivers" | 15:44 |
jkilpatr | whatever returns true it lists and sends to the conductor to store in the db as possible accelerators | 15:44 |
jkilpatr | later on operators can call discover to do this again and add new accelerators. | 15:45 |
jkilpatr | as a note I think accelerators should get added in a "not ready" state with the operator having to tell cyborg to go install drivers otherwise we risk bad endings installing software on live clouds | 15:45 |
jkilpatr | more things to add to the agent spec | 15:45 |
zhipeng | agree | 15:46 |
crushil | +1 | 15:46 |
crushil | Makes sense, but should we add it to the driver spec or agent spec or both? | 15:46 |
zhipeng | i think for both, because discovery is directly triggered by agent to run loops on drivers ,right ? | 15:47 |
jkilpatr | crushil, driver spec just needs "on discovery return if the accelerator exists or not" agent is the one that will call discovery then wait for the operator to call the api to move the accelerator into 'ready' before calling the install driver function. | 15:47 |
zhipeng | yep | 15:48 |
jkilpatr | um speaking of message passing | 15:48 |
jkilpatr | most of this should be done over message passing | 15:48 |
jkilpatr | rabbitmq/oslo messaging fine? | 15:48 |
crushil | Yup, that is the OS standard | 15:48 |
zhipeng | yep | 15:48 |
zhipeng | #agreed crushil to update the driver spec to include the discovery interface, and jkilpatr update the agent spec to reflect the related operations, otherwise it is LGTM | 15:50 |
zhipeng | moving along, next up, interaction https://review.openstack.org/#/c/448228/ | 15:50 |
zhipeng | #link https://review.openstack.org/#/c/448228/ | 15:50 |
zhipeng | I think we still need more work in this | 15:51 |
zhipeng | first of all thx to gryf for working on this on his own time | 15:51 |
ttk2[m] | I think this is where most of the workflow stuff is hidden right now. | 15:51 |
ttk2[m] | Oh this is jkilpatr moved to my phone. | 15:52 |
zhipeng | yes | 15:53 |
zhipeng | we should continue to work on the spec, but I don't think it will block our implementation | 15:53 |
zhipeng | any thoughts ? | 15:54 |
zhipeng | and ttk2[m] I think you could just work with Roman on this patch to illustrate the workflow | 15:56 |
zhipeng | and also have cdent for review | 15:56 |
crushil | Ya, makes sense. But, we need to have a cutoff date to finish the spec | 15:56 |
zhipeng | we slipped the Apr 15th one rather quickly lol, but ya I agree we need another cutoff date | 15:57 |
zhipeng | what is the m2 deadline for Pike ? | 15:57 |
crushil | June 9 | 15:58 |
zhipeng | i think we could just use that for all the non-LGTM specs per today's meeting | 15:59 |
ttk2[m] | Ok then. Can we comment on the specs with that deadline. | 15:59 |
crushil | We should close out all the other specs sooner though | 15:59 |
ttk2[m] | I feel like we should make a point of moving info out of meetings and into specs so we don't lose them in the black hole of IRC logs. | 15:59 |
crushil | +1 | 15:59 |
zhipeng | +1 | 16:00 |
zhipeng | at least for all the LGTM specs I will merge those by the end of this week | 16:01 |
zhipeng | #agreed set June 9th as a hard cut-off date for all the remaining specs, including cyborg-nova interaction | 16:01 |
zhipeng | next up , conductor spec | 16:01 |
zhipeng | #link https://review.openstack.org/#/c/463316/ | 16:02 |
zhipeng | i think I will post some review, mostly on the wording | 16:02 |
zhipeng | but this should be a simple one for us to freeze this week | 16:02 |
ttk2[m] | Agreed. It's pretty much just glue code. | 16:03 |
zhipeng | #agreed after some polishing, conductor spec LGTM this week | 16:04 |
zhipeng | the last one in the queue, not a spec patch tho | 16:04 |
zhipeng | #link https://review.openstack.org/#/c/461220/ | 16:04 |
zhipeng | could folks just give a +1 so that I could merge it, it is mostly house cleaning stuff | 16:04 |
gryf | I have mixed feelings about that | 16:07 |
* gryf just joined | 16:08 | |
zhipeng | gryf which topic ? | 16:08 |
gryf | nacsa.tgz in a repo | 16:08 |
gryf | it doesn't sound right | 16:08 |
zhipeng | we just hosted in the sandbox | 16:09 |
zhipeng | we could even move them out to an individual repo later on | 16:09 |
gryf | well, yeah | 16:09 |
zhipeng | but we did have extensive discussion on that matter | 16:09 |
zhipeng | with moshe and his team | 16:09 |
gryf | but it will affect the size of the repository | 16:09 |
zhipeng | then I think maybe we could move the sandbox out to an individual repo, such as cyborg-sandbox | 16:10 |
zhipeng | so that it won't affect the cyborg project repo itself | 16:11 |
gryf | yes, I think that's the better solution | 16:11 |
gryf | also | 16:11 |
gryf | I'd like to avoid keeping binary blobs in repository | 16:11 |
ttk2[m] | Agreed. | 16:12 |
zhipeng | that's fine for me :) | 16:12 |
zhipeng | but we do need to merge it first, and then move it out | 16:12 |
*** cdent has left #openstack-cyborg | 16:12 | |
zhipeng | due process | 16:12 |
gryf | so the perfect solution would be to unpack it, and make the commit which moves the entire work into its own directory. what do you think? | 16:12 |
zhipeng | nuh that won't be necessary | 16:12 |
zhipeng | i think just move to another repo just for records | 16:13 |
zhipeng | we won't do any release, for example , for the cyborg-sandbox | 16:13 |
ttk2[m] | Um if we merge it it's in the repo history forever. | 16:13 |
zhipeng | it just sits there | 16:13 |
zhipeng | no we could move it out | 16:13 |
zhipeng | and we need to move out the spec later as well | 16:13 |
zhipeng | cyborg-spec will be the standalone repo to store all the specs | 16:14 |
ttk2[m] | I don't have super strong feelings. But Id like to keep binaries out of the repo | 16:14 |
gryf | ttk2[m], +1 | 16:14 |
zhipeng | I have no problem either | 16:14 |
zhipeng | but let's just follow a procedure and get it done | 16:14 |
zhipeng | sounds reasonable for everyone ? | 16:16 |
gryf | zhipeng, what exactly do you mean by following procedure? | 16:18 |
zhipeng | have it first in the current cyborg repo, and then move it out to a separate one | 16:19 |
gryf | I'm against it. as ttk2[m] said - if we merge it, it stays forever. | 16:19 |
zhipeng | why ?? | 16:19 |
gryf | it's a git :> | 16:19 |
ttk2[m] | Because history | 16:19 |
zhipeng | so say we couldn't even move the specs out ? | 16:20 |
adreznec | merging it will permanently increase the repo size because the artifact will remain in the history forever | 16:20 |
zhipeng | okey understood | 16:21 |
gryf | zhipeng, we can, but they will be available, if someone would like to go back in time (in history) and nothing prevent him to do so :D | 16:21 |
zhipeng | then I will abandon the patch and directly submit it to the separate repo instead | 16:21 |
zhipeng | this sounds reasonable ? | 16:21 |
gryf | unless, we do some rebase stuff on the repo itself, but I'm not aware if this is a good practice | 16:21 |
gryf | yup | 16:21 |
zhipeng | #agreed abandon the nacsa sandbox patch and directly submit it to a separate repo | 16:22 |
adreznec | gryf: yeah, you basically have to use a rebase or git filter-branch to remove it, but that'll break everyone's checked out repos since you're rewriting history... so not typically good practice | 16:22 |
zhipeng | okey, we got many things settled :) | 16:23 |
zhipeng | #topic CI discussion | 16:23 |
zhipeng | as I understand ttk2[m] and gryf had some discussion on the CI settings | 16:23 |
zhipeng | do we have any preference now ? | 16:23 |
gryf | adreznec, yeah. | 16:23 |
gryf | zhipeng, it was mostly very high level discussion | 16:24 |
gryf | we have to have some concrete implementation first | 16:25 |
zhipeng | well then, on a high level, any directions that we want to follow up on :) | 16:25 |
zhipeng | okey | 16:25 |
zhipeng | but having vendors provide third party CI env would always be a good idea | 16:25 |
zhipeng | baremetal or vm, is it correct ? | 16:25 |
gryf | we can figure that out later | 16:25 |
zhipeng | sure | 16:26 |
zhipeng | #topic AoB | 16:26 |
zhipeng | any other topics ? | 16:26 |
ttk2[m] | Keep up the good work guys. | 16:26 |
zhipeng | that would be a good note our meeting ends on :) | 16:27 |
zhipeng | ok thx guys, let's end the meeting for today | 16:28 |
zhipeng | #endmeeting | 16:28 |
openstack | Meeting ended Wed May 17 16:28:25 2017 UTC. Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4) | 16:28 |
openstack | Minutes: http://eavesdrop.openstack.org/meetings/openstack_cyborg/2017/openstack_cyborg.2017-05-17-15.03.html | 16:28 |
openstack | Minutes (text): http://eavesdrop.openstack.org/meetings/openstack_cyborg/2017/openstack_cyborg.2017-05-17-15.03.txt | 16:28 |
openstack | Log: http://eavesdrop.openstack.org/meetings/openstack_cyborg/2017/openstack_cyborg.2017-05-17-15.03.log.html | 16:28 |
*** zhipeng has quit IRC | 16:30 | |
*** mikeH has quit IRC | 16:37 | |
*** crushil has quit IRC | 17:37 | |
*** crushil has joined #openstack-cyborg | 17:44 | |
*** davidjzhao has joined #openstack-cyborg | 18:01 | |
*** davidjzhao has quit IRC | 18:05 | |
*** crushil has quit IRC | 18:27 | |
*** jkilpatr has quit IRC | 18:50 | |
*** jkilpatr has joined #openstack-cyborg | 18:54 | |
*** crushil has joined #openstack-cyborg | 19:03 | |
*** crushil has quit IRC | 19:49 | |
*** jkilpatr has quit IRC | 20:04 | |
*** crushil has joined #openstack-cyborg | 20:06 | |
*** jkilpatr has joined #openstack-cyborg | 20:08 | |
*** kriskend_ has quit IRC | 21:35 | |
*** jkilpatr has quit IRC | 22:07 | |
*** jkilpatr has joined #openstack-cyborg | 22:24 | |
*** crushil has quit IRC | 23:32 | |
*** kriskend_ has joined #openstack-cyborg | 23:52 | |
*** kriskend_ has quit IRC | 23:58 |
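
The discovery flow discussed between 15:44 and 15:47 (agent calls discover on every driver, reports hits to the conductor, accelerators start as "not ready", and the operator flips them to "ready" before any driver install) can be sketched roughly as below. This is only an illustration under assumptions: the class and method names (`Driver`, `Conductor`, `Agent`, `install_ready`) are hypothetical and not taken from any Cyborg spec, and plain method calls stand in for the oslo.messaging traffic agreed on in the meeting.

```python
from enum import Enum


class State(Enum):
    NOT_READY = "not ready"
    READY = "ready"


class Driver:
    """Hypothetical driver: discover() only reports whether the accelerator exists."""

    def __init__(self, name, present):
        self.name = name
        self._present = present
        self.installed = False

    def discover(self):
        return self._present

    def install(self):
        self.installed = True


class Conductor:
    """Stand-in for the conductor, which owns the accelerator database."""

    def __init__(self):
        self.db = {}

    def register(self, name):
        # New accelerators start as NOT_READY; rediscovery keeps the existing state.
        self.db.setdefault(name, State.NOT_READY)

    def mark_ready(self, name):
        # Operator action, e.g. triggered through the API.
        self.db[name] = State.READY


class Agent:
    def __init__(self, drivers, conductor):
        self.drivers = {d.name: d for d in drivers}
        self.conductor = conductor

    def discover(self):
        """Run on first startup, and again whenever the operator asks."""
        found = [name for name, d in self.drivers.items() if d.discover()]
        for name in found:
            self.conductor.register(name)
        return found

    def install_ready(self):
        """Install drivers only for accelerators the operator marked READY."""
        installed = []
        for name, state in self.conductor.db.items():
            driver = self.drivers[name]
            if state is State.READY and not driver.installed:
                driver.install()
                installed.append(name)
        return installed
```

The point of the NOT_READY default is the one raised at 15:45: nothing gets installed on a live cloud until the operator explicitly asks for it.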
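
The trait discussion between 15:19 and 15:29 (traits attach to resource providers, virtual functions can be nested providers with their own traits, and a request names the traits it needs) can be pictured with the toy matcher below. It is a sketch under assumptions, not the placement API: `ResourceProvider`, `matching_providers`, and the trait strings are made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ResourceProvider:
    """Toy model of a provider; children model nested providers such as VFs."""
    name: str
    traits: frozenset
    children: list = field(default_factory=list)


def matching_providers(root, required):
    """Return names of providers in the tree whose traits include all required ones."""
    required = frozenset(required)
    matches = []
    stack = [root]
    while stack:
        rp = stack.pop()
        if required <= rp.traits:
            matches.append(rp.name)
        stack.extend(rp.children)
    return sorted(matches)


# Illustrative tree: a compute node with a GPU, plus an FPGA virtual function.
vf = ResourceProvider("compute1-fpga-vf0", frozenset({"FPGA", "HOTPLUG"}))
node = ResourceProvider("compute1", frozenset({"GPU", "CUDA"}), [vf])
```

In this model a request for `{"GPU", "CUDA"}` matches only the compute node, while a request for `{"HOTPLUG"}` matches only the nested virtual function, which is the "depends on how you model your accelerators" point from 15:30.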