Wednesday, 2017-05-17

00:01 *** crushil has quit IRC
00:02 *** crushil has joined #openstack-cyborg
00:32 *** crushil has quit IRC
00:41 *** crushil has joined #openstack-cyborg
00:50 *** crushil has quit IRC
01:38 *** skelso has joined #openstack-cyborg
01:40 *** sekelso has joined #openstack-cyborg
01:43 *** skelso has quit IRC
02:17 *** sekelso has quit IRC
02:18 *** sekelso has joined #openstack-cyborg
02:20 *** skelso has joined #openstack-cyborg
02:20 *** skelso has quit IRC
02:21 *** skelso has joined #openstack-cyborg
02:22 *** sekelso has quit IRC
02:55 *** crushil has joined #openstack-cyborg
03:14 *** skelso has quit IRC
03:31 *** crushil has quit IRC
10:38 *** jkilpatr has quit IRC
10:53 -openstackstatus- NOTICE: gerrit is being restarted to help stuck git replication issues
11:00 *** jkilpatr has joined #openstack-cyborg
11:30 *** mikeH has joined #openstack-cyborg
12:53 *** skelso has joined #openstack-cyborg
13:12 *** kriskend_ has joined #openstack-cyborg
13:59 *** crushil has joined #openstack-cyborg
14:05 *** skelso has quit IRC
14:52 *** cdent has joined #openstack-cyborg
14:56 <jkilpatr> morning everyone
14:57 <crushil> Morning jkilpatr
15:00 <crushil> \o
15:00 <crushil> Meeting?
15:02 *** zhipeng has joined #openstack-cyborg
15:03 <jkilpatr> meeting, you know, zhipeng
15:03 <jkilpatr> his internet probably decided not to wake up this morning.
15:03 <zhipeng> #startmeeting openstack-cyborg
15:03 <openstack> Meeting started Wed May 17 15:03:31 2017 UTC and is due to finish in 60 minutes.  The chair is zhipeng. Information about MeetBot at http://wiki.debian.org/MeetBot.
15:03 <openstack> Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
15:03 <openstack> The meeting name has been set to 'openstack_cyborg'
15:03 <jkilpatr> ah, speak of the devil.
15:03 <zhipeng> just got connected
15:03 <zhipeng> :P
15:03 <jkilpatr> ok then, I had a list of topics to go over
15:04 <jkilpatr> unless someone wants to bring other things up before I steamroll ahead?
15:04 * cdent is here to lurk
15:04 <zhipeng> #topic BP discussion
15:04 <zhipeng> jkilpatr, go ahead
15:05 <jkilpatr> zhipeng, your outlined Cyborg API doesn't actually have an attach API call. What did you plan to do, catch some property of a booting instance?
15:07 <zhipeng> as far as I could tell, I did include the attach API?
15:07 <jkilpatr> nova *can* do PCI hotplug, supposedly, which would make live attachment *possible*, but I have no grasp on the gap between "it should work" and "it actually will"
15:07 * jkilpatr is checking the commit again, please hold
15:07 * jkilpatr hold music ends
15:08 <jkilpatr> so we have GET, POST, PUT, DELETE, but they all seem to be for managing the database of accelerators. You can PUT to update an accelerator spec, but what if I have an instance I need to attach it to?
15:09 <zhipeng> okay, for VM instances I still think we would have to look to Nova for the actual attachment operation
15:09 <zhipeng> actually Jay and I discussed this today
15:10 <zhipeng> unless we have identified a set of properties for the accelerator connection to the host
15:10 <zhipeng> meaning that we have an os-brick-like library
15:11 <zhipeng> unless we have that, we could just assume it is a regular PCIe attachment
15:12 <zhipeng> I think for most of the use cases we have at the moment, PCIe would be the most common case, and Nova currently supports that
15:13 <jkilpatr> I think we need to come to a workflow conclusion sooner rather than later. How does the user ask for a VM with a specific accelerator? Do they post to us and then we talk to Nova, do they post to Nova and we watch for some tag, etc.
15:13 <jkilpatr> should there be like a 'user workflow' spec? I guess it goes into the API
15:14 <zhipeng> workflow is kind of an end-to-end thing
15:14 <zhipeng> I personally would suggest that for the moment we take the Cinder approach, and expand later based upon this
15:15 <jkilpatr> that meaning live attachment?
15:15 <zhipeng> because this might be the only way that we don't create a major impact on the existing implementations
15:15 <zhipeng> not necessarily live attachment
15:15 <zhipeng> just attach/detach ops in general
15:16 <zhipeng> for VM instances
15:16 <zhipeng> since you will need the instance id and host id anyway
15:16 <zhipeng> and Nova has all of these
15:16 <jkilpatr> ok then, we have a booted VM we need to do the attachment for; we can set up Nova to do the passthrough and then reboot the instance. Not sure how Nova feels about it, but if it doesn't work I don't think they would be opposed to us making that work
15:16 <jkilpatr> but also Nova does support live PCI attachment, so we can/should just use that
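[A Cinder-style attach/detach, as discussed above, might look roughly like the following sketch. The endpoint shape and every field name here are illustrative assumptions, not taken from the API spec patch under review; the only grounded parts are that an attach needs the instance id and host id, which zhipeng notes Nova already has.]

```python
def attach_request(accelerator_uuid, instance_id, host_id):
    """Body for a hypothetical POST /v1/accelerators/{uuid}/attachments,
    modeled on Cinder's volume attach. All field names are assumptions."""
    return {
        "accelerator_uuid": accelerator_uuid,
        "instance_id": instance_id,    # Nova instance to attach to
        "host_id": host_id,            # compute node hosting the device
        "attach_type": "pci-passthrough",
    }

def detach_request(accelerator_uuid, instance_id):
    """Body for the matching detach call."""
    return {"accelerator_uuid": accelerator_uuid,
            "instance_id": instance_id}
```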
15:17 <zhipeng> yes, if Nova indeed supports that
15:17 <jkilpatr> https://wiki.openstack.org/wiki/Nova/pci_hotplug
15:17 <jkilpatr> this is a stub right now, so some support
15:17 <zhipeng> we just don't differentiate in Cyborg
15:18 <jkilpatr> the driver actually handles this of course; we just run attach on a live instance that was spawned with a flavor or some other indicator that says "cyborg: TeslaP100"
15:18 <zhipeng> yes
15:19 <jkilpatr> because we have to make sure Nova gets it to the right spot first, so we need to feed the placement API live knowledge, and then the instance needs to call the resource name we fed the placement API
15:19 <jkilpatr> maybe we can have a command in cyborg that will help you make flavors, but that's for later.
15:19 <zhipeng> hotplug could be the trait when we use the placement API
15:20 <zhipeng> so when we schedule, it knows it needs a compute node with the hotplug feature
15:20 <jkilpatr> wouldn't we want a bunch of traits
15:20 <zhipeng> yep I guess so :)
15:21 <jkilpatr> like gpu, with cuda support, with hotplug support... so on and so forth
15:21 <zhipeng> we could start with really basic and simple ones
15:21 <jkilpatr> this is why we need a flavor creation wizard in cyborg, but once again, later.
15:21 <zhipeng> yes
15:21 <jkilpatr> ok, anyone have comments on these ideas? things that they might want that this won't cover?
15:22 <jkilpatr> zhipeng, I'm going to put a comment on your API patch as a reminder to add Cinder-like attach/detach to the spec, sound good?
15:23 <zhipeng> which I thought was already done in the current patch?
15:23 <cdent> it would be great to see, at some point, a narration of the expected end-to-end flow from a user's standpoint, if it doesn't already exist. Including how various services will be touched.
15:24 <zhipeng> cdent, we have a flow chart in our BOS presentation, but it's rather rudimentary at the moment
15:24 <jkilpatr> cdent, we have a decent idea of how we want it to work, but I expect some things will change as we get into the nitty gritty of placement problems
15:24 <cdent> sure, change is the nature of this stuff :)
15:24 <jkilpatr> zhipeng, ok, so if I want to attach an accelerator to an instance, what do I do? Do I PUT to update an accelerator spec with a new instance ID to attach to?
15:24 <zhipeng> :)
15:25 <jkilpatr> cdent, I think I'll put up a user workflow spec later today, just so that we keep track of all of this better.
15:25 <cdent> \o/
15:25 <cdent> can you add me as a reviewer on that when it is up, so I get some email to remind me to look?
15:26 <zhipeng> just as we drew for our presentation: the user uses the Cyborg service to complete the discovery phase, and Cyborg finishes interacting with placement to advertise the accelerator inventory
15:26 <jkilpatr> will do, what's your email?
15:26 <cdent> cdent@anticdent.org is me on gerrit
15:27 <zhipeng> then the user just requests to create an instance on a compute node with the corresponding accelerator trait
15:27 <zhipeng> if the traits include hotplug, then maybe it will be a live attachment
15:27 <jkilpatr> I really don't think we're going to get away with one trait per accelerator; users will probably bundle them into flavors, but instead of being tied to a list of whitelisted PCI devices these flavors can be much more general.
15:27 <zhipeng> which means the user could attach the accelerator after VM creation
15:28 <zhipeng> I was told by Jay today that traits are per resource provider
15:29 <zhipeng> so it would mostly be one trait per compute node
15:29 <jkilpatr> I'll have to look at it in detail, I was watching the summit presentation again today.
15:29 <zhipeng> or, if we have vGPUs or FPGA virtual functions
15:29 <zhipeng> then it would be a nested resource provider
15:29 <zhipeng> and we could have traits on the virtual functions
15:29 <zhipeng> but anyway, it does not tie to a specific accelerator
15:30 <zhipeng> it just depends on how you model your accelerators into resource providers
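[The "traits are per resource provider" point above can be sketched in a few lines of plain Python: a provider exposes a set of traits, and scheduling selects providers offering every required trait. The `CUSTOM_*` names follow placement's custom-trait naming convention but are placeholders, and real trait matching happens inside the placement service, not client-side like this.]

```python
def providers_with_traits(providers, required):
    """Return names of resource providers exposing all required traits."""
    return [name for name, traits in providers.items()
            if set(required) <= set(traits)]

# Two compute-node providers; only one advertises hotplug support.
providers = {
    "compute-1": {"CUSTOM_GPU", "CUSTOM_CUDA"},
    "compute-2": {"CUSTOM_GPU", "CUSTOM_CUDA", "CUSTOM_HOTPLUG"},
}

providers_with_traits(providers, ["CUSTOM_GPU", "CUSTOM_HOTPLUG"])
# → ["compute-2"]
```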
15:30 <zhipeng> cdent, please correct me if I'm wrong :P
15:30 <jkilpatr> we're going to need to be careful with that.
15:31 <zhipeng> #link https://pbs.twimg.com/media/DAAUxEWUAAAV6zA.jpg
15:31 <cdent> zhipeng: that looks mostly correct, but I'm only partially paying attention :(
15:31 <zhipeng> cdent, no problemo :)
15:31 <zhipeng> as long as I don't make any extremely wrong claims :)
15:32 <jkilpatr> so, implementation. What can we start and when?
15:32 <zhipeng> as soon as we freeze the specs
15:32 <zhipeng> I suppose we should all go ahead and start coding
15:32 <crushil> Can we focus on closing out the specs first though?
15:32 <zhipeng> yes
15:32 <jkilpatr> ok then, that's a plan.
15:33 <zhipeng> okay, then for the API spec
15:33 <zhipeng> #link https://review.openstack.org/445814
15:33 <zhipeng> any other questions?
15:34 <jkilpatr> I just posted a comment there, otherwise I'm happy enough
15:34 <zhipeng> okay
15:34 <jkilpatr> um, should we pick a database tech? what's available already: sql, mongo... redis (not sure about that one)
15:35 <crushil> MariaDB?
15:35 <zhipeng> #action jkilpatr to post a reminder comment, the api spec patch is LGTM
15:35 <zhipeng> I think we could just use mysql
15:35 <jkilpatr> MariaDB == mysql, except when it doesn't
15:36 <jkilpatr> I think openstack ships with Maria right now
15:36 <zhipeng> yes
15:36 <crushil> yup
15:36 <zhipeng> next up, agent spec
15:36 <zhipeng> #link https://review.openstack.org/#/c/446091/
15:37 <jkilpatr> looks like most people are happy with it.
15:37 <zhipeng> #info Jay Pipes suggests the agent could directly interact with the placement API, instead of going through the conductor
15:37 <zhipeng> #link https://pbs.twimg.com/media/DAAUtyoUMAAXQXI.jpg
15:37 <jkilpatr> so from the summit preso, all the computes already talk to the placement API themselves
15:37 <zhipeng> but I guess we don't need to reflect that in the agent spec
15:37 <jkilpatr> so it's designed to scale well like that
15:38 <jkilpatr> I'd prefer to be explicit, I'll patch it into my spec today
15:38 <crushil> We should reflect that in the agent spec
15:39 <zhipeng> jkilpatr, yes, and for the implementation, Jay suggests we could just directly copy nova/scheduler/client/report.py
15:39 <zhipeng> since it is basically REST calls between the agent and the placement API
15:39 <jkilpatr> that's the sort of laziness I can get behind.
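[What a report.py-style client would send is a plain REST payload. The sketch below builds the body of a placement `PUT /resource_providers/{uuid}/inventories` call in the shape nova's report client uses; the `CUSTOM_FPGA` resource class is an illustrative placeholder, and no HTTP is performed here.]

```python
def inventory_payload(generation, resource_class, total,
                      reserved=0, min_unit=1, max_unit=1,
                      step_size=1, allocation_ratio=1.0):
    """Body for PUT /resource_providers/{uuid}/inventories.
    `generation` is the provider generation placement uses to detect
    concurrent updates; the inner keys mirror placement's inventory schema."""
    return {
        "resource_provider_generation": generation,
        "inventories": {
            resource_class: {
                "total": total,
                "reserved": reserved,
                "min_unit": min_unit,
                "max_unit": max_unit,
                "step_size": step_size,
                "allocation_ratio": allocation_ratio,
            }
        },
    }
```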
15:39 <zhipeng> XD
15:39 <crushil> lol
15:40 <zhipeng> #agreed jkilpatr to do a quick update on the agent spec to reflect jaypipes' comment, then the agent spec patch is LGTM
15:40 <zhipeng> okay, next up, generic driver
15:40 <zhipeng> #link https://review.openstack.org/#/c/447257/
15:41 <zhipeng> any more comments
15:41 <zhipeng> looks fine to me
15:41 <crushil> jkilpatr, any other comments on your end? I have tried to address all of your and Roman's comments in the patch
15:41 <jkilpatr> what about detecting accelerators? discovery has to be handled by someone, do we want drivers to have a discovery call?
15:42 <jkilpatr> I like the rest of the API list for it, good job
15:43 <crushil> I can add that to the list. What would the flow be for discovery, though?
15:43 <zhipeng> I think discovery is already part of the spec?
15:43 <zhipeng> see line 121
15:43 <jkilpatr> ah yup, just not in the other list
15:44 <crushil> It's not part of the API list
15:44 <jkilpatr> crushil, the flow (which I think you should add into your spec, or maybe me into the agent spec)
15:44 <jkilpatr> is: the agent on first startup says "hey, I've never been started up before, let's call discover for all my drivers"
15:44 <jkilpatr> whatever returns true it lists and sends to the conductor to store in the db as possible accelerators
15:45 <jkilpatr> later on, operators can call discover to do this again and add new accelerators.
15:45 <jkilpatr> as a note, I think accelerators should get added in a "not ready" state, with the operator having to tell cyborg to go install drivers; otherwise we risk bad endings installing software on live clouds
15:45 <jkilpatr> more things to add to the agent spec
15:46 <zhipeng> agree
15:46 <crushil> +1
15:46 <crushil> Makes sense, but should we add it to the driver spec or the agent spec, or both?
15:47 <zhipeng> I think both, because discovery is directly triggered by the agent looping over the drivers, right?
15:47 <jkilpatr> crushil, the driver spec just needs "on discovery, return whether the accelerator exists or not"; the agent is the one that will call discovery, then wait for the operator to call the API to move the accelerator into 'ready' before calling the install-driver function.
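[The discovery flow jkilpatr describes can be sketched as: drivers expose a `discover` call, the agent records what it finds as "not ready", and an operator-triggered call installs the driver and flips the state to "ready". All class and method names below are assumptions for illustration; neither spec had fixed an interface at this point.]

```python
class FakeDriver:
    """Stand-in for a Cyborg vendor driver (names are assumed)."""
    def __init__(self, found):
        self._found = found

    def discover(self):
        return list(self._found)       # accelerators present on the host

    def install_driver(self, accel):
        accel["driver_installed"] = True

def agent_first_boot(drivers):
    """First startup: call discover on every driver, store results as
    'not ready' so nothing gets installed on a live cloud automatically."""
    db = {}
    for drv in drivers:
        for name in drv.discover():
            db[name] = {"state": "not ready", "driver": drv,
                        "driver_installed": False}
    return db

def operator_make_ready(db, name):
    """Operator-triggered transition: install the driver, then mark ready."""
    accel = db[name]
    accel["driver"].install_driver(accel)
    accel["state"] = "ready"
```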
15:48 <zhipeng> yep
15:48 <jkilpatr> um, speaking of message passing
15:48 <jkilpatr> most of this should be done over message passing
15:48 <jkilpatr> rabbitmq/oslo.messaging fine?
15:48 <crushil> Yup, that is the OpenStack standard
15:48 <zhipeng> yep
15:50 <zhipeng> #agreed crushil to update the driver spec to include the discovery interface, and jkilpatr to update the agent spec to reflect the related operations; otherwise it is LGTM
15:50 <zhipeng> moving along, next up, interaction https://review.openstack.org/#/c/448228/
15:50 <zhipeng> #link https://review.openstack.org/#/c/448228/
15:51 <zhipeng> I think we still need more work on this
15:51 <zhipeng> first of all, thanks to gryf for working on this on his own time
15:51 <ttk2[m]> I think this is where most of the workflow stuff is hidden right now.
15:52 <ttk2[m]> Oh, this is jkilpatr, moved to my phone.
15:53 <zhipeng> yes
15:53 <zhipeng> we should continue to work on the spec, but I don't think it will block our implementation
15:54 <zhipeng> any thoughts?
15:56 <zhipeng> and ttk2[m], I think you could just work with Roman on this patch to illustrate the workflow
15:56 <zhipeng> and also have cdent review it
15:56 <crushil> Yeah, makes sense. But we need to have a cutoff date to finish the spec
15:57 <zhipeng> we slipped the Apr 15th one rather quickly lol, but yeah, I agree we need another cutoff date
15:57 <zhipeng> what is the m2 deadline for Pike?
15:58 <crushil> June 9
15:59 <zhipeng> I think we could just use that for all the non-LGTM specs per today's meeting
15:59 <ttk2[m]> Ok then. Can we comment on the specs with that deadline?
15:59 <crushil> We should close out all the other specs sooner though
15:59 <ttk2[m]> I feel like we should make a point of moving info out of meetings and into specs so we don't lose it in the black hole of IRC logs.
15:59 <crushil> +1
16:00 <zhipeng> +1
16:01 <zhipeng> at least for all the LGTM specs, I will merge those by the end of this week
16:01 <zhipeng> #agreed set June 9th as a hard cut-off date for all the remaining specs, including cyborg-nova interaction
16:01 <zhipeng> next up, conductor spec
16:02 <zhipeng> #link https://review.openstack.org/#/c/463316/
16:02 <zhipeng> I think I will post some review comments, mostly on the wording
16:02 <zhipeng> but this should be a simple one for us to freeze this week
16:03 <ttk2[m]> Agreed. It's pretty much just glue code.
16:04 <zhipeng> #agreed after some polishing, conductor spec LGTM this week
16:04 <zhipeng> the last one in the queue, not a spec patch though
16:04 <zhipeng> #link https://review.openstack.org/#/c/461220/
16:04 <zhipeng> could folks just give it a +1 so that I could merge it? it is mostly housekeeping stuff
16:07 <gryf> I have mixed feelings about that
16:08 * gryf just joined
16:08 <zhipeng> gryf, which topic?
16:08 <gryf> nacsa.tgz in a repo
16:08 <gryf> it doesn't sound right
16:09 <zhipeng> we just hosted it in the sandbox
16:09 <zhipeng> we could even move it out to an individual repo later on
16:09 <gryf> well, yeah
16:09 <zhipeng> but we did have extensive discussion on that matter
16:09 <zhipeng> with moshe and his team
16:09 <gryf> but it will affect the size of the repository
16:10 <zhipeng> then I think maybe we could move the sandbox out to an individual repo, such as cyborg-sandbox
16:11 <zhipeng> so that it won't affect the cyborg project repo itself
16:11 <gryf> yes, I think that's the better solution
16:11 <gryf> also
16:11 <gryf> I'd like to avoid keeping binary blobs in the repository
16:12 <ttk2[m]> Agreed.
16:12 <zhipeng> that's fine with me :)
16:12 <zhipeng> but we do need to merge it first, and then move it out
16:12 *** cdent has left #openstack-cyborg
16:12 <zhipeng> due process
16:12 <gryf> so the perfect solution would be to unpack it, and make a commit which moves the entire work into its own directory. what do you think?
16:12 <zhipeng> nah, that won't be necessary
16:13 <zhipeng> I think we just move it to another repo, just for the records
16:13 <zhipeng> we won't do any releases, for example, for the cyborg-sandbox
16:13 <ttk2[m]> Um, if we merge it, it's in the repo history forever.
16:13 <zhipeng> it just sits there
16:13 <zhipeng> no, we could move it out
16:13 <zhipeng> and we need to move out the specs later as well
16:14 <zhipeng> cyborg-spec will be the standalone repo to store all the specs
16:14 <ttk2[m]> I don't have super strong feelings. But I'd like to keep binaries out of the repo
16:14 <gryf> ttk2[m], +1
16:14 <zhipeng> I have no problem either
16:14 <zhipeng> but let's just follow a procedure and get it done
16:16 <zhipeng> sounds reasonable to everyone?
16:18 <gryf> zhipeng, what exactly do you mean by following procedure?
16:19 <zhipeng> have it first in the current cyborg repo, and then move it out to a separate one
16:19 <gryf> I'm against it. as ttk2[m] said - if we merge it, it stays forever.
16:19 <zhipeng> why??
16:19 <gryf> it's git :>
16:19 <ttk2[m]> Because history
16:20 <zhipeng> you're saying we couldn't even move the specs out?
16:20 <adreznec> merging it will permanently increase the repo size because the artifact will remain in the history forever
16:21 <zhipeng> okay, understood
16:21 <gryf> zhipeng, we can, but they will be available if someone would like to go back in time (in history), and nothing prevents him from doing so :D
16:21 <zhipeng> then I will abandon the patch and directly submit it to the separate repo instead
16:21 <zhipeng> does this sound reasonable?
16:21 <gryf> unless we do some rebase stuff on the repo itself, but I'm not aware if that is good practice
16:21 <gryf> yup
16:22 <zhipeng> #agreed abandon the nacsa sandbox patch and directly submit it to a separate repo
16:22 <adreznec> gryf: yeah, you basically have to use a rebase or git filter-branch to remove it, but that'll break everyone's checked-out repos since you're rewriting history... so not typically good practice
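[adreznec's point can be seen concretely in a throwaway repo: a blob deleted in a later commit still lives in history, and only a history rewrite such as `git filter-branch` actually drops it. The file name `nacsa.tgz` comes from the discussion; everything else below is a local demo, not a recommended procedure for a shared repo.]

```shell
export FILTER_BRANCH_SQUELCH_WARNING=1
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email demo@example.com
git config user.name demo
head -c 1024 /dev/zero > nacsa.tgz          # stand-in for the binary blob
git add nacsa.tgz
git commit -qm 'add sandbox tarball'
git rm -q nacsa.tgz
git commit -qm 'move tarball out of the repo'
# gone from the worktree, but still reachable through history:
git rev-list --objects --all | grep -q nacsa.tgz && echo "still in history"
# rewriting history is what actually drops it -- and what breaks clones:
git filter-branch -f --index-filter \
  'git rm --cached -q --ignore-unmatch nacsa.tgz' HEAD >/dev/null 2>&1
git for-each-ref --format='%(refname)' refs/original |
  while read -r ref; do git update-ref -d "$ref"; done
git reflog expire --expire=now --all
git gc -q --prune=now
git rev-list --objects --all | grep -q nacsa.tgz || echo "blob purged"
```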
16:23 <zhipeng> okay, we got many things settled :)
16:23 <zhipeng> #topic CI discussion
16:23 <zhipeng> as I understand it, ttk2[m] and gryf have had some discussion on the CI setup
16:23 <zhipeng> do we have any preference now?
16:23 <gryf> adreznec, yeah.
16:24 <gryf> zhipeng, it was mostly a very high-level discussion
16:25 <gryf> we have to have some concrete implementation first
16:25 <zhipeng> well then, on a high level, any directions that we want to follow up on :)
16:25 <zhipeng> okay
16:25 <zhipeng> but having vendors provide a third-party CI env would always be a good idea
16:25 <zhipeng> baremetal or vm, is that correct?
16:25 <gryf> we can figure that out later
16:26 <zhipeng> sure
16:26 <zhipeng> #topic AoB
16:26 <zhipeng> any other topics?
16:26 <ttk2[m]> Keep up the good work guys.
16:27 <zhipeng> that would be a good note for our meeting to end on :)
16:28 <zhipeng> ok, thanks guys, let's end the meeting for today
16:28 <zhipeng> #endmeeting
16:28 <openstack> Meeting ended Wed May 17 16:28:25 2017 UTC.  Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4)
16:28 <openstack> Minutes:        http://eavesdrop.openstack.org/meetings/openstack_cyborg/2017/openstack_cyborg.2017-05-17-15.03.html
16:28 <openstack> Minutes (text): http://eavesdrop.openstack.org/meetings/openstack_cyborg/2017/openstack_cyborg.2017-05-17-15.03.txt
16:28 <openstack> Log:            http://eavesdrop.openstack.org/meetings/openstack_cyborg/2017/openstack_cyborg.2017-05-17-15.03.log.html
16:30 *** zhipeng has quit IRC
16:37 *** mikeH has quit IRC
17:37 *** crushil has quit IRC
17:44 *** crushil has joined #openstack-cyborg
18:01 *** davidjzhao has joined #openstack-cyborg
18:05 *** davidjzhao has quit IRC
18:27 *** crushil has quit IRC
18:50 *** jkilpatr has quit IRC
18:54 *** jkilpatr has joined #openstack-cyborg
19:03 *** crushil has joined #openstack-cyborg
19:49 *** crushil has quit IRC
20:04 *** jkilpatr has quit IRC
20:06 *** crushil has joined #openstack-cyborg
20:08 *** jkilpatr has joined #openstack-cyborg
21:35 *** kriskend_ has quit IRC
22:07 *** jkilpatr has quit IRC
22:24 *** jkilpatr has joined #openstack-cyborg
23:32 *** crushil has quit IRC
23:52 *** kriskend_ has joined #openstack-cyborg
23:58 *** kriskend_ has quit IRC

Generated by irclog2html.py 2.14.0 by Marius Gedminas - find it at mg.pov.lt!