*** fanzhang_ has left #openstack-cyborg | 00:58 | |
*** openstackgerrit has joined #openstack-cyborg | 03:14 | |
openstackgerrit | Xinran WANG proposed openstack/cyborg master: Allocation/Deallocation API Specification https://review.openstack.org/597991 | 03:14 |
*** jiapei has joined #openstack-cyborg | 03:42 | |
*** fanzhang_ has joined #openstack-cyborg | 07:11 | |
*** fanzhang_ has left #openstack-cyborg | 07:11 | |
*** fanzhang_ has joined #openstack-cyborg | 07:11 | |
*** helenafm has joined #openstack-cyborg | 07:13 | |
*** jiapei has quit IRC | 08:31 | |
*** helenafm has quit IRC | 08:49 | |
*** helenafm has joined #openstack-cyborg | 10:15 | |
*** helenafm has quit IRC | 11:24 | |
*** efried has quit IRC | 16:10 | |
*** efried has joined #openstack-cyborg | 16:10 | |
*** dims_ is now known as dims | 18:05 | |
*** Sundar has joined #openstack-cyborg | 19:10 | |
Sundar | efried: Please LMK when you are here. | 19:11 |
efried | Sundar: ō/ | 19:11 |
Sundar | efried: On the Nova spec, you correctly noted that I am not asking to include accN in Placement queries | 19:11 |
Sundar | These additional properties are needed to specify bitstreams, functions and things that Nova doesn't care about | 19:12 |
efried | I understand | 19:12 |
Sundar | So, I have proposed that they be bundled with the request groups in the same way as resources or traits | 19:12 |
Sundar | So, if we stick to that idea, do you expect pushback from other Nova developers? What does it take to get this through? | 19:13 |
efried | I could see some people objecting to the fact that it *looks* like they're tied together | 19:14 |
efried | which, of course, they are | 19:14 |
efried | I think you're going to have the same problem gibi is trying to solve via https://review.openstack.org/#/c/597601/ -- there's no way to map a request group to a piece of an allocation. | 19:15 |
efried | But that's orthogonal to the design decision of using accN to denote such information. | 19:15 |
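A hypothetical sketch of the bundling Sundar is proposing, with accN keys riding alongside the numbered resources/trait keys of the same request group (all key names are illustrative, not taken from the spec):

```python
# Hypothetical flavor extra specs: the acc2 keys carry bitstream/function
# info for Cyborg; placement only sees the resources2/trait2 keys.
extra_specs = {
    "resources2:FPGA": "1",
    "trait2:CUSTOM_FPGA_INTEL": "required",
    "acc2:bitstream_id": "0xbeef",  # ignored by placement, consumed by Cyborg
}
```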
Sundar | The alternative is to split the device profile entries in each VAR and store it in Cyborg. The problem then is that the information gets duplicated. That can lead to issues with consistency. | 19:17 |
efried | Which information gets duplicated? | 19:17 |
efried | you mean include stuff like bitstream info in the device profile? | 19:18 |
efried | that seems... reasonable. | 19:18 |
efried | at least for phase 1 | 19:18 |
Sundar | E.g., say the device profile says: {resources:ACCEL_GPU=2, trait:GPU_FOO=required}. Then we would create 2 separate VARs. Each one could contain a copy like: {resources:ACCEL_GPU=1, trait:GPU_FOO=required} | 19:19 |
Sundar | So, there are 2 VARs, each requesting 1 resource | 19:19 |
efried | yes, that's as it should be. One VAR, one accelerator. | 19:20 |
Sundar | That means the resources/traits get duplicated. Also, the semantics that both resources need to come from the same resource provider is lost from Cyborg's perspective | 19:20 |
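A minimal sketch of the split Sundar describes, assuming dict-shaped profiles and VARs; split_into_vars() and the key names are illustrative, not Cyborg's actual API:

```python
# Splitting a multi-accelerator device profile into per-accelerator VARs.
device_profile = {
    "resources:ACCEL_GPU": 2,
    "trait:GPU_FOO": "required",
}

def split_into_vars(profile):
    """Create one VAR per requested accelerator unit."""
    count = profile["resources:ACCEL_GPU"]
    return [
        {"resources:ACCEL_GPU": 1, "trait:GPU_FOO": "required"}
        for _ in range(count)
    ]

vars_ = split_into_vars(device_profile)
# -> two VARs, each asking for 1 GPU; the "both from the same provider"
#    semantic of the original profile is lost in the copies.
```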
efried | So | 19:20 |
efried | I was thinking a device profile would be for *one* device. | 19:20 |
efried | But | 19:20 |
efried | then you can't, as you say, demand that two devices come from the same provider | 19:21 |
efried | or more generally, take advantage of granular syntax. | 19:21 |
efried | unless | 19:21 |
Sundar | Yes. Nova would have that info, and presumably can colocate them. But, if Cyborg needs that info to connect the accelerators together, or whatever, that is ruled out | 19:22 |
efried | we number the extra spec key using the same pattern as we do for the other resources | 19:22 |
efried | which is actually probably a better, more composable design. | 19:22 |
Sundar | Yes! | 19:22 |
Sundar | +1 | 19:22 |
Sundar | That is the proposal in the spec | 19:22 |
efried | it is? | 19:22 |
efried | I got the impression you had set up device profiles so they could contain more than one accelerator | 19:23 |
Sundar | Yes, I propose to take the device profile fields, convert them to extra spec granular syntax and fold them in | 19:23 |
Sundar | Yes, in the above example, we would convert that to: | 19:23 |
Sundar | resources2:ACCEL_GPU=2; trait2:GPU_FOO=required | 19:24 |
Sundar | The only sticking point is who does that numbering. I thought Cyborg or os-acc could do that. But the comments point out that there could be request groups in the flavor unrelated to device profiles or Cyborg | 19:24 |
Sundar | So, Nova should do the numbering. I am fine with that | 19:24 |
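A rough sketch of that folding, assuming Nova picks the next free group number; fold_profile_into_extra_specs() is hypothetical, and the granular-syntax keys follow the pattern from the example above:

```python
import re

def fold_profile_into_extra_specs(extra_specs, profile):
    # Collect group numbers already used by the flavor (e.g. "resources1:").
    used = {
        int(m.group(1))
        for key in extra_specs
        for m in [re.match(r"(?:resources|trait)(\d+):", key)]
        if m
    }
    n = max(used, default=0) + 1
    # Rewrite each unnumbered profile key into the chosen group.
    for key, value in profile.items():
        prefix, _, rest = key.partition(":")  # "resources" / "trait"
        extra_specs["%s%d:%s" % (prefix, n, rest)] = value
    return extra_specs

specs = fold_profile_into_extra_specs(
    {"resources1:VCPU": "4"},
    {"resources:ACCEL_GPU": "2", "trait:GPU_FOO": "required"},
)
# -> adds resources2:ACCEL_GPU=2 and trait2:GPU_FOO=required
```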
efried | Yeah, you should ask gibi how he did that for the network bandwidth PoC. Pretty sure he would have done it from the nova thing that parses the extra specs. | 19:25 |
Sundar | OK. I'll take a look at the spec you cited above, and ask him. | 19:26 |
Sundar | Thanks! | 19:26 |
Sundar | efried: This spec seems orthogonal to our needs. Do you have pointers to other specs from him on this topic? | 19:27 |
efried | Sundar: Here's where I suspect that is getting done: https://github.com/openstack/nova/blob/e658f41d686e4533640b101622f2342348c0316d/nova/scheduler/utils.py#L433 | 19:28 |
efried | Oh, the spec I cited above has nothing to do with renumbering request groups. That was just acking the fact that you're (probably) going to need a way to map a numbered group in the *request* to some piece of the allocation in the *response*. And placement has no way right now to help you with that - you basically have to reverse-engineer it yourself. | 19:29 |
efried | Let's say for example that your user wants two accelerators of the same type (say VGPU), doesn't care if they're from the same provider or not, and also wants to associate some other resource with each - let's say VRAM. | 19:31 |
efried | So the user specifies one dev profile with VGPU=1,VRAM=1024 and another with VGPU=1,VRAM=2048 | 19:31 |
efried | placement, in its infinite wisdom, decides to satisfy both accelerators from the same provider | 19:32 |
efried | so you get back an allocation with: VGPU=2,VRAM=3072 | 19:32 |
efried | And now you have to reverse-engineer that to figure out that you actually wanted one VGPU=1,VRAM=1024 and one VGPU=1,VRAM=2048 | 19:33 |
efried | Simple example, but add in more device profiles at the same time and/or try to generalize that reverse engineering, and it quickly becomes problematic. | 19:33 |
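In data form, the example efried walks through looks roughly like this (provider names invented):

```python
# What was asked for: two numbered groups, which placement may merge.
group1 = {"VGPU": 1, "VRAM": 1024}
group2 = {"VGPU": 1, "VRAM": 2048}

# What comes back when both land on one provider: only summed totals.
allocation = {"rp_x": {"VGPU": 2, "VRAM": 3072}}

# Cyborg must now recover group1 + group2 from the totals to configure
# each accelerator -- trivial here, but with more overlapping profiles
# the decomposition is ambiguous, and placement (which knew the mapping
# while building candidates) no longer exposes it.
```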
efried | That's what gibi proposed his spec for. | 19:34 |
Sundar | Yea, I got that | 19:34 |
efried | Because in placement, *while* we're calculating allocation candidates (but not after), we know which request group corresponds to which bit of each allocation request. | 19:34 |
efried | We just don't carry that information forward to the response, or preserve it internally in any way. | 19:34 |
Sundar | A related issue is that, if VRAM were not a standard resource, you may say accel:VRAM=2048 or something | 19:34 |
efried | um, rather not. If it's not a standard resource, it can be CUSTOM_VRAM, but it should still be tracked in placement. | 19:35 |
Sundar | Now, the accel extra spec has to go with the rest of the request group, for the device to be suitably configured during bind | 19:35 |
efried | If not, you're setting yourself up for resource contention problems that placement was expressly designed to obviate. | 19:36 |
Sundar | OK, that was not a good example. What if I want to assign the GPU to host software like DPDK for cryptography, rather than to a VM? Then the dev prof may say accel:assign-to-host=true | 19:37 |
Sundar | This is not a trait | 19:38 |
Sundar | because it is not meant for scheduling, | 19:38 |
Sundar | only to control how the assignment is done | 19:39 |
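So the profile entry would look something like this (key names from Sundar's example; the overall shape is illustrative):

```python
device_profile = {
    "resources:ACCEL_GPU": "1",
    "accel:assign-to-host": "true",  # bind-time config only; Nova and
                                     # placement never schedule on this key
}
```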
efried | I don't understand that use case. But yeah, traits are another useful way to expose the problem. | 19:39 |
efried | If I ask for resource1=VGPU:1&required1=CUSTOM_FOO&resource2=VGPU:1, and I get back two allocations from different providers, I need to go figure out (based on traits? or what I already know about the providers?) which VGPU I'm supposed to apply to CUSTOM_FOO. | 19:40 |
efried | again, it's not sooper hard, but once you generalize it out, you end up essentially having to duplicate a ton of the placement logic. | 19:41 |
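A hedged sketch of that trait-based reverse mapping (data shapes and provider_for_group() invented for illustration):

```python
# Allocations came back from two providers; provider traits are fetched
# separately from placement.
allocations = {"rp_a": {"VGPU": 1}, "rp_b": {"VGPU": 1}}
provider_traits = {"rp_a": {"CUSTOM_FOO"}, "rp_b": set()}

def provider_for_group(required_traits):
    """Pick a provider whose traits satisfy the group's requirements."""
    for rp in allocations:
        if required_traits <= provider_traits[rp]:
            return rp

# required1=CUSTOM_FOO -> rp_a; the trait-less group gets the leftover.
# Generalizing this (overlapping traits, shared resources) is exactly
# the placement logic one ends up duplicating.
```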
Sundar | Yes | 19:42 |
Sundar | OK, it still comes down to how gibi solved this problem, I suppose. I'll dig around. | 19:43 |
efried | *this* problem I don't think gibi solved. | 19:44 |
efried | He solved the renumbering problem. | 19:44 |
efried | I believe his PoC behaved quite similarly to what you're proposing - the resources/required gizmos live in the profile (the binding profile? or port? in his case) | 19:44 |
efried | and then he has to renumber those groups when he folds the profile in with the other flavor stuff. | 19:45 |
efried | They wrote up a blog post that walked through the PoC, and I think some of that becomes clear in the demo | 19:45 |
efried | https://rubasov.github.io/2018/09/21/openstack-qos-min-bw-demo.html | 19:46 |
efried | yeah, search for resource_request in the above doc and you'll see what I'm talking about. | 19:47 |
efried | each port profile looks like it gets to ask for resources for a single port. (I guess the actual VIF/VNIC resource is implicit? Not sure how that's working.) | 19:48 |
efried | So the resources/required keys are unnumbered in the port profile. | 19:48 |
efried | And then (not sure if this is in the doc, but I know it's true because it was in the original spec and I confirmed by asking the question in the room when they did the demo) those get numbered in a non-conflicting way when they get folded into the rest of the request. | 19:49 |
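The port's resource_request in that PoC is roughly this shape (unnumbered, scoped to one port):

```python
# Unnumbered resources/required on the port, as in the QoS bandwidth PoC:
resource_request = {
    "resources": {"NET_BW_EGR_KILOBIT_PER_SEC": 1000},
    "required": ["CUSTOM_PHYSNET_PHYSNET0", "CUSTOM_VNIC_TYPE_NORMAL"],
}
# Nova folds this into the flavor's request under an unused group number
# (e.g. resources3:NET_BW_EGR_KILOBIT_PER_SEC=1000) -- the same
# non-conflicting renumbering a Cyborg device profile would need.
```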
efried | Sundar: switching screens; ping me if you want to talk more. | 19:55 |