*** irclogbot_0 has quit IRC | 02:23 | |
*** edmondsw has quit IRC | 02:26 | |
*** edleafe has quit IRC | 02:27 | |
openstackgerrit | chenker proposed openstack/cyborg master: Add the module used but not be imported https://review.openstack.org/643842 | 03:07 |
---|---|---|
openstackgerrit | chenker proposed openstack/cyborg master: Add the module used but not be imported https://review.openstack.org/643842 | 03:11 |
openstackgerrit | chenker proposed openstack/cyborg master: Fix method 'test_discover' assertError in test_driver.py https://review.openstack.org/643874 | 08:09 |
openstackgerrit | chenker proposed openstack/cyborg master: Add the module used but not be imported https://review.openstack.org/643842 | 08:16 |
*** ikuo_o_ has joined #openstack-cyborg | 08:37 | |
*** ikuo_o_ is now known as ikuo_o | 08:40 | |
openstackgerrit | chenker proposed openstack/cyborg master: Fix method 'test_discover' assertError in test_driver.py https://review.openstack.org/643874 | 08:40 |
openstackgerrit | chenker proposed openstack/cyborg master: Fix method 'test_discover' assertError in test_driver.py https://review.openstack.org/643874 | 10:57 |
*** ikuo_o has quit IRC | 11:09 | |
*** edmondsw has joined #openstack-cyborg | 11:50 | |
*** edleafe has joined #openstack-cyborg | 12:21 | |
openstackgerrit | chenker proposed openstack/cyborg master: Fix method 'test_discover' assertError in test_driver.py https://review.openstack.org/643874 | 12:39 |
*** shaohe_feng_ has joined #openstack-cyborg | 14:37 | |
openstackgerrit | OpenStack Release Bot proposed openstack/os-acc master: Update master for stable/stein https://review.openstack.org/644018 | 14:41 |
shaohe_feng_ | hello everyone | 15:00 |
*** wangzhh has joined #openstack-cyborg | 15:02 | |
shaohe_feng_ | evening wangzhh | 15:03 |
wangzhh | evening shaohe. | 15:03 |
shaohe_feng_ | #startmeeting openstack-cyborg-driver | 15:04 |
openstack | Meeting started Mon Mar 18 15:04:20 2019 UTC and is due to finish in 60 minutes. The chair is shaohe_feng_. Information about MeetBot at http://wiki.debian.org/MeetBot. | 15:04 |
openstack | Useful Commands: #action #agreed #help #info #idea #link #topic #startvote. | 15:04 |
*** openstack changes topic to " (Meeting topic: openstack-cyborg-driver)" | 15:04 | |
openstack | The meeting name has been set to 'openstack_cyborg_driver' | 15:04 |
shaohe_feng_ | let's wait for a minute. | 15:04 |
shaohe_feng_ | #info shaohe_feng_ | 15:05 |
wangzhh | Fine. | 15:05 |
*** xinranwang has joined #openstack-cyborg | 15:05 | |
xinranwang | Hi all | 15:06 |
wangzhh | Hi xinran. | 15:06 |
xinranwang | Sorry for being late | 15:06 |
xinranwang | #info xinranwang | 15:06 |
shaohe_feng_ | evening xinranwang | 15:06 |
wangzhh | #info wangzhh | 15:06 |
xinranwang | hi shaohe_feng_ wangzhh | 15:06 |
shaohe_feng_ | we have not held this meeting for a long time. | 15:06 |
shaohe_feng_ | #link https://wiki.openstack.org/wiki/Meetings/CyborgDriverTeamMeeting#Agenda_for_next_meeting_:_Mar_18th.2C_2019 | 15:07 |
shaohe_feng_ | here is the agent. | 15:07 |
*** Li_Liu has joined #openstack-cyborg | 15:07 | |
shaohe_feng_ | s/agent/agenda | 15:07 |
Li_Liu | Hi guys | 15:07 |
wangzhh | Hi, uncle Li. | 15:08 |
shaohe_feng_ | Li_Liu: morning uncle Li. | 15:08 |
Li_Liu | Hi, xiaohei~~ | 15:08 |
Li_Liu | hi shaohe | 15:08 |
Li_Liu | you guys wanna do a zoom meeting instead? | 15:08 |
shaohe_feng_ | I want to introduce some hardware accelerators. | 15:08 |
shaohe_feng_ | 1. the currently known types of accelerator card | 15:09 |
shaohe_feng_ | as we all know, cyborg will support mdev and PCI cards. | 15:09 |
shaohe_feng_ | but now I find there are 2 other kinds of hardware card we can support. | 15:10 |
shaohe_feng_ | one is IP over PCIe, the other is USB. | 15:10 |
Li_Liu | i see | 15:10 |
shaohe_feng_ | wangzhh: do you know these two kinds of cards? | 15:11 |
Li_Liu | can they fit into our current design? | 15:11 |
shaohe_feng_ | not sure, so we need to discuss more with them. | 15:11 |
wangzhh | I don't know much about ip over pcie, what does that mean? | 15:11 |
Li_Liu | I think it's a remote case | 15:12 |
Li_Liu | PCI over ethernet? | 15:12 |
shaohe_feng_ | Li_Liu: yes. | 15:13 |
shaohe_feng_ | #link https://www.intel.com/content/dam/www/public/us/en/documents/product-briefs/vca-2-visual-compute-accelerator-product-brief.pdf | 15:13 |
shaohe_feng_ | Li_Liu: No, IP over pci. | 15:13 |
Li_Liu | from the operating system's point of view, it's still a PCI device, right? | 15:14 |
shaohe_feng_ | Li_Liu: it is a PCI device, but you communicate with it by IP. | 15:14 |
shaohe_feng_ | Li_Liu: it is a local pci card. | 15:14 |
shaohe_feng_ | such as the vca2 card, see link above, | 15:15 |
Li_Liu | you mean other hosts can communicate with it over ethernet? | 15:15 |
wangzhh | So, actually, it is a pci device? | 15:16 |
shaohe_feng_ | oh, the local host communicates with the local card over PCIe. | 15:16 |
wangzhh | from os view. | 15:16 |
shaohe_feng_ | wangzhh: from the OS view, you may see it as a new kind of device with a new driver. | 15:17 |
shaohe_feng_ | there's another card I attended a meeting about last week; it seems this is a common approach for some cards. | 15:18 |
shaohe_feng_ | we can dig more into this kind of card. | 15:18 |
shaohe_feng_ | for USB cards, the Movidius AI card is of this kind. | 15:19 |
Li_Liu | I think we just need to make sure 2 things: 1. can os-acc attach it like all the other devices, 2. can the resource fit into our current data model | 15:20 |
shaohe_feng_ | yes. | 15:20 |
Li_Liu | as long as these two requirements can be met, we should be good | 15:20 |
wangzhh | It makes sense. | 15:21 |
shaohe_feng_ | I think the usb devices can satisfy these two requirements. | 15:21 |
Li_Liu | shaohe_feng_ have they finalized the resource structure yet? | 15:22 |
shaohe_feng_ | Li_Liu: usb, yes. | 15:23 |
shaohe_feng_ | just a reminder about these 2 kinds of devices. | 15:23 |
wangzhh | How about another one? | 15:23 |
shaohe_feng_ | wangzhh: I have not looked into it well yet. | 15:24 |
shaohe_feng_ | OK, let's go ahead. | 15:24 |
wangzhh | OK. | 15:24 |
shaohe_feng_ | Re-enumeration of hardware card | 15:24 |
shaohe_feng_ | most of us know the issue of Re-enumeration. | 15:25 |
Li_Liu | the issue we discussed last week? | 15:25 |
shaohe_feng_ | no, but this is a common issue. | 15:26 |
shaohe_feng_ | the bus of a hardware card may change after we resize the hardware and reboot. | 15:26 |
shaohe_feng_ | this seems to be a big problem for accelerator management in cyborg. | 15:27 |
shaohe_feng_ | I have discussed it with Yongli, the main PCI device contributor in nova. | 15:28 |
shaohe_feng_ | he says nova does not allow resizing hardware | 15:28 |
Li_Liu | you mean add/remove device after reboot? | 15:29 |
shaohe_feng_ | yes | 15:29 |
shaohe_feng_ | unless evict all VMs from this node. | 15:30 |
shaohe_feng_ | Li_Liu: wangzhh: xinranwang: what do you think about it? | 15:30 |
shaohe_feng_ | Or do you have any good ideas for hardware resize? | 15:31 |
wangzhh | Wuu, IMO, it's better to change status to error or offline in cyborg. | 15:31 |
wangzhh | And let the operator sync it manually. | 15:31 |
Li_Liu | Let's say before restart we have 3 cards, after restart we now have 4 cards | 15:32 |
Li_Liu | I think driver can find out which one is the new one right? | 15:32 |
wangzhh | We can supply a tool or api for operator to update it. | 15:32 |
xinranwang | If we plug in a new card on server, and reboot. But the hw resource assigned to an instance does not change. | 15:32 |
xinranwang | will the bdf change? if so, that should be an issue. | 15:33 |
wangzhh | Li_liu, as xinran said. | 15:33 |
Li_Liu | the bdf might change, but we don't need to guarantee giving the user the card with the same bdf | 15:33 |
shaohe_feng_ | the bdf may change; the bus-port of a USB device may also change. | 15:34 |
Li_Liu | just give user the card with the same type | 15:34 |
wangzhh | The trickiest thing is how to handle the resources that have already been assigned. | 15:35 |
xinranwang | if user has done some work on old hw, that will be a loss. | 15:36 |
Li_Liu | in that case, operator has to notify the user to backup first | 15:37 |
Li_Liu | size operator should know when the resizing is happening | 15:37 |
Li_Liu | since* | 15:37 |
Li_Liu | In 99% of the scenarios tho, I don't think it matters anyway | 15:38 |
wangzhh | What about power failure? | 15:38 |
xinranwang | how will nova record the hw resource from cyborg, there should be a field of nova instance to record this. is this attach_handle_uuid? | 15:39 |
Li_Liu | if a power failure happens, the device should not be resized, right? | 15:39 |
shaohe_feng_ | if you hotplug hardware before the failure happens, things are also bad. | 15:40 |
wangzhh | Li, If we just reboot the server, the bus wont't change? | 15:42 |
wangzhh | *won't | 15:42 |
Li_Liu | lol... as I said.. if operator wants to do this... he/she needs to notify users... | 15:43 |
shaohe_feng_ | if you do not resize hardware, the bus won't change. | 15:43 |
shaohe_feng_ | Li_Liu: yes. | 15:43 |
xinranwang | wangzhh: no, it will not change | 15:43 |
wangzhh | shaohe_feng_, Got it. | 15:43 |
Li_Liu | wangzhh, I think simple reboot should not change the bdf | 15:43 |
Li_Liu | bios just scan the pci tree | 15:43 |
Li_Liu | if nothing new is inserted, it should not change | 15:44 |
shaohe_feng_ | live migrate the VM to another host. | 15:44 |
xinranwang | that's more complex... | 15:44 |
wangzhh | scheduler filter should deal with this part. shaohe_feng_ | 15:45 |
shaohe_feng_ | the data center can scale its hardware. For example, they want to support more AI cards in their existing hosts. | 15:46 |
shaohe_feng_ | OK, let's keep this issue in mind, maybe we can find a good way to solve it | 15:47 |
shaohe_feng_ | go ahead. | 15:47 |
shaohe_feng_ | multi-level resources support | 15:47 |
shaohe_feng_ | now I want to support a new multi-level card. | 15:48 |
shaohe_feng_ | similar to an FPGA card. | 15:48 |
shaohe_feng_ | for example, there is one region in a card but 4 functions in a region. | 15:49 |
Li_Liu | sure, to support new cards. as long as it can meet the requirements I mentioned earlier | 15:49 |
Li_Liu | 4 different functions? | 15:49 |
shaohe_feng_ | there are 3 requirements: | 15:49 |
shaohe_feng_ | Li_Liu: in my new card, they are the same function, but for FPGA, they may be different functions. FPGA is more complex. | 15:50 |
shaohe_feng_ | 1. we should know the topology of this device. | 15:51 |
shaohe_feng_ | 2. the user can apply for any level of the resources; for example, he wants to apply for a region or just one function. | 15:52 |
shaohe_feng_ | 3. avoid fragmentation | 15:52 |
shaohe_feng_ | Li_Liu: now cyborg satisfies the first 2 requirements, right? | 15:53 |
Li_Liu | shaohe_feng_ it should | 15:54 |
shaohe_feng_ | OK, great. | 15:54 |
shaohe_feng_ | what about 3? | 15:54 |
Li_Liu | cyborg was designed to have these in mind | 15:54 |
shaohe_feng_ | good. | 15:54 |
Li_Liu | the 3rd one is related to scheduling algorithm | 15:54 |
Li_Liu | we might need to work with nova weigher for that | 15:55 |
shaohe_feng_ | Li_Liu: that needs cyborg's help. | 15:55 |
Li_Liu | that's for sure | 15:55 |
shaohe_feng_ | let me elaborate it | 15:55 |
shaohe_feng_ | 3 regions | 15:56 |
Li_Liu | cyborg can provide a weigher like mechanism and work with nova | 15:56 |
shaohe_feng_ | each region has 4 functions. | 15:56 |
shaohe_feng_ | user 1 applies for one function from region 1 | 15:56 |
shaohe_feng_ | user 2 wants 2 more functions. I expect cyborg to allocate them from region 1 instead of region 2/3. | 15:58 |
shaohe_feng_ | user 3 wants one more function; it also comes from region 1. | 15:58 |
shaohe_feng_ | the allocations should not be scattered among regions 1, 2 and 3 | 15:59 |
shaohe_feng_ | they should be concentrated in 1 region. | 16:00 |
Li_Liu | that should be easy to do. a weigher would do the job | 16:00 |
shaohe_feng_ | so user 4 can apply for the remaining 2 whole regions. | 16:01 |
shaohe_feng_ | Li_Liu: OK, is there a weigher mechanism for it now? | 16:02 |
Li_Liu | not yet | 16:02 |
Li_Liu | we can plan this | 16:02 |
shaohe_feng_ | OK, good. | 16:02 |
Li_Liu | coz I think numa scheduling also needs this feature | 16:02 |
shaohe_feng_ | this is useful. | 16:02 |
Li_Liu | for sure | 16:03 |
Li_Liu | I will add this to T release planning | 16:03 |
shaohe_feng_ | there's a common scenario for this feature. | 16:03 |
shaohe_feng_ | Li_Liu: good, thanks. | 16:04 |
Li_Liu | npnp | 16:04 |
shaohe_feng_ | AoB? | 16:04 |
Li_Liu | I need to pick up my lunch now, you guys can go ahead. don't stay too late.. :P | 16:05 |
shaohe_feng_ | Li_Liu: wangzhh: xinranwang ? | 16:05 |
Li_Liu | I am all good | 16:05 |
shaohe_feng_ | good. | 16:05 |
shaohe_feng_ | glad to talk with you. | 16:05 |
wangzhh | Me, too. | 16:05 |
shaohe_feng_ | let's end the meeting. | 16:06 |
xinranwang | i am fine with that. NUMA should also need a similar mechanism | 16:06 |
shaohe_feng_ | thanks all. | 16:06 |
shaohe_feng_ | #endmeeting | 16:06 |
*** openstack changes topic to "Pending patches (Meeting topic: openstack-cyborg)" | 16:06 | |
openstack | Meeting ended Mon Mar 18 16:06:55 2019 UTC. Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4) | 16:06 |
openstack | Minutes: http://eavesdrop.openstack.org/meetings/openstack_cyborg_driver/2019/openstack_cyborg_driver.2019-03-18-15.04.html | 16:06 |
openstack | Minutes (text): http://eavesdrop.openstack.org/meetings/openstack_cyborg_driver/2019/openstack_cyborg_driver.2019-03-18-15.04.txt | 16:07 |
openstack | Log: http://eavesdrop.openstack.org/meetings/openstack_cyborg_driver/2019/openstack_cyborg_driver.2019-03-18-15.04.log.html | 16:07 |
wangzhh | Good night. Bye. | 16:07 |
shaohe_feng_ | wangzhh: good night. bye. | 16:07 |
*** shaohe_feng_ has quit IRC | 16:11 | |
*** wangzhh has quit IRC | 18:12 | |
*** dustinc is now known as dustinc|lunch | 18:13 | |
*** xinranwang has quit IRC | 18:15 | |
*** dustinc|lunch is now known as dustinc | 19:23 | |
*** irclogbot_0 has joined #openstack-cyborg | 20:12 | |
*** irclogbot_0 has quit IRC | 20:25 | |
*** irclogbot_0 has joined #openstack-cyborg | 20:27 | |
*** irclogbot_0 has quit IRC | 21:05 | |
*** irclogbot_0 has joined #openstack-cyborg | 21:07 | |
*** irclogbot_0 has quit IRC | 21:26 | |
*** irclogbot_0 has joined #openstack-cyborg | 21:28 |
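
Editor's sketch for the "multi-level resources support" topic in the meeting above: the anti-fragmentation behaviour shaohe_feng_ describes (3 regions with 4 functions each, small requests packed into one region so whole regions stay free) can be illustrated with a simple "fullest region that still fits" weigher. This is only an illustration under stated assumptions. `Region`, `pick_region` and `allocate` are hypothetical names, not Cyborg or Nova APIs, and the log itself says no such weigher existed at the time.

```python
from dataclasses import dataclass


@dataclass
class Region:
    """Hypothetical model of one region on a card (not a Cyborg object)."""
    name: str
    total_functions: int = 4
    allocated: int = 0

    @property
    def free(self) -> int:
        return self.total_functions - self.allocated


def pick_region(regions, wanted):
    """Return the region that fits `wanted` functions with the fewest free
    functions left (a packing weigher), or None if no region fits."""
    candidates = [r for r in regions if r.free >= wanted]
    if not candidates:
        return None
    # Prefer the fullest region that still fits; ties keep list order.
    return min(candidates, key=lambda r: r.free)


def allocate(regions, wanted):
    """Allocate `wanted` functions from a single region; return its name."""
    region = pick_region(regions, wanted)
    if region is None:
        raise RuntimeError("no single region can satisfy the request")
    region.allocated += wanted
    return region.name


# Scenario from the meeting: users 1, 2 and 3 ask for 1, 2 and 1 functions.
regions = [Region("region1"), Region("region2"), Region("region3")]
placements = [allocate(regions, n) for n in (1, 2, 1)]
# Regions left completely untouched, available to a user who wants whole regions.
whole_free = [r.name for r in regions if r.allocated == 0]
print(placements, whole_free)
```

With this ordering, the three small requests all land in region1, so user 4 can still be given region2 and region3 whole. A real implementation would have to cooperate with Nova scheduling, as Li_Liu notes in the discussion.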