Monday, 2019-03-18

*** irclogbot_0 has quit IRC02:23
*** edmondsw has quit IRC02:26
*** edleafe has quit IRC02:27
openstackgerritchenker proposed openstack/cyborg master: Add the module used but not be imported  https://review.openstack.org/64384203:07
openstackgerritchenker proposed openstack/cyborg master: Add the module used but not be imported  https://review.openstack.org/64384203:11
openstackgerritchenker proposed openstack/cyborg master: Fix method 'test_discover' assertError in test_driver.py  https://review.openstack.org/64387408:09
openstackgerritchenker proposed openstack/cyborg master: Add the module used but not be imported  https://review.openstack.org/64384208:16
*** ikuo_o_ has joined #openstack-cyborg08:37
*** ikuo_o_ is now known as ikuo_o08:40
openstackgerritchenker proposed openstack/cyborg master: Fix method 'test_discover' assertError in test_driver.py  https://review.openstack.org/64387408:40
openstackgerritchenker proposed openstack/cyborg master: Fix method 'test_discover' assertError in test_driver.py  https://review.openstack.org/64387410:57
*** ikuo_o has quit IRC11:09
*** edmondsw has joined #openstack-cyborg11:50
*** edleafe has joined #openstack-cyborg12:21
openstackgerritchenker proposed openstack/cyborg master: Fix method 'test_discover' assertError in test_driver.py  https://review.openstack.org/64387412:39
*** shaohe_feng_ has joined #openstack-cyborg14:37
openstackgerritOpenStack Release Bot proposed openstack/os-acc master: Update master for stable/stein  https://review.openstack.org/64401814:41
shaohe_feng_hello everyone15:00
*** wangzhh has joined #openstack-cyborg15:02
shaohe_feng_evening wangzhh15:03
wangzhhevening shaohe.15:03
shaohe_feng_#startmeeting openstack-cyborg-driver15:04
openstackMeeting started Mon Mar 18 15:04:20 2019 UTC and is due to finish in 60 minutes.  The chair is shaohe_feng_. Information about MeetBot at http://wiki.debian.org/MeetBot.15:04
openstackUseful Commands: #action #agreed #help #info #idea #link #topic #startvote.15:04
*** openstack changes topic to " (Meeting topic: openstack-cyborg-driver)"15:04
openstackThe meeting name has been set to 'openstack_cyborg_driver'15:04
shaohe_feng_let's wait for a minute.15:04
shaohe_feng_#info shaohe_feng_15:05
wangzhhFine.15:05
*** xinranwang has joined #openstack-cyborg15:05
xinranwangHi all15:06
wangzhhHi xinran.15:06
xinranwangSorry for being late15:06
xinranwang#info xinranwang15:06
shaohe_feng_evening xinranwang15:06
wangzhh#info wangzhh15:06
xinranwanghi shaohe_feng_  wangzhh15:06
shaohe_feng_we have not held this meeting for a long time.15:06
shaohe_feng_#link https://wiki.openstack.org/wiki/Meetings/CyborgDriverTeamMeeting#Agenda_for_next_meeting_:_Mar_18th.2C_201915:07
shaohe_feng_here is the agent.15:07
*** Li_Liu has joined #openstack-cyborg15:07
shaohe_feng_s/agent/agenda15:07
Li_LiuHi Guys15:07
wangzhhHi, uncle Li.15:08
shaohe_feng_Li_Liu: morning uncle Li.15:08
Li_LiuHi, xiaohei~~15:08
Li_Liuhi shaohe15:08
Li_Liuyou guys wanna do a zoom meeting instead?15:08
shaohe_feng_I want to introduce some hardware accelerators.15:08
shaohe_feng_1. the currently known types of accelerator card15:09
shaohe_feng_as we all know cyborg will support mdev and pci card.15:09
shaohe_feng_but now I find there are 2 other kinds of hardware card we can support.15:10
shaohe_feng_one is ip over PCIE, another is USB.15:10
Li_Liui see15:10
shaohe_feng_wangzhh: do you know these two kinds of cards?15:11
Li_Liucan they fit into our current design?15:11
shaohe_feng_not sure, so we need to discuss more with them.15:11
wangzhhI don't know much about ip over pcie, what does that mean?15:11
Li_LiuI think it's a remote case15:12
Li_LiuPCI over ethernet?15:12
shaohe_feng_Li_Liu: yes.15:13
shaohe_feng_#link https://www.intel.com/content/dam/www/public/us/en/documents/product-briefs/vca-2-visual-compute-accelerator-product-brief.pdf15:13
shaohe_feng_Li_Liu: No, IP over pci.15:13
Li_Liufrom the operating system's point of view, it's still a pci device, right?15:14
shaohe_feng_Li_Liu: it is a pci device, but you communicate with it by IP.15:14
shaohe_feng_Li_Liu: it is a local pci card.15:14
shaohe_feng_such as the vca2 card, see link above,15:15
Li_Liuyou mean other hosts can communicate with it over ethernet?15:15
wangzhhSo, actually, it is a pci device?15:16
shaohe_feng_oh, the local host communicates with the local card, over PCIE.15:16
wangzhhfrom os view.15:16
shaohe_feng_wangzhh: from the os view, you can see it as a new kind of device, maybe with a new driver.15:17
shaohe_feng_there's another card I attended a meeting about last week; it seems this is a common approach for some cards.15:18
shaohe_feng_we can dig more about this kind of card.15:18
shaohe_feng_for usb card, the movidius AI card is this kind.15:19
Li_LiuI think we just need to make sure 2 things: 1. can os-acc attach it like all the other devices, 2. can the resource fit into our current data model15:20
shaohe_feng_yes.15:20
Li_Liuas long as these two requirements can be met, we should be good15:20
wangzhhIt makes sense.15:21
shaohe_feng_I think the usb devices can satisfy these two requirements.15:21
Li_Liushaohe_feng_ have they finalized the resource structure yet?15:22
shaohe_feng_Li_Liu: usb, yes.15:23
shaohe_feng_just a reminder about these 2 kinds of devices.15:23
wangzhhHow about another one?15:23
shaohe_feng_wangzhh: I have not looked into it well yet.15:24
shaohe_feng_OK, let's go ahead.15:24
wangzhhOK.15:24
shaohe_feng_Re-enumeration of hardware card15:24
shaohe_feng_most of us know the issue of Re-enumeration.15:25
Li_Liuthe issue we discussed last week?15:25
shaohe_feng_no, but this is a common issue.15:26
shaohe_feng_the bus of a hardware card may change after we resize the hardware and reboot.15:26
shaohe_feng_it seems this is a big problem for accelerator management in cyborg.15:27
shaohe_feng_I have discussed it with Yongli, the main PCI device contributor in nova.15:28
shaohe_feng_he says nova does not allow resizing hardware15:28
Li_Liuyou mean add/remove device after reboot?15:29
shaohe_feng_yes15:29
shaohe_feng_unless evict all VMs from this node.15:30
shaohe_feng_Li_Liu: wangzhh: xinranwang: what do you think about it?15:30
shaohe_feng_Or do you have any good ideas for hardware resize?15:31
wangzhhWuu, IMO, it's better to change status to error or offline in cyborg.15:31
wangzhhAnd let the operator sync it manually.15:31
Li_LiuLet's say before restart we have 3 cards, after restart we now have 4 cards15:32
Li_LiuI think driver can find out which one is the new one right?15:32
wangzhhWe can supply a tool or api for operator to update it.15:32
xinranwangIf we plug in a new card on server, and reboot. But the hw resource assigned to an instance does not change.15:32
xinranwangwill the bdf change? if so, that should be an issue.15:33
wangzhhLi_liu, as xinran said.15:33
Li_Liuthe bdf might change, but we don't need to guarantee giving the user the card with the same bdf15:33
shaohe_feng_the bdf may change, and the bus-port of a usb device may also change.15:34
Li_Liujust give user the card with the same type15:34
wangzhhThe most tricky thing is how to handle the resource which has been assigned.15:35
xinranwangif user has done some work on old hw, that will be a loss.15:36
Li_Liuin that case, operator has to notify the user to backup first15:37
Li_Liusize operator should know when the resizing is happening15:37
Li_Liusince*15:37
Li_LiuIn 99% of the scenarios tho, I don't think it matters anyway15:38
wangzhhWhat about power failure?15:38
xinranwanghow will nova record the hw resource from cyborg, there should be a field of nova instance to record this.  is this attach_handle_uuid?15:39
Li_Liuif a power failure happens, the device should not be resized, right?15:39
shaohe_feng_if you hotplug hardware before the failure happens, things are also bad.15:40
wangzhhLi, If we just reboot the server, the bus wont't change?15:42
wangzhh*won't15:42
Li_Liulol... as I said.. if operator wants to do this... he/she needs to notify users...15:43
shaohe_feng_if you do not resize hardware, the bus won't change.15:43
shaohe_feng_Li_Liu: yes.15:43
xinranwangwangzhh:  no, it will not change15:43
wangzhhshaohe_feng_, Got it.15:43
Li_Liuwangzhh, I think simple reboot should not change the bdf15:43
Li_Liubios just scans the pci tree15:43
Li_Liuif nothing new is inserted, it should not change15:44
shaohe_feng_live migrate the VM to another host.15:44
xinranwangthat's more complex...15:44
wangzhhscheduler filter should deal with this part. shaohe_feng_15:45
shaohe_feng_the data center can scale its hardware. For example, they want to support more AI cards in their existing hosts.15:46
shaohe_feng_OK, let's keep this issue in mind, maybe we can find a good way to solve it15:47
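[Editor's sketch] The re-enumeration problem discussed above can be illustrated with a small inventory comparison. This is only a sketch under a loud assumption: each card exposes some stable identifier (called `stable_id` here, a hypothetical name) that survives a reboot, while its bdf or usb bus-port may change. Nothing here is Cyborg or Nova code; `reconcile` and `Diff` are made-up names.

```python
from typing import Dict, List, NamedTuple, Tuple


class Diff(NamedTuple):
    moved: Dict[str, Tuple[str, str]]  # stable_id -> (old address, new address)
    added: List[str]                   # stable_ids seen only after the reboot
    missing: List[str]                 # stable_ids recorded before but not found now


def reconcile(recorded: Dict[str, str], discovered: Dict[str, str]) -> Diff:
    """Compare the recorded inventory with a fresh discovery pass.

    Both arguments map a hypothetical stable_id to a bus address (bdf or
    usb bus-port). Addresses are treated as opaque strings.
    """
    moved = {
        sid: (recorded[sid], discovered[sid])
        for sid in recorded
        if sid in discovered and recorded[sid] != discovered[sid]
    }
    added = sorted(sid for sid in discovered if sid not in recorded)
    missing = sorted(sid for sid in recorded if sid not in discovered)
    return Diff(moved, added, missing)


# Three cards before the reboot, four after; one new card was inserted
# and the two cards behind it were re-enumerated.
before = {"card-a": "0000:3b:00.0", "card-b": "0000:5e:00.0", "card-c": "0000:86:00.0"}
after = {"card-a": "0000:3b:00.0", "card-d": "0000:5e:00.0",
         "card-b": "0000:86:00.0", "card-c": "0000:af:00.0"}
diff = reconcile(before, after)
```

With a result like this, assigned cards that appear in `moved` or `missing` could be flagged as error or offline for the operator to sync, as suggested in the discussion, while cards in `added` are simply new inventory.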
shaohe_feng_go ahead.15:47
shaohe_feng_multi-level resources support15:47
shaohe_feng_now I want to support a new multi-level card.15:48
shaohe_feng_similar to fpga card.15:48
shaohe_feng_for example, there is one region in a card but 4 functions in a region.15:49
Li_Liusure, to support new cards. as long as it can meet the requirements I mentioned earlier15:49
Li_Liu4 different functions?15:49
shaohe_feng_there are 3 requirements:15:49
shaohe_feng_Li_Liu: in my new card, they are the same function, but for fpga, they may be different functions. fpga is more complex.15:50
shaohe_feng_1. we should know the topology of these devices.15:51
shaohe_feng_2. user can apply for any level of the resources, for example, a whole region or just one function.15:52
shaohe_feng_3. avoid fragmentation15:52
shaohe_feng_Li_Liu: now cyborg satisfies the first 2 requirements, right?15:53
Li_Liushaohe_feng_ it should15:54
shaohe_feng_OK, great.15:54
shaohe_feng_what about 3?15:54
Li_Liucyborg was designed to have these in mind15:54
shaohe_feng_good.15:54
Li_Liuthe 3rd one is related to scheduling algorithm15:54
Li_Liuwe might need to work with nova weigher for that15:55
shaohe_feng_Li_Liu: that needs cyborg's help.15:55
Li_Liuthat's for sure15:55
shaohe_feng_let me elaborate it15:55
shaohe_feng_3 regions15:56
Li_Liucyborg can provide a weigher-like mechanism and work with nova15:56
shaohe_feng_each region with 4 functions.15:56
shaohe_feng_user 1 applies for one function from region 115:56
shaohe_feng_user 2 wants 2 more functions. I expect cyborg to allocate them from region 1 instead of region 2/3.15:58
shaohe_feng_user 3 wants one more function; it is also from region 1.15:58
shaohe_feng_the allocation should not scatter among regions 1, 2 and 315:59
shaohe_feng_they should be concentrated in 1 region.16:00
Li_Liuthat should be easy to do. a weigher would do the job16:00
shaohe_feng_so user 4 can apply for the remaining 2 whole regions.16:01
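[Editor's sketch] The allocation example above (3 regions, 4 functions each, requests packed into region 1 first) can be sketched as a best-fit choice. This is an illustration only, not the Cyborg or Nova weigher API; `pick_region` and `allocate` are hypothetical names, and the best-fit rule is one assumed way to avoid fragmentation.

```python
from typing import Dict, Optional


def pick_region(free: Dict[str, int], wanted: int) -> Optional[str]:
    """Best fit: choose the region with the fewest free functions that
    can still satisfy the request. Ties are broken by region name."""
    candidates = [(count, name) for name, count in free.items() if count >= wanted]
    if not candidates:
        return None
    return min(candidates)[1]


def allocate(free: Dict[str, int], wanted: int) -> Optional[str]:
    """Reserve `wanted` functions and return the chosen region, or None
    if no single region has enough free functions."""
    region = pick_region(free, wanted)
    if region is not None:
        free[region] -= wanted
    return region


# Free functions per region, as in the example from the meeting.
free = {"region1": 4, "region2": 4, "region3": 4}
user1 = allocate(free, 1)  # region1
user2 = allocate(free, 2)  # region1
user3 = allocate(free, 1)  # region1, which is now full
# Regions 2 and 3 are still whole, so user 4 can take both of them.
user4 = [allocate(free, 4), allocate(free, 4)]
```

A spreading policy would instead pick the region with the most free functions, which would leave no whole region for user 4 in this scenario.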
shaohe_feng_Li_Liu: OK, is there a weigher mechanism for it now?16:02
Li_Liunot yet16:02
Li_Liuwe can plan this16:02
shaohe_feng_OK, good.16:02
Li_Liucoz I think numa scheduling also needs this feature16:02
shaohe_feng_this is useful.16:02
Li_Liufor sure16:03
Li_LiuI will add this to T release planning16:03
shaohe_feng_there's a common scenario for this feature.16:03
shaohe_feng_Li_Liu: good, thanks.16:04
Li_Liunpnp16:04
shaohe_feng_AoB?16:04
Li_LiuI need to pick up my lunch now, you guys can go ahead. don't stay too late.. :P16:05
shaohe_feng_Li_Liu: wangzhh: xinranwang ?16:05
Li_LiuI am all good16:05
shaohe_feng_good.16:05
shaohe_feng_glad to talk with you.16:05
wangzhhMe, too.16:05
shaohe_feng_let's end the meeting.16:06
xinranwangi am fine with that. NUMA should also need a similar mechanism16:06
shaohe_feng_thanks all.16:06
shaohe_feng_#endmeeting16:06
*** openstack changes topic to "Pending patches (Meeting topic: openstack-cyborg)"16:06
openstackMeeting ended Mon Mar 18 16:06:55 2019 UTC.  Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4)16:06
openstackMinutes:        http://eavesdrop.openstack.org/meetings/openstack_cyborg_driver/2019/openstack_cyborg_driver.2019-03-18-15.04.html16:06
openstackMinutes (text): http://eavesdrop.openstack.org/meetings/openstack_cyborg_driver/2019/openstack_cyborg_driver.2019-03-18-15.04.txt16:07
openstackLog:            http://eavesdrop.openstack.org/meetings/openstack_cyborg_driver/2019/openstack_cyborg_driver.2019-03-18-15.04.log.html16:07
wangzhhGood night. Bye.16:07
shaohe_feng_wangzhh: good night. bye.16:07
*** shaohe_feng_ has quit IRC16:11
*** wangzhh has quit IRC18:12
*** dustinc is now known as dustinc|lunch18:13
*** xinranwang has quit IRC18:15
*** dustinc|lunch is now known as dustinc19:23
*** irclogbot_0 has joined #openstack-cyborg20:12
*** irclogbot_0 has quit IRC20:25
*** irclogbot_0 has joined #openstack-cyborg20:27
*** irclogbot_0 has quit IRC21:05
*** irclogbot_0 has joined #openstack-cyborg21:07
*** irclogbot_0 has quit IRC21:26
*** irclogbot_0 has joined #openstack-cyborg21:28
