Wednesday, 2019-03-13

*** tetsuro has joined #openstack-cyborg00:44
*** tetsuro has quit IRC01:47
*** Sundar has joined #openstack-cyborg02:00
*** Li_Liu has joined #openstack-cyborg02:45
*** xinranwang has joined #openstack-cyborg02:54
*** Coco_gao has joined #openstack-cyborg02:57
Coco_gaohi all02:57
Li_LiuHi Coco02:58
Li_Liuhow's your trip?02:58
Coco_gaogreat02:58
Li_Liunice02:58
*** shaohe_feng_ has joined #openstack-cyborg02:59
xinranwangHi all02:59
Li_Liulet's wait for a couple more min for others02:59
shaohe_feng_hi all02:59
Coco_gaohi shaohe02:59
shaohe_feng_Coco_gao: morning.02:59
Li_Liu#startmeeting openstack-cyborg03:01
openstackMeeting started Wed Mar 13 03:01:17 2019 UTC and is due to finish in 60 minutes.  The chair is Li_Liu. Information about MeetBot at http://wiki.debian.org/MeetBot.03:01
openstackUseful Commands: #action #agreed #help #info #idea #link #topic #startvote.03:01
*** openstack changes topic to " (Meeting topic: openstack-cyborg)"03:01
openstackThe meeting name has been set to 'openstack_cyborg'03:01
Li_LiuLet's get started03:01
Li_Liu#topic Roll Call03:01
*** openstack changes topic to "Roll Call (Meeting topic: openstack-cyborg)"03:01
Li_Liu#info Li_Liu03:01
Coco_gao#info Coco_gao03:01
xinranwang#info xinranwang03:01
Li_Liuare sundar and zhenghao here yet?03:02
Li_Liu#topic Code Freeze Status Update03:02
*** openstack changes topic to "Code Freeze Status Update (Meeting topic: openstack-cyborg)"03:02
Li_Liuhttps://review.openstack.org/#/q/status:open%20project:openstack/cyborg03:03
Coco_gaoI will update my patch according to the comments these two days.03:03
Li_LiuCoco_gao, thanks03:03
Li_LiuWhy is https://review.openstack.org/#/c/574075/ not merged yet?03:04
Li_Liustrange ><||03:04
Coco_gaoThat depends on my patch03:04
Coco_gaobecause my patch is not merged.03:05
zhipengZuul not started03:05
Li_LiuI see03:05
*** wangzhh has joined #openstack-cyborg03:06
Li_LiuBy the hard deadline of code freeze, please add UT to the features you own03:06
wangzhhhi all03:06
Li_LiuHi wangzhh03:06
wangzhhSorry for being late.03:07
Coco_gaoI will add my UTs.03:08
Sundar#info Sundar03:08
Li_LiuHi Sundar03:08
SundarSorry for the delay03:08
SundarHi Li_Liu03:08
Coco_gaoHi Sundar03:08
SundarHi Coco_gao and all03:08
Li_LiuI think we are in good shape so far03:09
zhipengAny luck to have xilinx driver lol ?03:10
Li_Liuno updates tho03:10
Li_LiuI can follow up with Chuck later03:11
Li_Liubut prob not gonna make it by the deadline03:11
Li_Liuzhipeng, we can still refine our docs after the deadline right?03:12
zhipengNeed to do it before the RC03:13
SundarWhat is needed for docs?03:14
Li_LiuRC1 is Mar 18 - Mar 2203:14
Li_Liuhttps://docs.openstack.org/cyborg/latest/#developer-documentation03:15
Li_Liuyumeng already added quite some stuff there03:16
SundarLi_Liu: We have feedback that we should improve our API docs. I can document the current v1 API, if nobody else volunteers.03:16
Li_Liuwe need to keep improving it03:16
Li_LiuSundar, sure thanks a lot03:16
Li_LiuI will take some time to work on the python-client03:17
Li_Liuat least make it align with our docs and APIs03:17
Coco_gaoHi Sundar, still one thing about driver ovo. Is deployable name unique? why is that?03:18
Coco_gaoThanks a lot.03:18
SundarCoco_gao: I think so, because it will be used as resource provider name in Placement, and that must be unique AFAIK03:19
Li_LiuHow are we going to guarantee deployable's name's uniqueness03:19
Li_Liuare we doing the check when the resource is reported?03:20
Coco_gaowe can't if we set the name field in the drivers.03:20
SundarCoco_gao: I don't see explicit documentation that it must be unique. I will check and get back.03:20
SundarCoco_gao and all: why can't Cyborg agent construct the name from other fields like vendor, type, etc., and add a unique id?03:21
SundarE.g. 'INTEL_FPGA_PAC_CARD_ID1'03:21
Li_Liuis this ID1 a uuid?03:22
SundarLi_Liu: I was thinking a simple integer03:22
SundarOh, wait03:22
SundarThere is a convention for naming nested RPs03:22
SundarIt is based on compute node name03:22
SundarI will check and send email.03:23
xinranwangnow the deployable name is the filename in /sys/class/fpga, it's unique03:23
Sundarxinranwang: It is unique within a compute node03:24
SundarThe same name can repeat across nodes03:24
Coco_gaoSundar, I agree we'd better do that in agent.03:24
SundarWe are not reporting anything to Placement yet, right?03:25
Coco_gaoxinranwang, that's the problem across nodes, the name may be the same, right?03:25
Li_Liuhow about when we report the deployable to placement API, we concatenate name+uuid03:25
SundarLi_Liu: good idea. I'll get back with the name convention for nested RPs03:25
wangzhhxinranwang, what if different nodes have the same device? Is it unique?03:26
xinranwangif we support NRP, we can identify which host the deployable is located on, so should it be ok to have the same deployable name on different compute nodes?03:26
Coco_gaoxinranwang, that will be ok, i think.03:27
shaohe_feng_the fpga device name is generated by the kernel.03:27
wangzhhxinranwang, Not really, now it is globally unique.03:27
shaohe_feng_the name is unique03:27
shaohe_feng_it does not matter if different nodes have the same device03:28
Coco_gaothe reason why we need to keep it unique from the aspect of driver ovo is that we need to identify the deployable. But driver ovos are compared within the same node, so the name only needs to be unique within one host.03:28
shaohe_feng_Coco_gao: yes.03:29
wangzhhCoco, so we should change the db, it is globally unique now.03:29
shaohe_feng_for device@host is unique03:29
xinranwangso i think it's ok to have the same deployable name on different compute nodes, on the placement side. But the name should be unique on the same compute node.03:29
shaohe_feng_the name is not used to identify a device03:30
Coco_gaowangzhh, Sundar and all, maybe we need to change the db constraints on the deployable table, name field.03:30
Coco_gaodo you agree if I modify that?03:30
Li_Liuwhat constraint?03:31
shaohe_feng_just a Prompt for human03:31
Coco_gaothe name field is unique in deployable table.03:31
Li_Liuah, ok03:31
Li_Liugo ahead03:31
Li_Liuno problem on my side03:31
shaohe_feng_I agree03:31
Coco_gaoOK, thank you are for the advice.03:31
Coco_gaoall03:32
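The constraint change agreed above (deployable name unique per host instead of globally unique) could look roughly like the sketch below. It uses Python's built-in sqlite3 module and a hypothetical `deployables` table with `host` and `name` columns. Cyborg's real schema and migration tooling are not shown in this log, so every table, column, and function name here is an assumption made for illustration.

```python
import sqlite3


def make_db():
    """Create an in-memory database with a per-host uniqueness rule.

    The table and column names are hypothetical, not Cyborg's schema.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE deployables (
            id   INTEGER PRIMARY KEY,
            host TEXT NOT NULL,
            name TEXT NOT NULL,
            UNIQUE (host, name)  -- unique within a host, not globally
        )
        """
    )
    return conn


def add_deployable(conn, host, name):
    """Insert a deployable; return False if (host, name) already exists."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO deployables (host, name) VALUES (?, ?)",
                (host, name),
            )
        return True
    except sqlite3.IntegrityError:
        return False
```

With this rule, the same device name can be stored for two different hosts, while a second row with the same name on the same host is rejected.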
wangzhhOf course. But how to handle device like gpu, <device_name>_<address>?03:32
SundarCoco_gao: I think it is ok to make it unique because: there  is some proposed convention to name nested RPs like '<hostname>_<numaNode>_<x>' and x must be unique within a node anyway for us.03:32
shaohe_feng_just keep id/uuid unique. it is machine readable.03:32
shaohe_feng_unique in a node is ok.03:33
wangzhhshaohe, when driver report a device, it does not have a uuid.03:33
shaohe_feng_not need global03:33
shaohe_feng_wangzhh: agent gen one for it.  :)03:33
shaohe_feng_bus is also unique.03:34
shaohe_feng_bus is also machine  readable.03:34
Coco_gaoSundar, the problem is how to generate x to make sure same card is using the same x when reporting.03:34
wangzhhshaohe, agent will generate the uuid every time?03:34
shaohe_feng_wangzhh: no. just once.03:35
shaohe_feng_wangzhh: it need to check the bus.03:35
shaohe_feng_wangzhh: on a node, bus is used for machine read .03:35
shaohe_feng_on a cluster, uuid is used for machine read03:35
SundarThere may not be a PCI bdf in all hypervisors.03:36
wangzhhshaohe, I suppose you mean to generate it at first  time.03:36
shaohe_feng_Coco_gao: the x can be generated  by the bus.03:36
shaohe_feng_Coco_gao: let me show you an example03:36
shaohe_feng_wangzhh: yes.03:36
Coco_gaothanks shaohe03:36
Li_Liuif there's no bdf, can we use uuid?03:36
shaohe_feng_Li_Liu:  there's another identification without bdf03:37
shaohe_feng_for03:37
wangzhhBut agent doesn't know which time it is.03:37
wangzhhshaohe_feng_03:37
shaohe_feng_wangzhh: it needs to check. if the bus is not in the db, then it is the first time.03:38
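The "generate once, keyed by bus" idea shaohe_feng_ describes can be sketched as below. This is only an illustration: a plain dictionary stands in for the Cyborg db, and the class and method names are hypothetical. Whether the agent should consult the db or hold any state at all is disputed later in this meeting, so the sketch shows the proposal, not an agreed design.

```python
import uuid


class DeployableRegistry:
    """Hypothetical stand-in for the db lookup described in the log."""

    def __init__(self):
        # Maps (host, bus address) to the UUID assigned on first sight.
        self._uuid_by_bus = {}

    def get_or_create_uuid(self, host, bus):
        """Return the stored UUID for a device, generating it only once."""
        key = (host, bus)
        if key not in self._uuid_by_bus:
            # Bus not known yet, so this is the first report of the device.
            self._uuid_by_bus[key] = str(uuid.uuid4())
        return self._uuid_by_bus[key]
```

Repeated reports of the same bus address on the same host return the same UUID. If re-enumeration moves a device to a different bus address, this scheme treats it as a new device, which is the enumeration concern raised near the end of the meeting.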
Li_Liushaohe_feng_, sure that also works03:38
shaohe_feng_seems mdev has a uuid.03:38
shaohe_feng_and usb has its own bus.03:38
SundarThe driver should report a unique id within the node for each device. It could be PCI bdf for libvirt or whatever is unique for PowerVM and others03:38
wangzhhIf so, agent should query db first. do something like diff?03:38
SundarThen that could be the x factor03:39
Sundarwangzhh: No, agent should not query db. For 2 reasons: scaling, upgrades can change db schema03:39
shaohe_feng_wangzhh: yes. wen agent start. it should sync with db firstly03:39
wangzhh+103:39
shaohe_feng_when03:39
wangzhhshaohe_feng_ agent doesn't query db now.03:40
shaohe_feng_Sundar: no, it should sync when it start. and can keep the info in cache.03:40
SundarAgent should not keep state. Even if it reads db at startup, it cannot assume that it will remain in sync, because operator can update config03:40
SundarNo cache, please. We will hit all kinds of issues with stale caches, aging, etc.03:41
Li_Liushaohe_feng_, is the cache only containing the information related to the node?03:41
shaohe_feng_yes.03:41
wangzhhAgree with sundar at this part. :)03:41
shaohe_feng_its own node info.03:41
shaohe_feng_let me show you what I do.03:42
Li_LiuSundar, I think it should be ok if it only holds its own information in cache03:42
wangzhhshaohe_feng_ haha  talk is cheap, show me your code. :)03:42
SundarLi_Liu: The operator may want to disable or enable specific devices, or do other config.03:42
shaohe_feng_wangzhh: yes, I do show you code03:43
Coco_gaobefore diff, the agent should get  the old driver ovo, is that from db or cache?03:43
wangzhhCool.03:43
shaohe_feng_wangzhh: I have implemented it.03:43
SundarLi_Liu: Then we have to propagate such changes to each agent, ensure that it has received it, etc. The agent doesn't need any state for discovery -- just add a unique field that driver reports.03:43
shaohe_feng_I report the placement by: device_name@host this is  unique03:43
shaohe_feng_and I just put the device_name in cyborg db03:44
shaohe_feng_it works well, without any conflict,03:44
Coco_gaoI agree with shaohe.03:44
shaohe_feng_for placement use device_name@host for index.03:44
SundarCoco_gao: Again, there are some conventions proposed for nested RP names. I am still trying to find the spec/doc where I saw that.03:45
shaohe_feng_but cyborg does not use device_name for index.03:45
shaohe_feng_Sundar: that's 2 things, but if you want to keep it same. it is OK.03:46
Sundarshaohe_feng: There's no point in making them different. The only reason why we have a deployable name is to report to placement03:47
shaohe_feng_the big problem is not this.03:47
Li_LiuSundar, please help to find out the conventions. shaohe_feng_, could you share your code with us?03:48
shaohe_feng_the big problem is enumeration.03:48
xinranwangmaybe keep deployable name unique on same compute, and add hostname like "@host"  when report to placement.03:48
Li_LiuIt seems we need some further discussion on this issue, we can discuss it in tomorrow's zoom sync03:48
xinranwangi believe that's what shaohe_feng_  did.03:48
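The scheme xinranwang summarizes (names unique within a compute node, with "@host" appended when reporting to placement) can be sketched as below. The function names and the sample device names are illustrative assumptions; shaohe_feng_'s actual code is not shown in this log, and the nested resource provider naming convention Sundar mentions may differ from this format.

```python
def placement_name(device_name, host):
    """Build the name reported to placement as device_name@host."""
    return f"{device_name}@{host}"


def report_names(host, device_names):
    """Return placement names for one host's devices.

    Raises ValueError if a device name repeats within the host, since
    the name must be unique on a single compute node.
    """
    seen = set()
    reported = []
    for name in device_names:
        if name in seen:
            raise ValueError(f"duplicate device name on {host}: {name}")
        seen.add(name)
        reported.append(placement_name(name, host))
    return reported
```

Two nodes that both expose a device called `intel-fpga-dev.0` then report `intel-fpga-dev.0@node1` and `intel-fpga-dev.0@node2`, so the names stay distinct on the placement side even though they collide across nodes.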
shaohe_feng_Li_Liu: if there is a restart and some change on the host, the enumeration may change the bus of the same device.03:49
Li_Liushaohe_feng_, yea, I know03:50
shaohe_feng_I mean the cloud provider may resize the hardware on the host03:50
shaohe_feng_so that's what we really need to care about03:50
shaohe_feng_after all, the machine needs the bus to identify a device, not the name.03:51
shaohe_feng_Li_Liu: yes.  maybe we care about the same thing.03:51
Li_Liushaohe_feng_, driver should do the mapping from bus to device name/id I believe03:52
shaohe_feng_Li_Liu: yes, that's what we need to improve.03:53
Li_Liuok03:54
Sundar@all: Please look at https://git.openstack.org/cgit/openstack/nova-specs/tree/specs/stein/approved/numa-topology-with-rps.rst?h=refs/changes/24/552924/14#n16303:55
Coco_gaoshaohe_feng, that will be ok if the name changes: the conductor will delete the old device with name1 and add the new device to the db with name2. So the db still matches the real situation exactly.03:55
Li_LiuSince it's pretty late for me here. Let's close it up and discuss more detail in tomorrow's sync up03:55
shaohe_feng_Li_Liu: OK.03:56
Coco_gaobut that name change for one device is not supposed to be frequent.03:56
shaohe_feng_Coco_gao: yes. not frequently.03:56
shaohe_feng_seldom resize the baremetal03:57
Li_LiuAlright, let's call it for today. Have a good night/day wherever you are03:59
Li_Liu#endmeeting03:59
*** openstack changes topic to "Pending patches (Meeting topic: openstack-cyborg)"03:59
openstackMeeting ended Wed Mar 13 03:59:45 2019 UTC.  Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4)03:59
openstackMinutes:        http://eavesdrop.openstack.org/meetings/openstack_cyborg/2019/openstack_cyborg.2019-03-13-03.01.html03:59
openstackMinutes (text): http://eavesdrop.openstack.org/meetings/openstack_cyborg/2019/openstack_cyborg.2019-03-13-03.01.txt03:59
openstackLog:            http://eavesdrop.openstack.org/meetings/openstack_cyborg/2019/openstack_cyborg.2019-03-13-03.01.log.html03:59
*** Li_Liu has quit IRC04:34
*** Sundar has quit IRC04:47
*** Coco_gao has quit IRC06:06
*** xinranwang has quit IRC06:14
*** diga has joined #openstack-cyborg06:58
*** diga has quit IRC07:09
*** wangzhh has quit IRC07:46
*** helenafm has joined #openstack-cyborg08:30
*** shaohe_feng_ has quit IRC09:02
*** FlorianFa has joined #openstack-cyborg09:23
*** diga has joined #openstack-cyborg09:36
*** diga has quit IRC11:03
*** diga has joined #openstack-cyborg12:59
digaHello everyone13:16
digado we have meeting today ?13:16
*** irclogbot_0 has quit IRC14:09
*** irclogbot_0 has joined #openstack-cyborg14:12
*** irclogbot_0 has quit IRC14:25
*** irclogbot_0 has joined #openstack-cyborg14:28
*** diga has quit IRC15:35
*** irclogbot_0 has quit IRC15:36
*** irclogbot_0 has joined #openstack-cyborg15:39
*** irclogbot_0 has quit IRC15:49
*** irclogbot_0 has joined #openstack-cyborg15:51
*** irclogbot_0 has quit IRC15:52
*** irclogbot_0 has joined #openstack-cyborg15:56
*** sum12 has left #openstack-cyborg16:07
*** helenafm has quit IRC16:40
*** FlorianFa has quit IRC16:41
*** FlorianFa has joined #openstack-cyborg16:54
