*** tetsuro has joined #openstack-cyborg | 00:44 | |
*** tetsuro has quit IRC | 01:47 | |
*** Sundar has joined #openstack-cyborg | 02:00 | |
*** Li_Liu has joined #openstack-cyborg | 02:45 | |
*** xinranwang has joined #openstack-cyborg | 02:54 | |
*** Coco_gao has joined #openstack-cyborg | 02:57 | |
Coco_gao | hi all | 02:57 |
---|---|---|
Li_Liu | Hi Coco | 02:58 |
Li_Liu | how's your trip? | 02:58 |
Coco_gao | great | 02:58 |
Li_Liu | nice | 02:58 |
*** shaohe_feng_ has joined #openstack-cyborg | 02:59 | |
xinranwang | Hi all | 02:59 |
Li_Liu | let's wait for a couple more min for others | 02:59 |
shaohe_feng_ | hi all | 02:59 |
Coco_gao | hi shaohe | 02:59 |
shaohe_feng_ | Coco_gao: morning. | 02:59 |
Li_Liu | #startmeeting openstack-cyborg | 03:01 |
openstack | Meeting started Wed Mar 13 03:01:17 2019 UTC and is due to finish in 60 minutes. The chair is Li_Liu. Information about MeetBot at http://wiki.debian.org/MeetBot. | 03:01 |
openstack | Useful Commands: #action #agreed #help #info #idea #link #topic #startvote. | 03:01 |
*** openstack changes topic to " (Meeting topic: openstack-cyborg)" | 03:01 | |
openstack | The meeting name has been set to 'openstack_cyborg' | 03:01 |
Li_Liu | Let's get started | 03:01 |
Li_Liu | #topic Roll Call | 03:01 |
*** openstack changes topic to "Roll Call (Meeting topic: openstack-cyborg)" | 03:01 | |
Li_Liu | #info Li_Liu | 03:01 |
Coco_gao | #info Coco_gao | 03:01 |
xinranwang | #info xinranwang | 03:01 |
Li_Liu | are sundar and zhenghao here yet? | 03:02 |
Li_Liu | #topic Code Freeze Status Update | 03:02 |
*** openstack changes topic to "Code Freeze Status Update (Meeting topic: openstack-cyborg)" | 03:02 | |
Li_Liu | https://review.openstack.org/#/q/status:open%20project:openstack/cyborg | 03:03 |
Coco_gao | I will update my patch according to the comments in the next two days. | 03:03 |
Li_Liu | Coco_gao, thanks | 03:03 |
Li_Liu | Why is https://review.openstack.org/#/c/574075/ not merged yet? | 03:04 |
Li_Liu | strange ><|| | 03:04 |
Coco_gao | That depends on my patch | 03:04 |
Coco_gao | because my patch is not merged. | 03:05 |
zhipeng | Zuul not started | 03:05 |
Li_Liu | I see | 03:05 |
*** wangzhh has joined #openstack-cyborg | 03:06 | |
Li_Liu | By the hard deadline of code freeze, please add UTs to the features you own | 03:06 |
wangzhh | hi all | 03:06 |
Li_Liu | Hi wangzhh | 03:06 |
wangzhh | Sorry for being late. | 03:07 |
Coco_gao | I will add my UTs. | 03:08 |
Sundar | #info Sundar | 03:08 |
Li_Liu | Hi Sundar | 03:08 |
Sundar | Sorry for the delay | 03:08 |
Sundar | Hi Li_Liu | 03:08 |
Coco_gao | Hi Sundar | 03:08 |
Sundar | Hi Coco_gao and all | 03:08 |
Li_Liu | I think we are in good shape so far | 03:09 |
zhipeng | Any luck getting the xilinx driver lol? | 03:10 |
Li_Liu | no updates tho | 03:10 |
Li_Liu | I can follow up with Chuck later | 03:11 |
Li_Liu | but prob not gonna make it by the deadline | 03:11 |
Li_Liu | zhipeng, we can still refine our docs after the deadline right? | 03:12 |
zhipeng | Need to do it before the RC | 03:13 |
Sundar | What is needed for docs? | 03:14 |
Li_Liu | RC1 is Mar 18 - Mar 22 | 03:14 |
Li_Liu | https://docs.openstack.org/cyborg/latest/#developer-documentation | 03:15 |
Li_Liu | yumeng already added quite a lot of stuff there | 03:16 |
Sundar | Li_Liu: We have feedback that we should improve our API docs. I can document the current v1 API, if nobody else volunteers. | 03:16 |
Li_Liu | we need to keep improving it | 03:16 |
Li_Liu | Sundar, sure thanks a lot | 03:16 |
Li_Liu | I will take some time to work on the python-client | 03:17 |
Li_Liu | at least make it align with our docs and APIs | 03:17 |
Coco_gao | Hi Sundar, still one thing about driver ovo. Is deployable name unique? why is that? | 03:18 |
Coco_gao | Thanks a lot. | 03:18 |
Sundar | Coco_gao: I think so, because it will be used as resource provider name in Placement, and that must be unique AFAIK | 03:19 |
Li_Liu | How are we going to guarantee the uniqueness of the deployable's name? | 03:19 |
Li_Liu | are we doing the check when the resource is reported? | 03:20 |
Coco_gao | we can't if we set the name field in the drivers. | 03:20 |
Sundar | Coco_gao: I don't see explicit documentation that it must be unique. I will check and get back. | 03:20 |
Sundar | Coco_gao and all: why can't Cyborg agent construct the name from other fields like vendor, type, etc., and add a unique id? | 03:21 |
Sundar | E.g. 'INTEL_FPGA_PAC_CARD_ID1' | 03:21 |
Li_Liu | is this ID1 a uuid? | 03:22 |
Sundar | Li_Liu: I was thinking a simple integer | 03:22 |
Sundar | Oh, wait | 03:22 |
Sundar | There is a convention for naming nested RPs | 03:22 |
Sundar | It is based on compute node name | 03:22 |
Sundar | I will check and send email. | 03:23 |
xinranwang | now the deployable name is the filename in /sys/class/fpga, it's unique | 03:23 |
Sundar | xinranwang: It is unique within a compute node | 03:24 |
Sundar | The same name can repeat across nodes | 03:24 |
Coco_gao | Sundar, I agree we'd better do that in agent. | 03:24 |
Sundar | We are not reporting anything to Placement yet, right? | 03:25 |
Coco_gao | xinranwang, that's the problem: across nodes, the name may be the same, right? | 03:25 |
Li_Liu | how about when we report the deployable to placement API, we concatenate name+uuid | 03:25 |
Sundar | Li_Liu: good idea. I'll get back with the name convention for nested RPs | 03:25 |
wangzhh | xinranwang, what if different nodes have the same device? Is it unique? | 03:26 |
xinranwang | if we support NRP, we can identify which host the deployable is located on, so should it be ok to have the same deployable name on different compute nodes? | 03:26 |
Coco_gao | xinranwang, that will be ok, i think. | 03:27 |
shaohe_feng_ | the fpga device name is generated by the kernel. | 03:27 |
wangzhh | xinranwang, Not really. Now it is globally unique. | 03:27 |
shaohe_feng_ | the name is unique | 03:27 |
shaohe_feng_ | it does not matter if different nodes have the same device | 03:28 |
Coco_gao | the reason the name needs to be unique from the driver ovo's point of view is that we need to identify the deployable. But driver ovos are compared within the same node, so the name only needs to be unique within one host. | 03:28 |
shaohe_feng_ | Coco_gao: yes. | 03:29 |
wangzhh | Coco, so we should change the db, it is globally unique now. | 03:29 |
shaohe_feng_ | for device@host is unique | 03:29 |
xinranwang | so i think it's ok to have the same deployable name on different compute nodes, on the placement side. But the name should be unique on the same compute node. | 03:29 |
shaohe_feng_ | the name is not used to identify a device | 03:30 |
Coco_gao | wangzhh, Sundar and all, maybe we need to change the db constraints on the deployable table, name field. | 03:30 |
Coco_gao | do you agree if I modify that? | 03:30 |
Li_Liu | what constraint? | 03:31 |
shaohe_feng_ | just a prompt for humans | 03:31 |
Coco_gao | the name field is unique in deployable table. | 03:31 |
Li_Liu | ah, ok | 03:31 |
Li_Liu | go ahead | 03:31 |
Li_Liu | no problem on my side | 03:31 |
shaohe_feng_ | I agree | 03:31 |
Coco_gao | OK, thank you all for the advice. | 03:31 |
wangzhh | Of course. But how to handle device like gpu, <device_name>_<address>? | 03:32 |
Sundar | Coco_gao: I think it is ok to make it unique because: there is some proposed convention to name nested RPs like '<hostname>_<numaNode>_<x>' and x must be unique within a node anyway for us. | 03:32 |
shaohe_feng_ | just keep id/uuid unique. it is machine readable. | 03:32 |
shaohe_feng_ | unique in a node is ok. | 03:33 |
wangzhh | shaohe, when a driver reports a device, it does not have a uuid. | 03:33 |
shaohe_feng_ | no need for global | 03:33 |
shaohe_feng_ | wangzhh: agent gen one for it. :) | 03:33 |
shaohe_feng_ | bus is also unique. | 03:34 |
shaohe_feng_ | bus is also machine readable. | 03:34 |
Coco_gao | Sundar, the problem is how to generate x to make sure same card is using the same x when reporting. | 03:34 |
wangzhh | shaohe, agent will generate the uuid every time? | 03:34 |
shaohe_feng_ | wangzhh: no. just once. | 03:35 |
shaohe_feng_ | wangzhh: it needs to check the bus. | 03:35 |
shaohe_feng_ | wangzhh: on a node, the bus is what the machine reads. | 03:35 |
shaohe_feng_ | on a cluster, the uuid is what the machine reads | 03:35 |
Sundar | There may not be a PCI bdf in all hypervisors. | 03:36 |
wangzhh | shaohe, I suppose you mean to generate it the first time. | 03:36 |
shaohe_feng_ | Coco_gao: the x can be generated by the bus. | 03:36 |
shaohe_feng_ | Coco_gao: let me show you an example | 03:36 |
shaohe_feng_ | wangzhh: yes. | 03:36 |
Coco_gao | thanks shaohe | 03:36 |
Li_Liu | if there's no bdf, can we use uuid? | 03:36 |
shaohe_feng_ | Li_Liu: there's another identification without bdf | 03:37 |
shaohe_feng_ | for | 03:37 |
wangzhh | But agent doesn't know which time it is. | 03:37 |
wangzhh | shaohe_feng_ | 03:37 |
shaohe_feng_ | wangzhh: it needs to check. if the bus is not in the db, then it is the first time. | 03:38 |
Li_Liu | shaohe_feng_, sure that also works | 03:38 |
shaohe_feng_ | seems mdev has a uuid. | 03:38 |
shaohe_feng_ | and usb has its own bus. | 03:38 |
Sundar | The driver should report a unique id within the node for each device. It could be PCI bdf for libvirt or whatever is unique for PowerVM and others | 03:38 |
wangzhh | If so, agent should query db first. do something like diff? | 03:38 |
Sundar | Then that could be the x factor | 03:39 |
Sundar | wangzhh: No, agent should not query db. For 2 reasons: scaling, upgrades can change db schema | 03:39 |
shaohe_feng_ | wangzhh: yes. when the agent starts, it should sync with the db first | 03:39 |
wangzhh | +1 | 03:39 |
wangzhh | shaohe_feng_ agent doesn't query db now. | 03:40 |
shaohe_feng_ | Sundar: no, it should sync when it starts. and can keep the info in cache. | 03:40 |
Sundar | Agent should not keep state. Even if it reads db at startup, it cannot assume that it will remain in sync, because operator can update config | 03:40 |
Sundar | No cache, please. We will hit all kinds of issues with stale caches, aging, etc. | 03:41 |
Li_Liu | shaohe_feng_, is the cache only containing the information related to the node? | 03:41 |
shaohe_feng_ | yes. | 03:41 |
wangzhh | Agree with sundar at this part. :) | 03:41 |
shaohe_feng_ | its own node info. | 03:41 |
shaohe_feng_ | let me show you what I do. | 03:42 |
Li_Liu | Sundar, I think it should be ok if it only holds its own information in cache | 03:42 |
wangzhh | shaohe_feng_ haha talk is cheap, show me your code. :) | 03:42 |
Sundar | Li_Liu: The operator may want to disable or enable specific devices, or do other config. | 03:42 |
shaohe_feng_ | wangzhh: yes, I will show you the code | 03:43 |
Coco_gao | before diff, the agent should get the old driver ovo, is that from db or cache? | 03:43 |
wangzhh | Cool. | 03:43 |
shaohe_feng_ | wangzhh: I have implemented it. | 03:43 |
Sundar | Li_Liu: Then we have to propagate such changes to each agent, ensure that it has received it, etc. The agent doesn't need any state for discovery -- just add a unique field that driver reports. | 03:43 |
shaohe_feng_ | I report the placement by: device_name@host this is unique | 03:43 |
shaohe_feng_ | and I just put the device_name in cyborg db | 03:44 |
shaohe_feng_ | it works well, without any conflict, | 03:44 |
Coco_gao | I agree with shaohe. | 03:44 |
shaohe_feng_ | because placement uses device_name@host as the index. | 03:44 |
Sundar | Coco_gao: Again, there are some conventions proposed for nested RP names. I am still trying to find the spec/doc where I saw that. | 03:45 |
shaohe_feng_ | but cyborg does not use device_name as an index. | 03:45 |
shaohe_feng_ | Sundar: those are 2 different things, but if you want to keep them the same, it is OK. | 03:46 |
Sundar | shaohe_feng: There's no point in making them different. The only reason why we have a deployable name is to report to placement | 03:47 |
shaohe_feng_ | the big problem is not this. | 03:47 |
Li_Liu | Sundar, please help to find out the conventions. shaohe_feng_, could you share your code with us? | 03:48 |
shaohe_feng_ | the big problem is enumeration. | 03:48 |
xinranwang | maybe keep the deployable name unique on the same compute node, and add the hostname like "@host" when reporting to placement. | 03:48 |
Li_Liu | It seems we need some further discussion on this issue, we can discuss it in tomorrow's zoom sync | 03:48 |
xinranwang | i believe that's what shaohe_feng_ did. | 03:48 |
shaohe_feng_ | Li_Liu: after a restart with some change on the host, the enumeration may change the bus of the same device. | 03:49 |
Li_Liu | shaohe_feng_, yea, I know | 03:50 |
shaohe_feng_ | I mean the cloud provider may resize the hardware on the host | 03:50 |
shaohe_feng_ | so that's what we really need to care about | 03:50 |
shaohe_feng_ | after all, the machine needs the bus to identify a device, not the name. | 03:51 |
shaohe_feng_ | Li_Liu: yes. maybe we care about the same thing. | 03:51 |
Li_Liu | shaohe_feng_, driver should do the mapping from bus to device name/id I believe | 03:52 |
shaohe_feng_ | Li_Liu: yes, that's what we need to improve. | 03:53 |
Li_Liu | ok | 03:54 |
Sundar | @all: Please look at https://git.openstack.org/cgit/openstack/nova-specs/tree/specs/stein/approved/numa-topology-with-rps.rst?h=refs/changes/24/552924/14#n163 | 03:55 |
Coco_gao | shaohe_feng, it will be ok if the name changes: the conductor will delete the old device with name1 and add the new device to the db with name2. But actually, the db will still match the real situation exactly. | 03:55 |
Li_Liu | Since it's pretty late for me here. Let's close it up and discuss in more detail in tomorrow's sync up | 03:55 |
shaohe_feng_ | Li_Liu: OK. | 03:56 |
Coco_gao | but that name change for one device is not supposed to be frequent. | 03:56 |
shaohe_feng_ | Coco_gao: yes. not frequently. | 03:56 |
shaohe_feng_ | seldom resize the baremetal | 03:57 |
Li_Liu | Alright, let's call the meeting for today. Have a good night/day wherever you are | 03:59 |
Li_Liu | #endmeeting | 03:59 |
*** openstack changes topic to "Pending patches (Meeting topic: openstack-cyborg)" | 03:59 | |
openstack | Meeting ended Wed Mar 13 03:59:45 2019 UTC. Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4) | 03:59 |
openstack | Minutes: http://eavesdrop.openstack.org/meetings/openstack_cyborg/2019/openstack_cyborg.2019-03-13-03.01.html | 03:59 |
openstack | Minutes (text): http://eavesdrop.openstack.org/meetings/openstack_cyborg/2019/openstack_cyborg.2019-03-13-03.01.txt | 03:59 |
openstack | Log: http://eavesdrop.openstack.org/meetings/openstack_cyborg/2019/openstack_cyborg.2019-03-13-03.01.log.html | 03:59 |
*** Li_Liu has quit IRC | 04:34 | |
*** Sundar has quit IRC | 04:47 | |
*** Coco_gao has quit IRC | 06:06 | |
*** xinranwang has quit IRC | 06:14 | |
*** diga has joined #openstack-cyborg | 06:58 | |
*** diga has quit IRC | 07:09 | |
*** wangzhh has quit IRC | 07:46 | |
*** helenafm has joined #openstack-cyborg | 08:30 | |
*** shaohe_feng_ has quit IRC | 09:02 | |
*** FlorianFa has joined #openstack-cyborg | 09:23 | |
*** diga has joined #openstack-cyborg | 09:36 | |
*** diga has quit IRC | 11:03 | |
*** diga has joined #openstack-cyborg | 12:59 | |
diga | Hello everyone | 13:16 |
diga | do we have a meeting today? | 13:16 |
*** irclogbot_0 has quit IRC | 14:09 | |
*** irclogbot_0 has joined #openstack-cyborg | 14:12 | |
*** irclogbot_0 has quit IRC | 14:25 | |
*** irclogbot_0 has joined #openstack-cyborg | 14:28 | |
*** diga has quit IRC | 15:35 | |
*** irclogbot_0 has quit IRC | 15:36 | |
*** irclogbot_0 has joined #openstack-cyborg | 15:39 | |
*** irclogbot_0 has quit IRC | 15:49 | |
*** irclogbot_0 has joined #openstack-cyborg | 15:51 | |
*** irclogbot_0 has quit IRC | 15:52 | |
*** irclogbot_0 has joined #openstack-cyborg | 15:56 | |
*** sum12 has left #openstack-cyborg | 16:07 | |
*** helenafm has quit IRC | 16:40 | |
*** FlorianFa has quit IRC | 16:41 | |
*** FlorianFa has joined #openstack-cyborg | 16:54 |
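
The naming ideas from the 03:18 to 03:56 discussion can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not Cyborg code: `placement_name`, `get_or_create_uuid`, the in-memory `known` dict (standing in for whatever store would record already-seen devices), and the sample device name and bus address are all hypothetical. It shows a node-local device name being qualified as `device_name@host` for Placement, and a UUID being generated only the first time a (host, bus address) pair is seen.

```python
import uuid


def placement_name(device_name: str, hostname: str) -> str:
    """Qualify a node-local device name with its host: 'device_name@host'."""
    return f"{device_name}@{hostname}"


def get_or_create_uuid(known: dict, hostname: str, bus_addr: str) -> str:
    """Return the UUID recorded for (host, bus address).

    A new UUID is generated only when the pair has not been seen before.
    `known` is a hypothetical stand-in for a persistent store.
    """
    key = (hostname, bus_addr)
    if key not in known:
        known[key] = str(uuid.uuid4())
    return known[key]


# Sample values (assumptions for illustration only).
known_devices = {}
name_a = placement_name("intel-fpga-dev.0", "node1")
name_b = placement_name("intel-fpga-dev.0", "node2")
first = get_or_create_uuid(known_devices, "node1", "0000:5e:00.0")
again = get_or_create_uuid(known_devices, "node1", "0000:5e:00.0")
```

Here `name_a` and `name_b` differ even though the device name is the same on both nodes, and `again` equals `first` because the lookup is keyed on host and bus address. As raised in the meeting, this keying breaks down if re-enumeration after a hardware change assigns a different bus address to the same device.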