Thursday, 2019-09-19

*** chenke has joined #openstack-cyborg01:12
*** openstackgerrit has joined #openstack-cyborg01:23
openstackgerritShogo Saito proposed openstack/cyborg master: Fix ARQ delete API issue  https://review.opendev.org/68301301:23
openstackgerritXinran WANG proposed openstack/cyborg master: bug fixing: let FPGA driver report correct traits when no SRIOV enabled  https://review.opendev.org/68095301:57
*** luyao has joined #openstack-cyborg02:02
*** chenke has quit IRC02:07
*** chunxiu has joined #openstack-cyborg02:13
*** TxGirlGeek has quit IRC02:38
openstackgerritYumengBao proposed openstack/cyborg master: conductor writes device_profile update to db  https://review.opendev.org/67940602:42
*** shaohe_feng has joined #openstack-cyborg02:47
*** s_shogo has joined #openstack-cyborg02:47
*** chenke has joined #openstack-cyborg02:56
*** xinranwang has joined #openstack-cyborg02:59
*** Yumeng has joined #openstack-cyborg03:00
chenkeHi03:00
*** wangzhh has joined #openstack-cyborg03:00
shaohe_fenghi all03:01
chenkehi maxiaoha03:01
*** changzhi has joined #openstack-cyborg03:01
wangzhhHi all. Hi shaohe.03:01
Yumenghi all03:01
*** Sundar has joined #openstack-cyborg03:02
SundarHi all03:02
Sundar#startmeeting openstack-cyborg03:02
openstackMeeting started Thu Sep 19 03:02:43 2019 UTC and is due to finish in 60 minutes.  The chair is Sundar. Information about MeetBot at http://wiki.debian.org/MeetBot.03:02
openstackUseful Commands: #action #agreed #help #info #idea #link #topic #startvote.03:02
*** openstack changes topic to " (Meeting topic: openstack-cyborg)"03:02
openstackThe meeting name has been set to 'openstack_cyborg'03:02
Sundar#topic Who's here03:02
*** openstack changes topic to "Who's here (Meeting topic: openstack-cyborg)"03:03
Sundaro/03:03
chenkeo/03:03
Yumeng#info Yumeng03:03
s_shogo#info s_shogo03:03
wangzhh#info wangzhh03:03
changzhi#info changzhi03:03
chenke#info chenke03:03
SundarHi chenke, Yumeng, s_shogo, wangzhh. Welcome changzhi03:04
shaohe_feng#info shaohe_feng03:04
SundarHi shaohe03:04
Sundar#topic Status03:04
*** openstack changes topic to "Status (Meeting topic: openstack-cyborg)"03:04
SundarFirst, thank you all for an active Train cycle. We hit feature freeze a week ago03:04
SundarSo did other projects.03:05
SundarThe good news: Cyborg side of the Nova integration is pretty much done. We just need to clean up the way we invoke other services03:05
chenkeGreat03:06
wangzhhCool03:06
SundarNot so good news: Our Nova patches did not get enough reviews from Nova developers, and so did not make the cut.03:06
SundarPart of the problem is that Cyborg patches were open for a long time, so Nova developers did not see them as ready, though we could bring up a VM with Cyborg + Nova patches03:07
SundarAlso, there was a longstanding request to show tempest CI working. That completed exactly in the milestone week. That was too late to get sustained reviews.03:08
shaohe_fengWe know integration is a big effort03:08
shaohe_fengSundar: you put in a lot of effort. Thanks03:08
chenkeIt is understandable that patches in Nova get merged slowly.03:09
SundarNP, thanks Shaohe. I am optimistic about U because I think we are close, and I have re-proposed the Nova spec. This time, tempest and most things are merged. Things that attract cross-project attention, like tempest, privsep, sdk_adapter stuff, etc. are all done or making good progress03:09
SundarHope to get the Nova patches in the runway very early in the cycle. The more we wait, the more things get bogged down among the tons of other reviews.03:10
SundarThat said, we have a few more things to wrap up in Train :)03:11
SundarFirst, remove the hardcoding of 'devstack-admin'. Thanks, chenker and all for addressing that :)03:12
SundarSecond, v1 API is deprecated but still supported in Train. But it is not working because we removed all v1 from devstack. I should re-enable it, I think03:12
xinranwang#info xinranwang03:13
SundarShaohe's async bind, privsep, rbac are important03:13
xinranwangHi all03:13
SundarI think all the pep8/flake fixes from chenker/zhurong are looking good and will probably merge this week03:14
SundarCan you all think of anything else?03:14
YumengSundar: and please don't forget the patch to update the device_profile db via the conductor: https://review.opendev.org/#/c/679406/03:14
Yumengjust updated03:14
shaohe_fengSundar, please leave a slot for me to introduce the async jobs, so others can review it easily.03:15
shaohe_fengThanks03:15
Yumengand this GPU fix: https://review.opendev.org/#/c/675059/ I tested it in my devstack env and it works03:15
SundarAh yes, that too, Yumeng :)  There are quite a few patches up there, including https://review.opendev.org/680953.03:15
SundarSure, let's knock off as much as we can. Was just listing the ones critical to complete in Train03:16
openstackgerritMerged openstack/cyborg master: P5: Fix pep8 error in cyborg/accelerator  https://review.opendev.org/67917503:16
Sundarshaohe_feng: Sure03:16
SundarFolks, anything else before we dive into Shaohe's async bind?03:17
s_shogoI'm starting the test & validation task on a real machine, beginning with common functions that are independent of specific accelerators.03:18
s_shogoIf I find any bugs or errors, I'll report them or post patches before the Train release.03:18
SundarSure, s_shogo. I think the client effort can be aimed early in U release, since the Train release milestone for clients is past03:19
SundarI have some questions on RBAC: https://review.opendev.org/#/c/678177/ . In https://review.opendev.org/#/c/678177/3/cyborg/common/policy.py@83, should it be an allow rule? Anybody can create an ARQ and thereby bind that ARQ, and so program an FPGA?03:19
s_shogoSundar: OK, I'll continue the client & SDK task into the U release.03:21
Sundarwangzhh: What do you think?03:22
xinranwangshould we complete v2 API in T?03:22
wangzhhSundar, it should be allowed, and the method should then recheck whether it is a program action or not.03:23
Sundarwangzhh: ok03:23
Sundarxinranwang: Only devices API remains. We are supposed to merge only bug fixes, I think. So, it will probably go to U. Is anything else remaining?03:23
SundarOK, 35 min remaining. Let's move to async bind.03:25
Sundar#topic Async bind03:25
*** openstack changes topic to "Async bind (Meeting topic: openstack-cyborg)"03:25
SundarShaohe, take it away!03:25
shaohe_fengNow let's start the introduction to async bind. Questions can come after the introduction.03:26
shaohe_fengBriefly put, bind is to find a suitable device (maybe PCI, or MDEV) on the right host for a server instance to use.03:26
shaohe_fengSo what is a suitable device? We need a spec to describe it.03:26
shaohe_fengIn v1 we describe the device directly in the Nova flavor extra spec, and Cyborg parses the spec. Xinran implemented this work.03:26
shaohe_fengIn v2, after the PTG discussion, we define it in Cyborg's own Device Profile, and Sundar implemented it.03:26
shaohe_fengI did not have a chance to attend the PTG discussion. For more details, please talk with Sundar.03:26
shaohe_fengThanks to Xinran and Sundar for their efforts.03:26
shaohe_fengBefore we introduce async bind, let's first go over some implementation rules in the current code.03:26
shaohe_feng1. The AtachHandler in ExtARQ is not a list, so there is only one AtachHandler (one device per ARQ)03:27
shaohe_fengprofile group in order to get the expected devices.03:27
shaohe_fengNow our Cyborg ARQ bind API is sync, but we define it as async, so we need to improve it.03:27
shaohe_fengSo what we changed:03:27
shaohe_feng1. Use a thread pool to start the async job.03:27
shaohe_fengIn the Cyborg spec, Sundar suggests using concurrent.futures; yes, it is in the Python standard library. See the official Python link:03:27
shaohe_fenghttps://docs.python.org/3/library/concurrent.futures.html03:27
shaohe_fengWe can also green it with greenlets by patching it with eventlet.03:28
shaohe_fengfutures = eventlet.import_patched('concurrent.futures') # 'greening' futures,03:28
openstackgerritMerged openstack/cyborg master: P6: Fix pep8 error in cyborg/agent and cyborg/db  https://review.opendev.org/67919303:28
shaohe_fengso it is easy to green03:28
shaohe_fengSee the Python mailing list discussion.03:28
shaohe_fengI did a simple test and it works, but I did not test its performance, so greening is not enabled in the patch.03:28
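Editor's sketch of item 1 above: a minimal illustration, not the code under review, of handing slow bind work to a thread pool from the standard-library `concurrent.futures` module so the API call can return immediately. The names `bind_device` and `start_bind` and the ARQ ids are hypothetical stand-ins.

```python
import concurrent.futures
import time

# Shared pool; the size is an arbitrary choice for this sketch.
executor = concurrent.futures.ThreadPoolExecutor(max_workers=4)


def bind_device(arq_id):
    """Hypothetical stand-in for slow bind work (e.g. FPGA programming)."""
    time.sleep(0.01)
    return f"{arq_id}: bound"


def start_bind(arq_ids):
    """Submit one job per ARQ and return at once with the futures."""
    return {arq_id: executor.submit(bind_device, arq_id) for arq_id in arq_ids}


futures = start_bind(["arq-1", "arq-2"])
# A caller (or a monitor thread) can collect the results later.
results = {arq_id: fut.result(timeout=5) for arq_id, fut in futures.items()}
executor.shutdown()
```

The eventlet "greening" mentioned above would swap the module import for a patched one; it is left out here because, as noted in the discussion, it was not enabled in the patch.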
shaohe_feng2. I moved the bind logic out of the ExtARQ object.03:29
shaohe_fengLet ExtARQ keep its base functions, such as CRUD on its attributes.03:29
shaohe_fengMove it to cyborg/accelerator/common/handler.py (not sure this is a good place; this is an open question)03:29
shaohe_fengAdd a basic and general bind handler class named Accelerators. (not sure this is a good name; this is an open question)03:29
shaohe_fengIt supports the base _bind03:29
shaohe_fenghttps://review.opendev.org/#/c/681005/16/cyborg/accelerator/common/handler.py03:29
shaohe_fengIf a new accelerator needs extra operations, you can derive from it and extend it as needed, as with FPGA03:29
shaohe_fengline 386 at03:30
shaohe_fengFor FPGA it needs to get image metadata, download the image, program the image and update placement.03:30
shaohe_fengIf _bind is time-consuming, use "wrap_job_tb" to wrap it.03:31
shaohe_fengIn this wrapper I tag it with "is_job", and it can catch every Exception/traceback during the bind process and then log it.03:31
openstackgerritMerged openstack/cyborg master: P7: Fix pep8 error in cyborg/objects and cyborg/image  https://review.opendev.org/67952603:31
openstackgerritMerged openstack/cyborg master: P8: Fix pep8 error in cyborg/tests and add post_mortem_debug.py  https://review.opendev.org/67953803:31
shaohe_fengI also added a bind in the general class to start the jobs tagged with "is_job".03:31
shaohe_fengI also added a master to monitor the jobs (as Sundar suggested)03:31
shaohe_fenghttps://review.opendev.org/#/c/681005/16/cyborg/accelerator/common/handler.py03:31
shaohe_fengIt checks the jobs' status and also gets the job Exception/traceback.03:32
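Editor's sketch of the wrapper and monitor described above. Only the names `wrap_job_tb` and `is_job` come from the discussion; the bodies, the `job_traceback` attribute, the `monitor` function and the two sample jobs are assumptions made for illustration and may differ from the patch.

```python
import concurrent.futures
import functools
import logging
import traceback

LOG = logging.getLogger(__name__)


def wrap_job_tb(func):
    """Tag func as a job; log and keep the traceback of any exception."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except Exception as exc:
            # Keep the formatted traceback on the exception (assumed design).
            exc.job_traceback = traceback.format_exc()
            LOG.error("job %s failed:\n%s", func.__name__, exc.job_traceback)
            raise
    wrapper.is_job = True
    return wrapper


@wrap_job_tb
def good_bind():
    return "bound"


@wrap_job_tb
def bad_bind():
    raise ValueError("programming failed")


def monitor(futures, timeout=5):
    """Wait for each job and report 'ok' or the captured traceback."""
    report = {}
    for name, fut in futures.items():
        exc = fut.exception(timeout=timeout)
        if exc is None:
            report[name] = "ok"
        else:
            report[name] = getattr(exc, "job_traceback", repr(exc))
    return report


with concurrent.futures.ThreadPoolExecutor(max_workers=2) as pool:
    jobs = {"good": pool.submit(good_bind), "bad": pool.submit(bad_bind)}
    report = monitor(jobs)
```

Storing the traceback on the exception lets the monitor read it from the future in the main thread, which is one simple way to avoid losing errors raised inside worker threads.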
shaohe_fengplease add a SUPPORT_RESOURCES in03:32
shaohe_feng4. I added ARQ_STATES_TRANSFORM_MATRIX to sync the status.03:32
shaohe_fengAfter talking with Sundar and Xinran, we added extra statuses: ARQ_DELETING and ARQ_BIND_STARTED03:32
shaohe_fengline at 2903:32
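Editor's sketch of how a state transition matrix can guard ARQ status updates. The name ARQ_STATES_TRANSFORM_MATRIX and the two statuses ARQ_DELETING and ARQ_BIND_STARTED come from the discussion; the other state names, all string values, the allowed transitions and the `transition` helper are assumptions for illustration, not the contents of the patch.

```python
# State names; string values are illustrative assumptions.
ARQ_INITIAL = "Initial"
ARQ_BIND_STARTED = "BindStarted"
ARQ_BOUND = "Bound"
ARQ_BIND_FAILED = "BindFailed"
ARQ_DELETING = "Deleting"

# Hypothetical matrix: current state -> states it may move to.
ARQ_STATES_TRANSFORM_MATRIX = {
    ARQ_INITIAL: {ARQ_BIND_STARTED, ARQ_DELETING},
    ARQ_BIND_STARTED: {ARQ_BOUND, ARQ_BIND_FAILED, ARQ_DELETING},
    ARQ_BOUND: {ARQ_DELETING},
    ARQ_BIND_FAILED: {ARQ_DELETING},
    ARQ_DELETING: set(),
}


def transition(current, new):
    """Return the new state, or raise if the matrix forbids the move."""
    if new not in ARQ_STATES_TRANSFORM_MATRIX.get(current, set()):
        raise ValueError(f"illegal ARQ state change: {current} -> {new}")
    return new


state = transition(ARQ_INITIAL, ARQ_BIND_STARTED)
state = transition(state, ARQ_BOUND)
```

With concurrent bind jobs, checking every update against one table like this keeps a late-finishing job from overwriting a state such as Deleting.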
shaohe_fengI just refactored Sundar's work and did not change his logic at present, so no API definition exposed to users has changed. Thanks for Sundar's effort.03:33
shaohe_fengI did not test multi/batch ARQs, for example, a request for 2 FPGAs, or 1 GPU and 1 FPGA.03:33
shaohe_fengI have no real env.03:33
shaohe_fengSo I think we need to merge the patch, and let more developers test it.03:33
shaohe_fengThat's the difference from VM management. Ironic and Cyborg sometimes need hardware, so it is difficult to manage.03:34
shaohe_fengthe commit message shows you how to test this patch and03:34
shaohe_fenganalyze the process from the log:   https://review.opendev.org/#/c/681005/16//COMMIT_MSG03:35
shaohe_fengAlso, there's still a lot of work to do on it. We need to improve it continuously. Let it work first, then improve it.03:35
shaohe_fengsorry03:36
shaohe_fengany questions?03:37
Sundarshaohe_feng: Thanks for all the time and hard work03:37
SundarFor testing, hope people can use the fake driver. It supports FPGA resource class. Can we get it to take the programming patch but treat it as a no-op?03:38
Sundar*programming code path03:38
shaohe_fengDo you mean adding a mock so that it does not really program?03:39
SundarYes03:39
shaohe_fengHardware support is really harder than VM03:39
Yumengshaohe_feng: that's really a comprehensive and deep research and very helpful introduction.03:39
shaohe_fengYumeng, thanks. Hopefully it is useful.03:40
s_shogoThanks, shaohe_feng :03:40
xinranwangshaohe_feng:  thanks Shaohe for your efforts03:41
shaohe_fengSundar let me give a method to mock it later.03:41
SundarNot everybody has hardware, as you said. But concurrent execution is not easy to test thoroughly. It may work in my env but fail in somebody else's. We can hopefully get more people to check it out using the fake driver03:41
SundarGreat, thanks03:41
shaohe_fengYes, will give a guide for how to mock it.03:41
chenkeGreat job, thanks Shaohe.03:42
SundarAlso: "Move it to cyborg/accelerator/common/handler.py". Bind is really an operation on an ExtARQ. It logically belongs with objects/ext_arq.py. If you want to split that into separate source file, that is OK. But it can be a mix-in rather than a separate object/class, IMHO03:42
Yumengshaohe_feng: great! looking forward to the mock guide03:43
shaohe_fengI checked Nova's object code, then I made this change.03:44
wangzhhshaohe_feng, Thx for your effort.03:44
shaohe_fengSundar any details for how to split it?03:44
Sundarshaohe_feng: I found this blog useful: http://www.qtrac.eu/pyclassmulti.html03:45
SundarIt considers many ways to split a Python class into different source files, and finally recommends mix-ins03:46
shaohe_fengI glanced at it. It seems to be a big change.03:48
SundarHmmm... only the last part is the mix-in. That could be a small change. You can move your chosen methods into a separate file, put it in a mix-in, and inherit that mix-in into the ExtARQ object class03:49
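Editor's sketch of the mix-in approach suggested above: bind-related methods live in their own class (which could sit in a separate source file) and are inherited into the object class. `BindMixin`, its method body and the attribute names are hypothetical; the real ExtARQ object has a different base class and fields.

```python
class BindMixin:
    """Bind-related methods; could be kept in a separate source file."""

    def bind(self, device):
        # Hypothetical bind: record the device and mark the ARQ bound.
        self.attach_handle = device
        self.state = "Bound"
        return self


class ExtARQ(BindMixin):
    """Simplified stand-in for the ExtARQ object, keeping only basic fields."""

    def __init__(self, uuid):
        self.uuid = uuid
        self.state = "Initial"
        self.attach_handle = None


arq = ExtARQ("arq-1").bind("fake-fpga-0")
```

Callers still see bind as an operation on ExtARQ, which matches the point that bind logically belongs with the object while the code is split across files.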
SundarI can help as much as I can.03:50
shaohe_fenggood, then I can write a mock env guide for testing.03:51
SundarIn that article, the last section "The Definitive Version?" alone is about mix-ins03:51
SundarOK, great03:51
SundarAnything else, Shaohe?03:52
shaohe_fengno, that's all for me.03:52
SundarThanks very much, once again.03:53
Sundar#topic AoB03:53
*** openstack changes topic to "AoB (Meeting topic: openstack-cyborg)"03:53
shaohe_fenglet's move the patch on03:53
SundarPython IPv6 jobs: https://review.opendev.org/#/c/682517/ Please review03:53
SundarMany patches hit merge conflict after recent merges03:53
shaohe_fengit does not matter.03:54
shaohe_fengwe just improve our git skills03:54
shaohe_fengin other active projects03:54
SundarWe need one more review for https://review.opendev.org/#/c/680953/ from outside Intel.03:55
shaohe_fengconflicts are very common03:55
SundarSure03:55
SundarTrain schedule: https://releases.openstack.org/train/schedule.html RC1 candidate is next week!03:56
SundarHope to get the critical patches in by that time.03:56
SundarAfter that, even bug fixes are not assured03:56
SundarBTW, Cyborg will get packaged as an RPM as part of the OpenStack release:  https://opendev.org/openstack/rpm-packaging/src/branch/master/openstack/cyborg03:57
SundarAnything else, guys?03:58
shaohe_fengno03:58
chenkeno03:58
SundarHave a good day! Bye03:58
Sundar#endmeeting03:58
*** openstack changes topic to "Pending patches (Meeting topic: openstack-cyborg)"03:58
openstackMeeting ended Thu Sep 19 03:58:56 2019 UTC.  Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4)03:58
Yumengbye03:58
openstackMinutes:        http://eavesdrop.openstack.org/meetings/openstack_cyborg/2019/openstack_cyborg.2019-09-19-03.02.html03:58
openstackMinutes (text): http://eavesdrop.openstack.org/meetings/openstack_cyborg/2019/openstack_cyborg.2019-09-19-03.02.txt03:59
chenkebye all.03:59
shaohe_fengthank you03:59
openstackLog:            http://eavesdrop.openstack.org/meetings/openstack_cyborg/2019/openstack_cyborg.2019-09-19-03.02.log.html03:59
wangzhhbye03:59
shaohe_fengbye03:59
chenkethanks all.03:59
s_shogobye03:59
xinranwangbye03:59
*** Sundar has quit IRC03:59
*** s_shogo has quit IRC04:06
*** changzhi has quit IRC04:10
openstackgerritSundar Nadathur proposed openstack/cyborg master: Fix arq api errors in delete and unbind  https://review.opendev.org/68291304:12
openstackgerritMerged openstack/os-acc master: Removing project os-acc.  https://review.opendev.org/68249804:39
*** chunxiu has quit IRC06:01
*** chenke has quit IRC06:07
*** xinranwang has quit IRC06:08
*** wangzhh has quit IRC06:08
*** Yumeng has quit IRC06:26
openstackgerritchenchunxiu proposed openstack/cyborg master: Fix arq api errors in delete and unbind  https://review.opendev.org/68303506:38
openstackgerritchenchunxiu proposed openstack/cyborg master: Fix arq api errors in delete and unbind  https://review.opendev.org/68303506:40
*** chenke has joined #openstack-cyborg07:29
openstackgerritchenker proposed openstack/cyborg master: Fix the hardcoding of user role using sdk_adapter approach  https://review.opendev.org/68256508:52
*** tetsuro has joined #openstack-cyborg09:20
*** tetsuro has quit IRC10:34
*** chenke has quit IRC11:14
*** shaohe_feng has quit IRC11:50
openstackgerritYumengBao proposed openstack/cyborg master: conductor writes device_profile update to db  https://review.opendev.org/67940611:54
openstackgerritYumengBao proposed openstack/cyborg master: conductor writes device_profile update to db  https://review.opendev.org/67940611:55
*** tetsuro has joined #openstack-cyborg11:58
openstackgerritchenker proposed openstack/cyborg master: Fix the hardcoding of user role using sdk_adapter approach  https://review.opendev.org/68256512:08
*** chenke has joined #openstack-cyborg12:09
*** chenke has quit IRC12:24
*** chenke has joined #openstack-cyborg13:33
openstackgerritMerged openstack/cyborg master: bug fixing: let FPGA driver report correct traits when no SRIOV enabled  https://review.opendev.org/68095313:52
*** efried_pto is now known as efried13:56
*** tetsuro has quit IRC14:00
*** tetsuro has joined #openstack-cyborg14:06
*** chenke has quit IRC14:44
*** tetsuro has quit IRC14:52
*** tetsuro has joined #openstack-cyborg14:55
*** efried is now known as efried_pto14:57
*** tetsuro has quit IRC15:13
*** tetsuro has joined #openstack-cyborg15:14
*** tetsuro has quit IRC15:15
*** TxGirlGeek has joined #openstack-cyborg15:20
*** openstackgerrit has quit IRC16:06
*** gmann_afk is now known as gmann17:21
*** efried_pto has quit IRC18:01
*** efried has joined #openstack-cyborg18:03
*** efried is now known as efried_pto18:03
*** TxGirlGeek has quit IRC21:00
*** TxGirlGeek has joined #openstack-cyborg21:00
*** TxGirlGeek has quit IRC23:08
*** TxGirlGeek has joined #openstack-cyborg23:08
