Thursday, 2020-04-16

*** tetsuro has joined #openstack-cyborg00:03
*** brinzhang has joined #openstack-cyborg00:13
*** swp20 has joined #openstack-cyborg00:32
openstackgerritShogo Saito proposed openstack/cyborg master: Programming support (v2 Deployable API)  https://review.opendev.org/69819001:40
*** tetsuro has quit IRC01:47
*** tetsuro has joined #openstack-cyborg01:54
*** brinzhang_ has joined #openstack-cyborg02:04
*** brinzhang has quit IRC02:08
*** swp20 has quit IRC02:08
*** brinzhang has joined #openstack-cyborg02:08
*** brinzhang_ has quit IRC02:09
*** s_shogo has joined #openstack-cyborg02:44
*** songwenping_ has joined #openstack-cyborg02:51
*** Sundar has joined #openstack-cyborg02:58
*** Yumeng has joined #openstack-cyborg03:00
*** chenke has joined #openstack-cyborg03:00
Yumenghi Sundar03:01
*** xinranwang has joined #openstack-cyborg03:01
brinzhangHi Yumeng, Sundar03:01
s_shogoHi all03:01
SundarHi Yumeng, brinzhang03:02
xinranwangHi all03:02
chenkehi all~03:02
SundarHow are you all doing?03:02
Sundar#startmeeting openstack-cyborg03:02
openstackMeeting started Thu Apr 16 03:02:27 2020 UTC and is due to finish in 60 minutes.  The chair is Sundar. Information about MeetBot at http://wiki.debian.org/MeetBot.03:02
openstackUseful Commands: #action #agreed #help #info #idea #link #topic #startvote.03:02
*** openstack changes topic to " (Meeting topic: openstack-cyborg)"03:02
openstackThe meeting name has been set to 'openstack_cyborg'03:02
brinzhangwelcome to songwenping_03:02
chenkefrom beijing university?03:03
SundarWelcome, songwenping_ . Would you like to introduce yourself?03:03
YumengSundar: Good. How's it going with your new job?03:03
chenkewelcome songwenping03:03
songwenping_Hi all03:03
Yumengwelcome songwenping03:03
s_shogowelcome03:03
songwenping_I am from Inspur.03:03
SundarGood. Brin has company. :)03:04
songwenping_do the nova-cyborg-interaction work.03:04
brinzhangsongwenping_ mainly work on nova-cyborg interaction, and his take some patch in this meeting03:04
brinzhangSundar, Yumeng, start meeting?03:04
SundarInteresting. What aspect of Nova interaction?03:05
xinranwangwelcome03:05
Yumengbrinzhang,songwenping_ : Are you planning to use cyborg in production, right?03:05
brinzhangGPU, and others03:05
brinzhangYumeng, considering03:05
Sundarbrinzhang: The meeting has started. We'll get to the patches after hearing a bit more about the Nova interaction.03:05
*** haibin-huang23 has joined #openstack-cyborg03:06
SundarIs it about more instance ops like shelve and suspend?03:06
Sundarsongwenping_, brinzhang: ^03:07
brinzhangSundar: now follow you patch https://review.opendev.org/#/c/716185/6/api-guide/source/accelerator-support.rst03:08
songwenping_not do these operators03:08
xinranwangSundar:  I'd like to know what operation could be add in next release, shelve/unshelve, like what else? And is there any gap here to complete. Because I have started to investigate it and hope add more operations in Nova in next release.03:08
brinzhangSundar, we also blocked these operations now, maybe we can open that after enough develop03:09
*** shaohe_feng has joined #openstack-cyborg03:09
shaohe_fenghi all03:09
Yumengxinranwang,brinzhang,Sundar and all: here is the etherpad for V release. https://etherpad.opendev.org/p/cyborg-victoria-goals  we can add topics now.03:09
brinzhangSundar: we are still in testing03:09
Sundarxinranwang: The API guide that brinzhang posted above lists the unsupported operations. Many opes there are important for operators. Sean Mooney from Nova is also working on some of these ops for V release, I think. You may want to check with him.03:10
Sundarbrinzhang: Good to know. Thanks.03:11
brinzhangYumeng: sean-k-mooney works on evacuate now03:11
SundarYumeng: Thanks for posting the V release etherpad.03:11
brinzhanghttps://review.opendev.org/#/c/715326/03:11
chenkeHi Sundar, do you mean sean will implement some virtual machine operations in the next version?03:12
xinranwangsean-k-mooney:  Is there already some patches from sean mooney? I'd like to have a look.03:12
Sundarchenke: Yes, he says he intends to do that03:12
xinranwangSundar:03:12
*** songwenping has joined #openstack-cyborg03:12
Sundarxinranwang: Please see https://review.opendev.org/#/c/715326/03:12
xinranwangSundar: thanks.03:13
chenkeOkay, this is good news, for both cyborg and nova》03:13
*** songwenping_ has quit IRC03:13
*** songwenping has quit IRC03:13
SundarYes.Shall we get started on our patches? https://review.opendev.org/#/q/status:open+project:openstack/cyborg+branch:master03:13
SundarI am reviewing https://review.opendev.org/#/c/696089/03:14
chenkeSure.03:14
brinzhangSundar https://review.opendev.org/#/c/696089/, how about this patch? need your check03:14
*** songwenping has joined #openstack-cyborg03:15
brinzhangexcept some nits inline, others looks good to me03:15
Sundarbrinzhang: Yes, that's what I am looking at. Will complete after this meeting.03:15
SundarBTW, Release candidate 1 for Ussuri is due Apr 23 https://releases.openstack.org/ussuri/schedule.html03:16
SundarSo, 8 days left. What are the highest priority patches to merge before that?03:16
Yumengbrinzhang: Thanks for posting 696089!03:17
SundarProgramming support https://review.opendev.org/698190 ?03:17
shaohe_fenglink:  https://etherpad.opendev.org/p/cyborg-victoria-goals03:18
shaohe_fengI have add this program API in the etherpad03:18
shaohe_fengthis is really needed in product env to adopt cyborg03:18
YumengSundar: from my side. I think Zuul check bandit failure can be merged before Ussuri is due. https://review.opendev.org/#/q/topic:fix-bandit-check-failures+(status:open+OR+status:merged)03:18
SundarYumeng: agreed03:18
SundarBut it is failing the bandit check :)03:19
brinzhangYumeng, I found the bandit are not be voted, so you should enable it in zuul task03:19
brinzhangand that should be works fine, that we can consider merge it.03:19
brinzhanga questions, what is bandit to do? I am know few of it03:20
YumengSundar, brinzhang: yes. After we fixed all failure, we can set that to voting. and at that time, bandit check will not fail again.03:20
brinzhangIMO, firstly, you should enable this task in zuul task03:21
Sundarbrinzhang: Bandit is a tool to detect security flaws. https://pypi.org/project/bandit/03:21
brinzhangthan fix the failure, that we can ensure that correctly03:21
brinzhangSundar: ack, thanks, will see it later03:22
Sundarbrinzhang: bandit is now a non-voting job. We need to fix the issues there and then make it voting. Otherwise, other open patches will be affected.03:22
xinranwanghttps://review.opendev.org/#/c/696089/ The rafactor arq api is also important, I think.03:22
Yumengbandit is for code security check, which can check operations like if shell operations in cyborg code is safe03:22
chenkeYe. I think that's the right way03:22
SundarSo, the programming API https://review.opendev.org/698190 is not for Ussuri?03:23
brinzhangSundar: I mean, we should enable it as the base patch, and then fix the issue, right?03:23
shaohe_fengBandit  will scan the python code and gen AST to check  security issues03:24
s_shogoSundar , shaohe_feng : As I mentioned in the program patch , as reply for the Sundar's comment (https://review.opendev.org/698190) , I would like to take discussion the interface of PATCH API or not. IMO, that effects the schedule of merge.03:24
shaohe_fengwe had better let programming API in cyborg as soon as possible03:25
shaohe_fengso we should try do our best to make it ready03:25
shaohe_fengthis is a very important feature03:26
shaohe_fengs_shogo thank you for post this feature.03:26
Sundarbrinzhang: IMHO, it is better to have a patch series that fixes the open bandit issues and then make it a voting job. We need to merge other open patches for Ussuri. They will be affected if we make it a voting job too soon. Agreed?03:26
brinzhangshaohe_feng:I also prefer to use update method03:26
shaohe_fengupdate for firmware update?03:26
shaohe_fengdo you means a new update API?03:27
brinzhangSundar: we just enable the vote in zuul, but cannot merged, that cannot impact any open patches, I think03:27
YumengSundar,brinzhang : Agreed. We  should fix bandit failures first, then update bandit job as voting in zuul check. otherwise, if we change voting first, other patches will get Zuul -1.03:27
shaohe_fenglet's welcome a new fresher haibin-huang03:28
brinzhangif we just fix that bandit failures, we cannot ensure it work fine after we enabled bandit vote in zuul03:28
Yumengbrinzhang: I will post all patches for bandit today.03:28
shaohe_fengshe will try to help verify the program API in product evn.03:28
chenkes/she/he03:29
haibin-huang23I am very glad to join cyborg family.03:29
SundarWelcome haibin-huang . Would you like to introduce yourself?03:29
shaohe_fengsorry, he :]03:29
Yumengwelcome haibin-huang! :)03:29
shaohe_fenghaibin-huang you can help to review this new API03:29
brinzhangwelcome haibin-huang03:29
*** tetsuro has quit IRC03:30
chenkewelcome haibin again. since beijing hackson.03:30
s_shogowelcome haibin-huang03:30
xinranwangwelcome haibin-huang2303:30
songwenpingwelcome haibin-huang too03:30
shaohe_feng#link  https://review.opendev.org/#/c/698190/03:30
brinzhangYumeng: ok, sound good, we shuold ensure the bandit run works fine, than star merge the open fix bandit patches03:30
haibin-huang23I am from Intel. I work on ONAP in past years. I focus on multicloud in ONAP project03:31
haibin-huang23thank you everyone03:31
SundarGood to know you, haibin-huang2303:31
brinzhangshaohe_feng, Sundar: the program API, if we use update instead of patch, is it ok?03:32
Sundarbrinzhang: The HTTP verbs are PUT, POST and PATCH. When you say update, which one do you mean?03:33
brinzhangshaohe_feng, Sundar: ... PUT, soory03:34
SundarIMO, PATCH is the one that is closest in semantics to the programming operation. We are also using it for programming during ARQ binding.03:34
shaohe_fengbrinzhang you can read feilding's paper03:35
shaohe_fengtry to find which is more accurate03:35
brinzhangI just think it's not very popular, if you all think that ok, I will be not concern this ^03:36
songwenpinghttps://review.opendev.org/#/c/718584/03:36
SundarPUT should be idempotent. PATCH need not be. The programming operation is not idempotent -- it can fail once and succeed the next time03:36
shaohe_fengbut the patch seems is newer method, I'm not sure feilding have discuss  the patch in his paper.03:37
Sundarbrinzhang: I think you pointed me to some Nova oprrations. But Nova developers themselves criticized my patches when I followed their precedent. :)03:38
Sundar*operations with PUT03:38
brinzhangSundar: I see that, and I have not looked into03:39
brinzhangpls see https://review.opendev.org/#/c/718584/03:39
brinzhangand https://review.opendev.org/#/c/718584, that revert device and deployable when resource provider create fail03:40
*** songwenping has quit IRC03:41
SundarDoes everybody think that the right approach is to delete/revert Cyborg objects when a Placement update fails?03:41
brinzhangSundar: did you review this patch?03:41
shaohe_fengbrinzhang why do you prefer PUT?03:41
*** songwenping has joined #openstack-cyborg03:41
songwenpingi am sorry my network is not good03:42
brinzhangsongwenping: Sundar said Does everybody think that the right approach is to delete/revert Cyborg objects when a Placement update fails?03:42
Sundarxinranwang, Yumeng, s_shogo, shaohe_feng, all: Re. https://review.opendev.org/#/c/718584, Does everybody think that the right approach is to delete/revert Cyborg objects when a Placement update fails?03:43
xinranwangIMO, It is reasonable.03:44
xinranwangOtherwise, the conductor will not notice the diff and will now report to placement correctly.03:45
shaohe_fengLet me think about it03:45
xinranwangwhich will cause inconsistency03:45
SundarOk. if you all decide that the best approach is to revert Cyborg db updates, then why not do Placement update first and update Cyborg db only if that succeeds?03:46
songwenpingxinranwang:right03:46
shaohe_fengyes, there will be disaccord  between placement and cyborg DB03:47
SundarPersonally, I'd prefer to update Cyborg db always, and mark the objects that failed to sync with Placement. That way, operators will know that Cyborg did its job correctly and Placement failed.03:47
shaohe_fengSundar  our design is do placement update first.03:48
shaohe_fengwe have discuss this issue in Dan smith's patch.03:48
brinzhangSundar: I saw you comment, I think your suggestion will be compex, but it is also an alternative way03:49
brinzhangs/compex/complex03:49
shaohe_fengdo Placement update first and update Cyborg db only if that succeeds is the easiest way to keep consistent03:49
shaohe_fengwe have discuss it before.03:49
SundarOk, I'll support your decision. Has it been tested fully in various scenarios, including Placement service down for some time and then comes up?03:50
shaohe_fengbet another things: if placement DB successfully, but cyborg DB failed03:50
shaohe_fengthen what should cyborg do?03:50
xinranwangSundar:  please see chenke's patch here: https://review.opendev.org/#/c/711912/203:51
songwenpingSundar:no problem03:51
brinzhangshaohe_feng: that will be raise > 500, I rembered chenke done this03:51
songwenpingi have test this scenario03:52
xinranwangThe commit message has explain the root reason of data imconsistency03:52
shaohe_fengso I prefer make sure cyborg DB successfully first.03:53
Yumengagree we do the revert to handle the exception. and Let's  add reconsidering placement report as a V release goal. whether we need to decouple placement report or not.03:53
xinranwangIf placement succeed and cyborg DB failed. The next time cyborg conductor do diff, it finds a diff,  writes db and update placement with same info. I think the placment will return like "resourve provider already exists" something like this, But it will not crash the service. What do you think?03:55
shaohe_fengsundar's comment:  Personally, I'd prefer to update Cyborg db always, and mark the objects that failed to sync with Placement. That way, operators will know that Cyborg did its job correctly and Placement failed.03:55
brinzhangthis way just reduce the date inconsistence's scenario, that cannot cover all scenarios.03:56
chenkeactually. no one can cover all scenriaos.03:56
s_shogoI also discussed the Placement-Cyborg interaction in enable/disable API , with xinranwang.03:56
shaohe_fengwe have issues(same to sudar's comments) this in Dan's patch, let us check, why not we give this?03:57
shaohe_fenglet me check it.03:57
shaohe_fenganother things:03:57
xinranwangchenke's patch has fix several secenarios already.  But as we said, it is difficult to cover all sceanrios.03:58
shaohe_fengwe need More accelerators drivers support03:58
xinranwangI think we can revert like this patch does, and discuss the placement and db update decoupling in V.03:59
chenkeagree.03:59
brinzhangif we add a flag to mark the data, I think we also need to add a period task to check this flag, and then do resource consistency03:59
SundarOur philosophy should be that Cyborg db is the source of knowledge about accelerators in any OpenStack cloud. If we let Placement take precedence, that reduces the value of Cyborg. That's just my opinion. As brinzhang said, it may be complex to make those changes.03:59
Sundarbrinzhang: we already have a periodc task from the agent04:00
shaohe_fengwe discuss it in:  https://review.opendev.org/#/c/708726/04:01
shaohe_fengSundar agree.04:02
shaohe_fengcyborg db should firstly.04:02
shaohe_fengbut I see brinzhang agree Dan comments.04:03
brinzhangyeah, I agree this way, but now we should do?04:03
Sundarshaohe_feng: Dan Smith's suggestion may not apply well for Cyborg. For example, when an instance terminates, Placement resources get released. ARQs are unbound and deleted, but the unbinding may take some time if we do any device cleanup in the future. There is a period of time when the device is still not free (which Cyborg knows) but Placement04:03
Sundardoesn't know.04:04
brinzhangmaybe we can make this as a plan in V release04:04
shaohe_fengyes. This is a known and controversial issue04:04
shaohe_fengI agree with you.04:04
brinzhangsongwenping's just make this as a bug fix04:04
Sundarbrinzhang and all: first, is it an important issue to fix in Ussuri?04:04
shaohe_fengand I have discuss with xinranwang for it.04:04
shaohe_fengwe are all same options with you.04:05
shaohe_fengmake cyborg db correctly firstly?04:06
shaohe_fengbrinzhang agree?04:06
shaohe_fengor you can list some pros and cons to make the conclusion04:07
xinranwangThe root reason is that now we coupling placement and db update, and we can not avoid this dependency. IMHO, we should discuss the decoupling in V. Now, we just make sure that there is less risk to have data inconsistency.04:07
chenke+104:08
brinzhangI am not sure whether is good for me, when would like to fix this issue? shaohe_feng04:08
brinzhangxinranwang: agree +104:09
shaohe_fengOK, let's xinranwang to make a good design firstly04:09
songwenpingxinranwang: agree +104:09
Sundarxinranwang: are you voting to merge the patch now, and revisit it later?04:10
brinzhangxinranwang's mean is to agree to merge this patch, then in V we should do Sundar suggestion, right?04:10
shaohe_fengmaybe a new spec for it firstly, and we can discuss it base on the new spec.04:10
xinranwangI think we can merge it to let U have less risk of data inconsistency04:11
Sundarall ok with that?04:12
Yumengagree +104:12
Sundarshaohe_feng, s_shogo ^04:12
s_shogoagree, +104:13
chenkesome advice inline .04:13
chenkehttps://review.opendev.org/#/c/718584/9/cyborg/conductor/manager.py04:13
brinzhangYumeng, xinranwang: please add this to V release plan04:13
xinranwangI have thought about the decoupling a bit, there's lots of things to discuss,several gaps as I mentioned in last meeting. We should think more and discusss more to have a solution of decoupling.04:13
shaohe_fengI give up vote.04:13
shaohe_fengstay neutral04:13
SundarOk, great. Please mention this in commit message of this patch.04:14
shaohe_fengif we merge it, please leave details note in the patch04:14
brinzhangxinranwang: can you leave comments in this patch? that songwenping can update that04:14
shaohe_fengsuch as FIXME or NOTE in it.04:14
brinzhangThat TODO(), I think04:15
SundarAnything else, folks? We are over the time.04:15
xinranwangshaohe_feng:  yes, understand, it is a hard decision, we have a long way to go  lol...04:15
xinranwangbrinzhang:  sure04:15
brinzhangxinranwang: thanks04:15
brinzhangit's time to launch, end meetting?04:16
shaohe_fengOK.04:16
shaohe_fengmore drivers are welcome this new release right?04:16
SundarGreat. Thanks a lot, everybody! Great progress this cycle. We have another week to make even more contributions. Take care and stay safe!04:16
Sundar#endmeeting04:16
*** openstack changes topic to "Pending patches (Meeting topic: openstack-cyborg)"04:16
openstackMeeting ended Thu Apr 16 04:16:39 2020 UTC.  Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4)04:16
openstackMinutes:        http://eavesdrop.openstack.org/meetings/openstack_cyborg/2020/openstack_cyborg.2020-04-16-03.02.html04:16
openstackMinutes (text): http://eavesdrop.openstack.org/meetings/openstack_cyborg/2020/openstack_cyborg.2020-04-16-03.02.txt04:16
openstackLog:            http://eavesdrop.openstack.org/meetings/openstack_cyborg/2020/openstack_cyborg.2020-04-16-03.02.log.html04:16
shaohe_feng^ Sundar04:16
Sundarshaohe_feng: Yes, newer drivers are always welcome, hopefully with CI support so that they can be maintained.04:17
*** Sundar has quit IRC04:21
*** s_shogo has quit IRC04:34
*** tetsuro has joined #openstack-cyborg04:48
*** songwenping has quit IRC05:41
*** xinranwang has quit IRC06:19
openstackgerritYumengBao proposed openstack/cyborg master: Refactor v2 arq api  https://review.opendev.org/69608906:29
openstackgerritShogo Saito proposed openstack/cyborg master: Programming support (v2 Deployable API)  https://review.opendev.org/69819006:57
*** links has joined #openstack-cyborg08:02
openstackgerritMerged openstack/cyborg master: Refactor v2 arq api  https://review.opendev.org/69608909:07
*** chenke has quit IRC09:08
*** tetsuro has quit IRC09:52
*** tetsuro has joined #openstack-cyborg09:54
*** shaohe_feng has quit IRC10:18
*** tetsuro has quit IRC10:28
openstackgerritWenping Song proposed openstack/cyborg master: revert device and deployable when resource provider create fail  https://review.opendev.org/71858410:35
*** haibin-huang23 has quit IRC11:48
openstackgerritWenping Song proposed openstack/cyborg master: revert device and deployable when resource provider create fail  https://review.opendev.org/71858411:54
openstackgerritYumengBao proposed openstack/cyborg master: Fix bandit error: Ascend driver:[B602:subprocess_popen_with_shell_equals_true]  https://review.opendev.org/72045612:38
openstackgerritYumengBao proposed openstack/cyborg master: Fix bandit error: Ascend driver:[B602:subprocess_popen_with_shell_equals_true]  https://review.opendev.org/72045612:42
openstackgerritYumengBao proposed openstack/cyborg master: Fix bandit error: [B104:hardcoded_bind_all_interfaces]  https://review.opendev.org/72014912:43
*** links has quit IRC12:45
*** links has joined #openstack-cyborg12:56
openstackgerritYumengBao proposed openstack/cyborg master: Fix bandit error: SPDK driver:[B602:subprocess_popen_with_shell_equals_true]  https://review.opendev.org/72047513:25
*** haibin-huang has quit IRC13:26
*** haibin-huang has joined #openstack-cyborg13:26
openstackgerritYumengBao proposed openstack/cyborg master: Fix bandit error: [B108:hardcoded_tmp_directory]  https://review.opendev.org/72014313:48
openstackgerritYumengBao proposed openstack/cyborg master: Fix bandit error: [B104:hardcoded_bind_all_interfaces]  https://review.opendev.org/72014913:48
openstackgerritYumengBao proposed openstack/cyborg master: Fix bandit error: Ascend driver:[B602:subprocess_popen_with_shell_equals_true]  https://review.opendev.org/72045613:48
openstackgerritYumengBao proposed openstack/cyborg master: Fix bandit error: SPDK driver:[B602:subprocess_popen_with_shell_equals_true]  https://review.opendev.org/72047513:48
openstackgerritYumengBao proposed openstack/cyborg master: Change bandit job from non-voting to voting  https://review.opendev.org/72047913:48
*** Yumeng has quit IRC14:13
*** links has quit IRC17:44

Generated by irclog2html.py 2.15.3 by Marius Gedminas - find it at mg.pov.lt!