Thursday, 2020-07-30

*** zhouhan_ has joined #openvswitch01:14
*** zhouhan has quit IRC01:17
*** dholler has quit IRC02:20
*** dholler has joined #openvswitch02:34
*** troulouliou_div2 has quit IRC02:50
*** anilvenkata has joined #openvswitch04:45
*** atpa8a has joined #openvswitch04:49
*** links has joined #openvswitch05:08
*** armax has quit IRC05:18
*** ralonsoh has joined #openvswitch05:43
*** atpa8a has quit IRC06:07
*** atpa8a has joined #openvswitch06:09
*** jaicaa has quit IRC06:24
*** jaicaa has joined #openvswitch06:27
*** maciejjozefczyk has joined #openvswitch06:52
*** slaweq has joined #openvswitch07:10
*** rcernin has quit IRC07:32
*** eelco has joined #openvswitch07:34
*** ak77_ has quit IRC08:04
*** ak77_ has joined #openvswitch08:05
*** Madkiss_ has left #openvswitch08:13
*** Madkiss has joined #openvswitch08:13
*** ktraynor has joined #openvswitch08:22
*** rcernin has joined #openvswitch08:30
*** rcernin has quit IRC08:34
*** links has quit IRC08:55
*** links has joined #openvswitch09:06
*** imaximets__ is now known as imaximets09:39
*** zhouhan_ has quit IRC09:43
*** matteo has joined #openvswitch09:57
*** psahoo has joined #openvswitch10:28
*** rcernin has joined #openvswitch10:38
*** rcernin has quit IRC10:54
*** thaller has quit IRC11:12
*** thaller has joined #openvswitch11:13
*** psahoo has quit IRC12:18
*** psahoo has joined #openvswitch12:32
*** thaller has quit IRC12:38
*** thaller has joined #openvswitch12:42
*** bostondriver has joined #openvswitch12:53
*** dcbw has joined #openvswitch13:52
*** psahoo has quit IRC13:55
*** panda has quit IRC15:21
*** eelco has quit IRC15:22
*** armax has joined #openvswitch15:23
*** zhouhan has joined #openvswitch15:41
*** zhouhan has quit IRC15:42
*** zhouhan has joined #openvswitch15:44
*** livelace has joined #openvswitch15:52
*** anilvenkata has quit IRC16:31
*** anilvenkata has joined #openvswitch16:31
*** slaweq has quit IRC16:37
*** links has quit IRC16:44
*** dceara has joined #openvswitch16:44
*** zhouhan_ has joined #openvswitch16:45
*** zhouhan has quit IRC16:48
*** zhouhan_ has quit IRC17:11
*** zhouhan has joined #openvswitch17:12
*** mmichelson_ is now known as mmichelson17:13
mmichelsonHi everyone. I'm going to go ahead and start the meeting17:14
mmichelson#startmeeting ovn_community_development_discussion17:14
openstackMeeting started Thu Jul 30 17:14:16 2020 UTC and is due to finish in 60 minutes.  The chair is mmichelson. Information about MeetBot at http://wiki.debian.org/MeetBot.17:14
openstackUseful Commands: #action #agreed #help #info #idea #link #topic #startvote.17:14
openstackThe meeting name has been set to 'ovn_community_development_discussion'17:14
mmichelsonNormally I'd start the meeting by giving my update first, but I need to step away for a couple of minutes17:14
mmichelsonSo if anyone else wants to go ahead, I'll be back in a bit.17:14
dcearaHi17:15
*** anilvenkata has quit IRC17:16
*** Franky_T has joined #openvswitch17:16
_lore_hi all17:17
dcearaI can start, I have a quick update: we've been hitting some (probably) raft related issues lately. In ovn-k8s deployments, in specific conditions, the SB database ends up in an inconsistent state, i.e., on a follower the raft logs try to modify/delete records that are not in the snapshot. We're still investigating to figure out what the trigger is. It sounds a bit similar to what zhouhan reported a month or so ago. I was wondering if we got a17:21
dceararoot cause of that until now.17:21
dcearaOnce the DB ends up in this situation it will refuse any write transactions from clients.17:23
dcearaThat's it on my side for today. Thanks.17:25
mmichelsonOK, and I'm back now.17:25
mmichelsonI can go next17:25
mmichelsonEasy things first: I got the ECMP symmetric reply patch merged. Thanks numans for reviews. And thanks zhouhan for fixing the compile error introduced.17:26
mmichelsonNext, if you're an OVN committer you've probably seen my messages with Jeremy Kerr of Patchwork. It looks like we're going to have OVN as a separate project in patchwork from OVS. This will make it significantly easier to spot relevant patch series and get them reviewed.17:27
mmichelsonAnd having a separate patchwork project is also going to simplify the existing CI (i.e. 0-day robot) processing.17:27
mmichelsonIf you are a committer and have an objection to moving OVN to its own patchwork project, please speak up in the email thread.17:28
numansmmichelson++17:28
zhouhanmmichelson: sounds great17:28
mmichelsonAnd finally, we've had a number of fixes go into 20.06 and I think we're verging on the need for another release. Right now, all regressions and other bugs found by ovn-kubernetes have been fixed. However, one thing that's worth talking about is whether we think it is appropriate to put any "flow explosion" fixes into the 20.06 branch.17:29
mmichelsonovn-kubernetes is looking at changing to a shared gateway mode, and they have flow explosion concerns. So the question is, are these changes (those that have gone in, as well as those that are still up for review) candidates for branch-20.06?17:30
zhouhanif ovn-k8s can't wait for 20.09, then I think it is ok to add them to 20.0617:31
zhouhanotherwise, it would be better to avoid backporting, because those are not new features17:32
dcearammichelson: For the arp responder flow explosion patches, even though they're quite large, I think we can argue that they are bug fixes.17:32
zhouhans/not new features/not bug fixes17:33
zhouhanbut as dceara said, some of them were bug fixes. However, are those critical bugs?17:34
numansI'm fine if they need to be backported17:34
mmichelsonzhouhan, I think criticality is in the eye of the beholder :)17:36
zhouhanI'm fine with backporting, but I just want to make sure we can always keep released branches stable enough. We should be cautious about backporting any change that could impact existing features.17:37
mmichelsonzhouhan, +1. Yeah, that's why I wanted to float the idea in here.17:38
mmichelsonAnyways, the backporting idea doesn't have any hard vetoes, so that's good to see.17:39
mmichelsonAnd that's all I had wanted to bring up. Whoever wants to go next, feel free.17:39
numansI can go real quick.17:39
numansI worked on stabilizing the 20.06 branch as ovn-k8s CI reported issues.17:39
numansAll the issues are addressed now and I hope this will be the last regression because of I-P patches.17:40
dcearanumans++17:40
numansLast week I submitted a 2 patch series to improve conntrack usage in OVN. Would appreciate some reviews on it - https://patchwork.ozlabs.org/project/openvswitch/list/?series=19163017:41
zhouhanmmichelson: for a release, we freeze for weeks to make sure what's released is stable. We may have same criteria if we want to backport features - give some time for it to stay in master branch so that we have more confidence of its stability17:41
numanszhouhan, I couldn't get the chance to review the other 2 patches of yours. I'll get back to them soon. Hopefully by tomorrow.17:41
mmichelsonzhouhan, that makes sense. I'd argue that maybe we need more hardened CI so that we can get more immediate feedback as patches are merged to master.17:42
numanszhouhan, I've one point here. Right now no CMS, be it openstack or ovn-k8s, is running its CI tests on top of OVN master.17:42
zhouhannumans: thanks numans17:42
numansand hence we are not able to catch any regressions on master17:42
numansAnd our test coverage is definitely not covering many things.17:42
numansAll the I-P patch series regressions were caught once 20.06 was consumed by our internal QE testing and ovn-k8s testing.17:43
numansWe need to improve more test coverage on master.17:43
numansin order for us to be sure that new features don't cause regression.17:43
numansMaybe we should run ovn-k8s kind tests when we commit a patch to the ovn master branch.17:44
numansany thoughts here ?17:44
numansI think that should be possible with github actions.17:44
mmichelson+`117:45
mmichelson+1, I mean17:45
numansmmichelson, you had some plans on the upstream CI right ?17:45
zhouhannumans: mmichelson: yes, that's a problem. We should improve test in master. But still, there is more chance to find bugs in master when people keep developing on it. Otherwise, if we completely trust CI and then release, there is not much point to keep a released branch :)17:46
numansmmichelson, maybe github actions can be considered.17:46
numanszhouhan, Agree. But as a developer we definitely miss out on edge cases and some scenarios :)17:47
mmichelsonnumans, github actions could be a good idea. The only problem I have is that since we don't use PRs, the CI would run after the change is already pushed17:47
numansmmichelson, github actions would also run once we push a patch.17:48
numansSo maybe patchwork based CI (if you're planning on those lines) can test a patch before applying.17:48
numansand once a patch is committed we can run ovn-k8s tests for example.17:48
numansBut I guess we can discuss about it in the ML too :)17:49
zhouhannumans: yes, I mean, we should do both: 1) improve testing on master, e.g. borrow CI from ovn-k8s/networking-ovn to test against OVN master. 2) give more time for a new feature on master before backporting to released branch17:49
mmichelsonnumans, sure.17:49
stintelhi all. I'm getting this error every minute: Jul 30 20:48:12 ministore ovs-vswitchd[13839]: ovs|02297|odp_util(handler13)|ERR|internal error parsing flow key17:49
stintelrecirc_id(0x1),dp_hash(0xa90679df),skb_priority(0x7),in_port(7),skb_mark(0),ct_state(0x21),ct_zone(0),ct_mark(0),ct_label(0),ct_tuple4(src=10.50.18.6,dst=239.0.0.250,proto=2,tp_src=0,tp_dst=0),eth(src=b2:1d:c3:86:4d:33,dst=01:00:5e:00:00:fa),eth_type(0x8100),vlan(vid=18,pcp=0),encap(eth_type(0x0800),ipv4(src=10.50.18.6,dst=239.0.0.250,proto=2,tos=0xc0,ttl=1,frag=no))17:49
numanszhouhan, agree on both.17:49
numansI'm done with the update. If some one wants to go next.17:49
numansstintel, Hi. this is on OVN deployment ?17:50
stintelopenvswitch-2.13.0 on kernel 5.7.8 (using kernel openvswitch modules)17:50
numanswe are in the middle of OVN meeting. Probably we can discuss after it.17:50
stintelnumans: I am seeing this permanently17:50
stintelah sorry about that17:50
numansor you can bring up next if you want :)17:50
zhouhanmay I go next?17:51
numanssure.17:51
zhouhanI was working on scale testing last week.17:51
zhouhanI found that there was a regression between 2.12 and later branches. The northd CPU utilization almost doubled in 20.03/20.06 compared to 2.12.17:52
zhouhanI was testing the scenario of creating and binding 12K ports across 1200 HVs17:53
mmichelsonzhouhan, ouch17:53
zhouhanI am also reworking on the separate nb_cfg in Chassis/Chassis_private. Will send the patch soon.17:54
zhouhanI'll do more testing and analysis, and this is my update.17:55
numansI want to discuss a bit on the ovn-northd. Any idea on the ovn-northd-ddlog ?17:55
* zhouhan has the same question17:56
numansI feel may be we should add I-P support to ovn-northd (may be a rudimentary one to start with)17:56
numansWith my last work on the I-P patches, I feel more confident in it.17:56
zhouhannumans: do you mean I-P without DDlog?17:57
numansAnd this could relieve a bit of CPU for ovn-northd17:57
numanszhouhan, yes.17:57
numansuntil we have ddlog ready17:57
numansnot a full I-P support, but start with some basic scenarios17:57
numansJust a thought and wanted to check what everyone here thinks ?17:58
numansIs it worth it ?17:58
zhouhannumans: but it seems blp and leonid have brought ddlog very close for northd17:58
zhouhannumans: I wonder if this would be a big waste of effort17:58
numanszhouhan, That's the concern I have too.17:58
zhouhanI think the DDlog problem is (I guess) that northd code keeps changing and then it would be hard for Ben to catch up with17:59
zhouhanIf we do I-P, would it be the same problem?17:59
numanszhouhan, yes. that's the problem. I think the sooner we have ddlog the better.17:59
*** ralonsoh has quit IRC18:00
numanszhouhan, probably not. Because we are not adding new feature to northd right ? So the ddlog version doesn't need to catch up on it.18:00
numansAnyway I wanted to check on this :)18:01
zhouhannumans: sorry, what do you mean "we are not adding new feature to northd"? I think we kept adding :)18:01
mmichelsonAdding I-P doesn't add features that ddlog cares about18:01
numanszhouhan, I think I misunderstood your comment- If we do I-P, would it be the same problem?18:01
numansyes.18:02
numansIf some one wants to jump in and update please do so. Looks like I'm taking more time :)18:03
zhouhanoh, I meant, if we do I-P manually (without DDlog), would we face the same problem that northd keeps changing and our I-P implementation can't catch up?18:03
mmichelsonI think it depends to what degress we add I-P18:03
mmichelson*degree18:03
zhouhannumans: BTW, do you have any idea why northd CPU doubled after 2.12?18:03
numanszhouhan, no idea on that.18:04
zhouhanok18:04
dcearazhouhan: what scale scenario are you testing with?18:04
zhouhandceara: I was testing the scenario of creating and binding 12K ports across 1200 HVs18:04
dcearazhouhan: without ACLs/LBs I assume, right?18:05
dcearazhouhan: one thing that comes to mind is the hairpin flows for LBs on logical switches.18:05
zhouhanOh, it seems northd is costing CPU when system is idle (not running any tests). This didn't happen before.18:06
zhouhandceara: no, not ACLs/LBs18:06
dcearazhouhan: ack18:06
zhouhanI'll dig more on this. Please continue if anyone wants to update18:07
mmichelsonI'm guessing by the silence that there's nobody else wanting to update18:10
mmichelsonSo I'll end the meeting here. Thanks everyone18:10
numansBye18:10
mmichelson#endmeeting18:10
openstackMeeting ended Thu Jul 30 18:10:14 2020 UTC.  Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4)18:10
openstackMinutes:        http://eavesdrop.openstack.org/meetings/ovn_community_development_discussion/2020/ovn_community_development_discussion.2020-07-30-17.14.html18:10
openstackMinutes (text): http://eavesdrop.openstack.org/meetings/ovn_community_development_discussion/2020/ovn_community_development_discussion.2020-07-30-17.14.txt18:10
zhouhanbye18:10
openstackLog:            http://eavesdrop.openstack.org/meetings/ovn_community_development_discussion/2020/ovn_community_development_discussion.2020-07-30-17.14.log.html18:10
numansstintel, I'd suggest sending an email to the ovs-dev ML18:10
numansstintel, from what little I know, this normally happens when there is a mismatch in the way flow key is seen by vswitchd and the kernel datapath.18:11
*** livelace has quit IRC18:11
*** matteo has quit IRC18:18
*** matteo has joined #openvswitch18:18
stintelnumans: ok, I'll try that, probably after my 2w holiday. it's always the same multicast destination address. not seeing this for any other traffic18:35
*** livelace has joined #openvswitch18:36
*** Franky_T has quit IRC18:47
*** maciejjozefczyk has quit IRC19:13
*** maciejjozefczyk has joined #openvswitch19:13
*** zhouhan has quit IRC19:46
*** zhouhan has joined #openvswitch19:46
*** zhouhan_ has joined #openvswitch19:47
*** dceara has quit IRC19:50
*** zhouhan has quit IRC19:51
*** maciejjozefczyk has quit IRC19:58
*** zhouhan_ has quit IRC20:51
*** zhouhan has joined #openvswitch20:52
*** zhouhan_ has joined #openvswitch21:14
*** zhouhan has quit IRC21:14
*** livelace has quit IRC21:14
*** zhouhan_ has quit IRC21:39
*** zhouhan has joined #openvswitch21:46
*** zhouhan_ has joined #openvswitch21:54
*** zhouhan has quit IRC21:55
*** zhouhan has joined #openvswitch22:07
*** zhouhan_ has quit IRC22:10
*** matteo has quit IRC22:14
*** zhouhan has quit IRC22:22
*** zhouhan has joined #openvswitch22:23
*** zhouhan has quit IRC23:04
*** rcernin has joined #openvswitch23:14
*** zhouhan has joined #openvswitch23:37
*** bostondriver has quit IRC23:38
