Thursday, 2020-06-18

*** zhouhan_ has joined #openvswitch00:38
*** zhouhan has quit IRC00:42
*** markmcclain has quit IRC02:01
*** yamamoto has joined #openvswitch02:01
*** markmcclain has joined #openvswitch02:02
*** dholler has quit IRC02:25
*** yamamoto has quit IRC02:26
*** yamamoto has joined #openvswitch02:27
*** yamamoto has joined #openvswitch02:27
*** dholler has joined #openvswitch02:38
*** zhouhan_ has quit IRC02:43
*** zhouhan has joined #openvswitch02:44
*** rcernin has quit IRC02:48
*** rcernin has joined #openvswitch03:00
*** rcernin has quit IRC03:05
*** armax has quit IRC03:17
*** rcernin has joined #openvswitch03:21
*** rcernin has quit IRC03:22
*** rcernin has joined #openvswitch03:22
*** yamamoto has quit IRC03:41
*** yamamoto has joined #openvswitch03:49
*** anilvenkata has joined #openvswitch04:57
*** cpaelzer__ has joined #openvswitch05:17
*** cpaelzer has quit IRC05:17
*** acidfu_ has joined #openvswitch05:28
*** acidfoo has quit IRC05:30
*** numans has joined #openvswitch05:40
*** cpaelzer__ is now known as cpaelzer05:45
*** links has joined #openvswitch05:46
*** eelco has joined #openvswitch06:02
*** imaximets__ has joined #openvswitch06:06
*** imaximets_ has quit IRC06:06
*** links has quit IRC06:07
*** links has joined #openvswitch06:12
*** maciejjozefczyk has joined #openvswitch06:22
*** blahdodo_ has quit IRC06:31
*** links has quit IRC06:32
*** blahdodo has joined #openvswitch06:35
*** psahoo has joined #openvswitch06:36
*** dmarchan1 is now known as dmarchand06:48
*** mmirecki has joined #openvswitch06:55
*** slaweq has joined #openvswitch07:01
*** links has joined #openvswitch07:01
*** slaweq has quit IRC07:06
*** slaweq has joined #openvswitch07:07
*** dceara has joined #openvswitch07:18
*** maciejjozefczyk has quit IRC07:19
*** maciejjozefczyk has joined #openvswitch07:37
*** anilvenkata has quit IRC07:47
*** anilvenkata has joined #openvswitch07:52
*** aconstan has joined #openvswitch07:53
*** rcernin has quit IRC07:54
*** rcernin_ has joined #openvswitch07:55
*** links has quit IRC08:00
*** links has joined #openvswitch08:10
*** rcernin_ has quit IRC08:20
*** darkemon has quit IRC08:30
*** darkemon has joined #openvswitch08:32
*** yamamoto has quit IRC08:42
*** yamamoto has joined #openvswitch08:42
*** yamamoto has quit IRC09:12
*** yamamoto has joined #openvswitch09:13
*** yamamoto has joined #openvswitch09:23
*** links has quit IRC09:27
*** links has joined #openvswitch09:28
*** yamamoto has quit IRC09:40
*** psahoo has quit IRC09:53
*** psahoo has joined #openvswitch10:09
*** yamamoto has joined #openvswitch10:14
*** psahoo_ has joined #openvswitch10:16
*** yamamoto has quit IRC10:20
*** psahoo has quit IRC10:20
*** yamamoto has joined #openvswitch10:20
*** yamamoto has quit IRC10:43
*** yamamoto has joined #openvswitch10:44
*** yamamoto has quit IRC10:44
*** yamamoto has joined #openvswitch10:48
*** rcernin_ has joined #openvswitch10:49
*** yamamoto has quit IRC10:53
*** yamamoto has joined #openvswitch10:54
*** yamamoto has quit IRC10:59
*** yamamoto has joined #openvswitch11:16
*** yamamoto has quit IRC11:20
*** yamamoto has joined #openvswitch11:34
*** yamamoto has quit IRC12:12
*** bostondriver has joined #openvswitch12:45
*** donhw has quit IRC12:55
*** apus has quit IRC13:04
*** apus has joined #openvswitch13:04
*** donhw has joined #openvswitch13:15
*** armax has joined #openvswitch13:27
*** yamamoto has joined #openvswitch13:28
*** yamamoto has quit IRC13:34
*** acidfu_ has quit IRC13:53
*** billp_ has joined #openvswitch13:56
*** dcbw has joined #openvswitch13:59
*** billp has quit IRC14:00
*** imaximets__ is now known as imaximets14:01
*** KpuCko has quit IRC14:09
*** rcernin_ has quit IRC14:14
*** acidfu_ has joined #openvswitch14:51
*** slaweq has quit IRC15:33
*** links has quit IRC15:44
*** eelco has quit IRC15:58
*** jobewan has quit IRC16:09
*** mmirecki has quit IRC16:11
*** zhouhan has quit IRC16:37
*** zhouhan has joined #openvswitch16:38
*** armax has quit IRC16:48
*** armax has joined #openvswitch16:49
*** zhouhan_ has joined #openvswitch17:01
*** zhouhan has quit IRC17:04
*** zhouhan_ has quit IRC17:15
*** zhouhan has joined #openvswitch17:16
numansHello17:17
pandahi17:17
numansmmichelson, meeting time?17:17
flaviofhi all17:17
dcearaHi17:17
numansFinding the tag to start the meeting.17:18
numans#startmeeting ovn_community_development_discussion17:18
openstackMeeting started Thu Jun 18 17:18:44 2020 UTC and is due to finish in 60 minutes.  The chair is numans. Information about MeetBot at http://wiki.debian.org/MeetBot.17:18
openstackUseful Commands: #action #agreed #help #info #idea #link #topic #startvote.17:18
openstackThe meeting name has been set to 'ovn_community_development_discussion'17:18
numansHello everyone.17:18
numansNot sure if mmichelson is there or not.17:19
_lore_hi all17:19
numanswe can probably start.17:19
flaviofo/17:19
mmichelsonsorry I'm here17:19
mmichelsonJust got pulled away for a sec17:19
pandao/17:19
numansmmichelson, I just started. All yours.17:19
mmichelsonOK, thanks17:19
mmichelsonBiggest thing is that last week we released 20.06.0 and 20.03.117:19
numansthanks for the release.17:20
flaviofwoot!17:20
mmichelsonI noticed blp's patch series to OVS to remove insensitive language where possible. I think that at least in our documentation we probably should follow suit where it makes sense.17:20
numansmmichelson, agreed. I did a grep and found a few instances of such words.17:20
mmichelsonSo I did some searches for specific trigger words, and in documentation it's not too difficult to fix up17:21
numansagree.17:21
mmichelsonOther than that, I've been chipping away at old patches of mine to try to get them in shape to be updated (case sensitivity in MAC and IPv6 addresses, ovs-scale-test plaintext client)17:21
mmichelsonAnd I've been reviewing17:22
mmichelsonThat's all from me. Whoever wants to go next, feel free.17:22
numansI can go real quick.17:22
numansI got the ack from zhouhan for v12 of the I-P (incremental processing) patches.17:22
numanszhouhan thanks for the review.17:23
numanswaiting for dceara's comments if any.17:23
numansI worked on a couple of patches and submitted them for review.17:23
numansOne adds packet marking for packets that had the router policies applied.17:23
zhouhannumans: np17:23
numansAnd did some reviews.17:24
numansThat's it from me.17:24
dcearanumans, ack, I'll try to have another look at the I-P patches tomorrow.17:24
_lore_can I go next? very quick17:24
numansdceara, thanks.17:24
_lore_this week I mainly worked on an MTU issue in OVS/OVN17:25
*** mmirecki has joined #openvswitch17:25
_lore_in particular, if DF is not set, the sender fragments the traffic after OVN sends an ICMP error message17:25
_lore_the issue is that OVN still keeps sending an ICMP error message for the fragmented traffic if we have connection tracking in the ingress pipeline17:26
_lore_I figured out it is an issue in the OVS kernel datapath; I need to send the fix upstream17:26
numans_lore_, thanks for fixing this.17:27
_lore_then I noticed the value we configured for check_packet_len is the frame size and not the MTU17:27
_lore_so I posted a patch for it17:27
_lore_numans: imaximets: I sent a v217:27
_lore_any comments on it?17:27
numans_lore_, ack.17:27
numansI don't have any.17:27
imaximets_lore_, I didn't look yet.17:28
*** psahoo_ has quit IRC17:28
*** Franky_T has joined #openvswitch17:28
_lore_ack, actually in the current implementation we are wasting 14+4 bytes17:28
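One way to read _lore_'s "14+4 bytes": if check_pkt_len measures the length of the whole L2 frame, then configuring the bare MTU as the threshold flags packets that would actually fit, by up to the size of the Ethernet header (14 bytes) plus one VLAN tag (4 bytes). The sketch below only illustrates that arithmetic; the helper names are hypothetical and the header sizes are an assumption about what the 14+4 refers to.

```python
# Assumed meaning of "14+4": Ethernet header plus one 802.1Q VLAN tag.
ETH_HEADER_LEN = 14  # destination MAC (6) + source MAC (6) + EtherType (2)
VLAN_TAG_LEN = 4     # TPID (2) + TCI (2)


def frame_threshold(mtu: int) -> int:
    """Hypothetical helper: largest L2 frame that still carries an MTU-sized packet."""
    return mtu + ETH_HEADER_LEN + VLAN_TAG_LEN


def too_big(frame_len: int, threshold: int) -> bool:
    """Models a 'frame longer than threshold' check done on the whole frame."""
    return frame_len > threshold


mtu = 1500
full_frame = frame_threshold(mtu)  # a 1500-byte IP packet in a VLAN-tagged frame

print(too_big(full_frame, mtu))                   # True: flagged although it fits the MTU
print(too_big(full_frame, frame_threshold(mtu)))  # False
print(frame_threshold(mtu) - mtu)                 # 18: bytes lost with the MTU as threshold
```

With the bare MTU as the threshold, a tagged frame carrying a full 1500-byte packet is reported as too big, which matches the "wasted" 18 bytes under this reading.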
_lore_that's all from my side17:29
zhouhanmay I go next?17:31
numans sure.17:32
zhouhanwe noticed another RAFT problem this week17:32
zhouhanFor some reason, one of the nodes in the cluster missed some transactions and became inconsistent with the leader and the other node17:33
zhouhanRestarting the node doesn't help, because the current logs are consistent with the cluster and updates can continue.17:34
numanszhouhan, so the missed transactions are gone forever?17:34
zhouhanThe inconsistent part is in the snapshot, which is never going to be synced unless an install snapshot RPC is triggered, which usually doesn't happen.17:35
numansok17:35
zhouhannumans: yes, for that node, the data is inconsistent forever. So any clients connected to that server initially would get inconsistent data17:35
numanszhouhan, is it the leader?17:36
zhouhanonly re-joining the node to the cluster would solve the issue.17:36
zhouhannumans: no it is not the leader17:36
numanszhouhan, ok.17:36
numansso ovn-controller and ovn-northd will not see this inconsistency since they always connect to the leader, right?17:37
dcearazhouhan, Should we have a periodic consistency check to detect such cases earlier?17:37
zhouhanFor some time it is not even detected. However, once the leader appends a transaction that touches the inconsistent part of the data, e.g. deleting a nonexistent row, the server detects that it is inconsistent and then rejects any transaction through that node, so all the clients connected to that node fail forever.17:38
zhouhandceara: it already detects this as well as it can from the server's point of view.17:39
zhouhandceara: but it is not handled gracefully, and it still allows clients to connect17:39
imaximetszhouhan, but clients should reconnect to another, correct server and sync with it.17:40
dcearazhouhan, So is the problem that we allow new client connections even when in this state?17:40
zhouhanThe root cause of the inconsistent data is still not clear. One suspected trigger is that the node rebooted by itself before this happened.17:40
zhouhandceara: that is one of the problems. But the first question is how the inconsistent data could happen at all. I still have no clue.17:41
*** mmirecki has quit IRC17:41
dcearazhouhan, ack.17:42
zhouhanimaximets: because of fast-resync, clients won't get the correct data unless they restart17:42
zhouhanso I am thinking maybe the client-side detection proposed by dceara is still needed for such cases.17:43
imaximetszhouhan, will the recent fix from dceara that I merged fix the issue with fast-resync in this case?17:43
dcearazhouhan, but does the IDL detect the missing updates?17:43
imaximetszhouhan, I mean, client will detect inconsistency eventually and disable fast-resync.17:44
zhouhanimaximets: that fix is for conditional monitoring. In this case it is purely a data inconsistency on the server side, so that's not helpful17:44
imaximetszhouhan, oh.. ok.17:44
zhouhanimaximets: the IDL detection and disabling fast-resync (the last patch of the series) was not merged :)17:45
dcearazhouhan, OK, but you should still see logs on ovn-controller about nonexistent rows. Do you see those in your case?17:45
zhouhandceara: I guess it would detect it if there were transactions to trigger it. But we fixed things before that happened (by restarting the clients)17:46
imaximetszhouhan, I understand.17:46
dcearazhouhan, OK, I can address the comments from imaximets and send a new version of that patch then.17:46
zhouhanI would like to thank Ali for reporting this issue (Ali may not be in the channel today)17:46
zhouhanThat's my update :)17:46
mmichelsonOK, anybody else care to give an update?17:48
flaviofMay I go next?17:48
dcearaI just have a quick note for today: zhouhan, my plan is to have a go at the lflow explosion reported by Girish for dnat_and_snat as soon as I get a chance, unless you have already started on it.17:48
zhouhandceara: I am still not sure. It would help to self-correct in such a situation, but we also need to make sure such a problem is exposed rather than hidden completely17:48
zhouhandceara: sure, thanks for helping on dnat_and_snat flow problem!17:49
dcearazhouhan, that's why I was thinking of a periodic self check on the server side to see if the DBs are consistent.17:49
imaximetszhouhan, dceara: I think we should report such issues loudly in logs with ERR log level at least.17:50
dcearaimaximets, ++17:50
zhouhandceara: hmm, on the server side, I am not sure what other check can be done, besides the current check that detects the inconsistency when a transaction is applied.17:51
*** Franky_T has quit IRC17:52
zhouhandceara: in theory, the raft log should already ensure consistency. It must be a bug somewhere in some corner case.17:52
dcearazhouhan, I see, ok.17:52
zhouhanimaximets: +1 for error logs17:52
dcearazhouhan, imaximets: then I'll respin the patch and use error logs instead of the current WARN and we can continue the discussion on the ML (at least for the client side)17:53
zhouhanimaximets: on the server side, it already has error logs when this is detected, but they are not very straightforward. It is only "syntax error: ..."17:53
zhouhandceara: sounds good17:53
imaximetszhouhan, We might need to improve server side logs.17:54
imaximetsdceara, thanks.17:55
zhouhanyeah, and better handling, such as the node disconnecting itself from the cluster in such a case, I think17:55
*** Franky_T has joined #openvswitch17:55
pandazhouhan: is all this captured in some bug report?17:56
zhouhanpanda: no, it is just here :)17:57
pandazhouhan: ok.17:57
zhouhanOne way to detect such a situation from a monitoring point of view is to periodically compare the row counts of particular tables, such as logical_flow and port_binding, on each individual node.17:58
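A minimal sketch of the periodic check zhouhan describes, assuming the per-table row counts have already been collected from each cluster member individually (the collection step is left out). The function name and input layout are hypothetical. Equal counts do not prove the data match; the check only makes large divergences, like the one reported here, easy to spot.

```python
from typing import Dict


def row_count_mismatches(counts: Dict[str, Dict[str, int]]) -> Dict[str, Dict[str, int]]:
    """Return {table: {server: rows}} for every table whose row count differs.

    `counts` maps a server name to {table name: row count} as seen on that
    server. A table missing from a server's map is treated as having 0 rows.
    """
    tables = set()
    for per_table in counts.values():
        tables.update(per_table)

    mismatches = {}
    for table in sorted(tables):
        per_server = {server: per_table.get(table, 0)
                      for server, per_table in counts.items()}
        if len(set(per_server.values())) > 1:  # servers disagree
            mismatches[table] = per_server
    return mismatches


# Illustrative numbers only: sb3 has silently lost some logical flows.
counts = {
    "sb1": {"Logical_Flow": 1200, "Port_Binding": 40},
    "sb2": {"Logical_Flow": 1200, "Port_Binding": 40},
    "sb3": {"Logical_Flow": 950, "Port_Binding": 40},
}
print(row_count_mismatches(counts))
# {'Logical_Flow': {'sb1': 1200, 'sb2': 1200, 'sb3': 950}}
```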
imaximetspanda, zhouhan: It's good that we have meeting logs. :)17:58
flaviof++17:58
dcearaimaximets, zhouhan Shall we consider opening a github issue for this?17:58
mmichelsonProbably a good idea17:59
zhouhan++17:59
zhouhanWe don't have an official way to track OVS bugs I guess17:59
zhouhanOr even OVN bugs17:59
imaximetsit's usually just an e-mail thread.17:59
numansgithub issues could be a starter here.18:00
mmichelsonYeah, github issues make sense to me. The project is on github, after all :)18:00
zhouhanyes, an email thread is good for discussion but doesn't provide good tracking. Can we agree on github for bug tracking in the future?18:01
pandazhouhan: row counts may not be the right approach: if you receive two updates that leave the number of rows unchanged, you won't detect the changes.18:01
flaviof#action use https://github.com/ovn-org/ovn/issues as a way of tracking ovn bugs going forward18:01
zhouhanpanda: yes, it is just one indicator. When this happened, there was a difference of hundreds of rows in our case :)18:02
pandazhouhan: yep ok.18:02
zhouhanpanda: of course such monitoring isn't guaranteed to detect every inconsistency18:02
dcearazhouhan, maybe such a monitoring utility would be useful to have in the repo itself. It doesn't have to be 100% precise if it raises an alarm about a potential inconsistency. What do you think?18:03
zhouhandceara: +118:03
pandaI would try to use logical clocks if possible18:04
pandabut it might be a long term solution.18:04
zhouhanor maybe implement a feature in raft to periodically compare snapshots between the servers18:05
imaximetszhouhan, dceara: crazy idea: a client IDL that connects to all cluster nodes at once and monitors the differences over time.18:06
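imaximets' idea could look roughly like this, assuming a monitoring client that already receives each server's view of one table as a set of row UUIDs (connecting to every member individually is left out). The class and its `threshold` parameter are hypothetical. Since followers may briefly lag behind the leader, only differences that persist across several consecutive samples are reported.

```python
from typing import Dict, Set


class DivergenceTracker:
    """Report rows whose presence differs between cluster members for
    `threshold` consecutive samples, ignoring short-lived replication lag."""

    def __init__(self, threshold: int = 3) -> None:
        self.threshold = threshold
        self.streak: Dict[str, int] = {}  # row UUID -> consecutive divergent samples

    def sample(self, rows_by_server: Dict[str, Set[str]]) -> Set[str]:
        """Feed one observation (server -> set of row UUIDs); return rows to alarm on."""
        if not rows_by_server:
            return set()
        views = list(rows_by_server.values())
        divergent = set().union(*views) - set.intersection(*views)
        # Rows that converged drop out; rows still divergent extend their streak.
        self.streak = {row: self.streak.get(row, 0) + 1 for row in divergent}
        return {row for row, n in self.streak.items() if n >= self.threshold}


tracker = DivergenceTracker(threshold=2)
tracker.sample({"s1": {"a", "b"}, "s2": {"a"}})       # 'b' missing on s2 once: ignored
tracker.sample({"s1": {"a", "b"}, "s2": {"a", "b"}})  # s2 caught up: streak reset
tracker.sample({"s1": {"a", "c"}, "s2": {"a"}})
print(tracker.sample({"s1": {"a", "c"}, "s2": {"a"}}))  # {'c'}
```

Feeding (UUID, content hash) pairs instead of bare UUIDs would also flag rows whose contents differ, which addresses panda's point that matching row counts can hide changed rows.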
pandadoes the update use two-phase or three-phase commits?18:06
zhouhanI'd like to try to figure out the root cause after all :)18:06
mmichelsonimaximets, I was wondering if you might bring something like that up :)18:06
dcearaimaximets, that sounds cool!18:06
pandazhouhan: yeah, that would be the start :)18:06
imaximetszhouhan, sure. root cause must be identified anyway.18:06
zhouhanThis was interesting discussion. Thanks all.18:07
* dceara has to run. Bye all!18:08
* numans too.18:08
flaviofbye dceara !18:08
zhouhan(a pity that blp wasn't here)18:08
zhouhanbye all18:08
mmichelsonbye!18:08
numansbye18:08
pandabye18:08
imaximetsbye18:08
dcearaThanks!18:08
mmichelsonSeems a good place to end the meeting :)18:08
mmichelson#endmeeting18:08
imaximetsdidn't work. :)18:09
mmichelsonuhhh18:09
flaviofmaybe numans has to do it?18:09
pandatoo late18:09
numans#endmeeting18:09
openstackMeeting ended Thu Jun 18 18:09:30 2020 UTC.  Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4)18:09
openstackMinutes:        http://eavesdrop.openstack.org/meetings/ovn_community_development_discussion/2020/ovn_community_development_discussion.2020-06-18-17.18.html18:09
openstackMinutes (text): http://eavesdrop.openstack.org/meetings/ovn_community_development_discussion/2020/ovn_community_development_discussion.2020-06-18-17.18.txt18:09
openstackLog:            http://eavesdrop.openstack.org/meetings/ovn_community_development_discussion/2020/ovn_community_development_discussion.2020-06-18-17.18.log.html18:09
pandathis will be a 12-hour meeting18:09
_lore_bye all18:09
pandanumans: thanks for the merge!18:09
pandabye18:09
numansflaviof, I thought anyone could end the meeting18:10
numanspanda, welcome18:10
flaviofnumans: I think you have to finish what you started. ;)18:10
numansflaviof, indeed :)18:10
numansflaviof, Actually I was about to disappear18:10
*** factor__ has quit IRC18:11
*** factor__ has joined #openvswitch18:11
*** Franky_T has quit IRC18:16
*** dholler has quit IRC18:22
*** acidfoo_ has joined #openvswitch18:27
*** acidfu_ has quit IRC18:29
*** jpettit has joined #openvswitch18:39
*** jpettit has quit IRC18:40
*** maciejjozefczyk has quit IRC18:52
*** zhouhan has quit IRC19:00
dcearaimaximets, re the IDL inconsistency detection patch, based on the previous discussion during the OVN meeting, I'm wondering now if we should do an idl_retry even in cases where we could correct the inconsistency (e.g., deletion of missing row)19:15
*** zhouhan_ has joined #openvswitch19:16
dcearazhouhan_, not sure if you got this too so resending: <dceara> imaximets, re the IDL inconsistency detection patch, based on the previous discussion during the OVN meeting, I'm wondering now if we should do an idl_retry even in cases where we could correct the inconsistency (e.g., deletion of missing row)19:17
*** zhouhan_ has quit IRC19:19
*** zhouhan_ has joined #openvswitch19:20
*** zhouhan_ has quit IRC19:20
*** zhouhan has joined #openvswitch19:21
zhouhandceara: thanks, I didn't see that19:21
zhouhandceara: I am not sure what this means. The idea was to retry with last_id = 0, right?19:23
dcearazhouhan: the patch set last_id = 0 and then called ovsdb_idl_retry, which forces a reconnect, but only if the update was trying to modify a missing row. In other cases, e.g., "add existing row", the code recovers by deleting and then re-adding19:26
dcearathe row19:26
dcearazhouhan: I'm thinking that these cases also correspond to inconsistencies, right?19:27
*** anilvenkata has quit IRC19:27
zhouhanok, I see. Yes, I agree with that.19:27
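A toy model of what dceara proposes here, assuming the behaviour described in the log: any update that does not match the local copy (modifying or deleting a missing row, or inserting a row that already exists) is handled the same way, by logging an error, clearing the fast-resync point and requesting a reconnect, instead of silently repairing the "add existing row" case. This is a hypothetical Python sketch, not the C ovsdb-idl code; `None` stands in for the zeroed last_id, and the error-level log follows imaximets' suggestion from the meeting.

```python
import logging
from typing import Dict, Optional

log = logging.getLogger("replica_sketch")


class ReplicaSketch:
    """Toy client-side copy of one table. Every update that does not fit the
    local copy is treated as an inconsistency (hypothetical, not ovsdb-idl)."""

    def __init__(self) -> None:
        self.rows: Dict[str, dict] = {}
        self.last_id: Optional[str] = None  # None plays the role of last_id = 0
        self.needs_reconnect = False

    def _inconsistent(self, reason: str) -> None:
        # Log loudly, drop the fast-resync point so the next session fetches
        # a full copy, and ask the caller to reconnect.
        log.error("inconsistent update: %s", reason)
        self.last_id = None
        self.needs_reconnect = True

    def apply(self, op: str, uuid: str, row: Optional[dict] = None,
              txn_id: Optional[str] = None) -> None:
        if op == "insert":
            if uuid in self.rows:  # the patch under discussion repaired this by delete + re-add
                self._inconsistent(f"insert of existing row {uuid}")
                return
            self.rows[uuid] = dict(row or {})
        elif op in ("modify", "delete"):
            if uuid not in self.rows:
                self._inconsistent(f"{op} of missing row {uuid}")
                return
            if op == "modify":
                self.rows[uuid].update(row or {})
            else:
                del self.rows[uuid]
        else:
            raise ValueError(f"unknown op {op!r}")
        if txn_id is not None:
            self.last_id = txn_id


replica = ReplicaSketch()
replica.apply("insert", "u1", {"name": "lsp1"}, txn_id="t1")
replica.apply("delete", "u2", txn_id="t2")      # missing row: inconsistency
print(replica.needs_reconnect, replica.last_id)  # True None
```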
dcearazhouhan: ok, I'll give it a bit more thought before sending a new revision, thanks!19:28
zhouhandceara: thank you19:30
*** yamamoto has joined #openvswitch19:32
*** yamamoto has quit IRC19:37
*** zhouhan has quit IRC19:51
*** mmichelson has quit IRC19:52
*** zhouhan has joined #openvswitch19:52
*** mmichelson has joined #openvswitch19:59
*** zhouhan_ has joined #openvswitch20:05
*** zhouhan has quit IRC20:05
*** mmirecki has joined #openvswitch20:11
*** zhouhan_ has quit IRC20:12
*** zhouhan has joined #openvswitch20:13
*** slaweq has joined #openvswitch20:23
*** zhouhan has quit IRC20:30
*** zhouhan has joined #openvswitch20:32
*** mmirecki has quit IRC20:33
*** mmirecki has joined #openvswitch20:36
*** strondeak has joined #openvswitch20:41
*** maciejjozefczyk has joined #openvswitch20:48
*** strondeak has quit IRC21:14
*** mmirecki has quit IRC21:16
*** strondeak has joined #openvswitch21:17
*** dcbw has quit IRC21:52
*** rcernin_ has joined #openvswitch21:56
*** zhouhan_ has joined #openvswitch21:58
*** zhouhan has quit IRC21:58
*** armax has quit IRC22:09
*** rcernin_ has quit IRC22:13
*** slaweq has quit IRC22:13
*** aconstan has quit IRC22:39
*** armax has joined #openvswitch22:55
*** __lore__ has joined #openvswitch23:00
*** _lore_ has quit IRC23:00
*** rcernin has joined #openvswitch23:16
*** bostondriver has quit IRC23:30
*** yamamoto has joined #openvswitch23:34
*** yamamoto has quit IRC23:39
*** __lore__ is now known as _lore_23:45
