Thursday, 2021-05-13

*** anilvenkata has quit IRC03:55
*** hexa- has quit IRC04:20
*** hexa- has joined #openvswitch04:22
*** hexa- has quit IRC04:29
*** hexa- has joined #openvswitch04:30
*** ralonsoh has joined #openvswitch04:31
*** rcernin has quit IRC05:13
*** moldorcoder7 has quit IRC05:14
*** rcernin has joined #openvswitch05:19
*** avishnoi has quit IRC05:20
*** avishnoi has joined #openvswitch05:22
*** moldorcoder7 has joined #openvswitch05:41
*** anilvenkata has joined #openvswitch06:24
*** slaweq has joined #openvswitch06:26
*** anilvenkata has quit IRC06:49
*** mdgray has joined #openvswitch07:13
*** avishnoi has quit IRC07:17
*** elvira has joined #openvswitch07:24
*** avishnoi has joined #openvswitch07:27
*** __lore__ has joined #openvswitch07:35
*** _lore_ has quit IRC07:35
*** __lore__ has quit IRC07:37
*** _lore_ has joined #openvswitch07:40
*** dholler has joined #openvswitch07:58
*** __lore__ has joined #openvswitch08:08
*** _lore_ has quit IRC08:08
*** jaicaa has quit IRC08:36
*** avishnoi has quit IRC08:58
*** avishnoi has joined #openvswitch09:00
*** istokes has joined #openvswitch09:00
*** blahdodo has quit IRC09:26
*** dholler has quit IRC09:27
*** blahdodo has joined #openvswitch09:30
*** avishnoi has quit IRC09:46
*** anilvenkata has joined #openvswitch09:47
*** avishnoi has joined #openvswitch10:08
*** anilvenkata has quit IRC10:11
*** __lore__ has quit IRC10:27
*** _lore_ has joined #openvswitch10:54
*** anilvenkata has joined #openvswitch11:51
*** avishnoi has quit IRC12:06
*** avishnoi has joined #openvswitch12:07
*** anilvenkata has quit IRC12:27
*** anilvenkata has joined #openvswitch12:27
*** anilvenkata has quit IRC12:33
*** subuk has joined #openvswitch12:44
*** bostondriver has joined #openvswitch12:46
*** links has joined #openvswitch12:53
*** avishnoi has quit IRC13:15
*** anilvenkata has joined #openvswitch13:18
*** rcernin has quit IRC13:19
*** dcbw has joined #openvswitch13:34
*** avishnoi has joined #openvswitch13:37
*** psahoo has joined #openvswitch13:38
*** _lore_ has quit IRC13:42
*** _lore_ has joined #openvswitch13:45
*** anilvenkata has quit IRC13:47
*** anilvenkata has joined #openvswitch13:52
*** KpuCko has quit IRC14:14
*** anilvenkata has quit IRC14:44
*** fbl has joined #openvswitch14:48
*** _lore_ has quit IRC15:44
*** elvira has quit IRC16:02
*** links has quit IRC16:32
*** psahoo has quit IRC16:44
*** _lore_ has joined #openvswitch16:49
*** blp has joined #openvswitch17:14
mmichelsonHi everyone17:15
mmichelson#startmeeting ovn_community_development_discussion17:15
openstackMeeting started Thu May 13 17:15:30 2021 UTC and is due to finish in 60 minutes.  The chair is mmichelson. Information about MeetBot at http://wiki.debian.org/MeetBot.17:15
openstackUseful Commands: #action #agreed #help #info #idea #link #topic #startvote.17:15
openstackThe meeting name has been set to 'ovn_community_development_discussion'17:15
mmichelsonI changed the meeting name back to the original so the link in the topic should work now :)17:15
_lore_hi all17:16
ihrachyso/17:16
*** zhouhan has joined #openvswitch17:16
blphi!17:17
mmichelsonLast Friday was soft freeze for 21.06. That means that any new feature patches that have been posted since then are not candidates for 21.06. I've been trying to do some code review this week. There are a lot of patches out there17:17
zhouhanhi17:17
blpI see that Ihar asked a question on the list about ddlog debugging. I'll try to look at that today.17:17
mmichelsonIncluding a 27 patch series from a certain someone...17:17
ihrachysthanks :)17:17
mmichelsonI'm also working on v8 of the ARP and floating IP fixes series.17:18
mmichelsonThe next version is going to be quite different based on feedback I received from dceara and numans17:18
mmichelsonAnd that's really about all I can report.17:18
blpI don't think anyone should feel obligated to review that whole 27-part series. I try to order these things so that a review of any number of patches starting from the beginning still provides value.17:19
blpSo, if you can review the first patch, or the first 5 patches, or whatever, that's great.17:19
mmichelsonblp, that's a good point. When I did my first pass, I focused more on the patches at the beginning17:19
blpI post the whole series because the whole thing is ready. It probably looks intimidating that way, though.17:20
mmichelsonblp, also I'm kind of still in the mindset with ddlog patches from you that I'll look at them, and do my best to understand them, but I'm not in a good position to critique them, especially when it comes to best practices.17:20
blpIt takes some experience.17:22
blpI think that the actual ddlog changes in the series take less review and are less risky than the other patches.17:22
mmichelsonYeah with a lot of them, it's easy to see what's different, but I have to just take it on trust that, for instance, interning fields is actually improving performance.17:23
blpI did provide measurements.17:23
mmichelsonI could run a test of my own to verify, but I don't have the instinct just from looking at code to go "ah yes, that'll do it!"17:24
mmichelsonAnyway, this digression has gone on a bit. Anyone else, feel free to share.17:25
_lore_can I go next? quite fast17:25
_lore_this week I worked on a I-P issue for localport, actually when we delete a localport for the ovs bridge the flows in table 65 are not updated17:26
_lore_I posted a fix today17:26
mmichelsonExcellent17:27
_lore_I would like to ask to zhouhan about this commit: db41da34323c17:27
_lore_inc-proc-eng: Call clear_tracked_data before recompute.17:27
_lore_because in my case it triggers some issues17:28
zhouhan_lore_: it was a refactor from what was already used in one of the engine handlers17:29
zhouhan_lore_: to make it generic17:29
_lore_in particular when the update for the ovs_interface table is "lost" by this commit since the ct_zone node trigger a full recompute and clear the tracked data17:29
_lore_zhouhan: got my point?17:29
*** KpuCko has joined #openvswitch17:30
zhouhanIn theory, recompute should not need any tracked data, right?17:30
zhouhanI will take a look at the issue in your case17:30
_lore_zhouhan: thx17:31
_lore_the flow is:17:31
_lore_ovs_interface change (we remove the localport) --> physical_dlow_change_ovs_iface_handler --> en_ct_zone full recompute --> flow_outpu_physical_flow_changes_handler17:32
zhouhanI did notice some dependency problems in ovn-controller, which are potential issues. Not sure if they are related to your case. I was planning to fix them as soon as I get some time.17:33
_lore_in flow_outpu_physical_flow_changes_handler we do not run physical_handle_ovs_iface_changes() since ovs_ifaces_changed is set to false by the full recompute of the ct_zone17:33
zhouhanthanks, I will take a note on this17:33
_lore_so I had to changes:17:33
_lore_1- revert your commit17:33
_lore_2- ovs_ifaces_changed = true; in en_physical_flow_changes_run17:34
_lore_I went for the second one17:34
_lore_that's all from me17:34
_lore_ah no, I posted a trivial patch17:35
_lore_http://patchwork.ozlabs.org/project/ovn/patch/fdb9df310c76168f7813a1c75dd3ad26bb531da6.1620849904.git.lorenzo.bianconi@redhat.com/17:35
zhouhanen_physical_flow_changes_run is very tricky. numans has a patch that removes it, and I have been reviewing it and some comments are still open17:35
_lore_thx17:35
_lore_that one is an issue hit by openstack17:36
numanszhouhan, Sorry. I'm not able to address your comments yet. Hopefully I'll address them and post the patch next week17:39
zhouhannumans: no worries.17:39
zhouhanI have a quick update17:40
zhouhanA small patch pending for review: https://patchwork.ozlabs.org/project/ovn/patch/20210510215938.369355-1-hzhou@ovn.org/17:41
zhouhanIn fact dumitru reviewed it but he also helped on adding the ddlog part fix, so I added him as co-author. Better that someone else take a quick look17:42
imaximetszhouhan, AFAICT, numans acked it already.17:42
mmichelsonzhouhan, looks like numans acked it17:42
zhouhanOh, sorry, let me check. Maybe I just need dumitru's signoff17:42
mmichelsonhe signed off too :)17:42
zhouhan:D17:43
zhouhanGreat then. Nothing else to update17:43
*** ralonsoh has quit IRC17:43
mmichelsonCool, anyone else?17:44
ihrachysok can I?.. real quick :)17:44
imaximetszhouhan, I'm wondering if we can add an strstr(actions, "_MC_") to the lflow_add function and assert?17:44
ihrachysmore of a heads up re: https://mail.openvswitch.org/pipermail/ovs-dev/2021-May/382967.html two things there: 1) there are test suite failures in master so don't be scared if you see them and 2) looking for tips on how to debug ddlog (mis?)behavior, but blp already said he'll take a look so thanks for that.17:45
zhouhanimaximets: it may be helpful. But I really want to fix the issue in I-P, so that we won't need that workaround in northd.17:46
imaximetszhouhan, sure.17:46
zhouhanihrachys: thanks for digging into it. You also mentioned the ACL priority issue for the allow-stateless patch. Just want to know are you working on a fix?17:47
ihrachysyes17:47
zhouhanok, thanks. It seems to me we will have to add more ACL flows in the pre-acl stage to fix the priority problem, right?17:48
ihrachyssome flow number increase may induce though, need to try to keep it under control17:48
ihrachysyou are right, at least my initial thinking17:48
*** mdgray has quit IRC17:48
blpDoes anyone have comments on renaming the "master" branch to something else? I didn't receive any feedback via email.17:49
zhouhanblp: sounds good to me17:49
blpMaybe, when I get a chance, I will take a stab at writing up a commit to do it for OVS.17:50
mmichelsonblp, I'm fine with it. I noticed numans has been calling it the "main" branch on the ML lately17:50
imaximetsblp, I missed this email somehow, found it today in the archive.  Sounds OK to me.17:50
ihrachyswill master be an alias or smth? thinking about 3parties pulling master in CI.17:51
imaximets"main" seems to be a default choice.17:51
zhouhanihrachys: probably we can make sure the extra flows are needed only if there are mixed (allow-related and allow-stateless) ACLs, so it won't do any harm for existed usecases.17:52
blpThere is some support for forwarding the old name to the new one, but it doesn't work for everything.17:52
ihrachyszhouhan: right, what I meant saying to keep it under control. though may be harder to do in e.g. ddlog, maybe not though, not much experience with that17:53
ihrachyszhouhan: it's sad matches are not structured but text, otherwise we could compare sets and detect where a conflict may happen.17:55
ihrachysblp: so maybe trying that if it's easy, and giving an official heads up of several weeks17:56
mmichelsonOne nice thing about the master branch is that there are likely a lot of places that don't even specify it since it's the default. But you can't rely on everyone to do that.17:57
blpihrachys: Is that about master->main?17:57
ihrachysblp: yes17:57
mmichelsonI'll have to retrain some muscle memory regarding rebasing branches, but eh, no biggie.17:58
blpOK. I will make a mental note to work out the details.17:58
zhouhanihrachys: for "it's sad matches are not structured but text, otherwise we could compare sets and detect where a conflict may happen.", could you explain more?18:00
mmichelsonzhouhan, I can relate to what ihrachys is talking about. ACL and router policy matches are just strings, rather than parsed expressions18:01
ihrachysin the test case in question, there are two rules with different priority, one for allow-stateless and another for allow-related for the same match. if we could somehow safely detect that matches are overlapping / the same, we could then skip adding some flows in same cases.18:01
blpIt's difficult to make structured relational data as powerful as a language.18:01
ihrachysyes I don't even know it's possible18:02
blpIt's better if you want to have rigidly structured features, but that wasn't the original goal for OVN.18:02
zhouhanihrachys: ok, so you meant analysing the ACLs in ovn-northd instead of adding them directly to flows18:03
ihrachysyes, doing some pre-processing for example to detect redundant ACLs18:04
ihrachysbut that's too much to ask, I didn't mean it like an actual proposal :)18:04
mmichelsonI've completely lost track of whose turn it was and who all has gone at this point. Anyone left?18:06
imaximetsI have a small update18:06
imaximetsI mostly worked on trying to fix CI.  GHA strted to put random text to /etc/hosts, so the final fix was to remove this stuff.18:07
imaximetsNow CI seems to work fine.18:07
zhouhanthanks imaximets!18:08
mmichelsonimaximets, that is so strange. I love how your fix was to remove the line that says "don't remove this line"18:08
zhouhanlol18:08
imaximetsnumans, accepted my patch to combine ARP responder flows.  dceara reviewed my stream record/replay and 2-Tier ovsdb series.  I'll follow up on them later.18:08
imaximetsmmichelson, I was in a bad mood. :)18:09
mmichelsonimaximets, :)18:09
imaximetsI also have a question:18:09
imaximetscontainer management software likes to use SIGTERM in order to stop containers.18:10
imaximetsbut OVS-based applications doesn't perform a fully gracefull shutdown on SIGTERM.18:10
imaximetsi.e. northd will not release the lock and ovsdb-server will not transfer leadership.18:11
blpovsdb locks get released automatically when the connection drops18:11
imaximetsSo, I'm thinking, maybe we need to catch SIGTERM from the daemon, set exiting=true and re-raise at the very end?18:12
blpleadership should transfer quickly when the other servers see the connection dropped (it would be better to transfer it beforehand though)18:12
blpIt's a good idea to handle SIGTERM more gracefully.18:13
blpI think we have some code for that...18:13
imaximetsblp, yes, the point is that it's probably better to terminate gracefully instead of relying on a failover.18:13
blpOh yes, lib/fatal-signal.c. It doesn't address this use case perfectly and will probably need an update.18:14
imaximetsblp, we can register a custom handler and in the very end call a default handler form the fatal-signal.c and execute fatal_signal_run().18:14
blpA custom signal handler you mean? Yes, fatal-signal does that already.18:15
imaximetsblp, yes.  fatal-signal doesn't register its own handler if it was already registered by the daemon in the main code.18:16
blpimaximets: Yes. The intention behind that was different (it was meant to preserve signals that had been set to SIG_IGN by the process that exec()'d the daemon) but it has that effect.18:18
blpfatal_signal_run() is called multiple places automatically so that you don't get a daemon blocking somewhere unexpected and failing to die.18:19
imaximetsblp, ok.  I'll think a bit more about this and will, probably, prepare some patches.18:19
blpSignals are one of the most awful parts of POSIX.18:21
imaximetsThere is one more problem around SIGTERM:  sometimes raise() that called to re-raise the original signal and kill the process fails.18:21
imaximetsin this case fatal_signal_run() falls into OVS_NOT_REACHED() that calls abort.18:22
blpimaximets: How/when would that happen?18:22
blpThis is a new case for me.18:22
imaximetsbut raise(SIGABRT) fails inside the glibc and glibc ends up with ABORT_INSTRUCTION that effectively triggers a segfault to kill the process.18:23
blpWTF?18:23
imaximetsblp, this happens sometimes during some complex upgrade of containers alng with the host OS, so I don't really know what the hell is happened and why raise() fails.18:24
imaximetsI suspect some glibc incompatibility or kernel issue.18:24
imaximetsblp, you can find some backtraces here: https://bugzilla.redhat.com/show_bug.cgi?id=195703018:25
openstackbugzilla.redhat.com bug 1957030 in ovsdb2.13 "core dumps for ovsdb-sever and northd seen in OCP 4.7->4.8 upgrade" [High,New] - Assigned to ovs-team18:25
blpIf signal(SIGTERM, SIG_DFL); raise(SIGTERM); fails, then all bets are all, everything's fucked.18:26
imaximetsI don't think that we can fix that, but we at least could call VLOG_FATAL and avoid segfault with coredump.18:27
imaximetsAnd we could find some more information by printing errno.18:27
imaximetsVLOG_FATAL will result in exit(EXIT_FAILURE)18:28
blpI think I want a segfault with coredump, it's the only way this will ever be debuggable.18:28
imaximetsBut we pretty much sure that segfault is because of the failure of raise().  IF raise() failed we will have a segfault in the end, and debugging of glibc is not really useful.18:29
blpIf raise() fails, why are we confident that exit() will work?18:31
imaximetswe're not.  But, if it will not work, we will reach OVS_NOT_REACHED and will have a segfault anyway.  So, it's just an extra attempt to save a "normal" termination.18:33
blpIt shouldn't be a normal termination, it should be a signal termination.18:35
blpThere's a bug here and it's not our bug. Can we get that bug fixed?18:37
mmichelsonblp, we can report it, but in the meantime it would be best if we could make our stuff try to avoid triggering it if possible.18:38
imaximetsThat would be ideal to fix, of course.18:38
*** acidfu has quit IRC18:38
imaximetsFor now, I guess, we can at least print an error if raise() failed and allow it to reach the OVS_NOT_REACHED.  This should not make any harm.18:39
*** acidfu has joined #openvswitch18:39
imaximetsThe issue itself should be reported to appropriate project for investigation.18:40
imaximetsThat's it form my side.18:40
blpLogging an error seems reasonable. Or we could VLOG_ABORT.18:41
imaximetsblp, yep.  More explicit version of OVS_NOT_REACHED. :)18:42
imaximetsSorry, my small update took 35 minutes.18:46
mmichelsonlol18:46
blpI think everyone is gone now, but... anyone else?18:46
mmichelsonI think we're done. 90 minutes this week, wow.18:47
mmichelsonBye everyone!18:47
imaximetsbye18:47
mmichelson#endmeeting18:47
openstackMeeting ended Thu May 13 18:47:15 2021 UTC.  Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4)18:47
openstackMinutes:        http://eavesdrop.openstack.org/meetings/ovn_community_development_discussion/2021/ovn_community_development_discussion.2021-05-13-17.15.html18:47
openstackMinutes (text): http://eavesdrop.openstack.org/meetings/ovn_community_development_discussion/2021/ovn_community_development_discussion.2021-05-13-17.15.txt18:47
openstackLog:            http://eavesdrop.openstack.org/meetings/ovn_community_development_discussion/2021/ovn_community_development_discussion.2021-05-13-17.15.log.html18:47
*** blp has left #openvswitch18:47
*** thaller has quit IRC18:50
*** thaller has joined #openvswitch18:50
*** istokes has quit IRC18:53
*** donhw has quit IRC19:00
*** donhw has joined #openvswitch19:37
*** donhw has quit IRC19:44
*** warewolf has quit IRC20:09
*** warewolf has joined #openvswitch20:09
*** donhw has joined #openvswitch20:48
*** slaweq has quit IRC20:49
*** fbl has quit IRC20:49
*** zhouhan has quit IRC21:02
*** dcbw has quit IRC21:54
*** bostondriver has quit IRC22:02
*** aconole has quit IRC22:23
*** rcernin has joined #openvswitch22:57

Generated by irclog2html.py 2.17.2 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!