Wednesday, 2024-03-20

07:50 <f0o> Good Morning - Here are some preliminary findings on the potential OVS route leakage: https://paste.opendev.org/show/bluBQiReoTozIfNPXnxU/
07:51 <f0o> my immediate guess is that OVS uses the default VRF for everything, which is why it can ping br-vxlan/br-mgmt/br-OSA* devices
07:52 <f0o> which means that this issue might not be caused by my specific setup
08:12 <f0o> bumpy road, I should not have my IRC client in a test VM on the OpenStack whose routing I'm fiddling with
08:12 <f0o> but on the other hand it shows results immediately lol
08:35 <jrosser> f0o: the only thing i can suggest is that someone else with an OVN setup (i don't have one) can try to reproduce pinging something on the mgmt net from a vm
08:37 <f0o> jrosser: I've done more tests in the meantime
08:37 <f0o> and the conclusion is that OVS/OVN entirely ignores the VRF and just puts the packet onto the interface that has a matching route, bypassing any routing policies
08:38 <f0o> I've "resolved" it by moving _ALL_ interfaces away from the default VRF, which made OVN and OVS crash, so I added single host routes back into the default routing table (for all northd hosts and the VTEPs of the hypervisors)
08:38 <f0o> This is not a solution, but it moves the bulk of the management stuff out of reach. I can still ping the VTEPs and northd IPs from VMs, which is bad enough
08:40 <f0o> I think the only correct solution to this is applying iptables/nftables rules to forbid forwarding if the exit interface is anything that isn't part of the provider networks
08:41 <f0o> but that can have some side effects too since it's quite the jackhammer method
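
A minimal sketch of the kind of iptables/nftables safeguard described above, assuming the infrastructure bridge names that appear elsewhere in this log (br-mgmt, br-vxlan, lxcbr0); the interface list is an assumption and would have to match the actual deployment:

    # Drop any forwarded packet whose exit interface is an infrastructure bridge
    # rather than a provider network.
    nft add table inet ovs_leak_guard
    nft 'add chain inet ovs_leak_guard forward { type filter hook forward priority 0 ; policy accept ; }'
    nft 'add rule inet ovs_leak_guard forward oifname { "br-mgmt", "br-vxlan", "lxcbr0" } drop'
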
08:41 <jrosser> but surely this is so fundamental that there must be a strategy from the neutron team for dealing with it
08:41 <f0o> for your quick test, you should always be able to ping 10.0.3.1 in an OSA AIO since that's always fixed and present
08:42 <jrosser> i don't have anything with ovs / ovn so can't reproduce, outside an all-in-one
08:42 <f0o> I just got a full nmap port scan on 10.20.1.126, which is a northd host, and I had to place a host-only route in the OVN gateway nodes' default table
08:42 <f0o> this is scary
08:43 <f0o> as an adversary I can quite quickly scan RFC ranges for northd ports (those are fixed ports) and attempt a DoS with some fuzzed packets to get northd to crash, or worse
08:44 <f0o> jrosser: the AIO should have the same issue, if a VM there can ping 10.0.3.1 you know you're vulnerable
08:44 <f0o> because that's the fixed IP for lxcbr0
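
A sketch of that quick test on a stock OSA all-in-one, run from any tenant VM plus a shell on the host:

    # Inside the VM: try to reach the host-side lxcbr0 address.
    ping -c 3 10.0.3.1

    # On the AIO host at the same time: if echo requests show up here,
    # VM traffic is leaking onto the management side.
    tcpdump -ni lxcbr0 icmp
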
08:44 <f0o> I'm just now installing virtualbox to get an AIO setup on my box
08:46 <noonedeadpunk> f0o: are you using the ovn bgp agent in your setup?
08:46 <f0o> nope
08:48 <noonedeadpunk> well, then we haven't tried vrfs in our setup in fact.... we've tried with the bgp agent, but we quickly found that it would require a patch that is not merged yet
08:49 <f0o> I don't think this is related to VRFs
08:49 <noonedeadpunk> https://review.opendev.org/c/openstack/ovn-bgp-agent/+/906505
08:49 <f0o> just installing ubuntu into vbox right now to get this replicated in an AIO
08:49 <jrosser> noonedeadpunk: i think it's more a question of whether you can just get at things you would not expect to, from the vm
08:49 <jrosser> this is not to do with bgp at all
08:50 <noonedeadpunk> but eventually, I do see VM traffic on br-vlan on the aio
08:51 <noonedeadpunk> but I'd need to wrap my head around this issue I guess, as I'm quite slow at understanding networking
08:51 <jrosser> f0o: i think that it's confusing as you've got a lot of terms we don't normally use at all on the openstack nodes, vrf for example
08:51 <noonedeadpunk> wait
08:52 <f0o> I wrote a wall of text yesterday (my) evening with what I suspect is happening
08:52 <noonedeadpunk> so you can ping your mgmt network from a vm o_O
08:52 <jrosser> these are usually things on switches/routers
08:52 <f0o> yes, I can ping br-mgmt/br-vlan/br-vxlan/lxcbr0 from within a VM
08:52 <noonedeadpunk> this I can test really quickly....
08:52 <jrosser> f0o: yes i understand, but also you are totally familiar with your setup - so bear with us whilst we try to understand
08:52 <f0o> no worries, I'll try to get this done in an AIO so I can share the configs for one-click replication
08:53 <noonedeadpunk> I have a quite big ovn sandbox handy
08:53 <noonedeadpunk> like 7 computes, 3 control planes, standalone net nodes, etc
08:53 <f0o> noonedeadpunk: try pinging 10.0.3.1 from a VM on geneve then
08:53 <noonedeadpunk> some of them are broken with the bgp agent though... but that shouldn't be an issue
08:54 <jrosser> noonedeadpunk: see line 36 of this https://paste.opendev.org/show/bluBQiReoTozIfNPXnxU/
08:54 <jrosser> that's the interesting part
08:54 <f0o> for me the OVN gateway nodes are super happy to just copy the packet onto lxcbr0
08:55 <noonedeadpunk> well, I don't have gateway nodes there where I have lxcbr0
08:55 <jrosser> so a difference would be that noonedeadpunk has standalone gateway nodes, so i expect these are not running lxc
08:55 <f0o> noonedeadpunk: then use your br-mgmt ;)
08:55 <f0o> the point is it puts packets where it shouldn't
08:55 <noonedeadpunk> and eventually, the assumed choice is to have gateway nodes either standalone or on computes with OVN
08:55 <jrosser> but they must have an interface/ip on the openstack mgmt network
08:55 <noonedeadpunk> in contrast to OVS
08:55 <noonedeadpunk> they do, sure
08:55 <f0o> ping those then
08:56 <noonedeadpunk> yeah, sec
08:56 <f0o> because I can ping OSA management nodes from my VM
08:56 <f0o> run a tcpdump on your gateway nodes to see if it pushes packets. there may or may not be a return path, but that does not mean it's not moving the packets into that device
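
Roughly this kind of capture, using the VM source address from the trace below as an example:

    # On the gateway node: watch all interfaces for ICMP sourced from the VM,
    # whether or not a reply ever makes it back.
    tcpdump -ni any 'icmp and src host 185.243.23.86'
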
08:59 <f0o> because when I ping the VTEPs (10.20.8.0/22) I don't see a reply on the VM, but the tcpdump shows that the packet was pushed out br-vxlan
08:59 <f0o> 08:58:14.965375 br-vxlan Out ifindex 20 86:ec:48:c5:67:54 ethertype IPv4 (0x0800), length 104: 185.243.23.86 > 10.20.8.11: ICMP echo request, id 41, seq 1, length 64
08:59 <f0o> there just isn't a return path from the VTEP back into the FIP range
08:59 <f0o> just getting that ubuntu vm started now for the AIO replication
09:00 * noonedeadpunk needs to find a genuinely working net node first :D
09:00 <f0o> :D
09:00 <jrosser> the AIO has some iptables/nat stuff just to make it more complicated
09:00 <f0o> nice
09:01 <f0o> :D
09:02 <f0o> it's an obscure issue because I bet that in 99% of cases those RFC networks don't have a return path for the packet, so it seems like it doesn't leak. In my case the gateway nodes are the actual TOR routers, so all hypervisors have a default route to them for connectivity. So there is always a return path (other than for br-vxlan, because that is local only)
09:03 <noonedeadpunk> well, actually on an AIO I have definitely seen plain traffic from VMs on br-vlan
09:04 <noonedeadpunk> So basically IF I just add a gateway ip of the "public" network of the AIO to the br-vlan interface there - it gets happily src-nated by the node
09:05 <noonedeadpunk> and I'd assume that nothing would really stop such an aio vm from pinging networks local to the node
09:11 <f0o> nice, virtualbox doesn't run for me -.-
09:11 <f0o> wget gets a segfault on a stock ubuntu 22.04
09:14 <noonedeadpunk> so I see this: https://paste.opendev.org/show/bVemrDPKgYSjWu5uxWSW/
09:14 <noonedeadpunk> 10.21.11.52 is a mgmt IP of a gateway node I'm trying to ping from a VM with a floating IP
09:15 <f0o> and I'm guessing bond0 is the br-ext?
09:15 <noonedeadpunk> yep, it's part of the br-ext ovs bridge
09:15 <noonedeadpunk> oh, well... there's no fip... src nat only... Let me assign a fip :D
09:16 <noonedeadpunk> (but I guess it should be the same)
09:16 <f0o> should be the same I think
09:17 <f0o> for me vlan3012 holds the gateway IP of the FIP range because the router can deliver the packet itself - so it is its own next-hop
09:17 <f0o> genev_sys_6081 > vlan3012 (with the IP of the gw of neutron's external network) > funky stuff happens here
09:18 <f0o> if I fabricate a packet and send it over the wire to vlan3012 everything seems to work as expected, the packet gets discarded because no route is found.
09:18 <noonedeadpunk> yeah, so I guess indeed once you add an IP on the interface - things can go in a weird direction
09:18 <f0o> if the packet comes from genev_sys_6081 then funky stuff happens
09:19 <f0o> it's like packets from OVS don't obey routing tables
09:19 <noonedeadpunk> I guess I would create a separate routing table and forward all traffic from the public nets to it explicitly
09:19 <noonedeadpunk> as I don't think you can use vrfs in fact
09:19 <f0o> that's what VRFs do
09:19 <f0o> VRFs in linux are just separate routing tables
09:20 <noonedeadpunk> or well, at least we didn't manage to get them working nicely with the bgp setup
09:20 <f0o> no, neither did I with ovn-bgp - which is why I skipped it
09:20 <noonedeadpunk> well, yes, but I guess you need to put an interface into a vrf?
09:21 <noonedeadpunk> and in this case the bridge interfaces that are leaking are created with ovs/ovn?
09:21 <f0o> you can slave interfaces into a vrf but it's not a requirement, you can also just next-hop from one table into the next
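
For reference, a rough sketch of both variants with made-up names and table numbers; neither command set comes from the pastes above:

    # Variant 1: enslave an interface into a VRF (a VRF device is routing
    # table 100 plus whatever interfaces are made members of it).
    ip link add mgmt-vrf type vrf table 100
    ip link set mgmt-vrf up
    ip link set br-mgmt master mgmt-vrf

    # Variant 2: no enslaving, just hop into another table via a policy rule.
    ip rule add from 203.0.113.0/24 lookup 100 priority 999
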
09:21 <noonedeadpunk> yeah, exactly
09:21 <f0o> but yes, in this case the leaking (and slaved) interfaces are created by OVS/OVN
09:21 <noonedeadpunk> we did that for bgp...
09:22 <noonedeadpunk> so I guess that's what I meant by not being able to use a vrf
09:22 <noonedeadpunk> but jumping from table to table should work
09:22 <noonedeadpunk> sorry, my networking is not really great, so I don't always use the proper language to explain myself
09:23 <f0o> no worries, I get what you're saying
09:23 <f0o> I'm attempting to try it out now
09:24 <f0o> obviously the solution to this is "don't be your own next-hop", because then OVS can't do funky stuff and packets _have to_ obey the VRFs - but that's hardly a solution, more of a workaround to me
09:24 <noonedeadpunk> yeah
09:24 <f0o> like you said earlier, have the gateway nodes on the compute nodes
09:25 <f0o> at the cost of dragging the whole external vlan to all hosts
09:25 <noonedeadpunk> well, for this specific matter we have standalone hosts designated for gateways
09:26 <noonedeadpunk> as we don't wanna have that on controllers nor on computes
09:26 <noonedeadpunk> but it's $$
09:26 <f0o> but those designated gateways are just dumb routers then, because you can't have them actually route anything. They just translate vxlan to vlan and dump it on the wire. very very expensive vteps
09:26 <jrosser> ^ same here, i have "separated and simplified" rather than collapse any of these functions together
09:27 <noonedeadpunk> geneve to vlan, but yes :)
09:27 <f0o> because the moment those designated gateways could push packets at L3 you end up in this issue that they will happily push them into br-mgmt
09:27 <noonedeadpunk> and well, we also do have vpnaas, which does spawn network namespaces there
09:28 <noonedeadpunk> well, unless you inject a rule to use a different routing table with high prio, so that the main routing table is not used then
09:28 <noonedeadpunk> and some iptables on top :)
09:28 <f0o> noonedeadpunk: but that's exactly what I do
09:28 <f0o> minus iptables
09:29 <noonedeadpunk> hm
09:29 <f0o> noonedeadpunk: https://paste.opendev.org/show/buOWbgSHxagRB00GurO0/
09:29 <noonedeadpunk> but then if a route to the mgmt net is not present in that vrf - how does it get to it
09:30 <f0o> I'm running that setup now (https://paste.opendev.org/show/buOWbgSHxagRB00GurO0/) and with this I can no longer ping br-mgmt, but I can ping br-vxlan and those host routes of northd
09:30 <f0o> so it's better than before, but it still happily pushes packets into any routes in the default table
09:31 <f0o> the 3 host routes on br-mgmt are northd, just as clarification
09:32 <f0o> https://paste.opendev.org/show/bciHUaK5qBGTiEX0ng1j/ << the ICMP test showing that northd is pingable but a different host on br-mgmt is not
09:32 <f0o> this is "as good" as I can get it right now without employing nftables
09:36 <noonedeadpunk> we played with smth like this: https://paste.openstack.org/show/bGvFW15cfKdirA6KDkCY/
09:37 <noonedeadpunk> where 203.0.113.0/24 is our public network
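
The paste contents are not reproduced in this log, but as described the approach amounts to something like the following (the table number is an assumption; the 999 priority matches what f0o replicates further down):

    # Look up traffic sourced from the public network in its own table, ahead of
    # the main table and of the VRF l3mdev rule at priority 1000...
    ip rule add from 203.0.113.0/24 lookup 101 priority 999
    # ...and give that table an explicit unreachable default so lookups never
    # fall through to host-local routes (the table would of course also need the
    # real routes for legitimate egress traffic). As discussed above, this only
    # helps for packets that actually reach the kernel routing code.
    ip route add unreachable default table 101
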
09:37 <noonedeadpunk> but yeah
09:41 <f0o> and that worked?
09:42 <f0o> because VRFs in linux are handled by the rule at priority 1000 (l3mdev-table)
09:42 <noonedeadpunk> well, I'm not sure we had exactly the same usecase frankly speaking
09:42 <noonedeadpunk> as there was no vrf per se
09:43 <noonedeadpunk> but that pretty much ensured that vms can't reach pretty much anything that's on the host
09:43 <f0o> let me see if I can replicate that
10:03 <f0o> so by shifting br-mgmt into the MGMT vrf I nuked my entire setup
10:03 <f0o> and I didn't notice
10:04 <f0o> remember when I wanted HAProxy to bind on a specific interface? that was because I wanted br-mgmt in a VRF and HAProxy wouldn't use it without that. So yeah, I killed HAProxy
10:04 <f0o> totally spaced that
10:05 <f0o> but alright, that fire is put out, back to policies
10:51 <f0o> noonedeadpunk: I ran `ip rule add from 185.243.23.0/24 lookup 20` on both routers (table 20 is the IBGP full table) but the problem persists, my VM can still ping br-mgmt etc
10:51 <f0o> basically I replicated your last paste, including the 999 rule prio
10:52 <f0o> https://paste.opendev.org/show/bqkxpnVt3JZQcrjHM8iJ/
10:56 <noonedeadpunk> but how in the world is it getting there?
10:57 <noonedeadpunk> unless you have a route to br-mgmt in table 20
10:57 <noonedeadpunk> or the "default route" in that table does
10:59 <f0o> Nope
10:59 <f0o> `ip route show to match 10.20.0.11/32 table 20` returns empty
11:00 <f0o> `ip r sh table 20 | egrep -e default -e "^10.20.0." -e br-mgmt` also returns empty
11:00 <f0o> the whole box does not have a single default route
11:01 <f0o> odd disconnect... I was saying that the whole box has no default routes anywhere and that grepping for br-mgmt/default/10.20.0 in table 20 returns nothing
11:02 <f0o> but this is why I believe that OVS just entirely ignores the kernel routing tables and moves packets onto the interfaces itself
11:03 <f0o> I cannot explain it otherwise
11:07 <noonedeadpunk> Actually... I think you might be right here. As for OVS to respect kernel routing, you need to explicitly "eject" traffic by defining a flow in OVS
11:07 <noonedeadpunk> I think this is actually what ovn-bgp-agent is doing to make things work fwiw
11:09 <noonedeadpunk> So I guess I also had a flow in `ovs-ofctl dump-flows br-ext` like `cookie=0x3e7, duration=489002.621s, table=0, n_packets=166709, n_bytes=10220943, priority=900,ip,in_port="patch-provnet-0" actions=mod_dl_dst:1a:f8:a1:5c:d0:43,NORMAL`
11:10 <noonedeadpunk> where `1a:f8:a1:5c:d0:43` was the mac of the br-ext bridge in kernel space
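
For context, installing such a flow by hand would look roughly like this; bridge name, port and MAC are taken from the dump-flows output above and are specific to that deployment:

    # Rewrite the destination MAC of inbound provider traffic to the kernel-side
    # bridge MAC, then continue with NORMAL processing so the packet is handed to
    # the kernel stack instead of staying purely inside OVS.
    ovs-ofctl add-flow br-ext 'priority=900,ip,in_port=patch-provnet-0,actions=mod_dl_dst:1a:f8:a1:5c:d0:43,NORMAL'
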
11:10 <noonedeadpunk> but yeah
11:10 <noonedeadpunk> this is already a mess
11:10 <noonedeadpunk> I'm not sure this is a good way to go anyway
11:25 <f0o_> this sure is a mess
11:26 <f0o_> but it might make sense to force OVS to flush onto the wire so that the kernel can take over
11:27 <f0o_> let's see if I can employ nftables without getting too much of a throughput penalty
11:40 <noonedeadpunk> I believe that kernel routing is not being used on purpose, to enable path acceleration (like dpdk) and avoid the penalty/limitations of the kernel
11:40 <f0o_> yeah I think so too
11:40 <noonedeadpunk> but potentially it's better to talk to the neutron folks about that, as they know way more about ovn
11:40 <f0o_> well, iptables -t filter -A FORWARD -o br-mgmt -j DROP works
11:41 <f0o_> doesn't work for lxcbr0 though because OSA adds an -o lxcbr0 -j ACCEPT
11:41 <f0o_> but I can likely inject some -s check above that accept and drop it early
11:42 <f0o_> wondering if those DROPs should be something that OSA wants to add as a safeguard
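
A sketch of that ordering trick, assuming 185.243.23.0/24 is the tenant/FIP range as earlier in the log; -I puts the rule in front of the ACCEPT that OSA inserts for lxcbr0:

    # Drop forwarded tenant traffic before the lxcbr0 ACCEPT rule can match it.
    iptables -I FORWARD 1 -s 185.243.23.0/24 -o lxcbr0 -j DROP
    # The plain case from above, for the other infrastructure bridges.
    iptables -A FORWARD -o br-mgmt -j DROP
    iptables -A FORWARD -o br-vxlan -j DROP
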
11:44 <noonedeadpunk> potentially - yes, but then you can also set `lxc_net_manage_iptables: false` and then osa won't inject any iptables rules
11:44 <f0o_> ngl life was easier with linuxbridges xD
11:46 <noonedeadpunk> it really was...
11:46 <noonedeadpunk> fwiw, iptables rules are added here: https://opendev.org/openstack/openstack-ansible-lxc_hosts/src/branch/master/templates/lxc-system-manage.j2#L93-L110
11:54 <f0o_> ok idk what I did but this connection is very unstable right now
11:54 <f0o_> let me revert everything
11:59 <f0o_> I think this issue that I'm having is just an oversight from the neutron team in implementing OVS. Because coincidentally OVS never supported full tables because it would crash https://github.com/openvswitch/ovs-issues/issues/185 - But I talked to one of the OVS devs and I got a patch that made OVS work with full tables by only listening to route changes in the default table.
12:00 <f0o_> so I can see how neutron just dismissed the option that an OVS gateway node could be a full-fledged router
12:00 <f0o_> and only focused on VTEP<>L2 bridging for gateway nodes
12:00 <f0o_> just theorycrafting
12:01 <noonedeadpunk> yeah, I guess that's pretty much true. as potentially they left this usecase to ovn-bgp-agent
12:01 <noonedeadpunk> like if you want the gateway to be a router - you want bgp
12:01 <noonedeadpunk> or will use bgp anyway
12:02 <f0o_> that is pretty much true, unfortunately I simply cannot get ovn-bgp-agent to run
12:02 <f0o_> I've broken my head over it for a week and it just didn't work well
12:06 <noonedeadpunk> Well, I got it working
12:07 <noonedeadpunk> and in a relatively reliable way, though it was a /o\ experience overall I guess :D
12:10 <f0o_> did you use your public ASN all the way (IBGP all the way) or did you use a private ASN for bgp-agent<>routers ?
12:10 <noonedeadpunk> but we scrapped that, as the only way for FIPs to work with a non-DVR scenario is through the SB DB driver
12:11 <noonedeadpunk> I think we used the public asn on /28 subnets
12:11 <f0o_> because I kept running into an issue where IBGP along the whole path would create split-brain issues, and attempting to rectify it with route reflectors was a huge pain, to the point where I just dropped it
12:12 <noonedeadpunk> As the NB DB driver simply does not announce FIPs on the gateway nodes, but rather on the computes where the VMs are running, which is wrong
12:12 <f0o_> and adding a private ASN in the middle of the path would create odd advertisements to our transits, which is correct as it would show "MYASN 65001" in the path and become a bogon
12:12 <noonedeadpunk> and I've submitted a bunch of bug reports
12:12 <f0o_> :D
12:12 <f0o_> I also ran into that but assumed that's how it's supposed to be
12:12 <noonedeadpunk> and then there was also a bug in the nb db driver where it did not withdraw announcements from frr
12:12 <f0o_> Compute -> Router
12:13 <f0o_> haha
12:13 <noonedeadpunk> nah, the sb driver does that properly
12:13 <f0o_> so alright, ovn-bgp-agent is still not a solution for me
12:13 <f0o_> feels like I'm running in circles here
12:13 <f0o_> I think I will just do iptables and run some throughput tests and that'll be it
12:14 <noonedeadpunk> So eventually we used the public ASN for announcements and ebgp-multihop
12:14 <noonedeadpunk> and the public net for tenants was obviously a different one than the one used for announcements
12:15 <noonedeadpunk> yeah, anyway
12:15 <f0o_> as much as I would love full-path bgp, it does sound like more pain than it's worth in its current state
12:15 <noonedeadpunk> we still picked just stupid l2 after all, and then do the magic on the leafs
12:15 <noonedeadpunk> it really is, imo
12:15 <noonedeadpunk> and there's still no support for multiple VRFs
12:16 <noonedeadpunk> but when it does land for NB, it's back to the point that you get FIPs announced from computes
12:17 <f0o_> my brain is spinning but I guess that's the fever more than anything else
12:18 <f0o_> I got iptables to work now, easy peasy.
12:18 <f0o_> verified it as working too
12:18 <f0o_> not an ideal solution but it does work until maybe ovn-bgp-agent matures
12:56 <f0o_> soooo iptables doesn't work actually
12:56 <f0o_> haha
12:58 <f0o_> nvmd, typo
12:58 <f0o_> I'm literally not seeing the forest for all the trees now
18:55 <opendevreview> Merged openstack/openstack-ansible-rabbitmq_server master: Add support for the apply_to parameter for policies  https://review.opendev.org/c/openstack/openstack-ansible-rabbitmq_server/+/910712
