Dual network problem – an interesting case from our practice
An interesting network management case with no simple solution
The nodes are based on the Raspberry Pi. Thanks to carrier boards, every node has two Ethernet ports, and each node is connected to two switches simultaneously. Every Ethernet port on the network has a unique MAC address, and so does every switch port. With 16 nodes, each with two Ethernet ports and two cables, we therefore end up with 64 MAC addresses (32 on the nodes and 32 on the switches). We also connect the two switches to each other with an Ethernet cable, and we connect our own computer, which brings the total to 68 MAC addresses.
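The count can be checked with a few lines of Python. This is only a sketch of the arithmetic: the node and port figures come from the description, while the way the last four addresses split (two for the inter-switch cable, two for our computer and its switch port) is an assumption.

```python
# Port count for the setup described in the text.
NODES = 16
PORTS_PER_NODE = 2

node_ports = NODES * PORTS_PER_NODE      # 32 ports on the nodes
switch_ports_for_nodes = node_ports      # one switch port per node cable: 32
inter_switch_link = 2                    # assumed: one port on each switch
workstation = 2                          # assumed: our computer's port + its switch port


def total_mac_addresses() -> int:
    """Sum every port that carries its own MAC address."""
    return node_ports + switch_ports_for_nodes + inter_switch_link + workstation


print(node_ports + switch_ports_for_nodes)  # nodes and their switch ports
print(total_mac_addresses())                # whole network
```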
Each Linux node holds an ARP table, which maps the IP addresses of its neighbours to MAC addresses and records the interface through which each one is reached. In this case ALL MAC addresses can be reached from every Ethernet port. We have set up the two Ethernet ports on each Linux node as a bond with one single IP address. Each Linux node also holds a routing table, which tells it how to reach all the other IP addresses on the LAN.
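To see what a node currently believes about its neighbours, the ARP table can be read from /proc/net/arp. The following is a minimal sketch of a parser; the sample addresses and the interface name bond0 are made up for illustration and are not taken from our setup.

```python
from typing import List, NamedTuple


class ArpEntry(NamedTuple):
    ip: str
    mac: str
    device: str


def parse_arp_table(text: str) -> List[ArpEntry]:
    """Parse the contents of /proc/net/arp into (ip, mac, device) entries."""
    entries = []
    for line in text.splitlines()[1:]:  # the first line is the column header
        fields = line.split()
        if len(fields) < 6:
            continue  # skip blank or malformed lines
        ip, _hw_type, _flags, mac, _mask, device = fields[:6]
        entries.append(ArpEntry(ip, mac, device))
    return entries


def read_arp_table(path: str = "/proc/net/arp") -> List[ArpEntry]:
    """Read and parse the live ARP table (Linux only)."""
    with open(path) as handle:
        return parse_arp_table(handle.read())


# Hypothetical sample in the /proc/net/arp layout.
SAMPLE_ARP = """\
IP address       HW type     Flags       HW address            Mask     Device
192.168.1.9      0x1         0x2         aa:bb:cc:00:00:09     *        bond0
192.168.1.6      0x1         0x2         aa:bb:cc:00:00:06     *        bond0
"""

for entry in parse_arp_table(SAMPLE_ARP):
    print(entry.ip, entry.mac, entry.device)
```

With bonding in place, every entry is expected to point at the bond interface rather than at an individual port, which is part of why the table alone cannot say which physical path is in use.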
For bonding we use the Linux Ethernet Bonding Driver. Whether it uses “miimon” or arp_ip_target to detect if a given local port is alive, the result is an all-or-nothing, global decision that applies to all traffic from the node, not specifically to the traffic from, say, node 6 to node 9. Because of that limitation we added a link between the switches, so we currently use active-backup as described in the bonding documentation's section on high availability in a multiple switch topology. This works in various cases: even if node 6 uses the wrong port and switch to try to reach node 9, the traffic gets forwarded via the link between the switches. In an ideal world we would not have this link, to avoid a failure in one switch affecting the other. If the link between the switches breaks, the setup does not work at all, i.e., some nodes keep using switch 1 and others switch 2 (given point 2 above and how active-backup works).
In reality the current configuration uses the same MAC address for both slaves. This is because it uses the default of none/0 for the “fail_over_mac” bonding driver option. The bonding driver can do so because, for this NIC, the driver is the one in control of the MAC address.
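The options a bond is actually running with can be read back from sysfs, where the bonding driver exposes them under /sys/class/net/<bond>/bonding/. The sketch below assumes the bond is called bond0, which is an assumption and not something confirmed by our configuration. Named options such as mode and fail_over_mac are reported as a name followed by a number, for example "none 0".

```python
from pathlib import Path
from typing import Dict, Optional, Tuple

# Options discussed in the text; the bond name "bond0" is an assumption.
DEFAULT_OPTIONS = ("mode", "fail_over_mac", "miimon", "arp_interval", "arp_ip_target")


def split_named_option(raw: str) -> Tuple[str, Optional[int]]:
    """Split a value such as 'none 0' into ('none', 0); other values pass through."""
    parts = raw.split()
    if len(parts) == 2 and parts[1].isdigit():
        return parts[0], int(parts[1])
    return raw.strip(), None


def read_bonding_options(
    bond: str = "bond0",
    names: Tuple[str, ...] = DEFAULT_OPTIONS,
    sysfs_root: str = "/sys/class/net",
) -> Dict[str, Optional[str]]:
    """Return the raw sysfs value of each option, or None if it cannot be read."""
    base = Path(sysfs_root) / bond / "bonding"
    options: Dict[str, Optional[str]] = {}
    for name in names:
        try:
            options[name] = (base / name).read_text().strip()
        except OSError:
            options[name] = None  # no such bond, or not running on Linux
    return options


print(split_named_option("none 0"))
print(read_bonding_options())
```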
The problem is that the routing and ARP tables are not updated fast enough when a switch goes down or a cable is unplugged. Our Linux nodes send packets to each other 10 times per second, sometimes faster. Thus the TCP/IP stack should discover in less than one second that a link it used before is down, and it should try the other option. But this does not happen right now, for a reason we do not yet understand.
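To make the timing argument concrete: at 10 packets per second, a peer that has been silent on a given path for a few send intervals can be declared unreachable on that path well within one second. The sketch below shows this bookkeeping at the application level. It is not how the bonding driver works, and it assumes the application can tell which port a packet arrived on (for example by using one socket per interface), which a plain bond does not offer. The class, the peer name and the port names are hypothetical.

```python
from typing import Dict, Iterable, Optional, Tuple


class PathMonitor:
    """Track when each (peer, port) path last delivered a packet."""

    def __init__(self, interval_s: float = 0.1, missed_limit: int = 3) -> None:
        # A path counts as dead after `missed_limit` send intervals of silence.
        self.timeout_s = interval_s * missed_limit
        self._last_seen: Dict[Tuple[str, str], float] = {}

    def packet_seen(self, peer: str, port: str, now: float) -> None:
        """Record that a packet from `peer` arrived on `port` at time `now`."""
        self._last_seen[(peer, port)] = now

    def is_alive(self, peer: str, port: str, now: float) -> bool:
        last = self._last_seen.get((peer, port))
        return last is not None and (now - last) <= self.timeout_s

    def pick_port(self, peer: str, ports: Iterable[str], now: float) -> Optional[str]:
        """Return the first port on which `peer` was heard recently, if any."""
        for port in ports:
            if self.is_alive(peer, port, now):
                return port
        return None


monitor = PathMonitor()
monitor.packet_seen("node9", "eth0", now=0.0)
monitor.packet_seen("node9", "eth1", now=0.0)
monitor.packet_seen("node9", "eth1", now=0.5)  # eth0 has gone quiet
print(monitor.pick_port("node9", ["eth0", "eth1"], now=0.6))
```

The point of the design is that the decision is made per peer and per port, which is exactly what the global miimon and arp_ip_target checks cannot provide.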
We use a Linux setting that bonds the two ports so that both share the same IP address. It works, to a degree: the node is able to send on one link or the other, and if one switch is down it will use the other, and vice versa. But if node 6 usually sends through one port to node 9 and the cable on that path gets disconnected, node 6 cannot figure out automatically that it now needs to reach node 9 through the other port. I actually don’t know whether this is a problem in the switch, in the ARP table on each node, or a fundamental problem with the bonding feature in Linux.
In the end, our target is a system that can continue to work despite a single failure anywhere in the network. We also want to be able to detect such a failure and send a message to the server about where it occurred.
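For the detection part, one possible starting point is the status file the bonding driver exposes at /proc/net/bonding/<bond>, which lists each slave with its MII status. The sketch below parses that file and builds a JSON message naming the failed local ports. The sample text, the node name and the message layout are assumptions for illustration, and actually sending the message to the server is left out. Note that this only sees local link state, so it would not notice a failure on the far side of a switch.

```python
import json
from typing import Dict


def parse_slave_status(text: str) -> Dict[str, str]:
    """Map each slave interface in a /proc/net/bonding/<bond> dump to its MII status."""
    statuses: Dict[str, str] = {}
    current = None
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank line or line without a "key: value" pair
        key, value = key.strip(), value.strip()
        if key == "Slave Interface":
            current = value
        elif key == "MII Status" and current is not None:
            statuses[current] = value  # the bond-level status before any slave is ignored
    return statuses


def build_failure_report(node: str, statuses: Dict[str, str]) -> str:
    """Build a JSON message listing the ports whose link is not up."""
    failed = sorted(port for port, status in statuses.items() if status != "up")
    return json.dumps({"node": node, "failed_ports": failed}, sort_keys=True)


# Hypothetical excerpt in the layout of /proc/net/bonding/bond0.
SAMPLE_BOND = """\
Bonding Mode: fault-tolerance (active-backup)
Currently Active Slave: eth0
MII Status: up

Slave Interface: eth0
MII Status: up
Link Failure Count: 0

Slave Interface: eth1
MII Status: down
Link Failure Count: 1
"""

print(build_failure_report("node6", parse_slave_status(SAMPLE_BOND)))
```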
Here is a scenario in which there is no cable connecting the two switches:
1. node b has a failure on its port 2 (the fault can also be on the switch side, or just a damaged cable)
– node a must use its port 1 to reach node b: node a-port1 -> switch 1 -> node b-port1
2. node c has a failure on its port 1 (the fault can also be on the switch side, or just a damaged cable)
– node a must use its port 2 to reach node c: node a-port2 -> switch 2 -> node c-port2
3. node d has crashed
My current bonding configuration does not work when both 1 and 2 happen at the same time (it can’t decide to use one port to reach node b and a different port to reach node c). Additionally, if 3 happens, a configuration that would survive either 1 or 2 would stop working because of 3 (it requires all nodes to be reachable through one of the ports/switches).
An ideal configuration would work without issues when all 3 issues are active at the same time. If node d comes back to life, the configuration would also be able to resume its connection to it. Additionally, the ideal configuration would survive this 4th scenario, which might or might not be possible in practice:
4. node e has working links to both switches, but switch 1 has an internal failure where traffic from/to node e can only reach a subset of the other nodes that does not include node a
– node a must use its port 2 to reach node e
With a cable between the switches, scenarios 1-3 can all occur at the same time without breaking connectivity (since it does not matter which port node a uses, it can still reach both nodes b and c). However, if the link between the switches dies, this no longer works. Even with the link between the switches working, the setup also can’t survive scenario 4.
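The reasoning about scenarios 1-3 can be replayed with a small toy model. It is a deliberate simplification and not a simulation of the real bonding driver: port N of every node is assumed to be cabled to switch N, active-backup is modelled as "use the first port whose local link is up", broadcast is modelled as "send on every port whose link is up", and a crashed node is represented as having both links down. ARP monitoring, switch MAC learning and scenario 4 are not modelled.

```python
from typing import Optional, Set, Tuple

PORTS = (1, 2)          # assumption: port N of every node goes to switch N
Link = Tuple[str, int]  # (node name, port number)


def active_port(node: str, failed: Set[Link]) -> Optional[int]:
    """active-backup model: the first port whose local link is up."""
    for port in PORTS:
        if (node, port) not in failed:
            return port
    return None


def reach_active_backup(src: str, dst: str, failed: Set[Link], inter_switch_up: bool) -> bool:
    """Both nodes use one port each; different switches need the inter-switch link."""
    p_src, p_dst = active_port(src, failed), active_port(dst, failed)
    if p_src is None or p_dst is None:
        return False
    return p_src == p_dst or inter_switch_up


def reach_broadcast(src: str, dst: str, failed: Set[Link], inter_switch_up: bool) -> bool:
    """Frames leave on every working port; one shared switch is enough."""
    src_up = {p for p in PORTS if (src, p) not in failed}
    dst_up = {p for p in PORTS if (dst, p) not in failed}
    if not src_up or not dst_up:
        return False
    return bool(src_up & dst_up) or inter_switch_up


# Scenarios 1-3 at once: b lost port 2, c lost port 1, d crashed.
FAILED = {("b", 2), ("c", 1), ("d", 1), ("d", 2)}

for target in ("b", "c", "d"):
    print(
        target,
        reach_active_backup("a", target, FAILED, inter_switch_up=False),
        reach_active_backup("a", target, FAILED, inter_switch_up=True),
        reach_broadcast("a", target, FAILED, inter_switch_up=False),
    )
```

Under these assumptions, active-backup without the inter-switch cable lets node a reach node b but not node c, active-backup with the cable reaches both, and broadcast reaches both even without the cable. Node d stays unreachable in every case, as expected for a crashed node.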
There is no complete solution. All tests were performed in a virtual environment because real test Pis were not available. Because of the limitations of the virtual environment, it was not possible to check the connection between two virtual “Pis” accurately and easily. The variant in which a web server is set up and a client-server model is established between two virtual “Pis” was not tested. Instead, stability tests were performed on the Ethernet ports of the virtual “Pis”, and bonding mode 3 (broadcast) proved to be the most reliable solution. For now, using bonding mode 3 is recommended, but we need feedback on how it performs on real equipment.
Without bonding mode 3, an interconnect between the switches must be used; it is mandatory according to the documentation on high availability in a multiple switch topology.