Thoughts about unidirectional connections (especially in LoRa networks)
Started by nilu96
Hi everyone! I'm currently working with Reticulum and would like to set up a local LoRa network with standalone transport nodes running microReticulum. I've noticed that, unfortunately, many peer-to-peer connections are unidirectional: Node A can hear Node B, but not the other way around.
The announce system in Reticulum works fine in one direction, but let's imagine the following situation:
Node A <--> Node B <--  Node C
  ^                       ^
  |                       |
  v                       v
Node D <--> Node E <--> Node F
Node C sends an announce. The announce makes its way through the mesh along two parallel paths. Node B receives the announce directly from Node C and the repeated announce via Node E. Node B saves only the direct path to Node C, because it is the shortest path to Node C, and NOT the path via Node E. Finally, the announce arrives at Node A.
Now, Node A wants to send a packet or establish a link to Node C. The first packet is sent to Node B. Node B acts as a transport node and repeats the packet that is now "addressed" to Node C. Since the connection between Node B and Node C is unidirectional, Node B is unable to send any packets to Node C. The link establishment fails or the packet is dropped.
Node A may then make a path request, but this would just re-establish the very same (broken) path. I think the path_states table might allow for a longer path temporarily, but I think it is originally meant for another purpose. Since the shorter path is still valid (in the receiving direction), the "broken"/unidirectional path would be restored in the long run.
Do you have any thoughts on this, or ideas for solving the issue? I have already thought about two mechanisms:
- score different paths: increase the score for transported proof packets, decrease the score for proof timeouts (maybe also link timeouts)
- keeping track of reachable neighbors by listening to recently sent announces with hops=1 and preferring routes with next_hop in reachable_neighbors
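A minimal sketch of how the two mechanisms above could combine, in plain Python. This is not Reticulum or microReticulum code: `PathSelector`, `PathCandidate` and every method name are hypothetical, the +1/-1 scoring is an arbitrary assumption, and how a neighbor gets proven reachable is left to the caller.

```python
from dataclasses import dataclass


@dataclass
class PathCandidate:
    next_hop: str
    hops: int
    score: int = 0  # rises with transported proofs, falls with timeouts


class PathSelector:
    """Toy path table that keeps every known next hop per destination."""

    def __init__(self):
        self.candidates = {}              # destination -> {next_hop: PathCandidate}
        self.reachable_neighbors = set()  # neighbors known to hear us

    def on_announce(self, destination, next_hop, hops):
        paths = self.candidates.setdefault(destination, {})
        if next_hop in paths:
            paths[next_hop].hops = hops   # keep the accumulated score
        else:
            paths[next_hop] = PathCandidate(next_hop, hops)

    def on_proof(self, destination, next_hop):
        self.candidates[destination][next_hop].score += 1

    def on_proof_timeout(self, destination, next_hop):
        self.candidates[destination][next_hop].score -= 1

    def mark_reachable(self, neighbor):
        # How the evidence is gathered (e.g. from hops=1 announces) is left out.
        self.reachable_neighbors.add(neighbor)

    def best(self, destination):
        paths = list(self.candidates.get(destination, {}).values())
        if not paths:
            return None
        # Prefer reachable next hops, then higher score, then fewer hops.
        return min(
            paths,
            key=lambda p: (p.next_hop not in self.reachable_neighbors, -p.score, p.hops),
        )


# Node A's view of the diagram: C is 2 hops away via B and 4 hops away via D.
selector = PathSelector()
selector.on_announce("C", next_hop="B", hops=2)
selector.on_announce("C", next_hop="D", hops=4)
first_choice = selector.best("C").next_hop    # shortest path wins initially
selector.on_proof_timeout("C", "B")           # B never delivers a proof
second_choice = selector.best("C").next_hop   # the longer path now ranks higher
```

Keeping more than one candidate per destination is the main design change here; the ranking itself is only one of many possible orderings.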
Cheers!
Somehow the format got messed up. The right vertical arrow should connect Node C and Node F in both directions :)
I'd like to note that this issue also applies to a malicious transport node that only sends out announces and never lets any other packet types through. If the malicious transport node is the only connection between two otherwise disconnected networks, then obviously not much can be done. But even if there is an alternative path for data to travel, nodes may see an announce coming from the malicious transport node first and try to pass data through it. This is made worse by the fact that the malicious node doesn't have the overhead of transmitting other traffic and thus may be faster than most other transport nodes. So a few malicious nodes could damage the network's ability to find a valid two-way path between destinations.
Reticulum tends to assume an interface is either bidirectional for all traffic, or dead, so there are a lot of odd edge cases for stuff like this. The simplest protocol would be for LoRa nodes one hop away from each other to periodically (or on first hearing an announce) establish a link between each other and test the packet loss, and if it is unacceptable, to not use each other as possible paths unless there is no better option. But it's possible that a LoRa node is just barely audible, and later on the conditions change and the interface is one-directional again. This is basically the two generals problem on steroids, made harder by the limited bit rate of LoRa. Maybe there should be a way of telling Reticulum: "This interface exists but is only 85% reliable at uploading data, only use it if there's nothing better".
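As a rough illustration of that "only use it if there's nothing better" rule, here is a small Python sketch. Nothing in it is an existing Reticulum option: `NextHop`, `tx_reliability` and the 0.9 threshold are assumptions made up for the example, and how the reliability figure would be measured is not covered.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NextHop:
    name: str
    hops: int
    tx_reliability: float  # measured fraction of our packets that got through


def choose_next_hop(candidates, min_reliability=0.9):
    """Pick the shortest path among reliable hops; fall back to any hop."""
    if not candidates:
        return None
    reliable = [c for c in candidates if c.tx_reliability >= min_reliability]
    pool = reliable or list(candidates)
    return min(pool, key=lambda c: (c.hops, -c.tx_reliability))


direct = NextHop("direct LoRa link", hops=1, tx_reliability=0.85)
detour = NextHop("detour via E", hops=3, tx_reliability=0.99)
preferred = choose_next_hop([direct, detour])   # the detour: direct is below 0.9
fallback = choose_next_hop([direct])            # the direct link: nothing better
```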
For the more abstract and general case of a malicious node: everything a node could do by accident, it can also do maliciously, and worse. The only somewhat effective way I have come up with to solve this in all cases involves three nodes, Node A, Node B, and Node C, all directly connected to each other. If Nodes A and C suspect that Node B is malicious, they can generate completely new identities on the network, pretend that they are further away in hops, and then attempt to pass data, links, or any other type of traffic through Node B to check whether it is actually routing traffic as it should. But this requires Nodes A and C to trust each other fully, though there might be a way of doing this without trust. I don't think there's a one-size-fits-all solution to this issue either.
In all these cases, if a node were found to be one-directional, it would just be completely removed from pathing and the link would effectively not exist. It would be nice if Reticulum could take advantage of these one-way interfaces instead of completely ignoring them, especially since for some interfaces the bitrate in one direction is higher than in the other (most regular network connections have more download speed than upload). But having a link use one path for upload and another for download, or having to somehow maintain path tables for both directions, makes everything quadratically more complex and is probably incompatible with the current system.
This can be easily implemented, I think.
Keep track of how many times a peer does not respond to requests while still sending data to us.
If the count reaches a limit within a specific timeframe, the connection is unidirectional (at least temporarily, e.g. due to LoRa interference or noise).
Mark it as such and use alternative paths; maybe forgive it after a while and test it again for bidirectional communication.
The mechanism is similar to announce rate limiting, where you check how many announces are sent in a timeframe and take action.
The disadvantage is that you may pick a neighbour node which relies on the "malicious" node as well but hasn't marked it as unidirectional yet, so this process has to propagate a little and it may take a while until the "malicious" node is no longer considered optimal.
This can probably be done with 10-15 bytes of data and a few checks for each request.
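The bookkeeping described above could look roughly like this in Python. It is only a sketch under assumptions: the class and method names are hypothetical, the limits (3 failures, 60 s window, 300 s forgiveness) are arbitrary, and it is not tuned for the 10-15 bytes mentioned above.

```python
import time
from collections import defaultdict, deque


class UnidirectionalDetector:
    """Flags peers we keep hearing from but that never answer our requests."""

    def __init__(self, max_failures=3, window=60.0, forgive_after=300.0,
                 clock=time.monotonic):
        self.max_failures = max_failures
        self.window = window                # seconds in which failures are counted
        self.forgive_after = forgive_after  # seconds until a peer is retested
        self.clock = clock
        self._failures = defaultdict(deque)  # peer -> timestamps of unanswered requests
        self._last_heard = {}                # peer -> last time we received from it
        self._marked_at = {}                 # peer -> time it was marked unidirectional

    def heard_from(self, peer):
        self._last_heard[peer] = self.clock()

    def response_received(self, peer):
        # A reply proves the link works both ways: clear all suspicion.
        self._failures.pop(peer, None)
        self._marked_at.pop(peer, None)

    def request_timed_out(self, peer):
        now = self.clock()
        last = self._last_heard.get(peer)
        if last is None or now - last > self.window:
            return  # peer is silent too, so it looks dead rather than one-way
        failures = self._failures[peer]
        failures.append(now)
        while failures and now - failures[0] > self.window:
            failures.popleft()
        if len(failures) >= self.max_failures:
            self._marked_at[peer] = now
            failures.clear()

    def is_unidirectional(self, peer):
        marked = self._marked_at.get(peer)
        if marked is None:
            return False
        if self.clock() - marked >= self.forgive_after:
            del self._marked_at[peer]  # forgive, so the peer gets tested again
            return False
        return True
```

Passing the clock in as a parameter is just a convenience so the time-based behaviour can be exercised without waiting.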
Oh I LOVE this question because I'm working on a path request video and this is right up this alley.
Good news: there are a few simple mechanisms that help recover from this, already!
When A tries to establish a Link to C, it'll ultimately fail. At that point, A calls Transport.expire_path(C). That zeroes out the timestamp, then the cull pass that runs later in the same jobs() tick removes that entry.
Then, A will fire a path request openly. B and D both hear it (and B, which would have marked the path to C it has as UNRESPONSIVE, would not have cleared its cached path entry yet, so it will reply with that path despite it being unresponsive). Both B and D then respond. But because A has dropped its path entirely, it will accept whichever response shows up first. Let's assume D's does. Because the two responses carry the identical announce (just the saved copy each node got from C originally), B's response will not override the now-established path through D, despite the fewer hops that B will claim. Then, with a path via D, the Link will presumably work and all moves on from there.
Also, if A was a transport node itself, remember that it will forward announces too, meaning B will hear a path to C from A. If that happens to trigger after B has marked the C path UNRESPONSIVE, then even B will take the -> A -> D -> E -> F -> C route too.
Now if you're thinking "okay but race condition!", more good news. That's exactly why PATHFINDER_RW exists. It's a half-second jitter in response to path requests, so D has a great shot at winning this.
LXMF (which I assume you're using alongside all this) retries up to 5 times. Very good chance for D to win.
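To put a rough number on "very good chance", here is a back-of-the-envelope sketch. It assumes B and D each delay their reply by an independent, uniformly random time within the half-second window, and that every retry is a fresh, independent race. Both are simplifying assumptions (real airtime, CSMA and queueing will skew this), and the function names are made up for illustration; only the 0.5 s window and the 5 retries come from the post above.

```python
import random

PATHFINDER_RW = 0.5  # seconds of random window, as quoted above
RETRIES = 5          # retry count quoted above


def d_replies_first(rng):
    """One race: each responder waits a uniform random delay in the window."""
    delay_b = rng.uniform(0, PATHFINDER_RW)
    delay_d = rng.uniform(0, PATHFINDER_RW)
    return delay_d < delay_b


def estimate_single_win(trials=10_000, seed=1):
    """Monte Carlo estimate of D winning a single race."""
    rng = random.Random(seed)
    return sum(d_replies_first(rng) for _ in range(trials)) / trials


def chance_within(attempts, p_single):
    """Probability that D wins at least one of `attempts` independent races."""
    return 1 - (1 - p_single) ** attempts


p_single = estimate_single_win()            # close to 0.5 by symmetry
p_recovered = chance_within(RETRIES, 0.5)   # 1 - 0.5**5 = 0.96875
```

Under these assumptions D wins at least one of five races about 97% of the time.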
Thanks for sharing your thoughts! @welo that's a good point, malicious nodes complicate the situation even more!
I did not know the details of the path request procedure! That's very interesting. But even though there is a chance to find a bidirectional path, it might be better to have a deterministic way for reliable paths. If I understood correctly, the functioning bidirectional path will also be dropped again as soon as Node C sends a regular announce, right?
nilu96 wrote:
it might be better to have a deterministic way for reliable paths.
If only we could haha! That's the thing, this is messy RF. Node D suddenly goes offline, you're out of luck anyway, right? Node B moves and now A can't reach it, then what? And so on.
I think the very important question here is: why is C -> B unidirectional? Is it just at the fade edge? Then sometimes it might still get lossy bidirectional traffic. Is node B intentionally a directional link in C's direction? Is node B just doing Rx-side techniques for better sensitivity, creating that imbalance (kinda implies C is near the fade edge too)? One really nice thing about Reticulum is that adding a node is cheap (in terms of compute, and overall $ cost all things considered). Plenty of hops are possible seamlessly.
It sounds a bit heavy-handed but ironically the best approach may simply be "add another node or two". Or if you're coordinating (which is what it sounds like from your original post; let's ignore malicious actors for the time being), get B and C on a true point to point link or something.
Not to say some tweaks or application-level choices couldn't help smooth those over! But I think the fundamental problem still might lie somewhere else
KenAKAFrosty wrote:
It sounds a bit heavy-handed but ironically the best approach may simply be "add another node or two".
Even if you add a node between C and B, let's say node Z, wouldn't node B still hear an announce from C first instead of from Z, and thus prioritize the unidirectional link, ignoring Z and trying to go straight from B to C? If you get to organise the nodes yourself you could maybe control this. But in a randomly distributed sea of more and more LoRa transport nodes in one area, the chance increases that some receiver is at just the right range for the transmission to be unidirectional. And because those links would be the most efficient route for traffic if they were good connections, a lot of traffic will try to go through them and fail one way or another.
I hear ya, and I promise I'm trying to steelman this side too! Where it gets weirder though is when we lift out of the diagram which is basically modeled as a bunch of point-to-point links, right? In reality we're probably talking the typical dipole donut broadcast from each node.
It's already WAY harder to even create this configuration once you deal with that. Ignoring other 3D geometry or interference, and even just assuming "a wide open plain" where each node lives, this is the kind of arrangement it would take to fit that diagram:

Notice how even here, C has to have a notably broader reach than the others, and E is nearly hearing it too. It's just really, really hard to create that arrangement in actual all-LoRa, broadcast-only scenarios.
If C has similar reach to the rest, then it would quickly end up in a spot where it would be bidirectional.
Since this is LoRa, and on the boards we're probably talking about there isn't really much in the way of "rx-sensitivity tricks" going on, the main variables will just be antenna placement, interference, and power (like a heltec v4 having 28 vs 22 for example).
And it's pretty load-bearing that B itself is reliably, losslessly, always faster to get there in the first place, which seems unlikely for all sorts of reasons, from CSMA/CA behavior to moving interference to announce queueing and so on.
This is just an extreme edge case, and I think the final rub is this:
Even if you accounted for all of this, and made explicit adjustments for this exact kind of scenario, that all becomes moot when node A can still end up with a bad link via that path for whatever other reason haha. The joy of distributed systems, right?!
"try, observe, adjust" from A's PoV really is the only way to appropriately handle lossy links, for whatever reason it's lossy. And after first failure you have a working system.
To nilu's point: yes, after C announces, this could reset. But the default announce interval is 6 hours.
The one, simplest change I could think of (and note I have not traced any other ramifications of this yet) is that paths that are currently marked UNRESPONSIVE don't get sent back as a response to a path request (which would just allow D to cleanly win on the first re-attempt instead of the jitter dance).
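A toy sketch of that rule, to make the intended behaviour concrete. It deliberately does not touch Reticulum's Transport code: `PathState`, `should_answer_path_request` and the dictionary-based tables are stand-ins invented for this example.

```python
from enum import Enum


class PathState(Enum):
    UNKNOWN = 0
    UNRESPONSIVE = 1
    RESPONSIVE = 2


def should_answer_path_request(destination, path_table, path_states):
    """Answer only if we hold a path that is not marked UNRESPONSIVE."""
    if destination not in path_table:
        return False
    state = path_states.get(destination, PathState.UNKNOWN)
    return state is not PathState.UNRESPONSIVE


# Node B still caches its direct path to C but has marked it UNRESPONSIVE.
b_answers = should_answer_path_request(
    "C", path_table={"C": "direct"}, path_states={"C": PathState.UNRESPONSIVE}
)
# Node D holds the longer path and has no reason to distrust it.
d_answers = should_answer_path_request(
    "C", path_table={"C": "via E"}, path_states={}
)
```

With B staying silent, D's response is the only one A receives, so no race is involved.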
I like your idea to not answer path requests if a path is marked UNRESPONSIVE and to instead repeat the path request to heal a path. Even though this would only heal the path temporarily, it would still be a win, and as you said, in a LoRa-only mesh there is always change and the network has to handle broken paths anyway.
I am still curious what you think of alternative solutions, like the ones I mentioned in my original post or the idea from Nomad1n0:
> Keep track of how many times a peer does not respond to requests while still sending data to us.
> If the count reaches a limit within a specific timeframe, the connection is unidirectional (at least temporarily, e.g. due to LoRa interference or noise). Mark it as such and use alternative paths; maybe forgive it after a while and test it again for bidirectional communication.
Oh sorry, don't get me wrong! I'm ALL FOR continually improved orchestration calls like this (read some outputs/metrics, make decisions, adjust). Many or most of them should probably be optional, just because some devices are more constrained in the size they can manage. So treating these as "optional upgrades or hints" could have a really nice path, definitely.
welo wrote:
It would be nice if reticulum could take advantage of these one-way interfaces
There are many cases where there are one-way connections, and there can even be networks where everyone can reach everyone, but connections are only one way (think e.g. of three nodes in a triangle).
Take laser connections for high bandwidth over a long line of sight: it's half the price to make them unidirectional only. If the main data flow is only in one direction, the other direction could be handled by more "conventional" connections. Or, if we have a few nodes, build a unidirectional ring to save resources but still have a high-bandwidth connection.
With radio, unidirectional connections are also common -- I can think of different scenarios (unintentional and intentional); with MeshCore we have it often enough in practice.
Good for me to now have confirmation that Reticulum does not really work when there are only unidirectional physical connections.
Regards!
Actually this scenario is pretty tractable! What you're describing is quite a bit different (especially when you expect it, like the free space optical you mentioned; side note, QR stream as optical data is actually a fully live thing! See https://rns.recipes/forum/showcase/retiqr-an-alternative-reticulum-qrhid-interface)
The main answer to this (I think), which we can do even without any core code changes, is just a custom interface! It just needs to define the inbound and outbound sides as those different physical mediums.
In your intentional triangle setup, I'll use a laser chain cuz that's cool as hell: it'd just mean that your outgoing bytes are wired up to your laser, and your incoming bytes would come from your photodetector. Would love to make a demo of that some day! (I have to resist the urge now, too many things on my plate already haha).
Great question!!
edit: forgot to mention! If you just mean "absorb data from one direction, but never have return data flow", that's completely doable already too! There's an outgoing field on the interface; set that to False and you're set!
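To make the split-medium idea a bit more concrete, here is a self-contained Python toy. It is deliberately not written against the Reticulum interface API: `SplitMediumInterface`, its method names and the callbacks are hypothetical stand-ins, with plain function calls playing the role of the laser and the photodetector. The `outgoing` flag mimics the receive-only case from the edit above.

```python
class SplitMediumInterface:
    """Toy interface whose transmit and receive sides use different media."""

    def __init__(self, transmit, on_inbound, outgoing=True):
        self.transmit = transmit      # callable(bytes), e.g. a laser driver
        self.on_inbound = on_inbound  # callable(bytes, interface), upward delivery
        self.outgoing = outgoing      # False makes this a receive-only interface

    def process_outgoing(self, data):
        if not self.outgoing:
            return False              # never transmit on a receive-only interface
        self.transmit(data)
        return True

    def process_incoming(self, data):
        # Called by the receive side, e.g. a photodetector reader loop.
        self.on_inbound(data, self)


# Wire three nodes into a one-way ring: A -> B -> C -> A.
received = {"A": [], "B": [], "C": []}
interfaces = {}


def add_node(name, downstream):
    interfaces[name] = SplitMediumInterface(
        transmit=lambda data: interfaces[downstream].process_incoming(data),
        on_inbound=lambda data, interface: received[name].append(data),
    )


for name, downstream in [("A", "B"), ("B", "C"), ("C", "A")]:
    add_node(name, downstream)

interfaces["A"].process_outgoing(b"hello")    # arrives at B only
listener = SplitMediumInterface(transmit=print, on_inbound=print, outgoing=False)
sent = listener.process_outgoing(b"ignored")  # False: nothing is transmitted
```

In the ring, each node's only way to reach its upstream neighbor is the long way around, which is exactly the case where per-direction path handling would matter.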