Monday, September 28, 2020

900MHz 802.11ah WiFi modules

 Hmmm... So it looks like the first 900MHz WiFi modules are appearing, claiming 1 - 1.5km range outdoors, with data rates of megabits to tens of megabits in many circumstances.  The module I have found is this one:

https://www.silextechnology.com/connectivity-solutions/embedded-wireless/sx-newah

The only problem I see so far, is that there is no comment as to whether they can do ad-hoc mode, or only AP or station mode.

If they can do ad-hoc mode, then this is potentially very interesting for use in Serval Mesh Extenders.

More when I discover it... But if anyone else has experience working with these, I would love to hear from you.

UPDATE: It looks like it uses an NRC7292 chipset for the 802.11ah, for which source code is available:

https://github.com/newracom/nrc7292_sw_pkg

They already have monitor mode implemented in the chip, it seems, so it looks like it *should* be possible to implement ad-hoc mode on it.

Thursday, September 24, 2020

Chasing down the last LBARD HF TX queue bugs and Initial Performance Graphs

To recap, in the last blog post I found and fixed a lot of bugs in LBARD over HF data modem operation.  This has taken much longer than hoped, in part due to the 4 to 5 hours it takes to do a complete run.

The last fixes I discussed in the previous post have resolved the remaining TX queue shuffling bugs, so far as I can see. But in some runs I am still seeing that a few bundles are not transferred in one or both directions.  These are typically among the lowest priority bundles, but not strictly the lowest priority bundles.

For example, in the last test run, the following bundles are not transferred:

peer4random79

peer4random94

peer4random96

The files are named in increasing file size, from peer4random00 to peer4random99.  Thus the 94 and 96 files are among the lowest priority bundles, and the 79 file is reasonably close.

The first step to tracking down what happened with them is to get the bundle IDs (BIDs) that correspond to these files:

peer4random79 - 6CE2CB42* - bundle #21 on the sending side

peer4random94 - 469E2B18* - bundle #6 on the sending side

peer4random96 - 338FC557* - bundle #4 on the sending side

Armed with that information, I can see if those bundles get transferred at all.  Let's start with the 79 file.

At 17:53.27 the sender first realises that the recipient does not have the bundle. However, the TX queue is already full, so it doesn't make it onto the TX queue.

At 18:28.39 it gets queued again, and this time there is space in the TX queue, and it does get queued in position 3 in the TX queue.

At 18:28.39 bundle #66 gets queued, which has a higher priority, so our bundle is demoted to position 4 in the TX queue.

At 18:29.06 bundle #53 gets queued, which also has a higher priority, so our bundle is now demoted to position 5 in the TX queue.

At 18:29.06 bundles #27, #71 and #56 also get queued, and our bundle is now in position 8.

At 18:29.29 bundles #51 and #55 get queued, and our bundle is now in position 10. One more higher priority bundle, and it will get bumped from the TX queue altogether until the tree-sync process gets restarted.  This happens immediately, as bundle #59 gets queued.

At 19:06.57 our target bundle, #21, gets requeued; however, the TX queue is at that time already full of higher-priority bundles, so it does not make it onto the TX queue.

At 19:40.05 it does finally make it back onto the TX queue in position 8.

At 19:40.14 bundle #41 gets queued, bumping our bundle to position 9.

Then at 19:40.49 bundle #27 gets queued, bumping our bundle to position 10, followed immediately by bundle #23, which causes our bundle to get bumped from the TX queue.

Our bundle is next attempted to be requeued at 20:13.42, but the TX queue is already full of bundles with equal or higher priority, so it doesn't get queued.

The next attempt is at 20:43.25. And this looks like it might be the problematic instance. The TX queue is not full, and our bundle has a higher priority than the TX queue's entries, and also than the bundle currently being sent.  As a result, it is immediately selected for transmission.  But then bundle #28 also gets queued for sending, which bumps #21 from being the bundle currently being sent, and there is no code that then puts #21 back into the TX queue.

As the TX queue had not overflowed since the last restart of the tree-sync process, no flag is raised to require the tree-sync to restart one last time, and so this bundle will never finish being sent.  And indeed this is what happens. Similar things happen for the other two bundles.

So the fix is relatively simple: When de-queuing a bundle that should still get sent, we need to insert it into the TX queue.  

I'm heartened that this fault seems to describe the last situation where this problem of bundles not being forwarded can occur.

The only immediate problem I can see, is that there is already code that is supposed to do this.  It turned out the problem was that I attempted to re-queue the bundle being sent before removing it as the bundle being currently sent. The queuing logic was noticing this, and not adding the bundle back into the queue, because as far as it was concerned, the bundle was already being sent.  
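To make that ordering concrete, here is a rough sketch in C of the idea. None of these names are the actual LBARD functions or data structures; it just shows why the "currently being sent" marker has to be cleared before re-queuing:

/* tx_requeue_sketch.c: order-of-operations sketch (hypothetical names). */
#include <stdio.h>

#define TXQ_LEN 10
static int tx_queue[TXQ_LEN];      /* bundle numbers in priority order   */
static int tx_queue_len = 0;
static int current_tx_bundle = -1; /* -1 = nothing currently being sent  */

/* Naive insert used only for this illustration. */
static void tx_queue_insert(int bundle)
{
  /* The queueing logic refuses to queue the bundle that is "currently
     being sent" -- which is exactly why the ordering below matters. */
  if (bundle == current_tx_bundle) return;
  if (tx_queue_len < TXQ_LEN) tx_queue[tx_queue_len++] = bundle;
}

/* A higher-priority bundle pre-empts the one being sent.  The fix is to
   replace current_tx_bundle FIRST, then re-queue the old bundle, so the
   insert above no longer mistakes it for "already being sent". */
static void preempt_current_bundle(int new_bundle)
{
  int old = current_tx_bundle;
  current_tx_bundle = new_bundle;       /* step 1: forget old "in flight" state */
  if (old != -1) tx_queue_insert(old);  /* step 2: old bundle goes back on queue */
}

int main(void)
{
  current_tx_bundle = 21;               /* bundle #21 is being sent             */
  preempt_current_bundle(28);           /* bundle #28 pre-empts it              */
  printf("queued: %d entries, first=%d\n", tx_queue_len, tx_queue[0]);
  return 0;                             /* prints: queued: 1 entries, first=21  */
}

Run the other way around (re-queue first, then swap), the insert sees #21 as still in flight and silently drops it, which is exactly the "never finishes being sent" symptom above.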

By reordering this sequence of events, I was able to solve the problem, and it now reliably delivers every single bundle.  Together with all the performance fixes, we now have things at a state where they work reliably, which is great. So let's get some basic performance statistics from these runs:



We can see that there is considerable variation in the transmission times (and thus speed), with respect to the bundle sizes.  That is, for bundles of about the same size, the delivery time can vary by a factor of 5 or so.  

This is not unexpected, as the prioritisation of bundles during the tree-sync protocol can cause some bundles to be fully or partially transmitted, before tree-sync realises that there is a higher priority bundle. When this occurs, the bundle that pre-empts the lower-priority bundle will appear to take longer to deliver, because part of one or more other bundles will have been transmitted during the time accounted to the higher-priority bundle. Conversely, if a bundle has been partially transferred before being pre-empted by a higher priority one, the pre-empted bundle will appear to deliver more quickly, because only the time from when it resumes transmission will be accounted.

Otherwise, we see that the average data throughput, excluding the approximately 400 bytes of Bundle Manifest, improves gradually with increasing bundle size. For larger bundles we see transfer speeds of around 40 bytes per second, which is about half of the channel capacity of ~93 bytes per second we discovered previously.  This is still not ideal, but given that it includes all the overheads for the transmissions to occur, to be prioritised and for synchronisation of bundle lists between the two parties, it's not too bad.  Note that these figures are for one direction only, with similar results for the opposite direction.

If we look at the overall bundle transfer times, we see that this varies between about 50 and 200 seconds, only somewhat influenced by the size of the bundle's payload.  This corresponds to message delivery rates between 2x (3600/50) = 144 and 2x (3600/200) = 36 bundles per hour over an HF link. That is, between ~850 and ~3,500 bundles per day.  Again, while higher figures would be welcome, being able to deliver one to a handful of text messages every few minutes would seem to be sufficient for delivering a useful level of communications between small isolated communities.  Round-trip message send and reply cycles would likely be of the order of 2 to 10 minutes, which again is sufficient to facilitate useful communications.

So at this point we have reliable and reasonably performant delivery of Serval Mesh traffic over Codan HF data modems.  The next step will be to try to setup a real HF link with two radios spaced many kilometres, or perhaps a couple of hundred kilometres apart, and see how it performs in practice with some real message traffic.

Otherwise, I'll now be turning my attention to implementing LoRa radio support for Mesh Extenders to help make lower-cost and more internationally portable Mesh Extenders, so that we can experiment with more interesting mixed-radio networks.  Because the Wi-Fi links between Mesh Extenders are self-organising, this will allow such mixed-media networks to be easily built. It will then be interesting to see how the system behaves with such networks.

After that, I will start looking at implementing "affinity tags" so that when we start connecting large numbers of such networks together, we can prevent traffic being replicated to places it doesn't need to go.  We should also finally implement the logic for detecting whether senders or recipients are likely to be located on the far end of an HF link, so that we can also factor that into the prioritisation.

Monday, September 21, 2020

Continuing working on the LBARD HF protocol efficiency problems

Ok, back from the last post where I was thinking about some longer-term future approaches, to trying to get the current code working as well as it can.

The problem I am seeing at the moment, is that the same blocks are being sent repeatedly, while other blocks are not being sent at all.  

One problem was that pieces of the manifest were being marked as sent, when they hadn't been.  This was due to an out-by-one error in the comparison of the extent of a block being sent with the actual data. The effect was that when a block was sent, both it and the following block were marked as having been sent.
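For the record, the shape of that bug looks something like the following sketch. It is a simplified stand-in for the real bitmap code, with a hypothetical mark_sent() helper and 64-byte blocks:

/* block_bitmap_sketch.c: the kind of out-by-one described above
   (illustrative only, not the real LBARD bitmap code). */
#include <stdio.h>

#define BLOCK_SIZE 64
#define MAX_BLOCKS 32

static unsigned char sent[MAX_BLOCKS];   /* 1 = block fully sent */

/* Mark the 64-byte blocks fully covered by the bytes just sent,
   i.e. bytes [start, start+len). */
static void mark_sent(int start, int len)
{
  for (int b = 0; b < MAX_BLOCKS; b++) {
    int block_start = b * BLOCK_SIZE;
    int block_end   = block_start + BLOCK_SIZE;   /* exclusive */
    /* Correct test: the WHOLE block must lie inside the sent region.
       The buggy version effectively compared against start + len with an
       out-by-one, which also marks the next (unsent) block as sent. */
    if (block_start >= start && block_end <= start + len)
      sent[b] = 1;
  }
}

int main(void)
{
  mark_sent(0, 64);                                  /* send exactly one block */
  printf("block0=%d block1=%d\n", sent[0], sent[1]); /* 1 0, not 1 1           */
  return 0;
}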

This seems to be a common factor in many of the slow transfers.  It has only started showing up now, because we were previously testing with payload-less bundles, and so the manifest would keep being resent, which would then have a higher chance of re-sending the missing part through the normal cyclic re-transmission.

So let's set a test run going, and see if it solves the problem.

Well, it's helped, but close examination of the log files reveals that there are still pieces being counted against the wrong blocks, due to a duplicate code-path.  I've now de-duplicated that.  I also notice that sometimes manifest or body pieces would be counted against the other as having been sent, which of course is also causing problems.  So I have fixed those as well. 

On the plus side, it's pretty clear how these sorts of bugs can be causing these problems: If we are marking things that have not been sent as having been sent, then we are not going to send those any time soon.  Instead, we will keep sending the same pieces over and over, because we think we haven't sent them.

I'm also seeing the situation that the sending of a bundle can get reset part way through, which then makes it forget the transfer bitmap information for that bundle, resulting in it re-sending pieces it has already sent. This might be because of an old feature that causes the tree-sync process to restart periodically, to avoid lock-ups in the protocol, which in turn causes the selection of the bundle currently being sent to potentially change.  However, the root causes for those lock-ups have all been fixed now, so that can probably be removed, if it is still there.  I'll investigate that while the new run with the previous bug fixes in place runs.

Yes, it now looks like the resetting of sending bundles is the most significant remaining problem. As mentioned, this causes the knowledge of the state of a transfer to be lost.  There are two approaches to solving this: 

1. Don't switch which bundle we are transferring. 

2. Remember the state of other bundles, so that we can continue from where we left off, if we do switch which bundle we are transferring.

Option 1 sounds great in theory, but the synchronisation process may well interrupt the transfer of a lower priority bundle with a higher priority one, for example.  So this is probably not possible to achieve in practice. Although I will instrument the changing of the current bundle being sent, to find out why it is changing.

Thus it probably makes sense to implement something to remember the progress of a bundle when switching.  This can be kept in a nice little cache somewhere.  Of course, this is one of several complications that would go away if/when I get to implementing the new approach I wrote about in the previous blog post.  But for now, the cache will have to do.
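Something along these lines is what I have in mind -- a rough sketch only, with illustrative struct and field names rather than the real LBARD data structures:

/* progress_cache_sketch.c: a minimal cache of per-bundle transfer state. */
#include <string.h>

#define CACHE_SLOTS   16
#define BITMAP_BYTES  32   /* enough for 256 x 64-byte blocks */

struct tx_progress {
  char          bid_prefix[17];               /* hex prefix of the bundle ID   */
  unsigned char manifest_sent[BITMAP_BYTES];  /* which manifest blocks are sent */
  unsigned char payload_sent[BITMAP_BYTES];   /* which payload blocks are sent  */
  int           in_use;
};

static struct tx_progress cache[CACHE_SLOTS];

/* Find (or allocate) the slot for a bundle, crudely recycling slot 0 if full. */
struct tx_progress *progress_lookup(const char *bid_prefix)
{
  for (int i = 0; i < CACHE_SLOTS; i++)
    if (cache[i].in_use && strcmp(cache[i].bid_prefix, bid_prefix) == 0)
      return &cache[i];
  for (int i = 0; i < CACHE_SLOTS; i++)
    if (!cache[i].in_use) {
      cache[i].in_use = 1;
      strncpy(cache[i].bid_prefix, bid_prefix, sizeof cache[i].bid_prefix - 1);
      return &cache[i];
    }
  memset(&cache[0], 0, sizeof cache[0]);      /* crude eviction of slot 0 */
  cache[0].in_use = 1;
  strncpy(cache[0].bid_prefix, bid_prefix, sizeof cache[0].bid_prefix - 1);
  return &cache[0];
}

On switching bundles, the sender would look up the slot for the newly selected bundle and resume from whatever blocks are already marked as sent, instead of starting again from zero.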

I've now implemented this caching of transfer progress.  However, the first run after implementing it showed a weird situation where bundles transferred quite regularly in one direction, but none in the other.   In the direction where traffic was flowing, I saw 6 bundles received fairly quickly, with all of them transferring in 50 to 80 seconds.  That's the good news.  However, there would then be a gap ranging from a couple of minutes to half an hour with nothing being transferred.  Then the same six bundles would be transferred again.

I think what happened is that one of the servald instances crashed or didn't start properly.  As a result one party had no bundles to share, and could not store them into its Rhizome database, and so the same transfers happened repeatedly.  That might have just been some freak event that caused one of the servald instances to not have started properly.  To find out, I'll run the test again, and see what happens.

Second run worked properly in that regard.  Some bundles transferring at a reasonable speed, but others taking a VERY long time, up to an hour or so.  

One bug spotted: We don't mark the final body block as sent if it is a partial block. This looks to be able to cause the final block to be sent many times repeatedly, which is clearly not good.  I've tried to fix that now.

The other bigger problem, is that the bundle keeps getting de-queued and re-queued, which of course is causing a lot of delay in transfer.  It also seems that at least one code path can de-queue the current bundle being sent, without stashing its progress for later retrieval. 

Found at least one: The sending of a new generation ID by the sender, which causes us to restart the tree-sync process.  In that code, we stop sending the current bundle.  However, we should just keep on sending the same bundle, unless and until we find a better bundle to send.  This is likely the cause of considerable delays, because the synchronisation will run for potentially several minutes before it finds a new bundle to send. That's likely to be it.  We should probably reduce the generation ID rotation rate for HF, anyway, since it doesn't make sense to reset the tree-sync process so frequently on a high-latency link.

All these problems keep making me think about the new tree-based approach I wrote about recently, as there would be no need to do any of this, and resumption of transmission would happen in a much more natural and efficient manner -- even if the lbard process really did die and restart.

Anyway, back to the existing code, it now looks like transfers are happening reasonably sensibly.  In trying to make a script to show the sequence of transfers, I am however still seeing that the currently selected bundle for transmission tends to change more often than it seems it should.  This led me to look more deeply into the TX queue management, and I'm seeing some odd things there, where some bundles added to the queue seem to disappear, and others that were never reported as having been added to the queue turn up in there.  So I'll add some more debugging for that.

Yes: The TX queue is being completely purged from time to time. This is again the resetting of the generation ID, the unique identifier that is used to indicate when an LBARD instance restarts.  Apparently I was resetting the generation ID every four minutes.  Thus it is no wonder that the TX queues are getting purged and we are ending up stuck in a synchronisation hell.  Instead, the generation ID should only be changed when the TX queue at one end or the other has overflowed, and thus been emptied, and so the sending side doesn't know which bundles it should have sent when the TX queue overflowed. 
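In other words, the generation ID rotation should be event-driven rather than timer-driven. A tiny sketch of that logic, with made-up names:

/* generation_id_sketch.c: rotate the generation ID only after a TX queue
   overflow, not on a timer (illustrative names only). */
#include <stdlib.h>

static unsigned int generation_id;
static int tx_queue_overflowed = 0;  /* set when a bundle gets bumped off the
                                        end of a full TX queue               */

/* Called whenever we are about to start a new tree-sync round. */
unsigned int generation_id_for_next_round(void)
{
  if (tx_queue_overflowed) {
    /* We lost track of some bundles, so force peers to restart the sync. */
    generation_id = (unsigned int)random();
    tx_queue_overflowed = 0;
  }
  /* Otherwise keep the same generation ID: no four-minute timer, so the
     TX queue is not needlessly purged on a slow, high-latency HF link. */
  return generation_id;
}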

Given that the transfers are now otherwise working fairly efficiently, I'm hopeful that this will get us to some kind of reasonably reliable transfer with steady transfer times for bundles of a given size.

Running more tests, I have found and fixed a few more problems:

1. If the carrier is lost in an HF modem call, we can end up in a situation where no more normal packets get sent, presumably because the number of unacknowledged packets is high, causing the packet rate to be practically zero.

2. Particularly with bundles that require many data packets, they end up slowing down in transfer over time, and eventually transfer halts.  The halting is probably due in part to (1), but the reducing transfer efficiency must have another cause.

3. Some received pure data packets are corrupted, which is a probable significant contributor to (2).

To fix (1), I have reset the TX sequence number and last acknowledged sequence number whenever an HF call starts.

For (2), I need to examine what is going on in the logs of runs, which will probably make more sense once I have fixed (3).

For (3), I suspect that the modem's TX buffer is being easily overwhelmed.  This is in spite of the fact that I have hardware flow control enabled.  It is possible the hardware flow control doesn't work properly with the USB serial adapters.  In any case, I have implemented a check before sending pure data packets, so that they will not be sent if no normal packet has been seen in the last 30 seconds.  This should stop the buffers getting over-full.  But a fuller investigation will require that I log every byte sent to and received from each modem, and then compare the two streams to see exactly what kind of data errors we are experiencing. Is it lost bytes, as I am suspecting, or is it bit errors, for example?

So adding the logging of bytes sent/received, I can see that indeed something is going wrong with the pure data packets almost immediately.  This makes it much easier to see what is going on, since I don't have to wait an hour or more each time. Now to look at the TX and RX logs to see what is happening...

We are indeed missing large slabs of data.  Taking a look at the FTDI USB serial adapter documentation, it looks like you cannot safely write more than 32 bytes at a time, if using hardware flow-control, as that is the size of the FTDI chip's internal buffer.  So I've modified the serial writing code to write 16 bytes at a time, to see if that fixes the missing bytes problem. However the code to back-off due to hardware flow control doesn't seem to get triggered at all, and the problem with corrupt data continues.  Found one problem with the TX and RX logs, where the pure data packets weren't being included.  Fixed that, now let's see how it looks...
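For reference, the chunked writing looks roughly like the sketch below. It is a simplified version: the chunk size of 16 bytes comes from the reasoning above, but the delay value and the retry handling are just assumptions for illustration.

/* serial_chunk_write_sketch.c: write to the modem in small pieces so the
   FTDI adapter's tiny internal buffer is never overrun. */
#include <unistd.h>
#include <errno.h>

#define SERIAL_CHUNK 16

/* Write len bytes to fd, at most SERIAL_CHUNK bytes per write(), with a
   short pause between chunks.  Returns 0 on success, -1 on error. */
int serial_write_chunked(int fd, const unsigned char *buf, size_t len)
{
  size_t off = 0;
  while (off < len) {
    size_t n = len - off;
    if (n > SERIAL_CHUNK) n = SERIAL_CHUNK;
    ssize_t w = write(fd, buf + off, n);
    if (w < 0) {
      if (errno == EINTR || errno == EAGAIN) continue;  /* just retry */
      return -1;
    }
    off += (size_t)w;
    usleep(10000);  /* give the slow 2,400bps link a moment to drain the chunk */
  }
  return 0;
}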

Ok, so now I am not seeing any more lost bytes at the start, but the pure data packets were still being munged.  That problem turned out to be caused by the truncation of the data packets on the receive side. I also found and fixed a couple of segmentation fault bugs in the process.  Now re-running...

Still seeing corrupted packets. It turns out the pure data packets were not being properly escaped before sending, thus if they contained a "!" character in their data or header, it was messing things up.  Fixed that now, too.  This could well be causing the failure to transfer larger bundles efficiently, as it would effectively stuff up the transfer of larger pieces more often.  So it's satisfying to have found this problem.  Let's see what it does to the transfer rates...
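The principle of the fix looks like the sketch below. The actual escape byte values LBARD uses may well differ; the point is simply that a literal '!' must never appear inside the body of a pure data packet.

/* escape_sketch.c: make sure raw packet bytes can never collide with the
   '!' framing character.  The escape scheme here is made up for
   illustration only. */
#include <stddef.h>

#define FRAME_CHAR  '!'
#define ESCAPE_CHAR 0x5c

/* Escape src into dst (dst must be at least 2*len bytes).
   Returns the escaped length. */
size_t escape_packet(const unsigned char *src, size_t len, unsigned char *dst)
{
  size_t out = 0;
  for (size_t i = 0; i < len; i++) {
    if (src[i] == FRAME_CHAR) {
      dst[out++] = ESCAPE_CHAR; dst[out++] = 0x01;   /* encode '!'            */
    } else if (src[i] == ESCAPE_CHAR) {
      dst[out++] = ESCAPE_CHAR; dst[out++] = 0x02;   /* encode the escape char */
    } else {
      dst[out++] = src[i];
    }
  }
  return out;
}

The receive side just does the inverse mapping before handing the bytes to the packet parser.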

Yes, bundles now seem to transfer in between 30 and 200 seconds. However, now I am seeing that the same bundles get sent more than once.  I suspect that this is due to the synctree reset that happens when the TX queue is emptied, but overflow of the TX queue was previously recorded.  This tells LBARD that it needs to start the sync again. However, if the bundles were received, then they should not get transferred again.  Looking in the logs, I am seeing HTTP 400 errors when trying to insert the bundles into the rhizome database. Thus we are still having a problem with corruption during transfer. I'll have to look at the reassembly of bundles. My suspicion is that the problem will be with the pure data packets somewhere.

The corruption problems have now been solved. Part of it was that writing in 16-byte pieces helps the serial over USB transfers, but messes up the HTTP transactions.  So I have split the code to only do that for the serial transfers.  That fixed the problem with bundles not making it into the Rhizome database, and transfers now proceed progressively through more than just the same 10 bundles repeating. 

However, I am still seeing the HF call dropping out sometimes. Those tend to recover fairly quickly, and might be legitimate loss of carrier in the radio setup.

More concerning though, is that there can be a period of half an hour or more where no transfers in either direction occur.  These are quite mysterious, as the HF modem doesn't seem to drop out, but the logs of both sender and receiver simply show nothing at all for a period of 46 minutes in this particular case.  I guess I will need to run it again, and try to catch when this happens.  

This time we hit a segfault after almost an hour and transferring 80 bundles (between both directions) in about 64 minutes. The problem is that sometimes rubbish ends up in the TX queue.  More specifically, the priority of a bundle ends up being stored in the bundle ID field of a slot in the TX queue. 

So I've put more debug instrumentation in to find out how the TX queue list is getting corrupted. In the process, I also fixed the case where if a bundle is currently being sent, it could still end up being queued for later transmission as well, e.g., if the tree-sync process is restarted at some point.  

I've also added code that detects when the top of the TX queue ends up containing an illegal bundle number to reduce the impact. However, I would really like to fix the error at its cause.  What I believe is most likely the problem, is that the shuffling of the TX queue has an out-by-one error somewhere, as the bundle numbers occur immediately before the list of priorities. Thus if one too many entries were copied down, it would copy a bundle priority.
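To illustrate the kind of out-by-one I mean, here is a toy version of the queue shuffling, with a hypothetical packed layout in which the priorities sit immediately after the bundle numbers, as described above:

/* txq_shuffle_sketch.c: removing one slot from a packed TX queue
   (hypothetical layout, not the real LBARD structures). */
#include <string.h>

#define TXQ_MAX 10
static int queue[2 * TXQ_MAX];  /* [0..used-1] = bundle numbers,
                                   [TXQ_MAX..TXQ_MAX+used-1] = priorities */
static int used = 0;

/* Remove entry i (assumes 0 <= i < used). */
static void txq_remove(int i)
{
  /* The count must be (used - i - 1): copying one element too many, when
     the queue is full, drags the first priority value down into the
     bundle-number region, producing exactly the "illegal bundle number"
     corruption described above. */
  memmove(&queue[i], &queue[i + 1], (used - i - 1) * sizeof(int));
  memmove(&queue[TXQ_MAX + i], &queue[TXQ_MAX + i + 1],
          (used - i - 1) * sizeof(int));
  used--;
}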

So time to start a new test run...

It's now running through a lot happier.  157 of the ~220 bundles have transferred, but it has taken about 4 hours to get that far. But it's not as bad as it sounds, as it seems that my laptop somehow stops USB communications while the display is locked.  This means that if I don't periodically prod the computer, long periods of no transfers result.  I'm pretty sure that this is the cause of this problem that I noticed previously.

That said, I am also seeing it able to get into a hang up and redial loop, where no packets get sent in the meantime.  I assume that the time to next packet TX is somehow being set wrongly. Or more to the point, it can grow large, and wasn't being reset when a call was re-established.  I've examined all the code-paths that can get there, and added explicit resetting of the next packet TX time.

I tracked down the "nothing sent for an hour or more" problem: It's actually not with LBARD, but with the laptop I am using.  It automatically suspends after 2 hours of inaction, even when connected to mains power.  This doesn't really help when it happens mid-run.  So I've disabled that in the laptop, and should now be able to run transfers without this problem.

I've also looked through and found the problem that was causing the illegal bundle numbers ending up in the TX queue, and fixed that. The problem was indeed an out-by-one with the list shuffling code, as expected.  I'm expecting that this will likely fix the problem where only 199 or so of the 202 bundles were being transferred, as it could result in the situation where the TX queue loses an entry or two without actually overflowing to trigger the re-start of the sync process to re-discover the bundles that were lost from the queue.   Thus I'm hoping that runs now will correctly transfer all 202 bundles -- 101 in each direction.  It might also end up being marginally faster, because fewer sync restarts will be required.

In short, I'm now hoping that I am at the point where transfers run reliably and in reasonable time.  I'll do a few runs to get some data on the typical speed and variation, but then we should be good, I think.  What I would like to do next is to plan some actual long-distance tests using the HF radios, with one here at the house in Akaroola and another some distance away.  Perhaps in a neighbouring town.  However to do that, I will need to get some antennae and coordinate with folks in the amateur radio community to help with that.

But as this post has already gone on for ages, I'll stop here, and start a new post looking at the remaining TX issues.

Tuesday, September 8, 2020

Thinking about different ways to do the tree-sync for LBARD

In the last few blog posts I've been fighting my way through fixing bugs with LBARD's synchronisation algorithms.  While both the tree-sync and the actual transfer of bundles have been problematic, the tree-sync side of things has been niggling at me for HF use, in particular. One of the reasons is that tree-sync has to start its synchronisation again from scratch every time a radio connection is made. This already takes several minutes, with only relatively small numbers of bundles.  Even if neither end has new bundles, the entire synchronisation process has to occur again.  Also the very high latency of the HF radios doesn't really agree with tree-sync, and can result in considerable delays in synchronisation.

Way back around 2014 we had started thinking about reworking the way that Rhizome stores data, by having it use a block-store to represent the bundles. That is, instead of keeping a manifest and payload, we would have a kind of virtual file-system in which the bundles would be represented.  Data blocks would be identified by the hash of their contents, thus providing automatic de-duplication, and making it much easier to cache data.

Now, reworking all of Rhizome to do that would require significant effort.  However, what *might* be doable in reasonable time, is to make the list of bundles an instance holds be represented in such a way.  This would allow each end of an HF link to cache this virtual file-system representation, and thus re-synchronise very rapidly if none or few new bundles have appeared at each end.

My current thinking is along the following lines:

1. Blocks should be small enough to fit in a single LBARD packet. This realistically means a limit of about 200 bytes.

2. We want to minimise the number of nodes in the tree, as each additional node requires an extra round-trip.

3. The small block size means that we need to use relatively small hashes for each block.  Even if they are too short for strong cryptographic security, they should still be strong enough to keep the chance of accidental collisions practically zero.

4. Each block can consist simply of leaf nodes or pointers to child nodes, or a mix of the two. The position of the node in the tree is implicit.

5. It would be nice to have one hex digit of identifier covered per block.  This means 16 pointers to sub-blocks have to be able to fit in a block.  200 bytes / 16 = 12.5 bytes per entry.  This probably means that 12 byte hashes = 96 bit hashes will be the longest possible.  That's ample to keep us safe from Birthday Paradox accidental node collisions (should be safe for up to 2^48 blocks).

6. If all the child nodes for a block can fit in a block, then that is where they are placed.  This avoids having unnecessary intermediate nodes in the tree that would otherwise have to be synchronised.

7. If all the child nodes can't fit in a single block, then we need to have a deterministic way to represent this, so that we don't end up with a proliferation of blocks as we fill nodes.

8. The simplest approach to (7) is to require that an overfull block have child blocks for each of the 16 entries it can hold (see the sketch after this list). This may mean that some child blocks point to the empty node (which, because of the hashing, will always be the same).

9. The above formulation means that each bundle (or more correctly, version of bundle), will have a deterministic node, which each side can enter into their block cache, thus avoiding the need to ever transfer a block that corresponds to a bundle that the other party already has.

10. Each side can request blocks from the other, to begin exploring the tree structure. Any common sub-trees between the two sides will be implicitly available.

11. The block cache can persist between executions of LBARD, thus allowing the complete state of the other party to be remembered between radio sessions, thus greatly speeding up the initial synchronisation when a radio session commences.

12. Multiple blocks can be requested at a time, thus helping to mask high latency.  In fact, simple optimisations like "please send me block X, plus any children (and optionally further descendents) of it that you don't think that I have" can be used to save further round-trips.

13. When a block is received, its position in the tree is known. This means that in some cases, it will be possible for the recipient to work out the content of the child nodes, if the sender's bundles in that part of the tree are a strict sub-set of the recipient's bundles.  At its simplest, a leaf node can be constructed for every combination of the bundles that the recipient holds in that part of the tree.  If any of those match the block ID of a child node indicated in the received block, then there is no need to receive those child-blocks.  Instead, not only can we avoid the need to send those blocks, we can actually immediately infer that the sender is missing those bundles not in the combination that resulted in the matching child node.  

14. The approach from (13) can be extended further up the tree if desired, to help save bandwidth, at the expense of storage space and computation.  At some point, however, the number of combinations that need to be tested becomes excessive, as each additional bundle in the portion of the tree being considered potentially doubles the number of combinations to be computed.  However, because each leaf node is restricted in its potential content, the growth in complexity is actually rather different (leaf nodes are only formed when the collective leaves don't fit in a single block).  The exact complexity of this will need to be modelled.

15. In theory, if the complexity of the above can be handled, then sending just the top of the tree will allow the recipient to reconstruct any node that contains only a subset of the bundles the recipient possesses, without any further data exchange.  This is more of theoretical interest than practical effect, except for the leaf-node optimisation described in (13).

16. The bundles themselves can also be represented as tree structures, so that similar techniques can be used.  This also allows for implicit transfer of common data sections, including in journal bundles that have invariant starts.  Of course, this ability to transfer common data sections will only work for those that have such commonalities that are aligned with the block size, or better, aligned with a leaf-node of blocks.

17. This approach allows for implicit caching of all transfers, making resumption of communications with other nodes and of bundles much more efficient.

18. This approach also solves many of the authenticity issues with transfers in LBARD, where the transfer of a large bundle could be poisoned with a fake block, or even just by a bit error that escaped the error correcting code. This is a significant advantage of this approach.

19. This approach effectively trades-off storage space (which is cheap and plentiful) for algorithm complexity and RAM requirements (which is in limited supply on the kinds of small embedded devices we want to target, such as low-power meshing phones etc).  This is thus a very nice side-effect of this approach.
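To make the block format above a little more concrete, here is a rough C sketch of what an interior node could look like under points 1, 3 and 5. The names and the exact header fields are purely illustrative; only the sizes follow the reasoning above (roughly 200-byte blocks, 16-way fan-out, 96-bit hashes).

/* sync_block_sketch.c: one node of the proposed block-store tree. */
#include <stdint.h>

#define SYNC_HASH_BYTES 12   /* 96-bit hash of a block's content    */
#define SYNC_FANOUT     16   /* one hex digit of bundle ID per level */

struct sync_block {
  uint8_t  depth;      /* tree level; 0 = root                         */
  uint8_t  is_leaf;    /* 0 = interior node of child hashes, 1 = leaf  */
  uint16_t flags;      /* reserved                                     */
  /* Interior node: hash of each of the 16 child blocks.  An absent
     subtree points at the well-known hash of the empty node (point 8). */
  uint8_t  child_hash[SYNC_FANOUT][SYNC_HASH_BYTES];
};  /* 4 + 16*12 = 196 bytes: fits in a single ~200-byte LBARD packet */

A leaf variant would instead pack as many 12-byte bundle-version hashes as fit in those 192 bytes, per point 6, rather than pointers to further child blocks.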

Anyway, I'll have a think about whether I implement this now or not, depending on how feasible it looks to get acceptable performance with the existing algorithm.

Monday, September 7, 2020

More work on LBARD over Codan HF Data Modem

In the last post, we got to the point where we had Rhizome bundles flowing over the modem, but not as fast as we would like.  Also, we were still seeing some problems where bundle sending would get stuck.  I've been away for a couple of weeks, but now am back at the bench with the HF radios, and can investigate further.

The first thing I realised that I should investigate, is what the actual throughput of the modems is, so that I can determine what fraction of the theoretical throughput we have available.

To do this, I instrumented LBARD to count the number of bytes, and divide that by the number of seconds since the first byte was received.  This yielded results of about 91 bytes per second in each direction.  Given the channel theoretical capacity of 2,400 bits / sec = 300 bytes / sec, this is noticeably lower than expected.  It means that we are only actually transmitting data (91+91)/300 = ~60% of the time. 

Now, if we were just transferring in one direction, instead of switching all the time, we could surely improve that, but we know that there is less than a factor of 2 available through such optimisation.  So I'm not too worried about that right now.

What is more interesting is that we should be able to get near to 90 bytes per second, but are still languishing down around the 20 bytes per second mark due to protocol issues. That is, we have a factor of 4 that we can improve, without having to increase the channel capacity or change the channel usage strategy at all.

Having spoken with one of my students who is working on a debug framework for LBARD, it is sounding to me like the packet bitmap that is used for acknowledgement and scheduling of parts of a bundle that the receiver doesn't already have is not working properly.  If this is the case, it would be causing the sending side to send parts of the bundle that the receiver already has, effectively wasting bandwidth. 

Looking through logs of previous runs, I can see one problem already: We schedule sending a bundle at the end of the file, even if we have already sent to the end of the file.  That is, we send 0 bytes of data, positioned at the end of the bundle payload.  This is clearly silly, and should be fixed. What we need to watch out for, though, is that we use end-of-payload packets to implicitly indicate the payload length in LBARD transfers. 

We are also seeing the 4 data bytes being sent quite a number of times, before the missing parts of the manifest are sent through.  That resending needs to be fixed. But it also occurred to me that if we randomly select a start point in the last 64 bytes of the manifest or payload, we will only send <64 bytes, when typically we have space in the packet for 128 or 192 bytes.  I suspect that this has the potential to greatly improve latency for small bundles, by basically eliminating such very low efficiency packets.  Together with preventing the re-transmissions mentioned in the last paragraph, this should be able to get the latency of a sub-1KB bundle down to perhaps 15 seconds, which would be a great improvement.

I'll try these improvements, together with some tweaking of debug output, so that I can more easily follow when particular bundle pieces are sent, and what the list of candidate blocks at a given packet transmission are.

This should have been a simple re-run, but then I hit a segfault caused by something unrelated, and now running it in gdb all sorts of strange things are happening, causing one of the instances to not properly switch to the online data mode when the modems connect.  Well, I found what caused the whole crazy problem to start: When I copy-pasted the arguments into gdb, I left the "1" off the front of "127.0.0.1", thus causing long delays while waiting for "27.0.0.1" to resolve, which caused the protocol to time out in various exciting ways.  So let's try that all again...

Ok. Now it is running again, after fixing that.  I also found and fixed some other bugs.  It is now transferring small bundles in typically 20 - 50 seconds.  This could be quicker, but it is a start.  However, I am seeing instances where a bundle is received, but then fails to insert into the Rhizome database with an error 422. This means that either the manifest is too big, or that the payload doesn't match the hash.  The bundle is then received a second time, and gets inserted. This means that there is some problem with the integrity of the transfer.

My first suspicion was that the pure data packets I implemented recently might be the problem.  But in this case, no such packets had been sent at all for the bundle in question.  As the packets are in any case protected by Reed-Solomon error correction, this would seem to suggest that the contents of the packets were being incorrectly assembled.

Actually, it turns out that this happens fairly often.  This is good, in that it probably explains a lot of the poor transfer performance: Some bundles are having to be sent two or more times, before actually being accepted. So now to find out why.

It looks like decompression of the bundle manifests fails, resulting in the end of the manifest being messed up, and not passing its cryptographic self-consistency checks.  It is the end of the manifest that gets messed up, so I am guessing that somehow or other either the length of the manifest, or the last partial 64-byte block of data for the manifest, doesn't get properly received.  It should hopefully not be too hard to find, as it is happening quite frequently, with probably 20% of bundle transfers being affected by it.

To track it down, I'll make both sender and recipient parties display the compressed manifest data, so that I can play spot-the-differences.

So looking at the compressed manifest on reception, it looks to be the right size, but has all zeros in the last data blocks. The problem was that when I made the optimisation to send not just the last fractional block, but as many 64-byte blocks that come before it that can fit, I hadn't adjusted the start point of the packet data being read, thus it was missing the extra bytes.  So time for another run, to see how it goes.

Right. So now we don't seem to have the problem with the code 422s, which is good.  However, we have the problem that sometimes we keep receiving a bundle repeatedly after receiving and registering it.   This seems to be because we don't get a confirmation message from servald that the bundle is now in our rhizome database.  As a result, we don't cull the duplicate reception of the bundle. It also means that we can't immediately start sending the bundle to other peers. 

This isn't such an issue for the HF modem driver, because we don't have other peers to talk to. However, for Mesh Extenders with the UHF packet radios, this is a big problem, as bundles won't get forwarded on to other peers until this occurs.  Thus we need to find and fix this problem.

LBARD communicates with servald via servald's HTTP RESTful interface.  In particular, it listens for new bundles via an asynchronous API, which means that we should get indication from servald essentially immediately on inserting the bundle.  So I presume something is fishy in that.  To investigate this, I have instrumented the code that communicates with servald, to find out where the problem is.

While looking at the logs for these delayed bundle notifications, I am also seeing that some bundle transfers are slow, as one end is not sending any data packets at all to the other end for 10 -- 20 seconds at a time. This should never happen, especially once the initial synchronisation has occurred, that lets each end know what the other needs.  In the current case, I am seeing a delay of 24 seconds between when successive pieces of a bundle are sent.  So we know that the remote end thinks it has a bundle that is worth sending.

I'll start by investigating one of these intervals, from 17:51.02 to 17:51.26.
During that time, we do receive several packets from the sender at 17:51.10, 17:51.12, 17:51.15, and 17:51.21. That is, we have received four consecutive packets that seem to contain no pieces of bundles.  Okay, so it seems that the synchronisation process is continuing to run, and is sending packets of ~200 bytes of synchronisation information.

Thus these delays are a false alarm, caused by the natural synchronisation process.  What is a bit worrying though, is that the synchronisation procedure is taking a long time. With 100 unique bundles on each side, synchronisation is taking at least tens of minutes.  In this example, I am seeing of the order of 300 sync messages of close to 200 bytes each -- and it shows no sign of settling down.
This amounts to somewhere around 68 KB of data out of a total of 84KB sent.

Well, that's very interesting: It explains why our goodput is so low, and generally, why transfers are going so slowly.

Thus together with the Rhizome bundle notification delay, we have two bugs that are able to greatly slow things down.  I now have something quite definite to concentrate on.

I'll still work on the Rhizome problem first, because I am much more familiar with that code. 

The synctree code is going to be trickier to fix, because I didn't write that. What I suspect might be the issue with it, though, is that the sync tree code doesn't expect to be used in an environment where multiple packets get queued up.  This means that when it sees a response to an old packet of its own, it might be getting confused, and starting the tree traversal from higher up the tree again.

Thus the potential fix for this is to find a way to make sure we have only one packet in flight in each direction at a time. This would be a good idea in any case, because it would help to minimise the latency when a bundle is received, so that the sending party doesn't keep needlessly sending parts of that bundle after it has been received.

Well, actually I am going to fix this one-packet-in-flight issue first.  But first, a brief explanation as to why we do this syncing this way in the first instance, rather than, say, just exchanging lists of bundles, and then sending the bundles that the other side needs to get -- because that would surely be more efficient.

The reason is that a simple "exchange lists" and then bundles method would be fine if no new bundles ever arrive during a communications session.  But we want to be able to deliver messages with reasonable latency, even if they arrive while a connection is active.  That is, we would like to have live HF links between Serval Mesh islands, and have those act as near-real-time data conduits.

Now, back to the need for the sync algorithm to not be several packets behind, I have implemented this, and it seems to be working somewhat: The sync traffic is flowing, and is no longer filling the packets.  However, the rate of packet exchange is VERY slow.  The average raw data rate in each direction is only about 8 bytes per second. It looks like each end thinks that it has 2 unacked packets nearly the whole time, which is causing it to throttle back the packet rate.  I am suspecting that the low data rate is interacting in a bad way with the modems trying to be efficient, and waiting until there is enough data to be bothered sending.

But back to thinking about having only one packet in flight in each direction at a time, this means we will always end up with some wasted time, because one end or the other will almost certainly be waiting most of the time.  Investigating further, the round trip time for a packet turns out to be around 15 seconds (!!).
Even allowing for two maximum size packets of 255 bytes, the propagation time would still only be (255x2 bytes)/(2400 bits per second) = 1.7 seconds.  This makes me all the more suspicious about the modems trying to batch data together for efficiency, which in other cases would work.

So we need a solution to this problem.  The previous work I did on inserting the occasional pure data packet might actually be a good solution here: Basically it means that we have a background level of sync exchange through the existing packets, and then we greatly accelerate the exchange of bundles via the pure data packets, with the added benefit that they should help to flush the modem buffers.
                    
I'm now reading through the manual for the modems, with a view of disabling compression, in case the compression is causing increased latency. In the process, on page 6-2 of the "HF Data Modem 3012 Reference Manual", I have found some helpful information: While the data channel can in theory carry up to 2,400 bits per second, the maximum effective throughput is only 1,475 bits per second, i.e., 184.375 bytes per second.  This explains why I was seeing data rates of about 92 bytes in each direction.  While the extra bandwidth would have been nice, it does at least give me a clearer idea of what the maximum realistic capacity is.

On page A-3, I also found *T, which allows setting the timeout before sending data packets.  This is important, because any non-zero value will increase the latency of packet transmission. This setting can be set from 0ms to 120000ms.  The default is apparently 0ms. I'll set it explicitly in any case, just to make sure... Except the modem complains about the command.  It is oddly listed as only *T, not AT*T, but *T also does not work. I tried *0, in case the T is supposed to be the value itself.

Also AT&M=5 is supposed to select the highest data rate and optimise for interactivity, i.e., reduced latency. That was already set.

Doing some more direct tests with the modems, I timed how long it takes after I paste a 62 byte string into one modem, before it appears at the other end.  This reliably takes 7 to 8 seconds, thus confirming this latency problem.

The issue we have, is that our synchronisation protocol currently does not support having multiple packets in flight, as it was designed with radios in mind that can reliably indicate when they are ready to send a packet, and then send the packet immediately, thus allowing each packet to be built just-in-time, to ensure it has the most appropriate content in it.

In fact, just pressing a letter and waiting for it to appear at the end has the same latency of 7 - 10 seconds, so it isn't a bandwidth thing.  Rather, it seems that the modems make at least one full turn-around each, before the data gets sent.  Is this normal?

I did also find the command to disable data compression, AT%C0, and I am now testing that to see if it makes any noticeable difference.
(Another useful command I have found is AT&C=n,n,n,n,n, which allows providing the modem with a list of channels on which to listen for incoming calls, which I'll use later, and record here so that I don't forget about it ;).

But back to AT%C: Disabling data compression DOES make a difference: It drops minimum latency from around 7 - 10 seconds, to only about 4 seconds.  That's a big help, and should greatly improve our throughput.  With 4 seconds in each direction, and sending packets of up to 232 bytes, this should get us a data rate of up to 58 bytes per second of the theoretical 92 bytes per second available.

Thus, while it is a good step forward, it isn't enough to completely solve the problem.  We still will need to send some pure data packets to make full use of the bandwidth.  Part of me also wonders if disabling data compression might help with the buffer overrun problem we were having previously, that was limiting the size of the data packets we could send between normal LBARD packets. 

But before we make further changes, we should time how long it is taking to send small bundles (without payloads in this case), so that we have a base line to compare against.  Having ~4 second latency in each direction instead of ~8 seconds, means that we will detect the end of bundle transmission on average 8 seconds sooner, which should provide a noticeable improvement.

Sending only the normal LBARD packets, without any extra data packets, and it fails, because of the previously reported problem with payload-less bundles.  I've now put a fix in for that, and that helps.

I'm also seeing another problem, which is when the receiver tries to send a Bundle Announce Response (BAR) to indicate to the sender that it has received the bundle, we hit a problem if the packet contains another piece of the same bundle.  That 2nd piece causes the receiver to emit an ack for that piece, which replaces the BAR in the transmission queue. As a result the sender never hears that the receiver has the bundle, and keeps sending pieces of it.  This is likely a key contributor to the highly variable receive times we have been seeing for bundles.

There are two fixes for this that should both be done (the first is sketched in code after this list):
1. ACKs should never replace BARs in the progress report TX queue.
2. We should simply ignore pieces of bundles we have marked as recently received, and never generate the ACKs for them, but instead produce BARs for them, to indicate that we have already received that bundle.
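A sketch of what fix (1) amounts to in the progress-report queue follows. The types and names are hypothetical rather than the real LBARD code; the point is only the precedence rule.

/* progress_report_sketch.c: never let a piece-ACK displace a BAR
   ("I already have this whole bundle") in the report queue. */
#include <string.h>

enum report_type { REPORT_NONE = 0, REPORT_ACK, REPORT_BAR };

struct report_slot {
  char bid_prefix[17];
  enum report_type type;
};

#define REPORT_SLOTS 8
static struct report_slot reports[REPORT_SLOTS];

void queue_report(const char *bid_prefix, enum report_type type)
{
  /* If this bundle already has a pending report, only upgrade it. */
  for (int i = 0; i < REPORT_SLOTS; i++) {
    if (reports[i].type != REPORT_NONE
        && strcmp(reports[i].bid_prefix, bid_prefix) == 0) {
      /* Fix 1: a BAR may replace an ACK, but an ACK must never replace
         a BAR, or the sender never learns that we have the bundle. */
      if (type == REPORT_BAR || reports[i].type != REPORT_BAR)
        reports[i].type = type;
      return;
    }
  }
  /* Otherwise take the first free slot. */
  for (int i = 0; i < REPORT_SLOTS; i++)
    if (reports[i].type == REPORT_NONE) {
      strncpy(reports[i].bid_prefix, bid_prefix, sizeof reports[i].bid_prefix - 1);
      reports[i].type = type;
      return;
    }
}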

With that fixed, along with some other tidy-ups, we are now ready to do the comparison again.  We will count the number of zero-length payload bundles (which thus involve sending about 200 -- 220 bytes of compressed bundle manifest, and all the house-keeping) that we receive in 1,000 seconds, and the total bytes transferred in that time, so that we can calculate the average bundle transfer time, and the goodput. 

Again, we will do this with no extra data packets, then with a 128 byte data packet after each normal LBARD packet, and then again with sending a data packet every 2 seconds, independently from the LBARD packets. We will average for both directions over our test between the two radios.

With 0 byte payloads, i.e., just manifests:

No data packets:
A -> B: 16.66 bytes per second, 13 bundles transferred
B -> A: 16.93 bytes per second, 13 bundles transferred

Sending 1 x 128 byte data packet after each normal LBARD packet:
A -> B: 25.74 bytes per second, 19 bundles transferred
B -> A: 25.26 bytes per second, 21 bundles transferred

Sending 1 x 256 byte data packet after each normal LBARD packet:
A -> B: 33.22 bytes per second, 26 bundles transferred
B -> A: 33.06 bytes per second, 33 bundles transferred

Sending 1 x 256 byte (including headers) data packet every 2 seconds, as well as normal LBARD packets:
A -> B: Modems hang up after a while, possibly due to data overrun, as bytes per second climbs to ~90 bytes per second.
B -> A: Modems hang up after a while, possibly due to data overrun, as bytes per second climbs to ~90 bytes per second.
(This makes sense, as ~90 bytes per second is essentially the whole effective channel capacity in each direction, leaving the modems no headroom.)

Sending 1 x 200 byte (including headers) data packet every 3 seconds, as well as normal LBARD packets:
A -> B: 48.13 bytes per second, 29 bundles transferred
B -> A: 48.18 bytes per second, 26 bundles transferred

Sending 1 x 256 byte (including headers) data packet every 3 seconds, as well as normal LBARD packets:
A -> B: 78.37 bytes per second, 23 bundles transferred
B -> A: 80.62 bytes per second, 21 bundles transferred

Sending 1 x 200 byte (including headers) data packet every 2 seconds, as well as normal LBARD packets:
A -> B: 68.59 bytes per second, 23 bundles transferred
B -> A: 68.86 bytes per second, 23 bundles transferred

Note that just because we allow a 256 byte data packet, doesn't mean that one will get sent: The actual packet sent might be quite a bit smaller, if there isn't enough data for the bundle currently being sent to stuff in.

We also see that in some cases, the bytes per second is higher, while the number of bundles transferred does not increase, or in fact goes down.  This will most likely be due to the re-transmission of the same data blocks over and over, while waiting for acknowledgement of reception of the whole bundle.  Indeed, switching from 256 byte to 200 byte data packets with the same 3 second interval, we see no decrease in bundles transferred, although the number of bytes sent per second reduces.

Once we move to bundles with non-zero-length payloads, this effect should reverse. Another effect that is likely at play, is that if we send lots of data blocks between the LBARD packets, then we will increase the round-trip time of the LBARD packets, which will slow acknowledgement of bundles, and thus reduce the number of bundles that can be received per unit time.

Another factor that must be taken into account here, is that the first bundle transfer does not typically take place until around 120 -- 180 seconds into the 1,000 second test. Thus once actively communicating, the actual time per bundle will be lower than suggested here.

So now to repeat those tests with non-zero payloads for the bundles... Which showed up a bug in payload handling, which was quick to fix. More critically, it showed up an issue where receivers sometimes don't recognise packets as being addressed to them.  This means that when the other party sends them pieces of a bundle that they have already received, they don't tell the sender that they already have it.

The mechanism that is playing up stuffs each message containing a piece of a bundle with the first two bytes of the SID of the receiver.  This works fine for some time, but after a while, the receiver finds garbage in this field.  What is not clear is whether it is a data handling bug, that corrupts it on the receive side or during assembly on the transmit side, or whether the field that contains a peer's SID on the sender side is getting corrupted.  This takes of the order of 15 minutes to occur.

LBARD maintains both  ASCII-hex and binary representations of each peer.  If these end up with different values, then we can assume that memory corruption has occurred. I'll add instrumentation that will help me to see if that is what is going on.

Interestingly, the data that overwrites the binary representation seems to often begin with $47, suggesting that some piece of code might be putting stuff there by mistake.  The "Generation ID" message type does put a $47 followed by 4 random bytes.   This message type also contains no bounds checking -- it just assumes it has been given a buffer with enough space in it.  However, it is only called at the start of the packet assembly process, when it can be safely assumed that there is enough space in the buffer to fit the message.

A few more runs confirm that it is always $47 in the first byte, so almost certainly something is screwing up with the Generation ID messages.  I'll add some more instrumentation to see what the new generation ID is each time one is received, and if it matches what I am seeing in the packet, then I will know that it's to blame.

Ok, while waiting for the run to hit the problem again, I've taken a closer look at the code that parses the generation ID messages, and discovered that it does indeed put the generation ID into the SID prefix field. I.e., a straight-forward bug.

This fixed the problem, and got everything going, but it still seems to be rather slower than it should be sometimes.  But investigating the remaining problems can go into the next blog post.

Sunday, July 19, 2020

LBARD over Codan HF modem and debugging LBARD sync problems

This week I am finally getting to working on the remaining work for supporting the Codan 3012 HF data modem in LBARD.  This was started a long time ago, but has stalled at various points along the way.

The purpose of using this data modem is that it comes as a software option in the Codan Envoy 2210 HF radios, and can in theory transfer close to 100x more data than the horrible ALE text message method that we had demonstrated a few years ago.  That said, we are still talking about a data link that in theory can do 2,400 bits per second. I say in theory, because that relies on good atmospheric conditions for the HF communications, and because the link is actually half-duplex with quite a noticeable turn-around time.  That is, the modem at one end sends a blob of data, stops transmitting, and begins listening. The modem at the other end receives the blob of data, switches to transmit, transmits its blob of data, stops transmitting and begins listening. This is a good thing, but not so helpful for our data throughput. Thus it takes perhaps a tenth of a second or more to do the turn-around.  Because this is all happening with radios that can transmit at 125W, it involves relays and various safe-guards in the TX to RX turn-around.

The modems seem to send a data blob lasting 2 seconds. So in every ~(2 + 0.1 + 2 + 0.1) seconds we get to send one blob of data.  Thus while the channel can carry 2,400 bits per second, we are limited to something below 1,200 bits per second from each end. We'll estimate this to be about 1,000 bits per second for ease of calculation.  This should give us an effective data rate of about 125 bytes per second in each direction.

This is pretty modest by modern internet standards, but it does correlate to about one text message per second. Of course, by the time we have overhead for routing, authentication and other house-keeping elements, a text message might be closer to 1KB, which would require about 8 seconds to send under ideal conditions.  This means that a link that is active 24/7 could send and receive about 86,400 seconds/day / 8 seconds = ~10,000 messages per day.  Even with just an hour a day this means that we should be able to send and receive about 400 messages. Given our goal is to connect small isolated island communities in the Pacific, and to provide backup text messaging capability during disasters, this sounds like a pretty reasonable capacity -- and certainly better than the 10 to 15 minutes it was taking to send an authenticated text message via the ALE text messaging channel!

The radio setup I have here is a pair of Codan 2210s kindly loaned to us by Codan themselves.  I have these hooked up to a laptop here via USB serial adapters.

The first step is to enable the data modems on the Envoy radios. If you have obtained these radios without enabling this feature, you will need to give Codan a call to purchase an unlock code.  You shouldn't need any hardware changes, which is handy, if you are already in the middle of nowhere, needing to setup communications.

Next, use the handset to select "Menu", "User data", "Peripherals" and "RFU GP port", and select "2.4kbps data modem". It is important to NOT select "3012 mode", unless you really are using an external modem, rather than the built-in one.  The menus are shown in the following figures:





The modems themselves have an interesting feature, where they report themselves when you connect to them via the USB serial port.  However, if you power the Envoy on with the USB already connected, this doesn't always occur, and sometimes I had problems getting anything out of the modem in that context. The solution is to disconnect the USB port from the computer and reconnect it. 

I'll have to think about a robust solution for that, as it can cause LBARD's auto-detection of the modem to fail, which would be unfortunate for automatic operation, which is what we want to achieve. It might be possible to do a software USB disconnect and reconnect, for example. Or there might be some magic option I need to set up on the radio.  Anyway, we'll come back to that later.
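For what it's worth, on Linux it should be possible to ask the kernel to reset the USB device, which may have the same effect as unplugging and replugging it. Here is a rough, untested sketch of that idea -- the device path is a placeholder that would need to be discovered (e.g. with lsusb) for the actual USB serial adapter:

#include <fcntl.h>
#include <linux/usbdevice_fs.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <unistd.h>

// Force a USB-level reset of the device at devpath, e.g. "/dev/bus/usb/001/004"
// (the path here is illustrative only).
int usb_reset(const char *devpath)
{
  int fd=open(devpath,O_WRONLY);
  if (fd<0) { perror("open"); return -1; }
  int r=ioctl(fd,USBDEVFS_RESET,0);
  if (r<0) perror("ioctl(USBDEVFS_RESET)");
  close(fd);
  return r;
}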

The modems need only a few simple commands to control them -- assuming the HF radios are sitting on the same channel. I haven't yet used the commands that the modems offer for selecting channels, although it should be quite possible.

The first thing to do is to give the modem a station ID. This is a bit like a phone number. It can have up to 6 digits, but for reasons I don't immediately recall, the last two digits cannot be 00.  This is set using the AT&I=number command, e.g.:

AT&I=2

would set the station ID to 2.  Once that has been set up, you can tell a modem to try to dial it using the good old ATD command, e.g.,

ATD2

would call a modem that had been configured as previously described. This will cause the standard RING, NO ANSWER, NO CARRIER, and CONNECTED messages to be displayed based on whether the remote end answers with ATA or not, or later disconnects with ATH0.  In short, Codan have done a great job of making this modem masquerade as an old-fashioned 2400 bps data modem.  This helps it be compatible with various existing software, and helps me, because I have previously had many adventures programming such modems.

The modems default to having compression enabled, although I don't know what kind of compression they have, nor how effective it is.  For now, I am just ignoring it, and leaving it in the default setting of enabled. 

Armed with the information on how to control the modem, I set about creating an LBARD driver for it. This consists mostly of an auto-detection routine, the state machine for managing the call states, and routines for sending and receiving packets.  This is all in the src/drivers/drv_codan3012.c file.

Initialisation just consists of sending a few AT commands to set the modem's HF station ID, and enable hardware flow-control:

int hfcodan3012_initialise(int serialfd)
{
  char cmd[1024];
  fprintf(stderr,"Initialising Codan HF 3012 modem with id '%s'...\n",hfselfid?hfselfid:"<not set>");
  snprintf(cmd,1024,"at&i=%s\r\n",hfselfid?hfselfid:"1");
  write_all(serialfd,cmd,strlen(cmd));
  fprintf(stderr,"Set HF station ID in modem to '%s'\n",hfselfid?hfselfid:"1");

  snprintf(cmd,1024,"at&k=3\r\n");
  write_all(serialfd,cmd,strlen(cmd));
  fprintf(stderr,"Enabling hardware flow control.\n");

  // Slow message rate, so that we don't have overruns all the time,
  // and so that we don't end up with lots of missed packets which messes with the
  // sync algorithm
  message_update_interval = 1000;

  return 0;
}


Flow control is a bit interesting to manage. The big problem is that we don't want the buffer of the modem to get too full, as this makes the round-trip time for control messages too long, which slows things down. For example, if the far end doesn't quickly confirm that a given bundle has been received, the sender will keep sending pieces of it. Thus we need to keep well away from hitting the flow control-imposed limit. The packet sending and receiving routines keep track of this by having sequence numbers that are then returned to the sender, so that the sender can get an idea of the number of packets in flight.

LBARD modulates packet flow not by the number of packets in flight, but by the average interval between packets.  As a first-cut approach, I set the packet interval dynamically based on the number of outstanding packets in flight: basically I set the packet interval to some multiple of the number of outstanding packets.  250ms per outstanding packet seems to work ok, with reasonable throughput balanced against not having too many unacknowledged packets wasting outgoing bandwidth with needless retransmissions.  I can refine that a bit more later.
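To make that concrete, here is a rough sketch of the idea -- the variable names are illustrative rather than LBARD's actual ones. The sequence numbers echoed back by the far end tell us roughly how many of our packets are still in flight, and the interval between packets is simply scaled by that count:

// Illustrative sketch: 250ms of TX interval per packet believed to be in flight.
int last_seq_sent=0;    // sequence number of the most recent packet we sent
int last_seq_echoed=0;  // highest of our sequence numbers echoed back by the peer

int current_packet_interval_ms(void)
{
  int outstanding = last_seq_sent - last_seq_echoed;
  if (outstanding < 1) outstanding = 1;
  return 250 * outstanding;   // 250ms per outstanding packet
}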

The state machine for the modem is also fairly simple:
 
  switch(hf_state) {
  case HF_DISCONNECTED:
    if (hfcallplan) {
      int n;
      char remoteid[1024];
      int f=sscanf(&hfcallplan[hfcallplan_pos],
                   "call %[^,]%n",
                   remoteid,&n);
      if (f==1) {
        hfcallplan_pos+=n;
        fprintf(stderr,"Calling station '%s'\n",remoteid);
        char cmd[1024];
        snprintf(cmd,1024,"atd%s\r\n",remoteid);
        write_all(serialfd,cmd,strlen(cmd));
        call_timeout=time(0)+300;
        hf_state=HF_CALLREQUESTED;
     } else {
        fprintf(stderr," remoteid='%s', n=%d, f=%d\n",remoteid,n,f);
      }
     
      while (hfcallplan[hfcallplan_pos]==',') hfcallplan_pos++;
      if (!hfcallplan[hfcallplan_pos]) hfcallplan_pos=0;
    }   
    break;
  case HF_CALLREQUESTED:
    if (time(0)>=call_timeout) hf_state=HF_DISCONNECTED;
    break;
  case HF_ANSWERCALL:
    write_all(serialfd,"ata\r\n",5);
    hf_state=HF_CONNECTING;
    call_timeout=time(0)+300;
    break;
  case HF_CONNECTING:
    // wait for CONNECT or NO CARRIER message
    break;
  case HF_DATALINK:
    // Modem is connected
    write_all(serialfd,"ato\r\n",5);
    hf_state=HF_DATAONLINE;
    call_timeout=time(0)+120;
    break;
  case HF_DATAONLINE:
    // Mode is online.  Do nothing but indicate that the modem is
    // ready to process packets
    if (time(0)>call_timeout) {
      // Nothing for too long, so hang up
      sleep(2);
      write_all(serialfd,"+++",3);
      sleep(2);
      write_all(serialfd,"ath0\r\n",5);
      hf_state=HF_DISCONNECTED;
    }
    break;
  case HF_DISCONNECTING:
    break;
  default:
    break;
  }
 
In short, we make a call if we have been instructed to, and are not currently on a call.  Otherwise we stay on the call unless we get no data for too long, in which case we hang up, and repeat the process.

With the above and the necessary glue, I was able to get packets flowing between the two modems fairly easily.  But when I started trying to transfer bundles, I started seeing the protocol get confused with both sides trying to send each other the same bundle endlessly.  This should never happen, as when one end realises that the other has a given bundle, it should stop trying to send it, because it has pretty clear evidence that the other end has it already. 

I'd seen this problem before in the field with UHF radios in the Mesh Extenders, but it had stubbornly refused to be reproduced reliably in the lab.  So while it is on the one hand annoying, it's actually a great relief to have found the problem, and to be able to progressively work on solving it.

It turns out to be caused by a number of things:

First, the TreeSync algorithm seems to not realise when the other end receives a bundle, even though it should.  That module is quite fancy and efficient at doing its job, and I don't understand it well enough to figure out what subtle problem it is suffering from.  The mechanism by which this causes the lockup took a little while to work out, but it made a lot of sense once I realised it:  once both sides have, as their highest priority bundle for the other, a bundle that the other side actually has, and that trips this TreeSync problem, then they will continue to try to send it forever.

It self-corrects eventually, but only briefly, because when it produces a new generation ID, the tree is re-built, which allows it to send the next few bundles before locking again.  To work around this, I now maintain a list of bundles that we have seen the other end try to send.  If they try to send a bundle, then they must by definition have it.  Therefore we should never try to send that bundle to them again.  This works to stop the TreeSync algorithm from re-scheduling the same bundle over and over.
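A minimal sketch of that work-around looks something like the following -- the structure and function names are illustrative, not LBARD's actual ones. The idea is simply to remember which bundles we have seen a peer transmit, and to consult that list before (re)queueing anything to them:

#include <stdio.h>
#include <strings.h>

// Illustrative sketch only: remember which bundles we have seen a peer send,
// and never queue those bundles to that peer again.
#define MAX_SEEN_BUNDLES 64

struct seen_bundles {
  char bid_prefix[MAX_SEEN_BUNDLES][9];  // first 8 hex digits of each bundle ID
  int count;
};

void note_peer_sent_bundle(struct seen_bundles *s, const char *bid)
{
  for (int i=0;i<s->count;i++)
    if (!strncasecmp(s->bid_prefix[i],bid,8)) return;  // already recorded
  if (s->count<MAX_SEEN_BUNDLES)
    snprintf(s->bid_prefix[s->count++],9,"%s",bid);
}

int peer_must_have_bundle(struct seen_bundles *s, const char *bid)
{
  // If we have seen the peer send any piece of this bundle, they have it,
  // so there is no point in us ever sending it to them.
  for (int i=0;i<s->count;i++)
    if (!strncasecmp(s->bid_prefix[i],bid,8)) return 1;
  return 0;
}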

That fixed the protocol lock-up.  Then I was seeing that the scheduling of which data blocks to send was being completely random.  Now, it is supposed to be rather random, so that if multiple senders are available for a given bundle, a single receiver can receive pieces from all of them, and thus receive the bundle as a whole sooner. 

The problem was that the system for keeping track of which blocks we think the recipient might already have was not being activated, because it was getting confused about which bundle we thought we were sending them, or should be sending them.  As a result very small bundles consisting of only a few blocks would send in reasonable time, as the probability of all their blocks being sent would approach 1 very quickly. But larger bundles consisting of more than a dozen or so blocks would be extremely slow to send, because it was basically pot luck waiting for the last few holes to get filled in.

This required a bit of fiddling about in a few different places to pick up all the corner cases of when we should reset it, and making sure that we clear the bitmap of sent blocks and other minor details. I also fixed the situation where we see another party sending a bundle to someone, and we haven't yet worked out what to send to them.  In that case, we now join in sending the same bundle, so that the receiver can hopefully receive the whole bundle that bit quicker. 

These multi-way transfer optimisations are not really applicable to the HF radios, as the modems are designed for point-to-point, but it made sense to implement them, anyway. Also, down the track, it would be possible for us to look at ways to make a radio listen to multiple HF radio channels in parallel, and receive all the content being delivered on them.  This promiscuous mode of listening could greatly increase the effective capacity of the system, and basically is just a sensible use of the true broadcast nature of radio.

Back to the bitmap sending: even after I got it working, it was still not particularly efficient, as even when there were relatively few pieces left to send, it would still choose them at random, potentially resending the same block many times until the next bitmap update came through.  As the round-trip latency is typically 10 - 20 seconds by the time all the modem buffering is done, this can add significant delays through redundant sending. Also, it can get stuck sending manifest pieces for a long time, which again can be quite wasteful, as the manifest is typically less than 300 bytes, and will keep getting resent until the bitmap update comes through indicating that the manifest has been fully received.

What would be better is to keep track of the set of pieces that are outstanding at any point in time, randomise their order, send them in that randomised order, and then start over.  That way we avoid redundant sending.  The order can be randomised afresh each time we get a bitmap update.
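A sketch of that scheme, assuming we have a bitmap telling us which 64-byte blocks the receiver is still missing (again, illustrative names rather than the real code):

#include <stdlib.h>

// Illustrative sketch: build a randomised send order over the outstanding
// blocks, so that no block gets re-sent before every outstanding block has
// been sent once in the current round.
int build_send_order(const unsigned char *missing_bitmap, int num_blocks,
                     int *order_out)
{
  int n=0;
  for (int b=0;b<num_blocks;b++)
    if (missing_bitmap[b/8]&(1<<(b&7))) order_out[n++]=b;  // still outstanding

  // Fisher-Yates shuffle of the outstanding block numbers
  for (int i=n-1;i>0;i--) {
    int j=random()%(i+1);
    int t=order_out[i]; order_out[i]=order_out[j]; order_out[j]=t;
  }
  return n;  // how many blocks to send this round
}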

The first implementation of this seems to work fairly well: it sends the blocks in random order. It's still not as fast as I would like, averaging only about 20 bytes per second in each direction.  This will be partly because I am sending 64 byte blocks at random, but the packets can hold 128 or sometimes even 192 bytes. But if there are no contiguous outstanding blocks, then it will end up being very inefficient.  I'll likely fix that by allowing a message type that can contain more than one piece of the same bundle.  Actually, I figured out an easier quick fix: consider all odd-numbered blocks to have been sent once already, so that we try to send all the even-numbered ones at least once.
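That quick fix amounts to something like the following illustrative sketch; the benefit, presumably, is that starting each piece on an even-numbered block leaves contiguous pairs that the larger 128 and 192 byte packets can pick up:

// Illustrative sketch of the quick fix: pretend all odd-numbered blocks have
// already been sent once, so the scheduler works through the even-numbered
// blocks first.
void prefer_even_numbered_blocks(unsigned char *sent_bitmap, int num_blocks)
{
  for (int b=1;b<num_blocks;b+=2)
    sent_bitmap[b/8] |= (1<<(b&7));
}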


The bigger problem, though, is that the indication of when the other end has received a bundle is not working properly, and so we are back to our protocol lock-up situation.

So looking at a particular run when this is occurring, we have LBARD2 and LBARD8 talking to each other.

They begin by exchanging two small bundles with each other, before moving on to transfer bundles of several KB. 

LBARD2 receives bundle 4B47* (4870 bytes long) from LBARD8. LBARD8 receives C884* (4850 bytes long) from LBARD2. 

So both now have C884*.  The problem is that both think that the other doesn't have it. As C884* is smaller than 4B47*, it has a higher TX priority, and thus both sides keep trying to send it to the other.

LBARD8 received C884* at 11:42.38.   By 11:42:47, LBARD8 has received confirmation that the bundle was inserted into the Rhizome database, and thus knows definitively that it has the bundle.

Even though it knows that it has this bundle, it continues to accept receiving the bundle afresh from LBARD2.

Part of the problem is that LBARD8 doesn't immediately note that LBARD2 has the bundle.  It should do this, whenever it sees LBARD2 sending a piece of the bundle. 

In fact, there is no indication whatsoever that LBARD8 is receiving the pieces of the bundle after it has received it the first time.  I'll need to re-run with more debugging enabled.  Perhaps part of LBARD is noticing that it already has the bundle, and is thus ignoring the messages, but without scheduling the indication to the other end that it already has the bundle.  But whether or not that is the case, the receiver shouldn't be willing to start receiving the same bundle afresh once it knows that it already has it. 

Each run has randomised bundle IDs, which is a bit annoying.  So I have to trawl through the logs again to see what is going on.

So this time, LBARD2 receives bundle D386* from LBARD8 at 12:18.26.

Interestingly, the "recent senders" information is correctly noting that LBARD8 has sent a piece of this bundle.  So I can make use of that to update the other structures, if necessary.

At the point where LBARD2 receives D386*, it is sending bundle 3A7C* to LBARD8.
Ah, I have just spotted that at the point where we realise that the other party has the bundle, instead of dequeueing the bundle from their TX queue, we are actually queueing it to them.  That should be fixed. This is also the point at which we should mark that we know that that peer has the bundle already.  So we will try that in the next run.  Simultaneously I have added some instrumentation for the "prioritise even over odd" optimisation, as it doesn't seem to be doing exactly what it should.

Fixed that, and some other related bugs. It now does a fairly reasonable job of scheduling, although there is still room for improvement. The big problem is that it is still sitting at around 20 bytes per second of effective throughput. 

Now, this isn't as bad as it sounds, as that includes the overhead of negotiating what needs to be sent, and packetising it all, so that if the modems do lose packets, we don't lose anything. It also reflects the reality that we are only sending one packet in each direction every 3 seconds or so.  With an effective MTU of 220 bytes, we only end up fitting one or two 64 byte blocks of data in each.  Thus our potential throughput is limited to about 64 or 128 bytes every 3 seconds = ~20 -- 40 bytes per second. 

So at the current state of affairs, we have something that is at least 10x faster than the old ALE messaging based transport -- and that is bidirectional throughput. We are seeing message delivery latencies for short messages that are about 30 to 50 seconds.  This compares very favourably to the ~3 minutes that sending a bundle was taking with the ALE transport.  But it still isn't as good as we would like or, I think, as good as we can achieve.  We know that we should have something like 125 bytes per second of bandwidth in each direction.  Even with the suboptimal scheduling of pieces to send, we should be able to achieve at least half of that figure.

The first port of call is to try to optimise the packet sending rate, as it seems that one packet of ~260 bytes in each direction every ~3 seconds or so is probably not quite filling the 2,400bps channel, even allowing for the change-over.  Well, that sounded like a great theory. Looking closer, the packet rate is closer to one every 2 seconds, not every 3 seconds.  2400bps / 2 = 1200bps = 150 bytes per second raw rate in each direction, before subtracting a bit for turn-around.  Thus sending a 255 byte packet every 2 seconds = ~128 bytes per second, which is pretty much spot on the channel capacity.

This means we will have to look at other options.  The most obvious is to send large slabs of bundles, without all the other protocol fluff.  This is only really possible in this context, because the modem link is point-to-point, so we don't have to worry about any trade-offs. In fact, the whole packet format and message types are really a bit of an over-kill for this configuration.  But short of re-writing the whole thing, we can instead get some substantial gain by just sending a slab of useful bundle data after every normal LBARD packet.  This will get us half of the channel use at close to 100% goodput, and thus bring the average up to >50%.   If we send large contiguous slabs, then that will also reduce the opportunity for the block scheduling to get too much wrong, so it may well be even more effective. 

The modem has no natural packet size limit, so we can even send more than 256 bytes at a time -- if we trust ourselves to the modem's error correction to get the data to the other end intact. If errors creep in, and we don't have a means of detecting them, it will be really bad. The current error correcting code works on blocks of 232 bytes. We don't want to send huge amounts, though, as we want to maintain the agility to respond to changing bundle mixes. 

The largest whole number of 64 byte blocks we can thus fit is three, i.e. 192 bytes.  256 bytes would have been nicer, so that it would be an even number of blocks. That's a bit annoying, so I'll have to make a different type of packet that can carry up to 256 bytes of raw data. It will assume that the data is for the currently selected bundle, and thus only have a 4 byte offset and 2 byte length indicator, plus the escaping of the packet termination character. Thus it should have an average overhead of less than 10 bytes, and thus be better than 90% efficient.
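Sketching what such a message type might look like -- the terminator and escape values here are placeholders, not the actual LBARD framing:

// Illustrative sketch: a pure-data message for the currently selected bundle,
// carrying a 4-byte offset, a 2-byte length, and up to 256 bytes of payload.
// The terminator and escape bytes are placeholders for the real framing.
#define PKT_TERMINATOR 0x7e
#define PKT_ESCAPE     0x7d

static int emit_escaped(unsigned char c, unsigned char *out)
{
  if (c==PKT_TERMINATOR||c==PKT_ESCAPE) { out[0]=PKT_ESCAPE; out[1]=c^0x20; return 2; }
  out[0]=c; return 1;
}

int encode_raw_data_piece(unsigned int offset, const unsigned char *data, int len,
                          unsigned char *out)
{
  int o=0;
  unsigned char hdr[6]={ offset&0xff,(offset>>8)&0xff,(offset>>16)&0xff,(offset>>24)&0xff,
                         len&0xff,(len>>8)&0xff };
  for (int i=0;i<6;i++) o+=emit_escaped(hdr[i],&out[o]);
  for (int i=0;i<len;i++) o+=emit_escaped(data[i],&out[o]);
  out[o++]=PKT_TERMINATOR;  // end-of-packet marker
  return o;  // worst case is 2*(len+6)+1 bytes, but typically under len+10
}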

Well, the next challenge was that some bytes were getting lost, as though buffer overflow was occurring.  However, as I have hardware flow-control enabled, I'm not sure of the cause here.  My work-around is to send only 192 bytes after all, and to make sure I throttle back the packet sending rate, to try to keep the data flowing at a reasonable rate without overwhelming the buffers.  This is a bit worrying, as in real life it is quite likely that we will get data rates below 2,400bps on the HF links quite often. 

Reducing the data packet size to 192 bytes seems to have fixed the buffer overflow problem. Also, I am now seeing the larger bundles go through more quickly.  However, I am still seeing the problem where the same bundle is received more than once.  Presumably it starts being received a second time before we get confirmation from the Rhizome database that it has been inserted. 

The solution would be to filter the receive list whenever we notice a bundle has been added to the Rhizome database.  That way such duplicate transmission of the same bundle should get suppressed. Examining the log file of when this happened, it turns out that the problem was actually that the bundle was corrupted during reception.  This is possible if the bundle being transferred changes part way through, as our pure data packets lack bundle identifying information to save space. However, it looks like we will need it after all, to avoid problems with inferring which bundle is being sent by the remote end.

There is also a pattern where some pieces of a bundle are failing to get sent in the data packets. I'm suspecting that this is due to escaping of characters causing the encoded size of the data packet to become too large.  So I might need to trim the packet size again.  (This is all part of why I don't like it when systems try to hide packetisation behind a pretend stream interface.  You end up having to put framing back in the stream, which adds overhead, and there is no way to make sure your packets actually end on the underlying packet boundaries which increases latency. But I understand why people implement them.)

So I guess I'll have to drop back to 128 bytes per data packet, as well as including the bundle information.

Ok, that has helped quite a lot.  It is now delivering the packets more reliably.  But the throughput is still pretty horrible, with good-put rates still somewhere around 25 bytes per second.  I suspect the modems are really not designed for this kind of two-way sustained communications, but rather are optimised for mostly uni-directional transfers. 

If I can get hardware flow-control working, I should be able to improve things by using larger packet sizes.  But I think its also about time I actually wrote a simple test programme that measures the actual bidirectional throughput that these modems can deliver.  That way I know whether it is my protocol side of things that is the problem, or whether I really am hitting the limit of what these modems can achieve. 
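Such a test programme could be as simple as the following sketch (the device path is a placeholder, and it ignores proper serial port configuration): each end offers a steady ~128 bytes per second and reports how many bytes per second actually arrive.

#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

// Illustrative bidirectional throughput test: offer ~128 bytes/sec of data and
// report the received rate every 10 seconds.  Run one instance on each end.
int main(int argc,char **argv)
{
  int fd=open(argc>1?argv[1]:"/dev/ttyUSB0",O_RDWR|O_NONBLOCK);  // path is a placeholder
  if (fd<0) { perror("open"); return 1; }

  unsigned char txbuf[128], rxbuf[256];
  memset(txbuf,'U',sizeof txbuf);      // arbitrary test pattern
  long rx_bytes=0;
  time_t last_tx=0, last_report=time(0);

  while(1) {
    if (time(0)!=last_tx) {            // send one 128-byte chunk per second
      write(fd,txbuf,sizeof txbuf);
      last_tx=time(0);
    }
    int n=read(fd,rxbuf,sizeof rxbuf);
    if (n>0) rx_bytes+=n;
    if (time(0)-last_report>=10) {
      printf("RX rate: %.1f bytes/sec\n",rx_bytes/10.0);
      rx_bytes=0; last_report=time(0);
    }
    usleep(20000);                     // don't busy-wait
  }
  return 0;
}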

But that will all have to wait for the next blog post.

Thursday, January 9, 2020

It's 2020 and Australia is burning

It almost doesn't need saying, but Australia is burning.  Lives have been lost. Houses destroyed. Livestock, farms and livelihood all gone up in smoke, quite literally.  I can't even imagine the pain and distress that this is all causing for so many. But what I can do, is try to do something about the technology gaps that are making a bad situation even worse for those affected.

As readers of this blog will know, I have been developing resilient telecommunications solutions for the last decade or so.  The best known of those is the Serval Project, which is a combination of software for smart phones that can form mesh networks, and small low-cost communications repeaters that we call Serval Mesh Extenders. Here is a brief introduction to what, and why, we are making the Serval Mesh:


 And to go right back in history to almost the beginning, here is the original motivation of the project:


And for another blast from the past, here is the original field test call from back in 2010, at Arkaroola in the Outback, which was also covered by the ABC:



Here's another video about the Mesh Extenders that we made back in 2013, when we were first developing them:


Since then, we now have much more mature hardware, and have tested the hardware for long-term operation in the field in a coastal area of Vanuatu, in large part thanks to a grant from DFAT under the Pacific Humanitarian Challenge a few years ago.

While the Serval Mesh was originally developed with disaster situations abroad in mind, it was always also made with more local situations in mind, for example, following cyclones or in very remote areas lacking phone coverage.  For example, one variant of the technology that was worked on for a while was the concept of an emergency network that uses vehicles as the main component, because they are the most ubiquitous infrastructure once you get out of the big cities.  Here is a piece that we produced with Toyota on this concept:

That video is also a good general introduction to the potential for the technology.

Basically, in any situation where cellular or other normal communications infrastructure is missing, damaged or disabled for any reason, the Serval Mesh lets local communities form their own local-area digital communications network.

There are also some branches to this work, for example, creating mobile phones that include the Mesh Extender functionality internally, and are designed to be fully "self-sovereign," that is independent of all power, communications and other infrastructure -- and that can offer security and privacy at least as good as the present state of the art.

So where are we at now?

Basically we have proven all the various parts of the technology that we need to make this a reality. What we need to do now, and our plan for 2020 -- made even before the fires started -- is to focus time and energy on shaking down the last wrinkles in the system, so that it is ready for deployment by communities in the field. 

This includes revising the Mesh Extender circuit board, and fixing some known problems in the software, and then testing with larger networks of dozens to hundreds of units, that would more accurately reflect the real use of the Serval Mesh by communities in the field.

Flinders University has kindly granted me a "sabbatical" year to focus on this.  I'll be based at Arkaroola to do this, where we did the original field testing, so that we can use the vast rugged landscape there, including over 600 square kilometres of mountainous desert, to be able to deploy realistic test scenarios, and work on this scaling.

Our goal is for the Serval Mesh to be ready, by the end of this year, for the individuals and communities who want to use it. 

Achieving this will depend on a lot of factors, including the ever present problem of having sufficient funds for the equipment that would allow us to work more quickly, and scale up our tests more meaningfully.  It also doesn't answer the question of how we support communities in their use of the technology once it is finished, or who will offer it for sale.  But those are issues that we can think further through during the coming year.

But one thing is crystal clear to me:  We each need to consider what we can do to mitigate the effects of the fires, and to adapt to a future where such events are more and more likely, and to do what we can to mitigate this threat. Whether we can see the whole solution or not, is to me secondary: We must simply make sure that we do what we can now.  And that is what I am going to do with 2020.