Monday, September 7, 2020

More work on LBARD over Codan HF Data Modem

In the last post, we got to the point where we had Rhizome bundles flowing over the modem, but not as fast as we would like.  Also, we were still seeing some problems where bundle sending would get stuck.  I've been away for a couple of weeks, but now am back at the bench with the HF radios, and can investigate further.

The first thing I realised that I should investigate, is what the actual throughput of the modems is, so that I can determine what fraction of the theoretical throughput we have available.

To do this, I instrumented LBARD to count the number of bytes, and divide that by the number of seconds since the first byte was received.  This yielded results of about 91 bytes per second in each direction.  Given the channel's theoretical capacity of 2,400 bits / sec = 300 bytes / sec, this is noticeably lower than expected.  It means that we are only actually transmitting data (91+91)/300 = ~60% of the time.
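
The instrumentation amounts to something like the following minimal sketch (the names are illustrative, not the actual LBARD symbols):

    #include <stdio.h>
    #include <time.h>

    static long long rx_bytes = 0;
    static time_t rx_first_byte_time = 0;

    /* Call this whenever n bytes arrive from the modem. */
    void note_bytes_received(int n)
    {
      if (!rx_first_byte_time)
        rx_first_byte_time = time(NULL);
      rx_bytes += n;
      time_t elapsed = time(NULL) - rx_first_byte_time;
      if (elapsed > 0)
        fprintf(stderr, "RX: %.1f bytes/sec (%lld bytes in %lld sec)\n",
                (double)rx_bytes / elapsed, rx_bytes, (long long)elapsed);
    }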

Now, if we were just transferring in one direction, instead of switching all the time, we could surely improve that, but we know that there is less than a factor of 2 available through such optimisation.  So I'm not too worried about that right now.

What is more interesting is that we should be able to get near to 90 bytes per second, but are still languishing down around the 20 bytes per second mark due to protocol issues. That is, we have a factor of four that we can recover through protocol fixes alone, without having to increase the channel capacity or change the channel-usage strategy at all.

Having spoken with one of my students who is working on a debug framework for LBARD, it sounds to me like the packet bitmap that is used for acknowledgement, and for scheduling the parts of a bundle that the receiver doesn't already have, is not working properly.  If that is the case, it would be causing the sending side to send parts of the bundle that the receiver already has, effectively wasting bandwidth.
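
The idea of the bitmap is roughly as follows. This is a minimal sketch with hypothetical names, assuming 64-byte blocks and a bounded tracked region (bounds checks omitted); the real LBARD structure is more involved, and tracks the final fractional block separately:

    #include <stdint.h>

    #define BLOCK_SIZE 64
    #define MAX_BLOCKS 256          /* enough for a 16KB tracked region */

    typedef struct {
      uint8_t bits[MAX_BLOCKS / 8]; /* one bit per 64-byte block */
    } block_bitmap;

    /* Mark every block fully covered by the received byte range. */
    void bitmap_mark_received(block_bitmap *b, int first_byte, int length)
    {
      int blk = (first_byte + BLOCK_SIZE - 1) / BLOCK_SIZE; /* first whole block */
      for (; (blk + 1) * BLOCK_SIZE <= first_byte + length; blk++)
        b->bits[blk / 8] |= 1 << (blk % 8);
    }

    /* The sender should only schedule blocks for which this returns non-zero. */
    int bitmap_block_needed(const block_bitmap *b, int blk)
    {
      return !(b->bits[blk / 8] & (1 << (blk % 8)));
    }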

Looking through logs of previous runs, I can see one problem already: We schedule sending a bundle at the end of the file, even if we have already sent to the end of the file.  That is, we send 0 bytes of data, positioned at the end of the bundle payload.  This is clearly silly, and should be fixed. What we need to watch out for, though, is that we use end-of-payload packets to implicitly indicate the payload length in LBARD transfers. 

We are also seeing the same 4 data bytes being sent quite a number of times before the missing parts of the manifest come through.  That re-sending needs to be fixed. It also occurred to me that if we randomly select a start point in the last 64 bytes of the manifest or payload, we will send fewer than 64 bytes, even though we typically have space in the packet for 128 or 192 bytes.  Eliminating such very low-efficiency packets has the potential to greatly improve latency for small bundles.  Together with preventing the re-transmissions mentioned above, this should be able to get the latency of a sub-1KB bundle down to perhaps 15 seconds, which would be a great improvement.
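
The fix I have in mind for the start-point selection looks roughly like this (a sketch with hypothetical names, assuming blocks are aligned to 64-byte boundaries within the manifest or payload):

    /* Given that we've decided to send the tail of an object, compute the
       start offset so that as many whole 64-byte blocks as will fit in the
       packet are included ahead of the final fractional block. Assumes
       object_length is not an exact multiple of 64 (see the zero-byte
       end-of-payload issue above). */
    int tail_send_start(int object_length, int space_in_packet)
    {
      int start = (object_length / 64) * 64; /* start of final fractional block */
      while (start >= 64 && (object_length - (start - 64)) <= space_in_packet)
        start -= 64;                         /* prepend another whole block */
      return start;
    }

For example, with 192 bytes of space in the packet and a 1,000-byte manifest, this sends bytes 832 - 999 (168 bytes) in one packet, rather than just the final 40 bytes.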

I'll try these improvements, together with some tweaking of debug output, so that I can more easily follow when particular bundle pieces are sent, and what the list of candidate blocks at a given packet transmission are.

This should have been a simple re-run, but then I hit a segfault caused by something unrelated, and when running it in gdb all sorts of strange things happened, causing one of the instances to not properly switch to the online data mode when the modems connect.  Well, I found what caused the whole crazy problem to start: when I copy-pasted the arguments into gdb, I left the "1" off the front of "127.0.0.1", thus causing long delays while waiting for "27.0.0.1" to resolve, which caused the protocol to time out in various exciting ways.  So let's try that all again...

Ok. Now it is running again, after fixing that.  I also found and fixed some other bugs.  It is now transferring small bundles in typically 20 - 50 seconds.  This could be quicker, but it is a start.  However, I am seeing instances where a bundle is received, but then fails to insert into the Rhizome database with an error 422. This means that either the manifest is too big, or that the payload doesn't match the hash.  The bundle is then received a second time, and gets inserted. This means that there is some problem with the integrity of the transfer.

My first suspicion was that the pure data packets I implemented recently might be the problem.  But in this case, no such packets had been sent at all for the bundle in question.  As the packets are in any case protected by Reed-Solomon error correction, this would seem to suggest that the contents of the packets were being incorrectly assembled.

Actually, it turns out that this happens fairly often.  This is good, in that it probably explains a lot of the poor transfer performance: Some bundles are having to be sent two or more times, before actually being accepted. So now to find out why.

It looks like decompression of the bundle manifests fails, resulting in the end of the manifest being messed up, and not passing its cryptographic self-consistency checks.  Since it is the end of the manifest that gets messed up, I am guessing that, somehow or other, either the length of the manifest or the last partial 64-byte block of manifest data is not being properly received.  It should hopefully not be too hard to find, as it is happening quite frequently, with probably 20% of bundle transfers being affected.

To track it down, I'll make both sender and recipient parties display the compressed manifest data, so that I can play spot-the-differences.

So looking at the compressed manifest on reception, it is the right size, but has all zeroes in the last data blocks. The problem was that when I made the optimisation to send not just the last fractional block, but as many of the 64-byte blocks before it as would fit, I hadn't adjusted the start point from which the packet data was read, so the extra bytes were missing.  So time for another run, to see how it goes.
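
In essence, the bug looked like the following (an illustrative reconstruction, not the actual LBARD source):

    #include <string.h>

    /* The start offset had been pulled back to include extra whole blocks,
       but the copy still read from the original offset, so it ran past the
       end of the object and the tail of the packet was filled with zeroes. */
    void fill_packet(unsigned char *packet_body,
                     const unsigned char *object_data,
                     int object_length, int original_start, int adjusted_start)
    {
      int bytes_to_send = object_length - adjusted_start;
      /* Buggy version read from the original, unadjusted offset:
         memcpy(packet_body, object_data + original_start, bytes_to_send); */
      memcpy(packet_body, object_data + adjusted_start, bytes_to_send);
    }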

Right. So now we don't seem to have the problem with the code 422s, which is good.  However, we have a problem where we sometimes keep receiving a bundle repeatedly after it has been received and registered.  This seems to be because we don't get a confirmation message from servald that the bundle is now in our Rhizome database.  As a result, we don't cull the duplicate reception of the bundle. It also means that we can't immediately start sending the bundle to other peers.

This isn't such an issue for the HF modem driver, because we don't have other peers to talk to. However, for Mesh Extenders with the UHF packet radios, this is a big problem, as bundles won't get forwarded on to other peers until this occurs.  Thus we need to find and fix this problem.

LBARD communicates with servald via servald's HTTP RESTful interface.  In particular, it listens for new bundles via an asynchronous API, which means that we should get an indication from servald essentially immediately upon inserting a bundle.  So I presume something fishy is going on there.  To investigate, I have instrumented the code that communicates with servald, to find out where the problem lies.
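
For context, the asynchronous notification works by long-polling servald's Rhizome "newsince" REST end-point, which streams newly inserted bundles as they appear. A minimal sketch of such a poll using libcurl follows; the port, credentials and token handling are assumptions for illustration, and real code would also install a write callback to parse the streamed JSON:

    #include <stdio.h>
    #include <curl/curl.h>

    /* Long-poll servald for bundles newer than the supplied token. */
    int poll_new_bundles(const char *token)
    {
      char url[256];
      snprintf(url, sizeof url,
               "http://127.0.0.1:4110/restful/rhizome/newsince/%s/bundlelist.json",
               token);
      CURL *curl = curl_easy_init();
      if (!curl)
        return -1;
      curl_easy_setopt(curl, CURLOPT_URL, url);
      curl_easy_setopt(curl, CURLOPT_USERPWD, "lbard:hunter2"); /* hypothetical */
      CURLcode res = curl_easy_perform(curl); /* blocks, streaming results */
      curl_easy_cleanup(curl);
      return res == CURLE_OK ? 0 : -1;
    }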

While looking at the logs for these delayed bundle notifications, I am also seeing that some bundle transfers are slow, with one end not sending any data packets at all to the other for 10 -- 20 seconds at a time. This should never happen, especially once the initial synchronisation has occurred, which lets each end know what the other needs.  In the current case, I am seeing a delay of 24 seconds between successive pieces of a bundle being sent.  So we know that the remote end thinks it has a bundle worth sending.

I'll start by investigating one of these intervals, from 17:51.02 to 17:51.26.
During that time, we do receive several packets from the sender at 17:51.10, 17:51.12, 17:51.15, and 17:51.21. That is, we have received four consecutive packets that seem to contain no pieces of bundles.  Okay, so it seems that the synchronisation process is continuing to run, and is sending packets of ~200 bytes of synchronisation information.

Thus these delays are a false alarm, caused by the natural synchronisation process.  What is a bit worrying, though, is that the synchronisation procedure is taking a long time: with 100 unique bundles on each side, synchronisation is taking at least tens of minutes.  In this example, I am seeing on the order of 300 sync messages of close to 200 bytes each -- and it shows no sign of settling down.
This amounts to somewhere around 68 KB of sync traffic out of a total of 84 KB sent, i.e., roughly 80% of the channel.

Well, that's very interesting: It explains why our goodput is so low, and generally, why transfers are going so slowly.

Thus, together with the Rhizome bundle notification delay, we have two bugs that can greatly slow things down.  I now have something quite definite to concentrate on.

I'll still work on the Rhizome problem first, because I am much more familiar with that code. 

The synctree code is going to be trickier to fix, because I didn't write it. What I suspect might be the issue, though, is that the sync tree code doesn't expect to be used in an environment where multiple packets get queued up.  This means that when it sees a response to an old packet of its own, it may get confused and restart the tree traversal from higher up the tree.

Thus the potential fix for this is to find a way to make sure we have only one packet in flight in each direction at a time. This would be a good idea in any case, because it would help to minimise the latency when a bundle is received, so that the sending party doesn't keep needlessly sending parts of that bundle after it has been received.
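
A minimal sketch of the one-packet-in-flight rule, with hypothetical names and an assumed timeout, might look like this:

    #include <stdbool.h>
    #include <time.h>

    #define ACK_TIMEOUT 30  /* seconds; an assumption for illustration */

    static bool awaiting_ack = false;
    static time_t last_tx_time = 0;

    /* Don't build a new packet until the last one has been acknowledged,
       or a timeout forces a retry. */
    bool may_send_packet(void)
    {
      if (!awaiting_ack)
        return true;
      return (time(NULL) - last_tx_time) > ACK_TIMEOUT;
    }

    void on_packet_sent(void)  { awaiting_ack = true; last_tx_time = time(NULL); }
    void on_ack_received(void) { awaiting_ack = false; }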

Well, actually, I am going to fix this one-packet-in-flight issue first.  But before that, a brief explanation as to why we sync this way in the first place, rather than, say, just exchanging lists of bundles and then sending the bundles that the other side needs -- since that would surely be more efficient.

The reason is that a simple "exchange lists" and then bundles method would be fine if no new bundles ever arrive during a communications session.  But we want to be able to deliver messages with reasonable latency, even if they arrive while a connection is active.  That is, we would like to have live HF links between Serval Mesh islands, and have those act as near-real-time data conduits.

Now, back to the need for the sync algorithm not to run several packets behind: I have implemented this, and it seems to be working somewhat.  The sync traffic is flowing, and is no longer filling the packets.  However, the rate of packet exchange is VERY slow: the average raw data rate in each direction is only about 8 bytes per second. It looks like each end thinks that it has 2 unacked packets nearly the whole time, which is causing it to throttle back the packet rate.  I suspect that the low data rate is interacting badly with the modems trying to be efficient, waiting until there is enough data to be worth sending.

But back to thinking about having only one packet in flight in each direction at a time: this means we will always end up with some wasted time, because one end or the other will almost certainly be waiting most of the time.  Investigating further, the round-trip time for a packet turns out to be around 15 seconds (!!).
Even allowing for two maximum-size packets of 255 bytes, the transmission time would still only be (255x2 bytes)/(2,400 bits per second) = 1.7 seconds.  This makes me all the more suspicious that the modems are trying to batch data together for efficiency, which in other circumstances would be a sensible thing to do.

So we need a solution to this problem.  The previous work I did on inserting the occasional pure data packet might actually be a good solution here: Basically it means that we have a background level of sync exchange through the existing packets, and then we greatly accelerate the exchange of bundles via the pure data packets, with the added benefit that they should help to flush the modem buffers.
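
The scheduling side of that idea is simple enough; a sketch, with an illustrative interval and hypothetical names:

    #include <time.h>

    static time_t last_data_packet_time = 0;

    /* Alongside the normal LBARD packets, emit a pure data packet whenever
       the configured interval has elapsed. */
    int time_for_pure_data_packet(int interval_seconds)
    {
      time_t now = time(NULL);
      if (now - last_data_packet_time >= interval_seconds) {
        last_data_packet_time = now;
        return 1;  /* caller should queue a pure data packet now */
      }
      return 0;
    }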
                    
I'm now reading through the manual for the modems, with a view to disabling compression, in case the compression is causing increased latency. In the process, on page 6-2 of the "HF Data Modem 3012 Reference Manual", I have found some helpful information: while the data channel can in theory carry up to 2,400 bits per second, the maximum effective throughput is only 1,475 bits per second, i.e., 184.375 bytes per second.  This explains why I was seeing data rates of about 92 bytes per second in each direction.  While the extra bandwidth would have been nice, it does at least give me a clearer idea of what the maximum realistic capacity is.

On page A-3, I also found *T, which sets the timeout before sending data packets.  This is important, because any non-zero value will increase the latency of packet transmission. The setting accepts values from 0 ms to 120,000 ms, with a default of apparently 0 ms. I'll set it explicitly in any case, just to make sure... Except the modem complains about the command.  It is oddly listed as just *T, not AT*T, but *T does not work either.  I also tried *0, in case the T is supposed to be replaced by the value itself.

Also AT&M=5 is supposed to select the highest data rate and optimise for interactivity, i.e., reduced latency. That was already set.

Doing some more direct tests with the modems, I timed how long it takes after I paste a 62 byte string into one modem, before it appears at the other end.  This reliably takes 7 to 8 seconds, thus confirming this latency problem.

The issue we have is that our synchronisation protocol currently does not support having multiple packets in flight: it was designed with radios in mind that can reliably indicate when they are ready to send a packet, and then send it immediately, allowing each packet to be built just-in-time, to ensure it has the most appropriate content in it.

In fact, just pressing a single letter and waiting for it to appear at the other end shows the same 7 - 10 second latency, so it isn't a bandwidth thing.  Rather, it seems that the modems make at least one full turn-around each before the data gets sent.  Is this normal?

I did also find the command to disable data compression, AT%C0, and I am now testing that to see if it makes any noticeable difference.
(Another useful command I have found is AT&C=n,n,n,n,n, which allows providing the modem with a list of channels on which to listen for incoming calls, which I'll use later, and record here so that I don't forget about it ;).

But back to AT%C: Disabling data compression DOES make a difference: It drops minimum latency from around 7 - 10 seconds, to only about 4 seconds.  That's a big help, and should greatly improve our throughput.  With 4 seconds in each direction, and sending packets of up to 232 bytes, this should get us a data rate of up to 58 bytes per second of the theoretical 92 bytes per second available.

Thus, while it is a good step forward, it isn't enough to completely solve the problem.  We still will need to send some pure data packets to make full use of the bandwidth.  Part of me also wonders if disabling data compression might help with the buffer overrun problem we were having previously, that was limiting the size of the data packets we could send between normal LBARD packets. 

But before we make further changes, we should time how long it takes to send small bundles (without payloads in this case), so that we have a baseline to compare against.  Having ~4 second latency in each direction instead of ~8 seconds means that we will detect the end of bundle transmission on average 8 seconds sooner, which should provide a noticeable improvement.

Sending only the normal LBARD packets, without any extra data packets, fails because of the previously reported problem with payload-less bundles.  I've now put in a fix for that, and it helps.

I'm also seeing another problem: when the receiver tries to send a Bundle Announce Response (BAR) to indicate to the sender that it has received the bundle, we hit trouble if the incoming packet contains another piece of the same bundle.  That second piece causes the receiver to emit an ACK for it, which replaces the BAR in the transmission queue. As a result, the sender never hears that the receiver has the bundle, and keeps sending pieces of it.  This is likely a key contributor to the highly variable receive times we have been seeing for bundles.

There are two fixes for this that should both be done (a sketch of the first follows the list):
1. ACKs should never replace BARs in the progress report TX queue.
2. We should simply ignore pieces of bundles we have marked as recently received, and never generate the ACKs for them, but instead produce BARs for them, to indicate that we have already received that bundle.
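
Fix (1) amounts to a simple rule in the queue replacement logic. A sketch, with hypothetical names:

    /* When a new progress report is queued for a bundle, an ACK may replace
       an older ACK, but never a pending BAR, since losing the BAR leaves
       the sender unaware that the transfer is complete. */
    enum report_type { REPORT_ACK, REPORT_BAR };

    struct tx_report {
      enum report_type type;
      /* ... bundle id, byte ranges, etc. ... */
    };

    int may_replace_report(const struct tx_report *queued,
                           const struct tx_report *incoming)
    {
      if (queued->type == REPORT_BAR && incoming->type == REPORT_ACK)
        return 0;   /* never let an ACK displace a BAR */
      return 1;
    }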

With that fixed, along with some other tidy-ups, we are now ready to do the comparison again.  We will count the number of zero-length-payload bundles (which thus involve sending about 200 -- 220 bytes of compressed bundle manifest, plus all the housekeeping) that we receive in 1,000 seconds, and the total bytes transferred in that time, so that we can calculate the average bundle transfer time and the goodput.

Again, we will do this with no extra data packets, then with a 128 byte data packet after each normal LBARD packet, and then again sending a data packet every 2 seconds, independently of the LBARD packets. We will report both directions of our test between the two radios.

With 0 byte payloads, i.e., just manifests:

No data packets:
A -> B : 16.66 bytes per second, 13 bundles transferred
B -> A : 16.93 bytes per second, 13 bundles transferred

Sending 1 x 128 byte data packet after each normal LBARD packet:
A -> B : 25.74 bytes per second, 19 bundles transferred
B -> A : 25.26 bytes per second, 21 bundles transferred

Sending 1 x 256 byte data packet after each normal LBARD packet:
A -> B : 33.22 bytes per second, 26 bundles transferred
B -> A : 33.06 bytes per second, 33 bundles transferred

Sending 1 x 256 byte (including headers) data packet every 2 seconds, as well as normal LBARD packets:
A -> B : Modems hang up after a while, possibly due to data overrun, as the rate climbs to ~90 bytes per second.
B -> A : Modems hang up after a while, possibly due to data overrun, as the rate climbs to ~90 bytes per second.

(This makes sense: 256 bytes every 2 seconds is 128 bytes per second of data packets alone, which is more than the ~92 bytes per second the channel can actually sustain in each direction, so the modem buffers eventually overrun.)
Sending 1 x 200 byte (including headers) data packet every 3 seconds, as well as normal LBARD packets:
A -> B : 48.13 bytes per second, 29 bundles transferred
B -> A : 48.18 bytes per second, 26 bundles transferred

Sending 1 x 256 byte (including headers) data packet every 3 seconds, as well as normal LBARD packets:
A -> B : 78.37 bytes per second, 23 bundles transferred
B -> A : 80.62 bytes per second, 21 bundles transferred

Sending 1 x 200 byte (including headers) data packet every 2 seconds, as well as normal LBARD packets:
A -> B : 68.59 bytes per second, 23 bundles transferred
B -> A : 68.86 bytes per second, 23 bundles transferred

Note that just because we allow a 256 byte data packet doesn't mean that one will be sent: the actual packet may be quite a bit smaller, if there isn't enough remaining data for the bundle currently being sent to stuff into it.

We also see that in some cases the bytes per second is higher, while the number of bundles transferred does not increase, or in fact goes down.  This is most likely due to re-transmission of the same data blocks over and over while waiting for acknowledgement of reception of the whole bundle.  Indeed, switching from 256 byte to 200 byte data packets with the same 3 second interval, we see no decrease (in fact an increase) in bundles transferred, even though the number of bytes sent per second drops.

Once we move to bundles with non-zero-length payloads, this effect should reverse. Another effect likely at play is that if we send lots of data blocks between the LBARD packets, we will increase the round-trip time of the LBARD packets, which will slow acknowledgement of bundles, and thus reduce the number of bundles that can be received per unit time.

Another factor that must be taken into account here, is that the first bundle transfer does not typically take place until around 120 -- 180 seconds into the 1,000 second test. Thus once actively communicating, the actual time per bundle will be lower than suggested here.

So now to repeat those tests with non-zero payloads for the bundles... This showed up a bug in payload handling, which was quick to fix. More critically, it showed up an issue where receivers sometimes don't recognise packets as being addressed to them.  This means that when the other party sends them pieces of a bundle that they have already received, they don't tell the sender that they already have it.

The mechanism that is playing up stamps each message containing a piece of a bundle with the first two bytes of the receiver's SID.  This works fine for some time, but after a while the receiver finds garbage in this field.  What is not clear is whether it is a data handling bug that corrupts the field on the receive side or during assembly on the transmit side, or whether the field holding the peer's SID on the sender side is itself getting corrupted.  It takes on the order of 15 minutes to occur.

LBARD maintains both ASCII-hex and binary representations of each peer.  If these end up with different values, then we can assume that memory corruption has occurred. I'll add instrumentation that will help me to see if that is what is going on.
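
The check itself is straightforward; a sketch, with hypothetical field names:

    #include <stdio.h>
    #include <strings.h>

    /* Re-render the binary SID prefix as hex and compare it against the
       stored ASCII-hex copy; a mismatch means one of the two has been
       overwritten since they were set. */
    int sid_representations_agree(const unsigned char *sid_binary,
                                  const char *sid_hex, int bytes)
    {
      char rendered[2 * bytes + 1];
      for (int i = 0; i < bytes; i++)
        snprintf(&rendered[2 * i], 3, "%02X", sid_binary[i]);
      return strncasecmp(rendered, sid_hex, 2 * bytes) == 0;
    }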

Interestingly, the data that overwrites the binary representation often begins with $47, suggesting that some piece of code might be writing there by mistake.  The "Generation ID" message type does put a $47 followed by 4 random bytes.   This message type also contains no bounds checking -- it just assumes it has been given a buffer with enough space in it.  However, it is only called at the start of the packet assembly process, when it can safely be assumed that there is enough space in the buffer for the message.

A few more runs confirm that it is always $47 in the first byte, so it is almost certainly something going wrong with the Generation ID messages.  I'll add some more instrumentation to record the new generation ID each time one is received; if it matches what I am seeing in the corrupted field, then I will know that it's to blame.

Ok, while waiting for the run to hit the problem again, I took a closer look at the code that parses the generation ID messages, and discovered that it does indeed write the generation ID into the SID prefix field.  I.e., a straight-forward bug.
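
In essence, the bug was of the following form (an illustrative reconstruction, not the actual LBARD source):

    #include <string.h>

    /* The parser for the $47 "Generation ID" message copied the received
       bytes into the peer's SID prefix field instead of the generation ID
       field, which both corrupted the SID prefix (starting with $47) and
       overran the 2-byte field. */
    struct peer {
      unsigned char sid_prefix[2];
      unsigned char generation_id[4];
    };

    void parse_generation_id(struct peer *p, const unsigned char *msg)
    {
      /* msg[0] == 0x47, msg[1..4] == the 4 random generation ID bytes */
      /* Buggy: memcpy(p->sid_prefix, &msg[0], 5); */
      memcpy(p->generation_id, &msg[1], 4); /* fixed */
    }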

This fixed the problem, and got everything going, but it still seems to be rather slower than it should be sometimes.  But investigating the remaining problems can go into the next blog post.
