Enabling Communications, Anywhere, Anytime

Friday, September 30, 2022

Distributed Search & Discovery - Milestone 1

With the support of the NLnet Foundation, we have been looking at extending the Serval Mesh to support search and service discovery functionality on isolated mesh networks, i.e., without relying on internet access. This is what we talked about doing for this project:

Search and discovery have become invaluable elements of the modern internet. However, for communities experiencing transient or chronic loss of internet access, these facilities are not available. This can prevent these communities from being able to promote and find and utilise local services – which together can impair economic capacity and recovery. This project seeks to add a simple fully-distributed search and discovery mechanism to the Serval Mesh.

The COVID19, Cyclone Harold and the complete lock-down of Vanuatu has made it problematic to pilot the result with our South Pacific partners there. Testing will thus occur with IRNAS in Europe and also in Australia, where we have access to the Arkaroola Wilderness Sanctuary as a geographically large test site. The capstone output being a Ubuntu package for the Serval Mesh, including the new search and discovery functionality.

This blog post describes the work undertaken over the part months on this, to satisfy the first milestone.

We have come to take the ability to search for information and services on the internet for granted, precisely because they have become so useful and universally available. The trouble is that they require an extensive international infrastructure in order to operate. This means that they are not available in areas lacking communications infrastructure, or where communications infrastructure has been destroyed or denied.

It is not merely as simple as saying that we should create a distributed internet that supports all of the services that we currently expect from the internet, because global services require infrastructure to operate. Otherwise the provider’s of these services would have implemented them themselves in this way, because they would have been very happy to save all of the money that they otherwise have to spend on creating and maintaining that infrastructure.

However, if we narrow our focus down to how we can provide local search and discovery services, then the situation changes. Just as Serval’s SMS-like MeshMS service and local social-media system, MeshMB, are able to provide local services without requiring infrastructure, the challenge of this current project is how we can define and provide local search and discovery services built on the Serval Mesh’s distributed communications primitives. And that is exactly what this milestone sets out to do.

So let’s break this down into the three steps that we need to pursue to achieve this:

1. Identify the precise search and discovery services to be provided, including from both data provider and consumer perspectives.

So what do we think are the kinds of search and discovery services that we can implement over the Serval Mesh’s distributed networking primitives? Serval’s Rhizome distributed data system effectively works like a magic file share: Various users put files (or file-like objects we call bundles) into it, and those then get distributed throughout the network in an opportunistic manner. The result is an approach that provides eventual consistency, provided that there is sufficient storage on each node, and provided that there is sufficient opportunities for network communications.

The storage limitation is addressed by having a prioritisation system, so that the highest priority material is retained, if there is not space for everything. To date, the priority system has been relatively simple, with smaller files/bundles taking priority over larger ones. This could be easily extended to prioritize local content over remote content in a variety ways. This could include either geographic bounding boxes for content, that indicate that it is only of relevance in a given geographical area, a hop counter, or alternatively that content is tagged as belonging to a given community. Community tagging has the advantage that material could follow a community who were on the move. Neither is perfect, nor are they bullet-proof against various forms of denial of service measures, as is common for most mesh networking systems. Essentially any mesh network is very difficult to defend against a well-resourced aggressor. We will thus just focus on how we can add the ability to search and discover content on a mesh network.

The Serval Rhizome system supports both encrypted and non-encrypted content. Each bundle also has meta-data. At the most basic level, we can simply allow users to search this content, without describing any additional services or semantics. We should do this, but also consider services that can be created that intentionally support discovery and search. Also, from another perspective we should think about the types of discovery and search services that are desirable for infrastructure-deprived contexts. Ideally, by considering what is possible, and what is desirable find those capabilities that are going to be successful in practice.

From the possibility perspective, we need to formulate search and discovery systems that can work using only cached data. This leads to a positive characteristic, in that the time to perform a search can be very rapid in almost all cases, because no network latency will be involved. There is of course a negative aspect, in that if the data has not reached the device performing the search, than it cannot be searched. This is the fundamental limitation of the the Serval Rhizome data model, which is the trade-off of the ability to operate completely independent of supporting infrastructure. As with every other network model, the nature of the network influences the nature and architecture of the network services that can be delivered – especially if they are to perform well.

To help us identify services that are of interest to consumers, it’s probably a good idea for us to look at statistics relating to search engine use, such as from somewhere like these data from 2020:

https://www.statista.com/statistics/271599/most-common-online-search-topics-of-internet-users/

This lists the top 20 reasons for search globally, which has some limitations for our purposes, which we will discuss shortly, but let’s look at the top 10:

13% - Translate

11% - Social Network

10% - Entertainment

10% - Shopping

10% - Weather

9% - News

8% - Web Services

5% - Gambling

4% - Mail

4% - Traveling

Let’s start by saying that I don’t really have a strong opinion as to how reliable I believe these percentages to be, but the overall categories look reasonably sensible, although I am dubious when pornography doesn’t make it into the top-20 list at all. Perhaps that comes under “entertainment”, “shopping” and/or “Web Services” to some degree. But that’s rather academic for us, because we are not interested in creating an infrastructure-free pornography distribution service. In fact, history has shown that you can tell when a communications medium begins to be useful, precisely because it starts being used to carry pornography.

Rather, we are actually actively wanting to create a platform that doesn’t get overwhelmed by porn. This can be difficult, precisely because of the propensity for communications media to get used for this purpose. The services we are creating need to be crafted as intrinsically safe places, to the maximum extent possible. This is not always easy, and can never be done perfectly, but there are sensible measures that can be employed to help.

The corollary to this is that if you don’t take steps to make communications spaces safe, then they rapidly end up as very unsafe places. This has shown itself repeatedly, especially in recent history: CB radio with its near total lack of accountability consists in most places in large part of a stream of profanity and abuse, making it unsafe for families and young peoples – in spite of often quite strong laws that ban even the use of swear words on the medium. Then we have more similar examples to our area of interest, where online chat systems such as FireChat. I recall around the time it was released trying it out in the mundane environment of an Australian city, and even here, the lists of nearby users were eye-scorchingly offensive to read. This isn’t a complaint about FireChat per se (and they might well have great tools for dealing with this now), but really just an example of what I consider to be an axiom of human communications: If people are unaccountable for what they say, they will say just about anything, and a lot of it will be designed to shock, and a lot of it will be sexually explicit and abusive. If we want to create services that are going to be wide spread by civil society, rather than just “uncivil” users, then we have to take active measures to help nudge the use of the system in the right directions, and to reduce the potential for abusive use to be easy or attractive.

For example, the text messaging and neighbour discovery mechanisms already in the Serval Mesh are quite conservative about showing you random participants who are a long distance away on the network, to reduce the potential for unaccountable abusive activity, such as using offensive account names, and allowing random people to send you messages that appear together in your personally curated inbox. Instead, your inbox only shows content from people you have added as contacts, or in the case of the Twitter-like micro-blogging facility, only of people you have chosen to follow their feed. This creates a safe working place, where you can carry on your day to day activity, without fear of random anonymous abuse.

The trade-off is that you have to actively choose to go through your contact requests, when you want to add new people to your network. This makes it harder for social groups to self-organise, and necessarily means that you may have to make excursions from the safe space into the wild exterior. One way to improve on this, is to have a filter for truly “nearby” users, so that you can find people on the network when you meet in person. This could require direct network connection, i.e., that user’s devices are physically directly peered with one another, or use some other mechanism, such as a more relaxed hop-count limit, or perhaps better, to sort the list of potential contacts by hop-count and time since last reception, so that fresh local requests appear at the top of the list. Of course, this current work on supporting search and discovery mechanisms helps to directly address this issue as well, and we will keep solving this problem in mind as we design them.

Another approach that could help would be to have a mechanism to create closed communities, where a common passphrase or similar is used to tag network content as belonging to a particular community. There are also clever cryptographic mechanism to achieve this effect that don’t allow for a random unaccountable user to obtain and use the passphrase. Such approaches help not only to filter results to a manageable level in a large network, but also directly addresses some of the unaccountability that leads to what we might generally call anti-social behaviour on these platforms.

Hybrid approaches are also possible, where for example, only users in your verified social group are able to have image content displayed, so that the benefits of images can be used in a service or facility, but with reduced probability of this filling up with pornographic or other anti-social content. There is considerable potential solution space for such measures, well beyond the scope of this current project. But what is relevant to this project and should be included, is at least some exemplar mechanism for discouraging anti-social behaviour, and where necessary, excluding anti-social users and their content. That is, it must like the existing Serval Chat (MeshMS) and Serval Micro-Blogging (MeshMB) services follow a “safe place by default” approach.

2. Create several user-stories for how the search and discovery systems might be used in practice.

To create our user-stories, we will follow the Serval Project’s long focus on post-disaster response and recovery among the general public. We will thus imagine a situation where a community has had infrastructure damaged, and is without functioning communications infrastructure to reliably connect them to the outside world. Of course this scenario has strong parallels to what we see when people are living in regions of unrest as well, so it’s a good starting point for us. The following user-stories will walk us through an extended scenario in such a post-disaster situation. All names and locations are fictitious.

2.0 Scenario Background

The communities of Ulia, Luania, Saua and Kiatu live on, Tija, a long and thin island of 800 square kilometres at the northern end of the archipeligo, some 400km from the national capital of Ari Ula, and 90km from the next nearest populated island. The 3,500 people living on Tija enjoy life on their tropical island, located in the South Pacific. Kiatu, the regional capital, is the largest population centre on the island, with a population of just over 2,000 living in the town and surrounding villages. There is 4G cellular coverage in the town centre, with 2G coverage extending out to the surrounding villages. The cellular service, including internet access, is connected to the rest of the country via a VSAT satellite link. Luania is the nearest of the other communities to Kiatu, although it is still some 15km away. Depending on weather conditions, can sometimes use mobile phones from on the top of a nearby hill, that has line of sight to Kiatu. Saua is 7km on the other side of Luania from Kiatu, and has no cellular network access. Ulia is yet another 12km further on from Saua, and also lacks cellular coverage. There are NGOs or small government presences in Saua and Ulia, who have HF radios, that are used to provide a basic level of communications back to Kiatu and the national capital.

Tija’s location in the southern tropical zone of the Pacific exposes it to considerable geophysical hazards, including cyclone, tsunami and earthquake, like many other Pacific Island Countries. GDP per capita is relatively low, yet as has become common around the world, almost all families and most adults have access to a low-end smart-phone. The relative expense and difficulty accessing mobile internet means that the population is quite familiar with “side loading” of applications on their Android phones, as is the case in many Pacific Island Countries. The Tijans have a strong community structure based around traditional chiefdoms in each of the four communities, and work constructively with one another whenever disasters strike.

Because of their vulnerability, and isolation from conventional communications, a number of residents in each of the four communities on Tija already use the Serval Mesh peer-to-peer software on their Android smart-phones.

Our scenario begins with a 7.2 magnitude earthquake that causes substantial damage, especially in Kiatu, which is nearest to the epicentre. Fortunately the earthquake triggers only a relatively small tsunami, and there is no immediate loss of life. However the combination of the earthquake and tsunami have destroyed the cellular infrastructure and its satellite earth-station. It will likely take the mobile network operator a couple of weeks to restore service, as they are also having to proritise restoration of service to larger communities nearer the national capital.

2.1 Forming a community

With conventional communications to the outside world down, the Tijan communities need to find their own ways to communicate, so that they can coordinate their own response to the disaster, and generally keep their communities functioning as effectively as possible.

The larger community in Kiatu is particularly keen to enable local businesses to continue trading, and to ensure that the limited local resources can be readily used where required, e.g., building materials required to repair infrastructure, as well as the people required to undertake such work. They also have a need to identify where infrastructure is damaged, so that they have the situational awareness that will enable them to be effective in their response. Knowing where damaged infrastructure is will also be helpful for re-routing traffic and ensuring public safety.

The smaller out-lying communities are also keen to ensure that local communications is possible, so that they can ensure the well-being of their communities, and collate information about any damaged infrastructure so that this can be communicated to the regional and national capitals, so that remedial works can be carried out when the expected foreign aid support starts to flow into the country.

To support this, in each of the communities someone opens the Serval Mesh app, and uses the Create Community function to create a new local community on the mesh network. As well as a community name, a shared secret is set up that is required by participants who wish to contribute material into the community, but reflecting the friendly socio-political environment on Tija, the communities are set up to allow read and search access without needing the shared secret. Also in the smaller communities, the shared secret is set to something quite simple, such as 1234, as the communities prioritise ease of access over security.

In Kiatu, the larger population means that several different people have attempted to create the community, resulting in some initial confusion. Later in the day, the regional administration select one of those communities to be the official Kiatu earthquake response community on the Serval Mesh. Nonetheless, it takes a few days for almost all members of the dupliate communities to switch to the official community. These problems do not emerge in the smaller communities, as their populations are small enough that the normal village meeting mechanisms are able to lead a unified approach in each community.

At this point, there are four primary communities that have been created: Official Kiatu Earthquake Response, Ulia Earthquake, Luania Together, and Saua Saua. This is inaddition to the short-lived duplicate communities in Kiatu: Kiatu Earthquake Help, Earthquake Help, and Kiatu Together. However, there are relatively few users in each. Initially, the duplicate Kiatu communities have more users than the official community, reflecting the ability of grass-roots organisation to be faster than government processes, especially following a disaster.

2.2 Building a community

After creating the communities, individuals within those communities approach others and ask if they have joined the new communities, and explain to them how the communities can help them, by allowing the collection and sharing of information of relevance to them. For those who do not yet have the Serval Mesh on their phone, they side-load the application using the bluetooth tools for doing so that they are already familiar with, or they use the wifi-based sharing mechanism in the Serval Mesh app.

This process continues until all community members who wish to join a community have joined.

2.3 Searching for communities

Some community members don’t wait until someone invites them, but use the Search for communities function in the Serval Mesh app to find communities. Most users use the Find communities that my contacts are in function to do this in a safe way, that avoids the risk of seeing offensively named communities. Some users choose to search through all available communities, after clicking on a confirmation that unprotected searching may result in them seeing offensive material. Either way, when they find the community that they are looking for, they use Join Community, and if they make an error, or wish to otherwise leave a community, they use the Leave community function.

As members of the communities have family and friends in the other villages on Tija, they use the Search for communities function to discover and join the other communities that have been setup on Tija. Among those is Neire, who lives in Ulia.

2.4 Searching for people: User discovery

Neire helps to look after her grandmother, Iu. Neire helps her grandmother setup the Serval Mesh on her phone, and helps her to join the Ulia Earthquake community. She then uses the find users function, with the search scope set to only look for people in my community, to find herself on Iu’s phone, and to find Iu on her phone. They confirm that they can send each other text messages and voice messages. Neire then helps Iu to find other relatives who have joined the community, and adds them all to Iu’s contact list, so that she can easily communicate with them.

2.5 Searching for services: Service & Data discovery

Neire is helping with the recovery effort. As she can drive, the village chief allows her to use the village‘s truck to help move people and goods, as required. Neire uses the offer service function to advertise to each of the communities the availability of freight and transport services for the next three days. She offers this for free.

The truck has only a ¼ of a tank of Diesel, so Neire uses the service search feature to search for available sources of diesel fuel. She also uses the publish request feature to announce that she is looking for diesel fuel. Using this, after a short time, she receives a resource match alert, that contains a list of current offers for diesel fuel that have been made in the communities that she is participating in. She thus discovers that there is some diesel available in Saua, and prepares to head to Saua. Before departing, she searches the list of published requests from the Saua community.

By this time, more people have had time to make requests, and she discovers that Saua is running short on fresh drinking water, because the earthquake has damaged some of their water tanks, and the tsunami has caused contamination of several other water tanks. Neire thus arranges with others from her village to cart 900L of drinking water in a variety of containers. One of the other community members uses the offer resources function to announce the availability of ample fresh drinking water in Ulia. Finally, Neire and several of her extended family set off to Saua with the water, and to collect the diesel.

2.6 Searching for data/information: Data discovery

Other members of the communities use the search published requests and search services functions, both to find resources that they need, as well as to understand what resources and services they might be able to provide to meet unmet needs.

Boli from Luania notices that there are unmet requests for diesel fuel, because the only offers for diesel fuel are from a few community members who have diesel in portable containers. Boli assumes that this means that the fuel distribution centre in Kiatu was damaged in the earthquake or tsunami. Boli and several others in Luania have coconut plantations, and have hand-presses for extracting the oil from coconuts, which can be used in adapted diesel engines, provided that the weather is warm enough that it doesn’t solidfy. Fortunately, the tropical climate on Tija means that this is a very rare event. Boli thus uses the publish service and offer resources functions to offer diesel fuel and fueling services. Boli and the other coconut plantantion managers arrange for the production of several hundred litres of oil over the next 24 hours, and keep an eye on demand through customers and through the search published requests function.

2.7 Searching for places: Mapping and geoinformatic discovery

Meanwhile, Neire and her team are making their way down towards Saua. Along the way, they discover several places where the road has been damaged due to rock slides or erosion from the tsunami. Neire has one of her team use the crowd-sourced mapping function to mark these locations on the map, so that others will know about the damage, allowing them to avoid it, or to begin work on repairing them. Some of the damage is quite near to Saua, and after Neire arrives, she shows some of the people in Saua how to use the crowd-sourced mapping to search for damaged infrastructure, and search for resource offers to find the materials that they need to start repairing the worst of the damage.

Neire uses the crowd-sourced mapping to navigate to the locations where people had requested fresh drinking water and helps distribute this. She is also able to efficiently plan her route to include collecting the diesel fuel. The recipients of the water use the cancel resource request after they receive the drinking water. They have in the meantime logged their damaged water tanks on the crowd-sourced mapping function. After Neire has collected the diesel fuel, that offer is also removed, using the update resource offer function.

Neire now has enough diesel fuel to respond to several other requests for moving materials around Saua, including helping the road repair volunteers get their tools and materials to the nearest damaged location. She knows that she will still need more fuel, and has been keeping an eye on the resource match alert function, and discovers the offer of coconut oil for use as diesel substitute. She uses the Serval Mesh‘s MeshMS text messaging function to make arrangements to collect 300 Litres of coconut oil, enough to fuel her truck for the next couple of days. Because she is able to communicate with Boli, she knows when the coconut oil will be ready, and is able to help with other recovery activities beforehand. The crowd-source mapping function also lets her discover ahead of time the condition of the road, and allow sufficient time to get there, and to bring some repair materials and volunteers to stabilise the worst of the damage.

Neire arrives at Luania in the early evening, soon after Boli has her fuel ready. Neire uses the search for services function to find a place where she and her team can spend the night, before they continue helping their communities the next morning.

2.8 Searching by low-literacy user

Not everyone on Taji has a high level of literacy. Yeco is an older man in Luania who lives on his own when his family travel to Kiatu for work. This is the case following the earthquake. His family have not been able to return from Kiatu yet, as a bridge has been damaged by the earthquake, making the road impassible. However, Yeco is able to exchange voice messages with his family sporadically, as people walk from Luania and Kiatu to opposite sides of the damaged bridge, allowing messages to be automatically exchanged via the Serval Mesh. Thus Yeco and his family know that one another are safe. However, Yeco’s kitchen has been damaged, and so he will require assistance finding nutritious food. Based on instructions from his family, he opens the crowd-sourced mapping function, and uses emojis attached to the service offers and resource offers to find someone nearby who is offering food. He sends them a voice message asking if they can bring him some food. They ask him in a voice message to use the share location function, which he does, thus enabling them to deliver him food until his family are able to return from Kiatu.

2.9 Searching by visually impaired user

Ado is another elderly person living in Luania, and while she is literate, her eyesight has deteriorated over the years, making it very difficult for her to read. She is, however, able to use the vision impaired interface option to access basic functions of the Serval Mesh. This provides a text-to-speech interface, which allows her to access and send voice messages to her contacts. She also requires assistance with obtaining food while her family are busy with the recovery efforts. She is able to send them voice messages to let them know how she is going, and set their minds at ease, and that she has used the vision impared interface to listen to the various resource offers, and thus find someone nearby who is able to provide her with nutritious food, and to communicate with them using voice messages.

2.10 Summary

The above scenario and its embedded user stories show how communications in these sorts of situations can be very effective at helping vulnerable communities. The combination of the existing Serval Mesh functions, with the ability to share, discover and request services and resources makes it much easier for isolated communities to respond efficiently and with effective situational awareness in ways that have not previously been possible. Now our challenge is to collate the functions required to enable these improvements, and map them to the Serval Mesh‘s delay-tolerant networking primitives.

3. Map these services to the Serval Mesh Rhizome Delay Tolerant Networking Primitives, and work though several user-stories to confirm that the model is fundamental workable.

3.1 Mapping to Rhizome Primitives

Let’s start by summarising the functions that are referenced in the above user-stories. Then we can look at which ones are related, and how we can map these to Serval Mesh Rhizome Delay-Tolerant Networking primitives, excluding those already in the Serval Mesh, such as find user, or Send MeshMS text message. These collect into several fairly obvious groupings:

Community Management:

Create Community
Search for communities
Find communities that my contacts are in
Join Community
Leave Community

Services & Resources:

Offer Service
Offer Resource
Search for Service [Offer]
Search for Resource [Offer]
Publish request for service
Publish request for resource
Cancel request for service
Cancel request for resource
Update request for service
Update request for resource
Update offer of service
Update offer of resource
Search service requests
Search resource requests
Alert on Resource Match

Crowd-Sourced Mapping:

Share to Crowd-Sourced Map
Show Crowd-Sourced Map

Transcending these are the need to support accessible content, such as emojis for searching visually, and text-to-speech and emoji-to-speech (including in multiple languages). The mapping functions can be supported by including a geographic tag, which might represent a point, a locality or a geographic bounding box, depending on the circumstances. For simplicity, it probably makes sense to use Google’s Plus Codes as a succinct way to describe a location or locality.

At a lower level, it looks like we can fairly easily map these to Rhizome primitives. For each user, we can have a bundle that contains the current list of services and resources that they are offering and seeking. Each of these can be associated with a Plus Code, that locates the request in space.

One of the challenges that we will need to face, is the cluttering of the system, especially when people make offers or requests, and either forget to remove them, or perhaps move out of the community. In these cases, we don’t want these stale offers and requests to hang around, making the system difficult to use. To solve this, in addition to the Plus Code to locate the request in space, a time maker can also be added, that indicates when the request or offer is valid, with fairly tight maximum bounds. The complete bundle should include a claimed time of publishing, and the time markers should be relative to this, and limited to a maximum of, say, 3 days from that time of publishing. This is designed to be long enough for requests and offers to spread through the network, including in fairly difficult situations, but not so long that they can clog up the system and cause more trouble than help. Based on experience with working with humanitarian response organisations, 3 days is a very long time in a disaster.

Then we also have the crowd-sourced mapping elements, that are separate from the resource and service requests and offers. These can, however, be included in the same bundle, and simply encode points of interest. Again, a similar expiry system should apply, but probably with a longer expiry, perhaps allowing up to 14 days. After 2 weeks, the information is likely to be quite out of date. But the user can of course choose to re-publish the information.

This approach requires only one bundle per user, which will help the system be more efficient. This bundle should use a unique service field in the Rhizome Manifest that contains the bundle meta-data. The various functions then simply either modify a user’s own bundle, and re-publish it into Rhizome, or search the bundles in Rhizome that have this service field. No changes are required to the low-level functionality of the Serval Mesh.

The existing prioritisation scheme that prioritises small over large bundles will prioritise those bundles that contain only a single request or offer, over those that contain many offers. This is probably reasonable, and is certainly easy to explain to users. The software could guide the user to understand that if they want a “high priority” request, then they need to make only that request, and not others, and not also make offers. While imperfect, it is a reasonable approach where priority and user focus are interconnected in a manner that reflects real-life.

One challenge that has to be addressed is to ensure that each user can have only one such bundle. One approach to this is to have the Bunde ID (BID) of the service discovery bundle be cryptographically tied to the user’s ServalID (SID). This can be done in a hard 1:1 way, however, only one such bundle can exist for a user for all purposes, and that bundle is already used for sharing identity information on the mesh. Thus we need to use a looser coupling. This can be done, by requiring the User’s to sign the BID of the service discovery bundle to attest its authenticity. If multiple bundles for the same user are found, the search mechanism (and Rhizome storage system) can discard those bundles that are not currently valid in time (which the expiration scheme neatly enables), and if multiple bundles are valid simultaneously, then the bundle with the later publication time should be used in preference to any older ones. In this way, exactly only the most recent bundle will be valid for the search function.

This leaves only the community management functions. The communities could be managed in separate bundles, however, we can handle this in the same bundle, by allowing the inclusion of a number of communities that a user claims to be a member of. Each community name would be followed by a cryptographic hash that is the hash of the user’s SID, the community name, and the shared secret that attests membership to the community. The search functions then need only to filter the search results according to whether the service discovery bundle has any communities in common with the searcher.

This information can also be used to filter search results that are outside of the service discovery bundle. For example, to allow searching through all microblogging messages posted by members of a given community, or other content that they might have uploaded into Rhizome.

Together, this all means we need only one Rhizome bundle per user to support distributed search and service discovery. Circling back to our need to enable safe search spaces, this is also extended in a pleasingly simple way: You can search your contacts, or extend that to searching in one or more communities in which you participate, or you can search without filters. This allows a graduated transition from very safe to unsafe searching.

From an individual security perspective, it is acknowledged that this approach discloses to other users which communities a user is in. This could be partially concealed, by not including the community names in the service discovery bundle, but rather only the hash that can be interrogated to verify if a user is a member of a given community. However, this would only provide limited protection, because if anyone knew the name of a community, they would be able to test if a user was a member of that community. It would also require a separate community advertisement mechanism, if communities are to be able to be found. Those advertisements would then effectively provide the missing information required to determine the communities in which a user resides.

For more sensitive environments, it would be possible to make “silent communities”, where the community name is neither advertised nor included in the bundles. To provide security for communities, it would then be necessary to encrypt the payload of the service discovery bundle using a key derived from the shared secret of the community. This could be done by having the community membership attestment section of the bundle also include a decryption key that is encrypted using a key derived from the shared secret of the community. This would prevent people outside of a community from being able to search the service and resource offers and requests of the community, which is probably a good step to take.

This approach also has the advantage that members of a community can choose to change the shared secret at any point, without requiring any central coordination. They could choose to advertise membership with both the new and old shared secret for a period of time, to allow others time to update their shared secrets. Or alternative when the goal is to exclude an anti-social member, the old shared secret could be immediately discarded, and any updates to the service discovery bundles would render them unreadable by the excluded user (although they could still read any older version of the bundles for which they have access to the shared secret). This effectively provides a kind of forward secrecy for the communities. It also helps to expire stale members of the communities, who are not participating, helping with the data trimming aspects of the system.

Finally, we need to address the optimisation of bundles in the network, so that content for a community is prioritised over content from outside the communities that a device owner has joined. This will require tweaking of the Rhizome Bundle synchronisation process, rather than any changes to the data structures used. It thus falls outside the scope of this project, although it is highly desirable to have this implemented, as it will help with the scalability of Serval Mesh networks greatly.

So let’s now draw up a rough outline of what this service discovery bundle would contain, so that we can follow their contents for the example users in our previous scenario.

The key information types are:

1. Publication time and date of the bundle. This is used to indicate the beginning of validity of the contents of a bundle, so that a malicious user can't publish data that is "eternally valid", and thus clutters up search results forever.

2. A record for each community that a user has joined, and in which their service discovery bundle should be considered valid. These may have individual periods of validity, but always within the hard-limit of 7 days from the publication date of the bundle. Thus to remain in a community, the service discovery bundle has to be updated at least once per week. Again, this is to stop stale data clogging the network and search results. The maximum number of communities that can be listed in a bundle should be restricted to a relatively small number, perhaps 5.

3. A record for each resource request, service request, resource offer or service offer, again with each entry having a validity period. These records should include both a textual description of the request or offer, as well allowing the specification of a number of emojis to provide a language independent description that can be used to support both low-literacy and through the use of an "emoji reader" vision impaired users. For low-literacy users it should be allowed to specify a request or offer that consists solely of the describing emojis. The emojis can also be used to provide a convenient visual description of a record for placement on a crowd-sourced map display. To support this, these records should also include a PlusCode location identifier, making it easy to draw maps containing this information.

4. Records for the crowd-sourced mapping interface, that are not service or resource requests or offers, but rather situational-awareness markers, e.g., to indicate damaged roads or other infrastructure. These are otherwise similar to the request and offer records.

A data model with just these few record types are sufficient to support the various use-cases explored in the scenario in section 2. These will be fleshed out in a later milestone, once we have assured ourselves that they really are sufficient.

3.2 User Story Work-Throughs

To verify that the data model is adequate, I went through the scenario from section 2, and mapped out the information that would be required in the service discovery bundle for the key users in the scenario. To keep this manageable, I have focussed only on the service discovery bundle, and excluded any Serval MeshMS or other communications. I have also left out the expiration and validity duration information, as this is logically fairly straight-forward, and is orthogonal to the search and discovery requirements.

The result is the following image, where we have the key contents of the service discovery bundle listed for each of the sub-sections of the scenario. New pieces of data that are added to a user's service discovery bundle are highlighted in yellow when they are first added. The image is pretty big, so it is shown smaller here. Click on it to open it in full-resolution:

Using this diagram, we can trace through each user's activity. For example, we can trace Niere connecting to the network in 2.3, and then offering freight services in 2.5. Where she then seeks diesel fuel, we can see the offers of fuel from a Saua community member in 2.5, and then later from Boli in 2.6.

This approach to representing the data purposely doesn't show the act of searching for information, as that doesn't result in the further injection of data into the network. Rather, we can verify that the search will succeed, by visually confirming that the data required for the search is available at that point in time. For example, that Boli's offer for fuel is published in his service discovery bundle at or before the time that Niere searches for fuel in 2.6. That is, we have taken the approach of showing the data that resides in the Serval Mesh network to facilitate the searches -- which we can thus see that it does. Implementing the search facilities will come in a later milestone.

So that all looks sensible, so now we just need to document this on the Serval Project Developer's website, which I have just done, and that's all the items from this first milestone complete:

1. Define Search and Discovery Services and Data Model

This sub-task focuses on defining the user-facing search and discovery services that will be created, and how they will be mapped to the underlying Serval Mesh network primitives.

Milestone(s)

Identify the precise search and discovery services to be provided, including from both data provider and consumer perspectives.
Create several user-stories for how the search and discovery systems might be used in practice.
Map these services to the Serval Mesh Rhizome Delay Tolerant Networking Primitives, and work though several user-stories to confirm that the model is fundamental workable.
This milestone will be considered complete when the above elements have been completed, culminating in worked-through user-stories and documented data model, both of which shall be documented on the Serval Project Developer’s website (which will be upgraded to HTTPS in the process), as well as on the Serval Project blog (https://servalproject.org)

Wednesday, June 16, 2021

Succinct Data in the Pacific for COVID19 Symptom Monitoring

Over the past year or so, we have been talking with some of our partners in the Pacific about helping Pacific Island Nations to improve their capacity to respond to the COVID19 pandemic -- but also to support them with their ability to manage the existing neglected tropical diseases in the region, including Dengue Fever and Malaria.

What these three diseases all have in common, is that tracking who has and doesn't have the disease (or symptoms of the disease) can be really helpful at stopping the chain of transmission. If this is all sounding strangely familiar in our post-COVID19 world, that's because it is quite similar to the trace and quarantine approach that Australia and NZ have taken with great success -- enabled by being relatively small island nations able to control their borders.

This approach doesn't work for all countries, however, e.g., with very large dense populations and porous borders. However, that is not the case for Pacific Island Nations: The natural advantages of Australia and NZ were able to leverage are greatly amplified for most Pacific Island Nations, as they have relatively few international passenger movements, a very wide moat called the Pacific Ocean, and even within most of these countries, large distances between adjacent islands and communities, that further enable natural quarantining, without great inconvenience.

Where the challenge lies for these nations is that their relatively small economies and lower levels of development and logistical adversities make it difficult to build these kind of public health tracking and situation awareness gathering tools.

We developed Succinct Data back in about 2012 with NZ Red Cross precisely to be able to provide this kind of functionality, and we now have a sprint of effort to attempt to resurrect that system, and get it operational for use in the Pacific by the end of 2021. The process of doing this will be documented in blog posts over coming weeks.

Monday, September 28, 2020

900MHz 802.11ah WiFi modules

Hmmm... So it looks like there are the first 900MHz wifi modules, which claim 1 - 1.5km range outdoors, with data rates of megabits to tens of megabits in many circumstances. The module I have found is this one:

https://www.silextechnology.com/connectivity-solutions/embedded-wireless/sx-newah

The only problem I see so far, is that there is no comment as to whether they can do ad-hoc mode, or only AP or station mode.

If they can do ad-hoc mode, then this is potentially very interesting for use in Serval Mesh Extenders.

More when I discover it... But if anyone else has experience working with these, I would love to hear from you.

UPDATE: It looks like it uses an NRC7292 chipset for the 802.11ah, for which source code is available:

https://github.com/newracom/nrc7292_sw_pkg

They already have monitor mode implemented in the chip, it seems, so it looks like it *should* be possible to implement ad-hoc mode on it.

Thursday, September 24, 2020

Chasing down the last LBARD HF TX queue bugs and Initial Performance Graphs

To recap, in the last blog post I found and fixed a lot of bugs in LBARD over HF data modem operation. This has taken much longer than hoped, in part due to the 4 to 5 hours it takes to do a complete run.

The last fixes I discussed in the previous post have resolved the remaining TX queue shuffling bugs, so far as I can see. But in some runs I am still seeing that a few bundles are not transferred in one or both directions. These are typically among the lowest priority bundles, but not strictly the lowest priority bundles.

For example, in the last test run, the following bundles are not transferred:

peer4random79

peer4random94

peer4random96

The files are named in increasing file size, from peer4random00 to peer4random99. Thus the 94 and 96 files are among the lowest priority bundles, and the 79 file is reasonably close.

The first step to tracking down what happened with them is to get the bundle IDs (BIDs) that correspond to these files:

peer4random79 - 6CE2CB42* - bundle #21 on the sending side

peer4random94 - 469E2B18* - bundle #6 on the sending side

peer4random96 - 338FC557* - bundle #4 on the sending side

Armed with that information, I can see if those bundles get transferred at all. Let's start with the 79 file.

At 17:53.27 the sender first realises that the recipient does not have the bundle. However, the TX queue is already full, so it doesn't make it onto the TX queue.

At 18:28.39 it gets queued again, and this time there is space in the TX queue, and it does get queued in position 3 in the TX queue.

At 18:28.39 bundle #66 gets queued, which has a higher priority, so our bundle is demoted to position 4 in the TX queue.

At 18:29.06 bundle #53 gets queued, which also has a higher prioity, so our bundle is now demoted to position 5 in the TX queue.

At 18:29.06 bundles #27, #71 and #56 also get queued, and our bundle is now in position 8.

At 18:29.29 bundles #51 and #55 get queued, and our bundle is now in position 10. One more higher prioity bundle, and it will get bumped from the TX queue altogether until the tree-sync process gets restarted. This happens immediately, as bundle #59 gets queued.

At 19:06.57 our target bundle, #21, gets requeued, however the TX queue is at that time already full of higher-priority bundles, so does not make it ont othe TX queue.

At 19:40.05 it does finally make it back onto the TX queue in position 8.

At 19:40.14 bundle #41 gets queued, bumping our bundle to position 9.

Then at 19:40.49 bundle #27 gets queued, bumping our bundle to position 10, followed immediately by bundle #23, which causes our bundle to get bumped from the TX queue.

Our bundle is next attempted to be requeued at 20:13.42, but the TX queue is already full of bundles with equal or higher priority, so it doesn't get queued.

The next attempt is at 20:43.25. And this looks like it might be the problematic instance. The TX queue is not full, and our bundle has a higher priority than the TX queue's entries, and also compared with the bundle currently being sent. As a result, it is immediately selected for transmission. However, then bundle #28 also gets queued for sending, which bumps #21 from being the bundle currently being sent. However, there is no code that then puts the bundle into the TX queue.

As the TX queue had not overflowed since the last restart of the tree-sync process, no flag is raised to require the tree-sync to restart one last time, and so this bundle will never finish being sent. And indeed this is what happens. Similar things happen for the other two bundles.

So the fix is relatively simple: When de-queuing a bundle that should still get sent, we need to insert it into the TX queue.

I'm heartened that this fault seems to describe the last situation where this problem of bundles not being forwarded can occur.

The only immediate problem I can see, is that there is already code that is supposed to do this. It turned out the problem was that I attempted to re-queue the bundle being sent before removing it as the bundle being currently sent. The queuing logic was noticing this, and not adding the bundle back into the queue, because as far as it was concerned, the bundle was already being sent.

By reordering this sequence of events, I was able to solve the problem, and it now reliably delivers every single bundle. Together with all the performance fixes, we now have things at a state where they work reliably, which is great. So lets get some basic performance statistics from these runs:

We can see that there is considerable variation in the transmission times (and thus speed), with respect to the bundle sizes. That is, for bundles of about the same size, the delivery time can vary by a factor of 5 or so.

This is not unexpected, as the prioritisation of bundles during the tree-sync protocol can cause some bundles to be fully or partially transmitted, before tree-sync realises that there is a higher priority bundle. When this occurs, the bundle that pre-empts the lower-priority bundle will appear to take longer to deliver, because part of one or more other bundles will have occurred during the time accounted to the higher-priority bundle. Conversely, if a bundle has been partially transferred before being pre-empted by a higher priority one, the pre-empted bundle will appear to deliver more quickly, because only the time from when it resumes transmission will be accounted.

Otherwise, we see that the average data throughput, excluding the approximately 400 bytes of Bundle Manifest, improves gradually with increasing bundle size. For larger bundles we see transfer speeds of around 40 bytes per second, which is about half of the channel capacity of ~93 bytes per second we discovered previously. This is still not ideal, but given that it includes all the overheads for the transmissions to occur, to be prioritised and for synchronisation of bundle lists between the two parties, its not too bad. Note that these figures are for one direction only, with similar results for the opposite direction.

If we look at the overall bundle transfer times, we see that this varies between about 50 and 200 seconds, only somewhat influenced by the size of the bundle's payload. This corresponds to message delivery rates between 2x (3600/50) = 144 and 2x (3600/200) = 36 bundles per hour over an HF link. That is, between ~850 and ~3,500 bundles per day. Again, while higher figures would be welcome, being able to deliver one to a handful of text messages very few minutes would seem to be sufficient for delivering a useful level of communications between small isolated communities. Round-trip message send and reply cycles would likely be of the order of 2 to 10 minutes, which again is sufficient to facilitate useful communications.

So at this point we have reliable and reasonably performant delivery of Serval Mesh traffic over Codan HF data modems. The next step will be to try to setup a real HF link with two radios spaced many kilometres, or perhaps a couple of hundred kilometres apart, and see how it performs in practice with some real message traffic.

Otherwise, I'll now be turning my attention to implementing LoRa radio support for Mesh Extenders to help make lower-cost and more internationally portable Mesh Extenders, so that we can experiment with more interesting mixed-radio networks. Because the Wi-Fi links between Mesh Extenders are self-organising, this will allow such mixed-media networks to be easily built. It will then be interesting to see how the system behaves with such networks.

After that, I will start looking at implementing "affinity tags" so that when we start connecting large numbers of such networks together, that we can prevent traffic being replicated places it doesn't need to go. We should also finally implement the logic for detecting whether senders or recipients are likely to be located on the far end of an HF link, so that we can also factor that into the prioritisation.

Monday, September 21, 2020

Continuing working on the LBARD HF protocol efficiency problems

Ok, back from the last post where I was thinking about some longer-term future approaches, to trying to get the current code working as well as it can.

The problem I am seeing at the moment, is that the same blocks are being sent repeatedly, while other blocks are not being sent at all.

One problem was that pieces of the manifest were being marked as sent, when they hadn't. This was due to an out-by-one error in the comparison of the extent of a block being sent, compared with the actual data. The effect was when a block was sent, both it, and the following block were marked as having been sent.

This seems to be a common factor in many of the slow transfers. It has only started showing up now, because we were previously testing with payload-less bundles, and so the manifest would keep being resent, which would then have a higher chance of re-sending the missing part through the normal cyclic re-transmission.

So let's set a test run going, and see if it solves the problem.

Well, its helped, but close examination of the log files reveals that there are still pieces being counted against the wrong blocks, due to a duplicate code-path. I've now de-duplicated that. I also notice that sometimes manifest or body pieces would be counted against the other as having been sent, which of course is also causing problems. So I have fixed those as well.

On the plus side, its pretty how these sorts of bugs can be causing these problems: If we are marking things that have not been sent as having been sent, then we are not going to send those any time soon. Instead, we will keep sending the same pieces over and over, because we think we haven't sent them.

I'm also seeing the situation that the sending of a bundle can get reset part way through, which then makes it forget the transfer bitmap information for that bundle, resulting in it re-sending pieces it has already sent. This might be because an old feature that was built in that causes the tree-sync process to restart periodically, to avoid lock-up in the protocol, which in turn causes the selection of the currently being sent bundle to potentially change. However, the root causes for those lock-up have all been fixed now, so that can probably be removed, if it is still there. I'll investigate that while the new run with the previous bug fixes in place runs.

Yes, it now looks like the resetting of sending bundles is the most significant remaining problem. As mentioned, this causes the knowledge of the state of a transfer to be lost. There are two approaches to solving this:

1. Don't switch which bundle we are transferring.

2. Remember the state of other bundles, so that we can continue from where we left off, if we do switch which bundle we are transferring.

Option 1 sounds great in theory, but the synchronisation process may well interrupt the transfer of a lower priority bundle with a higher priority one, for example. So this is probably not possible to achieve in practice. Although I will instrument the changing of the current bundle being sent, to find out why it is changing.

Thus it probably makes sense to implement something to remember the progress of a bundle when switching. This can be kept in a nice little cache somewhere. Of course, this is one of several complications that would go away if/when I get to implementing the new approach I wrote about in the previous blog post. But for now, the cache will have to do.

I've now implemented this caching of transfer progress. However, the first run after implementing in it showed a weird situation where bundles transferred quite regularly in one direction, but none in the other. In the direction where traffic was flowing, I saw 6 bundles received fairly quickly, with all of them transferring in 50 to 80 seconds. That's the good news. However, there would then be a gap ranging from a couple of minutes to half an hour with nothing being transferred. Then the same six bundles would be transferred again.

I think what happened is that one of the servald instances crashed or didn't start properly. As a result one party had no bundles to share, and could not store them into its Rhizome database, and so the same transfers happened repeatedly. That might have just been some freak event that caused one of the servald instances to not have started properly. To find out, I'll run the test again, and see what happens.

Second run worked properly in that regard. Some bundles transferring at a reasonable speed, but others takeing a VERY long time, upto an hour or so.

One bug spotted: We don't mark the final body block as sent if it is a partial block. This looks to be able to cause the final block to be sent many times repeatedly, which is clearly not good. I've tried to fix that now.

The other bigger problem, is that the bundle keeps getting de-queued and re-queued, which of course is causing a lot of delay in transfer. It also seems that at least one code path can de-queue the current bundle being sent, without stashing its progress for later retrieval.

Found at least one: The sending of a new generation ID by the sender, which causes us to restart the tree-sync process. In that code, we stop sending the current bundle. However, we should just keep on sending the same bundle, unless and until we find a better bundle to send. This is likely the cause of considerable delays, because the synchronisation will run for potentially several minutes before it finds a new bundle to send. That's likely to be it. We should probably reduce the generation ID rotation rate for HF, anyway, since it doesn't make sense to reset the tree-sync process so frequently on a high-latency link.

All these problems keep making me think about the new tree-based approach I wrote about recently, as there would be no need to do any of this, and resumption of transmission would happen in a much more natural and efficient manner -- even if the lbard process really did die and restart.

Anyway, back to the existing code, it now looks like transfers are happening reasonably sensibly. In trying to make a script to show the sequence of transfers, I am however still seeing that the currently selected bundle for transmission tends to change more often than it seems it should. This led me to look more deeply into the TX queue management, and I'm seeing some odd things there, where some bundles added to the queue seem to disappear, and others that were never reported as having been added to the queue turn up in there. So I'll add some more debugging for that.

Yes: The TX queue is being completely purged from time to time. This is again the resetting of the generation ID, the unique identifier that is used to indicate when an LBARD instance restarts. Apparently I was resetting the generation ID every four minutes. Thus it is no wonder that the TX queues are getting purged and we are ending up stuck in a synchronisation hell. Instead, the generation ID should only be changed when the TX queue at one end or the other has overflown, and thus been emptied, and so the sending side doesn't know which bundles it should have sent when the TX queue overflowed.

Given that the transfers are now otherwise working fairly efficiently, I'm hopeful that this will get us to some kind of reasonably reliable transfer with steady transfer times for bundles of a given size.

Running more tests, I have found and fixed a few more problems:

1. If the carrier is lost in a HF modem call, we can end up in a situation where no more normal packets get sent, presumably because the number of unacknowledged packets is high, causing the packet rate to be practically zero.

2. Particularly with bundles that require many data packets, they end up slowing down in transfer over time, and eventually transfer halts. The halting is probably due in part to (1), but the reducing transfer efficiency must have another cause.

3. Some received pure data packets are corrupted, which is a probable significant contributor to (2).

To fix (1), I have reset the TX sequence number and last acknowledged sequence number whenever a HF call starts.

For (2), I need to examine what is going on in the logs of runs, which will probably make more sense once I have fixed (3).

For (3), I suspect that the modem's TX buffer is being easily overwhelmed. This is in spite of the fact that I have hardware flow control enabled. It is possible the hardware flow control doesn't work properly with the USB serial adapters. In any case, I have implemented a check before sending pure data packets, so that they will not be sent if no normal packet has been seen in the last 30 seconds. This should stop the buffers getting too over full. But a fuller investigation will require that I log every byte sent to and received from each modem, and then compare the two streams to see exactly what kind of data errors we are experiencing. Is it lost bytes, as I am suspecting, or is it bit errors, for example?

So adding the logging of bytes sent/received, I can see that indeed something is going wrong with the pure data packets almost immediately. This makes it much easier to see what is going on, since I don't have to wait an hour or more each time. Now to look at the TX and RX logs to see what is happening...

We are indeed missing large slabs of data. Taking a look at the FTDI USB serial adapter documentation, it looks like you cannot safely write more than 32 bytes at a time, if using hardware flow-control, as that is the size of the FTDI chips internal buffer. So I've modified the serial writing code to write 16 bytes at a time, to see if that fixes the missing bytes problem. However the code to back-off due to hardware flow control doesn't seem to get triggered at all, and the problem with corrupt data continues. Found one problem with the TX and RX logs, where the pure data packets weren't being included. Fixed that, now lets see how it looks...

Ok, so now I am not seeing any more lost bytes at the start, but the pure data packets were still being munged. That problem turned out to be caused by the truncation of the data packets on the receive side. I also found and fixed a couple of segmentation fault bugs in the process. Now re-running...

Still seeing corrupted packets. It turns out the pure data packets were not being properly escaped before sending, thus if they contained a "!" character in their data or header, it was messing things up. Fix that now, too. This could well be causing the failure to transfer larger bundles efficiently, as it would effectively stuff up the transfer of larger pieces more often. So its satisfying to have found this problem. Let's see what it does to the transfer rates...

Yes, bundles now seem to transfer in between 30 and 200 seconds. However, now I am seeing that the same bundles get sent more than once. I suspect that this is due to the synctree reset that happens when the TX queue is emptied, but overflow of the TX queue was previously recorded. This tells LBARD that it needs to start the sync again. However, if the bundles were received, then they should not get transferred again. Looking in the logs, I am seeing HTTP 400 errors when trying to insert the bundles into the rhizome database. Thus we are still having a problem with corruption during transfer. I'll have to look at the reassembly of bundles. My suspicion is that the problem will be with the pure data packets somewhere.

The corruption problems have now been solved. Part of it was that writing in 16-byte pieces helps the serial over USB transfers, but messes up the HTTP transactions. So I have split the code to only do that for the serial transfers. That fixed the problem with bundles not making it into the Rhizome database, and transfers now proceed progressively through more than just the same 10 bundles repeating.

However, I am still seeing the HF call dropping out sometimes. Those tend to recover fairly quickly, and might be legitimate loss of carrier in the radio setup.

More concerning though, is that there can be a period of half an hour or more where no transfers in either direction occur. These are quite mysterious, as the HF modem doesn't seem to drop out, but the logs of both sender and receiver simply show nothing at all for a period of 46 minutes in this particular case. I guess I will need to run it again, and try to catch when this happens.

This time we hit a segfault after almost an hour and transferring 80 bundles (between both directions) in about 64 minutes. The problem is that sometimes rubbish ends up in the TX queue. More specifically, the priority of a bundle ends up being stored in the bundle ID field of a slot in the TX queue.

So I've put more debug instrumentation in to find out how the TX queue list is getting corrupted. In the process, I also fixed the case where if a bundle is currently being sent, it could still end up being queued for later transmission as well, e.g., if the tree-sync process is restarted at some point.

I've also added code that detects when the top of the TX queue ends up containing an illegal bundle number to reduce the impact. However, I would really like to fix the error at its cause. What I believe is most likely the problem, is that the shuffling of the TX queue has an out-by-one error somewhere, as the bundle numbers occur immediately before the list of priorities. Thus if one too many entries were copied down, it would copy a bundle priority.

So time to start a new test run...

Its now running through a lot happier. 157 of the ~220 bundles have transferred, but it has taken about 4 hours to get that far. But it's not as bad as it sounds, as it seems that my laptop somehow stops USB communications while the display is locked. This means that if I don't periodically prod the computer, long periods of no transfers result. I'm pretty sure that this is the cause of this problem that I noticed previously.

That said, I am also seeing it able to get into a hang up and redial loop, where no packets get sent in the meantime. I assume that the time to next packet TX is somehow being set wrongly. Or more the point, it can grow large, and wasn't being reset when a call was re-established. I've examined all the code-paths that can get there, and added explicit resetting of the next packet TX time.

I tracked down the "nothing sent for an hour or more" problem: It's actually not with LBARD, but with the laptop I am using. It automatically suspends after 2 hours of inaction, even when connected to mains power. This doesn't really help when it happens mid-run. So I've disabled that in the laptop, and should now be able to run transfers without this problem.

I've also looked through and found the problem that was causing the illegal bundle numbers ending up in the TX queue, and fixed that. The problem was indeed an out-by-one with the list shuffling code, as expected. I'm expecting that this will likely fix the problem where only 199 or so of the 202 bundles were being transferred, as it could result in the situation where the TX queue loses an entry or two without actually overflowing to trigger the re-start of the sync process to re-discover the bundles that were lost from the queue. Thus I'm hoping that runs now will correctly transfer all 202 bundles -- 101 in each direction. It might also end up being marginally faster, because fewer sync restarts will be required.

In short, I'm now hoping that I am at the point where transfers run reliably and in reasonable time. I'll do a few runs to get some data on the typical speed and variation, but then we should be good, I think. What I would like to do next is to plan some actual long-distance tests using the HF radios, with one here at the house in Akaroola and another some distance away. Perhaps in a neighbouring town. However to do that, I will need to get some antennae and coordinate with folks in the amateur radio community to help with that.

But as this post has already gone on for ages, I'll stop here, and start a new post looking at the remaining TX issues.

Tuesday, September 8, 2020

Thinking about different ways to do the tree-sync for LBARD

In the last few blog posts I've been fighting my way through fixing bugs with LBARD's synchronisation algorithms. While both the tree-sync and the actual transfer of bundles have been problematic, the tree-sync side of things has been niggling at me for HF use, in particular. One of the reasons is that tree-sync has to start its synchronisation again from scratch every time a radio connection is made. This already takes several minutes, with only relatively small numbers of bundles. Even if neither end has new bundles, the entire synchronisation process has to occur again. Also the very high latency of the HF radios doesn't really agree with tree-sync, and can result in considerable delays in synchronisation.

Way back around 2014 we had started thinking about reworking the way that Rhizome stores data, by having it use a block-store to represent the bundles. That is, instead of keeping a manifest and payload, we would have a kind of virtual file-system in which the bundles would be represented. Data blocks would be identified by the hash of their contents, thus providing automatic de-duplication, and making it much easier to cache data.

Now, reworking all of Rhizome to do that would require significant effort. However, what *might* be doable in reasonable time, is to make the list of bundles an instance holds be represented in such a way. This would allow each end of an HF link to cache this virtual file-system representation, and thus re-synchronise very rapidly if none or few new bundles have appeared at each end.

My current thinking is along the following lines:

1. Blocks should be small enough to fit in a single LBARD packet. This realistically means a limit of about 200 bytes.

2. We want to minimise the number of nodes in the tree, as each additional node requires an extra round-trip.

3. The small block size means that we need to use relatively small hashes for each block. Even if they are too short for strong cryptographic security, they should still be strong enough to keep the chance of accidental collisions practically zero.

4. Each block can consist simply of leaf nodes or pointers to child nodes, or a mix of the two. The position of the node in the tree is implicit.

5. It would be nice to have one hex digit of identifier covered per block. This means 16 pointers to sub-blocks have to be able to fit in a block. 200 bytes / 16 = 12.5 bytes per entry. This probably means that 12 byte hashes = 96 bit hashes will be the longest possible. That's ample to keep us safe from Birthday Paradox accidental node collisions (should be safe for up to 2^48 blocks).

6. If all the child nodes for a block can fit in a block, then that is where they are placed. This avoids having unnecessary intermediate nodes in the tree that would otherwise have to be synchronised.

7. If all the child nodes can't fit in a single block, then we need to have a deterministic way to represent this, so that we don't end up with a proliferation of blocks as we fill nodes.

8. The simplest approach to (7), is to require that an overfull block child blocks for each of the 16 entries it can hold. This may mean that some child blocks point to the empty node (which because of the hashing, will always be the same).

9. The above formulation means that each bundle (or more correctly, version of bundle), will have a deterministic node, which each side can enter into their block cache, thus avoiding the need to ever transfer a block that corresponds to a bundle that the other party already has.

10. Each side can request blocks from the other, to begin exploring the tree structure. Any common sub-trees between the two sides will be implicitly available.

11. The block cache can persist between executions of LBARD, thus allowing the complete state of the other party to be remembered between radio sessions, thus greatly speeding up the initial synchronisation when a radio session commences.

12. Multiple blocks can be requested at a time, thus helping to mask high latency. In fact, simple optimisations like "please send me block X, plus any children (and optionally further descendents) of it that you don't think that I have" can be used to save further round-trips.

13. When a block is received, its position in the tree is known. This means that in some cases, it will be possible for the receipient to work out the content of the child nodes, if the sender has a strict sub-set of the sender's bundles that are in that part of the tree. At its simplest, a leaf node can be constructed for every combination of the bundles that it contains. If any of those match the block ID of the child node indicated in the received block, then there is no need to receive those child-blocks. Instead, not only can we avoid the need to send those blocks, we can actually immediately infer that the sender is missing those bundles not in the combination that resulted in the matching child node.

14. The approach from (13) can be extended further up the tree if desired, to help save bandwidth, at the expense of storage space and computation. At some point, however, the number of combinations that need to be tested becomes excessive, as each additional bundle in the portion of the tree being considered potentially doubles the number of combinations to be computed. However, because each leaf node is restricted in its potential content, the growth in complexity is actually rather different (leaf nodes are only formed when the collective leaves don't fit in a single block). The exact complexity of this will need to be modelled.

15. In theory, if the complexity of the above can be handled, then sending just the top of the tree will allow the recipient of any node that contains only a subset of the nodes the recipient possesses, without any further data exchange. This is more of theoretical interest than practical effect, except for the leaf-node optimisation described in (13).

16. The bundles themselves can also be represented as tree structures, so that similar techniques can be used. This also allows for implicit transfer of common data sections, including in journal bundles that have invariant starts. Of course, this ability to transfer common data sections will only work for those that have such commonalities that are aligned with the block size, or better, aligned with a leaf-node of blocks.

17. This approach allows for implicit caching of all transfers, making resumption of communications with other nodes and of bundles much more efficient.

18. This approach also solves many of the authenticity issues with transfers in LBARD, where the transfer of a large bundle could be poisoned with a fake block, or even just by a bit error that escaped the error correcting code. This is a significant advantage of this approach.

19. This approach effectively trades-off storage space (which is cheap and plentiful) for algorithm complexity and RAM requirements (which is in limited supply on the kinds of small embedded devices we want to target, such as low-power meshing phones etc). This is thus a very nice side-effect of this approach.

Anyway, I'll have a think about whether I implement this now or not, depending on how feasible it looks to get acceptable performance with the existing algorithm.

Monday, September 7, 2020

More work on LBARD over Codan HF Data Modem

In the last post, we got to the point where we had Rhizome bundles flowing over the modem, but not as fast as we would like. Also, we were still seeing some problems where bundle sending would get stuck. I've been away for a couple of weeks, but now am back at the bench with the HF radios, and can investigate further.

The first thing I realised that I should investigate, is what the actual throughput of the modems is, so that I can determine what fraction of the theoretical throughput we have available.

To do this, I instrumented LBARD to count the number of bytes, and divide that by the number of seconds since the first byte was received. This yielded results of about 91 bytes per second in each direction. Given the channel theoretical capacity of 2,400 bits / sec = 300 bytes / sec, this is noticeably lower than expected. It means that we are only actually transmitting data (91+91)/300 = ~60% of the time.

Now, if we were just transferring in one direction, instead of switching all the time, we could surely improve that, but we know that there is less than a factor of 2 available through such optimisation. So I'm not too worried about that right now.

What is more interesting is that we should be able to get near to 90 bytes per second, but are still languishing down around the 20 bytes per second mark due to protocol issues. That is, we have a factor of 4 that we can improve, without having to increase the channel capacity or use strategy at all.

Having spoken with one of my students who is working on a debug framework for LBARD, it is sounding to me like the packet bitmap that is used for acknowledgement and scheduling of parts of a bundle that the receiver doesn't already have is not working properly. If this is the case, it would be causing the sending side to send parts of the bundle that the receiver already has, effectively wasting bandwidth.

Looking through logs of previous runs, I can see one problem already: We schedule sending a bundle at the end of the file, even if we have already sent to the end of the file. That is, we send 0 bytes of data, positioned at the end of the bundle payload. This is clearly silly, and should be fixed. What we need to watch out for, though, is that we use end-of-payload packets to implicitly indicate the payload length in LBARD transfers.

We are also seeing the 4 data bytes being sent quite a number of times, before the missing parts of the manifest are sent through. That resending needs to be fixed. But it also occurred to me, that if we randomly select a point in the last 64 bytes of the manifest or payload, when we will only send <64 bytes, when typically we have space in the packet for 128 or 192 bytes. I suspect that this has the potential to greatly improve latency for small bundles, by basically eliminating such very low efficiency packets. Together with preventing the re-transmissions mentioned in the last paragraph, this should be able to get the latency of a sub-1KB bundle down to perhaps 15 seconds, which would be great improvement.

I'll try these improvements, together with some tweaking of debug output, so that I can more easily follow when particular bundle pieces are sent, and what the list of candidate blocks at a given packet transmission are.

This should have been a simple-rerun, but then I hit a segfault caused by something unrelated, and now running it in gdb all sorts of strange things are happening, causing one of the instances to not properly switch to the online data mode when the modems connect. Well, I found what caused the whole crazy problem to start: When I copy-pasted the arguments into gdb, I left the "1" off the front of "127.0.0.1", thus causing long delays while waiting for "27.0.0.1" to resolve, which caused the protocol to time out in various exciting ways. So let's try that all again...

Ok. Now it is running again, after fixing that. I also found and fixed some other bugs. It is now transferring small bundles in typically 20 - 50 seconds. This could be quicker, but it is a start. However, I am seeing instances where a bundle is received, but then fails to insert into the Rhizome database with an error 422. This means that either the manifest is too big, or that the payload doesn't match the hash. The bundle is then received a second time, and gets inserted. This means that there is some problem with the integrity of the transfer.

My first suspicion was that the pure data packets I implemented recently might be the problem. But in this case, no such packets had been sent at all for the bundle in question. As the packets are in any case protected by Reed-Solomon error correction, this would seem to suggest that the contents of the packets were being incorrectly assembled.

Actually, it turns out that this happens fairly often. This is good, in that it probably explains a lot of the poor transfer performance: Some bundles are having to be sent two or more times, before actually being accepted. So now to find out why.

It looks like decompression of the bundle manifests fails, resulting in the end of the manifest being messed up, and not passing its cryptographic self-consistency checks. It is the end of the manifest that gets messed up, so I am guessing that somehow another, either the length of the manifest, or the last partial 64-byte block of data for the manifest doesn't get properly received. It should hopefully not be too hard to find, as it is happening quite frequently, with probably 20% of bundle transfers being affected by it.

To track it down, I'll make both sender and recipient parties display the compressed manifest data, so that I can play spot-the-differences.

So looking at the compressed manifest on reception, it looks to be the right size, but has all zeros in the last data blocks. The problem was that when I made the optimisation to send not just the last fractional block, but as many 64-byte blocks that come before it that can fit, I hadn't adjusted the start point of the packet data being read, thus it was missing the extra bytes. So time for another run, to see how it goes.

Right. So now we don't seem to have the problem with the code 422s, which is good. However, we have the problem that sometimes we keep receiving a bundle repeatedly after receiving and registering it. This seems to be because we don't get a confirmation message from servald that the bundle is now in our rhizome database. As a result, we don't cull the duplicate reception of the bundle. It also means that we can't immediately start sending the bundle to other peers.

This isn't such an issue for the HF modem driver, because we don't have other peers to talk to. However, for Mesh Extenders with the UHF packet radios, this is a big problem, as bundles won't get forwarded on to other peers until this occurs. Thus we need to find and fix this problem.

LBARD communicates with servald via servald's HTTP RESTful interface. In particular, it listens for new bundles via an asynchronous API, which means that we should get indication from servald essentially immediately on inserting the bundle. So I presume something is fishy in that. To investigate this, I have instrumented the code that communicates with servald, to find out where the problem is.

While looking at the logs for these delayed bundle notifications, I am also seeing that some bundle transfers are slow, as one end is not sending any data packets at all to the other end for 10 -- 20 seconds at a time. This should never happen, especially once the initial synchronisation has occurred, that lets each end know what the other needs. In the current case, I am seeing a delay of 24 seconds between when successive pieces of a bundle are sent. So we know that the remote end thinks it has a bundle that is worth sending.

I'll start by investigating one of these intervals, from 17:51.02 to 17:51.26.

During that time, we do receive several packets from the sender at 17:51.10, 17:51.12, 17:51.15, and 17:51.21. That is, we have received four consecutive packets that seem to contain no pieces of bundles. Okay, so it seems that the synchronisation process is continuing to run, and is sending packets of ~200 bytes of synchronisation information.

Thus these delays are a false alarm, caused by the natural synchronisation process. What is a bit worrying though, is that the synchronisation procedure is taking a long time. With 100 unique bundles on each side, synchronisation is taking at least tens of minutes. In this example, I am seeing of the order of 300 sync messages of close to 200 bytes each -- and it shows no sign of settling down.

This amounts to somewhere around 68 KB of data out of a total of 84KB sent.

Well, that's very interesting: It explains why our goodput is so low, and generally, why transfers are going so slowly.

Thus together with the Rhizome bundle notification delay, we have to bugs that are able to greatly slow things down. I now have something quite definite to concentrate on.

I'll still work on the Rhizome problem first, because I am much more familiar with that code.

The synctree code is going to be trickier to fix, because I didn't write that. What I suspect might be the issue with it, though, is that the sync tree code doesn't expect to be used in an environment where multiple packets get queued up. This means that when it sees a response to and old packet of its own, it might be getting confused, and starting the tree traversal from higher up the tree again.

Thus the potential fix for this is to find a way to make sure we have only one packet in flight in each direction at a time. This would be a good idea in any case, because it would help to minimise the latency when a bundle is received, so that the sending party doesn't keep needlessly sending parts of that bundle after it has been received.

Well, actually I am going to fix this one-packet-in-flight issue first. But first, a brief explanation as to why we do this syncing this way in the first instance, rather than, say, just exchanging lists of bundles, and then sending the bundles that the other side needs to get -- because that would surely be more efficient.

The reason is that a simple "exchange lists" and then bundles method would be fine if no new bundles ever arrive during a communications session. But we want to be able to deliver messages with reasonable latency, even if they arrive while a connection is active. That is, we would like to have live HF links between Serval Mesh islands, and have those act as near-real-time data conduits.

Now, back to the need for the sync algorithm to not be several packets behind, I have implemented this, and it seems to be working somewhat: The sync traffic is flowing, and is no longer filling the packets. However, the rate of packet exchange is VERY slow. The average raw data rate in each direction is only about 8 bytes per second. It looks like each end thinks that it has 2 unacked packets nearly the whole time, which is causing it to throttle back the packet rate. I am suspecting that the low data rate is interacting in a bad way with the modems trying to be efficient, and waiting until there is enough data to be bothered sending.

But back to thinking about having only one packet in flight in each direction at a time, this means we will always end up with some wasted time, because one end or the other will almost certainly be waiting most of the time. Investigating further, the round trip time for a packet turns out to be around 15 seconds (!!).

Even allowing for two maximum size packets of 255 bytes, the propagation time would still only be (255x2 bytes)/(2400 bits per second) = 1.7 seconds. This makes me all the more suspicious about the modems trying to batch data together for efficiency, which in other cases would work.

So we need a solution to this problem. The previous work I did on inserting the occasional pure data packet might actually be a good solution here: Basically it means that we have a background level of sync exchange through the existing packets, and then we greatly accelerate the exchange of bundles via the pure data packets, with the added benefit that they should help to flush the modem buffers.

I'm now reading through the manual for the modems, with a view of disabling compression, in case the compression is causing increased latency. In the process, on page 6-2 of the "HF Data Modem 3012 Reference Manual", I have found some helpful information: While the data channel can in theory carry upto 2,400 bits per second, the maximum effective throughput is only 1,475 bits per second, i.e., 184.375 bytes per second. This explains why I was seeing data rates of about 92 bytes in each direction. While the extra bandwidth would have been nice, it does at least give me a clearer idea of what the maximum realistic capacity is.

On page A-3, I also found *T, which allows setting the timeout before sending data packets. This is important, because any non-zero value will increase the latency of packet transmission. This setting can be set from 0ms to 120000ms. The default is apparently 0ms. I'll set it explicitly in any case, just to make sure... Except the modem complains about the command. It is oddly listed as only *T, not AT*T, but *T also does not work. I tried *0, in case the T is supposed to be the value itself.

Also AT&M=5 is supposed to select the highest data rate and optimise for interactivity, i.e., reduced latency. That was already set.

Doing some more direct tests with the modems, I timed how long it takes after I paste a 62 byte string into one modem, before it appears at the other end. This reliably takes 7 to 8 seconds, thus confirming this latency problem.

The issue we have, is that our synchronisation protocol currently does not support having multiple packets in flight, as it was designed with radios in mind that can reliably indicate when they are ready to send a packet, and then send the packet immediately, thus allowing each packet to be build just-in-time, to ensure it has the most appropriate content in it.

In fact, just pressing a letter and waiting for it to appear at the end has the same latency of 7 - 10 seconds, so it isn't a bandwidth thing. Rather, it seems that the modems make at least one full turn-around each, before the data gets sent. Is this normal?

I did also find the command to disable data compression, AT%C0, and I am now testing that to see if it makes any noticeable difference.

(Another useful command I have found is AT&C=n,n,n,n,n, which allows providing the modem with a list of channels on which to listen for incoming calls, which I'll use later, and record here so that I don't forget about it ;).

But back to AT%C: Disabling data compression DOES make a difference: It drops minimum latency from around 7 - 10 seconds, to only about 4 seconds. That's a big help, and should greatly improve our throughput. With 4 seconds in each direction, and sending packets of up to 232 bytes, this should get us a data rate of up to 58 bytes per second of the theoretical 92 bytes per second available.

Thus, while it is a good step forward, it isn't enough to completely solve the problem. We still will need to send some pure data packets to make full use of the bandwidth. Part of me also wonders if disabling data compression might help with the buffer overrun problem we were having previously, that was limiting the size of the data packets we could send between normal LBARD packets.

But before we make further changes, we should time how long it is taking to send small bundles (without payloads in this case), so that we have a base line to compare against. Having ~4 second latency in each direction instead of ~8 seconds, means that we will detect the end of bundle transmission on average 8 seconds sooner, which should provide a noticeable improvement.

Sending only the normal LBARD packets, without any extra data packets, and it fails, because of the previously reported problem with payload-less bundles. I've now put a fix in for that, and that helps.

I'm also seeing another problem which is when the receiver tries to send a Bundle Announce Response (BAR) to indicate to the sender that it has received the bundle, we hit a problem if the packet contains another piece of the same bundle. That 2nd piece causes the receiver to emit an ack for that piece, which replaces the BAR in the transmission queue. As a result the sender never hears that the sender has the bundle, and keeps sending pieces of it. This is likely a key contributor to the highly variable receive times we have been seeing for bundles.

There are two fixes for this that should both be done:

1. ACKs should never replace BARs in the progress report TX queue.

2. We should simply ignore pieces of bundles we have marked as recently received, and never generate the ACKs for them, but instead produce BARs for them, to indicate that we have already received that bundle.

With that fixed, along with some other tidy-ups, we are now ready to do the comparison again. We will count the number of zero-length payload bundles (which thus involve sending about 200 -- 220 bytes of compress bundle manifest, and all the house-keeping) that we receive in 1,000 seconds, and the total bytes transferred in that time, so that we can calculate the average bundle transfer time, and the goodput.

Again, we will do this with no extra data packets, then with a 128 byte data packet after each normal LBARD packet, and then again with sending a data packet every 2 seconds, independently from the LBARD packets. We will average for both directions over our test between the two radios.

With 0 byte payloads, i.e., just manifests:

No data packets:

A -> B : 16.66 bytes per second, 13 bundles transferred

B -> A : 16.93 bytes per second, 13 bundles transferred

Sending 1 x 128 byte data packet after each normal LBARD packet:

A -> B : 25.74 bytes per second, 19 bundles transferred

B -> A : 25.26 bytes per second, 21 bundles transferred

Sending 1 x 256 byte data packet after each normal LBARD packet:

A -> B: 33.22 bytes per second, 26 bundles transferred

B -> A: 33.06 bytes per second, 33 bundles transferred

Sending 1 x 256 byte (including headers) data packet every 2 seconds, as well as normal LBARD packets:
A -> B : Modems hang up after a while, possibly due to data overrun, as bytes per second climbs to ~90 bytes per second.

B -> A : Modems hang up after a while, possibly due to data overrun, as bytes per second climbs to ~90 bytes per second.

(this makes sense

Sending 1 x 200 byte (including headers) data packet every 3 seconds, as well as normal LBARD packets:

A -> B: 48.13 bytes per second, 29 bundles transferred

B -> A: 48.18 bytes per second, 26 bundles transferred

Sending 1 x 256 byte (including headers) data packet every 3 seconds, as well as normal LBARD packets:

A -> B: 78.37 bytes per second, 23 bundles transferred

B -> A: 80.62 bytes per second, 21 bundles transferred

Sending 1 x 200 byte (including headers) data packet every 2 seconds, as well as normal LBARD packets:

A -> B: 68.59 bytes per second, 23 bundles transferred

B -> A: 68.86 bytes per second, 23 bundles transferred

Note that just because we allow a 256 byte data packet, doesn't mean that one will get sent: The actual packet sent might be quite a bit smaller, if there isn't enough data for the bundle currently being sent to stuff in.

We also see that in some cases, the bytes per second is higher, while the number of bundles transferred does not increase, or in fact goes down. This will most likely be due to the re-transmission of the same data blocks over and over, while waiting for acknowledgement of reception of the whole bundle. Indeed, switching from 256 byte to 200 byte data packets with the same 3 second interval, we see no decrease in bundles transferred, although the number of bytes sent per second reduces.

Once we move to bundles with non-zero-length payloads, this effect should reverse. Another effect that is likely at play, is that if we send lots of data blocks between the LBARD packets, then we will increase the rount-trip-time of the LBARD packets, which will slow acknowledgement of bundles, and thus reduce the number of bundles that can be received per unit time.

Another factor that must be taken into account here, is that the first bundle transfer does not typically take place until around 120 -- 180 seconds into the 1,000 second test. Thus once actively communicating, the actual time per bundle will be lower than suggested here.

So now to repeat those tests with non-zero payloads for the bundles... Which showed up a bug in payload handling, which was quick to fix. More critically, it showed up an issue where receivers sometimes don't recognise packets as being addressed to them. This means that when the other party sends them pieces of a bundle that they have already received, they don't tell the sender that they already have it.

The mechanism that is playing up stuffs each message containing a piece of a bundle with the first two bytes of the SID of the receiver. This works fine for some time, but after a while, the receiver finds garbage in this field. What is not clear is whether it is a data handling bug, that corrupts it on the receive side or during assembly on the transmit side, or whether the field that contains a peer's SID on the sender side is getting corrupted. This takes of the order of 15 minutes to occur.

LBARD maintains both ASCII-hex and binary representations of each peer. If these end up with different values, then we can assume that memory corruption has occurred. I'll add instrumentation that will help me to see if that is what is going on.

Interestingly, the data that overwrites the binary representation seems to often begin with $47, suggesting that some piece of code might be putting stuff there by mistake. The "Generation ID" message type does put a $47 followed by 4 random bytes. This message type also contains no bounds checking -- it just assumes it has been given a buffer with enough space in it. However, it is only called at the start of the packet assembly process, when it can be safely assumed that there is enough space in the buffer to fit the message.

A few more runs confirms that it is always $47 in the first byte, so almost certainly something screwing up with the Generation ID messages. I'll add some more instrumentation to see what the new generation ID is each time one is received, and if it matches what I am seeing in the packet, then I will know that its to blame.

Ok, while waiting for the run to hit the problem again, I've taken a closer look at the code that parses the generation ID messages, and discovered that it does indeed put the generation ID into the SID prefix field. I.e., a straight-forward bug.

This fixed the problem, and got everything going, but it still seems to be rather slower than it should be sometimes. But investigating the remaining problems can go into the next blog post.