A first prototype of Histo’s architecture has actually been implemented and will continue to be developed.
The components of Histo described in Chapter 6 all describe self-contained functionality and we therefore implemented each of them as a separate module.
At the highest level is the Histo module, which only implements the history tracking functionality described in Section 6.2. All other features are still exposed in the Histo module but delegated to specialized modules:
- histo-sync: Implements the synchronization protocol of Section 6.3
- lca: Implements a lowest-common ancestor algorithm described in Section 2.6
- graph-diff : Implements the commit history difference algorithm of Section 6.4.1
- content-addressable: Implements a content-addressable store interface de- scribed in Section 2.7
- diff-patch-merge: Implements the three-way-merge algorithms described in Sec- tion 6.5
Through this modular approach we enable the support of alternative data structures and other history-based synchronization protocols.
The implementations of all modules are attached to the thesis and can be found online where they continue to be improved [39].
In this chapter we will discuss the technologies used and our approach to ensure cor- rectness of the implementation.
The use of Web Standards is the most common option to implement application logic that can run unmodified on a large number of platforms. Desktop computers, notebooks,
smartphones and tablets all come with implementations of web standards. JavaScript has become the only high-level programming language that can run on any modern con- sumer device. In recent years JavaScript is even increasingly used on server platforms to implement application backends. Node.js has become the most popular server-side execution environment for JavaScript.
As described in Section 2.8, modern browsers come with support for client-side databases in the form of IndexedDB. The IndexedDB implementation of the Chrome Browser is based on LevelDB, which is a key-value store developed by Google [40]. LevelDB is in- spired by Google’s BigTable [41] and was open sourced in 2011. The Node.js community made LevelDB usable from JavaScript in the form of the LevelDOWN module [42]. The same interface used by LevelDOWN is available as level.js [43], which wraps IndexedDB in the browser. This means we have the same database API available both in browsers and on the server using Node.js.
We therefore decided to base the implementation of Histo on JavaScript, using the LevelDOWN interface for the storage layer.
In Chapter 4 we have listed the requirements of a synchronization solution. We will now go through each of them and outline the relevant design decisions we made to fulfill them.
Histo currently uses a hierachical data model as described in Section 6.1. We have shown how to map a typical entity-relationship model to the hierarchical model. Many other types of data models could be mapped to a hierarchy and benefit from the efficient differencing properties of trees. Histo’s architecture is not limited to a hierarchical model - the commit history differencing and synchronization protocol is actually agnostic of the underlying data structures. It is therefore possible to implement support for entirely different types of data.
This property is ensured as our synchronization protocol operates without any locking logic. We will go into more detail on the resulting robustness of our protocol in the next section.
In Section 6.2 we describe how we track the change history of an application’s data. This enabled us to implement three-way merging as described in Section 6.5. Causality is therefore preserved in our synchronization protocol. Section 6.6 describes our approach for handling conflicts in a deterministic manner while leaving it up to the application developer to implement custom conflict resolution algorithms.
The support for a broad range of network topologies is described in Section 6.7. We even went further and identified possible protocol optimizations for each topology.
Histo encapsulates all low-level aspects of data synchronization. We have defined inter- faces in our application which allow the plug-in of custom data stores and appropriate messaging solutions. In the final section we will give examples for ways to improve integration with application logic.
We described our choices for cross-platform technologies in Section 7.1.
Another non-functional requirement is obviously the correctness of our implementation. We implement the Histo modules using a test-driven approach. This ensures that all implementations are continuously checked for correctness as we progress. The limits of showing correctness of software are known - we try to cover the most common cases in our tests. The test suites of all modules are attached to the thesis and can be found online [39].
Part of the test suite is verification of robustness to network partitions. We show that network partitions through any of the phases of our synchronization protocol will leave the database in a consistent state.
This is partly due to our design choice of using a content-addressable store, which never overrides existing data. The entire synchronization process happens asynchronously and therefore does not block the database for writing by the local user. Only once the data has been merged successfully, the node’s head is updated. Even the head update is asynchronous through the use of optimistic locking as described in Section 6.3.2. If the synchronization has to be cancelled at any stage, the node’s head is simply not updated. We therefore avoid the implementation of complex recovery logic.
Due to the separation of data propagation and local merging, the protocol can still efficiently operate during frequent network partitions. The slow data propagation phase can continue where it stopped. Only the comparatively fast local merging phase has to be redone when an error occurs during synchronization.