Dev: 0. Requirements
This page describes the requirements for various phases of Asami since its inception. Most phases were well-defined in their timeframe and feature set, while Phase 3 was a maintenance phase where new requirements were brought in as needed.
Asami was initially built to support the Naga rules engine, which was designed to integrate with any graph database via protocols. Naga's original adapter was for Datomic. However, because Datomic is a commercial product, it was deemed unsuitable for the needs of the organization (ThreatGrid at Cisco Systems, which later became part of SecureX).
The initial requirements for Asami were:
- Source in Clojure
- Open Source
- In-memory storage
- A Graph Database with a subset of Datomic operations:
  - Query language using Clojure data structures
  - Pattern-based triple selection, with variable binding
  - Join operations (inner join)
  - Filtering (removal of rows, based on a condition)
  - Projection (removal of columns)
- Query planner
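To make these operations concrete, here is a minimal sketch of the kind of query they describe, expressed as Clojure data structures. It uses the asami.core API of the current library for convenience (that API arrived in later phases), and the entities and attributes are invented for illustration:

```clojure
(require '[asami.core :as d])

(d/create-database "asami:mem://demo")
(def conn (d/connect "asami:mem://demo"))

;; Triples are [entity attribute value]
@(d/transact conn {:tx-data [[:db/add :mem/alice :person/name "Alice"]
                             [:db/add :mem/alice :person/age 29]
                             [:db/add :mem/bob :person/name "Bob"]
                             [:db/add :mem/bob :person/age 17]]})

;; - each [?e :attr ?v] pattern selects triples, binding variables
;; - sharing ?e across patterns is an inner join
;; - the (>= ?age 18) expression filters out rows
;; - the :find clause projects the remaining columns
(d/q '[:find ?name
       :where [?e :person/name ?name]
              [?e :person/age ?age]
              [(>= ?age 18)]]
     (d/db conn))
;; => (["Alice"])
```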
Each requirement here met a specific need or desire for the overall project:
- All of the ThreatGrid source code was written in Clojure, with the front-end systems built in ClojureScript. Keeping to Clojure allowed for easier integration and stayed within the same technology stack, which made things easier for the engineers.
- Naga was originally a personal project, published as Open Source. When asked if I could bring this work into Cisco, I expressed a desire to keep it open source. Management explained that they had been looking for opportunities to publish tools in Open Source, so this aligned with both of our needs.
- I already had experience building a graph database with a team, and I knew that a complete graph database with storage takes significant time and engineering to build. As a single-person project, it would have been infeasible to take on too many requirements. Keeping it in-memory simplified the system significantly.
- The operations were not taken specifically from Datomic; they are common to most graph databases, and they are the minimum needed for Datalog processing: pattern matching, joins, filters, and projection.
- Because the database was being integrated into a rule system, most queries would be machine-generated, which could make some of them extremely inefficient. Having experience with building such optimizers, I believed it would be possible to include one in this phase of development (see the sketch below).
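As a minimal illustration of why a planner matters, consider the same query with its patterns in two orders. The selectivity argument in the comments is the generic one, not a description of Asami's actual planner internals:

```clojure
;; A machine-generated query, with patterns in the order they were emitted:
'[:find ?name
  :where [?e :name ?name]   ; unselective: binds a row for every named entity
         [?e :age 29]]      ; selective: matches only a few entities

;; A selectivity-based planner evaluates the most restrictive pattern first,
;; so the join only considers entities that already matched [?e :age 29]:
'[:find ?name
  :where [?e :age 29]
         [?e :name ?name]]
```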
This system was deployed as a web service using Clojure Ring and Jetty.
After the initial success of Asami, new requirements were developed:
- Source in ClojureScript
- A new API wrapper
- Importing arbitrary JSON
These requirements came from integrating Asami into the ThreatGrid dashboard application, where it would hold the central data model. Coincidentally, the application had already built its own data model as a graph of nodes and edges, so the existing function calls into this internal graph were refactored into a protocol, and the protocol was then reimplemented to wrap Asami.
Most of the data being processed by the application was in JSON, which was stored separately from other data. To make the rules engine useful for the full data model, we needed a way to handle this data as well. JSON is a tree structure, which is a form of Directed Acyclic Graph, meaning that we could import and export it as a graph. This required a model for handling JSON, as well as a module for converting the data in and out of JSON without losing the original structure.
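A sketch of what this looks like with the current asami.core API, assuming the JSON has already been parsed into Clojure maps (the entity names and data here are invented for illustration):

```clojure
(require '[asami.core :as d])

(d/create-database "asami:mem://json-demo")
(def conn (d/connect "asami:mem://json-demo"))

;; A parsed JSON object is a tree of maps and vectors; transacting it
;; as an entity stores the whole tree as a subgraph
@(d/transact conn
   {:tx-data [{:db/ident    "report-1"
               :type        "report"
               :observables [{:type "domain" :value "example.com"}
                             {:type "sha256" :value "abc123"}]}]})

;; Reading the entity back reconstructs the original nested structure
(d/entity (d/db conn) "report-1")
;; => {:type "report", :observables [{:type "domain", :value "example.com"}
;;                                   {:type "sha256", :value "abc123"}]}
```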
Maintenance and Expansion
Now under regular use, Asami expanded to include:
- Transaction support
- Multigraph support (multiple edges between nodes)
- Aggregate queries
- New query operations: MINUS, BINDING, AND, OR, transitive closures, path discovery (see the query sketch after this list)
- Improved Query planner
- Improved security
- DELETE operation
- Command Line Interface
- Arbitrary nodes for direct JSON representation (e.g. strings)
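As one example of the new operations, transitive closure is expressed by appending `+` to an attribute in a pattern. A minimal sketch with invented data:

```clojure
(require '[asami.core :as d])

(d/create-database "asami:mem://graph-demo")
(def conn (d/connect "asami:mem://graph-demo"))

@(d/transact conn {:tx-data [[:db/add :mem/a :linked-to :mem/b]
                             [:db/add :mem/b :linked-to :mem/c]]})

;; :linked-to+ matches paths of one or more :linked-to edges,
;; so :mem/a reaches both :mem/b and :mem/c
(d/q '[:find ?node
       :where [:mem/a :linked-to+ ?node]]
     (d/db conn))
;; => ([:mem/b] [:mem/c])
```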
A new team struggled to work with Asami because of its unique API. This led to adopting a more familiar API:
- Datomic API
- New data structures to encapsulate state:
  - Connections
  - Databases
  - Datoms
- Speculative transactions
- Searchable transaction history
- Upsert and Update operations (changing a field on an object)
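A sketch of the Datomic-style usage this enabled, with invented data: a connection holds mutable state, while a database is an immutable value, so earlier snapshots remain queryable and history can be recovered with `as-of`:

```clojure
(require '[asami.core :as d])

(d/create-database "asami:mem://api-demo")
(def conn (d/connect "asami:mem://api-demo"))

(def db-before (d/db conn))   ; an immutable database value

@(d/transact conn {:tx-data [[:db/add :mem/n1 :label "first"]]})

(def db-after (d/db conn))

;; The earlier value is unaffected by the later transaction
(d/q '[:find ?l :where [?n :label ?l]] db-before)  ; => ()
(d/q '[:find ?l :where [?n :label ?l]] db-after)   ; => (["first"])

;; History is searchable: as-of returns the database at an earlier transaction
(d/as-of db-after 0)
```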
While Asami continued to operate in production, this storage work was started in parallel with Phase 4:
- Storage abstraction layer
- Implementation of Storage using JVM NIO
- Index selection
- Indexing component implementation: blocks, trees, flat files
- Significant expansion of testing
- Object Serialization/deserialization
- Reimplementation of in-memory operations (e.g. path discovery) in more general terms
- Expansion and testing of Connection API
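A sketch of how the storage abstraction surfaces to users, assuming the current URI scheme conventions: the same API runs against in-memory (`asami:mem://`) or durable, NIO file-based (`asami:local://`) graphs:

```clojure
(require '[asami.core :as d])

;; "local" selects the durable, file-based storage implementation;
;; data is written under a directory named for the database
(d/create-database "asami:local://durable-demo")
(def conn (d/connect "asami:local://durable-demo"))

@(d/transact conn {:tx-data [[:db/add :node-1 :label "persisted"]]})

;; After a JVM restart, reconnecting to the same URI sees the committed data
(d/q '[:find ?l :where [?n :label ?l]] (d/db conn))
;; => (["persisted"])
```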
Since leaving Cisco, I have continued to work on Asami, though at a reduced capacity. The current requirements are:
- RDF import and export
- SPARQL querying
- Query functionality
- Statement IDs
- Multi-graph selection
- Performance improvements
- Hybrid memory/disk graphs
- Resolution annotations to indicate private/public identifiers, avoiding conversion to public when possible
- Redis implementation of the storage protocol
The RDF/SPARQL work has been implemented as external libraries, which are being integrated.