Goal: A simple and relatively efficient graph data store that can handle billions of vertexes on single machine. In particular, we wanted a local copy of Freebase.
This project is called "Tinygraph" because the codebase is tiny. It just doesn't do much, but it's pretty efficient and easy to use.
Status: Experimental. Definitely not over-engineered.
What it can do:
- Store triples (actually "quads") indexed for (prefixes of) subject-property-object, object-property-subject, and property-subject-object access.
- Find edges based on those indexes.
- Give you a barely functional Go API.
- Give you a barely functional Javascript-from-HTTP API based on that Go API.
That's about it. The core code is fewer than 1,000 lines of Go.
What it can't do:
- Query planning. You just traverse paths. (But it'll be easy to expose RocksDB's
GetApproximateSizes
.) - Fancy graph algorithms.
- Provide much safety.
For comparison, see:
and many other graph databases.
How it works:
- Uses RocksDB. Could use any key-value store that provides prefix ordering.
- Does not intern strings. Instead, uses RocksDB's Snappy compression in an simple attempt to mitigate the cost of duplicates (while avoiding read+update during each write).
- Uses the nifty
robertkrimen/otto
Javascript implementation in Go.
Getting started:
Get Go.
Then try to get RocksDB built:
gcc -v # 4.7+
sudo apt-get install -y libgflags-dev libsnappy-dev zlib1g-dev libbz2-dev
git clone https://github.com/facebook/rocksdb.git
cd rocksdb
make shared_lib
# Suit yourself re installation.
sudo cp librocksdb.so /usr/lib
(cd /usr/include && sudo ln -s ~/rocksdb/include/rocksdb rocksdb)
Then
go get && go build
Quick example:
It's easy to load WordNet RDF. Here's an example of loading a useful subset of WordNet quickly.
wget http://wordnet-rdf.princeton.edu/wn31.nt.gz
zcat wn31.nt.gz | grep 'hyponym\|hypernym\|meronym\|holonym\|#label' | gzip > somewn.nt.gz
rm -rf wordnet.db
./tinygraph -config config.wordnet -lang eng -load somewn.nt.gz -serve
Then:
cat <<EOF > holo.js
function holonyms(term) {
var label = G.Bs("http://www.w3.org/2000/01/rdf-schema#label");
var holo = G.Bs("http://wordnet-rdf.princeton.edu/ontology#part_holonym");
var paths = G.In(label).Out(holo).Out(label).Walk(G.Graph(), G.Vertex(term)).Collect();
var uniq = {};
var acc = [];
for (var i=0; i<paths.length; i++) {
var h = paths[i][2].Strings()[2];
if (!uniq[h]) {
uniq[h] = true;
acc.push(h);
}
}
return acc;
}
EOF
curl --data-urlencode 'js@holo.js' http://localhost:8080/js
# Use that stored procedure.
curl --data-urlencode 'js=holonyms("Africa")' http://localhost:8080/js
You'll hopefully see
[
"Barbary",
"Nubia",
"Mahgrib",
"African nation",
"East Africa",
"South West Africa",
"Republic of Angola",
"Republic of Burundi",
"Republic of Cameroon",
"Central African Republic",
"Tchad",
"Republic of the Congo",
"Zaire",
...
]
Check out examples/
for more. Also see freebase.sh
.
ToDo:
- More logging.
- More test cases.
- More docs (especially configuration).
- Buffered Stepper channels.
- Expose RocksDB
GetApproximateSizes
. - Deal with concurrent requests and Javascript.
- Concurrency using mutexes and such.