Dgraph is a distributed, low-latency, high-throughput graph database, written in Go. It puts a lot of emphasis on good design, concurrency, and minimizing the network calls required to execute a query in a distributed environment.
We think graph databases are currently second-class citizens. They are not considered mature enough to be run as the sole database, so they get run alongside other SQL/NoSQL databases. Also, we're not happy with the design decisions of existing graph databases, which are either non-native or non-distributed, don't manage underlying data, or suffer from performance issues.
If you're interested in a high-performance graph database with an emphasis on sound design, thoughtful implementation, resilience, and cutting-edge technologies, Dgraph is definitely something you should consider.
If you're running more than five tables in a traditional relational database management system such as MySQL, SQL Server, or Oracle and your application requires five or more foreign keys, a graph database may be a better fit. If you're running a NoSQL database like MongoDB or Cassandra forcing you to do joins in the application layer, you should definitely take a look at moving to a graph database.
If your data doesn't have graph structure, i.e., there's only one predicate, then a graph database might not be a good fit for you. A NoSQL datastore is best for key-value type storage.
We recommend Dgraph for production use at companies. Minor releases at this stage might not be backward compatible, so we strongly recommend taking frequent exports.
Dgraph has been at least 10x faster than every other graph system we've run it against, and the gap only grows from there. These are anecdotal observations, though.
Here are some actual benchmarks:
- Dgraph against Neo4j – check this blog post
- Dgraph against Cayley – check this GitHub repo (credit to Ankur Yadav)
Dgraph is licensed under Apache v2.0. The full text of the license can be found here.
Dgraph v0.8 and above uses Badger, a persistent key-value store written in pure Go.
Dgraph v0.7.x and below used RocksDB for the key-value store. RocksDB is written in C++ and requires cgo to work with Dgraph, which caused several problems. You can read more about it in this blog post.
BoltDB depends on a single global RWMutex lock for all reads and writes; this negatively affects concurrency of iteration and modification of posting lists for Dgraph. For this reason, we decided not to use it and instead use RocksDB. On the other hand, RocksDB supports concurrent writes and is being used in production both at Google and Facebook.
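To make the locking concern concrete, here is a minimal Go sketch of a store guarded by one global `sync.RWMutex`. It is an illustration of the pattern only, not BoltDB's or Dgraph's actual code; the `globalStore` type, its methods, and the `predicate-N` keys are hypothetical names introduced for this example.

```go
package main

import (
	"fmt"
	"sync"
)

// globalStore is a hypothetical store (not BoltDB's real implementation)
// guarded by a single RWMutex shared by every key.
type globalStore struct {
	mu   sync.RWMutex
	data map[string][]uint64
}

func newGlobalStore() *globalStore {
	return &globalStore{data: make(map[string][]uint64)}
}

// Append adds uid to the posting list stored under key.
func (s *globalStore) Append(key string, uid uint64) {
	s.mu.Lock() // excludes every reader and writer, whatever key they want
	defer s.mu.Unlock()
	s.data[key] = append(s.data[key], uid)
}

// Len returns the length of the posting list stored under key.
func (s *globalStore) Len(key string) int {
	s.mu.RLock() // readers share the lock but wait while any writer holds it
	defer s.mu.RUnlock()
	return len(s.data[key])
}

func main() {
	s := newGlobalStore()
	var wg sync.WaitGroup
	for i := 0; i < 4; i++ {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			key := fmt.Sprintf("predicate-%d", id)
			for uid := uint64(0); uid < 1000; uid++ {
				// Each goroutine touches its own key, yet all of these
				// writes are serialized behind the one global lock.
				s.Append(key, uid)
			}
		}(i)
	}
	wg.Wait()
	fmt.Println(s.Len("predicate-0"))
}
```

The four goroutines never touch the same posting list, but they still queue behind each other. That is the kind of contention the paragraph above describes for iterating over and modifying posting lists.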
No. Dgraph stores and handles data natively to ensure it has complete control over performance and latency. The only thing between Dgraph and disk is the key-value application library, Badger.
Dgraph started with the aim to fully support GraphQL. However, as our experience with the language grew, we started hitting the seams. It couldn't support many of the features required from a language meant to interact with graph data, and we felt some of its features were unnecessary and complicated. So, we've created a simplified and feature-rich version of GraphQL. For lack of a better name, we're calling it GraphQL+-. You can [read more about it here]({{< relref "query-language/index.md" >}}).
Dgraph will aim to support Gremlin after v1.0. However, this is not set in stone. If our community wants Gremlin support to interact with other frameworks, like TinkerPop, we can look into supporting it earlier.
If there is a demand for it, Dgraph could support Cypher. It would most likely be after v1.0.
Please see the Dgraph product roadmap for what we're planning to support for v1.0. If request X is not part of it, please feel free to start a discussion at discuss.dgraph.io, or file a GitHub issue.
Yes. The core of Dgraph is under the Apache 2.0 license. Enterprise features will be released under a proprietary license. Unlike other databases, we include running Dgraph in distributed mode under an open source license because we want all our users to be able to scale as demand grows.
Yes. We're VC funded and plan to use the funds for development. We have a dedicated team of really smart engineers working on this as their full-time job. And of course, we're always open to contributions from the wider community.
It's currently too early to say. It's very likely that we will offer commercially licensed plugins and paid support to interested customers. This model would enable us to continue advancing Dgraph while standing by our commitment to keeping the core project free and open.
We accept both code and documentation contributions. Please see link for more information about how to contribute.
This is from a reddit thread. ''Raft means choosing the C in CAP. "Highly Available" means choosing the A. I mean, yeah, adding consistent replication certainly means that it can be more available than something without replication, but advertising this as "highly available" is just misleading... Anything built on raft isn't (highly available).''
The CAP theorem talks about one edge case, which is what happens in case of a network partition. In case of a network partition, Dgraph would choose consistency over availability, which makes it CP (not AP). However, this doesn't necessarily mean the entire system isn't available. Dgraph as a system is also highly available.
This is from Wikipedia:
There are three principles of systems design in reliability engineering which can help achieve high availability.
- Elimination of single points of failure. This means adding redundancy to the system so that failure of a component does not mean failure of the entire system.
- Reliable crossover. In redundant systems, the crossover point itself tends to become a single point of failure. Reliable systems must provide for reliable crossover.
- Detection of failures as they occur. If the two principles above are observed, then a user may never see a failure. But the maintenance activity must.
Dgraph does each of these 3 things (if not already, then they're planned).
- We don't have a single point of failure. Each server has the same capabilities as the next.
- Even if some servers go down, the queries and writes would still succeed. The queries would automatically be re-routed to a healthy server. Dgraph does reliable crossover.
- Data is divided into shards and served by groups. Unless a majority of the particular group needed for the query goes down, the user wouldn't see the failure. But the maintainer would know about it.
Given these 3, I think I'm right to claim that Dgraph is highly available.
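The majority rule in the last point can be sketched in a few lines of Go. This is a minimal illustration, assuming a Raft-style strict majority (more than half of a group's replicas must be healthy for the group to keep serving); `hasQuorum` is a hypothetical helper, not a Dgraph API.

```go
package main

import "fmt"

// hasQuorum reports whether a group with the given number of replicas can
// keep serving when only `healthy` of them are up. It assumes a strict
// majority rule: more than half of the replicas must be healthy.
func hasQuorum(replicas, healthy int) bool {
	return healthy > replicas/2
}

func main() {
	const replicas = 3
	for healthy := replicas; healthy >= 0; healthy-- {
		fmt.Printf("replicas=%d healthy=%d available=%v\n",
			replicas, healthy, hasQuorum(replicas, healthy))
	}
}
```

Under this assumption, a group of three replicas tolerates one failure and a group of five tolerates two. Only groups that lose their majority become unavailable, and only for the shards they serve.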