Comparison of TSDB
There once was a time when RRD was the only time series database available. That has changed: there are now tons to choose from, each with its own benefits and drawbacks.
I hope to give a comprehensive list here and show what Newts may bring to the table. I confess that my understanding of almost all of these is limited, so I may get their purpose or how they work wrong. Please feel free to correct this page directly, or email me if you'd like me to fix it.
One thing I don't list under limitations is library or program support. If one of these projects sounds like what you want, check whether there is a plugin for Grafana, whether it works with Cubism.js, or whether it supports whatever you plan to use for graphing.
I'm just some guy who needed a TSDB for a few projects I work on. I set out to figure out which one to use, and now I don't know. I like the people behind OpenNMS, so I thought I would spam their wiki with this in the hope that it benefits everyone.
RRDtool
Created by Tobias Oetiker (http://oss.oetiker.ch/rrdtool/)
File-based datastore written in C, with aggregation performed as data is written. Supports built-in consolidation functions such as min/max/avg. Has APIs for many languages, which makes it popular in open-source software for both data collection and graphing.
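To make "aggregation as data is written" concrete, here is a minimal in-memory sketch of a round-robin archive: a fixed number of rows, each produced by applying a consolidation function to a fixed number of incoming values. The class name `RoundRobinArchive` is hypothetical, and the sketch deliberately ignores parts of real RRDtool such as timestamps, heartbeats, rate conversion and the xfiles factor.

```python
from collections import deque
from statistics import mean

# Consolidation functions, named after RRDtool's AVERAGE/MIN/MAX/LAST.
CONSOLIDATION = {
    "AVERAGE": mean,
    "MIN": min,
    "MAX": max,
    "LAST": lambda values: values[-1],
}


class RoundRobinArchive:
    """Hypothetical fixed-size archive: every `steps` values become one row."""

    def __init__(self, cf: str, steps: int, rows: int) -> None:
        self.cf = CONSOLIDATION[cf]
        self.steps = steps
        self.rows = deque(maxlen=rows)  # when full, the oldest row is dropped
        self._pending = []

    def update(self, value: float) -> None:
        # Aggregation happens on the write path, not at read time.
        self._pending.append(value)
        if len(self._pending) == self.steps:
            self.rows.append(self.cf(self._pending))
            self._pending = []


# Several archives fed by the same stream, like the RRAs inside one RRD file.
archives = {
    "AVERAGE": RoundRobinArchive("AVERAGE", steps=3, rows=2),
    "MAX": RoundRobinArchive("MAX", steps=3, rows=2),
}
for v in [1, 2, 3, 10, 20, 30, 5, 5, 5]:
    for rra in archives.values():
        rra.update(v)

print(list(archives["AVERAGE"].rows))  # [20, 5]  (the first row, 2, was dropped)
print(list(archives["MAX"].rows))      # [30, 5]
```

Because the number of rows is fixed up front, the file never grows, but detail older than the archive window is gone for good.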
Used by: Cacti, Nagios, Zenoss, MRTG, Munin
Limitations:
- Reboots need "spike removal" or other manual intervention to fix the data because of counter resets (this may apply to all TSDBs)
- Heavy disk writes, with 5 IOPS per value updated
- Scaling horizontally is a completely manual operation: RRD is a library rather than a daemon, so there is nothing you can run on multiple systems to gain redundancy or split the load
License: GPLv2
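The counter-reset problem listed under RRD's limitations can be shown with a small sketch. When a polled counter goes down, a tool that assumes the counter wrapped (RRDtool's COUNTER type treats a decrease as an overflow at the 32- or 64-bit border) turns a reboot into an enormous spike. The two function names below are hypothetical, only the 32-bit case is modelled, and treating every decrease as a reset is a simplification that would misreport a genuine wrap.

```python
from typing import List, Optional

WRAP_32 = 2**32  # a 32-bit counter rolls over after 2**32 - 1


def rates_assuming_wrap(samples: List[int], step: float) -> List[float]:
    """Treat every decrease as a 32-bit counter wrap (simplified COUNTER behaviour)."""
    rates = []
    for prev, cur in zip(samples, samples[1:]):
        delta = cur - prev
        if delta < 0:
            delta += WRAP_32  # a reboot looks exactly like a wrap here
        rates.append(delta / step)
    return rates


def rates_with_reset_detection(samples: List[int], step: float) -> List[Optional[float]]:
    """Treat every decrease as a reset and report that interval as unknown (None)."""
    rates: List[Optional[float]] = []
    for prev, cur in zip(samples, samples[1:]):
        delta = cur - prev
        rates.append(None if delta < 0 else delta / step)
    return rates


# Octet counter polled every 60 s; the device rebooted between the 3rd and 4th poll.
samples = [1000, 1600, 2200, 50, 650]
print(rates_assuming_wrap(samples, 60.0))         # third rate is a huge bogus spike
print(rates_with_reset_detection(samples, 60.0))  # [10.0, 10.0, None, 10.0]
```

In RRDtool itself, one common mitigation is to set a maximum on the data source so that implausible rates are stored as unknown instead of as a spike.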
JRobin
Maintained mostly by OpenNMS these days; historically by the original JRobin project (http://oldwww.jrobin.org/)
File-based datastore written in Java. Supports built-in consolidation functions such as min/max/avg. Writes an endian-agnostic file format.
Limitations:
- Has the same problems as RRD
- Has fewer aggregation functions than RRD (this may have changed)
License: LGPL
Whisper
Created by the Graphite project (http://graphite.wikidot.com/whisper)
File-based datastore written in Python. Its authors had different requirements for time series than RRD could meet, which they spell out on their website. Essentially, they need to be able to skip updates, which RRD can't do: RRD considers a late update to be just the next update in the file, so to "skip" some intervals you would need to write NaN until you reach the new time.
You also can't work backwards in RRD to post a previous value that was missed. You could compensate for this in your RRD library, but it would be very IO-intensive, so they wrote a new TSDB.
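The difference can be sketched in a few lines: if each slot's position is computed from the timestamp, and the slot also records which interval it holds, writes can jump ahead or go back in time, and stale slots simply read as missing. This is a simplified in-memory model of the idea, not Whisper's on-disk format; `TimestampedArchive` is a hypothetical name.

```python
from typing import List, Optional, Tuple


class TimestampedArchive:
    """Hypothetical fixed-size archive whose slots remember which interval they hold."""

    def __init__(self, step: int, points: int) -> None:
        self.step = step
        self.points = points
        # Each slot is None (never written) or (interval_start, value).
        self.slots: List[Optional[Tuple[int, float]]] = [None] * points

    def _index(self, interval: int) -> int:
        # The slot is derived from the timestamp, not from the order of writes.
        return (interval // self.step) % self.points

    def write(self, timestamp: int, value: float) -> None:
        interval = timestamp - timestamp % self.step  # align to the step
        self.slots[self._index(interval)] = (interval, value)

    def read(self, start: int, end: int) -> List[Tuple[int, Optional[float]]]:
        result = []
        for interval in range(start - start % self.step, end, self.step):
            slot = self.slots[self._index(interval)]
            # A slot holding a different interval (or nothing) reads as missing.
            value = slot[1] if slot is not None and slot[0] == interval else None
            result.append((interval, value))
        return result


arc = TimestampedArchive(step=60, points=10)
arc.write(600, 1.0)
arc.write(780, 4.0)  # jump ahead: no NaN padding needed for 660 and 720
arc.write(665, 2.0)  # late value lands in the 660 slot after the fact
print(arc.read(600, 840))  # [(600, 1.0), (660, 2.0), (720, None), (780, 4.0)]
```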
Limitations:
- Slow, but not so slow that it stops people from using it with polling intervals of 1 minute or less
- Can't scale horizontally, and doesn't scale well in general (needs RAIDed SSDs or a big SAN)
- Pain to install (?)
License: Apache v2
OpenTSDB
Started by StumbleUpon
Runs on Hadoop and HBase. Data is stored exactly as provided, without aggregation and without removing old data. According to their FAQ, they are considering Cassandra support.
Limitations:
- Ironically, it manipulates the data it returns, so query results might not be exactly what is stored
- People say it's a pain to run an HBase cluster; I imagine it's easier now with Puppet/Chef/etc.
- According to the InfluxDB team, "it was too easy to create hot spots that would kill performance"
License: GPLv3
KairosDB
Can use an in-memory H2 database (for testing) or Cassandra as its backing store. Supports aggregation using built-in functions such as min/max/avg. The project started as a fork of OpenTSDB because of differing requirements.
Limitations:
- unknown
License: Apache v2
InfluxDB
Written in Go. No underlying database (such as Cassandra or HBase). It can compute queries continuously and push the results to the client.
Limitations:
- Clustering, replication and HA are in "alpha" state
License: MIT
Prometheus
Created by SoundCloud (http://prometheus.io)
Written in Go. File-based datastore with external indexing and an in-memory cache. Designed to be better and faster than Graphite, yet easier to run than something like OpenTSDB. InfluxDB didn't exist when they started building it; in comparisons, InfluxDB uses more disk space per metric.
Limitations:
- Horizontal scaling isn't possible (they acknowledge this, but seem focused on being the best single-system monitoring solution)
License: Apache v2
Druid
Open-sourced by Metamarkets in 2013 (http://druid.io/)
Java-based, built on Hadoop/HDFS/ZooKeeper. Druid was designed to handle analytics for online advertising. It doesn't bill itself strictly as a time series database, but that isn't a reason to exclude it. The biggest barrier you might face is complexity: from what I can see it can do much more than the other platforms mentioned here, so setup and usage may be difficult.
You might start off with something like this:
https://github.com/Quantiply/druid-vagrant
Limitations:
- unknown
License: Apache v2
Blueflood
Created by Rackspace (http://blueflood.io/)
Java-based, Cassandra-backed system. Uses fixed rollup intervals for aggregation, with a few function types (min/max/variance/average).
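A fixed-interval rollup that keeps min/max/average/variance can be computed in a single pass per interval, so the raw points never need to be re-read. The sketch below uses Welford's online algorithm for the variance; this is not necessarily how Blueflood computes it, and `Rollup`, `rollup` and the 300-second interval are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple


@dataclass
class Rollup:
    """Running count/min/max/mean/variance for one rollup interval."""
    count: int = 0
    mean: float = 0.0
    m2: float = 0.0  # sum of squared deviations from the running mean
    minimum: float = math.inf
    maximum: float = -math.inf

    def add(self, value: float) -> None:
        # Welford's online update: one pass, no need to keep the raw values.
        self.count += 1
        delta = value - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (value - self.mean)
        self.minimum = min(self.minimum, value)
        self.maximum = max(self.maximum, value)

    @property
    def variance(self) -> float:
        # Population variance; 0.0 for an empty interval.
        return self.m2 / self.count if self.count else 0.0


def rollup(points: Iterable[Tuple[int, float]], interval: int) -> Dict[int, Rollup]:
    """Bucket (timestamp, value) points into fixed intervals of `interval` seconds."""
    rollups: Dict[int, Rollup] = {}
    for ts, value in points:
        rollups.setdefault(ts - ts % interval, Rollup()).add(value)
    return rollups


values = [2, 4, 4, 4, 5, 5, 7, 9]
points = [(i * 30, float(v)) for i, v in enumerate(values)]  # 30 s apart
five_min = rollup(points, 300)[0]
print(five_min.count, five_min.minimum, five_min.maximum)    # 8 2.0 9.0
print(round(five_min.mean, 6), round(five_min.variance, 6))  # 5.0 4.0
```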
Limitations:
- Seems to be a work in progress, or perhaps it's exactly what Rackspace needed without the extra bits
- No API: the part that handled this was not open-sourced, so you need to write to the database in Java or write your own compatibility layer
License: Apache v2
Cyanite
Clojure application backed by Cassandra. Seems to be built around Graphite.
Limitations:
- Lots of open issues
- Might only work with Graphite currently
License: Attribution, Share-alike
level-tsd
Created by InMobi (https://github.com/InMobi/level-tsd)
An embedded database based on LevelDB and tailored to Graphite. It was written after Ceres was considered and limitations were found in its use (specifically with inode usage). They also tested Postgres arrays as a datastore and found them faster than Whisper and Ceres, but the design created a need for constant VACUUM.
While this is a single-system datastore, they were able to scale to 500K metrics/minute on RAID 5 with 4x 15K RPM drives (http://www.inmobi.com/blog/2014/01/24/extending-graphites-mileage).
Limitations:
- No horizontal scalability
- LevelDB only allows one process to access the database at a time, so you can't split the load across multiple processes
- Project hasn't been updated in 5 months (at the time of writing)
License: Apache v2
Ceres
Created by the Graphite project (https://github.com/graphite-project/ceres)
Distributed database written in Python. It is not a fixed-size database; instead, aggregation and expiration are left to maintenance plugins in Carbon. It is only partially usable: people report using it in "production", but documentation is incomplete and development is slow.
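Leaving aggregation and expiration to a separate maintenance job means the storage can stay simple and grow until that job runs. A generic sketch of what such a job might do: average raw points older than one cutoff into coarser buckets, and drop rollups older than a second cutoff. This is not the actual Ceres or Carbon plugin interface; `maintenance_pass` and all of its parameters are hypothetical.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Tuple

Point = Tuple[int, float]


def maintenance_pass(raw: List[Point], coarse: Dict[int, float], now: int,
                     rollup_after: int, expire_after: int, bucket: int) -> List[Point]:
    """Hypothetical maintenance job: roll old raw points up, expire old rollups.

    Updates `coarse` (bucket start -> average) in place and returns the raw
    points that are still too recent to roll up.
    """
    rollup_cutoff = now - rollup_after
    keep: List[Point] = []
    to_roll: Dict[int, List[float]] = defaultdict(list)
    for ts, value in raw:
        start = ts - ts % bucket
        # Only roll up buckets that end before the cutoff, so no bucket is split.
        if start + bucket <= rollup_cutoff:
            to_roll[start].append(value)
        else:
            keep.append((ts, value))
    for start, values in to_roll.items():
        coarse[start] = mean(values)  # overwrites any earlier rollup of this bucket
    expire_cutoff = now - expire_after
    for start in [s for s in coarse if s + bucket <= expire_cutoff]:
        del coarse[start]
    return keep


raw = [(0, 1.0), (30, 3.0), (60, 5.0), (90, 7.0), (120, 9.0)]
coarse: Dict[int, float] = {}
raw = maintenance_pass(raw, coarse, now=200, rollup_after=60, expire_after=1000, bucket=60)
print(raw, coarse)  # [(120, 9.0)] {0: 2.0, 60: 6.0}
```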
Limitations:
- unknown
License: Apache v2
Newts
Created by The OpenNMS Group (http://opennms.org)
Java-based, Cassandra-backed system. It uses delayed aggregation (aggregation at read time) to keep things fast.
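Aggregation at read time means the write path only appends raw samples, and the resolution is chosen per query. The sketch below shows the idea with simple averaging; it is not the Newts Java API, and `RawSampleStore`, the resource name and the bucketing logic are assumptions for illustration.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Optional, Tuple


class RawSampleStore:
    """Hypothetical store: writes append raw samples, reads aggregate them."""

    def __init__(self) -> None:
        self.samples: Dict[str, List[Tuple[int, float]]] = defaultdict(list)

    def write(self, resource: str, ts: int, value: float) -> None:
        # The write path does no aggregation at all, so it stays cheap.
        self.samples[resource].append((ts, value))

    def read(self, resource: str, start: int, end: int,
             resolution: int) -> List[Tuple[int, Optional[float]]]:
        # All of the aggregation cost is paid here, at query time.
        buckets: Dict[int, List[float]] = defaultdict(list)
        for ts, value in self.samples.get(resource, []):
            if start <= ts < end:
                buckets[ts - ts % resolution].append(value)
        first = start - start % resolution
        return [(b, mean(buckets[b]) if b in buckets else None)
                for b in range(first, end, resolution)]


store = RawSampleStore()
for ts, v in [(0, 1.0), (100, 3.0), (350, 10.0)]:
    store.write("host1:ifInOctets", ts, v)
print(store.read("host1:ifInOctets", 0, 900, 300))  # [(0, 2.0), (300, 10.0), (600, None)]
print(store.read("host1:ifInOctets", 0, 900, 900))  # one bucket averaging all three samples
```

The trade-off is the one guessed at under limitations: storage grows with every raw sample, and a wide, coarse query has to scan all of them.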
Limitations:
- Unknown; at a guess, runaway disk space use until read/rollup intervals happen. Maybe there will be a maintenance task that can run during quiet or late hours to aggregate the data.
License: Apache v2
Further reading:
http://www.erol.si/2015/01/the-complete-list-of-all-timeseries-databases-for-your-iot-project/
https://lobste.rs/s/kjn5an/recommended_reading_for_building_time_series_databases
https://tsdbbench.github.io/Ultimate-TSDB-Comparison/
The last one is based on the publication "Survey and Comparison of Open Source Time Series Databases" (with accompanying slides).