The crud-bench benchmarking tool is an open-source benchmarking tool for testing and comparing the performance of a number of different workloads on embedded, networked, and remote databases. It can be used to compare both SQL and NoSQL platforms including key-value, embedded, relational, document, and multi-model databases. Importantly crud-bench focuses on testing additional features which are not present in other benchmarking tools, but which are available in SurrealDB.
The primary purpose of crud-bench is to continually test and monitor the performance of features and functionality built in to SurrealDB, enabling developers working on features in SurrealDB to assess the impact of their changes on database queries and performance.
The crud-bench benchmarking tool is being actively developed with new features and functionality being added regularly.
The crud-bench benchmarking tool is open-source, and we encourage additions, modifications, and improvements to the benchmark runtime, and the datastore implementations.
When running simple, automated tests, the crud-bench benchmarking tool will automatically start a Docker container for the datastore or database which is being benchmarked (when the datastore or database is networked). This configuration can be modified so that an optimised, remote environment can be connected to, instead of running a Docker container locally. This allows for running crud-bench against remote datastores, and distributed datastores on a local network or remotely in the cloud.
In one table, the benchmark will operate 5 main tasks:
- Create: inserting N unique records, with the specified concurrency.
- Read: read N unique records, with the specified concurrency.
- Update: update N unique records, with the specified concurrency.
- Scans: perform a number of range and table scans, with the specified concurrency.
- Delete: delete N unique records, with the specified concurrency.
With crud-bench almost all aspects of the benchmark engine are configurable:
- The number of rows or records (samples).
- The number of concurrent clients or connections.
- The number of concurrent threads (concurrent messages per client).
- Whether rows or records are modified sequentially or randomly.
- The primary id or key type for the records.
- The row or record content including support for nested objects and arrays.
- The scan specifications for range or table queries.
As crud-bench is in active development, some benchmarking workloads are already implemented, while others will be implemented in future releases. The list below details which benchmarks are implemented for the supporting datastores and lists those which are planned in the future.
CRUD
- Creating single records in individual transactions
- Reading single records in individual transactions
- Updating single records in individual transactions
- Deleting single records in individual transactions
- Batch creating multiple records in a transaction
- Batch reading multiple records in a transactions
- Batch updating multiple records in a transactions
- Batch deleting multiple records in a transactions
Scans
- Full table scans, projecting all fields
- Full table scans, projecting id field
- Full table count queries
- Scans with a limit, projecting all fields
- Scans with a limit, projecting id field
- Scans with a limit, counting results
- Scans with a limit and offset, projecting all fields
- Scans with a limit and offset, projecting id field
- Scans with a limit and offset, counting results
Filters
- Full table query, using filter condition, projecting all fields
- Full table query, using filter condition, projecting id field
- Full table query, using filter condition, counting rows
Indexes
- Indexed table query, using filter condition, projecting all fields
- Indexed table query, using filter condition, projecting id field
- Indexed table query, using filter condition, counting rows
Relationships
- Fetching or traversing 1-level, one-to-one relationships or joins
- Fetching or traversing 1-level, one-to-many relationships or joins
- Fetching or traversing 1-level, many-to-many relationships or joins
- Fetching or traversing n-level, one-to-one relationships or joins
- Fetching or traversing n-level, one-to-many relationships or joins
- Fetching or traversing n-level, many-to-many relationships or joins
Workloads
- Workload support for creating, updating, and reading records concurrently
- Docker - required when running automated tests
- Rust - required when building crud-bench from source
- Cargo - required when building crud-bench from source
cargo run -r -- -h
Usage: crud-bench [OPTIONS] --database <DATABASE> --samples <SAMPLES>
Options:
-n, --name <NAME> An optional name for the test, used as a suffix for the JSON result file name
-d, --database <DATABASE> The database to benchmark [possible values: dry, map, arangodb, dragonfly, fjall, keydb, lmdb, mongodb, mysql, neo4j, postgres, redb, redis, rocksdb, scylladb, sqlite, surrealkv, surrealdb, surrealdb-memory, surrealdb-rocksdb, surrealdb-surrealkv]
-i, --image <IMAGE> Specify a custom Docker image
-p, --privileged Whether to run Docker in privileged mode
-e, --endpoint <ENDPOINT> Specify a custom endpoint to connect to
-b, --blocking <BLOCKING> Maximum number of blocking threads (default is the number of CPU cores) [default: 12]
-w, --workers <WORKERS> Number of async runtime workers (default is the number of CPU cores) [default: 12]
-c, --clients <CLIENTS> Number of concurrent clients [default: 1]
-t, --threads <THREADS> Number of concurrent threads per client [default: 1]
-s, --samples <SAMPLES> Number of samples to be created, read, updated, and deleted
-r, --random Generate the keys in a pseudo-randomized order
-k, --key <KEY> The type of the key [default: integer] [possible values: integer, string26, string90, string250, string506, uuid]
-v, --value <VALUE> Size of the text value [env: CRUD_BENCH_VALUE=] [default: "{\n\t\t\t\"text\": \"string:50\",\n\t\t\t\"integer\": \"int\"\n\t\t}"]
--show-sample Print-out an example of a generated value
--pid <PID> Collect system information for a given pid
-a, --scans <SCANS> An array of scan specifications [env: CRUD_BENCH_SCANS=] [default: "[\n\t\t\t{ \"name\": \"count_all\", \"samples\": 100, \"projection\": \"COUNT\" },\n\t\t\t{ \"name\": \"limit_id\", \"samples\": 100, \"projection\": \"ID\", \"limit\": 100, \"expect\": 100 },\n\t\t\t{ \"name\": \"limit_all\", \"samples\": 100, \"projection\": \"FULL\", \"limit\": 100, \"expect\": 100 },\n\t\t\t{ \"name\": \"limit_count\", \"samples\": 100, \"projection\": \"COUNT\", \"limit\": 100, \"expect\": 100 },\n\t\t\t{ \"name\": \"limit_start_id\", \"samples\": 100, \"projection\": \"ID\", \"start\": 5000, \"limit\": 100, \"expect\": 100 },\n\t\t\t{ \"name\": \"limit_start_all\", \"samples\": 100, \"projection\": \"FULL\", \"start\": 5000, \"limit\": 100, \"expect\": 100 },\n\t\t\t{ \"name\": \"limit_start_count\", \"samples\": 100, \"projection\": \"COUNT\", \"start\": 5000, \"limit\": 100, \"expect\": 100 }\n\t\t]"]
-h, --help Print help (see more with '--help')```
For more detailed help information run the following command:
```bash
cargo run -r -- --help
You can use the argument -v
or --value
(or the environment variable CRUD_BENCH_VALUE
) to customize the row,
document, or record value which should be used in the benchmark tests. Pass a JSON structure that will serve as a
template for generating a randomized value.
Note
For tabular, or column-oriented databases (e.g. Postgres, MySQL, ScyllaDB), the first-level fields of the JSON structure are translated as columns, and any nested structures will be stored in a JSON column where possible.
Within the JSON structure, the following values are replaced by randomly generated data:
- Every occurrence of
string:XX
will be replaced by a random string with XX characters. - Every occurrence of
text:XX
will be replaced by a random string made of words of 2 to 10 characters, for a total of XX characters. - Every occurrence of
string:X..Y
will be replaced by a random string between X and Y characters. - Every occurrence of
text:X..Y
will be replaced by a random string made of words of 2 to 10 characters, for a total between X and Y characters. - Every
int
will be replaced by a random integer (i32). - Every
int:X..Y
will be replaced by a random integer (i32) between X and Y. - Every
float
will be replaced by a random float (f32). - Every
float:X..Y
will be replaced by a random float (f32) between X and Y. - Every
uuid
will be replaced by a random UUID (v4). - Every
bool
will be replaced by atrue
orfalse
(v4). - Every
string_enum:A,B,C
will be replaced by a string fromA
B
orC
. - Every
int_enum:A,B,C
will be replaced by a i32 fromA
B
orC
. - Every
float_enum:A,B,C
will be replaced by a f32 fromA
B
orC
. - Every
datetime
will be replaced by a datetime (ISO 8601).
{
"text": "text:30",
"text_range": "text:10..50",
"bool": "bool",
"string_enum": "enum:foo,bar",
"datetime": "datetime",
"float": "float",
"float_range": "float:1..10",
"float_enum": "float:1.1,2.2,3.3",
"integer": "int",
"integer_range": "int:1..5",
"integer_enum": "int:1,2,3",
"uuid": "uuid",
"nested": {
"text": "text:100",
"array": [
"string:10",
"string:2..5"
]
}
}
You can use the argument -a
or --scans
(or the environment variable CRUD_BENCH_SCANS
) to customise the range,
table, or scan queries that are performed in the benchmark. This parameter accepts a JSON array, where each item
represents a different scan test. Each test is defined as a JSON object specifying the scan parameters and the test
name.
Note
Not every database benchmark adapter supports scans or range queries. In such cases, the benchmark will not fail but
the associated tests will indicate that the benchmark was skipped
.
Each scan object can make use of the following values:
name
: A descriptive name for the test.projection
: The projection type of the scan:"ID"
: only the ID is returned."FULL"
: the whole record is returned."COUNT"
: count the number of records.
start
: Skips the specified number of rows before starting to return rows.limit
: Specifies the maximum number of rows to return.expect
: (optional) Asserts the expected number of rows returned.
[
{
"name": "limit100",
"projection": "FULL",
"start": 0,
"limit": 100,
"expect": 100
},
{
"name": "start100",
"projection": "ID",
"start": 100,
"limit": 100,
"expect": 100
}
]
This benchmark does not interact with any datastore, allowing the overhead of the benchmark implementation, written in Rust, to be measured.
cargo run -r -- -d dry -s 100000 -c 12 -t 24 -r
ArangoDB is a multi-model database with flexible data modeling and efficient querying.
cargo run -r -- -d arangodb -s 100000 -c 12 -t 24 -r
The above command starts a Docker container automatically. To connect to an already-running ArangoDB instance use the following command:
cargo run -r -- -d arangodb -e http://127.0.0.1:8529 -s 100000 -c 12 -t 24 -r
Dragonfly is an in-memory, networked, datastore which is fully-compatible with Redis and Memcached APIs.
cargo run -r -- -d dragonfly -s 100000 -c 12 -t 24 -r
The above command starts a Docker container automatically. To connect to an already-running Dragonfly instance use the following command:
cargo run -r -- -d dragonfly -e redis://:root@127.0.0.1:6379 -s 100000 -c 12 -t 24 -r
Fjall is a transactional, ACID-compliant, embedded, key-value datastore, written in safe Rust, and based on LSM-trees.
cargo run -r -- -d fjall -s 100000 -c 12 -t 24 -r
KeyDB is an in-memory, networked, datastore which is a high-performance fork of Redis, with a focus on multithreading.
cargo run -r -- -d keydb -s 100000 -c 12 -t 24 -r
The above command starts a Docker container automatically. To connect to an already-running KeyDB instance use the following command:
cargo run -r -- -d keydb -e redis://:root@127.0.0.1:6379 -s 100000 -c 12 -t 24 -r
LMDB is a transactional, ACID-compliant, embedded, key-value datastore, based on B-trees.
cargo run -r -- -d lmdb -s 100000 -c 12 -t 24 -r
An in-memory concurrent, associative HashMap in Rust.
cargo run -r -- -d map -s 100000 -c 12 -t 24 -r
MongoDB is a NoSQL, networked, ACID-compliant, document-oriented database, with support for unstructured data storage.
cargo run -r -- -d mongodb -s 100000 -c 12 -t 24 -r
The above command starts a Docker container automatically. To connect to an already-running MongoDB instance use the following command:
cargo run -r -- -d mongodb -e mongodb://root:root@127.0.0.1:27017 -s 100000 -c 12 -t 24 -r
MySQL is a networked, relational, ACID-compliant, SQL-based database.
cargo run -r -- -d mysql -s 100000 -c 12 -t 24 -r
The above command starts a Docker container automatically. To connect to an already-running MySQL instance use the following command:
cargo run -r -- -d mysql -e mysql://root:mysql@127.0.0.1:3306/bench -s 100000 -c 12 -t 24 -r
Neo4j is a graph database management system for connected data.
cargo run -r -- -d neo4j -s 100000 -c 12 -t 24 -r
The above command starts a Docker container automatically. To connect to an already-running Neo4j instance use the following command:
cargo run -r -- -d neo4j -e '127.0.0.1:7687' -s 100000 -c 12 -t 24 -r
Postgres is a networked, object-relational, ACID-compliant, SQL-based database.
cargo run -r -- -d postgres -s 100000 -c 12 -t 24 -r
The above command starts a Docker container automatically. To connect to an already-running Postgres instance use the following command:
cargo run -r -- -d postgres -e 'host=127.0.0.1 user=postgres password=postgres' -s 100000 -c 12 -t 24 -r
ReDB is a transactional, ACID-compliant, embedded, key-value datastore, written in Rust, and based on B-trees.
cargo run -r -- -d redb -s 100000 -c 12 -t 24 -r
Redis is an in-memory, networked, datastore that can be used as a cache, message broker, or datastore.
cargo run -r -- -d redis -s 100000 -c 12 -t 24 -r
The above command starts a Docker container automatically. To connect to an already-running Redis instance use the following command:
cargo run -r -- -d redis -e redis://:root@127.0.0.1:6379 -s 100000 -c 12 -t 24 -r
RocksDB is a transactional, ACID-compliant, embedded, key-value datastore, based on LSM-trees.
cargo run -r -- -d rocksdb -s 100000 -c 12 -t 24 -r
ScyllaDB is a distributed, NoSQL, wide-column datastore, designed to be compatible with Cassandra.
cargo run -r -- -d scylladb -s 100000 -c 12 -t 24 -r
The above command starts a Docker container automatically. To connect to a already-running ScyllaDB cluster use the following command:
cargo run -r -- -d scylladb -e 127.0.0.1:9042 -s 100000 -c 12 -t 24 -r
SQLite is an embedded, relational, ACID-compliant, SQL-based database.
cargo run -r -- -d sqlite -s 100000 -c 12 -t 24 -r
SurrealDB (in-memory storage engine)
cargo run -r -- -d surrealdb-memory -s 100000 -c 12 -t 24 -r
SurrealDB (RocksDB storage engine)
cargo run -r -- -d surrealdb-rocksdb -s 100000 -c 12 -t 24 -r
SurrealDB (SurrealKV storage engine)
cargo run -r -- -d surrealdb-surrealkv -s 100000 -c 12 -t 24 -r
SurrealDB embedded (in-memory storage engine)
cargo run -r -- -d surrealdb -e memory -s 100000 -c 12 -t 24 -r
SurrealDB embedded (RocksDB storage engine)
cargo run -r -- -d surrealdb -e rocksdb:/tmp/db -s 100000 -c 12 -t 24 -r
SurrealDB embedded (SurrealKV storage engine)
cargo run -r -- -d surrealdb -e surrealkv:/tmp/db -s 100000 -c 12 -t 24 -r
SurrealKV is a transactional, ACID-compliant, embedded, key-value datastore, written in Rust, and based on concurrent adaptive radix trees.
cargo run -r -- -d surrealkv -s 100000 -c 12 -t 24 -r
To run the benchmark against an already running SurrealDB instance, follow the steps below.
Start a SurrealDB server:
surreal start --allow-all -u root -p root rocksdb:/tmp/db
Then run crud-bench with the surrealdb
database option:
cargo run -r -- -d surrealdb -e ws://127.0.0.1:8000 -s 100000 -c 12 -t 24 -r