
A benchmarking tool for testing and comparing the performance of both embedded and networked SQL and NoSQL databases.


crud-bench

crud-bench is an open-source benchmarking tool for testing and comparing the performance of a number of different workloads on embedded, networked, and remote databases. It can be used to compare both SQL and NoSQL platforms, including key-value, embedded, relational, document, and multi-model databases. Importantly, crud-bench focuses on testing additional features which are not present in other benchmarking tools, but which are available in SurrealDB.

The primary purpose of crud-bench is to continually test and monitor the performance of features and functionality built into SurrealDB, enabling developers working on SurrealDB features to assess the impact of their changes on database queries and performance.

The crud-bench benchmarking tool is being actively developed with new features and functionality being added regularly.

Contributing

The crud-bench benchmarking tool is open-source, and we encourage additions, modifications, and improvements to the benchmark runtime, and the datastore implementations.

How does it work?

When running simple, automated tests, the crud-bench benchmarking tool will automatically start a Docker container for the datastore or database being benchmarked (when the datastore or database is networked). This configuration can be modified to connect to an optimised remote environment instead of running a Docker container locally. This allows crud-bench to run against remote datastores, and against distributed datastores on a local network or in the cloud.

Operating on a single table, the benchmark performs 5 main tasks:

  • Create: inserting N unique records, with the specified concurrency.
  • Read: reading N unique records, with the specified concurrency.
  • Update: updating N unique records, with the specified concurrency.
  • Scan: performing a number of range and table scans, with the specified concurrency.
  • Delete: deleting N unique records, with the specified concurrency.

With crud-bench, almost all aspects of the benchmark engine are configurable:

  • The number of rows or records (samples).
  • The number of concurrent clients or connections.
  • The number of concurrent threads (concurrent messages per client).
  • Whether rows or records are modified sequentially or randomly.
  • The primary id or key type for the records.
  • The row or record content including support for nested objects and arrays.
  • The scan specifications for range or table queries.
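Each of these options corresponds to a command-line flag; a hypothetical invocation combining several of them (with arbitrary example values) might look like:

```bash
# 100,000 samples, 8 concurrent clients with 16 threads each,
# UUID keys generated in pseudo-random order, against the dry adapter
cargo run -r -- -d dry -s 100000 -c 8 -t 16 -r -k uuid
```

See the Usage section below for the full list of flags.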

Benchmarks

As crud-bench is in active development, some benchmarking workloads are already implemented, while others will be added in future releases. The list below details which benchmarks are implemented for the supported datastores, and which are planned for the future.

CRUD

  • Creating single records in individual transactions
  • Reading single records in individual transactions
  • Updating single records in individual transactions
  • Deleting single records in individual transactions
  • Batch creating multiple records in a transaction
  • Batch reading multiple records in a transaction
  • Batch updating multiple records in a transaction
  • Batch deleting multiple records in a transaction

Scans

  • Full table scans, projecting all fields
  • Full table scans, projecting id field
  • Full table count queries
  • Scans with a limit, projecting all fields
  • Scans with a limit, projecting id field
  • Scans with a limit, counting results
  • Scans with a limit and offset, projecting all fields
  • Scans with a limit and offset, projecting id field
  • Scans with a limit and offset, counting results

Filters

  • Full table query, using filter condition, projecting all fields
  • Full table query, using filter condition, projecting id field
  • Full table query, using filter condition, counting rows

Indexes

  • Indexed table query, using filter condition, projecting all fields
  • Indexed table query, using filter condition, projecting id field
  • Indexed table query, using filter condition, counting rows

Relationships

  • Fetching or traversing 1-level, one-to-one relationships or joins
  • Fetching or traversing 1-level, one-to-many relationships or joins
  • Fetching or traversing 1-level, many-to-many relationships or joins
  • Fetching or traversing n-level, one-to-one relationships or joins
  • Fetching or traversing n-level, one-to-many relationships or joins
  • Fetching or traversing n-level, many-to-many relationships or joins

Workloads

  • Workload support for creating, updating, and reading records concurrently

Requirements

  • Docker - required when running automated tests
  • Rust - required when building crud-bench from source
  • Cargo - required when building crud-bench from source
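Assuming a standard Rust toolchain is installed, building crud-bench from source might look like this:

```bash
# Fetch the repository and compile an optimised release binary
git clone https://github.com/surrealdb/crud-bench.git
cd crud-bench
cargo build -r
```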

Usage

```bash
cargo run -r -- -h
```

```
Usage: crud-bench [OPTIONS] --database <DATABASE> --samples <SAMPLES>

Options:
  -n, --name <NAME>          An optional name for the test, used as a suffix for the JSON result file name
  -d, --database <DATABASE>  The database to benchmark [possible values: dry, map, arangodb, dragonfly, fjall, keydb, lmdb, mongodb, mysql, neo4j, postgres, redb, redis, rocksdb, scylladb, sqlite, surrealkv, surrealdb, surrealdb-memory, surrealdb-rocksdb, surrealdb-surrealkv]
  -i, --image <IMAGE>        Specify a custom Docker image
  -p, --privileged           Whether to run Docker in privileged mode
  -e, --endpoint <ENDPOINT>  Specify a custom endpoint to connect to
  -b, --blocking <BLOCKING>  Maximum number of blocking threads (default is the number of CPU cores) [default: 12]
  -w, --workers <WORKERS>    Number of async runtime workers (default is the number of CPU cores) [default: 12]
  -c, --clients <CLIENTS>    Number of concurrent clients [default: 1]
  -t, --threads <THREADS>    Number of concurrent threads per client [default: 1]
  -s, --samples <SAMPLES>    Number of samples to be created, read, updated, and deleted
  -r, --random               Generate the keys in a pseudo-randomized order
  -k, --key <KEY>            The type of the key [default: integer] [possible values: integer, string26, string90, string250, string506, uuid]
  -v, --value <VALUE>        Size of the text value [env: CRUD_BENCH_VALUE=] [default: "{\n\t\t\t\"text\": \"string:50\",\n\t\t\t\"integer\": \"int\"\n\t\t}"]
      --show-sample          Print-out an example of a generated value
      --pid <PID>            Collect system information for a given pid
  -a, --scans <SCANS>        An array of scan specifications [env: CRUD_BENCH_SCANS=] [default: "[\n\t\t\t{ \"name\": \"count_all\", \"samples\": 100, \"projection\": \"COUNT\" },\n\t\t\t{ \"name\": \"limit_id\", \"samples\": 100, \"projection\": \"ID\", \"limit\": 100, \"expect\": 100 },\n\t\t\t{ \"name\": \"limit_all\", \"samples\": 100, \"projection\": \"FULL\", \"limit\": 100, \"expect\": 100 },\n\t\t\t{ \"name\": \"limit_count\", \"samples\": 100, \"projection\": \"COUNT\", \"limit\": 100, \"expect\": 100 },\n\t\t\t{ \"name\": \"limit_start_id\", \"samples\": 100, \"projection\": \"ID\", \"start\": 5000, \"limit\": 100, \"expect\": 100 },\n\t\t\t{ \"name\": \"limit_start_all\", \"samples\": 100, \"projection\": \"FULL\", \"start\": 5000, \"limit\": 100, \"expect\": 100 },\n\t\t\t{ \"name\": \"limit_start_count\", \"samples\": 100, \"projection\": \"COUNT\", \"start\": 5000, \"limit\": 100, \"expect\": 100 }\n\t\t]"]
  -h, --help                 Print help (see more with '--help')
```

For more detailed help information run the following command:

```bash
cargo run -r -- --help
```

Value

You can use the argument -v or --value (or the environment variable CRUD_BENCH_VALUE) to customize the row, document, or record value which should be used in the benchmark tests. Pass a JSON structure that will serve as a template for generating a randomized value.

Note

For tabular or column-oriented databases (e.g. Postgres, MySQL, ScyllaDB), the first-level fields of the JSON structure are mapped to columns, and any nested structures are stored in a JSON column where possible.

Within the JSON structure, the following values are replaced by randomly generated data:

  • Every occurrence of string:XX will be replaced by a random string with XX characters.
  • Every occurrence of text:XX will be replaced by a random string made of words of 2 to 10 characters, for a total of XX characters.
  • Every occurrence of string:X..Y will be replaced by a random string between X and Y characters.
  • Every occurrence of text:X..Y will be replaced by a random string made of words of 2 to 10 characters, for a total between X and Y characters.
  • Every int will be replaced by a random integer (i32).
  • Every int:X..Y will be replaced by a random integer (i32) between X and Y.
  • Every float will be replaced by a random float (f32).
  • Every float:X..Y will be replaced by a random float (f32) between X and Y.
  • Every uuid will be replaced by a random UUID (v4).
  • Every bool will be replaced by a random boolean (true or false).
  • Every string_enum:A,B,C will be replaced by one of the strings A, B, or C.
  • Every int_enum:A,B,C will be replaced by one of the i32 values A, B, or C.
  • Every float_enum:A,B,C will be replaced by one of the f32 values A, B, or C.
  • Every datetime will be replaced by a datetime (ISO 8601).
```json
{
  "text": "text:30",
  "text_range": "text:10..50",
  "bool": "bool",
  "string_enum": "enum:foo,bar",
  "datetime": "datetime",
  "float": "float",
  "float_range": "float:1..10",
  "float_enum": "float:1.1,2.2,3.3",
  "integer": "int",
  "integer_range": "int:1..5",
  "integer_enum": "int:1,2,3",
  "uuid": "uuid",
  "nested": {
    "text": "text:100",
    "array": [
      "string:10",
      "string:2..5"
    ]
  }
}
```
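The same template can also be supplied through the CRUD_BENCH_VALUE environment variable instead of the -v flag; a minimal sketch using the dry datastore:

```bash
# Equivalent to passing the template with -v / --value
export CRUD_BENCH_VALUE='{ "text": "text:30", "integer": "int" }'
cargo run -r -- -d dry -s 10000 -c 4 -t 8
```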

Scans

You can use the argument -a or --scans (or the environment variable CRUD_BENCH_SCANS) to customise the range, table, or scan queries that are performed in the benchmark. This parameter accepts a JSON array, where each item represents a different scan test. Each test is defined as a JSON object specifying the scan parameters and the test name.

Note

Not every database benchmark adapter supports scans or range queries. In such cases, the benchmark will not fail but the associated tests will indicate that the benchmark was skipped.

Each scan object can make use of the following values:

  • name: A descriptive name for the test.
  • projection: The projection type of the scan:
    • "ID": only the ID is returned.
    • "FULL": the whole record is returned.
    • "COUNT": count the number of records.
  • start: Skips the specified number of rows before starting to return rows.
  • limit: Specifies the maximum number of rows to return.
  • expect: (optional) Asserts the expected number of rows returned.
```json
[
  {
    "name": "limit100",
    "projection": "FULL",
    "start": 0,
    "limit": 100,
    "expect": 100
  },
  {
    "name": "start100",
    "projection": "ID",
    "start": 100,
    "limit": 100,
    "expect": 100
  }
]
```
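As with the value template, the scan specifications can be supplied through the CRUD_BENCH_SCANS environment variable instead of the -a flag; a minimal sketch using the dry datastore:

```bash
# Equivalent to passing the specifications with -a / --scans
export CRUD_BENCH_SCANS='[{ "name": "limit100", "projection": "FULL", "limit": 100, "expect": 100 }]'
cargo run -r -- -d dry -s 10000 -c 4 -t 8
```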

Databases

Dry

This benchmark does not interact with any datastore, allowing the overhead of the benchmark implementation, written in Rust, to be measured.

cargo run -r -- -d dry -s 100000 -c 12 -t 24 -r

ArangoDB

ArangoDB is a multi-model database with flexible data modeling and efficient querying.

cargo run -r -- -d arangodb -s 100000 -c 12 -t 24 -r

The above command starts a Docker container automatically. To connect to an already-running ArangoDB instance use the following command:

cargo run -r -- -d arangodb -e http://127.0.0.1:8529 -s 100000 -c 12 -t 24 -r

Dragonfly

Dragonfly is an in-memory, networked datastore which is fully compatible with the Redis and Memcached APIs.

cargo run -r -- -d dragonfly -s 100000 -c 12 -t 24 -r

The above command starts a Docker container automatically. To connect to an already-running Dragonfly instance use the following command:

cargo run -r -- -d dragonfly -e redis://:root@127.0.0.1:6379 -s 100000 -c 12 -t 24 -r

Fjall

Fjall is a transactional, ACID-compliant, embedded, key-value datastore, written in safe Rust, and based on LSM-trees.

cargo run -r -- -d fjall -s 100000 -c 12 -t 24 -r

KeyDB

KeyDB is an in-memory, networked datastore which is a high-performance fork of Redis, with a focus on multithreading.

cargo run -r -- -d keydb -s 100000 -c 12 -t 24 -r

The above command starts a Docker container automatically. To connect to an already-running KeyDB instance use the following command:

cargo run -r -- -d keydb -e redis://:root@127.0.0.1:6379 -s 100000 -c 12 -t 24 -r

LMDB

LMDB is a transactional, ACID-compliant, embedded, key-value datastore, based on B-trees.

cargo run -r -- -d lmdb -s 100000 -c 12 -t 24 -r

Map

An in-memory, concurrent, associative HashMap in Rust.

cargo run -r -- -d map -s 100000 -c 12 -t 24 -r

MongoDB

MongoDB is a NoSQL, networked, ACID-compliant, document-oriented database, with support for unstructured data storage.

cargo run -r -- -d mongodb -s 100000 -c 12 -t 24 -r

The above command starts a Docker container automatically. To connect to an already-running MongoDB instance use the following command:

cargo run -r -- -d mongodb -e mongodb://root:root@127.0.0.1:27017 -s 100000 -c 12 -t 24 -r

MySQL

MySQL is a networked, relational, ACID-compliant, SQL-based database.

cargo run -r -- -d mysql -s 100000 -c 12 -t 24 -r

The above command starts a Docker container automatically. To connect to an already-running MySQL instance use the following command:

cargo run -r -- -d mysql -e mysql://root:mysql@127.0.0.1:3306/bench -s 100000 -c 12 -t 24 -r

Neo4j

Neo4j is a graph database management system for connected data.

cargo run -r -- -d neo4j -s 100000 -c 12 -t 24 -r

The above command starts a Docker container automatically. To connect to an already-running Neo4j instance use the following command:

cargo run -r -- -d neo4j -e '127.0.0.1:7687' -s 100000 -c 12 -t 24 -r

Postgres

Postgres is a networked, object-relational, ACID-compliant, SQL-based database.

cargo run -r -- -d postgres -s 100000 -c 12 -t 24 -r

The above command starts a Docker container automatically. To connect to an already-running Postgres instance use the following command:

cargo run -r -- -d postgres -e 'host=127.0.0.1 user=postgres password=postgres' -s 100000 -c 12 -t 24 -r

ReDB

ReDB is a transactional, ACID-compliant, embedded, key-value datastore, written in Rust, and based on B-trees.

cargo run -r -- -d redb -s 100000 -c 12 -t 24 -r

Redis

Redis is an in-memory, networked datastore that can be used as a cache, message broker, or database.

cargo run -r -- -d redis -s 100000 -c 12 -t 24 -r

The above command starts a Docker container automatically. To connect to an already-running Redis instance use the following command:

cargo run -r -- -d redis -e redis://:root@127.0.0.1:6379 -s 100000 -c 12 -t 24 -r

RocksDB

RocksDB is a transactional, ACID-compliant, embedded, key-value datastore, based on LSM-trees.

cargo run -r -- -d rocksdb -s 100000 -c 12 -t 24 -r

ScyllaDB

ScyllaDB is a distributed, NoSQL, wide-column datastore, designed to be compatible with Cassandra.

cargo run -r -- -d scylladb -s 100000 -c 12 -t 24 -r

The above command starts a Docker container automatically. To connect to an already-running ScyllaDB cluster use the following command:

cargo run -r -- -d scylladb -e 127.0.0.1:9042 -s 100000 -c 12 -t 24 -r

SQLite

SQLite is an embedded, relational, ACID-compliant, SQL-based database.

cargo run -r -- -d sqlite -s 100000 -c 12 -t 24 -r

SurrealDB (in-memory storage engine)

cargo run -r -- -d surrealdb-memory -s 100000 -c 12 -t 24 -r

SurrealDB (RocksDB storage engine)

cargo run -r -- -d surrealdb-rocksdb -s 100000 -c 12 -t 24 -r

SurrealDB (SurrealKV storage engine)

cargo run -r -- -d surrealdb-surrealkv -s 100000 -c 12 -t 24 -r

SurrealDB embedded (in-memory storage engine)

cargo run -r -- -d surrealdb -e memory -s 100000 -c 12 -t 24 -r

SurrealDB embedded (RocksDB storage engine)

cargo run -r -- -d surrealdb -e rocksdb:/tmp/db -s 100000 -c 12 -t 24 -r

SurrealDB embedded (SurrealKV storage engine)

cargo run -r -- -d surrealdb -e surrealkv:/tmp/db -s 100000 -c 12 -t 24 -r

SurrealKV

SurrealKV is a transactional, ACID-compliant, embedded, key-value datastore, written in Rust, and based on concurrent adaptive radix trees.

cargo run -r -- -d surrealkv -s 100000 -c 12 -t 24 -r

SurrealDB local benchmark

To run the benchmark against an already running SurrealDB instance, follow the steps below.

Start a SurrealDB server:

surreal start --allow-all -u root -p root rocksdb:/tmp/db

Then run crud-bench with the surrealdb database option:

cargo run -r -- -d surrealdb -e ws://127.0.0.1:8000 -s 100000 -c 12 -t 24 -r
