Skip to content

cachegrand - a modern data ingestion, processing and serving platform built for today's hardware

License

Notifications You must be signed in to change notification settings

danielealbano/cachegrand

License Build & Test codecov FOSSA Status

cachegrand logo

Table of Content

What is cachegrand?

cachegrand is a high-performance key-value store designed to optimize performance on modern hardware. It offers efficient data handling, processing, and retrieval capabilities through standard protocols and APIs. With its impressive scalability, cachegrand can handle millions of operations per second, all completed in less than a millisecond.

Underneath its robust architecture, cachegrand employs its own database engine that seamlessly works with various data types to to enable a versatile data ingestion and retrieval. Additionally, cachegrand supports WebAssembly, empowering server-side data processing.

One of cachegrand's notable features is its compatibility with widely adopted protocols and interfaces, including Kafka, Redis, and Memcache. This compatibility allows developers to leverage their existing familiarity with SDKs and frameworks.

cachegrand is still in development and some of the features are still in the works, but it's already possible to use it as a high-performance key-value store with the Redis protocol.

Use cases

cachegrand simplifies real-time operations by providing an integrated solution, eliminating the need for stitching together multiple technologies. It offers a streamlined approach to manage and process data, enabling efficient real-time scenarios.

With cachegrand, you can:

  • Ingest data using familiar streaming methods like Kafka or Redis.
  • Process data using WebAssembly-compatible languages like Python or Rust.
  • Leverage the internal database to store intermediate and final results.
  • Deliver processed data using supported protocols such as Kafka, Redis, HTTP(s), or Memcached.

But it's also possible just to use single components of cachegrand, like the Key-Value store via the Redis protocol, or the Kafka protocol, both with or without the WebAssembly support, depending on your needs.

Implemented Key features

Currently cachegrand supports only the Redis interface to interact with the Key-Value store, support for the Kafka interface and the WebAssembly support are still in the works.

  • Redis protocol support
  • Prometheus endpoint for monitoring
  • Blazing fast hashtable capable of digesting 2.1 billion records per second on a 1x AMD EPYC 7502
  • Scales vertically, 2x cpus means ~2x requests
  • In-memory and on-disk storage
  • Supports different collection data types (Streams, Lists, Hashsets, Sorted Sets, etc.) (WIP)

Planned Key Features

  • More modules for additional platforms compatibility, e.g. Kafka, Memcache, HTTPS, AWS S3, DataDog, etc.
  • And ad-hoc UDP message-based (Homa-like) network stack based on Linux XDP (eXpress Data Path)
  • WebAssembly to provide User Defined Functions, event hooks, implement modules, you can use your preferred language to perform operations server side
  • Automatic load balancing, Replication groups, Eventual Consistency

It's possible to find more information in the docs' folder.

cachegrand runs on Linux on x86-64 (Intel and AMD) and aarch64 (ARMv8, e.g. Raspberry PI 4, Orange PI 5, etc.), we are planning to port it to more hardware (e.g. RISC) once will become more feature complete.

Benchmarks

Internal Key-Value stor via the Redis interface

The benchmarks are regularly carried out on an AMD EPYC 7502P with 2 x 25Gbit network links using Ubuntu 22.04 and two other servers, with the same hardware, to generate load using memtier_benchmark.

GET/SET latest benchmarks   GET/SET with batching latest benchmarks

Benchmarks source

Quick Start

Simply run

docker run \
  --ulimit memlock=-1:-1 \
  --ulimit nofile=262144:262144 \
  -p 6379:6379 \
  -p 6380:6380 \
  -p 9090:9090 \
  -it \
  --rm \
  cachegrand/cachegrand-server:latest

it comes with a default config file with Redis on port 6379, TLS-enabled Redis on port 6380 and Prometheus on port 9090 with the metrics endpoint on /metrics.

The certificate will be generated on each start, to use an ad-hoc SSL certificate, instead of the auto-generated one, it's possible to mount the required certificate and key using the following command

docker run \
  -v /path/to/certificate.pem:/etc/cachegrand/cachegrand.pem \
  -v /path/to/certificate.key:/etc/cachegrand/cachegrand.key \
  --ulimit memlock=-1:-1 \
  --ulimit nofile=262144:262144 \
  -p 6379:6379 \
  -p 6380:6380 \
  -p 9090:9090 \
  -it \
  --rm \
  cachegrand/cachegrand-server:latest

if you want to use a custom configuration, you can download the default configuration file using the following command

curl https://raw.githubusercontent.com/danielealbano/cachegrand/main/etc/cachegrand.yaml.skel -o /path/to/cachegrand.yaml

Edit it and then use it with the cachegrand's docker image using the following command

docker run \
  -v /path/to/cachegrand.yaml:/etc/cachegrand/cachegrand.yaml \
  --ulimit memlock=-1:-1 \
  --ulimit nofile=262144:262144 \
  -p 6379:6379 \
  -p 6380:6380 \
  -p 9090:9090 \
  -it \
  --rm \
  cachegrand/cachegrand-server:latest

Configuration

cachegrand comes with a default configuration but for production use please review the documentation to ensure an optimal deployment.

Running cachegrand

Build from source

Instructions on how to build cachegrand from the sources are available in the documentation

cachegrand doesn't need to run as root but please review the configuration section to ensure that enough lockable memory has been allowed, enough files can be opened and that the slab allocator has been enabled and enough huge pages have been provided

Before trying to start cachegrand, take a look to the performance tips available in the docs' section as they might provide a valuable help!

Help

$ ./cachegrand-server --help
Usage: cachegrand-server [OPTION...]

  -c, --config-file=FILE     Config file (default config file
                             /usr/local/etc/cachegrand/cachegrand.conf )
  -l, --log-level=LOG LEVEL  log level (error, warning, info, verbose, debug)
  -?, --help                 Give this help list
      --usage                Give a short usage message

Mandatory or optional arguments to long options are also mandatory or optional
for any corresponding short options.

Start it locally

Once cachegrand has been built from the sources, it's possible to run it with the following command

/path/to/cachegrand-server -c /path/to/cachegrand.yaml.skel
[2023-04-08T18:25:54Z][INFO       ][program_startup_report] cachegrand-server version v0.3.0 (built on 2023-05-08T18:19:31Z)
[2023-04-08T18:25:54Z][INFO       ][program_startup_report] > Release build, compiled using gcc v11.3.0
[2023-04-08T18:25:54Z][INFO       ][program_startup_report] > Hashing algorithm in use t1ha2
[2023-04-08T18:25:54Z][INFO       ][program_startup_report] > Running on Linux bf2b94ff7fa5 5.19.0-38-generic #39-Ubuntu SMP PREEMPT_DYNAMIC Fri Mar 17 17:33:16 UTC 2023 x86_64
[2023-04-08T18:25:54Z][INFO       ][program_startup_report] > Memory: 128722 MB total, 1955 MB swap total
[2023-04-08T18:25:54Z][INFO       ][program_startup_report] > TLS: mbed TLS 2.28.0 (kernel offloading enabled)
[2023-04-08T18:25:54Z][INFO       ][program_startup_report] > Realtime clock source <POSIX>, resolution <4 ms>
[2023-04-08T18:25:54Z][INFO       ][program_startup_report] > Monotonic clock source <Hardware (TSC)> (estimated cpu cycles per second <4.20 GHz>), resolution <1 ms>
[2023-04-08T18:25:54Z][INFO       ][config] Loading the configuration from /etc/cachegrand/cachegrand.yaml
[2023-04-08T18:25:54Z][INFO       ][program] Starting <32> workers
[2023-04-08T18:25:54Z][INFO       ][worker][id: 00][cpu: 00][module_redis_snapshot_load] Snapshot file </var/lib/cachegrand/dump.rdb> does not exist
[2023-04-08T18:25:55Z][INFO       ][program] Ready to accept connections

License

cachegrand is Open Source and licensed under the BSD 3-Clause License, all the files under the repository are licensed under the same license unless otherwise specified (for example, but not limited, via a README or a LICENSE in a sub folder or as header in the source files).

cachegrand uses a number of different components all licensed under compatible licenses but if you spot any that is not compatible with the BSD 3-Clause License please open an issue, we will be happy to fix it promptly!

Contributing

Please if you find any bug, malfunction or regression feel free to open an issue or to fork the repository and submit your PRs! If you do open an Issue for a crash, if possible please enable sentry.io in the configuration file and try to reproduce the crash, a minidump will be automatically uploaded on sentry.io.

Also, if you have built cachegrand from the source, please attach the compiled binary to the issue as well as sentry.io knows nothing of your own compiled binaries.