adding nosqlbench test scripts #118

Merged (4 commits, Feb 22, 2023)

Changes from 2 commits
2 changes: 1 addition & 1 deletion docker-compose/docker-compose-dev-mode.yml
@@ -34,7 +34,7 @@ services:
networks:
- stargate
ports:
- "8180:8080"
- "8080:8080"
mem_limit: 2G
environment:
- QUARKUS_GRPC_CLIENTS_BRIDGE_HOST=coordinator
2 changes: 1 addition & 1 deletion docker-compose/docker-compose.yml
@@ -98,7 +98,7 @@ services:
networks:
- stargate
ports:
- "8180:8080"
- "8080:8080"
mem_limit: 2G
environment:
- QUARKUS_GRPC_CLIENTS_BRIDGE_HOST=coordinator
4 changes: 2 additions & 2 deletions docker-compose/start_dse_68.sh
@@ -35,6 +35,6 @@ export DSETAG
export SGTAG
export JSONTAG

echo "Running with DSE $DSETAG, Stargate $SGTAG, JSON API $JSON"
echo "Running with DSE $DSETAG, Stargate $SGTAG, JSON API $JSONTAG"

docker-compose up -d
docker-compose up -d --wait
4 changes: 2 additions & 2 deletions docker-compose/start_dse_68_dev_mode.sh
@@ -35,6 +35,6 @@ export DSETAG
export SGTAG
export JSONTAG

echo "Running with DSE $DSETAG, Stargate $SGTAG, JSON API $JSON"
echo "Running with DSE $DSETAG, Stargate $SGTAG, JSON API $JSONTAG"

docker-compose -f docker-compose-dev-mode.yml up -d
docker-compose -f docker-compose-dev-mode.yml up -d --wait
64 changes: 64 additions & 0 deletions nosqlbench/http-jsonapi-crud-basic.md
@@ -0,0 +1,64 @@
# JSON API CRUD Basic

## Description

The JSON API CRUD Basic workflow targets Stargate's JSON API using generated JSON documents.
The documents share the same structure and each is approximately half a kilobyte in size:

* each document has 13 leaf values, with a maximum depth of 3
* there is at least one `string`, `boolean`, `number` and `null` leaf
* there is one array with `double` values and one with `string` values
* there is one empty array and one empty map

The example JSON looks like:

```json
{
  "user_id": "56fd76f6-081d-401a-85eb-b1d9e5bba058",
  "created_on": 1476743286,
  "gender": "F",
  "full_name": "Andrew Daniels",
  "married": true,
  "address": {
    "primary": {
      "cc": "IO",
      "city": "Okmulgee"
    },
    "secondary": {}
  },
  "coordinates": [
    64.65964627052323,
    -122.35334535072856
  ],
  "children": [],
  "friends": [
    "3df498b1-9568-4584-96fd-76f6081da01a"
  ],
  "debt": null
}
```

In contrast to other workflows, this one is not split into ramp-up and main phases.
Instead, there is only the main phase with 4 different load types (write, read, update and delete).

## Named Scenarios

### default

The default scenario for http-jsonapi-crud-basic.yaml runs each load type of the main phase sequentially: write, read, update, and delete.
This means that the cycle count for each phase should be set using the `write-cycles`, `read-cycles`, `update-cycles`, and `delete-cycles` parameters.
The default value for all four cycle variables is the number of documents to process (see [Workload Parameters](#workload-parameters)).

Note that error handling is set to `errors=timer,warn`, which means that in case of HTTP errors the scenario is not stopped.

## Workload Parameters

- `docscount` - the number of documents to process in each step of a scenario (default: `10_000_000`)

Note that if the number of documents is higher than `read-cycles`, you will experience misses, which result in `HTTP 404` responses and lower latencies.
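For example, a run that overrides the document count and the per-phase cycle counts might look like the following sketch (assuming `nb5` is on the PATH and the workload yaml is in the working directory; the host, token, and counts are placeholders):

```bash
# Illustrative values only; adjust host, token, and counts for your environment.
nb5 http-jsonapi-crud-basic default \
  jsonapi_host=localhost auth_token=$AUTH_TOKEN \
  docscount=100000 \
  write-cycles=100000 read-cycles=100000 update-cycles=100000 delete-cycles=100000
```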


207 changes: 207 additions & 0 deletions nosqlbench/http-jsonapi-crud-basic.yaml
@@ -0,0 +1,207 @@
min_version: "4.17.15"

# nb5 -v run driver=http yaml=http-jsonapi-crud-basic jsonapi_host=my_jsonapi_host auth_token=$AUTH_TOKEN

description: |
  This workload emulates CRUD operations for the Stargate JSON API.
  It generates a simple JSON document to be used for writes and updates.
  Note that jsonapi_port should reflect the port where the JSON API is exposed (defaults to 8080).

scenarios:
  default:
    schema: run driver=http tags==block:schema threads==1 cycles==UNDEF
    write: run driver=http tags==name:"write.*" cycles===TEMPLATE(write-cycles,TEMPLATE(docscount,10000000)) threads=auto errors=timer,warn
    read: run driver=http tags==name:"read.*" cycles===TEMPLATE(read-cycles,TEMPLATE(docscount,10000000)) threads=auto errors=timer,warn
    update: run driver=http tags==name:"update.*" cycles===TEMPLATE(update-cycles,TEMPLATE(docscount,10000000)) threads=auto errors=timer,warn
    delete: run driver=http tags==name:"delete.*" cycles===TEMPLATE(delete-cycles,TEMPLATE(docscount,10000000)) threads=auto errors=timer,warn

bindings:
  # To enable an optional weighted set of hosts in place of a load balancer
  # Examples
  #   single host: jsonapi_host=host1
  #   multiple hosts: jsonapi_host=host1,host2,host3
  #   multiple weighted hosts: jsonapi_host=host1:3,host2:7
  weighted_hosts: WeightedStrings('<<jsonapi_host:jsonapi>>')
  # http request id
  request_id: ToHashedUUID(); ToString();

  seq_key: Mod(<<docscount:10000000>>); ToString() -> String
  random_key: Uniform(0,<<docscount:10000000>>); ToString() -> String

  user_id: ToHashedUUID(); ToString() -> String
  created_on: Uniform(1262304000,1577836800) -> long
  gender: WeightedStrings('M:10;F:10;O:1')
  full_name: FullNames()
  married: ModuloToBoolean()
  city: Cities()
  country_code: CountryCodes()
  lat: Uniform(-180d, 180d)
  lng: Hash() -> long; Uniform(-180d, 180d)
  friend_id: Add(-1); ToHashedUUID(); ToString() -> String

blocks:
  schema:
    ops:
      create-namespace:
ivansenic (Contributor) commented on Feb 15, 2023:

Can we add response assertions for everything? We pretty much know that everything must be 200, that schema change commands respond with `"ok": 1` (or something like that), that inserting docs returns the inserted id, etc. We should also confirm, if possible, that there are no errors in the returned body.

Contributor Author:

No, we should not do this. Response assertions are used to determine when the script should abort. There is a very limited case where we'd want that: if the setup (schema creation) fails.

Contributor:

I don't think this is true; assertions do not stop the script. The reason I think this is important is to be certain we are getting timing results for correctly executed use cases and not for error cases, because error cases are often faster. Think about a missing coordinator: this test would run really fast.

Contributor:

One possible problem is if there is no check that reads target documents that were actually written: if not, we'd (correctly) get 404 when trying to access non-existing documents, so the return value depends on whether the document(s) actually exist. I agree it would be very useful to have sanity checks so that we do not accidentally "test" the performance of broken tests (nothing found, ever, which can be fast).

jshook commented on Feb 15, 2023:

I have some ideas on this, but I don't have any really prescriptive answers so far. I do think the answers come more easily if you know exactly what you are testing for, i.e. correctness, performance, etc. In this case, we generally want to verify that schema creation works all the time, but we want to allow for some "empty reads" in performance tests. If you say instead that you are doing a correctness test, then you move more towards wanting to qualify each and every result.

While it is fairly easy to construct bindings which can be used for correctness assertions the second time you do it, the first time is often a learning exercise. But it is quite doable. You can use specific strategies for building bindings so that you know which operations should return non-empty results and which ones should not, or even how many results each operation should return.

So, my main question is: what exactly are you testing for in this case?

Contributor:

I am not saying this specifically for the schema change responses; I am saying let's add it everywhere. But the question is a good one: what do we want from these tests?

Contributor Author:

As per Shooky's example, we do want to allow empty reads. These are performance tests, not correctness tests.

By default the tests do exit on an error condition, but we can override this with the setting `errors=count` to get statistics on errors, which we can then analyze.

According to the JSON API spec, this service is going to return 200 for basically everything except a 500 server error, so identifying error conditions per call will require more work. We'll probably need to do something like:

    ok-status: "200"
    ok-body: "some regex"

IMO that's beyond the scope of this initial PR. I'd just like to get these checked in initially and begin running them regularly, and then we can improve the tests as we go.

        method: POST
        uri: <<protocol:http>>://{weighted_hosts}:<<jsonapi_port:8080>><<path_prefix:>>/v1
        Accept: "application/json"
        X-Cassandra-Request-Id: "{request_id}"
        X-Cassandra-Token: "<<auth_token:my_auth_token>>"
        Content-Type: "application/json"
        body: |
          {
            "createNamespace": {
              "name": "<<namespace:jsonapi_crud_basic>>"
            }
          }

      # TODO schema deletion not yet addressed by JSON API Spec
      #delete-collection:
      #  method: POST
      #  uri: <<protocol:http>>://{weighted_hosts}:<<jsonapi_port:8080>><<path_prefix:>>/v1/<<namespace:jsonapi_crud_basic>>
      #  Accept: "application/json"
      #  X-Cassandra-Request-Id: "{request_id}"
      #  X-Cassandra-Token: "<<auth_token:my_auth_token>>"
      #  ok-status: "[2-4][0-9][0-9]"
      #  body: |
      #    {
      #      "deleteCollection": {
      #        "name": "<<collection:docs_collection>>"
      #      }
      #    }

      create-collection:
        method: POST
        uri: <<protocol:http>>://{weighted_hosts}:<<jsonapi_port:8080>><<path_prefix:>>/v1/<<namespace:jsonapi_crud_basic>>
        Accept: "application/json"
        X-Cassandra-Request-Id: "{request_id}"
        X-Cassandra-Token: "<<auth_token:my_auth_token>>"
        Content-Type: "application/json"
        body: |
          {
            "createCollection": {
              "name": "<<collection:docs_collection>>"
            }
          }

  write:
    ops:
      # aka insertOne
      write-document:
        method: POST
        uri: <<protocol:http>>://{weighted_hosts}:<<jsonapi_port:8080>><<path_prefix:>>/v1/<<namespace:jsonapi_crud_basic>>/<<collection:docs_collection>>
        Accept: "application/json"
        X-Cassandra-Request-Id: "{request_id}"
        X-Cassandra-Token: "<<auth_token:my_auth_token>>"
        Content-Type: "application/json"
        body: |
          {
            "insertOne" : {
              "document" : {
                "_id" : "{seq_key}",
                "user_id": "{user_id}",
                "created_on": {created_on},
                "gender": "{gender}",
                "full_name": "{full_name}",
                "married": {married},
                "address": {
                  "primary": {
                    "city": "{city}",
                    "cc": "{country_code}"
                  },
                  "secondary": {}
                },
                "coordinates": [
                  {lat},
                  {lng}
                ],
                "children": [],
                "friends": [
                  "{friend_id}"
                ]
              }
            }
          }

  read:
    ops:
      # aka findOne with _id as filter
      read-document:
        method: POST
        uri: <<protocol:http>>://{weighted_hosts}:<<jsonapi_port:8080>><<path_prefix:>>/v1/<<namespace:jsonapi_crud_basic>>/<<collection:docs_collection>>
        Accept: "application/json"
        X-Cassandra-Request-Id: "{request_id}"
        X-Cassandra-Token: "<<auth_token:my_auth_token>>"
        Content-Type: "application/json"
        body: |
          {
            "findOne" : {
              "filter" : {
                "_id" : "{random_key}"
              }
            }
          }

  update:
    ops:
      # aka findOneAndUpdate
      # for parity with other tests this only uses set, not unset, no return value
      update-document:
        method: POST
        uri: <<protocol:http>>://{weighted_hosts}:<<jsonapi_port:8080>><<path_prefix:>>/v1/<<namespace:jsonapi_crud_basic>>/<<collection:docs_collection>>
        Accept: "application/json"
        X-Cassandra-Request-Id: "{request_id}"
        X-Cassandra-Token: "<<auth_token:my_auth_token>>"
        Content-Type: "application/json"
        body: |
          {
            "findOneAndUpdate" : {
              "filter": {
                "_id" : "{random_key}"
              },
              "update": {
                "$set": {
                  "user_id": "{user_id}",
                  "created_on": {created_on},
                  "gender": "{gender}",
                  "full_name": "{full_name}",
                  "married": {married},
                  "address": {
                    "primary": {
                      "city": "{city}",
                      "cc": "{country_code}"
                    },
                    "secondary": {}
                  },
                  "coordinates": [
                    {lat},
                    {lng}
                  ],
                  "children": [],
                  "friends": [
                    "{friend_id}"
                  ],
                  "debt": null
                }
              }
            }
          }

  delete:
    ops:
      # aka deleteOne
      delete-document:
        method: POST
        uri: <<protocol:http>>://{weighted_hosts}:<<jsonapi_port:8080>><<path_prefix:>>/v1/<<namespace:jsonapi_crud_basic>>/<<collection:docs_collection>>
        Accept: "application/json"
        X-Cassandra-Request-Id: "{request_id}"
        X-Cassandra-Token: "<<auth_token:my_auth_token>>"
        Content-Type: "application/json"
        body: |
          {
            "deleteOne" : {
              "filter" : {
                "_id" : "{random_key}"
              }
            }
          }
41 changes: 41 additions & 0 deletions nosqlbench/http-jsonapi-crud-dataset.md
@@ -0,0 +1,41 @@
# JSON API CRUD using an external Dataset

## Description

The JSON API CRUD Dataset workflow targets Stargate's JSON API using JSON documents from an external dataset.
The [dataset](#dataset) is mandatory and must contain one JSON document per line; these documents are used as the input for write and update operations.
This workflow is well suited for testing Stargate performance with your own JSON dataset or any other realistic dataset.

In contrast to other workflows, this one is not split into ramp-up and main phases.
Instead, there is only the main phase with 4 different load types (write, read, update and delete).

## Named Scenarios

### default

The default scenario for http-jsonapi-crud-dataset.yaml runs each load type of the main phase sequentially: write, read, update, and delete.
This means that the cycle count for each phase should be set using the `write-cycles`, `read-cycles`, `update-cycles`, and `delete-cycles` parameters.
The default value for all four cycle variables is the number of documents to process (see [Workload Parameters](#workload-parameters)).

Note that error handling is set to `errors=timer,warn`, which means that in case of HTTP errors the scenario is not stopped.

## Dataset

### JSON Documents

As explained above, a file containing JSON documents (one per line) is needed in order to run the workflow.
If you don't have a dataset at hand, please have a look at [awesome-json-datasets](https://github.com/jdorfman/awesome-json-datasets).
You can use exposed public APIs to create a realistic dataset of your choice.

For example, you can easily create a dataset containing [Bitcoin unconfirmed transactions](https://gist.github.com/ivansenic/e280a89aba6420acb4f587d3779af774).

```bash
curl 'https://blockchain.info/unconfirmed-transactions?format=json&limit=5000' | jq -c '.txs | .[]' > blockchain-unconfirmed-transactions.json
```

The above command creates a dataset with the 5,000 latest unconfirmed transactions.
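
To sanity-check the resulting file before running the workload (assuming `jq` is installed), a quick look at the line count and the first document is usually enough:

```bash
# The workload expects one JSON document per line (JSON Lines format).
wc -l blockchain-unconfirmed-transactions.json
head -n 1 blockchain-unconfirmed-transactions.json | jq .
```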

## Workload Parameters

- `docscount` - the number of documents to process in each step of a scenario (default: `10_000_000`)
- `dataset_file` - the file to read the JSON documents from (note that if the number of documents in the file is smaller than `docscount`, the documents will be reused)
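
A run of this workload might look like the following sketch (it assumes the workload accepts the same `jsonapi_host` and `auth_token` parameters as the basic workload above; the host, token, and file path are placeholders):

```bash
# Illustrative values only; dataset_file must point to a file with one JSON document per line.
nb5 http-jsonapi-crud-dataset default \
  jsonapi_host=localhost auth_token=$AUTH_TOKEN \
  dataset_file=blockchain-unconfirmed-transactions.json \
  docscount=5000
```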