stargate · Yuqi-Du · Sep 7, 2023 · Aug 21, 2023 · Aug 23, 2023 · Aug 28, 2023
@@ -0,0 +1,50 @@
+# JSON API Vector CRUD
+
+## Description
+
+The JSON API Vector CRUD workflow targets Stargate's JSON API using vector data from an external dataset.
+The [dataset](#dataset) is mandatory and must contain one vector per row; each vector is used as the input for the write, read, update and delete operations.
+This workflow is suited to testing Stargate performance with your own vector dataset or any other realistic dataset.
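
As a rough illustration of how one dataset row feeds the operations, the Python sketch below builds request bodies shaped like the `insertOne` and `find` ops in http-jsonapi-vector-crud.yaml. It assumes each dataset row is a JSON array of numbers; the helper names `build_insert_one` and `build_find` are hypothetical and not part of the workload.

```python
import json


def build_insert_one(doc_id, vector_line):
    """Build an insertOne body from a document id and one dataset row."""
    vector = json.loads(vector_line)  # assumption: the row is a JSON array
    return {"insertOne": {"document": {"_id": doc_id, "$vector": vector}}}


def build_find(vector_line, limit=10):
    """Build a find body that sorts by similarity to the given dataset row."""
    vector = json.loads(vector_line)
    return {
        "find": {
            "sort": {"$vector": vector},
            "projection": {"$vector": 1, "$similarity": 1},
            "options": {"limit": limit},
        }
    }


if __name__ == "__main__":
    # Shortened stand-in for a real 1536-element dataset row.
    line = "[0.1, 0.2, 0.3]"
    print(json.dumps(build_insert_one("0", line)))
    print(json.dumps(build_find(line)))
```

In the workload itself NoSQLBench assembles these bodies from the `seq_key` and `vector_json` bindings; the sketch only mirrors the resulting JSON shape.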
+
+In contrast to other workflows, this one is not split into ramp-up and main phases. Instead, there is only the main phase with 4 different load types (write, read, update and delete).
+
+## Named Scenarios
+
+### default
+
+The default scenario for http-jsonapi-vector-crud.yaml runs each type of the main phase sequentially: write, read, update and delete. Set the cycles for each phase with `write-cycles`, `read-cycles`, `update-cycles` and `delete-cycles`. Each of these four variables defaults to the number of documents to process (see [Workload Parameters](#workload-parameters)).
+
+Note that error handling is set to `errors=timer,warn`, which means that in case of HTTP errors the scenario is not stopped.
+
+## Dataset
+
+### Vector Sample
+
+The vector size in the NoSQLBench workload file is 1536, which is the standard size of an OpenAI embedding vector.
+A sample dataset is provided in [vector dataset](vector-dataset.txt).
+
+> To test a different vector size, change both the [http-jsonapi-vector-crud create-collection op](http-jsonapi-vector-crud.yaml) and the [vector dataset](vector-dataset.txt).
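
A minimal sketch of preparing a dataset for another vector size follows. It assumes the dataset holds one JSON array of floats per line, which is inferred from how the workload splices a dataset line into `$vector`. The function name `write_vector_dataset`, the output file name and the random values are hypothetical placeholders, not real embeddings.

```python
import json
import random


def write_vector_dataset(path, rows, size, seed=0):
    """Write `rows` lines, each a JSON array of `size` floats."""
    rng = random.Random(seed)  # fixed seed so repeated runs give the same file
    with open(path, "w", encoding="utf-8") as out:
        for _ in range(rows):
            vector = [round(rng.uniform(-1.0, 1.0), 6) for _ in range(size)]
            out.write(json.dumps(vector) + "\n")


if __name__ == "__main__":
    # Hypothetical example: 100 placeholder vectors of size 768.
    write_vector_dataset("vector-dataset-768.txt", rows=100, size=768)
```

The `size` used here would have to match the `size` in the `create-collection` op, and the generated file would be passed through the `dataset` template variable defined in the YAML bindings.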
+
+## Workload Parameters
+
+- `docscount` - the number of documents to process in each step of a scenario (default: `500`)
+- `dataset` - the file to read the vectors from (default: `vector-dataset.txt`; if the file contains fewer rows than `docscount`, rows are reused)
+- `connections` - number of HTTP/2 connections shared between the threads (default: `20`)
+- `write-cycles`, `read-cycles`, `update-cycles`, `delete-cycles` - number of cycles to run for each phase (default: `docscount`)
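
The defaulting of the cycle parameters can be pictured with the Python sketch below. It only illustrates the nested fallback written in the scenario as `TEMPLATE(write-cycles,TEMPLATE(docscount,500))` and is not NoSQLBench code; `resolve_cycles` is a hypothetical helper.

```python
def resolve_cycles(params, phase):
    """Return the cycles for `phase`.

    Uses the explicit `<phase>-cycles` value if given, otherwise `docscount`,
    otherwise the built-in default of 500.
    """
    docscount = int(params.get("docscount", 500))
    return int(params.get(f"{phase}-cycles", docscount))


if __name__ == "__main__":
    # Command-line style parameters arrive as strings.
    cli = {"docscount": "1000", "read-cycles": "200"}
    for phase in ("write", "read", "update", "delete"):
        print(phase, resolve_cycles(cli, phase))
```

With these example parameters only the read phase departs from `docscount`; the other three phases fall back to it.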
+
+## Sample Command
+
+### Against AstraDB
+
+> Comment out the `create-namespace` op in the [NoSQLBench YAML file](http-jsonapi-vector-crud.yaml) first.
+
+```
+nb5 -v http-jsonapi-vector-crud docscount=1000 threads=20 jsonapi_host=Your-AstraDB-Host auth_token=Your-AstraDB-Token jsonapi_port=443 protocol=https path_prefix=/api/json namespace=Your-Keyspace
+```
+
+### Against Local JSON API
+
+```
+nb5 -v http-jsonapi-vector-crud jsonapi_host=localhost docscount=1000 threads=20
+```
+
@@ -0,0 +1,197 @@
+min_version: "5.17.3"
+
+# Example command line
+# Against AstraDB
+# nb5 -v http-jsonapi-vector-crud docscount=1000 threads=20 jsonapi_host=Your-AstraDB-Host auth_token=Your-AstraDB-Token jsonapi_port=443 protocol=https path_prefix=/api/json namespace=Your-Keyspace
+# Against local JSON API
+# nb5 -v http-jsonapi-vector-crud jsonapi_host=localhost docscount=1000 threads=20
+
+description: >2
+  This workload emulates vector CRUD operations for the Stargate JSON API.
+  It requires a dataset file (default vector-dataset.txt) that contains vectors of size 1536.
+  1536 is the standard size of the vectors that OpenAI embeddings generate, so this size is used for the benchmark.
+
+
+scenarios:
+  default:
+    schema:   run driver=http tags==block:schema threads==1 cycles==UNDEF
+    write:    run driver=http tags==name:"write.*" cycles===TEMPLATE(write-cycles,TEMPLATE(docscount,500)) threads=auto errors=timer,warn
+    read:     run driver=http tags==name:"read.*" cycles===TEMPLATE(read-cycles,TEMPLATE(docscount,500)) threads=auto errors=timer,warn
+    update:   run driver=http tags==name:"update.*" cycles===TEMPLATE(update-cycles,TEMPLATE(docscount,500)) threads=auto errors=timer,warn
+    delete:   run driver=http tags==name:"delete.*" cycles===TEMPLATE(delete-cycles,TEMPLATE(docscount,500)) threads=auto errors=timer,warn
+
+bindings:
+  # To enable an optional weighted set of hosts in place of a load balancer
+  # Examples
+  #   single host: jsonapi_host=host1
+  #   multiple hosts: jsonapi_host=host1,host2,host3
+  #   multiple weighted hosts: jsonapi_host=host1:3,host2:7
+  weighted_hosts: WeightedStrings('<<jsonapi_host:jsonapi>>')
+
+  # spread into different spaces to use multiple connections
+  space: HashRange(1,<<connections:20>>); ToString();
+
+  # http request id
+  request_id: ToHashedUUID(); ToString();
+
+  # autogenerate auth token to use on API calls using configured uri/uid/password, unless one is provided
+  token: Discard(); Token('<<auth_token:>>','<<uri:http://localhost:8081/v1/auth>>', '<<uid:cassandra>>', '<<pswd:cassandra>>');
+
+  seq_key: Mod(<<docscount:500>>); ToString() -> String
+  random_key: Uniform(0,<<docscount:500>>); ToString() -> String
+  vector_json: HashedLineToString('<<dataset:vector-dataset.txt>>');
+
+blocks:
+  schema:
+    ops:
+      create-namespace:
+        method: POST
+        uri: <<protocol:http>>://{weighted_hosts}:<<jsonapi_port:8181>><<path_prefix:>>/v1
+        Accept: "application/json"
+        X-Cassandra-Request-Id: "{request_id}"
+        X-Cassandra-Token: "{token}"
+        Content-Type: "application/json"
+        ok-body: ".*\"ok\":1.*"
+        body: >2
+          {
+            "createNamespace": {
+              "name": "<<namespace:jsonapi_vector_crud_namespace>>"
+            }
+          }
+
+      delete-collection:
+        method: POST
+        uri: <<protocol:http>>://{weighted_hosts}:<<jsonapi_port:8181>><<path_prefix:>>/v1/<<namespace:jsonapi_vector_crud_namespace>>
+        Accept: "application/json"
+        X-Cassandra-Request-Id: "{request_id}"
+        X-Cassandra-Token: "{token}"
+        Content-Type: "application/json"
+        ok-body: ".*\"ok\":1.*"
+        body: >2
+          {
+            "deleteCollection": {
+              "name": "<<collection:jsonapi_vector_crud_collection>>"
+            }
+          }
+
+      create-collection:
+        method: POST
+        uri: <<protocol:http>>://{weighted_hosts}:<<jsonapi_port:8181>><<path_prefix:>>/v1/<<namespace:jsonapi_vector_crud_namespace>>
+        Accept: "application/json"
+        X-Cassandra-Request-Id: "{request_id}"
+        X-Cassandra-Token: "{token}"
+        Content-Type: "application/json"
+        ok-body: ".*\"ok\":1.*"
+#        vector must be enabled when creating the collection
+        body: >2
+          {
+            "createCollection": {
+              "name": "<<collection:jsonapi_vector_crud_collection>>",
+              "options": {
+                          "vector": {
+                              "size": 1536
+                          }
+              }
+            }
+          }
+
+  write:
+    ops:
+      write-insert-one-vector:
+        params:
+          ratio: 5
+        space: "{space}"
+        method: POST
+        uri: <<protocol:http>>://{weighted_hosts}:<<jsonapi_port:8181>><<path_prefix:>>/v1/<<namespace:jsonapi_vector_crud_namespace>>/<<collection:jsonapi_vector_crud_collection>>
+        Accept: "application/json"
+        X-Cassandra-Request-Id: "{request_id}"
+        X-Cassandra-Token: "{token}"
+        Content-Type: "application/json"
+        ok-body: '.*\"insertedIds\":\[.*\].*'
+        body: >2
+          {
+            "insertOne" : {
+              "document" : {
+                "_id" :         "{seq_key}",
+                "$vector" :      {vector_json}
+              }
+            }
+          }
+  read:
+    ops:
+      find-one-by-vector-projection:
+        space: "{space}"
+        method: POST
+        uri: <<protocol:http>>://{weighted_hosts}:<<jsonapi_port:8181>><<path_prefix:>>/v1/<<namespace:jsonapi_vector_crud_namespace>>/<<collection:jsonapi_vector_crud_collection>>
+        Accept: "application/json"
+        X-Cassandra-Request-Id: "{request_id}"
+        X-Cassandra-Token: "{token}"
+        Content-Type: "application/json"
+        ok-body: ".*\"data\".*"
+        body: >2
+          {
+            "findOne": {
+              "sort" : {"$vector" : {vector_json}},
+              "projection" : {"$vector" : 1}
+            }
+          }
+
+      find-by-vector-projection:
+        space: "{space}"
+        method: POST
+        uri: <<protocol:http>>://{weighted_hosts}:<<jsonapi_port:8181>><<path_prefix:>>/v1/<<namespace:jsonapi_vector_crud_namespace>>/<<collection:jsonapi_vector_crud_collection>>
+        Accept: "application/json"
+        X-Cassandra-Request-Id: "{request_id}"
+        X-Cassandra-Token: "{token}"
+        Content-Type: "application/json"
+        ok-body: ".*\"data\".*"
+        body: >2
+          {
+            "find": {
+              "sort" : {"$vector" : {vector_json}},
+              "projection" : {"$vector" : 1, "$similarity" : 1},
+              "options" : {
+                  "limit" : 10
+              }
+            }
+          }
+
+
+  update:
+    ops:
+      find-one-update-vector:
+        space: "{space}"
+        method: POST
+        uri: <<protocol:http>>://{weighted_hosts}:<<jsonapi_port:8181>><<path_prefix:>>/v1/<<namespace:jsonapi_vector_crud_namespace>>/<<collection:jsonapi_vector_crud_collection>>
+        Accept: "application/json"
+        X-Cassandra-Request-Id: "{request_id}"
+        X-Cassandra-Token: "{token}"
+        Content-Type: "application/json"
+        ok-body: ".*\"data\".*"
+        body: >2
+          {
+              "findOneAndUpdate": {
+                  "sort" : {"$vector" : {vector_json}},
+                  "update" : {"$set" : {"status" : "active"}},
+                  "options" : {"returnDocument" : "after"}
+              }
+          }
+
+  delete:
+    ops:
+      delete-document:
+        space: "{space}"
+        method: POST
+        uri: <<protocol:http>>://{weighted_hosts}:<<jsonapi_port:8181>><<path_prefix:>>/v1/<<namespace:jsonapi_vector_crud_namespace>>/<<collection:jsonapi_vector_crud_collection>>
+        Accept: "application/json"
+        X-Cassandra-Request-Id: "{request_id}"
+        X-Cassandra-Token: "{token}"
+        Content-Type: "application/json"
+        ok-body: ".*\"deletedCount\":[01].*"
+        body: >2
+          {
+              "findOneAndDelete": {
+                  "sort" : {"$vector" : {vector_json}}
+              }
+          }
+