From 865902f3c576a4a66b9b9492f3e8234261c14e17 Mon Sep 17 00:00:00 2001
From: sancharigr
Date: Mon, 11 Dec 2023 10:29:05 +0100
Subject: [PATCH 1/3] Additional load testing recommendations

---
 .../monitoring/load-testing-guidelines.mdx | 23 +++++++++++++++++++++
 1 file changed, 23 insertions(+)

diff --git a/docs/docs/monitoring/load-testing-guidelines.mdx b/docs/docs/monitoring/load-testing-guidelines.mdx
index ff40486853b5..23b104917177 100644
--- a/docs/docs/monitoring/load-testing-guidelines.mdx
+++ b/docs/docs/monitoring/load-testing-guidelines.mdx
@@ -17,6 +17,29 @@ In our tests we used the Rasa [HTTP-API](https://rasa.com/docs/rasa/pages/http-a
 | Up to 50,000             | 6vCPU                                        | 16 GB         |
 | Up to 80,000             | 6vCPU, with almost 90% CPU usage             | 16 GB         |
 
+:::info Optimal AWS setup tested on EKS
+
+EC2 c5.2xlarge: 9.2 rps/node throughput
+EC2 c5.4xlarge: 19.5 rps/node throughput
+You can choose a larger compute-optimized instance, such as c5.4xlarge, with more CPU per node to maximize throughput per node.
+
+:::
+
+| AWS                      | Rasa Pro                                     | Rasa Action Server                        |
+|--------------------------|----------------------------------------------|-------------------------------------------|
+| EC2: c5.2xlarge          | 3vCPU, 10GB Memory, 3 Sanic Threads          | 3vCPU, 2GB Memory, 3 Sanic Threads        |
+| EC2: c5.4xlarge          | 7vCPU, 16GB Memory, 7 Sanic Threads          | 7vCPU, 12GB Memory, 7 Sanic Threads       |
+
+### Some recommendations to improve latency
+- Running the action server as a sidecar saves roughly 100 ms per round trip to the action server on average in the tests we ran. Results may vary depending on the number of calls made to the action server.
+- Sanic workers must be mapped 1:1 to CPUs for both Rasa Pro and the Rasa Action Server.
+- Create `async` actions to avoid any blocking I/O.
+- Use KEDA for pre-emptive autoscaling of Rasa pods in production based on HTTP request load.
+- `enable_selective_domain: true`: the domain is only sent to actions that need it. This significantly trims the payload between the two pods.
+- Consider using c5n.nxlarge machines, which are more compute-optimized and support better parallelization of HTTP requests.
+  However, as they are low on memory, models need to be kept lightweight.
+  They are not suitable if you want to run transformers.
+
 ### Debugging bot related issues while scaling up

From 652c1756ab695b55eb0c199ebe3a190482fdb29c Mon Sep 17 00:00:00 2001
From: sancharigr
Date: Thu, 4 Jan 2024 20:25:20 +0530
Subject: [PATCH 2/3] Review changes made

---
 .../monitoring/load-testing-guidelines.mdx | 23 +++++++----------------
 1 file changed, 7 insertions(+), 16 deletions(-)

diff --git a/docs/docs/monitoring/load-testing-guidelines.mdx b/docs/docs/monitoring/load-testing-guidelines.mdx
index 23b104917177..a794d73639da 100644
--- a/docs/docs/monitoring/load-testing-guidelines.mdx
+++ b/docs/docs/monitoring/load-testing-guidelines.mdx
@@ -12,33 +12,24 @@ In order to gather metrics on our system's ability to handle increased loads and
 In each test case we spawned the following number of concurrent users at peak concurrency using a [spawn rate](https://docs.locust.io/en/1.5.0/configuration.html#all-available-configuration-options) of 1000 users per second.
 In our tests we used the Rasa [HTTP-API](https://rasa.com/docs/rasa/pages/http-api) and the [Locust](https://locust.io/) open source load testing tool.
+
 | Users | CPU | Memory |
 |--------------------------|----------------------------------------------|---------------|
 | Up to 50,000             | 6vCPU                                        | 16 GB         |
 | Up to 80,000             | 6vCPU, with almost 90% CPU usage             | 16 GB         |
 
-:::info Optimal AWS setup tested on EKS
-
-EC2 c5.2xlarge: 9.2 rps/node throughput
-EC2 c5.4xlarge: 19.5 rps/node throughput
-You can choose a larger compute-optimized instance, such as c5.4xlarge, with more CPU per node to maximize throughput per node.
-
-:::
-
-| AWS                      | Rasa Pro                                     | Rasa Action Server                        |
-|--------------------------|----------------------------------------------|-------------------------------------------|
-| EC2: c5.2xlarge          | 3vCPU, 10GB Memory, 3 Sanic Threads          | 3vCPU, 2GB Memory, 3 Sanic Threads        |
-| EC2: c5.4xlarge          | 7vCPU, 16GB Memory, 7 Sanic Threads          | 7vCPU, 12GB Memory, 7 Sanic Threads       |
 
 ### Some recommendations to improve latency
-- Running the action server as a sidecar saves roughly 100 ms per round trip to the action server on average in the tests we ran. Results may vary depending on the number of calls made to the action server.
 - Sanic workers must be mapped 1:1 to CPUs for both Rasa Pro and the Rasa Action Server.
 - Create `async` actions to avoid any blocking I/O.
-- Use KEDA for pre-emptive autoscaling of Rasa pods in production based on HTTP request load.
 - `enable_selective_domain: true`: the domain is only sent to actions that need it. This significantly trims the payload between the two pods.
-- Consider using c5n.nxlarge machines, which are more compute-optimized and support better parallelization of HTTP requests.
+- Consider using compute-optimized cloud machines built for high-performance computing, such as the C5 instances on AWS.
   However, as they are low on memory, models need to be kept lightweight.
-  They are not suitable if you want to run transformers.
+
+
+| Machine                        | Rasa Pro                                       | Rasa Action Server                               |
+|--------------------------------|------------------------------------------------|--------------------------------------------------|
+| AWS C5 or Azure F or Gcloud C2 | 3-7vCPU, 10-16GB Memory, 3-7 Sanic Threads     | 3-7vCPU, 2-12GB Memory, 3-7 Sanic Threads        |
 
 ### Debugging bot related issues while scaling up

From 64546fcf70be2754e75550016c9e8916e898aa09 Mon Sep 17 00:00:00 2001
From: sancharigr
Date: Fri, 5 Jan 2024 11:28:01 +0530
Subject: [PATCH 3/3] Add missing CI step condition

---
 .github/workflows/continous-integration.yml | 1 +
 1 file changed, 1 insertion(+)

diff --git a/.github/workflows/continous-integration.yml b/.github/workflows/continous-integration.yml
index b50753b1dffb..587ad2ad26e4 100644
--- a/.github/workflows/continous-integration.yml
+++ b/.github/workflows/continous-integration.yml
@@ -290,6 +290,7 @@ jobs:
       - name: Prevent race condition in poetry build
         # More context about race condition during poetry build can be found here:
         # https://github.com/python-poetry/poetry/issues/7611#issuecomment-1747836233
+        if: needs.changes.outputs.backend == 'true'
         run: |
           poetry config installer.max-workers 1
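The "Create `async` actions to avoid any blocking I/O" recommendation in the docs patch above rests on not stalling the Sanic event loop. A minimal sketch of the idea in plain `asyncio` (not the Rasa SDK; `fetch_profile` is a hypothetical stand-in for a custom action's I/O):

```python
import asyncio
import time

async def fetch_profile(user_id: int) -> dict:
    # Stands in for an awaited HTTP or database call inside a custom
    # action. Because it awaits instead of blocking, the event loop can
    # serve other requests while this one waits.
    await asyncio.sleep(0.1)
    return {"user_id": user_id, "status": "ok"}

async def main() -> float:
    start = time.perf_counter()
    results = await asyncio.gather(*(fetch_profile(i) for i in range(10)))
    assert all(r["status"] == "ok" for r in results)
    return time.perf_counter() - start

if __name__ == "__main__":
    print(f"10 concurrent actions took {asyncio.run(main()):.2f}s")
```

Ten concurrent "actions" finish in roughly 0.1 s total because the sleeps overlap; a blocking `time.sleep(0.1)` in the same spot would serialize them to about 1 s and hold up every other request handled by that worker, which is exactly the latency cost the recommendation warns about.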