From d56545ceaf9de9d7db6438f1b4ce136594178245 Mon Sep 17 00:00:00 2001
From: sancharigr
Date: Mon, 11 Dec 2023 10:29:05 +0100
Subject: [PATCH] Additional load testing recommendations

---
 .../monitoring/load-testing-guidelines.mdx | 21 +++++++++++++++++++
 1 file changed, 21 insertions(+)

diff --git a/docs/docs/monitoring/load-testing-guidelines.mdx b/docs/docs/monitoring/load-testing-guidelines.mdx
index ff40486853b5..18c39feaf4ae 100644
--- a/docs/docs/monitoring/load-testing-guidelines.mdx
+++ b/docs/docs/monitoring/load-testing-guidelines.mdx
@@ -17,6 +17,27 @@ In our tests we used the Rasa [HTTP-API](https://rasa.com/docs/rasa/pages/http-a
 | Up to 50,000 | 6vCPU | 16 GB |
 | Up to 80,000 | 6vCPU, with almost 90% CPU usage | 16 GB |
+:::note Optimal AWS setup tested on EKS
+- ec2: c5.2xlarge - 9.2 rps/node throughput
+- ec2: c5.4xlarge - 19.5 rps/node throughput
+You can always choose a larger compute-optimized instance such as c5.4xlarge, with more vCPU per node, to maximize throughput per node.
+:::
+
+| AWS             | Rasa Pro                              | Rasa Action Server                   |
+|-----------------|---------------------------------------|--------------------------------------|
+| EC2: c5.2xlarge | 3 vCPU, 10 GB memory, 3 Sanic workers | 3 vCPU, 2 GB memory, 3 Sanic workers |
+| EC2: c5.4xlarge | 7 vCPU, 16 GB memory, 7 Sanic workers | 7 vCPU, 12 GB memory, 7 Sanic workers |
+
+### Recommendations to improve latency
+- Running the action server as a sidecar saved about 100 ms per round trip to the action server in our tests. Results may vary depending on the number of calls made to the action server.
+- Sanic workers must be mapped 1:1 to vCPUs for both Rasa Pro and the Rasa Action Server.
+- Write `async` actions to avoid any blocking I/O.
+- Use KEDA for pre-emptive autoscaling of Rasa pods in production based on incoming HTTP requests.
+- Set `enable_selective_domain: true` so the domain is only sent to actions that need it. This significantly trims the payload exchanged between the two pods.
+- Consider using c5n.nxlarge machines, which are more compute-optimized and parallelize HTTP requests better.
+  However, as they are low on memory, models need to be kept lightweight.
+  They are not suitable if you want to run transformer-based models.
+
 ### Debugging bot related issues while scaling up
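The `enable_selective_domain` flag recommended above is set on the action endpoint in `endpoints.yml`. A minimal sketch, assuming the action server is reachable on the default port 5055 (adjust the URL to your deployment):

```yaml
action_endpoint:
  # Action server running as a sidecar in the same pod (assumed URL)
  url: "http://localhost:5055/webhook"
  # Send the domain only to custom actions that declare they need it
  enable_selective_domain: true
```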
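The benefit of `async` actions can be illustrated with plain `asyncio`, without the Rasa SDK: awaiting I/O lets a single Sanic worker interleave many requests instead of blocking on each one. Here `fetch_user_profile` is a hypothetical stand-in for an HTTP or database call made inside a custom action.

```python
import asyncio
import time

async def fetch_user_profile(user_id: str) -> dict:
    # Simulate a 100 ms network call; `await` yields control to the
    # event loop instead of blocking the worker, unlike time.sleep().
    await asyncio.sleep(0.1)
    return {"id": user_id, "tier": "standard"}

async def handle_requests() -> float:
    start = time.perf_counter()
    # Ten concurrent "requests" finish in roughly the time of one,
    # because the event loop interleaves them while each is awaiting.
    await asyncio.gather(*(fetch_user_profile(str(i)) for i in range(10)))
    return time.perf_counter() - start

elapsed = asyncio.run(handle_requests())
print(f"10 concurrent calls took {elapsed:.2f}s")
```

A blocking implementation of the same ten calls would take roughly ten times as long, which is why blocking I/O inside an action directly limits requests-per-second per worker.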