From d56545ceaf9de9d7db6438f1b4ce136594178245 Mon Sep 17 00:00:00 2001
From: sancharigr
Date: Mon, 11 Dec 2023 10:29:05 +0100
Subject: [PATCH] Additional load testing recommendations

---
 .../monitoring/load-testing-guidelines.mdx | 21 +++++++++++++++++++
 1 file changed, 21 insertions(+)

diff --git a/docs/docs/monitoring/load-testing-guidelines.mdx b/docs/docs/monitoring/load-testing-guidelines.mdx
index ff40486853b5..18c39feaf4ae 100644
--- a/docs/docs/monitoring/load-testing-guidelines.mdx
+++ b/docs/docs/monitoring/load-testing-guidelines.mdx
@@ -17,6 +17,27 @@ In our tests we used the Rasa [HTTP-API](https://rasa.com/docs/rasa/pages/http-a
 | Up to 50,000 | 6vCPU | 16 GB |
 | Up to 80,000 | 6vCPU, with almost 90% CPU usage | 16 GB |
+:::note Optimal AWS setup tested on EKS
+- ec2: c5.2xlarge - 9.2 rps/node throughput
+- ec2: c5.4xlarge - 19.5 rps/node throughput
+You can always choose a larger compute-optimized instance such as c5.4xlarge, with more vCPU per node, to maximize throughput per node.
+:::
+
+| AWS             | Rasa Pro                              | Rasa Action Server                   |
+|-----------------|---------------------------------------|--------------------------------------|
+| EC2: c5.2xlarge | 3 vCPU, 10 GB memory, 3 Sanic workers | 3 vCPU, 2 GB memory, 3 Sanic workers |
+| EC2: c5.4xlarge | 7 vCPU, 16 GB memory, 7 Sanic workers | 7 vCPU, 12 GB memory, 7 Sanic workers |
+
+### Recommendations to improve latency
+- Running the action server as a sidecar saved about 100 ms per round trip to the action server in our tests. Results may vary depending on the number of calls made to the action server.
+- Sanic workers must be mapped 1:1 to vCPUs for both Rasa Pro and the Rasa Action Server.
+- Write `async` actions to avoid any blocking I/O.
+- Use KEDA for pre-emptive autoscaling of Rasa pods in production based on incoming HTTP requests.
+- Set `enable_selective_domain: true` so the domain is only sent to actions that need it. This significantly trims the payload exchanged between the two pods.
+- Consider using c5n.nxlarge machines, which are more compute-optimized and parallelize HTTP requests better.
+  However, as they are low on memory, models need to be kept lightweight.
+  They are not suitable if you want to run transformer-based models.
+
 ### Debugging bot related issues while scaling up
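The `enable_selective_domain` flag recommended above is set on the action endpoint in `endpoints.yml`. A minimal sketch, assuming the action server is reachable on the default port 5055 (adjust the URL to your deployment):

```yaml
action_endpoint:
  # Action server running as a sidecar in the same pod (assumed URL)
  url: "http://localhost:5055/webhook"
  # Send the domain only to custom actions that declare they need it
  enable_selective_domain: true
```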
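The benefit of `async` actions can be illustrated with plain `asyncio`, without the Rasa SDK: awaiting I/O lets a single Sanic worker interleave many requests instead of blocking on each one. Here `fetch_user_profile` is a hypothetical stand-in for an HTTP or database call made inside a custom action.

```python
import asyncio
import time

async def fetch_user_profile(user_id: str) -> dict:
    # Simulate a 100 ms network call; `await` yields control to the
    # event loop instead of blocking the worker, unlike time.sleep().
    await asyncio.sleep(0.1)
    return {"id": user_id, "tier": "standard"}

async def handle_requests() -> float:
    start = time.perf_counter()
    # Ten concurrent "requests" finish in roughly the time of one,
    # because the event loop interleaves them while each is awaiting.
    await asyncio.gather(*(fetch_user_profile(str(i)) for i in range(10)))
    return time.perf_counter() - start

elapsed = asyncio.run(handle_requests())
print(f"10 concurrent calls took {elapsed:.2f}s")
```

A blocking implementation of the same ten calls would take roughly ten times as long, which is why blocking I/O inside an action directly limits requests-per-second per worker.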