How to do stress test

ForrestGumb edited this page Dec 23, 2024 · 5 revisions

What is a stress test?

In a production system, it is important to make sure the service works well under the target traffic. It is therefore valuable to stress test the system end to end before the official launch.

We can run such a stress test together with our customers if they face issues.

How to do a stress test?

Long-time stress test

A simple test case is to invoke the TTS service for a long enough time to see whether there are any issues with the client code or the service. We commit to a high SLA on our service, but sometimes there can be issues with the network, the client code, etc. So it is good to run through such a test case to make sure there are no surprises.

Please make sure you are using an S0 key (not a free key) to run the stress test.

For example, we ran a 2-hour test from an Azure DC in China calling Azure southeastasia. All the calls completed successfully, and the availability was > 99.95%.
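A long-time test like the one above could be sketched as a simple loop that keeps calling the service until a deadline and then reports availability. This is only a minimal sketch: `synthesize` is a hypothetical callable that you would implement around your own TTS call (returning True on success), and `run_long_stress`, `duration_s`, and `pause_s` are illustrative names, not part of any SDK.

```python
import time
from typing import Callable


def run_long_stress(synthesize: Callable[[], bool],
                    duration_s: float,
                    pause_s: float = 0.0) -> dict:
    """Call `synthesize` repeatedly for `duration_s` seconds and report availability.

    `synthesize` is a hypothetical wrapper around your TTS request; it should
    return True on success and either return False or raise on failure.
    """
    total = 0
    failed = 0
    deadline = time.monotonic() + duration_s
    while time.monotonic() < deadline:
        total += 1
        try:
            ok = synthesize()
        except Exception:
            # Network or client-side errors count as failed calls.
            ok = False
        if not ok:
            failed += 1
        if pause_s:
            time.sleep(pause_s)
    availability = (total - failed) / total if total else 0.0
    return {"total": total, "failed": failed, "availability": availability}
```

For a 2-hour run you would pass `duration_s=2 * 60 * 60` and compare the reported availability with your target.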

Concurrency stress test

If the long-time stress test above passes, you can move on to a stress test with multiple threads. The basic idea is the same as above, but you create multiple clients that call the service in parallel.

To best simulate real traffic, estimate the number of concurrent threads your target scenario needs. Then contact us before running such a stress test, as we might need to set up capacity if your traffic is too high.

When running such a concurrency stress test, ramp up the traffic slowly, from a small to a large number of threads, so that the service auto scale can handle it.

Notes for stress testing (using a 200 TPS stress test as an example):
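One possible way to structure a concurrency test with a gradual ramp-up is sketched below. It assumes a hypothetical `synthesize` callable that wraps your own TTS request and returns True on success; `run_stage`, `ramp_up`, `thread_steps`, and `stage_s` are illustrative names. Each stage runs a fixed number of threads for a fixed time, and the thread count grows from stage to stage.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List


def run_stage(synthesize: Callable[[], bool], threads: int, stage_s: float) -> dict:
    """Run `threads` workers in parallel for `stage_s` seconds and count results."""
    lock = threading.Lock()
    counts = {"total": 0, "failed": 0}
    deadline = time.monotonic() + stage_s

    def worker() -> None:
        # Each worker keeps sending requests until the stage deadline.
        while time.monotonic() < deadline:
            try:
                ok = synthesize()
            except Exception:
                ok = False
            with lock:
                counts["total"] += 1
                if not ok:
                    counts["failed"] += 1

    # Leaving the `with` block waits for all workers to finish.
    with ThreadPoolExecutor(max_workers=threads) as pool:
        for _ in range(threads):
            pool.submit(worker)

    counts["threads"] = threads
    counts["tps"] = counts["total"] / stage_s  # observed requests per second
    return counts


def ramp_up(synthesize: Callable[[], bool],
            thread_steps: Iterable[int],
            stage_s: float) -> List[dict]:
    """Run one stage per entry in `thread_steps`, from small to large."""
    return [run_stage(synthesize, n, stage_s) for n in thread_steps]
```

For example, `ramp_up(my_tts_call, [1, 2, 5, 10, 20], stage_s=120)` would hold each thread count for two minutes; the step sizes here are only placeholders for your own estimate.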

  • Check network bandwidth. If the average audio length is 5 seconds, it takes 60KB. 200 audio files per second take 12MB, which needs 96Mbps of network bandwidth. Longer audio consumes more network bandwidth. This becomes a bottleneck for international traffic, so we suggest putting the TTS client in the same region as the TTS service.
  • Use multiple client machines. Each TTS client machine can support 30 TPS. Beyond that, the client becomes the bottleneck.
  • TPS (also called QPS) is different from concurrency (threads). 200 TPS means 200 TTS requests per second. 200 concurrency means 200 threads keep sending requests. Within one thread, TTS may finish synthesizing one request in 500ms. In that case, one thread can handle 2 TPS, and 200 threads will result in 400 TPS.
  • Ramp up traffic slowly. We suggest starting with 5 TPS, then 10 TPS, 20 TPS, 40 TPS, 60 TPS, 80 TPS, 100 TPS, 120 TPS, 140 TPS, 160 TPS, 180 TPS, 200 TPS. Keep each traffic level for 2 minutes. If there are still a lot of 429 errors after 2 minutes, it means the service needs more time to scale up. Keep this traffic level a little longer.
  • If you want to track TTS latency during the stress test, please track only the TTS execution time. Don't include your own system time, such as writing audio to a file or writing to databases. These operations also take considerable time at high TPS. If you use the Speech SDK, you can get first byte latency and finish latency easily with sample code.
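The arithmetic in the notes above can be reproduced with a few small helpers. This is only a sketch for planning purposes: the function names are made up for illustration, the unit convention (1MB = 1000KB, 8 bits per byte) is assumed because it matches the 60KB, 12MB, and 96Mbps figures, and the 30 TPS per client machine default is taken from the note above.

```python
import math


def bandwidth_mbps(tps: float, audio_kb: float) -> float:
    """Bandwidth in Mbps for `tps` audio files per second of `audio_kb` KB each."""
    return tps * audio_kb * 8 / 1000


def tps_from_threads(threads: int, request_ms: float) -> float:
    """TPS produced by `threads` threads when one request takes `request_ms` ms."""
    return threads * 1000 / request_ms


def threads_for_tps(target_tps: float, request_ms: float) -> int:
    """Threads needed to reach `target_tps` when one request takes `request_ms` ms."""
    return math.ceil(target_tps * request_ms / 1000)


def clients_needed(target_tps: float, per_client_tps: float = 30) -> int:
    """Client machines needed, assuming each one supports `per_client_tps`."""
    return math.ceil(target_tps / per_client_tps)


# Ramp-up schedule from the notes: 5, 10, 20, then steps of 20 up to 200 TPS.
RAMP_TPS = [5, 10, 20] + list(range(40, 201, 20))

print(bandwidth_mbps(200, 60))     # 96.0
print(tps_from_threads(200, 500))  # 400.0
print(threads_for_tps(200, 500))   # 100
print(clients_needed(200))         # 7
print(len(RAMP_TPS) * 2)           # 24 (minutes, at 2 minutes per level)
```

With these numbers, a 200 TPS test needs about 100 threads at 500ms per request, spread over at least 7 client machines, and the ramp-up takes at least 24 minutes.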