fix(ktor): scalability for server #559

Merged
merged 6 commits into master from update_server_config on Dec 12, 2024

Conversation

ybelMekk
Contributor

@ybelMekk ybelMekk commented Dec 5, 2024

Noticed slow processing in Ktor when there is a high amount of load. The thread pool sizes (connectionGroupSize, workerGroupSize, and callGroupSize) are now dynamically calculated based on the available processors to ensure better performance.

This setup yields 16, 32, 32 instead of 8, 8, 16. The previous configuration assumed that application processing is more CPU-intensive than connection handling, which is not what the metrics show.
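
For reference, a minimal sketch (not the actual diff in this PR) of what this dynamic sizing could look like with the embedded Netty engine; the multipliers are illustrative, chosen to reproduce the 16, 32, 32 above under a 16-CPU limit:

import io.ktor.server.engine.embeddedServer
import io.ktor.server.netty.Netty

fun main() {
    // In Kubernetes, availableProcessors() follows the container CPU limit.
    val cores = Runtime.getRuntime().availableProcessors()

    embeddedServer(Netty, configure = {
        connectionGroupSize = cores     // threads accepting new connections
        workerGroupSize = cores * 2     // threads handling socket I/O
        callGroupSize = cores * 2       // threads running application calls, e.g. /token
    }) {
        // application modules go here
    }.start(wait = true)
}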

checking the docs:
If /token processing is delayed, we can increase callGroupSize gradually to handle more concurrent requests.

consider:
install(IdleTimeout) {
    requestTimeoutMillis = 15000
    idleTimeoutMillis = 60000
}

so that idle connections do not keep consuming resources.

@tronghn
Contributor

tronghn commented Dec 5, 2024

Nice work! How does Runtime.getRuntime().availableProcessors() change? Does it scale with the given CPU requests or limits?

Is the slow processing present when scaling with multiple replicas as CPU load goes up?

@ybelMekk
Contributor Author

ybelMekk commented Dec 5, 2024

How does Runtime.getRuntime().availableProcessors() change? Does it scale with the given CPU requests or limits?

Is the slow processing present when scaling with multiple replicas as CPU load goes up?

In Kubernetes, the value adapts to the CPU limits set on the container.
If no specific limits are set, it reflects the total processors available on the host machine.

CPU Requests:
availableProcessors() does not reflect CPU requests; it reflects the CPU limits.

CPU Limits:
availableProcessors() will return the number of logical processors corresponding to the limit.

For example, if no limits are set, availableProcessors() reflects the total CPUs on the node; if a limit is set to 2 CPUs, availableProcessors() returns 2.

If we increase the replicas, we can observe how availableProcessors() scales.

So we want to set a CPU limit. I tested in dev-gcp and it gave me 16, which is where the calculation in my PR comes from.
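
A quick standalone way to check what the JVM actually sees inside a pod (just a sketch, not part of this PR):

fun main() {
    // Inside a container this reflects the effective CPU limit (via cgroups),
    // not the CPU request; with no limit set it reports the node's CPU count.
    println("availableProcessors = ${Runtime.getRuntime().availableProcessors()}")
}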

…Size never exceeds the database maxConnectionPool,

which could otherwise lead to connection starvation or exceeding the database `max_connections` limit.

* set the max_connections flag to 200
* increase the production pool size
* 10 pods * 20 connections = 200 connections, matching a database with max_connections = 200 (see the sketch below)
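
A minimal sketch of that budget check, assuming HikariCP as the connection pool; the constants are the illustrative numbers from the list above, not values taken from this PR:

import com.zaxxer.hikari.HikariConfig
import com.zaxxer.hikari.HikariDataSource

const val REPLICAS = 10               // pods running in production
const val DB_MAX_CONNECTIONS = 200    // database max_connections flag

fun dataSource(url: String): HikariDataSource {
    // Keep replicas * maximumPoolSize <= max_connections on the database side.
    val poolSizePerPod = DB_MAX_CONNECTIONS / REPLICAS   // 200 / 10 = 20

    val config = HikariConfig().apply {
        jdbcUrl = url
        maximumPoolSize = poolSizePerPod
    }
    return HikariDataSource(config)
}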
Comment on lines 62 to 64
flags:
  - name: max_connections
    value: "200"
Contributor

This might cause a restart of the SQL instance. Otherwise looks good to me!

Contributor Author

Yes, it seems so. I suggest merging this during maintenance hours when traffic is low.

Collaborator

@tommytroen tommytroen left a comment

Nice work! Lgtm

…nt connections for the database matches the replicas * pol_max
@ybelMekk ybelMekk merged commit 9715ea3 into master Dec 12, 2024
3 checks passed
@ybelMekk ybelMekk deleted the update_server_config branch December 12, 2024 21:13