fix(ktor): scalability for server #559

Merged
merged 6 commits into master from update_server_config on Dec 12, 2024

Conversation

ybelMekk
Contributor

@ybelMekk ybelMekk commented Dec 5, 2024

Noticed slow processing in Ktor when there is a high amount of load. The thread pool sizes (connectionGroupSize, workerGroupSize, and callGroupSize) are now dynamically calculated based on the available processors to ensure better performance.

This setup yields 16, 32, 32 instead of 8, 8, 16. The previous configuration assumed that application processing is more CPU-intensive than connection handling, which is not what the metrics show.
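
For reference, a minimal sketch (not the actual diff in this PR) of what this dynamic sizing could look like with the embedded Netty engine; the multipliers are illustrative, chosen to reproduce the 16, 32, 32 above under a 16-CPU limit:

import io.ktor.server.engine.embeddedServer
import io.ktor.server.netty.Netty

fun main() {
    // In Kubernetes, availableProcessors() follows the container CPU limit.
    val cores = Runtime.getRuntime().availableProcessors()

    embeddedServer(Netty, configure = {
        connectionGroupSize = cores     // threads accepting new connections
        workerGroupSize = cores * 2     // threads handling socket I/O
        callGroupSize = cores * 2       // threads running application calls, e.g. /token
    }) {
        // application modules go here
    }.start(wait = true)
}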

checking the docs:
If /token processing is delayed, we can increase callGroupSize gradually to handle more concurrent requests.

consider:
install(IdleTimeout) {
    requestTimeoutMillis = 15000
    idleTimeoutMillis = 60000
}

so that idle connections do not keep consuming resources.

@tronghn
Contributor

tronghn commented Dec 5, 2024

Nice work! How does Runtime.getRuntime().availableProcessors() change? Does it scale with the given CPU requests or limits?

Is the slow processing present when scaling with multiple replicas as CPU load goes up?

@ybelMekk
Contributor Author

ybelMekk commented Dec 5, 2024

How does Runtime.getRuntime().availableProcessors() change? Does it scale with the given CPU requests or limits?

Is the slow processing present when scaling with multiple replicas as CPU load goes up?

In Kubernetes, the value adapts to the CPU limits set on the container.
If no specific limits are set, it reflects the total processors available on the host machine.

CPU Requests:
availableProcessors() does not reflect CPU requests; it reflects the CPU limits.

CPU Limits:
availableProcessors() will return the number of logical processors corresponding to the limit.

For example, if no limits are set, availableProcessors() reflects the total CPUs on the node; if a limit is set to 2 CPUs, availableProcessors() returns 2.

If we increase the replicas, we can observe how availableProcessors() scales.

So we want to set a CPU limit. I tested in dev-gcp and it gave me 16, which is where the calculation in my PR comes from.
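
A quick standalone way to check what the JVM actually sees inside a pod (just a sketch, not part of this PR):

fun main() {
    // Inside a container this reflects the effective CPU limit (via cgroups),
    // not the CPU request; with no limit set it reports the node's CPU count.
    println("availableProcessors = ${Runtime.getRuntime().availableProcessors()}")
}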

…Size never exceeds the database maxConnectionPool,

which could otherwise lead to connection starvation or exceeding the database `max_connections` limit.

* set the max_connections flag to 200
* increase the production pool size
* 10 pods * 20 connections = 200 connections, matching a database with max_connections = 200 (see the sketch below)
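
A minimal sketch of that budget check, assuming HikariCP as the connection pool; the constants are the illustrative numbers from the list above, not values taken from this PR:

import com.zaxxer.hikari.HikariConfig
import com.zaxxer.hikari.HikariDataSource

const val REPLICAS = 10               // pods running in production
const val DB_MAX_CONNECTIONS = 200    // database max_connections flag

fun dataSource(url: String): HikariDataSource {
    // Keep replicas * maximumPoolSize <= max_connections on the database side.
    val poolSizePerPod = DB_MAX_CONNECTIONS / REPLICAS   // 200 / 10 = 20

    val config = HikariConfig().apply {
        jdbcUrl = url
        maximumPoolSize = poolSizePerPod
    }
    return HikariDataSource(config)
}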
Comment on lines 62 to 64
flags:
  - name: max_connections
    value: "200"
Contributor

This might cause a restart of the SQL instance. Otherwise looks good to me!

Contributor Author

Yes, it seems so. I suggest merging this during maintenance hours when traffic is low.

Collaborator

@tommytroen tommytroen left a comment

Nice work! Lgtm

…nt connections for the database matches the replicas * pol_max
@ybelMekk ybelMekk merged commit 9715ea3 into master Dec 12, 2024
3 checks passed
@ybelMekk ybelMekk deleted the update_server_config branch December 12, 2024 21:13