[Feature] Limit cluster resource usage in user granularity #7129

Open
3 tasks done
MorningLight5 opened this issue Nov 16, 2021 · 8 comments
Labels
kind/feature Categorizes issue or PR as related to a new feature.

Comments

@MorningLight5
Contributor

MorningLight5 commented Nov 16, 2021

Search before asking

  • I have searched the issues and found no similar issues.

Description

In production environments, a Doris cluster often faces pressure from several sources (mainly stream load and queries), which causes resource shortage problems such as OOM, especially in shared clusters.
[image]
As the picture above shows, memory usage fluctuates too widely.
I think it would be better to have a way to limit the resource usage of each user. Limiting the frequency of operations may be a suitable approach.

Use case

No response

Related issues

No response

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct

@MorningLight5 MorningLight5 added the kind/feature Categorizes issue or PR as related to a new feature. label Nov 16, 2021

@morningman
Contributor

What if a request exceeds the limit? Should it return an error or be slowed down?
And is there any other system we can refer to?

@MorningLight5
Contributor Author

MorningLight5 commented Nov 16, 2021

What if a request exceeds the limit? Should it return an error or be slowed down? And is there any other system we can refer to?

As far as I know, MySQL has the max_connections variable to limit the number of connections. When the number of connections exceeds the limit, it returns an error.

@morningman
Contributor

I see.
Doris already has a max connection limit that can be set for each user.
But I think what you need is not just to limit the number of connections, but to limit the request rate.

As far as I know, Guava's RateLimiter may meet the requirement. But what is more important is how to define the rate.
Simply put, it may be a QPS limit. But the essence is to "control the consumption of cluster resources per unit time."

So I think in the first version we can implement this with simple rules (such as QPS). But the design must include an abstraction of "system resources" so that we can add more rules later.
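
To make the idea concrete, here is a minimal sketch of a per-user limiter in which the rule is pluggable. It is not Guava's RateLimiter and not Doris code: it is a plain token bucket, and the names `TokenBucket`, `UserResourceLimiter`, and `cost_of` are hypothetical. With the default `cost_of` every request costs one unit, which gives a simple QPS limit; a different `cost_of` could charge requests by some other notion of "system resources".

```python
import time


class TokenBucket:
    """Token bucket: refills `rate` units per second, up to `capacity` units."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.tokens = capacity      # start with a full bucket
        self.last = clock()

    def try_acquire(self, cost=1.0):
        # Refill according to the time elapsed since the last call.
        now = self.clock()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if cost <= self.tokens:
            self.tokens -= cost
            return True
        return False                # over the limit: caller decides to reject or wait


class UserResourceLimiter:
    """Hypothetical per-user limiter; `cost_of` maps a request to resource units."""

    def __init__(self, rate, capacity, cost_of=lambda request: 1.0,
                 clock=time.monotonic):
        self._rate = rate
        self._capacity = capacity
        self._cost_of = cost_of     # assumption: the pluggable "rule"
        self._clock = clock
        self._buckets = {}          # user name -> TokenBucket

    def admit(self, user, request=None):
        bucket = self._buckets.get(user)
        if bucket is None:
            bucket = TokenBucket(self._rate, self._capacity, self._clock)
            self._buckets[user] = bucket
        return bucket.try_acquire(self._cost_of(request))
```

For example, `UserResourceLimiter(rate=2.0, capacity=2.0)` admits at most two requests per user in a burst and then about two per second. The sketch returns `False` instead of blocking, so returning an error or slowing the request down is left to the caller.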

Looking forward to your PR!

@xinyiZzz
Contributor

What if a request exceeds the limit? Should it return an error or be slowed down? And is there any other system we can refer to?

Impala's AdmissionController does a similar thing.
An introduction is here: https://shimo.im/docs/6qxjctpyDHJgPwtw

@MorningLight5
Contributor Author

MorningLight5 commented Feb 14, 2022

The operation frequency limit was developed in #7474. Users can configure the thresholds through frontend configs, as below:
ADMIN SET FRONTEND CONFIG ('key' = 'value')
You can limit the number of queries (max_running_query_num) in a given period (report_stats_period); the default period is 10 seconds (10 * 1000, i.e. in milliseconds). You can also limit the number of loads through max_running_txn_num.

The design of this feature is straightforward:
Each FE keeps its query count locally and reports it to the Master every period, so every FE can get the query count of each FE through metadata synchronization. When a query arrives, if the total query count in the last period exceeds the threshold, the system rejects the query. The user can only query again in the next period.
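
The flow described above can be sketched roughly as follows. This is an illustrative model only, not the code from #7474: the `Master` and `Frontend` classes and their methods are hypothetical, counts are keyed by user as an assumption, and metadata synchronization is reduced to an explicit `sync()` call.

```python
from collections import defaultdict


class Master:
    """Hypothetical Master: stores the per-period counts reported by each FE."""

    def __init__(self):
        self.reported = {}              # fe_id -> {user: count} for the last period

    def report(self, fe_id, counts):
        self.reported[fe_id] = dict(counts)

    def totals(self):
        # Sum the last-period counts of all FEs, per user.
        total = defaultdict(int)
        for counts in self.reported.values():
            for user, n in counts.items():
                total[user] += n
        return dict(total)


class Frontend:
    """Hypothetical FE: counts locally, reports every period, rejects on overflow."""

    def __init__(self, fe_id, master, max_running_query_num):
        self.fe_id = fe_id
        self.master = master
        self.limit = max_running_query_num
        self.local = defaultdict(int)   # queries admitted in the current period
        self.last_totals = {}           # cluster-wide totals of the last period

    def on_query(self, user):
        # Reject if the cluster-wide total of the last period exceeded the threshold.
        if self.last_totals.get(user, 0) > self.limit:
            return False
        self.local[user] += 1
        return True

    def end_of_period(self):
        # Report local counts to the Master and start a new period.
        self.master.report(self.fe_id, self.local)
        self.local = defaultdict(int)

    def sync(self):
        # Stand-in for metadata synchronization from the Master.
        self.last_totals = self.master.totals()
```

In this model a user who exceeds the threshold in one period is rejected for the whole following period and admitted again in the period after that, because rejected queries are not counted.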
