Database and Table Level Metrics #17688

Open
breezewish opened this issue Jun 4, 2020 · 7 comments
Labels
component/metrics, feature/accepted, help wanted, priority/P1, type/feature-request

Comments

@breezewish
Member

Feature Request

Is your feature request related to a problem? Please describe:

Many customers use a single TiDB cluster to serve multiple hybrid workloads (in different databases). Currently, TiDB only exposes metrics for the whole cluster or for a single TiDB instance. Database-level and table-level metrics are missing.

Describe the feature you'd like:

For critical metrics such as QPS, latency, and errors, more detailed per-table and per-database metrics are needed.

Describe alternatives you've considered:

Teachability, Documentation, Adoption, Migration Strategy:

The technical implementation needs further investigation. One possible solution is to use Prometheus labels. The in-memory label values for a table or database must be deleted when that table or database is dropped. Note that there may be a large number of databases and tables, and attaching metrics to each one may affect performance, so it may not be suitable to apply this to every metric. Histograms and multi-label metrics also need to be considered very carefully, since they notably amplify the number of metrics. Another good idea is to let users configure which tables and databases are collected.
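To make the label lifecycle and the amplification concern concrete, here is a minimal sketch in Go using only the standard library. It is not TiDB code: `DBCounter`, `Drop`, and `HistogramSeries` are hypothetical names, and the map stands in for a labelled Prometheus counter. With the Prometheus Go client, the rough equivalent would be a `CounterVec` whose label values are removed with `DeleteLabelValues` when the database is dropped. The series estimate assumes the usual Prometheus histogram exposition: one series per finite bucket, plus the `+Inf` bucket, `_sum`, and `_count`.

```go
package main

import (
	"fmt"
	"sync"
)

// DBCounter keeps one counter per database name, the way a labelled
// Prometheus counter keeps one series per label value.
type DBCounter struct {
	mu     sync.Mutex
	series map[string]uint64
}

func NewDBCounter() *DBCounter {
	return &DBCounter{series: make(map[string]uint64)}
}

// Inc adds one to the series for db, creating it on first use.
func (c *DBCounter) Inc(db string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.series[db]++
}

// Drop removes the series for db. Hypothetical hook: it would be called
// when the database is dropped, so its name is not kept in memory forever.
func (c *DBCounter) Drop(db string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	delete(c.series, db)
}

// Get returns the current value and whether a series exists for db.
func (c *DBCounter) Get(db string) (uint64, bool) {
	c.mu.Lock()
	defer c.mu.Unlock()
	v, ok := c.series[db]
	return v, ok
}

// Len returns the number of live series.
func (c *DBCounter) Len() int {
	c.mu.Lock()
	defer c.mu.Unlock()
	return len(c.series)
}

// HistogramSeries estimates how many time series one histogram produces:
// each label combination gets its finite buckets plus +Inf, _sum, and _count.
func HistogramSeries(labelCombos, finiteBuckets int) int {
	return labelCombos * (finiteBuckets + 3)
}

func main() {
	queries := NewDBCounter()
	queries.Inc("shop")
	queries.Inc("shop")
	queries.Inc("analytics")
	fmt.Println(queries.Len()) // 2

	queries.Drop("analytics") // simulate the database being dropped
	fmt.Println(queries.Len()) // 1

	// 1000 tables with a 20-bucket latency histogram.
	fmt.Println(HistogramSeries(1000, 20)) // 23000
}
```

The last line shows why histograms deserve extra care: a single per-table latency histogram over 1000 tables already yields 23000 series under these assumptions.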

The new metrics should be added to the Grafana dashboards.

Note that TiDB already has a similar feature: #9151. However, it may leak memory because database names are always kept in memory, and it does not work well when the number of databases is huge. The new implementation could refine and improve it.

breezewish added the type/feature-request label on Jun 4, 2020
djshow832 added the component/metrics and help wanted labels on Jun 5, 2020
scsldb added the feature/reviewing label on Jul 16, 2020
@zz-jason
Member

@breeswish Could you describe the use cases in more detail? It seems this would be useful in a multi-tenant scenario?

@breezewish
Member Author

@zz-jason Simply speaking, yes. We already have many customers who use a single TiDB cluster to serve multiple hybrid workloads. These workloads usually live in different databases, but they still run on the same TiDB cluster, or even on the same TiDB instance.

@jackysp
Member

jackysp commented Aug 6, 2020

emm... something like #9151 ?

zz-jason added the feature/discussing label and removed the feature/reviewing label on Aug 10, 2020
@breezewish
Member Author

@jackysp Yes, similar to it. The current implementation in #9151 leaks memory (when a database is deleted), and database-level metrics are usually not fine-grained enough either.

@jackysp
Member

jackysp commented Aug 12, 2020

The key point is that there are still performance issues when there are many databases. It seems that almost no one uses this feature, so many people don't even know about it.

@breezewish
Member Author

@jackysp I received real-world feature requests from our clients for this one, where they don't deploy multiple TiDB clusters :)

Yes, in addition to the memory leak issue, performance is another problem. I think we can simply let users configure what they want to collect, so that the default setup does not suffer from performance problems. It is natural that the more a user wants to know, the higher the cost will be. The important part is to let the user decide.
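One way this opt-in idea could look is sketched below in Go with only the standard library. Everything here is an assumption for illustration: `Collector`, the allow-list passed to `NewCollector`, and the `_other` catch-all label are hypothetical and do not correspond to an existing TiDB option. Databases on the list get their own series, and everything else is folded into one shared series, so the number of series is bounded by the list length plus one.

```go
package main

import "fmt"

// OtherLabel is a hypothetical catch-all label for databases
// that the user did not opt in to.
const OtherLabel = "_other"

// Collector counts queries per database, but only keeps a dedicated
// series for databases on the user-configured allow-list.
// It is not safe for concurrent use; a real collector would need locking.
type Collector struct {
	allowed map[string]bool
	counts  map[string]uint64
}

// NewCollector builds a collector from the list of databases to track.
func NewCollector(allow []string) *Collector {
	allowed := make(map[string]bool, len(allow))
	for _, db := range allow {
		allowed[db] = true
	}
	return &Collector{allowed: allowed, counts: make(map[string]uint64)}
}

// label maps a database name to the label value it is recorded under.
func (c *Collector) label(db string) string {
	if c.allowed[db] {
		return db
	}
	return OtherLabel
}

// Observe records one query against db.
func (c *Collector) Observe(db string) {
	c.counts[c.label(db)]++
}

// Count returns the value recorded under a label.
func (c *Collector) Count(label string) uint64 {
	return c.counts[label]
}

// Series returns how many distinct series exist so far.
func (c *Collector) Series() int {
	return len(c.counts)
}

func main() {
	c := NewCollector([]string{"shop"})
	c.Observe("shop")
	c.Observe("tmp_1")
	c.Observe("tmp_2")

	fmt.Println(c.Count("shop"))     // 1
	fmt.Println(c.Count(OtherLabel)) // 2
	fmt.Println(c.Series())          // 2
}
```

With an empty allow-list, every query lands in the catch-all series, which keeps the default cost close to that of a single unlabelled metric.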

scsldb added the feature/accepted and priority/P1 labels and removed the feature/discussing label on Sep 4, 2020
scsldb added this to the Requirement pool milestone on Sep 4, 2020
zz-jason removed their assignment on Sep 5, 2020
@pepezzzz

pepezzzz commented Dec 1, 2020

@breeswish Please help develop a simple version, so that an end-user DBA can observe the duration per database, like #19360.


6 participants