Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(warehouse): added support for warehouse column count limit #2723

Merged
merged 20 commits into from
Nov 30, 2022

Conversation

achettyiitr
Copy link
Member

@achettyiitr achettyiitr commented Nov 23, 2022

Description

  • Added support for warehouse column count limit using a threshold.
  • Introduced stats in UploadJob and JobRun structs for testing the stats. Also to remove the Global dependency being used in stats.
  • Using memStats for testing.

Notion Ticket

https://www.notion.so/rudderstacks/Send-warehouse-load-table-column-count-alert-to-customer-directly-10496242a6324e7e84ed629790efe208

Security

  • The code changed/added as part of this pull request won't create any security issues with how the software is being used.

})

It("Upload status stat", func() {
getUploadStatusStat(statName, warehouseutils.Warehouse{
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed these 2 tests because it does not test any behavior.

@achettyiitr achettyiitr force-pushed the chore.warehouse-column-count branch from 763d083 to 8b40519 Compare November 23, 2022 01:08
@codecov
Copy link

codecov bot commented Nov 23, 2022

Codecov Report

Base: 46.89% // Head: 46.88% // Decreases project coverage by -0.00% ⚠️

Coverage data is based on head (7a7d4da) compared to base (5e8f73d).
Patch coverage: 75.00% of modified lines in pull request are covered.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #2723      +/-   ##
==========================================
- Coverage   46.89%   46.88%   -0.01%     
==========================================
  Files         300      300              
  Lines       49101    49113      +12     
==========================================
+ Hits        23027    23029       +2     
- Misses      24609    24621      +12     
+ Partials     1465     1463       -2     
Impacted Files Coverage Δ
warehouse/identities.go 1.04% <0.00%> (-0.01%) ⬇️
warehouse/slave.go 1.08% <0.00%> (-0.01%) ⬇️
warehouse/warehouse.go 10.02% <0.00%> (-0.02%) ⬇️
warehouse/stats.go 52.84% <60.00%> (-11.92%) ⬇️
warehouse/upload.go 19.37% <90.00%> (+0.91%) ⬆️
config/backend-config/namespace_config.go 67.00% <0.00%> (-3.00%) ⬇️
services/db/recovery.go 54.76% <0.00%> (-1.20%) ⬇️
processor/processor.go 86.40% <0.00%> (+0.41%) ⬆️
router/manager/manager.go 100.00% <0.00%> (+3.29%) ⬆️

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report at Codecov.
📢 Do you have feedback about the report comment? Let us know in this issue.

Copy link
Contributor

@abhimanyubabbar abhimanyubabbar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor changes, overall the change looks good 👍🏼

warehouse/upload.go Outdated Show resolved Hide resolved
warehouse/jobs/types.go Outdated Show resolved Hide resolved
Comment on lines 1085 to 1094
if currentColumnsCount > int(float64(columnCountLimit)*columnCountLimitThreshold) {
tags := []tag{
{
name: "tableName", value: strings.ToLower(tableName),
},
{
name: "columnCountLimit", value: strconv.Itoa(columnCountLimit),
},
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it make sense to report two metrics?

warehouse_load_table_column_count

warehouse_load_table_column_limit

And instead of computing the threshold here, we compute it on the alert manager, by doing a similar operation

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we add other identifiers as well, like workspaceId and destinationId ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you elaborate the motivation behind computing it at the alert manager rather than here ? @lvrach

Copy link
Member Author

@achettyiitr achettyiitr Nov 24, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we add other identifiers as well, like workspaceId and destinationId ?

We are already adding these inside job.counterStat(warehouse_load_table_column_count, tags...).Count(currentColumnsCount)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

warehouse_load_table_column_limit This is basically a constant. We should not probably send it as a stat.

Copy link
Member

@lvrach lvrach Nov 28, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you elaborate the motivation behind computing it at the alert manager rather than here? @lvrach

Since the threshold dictates whether an alert should be sent, it is more natural for me to have it on the alert manager. It makes it easier to change it. Also, the fact that we add tableName & columnCountLimit only if the threshold passes prevents us from doing back-testing (going to grafana dashboard and checking when the alert will be triggered with another threshold).

warehouse_load_table_column_limit This is a constant. We should probably not send it as a stat.

If you want to visualise the limit as a line in grafana it is easier if it's metric.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, the fact that we add tableName & columnCountLimit only if the threshold passes prevents us from doing back-testing (going to grafana dashboard and checking when the alert will be triggered with another threshold).

Good point.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

warehouse_load_table_column_limit This is a constant. We should probably not send it as a stat.

If you want to visualise the limit as a line in grafana it is easier if it's metric.

I still feel this information is not much useful. Recording in separate stat and adding tags warehouseId, tableName for an almost constant value (changes only on destination type) would be tedious. Adds computational cost to kapacitor or prometheus executor(future) too.
wdyt?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Better for debugging and visualization.

@achettyiitr achettyiitr requested a review from lvrach November 24, 2022 08:21
warehouseutils.MSSQL: config.GetInt("Warehouse.mssql.columnCountThreshold", 800),
warehouseutils.POSTGRES: config.GetInt("Warehouse.postgres.columnCountThreshold", 1200),
warehouseutils.RS: config.GetInt("Warehouse.redshift.columnCountThreshold", 1200),
warehouseutils.SNOWFLAKE: config.GetInt("Warehouse.snowflake.columnCountThreshold", 1600),
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are not planning to add this for SNOWFLAKE because in case of SNOWFLAKE we have a limit on the row size to be 16MB and this will most likely not meet because of the payload max size we have.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants