feat(warehouse): added support for warehouse column count limit #2723
})

It("Upload status stat", func() {
	getUploadStatusStat(statName, warehouseutils.Warehouse{
Removed these 2 tests because they do not test any behavior.
763d083 to 8b40519
Codecov Report
Base: 46.89% // Head: 46.88% // Decreases project coverage by 0.01%
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #2723      +/-   ##
==========================================
- Coverage   46.89%   46.88%   -0.01%
==========================================
  Files         300      300
  Lines       49101    49113      +12
==========================================
+ Hits        23027    23029       +2
- Misses      24609    24621      +12
+ Partials     1465     1463       -2
Minor changes, overall the change looks good 👍🏼
…rudder-server into chore.warehouse-column-count
…re.warehouse-column-count
warehouse/upload.go
Outdated
if currentColumnsCount > int(float64(columnCountLimit)*columnCountLimitThreshold) {
	tags := []tag{
		{
			name: "tableName", value: strings.ToLower(tableName),
		},
		{
			name: "columnCountLimit", value: strconv.Itoa(columnCountLimit),
		},
	}
Would it make sense to report two metrics?
warehouse_load_table_column_count
warehouse_load_table_column_limit
Then, instead of computing the threshold here, we could compute it in the alert manager with a similar operation.
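The two-metric suggestion can be sketched roughly as follows. This is only an illustration under assumptions: gaugeRecorder and reportColumnStats are hypothetical names, not the rudder-server stats API, and only the two metric names come from the comment above. The point it shows is that both values are emitted on every load, with no threshold check in the application.

```go
package main

import (
	"fmt"
	"strings"
)

// gaugeRecorder is a hypothetical in-memory stand-in for a stats client.
// It is NOT the rudder-server stats API.
type gaugeRecorder struct {
	values map[string]int
}

func newGaugeRecorder() *gaugeRecorder {
	return &gaugeRecorder{values: make(map[string]int)}
}

// Gauge stores the latest value for a metric name and table name.
func (r *gaugeRecorder) Gauge(name, tableName string, value int) {
	r.values[name+"{tableName="+tableName+"}"] = value
}

// reportColumnStats always emits both metrics, however close the table is
// to its limit. The threshold comparison is left to the alerting side.
func reportColumnStats(r *gaugeRecorder, tableName string, currentColumnsCount, columnCountLimit int) {
	table := strings.ToLower(tableName)
	r.Gauge("warehouse_load_table_column_count", table, currentColumnsCount)
	r.Gauge("warehouse_load_table_column_limit", table, columnCountLimit)
}

func main() {
	r := newGaugeRecorder()
	reportColumnStats(r, "Tracks", 950, 1200)
	for key, value := range r.values {
		fmt.Println(key, value)
	}
}
```

With both series recorded unconditionally, the ratio between them can be evaluated wherever the alert rule lives.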
Should we add other identifiers as well, like workspaceId and destinationId?
Can you elaborate on the motivation behind computing it at the alert manager rather than here? @lvrach
> Should we add other identifiers as well, like workspaceId and destinationId?
We are already adding these inside job.counterStat(warehouse_load_table_column_count, tags...).Count(currentColumnsCount)
warehouse_load_table_column_limit
This is basically a constant. We should probably not send it as a stat.
> Can you elaborate the motivation behind computing it at the alert manager rather than here? @lvrach
Since the threshold dictates whether an alert should be sent, it feels more natural to me to have it in the alert manager. That also makes it easier to change. In addition, the fact that we add tableName & columnCountLimit only once the threshold is passed prevents us from back-testing (going to the Grafana dashboard and checking when the alert would have been triggered with another threshold).
> warehouse_load_table_column_limit This is a constant. We should probably not send it as a stat.
If you want to visualise the limit as a line in Grafana, it is easier if it's a metric.
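To make the back-testing argument concrete, here is a small sketch of replaying recorded values against different thresholds. It assumes the count and limit are recorded on every load; the sample type, the firingSamples helper, and the numbers are hypothetical and are not taken from the PR.

```go
package main

import "fmt"

// sample is one recorded pair of metric values for a table (hypothetical type).
type sample struct {
	columnCount int
	columnLimit int
}

// firingSamples returns the indexes of the samples that would trigger an
// alert for the given threshold, expressed as a fraction of the limit.
func firingSamples(samples []sample, threshold float64) []int {
	var firing []int
	for i, s := range samples {
		if float64(s.columnCount) > float64(s.columnLimit)*threshold {
			firing = append(firing, i)
		}
	}
	return firing
}

func main() {
	// Made-up history: three loads of the same table with a limit of 1200.
	history := []sample{{500, 1200}, {950, 1200}, {1100, 1200}}

	// The same history can be replayed with any candidate threshold.
	fmt.Println(firingSamples(history, 0.75))
	fmt.Println(firingSamples(history, 0.9))
}
```

If the tags were only attached after the threshold is passed, samples below the original threshold would be missing and this kind of replay would not be possible.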
> Also, the fact that we add tableName & columnCountLimit only if the threshold passes prevents us from doing back-testing (going to grafana dashboard and checking when the alert will be triggered with another threshold).
Good point.
> warehouse_load_table_column_limit
> This is a constant. We should probably not send it as a stat.
> If you want to visualise the limit as a line in grafana it is easier if it's metric.
I still feel this information is not very useful. Recording it in a separate stat and adding the tags warehouseId and tableName for an almost constant value (it changes only with the destination type) would be tedious. It also adds computational cost to Kapacitor, or to the Prometheus executor in the future.
wdyt?
Better for debugging and visualization.
…re.warehouse-column-count
warehouseutils.MSSQL: config.GetInt("Warehouse.mssql.columnCountThreshold", 800),
warehouseutils.POSTGRES: config.GetInt("Warehouse.postgres.columnCountThreshold", 1200),
warehouseutils.RS: config.GetInt("Warehouse.redshift.columnCountThreshold", 1200),
warehouseutils.SNOWFLAKE: config.GetInt("Warehouse.snowflake.columnCountThreshold", 1600),
We are not planning to add this for SNOWFLAKE, because in the case of SNOWFLAKE the limit is on the row size (16MB), and it will most likely never be reached because of the maximum payload size we have.
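One way to express "no check for SNOWFLAKE" is to leave it out of the per-destination map and let the lookup report that no limit applies. The sketch below is an assumption about how that could look, not the code in the PR: the string constants and columnCountLimitFor are hypothetical, and the numeric defaults are copied from the diff above.

```go
package main

import "fmt"

// Hypothetical destination type names, for illustration only. The real code
// uses constants from the warehouseutils package.
const (
	mssql     = "MSSQL"
	postgres  = "POSTGRES"
	redshift  = "RS"
	snowflake = "SNOWFLAKE"
)

// columnCountThresholds holds the default values shown in the diff.
// Snowflake is intentionally absent, so no column count check applies to it.
var columnCountThresholds = map[string]int{
	mssql:    800,
	postgres: 1200,
	redshift: 1200,
}

// columnCountLimitFor returns the limit for a destination type and whether
// a column count check applies to that destination at all.
func columnCountLimitFor(destType string) (int, bool) {
	limit, ok := columnCountThresholds[destType]
	return limit, ok
}

func main() {
	for _, dest := range []string{mssql, postgres, redshift, snowflake} {
		if limit, ok := columnCountLimitFor(dest); ok {
			fmt.Printf("%s: limit %d\n", dest, limit)
		} else {
			fmt.Printf("%s: no column count check\n", dest)
		}
	}
}
```

Callers then skip the stat entirely when the second return value is false, instead of comparing against a placeholder limit.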
…re.warehouse-column-count
…rudder-server into chore.warehouse-column-count
Description
UploadJob and JobRun structs for testing the stats. Also to remove the Global dependency being used in stats.
Notion Ticket
https://www.notion.so/rudderstacks/Send-warehouse-load-table-column-count-alert-to-customer-directly-10496242a6324e7e84ed629790efe208
Security