Importing huge seed files fails #127
Comments
There is already a limit on the number of rows in dbt-core. The use of seeds is not advised, and seeds may get deprecated in the future. I think it will require some work to make dbt batch-insert the data. But 152 rows is not that much. I don't know what the limit is in more recent versions of SQL Server.
Well, the limit is actually 2,100, as you can see from the error query, but the dbt module for SQL Server seems to turn each cell into an insert parameter, which makes it break. I don't have much insight into the inner logic of what the module executes here, but I think 152 rows should be supported.
The SQL Server statement seems to look something like this:

```sql
INSERT INTO seed_table
    (col1, col2, ...)
VALUES
    (@row1_col1, @row1_col2, ...),
    (@row2_col1, @row2_col2, ...),
    ...
```

and SQL Server has a limit on these parameter values. But I am just guessing here...
@hz-lschick you're directionally correct here. If you look at line 18, each column value is bound as a parameter. So a potential workaround would be to override/re-create the adapter's `sqlserver__load_csv_rows` macro:

```jinja
{% macro sqlserver__load_csv_rows(model, agate_table) %}
    {{ return(sqlserver__basic_load_csv_rows(model, 200, agate_table)) }}
{% endmacro %}
```

You might try putting synapse's seed macro in your local macros dir and renaming it.
I guess we could change the 200 setting to 100 instead. It will not make a difference in performance. Anyone interested in doing a pull request?
Hmm, I think the offered solutions are both good and will work for me, but I would prefer that this calculation were done within the code. E.g. when you have a batch size of 200 and more than 10 columns, the batch size should automatically be reduced so that one query does not use more than 2,100 parameters/cells in the insert query.
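For illustration, here is a minimal sketch of that idea as a local macro override, assuming the `sqlserver__basic_load_csv_rows(model, batch_size, agate_table)` signature used in the workaround above; this is not the adapter's actual implementation, just one way the calculation could look:

```jinja
{% macro sqlserver__load_csv_rows(model, agate_table) %}
    {# Pick a batch size so each INSERT stays within SQL Server's 2,100-parameter limit. #}
    {% set column_count = agate_table.column_names | length %}
    {% set max_rows_per_batch = (2100 // column_count) if column_count > 0 else 200 %}
    {# Keep the adapter's existing 200-row default as an upper bound. #}
    {% set batch_size = [max_rows_per_batch, 200] | min %}
    {{ return(sqlserver__basic_load_csv_rows(model, batch_size, agate_table)) }}
{% endmacro %}
```

For the 14-column seed in this issue that gives batches of 150 rows; if the exact 2,100 boundary turns out to be off by one in practice, the constant can simply be lowered a little.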
From my point of view it is a little overkill. I guess we can set it to 42 and add a disclaimer that seed files with more than 50 columns are not supported (42 rows × 50 columns = 2,100 parameters). I think SQL Server would handle the inserts quite fast even with the small batch size. But I will approve a PR with this logic if it works ok :-). Seed files are in your project directory and will be version controlled. This means that if you have a large file it will be sent to git on every change, making your project heavy and slow. If you have larger files you may want to look into another solution for loading them.
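A sketch of that simpler fixed-size variant, under the same assumption about the `sqlserver__basic_load_csv_rows` signature shown earlier:

```jinja
{% macro sqlserver__load_csv_rows(model, agate_table) %}
    {# Fixed batch of 42 rows: 42 rows x 50 columns = 2,100 parameters, so any seed #}
    {# with up to 50 columns stays within SQL Server's parameter limit. #}
    {{ return(sqlserver__basic_load_csv_rows(model, 42, agate_table)) }}
{% endmacro %}
```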
Issue #127 - Respect SQL Server's Parameter Limit
I have a seed file with 14 columns and 152 lines. It fails with an error message about SQL Server's 2,100-parameter limit.
Note: 14 columns × 152 rows = 2,128 cells
It looks like the seed files are not batched when they are imported, so they fail when they get too big.
I think dbt should create batches for lines/cells to avoid such issues.
I'm working with SQL Server 2014 here, but I guess this issue applies to the current SQL Server version as well.