Decrease batch size for seeds #35

Closed

tomreitz opened this issue Jan 10, 2023 · 1 comment

Comments

@tomreitz

Thanks for your work on this adapter; it's great.

The batch size used when loading seeds is hard-coded as 100000 in the adapter's seed materialization (macros/materializations/seed/seed.sql).

I'm not sure how the 100k value was determined, but I ran into the following SQLite error when trying to load a seed file of 70k rows, 5 cols, 2.37MB (a dictionary of common words in 14 languages):

17:46:24  7 of 7 ERROR loading seed file main.synth_words ................................ [ERROR in 2.54s]
...
17:46:24  Database Error in seed synth_words (seeds/synth_words.csv)
17:46:24    too many SQL variables

I resolved the error by manually changing the batch size in macros/materializations/seed/seed.sql from 100k down to 10k.

SQLite's limits include SQLITE_MAX_VARIABLE_NUMBER, which defaults to 999 for SQLite versions prior to 3.32.0 (2020-05-22) and to 32766 for 3.32.0 and later.

(I'm running SQLite 3.31.1.) It seems that the number of variables used in the seed query scales with the batch size, presumably one bound variable per cell, i.e. rows per batch times columns.
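
To illustrate, here is a minimal sketch using Python's sqlite3 module. It is not the adapter's actual seed code, and it assumes one bound "?" per cell; under that assumption a 100k-row batch of a 5-column seed needs about 500,000 variables, well past any default SQLITE_MAX_VARIABLE_NUMBER:

```python
import sqlite3

# Sketch only (not the adapter's real seed macro): assume each cell of the
# batch becomes one bound "?" variable, so one INSERT statement needs
# rows_per_batch * num_columns variables.
num_columns = 5
rows_per_batch = 100_000
variables_needed = rows_per_batch * num_columns  # 500,000 for the 100k default

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE synth_words (c1, c2, c3, c4, c5)")

row_placeholder = "(" + ", ".join(["?"] * num_columns) + ")"
sql = ("INSERT INTO synth_words VALUES "
       + ", ".join([row_placeholder] * rows_per_batch))

try:
    conn.execute(sql, ["x"] * variables_needed)
except sqlite3.OperationalError as exc:
    # Fails once variables_needed exceeds the library's compiled
    # SQLITE_MAX_VARIABLE_NUMBER (999, 32766, or a distro-specific value).
    print(exc)  # "too many SQL variables"
```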

Sidebar: dbt itself specifies a maximum file size for seeds but this is only used to determine whether or not to hash the file. dbt discourages using very large seed files, but does not impose a limit as far as I can tell.

Would it be possible to decrease the batch size from 100k to, say, 10k, and/or make it configurable via a seed_batch_size parameter?

My guess is that the optimal value depends on both the number of rows and the number of columns in the seed file, so it may be difficult to hard-code and justify any specific batch size. But something smaller, while resulting in more queries against the database, should still be reasonably performant.
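
For what it's worth, one way to make this adaptive (a sketch only; seed_batch_size is a hypothetical setting here, not an existing dbt or adapter option) would be to cap the batch so that rows times columns stays under the variable limit:

```python
def safe_seed_batch_size(num_columns: int,
                         configured: int = 10_000,
                         max_variables: int = 999) -> int:
    """Largest batch keeping rows * columns under SQLite's variable limit.

    `configured` stands in for a hypothetical seed_batch_size setting;
    `max_variables` should match the linked library's
    SQLITE_MAX_VARIABLE_NUMBER (999 before 3.32.0, 32766 from 3.32.0 on,
    or whatever the distro compiled in).
    """
    return max(1, min(configured, max_variables // num_columns))

# 5-column seed on a pre-3.32.0 default build: min(10000, 999 // 5) = 199 rows/batch.
print(safe_seed_batch_size(num_columns=5))
# Same seed on a 3.32.0+ default build: min(10000, 32766 // 5) = 6553 rows/batch.
print(safe_seed_batch_size(num_columns=5, max_variables=32766))
```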

@codeforkjeff
Owner

Bumping it down to 10k sounds good; thanks for reporting this. I'll cut a new release containing this fix.
