[BUG] autoCompact not working #1427
Comments
Are you running this on Databricks? If so, please contact Databricks support if auto compaction is not working.
@tdas Should I expect it to work? Q1: I don't need to run VACUUM from Spark, do I? Running VACUUM from Databricks should be fine.
"By default, auto optimize does not begin compacting until it finds more than 50 small files in a directory" I have more than 50 small files in partitioned directories. I have 168 files per partition. |
You can run VACUUM anywhere.
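For example, a minimal sketch using the open-source Delta Python API (the table path is taken from the report below; the session configs are the standard ones needed to enable Delta outside Databricks):

```python
from pyspark.sql import SparkSession
from delta.tables import DeltaTable

# Standard configs to enable Delta Lake in open-source Spark.
spark = (
    SparkSession.builder
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
    .config("spark.sql.catalog.spark_catalog",
            "org.apache.spark.sql.delta.catalog.DeltaCatalog")
    .getOrCreate()
)

# VACUUM removes files no longer referenced by the table and older
# than the retention period (default: 7 days / 168 hours).
DeltaTable.forPath(
    spark, "abfss://CONTAINER@SA.dfs.core.windows.net/TABLE"
).vacuum()
```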
Auto optimize is still being built in the OSS release, see #1156. Currently the table property is respected only if you run your queries in Databricks.
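For reference, the table property being discussed is delta.autoOptimize.autoCompact. A sketch of setting it on the table (path as above; note that whether OSS Delta accepts or honors this property depends on the Delta version, per the comment above):

```python
# `spark` configured with the Delta extensions as in the VACUUM sketch.
# On Databricks this enables auto compaction for writes to the table;
# outside Databricks it is currently not acted on.
spark.sql("""
    ALTER TABLE delta.`abfss://CONTAINER@SA.dfs.core.windows.net/TABLE`
    SET TBLPROPERTIES (delta.autoOptimize.autoCompact = true)
""")
```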
Yep, you can use OPTIMIZE. It can be run in Spark or Databricks.
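A sketch of explicit compaction with OPTIMIZE (available in OSS Delta since 1.2; path as above, and the partition value is a placeholder):

```python
# `spark` configured as in the VACUUM sketch above.
# Compact small files across the whole table ...
spark.sql(
    "OPTIMIZE delta.`abfss://CONTAINER@SA.dfs.core.windows.net/TABLE`"
)

# ... or restrict the rewrite to one partition (the WHERE clause may
# only reference partition columns, e.g. `date` from the report).
spark.sql("""
    OPTIMIZE delta.`abfss://CONTAINER@SA.dfs.core.windows.net/TABLE`
    WHERE date = '2022-01-01'
""")
```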
This is expected right now if the write is not happening in Databricks. As I mentioned above, it's being built. Another suggestion: if you run your code outside Databricks, it's better to read https://docs.delta.io/latest/index.html instead.
Issue description:
I have a Delta table at the location below, with the relevant table properties set:
tablePath: abfss://CONTAINER@SA.dfs.core.windows.net/TABLE
I run a daily job from Databricks. The table is partitioned by date and some_col.
I write to the above table with writeStream in append mode using Spark Structured Streaming. If I check an older partition location, or one of the most recently written ones, I still see lots of small files.
I thought autoCompact would force the Delta table to compact the small files into larger ones (~128 MB), but that is not happening. Why?
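For context, a hedged sketch of the kind of streaming append described above; the rate source, the checkpoint location, and the derivations of `date` and `some_col` are placeholders standing in for the real job:

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import expr

spark = (
    SparkSession.builder
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
    .config("spark.sql.catalog.spark_catalog",
            "org.apache.spark.sql.delta.catalog.DeltaCatalog")
    .getOrCreate()
)

# Placeholder source; the real job reads from an actual stream.
events = (
    spark.readStream.format("rate").load()
    .withColumn("date", expr("CAST(timestamp AS date)"))
    .withColumn("some_col", expr("value % 10"))
)

# Append-mode streaming write, partitioned as in the report. Each
# micro-batch adds new files per partition, which is why small files
# accumulate without compaction.
(
    events.writeStream
    .format("delta")
    .outputMode("append")
    .partitionBy("date", "some_col")
    .option("checkpointLocation",
            "abfss://CONTAINER@SA.dfs.core.windows.net/_checkpoints/TABLE")
    .start("abfss://CONTAINER@SA.dfs.core.windows.net/TABLE")
)
```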