
[BUG] autoCompact not working #1427

Open
0xdarkman opened this issue Oct 11, 2022 · 6 comments
Labels: bug (Something isn't working)

Comments


0xdarkman commented Oct 11, 2022

I have a Delta table at the location below, with the following properties:

tablePath abfss://CONTAINER@SA.dfs.core.windows.net/TABLE

delta.autoOptimize.autoCompact true
delta.autoOptimize.optimizeWrite true
delta.deletedFileRetentionDuration interval 168 hours
delta.logRetentionDuration interval 168 hours
delta.minReaderVersion 1
delta.minWriterVersion 2
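
(For reference, a minimal sketch of how such properties are typically set on a path-based Delta table from the Scala job; this is an assumption about how the table was configured, not taken from the report:)

// Sketch: setting the table properties listed above via Spark SQL.
// Assumes an active SparkSession `spark` with Delta Lake configured.
spark.sql("""
  ALTER TABLE delta.`abfss://CONTAINER@SA.dfs.core.windows.net/TABLE`
  SET TBLPROPERTIES (
    'delta.autoOptimize.autoCompact'     = 'true',
    'delta.autoOptimize.optimizeWrite'   = 'true',
    'delta.deletedFileRetentionDuration' = 'interval 168 hours',
    'delta.logRetentionDuration'         = 'interval 168 hours'
  )
""")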

I run the following daily from Databricks:

VACUUM delta.`abfss://CONTAINER@SA.dfs.core.windows.net/TABLE`

The table is partitioned by date, some_col.
I writeStream to the above table in append mode using Spark Structured Streaming.
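
(A minimal sketch of that streaming write; the source, the derived partition columns, and the checkpoint location are hypothetical placeholders, since the report does not include the job code:)

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

val spark = SparkSession.builder().appName("stream-to-delta").getOrCreate()
import spark.implicits._

// Hypothetical source; the real job reads from its actual input stream.
val source = spark.readStream.format("rate").load()
  .withColumn("date", to_date($"timestamp"))  // partition column, as described
  .withColumn("some_col", $"value" % 10)      // partition column, as described

source.writeStream
  .format("delta")
  .outputMode("append")
  .partitionBy("date", "some_col")
  .option("checkpointLocation",
    "abfss://CONTAINER@SA.dfs.core.windows.net/TABLE/_checkpoint") // hypothetical
  .start("abfss://CONTAINER@SA.dfs.core.windows.net/TABLE")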

Spark version 3.2.1
delta.io library version 1.2.1
Hadoop version 3.3.0

If I check some old location:

abfss://CONTAINER@SA.dfs.core.windows.net/TABLE/date=2022-08-20/hour=7/some_col=xxx/

or some latest written location:

abfss://CONTAINER@SA.dfs.core.windows.net/TABLE/date=2022-10-11//hour=5/some_col=xxx/

I see lots of small files still.

I thought autoCompact would force the Delta table to compact small files into larger ones (~128 MB).

However, that is not the case. Why?

0xdarkman added the bug label Oct 11, 2022
0xdarkman changed the title from [BUG] OPTIMIZE not working to [BUG] autoCompact not working Oct 11, 2022
0xdarkman (Author) commented:

%sql
DESCRIBE DETAIL delta.`abfss://CONTAINER@SA.dfs.core.windows.net/TABLE`

returns, among others:

delta.deletedFileRetentionDuration: "interval 168 hours"
delta.autoOptimize.autoCompact: "true"
delta.logRetentionDuration: "interval 168 hours"
delta.autoOptimize.optimizeWrite: "true"

tdas (Contributor) commented Oct 11, 2022

Are you running this on Databricks? If so, please contact Databricks support if auto compaction is not working.
If you are running this with Apache Spark: auto compaction is being built right now, so it's not yet available in Delta OSS.

0xdarkman (Author) commented:

@tdas I am running a Spark + Scala job deployed with the Spark Operator on Kubernetes.
So the answer is: yes, I use Spark to writeStream, but I run VACUUM from Databricks.

Should I expect this to work?

Q1: I do not need to run VACUUM from Spark, do I? Running VACUUM from Databricks should be OK.
Q2: autoCompact is a table property, so I thought the Delta table would handle auto compaction for me. What difference does Spark vs. Databricks make here?
Q3: If autoCompact does not work, shall I use OPTIMIZE instead? I would like to run OPTIMIZE from Databricks daily while keeping the stream running in Spark on Kubernetes.

0xdarkman (Author) commented Oct 11, 2022

https://docs.databricks.com/optimizations/auto-optimize.html#if-i-have-auto-optimize-enabled-on-a-table-that-im-streaming-into-and-a-concurrent-transaction-conflicts-with-the-optimize-will-my-job-fail

"By default, auto optimize does not begin compacting until it finds more than 50 small files in a directory"

I have more than 50 small files in the partitioned directories: 168 files per partition.
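
(Side note: per the Databricks doc linked above, that 50-file threshold is a session config; the sketch below only applies where auto compaction actually runs, i.e. on Databricks, and the value 30 is illustrative:)

// Sketch (Databricks only): lower the small-file threshold for auto compaction.
// The default is 50 per the documentation quoted above.
spark.conf.set("spark.databricks.delta.autoCompact.minNumFiles", "30")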

zsxwing (Member) commented Oct 12, 2022

> Q1: I do not need to run VACUUM from Spark, do I? Running VACUUM from Databricks should be OK.

You can run VACUUM anywhere.

> Q2: autoCompact is a table property, so I thought the Delta table would handle auto compaction for me. What difference does Spark vs. Databricks make here?

Auto optimize is still being built (#1156). Currently the table property is respected only if you run your queries in Databricks.

> Q3: If autoCompact does not work, shall I use OPTIMIZE instead? I would like to run OPTIMIZE from Databricks daily while keeping the stream running in Spark on Kubernetes.

Yep, you can use OPTIMIZE. It can be run in Spark or Databricks.
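
For example, a daily compaction job could be as small as this sketch (table path as above; the scheduling is up to the reader, and note that on OSS Spark the SQL OPTIMIZE command needs Delta Lake 2.0+, newer than the 1.2.1 in this report):

// Sketch: compact the whole table...
spark.sql("OPTIMIZE delta.`abfss://CONTAINER@SA.dfs.core.windows.net/TABLE`")

// ...or bound the work to recent partitions (illustrative literal date):
spark.sql("""
  OPTIMIZE delta.`abfss://CONTAINER@SA.dfs.core.windows.net/TABLE`
  WHERE date >= '2022-10-10'
""")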

"By default, auto optimize does not begin compacting until it finds more than 50 small files in a directory"

This is expected right now if the write is not happening in Databricks. As I mentioned above, it's being built.

Another suggestion, if you run your code outside Databricks, it's better to read https://docs.delta.io/latest/index.html instead.

bqiang-stackadapt commented:

@zsxwing I think this should be available now, since it's documented here.
