Parallel Vacuum command #393
Labels
binding/rust: Issues for the Rust crate
enhancement: New feature or request
help wanted: Extra attention is needed
storage/aws: AWS S3 storage related
Description
Currently, the vacuum command deletes files one by one, which is very slow on object stores such as S3, especially with hundreds of thousands of files. I had a case (with Databricks/Spark) involving more than 8 million stale files, which took days even with parallel calls enabled (using spark.databricks.delta.vacuum.parallelDelete.enabled I got around 80 deletes/second). The delete calls could be parallelized (e.g. 100/1,000/10,000 concurrent deletes) to speed up the processing.
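As a rough illustration of the idea, here is a minimal sketch of bounded parallel deletion using only the standard library: a fixed pool of worker threads drains a shared channel of object keys. `delete_object` is a hypothetical stand-in for an actual S3 DeleteObject request; a real implementation would likely use an async runtime and the storage backend's client instead of threads, and could also batch keys (S3's DeleteObjects API accepts up to 1,000 keys per request).

```rust
use std::sync::{mpsc, Arc, Mutex};
use std::thread;

// Hypothetical stand-in for an S3 DeleteObject call.
fn delete_object(_key: &str) {
    // A real implementation would issue the delete request here.
}

// Deletes all keys using `concurrency` worker threads; returns the count deleted.
fn parallel_delete(keys: Vec<String>, concurrency: usize) -> usize {
    let (tx, rx) = mpsc::channel::<String>();
    // Share one receiver among all workers so each key is taken exactly once.
    let rx = Arc::new(Mutex::new(rx));
    let deleted = Arc::new(Mutex::new(0usize));

    let mut handles = Vec::new();
    for _ in 0..concurrency {
        let rx = Arc::clone(&rx);
        let deleted = Arc::clone(&deleted);
        handles.push(thread::spawn(move || loop {
            // Take the next key, or exit once the channel is drained and closed.
            let key = match rx.lock().unwrap().recv() {
                Ok(k) => k,
                Err(_) => break,
            };
            delete_object(&key);
            *deleted.lock().unwrap() += 1;
        }));
    }

    for key in keys {
        tx.send(key).unwrap();
    }
    drop(tx); // Close the channel so idle workers terminate.

    for h in handles {
        h.join().unwrap();
    }
    let n = *deleted.lock().unwrap();
    n
}
```

The same bounded-concurrency pattern maps directly onto async code (e.g. a stream of delete futures with a fixed buffer size), which avoids one thread per unit of concurrency.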
Use Case
More performant vacuum.
Related Issue(s)