Massive performance degradation for table writes one row at a time #2780
Unanswered
joshivinay asked this question in Q&A
Replies: 2 comments 1 reply
-
Forgot to mention: we are using delta-rs 0.18.0.
-
1, 3. You are reading the checkpoint, but if you don't run optimize/compact properly, your checkpoint will be huge, and you may get an OOM if your container doesn't have enough resources.
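For reference, a periodic maintenance pass along those lines might look like the sketch below. It assumes the `deltalake` Python package, whose `DeltaTable` exposes `optimize.compact()`, `create_checkpoint()` and `vacuum()`. The helper name `run_maintenance`, the 168-hour retention default and the table URI are hypothetical. The function takes the table object as a parameter so it can be tried against a stand-in before pointing it at a real table.

```python
from typing import Any, Dict


def run_maintenance(table: Any, retention_hours: int = 168) -> Dict[str, Any]:
    """Compact small files, write a checkpoint, then vacuum unreferenced files.

    `table` is expected to behave like `deltalake.DeltaTable`.
    """
    # Rewrite many small parquet files into fewer, larger ones.
    compact_metrics = table.optimize.compact()
    # Write a checkpoint so readers do not have to replay every JSON commit.
    table.create_checkpoint()
    # Delete files that are no longer referenced and are older than the retention window.
    removed = table.vacuum(retention_hours=retention_hours, dry_run=False)
    return {"compact": compact_metrics, "vacuumed_files": len(removed)}


# Hypothetical usage (the table URI is a placeholder):
# from deltalake import DeltaTable
# run_maintenance(DeltaTable("s3://my-bucket/my-table"))
```

Running this from a single scheduled job, rather than from each of the writer containers, would avoid adding yet more concurrent committers; that is a design suggestion, not something verified against this workload.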
-
We are using delta-rs to read large, deeply hierarchical JSON files (1 MB - 20 MB) and write them to a Delta table in append-only mode using write_delta().
Due to current system limitations, we are forced to have one container handle one file at a time. The number of containers running in parallel can reach about 1,000.
Each container receives a pointer to a JSON file, reads the file, and writes it to the Delta table. We have a backoff mechanism that retries in case of conflicts. If there are 50,000 JSON files, we end up with a Delta table with 50,000 parquet files and, subsequently, 50,000 commit log files.
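The backoff mechanism itself is not shown in this post. A minimal sketch of one common approach, exponential backoff with full jitter, is given below. Every name in it (`retry_with_backoff`, `is_conflict`, the default delays) is hypothetical and not taken from our code or from delta-rs.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    operation: Callable[[], T],
    is_conflict: Callable[[Exception], bool],
    max_attempts: int = 8,
    base_delay: float = 0.5,
    max_delay: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `operation`, retrying on conflicts with full-jitter exponential backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return operation()
        except Exception as exc:
            # Re-raise non-conflict errors immediately, and the last conflict too.
            if not is_conflict(exc) or attempt == max_attempts - 1:
                raise
            # Full jitter: wait a random time in [0, min(max_delay, base_delay * 2**attempt)].
            cap = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0.0, cap))
    raise AssertionError("unreachable")
```

With the `deltalake` package, `is_conflict` might be something like `lambda e: isinstance(e, CommitFailedError)`, assuming that is the exception the writer raises on a commit conflict. The jitter matters with roughly 1,000 parallel writers: without it, containers that collided once tend to retry at the same moment and collide again.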
We have noticed that performance degrades drastically despite checkpointing of the log files. We also see a huge number of S3 GetObject calls.
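To get a feel for how the GetObject count could scale, here is a back-of-envelope model. It is only an assumption about the access pattern, not measured delta-rs behaviour: each commit first reloads the table state, and that load reads either every earlier commit file (no checkpoints) or the `_last_checkpoint` pointer, one checkpoint file, and the commits written since. The function name and the checkpoint interval of 100 are hypothetical.

```python
from typing import Optional


def estimate_log_gets(n_commits: int, checkpoint_interval: Optional[int] = None) -> int:
    """Estimate log-file GETs if every commit first reloads the table state.

    Simplified model (an assumption, not measured behaviour):
    - no checkpoints: the writer of version v replays all v earlier commit files;
    - checkpoint every k commits: it reads the `_last_checkpoint` pointer,
      one checkpoint file, and the (v mod k) commit files written since.
    Conflict retries and data-file reads are ignored.
    """
    total = 0
    for version in range(n_commits):
        if checkpoint_interval is None:
            total += version
        else:
            total += 2 + version % checkpoint_interval
    return total


print(estimate_log_gets(50_000))       # 1249975000
print(estimate_log_gets(50_000, 100))  # 2575000
```

The model counts requests only. It says nothing about the size of each checkpoint, which grows with the number of files in the table, and every conflict retry would add further state reloads on top.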
Some questions:
I searched for a similar discussion and found one where the number of object storage calls was high, but nothing like what we are seeing. We racked up $$$ in a performance test just because of the S3 GetObject calls, and the processing wasn't even complete.
Glue eventually couldn't run OPTIMIZE and VACUUM RETAIN 0 HOURS either, even after a 6-hour run.
We are trying to understand whether there is a better way to do this, or whether these are limitations of delta-rs for this kind of processing.
Pseudocode here:
```python
import time
from dataclasses import dataclass, fields
from datetime import datetime
from typing import Any, Mapping

# download the json file and the schema
```