Is your feature request related to a problem? Please describe.
Currently, we can overwrite a table or drop/add partitions. We should add functionality to merge (upsert) new data into an existing table.
Describe the solution you'd like
We will follow the steps below:
1. Read the existing table using Athena.
2. Check whether the existing table has duplicates.
3. Merge the new pandas DataFrame into the existing table on the primary key, using pandas: merged_df = pd.concat([existing_df[~existing_df.index.isin(delta_df.index)], delta_df])
4. Overwrite the existing Athena table.
Write to the Glue catalog:
res = wr.s3.to_parquet(
    df=merged_df,
    path=s3_path_prefix,
    dataset=True,
    database=database_name,
    table=table_name,
    mode="overwrite",
)
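The duplicate check and merge steps above can be sketched on toy data as follows. This is only an illustration under stated assumptions: `merge_upsert` and its `primary_key` parameter are hypothetical names that are not part of the library, and the primary key is assumed to be held in ordinary columns that get moved into the index before the `pd.concat` expression from the proposal is applied.

```python
import pandas as pd


def merge_upsert(existing_df: pd.DataFrame, delta_df: pd.DataFrame, primary_key: list) -> pd.DataFrame:
    """Hypothetical helper: upsert delta_df into existing_df on primary_key."""
    # Refuse to merge when the key does not identify rows uniquely.
    for name, frame in (("existing", existing_df), ("delta", delta_df)):
        if frame.duplicated(subset=primary_key).any():
            raise ValueError(f"{name} data has duplicate primary keys")

    existing = existing_df.set_index(primary_key)
    delta = delta_df.set_index(primary_key)

    # Keep existing rows whose key is absent from the delta, then append the delta.
    merged = pd.concat([existing[~existing.index.isin(delta.index)], delta])
    return merged.reset_index()


existing_df = pd.DataFrame({"id": [1, 2, 3], "value": ["a", "b", "c"]})
delta_df = pd.DataFrame({"id": [2, 4], "value": ["B", "d"]})

# ids 1 and 3 are kept, id 2 is replaced by the delta row, id 4 is added.
merged_df = merge_upsert(existing_df, delta_df, ["id"])
print(merged_df)
```

Rows taken from the delta end up after the surviving existing rows, so the result is not sorted by key.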
This approach seems good to me. My only recommendation in advance is to read the data directly from S3 using wr.s3.read_parquet_table() instead of fetching it from Athena. If we are not going to process or filter the data in the Athena query, I think we can save time and money by skipping it.
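A minimal sketch of that variant, assuming the whole table fits in memory. `upsert_glue_table` and its parameters are hypothetical names used for illustration only; the optional `wr` argument is an assumption added so the merge logic can be tried with a stand-in object instead of real AWS access.

```python
import pandas as pd


def upsert_glue_table(delta_df, database, table, path, primary_key, wr=None):
    """Hypothetical helper sketching the read, merge, overwrite flow."""
    if wr is None:
        # Imported lazily so the function can be exercised without AWS access.
        import awswrangler as wr

    # Read the current table contents from S3 via the Glue catalog,
    # rather than running an Athena query.
    existing_df = wr.s3.read_parquet_table(database=database, table=table)

    existing = existing_df.set_index(primary_key)
    delta = delta_df.set_index(primary_key)
    if existing.index.has_duplicates or delta.index.has_duplicates:
        raise ValueError("primary key is not unique")

    # Same merge expression as in the proposal, applied on the key index.
    merged_df = pd.concat([existing[~existing.index.isin(delta.index)], delta]).reset_index()

    # Overwrite the dataset and update the Glue catalog entry.
    return wr.s3.to_parquet(
        df=merged_df,
        path=path,
        dataset=True,
        database=database,
        table=table,
        mode="overwrite",
    )
```

Note that `reset_index()` places the key columns first, so the column order of the written data may differ from the original table.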
@igorborgest Good call out. I will make sure to incorporate your recommendation.
jiteshsoni changed the title from "Enable Merge Upsert for existing Athena Table on Primary Keys" to "Enable Merge Upsert for existing Glue Table on Primary Keys" on Jan 6, 2021
jiteshsoni changed the title from "Enable Merge Upsert for existing Glue Table on Primary Keys" to "Enable Merge Upsert for existing Glue Tables on Primary Keys" on Jan 6, 2021