Enable Merge Upsert for existing Glue Tables on Primary Keys #503

jiteshsoni · 2021-01-04T22:12:11Z

Is your feature request related to a problem? Please describe.
Currently, we can overwrite a table or drop/add partitions. We should add functionality to merge-upsert data into an existing table.

Describe the solution you'd like
We will follow the steps below:

1. Read the existing table using Athena.
2. Check whether the existing table has duplicates.
3. Merge the new pandas DataFrame into the existing table on the primary key using pandas:

       merged_df = pd.concat([existing_df[~existing_df.index.isin(delta_df.index)], delta_df])

4. Overwrite the existing Athena table.

Write to the Glue catalog:

    res = wr.s3.to_parquet(
        df=merged_df,
        path=s3_path_prefix,
        dataset=True,
        database=database_name,
        table=table_name,
        mode="overwrite"
    )
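The merge step can be sketched with pandas alone, which makes it easy to try locally before touching S3 or Glue. This is a minimal illustration, not the implementation that was eventually released: the helper name `merge_upsert` and its `primary_key` parameter are hypothetical, and it assumes the primary key columns are present in both DataFrames. It sets the key as the index first, because the `pd.concat` expression above only matches on the primary key when both DataFrames are indexed by it.

```python
import pandas as pd


def merge_upsert(existing_df: pd.DataFrame, delta_df: pd.DataFrame, primary_key: list) -> pd.DataFrame:
    """Return existing_df with rows updated or appended from delta_df, matched on primary_key.

    Hypothetical helper for illustration only.
    """
    # Refuse to merge if either side contains duplicate primary keys
    for name, df in (("existing", existing_df), ("delta", delta_df)):
        if df.duplicated(subset=primary_key).any():
            raise ValueError(f"{name} DataFrame has duplicate primary keys")

    # Index both sides by the primary key so rows can be matched on it
    existing = existing_df.set_index(primary_key)
    delta = delta_df.set_index(primary_key)

    # Keep existing rows whose key is absent from the delta, then append every delta row
    merged = pd.concat([existing[~existing.index.isin(delta.index)], delta])
    return merged.reset_index()


if __name__ == "__main__":
    existing_df = pd.DataFrame({"id": [1, 2, 3], "value": ["a", "b", "c"]})
    delta_df = pd.DataFrame({"id": [2, 4], "value": ["B", "d"]})
    # Row 2 is replaced by the delta version and row 4 is appended
    print(merge_upsert(existing_df, delta_df, primary_key=["id"]))
```

The result could then be passed as `df=merged_df` to the `wr.s3.to_parquet(..., mode="overwrite")` call shown above.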


jiteshsoni · 2021-01-04T22:13:31Z

@igorborgest Please review the approach.

igorborgest · 2021-01-05T22:48:21Z

Hi @jiteshsoni,

This approach seems good to me. My only recommendation in advance is to read the data directly from S3 using wr.s3.read_parquet_table() instead of fetching it from Athena. If we are not going to process or filter the data in the Athena query, I think we can save time and money by skipping it.

What do you think?
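For reference, the two ways of loading the existing table might look roughly like this. It is a sketch only: the function names `load_existing_table` and `load_existing_table_via_athena` are hypothetical, the `SELECT *` query is an assumed stand-in for whatever Athena query would otherwise be used, and actually running either function requires AWS credentials and an existing Glue table.

```python
def load_existing_table(database: str, table: str):
    """Read the table's Parquet files directly from S3, using the Glue catalog to locate them."""
    import awswrangler as wr  # imported lazily so this module loads without AWS dependencies

    return wr.s3.read_parquet_table(database=database, table=table)


def load_existing_table_via_athena(database: str, table: str):
    """Alternative: run an Athena query, which adds query latency and per-query cost."""
    import awswrangler as wr

    # Assumed query for illustration; no filtering or processing happens in Athena here
    return wr.athena.read_sql_query(sql=f'SELECT * FROM "{table}"', database=database)
```

When no filtering happens in the query, both functions should return the same rows, so the direct S3 read avoids the extra Athena step.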

jiteshsoni · 2021-01-06T04:32:05Z

@igorborgest Good call out. I will make sure to incorporate your recommendation.


igorborgest · 2021-02-03T23:31:33Z

Released on version 2.4.0.

jiteshsoni self-assigned this Jan 4, 2021

jiteshsoni added the feature label Jan 4, 2021

jiteshsoni changed the title ~~Enable Merge Upsert for existing Athena Table on Primary Keys~~ Enable Merge Upsert for existing Glue Table on Primary Keys Jan 6, 2021

jiteshsoni changed the title ~~Enable Merge Upsert for existing Glue Table on Primary Keys~~ Enable Merge Upsert for existing Glue Tables on Primary Keys Jan 6, 2021

jiteshsoni added a commit that referenced this issue Jan 8, 2021

New functionality to allow merge upsert on existing table. This resol…

0342e19

…ves issue #503

jiteshsoni mentioned this issue Jan 8, 2021

New functionality to allow merge upsert on existing table. #510

Merged

igorborgest added this to the 2.4.0 milestone Jan 18, 2021

igorborgest added the minor release and ready to release labels Jan 18, 2021

igorborgest closed this as completed Feb 3, 2021
