[SUPPORT] DELETE_PARTITION causes AWS Athena Query failure #6024

Closed
Gatsby-Lee opened this issue Jul 1, 2022 · 8 comments
Labels
aws-support priority:critical production down; pipelines stalled; Need help asap. query-engine trino, presto, athena, impala, etc

Comments

@Gatsby-Lee
Contributor

Describe the problem you faced

A clear and concise description of the problem.

To Reproduce

Steps to reproduce the behavior:

  1. Run DELETE_PARTITION for a non-existing partition ( e.g. org_id=55555 )
  • since it will raise an exception, you have to wrap the Spark Write.
  • this operation creates org_id=55555_$folder$ in the Hudi table path ( BTW, why is it even created? )
  2. UPSERT to another partition ( e.g. org_id=24 )
  • Check the current status
  • you will see that the org_id=55555 partition is in the Glue Catalog
  3. Go to Athena / run a query
  • you will see that the query fails because the path org_id=55555 is missing in S3
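
Step 1 can be sketched roughly as follows. This is a minimal illustration, not the reporter's actual job: the helper names `delete_partition_options` and `try_delete_partitions` are hypothetical, `df` is assumed to be a Spark DataFrame, and the option key `hoodie.datasource.write.partitions.to.delete` should be checked against the documentation for the Hudi version in use. Other write options a real job needs (record key, precombine field, Hive/Glue sync settings) are left out.

```python
from typing import Dict, List


def delete_partition_options(table_name: str, partitions: List[str]) -> Dict[str, str]:
    """Build Hudi datasource options for a DELETE_PARTITION write (hypothetical helper)."""
    if not partitions:
        raise ValueError("at least one partition path is required")
    return {
        "hoodie.table.name": table_name,
        "hoodie.datasource.write.operation": "delete_partition",
        # Comma-separated partition paths, relative to the table base path.
        # Assumption: verify this key for your Hudi version.
        "hoodie.datasource.write.partitions.to.delete": ",".join(partitions),
    }


def try_delete_partitions(df, base_path: str, options: Dict[str, str]) -> bool:
    """Wrap the Spark write so a failing DELETE_PARTITION does not abort the job."""
    try:
        df.write.format("hudi").options(**options).mode("append").save(base_path)
        return True
    except Exception as exc:  # broad on purpose: this sketch only reports the failure
        print(f"DELETE_PARTITION failed: {exc}")
        return False
```

With a real Spark session this might be called as `try_delete_partitions(df, base_path, delete_partition_options("my_table", ["org_id=55555"]))`, where `my_table` and `base_path` are placeholders.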

Expected behavior

org_id=55555 MUST NOT be registered in the Glue Catalog
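
One way to confirm the symptom is to compare the partitions registered in the catalog with what actually exists in storage. The sketch below is only an illustration under assumptions: `find_orphan_partitions` is a hypothetical helper, the mapping of partition names to locations is assumed to have been fetched from the catalog beforehand, and `has_objects` is a caller-supplied callable (for example, one backed by an S3 prefix listing). Appending a trailing slash to the location keeps a marker object such as `org_id=55555_$folder$` from counting as the partition directory.

```python
from typing import Callable, Dict, List


def find_orphan_partitions(
    catalog_locations: Dict[str, str],
    has_objects: Callable[[str], bool],
) -> List[str]:
    """Return catalog partition names whose storage location holds no objects."""
    orphans = []
    for name, location in sorted(catalog_locations.items()):
        # The trailing slash ensures "org_id=55555_$folder$" does not match "org_id=55555/".
        prefix = location.rstrip("/") + "/"
        if not has_objects(prefix):
            orphans.append(name)
    return orphans
```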

Environment Description

  • Hudi version : 0.10.1
  • Spark version : 3.1.1-amzn-0
  • Hive version : 2.3.7-amzn-4
  • Hadoop version : 3.2.1-amzn-3
  • Storage (HDFS/S3/GCS..) : S3
  • Running on Docker? (yes/no) : NO
@yihua yihua added priority:critical production down; pipelines stalled; Need help asap. query-engine trino, presto, athena, impala, etc labels Jul 5, 2022
@codope
Member

codope commented Aug 4, 2022

@Gatsby-Lee You mentioned

since it will raise an exception, you have to wrap the Spark Write.

What exception did you get? I ran a test locally and tried to delete a non-existing partition. The replacecommit due to DELETE_PARTITION succeeded in my case (though it did not delete any data). It sounds counter-intuitive, and one would expect it to fail fast in such scenarios, but we do not do so because detecting non-existing partitions requires listing, which is a costly operation. Instead, from 0.11.0 onwards, the partitions are lazily deleted by the cleaner. If a partition does not exist, then even though the DELETE_PARTITION operation will succeed, nothing will be deleted and no extra metadata folder will be created. Can you please try again after upgrading to version 0.11.1?

@codope codope self-assigned this Aug 4, 2022
@codope
Member

codope commented Aug 5, 2022

Btw, org_id=55555_$folder$ may be an S3 thing. Did the partition org_id=55555 ever exist before?

@Gatsby-Lee
Contributor Author

@codope hi,

First, as of 0.11.x, DELETE_PARTITION ( in AWS Glue Catalog ) doesn't fail or raise an exception. ( It's different from 0.10.x )
Second, like you said, the actual delete is done by the cleaner ( lazy ), but before the actual delete, Hudi seems to try to delete the metadata in AWS Glue Catalog first.
Third, org_id=55555 has never existed.

I will try to replicate the issue with 0.11.1 and post the output here.
( I don't remember if I reproduced this issue with 0.11.0 or not. Anyway, I will try again )

@codope
Member

codope commented Aug 10, 2022

@Gatsby-Lee Thanks for the info. I will wait for more updates from you after testing with 0.11.1. However, I can think of one thing we can improve. Before executing the delete_partition command, we can check whether the partition exists, log a warning if it does not, and return early without making any modification. Filed HUDI-4591 to track this.
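
A user-side version of that guard might look like the sketch below. It is not the HUDI-4591 implementation, and all names here (`partition_exists`, `partitions_safe_to_delete`, `list_keys`) are hypothetical. `list_keys` is assumed to be a callable that returns the object keys under a given prefix; in practice it could be backed by an S3 listing call.

```python
import logging
from typing import Callable, Iterable, List

log = logging.getLogger("delete_partition_guard")


def partition_exists(
    list_keys: Callable[[str], Iterable[str]], table_prefix: str, partition: str
) -> bool:
    """True if at least one object lives under <table_prefix>/<partition>/."""
    # The trailing slash excludes marker objects such as "org_id=55555_$folder$".
    prefix = f"{table_prefix.rstrip('/')}/{partition.strip('/')}/"
    return any(True for _ in list_keys(prefix))


def partitions_safe_to_delete(
    list_keys: Callable[[str], Iterable[str]], table_prefix: str, partitions: List[str]
) -> List[str]:
    """Keep only partitions that exist; log a warning for the rest and skip them."""
    existing = []
    for partition in partitions:
        if partition_exists(list_keys, table_prefix, partition):
            existing.append(partition)
        else:
            log.warning("Skipping DELETE_PARTITION for missing partition %s", partition)
    return existing
```

If the returned list is empty, the caller can return early and skip the write entirely. The cost is one listing per partition, which is the overhead mentioned earlier in the thread.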

@nsivabalan
Contributor

@Gatsby-Lee: any updates here, please?

@codope
Member

codope commented Sep 7, 2022

@Gatsby-Lee Gentle reminder. Can we close this issue?

@Gatsby-Lee
Contributor Author

Hi, let's close the issue if I am the only one facing it.

Let me write more details before I forget.

A couple of months ago, I tried the DELETE_PARTITION operation with 0.10.1 and 0.11.0.
I noticed that 0.11.0 and 0.10.1 behave differently when Hudi runs the DELETE_PARTITION operation on a non-existing partition.

  • 0.10.1 raised an exception and failed. ( the serious issue was that Hudi became unstable )
  • 0.11.0 was silent. ( VC told me that this is not the right behavior either. It should raise an exception )

I wasn't able to use 0.11.0 because it has a compatibility issue in AWS Glue. ( it was related to AWS Glue Catalog )
I wasn't able to use 0.10.1 because it has a bug in ZookeeperLockProvider.

I ended up using 0.10.1 + a patch that fixed the ZookeeperLockProvider ( available in 0.11.1 )
And, I added logic that checks if the target partition exists ( cc @codope )

I will test with 0.11.1 and reopen this ticket if I still notice a similar issue.

Thank you
Gatsby

@codope
Member

codope commented Sep 8, 2022

@Gatsby-Lee Thanks, I have noted your point. Would you mind upstreaming your fix (the logic that checks if the target partition exists)? I believe this would be helpful for other users as well. If so, please assign HUDI-4591 to yourself.


5 participants