Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GH-2935: Avoid double close of ParquetFileWriter #2951

Merged
merged 2 commits into from
Jul 22, 2024
Merged

GH-2935: Avoid double close of ParquetFileWriter #2951

merged 2 commits into from
Jul 22, 2024

Conversation

hellishfire
Copy link
Contributor

Rationale for this change:

Refer to #2935

What changes are included in this PR?

Prevent double close of ParquetFileWriter

Are these changes tested?

Yes

Are there any user-facing changes?

No

Closes #2935

try (PositionOutputStream temp = out) {
temp.flush();
if (crcAllocator != null) {
crcAllocator.close();
}
closed = true;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we put this in the finally block just in case of any exception?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Putting it in finally block makes it be not retryable.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Putting it in finally block makes it be not retryable.

That was also my initial thought. I was actually torn between these two choices, but I suppose it's rare for people to actually retry failed close operation. Feedback is welcome.

Copy link

@doki23 doki23 Jul 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's rare for people to actually retry failed close operation

I agree with you, it's rare.
But if someone retries, it will not work as expected -- file is not closed as expected. This is not user friendly. And if people don't retry, it's no matter where we put it in, right? But it will also lead to another problem -- the external finally block will throw an exception that suppresses the original exception.

I also find InteralParquetRecordWriter sets closed to true in the finally block too. But if AutoCloseables.uncheckedClose throws an exception, it'll keep false.

  public void close() throws IOException, InterruptedException {
    if (!closed) {
      try {
        ......
      } finally {
        AutoCloseables.uncheckedClose(columnStore, pageStore, bloomFilterWriteStore, parquetFileWriter);
        closed = true;
      }
    }
  }

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I notice that try {} finally { closed = true; } is common in parquet writer code base, so it' ok to follow up.

@Fokko Fokko added this to the 1.15.0 milestone Jul 13, 2024
@hellishfire
Copy link
Contributor Author

@wgtmac Is there any plan to release another 1.14 minor version? 1.14 is unusable with some stream implementations caused by this bug.

@wgtmac
Copy link
Member

wgtmac commented Jul 15, 2024

Could you please point me to the downstream issue? We do not have a planned release for 1.14.2 but we can do it at any time if it breaks many downstream cases. @hellishfire

BTW, I saw that Apache Iceberg is unable to pick parquet-java 1.14.1. Is there anything we can do on the parquet side for a quick release? @Fokko

@hellishfire
Copy link
Contributor Author

Could you please point me to the downstream issue? We do not have a planned release for 1.14.2 but we can do it at any time if it breaks many downstream cases. @hellishfire

BTW, I saw that Apache Iceberg is unable to pick parquet-java 1.14.1. Is there anything we can do on the parquet side for a quick release? @Fokko

Ok...We encountered this issue in a private code base, but I suppose this bug can break many filesystem implementations if they don't allow flush after close, which should be a very reasonable behavior.

@wgtmac
Copy link
Member

wgtmac commented Jul 17, 2024

@hellishfire What about sticking to 1.13.1 for a while? We have a planned release of 1.15.0 in this October and there are not many defect reports received for 1.14.1.

@Fokko
Copy link
Contributor

Fokko commented Jul 17, 2024

BTW, I saw that Apache Iceberg is unable to pick parquet-java 1.14.1. Is there anything we can do on the parquet side for a quick release? @Fokko

It has issues with the latest Jackson release, which carries JDK17/21 specific files and is not supported by the Gradle Shadow plugin we're using. We cannot upgrade that plugin because it removed support for Java 8. So we either have to downgrade Jackson, or remove Java 8 support for Iceberg.

@Fokko
Copy link
Contributor

Fokko commented Jul 17, 2024

I'm happy to run a quick release, as another PR has been raised #2958

@wgtmac
Copy link
Member

wgtmac commented Jul 17, 2024

It has issues with the latest Jackson release, which carries JDK17/21 specific files and is not supported by the Gradle Shadow plugin we're using.

Does it make sense to downgrade Jackson on our side? I know it was tied to some other fixes so it might be difficult.

I'm happy to run a quick release, as another PR has been raised #2958

That's awesome! Let me know if I can help anything.

@Fokko
Copy link
Contributor

Fokko commented Jul 17, 2024

Does it make sense to downgrade Jackson on our side? I know it was tied to some other fixes so it might be difficult.

I would be hesitant to downgrade for just Iceberg. When more folks are running into it, then it makes sense, but it looks like Iceberg is a bit isolated because of Gradle.

@hellishfire
Copy link
Contributor Author

hellishfire commented Jul 17, 2024

Does it make sense to downgrade Jackson on our side? I know it was tied to some other fixes so it might be difficult.

I would be hesitant to downgrade for just Iceberg. When more folks are running into it, then it makes sense, but it looks like Iceberg is a bit isolated because of Gradle.

We used maven shade plugin to skip these JDK17/21 specific files from Jackson.
But anyway, we should merge this pull request.

p.s. I believe there's not many issue reports for 1.14.1 because most users haven't upgraded yet.

@Fokko Fokko merged commit 824b7d0 into apache:master Jul 22, 2024
9 checks passed
@Fokko
Copy link
Contributor

Fokko commented Jul 22, 2024

Thanks for fixing this @hellishfire and thanks @wgtmac, @doki23 for the review 🚀

@Fokko Fokko modified the milestones: 1.15.0, 1.14.2 Jul 22, 2024
@hellishfire hellishfire deleted the fix-close branch July 22, 2024 10:56
Fokko pushed a commit that referenced this pull request Jul 22, 2024
* GH-2935: Avoid double close of ParquetFileWriter

* fix comment

---------

Co-authored-by: youming.whl <youming.whl@antfin.com>
@steveloughran
Copy link
Contributor

I'd have thought an atomic boolean would have been safer here...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Double close of ParquetFileWriter in ParquetWriter
5 participants