GH-2935: Avoid double close of ParquetFileWriter #2951
try (PositionOutputStream temp = out) {
  temp.flush();
  if (crcAllocator != null) {
    crcAllocator.close();
  }
  closed = true;
}
Should we put this in the finally block just in case of any exception?
fixed
Putting it in a finally block makes close() non-retryable.
That was also my initial thought. I was torn between these two choices, but I suppose it's rare for people to retry a failed close operation. Feedback is welcome.
I agree with you; it's rare.
But if someone does retry, it will not work as expected: the file will not actually be closed. That is not user-friendly. And if nobody retries, it doesn't matter where we put it, right? It can also lead to another problem: if the outer finally block throws, that exception replaces the original one, which is then lost.
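The exception-masking concern follows from standard Java semantics. The sketch below uses a hypothetical FailingResource (not a Parquet class) to contrast cleanup in a plain finally block, where the cleanup exception replaces the original one, with try-with-resources (the construct used in the excerpt under review), which keeps the original exception and attaches the cleanup failure via getSuppressed().

```java
import java.io.Closeable;
import java.io.IOException;

// Sketch: how Java reports failures when both the body and the cleanup throw.
public class FinallyMaskingDemo {

  // Hypothetical resource whose close() always fails.
  static final class FailingResource implements Closeable {
    @Override
    public void close() throws IOException {
      throw new IOException("close failed");
    }
  }

  // Cleanup in a plain finally block: the exception from close() replaces
  // the body's exception, which is discarded.
  static IOException withFinally() {
    try {
      FailingResource resource = new FailingResource();
      try {
        throw new IOException("write failed");
      } finally {
        resource.close();
      }
    } catch (IOException e) {
      return e;
    }
  }

  // Cleanup via try-with-resources: the body's exception propagates and the
  // close() failure is attached to it as a suppressed exception.
  static IOException withTryWithResources() {
    try (FailingResource resource = new FailingResource()) {
      throw new IOException("write failed");
    } catch (IOException e) {
      return e;
    }
  }

  public static void main(String[] args) {
    IOException a = withFinally();
    System.out.println(a.getMessage() + ", suppressed=" + a.getSuppressed().length);
    IOException b = withTryWithResources();
    System.out.println(b.getMessage() + ", suppressed=" + b.getSuppressed()[0].getMessage());
  }
}
```

Running main prints "close failed, suppressed=0" followed by "write failed, suppressed=close failed".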
I also found that InternalParquetRecordWriter sets closed to true in the finally block too. But if AutoCloseables.uncheckedClose throws an exception, closed stays false:
public void close() throws IOException, InterruptedException {
  if (!closed) {
    try {
      // ...
    } finally {
      AutoCloseables.uncheckedClose(columnStore, pageStore, bloomFilterWriteStore, parquetFileWriter);
      closed = true;
    }
  }
}
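A minimal sketch of that ordering issue, using a hypothetical HypotheticalWriter in place of InternalParquetRecordWriter and a cleanup() method that always throws as a stand-in for a failing AutoCloseables.uncheckedClose. With the flag assigned after the cleanup call, a failed close leaves closed == false and a second close() runs the cleanup again. Wrapping the cleanup in its own try/finally is shown as one possible alternative, not necessarily what the project adopted.

```java
// Sketch: where `closed = true` sits relative to a cleanup call that can throw.
// HypotheticalWriter is illustrative only; it is not InternalParquetRecordWriter.
public class CloseFlagOrderingDemo {

  static final class HypotheticalWriter {
    private final boolean nestedFinally;
    private boolean closed = false;
    int cleanupAttempts = 0;

    HypotheticalWriter(boolean nestedFinally) {
      this.nestedFinally = nestedFinally;
    }

    // Stand-in for a cleanup helper (e.g. AutoCloseables.uncheckedClose) that fails.
    private void cleanup() {
      cleanupAttempts++;
      throw new IllegalStateException("cleanup failed");
    }

    void close() {
      if (closed) {
        return;
      }
      try {
        // ... write remaining data (elided) ...
      } finally {
        if (nestedFinally) {
          try {
            cleanup();
          } finally {
            closed = true; // runs even when cleanup() throws
          }
        } else {
          cleanup();
          closed = true; // skipped when cleanup() throws
        }
      }
    }
  }

  // Calls close() twice, swallowing the cleanup failure each time,
  // and reports how many times the cleanup actually ran.
  static int cleanupRunsAfterTwoCloses(boolean nestedFinally) {
    HypotheticalWriter writer = new HypotheticalWriter(nestedFinally);
    for (int i = 0; i < 2; i++) {
      try {
        writer.close();
      } catch (IllegalStateException expected) {
        // the cleanup failure surfaces to the caller
      }
    }
    return writer.cleanupAttempts;
  }

  public static void main(String[] args) {
    System.out.println("flag after cleanup: " + cleanupRunsAfterTwoCloses(false)); // 2
    System.out.println("nested finally:     " + cleanupRunsAfterTwoCloses(true)); // 1
  }
}
```

The first variant re-runs the cleanup on every call until it succeeds (retryable, but repeated side effects); the second runs it exactly once.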
I notice that try {} finally { closed = true; } is a common pattern in the Parquet writer code base, so it's OK to follow it here.
parquet-hadoop/src/test/java/org/apache/parquet/hadoop/TestParquetFileWriter.java
@wgtmac Are there any plans for another 1.14 patch release? Because of this bug, 1.14 is unusable with some stream implementations.
@hellishfire Could you please point me to the downstream issue? We do not have a planned 1.14.2 release, but we can cut one at any time if this breaks many downstream cases. @Fokko BTW, I saw that Apache Iceberg is unable to pick up parquet-java 1.14.1. Is there anything we can do on the Parquet side for a quick release?
OK. We encountered this issue in a private code base, but I suspect this bug can break many file system implementations that don't allow flush after close, which is very reasonable behavior.
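To illustrate the failure mode, here is a sketch with a hypothetical StrictStream that throws on flush() after close(), as the file system implementations mentioned above would, and a hypothetical UnguardedWriter whose close() flushes and closes its stream with no double-close check. It only mimics the shape of the excerpt under review; it is not ParquetFileWriter.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;

// Sketch: an unguarded close() called twice on a stream that rejects
// flush() after close().
public class FlushAfterCloseDemo {

  // Hypothetical strict stream; not a real file system client.
  static final class StrictStream extends OutputStream {
    private final ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    private boolean closed = false;

    @Override
    public void write(int b) throws IOException {
      ensureOpen();
      buffer.write(b);
    }

    @Override
    public void flush() throws IOException {
      ensureOpen(); // flush after close is an error for this stream
    }

    @Override
    public void close() {
      closed = true;
    }

    private void ensureOpen() throws IOException {
      if (closed) {
        throw new IOException("stream already closed");
      }
    }
  }

  // Hypothetical writer: flushes and closes, but never records that it did.
  static final class UnguardedWriter {
    private final OutputStream out;

    UnguardedWriter(OutputStream out) {
      this.out = out;
    }

    void close() throws IOException {
      try (OutputStream temp = out) {
        temp.flush();
      }
    }
  }

  // Returns the message of the error raised by the second close(), or null if none.
  static String secondCloseError() {
    UnguardedWriter writer = new UnguardedWriter(new StrictStream());
    try {
      writer.close();
      writer.close(); // flush() on an already-closed stream
      return null;
    } catch (IOException e) {
      return e.getMessage();
    }
  }

  public static void main(String[] args) {
    System.out.println(secondCloseError()); // stream already closed
  }
}
```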
@hellishfire What about sticking to 1.13.1 for a while? We have a 1.15.0 release planned for this October, and we have not received many defect reports for 1.14.1.
It has issues with the latest Jackson release, which ships JDK 17/21-specific files that the Gradle Shadow plugin we're using does not support. We cannot upgrade that plugin because its newer versions dropped support for Java 8. So we either have to downgrade Jackson or drop Java 8 support in Iceberg.
I'm happy to run a quick release, as another PR has been raised: #2958.
Does it make sense to downgrade Jackson on our side? I know it was tied to some other fixes, so it might be difficult.
That's awesome! Let me know if I can help with anything.
I would be hesitant to downgrade just for Iceberg. If more folks run into it, then it makes sense, but Iceberg seems somewhat isolated because of Gradle.
We used the Maven Shade plugin to exclude these JDK 17/21-specific files from Jackson. P.S. I believe there are not many issue reports for 1.14.1 because most users haven't upgraded yet.
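The Shade plugin configuration itself is not shown in this thread. As a rough illustration of what such an exclusion does, the following sketch assumes the JDK-specific files follow the standard multi-release JAR layout (META-INF/versions/<N>/...) and filters entry names for versions 17 and 21. The entry names are made up, and the code sticks to Java 8 APIs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Sketch: pick out multi-release JAR entries for specific Java versions.
// Assumes the standard layout META-INF/versions/<N>/...; entry names are made up.
public class MultiReleaseFilter {

  static final String VERSIONS_PREFIX = "META-INF/versions/";

  // True if the entry lives under META-INF/versions/<N>/ for some N in dropVersions.
  static boolean isVersionSpecific(String entryName, Set<Integer> dropVersions) {
    if (!entryName.startsWith(VERSIONS_PREFIX)) {
      return false;
    }
    String rest = entryName.substring(VERSIONS_PREFIX.length());
    int slash = rest.indexOf('/');
    if (slash <= 0) {
      return false;
    }
    try {
      return dropVersions.contains(Integer.parseInt(rest.substring(0, slash)));
    } catch (NumberFormatException e) {
      return false; // not a numeric version directory
    }
  }

  // Keeps every entry except the version-specific ones.
  static List<String> keep(List<String> entries, Set<Integer> dropVersions) {
    List<String> kept = new ArrayList<>();
    for (String name : entries) {
      if (!isVersionSpecific(name, dropVersions)) {
        kept.add(name);
      }
    }
    return kept;
  }

  public static void main(String[] args) {
    List<String> entries = Arrays.asList(
        "com/example/Parser.class",
        "META-INF/versions/17/com/example/Parser.class",
        "META-INF/versions/21/com/example/Parser.class",
        "META-INF/MANIFEST.MF");
    Set<Integer> drop = new HashSet<>(Arrays.asList(17, 21));
    System.out.println(keep(entries, drop)); // [com/example/Parser.class, META-INF/MANIFEST.MF]
  }
}
```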
Thanks for fixing this @hellishfire and thanks @wgtmac, @doki23 for the review 🚀
I'd have thought an atomic boolean would have been safer here...
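A sketch of what the AtomicBoolean variant might look like, using hypothetical AtomicGuardedWriter and CountingStream classes. compareAndSet(false, true) lets exactly one caller run the close logic, even when several threads call close() at once. Like setting the flag in a finally block, it also makes a failed close non-retryable. Whether concurrent close() calls are a real concern for ParquetFileWriter is not settled in this thread.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch: an idempotent close() guarded by an AtomicBoolean.
public class AtomicCloseDemo {

  // Hypothetical stream that counts how often it is closed.
  static final class CountingStream extends ByteArrayOutputStream {
    final AtomicInteger closeCalls = new AtomicInteger();

    @Override
    public void close() {
      closeCalls.incrementAndGet();
    }
  }

  static final class AtomicGuardedWriter {
    private final AtomicBoolean closed = new AtomicBoolean(false);
    private final OutputStream out;

    AtomicGuardedWriter(OutputStream out) {
      this.out = out;
    }

    void close() throws IOException {
      // Only the first caller flips false -> true; everyone else returns.
      if (!closed.compareAndSet(false, true)) {
        return;
      }
      try (OutputStream temp = out) {
        temp.flush();
      }
    }
  }

  // Starts `threads` threads that all call close(); returns how often the stream was closed.
  static int closeCallsAfterConcurrentCloses(int threads) {
    CountingStream stream = new CountingStream();
    AtomicGuardedWriter writer = new AtomicGuardedWriter(stream);
    Thread[] workers = new Thread[threads];
    for (int i = 0; i < threads; i++) {
      workers[i] = new Thread(() -> {
        try {
          writer.close();
        } catch (IOException e) {
          throw new UncheckedIOException(e);
        }
      });
      workers[i].start();
    }
    try {
      for (Thread worker : workers) {
        worker.join();
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    }
    return stream.closeCalls.get();
  }

  public static void main(String[] args) {
    System.out.println(closeCallsAfterConcurrentCloses(8)); // 1
  }
}
```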
Rationale for this change:
Refer to #2935
What changes are included in this PR?
Prevent double close of ParquetFileWriter
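A minimal sketch of a double-close guard in the style discussed in the review: an early return on a closed flag, with closed = true set in a finally block. GuardedWriter and CountingStream are hypothetical names, and the CRC allocator handling from the reviewed excerpt is omitted.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;

// Sketch: close() runs its flush/close logic once; later calls are no-ops.
public class GuardedCloseDemo {

  // Hypothetical stream that counts flush() and close() calls.
  static final class CountingStream extends ByteArrayOutputStream {
    int flushCalls = 0;
    int closeCalls = 0;

    @Override
    public void flush() {
      flushCalls++;
    }

    @Override
    public void close() {
      closeCalls++;
    }
  }

  static final class GuardedWriter {
    private final OutputStream out;
    private boolean closed = false;

    GuardedWriter(OutputStream out) {
      this.out = out;
    }

    void close() throws IOException {
      if (closed) {
        return; // second and later calls do nothing
      }
      try (OutputStream temp = out) {
        temp.flush();
      } finally {
        closed = true; // set even if flush() or close() throws
      }
    }
  }

  // Closes a fresh writer twice; returns {flushCalls, closeCalls} seen by the stream.
  static int[] countsAfterTwoCloses() {
    CountingStream stream = new CountingStream();
    GuardedWriter writer = new GuardedWriter(stream);
    try {
      writer.close();
      writer.close();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return new int[] {stream.flushCalls, stream.closeCalls};
  }

  public static void main(String[] args) {
    int[] counts = countsAfterTwoCloses();
    System.out.println(counts[0] + " flush, " + counts[1] + " close"); // 1 flush, 1 close
  }
}
```

The trade-off raised in the review applies: because the flag is set in finally, a close() that fails is not retried on a later call.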
Are these changes tested?
Yes
Are there any user-facing changes?
No
Closes #2935