Double close of ParquetFileWriter in ParquetWriter #2935

Closed
hellishfire opened this issue Jun 27, 2024 · 0 comments · Fixed by #2951

hellishfire commented Jun 27, 2024

ParquetWriter.close() invokes InternalParquetRecordWriter.close(), which has the following logic:

  public void close() throws IOException, InterruptedException {
    if (!closed) {
      try {
        if (aborted) {
          return;
        }
        flushRowGroupToStore();
        FinalizedWriteContext finalWriteContext = writeSupport.finalizeWrite();
        Map<String, String> finalMetadata = new HashMap<String, String>(extraMetaData);
        String modelName = writeSupport.getName();
        if (modelName != null) {
          finalMetadata.put(ParquetWriter.OBJECT_MODEL_NAME_PROP, modelName);
        }
        finalMetadata.putAll(finalWriteContext.getExtraMetaData());
        parquetFileWriter.end(finalMetadata);
      } finally {
        AutoCloseables.uncheckedClose(columnStore, pageStore, bloomFilterWriteStore, parquetFileWriter);
        closed = true;
      }
    }
  }

parquetFileWriter is closed twice here: first by
parquetFileWriter.end(finalMetadata), which eventually calls parquetFileWriter.close(),

and a second time by
AutoCloseables.uncheckedClose(columnStore, pageStore, bloomFilterWriteStore, parquetFileWriter);

As a result, the underlying PositionOutputStream in ParquetFileWriter is flushed again after it has been closed, which may raise an exception depending on the underlying stream implementation. ParquetFileWriter.close() is:

  public void close() throws IOException {
    try (PositionOutputStream temp = out) {
      temp.flush();
      if (crcAllocator != null) {
        crcAllocator.close();
      }
    }
  }

Sample exception:

Caused by: org.apache.parquet.util.AutoCloseables$ParquetCloseResourceException: Unable to close resource
at org.apache.parquet.util.AutoCloseables.uncheckedClose(AutoCloseables.java:85)
at org.apache.parquet.util.AutoCloseables.uncheckedClose(AutoCloseables.java:94)
at org.apache.parquet.hadoop.InternalParquetRecordWriter.close(InternalParquetRecordWriter.java:144)
at org.apache.parquet.hadoop.ParquetWriter.close(ParquetWriter.java:437)
... 70 more
Caused by: java.io.IOException: stream is already closed
(-------- specific stream implementation ----------------)
at java.io.FilterOutputStream.flush(FilterOutputStream.java:140)
at java.io.DataOutputStream.flush(DataOutputStream.java:123)
at org.apache.parquet.hadoop.util.HadoopPositionOutputStream.flush(HadoopPositionOutputStream.java:59)
at org.apache.parquet.hadoop.ParquetFileWriter.close(ParquetFileWriter.java:1659)
at org.apache.parquet.util.AutoCloseables.close(AutoCloseables.java:49)
at org.apache.parquet.util.AutoCloseables.uncheckedClose(AutoCloseables.java:83)
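
The failure mode can be reproduced outside Parquet with a minimal sketch. Everything here is hypothetical and for illustration only: StrictStream stands in for a stream implementation that rejects flush() after close(), UnguardedWriter mirrors the flush-then-close shape of the close() method quoted above, and GuardedWriter shows one possible way to make close() idempotent with a flag. It is not necessarily how #2951 fixes the problem.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;

class DoubleCloseSketch {

  /** Hypothetical stream that rejects write() and flush() once it has been closed. */
  static class StrictStream extends OutputStream {
    private final ByteArrayOutputStream delegate = new ByteArrayOutputStream();
    private boolean closed;

    @Override
    public void write(int b) throws IOException {
      ensureOpen();
      delegate.write(b);
    }

    @Override
    public void flush() throws IOException {
      ensureOpen();
    }

    @Override
    public void close() {
      closed = true;
    }

    private void ensureOpen() throws IOException {
      if (closed) {
        throw new IOException("stream is already closed");
      }
    }
  }

  /** Flushes and then closes the stream on every close() call, with no guard. */
  static class UnguardedWriter implements AutoCloseable {
    private final OutputStream out;

    UnguardedWriter(OutputStream out) {
      this.out = out;
    }

    @Override
    public void close() throws IOException {
      try (OutputStream temp = out) {
        temp.flush();
      }
    }
  }

  /** Same writer, but a flag turns every close() after the first into a no-op. */
  static class GuardedWriter implements AutoCloseable {
    private final OutputStream out;
    private boolean closed;

    GuardedWriter(OutputStream out) {
      this.out = out;
    }

    @Override
    public void close() throws IOException {
      if (closed) {
        return;
      }
      closed = true;
      try (OutputStream temp = out) {
        temp.flush();
      }
    }
  }

  /** Closes the resource twice and reports whether the second close threw an IOException. */
  static boolean secondCloseThrows(AutoCloseable resource) throws Exception {
    resource.close();
    try {
      resource.close();
      return false;
    } catch (IOException e) {
      return true;
    }
  }

  public static void main(String[] args) throws Exception {
    System.out.println("unguarded: " + secondCloseThrows(new UnguardedWriter(new StrictStream())));
    System.out.println("guarded:   " + secondCloseThrows(new GuardedWriter(new StrictStream())));
  }
}
```

An alternative with the same effect would be for the caller to leave the writer out of the final cleanup list once end() has already closed it. The flag approach is shown only because it keeps the sketch short.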

This issue has been observed since 1.14.0, and I suspect PARQUET-2496 is caused by the same problem.

@hellishfire hellishfire changed the title Double close of parquetFileWriter in ParquetWriter Double close of ParquetFileWriter in ParquetWriter Jun 27, 2024
devOpsHazelcast pushed a commit to hazelcast/hazelcast that referenced this issue Jul 10, 2024
…[5.3.z] (#2561)

Fixes #22541
Fixes #24981
Fixes #26354

Closes https://hazelcast.atlassian.net/browse/REL-279

Backports https://github.com/hazelcast/hazelcast-mono/pull/2467

Notes:
1. apache/parquet-java@274dc51b has broken `ParquetWriter#close()`. See also: https://issues.apache.org/jira/browse/PARQUET-2496 and apache/parquet-java#2935.
2. `hadoop2` classifier has been removed from `avro-mapred`. See also: https://github.com/hazelcast/hazelcast-mono/pull/834.
3. Upgrades `software.amazon.awssdk` from 2.20.95 to 2.24.13.
4. Upgrades `maven-shade-plugin` to 3.6.0 because `org.apache.parquet:parquet-jackson:1.14.1` has classes compiled with Java 21.
5. Allows `MIT-0` license, which is used by `org.reactivestreams:reactive-streams:1.0.4`. See also: #25325.
6. Adds `jar-with-dependencies` classifier to `hazelcast-jet-kafka` and `hazelcast-jet-mongodb` in enterprise-sql-it/pom.xml because `animal-sniffer-maven-plugin` cannot find some transitive dependencies (`kafka-clients` and `mongodb-driver-sync`) in `mvn verify` (it can find them in `mvn install`). See also: https://hazelcast.slack.com/archives/C07066ELRRD/p1720539966962809.
GitOrigin-RevId: e838e0abe0123ef6580d31ada5e675dab1526c20
devOpsHazelcast pushed a commit to hazelcast/hazelcast that referenced this issue Jul 10, 2024
…#2571)

Fixes #22541
Fixes #24981
Fixes #26354

Closes https://hazelcast.atlassian.net/browse/REL-257

Forwardports https://github.com/hazelcast/hazelcast-mono/pull/2467

Notes:
1. apache/parquet-java@274dc51b has broken `ParquetWriter#close()`. See also: https://issues.apache.org/jira/browse/PARQUET-2496 and apache/parquet-java#2935.
2. Adds `jdk8` classifier to `jline` because it contains classes compiled with Java 22, which breaks the build. See also: jline/jline3#937 (comment).
GitOrigin-RevId: 519e71667822b3fd7d2c7cf654f261a8a238d583
Fokko pushed a commit that referenced this issue Jul 22, 2024
* GH-2935: Avoid double close of ParquetFileWriter

* fix comment

---------

Co-authored-by: youming.whl <youming.whl@antfin.com>