Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-49795][CORE][SQL][SS][DSTREAM][ML][MLLIB][K8S][YARN][EXAMPLES] Clean up deprecated Guava API usage #48248

Closed
wants to merge 10 commits into from

Conversation

LuciferYang
Copy link
Contributor

@LuciferYang LuciferYang commented Sep 25, 2024

What changes were proposed in this pull request?

In order to clean up the usage of deprecated Guava API, the following changes were made in this pr:

  1. Replaced Files.write(from, to, charset) with Files.asCharSink(to, charset).write(from). This change was made with reference to:

https://github.com/google/guava/blob/0c33dd12b193402cdf6962d43d69743521aa2f76/guava/src/com/google/common/io/Files.java#L275-L291

/**
   * Writes a character sequence (such as a string) to a file using the given character set.
   *
   * @param from the character sequence to write
   * @param to the destination file
   * @param charset the charset used to encode the output stream; see {@link StandardCharsets} for
   *     helpful predefined constants
   * @throws IOException if an I/O error occurs
   * @deprecated Prefer {@code asCharSink(to, charset).write(from)}.
   */
  @Deprecated
  @InlineMe(
      replacement = "Files.asCharSink(to, charset).write(from)",
      imports = "com.google.common.io.Files")
  public static void write(CharSequence from, File to, Charset charset) throws IOException {
    asCharSink(to, charset).write(from);
  }
  1. Replaced Files.append(from, to, charset) with Files.asCharSink(to, charset, FileWriteMode.APPEND).write(from). This change was made with reference to:

https://github.com/google/guava/blob/0c33dd12b193402cdf6962d43d69743521aa2f76/guava/src/com/google/common/io/Files.java#L350-L368

/**
   * Appends a character sequence (such as a string) to a file using the given character set.
   *
   * @param from the character sequence to append
   * @param to the destination file
   * @param charset the charset used to encode the output stream; see {@link StandardCharsets} for
   *     helpful predefined constants
   * @throws IOException if an I/O error occurs
   * @deprecated Prefer {@code asCharSink(to, charset, FileWriteMode.APPEND).write(from)}. This
   *     method is scheduled to be removed in October 2019.
   */
  @Deprecated
  @InlineMe(
      replacement = "Files.asCharSink(to, charset, FileWriteMode.APPEND).write(from)",
      imports = {"com.google.common.io.FileWriteMode", "com.google.common.io.Files"})
  public
  static void append(CharSequence from, File to, Charset charset) throws IOException {
    asCharSink(to, charset, FileWriteMode.APPEND).write(from);
  }
  1. Replaced Files.toString(file, charset) with Files.asCharSource(file, charset).read(). This change was made with reference to:

https://github.com/google/guava/blob/0c33dd12b193402cdf6962d43d69743521aa2f76/guava/src/com/google/common/io/Files.java#L243-L259

  /**
   * Reads all characters from a file into a {@link String}, using the given character set.
   *
   * @param file the file to read from
   * @param charset the charset used to decode the input stream; see {@link StandardCharsets} for
   *     helpful predefined constants
   * @return a string containing all the characters from the file
   * @throws IOException if an I/O error occurs
   * @deprecated Prefer {@code asCharSource(file, charset).read()}.
   */
  @Deprecated
  @InlineMe(
      replacement = "Files.asCharSource(file, charset).read()",
      imports = "com.google.common.io.Files")
  public static String toString(File file, Charset charset) throws IOException {
    return asCharSource(file, charset).read();
  }
  1. Replaced HashFunction.murmur3_32() with HashFunction.murmur3_32_fixed(). This change was made with reference to:

https://github.com/google/guava/blob/0c33dd12b193402cdf6962d43d69743521aa2f76/guava/src/com/google/common/hash/Hashing.java#L99-L115

 /**
   * Returns a hash function implementing the <a
   * href="https://github.com/aappleby/smhasher/blob/master/src/MurmurHash3.cpp">32-bit murmur3
   * algorithm, x86 variant</a> (little-endian variant), using the given seed value, <b>with a known
   * bug</b> as described in the deprecation text.
   *
   * <p>The C++ equivalent is the MurmurHash3_x86_32 function (Murmur3A), which however does not
   * have the bug.
   *
   * @deprecated This implementation produces incorrect hash values from the {@link
   *     HashFunction#hashString} method if the string contains non-BMP characters. Use {@link
   *     #murmur3_32_fixed()} instead.
   */
  @Deprecated
  public static HashFunction murmur3_32() {
    return Murmur3_32HashFunction.MURMUR3_32;
  }

This change is safe for Spark. The difference between MURMUR3_32 and MURMUR3_32_FIXED lies in the different supplementaryPlaneFix parameters passed when constructing the Murmur3_32HashFunction:

https://github.com/google/guava/blob/0c33dd12b193402cdf6962d43d69743521aa2f76/guava/src/com/google/common/hash/Murmur3_32HashFunction.java#L56-L59

  static final HashFunction MURMUR3_32 =
      new Murmur3_32HashFunction(0, /* supplementaryPlaneFix= */ false);
  static final HashFunction MURMUR3_32_FIXED =
      new Murmur3_32HashFunction(0, /* supplementaryPlaneFix= */ true);

However, the supplementaryPlaneFix parameter is only used in Murmur3_32HashFunction#hashString, and Spark only utilizes Murmur3_32HashFunction#hashInt. Therefore, there will be no logical changes to this method after this change.

https://github.com/google/guava/blob/0c33dd12b193402cdf6962d43d69743521aa2f76/guava/src/com/google/common/hash/Murmur3_32HashFunction.java#L108-L114

  @Override
  public HashCode hashInt(int input) {
    int k1 = mixK1(input);
    int h1 = mixH1(seed, k1);

    return fmix(h1, Ints.BYTES);
  }
  1. Replaced Throwables.propagateIfPossible(throwable, declaredType) with Throwables.throwIfInstanceOf(throwable, declaredType) + Throwables.throwIfUnchecked(throwable). This change was made with reference to:

https://github.com/google/guava/blob/0c33dd12b193402cdf6962d43d69743521aa2f76/guava/src/com/google/common/base/Throwables.java#L156-L175

/**
   * Propagates {@code throwable} exactly as-is, if and only if it is an instance of {@link
   * RuntimeException}, {@link Error}, or {@code declaredType}.
   *
   * <p><b>Discouraged</b> in favor of calling {@link #throwIfInstanceOf} and {@link
   * #throwIfUnchecked}.
   *
   * @param throwable the Throwable to possibly propagate
   * @param declaredType the single checked exception type declared by the calling method
   * @deprecated Use a combination of {@link #throwIfInstanceOf} and {@link #throwIfUnchecked},
   *     which togther provide the same behavior except that they reject {@code null}.
   */
  @Deprecated
  @J2ktIncompatible
  @GwtIncompatible // propagateIfInstanceOf
  public static <X extends Throwable> void propagateIfPossible(
      @CheckForNull Throwable throwable, Class<X> declaredType) throws X {
    propagateIfInstanceOf(throwable, declaredType);
    propagateIfPossible(throwable);
  }
  1. Made modifications to Throwables.propagate with reference to https://github.com/google/guava/wiki/Why-we-deprecated-Throwables.propagate
  • For cases where it is known to be a checked exception, including IOException, GeneralSecurityException, SaslException, and RocksDBException, none of which are subclasses of RuntimeException or Error, directly replaced Throwables.propagate(e) with throw new RuntimeException(e);.

  • For cases where it cannot be determined whether it is a checked exception or an unchecked exception or Error, use

throwIfUnchecked(e);
throw new RuntimeException(e);

to replace Throwables.propagate(e)

https://github.com/google/guava/blob/0c33dd12b193402cdf6962d43d69743521aa2f76/guava/src/com/google/common/base/Throwables.java#L199-L235

  /**
   * ...
   * @deprecated To preserve behavior, use {@code throw e} or {@code throw new RuntimeException(e)}
   *     directly, or use a combination of {@link #throwIfUnchecked} and {@code throw new
   *     RuntimeException(e)}. But consider whether users would be better off if your API threw a
   *     different type of exception. For background on the deprecation, read <a
   *     href="https://goo.gl/Ivn2kc">Why we deprecated {@code Throwables.propagate}</a>.
   */
  @CanIgnoreReturnValue
  @J2ktIncompatible
  @GwtIncompatible
  @Deprecated
  public static RuntimeException propagate(Throwable throwable) {
    throwIfUnchecked(throwable);
    throw new RuntimeException(throwable);
  }

Why are the changes needed?

Clean up deprecated Guava API usage.

Does this PR introduce any user-facing change?

No

How was this patch tested?

Pass GitHub Actions

Was this patch authored or co-authored using generative AI tooling?

No

@LuciferYang LuciferYang marked this pull request as draft September 25, 2024 16:27
@github-actions github-actions bot added the CORE label Sep 25, 2024
@LuciferYang
Copy link
Contributor Author

still some work left undone, but it doesn't require attention at this time

@LuciferYang LuciferYang changed the title [WIP] Clean up deprecated Guava API usage [WIP][CORE] Clean up deprecated Guava API usage Sep 25, 2024
@@ -266,7 +266,7 @@ class OpenHashSet[@specialized(Long, Int, Double, Float) T: ClassTag](
/**
* Re-hash a value to deal better with hash functions that don't differ in the lower bits.
*/
private def hashcode(h: Int): Int = Hashing.murmur3_32().hashInt(h).asInt()
private def hashcode(h: Int): Int = Hashing.murmur3_32_fixed().hashInt(h).asInt()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIRC, this is a different implementation that fixes the original one's bug.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://github.com/google/guava/blob/3c7c173e9c6ac93f154bfe40876f0c792d849f6e/guava/src/com/google/common/hash/Hashing.java#L117-L133

  /**
   * Returns a hash function implementing the <a
   * href="https://github.com/aappleby/smhasher/blob/master/src/MurmurHash3.cpp">32-bit murmur3
   * algorithm, x86 variant</a> (little-endian variant), using the given seed value, <b>with a known
   * bug</b> as described in the deprecation text.
   *
   * <p>The C++ equivalent is the MurmurHash3_x86_32 function (Murmur3A), which however does not
   * have the bug.
   *
   * @deprecated This implementation produces incorrect hash values from the {@link
   *     HashFunction#hashString} method if the string contains non-BMP characters. Use {@link
   *     #murmur3_32_fixed()} instead.
   */
  @Deprecated
  public static HashFunction murmur3_32() {
    return Murmur3_32HashFunction.MURMUR3_32;
  }

Yes, but this is the official fix provided, and there seems to be no other equivalent alternative.

@pan3793 any better suggestions?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pan3793

I think this change is safe for Spark. The difference between MURMUR3_32 and MURMUR3_32_FIXED lies in the different supplementaryPlaneFix parameters passed when constructing the Murmur3_32HashFunction:

https://github.com/google/guava/blob/0c33dd12b193402cdf6962d43d69743521aa2f76/guava/src/com/google/common/hash/Murmur3_32HashFunction.java#L56-L59

  static final HashFunction MURMUR3_32 =
      new Murmur3_32HashFunction(0, /* supplementaryPlaneFix= */ false);
  static final HashFunction MURMUR3_32_FIXED =
      new Murmur3_32HashFunction(0, /* supplementaryPlaneFix= */ true);

However, the supplementaryPlaneFix parameter is only used in Murmur3_32HashFunction#hashString, and Spark only utilizes Murmur3_32HashFunction#hashInt. Therefore, there will be no logical changes to this method after this change.

https://github.com/google/guava/blob/0c33dd12b193402cdf6962d43d69743521aa2f76/guava/src/com/google/common/hash/Murmur3_32HashFunction.java#L108-L114

  @Override
  public HashCode hashInt(int input) {
    int k1 = mixK1(input);
    int h1 = mixH1(seed, k1);

    return fmix(h1, Ints.BYTES);
  }

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for checking, it is not a problem then.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @pan3793

@LuciferYang LuciferYang changed the title [WIP][CORE] Clean up deprecated Guava API usage [WIP][CORE][SQL][SS][DSTREAM][ML][MLLIB][K8S][YARN][EXAMPLES] Clean up deprecated Guava API usage Sep 26, 2024
@LuciferYang LuciferYang changed the title [WIP][CORE][SQL][SS][DSTREAM][ML][MLLIB][K8S][YARN][EXAMPLES] Clean up deprecated Guava API usage [SPARK-49795][CORE][SQL][SS][DSTREAM][ML][MLLIB][K8S][YARN][EXAMPLES] Clean up deprecated Guava API usage Sep 26, 2024
@@ -290,9 +290,11 @@ public void onFailure(Throwable e) {
try {
return result.get(timeoutMs, TimeUnit.MILLISECONDS);
} catch (ExecutionException e) {
throw Throwables.propagate(e.getCause());
Throwables.throwIfUnchecked(e.getCause());
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is unable to confirm what type of Throwable is wrapped in ExecutionException, so still use

Throwables.throwIfUnchecked(e.getCause());
 throw new RuntimeException(e.getCause());

@LuciferYang LuciferYang marked this pull request as ready for review September 27, 2024 06:11
@LuciferYang
Copy link
Contributor Author

still some work left undone, but it doesn't require attention at this time

all done

Copy link
Member

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, LGTM.

Ping @pan3793 once more since he had a comment before.

@dongjoon-hyun
Copy link
Member

Also, cc @panbingkun since he is working on this area.

@dongjoon-hyun
Copy link
Member

Thank you, @LuciferYang and @pan3793 .
Merged to master for Apache Spark 4.0.0.

@LuciferYang
Copy link
Contributor Author

Thanks @dongjoon-hyun and @pan3793

@panbingkun
Copy link
Contributor

Also, cc @panbingkun since he is working on this area.

+1, LGTM.
Thanks @dongjoon-hyun

@LuciferYang
Copy link
Contributor Author

Thanks @panbingkun ~

attilapiros pushed a commit to attilapiros/spark that referenced this pull request Oct 4, 2024
… Clean up deprecated Guava API usage

### What changes were proposed in this pull request?
In order to clean up the usage of deprecated Guava API, the following changes were made in this pr:

1. Replaced `Files.write(from, to, charset)` with `Files.asCharSink(to, charset).write(from)`. This change was made with reference to:

https://github.com/google/guava/blob/0c33dd12b193402cdf6962d43d69743521aa2f76/guava/src/com/google/common/io/Files.java#L275-L291

```java
/**
   * Writes a character sequence (such as a string) to a file using the given character set.
   *
   * param from the character sequence to write
   * param to the destination file
   * param charset the charset used to encode the output stream; see {link StandardCharsets} for
   *     helpful predefined constants
   * throws IOException if an I/O error occurs
   * deprecated Prefer {code asCharSink(to, charset).write(from)}.
   */
  Deprecated
  InlineMe(
      replacement = "Files.asCharSink(to, charset).write(from)",
      imports = "com.google.common.io.Files")
  public static void write(CharSequence from, File to, Charset charset) throws IOException {
    asCharSink(to, charset).write(from);
  }
```

2. Replaced `Files.append(from, to, charset)` with `Files.asCharSink(to, charset, FileWriteMode.APPEND).write(from)`. This change was made with reference to:

https://github.com/google/guava/blob/0c33dd12b193402cdf6962d43d69743521aa2f76/guava/src/com/google/common/io/Files.java#L350-L368

```java
/**
   * Appends a character sequence (such as a string) to a file using the given character set.
   *
   * param from the character sequence to append
   * param to the destination file
   * param charset the charset used to encode the output stream; see {link StandardCharsets} for
   *     helpful predefined constants
   * throws IOException if an I/O error occurs
   * deprecated Prefer {code asCharSink(to, charset, FileWriteMode.APPEND).write(from)}. This
   *     method is scheduled to be removed in October 2019.
   */
  Deprecated
  InlineMe(
      replacement = "Files.asCharSink(to, charset, FileWriteMode.APPEND).write(from)",
      imports = {"com.google.common.io.FileWriteMode", "com.google.common.io.Files"})
  public
  static void append(CharSequence from, File to, Charset charset) throws IOException {
    asCharSink(to, charset, FileWriteMode.APPEND).write(from);
  }
```

3. Replaced `Files.toString(file, charset)` with `Files.asCharSource(file, charset).read()`. This change was made with reference to:

https://github.com/google/guava/blob/0c33dd12b193402cdf6962d43d69743521aa2f76/guava/src/com/google/common/io/Files.java#L243-L259

```java
  /**
   * Reads all characters from a file into a {link String}, using the given character set.
   *
   * param file the file to read from
   * param charset the charset used to decode the input stream; see {link StandardCharsets} for
   *     helpful predefined constants
   * return a string containing all the characters from the file
   * throws IOException if an I/O error occurs
   * deprecated Prefer {code asCharSource(file, charset).read()}.
   */
  Deprecated
  InlineMe(
      replacement = "Files.asCharSource(file, charset).read()",
      imports = "com.google.common.io.Files")
  public static String toString(File file, Charset charset) throws IOException {
    return asCharSource(file, charset).read();
  }
```

4. Replaced `HashFunction.murmur3_32()` with `HashFunction.murmur3_32_fixed()`. This change was made with reference to:

https://github.com/google/guava/blob/0c33dd12b193402cdf6962d43d69743521aa2f76/guava/src/com/google/common/hash/Hashing.java#L99-L115

```java
 /**
   * Returns a hash function implementing the <a
   * href="https://github.com/aappleby/smhasher/blob/master/src/MurmurHash3.cpp">32-bit murmur3
   * algorithm, x86 variant</a> (little-endian variant), using the given seed value, <b>with a known
   * bug</b> as described in the deprecation text.
   *
   * <p>The C++ equivalent is the MurmurHash3_x86_32 function (Murmur3A), which however does not
   * have the bug.
   *
   * deprecated This implementation produces incorrect hash values from the {link
   *     HashFunction#hashString} method if the string contains non-BMP characters. Use {link
   *     #murmur3_32_fixed()} instead.
   */
  Deprecated
  public static HashFunction murmur3_32() {
    return Murmur3_32HashFunction.MURMUR3_32;
  }
```

This change is safe for Spark. The difference between `MURMUR3_32` and `MURMUR3_32_FIXED` lies in the different `supplementaryPlaneFix` parameters passed when constructing the `Murmur3_32HashFunction`:

https://github.com/google/guava/blob/0c33dd12b193402cdf6962d43d69743521aa2f76/guava/src/com/google/common/hash/Murmur3_32HashFunction.java#L56-L59

```java
  static final HashFunction MURMUR3_32 =
      new Murmur3_32HashFunction(0, /* supplementaryPlaneFix= */ false);
  static final HashFunction MURMUR3_32_FIXED =
      new Murmur3_32HashFunction(0, /* supplementaryPlaneFix= */ true);
```

However, the `supplementaryPlaneFix` parameter is only used in `Murmur3_32HashFunction#hashString`, and Spark only utilizes `Murmur3_32HashFunction#hashInt`. Therefore, there will be no logical changes to this method after this change.

https://github.com/google/guava/blob/0c33dd12b193402cdf6962d43d69743521aa2f76/guava/src/com/google/common/hash/Murmur3_32HashFunction.java#L108-L114

```java
  Override
  public HashCode hashInt(int input) {
    int k1 = mixK1(input);
    int h1 = mixH1(seed, k1);

    return fmix(h1, Ints.BYTES);
  }
```

5. Replaced `Throwables.propagateIfPossible(throwable, declaredType)` with `Throwables.throwIfInstanceOf(throwable, declaredType)` + `Throwables.throwIfUnchecked(throwable)`. This change was made with reference to:

https://github.com/google/guava/blob/0c33dd12b193402cdf6962d43d69743521aa2f76/guava/src/com/google/common/base/Throwables.java#L156-L175

```
/**
   * Propagates {code throwable} exactly as-is, if and only if it is an instance of {link
   * RuntimeException}, {link Error}, or {code declaredType}.
   *
   * <p><b>Discouraged</b> in favor of calling {link #throwIfInstanceOf} and {link
   * #throwIfUnchecked}.
   *
   * param throwable the Throwable to possibly propagate
   * param declaredType the single checked exception type declared by the calling method
   * deprecated Use a combination of {link #throwIfInstanceOf} and {link #throwIfUnchecked},
   *     which togther provide the same behavior except that they reject {code null}.
   */
  Deprecated
  J2ktIncompatible
  GwtIncompatible // propagateIfInstanceOf
  public static <X extends Throwable> void propagateIfPossible(
      CheckForNull Throwable throwable, Class<X> declaredType) throws X {
    propagateIfInstanceOf(throwable, declaredType);
    propagateIfPossible(throwable);
  }
```

6. Made modifications to `Throwables.propagate` with reference to https://github.com/google/guava/wiki/Why-we-deprecated-Throwables.propagate

- For cases where it is known to be a checked exception, including `IOException`, `GeneralSecurityException`, `SaslException`, and `RocksDBException`, none of which are subclasses of `RuntimeException` or `Error`, directly replaced `Throwables.propagate(e)` with `throw new RuntimeException(e);`.

- For cases where it cannot be determined whether it is a checked exception or an unchecked exception or Error, use

```java
throwIfUnchecked(e);
throw new RuntimeException(e);
```

 to replace `Throwables.propagate(e)`。

https://github.com/google/guava/blob/0c33dd12b193402cdf6962d43d69743521aa2f76/guava/src/com/google/common/base/Throwables.java#L199-L235

```java
  /**
   * ...
   * deprecated To preserve behavior, use {code throw e} or {code throw new RuntimeException(e)}
   *     directly, or use a combination of {link #throwIfUnchecked} and {code throw new
   *     RuntimeException(e)}. But consider whether users would be better off if your API threw a
   *     different type of exception. For background on the deprecation, read <a
   *     href="https://goo.gl/Ivn2kc">Why we deprecated {code Throwables.propagate}</a>.
   */
  CanIgnoreReturnValue
  J2ktIncompatible
  GwtIncompatible
  Deprecated
  public static RuntimeException propagate(Throwable throwable) {
    throwIfUnchecked(throwable);
    throw new RuntimeException(throwable);
  }
```

### Why are the changes needed?
Clean up deprecated Guava API usage.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Pass GitHub Actions

### Was this patch authored or co-authored using generative AI tooling?
No

Closes apache#48248 from LuciferYang/guava-deprecation.

Authored-by: yangjie01 <yangjie01@baidu.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants