[SPARK-4187] [Core] Switch to binary protocol for external shuffle service messages #3146
Conversation
…rvice messages This PR eliminates the network package's usage of the Java serializer and replaces it with Encodable, a lightweight binary protocol. Each message is preceded by a type id, which will allow us to change messages (by only adding new ones), or to change the format entirely by switching to a special id (such as -1). This protocol has the advantage over Java serialization that we can guarantee messages will remain compatible across compiled versions and JVMs, though it does not provide a clean way to do schema migration. In the future, it may be good to use a more heavyweight serialization format like protobuf, thrift, or avro, but these all add several dependencies which are unnecessary at the present time.
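As a rough sketch of the pattern described above (a minimal `Encodable` interface plus type-id framing), assuming hypothetical names and using `java.nio.ByteBuffer` to stay dependency-free, whereas the real Spark code works on Netty ByteBufs:

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of the Encodable idea: each message knows its exact
// encoded size and how to write/read itself. Names here are illustrative,
// not the actual Spark classes.
interface Encodable {
    int encodedLength();          // exact size of the encoded body in bytes
    void encode(ByteBuffer buf);  // write the body into buf
}

// Example message carrying an application id.
class OpenBlocks implements Encodable {
    static final byte TYPE_ID = 0;  // assumed id, for illustration only
    final String appId;

    OpenBlocks(String appId) { this.appId = appId; }

    public int encodedLength() {
        // 4-byte length prefix + UTF-8 bytes of appId
        return 4 + appId.getBytes(StandardCharsets.UTF_8).length;
    }

    public void encode(ByteBuffer buf) {
        byte[] bytes = appId.getBytes(StandardCharsets.UTF_8);
        buf.putInt(bytes.length);
        buf.put(bytes);
    }

    static OpenBlocks decode(ByteBuffer buf) {
        byte[] bytes = new byte[buf.getInt()];
        buf.get(bytes);
        return new OpenBlocks(new String(bytes, StandardCharsets.UTF_8));
    }
}

public class EncodableDemo {
    // Frame a message as [type id][body]; a reader dispatches on the id,
    // so new message types can be added without breaking old readers.
    static ByteBuffer toFrame(byte typeId, Encodable msg) {
        ByteBuffer buf = ByteBuffer.allocate(1 + msg.encodedLength());
        buf.put(typeId);
        msg.encode(buf);
        buf.flip();
        return buf;
    }

    public static void main(String[] args) {
        ByteBuffer frame = toFrame(OpenBlocks.TYPE_ID, new OpenBlocks("app-1"));
        byte id = frame.get();               // read the type id back
        OpenBlocks decoded = OpenBlocks.decode(frame);
        System.out.println(id + " " + decoded.appId);
    }
}
```

The one-byte type id is what makes the "change the format entirely by switching to a special id" escape hatch possible.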
@@ -15,21 +15,24 @@
 * limitations under the License.
 */

-package org.apache.spark.network.shuffle;
+package org.apache.spark.network.shuffle.protocol;
Everything from here down is the systematic addition of encodedLength(), encode(), and decode() methods, plus updates to the unit tests, which verify that those methods are implemented correctly.
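A minimal sketch of the round-trip check those unit tests perform, under assumed message names (not the actual Spark classes): encode into a buffer sized exactly by encodedLength(), then decode and compare.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical message with a string and an int field, used to illustrate
// the encode/decode round-trip the tests verify.
class SampleMessage {
    final String appId;
    final int port;

    SampleMessage(String appId, int port) { this.appId = appId; this.port = port; }

    int encodedLength() {
        // 4-byte string length prefix + UTF-8 bytes + 4-byte int
        return 4 + appId.getBytes(StandardCharsets.UTF_8).length + 4;
    }

    void encode(ByteBuffer buf) {
        byte[] b = appId.getBytes(StandardCharsets.UTF_8);
        buf.putInt(b.length);
        buf.put(b);
        buf.putInt(port);
    }

    static SampleMessage decode(ByteBuffer buf) {
        byte[] b = new byte[buf.getInt()];
        buf.get(b);
        String appId = new String(b, StandardCharsets.UTF_8);
        return new SampleMessage(appId, buf.getInt());
    }
}

public class RoundTrip {
    public static void main(String[] args) {
        SampleMessage msg = new SampleMessage("app-42", 7337);
        ByteBuffer buf = ByteBuffer.allocate(msg.encodedLength());
        msg.encode(buf);
        // encodedLength() must match exactly what encode() wrote
        if (buf.remaining() != 0) throw new AssertionError("length mismatch");
        buf.flip();
        SampleMessage out = SampleMessage.decode(buf);
        System.out.println(out.appId + ":" + out.port);
    }
}
```

Sizing the buffer with encodedLength() and then asserting it is exactly full is what catches a mismatch between the length and encode methods.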
Test build #23036 has started for PR 3146 at commit
LGTM.
Test build #23036 has finished for PR 3146 at commit
Test FAILed.
Test build #23047 has started for PR 3146 at commit
This sort of seems like it's reinventing what Thrift or protobuf do. Also, why is it necessary to introduce another serialization-related interface just to customize the serialization? Not objecting so much as asking why you can't just override the serialization with a desired compact serialization, or use a library.
TL;DR: The goal is to keep the network package small, with minimal dependencies and minimal overhead to verify cross-version compatibility moving forward. My feeling is that protobuf and thrift are expensive dependencies to take on, and that Java serialization is harder to reason about.

The problem with using thrift or protobuf is inherently about dependencies. Protobuf dependencies are already a mess in Spark due to different, backwards-incompatible versions being used in Hadoop, Mesos, Akka, etc., and adding a real dependency in Spark would just complicate the issue. Thrift is another relatively common dependency and has a few extra dependencies of its own, but I haven't explored that route as far. Since the code here is intended to run inside other JVMs (e.g., the YARN NodeManager), we want to keep dependencies down.

Other parts of the network package use the Encodable interface because they write directly to Netty, so this API is natural there (decoding ByteBufs from an IO buffer, for instance). The choice of Encodable here, rather than implementing Externalizable/Serializable objects, comes down to two reasons: simplicity and flexibility. First, the Java serialization framework brings a lot of baggage and has some non-obvious pitfalls, and accidental misuse may go unnoticed until serial version id mismatch errors arrive. Second, it is less obvious how to explicitly handle changes in classes between versions. Since we expect the shuffle service to be long-lived, we must be able to simply and straightforwardly verify that code will work in a cross-version manner, and I feel that is harder to prove when relying on Java serialization.

Finally, the thing that makes this problem tractable, in my opinion, is that we should never be serializing complex object graphs at this level of the API. Everything should ultimately be simple, primitive values with minimal to no abstract types. We're not trying to solve serialization of general objects, just serialization of small, mostly static messages. Arrays of Strings should be the most complicated things we have to serialize.
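A sketch of what encoding a String array looks like under such a scheme, the most complex shape the discussion above expects, using an assumed layout (element count, then length-prefixed UTF-8 strings) and plain `java.nio.ByteBuffer` rather than the Netty ByteBufs the real code uses:

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical String-array codec: a 4-byte count, then each string as a
// 4-byte length prefix followed by its UTF-8 bytes.
public class StringArrayCodec {
    static int encodedLength(String[] strings) {
        int len = 4; // element count
        for (String s : strings) {
            len += 4 + s.getBytes(StandardCharsets.UTF_8).length;
        }
        return len;
    }

    static void encode(String[] strings, ByteBuffer buf) {
        buf.putInt(strings.length);
        for (String s : strings) {
            byte[] b = s.getBytes(StandardCharsets.UTF_8);
            buf.putInt(b.length);
            buf.put(b);
        }
    }

    static String[] decode(ByteBuffer buf) {
        String[] out = new String[buf.getInt()];
        for (int i = 0; i < out.length; i++) {
            byte[] b = new byte[buf.getInt()];
            buf.get(b);
            out[i] = new String(b, StandardCharsets.UTF_8);
        }
        return out;
    }

    public static void main(String[] args) {
        String[] blockIds = {"shuffle_0_0_0", "shuffle_0_1_0"};
        ByteBuffer buf = ByteBuffer.allocate(encodedLength(blockIds));
        encode(blockIds, buf);
        buf.flip();
        System.out.println(Arrays.equals(blockIds, decode(buf)));
    }
}
```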
Thanks! Good to hear the reasoning. It is indeed lightweight, and the use case is not quite the same as the usual general serialization use cases.
@srowen I was initially actually for protobuf or avro, but looking at the dependency list, it'd be really hard to guarantee compatibility in the future. Given that the number of messages we are actually serializing is very small, the work to do a custom serialization protocol is very contained.
Test build #23047 has finished for PR 3146 at commit
Test PASSed.
Test build #23051 has started for PR 3146 at commit
Test build #23051 has finished for PR 3146 at commit
Test PASSed.
Merging in master & branch-1.2. Thanks.
Additionally this unifies the RPC messages of NettyBlockTransferService and ExternalShuffleClient.

Author: Aaron Davidson <aaron@databricks.com>

Closes #3146 from aarondav/free and squashes the following commits:
ed1102a [Aaron Davidson] Remove some unused imports
b8e2a49 [Aaron Davidson] Add appId to test
538f2a3 [Aaron Davidson] [SPARK-4187] [Core] Switch to binary protocol for external shuffle service messages

(cherry picked from commit d4fa04e)
Signed-off-by: Reynold Xin <rxin@databricks.com>