This repository has been archived by the owner on Sep 18, 2023. It is now read-only.

[NSE-800] Pack the classes into one single jar #1002

Merged
merged 8 commits into from Jun 30, 2022
Changes from all commits
4 changes: 4 additions & 0 deletions arrow-data-source/common/pom.xml
@@ -40,6 +40,10 @@
<groupId>com.fasterxml.jackson.core</groupId>
<artifactId>jackson-annotations</artifactId>
</exclusion>
<exclusion>
<groupId>com.fasterxml.jackson.core</groupId>
<artifactId>jackson-databind</artifactId>
</exclusion>
</exclusions>
<scope>compile</scope>
</dependency>
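A quick way to confirm that these Jackson exclusions take effect is to inspect the module's resolved dependency tree. This is a minimal sketch, assuming it is run from the arrow-data-source/common module directory:

# List any Jackson core artifacts still resolved for this module; an empty result
# (or only test-scoped entries) suggests the exclusions are working as intended.
mvn dependency:tree -Dincludes=com.fasterxml.jackson.core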
1 change: 1 addition & 0 deletions arrow-data-source/standard/pom.xml
@@ -24,6 +24,7 @@
<version>${project.version}</version>
<scope>provided</scope>
</dependency>

</dependencies>

<build>
25 changes: 18 additions & 7 deletions docs/User-Guide.md
@@ -50,6 +50,8 @@ There are three ways to use OAP: Gazelle Plugin,

### Use precompiled jars

#### Before the 1.4.0 release

Please go to [OAP's Maven Central Repository](https://repo1.maven.org/maven2/com/intel/oap/) to find Gazelle Plugin jars.
For usage, you will require the jar files below:
1. `spark-arrow-datasource-standard-<version>-jar-with-dependencies.jar` is located in com/intel/oap/spark-arrow-datasource-standard/<version>/
@@ -62,11 +64,20 @@ And for spark 3.2.x, the jar whose `<spark-version>` is `spark321` should be used.

4. `spark-sql-columnar-shims-<spark-version>-<version>-SNAPSHOT.jar`

Please notice the files are fat jars shipped with our custom Arrow library and pre-compiled from our server(using GCC 9.3.0 and LLVM 7.0.1), which means you will require to pre-install GCC 9.3.0 and LLVM 7.0.1 in your system for normal usage.
#### Since the 1.4.0 release

Since the 1.4.0 release, we have consolidated the 4 jars into one single jar, and the supported Spark version is contained in the jar name. Users can pick the jar that matches their Spark version.

`gazelle-plugin-1.4.0-spark-3.1.1.jar`

`gazelle-plugin-1.4.0-spark-3.2.1.jar`


Please note the files are fat jars shipped with our custom Arrow library and pre-compiled on our server (using GCC 9.3.0 and LLVM 7.0.1), which means you will need to pre-install GCC 9.3.0 and LLVM 7.0.1 on your system for normal usage.
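As an illustration, here is a minimal sketch of fetching the consolidated jar from Maven Central; the exact artifact directory and file name are assumptions, so verify them in the repository listing before downloading:

# Hypothetical path under OAP's Maven Central repository (com/intel/oap/); adjust the
# artifact directory, version and Spark suffix to match what is actually published.
wget https://repo1.maven.org/maven2/com/intel/oap/gazelle-plugin/1.4.0/gazelle-plugin-1.4.0-spark-3.1.1.jar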

### Building by Conda

If you already have a working Hadoop Spark Cluster, we provide a Conda package which will automatically install dependencies needed by OAP, you can refer to [OAP-Installation-Guide](./OAP-Installation-Guide.md) for more information. Once finished [OAP-Installation-Guide](./OAP-Installation-Guide.md), you can find built `spark-columnar-core-<version>-jar-with-dependencies.jar` under `$HOME/miniconda2/envs/oapenv/oap_jars`.
If you already have a working Hadoop Spark Cluster, we provide a Conda package which will automatically install the dependencies needed by OAP; you can refer to the [OAP-Installation-Guide](./OAP-Installation-Guide.md) for more information. Once you have finished the [OAP-Installation-Guide](./OAP-Installation-Guide.md), you can find the built `gazelle-plugin-<version>-<spark-version>.jar` under `$HOME/miniconda2/envs/oapenv/oap_jars`.
Then you can just skip below steps and jump to [Get Started](#get-started).

### Building by yourself
@@ -86,7 +97,9 @@ Please check the document [Installation Guide](./Installation.md)
## Get started

To enable Gazelle Plugin, the previously built jars `spark-columnar-core-<version>-jar-with-dependencies.jar` and `spark-arrow-datasource-standard-<version>-jar-with-dependencies.jar` should be added to the Spark configuration. As of the 1.3.1 release, a new shim layer was introduced to work with Spark minor releases. The shim layer also has two jars: `spark-sql-columnar-shims-common-<version>-SNAPSHOT.jar` and `spark-sql-columnar-shims-<spark-version>-<version>-SNAPSHOT.jar`.
We will demonstrate an example with Spark 3.2.1 by here.
After the 1.4.0 release, only one single jar is required: `gazelle-plugin-1.4.0-spark-3.1.1.jar` or `gazelle-plugin-1.4.0-spark-3.2.1.jar`.

We will demonstrate how to deploy Gazelle (since the 1.4.0 release) on Spark 3.2.1.
SPARK related options are:

* `spark.driver.extraClassPath` : Set to load jar file to driver.
@@ -107,10 +120,8 @@ ${SPARK_HOME}/bin/spark-shell \
--master yarn \
--driver-memory 10G \
--conf spark.plugins=com.intel.oap.GazellePlugin \
--conf spark.driver.extraClassPath=$PATH_TO_JAR/spark-arrow-datasource-standard-<version>-jar-with-dependencies.jar:$PATH_TO_JAR/spark-columnar-core-<version>-jar-with-dependencies.jar:$PATH_TO_JAR/spark-sql-columnar-shims-common-<version>-SNAPSHOT.jar:$PATH_TO_JAR/
spark-sql-columnar-shims-spark321-<version>-SNAPSHOT.jar \
--conf spark.executor.extraClassPath=$PATH_TO_JAR/spark-arrow-datasource-standard-<version>-jar-with-dependencies.jar:$PATH_TO_JAR/spark-columnar-core-<version>-jar-with-dependencies.jar:$PATH_TO_JAR/spark-sql-columnar-shims-common-<version>-SNAPSHOT.jar:$PATH_TO_JAR/
spark-sql-columnar-shims-spark321-<version>-SNAPSHOT.jar \
--conf spark.driver.extraClassPath=$PATH_TO_JAR/gazelle-plugin-1.4.0-spark-3.2.1.jar \
--conf spark.executor.extraClassPath=$PATH_TO_JAR/gazelle-plugin-1.4.0-spark-3.2.1.jar \
--conf spark.shuffle.manager=org.apache.spark.shuffle.sort.ColumnarShuffleManager \
--conf spark.driver.cores=1 \
--conf spark.executor.instances=12 \
109 changes: 109 additions & 0 deletions gazelle-dist/pom.xml
@@ -0,0 +1,109 @@
<?xml version="1.0" encoding="UTF-8"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
<project xmlns="http://maven.apache.org/POM/4.0.0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">

<modelVersion>4.0.0</modelVersion>
<parent>
<groupId>com.intel.oap</groupId>
<artifactId>native-sql-engine-parent</artifactId>
<version>1.4.0-SNAPSHOT</version>
<relativePath>../pom.xml</relativePath>
</parent>

<artifactId>gazelle-dist</artifactId>
<name>Gazelle dist</name>
<packaging>pom</packaging>

<profiles>
<profile>
<id>spark-3.1</id>
<activation>
<activeByDefault>true</activeByDefault>
</activation>
<dependencies>
<dependency>
<groupId>com.intel.oap</groupId>
<artifactId>spark-sql-columnar-shims-spark311</artifactId>
<version>${project.version}</version>
</dependency>
</dependencies>
</profile>
<profile>
<id>spark-3.2</id>
<dependencies>
<dependency>
<groupId>com.intel.oap</groupId>
<artifactId>spark-sql-columnar-shims-spark321</artifactId>
<version>${project.version}</version>
</dependency>
</dependencies>
</profile>
</profiles>

<dependencies>
<dependency>
<groupId>com.intel.oap</groupId>
<artifactId>spark-arrow-datasource-common</artifactId>
<version>${project.version}</version>
</dependency>
<dependency>
<groupId>com.intel.oap</groupId>
<artifactId>spark-arrow-datasource-parquet</artifactId>
<version>${project.version}</version>
</dependency>
<dependency>
<groupId>com.intel.oap</groupId>
<artifactId>spark-arrow-datasource-standard</artifactId>
<version>${project.version}</version>
</dependency>
<dependency>
<groupId>com.intel.oap</groupId>
<artifactId>spark-columnar-core</artifactId>
<version>${project.version}</version>
</dependency>
<dependency>
<groupId>com.intel.oap</groupId>
<artifactId>spark-sql-columnar-shims-common</artifactId>
<version>${project.version}</version>
</dependency>
</dependencies>

<build>
<plugins>
<plugin>
<artifactId>maven-assembly-plugin</artifactId>
<version>2.4</version>
<executions>
<execution>
<id>assembly</id>
<!-- create assembly in package phase-->
<phase>package</phase>
<goals>
<goal>single</goal>
</goals>
<configuration>
<descriptors>
<descriptor>src/main/assembly/assembly.xml</descriptor>
</descriptors>
<finalName>gazelle-plugin-${project.version}-spark-${spark.version}</finalName>
<appendAssemblyId>false</appendAssemblyId>
</configuration>
</execution>
</executions>
</plugin>
</plugins>
</build>

</project>
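With this module in place, the consolidated jar can be built by activating the matching Spark profile. A rough sketch, assuming a standard Maven layout where the assembly lands under gazelle-dist/target:

# Build the single jar against Spark 3.2 (spark-3.1 is the default profile).
mvn clean package -Pspark-3.2 -DskipTests
# The assembly finalName yields gazelle-plugin-<version>-spark-<spark.version>.jar,
# e.g. gazelle-plugin-1.4.0-SNAPSHOT-spark-3.2.1.jar under gazelle-dist/target/.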
41 changes: 41 additions & 0 deletions gazelle-dist/src/main/assembly/assembly.xml
@@ -0,0 +1,41 @@
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
<assembly>
<id>all-jar</id>
<formats>
<format>jar</format> <!-- the result is a jar file -->
</formats>

<includeBaseDirectory>false</includeBaseDirectory> <!-- strip the module prefixes -->

<dependencySets>
<dependencySet>
<unpack>true</unpack> <!-- unpack , then repack the jars -->
<useTransitiveDependencies>true</useTransitiveDependencies>
<useTransitiveFiltering>true</useTransitiveFiltering>
<excludes>
<!--Exclude jars by specifying groupID:ArtifactID-->
<exclude>com.fasterxml.jackson.core:jackson-databind</exclude>
<exclude>com.fasterxml.jackson.core:jackson-annotations</exclude>
<exclude>com.fasterxml.jackson.core:jackson-core</exclude>
</excludes>
<unpackOptions>
<excludes>
<exclude>META-INF/maven/com.fasterxml.jackson.core/</exclude>
<exclude>META-INF/maven/com.fasterxml.jackson.dataformat/</exclude>
<exclude>META-INF/services/com.fasterxml.jackson.core.ObjectCodec</exclude>
<exclude>META-INF/services/com.fasterxml.jackson.core.JsonFactory</exclude>
</excludes>
</unpackOptions>
</dependencySet>
</dependencySets>
</assembly>
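Because the assembly unpacks the module jars while filtering out the Jackson artifacts, a quick sanity check on the resulting fat jar is to list its contents. A minimal sketch, assuming a JDK on the PATH and the jar name produced above:

# Expect no jackson-core, jackson-databind or jackson-annotations classes in the fat jar,
# presumably so the Jackson version shipped with Spark is used at runtime instead.
jar tf gazelle-plugin-1.4.0-spark-3.2.1.jar | grep -E 'com/fasterxml/jackson/(core|databind|annotation)/' \
  || echo "no bundled Jackson core classes"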
5 changes: 4 additions & 1 deletion native-sql-engine/core/pom.xml
@@ -48,7 +48,7 @@

<profiles>
<profile>
<id>spark-3.1.1</id>
<id>spark-3.1</id>
<activation>
<activeByDefault>true</activeByDefault>
</activation>
@@ -57,6 +57,7 @@
<groupId>com.intel.oap</groupId>
<artifactId>spark-sql-columnar-shims-spark311</artifactId>
<version>${project.version}</version>
<scope>provided</scope>
</dependency>
</dependencies>
</profile>
@@ -67,6 +68,7 @@
<groupId>com.intel.oap</groupId>
<artifactId>spark-sql-columnar-shims-spark321</artifactId>
<version>${project.version}</version>
<scope>provided</scope>
</dependency>
</dependencies>
</profile>
@@ -173,6 +175,7 @@
<groupId>org.scala-lang</groupId>
<artifactId>scala-library</artifactId>
<version>${scala.version}</version>
<scope>provided</scope>
</dependency>
<dependency>
<groupId>org.scalatest</groupId>
48 changes: 19 additions & 29 deletions pom.xml
@@ -33,31 +33,27 @@
<module>arrow-data-source</module>
<module>native-sql-engine/core</module>
<module>shims</module>
<module>gazelle-dist</module>
</modules>

<profiles>
<profile>
<id>spark</id>
<id>spark-3.1</id>
<activation>
<activeByDefault>true</activeByDefault>
<activeByDefault>true</activeByDefault>
</activation>
<properties>
<spark.version>${spark311.version}</spark.version>
<spark.version>${spark311.version}</spark.version>
<scala.version>2.12.10</scala.version>
<jackson.version>2.10.0</jackson.version>
</properties>
</profile>
<profile>
<id>spark-3.1</id>
<properties>
<spark.version>${spark311.version}</spark.version>
<scala.version>2.12.10</scala.version>
<jackson.version>2.10.0</jackson.version>
</properties>
</profile>
<profile>
<id>spark-3.2</id>
<properties>
<spark.version>${spark321.version}</spark.version>
<scala.version>2.12.15</scala.version>
<!--Jackson may be directly used in future UT. Align with the version in spark 3.2.-->
<jackson.version>2.12.0</jackson.version>
<maven.test.skip>true</maven.test.skip>
</properties>
@@ -154,6 +150,18 @@

<dependencyManagement>
<dependencies>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-catalyst_${scala.binary.version}</artifactId>
<version>${spark.version}</version>
<exclusions>
<exclusion>
<groupId>org.apache.arrow</groupId>
<artifactId>*</artifactId>
</exclusion>
</exclusions>
<scope>provided</scope>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-sql_${scala.binary.version}</artifactId>
@@ -182,24 +190,6 @@
</exclusion>
</exclusions>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-catalyst_${scala.binary.version}</artifactId>
<version>${spark.version}</version>
<exclusions>
<exclusion>
<groupId>org.apache.arrow</groupId>
<artifactId>*</artifactId>
</exclusion>
</exclusions>
<scope>provided</scope>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-sql_${scala.binary.version}</artifactId>
<version>${spark.version}</version>
<scope>provided</scope>
</dependency>
<!-- test dependencies -->
<dependency>
<groupId>org.apache.spark</groupId>
16 changes: 15 additions & 1 deletion shims/common/pom.xml
@@ -78,7 +78,7 @@
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-sql_${scala.binary.version}</artifactId>
<version>${spark311.version}</version>
<version>${spark.version}</version>
<scope>provided</scope>
</dependency>
<dependency>
@@ -92,6 +92,20 @@
<artifactId>hadoop-mapreduce-client-core</artifactId>
<version>${hadoop.version}</version>
<scope>provided</scope>
<exclusions>
<exclusion>
<groupId>com.fasterxml.jackson.core</groupId>
<artifactId>jackson-core</artifactId>
</exclusion>
<exclusion>
<groupId>com.fasterxml.jackson.core</groupId>
<artifactId>jackson-annotations</artifactId>
</exclusion>
<exclusion>
<groupId>com.fasterxml.jackson.core</groupId>
<artifactId>jackson-databind</artifactId>
</exclusion>
</exclusions>
</dependency>
</dependencies>
</project>
4 changes: 2 additions & 2 deletions shims/spark311/pom.xml
@@ -84,13 +84,13 @@
<groupId>com.intel.oap</groupId>
<artifactId>${project.prefix}-shims-common</artifactId>
<version>${project.version}</version>
<scope>compile</scope>
<scope>provided</scope>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-sql_${scala.binary.version}</artifactId>
<version>${spark311.version}</version>
<scope>provided</scope>
<scope>provided</scope>
</dependency>
<dependency>
<groupId>com.intel.oap</groupId>
2 changes: 1 addition & 1 deletion shims/spark321/pom.xml
@@ -84,7 +84,7 @@
<groupId>com.intel.oap</groupId>
<artifactId>${project.prefix}-shims-common</artifactId>
<version>${project.version}</version>
<scope>compile</scope>
<scope>provided</scope>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>