Remote Shuffle Implementation for Spark 3.3.1 #77

Open · wants to merge 28 commits into base: branch-1.1-spark-3.1.1
Commits (28, changes from all commits)
492eefc
Issue 18 (#23) upgrade to Spark-3.1.1
jiafuzha Jun 11, 2021
85d0698
[REMOTE-SHUFFLE-24] Enhance executor memory release (#25)
jiafuzha Jun 15, 2021
06fea5e
[REMOTE-SHUFFLE-26] Follow OAP name convention (#27)
jiafuzha Jun 17, 2021
1fde89d
[REMOTE-SHUFFLE-28] Follow OAP name convention ${project.artifactId}-…
jiafuzha Jun 17, 2021
9621d70
[REMOTE-SHUFFLE-31] Avoid re-computation of shuffle write when execut…
jiafuzha Jul 1, 2021
9edf922
[REMOTE-SHUFFLE-35] Update version to 1.2.0 (#36)
jiafuzha Jul 9, 2021
3dc0f2d
[REMOTE-SHUFFLE-40] Update Changelog and OAP guide
Sep 2, 2021
f0b6558
[REMOTE-SHUFFLE-34] Enhance map-combine and map-ordering (#38)
jiafuzha Sep 10, 2021
169929c
[REMOTE-SHUFFLE-44] Set mapIndex to -1 when FetchFailedException to a…
jiafuzha Sep 14, 2021
50427b8
[REMOTE-SHUFFLE-46] Fixed some issues when run with I/O synchronously…
jiafuzha Nov 2, 2021
3c1ae1d
[REMOTE-SHUFFLE-50] Support Spark-3.1 and Spark-3.2 in shuffle-daos (…
jiafuzha Dec 7, 2021
55f143b
[REMOTE-SHUFFLE-43] Read Partitions from Highly Compressed Map Status…
jiafuzha Dec 14, 2021
3f9ef69
Create SECURITY.md
carsonwang Dec 15, 2021
350067d
[REMOTE-SHUFFLE-53] Make spark dependency provided (#54)
jiafuzha Dec 20, 2021
a1fddc1
[REMOTE-SHUFFLE-52] Update shuffle write to not encode/decode paramet…
jiafuzha Jan 7, 2022
acb81db
[REMOTE-SHUFFLE-56] Make shuffle object class and object hint configu…
jiafuzha Nov 4, 2022
abc0063
[REMOTE-SHUFFLE-60] Support akey offset size bigger than 2GB (#61)
jiafuzha Nov 7, 2022
57b2a3f
[REMOTE-SHUFFLE-62] Adjust some default parameters for aurora env (#63)
jiafuzha Dec 16, 2022
006a05b
[REMOTE-SHUFFLE-64] remove password from url in shuffle hadoop script…
jiafuzha Jan 5, 2023
50b79bc
[REMOTE-SHUFFLE-66] Enhance shuffle removing logic (#67)
jiafuzha Feb 21, 2023
4b33318
[REMOTE-SHUFFLE-66] Enhance shuffle removing logic (#69)
jiafuzha Apr 26, 2023
38e3822
[REMOTE-SHUFFLE-70] use DAOS object class hint instead of DAOS object…
jiafuzha Jun 5, 2023
eb785b3
[REMOTE-SHUFFLE-101] fixed UT errors
jiafuzha Jun 7, 2023
d957ef0
Merge pull request #72 from jiafuzha/ISSUE_101
carsonwang Jun 7, 2023
14cc7ee
Bump org.apache.spark:spark-core_2.12 from 3.1.1 to 3.3.3 (#74)
dependabot[bot] Sep 12, 2023
4127802
fix OpenSSF
minmingzhu Feb 28, 2024
93f7463
Update dev_cron.yml
minmingzhu Feb 28, 2024
28e1b35
Merge pull request #76 from minmingzhu/fix_openssf
carsonwang Feb 29, 2024
8 changes: 5 additions & 3 deletions .github/workflows/dev_cron.yml
@@ -25,19 +25,21 @@ on:
- edited
- synchronize

permissions: read-all

jobs:
process:
name: Process
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v2
- uses: actions/checkout@ee0669bd1cc54295c223e0bb666b733df41de1c5 # v2.7.0

- name: Comment Issues link
if: |
github.event_name == 'pull_request_target' &&
(github.event.action == 'opened' ||
github.event.action == 'edited')
uses: actions/github-script@v3
uses: actions/github-script@ffc2c79a5b2490bd33e0a41c1de74b877714d736 # v3.2.0
with:
github-token: ${{ secrets.GITHUB_TOKEN }}
script: |
@@ -49,7 +51,7 @@ jobs:
github.event_name == 'pull_request_target' &&
(github.event.action == 'opened' ||
github.event.action == 'edited')
uses: actions/github-script@v3
uses: actions/github-script@ffc2c79a5b2490bd33e0a41c1de74b877714d736 # v3.2.0
with:
github-token: ${{ secrets.GITHUB_TOKEN }}
script: |
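
Note: the change above follows the OpenSSF recommendation of pinning GitHub Actions to a full commit SHA rather than a mutable tag. As an aside for reviewers (not part of this PR), the commit behind a release tag can be looked up with `git ls-remote` before pinning it in a workflow; annotated tags also print a peeled `^{}` entry pointing at the underlying commit:

```
git ls-remote https://github.com/actions/checkout v2.7.0
# expect ee0669bd1cc54295c223e0bb666b733df41de1c5 (the SHA pinned in the hunk above)
```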
350 changes: 340 additions & 10 deletions CHANGELOG.md

Large diffs are not rendered by default.

4 changes: 2 additions & 2 deletions README.md
@@ -43,8 +43,8 @@ following configurations in spark-defaults.conf or Spark submit command line arg
Note: For DAOS users, DAOS Hadoop/Java API jars should also be included in the classpath as we leverage DAOS Hadoop filesystem.

```
spark.executor.extraClassPath $HOME/miniconda2/envs/oapenv/oap_jars/remote-shuffle-<version>.jar
spark.driver.extraClassPath $HOME/miniconda2/envs/oapenv/oap_jars/remote-shuffle-<version>.jar
spark.executor.extraClassPath $HOME/miniconda2/envs/oapenv/oap_jars/shuffle-hadoop-<version>.jar
spark.driver.extraClassPath $HOME/miniconda2/envs/oapenv/oap_jars/shuffle-hadoop-<version>.jar
```

Enable the remote shuffle manager and specify the Hadoop storage system URI holding shuffle data.
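
Note: the full README then enables the shuffle manager itself. A minimal spark-defaults.conf sketch, assuming the `RemoteShuffleManager` class name and the `spark.shuffle.remote.storageMasterUri` property documented elsewhere in this project (both assumptions here; the HDFS host is a placeholder):

```
spark.shuffle.manager                   org.apache.spark.shuffle.remote.RemoteShuffleManager
spark.shuffle.remote.storageMasterUri   hdfs://namenode:9000
```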
12 changes: 12 additions & 0 deletions SECURITY.md
@@ -0,0 +1,12 @@
# Security Policy

## Report a Vulnerability

Please report security issues or vulnerabilities to the [Intel® Security Center].

For more information on how Intel® works to resolve security issues, see
[Vulnerability Handling Guidelines].

[Intel® Security Center]:https://www.intel.com/security

[Vulnerability Handling Guidelines]:https://www.intel.com/content/www/us/en/security-center/vulnerability-handling-guidelines.html
49 changes: 23 additions & 26 deletions docs/OAP-Developer-Guide.md
@@ -3,13 +3,13 @@
This document contains the instructions & scripts on installing necessary dependencies and building OAP modules.
You can get more detailed information from OAP each module below.

* [SQL Index and Data Source Cache](https://github.com/oap-project/sql-ds-cache/blob/v1.1.0-spark-3.0.0/docs/Developer-Guide.md)
* [PMem Common](https://github.com/oap-project/pmem-common/tree/v1.1.0-spark-3.0.0)
* [PMem Spill](https://github.com/oap-project/pmem-spill/tree/v1.1.0-spark-3.0.0)
* [PMem Shuffle](https://github.com/oap-project/pmem-shuffle/tree/v1.1.0-spark-3.0.0#5-install-dependencies-for-pmem-shuffle)
* [Remote Shuffle](https://github.com/oap-project/remote-shuffle/tree/v1.1.0-spark-3.0.0)
* [OAP MLlib](https://github.com/oap-project/oap-mllib/tree/v1.1.0-spark-3.0.0)
* [Native SQL Engine](https://github.com/oap-project/native-sql-engine/tree/v1.1.0-spark-3.0.0)
* [SQL Index and Data Source Cache](https://github.com/oap-project/sql-ds-cache/blob/v1.2.0/docs/Developer-Guide.md)
* [PMem Common](https://github.com/oap-project/pmem-common/tree/v1.2.0)
* [PMem Spill](https://github.com/oap-project/pmem-spill/tree/v1.2.0)
* [PMem Shuffle](https://github.com/oap-project/pmem-shuffle/tree/v1.2.0#5-install-dependencies-for-pmem-shuffle)
* [Remote Shuffle](https://github.com/oap-project/remote-shuffle/tree/v1.2.0)
* [OAP MLlib](https://github.com/oap-project/oap-mllib/tree/v1.2.0)
* [Gazelle Plugin](https://github.com/oap-project/gazelle_plugin/tree/v1.2.0)

## Building OAP

@@ -22,45 +22,42 @@ We provide scripts to help automatically install dependencies required, please c
# cd oap-tools
# sh dev/install-compile-time-dependencies.sh
```
*Note*: oap-tools tag version `v1.1.0-spark-3.0.0` corresponds to all OAP modules' tag version `v1.1.0-spark-3.0.0`.
*Note*: oap-tools tag version `v1.2.0` corresponds to all OAP modules' tag version `v1.2.0`.

Then the dependencies below will be installed:

* [Cmake](https://help.directadmin.com/item.php?id=494)
* [Cmake](https://cmake.org/install/)
* [GCC > 7](https://gcc.gnu.org/wiki/InstallingGCC)
* [Memkind](https://github.com/memkind/memkind/tree/v1.10.1)
* [Vmemcache](https://github.com/pmem/vmemcache)
* [HPNL](https://github.com/Intel-bigdata/HPNL)
* [PMDK](https://github.com/pmem/pmdk)
* [OneAPI](https://software.intel.com/content/www/us/en/develop/tools/oneapi.html)
* [Arrow](https://github.com/oap-project/arrow/tree/arrow-3.0.0-oap-1.1)
* [Arrow](https://github.com/oap-project/arrow/tree/v4.0.0-oap-1.2.0)
* [LLVM](https://llvm.org/)

Run the following command to learn more.

```
# sh dev/scripts/prepare_oap_env.sh --help
```

Run the following command to automatically install specific dependency such as Maven.

```
# sh dev/scripts/prepare_oap_env.sh --prepare_maven
```

- **Requirements for Shuffle Remote PMem Extension**
If enable Shuffle Remote PMem extension with RDMA, you can refer to [PMem Shuffle](https://github.com/oap-project/pmem-shuffle) to configure and validate RDMA in advance.

### Building

#### Building OAP

OAP is built with [Apache Maven](http://maven.apache.org/) and Oracle Java 8.

To build OAP package, run command below then you can find a tarball named `oap-$VERSION-bin-spark-$VERSION.tar.gz` under directory `$OAP_TOOLS_HOME/dev/release-package `.
To build OAP package, run command below then you can find a tarball named `oap-$VERSION-*.tar.gz` under directory `$OAP_TOOLS_HOME/dev/release-package `, which contains all OAP module jars.
Change to `root` user, run

```
$ sh $OAP_TOOLS_HOME/dev/compile-oap.sh
# cd oap-tools
# sh dev/compile-oap.sh
```

Building specified OAP Module, such as `sql-ds-cache`, run:
#### Building OAP specific module

If you just want to build a specific OAP Module, such as `sql-ds-cache`, change to `root` user, then run:

```
$ sh $OAP_TOOLS_HOME/dev/compile-oap.sh --sql-ds-cache
# cd oap-tools
# sh dev/compile-oap.sh --component=sql-ds-cache
```
11 changes: 5 additions & 6 deletions docs/OAP-Installation-Guide.md
@@ -26,20 +26,19 @@ To test your installation, run the command `conda list` in your terminal window
### Installing OAP

Create a Conda environment and install OAP Conda package.

```bash
$ conda create -n oapenv -y python=3.7
$ conda activate oapenv
$ conda install -c conda-forge -c intel -y oap=1.1.0
$ conda create -n oapenv -c conda-forge -c intel -y oap=1.2.0
```

Once finished steps above, you have completed OAP dependencies installation and OAP building, and will find built OAP jars under `$HOME/miniconda2/envs/oapenv/oap_jars`

Dependencies below are required by OAP and all of them are included in OAP Conda package, they will be automatically installed in your cluster when you Conda install OAP. Ensure you have activated environment which you created in the previous steps.

- [Arrow](https://github.com/Intel-bigdata/arrow)
- [Arrow](https://github.com/oap-project/arrow/tree/v4.0.0-oap-1.2.0)
- [Plasma](http://arrow.apache.org/blog/2017/08/08/plasma-in-memory-object-store/)
- [Memkind](https://anaconda.org/intel/memkind)
- [Vmemcache](https://anaconda.org/intel/vmemcache)
- [Memkind](https://github.com/memkind/memkind/tree/v1.10.1)
- [Vmemcache](https://github.com/pmem/vmemcache.git)
- [HPNL](https://anaconda.org/intel/hpnl)
- [PMDK](https://github.com/pmem/pmdk)
- [OneAPI](https://software.intel.com/content/www/us/en/develop/tools/oneapi.html)
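
Note (an illustration, not part of the diff): once the Conda steps above finish, the environment can be activated and the installed OAP jars listed to confirm the installation, using the environment name and jar directory from the text above:

```
conda activate oapenv
ls $HOME/miniconda2/envs/oapenv/oap_jars/
```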
17 changes: 9 additions & 8 deletions pom.xml
@@ -6,17 +6,17 @@

<groupId>com.intel.oap</groupId>
<artifactId>remote-shuffle-parent</artifactId>
<version>1.2.0</version>
<version>2.2.0-SNAPSHOT</version>
<name>OAP Remote Shuffle Parent POM</name>
<packaging>pom</packaging>

<properties>
<scala.version>2.12.10</scala.version>
<scala.binary.version>2.12</scala.binary.version>
<scala.version>2.12.10</scala.version>
<scala.binary.version>2.12</scala.binary.version>
<java.version>1.8</java.version>
<maven.compiler.source>${java.version}</maven.compiler.source>
<maven.compiler.target>${java.version}</maven.compiler.target>
<spark.version>3.0.0</spark.version>
<spark.version>3.3.3</spark.version>
</properties>

<build>
@@ -214,8 +214,9 @@
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-core_2.12</artifactId>
<artifactId>spark-core_${scala.binary.version}</artifactId>
<version>${spark.version}</version>
<scope>provided</scope>
</dependency>
<dependency>
<groupId>junit</groupId>
@@ -225,13 +226,13 @@
</dependency>
<dependency>
<groupId>org.scalatest</groupId>
<artifactId>scalatest_2.12</artifactId>
<version>3.0.8</version>
<artifactId>scalatest_${scala.binary.version}</artifactId>
<version>3.2.3</version>
<scope>test</scope>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-core_2.12</artifactId>
<artifactId>spark-core_${scala.binary.version}</artifactId>
<version>${spark.version}</version>
<classifier>tests</classifier>
<type>test-jar</type>
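
Note: with `spark.version` parameterized and the Spark dependency marked `provided`, building against another Spark release becomes a standard Maven property override. A sketch (versions below are illustrative; this PR's pom defaults to 3.3.3):

```
# Default build against the pom's Spark version (3.3.3), skipping tests
mvn clean package -DskipTests
# Override the Spark version on the command line, e.g. for 3.3.1
mvn clean package -DskipTests -Dspark.version=3.3.1
```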
2 changes: 1 addition & 1 deletion shuffle-daos/README.md
@@ -1,6 +1,6 @@
# Remote Shuffle Based on DAOS Object API

A remote shuffle plugin based on DAOS Object API. You can find DAOS and DAOS Java Wrapper in https://github.com/daos-stack/daos and https://github.com/daos-stack/daos/tree/master/src/client/java.
Thanks to DAOS, the plugin is espacially good for small shuffle block, such as around 200KB.
Thanks to DAOS, the plugin is especially good for small shuffle block, such as around 200KB.

See Shuffle DAOS related documentation in [Readme under project root](../README.md).
55 changes: 49 additions & 6 deletions shuffle-daos/pom.xml
@@ -7,13 +7,14 @@
<parent>
<groupId>com.intel.oap</groupId>
<artifactId>remote-shuffle-parent</artifactId>
<version>1.2.0</version>
<version>2.2.0-SNAPSHOT</version>
</parent>
<artifactId>shuffle-daos</artifactId>
<artifactId>remote-shuffle-daos</artifactId>
<name>OAP Remote Shuffle Based on DAOS Object API</name>
<packaging>jar</packaging>

<build>
<finalName>${project.artifactId}-${project.version}-with-spark-${spark.version}</finalName>
<plugins>
<plugin>
<groupId>org.codehaus.mojo</groupId>
@@ -114,6 +115,42 @@
<target>${java.version}</target>
</configuration>
</plugin>
<plugin>
<artifactId>maven-shade-plugin</artifactId>
<executions>
<execution>
<id>shade-netty4</id>
<phase>package</phase>
<goals>
<goal>shade</goal>
</goals>
<configuration>
<shadedArtifactAttached>true</shadedArtifactAttached>
<transformers>
<transformer implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
<resource>META-INF/org.apache.hadoop.fs.FileSystem</resource>
</transformer>
</transformers>
<relocations>
<relocation>
<pattern>io.netty.buffer</pattern>
<shadedPattern>io.netty.buffershade4</shadedPattern>
</relocation>
<relocation>
<pattern>io.netty.util</pattern>
<shadedPattern>io.netty.utilshade4</shadedPattern>
</relocation>
</relocations>
<artifactSet>
<includes>
<include></include>
</includes>
</artifactSet>
<finalName>${artifactId}-${version}-with-spark-${spark.version}-netty4-txf</finalName>
</configuration>
</execution>
</executions>
</plugin>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-surefire-plugin</artifactId>
@@ -200,12 +237,12 @@
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-core_2.12</artifactId>
<artifactId>spark-core_${scala.binary.version}</artifactId>
</dependency>
<dependency>
<groupId>io.daos</groupId>
<artifactId>daos-java</artifactId>
<version>1.2.2</version>
<version>2.4.1</version>
</dependency>
<dependency>
<groupId>junit</groupId>
@@ -214,12 +251,12 @@
</dependency>
<dependency>
<groupId>org.scalatest</groupId>
<artifactId>scalatest_2.12</artifactId>
<artifactId>scalatest_${scala.binary.version}</artifactId>
<scope>test</scope>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-core_2.12</artifactId>
<artifactId>spark-core_${scala.binary.version}</artifactId>
<classifier>tests</classifier>
<type>test-jar</type>
<scope>test</scope>
@@ -241,4 +278,10 @@
</dependency>
</dependencies>

<repositories>
<repository>
<id>maven-snapshots</id>
<url>http://oss.sonatype.org/content/repositories/snapshots</url>
</repository>
</repositories>
</project>
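
Note: the shade execution above relocates Netty's `io.netty.buffer` and `io.netty.util` packages so the DAOS client's Netty 4 cannot clash with the Netty shipped by Spark. A quick post-build spot check (the jar name is assembled here from the `finalName` pattern in the diff, so treat it as illustrative):

```
jar tf target/remote-shuffle-daos-2.2.0-SNAPSHOT-with-spark-3.3.3-netty4-txf.jar | grep netty
# relocated classes should appear under io/netty/buffershade4/ and io/netty/utilshade4/
```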
@@ -26,7 +26,10 @@
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import java.util.concurrent.*;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.Executor;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.atomic.AtomicInteger;

/**