Update docs related to new Jar name #729

Merged · 3 commits · Oct 9, 2020

Changes from 1 commit
6 changes: 3 additions & 3 deletions benchmark/README.md
@@ -42,8 +42,8 @@ TPCH timing results is written to stdout in the following form: `TPCH_Result,<la

## CSharp
1. Ensure that the Microsoft.Spark.Worker is properly [installed](../deployment/README.md#cloud-deployment) in your cluster.
2. Build `microsoft-spark-<spark_majorversion.spark_minorversion.x>-<spark_dotnet_version>.jar` and the [CSharp Tpch benchmark](csharp/Tpch) application by following the [build instructions](../README.md#building-from-source).
3. Upload [run_csharp_benchmark.sh](run_csharp_benchmark.sh), the Tpch benchmark application, and `microsoft-spark-<spark_majorversion.spark_minorversion.x>-<spark_dotnet_version>.jar` to the cluster.
2. Build `microsoft-spark-<version>.jar` and the [CSharp Tpch benchmark](csharp/Tpch) application by following the [build instructions](../README.md#building-from-source).
3. Upload [run_csharp_benchmark.sh](run_csharp_benchmark.sh), the Tpch benchmark application, and `microsoft-spark-<version>.jar` to the cluster.
4. Run the benchmark by invoking:
```shell
run_csharp_benchmark.sh \
@@ -53,7 +53,7 @@ TPCH timing results is written to stdout in the following form: `TPCH_Result,<la
<executor_memory> \
<executor_cores> \
</path/to/Tpch.dll> \
</path/to/microsoft-spark-<spark_majorversion.spark_minorversion.x>-<spark_dotnet_version>.jar> \
</path/to/microsoft-spark-<version>.jar> \
</path/to/Tpch executable> \
</path/to/dataset> \
<number of iterations> \
14 changes: 7 additions & 7 deletions deployment/README.md
@@ -55,7 +55,7 @@ Microsoft.Spark.Worker is a backend component that lives on the individual worke
foo@bar:~/path/to/app/bin/Release/netcoreapp2.1/ubuntu.16.04-x64/publish$ zip -r <your app>.zip .
```
4. Upload the following to a distributed file system (e.g., HDFS, WASB, ADLS, S3, DBFS) that your cluster has access to:
* `microsoft-spark-<spark_majorversion.spark_minorversion.x>-<spark_dotnet_version>.jar` (Included as part of the [Microsoft.Spark](https://www.nuget.org/packages/Microsoft.Spark/) nuget and is colocated in your app's build output directory)
* `microsoft-spark-<version>.jar` (Included as part of the [Microsoft.Spark](https://www.nuget.org/packages/Microsoft.Spark/) nuget and is colocated in your app's build output directory)
* `<your app>.zip`
* Files (e.g., dependency files, common data accessible to every worker) or Assemblies (e.g., DLLs that contain your user-defined functions, libraries that your `app` depends on) to be placed in the working directory of each executor.
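
As an illustration of the upload in step 4, here is a minimal sketch assuming an HDFS-backed cluster with the `hadoop` CLI on the path; the staging directory and bucket name are placeholders, not part of the original instructions:
```shell
# Hypothetical staging directory; use whatever location your cluster jobs read artifacts from.
hadoop fs -mkdir -p /apps/spark-dotnet
hadoop fs -put microsoft-spark-<version>.jar /apps/spark-dotnet/
hadoop fs -put <your app>.zip /apps/spark-dotnet/

# For an S3-backed cluster, the equivalent copy with the AWS CLI would be, e.g.:
# aws s3 cp microsoft-spark-<version>.jar s3://mybucket/<some dir>/
```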

@@ -90,7 +90,7 @@ The following captures the setting for a HDInsight Script Action:
--master yarn \
--class org.apache.spark.deploy.dotnet.DotnetRunner \
--files <comma-separated list of assemblies that contain UDF definitions, if any> \
adl://<cluster name>.azuredatalakestore.net/<some dir>/microsoft-spark-<spark_majorversion.spark_minorversion.x>-<spark_dotnet_version>.jar \
adl://<cluster name>.azuredatalakestore.net/<some dir>/microsoft-spark-<version>.jar \
adl://<cluster name>.azuredatalakestore.net/<some dir>/<your app>.zip <your app> <app arg 1> <app arg 2> ... <app arg n>
```

@@ -104,7 +104,7 @@ foo@bar:~$ curl -k -v -X POST "https://<your spark cluster>.azurehdinsight.net/l
-H "X-Requested-By: <hdinsight username>" \
-d @- << EOF
{
"file":"adl://<cluster name>.azuredatalakestore.net/<some dir>/microsoft-spark-<spark_majorversion.spark_minorversion.x>-<spark_dotnet_version>.jar",
"file":"adl://<cluster name>.azuredatalakestore.net/<some dir>/microsoft-spark-<version>.jar",
"className":"org.apache.spark.deploy.dotnet.DotnetRunner",
"files":["adl://<cluster name>.azuredatalakestore.net/<some dir>/<udf assembly>", "adl://<cluster name>.azuredatalakestore.net/<some dir>/<file>"],
"args":["adl://<cluster name>.azuredatalakestore.net/<some dir>/<your app>.zip","<your app>","<app arg 1>","<app arg 2>,"...","<app arg n>"]
@@ -144,7 +144,7 @@ foo@bar:~$ aws emr create-cluster \
--master yarn \
--class org.apache.spark.deploy.dotnet.DotnetRunner \
--files <comma-separated list of assemblies that contain UDF definitions, if any> \
s3://mybucket/<some dir>/microsoft-spark-<spark_majorversion.spark_minorversion.x>-<spark_dotnet_version>.jar \
s3://mybucket/<some dir>/microsoft-spark-<version>.jar \
s3://mybucket/<some dir>/<your app>.zip <your app> <app args>
```

@@ -154,7 +154,7 @@ Amazon EMR Steps can be used to submit jobs to the Spark framework installed on
# For example, you can run the following on Linux using `aws` cli.
foo@bar:~$ aws emr add-steps \
--cluster-id j-xxxxxxxxxxxxx \
--steps Type=spark,Name="Spark Program",Args=[--master,yarn,--files,s3://mybucket/<some dir>/<udf assembly>,--class,org.apache.spark.deploy.dotnet.DotnetRunner,s3://mybucket/<some dir>/microsoft-spark-<spark_majorversion.spark_minorversion.x>-<spark_dotnet_version>.jar,s3://mybucket/<some dir>/<your app>.zip,<your app>,<app arg 1>,<app arg 2>,...,<app arg n>],ActionOnFailure=CONTINUE
--steps Type=spark,Name="Spark Program",Args=[--master,yarn,--files,s3://mybucket/<some dir>/<udf assembly>,--class,org.apache.spark.deploy.dotnet.DotnetRunner,s3://mybucket/<some dir>/microsoft-spark-<version>.jar,s3://mybucket/<some dir>/<your app>.zip,<your app>,<app arg 1>,<app arg 2>,...,<app arg n>],ActionOnFailure=CONTINUE
```

## Databricks
@@ -192,7 +192,7 @@ Databricks allows you to submit Spark .NET apps to an existing active cluster or

One-time Setup:
1. Go to your Databricks cluster -> Jobs (on the left-side menu) -> Set JAR
2. Upload the appropriate `microsoft-spark-<spark-version>-<spark-dotnet-version>.jar`
2. Upload the appropriate `microsoft-spark-<version>.jar`
3. Set the params appropriately:
```
Main Class: org.apache.spark.deploy.dotnet.DotnetRunner
@@ -231,5 +231,5 @@ Publishing your App & Running:
1. [Create a Job](https://docs.databricks.com/user-guide/jobs.html) and select *Configure spark-submit*.
2. Configure `spark-submit` with the following parameters:
```shell
["--files","/dbfs/<path-to>/<app assembly/file to deploy to worker>","--class","org.apache.spark.deploy.dotnet.DotnetRunner","/dbfs/<path-to>/microsoft-spark-<spark_majorversion.spark_minorversion.x>-<spark_dotnet_version>.jar","/dbfs/<path-to>/<app name>.zip","<app name>","app arg1","app arg2"]
["--files","/dbfs/<path-to>/<app assembly/file to deploy to worker>","--class","org.apache.spark.deploy.dotnet.DotnetRunner","/dbfs/<path-to>/microsoft-spark-<version>.jar","/dbfs/<path-to>/<app name>.zip","<app name>","app arg1","app arg2"]
```
5 changes: 3 additions & 2 deletions docs/building/ubuntu-instructions.md
@@ -113,8 +113,9 @@ cd src/scala
mvn clean package
```
You should see JARs created for the supported Spark versions:
* `microsoft-spark-2.3.x/target/microsoft-spark-2.3.x-<version>.jar`
* `microsoft-spark-2.4.x/target/microsoft-spark-2.4.x-<version>.jar`
* `microsoft-spark-2-3/target/microsoft-spark-2-3_2.11-<version>.jar`
* `microsoft-spark-2-4/target/microsoft-spark-2-4_2.11-<version>.jar`
* `microsoft-spark-3-0/target/microsoft-spark-3-0_2.12-<version>.jar`
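
As a quick sanity check (a sketch, assuming the build was run from `src/scala` as shown above), you can list the produced artifacts and confirm they match the names listed:
```shell
# List the per-Spark-version JARs produced by `mvn clean package`.
ls microsoft-spark-*/target/microsoft-spark-*.jar
```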

## Building .NET Sample Applications using .NET Core CLI

5 changes: 3 additions & 2 deletions docs/building/windows-instructions.md
@@ -97,8 +97,9 @@ cd src\scala
mvn clean package
```
You should see JARs created for the supported Spark versions:
* `microsoft-spark-2.3.x\target\microsoft-spark-2.3.x-<version>.jar`
* `microsoft-spark-2.4.x\target\microsoft-spark-2.4.x-<version>.jar`
* `microsoft-spark-2-3/target/microsoft-spark-2-3_2.11-<version>.jar`

Contributor: Can we use \ since this is windows?

Member Author: Good catch! Updated

* `microsoft-spark-2-4/target/microsoft-spark-2-4_2.11-<version>.jar`
* `microsoft-spark-3-0/target/microsoft-spark-3-0_2.12-<version>.jar`

## Building .NET Samples Application

2 changes: 1 addition & 1 deletion docs/deploy-worker-udf-binaries.md
@@ -110,6 +110,6 @@ spark-submit \
--conf spark.yarn.appMasterEnv.DOTNET_WORKER_DIR=./worker/Microsoft.Spark.Worker-<version> \
--conf spark.yarn.appMasterEnv.DOTNET_ASSEMBLY_SEARCH_PATHS=./udfs \
--archives hdfs://<path to your files>/Microsoft.Spark.Worker.net461.win-x64-<version>.zip#worker,hdfs://<path to your files>/mySparkApp.zip#udfs \
hdfs://<path to jar file>/microsoft-spark-2.4.x-<version>.jar \
hdfs://<path to jar file>/microsoft-spark-<version>.jar \
hdfs://<path to your files>/mySparkApp.zip mySparkApp
```
2 changes: 1 addition & 1 deletion docs/getting-started/macos-instructions.md
@@ -68,7 +68,7 @@ These instructions will show you how to run a .NET for Apache Spark app using .N
spark-submit \
--class org.apache.spark.deploy.dotnet.DotnetRunner \
--master local \
microsoft-spark-2.4.x-<version>.jar \
microsoft-spark-<version>.jar \
dotnet HelloSpark.dll
```
**Note**: This command assumes you have downloaded Apache Spark and added it to your PATH environment variable to be able to use `spark-submit`, otherwise, you would have to use the full path (e.g., `~/spark/bin/spark-submit`).
2 changes: 1 addition & 1 deletion docs/getting-started/ubuntu-instructions.md
@@ -61,7 +61,7 @@ For detailed instructions, you can see [Building .NET for Apache Spark from Sour
spark-submit \
--class org.apache.spark.deploy.dotnet.DotnetRunner \
--master local \
microsoft-spark-2.4.x-<version>.jar \
microsoft-spark-<version>.jar \
dotnet HelloSpark.dll
```
**Note**: This command assumes you have downloaded Apache Spark and added it to your PATH environment variable to be able to use `spark-submit`, otherwise, you would have to use the full path (e.g., `~/spark/bin/spark-submit`). For detailed instructions, you can see [Building .NET for Apache Spark from Source on Ubuntu](../building/ubuntu-instructions.md).
2 changes: 1 addition & 1 deletion docs/getting-started/windows-instructions.md
@@ -49,7 +49,7 @@ For detailed instructions, you can see [Building .NET for Apache Spark from Sour
spark-submit `
--class org.apache.spark.deploy.dotnet.DotnetRunner `
--master local `
microsoft-spark-2.4.x-<version>.jar `
microsoft-spark-<version>.jar `
dotnet HelloSpark.dll
```
**Note**: This command assumes you have downloaded Apache Spark and added it to your PATH environment variable to be able to use `spark-submit`, otherwise, you would have to use the full path (e.g., `c:\bin\apache-spark\bin\spark-submit`). For detailed instructions, you can see [Building .NET for Apache Spark from Source on Windows](../building/windows-instructions.md).