
setup-spark ✨


This action sets up Apache Spark in your environment for use in GitHub Actions by:

  • installing Spark and adding spark-submit and spark-shell to the PATH
  • setting the SPARK_HOME and PYSPARK_PYTHON environment variables (and others) in the workflow

This makes it possible to test applications against a local Spark context in GitHub Actions.

Usage

You will need to set up Python and Java in the job before setting up Spark.

Check for the latest Spark versions at https://spark.apache.org/downloads.html

Basic workflow:

steps:
- uses: actions/setup-python@v2
  with:
    python-version: '3.8'
- uses: actions/setup-java@v1
  with:
    java-version: '11'

- uses: vemonet/setup-spark@v1
  with:
    spark-version: '3.1.2'
    hadoop-version: '3.2'

- run: spark-submit --version

See the action.yml file for a complete rundown of the available parameters.
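
Because SPARK_HOME and PYSPARK_PYTHON are set for the rest of the job, you can follow the setup step with your own test steps. A minimal sketch, assuming a pytest-based suite under tests/ and that the pyspark Python package is installed with pip (neither is provided by this action):

# Install the Python bindings and test runner (not provided by this action)
- run: pip install pyspark pytest
# Run the suite against a local Spark context (tests/ is an assumed layout)
- run: pytest tests/

With SPARK_HOME set, the pip-installed pyspark bindings use the Spark distribution installed by the action.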

You can also provide a specific URL to download the Spark .tgz:

- uses: vemonet/setup-spark@v1
  with:
    spark-version: '3.1.1'
    hadoop-version: '3.2'
    spark-url: 'https://archive.apache.org/dist/spark/spark-3.1.1/spark-3.1.1-bin-hadoop3.2.tgz'

Available versions

The Hadoop versions stay quite stable (the latest is 3.2).

Check for the latest Spark versions at https://spark.apache.org/downloads.html

The setup-spark action is tested in .github/workflows/test-setup-spark.yml for:

  • Apache Spark versions 3.0.2, 3.0.3, 3.1.1, 3.1.2
  • Hadoop version 3.2
  • Ubuntu runner
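
To exercise several of these versions in your own workflow, you can use a job matrix. A sketch, assuming the same Python and Java setup steps as in the basic workflow above:

jobs:
  test:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        spark-version: ['3.0.2', '3.0.3', '3.1.1', '3.1.2']
    steps:
      # Python and Java setup steps go here (see the basic workflow above)
      - uses: vemonet/setup-spark@v1
        with:
          spark-version: ${{ matrix.spark-version }}
          hadoop-version: '3.2'
      - run: spark-submit --version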

License

The scripts and documentation in this project are released under the MIT License.

Contributions

Contributions are welcome! Feel free to test other Spark versions and to submit issues or pull requests.

See the contributor's guide for more details.
