Thamme Gowda edited this page Oct 27, 2021 · 3 revisions

Requirements

  • Newer JDK (1.7+): tested on JDK 7 and 8; unlikely to work on JDK 9+
  • Newer Maven (3.x+): tested on Maven 3.3.x
  • Working Internet Connection
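
The JDK constraint above can be checked before starting a build. The following is a minimal sketch, assuming a POSIX shell; the helper names jdk_major and jdk_is_tested are hypothetical and not part of this project. It reduces a JDK version string (legacy 1.8.0_292 style or newer 11.0.2 style) to its major number and reports whether that number falls in the tested range of 7 to 8.

```shell
#!/bin/sh
# Hypothetical helper (not part of this project): print the major version
# number for a JDK version string such as "1.8.0_292" or "11.0.2".
jdk_major() {
    case "$1" in
        # Legacy scheme: 1.7 -> 7, 1.8 -> 8
        1.*) echo "$1" | cut -d. -f2 ;;
        # Newer scheme: 11.0.2 -> 11, 17-ea -> 17
        *)   echo "$1" | cut -d. -f1 | cut -d- -f1 ;;
    esac
}

# Succeeds only for the JDK versions this page reports as tested (7 and 8).
jdk_is_tested() {
    major=$(jdk_major "$1")
    [ "$major" = "7" ] || [ "$major" = "8" ]
}

# Example use (requires java on PATH; `java -version` writes to stderr):
#   v=$(java -version 2>&1 | sed -n 's/.*version "\([^"]*\)".*/\1/p' | head -n 1)
#   jdk_is_tested "$v" || echo "warning: JDK $v is untested with this project" >&2
```

The check only warns; an untested JDK may still work, so the decision to continue is left to the caller.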

List of modules

This project contains the following modules:

  • autoext - The core module
  • autoext-spark - The module for performing distributed clustering on Apache Spark
  • apted - A faster Tree Edit Distance (TED) Implementation
  • visuals - Web interface for visualizations

Build Profiles

  • Executable jar with all dependencies (default)
  • spark-submit - Profile for packaging a jar to submit to Spark via the spark-submit command

Building executable jar with all dependencies

  • Go to the root of the project with cd
  • mvn clean compile package

On success, the executable jar can be found at autoext-spark/target/autoext-spark-xx.jar

To run this jar: java -jar autoext-spark/target/autoext-spark-xx.jar
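
Because the xx in the jar name stands for the version, a script usually has to locate the file rather than hard-code it. The following is a minimal sketch, assuming a POSIX shell; find_built_jar is a hypothetical helper, not part of this project. It returns the first file matching the given name prefix in a target directory. If the build leaves several matching jars there, the first match may not be the executable one.

```shell
#!/bin/sh
# Hypothetical helper (not part of this project): print the first jar in
# directory $1 whose file name starts with "$2-", or fail if none exists.
find_built_jar() {
    target_dir=$1
    prefix=$2
    for jar in "$target_dir"/"$prefix"-*.jar; do
        # An unmatched glob stays literal, so confirm the file really exists.
        if [ -f "$jar" ]; then
            echo "$jar"
            return 0
        fi
    done
    return 1
}

# Example use from the project root (requires Maven and a JDK):
#   mvn clean compile package &&
#     java -jar "$(find_built_jar autoext-spark/target autoext-spark)"
```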

Building a jar for spark submission

  • mvn clean compile package -Pspark-submit

This build excludes the Spark and Scala dependencies from the package. On success, the jar can be found at autoext-spark/target/autoext-spark-xx-SNAPSHOT-submit-{spark.version}_{scala.version}.jar

To run this jar, use the spark-submit command.
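
A submission might look like the sketch below. Its assumptions are not stated on this page: a local master (local[*]) as the default, a jar manifest that names the main class, and the hypothetical wrapper submit_jar with its SPARK_SUBMIT and SPARK_MASTER variables. If the manifest has no main class, spark-submit needs a --class option, and this page does not give the class name.

```shell
#!/bin/sh
# Hypothetical wrapper (not part of this project): submit the jar given as $1
# to Spark and pass any remaining arguments through to the application.
submit_jar() {
    jar=$1
    shift
    # SPARK_SUBMIT may point at a specific Spark installation; the default is
    # the spark-submit found on PATH. SPARK_MASTER defaults to a local master,
    # which is an assumption made for this sketch.
    "${SPARK_SUBMIT:-spark-submit}" --master "${SPARK_MASTER:-local[*]}" "$jar" "$@"
}

# Example use: pass the path of the jar produced by the spark-submit profile.
#   submit_jar path/to/submit.jar
```

Keeping the master URL in a variable lets the same wrapper target a cluster without editing the script.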

For a quick-start tutorial, see the Clustering tutorial.

NOTE:

If you have trouble compiling the project, make sure your versions of the JDK, Scala, and Spark are compatible.