#Zeppelin
Documentation: User Guide
Mailing Lists: User and Dev mailing list
Continuous Integration:
Contributing: Contribution Guide
Issue Tracker: Jira
License: Apache 2.0
Zeppelin, a web-based notebook that enables interactive data analytics. You can make beautiful data-driven, interactive and collaborative documents with SQL, Scala and more.
Core feature:
- Web based notebook style editor.
- Built-in Apache Spark support
To know more about Zeppelin, visit our web site http://zeppelin.incubator.apache.org
- Git
- Java 1.7
- Tested on Mac OSX, Ubuntu 14.X, CentOS 6.X, Windows 7 Pro SP1
- Maven (if you want to build from the source code)
- Node.js Package Manager (npm, downloaded by Maven during build phase)
If you don't have requirements prepared, install it. (The installation method may vary according to your environment, example is for Ubuntu.)
sudo apt-get update
sudo apt-get install git
sudo apt-get install openjdk-7-jdk
sudo apt-get install npm
sudo apt-get install libfontconfig
If you are behind a corporate Proxy with NTLM authentication you can use Cntlm Authentication Proxy .
Before build start, run these commands from shell.
export http_proxy=http://localhost:3128
export https_proxy=http://localhost:3128
export HTTP_PROXY=http://localhost:3128
export HTTPS_PROXY=http://localhost:3128
npm config set proxy http://localhost:3128
npm config set https-proxy http://localhost:3128
npm config set registry "http://registry.npmjs.org/"
npm config set strict-ssl false
npm cache clean
git config --global http.proxy http://localhost:3128
git config --global https.proxy http://localhost:3128
git config --global url."http://".insteadOf git://
After build is complete, run these commands to cleanup.
npm config rm proxy
npm config rm https-proxy
git config --global --unset http.proxy
git config --global --unset https.proxy
git config --global --unset url."http://".insteadOf
Notes:
- If you are on Windows replace
export
withset
to set env variables - Replace
localhost:3128
with standard patternhttp://user:pwd@host:port
- Git configuration is needed because Bower use it for fetching from GitHub
wget http://www.eu.apache.org/dist/maven/maven-3/3.3.3/binaries/apache-maven-3.3.3-bin.tar.gz
sudo tar -zxf apache-maven-3.3.3-bin.tar.gz -C /usr/local/
sudo ln -s /usr/local/apache-maven-3.3.3/bin/mvn /usr/local/bin/mvn
Notes:
- Ensure node is installed by running
node --version
- Ensure maven is running version 3.1.x or higher with
mvn -version
- Configure maven to use more memory than usual by
export MAVEN_OPTS="-Xmx2g -XX:MaxPermSize=1024m"
If you want to build Zeppelin from the source, please first clone this repository, then:
mvn clean package -DskipTests [Options]
Each Interpreter requires different Options.
To build with a specific Spark version, Hadoop version or specific features, define one or more of the following profiles and options:
Set spark major version
Available profiles are
-Pspark-1.6
-Pspark-1.5
-Pspark-1.4
-Pspark-1.3
-Pspark-1.2
-Pspark-1.1
-Pcassandra-spark-1.5
-Pcassandra-spark-1.4
-Pcassandra-spark-1.3
-Pcassandra-spark-1.2
-Pcassandra-spark-1.1
minor version can be adjusted by -Dspark.version=x.x.x
set hadoop major version
Available profiles are
-Phadoop-0.23
-Phadoop-1
-Phadoop-2.2
-Phadoop-2.3
-Phadoop-2.4
-Phadoop-2.6
minor version can be adjusted by -Dhadoop.version=x.x.x
enable YARN support for local mode
YARN for local mode is not supported for Spark v1.5.0 or higher. Set
SPARK_HOME
instead.
enable PySpark support for local mode.
enable R support with SparkR integration.
another R support with SparkR integration as well as local mode support.
enable 3rd party vendor repository (cloudera)
For the MapR Hadoop Distribution, these profiles will handle the Hadoop version. As MapR allows different versions of Spark to be installed, you should specify which version of Spark is installed on the cluster by adding a Spark profile (-Pspark-1.2
, -Pspark-1.3
, etc.) as needed.
For Hive, check the hive/pom.xml and adjust the version installed as well. The correct Maven
artifacts can be found for every version of MapR at http://doc.mapr.com
Available profiles are
-Pmapr3
-Pmapr40
-Pmapr41
-Pmapr50
-Pmapr51
Here're some examples:
# basic build
mvn clean package -Pspark-1.6 -Phadoop-2.4 -Pyarn -Ppyspark
# spark-cassandra integration
mvn clean package -Pcassandra-spark-1.5 -Dhadoop.version=2.6.0 -Phadoop-2.6 -DskipTests
# with CDH
mvn clean package -Pspark-1.5 -Dhadoop.version=2.6.0-cdh5.5.0 -Phadoop-2.6 -Pvendor-repo -DskipTests
# with MapR
mvn clean package -Pspark-1.5 -Pmapr50 -DskipTests
mvn clean package -Dignite.version=1.1.0-incubating -DskipTests
mvn clean package -Pscalding -DskipTests
If you wish to configure Zeppelin option (like port number), configure the following files:
./conf/zeppelin-env.sh
./conf/zeppelin-site.xml
(You can copy ./conf/zeppelin-env.sh.template
into ./conf/zeppelin-env.sh
.
Same for zeppelin-site.xml
.)
Without SPARK_HOME
and HADOOP_HOME
, Zeppelin uses embedded Spark and Hadoop binaries that you have specified with mvn build option.
If you want to use system provided Spark and Hadoop, export SPARK_HOME
and HADOOP_HOME
in zeppelin-env.sh
.
You can use any supported version of spark without rebuilding Zeppelin.
# ./conf/zeppelin-env.sh
export SPARK_HOME=...
export HADOOP_HOME=...
Mesos
# ./conf/zeppelin-env.sh
export MASTER=mesos://...
export ZEPPELIN_JAVA_OPTS="-Dspark.executor.uri=/path/to/spark-*.tgz" or SPARK_HOME="/path/to/spark_home"
export MESOS_NATIVE_LIBRARY=/path/to/libmesos.so
If you set SPARK_HOME
, you should deploy spark binary on the same location to all worker nodes. And if you set spark.executor.uri
, every worker can read that file on its node.
Yarn
# ./conf/zeppelin-env.sh
export SPARK_HOME=/path/to/spark_dir
./bin/zeppelin-daemon.sh start
And browse localhost:8080 in your browser.
For configuration details check ./conf
subdirectory.
To package the final distribution including the compressed archive, run:
mvn clean package -Pbuild-distr
To build a distribution with specific profiles, run:
mvn clean package -Pbuild-distr -Pspark-1.5 -Phadoop-2.4 -Pyarn -Ppyspark
The profiles -Pspark-1.5 -Phadoop-2.4 -Pyarn -Ppyspark
can be adjusted if you wish to build to a specific spark versions, or omit support such as yarn
.
The archive is generated under zeppelin-distribution/target
directory
###Run end-to-end tests Zeppelin comes with a set of end-to-end acceptance tests driving headless selenium browser
# assumes zeppelin-server running on localhost:8080 (use -Durl=.. to override)
mvn verify
# or take care of starting/stoping zeppelin-server from packaged zeppelin-distribuion/target
mvn verify -P using-packaged-distr