Alluxio
Tachyon is a memory-centric distributed storage system enabling reliable data sharing at memory-speed across cluster frameworks, such as Spark and MapReduce. It achieves high performance by leveraging lineage information and using memory aggressively. Tachyon caches working set files in memory, thereby avoiding going to disk to load datasets that are frequently read. This enables different jobs/queries and frameworks to access cached files at memory speed.
Tachyon is Hadoop-compatible: existing Spark and MapReduce programs can run on top of it without any code changes. The project is open source (Apache License 2.0) and is deployed at multiple companies. It has more than 80 contributors from over 30 institutions, including Yahoo, Intel, Red Hat, and Tachyon Nexus. The project is the storage layer of the Berkeley Data Analytics Stack (BDAS) and is also part of the Fedora distribution.
Tachyon will be installed on the master and all the workers. For this simple installation, the configuration is identical on the master and the workers, so you can either use broadcast input in iTerm or simply rsync/copy the configuration files from the master to the other worker nodes.
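Once the configuration files described below exist on the master, one way to push them out is a small loop over the workers file. This is a sketch, not part of the official tooling: it assumes passwordless SSH as the ubuntu user, that rsync is installed on every node, and that TACHYON_HOME is set.

```shell
# Sketch: push $TACHYON_HOME/conf from the master to every worker listed in
# conf/workers. RSYNC can be overridden, e.g. RSYNC="rsync -n" for a dry run.
sync_conf() {
  local worker
  while read -r worker; do
    # Skip blank lines and comments in the workers file
    case "$worker" in ''|'#'*) continue ;; esac
    ${RSYNC:-rsync} -az "$TACHYON_HOME/conf/" "ubuntu@$worker:$TACHYON_HOME/conf/"
  done < "$TACHYON_HOME/conf/workers"
}
```

Run `sync_conf` on the master after you have finished editing `tachyon-env.sh` and `workers` in the steps below.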
Run the following on the master and all workers by SSH-ing into each node:
Install java-development-kit
master-worker-node$ sudo apt-get update
master-worker-node$ sudo apt-get install openjdk-7-jdk
Install Tachyon
master-worker-node$ wget https://github.com/amplab/tachyon/releases/download/v0.7.1/tachyon-0.7.1-bin.tar.gz -P ~/Downloads
master-worker-node$ sudo tar zxvf ~/Downloads/tachyon-* -C /usr/local
master-worker-node$ sudo mv /usr/local/tachyon-* /usr/local/tachyon
Change ownership of the Tachyon directory
master-worker-node$ sudo chown -R ubuntu /usr/local/tachyon
Set the TACHYON_HOME environment variable and add to PATH in .profile
master-worker-node$ sudo nano ~/.profile
Add the following to ~/.profile and source it
export TACHYON_HOME=/usr/local/tachyon
export PATH=$PATH:$TACHYON_HOME/bin
master-worker-node$ . ~/.profile
Set the TACHYON_MASTER_ADDRESS in tachyon-env
master-worker-node$ cp $TACHYON_HOME/conf/tachyon-env.sh.template $TACHYON_HOME/conf/tachyon-env.sh
master-worker-node$ nano $TACHYON_HOME/conf/tachyon-env.sh
Locate the following lines and change TACHYON_MASTER_ADDRESS to the master node's hostname, e.g. ip-172-31-239:
...
export JAVA="$JAVA_HOME/bin/java"
export TACHYON_MASTER_ADDRESS=<master-hostname>
export TACHYON_UNDERFS_ADDRESS=$TACHYON_HOME/underFSStorage
...
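For example, if your master's hostname is ip-172-31-239 (the example hostname above; substitute your own), the edited lines would read:

```shell
export JAVA="$JAVA_HOME/bin/java"
export TACHYON_MASTER_ADDRESS=ip-172-31-239
export TACHYON_UNDERFS_ADDRESS=$TACHYON_HOME/underFSStorage
```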
Place worker hostnames into the workers file under $TACHYON_HOME/conf
master-worker-node$ nano $TACHYON_HOME/conf/workers
By default, localhost is the only entry in the file. Remove it before adding the worker hostnames, e.g. with 3 workers:
ip-172-31-240
ip-172-31-241
ip-172-31-242
SSH into the master node:
localhost$ ssh -i ~/.ssh/personal_aws.pem ubuntu@master-public-dns
Format Tachyon
master-node$ $TACHYON_HOME/bin/tachyon format
Start Tachyon on master
master-node$ $TACHYON_HOME/bin/tachyon-start.sh all SudoMount
You can check whether your standalone cluster is up and running by visiting the WebUI at master-public-dns:19999. Be sure the number of available workers matches what you expect. The example described here is a cluster of 4 nodes with 3 acting as workers.
You can check how much memory each worker has allocated under the Workers tab.
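Beyond the WebUI, you can sanity-check the cluster from the master's shell. The following is a hedged sketch: `jps` ships with the JDK, and `runTests` is the bundled smoke-test entry point in the 0.7.x `tachyon` script; run `$TACHYON_HOME/bin/tachyon` with no arguments to list the subcommands your version actually supports.

```shell
# List running JVMs: the master should show a TachyonMaster process,
# and each worker (via SSH) should show a TachyonWorker process.
jps

# Run Tachyon's bundled read/write smoke tests against the cluster.
$TACHYON_HOME/bin/tachyon runTests
```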
Find out more about the Insight Data Engineering Fellows Program in New York and Silicon Valley, apply today, or sign up for program updates.
You can also read our engineering blog here.