Puppet module to install and manage components of a YARN installation of Cloudera's Distribution 4 (CDH4) for Apache Hadoop.
Installs HDFS, YARN MapReduce or MR1, Hive, HBase, Pig, Sqoop, ZooKeeper, Oozie and Hue. For this module to work, you must ensure that:
- Sun JRE version 6 or greater is installed
- Your package manager is configured with a repository containing the Cloudera 4 packages. (See examples/cloudera_apt.pp)
The cdh4::hadoop::master and cdh4::hadoop::worker classes will manage Hadoop services.
Clone (or copy) this repository into your puppet modules/cdh4 directory:
git clone git://github.com/wikimedia/cloudera-cdh4-puppet.git modules/cdh4
Alternatively, add it as a git submodule:
git submodule add git://github.com/wikimedia/cloudera-cdh4-puppet.git modules/cdh4
git commit -m 'Adding modules/cdh4 as a git submodule.'
The cdh4::apt_source class makes the packages available from Cloudera's apt repository. If you are installing on a different Linux distribution, you will need to make sure the packages are available some other way.
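As a minimal sketch, assuming cdh4::apt_source can be declared without parameters (none are documented here), a node could set up the repository before the client packages are installed. The explicit ordering arrow is an illustrative choice, not something this module is known to require:

```puppet
# Make Cloudera's apt repository available (class named in this README).
include cdh4::apt_source

# Install the CDH4 client packages.
include cdh4

# Assumption: order the repository setup before package installation,
# so the first Puppet run can already find the CDH4 packages.
Class['cdh4::apt_source'] -> Class['cdh4']
```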
include cdh4
class { "cdh4::hadoop::config":
  namenode_hostname => "namenode.hostname.org",
  mounts            => [
    "/var/lib/hadoop/data/a",
    "/var/lib/hadoop/data/b",
    "/var/lib/hadoop/data/c"
  ],
  dfs_name_dir      => ["/var/lib/hadoop/name", "/mnt/hadoop_name"],
}
This will ensure that CDH4 client packages are installed, and that Hadoop-related config files are in place with proper settings.
The mounts parameter assumes that you want to keep your dfs.datanode.data.dir, yarn.nodemanager.local-dirs, and yarn.nodemanager.log-dirs all as subdirectories in each of the mount points provided.
include cdh4::hadoop::yarn::master
This installs and starts up the NameNode, ResourceManager and HistoryServer.
include cdh4::hadoop::yarn::worker
This installs and starts up the DataNode and NodeManager.
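Putting the pieces above together, a YARN cluster could be laid out roughly as follows. This is only a sketch: the wrapper class my_hadoop and the node names are hypothetical, and only the classes and parameters already shown in this README are used.

```puppet
# Hypothetical wrapper class so that master and workers share identical settings.
class my_hadoop {
  include cdh4

  class { 'cdh4::hadoop::config':
    namenode_hostname => 'namenode.hostname.org',
    mounts            => ['/var/lib/hadoop/data/a', '/var/lib/hadoop/data/b'],
    dfs_name_dir      => ['/var/lib/hadoop/name'],
  }
}

# Hypothetical master node: NameNode, ResourceManager and HistoryServer.
node 'namenode.hostname.org' {
  include my_hadoop
  include cdh4::hadoop::yarn::master
}

# Hypothetical worker nodes: DataNode and NodeManager.
node /^worker\d+\.hostname\.org$/ {
  include my_hadoop
  include cdh4::hadoop::yarn::worker
}
```

Sharing one configuration class keeps namenode_hostname and the mount list consistent across every node in the cluster.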
include cdh4::hadoop::mr1::master
This installs and starts up the NameNode and JobTracker.
include cdh4::hadoop::mr1::worker
This installs and starts up the DataNode and TaskTracker.
When using MR1, also declare cdh4::hadoop::config with use_yarn set to false:
class { "cdh4::hadoop::config":
  use_yarn => false,
}
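For comparison, an MR1 layout might look like the sketch below. The wrapper class my_hadoop_mr1 and the node names are hypothetical, and it is an assumption that the other cdh4::hadoop::config parameters shown earlier are still supplied alongside use_yarn.

```puppet
# Hypothetical wrapper class holding the shared MR1 configuration.
class my_hadoop_mr1 {
  include cdh4

  class { 'cdh4::hadoop::config':
    namenode_hostname => 'namenode.hostname.org',
    mounts            => ['/var/lib/hadoop/data/a', '/var/lib/hadoop/data/b'],
    dfs_name_dir      => ['/var/lib/hadoop/name'],
    use_yarn          => false,  # select MR1 instead of YARN
  }
}

# Hypothetical MR1 master node.
node 'namenode.hostname.org' {
  include my_hadoop_mr1
  include cdh4::hadoop::mr1::master
}

# Hypothetical MR1 worker nodes.
node /^worker\d+\.hostname\.org$/ {
  include my_hadoop_mr1
  include cdh4::hadoop::mr1::worker
}
```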
See examples/ for more ideas on how to use this module. examples/analytics.pp shows an organized way you could group and install the cdh4 classes.
This module was developed for Ubuntu 12.04 LTS. Since Cloudera's package names are consistent across Linux distributions, much of this could work on other distributions.
Some small adjustments were made to make it work with CentOS.