Installs and configures Hadoop on Ubuntu instances that already have Hadoop installed from the official .deb package.
This cookbook is part of HadoopStack. To avoid spending time installing Hadoop on every instance, we use an image with Hadoop pre-installed. The cookbook currently supports Ubuntu with Hadoop pre-installed from the official .deb package.
Key | Type | Description | Default |
--- | --- | --- | --- |
['hadoop']['mapred_user'] | String | User that runs the JobTracker/TaskTracker daemons | mapred |
['hadoop']['hdfs_user'] | String | User that runs the NameNode/DataNode daemons | hdfs |
['hadoop']['group'] | String | Common system group for the Hadoop daemons | hadoop |
['hadoop']['jobtracker'] | String | IP address of the jobtracker (required) | |
['hadoop']['namenode'] | String | IP address of the namenode (required) | |
['hdfs_replication'] | Integer | HDFS replication factor | 2 |
['hadoop']['dfs_dir'] | String | Parent directory of the namenode and datanode directories | /mnt/dfs |
['hadoop']['namenode_dir'] | String | Namenode directory | /mnt/dfs/nn |
['hadoop']['datanode_dir'] | String | Datanode directory | /mnt/dfs/dn |
['hadoop']['mapred_local_dir'] | String | MapReduce local directory | /mnt/mapred/local |
['hadoop']['mapred_system_dir'] | String | MapReduce system directory | /mnt/mapred/system |
['hadoop']['log_dir'] | String | Log directory for the Hadoop daemons | /mnt/log/hadoop |
['hadoop']['pid_dir'] | String | PID directory for the Hadoop daemons | /var/run/hadoop |
['hadoop']['role'] | String | Hadoop role of the instance (jobtracker, tasktracker, namenode or datanode) | |
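To illustrate how the defaults above combine with the values a role sets, here is a minimal sketch that models the documented defaults as a plain Ruby hash and merges a role's attributes over them. The `deep_merge` helper and the IP addresses are hypothetical; Chef's real attribute precedence has more levels than this, but a role's `default_attributes` do take priority over a cookbook's attribute-file defaults.

```ruby
# Sketch only: the documented defaults as a plain Ruby hash. In the
# cookbook they live in an attributes file that Chef loads.
DEFAULTS = {
  "hadoop" => {
    "mapred_user" => "mapred",
    "hdfs_user" => "hdfs",
    "group" => "hadoop",
    "dfs_dir" => "/mnt/dfs",
    "namenode_dir" => "/mnt/dfs/nn",
    "datanode_dir" => "/mnt/dfs/dn",
    "mapred_local_dir" => "/mnt/mapred/local",
    "mapred_system_dir" => "/mnt/mapred/system",
    "log_dir" => "/mnt/log/hadoop",
    "pid_dir" => "/var/run/hadoop"
  }
}.freeze

# Hypothetical helper: recursively merge +override+ into +base+. Keys in
# +override+ win; when both sides hold a Hash they are merged key by key
# instead of the override replacing the whole sub-hash. This is a
# simplification of Chef's attribute precedence, not a reimplementation.
def deep_merge(base, override)
  base.merge(override) do |_key, old_value, new_value|
    if old_value.is_a?(Hash) && new_value.is_a?(Hash)
      deep_merge(old_value, new_value)
    else
      new_value
    end
  end
end

# Mirrors the jobtracker role's attributes; the IP addresses are made up.
role_attributes = {
  "hadoop" => {
    "jobtracker" => "10.0.0.10",
    "namenode" => "10.0.0.11",
    "role" => "jobtracker"
  }
}

node = deep_merge(DEFAULTS, role_attributes)
puts node["hadoop"]["namenode"]   # => 10.0.0.11 (set by the role)
puts node["hadoop"]["hdfs_user"]  # => hdfs (cookbook default)
```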
Create roles for the appropriate services: jobtracker, tasktracker, namenode and datanode. Update each role's run_list and set at least two attributes, ['hadoop']['namenode'] and ['hadoop']['jobtracker'].
If you are using traditional HDFS:
name "jobtracker"
description "Role to initiate jobtracker"
run_list [<recipes for this role>]
default_attributes("hadoop" => {
  "jobtracker" => <jobtracker_ip>,
  "namenode" => <namenode_ip>,
  "role" => "jobtracker"
})
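Because every role must set ['hadoop']['namenode'] and ['hadoop']['jobtracker'], it can help to check a role's attributes before uploading it. The following is a minimal sketch of such a check; `role_errors`, `REQUIRED_KEYS` and `KNOWN_ROLES` are hypothetical names and are not part of the cookbook.

```ruby
# Hypothetical helper (not part of the cookbook): list problems with a
# role's default_attributes, expressed as a plain Ruby hash.
REQUIRED_KEYS = %w[namenode jobtracker].freeze
KNOWN_ROLES = %w[namenode datanode jobtracker tasktracker].freeze

def role_errors(attrs)
  hadoop = attrs.fetch("hadoop", {})
  # A required key counts as missing when it is absent or blank.
  missing = REQUIRED_KEYS.select { |key| hadoop[key].to_s.strip.empty? }
  errors = missing.map { |key| "missing ['hadoop']['#{key}']" }
  role = hadoop["role"]
  errors << "unknown role #{role.inspect}" if role && !KNOWN_ROLES.include?(role)
  errors
end

jobtracker_role = {
  "hadoop" => {
    "jobtracker" => "10.0.0.10", # stands in for <jobtracker_ip>
    "namenode" => "10.0.0.11",   # stands in for <namenode_ip>
    "role" => "jobtracker"
  }
}

p role_errors(jobtracker_role) # => []
p role_errors({ "hadoop" => { "role" => "jobtracker" } })
# => ["missing ['hadoop']['namenode']", "missing ['hadoop']['jobtracker']"]
```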
If you are using S3 as the storage backend:
name "tasktracker"
description "Role to initiate tasktracker"
run_list [<recipes for this role>]
default_attributes("hadoop" => {
  "jobtracker" => <jobtracker_ip>,
  "namenode" => <namenode_ip>,
  "role" => "tasktracker",
  "dfs" => {
    "uri" => "s3://"
  },
  "s3" => {
    "bucket" => <bucket_name>
  }
})
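The two role variants differ mainly in which filesystem Hadoop treats as the default. The sketch below shows one plausible way to derive the Hadoop 1.x `fs.default.name` value from these attributes. The function name and the namenode port 8020 are assumptions and are not taken from the cookbook's templates.

```ruby
# Hypothetical sketch: derive Hadoop 1.x's fs.default.name from the role
# attributes. The namenode port is an assumption.
NAMENODE_PORT = 8020

def default_fs_uri(attrs)
  hadoop = attrs.fetch("hadoop")
  if hadoop.dig("dfs", "uri") == "s3://"
    # S3 backend: the bucket becomes the default filesystem.
    bucket = hadoop.dig("s3", "bucket")
    raise ArgumentError, "['hadoop']['s3']['bucket'] is required for S3" if bucket.nil?
    "s3://#{bucket}"
  else
    # Traditional HDFS: point at the namenode.
    "hdfs://#{hadoop.fetch('namenode')}:#{NAMENODE_PORT}"
  end
end

s3_role = {
  "hadoop" => {
    "namenode" => "10.0.0.11",
    "dfs" => { "uri" => "s3://" },
    "s3" => { "bucket" => "my-bucket" }
  }
}
hdfs_role = { "hadoop" => { "namenode" => "10.0.0.11" } }

puts default_fs_uri(s3_role)   # => s3://my-bucket
puts default_fs_uri(hdfs_role) # => hdfs://10.0.0.11:8020
```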
The default recipe creates the following configuration files in the /etc/hadoop directory, using the ERB templates in templates/:
- core-site.xml
- mapred-site.xml
- hdfs-site.xml
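The cookbook's actual templates are not reproduced here. The following is a minimal, hypothetical stand-in that renders an hdfs-site.xml with Ruby's standard ERB library, filling the Hadoop 1.x properties `dfs.replication`, `dfs.name.dir` and `dfs.data.dir` from the attributes in the table. The template text and `render_hdfs_site` are illustrative only.

```ruby
require "erb"

# Hypothetical template; the cookbook's own templates/ files may differ.
HDFS_SITE_TEMPLATE = <<~TEMPLATE
  <?xml version="1.0"?>
  <configuration>
  <% properties.each do |name, value| -%>
    <property>
      <name><%= name %></name>
      <value><%= ERB::Util.html_escape(value.to_s) %></value>
    </property>
  <% end -%>
  </configuration>
TEMPLATE

# Render hdfs-site.xml from the ['hadoop'] attributes. Replication is passed
# separately because the table documents it as ['hdfs_replication'].
def render_hdfs_site(hadoop, replication)
  properties = {
    "dfs.replication" => replication,
    "dfs.name.dir" => hadoop["namenode_dir"],
    "dfs.data.dir" => hadoop["datanode_dir"]
  }
  # trim_mode "-" lets "-%>" swallow the newline after control lines.
  ERB.new(HDFS_SITE_TEMPLATE, trim_mode: "-").result(binding)
end

puts render_hdfs_site({ "namenode_dir" => "/mnt/dfs/nn", "datanode_dir" => "/mnt/dfs/dn" }, 2)
```

In a Chef recipe, the same rendering step would normally be done by a `template` resource pointing at a file in templates/.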
This recipe is included by the default recipe. It creates the Hadoop directories and sets the appropriate permissions on them.
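As a rough illustration of what this step does, the sketch below creates the directories from the attribute table with Ruby's `FileUtils` under a configurable root, so it can be tried without root privileges. The owner and mode assigned to each directory are assumptions, not the cookbook's actual choices. Ownership is only reported, because `chown` requires root and the hdfs/mapred users to exist. In a Chef recipe this would normally be a `directory` resource with `owner`, `group`, `mode` and `recursive` set.

```ruby
require "fileutils"
require "tmpdir"

# Values from the attribute table (defaults).
SAMPLE_LAYOUT = {
  "hdfs_user" => "hdfs",
  "mapred_user" => "mapred",
  "group" => "hadoop",
  "namenode_dir" => "/mnt/dfs/nn",
  "datanode_dir" => "/mnt/dfs/dn",
  "mapred_local_dir" => "/mnt/mapred/local",
  "mapred_system_dir" => "/mnt/mapred/system",
  "log_dir" => "/mnt/log/hadoop",
  "pid_dir" => "/var/run/hadoop"
}.freeze

# Hypothetical owner and mode for each directory.
def directory_plan(h)
  [
    [h["namenode_dir"], h["hdfs_user"], 0o755],
    [h["datanode_dir"], h["hdfs_user"], 0o755],
    [h["mapred_local_dir"], h["mapred_user"], 0o755],
    [h["mapred_system_dir"], h["mapred_user"], 0o755],
    [h["log_dir"], "root", 0o775],
    [h["pid_dir"], "root", 0o775]
  ]
end

# Create each directory (with parents) under +root+ and apply its mode.
# Returns [path, "owner:group", mode] rows; no chown is performed.
def create_directories(h, root: "/")
  directory_plan(h).map do |path, owner, mode|
    full_path = File.join(root, path)
    FileUtils.mkdir_p(full_path)
    File.chmod(mode, full_path)
    [full_path, "#{owner}:#{h['group']}", format("%04o", mode)]
  end
end

# Dry run inside a temporary directory that is removed afterwards.
Dir.mktmpdir do |tmp|
  create_directories(SAMPLE_LAYOUT, root: tmp).each { |row| puts row.join("  ") }
end
```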
This recipe enables and starts jobtracker service.
This recipe enables and starts tasktracker service.
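"Enables and starts" means two separate steps: registering the service to start at boot, and starting it now. The dry-run sketch below builds both Ubuntu commands for a given ['hadoop']['role']. The `hadoop-<role>` init script naming is an assumption about the .deb package, and `service_commands` is a hypothetical helper. In a Chef recipe, the equivalent is typically a `service` resource with `action [:enable, :start]`.

```ruby
# Hypothetical mapping from ['hadoop']['role'] to an init script; the
# "hadoop-<role>" name is an assumption about the .deb package.
ROLES = %w[namenode datanode jobtracker tasktracker].freeze

def service_commands(role)
  raise ArgumentError, "unknown role: #{role}" unless ROLES.include?(role)
  name = "hadoop-#{role}"
  [
    ["update-rc.d", name, "defaults"], # enable: start at boot
    ["service", name, "start"]         # start: run it now
  ]
end

# Dry run: print the commands instead of executing them.
service_commands("jobtracker").each { |cmd| puts cmd.join(" ") }
```

To actually run them on an instance, each command array could be passed to `system(*cmd)` as root.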
- Fork the repository on Github
- Create a named feature branch
- Write your change
- Test it thoroughly
- Submit a Pull Request using Github
Authors: Shashank Sahni