Tools for working with Hadoop, written with performance in mind.
This has been tested with the HDFS protocol used by CDH 5.x.

See our latest release, v1.0.1!
By default, `hh` behaves the same as `hdfs dfs` or `hadoop fs` in terms of which user name to use for HDFS and which namenodes to connect to.

The default is to use your current Unix username when accessing HDFS. This can be overridden either by using the `HADOOP_USER_NAME` environment variable:
```bash
# This trick also works with `hdfs dfs` and `hadoop fs`
export HADOOP_USER_NAME=amber
```
or by adding the following to your `~/.hh` configuration file:

```
hdfs {
  user = "amber"
}
```
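For a quick check that the override is in effect, the environment variable can also be set for a single command; the `hh ls` invocation and path below are illustrative:

```bash
# Run a single command as the given HDFS user (illustrative path)
HADOOP_USER_NAME=amber hh ls /user/amber
```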
The default is to look up the namenode configuration from `/etc/hadoop/conf/core-site.xml` and `/etc/hadoop/conf/hdfs-site.xml`.
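For reference, the namenode address in these files normally comes from the `fs.defaultFS` property of `core-site.xml`; a typical entry (hostname illustrative) looks like:

```xml
<!-- /etc/hadoop/conf/core-site.xml (illustrative values) -->
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://namenode.example.com:8020</value>
</property>
```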
This can be overridden by adding the following to your `~/.hh` configuration file:
```
namenode {
  host = "hostname or ip address"
}
```
or if you're using a non-standard namenode port:

```
namenode {
  host = "hostname or ip address"
  port = 7020 # defaults to 8020
}
```
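Putting these together, a minimal `~/.hh` that pins both the user and the namenode might look like this (hostname illustrative):

```
hdfs {
  user = "amber"
}

namenode {
  host = "namenode.example.com"
  port = 8020
}
```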
NOTE: You cannot currently specify multiple namenodes using the `~/.hh` config file, but this would be easy to add. If you would like this feature then please open an issue.
Sometimes it can be convenient to access HDFS over a SOCKS proxy. The easiest way to get this to work is to connect to a server which can access the namenode using `ssh <host> -D1080`. This sets up a SOCKS proxy locally on port 1080 which can access everything that `<host>` can access.
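For example, assuming a gateway host `gateway.example.com` that can reach the namenode:

```bash
# Open a local SOCKS proxy on port 1080, tunnelled through the gateway
ssh gateway.example.com -D1080
```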
To get `hh` to make use of this proxy, add the following to your `~/.hh` configuration file:
```
proxy {
  host = "127.0.0.1"
  port = 1080
}
```
In order to use Kerberos authentication you must supply information about the principal for both your user and your namenode. These are looked up in `/etc/hadoop/conf/core-site.xml` and `/etc/hadoop/conf/hdfs-site.xml` by default.
```
namenode {
  principal = "hdfs/hostname@REALM.COM"
}

auth {
  user = "username@REALM.COM"
}
```
If you don't provide an `auth.user` field, it is assumed to be `hdfs.user@REALM.COM`, where `REALM.COM` comes from the principal of one of the namenodes.
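A typical Kerberos session, assuming `hh` picks up your ticket from the standard credential cache (realm, username, and path are illustrative):

```bash
# Obtain a Kerberos ticket first, then run hh as usual
kinit username@REALM.COM
hh ls /user/username
```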