Migration Guide
This page provides information on how to use QFS with Apache Hadoop.
With QFS 1.0.1, we have simplified the integration of QFS with Apache Hadoop deployments. You only need to copy the Hadoop QFS plugin jar and set the Java library path in order to use QFS as the backing store for Hadoop.
You can obtain the Hadoop QFS plugin jar in one of two ways:
- If there is a QFS binary tarball for your platform, then the tarball already has the jars and native libraries. Obtain the QFS tarball, say, $QFSTAR.tgz:

$ tar -xvzf $QFSTAR.tgz && cd $QFSTAR
$ ls -1 lib/hadoop*.jar
lib/hadoop-0.23.4-qfs-1.0.1.jar
lib/hadoop-1.0.2-qfs-1.0.1.jar
lib/hadoop-1.0.4-qfs-1.0.1.jar
lib/hadoop-1.1.0-qfs-1.0.1.jar
lib/hadoop-2.0.2-alpha-qfs-1.0.1.jar
- If there is no pre-built tarball for your platform, you could obtain the QFS source and build the tarball yourself. As long as you have the prerequisite packages on your system (see the Developer Documentation), all you need to do is run
make tarball
from the QFS source directory; this will produce $QFSTAR.tgz in the build directory, as sketched below.
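As an illustration, a complete from-source build might look like the following sketch, assuming the prerequisites are installed; the repository URL is the standard QFS one, and the exact tarball name under build/ depends on your platform and version:

$ git clone https://github.com/quantcast/qfs.git
$ cd qfs
$ make tarball
$ ls build/*.tgz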
When the Hadoop QFS jar is in your class path and the QFS native libraries are loadable, accessing QFS from Hadoop is as simple as:
$ cd ${HADOOP_HOME}
$ bin/hadoop fs -Dfs.qfs.impl=com.quantcast.qfs.hadoop.QuantcastFileSystem \
-Dfs.default.name=qfs://localhost:20000 \
-Dfs.qfs.metaServerHost=localhost \
-Dfs.qfs.metaServerPort=20000 \
-ls /
In the example above, the sample QFS metaserver (see Getting Started) listens on port 20000 on localhost.
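The same -D settings work with any hadoop fs subcommand. As a hedged sketch, copying a local file into QFS and reading it back might look like this (the file and path names are illustrative):

$ bin/hadoop fs -Dfs.qfs.impl=com.quantcast.qfs.hadoop.QuantcastFileSystem \
-Dfs.default.name=qfs://localhost:20000 \
-Dfs.qfs.metaServerHost=localhost \
-Dfs.qfs.metaServerPort=20000 \
-put /etc/hosts /hosts.copy
$ bin/hadoop fs -Dfs.qfs.impl=com.quantcast.qfs.hadoop.QuantcastFileSystem \
-Dfs.default.name=qfs://localhost:20000 \
-Dfs.qfs.metaServerHost=localhost \
-Dfs.qfs.metaServerPort=20000 \
-cat /hosts.copy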
- To add the Hadoop QFS jar to your classpath, either copy **hadoop-<xxx>-qfs-<yyy>.jar** to `$HADOOP_HOME/lib/` or add the absolute path of the jar file to `$HADOOP_HOME/conf/hadoop-env.sh` as `export HADOOP_CLASSPATH=</absolute/path/of/the.jar>`.
- To make the QFS native libraries loadable, you could set `JAVA_LIBRARY_PATH` or `java.library.path` to `${QFSTAR}/lib/`. For Hadoop 1.x.x versions you may need to copy `${QFSTAR}/lib/libqfs_*` to `${HADOOP_HOME}/lib/native/<your platform>/`, and also edit the `${HADOOP_HOME}/conf/hadoop-env.sh` file to add the line `export LD_LIBRARY_PATH=${HADOOP_HOME}/lib/native/<your-platform>`. A combined sketch follows this list.
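For illustration, the hadoop-env.sh fragment below combines both steps. It is a sketch only: /opt/$QFSTAR, the jar version, and the Linux-amd64-64 platform directory are assumptions to be replaced with your actual paths.

# Illustrative additions to ${HADOOP_HOME}/conf/hadoop-env.sh.
# /opt/$QFSTAR, the jar version, and the platform directory are assumptions.
export HADOOP_CLASSPATH=/opt/$QFSTAR/lib/hadoop-1.0.4-qfs-1.0.1.jar
export JAVA_LIBRARY_PATH=/opt/$QFSTAR/lib
# Hadoop 1.x.x only: after copying libqfs_* into the native directory,
# make it visible to the dynamic linker as well.
export LD_LIBRARY_PATH=${HADOOP_HOME}/lib/native/Linux-amd64-64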
Once these are set, you could access QFS from Hadoop using `qfs://` URIs by setting `fs.qfs.impl` to `com.quantcast.qfs.hadoop.QuantcastFileSystem` and `fs.default.name` (or `fs.defaultFS` in newer Hadoop versions) to `qfs://<qfs-meta-host>:<qfs-meta-port>`, as shown in the example above.
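If you would rather not repeat the -D options on every invocation, you could persist the same four properties in ${HADOOP_HOME}/conf/core-site.xml. A minimal sketch, assuming a metaserver at metahost:20000 and no pre-existing core-site.xml (merge by hand if you already have one):

$ cat > ${HADOOP_HOME}/conf/core-site.xml <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>fs.qfs.impl</name>
    <value>com.quantcast.qfs.hadoop.QuantcastFileSystem</value>
  </property>
  <property>
    <name>fs.default.name</name>
    <value>qfs://metahost:20000</value>
  </property>
  <property>
    <name>fs.qfs.metaServerHost</name>
    <value>metahost</value>
  </property>
  <property>
    <name>fs.qfs.metaServerPort</name>
    <value>20000</value>
  </property>
</configuration>
EOF
$ bin/hadoop fs -ls /

With this in place, bare qfs:// paths and plain bin/hadoop fs commands resolve against QFS.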
If you have existing data in HDFS that you want to copy to QFS in order to use QFS as your backing store, you could run a distributed copy provided by Apache Hadoop. In the following example, `namehost:8020` is the host name and port number of the namenode of an HDFS instance and `metahost:20000` is the corresponding location of a QFS metaserver.
$ cd ${HADOOP_HOME}
$ bin/hadoop distcp -Dfs.qfs.impl=com.quantcast.qfs.hadoop.QuantcastFileSystem \
-Dfs.default.name=qfs://metahost:20000 \
-Dfs.qfs.metaServerHost=metahost \
-Dfs.qfs.metaServerPort=20000 \
hdfs://namehost:8020/hdfs_dir/70MFile qfs://metahost:20000/qfs_dir/70Mcopy
Note that this is a MapReduce job, so a job tracker and task trackers must be available to perform the distributed copy.
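After the distcp job completes, a quick sanity check is to compare the reported sizes on both sides; a sketch using the same illustrative hosts and paths:

$ bin/hadoop fs -du hdfs://namehost:8020/hdfs_dir/70MFile
$ bin/hadoop fs -Dfs.qfs.impl=com.quantcast.qfs.hadoop.QuantcastFileSystem \
-Dfs.qfs.metaServerHost=metahost \
-Dfs.qfs.metaServerPort=20000 \
-du qfs://metahost:20000/qfs_dir/70Mcopy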
If you want to submit a job to Apache Hadoop that uses QFS, you could follow this example:
$ cd ${HADOOP_HOME}
$ bin/hadoop jar hadoop-examples-1.0.3.jar randomwriter \
-Dfs.qfs.impl=com.quantcast.qfs.hadoop.QuantcastFileSystem \
-Dfs.default.name=qfs://metahost:20000 \
-Dfs.qfs.metaServerHost=metahost \
-Dfs.qfs.metaServerPort=20000 \
/tmp/randomOut
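When the job finishes, listing the output directory on QFS confirms the job wrote to the intended filesystem. A sketch (the part-file names vary by Hadoop version):

$ bin/hadoop fs -Dfs.qfs.impl=com.quantcast.qfs.hadoop.QuantcastFileSystem \
-Dfs.qfs.metaServerHost=metahost \
-Dfs.qfs.metaServerPort=20000 \
-ls qfs://metahost:20000/tmp/randomOut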