This directory contains an Apache Hadoop MapReduce InputFormat/OutputFormat implementation for TensorFlow's TFRecords format. It can also be used with Apache Spark.
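The formats read and write TensorFlow's TFRecord framing. Per TensorFlow's documentation, each record is a little-endian uint64 length, a masked CRC32C of those length bytes, the payload, and a masked CRC32C of the payload. The JDK-only sketch below shows that framing so the on-disk layout is concrete. It assumes Java 9+ for `java.util.zip.CRC32C`. The class and its methods (`TFRecordFraming`, `writeRecord`, `readAll`, `maskedCrc32c`) are hypothetical illustrations and are not part of the tensorflow-hadoop API.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.CRC32C;

// Hypothetical helper illustrating TFRecord framing; not part of tensorflow-hadoop.
public class TFRecordFraming {
    // Constant added when masking a CRC, as in TensorFlow's TFRecord description.
    private static final int MASK_DELTA = 0xa282ead8;

    // Masked CRC32C: rotate the CRC right by 15 bits, then add MASK_DELTA (wraps mod 2^32).
    static int maskedCrc32c(byte[] data, int off, int len) {
        CRC32C crc = new CRC32C();
        crc.update(data, off, len);
        int c = (int) crc.getValue();
        return ((c >>> 15) | (c << 17)) + MASK_DELTA;
    }

    // Writes one record: uint64 length, masked CRC of the length bytes,
    // payload, masked CRC of the payload (all integers little-endian).
    static void writeRecord(OutputStream out, byte[] payload) throws IOException {
        byte[] len = ByteBuffer.allocate(8).order(ByteOrder.LITTLE_ENDIAN)
                .putLong(payload.length).array();
        out.write(len);
        out.write(le32(maskedCrc32c(len, 0, 8)));
        out.write(payload);
        out.write(le32(maskedCrc32c(payload, 0, payload.length)));
    }

    // Parses every record in an in-memory buffer, verifying both CRCs.
    static List<byte[]> readAll(byte[] file) throws IOException {
        ByteBuffer buf = ByteBuffer.wrap(file).order(ByteOrder.LITTLE_ENDIAN);
        List<byte[]> records = new ArrayList<>();
        while (buf.remaining() > 0) {
            if (buf.remaining() < 12) throw new IOException("truncated header");
            int lenPos = buf.position();
            long length = buf.getLong();
            if (buf.getInt() != maskedCrc32c(file, lenPos, 8)) {
                throw new IOException("length CRC mismatch");
            }
            // Payload plus its 4-byte CRC must still fit in the buffer.
            if (length < 0 || length > buf.remaining() - 4) {
                throw new IOException("truncated record");
            }
            byte[] payload = new byte[(int) length];
            buf.get(payload);
            if (buf.getInt() != maskedCrc32c(payload, 0, payload.length)) {
                throw new IOException("data CRC mismatch");
            }
            records.add(payload);
        }
        return records;
    }

    // Encodes an int as 4 little-endian bytes.
    private static byte[] le32(int v) {
        return ByteBuffer.allocate(4).order(ByteOrder.LITTLE_ENDIAN).putInt(v).array();
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeRecord(out, "hello".getBytes(StandardCharsets.UTF_8));
        writeRecord(out, "tfrecord".getBytes(StandardCharsets.UTF_8));
        // Round-trip: prints each payload on its own line.
        for (byte[] r : readAll(out.toByteArray())) {
            System.out.println(new String(r, StandardCharsets.UTF_8));
        }
    }
}
```

In this sketch the length gets its own CRC so a reader can reject a corrupted length before trusting it to size the payload read.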
-
Tested with Hadoop 2.6.0. Patches are welcome if there are incompatibilities with your Hadoop version.
- 08/20/2018 - Reverted artifactId back to org.tensorflow.tensorflow-hadoop
- 05/29/2018 - Changed the artifactId from org.tensorflow.tensorflow-hadoop to org.tensorflow.hadoop
-
Compile the code
mvn clean package
Alternatively, if you would like to build jars for a different version of TensorFlow, e.g., 1.5.0:
mvn versions:set -DnewVersion=1.5.0
mvn clean package
-
Optionally install (or deploy) the jars
mvn install
After installation (or deployment), the package can be used with the following dependency:
<dependency>
  <groupId>org.tensorflow</groupId>
  <artifactId>tensorflow-hadoop</artifactId>
  <version>1.10.0</version>
</dependency>
The Hadoop MapReduce example can be found here.
The Spark-TensorFlow-Connector uses TensorFlow Hadoop to load and save TensorFlow's TFRecords format using Spark DataFrames.