
# Text2Hive

NOTE: This project is a learning exercise for me with Scala, sbt, and the Hadoop APIs.

Creates a Hive table and copies a folder to HDFS based on the XML config file(s) passed as arguments. Meant to replicate how Hue lets you upload a text file and creates a Hive table from it.

Steps to run:

  1. `sbt assembly`
  2. `java -jar Text2Hive-assembly-0.1.jar settings.xml`

Example XML config file:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<config>
  <coreXml>/etc/hadoop/conf/core-site.xml</coreXml>
  <hdfsXml>/etc/hadoop/conf/hdfs-site.xml</hdfsXml>
  <hiveXml>/etc/hive/conf/hive-site.xml</hiveXml>
  <userKeytab>/home/neil/neil.keytab</userKeytab>
  <src>/home/neil/movie_metadatav</src>
  <dest>/user/neil/movie_metadata</dest>
  <tableType>internal</tableType>
  <isHeaders>yes</isHeaders>
  <delimiter>,</delimiter>
  <quoteChar>"</quoteChar>
  <escapeChar>\</escapeChar>
  <dbTable>default.movie_metadata</dbTable>
  <thriftServer>localhost:10011/default</thriftServer>
</config>
```
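A config file like the one above can be read into a flat key/value map. This is a minimal sketch using only the JDK's built-in DOM parser (not necessarily how Text2Hive itself parses the file); the `ConfigReader` object and the idea of returning a `Map[String, String]` are assumptions for illustration:

```scala
import java.io.StringReader
import javax.xml.parsers.DocumentBuilderFactory
import org.w3c.dom.Node
import org.xml.sax.InputSource

// Hypothetical helper: turns the flat <config> children into a Map.
object ConfigReader {
  def parse(xml: String): Map[String, String] = {
    val doc = DocumentBuilderFactory.newInstance()
      .newDocumentBuilder()
      .parse(new InputSource(new StringReader(xml)))
    val children = doc.getDocumentElement.getChildNodes
    (0 until children.getLength)
      .map(children.item)
      .filter(_.getNodeType == Node.ELEMENT_NODE) // skip whitespace text nodes
      .map(n => n.getNodeName -> n.getTextContent.trim)
      .toMap
  }
}

object ConfigDemo extends App {
  val settings = ConfigReader.parse(
    """<config>
      |  <tableType>internal</tableType>
      |  <delimiter>,</delimiter>
      |</config>""".stripMargin)
  println(settings("tableType"))
}
```

With the full sample file, `settings("dbTable")` would yield `default.movie_metadata`, and so on for the other keys.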

A couple of notes:

  - `src` is a folder on the local Linux filesystem; `dest` is a folder on HDFS.
  - If `tableType` is `internal`, then `dest` (HDFS) is ignored and the folder is instead moved to the Hive warehouse directory.
  - `dbTable` requires both the database name and the table name.
  - If `isHeaders` is `yes`, the first line of each file is assumed to contain the header; otherwise column names are generated automatically.
  - If Kerberos is required, please specify a keytab via `userKeytab`.
  - All files in the `src` folder are expected to have the same format.
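The `isHeaders` behavior described above can be sketched as follows. This is an illustrative snippet, not the project's actual code: the `Headers` object and the `col0`, `col1`, ... naming scheme for generated columns are assumptions:

```scala
import java.util.regex.Pattern

// Hypothetical sketch: derive column names from the first line of a file.
// If isHeaders is true the first line supplies the names; otherwise
// synthetic names (col0, col1, ...) are generated, one per field.
object Headers {
  def columnNames(firstLine: String, delimiter: String, isHeaders: Boolean): Seq[String] = {
    // Pattern.quote so delimiters like "|" are treated literally;
    // limit -1 keeps trailing empty fields.
    val fields = firstLine.split(Pattern.quote(delimiter), -1)
    if (isHeaders) fields.map(_.trim).toSeq
    else fields.indices.map(i => s"col$i")
  }
}

object HeadersDemo extends App {
  println(Headers.columnNames("movie_title,director,year", ",", isHeaders = true))
  println(Headers.columnNames("Alien,Ridley Scott,1979", ",", isHeaders = false))
}
```

Either way, the resulting names would feed the generated `CREATE TABLE` column list.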