DB2/DashDB Hi-Speed connector for Apache Spark

This project is no longer actively maintained and has been archived.


A library for fast loading and unloading of data between Apache Spark and DB2/DashDB.

Requirements

This library requires Apache Spark 1.6+, the Apache Commons CSV library, and the DB2 JDBC driver.

What the package does

This package provides options for:

Persisting data from Apache Spark into DB2/DashDB at high speed, leveraging the DB2 LOAD utility.
Loading data from DB2/DashDB into Apache Spark using parallel reads from database partitions.
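How the connector implements its parallel read is not documented here. As an illustration only, one common DB2 technique is to give each Spark task a predicate that selects just the rows stored on one database partition, using DB2's `DBPARTITIONNUM` function. The helper below (the object name `PartitionPredicates` and its parameters are hypothetical) builds such predicates, assuming the partition numbers are already known, for example from the system catalog.

```scala
// Hypothetical helper, not part of this connector: one SQL predicate per DB2 database partition.
// DBPARTITIONNUM(col) returns, for each row, the number of the database partition that stores it;
// any column of the table can be passed as the argument.
object PartitionPredicates {
  def forPartitions(column: String, partitionNums: Seq[Int]): Array[String] =
    partitionNums.map(n => s"DBPARTITIONNUM($column) = $n").toArray

  def main(args: Array[String]): Unit = {
    // Assumes a table spread over partitions 0, 1 and 2 that has a column named ID
    forPartitions("ID", Seq(0, 1, 2)).foreach(println)
  }
}
```

In stock Spark 1.6, an array like this could be passed as the `predicates` argument of `DataFrameReader.jdbc(url, table, predicates, connectionProperties)`, which creates one Spark partition per predicate; whether the connector does something similar internally is not stated in this README.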

Build procedure

git clone https://github.com/SparkTC/spark-db2
cd spark-db2
sbt clean package
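`sbt clean package` writes the connector jar under `target/scala-<version>/`. One way to use it from another sbt project, assuming that project follows sbt's default layout, is to copy the jar into the project's `lib/` directory (sbt adds unmanaged jars from there to the classpath) and declare the other requirements as managed dependencies. The versions below are illustrative examples, not versions pinned by this repository, and the Scala version must match the one the connector was compiled with.

```scala
// build.sbt of a hypothetical downstream project.
// The connector jar produced by `sbt clean package` is assumed to be copied into lib/.
scalaVersion := "2.10.6"

libraryDependencies ++= Seq(
  "org.apache.spark"   %% "spark-sql"   % "1.6.3" % "provided", // supplied by the Spark runtime
  "org.apache.commons" %  "commons-csv" % "1.4",                // Apache Commons CSV
  "com.ibm.db2"        %  "jcc"         % "11.5.8.0"            // DB2 JDBC driver
)
```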

Scala API

Load data from DB2/DashDB into Apache Spark using the DataFrames API

import com.ibm.spark.ibmdataserver.Constants
import org.apache.spark.sql.SQLContext
import org.apache.spark.{SparkContext, SparkConf}

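// Placeholders: replace with your own connection URL and table name
val DB2_CONNECTION_URL = "jdbc:db2://<host>:<port>/<database>:user=<user>;password=<password>;"
val tableName = "<SCHEMA>.<TABLE>"

val conf = new SparkConf().setMaster("local[2]").setAppName("read test")
val sparkContext = new SparkContext(conf)
val sqlContext = new SQLContext(sparkContext)
val df = sqlContext.read
    .format("com.ibm.spark.ibmdataserver")
    .option(Constants.JDBCURL, DB2_CONNECTION_URL) // Specify the JDBC connection URL
    .option(Constants.TABLE, tableName) // Specify the table from which to read
    .load()
df.show()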
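The examples use `DB2_CONNECTION_URL` as a placeholder. The IBM Data Server Driver for JDBC (type 4) accepts URLs of the form `jdbc:db2://<host>:<port>/<database>`, optionally followed by `:property=value;` pairs such as `user` and `password`. A minimal sketch for assembling one; the object name `Db2Url`, the host, and the credentials are hypothetical.

```scala
// Hypothetical helper: builds jdbc:db2://<host>:<port>/<database>[:key=value;key=value;]
object Db2Url {
  def build(host: String, port: Int, database: String,
            props: Seq[(String, String)] = Seq.empty): String = {
    val base = s"jdbc:db2://$host:$port/$database"
    if (props.isEmpty) base
    else base + ":" + props.map { case (k, v) => s"$k=$v;" }.mkString // each property ends with ';'
  }

  def main(args: Array[String]): Unit = {
    // Example values only; BLUDB is used here just as a sample database name
    println(build("db2host.example.com", 50000, "BLUDB",
      Seq("user" -> "sparkuser", "password" -> "secret")))
  }
}
```

Embedding credentials in the URL keeps the example short; in practice they would usually come from configuration rather than source code.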

Persist data from Apache Spark DataFrames into DB2/DashDB

import com.ibm.spark.ibmdataserver.Constants
import org.apache.spark.sql.SQLContext
import org.apache.spark.{SparkContext, SparkConf}

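// Placeholders: replace with your own connection URL, table name and temporary path
val DB2_CONNECTION_URL = "jdbc:db2://<host>:<port>/<database>:user=<user>;password=<password>;"
val tableName = "<SCHEMA>.<TABLE>"
val tmpPath = "/tmp/spark-db2"

val conf = new SparkConf().setMaster("local[2]").setAppName("write test")
val sparkContext = new SparkContext(conf)
val sqlContext = new SQLContext(sparkContext)

val df = sqlContext.read.json("/Users/sparkuser/data.json")
df.write
    .format("com.ibm.spark.ibmdataserver")
    .option(Constants.JDBCURL, DB2_CONNECTION_URL) // Specify the JDBC connection URL
    .option(Constants.TABLE, tableName) // Specify the table (will be created if not present) to which data is to be written
    .option(Constants.TMPPATH, tmpPath) // Temporary path for intermediate files generated during processing [system tmp path is used by default]
    .mode("Append")
    .save()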
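The write path stages data in intermediate files under `TMPPATH` before the DB2 LOAD utility ingests them. The exact file format the connector writes is not documented here (the Apache Commons CSV requirement suggests CSV-style output). As a hypothetical illustration, the sketch below formats one row in the style of DB2's delimited ASCII (DEL) load format with its default delimiters; `DelFormat` is an invented name, and plain string handling is used instead of Commons CSV to keep the sketch self-contained.

```scala
// Hypothetical sketch: one row in DB2 DEL style with default delimiters.
// Fields are comma-separated, strings are wrapped in double quotes with embedded quotes doubled,
// and NULL is written as an empty field. Embedded newlines are not handled here.
object DelFormat {
  def formatField(value: Any): String = value match {
    case null      => ""                                    // NULL: nothing between the delimiters
    case s: String => "\"" + s.replace("\"", "\"\"") + "\"" // quote strings, double embedded quotes
    case other     => other.toString                        // numbers etc. written as-is
  }

  def formatRow(values: Seq[Any]): String = values.map(formatField).mkString(",")

  def main(args: Array[String]): Unit =
    println(formatRow(Seq(1, "say \"hi\"", null, 2.5)))
}
```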

Java API

Load data from DB2 into Apache Spark using the DataFrames API

import com.ibm.spark.ibmdataserver.Constants;
import org.apache.spark.SparkConf;
import org.apache.spark.SparkContext;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.SQLContext;

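// Placeholders: replace with your own connection URL and table name
String DB2_CONNECTION_URL = "jdbc:db2://<host>:<port>/<database>:user=<user>;password=<password>;";
String tableName = "<SCHEMA>.<TABLE>";

SparkConf conf = new SparkConf().setMaster("local[2]").setAppName("read test");
SparkContext sparkContext = new SparkContext(conf);
SQLContext sqlContext = new SQLContext(sparkContext);

DataFrame df = sqlContext.read().format("com.ibm.spark.ibmdataserver")
                .option(Constants.JDBCURL(), DB2_CONNECTION_URL) // Specify the JDBC connection URL
                .option(Constants.TABLE(), tableName) // Specify the table from which to read
                .load();
df.show();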

Persist data from Apache Spark DataFrames into DB2

import com.ibm.spark.ibmdataserver.Constants;
import org.apache.spark.SparkConf;
import org.apache.spark.SparkContext;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.SQLContext;

// Placeholders: replace with your own values
String DB2_CONNECTION_URL = "jdbc:db2://<host>:<port>/<database>:user=<user>;password=<password>;";
String tableName = "<SCHEMA>.<TABLE>";
String tmpPath = "/tmp/spark-db2";
String path = "/Users/sparkuser/data.json";

SparkConf conf = new SparkConf().setMaster("local[2]").setAppName("write test");
SparkContext sparkContext = new SparkContext(conf);
SQLContext sqlContext = new SQLContext(sparkContext);

DataFrame df = sqlContext.read().json(path);
df.write()
    .format("com.ibm.spark.ibmdataserver")
    .option(Constants.JDBCURL(), DB2_CONNECTION_URL) // Specify the JDBC connection URL
    .option(Constants.TABLE(), tableName) // Specify the table (will be created if not present) to which data is to be written
    .option(Constants.TMPPATH(), tmpPath) // Temporary path for intermediate files generated during processing [system tmp path is used by default]
    .mode("Append")
    .save();
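The write examples note that `TMPPATH` is optional and that the system temporary path is used when it is omitted. On the JVM that location is the `java.io.tmpdir` system property. A minimal sketch of that defaulting rule; `TmpPathResolver` and the option key `"tmpPath"` are hypothetical (the real key is whatever `Constants.TMPPATH` holds), and treating a blank value as unset is a choice made only in this sketch.

```scala
// Hypothetical sketch of the documented default: use the supplied temporary path if present,
// otherwise fall back to the JVM's system temporary directory.
object TmpPathResolver {
  def resolve(options: Map[String, String], key: String): String =
    options.get(key)
      .filter(_.trim.nonEmpty)                         // treat a blank value as "not set"
      .getOrElse(System.getProperty("java.io.tmpdir")) // JVM system temp directory

  def main(args: Array[String]): Unit = {
    println(resolve(Map("tmpPath" -> "/data/spark-tmp"), "tmpPath")) // explicit path wins
    println(resolve(Map.empty, "tmpPath"))                           // falls back to java.io.tmpdir
  }
}
```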
