Hoodie operability with S3 #120

Merged: 3 commits, Mar 28, 2017
1 change: 1 addition & 0 deletions README.md
@@ -1,3 +1,4 @@
# Hoodie
Hoodie manages storage of large analytical datasets on [HDFS](http://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/HdfsDesign.html) and serves them out via two types of tables:

* **Read Optimized Table** - Provides excellent query performance via purely columnar storage (e.g. [Parquet](https://parquet.apache.org/))
3 changes: 3 additions & 0 deletions docs/configurations.md
@@ -76,4 +76,7 @@ summary: "Here we list all possible configurations and what they mean"
- [usePrefix](#usePrefix) () <br/>
<span style="color:grey">Standard prefix for all metrics</span>

- [S3Configs](s3_hoodie.html) (Hoodie S3 Configs) <br/>
Member: is it s3_filesystem.html?

Member: http://idratherbewriting.com/documentation-theme-jekyll/index.html is what we use, FYI. You can run it locally via the commands in there.

Author: I have added the permalink as `permalink: s3_hoodie.html`.

I have tested locally on Jekyll 👍

<span style="color:grey">Configurations required for S3 and Hoodie interoperability.</span>

{% include callout.html content="Hoodie is a young project. A lot of pluggable interfaces and configurations to support diverse workloads need to be created. Get involved [here](https://github.com/uber/hoodie)" type="info" %}
61 changes: 61 additions & 0 deletions docs/s3_filesystem.md
@@ -0,0 +1,61 @@
---
title: S3 Filesystem (experimental)
Member: This is great. Can we link this to a section under configuration for the filesystem config?

keywords: sql hive s3 spark presto
sidebar: mydoc_sidebar
permalink: s3_hoodie.html
toc: false
summary: On this page, we go over how to configure Hoodie with the S3 filesystem.
---
Hoodie works with HDFS by default. Experimental work is underway on Hoodie-S3 compatibility.

## AWS configs

There are two configurations required for Hoodie-S3 compatibility:

- Adding AWS Credentials for Hoodie
- Adding the required jars to the classpath

### AWS Credentials

Add the required configs to your core-site.xml, from where Hoodie can fetch them. Set `fs.defaultFS` to your S3 bucket name, and Hoodie should be able to read/write from the bucket.

```
<property>
  <name>fs.defaultFS</name>
  <value>s3://ysharma</value>
</property>

<property>
  <name>fs.s3.impl</name>
  <value>org.apache.hadoop.fs.s3native.NativeS3FileSystem</value>
</property>

<property>
  <name>fs.s3.awsAccessKeyId</name>
  <value>AWS_KEY</value>
</property>

<property>
  <name>fs.s3.awsSecretAccessKey</name>
  <value>AWS_SECRET</value>
</property>

<property>
  <name>fs.s3n.awsAccessKeyId</name>
  <value>AWS_KEY</value>
</property>

<property>
  <name>fs.s3n.awsSecretAccessKey</name>
  <value>AWS_SECRET</value>
</property>
```
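
The same settings can also be applied programmatically on a Hadoop `Configuration`, which is handy for a quick credentials sanity check. A minimal sketch (bucket name and keys are placeholders; note that Hoodie itself obtains the filesystem via `FSUtils.getFs()`, which reads core-site.xml from the classpath, so the XML route above is what Hoodie actually relies on):

```
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;

public class S3ConfigCheck {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Mirror the core-site.xml entries above (placeholder values).
    conf.set("fs.defaultFS", "s3://your-bucket");
    conf.set("fs.s3.impl", "org.apache.hadoop.fs.s3native.NativeS3FileSystem");
    conf.set("fs.s3.awsAccessKeyId", "AWS_KEY");
    conf.set("fs.s3.awsSecretAccessKey", "AWS_SECRET");
    conf.set("fs.s3n.awsAccessKeyId", "AWS_KEY");
    conf.set("fs.s3n.awsSecretAccessKey", "AWS_SECRET");

    // Resolving the default filesystem verifies the scheme is wired up.
    FileSystem fs = FileSystem.get(conf);
    System.out.println("Default filesystem: " + fs.getUri());
  }
}
```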

### AWS Libs

AWS Hadoop libraries to add to the classpath:

- com.amazonaws:aws-java-sdk:1.10.34
- org.apache.hadoop:hadoop-aws:2.7.3
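
If you build with Maven, the equivalent dependency declarations would look like this sketch (adjust scope to your setup):

```
<dependency>
  <groupId>com.amazonaws</groupId>
  <artifactId>aws-java-sdk</artifactId>
  <version>1.10.34</version>
</dependency>
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-aws</artifactId>
  <version>2.7.3</version>
</dependency>
```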


5 changes: 5 additions & 0 deletions hoodie-client/pom.xml
@@ -118,6 +118,11 @@
<groupId>io.dropwizard.metrics</groupId>
<artifactId>metrics-core</artifactId>
</dependency>
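<!-- jcommander: command-line argument parsing, used by HoodieClientExample below -->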
<dependency>
<groupId>com.beust</groupId>
<artifactId>jcommander</artifactId>
<version>1.48</version>
</dependency>

<!-- Parent dependencies -->
<dependency>
HoodieWrapperFileSystem.java
@@ -49,9 +49,10 @@ public class HoodieWrapperFileSystem extends FileSystem {
public static final String HOODIE_SCHEME_PREFIX = "hoodie-";

static {
SUPPORT_SCHEMES = new HashSet<>(2);
SUPPORT_SCHEMES = new HashSet<>();
SUPPORT_SCHEMES.add("file");
SUPPORT_SCHEMES.add("hdfs");
SUPPORT_SCHEMES.add("s3");
}

private ConcurrentMap<String, SizeAwareFSDataOutputStream> openStreams =
32 changes: 22 additions & 10 deletions hoodie-client/src/test/java/HoodieClientExample.java
@@ -14,15 +14,17 @@
* limitations under the License.
*/


import com.beust.jcommander.JCommander;
import com.beust.jcommander.Parameter;
import com.uber.hoodie.HoodieWriteClient;
import com.uber.hoodie.common.table.HoodieTableMetaClient;
import com.uber.hoodie.common.util.FSUtils;
import com.uber.hoodie.config.HoodieWriteConfig;
import com.uber.hoodie.common.HoodieTestDataGenerator;
import com.uber.hoodie.common.model.HoodieRecord;
import com.uber.hoodie.common.table.HoodieTableMetaClient;
import com.uber.hoodie.common.util.FSUtils;
import com.uber.hoodie.config.HoodieIndexConfig;
import com.uber.hoodie.config.HoodieWriteConfig;
import com.uber.hoodie.index.HoodieIndex;

import org.apache.log4j.LogManager;
import org.apache.log4j.Logger;
import org.apache.spark.SparkConf;
@@ -38,12 +40,23 @@
*/
public class HoodieClientExample {

@Parameter(names={"--table-path", "-p"}, description = "path for Hoodie sample table")
private String inputTablePath = "file:///tmp/hoodie/sample-table";

@Parameter(names={"--table-name", "-n"}, description = "table name for Hoodie sample table")
private String inputTableName = "sample-table";

private static Logger logger = LogManager.getLogger(HoodieClientExample.class);


public static void main(String[] args) throws Exception {
String tablePath = args.length == 1 ? args[0] : "file:///tmp/hoodie/sample-table";
HoodieClientExample cli = new HoodieClientExample();
new JCommander(cli, args);
cli.run();
}


public void run() throws Exception {
HoodieTestDataGenerator dataGen = new HoodieTestDataGenerator();

SparkConf sparkConf = new SparkConf().setAppName("hoodie-client-example");
@@ -54,16 +67,15 @@ public static void main(String[] args) throws Exception {

// generate some records to be loaded in.
HoodieWriteConfig cfg =
HoodieWriteConfig.newBuilder().withPath(tablePath)
HoodieWriteConfig.newBuilder().withPath(inputTablePath)
.withSchema(HoodieTestDataGenerator.TRIP_EXAMPLE_SCHEMA).withParallelism(2, 2)
.forTable("sample-table").withIndexConfig(
.forTable(inputTableName).withIndexConfig(
HoodieIndexConfig.newBuilder().withIndexType(HoodieIndex.IndexType.BLOOM).build())
.build();
Properties properties = new Properties();
properties.put(HoodieWriteConfig.TABLE_NAME, "sample-table");
properties.put(HoodieWriteConfig.TABLE_NAME, inputTableName);
HoodieTableMetaClient
.initializePathAsHoodieDataset(FSUtils.getFs(), tablePath,
properties);
.initializePathAsHoodieDataset(FSUtils.getFs(), inputTablePath, properties);
HoodieWriteClient client = new HoodieWriteClient(jsc, cfg);

/**
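
With the JCommander parameters above, the example now takes `--table-path`/`-p` and `--table-name`/`-n` flags instead of a single positional argument. A launch sketch via spark-submit (the jar name is a placeholder for your build; the `--packages` coordinates are the AWS libs from docs/s3_filesystem.md):

```
spark-submit \
  --class HoodieClientExample \
  --packages com.amazonaws:aws-java-sdk:1.10.34,org.apache.hadoop:hadoop-aws:2.7.3 \
  path/to/hoodie-client-example.jar \
  --table-path s3://your-bucket/sample-table \
  --table-name sample-table
```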
@@ -18,7 +18,6 @@

import com.google.common.annotations.VisibleForTesting;
import com.google.common.base.Preconditions;
import com.uber.hoodie.common.table.HoodieTimeline;
import com.uber.hoodie.common.table.log.HoodieLogFile;
import com.uber.hoodie.common.table.timeline.HoodieInstant;
import com.uber.hoodie.exception.HoodieIOException;