This repository has been archived by the owner on Mar 4, 2021. It is now read-only.

Using the new SSTable format #36

Closed
danchia opened this issue Oct 17, 2014 · 2 comments


Comments

@danchia
Contributor

danchia commented Oct 17, 2014

@danielbwatson Given that we're trying to deprecate the JSON output format, what's the best way for people to write downstream jobs that need to process data row by row?

It seems to me that there are two options:

(1) Run the same Mapper and Reducer used in Aegisthus, but use a ChainReducer so that we can add a custom map stage afterwards for the application-specific processing.

(2) The SSTables output by Aegisthus are actually special, since rows are guaranteed not to overlap and the columns are sorted in the right order. I'm wondering whether we could expose this to a mapper in some smart way and avoid the reduce step.
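Option (1) can be modeled without a cluster. In Hadoop's new API, `org.apache.hadoop.mapreduce.lib.chain.ChainReducer` provides `setReducer` and `addMapper`, so a job could keep the Aegisthus reducer and append a mapper that consumes its merged rows. The plain-Java sketch below only imitates that data flow so it runs standalone. `Column`, `reduceRow` and `reduceThenMap` are hypothetical names, and the last-write-wins merge (ignoring tombstones) is a simplification, not Aegisthus's actual reducer.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;

public class ChainedRowProcessing {

    /** Hypothetical cell: column name, value, and write timestamp. */
    static final class Column {
        final String name;
        final String value;
        final long timestamp;

        Column(String name, String value, long timestamp) {
            this.name = name;
            this.value = value;
            this.timestamp = timestamp;
        }
    }

    /**
     * Reduce stage: merge every cell seen for one row key, keeping the newest
     * value per column (last write wins; tombstones are ignored in this sketch).
     * The returned row is ordered by column name.
     */
    static Map<String, String> reduceRow(List<Column> cells) {
        Map<String, Column> newest = new TreeMap<>();
        for (Column c : cells) {
            Column seen = newest.get(c.name);
            if (seen == null || c.timestamp > seen.timestamp) {
                newest.put(c.name, c);
            }
        }
        Map<String, String> row = new LinkedHashMap<>();
        for (Map.Entry<String, Column> e : newest.entrySet()) {
            row.put(e.getKey(), e.getValue().value);
        }
        return row;
    }

    /**
     * Chained stage: after reducing each row key, pass the merged row straight to an
     * application-specific map function, as a mapper added after the reducer would see it.
     */
    static <T> Map<String, T> reduceThenMap(Map<String, List<Column>> cellsByRowKey,
                                            Function<Map<String, String>, T> appMapper) {
        Map<String, T> out = new TreeMap<>();
        for (Map.Entry<String, List<Column>> e : cellsByRowKey.entrySet()) {
            out.put(e.getKey(), appMapper.apply(reduceRow(e.getValue())));
        }
        return out;
    }
}
```

The application logic only ever sees complete, merged rows, which is the point of chaining it after the reducer rather than writing a separate downstream job.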
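The property option (2) relies on can be checked and exploited in a few lines. This is a minimal sketch, assuming each output file is represented only by its sorted row keys; `SortedFile`, `nonOverlapping` and `mapOnly` are hypothetical, and plain string order stands in for the partitioner's token order that Cassandra actually sorts rows by.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.function.Function;

public class MapOnlySSTableScan {

    /** Hypothetical stand-in for one Aegisthus output file: its row keys, in sorted order. */
    static final class SortedFile {
        final List<String> rowKeys;

        SortedFile(List<String> rowKeys) {
            this.rowKeys = rowKeys;
        }

        String first() {
            return rowKeys.get(0);
        }

        String last() {
            return rowKeys.get(rowKeys.size() - 1);
        }
    }

    /** Non-empty files ordered by their first row key. */
    static List<SortedFile> byFirstKey(List<SortedFile> files) {
        List<SortedFile> sorted = new ArrayList<>();
        for (SortedFile f : files) {
            if (!f.rowKeys.isEmpty()) {
                sorted.add(f);
            }
        }
        sorted.sort(Comparator.comparing(SortedFile::first));
        return sorted;
    }

    /**
     * Checks the property a map-only job would rely on: keys are strictly increasing
     * inside each file, and no two files cover overlapping key ranges.
     */
    static boolean nonOverlapping(List<SortedFile> files) {
        String prevLast = null;
        for (SortedFile f : byFirstKey(files)) {
            for (int i = 1; i < f.rowKeys.size(); i++) {
                if (f.rowKeys.get(i - 1).compareTo(f.rowKeys.get(i)) >= 0) {
                    return false; // unsorted or duplicate key within one file
                }
            }
            if (prevLast != null && prevLast.compareTo(f.first()) >= 0) {
                return false; // this file's key range overlaps the previous file's
            }
            prevLast = f.last();
        }
        return true;
    }

    /**
     * Map-only pass: every row lives in exactly one file, so each file can be mapped
     * on its own with no shuffle or reduce; visiting files by first key keeps output sorted.
     */
    static <T> List<T> mapOnly(List<SortedFile> files, Function<String, T> mapper) {
        if (!nonOverlapping(files)) {
            throw new IllegalArgumentException("row ranges overlap; a reduce step is still needed");
        }
        List<T> out = new ArrayList<>();
        for (SortedFile f : byFirstKey(files)) {
            for (String key : f.rowKeys) {
                out.add(mapper.apply(key));
            }
        }
        return out;
    }
}
```

If the check fails, a row may be split across files, and a reduce (or merge) step is still needed to reassemble it before any per-row processing.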

What do you think?

@danielbwatson
Contributor

I wanted to deprecate the old JSON format when it was created by the reducer, but now that it is actually an output format I don't mind supporting it. We will get rid of the JsonInputFormat. It will be a lot easier to support if we always consume SSTables.

As for downstream jobs, I think both of your ideas are good. I could see use cases for both of them.

@danielbwatson
Contributor

I'm going to close this issue and add a reference to it in the Enhancement section of the README.
