Using the new SSTable format #36

danchia · 2014-10-17T22:06:10Z

@danielbwatson given that we're trying to deprecate the JSON output format, what's the best way for people to write downstream jobs that process the data row by row?

It seems to me that there are two options:

(1) Run the same Mapper and Reducer used in Aegisthus, but use a ChainReducer so that we can add a custom map stage afterwards to do the application-specific processing.

(2) The SSTables output by Aegisthus are actually special, since it's guaranteed that rows are non-overlapping and the columns are sorted in the right order. I'm wondering if we could expose this to a mapper in some smart way (and avoid the reduce step).
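To make option (2) concrete: if each row's columns arrive together and already in order, one streaming pass can rebuild whole rows without a shuffle or reduce. Below is a minimal Python sketch of that idea only. It is not Aegisthus or Hadoop code: the `(row_key, column, value)` tuples and the names `rows_from_sorted_cells` and `process_row` are hypothetical, and it assumes a row is never split across two input splits.

```python
from itertools import groupby
from operator import itemgetter


def rows_from_sorted_cells(cells):
    """Yield (row_key, [(column, value), ...]) for each run of equal row keys.

    Assumes the cells are already sorted by row key and that no row key
    reappears later in the stream (rows are non-overlapping).
    """
    for row_key, group in groupby(cells, key=itemgetter(0)):
        # Column order within the row is preserved as it was read.
        yield row_key, [(column, value) for _, column, value in group]


def process_row(row_key, columns):
    """Hypothetical application-specific step: count the columns per row."""
    return row_key, len(columns)


# Hypothetical stand-in for the cells one map task reads from one split.
cells = [
    ("row1", "a", 1),
    ("row1", "b", 2),
    ("row2", "a", 3),
]

results = [process_row(key, cols) for key, cols in rows_from_sorted_cells(cells)]
```

Because only consecutive cells are grouped, the pass holds one row in memory at a time. If the non-overlap guarantee did not hold, the same row key could show up in two groups, and a reduce step would be needed again.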

What do you think?

danielbwatson · 2014-10-21T18:07:47Z

I wanted to deprecate the old JSON format when it was created by the reducer, but now that it is actually an output format I don't mind supporting it. We will get rid of the JsonInputFormat. It will be a lot easier to support if we always consume SSTables.

As for downstream jobs, I think both of your ideas are good. I could see use cases for both of them.

danielbwatson · 2016-01-06T00:24:46Z

I'm going to close this issue and add a reference to it in the Enhancement section of the README.

danielbwatson closed this as completed Jan 6, 2016
