Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix some typos in TypedPipe docs, expand flatMap docs #1115

Merged
merged 3 commits into from
Dec 5, 2014
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -123,9 +123,17 @@ trait TypedPipe[+T] extends Serializable {
def cross[U](tiny: TypedPipe[U]): TypedPipe[(T, U)]

/**
* This is the fundamental mapper operation. Each item is feed to the input
* function. The result is iterated once and passed to the next step in the
* This is the fundamental mapper operation.
* It behaves in a way similar to List.flatMap, which means that each
* item is fed to the input function, which can return 0, 1, or many outputs
* (as a TraversableOnce) per input. The returned results will be iterated through once
* and then flattened into a single TypedPipe which is passed to the next step in the
* pipeline.
*
* This behavior makes it a powerful operator -- it can be used to filter records
* (by returning 0 items for a given input), it can be used the way map is used
* (by returning 1 item per input), it can be used to explode 1 input into many outputs,
* or even a combination of all of the above at once.
*/
def flatMap[U](f: T => TraversableOnce[U]): TypedPipe[U]

Expand Down Expand Up @@ -265,11 +273,11 @@ trait TypedPipe[+T] extends Serializable {
def fork: TypedPipe[T] = onRawSingle(identity)

/**
* WARNING This is dangerous, and maybe not what you think.
* WARNING This is dangerous, and may not be what you think.
*
* limit the output to AT MOST count items.
* useful for debugging, but probably that's about it.
* The number may be less than count, and not sampled particular method
* The number may be less than count, and not sampled by any particular method
*
* This may change in the future to be exact, but that will add 1 MR step
*/
Expand Down Expand Up @@ -639,7 +647,7 @@ trait TypedPipe[+T] extends Serializable {
*
* {@code pipe.sketch(100).join(thatPipe) }
* will add an extra map/reduce job over a standard join to create the count-min-sketch.
* This will generally only be beneficial if you have really have skew, where without
* This will generally only be beneficial if you have really heavy skew, where without
* this you have 1 or 2 reducers taking hours longer than the rest.
*/
def sketch[K, V](reducers: Int,
Expand Down