Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support nesting Options in TypeDescriptor #1387

Merged
merged 2 commits into from
Jul 31, 2015
Merged

Conversation

johnynek
Copy link
Collaborator

This changes does a few things:

  1. refactor the macro code to use a combinator based approach. The types are always trees, so it simplifies the code to leverage this. This code is actually shorter with more features.
  2. relax the requirement for no ambiguous nesting of options when creating Fields (and TupleSetter, but that was never there). This means we could use these macros to make sinks that support Option[String] or Option[Option[Option[Int]]] which just flatten on the way out. Reading is ambiguous, but there seems to be demand for this feature when writing data out to be read by python, R, etc...
  3. Makes TupleConverter stricter: it won't support Option[String] for reading because it is ambiguous (is it None or ""?).
  4. Allows us to do nested Options if they have at least one non-optional numeric column to check. This allows us to do things like Option[(Int, Option[Long])] and similar. The existence checking is also more efficient: we only check one column for each option (the first "evident" column we can find).

@johnynek
Copy link
Collaborator Author

One question: should we change TupleConvertor on String to do:

val tmp = tup.getString(idx)
if (tmp == null) "" else tmp

Since I think in some cases, cascading will parse empty strings in text delimited into null (when the type of the field is Object I think, which may happen now with nested Options).

@sid-kap
Copy link
Contributor

sid-kap commented Jul 27, 2015

Another way to deal with the empty strings parsing as null would be to override the cleanSplit method in Cascading's DelimitedParser (by changing this method: https://github.com/cwensel/cascading/blob/wip-3.0/cascading-core/src/main/java/cascading/scheme/util/DelimitedParser.java#L281).

I'm not sure if changing TupleConverter or DelimitedParser would be better.

@johnynek
Copy link
Collaborator Author

@sid-kap yeah, it's not clear to me either. Generally we change code we have control over (in this case the TupleConverter macros) but not 100% sure the best approach here.

I think the current approach is fine, but it does convolute unknownTypes with "" == null, which might not be true for some applications. So down the line we might need to refactor to add another option (emptyStringIsNull or something) to the macro code.


Left(expanded)
object FieldBuilder {
// This is method on the object to work around this compiler bug: SI-6231
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That bug was fixed for 2.10.0 by the link ?

@ianoc
Copy link
Collaborator

ianoc commented Jul 29, 2015

LGTM other than i think we should use the strategy from bijection to enforce the macro doesn't work in those cases

ianoc added a commit that referenced this pull request Jul 31, 2015
Support nesting Options in TypeDescriptor
@ianoc ianoc merged commit 1b35bf3 into develop Jul 31, 2015
@ianoc ianoc deleted the oscar/anchor-typedescriptor branch July 31, 2015 20:17
@johnynek
Copy link
Collaborator Author

johnynek commented Aug 3, 2015

I wonder if we should add a type:

trait TypedTextDescriptor[T] extends TypeDescriptor[T]

object TypedTextDescriptor {
  implicit def fromMacro[T]: // this assumes null strings are the same as "" on read, disallows Option[String], which is supported in databases.
}

@ianoc ianoc mentioned this pull request Aug 10, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants