Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-3412] [SQL] Add 3 missing types for Row API #2284

Closed

Conversation

chenghao-intel
Copy link
Contributor

BinaryType, DecimalType and TimestampType are missing in the Row API.

@chenghao-intel
Copy link
Contributor Author

test this please.

1 similar comment
@chenghao-intel
Copy link
Contributor Author

test this please.

@SparkQA
Copy link

SparkQA commented Sep 9, 2014

QA tests have started for PR 2284 at commit 3644ffa.

  • This patch merges cleanly.

@SparkQA
Copy link

SparkQA commented Sep 9, 2014

QA tests have finished for PR 2284 at commit 3644ffa.

  • This patch fails unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@marmbrus
Copy link
Contributor

marmbrus commented Sep 9, 2014

I'm not sure we want to do this. The specific getters / setters are really only so that we can avoid boxing primitives. We added getString mostly because at one point we were considering having some kind of internal mutable backing data structure here for performance (and we still might).

What do you think about something like def getAs[T](i: Int): T = apply(i).asInstanceOf[T] in Row?

/cc @liancheng @rxin

@liancheng
Copy link
Contributor

@marmbrus I vote for getAs[T](i: Int).

@chenghao-intel As Michael has just said, avoiding boxing cost is the major design rationale behind these setters/getters. Unfortunately, we haven't fully leverage this design and boxing still happens on some critical paths, for example, building/accessing in-memory columnar buffers. PR #2327 is an attempt to (partially) solve this problem.

@rxin
Copy link
Contributor

rxin commented Sep 9, 2014

Yea getAs[T] sounds good.

@chenghao-intel
Copy link
Contributor Author

Thank you guys for the explanation and voting, the boxing/unboxing is quite annoy problem for performance. But from the normal developer point of view, the Row api is the key to interact with the SparkSQL, complete data type (11 primitive data types currently) support (for getter / setter) may make more sense for people.

And if we used the generic type here, people may confused what the scala/java object type is if the data type is TimeStampType specified via schema, and even they probably add an object of java.security.Timestamp for the data type TimestampType.

Sorry, probably I missed some of the original discussions for row API design.

@marmbrus
Copy link
Contributor

The type for each SQL type are pretty well documented in the programming guide (updated for 1.1 to be published soon). It seems unscalable to add new methods to all the various row implementations for each new datatype, especially since all they are doing is casting. Given this I propose we close this issue.

@chenghao-intel
Copy link
Contributor Author

ok, I am closing it.

asfgit pushed a commit that referenced this pull request Oct 9, 2014
chenghao-intel assigned this to me, check PR #2284 for previous discussion

Author: Daoyuan Wang <daoyuan.wang@intel.com>

Closes #2529 from adrian-wang/rowapi and squashes the following commits:

c6594b2 [Daoyuan Wang] using boxed
7b7e6e3 [Daoyuan Wang] update pattern match
7a39456 [Daoyuan Wang] rename file and refresh getAs[T]
4c18c29 [Daoyuan Wang] remove setAs[T] and null judge
1614493 [Daoyuan Wang] add missing row api
@chenghao-intel chenghao-intel deleted the missing_types_in_row branch December 10, 2014 02:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants