[SPARK-3412] [SQL] Add 3 missing types for Row API #2284

chenghao-intel · 2014-09-05T07:54:32Z

BinaryType, DecimalType and TimestampType are missing in the Row API.

chenghao-intel · 2014-09-05T08:25:09Z

test this please.

chenghao-intel · 2014-09-09T00:16:29Z

test this please.

SparkQA · 2014-09-09T00:49:22Z

QA tests have started for PR 2284 at commit 3644ffa.

This patch merges cleanly.

SparkQA · 2014-09-09T02:33:59Z

QA tests have finished for PR 2284 at commit 3644ffa.

This patch fails unit tests.
This patch merges cleanly.
This patch adds no public classes.

marmbrus · 2014-09-09T02:54:19Z

I'm not sure we want to do this. The specific getters / setters are really only so that we can avoid boxing primitives. We added getString mostly because at one point we were considering having some kind of internal mutable backing data structure here for performance (and we still might).

What do you think about something like def getAs[T](i: Int): T = apply(i).asInstanceOf[T] in Row?

/cc @liancheng @rxin
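
For readers following along, here is a minimal sketch of how the proposed `getAs[T]` would sit next to a type-specific getter. `SimpleRow` and `GetAsSketch` are hypothetical names used only for illustration, not Spark's actual `Row` implementation. The Scala types used for BinaryType, DecimalType and TimestampType (`Array[Byte]`, `BigDecimal`, `java.sql.Timestamp`) are an assumption based on the programming guide's mapping.

```scala
import java.sql.Timestamp

// Hypothetical stand-in for a row, for illustration only; not Spark's Row class.
final class SimpleRow(values: Array[Any]) {
  def apply(i: Int): Any = values(i)

  // Type-specific getter in the style of the existing API. This sketch stores
  // everything in Array[Any], so the value is still boxed here; actually
  // avoiding boxing would need a specialized backing store.
  def getInt(i: Int): Int = values(i).asInstanceOf[Int]

  // The generic accessor proposed in the thread: a plain cast.
  def getAs[T](i: Int): T = apply(i).asInstanceOf[T]
}

object GetAsSketch {
  def main(args: Array[String]): Unit = {
    val row = new SimpleRow(Array[Any](
      42, Array[Byte](1, 2, 3), BigDecimal("1.50"), new Timestamp(0L)))

    val n: Int = row.getInt(0)
    // One generic method covers the three types this PR adds getters for.
    val bytes = row.getAs[Array[Byte]](1)
    val dec = row.getAs[BigDecimal](2)
    val ts = row.getAs[Timestamp](3)

    println(s"$n ${bytes.length} $dec ${ts.getTime}")
  }
}
```

The point of the sketch is that no new method is needed per data type; the caller names the expected type at the call site.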

liancheng · 2014-09-09T04:19:25Z

@marmbrus I vote for getAs[T](i: Int).

@chenghao-intel As Michael has just said, avoiding boxing cost is the major design rationale behind these setters/getters. Unfortunately, we haven't fully leveraged this design, and boxing still happens on some critical paths, for example when building/accessing in-memory columnar buffers. PR #2327 is an attempt to (partially) solve this problem.

rxin · 2014-09-09T04:22:43Z

Yea getAs[T] sounds good.

chenghao-intel · 2014-09-09T05:05:26Z

Thank you all for the explanation and the votes; boxing/unboxing is quite an annoying performance problem. But from a typical developer's point of view, the Row API is the key way to interact with Spark SQL, so complete data type support in the getters/setters (there are currently 11 primitive data types) may make more sense for people.

And if we use a generic type here, people may be confused about which Scala/Java object type to use when the schema specifies TimestampType, and they might even add a java.security.Timestamp object for a TimestampType column.

Sorry, probably I missed some of the original discussions for row API design.
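
The concern above can be sketched roughly as follows: because a generic `getAs[T]` is only a cast, nothing stops a caller from storing the wrong class, and the mistake shows up only when the value is read and used. `LooseRow` and `WrongTypeSketch` are hypothetical names for illustration, and `java.util.Date` stands in here for any timestamp-like class other than the expected `java.sql.Timestamp`.

```scala
import java.sql.Timestamp
import scala.util.Try

// Hypothetical stand-in for a row, for illustration only; not Spark's Row class.
final class LooseRow(values: Array[Any]) {
  def apply(i: Int): Any = values(i)
  def getAs[T](i: Int): T = apply(i).asInstanceOf[T]
}

object WrongTypeSketch {
  // Reads column i as a Timestamp and uses it, capturing any failure.
  def readMillis(row: LooseRow, i: Int): Try[Long] =
    Try(row.getAs[Timestamp](i).getTime)

  def main(args: Array[String]): Unit = {
    val good = new LooseRow(Array[Any](new Timestamp(1000L)))
    // The wrong class is accepted silently when the row is built.
    val bad = new LooseRow(Array[Any](new java.util.Date(1000L)))

    println(readMillis(good, 0)) // succeeds
    println(readMillis(bad, 0))  // fails with a ClassCastException on use
  }
}
```

A dedicated getter would carry the expected class in its signature, but it would fail at the same point at run time, since it would perform the same cast.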

marmbrus · 2014-09-10T01:46:31Z

The types for each SQL type are pretty well documented in the programming guide (updated for 1.1, to be published soon). It seems unscalable to add new methods to all the various row implementations for each new data type, especially since all they are doing is casting. Given this, I propose we close this issue.

chenghao-intel · 2014-09-10T02:09:40Z

ok, I am closing it.

chenghao-intel assigned this to me, check PR #2284 for previous discussion

Author: Daoyuan Wang <daoyuan.wang@intel.com>

Closes #2529 from adrian-wang/rowapi and squashes the following commits:

c6594b2 [Daoyuan Wang] using boxed
7b7e6e3 [Daoyuan Wang] update pattern match
7a39456 [Daoyuan Wang] rename file and refresh getAs[T]
4c18c29 [Daoyuan Wang] remove setAs[T] and null judge
1614493 [Daoyuan Wang] add missing row api

Add 3 missing types for Row API

3644ffa

chenghao-intel closed this Sep 10, 2014

adrian-wang mentioned this pull request Sep 25, 2014

[SPARK-3412][SQL]add missing row api #2529

Closed

liancheng mentioned this pull request Nov 3, 2014

[SPARK-4205][SQL] Timestamp and Date classes which work in the catalyst DSL. #3066

Closed

chenghao-intel deleted the missing_types_in_row branch December 10, 2014 02:42

adrian-wang mentioned this pull request May 11, 2015

[SPARK-6784] [SQL] Clean up all the inbound/outbound conversions for DateType #6027

Closed
