Bug Fix for Spark 3.x - Avoid converting converted Row values #868
Conversation
@suhsteve Can you check whether the test failures are related?
@@ -143,9 +143,9 @@ internal class PicklingSqlCommandExecutor : SqlCommandExecutor
// The following can happen if an UDF takes Row object(s).
// The JVM Spark side sends a Row object that wraps all the columns used
// in the UDF, thus, it is normalized below (the extra layer is removed).
Is this comment still relevant?
The worker will crash without this, so I believe it is?
Oh, I meant with respect to the code. I think "extra layer is removed" refers to the RowConstructor, but now that it's gone, is the comment up to date?
"Extra layer" can refer to the Row, so we take the Values out of it?
I'm okay with removing the ( )'s though if things sound unclear.
@elvaliuliuliu do we need to update the description, or does it still apply?
@@ -94,7 +94,7 @@ public Timestamp(DateTime dateTime)
/// <summary>
/// Readable string representation for this type.
/// </summary>
- public override string ToString() => _dateTime.ToString("yyyy-MM-dd HH:mm:ss.ffffff");
+ public override string ToString() => _dateTime.ToString("yyyy-MM-dd HH:mm:ss.ffffffZ");
I assume this was a bug? Converting to a string and casting back to Timestamp in Spark caused the time to shift by 8 hours.
@elvaliuliuliu @imback82
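To illustrate the kind of shift being described (a minimal sketch, not the .NET for Apache Spark code; the `Etc/GMT+8` zone and the format string are illustrative assumptions): when one side formats a timestamp without a zone marker and the other side parses that string as local time instead of UTC, the resulting instants differ by the zone offset. Appending `Z` marks the string as UTC so both sides agree.

```java
import java.text.SimpleDateFormat;
import java.util.TimeZone;

public class TimestampShiftDemo {
    // Parse the same wall-clock string in a fixed UTC-8 zone and in UTC,
    // and return how many hours apart the resulting instants are.
    static long shiftHours(String timestamp) throws Exception {
        SimpleDateFormat minus8 = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
        minus8.setTimeZone(TimeZone.getTimeZone("Etc/GMT+8")); // fixed UTC-8, no DST
        SimpleDateFormat utc = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
        utc.setTimeZone(TimeZone.getTimeZone("UTC"));
        return (minus8.parse(timestamp).getTime() - utc.parse(timestamp).getTime())
                / 3_600_000L;
    }

    public static void main(String[] args) throws Exception {
        // The same string, interpreted in the two zones, is 8 hours apart.
        System.out.println(shiftHours("2020-01-15 00:00:00")); // prints 8
    }
}
```

This is why a round-trip (format without zone info, then parse in a machine whose default zone is UTC-8) appears to "shift" the time by 8 hours.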
Adding a separate PR instead to address this. #871
Thanks. Let's discuss this in #871 and remove it from this PR.
// It is possible that an entry of a Row (row1) may itself be a Row (row2).
// If the entry is a RowConstructor then it will be a RowConstructor
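As a minimal sketch of the situation this comment describes (the `Row` type and `maxDepth` helper below are hypothetical stand-ins, not the actual .NET for Apache Spark classes): an entry of a row may itself be a row, so any normalization has to account for nesting.

```java
public class NestedRowDemo {
    // Hypothetical stand-in for an unpickled row: just an array of values.
    static final class Row {
        final Object[] values;
        Row(Object... values) { this.values = values; }
    }

    // Illustrative helper: report how deeply rows nest inside a row's entries.
    static int maxDepth(Row row) {
        int depth = 1;
        for (Object value : row.values) {
            if (value instanceof Row) {
                depth = Math.max(depth, 1 + maxDepth((Row) value));
            }
        }
        return depth;
    }

    public static void main(String[] args) {
        Row row2 = new Row(42);            // inner row
        Row row1 = new Row("outer", row2); // an entry of row1 is itself a Row
        System.out.println(maxDepth(row1)); // prints 2
    }
}
```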
I guess we already have a test case handling this, right?
Yeah, there are a few that have rows as column values.
LGTM (if tests pass), thanks @suhsteve!
The target machines must be using UTC as their timezone. It fails locally on my machine.
Can you push an empty commit?
There has been no modification to the `RowPickler` code between EvaluatePython.scala (Spark 2.4.7) and EvaluatePython.scala (Spark 3.0.0). Spark 2.4.7 used pyrolite `4.13`, and starting with Spark 3.0.0, pyrolite was updated to `4.30`.

In `RowPickler`, Spark pickles the row values using `pickler.save`. In `pickler.save(Object)`, Pyrolite checks whether the object has been `memoized`, and if it hasn't, it will process the object and pickle it. There was a PR in Nov 2017 (between the 4.13 and 4.30 releases) that updated the logic for how the `memoize` check is done. Pyrolite `4.13` checked `System.identityHashCode(obj)`; pyrolite `4.30`, however, only checks `obj.hashCode()`, which is the default behavior unless the `valueCompare` flag has been toggled. Toggling this flag would restore the `4.13` behavior. Spark 3.x, however, does not use the Pickler constructor to set this.

I have refactored the `RowConstructor` class a bit to make it easier to understand, as well as fixed the issue with converting already converted row values.

Fixes #760
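To illustrate the memoization difference described above (a minimal sketch, not pyrolite's actual code; the `Row` class below is a hypothetical stand-in with value-based `equals`/`hashCode`): under an identity-based check, two distinct-but-equal objects are seen as different and each gets pickled, while under the default `hashCode()`-based check their hashes collide, so the second object can be treated as already memoized.

```java
import java.util.Arrays;

public class MemoizeDemo {
    // Hypothetical stand-in for a row whose equality is value-based.
    static final class Row {
        final Object[] values;
        Row(Object... values) { this.values = values; }
        @Override public int hashCode() { return Arrays.hashCode(values); }
        @Override public boolean equals(Object o) {
            return o instanceof Row && Arrays.equals(values, ((Row) o).values);
        }
    }

    public static void main(String[] args) {
        Row r1 = new Row(1, "a");
        Row r2 = new Row(1, "a"); // a distinct object with equal contents

        // 4.13-style check: identity hashes of distinct live objects are
        // (almost always) different, so r2 would be pickled as a new object.
        System.out.println(System.identityHashCode(r1) == System.identityHashCode(r2));

        // 4.30-style default check: the value hashes collide, so r2 can be
        // mistaken for an already-memoized object.
        System.out.println(r1.hashCode() == r2.hashCode()); // prints true
    }
}
```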