-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix Spark CAST(string as boolean) #10382
Conversation
✅ Deploy Preview for meta-velox canceled.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rui-mo Thank you for the fix.
velox/type/Conversions.h
Outdated
/// throws. By default, Presto compatible cast is conducted. The allowed strings | ||
/// are `t, f, 1, 0, true, false` and their upper case equivalents. | ||
/// @tparam TPolicy The specified cast policy. When set as SparkCastPolicy, the | ||
/// allowed strings are `t, f, 1, 0, y, n, true, false, yes, no` and their upper |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might be more readable if the comment describe the differences between Presto and Spark policies.
PrestoCastPolicy allows `t, f, 1, 0, true, false` strings and ...
SparkCastPolicy allows all strings that PrestoCastPolicy allows plus `y, n, yes, no` and their ...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the suggestion. I find Presto cast could also use 'LegacyCastPolicy' but its behavior should be the same with 'PrestoCastPolicy' for this case as Presto's implementation in VarcharOperators is not controlled by a config (please kindly correct me if I'm wrong). Changed the comment as below.
The specified cast policy. PrestoCastPolicy and
LegacyCastPolicy allowt, f, 1, 0, true, false
and their upper case
equivalents. SparkCastPolicy allows all strings that they allow plusy, n, yes, no
and their upper case equivalents.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe LegacyCastPolicy is used by a streaming system at Meta. It uses Presto functions, but requires slightly different behavior in some cases. CC: @bikramSingh91 @kgpai @kagamiori
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the explain. In Velox, Presto cast uses 'LegacyCastPolicy' if the query config 'kLegacyCast' is enabled.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rui-mo Should we document the behavior of this API when TPolicy is LegacyCastPolicy ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is documented as below, having the same behavior with PrestoCastPolicy. Thanks.
/// @tparam TPolicy PrestoCastPolicy and LegacyCastPolicy allow `t, f, 1, 0,
/// true, false` and their upper case equivalents. SparkCastPolicy additionally
/// allows 'y, n, yes, no' and their upper case equivalents.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perfect. Thanks.
velox/type/Conversions.h
Outdated
Expected<bool> castToBoolean(const char* data, size_t len); | ||
/// @tparam TPolicy The specified cast policy. PrestoCastPolicy and | ||
/// LegacyCastPolicy allow `t, f, 1, 0, true, false` and their upper case | ||
/// equivalents. SparkCastPolicy allows all strings that they allow plus `y, n, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: may drop 'The specified cast policy.'
nit: SparkCastPolicy additionally allows 'y, n, yes, no' and their upper case equivalents.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the suggestion. Updated.
velox/type/Conversions.h
Outdated
/// Presto compatible cast of strings to boolean. There is a set of strings | ||
/// allowed to be casted to boolean. These strings are `t, f, 1, 0, true, false` | ||
/// and their upper case equivalents. Casting from other strings to boolean | ||
/// This function casts a string to a boolean value. There is a set of strings |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function casts a string to a boolean value.
Casts a string to a boolean. Allows a fixed set of strings.
@pedroerp has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
Conbench analyzed the 1 benchmark run on commit There were no benchmark performance regressions. 🎉 The full Conbench report has more details. |
Spark only allows the strings
t, f, y, n, 1, 0, yes, no, true, false
andtheir uppercase equivalents. Folly allows more strings than spark does, e.g.
'on' and 'off'. This PR restricts the strings that can be converted to boolean
to these strings. Strings apart from these will throw.