-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use LogicalType for TypeSignature Numeric
and String
, Coercible
#13240
Conversation
Signed-off-by: jayzhan211 <jayzhan211@gmail.com>
Signed-off-by: jayzhan211 <jayzhan211@gmail.com>
3a578de
to
4b6d433
Compare
Coercible
and String
Coercible
, Numeric
and String
], | ||
Volatility::Immutable, | ||
), | ||
signature: Signature::any(1, Volatility::Immutable), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ideally first/last value get the first/last item in the column, so it should be the type of the column.
I forgot why we don't use Any
.
Without the change, we need to add another type signature for boolean, so I change it to Any
in this PR
/// Avoid adding specific coercion cases here. | ||
/// Aim to keep this logic as SIMPLE as possible! | ||
pub fn can_cast_to(&self, target_type: &Self) -> bool { | ||
// In Postgres, most functions coerce numeric strings to numeric inputs, | ||
// but they do not accept numeric inputs as strings. | ||
if self.is_numeric() && target_type == &NativeType::String { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This talks about "can cast" and "can coerce" without making clear distinction between them.
Can we make it less ambigous and clarify whether we may "can cast" (ie does CAST(type_from AS type_to)
exist?) or "can coerce" (does cast exist and is applied implicitly in various contexts?)
return false; | ||
} | ||
|
||
true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Default for "can cast" and "can coerce" should be false.
Maps cannot coerce to numbers or lists.
Let'e have
true | |
false |
here and let's define what can be converted in rules above. This will make the code simpler to reason about
Signed-off-by: jayzhan211 <jayzhan211@gmail.com>
Signed-off-by: jayzhan211 <jayzhan211@gmail.com>
Signed-off-by: jayzhan211 <jayzhan211@gmail.com>
Coercible
, Numeric
and String
Numeric
and String
. Deprecate Coercible
and add NumericAndNumericString
Numeric
and String
. Deprecate Coercible
and add NumericAndNumericString
Numeric
and String
, Coercible
and introduce NumericAndNumericString
/// See [`NativeType::is_numeric`] to know which type is considered numeric | ||
/// This signature accepts numeric string | ||
/// Example of functions In Postgres that support numeric string | ||
/// 1. Mathematical Functions, like `abs` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i confirm this works in PostgreSQL (was testing with PostgreSQL v 17)
select abs('-123');
https://www.postgresql.org/docs/17/typeconv-oper.html#id-1.5.9.7.8 suggests why this works
'...' is not a varchar literal in PostgreSQL.
it's "unknown-type" literal, which gets interpreted as float8
Here the system has implicitly resolved the unknown-type literal as type float8 before applying the chosen operator.
indeed, select pg_typeof(abs('-123'));
returns double precision
However, this doesn't work in PostgreSQL
select abs(CAST('-123' AS varchar));
This fails with "Query Error: function abs(character varying) does not exist"
Indicating that there is no special coercion rules from varchar to numbers when calling functions like abs()
.
BTW, coercion rules can be retrieved from PostgreSQL from https://www.postgresql.org/docs/17/catalog-pg-cast.html
PostgreSQL doesn't declare any implicit coercions between (selected) numeric types and (selected) varchar type, and abs()
function resolution behavior as described above matches that.
with selected_types(name) AS (VALUES ('int4'), ('int8'), ('float8'), ('varchar'))
select
src.typname src_type,
dst.typname dst_type,
castcontext,
case castcontext
when 'e' then 'only explicit'
when 'a' then 'explicit | implicitly in assignment'
when 'i' then 'implicitly in expressions, as well as the other cases'
else '???' -- undocumented @ https://www.postgresql.org/docs/17/catalog-pg-cast.html
end castcontext_explained
from pg_cast
join pg_type src on pg_cast.castsource = src.oid
join pg_type dst on pg_cast.casttarget = dst.oid
where true
and src.oid != dst.oid
and src.typname in (select name from selected_types)
and dst.typname in (select name from selected_types)
order by src.typname, dst.typname
;
src_type | dst_type | castcontext | castcontext_explained |
---|---|---|---|
float8 | int4 | a | explicit | implicitly in assignment |
float8 | int8 | a | explicit | implicitly in assignment |
int4 | float8 | i | implicitly in expressions, as well as the other cases |
int4 | int8 | i | implicitly in expressions, as well as the other cases |
int8 | float8 | i | implicitly in expressions, as well as the other cases |
int8 | int4 | a | explicit | implicitly in assignment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with @findepi.
It makes sense to me. NumericAndNumericString
allows abs('-1')
but also allows abs('-1'::varchar)
, which isn't allowed by Postgres.
A similar behavior in DuckDB is with to_timestamp
.
D select to_timestamp('-1');
┌──────────────────────────┐
│ to_timestamp('-1') │
│ timestamp with time zone │
├──────────────────────────┤
│ 1970-01-01 07:59:59+08 │
└──────────────────────────┘
D select to_timestamp('-1'::varchar);
Binder Error: No function matches the given name and argument types 'to_timestamp(VARCHAR)'. You might need to add explicit type casts.
Candidate functions:
to_timestamp(DOUBLE) -> TIMESTAMP WITH TIME ZONE
LINE 1: select to_timestamp('-1'::varchar);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we want to follow PostgreSQL behavior, we would need to model '...'
literals as unknown type literals. This would be parser / frontend work. It should have no impact on projects that do not use the DF parser (#12723).
Not sure it's worthwhile to do. For now, let's just focus on coercions and make them reasonably correct.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Make sense. We need to model unknown type literal and avoid numeric string to string coercion
@@ -98,6 +98,12 @@ impl fmt::Debug for dyn LogicalType { | |||
} | |||
} | |||
|
|||
impl std::fmt::Display for dyn LogicalType { | |||
fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result { | |||
write!(f, "{self:?}") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we can make a distinction between Debug
and Display
. Currently, we print something like
let int: Box<dyn LogicalType> = Box::new(Int32);
println!(format!("{}", int));
------- output ------
LogicalType(Native(Int32), Int32))
I imagine we can print Int32
toInt32
, print JSON
to JSON
... 🤔
However, I think it may not be the point of this PR. We can do it in another PR.
/// See [`NativeType::is_numeric`] to know which type is considered numeric | ||
/// This signature accepts numeric string | ||
/// Example of functions In Postgres that support numeric string | ||
/// 1. Mathematical Functions, like `abs` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with @findepi.
It makes sense to me. NumericAndNumericString
allows abs('-1')
but also allows abs('-1'::varchar)
, which isn't allowed by Postgres.
A similar behavior in DuckDB is with to_timestamp
.
D select to_timestamp('-1');
┌──────────────────────────┐
│ to_timestamp('-1') │
│ timestamp with time zone │
├──────────────────────────┤
│ 1970-01-01 07:59:59+08 │
└──────────────────────────┘
D select to_timestamp('-1'::varchar);
Binder Error: No function matches the given name and argument types 'to_timestamp(VARCHAR)'. You might need to add explicit type casts.
Candidate functions:
to_timestamp(DOUBLE) -> TIMESTAMP WITH TIME ZONE
LINE 1: select to_timestamp('-1'::varchar);
Signed-off-by: jayzhan211 <jayzhan211@gmail.com>
I rm remove incorrect numeric string case in this PR |
Numeric
and String
, Coercible
and introduce NumericAndNumericString
Numeric
and String
, Coercible
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @jayzhan211, overall looks good to me. Just some minor comments. 👍
Signed-off-by: jayzhan211 <jayzhan211@gmail.com>
Thanks @goldmedal @findepi |
Which issue does this PR close?
Closes #.
Rationale for this change
An attempt to use logical type for TypeSignature
Why NumericAndNumericString is added
In Postgres, math functions accept numeric string, but functions like aggregation
var
they don't accept numeric string.Therefore we need two kinds of numeric signature, one accepts string one doesn't
What changes are included in this PR?
TypeSignature::Coercible, TypeSignature::Numeric and TypeSignature::String, TypeSignature:: NumericAndNumericString
Are these changes tested?
Are there any user-facing changes?