-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[red-knot] support for typing.reveal_type #13384
Conversation
a44c681
to
d327c60
Compare
4a82682
to
4b171bd
Compare
|
/// The `typing.reveal_type` function, which has special `__call__` behavior. | ||
RevealTypeFunction(FunctionType<'db>), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to store the entire FunctionType
for RevealType
? Don't we know its exact signature?
Also, kind of funny to have this as a dedicated type
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, agree with Micha that there's not much use storing inner data for this variant, since we already have the complexity of having an extra variant (wrapping inner data doesn't really save us any complexity, I don't think), everything about reveal_type
's special behaviour will have to be hardcoded in by us anyway, and the signature is known and guaranteed to never change.
Also, kind of funny to have this as a dedicated type
It surprised me initially that this was the approach used, since conceptually it's really the symbol of reveal_type
that's special rather than the type of reveal_type
. But I suppose that unless it's represented as a separate variant here, it's hard for Type::Call
to operate differently when applied to revealed_type
as opposed to other functions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why have this as a distinct Type variant?
Conceptually, singleton types (which include Class and Function) are already the same thing as "a symbol" -- they represent a single Python class or function object, defined at a single location, and these types allow us to track this identity across imports (including with aliasing), assignments, etc. So identifying reveal_type
via the type system is a natural fit; I don't really see what the alternative would even be.
In principle, we wouldn't need the extra variant for this; Type::Function
has all the information we need already. But practically speaking, it would be a bad idea for performance if we had to add an extra check "wait, is this function reveal_type
??" at every single call site of any function; is_stdlib_symbol
is not free, differentiating an enum tag is much much cheaper. Using a distinct variant allows us to make the is_stdlib_symbol
check only at function-definition time, which is much less frequent than function-call time.
Should we store the inner FunctionType
inside Type::RevealTypeFunction
?
I don't have super-strong feelings here, but I think the overall cost-benefit for doing this is positive.
The cost is very low; a FunctionType
is just a u32 id, and RevealTypeFunction
is not going to be a super common type. I would venture to say the cost of doing this is effectively zero.
And the benefit is not zero. We don't have to hardcode everything about RevealTypeFunction
. If we store the inner FunctionType
, we can in almost every case just fall back to the normal Function
handling, and only add special reveal_type
behavior where it is actually special. For instance, I would rather not hardcode call-signature validation, I'd rather share as much as possible with Type::Function
.
So yes, we could skip storing the inner FunctionType
, and instead hardcode literally everything about RevealTypeFunction
. But I don't see why we'd want to do this, when we don't have to.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't really see what the alternative would even be.
The alternative would be this diff, right? All the red-knot tests pass if I apply this diff to main
, and to me it seems much simpler conceptually. Perhaps it adds a performance cost, though, because you have to check whether the function is the reveal_type
function every time you run Type::Call
.
Diff demonstrating an implementation without a new `Type` variant
diff --git a/crates/red_knot_python_semantic/src/types.rs b/crates/red_knot_python_semantic/src/types.rs
index 607fa80ac..38451fd9c 100644
--- a/crates/red_knot_python_semantic/src/types.rs
+++ b/crates/red_knot_python_semantic/src/types.rs
@@ -238,8 +238,6 @@ pub enum Type<'db> {
None,
/// a specific function object
Function(FunctionType<'db>),
- /// The `typing.reveal_type` function, which has special `__call__` behavior.
- RevealTypeFunction(FunctionType<'db>),
/// a specific module object
Module(File),
/// a specific class object
@@ -326,9 +324,7 @@ impl<'db> Type<'db> {
pub const fn into_function_type(self) -> Option<FunctionType<'db>> {
match self {
- Type::Function(function_type) | Type::RevealTypeFunction(function_type) => {
- Some(function_type)
- }
+ Type::Function(function_type) => Some(function_type),
_ => None,
}
}
@@ -374,9 +370,7 @@ impl<'db> Type<'db> {
pub fn is_stdlib_symbol(&self, db: &'db dyn Db, module_name: &str, name: &str) -> bool {
match self {
Type::Class(class) => class.is_stdlib_symbol(db, module_name, name),
- Type::Function(function) | Type::RevealTypeFunction(function) => {
- function.is_stdlib_symbol(db, module_name, name)
- }
+ Type::Function(function) => function.is_stdlib_symbol(db, module_name, name),
_ => false,
}
}
@@ -450,7 +444,7 @@ impl<'db> Type<'db> {
// TODO: attribute lookup on None type
Type::Unknown
}
- Type::Function(_) | Type::RevealTypeFunction(_) => {
+ Type::Function(_) => {
// TODO: attribute lookup on function type
Type::Unknown
}
@@ -499,11 +493,16 @@ impl<'db> Type<'db> {
fn call(self, db: &'db dyn Db, arg_types: &[Type<'db>]) -> CallOutcome<'db> {
match self {
// TODO validate typed call arguments vs callable signature
- Type::Function(function_type) => CallOutcome::callable(function_type.return_type(db)),
- Type::RevealTypeFunction(function_type) => CallOutcome::revealed(
- function_type.return_type(db),
- *arg_types.first().unwrap_or(&Type::Unknown),
- ),
+ Type::Function(function_type) => {
+ if function_type.is_stdlib_symbol(db, "typing", "reveal_type") {
+ CallOutcome::revealed(
+ function_type.return_type(db),
+ *arg_types.first().unwrap_or(&Type::Unknown),
+ )
+ } else {
+ CallOutcome::callable(function_type.return_type(db))
+ }
+ }
// TODO annotated return type on `__new__` or metaclass `__call__`
Type::Class(class) => CallOutcome::callable(Type::Instance(class)),
@@ -605,7 +604,6 @@ impl<'db> Type<'db> {
Type::BooleanLiteral(_)
| Type::BytesLiteral(_)
| Type::Function(_)
- | Type::RevealTypeFunction(_)
| Type::Instance(_)
| Type::Module(_)
| Type::IntLiteral(_)
@@ -628,7 +626,7 @@ impl<'db> Type<'db> {
Type::BooleanLiteral(_) => builtins_symbol_ty(db, "bool"),
Type::BytesLiteral(_) => builtins_symbol_ty(db, "bytes"),
Type::IntLiteral(_) => builtins_symbol_ty(db, "int"),
- Type::Function(_) | Type::RevealTypeFunction(_) => types_symbol_ty(db, "FunctionType"),
+ Type::Function(_) => types_symbol_ty(db, "FunctionType"),
Type::Module(_) => types_symbol_ty(db, "ModuleType"),
Type::None => typeshed_symbol_ty(db, "NoneType"),
// TODO not accurate if there's a custom metaclass...
diff --git a/crates/red_knot_python_semantic/src/types/display.rs b/crates/red_knot_python_semantic/src/types/display.rs
index 3d037fe65..954a19d31 100644
--- a/crates/red_knot_python_semantic/src/types/display.rs
+++ b/crates/red_knot_python_semantic/src/types/display.rs
@@ -36,7 +36,6 @@ impl Display for DisplayType<'_> {
| Type::BytesLiteral(_)
| Type::Class(_)
| Type::Function(_)
- | Type::RevealTypeFunction(_)
) {
write!(f, "Literal[{representation}]",)
} else {
@@ -73,9 +72,7 @@ impl Display for DisplayRepresentation<'_> {
// TODO functions and classes should display using a fully qualified name
Type::Class(class) => f.write_str(class.name(self.db)),
Type::Instance(class) => f.write_str(class.name(self.db)),
- Type::Function(function) | Type::RevealTypeFunction(function) => {
- f.write_str(function.name(self.db))
- }
+ Type::Function(function) => f.write_str(function.name(self.db)),
Type::Union(union) => union.display(self.db).fmt(f),
Type::Intersection(intersection) => intersection.display(self.db).fmt(f),
Type::IntLiteral(n) => n.fmt(f),
@@ -194,7 +191,7 @@ impl TryFrom<Type<'_>> for LiteralTypeKind {
fn try_from(value: Type<'_>) -> Result<Self, Self::Error> {
match value {
Type::Class(_) => Ok(Self::Class),
- Type::Function(_) | Type::RevealTypeFunction(_) => Ok(Self::Function),
+ Type::Function(_) => Ok(Self::Function),
Type::IntLiteral(_) => Ok(Self::IntLiteral),
Type::StringLiteral(_) => Ok(Self::StringLiteral),
Type::BytesLiteral(_) => Ok(Self::BytesLiteral),
diff --git a/crates/red_knot_python_semantic/src/types/infer.rs b/crates/red_knot_python_semantic/src/types/infer.rs
index 52d95f1be..c53ce7b27 100644
--- a/crates/red_knot_python_semantic/src/types/infer.rs
+++ b/crates/red_knot_python_semantic/src/types/infer.rs
@@ -704,12 +704,12 @@ impl<'db> TypeInferenceBuilder<'db> {
}
}
- let function_type = FunctionType::new(self.db, name.id.clone(), definition, decorator_tys);
- let function_ty = if function_type.is_stdlib_symbol(self.db, "typing", "reveal_type") {
- Type::RevealTypeFunction(function_type)
- } else {
- Type::Function(function_type)
- };
+ let function_ty = Type::Function(FunctionType::new(
+ self.db,
+ name.id.clone(),
+ definition,
+ decorator_tys,
+ ));
self.add_declaration_with_binding(function.into(), definition, function_ty, function_ty);
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, yeah -- that's what I addressed in my second paragraph. I guess my first paragraph didn't make much sense / wasn't relevant, because you're not suggesting not using types at all.
This approach is definitely simpler, but I considered it a performance nonstarter to add the overhead of is_stdlib_symbol
to every Type::call. Perhaps that's premature optimization.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I agree the code maintenance is a bigger concern as we accumulate lots of these.
If the contained data inside enum variants is large, that will dominate the overall size of the enum. But in this case the contained data is small (usually just a u32 id; I think IntLiteral
is currently our largest variant, as it contains an i64
), so the need to find niches to discriminate the enum variants themselves can be relevant. Though in practice in this case there just aren't any niches available in an i64
, so the enum requires another entire 8 bytes for the discriminant (doubling the overall size of Type
to 16 bytes), and we are certainly not going to reach 2^64 variants 😆
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
doubling the overall size of
Type
to 16 bytes
I believe a FunctionType
instance is actually just a u32
, though, since it's a salsa-tracked struct — right? So adding a boolean field to FunctionType
would increase the size of the underlying data stored in the salsa cache for each FunctionType
instance, but wouldn't actually increase the size of the Type
enum...?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that's right. I never mentioned the size of the Type enum in relation to adding a boolean to FunctionType. I only brought up the size of the Type enum in relation to having a ton of extra Type variants in the future, and it's probably not even really relevant there.
Making every FunctionType bigger is less bad than making the Type enum bigger would be, but we will have quite a large number of FunctionType in the db.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, that was a sloppy misreading of what you said there!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No worries!
pub const fn into_function_type(self) -> Option<FunctionType<'db>> { | ||
match self { | ||
Type::Function(function_type) => Some(function_type), | ||
Type::Function(function_type) | Type::RevealTypeFunction(function_type) => { | ||
Some(function_type) | ||
} | ||
_ => None, | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess the answer is probably no because the function is called into_function_type
and not into_function
but is there any existing code that makes the assumption that the function only returns Some
if the variant is Function
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this makes Type::expect_function
not operate as expected, since the purpose of that method is partly to assert that the variant you have is a Type::Function
variant:
ruff/crates/red_knot_python_semantic/src/types.rs
Lines 332 to 335 in 4eb849a
pub fn expect_function(self) -> FunctionType<'db> { | |
self.into_function_type() | |
.expect("Expected a Type::Function variant") | |
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is there any existing code that makes the assumption that the function only returns
Some
if the variant isFunction
?
No, there's no such code. And I think we generally want any external code to ideally transparently treat Type::RevealTypeFunction
as if it were Type::Function
. Because reveal_type
is a function, and should behave as one.
This is another really good reason to store the inner FunctionType
inside RevealTypeFunction
.
I think this makes
Type::expect_function
not operate as expected
I think this makes Type::expect_function
behave precisely how we want it to, which is that the caller can transparently get out the correct FunctionType
without having to special-case RevealTypeFunction
. Once the caller has the FunctionType
, there's no reason they should have to know or care about whether it was a Type::Function
or Type::RevealTypeFunction
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this makes
Type::expect_function
behave precisely how we want it to
Well, you'll at least need to change the message in the .expect()
call here, to something like "expected a variant that wraps a FunctionType
"
ruff/crates/red_knot_python_semantic/src/types.rs
Lines 332 to 335 in 4eb849a
pub fn expect_function(self) -> FunctionType<'db> { | |
self.into_function_type() | |
.expect("Expected a Type::Function variant") | |
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good call! Fixed the message.
Type::Function(function_type) => CallOutcome::callable(function_type.return_type(db)), | ||
Type::RevealTypeFunction(function_type) => CallOutcome::RevealType { | ||
return_ty: function_type.return_type(db), | ||
revealed_ty: *arg_types.first().unwrap_or(&Type::Unknown), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be None
instead for better error messages? It's a difference whether a type is inferred as Unknown
or the argument was missing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, but this shouldn't be handled as a special case for CallOutcome::RevealType
, it should be handled by general call-argument-matching-and-validation, which applies to all function calls (and which we don't have yet.) So this code is presuming that in the future (once we add call signature checking) you would get a diagnostic from that; we don't need extra handling it for it here, we should just reveal unknown.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Possibly. But then you end up with two diagnostics where one is the error and the other is the reveal type diagnostic which doesn't seem ideal to me. But I can see that special casing reveal type's diagnostic is probably more work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point that it may not make sense to even have a revealed-type diagnostic if the call to reveal_type
is incorrect. But when we add call-signature checking (which can be shared in a common function used across both the Function
and RevealTypeFunction
arms, because it just needs the call arguments and the FunctionType
), the RevealTypeFunction
arm isn't required to return a CallOutcome::RevealType
at all. If it fails call signature checking, it can return a different outcome type, that won't issue a revealed-type diagnostic at all.
So in that case you can consider this fallback just a temporary band-aid until we have call signature checking (which I already added a TODO for above.)
outcomes: union | ||
.elements(db) | ||
.iter() | ||
.map(|elem| elem.call(db, arg_types)) | ||
.collect(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to unify the elements because multiple elements could have the same return type (or even all of them)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's handled by CallOutcome::return_ty
and CallOutcome::unwrap_with_diagnostic
, in the CallOutcome::Union
arms, where we do build a union of all the return types.
// TODO validate typed call arguments vs callable signature | ||
Type::Function(function_type) => CallOutcome::callable(function_type.return_type(db)), | ||
Type::RevealTypeFunction(function_type) => CallOutcome::RevealType { | ||
return_ty: function_type.return_type(db), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this a fixed return type or is the reveal_type
function similar to dbg
where it is an identity
function with the side effect that it prints the type?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a generic identity function, the same as dbg!
: https://github.com/python/cpython/blob/32119fc377a4d9df524a7bac02b6922a990361dd/Lib/typing.py#L3635-L3651
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be possible here to instead set return_ty
to the same as the revealed type. But this is another case where I would prefer to have less special-casing, and allow the return type to be inferred via the typeshed stubs, the same as for any other function. The result should be the same, although avoiding special-casing means we won't have that correct result until we add support for generic functions.
@@ -619,6 +651,136 @@ impl<'db> From<&Type<'db>> for Type<'db> { | |||
} | |||
} | |||
|
|||
#[derive(Debug, Clone, PartialEq, Eq)] | |||
enum CallOutcome<'db> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I find the name Outcome
a bit strange but I think that's something you plan to address as part of your context proposal.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't have any plans to address this. My document was just listing out pros and cons as I saw them of different options I'd considered in working on this, so as to be able to refer back to them later. My conclusion for now was to not do a context object, and stick with this style.
It's very possible that we later decide it makes sense to do a type inference context object, but that's not clear to me yet and not something I plan to do in the short term.
So if you don't like the name "Outcome" then we should brainstorm alternative name options, and we can change both CallOutcome
and IterationOutcome
. But maybe in a separate PR.
/// The `typing.reveal_type` function, which has special `__call__` behavior. | ||
RevealTypeFunction(FunctionType<'db>), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, agree with Micha that there's not much use storing inner data for this variant, since we already have the complexity of having an extra variant (wrapping inner data doesn't really save us any complexity, I don't think), everything about reveal_type
's special behaviour will have to be hardcoded in by us anyway, and the signature is known and guaranteed to never change.
Also, kind of funny to have this as a dedicated type
It surprised me initially that this was the approach used, since conceptually it's really the symbol of reveal_type
that's special rather than the type of reveal_type
. But I suppose that unless it's represented as a separate variant here, it's hard for Type::Call
to operate differently when applied to revealed_type
as opposed to other functions.
pub const fn into_function_type(self) -> Option<FunctionType<'db>> { | ||
match self { | ||
Type::Function(function_type) => Some(function_type), | ||
Type::Function(function_type) | Type::RevealTypeFunction(function_type) => { | ||
Some(function_type) | ||
} | ||
_ => None, | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this makes Type::expect_function
not operate as expected, since the purpose of that method is partly to assert that the variant you have is a Type::Function
variant:
ruff/crates/red_knot_python_semantic/src/types.rs
Lines 332 to 335 in 4eb849a
pub fn expect_function(self) -> FunctionType<'db> { | |
self.into_function_type() | |
.expect("Expected a Type::Function variant") | |
} |
// TODO validate typed call arguments vs callable signature | ||
Type::Function(function_type) => CallOutcome::callable(function_type.return_type(db)), | ||
Type::RevealTypeFunction(function_type) => CallOutcome::RevealType { | ||
return_ty: function_type.return_type(db), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a generic identity function, the same as dbg!
: https://github.com/python/cpython/blob/32119fc377a4d9df524a7bac02b6922a990361dd/Lib/typing.py#L3635-L3651
d671547
to
f0ace0e
Compare
Add support for the
typing.reveal_type
function, emitting a diagnostic revealing the type of its single argument. This is a necessary piece for the planned testing framework.This puts the cart slightly in front of the horse, in that we don't yet have proper support for validating call signatures / argument types. But it's easy to do just enough to make
reveal_type
work.This PR includes support for calling union types (this is necessary because we don't yet support
sys.version_info
checks, sotyping.reveal_type
itself is a union type), plus some nice consolidated error messages for calls to unions where some elements are not callable. This is mostly to demonstrate the flexibility in diagnostics that we get from theCallOutcome
enum.