Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[red-knot] support for typing.reveal_type #13384

Merged
merged 6 commits into from
Sep 18, 2024
Merged

[red-knot] support for typing.reveal_type #13384

merged 6 commits into from
Sep 18, 2024

Conversation

carljm
Copy link
Contributor

@carljm carljm commented Sep 17, 2024

Add support for the typing.reveal_type function, emitting a diagnostic revealing the type of its single argument. This is a necessary piece for the planned testing framework.

This puts the cart slightly in front of the horse, in that we don't yet have proper support for validating call signatures / argument types. But it's easy to do just enough to make reveal_type work.

This PR includes support for calling union types (this is necessary because we don't yet support sys.version_info checks, so typing.reveal_type itself is a union type), plus some nice consolidated error messages for calls to unions where some elements are not callable. This is mostly to demonstrate the flexibility in diagnostics that we get from the CallOutcome enum.

@carljm carljm added the red-knot Multi-file analysis & type inference label Sep 17, 2024
@carljm carljm changed the title [red-knot] [WIP] typing.reveal_type [red-knot] support for typing.reveal_type Sep 17, 2024
@carljm carljm force-pushed the cjm/reveal-type branch 2 times, most recently from 4a82682 to 4b171bd Compare September 17, 2024 21:45
Copy link
Contributor

github-actions bot commented Sep 17, 2024

ruff-ecosystem results

Linter (stable)

✅ ecosystem check detected no linter changes.

Linter (preview)

✅ ecosystem check detected no linter changes.

@carljm carljm marked this pull request as ready for review September 17, 2024 22:07
Comment on lines +241 to +242
/// The `typing.reveal_type` function, which has special `__call__` behavior.
RevealTypeFunction(FunctionType<'db>),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to store the entire FunctionType for RevealType? Don't we know its exact signature?

Also, kind of funny to have this as a dedicated type

Copy link
Member

@AlexWaygood AlexWaygood Sep 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, agree with Micha that there's not much use storing inner data for this variant, since we already have the complexity of having an extra variant (wrapping inner data doesn't really save us any complexity, I don't think), everything about reveal_type's special behaviour will have to be hardcoded in by us anyway, and the signature is known and guaranteed to never change.

Also, kind of funny to have this as a dedicated type

It surprised me initially that this was the approach used, since conceptually it's really the symbol of reveal_type that's special rather than the type of reveal_type. But I suppose that unless it's represented as a separate variant here, it's hard for Type::Call to operate differently when applied to revealed_type as opposed to other functions.

Copy link
Contributor Author

@carljm carljm Sep 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why have this as a distinct Type variant?

Conceptually, singleton types (which include Class and Function) are already the same thing as "a symbol" -- they represent a single Python class or function object, defined at a single location, and these types allow us to track this identity across imports (including with aliasing), assignments, etc. So identifying reveal_type via the type system is a natural fit; I don't really see what the alternative would even be.

In principle, we wouldn't need the extra variant for this; Type::Function has all the information we need already. But practically speaking, it would be a bad idea for performance if we had to add an extra check "wait, is this function reveal_type??" at every single call site of any function; is_stdlib_symbol is not free, differentiating an enum tag is much much cheaper. Using a distinct variant allows us to make the is_stdlib_symbol check only at function-definition time, which is much less frequent than function-call time.

Should we store the inner FunctionType inside Type::RevealTypeFunction?

I don't have super-strong feelings here, but I think the overall cost-benefit for doing this is positive.

The cost is very low; a FunctionType is just a u32 id, and RevealTypeFunction is not going to be a super common type. I would venture to say the cost of doing this is effectively zero.

And the benefit is not zero. We don't have to hardcode everything about RevealTypeFunction. If we store the inner FunctionType, we can in almost every case just fall back to the normal Function handling, and only add special reveal_type behavior where it is actually special. For instance, I would rather not hardcode call-signature validation, I'd rather share as much as possible with Type::Function.

So yes, we could skip storing the inner FunctionType, and instead hardcode literally everything about RevealTypeFunction. But I don't see why we'd want to do this, when we don't have to.

Copy link
Member

@AlexWaygood AlexWaygood Sep 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't really see what the alternative would even be.

The alternative would be this diff, right? All the red-knot tests pass if I apply this diff to main, and to me it seems much simpler conceptually. Perhaps it adds a performance cost, though, because you have to check whether the function is the reveal_type function every time you run Type::Call.

Diff demonstrating an implementation without a new `Type` variant
diff --git a/crates/red_knot_python_semantic/src/types.rs b/crates/red_knot_python_semantic/src/types.rs
index 607fa80ac..38451fd9c 100644
--- a/crates/red_knot_python_semantic/src/types.rs
+++ b/crates/red_knot_python_semantic/src/types.rs
@@ -238,8 +238,6 @@ pub enum Type<'db> {
     None,
     /// a specific function object
     Function(FunctionType<'db>),
-    /// The `typing.reveal_type` function, which has special `__call__` behavior.
-    RevealTypeFunction(FunctionType<'db>),
     /// a specific module object
     Module(File),
     /// a specific class object
@@ -326,9 +324,7 @@ impl<'db> Type<'db> {
 
     pub const fn into_function_type(self) -> Option<FunctionType<'db>> {
         match self {
-            Type::Function(function_type) | Type::RevealTypeFunction(function_type) => {
-                Some(function_type)
-            }
+            Type::Function(function_type) => Some(function_type),
             _ => None,
         }
     }
@@ -374,9 +370,7 @@ impl<'db> Type<'db> {
     pub fn is_stdlib_symbol(&self, db: &'db dyn Db, module_name: &str, name: &str) -> bool {
         match self {
             Type::Class(class) => class.is_stdlib_symbol(db, module_name, name),
-            Type::Function(function) | Type::RevealTypeFunction(function) => {
-                function.is_stdlib_symbol(db, module_name, name)
-            }
+            Type::Function(function) => function.is_stdlib_symbol(db, module_name, name),
             _ => false,
         }
     }
@@ -450,7 +444,7 @@ impl<'db> Type<'db> {
                 // TODO: attribute lookup on None type
                 Type::Unknown
             }
-            Type::Function(_) | Type::RevealTypeFunction(_) => {
+            Type::Function(_) => {
                 // TODO: attribute lookup on function type
                 Type::Unknown
             }
@@ -499,11 +493,16 @@ impl<'db> Type<'db> {
     fn call(self, db: &'db dyn Db, arg_types: &[Type<'db>]) -> CallOutcome<'db> {
         match self {
             // TODO validate typed call arguments vs callable signature
-            Type::Function(function_type) => CallOutcome::callable(function_type.return_type(db)),
-            Type::RevealTypeFunction(function_type) => CallOutcome::revealed(
-                function_type.return_type(db),
-                *arg_types.first().unwrap_or(&Type::Unknown),
-            ),
+            Type::Function(function_type) => {
+                if function_type.is_stdlib_symbol(db, "typing", "reveal_type") {
+                    CallOutcome::revealed(
+                        function_type.return_type(db),
+                        *arg_types.first().unwrap_or(&Type::Unknown),
+                    )
+                } else {
+                    CallOutcome::callable(function_type.return_type(db))
+                }
+            }
 
             // TODO annotated return type on `__new__` or metaclass `__call__`
             Type::Class(class) => CallOutcome::callable(Type::Instance(class)),
@@ -605,7 +604,6 @@ impl<'db> Type<'db> {
             Type::BooleanLiteral(_)
             | Type::BytesLiteral(_)
             | Type::Function(_)
-            | Type::RevealTypeFunction(_)
             | Type::Instance(_)
             | Type::Module(_)
             | Type::IntLiteral(_)
@@ -628,7 +626,7 @@ impl<'db> Type<'db> {
             Type::BooleanLiteral(_) => builtins_symbol_ty(db, "bool"),
             Type::BytesLiteral(_) => builtins_symbol_ty(db, "bytes"),
             Type::IntLiteral(_) => builtins_symbol_ty(db, "int"),
-            Type::Function(_) | Type::RevealTypeFunction(_) => types_symbol_ty(db, "FunctionType"),
+            Type::Function(_) => types_symbol_ty(db, "FunctionType"),
             Type::Module(_) => types_symbol_ty(db, "ModuleType"),
             Type::None => typeshed_symbol_ty(db, "NoneType"),
             // TODO not accurate if there's a custom metaclass...
diff --git a/crates/red_knot_python_semantic/src/types/display.rs b/crates/red_knot_python_semantic/src/types/display.rs
index 3d037fe65..954a19d31 100644
--- a/crates/red_knot_python_semantic/src/types/display.rs
+++ b/crates/red_knot_python_semantic/src/types/display.rs
@@ -36,7 +36,6 @@ impl Display for DisplayType<'_> {
                 | Type::BytesLiteral(_)
                 | Type::Class(_)
                 | Type::Function(_)
-                | Type::RevealTypeFunction(_)
         ) {
             write!(f, "Literal[{representation}]",)
         } else {
@@ -73,9 +72,7 @@ impl Display for DisplayRepresentation<'_> {
             // TODO functions and classes should display using a fully qualified name
             Type::Class(class) => f.write_str(class.name(self.db)),
             Type::Instance(class) => f.write_str(class.name(self.db)),
-            Type::Function(function) | Type::RevealTypeFunction(function) => {
-                f.write_str(function.name(self.db))
-            }
+            Type::Function(function) => f.write_str(function.name(self.db)),
             Type::Union(union) => union.display(self.db).fmt(f),
             Type::Intersection(intersection) => intersection.display(self.db).fmt(f),
             Type::IntLiteral(n) => n.fmt(f),
@@ -194,7 +191,7 @@ impl TryFrom<Type<'_>> for LiteralTypeKind {
     fn try_from(value: Type<'_>) -> Result<Self, Self::Error> {
         match value {
             Type::Class(_) => Ok(Self::Class),
-            Type::Function(_) | Type::RevealTypeFunction(_) => Ok(Self::Function),
+            Type::Function(_) => Ok(Self::Function),
             Type::IntLiteral(_) => Ok(Self::IntLiteral),
             Type::StringLiteral(_) => Ok(Self::StringLiteral),
             Type::BytesLiteral(_) => Ok(Self::BytesLiteral),
diff --git a/crates/red_knot_python_semantic/src/types/infer.rs b/crates/red_knot_python_semantic/src/types/infer.rs
index 52d95f1be..c53ce7b27 100644
--- a/crates/red_knot_python_semantic/src/types/infer.rs
+++ b/crates/red_knot_python_semantic/src/types/infer.rs
@@ -704,12 +704,12 @@ impl<'db> TypeInferenceBuilder<'db> {
             }
         }
 
-        let function_type = FunctionType::new(self.db, name.id.clone(), definition, decorator_tys);
-        let function_ty = if function_type.is_stdlib_symbol(self.db, "typing", "reveal_type") {
-            Type::RevealTypeFunction(function_type)
-        } else {
-            Type::Function(function_type)
-        };
+        let function_ty = Type::Function(FunctionType::new(
+            self.db,
+            name.id.clone(),
+            definition,
+            decorator_tys,
+        ));
 
         self.add_declaration_with_binding(function.into(), definition, function_ty, function_ty);
     }

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, yeah -- that's what I addressed in my second paragraph. I guess my first paragraph didn't make much sense / wasn't relevant, because you're not suggesting not using types at all.

This approach is definitely simpler, but I considered it a performance nonstarter to add the overhead of is_stdlib_symbol to every Type::call. Perhaps that's premature optimization.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I agree the code maintenance is a bigger concern as we accumulate lots of these.

If the contained data inside enum variants is large, that will dominate the overall size of the enum. But in this case the contained data is small (usually just a u32 id; I think IntLiteral is currently our largest variant, as it contains an i64), so the need to find niches to discriminate the enum variants themselves can be relevant. Though in practice in this case there just aren't any niches available in an i64, so the enum requires another entire 8 bytes for the discriminant (doubling the overall size of Type to 16 bytes), and we are certainly not going to reach 2^64 variants 😆

Copy link
Member

@AlexWaygood AlexWaygood Sep 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

doubling the overall size of Type to 16 bytes

I believe a FunctionType instance is actually just a u32, though, since it's a salsa-tracked struct — right? So adding a boolean field to FunctionType would increase the size of the underlying data stored in the salsa cache for each FunctionType instance, but wouldn't actually increase the size of the Type enum...?

Copy link
Contributor Author

@carljm carljm Sep 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that's right. I never mentioned the size of the Type enum in relation to adding a boolean to FunctionType. I only brought up the size of the Type enum in relation to having a ton of extra Type variants in the future, and it's probably not even really relevant there.

Making every FunctionType bigger is less bad than making the Type enum bigger would be, but we will have quite a large number of FunctionType in the db.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, that was a sloppy misreading of what you said there!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No worries!

Comment on lines 327 to 334
pub const fn into_function_type(self) -> Option<FunctionType<'db>> {
match self {
Type::Function(function_type) => Some(function_type),
Type::Function(function_type) | Type::RevealTypeFunction(function_type) => {
Some(function_type)
}
_ => None,
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess the answer is probably no because the function is called into_function_type and not into_function but is there any existing code that makes the assumption that the function only returns Some if the variant is Function?

Copy link
Member

@AlexWaygood AlexWaygood Sep 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this makes Type::expect_function not operate as expected, since the purpose of that method is partly to assert that the variant you have is a Type::Function variant:

pub fn expect_function(self) -> FunctionType<'db> {
self.into_function_type()
.expect("Expected a Type::Function variant")
}

Copy link
Contributor Author

@carljm carljm Sep 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is there any existing code that makes the assumption that the function only returns Some if the variant is Function?

No, there's no such code. And I think we generally want any external code to ideally transparently treat Type::RevealTypeFunction as if it were Type::Function. Because reveal_type is a function, and should behave as one.

This is another really good reason to store the inner FunctionType inside RevealTypeFunction.

I think this makes Type::expect_function not operate as expected

I think this makes Type::expect_function behave precisely how we want it to, which is that the caller can transparently get out the correct FunctionType without having to special-case RevealTypeFunction. Once the caller has the FunctionType, there's no reason they should have to know or care about whether it was a Type::Function or Type::RevealTypeFunction.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this makes Type::expect_function behave precisely how we want it to

Well, you'll at least need to change the message in the .expect() call here, to something like "expected a variant that wraps a FunctionType"

pub fn expect_function(self) -> FunctionType<'db> {
self.into_function_type()
.expect("Expected a Type::Function variant")
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good call! Fixed the message.

Type::Function(function_type) => CallOutcome::callable(function_type.return_type(db)),
Type::RevealTypeFunction(function_type) => CallOutcome::RevealType {
return_ty: function_type.return_type(db),
revealed_ty: *arg_types.first().unwrap_or(&Type::Unknown),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be None instead for better error messages? It's a difference whether a type is inferred as Unknown or the argument was missing.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but this shouldn't be handled as a special case for CallOutcome::RevealType, it should be handled by general call-argument-matching-and-validation, which applies to all function calls (and which we don't have yet.) So this code is presuming that in the future (once we add call signature checking) you would get a diagnostic from that; we don't need extra handling it for it here, we should just reveal unknown.

Copy link
Member

@MichaReiser MichaReiser Sep 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Possibly. But then you end up with two diagnostics where one is the error and the other is the reveal type diagnostic which doesn't seem ideal to me. But I can see that special casing reveal type's diagnostic is probably more work.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point that it may not make sense to even have a revealed-type diagnostic if the call to reveal_type is incorrect. But when we add call-signature checking (which can be shared in a common function used across both the Function and RevealTypeFunction arms, because it just needs the call arguments and the FunctionType), the RevealTypeFunction arm isn't required to return a CallOutcome::RevealType at all. If it fails call signature checking, it can return a different outcome type, that won't issue a revealed-type diagnostic at all.

So in that case you can consider this fallback just a temporary band-aid until we have call signature checking (which I already added a TODO for above.)

crates/red_knot_python_semantic/src/types.rs Outdated Show resolved Hide resolved
Comment on lines 520 to 524
outcomes: union
.elements(db)
.iter()
.map(|elem| elem.call(db, arg_types))
.collect(),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to unify the elements because multiple elements could have the same return type (or even all of them)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's handled by CallOutcome::return_ty and CallOutcome::unwrap_with_diagnostic, in the CallOutcome::Union arms, where we do build a union of all the return types.

// TODO validate typed call arguments vs callable signature
Type::Function(function_type) => CallOutcome::callable(function_type.return_type(db)),
Type::RevealTypeFunction(function_type) => CallOutcome::RevealType {
return_ty: function_type.return_type(db),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this a fixed return type or is the reveal_type function similar to dbg where it is an identity function with the side effect that it prints the type?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

@carljm carljm Sep 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be possible here to instead set return_ty to the same as the revealed type. But this is another case where I would prefer to have less special-casing, and allow the return type to be inferred via the typeshed stubs, the same as for any other function. The result should be the same, although avoiding special-casing means we won't have that correct result until we add support for generic functions.

crates/red_knot_python_semantic/src/types.rs Outdated Show resolved Hide resolved
crates/red_knot_python_semantic/src/types/infer.rs Outdated Show resolved Hide resolved
@@ -619,6 +651,136 @@ impl<'db> From<&Type<'db>> for Type<'db> {
}
}

#[derive(Debug, Clone, PartialEq, Eq)]
enum CallOutcome<'db> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find the name Outcome a bit strange but I think that's something you plan to address as part of your context proposal.

Copy link
Contributor Author

@carljm carljm Sep 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have any plans to address this. My document was just listing out pros and cons as I saw them of different options I'd considered in working on this, so as to be able to refer back to them later. My conclusion for now was to not do a context object, and stick with this style.

It's very possible that we later decide it makes sense to do a type inference context object, but that's not clear to me yet and not something I plan to do in the short term.

So if you don't like the name "Outcome" then we should brainstorm alternative name options, and we can change both CallOutcome and IterationOutcome. But maybe in a separate PR.

Comment on lines +241 to +242
/// The `typing.reveal_type` function, which has special `__call__` behavior.
RevealTypeFunction(FunctionType<'db>),
Copy link
Member

@AlexWaygood AlexWaygood Sep 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, agree with Micha that there's not much use storing inner data for this variant, since we already have the complexity of having an extra variant (wrapping inner data doesn't really save us any complexity, I don't think), everything about reveal_type's special behaviour will have to be hardcoded in by us anyway, and the signature is known and guaranteed to never change.

Also, kind of funny to have this as a dedicated type

It surprised me initially that this was the approach used, since conceptually it's really the symbol of reveal_type that's special rather than the type of reveal_type. But I suppose that unless it's represented as a separate variant here, it's hard for Type::Call to operate differently when applied to revealed_type as opposed to other functions.

Comment on lines 327 to 334
pub const fn into_function_type(self) -> Option<FunctionType<'db>> {
match self {
Type::Function(function_type) => Some(function_type),
Type::Function(function_type) | Type::RevealTypeFunction(function_type) => {
Some(function_type)
}
_ => None,
}
}
Copy link
Member

@AlexWaygood AlexWaygood Sep 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this makes Type::expect_function not operate as expected, since the purpose of that method is partly to assert that the variant you have is a Type::Function variant:

pub fn expect_function(self) -> FunctionType<'db> {
self.into_function_type()
.expect("Expected a Type::Function variant")
}

crates/red_knot_python_semantic/src/types.rs Outdated Show resolved Hide resolved
// TODO validate typed call arguments vs callable signature
Type::Function(function_type) => CallOutcome::callable(function_type.return_type(db)),
Type::RevealTypeFunction(function_type) => CallOutcome::RevealType {
return_ty: function_type.return_type(db),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@carljm carljm merged commit c173ec5 into main Sep 18, 2024
20 checks passed
@carljm carljm deleted the cjm/reveal-type branch September 18, 2024 16:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
red-knot Multi-file analysis & type inference
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants