Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

analyze: add pointee_type analysis #1029

Merged
merged 7 commits into from
Oct 13, 2023
Merged

Conversation

spernsteiner
Copy link
Collaborator

This branch adds an analysis that computes a set of possible pointee types for each pointer. It includes only the analysis; the analysis results aren't used to drive any rewriting yet. On algo_md5, the analysis correctly identifies that the _input argument to MD5_Update should actually point to u8s, that the memset in li_MD5Transform operates on u32s, and that the memset in MD5_Final operates on an MD5_CTX.

The next step after this will be to use the analysis results in rewriting. We need to rewrite pointer types to use the inferred pointees (such as changing _input: &c_void to _input: &[u8]), replace memcpy and memset with safe operations, and (optionally) delete casts that are now no-ops (such as input = _input as *const c_uchar, once input and _input are both rewritten to &[u8]). This may also require some changes to our current void* cast handling.

#[derive(Clone, Copy, PartialEq, Eq, Debug, Hash)]
pub enum CTy<'tcx> {
Ty(LTy<'tcx>),
/// An inference variable. Note that inference variables are scoped to the local function;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is an inference variable?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it the state of some variable before its equivalence to a type is determined?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a unification variable as used in type inference.


#[derive(Clone, Copy, PartialEq, Eq, Debug, Hash)]
pub enum Constraint<'tcx> {
/// The set of types for pointer `.0` must contain type `.1`. This is used for "uses" of a
Copy link
Contributor

@aneksteind aneksteind Sep 28, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what's the notation .N?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Referring to the anonymous fields of this enum variant, similar to field access in tuples and tuple structs (e.g. let foo = (1, 2); return foo.0;)

AllTypesCompatible(PointerId),

/// The set of types for pointer `.0` must be a subset of the set of types for pointer `.1`.
/// This is used for pointer assignments like `p = q`, among other things.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know it's apparent to those with dataflow familiarity but I think it's worth equating .0 and q and .1 and p in the documentation somehow

Copy link
Collaborator Author

@spernsteiner spernsteiner Oct 13, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adjusted comment: 763e88f

@aneksteind
Copy link
Contributor

is this doing equality saturation? it looks very familiar to the approach egg takes with egraphs, so i'm curious why something like that wasn't used and why this was done by hand?

self.visit_place(pl);
}
Rvalue::Aggregate(_, ref ops) => {
// TODO
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

make TODO more specfic please, i'm not sure what's left to be done

Copy link
Collaborator Author

@spernsteiner spernsteiner Oct 13, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Made these a little more specific: 5f1a2a1

@aneksteind
Copy link
Contributor

It's a little difficult for me to review this for correctness, especially given there are no tests. How can I be most useful to you here as a reviewer? One of my concerns is that it's all hand-made with no reference to a standard equivalence constraint solving algorithm. We do this in a few places and it seems repetitious and error-prone; please tell me if my concerns are overblown.

@spernsteiner spernsteiner changed the base branch from analyze-pointee-type-base to master October 13, 2023 21:20
@spernsteiner spernsteiner merged commit 2262835 into master Oct 13, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants