Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add exhaustive_match_output RFC #3340

Closed
wants to merge 5 commits into from

Conversation

Xaeroxe
Copy link
Contributor

@Xaeroxe Xaeroxe commented Oct 31, 2022

@Xaeroxe Xaeroxe changed the title Add exhaustive_output RFC Add exhaustive_match_output RFC Oct 31, 2022
@joshtriplett joshtriplett added the T-lang Relevant to the language team, which will review and decide on the RFC. label Oct 31, 2022
@joshtriplett
Copy link
Member

I love it. It's simple, self-explanatory even if you haven't seen it, and helps with fiddly parsers/decoders.

Co-authored-by: Josh Triplett <josh@joshtriplett.org>
@afetisov
Copy link

afetisov commented Nov 1, 2022

The alternatives section misses the obvious current solution: the dead code lint, which warns that "variant .. is never constructed". Now, it won't work if you construct that variant in some other way, or if the enum is public (so that downstream users can construct it), but in practice it covers many of those "bag of variants" enums.

Also, since it doesn't affect the match semantics and is just a lint, it should be introduced as a lint. Perhaps an allow-by-default clippy::restriction lint?

How does it interact with #[non_exhaustive] enums? It feels like if the enum is defined in a different crate, then it's wrong to try and enforce any exhaustiveness of constructed variants. It would be a direct semver hazard.

@Xaeroxe
Copy link
Contributor Author

Xaeroxe commented Nov 1, 2022

The alternatives section misses the obvious current solution: the dead code lint

Good point, I'll mention it. As you say, the dead code lint doesn't cover all such scenarios like this so I think this attribute is still useful.

Perhaps an allow-by-default clippy::restriction lint?

I considered this, but it seemed like a poor fit for clippy. Most clippy lints seem like you're intended to turn them on for the whole project, not just one match statement.

How does it interact with #[non_exhaustive] enums? It feels like if the enum is defined in a different crate, then it's wrong to try and enforce any exhaustiveness of constructed variants. It would be a direct semver hazard.

Perhaps this should be made incompatible with external non_exhaustive enums. I'm only intending for this to be a warning by default, i.e. doesn't inhibit compilation. If someone were to elevate the lint to deny would rustc stop compiling it even if the code were inside a dependency? I'm not sure how deny works if the code is a dependency.

@afetisov
Copy link

afetisov commented Nov 1, 2022

A built-in attribute is a significant burden. If something can be made just a lint, I see no reason not to make it a lint.

It's true that I can't think of any lints which are expected to be attached to a specific expression, but that's mostly because you don't normally know where an error could occur. In principle, I don't see a reason why a lint couldn't work that way.

In fact, now that you mention it, I see even stronger reasons to make it a lint. Why not attach the attribute to the whole function, so that several individual matches inside of it (e.g. in different branches) can be checked for exhaustive outputs? Why not attach it to a module, or even to the crate? There are low-level crates and modules which mostly consist of several huge definitions of data enums, and their parsing/serialization logic. We could enable the lint once at the crate root, and perhaps disable it on several specific functions which are not expected to give exhaustive outputs.

@Xaeroxe
Copy link
Contributor Author

Xaeroxe commented Nov 1, 2022

A built-in attribute is a significant burden.

Is there something I can read to understand why this is?

@afetisov
Copy link

afetisov commented Nov 1, 2022

Ok, maybe "significant" is an over-exaggeration on my part (but I really think so). My opinion is that the attributes are accepted too lightly, without giving proper weight to the fact that each attribute is a small language extension, a burden on all readers of the code (which can't be expected to know by heart all attributes in the Rust docs, and there is no way to search for attribute definition in code) and that there is no way to manage the complexity of growing number of attributes, since they all exist at the global scope and can't be abstracted in any way. In particular, all built-in attributes can conflict with any users' custom attributes (either proc macros, or their auxiliary attributes).

For that reason, if it's possible to scope the attribute, then it's best to do it. For example, if I see #[deny(clippy::fusrodah)], then I immediately know that it's denying some clippy lint, doesn't affect semantics in any way, and can't conflict with any other attributes. I don't need to search what fusrodah means, Clippy will tell me if it matters.

In my opinion, it's just too easy to dump any language extensions into attributes, since you don't need to invent extra syntax or think about interaction with alternative parsers (in macros or tooling). This causes a tendency to push all language extensions into attributes, turning them into that weird meta-Rust, with horrible syntax and potentially unforseen interactions with other language features (it's just so easy to forget about some obscure attribute which may not be present in code, rather than stable syntax, which is at least explicitly defined in the parsers and the reference).

@Xaeroxe
Copy link
Contributor Author

Xaeroxe commented Nov 1, 2022

a small language extension, a burden on all readers of the code

This is true of most abstractions yes? Abstractions, be they functions, macros, attributes, syntax, or lints have to earn their place. The overhead involved in understanding them needs to be worth the power or expressiveness that they introduce. I can express the documentation for this particular attribute in a paragraph or two, and it helps with some really common use cases, namely parsers/decoders of any sort. In order to use mental burden as justification for rejecting the RFC I think we need to demonstrate two things.

  1. There is another way to implement the feature described, it doesn't have to live in rustc.
  2. The use case is specific enough that it shouldn't be in generalized tooling.

I don't believe we have justification for point 2, parsers/decoders are very common. As for point 1, I don't have much ideological issue with the lint living in some other tool, such as clippy, but given that there's no precedent for putting this sort of lint in clippy I'm reluctant to try and steer the direction of rust-lang tooling without input from one of the rust-lang teams.

I agree it'd be useful to namespace rustc lints, but that's out of scope for this particular RFC.

@Lokathor
Copy link
Contributor

Lokathor commented Nov 1, 2022

lints without a prefix are implicitly rustc lints. it's not a great system, but it's the system.

As to this being in clippy or rustc, in general it's always fine for any lint of a more narrow nature to start in clippy and graduate to rustc if it turns out to be essential. Or it can live forever in clippy, there's not much harm in that either.

@Xaeroxe
Copy link
Contributor Author

Xaeroxe commented Nov 1, 2022

That makes sense to me and I'm willing to transition this to a clippy lint, if someone involved with a rust-lang team can confirm for me that this is the correct thing to do. I can't find any prior precedent for this sort of thing being a clippy lint. To me it's just a matter of "what repo should it be written in?" The functionality and purpose is largely still the same.

@Lokathor
Copy link
Contributor

Lokathor commented Nov 1, 2022

Josh T, the first to reply in this PR, is on T-lang if that's what you mean. I'm sure they'll chime in again when available.

The contribution guide for clippy is here: https://github.com/rust-lang/rust-clippy/blob/master/CONTRIBUTING.md

And the most important detail of clippy development, at least for this discussion, is that it does not have tightly controlled requirements for new lints. You can just open an issue in the repo about a new lint idea, and generally it can be added to clippy without any huge fanfare or oversight process, just a fairly normal PR review cycle.

@Noratrieb
Copy link
Member

I like it! My only concern is that, as an attribute, it's not very discoverable but that is something that needs to be fixed in other places anyways imo, so it shouldn't be a blocker.
I think that the #[non_exhaustive] attribute to control exhaustiveness in combination with Rusts default exhaustive match can be seen as prior art as well, even if it's the other way around there.

@programmerjake
Copy link
Member

one reason to not just have this be a lint is that a lint would apply to all match statements rather than just the one match statement with the attribute, e.g.:

pub enum MyEnum {
    ZeroZero,
    ZeroOne,
    One,
    Two,
    Other,
}

pub fn demo(iter: &mut impl Iterator<Item = u8>) -> Option<MyEnum> {
    Some(#[warn(exhaustive_match_output)]
        match iter.next()? {
        0 => match iter.next()? {
            // undesirably warns on inner match, because lint attributes apply to
            // everything inside rather than just what's directly attributed.
            0 => MyEnum::ZeroZero,
            1 => MyEnum::ZeroOne,
            _ => MyEnum::Other,
        },
        1 => MyEnum::One,
        2 => MyEnum::Two,
        _ => MyEnum::Other,
    })
}

pub fn demo2(iter: &mut impl Iterator<Item = u8>) -> Option<MyEnum> {
    Some(#[exhaustive_match_output]
        match iter.next()? {
        0 => match iter.next()? {
            // doesn't warn on inner match -- as desired
            0 => MyEnum::ZeroZero,
            1 => MyEnum::ZeroOne,
            _ => MyEnum::Other,
        },
        1 => MyEnum::One,
        2 => MyEnum::Two,
        _ => MyEnum::Other,
    })
}

@Lokathor
Copy link
Contributor

Lokathor commented Nov 2, 2022

Can't we just make it an allow by default lint if it shouldn't apply to all match statements by default?

@Xaeroxe
Copy link
Contributor Author

Xaeroxe commented Nov 2, 2022

I think the issue here is that when you apply the deny attribute to the outer match, it also cascades into the inner match.

@afetisov
Copy link

afetisov commented Nov 2, 2022

@programmerjake I'd say it's very debatable whether your example is working as intended in either case.

@Xaeroxe
Copy link
Contributor Author

Xaeroxe commented Nov 2, 2022

That may be, but the point they were trying to illustrate still stands. Here's a different example

pub enum MyEnum {
	Zero(Inner)
    One,
    Two,
    Other,
}

pub enum Inner {
	Zero,
	One,
	Two,
	Other,
}

pub fn demo(iter: &mut impl Iterator<Item = u8>) -> Option<MyEnum> {
    Some(#[warn(exhaustive_match_output)]
        match iter.next()? {
	        0 => MyEnum::Zero(match iter.next()? {
	            // undesirably warns on inner match, because lint attributes apply to
	            // everything inside rather than just what's directly attributed.
	            0 => Inner::Zero,
	            1 => Inner::One,
	            _ => Inner::Other,
	        }),
	        1 => MyEnum::One,
	        2 => MyEnum::Two,
	        _ => MyEnum::Other,
    	}
	)
}

pub fn demo2(iter: &mut impl Iterator<Item = u8>) -> Option<MyEnum> {
    Some(#[exhaustive_match_output]
        match iter.next()? {
			0 => MyEnum::Zero(match iter.next()? {
	            // doesn't warn on inner match -- as desired
	            0 => Inner::Zero,
	            1 => Inner::One,
	            _ => Inner::Other,
	        }),
	        1 => MyEnum::One,
	        2 => MyEnum::Two,
	        _ => MyEnum::Other,
	    }
	)
}

@scottmcm
Copy link
Member

scottmcm commented Nov 3, 2022

This is basically the other half of https://doc.rust-lang.org/nightly/rustc/lints/listing/allowed-by-default.html#non-exhaustive-omitted-patterns -- that was about matching things, and this is about producing things -- so my default intuition here is that this should work exactly the same: allow-by-default lint that will never be on-by-default, but can be turned on in specific situations as needed.

I agree we should avoid attribute additions in general, especially when they're about static analysis and not semantics. There's an infinite number of possible things that might want attributes to mark them, so those things would be best as lints or tool attributes or ghost code whenever possible.

# Motivation
[motivation]: #motivation

Oftentimes it is necessary to convert unrestricted values into enum variants. Unrestricted values can include integers, strings, UUIDs, even other much larger enums.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that integers are exhaustive -- we even use that in matching, where you don't need wildcards to match integers (or chars, for that matter!).

So shouldn't this use that same exhaustiveness checking? What if I wrote

#[exhaustive_output]
match my_u8 {
    0 => 255,
    x @ 1..=254 => x,
    255 => 1, // Oops, I meant `0`
}

Could that give me a warning that I wasn't exhaustive?

More plausibly, if we get something like #[niche], then arguably anything we can support for enums we should be supporting for niched integers too -- otherwise it's arguably better to #[repr(u8)] enum Foo { _1 = 1, _2 = 2, _3 = 3 } than to use RefinedU8<1..=3>.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose so, though from an implementation perspective this complicates things a lot as the compiler has to infer the possible outputs of x @ 1..=254 => x,, and pretty much every other integer pattern.

@Nemo157
Copy link
Member

Nemo157 commented Nov 3, 2022

This has been requested as a clippy lint previously. As mentioned in that issue as a lint it's hard to know what to lint on specifically, and the only suggested solution was some sort of attribute like proposed here.

@Nemo157
Copy link
Member

Nemo157 commented Nov 3, 2022

How does it interact with #[non_exhaustive] enums? It feels like if the enum is defined in a different crate, then it's wrong to try and enforce any exhaustiveness of constructed variants. It would be a direct semver hazard.

Perhaps this should be made incompatible with external non_exhaustive enums.

non_exhaustive enums are a primary usecase for this sort of lint. Those are the most likely enums to have a new variant added to, and as the creator of the enum you want to update all dynamic creation-points to potentially create the new variant.

@kornelski
Copy link
Contributor

kornelski commented Nov 7, 2022

edit: nevermind

@Xaeroxe
Copy link
Contributor Author

Xaeroxe commented Nov 7, 2022

@kornelski

This was suggested as an attribute specifically because the behavior it introduces is undesirable for many match statements. I don't see how it's possible or desirable to implement this without specifically marking the match as opting in to this behavior.

@ijackson
Copy link

ijackson commented Nov 9, 2022

The RFC uses the phrase "arm(s) for (some) variant" (emph mine). I don't think this concept of an arm being "for" a variant is properly defined in the RFC. It should be. I guess the intended meaning is something like "the arm is precisely an enum struct expression for the variant, possibly in curly brackets".

The handling of if guards on match arms should be discussed.

With that definition, the use cases are fairly limited. For example, if any of the arms need to contain an inner if the lint will fire.

Under "alternatives", the possibility should be discussed of solving the problem ("we might forget to add an arm") through a #[test] test case. (A package like strum can be used to iterate over the variants at runtime in the test.)

A limitation of this proposal is that it can't be used where the match wants to return the enum value and also some other information (a separate categorisation, or some additional data, or something). An alternative which would do this would be a way to annotate an expression with a requirement that somewhere lexically within that expression, all variants of a nominated enum must be constructed. (I don't think this is a particularly good idea, but it should be in "alternatives" probably.) Ie, not limit it to match, and require the enum type to be specified explicitly.

@Xaeroxe
Copy link
Contributor Author

Xaeroxe commented Nov 10, 2022

I don't think this concept of an arm being "for" a variant is properly defined in the RFC. It should be.

Agreed! I consider an arm to be "for" the variant if the arm contains a return expression with that variant. This means it's technically possible for an arm to contain multiple variants, I think this is good.

Now, my next question is "how far should the compiler go in trying to determine what variants might be returned by a given expression?" i.e. if an arm contains a function call which returns a variant, do we count all possible variants that function might return as well?

I think there isn't a clear answer here. If we say "yes, the compiler should recursively enter functions in return position to determine what variants they return" then we have a few problems

  1. A variant might get counted that will never be returned from that function call. This felt okay when all of the logic was contained in the arm, but it feels not okay when the logic is elsewhere in the program.

  2. This could cause some serious performance concerns with the lint.

  3. The implementation is now much more complex and might result in scanning large portions of the code base.

However if we say "no, the compiler should consider a function call as though it were no variants" then the lint suddenly isn't compatible with helper methods or enum constructors. This feels like a pretty significant limitation, but maybe the better option is to just accept that this is how the lint must work.

The handling of if guards on match arms should be discussed.

I think probably the best answer here is "ignore the presence or absence of a guard when evaluating an arm".

I believe the lint should be satisfied that it is possible to output a variant. It's not the responsibility of the lint to check your logic in the guard.

A limitation of this proposal is that it can't be used where the match wants to return the enum value and also some other information

Right, and in the event that multiple enums are being returned it's not clear which enum this lint should apply to. We could arbitrarily pick one, but any choice must be arbitrary. So, I think the correct answer is "refuse the question". The return type of the match must be an enum, not a type containing an enum. I think it's likely that's why you say "I don't think this is a particularly good idea"

Thanks for your feedback!

@ahicks92
Copy link

What happens if:

let some_value: bool = random_value();

match an_integer {
    1 => if some_value { variant1 } else { Variant2 },
    ...
}

It seems that either (1) the attribute must be placed on a match of the exact form x => constant_expression, or (2) it can't warn for all match statements that it might be applied to.

@CraftSpider
Copy link

My first thought on that is only supporting constant-expression right hands, since most cases like that can be rewritten like

match (int, bool) {
    (1, true) => Variant1,
    (2, false) => Variant2,
    ...
}

Which seems like it could be supported

@ahicks92
Copy link

Actually apparently that was already discussed. For whatever reason my first look here had GitHub not listing any comments. But now my wider objection is that this is "warn if you don't cover all variants but only if there are no function calls, you don't return anything else, you never return multiple variants in a branch, and you don't use if guards"

This seems more like something that should just be a normal lint, "it looks like you are trying to convert a whatever to this enum but you are missing..." wherever it can be applied already, rather than something where a human figures out where it might be applied and remembers to do so. If a human user is expected to remember it, that seems to me to be the same human-memory-space of remembering that strum exists, or that maybe you should search for a crate etc. So instead if we want this, I'd make the human user either globally allow it if it's on by default, allow it in specific cases if it's on by default, or let the specialized subset of people who are doing this frequently just enable it for their code.

The only other Rust feature I can think of offhand with that many edge cases is const generics and I guess also GAT, but both of those are that way because they're not "finished" in some sense and not by design.

Now all that said I would actually use this. I see the utility in having a mechanism, I just don't know that I agree with the mechanism because the mechanism is weirdly restricted and doesn't have a good model. I suppose one last alternative is to make the compiler warn on any match statement it can't fully handle, but that kind of feels like punting on the problem.

Lastly, if we do this attribute, how is it being taught? This seems like the kind of thing that gets buried in the reference or something, though at this point perhaps someone has a complete C FFI guide that it'd fit into.

@Xaeroxe
Copy link
Contributor Author

Xaeroxe commented May 29, 2023

So, I've thought about this some more and I've come to a few conclusions.

  1. I think this lint should be all or nothing. Go deep on determining how many variants can be returned, recursively scan functions if a match arm calls a function. The major exception to this will have to be dynamic function calls. Because static analysis can't show what kinds of variants will be returned, a dynamic function call must count as containing zero variants. "Dynamic function call" involves the use of fn() function pointers, and dyn trait objects. For similar reasons extern "C" functions must also count as zero variants.
    1. When evaluating what variants a function will return, any arguments that are const and used in branching will be used to narrow down what paths the function may take. Any path eliminated in this manner will not have variants from it counted. This information about skipped paths may be provided in the warning diagnostic if it's determined to be relevant.
  2. Because this recursive scanning is going to come with a performance impact it should not be part of rustc, and instead should be part of clippy. That way the performance penalty associated with this kind of scanning is not a mandatory part of the build process.
  3. If the performance hindrance associated with this feature causes an excessive rise in how long it takes for clippy to execute, and it is deemed impractical or impossible to optimize that time away, this feature will not be implemented.
  4. I'd like to implement this as an "opt-out" attribute rather than an "opt-in" attribute, but only if our heuristics for determining what it should apply to are really good. As a first draft for these heuristics, I propose this.
    1. A match statement will not be subject to this lint if the enum being returned has less than 4 variants.
    2. A match statement will not be subject to this lint unless it already returns more than half of the variants. i.e. 3 of 4 triggers a reminder, 2 of 4 does not, 4 of 5 triggers a reminder, 1 of 128 does not, 127 of 128 triggers a reminder.
  5. This will apply to any type which uses an enum as part of its construction. i.e. Arc<MyEnum>, Cow<MyEnum>, (i32, MyEnum) and Arc<(i32, MyEnum)> are all valid candidates for this. Please note that having an enum as part of the generic arguments is not sufficient to meet this criteria, the enum must be used as part of the construction of the type. So PhantomData<MyEnum> does not count.

I recognize that I've described an incredibly complex feature here, and it will almost certainly need further clarification and definition. For starters, I'd like to get some feedback on how the community feels about my general ideas.

@programmerjake
Copy link
Member

i think it should not do analysis across function boundaries, however, if it does, imho there is no good reason to not analyze extern "C" functions too, changing the function ABI shouldn't change what enum variants it returns.

@Xaeroxe
Copy link
Contributor Author

Xaeroxe commented May 29, 2023

Okay, small clarification, that statement isn't meant to apply to extern "C" functions defined in Rust, it's meant to apply to extern "C" functions that are defined in C language files.

@Xaeroxe
Copy link
Contributor Author

Xaeroxe commented Jul 19, 2023

The group should know I have no current plans to progress this RFC as-is. It's clearly not refined enough to describe clearly exactly what it is going to do and my attempts to refine it have led to multiple approaches with no clear winners. I am left wondering if this issue would be better resolved by macro crates in the crates io ecosystem because any possible solution to these issues is bound to be controversial for some reason or another.

@Xaeroxe Xaeroxe closed this Aug 13, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
T-lang Relevant to the language team, which will review and decide on the RFC.
Projects
None yet
Development

Successfully merging this pull request may close these issues.