Add `exhaustive_match_output` RFC #3340

Xaeroxe · 2022-10-31T21:22:05Z

text/3340-exhaustive-output.md

joshtriplett · 2022-10-31T22:50:59Z

I love it. It's simple, self-explanatory even if you haven't seen it, and helps with fiddly parsers/decoders.

Co-authored-by: Josh Triplett <josh@joshtriplett.org>

afetisov · 2022-11-01T00:07:00Z

The alternatives section misses the obvious current solution: the dead code lint, which warns that "variant .. is never constructed". Now, it won't work if you construct that variant in some other way, or if the enum is public (so that downstream users can construct it), but in practice it covers many of those "bag of variants" enums.

Also, since it doesn't affect the match semantics and is just a lint, it should be introduced as a lint. Perhaps an allow-by-default clippy::restriction lint?

How does it interact with #[non_exhaustive] enums? It feels like if the enum is defined in a different crate, then it's wrong to try and enforce any exhaustiveness of constructed variants. It would be a direct semver hazard.

Xaeroxe · 2022-11-01T00:37:44Z

The alternatives section misses the obvious current solution: the dead code lint

Good point, I'll mention it. As you say, the dead code lint doesn't cover all such scenarios like this so I think this attribute is still useful.

Perhaps an allow-by-default clippy::restriction lint?

I considered this, but it seemed like a poor fit for clippy. Most clippy lints seem like you're intended to turn them on for the whole project, not just one match statement.

How does it interact with #[non_exhaustive] enums? It feels like if the enum is defined in a different crate, then it's wrong to try and enforce any exhaustiveness of constructed variants. It would be a direct semver hazard.

Perhaps this should be made incompatible with external non_exhaustive enums. I'm only intending for this to be a warning by default, i.e. doesn't inhibit compilation. If someone were to elevate the lint to deny would rustc stop compiling it even if the code were inside a dependency? I'm not sure how deny works if the code is a dependency.

afetisov · 2022-11-01T00:56:52Z

A built-in attribute is a significant burden. If something can be made just a lint, I see no reason not to make it a lint.

It's true that I can't think of any lints which are expected to be attached to a specific expression, but that's mostly because you don't normally know where an error could occur. In principle, I don't see a reason why a lint couldn't work that way.

In fact, now that you mention it, I see even stronger reasons to make it a lint. Why not attach the attribute to the whole function, so that several individual matches inside of it (e.g. in different branches) can be checked for exhaustive outputs? Why not attach it to a module, or even to the crate? There are low-level crates and modules which mostly consist of several huge definitions of data enums, and their parsing/serialization logic. We could enable the lint once at the crate root, and perhaps disable it on several specific functions which are not expected to give exhaustive outputs.

Xaeroxe · 2022-11-01T01:03:21Z

A built-in attribute is a significant burden.

Is there something I can read to understand why this is?

afetisov · 2022-11-01T01:22:52Z

Ok, maybe "significant" is an over-exaggeration on my part (but I really think so). My opinion is that the attributes are accepted too lightly, without giving proper weight to the fact that each attribute is a small language extension, a burden on all readers of the code (which can't be expected to know by heart all attributes in the Rust docs, and there is no way to search for attribute definition in code) and that there is no way to manage the complexity of growing number of attributes, since they all exist at the global scope and can't be abstracted in any way. In particular, all built-in attributes can conflict with any users' custom attributes (either proc macros, or their auxiliary attributes).

For that reason, if it's possible to scope the attribute, then it's best to do it. For example, if I see #[deny(clippy::fusrodah)], then I immediately know that it's denying some clippy lint, doesn't affect semantics in any way, and can't conflict with any other attributes. I don't need to search what fusrodah means, Clippy will tell me if it matters.

In my opinion, it's just too easy to dump any language extensions into attributes, since you don't need to invent extra syntax or think about interaction with alternative parsers (in macros or tooling). This causes a tendency to push all language extensions into attributes, turning them into that weird meta-Rust, with horrible syntax and potentially unforseen interactions with other language features (it's just so easy to forget about some obscure attribute which may not be present in code, rather than stable syntax, which is at least explicitly defined in the parsers and the reference).

Xaeroxe · 2022-11-01T02:02:14Z

a small language extension, a burden on all readers of the code

This is true of most abstractions yes? Abstractions, be they functions, macros, attributes, syntax, or lints have to earn their place. The overhead involved in understanding them needs to be worth the power or expressiveness that they introduce. I can express the documentation for this particular attribute in a paragraph or two, and it helps with some really common use cases, namely parsers/decoders of any sort. In order to use mental burden as justification for rejecting the RFC I think we need to demonstrate two things.

There is another way to implement the feature described, it doesn't have to live in rustc.
The use case is specific enough that it shouldn't be in generalized tooling.

I don't believe we have justification for point 2, parsers/decoders are very common. As for point 1, I don't have much ideological issue with the lint living in some other tool, such as clippy, but given that there's no precedent for putting this sort of lint in clippy I'm reluctant to try and steer the direction of rust-lang tooling without input from one of the rust-lang teams.

I agree it'd be useful to namespace rustc lints, but that's out of scope for this particular RFC.

Lokathor · 2022-11-01T02:06:29Z

lints without a prefix are implicitly rustc lints. it's not a great system, but it's the system.

As to this being in clippy or rustc, in general it's always fine for any lint of a more narrow nature to start in clippy and graduate to rustc if it turns out to be essential. Or it can live forever in clippy, there's not much harm in that either.

Xaeroxe · 2022-11-01T02:22:18Z

That makes sense to me and I'm willing to transition this to a clippy lint, if someone involved with a rust-lang team can confirm for me that this is the correct thing to do. I can't find any prior precedent for this sort of thing being a clippy lint. To me it's just a matter of "what repo should it be written in?" The functionality and purpose is largely still the same.

Lokathor · 2022-11-01T02:31:43Z

Josh T, the first to reply in this PR, is on T-lang if that's what you mean. I'm sure they'll chime in again when available.

The contribution guide for clippy is here: https://github.com/rust-lang/rust-clippy/blob/master/CONTRIBUTING.md

And the most important detail of clippy development, at least for this discussion, is that it does not have tightly controlled requirements for new lints. You can just open an issue in the repo about a new lint idea, and generally it can be added to clippy without any huge fanfare or oversight process, just a fairly normal PR review cycle.

Noratrieb · 2022-11-01T06:10:27Z

I like it! My only concern is that, as an attribute, it's not very discoverable but that is something that needs to be fixed in other places anyways imo, so it shouldn't be a blocker.
I think that the #[non_exhaustive] attribute to control exhaustiveness in combination with Rusts default exhaustive match can be seen as prior art as well, even if it's the other way around there.

programmerjake · 2022-11-02T03:21:58Z

one reason to not just have this be a lint is that a lint would apply to all match statements rather than just the one match statement with the attribute, e.g.:

pub enum MyEnum {
    ZeroZero,
    ZeroOne,
    One,
    Two,
    Other,
}

pub fn demo(iter: &mut impl Iterator<Item = u8>) -> Option<MyEnum> {
    Some(#[warn(exhaustive_match_output)]
        match iter.next()? {
        0 => match iter.next()? {
            // undesirably warns on inner match, because lint attributes apply to
            // everything inside rather than just what's directly attributed.
            0 => MyEnum::ZeroZero,
            1 => MyEnum::ZeroOne,
            _ => MyEnum::Other,
        },
        1 => MyEnum::One,
        2 => MyEnum::Two,
        _ => MyEnum::Other,
    })
}

pub fn demo2(iter: &mut impl Iterator<Item = u8>) -> Option<MyEnum> {
    Some(#[exhaustive_match_output]
        match iter.next()? {
        0 => match iter.next()? {
            // doesn't warn on inner match -- as desired
            0 => MyEnum::ZeroZero,
            1 => MyEnum::ZeroOne,
            _ => MyEnum::Other,
        },
        1 => MyEnum::One,
        2 => MyEnum::Two,
        _ => MyEnum::Other,
    })
}

Lokathor · 2022-11-02T03:42:52Z

Can't we just make it an allow by default lint if it shouldn't apply to all match statements by default?

Xaeroxe · 2022-11-02T04:43:10Z

I think the issue here is that when you apply the deny attribute to the outer match, it also cascades into the inner match.

afetisov · 2022-11-02T10:26:21Z

@programmerjake I'd say it's very debatable whether your example is working as intended in either case.

Xaeroxe · 2022-11-02T22:34:11Z

That may be, but the point they were trying to illustrate still stands. Here's a different example

pub enum MyEnum {
	Zero(Inner)
    One,
    Two,
    Other,
}

pub enum Inner {
	Zero,
	One,
	Two,
	Other,
}

pub fn demo(iter: &mut impl Iterator<Item = u8>) -> Option<MyEnum> {
    Some(#[warn(exhaustive_match_output)]
        match iter.next()? {
	        0 => MyEnum::Zero(match iter.next()? {
	            // undesirably warns on inner match, because lint attributes apply to
	            // everything inside rather than just what's directly attributed.
	            0 => Inner::Zero,
	            1 => Inner::One,
	            _ => Inner::Other,
	        }),
	        1 => MyEnum::One,
	        2 => MyEnum::Two,
	        _ => MyEnum::Other,
    	}
	)
}

pub fn demo2(iter: &mut impl Iterator<Item = u8>) -> Option<MyEnum> {
    Some(#[exhaustive_match_output]
        match iter.next()? {
			0 => MyEnum::Zero(match iter.next()? {
	            // doesn't warn on inner match -- as desired
	            0 => Inner::Zero,
	            1 => Inner::One,
	            _ => Inner::Other,
	        }),
	        1 => MyEnum::One,
	        2 => MyEnum::Two,
	        _ => MyEnum::Other,
	    }
	)
}

scottmcm · 2022-11-03T04:00:17Z

This is basically the other half of https://doc.rust-lang.org/nightly/rustc/lints/listing/allowed-by-default.html#non-exhaustive-omitted-patterns -- that was about matching things, and this is about producing things -- so my default intuition here is that this should work exactly the same: allow-by-default lint that will never be on-by-default, but can be turned on in specific situations as needed.

I agree we should avoid attribute additions in general, especially when they're about static analysis and not semantics. There's an infinite number of possible things that might want attributes to mark them, so those things would be best as lints or tool attributes or ghost code whenever possible.

scottmcm · 2022-11-03T04:07:34Z

text/3340-exhaustive-output.md

+# Motivation
+[motivation]: #motivation
+
+Oftentimes it is necessary to convert unrestricted values into enum variants. Unrestricted values can include integers, strings, UUIDs, even other much larger enums.


Note that integers are exhaustive -- we even use that in matching, where you don't need wildcards to match integers (or chars, for that matter!).

So shouldn't this use that same exhaustiveness checking? What if I wrote

#[exhaustive_output] match my_u8 { 0 => 255, x @ 1..=254 => x, 255 => 1, // Oops, I meant `0` }

Could that give me a warning that I wasn't exhaustive?

More plausibly, if we get something like #[niche], then arguably anything we can support for enums we should be supporting for niched integers too -- otherwise it's arguably better to #[repr(u8)] enum Foo { _1 = 1, _2 = 2, _3 = 3 } than to use RefinedU8<1..=3>.

I suppose so, though from an implementation perspective this complicates things a lot as the compiler has to infer the possible outputs of x @ 1..=254 => x,, and pretty much every other integer pattern.

Nemo157 · 2022-11-03T13:52:25Z

This has been requested as a clippy lint previously. As mentioned in that issue as a lint it's hard to know what to lint on specifically, and the only suggested solution was some sort of attribute like proposed here.

Nemo157 · 2022-11-03T13:55:24Z

How does it interact with #[non_exhaustive] enums? It feels like if the enum is defined in a different crate, then it's wrong to try and enforce any exhaustiveness of constructed variants. It would be a direct semver hazard.

Perhaps this should be made incompatible with external non_exhaustive enums.

non_exhaustive enums are a primary usecase for this sort of lint. Those are the most likely enums to have a new variant added to, and as the creator of the enum you want to update all dynamic creation-points to potentially create the new variant.

kornelski · 2022-11-07T14:26:27Z

edit: nevermind

Xaeroxe · 2022-11-07T15:34:55Z

@kornelski

This was suggested as an attribute specifically because the behavior it introduces is undesirable for many match statements. I don't see how it's possible or desirable to implement this without specifically marking the match as opting in to this behavior.

ijackson · 2022-11-09T11:29:34Z

The RFC uses the phrase "arm(s) for (some) variant" (emph mine). I don't think this concept of an arm being "for" a variant is properly defined in the RFC. It should be. I guess the intended meaning is something like "the arm is precisely an enum struct expression for the variant, possibly in curly brackets".

The handling of if guards on match arms should be discussed.

With that definition, the use cases are fairly limited. For example, if any of the arms need to contain an inner if the lint will fire.

Under "alternatives", the possibility should be discussed of solving the problem ("we might forget to add an arm") through a #[test] test case. (A package like strum can be used to iterate over the variants at runtime in the test.)

A limitation of this proposal is that it can't be used where the match wants to return the enum value and also some other information (a separate categorisation, or some additional data, or something). An alternative which would do this would be a way to annotate an expression with a requirement that somewhere lexically within that expression, all variants of a nominated enum must be constructed. (I don't think this is a particularly good idea, but it should be in "alternatives" probably.) Ie, not limit it to match, and require the enum type to be specified explicitly.

Xaeroxe · 2022-11-10T03:29:02Z

I don't think this concept of an arm being "for" a variant is properly defined in the RFC. It should be.

Agreed! I consider an arm to be "for" the variant if the arm contains a return expression with that variant. This means it's technically possible for an arm to contain multiple variants, I think this is good.

Now, my next question is "how far should the compiler go in trying to determine what variants might be returned by a given expression?" i.e. if an arm contains a function call which returns a variant, do we count all possible variants that function might return as well?

I think there isn't a clear answer here. If we say "yes, the compiler should recursively enter functions in return position to determine what variants they return" then we have a few problems

A variant might get counted that will never be returned from that function call. This felt okay when all of the logic was contained in the arm, but it feels not okay when the logic is elsewhere in the program.
This could cause some serious performance concerns with the lint.
The implementation is now much more complex and might result in scanning large portions of the code base.

However if we say "no, the compiler should consider a function call as though it were no variants" then the lint suddenly isn't compatible with helper methods or enum constructors. This feels like a pretty significant limitation, but maybe the better option is to just accept that this is how the lint must work.

The handling of if guards on match arms should be discussed.

I think probably the best answer here is "ignore the presence or absence of a guard when evaluating an arm".

I believe the lint should be satisfied that it is possible to output a variant. It's not the responsibility of the lint to check your logic in the guard.

A limitation of this proposal is that it can't be used where the match wants to return the enum value and also some other information

Right, and in the event that multiple enums are being returned it's not clear which enum this lint should apply to. We could arbitrarily pick one, but any choice must be arbitrary. So, I think the correct answer is "refuse the question". The return type of the match must be an enum, not a type containing an enum. I think it's likely that's why you say "I don't think this is a particularly good idea"

Thanks for your feedback!

ahicks92 · 2022-12-10T22:53:45Z

What happens if:

let some_value: bool = random_value();

match an_integer {
    1 => if some_value { variant1 } else { Variant2 },
    ...
}

It seems that either (1) the attribute must be placed on a match of the exact form x => constant_expression, or (2) it can't warn for all match statements that it might be applied to.

CraftSpider · 2022-12-11T00:14:49Z

My first thought on that is only supporting constant-expression right hands, since most cases like that can be rewritten like

match (int, bool) {
    (1, true) => Variant1,
    (2, false) => Variant2,
    ...
}

Which seems like it could be supported

ahicks92 · 2022-12-11T00:42:19Z

Actually apparently that was already discussed. For whatever reason my first look here had GitHub not listing any comments. But now my wider objection is that this is "warn if you don't cover all variants but only if there are no function calls, you don't return anything else, you never return multiple variants in a branch, and you don't use if guards"

This seems more like something that should just be a normal lint, "it looks like you are trying to convert a whatever to this enum but you are missing..." wherever it can be applied already, rather than something where a human figures out where it might be applied and remembers to do so. If a human user is expected to remember it, that seems to me to be the same human-memory-space of remembering that strum exists, or that maybe you should search for a crate etc. So instead if we want this, I'd make the human user either globally allow it if it's on by default, allow it in specific cases if it's on by default, or let the specialized subset of people who are doing this frequently just enable it for their code.

The only other Rust feature I can think of offhand with that many edge cases is const generics and I guess also GAT, but both of those are that way because they're not "finished" in some sense and not by design.

Now all that said I would actually use this. I see the utility in having a mechanism, I just don't know that I agree with the mechanism because the mechanism is weirdly restricted and doesn't have a good model. I suppose one last alternative is to make the compiler warn on any match statement it can't fully handle, but that kind of feels like punting on the problem.

Lastly, if we do this attribute, how is it being taught? This seems like the kind of thing that gets buried in the reference or something, though at this point perhaps someone has a complete C FFI guide that it'd fit into.

Xaeroxe · 2023-05-29T00:13:03Z

So, I've thought about this some more and I've come to a few conclusions.

I think this lint should be all or nothing. Go deep on determining how many variants can be returned, recursively scan functions if a match arm calls a function. The major exception to this will have to be dynamic function calls. Because static analysis can't show what kinds of variants will be returned, a dynamic function call must count as containing zero variants. "Dynamic function call" involves the use of fn() function pointers, and dyn trait objects. For similar reasons extern "C" functions must also count as zero variants.
1. When evaluating what variants a function will return, any arguments that are const and used in branching will be used to narrow down what paths the function may take. Any path eliminated in this manner will not have variants from it counted. This information about skipped paths may be provided in the warning diagnostic if it's determined to be relevant.
Because this recursive scanning is going to come with a performance impact it should not be part of rustc, and instead should be part of clippy. That way the performance penalty associated with this kind of scanning is not a mandatory part of the build process.
If the performance hindrance associated with this feature causes an excessive rise in how long it takes for clippy to execute, and it is deemed impractical or impossible to optimize that time away, this feature will not be implemented.
I'd like to implement this as an "opt-out" attribute rather than an "opt-in" attribute, but only if our heuristics for determining what it should apply to are really good. As a first draft for these heuristics, I propose this.
1. A match statement will not be subject to this lint if the enum being returned has less than 4 variants.
2. A match statement will not be subject to this lint unless it already returns more than half of the variants. i.e. 3 of 4 triggers a reminder, 2 of 4 does not, 4 of 5 triggers a reminder, 1 of 128 does not, 127 of 128 triggers a reminder.
This will apply to any type which uses an enum as part of its construction. i.e. Arc<MyEnum>, Cow<MyEnum>, (i32, MyEnum) and Arc<(i32, MyEnum)> are all valid candidates for this. Please note that having an enum as part of the generic arguments is not sufficient to meet this criteria, the enum must be used as part of the construction of the type. So PhantomData<MyEnum> does not count.

I recognize that I've described an incredibly complex feature here, and it will almost certainly need further clarification and definition. For starters, I'd like to get some feedback on how the community feels about my general ideas.

programmerjake · 2023-05-29T01:05:06Z

i think it should not do analysis across function boundaries, however, if it does, imho there is no good reason to not analyze extern "C" functions too, changing the function ABI shouldn't change what enum variants it returns.

Xaeroxe · 2023-05-29T01:06:30Z

Okay, small clarification, that statement isn't meant to apply to extern "C" functions defined in Rust, it's meant to apply to extern "C" functions that are defined in C language files.

Xaeroxe · 2023-07-19T07:51:31Z

The group should know I have no current plans to progress this RFC as-is. It's clearly not refined enough to describe clearly exactly what it is going to do and my attempts to refine it have led to multiple approaches with no clear winners. I am left wondering if this issue would be better resolved by macro crates in the crates io ecosystem because any possible solution to these issues is bound to be controversial for some reason or another.

Add exhaustive_output RFC

24e3b5e

Xaeroxe mentioned this pull request Oct 31, 2022

Tracking Issue for 3340 exhaustive_match_output rust-lang/rust#103818

Closed

3 tasks

Update links

5d0878c

Xaeroxe changed the title ~~Add exhaustive_output RFC~~ Add exhaustive_match_output RFC Oct 31, 2022

Grammar fix

5d287b5

joshtriplett reviewed Oct 31, 2022

View reviewed changes

text/3340-exhaustive-output.md Show resolved Hide resolved

joshtriplett added the T-lang Relevant to the language team, which will review and decide on the RFC. label Oct 31, 2022

Add clarification about attribute

bd0b166

Co-authored-by: Josh Triplett <josh@joshtriplett.org>

scottmcm reviewed Nov 3, 2022

View reviewed changes

Mention dead code lint in alternatives

d1aaa90

Xaeroxe closed this Aug 13, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add `exhaustive_match_output` RFC #3340

Add `exhaustive_match_output` RFC #3340

Xaeroxe commented Oct 31, 2022 •

edited

Loading

joshtriplett commented Oct 31, 2022

afetisov commented Nov 1, 2022 •

edited

Loading

Xaeroxe commented Nov 1, 2022 •

edited

Loading

afetisov commented Nov 1, 2022

Xaeroxe commented Nov 1, 2022

afetisov commented Nov 1, 2022

Xaeroxe commented Nov 1, 2022

Lokathor commented Nov 1, 2022

Xaeroxe commented Nov 1, 2022

Lokathor commented Nov 1, 2022

Noratrieb commented Nov 1, 2022

programmerjake commented Nov 2, 2022

Lokathor commented Nov 2, 2022

Xaeroxe commented Nov 2, 2022

afetisov commented Nov 2, 2022

Xaeroxe commented Nov 2, 2022 •

edited

Loading

scottmcm commented Nov 3, 2022

scottmcm Nov 3, 2022

Xaeroxe Nov 5, 2022

Nemo157 commented Nov 3, 2022

Nemo157 commented Nov 3, 2022

kornelski commented Nov 7, 2022 •

edited

Loading

Xaeroxe commented Nov 7, 2022 •

edited

Loading

ijackson commented Nov 9, 2022

Xaeroxe commented Nov 10, 2022 •

edited

Loading

ahicks92 commented Dec 10, 2022

CraftSpider commented Dec 11, 2022

ahicks92 commented Dec 11, 2022

Xaeroxe commented May 29, 2023

programmerjake commented May 29, 2023

Xaeroxe commented May 29, 2023

Xaeroxe commented Jul 19, 2023

Add exhaustive_match_output RFC #3340

Add exhaustive_match_output RFC #3340

Conversation

Xaeroxe commented Oct 31, 2022 • edited Loading

joshtriplett commented Oct 31, 2022

afetisov commented Nov 1, 2022 • edited Loading

Xaeroxe commented Nov 1, 2022 • edited Loading

afetisov commented Nov 1, 2022

Xaeroxe commented Nov 1, 2022

afetisov commented Nov 1, 2022

Xaeroxe commented Nov 1, 2022

Lokathor commented Nov 1, 2022

Xaeroxe commented Nov 1, 2022

Lokathor commented Nov 1, 2022

Noratrieb commented Nov 1, 2022

programmerjake commented Nov 2, 2022

Lokathor commented Nov 2, 2022

Xaeroxe commented Nov 2, 2022

afetisov commented Nov 2, 2022

Xaeroxe commented Nov 2, 2022 • edited Loading

scottmcm commented Nov 3, 2022

scottmcm Nov 3, 2022

Choose a reason for hiding this comment

Xaeroxe Nov 5, 2022

Choose a reason for hiding this comment

Nemo157 commented Nov 3, 2022

Nemo157 commented Nov 3, 2022

kornelski commented Nov 7, 2022 • edited Loading

Xaeroxe commented Nov 7, 2022 • edited Loading

ijackson commented Nov 9, 2022

Xaeroxe commented Nov 10, 2022 • edited Loading

ahicks92 commented Dec 10, 2022

CraftSpider commented Dec 11, 2022

ahicks92 commented Dec 11, 2022

Xaeroxe commented May 29, 2023

programmerjake commented May 29, 2023

Xaeroxe commented May 29, 2023

Xaeroxe commented Jul 19, 2023

Add `exhaustive_match_output` RFC #3340

Add `exhaustive_match_output` RFC #3340

Xaeroxe commented Oct 31, 2022 •

edited

Loading

afetisov commented Nov 1, 2022 •

edited

Loading

Xaeroxe commented Nov 1, 2022 •

edited

Loading

Xaeroxe commented Nov 2, 2022 •

edited

Loading

kornelski commented Nov 7, 2022 •

edited

Loading

Xaeroxe commented Nov 7, 2022 •

edited

Loading

Xaeroxe commented Nov 10, 2022 •

edited

Loading