Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add WhereNotNull to help with nullability tracking #30381

Closed
YairHalberstadt opened this issue Jul 26, 2019 · 28 comments
Closed

Add WhereNotNull to help with nullability tracking #30381

YairHalberstadt opened this issue Jul 26, 2019 · 28 comments
Labels
api-ready-for-review API is ready for review, it is NOT ready for implementation area-System.Linq
Milestone

Comments

@YairHalberstadt
Copy link
Contributor

YairHalberstadt commented Jul 26, 2019

See dotnet/csharplang#8383

Currently the C# compiler can't track how the lambda in a Where method effects nullability.

For example

using System;
using System.Collections.Generic;
using System.Linq;

#nullable enable
public class C {
    public IEnumerable<string> WhereNotNull(IEnumerable<string?> strings)
    {
        return strings.Where(x => x != null);
    }
}

Warns with:

warning CS8619: Nullability of reference types in value of type 'IEnumerable<string?>' doesn't match target type 'IEnumerable'.

This could definitely be solved by improved compiler tooling, but doing so would be non-trivial.

A workaround would be to introduce a WhereNotNull method into CoreFX, so that consumers won't need to use the suppression operator themselves.

A further advantage would be that using WhereNotNull instead of Where would avoid allocating a lambda for an extremely common Linq method. This could be done either by caching the lambda as a Func<object, bool>, which avoids the need to update the rest of Linq to be aware of a new Linq method, or by creating a custom enumerator for WhereNotNull which avoids the overhead of a virtual function call, but requires updating the rest of Linq.

API Proposal

namespace System.Linq
{
    public static partial class Enumerable
    {
        public static IEnumerable<T> WhereNotNull<T>(this IEnumerable<T?> source);
        public static IEnumerable<T> WhereNotNull<T>(this IEnumerable<T?> source) where T: struct;
    }
}

API Usage

I've deliberately not constrained the first API (i.e. omitted where T: class) so that unconstrained generic code can use this API as well. Of course, this won't work well with nullable value types, but that's a general problem of unconstrained generic code and nullability in general, so this isn't a new problem.

Risks

None.

@stephentoub
Copy link
Member

A further advantage would be that using WhereNotNull instead of Where would avoid allocating a lambda for an extremely common Linq method. This could be done either by caching the lambda as a Func<object, bool>

If you write:

Where(x => x != null)

the compiler is already going to cache the delegate automatically.

@stephentoub
Copy link
Member

stephentoub commented Jul 26, 2019

This could definitely be solved by improved compiler tooling, but doing so would be non-trivial.

It might be non-trivial, but there are a myriad number of places this same kind of issue can show up, and I don't think we should address each as a one-off solved by adding additional methods for each such case. My preference is to wait and see what the actual broad impact is (we don't even have LINQ annotated yet), and then ideally come up with a holistic solution around such cases. If it turns out that this specific case is literally the only one that matters, then it could be relevant, but we should not rush it.

@YairHalberstadt
Copy link
Contributor Author

the compiler is already going to cache the delegate automatically.

The compiler will cache one delegate per type parameter, per use site. There are likely to be hundreds of identical such delegates across a large code base - a quick search just for "x => x != null" turned up 97 in ours. It's a small benefit, but perhaps not irrelevant.

My preference is to wait and see

Agreed. I was just bringing this up here as a point of discussion. Hopefully the Roslyn and CoreFx teams can work something out on this together. Thanks for your input!

@stephentoub
Copy link
Member

stephentoub commented Jul 26, 2019

The compiler will cache one delegate per type parameter, per use site.

This is something Roslyn could have an optimization pass to consolidate if deemed impactful. Developers shouldn't be required to manually do such folding by hand.

@msftgits msftgits transferred this issue from dotnet/corefx Feb 1, 2020
@msftgits msftgits added this to the Future milestone Feb 1, 2020
@maryamariyan maryamariyan added the untriaged New issue has not been triaged by the area owner label Feb 23, 2020
@stephentoub stephentoub added the api-suggestion Early API idea and discussion, it is NOT ready for implementation label Aug 26, 2020
@kronic
Copy link
Contributor

kronic commented Aug 26, 2020

Maybe WhereNotDefault

@terrajobst
Copy link
Member

Linq is now annotated and the issue remains.

dotnet/roslyn#41363 has some good comments:

@adamsitnik adamsitnik removed the untriaged New issue has not been triaged by the area owner label Sep 2, 2020
@john-h-k
Copy link
Contributor

john-h-k commented Sep 7, 2020

why isn't OfType<object> good enough? Avoids the lambda, isn't particularly cryptic (imo, given the fact null is T is never true), and works great with nullability

@eiriktsarpalis eiriktsarpalis added the needs-further-triage Issue has been initially triaged, but needs deeper consideration or reconsideration label Nov 26, 2020
@Evangelink
Copy link
Member

@john-h-k +1 on the OfType call. I was using this syntax and I have recently hit a compiler warning saying I cannot change the nullability although it seems to actually work properly.

@Thaina
Copy link

Thaina commented Mar 25, 2021

I want to add SelectNotNull that both transform and filter in the same line

public static IEnumerable<V> SelectNotNull<T,V>(this IEnumerable<T> items,Func<T,V> func) where V : class
{
	foreach(var item in items)
	{
		var value = func(item);
		if(value != null)
			yield return value;
	}
}

public static IEnumerable<V> SelectNotNull<T,V>(this IEnumerable<T> items,Func<T,V?> func) where V : struct
{
	foreach(var item in items)
	{
		var value = func(item);
		if(value.HasValue)
			yield return value.Value;
	}
}

It could also be used as condition filtering. If the value was not satisfied condition we could return null to skip it

@eiriktsarpalis eiriktsarpalis added untriaged New issue has not been triaged by the area owner and removed needs-further-triage Issue has been initially triaged, but needs deeper consideration or reconsideration untriaged New issue has not been triaged by the area owner labels Apr 19, 2021
@terrajobst terrajobst added api-ready-for-review API is ready for review, it is NOT ready for implementation and removed api-suggestion Early API idea and discussion, it is NOT ready for implementation labels Sep 22, 2021
@terrajobst
Copy link
Member

terrajobst commented Sep 22, 2021

@stephentoub

This could definitely be solved by improved compiler tooling, but doing so would be non-trivial.

It might be non-trivial, but there are a myriad number of places this same kind of issue can show up, and I don't think we should address each as a one-off solved by adding additional methods for each such case. My preference is to wait and see what the actual broad impact is (we don't even have LINQ annotated yet), and then ideally come up with a holistic solution around such cases. If it turns out that this specific case is literally the only one that matters, then it could be relevant, but we should not rush it.

As a Linq user, I keep running into this issue fairly often. And even if the compiler is eventually smart enough to infer this, I think the proposed API still adds value by making a fairly common Linq filter built-in. I've marked the API as ready for review so we can discuss the necessary next steps.

@stephentoub
Copy link
Member

stephentoub commented Sep 22, 2021

I don't know why we'd promote this one Where + predicate to a top-level method. Why not WhereNotEmptyString? WhereContainsString? WhereNotEmptyCollection? And so on. If it's purely about nullable annotations, one character (!) heals all wounds. I do not think this should be added.

@terrajobst
Copy link
Member

terrajobst commented Sep 22, 2021

In my uses cases, I found it usually not solvable by adding a single !.

Let's say I write this:

var nonNullCustomers = collection.Where(c => c is not null);

The compiler infers nonNullCustomers to be IEnumerable<Customer?>. Adding an ! doesn't change that. I would either have to add a cast:

var nonNullCustomers = (IEnumerable<Customer>)collection.Where(c => c is not null);

or give up on type inference

IEnumerable<Customer> nonNullCustomers = collection.Where(c => c is not null)!;

or explicitly do something in the Linq query to change the null state:

var result = customers.Where(c is not null).Select(c => c!);

All solutions seem substantially less desirable than just writing

var result = customers.WhereNotNull();

I don't know why we'd promote this one Where + predicate to a top-level method. Why not WhereNotEmptyString? WhereContainsString? WhereNotEmptyCollection? And so on.

There is a line somewhere; where that exactly lies is a bit in the eye of the beholder. We have certain conveniences, such as string.IsNullOrEmpty() because it's very common to ask such questions. At least in my experience, filtering null values in Linq is pretty common.

@Thaina
Copy link

Thaina commented Sep 22, 2021

Also want to add that

I don't know why we'd promote this one Where + predicate to a top-level method. Why not WhereNotEmptyString? WhereContainsString? WhereNotEmptyCollection? And so on.

Because null work with every object in the whole language. Even the value type could be temporarily made into Nullable while string is only one class, and collection is only fraction of all type in the language

But I could support WhereNotNullOrEmpty that work with both string, ICollection, and StringBuilder and similar class. Because that was somewhat the large fraction of object in C# world

@Evangelink
Copy link
Member

I agree with @stephentoub about the overload why this one and not some other. As he stated, let's say you have a collection of string and do Where(x => !string.IsNullOrEmpty(x)), I expect the compiler to know that the strings are not null.

Besides, adding specific overload won't help building better analyzers.

I would be in favor of having some mechanism to "promote" what is learned from within the Where clause. This would also help to handle cases like Where(x => IsNotNull(x)) where we have this helper method annotated to say it's not null (or maybe let's say some kind of TryXXX function). This mechanism would also help to build other kind of analyzers (including DFAs).

@Thaina
Copy link

Thaina commented Sep 22, 2021

where we have this helper method annotated to say it's not null

I don't think compiler will smart enough to propagate the information of nullability from arbitrary delegate to Where function. Where clause is only simple predication that only check for true or false

I don't understand why someone think of null as having importantness less than true or false so we cannot have linq function to handle it in the same manner

@stephentoub
Copy link
Member

stephentoub commented Sep 22, 2021

The compiler infers nonNullCustomers to be IEnumerable<Customer?>. Adding an ! doesn't change that.

What are you then doing with nonNullCustomers? If you're returning it from something that needs IEnumerable<Customer>, that's where you add the !. If you're assigning it to something that takes IEnumerable<Customer>, that's where you add it. If you're immediately calling another LINQ operation, then you can put a ! in that next operation where the null value is used but you know it's not null.

or give up on type inference

You still get type inference. You're just telling the compiler you know it's all not null.

At least in my experience, filtering null values in Linq is pretty common.

Can you gather some data? We should be able to analyze all usage of Where in large codebases (sync with @marklio) and demonstrate that this is, say, the 20% case of using Where, and I'll think about it differently (from a convenience perspective; from a nullability analysis perspective, even with that I think solving this in the compiler is better). Without that, sure, I of course believe people do it, but I don't believe it's so common that it's worth special-casing it.

I don't think compiler will smart enough to propagate the information of nullability from arbitrary delegate to Where function. Where clause is only simple predication that only check for true or false

Why not? The compiler can see the lambda. It already does the flow analysis to determine whether the value returned from the lambda is maybe-null or non-null. The feature there is about enabling that information to propagate out, which has little to do with smarts and more to do with just making it work. And when it does work, it'll handle well more than just != null, and rather anything that results in a non-null state.

I don't understand why someone think of null as having importantness less than true or false so we cannot have linq function to handle it in the same manner

Because every decision about whether to filter something in or out is a Boolean decision. We already have such overloads for making such Boolean decisions. With, you know, Booleans.

@Thaina
Copy link

Thaina commented Sep 22, 2021

I don't think compiler will smart enough to propagate the information of nullability from arbitrary delegate to Where function. Where clause is only simple predication that only check for true or false

Why not? The compiler can see the lambda. It already does the flow analysis to determine whether the value returned from the lambda is maybe-null or non-null. The feature there is about enabling that information to propagate out, which has little to do with smarts and more to do with just making it work. And when it does work, it'll handle well more than just != null, and rather anything that results in a non-null state.

How can compiler know the internal mechanism of Where function to know that, if it check for nullability in the lambda, it then will return only a not null object. Compiler don't even know what will the Where function really return after the predicate. It only know the return type IEnumerable<T>

Or you meant that you want compiler to treat Where or any static method as special case that it will pry into internal implementation of every static method we call to do a flow analysis on each and every of them and change the method signature to be nullable or not nullable by compiler themselves?

That's what I want to call that compiler is not smart enough and it really should not as much smart as that. It would have unintended behaviour

I don't understand why someone think of null as having importantness less than true or false so we cannot have linq function to handle it in the same manner

Because every decision about whether to filter something in or out is a Boolean decision. We already have such overloads for making such Boolean decisions. With, you know, Booleans.

There was OfType function exist that it just filter null out of the collection. Also GroupBy and Zip and Join that also filter

@stephentoub
Copy link
Member

stephentoub commented Sep 22, 2021

How can compiler know the internal mechanism of Where function to know that

The compiler already knows about Where and Select and other LINQ methods. The C# specification actually defines the behavior of what these methods do as it binds to them as part of query comprehension syntax.

There was OfType function exist that it just filter null out of the collection

No, it exists to filter out things that aren't of the right type, and in particular to help with going from non-generic to generic collections. It, along with Cast, are the two primary ways for bridging the non-generic world to LINQ.

Also GroupBy and Zip and Join that also filter

The proposed WhereNotNull is implementable as a single clause, which is part of why I don't think it should be added. We'd be adding an entirely new method to do something a developer can express themselves with just a few characters, which means it needs to be insanely valuable to be worth its weight. Show me how you can implement GroupBy and Join in just a few characters.

@Thaina
Copy link

Thaina commented Sep 22, 2021

How can compiler know the internal mechanism of Where function to know that

The compiler already knows about Where and Select and other LINQ methods. The C# specification actually defines the behavior of what these methods do as it binds to them as part of query comprehension syntax.

This meant you try to make every other C# functions in the system as second class citizen and promote the Select and Where to be special class that compiler must treat it differently than everything and without generic way to apply it to our own custom function

There was OfType function exist that it just filter null out of the collection

No, it exists to filter out things that aren't of the right type, and in particular to help with going from non-generic to generic collections. It, along with Cast, are the two primary ways for bridging the non-generic world to LINQ.

that aren't of the right type in essense that also include null

We'd be adding an entirely new method to do something a developer can express themselves with just a few characters, which means it needs to be insanely valuable to be worth its weight. Show me how you can implement GroupBy and Join in just a few characters.

ToList and ToArray is relatively a few character. We also have List.AddRange so why we have ToList ?

I'm understand the latter argument, but still insanely valuable is subjective. In my opinion this feature is insanely valuable for everyone. It is common expectation and don't even need to make delegate

Especially for converting nullable to value it save so much more than just a few characters

@stephentoub
Copy link
Member

stephentoub commented Sep 22, 2021

This meant you try to make every other C# functions in the system as second class citizen and promote the Select and Where to be special class that compiler must treat it differently than everything and without generic way to apply it to our own custom function

This issue is about adding one API that itself "promotes the Where" to be special by adding a dedicated method that handles only a single case. The compiler already knows about these methods and can handle them for nullability much more generally.

that aren't of the right type in essense that also include null

But isn't limited to null. It's like saying StringBuilder.Append is the same thing because it handles null strings specially.

We also have List.AddRange so why we have ToList ?

Because it's insanely common, used by practically every codebase that uses LINQ, there's no other LINQ method it can be implemented in terms of, and enables fluid syntax as the rest of LINQ provides. It also already exists. We're talking about investing in new API here, for something that's already doable with minimal ceremony; it needs to be worth it.

In my opinion this feature is insanely valuable for everyone. It is common expectation

I asked @terrajobst for data earlier.

@Thaina
Copy link

Thaina commented Sep 22, 2021

adding a dedicated method that handles only a single case.

How is that promoting Where by any reasoning ? It's completely normal to have multiple overload method or multiple method with similar name to handle single case in any class when it predictably have specific usage. No one ever consider it promoting anything. It's more like default argument that when there was no argument we can assume that argument to have default value

And so you mean OfType was promoting Where all along? The OfType is the single case too, the case that the object is the specific type. And we could always write OfType as this

items.Where((item) => item is MyType).Cast<MyType>();
// items.OfType<MyType>();

To be honest I think of this feature is similar to OfType from the start

But isn't limited to null

It's opposite. I think of it as "limited to only a type it want to filter" while WhereNotNull is not limited to any type, just exclude the null. In the same way that Where just exclude the false

Because it's insanely common
there's no other LINQ method it can be implemented

This is also subjective. I admittedly also use it. But for me it just only save relatively a few character, without it we can also

var list = new List<MyType>();
list.AddRange(/* some linq we normally append .ToList() at the end */);

Like this

@stephentoub
Copy link
Member

How is that promoting Where by any reasoning ?

It's taking something you can easily write today:

source.Where(s => s != null);

and creating a dedicated method for it:

source.WhereNotNull();

That's promoting this very specific use of Where to its own method. That's the whole purpose of this issue.

And so you mean OfType was promoting Where all along?

No, OfType is primarily about bringing non-generic enumerables into the generic world, which is why it accepts IEnumerable and produces IEnumerable<T>. You can't just replicate its behavior with Where(s => s is T) because Where only works with generics.

This is also subjective.

I don't think it's subjective at all.

@terrajobst
Copy link
Member

The compiler infers nonNullCustomers to be IEnumerable<Customer?>. Adding an ! doesn't change that.

What are you then doing with nonNullCustomers? If you're returning it from something that needs IEnumerable<Customer>, that's where you add the !. If you're assigning it to something that takes IEnumerable<Customer>, that's where you add it. If you're immediately calling another LINQ operation, then you can put a ! in that next operation where the null value is used but you know it's not null.

Depends on the scenario, but it's often subsequent Linq operations. Here is an example:

var nonNullCustomers = collection.Where(c => c is not null)
                                 .OrderBy(c => c.FirstName) // Warning
                                 .ThenBy(c => c.LastName);  // Warning

foreach (var c in nonNullCustomers)
{
    Console.WriteLine(c.FirstName + " " + c.LastName); // Warning
}

I would have to put a ! in three places:

var nonNullCustomers = collection.Where(c => c is not null)
                                 .OrderBy(c => c!.FirstName)
                                 .ThenBy(c => c!.LastName);

foreach (var c in nonNullCustomers)
{
    Console.WriteLine(c!.FirstName + " " + c.LastName);
}

In cases like this it makes the most sense to deal with the null state right in the query:

var nonNullCustomers = collection.Where(c => c is not null)
                                 .Select(c => c!);
                                 .OrderBy(c => c.FirstName)
                                 .ThenBy(c => c.LastName);

foreach (var c in nonNullCustomers)
{
    Console.WriteLine(c.FirstName + " " + c.LastName);
}

Which is why I'd prefer to write this:

var nonNullCustomers = collection.WhereNotNull()
                                 .OrderBy(c => c.FirstName)
                                 .ThenBy(c => c.LastName);

foreach (var c in nonNullCustomers)
{
    Console.WriteLine(c!.FirstName + " " + c.LastName);
}

@Thaina
Copy link

Thaina commented Sep 22, 2021

How is that promoting Where by any reasoning ?

It's taking something you can easily write today:

source.Where(s => s != null);

and creating a dedicated method for it:

source.WhereNotNull();

That's promoting this very specific use of Where to its own method. That's the whole purpose of this issue.

If it would be called as promote anything I would call it promoting null not promoting where

Which is perfectly fine, because null is keyword. But Where is function. The function try to promote keyword, not the other way around

OfType is primarily about bringing non-generic enumerables into the generic world

With your argument this meant promoting the very specific use of Where to the nongeneric collection

And also we have both version of OfType, if your reasoning is really primarily truth we would have only OfType<V>(IEnumerable) and wouldn't have the generic version OfType<V>(IEnumerable<T>)

@jeremyVignelles
Copy link

As a C# dev, I'm following this discussion not because I'd like to add .WhereNotNull() for the sake of adding a method, but because the current behavior is cumbersome to get right, as described by terrajobst.

Now, if there is a better solution to make .Where(s => s is not null) work as expected, I would gladly agree with stephen that adding a method would be pointless.

That said, I don't understand enough the frameworks internal to see how this could be possible. Would it be some attribute magic, like NotNullIfNotNull that is injected at the time the method is built ? Would it be an analyzer stuff that detects "OK, it's an enumerable of nullable, but earlier, I saw that it wasn't actually null" (flow analysis) ? Would it be specific to / a special case to .Where()?

Is dotnet/csharplang#8383 the correct place to track the "nullability detection issue" ?

@stephentoub
Copy link
Member

Is dotnet/csharplang#8383 the correct place to track the "nullability detection issue" ?

Or dotnet/csharplang#3951 I assume.
cc: @jcouv

That said, I don't understand enough the frameworks internal to see how this could be possible. Would it be some attribute magic, like NotNullIfNotNull that is injected at the time the method is built ? Would it be an analyzer stuff that detects "OK, it's an enumerable of nullable, but earlier, I saw that it wasn't actually null" (flow analysis) ? Would it be specific to / a special case to .Where()?

My guess is it will end up being specific to the LINQ members C# already knows about.

@bartonjs
Copy link
Member

bartonjs commented Oct 5, 2021

Video

Given that it would require people to change their code, and the current analysis says it's only about 3% of libraries where it's even possibly related; and that the main use should ideally be solved by the compiler... there's no need for it in the BCL.

@bartonjs bartonjs closed this as completed Oct 5, 2021
@jl0pd
Copy link
Contributor

jl0pd commented Oct 20, 2021

WhereNotNull actually useful when working with nullable structs

Given this class

class MyClass
{
    public int? SomeProp { get; set; }
}

And some collection

MyClass[] ar = ...;

It's easier and less error prone to get values with

int[] props = ar.Select(c => c.SomeProp).WhereNotNull().ToArray()

than with

int[] props = ar.Select(c => c.SomeProp).Where(x => x.HasValue).Select(x => x!.Value).ToArray();

@ghost ghost locked as resolved and limited conversation to collaborators Nov 19, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
api-ready-for-review API is ready for review, it is NOT ready for implementation area-System.Linq
Projects
None yet
Development

No branches or pull requests