Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Left, right and full outer joins #238

Closed
cknightdevelopment opened this issue Jan 14, 2017 · 12 comments
Closed

Left, right and full outer joins #238

cknightdevelopment opened this issue Jan 14, 2017 · 12 comments
Milestone

Comments

@cknightdevelopment
Copy link

Overview

I am thinking that it would be helpful to have LeftJoin, RightJoin, InnerJoin, and FullOuterJoin methods in MoreLINQ. I know that there is already GroupJoin and FullGroupJoin (from MoreLINQ), but I know sometimes I would like a way to have more control over my joins and have them be flattened.

I am thinking these new methods could return an IEnumerable of an object such as this:

public class JoinResult<TLeft, TRight>
{
    public TLeft Left { get; set; }
    public TRight Right { get; set; }
}

Below is some sample data (to allow this to be a little more concrete) and examples of what I am thinking as far as calling these new methods and the results they would provide:

Sample Data

var people = new List<Person>
{
    new Person { PersonId = 1, Name = "Chris" },
    new Person { PersonId = 2, Name = "Joe" },
    new Person { PersonId = 3, Name = "Mike" }
};
var addresses = new List<Address>
{
    new Address { AddressId = 1, PersonId = 2, Street = "Main St" },
    new Address { AddressId = 2, PersonId = 3, Street = "2nd Ave" },
    new Address { AddressId = 3, PersonId = 4, Street = "Broad St" }
};

LeftJoin

Call

people.LeftJoin(
    addresses,
    p => p.PersonId,
    a => a.PersonId
);

Pseudo-Code Result

[
   {Left = Person1, Right = null},
   {Left = Person2, Right = Address1},
   {Left = Person3, Right = Address2}
] 

RightJoin

Call

people.RightJoin(
    addresses,
    p => p.PersonId,
    a => a.PersonId
);

Pseudo-Code Result

[
   {Left = Person2, Right = Address1},
   {Left = Person3, Right = Address2},
   {Left = null, Right = Address3}
] 

InnerJoin

Call

people.InnerJoin(
    addresses,
    p => p.PersonId,
    a => a.PersonId
);

Pseudo-Code Result

[
   {Left = Person2, Right = Address1},
   {Left = Person3, Right = Address2}
] 

FullOuterJoin

Call

people.FullOuterJoin(
    addresses,
    p => p.PersonId,
    a => a.PersonId
);

Pseudo-Code Result

[
   {Left = Person1, Right = null},
   {Left = Person2, Right = Address1},
   {Left = Person3, Right = Address2},
   {Left = null, Right = Address3}
] 

Question/Request

Would this be a welcome contribution to the project right now? I would be happy to work on this in a fork and submit a pull request, but I wanted to first see if you think this would be a positive contribution to the project. Please let me know if you would like to discuss this further. Thank you.

@atifaziz
Copy link
Member

I would like a way to have more control over my joins

Could you expand on that? What kind of extra control would you be able to exercise?

@cknightdevelopment
Copy link
Author

What I mean is having options of what data I get back via the left/right/inner/fullouter, just like when thinking of these type of joins in SQL. The data you get back is flattened out, and determined by which type of join you do. E.g. the left join returns all the Person objects and only the Address objects that match, inner join returns only the Person and Address objects that match, etc. Hopefully I am explaining myself okay.

@atifaziz
Copy link
Member

atifaziz commented Jan 16, 2017

So the only thing that would be added is flattening?

Why InnerJoin? That one is trivial, built-into LINQ and the result is already flattened:

var people = new[]
{
    new { PersonId = 1, Name = "Chris" },
    new { PersonId = 2, Name = "Joe" },
    new { PersonId = 3, Name = "Mike" }
};

var addresses = new[]
{
    new { AddressId = 1, PersonId = 2, Street = "Main St" },
    new { AddressId = 2, PersonId = 3, Street = "2nd Ave" },
    new { AddressId = 3, PersonId = 4, Street = "Broad St" }
};

var query =
    from person in people
    join address in addresses on person.PersonId equals address.PersonId
    select new { person.Name, address.Street };
    
foreach (var e in query)
    Console.WriteLine(e);

// Prints:
// { Name = Joe, Street = Main St }
// { Name = Mike, Street = 2nd Ave }

@cknightdevelopment
Copy link
Author

cknightdevelopment commented Jan 16, 2017

After looking at your example I agree with you that InnerJoin is not needed.

I think for the others (LeftJoin, RightJoin, and FullOuterJoin) flattening is one benefit, as well as having a simple way to say the type of join you want.

@atifaziz
Copy link
Member

atifaziz commented Jan 17, 2017

Great, so then we are just left with LeftJoin, RightJoin, and FullOuterJoin. Since LeftJoin and RightJoin can be implemented in terms of each other by more or less swapping sides, let's focus our attention on LeftJoin and FullOuterJoin with RightJoin being implemented in terms of LeftJoin.

My next question is, have you thought about how this will work with value types? For example, here's a left outer join of two number sequences using standard join from LINQ:

from x in Enumerable.Range(0, 10)
join y in Enumerable.Range(5, 10) on x equals y into ys
from y in ys.DefaultIfEmpty()
select new { x, y }

This will give the output:

{ x = 0, y = 0 }
{ x = 1, y = 0 }
{ x = 2, y = 0 }
{ x = 3, y = 0 }
{ x = 4, y = 0 }
{ x = 5, y = 5 }
{ x = 6, y = 6 }
{ x = 7, y = 7 }
{ x = 8, y = 8 }
{ x = 9, y = 9 }

What you can see is that the result is rather unsettling as where y was missing, there is now a 0 in the result. Assuming we are working with zero and positive integers only, we can use an overload of DefaultIfEmpty to have -1 mean missing:

from x in Enumerable.Range(0, 10)
join y in Enumerable.Range(5, 10) on x equals y into ys
from y in ys.DefaultIfEmpty(-1)
select new { x, y }

The output will be:

{ x = 0, y = -1 }
{ x = 1, y = -1 }
{ x = 2, y = -1 }
{ x = 3, y = -1 }
{ x = 4, y = -1 }
{ x = 5, y = 5 }
{ x = 6, y = 6 }
{ x = 7, y = 7 }
{ x = 8, y = 8 }
{ x = 9, y = 9 }

The reason I'm going over this is that I think the join methods are going to have signatures that are more complicated than those you initially illustrated (repeated below), in order to account for missing cases:

people.LeftJoin(
    addresses,
    p => p.PersonId,
    a => a.PersonId
);

It's less of an issue when you are working with references or nullable values where null is a natural choice, but less obvious otherwise. The join methods would therefore not only deal with flattening but also defaulting. Here's how I imagine LeftJoin could look like by the end:

var q = people.LeftJoin(
    addresses,
    p => p.PersonId,
    a => a.PersonId,
    p => new { p.Name, Street = (string) null }, // person with no address
    (p, a) => new { p.Name, a.Street } // person with address
);

Note that there are two functions to determine the projected results. This is better and more explicit than DefaultIfEmpty where you also have to deal with null for missing cases and chose some default for value types.

Also, I'm not in favor of introducing a type like JoinResult<,> for two reasons:

  • It won't be used except for the simplest queries
  • C# 7 tuples will be a better solution for this and if we do it using JoinResult<,> today then there'll be no chance to have a tuple-returning version since you can't overload on return types

Given this and defaults, your signatures will be a little more involved. I want to make sure that you're on the same page with that because it would be a shame if you make a lot of effort for a PR only to find the end-result to be distasteful (or not what you had in mind) so it's good to discuss and set expectations upfront. For example, for FullOuterJoin, I imagine that the most complex overload will be identical to that of OrderedMerge, which after all is a full-outer-join for ordered sequences. Hope it makes sense?

@cknightdevelopment
Copy link
Author

Yes, what you are saying makes sense. I am thinking of a signature of something like this for LeftJoin:

public static IEnumerable<TResult> LeftJoin<TLeft, TRight, TKey, TResult>(
    this IEnumerable<TLeft> left,
    IEnumerable<TRight> right,
    Func<TLeft, TKey> leftKeySelector,
    Func<TRight, TKey> rightKeySelector,
    Func<TLeft, TRight, TResult> bothSelector,
    Func<TLeft, TResult> leftOnlySelector,
    IEqualityComparer<TKey> comparer = null
)

And FullOuterJoin could be similar with one minor change of adding a rightOnlySelector:

public static IEnumerable<TResult> FullOuterJoin<TLeft, TRight, TKey, TResult>(
    this IEnumerable<TLeft> left,
    IEnumerable<TRight> right,
    Func<TLeft, TKey> leftKeySelector,
    Func<TRight, TKey> rightKeySelector,
    Func<TLeft, TRight, TResult> bothSelector,
    Func<TLeft, TResult> leftOnlySelector,
    Func<TRight, TResult> rightOnlySelector,
    IEqualityComparer<TKey> comparer = null
)

This handles the cases you mentioned in your last comment, such as:

  • Handles returned object for matched (bothSelector) and un-matched (leftOnlySelector and rightOnlySelector) scenarios
  • Custom equality comparer for the key selector
  • No longer returns a JoinResult<,> object

What are your thoughts on this? Thanks.

@atifaziz
Copy link
Member

Thanks for working through the questions and glad to see we are on the same page. 👍

I suggest you proceed with 2 PRs, one for LeftJoin and RightJoin (since the latter will build on the former) and another for FullOuterJoin.

@atifaziz
Copy link
Member

In terms of consistency of names, we should decide if the outer qualification is necessary. It's not there in LeftJoin and RightJoin so I'd ask, why not rename FullOuterJoin to simply FullJoin?

@cknightdevelopment
Copy link
Author

cknightdevelopment commented Jan 18, 2017

Yeah, 2 separate PRs make sense to me. I agree that we should use FullJoin instead of FullOuterJoin. I will start with LeftJoin/RightJoin later this week.

@cknightdevelopment
Copy link
Author

I forked the project and made a separate branch. When I load the solution in Visual Studio 2015 it builds successfully, but the MoreLinq.Test project has some errors in it's references. Is there anything besides opening the solution in Visual Studio and building that I need to do? I am fairly new to the .NET Core world, so maybe I am missing something. I did a NuGet Package restore, but I still get these errors. Thoughts?:

image

image

@atifaziz
Copy link
Member

I actually have the same problem (and in another project of mine too)…

image

However, my Error List pane is empty and the solution builds just fine. I'm guessing this issue has to do with the .xproj tooling being in preview status.

Do you have the .NET Core tools installed? Have you tried building from the command line by simply running build.cmd?

@cknightdevelopment
Copy link
Author

I got it working now. Tests run successfully as well. Thanks.

@atifaziz atifaziz added this to the 2.8.0 milestone Sep 18, 2017
@atifaziz atifaziz changed the title Idea for additional Join methods to contribute Left, right and full outer joins Sep 18, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants