Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Convert multiple equality/in-equality on same column joined by Or/Else into SQL IN expression #20003

Closed
arthur-liberman opened this issue Feb 20, 2020 · 15 comments · Fixed by #21477
Assignees
Labels
area-query closed-fixed The issue has been fixed and is/will be included in the release indicated by the issue milestone. customer-reported type-enhancement
Milestone

Comments

@arthur-liberman
Copy link

arthur-liberman commented Feb 20, 2020

We generate some Where expressions dynamically in our product. In some situations we pass a long list of values.
In 2.2 this translated into an IN expression and was handled correctly.
In 3.1 it translates to a series of OR expressions with each consecutive expression being nested in brackets.
The following example produces drastically different SQL code. This also reproduces when we use Devart Oracle provider.

Steps to reproduce

Both Where clauses in the example produce identical SQL in each .NET Core version.

using Microsoft.EntityFrameworkCore;
using Microsoft.EntityFrameworkCore.Metadata.Builders;
using System.Linq;
using System;
using Microsoft.Extensions.Logging;
using System.Collections.Generic;
using System.Linq.Dynamic.Core;

namespace joingroupcountquery
{
    class Program
    {
        static void Main(string[] args)
        {
            var seeder = new Seeder();
            seeder.Seed();
            new Logic().ExecuteQuery();
            Console.WriteLine("Hello World!");
        }
    }

    public class Logic
    {
        public IList<Dto> ExecuteQuery()
        {
            using (var dbContext = new MyDbContext())
            {
                return dbContext.Items
                .Where("Id in(4,5,1,7,12,15)")
                //.Where(i => (i.Id == 4 || i.Id == 5 || i.Id == 1 || i.Id == 7 || i.Id == 12 || i.Id == 15))
                .Select(i => new Dto { Id = i.Id })
                .ToList();
            }
        }
    }

    public class Dto
    {
        public int Id { get; set; }
        public string Name { get; set; }
        public int GroupCount { get; set; }
        public int CategoryCount { get; set; }
    }

    public class Seeder
    {
        public void Seed()
        {
            using (var dbContext = new MyDbContext())
            {
                if (!dbContext.Items.Any())
                {
                    for (int i = 0; i < 10; i++)
                    {
                        var rand = new Random();
                        int group = rand.Next() % 4;
                        int cat = rand.Next() % 3;
                        var item = new Item { GroupId = group, CategoryId = cat, Name = $"Item {i}" };
                        dbContext.Items.Add(item);
                    }
                }
                if (!dbContext.Groups.Any())
                {
                    for (int i = 0; i < 4; i++)
                    {
                        var group = new Group { Name = $"Group {i}" };
                        dbContext.Groups.Add(group);
                    }
                }
                if (!dbContext.Categories.Any())
                {
                    for (int i = 0; i < 3; i++)
                    {
                        var category = new Category { Name = $"Category {i}" };
                        dbContext.Categories.Add(category);
                    }
                }
                dbContext.SaveChanges();
            }
        }
    }

    public class Item
    {
        public int Id { get; set; }
        public int GroupId { get; set; }
        public int CategoryId { get; set; }
        public string Name { get; set; }
    }

    public class Group
    {
        public int Id { get; set; }
        public string Name { get; set; }
    }

    public class Category
    {
        public int Id { get; set; }
        public string Name { get; set; }
    }

    public class MyDbContext : DbContext
    {
        public static readonly ILoggerFactory loggerFactory = LoggerFactory.Create(builder =>
        {
            builder.AddConsole();
        });

        public DbSet<Item> Items { get; set; }
        public DbSet<Group> Groups { get; set; }
        public DbSet<Category> Categories { get; set; }

        protected override void OnModelCreating(ModelBuilder modelBuilder)
        {
            modelBuilder.ApplyConfiguration(new ItemConfiguration());
            modelBuilder.ApplyConfiguration(new GroupConfiguration());
            modelBuilder.ApplyConfiguration(new CategoryConfiguration());
        }

        protected override void OnConfiguring(DbContextOptionsBuilder optionsBuilder)
        {
            optionsBuilder
                .UseLoggerFactory(loggerFactory)
                .EnableSensitiveDataLogging()
                .UseSqlServer(@"Server=.;Database=JoinGroupCount;User Id=sa;Password=password");
        }
    }

    public class ItemConfiguration : IEntityTypeConfiguration<Item>
    {
        public void Configure(EntityTypeBuilder<Item> builder)
        {
            builder.ToTable("Item");
            builder.HasKey(o => o.Id);
            builder.Property(t => t.Id).HasColumnName("id").ValueGeneratedOnAdd();
            builder.Property(t => t.GroupId).HasColumnName("groupId");
            builder.Property(t => t.CategoryId).HasColumnName("categoryId");
            builder.Property(t => t.Name).HasColumnName("name");
        }
    }

    public class GroupConfiguration : IEntityTypeConfiguration<Group>
    {
        public void Configure(EntityTypeBuilder<Group> builder)
        {
            builder.ToTable("Group");
            builder.HasKey(o => o.Id);
            builder.Property(t => t.Id).HasColumnName("id").ValueGeneratedOnAdd();
            builder.Property(t => t.Name).HasColumnName("name");
        }
    }

    public class CategoryConfiguration : IEntityTypeConfiguration<Category>
    {
        public void Configure(EntityTypeBuilder<Category> builder)
        {
            builder.ToTable("Category");
            builder.HasKey(o => o.Id);
            builder.Property(t => t.Id).HasColumnName("id").ValueGeneratedOnAdd();
            builder.Property(t => t.Name).HasColumnName("name");
        }
    }
}

.NET Core 3.1:

      SELECT [i].[id] AS [Id]
      FROM [Item] AS [i]
      WHERE ((((([i].[id] = 4) OR ([i].[id] = 5)) OR ([i].[id] = 1)) OR ([i].[id] = 7)) OR ([i].[id] = 12)) OR ([i].[id] = 15)

.NET Core 2.2:

      SELECT [i].[id]
      FROM [Item] AS [i]
      WHERE [i].[id] IN (4, 5, 1, 7, 12, 15)

With enough elements in the IN expression, we end up exceeding the allowed nest depth and the SQL server throws an exception.

I'm not sure whether this a dup, but I couldn't find any other open issues describing this problem.

Further technical details

EF Core version: 3.1.2
Database provider: Microsoft.EntityFrameworkCore.SqlServer, Devart Oracle provider
Target framework: .NET Core 3.1.2
Operating system: Visual Studio 2019 16.4, Visual Studio Code

@arthur-liberman
Copy link
Author

arthur-liberman commented Feb 20, 2020

Now I see that it's a dup.
That's a pretty bad regression, and very sad to see that the fix has been pushed back to 5.0 release.

@ErikEJ
Copy link
Contributor

ErikEJ commented Feb 20, 2020

@arthur-liberman I have fixed a similar issue in 3.0, but it may be because my values are split into buckets of factors of 2. Where is the duplicate?

@arthur-liberman
Copy link
Author

I just searched for "nested too deeply" in the issues section, but now that I've taken another look none of the issues listed are the same as mine, so I'll reopen this.

@NetTecture
Copy link

Well, part of the issue is hat SQL has no problem with multiple OR to start with - there is no need to use brackets like that and nest them. The seoncd is that - despite all issues with the IN statement, the IN statement is way better than tons of OR in scenarios like this. I would call the 3.1 code defective because it does get way overly complex.

@ErikEJ
Copy link
Contributor

ErikEJ commented Feb 20, 2020

@NetTecture In with hard coded parameter values is not ideal, imho.

@arthur-liberman you must share a repro if.you want help.

@arthur-liberman
Copy link
Author

@ErikEJ I updated the OP with a full code example.

Also, I agree that In with hard coded parameters isn't ideal, but sometimes there are legacy reasons for having to resort to using it like this. It should still work as expected.
I would be fine with no brackets surrounding the OR statements, we really needed this to just work.
We ended up resorting to ugly and unsafe (in regards to system robustness) workaround to get the system to work properly.

@smitpatel
Copy link
Member

We don't convert multiple ORs to In Expression in 3.x. We had that optimization since 2.x translation pipeline internally we generated that. In 3.x we don't generate that internally and users could rewrite query to use contains. We will add this some time in future.

Using Contains is the way to write such query right now.

@arthur-liberman
Copy link
Author

@smitpatel I understand, but even so, the way that the SQL query is created from the ORs is incorrect.
I would expect to see something like WHERE ([i].[id] = 4) OR ([i].[id] = 5) OR ([i].[id] = 1) OR ([i].[id] = 7) OR ([i].[id] = 12) OR ([i].[id] = 15), not WHERE ((((([i].[id] = 4) OR ([i].[id] = 5)) OR ([i].[id] = 1)) OR ([i].[id] = 7)) OR ([i].[id] = 12)) OR ([i].[id] = 15).

@smitpatel
Copy link
Member

@arthur-liberman - I understand the way you want to see it and for humans it is obvious readable choice but expression tree does not work that way. Since it is OR & equality, they are BinaryExpression. So it generates nice big binary tree which compiler generates right skewed.

@arthur-liberman
Copy link
Author

arthur-liberman commented Feb 20, 2020

@smitpatel Is there no way around this?
Contains is great, but our implementation is generic, and uses dynamic linq from string to generate the LINQ queries. I can't see how we can create strongly typed lists this way.
Our current workaround is to parse the IN(...) statement before passing it to dynamic LINQ, create a list of objects, and pass it as a parameter to the Where clause.
The problem with it is that we only added support for List<string> and List<int> as those are the most common types which will be encountered, but the code is ugly and will not permit using any other data types.

If anybody has any suggestion on how to better approach this, I'd be glad to hear them.

@smitpatel
Copy link
Member

return dbContext.Items
                .Where(i => new [] { 4, 5, 1, 7, 12, 15}.Contains(i.Id))
                .Select(i => new Dto { Id = i.Id })
                .ToList();

You can just initialize an array. As long as it is a mapped scalar type it works.

@arthur-liberman
Copy link
Author

Thanks!
Looks like this works.

        public IList<Dto> ExecuteQuery()
        {
            using (var dbContext = new MyDbContext())
            {
                return dbContext.Items
                .Where("(new [] { 4,5,1,7,12,15 }).Contains(Id)")
                .Select(i => new Dto { Id = i.Id })
                .ToList();
            }
        }

@smitpatel smitpatel changed the title A large sequence of OR expressions results in "Some part of your SQL statement is nested too deeply. Rewrite the query or break it up into smaller queries" Convert multiple equality/in-equality on same column joined by Or/Else into SQL IN expression Feb 20, 2020
@maumar maumar self-assigned this Feb 20, 2020
@ErikEJ
Copy link
Contributor

ErikEJ commented Feb 21, 2020

Current effort to reduce nesting (with a help from a very smart friend!) - and my parameter count is factors of 2!

            var predicates = distinctValues
                .Select(v =>
                {
                    Expression<Func<TKey>> valueAsExpression = () => v;
                    return Expression.Equal(keySelector.Body, valueAsExpression.Body);
                })
                .ToList();

            while (predicates.Count > 1)
            {
                predicates = PairWise(predicates).Select(p => Expression.OrElse(p.Item1, p.Item2)).ToList();
            }

            var body = predicates.Single();

            var clause = Expression.Lambda<Func<TQuery, bool>>(body, keySelector.Parameters);

            return queryable.Where(clause);

        private static IEnumerable<(T, T)> PairWise<T>(this IEnumerable<T> source)
        {
            var sourceEnumerator = source.GetEnumerator();
            while (sourceEnumerator.MoveNext())
            {
                var a = sourceEnumerator.Current;
                sourceEnumerator.MoveNext();
                var b = sourceEnumerator.Current;

                yield return (a, b);
            }
        }

@smitpatel
Copy link
Member

This also has a duplicate

@maumar
Copy link
Contributor

maumar commented May 19, 2020

cosmos improvement is tracked by #12635

@maumar maumar removed this from the 5.0.0 milestone Jun 23, 2020
@ajcvickers ajcvickers added this to the 5.0.0 milestone Jun 26, 2020
maumar added a commit that referenced this issue Jul 1, 2020
…joined by Or/Else into SQL IN expression

Various optimizations to queries with IN:
- a == X || a == Y -> a IN (X, Y)
- a != X && a != Y -> a NOT IN (X, Y)
- a IN (X) -> a == X
- a IN (X, Y) || a IN (Z, W) -> a IN (X, Y, Z, W)
- a IN (X, Y, Z) && a IN (Y, Z, W) -> a IN (Y, Z)

Fixes #20003
@maumar maumar added the closed-fixed The issue has been fixed and is/will be included in the release indicated by the issue milestone. label Jul 1, 2020
maumar pushed a commit that referenced this issue Jul 1, 2020
…joined by Or/Else into SQL IN expression

Various optimizations to queries with IN:
- a == X || a == Y -> a IN (X, Y)
- a != X && a != Y -> a NOT IN (X, Y)
- a IN (X) -> a == X
- a IN (X, Y) || a IN (Z, W) -> a IN (X, Y, Z, W)
- a IN (X, Y, Z) && a IN (Y, Z, W) -> a IN (Y, Z)

Fixes #20003
maumar pushed a commit that referenced this issue Jul 1, 2020
…joined by Or/Else into SQL IN expression

Various optimizations to queries with IN:
- a == X || a == Y -> a IN (X, Y)
- a != X && a != Y -> a NOT IN (X, Y)
- a IN (X) -> a == X
- a IN (X, Y) || a IN (Z, W) -> a IN (X, Y, Z, W)
- a IN (X, Y, Z) && a IN (Y, Z, W) -> a IN (Y, Z)

Fixes #20003
maumar added a commit that referenced this issue Jul 1, 2020
…joined by Or/Else into SQL IN expression

Various optimizations to queries with IN:

- a == X || a == Y -> a IN (X, Y)
- a != X && a != Y -> a NOT IN (X, Y)
- a IN (X) -> a == X
- a IN (X, Y) || a IN (Z, W) -> a IN (X, Y, Z, W)
- a IN (X, Y, Z) && a IN (Y, Z, W) -> a IN (Y, Z)

Fixes #20003
@maumar maumar closed this as completed in 585a09d Jul 8, 2020
@ajcvickers ajcvickers modified the milestones: 5.0.0, 5.0.0-preview8 Jul 14, 2020
@ajcvickers ajcvickers modified the milestones: 5.0.0-preview8, 5.0.0 Nov 7, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment