-
Notifications
You must be signed in to change notification settings - Fork 3.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Translate Contains to IN with subquery instead of EXISTS where relevant #30955
Comments
Using IN rather than EXISTS provides some very serious performance advantages, on all databases except for SQL Server; this is probably due to the removal of the correlated subquery which EXISTS requires. The benchmark below uses the data and queries from @FHTSean, comparing the following two SQLs (see #30938): SELECT *
FROM `CustomerMainTable` AS `t`
WHERE `t`.`CustomerMainTableId` IN (
SELECT MAX(`t0`.`CustomerMainTableId`)
FROM `CustomerMainTable` AS `t0`
GROUP BY `t0`.`CustomerMainTableCustomerId`
) ... with: SELECT *
FROM `CustomerMainTable` AS `t`
WHERE EXISTS (
SELECT 1
FROM `CustomerMainTable` AS `t0`
GROUP BY `t0`.`CustomerMainTableCustomerId`
HAVING MAX(`t0`.`CustomerMainTableId`) = `t`.`CustomerMainTableId`) On MariaDB:
(That's 7ms compared to 9 seconds) On PostgreSQL I'm also seeing a very big performance difference, even if less dramatic:
Also on SQLite:
On SQL Server both queries have the exact same running time, I'm guessing SQL Server sees through the EXISTS and avoids the correlated subquery. Full benchmark sourcesBenchmarkRunner.Run<Benchmark>();
public class Benchmark
{
private MySqlConnection _connection;
[GlobalSetup]
public async Task Setup()
{
_connection = new MySqlConnection("Server=localhost;Database=test;User ID=root;Password=Abcd5678;");
await _connection.OpenAsync();
}
[Benchmark]
public async Task In()
{
await using var command = new MySqlCommand(
"""
SELECT *
FROM `CustomerMainTable` AS `t`
WHERE `t`.`CustomerMainTableId` IN (
SELECT MAX(`t0`.`CustomerMainTableId`)
FROM `CustomerMainTable` AS `t0`
GROUP BY `t0`.`CustomerMainTableCustomerId`);
""", _connection);
await command.ExecuteNonQueryAsync();
}
[Benchmark]
public async Task Exists()
{
await using var command = new MySqlCommand(
"""
SELECT *
FROM `CustomerMainTable` AS `t`
WHERE EXISTS (
SELECT 1
FROM `CustomerMainTable` AS `t0`
GROUP BY `t0`.`CustomerMainTableCustomerId`
HAVING MAX(`t0`.`CustomerMainTableId`) = `t`.`CustomerMainTableId`)
""", _connection);
await command.ExecuteNonQueryAsync();
}
} |
Although our InExpression supports having a subquery (and various visitors handle that), we never actually generate this - instead we always generate an EXISTS subquery with a predicate (based on SQL Server baselines). Translating to InExpression is likely better - it doesn't require a correlated subquery and expresses the intent more succintly, without needing a predicate. For example, instead of this:
We can translate to this:
However, note that our null semantics around InExpression with subquery currently seems broken (fortunately it's dead code): IN returns null when the item is null (or when the values contains null and a match isn't found). We'd need to make that logic actually work.
The text was updated successfully, but these errors were encountered: