Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

EXPLAIN ANALYZE should run all Optimizer passes #929

Merged
merged 1 commit into from
Aug 23, 2021
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
14 changes: 12 additions & 2 deletions datafusion/src/optimizer/utils.rs
Original file line number Diff line number Diff line change
Expand Up @@ -195,11 +195,21 @@ pub fn from_plan(
schema: schema.clone(),
alias: alias.clone(),
}),
LogicalPlan::Analyze {
verbose, schema, ..
} => {
assert!(expr.is_empty());
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here is the actual bug fix: the code that builds up LogicalPlans from optimized inputs ignored the inputs for Analyze. I have a follow on PR to try and make this kind of thing harder to introduce in the future: #930

assert_eq!(inputs.len(), 1);
Ok(LogicalPlan::Analyze {
verbose: *verbose,
schema: schema.clone(),
input: Arc::new(inputs[0].clone()),
})
}
LogicalPlan::EmptyRelation { .. }
| LogicalPlan::TableScan { .. }
| LogicalPlan::CreateExternalTable { .. }
| LogicalPlan::Explain { .. }
| LogicalPlan::Analyze { .. } => Ok(plan.clone()),
| LogicalPlan::Explain { .. } => Ok(plan.clone()),
}
}

Expand Down
45 changes: 45 additions & 0 deletions datafusion/tests/sql.rs
Original file line number Diff line number Diff line change
Expand Up @@ -52,6 +52,28 @@ use datafusion::{
};
use datafusion::{execution::context::ExecutionContext, physical_plan::displayable};

/// A macro to assert that one string is contained within another with
/// a nice error message if they are not.
///
/// Usage: `assert_contains!(actual, expected)`
///
/// Is a macro so test error
/// messages are on the same line as the failure;
///
/// Both arguments must be convertable into Strings (Into<String>)
macro_rules! assert_contains {
($ACTUAL: expr, $EXPECTED: expr) => {
let actual_value: String = $ACTUAL.into();
let expected_value: String = $EXPECTED.into();
assert!(
actual_value.contains(&expected_value),
"Can not find expected in actual.\n\nExpected:\n{}\n\nActual:\n{}",
expected_value,
actual_value
);
};
}

#[tokio::test]
async fn nyc() -> Result<()> {
// schema for nyxtaxi csv files
Expand Down Expand Up @@ -2589,6 +2611,29 @@ async fn csv_explain_verbose_plans() {
);
}

#[tokio::test]
async fn explain_analyze_runs_optimizers() {
// repro for https://github.com/apache/arrow-datafusion/issues/917
// where EXPLAIN ANALYZE was not correctly running optiimizer
let mut ctx = ExecutionContext::new();
register_alltypes_parquet(&mut ctx);

// This happens as an optimization pass where count(*) can be
// answered using statistics only.
let expected = "EmptyExec: produce_one_row=true";

let sql = "EXPLAIN SELECT count(*) from alltypes_plain";
let actual = execute_to_batches(&mut ctx, sql).await;
let actual = arrow::util::pretty::pretty_format_batches(&actual).unwrap();
assert_contains!(actual, expected);

// EXPLAIN ANALYZE should work the same
let sql = "EXPLAIN ANALYZE SELECT count(*) from alltypes_plain";
let actual = execute_to_batches(&mut ctx, sql).await;
let actual = arrow::util::pretty::pretty_format_batches(&actual).unwrap();
assert_contains!(actual, expected);
}

fn aggr_test_schema() -> SchemaRef {
Arc::new(Schema::new(vec![
Field::new("c1", DataType::Utf8, false),
Expand Down