Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql: can't run EXPLAIN(DISTSQL) or EXPLAIN(VEC) on plans that need to see concrete values from subqueries #40677

Open
rafiss opened this issue Sep 11, 2019 · 11 comments
Labels
A-sql-execution Relating to SQL execution. A-sql-ui Why is my query slow? C-cleanup Tech debt, refactors, loose ends, etc. Solution not expected to significantly change behavior. T-sql-queries SQL Queries Team

Comments

@rafiss
Copy link
Collaborator

rafiss commented Sep 11, 2019

The following produces an internal error with EXPLAIN (DISTSQL), even though the query itself runs fine. Not sure if it's related to other invalid index issues like #39433, #40444, or #39438.

root@:26257/defaultdb> CREATE TABLE kv (
  k INT PRIMARY KEY,
  v INT,
  w INT,
  f FLOAT,
  d DECIMAL,
  s STRING,
  b BOOL,
  FAMILY (k, v, w, f, b),
  FAMILY (d),
  FAMILY (s)
);
CREATE TABLE

root@:26257/defaultdb> INSERT INTO kv VALUES
(1, 2, 3, 1.0, 1, 'a', true),
(3, 4, 5, 2, 8, 'a', true),
(5, NULL, 5, 9.9, -321, NULL, false),
(6, 2, 3, 4.4, 4.4, 'b', true),
(7, 2, 2, 6, 7.9, 'b', true),
(8, 4, 2, 3, 3, 'A', false);
INSERT 6

root@:26257/defaultdb> EXPLAIN (distsql)
    SELECT
        avg(v) OVER (
            PARTITION BY
                w
            ROWS
                BETWEEN (SELECT count(*) FROM kv) PRECEDING AND 1 FOLLOWING
        )
    FROM
        kv;

This leads to the following error:

pq: internal error: invalid index 1 for "(SELECT count(*) FROM kv)"
DETAIL: stack trace:
github.com/cockroachdb/cockroach/pkg/sql/subquery.go:46: EvalSubquery()
github.com/cockroachdb/cockroach/pkg/sql/sem/tree/eval.go:4099: Eval()
github.com/cockroachdb/cockroach/pkg/sql/distsqlpb/processors.go:110: initFromAST()
github.com/cockroachdb/cockroach/pkg/sql/distsqlpb/processors.go:220: InitFromAST()
github.com/cockroachdb/cockroach/pkg/sql/distsql_plan_window.go:139: createWindowFnSpec()
github.com/cockroachdb/cockroach/pkg/sql/distsql_physical_planner.go:3158: createPlanForWindow()
github.com/cockroachdb/cockroach/pkg/sql/distsql_physical_planner.go:2421: createPlanForNode()
github.com/cockroachdb/cockroach/pkg/sql/distsql_physical_planner.go:2329: createPlanForNode()
github.com/cockroachdb/cockroach/pkg/sql/explain_distsql.go:132: startExec()
github.com/cockroachdb/cockroach/pkg/sql/plan.go:362: func2()
github.com/cockroachdb/cockroach/pkg/sql/walk.go:141: func1()
github.com/cockroachdb/cockroach/pkg/sql/walk.go:146: visitInternal()
github.com/cockroachdb/cockroach/pkg/sql/walk.go:108: visit()
github.com/cockroachdb/cockroach/pkg/sql/walk.go:72: walkPlan()
github.com/cockroachdb/cockroach/pkg/sql/plan.go:365: startExec()
github.com/cockroachdb/cockroach/pkg/sql/plan_node_to_row_source.go:117: Start()
github.com/cockroachdb/cockroach/pkg/sql/distsqlrun/processors.go:792: Run()
github.com/cockroachdb/cockroach/pkg/sql/distsqlrun/flow.go:584: Run()
github.com/cockroachdb/cockroach/pkg/sql/distsql_running.go:337: Run()
github.com/cockroachdb/cockroach/pkg/sql/distsql_running.go:962: PlanAndRun()
github.com/cockroachdb/cockroach/pkg/sql/conn_executor_exec.go:816: execWithDistSQLEngine()
github.com/cockroachdb/cockroach/pkg/sql/conn_executor_exec.go:708: dispatchToExecutionEngine()
github.com/cockroachdb/cockroach/pkg/sql/conn_executor_exec.go:417: execStmtInOpenState()
github.com/cockroachdb/cockroach/pkg/sql/conn_executor_exec.go:98: execStmt()
github.com/cockroachdb/cockroach/pkg/sql/conn_executor.go:1238: execCmd()
github.com/cockroachdb/cockroach/pkg/sql/conn_executor.go:1167: run()
github.com/cockroachdb/cockroach/pkg/sql/conn_executor.go:444: ServeConn()
github.com/cockroachdb/cockroach/pkg/sql/pgwire/conn.go:584: func1()
runtime/asm_amd64.s:1337: goexit()

EXPLAIN (VEC) gives this:

root@:26257/defaultdb> EXPLAIN (vec)
    SELECT
        avg(v) OVER (
            PARTITION BY
                w
            ROWS
                BETWEEN (SELECT count(*) FROM kv) PRECEDING AND 1 FOLLOWING
        )
    FROM
        kv;
pq: frame starting offset must not be null

Jira issue: CRDB-5526

Epic CRDB-14510

@rafiss rafiss added C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. A-sql-execution Relating to SQL execution. labels Sep 11, 2019
@rafiss
Copy link
Collaborator Author

rafiss commented Sep 11, 2019

I guess the proximate cause is that explain_vec.go does this:

planCtx.planner.curPlan.subqueryPlans = n.subqueryPlans

whereas explain_distsql.go leaves planCtx.planner.curPlan.subqueryPlans as nil unless ANALYZE is used.

And subquery.go EvalSubquery has the following check.

	if expr.Idx < 0 || expr.Idx-1 >= len(p.curPlan.subqueryPlans) {
		return nil, errors.AssertionFailedf("invalid index %d for %q", expr.Idx, expr)
	}

@jordanlewis
Copy link
Member

Wait, are we looking at different things? Isn't the proximate cause:

pq: frame starting offset must not be null

which sounds like a window functions thing?

@yuzefovich
Copy link
Member

I haven't looked into this issue, but the error message to me implies that the subquery is not evaluated, so it is a bug with EXPLAIN (VEC) - in there, we simply set NULLs for all results of the subqueries.

@jordanlewis
Copy link
Member

Oh, yeah. How annoying. Perhaps EXPLAIN(VEC) needs to be enhanced to set default values for each type instead of null.

@rafiss
Copy link
Collaborator Author

rafiss commented Sep 11, 2019

I think there are two separate issues here then.

EXPLAIN (VEC) results in pq: frame starting offset must not be null. It happens at

return pgerror.Newf(pgcode.NullValueNotAllowed, "frame starting offset must not be null")

EXPLAIN (DISTSQL) results in an internal error with a stack trace. It happens when

dStartOffset, err := typedStartOffset.Eval(evalCtx)
has an error calling the EvalSubquery code I posted above. (This happens before the frame starting offset must not be null error.)

@yuzefovich
Copy link
Member

I'll look into the internal error (a little bit later).

craig bot pushed a commit that referenced this issue Sep 17, 2019
40803: sql: replace an internal error in EXPLAIN (DISTSQL) and EXPLAIN (VEC) r=jordanlewis a=yuzefovich

Currently, when we're explaining a query that has a subquery value
of which is needed to construct the physical plan, we error out with
an internal error. This commit replaces that stack trace with a more
user-friendly message saying that explaining such a query is not
supported.

Addresses: #40677.

Release justification: Category 2: Bug fixes and low-risk updates to
new functionality.

Release note: None

Co-authored-by: Yahor Yuzefovich <yahor@cockroachlabs.com>
@jordanlewis jordanlewis changed the title sql: internal error: invalid index in EXPLAIN(DISTSQL) sql: can' invalid index in EXPLAIN(DISTSQL) Sep 17, 2019
@jordanlewis jordanlewis changed the title sql: can' invalid index in EXPLAIN(DISTSQL) sql: can't run EXPLAIN(DISTSQL) or EXPLAIN(VEC) on plans that need to see concrete values from subqueries Sep 17, 2019
@yuzefovich yuzefovich removed their assignment Mar 14, 2020
@asubiotto
Copy link
Contributor

What's the status here?

@yuzefovich
Copy link
Member

Nothing has changed, AFAIU it's pretty hard to fix this correctly, if possible, without actually executing the subquery. We could attempt to replace the result of subquery with zeroes, but that also is tricky since we need to have a zero of the "correct" type, probably doable though.

@awoods187
Copy link
Contributor

This works with EXPLAIN and EXPLAIN ANALYZE in 21.1, should we close this issue since we are steering people away from these variants? @RaduBerinde

@RaduBerinde
Copy link
Member

I think these are more for our own use (especially VEC). We can now see distsql plans in the EXPLAIN ANALYZE (DEBUG) bundle, even with subqueries. I think the most useful thing we could do here would be to get EXPLAIN (VEC) info in the bundle. This would work with subqueries just fine, because they would actually get run (just like DISTSQL works). I hit some issues with VEC when I implemented the bundles so I omitted it.

@github-actions
Copy link

We have marked this issue as stale because it has been inactive for
18 months. If this issue is still relevant, removing the stale label
or adding a comment will keep it active. Otherwise, we'll close it in
10 days to keep the issue queue tidy. Thank you for your contribution
to CockroachDB!

@yuzefovich yuzefovich added T-sql-queries SQL Queries Team C-cleanup Tech debt, refactors, loose ends, etc. Solution not expected to significantly change behavior. and removed no-issue-activity C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. labels Sep 25, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-sql-execution Relating to SQL execution. A-sql-ui Why is my query slow? C-cleanup Tech debt, refactors, loose ends, etc. Solution not expected to significantly change behavior. T-sql-queries SQL Queries Team
Projects
Status: Backlog
Development

No branches or pull requests

8 participants