sql/performance: independent filter expressions on JOINs should be pushed down to the operands #8566

zhaoyuxi · 2016-08-16T02:49:43Z

1 Issue appearance
I created two tables and inserted 800,000 rows into each table. A relational query hanging more than 20 minutes is not over yet. 1 CPU was 100% fully filled by a switch thread such as following two stacks:

Thread 11 (Thread 0x2af3f2c00700 (LWP 11476)):
#0 0x000000000058ca11 in runtime.getitab ()
#1 0x000000000058dbe5 in runtime.assertI2I ()
#2 0x0000000000934cd7 in github.com/cockroachdb/cockroach/sql/parser.(*ComparisonExpr).Eval ()
#3 0x000000000092f7c0 in github.com/cockroachdb/cockroach/sql/parser.(*AndExpr).Eval ()
#4 0x0000000000c21b83 in github.com/cockroachdb/cockroach/sql/sqlbase.RunFilter ()
#5 0x0000000000f1c0e1 in github.com/cockroachdb/cockroach/sql.(*selectNode).Next ()
#6 0x0000000000f23fa6 in github.com/cockroachdb/cockroach/sql.(*selectTopNode).Next ()
#7 0x0000000000ece0e6 in github.com/cockroachdb/cockroach/sql.(*Executor).execStmt ()
#8 0x0000000000ecb9e9 in github.com/cockroachdb/cockroach/sql.(*Executor).execStmtInOpenTxn ()
#9 0x0000000000eca5e4 in github.com/cockroachdb/cockroach/sql.(*Executor).execStmtsInCurrentTxn ()
#10 0x0000000000ec9aad in github.com/cockroachdb/cockroach/sql.runTxnAttempt ()
#11 0x0000000000f5cb60 in github.com/cockroachdb/cockroach/sql.(*Executor).execRequest.func2 ()
#12 0x0000000000ae8c4b in github.com/cockroachdb/cockroach/internal/client.(*Txn).Exec ()
#13 0x0000000000ec8450 in github.com/cockroachdb/cockroach/sql.(*Executor).execRequest ()
#14 0x0000000000ec7656 in github.com/cockroachdb/cockroach/sql.(*Executor).ExecuteStatements ()
#15 0x0000000000fec6a8 in github.com/cockroachdb/cockroach/sql/pgwire.(*v3Conn).executeStatements ()
#16 0x0000000000fe8548 in github.com/cockroachdb/cockroach/sql/pgwire.(*v3Conn).handleSimpleQuery ()
#17 0x0000000000fe813a in github.com/cockroachdb/cockroach/sql/pgwire.(*v3Conn).serve ()
#18 0x0000000000fde369 in github.com/cockroachdb/cockroach/sql/pgwire.(*Server).ServeConn ()
#19 0x0000000000ad298d in github.com/cockroachdb/cockroach/server.(*Server).Start.func8.1 ()
#20 0x0000000001000c7d in github.com/cockroachdb/cockroach/util/netutil.(*Server).ServeWith.func1 ()
#21 0x00000000005dfbd1 in runtime.goexit ()
#22 0x000000c82037ce10 in ?? ()
#23 0x000000c8201581f0 in ?? ()
#24 0x00002af38040a180 in ?? ()
#25 0x000000c82d16ed10 in ?? ()
#26 0x000000c82035e000 in ?? ()
#27 0x7568746967203866 in ?? ()
#28 0x616d2f6d6f632e62 in ?? ()
#29 0x692d6f672f6e7474 in ?? ()
#30 0x36353a7974746173 in ?? ()
#31 0x000000c82543a958 in ?? ()
#32 0x0000000000eca5e4 in github.com/cockroachdb/cockroach/sql.(*Executor).execStmtsInCurrentTxn ()
#33 0x0000000000000000 in ?? ()

Thread 8 (Thread 0x2af3f3203700 (LWP 11479)):
#0 0x0000000000589cb3 in runtime.mapiternext ()
#1 0x0000000000f2265a in github.com/cockroachdb/cockroach/sql.qvalMap.populateQVals ()
#2 0x0000000000f1c099 in github.com/cockroachdb/cockroach/sql.(*selectNode).Next ()
#3 0x0000000000f23fa6 in github.com/cockroachdb/cockroach/sql.(*selectTopNode).Next ()
#4 0x0000000000ece0e6 in github.com/cockroachdb/cockroach/sql.(*Executor).execStmt ()
#5 0x0000000000ecb9e9 in github.com/cockroachdb/cockroach/sql.(*Executor).execStmtInOpenTxn ()
#6 0x0000000000eca5e4 in github.com/cockroachdb/cockroach/sql.(*Executor).execStmtsInCurrentTxn ()
#7 0x0000000000ec9aad in github.com/cockroachdb/cockroach/sql.runTxnAttempt ()
#8 0x0000000000f5cb60 in github.com/cockroachdb/cockroach/sql.(*Executor).execRequest.func2 ()
#9 0x0000000000ae8c4b in github.com/cockroachdb/cockroach/internal/client.(*Txn).Exec ()
#10 0x0000000000ec8450 in github.com/cockroachdb/cockroach/sql.(*Executor).execRequest ()
#11 0x0000000000ec7656 in github.com/cockroachdb/cockroach/sql.(*Executor).ExecuteStatements ()
#12 0x0000000000fec6a8 in github.com/cockroachdb/cockroach/sql/pgwire.(*v3Conn).executeStatements ()
#13 0x0000000000fe8548 in github.com/cockroachdb/cockroach/sql/pgwire.(*v3Conn).handleSimpleQuery ()
#14 0x0000000000fe813a in github.com/cockroachdb/cockroach/sql/pgwire.(*v3Conn).serve ()
#15 0x0000000000fde369 in github.com/cockroachdb/cockroach/sql/pgwire.(*Server).ServeConn ()
#16 0x0000000000ad298d in github.com/cockroachdb/cockroach/server.(*Server).Start.func8.1 ()
#17 0x0000000001000c7d in github.com/cockroachdb/cockroach/util/netutil.(*Server).ServeWith.func1 ()
#18 0x00000000005dfbd1 in runtime.goexit ()
#19 0x000000c82037ce10 in ?? ()
#20 0x000000c8201581f0 in ?? ()
#21 0x00002af38040a180 in ?? ()
#22 0x000000c82d16ed10 in ?? ()
#23 0x000000c82035e000 in ?? ()
#24 0x7568746967203866 in ?? ()
#25 0x616d2f6d6f632e62 in ?? ()
#26 0x692d6f672f6e7474 in ?? ()
#27 0x36353a7974746173 in ?? ()
#28 0x000000c82543a958 in ?? ()
#29 0x0000000000eca5e4 in github.com/cockroachdb/cockroach/sql.(*Executor).execStmtsInCurrentTxn ()
#30 0x0000000000000000 in ?? ()

2 Two tables:
CREATE TABLE a(
id STRING(128) PRIMARY KEY,
name STRING(32),
type STRING(32),
int_id INT UNIQUE,
int_type INT,
INDEX type_idx (type),
INDEX int_id_idx (int_id),
INDEX int_type_idx (int_type)
);
CREATE TABLE b(
id STRING(128) PRIMARY KEY,
name STRING(32),
type STRING(32),
int_id INT UNIQUE,
int_type INT,
foreignA STRING NOT NULL REFERENCES a,
INDEX(foreignA)
);

All was fast as following:
SELECT * from a where a.id='70000000';
SELECT * from a where id='70000000';
SELECT * from b where b.id='70000000';
SELECT * from b where id='70000000';

But I didn’t have the patience to wait for the result after more than 20 minutes on this case:
SELECT * from a,b where a.id='70000000' and b.id='70000000';

I expected a fast result. But it’s not just a full scan. A full scan had ended early within 1 minute according to my test.

petermattis · 2016-08-16T11:53:12Z

@zhaoyuxi Joins are functional for small tables, but not at all optimized. They likely have pathological performance problems. Cc @knz, we can speak to exactly what is going on in this case.

knz · 2016-08-16T12:01:46Z

Try this:

select * from (select * from a where a.id=70000000) as a, (select * from b where b.id=70000000) as b;

We don't yet propagate the filters in the outer select into the inner scans, so you have to do it manually to get the expected performance. There is a bunch of techniques we plan to use to optimize query performance and they are on the roadmap, however we're not yet there.

For an overview of the current state of our JOIN implementation you can peruse the following blog post:
https://www.cockroachlabs.com/blog/cockroachdbs-first-join/

zhaoyuxi · 2016-08-17T09:51:22Z

@petermattis, Thank you very much. It works now. Our service expects so much on relational queries optimization.

zhaoyuxi · 2016-08-17T09:53:03Z

@knz,@petermattis. Thank you very much.

knz · 2016-08-17T10:37:33Z

@zhaoyuxi if you have time we'd be interested to know which types of JOINs your service uses the most often. This way we could put them higher in our list of objectives.

zhaoyuxi · 2016-08-18T05:11:45Z

@knz It is my pleasure. Our service has 50 kinds of structured objects. The ojbect and its index are separately stored.No strong consistency. Read much more and write much less. First do the select based on the index table to get the object ID. Then get the object. The great challenges for us are:

Some select query performance is very poor. And latency is also high. The select statement is often as follows:
SELECT main.key /,sub1.mKey,sub2.mKey,sub4.mKey/
FROM main
INNER JOIN sub1 ON main.key = sub1.mKey and (sub1.contentType in (0,4,6,10) and sub1.netID=2)
INNER JOIN sub2 ON main.key = sub2.mKey and (sub2.startTime>? and sub2.endTime<?)
INNER JOIN sub3 ON main.key = sub3.mKey and (sub3.recommend==1)
INNER JOIN sub4 ON main.key = sub4.mKey and (sub4.status==1 or ...)
ORDER BY main.key

Query result has about 10000 records. The page only takes topN usually 10 every time. 1500 select TPS is asked on 4C36G VM.

2.Several objects may span multiple nodes, and the efficiency of distributed SQL is even worse. Not only to be flexible, but also to require low CPU consumption.

zhaoyuxi · 2016-08-18T06:33:06Z

It’s our pleasure to get your ping. Thanks very much.

Cockroach is so amazing for us. Scalable, Survivable, SQL and Consistent (Not strongly) are our needs But we worry about SQL performance. Especially distributed SQL.
Our service has 50 kinds of structured objects. The object and its index are separately stored. No strong consistency. Read much more and write much less. First do the select based on the index table to get the object ID. Then get the object. The great challenges for us are:

Some select query performance is very poor. And latency is also high. The select statement is often as follows:
SELECT main.key /,sub1.mKey,sub2.mKey,sub4.mKey/
FROM main
INNER JOIN sub1 ON main.key = sub1.mKey and (sub1.contentType in (0,4,6,10) and sub1.netID=2)
INNER JOIN sub2 ON main.key = sub2.mKey and (sub2.startTime>? and sub2.endTime<?)
INNER JOIN sub3 ON main.key = sub3.mKey and (sub3.recommend==1)
INNER JOIN sub4 ON main.key = sub4.mKey and (sub4.status==1 or ...)
ORDER BY main.key
Query result has about 10000 records. The page only takes topN usually 10 every time. 1500 select TPS is asked on 4C36G VM.
2.Several objects may span multiple nodes, and the efficiency of distributed SQL is even worse. Not only to be flexible, but also to require low CPU consumption.
Now we are just in the initial design and prototype development. The performance has a gap. Probably we will do the SQL optimization according to our service. The deep optimization will go deep into the distribution of physical storage. We need to know how the data are ranged and to know where it’s. So the exploring of the storage distribution and even to control it is also our needs. We need this deep control for future optimization.

knz · 2016-08-18T09:55:36Z

cc @RaduBerinde @andreimatei

knz · 2016-08-18T10:01:39Z

@zhaoyuxi thanks for sharing.
Now I wonder how much concurrency we can extract from this type of query. Can you say more about how many different rows of the "main" table are involved in the final 10000 records? Are the tables sub1-sub4 filters over main, or is main a filter over the other 4? (I wonder if we can run the filters on the sub1-sub4 tables concurrently.)

RaduBerinde · 2016-08-18T11:49:29Z

@knz From the sample query, each sub1-sub4 table has a filter that can be evaluated on that table.
A decent strategy would be to use merge join (especially if main is already indexed by key). With distsql the four scans with filters would run concurrently. For filters that are very permissive (sub4.status == 1 might be), we could do point lookups if we only want the top 10 results (so e.g. merge join with sub1, sub2, sub3 and then point lookups in sub4 until we get 10 hits).

@zhaoyuxi Thanks for sharing. Please note that our current JOIN implementation is meant to be just functional (only on very small data sets); it is not anywhere close to what we plan to have in 1.0. We also plan to distribute SQL computation among the nodes that store relevant data.

I would like to hear more detail on 3. It sounds like you want to know exactly how data is distributed and stored? That is not something we have planned.. Assuming you had a way of knowing how table data is sliced and where each piece is stored, what would you do with that information? What is the "deep optimization" regarding?
We provide some control using zones; one can control on what stores certain tables go to. So e.g. if some nodes are faster or some stores have faster storage, one can make some "important" tables only use those nodes/stores.

zhaoyuxi · 2016-08-18T12:38:58Z

Now the main has 6,000,000 records. The sub about 10100 times than the main.
Tables sub1-sub4 tables are the filters with 15 columns. The main table references the domain object. It often means all sub filters are satisfied. Filter one sub by one sub(All are ‘and’ operation). And the sub is composed of ‘or’ or ‘and’ operations. The filter columns are the type of Boolean, string enumeration(less than 10 values, in or equal operation), integer enumeration(less than 10 values, in or equal operation) for the most case. And some are period time(between operation) or counters(less than or more than or ordered by operation).

zhaoyuxi · 2016-08-19T02:43:28Z

@RaduBerinde Thanks for your input.
If all the filters(rows of main and sub) from the same domain object is not in a single node, the distriubuted SQL performance probably will be bad. We plan that just the host nodes(hold the domain object filters) do a local select concurrently without communication with other nodes. Then merge the filtered rows. Consistency read is not asked for our service.
The zone probably can't support all the related rows from relational tables are stored in the same node. Different cockroach cluster can also support association together. But the flexibility is lost.

knz mentioned this issue Aug 16, 2016

sql: clean up error messages in join checks #8580

Merged

knz changed the title ~~Very bad performance issue:A relational query based on PK results in a full table scan~~ sql/performance: independent filter expressions on JOINs should be pushed down to the operands Aug 16, 2016

knz added this to the Later milestone Aug 16, 2016

knz added the C-performance Perf of queries or internals. Solution not expected to change functional behavior. label Aug 16, 2016

knz mentioned this issue Dec 15, 2016

sql: tracking issue, proposed optimizations #12418

Closed

14 tasks

knz mentioned this issue Dec 31, 2016

sql: implement filter propagation. #12617

Merged

knz closed this as completed in #12617 Jan 12, 2017

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

sql/performance: independent filter expressions on JOINs should be pushed down to the operands #8566

sql/performance: independent filter expressions on JOINs should be pushed down to the operands #8566

zhaoyuxi commented Aug 16, 2016 •

edited

Loading

petermattis commented Aug 16, 2016

knz commented Aug 16, 2016

zhaoyuxi commented Aug 17, 2016

zhaoyuxi commented Aug 17, 2016

knz commented Aug 17, 2016

zhaoyuxi commented Aug 18, 2016

zhaoyuxi commented Aug 18, 2016 •

edited by knz

Loading

knz commented Aug 18, 2016

knz commented Aug 18, 2016

RaduBerinde commented Aug 18, 2016 •

edited

Loading

zhaoyuxi commented Aug 18, 2016 •

edited by knz

Loading

zhaoyuxi commented Aug 19, 2016 •

edited

Loading

sql/performance: independent filter expressions on JOINs should be pushed down to the operands #8566

sql/performance: independent filter expressions on JOINs should be pushed down to the operands #8566

Comments

zhaoyuxi commented Aug 16, 2016 • edited Loading

petermattis commented Aug 16, 2016

knz commented Aug 16, 2016

zhaoyuxi commented Aug 17, 2016

zhaoyuxi commented Aug 17, 2016

knz commented Aug 17, 2016

zhaoyuxi commented Aug 18, 2016

zhaoyuxi commented Aug 18, 2016 • edited by knz Loading

knz commented Aug 18, 2016

knz commented Aug 18, 2016

RaduBerinde commented Aug 18, 2016 • edited Loading

zhaoyuxi commented Aug 18, 2016 • edited by knz Loading

zhaoyuxi commented Aug 19, 2016 • edited Loading

zhaoyuxi commented Aug 16, 2016 •

edited

Loading

zhaoyuxi commented Aug 18, 2016 •

edited by knz

Loading

RaduBerinde commented Aug 18, 2016 •

edited

Loading

zhaoyuxi commented Aug 18, 2016 •

edited by knz

Loading

zhaoyuxi commented Aug 19, 2016 •

edited

Loading