Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Much slower query on meta data compared with CockroachDB and PostgresSQL #15224

Closed
bnuzhouwei opened this issue Dec 6, 2022 · 15 comments
Closed
Labels
area/ysql Yugabyte SQL (YSQL) kind/bug This issue is a bug priority/medium Medium priority issue status/awaiting-triage Issue awaiting triage

Comments

@bnuzhouwei
Copy link

bnuzhouwei commented Dec 6, 2022

Jira Link: DB-4403

Description

The following sql:

 select count(0) FROM pg_attribute t1 JOIN pg_class t2 ON t1.attrelid = t2.oid

Took 0.9s on YBDB, but only 0.014s on CRDB and PostgreSQL.

@bnuzhouwei bnuzhouwei added area/ysql Yugabyte SQL (YSQL) status/awaiting-triage Issue awaiting triage labels Dec 6, 2022
@yugabyte-ci yugabyte-ci added kind/bug This issue is a bug priority/medium Medium priority issue labels Dec 6, 2022
@ddorian
Copy link
Contributor

ddorian commented Dec 6, 2022

@bnuzhouwei can you do explain analyze of the query on all 3 cases?

@ddorian
Copy link
Contributor

ddorian commented Dec 6, 2022

Also, what's the hardware and how many nodes are you using?

@bnuzhouwei
Copy link
Author

bnuzhouwei commented Dec 6, 2022

Only one single node, and my computer is:

cpu: 12700K
ram: 4*32G DDR4 3200
2T SSD

the compose file for docker to create a test enviroment:

version: '2'

volumes:
  yb-master-data-1:
  yb-tserver-data-1:

services:
  yb-master:
      image: yugabytedb/yugabyte:latest
      container_name: yb-master-n1
      volumes:
      - yb-master-data-1:/mnt/master
      command: [ "/home/yugabyte/bin/yb-master",
                "--fs_data_dirs=/mnt/master",
                "--master_addresses=yb-master-n1:7100",
                "--rpc_bind_addresses=yb-master-n1:7100",
                "--replication_factor=1"]
      ports:
      - "7000:7000"
      environment:
        SERVICE_7000_NAME: yb-master

  yb-tserver:
      image: yugabytedb/yugabyte:latest
      container_name: yb-tserver-n1
      volumes:
      - yb-tserver-data-1:/mnt/tserver
      command: [ "/home/yugabyte/bin/yb-tserver",
                "--fs_data_dirs=/mnt/tserver",
                "--enable_ysql",
                "--rpc_bind_addresses=yb-tserver-n1:9100",
                "--tserver_master_addrs=yb-master-n1:7100"]
      ports:
      - "9042:9042"
      - "5433:5433"
      - "9000:9000"
      environment:
        SERVICE_5433_NAME: ysql
        SERVICE_9042_NAME: ycql
        SERVICE_6379_NAME: yedis
        SERVICE_9000_NAME: yb-tserver
      depends_on:
      - yb-master

I am testing yugabytes, and i want to suit my app to YGDB, my app need to read the meta data of of all table and columns to dynamically create SQL.

So i find the poor performance of YGDB to query join meta tables, while CRDB and PostgreSQL are faster.

I think may be because of lacking indexes on meta tables of YGDB?

@ddorian
Copy link
Contributor

ddorian commented Dec 6, 2022

I think may be because of lacking indexes on meta tables of YGDB?

You have to paste the explain analyze of the queries.

The reason is probably because the metadata is located in yb-master. yb-tserver has to do RPCs to it and it's not very efficient.
We're working on making this more efficient.

How often are you running metadata queries?

@bnuzhouwei
Copy link
Author

I don't know how to paste the explain analyze of the queries.

The query is fast when query each table, but slow in join tables.

-- fast:
select * from pg_attribute
select * from pg_class 
-- very slow:
select count(0) FROM pg_attribute t1 JOIN pg_class t2 ON t1.attrelid = t2.oid

@ddorian
Copy link
Contributor

ddorian commented Dec 6, 2022

Run the query below and paste the output:

explain analyze select count(0) FROM pg_attribute t1 JOIN pg_class t2 ON t1.attrelid = t2.oidl

How often are you running metadata queries?

Please answer

@FranckPachot
Copy link
Contributor

FranckPachot commented Dec 6, 2022

Hi,
I see that you are on :latest which is the preview version.
You set yb_bnl_batch_size to 100 which will make a huge difference:

yugabyte=#  select count(0) FROM pg_attribute t1 JOIN pg_class t2 ON t1.attrelid = t2.oid;
 count
-------
  2547
(1 row)

Time: 24032.212 ms (00:24.032)

yugabyte=# set yb_bnl_batch_size=100;
SET
Time: 14.998 ms

yugabyte=#  select count(0) FROM pg_attribute t1 JOIN pg_class t2 ON t1.attrelid = t2.oid;
 count
-------
  2547
(1 row)

Time: 52.533 ms
yugabyte=#

@bnuzhouwei
Copy link
Author

@FranckPachot Yes, much fater after set yb_bnl_batch_size, but it only for a query.
How to set this variable as global setting?

@FranckPachot
Copy link
Contributor

You can add --ysql_pg_conf_csv=yb_bnl_batch_size=1000 when starting yb-tserver

@bnuzhouwei
Copy link
Author

bnuzhouwei commented Dec 7, 2022

I use PostgreSQL 11.2-YB-2.14.5.0-b0 on x86_64-pc-linux-gnu, compiled by clang version 12.0.1 (https://github.com/yugabyte/llvm-project.git bdb147e675d8c87cee72cc1f87c4b82855977d94), 64-bit, the problem still have.

And no yb_bnl_batch_size variable can be set.

How often are you running metadata queries?

Almost every query i need the schema tables, because my engine use FillSchema to got a datatable of schema, and then a AdminUI is auto created from the schema table.

@bnuzhouwei
Copy link
Author

explain analyze select count(0) FROM pg_attribute t1 JOIN pg_class t2 ON t1.attrelid = t2.oid
Aggregate  (cost=216.39..216.40 rows=1 width=8) (actual time=1753.865..1753.865 rows=1 loops=1)
  ->  Nested Loop  (cost=0.00..213.89 rows=1000 width=0) (actual time=3.446..1752.203 rows=4423 loops=1)
        ->  Seq Scan on pg_attribute t1  (cost=0.00..100.00 rows=1000 width=4) (actual time=2.863..6.896 rows=4423 loops=1)
        ->  Index Scan using pg_class_oid_index on pg_class t2  (cost=0.00..0.11 rows=1 width=4) (actual time=0.387..0.387 rows=1 loops=4423)
              Index Cond: (oid = t1.attrelid)
Planning Time: 0.115 ms
Execution Time: 1753.921 ms
Peak Memory Usage: 31 kB

@FranckPachot
Copy link
Contributor

Ok, for versions that do not have Batched Nested Loop yes, I have an ugly one:

set random_page_cost=1e42

but better do that only for queries on the dictionary. Ideally as a hint /*+ Set(random_page_cost 1e42) */

@bnuzhouwei
Copy link
Author

bnuzhouwei commented Dec 13, 2022

Not a good user experience, how to set the default configs.

Use Navicat double click to open a table, also very slow...

The metadata do cause many performance issues..

@FranckPachot
Copy link
Contributor

The defaults can be set at cluster, database, user, connection, session, transaction, que level.

I don't know FillSchema but reading the schema tables for each query is not what an application is supposed to do. This will never be scalable

@m-iancu
Copy link
Contributor

m-iancu commented Jan 17, 2023

Closing this as the issue is identified -- the fundamental issue (changing the defaults) should be tracked in: #14070.

@m-iancu m-iancu closed this as completed Jan 17, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/ysql Yugabyte SQL (YSQL) kind/bug This issue is a bug priority/medium Medium priority issue status/awaiting-triage Issue awaiting triage
Projects
None yet
Development

No branches or pull requests

5 participants