Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ruby schema definitions download performance issues #234

Closed
daniel-crlabs opened this issue Oct 27, 2021 · 8 comments
Closed

Ruby schema definitions download performance issues #234

daniel-crlabs opened this issue Oct 27, 2021 · 8 comments

Comments

@daniel-crlabs
Copy link

This issue is regarding performance problems for the CRDB driver for Ruby, Activerecord. According to the customer, this is not a break/fix issue necessarily or a bug, it is a performance type of concern.

Issue

Rails pulls data types from schema definitions in CRDB. These queries are particularly slow in CockroachDB.
The theory is that these are queries on internal table that are poorly optimized because there is no index on the internal table. The latency is about 200ms (or more) per query, which runs once per connection.

Workaround/Solution

Rails allows you to cache the schema definitions, that way you don't have to pull the schema from CRDB.
This is the command to generate the cache: bundle exec rake db:schema:cache:dump

Additional Technical Details

Under the covers, Rails pulls the schema definitions from pg_attribute and uses that to do the data type conversion in the ORM in rails.

CRDB driver code references:
line 456, this code block this part is slow
line 916, this code block this part is even slower

Reproduction

I was able to replicate this behavior in my lab. It seems the initial correlation here is, the more records the table has, the longer it'll take for the query to run ( I did not notice this when the table was small/had less records -- less than 100 thousand). However, I also noticed this was not always the case, sometimes I also got a longer response time when the table had less records (around the 400 thousand threshold).
I also noticed the response time for the queries is sometimes not even always close to the same. I was able to run the same exact query and get a response within 107ms, and a few seconds later, the same query returned within 807ms and then again 185ms, 65ms , 124ms, etc ....
This makes it seem like it is being cached somehow, or dependent on some other factor, that causes it to be slower or faster.
The same observations above were found when I ran the 2nd query, the response times varied from 313ms, 99ms, 66ms, 15ms, 117ms. This also seems random, so I'm not sure what the actual cause might be.

1st query, this is the one that took the longest to complete:

root@lab-kub01:30007/test> SELECT a.attname, format_type(a.atttypid, a.atttypmod),
pg_get_expr(d.adbin, d.adrelid), a.attnotnull, a.atttypid, a.atttypmod,
c.collname, col_description(a.attrelid, a.attnum) AS comment
FROM pg_attribute a
LEFT JOIN pg_attrdef d ON a.attrelid = d.adrelid AND a.attnum = d.adnum
LEFT JOIN pg_type t ON a.atttypid = t.oid
LEFT JOIN pg_collation c ON a.attcollation = c.oid AND a.attcollation <> t.typcollation
WHERE a.attrelid = 'randomrecords'::regclass
AND a.attnum > 0 AND NOT a.attisdropped
ORDER BY a.attnum;
   attname  |         format_type         |     pg_get_expr     | attnotnull | atttypid | atttypmod | collname | comment
------------+-----------------------------+---------------------+------------+----------+-----------+----------+----------
  autoidcol | bigint                      | unique_rowid()      |    true    |       20 |        -1 | NULL     | NULL
  varcol1   | character varying           | NULL                |   false    |     1043 |        -1 | NULL     | NULL
  varcol2   | character varying           | NULL                |   false    |     1043 |        -1 | NULL     | NULL
  intcol1   | bigint                      | NULL                |   false    |       20 |        -1 | NULL     | NULL
  intcol2   | bigint                      | NULL                |   false    |       20 |        -1 | NULL     | NULL
  varcol3   | character varying           | NULL                |   false    |     1043 |        -1 | NULL     | NULL
  intcol3   | bigint                      | NULL                |   false    |       20 |        -1 | NULL     | NULL
  intcol4   | bigint                      | NULL                |   false    |       20 |        -1 | NULL     | NULL
  intcol5   | numeric                     | NULL                |   false    |     1700 |        -1 | NULL     | NULL
  varcol4   | character varying           | NULL                |   false    |     1043 |        -1 | NULL     | NULL
  intcol6   | bigint                      | NULL                |   false    |       20 |        -1 | NULL     | NULL
  intcol7   | bigint                      | NULL                |   false    |       20 |        -1 | NULL     | NULL
  varcol5   | character varying           | NULL                |   false    |     1043 |        -1 | NULL     | NULL
  datecol   | timestamp without time zone | current_timestamp() |   false    |     1114 |        -1 | NULL     | NULL
(14 rows)

Time: 892ms total (execution 802ms / network 89ms)

2nd query, sample below shows the longest run:

root@lab-kub01:30007/test> SELECT c.crdb_sql_type FROM information_schema.columns c WHERE c.table_name = 'randomrecords' AND c.column_name = 'autoidcol';
  crdb_sql_type
-----------------
  INT8
(1 row)

Time: 313ms total (execution 300ms / network 13ms)

I ran the queries above against a table with 1 million records and the following table definition. Sample records below:

root@lab-kub01:30007/test> select count(*) from randomrecords;
   count
-----------
  1000000
(1 row)

Time: 5.119s total (execution 5.118s / network 0.001s)
root@lab-kub01:30007/test> select * from randomrecords limit 5;
       autoidcol       |     varcol1     |  varcol2   |  intcol1   |  intcol2   |  varcol3   | intcol3 | intcol4 |      intcol5       |  varcol4   | intcol6 | intcol7 |  varcol5   |          datecol
-----------------------+-----------------+------------+------------+------------+------------+---------+---------+--------------------+------------+---------+---------+------------+----------------------------
  -9223350509158126772 | peyrrrueeycvieb | clcjbbhbbd | 1325309531 | 1497057430 | wrrotkpkjp |    2112 |    3117 | 4.0501159729374425 | yepbyuygrc |       0 |    1299 | rzzuxdltgb | 2021-10-27 15:09:26.49035
  -9223350509158126771 | wnnxzlutxuwfjxk | xxkvuvcavq | 1448522614 | 1625801598 | yyvhugqvpo |    1200 |     757 | 2.8486701158814007 | gvhinkpkwc |       1 |    1070 | ywcedshllm | 2021-10-27 15:09:26.49035
  -9223350509158126770 | xxxyxnydjqjnmpa | pgpkrflcdl | 1231282346 | 1631862842 | tdfnyjmtiu |    1352 |    1976 | 2.7050390225120347 | icdzvmmarw |       0 |    1524 | ayzuchtexp | 2021-10-27 15:09:26.49035
  -9223350509158126769 | cmzkcdaaquefrtj | ujvxdydeti | 1471896163 | 1289143705 | hhblvdidop |    1277 |    4667 |  4.303774040368031 | khuwibxvmi |       1 |    4822 | jxqrvwveld | 2021-10-27 15:09:26.49035
  -9223350509158126768 | lvprhpkgljhsnxs | ghqyjbskfe | 1629680887 | 1567883413 | flcdrkgxxw |     320 |    3889 |  9.698259924777089 | hjdhjzwxya |       0 |    1132 | pdujrvhkye | 2021-10-27 15:09:26.49035
(5 rows)

Time: 4ms total (execution 3ms / network 1ms)
root@lab-kub01:30007/test> show create table randomrecords;
   table_name   |                                                                   create_statement
----------------+--------------------------------------------------------------------------------------------------------------------------------------------------------
  randomrecords | CREATE TABLE public.randomrecords (
                |     autoidcol INT8 NOT NULL DEFAULT unique_rowid(),
                |     varcol1 VARCHAR NULL,
                |     varcol2 VARCHAR NULL,
                |     intcol1 INT8 NULL,
                |     intcol2 INT8 NULL,
                |     varcol3 VARCHAR NULL,
                |     intcol3 INT8 NULL,
                |     intcol4 INT8 NULL,
                |     intcol5 DECIMAL NULL,
                |     varcol4 VARCHAR NULL,
                |     intcol6 INT8 NULL,
                |     intcol7 INT8 NULL,
                |     varcol5 VARCHAR NULL,
                |     datecol TIMESTAMP NULL DEFAULT current_timestamp():::TIMESTAMP,
                |     CONSTRAINT "primary" PRIMARY KEY (autoidcol ASC),
                |     FAMILY "primary" (autoidcol, varcol1, varcol2, intcol1, intcol2, varcol3, intcol3, intcol4, intcol5, varcol4, intcol6, intcol7, varcol5, datecol)
                | )
(1 row)

Time: 63ms total (execution 62ms / network 1ms)

root@lab-kub01:30007/test>
@keithdoggett
Copy link
Contributor

Thanks for the detailed writeup @daniel-crlabs. It's interesting that the queries seem to take longer when the tables are bigger given that those tables aren't queried directly.

I'm not very familiar with the inner workings of crdb, but maybe it has something to do with the information_schema not fitting being in memory if the tables have a lot of data so the reads take longer?

@daniel-crlabs
Copy link
Author

Thanks for looking into it. I was asked by our engineers to open the ticket in this repo, but maybe a different team should look into this? @vy-ton please let us know who to re-assign this issue, thank you.

@rafiss
Copy link
Contributor

rafiss commented Oct 29, 2021

I think we can keep the discussion here for now.

We've seen similar slowness in other queries:

In the one you shared:

SELECT a.attname, format_type(a.atttypid, a.atttypmod),
pg_get_expr(d.adbin, d.adrelid), a.attnotnull, a.atttypid, a.atttypmod,
c.collname, col_description(a.attrelid, a.attnum) AS comment
FROM pg_attribute a
LEFT JOIN pg_attrdef d ON a.attrelid = d.adrelid AND a.attnum = d.adnum
LEFT JOIN pg_type t ON a.atttypid = t.oid
LEFT JOIN pg_collation c ON a.attcollation = c.oid AND a.attcollation <> t.typcollation
WHERE a.attrelid = 'randomrecords'::regclass
AND a.attnum > 0 AND NOT a.attisdropped

just from looking at it, the slowest part of this is going to be the ::regclass cast. CRDB implements those by kicking off a separate internal query. we should try to see if we can avoid the cast by just using one of the columns we have already. the change would be similar to this: #226. the idea is to filter by relname and nspname and then join using relid/oid.

@carlallen
Copy link
Contributor

I'm not seeing significant performance removing the ::regclass cast:

SELECT a.attname, format_type(a.atttypid, a.atttypmod),
pg_get_expr(d.adbin, d.adrelid), a.attnotnull, a.atttypid, a.atttypmod,
c.collname, col_description(a.attrelid, a.attnum) AS comment
FROM pg_attribute a
LEFT JOIN pg_attrdef d ON a.attrelid = d.adrelid AND a.attnum = d.adnum
LEFT JOIN pg_type t ON a.atttypid = t.oid
LEFT JOIN pg_collation c ON a.attcollation = c.oid AND a.attcollation <> t.typcollation
WHERE a.attrelid = 'orders'::regclass
AND a.attnum > 0 AND NOT a.attisdropped;
Time: 128.938 ms

SELECT a.attname, format_type(a.atttypid, a.atttypmod),
pg_get_expr(d.adbin, d.adrelid), a.attnotnull, a.atttypid, a.atttypmod,
c.collname, col_description(a.attrelid, a.attnum) AS comment
FROM pg_attribute a
LEFT JOIN pg_attrdef d ON a.attrelid = d.adrelid AND a.attnum = d.adnum
LEFT JOIN pg_type t ON a.atttypid = t.oid
LEFT JOIN pg_collation c ON a.attcollation = c.oid AND a.attcollation <> t.typcollation
WHERE a.attrelid = 371
AND a.attnum > 0 AND NOT a.attisdropped;
Time: 136.841 ms

Of note: This table is empty, I don't think this issue is related to the number of records in the table, I think it's about database size.

@rafiss
Copy link
Contributor

rafiss commented Nov 4, 2021

Thanks for trying that out. The other slow thing I can see in that query is the col_description() function call. We have a naive implementation for that that will cause a lot of data to be re-fetched for each result in the scan.

I think it's about database size.

Yes, definitely. The re-fetching I'm talking about happens more when there are mote objects in the database.

@carlallen
Copy link
Contributor

carlallen commented Nov 4, 2021

You called it on the col_description(), let me dig through rails and see where that's actully used, it might not be important.

SELECT a.attname, format_type(a.atttypid, a.atttypmod),
pg_get_expr(d.adbin, d.adrelid), a.attnotnull, a.atttypid, a.atttypmod,
c.collname
FROM pg_attribute a
LEFT JOIN pg_attrdef d ON a.attrelid = d.adrelid AND a.attnum = d.adnum
LEFT JOIN pg_type t ON a.atttypid = t.oid
LEFT JOIN pg_collation c ON a.attcollation = c.oid AND a.attcollation <> t.typcollation
WHERE a.attrelid = 'orders'::regclass
AND a.attnum > 0 AND NOT a.attisdropped;
Time: 9.052 ms

@carlallen
Copy link
Contributor

Looks like they aren't super important... just used to display if you want it. So I've got a solution in mind to resolve that problem, but it doesn't resolve completely because of the other slow query.

I'll start a PR.

@carlallen
Copy link
Contributor

I know I marked #235 as fixes this, but it only half fixed it. Performance is better, but still not where I think it should be.

Patching the query here makes it considerably faster as well as changing this to run once per table instead of once per matching column. However querying information_schema.columns is still too slow.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants