Ruby schema definitions download performance issues #234
Comments
Thanks for the detailed writeup @daniel-crlabs. It's interesting that the queries seem to take longer when the tables are bigger, given that those tables aren't queried directly. I'm not very familiar with the inner workings of crdb, but maybe it has something to do with the information_schema not fitting in memory when the tables hold a lot of data, so the reads take longer?
Thanks for looking into it. Our engineers asked me to open the ticket in this repo, but maybe a different team should look into this? @vy-ton please let us know who to re-assign this issue to, thank you.
I think we can keep the discussion here for now. We've seen similar slowness in other queries: In the one you shared:
Just from looking at it, the slowest part of this is going to be the
I'm not seeing a significant performance difference from removing the
Of note: this table is empty. I don't think this issue is related to the number of records in the table; I think it's about database size.
Thanks for trying that out. The other slow thing I can see in that query is the
Yes, definitely. The re-fetching I'm talking about happens more when there are more objects in the database.
You called it on the
Looks like they aren't super important; they're just used for display if you want them. So I've got a solution in mind for that problem, but it won't resolve the issue completely because of the other slow query. I'll start a PR.
I know I marked #235 as fixing this, but it only half fixed it. Performance is better, but still not where I think it should be. Patching the query here makes it considerably faster, as does changing this to run once per table instead of once per matching column. However, querying
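The "once per table instead of once per matching column" change mentioned above is essentially memoization keyed by table name. Here is a minimal Ruby sketch of that idea, assuming a hypothetical `fetcher` callable that stands in for the slow catalog query; `TableMetadataCache`, the `users` table, and its column types are made up for illustration, and this is not the adapter's actual code or the code in #235.

```ruby
# Hypothetical sketch: cache catalog metadata per table so that resolving
# several columns of the same table triggers only one slow catalog query.
class TableMetadataCache
  # fetcher: any callable that takes a table name and returns a Hash of
  # column name => metadata (stands in for the slow catalog query).
  def initialize(fetcher)
    @fetcher = fetcher
    @cache = {}
  end

  # Returns metadata for one column, calling the fetcher at most once per table.
  def column_metadata(table, column)
    table_data = (@cache[table] ||= @fetcher.call(table))
    table_data[column]
  end
end

# Count how many times the (hypothetical) slow query would run.
queries = 0
fetcher = lambda do |_table|
  queries += 1
  { "id" => "INT8", "name" => "STRING" } # placeholder metadata for illustration
end

cache = TableMetadataCache.new(fetcher)
cache.column_metadata("users", "id")
cache.column_metadata("users", "name")
puts queries # prints 1: both columns were resolved from a single fetch
```

With a per-column lookup the same two calls would cost two catalog round trips; with the cache, the cost scales with the number of tables rather than the number of columns.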
This issue concerns performance problems in the CRDB driver for Ruby (ActiveRecord). According to the customer, this is not necessarily a break/fix issue or a bug; it is a performance concern.
Issue
Rails pulls data types from schema definitions in CRDB. These queries are particularly slow in CockroachDB.
The theory is that these are queries on an internal table that are poorly optimized because the internal table has no index. The latency is about 200ms (or more) per query, and the query runs once per connection.
Workaround/Solution
Rails allows you to cache the schema definitions so that you don't have to pull the schema from CRDB.
This is the command to generate the cache:
bundle exec rake db:schema:cache:dump
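For reference, a minimal sketch of the Rails side of this workaround, assuming a standard Rails app layout (`MyApp` is a placeholder name). The Rails configuration guide documents `config.active_record.use_schema_cache_dump`, which makes Rails read the schema cache dump (`db/schema_cache.yml` by default, the file written by the command above) instead of querying the database for that information, and lists it as defaulting to true. Whether every slow query in the CRDB adapter is skipped depends on the adapter consulting that cache, which is an assumption here. The dump should also be regenerated after migrations so that it matches the current schema.

```ruby
# config/application.rb (excerpt; MyApp is a placeholder application name)
module MyApp
  class Application < Rails::Application
    # Load the schema cache dump produced by `bundle exec rake db:schema:cache:dump`
    # at boot instead of fetching column information from the database.
    config.active_record.use_schema_cache_dump = true
  end
end
```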
Additional Technical Details
Under the covers, Rails pulls the schema definitions from pg_attribute and uses them for data type conversion in the Rails ORM.
CRDB driver code references:
Line 456, this code block: this part is slow.
Line 916, this code block: this part is even slower.
Reproduction
I was able to reproduce this behavior in my lab. The initial correlation seems to be that the more records the table has, the longer the query takes to run (I did not notice this when the table was small, with fewer than 100 thousand records). However, this was not always the case: sometimes I also got a longer response time when the table had fewer records (around the 400 thousand mark).
I also noticed that the response time for the same query varies widely. I ran the exact same query and got a response in 107ms; a few seconds later, the same query returned in 807ms, and then in 185ms, 65ms, 124ms, etc.
This makes it seem as if something is being cached, or as if the timing depends on some other factor that makes it slower or faster.
I saw the same behavior when I ran the 2nd query: response times were 313ms, 99ms, 66ms, 15ms, and 117ms. This also seems random, so I'm not sure what the actual cause might be.
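Because single runs vary this much, comparing a summary over several runs may be more telling than any one timing. A minimal Ruby sketch follows; `time_runs` and `latency_summary` are hypothetical helper names, and the sample numbers are the two sets of latencies reported above.

```ruby
# Runs the given block `runs` times and returns each run's wall-clock time in ms,
# measured with a monotonic clock so system clock adjustments don't skew results.
def time_runs(runs)
  Array.new(runs) do
    start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    yield
    (Process.clock_gettime(Process::CLOCK_MONOTONIC) - start) * 1000.0
  end
end

# Returns the min, median, and max of a list of latencies in ms.
def latency_summary(samples_ms)
  sorted = samples_ms.sort
  mid = sorted.length / 2
  median = sorted.length.odd? ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2.0
  { min: sorted.first, median: median, max: sorted.last }
end

# The two sets of latencies reported above.
first_set = latency_summary([107, 807, 185, 65, 124])  # min 65, median 124, max 807
second_set = latency_summary([313, 99, 66, 15, 117])   # min 15, median 99, max 313
p first_set
p second_set
```

In a Rails console, `time_runs(10) { ActiveRecord::Base.connection.execute(sql) }` could wrap one of the queries (with `sql` holding the query text), and running it against tables of different sizes would show whether the median, not just the outliers, grows with table size.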
1st query, this is the one that took the longest to complete:
2nd query, sample below shows the longest run:
I ran the queries above against a table with 1 million records and the following table definition. Sample records below: