-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DataStore Query Performance Very Slow #8405
Comments
I closed my original issue as I could no longer replicate the problem anymore Do u happen to be using the Storage module in your app? |
Hi @nubpro - we are using the Storage module in the app for upload of photographs. |
New Video - here is exactly the same application - with a few hundred records loaded versus over 7,000 in the previous video. Seems loading more than a trivial amount of data is grinding it to a halt. What you see below is my expectation for what we should be getting with a larger data set - even 100K records. MOV.File.viewer._.Microsoft.Teams.mp4 |
Hi @manueliglesias / @iartemiev Is there anything I can provide you to move this along? Its on a critical time line for us to get this resolved. Thanks Stephen |
@sacrampton is the performance comparable between Simulator and physical devices on both iOS and Android? Or is one worse than the other? Does query performance degrade gradually as the number of records increases? Or does it tend to be acceptable up to a certain point and then rapidly drops? Since you said only 5 of your models have data in them, I'll attempt to reproduce this with 5 models and 10k+ records in each. Will let you know what I find. |
Also, which version of aws-amplify are you currently using? |
Hi @iartemiev I went through the list of models and data counts in more detail. The difference between the data in each model between the first and second videos is shown here. Thanks... Data By Model </style>
|
Using aws-amplify version = 3.3.27 |
Hi @iartemiev - I had another thought on this. On the Asset object I have potentially 400+ attributes, and potentially similar on the AssetVisit object. In DynamoDB there would rarely be more than 30 attributes on any individual record so the size is quite small. And in GraphQL we just download the attributes we need for a specific app so we are never trying to download 400+ attributes. But I'm starting to think that DataStore might is creating 400+ attributes on every single record on the mobile. In my case they would be hundreds of empty attributes taking up space. Is that what might be going on? Can you test by creating a couple of models with hundreds of empty attributes and see if that is the issue? |
Thanks for the additional info. Are you returning all of the records for those models when you're querying Asset, AssetVisit, etc.? Or are you using predicates to return a subset? |
Hi @iartemiev - sorry I'm probably not understanding the question. Our app is multi-tenanted. So we have a base query to return only the data that belongs to that client. But in our existing GraphQL app we would only return the specific attributes from say the Asset that the App needs (say 30 attributes and not 400+ attributes). Also, I notice in the GraphQL when it is a BELONGS_TO association that the GraphQL query is returning all the associated data and I guess it is storing that against the record. For example, and Asset in most cases has a ParentAsset - so it seems that not only is the Asset getting all 400+ blank attributes, it is storing the 400+ blank attributes of the ParentAsset. The Asset has a lot of HAS_MANY connections, so each of those objects is also getting the 400+ blank attributes of the connected Asset in its GraphQL query. |
I meant when you're calling In response to your question about fields with empty values: all of the model fields in your schema will be part of the selection sets for the GraphQL operations that populate the local store. Any empty values will come back as For example, if the records have empty values in fields type Post @model {
id: ID!
title: String!
field1: String
field2: String
field3: String
field4: String
field5: String
} They would be stored as: {
"id": "id1",
"title": "someTitle",
"field1": null,
"field2": null,
"field3": null,
"field4": null,
"field5": null
} Do you know which fields your app will need ahead of time? Or is that determined programmatically? In the former case, you could modify your |
Thanks @iartemiev. I have a further question related objects - whether the data from the related object is embedded in the original object inside DataStore. The queries tend to suggest that might be the case. So for example, I might have an optional connection to a parent Post
I am wondering if the Parent record is embedded in the current record like something like this.
Also, editing the schema.js is fine - I can do that. Do I need to edit any of the other files (ie. index.d.ts, etc.) And I guess I can have a different schema.js for each application I have so each application can get to just the attributes it needs. |
Hi @iartemiev - to answer your question "...I meant when you're calling DataStore.query, are you passing it a predicate, e.g., DataStore.query(Asset, c => c.someField('eq', 'someValue') or perhaps using pagination or are you retrieving all of the records from the local store by calling DataStore.query(Asset) (no predicates or pagination)?..." We are doing the base query with a predicate as you describe and we are using pagination (500 records per cycle) because 1000 causes a resource exceeded error. |
Hi @iartemiev - if we expand your type Post model above and have 100 x "field01": null fields and 2 fields as you show. In DynamoDB you would just have the id/title fields and this would be 32b in size. But if I look at the size of adding 100 null fields it swells to 1,532b. On a mobile where we are trying to save memory then it seems counterintuitive to be storing null fields. Is there any opportunity to store like DynamoDB? I guess not since I think you are using SQLite.. |
Hi @iartemiev - Some more results of testing with the original data set of 21,357 records detailed above. Querying the asset list before was taking 12 seconds. I have created a new schema.js that reduced the number of attributes on the records down significantly for the 3 models below.
The result was that by reducing the number of attributes on the records as shown above halves the response time (6 seconds to retrieve the assets list versus 12 seconds with the the previous one. To confirm, 100% of the attributes I removed were null. The null attributes are used by other (backend) parts of the solution - but not required in mobile app. I think this shows that syncing null attributes to a record can dramatically impact performance. Given that there is a direct connection between storing null attributes and performance it follows that if we can get rid of all null attributes (like DynamoDB does) then we are going to be maximizing performance. The only way I can think of to get rid of all null values at all times is if you were able to use some sort of a NoSQL repository on the device that only stores non-null attributes, thus optimizing space and performance? The beauty of GraphQL and DynamoDB is that I can create a comprehensive schema, but only store minimal data - so I have huge flexibility with economy and performance. But if we are storing a null value for every single possible attribute that takes away the flexibility. |
Hi @iartemiev - another test result to share with you. Used another larger dataset for testing. This data set had these quantities in these 3 models. All the other models are approximately the same quantities - they don't vary by project as they are essentially reference data objects.
I tried with the Original schema.js and the New schema.js defined above.
Note that when we query a model the response is irrelevant to the amount of data we are retrieving. From the 30K assets I could do a query that retrieves 1 record and it will take 1 minute and 4 seconds. I could conduct another query that returns 113 records and it take 1 minute and 4 seconds. |
Hi @iartemiev - one final update from my testing. On the desktop you ship a storage adapter for IndexedDB which is a NoSQL database. So we did a test on the desktop with the large 30K+ Asset data set described above using the full schema. It took about 11 minutes for the full dataset to download - but after that a query of the Asset table took 2 seconds. This compares with 64 seconds on Mobile App (React Native). |
I'll try to respond to the questions/points in order
For Has One and Belongs To relationships we resolve the connection at run time when you call
By "base query" are you referring to the initial base sync query between AppSync and the local store? Or calling
That's not possible at the moment, but it's certainly something we will explore in the future.
Could you please share the query you're using here? Specifically, what predicate are you passing? |
Hi @iartemiev - thanks for answering my long chain of rambling "input" / observations.
The base query would be to download all assets in a Plant
Then in DataStore with the 30K assets we do a query to find Assets in a package that have not yet been audited. This query takes 1 minute 4 seconds to return the 2 records. If we remove the limit it returns 113 records and takes the same time.
|
Thank you, that is helpful. We'll be doing additional library benchmarking today in order to pinpoint the issue and will provide an update with our findings. |
Quick update: I have been able to repro performance issues similar to those described above when using a model with 10 fields containing values + 100 fields with |
Hi @iartemiev - that is fantastic news that you were able to reproduce. Hopefully you can also reproduce it working perfectly with IndexedDB/Browser too. Await your feedback. |
We've pinpointed the bottleneck to the We'll be brainstorming what improvements we could make in DataStore in order to mitigate this as much as possible. For your use case specifically, even with the reduced number of fields, are a considerable number of values in your records still As an aside, when you're seeing this poor performance do you have the React Native debugger enabled on your app? We're seeing significantly better performance (still suboptimal, but 2-3x faster) with the debugger disabled, so just want to point that out as well. |
@iartemiev - looks like good progress. Ultimately, I'd like to retain access to all fields (like we do with IndexedDB) - but for now we can run with the reduced schema. And even in the reduced schema I would estimate that 80% of those fields are null in the dataset. I don't believe we have debugger enabled - but I will check. |
@iartemiev - checked performance with and without debugger enabled. The 64 second response was with React Native Debugger enabled. With Debugger switched off the response is 22 seconds. As you said - its about a 3X improvement. Worth remembering that 22 seconds was with the stripped down schema and is still 11X slower than the Browser/IndexedDB running with the full schema. Good to know we'll see that sort of improvement when we go to production, but 22 seconds with cut-down schema still isn't a solution. Thanks for working this. |
@jeremy-white, thank you for calling this out. We'll need to make sure to add |
Thanks @iartemiev - thanks for the feedback - can confirm your suggestion does indeed return data before it is synced to the server. Just need to work out how we get a list back now. |
Just a couple more things I've hit
Hit this for HAS_MANY connections defined as
as well as
I've been able to get past this by changing I'm also seeing
For fields I define as AWSJSON and have values that are arbitrary JSON . . . I haven't been able to workaround this other than just removing those fields from schema.js so if you have a better workaround I would love to hear it. I really appreciate your work on this @iartemiev as I think it will be a HUGE improvement! |
Hi @iartemiev - does anything in our predicate above jump out at you as a possible issue that is not getting un-synced data? We know we can retrieve a single un-synced record whose id we know. But the search for a list of records that included synced and un-synced records is something we are still struggling with. Any hints you might have on what we might be doing wrong in the above predicate would be much appreciated. We are going live with our App this weekend - we wouldn't have got here without your amazing support in this effort. Much appreciated. |
@sacrampton, I don't think there's anything inherently wrong with it. It's hard to know for sure without being familiar with your app code and knowing what values are being passed and what data is expected to be present in the local store. DataStore doesn't use a cache. Every time you execute If I were debugging this, I would check that each of the operands has the expected value by logging them out to the console. You can also try removing fields from your predicates one by one to determine which one is causing all the results to be filtered out. Try something like the following in the same place as where you're normally attempting to query the local records (and before the sync): const newAsset = await DataStore.save(new Asset({...assetData}));
const data = await DataStore.query(Asset, newAsset.id);
console.log(data.assetParentAssetId === mobileActiveLocation?.id, data.assetParentAssetId, mobileActiveLocation?.id);
console.log(data.assetPackageId === userPackageId, data.assetPackageId, userPackageId);
console.log(data.assetPlantId === userPlantId, data.assetPlantId, userPlantId);
console.log(data.isDeleted !== true, data.isDeleted); If you have a Lastly, if you have debug logs on, you should see the SQL SELECT statement that is generated by that predicate. Does it look correct? |
Hi @iartemiev - the count suggestion you had above works for smaller data sets in iOS (extremely slow with larger data sets - > 1 minute). But in Android we get the following error.
|
@sacrampton SQLite represents boolean values as You can either specify re: performance, I'm surprised it's running that slowly. I'll try to reproduce on my end to confirm. |
Hi @iartemiev - replacing "false" with 0 is not working for us... The error seems to be that the column we are calling is not there. But we know it is there as it works in other queries from DataStore - just not count in SQLite. |
What's the error you get when you query By the way, are you calling |
Hi @iartemiev - here are a few points.... On iOS if I do the following query it gives me the correct answer and does so very quickly. SELECT COUNT(*) as count FROM Asset WHERE auditCompleted = true If we execute exactly the same query on Android it gives this error
We were using the other SELECT statement because we can't seem to get the COUNT to work with an AND statement - the following query in iOS returns an incorrect value of 1. In Android it returns the same error. SELECT COUNT(*) as count FROM Asset WHERE auditCompleted = true AND packageName = JONESBORO |
Try parameterizing the query and enclose the WHERE clause expression in parentheses. E.g., await db.executeSql('SELECT COUNT(*) as count FROM Asset WHERE (auditCompleted = ? AND packageName = ?)', [true, 'JONESBORO']); |
Hi @iartemiev - thanks for that - did the trick now. So resolved all issues around the count. Thanks so much as always. Still noticing the delta-sync on SQLite is slower than before. Not sure if you were seeing that. |
My pleasure! Very glad to hear that things are finally coming together!
Hmm, I remember you mentioning that in your email, but I haven't observed it yet myself. I'll do some more benchmarking around this specifically when I get a chance.
|
@iartemiev, @sacrampton I am facing same issue on Android, no matter what i do, data is always being stored in the AsyncStorage. Steps, I followed are as below..
Am I missing any specific setting? |
Hi @iartemiev - want to push further into the slowness we are seeing for DeltaSync DataStore creates a separate table in DynamoDB to manage the DeltaSync called "AmplifyDataStore-ENV". There are no indexes in this table - just the partition key and sort key - where the partition key is table/date and sort key is time/id/version. Our database is multi-tenanted - and we deal with assets in industrial plants. So I could have hundreds of other users in other plants making massive amounts of changes. But I might not have any users working my plant. The DeltaSync as I see it is going to have to sort through everyone else's changes just to work out there are zero changes that are going to be applicable to me. When we initially hydrate the cache we do a base query which uses GSI's to get a quick response. At the moment I'm seeing DeltaSync take about the same amount of time as the full sync (20 minutes). Today I know I was doing a lot of bulk updating of data in a few different plants through our web back end. Not an unusually large workload. But I am concerned from the slowness I'm seeing in our database and what I see in the DeltaSync table for DynamoDB has me worried that this is not scalable for a multi-tenanted environment. You've been really good at coming up with solutions to get us moving - hopefully someone else has already come up with a solution to make the DeltaSync run in seconds rather than 20+ minutes. |
@mjaydeep01 try deleting your node_modules and lock file, then adding the following to your package.json and then running "resolutions": {
"@aws-amplify/datastore": "rn-sqlite",
"@aws-amplify/core": "rn-sqlite",
"@aws-amplify/*": "rn-sqlite"
}, |
@sacrampton, I think this behavior likely warrants a separate GitHub issue, unless this is somehow related to the on-device database on React Native specifically (AsyncStorage or SQLite). To better understand what's going on, I have some follow up questions:
|
Hi @iartemiev - have created a separate GitHub issue as requested - #8699 |
@iartemiev, thanks for the new suggestion... warning Resolution field "rn-sqlite" has an invalid version entry and may be ignored yarn --version --> 1.22.5 Can you please suggest how to resolve this issues? |
@sacrampton, @jeremy-white, @mjaydeep01 - this feature has been officially released as part of (Note: this new storage adapter is only compatible with React Native CLI-generated apps. Expo support will be added in the future) Install: $ npm install aws-amplify @aws-amplify/datastore-storage-adapter react-native-sqlite-storage aws-amplify-react-native amazon-cognito-identity-js @react-native-community/netinfo @react-native-async-storage/async-storage
$ npx pod-install Enable in your app: import { DataStore } from 'aws-amplify';
import { SQLiteAdapter } from '@aws-amplify/datastore-storage-adapter';
DataStore.configure({
storageAdapter: SQLiteAdapter
}); I'm closing this issue, as the new storage adapter addresses the performance concerns expressed in the issue. Please create a new issue if you require assistance with this feature. |
Hi @iartemiev - thank you for going above and beyond to make this happen |
SqliteAdapter prints empty object.Please confirm sync is not running after doing this |
no such table: Setting This are the errors logs.Please advise .I tried below code import { DataStore } from 'aws-amplify'; DataStore.configure({ |
Hey have you found a solution for this issue? |
Before opening, please confirm:
JavaScript Framework
React Native
Amplify APIs
DataStore
Amplify Categories
api
Environment information
Describe the bug
We are porting our AppSync/GraphQL based app to DataStore and what we are finding is that queries on the cached database are very slow. We are also finding that pagination of the queries makes no difference to the performance of the queries.
There are many tables with the largest table having about 8,000 records and most other tables having less than 1000 records. There are only 5 tables/models that have data.
Whether we are connected to the network or working offline it makes no difference to the performance.
In issue #6994 it talks of a first time slowing in performance, but we see consistently slow performance.
I have recorded a video of the performance we are seeing - both online and offline - to give you an idea of the problem we are facing.
Google.Chrome.-.RPReplay_Final1622941095.MP4.-.Google.Chrome.mp4
Expected behavior
The database is local so we should be seeing almost instant responses to our queries. The performance of our existing AppSync GraphQL cache gives us almost instant responses so we expect the same or better performance than the existing AppSync GraphQL database.
Reproduction steps
Not sure how to replicate this as the database I have has 48 models and 200 GSI's and takes about 6 hours of incremental (manual) pushes to deploy into a new environment.
Code Snippet
// Put your code below this line.
Log output
aws-exports.js
No response
Manual configuration
No response
Additional configuration
No response
Mobile Device
iPhone 7
Mobile Operating System
14.5
Mobile Browser
Not relevant
Mobile Browser Version
No response
Additional information and screenshots
No response
The text was updated successfully, but these errors were encountered: