Optimize Asset Paths #1312
This is awesome!
How will we migrate the existing data to this new model? Would someone run the ingest_asset_paths
management command manually?
models.CheckConstraint(name='asset_path_regex', check=Q(path__regex=ASSET_PATH_REGEX)),
models.CheckConstraint(
    name='asset_path_no_leading_slash', check=~Q(path__startswith='/')
),
This is done as two check constraints as it was difficult to create a working/efficient regex that covered both.
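As a rough illustration of why two separate checks can be simpler than one combined regex, the same pair of rules can be sketched in plain Python. The pattern below is a hypothetical stand-in, since the project's actual ASSET_PATH_REGEX is not shown in this thread, and is_valid_asset_path is an illustrative name rather than anything in the codebase.

```python
import re

# Hypothetical stand-in pattern; the real ASSET_PATH_REGEX is not shown here.
ASSET_PATH_REGEX = r'^[A-Za-z0-9._/-]+$'


def is_valid_asset_path(path: str) -> bool:
    """Apply the two rules independently, mirroring the two check constraints."""
    # Rule 1: the path matches the allowed-character pattern.
    matches_regex = re.search(ASSET_PATH_REGEX, path) is not None
    # Rule 2: the path does not start with a slash.
    no_leading_slash = not path.startswith('/')
    return matches_regex and no_leading_slash
```

Keeping the leading-slash rule out of the pattern means each rule can fail, and be reported, on its own.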
Yes that's correct.
Now that #1006 is merged, the migrations here will need to be recreated.
FWIW, I was one of those who "questioned" the efficiency of a "flat list of paths" representation when it was introduced in the original DANDI design. Performance improvements are always welcome, but I think the cons of such an approach should also be considered. It
A more generic alternative could be caching of views based on the last-modified timestamp of the dandiset version (I thought I filed an issue but failed to find it). We already rely on that timestamp in backups, and a check has been introduced that would alert us if there are changes without a timestamp update.
It does indeed cost some complexity to gain better correctness and performance for the file browsing functionality. However, this is a case where we actually failed to properly model this corner of the system. In that sense, the new models are required to capture the domain logic here. I'm not sure exactly what you mean by the need to call into AssetPath, but we can take steps as needed to make this part a bit easier. For example, if we identify which things have a need to "change
We are indeed denormalizing the path data in the name of maintaining performance. However, we are not particularly worried about loss of consistency, since the system doesn't modify the existing path data (assets must be copied first in order to change their paths, or else they need to be deleted and re-added, etc.). If we add an "fsck" type of script, it would be a maintenance script we could run to help us debug in case we see something going wrong, rather than something that would be exposed via the API.
The design is intended to accelerate the common case of reading path information, at a slight expense for deletion and addition operations. But the complexity is not Furthermore, we should talk about exactly what kind of thing
This approach carries a lot of risk (as you know, cache invalidation is one of the two hard things in computer science). It is hard to guess at the performance boost this approach would give us, measured against the risk of incorrect updates, etc. On top of that, we are able to solve other problems with our current approach with an actual model of file paths (efficient updates of file sizes and file counts; pagination). Jake's PR brings much more good than harm, and I think we should (after final reviews) deploy it to staging, play around with it, and gain some more confidence that it's working.
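The trade-off described above (cheap reads, slightly more work on add and delete, plus an optional "fsck"-style check) can be sketched with a small in-memory structure. This is an illustrative sketch only: PathIndex and its methods are hypothetical and do not reflect the actual AssetPath models in the PR.

```python
from collections import defaultdict
from typing import Dict, List


class PathIndex:
    """Denormalized per-folder totals, updated on every add and remove."""

    def __init__(self) -> None:
        self.assets: Dict[str, int] = {}  # asset path -> size in bytes
        # folder path -> [file count, total size]
        self.totals: Dict[str, List[int]] = defaultdict(lambda: [0, 0])

    @staticmethod
    def _folders(path: str) -> List[str]:
        # 'a/b/x.nwb' -> ['a', 'a/b']
        parts = path.split('/')[:-1]
        return ['/'.join(parts[:i]) for i in range(1, len(parts) + 1)]

    def add(self, path: str, size: int) -> None:
        if path in self.assets:
            raise ValueError(f'duplicate path: {path}')
        self.assets[path] = size
        for folder in self._folders(path):
            self.totals[folder][0] += 1
            self.totals[folder][1] += size

    def remove(self, path: str) -> None:
        size = self.assets.pop(path)
        for folder in self._folders(path):
            entry = self.totals[folder]
            entry[0] -= 1
            entry[1] -= size
            if entry[0] == 0:
                del self.totals[folder]

    def check(self) -> bool:
        """fsck-style check: rebuild totals from scratch and compare."""
        fresh = PathIndex()
        for path, size in self.assets.items():
            fresh.add(path, size)
        return dict(fresh.totals) == dict(self.totals)
```

Reading a folder's file count or size is a single lookup, while each add or remove touches one entry per ancestor folder. Because paths are never modified in place, those are the only two write paths that need to stay consistent.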
@AlmightyYakob Does this resolve #1109?
No, but it would be trivial to address with these new models. Since there's been so much churn on this PR, I'd like to avoid fixing it here and do a follow-up afterwards.
🚀 PR was released in
Closes #652
This is a performance optimization to how we handle asset paths. This implementation greatly improves read times for the asset paths endpoint, and simplifies the UI FileBrowser component.
Some notes:
- Because of the new constraints on the path field, any offending assets in production will need to be dealt with (likely manually) before this is merged/deployed.
- The ingest_asset_paths management command will need to be run, which means there will be a small period of time where the file browser is not functional.