Optimize reference indexes #200

Open
rompetroll opened this issue Feb 6, 2023 · 0 comments
Labels
enhancement New feature or request

Comments

@rompetroll
Contributor

Currently, whenever a new change is stored in a dataset, we add all incoming and outgoing references belonging to that change-entity to incoming and outgoing reference indexes.

We attach a timestamp to each reference, so that point-in-time queries can follow the references that were active at that time.
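To illustrate how a point-in-time query can use those timestamps, here is a minimal sketch. It assumes a simplified, hypothetical `RefEntry` with just a timestamp and a deleted flag, and a per-reference history sorted by ascending timestamp. It is not datahub's actual index layout; `ActiveAt` is an invented helper name.

```go
package main

import (
	"fmt"
	"sort"
)

// RefEntry is a hypothetical, simplified index item for one reference.
type RefEntry struct {
	Timestamp int64 // when this state was recorded
	Deleted   bool  // whether the reference was deleted at that time
}

// ActiveAt reports whether the reference was active at time `at`.
// history must hold the entries of a single reference, sorted by
// ascending timestamp.
func ActiveAt(history []RefEntry, at int64) bool {
	// Find the first entry recorded strictly after `at`.
	i := sort.Search(len(history), func(i int) bool {
		return history[i].Timestamp > at
	})
	if i == 0 {
		// Nothing was recorded at or before `at`.
		return false
	}
	// The entry just before it is the state that was valid at `at`.
	return !history[i-1].Deleted
}

func main() {
	history := []RefEntry{
		{Timestamp: 10},
		{Timestamp: 20, Deleted: true},
		{Timestamp: 30},
	}
	for _, at := range []int64{5, 15, 25, 30} {
		fmt.Println(at, ActiveAt(history, at))
	}
}
```

Only the latest entry at or before the query time matters here, which is why entries that repeat the previous state add nothing to the result.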

However, the references often do not change when an entity is updated. In these cases we don't need to store a new index item every time; we only need to track changes to the reference state (in practice: when the deleted state of a reference changes).

So, to optimize reference indexes, we could compact them as part of datahub's garbage collection process, e.g. remove successive references that are identical (disregarding the timestamp) from the incoming and outgoing indexes.
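A rough sketch of what such a compaction pass could look like, assuming a simplified in-memory slice of entries sorted by ascending timestamp. `RefEntry`, `refKey` and `CompactRefs` are hypothetical names for illustration and do not reflect how datahub actually stores its indexes.

```go
package main

import "fmt"

// RefEntry is a hypothetical, simplified index item.
type RefEntry struct {
	From, Predicate, To string
	Deleted             bool
	Timestamp           int64
}

// refKey identifies a reference regardless of timestamp.
type refKey struct{ From, Predicate, To string }

// CompactRefs drops every entry that repeats the previous deleted state
// of the same reference. entries must be sorted by ascending timestamp.
// The earliest entry of each run is kept, so the time at which a state
// first became valid is preserved.
func CompactRefs(entries []RefEntry) []RefEntry {
	lastDeleted := make(map[refKey]bool, len(entries))
	out := make([]RefEntry, 0, len(entries))
	for _, e := range entries {
		k := refKey{e.From, e.Predicate, e.To}
		if deleted, seen := lastDeleted[k]; seen && deleted == e.Deleted {
			continue // same state as before, only the timestamp differs
		}
		lastDeleted[k] = e.Deleted
		out = append(out, e)
	}
	return out
}

func main() {
	entries := []RefEntry{
		{From: "a", Predicate: "knows", To: "b", Timestamp: 1},
		{From: "a", Predicate: "knows", To: "b", Timestamp: 2},
		{From: "a", Predicate: "knows", To: "b", Deleted: true, Timestamp: 3},
	}
	for _, e := range CompactRefs(entries) {
		fmt.Println(e.Timestamp, e.Deleted)
	}
}
```

Keeping the first entry of each run rather than the last means a point-in-time query between two dropped entries still resolves to the same state as before compaction.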

We could also extend the logic in dataset.StoreEntities to skip adding new ref-index entries when the reference state has not changed.
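The write-side check could be sketched roughly like this. `RefIndex` and its `Add` method are hypothetical stand-ins for illustration only; the real `dataset.StoreEntities` works against datahub's storage layer, and its internals are not shown in this issue. The sketch assumes the latest deleted state per reference can be looked up cheaply.

```go
package main

import "fmt"

// RefKey identifies a reference regardless of timestamp (hypothetical).
type RefKey struct{ From, Predicate, To string }

// RefEntry is a hypothetical, simplified index item.
type RefEntry struct {
	Key       RefKey
	Deleted   bool
	Timestamp int64
}

// RefIndex is an in-memory stand-in for a reference index.
type RefIndex struct {
	entries []RefEntry
	latest  map[RefKey]bool // last stored deleted state per reference
}

func NewRefIndex() *RefIndex {
	return &RefIndex{latest: make(map[RefKey]bool)}
}

// Add stores e only if it changes the state of its reference.
// It returns true when a new index entry was written.
func (ix *RefIndex) Add(e RefEntry) bool {
	if deleted, seen := ix.latest[e.Key]; seen && deleted == e.Deleted {
		return false // unchanged reference, skip the write
	}
	ix.latest[e.Key] = e.Deleted
	ix.entries = append(ix.entries, e)
	return true
}

// Len returns the number of stored index entries.
func (ix *RefIndex) Len() int { return len(ix.entries) }

func main() {
	ix := NewRefIndex()
	k := RefKey{From: "a", Predicate: "knows", To: "b"}
	fmt.Println(ix.Add(RefEntry{Key: k, Timestamp: 1}))
	fmt.Println(ix.Add(RefEntry{Key: k, Timestamp: 2}))
	fmt.Println(ix.Add(RefEntry{Key: k, Deleted: true, Timestamp: 3}))
	fmt.Println("entries:", ix.Len())
}
```

Compared with compaction during garbage collection, this avoids writing the redundant entries in the first place, at the cost of one extra lookup per reference on every store.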

@rompetroll rompetroll added the enhancement New feature or request label Feb 6, 2023