Investigate/implement non-redundant ancestry processing #91
Comments
Key question for whether this will work: are registry-refs docs guaranteed to be written to OpenSearch after the non-aggregate products they refer to? @jordanpadams @al-niessner do you know, off the top of your head?
I was just looking at a related item in harvest a week or two ago. I cannot say for certain because I have not tested it, but harvest processes bundles, then collections, then products (non-aggs), and writes them in that order via batching and a List. So, just the opposite of the order you want. I think the primary reason for this order is that it simplifies the testing and checking that harvest has to do: since the bundle is already loaded, harvest knows whether the collection is part of it. Ditto for the next layer down. It makes the harvest code much simpler. The order in which documents are written is not as important. However, writes can be batched or done as found. Default is batch. If done as found, then the order is obvious. I remember the batch using a list and sending to the registry from first to last index. It would probably be easy, but no promises, to send the array in reverse. However, this would not help you if the user is not using batch mode.
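To make the ordering question concrete, here is a minimal Python sketch of a "leaf-first" write order. It is not harvest's code: the `type` field, the `WRITE_RANK` table, and the `leaf_first` helper are all hypothetical names chosen for illustration. It assumes a batch arrives in the order described above (bundles, collections, products) and only shows how the batch could be reordered before sending.

```python
from typing import Dict, List

# Hypothetical ranks (not from harvest): a lower rank is written first.
WRITE_RANK: Dict[str, int] = {"product": 0, "collection": 1, "bundle": 2}


def leaf_first(batch: List[dict]) -> List[dict]:
    """Return the batch reordered so products precede collections and bundles.

    sorted() is stable, so documents of the same type keep their
    relative order from the original batch.
    """
    return sorted(batch, key=lambda doc: WRITE_RANK[doc["type"]])


# Batch in the order described in the comment above: bundle, collection, products.
batch = [
    {"id": "b1", "type": "bundle"},
    {"id": "c1", "type": "collection"},
    {"id": "p1", "type": "product"},
    {"id": "p2", "type": "product"},
]

write_order = [doc["id"] for doc in leaf_first(batch)]
```

As noted in the comment, a reordering like this would only matter in batch mode; in as-found mode the write order follows discovery order.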
Thanks Al, much appreciated! Will need to have a think about whether to follow this (harvest) up or rely on detection/cleanup of such cases. Given that all it would take to break something is for someone to use an out-of-date harvest even if we did fix it, seems like maybe the latter is the only option.
@alexdunnjpl @al-niessner one catch here is that this probably only applies when someone actually points at a bundle. Harvest can be pointed at any directory.
Suggest (accepted, per breakout): implement naively, ignoring the "ingestion while sweeping" test case, and monitor the quantity of orphaned documents, or just check them in a few weeks/months. If there is an unmanageable quantity of orphans, we'll need to rethink; otherwise, implement a secondary cleanup sweeper process.
Checked for duplicates
No - I haven't checked
User Persona(s)
No response
Motivation
...so that ECS costs are significantly reduced
Additional Details
No response
Acceptance Criteria
Ancestry sweeper only processes data which is new, modified, or was processed with an out-of-date version of the ancestry sweeper, or which references a bundle or collection which has been modified.
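One way to read the criteria above is as a per-document predicate. The sketch below is an assumption-laden illustration, not the sweeper's actual implementation: the `DocState` fields, the integer `CURRENT_SWEEPER_VERSION`, and the `needs_ancestry_processing` function are hypothetical, and how these values would be stored or queried in OpenSearch is not specified in this issue.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

# Hypothetical value; the real sweeper's versioning scheme is not given in this issue.
CURRENT_SWEEPER_VERSION = 3


@dataclass
class DocState:
    """Hypothetical per-document metadata; field names are illustrative only."""
    last_modified: datetime
    swept_at: Optional[datetime] = None       # None: never processed (new)
    swept_version: Optional[int] = None       # sweeper version that last processed it
    parents_last_modified: Optional[datetime] = None  # newest change among referenced bundles/collections


def needs_ancestry_processing(doc: DocState) -> bool:
    """Return True if the document matches any of the acceptance criteria."""
    # New: never processed by the ancestry sweeper.
    if doc.swept_at is None or doc.swept_version is None:
        return True
    # Processed with an out-of-date sweeper version.
    if doc.swept_version < CURRENT_SWEEPER_VERSION:
        return True
    # Modified since it was last swept.
    if doc.last_modified > doc.swept_at:
        return True
    # References a bundle or collection modified since the last sweep.
    if doc.parents_last_modified is not None and doc.parents_last_modified > doc.swept_at:
        return True
    return False


swept = datetime(2024, 1, 10)
up_to_date = DocState(datetime(2024, 1, 1), swept, CURRENT_SWEEPER_VERSION, datetime(2024, 1, 5))
parent_changed = DocState(datetime(2024, 1, 1), swept, CURRENT_SWEEPER_VERSION, datetime(2024, 2, 1))
```

Under these assumptions, `up_to_date` would be skipped and `parent_changed` would be reprocessed, which is the redundancy reduction the issue is after.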
Engineering Details
No response