Refactor away from the .update file #174

joshgarde · 2024-06-21T01:57:07Z

Issue
The current solution for tracking the latest timestamp within a directory is the hidden .update file. While this works, the solution is neither portable nor self-evident to users.

Solution
Refactor data-subscriber to instead use file metadata within the directory to determine the next start datetime to fetch from. This removes the need to maintain a .update file, which can be lost if the user copies the granules from one directory to another without noticing the hidden file. Potential issues arise if the user is also using the directory for other work and adds files after the subscriber runs, or if the user is subscribing to multiple datasets in the same directory.
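To make the metadata idea concrete, here is a minimal sketch, assuming the "file metadata" in question is each file's modification time. The function name `latest_mtime` is hypothetical and is not part of data-subscriber; skipping dotfiles is also an assumption, made so that a leftover .update file does not influence the result.

```python
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional


def latest_mtime(directory: Path) -> Optional[datetime]:
    """Return the newest modification time (UTC) among visible files.

    Hypothetical helper: returns None when the directory holds no
    visible regular files, so the caller can fall back to a default
    start datetime.
    """
    mtimes = [
        entry.stat().st_mtime
        for entry in directory.iterdir()
        # Assumption: ignore hidden files such as a leftover .update
        if entry.is_file() and not entry.name.startswith(".")
    ]
    if not mtimes:
        return None
    return datetime.fromtimestamp(max(mtimes), tz=timezone.utc)
```

Note that this sketch shows the weakness described above: any unrelated file written to the directory after a subscriber run would move the computed start datetime forward.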

An alternative solution may be to perform granule downloads in descending order of timestamp, so that any granule not already present in the directory is downloaded. Once the subscriber hits a granule that does exist (implying that was the last stop point), it ends its execution. This solution would skip the need to look at file metadata, which may change unbeknownst to the user and may be inconsistent across filesystems. It would also enable support for subscribing to multiple datasets within the same directory.
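The descending-order idea could be sketched roughly as follows. This is an illustration only, not the subscriber's actual code: `granules_to_download` is a hypothetical name, and the sketch assumes each granule is available as a `(timestamp, filename)` pair and that a granule "exists" when a file of that name is present in the directory.

```python
from datetime import datetime
from pathlib import Path
from typing import Iterable, List, Tuple


def granules_to_download(
    granules: Iterable[Tuple[datetime, str]], directory: Path
) -> List[str]:
    """Return filenames to fetch, newest first.

    Walks granules in descending timestamp order and stops at the
    first one already present in `directory`, treating it as the
    last stop point.
    """
    pending: List[str] = []
    for _, filename in sorted(granules, key=lambda g: g[0], reverse=True):
        if (directory / filename).exists():
            break  # everything older is assumed to be already handled
        pending.append(filename)
    return pending
```

One consequence of stopping at the first existing granule is that older granules missing from the directory are not backfilled, which seems consistent with a subscriber that only picks up newly ingested data.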

mike-gangl · 2024-06-21T15:22:29Z

It's been a while since I worked on this, but I wanted to confirm: is this change only for the "downloader" tool, or is it for the subscribe tool as well? I'd be wary of changing the subscription feature because it's very purpose-built. It's not meant to get data from the past (only data that are newly ingested, which could be "in the past" but have been recently updated). If you want to download data across various time ranges, can't we just use the "data downloader" tool?

joshgarde · 2024-06-21T16:43:25Z

Reworked the ticket to something I think is more workable for subscriber specifically. Lmk your thoughts

joshgarde changed the title ~~Refactor timestamp mechanism in to better support mixed data in existing folders~~ Refactor away from the .update file Jun 21, 2024
