Incorporate Timeliness of data #39

Open
imvs95 opened this issue Oct 26, 2022 · 1 comment

imvs95 (Collaborator) commented Oct 26, 2022

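Data collected by the web scrapers goes stale over time: older scrapes are less up to date than newer ones, so the timeliness of the data should be taken into account.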

imvs95 added the Prio3 label Oct 26, 2022

EwoutH (Owner) commented Oct 26, 2022

I think this could best be done by adding a "date collected" column holding the date on which each data row was collected. The advantage is that we keep all the raw data this way and can track how old the data is. We could then modify the scripts to either keep both versions or use the newest data available when combining data.

Now that I think of it, all the raw data files already contain the date in their name, so a separate column in the raw data isn't needed. Maybe the combined data could contain "First detected" and "Last updated" date columns.

This also depends on what our criteria are for two routes to be the same, and of course these could change over time.
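A rough sketch of how the combined data could get those two columns from the date-stamped file names. Everything specific here is an assumption for illustration: the file name pattern (`routes_2022-10-26.csv`), the column names (`origin`, `destination`, `price`), and above all `route_key`, which stands in for whatever criterion we end up using for "same route".

```python
import re
from datetime import date, datetime

# Assumption: raw file names contain an ISO date, e.g. "routes_2022-10-26.csv".
DATE_IN_NAME = re.compile(r"(\d{4}-\d{2}-\d{2})")


def date_from_filename(filename: str) -> date:
    """Extract the collection date from a raw data file name."""
    match = DATE_IN_NAME.search(filename)
    if match is None:
        raise ValueError(f"no date found in file name: {filename!r}")
    return datetime.strptime(match.group(1), "%Y-%m-%d").date()


def route_key(row: dict) -> tuple:
    """Hypothetical 'same route' criterion: same origin and destination."""
    return (row["origin"], row["destination"])


def combine(raw_files: dict) -> list:
    """Merge rows from date-stamped files, tracking first and last sighting.

    raw_files maps a file name to the list of row dicts read from that file.
    """
    combined = {}
    # Process the oldest file first so values from the newest file win.
    for filename in sorted(raw_files, key=date_from_filename):
        collected = date_from_filename(filename)
        for row in raw_files[filename]:
            key = route_key(row)
            if key in combined:
                first = combined[key]["first_detected"]
            else:
                first = collected
            combined[key] = {
                **row,
                "first_detected": first,
                "last_updated": collected,
            }
    return list(combined.values())


raw = {
    "routes_2022-10-26.csv": [{"origin": "A", "destination": "B", "price": 12}],
    "routes_2022-09-01.csv": [
        {"origin": "A", "destination": "B", "price": 10},
        {"origin": "A", "destination": "C", "price": 7},
    ],
}
result = combine(raw)
```

Here the A to B route keeps the newest price, with `first_detected` taken from the September file and `last_updated` from the October file. Note that `last_updated` in this sketch really means "last seen in a scrape"; if we only want it to change when a value changes, the rows would have to be compared first. The raw files stay untouched either way.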

For now I see no immediate action, since all collected data is already date-stamped in the file name. So I agree with a low priority on this.
