Consider changes to DAG #100

Open
jacobthill opened this issue Sep 26, 2024 · 0 comments

@jacobthill
Collaborator

I just want to put this here so we can discuss it at some point.

We want to run a separate DAG for each report/dashboard, and each DAG should only publish the fields that report/dashboard uses. We also don't want to harvest the same DOI from OpenAlex, Dimensions, and SUL-Pub too frequently, so having 5 DAGs each run the full harvest monthly would be inefficient. Instead, we could change the DAG we have so that it does all of the harvesting and publishes the OpenAlex, Dimensions, and SUL-Pub pubs as JSONL files to the rialto@sul-rialto-airflow-dev server. This DAG could run monthly and keep the pubs up to date. We could then move the merge task into the report/dashboard DAGs, which would read the JSONL files, merge them, and filter and publish the contributions/publications files with only the required columns.
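A minimal Python sketch of the proposed split, assuming these are Airflow DAGs running Python tasks. The harvest DAG would write one JSONL file per source (`write_jsonl`), and each report/dashboard DAG would run a merge step (`merge_and_filter`) that keeps only the columns it needs. Everything here is hypothetical and not the project's actual schema or code: the file names (`openalex.jsonl`, etc.), the `doi` key, lower-casing DOIs as the merge key, and prefixing each field with its source name.

```python
"""Hypothetical sketch: a harvest step writes per-source JSONL files, and a
report step merges them on DOI and keeps only the columns that report needs."""
import json
import tempfile
from pathlib import Path
from typing import Iterable

# Hypothetical file stems for the three sources named in the issue.
SOURCES = ("openalex", "dimensions", "sulpub")


def write_jsonl(path: Path, records: Iterable[dict]) -> None:
    """Harvest side: write one JSON object per line."""
    with path.open("w", encoding="utf-8") as fh:
        for record in records:
            fh.write(json.dumps(record) + "\n")


def read_jsonl(path: Path) -> list[dict]:
    """Report side: read back one JSON object per non-empty line."""
    with path.open(encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]


def merge_and_filter(data_dir: Path, columns: list[str]) -> list[dict]:
    """Merge per-source records on a lower-cased DOI, then project to `columns`.

    Non-DOI fields are prefixed with the source name (e.g. "openalex_title")
    so that sources never overwrite each other. Missing sources are skipped,
    and requested columns a row lacks come back as None.
    """
    merged: dict[str, dict] = {}
    for source in SOURCES:
        path = data_dir / f"{source}.jsonl"
        if not path.exists():
            continue
        for record in read_jsonl(path):
            doi = record["doi"].lower()  # assumed normalization
            row = merged.setdefault(doi, {"doi": doi})
            for key, value in record.items():
                if key != "doi":
                    row[f"{source}_{key}"] = value
    return [{col: row.get(col) for col in columns} for row in merged.values()]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        data_dir = Path(tmp)
        write_jsonl(data_dir / "openalex.jsonl",
                    [{"doi": "10.1/ABC", "title": "A", "cited_by_count": 3}])
        write_jsonl(data_dir / "dimensions.jsonl",
                    [{"doi": "10.1/abc", "times_cited": 4}])
        report_rows = merge_and_filter(
            data_dir, ["doi", "openalex_title", "dimensions_times_cited"])
        print(report_rows)
        # prints [{'doi': '10.1/abc', 'openalex_title': 'A', 'dimensions_times_cited': 4}]
```

With this shape, only the harvest DAG talks to OpenAlex, Dimensions, and SUL-Pub on its monthly schedule. Each report DAG just passes its own `columns` list, so adding a dashboard does not add any harvesting load.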

This needs some thought but it may be worth exploring.
