I just want to put this here so we can discuss it at some point.
We want to run separate DAGs for each report/dashboard, and in each DAG we only want to publish the fields that are used in that report/dashboard. We also don't want to harvest the same DOI from OpenAlex, Dimensions, and SUL-Pub too frequently: having 5 DAGs that each do their own harvesting run monthly would be inefficient.

We could consider changing the DAG we have so that it does all of the harvesting and publishes the OpenAlex, Dimensions, and SUL-Pub pubs as JSONL files to the rialto@sul-rialto-airflow-dev server. This DAG could run monthly and keep the pubs updated. We could then move the merge task into the other DAGs, which would read the JSONL files, merge them, and filter and publish the contributions/publications files with only the required columns.
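A rough sketch of how the split could look, just to make the idea concrete. It assumes each source is written to its own `<source>.jsonl` file (one publication per line, each with a `doi` field) and uses a local directory as a stand-in for the files on the server. The file names, the `openalex_`/`dimensions_` column prefixes, and all function names are hypothetical, DOI normalization is reduced to lowercasing, and the Airflow task wiring is left out: `publish_jsonl` would be called from the harvest DAG, and `merge_by_doi` plus `publish_report` from each report DAG.

```python
import csv
import json
import tempfile
from pathlib import Path

# Hypothetical file stems, one JSONL file per harvested source.
SOURCES = ("openalex", "dimensions", "sulpub")


def publish_jsonl(records, path):
    """Harvest DAG side: write one JSON object per line."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", encoding="utf-8") as f:
        for rec in records:
            f.write(json.dumps(rec) + "\n")


def read_jsonl(path):
    """Read a JSONL file back into a list of dicts, skipping blank lines."""
    with Path(path).open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


def merge_by_doi(data_dir):
    """Report DAG side: combine the per-source files into one row per DOI."""
    merged = {}
    for source in SOURCES:
        path = Path(data_dir) / f"{source}.jsonl"
        if not path.exists():
            continue  # a source may not have been harvested yet
        for rec in read_jsonl(path):
            doi = (rec.get("doi") or "").lower()  # simplified normalization
            if not doi:
                continue
            row = merged.setdefault(doi, {"doi": doi})
            # Prefix each field with its source so columns don't collide.
            for key, value in rec.items():
                if key != "doi":
                    row[f"{source}_{key}"] = value
    return merged


def publish_report(merged, columns, out_path):
    """Write only the columns a given report/dashboard needs."""
    with Path(out_path).open("w", newline="", encoding="utf-8") as f:
        # extrasaction="ignore" drops every field not listed in `columns`.
        writer = csv.DictWriter(f, fieldnames=columns, extrasaction="ignore")
        writer.writeheader()
        for row in merged.values():
            writer.writerow(row)


def demo():
    """Write two tiny sample sources, merge them, and return the report CSV text."""
    with tempfile.TemporaryDirectory() as tmp:
        publish_jsonl(
            [{"doi": "10.1234/ABC", "title": "A paper", "cited_by_count": 3}],
            Path(tmp) / "openalex.jsonl",
        )
        publish_jsonl(
            [{"doi": "10.1234/abc", "title": "A paper"}],
            Path(tmp) / "dimensions.jsonl",
        )
        merged = merge_by_doi(tmp)
        out = Path(tmp) / "report.csv"
        publish_report(merged, ["doi", "openalex_cited_by_count"], out)
        return out.read_text(encoding="utf-8")


if __name__ == "__main__":
    print(demo())
```

Since the report DAGs only read the JSONL files, each one could run on its own schedule and pick its own column list without triggering any new harvesting.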
This needs some thought but it may be worth exploring.