Create dataflow to publish alienspecies cubes on ZENODO #270

Open
SanderDevisscher opened this issue Nov 5, 2024 · 7 comments

Comments

@SanderDevisscher
Collaborator

The cubes will be downloaded monthly by the get_occ_cube GitHub Action (after #18).
Since this information would be interesting for other researchers (inside and outside INBO), we should publish the cubes to Zenodo.

@damianooldoni we should plan a meeting to determine the best way to achieve this.

@SanderDevisscher
Collaborator Author

SanderDevisscher commented Nov 12, 2024

A new workflow should be created to update eu_alientaxacube on Zenodo. It should be triggered when the get_occ_cube workflow completes successfully. This new workflow will run in parallel with the cube preprocessing workflow.

Todo:

  • add logic to log the download DOIs to the get_occ_cube flow
  • @damianooldoni adds an access token to this repo's secrets
  • use deposits to update eu_alientaxacube on Zenodo
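
For the first item, a minimal sketch of how a step in the get_occ_cube flow could append each download DOI to a metadata file. The function name `log_download_doi`, the column names and the example DOIs are assumptions made for illustration; the actual layout of be_cube_metadata.csv may differ.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Append one row per GBIF download to a metadata CSV.
# Hypothetical helper: the columns (cube, doi, logged_on) are assumed,
# not taken from the real be_cube_metadata.csv.
log_download_doi() {
  local metadata_file="$1" cube_name="$2" doi="$3"

  # Write the header only when the file is missing or still empty.
  if [ ! -s "$metadata_file" ]; then
    printf 'cube,doi,logged_on\n' > "$metadata_file"
  fi

  # Record the cube name, its download DOI and the current UTC date.
  printf '%s,%s,%s\n' "$cube_name" "$doi" "$(date -u +%Y-%m-%d)" >> "$metadata_file"
}

# Example call (placeholder DOI):
#   log_download_doi be_cube_metadata.csv be_alientaxa_cube 10.15468/dl.example1
```

Calling the function once per cube (each rank cube plus the class cube) would give one row per component download, which the publishing workflow could then pass on as related identifiers.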

Taxon metadata is not yet included in the b-cubed flow; see gbif/occurrence-cube#9 for updates.

@SanderDevisscher
Collaborator Author

@damianooldoni can you provide a link to the Zenodo dataset to be updated?

@damianooldoni
Member

@SanderDevisscher
Collaborator Author

@damianooldoni do we need any other info besides the DOIs of the component downloads (one for each rank cube plus the class cube)?

SanderDevisscher added a commit that referenced this issue Jan 28, 2025
@SanderDevisscher
Collaborator Author

The flow is running, see download occurrence cube

@damianooldoni
Member

I think it's enough.

@SanderDevisscher
Collaborator Author

SanderDevisscher commented Jan 29, 2025

This flow failed on another job, but here is what the final metadata file would look like: https://github.com/inbo/aspbo/actions/runs/13016533798/artifacts/2499619831

be_cube_metadata.csv

I suggest the next step would be to create a separate GitHub Action with the following parts in the .yaml file:

To trigger upon successful completion of the previous flow:

on:
  workflow_dispatch:
  workflow_run:
    workflows: ["download occurrence cube"]
    types:
      - completed

jobs:
  fetch-data:
    runs-on: ${{ matrix.config.os }}
    if: ${{ github.event.workflow_run.conclusion == 'success' || github.event_name == 'workflow_dispatch' }}

To retrieve the metadata and both cubes:

      # Look up the most recent successful run of the download workflow.
      - name: Get latest successful run ID
        id: get-latest-run
        run: |
          run_id=$(gh run list -w get_occ_cube.yaml --status success --limit 1 --json databaseId --jq '.[0].databaseId')
          echo "run_id=$run_id" >> "$GITHUB_OUTPUT"
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}

      # Fetch both cubes and the metadata file from that run.
      - name: Download be_alientaxa_cube artifact
        uses: actions/download-artifact@v4
        with:
          name: be_alientaxa_cube.csv
          path: ./cube_data/
          run-id: ${{ steps.get-latest-run.outputs.run_id }}
          github-token: ${{ secrets.GITHUB_TOKEN }}
          merge-multiple: true

      - name: Download be_classes_cube artifact
        uses: actions/download-artifact@v4
        with:
          name: be_classes_cube.csv
          path: ./cube_data/
          run-id: ${{ steps.get-latest-run.outputs.run_id }}
          github-token: ${{ secrets.GITHUB_TOKEN }}
          merge-multiple: true

      - name: Download metadata artifact
        uses: actions/download-artifact@v4
        with:
          name: be_cube_metadata.csv
          path: ./cube_data/
          run-id: ${{ steps.get-latest-run.outputs.run_id }}
          github-token: ${{ secrets.GITHUB_TOKEN }}
          merge-multiple: true

This GitHub Action would then perform the necessary steps to upload the cubes to Zenodo.
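
As a rough idea of what that upload could look like in a `run:` step, here is a sketch that uses plain `curl` calls against the Zenodo REST deposit API instead of the deposits tooling mentioned earlier. Everything specific is an assumption: the helper names, the `ZENODO_TOKEN` variable (the secret still has to be added), and the deposition ID, since the record to update has not been linked in this thread. It only covers building the new-version endpoint and uploading the files in ./cube_data/ to a draft's file bucket; opening the draft, removing files carried over from the previous version and publishing are left out.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Base URL of the Zenodo REST API (the sandbox could be substituted for testing).
ZENODO_API="${ZENODO_API:-https://zenodo.org/api}"

# Endpoint that opens a new draft version of an existing deposition.
# $1 is the deposition ID (hypothetical here, not known from the thread).
new_version_url() {
  printf '%s/deposit/depositions/%s/actions/newversion' "$ZENODO_API" "$1"
}

# Upload target for one file inside a draft's file bucket.
# $1 is the bucket URL reported by the draft, $2 a local file path.
bucket_file_url() {
  printf '%s/%s' "${1%/}" "$(basename "$2")"
}

# Upload every CSV in a directory to the draft's bucket with HTTP PUT.
upload_cubes() {
  local bucket_url="$1" dir="$2" file
  for file in "$dir"/*.csv; do
    [ -e "$file" ] || continue  # skip the literal pattern if nothing matches
    curl --fail --silent --show-error \
      -H "Authorization: Bearer ${ZENODO_TOKEN:?ZENODO_TOKEN is not set}" \
      --upload-file "$file" \
      "$(bucket_file_url "$bucket_url" "$file")"
  done
}
```

In the workflow, `ZENODO_TOKEN` would come from the repository secrets, and the bucket URL would be read from the draft returned after a POST to the URL built by `new_version_url`.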
