Improve all taxonomies (CSET and GMF) CSV export #3082

pdcp1 · 2024-09-05T18:06:58Z

Part of #3023
On top of #3030

This PR changes how we export the taxonomy CSV files.
The export now iterates through every taxonomy in the taxa collection, dynamically generating CSV files from the fields of each taxonomy and the data in the classifications collection.
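As a rough illustration of the approach described above (not the code in this PR), the sketch below derives the CSV columns from each taxonomy's field list and fills the rows from matching classifications. The document shapes (`namespace`, `field_list`, `short_name`, `attributes`, `value`, `incident_id`) and the function names are assumptions made for the example, and the per-annotator CSETv1 files are not covered.

```python
import csv
import io


def export_taxonomy_csv(taxonomy, classifications):
    """Return CSV text for one taxonomy (hypothetical document shapes)."""
    # Columns come from the taxonomy definition rather than a hard-coded list.
    field_names = [field["short_name"] for field in taxonomy["field_list"]]
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=["incident_id"] + field_names, lineterminator="\n"
    )
    writer.writeheader()
    for classification in classifications:
        # Only classifications that belong to this taxonomy are exported.
        if classification["namespace"] != taxonomy["namespace"]:
            continue
        values = {a["short_name"]: a["value"] for a in classification["attributes"]}
        row = {"incident_id": classification["incident_id"]}
        for name in field_names:
            # Fields with no classification value become empty cells.
            row[name] = values.get(name, "")
        writer.writerow(row)
    return buffer.getvalue()


def export_all(taxa, classifications):
    """Map an assumed file name to CSV text for every taxonomy in `taxa`."""
    return {
        f"classifications_{t['namespace']}.csv": export_taxonomy_csv(t, classifications)
        for t in taxa
    }
```

Because the columns are read from the taxonomy document, adding a field to a taxonomy would add a column to its export without any change to the export code.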

Based on the current data, it will produce these 6 CSV files:

classifications_CSETv0.csv
classifications_CSETv1.csv
classifications_CSETv1_Annotator-1.csv
classifications_CSETv1_Annotator-2.csv
classifications_CSETv1_Annotator-3.csv
classifications_GMF.csv

Testing

To test these changes, go to Pablo's fork of the repository and run the GitHub Action manually:
https://github.com/pdcp1/aiid/actions/workflows/db-backup.yml

The backup file will be uploaded to a public test Cloudflare bucket. To access the backup file, use the public URL https://pub-daddb16dc28841779b83690f75eb5c57.r2.dev/[backup file]
e.g.: https://pub-daddb16dc28841779b83690f75eb5c57.r2.dev/backup-20240905204113.tar.bz2

codecov · 2024-09-05T18:13:05Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 79.34%. Comparing base (4046eb3) to head (27a6feb).
Report is 3 commits behind head on staging.

Additional details and impacted files

@@             Coverage Diff             @@
##           staging    #3082      +/-   ##
===========================================
+ Coverage    79.25%   79.34%   +0.09%     
===========================================
  Files          165      165              
  Lines        13263    13263              
  Branches      1524     1527       +3     
===========================================
+ Hits         10511    10523      +12     
+ Misses        2459     2447      -12     
  Partials       293      293


cesarvarela

Anywhere I can see this working?

pdcp1 · 2024-09-05T20:47:50Z

Anywhere I can see this working?

@cesarvarela Yes, I just updated the PR description with the testing instructions.

cesarvarela

Looks good; my only question is whether the array notation is expected for lists in CSV files.

I think lists are usually comma-separated and without quotes.

pdcp1 · 2024-09-10T00:27:19Z

@cesarvarela It's ready for another review. It now converts arrays of values into comma-separated values, as you suggested.
An example of the CSV exports is here: https://pub-daddb16dc28841779b83690f75eb5c57.r2.dev/backup-20240910002252.tar.bz2
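For readers following along, here is a minimal sketch of the kind of conversion discussed in this thread, assuming list values are joined with `", "` before being written. The helper names `to_cell` and `write_row` are hypothetical and are not taken from the PR.

```python
import csv
import io


def to_cell(value):
    """Flatten a list into one comma-separated string; pass other values through."""
    if isinstance(value, list):
        return ", ".join(str(item) for item in value)
    return value


def write_row(values):
    """Write a single CSV row and return it as text."""
    buffer = io.StringIO()
    # The csv module quotes a cell only when it contains the delimiter,
    # so a flattened list is wrapped once as a whole, not item by item.
    csv.writer(buffer, lineterminator="\n").writerow([to_cell(v) for v in values])
    return buffer.getvalue()
```

With this approach a value such as `["Chatbot", "Translation"]` is written as the single cell `Chatbot, Translation` instead of a JSON array with brackets and per-item quotes.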

Handle and export all taxonomies to CSV

bb89f70

pdcp1 self-assigned this Sep 5, 2024

pdcp1 temporarily deployed to staging September 5, 2024 18:07 — with GitHub Actions Inactive

pdcp1 mentioned this pull request Sep 5, 2024

Improve all taxonomies (CSET and GMF) CSV export files #3034

Closed

pdcp1 marked this pull request as ready for review September 5, 2024 18:08

pdcp1 requested a review from cesarvarela September 5, 2024 18:17

pdcp1 temporarily deployed to staging September 5, 2024 18:28 — with GitHub Actions Inactive

cesarvarela reviewed Sep 5, 2024


pdcp1 requested a review from cesarvarela September 5, 2024 20:48

cesarvarela approved these changes Sep 6, 2024


Convert JSON array of values into comma separated values

27a6feb

pdcp1 temporarily deployed to staging September 10, 2024 00:21 — with GitHub Actions Inactive

pdcp1 requested a review from cesarvarela September 10, 2024 00:25

pdcp1 temporarily deployed to staging September 10, 2024 00:42 — with GitHub Actions Inactive

cesarvarela merged commit 3f67946 into responsible-ai-collaborative:staging Sep 10, 2024
26 checks passed

pdcp1 mentioned this pull request Sep 10, 2024

Fix CSET taxonomies snapshots and CSV exports #3023

Closed

4 tasks
