[Project Idea] InstructLab Taxonomy Reporting #1

hemajv · 2024-06-20T20:51:45Z

Currently, InstructLab does not publish any metrics per taxonomy leaf node. We would like to explore different ways to evaluate the InstructLab model that is fine-tuned via the taxonomy approach, and to come up with metrics for evaluating each taxonomy leaf node.

Each leaf node in the taxonomy represents one particular skill, or set of knowledge. Here is one example:
https://github.com/instructlab/taxonomy/blob/main/compositional_skills/linguistics/complete_common_expressions/qna.yaml
We see that each leaf has question and answer pairs. We would like to track how many questions from these YAML files the model answers correctly (for some definition of correctness), over time.
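To make the idea concrete, here is a minimal sketch of per-leaf accuracy. Everything in it is an assumption for illustration, not an existing InstructLab API: the `(leaf_path, question, reference_answer)` record shape, the `generate` callable standing in for the model, and the normalized exact-match check, which is only one possible definition of correctness.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Tuple

# Hypothetical record shape: (leaf_path, question, reference_answer).
# In practice these would be read from each leaf's qna.yaml file.
QnA = Tuple[str, str, str]


def normalized_exact_match(predicted: str, reference: str) -> bool:
    """Placeholder correctness check: case- and whitespace-insensitive equality."""
    return " ".join(predicted.lower().split()) == " ".join(reference.lower().split())


def per_leaf_accuracy(
    examples: Iterable[QnA],
    generate: Callable[[str], str],
    is_correct: Callable[[str, str], bool] = normalized_exact_match,
) -> Dict[str, float]:
    """Return the fraction of correctly answered questions for each leaf."""
    correct: Dict[str, int] = defaultdict(int)
    total: Dict[str, int] = defaultdict(int)
    for leaf, question, reference in examples:
        total[leaf] += 1
        # `generate` is a stand-in for whatever call produces the model's answer.
        if is_correct(generate(question), reference):
            correct[leaf] += 1
    return {leaf: correct[leaf] / total[leaf] for leaf in total}
```

Running this once per model checkpoint and storing the resulting dictionary alongside a checkpoint identifier or date would give the "over time" view. Exact match is likely too strict for free-form answers, so `is_correct` is left pluggable for a softer metric.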

hemajv · 2024-06-20T20:56:11Z

@erikerlandson please feel free to add to this

PalmPalm7 · 2024-06-24T19:09:12Z

First steps according to Sanjay, feel free to edit/add!

Run the Granite model
Fine-tune the Granite model with QLoRA on small datasets
Test the infrastructure and learn a few things
Then evaluate the knowledge trees

hemajv assigned alekhyak1 and PalmPalm7 Jun 20, 2024

PalmPalm7 mentioned this issue Jun 21, 2024

Per Taxonomy Leaf Reporting instructlab/instructlab#2127

Open

hemajv added this to Data Science WG Jul 12, 2024

hemajv moved this to Backlog in Data Science WG Jul 12, 2024

suppathak self-assigned this Sep 5, 2024
