Convert Entity Relationship Extraction in DSPy to using CoT #44

NumberChiffre · 2024-09-18T00:03:56Z

Description

DSPy has typed predictors (TypedPredictor and TypedChainOfThought) that use pydantic models as the schema for their return types. However, they are not as reliable as untyped modules such as ChainOfThought: with an untyped module, if something goes wrong with the formatting, you can catch it and resolve it yourself. The failure is often a JSON parsing error, or the response for an output field ends up in the CoT's prediction.rationale. This PR therefore aims to resolve these formatting issues with DSPy once and for all (hopefully), and runs MIPROv2 to generate optimized prompt instructions for entity relationship extraction.
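A minimal sketch of the "catch it and resolve it yourself" idea, assuming the untyped ChainOfThought module is asked to emit its answer as a JSON string in one output field, and that on a formatting failure the JSON may land in `prediction.rationale` instead. The helper names `parse_json_output` and `_try_load` are hypothetical and are not the code in nano_graphrag/entity_extraction/extract.py; only the standard library is used, so the recovery logic can be exercised without calling an LM.

```python
import json
import re
from typing import Any, Optional

# Greedy span from the first "{" or "[" to the last "}" or "]"; a heuristic, not a JSON scanner.
_JSON_SPAN = re.compile(r"(\{.*\}|\[.*\])", re.DOTALL)
# Models often wrap JSON in a ```json ... ``` fence.
_FENCE = re.compile(r"^```(?:json)?\s*|\s*```$")


def _try_load(text: Optional[str]) -> Optional[Any]:
    """Decode JSON from `text`, or return None if nothing parses."""
    if not text:
        return None
    cleaned = _FENCE.sub("", text.strip())
    try:
        return json.loads(cleaned)
    except json.JSONDecodeError:
        pass
    # Otherwise look for a JSON object/array embedded in free text.
    match = _JSON_SPAN.search(text)
    if match is None:
        return None
    try:
        return json.loads(match.group(1))
    except json.JSONDecodeError:
        return None


def parse_json_output(output_field: Optional[str], rationale: Optional[str]) -> Any:
    """Try the CoT output field first, then fall back to prediction.rationale."""
    for candidate in (output_field, rationale):
        parsed = _try_load(candidate)
        if parsed is not None:
            return parsed
    raise ValueError("no parseable JSON in the output field or the rationale")


# A formatting failure where the answer leaked into the rationale.
entities = parse_json_output(
    "Entities:",
    'Reasoning: {"entities": [{"entity_name": "ALICE"}]}',
)
print(entities)
```

Raising instead of returning an empty result keeps the failure visible, so the caller can decide whether to retry the LM call or skip the chunk.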

Misc:

Need to resolve this before moving forward, or find an alternative: MIPROv2 crashes with AssertionError: No input variables found in the example (stanfordnlp/dspy#1506).
Switched to the DeepSeek Beta version, which allows a max output of 8k tokens instead of 4k.
Qwen2-7B works for entity relationship extraction with DSPy, but needs further fine-tuning with more examples.


codecov · 2024-09-19T03:35:51Z

Codecov Report

Attention: Patch coverage is 98.66667% with 1 line in your changes missing coverage. Please review.

Project coverage is 94.25%. Comparing base (f11e9f2) to head (20bb681).
Report is 12 commits behind head on main.

Files with missing lines	Patch %	Lines
nano_graphrag/entity_extraction/extract.py	96.77%	1 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main      #44      +/-   ##
==========================================
- Coverage   94.36%   94.25%   -0.12%     
==========================================
  Files          11       12       +1     
  Lines        1189     1288      +99     
==========================================
+ Hits         1122     1214      +92     
- Misses         67       74       +7
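The percentages in this report follow directly from the raw counts. A small check, assuming coverage is hits / (hits + misses) with no partial lines (none are listed here); the 74-of-75 patch lines and the 30-of-31 lines in extract.py are inferred from the "1 line missing" notes rather than stated in the report.

```python
def coverage_pct(hits: int, misses: int) -> float:
    """Percent of tracked lines that were hit (assumes no partial lines)."""
    return 100.0 * hits / (hits + misses)


# Counts copied from the Coverage Diff table.
base = coverage_pct(1122, 67)   # main: 1189 lines
head = coverage_pct(1214, 74)   # head of #44: 1288 lines

# Inferred: 1 missing line at 98.66667% means 74 of 75 patch lines were hit.
patch = coverage_pct(74, 1)
# Inferred: 1 missing line at 96.77% means 30 of 31 lines in extract.py were hit.
extract_py = coverage_pct(30, 1)

for name, value in [("base", base), ("head", head), ("patch", patch), ("extract.py", extract_py)]:
    print(f"{name:<10} {value:.4f}%")
```

The computed values agree with the report to within 0.01 percentage points; the last displayed digit can differ depending on how Codecov rounds.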

Flag	Coverage Δ
	`94.25% <98.66%> (?)`

Flags with carried forward coverage won't be shown.


…4#44)

* Converted TypedPredictor to CoT and removed pydantic models using experimental DSPy in notebook
* Fix entity extraction unittests after removing pydantic models and changing to CoT
* Add working random search fine tuning with better metrics
* Still cannot get MIPROv2 to work
* Working MIPROv2 with TypedChainOfThought
* Updated metrics to compute all relationships at once, updated prompt instructions that works for qwen2-7b
* Add updated notebooks with fine tuning using MIPROv2 and qwen2-7b as task model
* Add compiled model for generate dataset with updated unittests

---------

Co-authored-by: terence-gpt <numberchiffre@users.noreply.github.com>

Converted TypedPredictor to CoT and removed pydantic models using experimental DSPy in notebook

be09b22

NumberChiffre self-assigned this Sep 18, 2024

NumberChiffre added the enhancement and dspy labels Sep 18, 2024

NumberChiffre added 3 commits September 17, 2024 20:51

Fix entity extraction unittests after removing pydantic models and changing to CoT

0d3ceb7

Add working random search fine tuning with better metrics

0007081

Still cannot get MIPROv2 to work

97d91e2

NumberChiffre marked this pull request as ready for review September 19, 2024 03:33

NumberChiffre added 4 commits September 19, 2024 03:50

Working MIPROv2 with TypedChainOfThought

da9812f

Updated metrics to compute all relationships at once, updated prompt instructions that works for qwen2-7b

e0f1a6d
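This commit changes the metric to score all relationships at once. The metric itself is not shown on this page, so the following is only a hypothetical sketch of one way to do that: compare the whole set of predicted (source, target) pairs against the gold set in a single pass and return an F1 score. The tuple shape, the `relationships` field name, and the function names are assumptions; `relationship_metric` just follows DSPy's usual `metric(example, pred, trace=None)` calling convention.

```python
from typing import Iterable, Set, Tuple

# Hypothetical shape: one relationship = (source entity, target entity).
Relation = Tuple[str, str]


def _normalize(rels: Iterable[Relation]) -> Set[Relation]:
    # Case-insensitive and whitespace-trimmed; direction is preserved.
    return {(src.strip().lower(), tgt.strip().lower()) for src, tgt in rels}


def relationships_f1(gold: Iterable[Relation], predicted: Iterable[Relation]) -> float:
    """Score the full predicted relationship set against the gold set in one pass."""
    gold_set, pred_set = _normalize(gold), _normalize(predicted)
    if not gold_set and not pred_set:
        return 1.0  # nothing to extract and nothing extracted
    true_pos = len(gold_set & pred_set)
    if true_pos == 0:
        return 0.0
    precision = true_pos / len(pred_set)
    recall = true_pos / len(gold_set)
    return 2 * precision * recall / (precision + recall)


def relationship_metric(example, pred, trace=None) -> float:
    # DSPy-style metric signature; `relationships` is a hypothetical field name.
    return relationships_f1(example.relationships, pred.relationships)


gold = [("Alice", "Bob"), ("Bob", "Acme")]
predicted = [("alice", "bob"), ("Bob", "Globex")]
print(relationships_f1(gold, predicted))  # 0.5
```

Scoring the whole set at once, instead of averaging per-relationship matches, penalizes both missing and spurious relationships in a single number, which gives an optimizer such as MIPROv2 one scalar to maximize per example.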

Add updated notebooks with fine tuning using MIPROv2 and qwen2-7b as task model

91e81e0

Add compiled model for generate dataset with updated unittests

20bb681

NumberChiffre merged commit 5adf21f into gusye1234:main Sep 23, 2024
2 of 3 checks passed

gusye1234 mentioned this pull request Sep 26, 2024

Feature: prompt tuning #64

Open
