
Convert Entity Relationship Extraction in DSPy to using CoT #44

NumberChiffre
Collaborator

@NumberChiffre NumberChiffre commented Sep 18, 2024

Description

DSPy has typed predictors and typed CoT that use pydantic models as the schema for their return types. In practice this is less reliable than the non-typed modules such as CoT: with non-typed output, if something goes wrong with the formatting, you can catch it and resolve it yourself. Typical failures are a JSON parsing error, or the response meant for the output field ending up in the CoT's prediction.rationale. This PR therefore aims to resolve these formatting issues with DSPy once and for all (hopefully), and runs MIPROv2 to generate optimal prompt instructions for entity relationship extraction.
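
As a rough illustration of the "catch it and resolve it yourself" point, the sketch below parses the raw string that a non-typed output field might return. It does not call DSPy and is not code from this PR: the helper name `parse_entities`, the expected JSON shape (a list of objects with `entity_name` and `entity_type` keys), and the fallback rules are all assumptions made for this example.

```python
import json
import re


def parse_entities(raw: str) -> list[dict]:
    """Best-effort parse of an LM's raw text into a list of entity dicts.

    Hypothetical helper: the key names used here are illustrative assumptions.
    """
    # Try strict JSON first.
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        # Fall back to the span from the first "[" to the last "]", which
        # tolerates prose or markdown fences wrapped around the JSON payload.
        match = re.search(r"\[.*\]", raw, flags=re.DOTALL)
        if match is None:
            return []
        try:
            data = json.loads(match.group(0))
        except json.JSONDecodeError:
            return []
    if not isinstance(data, list):
        return []
    # Keep only well-formed items instead of failing the whole prediction.
    required = {"entity_name", "entity_type"}
    return [
        item for item in data
        if isinstance(item, dict) and required <= item.keys()
    ]
```

In a DSPy program, a helper along these lines would presumably be applied to the string value of an output field on a `dspy.ChainOfThought` prediction, so a malformed response degrades to fewer entities rather than an exception.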

Misc:

@NumberChiffre NumberChiffre self-assigned this Sep 18, 2024
@NumberChiffre NumberChiffre added the enhancement (New feature or request) and dspy labels Sep 18, 2024
@NumberChiffre NumberChiffre marked this pull request as ready for review September 19, 2024 03:33

codecov bot commented Sep 19, 2024

Codecov Report

Attention: Patch coverage is 98.66667% with 1 line in your changes missing coverage. Please review.

Project coverage is 94.25%. Comparing base (f11e9f2) to head (20bb681).
Report is 12 commits behind head on main.

Files with missing lines | Patch % | Lines
nano_graphrag/entity_extraction/extract.py | 96.77% | 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main      #44      +/-   ##
==========================================
- Coverage   94.36%   94.25%   -0.12%     
==========================================
  Files          11       12       +1     
  Lines        1189     1288      +99     
==========================================
+ Hits         1122     1214      +92     
- Misses         67       74       +7     
Flag Coverage Δ
94.25% <98.66%> (?)

Flags with carried forward coverage won't be shown. Click here to find out more.


@NumberChiffre NumberChiffre merged commit 5adf21f into gusye1234:main Sep 23, 2024
2 of 3 checks passed
rangehow pushed a commit to rangehow/nano-graphrag that referenced this pull request Oct 18, 2024
…4#44)

* Converted TypedPredictor to CoT and removed pydantic models using experimental DSPy in notebook

* Fix entity extraction unittests after removing pydantic models and changing to CoT

* Add working random search fine tuning with better metrics

* Still cannot get MIPROv2 to work

* Working MIPROv2 with TypedChainOfThought

* Updated metrics to compute all relationships at once, updated prompt instructions that work for qwen2-7b

* Add updated notebooks with fine tuning using MIPROv2 and qwen2-7b as task model

* Add compiled model for generate dataset with updated unittests

---------

Co-authored-by: terence-gpt <numberchiffre@users.noreply.github.com>