Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add ArgillaSpaCyTransformersTrainer & improve ArgillaSpaCyTrainer #3256

Merged
merged 17 commits into from
Jun 29, 2023

Conversation

alvarobartt
Copy link
Member

Description

This PR adds support for spacy-transformers via the new ArgillaSpaCyTransformersTrainer class allowing the user to lock the transformer model not to be updated, and the ArgillaSpaCyTrainer is improved to allow re-using the tok2vec or freeze it, if available.

Besides that, this PR also includes a new arg in ArgillaTrainer named framework_kwargs which is a Python dict to contain the framework-specific kwargs, in this case intended to be created as {"update_transformer": False} and {"freeze_tok2vec": False} for both ArgillaSpaCyTransformersTrainer and ArgillaSpaCyTrainer, respectively. Ideally, we should also move the spaCy specific-args already included as ArgillaTrainer args.

Type of change

  • New feature (non-breaking change which adds functionality)
  • Improvement (change adding some improvement to an existing functionality)

How Has This Been Tested

  • Add unit-tests for ArgillaSpaCyTransformersTrainer

Checklist

  • I added relevant documentation
  • follows the style guidelines of this project
  • I did a self-review of my code
  • I made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • I filled out the contributor form (see text above)
  • I have added relevant notes to the CHANGELOG.md file (See https://keepachangelog.com/)

@alvarobartt alvarobartt added the type: integration Indicates integrations with third parties label Jun 23, 2023
@davidberenstein1957 davidberenstein1957 changed the base branch from develop to feature/prepare-for-training-feedbacktask June 23, 2023 14:43
@davidberenstein1957 davidberenstein1957 changed the base branch from feature/prepare-for-training-feedbacktask to develop June 23, 2023 14:45
Copy link
Member

@davidberenstein1957 davidberenstein1957 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM and thanks for the fix. Perhaps it makes more sense to me to pass the variable via the update_config to align with the usage of the rest of the frameworks. Also, I don't see the new config options being test right?

src/argilla/training/spacy.py Show resolved Hide resolved
tests/training/test_spacy.py Show resolved Hide resolved
@davidberenstein1957
Copy link
Member

davidberenstein1957 commented Jun 27, 2023

Hi @alvarobartt, could you also include the implementation and tests for the FeedbackDataset?

@alvarobartt
Copy link
Member Author

Hi @alvarobartt, could you also include the implementation and tests for the FeedbackDataset?

Not sure I'll have time for this release, need to focus now on the RankingQuestion, but we can create a ticket for the next one

@davidberenstein1957
Copy link
Member

I can have a look tomorrow afternoon. I implemented most stuff so it should only require some small changes.

@alvarobartt
Copy link
Member Author

Hi @davidberenstein1957 can you have a look into the unit test that is failing? Is there anything that you changed w.r.t. ArgillaSpaCyTrainer in the ArgillaTrainer for the FeedbackDatasets? I guess we're good to merge as the failing unit test is unrelated?

@alvarobartt alvarobartt added this to the v1.12.0 milestone Jun 28, 2023
@davidberenstein1957
Copy link
Member

@alvarobartt, I added the integration and some test. Can you have a final look tomorrow before merging?

@codecov
Copy link

codecov bot commented Jun 28, 2023

Codecov Report

Patch coverage: 84.98% and project coverage change: -0.77 ⚠️

Comparison is base (51751ac) 90.91% compared to head (d1c9ccd) 90.14%.

Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #3256      +/-   ##
===========================================
- Coverage    90.91%   90.14%   -0.77%     
===========================================
  Files          215      233      +18     
  Lines        11304    12493    +1189     
===========================================
+ Hits         10277    11262     +985     
- Misses        1027     1231     +204     
Flag Coverage Δ
pytest 90.14% <84.98%> (-0.77%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
src/argilla/__init__.py 86.66% <ø> (+3.33%) ⬆️
...illa/client/feedback/training/frameworks/openai.py 0.00% <0.00%> (ø)
...rgilla/client/feedback/training/frameworks/peft.py 0.00% <0.00%> (ø)
...client/feedback/training/frameworks/span_marker.py 0.00% <0.00%> (ø)
src/argilla/server/contexts/datasets.py 96.01% <ø> (ø)
src/argilla/server/seeds.py 0.00% <ø> (ø)
src/argilla/tasks/training/__main__.py 30.00% <0.00%> (-1.58%) ⬇️
src/argilla/tasks/users/create.py 91.11% <ø> (-4.45%) ⬇️
src/argilla/training/autotrain_advanced.py 0.00% <0.00%> (ø)
src/argilla/training/peft.py 0.00% <0.00%> (ø)
... and 62 more

... and 5 files with indirect coverage changes

☔ View full report in Codecov by Sentry.
📢 Do you have feedback about the report comment? Let us know in this issue.

@alvarobartt alvarobartt merged commit 3598977 into develop Jun 29, 2023
@alvarobartt alvarobartt deleted the feat/spacy-and-spacy-transformers branch June 29, 2023 07:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type: integration Indicates integrations with third parties
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants