feat: add `ArgillaSpaCyTransformersTrainer` & improve `ArgillaSpaCyTrainer` #3256

alvarobartt · 2023-06-23T13:31:17Z

Description

This PR adds support for `spacy-transformers` via the new `ArgillaSpaCyTransformersTrainer` class, which lets the user lock the transformer model so that it is not updated during training. It also improves `ArgillaSpaCyTrainer` so that the `tok2vec` component, if available, can be either re-used or frozen.

Besides that, this PR adds a new `framework_kwargs` arg to `ArgillaTrainer`: a Python dict holding the framework-specific kwargs, in this case intended to be `{"update_transformer": False}` for `ArgillaSpaCyTransformersTrainer` and `{"freeze_tok2vec": False}` for `ArgillaSpaCyTrainer`. Ideally, we should also move the spaCy-specific args that are already exposed as `ArgillaTrainer` args into it.
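To make the intent of `framework_kwargs` concrete, here is a minimal, self-contained sketch of how such a dict could be forwarded to a framework-specific trainer. The classes `SpaCyTrainerSketch` and `SpaCyTransformersTrainerSketch`, the `TRAINERS` registry, and `build_trainer` are hypothetical stand-ins and not the code in this PR; the default values are assumptions too.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class SpaCyTrainerSketch:
    """Hypothetical stand-in for ArgillaSpaCyTrainer (not the real class)."""

    freeze_tok2vec: bool = False  # assumed default, for illustration only


@dataclass
class SpaCyTransformersTrainerSketch:
    """Hypothetical stand-in for ArgillaSpaCyTransformersTrainer."""

    update_transformer: bool = True  # assumed default, for illustration only


# Hypothetical registry mapping a framework name to its trainer class.
TRAINERS = {
    "spacy": SpaCyTrainerSketch,
    "spacy-transformers": SpaCyTransformersTrainerSketch,
}


def build_trainer(framework: str, framework_kwargs: Optional[Dict[str, Any]] = None):
    """Forward the framework-specific kwargs to the matching trainer class."""
    trainer_cls = TRAINERS[framework]
    return trainer_cls(**(framework_kwargs or {}))


# Lock the transformer so it is not updated during training.
locked = build_trainer("spacy-transformers", {"update_transformer": False})
# No framework_kwargs given: the trainer falls back to its own defaults.
plain = build_trainer("spacy")
```

One consequence of this pattern is that a key meant for the wrong framework, e.g. `freeze_tok2vec` passed to the transformers trainer, fails with a `TypeError` at construction time instead of being silently ignored.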

Type of change

New feature (non-breaking change which adds functionality)
Improvement (change adding some improvement to an existing functionality)

How Has This Been Tested

Add unit-tests for ArgillaSpaCyTransformersTrainer

Checklist

I added relevant documentation
My code follows the style guidelines of this project
I did a self-review of my code
I made corresponding changes to the documentation
My changes generate no new warnings
I have added tests that prove my fix is effective or that my feature works
I filled out the contributor form (see text above)
I have added relevant notes to the CHANGELOG.md file (See https://keepachangelog.com/)

src/argilla/training/base.py

davidberenstein1957

LGTM, and thanks for the fix. To me it would make more sense to pass the variable via `update_config`, to align with how the rest of the frameworks are used. Also, I don't see the new config options being tested, right?
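For reference, the `update_config` route suggested here could look roughly like the sketch below. `TrainerSketch` and its config keys are hypothetical and only illustrate the pattern of overriding options after construction; they are not the trainer implemented in this PR.

```python
class TrainerSketch:
    """Hypothetical trainer; only illustrates the update_config pattern."""

    def __init__(self) -> None:
        # Assumed config keys and defaults, for illustration only.
        self.config = {"update_transformer": True, "max_steps": 1000}

    def update_config(self, **kwargs) -> None:
        """Override known config keys after construction; reject unknown ones."""
        unknown = set(kwargs) - set(self.config)
        if unknown:
            raise KeyError(f"Unknown config keys: {sorted(unknown)}")
        self.config.update(kwargs)


trainer = TrainerSketch()
trainer.update_config(update_transformer=False)
```

The trade-off compared to a constructor-level dict is that options set this way can only take effect if the training config is (re)built after the call.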

src/argilla/training/spacy.py

tests/training/test_spacy.py

Otherwise the configuration for `spacy-transformers` generated via `init_config` doesn't support it

davidberenstein1957 · 2023-06-27T10:53:53Z

Hi @alvarobartt, could you also include the implementation and tests for the FeedbackDataset?

src/argilla/training/spacy.py

alvarobartt · 2023-06-27T11:15:52Z

> Hi @alvarobartt, could you also include the implementation and tests for the FeedbackDataset?

Not sure I'll have time for this release, as I need to focus on the RankingQuestion now, but we can create a ticket for the next one.

davidberenstein1957 · 2023-06-27T11:17:08Z

I can have a look tomorrow afternoon. I implemented most stuff so it should only require some small changes.

alvarobartt · 2023-06-28T06:28:44Z

Hi @davidberenstein1957 can you have a look into the unit test that is failing? Is there anything that you changed w.r.t. ArgillaSpaCyTrainer in the ArgillaTrainer for the FeedbackDatasets? I guess we're good to merge as the failing unit test is unrelated?

davidberenstein1957 · 2023-06-28T20:24:27Z

@alvarobartt, I added the integration and some tests. Can you have a final look tomorrow before merging?

codecov · 2023-06-28T20:47:01Z

Codecov Report

Patch coverage: 84.98% and project coverage change: -0.77 ⚠️

Comparison is base (51751ac) 90.91% compared to head (d1c9ccd) 90.14%.

Additional details and impacted files

@@             Coverage Diff             @@
##           develop    #3256      +/-   ##
===========================================
- Coverage    90.91%   90.14%   -0.77%     
===========================================
  Files          215      233      +18     
  Lines        11304    12493    +1189     
===========================================
+ Hits         10277    11262     +985     
- Misses        1027     1231     +204

| Flag | Coverage Δ | |
|---|---|---|
| pytest | `90.14% <84.98%> (-0.77%)` | ⬇️ |

Flags with carried forward coverage won't be shown.

| Impacted Files | Coverage Δ | |
|---|---|---|
| src/argilla/__init__.py | `86.66% <ø> (+3.33%)` | ⬆️ |
| ...illa/client/feedback/training/frameworks/openai.py | `0.00% <0.00%> (ø)` | |
| ...rgilla/client/feedback/training/frameworks/peft.py | `0.00% <0.00%> (ø)` | |
| ...client/feedback/training/frameworks/span_marker.py | `0.00% <0.00%> (ø)` | |
| src/argilla/server/contexts/datasets.py | `96.01% <ø> (ø)` | |
| src/argilla/server/seeds.py | `0.00% <ø> (ø)` | |
| src/argilla/tasks/training/__main__.py | `30.00% <0.00%> (-1.58%)` | ⬇️ |
| src/argilla/tasks/users/create.py | `91.11% <ø> (-4.45%)` | ⬇️ |
| src/argilla/training/autotrain_advanced.py | `0.00% <0.00%> (ø)` | |
| src/argilla/training/peft.py | `0.00% <0.00%> (ø)` | |

... and 62 more

... and 5 files with indirect coverage changes


alvarobartt added 6 commits June 23, 2023 14:53

docs: add notes for spaCy training in notebooks

9ad88c3

feat: add optimize arg

ca9637f

refactor: add _ArgillaSpaCyTrainerBase

7c6ce5c

feat: add ArgillaSpaCyTrainer and ArgillaSpaCyTransformersTrainer

d69bf7e

feat: add Framework.SPACY_TRANSFORMERS

5acc02b

test: add ArgillaSpaCyTransformersTrainer tests

d7c5224

alvarobartt added the `type: integration` label (Indicates integrations with third parties) Jun 23, 2023

alvarobartt requested a review from davidberenstein1957 June 23, 2023 13:31

davidberenstein1957 changed the base branch from develop to feature/prepare-for-training-feedbacktask June 23, 2023 14:43

davidberenstein1957 changed the base branch from feature/prepare-for-training-feedbacktask to develop June 23, 2023 14:45

davidberenstein1957 reviewed Jun 23, 2023

src/argilla/training/base.py

davidberenstein1957 reviewed Jun 23, 2023

src/argilla/training/spacy.py

tests/training/test_spacy.py

alvarobartt mentioned this pull request Jun 26, 2023

[FEATURE] Feature/prepare for training feedbacktask #3151

Merged

12 tasks

alvarobartt added 6 commits June 27, 2023 12:05

Merge branch 'develop' into feat/spacy-and-spacy-transformers

903ba6f

fix(test): add missing self in TestSpaCyTrainer methods

e8b9554

fix: don't check Enum against value

9a4587a

fix: set flag before super().__init__

deb4758

fix: add require_version(spacy>=3.5.3)

197ddd6

Otherwise the configuration for `spacy-transformers` generated via `init_config` doesn't support it

fix: tok2vec freezing

2f6ac32

davidberenstein1957 reviewed Jun 27, 2023

src/argilla/training/spacy.py

alvarobartt added 2 commits June 27, 2023 13:11

chore: upgrade spacy and add spacy-transformers

97c26c2

test: skip test_predict_wo_training for bert-tiny

13b737e

alvarobartt requested a review from davidberenstein1957 June 28, 2023 14:25

alvarobartt added this to the v1.12.0 milestone Jun 28, 2023

davidberenstein1957 added 2 commits June 28, 2023 22:23

chore: added integration with Feedback task

ced612b

Merge branch 'develop' into feat/spacy-and-spacy-transformers

e6f83b3

[pre-commit.ci] auto fixes from pre-commit.com hooks

d1c9ccd

for more information, see https://pre-commit.ci

davidberenstein1957 removed their request for review June 28, 2023 20:25

davidberenstein1957 assigned alvarobartt Jun 28, 2023

davidberenstein1957 self-requested a review June 28, 2023 20:25

davidberenstein1957 approved these changes Jun 28, 2023

alvarobartt merged commit 3598977 into develop Jun 29, 2023

alvarobartt deleted the feat/spacy-and-spacy-transformers branch June 29, 2023 07:47
