Skip to content
This repository has been archived by the owner on Oct 9, 2023. It is now read-only.

Adding integration with Label Studio #554

Merged
merged 64 commits into from
Sep 28, 2021

Conversation

KonstantinKorotaev
Copy link
Contributor

What does this PR do?

Adds LabelStudioDataset, ImageClassificationData.from_labelstudio and example of usage

Before submitting

  • [n] Was this discussed/approved via a Github issue? (no need for typos and docs improvements)
  • [y] Did you read the contributor guideline, Pull Request section?
  • [y] Did you make sure your PR does only one thing, instead of bundling different changes together?
  • [n] Did you make sure to update the documentation with your changes?
  • [n] Did you write any new necessary tests? [not needed for typos/docs]
  • [y] Did you verify new and existing tests pass locally with your changes?
  • [n] If you made a notable change (that affects users), did you update the CHANGELOG?

PR review

  • [n] Is this pull request ready for review? (if not, please submit in draft mode)

Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

Did you have fun?

Make sure you had fun coding 🙃

@pep8speaks
Copy link

pep8speaks commented Jul 7, 2021

Hello @KonstantinKorotaev! Thanks for updating this PR.

Line 1223:9: E741 ambiguous variable name 'l'

Line 32:56: W292 no newline at end of file

Comment last updated at 2021-08-13 13:35:51 UTC

@codecov
Copy link

codecov bot commented Jul 7, 2021

Codecov Report

Merging #554 (b81f53f) into master (f2c9b2a) will increase coverage by 0.85%.
The diff coverage is 85.09%.

❗ Current head b81f53f differs from pull request most recent head 5b5be24. Consider uploading reports for the commit 5b5be24 to get more accurate results
Impacted file tree graph

@@            Coverage Diff             @@
##           master     #554      +/-   ##
==========================================
+ Coverage   84.16%   85.02%   +0.85%     
==========================================
  Files         217      219       +2     
  Lines       11627    11879     +252     
==========================================
+ Hits         9786    10100     +314     
+ Misses       1841     1779      -62     
Flag Coverage Δ
unittests 85.02% <85.09%> (+0.85%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
flash/core/integrations/labelstudio/data_source.py 81.31% <81.31%> (ø)
flash/core/data/data_module.py 94.18% <87.50%> (-0.42%) ⬇️
flash/core/integrations/labelstudio/visualizer.py 96.00% <96.00%> (ø)
flash/core/data/data_source.py 94.73% <100.00%> (+0.01%) ⬆️
flash/image/classification/data.py 94.17% <100.00%> (+0.05%) ⬆️
flash/tabular/regression/data.py 100.00% <100.00%> (ø)
flash/text/classification/data.py 58.03% <100.00%> (-27.62%) ⬇️
flash/video/classification/data.py 82.88% <100.00%> (+0.15%) ⬆️
flash/text/question_answering/model.py 29.16% <0.00%> (-61.81%) ⬇️
flash/text/question_answering/data.py 28.29% <0.00%> (-58.53%) ⬇️
... and 36 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update f2c9b2a...5b5be24. Read the comment docs.

flash/core/data/data_module.py Outdated Show resolved Hide resolved
flash/core/data/data_module.py Outdated Show resolved Hide resolved
flash/core/data/data_source.py Outdated Show resolved Hide resolved
KonstantinKorotaev and others added 6 commits August 13, 2021 16:33
- Move from dataset to datasource
- Add text processing example
- Add video example
- Add data type for DataSource
- Move split to load_data
Copy link
Collaborator

@ethanwharris ethanwharris left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is awesome! I think it would be cleaner to split the datasource into different classes for the three modalities. The examples are really clean! Let's also add a docs page in docs/source/integrations which showcases the three examples 😃 (this could be done in a follow-up PR)

flash/core/data/data_module.py Outdated Show resolved Hide resolved
flash/core/data/data_source.py Outdated Show resolved Hide resolved
flash/core/data/data_source.py Outdated Show resolved Hide resolved
flash/core/data/data_source.py Outdated Show resolved Hide resolved
@mergify mergify bot removed the has conflicts label Aug 23, 2021
@tchaton
Copy link
Contributor

tchaton commented Sep 10, 2021

Hey @KonstantinKorotaev,

Mind adding yourself and someone else from your team to the https://github.com/PyTorchLightning/lightning-flash/blob/master/.github/CODEOWNERS as code-owners for the LabelStudio integration.

Would you mind to share screenshots / gifs of the visuals on LabelStudio for both dataset exploration / predictions ?

Best,
T.C

Copy link
Collaborator

@ethanwharris ethanwharris left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good 😃 Couple of comments / suggestions

flash/core/data/data_module.py Outdated Show resolved Hide resolved
def load_data(self, data: Optional[Any] = None, dataset: Optional[Any] = None) -> Sequence[Mapping[str, Any]]:
"""load_data produces a sequence or iterable of samples."""
res = super().load_data(data, dataset)
return self.convert_to_encodedvideo(res)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How does this work with the stuff attached to self? There you could end up with a dataset that hasn't been wrapped. Possible solution would be to wrap the test set and val set here if they were set in the super?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This work like that:

  1. Load data with default load_data in LabelStudioDataSource
  2. Wrap result with convert_to_encodedvideo method

Could you please suggest the scenario for checking the situation with the unwrapped dataset?

@tchaton
Copy link
Contributor

tchaton commented Sep 21, 2021

Dear @KonstantinKorotaev,

One of the test is failing. If you send me the data, I can upload them on your s3 bucket. It seems it has not been public and can't download.

Best,
T.C

@KonstantinKorotaev
Copy link
Contributor Author

KonstantinKorotaev commented Sep 22, 2021 via email

@tchaton tchaton self-assigned this Sep 28, 2021
@tchaton tchaton enabled auto-merge (squash) September 28, 2021 20:56
@tchaton tchaton merged commit 0a28672 into Lightning-Universe:master Sep 28, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants