feat: add metadata to Record #3194

Merged
merged 29 commits into from
Jun 16, 2023

Conversation

gabrielmbmb
Member

@gabrielmbmb gabrielmbmb commented Jun 14, 2023

Description

This PR adds a new attribute called metadata to the Record of the FeedbackDataset.

  • A new metadata column/attribute has been added to the Record ORM class (a new column metadata will be added by the migration script generated by alembic)
  • All the endpoints in API v1 that list records have been updated to return this new metadata column
  • The SDK has been updated to parse the metadata key returned by the API
  • The Python client has been updated so new records can be created including metadata
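
As a rough illustration of the client-side part of the list above, the sketch below models a record that carries an optional metadata dictionary and round-trips it through an API-style payload. The `RecordSketch` class, its `to_payload` and `from_payload` helpers, and the payload keys are hypothetical stand-ins chosen for this example; they are not the actual Argilla classes or API schema.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class RecordSketch:
    """Hypothetical stand-in for a feedback record; not the Argilla class."""

    fields: Dict[str, Any]
    metadata: Optional[Dict[str, Any]] = None

    def to_payload(self) -> Dict[str, Any]:
        # Only send `metadata` when the caller actually provided it.
        payload: Dict[str, Any] = {"fields": self.fields}
        if self.metadata is not None:
            payload["metadata"] = self.metadata
        return payload

    @classmethod
    def from_payload(cls, payload: Dict[str, Any]) -> "RecordSketch":
        # Payloads without a `metadata` key are parsed as metadata=None.
        return cls(fields=payload["fields"], metadata=payload.get("metadata"))


record = RecordSketch(fields={"text": "Hello"}, metadata={"source": "manual-test"})
payload = record.to_payload()
restored = RecordSketch.from_payload(payload)
```

Making the attribute optional keeps the change non-breaking in this sketch: records created without metadata produce the same payload as before.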

Additionally, I've refactored some if/else conditions in some methods of the FeedbackDataset class.

Closes #3155

Type of change

  • New feature (non-breaking change which adds functionality)
  • Refactor

How Has This Been Tested

Unit tests have been updated and I've done some manual tests.

Checklist

  • I have merged the original branch into my forked branch
  • I added relevant documentation
  • follows the style guidelines of this project
  • I did a self-review of my code
  • I made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • I have added relevant notes to the CHANGELOG.md file (See https://keepachangelog.com/)


codecov bot commented Jun 15, 2023

Codecov Report

Patch coverage: 82.66% and project coverage change: -0.24 ⚠️

Comparison is base (51751ac) 90.91% compared to head (a437ddf) 90.68%.

Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #3194      +/-   ##
===========================================
- Coverage    90.91%   90.68%   -0.24%     
===========================================
  Files          215      215              
  Lines        11304    11342      +38     
===========================================
+ Hits         10277    10285       +8     
- Misses        1027     1057      +30     
Flag Coverage Δ
pytest 90.68% <82.66%> (-0.24%) ⬇️


Impacted Files Coverage Δ
src/argilla/__init__.py 83.33% <ø> (ø)
src/argilla/server/contexts/datasets.py 96.01% <ø> (ø)
src/argilla/client/feedback/utils.py 77.77% <70.00%> (+6.34%) ⬆️
src/argilla/client/feedback/dataset.py 82.92% <73.68%> (+0.40%) ⬆️
src/argilla/_version.py 100.00% <100.00%> (ø)
src/argilla/client/apis/datasets.py 90.37% <100.00%> (ø)
src/argilla/client/feedback/schemas.py 98.23% <100.00%> (+0.01%) ⬆️
src/argilla/client/sdk/v1/datasets/api.py 89.00% <100.00%> (+0.11%) ⬆️
src/argilla/client/sdk/v1/datasets/models.py 96.15% <100.00%> (+0.07%) ⬆️
src/argilla/server/apis/v1/handlers/datasets.py 100.00% <100.00%> (ø)
... and 4 more

... and 3 files with indirect coverage changes


gabrielmbmb and others added 2 commits June 15, 2023 09:41
Co-authored-by: Alvaro Bartolome <alvaro@argilla.io>
@gabrielmbmb gabrielmbmb marked this pull request as ready for review June 15, 2023 07:42
@gabrielmbmb gabrielmbmb requested a review from frascuchon June 15, 2023 07:42
Comment on lines 315 to 318
    @classmethod
    def from_orm(cls: Type["Record"], obj: Any) -> "Record":
        dict_copy = obj.__dict__.copy()
        return cls(**{"metadata": dict_copy.pop("metadata_", None), **dict_copy})
Member

why do we need this?

Member Author

@gabrielmbmb gabrielmbmb Jun 15, 2023

In the endpoints that list records, we were using record.__dict__ to build the Record schema. If we pass record directly, pydantic will try to access record.responses when creating the Record class, and sqlalchemy will return an empty list no matter what, so the output payload would contain "responses": [] even if the client has not asked for the responses.

To avoid this, we use record.__dict__, which does not include the responses unless they were loaded in the query with a joinedload or another load option. Now that we've added the metadata_ attribute, we cannot use record.__dict__ directly to build the Record class, because the field in this class is called metadata and we have to map it manually.

As we had to do this in several endpoints, I decided to put this logic in this method.
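
The difference between attribute access and `__dict__` described above can be sketched in plain Python. This is only an emulation: `OrmRecordSketch` and `to_schema_kwargs` are hypothetical names, and the lazy relationship is imitated with `__getattr__` for brevity rather than with real SQLAlchemy machinery.

```python
from typing import Any, Dict


class OrmRecordSketch:
    """Hypothetical stand-in for an ORM row with a lazily loaded relationship."""

    def __init__(self, fields: Dict[str, Any], metadata_: Dict[str, Any]) -> None:
        self.fields = fields
        # On the ORM side the attribute is called `metadata_`.
        self.metadata_ = metadata_

    def __getattr__(self, name: str) -> Any:
        # Called only when normal lookup fails: emulate attribute access
        # that yields an empty list when `responses` was never loaded.
        if name == "responses":
            return []
        raise AttributeError(name)


def to_schema_kwargs(obj: OrmRecordSketch) -> Dict[str, Any]:
    # Copy so the ORM object's own state is left untouched.
    data = obj.__dict__.copy()
    # Rename the ORM attribute to the schema field name.
    data["metadata"] = data.pop("metadata_", None)
    return data


row = OrmRecordSketch(fields={"text": "Hello"}, metadata_={"source": "manual-test"})
kwargs = to_schema_kwargs(row)
```

Here `row.responses` evaluates to an empty list, while `kwargs` has no `responses` key at all, which is the distinction the `from_orm` override relies on.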

Member

@frascuchon frascuchon Jun 15, 2023

Can we use an alias then for the record.metadata field? Or use a RecordGetter class to tackle this? I feel this from_orm override is a bit tricky

Member Author

@gabrielmbmb gabrielmbmb Jun 16, 2023

I managed to do the same with a GetterDict while keeping responses unset if the query didn't have the joinedload option. It was as easy as returning the default argument instead of None.

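from typing import Any
from pydantic.utils import GetterDict  # assumes pydantic v1

class RecordGetterDict(GetterDict):
    def get(self, key: str, default: Any) -> Any:
        # The ORM attribute is named metadata_, but the schema field is metadata.
        if key == "metadata":
            return getattr(self._obj, "metadata_", None)
        # Return default (not None) so pydantic treats unloaded responses as unset.
        if key == "responses" and "responses" not in self._obj.__dict__:
            return default
        return super().get(key, default)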

This way, pydantic detects that the field has not been set, and in combination with the decorator parameter response_model_exclude_unset=True the responses key is left out of the output.
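
The effect of returning the default can be imitated without pydantic. The sketch below is a simplified emulation under stated assumptions: `GetterSketch`, `build_output`, `Row`, and the `_MISSING` sentinel are hypothetical names, and the loop only roughly mirrors how a validator could tell "unset" apart from "set to None".

```python
from typing import Any, Dict, Set

_MISSING = object()  # sentinel meaning "the getter had no value for this key"


class GetterSketch:
    """Hypothetical, simplified getter wrapping an ORM-like object."""

    def __init__(self, obj: Any) -> None:
        self._obj = obj

    def get(self, key: str, default: Any) -> Any:
        if key == "metadata":
            return getattr(self._obj, "metadata_", None)
        # Look only at already-loaded state; hand back `default` otherwise.
        return self._obj.__dict__.get(key, default)


def build_output(obj: Any, field_names: Set[str]) -> Dict[str, Any]:
    getter = GetterSketch(obj)
    output: Dict[str, Any] = {}
    for name in sorted(field_names):
        value = getter.get(name, _MISSING)
        if value is not _MISSING:  # unset fields are left out of the payload
            output[name] = value
    return output


class Row:
    """Hypothetical ORM-like row; `responses` is absent unless loaded."""

    def __init__(self) -> None:
        self.fields = {"text": "Hello"}
        self.metadata_ = {"source": "manual-test"}


loaded = Row()
loaded.responses = [{"status": "submitted"}]

names = {"fields", "metadata", "responses"}
without_responses = build_output(Row(), names)
with_responses = build_output(loaded, names)
```

Had the getter returned None instead of the sentinel, `responses` would count as set and would appear in the output as a null value.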

@frascuchon frascuchon merged commit 0ea8ee8 into develop Jun 16, 2023
@frascuchon frascuchon deleted the feature/api-record-metadata-column branch June 16, 2023 12:41
Successfully merging this pull request may close these issues.

[FEATURE] Add metadata field to the FeedbackRecord
3 participants