Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor: add FeedbackDatasetBase and RemoteFeedbackDataset while keeping FeedbackDataset just for local #3465

Merged
merged 57 commits into from
Aug 4, 2023

Conversation

alvarobartt
Copy link
Member

@alvarobartt alvarobartt commented Jul 27, 2023

Description

This PR is still in progress, and the main idea is that we move the common functionality to be used on both local and remote (Argilla) datasets to FeedbackDatasetBase, while we keep FeedbackDataset for working locally with these datasets, and add _ArgillaFeedbackDataset to be instantiated internally via FeedbackDataset.from_argilla and as part of the return statement of FeedbackDataset.push_to_argilla, so as to split the behaviour on some operations such as add_records, since locally means adding those to a local list, while remotely that means pushing those to Argilla, to avoid having to call push_to_argilla right after every record addition.

Some more things are tackled as part of this refactoring and will be listed down below as soon as the PR is out of draft!

Closes #3456

Type of change

  • Refactor (change restructuring the codebase without changing functionality)
  • Improvement (change adding some improvement to an existing functionality)

How Has This Been Tested

  • Add unit tests for the different classes and methods, while ensuring backwards compatibility on everything related to the FeedbackDataset

Checklist

  • I added relevant documentation
  • follows the style guidelines of this project
  • I did a self-review of my code
  • I made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • I filled out the contributor form (see text above)
  • I have added relevant notes to the CHANGELOG.md file (See https://keepachangelog.com/)

alvarobartt and others added 2 commits July 28, 2023 11:16
# Description

This PR adds a function in the SDK named `get_metrics` that calls `GET
/api/v1/me/datasets/{dataset_id}/metrics` to get the metrics of a
certain dataset from a certain user, meaning that each user would just
see what can be seen according to its permissions.

This PR is created on top of #3465 since we will need to get the
`records.count` to override the magic method `__len__` in
`_ArgillaFeedbackDataset` so that the length of a dataset is directly
retrieved from the record count of the dataset, since we're no longer
keeping local data when working with a remote dataset.

**Type of change**

- [X] New feature (non-breaking change which adds functionality)

**How Has This Been Tested**

- [X] Add unit tests for the `get_metrics` function

**Checklist**

- [ ] I added relevant documentation
- [X] follows the style guidelines of this project
- [X] I did a self-review of my code
- [ ] I made corresponding changes to the documentation
- [X] My changes generate no new warnings
- [X] I have added tests that prove my fix is effective or that my
feature works
- [ ] I filled out [the contributor form](https://tally.so/r/n9XrxK)
(see text above)
- [x] I have added relevant notes to the CHANGELOG.md file (See
https://keepachangelog.com/)

---------

Co-authored-by: Francisco Aranda <francis@argilla.io>
@alvarobartt
Copy link
Member Author

alvarobartt commented Jul 28, 2023

The code below works fine, the only missing piece is fetch_records, that won't be compatible with the upcoming approach of _ArgillaFeedbackRecords to detach the record-related functionality to a separate class i.e. dataset.records.add would be used via the shortcut dataset.add_records

import argilla as rg

ds = rg.FeedbackDataset(...) # Initialises `FeedbackDataset` as always

records = [...]
ds.add_records(records) # Adds records locally

for record in ds.records: # Loops over the local records
    print(record)

print(ds.records[0]) # Prints the record with index 0 locally

rg.init(
    api_url="...",
    api_key="...",
)

ds.push_to_argilla("my-dataset", workspace="alvaro") # Pushes the `FeedbackDataset` to Argilla, and remaps the class instance to be now of type `_ArgillaFeedbackDataset`, also returns it, but to keep backwards compatibility we're also remapping it for the moment

ds = rg.FeedbackDataset.from_argilla("my-dataset", workspace="alvaro") # Retrieves the `FeedbackDataset` from Argilla and returns an `_ArgillaFeedbackDataset` instance

for record in ds: # Iters over the records of the `FeedbackDataset` in Argilla (yield not return)
    print(record.id)

print(ds[0]) # Prints the record with index 0 in the `FeedbackDataset` in Argilla
print(ds[1:4]) # Prints the records within the specified slice in the `FeedbackDataset` in Argilla
print(len(ds)) # Prints the total number of records in the `FeedbackDataset` in Argilla

ds.add_records(...) # Adds records directly in Argilla

ds.push_to_argilla("my-dataset", workspace="alvaro") # Raises a `DeprecationWarning` and mentions that updates are automatic

@alvarobartt alvarobartt self-assigned this Jul 28, 2023
@alvarobartt alvarobartt added type: breaking changes This issue or PR may include breaking changes in the code type: deprecation Indicates a feature that will be deprecated and/or support will be dropped labels Jul 28, 2023
@alvarobartt alvarobartt added this to the v1.14.0 milestone Jul 28, 2023
@alvarobartt alvarobartt changed the title refactor: add FeedbackDatasetBase and _ArgillaFeedbackDataset while keeping FeedbackDataset just for local refactor: add FeedbackDatasetBase and RemoteFeedbackDataset while keeping FeedbackDataset just for local Aug 3, 2023
alvarobartt and others added 2 commits August 3, 2023 12:32
Co-authored-by: Francis Aranda <francis@argilla.io>
Co-authored-by: Gabriel Martin <gabriel@argilla.io>
alvarobartt and others added 5 commits August 3, 2023 16:52
Co-authored-by: Gabriel Martín Blázquez <gmartinbdev@gmail.com>
Due to the same message being formatted differently depending on the Python version e.g. `abstract methods records` in Python 3.10, but `abstract method records` in Python 3.7
@gabrielmbmb gabrielmbmb merged commit cd65a0f into develop Aug 4, 2023
14 of 15 checks passed
@gabrielmbmb gabrielmbmb deleted the refactor/split-local-and-remote-dataset branch August 4, 2023 09:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type: breaking changes This issue or PR may include breaking changes in the code type: deprecation Indicates a feature that will be deprecated and/or support will be dropped
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Python client FeedbackDataset refactor to simplify new user workflows implementation
3 participants