All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
- Fixed PostgreSQL database not being updated after `begin_nested` because of a missing `commit` (#3567); a minimal illustration follows below.
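The fix above boils down to standard SQLAlchemy transaction semantics: a SAVEPOINT opened with `begin_nested()` only becomes durable once the enclosing transaction is committed. A minimal, illustrative sketch (not the actual Argilla server code; SQLite is used here as a stand-in for PostgreSQL):

```python
from sqlalchemy import create_engine, text
from sqlalchemy.orm import Session

engine = create_engine("sqlite:///:memory:")  # stand-in for the real PostgreSQL URL

with Session(engine) as session:
    session.execute(text("CREATE TABLE items (name TEXT)"))
    with session.begin_nested():  # opens a SAVEPOINT
        session.execute(text("INSERT INTO items VALUES ('example')"))
    # Without this outer commit the session is rolled back on exit and the
    # insert never reaches the database, which is the bug described above.
    session.commit()
```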
- Added `PATCH /api/v1/fields/{field_id}` endpoint to update the field title and markdown settings (#3421).
- Added `PATCH /api/v1/datasets/{dataset_id}` endpoint to update dataset name and guidelines (#3402).
- Added `PATCH /api/v1/questions/{question_id}` endpoint to update question title, description and some settings (depending on the type of question) (#3477).
- Added `DELETE /api/v1/records/{record_id}` endpoint to remove a record given its ID (#3337).
- Added `pull` method in `RemoteFeedbackDataset` (a `FeedbackDataset` pushed to Argilla) to pull all the records from it and return them as a local copy, i.e. a `FeedbackDataset` (#3465); see the sketch after this list.
- Added `delete` method in `RemoteFeedbackDataset` (a `FeedbackDataset` pushed to Argilla) (#3512).
- Added `delete_records` method in `RemoteFeedbackDataset`, and `delete` method in `RemoteFeedbackRecord`, to delete records from Argilla (#3526).
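A hedged sketch of the new remote-dataset methods above; the API URL, API key and dataset/workspace names are placeholders, and exact signatures may differ slightly:

```python
import argilla as rg

rg.init(api_url="http://localhost:6900", api_key="owner.apikey")  # placeholder credentials

remote = rg.FeedbackDataset.from_argilla(name="my-dataset", workspace="my-workspace")

local_copy = remote.pull()                        # local FeedbackDataset with all the records
remote.delete_records(list(remote.records)[:10])  # remove a batch of records from Argilla
remote.delete()                                   # delete the whole dataset from Argilla
```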
- Improved efficiency of weak labeling when dataset contains vectors (#3444).
- Added `ArgillaDatasetMixin` to detach the Argilla-related functionality from the `FeedbackDataset` (#3427).
- Moved `FeedbackDataset`-related `pydantic.BaseModel` schemas to `argilla.client.feedback.schemas` instead, to be better structured and more scalable and maintainable (#3427).
- Updated CLI to use an async database connection (#3450).
- Limit rating questions values to the positive range [1, 10] (#3451).
- Updated `POST /api/users` endpoint to be able to provide a list of workspace names to which the user should be linked (#3462).
- Updated Python client `User.create` method to be able to provide a list of workspace names to which the user should be linked (#3462).
- Updated `GET /api/v1/me/datasets/{dataset_id}/records` endpoint to allow getting records matching one of the response statuses provided via query param (#3359).
- Updated `POST /api/v1/me/datasets/{dataset_id}/records` endpoint to allow searching records matching one of the response statuses provided via query param (#3359).
- Updated `SearchEngine.search` method to allow searching records matching one of the response statuses provided (#3359).
- After calling `FeedbackDataset.push_to_argilla`, the methods `FeedbackDataset.add_records` and `FeedbackRecord.set_suggestions` will automatically call Argilla with no need of calling `push_to_argilla` explicitly (#3465); see the sketch after this list.
- Now calling `FeedbackDataset.push_to_huggingface` dumps the `responses` as a `List[Dict[str, Any]]` instead of a `Sequence`, to make it more readable via 🤗 `datasets` (#3539).
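A hedged sketch of the auto-sync behaviour described above; the field, question and dataset names are made up for illustration:

```python
import argilla as rg

rg.init(api_url="http://localhost:6900", api_key="owner.apikey")  # placeholder credentials

dataset = rg.FeedbackDataset(
    fields=[rg.TextField(name="text")],
    questions=[rg.TextQuestion(name="answer")],
)
dataset.push_to_argilla(name="demo", workspace="my-workspace")

# After the first push, add_records talks to Argilla directly:
# no further push_to_argilla call is needed.
dataset.add_records([rg.FeedbackRecord(fields={"text": "Hello world"})])
```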
- Fixed issue with `bool` values and `default` from Jinja2 while generating the HuggingFace `DatasetCard` from `argilla_template.md` (#3499).
- Fixed `DatasetConfig.from_yaml`, which was failing when calling `FeedbackDataset.from_huggingface` as the UUIDs cannot be deserialized automatically by `PyYAML`, so UUIDs are neither dumped nor loaded anymore (#3502).
- Fixed an issue that didn't allow the Argilla server to work behind a proxy (#3543).
- `TextClassificationSettings` and `TokenClassificationSettings` labels are properly parsed to strings both in the Python client and in the backend endpoint (#3495).
- Fixed `PUT /api/v1/datasets/{dataset_id}/publish` to check whether at least one field and question has `required=True` (#3511).
- Fixed `FeedbackDataset.from_huggingface` as `suggestions` were being lost when there were no `responses` (#3539).
- Fixed `QuestionSchema` and `FieldSchema` not validating the `name` attribute (#3550).
- After calling `FeedbackDataset.push_to_argilla`, calling `push_to_argilla` again won't do anything since the dataset is already pushed to Argilla (#3465).
- After calling `FeedbackDataset.push_to_argilla`, calling `fetch_records` won't do anything since the records are lazily fetched from Argilla (#3465).
- After calling `FeedbackDataset.push_to_argilla`, the Argilla ID is no longer stored in the attribute/property `argilla_id` but in `id` instead (#3465).
- Fixed `ModuleNotFoundError` caused because the `argilla.utils.telemetry` module used in the `ArgillaTrainer` was importing an optional dependency not installed by default (#3471).
- Fixed `ImportError` caused because the `argilla.client.feedback.config` module was importing the `pyyaml` optional dependency not installed by default (#3471).
- The `suggestion_type_enum` ENUM data type created in PostgreSQL didn't have any value (#3445).
- Fixed database migration for PostgreSQL (see #3438).
- Added `GET /api/v1/users/{user_id}/workspaces` endpoint to list the workspaces to which a user belongs (#3308 and #3343).
- Added `HuggingFaceDatasetMixin` for internal usage, to detach the `FeedbackDataset` integrations from the class itself and use mixins instead (#3326).
- Added `GET /api/v1/records/{record_id}/suggestions` API endpoint to get the list of suggestions for the responses associated to a record (#3304).
- Added `POST /api/v1/records/{record_id}/suggestions` API endpoint to create a suggestion for a response associated to a record (#3304).
- Added support for `RankingQuestionStrategy`, `RankingQuestionUnification` and the `.for_text_classification` method for the `TrainingTaskMapping` (#3364).
- Added `PUT /api/v1/records/{record_id}/suggestions` API endpoint to create or update a suggestion for a response associated to a record (#3304 & #3391).
- Added `suggestions` attribute to `FeedbackRecord`, and allow adding and retrieving suggestions from the Python client (#3370); see the sketch after this list.
- Added `allowed_for_roles` Python decorator to check whether the current user has the required role to access the decorated function/method, for `User` and `Workspace` (#3383).
- Added API and Python client support for workspace deletion (Closes #3260).
- Added `GET /api/v1/me/workspaces` endpoint to list the workspaces of the current active user (#3390).
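A hedged sketch of attaching a suggestion to a record from the Python client; the question name, the suggestion payload keys and passing suggestions via the constructor are assumptions for illustration:

```python
import argilla as rg

record = rg.FeedbackRecord(
    fields={"text": "Argilla is an open-source platform for data-centric NLP."},
    # Each suggestion points at a question by name and carries a suggested value.
    suggestions=[{"question_name": "answer", "value": "data-centric NLP platform"}],
)
print(record.suggestions)  # suggestions can also be read back from the record
```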
- Updated output payload for the `GET /api/v1/datasets/{dataset_id}/records`, `GET /api/v1/me/datasets/{dataset_id}/records` and `POST /api/v1/me/datasets/{dataset_id}/records/search` endpoints to include the suggestions of the records, based on the value of the `include` query parameter (#3304).
- Updated `POST /api/v1/datasets/{dataset_id}/records` input payload to add suggestions (#3304).
- The `POST /api/datasets/:dataset-id/:task/bulk` endpoints don't create the dataset if it does not exist (Closes #3244).
- Added telemetry support for `ArgillaTrainer` (closes #3325).
- `User.workspaces` is no longer an attribute but a property, and calls `list_user_workspaces` to list all the workspace names for a given user ID (#3334).
- Renamed `FeedbackDatasetConfig` to `DatasetConfig` and export/import from YAML as default instead of JSON (just used internally on the `push_to_huggingface` and `from_huggingface` methods of `FeedbackDataset`) (#3326).
- The protected metadata fields now support non-textual info as well; existing datasets must be reindexed. See docs for more detail (Closes #3332).
- Updated `Dockerfile` parent image from `python:3.9.16-slim` to `python:3.10.12-slim` (#3425).
- Updated `quickstart.Dockerfile` parent image from `elasticsearch:8.5.3` to `argilla/argilla-server:${ARGILLA_VERSION}` (#3425).
- Removed support for non-prefixed environment variables. All valid env vars start with `ARGILLA_` (see #3392).
- Fixed `GET /api/v1/me/datasets/{dataset_id}/records` endpoint always returning the responses for the records even if `responses` was not provided via the `include` query parameter (#3304).
- Values for protected metadata fields are not truncated (Closes #3331).
- Big number IDs are properly rendered in UI (Closes #3265).
- Fixed `ArgillaDatasetCard` to include the values/labels for all the existing questions (#3366).
- Integer support for record id in text classification, token classification and text2text datasets.
- Using `rg.init` with the default `argilla` user skips setting the default workspace if not available (Closes #3340).
- Resolved wrong import structure for `ArgillaTrainer` and `TrainingTaskMapping` (Closes #3345).
- Pinned the pydantic dependency to version < 2 (Closes #3348).
- Added `RankingQuestionSettings` class allowing to create ranking questions in the API using the `POST /api/v1/datasets/{dataset_id}/questions` endpoint (#3232).
- Added `RankingQuestion` in the Python client to create ranking questions (#3275); see the sketch after this list.
- Added `Ranking` component in the feedback task question form (#3177 & #3246).
- Added `FeedbackDataset.prepare_for_training` method for generating a framework-specific dataset with the responses provided for `RatingQuestion`, `LabelQuestion` and `MultiLabelQuestion` (#3151).
- Added `ArgillaSpaCyTransformersTrainer` class for supporting training with `spacy-transformers` (#3256).
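A hedged sketch of defining a ranking question with the new Python client class; the field, question and option names are illustrative only:

```python
import argilla as rg

dataset = rg.FeedbackDataset(
    fields=[rg.TextField(name="text")],
    questions=[
        rg.RankingQuestion(
            name="preference",
            title="Rank the candidate responses from best to worst",
            values=["response-1", "response-2", "response-3"],
        )
    ],
)
```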
- Added instructions for how to run the Argilla frontend in the developer docs (#3314).
- All Docker-related files have been moved into the `docker` folder (#3053).
- `release.Dockerfile` has been renamed to `Dockerfile` (#3133).
- Updated `rg.load` function to raise a `ValueError` with an explanatory message for the cases in which the user tries to use the function to load a `FeedbackDataset` (#3289).
- Updated `ArgillaSpaCyTrainer` to allow re-using `tok2vec` (#3256).
- Check available workspaces on Argilla on `rg.set_workspace` (Closes #3262).
- Replaced the `np.float` alias by `float` to avoid an `AttributeError` when using the `find_label_errors` function with `numpy>=1.24.0` (#3214).
- Fixed `format_as("datasets")` when there are no responses or optional responses in a `FeedbackRecord`, to set their value to what 🤗 Datasets expects instead of just `None` (#3224).
- Fixed `push_to_huggingface()` when `generate_card=True` (default behaviour), as we were passing a sample record to the `ArgillaDatasetCard` class, and the `UUID`s introduced in 1.10.0 (#3192) are not JSON-serializable (#3231).
- Fixed `from_argilla` and `push_to_argilla` to ensure consistency on both field and question re-construction, and to ensure `UUID`s are properly serialized as `str`, respectively (#3234).
- Refactored usage of `import argilla as rg` to clarify package navigation (#3279).
- Fixed URLs in the Weak Supervision with Sentence Transformers tutorial (#3243).
- Fixed library buttons' formatting on Tutorials page (#3255).
- Modified styling of error code outputs in notebooks (#3270).
- Added ElasticSearch and OpenSearch versions (#3280).
- Removed template notebook from table of contents (#3271).
- Fixed tutorials with `pip install argilla` to not use older versions of the package (#3282).
- Added `metadata` attribute to the `Record` of the `FeedbackDataset` (#3194).
- New `users update` command to update the role for an existing user (#3188).
- New `Workspace` class to allow users to manage their Argilla workspaces and the users assigned to those workspaces via the Python client (#3180); see the sketch after this list.
- Added `User` class to let users manage their Argilla users via the Python client (#3169).
- Added an option to display a `tqdm` progress bar to `FeedbackDataset.push_to_argilla` when looping over the records to upload (#3233).
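A hedged sketch of the new `Workspace` and `User` management classes; credentials and names are placeholders, and keyword arguments may differ slightly:

```python
import argilla as rg

rg.init(api_url="http://localhost:6900", api_key="owner.apikey")  # placeholder credentials

workspace = rg.Workspace.create("ml-team")
user = rg.User.create(
    username="jane",
    first_name="Jane",
    password="sup3r-s3cret-pass",
    role="annotator",
)
workspace.add_user(user.id)  # assign the new annotator to the workspace
```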
- The role system now supports three different roles: `owner`, `admin` and `annotator` (#3104).
- The `admin` role is scoped to workspace-level operations (#3115).
- The `owner` user is created among the default pool of users in the quickstart, and the default user in the server now has the `owner` role (#3248), reverting (#3188).
- As of Python 3.7 end-of-life (EOL) on 2023-06-27, Argilla will no longer support Python 3.7 (#3188). More information at https://peps.python.org/pep-0537/
- Added search component for feedback datasets (#3138)
- Added markdown support for feedback dataset guidelines (#3153)
- Added Train button for feedback datasets (#3170)
- Updated `SearchEngine` and `POST /api/v1/me/datasets/{dataset_id}/records/search` to return the `total` number of records matching the search query (#3166).
- Replaced Enum with string values in URLs for client API calls (Closes #3149).
- Resolved a breaking issue with `ArgillaSpanMarkerTrainer` for Named Entity Recognition with `span_marker` v1.1.x onwards.
- Moved the `ArgillaDatasetCard` import under the `@requires_version` decorator, so that the `ImportError` on `huggingface_hub` is handled properly (#3174).
- Allow the flow `FeedbackDataset.from_argilla` -> `FeedbackDataset.push_to_argilla` under different dataset names and/or workspaces (#3192).
- Added boolean `use_markdown` property to the `TextFieldSettings` model.
- Added boolean `use_markdown` property to the `TextQuestionSettings` model.
- Added new status `draft` for the `Response` model.
- Added `LabelSelectionQuestionSettings` class allowing to create label selection (single-choice) questions in the API (#3005); see the sketch after this list.
- Added `MultiLabelSelectionQuestionSettings` class allowing to create multi-label selection (multi-choice) questions in the API (#3010).
- Added `POST /api/v1/me/datasets/{dataset_id}/records/search` endpoint (#3068).
- Added new components in the feedback task question form: MultiLabel (#3064) and SingleLabel (#3016).
- Added docstrings to the `pydantic.BaseModel`s defined at `argilla/client/feedback/schemas.py` (#3137).
- Added the information about executing tests in the developer documentation (#3143).
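A hedged sketch of creating a single-choice question through the HTTP API; the base URL, API key header, dataset ID, and the exact payload layout (`type`, `options`) are assumptions based on the settings classes above:

```python
import requests

API_URL = "http://localhost:6900"                    # placeholder
HEADERS = {"X-Argilla-API-Key": "owner.apikey"}      # placeholder API key
DATASET_ID = "00000000-0000-0000-0000-000000000000"  # placeholder dataset UUID

payload = {
    "name": "sentiment",
    "title": "What is the sentiment of the text?",
    "required": True,
    "settings": {
        "type": "label_selection",
        "options": [
            {"value": "positive", "text": "Positive"},
            {"value": "negative", "text": "Negative"},
        ],
    },
}
response = requests.post(
    f"{API_URL}/api/v1/datasets/{DATASET_ID}/questions",
    json=payload,
    headers=HEADERS,
)
response.raise_for_status()
```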
- Updated `GET /api/v1/me/datasets/:dataset_id/metrics` output payload to include the count of responses with `draft` status.
- Added `LabelSelectionQuestionSettings` class allowing to create label selection (single-choice) questions in the API.
- Added `MultiLabelSelectionQuestionSettings` class allowing to create multi-label selection (multi-choice) questions in the API.
- Database setup for unit tests. Now the unit tests use a different database than the one used by the local Argilla server (Closes #2987).
- Updated `alembic` setup to be able to autogenerate revision/migration scripts using SQLAlchemy metadata from Argilla server models (#3044).
- Improved `DatasetCard` generation on `FeedbackDataset.push_to_huggingface` when `generate_card=True`, following the official HuggingFace Hub template, but suited to `FeedbackDataset`s from Argilla (#3110).
- Disallow `fields` and `questions` in `FeedbackDataset` with the same name (#3126).
- Fixed broken links in the documentation and updated the development branch name from `development` to `develop` (#3145).
- `/api/v1/datasets` new endpoint to list and create datasets (#2615).
- `/api/v1/datasets/{dataset_id}` new endpoint to get and delete datasets (#2615).
- `/api/v1/datasets/{dataset_id}/publish` new endpoint to publish a dataset (#2615).
- `/api/v1/datasets/{dataset_id}/questions` new endpoint to list and create dataset questions (#2615).
- `/api/v1/datasets/{dataset_id}/fields` new endpoint to list and create dataset fields (#2615).
- `/api/v1/datasets/{dataset_id}/questions/{question_id}` new endpoint to delete a dataset question (#2615).
- `/api/v1/datasets/{dataset_id}/fields/{field_id}` new endpoint to delete a dataset field (#2615).
- `/api/v1/workspaces/{workspace_id}` new endpoint to get workspaces by ID (#2615).
- `/api/v1/responses/{response_id}` new endpoint to update and delete a response (#2615).
- `/api/v1/datasets/{dataset_id}/records` new endpoint to create and list dataset records (#2615).
- `/api/v1/me/datasets` new endpoint to list user-visible datasets (#2615); see the sketch after this list.
- `/api/v1/me/dataset/{dataset_id}/records` new endpoint to list dataset records with user responses (#2615).
- `/api/v1/me/datasets/{dataset_id}/metrics` new endpoint to get the dataset user metrics (#2615).
- `/api/v1/me/records/{record_id}/responses` new endpoint to create record user responses (#2615).
- Showing new feedback task datasets in the datasets list (#2719).
- New page for feedback task (#2680).
- Show feedback task metrics (#2822).
- User can delete dataset in dataset settings page (#2792).
- Support for `FeedbackDataset` in Python client (parent PR #2615, and nested PRs: #2949, #2827, #2943, #2945, #2962, and #3003).
- Integration with the HuggingFace Hub (#2949).
- Added `ArgillaPeftTrainer` for text and token classification (#2854).
- Added `predict_proba()` method to `ArgillaSetFitTrainer`.
- Added `ArgillaAutoTrainTrainer` for Text Classification (#2664).
- New `database revisions` command showing database revisions info.
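A hedged sketch of calling one of the new v1 endpoints; the base URL, API key header and the response shape (an `items` list) are assumptions:

```python
import requests

API_URL = "http://localhost:6900"                # placeholder
HEADERS = {"X-Argilla-API-Key": "owner.apikey"}  # placeholder API key

response = requests.get(f"{API_URL}/api/v1/me/datasets", headers=HEADERS)
response.raise_for_status()
for dataset in response.json().get("items", []):
    print(dataset.get("id"), dataset.get("name"), dataset.get("status"))
```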
- Avoid rendering HTML for invalid HTML strings in Text2text (#2911).
- The `database migrate` command accepts a `--revision` param to provide a specific revision id.
- `tokens_length` metrics function returns empty data (#3045).
- `token_length` metrics function returns empty data (#3045).
- `mention_length` metrics function returns empty data (#3045).
- `entity_density` metrics function returns empty data (#3045).
- Using Argilla with Python 3.7 runtime is deprecated and support will be removed from version 1.11.0 (#2902)
- `tokens_length` metrics function has been deprecated and will be removed in 1.10.0 (#3045).
- `token_length` metrics function has been deprecated and will be removed in 1.10.0 (#3045).
- `mention_length` metrics function has been deprecated and will be removed in 1.10.0 (#3045).
- `entity_density` metrics function has been deprecated and will be removed in 1.10.0 (#3045).
- Removed mention `density`, `tokens_length` and `chars_length` metrics from token classification metrics storage (#3045).
- Removed token `char_start`, `char_end`, `tag`, and `score` metrics from token classification metrics storage (#3045).
- Removed tags-related metrics from token classification metrics storage (#3045).
- Added `max_retries` and `num_threads` parameters to `rg.log` to run data logging requests concurrently with a backoff retry policy (see #2458 and #2533); see the sketch after this list.
- `rg.load` accepts `include_vectors` and `include_metrics` when loading data (Closes #2398).
- Added `settings` param to `prepare_for_training` (#2689).
- Added `prepare_for_training` for `openai` (#2658).
- Added `ArgillaOpenAITrainer` (#2659).
- Added `ArgillaSpanMarkerTrainer` for Named Entity Recognition (#2693).
- Added `ArgillaTrainer` CLI support (Closes #2809).
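A hedged sketch of the new logging and loading options; the dataset name and record are examples:

```python
import argilla as rg

rg.init(api_url="http://localhost:6900", api_key="owner.apikey")  # placeholder credentials

records = [
    rg.TextClassificationRecord(text="I love this product!", prediction=[("positive", 0.9)]),
]

# Log concurrently with a backoff retry policy.
rg.log(records, name="my-dataset", num_threads=4, max_retries=3)

# Skip vectors and metrics to speed up loading.
dataset = rg.load("my-dataset", include_vectors=False, include_metrics=False)
```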
- Fixed image alignment on token classification.
- Argilla quickstart image dependencies are externalized into `quickstart.requirements.txt` (see #2666).
- Bulk endpoints will upsert data when the record `id` is present (Closes #2535).
- Moved from `click` to `typer` for CLI support (Closes #2815).
- Argilla server Docker image is built with PostgreSQL support (Closes #2686).
- `rg.log` computes all batches and raises an error for all failed batches.
- The default batch size for `rg.log` is now 100.
- `argilla.training` bugfixes and unification (#2665).
- Resolved several small bugs in the `ArgillaTrainer`.
- The `rg.log_async` function is deprecated and will be removed in the next minor release.
- `ARGILLA_HOME_PATH` new environment variable (#2564).
- `ARGILLA_DATABASE_URL` new environment variable (#2564).
- Basic support for user roles with `admin` and `annotator` (#2564).
- `id`, `first_name`, `last_name`, `role`, `inserted_at` and `updated_at` new user fields (#2564).
- `/api/users` new endpoint to list and create users (#2564).
- `/api/users/{user_id}` new endpoint to delete users (#2564).
- `/api/workspaces` new endpoint to list and create workspaces (#2564).
- `/api/workspaces/{workspace_id}/users` new endpoint to list workspace users (#2564).
- `/api/workspaces/{workspace_id}/users/{user_id}` new endpoint to create and delete workspace users (#2564).
- `argilla.tasks.users.migrate` new task to migrate users from the old YAML file to the database (#2564).
- `argilla.tasks.users.create` new task to create a user (#2564).
- `argilla.tasks.users.create_default` new task to create a user with default credentials (#2564).
- `argilla.tasks.database.migrate` new task to execute database migrations (#2564).
- `release.Dockerfile` and `quickstart.Dockerfile` now create a default `argilladata` volume to persist data (#2564).
- Add user settings page (Closes #2496).
- Added `Argilla.training` module with support for `spacy`, `setfit`, and `transformers` (Closes #2504); see the sketch below.
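A hedged sketch of the new training module; the dataset name is a placeholder and the constructor and `train` arguments follow the docs of this release line, so they may differ:

```python
import argilla as rg
from argilla.training import ArgillaTrainer

rg.init(api_url="http://localhost:6900", api_key="owner.apikey")  # placeholder credentials

trainer = ArgillaTrainer(name="text-classification-dataset", framework="setfit")
trainer.train(output_dir="setfit-model")  # fine-tunes and saves the model locally
```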
- Now the `prepare_for_training` method is working when `multi_label=True` (Closes #2606).
- `ARGILLA_USERS_DB_FILE` environment variable is now only used to migrate users from the YAML file to the database (#2564).
- `full_name` user field is now deprecated and `first_name` and `last_name` should be used instead (#2564).
- `password` user field now requires a minimum of `8` and a maximum of `100` characters in size (#2564).
- `quickstart.Dockerfile` image default users changed from `team` and `argilla` to `admin` and `annotator`, including new passwords and API keys (#2564).
- Datasets are to be managed only by users with the `admin` role (#2564).
- The list of rules is now accessible while metrics are computed (Closes #2117).
- Style updates for weak labeling and adding feedback toast when deleting rules (see #2626 and #2648).
- `email` user field (#2564).
- `disabled` user field (#2564).
- Support for private workspaces (#2564).
- `ARGILLA_LOCAL_AUTH_DEFAULT_APIKEY` and `ARGILLA_LOCAL_AUTH_DEFAULT_PASSWORD` environment variables. Use `python -m argilla.tasks.users.create_default` instead (#2564).
- The old headers for `API Key` and `workspace` from the Python client.
- The default value for the old `API Key` constant (Closes #2251).
## 1.5.1 - 2023-03-30
- Copying datasets between workspaces with proper owner/workspace info. Closes #2562
- Copy dataset with empty workspace to the default user workspace 905d4de
- Using elasticsearch config to request backend version. Closes #2311
- Remove sorting by score in labels. Closes #2622
- Update field name in metadata for image url. See #2609
- Improvements in tutorial doc cards. Closes #2216
## 1.5.0 - 2023-03-21
- Add the fields to retrieve when loading the data from Argilla; `rg.load` takes too long because of the vector field, even when users don't need it (Closes #2398).
- Add new page and components for dataset settings (Closes #2442).
- Add ability to show an image in records (for TokenClassification and TextClassification) if a URL is passed in metadata with the key `_image_url`; see the sketch after this list.
- Non-searchable fields support in metadata (#2570).
- Add record ID references to the prepare-for-training methods (Closes #2483).
- Add tutorial on Image Classification (#2420).
- Add Train button, visible for the "admin" role, with code snippets from a selection of libraries (Closes #2591).
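A hedged sketch of showing an image in a record via the `_image_url` metadata key; the image URL and dataset name are placeholders:

```python
import argilla as rg

rg.init(api_url="http://localhost:6900", api_key="owner.apikey")  # placeholder credentials

record = rg.TextClassificationRecord(
    text="A photo of a golden retriever playing in the park",
    metadata={"_image_url": "https://example.com/dog.jpg"},  # rendered as an image in the UI
)
rg.log([record], name="image-classification-demo")
```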
- Labels are now centralized in a specific Vuex ORM called the GlobalLabel Model (see #2210). This model is the same for TokenClassification and TextClassification, so both tasks have labels with `color_id` and `shortcuts` parameters in the Vuex ORM.
- The shortcuts improvement for labels (#2339) has been moved to the Vuex ORM in the dataset settings feature (#2444).
- Update "Define a labeling schema" section in docs.
- The record inputs are sorted alphabetically in UI by default (#2581).
- The record inputs are fully visible when the pagination size is one, and the height of the collapsed area is bigger for laptop screens (#2587).
- Allow URL to be clickable in Jupyter notebook again. Closes #2527
- Removed some deprecated data scan endpoints used by old clients. This change will break compatibility with clients `<v1.3.0`.
- Stopped using old deprecated scan endpoints in the Python client. This logic will break client compatibility with server versions `<1.3.0`.
- Removed the previous way to add labels through the dataset page. Now labels can be added only through the dataset settings page.