
Merge branch 'develop' into styles/ui-fine-tuning
* develop:
  feat: field `PATCH` endpoint (#3421)
  feat: add dataset `PATCH` endpoint (#3402)
  fix: import errors when importing from `argilla.feedback` (#3471)
  feat: bump version to `1.13.3`
  docs: update example of listing users with python client (#3454)
  docs: Resolve typos, missing import (#3443)
leire committed Jul 31, 2023
2 parents 72aeafa + 642f322 commit a36dcc7
Showing 20 changed files with 583 additions and 146 deletions.
12 changes: 12 additions & 0 deletions CHANGELOG.md
@@ -16,6 +16,11 @@ These are the section headers that we use:

## [Unreleased]

### Added

- Added `PATCH /api/v1/fields/{field_id}` endpoint to update the field title and markdown settings (See [#3421](https://github.com/argilla-io/argilla/pull/3421)).
- Added `PATCH /api/v1/datasets/{dataset_id}` endpoint to update dataset name and guidelines (See [#3402](https://github.com/argilla-io/argilla/pull/3402)).
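
For illustration, a minimal sketch of how these new endpoints could be called over HTTP with `requests`. The payload keys and the `X-Argilla-Api-Key` header are assumptions based on the changelog entries and Argilla defaults, not a confirmed contract; the UUIDs are placeholders.

```python
# Hedged sketch: exercising the new PATCH endpoints with `requests`.
# Payload keys, header name, and IDs below are assumptions, not a
# confirmed API contract.
import requests

API_URL = "http://localhost:6900"                  # assumed local server
HEADERS = {"X-Argilla-Api-Key": "argilla.apikey"}  # assumed default key

dataset_id = "00000000-0000-0000-0000-000000000000"  # placeholder UUID
response = requests.patch(
    f"{API_URL}/api/v1/datasets/{dataset_id}",
    headers=HEADERS,
    json={"name": "my-renamed-dataset", "guidelines": "Updated guidelines."},
)
response.raise_for_status()

field_id = "00000000-0000-0000-0000-000000000001"  # placeholder UUID
response = requests.patch(
    f"{API_URL}/api/v1/fields/{field_id}",
    headers=HEADERS,
    json={"title": "New title", "settings": {"type": "text", "use_markdown": True}},
)
response.raise_for_status()
```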

### Changed

- Improved efficiency of weak labeling when dataset contains vectors ([#3444](https://github.com/argilla-io/argilla/pull/3444)).
@@ -25,6 +30,13 @@ These are the section headers that we use:
- Updated alembic code to apply migrations using the database async engine ([#3450](https://github.com/argilla-io/argilla/pull/3450)).
- Limited rating question values to the positive range [1, 10] (Closes [#3451](https://github.com/argilla-io/argilla/issues/3451)).

## [1.13.3](https://github.com/argilla-io/argilla/compare/v1.13.2...v1.13.3)

### Fixed

- Fixed `ModuleNotFoundError` raised when the `argilla.utils.telemetry` module used in the `ArgillaTrainer` imported an optional dependency that is not installed by default ([#3471](https://github.com/argilla-io/argilla/pull/3471)).
- Fixed `ImportError` raised when the `argilla.client.feedback.config` module imported the optional `pyyaml` dependency, which is not installed by default ([#3471](https://github.com/argilla-io/argilla/pull/3471)).
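
Both fixes move the offending imports from module level into the functions that actually need them (visible in the `dataset.py` diff below). A minimal sketch of the deferred-import pattern, with illustrative names:

```python
# Hedged sketch of the deferred-import pattern behind these fixes:
# importing the module itself never touches the optional dependency.

def export_config(path: str) -> None:
    # Import the optional dependency only when the feature is used, so a
    # plain package import cannot fail on missing extras.
    try:
        import yaml  # optional dependency, e.g. installed via an extra
    except ImportError as e:
        raise ImportError(
            "`pyyaml` is required for this feature; install it with `pip install pyyaml`."
        ) from e

    with open(path, "w") as f:
        yaml.safe_dump({"version": 1}, f)
```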

## [1.13.2](https://github.com/argilla-io/argilla/compare/v1.13.1...v1.13.2)

### Fixed
7 changes: 4 additions & 3 deletions docs/_source/guides/train_a_model.md
@@ -89,6 +89,7 @@ Options:

```python
import argilla as rg
from argilla.training import ArgillaTrainer
from datasets import load_dataset

dataset_rg = rg.DatasetForTokenClassification.from_datasets(
@@ -126,18 +127,18 @@ It is possible to directly include train-test splits to the `prepare_for_trainin
*TextClassification*

For text classification tasks, it flattens the inputs into separate columns of the returned dataset, converts the annotations of your records into integers, and writes them to a `label` column:
By passing the `framework` variable as `setfit`, `transformers`, `spark-nlp` or `spacy`. This task requires a `DatastForTextClassification`.
By passing the `framework` variable as `setfit`, `transformers`, `spark-nlp` or `spacy`. This task requires a `DatasetForTextClassification`.


*TokenClassification*

For token classification tasks, it converts the annotations of a record into integers representing BIO tags and writes them in a `ner_tags` column:
By passing the `framework` variable as `transformers`, `spark-nlp` or `spacy`. This task requires a `DatastForTokenClassification`.
By passing the `framework` variable as `transformers`, `spark-nlp` or `spacy`. This task requires a `DatasetForTokenClassification`.

*Text2Text*

For text generation tasks like summarization and translation, it converts the annotations of a record into `text` and `target` columns.
By passing the `framework` variable as `transformers` and `spark-nlp`. This task requires a `DatastForText2Text`.
By passing the `framework` variable as `transformers` and `spark-nlp`. This task requires a `DatasetForText2Text`.

*Feedback*
For feedback-oriented datasets, we currently rely on a fully customizable workflow, which means automation is limited and still being designed.
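
To tie the above together, a rough usage sketch of preparing a dataset with a train-test split; the dataset name and split size are placeholders, and the exact return type depends on the chosen framework:

```python
# Hedged usage sketch: preparing an Argilla dataset for training with a
# train-test split. The dataset name and split size are placeholders.
import argilla as rg

dataset_rg = rg.load("my-token-classification-dataset")
prepared = dataset_rg.prepare_for_training(framework="transformers", train_size=0.8)
# When a split is requested, the prepared dataset contains train/test
# subsets that can be fed to the corresponding framework's trainer.
print(prepared)
```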
@@ -1,7 +1,6 @@
{
"cells": [
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
@@ -16,7 +15,6 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
@@ -38,7 +36,6 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
@@ -70,7 +67,6 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
@@ -89,7 +85,6 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
@@ -106,7 +101,6 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
@@ -128,7 +122,6 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
@@ -138,7 +131,6 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
@@ -159,7 +151,6 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
@@ -181,15 +172,13 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"Learn more about loading and creating Argilla datasets [here](../../guides/log_load_and_prepare_data.ipynb)."
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
@@ -198,6 +187,32 @@
"As a first step, we want to get the list of the users that will be annotating our dataset."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# get workspace where the dataset is (or will be) located\n",
"ws = rg.Workspace.from_name(\"my_workspace\")\n",
"# get the list of users with access to the workspace\n",
"# make sure that all users that will work on the dataset have access to the workspace\n",
"# optional: filter users to get only those with annotator role\n",
"users = [u for u in rg.User.list() if u.role == \"annotator\"]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<div class=\"alert alert-info\">\n",
"\n",
"**Note**\n",
"\n",
"If you are using a version earlier than 1.11.0 you will need to call the API directly to get the list of users as is done in the following cell. Note that, in that case, users will be returned as dictionaries and so `users.username` will be `users['username']` instead.\n",
"</div>"
]
},
{
"cell_type": "code",
"execution_count": null,
@@ -218,7 +233,6 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
@@ -238,7 +252,6 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
@@ -261,19 +274,17 @@
"chunked_records = [ds[i:i + n] for i in range(0, len(ds), n)]\n",
"for chunk in chunked_records:\n",
" for idx, record in enumerate(chunk):\n",
" assignments[users[idx]['username']].append(record)"
" assignments[users[idx].username].append(record)"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"## 💾 Log your dataset and assignments in Argilla "
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
@@ -306,7 +317,6 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
@@ -334,15 +344,13 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"Tip: If you plan to have more than one user annotating the same record, we recommend adding an ID to each record before splitting them into several datasets. That way you will be able to retrieve the different annotations for each record when postprocessing the datasets."
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
@@ -358,7 +366,6 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
@@ -372,7 +379,7 @@
],
"metadata": {
"kernelspec": {
"display_name": "argilla",
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
Expand All @@ -386,7 +393,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.12"
"version": "3.8.10"
},
"vscode": {
"interpreter": {
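Distilled from the notebook above, the assignment logic is a simple round-robin deal: chunk the records into groups of `len(users)` and give each user one record per chunk. A self-contained sketch with stand-in data:

```python
# Self-contained sketch of the notebook's round-robin assignment logic;
# users and records are stand-ins for rg.User objects and Argilla records.
from collections import defaultdict

users = ["ann", "bob", "cho"]
records = list(range(10))
n = len(users)

assignments = defaultdict(list)
chunked_records = [records[i : i + n] for i in range(0, len(records), n)]
for chunk in chunked_records:
    for user, record in zip(users, chunk):
        assignments[user].append(record)

print(dict(assignments))
# {'ann': [0, 3, 6, 9], 'bob': [1, 4, 7], 'cho': [2, 5, 8]}
```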
10 changes: 9 additions & 1 deletion src/argilla/client/feedback/integrations/huggingface/dataset.py
@@ -20,7 +20,6 @@

from packaging.version import parse as parse_version

from argilla.client.feedback.config import DatasetConfig, DeprecatedDatasetConfig
from argilla.client.feedback.constants import FIELD_TYPE_TO_PYTHON_TYPE
from argilla.client.feedback.schemas import FeedbackRecord
from argilla.client.feedback.types import AllowedQuestionTypes
@@ -188,6 +187,9 @@ def push_to_huggingface(
    import huggingface_hub
    from huggingface_hub import DatasetCardData, HfApi

    # https://github.com/argilla-io/argilla/issues/3468
    from argilla.client.feedback.config import DatasetConfig

    if parse_version(huggingface_hub.__version__) < parse_version("0.14.0"):
        _LOGGER.warning(
            "Recommended `huggingface_hub` version is 0.14.0 or higher, and you have"
@@ -261,6 +263,12 @@ def from_huggingface(cls: Type["FeedbackDataset"], repo_id: str, *args: Any, **k
    from huggingface_hub import hf_hub_download
    from huggingface_hub.utils import EntryNotFoundError

    # https://github.com/argilla-io/argilla/issues/3468
    from argilla.client.feedback.config import (
        DatasetConfig,
        DeprecatedDatasetConfig,
    )

    if parse_version(huggingface_hub.__version__) < parse_version("0.14.0"):
        _LOGGER.warning(
            "Recommended `huggingface_hub` version is 0.14.0 or higher, and you have"
18 changes: 17 additions & 1 deletion src/argilla/server/apis/v1/handlers/datasets.py
@@ -28,6 +28,7 @@
    Dataset,
    DatasetCreate,
    Datasets,
    DatasetUpdate,
    Field,
    FieldCreate,
    Fields,
@@ -378,7 +379,7 @@ async def publish_dataset(
async def delete_dataset(
    *,
    db: AsyncSession = Depends(get_async_db),
    search_engine=Depends(get_search_engine),
    search_engine: SearchEngine = Depends(get_search_engine),
    dataset_id: UUID,
    current_user: User = Security(auth.get_current_user),
):
@@ -389,3 +390,18 @@ async def delete_dataset(
    await datasets.delete_dataset(db, search_engine, dataset=dataset)

    return dataset


@router.patch("/datasets/{dataset_id}", response_model=Dataset)
async def update_dataset(
    *,
    db: AsyncSession = Depends(get_async_db),
    dataset_id: UUID,
    dataset_update: DatasetUpdate,
    current_user: User = Security(auth.get_current_user),
):
    dataset = await _get_dataset(db, dataset_id)

    await authorize(current_user, DatasetPolicyV1.update(dataset))

    return await datasets.update_dataset(db, dataset=dataset, dataset_update=dataset_update)
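
The `DatasetUpdate` schema itself is not part of this diff; judging from the changelog entry, it would be a small Pydantic model with optional fields so the handler can apply partial updates. A hedged sketch, not the actual definition:

```python
# Hypothetical sketch of a `DatasetUpdate` schema; the real definition is
# not shown in this diff and may differ.
from typing import Optional

from pydantic import BaseModel


class DatasetUpdate(BaseModel):
    name: Optional[str] = None
    guidelines: Optional[str] = None

    class Config:
        extra = "forbid"  # reject unknown keys in the PATCH body


# A handler can then apply only the keys the client actually sent, e.g.
# via dataset_update.dict(exclude_unset=True).
```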
34 changes: 27 additions & 7 deletions src/argilla/server/apis/v1/handlers/fields.py
@@ -20,25 +20,45 @@
from argilla.server.contexts import datasets
from argilla.server.database import get_async_db
from argilla.server.policies import FieldPolicyV1, authorize
from argilla.server.schemas.v1.fields import Field
from argilla.server.schemas.v1.fields import Field, FieldUpdate
from argilla.server.security import auth
from argilla.server.security.model import User

router = APIRouter(tags=["fields"])


@router.delete("/fields/{field_id}", response_model=Field)
async def delete_field(
    *, db: AsyncSession = Depends(get_async_db), field_id: UUID, current_user: User = Security(auth.get_current_user)
):
async def _get_field(db: "AsyncSession", field_id: UUID) -> Field:
    field = await datasets.get_field_by_id(db, field_id)

    await authorize(current_user, FieldPolicyV1.delete(field))
    if not field:
        raise HTTPException(
            status_code=status.HTTP_404_NOT_FOUND,
            detail=f"Field with id `{field_id}` not found",
        )
    return field


@router.patch("/fields/{field_id}", response_model=Field)
async def update_field(
    *,
    db: AsyncSession = Depends(get_async_db),
    field_id: UUID,
    field_update: FieldUpdate,
    current_user: User = Security(auth.get_current_user),
):
    field = await _get_field(db, field_id)

    await authorize(current_user, FieldPolicyV1.update(field))

    return await datasets.update_field(db, field, field_update)


@router.delete("/fields/{field_id}", response_model=Field)
async def delete_field(
    *, db: AsyncSession = Depends(get_async_db), field_id: UUID, current_user: User = Security(auth.get_current_user)
):
    field = await _get_field(db, field_id)

    await authorize(current_user, FieldPolicyV1.delete(field))

    # TODO: We should split API v1 into different FastAPI apps so we can customize error management.
    # After mapping ValueError to 422 errors for API v1 then we can remove this try except.
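`FieldUpdate` (imported above) is likewise not shown here; the context function `datasets.update_field` presumably applies a partial update from it. A hedged sketch of that pattern, not the actual implementation:

```python
# Hypothetical sketch of a partial-update helper in the datasets context;
# the real `datasets.update_field` is not part of this diff and may differ.
async def update_field(db, field, field_update):
    # `exclude_unset=True` skips keys the client did not send, so a PATCH
    # with only `title` leaves the field's `settings` untouched.
    for key, value in field_update.dict(exclude_unset=True).items():
        setattr(field, key, value)
    db.add(field)
    await db.commit()
    await db.refresh(field)
    return field
```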