
Merge branch 'develop' into styles/ui-fine-tuning
* develop:
  feat: field `PATCH` endpoint (#3421)
  feat: add dataset `PATCH` endpoint (#3402)
  fix: import errors when importing from `argilla.feedback` (#3471)
  feat: bump version to `1.13.3`
  docs: update example of listing users with python client (#3454)
  docs: Resolve typos, missing import (#3443)
leire committed Jul 31, 2023
2 parents 72aeafa + 642f322 commit a36dcc7
Showing 20 changed files with 583 additions and 146 deletions.
12 changes: 12 additions & 0 deletions CHANGELOG.md
@@ -16,6 +16,11 @@ These are the section headers that we use:

## [Unreleased]

### Added

- Added `PATCH /api/v1/fields/{field_id}` endpoint to update the field title and markdown settings (See [#3421](https://github.com/argilla-io/argilla/pull/3421)).
- Added `PATCH /api/v1/datasets/{dataset_id}` endpoint to update dataset name and guidelines (See [#3402](https://github.com/argilla-io/argilla/pull/3402)).
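
For illustration, a minimal sketch of how these new endpoints could be called over HTTP with `requests`. The payload keys and the `X-Argilla-Api-Key` header are assumptions based on the changelog entries and Argilla defaults, not a confirmed contract; the UUIDs are placeholders.

```python
# Hedged sketch: exercising the new PATCH endpoints with `requests`.
# Payload keys, header name, and IDs below are assumptions, not a
# confirmed API contract.
import requests

API_URL = "http://localhost:6900"                  # assumed local server
HEADERS = {"X-Argilla-Api-Key": "argilla.apikey"}  # assumed default key

dataset_id = "00000000-0000-0000-0000-000000000000"  # placeholder UUID
response = requests.patch(
    f"{API_URL}/api/v1/datasets/{dataset_id}",
    headers=HEADERS,
    json={"name": "my-renamed-dataset", "guidelines": "Updated guidelines."},
)
response.raise_for_status()

field_id = "00000000-0000-0000-0000-000000000001"  # placeholder UUID
response = requests.patch(
    f"{API_URL}/api/v1/fields/{field_id}",
    headers=HEADERS,
    json={"title": "New title", "settings": {"type": "text", "use_markdown": True}},
)
response.raise_for_status()
```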

### Changed

- Improved efficiency of weak labeling when dataset contains vectors ([#3444](https://github.com/argilla-io/argilla/pull/3444)).
@@ -25,6 +30,13 @@ These are the section headers that we use:
- Updated alembic code to apply migrations using the database async engine ([#3450](https://github.com/argilla-io/argilla/pull/3450)).
- Limited rating question values to the positive range [1, 10] (Closes [#3451](https://github.com/argilla-io/argilla/issues/3451)).

## [1.13.3](https://github.com/argilla-io/argilla/compare/v1.13.2...v1.13.3)

### Fixed

- Fixed `ModuleNotFoundError` raised when the `argilla.utils.telemetry` module used in the `ArgillaTrainer` imported an optional dependency that is not installed by default ([#3471](https://github.com/argilla-io/argilla/pull/3471)).
- Fixed `ImportError` raised when the `argilla.client.feedback.config` module imported the optional `pyyaml` dependency, which is not installed by default ([#3471](https://github.com/argilla-io/argilla/pull/3471)).
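
Both fixes move the offending imports from module level into the functions that actually need them (visible in the `dataset.py` diff below). A minimal sketch of the deferred-import pattern, with illustrative names:

```python
# Hedged sketch of the deferred-import pattern behind these fixes:
# importing the module itself never touches the optional dependency.

def export_config(path: str) -> None:
    # Import the optional dependency only when the feature is used, so a
    # plain package import cannot fail on missing extras.
    try:
        import yaml  # optional dependency, e.g. installed via an extra
    except ImportError as e:
        raise ImportError(
            "`pyyaml` is required for this feature; install it with `pip install pyyaml`."
        ) from e

    with open(path, "w") as f:
        yaml.safe_dump({"version": 1}, f)
```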

## [1.13.2](https://github.com/argilla-io/argilla/compare/v1.13.1...v1.13.2)

### Fixed
7 changes: 4 additions & 3 deletions docs/_source/guides/train_a_model.md
@@ -89,6 +89,7 @@ Options:

```python
import argilla as rg
from argilla.training import ArgillaTrainer
from datasets import load_dataset

dataset_rg = rg.DatasetForTokenClassification.from_datasets(
@@ -126,18 +127,18 @@ It is possible to directly include train-test splits to the `prepare_for_trainin
*TextClassification*

For text classification tasks, it flattens the inputs into separate columns of the returned dataset, converts the annotations of your records into integers, and writes them to a `label` column:
By passing the `framework` variable as `setfit`, `transformers`, `spark-nlp` or `spacy`. This task requires a `DatastForTextClassification`.
By passing the `framework` variable as `setfit`, `transformers`, `spark-nlp` or `spacy`. This task requires a `DatasetForTextClassification`.


*TokenClassification*

For token classification tasks, it converts the annotations of a record into integers representing BIO tags and writes them in a `ner_tags` column:
By passing the `framework` variable as `transformers`, `spark-nlp` or `spacy`. This task requires a `DatastForTokenClassification`.
By passing the `framework` variable as `transformers`, `spark-nlp` or `spacy`. This task requires a `DatasetForTokenClassification`.

*Text2Text*

For text generation tasks like summarization and translation, it converts the annotations of a record into `text` and `target` columns.
By passing the `framework` variable as `transformers` and `spark-nlp`. This task requires a `DatastForText2Text`.
By passing the `framework` variable as `transformers` and `spark-nlp`. This task requires a `DatasetForText2Text`.

*Feedback*
For feedback-oriented datasets, we currently rely on a fully customizable workflow, which means automation is limited and still being designed.
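
To tie the above together, a rough usage sketch of preparing a dataset with a train-test split; the dataset name and split size are placeholders, and the exact return type depends on the chosen framework:

```python
# Hedged usage sketch: preparing an Argilla dataset for training with a
# train-test split. The dataset name and split size are placeholders.
import argilla as rg

dataset_rg = rg.load("my-token-classification-dataset")
prepared = dataset_rg.prepare_for_training(framework="transformers", train_size=0.8)
# When a split is requested, the prepared dataset contains train/test
# subsets that can be fed to the corresponding framework's trainer.
print(prepared)
```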
@@ -1,7 +1,6 @@
{
"cells": [
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
@@ -16,7 +15,6 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
@@ -38,7 +36,6 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
@@ -70,7 +67,6 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
@@ -89,7 +85,6 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
@@ -106,7 +101,6 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
@@ -128,7 +122,6 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
@@ -138,7 +131,6 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
@@ -159,7 +151,6 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
@@ -181,15 +172,13 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"Learn more about loading and creating Argilla datasets [here](../../guides/log_load_and_prepare_data.ipynb)."
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
@@ -198,6 +187,32 @@
"As a first step, we want to get the list of the users that will be annotating our dataset."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# get workspace where the dataset is (or will be) located\n",
"ws = rg.Workspace.from_name(\"my_workspace\")\n",
"# get the list of users with access to the workspace\n",
"# make sure that all users that will work on the dataset have access to the workspace\n",
"# optional: filter users to get only those with annotator role\n",
"users = [u for u in rg.User.list() if u.role == \"annotator\"]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<div class=\"alert alert-info\">\n",
"\n",
"**Note**\n",
"\n",
"If you are using a version earlier than 1.11.0 you will need to call the API directly to get the list of users as is done in the following cell. Note that, in that case, users will be returned as dictionaries and so `users.username` will be `users['username']` instead.\n",
"</div>"
]
},
{
"cell_type": "code",
"execution_count": null,
@@ -218,7 +233,6 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
@@ -238,7 +252,6 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
@@ -261,19 +274,17 @@
"chunked_records = [ds[i:i + n] for i in range(0, len(ds), n)]\n",
"for chunk in chunked_records:\n",
" for idx, record in enumerate(chunk):\n",
" assignments[users[idx]['username']].append(record)"
" assignments[users[idx].username].append(record)"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"## 💾 Log your dataset and assignments in Argilla "
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
@@ -306,7 +317,6 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
@@ -334,15 +344,13 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"Tip: If you plan to have more than one user annotating the same record, we recommend adding an ID to each record before splitting them into several datasets. That way you will be able to retrieve the different annotations for each record when postprocessing the datasets."
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
@@ -358,7 +366,6 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
@@ -372,7 +379,7 @@
],
"metadata": {
"kernelspec": {
"display_name": "argilla",
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
Expand All @@ -386,7 +393,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.12"
"version": "3.8.10"
},
"vscode": {
"interpreter": {
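Distilled from the notebook above, the assignment logic is a simple round-robin deal: chunk the records into groups of `len(users)` and give each user one record per chunk. A self-contained sketch with stand-in data:

```python
# Self-contained sketch of the notebook's round-robin assignment logic;
# users and records are stand-ins for rg.User objects and Argilla records.
from collections import defaultdict

users = ["ann", "bob", "cho"]
records = list(range(10))
n = len(users)

assignments = defaultdict(list)
chunked_records = [records[i : i + n] for i in range(0, len(records), n)]
for chunk in chunked_records:
    for user, record in zip(users, chunk):
        assignments[user].append(record)

print(dict(assignments))
# {'ann': [0, 3, 6, 9], 'bob': [1, 4, 7], 'cho': [2, 5, 8]}
```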
10 changes: 9 additions & 1 deletion src/argilla/client/feedback/integrations/huggingface/dataset.py
@@ -20,7 +20,6 @@

from packaging.version import parse as parse_version

from argilla.client.feedback.config import DatasetConfig, DeprecatedDatasetConfig
from argilla.client.feedback.constants import FIELD_TYPE_TO_PYTHON_TYPE
from argilla.client.feedback.schemas import FeedbackRecord
from argilla.client.feedback.types import AllowedQuestionTypes
@@ -188,6 +187,9 @@ def push_to_huggingface(
    import huggingface_hub
    from huggingface_hub import DatasetCardData, HfApi

    # https://github.com/argilla-io/argilla/issues/3468
    from argilla.client.feedback.config import DatasetConfig

    if parse_version(huggingface_hub.__version__) < parse_version("0.14.0"):
        _LOGGER.warning(
            "Recommended `huggingface_hub` version is 0.14.0 or higher, and you have"
@@ -261,6 +263,12 @@ def from_huggingface(cls: Type["FeedbackDataset"], repo_id: str, *args: Any, **k
    from huggingface_hub import hf_hub_download
    from huggingface_hub.utils import EntryNotFoundError

    # https://github.com/argilla-io/argilla/issues/3468
    from argilla.client.feedback.config import (
        DatasetConfig,
        DeprecatedDatasetConfig,
    )

    if parse_version(huggingface_hub.__version__) < parse_version("0.14.0"):
        _LOGGER.warning(
            "Recommended `huggingface_hub` version is 0.14.0 or higher, and you have"
18 changes: 17 additions & 1 deletion src/argilla/server/apis/v1/handlers/datasets.py
@@ -28,6 +28,7 @@
    Dataset,
    DatasetCreate,
    Datasets,
    DatasetUpdate,
    Field,
    FieldCreate,
    Fields,
@@ -378,7 +379,7 @@ async def publish_dataset(
async def delete_dataset(
    *,
    db: AsyncSession = Depends(get_async_db),
    search_engine=Depends(get_search_engine),
    search_engine: SearchEngine = Depends(get_search_engine),
    dataset_id: UUID,
    current_user: User = Security(auth.get_current_user),
):
@@ -389,3 +390,18 @@ async def delete_dataset(
    await datasets.delete_dataset(db, search_engine, dataset=dataset)

    return dataset


@router.patch("/datasets/{dataset_id}", response_model=Dataset)
async def update_dataset(
    *,
    db: AsyncSession = Depends(get_async_db),
    dataset_id: UUID,
    dataset_update: DatasetUpdate,
    current_user: User = Security(auth.get_current_user),
):
    dataset = await _get_dataset(db, dataset_id)

    await authorize(current_user, DatasetPolicyV1.update(dataset))

    return await datasets.update_dataset(db, dataset=dataset, dataset_update=dataset_update)
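
The `DatasetUpdate` schema itself is not part of this diff; judging from the changelog entry, it would be a small Pydantic model with optional fields so the handler can apply partial updates. A hedged sketch, not the actual definition:

```python
# Hypothetical sketch of a `DatasetUpdate` schema; the real definition is
# not shown in this diff and may differ.
from typing import Optional

from pydantic import BaseModel


class DatasetUpdate(BaseModel):
    name: Optional[str] = None
    guidelines: Optional[str] = None

    class Config:
        extra = "forbid"  # reject unknown keys in the PATCH body


# A handler can then apply only the keys the client actually sent, e.g.
# via dataset_update.dict(exclude_unset=True).
```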
34 changes: 27 additions & 7 deletions src/argilla/server/apis/v1/handlers/fields.py
@@ -20,25 +20,45 @@
from argilla.server.contexts import datasets
from argilla.server.database import get_async_db
from argilla.server.policies import FieldPolicyV1, authorize
from argilla.server.schemas.v1.fields import Field
from argilla.server.schemas.v1.fields import Field, FieldUpdate
from argilla.server.security import auth
from argilla.server.security.model import User

router = APIRouter(tags=["fields"])


@router.delete("/fields/{field_id}", response_model=Field)
async def delete_field(
    *, db: AsyncSession = Depends(get_async_db), field_id: UUID, current_user: User = Security(auth.get_current_user)
):
async def _get_field(db: "AsyncSession", field_id: UUID) -> Field:
    field = await datasets.get_field_by_id(db, field_id)

    await authorize(current_user, FieldPolicyV1.delete(field))
    if not field:
        raise HTTPException(
            status_code=status.HTTP_404_NOT_FOUND,
            detail=f"Field with id `{field_id}` not found",
        )
    return field


@router.patch("/fields/{field_id}", response_model=Field)
async def update_field(
    *,
    db: AsyncSession = Depends(get_async_db),
    field_id: UUID,
    field_update: FieldUpdate,
    current_user: User = Security(auth.get_current_user),
):
    field = await _get_field(db, field_id)

    await authorize(current_user, FieldPolicyV1.update(field))

    return await datasets.update_field(db, field, field_update)


@router.delete("/fields/{field_id}", response_model=Field)
async def delete_field(
    *, db: AsyncSession = Depends(get_async_db), field_id: UUID, current_user: User = Security(auth.get_current_user)
):
    field = await _get_field(db, field_id)

    await authorize(current_user, FieldPolicyV1.delete(field))

    # TODO: We should split API v1 into different FastAPI apps so we can customize error management.
    # After mapping ValueError to 422 errors for API v1 then we can remove this try except.
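`FieldUpdate` (imported above) is likewise not shown here; the context function `datasets.update_field` presumably applies a partial update from it. A hedged sketch of that pattern, not the actual implementation:

```python
# Hypothetical sketch of a partial-update helper in the datasets context;
# the real `datasets.update_field` is not part of this diff and may differ.
async def update_field(db, field, field_update):
    # `exclude_unset=True` skips keys the client did not send, so a PATCH
    # with only `title` leaves the field's `settings` untouched.
    for key, value in field_update.dict(exclude_unset=True).items():
        setattr(field, key, value)
    db.add(field)
    await db.commit()
    await db.refresh(field)
    return field
```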