Merge branch 'develop' into feature/prepare-bulk-annotation

* develop: (21 commits)
  ✨ Fix error handling in axios plugin for 401 (#4362)
  docs: Change `telemetry` section in tutorials to directly executable cells (#4399)
  docs: add faq files (#4363)
  fix: pinning `pytest-asyncio` to version `0.21.1` to avoid problems running unit tests on GitHub workflows (#4395)
  docs: add making most of markdown to tutorial page (#4376)
  Fixing typo in Fine Tuning LLMs Practical Guides (#4392)
  Token Classification epochs parameter trainer changed (#4393)
  docs: align practical guidescreate datasethtml with end2end examples structure (#4375)
  docs: hugging face space url (#4379)
  docs: extend using proxy section (#4368)
  chore: update dev version
  chore: update CHANGELOG.md before release v1.20.0 (#4357)
  docs: temporal update to indicate persistent storage (#4355)
  docs: add suggestions and responses filters and sorting (#4345)
  docs: add end2end example on creating a basic text-classification dataset (#4208)
  Fix/responses suggestions filter fine tune (#4356)
  Fix/responses suggestions filter fine tune (#4356)
  fix: Accept draft responses on dataset records creation (#4354)
  Feature/responses operator (#4352)
  Feature/responses operator (#4352)
  ...
argilla-io · Dec 12, 2023 · ef50946
2 parents 4caf187 + 2729c76

Showing 148 changed files with 9,900 additions and 1,587 deletions.
diff --git a/.github/workflows/check-repo-files.yml b/.github/workflows/check-repo-files.yml
@@ -6,6 +6,9 @@ on:
       pythonChanges:
         description: "True if some files in python code have changed"
         value: ${{ jobs.check-repo-files.outputs.pythonChanges }}
+      end2endChanges:
+        description: "True if some files affecting the end2end examples have changed"
+        value: ${{ jobs.check-repo-files.outputs.end2endChanges }}
       buildChanges:
         description: "True if some files affecting the build have changed"
         value: ${{ jobs.check-repo-files.outputs.buildChanges }}
@@ -17,6 +20,7 @@ jobs:
     outputs:
       pythonChanges: ${{ steps.path_filter.outputs.pythonChanges }}
       buildChanges: ${{ steps.path_filter.outputs.buildChanges }}
+      end2endChanges: ${{ steps.path_filter.outputs.end2endChanges }}
     steps:
       - name: Checkout Code 🛎
         uses: actions/checkout@v3
@@ -30,6 +34,12 @@ jobs:
               - 'tests/**'
               - 'pyproject.toml'
               - 'setup.py'
+            end2endChanges:
+              - 'src/**'
+              - 'pyproject.toml'
+              - 'setup.py'
+              - 'scripts/end2end_examples.py'
+              - 'docs/_source/tutorials_and_integrations/tutorials/feedback/end2end_examples/**'
             buildChanges:
               - 'src/**'
               - 'frontend/**'
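
The new `end2endChanges` filter above can be read as "any changed path matches any listed pattern". A minimal Python sketch of that rule follows. It assumes `fnmatchcase` is a close enough stand-in for the glob matching done by the path-filter step, which is only an approximation; the function name `end2end_changed` is hypothetical and not part of the repository.

```python
from fnmatch import fnmatchcase

# Patterns copied from the end2endChanges filter in check-repo-files.yml.
END2END_PATTERNS = [
    "src/**",
    "pyproject.toml",
    "setup.py",
    "scripts/end2end_examples.py",
    "docs/_source/tutorials_and_integrations/tutorials/feedback/end2end_examples/**",
]


def end2end_changed(changed_files):
    """Return True if any changed path matches at least one pattern.

    Assumption: fnmatchcase only approximates the action's glob semantics.
    """
    return any(
        fnmatchcase(path, pattern)
        for path in changed_files
        for pattern in END2END_PATTERNS
    )


print(end2end_changed(["src/argilla/client/api.py"]))  # True
print(end2end_changed(["frontend/package.json"]))      # False
```

Under this reading, a change that only touches `frontend/**` would skip the end2end job while still setting `buildChanges`.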

diff --git a/.github/workflows/end2end-examples.yml b/.github/workflows/end2end-examples.yml
@@ -0,0 +1,98 @@
+name: Run end2end sdk examples
+
+on:
+  workflow_call:
+    inputs:
+      runsOn:
+        required: false
+        type: string
+        default: extended-runner
+      searchEngineDockerImage:
+        description: "The name of the Docker image of the search engine to use."
+        default: docker.elastic.co/elasticsearch/elasticsearch:8.8.2
+        required: false
+        type: string
+      searchEngineDockerEnv:
+        description: "The environment variables to pass to the search engine Docker container, as a JSON string."
+        default: '{"discovery.type": "single-node", "xpack.security.enabled": "false"}'
+        required: false
+        type: string
+env:
+  # Increase this value to reset cache if environment_dev.yml has not changed
+  CACHE_NUMBER: 5
+
+jobs:
+  # Runs depending on the result from the check-repo-files.yml
+  call-check-repo-files:
+    uses: ./.github/workflows/check-repo-files.yml
+
+  end2end-examples:
+    name: end2end notebook examples, FeedbackDataset for text-classification
+    runs-on: ${{ inputs.runsOn }}
+    services:
+      search_engine:
+        image: ${{ inputs.searchEngineDockerImage }}
+        ports:
+          - 9200:9200
+        env: ${{ fromJson(inputs.searchEngineDockerEnv) }}
+    defaults:
+      run:
+        shell: bash -l {0}
+    steps:
+      - name: Checkout Code 🛎
+        uses: actions/checkout@v3
+      - name: Setup Conda Env 🐍
+        uses: conda-incubator/setup-miniconda@v2
+        with:
+          miniforge-variant: Mambaforge
+          miniforge-version: latest
+          use-mamba: true
+          activate-environment: argilla
+      - name: Get date for conda cache
+        id: get-date
+        run: echo "today=$(/bin/date -u '+%Y%m%d')" >> "$GITHUB_OUTPUT"
+        shell: bash
+      - name: Cache Conda env
+        uses: actions/cache@v3
+        id: cache
+        with:
+          path: ${{ env.CONDA }}/envs
+          key: conda-${{ runner.os }}--${{ runner.arch }}--${{ steps.get-date.outputs.today }}-${{ hashFiles('environment_dev.yml') }}-${{ env.CACHE_NUMBER }}
+      - name: Update environment
+        if: steps.cache.outputs.cache-hit != 'true'
+        run: mamba env update -n argilla -f environment_dev.yml
+      - name: Cache pip 👜
+        uses: actions/cache@v3
+        with:
+          path: ~/.cache/pip
+          key: ${{ runner.os }}-pip-${{ env.CACHE_NUMBER }}-${{ hashFiles('pyproject.toml') }}
+      - name: Set huggingface hub credentials
+        if: github.ref == 'refs/heads/main' || github.ref == 'refs/heads/develop' || startsWith(github.ref, 'refs/heads/releases')
+        run: |
+          echo "HF_HUB_ACCESS_TOKEN=${{ secrets.HF_HUB_ACCESS_TOKEN }}" >> "$GITHUB_ENV"
+          echo "Enable HF access token"
+      - name: Set Argilla search engine env variable
+        if: startsWith(inputs.searchEngineDockerImage, 'docker.elastic.co')
+        run: |
+          echo "ARGILLA_SEARCH_ENGINE=elasticsearch" >> "$GITHUB_ENV"
+          echo "Configure elasticsearch engine"
+      - name: Set Argilla search engine env variable
+        if: startsWith(inputs.searchEngineDockerImage, 'opensearchproject')
+        run: |
+          echo "ARGILLA_SEARCH_ENGINE=opensearch" >> "$GITHUB_ENV"
+          echo "Configure opensearch engine"
+      - name: Launch Argilla Server
+        env:
+          ARGILLA_ENABLE_TELEMETRY: 0
+        run: |
+          pip install -e .
+          python -m argilla server database migrate
+          python -m argilla server database users create_default
+          uvicorn argilla.server.app:app --port 6900 --host 0.0.0.0 &
+      - name: Run end2end examples 📈
+        env:
+          ARGILLA_ENABLE_TELEMETRY: 0
+          HF_HUB_ACCESS_TOKEN: ${{ secrets.HF_HUB_ACCESS_TOKEN }}
+        run: |
+          pip install papermill
+          python scripts/end2end_examples.py
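
The "Launch Argilla Server" step starts `uvicorn` in the background with `&`, so the next step may begin before port 6900 accepts connections. One possible guard, sketched here with the standard library only, is to poll the port before running the examples. The helper name `wait_for_port` and its defaults are assumptions for illustration; nothing in this commit shows such a check in `scripts/end2end_examples.py`.

```python
import socket
import time


def wait_for_port(host, port, timeout=30.0, interval=0.5):
    """Poll until a TCP connection to host:port succeeds or timeout elapses.

    Returns True once a connection is accepted, False if the deadline passes.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # create_connection raises OSError while nothing is listening yet.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            time.sleep(interval)
    return False
```

A caller could run `wait_for_port("localhost", 6900)` and fail fast when it returns `False`, rather than letting the first notebook cell hit a connection error.
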
diff --git a/.github/workflows/package.yml b/.github/workflows/package.yml
@@ -74,6 +74,28 @@ jobs:
       pytestArgs: tests/unit
     secrets: inherit
 
+  run_end2end_tests:
+    strategy:
+      matrix:
+        include:
+          - searchEngineDockerImage: docker.elastic.co/elasticsearch/elasticsearch:8.8.2
+            searchEngineDockerEnv: '{"discovery.type": "single-node", "xpack.security.enabled": "false"}'
+            coverageReport: coverage-elasticsearch-8.8.2
+            runsOn: extended-runner
+          - searchEngineDockerImage: opensearchproject/opensearch:2.4.1
+            searchEngineDockerEnv: '{"discovery.type": "single-node", "plugins.security.disabled": "true"}'
+            coverageReport: coverage-opensearch-2.4.1
+            runsOn: ubuntu-latest
+    name: Run end2end tests
+    uses: ./.github/workflows/end2end-examples.yml
+    needs: check_repo_files
+    if: needs.check_repo_files.outputs.end2endChanges == 'true'
+    with:
+      runsOn: ${{ matrix.runsOn }}
+      searchEngineDockerImage: ${{ matrix.searchEngineDockerImage }}
+      searchEngineDockerEnv: ${{ matrix.searchEngineDockerEnv }}
+    secrets: inherit
+
   run_unit_test_with_extra_engines:
     strategy:
       matrix:
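
The `run_end2end_tests` matrix above passes an image name and a JSON string to the reusable workflow, which then selects `ARGILLA_SEARCH_ENGINE` with two `startsWith` checks and decodes the env with `fromJson`. The same mapping can be sketched in Python as below. The function name `search_engine_for` and the `MATRIX` list are illustrative only; the values are copied from the diff.

```python
import json


def search_engine_for(image):
    """Mirror the two startsWith() conditions in end2end-examples.yml."""
    if image.startswith("docker.elastic.co"):
        return "elasticsearch"
    if image.startswith("opensearchproject"):
        return "opensearch"
    return None  # neither step would run, so the variable stays unset


# Matrix entries copied from package.yml (only the fields used here).
MATRIX = [
    {
        "searchEngineDockerImage": "docker.elastic.co/elasticsearch/elasticsearch:8.8.2",
        "searchEngineDockerEnv": '{"discovery.type": "single-node", "xpack.security.enabled": "false"}',
    },
    {
        "searchEngineDockerImage": "opensearchproject/opensearch:2.4.1",
        "searchEngineDockerEnv": '{"discovery.type": "single-node", "plugins.security.disabled": "true"}',
    },
]

for entry in MATRIX:
    engine = search_engine_for(entry["searchEngineDockerImage"])
    env = json.loads(entry["searchEngineDockerEnv"])  # analogous to fromJson()
    print(engine, sorted(env))
```

Note that an image from any other registry prefix would leave `ARGILLA_SEARCH_ENGINE` unset in this scheme.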

diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -18,8 +18,14 @@ These are the section headers that we use:
 
 ### Added
 
-- Added `metadata_properties` to the `__repr__` method of the `FeedbackDataset` and `RemoteFeedbackDataset`.([#4192](https://github.com/argilla-io/argilla/pull/4192)).
+- Added strategy to handle and translate errors from server for `401 http status code` ([#4362](https://github.com/argilla-io/argilla/pull/4362))
+
+## [1.20.0](https://github.com/argilla-io/argilla/compare/v1.19.0...v1.20.0)
+
+### Added
+
 - Added `GET /api/v1/datasets/:dataset_id/records/search/suggestions/options` endpoint to return suggestion available options for searching. ([#4260](https://github.com/argilla-io/argilla/pull/4260))
+- Added `metadata_properties` to the `__repr__` method of the `FeedbackDataset` and `RemoteFeedbackDataset`.([#4192](https://github.com/argilla-io/argilla/pull/4192)).
 - Added `get_model_kwargs`, `get_trainer_kwargs`, `get_trainer_model`, `get_trainer_tokenizer` and `get_trainer` -methods to the `ArgillaTrainer` to improve interoperability across frameworks. ([#4214](https://github.com/argilla-io/argilla/pull/4214)).
 - Added additional formatting checks to the `ArgillaTrainer` to allow for better interoperability of `defaults` and `formatting_func` usage. ([#4214](https://github.com/argilla-io/argilla/pull/4214)).
 - Added a warning to the `update_config`-method of `ArgillaTrainer` to emphasize if the `kwargs` were updated correctly. ([#4214](https://github.com/argilla-io/argilla/pull/4214)).
@@ -33,16 +39,19 @@ These are the section headers that we use:
 - Fixed error in the unification strategy for `RankingQuestion` ([#4295](https://github.com/argilla-io/argilla/pull/4295))
 - Fixed `TextClassificationSettings.labels_schema` order was not being preserved. Closes [#3828](https://github.com/argilla-io/argilla/issues/3828) ([#4332](https://github.com/argilla-io/argilla/pull/4332))
 - Fixed error when requesting non-existing API endpoints. Closes [#4073](https://github.com/argilla-io/argilla/issues/4073) ([#4325](https://github.com/argilla-io/argilla/pull/4325))
+- Fixed error when passing `draft` responses to create records endpoint. ([#4354](https://github.com/argilla-io/argilla/pull/4354))
 
 ### Changed
 
 - [breaking] Suggestions `agent` field only accepts now some specific characters and a limited length. ([#4265](https://github.com/argilla-io/argilla/pull/4265))
 - [breaking] Suggestions `score` field only accepts now float values in the range `0` to `1`. ([#4266](https://github.com/argilla-io/argilla/pull/4266))
+- Updated `POST /api/v1/dataset/:dataset_id/records/search` endpoint to support optional `query` attribute. ([#4327](https://github.com/argilla-io/argilla/pull/4327))
+- Updated `POST /api/v1/dataset/:dataset_id/records/search` endpoint to support `filter` and `sort` attributes. ([#4327](https://github.com/argilla-io/argilla/pull/4327))
+- Updated `POST /api/v1/me/datasets/:dataset_id/records/search` endpoint to support optional `query` attribute. ([#4270](https://github.com/argilla-io/argilla/pull/4270))
+- Updated `POST /api/v1/me/datasets/:dataset_id/records/search` endpoint to support `filter` and `sort` attributes. ([#4270](https://github.com/argilla-io/argilla/pull/4270))
 - Changed the logging style while pulling and pushing `FeedbackDataset` to Argilla from `tqdm` style to `rich`. ([#4267](https://github.com/argilla-io/argilla/pull/4267)). Contributed by @zucchini-nlp.
 - Updated `push_to_argilla` to print `repr` of the pushed `RemoteFeedbackDataset` after push and changed `show_progress` to True by default. ([#4223](https://github.com/argilla-io/argilla/pull/4223))
 - Changed `models` and `tokenizer` for the `ArgillaTrainer` to explicitly allow for changing them when needed. ([#4214](https://github.com/argilla-io/argilla/pull/4214)).
-- Updated `POST /api/v1/dataset/:dataset_id/records/search` endpoint to support optional `query` attribute. ([#4327](https://github.com/argilla-io/argilla/pull/4327))
-- Updated `POST /api/v1/dataset/:dataset_id/records/search` endpoint to support `filter` and `sort` attributes. ([#4327](https://github.com/argilla-io/argilla/pull/4327))
 
 ## [1.19.0](https://github.com/argilla-io/argilla/compare/v1.18.0...v1.19.0)