Merge livekit-agent 0.9.0 (#4)

* Fix deepgram English check (livekit#625)
* Cartesia bump to 0.4.0 (livekit#624)
* Introduce manual package release (livekit#626)
* Use the correct working directory in the manual publish job (livekit#627)
* Modified RAG plugin (livekit#629)
  Co-authored-by: Théo Monnom <theo.monnom@outlook.com>
* Revert "nltk: fix broken punkt download" (livekit#630)
* Expose WorkerType explicitly (livekit#632)
* openai: allow sending user IDs (livekit#633)
* silero: fix vad padding & choppy audio (livekit#631)
* ipc: use our own duplex instead of mp.Queue (livekit#634)
* llm: fix optional arguments & non-hashable list (livekit#637)
* Add agent_name to WorkerOptions (livekit#636)
* Support OpenAI Assistants API (livekit#601)
* voiceassistant: fix will_synthesize_assistant_reply race (livekit#638)
* silero: adjust vad activation threshold (livekit#639)
* Version Packages (livekit#615)
  Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
* voiceassistant: fix llm not having the full chat context on bad interruption timing (livekit#640)
* livekit-plugins-browser: handle mouse/keyboard inputs on devmode (livekit#644)
* nltk: fix another semver break (livekit#647)
* livekit-plugins-browser: python API (livekit#645)
* Delete test.py (livekit#652)
* livekit-plugins-browser: prepare for release (livekit#653)
* Version Packages (livekit#641)
  Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
* Revert "Version Packages" (livekit#659)
* fix release workflow (livekit#661)
* Version Packages (livekit#660)
  Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
* Add ServerMessage.termination handler (livekit#635)
  Co-authored-by: Théo Monnom <theo.8bits@gmail.com>
* Introduce anthropic plugin (livekit#655)
* fix uninitialized SpeechHandle error on interruption (livekit#665)
* voiceassistant: avoid stacking assistant replies when allow_interruptions=False (livekit#667)
* fix: disconnect event may now have some arguments (livekit#668)
* Anthropic requires the first message to be a non empty 'user' role (livekit#669)
* support clova speech (livekit#439)
* Updated readme with LLM options (livekit#671)
* Update README.md (livekit#666)
* plugins: add docstrings explaining API keys (livekit#672)
* Disable anthropic test due to 429s (livekit#675)
* Remove duplicate entry from plugin table (livekit#673)
* Version Packages (livekit#662)
  Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
* deepgram: switch the default model to phonecall (livekit#676)
* update livekit to 0.14.0 and await tracksubscribed (livekit#678)
* Fix Google STT exception when no valid speech is recognized (livekit#680)
* Introduce easy api for starting tasks for remote participants (livekit#679)
* examples: document how to log chats (livekit#685)
* Version Packages (livekit#677)
  Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
* voiceassistant: keep punctuations when sending agent transcription (livekit#648)
* Pass context into participant entrypoint (livekit#694)
* Version Packages (livekit#693)
  Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
* Update examples to use participant_entrypoint (livekit#695)
* voiceassistant: add VoiceAssistantState (livekit#654)
  Co-authored-by: Théo Monnom <theo.8bits@gmail.com>
* Fix anthropic package publishing (livekit#701)
* fix non pickleable log (livekit#691)
* Revert "Update examples to use participant_entrypoint" (livekit#702)
* google-tts: ignore wav header (livekit#703)
* fix examples (livekit#704)
* skip processing of choice.delta when it is None (livekit#705)
* delete duplicate code (livekit#707)
* voiceassistant: skip speech initialization if interrupted (livekit#715)
* Ensure room.name is available before connection (livekit#716)
* Add deepseek LLMs at OpenAI plugin (livekit#714)
* add threaded job runners (livekit#684)
* voiceassistant: add before_tts_cb callback (livekit#706)
* voiceassistant: fix mark_audio_segment_end with no audio data (livekit#719)
* add JobContext.wait_for_participant (livekit#712)
* Enable Google TTS with application default credentials (livekit#721)
* improve gracefully_cancel logic (livekit#720)
* bump required livekit version to 0.15.2 (livekit#722)
* elevenlabs: expose enable_ssml_parsing (livekit#723)
* Version Packages (livekit#697)
  Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
* release anthropic (livekit#724)
* Version Packages (livekit#725)
  Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
* Update examples to use wait_for_participant (livekit#726)
  Co-authored-by: Théo Monnom <theo.8bits@gmail.com>
* Introduce function calling to OpenAI Assistants (livekit#710)
  Co-authored-by: Théo Monnom <theo.8bits@gmail.com>
* tts_forwarder: don't raise inside mark_{audio,text}_segment_end when nothing was pushed (livekit#730)
* Add Cerebras to OpenAI Plugin (livekit#731)
* Fixes to Anthropic Function Calling (livekit#708)
* ci: don't run tests on forks (livekit#739)
* Only send actual audio to Deepgram (livekit#738)
* Add support for cartesia voice control (livekit#740)
  Co-authored-by: Théo Monnom <theo.8bits@gmail.com>
* Version Packages (livekit#727)
  Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
* Allow setting LLM temperature with VoiceAssistant (livekit#741)
* Update STT sample README (livekit#709)
* avoid returning tiny frames from TTS (livekit#747)
* run tests on main (and make skipping clearer) (livekit#748)
* voiceassistant: avoid tiny frames on playout (livekit#750)
* limit concurrent process init to 1 (livekit#751)
* windows: default to threaded executor & fix dev mode (livekit#755)
* improve graceful shutdown (livekit#756)
* better dev defaults (livekit#762)
* 11labs: send phoneme in one entire xml chunk (livekit#766)
* ipc: fix process not starting if num_idle_processes is zero (livekit#763)
* limit noisy logs & keep the root logger info (livekit#768)
* use os.exit to exit forcefully (livekit#770)
* Fix Assistant API Vision Capabilities (livekit#771)
* voiceassistant: allow to cancel llm generation inside before_llm_cb (livekit#753)
* Remove useless logs (livekit#773)
* voiceassistant: expose min_endpointing_delay (livekit#752)
* Add typing-extensions as a dependency (livekit#778)
* rename voice_assistant.state to agent.state (livekit#772)
  Co-authored-by: aoife cassidy <aoife@livekit.io>
* bump rtc (livekit#782)
* Version Packages (livekit#744)
  Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
* added livekit-plugins-playht text-to-speech (livekit#735)
* Fix function for OpenAI Assistants (livekit#784)
* fix the problem of infinite loop when agent speech is interrupted (livekit#790)

---------

Co-authored-by: David Zhao <dz@livekit.io>
Co-authored-by: Neil Dwyer <neildwyer1991@gmail.com>
Co-authored-by: Alejandro Figar Gutierrez <afigar@me.com>
Co-authored-by: Théo Monnom <theo.monnom@outlook.com>
Co-authored-by: Théo Monnom <theo.8bits@gmail.com>
Co-authored-by: aoife cassidy <aoife@livekit.io>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: josephkieu <168809198+josephkieu@users.noreply.github.com>
Co-authored-by: Mehadi Hasan Menon <104126711+mehadi92@users.noreply.github.com>
Co-authored-by: lukasIO <mail@lukasseiler.de>
Co-authored-by: xsg22 <111886011+xsg22@users.noreply.github.com>
Co-authored-by: Yuan He <183649+lenage@users.noreply.github.com>
Co-authored-by: Ryan Sinnet <rsinnet@users.noreply.github.com>
Co-authored-by: Henry Tu <henry@henrytu.me>
Co-authored-by: Ben Cherry <bcherry@gmail.com>
Co-authored-by: Jaydev <jaydevjadav.015@gmail.com>
Co-authored-by: Jax <anyetiangong@qq.com>
okolabs · Sep 26, 2024 · 8c4c075
1 parent 75d2e54
commit 8c4c075
Showing 227 changed files with 9,974 additions and 2,632 deletions.
diff --git a/.changeset/cuddly-eels-sin.md b/.changeset/cuddly-eels-sin.md
diff --git a/.changeset/five-planes-drum.md b/.changeset/five-planes-drum.md
diff --git a/.changeset/itchy-ligers-exist.md b/.changeset/itchy-ligers-exist.md
diff --git a/.changeset/lazy-cups-cross.md b/.changeset/lazy-cups-cross.md
diff --git a/.changeset/moody-doors-poke.md b/.changeset/moody-doors-poke.md
@@ -0,0 +1,5 @@
+---
+"livekit-agents": patch
+---
+
+fix VoiceAssistant being stuck when interrupting before user speech is committed
diff --git a/.changeset/proud-birds-press.md b/.changeset/proud-birds-press.md
diff --git a/.changeset/red-taxis-smoke.md b/.changeset/red-taxis-smoke.md
diff --git a/.changeset/shaggy-apes-matter.md b/.changeset/shaggy-apes-matter.md
diff --git a/.changeset/tidy-years-refuse.md b/.changeset/tidy-years-refuse.md
@@ -0,0 +1,6 @@
+---
+"livekit-agents": patch
+"livekit-plugins-openai": patch
+---
+
+Fix function for OpenAI Assistants
diff --git a/.github/workflows/build-package.yml b/.github/workflows/build-package.yml
@@ -0,0 +1,98 @@
+name: Build package
+
+on:
+  workflow_call:
+    inputs:
+      package:
+        required: true
+        type: string
+      artifact_name:
+        required: true
+        type: string
+  workflow_dispatch:
+    inputs:
+      package:
+        description: 'Name of the package to build'
+        required: true
+        default: 'livekit-plugins-browser'
+      artifact_name:
+        description: 'Artifact name for the distribution package'
+        required: true
+        default: 'build-artifact'
+
+jobs:
+  build_plugins:
+    runs-on: ubuntu-latest
+    if: |
+      inputs.package == 'livekit-agents' ||
+      inputs.package == 'livekit-plugins-azure' ||
+      inputs.package == 'livekit-plugins-cartesia' ||
+      inputs.package == 'livekit-plugins-deepgram' ||
+      inputs.package == 'livekit-plugins-elevenlabs' ||
+      inputs.package == 'livekit-plugins-google' ||
+      inputs.package == 'livekit-plugins-minimal' ||
+      inputs.package == 'livekit-plugins-nltk' ||
+      inputs.package == 'livekit-plugins-openai' ||
+      inputs.package == 'livekit-plugins-rag' ||
+      inputs.package == 'livekit-plugins-silero' ||
+      inputs.package == 'livekit-plugins-anthropic'
+
+    defaults:
+      run:
+        working-directory: "${{ startsWith(inputs.package, 'livekit-plugin') && 'livekit-plugins/' || '' }}${{ inputs.package }}"
+    steps:
+      - uses: actions/checkout@v4
+
+      - name: Set up Python
+        uses: actions/setup-python@v5
+        with:
+          python-version: "3.9"
+
+      - name: Install dependencies
+        run: |
+          python -m pip install --upgrade pip
+          pip install build
+
+      - name: Build package
+        run: python -m build
+
+      - name: Upload distribution package
+        uses: actions/upload-artifact@v3
+        with:
+          name: ${{ inputs.artifact_name }}
+          path: "${{ startsWith(inputs.package, 'livekit-plugin') && 'livekit-plugins/' || '' }}${{ inputs.package }}/dist/"
+
+  build_browser:
+    if: inputs.package == 'livekit-plugins-browser'
+    runs-on: ${{ matrix.os }}
+    strategy:
+      matrix:
+        os: [macos-14] # TODO(theomonnom): other platforms
+
+    defaults:
+      run:
+        working-directory: livekit-plugins/livekit-plugins-browser
+    steps:
+      - uses: actions/checkout@v4
+
+      - name: Set up Python
+        uses: actions/setup-python@v5
+        with:
+          python-version: "3.9"
+
+      - name: Install cibuildwheel
+        run: |
+          python -m pip install --upgrade pip
+          pip install cibuildwheel
+
+      - name: Build wheels
+        run: cibuildwheel --output-dir dist
+        env:
+          CIBW_SKIP: pp* cp313-*
+          CIBW_BUILD_VERBOSITY: 3
+
+      - name: Upload distribution package
+        uses: actions/upload-artifact@v3
+        with:
+          name: ${{ inputs.artifact_name }}
+          path: livekit-plugins/livekit-plugins-browser/dist/
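
Note on the `working-directory` and `path` expressions in `build_plugins`: both rely on the `cond && a || b` idiom, which behaves like a ternary here only because `'livekit-plugins/'` is a non-empty string. A minimal Python sketch of the same path resolution follows. The function names `package_dir` and `dist_dir` are hypothetical and exist only for illustration; also note that `startsWith` in GitHub Actions expressions is case-insensitive, while `str.startswith` is not, so this mirrors the workflow only for the lowercase package names listed above.

```python
def package_dir(package: str) -> str:
    """Return the repo-relative directory for a package.

    Plugins (names starting with 'livekit-plugin') live under
    'livekit-plugins/'; everything else sits at the repository root.
    """
    prefix = "livekit-plugins/" if package.startswith("livekit-plugin") else ""
    return f"{prefix}{package}"


def dist_dir(package: str) -> str:
    """Return the directory that `python -m build` writes artifacts to."""
    return f"{package_dir(package)}/dist/"


if __name__ == "__main__":
    for name in ("livekit-agents", "livekit-plugins-openai"):
        print(name, "->", dist_dir(name))
```

Under these assumptions, `livekit-agents` resolves to `livekit-agents/dist/` and `livekit-plugins-openai` resolves to `livekit-plugins/livekit-plugins-openai/dist/`.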
diff --git a/.github/workflows/check-types.yml b/.github/workflows/check-types.yml
@@ -40,7 +40,8 @@ jobs:
                       ./livekit-plugins/livekit-plugins-elevenlabs \
                       ./livekit-plugins/livekit-plugins-cartesia \
                       ./livekit-plugins/livekit-plugins-rag \
-                      ./livekit-plugins/livekit-plugins-azure
+                      ./livekit-plugins/livekit-plugins-azure \
+                      ./livekit-plugins/livekit-plugins-anthropic
 
       - name: Install stub packages
         run: |
@@ -67,4 +68,5 @@ jobs:
                -p livekit.plugins.elevenlabs \
                -p livekit.plugins.cartesia \
                -p livekit.plugins.rag \
-               -p livekit.plugins.azure
+               -p livekit.plugins.azure \
+               -p livekit.plugins.anthropic
diff --git a/.github/workflows/publish-package.yml b/.github/workflows/publish-package.yml
@@ -52,6 +52,7 @@ jobs:
           echo "exitcode=$?" >> $GITHUB_OUTPUT
         env:
           GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
+
       - name: Add changes
         if: ${{ steps.release_mode.outputs.exitcode == '0' }}
         uses: EndBug/add-and-commit@v9
@@ -79,38 +80,11 @@ jobs:
     strategy:
       matrix:
         package: ${{ fromJson(needs.bump.outputs.packages) }}
-    defaults:
-      run:
-        working-directory: "${{ startsWith(matrix.package.name, 'livekit-plugin') && 'livekit-plugins/' || '' }}${{ matrix.package.name }}"
-
-    runs-on: ubuntu-latest
-
-    steps:
-      - uses: actions/checkout@v4
-        with:
-          submodules: true
-          lfs: true
-        env:
-          GITHUB_TOKEN: ${{ secrets.CHANGESETS_PUSH_PAT }}
 
-      - name: Set up Python
-        uses: actions/setup-python@v5
-        with:
-          python-version: "3.9"
-
-      - name: Install dependencies
-        run: |
-          python -m pip install --upgrade pip
-          pip install build
-
-      - name: Build package
-        run: python -m build
-
-      - name: Store the distribution packages
-        uses: actions/upload-artifact@v3
-        with:
-          name: python-package-distributions
-          path: "${{ startsWith(matrix.package.name, 'livekit-plugin') && 'livekit-plugins/' || '' }}${{ matrix.package.name }}/dist/"
+    uses: livekit/agents/.github/workflows/build-package.yml@main
+    with:
+      package: ${{ matrix.package.name }}
+      artifact_name: python-package-distributions
 
   publish:
     needs:

diff --git a/.github/workflows/tests.yml b/.github/workflows/tests.yml
@@ -13,6 +13,11 @@ on:
 
 jobs:
   tests:
+    if: > # don't run tests for PRs on forks
+      ${{
+        !github.event.pull_request ||
+        github.event.pull_request.head.repo.full_name == github.repository
+      }}
     strategy:
       fail-fast: false
       matrix:
@@ -75,7 +80,8 @@ jobs:
                       ./livekit-plugins/livekit-plugins-silero \
                       ./livekit-plugins/livekit-plugins-elevenlabs \
                       ./livekit-plugins/livekit-plugins-cartesia \
-                      ./livekit-plugins/livekit-plugins-azure
+                      ./livekit-plugins/livekit-plugins-azure \
+                      ./livekit-plugins/livekit-plugins-anthropic
 
       - name: Run tests
         shell: bash
@@ -90,6 +96,7 @@ jobs:
           AZURE_SPEECH_KEY: ${{ secrets.AZURE_SPEECH_KEY }}
           AZURE_SPEECH_REGION: ${{ secrets.AZURE_SPEECH_REGION }} # nit: doesn't have to be secret
           GOOGLE_CREDENTIALS_JSON: ${{ secrets.GOOGLE_CREDENTIALS_JSON }}
+          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
           GOOGLE_APPLICATION_CREDENTIALS: google.json
         run: |
           echo $GOOGLE_CREDENTIALS_JSON > google.json
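
Note on the new `if:` guard in the `tests` job: it runs the job when the event carries no pull request payload (for example a push), or when the pull request head lives in the same repository. The predicate can be sketched in Python as below. The function name `should_run_tests` and its parameters are hypothetical stand-ins for `github.event.pull_request.head.repo.full_name` and `github.repository`; the sketch uses an exact string comparison, whereas GitHub Actions compares strings case-insensitively.

```python
from typing import Optional


def should_run_tests(pr_head_repo: Optional[str], repository: str) -> bool:
    """Decide whether the test job should run.

    pr_head_repo is None when the triggering event is not a pull request.
    """
    if pr_head_repo is None:
        # Not a pull request (e.g. a push to main): always run.
        return True
    # Pull request: run only when the head branch is in this repository.
    return pr_head_repo == repository


if __name__ == "__main__":
    print(should_run_tests(None, "livekit/agents"))
    print(should_run_tests("livekit/agents", "livekit/agents"))
    print(should_run_tests("someone/agents", "livekit/agents"))
```

The third call models a pull request opened from a fork, which is the case the guard is meant to skip.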

diff --git a/README.md b/README.md
@@ -61,6 +61,7 @@ The following plugins are available today:
 
 | Plugin                                                                             | Features                        |
 | ---------------------------------------------------------------------------------- | ------------------------------- |
+| [livekit-plugins-anthropic](https://pypi.org/project/livekit-plugins-anthropic/)   | LLM                             |
 | [livekit-plugins-azure](https://pypi.org/project/livekit-plugins-azure/)           | STT, TTS                        |
 | [livekit-plugins-cartesia](https://pypi.org/project/livekit-plugins-cartesia/)     | TTS                             |
 | [livekit-plugins-deepgram](https://pypi.org/project/livekit-plugins-deepgram/)     | STT                             |
@@ -70,6 +71,38 @@ The following plugins are available today:
 | [livekit-plugins-openai](https://pypi.org/project/livekit-plugins-openai/)         | LLM, STT, TTS                   |
 | [livekit-plugins-silero](https://pypi.org/project/livekit-plugins-silero/)         | VAD                             |
 
+## Using LLM models
+
+The Agents framework supports a wide range of LLMs and hosting providers.
+
+### OpenAI-compatible models
+
+Most LLM providers offer an OpenAI-compatible API, which can be used with the `livekit-plugins-openai` plugin.
+
+```python
+from livekit.plugins.openai.llm import LLM
+```
+
+- OpenAI: `LLM(model="gpt-4o")`
+- Azure: `LLM.with_azure(azure_endpoint="", azure_deployment="")`
+- Cerebras: `LLM.with_cerebras(api_key="", model="")`
+- Fireworks: `LLM.with_fireworks(api_key="", model="")`
+- Groq: `LLM.with_groq(api_key="", model="")`
+- OctoAI: `LLM.with_octo(api_key="", model="")`
+- Ollama: `LLM.with_ollama(base_url="http://localhost:11434/v1", model="")`
+- Perplexity: `LLM.with_perplexity(api_key="", model="")`
+- TogetherAI: `LLM.with_together(api_key="", model="")`
+
+### Anthropic Claude
+
+Anthropic Claude can be used with the `livekit-plugins-anthropic` plugin.
+
+```python
+from livekit.plugins.anthropic.llm import LLM
+
+myllm = LLM(model="claude-3-opus-20240229")
+```
+
 ## Concepts
 
 - **Agent**: A function that defines the workflow of a programmable, server-side participant. This is your application code.
@@ -153,7 +186,9 @@ class MyPlugin(Plugin):
 ```
 
 <!--BEGIN_REPO_NAV-->
+
 <br/><table>
+
 <thead><tr><th colspan="2">LiveKit Ecosystem</th></tr></thead>
 <tbody>
 <tr><td>Realtime SDKs</td><td><a href="https://github.com/livekit/components-js">React Components</a> · <a href="https://github.com/livekit/client-sdk-js">Browser</a> · <a href="https://github.com/livekit/components-swift">Swift Components</a> · <a href="https://github.com/livekit/client-sdk-swift">iOS/macOS/visionOS</a> · <a href="https://github.com/livekit/client-sdk-android">Android</a> · <a href="https://github.com/livekit/client-sdk-flutter">Flutter</a> · <a href="https://github.com/livekit/client-sdk-react-native">React Native</a> · <a href="https://github.com/livekit/rust-sdks">Rust</a> · <a href="https://github.com/livekit/node-sdks">Node.js</a> · <a href="https://github.com/livekit/python-sdks">Python</a> · <a href="https://github.com/livekit/client-sdk-unity-web">Unity (web)</a> · <a href="https://github.com/livekit/client-sdk-unity">Unity (beta)</a></td></tr><tr></tr>
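
Note on the new "Using LLM models" README section: the provider constructors it lists can be selected at runtime. The sketch below is one possible way to do that, not the project's recommended pattern. The environment variable names (`GROQ_API_KEY`, `GROQ_MODEL`, `OLLAMA_MODEL`) and the helper names `pick_provider` and `build_llm` are assumptions made for illustration; only `LLM(model=...)`, `LLM.with_groq(api_key=..., model=...)`, and `LLM.with_ollama(base_url=..., model=...)` come from the README text in this diff.

```python
import os
from typing import Mapping


def pick_provider(env: Mapping[str, str]) -> str:
    """Choose a provider name from (hypothetical) environment variables."""
    if env.get("GROQ_API_KEY"):
        return "groq"
    if env.get("OLLAMA_MODEL"):
        return "ollama"
    return "openai"


def build_llm(env: Mapping[str, str] = os.environ):
    """Construct an LLM using the constructors listed in the README."""
    # Imported lazily so this module loads without livekit-plugins-openai installed.
    from livekit.plugins.openai.llm import LLM

    provider = pick_provider(env)
    if provider == "groq":
        return LLM.with_groq(api_key=env["GROQ_API_KEY"], model=env["GROQ_MODEL"])
    if provider == "ollama":
        return LLM.with_ollama(
            base_url="http://localhost:11434/v1", model=env["OLLAMA_MODEL"]
        )
    # Default: OpenAI itself, as in the README's first bullet.
    return LLM(model="gpt-4o")


if __name__ == "__main__":
    print(pick_provider(os.environ))
```

Keeping the selection logic in `pick_provider` separate from construction makes it testable without network access or API keys.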

diff --git a/examples/browser/browser_track.py b/examples/browser/browser_track.py
@@ -0,0 +1,55 @@
+import asyncio
+import logging
+
+from dotenv import load_dotenv
+from livekit import rtc
+from livekit.agents import JobContext, WorkerOptions, cli
+from livekit.plugins import browser
+
+WIDTH = 1920
+HEIGHT = 1080
+
+load_dotenv()
+
+
+async def entrypoint(job: JobContext):
+    await job.connect()
+
+    ctx = browser.BrowserContext(dev_mode=True)
+    await ctx.initialize()
+
+    page = await ctx.new_page(url="www.livekit.io")
+
+    source = rtc.VideoSource(WIDTH, HEIGHT)
+    track = rtc.LocalVideoTrack.create_video_track("single-color", source)
+    options = rtc.TrackPublishOptions(source=rtc.TrackSource.SOURCE_CAMERA)
+    publication = await job.room.local_participant.publish_track(track, options)
+    logging.info("published track", extra={"track_sid": publication.sid})
+
+    @page.on("paint")
+    def on_paint(paint_data):
+        source.capture_frame(paint_data.frame)
+
+    async def _test_cycle():
+        urls = [
+            "https://www.livekit.io",
+            "https://www.google.com",
+        ]
+
+        i = 0
+        async with ctx.playwright() as browser:
+            while True:
+                i += 1
+                await asyncio.sleep(5)
+                defaultContext = browser.contexts[0]
+                defaultPage = defaultContext.pages[0]
+                try:
+                    await defaultPage.goto(urls[i % len(urls)])
+                except Exception:
+                    logging.exception(f"failed to navigate to {urls[i % len(urls)]}")
+
+    await _test_cycle()
+
+
+if __name__ == "__main__":
+    cli.run_app(WorkerOptions(entrypoint_fnc=entrypoint))
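
Note on `_test_cycle` in `examples/browser/browser_track.py`: the counter is incremented before it is used as an index, so the first navigation targets `urls[1]`, not `urls[0]`. The standalone sketch below reproduces only that indexing logic, with no browser or LiveKit dependency; the generator name `navigation_order` is hypothetical.

```python
from itertools import islice
from typing import Iterator, Sequence

URLS = [
    "https://www.livekit.io",
    "https://www.google.com",
]


def navigation_order(urls: Sequence[str]) -> Iterator[str]:
    """Yield URLs in the order the example's loop would visit them."""
    i = 0
    while True:
        i += 1  # incremented before indexing, as in the example
        yield urls[i % len(urls)]


if __name__ == "__main__":
    # Print the first three navigation targets.
    for url in islice(navigation_order(URLS), 3):
        print(url)
```

With the two URLs from the example, the first three targets are google.com, livekit.io, then google.com again. Since the page is opened on livekit.io, starting the cycle at index 1 means the first navigation actually changes the page.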
diff --git a/examples/browser/standalone_app.py b/examples/browser/standalone_app.py
@@ -0,0 +1,3 @@
+from livekit.plugins import browser
+
+ctx = browser.BrowserContext(dev_mode=True)
diff --git a/examples/minimal_worker.py b/examples/minimal_worker.py
@@ -1,6 +1,6 @@
 import logging
 
-from livekit.agents import AutoSubscribe, JobContext, WorkerOptions, cli
+from livekit.agents import AutoSubscribe, JobContext, WorkerOptions, WorkerType, cli
 
 logger = logging.getLogger("my-worker")
 logger.setLevel(logging.INFO)
@@ -16,4 +16,6 @@ async def entrypoint(ctx: JobContext):
 
 
 if __name__ == "__main__":
-    cli.run_app(WorkerOptions(entrypoint_fnc=entrypoint))
+    # WorkerType.ROOM is the default worker type which will create an agent for every room.
+    # You can also use WorkerType.PUBLISHER to create a single agent for all participants that publish a track.
+    cli.run_app(WorkerOptions(entrypoint_fnc=entrypoint, worker_type=WorkerType.ROOM))