Merge livekit-agent 0.9.0 (#4)
* Fix deepgram English check (livekit#625)

* Cartesia bump to 0.4.0 (livekit#624)

* Introduce manual package release (livekit#626)

* Use the correct working directory in the manual publish job (livekit#627)

* Modified RAG plugin (livekit#629)

Co-authored-by: Théo Monnom <theo.monnom@outlook.com>

* Revert "nltk: fix broken punkt download" (livekit#630)

* Expose WorkerType explicitly (livekit#632)

* openai: allow sending user IDs (livekit#633)

* silero: fix vad padding & choppy audio  (livekit#631)

* ipc: use our own duplex instead of mp.Queue (livekit#634)

* llm: fix optional arguments & non-hashable list (livekit#637)

* Add agent_name to WorkerOptions (livekit#636)

* Support OpenAI Assistants API (livekit#601)

* voiceassistant: fix will_synthesize_assistant_reply race (livekit#638)

* silero: adjust vad activation threshold (livekit#639)

* Version Packages (livekit#615)

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

* voiceassistant: fix llm not having the full chat context on bad interruption timing (livekit#640)

* livekit-plugins-browser: handle mouse/keyboard inputs on devmode  (livekit#644)

* nltk: fix another semver break (livekit#647)

* livekit-plugins-browser: python API (livekit#645)

* Delete test.py (livekit#652)

* livekit-plugins-browser: prepare for release (livekit#653)

* Version Packages (livekit#641)

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

* Revert "Version Packages" (livekit#659)

* fix release workflow (livekit#661)

* Version Packages (livekit#660)

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

* Add ServerMessage.termination handler (livekit#635)

Co-authored-by: Théo Monnom <theo.8bits@gmail.com>

* Introduce anthropic plugin (livekit#655)

* fix uninitialized SpeechHandle error on interruption  (livekit#665)

* voiceassistant: avoid stacking assistant replies when allow_interruptions=False (livekit#667)

* fix: disconnect event may now have some arguments  (livekit#668)

* Anthropic requires the first message to be a non empty 'user' role (livekit#669)

* support clova speech (livekit#439)

* Updated readme with LLM options (livekit#671)

* Update README.md (livekit#666)

* plugins: add docstrings explaining API keys (livekit#672)

* Disable anthropic test due to 429s (livekit#675)

* Remove duplicate entry from plugin table (livekit#673)

* Version Packages (livekit#662)

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

* deepgram: switch the default model to phonecall (livekit#676)

* update livekit to 0.14.0 and await tracksubscribed (livekit#678)

* Fix Google STT exception when no valid speech is recognized (livekit#680)

* Introduce easy api for starting tasks for remote participants (livekit#679)

* examples: document how to log chats (livekit#685)

* Version Packages (livekit#677)

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

* voiceassistant: keep punctuations when sending agent transcription (livekit#648)

* Pass context into participant entrypoint (livekit#694)

* Version Packages (livekit#693)

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

* Update examples to use participant_entrypoint (livekit#695)

* voiceassistant: add VoiceAssistantState (livekit#654)

Co-authored-by: Théo Monnom <theo.8bits@gmail.com>

* Fix anthropic package publishing (livekit#701)

* fix non pickleable log (livekit#691)

* Revert "Update examples to use participant_entrypoint" (livekit#702)

* google-tts: ignore wav header (livekit#703)

* fix examples (livekit#704)

* skip processing of choice.delta when it is None (livekit#705)

* delete duplicate code (livekit#707)

* voiceassistant: skip speech initialization if interrupted  (livekit#715)

* Ensure room.name is available before connection (livekit#716)

* Add deepseek LLMs at OpenAI plugin (livekit#714)

* add threaded job runners (livekit#684)

* voiceassistant: add before_tts_cb callback (livekit#706)

* voiceassistant: fix mark_audio_segment_end with no audio data (livekit#719)

* add JobContext.wait_for_participant (livekit#712)

* Enable Google TTS with application default credentials (livekit#721)

* improve gracefully_cancel logic (livekit#720)

* bump required livekit version to 0.15.2 (livekit#722)

* elevenlabs: expose enable_ssml_parsing (livekit#723)

* Version Packages (livekit#697)

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

* release anthropic (livekit#724)

* Version Packages (livekit#725)

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

* Update examples to use wait_for_participant (livekit#726)

Co-authored-by: Théo Monnom <theo.8bits@gmail.com>

* Introduce function calling to OpenAI Assistants (livekit#710)

Co-authored-by: Théo Monnom <theo.8bits@gmail.com>

* tts_forwarder: don't raise inside mark_{audio,text}_segment_end when nothing was pushed (livekit#730)

* Add Cerebras to OpenAI Plugin (livekit#731)

* Fixes to Anthropic Function Calling (livekit#708)

* ci: don't run tests on forks (livekit#739)

* Only send actual audio to Deepgram (livekit#738)

* Add support for cartesia voice control (livekit#740)

Co-authored-by: Théo Monnom <theo.8bits@gmail.com>

* Version Packages (livekit#727)

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

* Allow setting LLM temperature with VoiceAssistant (livekit#741)

* Update STT sample README (livekit#709)

* avoid returning tiny frames from TTS (livekit#747)

* run tests on main (and make skipping clearer) (livekit#748)

* voiceassistant: avoid tiny frames on playout (livekit#750)

* limit concurrent process init to 1 (livekit#751)

* windows: default to threaded executor & fix dev mode  (livekit#755)

* improve graceful shutdown  (livekit#756)

* better dev defaults (livekit#762)

* 11labs: send phoneme in one entire xml chunk (livekit#766)

* ipc: fix process not starting if num_idle_processes is zero (livekit#763)

* limit noisy logs & keep the root logger info (livekit#768)

* use os.exit to exit forcefully  (livekit#770)

* Fix Assistant API Vision Capabilities (livekit#771)

* voiceassistant: allow to cancel llm generation inside before_llm_cb (livekit#753)

* Remove useless logs (livekit#773)

* voiceassistant: expose min_endpointing_delay (livekit#752)

* Add typing-extensions as a dependency (livekit#778)

* rename voice_assistant.state to agent.state (livekit#772)

Co-authored-by: aoife cassidy <aoife@livekit.io>

* bump rtc (livekit#782)

* Version Packages (livekit#744)

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

* added livekit-plugins-playht text-to-speech (livekit#735)

* Fix function for OpenAI Assistants (livekit#784)

* fix the problem of infinite loop when agent speech is interrupted (livekit#790)

---------

Co-authored-by: David Zhao <dz@livekit.io>
Co-authored-by: Neil Dwyer <neildwyer1991@gmail.com>
Co-authored-by: Alejandro Figar Gutierrez <afigar@me.com>
Co-authored-by: Théo Monnom <theo.monnom@outlook.com>
Co-authored-by: Théo Monnom <theo.8bits@gmail.com>
Co-authored-by: aoife cassidy <aoife@livekit.io>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: josephkieu <168809198+josephkieu@users.noreply.github.com>
Co-authored-by: Mehadi Hasan Menon <104126711+mehadi92@users.noreply.github.com>
Co-authored-by: lukasIO <mail@lukasseiler.de>
Co-authored-by: xsg22 <111886011+xsg22@users.noreply.github.com>
Co-authored-by: Yuan He <183649+lenage@users.noreply.github.com>
Co-authored-by: Ryan Sinnet <rsinnet@users.noreply.github.com>
Co-authored-by: Henry Tu <henry@henrytu.me>
Co-authored-by: Ben Cherry <bcherry@gmail.com>
Co-authored-by: Jaydev <jaydevjadav.015@gmail.com>
Co-authored-by: Jax <anyetiangong@qq.com>
19 people authored Sep 26, 2024
1 parent 75d2e54 commit 8c4c075
Showing 227 changed files with 9,974 additions and 2,632 deletions.
5 changes: 0 additions & 5 deletions .changeset/cuddly-eels-sin.md

This file was deleted.

7 changes: 0 additions & 7 deletions .changeset/five-planes-drum.md

This file was deleted.

5 changes: 0 additions & 5 deletions .changeset/itchy-ligers-exist.md

This file was deleted.

5 changes: 0 additions & 5 deletions .changeset/lazy-cups-cross.md

This file was deleted.

5 changes: 5 additions & 0 deletions .changeset/moody-doors-poke.md
@@ -0,0 +1,5 @@
---
"livekit-agents": patch
---

fix VoiceAssisstant being stuck when interrupting before user speech is committed
5 changes: 0 additions & 5 deletions .changeset/proud-birds-press.md

This file was deleted.

5 changes: 0 additions & 5 deletions .changeset/red-taxis-smoke.md

This file was deleted.

5 changes: 0 additions & 5 deletions .changeset/shaggy-apes-matter.md

This file was deleted.

6 changes: 6 additions & 0 deletions .changeset/tidy-years-refuse.md
@@ -0,0 +1,6 @@
---
"livekit-agents": patch
"livekit-plugins-openai": patch
---

Fix function for OpenAI Assistants
98 changes: 98 additions & 0 deletions .github/workflows/build-package.yml
@@ -0,0 +1,98 @@
name: Build package

on:
  workflow_call:
    inputs:
      package:
        required: true
        type: string
      artifact_name:
        required: true
        type: string
  workflow_dispatch:
    inputs:
      package:
        description: 'Name of the package to build'
        required: true
        default: 'livekit-plugins-browser'
      artifact_name:
        description: 'Artifact name for the distribution package'
        required: true
        default: 'build-artifact'

jobs:
  build_plugins:
    runs-on: ubuntu-latest
    if: |
      inputs.package == 'livekit-agents' ||
      inputs.package == 'livekit-plugins-azure' ||
      inputs.package == 'livekit-plugins-cartesia' ||
      inputs.package == 'livekit-plugins-deepgram' ||
      inputs.package == 'livekit-plugins-elevenlabs' ||
      inputs.package == 'livekit-plugins-google' ||
      inputs.package == 'livekit-plugins-minimal' ||
      inputs.package == 'livekit-plugins-nltk' ||
      inputs.package == 'livekit-plugins-openai' ||
      inputs.package == 'livekit-plugins-rag' ||
      inputs.package == 'livekit-plugins-silero' ||
      inputs.package == 'livekit-plugins-anthropic'
    defaults:
      run:
        working-directory: "${{ startsWith(inputs.package, 'livekit-plugin') && 'livekit-plugins/' || '' }}${{ inputs.package }}"
    steps:
      - uses: actions/checkout@v4

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: "3.9"

      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          pip install build
      - name: Build package
        run: python -m build

      - name: Upload distribution package
        uses: actions/upload-artifact@v3
        with:
          name: ${{ inputs.artifact_name }}
          path: "${{ startsWith(inputs.package, 'livekit-plugin') && 'livekit-plugins/' || '' }}${{ inputs.package }}/dist/"

  build_browser:
    if: inputs.package == 'livekit-plugins-browser'
    runs-on: ${{ matrix.os }}
    strategy:
      matrix:
        os: [macos-14] # TODO(theomonnom): other platforms

    defaults:
      run:
        working-directory: livekit-plugins/livekit-plugins-browser
    steps:
      - uses: actions/checkout@v4

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: "3.9"

      - name: Install cibuildwheel
        run: |
          python -m pip install --upgrade pip
          pip install cibuildwheel
      - name: Build wheels
        run: cibuildwheel --output-dir dist
        env:
          CIBW_SKIP: pp* cp313-*
          CIBW_BUILD_VERBOSITY: 3

      - name: Upload distribution package
        uses: actions/upload-artifact@v3
        with:
          name: ${{ inputs.artifact_name }}
          path: livekit-plugins/livekit-plugins-browser/dist/
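
Note on the `working-directory` and `path` expressions in `build_plugins`: `startsWith(...) && 'livekit-plugins/' || ''` is the usual GitHub Actions stand-in for a ternary, so plugin packages resolve to a directory under `livekit-plugins/` while `livekit-agents` resolves to the repository root. A minimal Python sketch of that path logic, for illustration only; `package_dir` and `dist_dir` are hypothetical names that do not appear in the repository:

```python
def package_dir(package: str) -> str:
    """Mirror the workflow's working-directory expression (illustrative helper)."""
    # GitHub's startsWith() is documented as not case sensitive, hence lower().
    is_plugin = package.lower().startswith("livekit-plugin")
    prefix = "livekit-plugins/" if is_plugin else ""
    return f"{prefix}{package}"


def dist_dir(package: str) -> str:
    """Mirror the upload step's `path`: the package directory plus dist/."""
    return f"{package_dir(package)}/dist/"


if __name__ == "__main__":
    for name in ("livekit-agents", "livekit-plugins-openai"):
        print(name, "->", dist_dir(name))
```

The `&& ... ||` idiom only behaves like a ternary when the middle operand is truthy, which holds here because `'livekit-plugins/'` is a non-empty string.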
6 changes: 4 additions & 2 deletions .github/workflows/check-types.yml
@@ -40,7 +40,8 @@ jobs:
 ./livekit-plugins/livekit-plugins-elevenlabs \
 ./livekit-plugins/livekit-plugins-cartesia \
 ./livekit-plugins/livekit-plugins-rag \
-./livekit-plugins/livekit-plugins-azure
+./livekit-plugins/livekit-plugins-azure \
+./livekit-plugins/livekit-plugins-anthropic
- name: Install stub packages
run: |
@@ -67,4 +68,5 @@ jobs:
 -p livekit.plugins.elevenlabs \
 -p livekit.plugins.cartesia \
 -p livekit.plugins.rag \
--p livekit.plugins.azure
+-p livekit.plugins.azure \
+-p livekit.plugins.anthropic
36 changes: 5 additions & 31 deletions .github/workflows/publish-package.yml
@@ -52,6 +52,7 @@ jobs:
echo "exitcode=$?" >> $GITHUB_OUTPUT
env:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}

- name: Add changes
if: ${{ steps.release_mode.outputs.exitcode == '0' }}
uses: EndBug/add-and-commit@v9
@@ -79,38 +80,11 @@ jobs:
     strategy:
       matrix:
         package: ${{ fromJson(needs.bump.outputs.packages) }}
-    defaults:
-      run:
-        working-directory: "${{ startsWith(matrix.package.name, 'livekit-plugin') && 'livekit-plugins/' || '' }}${{ matrix.package.name }}"
-
-    runs-on: ubuntu-latest
-
-    steps:
-      - uses: actions/checkout@v4
-        with:
-          submodules: true
-          lfs: true
-        env:
-          GITHUB_TOKEN: ${{ secrets.CHANGESETS_PUSH_PAT }}
-
-      - name: Set up Python
-        uses: actions/setup-python@v5
-        with:
-          python-version: "3.9"
-
-      - name: Install dependencies
-        run: |
-          python -m pip install --upgrade pip
-          pip install build
-      - name: Build package
-        run: python -m build
-
-      - name: Store the distribution packages
-        uses: actions/upload-artifact@v3
-        with:
-          name: python-package-distributions
-          path: "${{ startsWith(matrix.package.name, 'livekit-plugin') && 'livekit-plugins/' || '' }}${{ matrix.package.name }}/dist/"
+    uses: livekit/agents/.github/workflows/build-package.yml@main
+    with:
+      package: ${{ matrix.package.name }}
+      artifact_name: python-package-distributions

publish:
needs:
9 changes: 8 additions & 1 deletion .github/workflows/tests.yml
@@ -13,6 +13,11 @@ on:

 jobs:
   tests:
+    if: > # don't run tests for PRs on forks
+      ${{
+      !github.event.pull_request ||
+      github.event.pull_request.head.repo.full_name == github.repository
+      }}
     strategy:
       fail-fast: false
       matrix:
@@ -75,7 +80,8 @@ jobs:
 ./livekit-plugins/livekit-plugins-silero \
 ./livekit-plugins/livekit-plugins-elevenlabs \
 ./livekit-plugins/livekit-plugins-cartesia \
-./livekit-plugins/livekit-plugins-azure
+./livekit-plugins/livekit-plugins-azure \
+./livekit-plugins/livekit-plugins-anthropic
- name: Run tests
shell: bash
@@ -90,6 +96,7 @@ jobs:
 AZURE_SPEECH_KEY: ${{ secrets.AZURE_SPEECH_KEY }}
 AZURE_SPEECH_REGION: ${{ secrets.AZURE_SPEECH_REGION }} # nit: doesn't have to be secret
 GOOGLE_CREDENTIALS_JSON: ${{ secrets.GOOGLE_CREDENTIALS_JSON }}
+ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
 GOOGLE_APPLICATION_CREDENTIALS: google.json
 run: |
 echo $GOOGLE_CREDENTIALS_JSON > google.json
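
Note on the new `if:` guard for the `tests` job: it lets the job run for any event that is not a pull request, and for pull requests only when the head branch lives in the same repository, presumably because fork PRs do not receive the repository secrets the tests need. A small Python sketch of the same predicate, for illustration only; `should_run_tests` and its parameters are hypothetical names standing in for the `github.*` context values:

```python
from typing import Optional


def should_run_tests(repository: str, pr_head_repo: Optional[str]) -> bool:
    """Illustrative mirror of the workflow condition.

    repository   -- stands in for github.repository, e.g. "livekit/agents"
    pr_head_repo -- stands in for github.event.pull_request.head.repo.full_name,
                    or None when the event is not a pull request
    """
    if pr_head_repo is None:
        # Equivalent to `!github.event.pull_request`: pushes, schedules, etc.
        return True
    # GitHub expressions compare strings case-insensitively, hence casefold().
    return pr_head_repo.casefold() == repository.casefold()


if __name__ == "__main__":
    print(should_run_tests("livekit/agents", None))
    print(should_run_tests("livekit/agents", "livekit/agents"))
    print(should_run_tests("livekit/agents", "someone/agents"))
```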
35 changes: 35 additions & 0 deletions README.md
@@ -61,6 +61,7 @@ The following plugins are available today:

| Plugin | Features |
| ---------------------------------------------------------------------------------- | ------------------------------- |
| [livekit-plugins-anthropic](https://pypi.org/project/livekit-plugins-anthropic/) | LLM |
| [livekit-plugins-azure](https://pypi.org/project/livekit-plugins-azure/) | STT, TTS |
| [livekit-plugins-cartesia](https://pypi.org/project/livekit-plugins-cartesia/) | TTS |
| [livekit-plugins-deepgram](https://pypi.org/project/livekit-plugins-deepgram/) | STT |
@@ -70,6 +71,38 @@ The following plugins are available today:
| [livekit-plugins-openai](https://pypi.org/project/livekit-plugins-openai/) | LLM, STT, TTS |
| [livekit-plugins-silero](https://pypi.org/project/livekit-plugins-silero/) | VAD |

## Using LLM models

Agents framework supports a wide range of LLMs and hosting providers.

### OpenAI-compatible models

Most LLM providers offer an OpenAI-compatible API, which can be used with the `livekit-plugins-openai` plugin.

```python
from livekit.plugins.openai.llm import LLM
```

- OpenAI: `LLM(model="gpt-4o")`
- Azure: `LLM.with_azure(azure_endpoint="", azure_deployment="")`
- Cerebras: `LLM.with_cerebras(api_key="", model="")`
- Fireworks: `LLM.with_fireworks(api_key="", model="")`
- Groq: `LLM.with_groq(api_key="", model="")`
- OctoAI: `LLM.with_octo(api_key="", model="")`
- Ollama: `LLM.with_ollama(base_url="http://localhost:11434/v1", model="")`
- Perplexity: `LLM.with_perplexity(api_key="", model="")`
- TogetherAI: `LLM.with_together(api_key="", model="")`

### Anthropic Claude

Anthropic Claude can be used with `livekit-plugins-anthropic` plugin.

```python
from livekit.plugins.anthropic.llm import LLM

myllm = LLM(model="claude-3-opus-20240229")
```

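The provider list above maps naturally onto a lookup table when the provider is chosen from configuration. A minimal sketch under that assumption: `resolve_factory`, `make_llm`, and the `_FACTORIES` table are hypothetical helpers, and only the `with_*` method names and the import path are taken from the list above.

```python
from typing import Any, Callable

# Provider name -> LLM classmethod name, copied from the list above.
# An empty string means "call the LLM constructor directly".
_FACTORIES = {
    "openai": "",
    "azure": "with_azure",
    "cerebras": "with_cerebras",
    "fireworks": "with_fireworks",
    "groq": "with_groq",
    "octo": "with_octo",
    "ollama": "with_ollama",
    "perplexity": "with_perplexity",
    "together": "with_together",
}


def resolve_factory(llm_cls: type, provider: str) -> Callable[..., Any]:
    """Return the constructor or classmethod to call for `provider` (hypothetical helper)."""
    try:
        name = _FACTORIES[provider.lower()]
    except KeyError:
        raise ValueError(f"unknown provider: {provider!r}") from None
    return getattr(llm_cls, name) if name else llm_cls


def make_llm(provider: str, **kwargs: Any) -> Any:
    """Build an LLM for `provider`; kwargs are passed through unchanged."""
    # Imported lazily so this module loads even without the plugin installed.
    from livekit.plugins.openai.llm import LLM

    return resolve_factory(LLM, provider)(**kwargs)
```

For example, `make_llm("groq", api_key="", model="")` would resolve to `LLM.with_groq(api_key="", model="")`.
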
## Concepts

- **Agent**: A function that defines the workflow of a programmable, server-side participant. This is your application code.
@@ -153,7 +186,9 @@ class MyPlugin(Plugin):
```

<!--BEGIN_REPO_NAV-->

<br/><table>

<thead><tr><th colspan="2">LiveKit Ecosystem</th></tr></thead>
<tbody>
<tr><td>Realtime SDKs</td><td><a href="https://github.com/livekit/components-js">React Components</a> · <a href="https://github.com/livekit/client-sdk-js">Browser</a> · <a href="https://github.com/livekit/components-swift">Swift Components</a> · <a href="https://github.com/livekit/client-sdk-swift">iOS/macOS/visionOS</a> · <a href="https://github.com/livekit/client-sdk-android">Android</a> · <a href="https://github.com/livekit/client-sdk-flutter">Flutter</a> · <a href="https://github.com/livekit/client-sdk-react-native">React Native</a> · <a href="https://github.com/livekit/rust-sdks">Rust</a> · <a href="https://github.com/livekit/node-sdks">Node.js</a> · <a href="https://github.com/livekit/python-sdks">Python</a> · <a href="https://github.com/livekit/client-sdk-unity-web">Unity (web)</a> · <a href="https://github.com/livekit/client-sdk-unity">Unity (beta)</a></td></tr><tr></tr>
55 changes: 55 additions & 0 deletions examples/browser/browser_track.py
@@ -0,0 +1,55 @@
import asyncio
import logging

from dotenv import load_dotenv
from livekit import rtc
from livekit.agents import JobContext, WorkerOptions, cli
from livekit.plugins import browser

WIDTH = 1920
HEIGHT = 1080

load_dotenv()


async def entrypoint(job: JobContext):
    await job.connect()

    ctx = browser.BrowserContext(dev_mode=True)
    await ctx.initialize()

    page = await ctx.new_page(url="www.livekit.io")

    source = rtc.VideoSource(WIDTH, HEIGHT)
    track = rtc.LocalVideoTrack.create_video_track("single-color", source)
    options = rtc.TrackPublishOptions(source=rtc.TrackSource.SOURCE_CAMERA)
    publication = await job.room.local_participant.publish_track(track, options)
    logging.info("published track", extra={"track_sid": publication.sid})

    @page.on("paint")
    def on_paint(paint_data):
        source.capture_frame(paint_data.frame)

    async def _test_cycle():
        urls = [
            "https://www.livekit.io",
            "https://www.google.com",
        ]

        i = 0
        async with ctx.playwright() as browser:
            while True:
                i += 1
                await asyncio.sleep(5)
                defaultContext = browser.contexts[0]
                defaultPage = defaultContext.pages[0]
                try:
                    await defaultPage.goto(urls[i % len(urls)])
                except Exception:
                    logging.exception(f"failed to navigate to {urls[i % len(urls)]}")

    await _test_cycle()


if __name__ == "__main__":
    cli.run_app(WorkerOptions(entrypoint_fnc=entrypoint))
3 changes: 3 additions & 0 deletions examples/browser/standalone_app.py
@@ -0,0 +1,3 @@
from livekit.plugins import browser

ctx = browser.BrowserContext(dev_mode=True)
6 changes: 4 additions & 2 deletions examples/minimal_worker.py
@@ -1,6 +1,6 @@
import logging

-from livekit.agents import AutoSubscribe, JobContext, WorkerOptions, cli
+from livekit.agents import AutoSubscribe, JobContext, WorkerOptions, WorkerType, cli

logger = logging.getLogger("my-worker")
logger.setLevel(logging.INFO)
@@ -16,4 +16,6 @@ async def entrypoint(ctx: JobContext):


 if __name__ == "__main__":
-    cli.run_app(WorkerOptions(entrypoint_fnc=entrypoint))
+    # WorkerType.ROOM is the default worker type which will create an agent for every room.
+    # You can also use WorkerType.PUBLISHER to create a single agent for all participants that publish a track.
+    cli.run_app(WorkerOptions(entrypoint_fnc=entrypoint, worker_type=WorkerType.ROOM))