From 8c4c0754fa3babea9087e2abcd424a548f3f4b83 Mon Sep 17 00:00:00 2001
From: Donny Yung
Date: Wed, 25 Sep 2024 21:16:39 -0400
Subject: [PATCH] Merge livekit-agent 0.9.0 (#4)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

* Fix deepgram English check (#625)
* Cartesia bump to 0.4.0 (#624)
* Introduce manual package release (#626)
* Use the correct working directory in the manual publish job (#627)
* Modified RAG plugin (#629)

Co-authored-by: Théo Monnom

* Revert "nltk: fix broken punkt download" (#630)
* Expose WorkerType explicitly (#632)
* openai: allow sending user IDs (#633)
* silero: fix vad padding & choppy audio (#631)
* ipc: use our own duplex instead of mp.Queue (#634)
* llm: fix optional arguments & non-hashable list (#637)
* Add agent_name to WorkerOptions (#636)
* Support OpenAI Assistants API (#601)
* voiceassistant: fix will_synthesize_assistant_reply race (#638)
* silero: adjust vad activation threshold (#639)
* Version Packages (#615) Co-authored-by: github-actions[bot]
* voiceassistant: fix llm not having the full chat context on bad interruption timing (#640)
* livekit-plugins-browser: handle mouse/keyboard inputs on devmode (#644)
* nltk: fix another semver break (#647)
* livekit-plugins-browser: python API (#645)
* Delete test.py (#652)
* livekit-plugins-browser: prepare for release (#653)
* Version Packages (#641) Co-authored-by: github-actions[bot]
* Revert "Version Packages" (#659)
* fix release workflow (#661)
* Version Packages (#660) Co-authored-by: github-actions[bot]
* Add ServerMessage.termination handler (#635) Co-authored-by: Théo Monnom
* Introduce anthropic plugin (#655)
* fix uninitialized SpeechHandle error on interruption (#665)
* voiceassistant: avoid stacking assistant replies when allow_interruptions=False (#667)
* fix: disconnect event may now have some arguments (#668)
* Anthropic requires the first message to be a non-empty 'user' role (#669)
* support clova speech (#439)
* Updated readme with LLM options (#671)
* Update README.md (#666)
* plugins: add docstrings explaining API keys (#672)
* Disable anthropic test due to 429s (#675)
* Remove duplicate entry from plugin table (#673)
* Version Packages (#662) Co-authored-by: github-actions[bot]
* deepgram: switch the default model to phonecall (#676)
* update livekit to 0.14.0 and await tracksubscribed (#678)
* Fix Google STT exception when no valid speech is recognized (#680)
* Introduce easy api for starting tasks for remote participants (#679)
* examples: document how to log chats (#685)
* Version Packages (#677) Co-authored-by: github-actions[bot]
* voiceassistant: keep punctuations when sending agent transcription (#648)
* Pass context into participant entrypoint (#694)
* Version Packages (#693) Co-authored-by: github-actions[bot]
* Update examples to use participant_entrypoint (#695)
* voiceassistant: add VoiceAssistantState (#654) Co-authored-by: Théo Monnom
* Fix anthropic package publishing (#701)
* fix non pickleable log (#691)
* Revert "Update examples to use participant_entrypoint" (#702)
* google-tts: ignore wav header (#703)
* fix examples (#704)
* skip processing of choice.delta when it is None (#705)
* delete duplicate code (#707)
* voiceassistant: skip speech initialization if interrupted (#715)
* Ensure room.name is available before connection (#716)
* Add deepseek LLMs at OpenAI plugin (#714)
* add threaded job runners (#684)
* voiceassistant: add before_tts_cb callback (#706)
* voiceassistant: fix mark_audio_segment_end with no audio data (#719)
* add JobContext.wait_for_participant (#712)
* Enable Google TTS with application default credentials (#721)
* improve gracefully_cancel logic (#720)
* bump required livekit version to 0.15.2 (#722)
* elevenlabs: expose enable_ssml_parsing (#723)
* Version Packages (#697) Co-authored-by: github-actions[bot]
* release anthropic (#724)
* Version Packages (#725) Co-authored-by: github-actions[bot]
* Update examples to use wait_for_participant (#726) Co-authored-by: Théo Monnom
* Introduce function calling to OpenAI Assistants (#710) Co-authored-by: Théo Monnom
* tts_forwarder: don't raise inside mark_{audio,text}_segment_end when nothing was pushed (#730)
* Add Cerebras to OpenAI Plugin (#731)
* Fixes to Anthropic Function Calling (#708)
* ci: don't run tests on forks (#739)
* Only send actual audio to Deepgram (#738)
* Add support for cartesia voice control (#740) Co-authored-by: Théo Monnom
* Version Packages (#727) Co-authored-by: github-actions[bot]
* Allow setting LLM temperature with VoiceAssistant (#741)
* Update STT sample README (#709)
* avoid returning tiny frames from TTS (#747)
* run tests on main (and make skipping clearer) (#748)
* voiceassistant: avoid tiny frames on playout (#750)
* limit concurrent process init to 1 (#751)
* windows: default to threaded executor & fix dev mode (#755)
* improve graceful shutdown (#756)
* better dev defaults (#762)
* 11labs: send phoneme in one entire xml chunk (#766)
* ipc: fix process not starting if num_idle_processes is zero (#763)
* limit noisy logs & keep the root logger info (#768)
* use os.exit to exit forcefully (#770)
* Fix Assistant API Vision Capabilities (#771)
* voiceassistant: allow to cancel llm generation inside before_llm_cb (#753)
* Remove useless logs (#773)
* voiceassistant: expose min_endpointing_delay (#752)
* Add typing-extensions as a dependency (#778)
* rename voice_assistant.state to agent.state (#772) Co-authored-by: aoife cassidy
* bump rtc (#782)
* Version Packages (#744) Co-authored-by: github-actions[bot]
* added livekit-plugins-playht text-to-speech (#735)
* Fix function for OpenAI Assistants (#784)
* fix the problem of infinite loop when agent speech is interrupted (#790)

---------

Co-authored-by: David Zhao
Co-authored-by: Neil Dwyer
Co-authored-by: Alejandro Figar Gutierrez
Co-authored-by: Théo Monnom
Co-authored-by: Théo Monnom
Co-authored-by: aoife cassidy
Co-authored-by: github-actions[bot]
<41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot]
Co-authored-by: josephkieu <168809198+josephkieu@users.noreply.github.com>
Co-authored-by: Mehadi Hasan Menon <104126711+mehadi92@users.noreply.github.com>
Co-authored-by: lukasIO
Co-authored-by: xsg22 <111886011+xsg22@users.noreply.github.com>
Co-authored-by: Yuan He <183649+lenage@users.noreply.github.com>
Co-authored-by: Ryan Sinnet
Co-authored-by: Henry Tu
Co-authored-by: Ben Cherry
Co-authored-by: Jaydev
Co-authored-by: Jax
---
 .changeset/cuddly-eels-sin.md | 5 -
 .changeset/five-planes-drum.md | 7 -
 .changeset/itchy-ligers-exist.md | 5 -
 .changeset/lazy-cups-cross.md | 5 -
 .changeset/moody-doors-poke.md | 5 +
 .changeset/proud-birds-press.md | 5 -
 .changeset/red-taxis-smoke.md | 5 -
 .changeset/shaggy-apes-matter.md | 5 -
 .changeset/tidy-years-refuse.md | 6 +
 .github/workflows/build-package.yml | 98 ++
 .github/workflows/check-types.yml | 6 +-
 .github/workflows/publish-package.yml | 36 +-
 .github/workflows/tests.yml | 9 +-
 README.md | 35 +
 examples/browser/browser_track.py | 55 +
 examples/browser/standalone_app.py | 3 +
 examples/minimal_worker.py | 6 +-
 examples/participant-entrypoint/README.md | 30 +
 .../participant_entrypoint.py | 44 +
 .../participant-entrypoint/requirements.txt | 1 +
 examples/simple-color/agent.py | 15 +-
 examples/simple-color/requirements.txt | 2 +-
 examples/speech-to-text/README.md | 10 +-
 examples/speech-to-text/deepgram_stt.py | 3 +
 examples/speech-to-text/requirements.txt | 4 +-
 examples/text-to-speech/cartesia_tts.py | 43 +
 examples/text-to-speech/elevenlabs_tts.py | 7 +-
 examples/text-to-speech/openai_tts.py | 7 +-
 examples/text-to-speech/requirements.txt | 6 +-
 .../text-to-speech/sync_tts_transcription.py | 7 +-
 examples/voice-assistant/README.md | 22 +-
 .../voice-assistant/custom_pronunciation.py | 49 +
 examples/voice-assistant/function_calling.py | 115 --
 .../function_calling_weather.py | 85 ++
 examples/voice-assistant/minimal_assistant.py | 35 +-
 examples/voice-assistant/requirements.txt | 11 +-
 examples/voice-assistant/save_chatctx.py | 84 ++
 .../voice-assistant/simple-rag/assistant.py | 12 +-
 livekit-agents/CHANGELOG.md | 126 ++
 livekit-agents/livekit/agents/__init__.py | 10 +-
 livekit-agents/livekit/agents/cli/cli.py | 232 ++--
 livekit-agents/livekit/agents/cli/log.py | 21 +-
 livekit-agents/livekit/agents/cli/proto.py | 2 +-
 livekit-agents/livekit/agents/cli/watcher.py | 65 +-
 livekit-agents/livekit/agents/ipc/__init__.py | 18 +-
 .../livekit/agents/ipc/job_executor.py | 29 +
 .../agents/ipc/{proc_main.py => job_main.py} | 112 +-
 ...upervised_proc.py => proc_job_executor.py} | 61 +-
 .../livekit/agents/ipc/proc_lazy_main.py | 72 ++
 .../livekit/agents/ipc/proc_pool.py | 76 +-
 livekit-agents/livekit/agents/ipc/proto.py | 16 +-
 .../livekit/agents/ipc/thread_job_executor.py | 256 ++++
 livekit-agents/livekit/agents/job.py | 107 +-
 livekit-agents/livekit/agents/llm/_oai_api.py | 2 +-
 .../livekit/agents/llm/chat_context.py | 15 +-
 .../livekit/agents/llm/function_context.py | 80 +-
 livekit-agents/livekit/agents/log.py | 2 +-
 livekit-agents/livekit/agents/proto.py | 5 +
 .../livekit/agents/tokenize/__init__.py | 3 +-
 .../agents/tokenize/_basic_paragraph.py | 30 +-
 .../livekit/agents/tokenize/_basic_sent.py | 43 +-
 .../livekit/agents/tokenize/_basic_word.py | 47 +-
 .../livekit/agents/tokenize/basic.py | 20 +-
 .../livekit/agents/tokenize/token_stream.py | 82 +-
 .../livekit/agents/tokenize/tokenizer.py | 6 +
 .../livekit/agents/tokenize/utils.py | 82 ++
 .../agents/transcription/stt_forwarder.py | 27 +-
 .../agents/transcription/tts_forwarder.py | 265 ++--
 .../livekit/agents/utils/aio/__init__.py | 30 +-
 .../livekit/agents/utils/aio/duplex_unix.py | 25 +-
 .../livekit/agents/utils/aio/itertools.py | 114 ++
 livekit-agents/livekit/agents/utils/audio.py | 2 +-
 livekit-agents/livekit/agents/utils/misc.py | 2 +
 livekit-agents/livekit/agents/vad.py | 8 +-
 livekit-agents/livekit/agents/version.py | 2 +-
 .../agents/voice_assistant/__init__.py | 6 +-
 .../agents/voice_assistant/agent_output.py | 90 +-
 .../agents/voice_assistant/agent_playout.py | 83 +-
 .../agents/voice_assistant/human_input.py | 4 +-
 .../livekit/agents/voice_assistant/plotter.py | 52 +-
 .../agents/voice_assistant/speech_handle.py | 153 +++
 .../agents/voice_assistant/voice_assistant.py | 401 +++---
 livekit-agents/livekit/agents/worker.py | 133 +-
 livekit-agents/package.json | 2 +-
 livekit-agents/setup.py | 3 +-
 livekit-plugins/install_plugins_editable.sh | 1 +
 .../livekit-plugins-anthropic/CHANGELOG.md | 13 +
 .../livekit-plugins-anthropic/README.md | 13 +
 .../livekit/plugins/anthropic/__init__.py | 37 +
 .../livekit/plugins/anthropic/llm.py | 511 ++++++++
 .../livekit/plugins/anthropic/log.py | 3 +
 .../livekit/plugins/anthropic/models.py | 8 +
 .../livekit/plugins/anthropic/py.typed} | 0
 .../livekit/plugins/anthropic/version.py | 15 +
 .../livekit-plugins-anthropic/package.json | 5 +
 .../livekit-plugins-anthropic/pyproject.toml | 3 +
 .../livekit-plugins-anthropic/setup.py | 59 +
 .../livekit-plugins-azure/CHANGELOG.md | 6 +
 .../livekit/plugins/azure/stt.py | 7 +
 .../livekit/plugins/azure/tts.py | 52 +-
 .../livekit/plugins/azure/version.py | 2 +-
 .../livekit-plugins-azure/package.json | 2 +-
 .../{cef => }/.clang-format | 0
 .../{cef => }/.gitignore | 0
 .../livekit-plugins-browser/CHANGELOG.md | 7 +
 .../{cef => }/CMakeLists.txt | 3 +-
 .../{cef => }/LICENSE.txt | 0
 .../livekit-plugins-browser/README.md | 4 +
 .../cef/src/agents_python.cpp | 52 -
 .../cef/src/agents_python.hpp | 39 -
 .../livekit-plugins-browser/cef/src/app.hpp | 47 -
 .../cef/src/app_mac.mm | 146 ---
 .../cef/src/dev_renderer.cpp | 195 ---
 .../cef/src/handler.cpp | 156 ---
 .../cef/src/handler.hpp | 94 --
 .../cef/src/resources/lkcef-Info.plist | 36 -
 .../cef/src/run_browser.py | 27 -
 .../{cef => }/cmake/DownloadCEF.cmake | 0
 .../livekit/plugins/browser/__init__.py | 29 +
 .../livekit/plugins/browser/log.py | 3 +
 .../livekit/plugins/browser/proc.py | 239 ++++
 .../livekit/plugins/browser/proc_main.py | 193 +++
 .../livekit/plugins/browser/proto.py | 196 +++
 .../plugins/browser/py.typed} | 0
 .../plugins/browser/resources/__init__.py | 1 +
 .../livekit/plugins/browser/version.py | 15 +
 .../livekit-plugins-browser/package.json | 5 +
 .../livekit-plugins-browser/pyproject.toml | 9 +
 .../livekit-plugins-browser/setup.py | 126 ++
 .../livekit-plugins-browser/src/.gitignore | 3 +
 .../{cef => }/src/CMakeLists.txt | 28 +-
 .../src/agents_python.cpp | 138 +++
 .../src/agents_python.hpp | 69 ++
 .../{cef => }/src/app.cpp | 47 +-
 .../livekit-plugins-browser/src/app.hpp | 75 ++
 .../livekit-plugins-browser/src/app_mac.mm | 110 ++
 .../src/browser_handle.cpp | 15 +
 .../src/browser_handle.hpp | 72 ++
 .../src/dev_renderer.cpp | 593 +++++++++
 .../{cef => }/src/dev_renderer.hpp | 21 +-
 .../livekit-plugins-browser/src/dummy.cpp | 3 +
 .../livekit-plugins-browser/src/gleq.h | 419 +++++++
 .../livekit-plugins-browser/src/handler.cpp | 181 +++
 .../livekit-plugins-browser/src/handler.hpp | 104 ++
 .../{cef => }/src/helper_main_linux.cpp | 0
 .../{cef => }/src/helper_main_mac.mm | 0
 .../src/utils.hpp => src/helper_main_win.cpp} | 0
 .../src/keyboard_codes.h | 528 ++++++++
 .../src/resources/lkcefapp-Info.plist | 0
 .../src/resources/lkcefhelper-Info.plist | 0
 .../src/run_browser.py | 45 +
 .../livekit-plugins-cartesia/CHANGELOG.md | 13 +
 .../livekit/plugins/cartesia/models.py | 29 +-
 .../livekit/plugins/cartesia/tts.py | 40 +-
 .../livekit/plugins/cartesia/version.py | 2 +-
 .../livekit-plugins-cartesia/package.json | 2 +-
 .../livekit-plugins-clova/README.md | 13 +
 .../livekit/plugins/clova/__init__.py | 21 +
 .../livekit/plugins/clova/common.py | 13 +
 .../livekit/plugins/clova/constants.py | 2 +
 .../livekit/plugins/clova/log.py | 3 +
 .../livekit/plugins/clova/models.py | 17 +
 .../livekit/plugins/clova/stt.py | 132 ++
 .../livekit/plugins/clova/version.py | 15 +
 .../livekit-plugins-clova/pyproject.toml | 3 +
 .../livekit-plugins-clova/setup.py | 56 +
 .../livekit-plugins-deepgram/CHANGELOG.md | 20 +
 .../livekit/plugins/deepgram/stt.py | 19 +-
 .../livekit/plugins/deepgram/utils.py | 27 +
 .../livekit/plugins/deepgram/version.py | 2 +-
 .../livekit-plugins-deepgram/package.json | 2 +-
 .../livekit-plugins-deepgram/setup.py | 2 +-
 .../livekit-plugins-elevenlabs/CHANGELOG.md | 14 +
 .../livekit/plugins/elevenlabs/tts.py | 59 +-
 .../livekit/plugins/elevenlabs/version.py | 2 +-
 .../livekit-plugins-elevenlabs/package.json | 2 +-
 .../livekit-plugins-google/CHANGELOG.md | 22 +
 .../livekit-plugins-google/README.md | 2 +-
 .../livekit/plugins/google/stt.py | 45 +-
 .../livekit/plugins/google/tts.py | 20 +-
 .../livekit/plugins/google/version.py | 2 +-
 .../livekit-plugins-google/package.json | 2 +-
 .../livekit-plugins-google/setup.py | 1 +
 .../livekit-plugins-nltk/CHANGELOG.md | 12 +
 .../livekit/plugins/nltk/version.py | 2 +-
 .../livekit-plugins-nltk/package.json | 2 +-
 livekit-plugins/livekit-plugins-nltk/setup.py | 2 +-
 .../livekit-plugins-openai/CHANGELOG.md | 35 +
 .../livekit/plugins/openai/__init__.py | 3 +
 .../livekit/plugins/openai/beta/README.md | 78 ++
 .../livekit/plugins/openai/beta/__init__.py | 17 +
 .../plugins/openai/beta/assistant_llm.py | 590 +++++++++
 .../livekit/plugins/openai/llm.py | 299 +++--
 .../livekit/plugins/openai/models.py | 14 +
 .../livekit/plugins/openai/stt.py | 13 +
 .../livekit/plugins/openai/tts.py | 36 +-
 .../livekit/plugins/openai/utils.py | 88 +-
 .../livekit/plugins/openai/version.py | 2 +-
 .../livekit-plugins-openai/package.json | 2 +-
 .../livekit-plugins-playht/README.md | 13 +
 .../livekit/__init__.py | 0
 .../livekit/plugins/__init__.py | 0
 .../livekit/plugins/playht/__init__.py | 24 +
 .../livekit/plugins/playht/log.py | 3 +
 .../livekit/plugins/playht/models.py | 19 +
 .../livekit/plugins/playht/tts.py | 218 ++++
 .../livekit/plugins/playht/version.py | 1 +
 .../livekit-plugins-playht/package.json | 6 +
 .../livekit-plugins-playht/pyproject.toml | 3 +
 .../livekit-plugins-playht/setup.py | 44 +
 .../livekit-plugins-rag/CHANGELOG.md | 6 +
 .../livekit/plugins/rag/__init__.py | 3 +
 .../livekit/plugins/rag/version.py | 2 +-
 .../livekit-plugins-rag/package.json | 2 +-
 .../livekit-plugins-silero/CHANGELOG.md | 8 +
 .../livekit/plugins/silero/vad.py | 104 +-
 .../livekit/plugins/silero/version.py | 2 +-
 .../livekit-plugins-silero/package.json | 2 +-
 pnpm-lock.yaml | 1090 +++++++++--------
 test.py | 59 -
 tests/.gitignore | 1 +
 tests/test_ipc.py | 19 +-
 tests/test_llm.py | 224 ++--
 tests/test_tokenizer.py | 106 ++
 tests/test_tts.py | 2 +
 tests/test_vad.py | 66 +
 tests/utils.py | 60 +
 227 files changed, 9974 insertions(+), 2632 deletions(-)
delete mode 100644 .changeset/cuddly-eels-sin.md
delete mode 100644 .changeset/five-planes-drum.md
delete mode 100644 .changeset/itchy-ligers-exist.md
delete mode 100644 .changeset/lazy-cups-cross.md
create mode 100644 .changeset/moody-doors-poke.md
delete mode 100644 .changeset/proud-birds-press.md
delete mode 100644 .changeset/red-taxis-smoke.md
delete mode 100644 .changeset/shaggy-apes-matter.md
create mode 100644 .changeset/tidy-years-refuse.md
create mode 100644 .github/workflows/build-package.yml
create mode 100644 examples/browser/browser_track.py
create mode 100644 examples/browser/standalone_app.py
create mode 100644 examples/participant-entrypoint/README.md
create mode 100644 examples/participant-entrypoint/participant_entrypoint.py
create mode 100644 examples/participant-entrypoint/requirements.txt
create mode 100644 examples/text-to-speech/cartesia_tts.py
create mode 100644 examples/voice-assistant/custom_pronunciation.py
delete mode 100644 examples/voice-assistant/function_calling.py
create mode 100644 examples/voice-assistant/function_calling_weather.py
create mode 100644 examples/voice-assistant/save_chatctx.py
create mode 100644 livekit-agents/livekit/agents/ipc/job_executor.py
rename livekit-agents/livekit/agents/ipc/{proc_main.py => job_main.py} (71%)
rename livekit-agents/livekit/agents/ipc/{supervised_proc.py => proc_job_executor.py} (89%)
create mode 100644 livekit-agents/livekit/agents/ipc/proc_lazy_main.py
create mode 100644 livekit-agents/livekit/agents/ipc/thread_job_executor.py
create mode 100644 livekit-agents/livekit/agents/proto.py
create mode 100644 livekit-agents/livekit/agents/tokenize/utils.py
create mode 100644 livekit-agents/livekit/agents/utils/aio/itertools.py
create mode 100644 livekit-agents/livekit/agents/voice_assistant/speech_handle.py
create mode 100644 livekit-plugins/livekit-plugins-anthropic/CHANGELOG.md
create mode 100644 livekit-plugins/livekit-plugins-anthropic/README.md
create mode 100644 livekit-plugins/livekit-plugins-anthropic/livekit/plugins/anthropic/__init__.py
create mode 100644 livekit-plugins/livekit-plugins-anthropic/livekit/plugins/anthropic/llm.py
create mode 100644 livekit-plugins/livekit-plugins-anthropic/livekit/plugins/anthropic/log.py
create mode 100644 livekit-plugins/livekit-plugins-anthropic/livekit/plugins/anthropic/models.py
rename livekit-plugins/{livekit-plugins-browser/cef/src/helper_main_win.cpp => livekit-plugins-anthropic/livekit/plugins/anthropic/py.typed} (100%)
create mode 100644 livekit-plugins/livekit-plugins-anthropic/livekit/plugins/anthropic/version.py
create mode 100644 livekit-plugins/livekit-plugins-anthropic/package.json
create mode 100644 livekit-plugins/livekit-plugins-anthropic/pyproject.toml
create mode 100644 livekit-plugins/livekit-plugins-anthropic/setup.py
rename livekit-plugins/livekit-plugins-browser/{cef => }/.clang-format (100%)
rename livekit-plugins/livekit-plugins-browser/{cef => }/.gitignore (100%)
create mode 100644 livekit-plugins/livekit-plugins-browser/CHANGELOG.md
rename livekit-plugins/livekit-plugins-browser/{cef => }/CMakeLists.txt (90%)
rename livekit-plugins/livekit-plugins-browser/{cef => }/LICENSE.txt (100%)
create mode 100644 livekit-plugins/livekit-plugins-browser/README.md
delete mode 100644 livekit-plugins/livekit-plugins-browser/cef/src/agents_python.cpp
delete mode 100644 livekit-plugins/livekit-plugins-browser/cef/src/agents_python.hpp
delete mode 100644 livekit-plugins/livekit-plugins-browser/cef/src/app.hpp
delete mode 100644 livekit-plugins/livekit-plugins-browser/cef/src/app_mac.mm
delete mode 100644 livekit-plugins/livekit-plugins-browser/cef/src/dev_renderer.cpp
delete mode 100644 livekit-plugins/livekit-plugins-browser/cef/src/handler.cpp
delete mode 100644 livekit-plugins/livekit-plugins-browser/cef/src/handler.hpp
delete mode 100644 livekit-plugins/livekit-plugins-browser/cef/src/resources/lkcef-Info.plist
delete mode 100644 livekit-plugins/livekit-plugins-browser/cef/src/run_browser.py
rename livekit-plugins/livekit-plugins-browser/{cef => }/cmake/DownloadCEF.cmake (100%)
create mode 100644 livekit-plugins/livekit-plugins-browser/livekit/plugins/browser/__init__.py
create mode 100644 livekit-plugins/livekit-plugins-browser/livekit/plugins/browser/log.py
create mode 100644 livekit-plugins/livekit-plugins-browser/livekit/plugins/browser/proc.py
create mode 100644 livekit-plugins/livekit-plugins-browser/livekit/plugins/browser/proc_main.py
create mode 100644 livekit-plugins/livekit-plugins-browser/livekit/plugins/browser/proto.py
rename livekit-plugins/livekit-plugins-browser/{cef/src/utils.cpp => livekit/plugins/browser/py.typed} (100%)
create mode 100644 livekit-plugins/livekit-plugins-browser/livekit/plugins/browser/resources/__init__.py
create mode 100644 livekit-plugins/livekit-plugins-browser/livekit/plugins/browser/version.py
create mode 100644 livekit-plugins/livekit-plugins-browser/package.json
create mode 100644 livekit-plugins/livekit-plugins-browser/pyproject.toml
create mode 100644 livekit-plugins/livekit-plugins-browser/setup.py
create mode 100644 livekit-plugins/livekit-plugins-browser/src/.gitignore
rename livekit-plugins/livekit-plugins-browser/{cef => }/src/CMakeLists.txt (90%)
create mode 100644 livekit-plugins/livekit-plugins-browser/src/agents_python.cpp
create mode 100644 livekit-plugins/livekit-plugins-browser/src/agents_python.hpp
rename livekit-plugins/livekit-plugins-browser/{cef => }/src/app.cpp (50%)
create mode 100644 livekit-plugins/livekit-plugins-browser/src/app.hpp
create mode 100644 livekit-plugins/livekit-plugins-browser/src/app_mac.mm
create mode 100644 livekit-plugins/livekit-plugins-browser/src/browser_handle.cpp
create mode 100644 livekit-plugins/livekit-plugins-browser/src/browser_handle.hpp
create mode 100644 livekit-plugins/livekit-plugins-browser/src/dev_renderer.cpp
rename livekit-plugins/livekit-plugins-browser/{cef => }/src/dev_renderer.hpp (62%)
create mode 100644 livekit-plugins/livekit-plugins-browser/src/dummy.cpp
create mode 100644 livekit-plugins/livekit-plugins-browser/src/gleq.h
create mode 100644 livekit-plugins/livekit-plugins-browser/src/handler.cpp
create mode 100644 livekit-plugins/livekit-plugins-browser/src/handler.hpp
rename livekit-plugins/livekit-plugins-browser/{cef => }/src/helper_main_linux.cpp (100%)
rename livekit-plugins/livekit-plugins-browser/{cef => }/src/helper_main_mac.mm (100%)
rename livekit-plugins/livekit-plugins-browser/{cef/src/utils.hpp => src/helper_main_win.cpp} (100%)
create mode 100644 livekit-plugins/livekit-plugins-browser/src/keyboard_codes.h
rename livekit-plugins/livekit-plugins-browser/{cef => }/src/resources/lkcefapp-Info.plist (100%)
rename livekit-plugins/livekit-plugins-browser/{cef => }/src/resources/lkcefhelper-Info.plist (100%)
create mode 100644 livekit-plugins/livekit-plugins-browser/src/run_browser.py
create mode 100644 livekit-plugins/livekit-plugins-clova/README.md
create mode 100644 livekit-plugins/livekit-plugins-clova/livekit/plugins/clova/__init__.py
create mode 100644 livekit-plugins/livekit-plugins-clova/livekit/plugins/clova/common.py
create mode 100644 livekit-plugins/livekit-plugins-clova/livekit/plugins/clova/constants.py
create mode 100644 livekit-plugins/livekit-plugins-clova/livekit/plugins/clova/log.py
create mode 100644 livekit-plugins/livekit-plugins-clova/livekit/plugins/clova/models.py
create mode 100644 livekit-plugins/livekit-plugins-clova/livekit/plugins/clova/stt.py
create mode 100644 livekit-plugins/livekit-plugins-clova/livekit/plugins/clova/version.py
create mode 100644 livekit-plugins/livekit-plugins-clova/pyproject.toml
create mode 100644 livekit-plugins/livekit-plugins-clova/setup.py
create mode 100644 livekit-plugins/livekit-plugins-deepgram/livekit/plugins/deepgram/utils.py
create mode 100644 livekit-plugins/livekit-plugins-openai/livekit/plugins/openai/beta/README.md
create mode 100644 livekit-plugins/livekit-plugins-openai/livekit/plugins/openai/beta/__init__.py
create mode 100644 livekit-plugins/livekit-plugins-openai/livekit/plugins/openai/beta/assistant_llm.py
create mode 100644 livekit-plugins/livekit-plugins-playht/README.md
create mode 100644 livekit-plugins/livekit-plugins-playht/livekit/__init__.py
create mode 100644 livekit-plugins/livekit-plugins-playht/livekit/plugins/__init__.py
create mode 100644 livekit-plugins/livekit-plugins-playht/livekit/plugins/playht/__init__.py
create mode 100644 livekit-plugins/livekit-plugins-playht/livekit/plugins/playht/log.py
create mode 100644 livekit-plugins/livekit-plugins-playht/livekit/plugins/playht/models.py
create mode 100644 livekit-plugins/livekit-plugins-playht/livekit/plugins/playht/tts.py
create mode 100644 livekit-plugins/livekit-plugins-playht/livekit/plugins/playht/version.py
create mode 100644 livekit-plugins/livekit-plugins-playht/package.json
create mode 100644 livekit-plugins/livekit-plugins-playht/pyproject.toml
create mode 100644 livekit-plugins/livekit-plugins-playht/setup.py
delete mode 100644 test.py
create mode 100644 tests/.gitignore

diff --git a/.changeset/cuddly-eels-sin.md b/.changeset/cuddly-eels-sin.md
deleted file mode 100644
index 64b87dd21..000000000
--- a/.changeset/cuddly-eels-sin.md
+++ /dev/null
@@ -1,5 +0,0 @@
----
-"livekit-plugins-openai": patch
----
-
-add support for Ollama,
Perplexity, Fireworks, Octo, Together, and Groq LLMs through the OpenAI API
diff --git a/.changeset/five-planes-drum.md b/.changeset/five-planes-drum.md
deleted file mode 100644
index dd4bc1d65..000000000
--- a/.changeset/five-planes-drum.md
+++ /dev/null
@@ -1,7 +0,0 @@
----
-"livekit-agents": patch
-"livekit-plugins-cartesia": patch
----
-
-Switch Cartesia to a sentence tokenizer and keep the same context id throughout.
-Propagate segment_id through the basic sentence tokenizer
diff --git a/.changeset/itchy-ligers-exist.md b/.changeset/itchy-ligers-exist.md
deleted file mode 100644
index 0ff55363e..000000000
--- a/.changeset/itchy-ligers-exist.md
+++ /dev/null
@@ -1,5 +0,0 @@
----
-"livekit-plugins-nltk": patch
----
-
-nltk: fix broken punkt download
diff --git a/.changeset/lazy-cups-cross.md b/.changeset/lazy-cups-cross.md
deleted file mode 100644
index 214469980..000000000
--- a/.changeset/lazy-cups-cross.md
+++ /dev/null
@@ -1,5 +0,0 @@
----
-"livekit-agents": patch
----
-
-limit simultaneous process initialization
diff --git a/.changeset/moody-doors-poke.md b/.changeset/moody-doors-poke.md
new file mode 100644
index 000000000..ca70304ed
--- /dev/null
+++ b/.changeset/moody-doors-poke.md
@@ -0,0 +1,5 @@
+---
+"livekit-agents": patch
+---
+
+fix VoiceAssistant being stuck when interrupting before user speech is committed
diff --git a/.changeset/proud-birds-press.md b/.changeset/proud-birds-press.md
deleted file mode 100644
index d9b556918..000000000
--- a/.changeset/proud-birds-press.md
+++ /dev/null
@@ -1,5 +0,0 @@
----
-"livekit-agents": patch
----
-
-voiceassistant: remove fade effect when interrupting #622
diff --git a/.changeset/red-taxis-smoke.md b/.changeset/red-taxis-smoke.md
deleted file mode 100644
index 506ac97ac..000000000
--- a/.changeset/red-taxis-smoke.md
+++ /dev/null
@@ -1,5 +0,0 @@
----
-"livekit-agents": patch
----
-
-ipc improvements, fix slow shutdown & cleanup leaked resources
diff --git a/.changeset/shaggy-apes-matter.md b/.changeset/shaggy-apes-matter.md
deleted file mode 100644
index 38374e08b..000000000
--- a/.changeset/shaggy-apes-matter.md
+++ /dev/null
@@ -1,5 +0,0 @@
----
-"livekit-plugins-deepgram": patch
----
-
-deepgram: fallback to nova-2-general when the language isn't supported
diff --git a/.changeset/tidy-years-refuse.md b/.changeset/tidy-years-refuse.md
new file mode 100644
index 000000000..5f22709af
--- /dev/null
+++ b/.changeset/tidy-years-refuse.md
@@ -0,0 +1,6 @@
+---
+"livekit-agents": patch
+"livekit-plugins-openai": patch
+---
+
+Fix function for OpenAI Assistants
diff --git a/.github/workflows/build-package.yml b/.github/workflows/build-package.yml
new file mode 100644
index 000000000..148271a88
--- /dev/null
+++ b/.github/workflows/build-package.yml
@@ -0,0 +1,98 @@
+name: Build package
+
+on:
+  workflow_call:
+    inputs:
+      package:
+        required: true
+        type: string
+      artifact_name:
+        required: true
+        type: string
+  workflow_dispatch:
+    inputs:
+      package:
+        description: 'Name of the package to build'
+        required: true
+        default: 'livekit-plugins-browser'
+      artifact_name:
+        description: 'Artifact name for the distribution package'
+        required: true
+        default: 'build-artifact'
+
+jobs:
+  build_plugins:
+    runs-on: ubuntu-latest
+    if: |
+      inputs.package == 'livekit-agents' ||
+      inputs.package == 'livekit-plugins-azure' ||
+      inputs.package == 'livekit-plugins-cartesia' ||
+      inputs.package == 'livekit-plugins-deepgram' ||
+      inputs.package == 'livekit-plugins-elevenlabs' ||
+      inputs.package == 'livekit-plugins-google' ||
+      inputs.package == 'livekit-plugins-minimal' ||
+      inputs.package == 'livekit-plugins-nltk' ||
+      inputs.package == 'livekit-plugins-openai' ||
+      inputs.package == 'livekit-plugins-rag' ||
+      inputs.package == 'livekit-plugins-silero' ||
+      inputs.package == 'livekit-plugins-anthropic'
+
+    defaults:
+      run:
+        working-directory: "${{ startsWith(inputs.package, 'livekit-plugin') && 'livekit-plugins/' || '' }}${{ inputs.package }}"
+    steps:
+      - uses: actions/checkout@v4
+
+      - name: Set up Python
+        uses: actions/setup-python@v5
+        with:
+          python-version: "3.9"
+
+      - name: Install dependencies
+        run: |
+          python -m pip install --upgrade pip
+          pip install build
+
+      - name: Build package
+        run: python -m build
+
+      - name: Upload distribution package
+        uses: actions/upload-artifact@v3
+        with:
+          name: ${{ inputs.artifact_name }}
+          path: "${{ startsWith(inputs.package, 'livekit-plugin') && 'livekit-plugins/' || '' }}${{ inputs.package }}/dist/"
+
+  build_browser:
+    if: inputs.package == 'livekit-plugins-browser'
+    runs-on: ${{ matrix.os }}
+    strategy:
+      matrix:
+        os: [macos-14] # TODO(theomonnom): other platforms
+
+    defaults:
+      run:
+        working-directory: livekit-plugins/livekit-plugins-browser
+    steps:
+      - uses: actions/checkout@v4
+
+      - name: Set up Python
+        uses: actions/setup-python@v5
+        with:
+          python-version: "3.9"
+
+      - name: Install cibuildwheel
+        run: |
+          python -m pip install --upgrade pip
+          pip install cibuildwheel
+
+      - name: Build wheels
+        run: cibuildwheel --output-dir dist
+        env:
+          CIBW_SKIP: pp* cp313-*
+          CIBW_BUILD_VERBOSITY: 3
+
+      - name: Upload distribution package
+        uses: actions/upload-artifact@v3
+        with:
+          name: ${{ inputs.artifact_name }}
+          path: livekit-plugins/livekit-plugins-browser/dist/
\ No newline at end of file
diff --git a/.github/workflows/check-types.yml b/.github/workflows/check-types.yml
index aa560e70c..927c9e2eb 100644
--- a/.github/workflows/check-types.yml
+++ b/.github/workflows/check-types.yml
@@ -40,7 +40,8 @@ jobs:
           ./livekit-plugins/livekit-plugins-elevenlabs \
           ./livekit-plugins/livekit-plugins-cartesia \
           ./livekit-plugins/livekit-plugins-rag \
-          ./livekit-plugins/livekit-plugins-azure
+          ./livekit-plugins/livekit-plugins-azure \
+          ./livekit-plugins/livekit-plugins-anthropic
 
       - name: Install stub packages
         run: |
@@ -67,4 +68,5 @@ jobs:
           -p livekit.plugins.elevenlabs \
           -p livekit.plugins.cartesia \
           -p livekit.plugins.rag \
-          -p livekit.plugins.azure
+          -p livekit.plugins.azure \
+          -p livekit.plugins.anthropic
diff --git a/.github/workflows/publish-package.yml b/.github/workflows/publish-package.yml
index 6f0686ccf..2b997895b 100644
--- a/.github/workflows/publish-package.yml
+++ b/.github/workflows/publish-package.yml
@@ -52,6 +52,7 @@ jobs:
           echo "exitcode=$?" >> $GITHUB_OUTPUT
         env:
           GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
+
       - name: Add changes
         if: ${{ steps.release_mode.outputs.exitcode == '0' }}
         uses: EndBug/add-and-commit@v9
@@ -79,38 +80,11 @@ jobs:
     strategy:
       matrix:
         package: ${{ fromJson(needs.bump.outputs.packages) }}
-    defaults:
-      run:
-        working-directory: "${{ startsWith(matrix.package.name, 'livekit-plugin') && 'livekit-plugins/' || '' }}${{ matrix.package.name }}"
-
-    runs-on: ubuntu-latest
-
-    steps:
-      - uses: actions/checkout@v4
-        with:
-          submodules: true
-          lfs: true
-        env:
-          GITHUB_TOKEN: ${{ secrets.CHANGESETS_PUSH_PAT }}
-      - name: Set up Python
-        uses: actions/setup-python@v5
-        with:
-          python-version: "3.9"
-
-      - name: Install dependencies
-        run: |
-          python -m pip install --upgrade pip
-          pip install build
-
-      - name: Build package
-        run: python -m build
-
-      - name: Store the distribution packages
-        uses: actions/upload-artifact@v3
-        with:
-          name: python-package-distributions
-          path: "${{ startsWith(matrix.package.name, 'livekit-plugin') && 'livekit-plugins/' || '' }}${{ matrix.package.name }}/dist/"
+    uses: livekit/agents/.github/workflows/build-package.yml@main
+    with:
+      package: ${{ matrix.package.name }}
+      artifact_name: python-package-distributions
 
   publish:
     needs:
diff --git a/.github/workflows/tests.yml b/.github/workflows/tests.yml
index 0b900ea92..9d6f73da0 100644
--- a/.github/workflows/tests.yml
+++ b/.github/workflows/tests.yml
@@ -13,6 +13,11 @@ on:
 
 jobs:
   tests:
+    if: > # don't run tests for PRs on forks
+      ${{
+        !github.event.pull_request ||
+        github.event.pull_request.head.repo.full_name == github.repository
+      }}
     strategy:
       fail-fast: false
      matrix:
@@ -75,7 +80,8 @@ jobs:
           ./livekit-plugins/livekit-plugins-silero \
./livekit-plugins/livekit-plugins-elevenlabs \ ./livekit-plugins/livekit-plugins-cartesia \ - ./livekit-plugins/livekit-plugins-azure + ./livekit-plugins/livekit-plugins-azure \ + ./livekit-plugins/livekit-plugins-anthropic - name: Run tests shell: bash @@ -90,6 +96,7 @@ jobs: AZURE_SPEECH_KEY: ${{ secrets.AZURE_SPEECH_KEY }} AZURE_SPEECH_REGION: ${{ secrets.AZURE_SPEECH_REGION }} # nit: doesn't have to be secret GOOGLE_CREDENTIALS_JSON: ${{ secrets.GOOGLE_CREDENTIALS_JSON }} + ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }} GOOGLE_APPLICATION_CREDENTIALS: google.json run: | echo $GOOGLE_CREDENTIALS_JSON > google.json diff --git a/README.md b/README.md index ce4b8bd7f..3204d2bab 100644 --- a/README.md +++ b/README.md @@ -61,6 +61,7 @@ The following plugins are available today: | Plugin | Features | | ---------------------------------------------------------------------------------- | ------------------------------- | +| [livekit-plugins-anthropic](https://pypi.org/project/livekit-plugins-anthropic/) | LLM | | [livekit-plugins-azure](https://pypi.org/project/livekit-plugins-azure/) | STT, TTS | | [livekit-plugins-cartesia](https://pypi.org/project/livekit-plugins-cartesia/) | TTS | | [livekit-plugins-deepgram](https://pypi.org/project/livekit-plugins-deepgram/) | STT | @@ -70,6 +71,38 @@ The following plugins are available today: | [livekit-plugins-openai](https://pypi.org/project/livekit-plugins-openai/) | LLM, STT, TTS | | [livekit-plugins-silero](https://pypi.org/project/livekit-plugins-silero/) | VAD | +## Using LLM models + +The Agents framework supports a wide range of LLMs and hosting providers. + +### OpenAI-compatible models + +Most LLM providers offer an OpenAI-compatible API, which can be used with the `livekit-plugins-openai` plugin.
+ +```python +from livekit.plugins.openai.llm import LLM +``` + +- OpenAI: `LLM(model="gpt-4o")` +- Azure: `LLM.with_azure(azure_endpoint="", azure_deployment="")` +- Cerebras: `LLM.with_cerebras(api_key="", model="")` +- Fireworks: `LLM.with_fireworks(api_key="", model="")` +- Groq: `LLM.with_groq(api_key="", model="")` +- OctoAI: `LLM.with_octo(api_key="", model="")` +- Ollama: `LLM.with_ollama(base_url="http://localhost:11434/v1", model="")` +- Perplexity: `LLM.with_perplexity(api_key="", model="")` +- TogetherAI: `LLM.with_together(api_key="", model="")` + +### Anthropic Claude + +Anthropic Claude can be used with the `livekit-plugins-anthropic` plugin. + +```python +from livekit.plugins.anthropic.llm import LLM + +myllm = LLM(model="claude-3-opus-20240229") +``` + ## Concepts - **Agent**: A function that defines the workflow of a programmable, server-side participant. This is your application code. @@ -153,7 +186,9 @@ class MyPlugin(Plugin): ``` +
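Since every provider in the list above speaks the same chat-completions wire format, switching providers conceptually amounts to swapping a base URL and an API key behind the same client. A framework-free sketch of that idea — only the Ollama URL comes from the list above; the Groq and Together endpoints shown here are illustrative assumptions, not taken from the plugin source:

```python
# Sketch: model OpenAI-compatible providers as (base_url, api_key, model) triples.
from dataclasses import dataclass
from typing import Optional


@dataclass
class LLMConfig:
    base_url: str
    model: str
    api_key: Optional[str] = None


# Ollama URL is from the README above; the other two are assumed endpoints.
PROVIDERS = {
    "ollama": "http://localhost:11434/v1",
    "groq": "https://api.groq.com/openai/v1",
    "together": "https://api.together.xyz/v1",
}


def openai_compatible(provider: str, model: str, api_key: Optional[str] = None) -> LLMConfig:
    """Build the connection config any OpenAI-compatible client would need."""
    try:
        base_url = PROVIDERS[provider]
    except KeyError:
        raise ValueError(f"unknown provider: {provider}") from None
    return LLMConfig(base_url=base_url, model=model, api_key=api_key)


cfg = openai_compatible("ollama", model="llama3")
print(cfg.base_url)
```

The real plugin's `LLM.with_*` constructors play the same role as this lookup: they pin the base URL for you and leave model and credentials as parameters.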
+ diff --git a/examples/browser/browser_track.py b/examples/browser/browser_track.py new file mode 100644 index 000000000..998da7979 --- /dev/null +++ b/examples/browser/browser_track.py @@ -0,0 +1,55 @@ +import asyncio +import logging + +from dotenv import load_dotenv +from livekit import rtc +from livekit.agents import JobContext, WorkerOptions, cli +from livekit.plugins import browser + +WIDTH = 1920 +HEIGHT = 1080 + +load_dotenv() + + +async def entrypoint(job: JobContext): + await job.connect() + + ctx = browser.BrowserContext(dev_mode=True) + await ctx.initialize() + + page = await ctx.new_page(url="www.livekit.io") + + source = rtc.VideoSource(WIDTH, HEIGHT) + track = rtc.LocalVideoTrack.create_video_track("single-color", source) + options = rtc.TrackPublishOptions(source=rtc.TrackSource.SOURCE_CAMERA) + publication = await job.room.local_participant.publish_track(track, options) + logging.info("published track", extra={"track_sid": publication.sid}) + + @page.on("paint") + def on_paint(paint_data): + source.capture_frame(paint_data.frame) + + async def _test_cycle(): + urls = [ + "https://www.livekit.io", + "https://www.google.com", + ] + + i = 0 + async with ctx.playwright() as browser: + while True: + i += 1 + await asyncio.sleep(5) + defaultContext = browser.contexts[0] + defaultPage = defaultContext.pages[0] + try: + await defaultPage.goto(urls[i % len(urls)]) + except Exception: + logging.exception(f"failed to navigate to {urls[i % len(urls)]}") + + await _test_cycle() + + +if __name__ == "__main__": + cli.run_app(WorkerOptions(entrypoint_fnc=entrypoint)) diff --git a/examples/browser/standalone_app.py b/examples/browser/standalone_app.py new file mode 100644 index 000000000..fdc4bad04 --- /dev/null +++ b/examples/browser/standalone_app.py @@ -0,0 +1,3 @@ +from livekit.plugins import browser + +ctx = browser.BrowserContext(dev_mode=True) diff --git a/examples/minimal_worker.py b/examples/minimal_worker.py index aeca197cc..e3a9ed3b9 100644 --- 
a/examples/minimal_worker.py +++ b/examples/minimal_worker.py @@ -1,6 +1,6 @@ import logging -from livekit.agents import AutoSubscribe, JobContext, WorkerOptions, cli +from livekit.agents import AutoSubscribe, JobContext, WorkerOptions, WorkerType, cli logger = logging.getLogger("my-worker") logger.setLevel(logging.INFO) @@ -16,4 +16,6 @@ async def entrypoint(ctx: JobContext): if __name__ == "__main__": - cli.run_app(WorkerOptions(entrypoint_fnc=entrypoint)) + # WorkerType.ROOM is the default worker type which will create an agent for every room. + # You can also use WorkerType.PUBLISHER to create a single agent for all participants that publish a track. + cli.run_app(WorkerOptions(entrypoint_fnc=entrypoint, worker_type=WorkerType.ROOM)) diff --git a/examples/participant-entrypoint/README.md b/examples/participant-entrypoint/README.md new file mode 100644 index 000000000..249912907 --- /dev/null +++ b/examples/participant-entrypoint/README.md @@ -0,0 +1,30 @@ +# Participant Entrypoint Example + +This example shows how to run tasks when participants join. For example, a common use case is to fetch some external data based on the participant's attributes. + +## Run + +### Setup and activate a virtual env: + +`python -m venv venv` + +`source venv/bin/activate` + +### Set environment variables: + +```bash +export LIVEKIT_URL= +export LIVEKIT_API_KEY= +export LIVEKIT_API_SECRET= +``` + +### Install requirements: +`pip install -r requirements.txt` + +### Run the agent worker: + +`python participant_entrypoint.py dev` + +### Test with a LiveKit frontend: + +We've built [Agents Playground](https://agents-playground.livekit.io) so you don't have to build your own frontend while you iterate on your agent.
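The core of the participant-entrypoint pattern described in this README is a fan-out: every registered callback is started as its own task for every participant that joins, so the callbacks run concurrently and independently. A framework-free asyncio sketch of that dispatch — the `Participant` type and the `Dispatcher` class are illustrative stand-ins, not LiveKit APIs:

```python
import asyncio
from dataclasses import dataclass, field
from typing import Awaitable, Callable, List


@dataclass
class Participant:
    identity: str


EntryFn = Callable[[Participant], Awaitable[None]]


@dataclass
class Dispatcher:
    entrypoints: List[EntryFn] = field(default_factory=list)
    tasks: List["asyncio.Task"] = field(default_factory=list)

    def add_participant_entrypoint(self, fn: EntryFn) -> None:
        self.entrypoints.append(fn)

    def on_participant_connected(self, p: Participant) -> None:
        # one task per (callback, participant) pair, run concurrently
        for fn in self.entrypoints:
            self.tasks.append(asyncio.create_task(fn(p)))


async def main() -> List[str]:
    log: List[str] = []

    async def greet(p: Participant) -> None:
        log.append(f"greet:{p.identity}")

    async def fetch_profile(p: Participant) -> None:
        await asyncio.sleep(0)  # stand-in for a DB/API call
        log.append(f"profile:{p.identity}")

    d = Dispatcher()
    d.add_participant_entrypoint(greet)
    d.add_participant_entrypoint(fetch_profile)
    d.on_participant_connected(Participant("alice"))
    await asyncio.gather(*d.tasks)
    return log


print(asyncio.run(main()))
```

In the real example that follows, `ctx.add_participant_entrypoint` plays the role of `add_participant_entrypoint` here, and the framework invokes the fan-out for each joining participant after `ctx.connect`.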
diff --git a/examples/participant-entrypoint/participant_entrypoint.py b/examples/participant-entrypoint/participant_entrypoint.py new file mode 100644 index 000000000..5c8c38c69 --- /dev/null +++ b/examples/participant-entrypoint/participant_entrypoint.py @@ -0,0 +1,44 @@ +import asyncio +import logging + +from dotenv import load_dotenv +from livekit import rtc +from livekit.agents import AutoSubscribe, JobContext, WorkerOptions, cli + +load_dotenv() + +logger = logging.getLogger("my-worker") +logger.setLevel(logging.INFO) + + +async def entrypoint(ctx: JobContext): + logger.info("starting entrypoint") + + async def participant_task_1(ctx: JobContext, p: rtc.RemoteParticipant): + # You can filter out participants you are not interested in + # if p.identity != "some_identity_of_interest": + # return + + logger.info(f"participant task 1 starting for {p.identity}") + # Do something with p.attributes, p.identity, p.metadata, etc. + # my_stuff = await fetch_stuff_from_my_db(p) + + # Do something + await asyncio.sleep(60) + logger.info(f"participant task done for {p.identity}") + + async def participant_task_2(ctx: JobContext, p: rtc.RemoteParticipant): + # multiple tasks can be run concurrently for each participant + logger.info(f"participant task 2 starting for {p.identity}") + await asyncio.sleep(10) + + # Add participant entrypoints before calling ctx.connect + ctx.add_participant_entrypoint(entrypoint_fnc=participant_task_1) + ctx.add_participant_entrypoint(entrypoint_fnc=participant_task_2) + + await ctx.connect(auto_subscribe=AutoSubscribe.SUBSCRIBE_ALL) + logger.info("connected to the room") + + +if __name__ == "__main__": + cli.run_app(WorkerOptions(entrypoint_fnc=entrypoint)) diff --git a/examples/participant-entrypoint/requirements.txt b/examples/participant-entrypoint/requirements.txt new file mode 100644 index 000000000..468a9e5d2 --- /dev/null +++ b/examples/participant-entrypoint/requirements.txt @@ -0,0 +1 @@ +livekit-agents>=0.9.0 diff --git 
a/examples/simple-color/agent.py b/examples/simple-color/agent.py index e64f5cfda..57fc99952 100644 --- a/examples/simple-color/agent.py +++ b/examples/simple-color/agent.py @@ -1,15 +1,17 @@ import asyncio import logging +import random +from dotenv import load_dotenv from livekit import rtc from livekit.agents import JobContext, WorkerOptions, cli +# Load environment variables +load_dotenv() + WIDTH = 640 HEIGHT = 480 -# change this color in dev mode and the agent will automatically update -COLOR = bytes([0, 255, 0, 255]) - async def entrypoint(job: JobContext): await job.connect() @@ -26,7 +28,12 @@ async def _draw_color(): while True: await asyncio.sleep(0.1) # 100ms - argb_frame[:] = COLOR * WIDTH * HEIGHT + # Create a new random color + r, g, b = [random.randint(0, 255) for _ in range(3)] + color = bytes([r, g, b, 255]) + + # Fill the frame with the new random color + argb_frame[:] = color * WIDTH * HEIGHT frame = rtc.VideoFrame(WIDTH, HEIGHT, rtc.VideoBufferType.RGBA, argb_frame) source.capture_frame(frame) diff --git a/examples/simple-color/requirements.txt b/examples/simple-color/requirements.txt index 0e6eb52ae..468a9e5d2 100644 --- a/examples/simple-color/requirements.txt +++ b/examples/simple-color/requirements.txt @@ -1 +1 @@ -livekit-agents>=0.8.5 +livekit-agents>=0.9.0 diff --git a/examples/speech-to-text/README.md b/examples/speech-to-text/README.md index 700497899..f468b601c 100644 --- a/examples/speech-to-text/README.md +++ b/examples/speech-to-text/README.md @@ -1,18 +1,14 @@ # Speech-to-text -This example shows how you can transcript real-time audio data into text. +This example shows realtime transcription from audio to text. -It uses Deepgram's STT API to transcript the audio data.
It can be switched to -other STT providers by changing this line: +It uses Deepgram's STT API, but supports other STT plugins by changing this line: ```python stt = deepgram.STT() ``` -All transcriptions are sent to clients in the room with LiveKit's transcription protocol. - -It's currently supported in the JS SDK and React Components. This will be made available for -all other SDKs in the coming weeks. +To render the transcriptions into your client application, refer to the [full documentation](https://docs.livekit.io/agents/build/transcriptions). ## Running the example diff --git a/examples/speech-to-text/deepgram_stt.py b/examples/speech-to-text/deepgram_stt.py index 6a8cc100e..24d770f68 100644 --- a/examples/speech-to-text/deepgram_stt.py +++ b/examples/speech-to-text/deepgram_stt.py @@ -1,6 +1,7 @@ import asyncio import logging +from dotenv import load_dotenv from livekit import rtc from livekit.agents import ( AutoSubscribe, @@ -12,6 +13,8 @@ ) from livekit.plugins import deepgram +load_dotenv() + logger = logging.getLogger("deepgram-stt-demo") logger.setLevel(logging.INFO) diff --git a/examples/speech-to-text/requirements.txt b/examples/speech-to-text/requirements.txt index 852095813..eb367925c 100644 --- a/examples/speech-to-text/requirements.txt +++ b/examples/speech-to-text/requirements.txt @@ -1,2 +1,2 @@ -livekit-agents>=0.8.5 -livekit-plugins-deepgram>=0.6.4 +livekit-agents>=0.9.0 +livekit-plugins-deepgram>=0.6.7 diff --git a/examples/text-to-speech/cartesia_tts.py b/examples/text-to-speech/cartesia_tts.py new file mode 100644 index 000000000..2f87ee975 --- /dev/null +++ b/examples/text-to-speech/cartesia_tts.py @@ -0,0 +1,43 @@ +import asyncio +import logging + +from dotenv import load_dotenv +from livekit import rtc +from livekit.agents import AutoSubscribe, JobContext, WorkerOptions, cli +from livekit.plugins import cartesia + +load_dotenv() + +logger = logging.getLogger("cartesia-tts-demo") +logger.setLevel(logging.INFO) + + +async def 
entrypoint(job: JobContext): + logger.info("starting tts example agent") + + tts = cartesia.TTS( + speed="fastest", + emotion=["surprise:highest"], + ) + + source = rtc.AudioSource(tts.sample_rate, tts.num_channels) + track = rtc.LocalAudioTrack.create_audio_track("agent-mic", source) + options = rtc.TrackPublishOptions() + options.source = rtc.TrackSource.SOURCE_MICROPHONE + + await job.connect(auto_subscribe=AutoSubscribe.SUBSCRIBE_NONE) + publication = await job.room.local_participant.publish_track(track, options) + await publication.wait_for_subscription() + + logger.info('Saying "Hello!"') + async for output in tts.synthesize("Hello I hope you are having a great day."): + await source.capture_frame(output.frame) + + await asyncio.sleep(4) + logger.info('Saying "Goodbye."') + async for output in tts.synthesize("Goodbye I hope to see you again soon."): + await source.capture_frame(output.frame) + + +if __name__ == "__main__": + cli.run_app(WorkerOptions(entrypoint_fnc=entrypoint)) diff --git a/examples/text-to-speech/elevenlabs_tts.py b/examples/text-to-speech/elevenlabs_tts.py index 7f6180402..91e1bd7b5 100644 --- a/examples/text-to-speech/elevenlabs_tts.py +++ b/examples/text-to-speech/elevenlabs_tts.py @@ -2,6 +2,7 @@ import logging from typing import Optional +from dotenv import load_dotenv from livekit import rtc from livekit.agents import JobContext, WorkerOptions, cli from livekit.plugins import elevenlabs @@ -9,6 +10,8 @@ logger = logging.getLogger("elevenlabs-tts-demo") logger.setLevel(logging.INFO) +load_dotenv() + def _text_to_chunks(text: str) -> list[str]: """Split the text into chunks of 2, 3, and 4 words""" @@ -51,9 +54,9 @@ async def entrypoint(job: JobContext): options.source = rtc.TrackSource.SOURCE_MICROPHONE await job.connect() - await job.room.local_participant.publish_track(track, options) + publication = await job.room.local_participant.publish_track(track, options) + await publication.wait_for_subscription() - await asyncio.sleep(1) 
logger.info('Saying "Bonjour, comment allez-vous?"') async for output in tts_11labs.synthesize("Bonjour, comment allez-vous?"): await source.capture_frame(output.frame) diff --git a/examples/text-to-speech/openai_tts.py b/examples/text-to-speech/openai_tts.py index 0edcb3b7c..fce018309 100644 --- a/examples/text-to-speech/openai_tts.py +++ b/examples/text-to-speech/openai_tts.py @@ -1,10 +1,13 @@ import asyncio import logging +from dotenv import load_dotenv from livekit import rtc from livekit.agents import AutoSubscribe, JobContext, WorkerOptions, cli from livekit.plugins import openai +load_dotenv() + logger = logging.getLogger("openai-tts-demo") logger.setLevel(logging.INFO) @@ -20,9 +23,9 @@ async def entrypoint(job: JobContext): options.source = rtc.TrackSource.SOURCE_MICROPHONE await job.connect(auto_subscribe=AutoSubscribe.SUBSCRIBE_NONE) - await job.room.local_participant.publish_track(track, options) + publication = await job.room.local_participant.publish_track(track, options) + await publication.wait_for_subscription() - await asyncio.sleep(1) logger.info('Saying "Hello!"') async for output in tts.synthesize("Hello!"): await source.capture_frame(output.frame) diff --git a/examples/text-to-speech/requirements.txt b/examples/text-to-speech/requirements.txt index 292d588ad..e81e20304 100644 --- a/examples/text-to-speech/requirements.txt +++ b/examples/text-to-speech/requirements.txt @@ -1,2 +1,4 @@ -livekit-agents>=0.8.5 -livekit-plugins-openai>=0.8.0 +livekit-agents>=0.9.0 +livekit-plugins-openai>=0.8.4 +livekit-plugins-cartesia>=0.4.2 +livekit-plugins-elevenlabs>=0.7.5 diff --git a/examples/text-to-speech/sync_tts_transcription.py b/examples/text-to-speech/sync_tts_transcription.py index 545247ccd..d7a349b56 100644 --- a/examples/text-to-speech/sync_tts_transcription.py +++ b/examples/text-to-speech/sync_tts_transcription.py @@ -2,6 +2,7 @@ import logging from typing import Optional +from dotenv import load_dotenv from livekit import rtc from 
livekit.agents import ( AutoSubscribe, @@ -13,6 +14,8 @@ ) from livekit.plugins import elevenlabs +load_dotenv() + logger = logging.getLogger("transcription-forwarding-demo") logger.setLevel(logging.INFO) @@ -27,14 +30,14 @@ async def entrypoint(ctx: JobContext): options = rtc.TrackPublishOptions(source=rtc.TrackSource.SOURCE_MICROPHONE) await ctx.connect(auto_subscribe=AutoSubscribe.SUBSCRIBE_NONE) - await ctx.room.local_participant.publish_track(track, options) + publication = await ctx.room.local_participant.publish_track(track, options) + await publication.wait_for_subscription() # start the transcription examples tts_forwarder = transcription.TTSSegmentsForwarder( room=ctx.room, participant=ctx.room.local_participant ) - await asyncio.sleep(2) await _eg_single_segment(tts_forwarder, tts_11labs, source) await asyncio.sleep(2) diff --git a/examples/voice-assistant/README.md b/examples/voice-assistant/README.md index d9b48ff82..6f7e176fb 100644 --- a/examples/voice-assistant/README.md +++ b/examples/voice-assistant/README.md @@ -1,15 +1,19 @@ -# Voice Assistant Example +# Voice Assistant Examples + +We have a few examples that show the various ways of using the VoiceAssistant class: -This example shows two usages of the VoiceAssistant class: - `minimal_assistant.py`: a basic conversational assistant -- `function_calling.py`: a voice assistant capable of obeying commands (turning on/off a mock room's lights) +- `function_calling_weather.py`: a weather assistant that calls an API endpoint to retrieve the weather +- `custom_pronunciation.py`: using the `before_tts_cb` hook to customize how TTS pronounces words +- `simple_rag`: a simple RAG assistant that answers questions by querying an embeddings index + +The demo assistants use: -- Deepgram for Speech-to-text -- OpenAI for LLM -- Elevenlabs for Text-to-speech +- OpenAI for LLM and Text-to-speech ## Run + Instructions for running the agents are identical; the following steps will
assume you are running `minimal_assistant.py` ### Setup and activate a virtual env: @@ -24,17 +28,13 @@ Instructions for running the two agents are identical, the following steps will export LIVEKIT_URL= export LIVEKIT_API_KEY= export LIVEKIT_API_SECRET= -export ELEVEN_API_KEY= export DEEPGRAM_API_KEY= export OPENAI_API_KEY= ``` ### Install requirements: -`pip install -r requirements.txt` - -### Download files (in this case, it downloads the model weights for Voice-activity-detection): -`python minimal_assistant.py download-files` +`pip install -r requirements.txt` ### Run the agent worker: diff --git a/examples/voice-assistant/custom_pronunciation.py b/examples/voice-assistant/custom_pronunciation.py new file mode 100644 index 000000000..e6ff7cd52 --- /dev/null +++ b/examples/voice-assistant/custom_pronunciation.py @@ -0,0 +1,49 @@ +from __future__ import annotations + +from typing import AsyncIterable + +from dotenv import load_dotenv +from livekit.agents import AutoSubscribe, JobContext, WorkerOptions, cli, llm, tokenize +from livekit.agents.voice_assistant import VoiceAssistant +from livekit.plugins import cartesia, deepgram, openai, silero + +load_dotenv() + + +async def entrypoint(ctx: JobContext): + initial_ctx = llm.ChatContext().append( + role="system", + text=( + "You are a voice assistant created by LiveKit. Your interface with users will be voice. " + "You should use short and concise responses, avoiding the use of unpronounceable punctuation."
+ ), + ) + + await ctx.connect(auto_subscribe=AutoSubscribe.AUDIO_ONLY) + + def _before_tts_cb(assistant: VoiceAssistant, text: str | AsyncIterable[str]): + # The TTS is incorrectly pronouncing "LiveKit", so we'll replace it with a phonetic + # spelling + return tokenize.utils.replace_words( + text=text, replacements={"livekit": r"<>"} + ) + + # for this example, we also boost the keyword "LiveKit" to make it more likely to be + # recognized by the STT + deepgram_stt = deepgram.STT(keywords=[("LiveKit", 3.5)]) + + assistant = VoiceAssistant( + vad=silero.VAD.load(), + stt=deepgram_stt, + llm=openai.LLM(), + tts=cartesia.TTS(), + chat_ctx=initial_ctx, + before_tts_cb=_before_tts_cb, + ) + assistant.start(ctx.room) + + await assistant.say("Hey, LiveKit is awesome!", allow_interruptions=True) + + +if __name__ == "__main__": + cli.run_app(WorkerOptions(entrypoint_fnc=entrypoint)) diff --git a/examples/voice-assistant/function_calling.py b/examples/voice-assistant/function_calling.py deleted file mode 100644 index 9392bc900..000000000 --- a/examples/voice-assistant/function_calling.py +++ /dev/null @@ -1,115 +0,0 @@ -import asyncio -import enum -import logging -from typing import Annotated - -from dotenv import load_dotenv -from livekit.agents import AutoSubscribe, JobContext, WorkerOptions, cli, llm -from livekit.agents.voice_assistant import VoiceAssistant -from livekit.plugins import deepgram, openai, silero - -load_dotenv() - -logger = logging.getLogger("function-calling-demo") -logger.setLevel(logging.INFO) - - -class Room(enum.Enum): - # ai_callable can understand enum types as a set of choices - # this is equivalent to: - # `Annotated[Room, llm.TypeInfo(choices=["bedroom", "living room", "kitchen", "bathroom", "office"])]` - BEDROOM = "bedroom" - LIVING_ROOM = "living room" - KITCHEN = "kitchen" - BATHROOM = "bathroom" - OFFICE = "office" - - -class AssistantFnc(llm.FunctionContext): - """ - The class defines a set of AI functions that the assistant
can execute. - """ - - def __init__(self) -> None: - super().__init__() - - # default state of the lights in each room - self._light_status = { - Room.BEDROOM: False, - Room.LIVING_ROOM: True, - Room.KITCHEN: True, - Room.BATHROOM: False, - Room.OFFICE: False, - } - - @property - def light_status(self): - return self._light_status - - # Simple demonstration of an AI function that can be called by the user with some arguments. - @llm.ai_callable(description="Turn on/off the lights in a room") - async def toggle_light( - self, - room: Annotated[Room, llm.TypeInfo(description="The specific room")], - status: bool, - ): - logger.info("toggle_light - room: %s status: %s", room, status) - self._light_status[room] = status - return f"Turned the lights in the {room} {'on' if status else 'off'}" - - -async def entrypoint(ctx: JobContext): - fnc_ctx = AssistantFnc() # create our fnc ctx instance - - async def _will_synthesize_assistant_reply( - assistant: VoiceAssistant, chat_ctx: llm.ChatContext - ): - # Inject the current state of the lights into the context of the LLM - chat_ctx = chat_ctx.copy() - chat_ctx.messages.append( - llm.ChatMessage( - content=( - "Current state of the lights:\n" - + "\n".join( - f"- {room}: {'on' if status else 'off'}" - for room, status in fnc_ctx.light_status.items() - ) - ), - role="system", - ) - ) - return assistant.llm.chat(chat_ctx=chat_ctx, fnc_ctx=assistant.fnc_ctx) - - initial_chat_ctx = llm.ChatContext() - initial_chat_ctx.messages.append( - llm.ChatMessage( - content=( - "You are a home assistant created by LiveKit. Your interface with users will be voice. " - "You should use short and concise responses, and avoiding usage of unpronouncable punctuation. 
" - ), - role="system", - ) - ) - - assistant = VoiceAssistant( - vad=silero.VAD.load(), - stt=deepgram.STT(), - llm=openai.LLM(), - tts=openai.TTS(), - fnc_ctx=fnc_ctx, - chat_ctx=initial_chat_ctx, - will_synthesize_assistant_reply=_will_synthesize_assistant_reply, - ) - - await ctx.connect(auto_subscribe=AutoSubscribe.AUDIO_ONLY) - - # Start the assistant. This will automatically publish a microphone track and listen to the first participant - # it finds in the current room. If you need to specify a particular participant, use the participant parameter. - assistant.start(ctx.room) - - await asyncio.sleep(2) - await assistant.say("Hey, how can I help you today?") - - -if __name__ == "__main__": - cli.run_app(WorkerOptions(entrypoint_fnc=entrypoint)) diff --git a/examples/voice-assistant/function_calling_weather.py b/examples/voice-assistant/function_calling_weather.py new file mode 100644 index 000000000..82155cce1 --- /dev/null +++ b/examples/voice-assistant/function_calling_weather.py @@ -0,0 +1,85 @@ +import logging +from typing import Annotated + +import aiohttp +from dotenv import load_dotenv +from livekit.agents import ( + AutoSubscribe, + JobContext, + JobProcess, + WorkerOptions, + cli, + llm, +) +from livekit.agents.voice_assistant import VoiceAssistant +from livekit.plugins import deepgram, openai, silero + +load_dotenv() + +logger = logging.getLogger("weather-demo") +logger.setLevel(logging.INFO) + + +class AssistantFnc(llm.FunctionContext): + """ + The class defines a set of LLM functions that the assistant can execute. + """ + + @llm.ai_callable() + async def get_weather( + self, + location: Annotated[ + str, llm.TypeInfo(description="The location to get the weather for") + ], + ): + """Called when the user asks about the weather. 
This function will return the weather for the given location.""" + logger.info(f"getting weather for {location}") + url = f"https://wttr.in/{location}?format=%C+%t" + async with aiohttp.ClientSession() as session: + async with session.get(url) as response: + if response.status == 200: + weather_data = await response.text() + # response from the function call is returned to the LLM + return f"The weather in {location} is {weather_data}." + else: + raise Exception(f"Failed to get weather data, status code: {response.status}") + + +def prewarm_process(proc: JobProcess): + # preload silero VAD in memory to speed up session start + proc.userdata["vad"] = silero.VAD.load() + + +async def entrypoint(ctx: JobContext): + await ctx.connect(auto_subscribe=AutoSubscribe.AUDIO_ONLY) + fnc_ctx = AssistantFnc()  # create our fnc ctx instance + initial_chat_ctx = llm.ChatContext().append( + text=( + "You are a weather assistant created by LiveKit. Your interface with users will be voice. " + "You will provide weather information for a given location." + ), + role="system", + ) + participant = await ctx.wait_for_participant() + assistant = VoiceAssistant( + vad=ctx.proc.userdata["vad"], + stt=deepgram.STT(), + llm=openai.LLM(), + tts=openai.TTS(), + fnc_ctx=fnc_ctx, + chat_ctx=initial_chat_ctx, + ) + # Start the assistant. This will automatically publish a microphone track and listen to the participant. + assistant.start(ctx.room, participant) + await assistant.say( + "Hello from the weather station. Would you like to know the weather? If so, tell me your location."
+ ) + + +if __name__ == "__main__": + cli.run_app( + WorkerOptions( + entrypoint_fnc=entrypoint, + prewarm_fnc=prewarm_process, + ), + ) diff --git a/examples/voice-assistant/minimal_assistant.py b/examples/voice-assistant/minimal_assistant.py index 35e0dee8e..c1aec2a44 100644 --- a/examples/voice-assistant/minimal_assistant.py +++ b/examples/voice-assistant/minimal_assistant.py @@ -1,12 +1,25 @@ import asyncio +import logging from dotenv import load_dotenv from livekit import rtc -from livekit.agents import AutoSubscribe, JobContext, WorkerOptions, cli, llm +from livekit.agents import ( + AutoSubscribe, + JobContext, + JobProcess, + WorkerOptions, + cli, + llm, +) from livekit.agents.voice_assistant import VoiceAssistant from livekit.plugins import deepgram, openai, silero load_dotenv() +logger = logging.getLogger("voice-assistant") + + +def prewarm(proc: JobProcess): + proc.userdata["vad"] = silero.VAD.load() async def entrypoint(ctx: JobContext): @@ -18,16 +31,27 @@ async def entrypoint(ctx: JobContext): ), ) + logger.info(f"connecting to room {ctx.room.name}") await ctx.connect(auto_subscribe=AutoSubscribe.AUDIO_ONLY) + # wait for the first participant to connect + participant = await ctx.wait_for_participant() + logger.info(f"starting voice assistant for participant {participant.identity}") + + dg_model = "nova-2-general" + if participant.kind == rtc.ParticipantKind.PARTICIPANT_KIND_SIP: + # use a model optimized for telephony + dg_model = "nova-2-phonecall" + assistant = VoiceAssistant( - vad=silero.VAD.load(), - stt=deepgram.STT(), + vad=ctx.proc.userdata["vad"], + stt=deepgram.STT(model=dg_model), llm=openai.LLM(), tts=openai.TTS(), chat_ctx=initial_ctx, ) - assistant.start(ctx.room) + + assistant.start(ctx.room, participant) # listen to incoming chat messages, only required if you'd like the agent to # answer incoming messages from Chat @@ -44,9 +68,8 @@ def on_chat_received(msg: rtc.ChatMessage): if msg.message: 
asyncio.create_task(answer_from_text(msg.message)) - await asyncio.sleep(1) await assistant.say("Hey, how can I help you today?", allow_interruptions=True) if __name__ == "__main__": - cli.run_app(WorkerOptions(entrypoint_fnc=entrypoint)) + cli.run_app(WorkerOptions(entrypoint_fnc=entrypoint, prewarm_fnc=prewarm)) diff --git a/examples/voice-assistant/requirements.txt b/examples/voice-assistant/requirements.txt index 1c92c23ae..7071396dc 100644 --- a/examples/voice-assistant/requirements.txt +++ b/examples/voice-assistant/requirements.txt @@ -1,5 +1,6 @@ -livekit-agents>=0.8.5 -livekit-plugins-openai>=0.8.0 -livekit-plugins-deepgram>=0.6.4 -livekit-plugins-silero>=0.6.3 -python-dotenv~=1.0 \ No newline at end of file +livekit-agents>=0.9.0 +livekit-plugins-openai>=0.8.4 +livekit-plugins-deepgram>=0.6.7 +livekit-plugins-silero>=0.6.4 +python-dotenv~=1.0 +aiofile~=3.8.8 diff --git a/examples/voice-assistant/save_chatctx.py b/examples/voice-assistant/save_chatctx.py new file mode 100644 index 000000000..d6b1b6ac6 --- /dev/null +++ b/examples/voice-assistant/save_chatctx.py @@ -0,0 +1,84 @@ +import asyncio +from datetime import datetime + +from aiofile import async_open as open +from dotenv import load_dotenv +from livekit import rtc +from livekit.agents import AutoSubscribe, JobContext, WorkerOptions, cli, llm +from livekit.agents.voice_assistant import VoiceAssistant +from livekit.plugins import deepgram, openai, silero + +load_dotenv() + + +async def entrypoint(ctx: JobContext): + initial_ctx = llm.ChatContext().append( + role="system", + text=( + "You are a voice assistant created by LiveKit. Your interface with users will be voice. " + "You should use short and concise responses, avoiding the use of unpronounceable punctuation."
+ ), + ) + + await ctx.connect(auto_subscribe=AutoSubscribe.AUDIO_ONLY) + + assistant = VoiceAssistant( + vad=silero.VAD.load(), + stt=deepgram.STT(), + llm=openai.LLM(), + tts=openai.TTS(), + chat_ctx=initial_ctx, + ) + assistant.start(ctx.room) + + # listen to incoming chat messages, only required if you'd like the agent to + # answer incoming messages from Chat + chat = rtc.ChatManager(ctx.room) + + async def answer_from_text(txt: str): + chat_ctx = assistant.chat_ctx.copy() + chat_ctx.append(role="user", text=txt) + stream = assistant.llm.chat(chat_ctx=chat_ctx) + await assistant.say(stream) + + @chat.on("message_received") + def on_chat_received(msg: rtc.ChatMessage): + if msg.message: + asyncio.create_task(answer_from_text(msg.message)) + + log_queue = asyncio.Queue() + + @assistant.on("user_speech_committed") + def on_user_speech_committed(msg: llm.ChatMessage): + # convert string lists to strings, drop images + if isinstance(msg.content, list): + msg.content = "\n".join( + "[image]" if isinstance(x, llm.ChatImage) else x for x in msg.content + ) + log_queue.put_nowait(f"[{datetime.now()}] USER:\n{msg.content}\n\n") + + @assistant.on("agent_speech_committed") + def on_agent_speech_committed(msg: llm.ChatMessage): + log_queue.put_nowait(f"[{datetime.now()}] AGENT:\n{msg.content}\n\n") + + async def write_transcription(): + async with open("transcriptions.log", "w") as f: + while True: + msg = await log_queue.get() + if msg is None: + break + await f.write(msg) + + write_task = asyncio.create_task(write_transcription()) + + async def finish_queue(): + log_queue.put_nowait(None) + await write_task + + ctx.add_shutdown_callback(finish_queue) + + await assistant.say("Hey, how can I help you today?", allow_interruptions=True) + + +if __name__ == "__main__": + cli.run_app(WorkerOptions(entrypoint_fnc=entrypoint)) diff --git a/examples/voice-assistant/simple-rag/assistant.py b/examples/voice-assistant/simple-rag/assistant.py index 84d9aa6ae..1bbcda056 100644 ---
a/examples/voice-assistant/simple-rag/assistant.py +++ b/examples/voice-assistant/simple-rag/assistant.py @@ -1,4 +1,3 @@ -import asyncio import pickle from livekit.agents import AutoSubscribe, JobContext, WorkerOptions, cli, llm @@ -13,9 +12,10 @@ async def entrypoint(ctx: JobContext): - async def _will_synthesize_assistant_answer( - assistant: VoiceAssistant, chat_ctx: llm.ChatContext - ): + async def _enrich_with_rag(assistant: VoiceAssistant, chat_ctx: llm.ChatContext): + # locate the last user message and use it to query the RAG model + # to get the most relevant paragraph + # then provide that as additional context to the LLM user_msg = chat_ctx.messages[-1] user_embedding = await openai.create_embeddings( input=[user_msg.content], @@ -28,7 +28,6 @@ async def _will_synthesize_assistant_answer( user_msg.content = ( "Context:\n" + paragraph + "\n\nUser question: " + user_msg.content ) - return assistant.llm.chat(chat_ctx=chat_ctx, fnc_ctx=assistant.fnc_ctx) initial_ctx = llm.ChatContext().append( role="system", @@ -47,12 +46,11 @@ async def _will_synthesize_assistant_answer( stt=deepgram.STT(), llm=openai.LLM(), tts=openai.TTS(), - will_synthesize_assistant_reply=_will_synthesize_assistant_answer, + before_llm_cb=_enrich_with_rag, ) assistant.start(ctx.room) - await asyncio.sleep(1) await assistant.say("Hey, how can I help you today?", allow_interruptions=True) diff --git a/livekit-agents/CHANGELOG.md b/livekit-agents/CHANGELOG.md index f56c62696..698682dc5 100644 --- a/livekit-agents/CHANGELOG.md +++ b/livekit-agents/CHANGELOG.md @@ -1,5 +1,131 @@ # livekit-agents +## 0.9.0 + +### Minor Changes + +- rename voice_assistant.state to lk.agent.state - [#772](https://github.com/livekit/agents/pull/772) ([@bcherry](https://github.com/bcherry)) + +### Patch Changes + +- bump rtc - [#782](https://github.com/livekit/agents/pull/782) ([@nbsp](https://github.com/nbsp)) + +- improve graceful shutdown - [#756](https://github.com/livekit/agents/pull/756) 
([@theomonnom](https://github.com/theomonnom)) + +- avoid returning tiny frames from TTS - [#747](https://github.com/livekit/agents/pull/747) ([@theomonnom](https://github.com/theomonnom)) + +- windows: default to threaded executor & fix dev mode - [#755](https://github.com/livekit/agents/pull/755) ([@theomonnom](https://github.com/theomonnom)) + +- 11labs: send phoneme in one entire xml chunk - [#766](https://github.com/livekit/agents/pull/766) ([@theomonnom](https://github.com/theomonnom)) + +- fix: process not starting if num_idle_processes is zero - [#763](https://github.com/livekit/agents/pull/763) ([@theomonnom](https://github.com/theomonnom)) + +- voiceassistant: avoid tiny frames on playout - [#750](https://github.com/livekit/agents/pull/750) ([@theomonnom](https://github.com/theomonnom)) + +- voiceassistant: expose turn_completion_delay - [#752](https://github.com/livekit/agents/pull/752) ([@theomonnom](https://github.com/theomonnom)) + +- limit concurrent process init to 1 - [#751](https://github.com/livekit/agents/pull/751) ([@theomonnom](https://github.com/theomonnom)) + +- Add typing-extensions as a dependency - [#778](https://github.com/livekit/agents/pull/778) ([@keepingitneil](https://github.com/keepingitneil)) + +- Allow setting LLM temperature with VoiceAssistant - [#741](https://github.com/livekit/agents/pull/741) ([@davidzhao](https://github.com/davidzhao)) + +- better dev defaults - [#762](https://github.com/livekit/agents/pull/762) ([@theomonnom](https://github.com/theomonnom)) + +- voiceassistant: allow to cancel llm generation inside before_llm_cb - [#753](https://github.com/livekit/agents/pull/753) ([@theomonnom](https://github.com/theomonnom)) + +- use os.exit to exit forcefully - [#770](https://github.com/livekit/agents/pull/770) ([@theomonnom](https://github.com/theomonnom)) + +## 0.8.12 + +### Patch Changes + +- tts_forwarder: don't raise inside mark_{audio,text}_segment_end when nothing was pushed - 
[#730](https://github.com/livekit/agents/pull/730) ([@theomonnom](https://github.com/theomonnom)) + +## 0.8.11 + +### Patch Changes + +- improve gracefully_cancel logic - [#720](https://github.com/livekit/agents/pull/720) ([@theomonnom](https://github.com/theomonnom)) + +- Make ctx.room.name available prior to connection - [#716](https://github.com/livekit/agents/pull/716) ([@davidzhao](https://github.com/davidzhao)) + +- ipc: add threaded job runner - [#684](https://github.com/livekit/agents/pull/684) ([@theomonnom](https://github.com/theomonnom)) + +- voiceassistant: add VoiceAssistantState - [#654](https://github.com/livekit/agents/pull/654) ([@lukasIO](https://github.com/lukasIO)) + +- add JobContext.wait_for_participant - [#712](https://github.com/livekit/agents/pull/712) ([@theomonnom](https://github.com/theomonnom)) + +- fix non pickleable log - [#691](https://github.com/livekit/agents/pull/691) ([@theomonnom](https://github.com/theomonnom)) + +- voiceassistant: skip speech initialization if interrupted - [#715](https://github.com/livekit/agents/pull/715) ([@theomonnom](https://github.com/theomonnom)) + +- bump required livekit version to 0.15.2 - [#722](https://github.com/livekit/agents/pull/722) ([@theomonnom](https://github.com/theomonnom)) + +- voiceassistant: add will_synthesize_assistant_speech - [#706](https://github.com/livekit/agents/pull/706) ([@theomonnom](https://github.com/theomonnom)) + +- voiceassistant: fix mark_audio_segment_end with no audio data - [#719](https://github.com/livekit/agents/pull/719) ([@theomonnom](https://github.com/theomonnom)) + +## 0.8.10 + +### Patch Changes + +- Pass JobContext to participant entrypoint function - [#694](https://github.com/livekit/agents/pull/694) ([@davidzhao](https://github.com/davidzhao)) + +- voiceassistant: keep punctuations when sending agent transcription - [#648](https://github.com/livekit/agents/pull/648) ([@theomonnom](https://github.com/theomonnom)) + +## 0.8.9 + +### Patch Changes + +- 
Introduce easy api for starting tasks for remote participants - [#679](https://github.com/livekit/agents/pull/679) ([@keepingitneil](https://github.com/keepingitneil)) + +- update livekit to 0.14.0 and await tracksubscribed - [#678](https://github.com/livekit/agents/pull/678) ([@nbsp](https://github.com/nbsp)) + +## 0.8.8 + +### Patch Changes + +- fix uninitialized SpeechHandle error on interruption - [#665](https://github.com/livekit/agents/pull/665) ([@theomonnom](https://github.com/theomonnom)) + +- voiceassistant: avoid stacking assistant replies when allow_interruptions=False - [#667](https://github.com/livekit/agents/pull/667) ([@theomonnom](https://github.com/theomonnom)) + +- fix: disconnect event may now have some arguments - [#668](https://github.com/livekit/agents/pull/668) ([@theomonnom](https://github.com/theomonnom)) + +- Add ServerMessage.termination handler - [#635](https://github.com/livekit/agents/pull/635) ([@nbsp](https://github.com/nbsp)) + +## 0.8.7 + +### Patch Changes + +- voiceassistant: fix llm not having the full chat context on bad interruption timing - [#659](https://github.com/livekit/agents/pull/659) ([@theomonnom](https://github.com/theomonnom)) + +## 0.8.6 + +### Patch Changes + +- voiceassistant: fix will_synthesize_assistant_reply race - [#638](https://github.com/livekit/agents/pull/638) ([@theomonnom](https://github.com/theomonnom)) + +- Switch Cartesia to a sentence tokenizer and keep the same context id throughout. 
- [#608](https://github.com/livekit/agents/pull/608) ([@keepingitneil](https://github.com/keepingitneil)) + Propagate segment_id through the basic sentence tokenizer + +- silero: adjust vad activation threshold - [#639](https://github.com/livekit/agents/pull/639) ([@theomonnom](https://github.com/theomonnom)) + +- limit simultaneous process initialization - [#621](https://github.com/livekit/agents/pull/621) ([@theomonnom](https://github.com/theomonnom)) + +- voiceassistant: remove fade effect when interrupting #622 - [#623](https://github.com/livekit/agents/pull/623) ([@theomonnom](https://github.com/theomonnom)) + +- ipc improvements, fix slow shutdown & cleanup leaked resources - [#607](https://github.com/livekit/agents/pull/607) ([@theomonnom](https://github.com/theomonnom)) + +- ipc: use our own duplex instead of mp.Queue - [#634](https://github.com/livekit/agents/pull/634) ([@theomonnom](https://github.com/theomonnom)) + +- Support OpenAI Assistants API as a beta feature under `livekit.plugins.openai.beta` - [#601](https://github.com/livekit/agents/pull/601) ([@keepingitneil](https://github.com/keepingitneil)) + Add \_metadata to ChatCtx and ChatMessage which can be used (in the case of OpenAI assistants) for bookkeeping to sync local state with remote OpenAI state + +- llm: fix optional arguments & non-hashable list - [#637](https://github.com/livekit/agents/pull/637) ([@theomonnom](https://github.com/theomonnom)) + +- silero: fix vad padding & static audio - [#631](https://github.com/livekit/agents/pull/631) ([@theomonnom](https://github.com/theomonnom)) + ## 0.8.5 ### Patch Changes diff --git a/livekit-agents/livekit/agents/__init__.py b/livekit-agents/livekit/agents/__init__.py index c3c168541..ce05f97f0 100644 --- a/livekit-agents/livekit/agents/__init__.py +++ b/livekit-agents/livekit/agents/__init__.py @@ -13,19 +13,25 @@ # limitations under the License. from . 
import ipc, llm, stt, tokenize, transcription, tts, utils, vad, voice_assistant -from .job import AutoSubscribe, JobContext, JobProcess, JobRequest +from .job import AutoSubscribe, JobContext, JobExecutorType, JobProcess, JobRequest from .plugin import Plugin +from .proto import ATTR_AGENT_STATE, AgentState from .version import __version__ -from .worker import Worker, WorkerOptions +from .worker import Worker, WorkerOptions, WorkerPermissions, WorkerType __all__ = [ "__version__", "Worker", "WorkerOptions", + "WorkerType", + "WorkerPermissions", "JobProcess", "JobContext", "JobRequest", + "JobExecutorType", "AutoSubscribe", + "AgentState", + "ATTR_AGENT_STATE", "Plugin", "ipc", "stt", diff --git a/livekit-agents/livekit/agents/cli/cli.py b/livekit-agents/livekit/agents/cli/cli.py index ed5c1a76b..54de0e712 100644 --- a/livekit-agents/livekit/agents/cli/cli.py +++ b/livekit-agents/livekit/agents/cli/cli.py @@ -1,5 +1,4 @@ import asyncio -import functools import pathlib import signal import sys @@ -15,7 +14,11 @@ from .log import setup_logging -def shared_args(func): +def run_app(opts: WorkerOptions) -> None: + """Run the CLI to interact with the worker""" + cli = click.Group() + + @cli.command(help="Start the worker in production mode.") @click.option( "--log-level", default="INFO", @@ -39,83 +42,6 @@ def shared_args(func): envvar="LIVEKIT_API_SECRET", help="LiveKit server or Cloud project's API secret", ) - @functools.wraps(func) - def wrapper(*args, **kwargs): - return func(*args, **kwargs) - - return wrapper - - -def shared_dev_args(func): - @click.option( - "--asyncio-debug/--no-asyncio-debug", - default=False, - help="Enable debugging feature of asyncio", - ) - @click.option( - "--watch/--no-watch", - default=True, - help="Watch for changes in the current directory and plugins in editable mode", - ) - @functools.wraps(func) - def wrapper(*args, **kwargs): - return func(*args, **kwargs) - - return wrapper - - -def _run_dev( - opts: WorkerOptions, - log_level: 
str, - url: str, - api_key: str, - api_secret: str, - asyncio_debug: bool, - watch: bool, - room: str = "", - participant_identity: str = "", -): - opts.ws_url = url or opts.ws_url - opts.api_key = api_key or opts.api_key - opts.api_secret = api_secret or opts.api_secret - args = proto.CliArgs( - opts=opts, - log_level=log_level, - production=False, - asyncio_debug=asyncio_debug, - watch=watch, - drain_timeout=0, - room=room, - participant_identity=participant_identity, - ) - - if watch: - from .watcher import WatchServer - - setup_logging(log_level, args.production) - main_file = pathlib.Path(sys.argv[0]).parent - - async def _run_loop(): - server = WatchServer( - run_worker, main_file, args, loop=asyncio.get_event_loop() - ) - await server.run() - - try: - asyncio.run(_run_loop()) - except KeyboardInterrupt: - pass - else: - run_worker(args) - - -def run_app(opts: WorkerOptions) -> None: - """Run the CLI to interact with the worker""" - - cli = click.Group() - - @cli.command(help="Start the worker in production mode.") - @shared_args @click.option( "--drain-timeout", default=60, @@ -130,7 +56,7 @@ def start( args = proto.CliArgs( opts=opts, log_level=log_level, - production=True, + devmode=False, asyncio_debug=False, watch=False, drain_timeout=drain_timeout, @@ -138,8 +64,39 @@ def start( run_worker(args) @cli.command(help="Start the worker in development mode") - @shared_args - @shared_dev_args + @click.option( + "--log-level", + default="DEBUG", + type=click.Choice( + ["DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL"], case_sensitive=False + ), + help="Set the logging level", + ) + @click.option( + "--url", + envvar="LIVEKIT_URL", + help="LiveKit server or Cloud project's websocket URL", + ) + @click.option( + "--api-key", + envvar="LIVEKIT_API_KEY", + help="LiveKit server or Cloud project's API key", + ) + @click.option( + "--api-secret", + envvar="LIVEKIT_API_SECRET", + help="LiveKit server or Cloud project's API secret", + ) + @click.option( + 
"--asyncio-debug/--no-asyncio-debug", + default=False, + help="Enable debugging feature of asyncio", + ) + @click.option( + "--watch/--no-watch", + default=True, + help="Watch for changes in the current directory and plugins in editable mode", + ) def dev( log_level: str, url: str, @@ -151,8 +108,39 @@ def dev( _run_dev(opts, log_level, url, api_key, api_secret, asyncio_debug, watch) @cli.command(help="Connect to a specific room") - @shared_args - @shared_dev_args + @click.option( + "--log-level", + default="DEBUG", + type=click.Choice( + ["DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL"], case_sensitive=False + ), + help="Set the logging level", + ) + @click.option( + "--url", + envvar="LIVEKIT_URL", + help="LiveKit server or Cloud project's websocket URL", + ) + @click.option( + "--api-key", + envvar="LIVEKIT_API_KEY", + help="LiveKit server or Cloud project's API key", + ) + @click.option( + "--api-secret", + envvar="LIVEKIT_API_SECRET", + help="LiveKit server or Cloud project's API secret", + ) + @click.option( + "--asyncio-debug/--no-asyncio-debug", + default=False, + help="Enable debugging feature of asyncio", + ) + @click.option( + "--watch/--no-watch", + default=True, + help="Watch for changes in the current directory and plugins in editable mode", + ) @click.option("--room", help="Room name to connect to", required=True) @click.option( "--participant-identity", help="Participant identity (JobType.JT_PUBLISHER)" @@ -179,10 +167,10 @@ def connect( participant_identity, ) - @cli.command(help="Download plugin dependency files (i.e. 
model weights, ...)") + @cli.command(help="Download plugin dependency files") @click.option( "--log-level", - default="INFO", + default="DEBUG", type=click.Choice( ["DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL"], case_sensitive=False ), @@ -199,14 +187,56 @@ def download_files(log_level: str) -> None: cli() -def run_worker(args: proto.CliArgs) -> None: - class Shutdown(SystemExit): - pass +def _run_dev( + opts: WorkerOptions, + log_level: str, + url: str, + api_key: str, + api_secret: str, + asyncio_debug: bool, + watch: bool, + room: str = "", + participant_identity: str = "", +): + opts.ws_url = url or opts.ws_url + opts.api_key = api_key or opts.api_key + opts.api_secret = api_secret or opts.api_secret + args = proto.CliArgs( + opts=opts, + log_level=log_level, + devmode=True, + asyncio_debug=asyncio_debug, + watch=watch, + drain_timeout=0, + room=room, + participant_identity=participant_identity, + ) + + if watch: + from .watcher import WatchServer + + setup_logging(log_level, args.devmode) + main_file = pathlib.Path(sys.argv[0]).parent - setup_logging(args.log_level, args.production) + async def _run_loop(): + server = WatchServer( + run_worker, main_file, args, loop=asyncio.get_event_loop() + ) + await server.run() + + try: + asyncio.run(_run_loop()) + except KeyboardInterrupt: + pass + else: + run_worker(args) + + +def run_worker(args: proto.CliArgs) -> None: + setup_logging(args.log_level, args.devmode) loop = asyncio.get_event_loop() - worker = Worker(args.opts, loop=loop) + worker = Worker(args.opts, devmode=args.devmode, loop=loop) loop.set_debug(args.asyncio_debug) loop.slow_callback_duration = 0.1 # 100ms @@ -222,7 +252,7 @@ def _connect_on_register(worker_id: str, server_info: models.ServerInfo): try: def _signal_handler(): - raise Shutdown + raise KeyboardInterrupt for sig in (signal.SIGINT, signal.SIGTERM): loop.add_signal_handler(sig, _signal_handler) @@ -249,16 +279,22 @@ async def _worker_run(worker: Worker) -> None: main_task = 
loop.create_task(_worker_run(worker), name="agent_runner") try: loop.run_until_complete(main_task) - except (Shutdown, KeyboardInterrupt): + except KeyboardInterrupt: pass - if args.production: - loop.run_until_complete(worker.drain(timeout=args.drain_timeout)) + try: + if not args.devmode: + loop.run_until_complete(worker.drain(timeout=args.drain_timeout)) - loop.run_until_complete(worker.aclose()) + loop.run_until_complete(worker.aclose()) + + if watch_client: + loop.run_until_complete(watch_client.aclose()) + except KeyboardInterrupt: + logger.warning("exiting forcefully") + import os - if watch_client: - loop.run_until_complete(watch_client.aclose()) + os._exit(1) # TODO(theomonnom): add aclose(force=True) in worker finally: try: tasks = asyncio.all_tasks(loop) diff --git a/livekit-agents/livekit/agents/cli/log.py b/livekit-agents/livekit/agents/cli/log.py index 520eb71ea..2223869a0 100644 --- a/livekit-agents/livekit/agents/cli/log.py +++ b/livekit-agents/livekit/agents/cli/log.py @@ -11,6 +11,15 @@ from ..plugin import Plugin +# noisy loggers are set to warn by default +NOISY_LOGGERS = [ + "httpx", + "httpcore", + "openai", + "livekit", + "watchfiles", +] + # skip default LogRecord attributes # http://docs.python.org/library/logging.html#logrecord-attributes _RESERVED_ATTRS: Tuple[str, ...] 
= ( @@ -92,6 +101,7 @@ def format(self, record: logging.LogRecord) -> str: """Formats a log record and serializes to json""" message_dict: Dict[str, Any] = {} message_dict["level"] = record.levelname + message_dict["name"] = record.name if isinstance(record.msg, dict): message_dict = record.msg @@ -180,10 +190,10 @@ def formatMessage(self, record: logging.LogRecord) -> str: return msg + self._esc_codes["esc_reset"] -def setup_logging(log_level: str, production: bool = True) -> None: +def setup_logging(log_level: str, devmode: bool) -> None: handler = logging.StreamHandler() - if not production: + if devmode: # colorful logs for dev (improves readability) colored_formatter = ColoredFormatter( "%(asctime)s - %(esc_levelcolor)s%(levelname)-4s%(esc_reset)s %(name)s - %(message)s %(extra)s" @@ -196,9 +206,12 @@ def setup_logging(log_level: str, production: bool = True) -> None: root = logging.getLogger() root.addHandler(handler) + root.setLevel(log_level) - if root.level == logging.NOTSET: - root.setLevel(logging.WARN) + for noisy_logger in NOISY_LOGGERS: + logger = logging.getLogger(noisy_logger) + if logger.level == logging.NOTSET: + logger.setLevel(logging.WARN) from ..log import logger diff --git a/livekit-agents/livekit/agents/cli/proto.py b/livekit-agents/livekit/agents/cli/proto.py index cc278445f..f03b5f669 100644 --- a/livekit-agents/livekit/agents/cli/proto.py +++ b/livekit-agents/livekit/agents/cli/proto.py @@ -16,7 +16,7 @@ class CliArgs: opts: WorkerOptions log_level: str - production: bool + devmode: bool asyncio_debug: bool watch: bool drain_timeout: int diff --git a/livekit-agents/livekit/agents/cli/watcher.py b/livekit-agents/livekit/agents/cli/watcher.py index acdf49480..1be355922 100644 --- a/livekit-agents/livekit/agents/cli/watcher.py +++ b/livekit-agents/livekit/agents/cli/watcher.py @@ -6,6 +6,7 @@ import pathlib import socket import urllib.parse +import urllib.request from importlib.metadata import Distribution, PackageNotFoundError from typing 
import Any, Callable, Set @@ -52,7 +53,9 @@ def _try_add(name: str) -> bool: path: str | None = durl_json.get("url") if path and path.startswith("file://"): parsed_url = urllib.parse.urlparse(path) - file_path = pathlib.Path(urllib.parse.unquote(parsed_url.path)) + file_url_path = urllib.parse.unquote(parsed_url.path) + local_path = urllib.request.url2pathname(file_url_path) + file_path = pathlib.Path(local_path) paths.append(file_path) return paths @@ -83,15 +86,18 @@ async def run(self) -> None: self._pch = await utils.aio.duplex_unix._AsyncDuplex.open(self._mp_pch) read_ipc_task = self._loop.create_task(self._read_ipc_task()) - await watchfiles.arun_process( - *watch_paths, - target=self._worker_runner, - args=(self._cli_args,), - watch_filter=watchfiles.filters.PythonFilter(), - callback=self._on_reload, - ) - await utils.aio.gracefully_cancel(read_ipc_task) + try: + await watchfiles.arun_process( + *watch_paths, + target=self._worker_runner, + args=(self._cli_args,), + watch_filter=watchfiles.filters.PythonFilter(), + callback=self._on_reload, + ) + finally: + await utils.aio.gracefully_cancel(read_ipc_task) + await self._pch.aclose() async def _on_reload(self, _: Set[watchfiles.main.FileChange]) -> None: if self._reloading_jobs: @@ -138,25 +144,28 @@ def start(self) -> None: @utils.log_exceptions(logger=logger) async def _run(self) -> None: - self._cch = await utils.aio.duplex_unix._AsyncDuplex.open(self._mp_cch) - - await channel.asend_message(self._cch, proto.ReloadJobsRequest()) - while True: - try: - msg = await channel.arecv_message(self._cch, proto.IPC_MESSAGES) - except utils.aio.duplex_unix.DuplexClosed: - break - - if isinstance(msg, proto.ActiveJobsRequest): - jobs = self._worker.active_jobs - - await channel.asend_message( - self._cch, proto.ActiveJobsResponse(jobs=jobs) - ) - elif isinstance(msg, proto.ReloadJobsResponse): - # TODO(theomonnom): wait for the worker to be fully initialized/connected - await self._worker._reload_jobs(msg.jobs) - 
await channel.asend_message(self._cch, proto.Reloaded()) + try: + self._cch = await utils.aio.duplex_unix._AsyncDuplex.open(self._mp_cch) + + await channel.asend_message(self._cch, proto.ReloadJobsRequest()) + while True: + try: + msg = await channel.arecv_message(self._cch, proto.IPC_MESSAGES) + except utils.aio.duplex_unix.DuplexClosed: + break + + if isinstance(msg, proto.ActiveJobsRequest): + jobs = self._worker.active_jobs + + await channel.asend_message( + self._cch, proto.ActiveJobsResponse(jobs=jobs) + ) + elif isinstance(msg, proto.ReloadJobsResponse): + # TODO(theomonnom): wait for the worker to be fully initialized/connected + await self._worker._reload_jobs(msg.jobs) + await channel.asend_message(self._cch, proto.Reloaded()) + except utils.aio.duplex_unix.DuplexClosed: + pass async def aclose(self) -> None: if not self._main_task: diff --git a/livekit-agents/livekit/agents/ipc/__init__.py b/livekit-agents/livekit/agents/ipc/__init__.py index de6c63381..ab04d6b5e 100644 --- a/livekit-agents/livekit/agents/ipc/__init__.py +++ b/livekit-agents/livekit/agents/ipc/__init__.py @@ -1,3 +1,17 @@ -from . import channel, proc_pool, proto, supervised_proc +from . import ( + channel, + job_executor, + proc_job_executor, + proc_pool, + proto, + thread_job_executor, +) -__all__ = ["proto", "channel", "proc_pool", "supervised_proc"] +__all__ = [ + "proto", + "channel", + "proc_pool", + "proc_job_executor", + "thread_job_executor", + "job_executor", +] diff --git a/livekit-agents/livekit/agents/ipc/job_executor.py b/livekit-agents/livekit/agents/ipc/job_executor.py new file mode 100644 index 000000000..8fe9b9848 --- /dev/null +++ b/livekit-agents/livekit/agents/ipc/job_executor.py @@ -0,0 +1,29 @@ +from __future__ import annotations + +from typing import Any, Protocol + +from ..job import RunningJobInfo + + +class JobExecutor(Protocol): + @property + def started(self) -> bool: ... + + @property + def start_arguments(self) -> Any | None: ... 
+ + @start_arguments.setter + def start_arguments(self, value: Any | None) -> None: ... + + @property + def running_job(self) -> RunningJobInfo | None: ... + + async def start(self) -> None: ... + + async def join(self) -> None: ... + + async def initialize(self) -> None: ... + + async def aclose(self) -> None: ... + + async def launch_job(self, info: RunningJobInfo) -> None: ... diff --git a/livekit-agents/livekit/agents/ipc/proc_main.py b/livekit-agents/livekit/agents/ipc/job_main.py similarity index 71% rename from livekit-agents/livekit/agents/ipc/proc_main.py rename to livekit-agents/livekit/agents/ipc/job_main.py index 8b1bb57c0..ff0dc54b9 100644 --- a/livekit-agents/livekit/agents/ipc/proc_main.py +++ b/livekit-agents/livekit/agents/ipc/job_main.py @@ -4,9 +4,12 @@ import contextlib import copy import logging -import multiprocessing as mp +import pickle +import queue import socket +import threading from dataclasses import dataclass +from typing import Any, Callable, Optional from livekit import rtc @@ -18,12 +21,33 @@ class LogQueueHandler(logging.Handler): - def __init__(self, queue: mp.Queue) -> None: + _sentinal = None + + def __init__(self, duplex: utils.aio.duplex_unix._Duplex) -> None: super().__init__() - self._q = queue + self._duplex = duplex + self._send_q = queue.SimpleQueue[Optional[bytes]]() + self._send_thread = threading.Thread( + target=self._forward_logs, name="ipc_log_forwarder" + ) + self._send_thread.start() + + def _forward_logs(self): + while True: + serialized_record = self._send_q.get() + if serialized_record is None: + break + + try: + self._duplex.send_bytes(serialized_record) + except duplex_unix.DuplexClosed: + break + + self._duplex.close() def emit(self, record: logging.LogRecord) -> None: try: + # from https://github.com/python/cpython/blob/91b7f2e7f6593acefda4fa860250dd87d6f849bf/Lib/logging/handlers.py#L1453 msg = self.format(record) record = copy.copy(record) record.message = msg @@ -31,10 +55,22 @@ def emit(self, record: 
logging.LogRecord) -> None: record.args = None record.exc_info = None record.exc_text = None - self._q.put_nowait(record) + record.stack_info = None + + # https://websockets.readthedocs.io/en/stable/topics/logging.html#logging-to-json + # the websockets library adds a "websocket" attribute to log records, which is not pickleable + if hasattr(record, "websocket"): + record.websocket = None + + self._send_q.put_nowait(pickle.dumps(record)) + except Exception: self.handleError(record) + def close(self) -> None: + super().close() + self._send_q.put_nowait(self._sentinal) + @dataclass class _ShutdownInfo: @@ -50,8 +86,8 @@ class JobTask: def _start_job( - args: proto.ProcStartArgs, proc: JobProcess, + job_entrypoint_fnc: Callable[[JobContext], Any], start_req: proto.StartJobRequest, exit_proc_fut: asyncio.Event, cch: utils.aio.duplex_unix._AsyncDuplex, @@ -82,6 +118,7 @@ def _on_ctx_shutdown(reason: str) -> None: ) info = start_req.running_job + room._info.name = info.job.room.name job_ctx = JobContext( proc=proc, info=info, @@ -94,7 +131,7 @@ def _on_ctx_shutdown(reason: str) -> None: async def _run_job_task() -> None: utils.http_context._new_session_ctx() job_entry_task = asyncio.create_task( - args.job_entrypoint_fnc(job_ctx), name="job_entrypoint" + job_entrypoint_fnc(job_ctx), name="job_entrypoint" ) async def _warn_not_connected_task(): @@ -152,7 +189,9 @@ def log_exception(t: asyncio.Task) -> None: async def _async_main( - args: proto.ProcStartArgs, proc: JobProcess, mp_cch: socket.socket + proc: JobProcess, + job_entrypoint_fnc: Callable[[JobContext], Any], + mp_cch: socket.socket, ) -> None: cch = await duplex_unix._AsyncDuplex.open(mp_cch) @@ -165,7 +204,8 @@ async def _read_ipc_task(): nonlocal job_task while True: msg = await channel.arecv_message(cch, proto.IPC_MESSAGES) - no_msg_timeout.reset() + with contextlib.suppress(utils.aio.SleepFinished): + no_msg_timeout.reset() if isinstance(msg, proto.PingRequest): pong = proto.PongResponse( @@ -175,7 +215,7 @@ async 
def _read_ipc_task(): if isinstance(msg, proto.StartJobRequest): assert job_task is None, "job task already running" - job_task = _start_job(args, proc, msg, exit_proc_fut, cch) + job_task = _start_job(proc, job_entrypoint_fnc, msg, exit_proc_fut, cch) if isinstance(msg, proto.ShutdownRequest): if job_task is None: @@ -209,48 +249,58 @@ def _done_cb(task: asyncio.Task) -> None: await cch.aclose() -def main(args: proto.ProcStartArgs) -> None: - root_logger = logging.getLogger() - root_logger.setLevel(logging.NOTSET) +@dataclass +class ProcStartArgs: + initialize_process_fnc: Callable[[JobProcess], Any] + job_entrypoint_fnc: Callable[[JobContext], Any] + log_cch: socket.socket + mp_cch: socket.socket + asyncio_debug: bool + user_arguments: Any | None = None + + +@dataclass +class ThreadStartArgs: + mp_cch: socket.socket + initialize_process_fnc: Callable[[JobProcess], Any] + job_entrypoint_fnc: Callable[[JobContext], Any] + user_arguments: Any | None + asyncio_debug: bool + join_fnc: Callable[[], None] - log_q = args.log_q - log_q.cancel_join_thread() - log_handler = LogQueueHandler(log_q) - root_logger.addHandler(log_handler) +def thread_main( + args: ThreadStartArgs, +) -> None: + """main function for the job process when using the ThreadedJobRunner""" + tid = threading.get_native_id() loop = asyncio.new_event_loop() asyncio.set_event_loop(loop) loop.set_debug(args.asyncio_debug) loop.slow_callback_duration = 0.1 # 100ms - utils.aio.debug.hook_slow_callbacks(2.0) cch = duplex_unix._Duplex.open(args.mp_cch) try: init_req = channel.recv_message(cch, proto.IPC_MESSAGES) - assert isinstance( init_req, proto.InitializeRequest ), "first message must be InitializeRequest" - job_proc = JobProcess(start_arguments=args.user_arguments) - logger.debug("initializing process", extra={"pid": job_proc.pid}) + + logger.debug("initializing job runner", extra={"tid": tid}) args.initialize_process_fnc(job_proc) - logger.debug("process initialized", extra={"pid": job_proc.pid}) + 
logger.debug("job runner initialized", extra={"tid": tid}) channel.send_message(cch, proto.InitializeResponse()) main_task = loop.create_task( - _async_main(args, job_proc, cch.detach()), name="job_proc_main" + _async_main(job_proc, args.job_entrypoint_fnc, cch.detach()), + name="job_proc_main", ) - while not main_task.done(): - try: - loop.run_until_complete(main_task) - except KeyboardInterrupt: - # ignore the keyboard interrupt, we handle the process shutdown ourselves on the worker process - pass + loop.run_until_complete(main_task) except duplex_unix.DuplexClosed: pass + except Exception: + logger.exception("error while running job process", extra={"tid": tid}) finally: - log_handler.close() - log_q.close() - cch.close() + args.join_fnc() loop.run_until_complete(loop.shutdown_default_executor()) diff --git a/livekit-agents/livekit/agents/ipc/supervised_proc.py b/livekit-agents/livekit/agents/ipc/proc_job_executor.py similarity index 89% rename from livekit-agents/livekit/agents/ipc/supervised_proc.py rename to livekit-agents/livekit/agents/ipc/proc_job_executor.py index 1cb4ae7da..f5f846130 100644 --- a/livekit-agents/livekit/agents/ipc/supervised_proc.py +++ b/livekit-agents/livekit/agents/ipc/proc_job_executor.py @@ -3,41 +3,40 @@ import asyncio import contextlib import logging -import multiprocessing as mp +import pickle import socket import sys import threading from dataclasses import dataclass from multiprocessing.context import BaseContext -from typing import Any, Callable, Coroutine +from typing import Any, Awaitable, Callable from .. import utils from ..job import JobContext, JobProcess, RunningJobInfo from ..log import logger from ..utils.aio import duplex_unix -from . import channel, proc_main, proto +from . 
import channel, job_main, proc_lazy_main, proto class LogQueueListener: - _sentinel = None - def __init__( - self, queue: mp.Queue, prepare_fnc: Callable[[logging.LogRecord], None] + self, + duplex: utils.aio.duplex_unix._Duplex, + prepare_fnc: Callable[[logging.LogRecord], None], ): self._thread: threading.Thread | None = None - self._q = queue + self._duplex = duplex self._prepare_fnc = prepare_fnc def start(self) -> None: - self._thread = t = threading.Thread( - target=self._monitor, daemon=True, name="log_listener" - ) - t.start() + self._thread = threading.Thread(target=self._monitor, name="ipc_log_listener") + self._thread.start() def stop(self) -> None: if self._thread is None: return - self._q.put_nowait(self._sentinel) + + self._duplex.close() self._thread.join() self._thread = None @@ -52,28 +51,30 @@ def handle(self, record: logging.LogRecord) -> None: def _monitor(self): while True: - record = self._q.get() - if record is self._sentinel: + try: + data = self._duplex.recv_bytes() + except utils.aio.duplex_unix.DuplexClosed: break + record = pickle.loads(data) self.handle(record) @dataclass class _ProcOpts: initialize_process_fnc: Callable[[JobProcess], Any] - job_entrypoint_fnc: Callable[[JobContext], Coroutine] + job_entrypoint_fnc: Callable[[JobContext], Awaitable[None]] mp_ctx: BaseContext initialize_timeout: float close_timeout: float -class SupervisedProc: +class ProcJobExecutor: def __init__( self, *, initialize_process_fnc: Callable[[JobProcess], Any], - job_entrypoint_fnc: Callable[[JobContext], Coroutine], + job_entrypoint_fnc: Callable[[JobContext], Awaitable[None]], initialize_timeout: float, close_timeout: float, mp_ctx: BaseContext, @@ -145,29 +146,32 @@ def _add_proc_ctx_log(record: logging.LogRecord) -> None: setattr(record, key, value) async with self._lock: - log_q = self._opts.mp_ctx.Queue() - log_q.cancel_join_thread() - mp_pch, mp_cch = socket.socketpair() + mp_log_pch, mp_log_cch = socket.socketpair() self._pch = await 
duplex_unix._AsyncDuplex.open(mp_pch) - log_listener = LogQueueListener(log_q, _add_proc_ctx_log) + + log_pch = duplex_unix._Duplex.open(mp_log_pch) + log_listener = LogQueueListener(log_pch, _add_proc_ctx_log) log_listener.start() - self._proc_args = proto.ProcStartArgs( + self._proc_args = job_main.ProcStartArgs( initialize_process_fnc=self._opts.initialize_process_fnc, job_entrypoint_fnc=self._opts.job_entrypoint_fnc, - log_q=log_q, + log_cch=mp_log_cch, mp_cch=mp_cch, asyncio_debug=self._loop.get_debug(), user_arguments=self._user_args, ) self._proc = self._opts.mp_ctx.Process( # type: ignore - target=proc_main.main, args=(self._proc_args,), name="job_proc" + target=proc_lazy_main.proc_main, + args=(self._proc_args,), + name="job_proc", ) self._proc.start() + mp_log_cch.close() mp_cch.close() self._pid = self._proc.pid @@ -176,7 +180,6 @@ def _add_proc_ctx_log(record: logging.LogRecord) -> None: def _sync_run(): self._proc.join() log_listener.stop() - log_q.close() try: self._loop.call_soon_threadsafe(self._join_fut.set_result, None) except RuntimeError: @@ -278,7 +281,7 @@ def _send_kill_signal(self) -> None: except ValueError: return - logger.debug("killing job process", extra=self.logging_extra()) + logger.info("killing job process", extra=self.logging_extra()) if sys.platform == "win32": self._proc.terminate() else: @@ -334,7 +337,7 @@ async def _monitor_task(self, pong_timeout: utils.aio.Sleep) -> None: pong_timeout.reset() if isinstance(msg, proto.Exiting): - logger.debug( + logger.info( "job exiting", extra={"reason": msg.reason, **self.logging_extra()} ) @@ -366,8 +369,8 @@ async def _pong_timeout_co(): finally: await utils.aio.gracefully_cancel(*tasks) - def logging_extra(self) -> dict: - extra: dict = { + def logging_extra(self): + extra: dict[str, Any] = { "pid": self.pid, } if self._running_job: diff --git a/livekit-agents/livekit/agents/ipc/proc_lazy_main.py b/livekit-agents/livekit/agents/ipc/proc_lazy_main.py new file mode 100644 index 
000000000..be09e7f5a --- /dev/null +++ b/livekit-agents/livekit/agents/ipc/proc_lazy_main.py @@ -0,0 +1,72 @@ +import multiprocessing + +if multiprocessing.current_process().name == "job_proc": + import signal + import sys + + # ignore signals in the jobs process (the parent process will handle them) + signal.signal(signal.SIGINT, signal.SIG_IGN) + signal.signal(signal.SIGTERM, signal.SIG_IGN) + + def _no_traceback_excepthook(exc_type, exc_val, traceback): + if isinstance(exc_val, KeyboardInterrupt): + return + sys.__excepthook__(exc_type, exc_val, traceback) + + sys.excepthook = _no_traceback_excepthook + + +def proc_main(args) -> None: + """main function for the job process when using the ProcessJobRunner""" + + # import every package lazily + import asyncio + import logging + + from .. import utils + from ..job import JobProcess + from ..log import logger + from . import channel, job_main, proto + + root_logger = logging.getLogger() + root_logger.setLevel(logging.NOTSET) + + log_cch = utils.aio.duplex_unix._Duplex.open(args.log_cch) + log_handler = job_main.LogQueueHandler(log_cch) + root_logger.addHandler(log_handler) + + loop = asyncio.new_event_loop() + asyncio.set_event_loop(loop) + loop.set_debug(args.asyncio_debug) + loop.slow_callback_duration = 0.1 # 100ms + utils.aio.debug.hook_slow_callbacks(2.0) + + cch = utils.aio.duplex_unix._Duplex.open(args.mp_cch) + try: + init_req = channel.recv_message(cch, proto.IPC_MESSAGES) + + assert isinstance( + init_req, proto.InitializeRequest + ), "first message must be InitializeRequest" + + job_proc = JobProcess(start_arguments=args.user_arguments) + logger.info("initializing process", extra={"pid": job_proc.pid}) + args.initialize_process_fnc(job_proc) + logger.info("process initialized", extra={"pid": job_proc.pid}) + channel.send_message(cch, proto.InitializeResponse()) + + main_task = loop.create_task( + job_main._async_main(job_proc, args.job_entrypoint_fnc, cch.detach()), + name="job_proc_main", + ) + while not 
main_task.done(): + try: + loop.run_until_complete(main_task) + except KeyboardInterrupt: + # ignore the keyboard interrupt, we handle the process shutdown ourselves on the worker process + pass + except (utils.aio.duplex_unix.DuplexClosed, KeyboardInterrupt): + pass + finally: + log_handler.close() + loop.run_until_complete(loop.shutdown_default_executor()) diff --git a/livekit-agents/livekit/agents/ipc/proc_pool.py b/livekit-agents/livekit/agents/ipc/proc_pool.py index e281aed96..307227876 100644 --- a/livekit-agents/livekit/agents/ipc/proc_pool.py +++ b/livekit-agents/livekit/agents/ipc/proc_pool.py @@ -2,13 +2,14 @@ import asyncio from multiprocessing.context import BaseContext -from typing import Any, Callable, Coroutine, Literal +from typing import Any, Awaitable, Callable, Literal from .. import utils -from ..job import JobContext, JobProcess, RunningJobInfo +from ..job import JobContext, JobExecutorType, JobProcess, RunningJobInfo from ..log import logger from ..utils import aio -from .supervised_proc import SupervisedProc +from . 
import proc_job_executor, thread_job_executor +from .job_executor import JobExecutor EventTypes = Literal[ "process_created", "process_started", "process_ready", "process_closed" @@ -22,14 +23,16 @@ def __init__( self, *, initialize_process_fnc: Callable[[JobProcess], Any], - job_entrypoint_fnc: Callable[[JobContext], Coroutine], + job_entrypoint_fnc: Callable[[JobContext], Awaitable[None]], num_idle_processes: int, initialize_timeout: float, close_timeout: float, + job_executor_type: JobExecutorType, mp_ctx: BaseContext, loop: asyncio.AbstractEventLoop, ) -> None: super().__init__() + self._job_executor_type = job_executor_type self._mp_ctx = mp_ctx self._initialize_process_fnc = initialize_process_fnc self._job_entrypoint_fnc = job_entrypoint_fnc @@ -37,16 +40,27 @@ def __init__( self._initialize_timeout = initialize_timeout self._loop = loop + self._num_idle_processes = num_idle_processes self._init_sem = asyncio.Semaphore(MAX_CONCURRENT_INITIALIZATIONS) self._proc_needed_sem = asyncio.Semaphore(num_idle_processes) - self._warmed_proc_queue = asyncio.Queue[SupervisedProc]() - self._processes: list[SupervisedProc] = [] + self._warmed_proc_queue = asyncio.Queue[JobExecutor]() + self._executors: list[JobExecutor] = [] self._started = False self._closed = False @property - def processes(self) -> list[SupervisedProc]: - return self._processes + def processes(self) -> list[JobExecutor]: + return self._executors + + def get_by_job_id(self, job_id: str) -> JobExecutor | None: + return next( + ( + x + for x in self._executors + if x.running_job and x.running_job.job.id == job_id + ), + None, + ) def start(self) -> None: if self._started: @@ -63,22 +77,40 @@ async def aclose(self) -> None: await aio.gracefully_cancel(self._main_atask) async def launch_job(self, info: RunningJobInfo) -> None: - proc = await self._warmed_proc_queue.get() - self._proc_needed_sem.release() # notify that a new process needs to be warmed/started + if self._num_idle_processes == 0: + 
self._proc_needed_sem.release()  # ask for a process if prewarmed processes are disabled
+            proc = await self._warmed_proc_queue.get()
+        else:
+            proc = await self._warmed_proc_queue.get()
+            self._proc_needed_sem.release()  # notify that a new process can be warmed/started
+
         await proc.launch_job(info)
 
     @utils.log_exceptions(logger=logger)
     async def _proc_watch_task(self) -> None:
-        proc = SupervisedProc(
-            initialize_process_fnc=self._initialize_process_fnc,
-            job_entrypoint_fnc=self._job_entrypoint_fnc,
-            initialize_timeout=self._initialize_timeout,
-            close_timeout=self._close_timeout,
-            mp_ctx=self._mp_ctx,
-            loop=self._loop,
-        )
+        proc: JobExecutor
+        if self._job_executor_type == JobExecutorType.THREAD:
+            proc = thread_job_executor.ThreadJobExecutor(
+                initialize_process_fnc=self._initialize_process_fnc,
+                job_entrypoint_fnc=self._job_entrypoint_fnc,
+                initialize_timeout=self._initialize_timeout,
+                close_timeout=self._close_timeout,
+                loop=self._loop,
+            )
+        elif self._job_executor_type == JobExecutorType.PROCESS:
+            proc = proc_job_executor.ProcJobExecutor(
+                initialize_process_fnc=self._initialize_process_fnc,
+                job_entrypoint_fnc=self._job_entrypoint_fnc,
+                initialize_timeout=self._initialize_timeout,
+                close_timeout=self._close_timeout,
+                mp_ctx=self._mp_ctx,
+                loop=self._loop,
+            )
+        else:
+            raise ValueError(f"unsupported job executor: {self._job_executor_type}")
+
         try:
-            self._processes.append(proc)
+            self._executors.append(proc)
 
             async with self._init_sem:
                 if self._closed:
@@ -99,11 +131,11 @@ async def _proc_watch_task(self) -> None:
             await proc.join()
             self.emit("process_closed", proc)
         finally:
-            self._processes.remove(proc)
+            self._executors.remove(proc)
 
     @utils.log_exceptions(logger=logger)
     async def _main_task(self) -> None:
-        watch_tasks = []
+        watch_tasks: list[asyncio.Task[None]] = []
        try:
            while True:
                await self._proc_needed_sem.acquire()
@@ -111,5 +143,5 @@ async def _main_task(self) -> None:
                watch_tasks.append(task)
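The `launch_job`/`_proc_watch_task` pair above implements a warm pool: `_proc_needed_sem` counts how many executors the watcher should prepare, and `_warmed_proc_queue` hands a warmed executor to each incoming job, which then releases the semaphore so a replacement gets warmed. A self-contained asyncio sketch of that pattern — the names are illustrative stand-ins, with strings playing the role of executors, not the livekit-agents API:

```python
import asyncio


async def main() -> list[tuple[str, str]]:
    num_idle = 2
    proc_needed = asyncio.Semaphore(num_idle)  # how many executors to warm
    warmed: asyncio.Queue[str] = asyncio.Queue()  # executors ready for a job
    created = 0
    served: list[tuple[str, str]] = []

    async def watcher() -> None:
        nonlocal created
        while True:
            await proc_needed.acquire()  # wait until a new executor is needed
            created += 1
            await warmed.put(f"proc-{created}")  # "warm up" an executor

    async def launch_job(job: str) -> None:
        proc = await warmed.get()  # take a warmed executor
        proc_needed.release()  # ask the watcher to warm a replacement
        served.append((job, proc))

    watch = asyncio.create_task(watcher())
    for i in range(3):
        await launch_job(f"job-{i}")
    watch.cancel()
    return served


served = asyncio.run(main())
print(served)
```

Releasing the semaphore only after taking an executor keeps the number of idle, pre-initialized executors roughly constant, so a job never pays initialization latency as long as the watcher keeps up.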
task.add_done_callback(watch_tasks.remove) except asyncio.CancelledError: - await asyncio.gather(*[proc.aclose() for proc in self._processes]) + await asyncio.gather(*[proc.aclose() for proc in self._executors]) await asyncio.gather(*watch_tasks) diff --git a/livekit-agents/livekit/agents/ipc/proto.py b/livekit-agents/livekit/agents/ipc/proto.py index 9e8567ffe..7dd7c29e3 100644 --- a/livekit-agents/livekit/agents/ipc/proto.py +++ b/livekit-agents/livekit/agents/ipc/proto.py @@ -1,14 +1,12 @@ from __future__ import annotations import io -import multiprocessing as mp -import socket from dataclasses import dataclass, field -from typing import Any, Callable, ClassVar, Coroutine +from typing import ClassVar from livekit.protocol import agent -from ..job import JobAcceptArguments, JobContext, JobProcess, RunningJobInfo +from ..job import JobAcceptArguments, RunningJobInfo from . import channel PING_INTERVAL = 2.5 @@ -17,16 +15,6 @@ NO_MESSAGE_TIMEOUT = 15.0 -@dataclass -class ProcStartArgs: - initialize_process_fnc: Callable[[JobProcess], Any] - job_entrypoint_fnc: Callable[[JobContext], Coroutine] - log_q: mp.Queue - mp_cch: socket.socket - asyncio_debug: bool - user_arguments: Any | None = None - - @dataclass class InitializeRequest: """sent by the main process to the subprocess to initialize it. this is going to call initialize_process_fnc""" diff --git a/livekit-agents/livekit/agents/ipc/thread_job_executor.py b/livekit-agents/livekit/agents/ipc/thread_job_executor.py new file mode 100644 index 000000000..99e75f74c --- /dev/null +++ b/livekit-agents/livekit/agents/ipc/thread_job_executor.py @@ -0,0 +1,256 @@ +from __future__ import annotations + +import asyncio +import contextlib +import socket +import threading +from dataclasses import dataclass +from typing import Any, Awaitable, Callable + +from .. import utils +from ..job import JobContext, JobProcess, RunningJobInfo +from ..log import logger +from ..utils.aio import duplex_unix +from . 
import channel, job_main, proto + + +@dataclass +class _ProcOpts: + initialize_process_fnc: Callable[[JobProcess], Any] + job_entrypoint_fnc: Callable[[JobContext], Awaitable[None]] + initialize_timeout: float + close_timeout: float + + +class ThreadJobExecutor: + def __init__( + self, + *, + initialize_process_fnc: Callable[[JobProcess], Any], + job_entrypoint_fnc: Callable[[JobContext], Awaitable[None]], + initialize_timeout: float, + close_timeout: float, + loop: asyncio.AbstractEventLoop, + ) -> None: + self._loop = loop + self._opts = _ProcOpts( + initialize_process_fnc=initialize_process_fnc, + job_entrypoint_fnc=job_entrypoint_fnc, + initialize_timeout=initialize_timeout, + close_timeout=close_timeout, + ) + + self._user_args: Any | None = None + self._running_job: RunningJobInfo | None = None + + self._main_atask: asyncio.Task[None] | None = None + self._closing = False + self._initialize_fut = asyncio.Future[None]() + + self._lock = asyncio.Lock() + + @property + def started(self) -> bool: + return self._main_atask is not None + + @property + def start_arguments(self) -> Any | None: + return self._user_args + + @start_arguments.setter + def start_arguments(self, value: Any | None) -> None: + self._user_args = value + + @property + def running_job(self) -> RunningJobInfo | None: + return self._running_job + + async def start(self) -> None: + if self.started: + raise RuntimeError("runner already started") + + if self._closing: + raise RuntimeError("runner is closed") + + await asyncio.shield(self._start()) + + async def _start(self) -> None: + async with self._lock: + # to simplify the runners implementation, we also use a duplex in the threaded executor + # (ThreadedRunners), so we can use the same protocol + mp_pch, mp_cch = socket.socketpair() + self._pch = await duplex_unix._AsyncDuplex.open(mp_pch) + + self._join_fut = asyncio.Future[None]() + + def _on_join() -> None: + with contextlib.suppress(RuntimeError): + 
self._loop.call_soon_threadsafe(self._join_fut.set_result, None) + + targs = job_main.ThreadStartArgs( + mp_cch=mp_cch, + initialize_process_fnc=self._opts.initialize_process_fnc, + job_entrypoint_fnc=self._opts.job_entrypoint_fnc, + user_arguments=self._user_args, + asyncio_debug=self._loop.get_debug(), + join_fnc=_on_join, + ) + + self._thread = t = threading.Thread( + target=job_main.thread_main, + args=(targs,), + name="job_thread_runner", + ) + t.start() + + self._main_atask = asyncio.create_task(self._main_task()) + + async def join(self) -> None: + """wait for the thread to finish""" + if not self.started: + raise RuntimeError("runner not started") + + async with self._lock: + if self._main_atask: + await asyncio.shield(self._main_atask) + + async def initialize(self) -> None: + await channel.asend_message(self._pch, proto.InitializeRequest()) + + try: + init_res = await asyncio.wait_for( + channel.arecv_message(self._pch, proto.IPC_MESSAGES), + timeout=self._opts.initialize_timeout, + ) + assert isinstance( + init_res, proto.InitializeResponse + ), "first message must be InitializeResponse" + except asyncio.TimeoutError: + self._initialize_fut.set_exception( + asyncio.TimeoutError("runner initialization timed out") + ) + logger.error( + "job initialization is taking too much time..", + extra=self.logging_extra(), + ) + raise + except Exception as e: # should be channel.ChannelClosed most of the time + self._initialize_fut.set_exception(e) + raise + else: + self._initialize_fut.set_result(None) + + async def aclose(self) -> None: + """ + attempt to gracefully close the job. 
warn if it takes too long to close + (in the threaded executor, the job can't be "killed") + """ + if not self.started: + return + + self._closing = True + with contextlib.suppress(utils.aio.duplex_unix.DuplexClosed): + await channel.asend_message(self._pch, proto.ShutdownRequest()) + + try: + if self._main_atask: + await asyncio.wait_for( + asyncio.shield(self._main_atask), timeout=self._opts.close_timeout + ) + except asyncio.TimeoutError: + logger.error( + "job shutdown is taking too much time..", extra=self.logging_extra() + ) + + async with self._lock: + if self._main_atask: + await asyncio.shield(self._main_atask) + + async def launch_job(self, info: RunningJobInfo) -> None: + """start/assign a job to the executor""" + if self._running_job is not None: + raise RuntimeError("executor already has a running job") + + self._running_job = info + start_req = proto.StartJobRequest() + start_req.running_job = info + await channel.asend_message(self._pch, start_req) + + @utils.log_exceptions(logger=logger) + async def _main_task(self) -> None: + try: + await self._initialize_fut + except asyncio.TimeoutError: + pass # this happens when the initialization takes longer than self._initialize_timeout + except Exception: + pass # initialization failed + + pong_timeout = utils.aio.sleep(proto.PING_TIMEOUT) + ping_task = asyncio.create_task(self._ping_pong_task(pong_timeout)) + monitor_task = asyncio.create_task(self._monitor_task(pong_timeout)) + + await self._join_fut + await utils.aio.gracefully_cancel(ping_task, monitor_task) + + with contextlib.suppress(duplex_unix.DuplexClosed): + await self._pch.aclose() + + @utils.log_exceptions(logger=logger) + async def _monitor_task(self, pong_timeout: utils.aio.Sleep) -> None: + while True: + try: + msg = await channel.arecv_message(self._pch, proto.IPC_MESSAGES) + except utils.aio.duplex_unix.DuplexClosed: + break + + if isinstance(msg, proto.PongResponse): + delay = utils.time_ms() - msg.timestamp + if delay > 
proto.HIGH_PING_THRESHOLD * 1000: + logger.warning( + "job executor is unresponsive", + extra={"delay": delay, **self.logging_extra()}, + ) + + with contextlib.suppress(utils.aio.SleepFinished): + pong_timeout.reset() + + if isinstance(msg, proto.Exiting): + logger.debug( + "job exiting", extra={"reason": msg.reason, **self.logging_extra()} + ) + + @utils.log_exceptions(logger=logger) + async def _ping_pong_task(self, pong_timeout: utils.aio.Sleep) -> None: + ping_interval = utils.aio.interval(proto.PING_INTERVAL) + + async def _send_ping_co(): + while True: + await ping_interval.tick() + try: + await channel.asend_message( + self._pch, proto.PingRequest(timestamp=utils.time_ms()) + ) + except utils.aio.duplex_unix.DuplexClosed: + break + + async def _pong_timeout_co(): + await pong_timeout + logger.error("job is unresponsive..", extra=self.logging_extra()) + + tasks = [ + asyncio.create_task(_send_ping_co()), + asyncio.create_task(_pong_timeout_co()), + ] + try: + await asyncio.gather(*tasks) + finally: + await utils.aio.gracefully_cancel(*tasks) + + def logging_extra(self): + extra: dict[str, Any] = { + "tid": self._thread.native_id, + } + if self._running_job: + extra["job_id"] = self._running_job.job.id + + return extra diff --git a/livekit-agents/livekit/agents/job.py b/livekit-agents/livekit/agents/job.py index 6d66abdd8..19574b71b 100644 --- a/livekit-agents/livekit/agents/job.py +++ b/livekit-agents/livekit/agents/job.py @@ -17,12 +17,20 @@ import asyncio import multiprocessing as mp from dataclasses import dataclass -from enum import Enum -from typing import Any, Callable, Coroutine +from enum import Enum, unique +from typing import Any, Callable, Coroutine, Tuple from livekit import rtc from livekit.protocol import agent, models +from .log import logger + + +@unique +class JobExecutorType(Enum): + PROCESS = "process" + THREAD = "thread" + class AutoSubscribe(str, Enum): SUBSCRIBE_ALL = "subscribe_all" @@ -61,27 +69,74 @@ def __init__( self._room = room 
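The `_monitor_task` above estimates executor lag by subtracting the timestamp echoed back in `PongResponse` from the supervisor's own clock and comparing it against `HIGH_PING_THRESHOLD`. A minimal sketch of that check — the threshold value here is illustrative, not the library's constant:

```python
import time

HIGH_PING_THRESHOLD = 0.05  # seconds; illustrative value for this sketch


def time_ms() -> int:
    return int(time.time() * 1000)


def pong_delay_ms(ping_timestamp_ms: int, now_ms: int) -> int:
    # the executor echoes the ping's timestamp back in its pong;
    # the supervisor subtracts it from its own clock to estimate lag
    return now_ms - ping_timestamp_ms


def is_unresponsive(ping_timestamp_ms: int, now_ms: int) -> bool:
    return pong_delay_ms(ping_timestamp_ms, now_ms) > HIGH_PING_THRESHOLD * 1000


sent = time_ms()
print(is_unresponsive(sent, sent + 5))    # 5 ms lag: healthy -> False
print(is_unresponsive(sent, sent + 500))  # 500 ms lag: flagged -> True
```

Because both timestamps come from the same process clock in the supervisor, this measures round-trip responsiveness of the executor's event loop rather than requiring synchronized clocks across processes.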
self._on_connect = on_connect self._on_shutdown = on_shutdown - self._shutdown_callbacks: list[Callable[[], Coroutine]] = [] + self._shutdown_callbacks: list[Callable[[], Coroutine[None, None, None]]] = [] + self._participant_entrypoints: list[ + Callable[[JobContext, rtc.RemoteParticipant], Coroutine[None, None, None]] + ] = [] + self._participant_tasks = dict[Tuple[str, Callable], asyncio.Task[None]]() + self._room.on("participant_connected", self._participant_available) @property def proc(self) -> JobProcess: + """Returns the process running the job. Useful for storing process-specific state.""" return self._proc @property def job(self) -> agent.Job: + """Returns the current job that the worker is executing.""" return self._info.job @property def room(self) -> rtc.Room: + """The Room object is the main interface that the worker should interact with. + + When the entrypoint is called, the worker has not connected to the Room yet. + Certain properties of Room would not be available before calling JobContext.connect() + """ return self._room @property def agent(self) -> rtc.LocalParticipant: return self._room.local_participant - def add_shutdown_callback(self, callback: Callable[[], Coroutine]) -> None: + def add_shutdown_callback( + self, callback: Callable[[], Coroutine[None, None, None]] + ) -> None: self._shutdown_callbacks.append(callback) + async def wait_for_participant( + self, *, identity: str | None = None + ) -> rtc.RemoteParticipant: + """ + Returns a participant that matches the given identity. If identity is None, the first + participant that joins the room will be returned. + If the participant has already joined, the function will return immediately. 
+ """ + if not self._room.isconnected(): + raise RuntimeError("room is not connected") + + fut = asyncio.Future[rtc.RemoteParticipant]() + + for p in self._room.remote_participants.values(): + if ( + identity is None or p.identity == identity + ) and p.kind != rtc.ParticipantKind.PARTICIPANT_KIND_AGENT: + fut.set_result(p) + break + + def _on_participant_connected(p: rtc.RemoteParticipant): + if ( + identity is None or p.identity == identity + ) and p.kind != rtc.ParticipantKind.PARTICIPANT_KIND_AGENT: + self._room.off("participant_connected", _on_participant_connected) + if not fut.done(): + fut.set_result(p) + + if not fut.done(): + self._room.on("participant_connected", _on_participant_connected) + + return await fut + async def connect( self, *, @@ -89,6 +144,13 @@ async def connect( auto_subscribe: AutoSubscribe = AutoSubscribe.SUBSCRIBE_ALL, rtc_config: rtc.RtcConfiguration | None = None, ) -> None: + """Connect to the room. This method should be called only once. + + Args: + e2ee: End-to-end encryption options. If provided, the Agent will utilize end-to-end encryption. Note: clients will also need to handle E2EE. + auto_subscribe: Whether to automatically subscribe to tracks. Default is AutoSubscribe.SUBSCRIBE_ALL. + rtc_config: Custom RTC configuration to use when connecting to the room. 
+ """ room_options = rtc.RoomOptions( e2ee=e2ee, auto_subscribe=auto_subscribe == AutoSubscribe.SUBSCRIBE_ALL, @@ -97,12 +159,43 @@ async def connect( await self._room.connect(self._info.url, self._info.token, options=room_options) self._on_connect() + for p in self._room.remote_participants.values(): + self._participant_available(p) _apply_auto_subscribe_opts(self._room, auto_subscribe) def shutdown(self, reason: str = "") -> None: self._on_shutdown(reason) + def add_participant_entrypoint( + self, + entrypoint_fnc: Callable[ + [JobContext, rtc.RemoteParticipant], Coroutine[None, None, None] + ], + ): + """Adds an entrypoint function to be run when a participant joins the room. In cases where + the participant has already joined, the entrypoint will be run immediately. Multiple unique entrypoints can be + added and they will each be run in parallel for each participant. + """ + + if entrypoint_fnc in self._participant_entrypoints: + raise ValueError("entrypoints cannot be added more than once") + + self._participant_entrypoints.append(entrypoint_fnc) + + def _participant_available(self, p: rtc.RemoteParticipant) -> None: + for coro in self._participant_entrypoints: + if (p.identity, coro) in self._participant_tasks: + logger.warning( + f"a participant has joined before a prior participant task matching the same identity has finished: '{p.identity}'" + ) + task_name = f"part-entry-{p.identity}-{coro.__name__}" + task = asyncio.create_task(coro(self, p), name=task_name) + self._participant_tasks[(p.identity, coro)] = task + task.add_done_callback( + lambda _: self._participant_tasks.pop((p.identity, coro)) + ) + def _apply_auto_subscribe_opts(room: rtc.Room, auto_subscribe: AutoSubscribe) -> None: if auto_subscribe not in (AutoSubscribe.AUDIO_ONLY, AutoSubscribe.VIDEO_ONLY): @@ -151,7 +244,7 @@ def __init__( self, *, job: agent.Job, - on_reject: Callable[[], Coroutine], + on_reject: Callable[[], Coroutine[None, None, None]], on_accept: Callable[[JobAcceptArguments], 
Coroutine[None, None, None]], ) -> None: self._job = job @@ -175,6 +268,10 @@ def room(self) -> models.Room: def publisher(self) -> models.ParticipantInfo | None: return self._job.participant + @property + def agent_name(self) -> str: + return self._job.agent_name + async def reject(self) -> None: """Reject the job request. The job may be assigned to another worker""" await self._on_reject() diff --git a/livekit-agents/livekit/agents/llm/_oai_api.py b/livekit-agents/livekit/agents/llm/_oai_api.py index bd46e7bf9..9d7dcf302 100644 --- a/livekit-agents/livekit/agents/llm/_oai_api.py +++ b/livekit-agents/livekit/agents/llm/_oai_api.py @@ -141,7 +141,7 @@ def type2str(t: type) -> str: def _sanitize_primitive( - *, value: Any, expected_type: type, choices: list | None + *, value: Any, expected_type: type, choices: tuple | None ) -> Any: if expected_type is str: if not isinstance(value, str): diff --git a/livekit-agents/livekit/agents/llm/chat_context.py b/livekit-agents/livekit/agents/llm/chat_context.py index 08fd9d630..081a33ad7 100644 --- a/livekit-agents/livekit/agents/llm/chat_context.py +++ b/livekit-agents/livekit/agents/llm/chat_context.py @@ -41,6 +41,8 @@ class ChatMessage: content: str | list[str | ChatImage] | None = None tool_calls: list[function_context.FunctionCallInfo] | None = None tool_call_id: str | None = None + tool_exception: Exception | None = None + _metadata: dict[str, Any] = field(default_factory=dict, repr=False, init=False) @staticmethod def create_tool_from_called_function( @@ -49,9 +51,12 @@ def create_tool_from_called_function( if not called_function.task.done(): raise ValueError("cannot create a tool result from a running ai function") + tool_exception: Exception | None = None try: content = called_function.task.result() except BaseException as e: + if isinstance(e, Exception): + tool_exception = e content = f"Error: {e}" return ChatMessage( @@ -59,6 +64,7 @@ def create_tool_from_called_function( 
name=called_function.call_info.function_info.name, content=content, tool_call_id=called_function.call_info.tool_call_id, + tool_exception=tool_exception, ) @staticmethod @@ -92,18 +98,21 @@ def copy(self): if tool_calls is not None: tool_calls = tool_calls.copy() - return ChatMessage( + copied_msg = ChatMessage( role=self.role, name=self.name, content=content, tool_calls=tool_calls, tool_call_id=self.tool_call_id, ) + copied_msg._metadata = self._metadata + return copied_msg @dataclass class ChatContext: messages: list[ChatMessage] = field(default_factory=list) + _metadata: dict[str, Any] = field(default_factory=dict, repr=False, init=False) def append( self, *, text: str = "", images: list[ChatImage] = [], role: ChatRole = "system" @@ -112,4 +121,6 @@ def append( return self def copy(self) -> ChatContext: - return ChatContext(messages=[m.copy() for m in self.messages]) + copied_chat_ctx = ChatContext(messages=[m.copy() for m in self.messages]) + copied_chat_ctx._metadata = self._metadata + return copied_chat_ctx diff --git a/livekit-agents/livekit/agents/llm/function_context.py b/livekit-agents/livekit/agents/llm/function_context.py index 42d893d96..9564c3a1c 100644 --- a/livekit-agents/livekit/agents/llm/function_context.py +++ b/livekit-agents/livekit/agents/llm/function_context.py @@ -19,7 +19,7 @@ import functools import inspect import typing -from dataclasses import dataclass, field +from dataclasses import dataclass from typing import Any, Callable, Tuple from ..log import logger @@ -33,10 +33,18 @@ class _UseDocMarker: USE_DOCSTRING = _UseDocMarker() -@dataclass(frozen=True) +@dataclass(frozen=True, init=False) class TypeInfo: - description: str = "" - choices: list[Any] = field(default_factory=list) + description: str + choices: tuple + + def __init__(self, description: str, choices: tuple | list[Any] = tuple()) -> None: + object.__setattr__(self, "description", description) + + if isinstance(choices, list): + choices = tuple(choices) + + 
object.__setattr__(self, "choices", choices) @dataclass(frozen=True) @@ -45,7 +53,7 @@ class FunctionArgInfo: description: str type: type default: Any - choices: list[Any] | None + choices: tuple | None @dataclass(frozen=True) @@ -137,8 +145,13 @@ def _register_ai_function(self, fnc: Callable) -> None: raise ValueError(f"duplicate ai_callable name: {fnc_name}") sig = inspect.signature(fnc) - type_hints = typing.get_type_hints(fnc) # Annotated[T, ...] -> T - args = dict() + + # get_type_hints with include_extra=True is needed when using Annotated + # using typing.get_args with param.Annotated is returning an empty tuple for some reason + type_hints = typing.get_type_hints( + fnc, include_extras=True + ) # Annotated[T, ...] -> T + args = dict[str, FunctionArgInfo]() for name, param in sig.parameters.items(): if param.kind not in ( @@ -147,37 +160,32 @@ def _register_ai_function(self, fnc: Callable) -> None: ): raise ValueError(f"{fnc_name}: unsupported parameter kind {param.kind}") - if param.annotation is inspect.Parameter.empty: - raise ValueError( - f"{fnc_name}: missing type annotation for parameter {name}" - ) + inner_th, type_info = _extract_types(type_hints[name]) - th = type_hints[name] - if not is_type_supported(th): + if not is_type_supported(inner_th): raise ValueError( - f"{fnc_name}: unsupported type {th} for parameter {name}" + f"{fnc_name}: unsupported type {inner_th} for parameter {name}" ) - type_info = _find_param_type_info(param.annotation) desc = type_info.description if type_info else "" choices = type_info.choices if type_info else None - is_optional, inner_type = _is_optional_type(th) + is_optional, optional_inner = _is_optional_type(inner_th) if is_optional: # when the type is optional, only the inner type is relevant # the argument info for default would be None - th = inner_type + inner_th = optional_inner - if issubclass(th, enum.Enum) and not choices: + if issubclass(inner_th, enum.Enum) and not choices: # the enum must be a str or int 
(and at least one value) # this is verified by is_type_supported - choices = [item.value for item in th] - th = type(choices[0]) + choices = tuple([item.value for item in inner_th]) + inner_th = type(choices[0]) args[name] = FunctionArgInfo( name=name, description=desc, - type=th, + type=inner_th, default=param.default, choices=choices, ) @@ -202,15 +210,33 @@ class _AIFncMetadata: auto_retry: bool -def _find_param_type_info(annotation: type) -> TypeInfo | None: +def _extract_types(annotation: type) -> tuple[type, TypeInfo | None]: + """Return inner_type, TypeInfo""" if typing.get_origin(annotation) is not typing.Annotated: - return None - - for a in typing.get_args(annotation): + # email: Annotated[ + # Optional[str], TypeInfo(description="The user address email") + # ] = None, + # + # An argument like the above will return us: + # `typing.Optional[typing.Annotated[typing.Optional[str], TypeInfo(description='The user address email', choices=())]]` + # So we ignore the first typing.Optional + + is_optional, optional_inner = _is_optional_type(annotation) + if is_optional: + return _extract_types(optional_inner) + + return annotation, None + + # assume the first argument is always the inner type the LLM will use + args = typing.get_args(annotation) + if len(args) < 2: + return args[0], None + + for a in args: if isinstance(a, TypeInfo): - return a + return args[0], a - return None + return args[0], None def _set_metadata( diff --git a/livekit-agents/livekit/agents/log.py b/livekit-agents/livekit/agents/log.py index f8236850c..7757aff59 100644 --- a/livekit-agents/livekit/agents/log.py +++ b/livekit-agents/livekit/agents/log.py @@ -1,6 +1,6 @@ import logging -DEV_LEVEL = 25 +DEV_LEVEL = 23 logging.addLevelName(DEV_LEVEL, "DEV") logger = logging.getLogger("livekit.agents") diff --git a/livekit-agents/livekit/agents/proto.py b/livekit-agents/livekit/agents/proto.py new file mode 100644 index 000000000..3fc3dbd31 --- /dev/null +++ b/livekit-agents/livekit/agents/proto.py 
@@ -0,0 +1,5 @@ +from typing import Literal, Union + +ATTR_AGENT_STATE = "lk.agent.state" + +AgentState = Union[Literal["initializing", "listening", "thinking", "speaking"], str] diff --git a/livekit-agents/livekit/agents/tokenize/__init__.py b/livekit-agents/livekit/agents/tokenize/__init__.py index 1a9eafb57..5b18d0e29 100644 --- a/livekit-agents/livekit/agents/tokenize/__init__.py +++ b/livekit-agents/livekit/agents/tokenize/__init__.py @@ -1,4 +1,4 @@ -from . import basic +from . import basic, utils from .token_stream import ( BufferedSentenceStream, BufferedWordStream, @@ -20,4 +20,5 @@ "BufferedSentenceStream", "BufferedWordStream", "basic", + "utils", ] diff --git a/livekit-agents/livekit/agents/tokenize/_basic_paragraph.py b/livekit-agents/livekit/agents/tokenize/_basic_paragraph.py index 726515103..263a87f33 100644 --- a/livekit-agents/livekit/agents/tokenize/_basic_paragraph.py +++ b/livekit-agents/livekit/agents/tokenize/_basic_paragraph.py @@ -1,12 +1,18 @@ -def split_paragraphs(text: str) -> list[str]: - sep = "\n\n" - - paragraphs = text.split(sep) - new_paragraphs = [] - for p in paragraphs: - p = p.strip() - if not p: - continue - new_paragraphs.append(p) - - return new_paragraphs +import re + + +def split_paragraphs(text: str) -> list[tuple[str, int, int]]: + """ + Split the text into paragraphs. + Returns a list of paragraphs with their start and end indices of the original text. 
+ """ + matches = re.finditer(r"\n{2,}", text) + paragraphs = [] + + for match in matches: + paragraph = match.group(0) + start_pos = match.start() + end_pos = match.end() + paragraphs.append((paragraph.strip(), start_pos, end_pos)) + + return paragraphs diff --git a/livekit-agents/livekit/agents/tokenize/_basic_sent.py b/livekit-agents/livekit/agents/tokenize/_basic_sent.py index 1e8721dc5..9b33fc4e2 100644 --- a/livekit-agents/livekit/agents/tokenize/_basic_sent.py +++ b/livekit-agents/livekit/agents/tokenize/_basic_sent.py @@ -1,9 +1,13 @@ import re -# rule based segmentation from https://stackoverflow.com/a/31505798, works surprisingly well -def split_sentences(text: str, min_sentence_len: int = 20) -> list[str]: - """the text can't contains substrings "" or """" +# rule based segmentation based on https://stackoverflow.com/a/31505798, works surprisingly well +def split_sentences( + text: str, min_sentence_len: int = 20 +) -> list[tuple[str, int, int]]: + """ + the text may not contain substrings "" or "" + """ alphabets = r"([A-Za-z])" prefixes = r"(Mr|St|Mrs|Ms|Dr)[.]" suffixes = r"(Inc|Ltd|Jr|Sr|Co)" @@ -14,12 +18,11 @@ def split_sentences(text: str, min_sentence_len: int = 20) -> list[str]: multiple_dots = r"\.{2,}" # fmt: off - text = " " + text + " " text = text.replace("\n"," ") - text = re.sub(prefixes,"\\1",text) - text = re.sub(websites,"\\1",text) + text = re.sub(prefixes,"\\1", text) + text = re.sub(websites,"\\1", text) text = re.sub(digits + "[.]" + digits,"\\1\\2",text) - #text = re.sub(multiple_dots, lambda match: "" * len(match.group(0)) + "", text) + # text = re.sub(multiple_dots, lambda match: "" * len(match.group(0)) + "", text) # TODO(theomonnom): need improvement for ""..." 
dots", check capital + next sentence should not be # small text = re.sub(multiple_dots, lambda match: "" * len(match.group(0)), text) @@ -44,21 +47,29 @@ def split_sentences(text: str, min_sentence_len: int = 20) -> list[str]: text = text.replace("?","?") text = text.replace("!","!") text = text.replace("",".") - sentences = text.split("") - sentences = [s.strip() for s in sentences] - if sentences and not sentences[-1]: - sentences = sentences[:-1] # fmt: on - new_sentences = [] + splitted_sentences = text.split("") + text = text.replace("", "") + + sentences: list[tuple[str, int, int]] = [] + buff = "" - for sentence in sentences: + start_pos = 0 + end_pos = 0 + for match in splitted_sentences: + sentence = match.strip() + if not sentence: + continue + buff += " " + sentence + end_pos += len(match) if len(buff) > min_sentence_len: - new_sentences.append(buff[1:]) + sentences.append((buff[1:], start_pos, end_pos)) + start_pos = end_pos buff = "" if buff: - new_sentences.append(buff[1:]) + sentences.append((buff[1:], start_pos, len(text) - 1)) - return new_sentences + return sentences diff --git a/livekit-agents/livekit/agents/tokenize/_basic_word.py b/livekit-agents/livekit/agents/tokenize/_basic_word.py index e19f8bac6..109ee7160 100644 --- a/livekit-agents/livekit/agents/tokenize/_basic_word.py +++ b/livekit-agents/livekit/agents/tokenize/_basic_word.py @@ -1,22 +1,31 @@ import re +from . 
import tokenizer -def split_words(text: str, ignore_punctuation: bool = True) -> list[str]: - # fmt: off - punctuations = [".", ",", "!", "?", ";", ":", "'", '"', "(", ")", "[", "]", "{", "}", "<", ">", - "—"] - # fmt: on - - if ignore_punctuation: - for p in punctuations: - # TODO(theomonnom): Ignore acronyms - text = text.replace(p, "") - - words = re.split("[ \n]+", text) - new_words = [] - for word in words: - if not word: - continue # ignore empty - new_words.append(word) - - return new_words + +def split_words( + text: str, ignore_punctuation: bool = True +) -> list[tuple[str, int, int]]: + """ + Split the text into words. + Returns a list of words with their start and end indices of the original text. + """ + matches = re.finditer(r"\S+", text) + words: list[tuple[str, int, int]] = [] + + for match in matches: + word = match.group(0) + start_pos = match.start() + end_pos = match.end() + + if ignore_punctuation: + # TODO(theomonnom): acronyms passthrough + translation_table = str.maketrans("", "", "".join(tokenizer.PUNCTUATIONS)) + word = word.translate(translation_table) + + if not word: + continue + + words.append((word, start_pos, end_pos)) + + return words diff --git a/livekit-agents/livekit/agents/tokenize/basic.py b/livekit-agents/livekit/agents/tokenize/basic.py index fd8f84c22..70bbd09cd 100644 --- a/livekit-agents/livekit/agents/tokenize/basic.py +++ b/livekit-agents/livekit/agents/tokenize/basic.py @@ -45,9 +45,12 @@ def __init__( ) def tokenize(self, text: str, *, language: str | None = None) -> list[str]: - return _basic_sent.split_sentences( - text, min_sentence_len=self._config.min_sentence_len - ) + return [ + tok[0] + for tok in _basic_sent.split_sentences( + text, min_sentence_len=self._config.min_sentence_len + ) + ] def stream(self, *, language: str | None = None) -> tokenizer.SentenceStream: return token_stream.BufferedSentenceStream( @@ -65,9 +68,12 @@ def __init__(self, *, ignore_punctuation: bool = True) -> None: 
self._ignore_punctuation = ignore_punctuation def tokenize(self, text: str, *, language: str | None = None) -> list[str]: - return _basic_word.split_words( - text, ignore_punctuation=self._ignore_punctuation - ) + return [ + tok[0] + for tok in _basic_word.split_words( + text, ignore_punctuation=self._ignore_punctuation + ) + ] def stream(self, *, language: str | None = None) -> tokenizer.WordStream: return token_stream.BufferedWordStream( @@ -84,4 +90,4 @@ def hyphenate_word(word: str) -> list[str]: def tokenize_paragraphs(text: str) -> list[str]: - return _basic_paragraph.split_paragraphs(text) + return [tok[0] for tok in _basic_paragraph.split_paragraphs(text)] diff --git a/livekit-agents/livekit/agents/tokenize/token_stream.py b/livekit-agents/livekit/agents/tokenize/token_stream.py index 9be14e2ec..a7e09734d 100644 --- a/livekit-agents/livekit/agents/tokenize/token_stream.py +++ b/livekit-agents/livekit/agents/tokenize/token_stream.py @@ -1,16 +1,21 @@ from __future__ import annotations -from typing import Callable +import typing +from typing import Callable, Union from ..utils import aio, shortuuid from .tokenizer import SentenceStream, TokenData, WordStream +# Tokenizers can either provide us with a list of tokens or a list of tokens along with their start and end indices. +# If the start and end indices are not available, we attempt to locate the token within the text using str.find. 
+TokenizeCallable = Callable[[str], Union[list[str], list[tuple[str, int, int]]]] + class BufferedTokenStream: def __init__( self, *, - tokenize_fnc: Callable[[str], list[str]], + tokenize_fnc: TokenizeCallable, min_token_len: int, min_ctx_len: int, ) -> None: @@ -21,53 +26,68 @@ def __init__( self._current_segment_id = shortuuid() self._buf_tokens: list[str] = [] # <= min_token_len - self._buf = "" + self._in_buf = "" + self._out_buf = "" + @typing.no_type_check def push_text(self, text: str) -> None: self._check_not_closed() - self._buf += text + self._in_buf += text - if len(self._buf) < self._min_ctx_len: + if len(self._in_buf) < self._min_ctx_len: return - tokens = self._tokenize_fnc(self._buf) + while True: + tokens = self._tokenize_fnc(self._in_buf) + if len(tokens) <= 1: + break - buf_toks = [] - buf = "" - while len(tokens) > 1: - if buf: - buf += " " + if self._out_buf: + self._out_buf += " " tok = tokens.pop(0) - buf += tok - buf_toks.append(tok) - if len(buf) >= self._min_token_len: + tok_text = tok + if isinstance(tok, tuple): + tok_text = tok[0] + + self._out_buf += tok_text + if len(self._out_buf) >= self._min_token_len: self._event_ch.send_nowait( - TokenData(token=buf, segment_id=self._current_segment_id) + TokenData(token=self._out_buf, segment_id=self._current_segment_id) ) - for i, tok in enumerate(buf_toks): - tok_i = self._buf.find(tok) - self._buf = self._buf[tok_i + len(tok) :].lstrip() + self._out_buf = "" - buf_toks = [] - buf = "" + if isinstance(tok, tuple): + self._in_buf = self._in_buf[tok[2] :] + else: + tok_i = max(self._in_buf.find(tok), 0) + self._in_buf = self._in_buf[tok_i + len(tok) :].lstrip() + @typing.no_type_check def flush(self) -> None: self._check_not_closed() - if self._buf: - tokens = self._tokenize_fnc(self._buf) + + if self._in_buf or self._out_buf: + tokens = self._tokenize_fnc(self._in_buf) if tokens: - buf = " ".join(tokens) - else: - buf = self._buf + if self._out_buf: + self._out_buf += " " + + if 
isinstance(tokens[0], tuple): + self._out_buf += " ".join([tok[0] for tok in tokens]) + else: + self._out_buf += " ".join(tokens) + + if self._out_buf: + self._event_ch.send_nowait( + TokenData(token=self._out_buf, segment_id=self._current_segment_id) + ) - self._event_ch.send_nowait( - TokenData(token=buf, segment_id=self._current_segment_id) - ) self._current_segment_id = shortuuid() - self._buf = "" + self._in_buf = "" + self._out_buf = "" def end_input(self) -> None: self.flush() @@ -92,7 +112,7 @@ class BufferedSentenceStream(BufferedTokenStream, SentenceStream): def __init__( self, *, - tokenizer: Callable[[str], list[str]], + tokenizer: TokenizeCallable, min_token_len: int, min_ctx_len: int, ) -> None: @@ -107,7 +127,7 @@ class BufferedWordStream(BufferedTokenStream, WordStream): def __init__( self, *, - tokenizer: Callable[[str], list[str]], + tokenizer: TokenizeCallable, min_token_len: int, min_ctx_len: int, ) -> None: diff --git a/livekit-agents/livekit/agents/tokenize/tokenizer.py b/livekit-agents/livekit/agents/tokenize/tokenizer.py index c4734a204..b785edb0e 100644 --- a/livekit-agents/livekit/agents/tokenize/tokenizer.py +++ b/livekit-agents/livekit/agents/tokenize/tokenizer.py @@ -6,6 +6,12 @@ from ..utils import aio +# fmt: off +PUNCTUATIONS = ['!', '"', '#', '$', '%', '&', "'", '(', ')', '*', '+', ',', '-', '.', '/', ':', ';', '<', '=', '>', + '?', '@', '[', '\\', ']', '^', '_', '`', '{', '|', '}', '~', '±', '—', '‘', '’', '“', '”', '…'] + +# fmt: on + @dataclass class TokenData: diff --git a/livekit-agents/livekit/agents/tokenize/utils.py b/livekit-agents/livekit/agents/tokenize/utils.py new file mode 100644 index 000000000..82e8b302c --- /dev/null +++ b/livekit-agents/livekit/agents/tokenize/utils.py @@ -0,0 +1,82 @@ +from __future__ import annotations + +from typing import AsyncIterable, overload + +from . import _basic_word, tokenizer + + +@overload +def replace_words( + *, + text: str, + replacements: dict[str, str], +) -> str: ... 
+ + +@overload +def replace_words( + *, + text: AsyncIterable[str], + replacements: dict[str, str], +) -> AsyncIterable[str]: ... + + +def replace_words( + *, + text: str | AsyncIterable[str], + replacements: dict[str, str], +) -> str | AsyncIterable[str]: + """ + Replace words in the given (async) text. The replacements are case-insensitive and the + replacement will keep the case of the original word. + Args: + text: text to replace words in + replacements: dictionary of words to replace + """ + + replacements = {k.lower(): v for k, v in replacements.items()} + + def _process_words(text, words): + offset = 0 + processed_index = 0 + for word, start_index, end_index in words: + no_punctuation = word.rstrip("".join(tokenizer.PUNCTUATIONS)) + punctuation_off = len(word) - len(no_punctuation) + replacement = replacements.get(no_punctuation.lower()) + if replacement: + text = ( + text[: start_index + offset] + + replacement + + text[end_index + offset - punctuation_off :] + ) + offset += len(replacement) - len(word) + punctuation_off + + processed_index = end_index + offset + + return text, processed_index + + if isinstance(text, str): + words = _basic_word.split_words(text, ignore_punctuation=False) + text, _ = _process_words(text, words) + return text + else: + + async def _replace_words(): + buffer = "" + async for chunk in text: + buffer += chunk + words = _basic_word.split_words(buffer, ignore_punctuation=False) + + if len(words) <= 1: + continue + + buffer, processed_index = _process_words(buffer, words[:-1]) + yield buffer[:processed_index] + buffer = buffer[processed_index:] + + if buffer: + words = _basic_word.split_words(buffer, ignore_punctuation=False) + buffer, _ = _process_words(buffer, words) + yield buffer + + return _replace_words() diff --git a/livekit-agents/livekit/agents/transcription/stt_forwarder.py b/livekit-agents/livekit/agents/transcription/stt_forwarder.py index 410b343eb..0d526a3a6 --- 
a/livekit-agents/livekit/agents/transcription/stt_forwarder.py +++ b/livekit-agents/livekit/agents/transcription/stt_forwarder.py @@ -10,13 +10,16 @@ from ..log import logger from . import _utils -WillForwardTranscription = Callable[ +BeforeForwardCallback = Callable[ ["STTSegmentsForwarder", rtc.Transcription], Union[rtc.Transcription, Awaitable[Optional[rtc.Transcription]]], ] -def _default_will_forward_transcription( +WillForwardTranscription = BeforeForwardCallback + + +def _default_before_forward_cb( fwd: STTSegmentsForwarder, transcription: rtc.Transcription ) -> rtc.Transcription: return transcription @@ -33,7 +36,9 @@ def __init__( room: rtc.Room, participant: rtc.Participant | str, track: rtc.Track | rtc.TrackPublication | str | None = None, - will_forward_transcription: WillForwardTranscription = _default_will_forward_transcription, + before_forward_cb: BeforeForwardCallback = _default_before_forward_cb, + # backward compatibility + will_forward_transcription: WillForwardTranscription | None = None, ): identity = participant if isinstance(participant, str) else participant.identity if track is None: @@ -41,8 +46,14 @@ def __init__( elif isinstance(track, (rtc.TrackPublication, rtc.Track)): track = track.sid + if will_forward_transcription is not None: + logger.warning( + "will_forward_transcription is deprecated and will be removed in 1.5.0, use before_forward_cb instead", + ) + before_forward_cb = will_forward_transcription + self._room, self._participant_identity, self._track_id = room, identity, track - self._will_forward_transcription = will_forward_transcription + self._before_forward_cb = before_forward_cb self._queue = asyncio.Queue[Optional[rtc.TranscriptionSegment]]() self._main_task = asyncio.create_task(self._run()) self._current_id = _utils.segment_uuid() @@ -60,16 +71,12 @@ async def _run(self): segments=[seg], # no history for now ) - transcription = self._will_forward_transcription( - self, base_transcription - ) + transcription = 
self._before_forward_cb(self, base_transcription) if asyncio.iscoroutine(transcription): transcription = await transcription if not isinstance(transcription, rtc.Transcription): - transcription = _default_will_forward_transcription( - self, base_transcription - ) + transcription = _default_before_forward_cb(self, base_transcription) if transcription.segments and self._room.isconnected(): await self._room.local_participant.publish_transcription( diff --git a/livekit-agents/livekit/agents/transcription/tts_forwarder.py b/livekit-agents/livekit/agents/transcription/tts_forwarder.py index d613e2970..867e86732 100644 --- a/livekit-agents/livekit/agents/transcription/tts_forwarder.py +++ b/livekit-agents/livekit/agents/transcription/tts_forwarder.py @@ -3,27 +3,31 @@ import asyncio import contextlib import time -from collections import deque from dataclasses import dataclass -from typing import Awaitable, Callable, Deque, Optional, Union +from typing import Awaitable, Callable, Optional, Union from livekit import rtc +from livekit.rtc.participant import PublishTranscriptionError from .. import tokenize, utils from ..log import logger +from ..tokenize.tokenizer import PUNCTUATIONS from . import _utils # 3.83 is the "baseline", the number of hyphens per second TTS returns in avg. 
STANDARD_SPEECH_RATE = 3.83 -WillForwardTranscription = Callable[ +BeforeForwardCallback = Callable[ ["TTSSegmentsForwarder", rtc.Transcription], Union[rtc.Transcription, Awaitable[Optional[rtc.Transcription]]], ] -def _default_will_forward_transcription( +WillForwardTranscription = BeforeForwardCallback + + +def _default_before_forward_callback( fwd: TTSSegmentsForwarder, transcription: rtc.Transcription ) -> rtc.Transcription: return transcription @@ -40,27 +44,23 @@ class _TTSOptions: sentence_tokenizer: tokenize.SentenceTokenizer hyphenate_word: Callable[[str], list[str]] new_sentence_delay: float - will_forward_transcription: WillForwardTranscription + before_forward_cb: BeforeForwardCallback @dataclass -class _SegmentData: - segment_index: int - sentence_stream: tokenize.SentenceStream - pushed_text: str = "" +class _AudioData: pushed_duration: float = 0.0 - real_speed: float | None = None - processed_sentences: int = 0 - processed_hyphens: int = 0 - validated: bool = False - forward_start_time: float | None = 0.0 + done: bool = False @dataclass -class _FormingSegments: - audio: _SegmentData - text: _SegmentData - q: deque[_SegmentData] +class _TextData: + sentence_stream: tokenize.SentenceStream + pushed_text: str = "" + done: bool = False + + forwarded_hyphens: int = 0 + forwarded_sentences: int = 0 class TTSSegmentsForwarder: @@ -83,8 +83,10 @@ def __init__( word_tokenizer: tokenize.WordTokenizer = tokenize.basic.WordTokenizer(), sentence_tokenizer: tokenize.SentenceTokenizer = tokenize.basic.SentenceTokenizer(), hyphenate_word: Callable[[str], list[str]] = tokenize.basic.hyphenate_word, - will_forward_transcription: WillForwardTranscription = _default_will_forward_transcription, + before_forward_cb: BeforeForwardCallback = _default_before_forward_callback, loop: asyncio.AbstractEventLoop | None = None, + # backward compatibility + will_forward_transcription: WillForwardTranscription | None = None, ): """ Args: @@ -109,6 +111,12 @@ def __init__( elif 
isinstance(track, (rtc.TrackPublication, rtc.Track)): track = track.sid + if will_forward_transcription is not None: + logger.warning( + "will_forward_transcription is deprecated and will be removed in 1.5.0, use before_forward_cb instead", + ) + before_forward_cb = will_forward_transcription + speed = speed * STANDARD_SPEECH_RATE self._opts = _TTSOptions( room=room, @@ -120,31 +128,28 @@ def __init__( sentence_tokenizer=sentence_tokenizer, hyphenate_word=hyphenate_word, new_sentence_delay=new_sentence_delay, - will_forward_transcription=will_forward_transcription, + before_forward_cb=before_forward_cb, ) self._closed = False self._loop = loop or asyncio.get_event_loop() self._close_future = asyncio.Future[None]() - self._next_segment_index = 0 self._playing_seg_index = -1 self._finshed_seg_index = -1 - first_segment = self._create_segment() - segments_q: Deque[_SegmentData] = deque() - segments_q.append(first_segment) + self._text_q_changed = asyncio.Event() + self._text_q = list[Union[_TextData, None]]() + self._audio_q_changed = asyncio.Event() + self._audio_q = list[Union[_AudioData, None]]() - self._forming_segments = _FormingSegments( - audio=first_segment, text=first_segment, q=segments_q - ) + self._text_data: _TextData | None = None + self._audio_data: _AudioData | None = None + + self._played_text = "" - self._seg_queue = asyncio.Queue[Optional[_SegmentData]]() - self._seg_queue.put_nowait(first_segment) self._main_atask = self._loop.create_task(self._main_task()) self._task_set = utils.aio.TaskSet(loop) - self._played_text = "" - def segment_playout_started(self) -> None: """ Notify that the playout of the audio segment has started. 
@@ -164,47 +169,48 @@ def segment_playout_finished(self) -> None: def push_audio(self, frame: rtc.AudioFrame) -> None: self._check_not_closed() + + if self._audio_data is None: + self._audio_data = _AudioData() + self._audio_q.append(self._audio_data) + self._audio_q_changed.set() + frame_duration = frame.samples_per_channel / frame.sample_rate - cur_seg = self._forming_segments.audio - cur_seg.pushed_duration += frame_duration - cur_seg.validated = True + self._audio_data.pushed_duration += frame_duration def mark_audio_segment_end(self) -> None: self._check_not_closed() - try: - # get last ended segment (text always end before audio) - seg = self._forming_segments.q.popleft() - except IndexError: - raise IndexError( - "mark_audio_segment_end called before any mark_text_segment_end" - ) - if seg.pushed_duration > 0.0: - seg.real_speed = ( - len(self._calc_hyphens(seg.pushed_text)) / seg.pushed_duration - ) + if self._audio_data is None: + self.push_audio(rtc.AudioFrame(bytes(), 24000, 1, 0)) - seg.validated = True - self._forming_segments.audio = self._forming_segments.q[0] + assert self._audio_data is not None + self._audio_data.done = True + self._audio_data = None def push_text(self, text: str) -> None: self._check_not_closed() - cur_seg = self._forming_segments.text - cur_seg.pushed_text += text - cur_seg.sentence_stream.push_text(text) + + if self._text_data is None: + self._text_data = _TextData( + sentence_stream=self._opts.sentence_tokenizer.stream() + ) + self._text_q.append(self._text_data) + self._text_q_changed.set() + + self._text_data.pushed_text += text + self._text_data.sentence_stream.push_text(text) def mark_text_segment_end(self) -> None: self._check_not_closed() - stream = self._forming_segments.text.sentence_stream - stream.end_input() - # create a new segment on "mark_text_segment_end" - # further text can already be pushed even if mark_audio_segment_end has not been - # called yet - new_seg = self._create_segment() - 
self._forming_segments.text = new_seg - self._forming_segments.q.append(new_seg) - self._seg_queue.put_nowait(new_seg) + if self._text_data is None: + self.push_text("") + + assert self._text_data is not None + self._text_data.done = True + self._text_data.sentence_stream.end_input() + self._text_data = None @property def closed(self) -> bool: @@ -220,10 +226,15 @@ async def aclose(self) -> None: self._closed = True self._close_future.set_result(None) - self._seg_queue.put_nowait(None) - for seg in self._forming_segments.q: - await seg.sentence_stream.aclose() + for text_data in self._text_q: + assert text_data is not None + await text_data.sentence_stream.aclose() + + self._text_q.append(None) + self._audio_q.append(None) + self._text_q_changed.set() + self._audio_q_changed.set() await self._task_set.aclose() await self._main_atask @@ -231,78 +242,105 @@ async def aclose(self) -> None: @utils.log_exceptions(logger=logger) async def _main_task(self) -> None: """Main task that forwards the transcription to the room.""" - rtc_seg_q = asyncio.Queue[Optional[rtc.TranscriptionSegment]]() + rtc_seg_ch = utils.aio.Chan[rtc.TranscriptionSegment]() @utils.log_exceptions(logger=logger) async def _forward_task(): - while True: - seg = await rtc_seg_q.get() - if seg is None: - break - + async for rtc_seg in rtc_seg_ch: base_transcription = rtc.Transcription( participant_identity=self._opts.participant_identity, track_sid=self._opts.track_id, - segments=[seg], # no history for now + segments=[rtc_seg], # no history for now ) - transcription = self._opts.will_forward_transcription( - self, base_transcription - ) + transcription = self._opts.before_forward_cb(self, base_transcription) if asyncio.iscoroutine(transcription): transcription = await transcription # fallback to default impl if no custom/user stream is returned if not isinstance(transcription, rtc.Transcription): - transcription = _default_will_forward_transcription( + transcription = _default_before_forward_callback( 
self, base_transcription ) if transcription.segments and self._opts.room.isconnected(): - await self._opts.room.local_participant.publish_transcription( - transcription - ) + try: + await self._opts.room.local_participant.publish_transcription( + transcription + ) + except PublishTranscriptionError: + continue forward_task = asyncio.create_task(_forward_task()) - while True: - seg = await self._seg_queue.get() - if seg is None: - break + seg_index = 0 + q_done = False + while not q_done: + await self._text_q_changed.wait() + await self._audio_q_changed.wait() + + while self._text_q and self._audio_q: + text_data = self._text_q.pop(0) + audio_data = self._audio_q.pop(0) - # wait until the segment is validated and has started playing - while not self._closed: - if seg.validated and self._playing_seg_index >= seg.segment_index: + if text_data is None or audio_data is None: + q_done = True break - await self._sleep_if_not_closed(0.1) + # wait until the segment is validated and has started playing + while not self._closed: + if self._playing_seg_index >= seg_index: + break + + await self._sleep_if_not_closed(0.125) - sentence_stream = seg.sentence_stream - seg.forward_start_time = time.time() + sentence_stream = text_data.sentence_stream + forward_start_time = time.time() + + async for ev in sentence_stream: + await self._sync_sentence_co( + seg_index, + forward_start_time, + text_data, + audio_data, + ev.token, + rtc_seg_ch, + ) - async for ev in sentence_stream: - await self._sync_sentence_co(seg, ev.token, rtc_seg_q) + seg_index += 1 - rtc_seg_q.put_nowait(None) + self._text_q_changed.clear() + self._audio_q_changed.clear() + + rtc_seg_ch.close() await forward_task async def _sync_sentence_co( self, - seg: _SegmentData, - tokenized_sentence: str, - rtc_seg_q: asyncio.Queue[Optional[rtc.TranscriptionSegment]], + segment_index: int, + segment_start_time: float, + text_data: _TextData, + audio_data: _AudioData, + sentence: str, + rtc_seg_ch: 
utils.aio.Chan[rtc.TranscriptionSegment], ): """Synchronize the transcription with the audio playout for a given sentence.""" - assert seg.forward_start_time is not None - # put each sentence in a different transcription segment + + real_speed = None + if audio_data.pushed_duration > 0 and audio_data.done: + real_speed = ( + len(self._calc_hyphens(text_data.pushed_text)) + / audio_data.pushed_duration + ) + seg_id = _utils.segment_uuid() - words = self._opts.word_tokenizer.tokenize(text=tokenized_sentence) + words = self._opts.word_tokenizer.tokenize(text=sentence) processed_words: list[str] = [] og_text = self._played_text for word in words: - if seg.segment_index <= self._finshed_seg_index: + if segment_index <= self._finshed_seg_index: # playout of the audio segment already finished # break the loop and send the final transcription break @@ -315,19 +353,22 @@ async def _sync_sentence_co( processed_words.append(word) # elapsed time since the start of the seg - elapsed_time = time.time() - seg.forward_start_time + elapsed_time = time.time() - segment_start_time text = self._opts.word_tokenizer.format_words(processed_words) + # remove any punctuation at the end of a non-final transcript + text = text.rstrip("".join(PUNCTUATIONS)) + speed = self._opts.speed - if seg.real_speed is not None: - speed = seg.real_speed + if real_speed is not None: + speed = real_speed estimated_pauses_s = ( - seg.processed_sentences * self._opts.new_sentence_delay + text_data.forwarded_sentences * self._opts.new_sentence_delay ) hyph_pauses = estimated_pauses_s * speed target_hyphens = round(speed * elapsed_time) - dt = target_hyphens - seg.processed_hyphens - hyph_pauses + dt = target_hyphens - text_data.forwarded_hyphens - hyph_pauses to_wait_hyphens = max(0.0, word_hyphens - dt) delay = to_wait_hyphens / speed else: @@ -335,7 +376,8 @@ async def _sync_sentence_co( first_delay = min(delay / 2, 2 / speed) await self._sleep_if_not_closed(first_delay) - rtc_seg_q.put_nowait( + + 
rtc_seg_ch.send_nowait( rtc.TranscriptionSegment( id=seg_id, text=text, @@ -346,23 +388,24 @@ async def _sync_sentence_co( ) ) self._played_text = f"{og_text} {text}" + await self._sleep_if_not_closed(delay - first_delay) - seg.processed_hyphens += word_hyphens + text_data.forwarded_hyphens += word_hyphens - rtc_seg_q.put_nowait( + rtc_seg_ch.send_nowait( rtc.TranscriptionSegment( id=seg_id, - text=tokenized_sentence, + text=sentence, start_time=0, end_time=0, final=True, language=self._opts.language, ) ) - self._played_text = f"{og_text} {tokenized_sentence}" + self._played_text = f"{og_text} {sentence}" await self._sleep_if_not_closed(self._opts.new_sentence_delay) - seg.processed_sentences += 1 + text_data.forwarded_sentences += 1 async def _sleep_if_not_closed(self, delay: float) -> None: with contextlib.suppress(asyncio.TimeoutError): @@ -377,14 +420,6 @@ def _calc_hyphens(self, text: str) -> list[str]: return hyphens - def _create_segment(self) -> _SegmentData: - data = _SegmentData( - segment_index=self._next_segment_index, - sentence_stream=self._opts.sentence_tokenizer.stream(), - ) - self._next_segment_index += 1 - return data - def _check_not_closed(self) -> None: if self._closed: raise RuntimeError("TTSForwarder is closed") diff --git a/livekit-agents/livekit/agents/utils/aio/__init__.py b/livekit-agents/livekit/agents/utils/aio/__init__.py index 803e12f73..df97e26e9 100644 --- a/livekit-agents/livekit/agents/utils/aio/__init__.py +++ b/livekit-agents/livekit/agents/utils/aio/__init__.py @@ -1,7 +1,7 @@ import asyncio -import contextlib +import functools -from . import debug, duplex_unix +from . 
import debug, duplex_unix, itertools from .channel import Chan, ChanClosed, ChanReceiver, ChanSender from .interval import Interval, interval from .sleep import Sleep, SleepFinished, sleep @@ -9,11 +9,28 @@ async def gracefully_cancel(*futures: asyncio.Future): - for f in futures: - f.cancel() + loop = asyncio.get_running_loop() + waiters = [] - with contextlib.suppress(asyncio.CancelledError): - await asyncio.gather(*futures) + for fut in futures: + waiter = loop.create_future() + cb = functools.partial(_release_waiter, waiter) + waiters.append((waiter, cb)) + fut.add_done_callback(cb) + fut.cancel() + + try: + for waiter, _ in waiters: + await waiter + finally: + for i, fut in enumerate(futures): + _, cb = waiters[i] + fut.remove_done_callback(cb) + + +def _release_waiter(waiter, *args): + if not waiter.done(): + waiter.set_result(None) __all__ = [ @@ -31,4 +48,5 @@ async def gracefully_cancel(*futures: asyncio.Future): "debug", "gracefully_cancel", "duplex_unix", + "itertools", ] diff --git a/livekit-agents/livekit/agents/utils/aio/duplex_unix.py b/livekit-agents/livekit/agents/utils/aio/duplex_unix.py index de9b1c446..a679c2ed2 100644 --- a/livekit-agents/livekit/agents/utils/aio/duplex_unix.py +++ b/livekit-agents/livekit/agents/utils/aio/duplex_unix.py @@ -36,8 +36,7 @@ async def recv_bytes(self) -> bytes: len = struct.unpack("!I", len_bytes)[0] return await self._reader.readexactly(len) except ( - BrokenPipeError, - ConnectionResetError, + OSError, EOFError, asyncio.IncompleteReadError, ): @@ -49,7 +48,7 @@ async def send_bytes(self, data: bytes) -> None: self._writer.write(len_bytes) self._writer.write(data) await self._writer.drain() - except (ConnectionResetError, BrokenPipeError): + except OSError: raise DuplexClosed() async def aclose(self) -> None: @@ -57,7 +56,7 @@ async def aclose(self) -> None: self._writer.close() await self._writer.wait_closed() self._sock.close() - except (BrokenPipeError, ConnectionResetError): + except OSError: raise 
DuplexClosed() @@ -80,25 +79,31 @@ def open(sock: socket.socket) -> _Duplex: return _Duplex(sock) def recv_bytes(self) -> bytes: - assert self._sock is not None + if self._sock is None: + raise DuplexClosed() + try: len_bytes = _read_exactly(self._sock, 4) len = struct.unpack("!I", len_bytes)[0] return _read_exactly(self._sock, len) - except (BrokenPipeError, ConnectionResetError, EOFError): + except (OSError, EOFError): raise DuplexClosed() def send_bytes(self, data: bytes) -> None: - assert self._sock is not None + if self._sock is None: + raise DuplexClosed() + try: len_bytes = struct.pack("!I", len(data)) self._sock.sendall(len_bytes) self._sock.sendall(data) - except (BrokenPipeError, ConnectionResetError): + except OSError: raise DuplexClosed() def detach(self) -> socket.socket: - assert self._sock is not None + if self._sock is None: + raise DuplexClosed() + sock = self._sock self._sock = None return sock @@ -108,5 +113,5 @@ def close(self) -> None: if self._sock is not None: self._sock.close() self._sock = None - except (BrokenPipeError, ConnectionResetError): + except OSError: raise DuplexClosed() diff --git a/livekit-agents/livekit/agents/utils/aio/itertools.py b/livekit-agents/livekit/agents/utils/aio/itertools.py new file mode 100644 index 000000000..0076f8eb5 --- /dev/null +++ b/livekit-agents/livekit/agents/utils/aio/itertools.py @@ -0,0 +1,114 @@ +import asyncio +from collections import deque +from typing import ( + Any, + AsyncGenerator, + AsyncIterable, + AsyncIterator, + Deque, + Generic, + Iterator, + List, + Protocol, + Tuple, + TypeVar, + Union, + overload, + runtime_checkable, +) + +from typing_extensions import AsyncContextManager + +# based on https://github.com/maxfischer2781/asyncstdlib/blob/master/asyncstdlib/itertools.py + + +@runtime_checkable +class _ACloseable(Protocol): + async def aclose(self) -> None: + """Asynchronously close this object""" + + +T = TypeVar("T") + + +async def tee_peer( + iterator: AsyncIterator[T], + buffer: 
Deque[T], + peers: List[Deque[T]], + lock: AsyncContextManager[Any], +) -> AsyncGenerator[T, None]: + try: + while True: + if not buffer: + async with lock: + if buffer: + continue + try: + item = await iterator.__anext__() + except StopAsyncIteration: + break + else: + for peer_buffer in peers: + peer_buffer.append(item) + yield buffer.popleft() + finally: + for idx, peer_buffer in enumerate(peers): # pragma: no branch + if peer_buffer is buffer: + peers.pop(idx) + break + + if not peers and isinstance(iterator, _ACloseable): + await iterator.aclose() + + +class Tee(Generic[T]): + __slots__ = ("_iterator", "_buffers", "_children") + + def __init__( + self, + iterator: AsyncIterable[T], + n: int = 2, + ): + self._iterator = iterator.__aiter__() + self._buffers: List[Deque[T]] = [deque() for _ in range(n)] + + lock = asyncio.Lock() + self._children = tuple( + tee_peer( + iterator=self._iterator, + buffer=buffer, + peers=self._buffers, + lock=lock, + ) + for buffer in self._buffers + ) + + def __len__(self) -> int: + return len(self._children) + + @overload + def __getitem__(self, item: int) -> AsyncIterator[T]: ... + + @overload + def __getitem__(self, item: slice) -> Tuple[AsyncIterator[T], ...]: ... 
+ + def __getitem__( + self, item: Union[int, slice] + ) -> Union[AsyncIterator[T], Tuple[AsyncIterator[T], ...]]: + return self._children[item] + + def __iter__(self) -> Iterator[AsyncIterator[T]]: + yield from self._children + + async def __aenter__(self) -> "Tee[T]": + return self + + async def __aexit__(self, exc_type: Any, exc_val: Any, exc_tb: Any) -> None: + await self.aclose() + + async def aclose(self) -> None: + for child in self._children: + await child.aclose() + + +tee = Tee diff --git a/livekit-agents/livekit/agents/utils/audio.py b/livekit-agents/livekit/agents/utils/audio.py index 2c6975b0d..a8cbf1c17 100644 --- a/livekit-agents/livekit/agents/utils/audio.py +++ b/livekit-agents/livekit/agents/utils/audio.py @@ -18,7 +18,7 @@ def __init__( self._num_channels = num_channels if samples_per_channel is None: - samples_per_channel = sample_rate // 50 # 20ms by default + samples_per_channel = sample_rate // 10 # 100ms by default self._bytes_per_frame = ( num_channels * samples_per_channel * ctypes.sizeof(ctypes.c_int16) diff --git a/livekit-agents/livekit/agents/utils/misc.py b/livekit-agents/livekit/agents/utils/misc.py index 7720a53db..f85ae15b7 100644 --- a/livekit-agents/livekit/agents/utils/misc.py +++ b/livekit-agents/livekit/agents/utils/misc.py @@ -1,3 +1,5 @@ +from __future__ import annotations + import time import uuid from typing import List, Union diff --git a/livekit-agents/livekit/agents/vad.py b/livekit-agents/livekit/agents/vad.py index 55c64a5b8..ea42e9158 100644 --- a/livekit-agents/livekit/agents/vad.py +++ b/livekit-agents/livekit/agents/vad.py @@ -27,7 +27,11 @@ class VADEvent: silence_duration: float """duration of the silence in seconds""" frames: List[rtc.AudioFrame] = field(default_factory=list) - """list of audio frames of the speech""" + """list of audio frames of the speech + + start_of_speech: contains the complete audio chunks that triggered the detection + + end_of_speech: contains the complete user speech + """ probability:
float = 0.0 """smoothed probability of the speech (only for INFERENCE_DONE event)""" inference_duration: float = 0.0 @@ -65,7 +69,7 @@ def __init__(self): self._task.add_done_callback(lambda _: self._event_ch.close()) @abstractmethod - def _main_task(self) -> None: ... + async def _main_task(self) -> None: ... def push_frame(self, frame: rtc.AudioFrame) -> None: """Push some text to be synthesized""" diff --git a/livekit-agents/livekit/agents/version.py b/livekit-agents/livekit/agents/version.py index 3292eba82..654ad56ec 100644 --- a/livekit-agents/livekit/agents/version.py +++ b/livekit-agents/livekit/agents/version.py @@ -12,4 +12,4 @@ # See the License for the specific language governing permissions and # limitations under the License. -__version__ = "0.8.5" +__version__ = "0.9.0" diff --git a/livekit-agents/livekit/agents/voice_assistant/__init__.py b/livekit-agents/livekit/agents/voice_assistant/__init__.py index c36009752..f151ac5d4 100644 --- a/livekit-agents/livekit/agents/voice_assistant/__init__.py +++ b/livekit-agents/livekit/agents/voice_assistant/__init__.py @@ -4,4 +4,8 @@ VoiceAssistant, ) -__all__ = ["VoiceAssistant", "AssistantCallContext", "AssistantTranscriptionOptions"] +__all__ = [ + "VoiceAssistant", + "AssistantCallContext", + "AssistantTranscriptionOptions", +] diff --git a/livekit-agents/livekit/agents/voice_assistant/agent_output.py b/livekit-agents/livekit/agents/voice_assistant/agent_output.py index ec50fde98..f7747af6b 100644 --- a/livekit-agents/livekit/agents/voice_assistant/agent_output.py +++ b/livekit-agents/livekit/agents/voice_assistant/agent_output.py @@ -2,7 +2,7 @@ import asyncio import time -from typing import Any, AsyncIterable, Callable, Union +from typing import Any, AsyncIterable, Awaitable, Callable, Union from livekit import rtc @@ -12,7 +12,7 @@ from .agent_playout import AgentPlayout, PlayoutHandle from .log import logger -SpeechSource = Union[AsyncIterable[str], str] +SpeechSource = Union[AsyncIterable[str], str, 
Awaitable[str]] class SynthesisHandle: @@ -20,13 +20,21 @@ def __init__( self, *, speech_id: str, - speech_source: SpeechSource, + tts_source: SpeechSource, + transcript_source: SpeechSource, agent_playout: AgentPlayout, tts: text_to_speech.TTS, transcription_fwd: agent_transcription.TTSSegmentsForwarder, ) -> None: - self._speech_source, self._agent_playout, self._tts, self._tr_fwd = ( - speech_source, + ( + self._tts_source, + self._transcript_source, + self._agent_playout, + self._tts, + self._tr_fwd, + ) = ( + tts_source, + transcript_source, agent_playout, tts, transcription_fwd, @@ -113,14 +121,15 @@ def synthesize( self, *, speech_id: str, - transcript: SpeechSource, + tts_source: SpeechSource, + transcript_source: SpeechSource, transcription: bool, transcription_speed: float, sentence_tokenizer: tokenize.SentenceTokenizer, word_tokenizer: tokenize.WordTokenizer, hyphenate_word: Callable[[str], list[str]], ) -> SynthesisHandle: - def _will_forward_transcription( + def _before_forward( fwd: agent_transcription.TTSSegmentsForwarder, transcription: rtc.Transcription, ): @@ -136,11 +145,12 @@ def _will_forward_transcription( sentence_tokenizer=sentence_tokenizer, word_tokenizer=word_tokenizer, hyphenate_word=hyphenate_word, - will_forward_transcription=_will_forward_transcription, + before_forward_cb=_before_forward, ) handle = SynthesisHandle( - speech_source=transcript, + tts_source=tts_source, + transcript_source=transcript_source, agent_playout=self._agent_playout, tts=self._tts, transcription_fwd=transcription_fwd, @@ -155,10 +165,16 @@ def _will_forward_transcription( @utils.log_exceptions(logger=logger) async def _synthesize_task(self, handle: SynthesisHandle) -> None: """Synthesize speech from the source""" - if isinstance(handle._speech_source, str): - co = _str_synthesis_task(handle._speech_source, handle) + tts_source = handle._tts_source + transcript_source = handle._transcript_source + + if isinstance(tts_source, Awaitable): + tts_source = await 
tts_source + co = _str_synthesis_task(tts_source, transcript_source, handle) + elif isinstance(tts_source, str): + co = _str_synthesis_task(tts_source, transcript_source, handle) else: - co = _stream_synthesis_task(handle._speech_source, handle) + co = _stream_synthesis_task(tts_source, transcript_source, handle) synth = asyncio.create_task(co) synth.add_done_callback(lambda _: handle._buf_ch.close()) @@ -171,17 +187,19 @@ async def _synthesize_task(self, handle: SynthesisHandle) -> None: @utils.log_exceptions(logger=logger) -async def _str_synthesis_task(text: str, handle: SynthesisHandle) -> None: +async def _str_synthesis_task( + tts_text: str, transcript: str, handle: SynthesisHandle +) -> None: """synthesize speech from a string""" if not handle.tts_forwarder.closed: - handle.tts_forwarder.push_text(text) + handle.tts_forwarder.push_text(transcript) handle.tts_forwarder.mark_text_segment_end() start_time = time.time() first_frame = True try: - async for audio in handle._tts.synthesize(text): + async for audio in handle._tts.synthesize(tts_text): if first_frame: first_frame = False logger.debug( @@ -206,7 +224,9 @@ async def _str_synthesis_task(text: str, handle: SynthesisHandle) -> None: @utils.log_exceptions(logger=logger) async def _stream_synthesis_task( - streamed_text: AsyncIterable[str], handle: SynthesisHandle + tts_source: AsyncIterable[str], + transcript_source: AsyncIterable[str], + handle: SynthesisHandle, ) -> None: """synthesize speech from streamed text""" @@ -232,33 +252,41 @@ async def _read_generated_audio_task(): handle._buf_ch.send_nowait(audio.frame) if handle._tr_fwd and not handle._tr_fwd.closed: - # mark_audio_segment_end must be called *after* mart_text_segment_end handle._tr_fwd.mark_audio_segment_end() + @utils.log_exceptions(logger=logger) + async def _read_transcript_task(): + async for seg in transcript_source: + if not handle._tr_fwd.closed: + handle._tr_fwd.push_text(seg) + + if not handle.tts_forwarder.closed: + 
handle.tts_forwarder.mark_text_segment_end() + # otherwise, stream the text to the TTS tts_stream = handle._tts.stream() - read_atask: asyncio.Task | None = None + read_tts_atask: asyncio.Task | None = None + read_transcript_atask: asyncio.Task | None = None try: - async for seg in streamed_text: - if not handle.tts_forwarder.closed: - handle.tts_forwarder.push_text(seg) - - if read_atask is None: + async for seg in tts_source: + if read_tts_atask is None: # start the task when we receive the first text segment (so start_time is more accurate) - read_atask = asyncio.create_task(_read_generated_audio_task()) + read_tts_atask = asyncio.create_task(_read_generated_audio_task()) + read_transcript_atask = asyncio.create_task(_read_transcript_task()) tts_stream.push_text(seg) - if not handle.tts_forwarder.closed: - handle.tts_forwarder.mark_text_segment_end() - tts_stream.end_input() - if read_atask is not None: - await read_atask + if read_tts_atask is not None: + assert read_transcript_atask is not None + await read_tts_atask + await read_transcript_atask + finally: - if read_atask is not None: - await utils.aio.gracefully_cancel(read_atask) + if read_tts_atask is not None: + assert read_transcript_atask is not None + await utils.aio.gracefully_cancel(read_tts_atask, read_transcript_atask) await tts_stream.aclose() diff --git a/livekit-agents/livekit/agents/voice_assistant/agent_playout.py b/livekit-agents/livekit/agents/voice_assistant/agent_playout.py index ee32f3608..cd7ddc320 100644 --- a/livekit-agents/livekit/agents/voice_assistant/agent_playout.py +++ b/livekit-agents/livekit/agents/voice_assistant/agent_playout.py @@ -15,16 +15,22 @@ class PlayoutHandle: def __init__( self, speech_id: str, + audio_source: rtc.AudioSource, playout_source: AsyncIterable[rtc.AudioFrame], transcription_fwd: transcription.TTSSegmentsForwarder, ) -> None: self._playout_source = playout_source + self._audio_source = audio_source self._tr_fwd = transcription_fwd self._interrupted = 
False - self._time_played = 0.0 + self._int_fut = asyncio.Future[None]() self._done_fut = asyncio.Future[None]() self._speech_id = speech_id + self._pushed_duration = 0.0 + + self._total_played_time: float | None = None # set when the playout is done + @property def speech_id(self) -> str: return self._speech_id @@ -35,15 +41,19 @@ def interrupted(self) -> bool: @property def time_played(self) -> float: - return self._time_played + if self._total_played_time is not None: + return self._total_played_time + + return self._pushed_duration - self._audio_source.queued_duration def done(self) -> bool: - return self._done_fut.done() + return self._done_fut.done() or self._interrupted def interrupt(self) -> None: if self.done(): return + self._int_fut.set_result(None) self._interrupted = True def join(self) -> asyncio.Future: @@ -51,9 +61,9 @@ class AgentPlayout(utils.EventEmitter[EventTypes]): - def __init__(self, *, source: rtc.AudioSource, alpha: float = 0.95) -> None: + def __init__(self, *, audio_source: rtc.AudioSource) -> None: super().__init__() - self._source = source + self._audio_source = audio_source self._target_volume = 1.0 self._playout_atask: asyncio.Task[None] | None = None self._closed = False @@ -90,6 +100,7 @@ def play( handle = PlayoutHandle( speech_id=speech_id, + audio_source=self._audio_source, playout_source=playout_source, transcription_fwd=transcription_fwd, ) @@ -103,12 +114,24 @@ async def _playout_task( self, old_task: asyncio.Task[None] | None, handle: PlayoutHandle ) -> None: - first_frame = True + if old_task is not None: + await utils.aio.gracefully_cancel(old_task) - try: - if old_task is not None: - await utils.aio.gracefully_cancel(old_task) + if self._audio_source.queued_duration > 0: + # this should not happen, but log it just in case + logger.warning( + "new playout while the source is still playing", + extra={ + "speech_id": handle.speech_id, + "queued_duration":
self._audio_source.queued_duration, + }, + ) + + first_frame = True + @utils.log_exceptions(logger=logger) + async def _capture_task(): + nonlocal first_frame async for frame in handle._playout_source: if first_frame: handle._tr_fwd.segment_playout_started() @@ -121,29 +144,27 @@ async def _playout_task( self.emit("playout_started") first_frame = False - if handle.interrupted: - break - - # divide the frame by chunks of 20ms - ms20 = frame.sample_rate // 50 - i = 0 - while i < len(frame.data): - if handle.interrupted: - break - - rem = min(ms20, len(frame.data) - i) - data = frame.data[i : i + rem] - i += rem - - chunk_frame = rtc.AudioFrame( - data=data.tobytes(), - sample_rate=frame.sample_rate, - num_channels=frame.num_channels, - samples_per_channel=rem, - ) - await self._source.capture_frame(chunk_frame) - handle._time_played += rem / frame.sample_rate + handle._pushed_duration += frame.samples_per_channel / frame.sample_rate + await self._audio_source.capture_frame(frame) + + await self._audio_source.wait_for_playout() + + capture_task = asyncio.create_task(_capture_task()) + try: + await asyncio.wait( + [capture_task, handle._int_fut], + return_when=asyncio.FIRST_COMPLETED, + ) finally: + await utils.aio.gracefully_cancel(capture_task) + + handle._total_played_time = ( + handle._pushed_duration - self._audio_source.queued_duration + ) + + if handle.interrupted or capture_task.exception(): + self._audio_source.clear_queue() # make sure to remove any queued frames + if not first_frame: if not handle.interrupted: handle._tr_fwd.segment_playout_finished() diff --git a/livekit-agents/livekit/agents/voice_assistant/human_input.py b/livekit-agents/livekit/agents/voice_assistant/human_input.py index a3ddc5248..22fec121e 100644 --- a/livekit-agents/livekit/agents/voice_assistant/human_input.py +++ b/livekit-agents/livekit/agents/voice_assistant/human_input.py @@ -101,7 +101,7 @@ async def _recognize_task(self, audio_stream: rtc.AudioStream) -> None: vad_stream = 
self._vad.stream() stt_stream = self._stt.stream() - def _will_forward_transcription( + def _before_forward( fwd: transcription.STTSegmentsForwarder, transcription: rtc.Transcription ): if not self._transcription: @@ -113,7 +113,7 @@ def _will_forward_transcription( room=self._room, participant=self._participant, track=self._subscribed_track, - will_forward_transcription=_will_forward_transcription, + before_forward_cb=_before_forward, ) async def _audio_stream_co() -> None: diff --git a/livekit-agents/livekit/agents/voice_assistant/plotter.py b/livekit-agents/livekit/agents/voice_assistant/plotter.py index 3b8d583ae..c0a9a1ca9 100644 --- a/livekit-agents/livekit/agents/voice_assistant/plotter.py +++ b/livekit-agents/livekit/agents/voice_assistant/plotter.py @@ -1,10 +1,14 @@ import asyncio +import contextlib import io import multiprocessing as mp +import selectors +import socket import time from dataclasses import dataclass from typing import ClassVar, Literal, Tuple +from .. import utils from ..ipc import channel PlotType = Literal["vad_probability", "raw_vol", "smoothed_vol"] @@ -57,7 +61,7 @@ def read(self, b: io.BytesIO) -> None: } -def _draw_plot(reader): +def _draw_plot(mp_cch): try: import matplotlib as mpl # type: ignore import matplotlib.pyplot as plt # type: ignore @@ -77,11 +81,18 @@ def _draw_plot(reader): max_points = 250 - plot_rx = channel.ProcChannel(conn=reader, messages=PLT_MESSAGES) + duplex = utils.aio.duplex_unix._Duplex.open(mp_cch) + + selector = selectors.DefaultSelector() + selector.register(mp_cch, selectors.EVENT_READ) def _draw_cb(sp, pv): - while reader.poll(): - msg = plot_rx.recv() + while True: + events = selector.select(timeout=0.01) + if not events: + break + + msg = channel.recv_message(duplex, PLT_MESSAGES) if isinstance(msg, PlotMessage): data = plot_data.setdefault(msg.which, ([], [])) data[0].append(msg.x) @@ -129,7 +140,7 @@ def _draw_cb(sp, pv): fig.canvas.draw() - timer = fig.canvas.new_timer(interval=150) + timer = 
fig.canvas.new_timer(interval=33) timer.add_callback(_draw_cb, sp, pv) timer.start() plt.show() @@ -140,18 +151,18 @@ def __init__(self, loop: asyncio.AbstractEventLoop) -> None: self._loop = loop self._started = False - def start(self): + async def start(self): if self._started: return - mp_pch, mp_cch = mp.Pipe(duplex=True) - self._plot_tx = channel.AsyncProcChannel( - conn=mp_pch, loop=self._loop, messages=PLT_MESSAGES - ) + mp_pch, mp_cch = socket.socketpair() + self._duplex = await utils.aio.duplex_unix._AsyncDuplex.open(mp_pch) self._plot_proc = mp.Process(target=_draw_plot, args=(mp_cch,), daemon=True) self._plot_proc.start() + mp_cch.close() self._started = True + self._closed = False self._start_time = time.time() def plot_value(self, which: PlotType, y: float): @@ -159,17 +170,32 @@ def plot_value(self, which: PlotType, y: float): return ts = time.time() - self._start_time - asyncio.ensure_future(self._plot_tx.asend(PlotMessage(which=which, x=ts, y=y))) + self._send_message(PlotMessage(which=which, x=ts, y=y)) def plot_event(self, which: EventType): if not self._started: return ts = time.time() - self._start_time - asyncio.ensure_future(self._plot_tx.asend(PlotEventMessage(which=which, x=ts))) + self._send_message(PlotEventMessage(which=which, x=ts)) + + def _send_message(self, msg: channel.Message) -> None: + if self._closed: + return + + async def _asend_message(): + try: + await channel.asend_message(self._duplex, msg) + except Exception: + self._closed = True - def terminate(self): + asyncio.ensure_future(_asend_message()) + + async def terminate(self): if not self._started: return self._plot_proc.terminate() + + with contextlib.suppress(utils.aio.duplex_unix.DuplexClosed): + await self._duplex.aclose() diff --git a/livekit-agents/livekit/agents/voice_assistant/speech_handle.py b/livekit-agents/livekit/agents/voice_assistant/speech_handle.py new file mode 100644 index 000000000..684bf1933 --- /dev/null +++ 
b/livekit-agents/livekit/agents/voice_assistant/speech_handle.py @@ -0,0 +1,153 @@ +from __future__ import annotations + +import asyncio +from typing import AsyncIterable + +from .. import utils +from ..llm import LLMStream +from .agent_output import SynthesisHandle + + +class SpeechHandle: + def __init__( + self, + *, + id: str, + allow_interruptions: bool, + add_to_chat_ctx: bool, + is_reply: bool, + user_question: str, + ) -> None: + self._id = id + self._allow_interruptions = allow_interruptions + self._add_to_chat_ctx = add_to_chat_ctx + + # is_reply is True when the speech is answering a user question + self._is_reply = is_reply + self._user_question = user_question + self._user_commited = False + + self._init_fut: asyncio.Future[None] = asyncio.Future() + self._initialized = False + self._speech_commited = False # speech committed (interrupted or not) + + # source and synthesis_handle are None until the speech is initialized + self._source: str | LLMStream | AsyncIterable[str] | None = None + self._synthesis_handle: SynthesisHandle | None = None + + @staticmethod + def create_assistant_reply( + *, + allow_interruptions: bool, + add_to_chat_ctx: bool, + user_question: str, + ) -> SpeechHandle: + return SpeechHandle( + id=utils.shortuuid(), + allow_interruptions=allow_interruptions, + add_to_chat_ctx=add_to_chat_ctx, + is_reply=True, + user_question=user_question, + ) + + @staticmethod + def create_assistant_speech( + *, + allow_interruptions: bool, + add_to_chat_ctx: bool, + ) -> SpeechHandle: + return SpeechHandle( + id=utils.shortuuid(), + allow_interruptions=allow_interruptions, + add_to_chat_ctx=add_to_chat_ctx, + is_reply=False, + user_question="", + ) + + async def wait_for_initialization(self) -> None: + await asyncio.shield(self._init_fut) + + def initialize( + self, + *, + source: str | LLMStream | AsyncIterable[str], + synthesis_handle: SynthesisHandle, + ) -> None: + if self.interrupted: + raise RuntimeError("speech is interrupted") + 
self._source = source + self._synthesis_handle = synthesis_handle + self._initialized = True + self._init_fut.set_result(None) + + def mark_user_commited(self) -> None: + self._user_commited = True + + def mark_speech_commited(self) -> None: + self._speech_commited = True + + @property + def user_commited(self) -> bool: + return self._user_commited + + @property + def speech_commited(self) -> bool: + return self._speech_commited + + @property + def id(self) -> str: + return self._id + + @property + def allow_interruptions(self) -> bool: + return self._allow_interruptions + + @property + def add_to_chat_ctx(self) -> bool: + return self._add_to_chat_ctx + + @property + def source(self) -> str | LLMStream | AsyncIterable[str]: + if self._source is None: + raise RuntimeError("speech not initialized") + return self._source + + @property + def synthesis_handle(self) -> SynthesisHandle: + if self._synthesis_handle is None: + raise RuntimeError("speech not initialized") + return self._synthesis_handle + + @synthesis_handle.setter + def synthesis_handle(self, synthesis_handle: SynthesisHandle) -> None: + """synthesis handle can be replaced for the same speech. + This is useful when we need to do a new generation. 
(e.g: for automatic function call answers)""" + if self._synthesis_handle is None: + raise RuntimeError("speech not initialized") + + self._synthesis_handle = synthesis_handle + + @property + def initialized(self) -> bool: + return self._initialized + + @property + def is_reply(self) -> bool: + return self._is_reply + + @property + def user_question(self) -> str: + return self._user_question + + @property + def interrupted(self) -> bool: + return self._init_fut.cancelled() or ( + self._synthesis_handle is not None and self._synthesis_handle.interrupted + ) + + def interrupt(self) -> None: + self._init_fut.cancel() + + if self._synthesis_handle is not None: + self._synthesis_handle.interrupt() diff --git a/livekit-agents/livekit/agents/voice_assistant/voice_assistant.py b/livekit-agents/livekit/agents/voice_assistant/voice_assistant.py index 60ceed1d0..a1c7e465e 100644 --- a/livekit-agents/livekit/agents/voice_assistant/voice_assistant.py +++ b/livekit-agents/livekit/agents/voice_assistant/voice_assistant.py @@ -10,31 +10,27 @@ from ..
import stt, tokenize, tts, utils, vad from ..llm import LLM, ChatContext, ChatMessage, FunctionContext, LLMStream +from ..proto import ATTR_AGENT_STATE, AgentState from .agent_output import AgentOutput, SynthesisHandle from .agent_playout import AgentPlayout from .human_input import HumanInput from .log import logger from .plotter import AssistantPlotter +from .speech_handle import SpeechHandle +BeforeLLMCallback = Callable[ + ["VoiceAssistant", ChatContext], + Union[Optional[LLMStream], Awaitable[Optional[LLMStream]], Literal[False]], +] -@dataclass -class _SpeechInfo: - id: str # useful to recognize a specific speech in logs - source: str | LLMStream | AsyncIterable[str] - allow_interruptions: bool - add_to_chat_ctx: bool - synthesis_handle: SynthesisHandle - - # is_reply = True when the speech is answering to a user question - is_reply: bool = False - user_question: str = "" - +WillSynthesizeAssistantReply = BeforeLLMCallback -WillSynthesizeAssistantReply = Callable[ - ["VoiceAssistant", ChatContext], - Union[Optional[LLMStream], Awaitable[Optional[LLMStream]]], +BeforeTTSCallback = Callable[ + ["VoiceAssistant", Union[str, AsyncIterable[str]]], + Union[str, AsyncIterable[str], Awaitable[str]], ] + EventTypes = Literal[ "user_started_speaking", "user_stopped_speaking", @@ -47,7 +43,6 @@ class _SpeechInfo: "function_calls_finished", ] - _CallContextVar = contextvars.ContextVar["AssistantCallContext"]( "voice_assistant_contextvar" ) @@ -77,10 +72,19 @@ def llm_stream(self) -> LLMStream: return self._llm_stream -def _default_will_synthesize_assistant_reply( +def _default_before_llm_cb( assistant: VoiceAssistant, chat_ctx: ChatContext ) -> LLMStream: - return assistant.llm.chat(chat_ctx=chat_ctx, fnc_ctx=assistant.fnc_ctx) + return assistant.llm.chat( + chat_ctx=chat_ctx, + fnc_ctx=assistant.fnc_ctx, + ) + + +def _default_before_tts_cb( + assistant: VoiceAssistant, text: str | AsyncIterable[str] +) -> str | AsyncIterable[str]: + return text @dataclass(frozen=True) 
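Reviewer note: the hunk above replaces `will_synthesize_assistant_reply` with the more general `BeforeLLMCallback` extension point. A minimal sketch of how a caller might use it to inject retrieved context (RAG-style) before the default LLM call — `ChatContext`/`ChatMessage` here are simplified stand-ins for illustration, not the real livekit classes, and `rag_before_llm_cb` is a hypothetical name:

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Simplified stand-ins for livekit.agents.llm.ChatContext / ChatMessage,
# just enough to illustrate the callback contract.
@dataclass
class ChatMessage:
    role: str
    text: str

@dataclass
class ChatContext:
    messages: List[ChatMessage] = field(default_factory=list)

def rag_before_llm_cb(assistant: object, chat_ctx: ChatContext) -> None:
    """A BeforeLLMCallback-style hook: mutate the chat context, then return
    None so the assistant falls back to its default llm.chat() stream."""
    # find the most recent user message to retrieve context for
    last_user = next(
        (m for m in reversed(chat_ctx.messages) if m.role == "user"), None
    )
    if last_user is not None:
        # prepend a system message carrying (hypothetical) retrieved documents
        chat_ctx.messages.insert(
            0, ChatMessage(role="system", text=f"Context for: {last_user.text}")
        )
    return None

ctx = ChatContext(messages=[ChatMessage(role="user", text="what is LiveKit?")])
rag_before_llm_cb(None, ctx)
print(ctx.messages[0].role)  # the injected system message comes first
```

Per the docstring added later in this patch, returning `None` yields the default LLM stream, returning your own `LLMStream` overrides it, and returning `False` cancels the reply entirely.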
@@ -88,8 +92,10 @@ class _ImplOptions: allow_interruptions: bool int_speech_duration: float int_min_words: int + min_endpointing_delay: float preemptive_synthesis: bool - will_synthesize_assistant_reply: WillSynthesizeAssistantReply + before_llm_cb: BeforeLLMCallback + before_tts_cb: BeforeTTSCallback plotting: bool transcription: AssistantTranscriptionOptions @@ -106,7 +112,9 @@ class AssistantTranscriptionOptions: sentence_tokenizer: tokenize.SentenceTokenizer = tokenize.basic.SentenceTokenizer() """The tokenizer used to split the speech into sentences. This is used to decide when to mark a transcript as final for the agent transcription.""" - word_tokenizer: tokenize.WordTokenizer = tokenize.basic.WordTokenizer() + word_tokenizer: tokenize.WordTokenizer = tokenize.basic.WordTokenizer( + ignore_punctuation=False + ) """The tokenizer used to split the speech into words. This is used to simulate the "interim results" of the agent transcription.""" hyphenate_word: Callable[[str], list[str]] = tokenize.basic.hyphenate_word @@ -130,11 +138,15 @@ def __init__( allow_interruptions: bool = True, interrupt_speech_duration: float = 0.5, interrupt_min_words: int = 0, + min_endpointing_delay: float = 0.5, preemptive_synthesis: bool = True, transcription: AssistantTranscriptionOptions = AssistantTranscriptionOptions(), - will_synthesize_assistant_reply: WillSynthesizeAssistantReply = _default_will_synthesize_assistant_reply, + before_llm_cb: BeforeLLMCallback = _default_before_llm_cb, + before_tts_cb: BeforeTTSCallback = _default_before_tts_cb, plotting: bool = False, loop: asyncio.AbstractEventLoop | None = None, + # backward compatibility + will_synthesize_assistant_reply: WillSynthesizeAssistantReply | None = None, ) -> None: """ Create a new VoiceAssistant. @@ -150,23 +162,41 @@ def __init__( interrupt_speech_duration: Minimum duration of speech to consider for interruption. interrupt_min_words: Minimum number of words to consider for interruption. 
Defaults to 0 as this may increase the latency depending on the STT. + min_endpointing_delay: Delay to wait before considering the user finished speaking. preemptive_synthesis: Whether to preemptively synthesize responses. transcription: Options for assistant transcription. - will_synthesize_assistant_reply: Callback called when the assistant is about to synthesize a reply. + before_llm_cb: Callback called when the assistant is about to synthesize a reply. This can be used to customize the reply (e.g: inject context/RAG). + + Returning None will create a default LLM stream. You can also return your own llm + stream by calling the llm.chat() method. + + Returning False will cancel the synthesis of the reply. + before_tts_cb: Callback called when the assistant is about to + synthesize a speech. This can be used to customize text before the speech synthesis. + (e.g: editing the pronunciation of a word). plotting: Whether to enable plotting for debugging. matplotlib must be installed. loop: Event loop to use. Default to asyncio.get_event_loop(). 
""" super().__init__() self._loop = loop or asyncio.get_event_loop() + + if will_synthesize_assistant_reply is not None: + logger.warning( + "will_synthesize_assistant_reply is deprecated and will be removed in 1.5.0, use before_llm_cb instead", + ) + before_llm_cb = will_synthesize_assistant_reply + self._opts = _ImplOptions( plotting=plotting, allow_interruptions=allow_interruptions, int_speech_duration=interrupt_speech_duration, int_min_words=interrupt_min_words, + min_endpointing_delay=min_endpointing_delay, preemptive_synthesis=preemptive_synthesis, transcription=transcription, - will_synthesize_assistant_reply=will_synthesize_assistant_reply, + before_llm_cb=before_llm_cb, + before_tts_cb=before_tts_cb, ) self._plotter = AssistantPlotter(self._loop) @@ -199,21 +229,25 @@ def __init__( # done when the agent output track is published self._track_published_fut = asyncio.Future[None]() - self._pending_agent_reply: _SpeechInfo | None = None - self._pending_agent_reply_task: asyncio.Task[None] | None = None + self._pending_agent_reply: SpeechHandle | None = None + self._agent_reply_task: asyncio.Task[None] | None = None - self._playing_speech: _SpeechInfo | None = None + self._playing_speech: SpeechHandle | None = None self._transcribed_text, self._transcribed_interim_text = "", "" self._deferred_validation = _DeferredReplyValidation( - self._validate_reply_if_possible, loop=self._loop + self._validate_reply_if_possible, + self._opts.min_endpointing_delay, + loop=self._loop, ) - self._speech_q: list[_SpeechInfo] = [] + self._speech_q: list[SpeechHandle] = [] self._speech_q_changed = asyncio.Event() self._last_end_of_speech_time: float | None = None + self._update_state_task: asyncio.Task | None = None + @property def fnc_ctx(self) -> FunctionContext | None: return self._fnc_ctx @@ -306,16 +340,30 @@ async def say( add_to_chat_ctx: Whether to add the speech to the chat context. 
""" await self._track_published_fut - speech_id = utils.shortuuid() - self._add_speech_for_playout( - _SpeechInfo( - id=speech_id, - source=source, - allow_interruptions=allow_interruptions, - add_to_chat_ctx=add_to_chat_ctx, - synthesis_handle=self._synthesize_agent_speech(speech_id, source), - ) + + new_handle = SpeechHandle.create_assistant_speech( + allow_interruptions=allow_interruptions, add_to_chat_ctx=add_to_chat_ctx ) + synthesis_handle = self._synthesize_agent_speech(new_handle.id, source) + new_handle.initialize(source=source, synthesis_handle=synthesis_handle) + self._add_speech_for_playout(new_handle) + + def _update_state(self, state: AgentState, delay: float = 0.0): + """Set the current state of the agent""" + + @utils.log_exceptions(logger=logger) + async def _run_task(delay: float) -> None: + await asyncio.sleep(delay) + + if self._room.isconnected(): + await self._room.local_participant.set_attributes( + {ATTR_AGENT_STATE: state} + ) + + if self._update_state_task is not None: + self._update_state_task.cancel() + + self._update_state_task = asyncio.create_task(_run_task(delay)) async def aclose(self) -> None: """Close the voice assistant""" @@ -349,6 +397,7 @@ def _on_start_of_speech(ev: vad.VADEvent) -> None: self._plotter.plot_event("user_started_speaking") self.emit("user_started_speaking") self._deferred_validation.on_human_start_of_speech(ev) + self._update_state("listening") def _on_vad_updated(ev: vad.VADEvent) -> None: if not self._track_published_fut.done(): @@ -407,15 +456,16 @@ def _on_final_transcript(ev: stt.SpeechEvent) -> None: @utils.log_exceptions(logger=logger) async def _main_task(self) -> None: if self._opts.plotting: - self._plotter.start() + await self._plotter.start() + self._update_state("initializing") audio_source = rtc.AudioSource(self._tts.sample_rate, self._tts.num_channels) track = rtc.LocalAudioTrack.create_audio_track("assistant_voice", audio_source) self._agent_publication = await 
self._room.local_participant.publish_track( track, rtc.TrackPublishOptions(source=rtc.TrackSource.SOURCE_MICROPHONE) ) - agent_playout = AgentPlayout(source=audio_source) + agent_playout = AgentPlayout(audio_source=audio_source) self._agent_output = AgentOutput( room=self._room, agent_playout=agent_playout, @@ -426,10 +476,12 @@ async def _main_task(self) -> None: def _on_playout_started() -> None: self._plotter.plot_event("agent_started_speaking") self.emit("agent_started_speaking") + self._update_state("speaking") def _on_playout_stopped(interrupted: bool) -> None: self._plotter.plot_event("agent_stopped_speaking") self.emit("agent_stopped_speaking") + self._update_state("listening") agent_playout.on("playout_started", _on_playout_started) agent_playout.on("playout_stopped", _on_playout_stopped) @@ -448,103 +500,119 @@ def _on_playout_stopped(interrupted: bool) -> None: self._speech_q_changed.clear() - def _synthesize_agent_reply(self, *, validated: bool = False) -> None: + def _synthesize_agent_reply(self) -> None: """Synthesize the agent reply to the user question, also make sure only one reply is synthesized/played at a time""" - @utils.log_exceptions(logger=logger) - async def _synthesize_answer_task( - old_task: asyncio.Task[None], user_transcript: str - ) -> None: - if old_task is not None: - await utils.aio.gracefully_cancel(old_task) - - user_msg = ChatMessage.create(text=user_transcript, role="user") - copied_ctx = self._chat_ctx.copy() - copied_ctx.messages.append(user_msg) - - llm_stream = self._opts.will_synthesize_assistant_reply(self, copied_ctx) - if asyncio.iscoroutine(llm_stream): - llm_stream = await llm_stream - - # fallback to default impl if no custom/user stream is returned - if not isinstance(llm_stream, LLMStream): - llm_stream = _default_will_synthesize_assistant_reply( - self, chat_ctx=copied_ctx + if self._pending_agent_reply is not None: + self._pending_agent_reply.interrupt() + + if self._human_input is not None and not 
self._human_input.speaking: + self._update_state("thinking", 0.2) + + self._pending_agent_reply = new_handle = SpeechHandle.create_assistant_reply( + allow_interruptions=self._opts.allow_interruptions, + add_to_chat_ctx=True, + user_question=self._transcribed_text, + ) + + self._agent_reply_task = asyncio.create_task( + self._synthesize_answer_task(self._agent_reply_task, new_handle) + ) + + @utils.log_exceptions(logger=logger) + async def _synthesize_answer_task( + self, old_task: asyncio.Task[None], handle: SpeechHandle + ) -> None: + if old_task is not None: + await utils.aio.gracefully_cancel(old_task) + + copied_ctx = self._chat_ctx.copy() + + playing_speech = self._playing_speech + if playing_speech is not None and playing_speech.initialized: + if ( + not playing_speech.user_question or playing_speech.user_commited + ) and not playing_speech.speech_commited: + # the speech is playing but not committed yet, add it to the chat context for this new reply synthesis + copied_ctx.messages.append( + ChatMessage.create( + text=playing_speech.synthesis_handle.tts_forwarder.played_text, + role="assistant", + ) ) - speech_id = utils.shortuuid() - reply = _SpeechInfo( - id=speech_id, - source=llm_stream, - allow_interruptions=self._opts.allow_interruptions, - add_to_chat_ctx=True, - synthesis_handle=self._synthesize_agent_speech(speech_id, llm_stream), - is_reply=True, - user_question=user_transcript, - ) + copied_ctx.messages.append( + ChatMessage.create(text=handle.user_question, role="user") + ) - if self._last_end_of_speech_time is not None: - elapsed = round(time.time() - self._last_end_of_speech_time, 3) - else: - elapsed = -1.0 + llm_stream = self._opts.before_llm_cb(self, copied_ctx) + if llm_stream is False: + return - logger.debug( - "synthesizing agent reply", - extra={ - "user_transcript": user_transcript, - "validated": validated, - "speech_id": reply.id, - "elapsed": elapsed, - }, - ) + if asyncio.iscoroutine(llm_stream): + llm_stream = await llm_stream - 
if validated: - self._add_speech_for_playout(reply) - else: - self._pending_agent_reply = reply + # fallback to default impl if no custom/user stream is returned + if not isinstance(llm_stream, LLMStream): + llm_stream = _default_before_llm_cb(self, chat_ctx=copied_ctx) - # interrupt the current reply synthesis - if self._pending_agent_reply is not None: - self._pending_agent_reply.synthesis_handle.interrupt() - self._pending_agent_reply = None + if handle.interrupted: + return - self._pending_agent_reply_task = asyncio.create_task( - _synthesize_answer_task( - self._pending_agent_reply_task, self._transcribed_text - ) + synthesis_handle = self._synthesize_agent_speech(handle.id, llm_stream) + handle.initialize(source=llm_stream, synthesis_handle=synthesis_handle) + + # TODO(theomonnom): Find a more reliable way to get the elapsed time from the last end of speech + # (VAD could not have detected any speech - maybe unlikely?) + if self._last_end_of_speech_time is not None: + elapsed = round(time.time() - self._last_end_of_speech_time, 3) + else: + elapsed = -1.0 + + logger.debug( + "synthesizing agent reply", + extra={ + "user_transcript": handle.user_question, + "speech_id": handle.id, + "elapsed": elapsed, + }, ) - async def _play_speech(self, speech_info: _SpeechInfo) -> None: - synthesis_handle = speech_info.synthesis_handle + async def _play_speech(self, speech_handle: SpeechHandle) -> None: + try: + await speech_handle.wait_for_initialization() + except asyncio.CancelledError: + return + + await self._agent_publication.wait_for_subscription() + + synthesis_handle = speech_handle.synthesis_handle if synthesis_handle.interrupted: return - user_question = speech_info.user_question - user_speech_committed = False + user_question = speech_handle.user_question play_handle = synthesis_handle.play() join_fut = play_handle.join() def _commit_user_question_if_needed() -> None: - nonlocal user_speech_committed - if ( not user_question or synthesis_handle.interrupted - or 
user_speech_committed + or speech_handle.user_commited ): return - is_using_tools = isinstance(speech_info.source, LLMStream) and len( - speech_info.source.function_calls + is_using_tools = isinstance(speech_handle.source, LLMStream) and len( + speech_handle.source.function_calls ) # make sure at least some speech was played before committing the user message # since we try to validate as fast as possible it is possible the agent gets interrupted # really quickly (barely audible), we don't want to mark this question as "answered". if ( - speech_info.allow_interruptions + speech_handle.allow_interruptions and not is_using_tools and ( play_handle.time_played < self.MIN_TIME_PLAYED_FOR_COMMIT @@ -561,7 +629,7 @@ def _commit_user_question_if_needed() -> None: self.emit("user_speech_committed", user_msg) self._transcribed_text = self._transcribed_text[len(user_question) :] - user_speech_committed = True + speech_handle.mark_user_commited() # wait for the play_handle to finish and check every 1s if the user question should be committed _commit_user_question_if_needed() @@ -572,13 +640,16 @@ def _commit_user_question_if_needed() -> None: ) _commit_user_question_if_needed() + + if speech_handle.interrupted: + break _commit_user_question_if_needed() - collected_text = speech_info.synthesis_handle.tts_forwarder.played_text - interrupted = speech_info.synthesis_handle.interrupted - is_using_tools = isinstance(speech_info.source, LLMStream) and len( - speech_info.source.function_calls + collected_text = speech_handle.synthesis_handle.tts_forwarder.played_text + interrupted = speech_handle.interrupted + is_using_tools = isinstance(speech_handle.source, LLMStream) and len( + speech_handle.source.function_calls ) extra_tools_messages = [] # additional messages from the functions to add to the context if needed @@ -586,16 +657,16 @@ def _commit_user_question_if_needed() -> None: # if the answer is using tools, execute the functions and automatically generate # a response to the 
user question from the returned values if is_using_tools and not interrupted: - assert isinstance(speech_info.source, LLMStream) + assert isinstance(speech_handle.source, LLMStream) assert ( - not user_question or user_speech_committed + not user_question or speech_handle.user_commited ), "user speech should have been committed before using tools" # execute functions - call_ctx = AssistantCallContext(self, speech_info.source) + call_ctx = AssistantCallContext(self, speech_handle.source) tk = _CallContextVar.set(call_ctx) - self.emit("function_calls_collected", speech_info.source.function_calls) - called_fncs_info = speech_info.source.function_calls + self.emit("function_calls_collected", speech_handle.source.function_calls) + called_fncs_info = speech_handle.source.function_calls called_fncs = [] for fnc in called_fncs_info: @@ -605,7 +676,7 @@ def _commit_user_question_if_needed() -> None: "executing ai function", extra={ "function": fnc.function_info.name, - "speech_id": speech_info.id, + "speech_id": speech_handle.id, }, ) try: @@ -633,24 +704,27 @@ def _commit_user_question_if_needed() -> None: extra_tools_messages.append(ChatMessage.create_tool_calls(tool_calls)) extra_tools_messages.extend(tool_calls_results_msg) - chat_ctx = speech_info.source.chat_ctx.copy() + chat_ctx = speech_handle.source.chat_ctx.copy() chat_ctx.messages.extend(extra_tools_messages) answer_llm_stream = self._llm.chat( - chat_ctx=chat_ctx, fnc_ctx=self._fnc_ctx + chat_ctx=chat_ctx, + fnc_ctx=self._fnc_ctx, ) answer_synthesis = self._synthesize_agent_speech( - speech_info.id, answer_llm_stream + speech_handle.id, answer_llm_stream ) # replace the synthesis handle with the new one to allow interruption - speech_info.synthesis_handle = answer_synthesis + speech_handle.synthesis_handle = answer_synthesis play_handle = answer_synthesis.play() await play_handle.join() collected_text = answer_synthesis.tts_forwarder.played_text interrupted = answer_synthesis.interrupted - if 
speech_info.add_to_chat_ctx and (not user_question or user_speech_committed): + if speech_handle.add_to_chat_ctx and ( + not user_question or speech_handle.user_commited + ): self._chat_ctx.messages.extend(extra_tools_messages) if interrupted: @@ -659,6 +733,8 @@ def _commit_user_question_if_needed() -> None: msg = ChatMessage.create(text=collected_text, role="assistant") self._chat_ctx.messages.append(msg) + speech_handle.mark_speech_commited() + if interrupted: self.emit("agent_speech_interrupted", msg) else: @@ -669,7 +745,7 @@ def _commit_user_question_if_needed() -> None: extra={ "agent_transcript": collected_text, "interrupted": interrupted, - "speech_id": speech_info.id, + "speech_id": speech_handle.id, }, ) @@ -685,9 +761,19 @@ def _synthesize_agent_speech( if isinstance(source, LLMStream): source = _llm_stream_to_str_iterable(speech_id, source) + og_source = source + transcript_source = source + if isinstance(og_source, AsyncIterable): + og_source, transcript_source = utils.aio.itertools.tee(og_source, 2) + + tts_source = self._opts.before_tts_cb(self, og_source) + if tts_source is None: + logger.error("before_tts_cb must return str or AsyncIterable[str]") + return self._agent_output.synthesize( speech_id=speech_id, - transcript=source, + tts_source=tts_source, + transcript_source=transcript_source, transcription=self._opts.transcription.agent_transcription, transcription_speed=self._opts.transcription.agent_transcription_speed, sentence_tokenizer=self._opts.transcription.sentence_tokenizer, @@ -697,35 +783,42 @@ def _synthesize_agent_speech( def _validate_reply_if_possible(self) -> None: """Check if the new agent speech should be played""" - if ( - self._pending_agent_reply is not None - and not self._pending_agent_reply.synthesis_handle.interrupted - ): - # in some timing, we could end up with two pushed agent replies inside the speech queue. 
- # so make sure we directly interrupt every reply when pushing a new one - for speech in self._speech_q: - if speech.allow_interruptions and speech.is_reply: - speech.synthesis_handle.interrupt() - logger.debug( - "validated agent reply", - extra={"speech_id": self._pending_agent_reply.id}, - ) - self._add_speech_for_playout(self._pending_agent_reply) - self._pending_agent_reply = None - elif not self._opts.preemptive_synthesis and self._transcribed_text: - # validated=True is going to call _add_speech_for_playout - self._synthesize_agent_reply(validated=True) + if self._pending_agent_reply is None: + if self._opts.preemptive_synthesis or not self._transcribed_text: + return - # self._transcribed_text is reset after MIN_TIME_PLAYED_FOR_COMMIT, see self._play_speech + self._synthesize_agent_reply() # this will populate self._pending_agent_reply + + assert self._pending_agent_reply is not None + + # in some bad timing, we could end up with two pushed agent replies inside the speech queue. 
+ # so make sure we directly interrupt every reply when validating a new one + for speech in self._speech_q: + if not speech.is_reply: + continue + + if not speech.allow_interruptions: + return # we shouldn't validate this speech to avoid stacking replies + + speech.interrupt() + + logger.debug( + "validated agent reply", + extra={"speech_id": self._pending_agent_reply.id}, + ) + + self._add_speech_for_playout(self._pending_agent_reply) + self._pending_agent_reply = None self._transcribed_interim_text = "" + # self._transcribed_text is reset after MIN_TIME_PLAYED_FOR_COMMIT, see self._play_speech def _interrupt_if_possible(self) -> None: """Check whether the current assistant speech should be interrupted""" if ( self._playing_speech is None or not self._playing_speech.allow_interruptions - or self._playing_speech.synthesis_handle.interrupted + or self._playing_speech.interrupted ): return @@ -738,10 +831,10 @@ def _interrupt_if_possible(self) -> None: if len(interim_words) < self._opts.int_min_words: return - self._playing_speech.synthesis_handle.interrupt() + self._playing_speech.interrupt() - def _add_speech_for_playout(self, speech: _SpeechInfo) -> None: - self._speech_q.append(speech) + def _add_speech_for_playout(self, speech_handle: SpeechHandle) -> None: + self._speech_q.append(speech_handle) self._speech_q_changed.set() @@ -773,14 +866,15 @@ class _DeferredReplyValidation: # if the STT gives us punctuation, we can try validate the reply faster. PUNCTUATION = ".!?" 
- PUNCTUATION_REDUCE_FACTOR = 0.5 + PUNCTUATION_REDUCE_FACTOR = 0.75 - DEFER_DELAY_END_OF_SPEECH = 0.2 - DEFER_DELAY_FINAL_TRANSCRIPT = 1.0 LATE_TRANSCRIPT_TOLERANCE = 1.5 # late compared to end of speech def __init__( - self, validate_fnc: Callable[[], None], loop: asyncio.AbstractEventLoop + self, + validate_fnc: Callable[[], None], + min_endpointing_delay: float, + loop: asyncio.AbstractEventLoop | None = None, ) -> None: self._validate_fnc = validate_fnc self._validating_task: asyncio.Task | None = None @@ -788,6 +882,9 @@ def __init__( self._last_recv_end_of_speech_time: float = 0.0 self._speaking = False + self._end_of_speech_delay = min_endpointing_delay + self._final_transcript_delay = min_endpointing_delay + 1.0 + @property def validating(self) -> bool: return self._validating_task is not None and not self._validating_task.done() @@ -803,9 +900,9 @@ def on_human_final_transcript(self, transcript: str) -> None: < self.LATE_TRANSCRIPT_TOLERANCE ) delay = ( - self.DEFER_DELAY_END_OF_SPEECH + self._end_of_speech_delay if has_recent_end_of_speech - else self.DEFER_DELAY_FINAL_TRANSCRIPT + else self._final_transcript_delay ) delay = ( delay * self.PUNCTUATION_REDUCE_FACTOR @@ -827,7 +924,7 @@ def on_human_end_of_speech(self, ev: vad.VADEvent) -> None: if self._last_final_transcript: delay = ( - self.DEFER_DELAY_END_OF_SPEECH * self.PUNCTUATION_REDUCE_FACTOR + self._end_of_speech_delay * self.PUNCTUATION_REDUCE_FACTOR if self._end_with_punctuation() else 1.0 ) diff --git a/livekit-agents/livekit/agents/worker.py b/livekit-agents/livekit/agents/worker.py index 8aaf56d6c..1193ea269 100644 --- a/livekit-agents/livekit/agents/worker.py +++ b/livekit-agents/livekit/agents/worker.py @@ -17,12 +17,15 @@ import asyncio import contextlib import datetime +import math import multiprocessing as mp import os +import sys import threading from dataclasses import dataclass, field +from enum import Enum from functools import reduce -from typing import Any, Callable, Coroutine, 
Literal +from typing import Any, Awaitable, Callable, Generic, Literal, TypeVar from urllib.parse import urljoin, urlparse import aiohttp @@ -33,7 +36,14 @@ from . import http_server, ipc, utils from .exceptions import AssignmentTimeoutError -from .job import JobAcceptArguments, JobContext, JobProcess, JobRequest, RunningJobInfo +from .job import ( + JobAcceptArguments, + JobContext, + JobExecutorType, + JobProcess, + JobRequest, + RunningJobInfo, +) from .log import DEV_LEVEL, logger from .version import __version__ @@ -49,6 +59,11 @@ async def _default_request_fnc(ctx: JobRequest) -> None: await ctx.accept() +class WorkerType(Enum): + ROOM = agent.JobType.JT_ROOM + PUBLISHER = agent.JobType.JT_PUBLISHER + + class _DefaultLoadCalc: _instance = None @@ -89,12 +104,34 @@ class WorkerPermissions: hidden: bool = False +if sys.platform.startswith("win"): + # Some python versions on Windows gets a BrokenPipeError when creating a new process + _default_job_executor_type = JobExecutorType.THREAD +else: + _default_job_executor_type = JobExecutorType.PROCESS + + +T = TypeVar("T") + + +@dataclass(frozen=True) +class _WorkerEnvOption(Generic[T]): + dev_default: T + prod_default: T + + @staticmethod + def getvalue(opt: T | _WorkerEnvOption[T], devmode: bool) -> T: + if isinstance(opt, _WorkerEnvOption): + return opt.dev_default if devmode else opt.prod_default + return opt + + # NOTE: this object must be pickle-able @dataclass class WorkerOptions: - entrypoint_fnc: Callable[[JobContext], Coroutine] + entrypoint_fnc: Callable[[JobContext], Awaitable[None]] """Entrypoint function that will be called when a job is assigned to this worker.""" - request_fnc: Callable[[JobRequest], Coroutine] = _default_request_fnc + request_fnc: Callable[[JobRequest], Awaitable[None]] = _default_request_fnc """Inspect the request and decide if the current worker should handle it. 
When left empty, all jobs are accepted.""" @@ -102,9 +139,18 @@ class WorkerOptions: """A function to perform any necessary initialization before the job starts.""" load_fnc: Callable[[], float] = _DefaultLoadCalc.get_load """Called to determine the current load of the worker. Should return a value between 0 and 1.""" - load_threshold: float = 0.65 - """When the load exceeds this threshold, the worker will be marked as unavailable.""" - num_idle_processes: int = 3 + job_executor_type: JobExecutorType = _default_job_executor_type + """Which executor to use to run jobs. (currently thread or process are supported)""" + load_threshold: float | _WorkerEnvOption[float] = _WorkerEnvOption( + dev_default=math.inf, prod_default=0.75 + ) + """When the load exceeds this threshold, the worker will be marked as unavailable. + + Defaults to 0.75 on "production" mode, and is disabled in "development" mode. + """ + num_idle_processes: int | _WorkerEnvOption[int] = _WorkerEnvOption( + dev_default=0, prod_default=3 + ) """Number of idle processes to keep warm.""" shutdown_process_timeout: float = 60.0 """Maximum amount of time to wait for a job to shut down gracefully""" @@ -114,7 +160,9 @@ class WorkerOptions: """Namespace for the agent to be in""" permissions: WorkerPermissions = field(default_factory=WorkerPermissions) """Permissions that the agent should join the room with.""" - worker_type: agent.JobType = agent.JobType.JT_ROOM + agent_name: str = "" + """Agent name can be used when multiple agents are required to join the same room. 
The LiveKit SFU will dispatch jobs to unique agent_name workers independently.""" + worker_type: WorkerType = WorkerType.ROOM """Whether to spin up an agent for each room or publisher.""" max_retry: int = 16 """Maximum number of times to retry connecting to LiveKit.""" @@ -131,10 +179,13 @@ class WorkerOptions: By default it uses ``LIVEKIT_API_SECRET`` from environment""" host: str = "" # default to all interfaces - port: int = 8081 + port: int | _WorkerEnvOption[int] = _WorkerEnvOption( + dev_default=0, prod_default=8081 + ) """Port for local HTTP server to listen on. - The HTTP server is used as a health check endpoint.""" + The HTTP server is used as a health check endpoint. + """ EventTypes = Literal["worker_registered"] @@ -142,10 +193,14 @@ class WorkerOptions: class Worker(utils.EventEmitter[EventTypes]): def __init__( - self, opts: WorkerOptions, *, loop: asyncio.AbstractEventLoop | None = None + self, + opts: WorkerOptions, + *, + devmode: bool = True, + loop: asyncio.AbstractEventLoop | None = None, ) -> None: super().__init__() - opts.ws_url = opts.ws_url or opts.ws_url or os.environ.get("LIVEKIT_URL") or "" + opts.ws_url = opts.ws_url or os.environ.get("LIVEKIT_URL") or "" opts.api_key = opts.api_key or os.environ.get("LIVEKIT_API_KEY") or "" opts.api_secret = opts.api_secret or os.environ.get("LIVEKIT_API_SECRET") or "" @@ -173,6 +228,7 @@ def __init__( self._pending_assignments: dict[str, asyncio.Future[agent.JobAssignment]] = {} self._close_future: asyncio.Future[None] | None = None self._msg_chan = utils.aio.Chan[agent.WorkerMessage](128, loop=self._loop) + self._devmode = devmode # using spawn context for all platforms. 
We may have further optimizations for # Linux with forkserver, but for now, this is the safest option @@ -180,8 +236,11 @@ def __init__( self._proc_pool = ipc.proc_pool.ProcPool( initialize_process_fnc=opts.prewarm_fnc, job_entrypoint_fnc=opts.entrypoint_fnc, - num_idle_processes=opts.num_idle_processes, + num_idle_processes=_WorkerEnvOption.getvalue( + opts.num_idle_processes, self._devmode + ), loop=self._loop, + job_executor_type=opts.job_executor_type, mp_ctx=mp_ctx, initialize_timeout=opts.initialize_process_timeout, close_timeout=opts.shutdown_process_timeout, @@ -190,10 +249,12 @@ def __init__( self._api: api.LiveKitAPI | None = None self._http_session: aiohttp.ClientSession | None = None self._http_server = http_server.HttpServer( - opts.host, opts.port, loop=self._loop + opts.host, + _WorkerEnvOption.getvalue(opts.port, self._devmode), + loop=self._loop, ) - self._main_task: asyncio.Task | None = None + self._main_task: asyncio.Task[None] | None = None async def run(self): if not self._closed: @@ -346,7 +407,7 @@ async def _worker_task(self) -> None: # register the worker req = agent.WorkerMessage() - req.register.type = self._opts.worker_type + req.register.type = self._opts.worker_type.value req.register.allowed_permissions.CopyFrom( models.ParticipantPermission( can_publish=self._opts.permissions.can_publish, @@ -409,7 +470,9 @@ async def _load_task(): None, self._opts.load_fnc ) - is_full = current_load >= self._opts.load_threshold + is_full = current_load >= _WorkerEnvOption.getvalue( + self._opts.load_threshold, self._devmode + ) currently_available = not is_full and not self._draining current_status = ( @@ -479,6 +542,13 @@ async def _recv_task(): self._handle_availability(msg.availability) elif which == "assignment": self._handle_assignment(msg.assignment) + elif which == "termination": + user_task = self._loop.create_task( + self._handle_termination(msg.termination), + name="agent_job_termination", + ) + self._tasks.add(user_task) + 
user_task.add_done_callback(self._tasks.discard) tasks = [ asyncio.create_task(_load_task()), @@ -492,7 +562,11 @@ async def _recv_task(): async def _reload_jobs(self, jobs: list[RunningJobInfo]) -> None: for aj in jobs: - logger.log(DEV_LEVEL, "reloading job", extra={"job_id": aj.job.id}) + logger.log( + DEV_LEVEL, + "reloading job", + extra={"job_id": aj.job.id, "agent_name": aj.job.agent_name}, + ) url = self._opts.ws_url # take the original jwt token and extend it while keeping all the same data that was generated @@ -561,7 +635,7 @@ async def _on_accept(args: JobAcceptArguments) -> None: except asyncio.TimeoutError: logger.warning( f"assignment for job {job_req.id} timed out", - extra={"job_request": job_req}, + extra={"job_request": job_req, "agent_name": self._opts.agent_name}, ) raise AssignmentTimeoutError() @@ -579,7 +653,11 @@ async def _on_accept(args: JobAcceptArguments) -> None: logger.info( "received job request", - extra={"job_request": msg.job, "resuming": msg.resuming}, + extra={ + "job_request": msg.job, + "resuming": msg.resuming, + "agent_name": self._opts.agent_name, + }, ) @utils.log_exceptions(logger=logger) @@ -588,13 +666,14 @@ async def _job_request_task(): await self._opts.request_fnc(job_req) except Exception: logger.exception( - "job_request_fnc failed", extra={"job_request": job_req} + "job_request_fnc failed", + extra={"job_request": job_req, "agent_name": self._opts.agent_name}, ) if not answered: logger.warning( "no answer was given inside the job_request_fnc, automatically rejecting the job", - extra={"job_request": job_req}, + extra={"job_request": job_req, "agent_name": self._opts.agent_name}, ) await _on_reject() @@ -609,5 +688,13 @@ def _handle_assignment(self, assignment: agent.JobAssignment): fut.set_result(assignment) else: logger.warning( - "received assignment for an unknown job", extra={"job": assignment.job} + "received assignment for an unknown job", + extra={"job": assignment.job, "agent_name": self._opts.agent_name}, 
) + + async def _handle_termination(self, msg: agent.JobTermination): + proc = self._proc_pool.get_by_job_id(msg.job_id) + if not proc: + # safe to ignore + return + await proc.aclose() diff --git a/livekit-agents/package.json b/livekit-agents/package.json index cc0f161f3..2327f51d3 100644 --- a/livekit-agents/package.json +++ b/livekit-agents/package.json @@ -1,5 +1,5 @@ { "name": "livekit-agents", "private": true, - "version": "0.8.5" + "version": "0.9.0" } diff --git a/livekit-agents/setup.py b/livekit-agents/setup.py index ca4a6eae7..93716feb1 100644 --- a/livekit-agents/setup.py +++ b/livekit-agents/setup.py @@ -48,7 +48,7 @@ python_requires=">=3.9.0", install_requires=[ "click~=8.1", - "livekit~=0.12", + "livekit>=0.16.3", "livekit-api~=0.6", "livekit-protocol~=0.6", "protobuf>=3", @@ -57,6 +57,7 @@ "watchfiles~=0.22", "psutil~=5.9", "aiohttp~=3.10", + "typing-extensions~=4.12", ], extras_require={ ':sys_platform=="win32"': [ diff --git a/livekit-plugins/install_plugins_editable.sh b/livekit-plugins/install_plugins_editable.sh index a56570708..eead3d9f8 100755 --- a/livekit-plugins/install_plugins_editable.sh +++ b/livekit-plugins/install_plugins_editable.sh @@ -16,3 +16,4 @@ pip install -e ./livekit-plugins-nltk --config-settings editable_mode=strict pip install -e ./livekit-plugins-openai --config-settings editable_mode=strict pip install -e ./livekit-plugins-rag --config-settings editable_mode=strict pip install -e ./livekit-plugins-silero --config-settings editable_mode=strict +pip install -e ./livekit-plugins-browser --config-settings editable_mode=strict diff --git a/livekit-plugins/livekit-plugins-anthropic/CHANGELOG.md b/livekit-plugins/livekit-plugins-anthropic/CHANGELOG.md new file mode 100644 index 000000000..81b9b2221 --- /dev/null +++ b/livekit-plugins/livekit-plugins-anthropic/CHANGELOG.md @@ -0,0 +1,13 @@ +# livekit-plugins-anthropic + +## 0.2.1 + +### Patch Changes + +- Fixes to Anthropic Function Calling - 
[#708](https://github.com/livekit/agents/pull/708) ([@keepingitneil](https://github.com/keepingitneil)) + +## 0.2.0 + +### Minor Changes + +- bump anthropic for release - [#724](https://github.com/livekit/agents/pull/724) ([@theomonnom](https://github.com/theomonnom)) diff --git a/livekit-plugins/livekit-plugins-anthropic/README.md b/livekit-plugins/livekit-plugins-anthropic/README.md new file mode 100644 index 000000000..3eabfa1c2 --- /dev/null +++ b/livekit-plugins/livekit-plugins-anthropic/README.md @@ -0,0 +1,13 @@ +# LiveKit Plugins Anthropic + +Agent Framework plugin for services from Anthropic. + +## Installation + +```bash +pip install livekit-plugins-anthropic +``` + +## Pre-requisites + +You'll need an API key from Anthropic. It can be set as an environment variable: `ANTHROPIC_API_KEY` diff --git a/livekit-plugins/livekit-plugins-anthropic/livekit/plugins/anthropic/__init__.py b/livekit-plugins/livekit-plugins-anthropic/livekit/plugins/anthropic/__init__.py new file mode 100644 index 000000000..464766951 --- /dev/null +++ b/livekit-plugins/livekit-plugins-anthropic/livekit/plugins/anthropic/__init__.py @@ -0,0 +1,37 @@ +# Copyright 2023 LiveKit, Inc. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+ + +from .llm import LLM, LLMStream +from .log import logger +from .models import ChatModels +from .version import __version__ + +__all__ = [ + "LLM", + "LLMStream", + "ChatModels", + "logger", + "__version__", +] + +from livekit.agents import Plugin + + +class AnthropicPlugin(Plugin): + def __init__(self) -> None: + super().__init__(__name__, __version__, __package__, logger) + + +Plugin.register_plugin(AnthropicPlugin()) diff --git a/livekit-plugins/livekit-plugins-anthropic/livekit/plugins/anthropic/llm.py b/livekit-plugins/livekit-plugins-anthropic/livekit/plugins/anthropic/llm.py new file mode 100644 index 000000000..6fa6df13c --- /dev/null +++ b/livekit-plugins/livekit-plugins-anthropic/livekit/plugins/anthropic/llm.py @@ -0,0 +1,511 @@ +# Copyright 2023 LiveKit, Inc. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+ +from __future__ import annotations + +import base64 +import inspect +import json +import os +from dataclasses import dataclass +from typing import Any, Awaitable, List, Tuple, get_args, get_origin + +import httpx +from livekit import rtc +from livekit.agents import llm, utils + +import anthropic + +from .log import logger +from .models import ( + ChatModels, +) + + +@dataclass +class LLMOptions: + model: str | ChatModels + user: str | None + temperature: float | None + + +class LLM(llm.LLM): + def __init__( + self, + *, + model: str | ChatModels = "claude-3-haiku-20240307", + api_key: str | None = None, + base_url: str | None = None, + user: str | None = None, + client: anthropic.AsyncClient | None = None, + temperature: float | None = None, + ) -> None: + """ + Create a new instance of Anthropic LLM. + + ``api_key`` must be set to your Anthropic API key, either using the argument or by setting + the ``ANTHROPIC_API_KEY`` environmental variable. + """ + # throw an error on our end + api_key = api_key or os.environ.get("ANTHROPIC_API_KEY") + if api_key is None: + raise ValueError("Anthropic API key is required") + + self._opts = LLMOptions(model=model, user=user, temperature=temperature) + self._client = client or anthropic.AsyncClient( + api_key=api_key, + base_url=base_url, + http_client=httpx.AsyncClient( + timeout=5.0, + follow_redirects=True, + limits=httpx.Limits( + max_connections=1000, + max_keepalive_connections=100, + keepalive_expiry=120, + ), + ), + ) + + def chat( + self, + *, + chat_ctx: llm.ChatContext, + fnc_ctx: llm.FunctionContext | None = None, + temperature: float | None = None, + n: int | None = 1, + parallel_tool_calls: bool | None = None, + ) -> "LLMStream": + if temperature is None: + temperature = self._opts.temperature + + opts: dict[str, Any] = dict() + if fnc_ctx and len(fnc_ctx.ai_functions) > 0: + fncs_desc: list[anthropic.types.ToolParam] = [] + for fnc in fnc_ctx.ai_functions.values(): + 
fncs_desc.append(_build_function_description(fnc)) + + opts["tools"] = fncs_desc + + if fnc_ctx and parallel_tool_calls is not None: + opts["parallel_tool_calls"] = parallel_tool_calls + + latest_system_message = _latest_system_message(chat_ctx) + anthropic_ctx = _build_anthropic_context(chat_ctx.messages, id(self)) + collaped_anthropic_ctx = _merge_messages(anthropic_ctx) + stream = self._client.messages.create( + max_tokens=opts.get("max_tokens", 1000), + system=latest_system_message, + messages=collaped_anthropic_ctx, + model=self._opts.model, + temperature=temperature or anthropic.NOT_GIVEN, + top_k=n or anthropic.NOT_GIVEN, + stream=True, + **opts, + ) + + return LLMStream(anthropic_stream=stream, chat_ctx=chat_ctx, fnc_ctx=fnc_ctx) + + +class LLMStream(llm.LLMStream): + def __init__( + self, + *, + anthropic_stream: Awaitable[ + anthropic.AsyncStream[anthropic.types.RawMessageStreamEvent] + ], + chat_ctx: llm.ChatContext, + fnc_ctx: llm.FunctionContext | None, + ) -> None: + super().__init__(chat_ctx=chat_ctx, fnc_ctx=fnc_ctx) + self._awaitable_anthropic_stream = anthropic_stream + self._anthropic_stream: ( + anthropic.AsyncStream[anthropic.types.RawMessageStreamEvent] | None + ) = None + + # current function call that we're waiting for full completion (args are streamed) + self._tool_call_id: str | None = None + self._fnc_name: str | None = None + self._fnc_raw_arguments: str | None = None + + async def aclose(self) -> None: + if self._anthropic_stream: + await self._anthropic_stream.close() + + return await super().aclose() + + async def __anext__(self): + if not self._anthropic_stream: + self._anthropic_stream = await self._awaitable_anthropic_stream + + fn_calling_enabled = self._fnc_ctx is not None + ignore = False + + async for event in self._anthropic_stream: + if event.type == "message_start": + pass + elif event.type == "message_delta": + pass + elif event.type == "message_stop": + pass + elif event.type == "content_block_start": + if 
event.content_block.type == "tool_use": + self._tool_call_id = event.content_block.id + self._fnc_raw_arguments = "" + self._fnc_name = event.content_block.name + elif event.type == "content_block_delta": + delta = event.delta + if delta.type == "text_delta": + text = delta.text + + # Anthropic seems to add a prompt when tool calling is enabled + # where responses always start with a "" block containing + # the LLM's chain of thought. It's very verbose and not useful for voice + # applications. + if fn_calling_enabled: + if text.startswith(""): + ignore = True + + if "" in text: + text = text.split("")[-1] + ignore = False + + if ignore: + continue + + return llm.ChatChunk( + choices=[ + llm.Choice( + delta=llm.ChoiceDelta(content=text, role="assistant") + ) + ] + ) + elif delta.type == "input_json_delta": + assert self._fnc_raw_arguments is not None + self._fnc_raw_arguments += delta.partial_json + + elif event.type == "content_block_stop": + if self._tool_call_id is not None and self._fnc_ctx: + assert self._fnc_name is not None + assert self._fnc_raw_arguments is not None + fnc_info = _create_ai_function_info( + self._fnc_ctx, + self._tool_call_id, + self._fnc_name, + self._fnc_raw_arguments, + ) + self._function_calls_info.append(fnc_info) + chunk = llm.ChatChunk( + choices=[ + llm.Choice( + delta=llm.ChoiceDelta( + role="assistant", tool_calls=[fnc_info] + ), + index=0, + ) + ] + ) + self._tool_call_id = None + self._fnc_raw_arguments = None + self._fnc_name = None + return chunk + + raise StopAsyncIteration + + +def _latest_system_message(chat_ctx: llm.ChatContext) -> str: + latest_system_message: llm.ChatMessage | None = None + for m in chat_ctx.messages: + if m.role == "system": + latest_system_message = m + continue + + latest_system_str = "" + if latest_system_message: + if isinstance(latest_system_message.content, str): + latest_system_str = latest_system_message.content + elif isinstance(latest_system_message.content, list): + latest_system_str = " 
".join( + [c for c in latest_system_message.content if isinstance(c, str)] + ) + return latest_system_str + + +def _merge_messages( + messages: List[anthropic.types.MessageParam], +) -> List[anthropic.types.MessageParam]: + # Anthropic enforces alternating messages + combined_messages: list[anthropic.types.MessageParam] = [] + for m in messages: + if len(combined_messages) == 0 or m["role"] != combined_messages[-1]["role"]: + combined_messages.append(m) + continue + last_message = combined_messages[-1] + if not isinstance(last_message["content"], list) or not isinstance( + m["content"], list + ): + logger.error("message content is not a list") + continue + + last_message["content"].extend(m["content"]) + + if len(combined_messages) == 0 or combined_messages[0]["role"] != "user": + combined_messages.insert( + 0, {"role": "user", "content": [{"type": "text", "text": "(empty)"}]} + ) + + return combined_messages + + +def _build_anthropic_context( + chat_ctx: List[llm.ChatMessage], cache_key: Any +) -> List[anthropic.types.MessageParam]: + result: List[anthropic.types.MessageParam] = [] + for msg in chat_ctx: + a_msg = _build_anthropic_message(msg, cache_key, chat_ctx) + if a_msg: + result.append(a_msg) + return result + + +def _build_anthropic_message( + msg: llm.ChatMessage, cache_key: Any, chat_ctx: List[llm.ChatMessage] +) -> anthropic.types.MessageParam | None: + if msg.role == "user" or msg.role == "assistant": + a_msg: anthropic.types.MessageParam = { + "role": msg.role, + "content": [], + } + assert isinstance(a_msg["content"], list) + a_content = a_msg["content"] + + # add content if provided + if isinstance(msg.content, str): + a_msg["content"].append( + anthropic.types.TextBlock( + text=msg.content, + type="text", + ) + ) + elif isinstance(msg.content, list): + for cnt in msg.content: + if isinstance(cnt, str): + content: anthropic.types.TextBlock = anthropic.types.TextBlock( + text=cnt, + type="text", + ) + a_content.append(content) + elif isinstance(cnt, 
llm.ChatImage): + a_content.append(_build_anthropic_image_content(cnt, cache_key)) + + if msg.tool_calls is not None: + for fnc in msg.tool_calls: + tool_use = anthropic.types.ToolUseBlockParam( + id=fnc.tool_call_id, + type="tool_use", + name=fnc.function_info.name, + input=fnc.arguments, + ) + a_content.append(tool_use) + + return a_msg + elif msg.role == "tool": + if not isinstance(msg.content, str): + logger.warning("tool message content is not a string") + return None + if not msg.tool_call_id: + return None + + u_content = anthropic.types.ToolResultBlockParam( + tool_use_id=msg.tool_call_id, + type="tool_result", + content=msg.content, + is_error=msg.tool_exception is not None, + ) + return { + "role": "user", + "content": [u_content], + } + + return None + + +def _build_anthropic_image_content( + image: llm.ChatImage, cache_key: Any +) -> anthropic.types.ImageBlockParam: + if isinstance(image.image, str): # image url + logger.warning( + "image url not supported by anthropic, skipping image '%s'", image.image + ) + elif isinstance(image.image, rtc.VideoFrame): # VideoFrame + if cache_key not in image._cache: + # inside our internal implementation, we allow to put extra metadata to + # each ChatImage (avoid to reencode each time we do a chatcompletion request) + opts = utils.images.EncodeOptions() + if image.inference_width and image.inference_height: + opts.resize_options = utils.images.ResizeOptions( + width=image.inference_width, + height=image.inference_height, + strategy="center_aspect_fit", + ) + + encoded_data = utils.images.encode(image.image, opts) + image._cache[cache_key] = base64.b64encode(encoded_data).decode("utf-8") + + return { + "type": "image", + "source": { + "type": "base64", + "data": image._cache[cache_key], + "media_type": "image/jpeg", + }, + } + + raise ValueError(f"unknown image type {type(image.image)}") + + +def _create_ai_function_info( + fnc_ctx: llm.function_context.FunctionContext, + tool_call_id: str, + fnc_name: str, + 
raw_arguments: str, # JSON string +) -> llm.function_context.FunctionCallInfo: + if fnc_name not in fnc_ctx.ai_functions: + raise ValueError(f"AI function {fnc_name} not found") + + parsed_arguments: dict[str, Any] = {} + try: + if raw_arguments: # ignore empty string + parsed_arguments = json.loads(raw_arguments) + except json.JSONDecodeError: + raise ValueError( + f"AI function {fnc_name} received invalid JSON arguments - {raw_arguments}" + ) + + fnc_info = fnc_ctx.ai_functions[fnc_name] + + # Ensure all necessary arguments are present and of the correct type. + sanitized_arguments: dict[str, Any] = {} + for arg_info in fnc_info.arguments.values(): + if arg_info.name not in parsed_arguments: + if arg_info.default is inspect.Parameter.empty: + raise ValueError( + f"AI function {fnc_name} missing required argument {arg_info.name}" + ) + continue + + arg_value = parsed_arguments[arg_info.name] + if get_origin(arg_info.type) is not None: + if not isinstance(arg_value, list): + raise ValueError( + f"AI function {fnc_name} argument {arg_info.name} should be a list" + ) + + inner_type = get_args(arg_info.type)[0] + sanitized_value = [ + _sanitize_primitive( + value=v, expected_type=inner_type, choices=arg_info.choices + ) + for v in arg_value + ] + else: + sanitized_value = _sanitize_primitive( + value=arg_value, expected_type=arg_info.type, choices=arg_info.choices + ) + + sanitized_arguments[arg_info.name] = sanitized_value + + return llm.function_context.FunctionCallInfo( + tool_call_id=tool_call_id, + raw_arguments=raw_arguments, + function_info=fnc_info, + arguments=sanitized_arguments, + ) + + +def _build_function_description( + fnc_info: llm.function_context.FunctionInfo, +) -> anthropic.types.ToolParam: + def build_schema_field(arg_info: llm.function_context.FunctionArgInfo): + def type2str(t: type) -> str: + if t is str: + return "string" + elif t in (int, float): + return "number" + elif t is bool: + return "boolean" + + raise ValueError(f"unsupported type {t} 
for ai_property") + + p: dict[str, Any] = {} + if arg_info.default is inspect.Parameter.empty: + p["required"] = True + else: + p["required"] = False + + if arg_info.description: + p["description"] = arg_info.description + + if get_origin(arg_info.type) is list: + inner_type = get_args(arg_info.type)[0] + p["type"] = "array" + p["items"] = {} + p["items"]["type"] = type2str(inner_type) + + if arg_info.choices: + p["items"]["enum"] = arg_info.choices + else: + p["type"] = type2str(arg_info.type) + if arg_info.choices: + p["enum"] = arg_info.choices + + return p + + input_schema: dict[str, object] = {"type": "object"} + + for arg_info in fnc_info.arguments.values(): + input_schema[arg_info.name] = build_schema_field(arg_info) + + return { + "name": fnc_info.name, + "description": fnc_info.description, + "input_schema": input_schema, + } + + +def _sanitize_primitive( + *, value: Any, expected_type: type, choices: Tuple[Any] | None +) -> Any: + if expected_type is str: + if not isinstance(value, str): + raise ValueError(f"expected str, got {type(value)}") + elif expected_type in (int, float): + if not isinstance(value, (int, float)): + raise ValueError(f"expected number, got {type(value)}") + + if expected_type is int: + if value % 1 != 0: + raise ValueError("expected int, got float") + + value = int(value) + elif expected_type is float: + value = float(value) + + elif expected_type is bool: + if not isinstance(value, bool): + raise ValueError(f"expected bool, got {type(value)}") + + if choices and value not in choices: + raise ValueError(f"invalid value {value}, not in {choices}") + + return value diff --git a/livekit-plugins/livekit-plugins-anthropic/livekit/plugins/anthropic/log.py b/livekit-plugins/livekit-plugins-anthropic/livekit/plugins/anthropic/log.py new file mode 100644 index 000000000..aac7cf6eb --- /dev/null +++ b/livekit-plugins/livekit-plugins-anthropic/livekit/plugins/anthropic/log.py @@ -0,0 +1,3 @@ +import logging + +logger = 
logging.getLogger("livekit.plugins.anthropic") diff --git a/livekit-plugins/livekit-plugins-anthropic/livekit/plugins/anthropic/models.py b/livekit-plugins/livekit-plugins-anthropic/livekit/plugins/anthropic/models.py new file mode 100644 index 000000000..502d52d03 --- /dev/null +++ b/livekit-plugins/livekit-plugins-anthropic/livekit/plugins/anthropic/models.py @@ -0,0 +1,8 @@ +from typing import Literal + +ChatModels = Literal[ + "claude-3-5-sonnet-20240620", + "claude-3-opus-20240229", + "claude-3-sonnet-20240229", + "claude-3-haiku-20240307", +] diff --git a/livekit-plugins/livekit-plugins-browser/cef/src/helper_main_win.cpp b/livekit-plugins/livekit-plugins-anthropic/livekit/plugins/anthropic/py.typed similarity index 100% rename from livekit-plugins/livekit-plugins-browser/cef/src/helper_main_win.cpp rename to livekit-plugins/livekit-plugins-anthropic/livekit/plugins/anthropic/py.typed diff --git a/livekit-plugins/livekit-plugins-anthropic/livekit/plugins/anthropic/version.py b/livekit-plugins/livekit-plugins-anthropic/livekit/plugins/anthropic/version.py new file mode 100644 index 000000000..875ee5214 --- /dev/null +++ b/livekit-plugins/livekit-plugins-anthropic/livekit/plugins/anthropic/version.py @@ -0,0 +1,15 @@ +# Copyright 2023 LiveKit, Inc. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+ +__version__ = "0.2.1" diff --git a/livekit-plugins/livekit-plugins-anthropic/package.json b/livekit-plugins/livekit-plugins-anthropic/package.json new file mode 100644 index 000000000..3394ee822 --- /dev/null +++ b/livekit-plugins/livekit-plugins-anthropic/package.json @@ -0,0 +1,5 @@ +{ + "name": "livekit-plugins-anthropic", + "private": true, + "version": "0.2.1" +} diff --git a/livekit-plugins/livekit-plugins-anthropic/pyproject.toml b/livekit-plugins/livekit-plugins-anthropic/pyproject.toml new file mode 100644 index 000000000..8cf32563a --- /dev/null +++ b/livekit-plugins/livekit-plugins-anthropic/pyproject.toml @@ -0,0 +1,3 @@ +[build-system] +requires = ["setuptools>=61.0"] +build-backend = "setuptools.build_meta" \ No newline at end of file diff --git a/livekit-plugins/livekit-plugins-anthropic/setup.py b/livekit-plugins/livekit-plugins-anthropic/setup.py new file mode 100644 index 000000000..5cbeb9625 --- /dev/null +++ b/livekit-plugins/livekit-plugins-anthropic/setup.py @@ -0,0 +1,59 @@ +# Copyright 2023 LiveKit, Inc. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+ +import os +import pathlib + +import setuptools +import setuptools.command.build_py + +here = pathlib.Path(__file__).parent.resolve() +about = {} +with open( + os.path.join(here, "livekit", "plugins", "anthropic", "version.py"), "r" +) as f: + exec(f.read(), about) + + +setuptools.setup( + name="livekit-plugins-anthropic", + version=about["__version__"], + description="Agent Framework plugin for services from Anthropic", + long_description=(here / "README.md").read_text(encoding="utf-8"), + long_description_content_type="text/markdown", + url="https://github.com/livekit/agents", + cmdclass={}, + classifiers=[ + "Intended Audience :: Developers", + "License :: OSI Approved :: Apache Software License", + "Topic :: Multimedia :: Sound/Audio", + "Topic :: Multimedia :: Video", + "Topic :: Scientific/Engineering :: Artificial Intelligence", + "Programming Language :: Python :: 3", + "Programming Language :: Python :: 3.9", + "Programming Language :: Python :: 3.10", + "Programming Language :: Python :: 3 :: Only", + ], + keywords=["webrtc", "realtime", "audio", "video", "livekit"], + license="Apache-2.0", + packages=setuptools.find_namespace_packages(include=["livekit.*"]), + python_requires=">=3.9.0", + install_requires=["livekit-agents~=0.8", "anthropic ~= 0.34"], + package_data={"livekit.plugins.anthropic": ["py.typed"]}, + project_urls={ + "Documentation": "https://docs.livekit.io", + "Website": "https://livekit.io/", + "Source": "https://github.com/livekit/agents", + }, +) diff --git a/livekit-plugins/livekit-plugins-azure/CHANGELOG.md b/livekit-plugins/livekit-plugins-azure/CHANGELOG.md index 7a8b527bf..fcee6ff88 100644 --- a/livekit-plugins/livekit-plugins-azure/CHANGELOG.md +++ b/livekit-plugins/livekit-plugins-azure/CHANGELOG.md @@ -1,5 +1,11 @@ # livekit-plugins-azure +## 0.3.2 + +### Patch Changes + +- avoid returning tiny frames from TTS - [#747](https://github.com/livekit/agents/pull/747) ([@theomonnom](https://github.com/theomonnom)) + ## 0.3.1 ### Patch 
Changes diff --git a/livekit-plugins/livekit-plugins-azure/livekit/plugins/azure/stt.py b/livekit-plugins/livekit-plugins-azure/livekit/plugins/azure/stt.py index 98fa8de2f..b3ae6b9ee 100644 --- a/livekit-plugins/livekit-plugins-azure/livekit/plugins/azure/stt.py +++ b/livekit-plugins/livekit-plugins-azure/livekit/plugins/azure/stt.py @@ -45,6 +45,13 @@ def __init__( num_channels: int = 1, languages: list[str] = [], # when empty, auto-detect the language ): + """ + Create a new instance of Azure STT. + + ``speech_key`` and ``speech_region`` must be set, either using arguments or by setting the + ``AZURE_SPEECH_KEY`` and ``AZURE_SPEECH_REGION`` environmental variables, respectively. + """ + super().__init__( capabilities=stt.STTCapabilities(streaming=True, interim_results=True) ) diff --git a/livekit-plugins/livekit-plugins-azure/livekit/plugins/azure/tts.py b/livekit-plugins/livekit-plugins-azure/livekit/plugins/azure/tts.py index 3efeea38a..a21d9e948 100644 --- a/livekit-plugins/livekit-plugins-azure/livekit/plugins/azure/tts.py +++ b/livekit-plugins/livekit-plugins-azure/livekit/plugins/azure/tts.py @@ -16,7 +16,6 @@ import os from dataclasses import dataclass -from livekit import rtc from livekit.agents import tts, utils import azure.cognitiveservices.speech as speechsdk # type: ignore @@ -42,6 +41,13 @@ def __init__( speech_region: str | None = None, voice: str | None = None, ) -> None: + """ + Create a new instance of Azure TTS. + + ``speech_key`` and ``speech_region`` must be set, either using arguments or by setting the + ``AZURE_SPEECH_KEY`` and ``AZURE_SPEECH_REGION`` environmental variables, respectively. 
+ """ + super().__init__( capabilities=tts.TTSCapabilities( streaming=False, @@ -73,17 +79,18 @@ def __init__(self, text: str, opts: _TTSOptions) -> None: @utils.log_exceptions() async def _main_task(self): - stream_callback = _PushAudioOutputStreamCallback( - asyncio.get_running_loop(), self._event_ch + stream_callback = speechsdk.audio.PushAudioOutputStream( + _PushAudioOutputStreamCallback(asyncio.get_running_loop(), self._event_ch) ) synthesizer = _create_speech_synthesizer( config=self._opts, - stream=speechsdk.audio.PushAudioOutputStream(stream_callback), + stream=stream_callback, ) def _synthesize() -> speechsdk.SpeechSynthesisResult: return synthesizer.speak_text_async(self._text).get() # type: ignore + result = None try: result = await asyncio.to_thread(_synthesize) if result.reason != speechsdk.ResultReason.SynthesizingAudioCompleted: @@ -93,8 +100,11 @@ def _synthesize() -> speechsdk.SpeechSynthesisResult: finally: def _cleanup() -> None: - nonlocal synthesizer, result + # cleanup resources inside an Executor + # to avoid blocking the event loop + nonlocal synthesizer, stream_callback, result del synthesizer + del stream_callback del result await asyncio.to_thread(_cleanup) @@ -112,20 +122,30 @@ def __init__( self._request_id = utils.shortuuid() self._segment_id = utils.shortuuid() - def write(self, audio_buffer: memoryview) -> int: - audio = tts.SynthesizedAudio( - request_id=self._request_id, - segment_id=self._segment_id, - frame=rtc.AudioFrame( - data=audio_buffer, - sample_rate=AZURE_SAMPLE_RATE, - num_channels=AZURE_NUM_CHANNELS, - samples_per_channel=audio_buffer.nbytes // 2, - ), + self._bstream = utils.audio.AudioByteStream( + sample_rate=AZURE_SAMPLE_RATE, num_channels=AZURE_NUM_CHANNELS ) - self._loop.call_soon_threadsafe(self._event_ch.send_nowait, audio) + + def write(self, audio_buffer: memoryview) -> int: + for frame in self._bstream.write(audio_buffer.tobytes()): + audio = tts.SynthesizedAudio( + request_id=self._request_id, + 
segment_id=self._segment_id, + frame=frame, + ) + self._loop.call_soon_threadsafe(self._event_ch.send_nowait, audio) + return audio_buffer.nbytes + def close(self) -> None: + for frame in self._bstream.flush(): + audio = tts.SynthesizedAudio( + request_id=self._request_id, + segment_id=self._segment_id, + frame=frame, + ) + self._loop.call_soon_threadsafe(self._event_ch.send_nowait, audio) + def _create_speech_synthesizer( *, config: _TTSOptions, stream: speechsdk.audio.AudioOutputStream diff --git a/livekit-plugins/livekit-plugins-azure/livekit/plugins/azure/version.py b/livekit-plugins/livekit-plugins-azure/livekit/plugins/azure/version.py index 8787f001e..38fc4a80e 100644 --- a/livekit-plugins/livekit-plugins-azure/livekit/plugins/azure/version.py +++ b/livekit-plugins/livekit-plugins-azure/livekit/plugins/azure/version.py @@ -12,4 +12,4 @@ # See the License for the specific language governing permissions and # limitations under the License. -__version__ = "0.3.1" +__version__ = "0.3.2" diff --git a/livekit-plugins/livekit-plugins-azure/package.json b/livekit-plugins/livekit-plugins-azure/package.json index 40342724b..e1db756ac 100644 --- a/livekit-plugins/livekit-plugins-azure/package.json +++ b/livekit-plugins/livekit-plugins-azure/package.json @@ -1,5 +1,5 @@ { "name": "livekit-plugins-azure", "private": true, - "version": "0.3.1" + "version": "0.3.2" } diff --git a/livekit-plugins/livekit-plugins-browser/cef/.clang-format b/livekit-plugins/livekit-plugins-browser/.clang-format similarity index 100% rename from livekit-plugins/livekit-plugins-browser/cef/.clang-format rename to livekit-plugins/livekit-plugins-browser/.clang-format diff --git a/livekit-plugins/livekit-plugins-browser/cef/.gitignore b/livekit-plugins/livekit-plugins-browser/.gitignore similarity index 100% rename from livekit-plugins/livekit-plugins-browser/cef/.gitignore rename to livekit-plugins/livekit-plugins-browser/.gitignore diff --git 
a/livekit-plugins/livekit-plugins-browser/CHANGELOG.md b/livekit-plugins/livekit-plugins-browser/CHANGELOG.md new file mode 100644 index 000000000..f000991ea --- /dev/null +++ b/livekit-plugins/livekit-plugins-browser/CHANGELOG.md @@ -0,0 +1,7 @@ +# livekit-plugins-browser + +## 0.0.2 + +### Patch Changes + +- livekit-plugins-browser: prepare for release - [#659](https://github.com/livekit/agents/pull/659) ([@theomonnom](https://github.com/theomonnom)) diff --git a/livekit-plugins/livekit-plugins-browser/cef/CMakeLists.txt b/livekit-plugins/livekit-plugins-browser/CMakeLists.txt similarity index 90% rename from livekit-plugins/livekit-plugins-browser/cef/CMakeLists.txt rename to livekit-plugins/livekit-plugins-browser/CMakeLists.txt index 0d113bd32..30b9e1255 100644 --- a/livekit-plugins/livekit-plugins-browser/cef/CMakeLists.txt +++ b/livekit-plugins/livekit-plugins-browser/CMakeLists.txt @@ -11,7 +11,8 @@ set(USE_SANDBOX OFF) # TODO(theomonnom): I don't think we want to enable sandbox # Specify the CEF distribution version. 
if(NOT DEFINED CEF_VERSION) - set(CEF_VERSION "122.1.10+gc902316+chromium-122.0.6261.112") + # set(CEF_VERSION "122.1.10+gc902316+chromium-122.0.6261.112") + set(CEF_VERSION "127.3.5+g114ea2a+chromium-127.0.6533.120") endif() if("${CMAKE_SYSTEM_NAME}" STREQUAL "Darwin") diff --git a/livekit-plugins/livekit-plugins-browser/cef/LICENSE.txt b/livekit-plugins/livekit-plugins-browser/LICENSE.txt similarity index 100% rename from livekit-plugins/livekit-plugins-browser/cef/LICENSE.txt rename to livekit-plugins/livekit-plugins-browser/LICENSE.txt diff --git a/livekit-plugins/livekit-plugins-browser/README.md b/livekit-plugins/livekit-plugins-browser/README.md new file mode 100644 index 000000000..ae9207bfd --- /dev/null +++ b/livekit-plugins/livekit-plugins-browser/README.md @@ -0,0 +1,4 @@ +# LiveKit Plugins Browser + +Chromium Embedded Framework (CEF) for LiveKit Agents + diff --git a/livekit-plugins/livekit-plugins-browser/cef/src/agents_python.cpp b/livekit-plugins/livekit-plugins-browser/cef/src/agents_python.cpp deleted file mode 100644 index 7e44d624a..000000000 --- a/livekit-plugins/livekit-plugins-browser/cef/src/agents_python.cpp +++ /dev/null @@ -1,52 +0,0 @@ -#include "agents_python.hpp" - -#include "app.hpp" -#include "include/internal/cef_mac.h" - -#include -#include - -namespace py = pybind11; - -BrowserApp::BrowserApp(const AppOptions& options) : options_(options) { - app_ = new AgentApp(options_.dev_mode, options_.initialized_callback); -} - -std::shared_ptr BrowserApp::CreateBrowser( - const std::string& url, - const BrowserOptions& options) { - - app_->CreateBrowser(url, options.framerate, options.created_callback); - return nullptr;//std::make_shared(); -} - -int BrowserApp::Run() { - return RunAgentApp(app_); -} - -BrowserImpl::BrowserImpl() {} - -void BrowserImpl::SetSize(int width, int height) {} - -PYBIND11_MODULE(lkcef_python, m) { - // Isn't that fucking cool? 
llm using browsers - m.doc() = "Chromium Embedded Framework (CEF) for LiveKit Agents"; - - py::class_(m, "AppOptions") - .def(py::init()) - .def_readwrite("dev_mode", &AppOptions::dev_mode) - .def_readwrite("initialized_callback", &AppOptions::initialized_callback); - - py::class_(m, "BrowserOptions") - .def(py::init()) - .def_readwrite("framerate", &BrowserOptions::framerate) - .def_readwrite("created_callback", &BrowserOptions::created_callback); - - py::class_(m, "BrowserApp") - .def(py::init()) - .def("create_browser", &BrowserApp::CreateBrowser) - .def("run", &BrowserApp::Run); - - py::class_(m, "BrowserImpl") - .def("set_size", &BrowserImpl::SetSize); -} diff --git a/livekit-plugins/livekit-plugins-browser/cef/src/agents_python.hpp b/livekit-plugins/livekit-plugins-browser/cef/src/agents_python.hpp deleted file mode 100644 index e77a59776..000000000 --- a/livekit-plugins/livekit-plugins-browser/cef/src/agents_python.hpp +++ /dev/null @@ -1,39 +0,0 @@ -#ifndef LKCEF_AGENTS_PYTHON_HPP -#define LKCEF_AGENTS_PYTHON_HPP - -#include -#include - -#include "app.hpp" - -class BrowserImpl; - -struct AppOptions { - bool dev_mode = false; - std::function initialized_callback = nullptr; -}; - -struct BrowserOptions { - int framerate = 30; - std::function created_callback = nullptr; -}; - -struct BrowserApp { - BrowserApp(const AppOptions& options); - - std::shared_ptr CreateBrowser(const std::string& url, - const BrowserOptions& options); - int Run(); - - private: - AppOptions options_; - CefRefPtr app_; -}; - -struct BrowserImpl { - BrowserImpl(); - - void SetSize(int width, int height); -}; - -#endif // LKCEF_AGENTS_PYTHON_HPP diff --git a/livekit-plugins/livekit-plugins-browser/cef/src/app.hpp b/livekit-plugins/livekit-plugins-browser/cef/src/app.hpp deleted file mode 100644 index aa5b8d1ab..000000000 --- a/livekit-plugins/livekit-plugins-browser/cef/src/app.hpp +++ /dev/null @@ -1,47 +0,0 @@ -#ifndef LKCEF_APP_HPP -#define LKCEF_APP_HPP - -#include "dev_renderer.hpp" 
-#include "handler.hpp" -#include "include/cef_app.h" -#include "include/cef_base.h" -#include "include/cef_browser_process_handler.h" -#include "include/cef_client.h" -#include "include/internal/cef_ptr.h" - -class AgentApp : public CefApp, public CefBrowserProcessHandler { - public: - AgentApp(bool dev_mode, std::function initialized_callback); - - CefRefPtr GetBrowserProcessHandler() override { - return this; - } - - void OnBeforeCommandLineProcessing( - const CefString& process_type, - CefRefPtr command_line) override; - - void OnContextInitialized() override; - - CefRefPtr GetDefaultClient() override; - - CefRefPtr CreateBrowser( - const std::string& url, - int framerate, - std::function created_callback); - - int Run(); - - private: - IMPLEMENT_REFCOUNTING(AgentApp); - - CefRefPtr client_; - CefRefPtr dev_renderer_; - - bool dev_mode_; - std::function initialized_callback_; -}; - -int RunAgentApp(CefRefPtr app); - -#endif // LKCEF_APP_HPP diff --git a/livekit-plugins/livekit-plugins-browser/cef/src/app_mac.mm b/livekit-plugins/livekit-plugins-browser/cef/src/app_mac.mm deleted file mode 100644 index 3136303eb..000000000 --- a/livekit-plugins/livekit-plugins-browser/cef/src/app_mac.mm +++ /dev/null @@ -1,146 +0,0 @@ - -#import - -#include - -#include "app.hpp" -#include "handler.hpp" -#include "include/cef_application_mac.h" -#include "include/cef_command_line.h" -#include "include/wrapper/cef_library_loader.h" - -// Receives notifications from the application. -@interface AgentsAppDelegate : NSObject - -- (void)createApplication:(id)object; -- (void)tryToTerminateApplication:(NSApplication*)app; -@end - -// Provide the CefAppProtocol implementation required by CEF. 
-@interface AgentsApplication : NSApplication { - @private - BOOL handlingSendEvent_; -} -@end - -@implementation AgentsApplication -- (BOOL)isHandlingSendEvent { - return handlingSendEvent_; -} - -- (void)setHandlingSendEvent:(BOOL)handlingSendEvent { - handlingSendEvent_ = handlingSendEvent; -} - -- (void)sendEvent:(NSEvent*)event { - CefScopedSendingEvent sendingEventScoper; - [super sendEvent:event]; -} - -- (void)terminate:(id)sender { - AgentsAppDelegate* delegate = - static_cast([NSApp delegate]); - [delegate tryToTerminateApplication:self]; - // Return, don't exit. The application is responsible for exiting on its own. -} -@end - -@implementation AgentsAppDelegate - -// Create the application on the UI thread. -- (void)createApplication:(id)object { - [[NSBundle mainBundle] loadNibNamed:@"MainMenu" - owner:NSApp - topLevelObjects:nil]; - - // Set the delegate for application events. - [[NSApplication sharedApplication] setDelegate:self]; -} - -- (void)tryToTerminateApplication:(NSApplication*)app { -} - -- (NSApplicationTerminateReply)applicationShouldTerminate: - (NSApplication*)sender { - return NSTerminateNow; -} - -// Called when the user clicks the app dock icon while the application is -// already running. -- (BOOL)applicationShouldHandleReopen:(NSApplication*)theApplication - hasVisibleWindows:(BOOL)flag { - return NO; -} -@end - -// Entry point function for the browser process. -int RunAgentApp(CefRefPtr app) { - CefMainArgs main_args(0, nullptr); - - @autoreleasepool { - [AgentsApplication sharedApplication]; - - // If there was an invocation to NSApp prior to this method, then the NSApp - // will not be a AgentsApplication, but will instead be an NSApplication. - // This is undesirable and we must enforce that this doesn't happen. 
- CHECK([NSApp isKindOfClass:[AgentsApplication class]]); - - std::string framework_path = - "/Users/theomonnom/livekit/agents/livekit-plugins/" - "livekit-plugins-browser/cef/src/Debug/lkcef_app.app/Contents/" - "Frameworks/Chromium Embedded Framework.framework"; - std::string main_bundle_path = - "/Users/theomonnom/livekit/agents/livekit-plugins/" - "livekit-plugins-browser/cef/src/Debug/lkcef_app.app"; - std::string subprocess_path = - "/Users/theomonnom/livekit/agents/livekit-plugins/" - "livekit-plugins-browser/cef/src/Debug/lkcef_app.app/Contents/" - "Frameworks/lkcef Helper.app/Contents/MacOS/lkcef Helper"; - - std::string framework_lib = framework_path + "/Chromium Embedded Framework"; - if (!cef_load_library(framework_lib.c_str())) { - std::cerr << "lkcef: Failed to load CEF library" << std::endl; - return 1; - } - - CefSettings settings{}; - // settings.remote_debugging_port = 8088; - CefString(&settings.framework_dir_path).FromString(framework_path); - CefString(&settings.main_bundle_path).FromString(main_bundle_path); - CefString(&settings.browser_subprocess_path).FromString(subprocess_path); - - settings.no_sandbox = true; // No sandbox for MacOS, for livekit-agents, - // we're only going to support Linux - settings.windowless_rendering_enabled = true; - - // Initialize the CEF browser process. May return false if initialization - // fails or if early exit is desired (for example, due to process singleton - // relaunch behavior). - if (!CefInitialize(main_args, settings, app.get(), nullptr)) { - std::cerr << "lkcef: Failed to initialize CEF" << std::endl; - // TODO(theomonnom): Use CefGetExitCode(); - return 1; - } - - // Create the application delegate. - AgentsAppDelegate* delegate = [[AgentsAppDelegate alloc] init]; - // Set as the delegate for application events. 
- NSApp.delegate = delegate; - - [delegate performSelectorOnMainThread:@selector(createApplication:) - withObject:nil - waitUntilDone:NO]; - - app->Run(); - - CefShutdown(); - cef_unload_library(); - -#if !__has_feature(objc_arc) - [delegate release]; -#endif // !__has_feature(objc_arc) - delegate = nil; - } // @autoreleasepool - - return 0; -} diff --git a/livekit-plugins/livekit-plugins-browser/cef/src/dev_renderer.cpp b/livekit-plugins/livekit-plugins-browser/cef/src/dev_renderer.cpp deleted file mode 100644 index a1e10d316..000000000 --- a/livekit-plugins/livekit-plugins-browser/cef/src/dev_renderer.cpp +++ /dev/null @@ -1,195 +0,0 @@ -#include "dev_renderer.hpp" - -#include - -#include "imgui.h" -#include "imgui_impl_glfw.h" -#include "imgui_impl_opengl3.h" - -#include "include/wrapper/cef_helpers.h" - -#include "include/cef_app.h" - -// DCHECK on gl errors. -#if DCHECK_IS_ON() -#define VERIFY_NO_ERROR \ - { \ - int _gl_error = glGetError(); \ - DCHECK(_gl_error == GL_NO_ERROR) << "glGetError returned " << _gl_error; \ - } -#else -#define VERIFY_NO_ERROR -#endif - -static void glfw_error_callback(int error, const char* description) { - fprintf(stderr, "GLFW Error %d: %s\n", error, description); -} - -DevRenderer::DevRenderer() { -} - -void DevRenderer::OnAfterCreated(CefRefPtr browser) { - CEF_REQUIRE_UI_THREAD(); - int identifier = browser->GetIdentifier(); - - unsigned int texture_id; - glGenTextures(1, &texture_id); - VERIFY_NO_ERROR; - - RenderData render_data{}; - render_data.texture_id = texture_id; - render_data_.insert({identifier, render_data}); - - glBindTexture(GL_TEXTURE_2D, texture_id); - VERIFY_NO_ERROR; - glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST); - VERIFY_NO_ERROR; - glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST); -} - -void DevRenderer::OnPaint(CefRefPtr browser, - CefRenderHandler::PaintElementType type, - const CefRenderHandler::RectList& dirtyRects, - const void* buffer, - int width, - int height) 
{ - CEF_REQUIRE_UI_THREAD(); - - if (type != CefRenderHandler::PaintElementType::PET_VIEW){ - std::cout << "Ignoring PET_POPUP" << std::endl; - return; // Ignore PET_POPUP for now, bc I'm lazy - } - - int identifier = browser->GetIdentifier(); - RenderData* render_data = &render_data_[identifier]; - - int old_width = render_data->view_width; - int old_height = render_data->view_height; - - render_data->view_width = width; - render_data->view_height = height; - - glBindTexture(GL_TEXTURE_2D, render_data->texture_id); - - glPixelStorei(GL_UNPACK_ROW_LENGTH, width); - VERIFY_NO_ERROR; - - bool has_fullscreen_rect = dirtyRects.size() == 1 && - dirtyRects[0] == CefRect(0, 0, width, height); - - if (old_width != width || old_height != height || has_fullscreen_rect) { - glPixelStorei(GL_UNPACK_SKIP_PIXELS, 0); - VERIFY_NO_ERROR; - glPixelStorei(GL_UNPACK_SKIP_ROWS, 0); - VERIFY_NO_ERROR; - glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, width, height, 0, - GL_BGRA, GL_UNSIGNED_INT_8_8_8_8_REV, buffer); - VERIFY_NO_ERROR; - } else { - CefRenderHandler::RectList::const_iterator i = dirtyRects.begin(); - for (; i != dirtyRects.end(); ++i) { - const CefRect& rect = *i; - glPixelStorei(GL_UNPACK_SKIP_PIXELS, rect.x); - VERIFY_NO_ERROR; - glPixelStorei(GL_UNPACK_SKIP_ROWS, rect.y); - VERIFY_NO_ERROR; - glTexSubImage2D(GL_TEXTURE_2D, 0, rect.x, rect.y, rect.width, - rect.height, GL_BGRA, GL_UNSIGNED_INT_8_8_8_8_REV, - buffer); - VERIFY_NO_ERROR; - } - } -} - -void DevRenderer::OnBeforeClose(CefRefPtr browser) { - CEF_REQUIRE_UI_THREAD(); - int identifier = browser->GetIdentifier(); - RenderData* render_data = &render_data_[identifier]; - glDeleteTextures(1, &render_data->texture_id); - render_data_.erase(identifier); -} - -void DevRenderer::Run() { - glfwSetErrorCallback(glfw_error_callback); - - if (!glfwInit()) { - std::cerr << "Failed to initialize GLFW" << std::endl; - return; - } - - const char* glsl_version = "#version 150"; - glfwWindowHint(GLFW_CONTEXT_VERSION_MAJOR, 3); - 
glfwWindowHint(GLFW_CONTEXT_VERSION_MINOR, 2); - glfwWindowHint(GLFW_OPENGL_PROFILE, GLFW_OPENGL_CORE_PROFILE); - glfwWindowHint(GLFW_OPENGL_FORWARD_COMPAT, GL_TRUE); - - window_ = - glfwCreateWindow(800, 600, "livekit-plugins-browser (Development Window)", - nullptr, nullptr); - - if (!window_) { - std::cerr << "Failed to create GLFW window" << std::endl; - glfwTerminate(); - return; - } - glfwMakeContextCurrent(window_); - glfwSwapInterval(1); // Enable vsync - - IMGUI_CHECKVERSION(); - - ImGui::CreateContext(); - ImGuiIO& io = ImGui::GetIO(); - io.ConfigFlags |= ImGuiConfigFlags_NavEnableKeyboard; - io.ConfigFlags |= ImGuiConfigFlags_DockingEnable; - - // Setup Platform/Renderer backends - ImGui_ImplGlfw_InitForOpenGL(window_, true); - ImGui_ImplOpenGL3_Init(glsl_version); - - - ImVec4 clear_color = ImVec4(0.45f, 0.55f, 0.60f, 1.00f); - while (!glfwWindowShouldClose(window_)) { - glfwPollEvents(); - - CefDoMessageLoopWork(); - - ImGui_ImplOpenGL3_NewFrame(); - ImGui_ImplGlfw_NewFrame(); - ImGui::NewFrame(); - ImGui::ShowDemoWindow(); - - - for (auto& [identifier, render_data] : render_data_) { - ImGui::Begin("Browser"); - ImGui::Text("Browser %d", identifier); - ImGui::Image((void*)(intptr_t)render_data.texture_id, - ImVec2(render_data.view_width, render_data.view_height)); - ImGui::End(); - } - - - - // Rendering - ImGui::Render(); - int display_w, display_h; - glfwGetFramebufferSize(window_, &display_w, &display_h); - glViewport(0, 0, display_w, display_h); - glClearColor(clear_color.x * clear_color.w, clear_color.y * clear_color.w, - clear_color.z * clear_color.w, clear_color.w); - glClear(GL_COLOR_BUFFER_BIT); - ImGui_ImplOpenGL3_RenderDrawData(ImGui::GetDrawData()); - - glfwSwapBuffers(window_); - } - - ImGui_ImplOpenGL3_Shutdown(); - ImGui_ImplGlfw_Shutdown(); - ImGui::DestroyContext(); - - glfwDestroyWindow(window_); - glfwTerminate(); -} - -void DevRenderer::Close() { - //glfwSetWindowShouldClose(window_, GLFW_TRUE); -} diff --git 
a/livekit-plugins/livekit-plugins-browser/cef/src/handler.cpp b/livekit-plugins/livekit-plugins-browser/cef/src/handler.cpp deleted file mode 100644 index 8ca9c88c5..000000000 --- a/livekit-plugins/livekit-plugins-browser/cef/src/handler.cpp +++ /dev/null @@ -1,156 +0,0 @@ -#include "handler.hpp" - -#include - -#include "include/base/cef_callback.h" -#include "include/cef_app.h" -#include "include/cef_parser.h" -#include "include/views/cef_browser_view.h" -#include "include/views/cef_window.h" -#include "include/wrapper/cef_closure_task.h" -#include "include/wrapper/cef_helpers.h" - -namespace { - -// Returns a data: URI with the specified contents. -std::string GetDataURI(const std::string& data, const std::string& mime_type) { - return "data:" + mime_type + ";base64," + - CefURIEncode(CefBase64Encode(data.data(), data.size()), false) - .ToString(); -} - -} // namespace - -AgentHandler::AgentHandler(CefRefPtr dev_renderer) - : dev_renderer_(dev_renderer) {} - -void AgentHandler::OnTitleChange(CefRefPtr browser, - const CefString& title) { - CEF_REQUIRE_UI_THREAD(); -} - -void AgentHandler::OnPaint(CefRefPtr browser, - PaintElementType type, - const RectList& dirtyRects, - const void* buffer, - int width, - int height) { - - std::cout << "OnPaint" << std::endl; - - if (dev_renderer_) - dev_renderer_->OnPaint(browser, type, dirtyRects, buffer, width, height); -} - -void AgentHandler::GetViewRect(CefRefPtr browser, CefRect& rect) { - CEF_REQUIRE_UI_THREAD(); - rect.Set(0, 0, 800, 600); -}; - -void AgentHandler::OnAudioStreamPacket(CefRefPtr browser, - const float** data, - int frames, - int64_t pts) { - std::cout << "OnAudioStreamPacket" << std::endl; -} - -void AgentHandler::OnAudioStreamStarted(CefRefPtr browser, - const CefAudioParameters& params, - int channels) {} - -void AgentHandler::OnAudioStreamStopped(CefRefPtr browser) {} - -void AgentHandler::OnAudioStreamError(CefRefPtr browser, - const CefString& message) {} - -void 
AgentHandler::OnAfterCreated(CefRefPtr browser) { - CEF_REQUIRE_UI_THREAD(); - - int identifier = browser->GetIdentifier(); - CefRefPtr handle = pending_handles_.front(); - pending_handles_.pop_front(); - - handle->browser_ = browser; - if (handle->created_callback_) - handle->created_callback_(); - - browser_handles_[identifier] = handle; - - if (dev_renderer_) - dev_renderer_->OnAfterCreated(browser); -} - -bool AgentHandler::DoClose(CefRefPtr browser) { - CEF_REQUIRE_UI_THREAD(); - - return false; -} - -void AgentHandler::OnBeforeClose(CefRefPtr browser) { - CEF_REQUIRE_UI_THREAD(); - - - if (dev_renderer_) - dev_renderer_->OnBeforeClose(browser); -} - -void AgentHandler::OnLoadError(CefRefPtr browser, - CefRefPtr frame, - ErrorCode errorCode, - const CefString& errorText, - const CefString& failedUrl) { - CEF_REQUIRE_UI_THREAD(); - - // Allow Chrome to show the error page. - if (IsChromeRuntimeEnabled()) { - return; - } - - // Don't display an error for downloaded files. - if (errorCode == ERR_ABORTED) { - return; - } - - // Display a load error message using a data: URI. - std::stringstream ss; - ss << "" - "
<h2>
Failed to load URL " - << std::string(failedUrl) << " with error " << std::string(errorText) - << " (" << errorCode << ").
</h2></body></html>
"; - - frame->LoadURL(GetDataURI(ss.str(), "text/html")); -} - -/* -void AgentHandler::CloseAllBrowsers(bool force_close) { - if (!CefCurrentlyOn(TID_UI)) { - // Execute on the UI thread. - CefPostTask(TID_UI, base::BindOnce(&AgentHandler::CloseAllBrowsers, this, - force_close)); - return; - } - - if (browser_list_.empty()) { - return; - } - - BrowserList::const_iterator it = browser_list_.begin(); - for (; it != browser_list_.end(); ++it) { - (*it)->GetHost()->CloseBrowser(force_close); - } -} - */ - -bool AgentHandler::IsChromeRuntimeEnabled() { - static bool enabled = []() { - return CefCommandLine::GetGlobalCommandLine()->HasSwitch( - "enable-chrome-runtime"); - }(); - return enabled; -} - -#if !defined(OS_MAC) -void AgentHandler::PlatformShowWindow(CefRefPtr browser) { - NOTIMPLEMENTED(); -} -#endif diff --git a/livekit-plugins/livekit-plugins-browser/cef/src/handler.hpp b/livekit-plugins/livekit-plugins-browser/cef/src/handler.hpp deleted file mode 100644 index 2406c6d91..000000000 --- a/livekit-plugins/livekit-plugins-browser/cef/src/handler.hpp +++ /dev/null @@ -1,94 +0,0 @@ -#ifndef LKCEF_HANDLER_HPP -#define LKCEF_HANDLER_HPP - -#include "include/cef_client.h" - -#include "dev_renderer.hpp" -#include - -class BrowserHandle: public CefBaseRefCounted{ - public: - BrowserHandle(std::function created_callback) : created_callback_(created_callback) {} - - - CefRefPtr browser_ = nullptr; - std::function created_callback_ = nullptr; - - - IMPLEMENT_REFCOUNTING(BrowserHandle); -}; - - -class AgentHandler : public CefClient, - public CefDisplayHandler, - public CefRenderHandler, - public CefAudioHandler, - public CefLifeSpanHandler, - public CefLoadHandler { - -public: - AgentHandler(CefRefPtr dev_renderer); - - CefRefPtr GetDisplayHandler() override { return this; } - CefRefPtr GetRenderHandler() override { return this; } - CefRefPtr GetAudioHandler() override { return this; } - CefRefPtr GetLifeSpanHandler() override { return this; } - CefRefPtr GetLoadHandler() 
override { return this; } - - // CefDisplayHandler methods - void OnTitleChange(CefRefPtr browser, - const CefString &title) override; - - // CefRenderHandler methods - void OnPaint(CefRefPtr browser, PaintElementType type, - const RectList &dirtyRects, const void *buffer, int width, - int height) override; - - void GetViewRect(CefRefPtr browser, CefRect &rect) override; - - // CefAudioHandler methods - void OnAudioStreamPacket(CefRefPtr browser, const float **data, - int frames, int64_t pts) override; - - void OnAudioStreamStarted(CefRefPtr browser, - const CefAudioParameters ¶ms, - int channels) override; - - void OnAudioStreamStopped(CefRefPtr browser) override; - - void OnAudioStreamError(CefRefPtr browser, - const CefString &message) override; - - // CefLifeSpanHandler methods - void OnAfterCreated(CefRefPtr browser) override; - bool DoClose(CefRefPtr browser) override; - void OnBeforeClose(CefRefPtr browser) override; - - // CefLoadHandler methods - void OnLoadError(CefRefPtr browser, CefRefPtr frame, - ErrorCode errorCode, const CefString &errorText, - const CefString &failedUrl) override; - - //void CloseAllBrowsers(bool force_close); - - static bool IsChromeRuntimeEnabled(); - - - void AddPendingHandle(CefRefPtr handle) { - pending_handles_.push_back(handle); - } - - void RemovePendingHandle(CefRefPtr handle) { - pending_handles_.remove(handle); - } - -private: - std::unordered_map> browser_handles_; - std::list> pending_handles_; - - CefRefPtr dev_renderer_; - - IMPLEMENT_REFCOUNTING(AgentHandler); -}; - -#endif // LKCEF_HANDLER_HPP diff --git a/livekit-plugins/livekit-plugins-browser/cef/src/resources/lkcef-Info.plist b/livekit-plugins/livekit-plugins-browser/cef/src/resources/lkcef-Info.plist deleted file mode 100644 index ce63cb8f6..000000000 --- a/livekit-plugins/livekit-plugins-browser/cef/src/resources/lkcef-Info.plist +++ /dev/null @@ -1,36 +0,0 @@ - - - - - CFBundleDevelopmentRegion - en - CFBundleDisplayName - ${EXECUTABLE_NAME} - 
CFBundleExecutable - ${EXECUTABLE_NAME} - CFBundleIdentifier - io.livekit.cef.helper${BUNDLE_ID_SUFFIX} - CFBundleInfoDictionaryVersion - 6.0 - CFBundleName - ${PRODUCT_NAME} - CFBundlePackageType - APPL - CFBundleSignature - ???? - LSEnvironment - - MallocNanoZone - 0 - - LSFileQuarantineEnabled - - LSMinimumSystemVersion - 10.11.0 - LSUIElement - 1 - NSSupportsAutomaticGraphicsSwitching - - - - diff --git a/livekit-plugins/livekit-plugins-browser/cef/src/run_browser.py b/livekit-plugins/livekit-plugins-browser/cef/src/run_browser.py deleted file mode 100644 index c12ab3744..000000000 --- a/livekit-plugins/livekit-plugins-browser/cef/src/run_browser.py +++ /dev/null @@ -1,27 +0,0 @@ -# flake8: noqa - -import sys - -print("cwd: ", sys.path[0]) - -sys.path.insert(0, "./Debug") -import lkcef_python as lkcef - -print("lkcef __dict__: ", lkcef.__dict__) -print("BrowserImpl __dict__: ", lkcef.BrowserImpl.__dict__) - - -def _context_initialized(): - opts = lkcef.BrowserOptions() - opts.framerate = 30 - - app.create_browser("http://www.livekit.io", opts) - print("LOL: Context initialized") - - -opts = lkcef.AppOptions() -opts.dev_mode = True -opts.initialized_callback = _context_initialized - -app = lkcef.BrowserApp(opts) -app.run() diff --git a/livekit-plugins/livekit-plugins-browser/cef/cmake/DownloadCEF.cmake b/livekit-plugins/livekit-plugins-browser/cmake/DownloadCEF.cmake similarity index 100% rename from livekit-plugins/livekit-plugins-browser/cef/cmake/DownloadCEF.cmake rename to livekit-plugins/livekit-plugins-browser/cmake/DownloadCEF.cmake diff --git a/livekit-plugins/livekit-plugins-browser/livekit/plugins/browser/__init__.py b/livekit-plugins/livekit-plugins-browser/livekit/plugins/browser/__init__.py new file mode 100644 index 000000000..66009b84e --- /dev/null +++ b/livekit-plugins/livekit-plugins-browser/livekit/plugins/browser/__init__.py @@ -0,0 +1,29 @@ +# Copyright 2023 LiveKit, Inc. 
+# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +from livekit.agents import Plugin + +from .log import logger +from .proc import BrowserContext, BrowserPage +from .version import __version__ + +__all__ = ["BrowserContext", "BrowserPage"] + + +class BrowserPlugin(Plugin): + def __init__(self): + super().__init__(__name__, __version__, __package__, logger) + + +Plugin.register_plugin(BrowserPlugin()) diff --git a/livekit-plugins/livekit-plugins-browser/livekit/plugins/browser/log.py b/livekit-plugins/livekit-plugins-browser/livekit/plugins/browser/log.py new file mode 100644 index 000000000..8179ee6a5 --- /dev/null +++ b/livekit-plugins/livekit-plugins-browser/livekit/plugins/browser/log.py @@ -0,0 +1,3 @@ +import logging + +logger = logging.getLogger("livekit.plugins.browser") diff --git a/livekit-plugins/livekit-plugins-browser/livekit/plugins/browser/proc.py b/livekit-plugins/livekit-plugins-browser/livekit/plugins/browser/proc.py new file mode 100644 index 000000000..6910a0ba9 --- /dev/null +++ b/livekit-plugins/livekit-plugins-browser/livekit/plugins/browser/proc.py @@ -0,0 +1,239 @@ +from __future__ import annotations + +import asyncio +import contextlib +import multiprocessing as mp +import multiprocessing.context as mpc +import multiprocessing.shared_memory as mp_shm +import socket +import tempfile +from contextlib import asynccontextmanager +from dataclasses import dataclass +from typing import Callable, Literal + +from livekit import rtc +from livekit.agents 
import ipc, utils + +from . import logger, proc_main, proto + + +@dataclass +class _PageOptions: + page_id: int + url: str + width: int + height: int + framerate: int + + +EventTypes = Literal["paint"] + + +@dataclass +class PaintData: + dirty_rects: list[tuple[int, int, int, int]] + frame: rtc.VideoFrame + width: int + height: int + + +@dataclass +class BrowserOptions: + url: str + framerate: int + width: int + height: int + paint_callback: Callable[[PaintData], None] + + +class BrowserPage(utils.EventEmitter[EventTypes]): + def __init__( + self, + mp_ctx: mpc.SpawnContext, + opts: _PageOptions, + ctx_duplex: utils.aio.duplex_unix._AsyncDuplex, + ) -> None: + super().__init__() + self._mp_ctx = mp_ctx + self._opts = opts + self._ctx_duplex = ctx_duplex + + self._view_width = 0 + self._view_height = 0 + + self._created_fut = asyncio.Future() + self._close_fut = asyncio.Future() + + @property + def id(self) -> int: + return self._opts.page_id + + async def start(self) -> None: + shm_name = f"lkcef_browser_{utils.shortuuid()}" + self._shm = mp_shm.SharedMemory( + create=True, + size=proto.SHM_MAX_WIDTH * proto.SHM_MAX_HEIGHT * 4, + name=shm_name, + ) + + self._framebuffer = rtc.VideoFrame( + proto.SHM_MAX_WIDTH, + proto.SHM_MAX_HEIGHT, + rtc.VideoBufferType.BGRA, + bytearray(proto.SHM_MAX_WIDTH * proto.SHM_MAX_HEIGHT * 4), + ) + + req = proto.CreateBrowserRequest( + page_id=self._opts.page_id, + width=self._opts.width, + height=self._opts.height, + shm_name=shm_name, + url=self._opts.url, + framerate=self._opts.framerate, + ) + + await ipc.channel.asend_message(self._ctx_duplex, req) + + # TODO(theomonnom): create timeout (would prevent never resolving futures if the + # browser process crashed for some reasons) + await asyncio.shield(self._created_fut) + + async def aclose(self) -> None: + await ipc.channel.asend_message( + self._ctx_duplex, proto.CloseBrowserRequest(page_id=self.id) + ) + await asyncio.shield(self._close_fut) + + self._shm.unlink() + 
self._shm.close() + + async def _handle_created(self, msg: proto.CreateBrowserResponse) -> None: + self._created_fut.set_result(None) + + async def _handle_paint(self, acq: proto.AcquirePaintData) -> None: + old_width = self._view_width + old_height = self._view_height + self._view_width = acq.width + self._view_height = acq.height + + # TODO(theomonnom): remove hacky alloc-free resizing + self._framebuffer._width = acq.width + self._framebuffer._height = acq.height + + proto.copy_paint_data( + acq, old_width, old_height, self._shm.buf, self._framebuffer.data + ) + + paint_data = PaintData( + dirty_rects=acq.dirty_rects, + frame=self._framebuffer, + width=acq.width, + height=acq.height, + ) + self.emit("paint", paint_data) + + release_paint = proto.ReleasePaintData(page_id=acq.page_id) + await ipc.channel.asend_message(self._ctx_duplex, release_paint) + + async def _handle_close(self, msg: proto.BrowserClosed) -> None: + logger.debug("browser page closed", extra={"page_id": self.id}) + self._close_fut.set_result(None) + + +class BrowserContext: + def __init__(self, *, dev_mode: bool, remote_debugging_port: int = 0) -> None: + self._mp_ctx = mp.get_context("spawn") + self._pages: dict[int, BrowserPage] = {} + self._dev_mode = dev_mode + self._initialized = False + self._next_page_id = 1 + self._remote_debugging_port = remote_debugging_port + + async def initialize(self) -> None: + mp_pch, mp_cch = socket.socketpair() + self._duplex = await utils.aio.duplex_unix._AsyncDuplex.open(mp_pch) + + self._proc = self._mp_ctx.Process(target=proc_main.main, args=(mp_cch,)) + self._proc.start() + mp_cch.close() + + if not self._remote_debugging_port: + with contextlib.closing( + socket.socket(socket.AF_INET, socket.SOCK_STREAM) + ) as s: + s.bind(("", 0)) + s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1) + self._remote_debugging_port = s.getsockname()[1] + + logger.debug("using remote debugging port %d", self._remote_debugging_port) + + await ipc.channel.asend_message( 
+ self._duplex, + proto.InitializeContextRequest( + dev_mode=self._dev_mode, + remote_debugging_port=self._remote_debugging_port, + root_cache_path=tempfile.mkdtemp(), # TODO(theomonnom): cleanup + ), + ) + resp = await ipc.channel.arecv_message(self._duplex, proto.IPC_MESSAGES) + assert isinstance(resp, proto.ContextInitializedResponse) + self._initialized = True + logger.debug("browser context initialized", extra={"pid": self._proc.pid}) + + self._main_atask = asyncio.create_task(self._main_task(self._duplex)) + + @asynccontextmanager + async def playwright(self, timeout: float | None = None): + if not self._initialized: + raise RuntimeError("BrowserContext not initialized") + + from playwright.async_api import async_playwright + + async with async_playwright() as p: + url = f"http://localhost:{self._remote_debugging_port}" + browser = await p.chromium.connect_over_cdp(url, timeout=timeout) + try: + yield browser + finally: + await browser.close() + + @utils.log_exceptions(logger) + async def _main_task(self, duplex: utils.aio.duplex_unix._AsyncDuplex) -> None: + while True: + try: + msg = await ipc.channel.arecv_message(duplex, proto.IPC_MESSAGES) + except utils.aio.duplex_unix.DuplexClosed: + break + + if isinstance(msg, proto.CreateBrowserResponse): + page = self._pages[msg.page_id] + await page._handle_created(msg) + elif isinstance(msg, proto.AcquirePaintData): + page = self._pages[msg.page_id] + await page._handle_paint(msg) + elif isinstance(msg, proto.BrowserClosed): + page = self._pages[msg.page_id] + await page._handle_close(msg) + + async def new_page( + self, *, url: str, width: int = 800, height: int = 600, framerate: int = 30 + ) -> BrowserPage: + if not self._initialized: + raise RuntimeError("BrowserContext not initialized") + + page_id = self._next_page_id + self._next_page_id += 1 + page = BrowserPage( + self._mp_ctx, + _PageOptions( + page_id=page_id, + url=url, + width=width, + height=height, + framerate=framerate, + ), + self._duplex, + ) + 
self._pages[page_id] = page + await page.start() + return page diff --git a/livekit-plugins/livekit-plugins-browser/livekit/plugins/browser/proc_main.py b/livekit-plugins/livekit-plugins-browser/livekit/plugins/browser/proc_main.py new file mode 100644 index 000000000..c9ca11706 --- /dev/null +++ b/livekit-plugins/livekit-plugins-browser/livekit/plugins/browser/proc_main.py @@ -0,0 +1,193 @@ +import importlib.resources +import multiprocessing.shared_memory as mp_shm +import socket +import threading + +from livekit.agents import ipc, utils + +from . import logger, proto + + +class BrowserServer: + def __init__( + self, + duplex: utils.aio.duplex_unix._Duplex, + shm: mp_shm.SharedMemory, + page_id: int, + ): + self._duplex = duplex + self._shm = shm + self._page_id = page_id + + self._view_width = 0 + self._view_height = 0 + + self._closing = False + self._release_paint_e = threading.Event() + + @staticmethod + def create( + *, + duplex: utils.aio.duplex_unix._Duplex, + create_req: proto.CreateBrowserRequest, + browser_app, + ) -> "BrowserServer": + logger.debug( + "creating browser", + extra={ + "page_id": create_req.page_id, + "url": create_req.url, + "framerate": create_req.framerate, + "width": create_req.width, + "height": create_req.height, + "shm_name": create_req.shm_name, + }, + ) + + import lkcef_python as lkcef + + opts = lkcef.BrowserOptions() + opts.framerate = create_req.framerate + opts.width = create_req.width + opts.height = create_req.height + + shm = mp_shm.SharedMemory(name=create_req.shm_name) + bserver = BrowserServer(duplex, shm, create_req.page_id) + + opts.created_callback = bserver._browser_created + opts.paint_callback = bserver._paint + opts.close_callback = bserver._closed + browser_app.create_browser(create_req.url, opts) + return bserver + + def _browser_created(self, impl): + browser_id = impl.identifier() + logger.debug( + "browser created", + extra={"browser_id": browser_id, "page_id": self._page_id}, + ) + + self._impl = impl + + 
try: + ipc.channel.send_message( + self._duplex, + proto.CreateBrowserResponse( + page_id=self._page_id, browser_id=browser_id + ), + ) + except utils.aio.duplex_unix.DuplexClosed: + logger.exception("failed to send CreateBrowserResponse") + + def _paint(self, frame_data): + if self._closing: + return # make sure to not use the shm + + acq = proto.AcquirePaintData() + acq.page_id = self._page_id + acq.width = frame_data.width + acq.height = frame_data.height + + dirty_rects = [] + for rect in frame_data.dirty_rects: + dirty_rects.append((rect.x, rect.y, rect.width, rect.height)) + + acq.dirty_rects = dirty_rects + + old_width = self._view_width + old_height = self._view_height + self._view_width = frame_data.width + self._view_height = frame_data.height + + proto.copy_paint_data( + acq, old_width, old_height, frame_data.buffer, self._shm.buf + ) + + try: + ipc.channel.send_message(self._duplex, acq) + self._release_paint_e.wait() # wait for release + self._release_paint_e.clear() + except utils.aio.duplex_unix.DuplexClosed: + logger.exception("failed to send AcquirePaintData") + + def _closed(self) -> None: + ipc.channel.send_message( + self._duplex, proto.BrowserClosed(page_id=self._page_id) + ) + + def handle_release_paint(self, msg: proto.ReleasePaintData): + self._release_paint_e.set() + + def handle_close(self, msg: proto.CloseBrowserRequest): + self._closing = True + self._impl.close() + + +def _manager_thread(duplex: utils.aio.duplex_unix._Duplex, browser_app): + browsers: dict[int, BrowserServer] = {} + + while True: + try: + msg = ipc.channel.recv_message(duplex, proto.IPC_MESSAGES) + except utils.aio.duplex_unix.DuplexClosed: + break + + if isinstance(msg, proto.CreateBrowserRequest): + server = BrowserServer.create( + duplex=duplex, create_req=msg, browser_app=browser_app + ) + browsers[msg.page_id] = server + elif isinstance(msg, proto.ReleasePaintData): + server = browsers[msg.page_id] + server.handle_release_paint(msg) + elif isinstance(msg, 
proto.CloseBrowserRequest): + server = browsers[msg.page_id] + server.handle_close(msg) + del browsers[msg.page_id] + + +def main(mp_cch: socket.socket): + import lkcef_python as lkcef + + duplex = utils.aio.duplex_unix._Duplex.open(mp_cch) + + init_req = ipc.channel.recv_message(duplex, proto.IPC_MESSAGES) + assert isinstance(init_req, proto.InitializeContextRequest) + + logger.debug("initializing browser context", extra={"dev_mode": init_req.dev_mode}) + + def _context_initialized(): + try: + ipc.channel.send_message(duplex, proto.ContextInitializedResponse()) + except utils.aio.duplex_unix.DuplexClosed: + logger.exception("failed to send ContextInitializedResponse") + + opts = lkcef.AppOptions() + opts.dev_mode = init_req.dev_mode + opts.remote_debugging_port = init_req.remote_debugging_port + opts.root_cache_path = init_req.root_cache_path + opts.initialized_callback = _context_initialized + + res = ( + importlib.resources.files("livekit.plugins.browser.resources") / "lkcef_app.app" + ) + with importlib.resources.as_file(res) as path: + opts.framework_path = str( + path / "Contents" / "Frameworks" / "Chromium Embedded Framework.framework" + ) + opts.main_bundle_path = str(path) + opts.subprocess_path = str( + path + / "Contents" + / "Frameworks" + / "lkcef Helper.app" + / "Contents" + / "MacOS" + / "lkcef Helper" + ) + + app = lkcef.BrowserApp(opts) + man_t = threading.Thread(target=_manager_thread, args=(duplex, app)) + man_t.start() + + app.run() # run indefinitely diff --git a/livekit-plugins/livekit-plugins-browser/livekit/plugins/browser/proto.py b/livekit-plugins/livekit-plugins-browser/livekit/plugins/browser/proto.py new file mode 100644 index 000000000..17d0cac0f --- /dev/null +++ b/livekit-plugins/livekit-plugins-browser/livekit/plugins/browser/proto.py @@ -0,0 +1,196 @@ +import io +from dataclasses import dataclass, field +from typing import ClassVar + +import numpy as np +from livekit.agents.ipc import channel + +# there is no risk to increase these 
values. just using these defaults for now +SHM_MAX_WIDTH = 1920 +SHM_MAX_HEIGHT = 1080 + + +@dataclass +class InitializeContextRequest: + MSG_ID: ClassVar[int] = 0 + dev_mode: bool = False + remote_debugging_port: int = 0 + root_cache_path: str = "" + + def write(self, b: io.BytesIO) -> None: + channel.write_bool(b, self.dev_mode) + channel.write_int(b, self.remote_debugging_port) + channel.write_string(b, self.root_cache_path) + + def read(self, b: io.BytesIO) -> None: + self.dev_mode = channel.read_bool(b) + self.remote_debugging_port = channel.read_int(b) + self.root_cache_path = channel.read_string(b) + + +@dataclass +class ContextInitializedResponse: + MSG_ID: ClassVar[int] = 1 + + +@dataclass +class CreateBrowserRequest: + MSG_ID: ClassVar[int] = 2 + page_id: int = -1 + url: str = "" + framerate: int = 0 + width: int = 0 + height: int = 0 + shm_name: str = "" + + def write(self, b: io.BytesIO) -> None: + channel.write_int(b, self.page_id) + channel.write_string(b, self.url) + channel.write_int(b, self.framerate) + channel.write_int(b, self.width) + channel.write_int(b, self.height) + channel.write_string(b, self.shm_name) + + def read(self, b: io.BytesIO) -> None: + self.page_id = channel.read_int(b) + self.url = channel.read_string(b) + self.framerate = channel.read_int(b) + self.width = channel.read_int(b) + self.height = channel.read_int(b) + self.shm_name = channel.read_string(b) + + +@dataclass +class CreateBrowserResponse: + """ + This is going to wait for the created_callback to be called. 
+ (The create_browser function will be async) + """ + + MSG_ID: ClassVar[int] = 3 + page_id: int = -1 + browser_id: int = 0 + + def write(self, b: io.BytesIO) -> None: + channel.write_int(b, self.page_id) + channel.write_int(b, self.browser_id) + + def read(self, b: io.BytesIO) -> None: + self.page_id = channel.read_int(b) + self.browser_id = channel.read_int(b) + + +@dataclass +class AcquirePaintData: + MSG_ID: ClassVar[int] = 4 + page_id: int = -1 + width: int = 0 + height: int = 0 + dirty_rects: list[tuple[int, int, int, int]] = field(default_factory=list) + + def write(self, b: io.BytesIO) -> None: + channel.write_int(b, self.page_id) + channel.write_int(b, self.width) + channel.write_int(b, self.height) + channel.write_int(b, len(self.dirty_rects)) + for rect in self.dirty_rects: + channel.write_int(b, rect[0]) + channel.write_int(b, rect[1]) + channel.write_int(b, rect[2]) + channel.write_int(b, rect[3]) + + def read(self, b: io.BytesIO) -> None: + self.page_id = channel.read_int(b) + self.width = channel.read_int(b) + self.height = channel.read_int(b) + num_rects = channel.read_int(b) + self.dirty_rects = [] + for _ in range(num_rects): + x = channel.read_int(b) + y = channel.read_int(b) + width = channel.read_int(b) + height = channel.read_int(b) + self.dirty_rects.append((x, y, width, height)) + + +@dataclass +class ReleasePaintData: + MSG_ID: ClassVar[int] = 5 + page_id: int = -1 + + def write(self, b: io.BytesIO) -> None: + channel.write_int(b, self.page_id) + + def read(self, b: io.BytesIO) -> None: + self.page_id = channel.read_int(b) + + +@dataclass +class CloseBrowserRequest: + MSG_ID: ClassVar[int] = 6 + page_id: int = -1 + + def write(self, b: io.BytesIO) -> None: + channel.write_int(b, self.page_id) + + def read(self, b: io.BytesIO) -> None: + self.page_id = channel.read_int(b) + + +@dataclass +class BrowserClosed: + MSG_ID: ClassVar[int] = 7 + page_id: int = -1 + + def write(self, b: io.BytesIO) -> None: + channel.write_int(b, self.page_id) + + 
def read(self, b: io.BytesIO) -> None: + self.page_id = channel.read_int(b) + + +IPC_MESSAGES = { + InitializeContextRequest.MSG_ID: InitializeContextRequest, + ContextInitializedResponse.MSG_ID: ContextInitializedResponse, + CreateBrowserRequest.MSG_ID: CreateBrowserRequest, + CreateBrowserResponse.MSG_ID: CreateBrowserResponse, + AcquirePaintData.MSG_ID: AcquirePaintData, + ReleasePaintData.MSG_ID: ReleasePaintData, + CloseBrowserRequest.MSG_ID: CloseBrowserRequest, + BrowserClosed.MSG_ID: BrowserClosed, +} + + +def copy_paint_data( + acq: AcquirePaintData, + old_width: int, + old_height: int, + source: memoryview, + dest: memoryview, +): + dirty_rects = acq.dirty_rects + + # source_arr = np.frombuffer(source, dtype=np.uint32).reshape((acq.height, acq.width)) + source_arr = np.ndarray( + (acq.height, acq.width), + dtype=np.uint32, + buffer=source, + ) + dest_arr = np.ndarray( + (acq.height, acq.width), + dtype=np.uint32, + buffer=dest, + ) + + has_fullscreen_rect = len(dirty_rects) == 1 and dirty_rects[0] == ( + 0, + 0, + acq.width, + acq.height, + ) + if old_width != acq.width or old_height != acq.height or has_fullscreen_rect: + np.copyto(dest_arr, source_arr) + else: + for rect in dirty_rects: + x, y, w, h = rect + dest_arr[y : y + h, x : x + w] = source_arr[y : y + h, x : x + w] diff --git a/livekit-plugins/livekit-plugins-browser/cef/src/utils.cpp b/livekit-plugins/livekit-plugins-browser/livekit/plugins/browser/py.typed similarity index 100% rename from livekit-plugins/livekit-plugins-browser/cef/src/utils.cpp rename to livekit-plugins/livekit-plugins-browser/livekit/plugins/browser/py.typed diff --git a/livekit-plugins/livekit-plugins-browser/livekit/plugins/browser/resources/__init__.py b/livekit-plugins/livekit-plugins-browser/livekit/plugins/browser/resources/__init__.py new file mode 100644 index 000000000..2133c6432 --- /dev/null +++ b/livekit-plugins/livekit-plugins-browser/livekit/plugins/browser/resources/__init__.py @@ -0,0 +1 @@ +"""Used by 
importlib.resources and setuptools""" diff --git a/livekit-plugins/livekit-plugins-browser/livekit/plugins/browser/version.py b/livekit-plugins/livekit-plugins-browser/livekit/plugins/browser/version.py new file mode 100644 index 000000000..f3454fa71 --- /dev/null +++ b/livekit-plugins/livekit-plugins-browser/livekit/plugins/browser/version.py @@ -0,0 +1,15 @@ +# Copyright 2023 LiveKit, Inc. + +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +__version__ = "0.0.2" diff --git a/livekit-plugins/livekit-plugins-browser/package.json b/livekit-plugins/livekit-plugins-browser/package.json new file mode 100644 index 000000000..795a90d4e --- /dev/null +++ b/livekit-plugins/livekit-plugins-browser/package.json @@ -0,0 +1,5 @@ +{ + "name": "livekit-plugins-browser", + "private": true, + "version": "0.0.2" +} diff --git a/livekit-plugins/livekit-plugins-browser/pyproject.toml b/livekit-plugins/livekit-plugins-browser/pyproject.toml new file mode 100644 index 000000000..4ece2e4c8 --- /dev/null +++ b/livekit-plugins/livekit-plugins-browser/pyproject.toml @@ -0,0 +1,9 @@ +[build-system] +requires = ["setuptools>=61.0"] +build-backend = "setuptools.build_meta" + +[tool.cibuildwheel.macos] +repair-wheel-command = "" # getting issues with unresolved files + +[tool.cibuildwheel] +before-build = "pip install pybind11[global]" \ No newline at end of file diff --git a/livekit-plugins/livekit-plugins-browser/setup.py b/livekit-plugins/livekit-plugins-browser/setup.py new file mode 100644 
index 000000000..96c557142 --- /dev/null +++ b/livekit-plugins/livekit-plugins-browser/setup.py @@ -0,0 +1,126 @@ +# Copyright 2023 LiveKit, Inc. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +import os +import pathlib +import re +import subprocess +import sys +from pathlib import Path + +import setuptools +from setuptools import Extension +from setuptools.command.build_ext import build_ext + +here = pathlib.Path(__file__).parent.resolve() +about = {} +with open(os.path.join(here, "livekit", "plugins", "browser", "version.py"), "r") as f: + exec(f.read(), about) + + +class CMakeExtension(Extension): + def __init__(self, name: str, sourcedir: str = "") -> None: + super().__init__(name, sources=[]) + self.sourcedir = os.fspath(Path(sourcedir).resolve()) + + +class CMakeBuild(build_ext): + def build_extension(self, ext: CMakeExtension) -> None: + # Must be in this form due to bug in .resolve() only fixed in Python 3.10+ + ext_fullpath = Path.cwd() / self.get_ext_fullpath(ext.name) + extdir = ext_fullpath.parent.resolve() + + debug = int(os.environ.get("DEBUG", 0)) if self.debug is None else self.debug + cfg = "Debug" if debug else "Release" + + cmake_args = [ + f"-DCMAKE_LIBRARY_OUTPUT_DIRECTORY={extdir}", + f"-DPYTHON_EXECUTABLE={sys.executable}", + f"-DCMAKE_BUILD_TYPE={cfg}", + ] + + print(f"cmake_args: {cmake_args}") + + if sys.platform.startswith("darwin"): + # Cross-compile support for macOS - respect ARCHFLAGS if set + archs = re.findall(r"-arch (\S+)", 
os.environ.get("ARCHFLAGS", "")) + if archs: + cmake_args += ["-DCMAKE_OSX_ARCHITECTURES={}".format(";".join(archs))] + + self.build_temp = Path(self.build_temp) / ext.name + if not self.build_temp.exists(): + self.build_temp.mkdir(parents=True) + + subprocess.run( + ["cmake", ext.sourcedir, *cmake_args], cwd=self.build_temp, check=True + ) + subprocess.run(["cmake", "--build", "."], cwd=self.build_temp, check=True) + + build_output = self.build_temp / "src" / cfg + + for f in build_output.iterdir(): + if f.suffix == ".so": + self.copy_file(f, extdir / f.name) + + if sys.platform.startswith("darwin"): + # on macos, copy the dummy app + app = build_output / "lkcef_app.app" + self.copy_tree( + app, + str( + extdir + / "livekit" + / "plugins" + / "browser" + / "resources" + / "lkcef_app.app" + ), + ) + + +setuptools.setup( + name="livekit-plugins-browser", + version=about["__version__"], + description="Chromium Embedded Framework (CEF) for LiveKit Agents", + long_description=(here / "README.md").read_text(encoding="utf-8"), + long_description_content_type="text/markdown", + url="https://github.com/livekit/agents", + classifiers=[ + "Intended Audience :: Developers", + "License :: OSI Approved :: Apache Software License", + "Topic :: Multimedia :: Sound/Audio", + "Topic :: Multimedia :: Video", + "Topic :: Scientific/Engineering :: Artificial Intelligence", + "Programming Language :: Python :: 3", + "Programming Language :: Python :: 3.9", + "Programming Language :: Python :: 3.10", + "Programming Language :: Python :: 3 :: Only", + ], + keywords=["webrtc", "realtime", "audio", "video", "livekit"], + license="Apache-2.0", + ext_modules=[CMakeExtension("lkcef_python")], + cmdclass={"build_ext": CMakeBuild}, + packages=setuptools.find_namespace_packages(include=["livekit.*"]), + python_requires=">=3.9.0", + install_requires=["livekit-agents>=0.8.0"], + package_data={ + "livekit.plugins.browser": ["py.typed"], + "livekit.plugins.browser.resources": ["**", 
"lkcef_app.app"], + }, + project_urls={ + "Documentation": "https://docs.livekit.io", + "Website": "https://livekit.io/", + "Source": "https://github.com/livekit/agents", + }, +) diff --git a/livekit-plugins/livekit-plugins-browser/src/.gitignore b/livekit-plugins/livekit-plugins-browser/src/.gitignore new file mode 100644 index 000000000..28f37169b --- /dev/null +++ b/livekit-plugins/livekit-plugins-browser/src/.gitignore @@ -0,0 +1,3 @@ +Debug/ +Release/ +lib* \ No newline at end of file diff --git a/livekit-plugins/livekit-plugins-browser/cef/src/CMakeLists.txt b/livekit-plugins/livekit-plugins-browser/src/CMakeLists.txt similarity index 90% rename from livekit-plugins/livekit-plugins-browser/cef/src/CMakeLists.txt rename to livekit-plugins/livekit-plugins-browser/src/CMakeLists.txt index 648f16864..298ee3c37 100644 --- a/livekit-plugins/livekit-plugins-browser/cef/src/CMakeLists.txt +++ b/livekit-plugins/livekit-plugins-browser/src/CMakeLists.txt @@ -15,14 +15,17 @@ FetchContent_Declare(imgui GIT_REPOSITORY https://github.com/ocornut/imgui GIT_T FetchContent_GetProperties(imgui) FetchContent_MakeAvailable(imgui) file(GLOB IMGUI_SOURCES ${imgui_SOURCE_DIR}/*.cpp) -file(GLOB IMGUI_HEADERS ${imgui_SOURCE_DIR}/*.h) -add_library(imgui STATIC ${IMGUI_SOURCES} ${IMGUI_SOURCES} ${imgui_SOURCE_DIR}/backends/imgui_impl_glfw.cpp ${imgui_SOURCE_DIR}/backends/imgui_impl_opengl3.cpp) +add_library(imgui STATIC ${IMGUI_SOURCES} + ${imgui_SOURCE_DIR}/backends/imgui_impl_glfw.cpp + ${imgui_SOURCE_DIR}/backends/imgui_impl_opengl3.cpp + ${imgui_SOURCE_DIR}/misc/cpp/imgui_stdlib.cpp +) set_target_properties(imgui PROPERTIES CXX_STANDARD 17) -target_include_directories(imgui PUBLIC ${imgui_SOURCE_DIR} ${imgui_SOURCE_DIR}/backends ${GLFW_INCLUDE_DIR}) +target_include_directories(imgui PUBLIC ${imgui_SOURCE_DIR} ${imgui_SOURCE_DIR}/misc/cpp ${imgui_SOURCE_DIR}/backends ${GLFW_INCLUDE_DIR}) target_link_libraries(imgui PRIVATE glfw) -set(LKCEF_SRCS app.cpp app.hpp handler.hpp 
handler.cpp dev_renderer.hpp dev_renderer.cpp) +set(LKCEF_SRCS app.cpp app.hpp handler.hpp handler.cpp dev_renderer.hpp dev_renderer.cpp gleq.h browser_handle.hpp browser_handle.cpp) set(LKCEF_SRCS_LINUX main_linux.cpp) set(LKCEF_SRCS_MAC app_mac.mm) set(LKCEF_SRCS_WINDOWS main_win.cpp ) @@ -86,8 +89,12 @@ if(OS_MAC) cmake_policy(SET CMP0068 NEW) endif() - # output path for the main app bundle. - set(LKCEF_APP "${CEF_TARGET_OUT_DIR}/lkcef_app.app") + add_executable(lkcef_app MACOSX_BUNDLE dummy.cpp) # dummy app + set_target_properties(lkcef_app PROPERTIES + MACOSX_BUNDLE_INFO_PLIST "${CMAKE_CURRENT_SOURCE_DIR}/resources/lkcefapp-Info.plist" + OUTPUT_NAME "lkcef_app" + ) + # library target. add_library(lkcef STATIC ${LKCEF_SRCS}) @@ -103,10 +110,7 @@ if(OS_MAC) COMMAND ${CMAKE_COMMAND} -E copy_directory "${CEF_BINARY_DIR}/Chromium Embedded Framework.framework" - "${LKCEF_APP}/Contents/Frameworks/Chromium Embedded Framework.framework" - # Copy the library into the main app bindle. COMMAND ${CMAKE_COMMAND} -E - # copy_if_different "${CEF_TARGET_OUT_DIR}/liblkcef.dylib" - # "${LKCEF_APP}/Contents/MacOS/liblkcef.dylib" + "$<TARGET_BUNDLE_DIR:lkcef_app>/Contents/Frameworks/Chromium Embedded Framework.framework" VERBATIM) # Create the multiple Helper app bundle targets.
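For reference, the dirty-rect copy strategy implemented by `copy_paint_data` earlier in this patch can be exercised standalone. A minimal sketch (hypothetical helper name; a tiny 8x8 uint32 surface stands in for the real 1920x1080 shared-memory framebuffer):

```python
# Sketch of the dirty-rect copy used by copy_paint_data: copy only damaged
# regions, falling back to a full-frame copy on resize or full damage.
# Assumptions (not from the patch): helper name and 8x8 surface size.
import numpy as np


def copy_dirty_rects(source, dest, dirty_rects, old_shape):
    """Copy (x, y, w, h) regions from source to dest; full copy on resize."""
    h, w = source.shape
    full_damage = len(dirty_rects) == 1 and dirty_rects[0] == (0, 0, w, h)
    if old_shape != source.shape or full_damage:
        np.copyto(dest, source)  # cheap full-frame path
    else:
        for x, y, rw, rh in dirty_rects:
            dest[y:y + rh, x:x + rw] = source[y:y + rh, x:x + rw]


src = np.arange(64, dtype=np.uint32).reshape(8, 8)
dst = np.zeros((8, 8), dtype=np.uint32)
copy_dirty_rects(src, dst, [(2, 3, 4, 2)], old_shape=(8, 8))
assert (dst[3:5, 2:6] == src[3:5, 2:6]).all()  # dirty region copied
assert dst[1, 1] == 0                          # untouched elsewhere
```

A resize (`old_shape` differing from the current surface) forces the full copy, mirroring the `old_width`/`old_height` check in the patch.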
@@ -140,6 +144,8 @@ if(OS_MAC) add_dependencies(${_helper_target} libcef_dll_wrapper) target_link_libraries(${_helper_target} libcef_dll_wrapper ${CEF_STANDARD_LIBS}) + + set_target_properties( ${_helper_target} PROPERTIES MACOSX_BUNDLE_INFO_PLIST ${_helper_info_plist} @@ -155,7 +161,7 @@ if(OS_MAC) COMMAND ${CMAKE_COMMAND} -E copy_directory "${CEF_TARGET_OUT_DIR}/${_helper_output_name}.app" - "${LKCEF_APP}/Contents/Frameworks/${_helper_output_name}.app" + "$<TARGET_BUNDLE_DIR:lkcef_app>/Contents/Frameworks/${_helper_output_name}.app" VERBATIM) endforeach() endif() diff --git a/livekit-plugins/livekit-plugins-browser/src/agents_python.cpp b/livekit-plugins/livekit-plugins-browser/src/agents_python.cpp new file mode 100644 index 000000000..bf344f867 --- /dev/null +++ b/livekit-plugins/livekit-plugins-browser/src/agents_python.cpp @@ -0,0 +1,138 @@ +#include "agents_python.hpp" + +#include <pybind11/pybind11.h> +#include <pybind11/functional.h> +#include <pybind11/stl.h> + +#include "app.hpp" +#include "include/base/cef_callback.h" +#include "include/internal/cef_mac.h" +#include "include/wrapper/cef_closure_task.h" + +namespace py = pybind11; + +BrowserApp::BrowserApp(const AppOptions& options) : options_(options) { + app_ = new AgentApp(options_.dev_mode, options.remote_debugging_port, + options.root_cache_path, options.framework_path, + options.main_bundle_path, options.subprocess_path, + options_.initialized_callback); +} + +bool BrowserApp::CreateBrowser(const std::string& url, + const BrowserOptions& options) { + if (CefCurrentlyOn(TID_UI)) { + CreateBrowserOnUIThread(url, options); + return true; + } + + // TODO(theomonnom): Document base::Unretained + CefPostTask(TID_UI, base::BindOnce(&BrowserApp::CreateBrowserOnUIThread, + base::Unretained(this), url, options)); + + return true; +} + +void BrowserApp::CreateBrowserOnUIThread(const std::string& url, + const BrowserOptions& options) { + std::shared_ptr<BrowserImpl> browser_impl = std::make_shared<BrowserImpl>(); + browsers_.push_back(browser_impl); + + CefRefPtr<BrowserHandle> handle = app_->CreateBrowser( + url, options.framerate,
options.width, options.height, + [options, browser_impl]() { options.created_callback(browser_impl); }, + [options](std::vector<CefRect> dirtyRects, const void* buffer, int width, + int height) { + PaintData event{}; + std::vector<PaintRect> rects; + rects.reserve(dirtyRects.size()); + + for (const auto& rect : dirtyRects) { + rects.push_back({rect.x, rect.y, rect.width, rect.height}); + } + + event.dirtyRect = rects; + event.buffer = buffer; + event.width = width; + event.height = height; + options.paint_callback(event); + }, + options.close_callback); + + browser_impl->handle = handle; +} + +int BrowserApp::Run() { + return RunAgentApp(app_); +} + +BrowserImpl::BrowserImpl() {} + +void BrowserImpl::SetSize(int width, int height) { + if (handle) + handle->SetSize(width, height); +} + +void BrowserImpl::Close() { + if (handle) + handle->Close(); +} + +int BrowserImpl::Identifier() const { + return handle->GetBrowser()->GetIdentifier(); +} + +py::memoryview paint_data_to_memoryview(const PaintData& event) { + return py::memoryview::from_buffer( + const_cast<uint32_t*>(static_cast<const uint32_t*>(event.buffer)), + {event.height * event.width}, {sizeof(uint32_t)}, true); +} + +PYBIND11_MODULE(lkcef_python, m) { + // Pretty neat: LLMs driving browsers. + m.doc() = "Chromium Embedded Framework (CEF) for LiveKit Agents"; + + py::class_<AppOptions>(m, "AppOptions") + .def(py::init<>()) + .def_readwrite("dev_mode", &AppOptions::dev_mode) + .def_readwrite("remote_debugging_port", + &AppOptions::remote_debugging_port) + .def_readwrite("root_cache_path", &AppOptions::root_cache_path) + .def_readwrite("framework_path", &AppOptions::framework_path) + .def_readwrite("main_bundle_path", &AppOptions::main_bundle_path) + .def_readwrite("subprocess_path", &AppOptions::subprocess_path) + .def_readwrite("initialized_callback", &AppOptions::initialized_callback); + + py::class_<BrowserOptions>(m, "BrowserOptions") + .def(py::init<>()) + .def_readwrite("framerate", &BrowserOptions::framerate) + .def_readwrite("width", &BrowserOptions::width) + .def_readwrite("height", &BrowserOptions::height) + .def_readwrite("created_callback", &BrowserOptions::created_callback) + .def_readwrite("paint_callback", &BrowserOptions::paint_callback) + .def_readwrite("close_callback", &BrowserOptions::close_callback); + + py::class_<BrowserApp>(m, "BrowserApp") + .def(py::init<const AppOptions&>()) + .def("create_browser", &BrowserApp::CreateBrowser) + .def("run", &BrowserApp::Run, py::call_guard<py::gil_scoped_release>()); + + py::class_<BrowserImpl, std::shared_ptr<BrowserImpl>>(m, "BrowserImpl") + .def("set_size", &BrowserImpl::SetSize) + .def("close", &BrowserImpl::Close) + .def("identifier", &BrowserImpl::Identifier); + + py::class_<PaintRect>(m, "PaintRect") + .def_readwrite("x", &PaintRect::x) + .def_readwrite("y", &PaintRect::y) + .def_readwrite("width", &PaintRect::width) + .def_readwrite("height", &PaintRect::height); + + py::class_<PaintData>(m, "PaintData") + .def(py::init<>()) + .def_readwrite("dirty_rects", &PaintData::dirtyRect) + .def_readwrite("width", &PaintData::width) + .def_readwrite("height", &PaintData::height) + .def_property_readonly("buffer", [](const PaintData& event) { + return paint_data_to_memoryview(event); + }); +} diff --git a/livekit-plugins/livekit-plugins-browser/src/agents_python.hpp b/livekit-plugins/livekit-plugins-browser/src/agents_python.hpp
new file mode 100644 index 000000000..7312b464c --- /dev/null +++ b/livekit-plugins/livekit-plugins-browser/src/agents_python.hpp @@ -0,0 +1,69 @@ +#ifndef LKCEF_AGENTS_PYTHON_HPP +#define LKCEF_AGENTS_PYTHON_HPP + +#include <functional> +#include <memory> + +#include "app.hpp" + +class BrowserImpl; +struct PaintData; + +struct AppOptions { + bool dev_mode = false; + int remote_debugging_port = 0; + std::string root_cache_path; + std::string framework_path; + std::string main_bundle_path; + std::string subprocess_path; + std::function<void()> initialized_callback = nullptr; +}; + +struct BrowserOptions { + int framerate = 30; + int width = 800; + int height = 600; + std::function<void(std::shared_ptr<BrowserImpl>)> created_callback = nullptr; + std::function<void(PaintData)> paint_callback = nullptr; + std::function<void()> close_callback = nullptr; +}; + +struct BrowserApp { + BrowserApp(const AppOptions& options); + + bool CreateBrowser(const std::string& url, const BrowserOptions& options); + void CreateBrowserOnUIThread(const std::string& url, const BrowserOptions& options); + + int Run(); + + private: + AppOptions options_; + CefRefPtr<AgentApp> app_; + std::list<std::shared_ptr<BrowserImpl>> browsers_; +}; + +struct BrowserImpl { + BrowserImpl(); + + void SetSize(int width, int height); + void Close(); + int Identifier() const; + + CefRefPtr<BrowserHandle> handle = nullptr; +}; + +struct PaintRect { + int x = 0; + int y = 0; + int width = 0; + int height = 0; +}; + +struct PaintData { + std::vector<PaintRect> dirtyRect; + const void* buffer; + int width; + int height; +}; + +#endif // LKCEF_AGENTS_PYTHON_HPP diff --git a/livekit-plugins/livekit-plugins-browser/cef/src/app.cpp b/livekit-plugins/livekit-plugins-browser/src/app.cpp similarity index 50% rename from livekit-plugins/livekit-plugins-browser/cef/src/app.cpp rename to livekit-plugins/livekit-plugins-browser/src/app.cpp index 1dfbe7976..ae688bb54 100644 --- a/livekit-plugins/livekit-plugins-browser/cef/src/app.cpp +++ b/livekit-plugins/livekit-plugins-browser/src/app.cpp @@ -8,11 +8,24 @@ #include "include/views/cef_window.h" #include
"include/wrapper/cef_helpers.h" -AgentApp::AgentApp(bool dev_mode, std::function<void()> initialized_callback) +AgentApp::AgentApp(bool dev_mode, + int remote_debugging_port, + std::string root_cache_path, + std::string framework_path, + std::string main_bundle_path, + std::string subprocess_path, + std::function<void()> initialized_callback) : dev_mode_(dev_mode), + remote_debugging_port_(remote_debugging_port), + root_cache_path_(std::move(root_cache_path)), + framework_path_(std::move(framework_path)), + main_bundle_path_(std::move(main_bundle_path)), + subprocess_path_(std::move(subprocess_path)), initialized_callback_(std::move(initialized_callback)) { + browser_store_ = CefRefPtr<BrowserStore>(new BrowserStore()); + if (dev_mode) - dev_renderer_ = CefRefPtr<DevRenderer>(new DevRenderer()); + dev_renderer_ = CefRefPtr<DevRenderer>(new DevRenderer(browser_store_)); } void AgentApp::OnBeforeCommandLineProcessing( @@ -20,12 +33,15 @@ void AgentApp::OnBeforeCommandLineProcessing( CefRefPtr<CefCommandLine> command_line) { command_line->AppendSwitch("--disable-gpu"); command_line->AppendSwitch("--disable-gpu-compositing"); + command_line->AppendSwitch("--enable-chrome-runtime"); // command_line->AppendSwitch("--enable-begin-frame-scheduling"); } void AgentApp::OnContextInitialized() { CEF_REQUIRE_UI_THREAD(); // Main thread in our case - client_ = CefRefPtr<AgentHandler>(new AgentHandler(dev_renderer_)); + client_ = + CefRefPtr<AgentHandler>(new AgentHandler(browser_store_, dev_renderer_)); + dev_client_ = CefRefPtr<DevToolsHandler>(new DevToolsHandler()); if (initialized_callback_) initialized_callback_(); @@ -38,27 +54,34 @@ CefRefPtr<CefClient> AgentApp::GetDefaultClient() { CefRefPtr<BrowserHandle> AgentApp::CreateBrowser( const std::string& url, int framerate, - std::function<void()> created_callback) { + int width, + int height, + std::function<void()> created_callback, + std::function<void(std::vector<CefRect> dirtyRects, + const void* buffer, + int width, + int height)> paint_callback, + std::function<void()> close_callback) { CEF_REQUIRE_UI_THREAD(); - CefWindowInfo windowInfo; + //
windowInfo.SetAsWindowless(dev_renderer_->getNativeWindowHandle()); + CefWindowInfo windowInfo; windowInfo.SetAsWindowless(nullptr); - CefRefPtr<CefCommandLine> command_line = - CefCommandLine::GetGlobalCommandLine(); - CefBrowserSettings settings; + settings.windowless_frame_rate = framerate; settings.background_color = CefColorSetARGB(255, 255, 255, 255); CefRefPtr<BrowserHandle> browser_handle = - new BrowserHandle(created_callback); + new BrowserHandle(std::move(created_callback), std::move(paint_callback), + std::move(close_callback), width, height); - client_->AddPendingHandle(browser_handle); + browser_store_->AddPendingHandle(browser_handle); bool result = CefBrowserHost::CreateBrowser(windowInfo, client_, url, settings, nullptr, nullptr); if (!result) { - client_->RemovePendingHandle(browser_handle); + browser_store_->RemovePendingHandle(browser_handle); return nullptr; } return browser_handle; @@ -71,5 +94,7 @@ int AgentApp::Run() { CefRunMessageLoop(); } + // Close all browsers + return 0; } diff --git a/livekit-plugins/livekit-plugins-browser/src/app.hpp b/livekit-plugins/livekit-plugins-browser/src/app.hpp new file mode 100644 index 000000000..da7a27cee --- /dev/null +++ b/livekit-plugins/livekit-plugins-browser/src/app.hpp @@ -0,0 +1,75 @@ +#ifndef LKCEF_APP_HPP +#define LKCEF_APP_HPP + +#include "browser_handle.hpp" +#include "dev_renderer.hpp" +#include "handler.hpp" +#include "include/cef_app.h" +#include "include/cef_base.h" +#include "include/cef_browser_process_handler.h" +#include "include/cef_client.h" +#include "include/internal/cef_ptr.h" + +class AgentApp : public CefApp, public CefBrowserProcessHandler { + public: + AgentApp(bool dev_mode, + int remote_debugging_port, + std::string root_cache_path, + std::string framework_path, + std::string main_bundle_path, + std::string subprocess_path, + std::function<void()> initialized_callback); + + CefRefPtr<CefBrowserProcessHandler> GetBrowserProcessHandler() override { + return this; + } + + void OnBeforeCommandLineProcessing( + const CefString& process_type,
+ CefRefPtr<CefCommandLine> command_line) override; + + void OnContextInitialized() override; + + CefRefPtr<CefClient> GetDefaultClient() override; + + CefRefPtr<BrowserHandle> CreateBrowser( + const std::string& url, + int framerate, + int width, + int height, + std::function<void()> created_callback, + std::function<void(std::vector<CefRect> dirtyRect, + const void* buffer, + int width, + int height)> paint_callback, + std::function<void()> close_callback); + + int Run(); + + bool IsDevMode() const { return dev_mode_; } + int GetRemoteDebuggingPort() const { return remote_debugging_port_; } + std::string GetRootCachePath() const { return root_cache_path_; } + std::string GetFrameworkPath() const { return framework_path_; } + std::string GetMainBundlePath() const { return main_bundle_path_; } + std::string GetSubprocessPath() const { return subprocess_path_; } + + private: + IMPLEMENT_REFCOUNTING(AgentApp); + + CefRefPtr<BrowserStore> browser_store_; + CefRefPtr<AgentHandler> client_; + CefRefPtr<DevToolsHandler> dev_client_; + CefRefPtr<DevRenderer> dev_renderer_; + + bool dev_mode_; + int remote_debugging_port_; + std::string root_cache_path_; + std::string framework_path_; + std::string main_bundle_path_; + std::string subprocess_path_; + std::function<void()> initialized_callback_; +}; + +int RunAgentApp(CefRefPtr<AgentApp> app); + +#endif // LKCEF_APP_HPP diff --git a/livekit-plugins/livekit-plugins-browser/src/app_mac.mm b/livekit-plugins/livekit-plugins-browser/src/app_mac.mm new file mode 100644 index 000000000..68a5822bf --- /dev/null +++ b/livekit-plugins/livekit-plugins-browser/src/app_mac.mm @@ -0,0 +1,110 @@ + +#import <Cocoa/Cocoa.h> + +#include <iostream> + +#import <objc/runtime.h> +#include <string> + +#include "app.hpp" +#include "handler.hpp" +#include "include/cef_application_mac.h" +#include "include/cef_command_line.h" +#include "include/wrapper/cef_library_loader.h" + +BOOL g_handling_send_event = false; + +@interface NSApplication (AgentsApplication) <CefAppProtocol> + +- (BOOL)isHandlingSendEvent; +- (void)setHandlingSendEvent:(BOOL)handlingSendEvent; +- (void)_swizzled_sendEvent:(NSEvent*)event; +- (void)_swizzled_terminate:(id)sender; + +@end + +@implementation
NSApplication (AgentsApplication) + +// This selector is called very early during the application initialization. ++ (void)load { + NSLog(@"AgentsApplication::load"); + // Swap NSApplication::sendEvent with _swizzled_sendEvent. + Method original = class_getInstanceMethod(self, @selector(sendEvent:)); + Method swizzled = + class_getInstanceMethod(self, @selector(_swizzled_sendEvent:)); + method_exchangeImplementations(original, swizzled); + + Method originalTerm = class_getInstanceMethod(self, @selector(terminate:)); + Method swizzledTerm = + class_getInstanceMethod(self, @selector(_swizzled_terminate:)); + method_exchangeImplementations(originalTerm, swizzledTerm); +} + +- (BOOL)isHandlingSendEvent { + return g_handling_send_event; +} + +- (void)setHandlingSendEvent:(BOOL)handlingSendEvent { + g_handling_send_event = handlingSendEvent; +} + +- (void)_swizzled_sendEvent:(NSEvent*)event { + CefScopedSendingEvent sendingEventScoper; + // Calls NSApplication::sendEvent due to the swizzling. + [self _swizzled_sendEvent:event]; +} + +- (void)_swizzled_terminate:(id)sender { + [self _swizzled_terminate:sender]; +} + +@end + +// Entry point function for the browser process. +int RunAgentApp(CefRefPtr<AgentApp> app) { + CefMainArgs main_args(0, nullptr); + + @autoreleasepool { + [NSApplication sharedApplication]; + + // If there was an invocation to NSApp prior to this method, then the NSApp + // will not be an AgentsApplication, but will instead be an NSApplication. + // This is undesirable and we must enforce that this doesn't happen.
+ CHECK([NSApp isKindOfClass:[NSApplication class]]); + + std::string framework_lib = app->GetFrameworkPath() + "/Chromium Embedded Framework"; + if (!cef_load_library(framework_lib.c_str())) { + std::cerr << "lkcef: Failed to load CEF library" << std::endl; + return 1; + } + + CefSettings settings{}; + settings.chrome_runtime = true; + settings.external_message_pump = app->IsDevMode(); + settings.remote_debugging_port = app->GetRemoteDebuggingPort(); + CefString(&settings.root_cache_path).FromString(app->GetRootCachePath()); + CefString(&settings.framework_dir_path).FromString(app->GetFrameworkPath()); + CefString(&settings.main_bundle_path).FromString(app->GetMainBundlePath()); + CefString(&settings.browser_subprocess_path).FromString(app->GetSubprocessPath()); + + settings.no_sandbox = true; // No sandbox on macOS; for livekit-agents + // we only plan to support Linux + settings.windowless_rendering_enabled = true; + + // Initialize the CEF browser process. May return false if initialization + // fails or if early exit is desired (for example, due to process singleton + // relaunch behavior).
+ if (!CefInitialize(main_args, settings, app.get(), nullptr)) { + std::cerr << "lkcef: Failed to initialize CEF" << std::endl; + // TODO(theomonnom): Use CefGetExitCode(); + return 1; + } + + app->Run(); + CefShutdown(); + + cef_unload_library(); + } // @autoreleasepool + + return 0; +} diff --git a/livekit-plugins/livekit-plugins-browser/src/browser_handle.cpp b/livekit-plugins/livekit-plugins-browser/src/browser_handle.cpp new file mode 100644 index 000000000..9e0893bef --- /dev/null +++ b/livekit-plugins/livekit-plugins-browser/src/browser_handle.cpp @@ -0,0 +1,15 @@ +#include "browser_handle.hpp" + +void BrowserHandle::SetSize(int width, int height) { + width_ = width; + height_ = height; + + if (browser_) + browser_->GetHost()->WasResized(); +} + + +void BrowserHandle::Close() { + if (browser_) + browser_->GetHost()->CloseBrowser(true); +} diff --git a/livekit-plugins/livekit-plugins-browser/src/browser_handle.hpp b/livekit-plugins/livekit-plugins-browser/src/browser_handle.hpp new file mode 100644 index 000000000..d93da9dad --- /dev/null +++ b/livekit-plugins/livekit-plugins-browser/src/browser_handle.hpp @@ -0,0 +1,72 @@ +#ifndef LKCEF_BROWSER_HANDLE_HPP +#define LKCEF_BROWSER_HANDLE_HPP + +#include <functional> + +#include "include/cef_client.h" +#include "include/wrapper/cef_helpers.h" + +class BrowserHandle : public CefBaseRefCounted { + public: + BrowserHandle( + std::function<void()> created_callback, + std::function<void(std::vector<CefRect> dirtyRects, + const void* buffer, + int width, + int height)> paint_callback, + std::function<void()> close_callback, + int width, + int height) + : created_callback_(std::move(created_callback)), + paint_callback_(std::move(paint_callback)), + close_callback_(std::move(close_callback)), + width_(width), + height_(height) {} + + CefRefPtr<CefBrowser> browser_ = nullptr; + std::function<void()> created_callback_ = nullptr; + std::function<void(std::vector<CefRect> dirtyRect, + const void* buffer, + int width, + int height)> + paint_callback_ = nullptr; + std::function<void()> close_callback_ = nullptr; + + void
SetSize(int width, int height); + void Close(); + + int GetWidth() const { return width_; } + int GetHeight() const { return height_; } + + CefRefPtr<CefBrowser> GetBrowser() const { return browser_; } + + private: + int width_ = 0; + int height_ = 0; + + IMPLEMENT_REFCOUNTING(BrowserHandle); +}; + +struct BrowserStore : public CefBaseRefCounted { + std::unordered_map<int, CefRefPtr<BrowserHandle>> browser_handles_; + std::list<CefRefPtr<BrowserHandle>> pending_handles_; + + void AddPendingHandle(CefRefPtr<BrowserHandle> handle) { + CEF_REQUIRE_UI_THREAD(); + pending_handles_.push_back(handle); + } + + void RemovePendingHandle(CefRefPtr<BrowserHandle> handle) { + CEF_REQUIRE_UI_THREAD(); + pending_handles_.remove(handle); + } + + CefRefPtr<BrowserHandle> GetBrowserHandle(int identifier) { + CEF_REQUIRE_UI_THREAD(); + return browser_handles_[identifier]; + } + + IMPLEMENT_REFCOUNTING(BrowserStore); +}; + +#endif // LKCEF_BROWSER_HANDLE_HPP diff --git a/livekit-plugins/livekit-plugins-browser/src/dev_renderer.cpp b/livekit-plugins/livekit-plugins-browser/src/dev_renderer.cpp new file mode 100644 index 000000000..1eed5c94e --- /dev/null +++ b/livekit-plugins/livekit-plugins-browser/src/dev_renderer.cpp @@ -0,0 +1,593 @@ +#include "dev_renderer.hpp" + +#include + +#include "handler.hpp" + +#define IMGUI_DEFINE_MATH_OPERATORS +#include "imgui.h" +#include "imgui_impl_glfw.h" +#include "imgui_impl_opengl3.h" +#include "imgui_stdlib.h" +#include "include/cef_app.h" +#include "include/wrapper/cef_helpers.h" +#include "keyboard_codes.h" + +#define GLEQ_IMPLEMENTATION +#define GLEQ_STATIC +#include "gleq.h" + +// DCHECK on gl errors.
+#if DCHECK_IS_ON() +#define VERIFY_NO_ERROR \ + { \ + int _gl_error = glGetError(); \ + DCHECK(_gl_error == GL_NO_ERROR) << "glGetError returned " << _gl_error; \ + } +#else +#define VERIFY_NO_ERROR +#endif + +int glfw_key_to_cef_key(int glfwKey) { + switch (glfwKey) { + case GLFW_KEY_SPACE: + return WebCore::VK_SPACE; + case GLFW_KEY_APOSTROPHE: + return WebCore::VK_OEM_7; + case GLFW_KEY_COMMA: + return WebCore::VK_OEM_COMMA; + case GLFW_KEY_MINUS: + return WebCore::VK_OEM_MINUS; + case GLFW_KEY_PERIOD: + return WebCore::VK_OEM_PERIOD; + case GLFW_KEY_SLASH: + return WebCore::VK_OEM_2; + case GLFW_KEY_0: + return WebCore::VK_0; + case GLFW_KEY_1: + return WebCore::VK_1; + case GLFW_KEY_2: + return WebCore::VK_2; + case GLFW_KEY_3: + return WebCore::VK_3; + case GLFW_KEY_4: + return WebCore::VK_4; + case GLFW_KEY_5: + return WebCore::VK_5; + case GLFW_KEY_6: + return WebCore::VK_6; + case GLFW_KEY_7: + return WebCore::VK_7; + case GLFW_KEY_8: + return WebCore::VK_8; + case GLFW_KEY_9: + return WebCore::VK_9; + case GLFW_KEY_SEMICOLON: + return WebCore::VK_OEM_1; + case GLFW_KEY_EQUAL: + return WebCore::VK_OEM_PLUS; + case GLFW_KEY_A: + return WebCore::VK_A; + case GLFW_KEY_B: + return WebCore::VK_B; + case GLFW_KEY_C: + return WebCore::VK_C; + case GLFW_KEY_D: + return WebCore::VK_D; + case GLFW_KEY_E: + return WebCore::VK_E; + case GLFW_KEY_F: + return WebCore::VK_F; + case GLFW_KEY_G: + return WebCore::VK_G; + case GLFW_KEY_H: + return WebCore::VK_H; + case GLFW_KEY_I: + return WebCore::VK_I; + case GLFW_KEY_J: + return WebCore::VK_J; + case GLFW_KEY_K: + return WebCore::VK_K; + case GLFW_KEY_L: + return WebCore::VK_L; + case GLFW_KEY_M: + return WebCore::VK_M; + case GLFW_KEY_N: + return WebCore::VK_N; + case GLFW_KEY_O: + return WebCore::VK_O; + case GLFW_KEY_P: + return WebCore::VK_P; + case GLFW_KEY_Q: + return WebCore::VK_Q; + case GLFW_KEY_R: + return WebCore::VK_R; + case GLFW_KEY_S: + return WebCore::VK_S; + case GLFW_KEY_T: + return WebCore::VK_T; + 
case GLFW_KEY_U: + return WebCore::VK_U; + case GLFW_KEY_V: + return WebCore::VK_V; + case GLFW_KEY_W: + return WebCore::VK_W; + case GLFW_KEY_X: + return WebCore::VK_X; + case GLFW_KEY_Y: + return WebCore::VK_Y; + case GLFW_KEY_Z: + return WebCore::VK_Z; + case GLFW_KEY_LEFT_BRACKET: + return WebCore::VK_OEM_4; + case GLFW_KEY_BACKSLASH: + return WebCore::VK_OEM_5; + case GLFW_KEY_RIGHT_BRACKET: + return WebCore::VK_OEM_6; + case GLFW_KEY_GRAVE_ACCENT: + return WebCore::VK_OEM_3; + case GLFW_KEY_ESCAPE: + return WebCore::VK_ESCAPE; + case GLFW_KEY_ENTER: + return WebCore::VK_RETURN; + case GLFW_KEY_TAB: + return WebCore::VK_TAB; + case GLFW_KEY_BACKSPACE: + return WebCore::VK_BACK; + case GLFW_KEY_INSERT: + return WebCore::VK_INSERT; + case GLFW_KEY_DELETE: + return WebCore::VK_DELETE; + case GLFW_KEY_RIGHT: + return WebCore::VK_RIGHT; + case GLFW_KEY_LEFT: + return WebCore::VK_LEFT; + case GLFW_KEY_DOWN: + return WebCore::VK_DOWN; + case GLFW_KEY_UP: + return WebCore::VK_UP; + case GLFW_KEY_PAGE_UP: + return WebCore::VK_PRIOR; + case GLFW_KEY_PAGE_DOWN: + return WebCore::VK_NEXT; + case GLFW_KEY_HOME: + return WebCore::VK_HOME; + case GLFW_KEY_END: + return WebCore::VK_END; + case GLFW_KEY_CAPS_LOCK: + return WebCore::VK_CAPITAL; + case GLFW_KEY_SCROLL_LOCK: + return WebCore::VK_SCROLL; + case GLFW_KEY_NUM_LOCK: + return WebCore::VK_NUMLOCK; + case GLFW_KEY_PRINT_SCREEN: + return WebCore::VK_SNAPSHOT; + case GLFW_KEY_PAUSE: + return WebCore::VK_PAUSE; + case GLFW_KEY_F1: + return WebCore::VK_F1; + case GLFW_KEY_F2: + return WebCore::VK_F2; + case GLFW_KEY_F3: + return WebCore::VK_F3; + case GLFW_KEY_F4: + return WebCore::VK_F4; + case GLFW_KEY_F5: + return WebCore::VK_F5; + case GLFW_KEY_F6: + return WebCore::VK_F6; + case GLFW_KEY_F7: + return WebCore::VK_F7; + case GLFW_KEY_F8: + return WebCore::VK_F8; + case GLFW_KEY_F9: + return WebCore::VK_F9; + case GLFW_KEY_F10: + return WebCore::VK_F10; + case GLFW_KEY_F11: + return WebCore::VK_F11; + case GLFW_KEY_F12: + 
return WebCore::VK_F12;
+    // Add more cases as needed
+    default:
+      return WebCore::VK_UNKNOWN;
+  }
+}
+
+static uint32_t glfw_mods_to_cef_mods(int glfw_mods) {
+  uint32_t cef_flags = 0;
+
+  if (glfw_mods & GLFW_MOD_SHIFT) {
+    cef_flags |= EVENTFLAG_SHIFT_DOWN;
+  }
+  if (glfw_mods & GLFW_MOD_CONTROL) {
+    cef_flags |= EVENTFLAG_CONTROL_DOWN;
+  }
+  if (glfw_mods & GLFW_MOD_ALT) {
+    cef_flags |= EVENTFLAG_ALT_DOWN;
+  }
+  if (glfw_mods & GLFW_MOD_SUPER) {
+    cef_flags |= EVENTFLAG_COMMAND_DOWN;  // Super key -> Command on Mac
+  }
+  if (glfw_mods & GLFW_MOD_CAPS_LOCK) {
+    cef_flags |= EVENTFLAG_CAPS_LOCK_ON;
+  }
+  if (glfw_mods & GLFW_MOD_NUM_LOCK) {
+    cef_flags |= EVENTFLAG_NUM_LOCK_ON;
+  }
+
+  return cef_flags;
+}
+
+static std::optional<CefBrowserHost::MouseButtonType> glfw_button_to_cef_button(
+    int button) {
+  switch (button) {
+    case GLFW_MOUSE_BUTTON_LEFT:
+      return CefBrowserHost::MouseButtonType::MBT_LEFT;
+    case GLFW_MOUSE_BUTTON_MIDDLE:
+      return CefBrowserHost::MouseButtonType::MBT_MIDDLE;
+    case GLFW_MOUSE_BUTTON_RIGHT:
+      return CefBrowserHost::MouseButtonType::MBT_RIGHT;
+    default:
+      return std::nullopt;
+  }
+}
+
+static void glfw_error_callback(int error, const char* description) {
+  fprintf(stderr, "GLFW Error %d: %s\n", error, description);
+}
+
+DevRenderer::DevRenderer(CefRefPtr<BrowserStore> browser_store)
+    : browser_store_(browser_store) {}
+
+void DevRenderer::OnTitleChange(CefRefPtr<CefBrowser> browser,
+                                const CefString& title) {
+  CEF_REQUIRE_UI_THREAD();
+  int identifier = browser->GetIdentifier();
+  BrowserData* data = &browser_data_[identifier];
+  data->title = title;
+}
+
+void DevRenderer::OnLoadingStateChange(CefRefPtr<CefBrowser> browser,
+                                       bool isLoading,
+                                       bool canGoBack,
+                                       bool canGoForward) {
+  if (!isLoading) {
+    int identifier = browser->GetIdentifier();
+    BrowserData* data = &browser_data_[identifier];
+    data->url = browser->GetMainFrame()->GetURL();
+  }
+}
+
+void
DevRenderer::OnAfterCreated(CefRefPtr<CefBrowser> browser) {
+  CEF_REQUIRE_UI_THREAD();
+  int identifier = browser->GetIdentifier();
+
+  unsigned int texture_id;
+  glGenTextures(1, &texture_id);
+  VERIFY_NO_ERROR;
+
+  BrowserData data{};
+  data.browser = browser;
+  data.texture_id = texture_id;
+  browser_data_.insert({identifier, data});
+
+  glBindTexture(GL_TEXTURE_2D, texture_id);
+  VERIFY_NO_ERROR;
+  glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
+  VERIFY_NO_ERROR;
+  glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);
+}
+
+void DevRenderer::OnPaint(CefRefPtr<CefBrowser> browser,
+                          CefRenderHandler::PaintElementType type,
+                          const CefRenderHandler::RectList& dirtyRects,
+                          const void* buffer,
+                          int width,
+                          int height) {
+  CEF_REQUIRE_UI_THREAD();
+
+  if (type != CefRenderHandler::PaintElementType::PET_VIEW) {
+    return;  // Ignore PET_POPUP for now.
+  }
+
+  int identifier = browser->GetIdentifier();
+  BrowserData* data = &browser_data_[identifier];
+
+  int old_width = data->view_width;
+  int old_height = data->view_height;
+
+  data->view_width = width;
+  data->view_height = height;
+
+  glBindTexture(GL_TEXTURE_2D, data->texture_id);
+
+  glPixelStorei(GL_UNPACK_ROW_LENGTH, width);
+  VERIFY_NO_ERROR;
+
+  bool has_fullscreen_rect =
+      dirtyRects.size() == 1 && dirtyRects[0] == CefRect(0, 0, width, height);
+
+  if (old_width != width || old_height != height || has_fullscreen_rect) {
+    glPixelStorei(GL_UNPACK_SKIP_PIXELS, 0);
+    VERIFY_NO_ERROR;
+    glPixelStorei(GL_UNPACK_SKIP_ROWS, 0);
+    VERIFY_NO_ERROR;
+    glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, width, height, 0, GL_BGRA,
+                 GL_UNSIGNED_INT_8_8_8_8_REV, buffer);
+    VERIFY_NO_ERROR;
+  } else {
+    CefRenderHandler::RectList::const_iterator i = dirtyRects.begin();
+    for (; i != dirtyRects.end(); ++i) {
+      const CefRect& rect = *i;
+      glPixelStorei(GL_UNPACK_SKIP_PIXELS, rect.x);
+      VERIFY_NO_ERROR;
+      glPixelStorei(GL_UNPACK_SKIP_ROWS, rect.y);
+      VERIFY_NO_ERROR;
+      glTexSubImage2D(GL_TEXTURE_2D, 0, rect.x,
rect.y, rect.width, rect.height,
+                      GL_BGRA, GL_UNSIGNED_INT_8_8_8_8_REV, buffer);
+      VERIFY_NO_ERROR;
+    }
+  }
+}
+
+void DevRenderer::OnBeforeClose(CefRefPtr<CefBrowser> browser) {
+  CEF_REQUIRE_UI_THREAD();
+  int identifier = browser->GetIdentifier();
+  BrowserData* data = &browser_data_[identifier];
+  glDeleteTextures(1, &data->texture_id);
+  browser_data_.erase(identifier);
+}
+
+void DevRenderer::Run() {
+  glfwSetErrorCallback(glfw_error_callback);
+
+  if (!glfwInit()) {
+    std::cerr << "Failed to initialize GLFW" << std::endl;
+    return;
+  }
+
+  gleqInit();
+
+  const char* glsl_version = "#version 150";
+  glfwWindowHint(GLFW_CONTEXT_VERSION_MAJOR, 3);
+  glfwWindowHint(GLFW_CONTEXT_VERSION_MINOR, 2);
+  glfwWindowHint(GLFW_OPENGL_PROFILE, GLFW_OPENGL_CORE_PROFILE);
+  glfwWindowHint(GLFW_OPENGL_FORWARD_COMPAT, GL_TRUE);
+
+  window_ =
+      glfwCreateWindow(800, 600, "livekit-plugins-browser (Development Window)",
+                       nullptr, nullptr);
+
+  if (!window_) {
+    std::cerr << "Failed to create GLFW window" << std::endl;
+    glfwTerminate();
+    return;
+  }
+
+  // Track the window only after creation is known to have succeeded.
+  gleqTrackWindow(window_);
+
+  glfwMakeContextCurrent(window_);
+  glfwSwapInterval(1);  // Enable vsync
+
+  IMGUI_CHECKVERSION();
+
+  ImGui::CreateContext();
+  ImGuiIO& io = ImGui::GetIO();
+  io.ConfigFlags |= ImGuiConfigFlags_NavEnableKeyboard;
+  io.ConfigFlags |= ImGuiConfigFlags_DockingEnable;
+
+  ImGui_ImplGlfw_InitForOpenGL(window_, true);
+  ImGui_ImplOpenGL3_Init(glsl_version);
+
+  ImVec4 clear_color = ImVec4(0.03f, 0.03f, 0.03f, 1.0f);
+  while (!glfwWindowShouldClose(window_)) {
+    glfwPollEvents();
+
+    CefDoMessageLoopWork();
+
+    ImGui_ImplOpenGL3_NewFrame();
+    ImGui_ImplGlfw_NewFrame();
+    ImGui::NewFrame();
+
+    // Flags used for the "invisible" dockspace frame
+    ImGuiWindowFlags windowFlags =
+        ImGuiWindowFlags_NoDocking | ImGuiWindowFlags_NoTitleBar |
+        ImGuiWindowFlags_NoCollapse | ImGuiWindowFlags_NoResize |
+        ImGuiWindowFlags_NoMove | ImGuiWindowFlags_NoBringToFrontOnFocus |
+        ImGuiWindowFlags_NoNavFocus |
ImGuiWindowFlags_NoBackground;
+
+    ImGuiViewport* viewport = ImGui::GetMainViewport();
+    ImGui::SetNextWindowPos(viewport->Pos);
+    ImGui::SetNextWindowSize(viewport->Size);
+    ImGui::SetNextWindowViewport(viewport->ID);
+
+    ImGui::PushStyleVar(ImGuiStyleVar_WindowRounding, 0);
+    ImGui::PushStyleVar(ImGuiStyleVar_WindowBorderSize, 0);
+    ImGui::PushStyleVar(ImGuiStyleVar_WindowPadding, ImVec2(0, 0));
+    ImGui::Begin("Editor", nullptr, windowFlags);
+    ImGui::PopStyleVar(3);
+    ImGui::DockSpace(ImGui::GetID("EditorDockSpace"), ImVec2(),
+                     ImGuiDockNodeFlags_PassthruCentralNode);
+
+    // Focused browser input states
+    BrowserData* focused_browser = nullptr;
+    int browser_view_x = 0;
+    int browser_view_y = 0;
+
+    for (auto& [identifier, data] : browser_data_) {
+      std::string name =
+          (data.title.empty() ? "Browser #" + std::to_string(identifier)
+                              : data.title) +
+          "###Browser" + std::to_string(identifier);
+
+      ImGui::PushStyleVar(ImGuiStyleVar_WindowPadding, ImVec2(0, 0));
+      if (ImGui::Begin(name.c_str())) {
+        ImGui::BeginDisabled(!data.browser->CanGoBack());
+        if (ImGui::ArrowButton("##BrowserBack", ImGuiDir_Left)) {
+          data.browser->GoBack();
+        }
+        ImGui::EndDisabled();
+        ImGui::SameLine();
+
+        ImGui::BeginDisabled(!data.browser->CanGoForward());
+        if (ImGui::ArrowButton("##BrowserForward", ImGuiDir_Right)) {
+          data.browser->GoForward();
+        }
+        ImGui::EndDisabled();
+        ImGui::SameLine();
+
+        if (ImGui::InputText("##BrowserURL", &data.url,
+                             ImGuiInputTextFlags_EnterReturnsTrue)) {
+          data.browser->GetMainFrame()->LoadURL(data.url);
+        }
+
+        ImGui::SameLine();
+
+        if (ImGui::Button("Show DevTools")) {
+          CefWindowInfo windowInfo{};
+          CefBrowserSettings settings{};
+
+          data.browser->GetHost()->ShowDevTools(
+              windowInfo, DevToolsHandler::GetInstance(), settings, CefPoint());
+        }
+
+        ImVec2 size = ImGui::GetContentRegionAvail();
+
+        // Resize the browser view if needed
+        if (size.x > 0 && size.y > 0 &&
+            (data.view_width != static_cast<int>(size.x) ||
+             data.view_height != static_cast<int>(size.y))) {
+          browser_store_->GetBrowserHandle(identifier)
+              ->SetSize(static_cast<int>(size.x), static_cast<int>(size.y));
+        }
+
+        ImVec2 cursor_pos = ImGui::GetCursorScreenPos();
+
+        bool is_focused = ImGui::IsWindowFocused();
+        if (is_focused) {
+          focused_browser = &data;
+          browser_view_x = static_cast<int>(cursor_pos.x);
+          browser_view_y = static_cast<int>(cursor_pos.y);
+          data.browser->GetHost()->SetFocus(true);
+        }
+
+        // Render the browser texture
+        ImGui::Image((void*)(intptr_t)data.texture_id,
+                     ImVec2((float)data.view_width, (float)data.view_height));
+      }
+      ImGui::End();
+      ImGui::PopStyleVar();
+    }
+
+    GLEQevent event;
+
+    while (gleqNextEvent(&event)) {
+      switch (event.type) {
+        case GLEQ_CURSOR_MOVED:
+        case GLEQ_BUTTON_PRESSED:
+        case GLEQ_SCROLLED:
+        case GLEQ_BUTTON_RELEASED:
+          if (focused_browser) {
+            CefMouseEvent cef_event;
+
+            if (event.type == GLEQ_CURSOR_MOVED) {
+              cef_event.x = event.pos.x - browser_view_x;
+              cef_event.y = event.pos.y - browser_view_y;
+              focused_browser->browser->GetHost()->SendMouseMoveEvent(cef_event,
+                                                                      false);
+            } else if (event.type == GLEQ_SCROLLED) {
+              double xpos, ypos;
+              glfwGetCursorPos(window_, &xpos, &ypos);
+              cef_event.x = static_cast<int>(xpos) - browser_view_x;
+              cef_event.y = static_cast<int>(ypos) - browser_view_y;
+
+              static const int scrollbarPixelsPerTick = 20;
+              int scroll_x =
+                  static_cast<int>(event.scroll.x * scrollbarPixelsPerTick);
+              int scroll_y =
+                  static_cast<int>(event.scroll.y * scrollbarPixelsPerTick);
+
+              focused_browser->browser->GetHost()->SendMouseWheelEvent(
+                  cef_event, scroll_x, scroll_y);
+            } else {
+              double xpos, ypos;
+              glfwGetCursorPos(window_, &xpos, &ypos);
+              cef_event.x = static_cast<int>(xpos) - browser_view_x;
+              cef_event.y = static_cast<int>(ypos) - browser_view_y;
+              cef_event.modifiers = glfw_mods_to_cef_mods(event.mouse.mods);
+
+              std::optional<CefBrowserHost::MouseButtonType> cef_button =
+                  glfw_button_to_cef_button(event.mouse.button);
+
+              if (cef_button.has_value()) {
+                focused_browser->browser->GetHost()->SendMouseClickEvent(
+                    cef_event,
cef_button.value(), + event.type == GLEQ_BUTTON_RELEASED, 1); + } + } + } + break; + case GLEQ_KEY_PRESSED: + case GLEQ_KEY_RELEASED: + if (focused_browser) { + CefKeyEvent cef_event; + cef_event.windows_key_code = + glfw_key_to_cef_key(event.keyboard.key); + cef_event.native_key_code = event.keyboard.scancode; + cef_event.modifiers = glfw_mods_to_cef_mods(event.keyboard.mods); + cef_event.is_system_key = false; + + if (event.type == GLEQ_KEY_PRESSED) { + cef_event.type = KEYEVENT_RAWKEYDOWN; + focused_browser->browser->GetHost()->SendKeyEvent(cef_event); + } else { + cef_event.type = KEYEVENT_KEYUP; + focused_browser->browser->GetHost()->SendKeyEvent(cef_event); + } + } + break; + case GLEQ_CODEPOINT_INPUT: + if (focused_browser) { + CefKeyEvent cef_event; + cef_event.type = KEYEVENT_CHAR; + cef_event.windows_key_code = 0; + cef_event.native_key_code = 0; + cef_event.modifiers = 0; + cef_event.is_system_key = false; + cef_event.unmodified_character = event.codepoint; + cef_event.character = event.codepoint; + focused_browser->browser->GetHost()->SendKeyEvent(cef_event); + } + break; + default: + break; + } + + gleqFreeEvent(&event); + } + + ImGui::End(); + ImGui::Render(); + int display_w, display_h; + glfwGetFramebufferSize(window_, &display_w, &display_h); + glViewport(0, 0, display_w, display_h); + glClearColor(clear_color.x * clear_color.w, clear_color.y * clear_color.w, + clear_color.z * clear_color.w, clear_color.w); + glClear(GL_COLOR_BUFFER_BIT); + ImGui_ImplOpenGL3_RenderDrawData(ImGui::GetDrawData()); + + glfwSwapBuffers(window_); + } + + ImGui_ImplOpenGL3_Shutdown(); + ImGui_ImplGlfw_Shutdown(); + ImGui::DestroyContext(); + + glfwDestroyWindow(window_); + glfwTerminate(); +} + +void DevRenderer::Close() { + // glfwSetWindowShouldClose(window_, GLFW_TRUE); +} diff --git a/livekit-plugins/livekit-plugins-browser/cef/src/dev_renderer.hpp b/livekit-plugins/livekit-plugins-browser/src/dev_renderer.hpp similarity index 62% rename from 
livekit-plugins/livekit-plugins-browser/cef/src/dev_renderer.hpp
rename to livekit-plugins/livekit-plugins-browser/src/dev_renderer.hpp
index 673674474..c6110a742 100644
--- a/livekit-plugins/livekit-plugins-browser/cef/src/dev_renderer.hpp
+++ b/livekit-plugins/livekit-plugins-browser/src/dev_renderer.hpp
@@ -2,6 +2,7 @@
 #define LKCEF_DEV_RENDERER_HPP
 
 #include "include/cef_app.h"
+#include "browser_handle.hpp"
 
 #define GL_SILENCE_DEPRECATION
 #include <GLFW/glfw3.h>  // Will drag system OpenGL headers
@@ -13,11 +14,18 @@ class DevRenderer: public CefBaseRefCounted {
  public:
-  DevRenderer();
+  DevRenderer(CefRefPtr<BrowserStore> browser_store);
 
   void Run();
 
   void Close();
 
+  void OnTitleChange(CefRefPtr<CefBrowser> browser,
+                     const CefString &title);
+
+  void OnLoadingStateChange(CefRefPtr<CefBrowser> browser,
+                            bool isLoading,
+                            bool canGoBack,
+                            bool canGoForward);
 
   void OnAfterCreated(CefRefPtr<CefBrowser> browser);
 
@@ -30,19 +38,24 @@ class DevRenderer: public CefBaseRefCounted {
 
   void OnBeforeClose(CefRefPtr<CefBrowser> browser);
 
-  void* getNativeWindowHandle() {
+  void* getNativeWindowHandle() const {
     return glfwGetCocoaWindow(window_);
   }
 
  private:
-  struct RenderData{
+  struct BrowserData{
+    CefRefPtr<CefBrowser> browser;
     unsigned int texture_id;
     int view_width;
     int view_height;
+    std::string title;
+    std::string url;
   };
 
   GLFWwindow* window_ = nullptr;
-  std::unordered_map<int, RenderData> render_data_;
+  std::unordered_map<int, BrowserData> browser_data_;
+
+  CefRefPtr<BrowserStore> browser_store_;
 
   IMPLEMENT_REFCOUNTING(DevRenderer);
 };
diff --git a/livekit-plugins/livekit-plugins-browser/src/dummy.cpp b/livekit-plugins/livekit-plugins-browser/src/dummy.cpp
new file mode 100644
index 000000000..d269c8943
--- /dev/null
+++ b/livekit-plugins/livekit-plugins-browser/src/dummy.cpp
@@ -0,0 +1,3 @@
+int main() {
+  return 0;
+}
\ No newline at end of file
diff --git a/livekit-plugins/livekit-plugins-browser/src/gleq.h b/livekit-plugins/livekit-plugins-browser/src/gleq.h
new file mode 100644
index 000000000..69a9e6293
--- /dev/null
+++ b/livekit-plugins/livekit-plugins-browser/src/gleq.h
@@ -0,0 +1,419 @@
+/*
+* GLEQ - A basic event queue for GLFW 3
+* Copyright © Camilla Löwy
+*
+* This software is provided 'as-is', without any express or implied
+* warranty. In no event will the authors be held liable for any damages
+* arising from the use of this software.
+*
+* Permission is granted to anyone to use this software for any purpose,
+* including commercial applications, and to alter it and redistribute it
+* freely, subject to the following restrictions:
+*
+* 1. The origin of this software must not be misrepresented; you must not
+*    claim that you wrote the original software. If you use this software
+*    in a product, an acknowledgment in the product documentation would
+*    be appreciated but is not required.
+*
+* 2. Altered source versions must be plainly marked as such, and must not
+*    be misrepresented as being the original software.
+*
+* 3. This notice may not be removed or altered from any source
+*    distribution.
+*/
+
+#ifndef GLEQ_HEADER_FILE
+#define GLEQ_HEADER_FILE
+
+#include <GLFW/glfw3.h>
+
+#ifdef GLEQ_STATIC
+#define GLEQDEF static
+#else
+#define GLEQDEF extern
+#endif
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+typedef enum
+{
+    GLEQ_NONE,
+    GLEQ_WINDOW_MOVED,
+    GLEQ_WINDOW_RESIZED,
+    GLEQ_WINDOW_CLOSED,
+    GLEQ_WINDOW_REFRESH,
+    GLEQ_WINDOW_FOCUSED,
+    GLEQ_WINDOW_DEFOCUSED,
+    GLEQ_WINDOW_ICONIFIED,
+    GLEQ_WINDOW_UNICONIFIED,
+    GLEQ_FRAMEBUFFER_RESIZED,
+    GLEQ_BUTTON_PRESSED,
+    GLEQ_BUTTON_RELEASED,
+    GLEQ_CURSOR_MOVED,
+    GLEQ_CURSOR_ENTERED,
+    GLEQ_CURSOR_LEFT,
+    GLEQ_SCROLLED,
+    GLEQ_KEY_PRESSED,
+    GLEQ_KEY_REPEATED,
+    GLEQ_KEY_RELEASED,
+    GLEQ_CODEPOINT_INPUT,
+    GLEQ_MONITOR_CONNECTED,
+    GLEQ_MONITOR_DISCONNECTED,
+#if GLFW_VERSION_MINOR >= 1
+    GLEQ_FILE_DROPPED,
+#endif
+#if GLFW_VERSION_MINOR >= 2
+    GLEQ_JOYSTICK_CONNECTED,
+    GLEQ_JOYSTICK_DISCONNECTED,
+#endif
+#if GLFW_VERSION_MINOR >= 3
+    GLEQ_WINDOW_MAXIMIZED,
+    GLEQ_WINDOW_UNMAXIMIZED,
+    GLEQ_WINDOW_SCALE_CHANGED,
+#endif
+} GLEQtype;
+
+typedef struct GLEQevent
+{
+    GLEQtype type;
+    union {
+
GLFWwindow* window;
+        GLFWmonitor* monitor;
+        int joystick;
+    };
+    union {
+        struct {
+            int x;
+            int y;
+        } pos;
+        struct {
+            int width;
+            int height;
+        } size;
+        struct {
+            double x;
+            double y;
+        } scroll;
+        struct {
+            int key;
+            int scancode;
+            int mods;
+        } keyboard;
+        struct {
+            int button;
+            int mods;
+        } mouse;
+        unsigned int codepoint;
+#if GLFW_VERSION_MINOR >= 1
+        struct {
+            char** paths;
+            int count;
+        } file;
+#endif
+#if GLFW_VERSION_MINOR >= 3
+        struct {
+            float x;
+            float y;
+        } scale;
+#endif
+    };
+} GLEQevent;
+
+GLEQDEF void gleqInit(void);
+GLEQDEF void gleqTrackWindow(GLFWwindow* window);
+
+GLEQDEF int gleqNextEvent(GLEQevent* event);
+GLEQDEF void gleqFreeEvent(GLEQevent* event);
+
+#ifdef __cplusplus
+}
+#endif
+
+#ifdef GLEQ_IMPLEMENTATION
+
+#include <assert.h>
+#include <string.h>
+#include <stdlib.h>
+
+#ifndef GLEQ_CAPACITY
+#define GLEQ_CAPACITY 1024
+#endif
+
+static struct
+{
+    GLEQevent events[GLEQ_CAPACITY];
+    size_t head;
+    size_t tail;
+} gleq_queue = { {}, 0, 0 };
+
+static char* gleq_strdup(const char* string)
+{
+    const size_t size = strlen(string) + 1;
+    char* result = (char*) malloc(size);
+    memcpy(result, string, size);
+    return result;
+}
+
+static GLEQevent* gleq_new_event(void)
+{
+    GLEQevent* event = gleq_queue.events + gleq_queue.head;
+    gleq_queue.head = (gleq_queue.head + 1) % GLEQ_CAPACITY;
+    assert(gleq_queue.head != gleq_queue.tail);
+    memset(event, 0, sizeof(GLEQevent));
+    return event;
+}
+
+static void gleq_window_pos_callback(GLFWwindow* window, int x, int y)
+{
+    GLEQevent* event = gleq_new_event();
+    event->type = GLEQ_WINDOW_MOVED;
+    event->window = window;
+    event->pos.x = x;
+    event->pos.y = y;
+}
+
+static void gleq_window_size_callback(GLFWwindow* window, int width, int height)
+{
+    GLEQevent* event = gleq_new_event();
+    event->type = GLEQ_WINDOW_RESIZED;
+    event->window = window;
+    event->size.width = width;
+    event->size.height = height;
+}
+
+static void gleq_window_close_callback(GLFWwindow* window)
+{
+    GLEQevent* event =
gleq_new_event(); + event->type = GLEQ_WINDOW_CLOSED; + event->window = window; +} + +static void gleq_window_refresh_callback(GLFWwindow* window) +{ + GLEQevent* event = gleq_new_event(); + event->type = GLEQ_WINDOW_REFRESH; + event->window = window; +} + +static void gleq_window_focus_callback(GLFWwindow* window, int focused) +{ + GLEQevent* event = gleq_new_event(); + event->window = window; + + if (focused) + event->type = GLEQ_WINDOW_FOCUSED; + else + event->type = GLEQ_WINDOW_DEFOCUSED; +} + +static void gleq_window_iconify_callback(GLFWwindow* window, int iconified) +{ + GLEQevent* event = gleq_new_event(); + event->window = window; + + if (iconified) + event->type = GLEQ_WINDOW_ICONIFIED; + else + event->type = GLEQ_WINDOW_UNICONIFIED; +} + +static void gleq_framebuffer_size_callback(GLFWwindow* window, int width, int height) +{ + GLEQevent* event = gleq_new_event(); + event->type = GLEQ_FRAMEBUFFER_RESIZED; + event->window = window; + event->size.width = width; + event->size.height = height; +} + +static void gleq_mouse_button_callback(GLFWwindow* window, int button, int action, int mods) +{ + GLEQevent* event = gleq_new_event(); + event->window = window; + event->mouse.button = button; + event->mouse.mods = mods; + + if (action == GLFW_PRESS) + event->type = GLEQ_BUTTON_PRESSED; + else if (action == GLFW_RELEASE) + event->type = GLEQ_BUTTON_RELEASED; +} + +static void gleq_cursor_pos_callback(GLFWwindow* window, double x, double y) +{ + GLEQevent* event = gleq_new_event(); + event->type = GLEQ_CURSOR_MOVED; + event->window = window; + event->pos.x = (int) x; + event->pos.y = (int) y; +} + +static void gleq_cursor_enter_callback(GLFWwindow* window, int entered) +{ + GLEQevent* event = gleq_new_event(); + event->window = window; + + if (entered) + event->type = GLEQ_CURSOR_ENTERED; + else + event->type = GLEQ_CURSOR_LEFT; +} + +static void gleq_scroll_callback(GLFWwindow* window, double x, double y) +{ + GLEQevent* event = gleq_new_event(); + event->type = 
GLEQ_SCROLLED; + event->window = window; + event->scroll.x = x; + event->scroll.y = y; +} + +static void gleq_key_callback(GLFWwindow* window, int key, int scancode, int action, int mods) +{ + GLEQevent* event = gleq_new_event(); + event->window = window; + event->keyboard.key = key; + event->keyboard.scancode = scancode; + event->keyboard.mods = mods; + + if (action == GLFW_PRESS) + event->type = GLEQ_KEY_PRESSED; + else if (action == GLFW_RELEASE) + event->type = GLEQ_KEY_RELEASED; + else if (action == GLFW_REPEAT) + event->type = GLEQ_KEY_REPEATED; +} + +static void gleq_char_callback(GLFWwindow* window, unsigned int codepoint) +{ + GLEQevent* event = gleq_new_event(); + event->type = GLEQ_CODEPOINT_INPUT; + event->window = window; + event->codepoint = codepoint; +} + +static void gleq_monitor_callback(GLFWmonitor* monitor, int action) +{ + GLEQevent* event = gleq_new_event(); + event->monitor = monitor; + + if (action == GLFW_CONNECTED) + event->type = GLEQ_MONITOR_CONNECTED; + else if (action == GLFW_DISCONNECTED) + event->type = GLEQ_MONITOR_DISCONNECTED; +} + +#if GLFW_VERSION_MINOR >= 1 +static void gleq_file_drop_callback(GLFWwindow* window, int count, const char** paths) +{ + GLEQevent* event = gleq_new_event(); + event->type = GLEQ_FILE_DROPPED; + event->window = window; + event->file.paths = (char**) malloc(count * sizeof(char*)); + event->file.count = count; + + while (count--) + event->file.paths[count] = gleq_strdup(paths[count]); +} +#endif + +#if GLFW_VERSION_MINOR >= 2 +static void gleq_joystick_callback(int jid, int action) +{ + GLEQevent* event = gleq_new_event(); + event->joystick = jid; + + if (action == GLFW_CONNECTED) + event->type = GLEQ_JOYSTICK_CONNECTED; + else if (action == GLFW_DISCONNECTED) + event->type = GLEQ_JOYSTICK_DISCONNECTED; +} +#endif + +#if GLFW_VERSION_MINOR >= 3 +static void gleq_window_maximize_callback(GLFWwindow* window, int maximized) +{ + GLEQevent* event = gleq_new_event(); + event->window = window; + + if 
(maximized) + event->type = GLEQ_WINDOW_MAXIMIZED; + else + event->type = GLEQ_WINDOW_UNMAXIMIZED; +} + +static void gleq_window_content_scale_callback(GLFWwindow* window, float xscale, float yscale) +{ + GLEQevent* event = gleq_new_event(); + event->window = window; + event->type = GLEQ_WINDOW_SCALE_CHANGED; + event->scale.x = xscale; + event->scale.y = yscale; +} +#endif + +GLEQDEF void gleqInit(void) +{ + glfwSetMonitorCallback(gleq_monitor_callback); +#if GLFW_VERSION_MINOR >= 2 + glfwSetJoystickCallback(gleq_joystick_callback); +#endif +} + +GLEQDEF void gleqTrackWindow(GLFWwindow* window) +{ + glfwSetWindowPosCallback(window, gleq_window_pos_callback); + glfwSetWindowSizeCallback(window, gleq_window_size_callback); + glfwSetWindowCloseCallback(window, gleq_window_close_callback); + glfwSetWindowRefreshCallback(window, gleq_window_refresh_callback); + glfwSetWindowFocusCallback(window, gleq_window_focus_callback); + glfwSetWindowIconifyCallback(window, gleq_window_iconify_callback); + glfwSetFramebufferSizeCallback(window, gleq_framebuffer_size_callback); + glfwSetMouseButtonCallback(window, gleq_mouse_button_callback); + glfwSetCursorPosCallback(window, gleq_cursor_pos_callback); + glfwSetCursorEnterCallback(window, gleq_cursor_enter_callback); + glfwSetScrollCallback(window, gleq_scroll_callback); + glfwSetKeyCallback(window, gleq_key_callback); + glfwSetCharCallback(window, gleq_char_callback); +#if GLFW_VERSION_MINOR >= 1 + glfwSetDropCallback(window, gleq_file_drop_callback); +#endif +#if GLFW_VERSION_MINOR >= 3 + glfwSetWindowMaximizeCallback(window, gleq_window_maximize_callback); + glfwSetWindowContentScaleCallback(window, gleq_window_content_scale_callback); +#endif +} + +GLEQDEF int gleqNextEvent(GLEQevent* event) +{ + memset(event, 0, sizeof(GLEQevent)); + + if (gleq_queue.head != gleq_queue.tail) + { + *event = gleq_queue.events[gleq_queue.tail]; + gleq_queue.tail = (gleq_queue.tail + 1) % GLEQ_CAPACITY; + } + + return event->type != GLEQ_NONE; +} 
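The queue above is a single fixed-capacity ring buffer: GLFW callbacks write events at `head`, `gleqNextEvent` drains them at `tail`, and both indices wrap modulo the capacity, with an assert guarding overflow. A minimal self-contained C++ sketch of that scheme (names like `Event`, `EventQueue`, and `kCapacity` are illustrative, not gleq's actual API):

```cpp
#include <cassert>
#include <cstddef>

// Sketch of a gleq-style fixed-capacity event ring buffer.
struct Event {
  int type = 0;
};

constexpr std::size_t kCapacity = 8;

struct EventQueue {
  Event events[kCapacity];
  std::size_t head = 0;  // next slot a producer writes into
  std::size_t tail = 0;  // next slot the consumer reads from

  // Producer side; gleq does this inside its GLFW callbacks.
  void push(int type) {
    events[head] = Event{type};
    head = (head + 1) % kCapacity;
    // gleq asserts rather than growing: overflow means events were dropped.
    assert(head != tail && "event queue overflow");
  }

  // Consumer side, one event per call like gleqNextEvent();
  // returns false when the queue is empty (head == tail).
  bool pop(Event* out) {
    if (head == tail) {
      return false;
    }
    *out = events[tail];
    tail = (tail + 1) % kCapacity;
    return true;
  }
};
```

Because `head == tail` doubles as the empty condition, the queue holds at most `kCapacity - 1` events before the assert fires, which is why a generous default capacity (1024 in gleq) is paired with per-frame draining in the render loop.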
+
+GLEQDEF void gleqFreeEvent(GLEQevent* event)
+{
+#if GLFW_VERSION_MINOR >= 1
+    if (event->type == GLEQ_FILE_DROPPED)
+    {
+        while (event->file.count--)
+            free(event->file.paths[event->file.count]);
+
+        free(event->file.paths);
+    }
+#endif
+
+    memset(event, 0, sizeof(GLEQevent));
+}
+
+#endif /* GLEQ_IMPLEMENTATION */
+
+#endif /* GLEQ_HEADER_FILE */
diff --git a/livekit-plugins/livekit-plugins-browser/src/handler.cpp b/livekit-plugins/livekit-plugins-browser/src/handler.cpp
new file mode 100644
index 000000000..1c5e95972
--- /dev/null
+++ b/livekit-plugins/livekit-plugins-browser/src/handler.cpp
@@ -0,0 +1,181 @@
+#include "handler.hpp"
+
+#include <iostream>
+
+#include "include/base/cef_callback.h"
+#include "include/cef_parser.h"
+#include "include/views/cef_browser_view.h"
+#include "include/wrapper/cef_closure_task.h"
+#include "include/wrapper/cef_helpers.h"
+
+DevToolsHandler* g_dev_instance = nullptr;
+
+DevToolsHandler::DevToolsHandler() {
+  g_dev_instance = this;
+}
+
+DevToolsHandler::~DevToolsHandler() {
+  g_dev_instance = nullptr;
+}
+
+DevToolsHandler* DevToolsHandler::GetInstance() {
+  return g_dev_instance;
+}
+
+AgentHandler* g_instance = nullptr;
+
+AgentHandler::AgentHandler(CefRefPtr<BrowserStore> browser_store,
+                           CefRefPtr<DevRenderer> dev_renderer)
+    : browser_store_(std::move(browser_store)),
+      dev_renderer_(std::move(dev_renderer)) {
+  g_instance = this;
+}
+
+AgentHandler::~AgentHandler() {
+  g_instance = nullptr;
+}
+
+AgentHandler* AgentHandler::GetInstance() {
+  return g_instance;
+}
+
+void AgentHandler::OnTitleChange(CefRefPtr<CefBrowser> browser,
+                                 const CefString& title) {
+  CEF_REQUIRE_UI_THREAD();
+  if (dev_renderer_)
+    dev_renderer_->OnTitleChange(browser, title);
+}
+
+void AgentHandler::OnPaint(CefRefPtr<CefBrowser> browser,
+                           PaintElementType type,
+                           const RectList& dirtyRects,
+                           const void* buffer,
+                           int width,
+                           int height) {
+  CEF_REQUIRE_UI_THREAD();
+
+  int identifier = browser->GetIdentifier();
+  CefRefPtr<BrowserHandle> handle =
+      browser_store_->browser_handles_[identifier];
+  if
(handle->paint_callback_)
+    handle->paint_callback_(dirtyRects, buffer, width, height);
+
+  if (dev_renderer_)
+    dev_renderer_->OnPaint(browser, type, dirtyRects, buffer, width, height);
+}
+
+void AgentHandler::GetViewRect(CefRefPtr<CefBrowser> browser, CefRect& rect) {
+  CEF_REQUIRE_UI_THREAD();
+
+  int identifier = browser->GetIdentifier();
+  CefRefPtr<BrowserHandle>& handle =
+      browser_store_->browser_handles_[identifier];
+  rect.Set(0, 0, handle->GetWidth(), handle->GetHeight());
+}
+
+void AgentHandler::OnAudioStreamPacket(CefRefPtr<CefBrowser> browser,
+                                       const float** data,
+                                       int frames,
+                                       int64_t pts) {
+  // std::cout << "OnAudioStreamPacket" << std::endl;
+}
+
+void AgentHandler::OnAudioStreamStarted(CefRefPtr<CefBrowser> browser,
+                                        const CefAudioParameters& params,
+                                        int channels) {}
+
+void AgentHandler::OnAudioStreamStopped(CefRefPtr<CefBrowser> browser) {}
+
+void AgentHandler::OnAudioStreamError(CefRefPtr<CefBrowser> browser,
+                                      const CefString& message) {}
+
+bool AgentHandler::OnBeforePopup(CefRefPtr<CefBrowser> browser,
+                                 CefRefPtr<CefFrame> frame,
+                                 const CefString& target_url,
+                                 const CefString& target_frame_name,
+                                 WindowOpenDisposition target_disposition,
+                                 bool user_gesture,
+                                 const CefPopupFeatures& popupFeatures,
+                                 CefWindowInfo& windowInfo,
+                                 CefRefPtr<CefClient>& client,
+                                 CefBrowserSettings& settings,
+                                 CefRefPtr<CefDictionaryValue>& extra_info,
+                                 bool* no_javascript_access) {
+  browser->GetMainFrame()->LoadURL(target_url);
+  return true;
+}
+
+void AgentHandler::OnAfterCreated(CefRefPtr<CefBrowser> browser) {
+  CEF_REQUIRE_UI_THREAD();
+
+  if (browser->IsPopup()) {
+    return;
+  }
+
+  int identifier = browser->GetIdentifier();
+  CefRefPtr<BrowserHandle> handle = browser_store_->pending_handles_.front();
+  browser_store_->pending_handles_.pop_front();
+
+  handle->browser_ = browser;
+  browser_store_->browser_handles_[identifier] = handle;
+
+  if (handle->created_callback_)
+    handle->created_callback_();
+
+  if (dev_renderer_)
+    dev_renderer_->OnAfterCreated(browser);
+}
+
+bool AgentHandler::DoClose(CefRefPtr<CefBrowser> browser) {
+  CEF_REQUIRE_UI_THREAD();
+  int identifier = browser->GetIdentifier();
+
CefRefPtr<BrowserHandle> handle =
+      browser_store_->browser_handles_[identifier];
+  browser_store_->browser_handles_.erase(identifier);
+
+  if (handle->close_callback_)
+    handle->close_callback_();
+
+  return false;
+}
+
+void AgentHandler::OnBeforeClose(CefRefPtr<CefBrowser> browser) {
+  CEF_REQUIRE_UI_THREAD();
+
+  if (dev_renderer_)
+    dev_renderer_->OnBeforeClose(browser);
+}
+
+void AgentHandler::OnLoadingStateChange(CefRefPtr<CefBrowser> browser,
+                                        bool isLoading,
+                                        bool canGoBack,
+                                        bool canGoForward) {
+  CEF_REQUIRE_UI_THREAD();
+
+  if (dev_renderer_)
+    dev_renderer_->OnLoadingStateChange(browser, isLoading, canGoBack,
+                                        canGoForward);
+}
+
+void AgentHandler::CloseAllBrowsers(bool force_close) {
+  if (!CefCurrentlyOn(TID_UI)) {
+    // Execute on the UI thread.
+    CefPostTask(TID_UI, base::BindOnce(&AgentHandler::CloseAllBrowsers, this,
+                                       force_close));
+    return;
+  }
+
+  if (browser_store_->browser_handles_.empty()) {
+    return;
+  }
+
+  for (const auto& pair : browser_store_->browser_handles_) {
+    pair.second->browser_->GetHost()->CloseBrowser(force_close);
+  }
+}
+
+#if !defined(OS_MAC)
+void AgentHandler::PlatformShowWindow(CefRefPtr<CefBrowser> browser) {
+  NOTIMPLEMENTED();
+}
+#endif
diff --git a/livekit-plugins/livekit-plugins-browser/src/handler.hpp b/livekit-plugins/livekit-plugins-browser/src/handler.hpp
new file mode 100644
index 000000000..3967ee7b2
--- /dev/null
+++ b/livekit-plugins/livekit-plugins-browser/src/handler.hpp
@@ -0,0 +1,104 @@
+#ifndef LKCEF_HANDLER_HPP
+#define LKCEF_HANDLER_HPP
+
+#include
+
+#include "dev_renderer.hpp"
+#include "browser_handle.hpp"
+#include "include/cef_client.h"
+#include "include/wrapper/cef_helpers.h"
+
+class DevToolsHandler : public CefClient {
+ public:
+  DevToolsHandler();
+  ~DevToolsHandler();
+
+  static DevToolsHandler* GetInstance();
+
+ private:
+  IMPLEMENT_REFCOUNTING(DevToolsHandler);
+};
+
+class AgentHandler : public CefClient,
+                     public CefDisplayHandler,
+                     public CefRenderHandler,
+                     public CefAudioHandler,
+                     public CefLifeSpanHandler,
+                     public
CefLoadHandler {
+ public:
+  AgentHandler(CefRefPtr<BrowserStore> browser_store,
+               CefRefPtr<DevRenderer> dev_renderer);
+  ~AgentHandler();
+
+  static AgentHandler* GetInstance();
+
+  CefRefPtr<CefDisplayHandler> GetDisplayHandler() override { return this; }
+  CefRefPtr<CefRenderHandler> GetRenderHandler() override { return this; }
+  CefRefPtr<CefAudioHandler> GetAudioHandler() override { return this; }
+  CefRefPtr<CefLifeSpanHandler> GetLifeSpanHandler() override { return this; }
+  CefRefPtr<CefLoadHandler> GetLoadHandler() override { return this; }
+
+  // CefDisplayHandler methods
+  void OnTitleChange(CefRefPtr<CefBrowser> browser,
+                     const CefString& title) override;
+
+  // CefRenderHandler methods
+  void OnPaint(CefRefPtr<CefBrowser> browser,
+               PaintElementType type,
+               const RectList& dirtyRects,
+               const void* buffer,
+               int width,
+               int height) override;
+
+  void GetViewRect(CefRefPtr<CefBrowser> browser, CefRect& rect) override;
+
+  // CefAudioHandler methods
+  void OnAudioStreamPacket(CefRefPtr<CefBrowser> browser,
+                           const float** data,
+                           int frames,
+                           int64_t pts) override;
+
+  void OnAudioStreamStarted(CefRefPtr<CefBrowser> browser,
+                            const CefAudioParameters& params,
+                            int channels) override;
+
+  void OnAudioStreamStopped(CefRefPtr<CefBrowser> browser) override;
+
+  void OnAudioStreamError(CefRefPtr<CefBrowser> browser,
+                          const CefString& message) override;
+
+  // CefLifeSpanHandler methods
+
+  bool OnBeforePopup(CefRefPtr<CefBrowser> browser,
+                     CefRefPtr<CefFrame> frame,
+                     const CefString& target_url,
+                     const CefString& target_frame_name,
+                     WindowOpenDisposition target_disposition,
+                     bool user_gesture,
+                     const CefPopupFeatures& popupFeatures,
+                     CefWindowInfo& windowInfo,
+                     CefRefPtr<CefClient>& client,
+                     CefBrowserSettings& settings,
+                     CefRefPtr<CefDictionaryValue>& extra_info,
+                     bool* no_javascript_access) override;
+
+  void OnAfterCreated(CefRefPtr<CefBrowser> browser) override;
+  bool DoClose(CefRefPtr<CefBrowser> browser) override;
+  void OnBeforeClose(CefRefPtr<CefBrowser> browser) override;
+
+  // CefLoadHandler methods
+
+  void OnLoadingStateChange(CefRefPtr<CefBrowser> browser,
+                            bool isLoading,
+                            bool canGoBack,
+                            bool canGoForward) override;
+
+  void CloseAllBrowsers(bool force_close);
+
+ private:
+  CefRefPtr<BrowserStore> browser_store_;
+  CefRefPtr<DevRenderer>
dev_renderer_; + + IMPLEMENT_REFCOUNTING(AgentHandler); +}; + +#endif // LKCEF_HANDLER_HPP diff --git a/livekit-plugins/livekit-plugins-browser/cef/src/helper_main_linux.cpp b/livekit-plugins/livekit-plugins-browser/src/helper_main_linux.cpp similarity index 100% rename from livekit-plugins/livekit-plugins-browser/cef/src/helper_main_linux.cpp rename to livekit-plugins/livekit-plugins-browser/src/helper_main_linux.cpp diff --git a/livekit-plugins/livekit-plugins-browser/cef/src/helper_main_mac.mm b/livekit-plugins/livekit-plugins-browser/src/helper_main_mac.mm similarity index 100% rename from livekit-plugins/livekit-plugins-browser/cef/src/helper_main_mac.mm rename to livekit-plugins/livekit-plugins-browser/src/helper_main_mac.mm diff --git a/livekit-plugins/livekit-plugins-browser/cef/src/utils.hpp b/livekit-plugins/livekit-plugins-browser/src/helper_main_win.cpp similarity index 100% rename from livekit-plugins/livekit-plugins-browser/cef/src/utils.hpp rename to livekit-plugins/livekit-plugins-browser/src/helper_main_win.cpp diff --git a/livekit-plugins/livekit-plugins-browser/src/keyboard_codes.h b/livekit-plugins/livekit-plugins-browser/src/keyboard_codes.h new file mode 100644 index 000000000..5a3b67e82 --- /dev/null +++ b/livekit-plugins/livekit-plugins-browser/src/keyboard_codes.h @@ -0,0 +1,528 @@ +#ifndef LKCEF_KEYBOARD_CODES_H +#define LKCEF_KEYBOARD_CODES_H + +namespace WebCore { +// VK_LBUTTON (01) Left mouse button +// VK_RBUTTON (02) Right mouse button +// VK_CANCEL (03) Control-break processing +// VK_MBUTTON (04) Middle mouse button (three-button mouse) +// VK_XBUTTON1 (05) +// VK_XBUTTON2 (06) + +// VK_BACK (08) BACKSPACE key +const int VK_BACK = 0x08; + +// VK_TAB (09) TAB key +const int VK_TAB = 0x09; + +// VK_CLEAR (0C) CLEAR key +const int VK_CLEAR = 0x0C; + +// VK_RETURN (0D) +const int VK_RETURN = 0x0D; + +// VK_SHIFT (10) SHIFT key +const int VK_SHIFT = 0x10; + +// VK_CONTROL (11) CTRL key +const int VK_CONTROL = 0x11; + +// VK_MENU (12) 
ALT key +const int VK_MENU = 0x12; + +// VK_PAUSE (13) PAUSE key +const int VK_PAUSE = 0x13; + +// VK_CAPITAL (14) CAPS LOCK key +const int VK_CAPITAL = 0x14; + +// VK_KANA (15) Input Method Editor (IME) Kana mode +const int VK_KANA = 0x15; + +// VK_HANGUEL (15) IME Hanguel mode (maintained for compatibility; use +// VK_HANGUL) VK_HANGUL (15) IME Hangul mode +const int VK_HANGUL = 0x15; + +// VK_JUNJA (17) IME Junja mode +const int VK_JUNJA = 0x17; + +// VK_FINAL (18) IME final mode +const int VK_FINAL = 0x18; + +// VK_HANJA (19) IME Hanja mode +const int VK_HANJA = 0x19; + +// VK_KANJI (19) IME Kanji mode +const int VK_KANJI = 0x19; + +// VK_ESCAPE (1B) ESC key +const int VK_ESCAPE = 0x1B; + +// VK_CONVERT (1C) IME convert +const int VK_CONVERT = 0x1C; + +// VK_NONCONVERT (1D) IME nonconvert +const int VK_NONCONVERT = 0x1D; + +// VK_ACCEPT (1E) IME accept +const int VK_ACCEPT = 0x1E; + +// VK_MODECHANGE (1F) IME mode change request +const int VK_MODECHANGE = 0x1F; + +// VK_SPACE (20) SPACEBAR +const int VK_SPACE = 0x20; + +// VK_PRIOR (21) PAGE UP key +const int VK_PRIOR = 0x21; + +// VK_NEXT (22) PAGE DOWN key +const int VK_NEXT = 0x22; + +// VK_END (23) END key +const int VK_END = 0x23; + +// VK_HOME (24) HOME key +const int VK_HOME = 0x24; + +// VK_LEFT (25) LEFT ARROW key +const int VK_LEFT = 0x25; + +// VK_UP (26) UP ARROW key +const int VK_UP = 0x26; + +// VK_RIGHT (27) RIGHT ARROW key +const int VK_RIGHT = 0x27; + +// VK_DOWN (28) DOWN ARROW key +const int VK_DOWN = 0x28; + +// VK_SELECT (29) SELECT key +const int VK_SELECT = 0x29; + +// VK_PRINT (2A) PRINT key +const int VK_PRINT = 0x2A; + +// VK_EXECUTE (2B) EXECUTE key +const int VK_EXECUTE = 0x2B; + +// VK_SNAPSHOT (2C) PRINT SCREEN key +const int VK_SNAPSHOT = 0x2C; + +// VK_INSERT (2D) INS key +const int VK_INSERT = 0x2D; + +// VK_DELETE (2E) DEL key +const int VK_DELETE = 0x2E; + +// VK_HELP (2F) HELP key +const int VK_HELP = 0x2F; + +// (30) 0 key +const int VK_0 = 0x30; + +// (31) 1 key +const int 
VK_1 = 0x31; + +// (32) 2 key +const int VK_2 = 0x32; + +// (33) 3 key +const int VK_3 = 0x33; + +// (34) 4 key +const int VK_4 = 0x34; + +// (35) 5 key; + +const int VK_5 = 0x35; + +// (36) 6 key +const int VK_6 = 0x36; + +// (37) 7 key +const int VK_7 = 0x37; + +// (38) 8 key +const int VK_8 = 0x38; + +// (39) 9 key +const int VK_9 = 0x39; + +// (41) A key +const int VK_A = 0x41; + +// (42) B key +const int VK_B = 0x42; + +// (43) C key +const int VK_C = 0x43; + +// (44) D key +const int VK_D = 0x44; + +// (45) E key +const int VK_E = 0x45; + +// (46) F key +const int VK_F = 0x46; + +// (47) G key +const int VK_G = 0x47; + +// (48) H key +const int VK_H = 0x48; + +// (49) I key +const int VK_I = 0x49; + +// (4A) J key +const int VK_J = 0x4A; + +// (4B) K key +const int VK_K = 0x4B; + +// (4C) L key +const int VK_L = 0x4C; + +// (4D) M key +const int VK_M = 0x4D; + +// (4E) N key +const int VK_N = 0x4E; + +// (4F) O key +const int VK_O = 0x4F; + +// (50) P key +const int VK_P = 0x50; + +// (51) Q key +const int VK_Q = 0x51; + +// (52) R key +const int VK_R = 0x52; + +// (53) S key +const int VK_S = 0x53; + +// (54) T key +const int VK_T = 0x54; + +// (55) U key +const int VK_U = 0x55; + +// (56) V key +const int VK_V = 0x56; + +// (57) W key +const int VK_W = 0x57; + +// (58) X key +const int VK_X = 0x58; + +// (59) Y key +const int VK_Y = 0x59; + +// (5A) Z key +const int VK_Z = 0x5A; + +// VK_LWIN (5B) Left Windows key (Microsoft Natural keyboard) +const int VK_LWIN = 0x5B; + +// VK_RWIN (5C) Right Windows key (Natural keyboard) +const int VK_RWIN = 0x5C; + +// VK_APPS (5D) Applications key (Natural keyboard) +const int VK_APPS = 0x5D; + +// VK_SLEEP (5F) Computer Sleep key +const int VK_SLEEP = 0x5F; + +// VK_NUMPAD0 (60) Numeric keypad 0 key +const int VK_NUMPAD0 = 0x60; + +// VK_NUMPAD1 (61) Numeric keypad 1 key +const int VK_NUMPAD1 = 0x61; + +// VK_NUMPAD2 (62) Numeric keypad 2 key +const int VK_NUMPAD2 = 0x62; + +// VK_NUMPAD3 (63) Numeric keypad 3 key 
+const int VK_NUMPAD3 = 0x63; + +// VK_NUMPAD4 (64) Numeric keypad 4 key +const int VK_NUMPAD4 = 0x64; + +// VK_NUMPAD5 (65) Numeric keypad 5 key +const int VK_NUMPAD5 = 0x65; + +// VK_NUMPAD6 (66) Numeric keypad 6 key +const int VK_NUMPAD6 = 0x66; + +// VK_NUMPAD7 (67) Numeric keypad 7 key +const int VK_NUMPAD7 = 0x67; + +// VK_NUMPAD8 (68) Numeric keypad 8 key +const int VK_NUMPAD8 = 0x68; + +// VK_NUMPAD9 (69) Numeric keypad 9 key +const int VK_NUMPAD9 = 0x69; + +// VK_MULTIPLY (6A) Multiply key +const int VK_MULTIPLY = 0x6A; + +// VK_ADD (6B) Add key +const int VK_ADD = 0x6B; + +// VK_SEPARATOR (6C) Separator key +const int VK_SEPARATOR = 0x6C; + +// VK_SUBTRACT (6D) Subtract key +const int VK_SUBTRACT = 0x6D; + +// VK_DECIMAL (6E) Decimal key +const int VK_DECIMAL = 0x6E; + +// VK_DIVIDE (6F) Divide key +const int VK_DIVIDE = 0x6F; + +// VK_F1 (70) F1 key +const int VK_F1 = 0x70; + +// VK_F2 (71) F2 key +const int VK_F2 = 0x71; + +// VK_F3 (72) F3 key +const int VK_F3 = 0x72; + +// VK_F4 (73) F4 key +const int VK_F4 = 0x73; + +// VK_F5 (74) F5 key +const int VK_F5 = 0x74; + +// VK_F6 (75) F6 key +const int VK_F6 = 0x75; + +// VK_F7 (76) F7 key +const int VK_F7 = 0x76; + +// VK_F8 (77) F8 key +const int VK_F8 = 0x77; + +// VK_F9 (78) F9 key +const int VK_F9 = 0x78; + +// VK_F10 (79) F10 key +const int VK_F10 = 0x79; + +// VK_F11 (7A) F11 key +const int VK_F11 = 0x7A; + +// VK_F12 (7B) F12 key +const int VK_F12 = 0x7B; + +// VK_F13 (7C) F13 key +const int VK_F13 = 0x7C; + +// VK_F14 (7D) F14 key +const int VK_F14 = 0x7D; + +// VK_F15 (7E) F15 key +const int VK_F15 = 0x7E; + +// VK_F16 (7F) F16 key +const int VK_F16 = 0x7F; + +// VK_F17 (80H) F17 key +const int VK_F17 = 0x80; + +// VK_F18 (81H) F18 key +const int VK_F18 = 0x81; + +// VK_F19 (82H) F19 key +const int VK_F19 = 0x82; + +// VK_F20 (83H) F20 key +const int VK_F20 = 0x83; + +// VK_F21 (84H) F21 key +const int VK_F21 = 0x84; + +// VK_F22 (85H) F22 key +const int VK_F22 = 0x85; + +// VK_F23 (86H) F23 key 
+const int VK_F23 = 0x86; + +// VK_F24 (87H) F24 key +const int VK_F24 = 0x87; + +// VK_NUMLOCK (90) NUM LOCK key +const int VK_NUMLOCK = 0x90; + +// VK_SCROLL (91) SCROLL LOCK key +const int VK_SCROLL = 0x91; + +// VK_LSHIFT (A0) Left SHIFT key +const int VK_LSHIFT = 0xA0; + +// VK_RSHIFT (A1) Right SHIFT key +const int VK_RSHIFT = 0xA1; + +// VK_LCONTROL (A2) Left CONTROL key +const int VK_LCONTROL = 0xA2; + +// VK_RCONTROL (A3) Right CONTROL key +const int VK_RCONTROL = 0xA3; + +// VK_LMENU (A4) Left MENU key +const int VK_LMENU = 0xA4; + +// VK_RMENU (A5) Right MENU key +const int VK_RMENU = 0xA5; + +// VK_BROWSER_BACK (A6) Windows 2000/XP: Browser Back key +const int VK_BROWSER_BACK = 0xA6; + +// VK_BROWSER_FORWARD (A7) Windows 2000/XP: Browser Forward key +const int VK_BROWSER_FORWARD = 0xA7; + +// VK_BROWSER_REFRESH (A8) Windows 2000/XP: Browser Refresh key +const int VK_BROWSER_REFRESH = 0xA8; + +// VK_BROWSER_STOP (A9) Windows 2000/XP: Browser Stop key +const int VK_BROWSER_STOP = 0xA9; + +// VK_BROWSER_SEARCH (AA) Windows 2000/XP: Browser Search key +const int VK_BROWSER_SEARCH = 0xAA; + +// VK_BROWSER_FAVORITES (AB) Windows 2000/XP: Browser Favorites key +const int VK_BROWSER_FAVORITES = 0xAB; + +// VK_BROWSER_HOME (AC) Windows 2000/XP: Browser Start and Home key +const int VK_BROWSER_HOME = 0xAC; + +// VK_VOLUME_MUTE (AD) Windows 2000/XP: Volume Mute key +const int VK_VOLUME_MUTE = 0xAD; + +// VK_VOLUME_DOWN (AE) Windows 2000/XP: Volume Down key +const int VK_VOLUME_DOWN = 0xAE; + +// VK_VOLUME_UP (AF) Windows 2000/XP: Volume Up key +const int VK_VOLUME_UP = 0xAF; + +// VK_MEDIA_NEXT_TRACK (B0) Windows 2000/XP: Next Track key +const int VK_MEDIA_NEXT_TRACK = 0xB0; + +// VK_MEDIA_PREV_TRACK (B1) Windows 2000/XP: Previous Track key +const int VK_MEDIA_PREV_TRACK = 0xB1; + +// VK_MEDIA_STOP (B2) Windows 2000/XP: Stop Media key +const int VK_MEDIA_STOP = 0xB2; + +// VK_MEDIA_PLAY_PAUSE (B3) Windows 2000/XP: Play/Pause Media key +const int 
VK_MEDIA_PLAY_PAUSE = 0xB3; + +// VK_LAUNCH_MAIL (B4) Windows 2000/XP: Start Mail key +const int VK_MEDIA_LAUNCH_MAIL = 0xB4; + +// VK_LAUNCH_MEDIA_SELECT (B5) Windows 2000/XP: Select Media key +const int VK_MEDIA_LAUNCH_MEDIA_SELECT = 0xB5; + +// VK_LAUNCH_APP1 (B6) Windows 2000/XP: Start Application 1 key +const int VK_MEDIA_LAUNCH_APP1 = 0xB6; + +// VK_LAUNCH_APP2 (B7) Windows 2000/XP: Start Application 2 key +const int VK_MEDIA_LAUNCH_APP2 = 0xB7; + +// VK_OEM_1 (BA) Used for miscellaneous characters; it can vary by keyboard. +// Windows 2000/XP: For the US standard keyboard, the ';:' key +const int VK_OEM_1 = 0xBA; + +// VK_OEM_PLUS (BB) Windows 2000/XP: For any country/region, the '+' key +const int VK_OEM_PLUS = 0xBB; + +// VK_OEM_COMMA (BC) Windows 2000/XP: For any country/region, the ',' key +const int VK_OEM_COMMA = 0xBC; + +// VK_OEM_MINUS (BD) Windows 2000/XP: For any country/region, the '-' key +const int VK_OEM_MINUS = 0xBD; + +// VK_OEM_PERIOD (BE) Windows 2000/XP: For any country/region, the '.' key +const int VK_OEM_PERIOD = 0xBE; + +// VK_OEM_2 (BF) Used for miscellaneous characters; it can vary by keyboard. +// Windows 2000/XP: For the US standard keyboard, the '/?' key +const int VK_OEM_2 = 0xBF; + +// VK_OEM_3 (C0) Used for miscellaneous characters; it can vary by keyboard. +// Windows 2000/XP: For the US standard keyboard, the '`~' key +const int VK_OEM_3 = 0xC0; + +// VK_OEM_4 (DB) Used for miscellaneous characters; it can vary by keyboard. +// Windows 2000/XP: For the US standard keyboard, the '[{' key +const int VK_OEM_4 = 0xDB; + +// VK_OEM_5 (DC) Used for miscellaneous characters; it can vary by keyboard. +// Windows 2000/XP: For the US standard keyboard, the '\|' key +const int VK_OEM_5 = 0xDC; + +// VK_OEM_6 (DD) Used for miscellaneous characters; it can vary by keyboard. 
+// Windows 2000/XP: For the US standard keyboard, the ']}' key +const int VK_OEM_6 = 0xDD; + +// VK_OEM_7 (DE) Used for miscellaneous characters; it can vary by keyboard. +// Windows 2000/XP: For the US standard keyboard, the +// 'single-quote/double-quote' key +const int VK_OEM_7 = 0xDE; + +// VK_OEM_8 (DF) Used for miscellaneous characters; it can vary by keyboard. +const int VK_OEM_8 = 0xDF; + +// VK_OEM_102 (E2) Windows 2000/XP: Either the angle bracket key or the +// backslash key on the RT 102-key keyboard +const int VK_OEM_102 = 0xE2; + +// VK_PROCESSKEY (E5) Windows 95/98/Me, Windows NT 4.0, Windows 2000/XP: IME +// PROCESS key +const int VK_PROCESSKEY = 0xE5; + +// VK_PACKET (E7) Windows 2000/XP: Used to pass Unicode characters as if they +// were keystrokes. The VK_PACKET key is the low word of a 32-bit Virtual Key +// value used for non-keyboard input methods. For more information, see Remark +// in KEYBDINPUT,SendInput, WM_KEYDOWN, and WM_KEYUP +const int VK_PACKET = 0xE7; + +// VK_ATTN (F6) Attn key +const int VK_ATTN = 0xF6; + +// VK_CRSEL (F7) CrSel key +const int VK_CRSEL = 0xF7; + +// VK_EXSEL (F8) ExSel key +const int VK_EXSEL = 0xF8; + +// VK_EREOF (F9) Erase EOF key +const int VK_EREOF = 0xF9; + +// VK_PLAY (FA) Play key +const int VK_PLAY = 0xFA; + +// VK_ZOOM (FB) Zoom key +const int VK_ZOOM = 0xFB; + +// VK_NONAME (FC) Reserved for future use +const int VK_NONAME = 0xFC; + +// VK_PA1 (FD) PA1 key +const int VK_PA1 = 0xFD; + +// VK_OEM_CLEAR (FE) Clear key +const int VK_OEM_CLEAR = 0xFE; + +const int VK_UNKNOWN = 0; +} // namespace WebCore + +#endif // LKCEF_KEYBOARD_CODES_H diff --git a/livekit-plugins/livekit-plugins-browser/cef/src/resources/lkcefapp-Info.plist b/livekit-plugins/livekit-plugins-browser/src/resources/lkcefapp-Info.plist similarity index 100% rename from livekit-plugins/livekit-plugins-browser/cef/src/resources/lkcefapp-Info.plist rename to livekit-plugins/livekit-plugins-browser/src/resources/lkcefapp-Info.plist diff --git 
a/livekit-plugins/livekit-plugins-browser/cef/src/resources/lkcefhelper-Info.plist b/livekit-plugins/livekit-plugins-browser/src/resources/lkcefhelper-Info.plist similarity index 100% rename from livekit-plugins/livekit-plugins-browser/cef/src/resources/lkcefhelper-Info.plist rename to livekit-plugins/livekit-plugins-browser/src/resources/lkcefhelper-Info.plist diff --git a/livekit-plugins/livekit-plugins-browser/src/run_browser.py b/livekit-plugins/livekit-plugins-browser/src/run_browser.py new file mode 100644 index 000000000..e43c2e63a --- /dev/null +++ b/livekit-plugins/livekit-plugins-browser/src/run_browser.py @@ -0,0 +1,45 @@ +# flake8: noqa + +import sys + +print("cwd: ", sys.path[0]) + +sys.path.insert(0, "./Debug") +import lkcef_python as lkcef + +print("lkcef __dict__: ", lkcef.__dict__) +print("BrowserImpl __dict__: ", lkcef.BrowserImpl.__dict__) + + +def _context_initialized(): + opts = lkcef.BrowserOptions() + opts.framerate = 30 + + def _browser_created(browser_impl): + print("run_browser.py - Browser created") + + opts.created_callback = _browser_created + + def _on_paint(frame_data): + pass + + opts.paint_callback = _on_paint + + def _on_closed(): + print("run_browser.py - Browser closed") + + opts.close_callback = _on_closed + + app.create_browser("http://www.livekit.io", opts) + print("run_browser.py - Context initialized") + + +opts = lkcef.AppOptions() +opts.dev_mode = True +opts.initialized_callback = _context_initialized +opts.framework_path = "/Users/theomonnom/livekit/agents/livekit-plugins/livekit-plugins-browser/cef/src/Debug/lkcef_app.app/Contents/Frameworks/Chromium Embedded Framework.framework" +opts.main_bundle_path = "/Users/theomonnom/livekit/agents/livekit-plugins/livekit-plugins-browser/cef/src/Debug/lkcef_app.app" +opts.subprocess_path = "/Users/theomonnom/livekit/agents/livekit-plugins/livekit-plugins-browser/cef/src/Debug/lkcef_app.app/Contents/Frameworks/lkcef Helper.app/Contents/MacOS/lkcef Helper" + +app = 
lkcef.BrowserApp(opts) +app.run() diff --git a/livekit-plugins/livekit-plugins-cartesia/CHANGELOG.md b/livekit-plugins/livekit-plugins-cartesia/CHANGELOG.md index 3f5c40b47..d92a10504 100644 --- a/livekit-plugins/livekit-plugins-cartesia/CHANGELOG.md +++ b/livekit-plugins/livekit-plugins-cartesia/CHANGELOG.md @@ -1,5 +1,18 @@ # livekit-plugins-cartesia +## 0.4.2 + +### Patch Changes + +- Add support for cartesia voice control - [#740](https://github.com/livekit/agents/pull/740) ([@bcherry](https://github.com/bcherry)) + +## 0.4.1 + +### Patch Changes + +- Switch Cartesia to a sentence tokenizer and keep the same context id throughout. - [#608](https://github.com/livekit/agents/pull/608) ([@keepingitneil](https://github.com/keepingitneil)) + Propagate segment_id through the basic sentence tokenizer + ## 0.3.0 ### Minor Changes diff --git a/livekit-plugins/livekit-plugins-cartesia/livekit/plugins/cartesia/models.py b/livekit-plugins/livekit-plugins-cartesia/livekit/plugins/cartesia/models.py index ca238356c..309448bdd 100644 --- a/livekit-plugins/livekit-plugins-cartesia/livekit/plugins/cartesia/models.py +++ b/livekit-plugins/livekit-plugins-cartesia/livekit/plugins/cartesia/models.py @@ -8,7 +8,34 @@ # "pcm_alaw", ] - TTSModels = Literal["sonic-english", "sonic-multilingual"] TTSLanguages = Literal["en", "es", "fr", "de", "pt", "zh", "ja"] TTSDefaultVoiceId = "c2ac25f9-ecc4-4f56-9095-651354df60c0" +TTSVoiceSpeed = Literal["fastest", "fast", "normal", "slow", "slowest"] +TTSVoiceEmotion = Literal[ + "anger:lowest", + "anger:low", + "anger", + "anger:high", + "anger:highest", + "positivity:lowest", + "positivity:low", + "positivity", + "positivity:high", + "positivity:highest", + "surprise:lowest", + "surprise:low", + "surprise", + "surprise:high", + "surprise:highest", + "sadness:lowest", + "sadness:low", + "sadness", + "sadness:high", + "sadness:highest", + "curiosity:lowest", + "curiosity:low", + "curiosity", + "curiosity:high", + "curiosity:highest", +] diff 
--git a/livekit-plugins/livekit-plugins-cartesia/livekit/plugins/cartesia/tts.py b/livekit-plugins/livekit-plugins-cartesia/livekit/plugins/cartesia/tts.py index 7a93a2ab6..42b830efb 100644 --- a/livekit-plugins/livekit-plugins-cartesia/livekit/plugins/cartesia/tts.py +++ b/livekit-plugins/livekit-plugins-cartesia/livekit/plugins/cartesia/tts.py @@ -25,7 +25,13 @@ from livekit.agents import tokenize, tts, utils from .log import logger -from .models import TTSDefaultVoiceId, TTSEncoding, TTSModels +from .models import ( + TTSDefaultVoiceId, + TTSEncoding, + TTSModels, + TTSVoiceEmotion, + TTSVoiceSpeed, +) API_AUTH_HEADER = "X-API-Key" API_VERSION_HEADER = "Cartesia-Version" @@ -41,6 +47,8 @@ class _TTSOptions: encoding: TTSEncoding sample_rate: int voice: str | list[float] + speed: TTSVoiceSpeed | float | None + emotion: list[TTSVoiceEmotion | str] | None api_key: str language: str @@ -53,10 +61,29 @@ def __init__( self, *, language: str = "en", encoding: TTSEncoding = "pcm_s16le", voice: str | list[float] = TTSDefaultVoiceId, + speed: TTSVoiceSpeed | float | None = None, + emotion: list[TTSVoiceEmotion | str] | None = None, sample_rate: int = 24000, api_key: str | None = None, http_session: aiohttp.ClientSession | None = None, ) -> None: + """ + Create a new instance of Cartesia TTS. + + See https://docs.cartesia.ai/reference/web-socket/stream-speech/stream-speech for more details on the Cartesia API. + + Args: + model (TTSModels, optional): The Cartesia TTS model to use. Defaults to "sonic-english". + language (str, optional): The language code for synthesis. Defaults to "en". + encoding (TTSEncoding, optional): The audio encoding format. Defaults to "pcm_s16le". + voice (str | list[float], optional): The voice ID or embedding array. 
+ speed (TTSVoiceSpeed | float, optional): Voice Control - Speed (https://docs.cartesia.ai/user-guides/voice-control) + emotion (list[TTSVoiceEmotion], optional): Voice Control - Emotion (https://docs.cartesia.ai/user-guides/voice-control) + sample_rate (int, optional): The audio sample rate in Hz. Defaults to 24000. + api_key (str, optional): The Cartesia API key. If not provided, it will be read from the CARTESIA_API_KEY environment variable. + http_session (aiohttp.ClientSession | None, optional): An existing aiohttp ClientSession to use. If not provided, a new session will be created. + """ + super().__init__( capabilities=tts.TTSCapabilities(streaming=True), sample_rate=sample_rate, @@ -73,6 +100,8 @@ def __init__( encoding=encoding, sample_rate=sample_rate, voice=voice, + speed=speed, + emotion=emotion, api_key=api_key, ) self._session = http_session @@ -268,6 +297,15 @@ def _to_cartesia_options(opts: _TTSOptions) -> dict[str, Any]: voice["mode"] = "embedding" voice["embedding"] = opts.voice + voice_controls: dict = {} + if opts.speed is not None: + voice_controls["speed"] = opts.speed + if opts.emotion is not None: + voice_controls["emotion"] = opts.emotion + + if voice_controls: + voice["__experimental_controls"] = voice_controls + return { "model_id": opts.model, "voice": voice, diff --git a/livekit-plugins/livekit-plugins-cartesia/livekit/plugins/cartesia/version.py b/livekit-plugins/livekit-plugins-cartesia/livekit/plugins/cartesia/version.py index 00a7bde1d..608b5cd5a 100644 --- a/livekit-plugins/livekit-plugins-cartesia/livekit/plugins/cartesia/version.py +++ b/livekit-plugins/livekit-plugins-cartesia/livekit/plugins/cartesia/version.py @@ -12,4 +12,4 @@ # See the License for the specific language governing permissions and # limitations under the License. 
-__version__ = "0.3.0" +__version__ = "0.4.2" diff --git a/livekit-plugins/livekit-plugins-cartesia/package.json b/livekit-plugins/livekit-plugins-cartesia/package.json index 48a4aeb31..2bf74f9a9 100644 --- a/livekit-plugins/livekit-plugins-cartesia/package.json +++ b/livekit-plugins/livekit-plugins-cartesia/package.json @@ -1,5 +1,5 @@ { "name": "livekit-plugins-cartesia", "private": true, - "version": "0.3.0" + "version": "0.4.2" } diff --git a/livekit-plugins/livekit-plugins-clova/README.md b/livekit-plugins/livekit-plugins-clova/README.md new file mode 100644 index 000000000..013cb7fe4 --- /dev/null +++ b/livekit-plugins/livekit-plugins-clova/README.md @@ -0,0 +1,13 @@ +# LiveKit Plugins Clova + +Agent Framework plugin for [Clova](https://api.ncloud-docs.com/docs/)'s speech API. Currently supports speech-to-text. + +## Installation + +```bash +pip install livekit-plugins-clova +``` + +## Prerequisites + +You need an invoke URL and a secret key from the Naver Cloud Platform (Clova Speech), set as the environment variables `CLOVA_STT_INVOKE_URL` and `CLOVA_STT_SECRET_KEY`. diff --git a/livekit-plugins/livekit-plugins-clova/livekit/plugins/clova/__init__.py b/livekit-plugins/livekit-plugins-clova/livekit/plugins/clova/__init__.py new file mode 100644 index 000000000..d554599f0 --- /dev/null +++ b/livekit-plugins/livekit-plugins-clova/livekit/plugins/clova/__init__.py @@ -0,0 +1,21 @@ +from .stt import STT +from .version import __version__ + +__all__ = [ + "STT", + "__version__", +] + + +from livekit.agents import Plugin + + +class ClovaSTTPlugin(Plugin): + def __init__(self): + super().__init__(__name__, __version__, __package__) + + def download_files(self): + pass + + +Plugin.register_plugin(ClovaSTTPlugin()) diff --git a/livekit-plugins/livekit-plugins-clova/livekit/plugins/clova/common.py b/livekit-plugins/livekit-plugins-clova/livekit/plugins/clova/common.py new file mode 100644 index 000000000..3418dd8bf --- /dev/null +++ 
b/livekit-plugins/livekit-plugins-clova/livekit/plugins/clova/common.py @@ -0,0 +1,13 @@ +import io + +from pydub import AudioSegment + + +def resample_audio(audio_bytes, original_sample_rate, target_sample_rate): + resampled_audio = AudioSegment.from_raw( + io.BytesIO(audio_bytes), + sample_width=2, + frame_rate=original_sample_rate, + channels=1, + ).set_frame_rate(target_sample_rate) + return resampled_audio.raw_data diff --git a/livekit-plugins/livekit-plugins-clova/livekit/plugins/clova/constants.py b/livekit-plugins/livekit-plugins-clova/livekit/plugins/clova/constants.py new file mode 100644 index 000000000..ec109084f --- /dev/null +++ b/livekit-plugins/livekit-plugins-clova/livekit/plugins/clova/constants.py @@ -0,0 +1,2 @@ +CLOVA_INPUT_SAMPLE_RATE = 16000 +LIVEKIT_INPUT_SAMPLE_RATE = 48000 diff --git a/livekit-plugins/livekit-plugins-clova/livekit/plugins/clova/log.py b/livekit-plugins/livekit-plugins-clova/livekit/plugins/clova/log.py new file mode 100644 index 000000000..e28e00f47 --- /dev/null +++ b/livekit-plugins/livekit-plugins-clova/livekit/plugins/clova/log.py @@ -0,0 +1,3 @@ +import logging + +logger = logging.getLogger("livekit.plugins.clova") diff --git a/livekit-plugins/livekit-plugins-clova/livekit/plugins/clova/models.py b/livekit-plugins/livekit-plugins-clova/livekit/plugins/clova/models.py new file mode 100644 index 000000000..490ab9660 --- /dev/null +++ b/livekit-plugins/livekit-plugins-clova/livekit/plugins/clova/models.py @@ -0,0 +1,17 @@ +from typing import Literal + +ClovaSttLanguages = Literal["ko-KR", "en-US", "enko", "ja", "zh-cn", "zh-tw"] + +ClovaSpeechAPIType = Literal[ + "recognizer/object-storage", "recognizer/url", "recognizer/upload" +] + +clova_languages_mapping = { + "en": "en-US", + "ko-KR": "ko-KR", + "en-US": "en-US", + "enko": "enko", + "ja": "ja", + "zh-cn": "zh-cn", + "zh-tw": "zh-tw", +} diff --git a/livekit-plugins/livekit-plugins-clova/livekit/plugins/clova/stt.py 
b/livekit-plugins/livekit-plugins-clova/livekit/plugins/clova/stt.py new file mode 100644 index 000000000..308aa8b15 --- /dev/null +++ b/livekit-plugins/livekit-plugins-clova/livekit/plugins/clova/stt.py @@ -0,0 +1,132 @@ +# Copyright 2023 LiveKit, Inc. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +import io +import json +import os +import time +import wave +from typing import Optional, Union + +import aiohttp +from livekit.agents import stt, utils +from livekit.agents.stt import SpeechEventType, STTCapabilities +from livekit.agents.utils import AudioBuffer, merge_frames +from livekit.plugins.clova.constants import CLOVA_INPUT_SAMPLE_RATE + +from .common import resample_audio +from .log import logger +from .models import ClovaSpeechAPIType, ClovaSttLanguages, clova_languages_mapping + + +class STT(stt.STT): + def __init__( + self, + *, + language: ClovaSttLanguages = "en-US", + secret: Optional[str] = None, + invoke_url: Optional[str] = None, + http_session: Optional[aiohttp.ClientSession] = None, + threshold: float = 0.5, + ): + """ + Create a new instance of Clova STT. + + ``secret`` and ``invoke_url`` must be set, either using arguments or by setting the + ``CLOVA_STT_SECRET_KEY`` and ``CLOVA_STT_INVOKE_URL`` environmental variables, respectively. 
+ """ + + super().__init__( + capabilities=STTCapabilities(streaming=False, interim_results=True) + ) + self._secret = secret or os.environ.get("CLOVA_STT_SECRET_KEY") + self._invoke_url = invoke_url or os.environ.get("CLOVA_STT_INVOKE_URL") + self._language = clova_languages_mapping.get(language, language) + self._session = http_session + if self._secret is None: + raise ValueError( + "Clova STT secret key is required. It should be set with env CLOVA_STT_SECRET_KEY" + ) + if self._invoke_url is None: + raise ValueError( + "Clova STT invoke URL is required. It should be set with env CLOVA_STT_INVOKE_URL" + ) + self.threshold = threshold + + def _ensure_session(self) -> aiohttp.ClientSession: + if not self._session: + self._session = utils.http_context.http_session() + return self._session + + def url_builder( + self, process_method: ClovaSpeechAPIType = "recognizer/upload" + ) -> str: + return f"{self._invoke_url}/{process_method}" + + async def recognize( + self, + *, + buffer: AudioBuffer, + language: Union[ClovaSttLanguages, str, None] = None, + ) -> stt.SpeechEvent: + try: + url = self.url_builder() + payload = json.dumps({"language": self._language, "completion": "sync"}) + + buffer = merge_frames(buffer) + buffer_bytes = resample_audio( + buffer.data.tobytes(), buffer.sample_rate, CLOVA_INPUT_SAMPLE_RATE + ) + + io_buffer = io.BytesIO() + with wave.open(io_buffer, "wb") as wav: + wav.setnchannels(1) + wav.setsampwidth(2)  # 16-bit + wav.setframerate(CLOVA_INPUT_SAMPLE_RATE) + wav.writeframes(buffer_bytes) + io_buffer.seek(0) + + headers = {"X-CLOVASPEECH-API-KEY": self._secret} + form_data = aiohttp.FormData() + form_data.add_field("params", payload) + form_data.add_field( + "media", io_buffer, filename="audio.wav", content_type="audio/wav" + ) + start = time.time() + async with self._ensure_session().post( + url, data=form_data, headers=headers + ) as response: + response_data = await response.json() + end = time.time() + text = response_data.get("text") + confidence = response_data.get("confidence") + logger.info(f"{text} | {confidence} | total_seconds: {end - start}") + if not text or "error" 
in response_data: + raise ValueError(f"Unexpected response: {response_data}") + if confidence < self.threshold: + raise ValueError( + f"Confidence: {confidence} is below the threshold {self.threshold}. Skipping." + ) + logger.info(f"final event: {response_data}") + return self._transcription_to_speech_event(text=text) + except Exception as ex: + logger.error(f"{ex}") + return self._transcription_to_speech_event( + event_type=stt.SpeechEventType.FINAL_TRANSCRIPT, text="" + ) + + def _transcription_to_speech_event( + self, + event_type: SpeechEventType = stt.SpeechEventType.INTERIM_TRANSCRIPT, + text: str = "", + ) -> stt.SpeechEvent: + return stt.SpeechEvent( + type=event_type, + alternatives=[stt.SpeechData(text=text, language=self._language)], + ) diff --git a/livekit-plugins/livekit-plugins-clova/livekit/plugins/clova/version.py b/livekit-plugins/livekit-plugins-clova/livekit/plugins/clova/version.py new file mode 100644 index 000000000..18b2a337f --- /dev/null +++ b/livekit-plugins/livekit-plugins-clova/livekit/plugins/clova/version.py @@ -0,0 +1,15 @@ +# Copyright 2023 LiveKit, Inc. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+ +__version__ = "0.0.2" diff --git a/livekit-plugins/livekit-plugins-clova/pyproject.toml b/livekit-plugins/livekit-plugins-clova/pyproject.toml new file mode 100644 index 000000000..8cf32563a --- /dev/null +++ b/livekit-plugins/livekit-plugins-clova/pyproject.toml @@ -0,0 +1,3 @@ +[build-system] +requires = ["setuptools>=61.0"] +build-backend = "setuptools.build_meta" \ No newline at end of file diff --git a/livekit-plugins/livekit-plugins-clova/setup.py b/livekit-plugins/livekit-plugins-clova/setup.py new file mode 100644 index 000000000..b6bc2fd09 --- /dev/null +++ b/livekit-plugins/livekit-plugins-clova/setup.py @@ -0,0 +1,56 @@ +# Copyright 2023 LiveKit, Inc. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+ +import os +import pathlib + +import setuptools +import setuptools.command.build_py + +here = pathlib.Path(__file__).parent.resolve() +about = {} +with open(os.path.join(here, "livekit", "plugins", "clova", "version.py"), "r") as f: + exec(f.read(), about) + + +setuptools.setup( + name="livekit-plugins-clova", + version=about["__version__"], + description="LiveKit Agents Plugin for LINE Clova STT", + long_description=(here / "README.md").read_text(encoding="utf-8"), + long_description_content_type="text/markdown", + url="https://github.com/livekit/agents", + cmdclass={}, + classifiers=[ + "Intended Audience :: Developers", + "License :: OSI Approved :: Apache Software License", + "Topic :: Multimedia :: Sound/Audio", + "Topic :: Multimedia :: Video", + "Topic :: Scientific/Engineering :: Artificial Intelligence", + "Programming Language :: Python :: 3", + "Programming Language :: Python :: 3.9", + "Programming Language :: Python :: 3.10", + "Programming Language :: Python :: 3 :: Only", + ], + keywords=["webrtc", "realtime", "audio", "video", "livekit"], + license="Apache-2.0", + packages=setuptools.find_namespace_packages(include=["livekit.*"]), + python_requires=">=3.9.0", + install_requires=["livekit-agents>=0.8.0.dev0", "pydub~=0.25.1"], + project_urls={ + "Documentation": "https://docs.livekit.io", + "Website": "https://livekit.io/", + "Source": "https://github.com/livekit/agents", + }, +) diff --git a/livekit-plugins/livekit-plugins-deepgram/CHANGELOG.md b/livekit-plugins/livekit-plugins-deepgram/CHANGELOG.md index f81ba4feb..c6b2dfe99 100644 --- a/livekit-plugins/livekit-plugins-deepgram/CHANGELOG.md +++ b/livekit-plugins/livekit-plugins-deepgram/CHANGELOG.md @@ -1,5 +1,25 @@ # livekit-plugins-deepgram +## 0.6.7 + +### Patch Changes + +- Only send actual audio to Deepgram using a basic audio RMS filter - [#738](https://github.com/livekit/agents/pull/738) ([@keepingitneil](https://github.com/keepingitneil)) + +- defaults to nova-2-general model - 
[#726](https://github.com/livekit/agents/pull/726) ([@davidzhao](https://github.com/davidzhao)) + +## 0.6.6 + +### Patch Changes + +- deepgram: switch the default model to phonecall - [#676](https://github.com/livekit/agents/pull/676) ([@theomonnom](https://github.com/theomonnom)) + +## 0.6.5 + +### Patch Changes + +- deepgram: fallback to nova-2-general when the language isn't supported - [#623](https://github.com/livekit/agents/pull/623) ([@theomonnom](https://github.com/theomonnom)) + ## 0.6.4 ### Patch Changes diff --git a/livekit-plugins/livekit-plugins-deepgram/livekit/plugins/deepgram/stt.py b/livekit-plugins/livekit-plugins-deepgram/livekit/plugins/deepgram/stt.py index c56a2d74b..b1d593abb 100644 --- a/livekit-plugins/livekit-plugins-deepgram/livekit/plugins/deepgram/stt.py +++ b/livekit-plugins/livekit-plugins-deepgram/livekit/plugins/deepgram/stt.py @@ -30,6 +30,7 @@ from .log import logger from .models import DeepgramLanguages, DeepgramModels +from .utils import BasicAudioEnergyFilter BASE_URL = "https://api.deepgram.com/v1/listen" BASE_URL_WS = "wss://api.deepgram.com/v1/listen" @@ -55,7 +56,7 @@ class STT(stt.STT): def __init__( self, *, - model: DeepgramModels = "nova-2-conversationalai", + model: DeepgramModels = "nova-2-general", language: DeepgramLanguages = "en-US", detect_language: bool = False, interim_results: bool = True, @@ -68,6 +69,13 @@ def __init__( api_key: str | None = None, http_session: aiohttp.ClientSession | None = None, ) -> None: + """ + Create a new instance of Deepgram STT. + + ``api_key`` must be set to your Deepgram API key, either using the argument or by setting + the ``DEEPGRAM_API_KEY`` environmental variable. 
+ """ + super().__init__( capabilities=stt.STTCapabilities( streaming=True, interim_results=interim_results @@ -78,7 +86,7 @@ def __init__( if api_key is None: raise ValueError("Deepgram API key is required") - if (language != "en-US" or language != "en") and model in ( + if language not in ("en-US", "en") and model in ( "nova-2-meeting", "nova-2-phonecall", "nova-2-finance", @@ -193,6 +201,7 @@ def __init__( self._session = http_session self._speaking = False self._max_retry = max_retry + self._audio_energy_filter = BasicAudioEnergyFilter(cooldown_seconds=1) @utils.log_exceptions(logger=logger) async def _main_task(self) -> None: @@ -284,10 +293,12 @@ async def send_task(): if isinstance(data, self._FlushSentinel): frames = audio_bstream.flush() else: - frames = audio_bstream.write(data.data) + frames = audio_bstream.write(data.data.tobytes()) for frame in frames: - await ws.send_bytes(frame.data.tobytes()) + has_audio = self._audio_energy_filter.push_frame(frame) + if has_audio: + await ws.send_bytes(frame.data.tobytes()) # tell deepgram we are done sending audio/inputs closing_ws = True diff --git a/livekit-plugins/livekit-plugins-deepgram/livekit/plugins/deepgram/utils.py b/livekit-plugins/livekit-plugins-deepgram/livekit/plugins/deepgram/utils.py new file mode 100644 index 000000000..c9c9ee452 --- /dev/null +++ b/livekit-plugins/livekit-plugins-deepgram/livekit/plugins/deepgram/utils.py @@ -0,0 +1,27 @@ +import numpy as np +from livekit import rtc + +# This is the magic number during testing that we use to determine if a frame is loud enough +# to possibly contain speech. It's very conservative. 
+MAGIC_NUMBER_THRESHOLD = 0.004 + + +class BasicAudioEnergyFilter: + def __init__(self, *, cooldown_seconds: float = 1): + self._cooldown_seconds = cooldown_seconds + self._cooldown = cooldown_seconds + + def push_frame(self, frame: rtc.AudioFrame) -> bool: + arr = np.frombuffer(frame.data, dtype=np.int16) + float_arr = arr.astype(np.float32) / 32768.0 + rms = np.sqrt(np.mean(np.square(float_arr))) + if rms > MAGIC_NUMBER_THRESHOLD: + self._cooldown = self._cooldown_seconds + return True + + duration_seconds = frame.samples_per_channel / frame.sample_rate + self._cooldown -= duration_seconds + if self._cooldown > 0: + return True + + return False diff --git a/livekit-plugins/livekit-plugins-deepgram/livekit/plugins/deepgram/version.py b/livekit-plugins/livekit-plugins-deepgram/livekit/plugins/deepgram/version.py index 4f1df5fb6..9aacd12fa 100644 --- a/livekit-plugins/livekit-plugins-deepgram/livekit/plugins/deepgram/version.py +++ b/livekit-plugins/livekit-plugins-deepgram/livekit/plugins/deepgram/version.py @@ -12,4 +12,4 @@ # See the License for the specific language governing permissions and # limitations under the License. 
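The energy filter added in `utils.py` above gates Deepgram input on a conservative RMS threshold with a cooldown "hangover". A hedged standalone sketch of the same idea, assuming 16-bit mono PCM bytes instead of `rtc.AudioFrame` (frame size and sample rate below are illustrative, not the plugin's defaults):

```python
import numpy as np

MAGIC_NUMBER_THRESHOLD = 0.004  # same conservative RMS threshold as the plugin


class EnergyGate:
    """Standalone sketch of the cooldown-based RMS gate from the diff above."""

    def __init__(self, cooldown_seconds: float = 1.0, sample_rate: int = 16000):
        self._cooldown_seconds = cooldown_seconds
        self._cooldown = cooldown_seconds
        self._sample_rate = sample_rate

    def push(self, pcm16: bytes) -> bool:
        # normalize int16 samples to [-1, 1] and compute RMS energy
        arr = np.frombuffer(pcm16, dtype=np.int16).astype(np.float32) / 32768.0
        rms = float(np.sqrt(np.mean(np.square(arr))))
        if rms > MAGIC_NUMBER_THRESHOLD:
            self._cooldown = self._cooldown_seconds  # speech: reset the hangover
            return True
        # silence: keep passing frames until the cooldown drains, so we don't
        # clip the tail of an utterance
        self._cooldown -= len(arr) / self._sample_rate
        return self._cooldown > 0
```

A loud frame passes immediately and rearms the cooldown; silent frames keep passing until `cooldown_seconds` of silence have elapsed, after which frames are dropped.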
-__version__ = "0.6.4" +__version__ = "0.6.7" diff --git a/livekit-plugins/livekit-plugins-deepgram/package.json b/livekit-plugins/livekit-plugins-deepgram/package.json index d28dfabe7..d317146ed 100644 --- a/livekit-plugins/livekit-plugins-deepgram/package.json +++ b/livekit-plugins/livekit-plugins-deepgram/package.json @@ -1,5 +1,5 @@ { "name": "livekit-plugins-deepgram", "private": true, - "version": "0.6.4" + "version": "0.6.7" } diff --git a/livekit-plugins/livekit-plugins-deepgram/setup.py b/livekit-plugins/livekit-plugins-deepgram/setup.py index 37b739565..98a4b82ba 100644 --- a/livekit-plugins/livekit-plugins-deepgram/setup.py +++ b/livekit-plugins/livekit-plugins-deepgram/setup.py @@ -47,7 +47,7 @@ license="Apache-2.0", packages=setuptools.find_namespace_packages(include=["livekit.*"]), python_requires=">=3.9.0", - install_requires=["livekit-agents>=0.8.0.dev0"], + install_requires=["livekit-agents>=0.8.0", "numpy~=1.21"], package_data={"livekit.plugins.deepgram": ["py.typed"]}, project_urls={ "Documentation": "https://docs.livekit.io", diff --git a/livekit-plugins/livekit-plugins-elevenlabs/CHANGELOG.md b/livekit-plugins/livekit-plugins-elevenlabs/CHANGELOG.md index d78e8d7d0..aabc2cbac 100644 --- a/livekit-plugins/livekit-plugins-elevenlabs/CHANGELOG.md +++ b/livekit-plugins/livekit-plugins-elevenlabs/CHANGELOG.md @@ -1,5 +1,19 @@ # livekit-plugins-elevenlabs +## 0.7.5 + +### Patch Changes + +- avoid returning tiny frames from TTS - [#747](https://github.com/livekit/agents/pull/747) ([@theomonnom](https://github.com/theomonnom)) + +- 11labs: send phoneme in one entire xml chunk - [#766](https://github.com/livekit/agents/pull/766) ([@theomonnom](https://github.com/theomonnom)) + +## 0.7.4 + +### Patch Changes + +- elevenlabs: expose enable_ssml_parsing - [#723](https://github.com/livekit/agents/pull/723) ([@theomonnom](https://github.com/theomonnom)) + ## 0.7.3 ### Patch Changes diff --git 
a/livekit-plugins/livekit-plugins-elevenlabs/livekit/plugins/elevenlabs/tts.py b/livekit-plugins/livekit-plugins-elevenlabs/livekit/plugins/elevenlabs/tts.py index 72b2490a0..a1907cdf6 100644 --- a/livekit-plugins/livekit-plugins-elevenlabs/livekit/plugins/elevenlabs/tts.py +++ b/livekit-plugins/livekit-plugins-elevenlabs/livekit/plugins/elevenlabs/tts.py @@ -86,6 +86,7 @@ class _TTSOptions: streaming_latency: int word_tokenizer: tokenize.WordTokenizer chunk_length_schedule: list[int] + enable_ssml_parsing: bool class TTS(tts.TTS): @@ -101,9 +102,17 @@ def __init__( word_tokenizer: tokenize.WordTokenizer = tokenize.basic.WordTokenizer( ignore_punctuation=False # punctuation can help for intonation ), + enable_ssml_parsing: bool = False, chunk_length_schedule: list[int] = [80, 120, 200, 260], # range is [50, 500] http_session: aiohttp.ClientSession | None = None, ) -> None: + """ + Create a new instance of ElevenLabs TTS. + + ``api_key`` must be set to your ElevenLabs API key, either using the argument or by setting + the ``ELEVEN_API_KEY`` environmental variable. 
+ """ + super().__init__( capabilities=tts.TTSCapabilities( streaming=True, @@ -125,6 +134,7 @@ def __init__( streaming_latency=streaming_latency, word_tokenizer=word_tokenizer, chunk_length_schedule=chunk_length_schedule, + enable_ssml_parsing=enable_ssml_parsing, ) self._session = http_session @@ -187,17 +197,19 @@ async def _main_task(self) -> None: content = await resp.text() logger.error("11labs returned non-audio data: %s", content) return + encoding = _encoding_from_format(self._opts.encoding) if encoding == "mp3": async for bytes_data, _ in resp.content.iter_chunks(): for frame in self._mp3_decoder.decode_chunk(bytes_data): - self._event_ch.send_nowait( - tts.SynthesizedAudio( - request_id=request_id, - segment_id=segment_id, - frame=frame, + for frame in bstream.write(frame.data.tobytes()): + self._event_ch.send_nowait( + tts.SynthesizedAudio( + request_id=request_id, + segment_id=segment_id, + frame=frame, + ) ) - ) else: async for bytes_data, _ in resp.content.iter_chunks(): for frame in bstream.write(bytes_data): @@ -209,12 +221,12 @@ async def _main_task(self) -> None: ) ) - for frame in bstream.flush(): - self._event_ch.send_nowait( - tts.SynthesizedAudio( - request_id=request_id, segment_id=segment_id, frame=frame - ) + for frame in bstream.flush(): + self._event_ch.send_nowait( + tts.SynthesizedAudio( + request_id=request_id, segment_id=segment_id, frame=frame ) + ) class SynthesizeStream(tts.SynthesizeStream): @@ -313,15 +325,34 @@ async def _run_ws( async def send_task(): nonlocal eos_sent + xml_content = [] async for data in word_stream: + text = data.token + + # send the xml phoneme in one go + if ( + self._opts.enable_ssml_parsing + and data.token.startswith("") > -1: + text = self._opts.word_tokenizer.format_words(xml_content) + xml_content = [] + else: + continue + # try_trigger_generation=True is a bad practice, we expose # chunk_length_schedule instead data_pkt = dict( - text=f"{data.token} ", # must always end with a space + text=f"{text} 
", # must always end with a space try_trigger_generation=False, ) await ws_conn.send_str(json.dumps(data_pkt)) + if xml_content: + logger.warning("11labs stream ended with incomplete xml content") + # no more token, mark eos eos_pkt = dict(text="") await ws_conn.send_str(json.dumps(eos_pkt)) @@ -434,7 +465,9 @@ def _stream_url(opts: _TTSOptions) -> str: model_id = opts.model_id output_format = opts.encoding latency = opts.streaming_latency + enable_ssml = str(opts.enable_ssml_parsing).lower() return ( f"{base_url}/text-to-speech/{voice_id}/stream-input?" - f"model_id={model_id}&output_format={output_format}&optimize_streaming_latency={latency}" + f"model_id={model_id}&output_format={output_format}&optimize_streaming_latency={latency}&" + f"enable_ssml_parsing={enable_ssml}" ) diff --git a/livekit-plugins/livekit-plugins-elevenlabs/livekit/plugins/elevenlabs/version.py b/livekit-plugins/livekit-plugins-elevenlabs/livekit/plugins/elevenlabs/version.py index 20d8a2226..7bd26ee36 100644 --- a/livekit-plugins/livekit-plugins-elevenlabs/livekit/plugins/elevenlabs/version.py +++ b/livekit-plugins/livekit-plugins-elevenlabs/livekit/plugins/elevenlabs/version.py @@ -12,4 +12,4 @@ # See the License for the specific language governing permissions and # limitations under the License. 
-__version__ = "0.7.3" +__version__ = "0.7.5" diff --git a/livekit-plugins/livekit-plugins-elevenlabs/package.json b/livekit-plugins/livekit-plugins-elevenlabs/package.json index 16ebaf330..78fe20504 100644 --- a/livekit-plugins/livekit-plugins-elevenlabs/package.json +++ b/livekit-plugins/livekit-plugins-elevenlabs/package.json @@ -1,5 +1,5 @@ { "name": "livekit-plugins-elevenlabs", "private": true, - "version": "0.7.3" + "version": "0.7.5" } diff --git a/livekit-plugins/livekit-plugins-google/CHANGELOG.md b/livekit-plugins/livekit-plugins-google/CHANGELOG.md index 9977a53a0..7a187de9a 100644 --- a/livekit-plugins/livekit-plugins-google/CHANGELOG.md +++ b/livekit-plugins/livekit-plugins-google/CHANGELOG.md @@ -1,5 +1,27 @@ # livekit-plugins-google +## 0.7.1 + +### Patch Changes + +- avoid returning tiny frames from TTS - [#747](https://github.com/livekit/agents/pull/747) ([@theomonnom](https://github.com/theomonnom)) + +## 0.7.0 + +### Minor Changes + +- Enable use of Google STT with Application Default Credentials. - [#721](https://github.com/livekit/agents/pull/721) ([@rsinnet](https://github.com/rsinnet)) + +### Patch Changes + +- google-tts: ignore wav header - [#703](https://github.com/livekit/agents/pull/703) ([@theomonnom](https://github.com/theomonnom)) + +## 0.6.3 + +### Patch Changes + +- Fix Google STT exception when no valid speech is recognized - [#680](https://github.com/livekit/agents/pull/680) ([@davidzhao](https://github.com/davidzhao)) + ## 0.6.2 ### Patch Changes diff --git a/livekit-plugins/livekit-plugins-google/README.md b/livekit-plugins/livekit-plugins-google/README.md index 746e94473..b0fffb41e 100644 --- a/livekit-plugins/livekit-plugins-google/README.md +++ b/livekit-plugins/livekit-plugins-google/README.md @@ -10,4 +10,4 @@ pip install livekit-plugins-google ## Pre-requisites -For credentials, you'll need a Google Cloud account and obtain the correct credentials. 
Credentials can be passed directly or set as [GOOGLE_APPLICATION_CREDENTIALS](https://cloud.google.com/docs/authentication/application-default-credentials) environment variable. +For credentials, you'll need a Google Cloud account and obtain the correct credentials. Credentials can be passed directly or via Application Default Credentials as specified in [How Application Default Credentials works](https://cloud.google.com/docs/authentication/application-default-credentials). diff --git a/livekit-plugins/livekit-plugins-google/livekit/plugins/google/stt.py b/livekit-plugins/livekit-plugins-google/livekit/plugins/google/stt.py index 4946a1de9..afff6f93a 100644 --- a/livekit-plugins/livekit-plugins-google/livekit/plugins/google/stt.py +++ b/livekit-plugins/livekit-plugins-google/livekit/plugins/google/stt.py @@ -16,13 +16,14 @@ import asyncio import dataclasses -import os from dataclasses import dataclass from typing import AsyncIterable, List, Union from livekit import agents, rtc from livekit.agents import stt, utils +from google.auth import default as gauth_default +from google.auth.exceptions import DefaultCredentialsError from google.cloud.speech_v2 import SpeechAsyncClient from google.cloud.speech_v2.types import cloud_speech @@ -58,8 +59,11 @@ def __init__( credentials_file: str | None = None, ): """ - if no credentials is provided, it will use the credentials on the environment - GOOGLE_APPLICATION_CREDENTIALS (default behavior of Google SpeechAsyncClient) + Create a new instance of Google STT. 
+ + Credentials must be provided, either by using the ``credentials_info`` dict, or reading + from the file specified in ``credentials_file`` or via Application Default Credentials as + described in https://cloud.google.com/docs/authentication/application-default-credentials """ super().__init__( capabilities=stt.STTCapabilities(streaming=True, interim_results=True) @@ -70,10 +74,13 @@ def __init__( self._credentials_file = credentials_file if credentials_file is None and credentials_info is None: - creds = os.environ.get("GOOGLE_APPLICATION_CREDENTIALS") - if not creds: + try: + gauth_default() + except DefaultCredentialsError: raise ValueError( - "GOOGLE_APPLICATION_CREDENTIALS must be set if no credentials is provided" + "Application default credentials must be available " + "when using Google STT without explicitly passing " + "credentials through credentials_info or credentials_file." ) if isinstance(languages, str): @@ -109,7 +116,12 @@ def _recognizer(self) -> str: # recognizers may improve latency https://cloud.google.com/speech-to-text/v2/docs/recognizers#understand_recognizers # TODO(theomonnom): find a better way to access the project_id - project_id = self._ensure_client().transport._credentials.project_id # type: ignore + try: + project_id = self._ensure_client().transport._credentials.project_id # type: ignore + except AttributeError: + from google.auth import default as ga_default + + _, project_id = ga_default() return f"projects/{project_id}/locations/global/recognizers/_" def _sanitize_options(self, *, language: str | None = None) -> STTOptions: @@ -278,22 +290,22 @@ async def _run_stream( == cloud_speech.StreamingRecognizeResponse.SpeechEventType.SPEECH_EVENT_TYPE_UNSPECIFIED ): result = resp.results[0] + speech_data = _streaming_recognize_response_to_speech_data(resp) + if speech_data is None: + continue + if not result.is_final: self._event_ch.send_nowait( stt.SpeechEvent( type=stt.SpeechEventType.INTERIM_TRANSCRIPT, - alternatives=[ - 
_streaming_recognize_response_to_speech_data(resp) - ], + alternatives=[speech_data], ) ) else: self._event_ch.send_nowait( stt.SpeechEvent( type=stt.SpeechEventType.FINAL_TRANSCRIPT, - alternatives=[ - _streaming_recognize_response_to_speech_data(resp) - ], + alternatives=[speech_data], ) ) @@ -337,16 +349,21 @@ def _recognize_response_to_speech_event( def _streaming_recognize_response_to_speech_data( resp: cloud_speech.StreamingRecognizeResponse, -) -> stt.SpeechData: +) -> stt.SpeechData | None: text = "" confidence = 0.0 for result in resp.results: + if len(result.alternatives) == 0: + continue text += result.alternatives[0].transcript confidence += result.alternatives[0].confidence confidence /= len(resp.results) lg = resp.results[0].language_code + if text == "": + return None + data = stt.SpeechData( language=lg, start_time=0, end_time=0, confidence=confidence, text=text ) diff --git a/livekit-plugins/livekit-plugins-google/livekit/plugins/google/tts.py b/livekit-plugins/livekit-plugins-google/livekit/plugins/google/tts.py index 433ec84d2..f6fdb23e1 100644 --- a/livekit-plugins/livekit-plugins-google/livekit/plugins/google/tts.py +++ b/livekit-plugins/livekit-plugins-google/livekit/plugins/google/tts.py @@ -51,9 +51,13 @@ def __init__( credentials_file: str | None = None, ) -> None: """ - if no credentials is provided, it will use the credentials on the environment - GOOGLE_APPLICATION_CREDENTIALS (default behavior of Google TextToSpeechAsyncClient) + Create a new instance of Google TTS. + + Credentials must be provided, either by using the ``credentials_info`` dict, or reading + from the file specified in ``credentials_file`` or the ``GOOGLE_APPLICATION_CREDENTIALS`` + environmental variable. 
""" + super().__init__( capabilities=tts.TTSCapabilities( streaming=False, @@ -137,13 +141,25 @@ async def _main_task(self) -> None: data = response.audio_content if self._opts.audio_config.audio_encoding == "mp3": decoder = utils.codecs.Mp3StreamDecoder() + bstream = utils.audio.AudioByteStream( + sample_rate=self._opts.audio_config.sample_rate_hertz, num_channels=1 + ) for frame in decoder.decode_chunk(data): + for frame in bstream.write(frame.data): + self._event_ch.send_nowait( + tts.SynthesizedAudio( + request_id=request_id, segment_id=segment_id, frame=frame + ) + ) + + for frame in bstream.flush(): self._event_ch.send_nowait( tts.SynthesizedAudio( request_id=request_id, segment_id=segment_id, frame=frame ) ) else: + data = data[44:] # skip WAV header self._event_ch.send_nowait( tts.SynthesizedAudio( request_id=request_id, diff --git a/livekit-plugins/livekit-plugins-google/livekit/plugins/google/version.py b/livekit-plugins/livekit-plugins-google/livekit/plugins/google/version.py index 61bb6ddc4..947379190 100644 --- a/livekit-plugins/livekit-plugins-google/livekit/plugins/google/version.py +++ b/livekit-plugins/livekit-plugins-google/livekit/plugins/google/version.py @@ -12,4 +12,4 @@ # See the License for the specific language governing permissions and # limitations under the License. 
-__version__ = "0.6.2" +__version__ = "0.7.1" diff --git a/livekit-plugins/livekit-plugins-google/package.json b/livekit-plugins/livekit-plugins-google/package.json index b837c6a0f..96c90e560 100644 --- a/livekit-plugins/livekit-plugins-google/package.json +++ b/livekit-plugins/livekit-plugins-google/package.json @@ -1,5 +1,5 @@ { "name": "livekit-plugins-google", "private": true, - "version": "0.6.2" + "version": "0.7.1" } diff --git a/livekit-plugins/livekit-plugins-google/setup.py b/livekit-plugins/livekit-plugins-google/setup.py index b3d601e02..02441a882 100644 --- a/livekit-plugins/livekit-plugins-google/setup.py +++ b/livekit-plugins/livekit-plugins-google/setup.py @@ -48,6 +48,7 @@ packages=setuptools.find_namespace_packages(include=["livekit.*"]), python_requires=">=3.9.0", install_requires=[ + "google-auth >= 2, < 3", "google-cloud-speech >= 2, < 3", "google-cloud-texttospeech >= 2, < 3", "livekit-agents>=0.8.0.dev0", diff --git a/livekit-plugins/livekit-plugins-nltk/CHANGELOG.md b/livekit-plugins/livekit-plugins-nltk/CHANGELOG.md index a5e977792..6ee2124fe 100644 --- a/livekit-plugins/livekit-plugins-nltk/CHANGELOG.md +++ b/livekit-plugins/livekit-plugins-nltk/CHANGELOG.md @@ -1,5 +1,17 @@ # livekit-plugins-nltk +## 0.7.2 + +### Patch Changes + +- fix another semver break - [#659](https://github.com/livekit/agents/pull/659) ([@theomonnom](https://github.com/theomonnom)) + +## 0.7.1 + +### Patch Changes + +- Revert "nltk: fix broken punkt download" - [#630](https://github.com/livekit/agents/pull/630) ([@theomonnom](https://github.com/theomonnom)) + ## 0.7.0 ### Minor Changes diff --git a/livekit-plugins/livekit-plugins-nltk/livekit/plugins/nltk/version.py b/livekit-plugins/livekit-plugins-nltk/livekit/plugins/nltk/version.py index 6d6d0deb7..d40c15247 100644 --- a/livekit-plugins/livekit-plugins-nltk/livekit/plugins/nltk/version.py +++ b/livekit-plugins/livekit-plugins-nltk/livekit/plugins/nltk/version.py @@ -12,4 +12,4 @@ # See the License for the 
specific language governing permissions and # limitations under the License. -__version__ = "0.7.0" +__version__ = "0.7.2" diff --git a/livekit-plugins/livekit-plugins-nltk/package.json b/livekit-plugins/livekit-plugins-nltk/package.json index f7bd7b3b2..66a8eb3fa 100644 --- a/livekit-plugins/livekit-plugins-nltk/package.json +++ b/livekit-plugins/livekit-plugins-nltk/package.json @@ -1,5 +1,5 @@ { "name": "livekit-plugins-nltk", "private": true, - "version": "0.7.0" + "version": "0.7.2" } diff --git a/livekit-plugins/livekit-plugins-nltk/setup.py b/livekit-plugins/livekit-plugins-nltk/setup.py index 3f1307ba3..49ce4a921 100644 --- a/livekit-plugins/livekit-plugins-nltk/setup.py +++ b/livekit-plugins/livekit-plugins-nltk/setup.py @@ -46,7 +46,7 @@ license="Apache-2.0", packages=setuptools.find_namespace_packages(include=["livekit.*"]), python_requires=">=3.9.0", - install_requires=["livekit-agents>=0.8.0.dev0", "nltk >= 3.8.2, < 4"], + install_requires=["livekit-agents>=0.8.0.dev0", "nltk >= 3.9.1, < 4"], package_data={"livekit.plugins.nltk": ["py.typed"]}, project_urls={ "Documentation": "https://docs.livekit.io", diff --git a/livekit-plugins/livekit-plugins-openai/CHANGELOG.md b/livekit-plugins/livekit-plugins-openai/CHANGELOG.md index f7e55bef2..686c38acf 100644 --- a/livekit-plugins/livekit-plugins-openai/CHANGELOG.md +++ b/livekit-plugins/livekit-plugins-openai/CHANGELOG.md @@ -1,5 +1,40 @@ # livekit-plugins-openai +## 0.8.4 + +### Patch Changes + +- avoid returning tiny frames from TTS - [#747](https://github.com/livekit/agents/pull/747) ([@theomonnom](https://github.com/theomonnom)) + +- Fixing Assistant API Vision Capabilities - [#771](https://github.com/livekit/agents/pull/771) ([@keepingitneil](https://github.com/keepingitneil)) + +## 0.8.3 + +### Patch Changes + +- Introduce function calling to OpenAI Assistants - [#710](https://github.com/livekit/agents/pull/710) ([@keepingitneil](https://github.com/keepingitneil)) + +- Add Cerebras to OpenAI Plugin - 
[#731](https://github.com/livekit/agents/pull/731) ([@henrytwo](https://github.com/henrytwo)) + +## 0.8.2 + +### Patch Changes + +- Add deepseek LLMs at OpenAI plugin - [#714](https://github.com/livekit/agents/pull/714) ([@lenage](https://github.com/lenage)) + +- skip processing of choice.delta when it is None - [#705](https://github.com/livekit/agents/pull/705) ([@theomonnom](https://github.com/theomonnom)) + +## 0.8.1 + +### Patch Changes + +- add support for Ollama, Perplexity, Fireworks, Octo, Together, and Groq LLMs through the OpenAI API - [#611](https://github.com/livekit/agents/pull/611) ([@nbsp](https://github.com/nbsp)) + +- allow sending user IDs - [#633](https://github.com/livekit/agents/pull/633) ([@nbsp](https://github.com/nbsp)) + +- Support OpenAI Assistants API as a beta feature under `livekit.plugins.openai.beta` - [#601](https://github.com/livekit/agents/pull/601) ([@keepingitneil](https://github.com/keepingitneil)) + Add \_metadata to ChatCtx and ChatMessage which can be used (in the case of OpenAI assistants) for bookkeeping to sync local state with remote OpenAI state + ## 0.8.0 ### Minor Changes diff --git a/livekit-plugins/livekit-plugins-openai/livekit/plugins/openai/__init__.py b/livekit-plugins/livekit-plugins-openai/livekit/plugins/openai/__init__.py index e0fa12e4b..a09f09fb9 100644 --- a/livekit-plugins/livekit-plugins-openai/livekit/plugins/openai/__init__.py +++ b/livekit-plugins/livekit-plugins-openai/livekit/plugins/openai/__init__.py @@ -12,6 +12,8 @@ # See the License for the specific language governing permissions and # limitations under the License. + +from .
import beta from .embeddings import EmbeddingData, create_embeddings from .llm import LLM, LLMStream from .models import TTSModels, TTSVoices, WhisperModels @@ -25,6 +27,7 @@ "LLM", "LLMStream", "WhisperModels", + "beta", "TTSModels", "TTSVoices", "create_embeddings", diff --git a/livekit-plugins/livekit-plugins-openai/livekit/plugins/openai/beta/README.md b/livekit-plugins/livekit-plugins-openai/livekit/plugins/openai/beta/README.md new file mode 100644 index 000000000..99827b787 --- /dev/null +++ b/livekit-plugins/livekit-plugins-openai/livekit/plugins/openai/beta/README.md @@ -0,0 +1,78 @@ +# OpenAI Beta Features + +## Assistants API + +Example usage: + +```python +import asyncio + +from dotenv import load_dotenv +from livekit import rtc +from livekit.agents import AutoSubscribe, JobContext, WorkerOptions, cli, llm +from livekit.agents.voice_assistant import VoiceAssistant +from livekit.plugins import deepgram, openai, silero +from livekit.plugins.openai.beta import ( + AssistantCreateOptions, + AssistantLLM, + AssistantOptions, + OnFileUploadedInfo +) + +load_dotenv() + + +async def entrypoint(ctx: JobContext): + initial_ctx = llm.ChatContext() + + await ctx.connect(auto_subscribe=AutoSubscribe.AUDIO_ONLY) + + # When using vision capabilities, files are uploaded. + # It's up to you to remove them if desired or otherwise manage + # them going forward. + def on_file_uploaded(info: OnFileUploadedInfo): + pass + + assistant = VoiceAssistant( + vad=silero.VAD.load(), + stt=deepgram.STT(), + llm=AssistantLLM( + assistant_opts=AssistantOptions( + create_options=AssistantCreateOptions( + model="gpt-4o", + instructions="You are a voice assistant created by LiveKit.
Your interface with users will be voice.", + name="KITT", + ) + ) + ), + tts=openai.TTS(), + chat_ctx=initial_ctx, + on_file_uploaded=on_file_uploaded, + ) + assistant.start(ctx.room) + + # listen to incoming chat messages, only required if you'd like the agent to + # answer incoming messages from Chat + chat = rtc.ChatManager(ctx.room) + + async def answer_from_text(txt: str): + chat_ctx = assistant.chat_ctx.copy() + chat_ctx.append(role="user", text=txt) + stream = assistant.llm.chat(chat_ctx=chat_ctx) + await assistant.say(stream) + + @chat.on("message_received") + def on_chat_received(msg: rtc.ChatMessage): + if msg.message: + asyncio.create_task(answer_from_text(msg.message)) + + await asyncio.sleep(1) + await assistant.say("Hey, how can I help you today?", allow_interruptions=True) + + +if __name__ == "__main__": + cli.run_app(WorkerOptions(entrypoint_fnc=entrypoint)) +``` + +## TODO +- tool calling \ No newline at end of file diff --git a/livekit-plugins/livekit-plugins-openai/livekit/plugins/openai/beta/__init__.py b/livekit-plugins/livekit-plugins-openai/livekit/plugins/openai/beta/__init__.py new file mode 100644 index 000000000..f062606fb --- /dev/null +++ b/livekit-plugins/livekit-plugins-openai/livekit/plugins/openai/beta/__init__.py @@ -0,0 +1,17 @@ +from .assistant_llm import ( + AssistantCreateOptions, + AssistantLLM, + AssistantLoadOptions, + AssistantOptions, + OnFileUploaded, + OnFileUploadedInfo, +) + +__all__ = [ + "AssistantLLM", + "AssistantOptions", + "AssistantCreateOptions", + "AssistantLoadOptions", + "OnFileUploaded", + "OnFileUploadedInfo", +] diff --git a/livekit-plugins/livekit-plugins-openai/livekit/plugins/openai/beta/assistant_llm.py b/livekit-plugins/livekit-plugins-openai/livekit/plugins/openai/beta/assistant_llm.py new file mode 100644 index 000000000..01fd60bc4 --- /dev/null +++ b/livekit-plugins/livekit-plugins-openai/livekit/plugins/openai/beta/assistant_llm.py @@ -0,0 +1,590 @@ +# Copyright 2023 LiveKit, Inc.
+# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +from __future__ import annotations + +import asyncio +import json +import uuid +from dataclasses import dataclass +from typing import Any, Callable, Dict, Literal, MutableSet, Union + +import httpx +from livekit import rtc +from livekit.agents import llm, utils + +from openai import AsyncAssistantEventHandler, AsyncClient +from openai.types.beta.threads import Text, TextDelta +from openai.types.beta.threads.run_create_params import AdditionalMessage +from openai.types.beta.threads.run_submit_tool_outputs_params import ToolOutput +from openai.types.beta.threads.runs import ( + CodeInterpreterToolCall, + FileSearchToolCall, + FunctionToolCall, + ToolCall, +) +from openai.types.file_object import FileObject + +from ..log import logger +from ..models import ChatModels + +DEFAULT_MODEL = "gpt-4o" +OPENAI_MESSAGE_ID_KEY = "__openai_message_id__" +LIVEKIT_MESSAGE_ID_KEY = "__livekit_message_id__" +OPENAI_MESSAGES_ADDED_KEY = "__openai_messages_added__" +OPENAI_FILE_ID_KEY = "__openai_file_id__" + + +@dataclass +class LLMOptions: + model: str | ChatModels + + +@dataclass +class AssistantOptions: + """Options for creating (on-the-fly) or loading an assistant. 
Only one of create_options or load_options should be set.""" + + create_options: AssistantCreateOptions | None = None + load_options: AssistantLoadOptions | None = None + + +@dataclass +class AssistantCreateOptions: + name: str + instructions: str + model: ChatModels + temperature: float | None = None + # TODO: when we implement code_interpreter and file_search tools + # tool_resources: ToolResources | None = None + # tools: list[AssistantTools] = field(default_factory=list) + + +@dataclass +class AssistantLoadOptions: + assistant_id: str + thread_id: str | None + + +@dataclass +class OnFileUploadedInfo: + type: Literal["image"] + original_file: llm.ChatImage + openai_file_object: FileObject + + +OnFileUploaded = Callable[[OnFileUploadedInfo], None] + + +class AssistantLLM(llm.LLM): + def __init__( + self, + *, + assistant_opts: AssistantOptions, + client: AsyncClient | None = None, + api_key: str | None = None, + base_url: str | None = None, + on_file_uploaded: OnFileUploaded | None = None, + ) -> None: + test_ctx = llm.ChatContext() + if not hasattr(test_ctx, "_metadata"): + raise Exception( + "This beta feature of 'livekit-plugins-openai' requires a newer version of 'livekit-agents'" + ) + self._client = client or AsyncClient( + api_key=api_key, + base_url=base_url, + http_client=httpx.AsyncClient( + timeout=httpx.Timeout(timeout=30, connect=10, read=5, pool=5), + follow_redirects=True, + limits=httpx.Limits( + max_connections=1000, + max_keepalive_connections=100, + keepalive_expiry=120, + ), + ), + ) + self._assistant_opts = assistant_opts + self._running_fncs: MutableSet[asyncio.Task[Any]] = set() + self._on_file_uploaded = on_file_uploaded + self._tool_call_run_id_lookup = dict[str, str]() + self._submitted_tool_calls = set[str]() + + self._sync_openai_task: asyncio.Task[AssistantLoadOptions] | None = None + try: + self._sync_openai_task = asyncio.create_task(self._sync_openai()) + except Exception: + logger.error( + "failed to create sync openai task. 
This can happen when instantiating without a running asyncio event loop (such as when running tests)" + ) + self._done_futures = list[asyncio.Future[None]]() + + async def _sync_openai(self) -> AssistantLoadOptions: + if self._assistant_opts.create_options: + kwargs: Dict[str, Any] = { + "model": self._assistant_opts.create_options.model, + "name": self._assistant_opts.create_options.name, + "instructions": self._assistant_opts.create_options.instructions, + # "tools": [ + # {"type": t} for t in self._assistant_opts.create_options.tools + # ], + # "tool_resources": self._assistant_opts.create_options.tool_resources, + } + # TODO when we implement code_interpreter and file_search tools + # if self._assistant_opts.create_options.tool_resources: + # kwargs["tool_resources"] = ( + # self._assistant_opts.create_options.tool_resources + # ) + if self._assistant_opts.create_options.temperature: + kwargs["temperature"] = self._assistant_opts.create_options.temperature + assistant = await self._client.beta.assistants.create(**kwargs) + + thread = await self._client.beta.threads.create() + return AssistantLoadOptions(assistant_id=assistant.id, thread_id=thread.id) + elif self._assistant_opts.load_options: + if not self._assistant_opts.load_options.thread_id: + thread = await self._client.beta.threads.create() + self._assistant_opts.load_options.thread_id = thread.id + return self._assistant_opts.load_options + + raise Exception("One of create_options or load_options must be set") + + def chat( + self, + *, + chat_ctx: llm.ChatContext, + fnc_ctx: llm.FunctionContext | None = None, + temperature: float | None = None, + n: int | None = None, + parallel_tool_calls: bool | None = None, + ): + if n is not None: + logger.warning("OpenAI Assistants does not support the 'n' parameter") + + if parallel_tool_calls is not None: + logger.warning( + "OpenAI Assistants does not support the 'parallel_tool_calls' parameter" + ) + + if not self._sync_openai_task: + self._sync_openai_task = 
asyncio.create_task(self._sync_openai()) + + return AssistantLLMStream( + temperature=temperature, + assistant_llm=self, + sync_openai_task=self._sync_openai_task, + client=self._client, + chat_ctx=chat_ctx, + fnc_ctx=fnc_ctx, + on_file_uploaded=self._on_file_uploaded, + ) + + async def _register_tool_call(self, tool_call_id: str, run_id: str) -> None: + self._tool_call_run_id_lookup[tool_call_id] = run_id + + async def _submit_tool_call_result(self, tool_call_id: str, result: str) -> None: + if tool_call_id in self._submitted_tool_calls: + return + logger.debug(f"submitting tool call {tool_call_id} result") + run_id = self._tool_call_run_id_lookup.get(tool_call_id) + if not run_id: + logger.error(f"tool call {tool_call_id} not found") + return + + if not self._sync_openai_task: + logger.error("sync_openai_task not set") + return + + thread_id = (await self._sync_openai_task).thread_id + if not thread_id: + logger.error("thread_id not set") + return + tool_output = ToolOutput(output=result, tool_call_id=tool_call_id) + await self._client.beta.threads.runs.submit_tool_outputs_and_poll( + tool_outputs=[tool_output], run_id=run_id, thread_id=thread_id + ) + self._submitted_tool_calls.add(tool_call_id) + logger.debug(f"submitted tool call {tool_call_id} result") + + +class AssistantLLMStream(llm.LLMStream): + class EventHandler(AsyncAssistantEventHandler): + def __init__( + self, + llm: AssistantLLM, + llm_stream: AssistantLLMStream, + output_queue: asyncio.Queue[llm.ChatChunk | Exception | None], + chat_ctx: llm.ChatContext, + fnc_ctx: llm.FunctionContext | None = None, + ): + super().__init__() + self._llm = llm + self._llm_stream = llm_stream + self._chat_ctx = chat_ctx + self._output_queue = output_queue + self._fnc_ctx = fnc_ctx + + async def on_text_delta(self, delta: TextDelta, snapshot: Text): + self._output_queue.put_nowait( + llm.ChatChunk( + choices=[ + llm.Choice( + delta=llm.ChoiceDelta(role="assistant", content=delta.value) + ) + ] + ) + ) + + async def 
on_tool_call_created(self, tool_call: ToolCall): + if not self.current_run: + logger.error("tool call created without run") + return + await self._llm._register_tool_call(tool_call.id, self.current_run.id) + + async def on_tool_call_done( + self, + tool_call: CodeInterpreterToolCall | FileSearchToolCall | FunctionToolCall, + ) -> None: + if tool_call.type == "code_interpreter": + logger.warning("code interpreter tool call not yet implemented") + elif tool_call.type == "file_search": + logger.warning("file_search tool call not yet implemented") + elif tool_call.type == "function": + if not self._fnc_ctx: + logger.error("function tool called without function context") + return + + fnc = llm.FunctionCallInfo( + function_info=self._fnc_ctx.ai_functions[tool_call.function.name], + arguments=json.loads(tool_call.function.arguments), + tool_call_id=tool_call.id, + raw_arguments=tool_call.function.arguments, + ) + + self._llm_stream._function_calls_info.append(fnc) + chunk = llm.ChatChunk( + choices=[ + llm.Choice( + delta=llm.ChoiceDelta(role="assistant", tool_calls=[fnc]), + index=0, + ) + ] + ) + self._output_queue.put_nowait(chunk) + + def __init__( + self, + *, + assistant_llm: AssistantLLM, + client: AsyncClient, + sync_openai_task: asyncio.Task[AssistantLoadOptions], + chat_ctx: llm.ChatContext, + fnc_ctx: llm.FunctionContext | None, + temperature: float | None, + on_file_uploaded: OnFileUploaded | None, + ) -> None: + super().__init__(chat_ctx=chat_ctx, fnc_ctx=fnc_ctx) + self._llm = assistant_llm + self._client = client + self._temperature = temperature + self._on_file_uploaded = on_file_uploaded + + # current function call that we're waiting for full completion (args are streamed) + self._tool_call_id: str | None = None + self._fnc_name: str | None = None + self._fnc_raw_arguments: str | None = None + self._output_queue = asyncio.Queue[Union[llm.ChatChunk, Exception, None]]() + self._create_stream_task = asyncio.create_task(self._create_stream()) + 
self._sync_openai_task = sync_openai_task + + # Running stream is used to ensure that we only have one stream running at a time + self._done_future: asyncio.Future[None] = asyncio.Future() + + async def _create_stream(self) -> None: + # This function's complexity is due to the fact that we need to sync chat_ctx messages with OpenAI. + # OpenAI also does not allow us to modify messages while a stream is running. So we need to make sure streams run + # sequentially. The strategy is as follows: + # + # 1. ensure that we have a thread_id and assistant_id from OpenAI. This comes from the _sync_openai_task + # 2. make sure all previous streams are done before starting a new one + # 3. delete messages that are no longer in the chat_ctx but are still in OpenAI by using the OpenAI message id + # 4. add new messages to OpenAI that are in the chat_ctx but not in OpenAI. We don't know the OpenAI message id yet + # so we create a random uuid (we call it the LiveKit message id) and set that in the metadata. + # 5. start the stream and wait for it to finish + # 6. get the OpenAI message ids for the messages we added to OpenAI by using the metadata + # 7. Resolve the OpenAI message id with all messages that have a LiveKit message id. + try: + load_options = await self._sync_openai_task + + # The assistants api does not let us modify messages while a stream is running. + # So we have to make sure previous streams are done before starting a new one. + await asyncio.gather(*self._llm._done_futures) + self._llm._done_futures.clear() + self._llm._done_futures.append(self._done_future) + + # OpenAI requires submitting tool call outputs manually. We iterate + # tool outputs in the chat_ctx (from previous runs) and submit them + # before continuing. 
+ for msg in self._chat_ctx.messages: + if msg.role == "tool": + if not msg.tool_call_id: + logger.error("tool message without tool_call_id") + continue + if not isinstance(msg.content, str): + logger.error("tool message content is not str") + continue + await self._llm._submit_tool_call_result( + msg.tool_call_id, msg.content + ) + + # At the chat_ctx level, create a map of thread_id to message_ids + # This is used to keep track of which messages have been added to the thread + # and which we may need to delete from OpenAI + if OPENAI_MESSAGES_ADDED_KEY not in self._chat_ctx._metadata: + self._chat_ctx._metadata[OPENAI_MESSAGES_ADDED_KEY] = dict() + + if ( + load_options.thread_id + not in self._chat_ctx._metadata[OPENAI_MESSAGES_ADDED_KEY] + ): + self._chat_ctx._metadata[OPENAI_MESSAGES_ADDED_KEY][ + load_options.thread_id + ] = set() + + # Keep this handy to make the code more readable later on + openai_addded_messages_set: set[str] = self._chat_ctx._metadata[ + OPENAI_MESSAGES_ADDED_KEY + ][load_options.thread_id] + + # Keep track of messages that are no longer in the chat_ctx but are still in OpenAI + # Note: unfortunately, this adds latency. Usually it's just one message, so we loop over it, but + # it creates an extra round trip to OpenAI before we can run inference. + # TODO: parallelize it? 
+ for msg in self._chat_ctx.messages: + msg_id = msg._metadata.get(OPENAI_MESSAGE_ID_KEY, {}).get( + load_options.thread_id + ) + assert load_options.thread_id + if msg_id and msg_id not in openai_addded_messages_set: + await self._client.beta.threads.messages.delete( + thread_id=load_options.thread_id, + message_id=msg_id, + ) + logger.debug( + f"Deleted message '{msg_id}' in thread '{load_options.thread_id}'" + ) + openai_addded_messages_set.remove(msg_id) + + # Upload any images in the chat_ctx that have not been uploaded to OpenAI + for msg in self._chat_ctx.messages: + if msg.role != "user": + continue + + if not isinstance(msg.content, list): + continue + + for cnt in msg.content: + if ( + not isinstance(cnt, llm.ChatImage) + or OPENAI_FILE_ID_KEY in cnt._cache + ): + continue + + if isinstance(cnt.image, str): + continue + + file_obj = await self._upload_frame( + cnt.image, cnt.inference_width, cnt.inference_height + ) + cnt._cache[OPENAI_FILE_ID_KEY] = file_obj.id + if self._on_file_uploaded: + self._on_file_uploaded( + OnFileUploadedInfo( + type="image", + original_file=cnt, + openai_file_object=file_obj, + ) + ) + + # Keep track of the new messages in the chat_ctx that we need to add to OpenAI + additional_messages: list[AdditionalMessage] = [] + for msg in self._chat_ctx.messages: + if msg.role != "user": + continue + + msg_id = str(uuid.uuid4()) + if OPENAI_MESSAGE_ID_KEY not in msg._metadata: + msg._metadata[OPENAI_MESSAGE_ID_KEY] = dict[str, str]() + + if LIVEKIT_MESSAGE_ID_KEY not in msg._metadata: + msg._metadata[LIVEKIT_MESSAGE_ID_KEY] = dict[str, str]() + + oai_msg_id_dict = msg._metadata[OPENAI_MESSAGE_ID_KEY] + lk_msg_id_dict = msg._metadata[LIVEKIT_MESSAGE_ID_KEY] + + if load_options.thread_id not in oai_msg_id_dict: + converted_msg = build_oai_message(msg) + converted_msg["private_message_id"] = msg_id + additional_messages.append( + AdditionalMessage( + role="user", + content=converted_msg["content"], + metadata={LIVEKIT_MESSAGE_ID_KEY: 
msg_id}, + ) + ) + lk_msg_id_dict[load_options.thread_id] = msg_id + + eh = AssistantLLMStream.EventHandler( + llm=self._llm, + output_queue=self._output_queue, + chat_ctx=self._chat_ctx, + fnc_ctx=self._fnc_ctx, + llm_stream=self, + ) + assert load_options.thread_id + kwargs: dict[str, Any] = { + "additional_messages": additional_messages, + "thread_id": load_options.thread_id, + "assistant_id": load_options.assistant_id, + "event_handler": eh, + "temperature": self._temperature, + } + if self._fnc_ctx: + kwargs["tools"] = [ + llm._oai_api.build_oai_function_description(f) + for f in self._fnc_ctx.ai_functions.values() + ] + + async with self._client.beta.threads.runs.stream(**kwargs) as stream: + await stream.until_done() + + await self._output_queue.put(None) + + # Populate the openai_message_id for the messages we added to OpenAI. Note: we do this after + # sending None to close the iterator so that it is done in parallel with any users of + # the stream. However, the next stream will not start until this is done. 
+ lk_to_oai_lookup = dict[str, str]() + messages = await self._client.beta.threads.messages.list( + thread_id=load_options.thread_id, + limit=10, # We could be smarter and make a more exact query, but this is probably fine + ) + for oai_msg in messages.data: + if oai_msg.metadata.get(LIVEKIT_MESSAGE_ID_KEY): # type: ignore + lk_to_oai_lookup[oai_msg.metadata[LIVEKIT_MESSAGE_ID_KEY]] = ( # type: ignore + oai_msg.id + ) + + for msg in self._chat_ctx.messages: + if msg.role != "user": + continue + oai_msg_id_dict = msg._metadata.get(OPENAI_MESSAGE_ID_KEY) + lk_msg_id_dict = msg._metadata.get(LIVEKIT_MESSAGE_ID_KEY) + if oai_msg_id_dict is None or lk_msg_id_dict is None: + continue + + lk_msg_id = lk_msg_id_dict.get(load_options.thread_id) + if lk_msg_id and lk_msg_id in lk_to_oai_lookup: + oai_msg_id = lk_to_oai_lookup[lk_msg_id] + oai_msg_id_dict[load_options.thread_id] = oai_msg_id + openai_addded_messages_set.add(oai_msg_id) + # We don't need the LiveKit message id anymore + lk_msg_id_dict.pop(load_options.thread_id) + + except Exception as e: + await self._output_queue.put(e) + finally: + self._done_future.set_result(None) + + async def _upload_frame( + self, + frame: rtc.VideoFrame, + inference_width: int | None, + inference_height: int | None, + ): + # inside our internal implementation, we allow attaching extra metadata to + # each ChatImage (to avoid re-encoding on every chat completion request) + opts = utils.images.EncodeOptions() + if inference_width and inference_height: + opts.resize_options = utils.images.ResizeOptions( + width=inference_width, + height=inference_height, + strategy="center_aspect_fit", + ) + + encoded_data = utils.images.encode(frame, opts) + fileObj = await self._client.files.create( + file=("image.jpg", encoded_data), + purpose="vision", + ) + + return fileObj + + async def __anext__(self): + item = await self._output_queue.get() + if item is None: + raise StopAsyncIteration + + if isinstance(item, Exception): + raise item + + return 
item + + +def build_oai_message(msg: llm.ChatMessage): + oai_msg: dict[str, Any] = {"role": msg.role} + + if msg.name: + oai_msg["name"] = msg.name + + # add content if provided + if isinstance(msg.content, str): + oai_msg["content"] = msg.content + elif isinstance(msg.content, list): + oai_content: list[dict[str, Any]] = [] + for cnt in msg.content: + if isinstance(cnt, str): + oai_content.append({"type": "text", "text": cnt}) + elif isinstance(cnt, llm.ChatImage): + if cnt._cache[OPENAI_FILE_ID_KEY]: + oai_content.append( + { + "type": "image_file", + "image_file": {"file_id": cnt._cache[OPENAI_FILE_ID_KEY]}, + } + ) + + oai_msg["content"] = oai_content + + # make sure to provide when function has been called inside the context + # (+ raw_arguments) + if msg.tool_calls is not None: + tool_calls: list[dict[str, Any]] = [] + oai_msg["tool_calls"] = tool_calls + for fnc in msg.tool_calls: + tool_calls.append( + { + "id": fnc.tool_call_id, + "type": "function", + "function": { + "name": fnc.function_info.name, + "arguments": fnc.raw_arguments, + }, + } + ) + + # tool_call_id is set when the message is a response/result to a function call + # (content is a string in this case) + if msg.tool_call_id: + oai_msg["tool_call_id"] = msg.tool_call_id + + return oai_msg diff --git a/livekit-plugins/livekit-plugins-openai/livekit/plugins/openai/llm.py b/livekit-plugins/livekit-plugins-openai/livekit/plugins/openai/llm.py index 07f75cc85..d8cfccc9f 100644 --- a/livekit-plugins/livekit-plugins-openai/livekit/plugins/openai/llm.py +++ b/livekit-plugins/livekit-plugins-openai/livekit/plugins/openai/llm.py @@ -15,13 +15,12 @@ from __future__ import annotations import asyncio -import base64 +import os from dataclasses import dataclass from typing import Any, Awaitable, MutableSet import httpx -from livekit import rtc -from livekit.agents import llm, utils +from livekit.agents import llm import openai from openai.types.chat import ChatCompletionChunk, ChatCompletionMessageParam @@ 
-29,18 +28,22 @@ from .log import logger from .models import ( + CerebrasChatModels, ChatModels, + DeepSeekChatModels, GroqChatModels, OctoChatModels, PerplexityChatModels, TogetherChatModels, ) -from .utils import AsyncAzureADTokenProvider +from .utils import AsyncAzureADTokenProvider, build_oai_message @dataclass class LLMOptions: model: str | ChatModels + user: str | None + temperature: float | None class LLM(llm.LLM): @@ -50,14 +53,27 @@ def __init__( model: str | ChatModels = "gpt-4o", api_key: str | None = None, base_url: str | None = None, + user: str | None = None, client: openai.AsyncClient | None = None, + temperature: float | None = None, ) -> None: - self._opts = LLMOptions(model=model) + """ + Create a new instance of OpenAI LLM. + + ``api_key`` must be set to your OpenAI API key, either using the argument or by setting the + ``OPENAI_API_KEY`` environmental variable. + """ + # throw an error on our end + api_key = api_key or os.environ.get("OPENAI_API_KEY") + if api_key is None: + raise ValueError("OpenAI API key is required") + + self._opts = LLMOptions(model=model, user=user, temperature=temperature) self._client = client or openai.AsyncClient( api_key=api_key, base_url=base_url, http_client=httpx.AsyncClient( - timeout=5.0, + timeout=httpx.Timeout(timeout=30, connect=10, read=5, pool=5), follow_redirects=True, limits=httpx.Limits( max_connections=1000, @@ -81,6 +97,8 @@ def with_azure( organization: str | None = None, project: str | None = None, base_url: str | None = None, + user: str | None = None, + temperature: float | None = None, ) -> LLM: """ This automatically infers the following arguments from their corresponding environment variables if they are not provided: @@ -104,7 +122,38 @@ def with_azure( base_url=base_url, ) # type: ignore - return LLM(model=model, client=azure_client) + return LLM(model=model, client=azure_client, user=user, temperature=temperature) + + @staticmethod + def with_cerebras( + *, + model: str | CerebrasChatModels = 
"llama3.1-8b", + api_key: str | None = None, + base_url: str | None = "https://api.cerebras.ai/v1", + client: openai.AsyncClient | None = None, + user: str | None = None, + temperature: float | None = None, + ) -> LLM: + """ + Create a new instance of Cerebras LLM. + + ``api_key`` must be set to your Cerebras API key, either using the argument or by setting + the ``CEREBRAS_API_KEY`` environmental variable. + """ + + # shim for not using OPENAI_API_KEY + api_key = api_key or os.environ.get("CEREBRAS_API_KEY") + if api_key is None: + raise ValueError("Cerebras API key is required") + + return LLM( + model=model, + api_key=api_key, + base_url=base_url, + client=client, + user=user, + temperature=temperature, + ) @staticmethod def with_fireworks( @@ -113,8 +162,29 @@ def with_fireworks( api_key: str | None = None, base_url: str | None = "https://api.fireworks.ai/inference/v1", client: openai.AsyncClient | None = None, + user: str | None = None, + temperature: float | None = None, ) -> LLM: - return LLM(model=model, api_key=api_key, base_url=base_url, client=client) + """ + Create a new instance of Fireworks LLM. + + ``api_key`` must be set to your Fireworks API key, either using the argument or by setting + the ``FIREWORKS_API_KEY`` environmental variable. + """ + + # shim for not using OPENAI_API_KEY + api_key = api_key or os.environ.get("FIREWORKS_API_KEY") + if api_key is None: + raise ValueError("Fireworks API key is required") + + return LLM( + model=model, + api_key=api_key, + base_url=base_url, + client=client, + user=user, + temperature=temperature, + ) @staticmethod def with_groq( @@ -123,8 +193,60 @@ def with_groq( api_key: str | None = None, base_url: str | None = "https://api.groq.com/openai/v1", client: openai.AsyncClient | None = None, + user: str | None = None, + temperature: float | None = None, ) -> LLM: - return LLM(model=model, api_key=api_key, base_url=base_url, client=client) + """ + Create a new instance of Groq LLM. 
+ + ``api_key`` must be set to your Groq API key, either using the argument or by setting + the ``GROQ_API_KEY`` environmental variable. + """ + + # shim for not using OPENAI_API_KEY + api_key = api_key or os.environ.get("GROQ_API_KEY") + if api_key is None: + raise ValueError("Groq API key is required") + + return LLM( + model=model, + api_key=api_key, + base_url=base_url, + client=client, + user=user, + temperature=temperature, + ) + + @staticmethod + def with_deepseek( + *, + model: str | DeepSeekChatModels = "deepseek-chat", + api_key: str | None = None, + base_url: str | None = "https://api.deepseek.com/v1", + client: openai.AsyncClient | None = None, + user: str | None = None, + temperature: float | None = None, + ) -> LLM: + """ + Create a new instance of DeepSeek LLM. + + ``api_key`` must be set to your DeepSeek API key, either using the argument or by setting + the ``DEEPSEEK_API_KEY`` environmental variable. + """ + + # shim for not using OPENAI_API_KEY + api_key = api_key or os.environ.get("DEEPSEEK_API_KEY") + if api_key is None: + raise ValueError("DeepSeek API key is required") + + return LLM( + model=model, + api_key=api_key, + base_url=base_url, + client=client, + user=user, + temperature=temperature, + ) @staticmethod def with_octo( @@ -133,8 +255,29 @@ def with_octo( api_key: str | None = None, base_url: str | None = "https://text.octoai.run/v1", client: openai.AsyncClient | None = None, + user: str | None = None, + temperature: float | None = None, ) -> LLM: - return LLM(model=model, api_key=api_key, base_url=base_url, client=client) + """ + Create a new instance of OctoAI LLM. + + ``api_key`` must be set to your OctoAI API key, either using the argument or by setting + the ``OCTOAI_TOKEN`` environmental variable. 
+ """ + + # shim for not using OPENAI_API_KEY + api_key = api_key or os.environ.get("OCTOAI_TOKEN") + if api_key is None: + raise ValueError("OctoAI API key is required") + + return LLM( + model=model, + api_key=api_key, + base_url=base_url, + client=client, + user=user, + temperature=temperature, + ) @staticmethod def with_ollama( @@ -142,8 +285,19 @@ def with_ollama( model: str = "llama3.1", base_url: str | None = "http://localhost:11434/v1", client: openai.AsyncClient | None = None, + temperature: float | None = None, ) -> LLM: - return LLM(model=model, api_key="ollama", base_url=base_url, client=client) + """ + Create a new instance of Ollama LLM. + """ + + return LLM( + model=model, + api_key="ollama", + base_url=base_url, + client=client, + temperature=temperature, + ) @staticmethod def with_perplexity( @@ -152,8 +306,17 @@ def with_perplexity( api_key: str | None = None, base_url: str | None = "https://api.perplexity.ai", client: openai.AsyncClient | None = None, + user: str | None = None, + temperature: float | None = None, ) -> LLM: - return LLM(model=model, api_key=api_key, base_url=base_url, client=client) + return LLM( + model=model, + api_key=api_key, + base_url=base_url, + client=client, + user=user, + temperature=temperature, + ) @staticmethod def with_together( @@ -162,8 +325,29 @@ def with_together( api_key: str | None = None, base_url: str | None = "https://api.together.xyz/v1", client: openai.AsyncClient | None = None, + user: str | None = None, + temperature: float | None = None, ) -> LLM: - return LLM(model=model, api_key=api_key, base_url=base_url, client=client) + """ + Create a new instance of TogetherAI LLM. + + ``api_key`` must be set to your TogetherAI API key, either using the argument or by setting + the ``TOGETHER_API_KEY`` environmental variable. 
+ """ + + # shim for not using OPENAI_API_KEY + api_key = api_key or os.environ.get("TOGETHER_API_KEY") + if api_key is None: + raise ValueError("TogetherAI API key is required") + + return LLM( + model=model, + api_key=api_key, + base_url=base_url, + client=client, + user=user, + temperature=temperature, + ) @staticmethod def create_azure_client( @@ -178,6 +362,8 @@ def create_azure_client( organization: str | None = None, project: str | None = None, base_url: str | None = None, + user: str | None = None, + temperature: float | None = None, ) -> LLM: logger.warning("This alias is deprecated. Use LLM.with_azure() instead") return LLM.with_azure( @@ -190,6 +376,8 @@ def create_azure_client( organization=organization, project=project, base_url=base_url, + user=user, + temperature=temperature, ) def chat( @@ -212,6 +400,10 @@ def chat( if fnc_ctx and parallel_tool_calls is not None: opts["parallel_tool_calls"] = parallel_tool_calls + user = self._opts.user or openai.NOT_GIVEN + if temperature is None: + temperature = self._opts.temperature + messages = _build_oai_context(chat_ctx, id(self)) cmp = self._client.chat.completions.create( messages=messages, @@ -219,6 +411,7 @@ def chat( n=n, temperature=temperature, stream=True, + user=user, **opts, ) @@ -263,6 +456,11 @@ async def __anext__(self): def _parse_choice(self, choice: Choice) -> llm.ChatChunk | None: delta = choice.delta + # https://github.com/livekit/agents/issues/688 + # the delta can be None when using Azure OpenAI using content filtering + if delta is None: + return None + if delta.tool_calls: # check if we have functions to calls for tool in delta.tool_calls: @@ -332,77 +530,4 @@ def _try_run_function(self, choice: Choice) -> llm.ChatChunk | None: def _build_oai_context( chat_ctx: llm.ChatContext, cache_key: Any ) -> list[ChatCompletionMessageParam]: - return [_build_oai_message(msg, cache_key) for msg in chat_ctx.messages] # type: ignore - - -def _build_oai_message(msg: llm.ChatMessage, cache_key: Any): - 
oai_msg: dict = {"role": msg.role} - - if msg.name: - oai_msg["name"] = msg.name - - # add content if provided - if isinstance(msg.content, str): - oai_msg["content"] = msg.content - elif isinstance(msg.content, list): - oai_content = [] - for cnt in msg.content: - if isinstance(cnt, str): - oai_content.append({"type": "text", "text": cnt}) - elif isinstance(cnt, llm.ChatImage): - oai_content.append(_build_oai_image_content(cnt, cache_key)) - - oai_msg["content"] = oai_content - - # make sure to provide when function has been called inside the context - # (+ raw_arguments) - if msg.tool_calls is not None: - tool_calls: list[dict[str, Any]] = [] - oai_msg["tool_calls"] = tool_calls - for fnc in msg.tool_calls: - tool_calls.append( - { - "id": fnc.tool_call_id, - "type": "function", - "function": { - "name": fnc.function_info.name, - "arguments": fnc.raw_arguments, - }, - } - ) - - # tool_call_id is set when the message is a response/result to a function call - # (content is a string in this case) - if msg.tool_call_id: - oai_msg["tool_call_id"] = msg.tool_call_id - - return oai_msg - - -def _build_oai_image_content(image: llm.ChatImage, cache_key: Any): - if isinstance(image.image, str): # image url - return { - "type": "image_url", - "image_url": {"url": image.image, "detail": "auto"}, - } - elif isinstance(image.image, rtc.VideoFrame): # VideoFrame - if cache_key not in image._cache: - # inside our internal implementation, we allow to put extra metadata to - # each ChatImage (avoid to reencode each time we do a chatcompletion request) - opts = utils.images.EncodeOptions() - if image.inference_width and image.inference_height: - opts.resize_options = utils.images.ResizeOptions( - width=image.inference_width, - height=image.inference_height, - strategy="center_aspect_fit", - ) - - encoded_data = utils.images.encode(image.image, opts) - image._cache[cache_key] = base64.b64encode(encoded_data).decode("utf-8") - - return { - "type": "image_url", - "image_url": {"url": 
f"data:image/jpeg;base64,{image._cache[cache_key]}"}, - } - - raise ValueError(f"unknown image type {type(image.image)}") + return [build_oai_message(msg, cache_key) for msg in chat_ctx.messages] # type: ignore diff --git a/livekit-plugins/livekit-plugins-openai/livekit/plugins/openai/models.py b/livekit-plugins/livekit-plugins-openai/livekit/plugins/openai/models.py index 95e81aa66..3815826e4 100644 --- a/livekit-plugins/livekit-plugins-openai/livekit/plugins/openai/models.py +++ b/livekit-plugins/livekit-plugins-openai/livekit/plugins/openai/models.py @@ -7,6 +7,8 @@ ChatModels = Literal[ "gpt-4o", "gpt-4o-2024-05-13", + "gpt-4o-mini", + "gpt-4o-mini-2024-07-18", "gpt-4-turbo", "gpt-4-turbo-2024-04-09", "gpt-4-turbo-preview", @@ -31,8 +33,15 @@ "text-embedding-ada-002", "text-embedding-3-small", "text-embedding-3-large" ] +AssistantTools = Literal["code_interpreter", "file_search", "function"] + # adapters for OpenAI-compatible LLMs +CerebrasChatModels = Literal[ + "llama3.1-8b", + "llama3.1-70b", +] + PerplexityChatModels = Literal[ "llama-3.1-sonar-small-128k-online", "llama-3.1-sonar-small-128k-chat", @@ -56,6 +65,11 @@ "gemma2-9b-it", ] +DeepSeekChatModels = Literal[ + "deepseek-coder", + "deepseek-chat", +] + TogetherChatModels = Literal[ "Austism/chronos-hermes-13b", "Gryphe/MythoMax-L2-13b", diff --git a/livekit-plugins/livekit-plugins-openai/livekit/plugins/openai/stt.py b/livekit-plugins/livekit-plugins-openai/livekit/plugins/openai/stt.py index f9356a1cb..6cb949b9d 100644 --- a/livekit-plugins/livekit-plugins-openai/livekit/plugins/openai/stt.py +++ b/livekit-plugins/livekit-plugins-openai/livekit/plugins/openai/stt.py @@ -16,6 +16,7 @@ import dataclasses import io +import os import wave from dataclasses import dataclass @@ -47,6 +48,13 @@ def __init__( api_key: str | None = None, client: openai.AsyncClient | None = None, ): + """ + Create a new instance of OpenAI STT. 
+ + ``api_key`` must be set to your OpenAI API key, either using the argument or by setting the + ``OPENAI_API_KEY`` environmental variable. + """ + super().__init__( capabilities=stt.STTCapabilities(streaming=False, interim_results=False) ) @@ -59,6 +67,11 @@ def __init__( model=model, ) + # throw an error on our end + api_key = api_key or os.environ.get("OPENAI_API_KEY") + if api_key is None: + raise ValueError("OpenAI API key is required") + self._client = client or openai.AsyncClient( api_key=api_key, base_url=base_url, diff --git a/livekit-plugins/livekit-plugins-openai/livekit/plugins/openai/tts.py b/livekit-plugins/livekit-plugins-openai/livekit/plugins/openai/tts.py index 27f62df13..fed67c9c5 100644 --- a/livekit-plugins/livekit-plugins-openai/livekit/plugins/openai/tts.py +++ b/livekit-plugins/livekit-plugins-openai/livekit/plugins/openai/tts.py @@ -14,6 +14,7 @@ from __future__ import annotations +import os from dataclasses import dataclass from typing import AsyncContextManager @@ -48,6 +49,13 @@ def __init__( api_key: str | None = None, client: openai.AsyncClient | None = None, ) -> None: + """ + Create a new instance of OpenAI TTS. + + ``api_key`` must be set to your OpenAI API key, either using the argument or by setting the + ``OPENAI_API_KEY`` environmental variable. 
+ """ + super().__init__( capabilities=tts.TTSCapabilities( streaming=False, @@ -56,6 +64,11 @@ def __init__( num_channels=OPENAI_TTS_CHANNELS, ) + # throw an error on our end + api_key = api_key or os.environ.get("OPENAI_API_KEY") + if api_key is None: + raise ValueError("OpenAI API key is required") + self._client = client or openai.AsyncClient( api_key=api_key, base_url=base_url, @@ -144,11 +157,26 @@ async def _main_task(self): request_id = utils.shortuuid() segment_id = utils.shortuuid() decoder = utils.codecs.Mp3StreamDecoder() + audio_bstream = utils.audio.AudioByteStream( + sample_rate=OPENAI_TTS_SAMPLE_RATE, + num_channels=OPENAI_TTS_CHANNELS, + ) + async with self._oai_stream as stream: - async for data in stream.iter_bytes(4096): + async for data in stream.iter_bytes(): for frame in decoder.decode_chunk(data): - self._event_ch.send_nowait( - tts.SynthesizedAudio( - request_id=request_id, segment_id=segment_id, frame=frame + for frame in audio_bstream.write(frame.data): + self._event_ch.send_nowait( + tts.SynthesizedAudio( + request_id=request_id, + segment_id=segment_id, + frame=frame, + ) ) + + for frame in audio_bstream.flush(): + self._event_ch.send_nowait( + tts.SynthesizedAudio( + request_id=request_id, segment_id=segment_id, frame=frame ) + ) diff --git a/livekit-plugins/livekit-plugins-openai/livekit/plugins/openai/utils.py b/livekit-plugins/livekit-plugins-openai/livekit/plugins/openai/utils.py index 55e7d8d13..40d95037f 100644 --- a/livekit-plugins/livekit-plugins-openai/livekit/plugins/openai/utils.py +++ b/livekit-plugins/livekit-plugins-openai/livekit/plugins/openai/utils.py @@ -1,3 +1,89 @@ -from typing import Awaitable, Callable, Union +from __future__ import annotations + +import base64 +import os +from typing import Any, Awaitable, Callable, Optional, Union + +from livekit import rtc +from livekit.agents import llm, utils AsyncAzureADTokenProvider = Callable[[], Union[str, Awaitable[str]]] + + +def get_base_url(base_url: Optional[str]) -> 
str: + if not base_url: + base_url = os.getenv("OPENAI_BASE_URL", "https://api.openai.com/v1") + return base_url + + +def build_oai_message(msg: llm.ChatMessage, cache_key: Any): + oai_msg: dict[str, Any] = {"role": msg.role} + + if msg.name: + oai_msg["name"] = msg.name + + # add content if provided + if isinstance(msg.content, str): + oai_msg["content"] = msg.content + elif isinstance(msg.content, list): + oai_content: list[dict[str, Any]] = [] + for cnt in msg.content: + if isinstance(cnt, str): + oai_content.append({"type": "text", "text": cnt}) + elif isinstance(cnt, llm.ChatImage): + oai_content.append(_build_oai_image_content(cnt, cache_key)) + + oai_msg["content"] = oai_content + + # include tool_calls (with their raw_arguments) when a function was called + # inside this context + if msg.tool_calls is not None: + tool_calls: list[dict[str, Any]] = [] + oai_msg["tool_calls"] = tool_calls + for fnc in msg.tool_calls: + tool_calls.append( + { + "id": fnc.tool_call_id, + "type": "function", + "function": { + "name": fnc.function_info.name, + "arguments": fnc.raw_arguments, + }, + } + ) + + # tool_call_id is set when the message is a response/result to a function call + # (content is a string in this case) + if msg.tool_call_id: + oai_msg["tool_call_id"] = msg.tool_call_id + + return oai_msg + + +def _build_oai_image_content(image: llm.ChatImage, cache_key: Any): + if isinstance(image.image, str): # image url + return { + "type": "image_url", + "image_url": {"url": image.image, "detail": "auto"}, + } + elif isinstance(image.image, rtc.VideoFrame): # VideoFrame + if cache_key not in image._cache: + # cache the encoded frame on the ChatImage itself to avoid + # re-encoding it on every chat completion request + opts = utils.images.EncodeOptions() + if image.inference_width and image.inference_height: + opts.resize_options = utils.images.ResizeOptions( + width=image.inference_width, + height=image.inference_height, +
strategy="center_aspect_fit", + ) + + encoded_data = utils.images.encode(image.image, opts) + image._cache[cache_key] = base64.b64encode(encoded_data).decode("utf-8") + + return { + "type": "image_url", + "image_url": {"url": f"data:image/jpeg;base64,{image._cache[cache_key]}"}, + } + + raise ValueError(f"unknown image type {type(image.image)}") diff --git a/livekit-plugins/livekit-plugins-openai/livekit/plugins/openai/version.py b/livekit-plugins/livekit-plugins-openai/livekit/plugins/openai/version.py index fc4dcfeb4..bdeeae9e4 100644 --- a/livekit-plugins/livekit-plugins-openai/livekit/plugins/openai/version.py +++ b/livekit-plugins/livekit-plugins-openai/livekit/plugins/openai/version.py @@ -12,4 +12,4 @@ # See the License for the specific language governing permissions and # limitations under the License. -__version__ = "0.8.0" +__version__ = "0.8.4" diff --git a/livekit-plugins/livekit-plugins-openai/package.json b/livekit-plugins/livekit-plugins-openai/package.json index b27e3946f..20eeab2f4 100644 --- a/livekit-plugins/livekit-plugins-openai/package.json +++ b/livekit-plugins/livekit-plugins-openai/package.json @@ -1,5 +1,5 @@ { "name": "livekit-plugins-openai", "private": true, - "version": "0.8.0" + "version": "0.8.4" } diff --git a/livekit-plugins/livekit-plugins-playht/README.md b/livekit-plugins/livekit-plugins-playht/README.md new file mode 100644 index 000000000..53badc144 --- /dev/null +++ b/livekit-plugins/livekit-plugins-playht/README.md @@ -0,0 +1,13 @@ +# LiveKit Plugins PlayHT + +Agent Framework plugin for voice synthesis with [PlayHT](https://play.ht/) API. + +## Installation + +```bash +pip install livekit-plugins-playht +``` + +## Pre-requisites + +You'll need USER ID and API Secret KEY from PlayHT. 
They can be set as the environment variables `PLAYHT_USER_ID` and `PLAYHT_API_KEY`. \ No newline at end of file diff --git a/livekit-plugins/livekit-plugins-playht/livekit/__init__.py b/livekit-plugins/livekit-plugins-playht/livekit/__init__.py new file mode 100644 index 000000000..e69de29bb diff --git a/livekit-plugins/livekit-plugins-playht/livekit/plugins/__init__.py b/livekit-plugins/livekit-plugins-playht/livekit/plugins/__init__.py new file mode 100644 index 000000000..e69de29bb diff --git a/livekit-plugins/livekit-plugins-playht/livekit/plugins/playht/__init__.py b/livekit-plugins/livekit-plugins-playht/livekit/plugins/playht/__init__.py new file mode 100644 index 000000000..366012953 --- /dev/null +++ b/livekit-plugins/livekit-plugins-playht/livekit/plugins/playht/__init__.py @@ -0,0 +1,24 @@ + +from .models import TTSEngines +from .tts import DEFAULT_VOICE, TTS, Voice +from .version import __version__ + +__all__ = [ + "TTS", + "Voice", + "DEFAULT_VOICE", + "TTSEngines", + "__version__", +] + +from livekit.agents import Plugin + + +class PlayHTPlugin(Plugin): + def __init__(self) -> None: + super().__init__(__name__, __version__, __package__) + + def download_files(self) -> None: + pass # nothing to download for this plugin + +Plugin.register_plugin(PlayHTPlugin()) diff --git a/livekit-plugins/livekit-plugins-playht/livekit/plugins/playht/log.py b/livekit-plugins/livekit-plugins-playht/livekit/plugins/playht/log.py new file mode 100644 index 000000000..fe278b042 --- /dev/null +++ b/livekit-plugins/livekit-plugins-playht/livekit/plugins/playht/log.py @@ -0,0 +1,3 @@ +import logging + +logger = logging.getLogger("livekit.custom_tts_plugins.playht") \ No newline at end of file diff --git a/livekit-plugins/livekit-plugins-playht/livekit/plugins/playht/models.py b/livekit-plugins/livekit-plugins-playht/livekit/plugins/playht/models.py new file mode 100644 index 000000000..942560b9b --- /dev/null +++ b/livekit-plugins/livekit-plugins-playht/livekit/plugins/playht/models.py @@ -0,0 +1,19
@@ +from typing import Literal + +TTSEngines = Literal[ + 'PlayHT2.0', + 'PlayHT1.0', + 'PlayHT2.0-turbo' +] + +TTSEncoding = Literal[ + "mp3_22050_32", + "mp3_44100_32", + "mp3_44100_64", + "mp3_44100_96", + "mp3_44100_128", + "mp3_44100_192", + "pcm_16000", + "pcm_22050", + "pcm_44100", +] \ No newline at end of file diff --git a/livekit-plugins/livekit-plugins-playht/livekit/plugins/playht/tts.py b/livekit-plugins/livekit-plugins-playht/livekit/plugins/playht/tts.py new file mode 100644 index 000000000..4ce65e7fc --- /dev/null +++ b/livekit-plugins/livekit-plugins-playht/livekit/plugins/playht/tts.py @@ -0,0 +1,218 @@ +from __future__ import annotations + +import asyncio +import base64 +import dataclasses +import json +import os +import io +from dataclasses import dataclass +from typing import Any, List, Literal +from pyht import Client, TTSOptions, Format + +import aiohttp +from livekit.agents import tts, utils, tokenize +from livekit import rtc + +from .log import logger +from .models import TTSEncoding, TTSEngines + +_Encoding = Literal["mp3", "pcm"] + + +def _sample_rate_from_format(output_format: TTSEncoding) -> int: + split = output_format.split("_") + return int(split[1]) + + +def _encoding_from_format(output_format: TTSEncoding) -> _Encoding: + if output_format.startswith("mp3"): + return "mp3" + elif output_format.startswith("pcm"): + return "pcm" + + raise ValueError(f"Unknown format: {output_format}") + + +@dataclass +class Voice: + id: str + name: str + voice_engine: TTSEngines + + +DEFAULT_VOICE = Voice( + id="s3://peregrine-voices/mel22/manifest.json", + name="Will", + voice_engine="PlayHT2.0" +) + +ACCEPT_HEADER = { + "mp3": "audio/mpeg", + "wav": "audio/wav", + "ogg": "audio/ogg", + "flac": "audio/flac", + "mulaw": "audio/basic" # commonly used for mulaw +} + +API_BASE_URL_V1 = "https://api.play.ht/api/v2" +AUTHORIZATION_HEADER = "AUTHORIZATION" +USERID_HEADER = "X-USER-ID" +PLAYHT_TTS_SAMPLE_RATE = 24000 +PLAYHT_TTS_CHANNELS = 1 + + +@dataclass 
+class _TTSOptions: + api_key: str + user_id: str + voice: Voice + base_url: str + sample_rate: int + encoding: TTSEncoding + + + + +class TTS(tts.TTS): + def __init__( + self, + *, + voice: Voice = DEFAULT_VOICE, + api_key: str | None = None, + user_id: str | None = None, + base_url: str | None = None, + encoding: Literal["mp3", "wav", "ogg", "flac", "mulaw"] | None = "wav", + http_session: aiohttp.ClientSession | None = None, + ) -> None: + super().__init__( + capabilities=tts.TTSCapabilities( + streaming=False, + ), + sample_rate=PLAYHT_TTS_SAMPLE_RATE, + num_channels=PLAYHT_TTS_CHANNELS, + ) + api_key = api_key or os.environ.get("PLAYHT_API_KEY") + if not api_key: + raise ValueError("PLAYHT_API_KEY must be set") + + user_id = user_id or os.environ.get("PLAYHT_USER_ID") + if not user_id: + raise ValueError("PLAYHT_USER_ID must be set") + + self._opts = _TTSOptions( + voice=voice, + user_id=user_id, + api_key=api_key, + base_url=base_url or API_BASE_URL_V1, + sample_rate=self.sample_rate, + encoding=encoding, + ) + self._session = http_session + + def _ensure_session(self) -> aiohttp.ClientSession: + if not self._session: + self._session = utils.http_context.http_session() + + return self._session + + async def list_voices(self) -> List[Voice]: + async with self._ensure_session().get( + f"{self._opts.base_url}/voices", + headers={ + "accept": "application/json", + AUTHORIZATION_HEADER: self._opts.api_key, + USERID_HEADER: self._opts.user_id + }, + ) as resp: + return _dict_to_voices_list(await resp.json()) + + def synthesize(self, text: str) -> "ChunkedStream": + return ChunkedStream(text, self._opts, self._ensure_session()) + + +class ChunkedStream(tts.ChunkedStream): + """Synthesize using the chunked API endpoint""" + + def __init__( + self, text: str, opts: _TTSOptions, session: aiohttp.ClientSession + ) -> None: + super().__init__() + self._text, self._opts, self._session = text, opts, session + + @utils.log_exceptions(logger=logger) + async def
_main_task(self) -> None: + stream = utils.audio.AudioByteStream( + sample_rate=self._opts.sample_rate, num_channels=1 + ) + self._mp3_decoder = utils.codecs.Mp3StreamDecoder() + request_id = utils.shortuuid() + segment_id = utils.shortuuid() + url = f"{self._opts.base_url}/tts/stream" + headers = { + "accept": ACCEPT_HEADER[self._opts.encoding], + "content-type": "application/json", + AUTHORIZATION_HEADER: self._opts.api_key, + USERID_HEADER: self._opts.user_id + } + json_data = { + "text": self._text, + "output_format": self._opts.encoding, + "voice": self._opts.voice.id, + } + async with self._session.post(url=url, headers=headers, json=json_data) as resp: + if not resp.content_type.startswith("audio/"): + content = await resp.text() + logger.error("playHT returned non-audio data: %s", content) + return + + encoding = _encoding_from_format(self._opts.encoding) + if encoding == "mp3": + async for bytes_data, _ in resp.content.iter_chunks(): + for frame in self._mp3_decoder.decode_chunk(bytes_data): + self._event_ch.send_nowait( + tts.SynthesizedAudio( + request_id=request_id, + segment_id=segment_id, + frame=frame, + ) + ) + else: + async for bytes_data, _ in resp.content.iter_chunks(): + for frame in stream.write(bytes_data): + self._event_ch.send_nowait( + tts.SynthesizedAudio( + request_id=request_id, + segment_id=segment_id, + frame=frame, + ) + ) + + for frame in stream.flush(): + self._event_ch.send_nowait( + tts.SynthesizedAudio( + request_id=request_id, segment_id=segment_id, frame=frame + ) + ) + + +def _dict_to_voices_list(data: dict[str, Any]): + voices: List[Voice] = [] + for voice in data["text"]: + voices.append( + Voice( +
id=voice["id"], + name=voice["name"], + voice_engine=voice["voice_engine"] + ) + ) + return voices + diff --git a/livekit-plugins/livekit-plugins-playht/livekit/plugins/playht/version.py b/livekit-plugins/livekit-plugins-playht/livekit/plugins/playht/version.py new file mode 100644 index 000000000..5becc17c0 --- /dev/null +++ b/livekit-plugins/livekit-plugins-playht/livekit/plugins/playht/version.py @@ -0,0 +1 @@ +__version__ = "1.0.0" diff --git a/livekit-plugins/livekit-plugins-playht/package.json b/livekit-plugins/livekit-plugins-playht/package.json new file mode 100644 index 000000000..0ac1584b7 --- /dev/null +++ b/livekit-plugins/livekit-plugins-playht/package.json @@ -0,0 +1,6 @@ +{ + "name": "livekit-plugins-playht", + "private": true, + "version": "1.0.0" + } + \ No newline at end of file diff --git a/livekit-plugins/livekit-plugins-playht/pyproject.toml b/livekit-plugins/livekit-plugins-playht/pyproject.toml new file mode 100644 index 000000000..8cf32563a --- /dev/null +++ b/livekit-plugins/livekit-plugins-playht/pyproject.toml @@ -0,0 +1,3 @@ +[build-system] +requires = ["setuptools>=61.0"] +build-backend = "setuptools.build_meta" \ No newline at end of file diff --git a/livekit-plugins/livekit-plugins-playht/setup.py b/livekit-plugins/livekit-plugins-playht/setup.py new file mode 100644 index 000000000..ba8d7f293 --- /dev/null +++ b/livekit-plugins/livekit-plugins-playht/setup.py @@ -0,0 +1,44 @@ + +import os +import pathlib + +import setuptools +import setuptools.command.build_py + +here = pathlib.Path(__file__).parent.resolve() +about = {} +with open( + os.path.join(here, "livekit", "plugins", "playht", "version.py"), "r" +) as f: + exec(f.read(), about) + + +setuptools.setup( + name="livekit-plugins-playht", + version=about["__version__"], + description="Agent Framework plugin for voice synthesis with PlayHT's API.", + long_description=(here / "README.md").read_text(encoding="utf-8"), + long_description_content_type="text/markdown", + 
url="https://github.com/livekit/agents", + cmdclass={}, + classifiers=[ + "Intended Audience :: Developers", + "Topic :: Multimedia :: Sound/Audio", + "Topic :: Scientific/Engineering :: Artificial Intelligence", + "Programming Language :: Python :: 3", + "Programming Language :: Python :: 3.11", + "Programming Language :: Python :: 3.12", + "Programming Language :: Python :: 3 :: Only", + ], + keywords=["webrtc", "realtime", "audio", "livekit", "playHT"], + license="Apache-2.0", + packages=setuptools.find_namespace_packages(include=["livekit.*"]), + python_requires=">=3.9.0", + install_requires=["livekit-agents[codecs]>=0.8.0.dev0", "pyht", "aiohttp", "livekit"], + package_data={"livekit.plugins.playht": ["py.typed"]}, + project_urls={ + "Documentation": "https://docs.livekit.io", + "Website": "https://livekit.io/", + "Source": "https://github.com/livekit/agents", + }, +) \ No newline at end of file diff --git a/livekit-plugins/livekit-plugins-rag/CHANGELOG.md b/livekit-plugins/livekit-plugins-rag/CHANGELOG.md index 875d3beee..6a7effef5 100644 --- a/livekit-plugins/livekit-plugins-rag/CHANGELOG.md +++ b/livekit-plugins/livekit-plugins-rag/CHANGELOG.md @@ -1,5 +1,11 @@ # livekit-plugins-rag +## 0.2.2 + +### Patch Changes + +- rag: fix backward compatibility - [#629](https://github.com/livekit/agents/pull/629) ([@afigar](https://github.com/afigar)) + ## 0.2.1 ### Patch Changes diff --git a/livekit-plugins/livekit-plugins-rag/livekit/plugins/rag/__init__.py b/livekit-plugins/livekit-plugins-rag/livekit/plugins/rag/__init__.py index 3cd283ce6..7042c3fa7 100644 --- a/livekit-plugins/livekit-plugins-rag/livekit/plugins/rag/__init__.py +++ b/livekit-plugins/livekit-plugins-rag/livekit/plugins/rag/__init__.py @@ -27,5 +27,8 @@ class RAGPlugin(Plugin): def __init__(self) -> None: super().__init__(__name__, __version__, __package__, logger) + def download_files(self) -> None: + pass + Plugin.register_plugin(RAGPlugin()) diff --git 
a/livekit-plugins/livekit-plugins-rag/livekit/plugins/rag/version.py b/livekit-plugins/livekit-plugins-rag/livekit/plugins/rag/version.py index 875ee5214..2985d9da1 100644 --- a/livekit-plugins/livekit-plugins-rag/livekit/plugins/rag/version.py +++ b/livekit-plugins/livekit-plugins-rag/livekit/plugins/rag/version.py @@ -12,4 +12,4 @@ # See the License for the specific language governing permissions and # limitations under the License. -__version__ = "0.2.1" +__version__ = "0.2.2" diff --git a/livekit-plugins/livekit-plugins-rag/package.json b/livekit-plugins/livekit-plugins-rag/package.json index ccac72cfa..897e16552 100644 --- a/livekit-plugins/livekit-plugins-rag/package.json +++ b/livekit-plugins/livekit-plugins-rag/package.json @@ -1,5 +1,5 @@ { "name": "livekit-plugins-rag", "private": true, - "version": "0.2.1" + "version": "0.2.2" } diff --git a/livekit-plugins/livekit-plugins-silero/CHANGELOG.md b/livekit-plugins/livekit-plugins-silero/CHANGELOG.md index 8e754db97..5fd5671c5 100644 --- a/livekit-plugins/livekit-plugins-silero/CHANGELOG.md +++ b/livekit-plugins/livekit-plugins-silero/CHANGELOG.md @@ -1,5 +1,13 @@ # livekit-plugins-silero +## 0.6.4 + +### Patch Changes + +- silero: adjust vad activation threshold - [#639](https://github.com/livekit/agents/pull/639) ([@theomonnom](https://github.com/theomonnom)) + +- silero: fix vad padding & static audio - [#631](https://github.com/livekit/agents/pull/631) ([@theomonnom](https://github.com/theomonnom)) + ## 0.6.3 ### Patch Changes diff --git a/livekit-plugins/livekit-plugins-silero/livekit/plugins/silero/vad.py b/livekit-plugins/livekit-plugins-silero/livekit/plugins/silero/vad.py index 2cd23c9da..7ac763508 100644 --- a/livekit-plugins/livekit-plugins-silero/livekit/plugins/silero/vad.py +++ b/livekit-plugins/livekit-plugins-silero/livekit/plugins/silero/vad.py @@ -27,6 +27,8 @@ from . 
import onnx_model from .log import logger +SLOW_INFERENCE_THRESHOLD = 0.2 # late by 200ms + @dataclass class _VADOptions: @@ -47,7 +49,7 @@ def load( min_silence_duration: float = 0.25, padding_duration: float = 0.1, max_buffered_speech: float = 60.0, - activation_threshold: float = 0.25, + activation_threshold: float = 0.5, sample_rate: int = 16000, force_cpu: bool = True, ) -> "VAD": @@ -108,11 +110,14 @@ def __init__(self, opts: _VADOptions, model: onnx_model.OnnxModel) -> None: self._task.add_done_callback(lambda _: self._executor.shutdown(wait=False)) self._exp_filter = utils.ExpFilter(alpha=0.35) + self._extra_inference_time = 0.0 + @agents.utils.log_exceptions(logger=logger) async def _main_task(self): og_sample_rate = 0 og_needed_samples = 0 # needed samples to complete the window data og_window_size_samples = 0 # size in samples of og_window_data + og_padding_size_samples = 0 # size in samples of padding data og_window_data: np.ndarray | None = None index_step = 0 @@ -143,16 +148,22 @@ async def _main_task(self): elif og_window_data is None: # alloc the og buffers now that we know the pushed sample rate og_sample_rate = frame.sample_rate + og_window_size_samples = int( (self._model.window_size_samples / self._model.sample_rate) * og_sample_rate ) + og_padding_size_samples = int( + self._opts.padding_duration * og_sample_rate + ) og_window_data = np.empty(og_window_size_samples, dtype=np.int16) og_needed_samples = og_window_size_samples index_step = frame.sample_rate // 16000 speech_buffer = np.empty( - int(self._opts.max_buffered_speech * og_sample_rate), dtype=np.int16 + int(self._opts.max_buffered_speech * og_sample_rate) + + int(self._opts.padding_duration * og_sample_rate) * 2, + dtype=np.int16, ) elif og_sample_rate != frame.sample_rate: logger.error("a frame with another sample rate was already pushed") @@ -160,11 +171,15 @@ async def _main_task(self): frame_data = np.frombuffer(frame.data, dtype=np.int16) remaining_samples = len(frame_data) + while 
remaining_samples > 0: to_copy = min(remaining_samples, og_needed_samples) - index = len(og_window_data) - og_needed_samples - og_window_data[index : index + to_copy] = frame_data[:to_copy] + window_index = og_window_size_samples - og_needed_samples + frame_index = len(frame_data) - remaining_samples + og_window_data[window_index : window_index + to_copy] = frame_data[ + frame_index : frame_index + to_copy + ] remaining_samples -= to_copy og_needed_samples -= to_copy @@ -183,45 +198,74 @@ async def _main_task(self): ) # run the inference - start_time = time.time() + start_time = time.perf_counter() raw_prob = await self._loop.run_in_executor( self._executor, self._model, inference_window_data ) + inference_duration = time.perf_counter() - start_time + prob_change = abs(raw_prob - self._exp_filter.filtered()) exp = 0.5 if prob_change > 0.25 else 1 raw_prob = self._exp_filter.apply(exp=exp, sample=raw_prob) - inference_duration = time.time() - start_time window_duration = ( self._model.window_size_samples / self._opts.sample_rate ) - if inference_duration > window_duration: + + self._extra_inference_time = max( + 0.0, + self._extra_inference_time + inference_duration - window_duration, + ) + if inference_duration > SLOW_INFERENCE_THRESHOLD: logger.warning( - "vad inference took too long - slower than realtime: %f", - inference_duration, + "inference is slower than realtime", + extra={"delay": self._extra_inference_time}, ) pub_current_sample += og_window_size_samples - def _copy_window(): + def _copy_inference_window(): nonlocal speech_buffer_index - to_copy = min( - og_window_size_samples, - len(speech_buffer) - speech_buffer_index, - ) + available_space = len(speech_buffer) - speech_buffer_index + to_copy = min(og_window_size_samples, available_space) if to_copy <= 0: - # max_buffered_speech reached - return + return # max_buffered_speech reached speech_buffer[ speech_buffer_index : speech_buffer_index + to_copy - ] = og_window_data - speech_buffer_index += 
og_window_size_samples + ] = og_window_data[:to_copy] + speech_buffer_index += to_copy + + def _reset_write_cursor(): + nonlocal speech_buffer_index + if speech_buffer_index <= og_padding_size_samples: + return + + padding_data = speech_buffer[ + speech_buffer_index + - og_padding_size_samples : speech_buffer_index + ] + + speech_buffer[:og_padding_size_samples] = padding_data + speech_buffer_index = og_padding_size_samples + + def _copy_speech_buffer() -> rtc.AudioFrame: + # copy the data from speech_buffer + assert speech_buffer is not None + speech_data = speech_buffer[:speech_buffer_index].tobytes() + + return rtc.AudioFrame( + sample_rate=og_sample_rate, + num_channels=1, + samples_per_channel=speech_buffer_index, + data=speech_data, + ) + + _copy_inference_window() if pub_speaking: pub_speech_duration += window_duration - _copy_window() else: pub_silence_duration += window_duration @@ -242,8 +286,6 @@ def _copy_window(): silence_threshold_duration = 0.0 if not pub_speaking: - _copy_window() - if speech_threshold_duration >= self._opts.min_speech_duration: pub_speaking = True pub_silence_duration = 0.0 @@ -255,6 +297,7 @@ def _copy_window(): samples_index=pub_current_sample, silence_duration=pub_silence_duration, speech_duration=pub_speech_duration, + frames=[_copy_speech_buffer()], speaking=True, ) ) @@ -263,37 +306,26 @@ def _copy_window(): speech_threshold_duration = 0.0 if not pub_speaking: - speech_buffer_index = 0 + _reset_write_cursor() if ( pub_speaking and silence_threshold_duration - >= self._opts.min_silence_duration + >= self._opts.min_silence_duration + self._opts.padding_duration ): pub_speaking = False pub_speech_duration = 0.0 pub_silence_duration = silence_threshold_duration - speech_data = speech_buffer[ - :speech_buffer_index - ].tobytes() # copy the data from speech_buffer - self._event_ch.send_nowait( agents.vad.VADEvent( type=agents.vad.VADEventType.END_OF_SPEECH, samples_index=pub_current_sample, silence_duration=pub_silence_duration, 
speech_duration=pub_speech_duration, - frames=[ - rtc.AudioFrame( - sample_rate=og_sample_rate, - num_channels=1, - samples_per_channel=speech_buffer_index, - data=speech_data, - ) - ], + frames=[_copy_speech_buffer()], speaking=False, ) ) - speech_buffer_index = 0 + _reset_write_cursor() diff --git a/livekit-plugins/livekit-plugins-silero/livekit/plugins/silero/version.py b/livekit-plugins/livekit-plugins-silero/livekit/plugins/silero/version.py index b315b98ad..4f1df5fb6 100644 --- a/livekit-plugins/livekit-plugins-silero/livekit/plugins/silero/version.py +++ b/livekit-plugins/livekit-plugins-silero/livekit/plugins/silero/version.py @@ -12,4 +12,4 @@ # See the License for the specific language governing permissions and # limitations under the License. -__version__ = "0.6.3" +__version__ = "0.6.4" diff --git a/livekit-plugins/livekit-plugins-silero/package.json b/livekit-plugins/livekit-plugins-silero/package.json index 39ad000c4..5d0bc7ed4 100644 --- a/livekit-plugins/livekit-plugins-silero/package.json +++ b/livekit-plugins/livekit-plugins-silero/package.json @@ -1,5 +1,5 @@ { "name": "livekit-plugins-silero", "private": true, - "version": "0.6.3" + "version": "0.6.4" } diff --git a/pnpm-lock.yaml b/pnpm-lock.yaml index 635bf97df..3d3b0e9e1 100644 --- a/pnpm-lock.yaml +++ b/pnpm-lock.yaml @@ -17,37 +17,507 @@ importers: livekit-agents: {} + livekit-plugins/livekit-plugins-anthropic: {} + livekit-plugins/livekit-plugins-azure: {} - livekit-plugins/livekit-plugins-cartesia: {} + livekit-plugins/livekit-plugins-browser: {} + + livekit-plugins/livekit-plugins-cartesia: {} + + livekit-plugins/livekit-plugins-deepgram: {} + + livekit-plugins/livekit-plugins-elevenlabs: {} + + livekit-plugins/livekit-plugins-google: {} + + livekit-plugins/livekit-plugins-minimal: {} + + livekit-plugins/livekit-plugins-nltk: {} + + livekit-plugins/livekit-plugins-openai: {} + + livekit-plugins/livekit-plugins-rag: {} + + livekit-plugins/livekit-plugins-silero: {} + +packages: + + 
'@babel/runtime@7.24.8': + resolution: {integrity: sha512-5F7SDGs1T72ZczbRwbGO9lQi0NLjQxzl6i4lJxLxfW9U5UluCSyEJeniWvnhl3/euNiqQVbo8zruhsDfid0esA==} + engines: {node: '>=6.9.0'} + + '@changesets/apply-release-plan@7.0.4': + resolution: {integrity: sha512-HLFwhKWayKinWAul0Vj+76jVx1Pc2v55MGPVjZ924Y/ROeSsBMFutv9heHmCUj48lJyRfOTJG5+ar+29FUky/A==} + + '@changesets/assemble-release-plan@6.0.3': + resolution: {integrity: sha512-bLNh9/Lgl1VwkjWZTq8JmRqH+hj7/Yzfz0jsQ/zJJ+FTmVqmqPj3szeKOri8O/hEM8JmHW019vh2gTO9iq5Cuw==} + + '@changesets/changelog-git@0.2.0': + resolution: {integrity: sha512-bHOx97iFI4OClIT35Lok3sJAwM31VbUM++gnMBV16fdbtBhgYu4dxsphBF/0AZZsyAHMrnM0yFcj5gZM1py6uQ==} + + '@changesets/cli@2.27.7': + resolution: {integrity: sha512-6lr8JltiiXPIjDeYg4iM2MeePP6VN/JkmqBsVA5XRiy01hGS3y629LtSDvKcycj/w/5Eur1rEwby/MjcYS+e2A==} + hasBin: true + + '@changesets/config@3.0.2': + resolution: {integrity: sha512-cdEhS4t8woKCX2M8AotcV2BOWnBp09sqICxKapgLHf9m5KdENpWjyrFNMjkLqGJtUys9U+w93OxWT0czorVDfw==} + + '@changesets/errors@0.2.0': + resolution: {integrity: sha512-6BLOQUscTpZeGljvyQXlWOItQyU71kCdGz7Pi8H8zdw6BI0g3m43iL4xKUVPWtG+qrrL9DTjpdn8eYuCQSRpow==} + + '@changesets/get-dependents-graph@2.1.1': + resolution: {integrity: sha512-LRFjjvigBSzfnPU2n/AhFsuWR5DK++1x47aq6qZ8dzYsPtS/I5mNhIGAS68IAxh1xjO9BTtz55FwefhANZ+FCA==} + + '@changesets/get-github-info@0.5.2': + resolution: {integrity: sha512-JppheLu7S114aEs157fOZDjFqUDpm7eHdq5E8SSR0gUBTEK0cNSHsrSR5a66xs0z3RWuo46QvA3vawp8BxDHvg==} + + '@changesets/get-release-plan@4.0.3': + resolution: {integrity: sha512-6PLgvOIwTSdJPTtpdcr3sLtGatT+Jr22+cQwEBJBy6wP0rjB4yJ9lv583J9fVpn1bfQlBkDa8JxbS2g/n9lIyA==} + + '@changesets/get-version-range-type@0.4.0': + resolution: {integrity: sha512-hwawtob9DryoGTpixy1D3ZXbGgJu1Rhr+ySH2PvTLHvkZuQ7sRT4oQwMh0hbqZH1weAooedEjRsbrWcGLCeyVQ==} + + '@changesets/git@3.0.0': + resolution: {integrity: sha512-vvhnZDHe2eiBNRFHEgMiGd2CT+164dfYyrJDhwwxTVD/OW0FUD6G7+4DIx1dNwkwjHyzisxGAU96q0sVNBns0w==} + + 
'@changesets/logger@0.1.0': + resolution: {integrity: sha512-pBrJm4CQm9VqFVwWnSqKEfsS2ESnwqwH+xR7jETxIErZcfd1u2zBSqrHbRHR7xjhSgep9x2PSKFKY//FAshA3g==} + + '@changesets/parse@0.4.0': + resolution: {integrity: sha512-TS/9KG2CdGXS27S+QxbZXgr8uPsP4yNJYb4BC2/NeFUj80Rni3TeD2qwWmabymxmrLo7JEsytXH1FbpKTbvivw==} + + '@changesets/pre@2.0.0': + resolution: {integrity: sha512-HLTNYX/A4jZxc+Sq8D1AMBsv+1qD6rmmJtjsCJa/9MSRybdxh0mjbTvE6JYZQ/ZiQ0mMlDOlGPXTm9KLTU3jyw==} + + '@changesets/read@0.6.0': + resolution: {integrity: sha512-ZypqX8+/im1Fm98K4YcZtmLKgjs1kDQ5zHpc2U1qdtNBmZZfo/IBiG162RoP0CUF05tvp2y4IspH11PLnPxuuw==} + + '@changesets/should-skip-package@0.1.0': + resolution: {integrity: sha512-FxG6Mhjw7yFStlSM7Z0Gmg3RiyQ98d/9VpQAZ3Fzr59dCOM9G6ZdYbjiSAt0XtFr9JR5U2tBaJWPjrkGGc618g==} + + '@changesets/types@4.1.0': + resolution: {integrity: sha512-LDQvVDv5Kb50ny2s25Fhm3d9QSZimsoUGBsUioj6MC3qbMUCuC8GPIvk/M6IvXx3lYhAs0lwWUQLb+VIEUCECw==} + + '@changesets/types@5.2.1': + resolution: {integrity: sha512-myLfHbVOqaq9UtUKqR/nZA/OY7xFjQMdfgfqeZIBK4d0hA6pgxArvdv8M+6NUzzBsjWLOtvApv8YHr4qM+Kpfg==} + + '@changesets/types@6.0.0': + resolution: {integrity: sha512-b1UkfNulgKoWfqyHtzKS5fOZYSJO+77adgL7DLRDr+/7jhChN+QcHnbjiQVOz/U+Ts3PGNySq7diAItzDgugfQ==} + + '@changesets/write@0.3.1': + resolution: {integrity: sha512-SyGtMXzH3qFqlHKcvFY2eX+6b0NGiFcNav8AFsYwy5l8hejOeoeTDemu5Yjmke2V5jpzY+pBvM0vCCQ3gdZpfw==} + + '@livekit/changesets-changelog-github@0.0.4': + resolution: {integrity: sha512-MXaiLYwgkYciZb8G2wkVtZ1pJJzZmVx5cM30Q+ClslrIYyAqQhRbPmZDM79/5CGxb1MTemR/tfOM25tgJgAK0g==} + + '@manypkg/find-root@1.1.0': + resolution: {integrity: sha512-mki5uBvhHzO8kYYix/WRy2WX8S3B5wdVSc9D6KcU5lQNglP2yt58/VfLuAK49glRXChosY8ap2oJ1qgma3GUVA==} + + '@manypkg/get-packages@1.1.3': + resolution: {integrity: sha512-fo+QhuU3qE/2TQMQmbVMqaQ6EWbMhi4ABWP+O4AM1NqPBuy0OrApV5LO6BrrgnhtAHS2NH6RrVk9OL181tTi8A==} + + '@nodelib/fs.scandir@2.1.5': + resolution: {integrity: 
sha512-vq24Bq3ym5HEQm2NKCr3yXDwjc7vTsEThRDnkp2DK9p1uqLR+DHurm/NOTo0KG7HYHU7eppKZj3MyqYuMBf62g==} + engines: {node: '>= 8'} + + '@nodelib/fs.stat@2.0.5': + resolution: {integrity: sha512-RkhPPp2zrqDAQA/2jNhnztcPAlv64XdhIp7a7454A5ovI7Bukxgt7MX7udwAu3zg1DcpPU0rz3VV1SeaqvY4+A==} + engines: {node: '>= 8'} + + '@nodelib/fs.walk@1.2.8': + resolution: {integrity: sha512-oGB+UxlgWcgQkgwo8GcEGwemoTFt3FIO9ababBmaGwXIoBKZ+GTy0pP185beGg7Llih/NSHSV2XAs1lnznocSg==} + engines: {node: '>= 8'} + + '@types/node@12.20.55': + resolution: {integrity: sha512-J8xLz7q2OFulZ2cyGTLE1TbbZcjpno7FaN6zdJNrgAdrJ+DZzh/uFR6YrTb4C+nXakvud8Q4+rbhoIWlYQbUFQ==} + + '@types/semver@7.5.8': + resolution: {integrity: sha512-I8EUhyrgfLrcTkzV3TSsGyl1tSuPrEDzr0yd5m90UgNxQkyDXULk3b6MlQqTCpZpNtWe1K0hzclnZkTcLBe2UQ==} + + ansi-colors@4.1.3: + resolution: {integrity: sha512-/6w/C21Pm1A7aZitlI5Ni/2J6FFQN8i1Cvz3kHABAAbw93v/NlvKdVOqz7CCWz/3iv/JplRSEEZ83XION15ovw==} + engines: {node: '>=6'} + + ansi-regex@5.0.1: + resolution: {integrity: sha512-quJQXlTSUGL2LH9SUXo8VwsY4soanhgo6LNSm84E1LBcE8s3O0wpdiRzyR9z/ZZJMlMWv37qOOb9pdJlMUEKFQ==} + engines: {node: '>=8'} + + ansi-styles@3.2.1: + resolution: {integrity: sha512-VT0ZI6kZRdTh8YyJw3SMbYm/u+NqfsAxEpWO0Pf9sq8/e94WxxOpPKx9FR1FlyCtOVDNOQ+8ntlqFxiRc+r5qA==} + engines: {node: '>=4'} + + argparse@1.0.10: + resolution: {integrity: sha512-o5Roy6tNG4SL/FOkCAN6RzjiakZS25RLYFrcMttJqbdd8BWrnA+fGz57iN5Pb06pvBGvl5gQ0B48dJlslXvoTg==} + + array-union@2.1.0: + resolution: {integrity: sha512-HGyxoOTYUyCM6stUe6EJgnd4EoewAI7zMdfqO+kGjnlZmBDz/cR5pf8r/cR4Wq60sL/p0IkcjUEEPwS3GFrIyw==} + engines: {node: '>=8'} + + better-path-resolve@1.0.0: + resolution: {integrity: sha512-pbnl5XzGBdrFU/wT4jqmJVPn2B6UHPBOhzMQkY/SPUPB6QtUXtmBHBIwCbXJol93mOpGMnQyP/+BB19q04xj7g==} + engines: {node: '>=4'} + + braces@3.0.3: + resolution: {integrity: sha512-yQbXgO/OSZVD2IsiLlro+7Hf6Q18EJrKSEsdoMzKePKXct3gvD8oLcOQdIzGupr5Fj+EDe8gO/lxc1BzfMpxvA==} + engines: {node: '>=8'} + + chalk@2.4.2: + resolution: {integrity: 
sha512-Mti+f9lpJNcwF4tWV8/OrTTtF1gZi+f8FqlyAdouralcFWFQWF2+NgCHShjkCb+IFBLq9buZwE1xckQU4peSuQ==} + engines: {node: '>=4'} + + chardet@0.7.0: + resolution: {integrity: sha512-mT8iDcrh03qDGRRmoA2hmBJnxpllMR+0/0qlzjqZES6NdiWDcZkCNAk4rPFZ9Q85r27unkiNNg8ZOiwZXBHwcA==} + + ci-info@3.9.0: + resolution: {integrity: sha512-NIxF55hv4nSqQswkAeiOi1r83xy8JldOFDTWiug55KBu9Jnblncd2U6ViHmYgHf01TPZS77NJBhBMKdWj9HQMQ==} + engines: {node: '>=8'} + + color-convert@1.9.3: + resolution: {integrity: sha512-QfAUtd+vFdAtFQcC8CCyYt1fYWxSqAiK2cSD6zDB8N3cpsEBAvRxp9zOGg6G/SHHJYAT88/az/IuDGALsNVbGg==} + + color-name@1.1.3: + resolution: {integrity: sha512-72fSenhMw2HZMTVHeCA9KCmpEIbzWiQsjN+BHcBbS9vr1mtt+vJjPdksIBNUmKAW8TFUDPJK5SUU3QhE9NEXDw==} + + cross-spawn@5.1.0: + resolution: {integrity: sha512-pTgQJ5KC0d2hcY8eyL1IzlBPYjTkyH72XRZPnLyKus2mBfNjQs3klqbJU2VILqZryAZUt9JOb3h/mWMy23/f5A==} + + dataloader@1.4.0: + resolution: {integrity: sha512-68s5jYdlvasItOJnCuI2Q9s4q98g0pCyL3HrcKJu8KNugUl8ahgmZYg38ysLTgQjjXX3H8CJLkAvWrclWfcalw==} + + detect-indent@6.1.0: + resolution: {integrity: sha512-reYkTUJAZb9gUuZ2RvVCNhVHdg62RHnJ7WJl8ftMi4diZ6NWlciOzQN88pUhSELEwflJht4oQDv0F0BMlwaYtA==} + engines: {node: '>=8'} + + dir-glob@3.0.1: + resolution: {integrity: sha512-WkrWp9GR4KXfKGYzOLmTuGVi1UWFfws377n9cc55/tb6DuqyF6pcQ5AbiHEshaDpY9v6oaSr2XCDidGmMwdzIA==} + engines: {node: '>=8'} + + dotenv@8.6.0: + resolution: {integrity: sha512-IrPdXQsk2BbzvCBGBOTmmSH5SodmqZNt4ERAZDmW4CT+tL8VtvinqywuANaFu4bOMWki16nqf0e4oC0QIaDr/g==} + engines: {node: '>=10'} + + enquirer@2.4.1: + resolution: {integrity: sha512-rRqJg/6gd538VHvR3PSrdRBb/1Vy2YfzHqzvbhGIQpDRKIa4FgV/54b5Q1xYSxOOwKvjXweS26E0Q+nAMwp2pQ==} + engines: {node: '>=8.6'} + + escape-string-regexp@1.0.5: + resolution: {integrity: sha512-vbRorB5FUQWvla16U8R/qgaFIya2qGzwDrNmCZuYKrbdSUMG6I1ZCGQRefkRVhuOkIGVne7BQ35DSfo1qvJqFg==} + engines: {node: '>=0.8.0'} + + esprima@4.0.1: + resolution: {integrity: 
sha512-eGuFFw7Upda+g4p+QHvnW0RyTX/SVeJBDM/gCtMARO0cLuT2HcEKnTPvhjV6aGeqrCB/sbNop0Kszm0jsaWU4A==} + engines: {node: '>=4'} + hasBin: true + + extendable-error@0.1.7: + resolution: {integrity: sha512-UOiS2in6/Q0FK0R0q6UY9vYpQ21mr/Qn1KOnte7vsACuNJf514WvCCUHSRCPcgjPT2bAhNIJdlE6bVap1GKmeg==} + + external-editor@3.1.0: + resolution: {integrity: sha512-hMQ4CX1p1izmuLYyZqLMO/qGNw10wSv9QDCPfzXfyFrOaCSSoRfqE1Kf1s5an66J5JZC62NewG+mK49jOCtQew==} + engines: {node: '>=4'} + + fast-glob@3.3.2: + resolution: {integrity: sha512-oX2ruAFQwf/Orj8m737Y5adxDQO0LAB7/S5MnxCdTNDd4p6BsyIVsv9JQsATbTSq8KHRpLwIHbVlUNatxd+1Ow==} + engines: {node: '>=8.6.0'} + + fastq@1.17.1: + resolution: {integrity: sha512-sRVD3lWVIXWg6By68ZN7vho9a1pQcN/WBFaAAsDDFzlJjvoGx0P8z7V1t72grFJfJhu3YPZBuu25f7Kaw2jN1w==} + + fill-range@7.1.1: + resolution: {integrity: sha512-YsGpe3WHLK8ZYi4tWDg2Jy3ebRz2rXowDxnld4bkQB00cc/1Zw9AWnC0i9ztDJitivtQvaI9KaLyKrc+hBW0yg==} + engines: {node: '>=8'} + + find-up@4.1.0: + resolution: {integrity: sha512-PpOwAdQ/YlXQ2vj8a3h8IipDuYRi3wceVQQGYWxNINccq40Anw7BlsEXCMbt1Zt+OLA6Fq9suIpIWD0OsnISlw==} + engines: {node: '>=8'} + + find-up@5.0.0: + resolution: {integrity: sha512-78/PXT1wlLLDgTzDs7sjq9hzz0vXD+zn+7wypEe4fXQxCmdmqfGsEPQxmiCSQI3ajFV91bVSsvNtrJRiW6nGng==} + engines: {node: '>=10'} + + find-yarn-workspace-root2@1.2.16: + resolution: {integrity: sha512-hr6hb1w8ePMpPVUK39S4RlwJzi+xPLuVuG8XlwXU3KD5Yn3qgBWVfy3AzNlDhWvE1EORCE65/Qm26rFQt3VLVA==} + + fs-extra@7.0.1: + resolution: {integrity: sha512-YJDaCJZEnBmcbw13fvdAM9AwNOJwOzrE4pqMqBq5nFiEqXUqHwlK4B+3pUw6JNvfSPtX05xFHtYy/1ni01eGCw==} + engines: {node: '>=6 <7 || >=8'} + + fs-extra@8.1.0: + resolution: {integrity: sha512-yhlQgA6mnOJUKOsRUFsgJdQCvkKhcz8tlZG5HBQfReYZy46OwLcY+Zia0mtdHsOo9y/hP+CxMN0TU9QxoOtG4g==} + engines: {node: '>=6 <7 || >=8'} + + glob-parent@5.1.2: + resolution: {integrity: sha512-AOIgSQCepiJYwP3ARnGx+5VnTu2HBYdzbGP45eLw1vr3zB3vZLeyed1sC9hnbcOc9/SrMyM5RPQrkGz4aS9Zow==} + engines: {node: '>= 6'} + + globby@11.1.0: + 
resolution: {integrity: sha512-jhIXaOzy1sb8IyocaruWSn1TjmnBVs8Ayhcy83rmxNJ8q2uWKCAj3CnJY+KpGSXCueAPc0i05kVvVKtP1t9S3g==} + engines: {node: '>=10'} + + graceful-fs@4.2.11: + resolution: {integrity: sha512-RbJ5/jmFcNNCcDV5o9eTnBLJ/HszWV0P73bc+Ff4nS/rJj+YaS6IGyiOL0VoBYX+l1Wrl3k63h/KrH+nhJ0XvQ==} + + has-flag@3.0.0: + resolution: {integrity: sha512-sKJf1+ceQBr4SMkvQnBDNDtf4TXpVhVGateu0t918bl30FnbE2m4vNLX+VWe/dpjlb+HugGYzW7uQXH98HPEYw==} + engines: {node: '>=4'} + + human-id@1.0.2: + resolution: {integrity: sha512-UNopramDEhHJD+VR+ehk8rOslwSfByxPIZyJRfV739NDhN5LF1fa1MqnzKm2lGTQRjNrjK19Q5fhkgIfjlVUKw==} + + iconv-lite@0.4.24: + resolution: {integrity: sha512-v3MXnZAcvnywkTUEZomIActle7RXXeedOR31wwl7VlyoXO4Qi9arvSenNQWne1TcRwhCL1HwLI21bEqdpj8/rA==} + engines: {node: '>=0.10.0'} + + ignore@5.3.1: + resolution: {integrity: sha512-5Fytz/IraMjqpwfd34ke28PTVMjZjJG2MPn5t7OE4eUCUNf8BAa7b5WUS9/Qvr6mwOQS7Mk6vdsMno5he+T8Xw==} + engines: {node: '>= 4'} + + is-extglob@2.1.1: + resolution: {integrity: sha512-SbKbANkN603Vi4jEZv49LeVJMn4yGwsbzZworEoyEiutsN3nJYdbO36zfhGJ6QEDpOZIFkDtnq5JRxmvl3jsoQ==} + engines: {node: '>=0.10.0'} + + is-glob@4.0.3: + resolution: {integrity: sha512-xelSayHH36ZgE7ZWhli7pW34hNbNl8Ojv5KVmkJD4hBdD3th8Tfk9vYasLM+mXWOZhFkgZfxhLSnrwRr4elSSg==} + engines: {node: '>=0.10.0'} + + is-number@7.0.0: + resolution: {integrity: sha512-41Cifkg6e8TylSpdtTpeLVMqvSBEVzTttHvERD741+pnZ8ANv0004MRL43QKPDlK9cGvNp6NZWZUBlbGXYxxng==} + engines: {node: '>=0.12.0'} + + is-subdir@1.2.0: + resolution: {integrity: sha512-2AT6j+gXe/1ueqbW6fLZJiIw3F8iXGJtt0yDrZaBhAZEG1raiTxKWU+IPqMCzQAXOUCKdA4UDMgacKH25XG2Cw==} + engines: {node: '>=4'} + + is-windows@1.0.2: + resolution: {integrity: sha512-eXK1UInq2bPmjyX6e3VHIzMLobc4J94i4AWn+Hpq3OU5KkrRC96OAcR3PRJ/pGu6m8TRnBHP9dkXQVsT/COVIA==} + engines: {node: '>=0.10.0'} + + isexe@2.0.0: + resolution: {integrity: sha512-RHxMLp9lnKHGHRng9QFhRCMbYAcVpn69smSGcq3f36xjgVVWThj4qqLbTLlq7Ssj8B+fIQ1EuCEGI2lKsyQeIw==} + + js-yaml@3.14.1: + resolution: {integrity: 
sha512-okMH7OXXJ7YrN9Ok3/SXrnu4iX9yOk+25nqX4imS2npuvTYDmo/QEZoqwZkYaIDk3jVvBOTOIEgEhaLOynBS9g==} + hasBin: true + + jsonfile@4.0.0: + resolution: {integrity: sha512-m6F1R3z8jjlf2imQHS2Qez5sjKWQzbuuhuJ/FKYFRZvPE3PuHcSMVZzfsLhGVOkfd20obL5SWEBew5ShlquNxg==} + + load-yaml-file@0.2.0: + resolution: {integrity: sha512-OfCBkGEw4nN6JLtgRidPX6QxjBQGQf72q3si2uvqyFEMbycSFFHwAZeXx6cJgFM9wmLrf9zBwCP3Ivqa+LLZPw==} + engines: {node: '>=6'} + + locate-path@5.0.0: + resolution: {integrity: sha512-t7hw9pI+WvuwNJXwk5zVHpyhIqzg2qTlklJOf0mVxGSbe3Fp2VieZcduNYjaLDoy6p9uGpQEGWG87WpMKlNq8g==} + engines: {node: '>=8'} + + locate-path@6.0.0: + resolution: {integrity: sha512-iPZK6eYjbxRu3uB4/WZ3EsEIMJFMqAoopl3R+zuq0UjcAm/MO6KCweDgPfP3elTztoKP3KtnVHxTn2NHBSDVUw==} + engines: {node: '>=10'} + + lodash.startcase@4.4.0: + resolution: {integrity: sha512-+WKqsK294HMSc2jEbNgpHpd0JfIBhp7rEV4aqXWqFr6AlXov+SlcgB1Fv01y2kGe3Gc8nMW7VA0SrGuSkRfIEg==} + + lru-cache@4.1.5: + resolution: {integrity: sha512-sWZlbEP2OsHNkXrMl5GYk/jKk70MBng6UU4YI/qGDYbgf6YbP4EvmqISbXCoJiRKs+1bSpFHVgQxvJ17F2li5g==} + + merge2@1.4.1: + resolution: {integrity: sha512-8q7VEgMJW4J8tcfVPy8g09NcQwZdbwFEqhe/WZkoIzjn/3TGDwtOCYtXGxA3O8tPzpczCCDgv+P2P5y00ZJOOg==} + engines: {node: '>= 8'} + + micromatch@4.0.7: + resolution: {integrity: sha512-LPP/3KorzCwBxfeUuZmaR6bG2kdeHSbe0P2tY3FLRU4vYrjYz5hI4QZwV0njUx3jeuKe67YukQ1LSPZBKDqO/Q==} + engines: {node: '>=8.6'} + + mri@1.2.0: + resolution: {integrity: sha512-tzzskb3bG8LvYGFF/mDTpq3jpI6Q9wc3LEmBaghu+DdCssd1FakN7Bc0hVNmEyGq1bq3RgfkCb3cmQLpNPOroA==} + engines: {node: '>=4'} + + node-fetch@2.7.0: + resolution: {integrity: sha512-c4FRfUm/dbcWZ7U+1Wq0AwCyFL+3nt2bEw05wfxSz+DWpWsitgmSgYmy2dQdWyKC1694ELPqMs/YzUSNozLt8A==} + engines: {node: 4.x || >=6.0.0} + peerDependencies: + encoding: ^0.1.0 + peerDependenciesMeta: + encoding: + optional: true + + os-tmpdir@1.0.2: + resolution: {integrity: sha512-D2FR03Vir7FIu45XBY20mTb+/ZSWB00sjU9jdQXt83gDrI4Ztz5Fs7/yy74g2N5SVQY4xY1qDr4rNddwYRVX0g==} + engines: 
{node: '>=0.10.0'} + + outdent@0.5.0: + resolution: {integrity: sha512-/jHxFIzoMXdqPzTaCpFzAAWhpkSjZPF4Vsn6jAfNpmbH/ymsmd7Qc6VE9BGn0L6YMj6uwpQLxCECpus4ukKS9Q==} + + p-filter@2.1.0: + resolution: {integrity: sha512-ZBxxZ5sL2HghephhpGAQdoskxplTwr7ICaehZwLIlfL6acuVgZPm8yBNuRAFBGEqtD/hmUeq9eqLg2ys9Xr/yw==} + engines: {node: '>=8'} + + p-limit@2.3.0: + resolution: {integrity: sha512-//88mFWSJx8lxCzwdAABTJL2MyWB12+eIY7MDL2SqLmAkeKU9qxRvWuSyTjm3FUmpBEMuFfckAIqEaVGUDxb6w==} + engines: {node: '>=6'} + + p-limit@3.1.0: + resolution: {integrity: sha512-TYOanM3wGwNGsZN2cVTYPArw454xnXj5qmWF1bEoAc4+cU/ol7GVh7odevjp1FNHduHc3KZMcFduxU5Xc6uJRQ==} + engines: {node: '>=10'} + + p-locate@4.1.0: + resolution: {integrity: sha512-R79ZZ/0wAxKGu3oYMlz8jy/kbhsNrS7SKZ7PxEHBgJ5+F2mtFW2fK2cOtBh1cHYkQsbzFV7I+EoRKe6Yt0oK7A==} + engines: {node: '>=8'} + + p-locate@5.0.0: + resolution: {integrity: sha512-LaNjtRWUBY++zB5nE/NwcaoMylSPk+S+ZHNB1TzdbMJMny6dynpAGt7X/tl/QYq3TIeE6nxHppbo2LGymrG5Pw==} + engines: {node: '>=10'} + + p-map@2.1.0: + resolution: {integrity: sha512-y3b8Kpd8OAN444hxfBbFfj1FY/RjtTd8tzYwhUqNYXx0fXx2iX4maP4Qr6qhIKbQXI02wTLAda4fYUbDagTUFw==} + engines: {node: '>=6'} + + p-try@2.2.0: + resolution: {integrity: sha512-R4nPAVTAU0B9D35/Gk3uJf/7XYbQcyohSKdvAxIRSNghFl4e71hVoGnBNQz9cWaXxO2I10KTC+3jMdvvoKw6dQ==} + engines: {node: '>=6'} + + path-exists@4.0.0: + resolution: {integrity: sha512-ak9Qy5Q7jYb2Wwcey5Fpvg2KoAc/ZIhLSLOSBmRmygPsGwkVVt0fZa0qrtMz+m6tJTAHfZQ8FnmB4MG4LWy7/w==} + engines: {node: '>=8'} + + path-type@4.0.0: + resolution: {integrity: sha512-gDKb8aZMDeD/tZWs9P6+q0J9Mwkdl6xMV8TjnGP3qJVJ06bdMgkbBlLU8IdfOsIsFz2BW1rNVT3XuNEl8zPAvw==} + engines: {node: '>=8'} + + picomatch@2.3.1: + resolution: {integrity: sha512-JU3teHTNjmE2VCGFzuY8EXzCDVwEqB2a8fsIvwaStHhAWJEeVd1o1QD80CU6+ZdEXXSLbSsuLwJjkCBWqRQUVA==} + engines: {node: '>=8.6'} + + pify@4.0.1: + resolution: {integrity: sha512-uB80kBFb/tfd68bVleG9T5GGsGPjJrLAUpR5PZIrhBnIaRTQRjqdJSsIKkOP6OAIFbj7GOrcudc5pNjZ+geV2g==} + engines: {node: 
'>=6'} + + pkg-dir@4.2.0: + resolution: {integrity: sha512-HRDzbaKjC+AOWVXxAU/x54COGeIv9eb+6CkDSQoNTt4XyWoIJvuPsXizxu/Fr23EiekbtZwmh1IcIG/l/a10GQ==} + engines: {node: '>=8'} + + preferred-pm@3.1.4: + resolution: {integrity: sha512-lEHd+yEm22jXdCphDrkvIJQU66EuLojPPtvZkpKIkiD+l0DMThF/niqZKJSoU8Vl7iuvtmzyMhir9LdVy5WMnA==} + engines: {node: '>=10'} + + prettier@2.8.8: + resolution: {integrity: sha512-tdN8qQGvNjw4CHbY+XXk0JgCXn9QiF21a55rBe5LJAU+kDyC4WQn4+awm2Xfk2lQMk5fKup9XgzTZtGkjBdP9Q==} + engines: {node: '>=10.13.0'} + hasBin: true + + pseudomap@1.0.2: + resolution: {integrity: sha512-b/YwNhb8lk1Zz2+bXXpS/LK9OisiZZ1SNsSLxN1x2OXVEhW2Ckr/7mWE5vrC1ZTiJlD9g19jWszTmJsB+oEpFQ==} + + queue-microtask@1.2.3: + resolution: {integrity: sha512-NuaNSa6flKT5JaSYQzJok04JzTL1CA6aGhv5rfLW3PgqA+M2ChpZQnAC8h8i4ZFkBS8X5RqkDBHA7r4hej3K9A==} + + read-yaml-file@1.1.0: + resolution: {integrity: sha512-VIMnQi/Z4HT2Fxuwg5KrY174U1VdUIASQVWXXyqtNRtxSr9IYkn1rsI6Tb6HsrHCmB7gVpNwX6JxPTHcH6IoTA==} + engines: {node: '>=6'} + + regenerator-runtime@0.14.1: + resolution: {integrity: sha512-dYnhHh0nJoMfnkZs6GmmhFknAGRrLznOu5nc9ML+EJxGvrx6H7teuevqVqCuPcPK//3eDrrjQhehXVx9cnkGdw==} + + resolve-from@5.0.0: + resolution: {integrity: sha512-qYg9KP24dD5qka9J47d0aVky0N+b4fTU89LN9iDnjB5waksiC49rvMB0PrUJQGoTmH50XPiqOvAjDfaijGxYZw==} + engines: {node: '>=8'} + + reusify@1.0.4: + resolution: {integrity: sha512-U9nH88a3fc/ekCF1l0/UP1IosiuIjyTh7hBvXVMHYgVcfGvt897Xguj2UOLDeI5BG2m7/uwyaLVT6fbtCwTyzw==} + engines: {iojs: '>=1.0.0', node: '>=0.10.0'} + + run-parallel@1.2.0: + resolution: {integrity: sha512-5l4VyZR86LZ/lDxZTR6jqL8AFE2S0IFLMP26AbjsLVADxHdhB/c0GUsH+y39UfCi3dzz8OlQuPmnaJOMoDHQBA==} + + safer-buffer@2.1.2: + resolution: {integrity: sha512-YZo3K82SD7Riyi0E1EQPojLz7kpepnSQI9IyPbHHg1XXXevb5dJI7tpyN2ADxGcQbHG7vcyRHk0cbwqcQriUtg==} + + semver@7.6.3: + resolution: {integrity: sha512-oVekP1cKtI+CTDvHWYFUcMtsK/00wmAEfyqKfNdARm8u1wNVhSgaX7A8d4UuIlUI5e84iEwOhs7ZPYRmzU9U6A==} + engines: {node: '>=10'} + hasBin: true + + 
shebang-command@1.2.0: + resolution: {integrity: sha512-EV3L1+UQWGor21OmnvojK36mhg+TyIKDh3iFBKBohr5xeXIhNBcx8oWdgkTEEQ+BEFFYdLRuqMfd5L84N1V5Vg==} + engines: {node: '>=0.10.0'} + + shebang-regex@1.0.0: + resolution: {integrity: sha512-wpoSFAxys6b2a2wHZ1XpDSgD7N9iVjg29Ph9uV/uaP9Ex/KXlkTZTeddxDPSYQpgvzKLGJke2UU0AzoGCjNIvQ==} + engines: {node: '>=0.10.0'} + + signal-exit@3.0.7: + resolution: {integrity: sha512-wnD2ZE+l+SPC/uoS0vXeE9L1+0wuaMqKlfz9AMUo38JsyLSBWSFcHR1Rri62LZc12vLr1gb3jl7iwQhgwpAbGQ==} + + slash@3.0.0: + resolution: {integrity: sha512-g9Q1haeby36OSStwb4ntCGGGaKsaVSjQ68fBxoQcutl5fS1vuY18H3wSt3jFyFtrkx+Kz0V1G85A4MyAdDMi2Q==} + engines: {node: '>=8'} + + spawndamnit@2.0.0: + resolution: {integrity: sha512-j4JKEcncSjFlqIwU5L/rp2N5SIPsdxaRsIv678+TZxZ0SRDJTm8JrxJMjE/XuiEZNEir3S8l0Fa3Ke339WI4qA==} - livekit-plugins/livekit-plugins-deepgram: {} + sprintf-js@1.0.3: + resolution: {integrity: sha512-D9cPgkvLlV3t3IzL0D0YLvGA9Ahk4PcvVwUbN0dSGr1aP0Nrt4AEnTUbuGvquEC0mA64Gqt1fzirlRs5ibXx8g==} - livekit-plugins/livekit-plugins-elevenlabs: {} + strip-ansi@6.0.1: + resolution: {integrity: sha512-Y38VPSHcqkFrCpFnQ9vuSXmquuv5oXOKpGeT6aGrr3o3Gc9AlVa6JBfUSOCnbxGGZF+/0ooI7KrPuUSztUdU5A==} + engines: {node: '>=8'} - livekit-plugins/livekit-plugins-google: {} + strip-bom@3.0.0: + resolution: {integrity: sha512-vavAMRXOgBVNF6nyEEmL3DBK19iRpDcoIwW+swQ+CbGiu7lju6t+JklA1MHweoWtadgt4ISVUsXLyDq34ddcwA==} + engines: {node: '>=4'} - livekit-plugins/livekit-plugins-minimal: {} + supports-color@5.5.0: + resolution: {integrity: sha512-QjVjwdXIt408MIiAqCX4oUKsgU2EqAGzs2Ppkm4aQYbjm+ZEWEcW4SfFNTr4uMNZma0ey4f5lgLrkB0aX0QMow==} + engines: {node: '>=4'} - livekit-plugins/livekit-plugins-nltk: {} + term-size@2.2.1: + resolution: {integrity: sha512-wK0Ri4fOGjv/XPy8SBHZChl8CM7uMc5VML7SqiQ0zG7+J5Vr+RMQDoHa2CNT6KHUnTGIXH34UDMkPzAUyapBZg==} + engines: {node: '>=8'} - livekit-plugins/livekit-plugins-openai: {} + tmp@0.0.33: + resolution: {integrity: 
sha512-jRCJlojKnZ3addtTOjdIqoRuPEKBvNXcGYqzO6zWZX8KfKEpnGY5jfggJQ3EjKuu8D4bJRr0y+cYJFmYbImXGw==} + engines: {node: '>=0.6.0'} - livekit-plugins/livekit-plugins-rag: {} + to-regex-range@5.0.1: + resolution: {integrity: sha512-65P7iz6X5yEr1cwcgvQxbbIw7Uk3gOy5dIdtZ4rDveLqhrdJP+Li/Hx6tyK0NEb+2GCyneCMJiGqrADCSNk8sQ==} + engines: {node: '>=8.0'} - livekit-plugins/livekit-plugins-silero: {} + tr46@0.0.3: + resolution: {integrity: sha512-N3WMsuqV66lT30CrXNbEjx4GEwlow3v6rr4mCcv6prnfwhS01rkgyFdjPNBYd9br7LpXV1+Emh01fHnq2Gdgrw==} -packages: + universalify@0.1.2: + resolution: {integrity: sha512-rBJeI5CXAlmy1pV+617WB9J63U6XcazHHF2f2dbJix4XzpUF0RS3Zbj0FGIOCAva5P/d/GBOYaACQ1w+0azUkg==} + engines: {node: '>= 4.0.0'} - /@babel/runtime@7.24.8: - resolution: {integrity: sha512-5F7SDGs1T72ZczbRwbGO9lQi0NLjQxzl6i4lJxLxfW9U5UluCSyEJeniWvnhl3/euNiqQVbo8zruhsDfid0esA==} - engines: {node: '>=6.9.0'} + webidl-conversions@3.0.1: + resolution: {integrity: sha512-2JAn3z8AR6rjK8Sm8orRC0h/bcl/DqL7tRPdGZ4I1CjdF+EaMLmYxBHyXuKL849eucPFhvBoxMsflfOb8kxaeQ==} + + whatwg-url@5.0.0: + resolution: {integrity: sha512-saE57nupxk6v3HY35+jzBwYa0rKSy0XR8JSxZPwgLr7ys0IBzhGviA1/TUGJLmSVqs8pb9AnvICXEuOHLprYTw==} + + which-pm@2.2.0: + resolution: {integrity: sha512-MOiaDbA5ZZgUjkeMWM5EkJp4loW5ZRoa5bc3/aeMox/PJelMhE6t7S/mLuiY43DBupyxH+S0U1bTui9kWUlmsw==} + engines: {node: '>=8.15'} + + which@1.3.1: + resolution: {integrity: sha512-HxJdYWq1MTIQbJ3nw0cqssHoTNU267KlrDuGZ1WYlxDStUtKUhOaJmh112/TZmHxxUfuJqPXSOm7tDyas0OSIQ==} + hasBin: true + + yallist@2.1.2: + resolution: {integrity: sha512-ncTzHV7NvsQZkYe1DW7cbDLm0YpzHmZF5r/iyP3ZnQtMiJ+pjzisCiMNI+Sj+xQF5pXhSHxSB3uDbsBTzY/c2A==} + + yocto-queue@0.1.0: + resolution: {integrity: sha512-rVksvsnNCdJ/ohGc6xgPwyN8eheCxsiLM8mxuE/t/mOVqJewPuO1miLpTHQiRgTKCLexL4MeAFVagts7HmNZ2Q==} + engines: {node: '>=10'} + +snapshots: + + '@babel/runtime@7.24.8': dependencies: regenerator-runtime: 0.14.1 - dev: true - /@changesets/apply-release-plan@7.0.4: - resolution: {integrity: 
sha512-HLFwhKWayKinWAul0Vj+76jVx1Pc2v55MGPVjZ924Y/ROeSsBMFutv9heHmCUj48lJyRfOTJG5+ar+29FUky/A==} + '@changesets/apply-release-plan@7.0.4': dependencies: '@babel/runtime': 7.24.8 '@changesets/config': 3.0.2 @@ -63,10 +533,8 @@ packages: prettier: 2.8.8 resolve-from: 5.0.0 semver: 7.6.3 - dev: true - /@changesets/assemble-release-plan@6.0.3: - resolution: {integrity: sha512-bLNh9/Lgl1VwkjWZTq8JmRqH+hj7/Yzfz0jsQ/zJJ+FTmVqmqPj3szeKOri8O/hEM8JmHW019vh2gTO9iq5Cuw==} + '@changesets/assemble-release-plan@6.0.3': dependencies: '@babel/runtime': 7.24.8 '@changesets/errors': 0.2.0 @@ -75,17 +543,12 @@ packages: '@changesets/types': 6.0.0 '@manypkg/get-packages': 1.1.3 semver: 7.6.3 - dev: true - /@changesets/changelog-git@0.2.0: - resolution: {integrity: sha512-bHOx97iFI4OClIT35Lok3sJAwM31VbUM++gnMBV16fdbtBhgYu4dxsphBF/0AZZsyAHMrnM0yFcj5gZM1py6uQ==} + '@changesets/changelog-git@0.2.0': dependencies: '@changesets/types': 6.0.0 - dev: true - /@changesets/cli@2.27.7: - resolution: {integrity: sha512-6lr8JltiiXPIjDeYg4iM2MeePP6VN/JkmqBsVA5XRiy01hGS3y629LtSDvKcycj/w/5Eur1rEwby/MjcYS+e2A==} - hasBin: true + '@changesets/cli@2.27.7': dependencies: '@babel/runtime': 7.24.8 '@changesets/apply-release-plan': 7.0.4 @@ -119,10 +582,8 @@ packages: semver: 7.6.3 spawndamnit: 2.0.0 term-size: 2.2.1 - dev: true - /@changesets/config@3.0.2: - resolution: {integrity: sha512-cdEhS4t8woKCX2M8AotcV2BOWnBp09sqICxKapgLHf9m5KdENpWjyrFNMjkLqGJtUys9U+w93OxWT0czorVDfw==} + '@changesets/config@3.0.2': dependencies: '@changesets/errors': 0.2.0 '@changesets/get-dependents-graph': 2.1.1 @@ -131,35 +592,27 @@ packages: '@manypkg/get-packages': 1.1.3 fs-extra: 7.0.1 micromatch: 4.0.7 - dev: true - /@changesets/errors@0.2.0: - resolution: {integrity: sha512-6BLOQUscTpZeGljvyQXlWOItQyU71kCdGz7Pi8H8zdw6BI0g3m43iL4xKUVPWtG+qrrL9DTjpdn8eYuCQSRpow==} + '@changesets/errors@0.2.0': dependencies: extendable-error: 0.1.7 - dev: true - /@changesets/get-dependents-graph@2.1.1: - resolution: {integrity: 
sha512-LRFjjvigBSzfnPU2n/AhFsuWR5DK++1x47aq6qZ8dzYsPtS/I5mNhIGAS68IAxh1xjO9BTtz55FwefhANZ+FCA==} + '@changesets/get-dependents-graph@2.1.1': dependencies: '@changesets/types': 6.0.0 '@manypkg/get-packages': 1.1.3 chalk: 2.4.2 fs-extra: 7.0.1 semver: 7.6.3 - dev: true - /@changesets/get-github-info@0.5.2: - resolution: {integrity: sha512-JppheLu7S114aEs157fOZDjFqUDpm7eHdq5E8SSR0gUBTEK0cNSHsrSR5a66xs0z3RWuo46QvA3vawp8BxDHvg==} + '@changesets/get-github-info@0.5.2': dependencies: dataloader: 1.4.0 node-fetch: 2.7.0 transitivePeerDependencies: - encoding - dev: true - /@changesets/get-release-plan@4.0.3: - resolution: {integrity: sha512-6PLgvOIwTSdJPTtpdcr3sLtGatT+Jr22+cQwEBJBy6wP0rjB4yJ9lv583J9fVpn1bfQlBkDa8JxbS2g/n9lIyA==} + '@changesets/get-release-plan@4.0.3': dependencies: '@babel/runtime': 7.24.8 '@changesets/assemble-release-plan': 6.0.3 @@ -168,14 +621,10 @@ packages: '@changesets/read': 0.6.0 '@changesets/types': 6.0.0 '@manypkg/get-packages': 1.1.3 - dev: true - /@changesets/get-version-range-type@0.4.0: - resolution: {integrity: sha512-hwawtob9DryoGTpixy1D3ZXbGgJu1Rhr+ySH2PvTLHvkZuQ7sRT4oQwMh0hbqZH1weAooedEjRsbrWcGLCeyVQ==} - dev: true + '@changesets/get-version-range-type@0.4.0': {} - /@changesets/git@3.0.0: - resolution: {integrity: sha512-vvhnZDHe2eiBNRFHEgMiGd2CT+164dfYyrJDhwwxTVD/OW0FUD6G7+4DIx1dNwkwjHyzisxGAU96q0sVNBns0w==} + '@changesets/git@3.0.0': dependencies: '@babel/runtime': 7.24.8 '@changesets/errors': 0.2.0 @@ -184,33 +633,25 @@ packages: is-subdir: 1.2.0 micromatch: 4.0.7 spawndamnit: 2.0.0 - dev: true - /@changesets/logger@0.1.0: - resolution: {integrity: sha512-pBrJm4CQm9VqFVwWnSqKEfsS2ESnwqwH+xR7jETxIErZcfd1u2zBSqrHbRHR7xjhSgep9x2PSKFKY//FAshA3g==} + '@changesets/logger@0.1.0': dependencies: chalk: 2.4.2 - dev: true - /@changesets/parse@0.4.0: - resolution: {integrity: sha512-TS/9KG2CdGXS27S+QxbZXgr8uPsP4yNJYb4BC2/NeFUj80Rni3TeD2qwWmabymxmrLo7JEsytXH1FbpKTbvivw==} + '@changesets/parse@0.4.0': dependencies: '@changesets/types': 6.0.0 
js-yaml: 3.14.1 - dev: true - /@changesets/pre@2.0.0: - resolution: {integrity: sha512-HLTNYX/A4jZxc+Sq8D1AMBsv+1qD6rmmJtjsCJa/9MSRybdxh0mjbTvE6JYZQ/ZiQ0mMlDOlGPXTm9KLTU3jyw==} + '@changesets/pre@2.0.0': dependencies: '@babel/runtime': 7.24.8 '@changesets/errors': 0.2.0 '@changesets/types': 6.0.0 '@manypkg/get-packages': 1.1.3 fs-extra: 7.0.1 - dev: true - /@changesets/read@0.6.0: - resolution: {integrity: sha512-ZypqX8+/im1Fm98K4YcZtmLKgjs1kDQ5zHpc2U1qdtNBmZZfo/IBiG162RoP0CUF05tvp2y4IspH11PLnPxuuw==} + '@changesets/read@0.6.0': dependencies: '@babel/runtime': 7.24.8 '@changesets/git': 3.0.0 @@ -220,59 +661,43 @@ packages: chalk: 2.4.2 fs-extra: 7.0.1 p-filter: 2.1.0 - dev: true - /@changesets/should-skip-package@0.1.0: - resolution: {integrity: sha512-FxG6Mhjw7yFStlSM7Z0Gmg3RiyQ98d/9VpQAZ3Fzr59dCOM9G6ZdYbjiSAt0XtFr9JR5U2tBaJWPjrkGGc618g==} + '@changesets/should-skip-package@0.1.0': dependencies: '@babel/runtime': 7.24.8 '@changesets/types': 6.0.0 '@manypkg/get-packages': 1.1.3 - dev: true - /@changesets/types@4.1.0: - resolution: {integrity: sha512-LDQvVDv5Kb50ny2s25Fhm3d9QSZimsoUGBsUioj6MC3qbMUCuC8GPIvk/M6IvXx3lYhAs0lwWUQLb+VIEUCECw==} - dev: true + '@changesets/types@4.1.0': {} - /@changesets/types@5.2.1: - resolution: {integrity: sha512-myLfHbVOqaq9UtUKqR/nZA/OY7xFjQMdfgfqeZIBK4d0hA6pgxArvdv8M+6NUzzBsjWLOtvApv8YHr4qM+Kpfg==} - dev: true + '@changesets/types@5.2.1': {} - /@changesets/types@6.0.0: - resolution: {integrity: sha512-b1UkfNulgKoWfqyHtzKS5fOZYSJO+77adgL7DLRDr+/7jhChN+QcHnbjiQVOz/U+Ts3PGNySq7diAItzDgugfQ==} - dev: true + '@changesets/types@6.0.0': {} - /@changesets/write@0.3.1: - resolution: {integrity: sha512-SyGtMXzH3qFqlHKcvFY2eX+6b0NGiFcNav8AFsYwy5l8hejOeoeTDemu5Yjmke2V5jpzY+pBvM0vCCQ3gdZpfw==} + '@changesets/write@0.3.1': dependencies: '@babel/runtime': 7.24.8 '@changesets/types': 6.0.0 fs-extra: 7.0.1 human-id: 1.0.2 prettier: 2.8.8 - dev: true - /@livekit/changesets-changelog-github@0.0.4: - resolution: {integrity: 
sha512-MXaiLYwgkYciZb8G2wkVtZ1pJJzZmVx5cM30Q+ClslrIYyAqQhRbPmZDM79/5CGxb1MTemR/tfOM25tgJgAK0g==} + '@livekit/changesets-changelog-github@0.0.4': dependencies: '@changesets/get-github-info': 0.5.2 '@changesets/types': 5.2.1 dotenv: 8.6.0 transitivePeerDependencies: - encoding - dev: true - /@manypkg/find-root@1.1.0: - resolution: {integrity: sha512-mki5uBvhHzO8kYYix/WRy2WX8S3B5wdVSc9D6KcU5lQNglP2yt58/VfLuAK49glRXChosY8ap2oJ1qgma3GUVA==} + '@manypkg/find-root@1.1.0': dependencies: '@babel/runtime': 7.24.8 '@types/node': 12.20.55 find-up: 4.1.0 fs-extra: 8.1.0 - dev: true - /@manypkg/get-packages@1.1.3: - resolution: {integrity: sha512-fo+QhuU3qE/2TQMQmbVMqaQ6EWbMhi4ABWP+O4AM1NqPBuy0OrApV5LO6BrrgnhtAHS2NH6RrVk9OL181tTi8A==} + '@manypkg/get-packages@1.1.3': dependencies: '@babel/runtime': 7.24.8 '@changesets/types': 4.1.0 @@ -280,243 +705,142 @@ packages: fs-extra: 8.1.0 globby: 11.1.0 read-yaml-file: 1.1.0 - dev: true - /@nodelib/fs.scandir@2.1.5: - resolution: {integrity: sha512-vq24Bq3ym5HEQm2NKCr3yXDwjc7vTsEThRDnkp2DK9p1uqLR+DHurm/NOTo0KG7HYHU7eppKZj3MyqYuMBf62g==} - engines: {node: '>= 8'} + '@nodelib/fs.scandir@2.1.5': dependencies: '@nodelib/fs.stat': 2.0.5 run-parallel: 1.2.0 - dev: true - /@nodelib/fs.stat@2.0.5: - resolution: {integrity: sha512-RkhPPp2zrqDAQA/2jNhnztcPAlv64XdhIp7a7454A5ovI7Bukxgt7MX7udwAu3zg1DcpPU0rz3VV1SeaqvY4+A==} - engines: {node: '>= 8'} - dev: true + '@nodelib/fs.stat@2.0.5': {} - /@nodelib/fs.walk@1.2.8: - resolution: {integrity: sha512-oGB+UxlgWcgQkgwo8GcEGwemoTFt3FIO9ababBmaGwXIoBKZ+GTy0pP185beGg7Llih/NSHSV2XAs1lnznocSg==} - engines: {node: '>= 8'} + '@nodelib/fs.walk@1.2.8': dependencies: '@nodelib/fs.scandir': 2.1.5 fastq: 1.17.1 - dev: true - /@types/node@12.20.55: - resolution: {integrity: sha512-J8xLz7q2OFulZ2cyGTLE1TbbZcjpno7FaN6zdJNrgAdrJ+DZzh/uFR6YrTb4C+nXakvud8Q4+rbhoIWlYQbUFQ==} - dev: true + '@types/node@12.20.55': {} - /@types/semver@7.5.8: - resolution: {integrity: 
sha512-I8EUhyrgfLrcTkzV3TSsGyl1tSuPrEDzr0yd5m90UgNxQkyDXULk3b6MlQqTCpZpNtWe1K0hzclnZkTcLBe2UQ==} - dev: true + '@types/semver@7.5.8': {} - /ansi-colors@4.1.3: - resolution: {integrity: sha512-/6w/C21Pm1A7aZitlI5Ni/2J6FFQN8i1Cvz3kHABAAbw93v/NlvKdVOqz7CCWz/3iv/JplRSEEZ83XION15ovw==} - engines: {node: '>=6'} - dev: true + ansi-colors@4.1.3: {} - /ansi-regex@5.0.1: - resolution: {integrity: sha512-quJQXlTSUGL2LH9SUXo8VwsY4soanhgo6LNSm84E1LBcE8s3O0wpdiRzyR9z/ZZJMlMWv37qOOb9pdJlMUEKFQ==} - engines: {node: '>=8'} - dev: true + ansi-regex@5.0.1: {} - /ansi-styles@3.2.1: - resolution: {integrity: sha512-VT0ZI6kZRdTh8YyJw3SMbYm/u+NqfsAxEpWO0Pf9sq8/e94WxxOpPKx9FR1FlyCtOVDNOQ+8ntlqFxiRc+r5qA==} - engines: {node: '>=4'} + ansi-styles@3.2.1: dependencies: color-convert: 1.9.3 - dev: true - /argparse@1.0.10: - resolution: {integrity: sha512-o5Roy6tNG4SL/FOkCAN6RzjiakZS25RLYFrcMttJqbdd8BWrnA+fGz57iN5Pb06pvBGvl5gQ0B48dJlslXvoTg==} + argparse@1.0.10: dependencies: sprintf-js: 1.0.3 - dev: true - /array-union@2.1.0: - resolution: {integrity: sha512-HGyxoOTYUyCM6stUe6EJgnd4EoewAI7zMdfqO+kGjnlZmBDz/cR5pf8r/cR4Wq60sL/p0IkcjUEEPwS3GFrIyw==} - engines: {node: '>=8'} - dev: true + array-union@2.1.0: {} - /better-path-resolve@1.0.0: - resolution: {integrity: sha512-pbnl5XzGBdrFU/wT4jqmJVPn2B6UHPBOhzMQkY/SPUPB6QtUXtmBHBIwCbXJol93mOpGMnQyP/+BB19q04xj7g==} - engines: {node: '>=4'} + better-path-resolve@1.0.0: dependencies: is-windows: 1.0.2 - dev: true - /braces@3.0.3: - resolution: {integrity: sha512-yQbXgO/OSZVD2IsiLlro+7Hf6Q18EJrKSEsdoMzKePKXct3gvD8oLcOQdIzGupr5Fj+EDe8gO/lxc1BzfMpxvA==} - engines: {node: '>=8'} + braces@3.0.3: dependencies: fill-range: 7.1.1 - dev: true - /chalk@2.4.2: - resolution: {integrity: sha512-Mti+f9lpJNcwF4tWV8/OrTTtF1gZi+f8FqlyAdouralcFWFQWF2+NgCHShjkCb+IFBLq9buZwE1xckQU4peSuQ==} - engines: {node: '>=4'} + chalk@2.4.2: dependencies: ansi-styles: 3.2.1 escape-string-regexp: 1.0.5 supports-color: 5.5.0 - dev: true - /chardet@0.7.0: - resolution: {integrity: 
sha512-mT8iDcrh03qDGRRmoA2hmBJnxpllMR+0/0qlzjqZES6NdiWDcZkCNAk4rPFZ9Q85r27unkiNNg8ZOiwZXBHwcA==} - dev: true + chardet@0.7.0: {} - /ci-info@3.9.0: - resolution: {integrity: sha512-NIxF55hv4nSqQswkAeiOi1r83xy8JldOFDTWiug55KBu9Jnblncd2U6ViHmYgHf01TPZS77NJBhBMKdWj9HQMQ==} - engines: {node: '>=8'} - dev: true + ci-info@3.9.0: {} - /color-convert@1.9.3: - resolution: {integrity: sha512-QfAUtd+vFdAtFQcC8CCyYt1fYWxSqAiK2cSD6zDB8N3cpsEBAvRxp9zOGg6G/SHHJYAT88/az/IuDGALsNVbGg==} + color-convert@1.9.3: dependencies: color-name: 1.1.3 - dev: true - /color-name@1.1.3: - resolution: {integrity: sha512-72fSenhMw2HZMTVHeCA9KCmpEIbzWiQsjN+BHcBbS9vr1mtt+vJjPdksIBNUmKAW8TFUDPJK5SUU3QhE9NEXDw==} - dev: true + color-name@1.1.3: {} - /cross-spawn@5.1.0: - resolution: {integrity: sha512-pTgQJ5KC0d2hcY8eyL1IzlBPYjTkyH72XRZPnLyKus2mBfNjQs3klqbJU2VILqZryAZUt9JOb3h/mWMy23/f5A==} + cross-spawn@5.1.0: dependencies: lru-cache: 4.1.5 shebang-command: 1.2.0 which: 1.3.1 - dev: true - /dataloader@1.4.0: - resolution: {integrity: sha512-68s5jYdlvasItOJnCuI2Q9s4q98g0pCyL3HrcKJu8KNugUl8ahgmZYg38ysLTgQjjXX3H8CJLkAvWrclWfcalw==} - dev: true + dataloader@1.4.0: {} - /detect-indent@6.1.0: - resolution: {integrity: sha512-reYkTUJAZb9gUuZ2RvVCNhVHdg62RHnJ7WJl8ftMi4diZ6NWlciOzQN88pUhSELEwflJht4oQDv0F0BMlwaYtA==} - engines: {node: '>=8'} - dev: true + detect-indent@6.1.0: {} - /dir-glob@3.0.1: - resolution: {integrity: sha512-WkrWp9GR4KXfKGYzOLmTuGVi1UWFfws377n9cc55/tb6DuqyF6pcQ5AbiHEshaDpY9v6oaSr2XCDidGmMwdzIA==} - engines: {node: '>=8'} + dir-glob@3.0.1: dependencies: path-type: 4.0.0 - dev: true - /dotenv@8.6.0: - resolution: {integrity: sha512-IrPdXQsk2BbzvCBGBOTmmSH5SodmqZNt4ERAZDmW4CT+tL8VtvinqywuANaFu4bOMWki16nqf0e4oC0QIaDr/g==} - engines: {node: '>=10'} - dev: true + dotenv@8.6.0: {} - /enquirer@2.4.1: - resolution: {integrity: sha512-rRqJg/6gd538VHvR3PSrdRBb/1Vy2YfzHqzvbhGIQpDRKIa4FgV/54b5Q1xYSxOOwKvjXweS26E0Q+nAMwp2pQ==} - engines: {node: '>=8.6'} + enquirer@2.4.1: dependencies: ansi-colors: 4.1.3 
strip-ansi: 6.0.1 - dev: true - /escape-string-regexp@1.0.5: - resolution: {integrity: sha512-vbRorB5FUQWvla16U8R/qgaFIya2qGzwDrNmCZuYKrbdSUMG6I1ZCGQRefkRVhuOkIGVne7BQ35DSfo1qvJqFg==} - engines: {node: '>=0.8.0'} - dev: true + escape-string-regexp@1.0.5: {} - /esprima@4.0.1: - resolution: {integrity: sha512-eGuFFw7Upda+g4p+QHvnW0RyTX/SVeJBDM/gCtMARO0cLuT2HcEKnTPvhjV6aGeqrCB/sbNop0Kszm0jsaWU4A==} - engines: {node: '>=4'} - hasBin: true - dev: true + esprima@4.0.1: {} - /extendable-error@0.1.7: - resolution: {integrity: sha512-UOiS2in6/Q0FK0R0q6UY9vYpQ21mr/Qn1KOnte7vsACuNJf514WvCCUHSRCPcgjPT2bAhNIJdlE6bVap1GKmeg==} - dev: true + extendable-error@0.1.7: {} - /external-editor@3.1.0: - resolution: {integrity: sha512-hMQ4CX1p1izmuLYyZqLMO/qGNw10wSv9QDCPfzXfyFrOaCSSoRfqE1Kf1s5an66J5JZC62NewG+mK49jOCtQew==} - engines: {node: '>=4'} + external-editor@3.1.0: dependencies: chardet: 0.7.0 iconv-lite: 0.4.24 tmp: 0.0.33 - dev: true - /fast-glob@3.3.2: - resolution: {integrity: sha512-oX2ruAFQwf/Orj8m737Y5adxDQO0LAB7/S5MnxCdTNDd4p6BsyIVsv9JQsATbTSq8KHRpLwIHbVlUNatxd+1Ow==} - engines: {node: '>=8.6.0'} + fast-glob@3.3.2: dependencies: '@nodelib/fs.stat': 2.0.5 '@nodelib/fs.walk': 1.2.8 glob-parent: 5.1.2 merge2: 1.4.1 micromatch: 4.0.7 - dev: true - /fastq@1.17.1: - resolution: {integrity: sha512-sRVD3lWVIXWg6By68ZN7vho9a1pQcN/WBFaAAsDDFzlJjvoGx0P8z7V1t72grFJfJhu3YPZBuu25f7Kaw2jN1w==} + fastq@1.17.1: dependencies: reusify: 1.0.4 - dev: true - /fill-range@7.1.1: - resolution: {integrity: sha512-YsGpe3WHLK8ZYi4tWDg2Jy3ebRz2rXowDxnld4bkQB00cc/1Zw9AWnC0i9ztDJitivtQvaI9KaLyKrc+hBW0yg==} - engines: {node: '>=8'} + fill-range@7.1.1: dependencies: to-regex-range: 5.0.1 - dev: true - /find-up@4.1.0: - resolution: {integrity: sha512-PpOwAdQ/YlXQ2vj8a3h8IipDuYRi3wceVQQGYWxNINccq40Anw7BlsEXCMbt1Zt+OLA6Fq9suIpIWD0OsnISlw==} - engines: {node: '>=8'} + find-up@4.1.0: dependencies: locate-path: 5.0.0 path-exists: 4.0.0 - dev: true - /find-up@5.0.0: - resolution: {integrity: 
sha512-78/PXT1wlLLDgTzDs7sjq9hzz0vXD+zn+7wypEe4fXQxCmdmqfGsEPQxmiCSQI3ajFV91bVSsvNtrJRiW6nGng==} - engines: {node: '>=10'} + find-up@5.0.0: dependencies: locate-path: 6.0.0 path-exists: 4.0.0 - dev: true - /find-yarn-workspace-root2@1.2.16: - resolution: {integrity: sha512-hr6hb1w8ePMpPVUK39S4RlwJzi+xPLuVuG8XlwXU3KD5Yn3qgBWVfy3AzNlDhWvE1EORCE65/Qm26rFQt3VLVA==} + find-yarn-workspace-root2@1.2.16: dependencies: micromatch: 4.0.7 pkg-dir: 4.2.0 - dev: true - /fs-extra@7.0.1: - resolution: {integrity: sha512-YJDaCJZEnBmcbw13fvdAM9AwNOJwOzrE4pqMqBq5nFiEqXUqHwlK4B+3pUw6JNvfSPtX05xFHtYy/1ni01eGCw==} - engines: {node: '>=6 <7 || >=8'} + fs-extra@7.0.1: dependencies: graceful-fs: 4.2.11 jsonfile: 4.0.0 universalify: 0.1.2 - dev: true - /fs-extra@8.1.0: - resolution: {integrity: sha512-yhlQgA6mnOJUKOsRUFsgJdQCvkKhcz8tlZG5HBQfReYZy46OwLcY+Zia0mtdHsOo9y/hP+CxMN0TU9QxoOtG4g==} - engines: {node: '>=6 <7 || >=8'} + fs-extra@8.1.0: dependencies: graceful-fs: 4.2.11 jsonfile: 4.0.0 universalify: 0.1.2 - dev: true - /glob-parent@5.1.2: - resolution: {integrity: sha512-AOIgSQCepiJYwP3ARnGx+5VnTu2HBYdzbGP45eLw1vr3zB3vZLeyed1sC9hnbcOc9/SrMyM5RPQrkGz4aS9Zow==} - engines: {node: '>= 6'} + glob-parent@5.1.2: dependencies: is-glob: 4.0.3 - dev: true - /globby@11.1.0: - resolution: {integrity: sha512-jhIXaOzy1sb8IyocaruWSn1TjmnBVs8Ayhcy83rmxNJ8q2uWKCAj3CnJY+KpGSXCueAPc0i05kVvVKtP1t9S3g==} - engines: {node: '>=10'} + globby@11.1.0: dependencies: array-union: 2.1.0 dir-glob: 3.0.1 @@ -524,400 +848,210 @@ packages: ignore: 5.3.1 merge2: 1.4.1 slash: 3.0.0 - dev: true - /graceful-fs@4.2.11: - resolution: {integrity: sha512-RbJ5/jmFcNNCcDV5o9eTnBLJ/HszWV0P73bc+Ff4nS/rJj+YaS6IGyiOL0VoBYX+l1Wrl3k63h/KrH+nhJ0XvQ==} - dev: true + graceful-fs@4.2.11: {} - /has-flag@3.0.0: - resolution: {integrity: sha512-sKJf1+ceQBr4SMkvQnBDNDtf4TXpVhVGateu0t918bl30FnbE2m4vNLX+VWe/dpjlb+HugGYzW7uQXH98HPEYw==} - engines: {node: '>=4'} - dev: true + has-flag@3.0.0: {} - /human-id@1.0.2: - resolution: {integrity: 
sha512-UNopramDEhHJD+VR+ehk8rOslwSfByxPIZyJRfV739NDhN5LF1fa1MqnzKm2lGTQRjNrjK19Q5fhkgIfjlVUKw==} - dev: true + human-id@1.0.2: {} - /iconv-lite@0.4.24: - resolution: {integrity: sha512-v3MXnZAcvnywkTUEZomIActle7RXXeedOR31wwl7VlyoXO4Qi9arvSenNQWne1TcRwhCL1HwLI21bEqdpj8/rA==} - engines: {node: '>=0.10.0'} + iconv-lite@0.4.24: dependencies: safer-buffer: 2.1.2 - dev: true - /ignore@5.3.1: - resolution: {integrity: sha512-5Fytz/IraMjqpwfd34ke28PTVMjZjJG2MPn5t7OE4eUCUNf8BAa7b5WUS9/Qvr6mwOQS7Mk6vdsMno5he+T8Xw==} - engines: {node: '>= 4'} - dev: true + ignore@5.3.1: {} - /is-extglob@2.1.1: - resolution: {integrity: sha512-SbKbANkN603Vi4jEZv49LeVJMn4yGwsbzZworEoyEiutsN3nJYdbO36zfhGJ6QEDpOZIFkDtnq5JRxmvl3jsoQ==} - engines: {node: '>=0.10.0'} - dev: true + is-extglob@2.1.1: {} - /is-glob@4.0.3: - resolution: {integrity: sha512-xelSayHH36ZgE7ZWhli7pW34hNbNl8Ojv5KVmkJD4hBdD3th8Tfk9vYasLM+mXWOZhFkgZfxhLSnrwRr4elSSg==} - engines: {node: '>=0.10.0'} + is-glob@4.0.3: dependencies: is-extglob: 2.1.1 - dev: true - /is-number@7.0.0: - resolution: {integrity: sha512-41Cifkg6e8TylSpdtTpeLVMqvSBEVzTttHvERD741+pnZ8ANv0004MRL43QKPDlK9cGvNp6NZWZUBlbGXYxxng==} - engines: {node: '>=0.12.0'} - dev: true + is-number@7.0.0: {} - /is-subdir@1.2.0: - resolution: {integrity: sha512-2AT6j+gXe/1ueqbW6fLZJiIw3F8iXGJtt0yDrZaBhAZEG1raiTxKWU+IPqMCzQAXOUCKdA4UDMgacKH25XG2Cw==} - engines: {node: '>=4'} + is-subdir@1.2.0: dependencies: better-path-resolve: 1.0.0 - dev: true - /is-windows@1.0.2: - resolution: {integrity: sha512-eXK1UInq2bPmjyX6e3VHIzMLobc4J94i4AWn+Hpq3OU5KkrRC96OAcR3PRJ/pGu6m8TRnBHP9dkXQVsT/COVIA==} - engines: {node: '>=0.10.0'} - dev: true + is-windows@1.0.2: {} - /isexe@2.0.0: - resolution: {integrity: sha512-RHxMLp9lnKHGHRng9QFhRCMbYAcVpn69smSGcq3f36xjgVVWThj4qqLbTLlq7Ssj8B+fIQ1EuCEGI2lKsyQeIw==} - dev: true + isexe@2.0.0: {} - /js-yaml@3.14.1: - resolution: {integrity: sha512-okMH7OXXJ7YrN9Ok3/SXrnu4iX9yOk+25nqX4imS2npuvTYDmo/QEZoqwZkYaIDk3jVvBOTOIEgEhaLOynBS9g==} - hasBin: true + 
js-yaml@3.14.1: dependencies: argparse: 1.0.10 esprima: 4.0.1 - dev: true - /jsonfile@4.0.0: - resolution: {integrity: sha512-m6F1R3z8jjlf2imQHS2Qez5sjKWQzbuuhuJ/FKYFRZvPE3PuHcSMVZzfsLhGVOkfd20obL5SWEBew5ShlquNxg==} + jsonfile@4.0.0: optionalDependencies: graceful-fs: 4.2.11 - dev: true - /load-yaml-file@0.2.0: - resolution: {integrity: sha512-OfCBkGEw4nN6JLtgRidPX6QxjBQGQf72q3si2uvqyFEMbycSFFHwAZeXx6cJgFM9wmLrf9zBwCP3Ivqa+LLZPw==} - engines: {node: '>=6'} + load-yaml-file@0.2.0: dependencies: graceful-fs: 4.2.11 js-yaml: 3.14.1 pify: 4.0.1 strip-bom: 3.0.0 - dev: true - /locate-path@5.0.0: - resolution: {integrity: sha512-t7hw9pI+WvuwNJXwk5zVHpyhIqzg2qTlklJOf0mVxGSbe3Fp2VieZcduNYjaLDoy6p9uGpQEGWG87WpMKlNq8g==} - engines: {node: '>=8'} + locate-path@5.0.0: dependencies: p-locate: 4.1.0 - dev: true - /locate-path@6.0.0: - resolution: {integrity: sha512-iPZK6eYjbxRu3uB4/WZ3EsEIMJFMqAoopl3R+zuq0UjcAm/MO6KCweDgPfP3elTztoKP3KtnVHxTn2NHBSDVUw==} - engines: {node: '>=10'} + locate-path@6.0.0: dependencies: p-locate: 5.0.0 - dev: true - /lodash.startcase@4.4.0: - resolution: {integrity: sha512-+WKqsK294HMSc2jEbNgpHpd0JfIBhp7rEV4aqXWqFr6AlXov+SlcgB1Fv01y2kGe3Gc8nMW7VA0SrGuSkRfIEg==} - dev: true + lodash.startcase@4.4.0: {} - /lru-cache@4.1.5: - resolution: {integrity: sha512-sWZlbEP2OsHNkXrMl5GYk/jKk70MBng6UU4YI/qGDYbgf6YbP4EvmqISbXCoJiRKs+1bSpFHVgQxvJ17F2li5g==} + lru-cache@4.1.5: dependencies: pseudomap: 1.0.2 yallist: 2.1.2 - dev: true - /merge2@1.4.1: - resolution: {integrity: sha512-8q7VEgMJW4J8tcfVPy8g09NcQwZdbwFEqhe/WZkoIzjn/3TGDwtOCYtXGxA3O8tPzpczCCDgv+P2P5y00ZJOOg==} - engines: {node: '>= 8'} - dev: true + merge2@1.4.1: {} - /micromatch@4.0.7: - resolution: {integrity: sha512-LPP/3KorzCwBxfeUuZmaR6bG2kdeHSbe0P2tY3FLRU4vYrjYz5hI4QZwV0njUx3jeuKe67YukQ1LSPZBKDqO/Q==} - engines: {node: '>=8.6'} + micromatch@4.0.7: dependencies: braces: 3.0.3 picomatch: 2.3.1 - dev: true - /mri@1.2.0: - resolution: {integrity: 
sha512-tzzskb3bG8LvYGFF/mDTpq3jpI6Q9wc3LEmBaghu+DdCssd1FakN7Bc0hVNmEyGq1bq3RgfkCb3cmQLpNPOroA==} - engines: {node: '>=4'} - dev: true + mri@1.2.0: {} - /node-fetch@2.7.0: - resolution: {integrity: sha512-c4FRfUm/dbcWZ7U+1Wq0AwCyFL+3nt2bEw05wfxSz+DWpWsitgmSgYmy2dQdWyKC1694ELPqMs/YzUSNozLt8A==} - engines: {node: 4.x || >=6.0.0} - peerDependencies: - encoding: ^0.1.0 - peerDependenciesMeta: - encoding: - optional: true + node-fetch@2.7.0: dependencies: whatwg-url: 5.0.0 - dev: true - /os-tmpdir@1.0.2: - resolution: {integrity: sha512-D2FR03Vir7FIu45XBY20mTb+/ZSWB00sjU9jdQXt83gDrI4Ztz5Fs7/yy74g2N5SVQY4xY1qDr4rNddwYRVX0g==} - engines: {node: '>=0.10.0'} - dev: true + os-tmpdir@1.0.2: {} - /outdent@0.5.0: - resolution: {integrity: sha512-/jHxFIzoMXdqPzTaCpFzAAWhpkSjZPF4Vsn6jAfNpmbH/ymsmd7Qc6VE9BGn0L6YMj6uwpQLxCECpus4ukKS9Q==} - dev: true + outdent@0.5.0: {} - /p-filter@2.1.0: - resolution: {integrity: sha512-ZBxxZ5sL2HghephhpGAQdoskxplTwr7ICaehZwLIlfL6acuVgZPm8yBNuRAFBGEqtD/hmUeq9eqLg2ys9Xr/yw==} - engines: {node: '>=8'} + p-filter@2.1.0: dependencies: p-map: 2.1.0 - dev: true - /p-limit@2.3.0: - resolution: {integrity: sha512-//88mFWSJx8lxCzwdAABTJL2MyWB12+eIY7MDL2SqLmAkeKU9qxRvWuSyTjm3FUmpBEMuFfckAIqEaVGUDxb6w==} - engines: {node: '>=6'} + p-limit@2.3.0: dependencies: p-try: 2.2.0 - dev: true - /p-limit@3.1.0: - resolution: {integrity: sha512-TYOanM3wGwNGsZN2cVTYPArw454xnXj5qmWF1bEoAc4+cU/ol7GVh7odevjp1FNHduHc3KZMcFduxU5Xc6uJRQ==} - engines: {node: '>=10'} + p-limit@3.1.0: dependencies: yocto-queue: 0.1.0 - dev: true - /p-locate@4.1.0: - resolution: {integrity: sha512-R79ZZ/0wAxKGu3oYMlz8jy/kbhsNrS7SKZ7PxEHBgJ5+F2mtFW2fK2cOtBh1cHYkQsbzFV7I+EoRKe6Yt0oK7A==} - engines: {node: '>=8'} + p-locate@4.1.0: dependencies: p-limit: 2.3.0 - dev: true - /p-locate@5.0.0: - resolution: {integrity: sha512-LaNjtRWUBY++zB5nE/NwcaoMylSPk+S+ZHNB1TzdbMJMny6dynpAGt7X/tl/QYq3TIeE6nxHppbo2LGymrG5Pw==} - engines: {node: '>=10'} + p-locate@5.0.0: dependencies: p-limit: 3.1.0 - dev: true - 
/p-map@2.1.0: - resolution: {integrity: sha512-y3b8Kpd8OAN444hxfBbFfj1FY/RjtTd8tzYwhUqNYXx0fXx2iX4maP4Qr6qhIKbQXI02wTLAda4fYUbDagTUFw==} - engines: {node: '>=6'} - dev: true + p-map@2.1.0: {} - /p-try@2.2.0: - resolution: {integrity: sha512-R4nPAVTAU0B9D35/Gk3uJf/7XYbQcyohSKdvAxIRSNghFl4e71hVoGnBNQz9cWaXxO2I10KTC+3jMdvvoKw6dQ==} - engines: {node: '>=6'} - dev: true + p-try@2.2.0: {} - /path-exists@4.0.0: - resolution: {integrity: sha512-ak9Qy5Q7jYb2Wwcey5Fpvg2KoAc/ZIhLSLOSBmRmygPsGwkVVt0fZa0qrtMz+m6tJTAHfZQ8FnmB4MG4LWy7/w==} - engines: {node: '>=8'} - dev: true + path-exists@4.0.0: {} - /path-type@4.0.0: - resolution: {integrity: sha512-gDKb8aZMDeD/tZWs9P6+q0J9Mwkdl6xMV8TjnGP3qJVJ06bdMgkbBlLU8IdfOsIsFz2BW1rNVT3XuNEl8zPAvw==} - engines: {node: '>=8'} - dev: true + path-type@4.0.0: {} - /picomatch@2.3.1: - resolution: {integrity: sha512-JU3teHTNjmE2VCGFzuY8EXzCDVwEqB2a8fsIvwaStHhAWJEeVd1o1QD80CU6+ZdEXXSLbSsuLwJjkCBWqRQUVA==} - engines: {node: '>=8.6'} - dev: true + picomatch@2.3.1: {} - /pify@4.0.1: - resolution: {integrity: sha512-uB80kBFb/tfd68bVleG9T5GGsGPjJrLAUpR5PZIrhBnIaRTQRjqdJSsIKkOP6OAIFbj7GOrcudc5pNjZ+geV2g==} - engines: {node: '>=6'} - dev: true + pify@4.0.1: {} - /pkg-dir@4.2.0: - resolution: {integrity: sha512-HRDzbaKjC+AOWVXxAU/x54COGeIv9eb+6CkDSQoNTt4XyWoIJvuPsXizxu/Fr23EiekbtZwmh1IcIG/l/a10GQ==} - engines: {node: '>=8'} + pkg-dir@4.2.0: dependencies: find-up: 4.1.0 - dev: true - /preferred-pm@3.1.4: - resolution: {integrity: sha512-lEHd+yEm22jXdCphDrkvIJQU66EuLojPPtvZkpKIkiD+l0DMThF/niqZKJSoU8Vl7iuvtmzyMhir9LdVy5WMnA==} - engines: {node: '>=10'} + preferred-pm@3.1.4: dependencies: find-up: 5.0.0 find-yarn-workspace-root2: 1.2.16 path-exists: 4.0.0 which-pm: 2.2.0 - dev: true - /prettier@2.8.8: - resolution: {integrity: sha512-tdN8qQGvNjw4CHbY+XXk0JgCXn9QiF21a55rBe5LJAU+kDyC4WQn4+awm2Xfk2lQMk5fKup9XgzTZtGkjBdP9Q==} - engines: {node: '>=10.13.0'} - hasBin: true - dev: true + prettier@2.8.8: {} - /pseudomap@1.0.2: - resolution: {integrity: 
sha512-b/YwNhb8lk1Zz2+bXXpS/LK9OisiZZ1SNsSLxN1x2OXVEhW2Ckr/7mWE5vrC1ZTiJlD9g19jWszTmJsB+oEpFQ==} - dev: true + pseudomap@1.0.2: {} - /queue-microtask@1.2.3: - resolution: {integrity: sha512-NuaNSa6flKT5JaSYQzJok04JzTL1CA6aGhv5rfLW3PgqA+M2ChpZQnAC8h8i4ZFkBS8X5RqkDBHA7r4hej3K9A==} - dev: true + queue-microtask@1.2.3: {} - /read-yaml-file@1.1.0: - resolution: {integrity: sha512-VIMnQi/Z4HT2Fxuwg5KrY174U1VdUIASQVWXXyqtNRtxSr9IYkn1rsI6Tb6HsrHCmB7gVpNwX6JxPTHcH6IoTA==} - engines: {node: '>=6'} + read-yaml-file@1.1.0: dependencies: graceful-fs: 4.2.11 js-yaml: 3.14.1 pify: 4.0.1 strip-bom: 3.0.0 - dev: true - /regenerator-runtime@0.14.1: - resolution: {integrity: sha512-dYnhHh0nJoMfnkZs6GmmhFknAGRrLznOu5nc9ML+EJxGvrx6H7teuevqVqCuPcPK//3eDrrjQhehXVx9cnkGdw==} - dev: true + regenerator-runtime@0.14.1: {} - /resolve-from@5.0.0: - resolution: {integrity: sha512-qYg9KP24dD5qka9J47d0aVky0N+b4fTU89LN9iDnjB5waksiC49rvMB0PrUJQGoTmH50XPiqOvAjDfaijGxYZw==} - engines: {node: '>=8'} - dev: true + resolve-from@5.0.0: {} - /reusify@1.0.4: - resolution: {integrity: sha512-U9nH88a3fc/ekCF1l0/UP1IosiuIjyTh7hBvXVMHYgVcfGvt897Xguj2UOLDeI5BG2m7/uwyaLVT6fbtCwTyzw==} - engines: {iojs: '>=1.0.0', node: '>=0.10.0'} - dev: true + reusify@1.0.4: {} - /run-parallel@1.2.0: - resolution: {integrity: sha512-5l4VyZR86LZ/lDxZTR6jqL8AFE2S0IFLMP26AbjsLVADxHdhB/c0GUsH+y39UfCi3dzz8OlQuPmnaJOMoDHQBA==} + run-parallel@1.2.0: dependencies: queue-microtask: 1.2.3 - dev: true - /safer-buffer@2.1.2: - resolution: {integrity: sha512-YZo3K82SD7Riyi0E1EQPojLz7kpepnSQI9IyPbHHg1XXXevb5dJI7tpyN2ADxGcQbHG7vcyRHk0cbwqcQriUtg==} - dev: true + safer-buffer@2.1.2: {} - /semver@7.6.3: - resolution: {integrity: sha512-oVekP1cKtI+CTDvHWYFUcMtsK/00wmAEfyqKfNdARm8u1wNVhSgaX7A8d4UuIlUI5e84iEwOhs7ZPYRmzU9U6A==} - engines: {node: '>=10'} - hasBin: true - dev: true + semver@7.6.3: {} - /shebang-command@1.2.0: - resolution: {integrity: sha512-EV3L1+UQWGor21OmnvojK36mhg+TyIKDh3iFBKBohr5xeXIhNBcx8oWdgkTEEQ+BEFFYdLRuqMfd5L84N1V5Vg==} - 
engines: {node: '>=0.10.0'} + shebang-command@1.2.0: dependencies: shebang-regex: 1.0.0 - dev: true - /shebang-regex@1.0.0: - resolution: {integrity: sha512-wpoSFAxys6b2a2wHZ1XpDSgD7N9iVjg29Ph9uV/uaP9Ex/KXlkTZTeddxDPSYQpgvzKLGJke2UU0AzoGCjNIvQ==} - engines: {node: '>=0.10.0'} - dev: true + shebang-regex@1.0.0: {} - /signal-exit@3.0.7: - resolution: {integrity: sha512-wnD2ZE+l+SPC/uoS0vXeE9L1+0wuaMqKlfz9AMUo38JsyLSBWSFcHR1Rri62LZc12vLr1gb3jl7iwQhgwpAbGQ==} - dev: true + signal-exit@3.0.7: {} - /slash@3.0.0: - resolution: {integrity: sha512-g9Q1haeby36OSStwb4ntCGGGaKsaVSjQ68fBxoQcutl5fS1vuY18H3wSt3jFyFtrkx+Kz0V1G85A4MyAdDMi2Q==} - engines: {node: '>=8'} - dev: true + slash@3.0.0: {} - /spawndamnit@2.0.0: - resolution: {integrity: sha512-j4JKEcncSjFlqIwU5L/rp2N5SIPsdxaRsIv678+TZxZ0SRDJTm8JrxJMjE/XuiEZNEir3S8l0Fa3Ke339WI4qA==} + spawndamnit@2.0.0: dependencies: cross-spawn: 5.1.0 signal-exit: 3.0.7 - dev: true - /sprintf-js@1.0.3: - resolution: {integrity: sha512-D9cPgkvLlV3t3IzL0D0YLvGA9Ahk4PcvVwUbN0dSGr1aP0Nrt4AEnTUbuGvquEC0mA64Gqt1fzirlRs5ibXx8g==} - dev: true + sprintf-js@1.0.3: {} - /strip-ansi@6.0.1: - resolution: {integrity: sha512-Y38VPSHcqkFrCpFnQ9vuSXmquuv5oXOKpGeT6aGrr3o3Gc9AlVa6JBfUSOCnbxGGZF+/0ooI7KrPuUSztUdU5A==} - engines: {node: '>=8'} + strip-ansi@6.0.1: dependencies: ansi-regex: 5.0.1 - dev: true - /strip-bom@3.0.0: - resolution: {integrity: sha512-vavAMRXOgBVNF6nyEEmL3DBK19iRpDcoIwW+swQ+CbGiu7lju6t+JklA1MHweoWtadgt4ISVUsXLyDq34ddcwA==} - engines: {node: '>=4'} - dev: true + strip-bom@3.0.0: {} - /supports-color@5.5.0: - resolution: {integrity: sha512-QjVjwdXIt408MIiAqCX4oUKsgU2EqAGzs2Ppkm4aQYbjm+ZEWEcW4SfFNTr4uMNZma0ey4f5lgLrkB0aX0QMow==} - engines: {node: '>=4'} + supports-color@5.5.0: dependencies: has-flag: 3.0.0 - dev: true - /term-size@2.2.1: - resolution: {integrity: sha512-wK0Ri4fOGjv/XPy8SBHZChl8CM7uMc5VML7SqiQ0zG7+J5Vr+RMQDoHa2CNT6KHUnTGIXH34UDMkPzAUyapBZg==} - engines: {node: '>=8'} - dev: true + term-size@2.2.1: {} - /tmp@0.0.33: - 
resolution: {integrity: sha512-jRCJlojKnZ3addtTOjdIqoRuPEKBvNXcGYqzO6zWZX8KfKEpnGY5jfggJQ3EjKuu8D4bJRr0y+cYJFmYbImXGw==} - engines: {node: '>=0.6.0'} + tmp@0.0.33: dependencies: os-tmpdir: 1.0.2 - dev: true - /to-regex-range@5.0.1: - resolution: {integrity: sha512-65P7iz6X5yEr1cwcgvQxbbIw7Uk3gOy5dIdtZ4rDveLqhrdJP+Li/Hx6tyK0NEb+2GCyneCMJiGqrADCSNk8sQ==} - engines: {node: '>=8.0'} + to-regex-range@5.0.1: dependencies: is-number: 7.0.0 - dev: true - /tr46@0.0.3: - resolution: {integrity: sha512-N3WMsuqV66lT30CrXNbEjx4GEwlow3v6rr4mCcv6prnfwhS01rkgyFdjPNBYd9br7LpXV1+Emh01fHnq2Gdgrw==} - dev: true + tr46@0.0.3: {} - /universalify@0.1.2: - resolution: {integrity: sha512-rBJeI5CXAlmy1pV+617WB9J63U6XcazHHF2f2dbJix4XzpUF0RS3Zbj0FGIOCAva5P/d/GBOYaACQ1w+0azUkg==} - engines: {node: '>= 4.0.0'} - dev: true + universalify@0.1.2: {} - /webidl-conversions@3.0.1: - resolution: {integrity: sha512-2JAn3z8AR6rjK8Sm8orRC0h/bcl/DqL7tRPdGZ4I1CjdF+EaMLmYxBHyXuKL849eucPFhvBoxMsflfOb8kxaeQ==} - dev: true + webidl-conversions@3.0.1: {} - /whatwg-url@5.0.0: - resolution: {integrity: sha512-saE57nupxk6v3HY35+jzBwYa0rKSy0XR8JSxZPwgLr7ys0IBzhGviA1/TUGJLmSVqs8pb9AnvICXEuOHLprYTw==} + whatwg-url@5.0.0: dependencies: tr46: 0.0.3 webidl-conversions: 3.0.1 - dev: true - /which-pm@2.2.0: - resolution: {integrity: sha512-MOiaDbA5ZZgUjkeMWM5EkJp4loW5ZRoa5bc3/aeMox/PJelMhE6t7S/mLuiY43DBupyxH+S0U1bTui9kWUlmsw==} - engines: {node: '>=8.15'} + which-pm@2.2.0: dependencies: load-yaml-file: 0.2.0 path-exists: 4.0.0 - dev: true - /which@1.3.1: - resolution: {integrity: sha512-HxJdYWq1MTIQbJ3nw0cqssHoTNU267KlrDuGZ1WYlxDStUtKUhOaJmh112/TZmHxxUfuJqPXSOm7tDyas0OSIQ==} - hasBin: true + which@1.3.1: dependencies: isexe: 2.0.0 - dev: true - /yallist@2.1.2: - resolution: {integrity: sha512-ncTzHV7NvsQZkYe1DW7cbDLm0YpzHmZF5r/iyP3ZnQtMiJ+pjzisCiMNI+Sj+xQF5pXhSHxSB3uDbsBTzY/c2A==} - dev: true + yallist@2.1.2: {} - /yocto-queue@0.1.0: - resolution: {integrity: 
sha512-rVksvsnNCdJ/ohGc6xgPwyN8eheCxsiLM8mxuE/t/mOVqJewPuO1miLpTHQiRgTKCLexL4MeAFVagts7HmNZ2Q==} - engines: {node: '>=10'} - dev: true + yocto-queue@0.1.0: {} diff --git a/test.py b/test.py deleted file mode 100644 index e5d5b542b..000000000 --- a/test.py +++ /dev/null @@ -1,59 +0,0 @@ -import asyncio -import multiprocessing as mp -import os -import socket - - -async def async_send(loop, sock, message): - await loop.sock_sendall(sock, message.encode("utf-8")) - - -async def async_recv(loop, sock, buffer_size=1024): - data = await loop.sock_recv(sock, buffer_size) - return data.decode("utf-8") - - -def worker_process(send_sock): - # This will run in the worker process - loop = asyncio.get_event_loop() - - async def worker_task(): - # Simulate sending messages from the worker to the main process - for i in range(5): - message = f"Message {i} from process {os.getpid()}" - print(f"Sending: {message}") - await async_send(loop, send_sock, message) - await asyncio.sleep(1) - - loop.run_until_complete(worker_task()) - send_sock.close() - - -async def main(): - parent_sock, child_sock = socket.socketpair() - - ctx = mp.get_context("spawn") - process = ctx.Process(target=worker_process, args=(child_sock,)) - process.start() - - child_sock.close() # Close the child socket in the main process - - loop = asyncio.get_event_loop() - - # Asynchronously receive messages from the worker process - async def receive_messages(): - while True: - message = await async_recv(loop, parent_sock) - if not message: - break - print(f"Received: {message}") - - await receive_messages() - - # Wait for the process to finish - process.join() - parent_sock.close() - - -if __name__ == "__main__": - asyncio.run(main()) diff --git a/tests/.gitignore b/tests/.gitignore new file mode 100644 index 000000000..ef3d21d85 --- /dev/null +++ b/tests/.gitignore @@ -0,0 +1 @@ +**/test_vad*.wav \ No newline at end of file diff --git a/tests/test_ipc.py b/tests/test_ipc.py index 9256cb1d8..d77715dde 100644 --- 
a/tests/test_ipc.py +++ b/tests/test_ipc.py @@ -56,6 +56,7 @@ async def _pong(): msg = await ipc.channel.arecv_message(cch, IPC_MESSAGES) await ipc.channel.asend_message(cch, msg) except utils.aio.duplex_unix.DuplexClosed: + print("_echo_main, duplex closed..") break asyncio.run(_pong()) @@ -192,6 +193,7 @@ async def test_proc_pool(): initialize_process_fnc=_initialize_proc, job_entrypoint_fnc=_job_entrypoint, num_idle_processes=num_idle_processes, + job_executor_type=job.JobExecutorType.PROCESS, initialize_timeout=20.0, close_timeout=20.0, mp_ctx=mp_ctx, @@ -208,21 +210,21 @@ async def test_proc_pool(): exitcodes = [] @pool.on("process_created") - def _process_created(proc: ipc.proc_pool.SupervisedProc): + def _process_created(proc: ipc.proc_job_executor.ProcJobExecutor): created_q.put_nowait(None) proc.start_arguments = start_args @pool.on("process_started") - def _process_started(proc: ipc.proc_pool.SupervisedProc): + def _process_started(proc: ipc.proc_job_executor.ProcJobExecutor): start_q.put_nowait(None) pids.append(proc.pid) @pool.on("process_ready") - def _process_ready(proc: ipc.proc_pool.SupervisedProc): + def _process_ready(proc: ipc.proc_job_executor.ProcJobExecutor): ready_q.put_nowait(None) @pool.on("process_closed") - def _process_closed(proc: ipc.proc_pool.SupervisedProc): + def _process_closed(proc: ipc.proc_job_executor.ProcJobExecutor): close_q.put_nowait(None) exitcodes.append(proc.exitcode) @@ -264,6 +266,7 @@ async def test_slow_initialization(): loop = asyncio.get_running_loop() num_idle_processes = 2 pool = ipc.proc_pool.ProcPool( + job_executor_type=job.JobExecutorType.PROCESS, initialize_process_fnc=_initialize_proc, job_entrypoint_fnc=_job_entrypoint, num_idle_processes=num_idle_processes, @@ -282,12 +285,12 @@ async def test_slow_initialization(): exitcodes = [] @pool.on("process_created") - def _process_created(proc: ipc.proc_pool.SupervisedProc): + def _process_created(proc: ipc.proc_job_executor.ProcJobExecutor): proc.start_arguments 
= start_args start_q.put_nowait(None) @pool.on("process_closed") - def _process_closed(proc: ipc.proc_pool.SupervisedProc): + def _process_closed(proc: ipc.proc_job_executor.ProcJobExecutor): close_q.put_nowait(None) pids.append(proc.pid) exitcodes.append(proc.exitcode) @@ -313,10 +316,10 @@ def _create_proc( close_timeout: float, mp_ctx: BaseContext, initialize_timeout: float = 20.0, -) -> (ipc.supervised_proc.SupervisedProc, _StartArgs): +) -> tuple[ipc.proc_job_executor.ProcJobExecutor, _StartArgs]: start_args = _new_start_args(mp_ctx) loop = asyncio.get_running_loop() - proc = ipc.supervised_proc.SupervisedProc( + proc = ipc.proc_job_executor.ProcJobExecutor( initialize_process_fnc=_initialize_proc, job_entrypoint_fnc=_job_entrypoint, initialize_timeout=initialize_timeout, diff --git a/tests/test_llm.py b/tests/test_llm.py index 97ff6f033..44ce9c434 100644 --- a/tests/test_llm.py +++ b/tests/test_llm.py @@ -1,10 +1,14 @@ +from __future__ import annotations + import asyncio +import uuid from enum import Enum -from typing import Annotated +from typing import Annotated, Callable, Optional +import pytest from livekit.agents import llm from livekit.agents.llm import ChatContext, FunctionContext, TypeInfo, ai_callable -from livekit.plugins import openai +from livekit.plugins import anthropic, openai class Unit(Enum): @@ -13,18 +17,6 @@ class Unit(Enum): class FncCtx(FunctionContext): - def __init__(self) -> None: - super().__init__() - self._get_weather_calls = 0 - self._play_music_calls = 0 - self._toggle_light_calls = 0 - self._select_currency_calls = 0 - self._change_volume_calls = 0 - - self._toggle_light_cancelled = False - self._selected_currencies = None - self._selected_volume = None - @ai_callable( description="Get the current weather in a given location", auto_retry=True ) @@ -36,8 +28,7 @@ def get_weather( unit: Annotated[ Unit, TypeInfo(description="The temperature unit to use.") ] = Unit.CELSIUS, - ) -> None: - self._get_weather_calls += 1 + ) -> None: 
... @ai_callable(description="Play a music") def play_music( @@ -45,8 +36,7 @@ def play_music( self, name: Annotated[ str, TypeInfo(description="The artist and the name of the song") ], - ) -> None: - self._play_music_calls += 1 + ) -> None: ... # test for cancelled calls @ai_callable(description="Turn on/off the lights in a room") async def toggle_light( self, room: Annotated[str, TypeInfo(description="The room to control")], on: bool = True, ) -> None: - self._toggle_light_calls += 1 - try: - await asyncio.sleep(60) - except asyncio.CancelledError: - self._toggle_light_cancelled = True + await asyncio.sleep(60) # used to test arrays as arguments - @ai_callable(description="Currencies of a specific country") + @ai_callable(description="Currencies of a specific area") def select_currencies( self, currencies: Annotated[ list[str], TypeInfo( - description="The currency to select", + description="The currencies to select", choices=["usd", "eur", "gbp", "jpy", "sek"], ), ], - ) -> None: - self._select_currency_calls += 1 - self._selected_currencies = currencies + ) -> None: ... # test choices on int @ai_callable(description="Change the volume") def change_volume( self, volume: Annotated[ int, TypeInfo(description="The volume level", choices=[0, 11, 30, 83, 99]) ], - ) -> None: - self._change_volume_calls += 1 - self._selected_volume = volume - - -async def test_chat(): - llm = openai.LLM(model="gpt-4o") + ) -> None: ... + @ai_callable(description="Update user info") + def update_user_info( + self, + email: Annotated[ + Optional[str], TypeInfo(description="The user email address") + ] = None, + name: Annotated[Optional[str], TypeInfo(description="The user name")] = None, + address: Optional[ + Annotated[str, TypeInfo(description="The user address")] + ] = None, + ) -> None: ...
+ + +def test_hashable_typeinfo(): + typeinfo = TypeInfo(description="testing", choices=[1, 2, 3]) + # TypeInfo must be hashable when used in combination with typing.Annotated + hash(typeinfo) + + +LLMS: list[llm.LLM | Callable[[], llm.LLM]] = [ + openai.LLM(), + lambda: openai.beta.AssistantLLM( + assistant_opts=openai.beta.AssistantOptions( + create_options=openai.beta.AssistantCreateOptions( + name=f"test-{uuid.uuid4()}", + instructions="You are a basic assistant", + model="gpt-4o", + ) + ) + ), + # anthropic.LLM(), +] + + +@pytest.mark.parametrize("input_llm", LLMS) +async def test_chat(input_llm: llm.LLM | Callable[[], llm.LLM]): + if not isinstance(input_llm, llm.LLM): + input_llm = input_llm() chat_ctx = ChatContext().append( text='You are an assistant at a drive-thru restaurant "Live-Burger". Ask the customer what they would like to order.' ) - stream = llm.chat(chat_ctx=chat_ctx) + # Anthropic's LLM requires at least one message (system messages don't count) + if isinstance(input_llm, anthropic.LLM): + chat_ctx.append( + text="Hello", + role="user", + ) + + stream = input_llm.chat(chat_ctx=chat_ctx) text = "" async for chunk in stream: content = chunk.choices[0].delta.content @@ -105,23 +128,28 @@ async def test_chat(): assert len(text) > 0 -async def test_fnc_calls(): +@pytest.mark.parametrize("input_llm", LLMS) +async def test_basic_fnc_calls(input_llm: Callable[[], llm.LLM] | llm.LLM): + if not isinstance(input_llm, llm.LLM): + input_llm = input_llm() fnc_ctx = FncCtx() - llm = openai.LLM(model="gpt-4o") stream = await _request_fnc_call( - llm, "What's the weather in San Francisco and Paris?", fnc_ctx + input_llm, + "What's the weather in San Francisco and what's the weather in Paris?", + fnc_ctx, ) - fns = stream.execute_functions() - await asyncio.gather(*[f.task for f in fns]) + calls = stream.execute_functions() + await asyncio.gather(*[f.task for f in calls]) await stream.aclose() + assert len(calls) == 2, "get_weather should be called twice" - assert
fnc_ctx._get_weather_calls == 2, "get_weather should be called twice" - -async def test_fnc_calls_runtime_addition(): +@pytest.mark.parametrize("input_llm", LLMS) +async def test_runtime_addition(input_llm: Callable[[], llm.LLM] | llm.LLM): + if not isinstance(input_llm, llm.LLM): + input_llm = input_llm() fnc_ctx = FncCtx() - llm = openai.LLM(model="gpt-4o") called_msg = "" @fnc_ctx.ai_callable(description="Show a message on the screen") @@ -132,7 +160,7 @@ async def show_message( called_msg = message stream = await _request_fnc_call( - llm, "Can you show 'Hello LiveKit!' on the screen?", fnc_ctx + input_llm, "Can you show 'Hello LiveKit!' on the screen?", fnc_ctx ) fns = stream.execute_functions() await asyncio.gather(*[f.task for f in fns]) @@ -141,63 +169,107 @@ async def show_message( assert called_msg == "Hello LiveKit!", "send_message should be called" -async def test_cancelled_calls(): +@pytest.mark.parametrize("input_llm", LLMS) +async def test_cancelled_calls(input_llm: Callable[[], llm.LLM] | llm.LLM): + if not isinstance(input_llm, llm.LLM): + input_llm = input_llm() fnc_ctx = FncCtx() - llm = openai.LLM(model="gpt-4o") stream = await _request_fnc_call( - llm, "Turn off the lights in the Theo's bedroom", fnc_ctx + input_llm, "Turn off the lights in the Theo's bedroom", fnc_ctx ) - stream.execute_functions() - - # Need to wait for the task to start - await asyncio.sleep(0) + calls = stream.execute_functions() + await asyncio.sleep(0.2) # wait for the loop executor to start the task - # don't wait for gather_function_results and directly close + # don't wait for gather_function_results and directly close (this should cancel the ongoing calls) await stream.aclose() - assert fnc_ctx._toggle_light_calls == 1 - assert fnc_ctx._toggle_light_cancelled, "toggle_light should be cancelled" + assert len(calls) == 1 + assert isinstance( + calls[0].exception, asyncio.CancelledError + ), "toggle_light should have been cancelled" -async def test_calls_arrays(): 
+@pytest.mark.parametrize("input_llm", LLMS) +async def test_calls_arrays(input_llm: Callable[[], llm.LLM] | llm.LLM): + if not isinstance(input_llm, llm.LLM): + input_llm = input_llm() fnc_ctx = FncCtx() - llm = openai.LLM(model="gpt-4o") stream = await _request_fnc_call( - llm, "Can you select all currencies in Europe at once?", fnc_ctx + input_llm, + "Can you select all currencies in Europe at once?", + fnc_ctx, + temperature=0.2, ) - fns = stream.execute_functions() - await asyncio.gather(*[f.task for f in fns]) + calls = stream.execute_functions() + await asyncio.gather(*[f.task for f in calls]) await stream.aclose() - assert fnc_ctx._select_currency_calls == 1 - assert fnc_ctx._selected_currencies is not None - assert len(fnc_ctx._selected_currencies) == 3 + assert len(calls) == 1, "select_currencies should have been called only once" - assert "eur" in fnc_ctx._selected_currencies - assert "gbp" in fnc_ctx._selected_currencies - assert "sek" in fnc_ctx._selected_currencies + call = calls[0] + currencies = call.call_info.arguments["currencies"] + assert len(currencies) == 3, "select_currencies should have 3 currencies" + assert ( + "eur" in currencies and "gbp" in currencies and "sek" in currencies + ), "select_currencies should have eur, gbp, sek" -async def test_calls_choices(): +@pytest.mark.parametrize("input_llm", LLMS) +async def test_calls_choices(input_llm: Callable[[], llm.LLM] | llm.LLM): + if not isinstance(input_llm, llm.LLM): + input_llm = input_llm() fnc_ctx = FncCtx() - llm = openai.LLM(model="gpt-4o") - stream = await _request_fnc_call(llm, "Set the volume to 30", fnc_ctx) - fns = stream.execute_functions() - await asyncio.gather(*[f.task for f in fns]) + stream = await _request_fnc_call(input_llm, "Set the volume to 30", fnc_ctx) + calls = stream.execute_functions() + await asyncio.gather(*[f.task for f in calls]) await stream.aclose() - assert fnc_ctx._change_volume_calls == 1 - assert fnc_ctx._selected_volume == 30 + assert len(calls) == 1, 
"change_volume should have been called only once" + + call = calls[0] + volume = call.call_info.arguments["volume"] + assert volume == 30, "change_volume should have been called with volume 30" + + +@pytest.mark.parametrize("input_llm", LLMS) +async def test_optional_args(input_llm: Callable[[], llm.LLM] | llm.LLM): + if not isinstance(input_llm, llm.LLM): + input_llm = input_llm() + fnc_ctx = FncCtx() + + stream = await _request_fnc_call( + input_llm, "Can you update my information? My name is Theo", fnc_ctx + ) + + calls = stream.execute_functions() + await asyncio.gather(*[f.task for f in calls]) + await stream.aclose() + + assert len(calls) == 1, "update_user_info should have been called only once" + + call = calls[0] + name = call.call_info.arguments.get("name", None) + email = call.call_info.arguments.get("email", None) + address = call.call_info.arguments.get("address", None) + + assert name == "Theo", "update_user_info should have been called with name 'Theo'" + assert email is None, "update_user_info should have been called with email None" + assert address is None, "update_user_info should have been called with address None" async def _request_fnc_call( - model: llm.LLM, request: str, fnc_ctx: FncCtx + model: llm.LLM, + request: str, + fnc_ctx: FncCtx, + temperature: float | None = None, ) -> llm.LLMStream: stream = model.chat( - chat_ctx=ChatContext().append(text=request, role="user"), fnc_ctx=fnc_ctx + chat_ctx=ChatContext().append(text=request, role="user"), + fnc_ctx=fnc_ctx, + temperature=temperature, ) async for _ in stream: diff --git a/tests/test_tokenizer.py b/tests/test_tokenizer.py index 931713eeb..bead760b7 100644 --- a/tests/test_tokenizer.py +++ b/tests/test_tokenizer.py @@ -118,6 +118,60 @@ async def test_streamed_word_tokenizer(tokenizer: tokenize.WordTokenizer): assert ev.token == WORDS_EXPECTED[i] +WORDS_PUNCT_TEXT = 'This is actually tricky to handle.' 
+ +WORDS_PUNCT_EXPECTED = [ + "This", + "is", + "actually", + "tricky", + "to", + "handle.", +] + +WORD_PUNCT_TOKENIZERS = [basic.WordTokenizer(ignore_punctuation=False)] + + +@pytest.mark.parametrize("tokenizer", WORD_PUNCT_TOKENIZERS) +def test_punct_word_tokenizer(tokenizer: tokenize.WordTokenizer): + tokens = tokenizer.tokenize(text=WORDS_PUNCT_TEXT) + for i, token in enumerate(WORDS_PUNCT_EXPECTED): + assert token == tokens[i] + + +@pytest.mark.parametrize("tokenizer", WORD_PUNCT_TOKENIZERS) +async def test_streamed_punct_word_tokenizer(tokenizer: tokenize.WordTokenizer): + # divide text by chunks of arbitrary length (1-4) + pattern = [1, 2, 4] + text = WORDS_PUNCT_TEXT + chunks = [] + pattern_iter = iter(pattern * (len(text) // sum(pattern) + 1)) + + for chunk_size in pattern_iter: + if not text: + break + chunks.append(text[:chunk_size]) + text = text[chunk_size:] + + stream = tokenizer.stream() + for chunk in chunks: + stream.push_text(chunk) + + stream.end_input() + + for i in range(len(WORDS_PUNCT_EXPECTED)): + ev = await stream.__anext__() + assert ev.token == WORDS_PUNCT_EXPECTED[i] + + HYPHENATOR_TEXT = [ "Segment", "expected", @@ -141,3 +195,55 @@ def test_hyphenate_word(): for i, word in enumerate(HYPHENATOR_TEXT): hyphenated = basic.hyphenate_word(word) assert hyphenated == HYPHENATOR_EXPECTED[i] + + +REPLACE_TEXT = ( + "This is a test. Hello world, I'm creating this agents.. framework. Once again " + "framework. A.B.C" +) +REPLACE_EXPECTED = ( + "This is a test. Hello universe, I'm creating this assistants.. library. twice again " + "library. 
A.B.C.D" +) + +REPLACE_REPLACEMENTS = { + "world": "universe", + "framework": "library", + "a.b.c": "A.B.C.D", + "once": "twice", + "agents": "assistants", +} + + +def test_replace_words(): + replaced = tokenize.utils.replace_words( + text=REPLACE_TEXT, replacements=REPLACE_REPLACEMENTS + ) + assert replaced == REPLACE_EXPECTED + + +async def test_replace_words_async(): + pattern = [1, 2, 4] + text = REPLACE_TEXT + chunks = [] + pattern_iter = iter(pattern * (len(text) // sum(pattern) + 1)) + + for chunk_size in pattern_iter: + if not text: + break + chunks.append(text[:chunk_size]) + text = text[chunk_size:] + + async def _replace_words_async(): + for chunk in chunks: + yield chunk + + replaced_chunks = [] + + async for chunk in tokenize.utils.replace_words( + text=_replace_words_async(), replacements=REPLACE_REPLACEMENTS + ): + replaced_chunks.append(chunk) + + replaced = "".join(replaced_chunks) + assert replaced == REPLACE_EXPECTED diff --git a/tests/test_tts.py b/tests/test_tts.py index 5b2ebe1d4..cd1858607 100644 --- a/tests/test_tts.py +++ b/tests/test_tts.py @@ -41,6 +41,7 @@ async def _assert_valid_synthesized_audio( google.TTS(), azure.TTS(), cartesia.TTS(), + cartesia.TTS(speed="fastest", emotion=["surprise:highest"]), ] @@ -61,6 +62,7 @@ async def test_synthesize(tts: agents.tts.TTS): elevenlabs.TTS(), elevenlabs.TTS(encoding="pcm_44100"), cartesia.TTS(), + cartesia.TTS(speed="fastest", emotion=["surprise:highest"]), agents.tts.StreamAdapter( tts=openai.TTS(), sentence_tokenizer=STREAM_SENT_TOKENIZER ), diff --git a/tests/test_vad.py b/tests/test_vad.py index e69de29bb..15d066571 100644 --- a/tests/test_vad.py +++ b/tests/test_vad.py @@ -0,0 +1,66 @@ +from livekit.agents import vad +from livekit.plugins import silero + +from . 
import utils + +VAD = silero.VAD.load( + min_speech_duration=0.5, min_silence_duration=0.5, padding_duration=1.0 +) + + +async def test_chunks_vad() -> None: + frames, transcript = utils.make_test_audio(chunk_duration_ms=10) + assert len(frames) > 1, "frames aren't chunked" + + stream = VAD.stream() + + for frame in frames: + stream.push_frame(frame) + + stream.end_input() + + start_of_speech_i = 0 + end_of_speech_i = 0 + async for ev in stream: + if ev.type == vad.VADEventType.START_OF_SPEECH: + with open( + f"test_vad.start_of_speech_frames_{start_of_speech_i}.wav", "wb" + ) as f: + f.write(utils.make_wav_file(ev.frames)) + + start_of_speech_i += 1 + + if ev.type == vad.VADEventType.END_OF_SPEECH: + with open( + f"test_vad.end_of_speech_frames_{end_of_speech_i}.wav", "wb" + ) as f: + f.write(utils.make_wav_file(ev.frames)) + + end_of_speech_i += 1 + + assert start_of_speech_i > 0, "no start of speech detected" + assert start_of_speech_i == end_of_speech_i, "start and end of speech mismatch" + + +async def test_file_vad(): + frames, transcript = utils.make_test_audio() + assert len(frames) == 1, "one frame should be the whole audio" + + stream = VAD.stream() + + for frame in frames: + stream.push_frame(frame) + + stream.end_input() + + start_of_speech_i = 0 + end_of_speech_i = 0 + async for ev in stream: + if ev.type == vad.VADEventType.START_OF_SPEECH: + start_of_speech_i += 1 + + if ev.type == vad.VADEventType.END_OF_SPEECH: + end_of_speech_i += 1 + + assert start_of_speech_i > 0, "no start of speech detected" + assert start_of_speech_i == end_of_speech_i, "start and end of speech mismatch" diff --git a/tests/utils.py b/tests/utils.py index efcc6f964..bd1d6fe1e 100644 --- a/tests/utils.py +++ b/tests/utils.py @@ -1,4 +1,18 @@ +from __future__ import annotations + +import io +import os +import pathlib +import wave + import jiwer as tr +from livekit import rtc +from livekit.agents import utils + +TEST_AUDIO_FILEPATH = os.path.join(os.path.dirname(__file__), 
"long.mp3") +TEST_AUDIO_TRANSCRIPT = pathlib.Path( + os.path.dirname(__file__), "long_transcript.txt" +).read_text() def wer(hypothesis: str, reference: str) -> float: @@ -21,3 +35,49 @@ def wer(hypothesis: str, reference: str) -> float: reference_transform=wer_standardize_contiguous, hypothesis_transform=wer_standardize_contiguous, ) + + +def read_mp3_file(path) -> rtc.AudioFrame: + mp3 = utils.codecs.Mp3StreamDecoder() + frames: list[rtc.AudioFrame] = [] + with open(path, "rb") as file: + while True: + chunk = file.read(4096) + if not chunk: + break + + frames.extend(mp3.decode_chunk(chunk)) + + return utils.merge_frames(frames) # merging just for ease of use + + +def make_test_audio( + chunk_duration_ms: int | None = None, +) -> tuple[list[rtc.AudioFrame], str]: + mp3_audio = read_mp3_file(TEST_AUDIO_FILEPATH) + + if not chunk_duration_ms: + return [mp3_audio], TEST_AUDIO_TRANSCRIPT + + chunk_size = int(mp3_audio.sample_rate / (1000 / chunk_duration_ms)) + bstream = utils.audio.AudioByteStream( + sample_rate=mp3_audio.sample_rate, + num_channels=mp3_audio.num_channels, + samples_per_channel=chunk_size, + ) + + frames = bstream.write(mp3_audio.data.tobytes()) + frames.extend(bstream.flush()) + return frames, TEST_AUDIO_TRANSCRIPT + + +def make_wav_file(frames: list[rtc.AudioFrame]) -> bytes: + buffer = utils.merge_frames(frames) + io_buffer = io.BytesIO() + with wave.open(io_buffer, "wb") as wav: + wav.setnchannels(buffer.num_channels) + wav.setsampwidth(2) # 16-bit + wav.setframerate(buffer.sample_rate) + wav.writeframes(buffer.data) + + return io_buffer.getvalue()