
agents's Introduction


LiveKit Agents

The Agent Framework is designed for building realtime, programmable participants that run on servers. Use it to create conversational, multi-modal voice agents that can see, hear, and understand.

The framework includes plugins for common workflows, such as voice activity detection and speech-to-text.

Agents integrates seamlessly with Cloud or self-hosted LiveKit server, offloading job queuing and scheduling responsibilities to it. This eliminates the need for additional queuing infrastructure. Agent code developed on your local machine can scale to support thousands of concurrent sessions when deployed to a server in production.

This SDK is currently in Developer Preview. During this period, you may encounter bugs and the APIs may change.

We welcome and appreciate any feedback or contributions. You can create issues here or chat live with us in the LiveKit Community Slack.

Docs & Guides

Note

There are breaking API changes between versions 0.7.x and 0.8.x. Please refer to the 0.8 migration guide for a detailed overview of the changes.

Examples

  • Voice assistant: A voice assistant with STT, LLM, and TTS. Demo
  • Video publishing: A demonstration of publishing RGB frames to a LiveKit Room
  • STT: An agent that transcribes a participant's audio into text
  • TTS: An agent that publishes synthesized speech to a LiveKit Room

Installation

To install the core Agents library:

pip install livekit-agents

Agents includes a set of prebuilt plugins that make it easier to compose agents. These plugins cover common tasks such as converting speech to text (or text to speech) and running inference on a generative AI model. You can install a plugin as follows:

pip install livekit-plugins-deepgram

The following plugins are available today:

Plugin                       Features
livekit-plugins-azure        STT, TTS
livekit-plugins-cartesia     TTS
livekit-plugins-deepgram     STT
livekit-plugins-elevenlabs   TTS
livekit-plugins-google       STT, TTS
livekit-plugins-nltk         Utilities for working with text
livekit-plugins-openai       LLM, STT, TTS
livekit-plugins-silero       VAD

Concepts

  • Agent: A function that defines the workflow of a programmable, server-side participant. This is your application code.
  • Worker: A container process responsible for managing job queuing with LiveKit server. Each worker is capable of running multiple agents simultaneously.
  • Plugin: A library class that performs a specific task, like speech-to-text, from a specific provider. An agent can compose multiple plugins together to perform more complex tasks.

Running an agent

The framework exposes a CLI to run your agent. To get started, set the following environment variables:

  • LIVEKIT_URL
  • LIVEKIT_API_KEY
  • LIVEKIT_API_SECRET
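
Since the worker cannot authenticate without all three values, it can help to check them before starting it. The following is a minimal sketch; `missing_livekit_env` is a hypothetical helper for illustration, not part of the framework:

```python
import os

REQUIRED_VARS = ("LIVEKIT_URL", "LIVEKIT_API_KEY", "LIVEKIT_API_SECRET")


def missing_livekit_env(env=None):
    """Return the required variable names that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]
```

Calling `missing_livekit_env()` with no argument inspects the real process environment; an empty list means all three variables are present.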

Starting the worker

This will start the worker and wait for users to connect to your LiveKit server:

python my_agent.py start

To run the worker in dev-mode (with hot code reloading), you can use the dev command:

python my_agent.py dev

Using playground for your agent UI

To ease the process of building and testing an agent, we've developed a versatile web frontend called "playground". You can use or modify this app to suit your specific requirements. It can also serve as a starting point for a completely custom agent application.

Joining a specific room

To join a LiveKit room that's already active, you can use the connect command:

python my_agent.py connect --room <my-room>

What happens when I run my agent?

When you follow the steps above to run your agent, a worker is started that opens an authenticated WebSocket connection to a LiveKit server instance (defined by your LIVEKIT_URL and authenticated with an access token).

No agents are actually running at this point. Instead, the worker is waiting for LiveKit server to give it a job.

When a room is created, the server notifies one of the registered workers about a new job. The notified worker can decide whether or not to accept it. If the worker accepts the job, the worker will instantiate your agent as a participant and have it join the room where it can start subscribing to tracks. A worker can manage multiple agent instances simultaneously.

If a notified worker rejects the job or does not accept within a predetermined timeout period, the server will route the job request to another available worker.
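
The accept/reject flow described above can be pictured with a toy model. This is only an illustration of the routing idea, not LiveKit server's actual scheduler; `ToyWorker` and `dispatch` are hypothetical names, and a timeout is treated the same as a rejection:

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class ToyWorker:
    name: str
    accepts: Callable[[str], bool]  # decides whether to take a job for a room
    rooms: List[str] = field(default_factory=list)  # rooms this worker's agents joined


def dispatch(room: str, workers: List[ToyWorker]) -> Optional[ToyWorker]:
    """Offer the job to each worker in turn; the first one to accept runs the agent."""
    for worker in workers:
        if worker.accepts(room):
            worker.rooms.append(room)  # stands in for "the agent joins the room"
            return worker
        # A rejection (or, in the real system, a timeout) moves on to the next worker.
    return None  # no registered worker took the job


picky = ToyWorker("picky", accepts=lambda room: room.startswith("vip-"))
general = ToyWorker("general", accepts=lambda room: True)

first = dispatch("lobby", [picky, general])   # picky rejects, general accepts
second = dispatch("vip-1", [picky, general])  # picky accepts immediately
```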

What happens when I SIGTERM a worker?

The orchestration system was designed for production use cases. Unlike the typical web server, an agent is a stateful program, so it's important that a worker isn't terminated while active sessions are ongoing.

When a worker receives SIGTERM, it signals to LiveKit server that it no longer wants additional jobs. It also auto-rejects any new job requests that arrive before the server has processed that signal. The worker remains alive while it manages any agents still connected to rooms.
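
The draining behavior can be sketched as a small state machine. This is a toy model under assumed names (`DrainingWorker` and its methods are hypothetical), not the framework's worker implementation:

```python
class DrainingWorker:
    """Toy model of a worker that drains its jobs instead of exiting on SIGTERM."""

    def __init__(self):
        self.draining = False
        self.active_jobs = set()

    def on_job_request(self, job_id: str) -> bool:
        # Once draining, auto-reject any request that still gets through.
        if self.draining:
            return False
        self.active_jobs.add(job_id)
        return True

    def on_sigterm(self) -> None:
        # A real worker would also tell the server it wants no further jobs.
        self.draining = True

    def on_job_finished(self, job_id: str) -> None:
        self.active_jobs.discard(job_id)

    @property
    def can_exit(self) -> bool:
        # Safe to exit only after SIGTERM and once every session has ended.
        return self.draining and not self.active_jobs


worker = DrainingWorker()
worker.on_job_request("job-1")
worker.on_sigterm()
rejected = not worker.on_job_request("job-2")  # arrives after SIGTERM
still_running = not worker.can_exit            # job-1 is still active
worker.on_job_finished("job-1")
```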

Downloading model files

Some plugins require model files to be downloaded before they can be used. To download all the necessary models for your agent, execute the following command:

python my_agent.py download-files

If you're developing a custom plugin, you can integrate this functionality by implementing a download_files method in your Plugin class:

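import torch
from livekit.agents import Plugin

__version__ = "0.1.0"  # placeholder; use your plugin package's real version


class MyPlugin(Plugin):
    def __init__(self):
        super().__init__(__name__, __version__)

    def download_files(self):
        # Fetch (and cache) the model so it is available before the agent starts.
        # "my-repo" and "my-model" are placeholders for your own model source.
        _ = torch.hub.load(
            repo_or_dir="my-repo",
            model="my-model",
        )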


LiveKit Ecosystem
Realtime SDKs: React Components · Browser · Swift Components · iOS/macOS/visionOS · Android · Flutter · React Native · Rust · Node.js · Python · Unity (web) · Unity (beta)
Server APIs: Node.js · Golang · Ruby · Java/Kotlin · Python · Rust · PHP (community)
Agents Frameworks: Python · Playground
Services: LiveKit server · Egress · Ingress · SIP
Resources: Docs · Example apps · Cloud · Self-hosting · CLI

agents's People

Contributors

afigar, brightsparc, calinr, cs50victor, davidzhao, devdlabs, dsa, egoldschmidt, eltociear, github-actions[bot], hauntsaninja, josephkieu, keepingitneil, lukasio, mattherzog, mike-r-mclaughlin, minhpq331, nabil372, naman-scogo, nbsp, ocupe, sauhardjain, seanmuirhead, theomonnom, ty-elastic, vanics


agents's Issues

Agent speech output audio is interpreted as user speech

When using LiveKit agents, sometimes the agent hears its own TTS output (eg via the laptop speakers) which is then interpreted as speech from the user.

This then creates a feedback loop where the agent will then translate + respond a second time to its own speech output.

This only seems to happen when device volume is above ~25-30% and audio is being played through the device speakers.

To provide a seamless UX though, the user shouldn't have to worry about managing volume level in order to prevent this.

My current approach is:

  1. When instantiating a LiveKit room, enabling noiseSuppression and echoCancellation, eg:
    <LiveKitRoom
        token={createAudioCoachingCallRequest.result.room_access_token}
        serverUrl={createAudioCoachingCallRequest.result.active_server_websocket_url}
        audio={{echoCancellation: true, noiseSuppression: true}}
        connect={true}
    >
  2. Enabling allow_interruptions=True in agent.py, eg:
    assistant = VoiceAssistant(
        ...,
        allow_interruptions=True,
    )
  3. Muting the user's mic on the user_speech_committed and agent_started_speaking events, then unmuting on the agent_speech_committed event (ie, after the agent finishes speaking).

Muting the user's mic is a short-term workaround -- the main limitation being that the user can't interrupt the agent once it starts speaking.
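
For reference, the mute/unmute logic in the last step above boils down to a small state machine. This is a sketch only: the event names are the ones listed above, `MicGate` is a hypothetical helper, and how the mute is actually applied to the microphone track is left out:

```python
class MicGate:
    """Tracks whether the user's mic should be muted, driven by assistant events."""

    MUTE_ON = {"user_speech_committed", "agent_started_speaking"}
    UNMUTE_ON = {"agent_speech_committed"}

    def __init__(self):
        self.muted = False

    def handle(self, event: str) -> bool:
        """Update the state for one event and return whether the mic is muted."""
        if event in self.MUTE_ON:
            self.muted = True
        elif event in self.UNMUTE_ON:
            self.muted = False
        return self.muted  # unrelated events leave the state unchanged


gate = MicGate()
states = [
    gate.handle(name)
    for name in (
        "user_speech_committed",
        "agent_started_speaking",
        "agent_speech_committed",
    )
]
```

While the gate is muted the agent cannot hear its own output, but it cannot hear an interruption either, which is the limitation described above.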

Are there best practices for preventing this feedback loop / is this something LiveKit is working on addressing?

STT Timing Information -> Propose emitting END_OF_SPEECH before FINAL_TRANSCRIPT

I'm having trouble capturing timing information with VAD + STT.

given:

openai_stt = openai.STT()
vad = silero.VAD()
vad_stream = vad.stream()
stt = StreamAdapter(openai_stt, vad_stream)
stt_stream = stt.stream()

I looked into the StreamAdapter and found that it was re-emitting the VAD start/end of speech events. I was planning to use those to capture timing, but then I found that the END_OF_SPEECH event is delayed until after the FINAL_TRANSCRIPT, meaning that the timing now includes inference and API call overhead.

It looks like the END_OF_SPEECH event includes the first alternative just for convenience. I propose propagating the VAD events as-is in the adapter, and directing users to the transcript events to get transcription results.

diff --git a/livekit-agents/livekit/agents/stt/stream_adapter.py b/livekit-agents/livekit/agents/stt/stream_adapter.py
index 7050178..9b2d918 100644
--- a/livekit-agents/livekit/agents/stt/stream_adapter.py
+++ b/livekit-agents/livekit/agents/stt/stream_adapter.py
@@ -76,6 +76,9 @@ class StreamAdapterWrapper(SpeechStream):
                     start_event = SpeechEvent(SpeechEventType.START_OF_SPEECH)
                     self._event_queue.put_nowait(start_event)
                 elif event.type == VADEventType.END_OF_SPEECH:
+                    end_event = SpeechEvent(type=SpeechEventType.END_OF_SPEECH)
+                    self._event_queue.put_nowait(end_event)
+
                     merged_frames = merge_frames(event.frames)
                     event = await self._stt.recognize(
                         buffer=merged_frames, *self._args, **self._kwargs
@@ -87,12 +90,6 @@ class StreamAdapterWrapper(SpeechStream):
                         alternatives=[event.alternatives[0]],
                     )
                     self._event_queue.put_nowait(final_event)
-
-                    end_event = SpeechEvent(
-                        type=SpeechEventType.END_OF_SPEECH,
-                        alternatives=[event.alternatives[0]],
-                    )
-                    self._event_queue.put_nowait(end_event)
         except Exception:
             logging.exception("stt stream adapter failed")
         finally:
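
To illustrate why the event order matters for timing, here is a small sketch that measures speech duration from event timestamps. The event names match the ones discussed above; `TimedEvent`, `speech_duration`, and the timestamps are made up for the example:

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TimedEvent:
    kind: str  # "START_OF_SPEECH", "END_OF_SPEECH" or "FINAL_TRANSCRIPT"
    at: float  # seconds, when the event was emitted


def speech_duration(events: List[TimedEvent]) -> Optional[float]:
    """Seconds between the first START_OF_SPEECH and the next END_OF_SPEECH."""
    start = None
    for event in events:
        if event.kind == "START_OF_SPEECH" and start is None:
            start = event.at
        elif event.kind == "END_OF_SPEECH" and start is not None:
            return event.at - start
    return None


# The user speaks for 2.0 s; recognition takes a further 0.75 s (made-up numbers).
current_order = [
    TimedEvent("START_OF_SPEECH", 0.0),
    TimedEvent("FINAL_TRANSCRIPT", 2.75),
    TimedEvent("END_OF_SPEECH", 2.75),  # delayed until after recognition
]
proposed_order = [
    TimedEvent("START_OF_SPEECH", 0.0),
    TimedEvent("END_OF_SPEECH", 2.0),  # emitted as soon as the VAD reports it
    TimedEvent("FINAL_TRANSCRIPT", 2.75),
]
```

With the current order the measured duration includes the recognition overhead; with the proposed order it reflects only the speech itself.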

Ability to perform actions when the session ends

Discussion from Slack:


I'm experimenting with the transcription agent example. It works well, but I'd like to do a bit of post processing after the room ends. I can't figure out where to do this though. It seems like the agent worker is possibly being killed immediately.
I've tried listening for the 'disconnected' event in my agent like this:

@job.room.on("disconnected")
async def on_room_disconnect():
    print('disconnected, do work here?')  # this never runs

We should ensure that all disconnected callbacks are finished before we terminate the worker process.
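
One way to picture the requested behavior: keep a reference to every callback task and await them all before shutting down. The sketch below uses a hypothetical `ToyRoom` emitter rather than the real room object, so it only illustrates the pattern:

```python
import asyncio


class ToyRoom:
    """Minimal event emitter that remembers the tasks it starts."""

    def __init__(self):
        self._handlers = {}
        self._tasks = set()

    def on(self, event):
        def register(callback):
            self._handlers.setdefault(event, []).append(callback)
            return callback

        return register

    def emit(self, event):
        # Schedule each async handler as a task and keep a reference to it.
        for callback in self._handlers.get(event, []):
            task = asyncio.create_task(callback())
            self._tasks.add(task)
            task.add_done_callback(self._tasks.discard)

    async def drain(self):
        # Wait for every handler that is still running before shutting down.
        if self._tasks:
            await asyncio.gather(*list(self._tasks))


async def main():
    room = ToyRoom()
    done = []

    @room.on("disconnected")
    async def on_room_disconnect():
        await asyncio.sleep(0.01)  # stands in for post-processing work
        done.append("post-processing finished")

    room.emit("disconnected")
    await room.drain()  # without this, shutdown could begin before the handler ends
    return done


result = asyncio.run(main())
```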

AttributeError on docker run: 'livekit.rtc' has no attribute 'ArgbFrame'

I executed docker build and run as follows.

docker build -t fal ./examples/fal
docker run fal

Then, the following error occurred.

Traceback (most recent call last):
  File "/app/fal.py", line 21, in <module>
    from fal_sd_turbo import FalSDTurbo
  File "/app/fal_sd_turbo.py", line 61, in <module>
    class SDTurboHighFPSStream:
  File "/app/fal_sd_turbo.py", line 201, in SDTurboHighFPSStream
    async def __anext__(self) -> rtc.ArgbFrame:
                                 ^^^^^^^^^^^^^
AttributeError: module 'livekit.rtc' has no attribute 'ArgbFrame'

I would appreciate it if you could tell me how to handle this error.

Thank you.

wait_pc_connection time out

I'm running into an error: the pc_connection always times out (maybe because I'm on a free account?).

@davidzhao
@JARVISMindEngineer

I need some help, please.

2024-06-05 16:05:05,356 - livekit - ERROR - livekit_ffi::server::room:200:livekit_ffi::server::room - error while connecting to a room: engine: connection error: wait_pc_connection timed out
2024-06-05 16:05:05,356 - livekit.agents - DEBUG - disconnecting from room
2024-06-05 16:05:05,356 - livekit - ERROR - livekit_ffi::server::room:200:livekit_ffi::server::room - error while connecting to a room: engine: connection error: wait_pc_connection timed out
{"asctime": "2024-06-05 16:05:05,356", "level": "ERROR", "name": "livekit", "message": "livekit_ffi::server::room:200:livekit_ffi::server::room - error while connecting to a room: engine: connection error: wait_pc_connection timed out", "taskName": "Task-13", "job_id": "AJ_6eR7aeYjj2QR", "pid": 22306}
2024-06-05 16:05:05,608 - livekit.agents - ERROR - pipe closed, exiting job
{"asctime": "2024-06-05 16:05:05,608", "level": "ERROR", "name": "livekit.agents", "message": "pipe closed, exiting job", "taskName": "Task-13", "job_id": "AJ_6eR7aeYjj2QR", "pid": 22306}
2024-06-05 16:05:05,608 - livekit.agents - INFO - job process closed
{"asctime": "2024-06-05 16:05:05,608", "level": "INFO", "name": "livekit.agents", "message": "job process closed", "taskName": "Task-13", "job_id": "AJ_6eR7aeYjj2QR", "pid": 22306}

Sometimes I get a different error:

2024-06-05 17:20:50,914 - livekit.agents - WARNING - assignment for job AJ_4UiBnp38bMBp timed out
{"asctime": "2024-06-05 17:20:50,914", "level": "WARNING", "name": "livekit.agents", "message": "assignment for job AJ_4UiBnp38bMBp timed out", "taskName": "Task-65", "req": "<livekit.agents.job_request.JobRequest object at 0x151d76780>"}
2024-06-05 17:20:50,915 - livekit.agents - ERROR - user request handler for job AJ_4UiBnp38bMBp failed
Traceback (most recent call last):
  File "/opt/homebrew/Cellar/python@3.12/3.12.3/Frameworks/Python.framework/Versions/3.12/lib/python3.12/asyncio/tasks.py", line 520, in wait_for
    return await fut
           ^^^^^^^^^
asyncio.exceptions.CancelledError

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/Users/yangqingyuan/venv/lib/python3.12/site-packages/livekit/agents/worker.py", line 386, in _user_cb
    await self._opts.request_fnc(req)
  File "/Users/yangqingyuan/PycharmProjects/livekit/main.py", line 65, in request_fnc
    await req.accept(entrypoint)
  File "/Users/yangqingyuan/venv/lib/python3.12/site-packages/livekit/agents/job_request.py", line 127, in accept
    raise exc
  File "/Users/yangqingyuan/venv/lib/python3.12/site-packages/livekit/agents/worker.py", line 429, in _wait_response
    await asyncio.wait_for(wait_assignment, consts.ASSIGNMENT_TIMEOUT)
  File "/opt/homebrew/Cellar/python@3.12/3.12.3/Frameworks/Python.framework/Versions/3.12/lib/python3.12/asyncio/tasks.py", line 519, in wait_for
    async with timeouts.timeout(timeout):
  File "/opt/homebrew/Cellar/python@3.12/3.12.3/Frameworks/Python.framework/Versions/3.12/lib/python3.12/asyncio/timeouts.py", line 115, in __aexit__
    raise TimeoutError from exc_val
TimeoutError
{"asctime": "2024-06-05 17:20:50,915", "level": "ERROR", "name": "livekit.agents", "message": "user request handler for job AJ_4UiBnp38bMBp failed", "exc_info": "Traceback (most recent call last):\n File "/opt/homebrew/Cellar/python@3.12/3.12.3/Frameworks/Python.framework/Versions/3.12/lib/python3.12/asyncio/tasks.py", line 520, in wait_for\n return await fut\n ^^^^^^^^^\nasyncio.exceptions.CancelledError\n\nThe above exception was the direct cause of the following exception:\n\nTraceback (most recent call last):\n File "/Users/yangqingyuan/venv/lib/python3.12/site-packages/livekit/agents/worker.py", line 386, in _user_cb\n await self._opts.request_fnc(req)\n File "/Users/yangqingyuan/PycharmProjects/livekit/main.py", line 65, in request_fnc\n await req.accept(entrypoint)\n File "/Users/yangqingyuan/venv/lib/python3.12/site-packages/livekit/agents/job_request.py", line 127, in accept\n raise exc\n File "/Users/yangqingyuan/venv/lib/python3.12/site-packages/livekit/agents/worker.py", line 429, in _wait_response\n await asyncio.wait_for(wait_assignment, consts.ASSIGNMENT_TIMEOUT)\n File "/opt/homebrew/Cellar/python@3.12/3.12.3/Frameworks/Python.framework/Versions/3.12/lib/python3.12/asyncio/tasks.py", line 519, in wait_for\n async with timeouts.timeout(timeout):\n File "/opt/homebrew/Cellar/python@3.12/3.12.3/Frameworks/Python.framework/Versions/3.12/lib/python3.12/asyncio/timeouts.py", line 115, in __aexit__\n raise TimeoutError from exc_val\nTimeoutError", "taskName": "Task-66", "req": "<livekit.agents.job_request.JobRequest object at 0x151d76780>"}

OnDemand Agent

A use case we're exploring involves implementing a translator agent that would only be needed when participants are speaking different languages. An on-demand agent seems better suited to this than agents that automatically join the room, since the translator agent wouldn't always be needed. We envision user interaction that allows for starting and stopping the agent as needed. Other than running "python my_agent.py simulate-job --room-name ", are there any other approaches for this, or is there anything the team is considering?

Elevenlabs TTS websocket connection design

Hi,

I was able to make the minimal_assistant.py implementation work. Once I sorted out all the difficulties, it runs pretty well! Kudos for that 😃.

I have a question regarding the WebSocket connections used in the ElevenLabs TTS module. In my environment, I noticed that the WebSocket creation is being triggered every time the agent responds to the user. Consequently, the WebSocket is being closed every time the agent stops talking.

Questions:

  • Is this a design decision? If so, could you please explain the rationale behind it?

  • Is there a specific reason for not maintaining a persistent WebSocket connection throughout the session?

I believe closing and reopening the WebSocket repeatedly introduces unnecessary overhead. Maintaining one or a few stable connections throughout the session might be more efficient.

Looking forward to your insights on this.

Thank you!

error when running the examples on codespaces

When I try to run the voice assistant example on GitHub Codespaces, executing python minimal_assistant.py download-files
gives the following error:

python minimal_assistant.py download-files
{"asctime": "2024-05-27 18:26:13,005", "level": "INFO", "name": "livekit.agents", "message": "Downloading files for <livekit.plugins.deepgram.DeepgramPlugin object at 0x7b7d8a435030>"}
{"asctime": "2024-05-27 18:26:13,005", "level": "INFO", "name": "livekit.agents", "message": "Finished downloading files for <livekit.plugins.deepgram.DeepgramPlugin object at 0x7b7d8a435030>"}
{"asctime": "2024-05-27 18:26:13,005", "level": "INFO", "name": "livekit.agents", "message": "Downloading files for <livekit.plugins.elevenlabs.ElevenLabsPlugin object at 0x7b7d8a436050>"}
{"asctime": "2024-05-27 18:26:13,005", "level": "INFO", "name": "livekit.agents", "message": "Finished downloading files for <livekit.plugins.elevenlabs.ElevenLabsPlugin object at 0x7b7d8a436050>"}
{"asctime": "2024-05-27 18:26:13,005", "level": "INFO", "name": "livekit.agents", "message": "Downloading files for <livekit.plugins.openai.OpenAIPlugin object at 0x7b7d8a436e60>"}
{"asctime": "2024-05-27 18:26:13,005", "level": "INFO", "name": "livekit.agents", "message": "Finished downloading files for <livekit.plugins.openai.OpenAIPlugin object at 0x7b7d8a436e60>"}
{"asctime": "2024-05-27 18:26:13,005", "level": "INFO", "name": "livekit.agents", "message": "Downloading files for <livekit.plugins.silero.SileroPlugin object at 0x7b7d89471ba0>"}
Using cache found in /home/codespace/.cache/torch/hub/snakers4_silero-vad_master
Traceback (most recent call last):
  File "/workspaces/agents/examples/voice-assistant/minimal_assistant.py", line 43, in <module>
    cli.run_app(WorkerOptions(request_fnc))
  File "/usr/local/python/3.10.13/lib/python3.10/site-packages/livekit/agents/cli/cli.py", line 191, in run_app
    cli()
  File "/usr/local/python/3.10.13/lib/python3.10/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/python/3.10.13/lib/python3.10/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
  File "/usr/local/python/3.10.13/lib/python3.10/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/local/python/3.10.13/lib/python3.10/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/python/3.10.13/lib/python3.10/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/usr/local/python/3.10.13/lib/python3.10/site-packages/livekit/agents/cli/cli.py", line 188, in download_files
    plugin.download_files()
  File "/usr/local/python/3.10.13/lib/python3.10/site-packages/livekit/plugins/silero/__init__.py", line 29, in download_files
    _ = torch.hub.load(
  File "/home/codespace/.local/lib/python3.10/site-packages/torch/hub.py", line 568, in load
    model = _load_local(repo_or_dir, model, *args, **kwargs)
  File "/home/codespace/.local/lib/python3.10/site-packages/torch/hub.py", line 594, in _load_local
    hub_module = _import_module(MODULE_HUBCONF, hubconf_path)
  File "/home/codespace/.local/lib/python3.10/site-packages/torch/hub.py", line 106, in _import_module
    spec.loader.exec_module(module)
  File "<frozen importlib._bootstrap_external>", line 883, in exec_module
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
  File "/home/codespace/.cache/torch/hub/snakers4_silero-vad_master/hubconf.py", line 5, in <module>
    from utils_vad import (init_jit_model,
  File "/home/codespace/.cache/torch/hub/snakers4_silero-vad_master/utils_vad.py", line 2, in <module>
    import torchaudio
  File "/usr/local/python/3.10.13/lib/python3.10/site-packages/torchaudio/__init__.py", line 2, in <module>
    from . import _extension  # noqa  # usort: skip
  File "/usr/local/python/3.10.13/lib/python3.10/site-packages/torchaudio/_extension/__init__.py", line 38, in <module>
    _load_lib("libtorchaudio")
  File "/usr/local/python/3.10.13/lib/python3.10/site-packages/torchaudio/_extension/utils.py", line 60, in _load_lib
    torch.ops.load_library(path)
  File "/home/codespace/.local/lib/python3.10/site-packages/torch/_ops.py", line 1032, in load_library
    ctypes.CDLL(path)
  File "/usr/local/python/3.10.13/lib/python3.10/ctypes/__init__.py", line 374, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: libtorch_cuda.so: cannot open shared object file: No such file or directory

Does anyone know the cause of this?

Hidden agent can not publish audio track

A hidden agent cannot publish its audio track in a way that lets other participants subscribe to it: real participants do not automatically subscribe to the track and do not receive track publish events. If the agent is not hidden (that is, visible), its tracks publish correctly and the other participants automatically subscribe to them. I imagine most use cases will call for the agent to be hidden. Is this a bug or by design? For a use case we are looking at, it would be great to have this functionality.

Feature Request: Support for Digital Avatars in Audio/Video and Text Communication

I have familiarized myself with the project at https://livekit.io/kitt, and I believe it offers a great opportunity to seamlessly integrate with the ChatGPT API. I am curious to know if LiveKit is preparing for the emergence of Sora.


I am exploring innovative ways to enhance user interaction within my application, particularly through the integration of digital avatars. These avatars could either be predefined virtual characters or dynamically generated based on user inputs, such as descriptions or images. The core idea is to facilitate audio/video and text-based communication between the user and these digital avatars, enriching the overall user experience.

Feature Description:

  • Digital Avatar Integration: Ability to integrate digital avatars that serve as the user's counterpart in communications. These avatars can be virtual characters (predefined) or created dynamically from user inputs (text descriptions, images, etc.).
  • Audio/Video Communication: Users should be able to engage in audio/video calls with these avatars, where the avatars can generate responses in real-time.
  • Text Communication: Alongside audio/video capabilities, the system should support text-based interactions between the user and the avatar.
  • Real-time Subtitles: For audio and video communications, real-time subtitles or captions that reflect the avatar's responses could greatly enhance accessibility and user understanding.

Implementation Considerations:

  • AI and Machine Learning: Utilizing AI to interpret user inputs for dynamic avatar creation and to drive the interaction model (speech recognition, text-to-speech, natural language processing).
  • LiveKit Integration: How can LiveKit support the backend infrastructure for such an interaction model? This includes considerations for low-latency audio/video streaming and data transmission for text chat.
  • Customizability and Scalability: Ensuring the system supports a wide range of avatar customizations and can scale to support a large number of concurrent interactions.

I am particularly interested in understanding whether the current capabilities of LiveKit can support such a feature or if there are planned updates that could facilitate this. Additionally, any guidance on how one might approach implementing this feature, considering the architectural and technological requirements, would be greatly appreciated.

Thank you for considering this request. I believe that the integration of digital avatars into communication platforms can significantly enhance user engagement and offer novel interaction experiences.

Duplicated agent responses (LLM inference + TTS audio)

I've noticed that occasionally the agent will generate two distinct responses (LLM inference and TTS audio) for the same user input.

Interestingly, the second LLM inference isn't generated until after the first TTS audio is completed.

Usually, the second LLM inference + response will be generated using the entire user input, with the first inference being generated using a fraction of it (eg, it doesn't always seem to wait for the user to finish, or handle an interruption cleanly).

Another variant of this I've seen is that sometimes the LLM inference seems to get "caught" - eg, the AI will respond to the previous question I asked instead of the current one, but only once I ask the following question.

I will add logs here as repros happen locally.

I am using this local setup in Chrome on a MacBook:

    assistant = VoiceAssistant(
        vad=silero.VAD(),
        stt=deepgram.STT(),
        llm=openai.LLM(model="gpt-4o"),
        tts=elevenlabs.TTS(voice=DEFAULT_VOICE),
        chat_ctx=initial_ctx,
        fnc_ctx=fnc_ctx,
        allow_interruptions=True,
        debug=True
    )

Elevenlabs should work for Free and Starter price tiers

The ElevenLabs plugin was written when PCM audio was supported in lower price tiers. Now it looks like only mp3 audio is supported in the lower price tiers.

Our default implementation currently doesn't work. This issue represents:

  1. The ElevenLabs plugin should support the mp3 output formats
  2. The default for the ElevenLabs plugin should be an output format supported by the lowest price tier

remix_and_resample effect optimization


                if isinstance(data, rtc.AudioFrame):
                    # TODO(theomonnom): The remix_and_resample method is low quality
                    # and should be replaced with a continuous resampling
                    frame = data.remix_and_resample(
                        self._sample_rate, self._num_channels
                    )

When will this TODO be addressed? In my testing, the accuracy with this approach differs considerably from the accuracy when reading the microphone data with pyaudio.
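
For context on what "continuous resampling" means in the TODO: the resampler keeps its position and leftover samples between frames, so frame boundaries do not introduce discontinuities. The sketch below shows only that idea, using plain linear interpolation on mono samples; it is not the library's implementation, it has no anti-aliasing filter, and `StreamingLinearResampler` is a hypothetical name:

```python
from typing import List


class StreamingLinearResampler:
    """Linear-interpolation resampler that carries its state across frames."""

    def __init__(self, src_rate: int, dst_rate: int):
        self._step = src_rate / dst_rate  # input samples advanced per output sample
        self._pos = 0.0  # position of the next output sample within self._buf
        self._buf: List[float] = []  # input samples that may still be needed

    def push(self, samples: List[float]) -> List[float]:
        self._buf.extend(samples)
        out: List[float] = []
        # Interpolating at _pos needs the input samples on both sides of it.
        while int(self._pos) + 1 < len(self._buf):
            i = int(self._pos)
            frac = self._pos - i
            out.append(self._buf[i] * (1.0 - frac) + self._buf[i + 1] * frac)
            self._pos += self._step
        # Drop the samples that no future output can depend on.
        consumed = min(int(self._pos), len(self._buf))
        del self._buf[:consumed]
        self._pos -= consumed
        return out


ramp = [float(x) for x in range(12)]

# Resample the whole signal at once (48 kHz to 32 kHz)...
one_shot = StreamingLinearResampler(48000, 32000).push(ramp)

# ...and again in two frames of uneven size.
streaming = StreamingLinearResampler(48000, 32000)
chunked = streaming.push(ramp[:5]) + streaming.push(ramp[5:])
```

Because the state survives between calls to `push`, splitting the input into frames gives the same output as processing it in one piece.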

VADEventType error

When running the worker locally, I could not hold a conversation with the agent. After debugging, I found that in "silero/vad.py" the method "_dispatch_event" consistently fails to create "agents.vad.VADEvent".

The problem appears to be related to VADEventType, as illustrated in the figure.
Waiting for help, @davidzhao.

Problem figure 1 (screenshots)

Eleven Labs TTS Stream Doesn't return the text of the audio events being generated

Please add this code to fix it:

          text = ''
          try:
              text = ''.join(msg['normalizedAlignment']['chars'])
          except Exception:
              pass

in this section of env/lib/python3.10/site-packages/livekit/plugins/elevenlabs/tts.py, at line 286:

msg = json.loads(msg.data)
if msg.get("audio"):
    data = base64.b64decode(msg["audio"])
    audio_frame = rtc.AudioFrame(
        data=data,
        sample_rate=self._config.sample_rate,
        num_channels=1,
        samples_per_channel=len(data) // 2,
    )
    text = ''
    try:
        text = ''.join(msg['normalizedAlignment']['chars'])
    except Exception:
        pass
    self._event_queue.put_nowait(
        tts.SynthesisEvent(
            type=tts.SynthesisEventType.AUDIO,
            audio=tts.SynthesizedAudio(text=text, data=audio_frame),
        )
    )
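
The same lookup could also be written without the bare try/except. A sketch, where `alignment_text` is a hypothetical helper and the key names are taken from the snippet above:

```python
def alignment_text(msg: dict) -> str:
    """Join the characters in msg["normalizedAlignment"]["chars"], or return ""."""
    alignment = msg.get("normalizedAlignment") or {}
    chars = alignment.get("chars") or []
    return "".join(chars)
```

Messages without alignment data (or with a null value for the key) yield an empty string, matching the fallback in the proposed fix.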

Quickstart on the doc doesn't work. "Waiting for audio track" forever

I was trying to learn the Agents framework, but even the quickstart in the documentation doesn't work. I followed the steps and set up the Deepgram API key. The agent playground works and I was able to join the room: "Agent connected" shows true, but the status stays at "starting" forever, and the audio section shows "Waiting for audio track" forever. I'm using a Mac (M3) with Chrome.
Please help me out. Thanks in advance.

agent architecture - recovery from agent shutdown

If an agent (implemented by a LiveKit user with the Python SDK) crashes, then ideally the LiveKit backend should keep pinging the agent pool until another agent becomes available.

However, this is not the case:
if the agent stops (for instance, I am running the agent in my debugger and stop the debugger to change code)
and I then run the agent again, the LiveKit backend never calls my agent to rejoin the room.

For a system built with agents to become production-ready, this error recovery mechanism is a must.

KITT not working - latest master

The KITT example is not working; it throws this exception:

Exception in thread Thread-2 (_write_thread):
Traceback (most recent call last):
  File "/usr/lib/python3.10/threading.py", line 1016, in _bootstrap_inner
{"asctime": "2024-04-13 20:00:42,152", "level": "DEBUG", "name": "root", "message": "process started", "job_id": "AJ_AUFCiWbbBkka", "url": "http://livekit-server.livekit", "pid": 92412}
failed to write log: to_bytes() missing required argument 'byteorder' (pos 2)
    self.run()
  File "/usr/lib/python3.10/threading.py", line 953, in run
    self._target(*self._args, **self._kwargs)
  File "/venv/lib/python3.10/site-packages/livekit/agents/apipe.py", line 49, in _write_thread
    ipc_enc.write_msg(self._p, msg)
  File "/venv/lib/python3.10/site-packages/livekit/agents/ipc_enc.py", line 45, in write_msg
    b.write(msg.MSG_ID.to_bytes(4))
TypeError: to_bytes() missing required argument 'byteorder' (pos 2)
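
The `TypeError` above is consistent with how `int.to_bytes` behaves before Python 3.11: `byteorder` is a required argument there, and its default (`"big"`) was only added in 3.11. A minimal sketch of a call that works on both, assuming a 4-byte big-endian message ID (`encode_msg_id` is a hypothetical helper for illustration, not the library's code):

```python
def encode_msg_id(msg_id: int) -> bytes:
    """Encode a message ID as 4 bytes with an explicit byte order.

    Passing byteorder explicitly works on Python 3.10 and earlier as well
    as on 3.11+, where "big" became the default. The 4-byte width mirrors
    the to_bytes(4) call in the traceback; the helper itself is hypothetical.
    """
    return msg_id.to_bytes(4, byteorder="big")


if __name__ == "__main__":
    print(encode_msg_id(5))
```

Running the agent on Python 3.11 or later would presumably avoid the error as well, though the report does not confirm which versions the library supported at the time.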

deepgram transcribe quality in livekit much lower than when transcribing on deepgram demo website

Please compare the transcription accuracy of the livekit+deepgram agent demo with that of the Deepgram demo on their website. You will see that the transcription quality is much higher on Deepgram's own demo.

deepgram demo:
https://console.deepgram.com/project/<YOUR_PROJECT_ID>/mission/transcribe-your-voice-in-realtime

livekit+agent demo:
https://github.com/livekit/agents/tree/main/livekit-plugins/livekit-plugins-deepgram/livekit/plugins/deepgram

TTS Streaming is Broken for livekit-plugins-elevenlabs v0.4.0

This is a clearer description + RCA of #279

I think the issue is related to the TTS streaming implementation of livekit-plugins-elevenlabs. The reason I think that is because when I comment out this code from the assistant.py code, the very first assistant.say call is sent down to the agent playground.

Debugging further, I was able to verify that OpenAI is sending down the correct response to voices, but it wasn't getting streamed as audio to the playground.

I also tried downgrading to all the 0.4.0 dev versions, and none of them fixed the issue. 0.3.0 didn't work at all with the other packages.

The product code I'm using is the official agents quickstart guide.

Does livekit not support CentOS 7?

I want to start livekit-agents.
I am using livekit-cloud.
But I get this error:

OSError: /home/shared/apps/tutor-livekit-speech/.venv/lib/python3.12/site-packages/livekit/rtc/resources/liblivekit_ffi.so: cannot open shared object file: No such file or directory

system:
Linux version 4.19.104-300.el7.x86_64 ([email protected]) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-36) (GCC)) #1 SMP Mon Feb 17 15:34:16 UTC 2020
CentOS Linux release 7.6.1810 (Core)

version:
livekit: 0.11.1
livekit-agents:0.6.0

language:
python3.12

Bug in the examples/kitt

After the latest updates, the example agent KITT does not run

Exception in inference 'SynthesizeStream' object has no attribute 'flush'
Traceback (most recent call last):
  File "examples/kitt/inference_job.py", line 111, in _run
    await asyncio.gather(
  File "examples/kitt/inference_job.py", line 132, in _llm_task
    await self._tts_stream.flush()
          ^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'SynthesizeStream' object has no attribute 'flush'

openai.tts with StreamAdapter has some bugs

The experience with this SDK is a bit poor

1. In the elevenlabs plugin, tts.py line 333: if an API error occurs, please log it. @MichaelYang1995 ElevenLabs cannot be reached from China, and through a VPN only the paid ElevenLabs API works, so you need to be a paid API user to try it.

2. openai.tts with StreamAdapter has some bugs. If you follow the agent quickstart doc and use the following code:

openai_tts = openai.TTS(
    model=openai.TTSModels,
    voice=openai.TTSVoices)
vad = silero.VAD()
vad_stream = vad.stream(min_silence_duration=1.0)
tts = agents.tts.StreamAdapter(openai_tts, vad_stream)

you will get an error like "VADStream has no attribute 'stream'", from livekit/agents/voice_assistant/assistant.py:728. @keepingitneil this is a code bug, right? I want to use OpenAI's tts-1 instead of ElevenLabs. How can I fix it?

elevenlabs-plugin: "cloned" + "professional" voices not working

With the minimal_assistant.py example, the above categories of voices don't seem to generate output properly. Instead it seems to hang for long periods of time.

Am seeing LLM chat completion requests completing successfully, which seems to suggest that STT and OpenAI are working, but no audio output.

I'm using a Macbook with M1 / Ventura 13.0.1, running Python 3.12.

elevenlabs-plugin is working correctly with "premade" voices, and also with OpenAI's TTS (though it doesn't support streaming)

Examples of non-working code:

Voice = elevenlabs.Voice(
    id=MY_VOICE_ID,
    name="Voice Name",
    category="professional",
    settings=elevenlabs.VoiceSettings(
        stability=0.60, similarity_boost=1.0
    )
)

assistant = VoiceAssistant(
  vad=silero.VAD(),
  stt=deepgram.STT(),
  llm=openai.LLM(),
  tts=elevenlabs.TTS(voice=Voice),
  chat_ctx=initial_ctx,
)
assistant.start(ctx.room)

As well as:

Voice = elevenlabs.Voice(
    id=MY_VOICE_ID,
    name="Voice Name",
    category="cloned",
    settings=elevenlabs.VoiceSettings(
        stability=0.60, similarity_boost=1.0
    )
)

assistant = VoiceAssistant(
  vad=silero.VAD(),
  stt=deepgram.STT(),
  llm=openai.LLM(),
  tts=elevenlabs.TTS(voice=Voice),
  chat_ctx=initial_ctx,
)
assistant.start(ctx.room)

What I'm seeing in logs from agent running locally:

2024-05-14 01:56:21,398 INFO  httpx  HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"	  job_id=AJ_oiddbfaSEWxh pid=56250
2024-05-14 01:56:22,141 INFO  httpx  HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"	  job_id=AJ_oiddbfaSEWxh pid=56250
2024-05-14 01:56:23,720 ERROR  livekit.plugins.elevenlabs  11labs connection failed
Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/site-packages/livekit/plugins/elevenlabs/tts.py", line 365, in _run_ws
    await asyncio.gather(send_task(), recv_task())
  File "/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/site-packages/livekit/plugins/elevenlabs/tts.py", line 340, in recv_task
    raise Exception("11labs connection closed unexpectedly")
Exception: 11labs connection closed unexpectedly
	  job_id=AJ_oiddbfaSEWxh pid=56250
2024-05-14 01:56:32,020 INFO  httpx  HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"	  job_id=AJ_oiddbfaSEWxh pid=56250
2024-05-14 01:56:33,258 ERROR  livekit.plugins.elevenlabs  11labs connection failed
Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/site-packages/livekit/plugins/elevenlabs/tts.py", line 365, in _run_ws
    await asyncio.gather(send_task(), recv_task())
  File "/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/site-packages/livekit/plugins/elevenlabs/tts.py", line 340, in recv_task
    raise Exception("11labs connection closed unexpectedly")
Exception: 11labs connection closed unexpectedly
	  job_id=AJ_oiddbfaSEWxh pid=56250

Happy to share more context as needed - awesome project!

wait_pc_connection error job_request.accept

I am testing agents and everything was working fine until today, when I started getting a wait_pc_connection error thrown from the await job_request.accept( call inside async def job_request_cb(job_request: agents.JobRequest):

I am also seeing this in the logs, though I am not sure whether it is related:

2024-04-12 11:09:44,964 - ERROR - livekit_api::signal_client::signal_stream:171:livekit_api::signal_client::signal_stream - unhandled websocket message Err(Protocol(ResetWithoutClosingHandshake))
2024-04-12 11:09:44,965 - DEBUG - livekit::rtc_engine::rtc_session:455:livekit::rtc_engine::rtc_session - received leave request: LeaveRequest { can_reconnect: false, reason: DuplicateIdentity }

I tried upgrading all of the livekit Python libraries, but no luck.

Any ideas?

OpenAI: Optional arguments for ai_callable result in Key Error

When using an optional parameter with @llm.ai_callable, a "Key Error: 'key'" is thrown at https://github.com/livekit/agents/blob/main/livekit-plugins/livekit-plugins-openai/livekit/plugins/openai/llm.py#L169

@llm.ai_callable(desc="Use this tool to find a flight.")
async def search_flights(
    self,
    departure_id: Annotated[str, llm.TypeInfo(desc="Departure airport's IATA code or a city's Freebase ID starting with.")],
    passengers: Annotated[str, llm.TypeInfo(desc="Number of passenger to book the ticket for.")] = "1"
):

This happens because the args do not contain the "passengers" argument. Using arg.default, it would be possible to populate the default values beforehand:

fnc = fncs[name]
# validate args before calling fnc
for arg in fnc.args.values():
    # Populate args with default values
    if arg.name not in args and arg.default is not inspect.Parameter.empty:
        args[arg.name] = arg.default

    if arg.default is inspect.Parameter.empty and arg.name not in args:
        logger.error(f"missing required arg {arg.name} for ai_callable {name}")
        return
  • Affected version livekit-plugins-openai~=0.4.dev1
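
The idea proposed above (fill in defaults before validating) can be sketched independently of the plugin. This is only an illustration built on the standard `inspect` module; `fill_defaults` is a hypothetical helper, and the simplified `search_flights` below stands in for the decorated method in the report:

```python
import inspect
from typing import Any, Callable, Dict


def fill_defaults(fnc: Callable[..., Any], args: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of args with missing optional parameters set to their defaults.

    Raises KeyError naming the first required parameter that is absent.
    """
    filled = dict(args)
    for name, param in inspect.signature(fnc).parameters.items():
        if name in filled:
            continue
        if param.default is inspect.Parameter.empty:
            raise KeyError(f"missing required arg {name}")
        filled[name] = param.default
    return filled


def search_flights(departure_id: str, passengers: str = "1") -> str:
    # Stand-in for the ai_callable in the report; returns a string for demonstration.
    return f"{departure_id}:{passengers}"


if __name__ == "__main__":
    # The model omitted "passengers", so the default "1" is filled in.
    call_args = fill_defaults(search_flights, {"departure_id": "JFK"})
    print(search_flights(**call_args))
```

Copying the argument dict rather than mutating it keeps the raw arguments received from the model available for logging.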

Does livekit not support CentOS 7?

I want to start livekit-agents.
I am using livekit-cloud.
I run this:
python3 main.py download-files
But I get this error:

OSError: /home/shared/apps/tutor-livekit-speech/.venv/lib/python3.12/site-packages/livekit/rtc/resources/liblivekit_ffi.so: cannot open shared object file: No such file or directory

system:
Linux version 4.19.104-300.el7.x86_64 ([email protected]) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-36) (GCC)) #1 SMP Mon Feb 17 15:34:16 UTC 2020
CentOS Linux release 7.6.1810 (Core)

version:
livekit: 0.11.1
livekit-agents:0.6.0

language:
python3.12
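
The report does not establish whether the shared library is really absent from the installed package or whether the platform is simply unsupported. A small diagnostic sketch using only the standard library can gather the facts worth attaching to such a report; `describe_native_lib` is a hypothetical helper, and the path is the one from the error message above:

```python
import platform
from pathlib import Path
from typing import Dict, Union


def describe_native_lib(path: str) -> Dict[str, Union[str, bool]]:
    """Collect basic facts about a native library file and the host platform."""
    lib = Path(path)
    libc_name, libc_version = platform.libc_ver()
    return {
        "path": str(lib),
        "exists": lib.is_file(),  # False means the file is not on disk at all
        "libc": f"{libc_name} {libc_version}".strip(),  # e.g. the glibc version
        "machine": platform.machine(),  # CPU architecture, e.g. x86_64
    }


if __name__ == "__main__":
    info = describe_native_lib(
        "/home/shared/apps/tutor-livekit-speech/.venv/lib/python3.12/"
        "site-packages/livekit/rtc/resources/liblivekit_ffi.so"
    )
    for key, value in info.items():
        print(f"{key}: {value}")
```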

Can clients re-use token after disconnect?

I recently used livekit for my application, based on the agents example, and I ran into a problem:

What happened?

I use an Android application to connect to livekit with a generated token and wss URL, and everything works. Then I disconnect my client from livekit using room.disconnect(). I can see that my agent is still running for this room. After that, I reconnect using the same token and I only see a black screen.
On the server side I saw this log:

{"asctime": "2024-04-25 09:11:32,833", "level": "ERROR", "name": "livekit.agents", "message": "livekit_api::signal_client::signal_stream:178:livekit_api::signal_client::signal_stream - unhandled websocket message Err(Protocol(ResetWithoutClosingHandshake))", "job_id": "AJ_8oNxanmAa5c2", "pid": 133}
{"asctime": "2024-04-25 09:11:32,837", "level": "DEBUG", "name": "livekit", "message": "livekit::rtc_engine:377:livekit::rtc_engine - engine task closed"}
{"asctime": "2024-04-25 09:11:32,837", "level": "DEBUG", "name": "livekit", "message": "livekit::room:943:livekit::room - disconnected from room: UnknownReason"}

What I expected:

I expected to be able to re-use the token from the client to reconnect to the room.
Please help,
Thanks!

Video Frame Not Appearing in Livekit Playground on Mac M3 - Agent Status Stuck at "Starting"

Hi, I'm trying to test the playground with an agent, but the video frame doesn't appear. The video section shows a dark green color, and although the agent shows as connected (true), the agent status stays at "starting" forever.
Here is my agent code. I'm using a Mac M3.

import logging
from livekit import rtc
from livekit.agents import JobContext, JobRequest, WorkerOptions, cli

WIDTH = 640
HEIGHT = 480

async def entrypoint(job: JobContext):
    room = job.room
    source = rtc.VideoSource(WIDTH, HEIGHT)
    track = rtc.LocalVideoTrack.create_video_track("video", source)
    options = rtc.TrackPublishOptions(source=rtc.TrackSource.SOURCE_CAMERA)
    publication = await room.local_participant.publish_track(track, options)
    logging.info("published track", extra={"track_sid": publication.sid})

async def request_fnc(req: JobRequest) -> None:
    logging.info("received request %s", req)
    await req.accept(entrypoint)

if __name__ == "__main__":
    from dotenv import load_dotenv
    import os

    load_dotenv()

    LIVEKIT_API_KEY = os.getenv("LIVEKIT_API_KEY")
    LIVEKIT_API_SECRET = os.getenv("LIVEKIT_API_SECRET")
    LIVEKIT_URL = os.getenv("NEXT_PUBLIC_LIVEKIT_URL")
    cli.run_app(WorkerOptions(request_fnc=request_fnc, api_key=LIVEKIT_API_KEY, api_secret=LIVEKIT_API_SECRET, ws_url=LIVEKIT_URL))

Could someone help?

ZeroDivisionError in voice assistant during function calling and answering

I just tried the function calling example with the following versions:

livekit-agents==0.7.0
livekit-plugins-openai==0.5.0
livekit-plugins-deepgram==0.5.0
livekit-plugins-elevenlabs==0.5.0
livekit-plugins-silero==0.5.0

When I ask it to toggle the lights, I get a ZeroDivisionError:

Task exception was never retrieved
future: <Task finished name='Task-6761' coro=<entrypoint.<locals>._answer_light_toggling() done, defined at /path/to/project/test.py:65> exception=ZeroDivisionError('float division by zero')>

Traceback (most recent call last):
  File "/path/to/project/test.py", line 80, in _answer_light_toggling
    await assistant.say(stream)
  File "/path/to/packages/livekit/agents/voice_assistant/assistant.py", line 225, in say
    await self._start_speech(data, interrupt_current_if_possible=True)
  File "/path/to/packages/livekit/agents/voice_assistant/assistant.py", line 634, in _start_speech
    await self._play_task
  File "/path/to/packages/livekit/agents/voice_assistant/assistant.py", line 712, in _play_speech_if_validated
    await _synthesize_task
  File "/path/to/packages/livekit/agents/voice_assistant/assistant.py", line 818, in _synthesize_task
    await _forward_task
  File "/path/to/packages/livekit/agents/voice_assistant/assistant.py", line 773, in _forward_stream
    tts_forwarder.mark_audio_segment_end()
  File "/path/to/packages/livekit/agents/transcription/tts_forwarder.py", line 170, in mark_audio_segment_end
    seg.avg_speed = len(self._calc_hyphenes(seg.text)) / seg.audio_duration
                    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~
ZeroDivisionError: float division by zero
Here is the `test.py` file content

import asyncio
import logging
from enum import Enum
from typing import Annotated

from livekit.agents import (
    JobContext,
    JobRequest,
    WorkerOptions,
    cli,
    llm,
)
from livekit.agents.voice_assistant import AssistantContext, VoiceAssistant
from livekit.plugins import deepgram, elevenlabs, openai, silero


class Room(Enum):
    BEDROOM = "bedroom"
    LIVING_ROOM = "living room"
    KITCHEN = "kitchen"
    BATHROOM = "bathroom"
    OFFICE = "office"


class AssistantFnc(llm.FunctionContext):
    @llm.ai_callable(desc="Turn on/off the lights in a room")
    async def toggle_light(
        self,
        room: Annotated[Room, llm.TypeInfo(desc="The specific room")],
        status: bool,
    ):
        logging.info("toggle_light %s %s", room, status)
        ctx = AssistantContext.get_current()
        key = "enabled_rooms" if status else "disabled_rooms"
        li = ctx.get_metadata(key, [])
        li.append(room)
        ctx.store_metadata(key, li)

    @llm.ai_callable(desc="User want the assistant to stop/pause speaking")
    def stop_speaking(self):
        pass  # do nothing


async def entrypoint(ctx: JobContext):
    gpt = openai.LLM(model="gpt-4-turbo")

    initial_ctx = llm.ChatContext(
        messages=[
            llm.ChatMessage(
                role=llm.ChatRole.SYSTEM,
                text="You are a voice assistant created by LiveKit. Your interface with users will be voice. You should use short and concise responses, and avoiding usage of unpronounceable punctuation.",
            )
        ]
    )

    assistant = VoiceAssistant(
        vad=silero.VAD(),
        stt=deepgram.STT(),
        llm=gpt,
        tts=elevenlabs.TTS(),
        fnc_ctx=AssistantFnc(),
        chat_ctx=initial_ctx,
    )

    async def _answer_light_toggling(enabled_rooms, disabled_rooms):
        prompt = "Make a summary of the following actions you did:"
        if enabled_rooms:
            enabled_rooms_str = ", ".join(enabled_rooms)
            prompt += f"\n - You enabled the lights in the following rooms: {enabled_rooms_str}"

        if disabled_rooms:
            disabled_rooms_str = ", ".join(disabled_rooms)
            prompt += f"\n - You disabled the lights in the following rooms {disabled_rooms_str}"

        chat_ctx = llm.ChatContext(
            messages=[llm.ChatMessage(role=llm.ChatRole.SYSTEM, text=prompt)]
        )

        stream = await gpt.chat(chat_ctx)
        await assistant.say(stream)

    @assistant.on("agent_speech_interrupted")
    def _agent_speech_interrupted(chat_ctx: llm.ChatContext, msg: llm.ChatMessage):
        msg.text += "... (user interrupted you)"

    @assistant.on("function_calls_finished")
    def _function_calls_done(ctx: AssistantContext):
        logging.info("function_calls_done %s", ctx)
        enabled_rooms = ctx.get_metadata("enabled_rooms", [])
        disabled_rooms = ctx.get_metadata("disabled_rooms", [])

        if enabled_rooms or disabled_rooms:
            # if there was a change in the lights, summarize it and let the user know
            asyncio.ensure_future(_answer_light_toggling(enabled_rooms, disabled_rooms))

    assistant.start(ctx.room)
    await asyncio.sleep(3)
    await assistant.say("Hey, how can I help you today?")


async def request_fnc(req: JobRequest) -> None:
    logging.info("received request %s", req)
    await req.accept(entrypoint)


if __name__ == "__main__":
    cli.run_app(WorkerOptions(request_fnc))
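
The traceback in this report ends at a division by `seg.audio_duration` in `mark_audio_segment_end`, so the failure appears to occur when a segment ends with zero recorded audio duration. A minimal sketch of a guard for that case; `average_speed` is a hypothetical helper, not the library's code, and treating "no audio" as `None` is an assumption about what the caller would want:

```python
from typing import Optional


def average_speed(unit_count: int, audio_duration: float) -> Optional[float]:
    """Return units per second, or None when the segment has no audio.

    Returning None (an assumed convention) lets the caller skip the speed
    update instead of dividing by zero.
    """
    if audio_duration <= 0:
        return None
    return unit_count / audio_duration


if __name__ == "__main__":
    print(average_speed(12, 3.0))
    print(average_speed(12, 0.0))
```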

Re-join the room after agent server restarted

Currently, if the agent server is restarted, all agents in joined rooms are disconnected. Is there any way to make the agent re-join the room, or to actively invite the agent to the room?

My playground has no voice and I can't talk with the agent #293

I successfully deployed the worker locally and registered it using ElevenLabs, but after a successful connection, there is no voice saying "Hey, how can I help you today?".

When I talk to the agent, there is no response and no voice, and I didn't find any error logs.

this is my log:

import sys; print('Python %s on %s' % (sys.version, sys.platform))
/Users/yangqingyuan/anaconda3/bin/python -X pycache_prefix=/Users/yangqingyuan/Library/Caches/JetBrains/PyCharm2024.1/cpython-cache /Applications/PyCharm.app/Contents/plugins/python/helpers/pydev/pydevd.py --multiprocess --qt-support=auto --client 127.0.0.1 --port 61578 --file /Users/yangqingyuan/main.py start
Connected to pydev debugger (build 241.15989.155)
{"asctime": "2024-05-20 17:55:01,480", "level": "INFO", "name": "livekit.agents", "message": "starting worker", "version": "0.6.0"}
{"asctime": "2024-05-20 17:55:04,062", "level": "INFO", "name": "livekit.agents", "message": "registered worker", "id": "AW_3947XPhMbsS8", "server_info": "edition: Cloud\nversion: \"1.6.1\"\nprotocol: 13\nregion: \"Japan\"\nnode_id: \"NC_OTOKYO1A_a23my2USzTK5\"\n"}
{"asctime": "2024-05-20 17:55:13,499", "level": "INFO", "name": "root", "message": "received request <livekit.agents.job_request.JobRequest object at 0x142008ad0>"}
{"asctime": "2024-05-20 17:55:13,607", "level": "INFO", "name": "livekit.agents", "message": "accepted job AJ_nPJmwytXoBGM", "job": "id: \"AJ_nPJmwytXoBGM\"\nroom {\n sid: \"RM_PNHFgotZGwGc\"\n name: \"playground-rVeN-H9h0\"\n empty_timeout: 300\n creation_time: 1716198912\n enabled_codecs {\n mime: \"video/H264\"\n }\n enabled_codecs {\n mime: \"video/VP8\"\n }\n enabled_codecs {\n mime: \"video/VP9\"\n }\n enabled_codecs {\n mime: \"video/AV1\"\n }\n enabled_codecs {\n mime: \"audio/red\"\n }\n enabled_codecs {\n mime: \"audio/opus\"\n }\n version {\n unix_micro: 1716198912832897\n }\n departure_timeout: 20\n}\nnamespace: \"default\"\n"}
{"asctime": "2024-05-20 17:55:22,110", "level": "INFO", "name": "livekit", "message": "livekit_ffi::server:125:livekit_ffi::server - initializing ffi server v0.5.0", "job_id": "AJ_nPJmwytXoBGM", "pid": 88921}
{"asctime": "2024-05-20 17:55:22,113", "level": "INFO", "name": "livekit", "message": "livekit_ffi::cabi:27:livekit_ffi::cabi - initializing ffi server v0.5.0", "job_id": "AJ_nPJmwytXoBGM", "pid": 88921}
{"asctime": "2024-05-20 17:55:22,125", "level": "INFO", "name": "livekit", "message": "livekit_api::signal_client::signal_stream:88:livekit_api::signal_client::signal_stream - connecting to wss://frogpig-iidj6sdo.livekit.cloud/rtc?sdk=rust&protocol=9&auto_subscribe=1&adaptive_stream=0&access_token=...", "job_id": "AJ_nPJmwytXoBGM", "pid": 88921}
Using cache found in /Users/yangqingyuan/.cache/torch/hub/snakers4_silero-vad_master
2024-05-20 17:55:33.112970 [W:onnxruntime:, graph.cc:3593 CleanUnusedInitializersAndNodeArgs] Removing initializer '628'. It is not used by any node and should be removed from the model.
2024-05-20 17:55:33.115145 [W:onnxruntime:, graph.cc:3593 CleanUnusedInitializersAndNodeArgs] Removing initializer '629'. It is not used by any node and should be removed from the model.
2024-05-20 17:55:33.115185 [W:onnxruntime:, graph.cc:3593 CleanUnusedInitializersAndNodeArgs] Removing initializer '623'. It is not used by any node and should be removed from the model.
2024-05-20 17:55:33.115194 [W:onnxruntime:, graph.cc:3593 CleanUnusedInitializersAndNodeArgs] Removing initializer '625'. It is not used by any node and should be removed from the model.
2024-05-20 17:55:33.115204 [W:onnxruntime:, graph.cc:3593 CleanUnusedInitializersAndNodeArgs] Removing initializer '620'. It is not used by any node and should be removed from the model.
2024-05-20 17:55:33.115500 [W:onnxruntime:, graph.cc:3593 CleanUnusedInitializersAndNodeArgs] Removing initializer '139'. It is not used by any node and should be removed from the model.
2024-05-20 17:55:33.115529 [W:onnxruntime:, graph.cc:3593 CleanUnusedInitializersAndNodeArgs] Removing initializer '131'. It is not used by any node and should be removed from the model.
2024-05-20 17:55:33.115535 [W:onnxruntime:, graph.cc:3593 CleanUnusedInitializersAndNodeArgs] Removing initializer '140'. It is not used by any node and should be removed from the model.
2024-05-20 17:55:33.115541 [W:onnxruntime:, graph.cc:3593 CleanUnusedInitializersAndNodeArgs] Removing initializer '134'. It is not used by any node and should be removed from the model.
2024-05-20 17:55:33.115546 [W:onnxruntime:, graph.cc:3593 CleanUnusedInitializersAndNodeArgs] Removing initializer '136'. It is not used by any node and should be removed from the model.
{"asctime": "2024-05-20 17:55:33,517", "level": "WARNING", "name": "livekit.agents", "message": "Running <Task pending name='Task-16' coro=<entrypoint() running at /Users/yangqingyuan/main.py:43> wait_for=<Future pending cb=[Task.task_wakeup()]> cb=[_start.<locals>._start_if_valid.<locals>.log_exception() at /Users/yangqingyuan/anaconda3/lib/python3.11/site-packages/livekit/agents/ipc/job_main.py:99]> took too long: 2.36 seconds", "job_id": "AJ_nPJmwytXoBGM", "pid": 88921}
{"asctime": "2024-05-20 17:55:36,517", "level": "INFO", "name": "livekit.agents", "message": "assistant - saying", "job_id": "AJ_nPJmwytXoBGM", "pid": 88921}
{"asctime": "2024-05-20 17:55:36,518", "level": "INFO", "name": "livekit.agents", "message": "assistant - synthesizing text", "job_id": "AJ_nPJmwytXoBGM", "pid": 88921}
{"asctime": "2024-05-20 17:55:36,519", "level": "INFO", "name": "livekit.agents", "message": "assistant - enqueuing speech", "job_id": "AJ_nPJmwytXoBGM", "pid": 88921}
{"asctime": "2024-05-20 17:55:36,520", "level": "INFO", "name": "livekit.agents", "message": "assistant - speech validated, data=", "job_id": "AJ_nPJmwytXoBGM", "pid": 88921}
{"asctime": "2024-05-20 17:55:36,527", "level": "INFO", "name": "livekit.agents", "message": "_SpeechData(source=<async_generator object VoiceAssistant.say.<locals>._gen at 0x134344d00>, allow_interruptions=True, add_to_ctx=True, val_ch=<livekit.agents.aio.channel.Chan object at 0x134303390>, validated=True, interrupted=True, collected_text='', answering_user_speech=None)", "job_id": "AJ_nPJmwytXoBGM", "pid": 88921}
{"asctime": "2024-05-20 17:55:37,159", "level": "INFO", "name": "livekit.plugins.elevenlabs", "message": "waiting for 11labs message", "job_id": "AJ_nPJmwytXoBGM", "pid": 88921}
{"asctime": "2024-05-20 17:55:37,991", "level": "INFO", "name": "livekit.plugins.elevenlabs", "message": "received 11labs message WSMessage(type=<WSMsgType.TEXT: 1>, data='{\"audio\":\"\",\"isFinal\":null,\"normalizedAlignment\":{\"chars\":[\" \",\"H\",\"e\",\"y\",\",\",\" \",\"h\",\"o\",\"w\",\" \",\"c\",\"a\",\"n\",\" \",\"I\",\" \",\"h\",\"e\",\"l\",\"p\",\" \",\"y\",\"o\",\"u\",\" \",\"t\",\"o\",\"d\",\"a\",\"y\",\"?\",\" \"],\"charStartTimesMs\":[0,35,81,174,221,255,279,313,348,372,406,441,476,511,546,580,615,650,685,720,755,789,813,836,871,906,929,975,1045,1091,1207,1277],\"charDurationsMs\":[35,46,93,47,34,24,34,35,24,34,35,35,35,35,34,35,35,35,35,35,34,24,23,35,35,23,46,70,46,116,70,163]},\"alignment\":{\"chars\":[\"H\",\"e\",\"y\",\",\",\" \",\" \",\"h\",\"o\",\"w\",\" \",\"c\",\"a\",\"n\",\" \",\"I\",\" \",\"h\",\"e\",\"l\",\"p\",\" \",\"y\",\"o\",\"u\",\" \",\"t\",\"o\",\"d\",\"a\",\"y\",\"?\"],\"charStartTimesMs\":[0,81,174,221,255,279,279,313,348,372,406,441,476,511,546,580,615,650,685,720,755,789,813,836,871,906,929,975,1045,1091,1207],\"charDurationsMs\":[81,93,47,34,24,0,34,35,24,34,35,35,35,35,34,35,35,35,35,35,34,24,23,35,35,23,46,70,46,116,233]}}', extra='')", "job_id": "AJ_nPJmwytXoBGM", "pid": 88921}
{"asctime": "2024-05-20 17:55:37,995", "level": "INFO", "name": "livekit.plugins.elevenlabs", "message": "waiting for 11labs message", "job_id": "AJ_nPJmwytXoBGM", "pid": 88921}
{"asctime": "2024-05-20 17:56:48,664", "level": "WARNING", "name": "livekit.plugins.deepgram", "message": "deepgram connection failed, retrying in 0s", "job_id": "AJ_nPJmwytXoBGM", "pid": 88921}
{"asctime": "2024-05-20 17:57:28,808", "level": "WARNING", "name": "livekit.agents", "message": "job is unresponsive", "delay": 27, "job_id": "AJ_nPJmwytXoBGM", "pid": 88921}
{"asctime": "2024-05-20 17:58:03,682", "level": "WARNING", "name": "livekit.plugins.deepgram", "message": "deepgram connection failed, retrying in 2s", "job_id": "AJ_nPJmwytXoBGM", "pid": 88921}

What is the working version of example/kitt?

I am trying to deploy example/kitt locally from the branch neil/kitt, but it does not work. The agent can join the room, but when the LLM replies with text, the agent does not speak. The log is shown below; it looks like TTS is never called in the process. Is the latest example/kitt working on branch neil/kitt? I know there have been many changes in the kitt area. How can I get a working version of the kitt example? Thanks.
image

In Kitt example, eleven labs API throws many time out errors

What I expect

When running the agent, I see logs but no errors.

What happens

The repeated error you get while the agent is running:

ERROR:root:Unhandled message from ElevenLabs: {'message': 'Have not received a new text input within the timeout of 20.0 seconds. Streaming input terminated. Please make sure to either feed the input in the timely manner, or to send the end of input text (empty string "") when done.', 'error': 'input_timeout_exceeded', 'code': 1008}

How to reproduce

Follow the steps to run the kitt example and open it in the playground.

Severity

This error happens repeatedly. After only a few seconds of this failure, a free (unpaid) ElevenLabs account gets banned with this error response:

ERROR:root:Unhandled message from ElevenLabs: {'message': 'Unusual activity detected. Free Tier usage disabled. If you are using a proxy/VPN you might need to purchase a Paid Plan to not trigger our abuse detectors. Free Tier only works if users do not abuse it, for example by creating multiple free accounts. If we notice that many people try to abuse it, we will need to reconsider Free Tier altogether. \nPlease play fair and purchase any Paid Subscription to continue.', 'error': 'detected_unusual_activity', 'code': 1008}
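
The first error message in this report asks the client to send an end-of-input marker (an empty string) when it has no more text. A minimal sketch of that idea, assuming each websocket message is a JSON object with a `text` field (the payload shape and the `text_payloads` helper are assumptions for illustration, not taken from the plugin or confirmed against the ElevenLabs API):

```python
import json
from typing import Iterable, Iterator


def text_payloads(chunks: Iterable[str]) -> Iterator[str]:
    """Yield one JSON message per text chunk, then an end-of-input message.

    The {"text": ...} shape is an assumption; only the empty-string
    end-of-input convention comes from the error message in this report.
    """
    for chunk in chunks:
        if chunk:  # skip empty chunks so they are not mistaken for end of input
            yield json.dumps({"text": chunk})
    # Signal that no more text will follow, so the server does not wait for a timeout.
    yield json.dumps({"text": ""})


if __name__ == "__main__":
    for payload in text_payloads(["Hello ", "", "world."]):
        print(payload)
```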

Adding timestamp (+ customizable logging configuration?) to worker logging

Currently, worker.py does not offer an easy way to customize log formats, particularly to include timestamps in logs for debugging. This makes it difficult to debug our worker deployment.

Feature request
Add logging functionality that lets users customize the log format, either by allowing them to supply a custom logger or simply by adding another CLI flag to toggle timestamps in logs.

Questions for maintainers

  • Have you considered adding timestamps to worker logging?
  • Is there a specific approach you prefer?
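
For reference, the kind of format being requested can be expressed with the standard `logging` module. This is only a sketch of a timestamped format, assuming it is applied before the worker starts; whether the worker's own logging setup would override it is not something this report establishes, and `LOG_FORMAT` and `configure_logging` are hypothetical names:

```python
import logging

# %(asctime)s adds a timestamp such as "2024-05-20 17:55:01,480" to every record.
LOG_FORMAT = "%(asctime)s %(levelname)s %(name)s %(message)s"


def configure_logging(level: int = logging.INFO) -> None:
    """Install a root handler that prefixes each log line with a timestamp.

    force=True replaces any handlers already attached to the root logger.
    """
    logging.basicConfig(level=level, format=LOG_FORMAT, force=True)


if __name__ == "__main__":
    configure_logging()
    logging.getLogger("worker").info("worker started")
```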

Feature Request: Node.js Environment Support for LiveKit Agents

Feature Details:

Node.js SDK Integration: Develop a dedicated Node.js SDK for LiveKit, providing developers with native access to LiveKit functionalities within their Node.js environments. This SDK should offer comprehensive coverage of LiveKit features, ensuring parity with existing SDKs.
NPM Package: Publish the Node.js SDK as an NPM package, facilitating easy installation and dependency management for Node.js projects. This approach aligns with established Node.js development practices and enhances the accessibility of LiveKit within the Node.js ecosystem.
Comprehensive Documentation: Furnish extensive documentation specifically tailored for Node.js developers, offering clear guidance on integrating and utilizing LiveKit functionalities within Node.js applications. This documentation should include code examples, best practices, and troubleshooting tips to streamline the development process.
Support for Node.js Frameworks: Ensure compatibility with popular Node.js frameworks such as Express.js, NestJS, and Fastify, enabling developers to seamlessly integrate LiveKit into their existing projects without friction. Compatibility with these frameworks enhances flexibility and fosters rapid development.
Event Emitters and Promises: Leverage Node.js conventions such as event emitters and promises to provide an intuitive and asynchronous programming model for interacting with LiveKit APIs. This approach aligns with Node.js development paradigms and enhances the developer experience when working with LiveKit in Node.js environments.
Community Engagement: Actively engage with the Node.js developer community through forums, social media, and developer outreach programs to solicit feedback, address concerns, and foster a vibrant ecosystem around LiveKit in Node.js. Incorporating community feedback ensures that the Node.js SDK evolves in alignment with developer needs and industry trends.

Benefits:

Expanded Developer Reach: By supporting Node.js environments, LiveKit can attract a broader range of developers who prefer Node.js for their real-time communication projects, thereby increasing its user base and fostering community growth.
Enhanced Developer Experience: Node.js developers can leverage their existing skills and familiarity with the Node.js ecosystem to seamlessly integrate LiveKit into their projects, resulting in a more streamlined development experience.
Ecosystem Synergy: Integration with Node.js opens up opportunities for collaboration and integration with other Node.js libraries and frameworks, enriching the LiveKit ecosystem and enabling developers to leverage a wider range of tools and resources.

Conclusion:
Introducing native support for Node.js environments within LiveKit represents a significant opportunity to expand the platform's reach, enhance developer experience, and foster ecosystem growth. By embracing Node.js, LiveKit can empower a new wave of developers to build innovative real-time communication applications while enriching the overall developer community.

Minimal Assistant Download Files error

I get the following onnx error when trying to download files. Please help

➜ python minimal_assistant.py download-files
Using cache found in /Users/sajjadk/.cache/torch/hub/snakers4_silero-vad_master
Traceback (most recent call last):
  File "/Users/sajjadk/dev/onset_assist/agents/examples/voice-assistant/minimal_assistant.py", line 43, in <module>
    cli.run_app(WorkerOptions(request_fnc))
  File "/Users/sajjadk/dev/onset_assist/agents/myenv/lib/python3.10/site-packages/livekit/agents/cli/cli.py", line 127, in run_app
    cli()
  File "/Users/sajjadk/dev/onset_assist/agents/myenv/lib/python3.10/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/Users/sajjadk/dev/onset_assist/agents/myenv/lib/python3.10/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
  File "/Users/sajjadk/dev/onset_assist/agents/myenv/lib/python3.10/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/Users/sajjadk/dev/onset_assist/agents/myenv/lib/python3.10/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/Users/sajjadk/dev/onset_assist/agents/myenv/lib/python3.10/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/Users/sajjadk/dev/onset_assist/agents/myenv/lib/python3.10/site-packages/livekit/agents/cli/cli.py", line 124, in download_files
    plugin.download_files()
  File "/Users/sajjadk/dev/onset_assist/agents/myenv/lib/python3.10/site-packages/livekit/plugins/silero/__init__.py", line 29, in download_files
    _ = torch.hub.load(
  File "/Users/sajjadk/dev/onset_assist/agents/myenv/lib/python3.10/site-packages/torch/hub.py", line 568, in load
    model = _load_local(repo_or_dir, model, *args, **kwargs)
  File "/Users/sajjadk/dev/onset_assist/agents/myenv/lib/python3.10/site-packages/torch/hub.py", line 597, in _load_local
    model = entry(*args, **kwargs)
TypeError: silero_vad() got an unexpected keyword argument 'use_onnx'
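
The final TypeError says that the `silero_vad` entry point loaded from the cached hub directory does not declare a `use_onnx` parameter. The report does not show why the caller and the cached code disagree, so the following is only a diagnostic sketch: it uses the standard `inspect` module to check whether a callable accepts a given keyword before it is called. `accepts_keyword` and `fake_entry_point` are hypothetical names introduced for illustration, and the stand-in signature is an assumption, not the real `silero_vad` signature.

```python
import inspect


def accepts_keyword(func, name):
    """Return True if `func` can be called with the keyword argument `name`."""
    params = inspect.signature(func).parameters
    param = params.get(name)
    if param is not None and param.kind in (
        inspect.Parameter.POSITIONAL_OR_KEYWORD,
        inspect.Parameter.KEYWORD_ONLY,
    ):
        return True
    # A **kwargs parameter accepts any keyword name.
    return any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values())


# Hypothetical stand-in for a hub entry point (assumed signature, for illustration only).
def fake_entry_point(onnx=False):
    return {"onnx": onnx}


print(accepts_keyword(fake_entry_point, "use_onnx"))
print(accepts_keyword(fake_entry_point, "onnx"))
```

Running the same check against the function actually loaded from the cache directory named in the log would show which keyword names that copy expects.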

deepgram connection failed

Has anyone encountered this problem?

{"asctime": "2024-05-20 20:00:29,659", "level": "WARNING", "name": "livekit.plugins.deepgram", "message": "deepgram connection failed, retrying in 2s", "job_id": "AJ_oRch8GR9fxqf", "pid": 99836}

{"asctime": "2024-05-20 20:00:31,663", "level": "INFO", "name": "livekit.plugins.deepgram", "message": "connecting to deepgram url wss://api.deepgram.com/v1/listen?model=nova-2-general&punctuate=true&smart_format=true&interim_results=true&encoding=linear16&sample_rate=48000&vad_events=true&channels=1&endpointing=0&language=en-us", "job_id": "AJ_oRch8GR9fxqf", "pid": 99836}

{"asctime": "2024-05-20 20:00:29,662", "level": "ERROR", "name": "livekit.plugins.deepgram", "message": "deepgram Exception.with_tracebackTraceback (most recent call last):\n File \"/Users/yangqingyuan/anaconda3/lib/python3.11/site-packages/aiohttp/connector.py\", line 1025, in _wrap_create_connection\n return await self._loop.create_connection(*args, **kwargs)\n ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n File \"/Users/yangqingyuan/anaconda3/lib/python3.11/asyncio/base_events.py\", line 1085, in create_connection\n raise exceptions[0]\n File \"/Users/yangqingyuan/anaconda3/lib/python3.11/asyncio/base_events.py\", line 1069, in create_connection\n sock = await self._connect_sock(\n ^^^^^^^^^^^^^^^^^^^^^^^^^\n File \"/Users/yangqingyuan/anaconda3/lib/python3.11/asyncio/base_events.py\", line 973, in _connect_sock\n await self.sock_connect(sock, address)\n File \"/Users/yangqingyuan/anaconda3/lib/python3.11/asyncio/selector_events.py\", line 634, in sock_connect\n return await fut\n ^^^^^^^^^\n File \"/Users/yangqingyuan/anaconda3/lib/python3.11/asyncio/selector_events.py\", line 674, in _sock_connect_cb\n raise OSError(err, f'Connect call failed {address}')\nTimeoutError: [Errno 60] Connect call failed ('38.104.135.211', 443)\n\nThe above exception was the direct cause of the following exception:\n\nTraceback (most recent call last):\n File \"/Users/yangqingyuan/anaconda3/lib/python3.11/site-packages/livekit/plugins/deepgram/stt.py\", line 228, in _run\n ws = await self._session.ws_connect(url, headers=headers)\n ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n File \"/Users/yangqingyuan/anaconda3/lib/python3.11/site-packages/aiohttp/client.py\", line 835, in _ws_connect\n resp = await self.request(\n ^^^^^^^^^^^^^^^^^^^\n File \"/Users/yangqingyuan/anaconda3/lib/python3.11/site-packages/aiohttp/client.py\", line 581, in _request\n conn = await self._connector.connect(\n ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n File 
\"/Users/yangqingyuan/anaconda3/lib/python3.11/site-packages/aiohttp/connector.py\", line 544, in connect\n proto = await self._create_connection(req, traces, timeout)\n ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n File \"/Users/yangqingyuan/anaconda3/lib/python3.11/site-packages/aiohttp/connector.py\", line 944, in _create_connection\n _, proto = await self._create_direct_connection(req, traces, timeout)\n ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n File \"/Users/yangqingyuan/anaconda3/lib/python3.11/site-packages/aiohttp/connector.py\", line 1257, in _create_direct_connection\n raise last_exc\n File \"/Users/yangqingyuan/anaconda3/lib/python3.11/site-packages/aiohttp/connector.py\", line 1226, in _create_direct_connection\n transp, proto = await self._wrap_create_connection(\n ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n File \"/Users/yangqingyuan/anaconda3/lib/python3.11/site-packages/aiohttp/connector.py\", line 1033, in _wrap_create_connection\n raise client_error(req.connection_key, exc) from exc\naiohttp.client_exceptions.ClientConnectorError: Cannot connect to host api.deepgram.com:443 ssl:default [Connect call failed ('38.104.135.211', 443)]\n", "job_id": "AJ_oRch8GR9fxqf", "pid": 99836}

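The traceback above ends in a TimeoutError while opening a TCP connection to api.deepgram.com on port 443, which happens before any WebSocket or API-key exchange. A minimal sketch for checking basic reachability from the same machine, using only the standard `socket` module, could look like this. `can_connect` is a hypothetical helper name, and the 5-second default timeout is an arbitrary assumption.

```python
import socket


def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to (host, port) opens within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections, and DNS resolution failures.
        return False


# Example call (host and port taken from the log above):
# can_connect("api.deepgram.com", 443)
```

If this returns False, the failure is probably at the network level (firewall, proxy, or routing) rather than in the plugin itself, though the log alone does not confirm which.
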