Add support for on-demand involvement

Fixes #15
etkecc · Oct 1, 2024 · 9908512
1 parent eae6472
Showing 23 changed files with 828 additions and 325 deletions.
diff --git a/README.md b/README.md
@@ -41,7 +41,7 @@ It's influenced by [chaz](https://github.com/arcuru/chaz), but does **not** use
 
 ![Introduction and general usage](./docs/screenshots/introduction-and-general-usage.webp)
 
-You can find more screenshots on the the [🌟 Features](./docs/features.md) and other [📚 Documentation](./docs/README.md) pages, as well as in the [docs/screenshots](./docs/screenshots) directory.
+You can find more screenshots on the [🌟 Features](./docs/features.md) and other [📚 Documentation](./docs/README.md) pages, as well as in the [docs/screenshots](./docs/screenshots) directory.
 
 
 ## 🚀 Getting Started

diff --git a/docs/access.md b/docs/access.md
@@ -16,6 +16,7 @@ Users:
 
 - ✅ can **invite the bot to rooms**
 - ✅ can **use all the bot's [features](./features.md)** ([💬 Text Generation](./features.md#-text-generation), [🦻 Speech-to-Text](./features.md#-speech-to-text), etc.) by sending room messages
+- ✅ can **mention the bot** in threads and reply chains to provoke it to respond to non-user messages (see [📖 Usage / 💬 Text Generation / On-demand involvement](./usage.md#on-demand-involvement))
 - ✅ can **change the bot's configuration in a room** (e.g. `!bai config room ...` commands)
 - ❌ cannot **change the bot's global configuration** (e.g. `!bai config global ...` commands)
 - ❌ cannot **create new [🤖 Agents](./agents.md)** (neither in rooms, nor globally). See [💼 Room-local agent managers](#-room-local-agent-managers) for controlling which users can create agents.

diff --git a/docs/configuration/text-generation.md b/docs/configuration/text-generation.md
@@ -13,7 +13,7 @@ You may also wish to see:
 
 In Direct Message rooms with the bot (1:1 rooms), it most usually makes sense for the bot to respond to **all** of your messages, as shown on this [🖼️ screenshot](../screenshots/text-generation.webp).
 
-In group rooms (with multiple users), it may be more appropriate for the bot to only respond to messages that are **prefixed** with the command prefix (e.g. `!bai`), so that other chat exchange in the room will not trigger it. Such a setup is shown on this [🖼️ screenshot](../screenshots/text-generation-prefix-requirement.webp).
+In group rooms (with multiple users), it may be more appropriate for the bot to only respond to messages that are **prefixed** with the command prefix (e.g. `!bai`) or which are [mentioning](https://spec.matrix.org/latest/client-server-api/#user-and-room-mentions) the bot (e.g. `@baibot`), so that other chat exchange in the room will not trigger it. Such a setup is shown on the [🖼️ On-demand involvement in the room](../screenshots/text-generation-prefix-requirement.webp) screenshot.
 
 There are exceptions to these rules, and you can configure the bot to respond only to prefixed messages in a 1:1 room, or to respond to all messages even in a multi-user group room.
 
@@ -27,7 +27,10 @@ By default, the bot is **auto-configured (upon joining a new room)** to use the
 
 Example: `!bai config room text-generation set-prefix-requirement-type command_prefix` (this can also be set globally, see [🛠️ Room Settings](./README.md#room-settings))
 
-Regardless of this configuration, **the bot will also respond to messages which directly [mention](https://spec.matrix.org/latest/client-server-api/#user-and-room-mentions) the bot** (e.g. `@baibot`), even if they are not prefixed. An example of this can be seen on this [🖼️ screenshot](../screenshots/text-generation-prefix-requirement.webp).
+Regardless of this configuration, **the bot will also respond to messages by allowed [👥 Users](../access.md#-users) which directly [mention](https://spec.matrix.org/latest/client-server-api/#user-and-room-mentions) the bot** (e.g. `@baibot`), even if they are not prefixed. An example of this can be seen on these screenshots:
+
+- [🖼️ On-demand involvement in a thread](../screenshots/text-generation-on-demand-thread-involvement.webp)
+- [🖼️ On-demand involvement in a reply chain](../screenshots/text-generation-on-demand-reply-involvement.webp)
 
 
 ### 🪄 Auto Usage

diff --git a/docs/features.md b/docs/features.md
@@ -28,6 +28,8 @@ Text Generation is the bot's ability to **respond to users' text messages with t
 
 In multi-user (group) rooms, to avoid disturbing the normal conversation between people, the bot is auto-configured to only respond to messages starting with the command prefix (`!bai`) or direct mentions via the [💬 Text Generation / 🗟 Prefix Requirement Type](./configuration/text-generation.md#-prefix-requirement-type) setting.
 
+Normally, the bot only responds to allowed [👥 Users](./access.md#-users). In certain cases, it's useful for an allowed user to provoke the bot to respond even in foreign threads or reply chains. You can learn more about this feature in the [📖 Usage / 💬 Text Generation / On-demand involvement](./usage.md#on-demand-involvement) section.
+
 A few other features (like [🗣️ Text-to-Speech](#️-text-to-speech) and [🦻 Speech-to-Text](#-speech-to-text)) combine well with Text Generation, so you **don't necessarily need to communicate with the bot via text** (with [Seamless voice interaction](#seamless-voice-interaction), you can communicate only with voice).
 
 You may also wish to see:

diff --git a/docs/screenshots/text-generation-on-demand-reply-involvement.webp b/docs/screenshots/text-generation-on-demand-reply-involvement.webp
diff --git a/docs/screenshots/text-generation-on-demand-thread-involvement.webp b/docs/screenshots/text-generation-on-demand-thread-involvement.webp
diff --git a/docs/usage.md b/docs/usage.md
@@ -11,10 +11,11 @@ This is related to the [💬 Text Generation](./features.md#-text-generation) fe
 
 If there's a text-generation handler agent configured, the bot **may** respond to messages sent in the room.
 
-🖼️ See screenshots of:
+See screenshots of:
 
-- the [default Text Generation flow](./screenshots/text-generation.webp) for 1:1 rooms
-- the [Text Generation flow in multi-user rooms](./screenshots/text-generation-prefix-requirement.webp) (where the [🗟 Prefix Requirement](./configuration/text-generation.md#-prefix-requirement-type) setting is auto-configured to "required")
+- 🖼️ [the default Text Generation flow](./screenshots/text-generation.webp) in 1:1 rooms
+- 🖼️ [the Text Generation flow in multi-user rooms](./screenshots/text-generation-prefix-requirement.webp) (where the [🗟 Prefix Requirement](./configuration/text-generation.md#-prefix-requirement-type) setting is auto-configured to "required")
+- [on-demand involvement](#on-demand-involvement)
 
 Whether the bot responds depends on:
 
@@ -24,12 +25,27 @@ Whether the bot responds depends on:
 
 - (🎨 agent capabilities) whether the configured `text-generation` (or `catch-all`) handler agent actually supports text-generation. The provider may lack support for this feature or it may be disabled in the [🤖 agents](./agents.md) configuration
 
-- (the [🗟 Prefix Requirement](./configuration/text-generation.md#-prefix-requirement-type) setting) whether a prefix (e.g. `!bai`) is required in front of messages sent to the room. For multi-user rooms, this setting defaults to "required"
+- (the [🗟 Prefix Requirement](./configuration/text-generation.md#-prefix-requirement-type) setting) whether a prefix (e.g. `!bai`) or user mention (e.g. `@baibot`) is required for messages sent to the room. For multi-user rooms, this setting defaults to "required". See [on-demand involvement](#on-demand-involvement) for details.
 
 Room messages start a threaded conversation where you can continue back-and-forth communication with the bot.
 
 Unless you've enabled the [♻️ Context Management](./features.md#️-context-management) feature, all messages will be sent to the agent's API each time. If the context management feature is enabled, older messages may be dropped.
 
+#### On-demand involvement
+
+In the following two cases, it's useful to involve the bot in conversations on demand:
+
+1. For multi-user rooms (with the [🗟 Prefix Requirement](./configuration/text-generation.md#-prefix-requirement-type) setting set to "required")
+2. In rooms with foreign users (users that are not authorized bot [👥 users](./access.md#-users))
+
+In these instances, an allowed [👥 user](./access.md#-users) can also provoke the bot to respond to **any** thread or reply chain by [mentioning](https://spec.matrix.org/latest/client-server-api/#user-and-room-mentions) the bot (e.g. `@baibot Hello!`). The following screenshots demonstrate this behavior:
+
+- [🖼️ On-demand involvement in the room](./screenshots/text-generation-prefix-requirement.webp)
+- [🖼️ On-demand involvement in a thread](./screenshots/text-generation-on-demand-thread-involvement.webp) (the Alice user in this example is not an allowed user, yet her messages are still considered as part of the conversation context)
+- [🖼️ On-demand involvement in a reply chain](./screenshots/text-generation-on-demand-reply-involvement.webp) (the Alice user in this example is not an allowed user, yet her messages are still considered as part of the conversation context)
+
+💡 **NOTE**: Normally, the bot **only considers messages from allowed [👥 Users](./access.md#-users)** and ignores all other messages when responding. However, **when the bot is explicitly invoked (via mention)** in a thread or reply chain, **it will consider all messages** in that thread or reply chain (even those from foreign users) as part of the conversation context.
+
 
 ### 🗣️ Text-to-Speech
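
Reviewer note on the `docs/usage.md` hunk above: the context rule stated in the on-demand involvement NOTE can be sketched in a few lines of Rust. This is only an illustration under simplifying assumptions, not the bot's code: `Invocation`, `Message`, and `context_messages` are hypothetical names, and allowed users are modeled as a plain list of user IDs rather than the patterns the bot is configured with.

```rust
/// How the bot was brought into the conversation (hypothetical, simplified).
#[derive(Debug, Clone, Copy, PartialEq)]
enum Invocation {
    /// Regular flow: only messages from allowed users count as context.
    Regular,
    /// Explicit mention inside an existing thread or reply chain.
    OnDemandMention,
}

/// A room message reduced to the fields this sketch needs (hypothetical).
#[derive(Debug, Clone, PartialEq)]
struct Message {
    sender: String,
    body: String,
}

/// Returns the messages that would form the conversation context.
fn context_messages<'a>(
    invocation: Invocation,
    allowed_users: &[&str],
    messages: &'a [Message],
) -> Vec<&'a Message> {
    messages
        .iter()
        .filter(|m| match invocation {
            // Normally, messages from foreign users are ignored.
            Invocation::Regular => allowed_users.iter().any(|u| *u == m.sender.as_str()),
            // When explicitly mentioned, the whole thread or reply chain is kept.
            Invocation::OnDemandMention => true,
        })
        .collect()
}
```

With a thread containing one message from a foreign user and one from an allowed user, `Invocation::Regular` keeps only the allowed user's message, while `Invocation::OnDemandMention` keeps both.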
 

diff --git a/src/bot/messaging.rs b/src/bot/messaging.rs
@@ -11,7 +11,7 @@ use mxlink::{CallbackError, MessageResponseType};
 use tracing::Instrument;
 
 use crate::{
-    conversation::matrix::determine_thread_context_for_room_event,
+    conversation::matrix::determine_interaction_context_for_room_event,
     entity::{MessageContext, MessagePayload, RoomConfigContext, TriggerEventInfo},
 };
 
@@ -239,7 +239,7 @@ impl Messaging {
             }
         };
 
-        let thread_context = determine_thread_context_for_room_event(
+        let interaction_context = determine_interaction_context_for_room_event(
             self.bot.user_id(),
             &room,
             &event,
@@ -248,16 +248,18 @@ impl Messaging {
         )
         .await;
 
-        let thread_context = match thread_context {
+        let interaction_context = match interaction_context {
             Ok(value) => value,
             Err(err) => {
-                tracing::error!(?err, "Failed to determine thread context for event");
+                tracing::error!(?err, "Failed to determine interaction context for event");
                 return Ok(());
             }
         };
 
-        let Some(thread_context) = thread_context else {
-            tracing::debug!("Ignoring message with unknown thread context (likely not a threaded message or a top-level message)");
+        let Some(interaction_context) = interaction_context else {
+            tracing::debug!(
+                "Ignoring message with unknown interaction context (likely not a message for us)"
+            );
             return Ok(());
         };
 
@@ -276,41 +278,21 @@ impl Messaging {
             room_config_context,
             self.bot.admin_pattern_regexes().clone(),
             trigger_event_info,
-            thread_context.info.clone(),
+            interaction_context.thread_info.clone(),
         );
 
-        let bot_display_name = self
-            .bot
-            .room_display_name_fetcher()
-            .own_display_name_in_room(message_context.room())
-            .await;
-
-        let bot_display_name = match bot_display_name {
-            Ok(value) => value,
-            Err(err) => {
-                tracing::warn!(
-                    ?err,
-                    "Failed to fetch bot display name. Proceeding without it"
-                );
-                None
-            }
-        };
-
-        // The first event in the thread determines which handler processes the current event.
         let controller_type = crate::controller::determine_controller(
             self.bot.command_prefix(),
-            &thread_context.first_message,
+            &interaction_context.trigger,
             &message_context,
-            self.bot.user_id(),
-            &bot_display_name,
         );
 
         tracing::info!(?controller_type, "Determined controller");
 
         let _ = room
             .send_single_receipt(
                 ReceiptType::Read,
-                thread_context.info.clone().into(),
+                interaction_context.thread_info.clone().into(),
                 event.event_id.clone(),
             )
             .await;

diff --git a/src/controller/chat_completion/mod.rs b/src/controller/chat_completion/mod.rs
@@ -18,13 +18,28 @@ use crate::entity::roomconfig::{
 use crate::entity::MessagePayload;
 use crate::strings;
 use crate::utils::text_to_speech::create_transcribed_message_text;
-use crate::{conversation::create_llm_conversation_for_matrix_thread, entity::MessageContext, Bot};
+use crate::{
+    conversation::{
+        create_llm_conversation_for_matrix_reply_chain, create_llm_conversation_for_matrix_thread,
+        matrix::create_list_of_bot_user_prefixes_to_strip,
+    },
+    entity::MessageContext,
+    Bot,
+};
 
 #[derive(Debug, PartialEq)]
 pub enum ChatCompletionControllerType {
-    ViaText { prefixes_to_strip: Vec<String> },
+    // Invoked via a command prefix (e.g. `!bai Hello!`)
+    TextCommand,
+    // Invoked via a mention (e.g. `@baibot Hello!`)
+    TextMention,
+    // Invoked via a direct message (e.g. `Hello!`)
+    TextDirect,
+
+    Audio,
 
-    ViaAudio,
+    ThreadMention,
+    ReplyMention,
 }
 
 struct TextToSpeechEligiblePayload {
@@ -125,7 +140,15 @@ pub async fn handle(
                 None
             };
 
-        let response_type = MessageResponseType::InThread(message_context.thread_info().clone());
+        let response_type = match controller_type {
+            // When we're triggered via a reply mention, we reply to the message that triggered us.
+            ChatCompletionControllerType::ReplyMention => {
+                MessageResponseType::Reply(message_context.thread_info().last_event_id.clone())
+            }
+
+            // In all other cases, we're dealing with a threaded conversation, so we reply in the thread.
+            _ => MessageResponseType::InThread(message_context.thread_info().clone()),
+        };
 
         let text_to_speech_eligible_payload = handle_stage_text_generation(
             bot,
@@ -353,24 +376,78 @@ async fn handle_stage_text_generation(
     )
     .await?;
 
-    let prefixes_to_strip = match controller_type {
-        ChatCompletionControllerType::ViaText { prefixes_to_strip } => prefixes_to_strip.clone(),
-        ChatCompletionControllerType::ViaAudio => vec![],
+    // We only strip text from the first message if we're invoked via a command prefix.
+    // Otherwise, we do bot-user mentions stripping on all messages below.
+    let first_message_prefixes_to_strip = match controller_type {
+        ChatCompletionControllerType::TextCommand => vec![bot.command_prefix().to_owned()],
+        _ => vec![],
     };
 
-    let params = MatrixMessageProcessingParams::new(
-        bot.user_id().as_str().to_owned(),
-        message_context.combined_admin_and_user_regexes(),
-    )
-    .with_first_message_stripped_prefixes(prefixes_to_strip);
+    let bot_display_name = bot
+        .room_display_name_fetcher()
+        .own_display_name_in_room(message_context.room())
+        .await;
 
-    let conversation = create_llm_conversation_for_matrix_thread(
-        matrix_link.clone(),
-        message_context.room(),
-        message_context.thread_info().root_event_id.clone(),
-        &params,
-    )
-    .await;
+    let bot_display_name = match bot_display_name {
+        Ok(value) => value,
+        Err(err) => {
+            tracing::warn!(
+                ?err,
+                "Failed to fetch bot display name. Proceeding without it"
+            );
+            None
+        }
+    };
+
+    let bot_user_prefixes_to_strip =
+        create_list_of_bot_user_prefixes_to_strip(bot.user_id(), &bot_display_name);
+
+    let allowed_users = match controller_type {
+        // Regular chat completion only operates on messages from allowed users.
+        ChatCompletionControllerType::TextCommand
+        | ChatCompletionControllerType::TextMention
+        | ChatCompletionControllerType::TextDirect
+        | ChatCompletionControllerType::Audio => {
+            Some(message_context.combined_admin_and_user_regexes())
+        }
+
+        // When we're triggered via an explicit mention (thread or reply), we wish to operate against the mention's whole context
+        // (the whole thread or the whole reply chain upward of the message that triggered us).
+        //
+        // This is to allow admins and users to trigger text-generation for other users' messages.
+        // When we're dragged into a conversation by a known (to us) user, we'd like to process all messages in the conversation,
+        // not just those from allowed users.
+        ChatCompletionControllerType::ThreadMention
+        | ChatCompletionControllerType::ReplyMention => None,
+    };
+
+    let params = MatrixMessageProcessingParams::new(bot.user_id().to_owned(), allowed_users)
+        .with_first_message_prefixes_to_strip(first_message_prefixes_to_strip)
+        .with_bot_user_prefixes_to_strip(bot_user_prefixes_to_strip);
+
+    let conversation = match controller_type {
+        // When we're triggered via a reply mention, the context is the whole reply chain upward of the message that triggered us.
+        ChatCompletionControllerType::ReplyMention => {
+            create_llm_conversation_for_matrix_reply_chain(
+                &bot.room_event_fetcher().clone(),
+                message_context.room(),
+                message_context.thread_info().last_event_id.clone(),
+                &params,
+            )
+            .await
+        }
+
+        // Everything else is happening in a thread, so the context is the whole thread.
+        _ => {
+            create_llm_conversation_for_matrix_thread(
+                matrix_link.clone(),
+                message_context.room(),
+                message_context.thread_info().root_event_id.clone(),
+                &params,
+            )
+            .await
+        }
+    };
 
     let conversation = match conversation {
         Ok(conversation) => conversation,
@@ -565,11 +642,15 @@ async fn handle_stage_speech_to_text_actual_transcribing(
     //
     // Regardless of how we post this message, it will be posted as a notice,
     // which can indicate to the bot (for potential future text-generation purposes) that this message is not a bot message.
-    let (transcribed_text, annotate_message_with_reaction) = if let MessageResponseType::InThread(_) = response_type {
-        (create_transcribed_message_text(&speech_to_text_result.text), false)
-    } else {
-        (speech_to_text_result.text, true)
-    };
+    let (transcribed_text, annotate_message_with_reaction) =
+        if let MessageResponseType::InThread(_) = response_type {
+            (
+                create_transcribed_message_text(&speech_to_text_result.text),
+                false,
+            )
+        } else {
+            (speech_to_text_result.text, true)
+        };
 
     let result = bot
         .messaging()
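
Reviewer note on the `ReplyMention` branch above: the code comments describe the context as "the whole reply chain upward of the message that triggered us". The sketch below shows one way such an upward walk could look. It does not reproduce `create_llm_conversation_for_matrix_reply_chain`, whose body is not part of this diff; `Event`, `reply_chain`, and the in-memory `HashMap` standing in for an event fetcher are all assumptions made for illustration.

```rust
use std::collections::{HashMap, HashSet};

/// A stored event, reduced to what the sketch needs (hypothetical type).
#[derive(Debug, Clone)]
struct Event {
    sender: String,
    body: String,
    /// ID of the event this one replies to, if any.
    in_reply_to: Option<String>,
}

/// Collects the reply chain ending at `trigger_id`, ordered oldest first.
/// The walk stops at an event without a parent, at an unknown event ID,
/// or when an ID repeats (a guard against reply cycles).
fn reply_chain<'a>(events: &'a HashMap<String, Event>, trigger_id: &str) -> Vec<&'a Event> {
    let mut chain = Vec::new();
    let mut seen = HashSet::new();
    let mut current = Some(trigger_id);

    while let Some(id) = current {
        if !seen.insert(id) {
            break;
        }
        let Some(event) = events.get(id) else {
            break;
        };
        chain.push(event);
        current = event.in_reply_to.as_deref();
    }

    // Events were collected newest first; an LLM conversation reads oldest first.
    chain.reverse();
    chain
}
```

Because `allowed_users` is `None` for `ReplyMention` in the diff, a chain assembled this way would not be filtered by sender, which matches the documented behavior for foreign users.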