Add Text Exercise Chat Pipeline #161

Merged (23 commits) on Oct 12, 2024

Conversation

MichaelOwenDyer (Contributor) commented Sep 24, 2024

This PR adds the text-exercise-chat pipeline to Pyris to enable the new Artemis feature implemented here.

The pipeline works as follows:

Inputs:

  • Some info about the text exercise the user is working on and its course (titles, descriptions, etc)
  • The user's latest submission to the exercise (we can't read form data yet so this will have to do)
  • The chat history with Iris

First, the sentiments in the latest message in the chat history (the one the user sent just now) are analyzed, and categorized by relevance to the exercise at hand. Every sentiment is either "Ok" if it is clearly related to the exercise, "Bad" if it is off-topic or inappropriate, or "Neutral" if it is neither on-topic nor off-topic (like a greeting or thanks).

Then, we construct a system prompt for the actual response to the user, where we explain that the AI is a writing tutor for this particular exercise and that it should help with the user's current submission. This system prompt is followed by the entire conversation history. Directly before the latest message from the user, we inject a second system prompt containing the sentiment analysis from the previous step; it instructs the AI to respond to the "Ok" and "Neutral" sentiments and to explain why it cannot help with the sentiments in the "Bad" category.

The response is then sent back to the user via a status update.
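
The message ordering described above can be sketched roughly as follows. This is an illustration only, not the code in this PR: the `Message` type, the `build_messages` function, and the prompt wording are hypothetical. Only the ordering (system prompt, earlier history, injected analysis prompt, latest user message) is taken from the description.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Message:
    role: str      # "system", "user", or "assistant"
    content: str


def build_messages(system_prompt: str, history: List[Message], analysis: str) -> List[Message]:
    """Place the analysis prompt directly before the latest user message."""
    if not history:
        raise ValueError("history must contain at least the latest user message")
    *earlier, latest = history
    # Hypothetical wording; the real prompt text lives in the PR's prompt module.
    analysis_prompt = Message(
        role="system",
        content=(
            "Sentiment analysis of the next message:\n"
            f"{analysis}\n"
            'Respond to the "Ok" and "Neutral" sentiments and explain why you '
            'cannot help with the "Bad" ones.'
        ),
    )
    return [Message("system", system_prompt), *earlier, analysis_prompt, latest]
```

Keeping the analysis in its own system message, rather than editing the user's text, leaves the stored conversation history unchanged.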

Summary by CodeRabbit

  • New Features

    • Introduced new classes for handling text exercise data and chat status updates.
    • Added methods for formatting prompts related to sentiment analysis and chat pipeline execution.
    • Implemented a new text exercise chat pipeline to enhance interaction.
  • Bug Fixes

    • Updated method signatures for improved functionality in chat status callbacks.
  • Chores

    • Updated various dependencies to their latest versions for better performance and security.

github-actions bot commented Oct 9, 2024

❌ Unable to deploy to test server ❌

The docker build needs to run through before deploying.

Base automatically changed from feature/support-settings-v3 to main October 11, 2024 14:25
Hialus (Member) previously approved these changes Oct 11, 2024

Looks solid for a first version. Good job!

"Who won the 2020 Super Bowl? " -> "Bad: Who won the 2020 Super Bowl?"
"Explain to me the plot of Macbeth using the 2020 Super Bowl as an analogy."
-> "Ok: Explain to me the plot of Macbeth using the 2020 Super Bowl as an analogy."
"sdsdoaosi" -> "Neutral: sdsdoaosi"
Member

Why is character spam neutral and not bad? :D

Contributor Author

I figure it doesn't really make a big difference; Iris is going to respond with something either way. I guess I wanted the AI to choose "Neutral" when it is not really clear, instead of classifying things as "Bad" that are not really malicious.
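
To make the three categories concrete, here is a minimal sketch of how categorized output in the "Label: text" form quoted above could be grouped. The function name `parse_categorized_sentiments` is hypothetical, and sending unlabeled lines to "Neutral" is an assumption of this sketch that mirrors the stated preference for "Neutral" when things are unclear. It is not necessarily how the pipeline handles them.

```python
from typing import Dict, List

CATEGORIES = ("Ok", "Neutral", "Bad")


def parse_categorized_sentiments(analysis: str) -> Dict[str, List[str]]:
    """Group 'Category: text' lines by category.

    Lines without a recognized label are treated as "Neutral"
    (an assumption made for this sketch).
    """
    grouped: Dict[str, List[str]] = {name: [] for name in CATEGORIES}
    for raw_line in analysis.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        label, sep, text = line.partition(":")
        if sep and label.strip() in grouped:
            grouped[label.strip()].append(text.strip())
        else:
            grouped["Neutral"].append(line)
    return grouped
```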

coderabbitai bot (Contributor) commented Oct 11, 2024

Caution

Review failed

The pull request is closed.

Walkthrough

This pull request introduces several new data transfer object (DTO) classes and methods related to text exercises and chat pipelines. It adds fields to existing classes, implements new methods for formatting prompts, and updates the pipeline execution logic. Additionally, it modifies the constructor signatures of several classes and updates multiple dependency versions in the requirements file.

Changes

| File Path | Change Summary |
| --- | --- |
| app/domain/data/text_exercise_dto.py | Added TextExerciseDTO class with fields: id, title, course, problem_statement, start_date, end_date. |
| app/domain/status/text_exercise_chat_status_update_dto.py | Added TextExerciseChatStatusUpdateDTO class with field: result. |
| app/domain/text_exercise_chat_pipeline_execution_dto.py | Added TextExerciseChatPipelineExecutionDTO class with fields: execution, exercise, conversation, current_submission. |
| app/pipeline/prompts/text_exercise_chat_prompts.py | Added methods for formatting various prompts: fmt_extract_sentiments_prompt, fmt_sentiment_analysis_prompt, fmt_system_prompt. |
| app/pipeline/text_exercise_chat_pipeline.py | Added TextExerciseChatPipeline class with methods: __call__, categorize_sentiments_by_relevance, respond. |
| app/web/routers/pipelines.py | Added methods: run_text_exercise_chat_pipeline_worker, run_text_exercise_chat_pipeline, and updated get_pipeline method. |
| app/web/status/status_update.py | Updated constructors for TextExerciseChatCallback, CourseChatStatusCallback, and ExerciseChatStatusCallback. |
| requirements.txt | Updated dependency versions for fastapi, langchain, ollama, openai, pre-commit, pydantic, PyMuPDF, pytz, sentry-sdk, unstructured, uvicorn, weaviate-client. |

Possibly related PRs

  • Add Text Exercise Chat Pipeline #161: This PR directly relates to the main PR as it introduces the same TextExerciseDTO class and TextExerciseChatPipelineExecutionDTO class, along with similar fields and methods, indicating a strong connection at the code level.

Suggested reviewers

  • Hialus
  • kaancayli

coderabbitai bot (Contributor) left a comment

Actionable comments posted: 12

🧹 Outside diff range and nitpick comments (4)
app/domain/text_exercise_chat_pipeline_execution_dto.py (2)

1-5: LGTM! Consider removing the extra empty line.

The imports are appropriate for the class being defined. They include necessary components from pydantic and custom DTOs from the app.domain module.

Consider removing the extra empty line (line 6) for better code organization:

 from app.domain.data.text_exercise_dto import TextExerciseDTO
 
-
 class TextExerciseChatPipelineExecutionDTO(BaseModel):

11-11: LGTM! Consider using type annotation for consistency.

The 'current_submission' field is well-defined with an appropriate alias and default value. For consistency with other fields, consider adding a type annotation:

-    current_submission: str = Field(alias="currentSubmission", default="")
+    current_submission: str = Field(alias="currentSubmission", default="")

This change doesn't affect functionality but improves code readability and consistency.

app/domain/data/text_exercise_dto.py (1)

10-15: LGTM: Field declarations are well-defined, with a minor suggestion for consistency.

The field declarations for the TextExerciseDTO class are appropriate and well-structured. The use of Pydantic's Field for aliases and default values is correct. The relationship with CourseDTO is properly established.

For consistency, consider using Field for all fields, even those without aliases or default values. This can make future modifications easier and provide a uniform structure. For example:

id: int = Field(...)
title: str = Field(...)
course: CourseDTO = Field(...)

The ... notation in Pydantic indicates a required field without a default value.

app/web/status/status_update.py (1)

224-253: LGTM: TextExerciseChatCallback class implemented correctly

The new TextExerciseChatCallback class is well-structured and consistent with other callback classes. It correctly initializes the necessary components and adds appropriate stages for the text exercise chat pipeline.

One minor suggestion for improvement:

Consider initializing current_stage_index explicitly instead of using stage = len(stages). This would make the code more consistent with other callback classes:

-        stage = len(stages)
+        current_stage_index = len(stages)
         stages += [
             StageDTO(
                 weight=30,
@@ -249,8 +249,8 @@ class TextExerciseChatCallback(StatusCallback):
             run_id,
             TextExerciseChatStatusUpdateDTO(stages=stages),
-            stages[stage],
-            stage,
+            stages[current_stage_index],
+            current_stage_index,
         )
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between b86901c and 68f1be5.

📒 Files selected for processing (8)
  • app/domain/data/text_exercise_dto.py (1 hunks)
  • app/domain/status/text_exercise_chat_status_update_dto.py (1 hunks)
  • app/domain/text_exercise_chat_pipeline_execution_dto.py (1 hunks)
  • app/pipeline/prompts/text_exercise_chat_prompts.py (1 hunks)
  • app/pipeline/text_exercise_chat_pipeline.py (1 hunks)
  • app/web/routers/pipelines.py (3 hunks)
  • app/web/status/status_update.py (2 hunks)
  • requirements.txt (1 hunks)
✅ Files skipped from review due to trivial changes (1)
  • requirements.txt
🧰 Additional context used
🔇 Additional comments (14)
app/domain/status/text_exercise_chat_status_update_dto.py (2)

1-1: LGTM: Import statement is correct.

The import statement correctly imports the parent class StatusUpdateDTO from the appropriate module.


4-4: LGTM: Class definition is correct.

The class TextExerciseChatStatusUpdateDTO is properly defined and inherits from StatusUpdateDTO as expected.

app/domain/text_exercise_chat_pipeline_execution_dto.py (4)

7-7: LGTM! Class definition is appropriate.

The class name TextExerciseChatPipelineExecutionDTO is descriptive and follows the DTO naming convention. Inheriting from pydantic.BaseModel is appropriate for creating a data transfer object with built-in validation.


8-8: LGTM! Field declaration for 'execution' is appropriate.

The 'execution' field of type PipelineExecutionDTO is well-defined to represent the execution context of the pipeline.


9-9: LGTM! Field declaration for 'exercise' is appropriate.

The 'exercise' field of type TextExerciseDTO is well-defined to represent the details of the text exercise.


10-10: LGTM! Field declaration for 'conversation' is well-implemented.

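The 'conversation' field is correctly defined as a list of PyrisMessage objects. Field(default=[]) provides a default empty list; Pydantic copies mutable defaults for each instance, so the list is not shared between instances the way a plain mutable default argument would be. Field(default_factory=list) expresses the same intent more explicitly.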

app/domain/data/text_exercise_dto.py (4)

1-2: LGTM: Standard library imports are correct and well-organized.

The imports from the standard library (datetime and typing) are appropriate for the class definition and follow PEP 8 guidelines for import order.


4-4: LGTM: Third-party library import is correct.

The import from Pydantic is appropriate for defining the data transfer object and follows PEP 8 guidelines for import order.


6-6: LGTM: Local module import is correct.

The import of CourseDTO from the local module is appropriate for the class definition and follows PEP 8 guidelines for import order.


9-9: LGTM: Class definition is correct and follows best practices.

The TextExerciseDTO class is properly defined, inheriting from Pydantic's BaseModel. The class name follows Python naming conventions, and the empty line before the class definition adheres to PEP 8 guidelines.

app/web/status/status_update.py (3)

8-22: LGTM: Import statements updated correctly

The changes to import statements, including the switch to absolute imports and the addition of TextExerciseChatStatusUpdateDTO, are appropriate and consistent with the new TextExerciseChatCallback class. This improves code maintainability and reduces the risk of circular imports.


Line range hint 1-274: Summary of changes in status_update.py

The changes in this file successfully introduce the new TextExerciseChatCallback class, which aligns with the PR objectives for implementing the text exercise chat pipeline. The import statements have been updated appropriately, and the new class is well-structured and consistent with existing callback classes.

However, there are a couple of points that require attention:

  1. A minor suggestion for improving the TextExerciseChatCallback class initialization has been provided.
  2. The removal of default values for initial_stages in existing callback classes could potentially be a breaking change. This needs verification and possibly updates to all calling code.

Overall, the changes are well-implemented but require some minor adjustments and verification to ensure smooth integration with the existing codebase.


Line range hint 180-182: Verify the intentionality of removing default values for initial_stages

The __init__ method signatures for CourseChatStatusCallback and ExerciseChatStatusCallback have been updated to remove the default value for initial_stages. This change makes the parameter required, which could potentially break existing code that doesn't provide this argument.

Please confirm if this is an intentional change. If so, ensure that all callers of these classes have been updated to provide the initial_stages argument.

To verify the impact of this change, you can run the following script to find all instances where these classes are instantiated:

Please review the results to ensure that all instantiations provide the initial_stages argument.
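
The verification script mentioned above is not reproduced in this capture. As a stand-in, here is a minimal sketch of one way to run such a check with Python's `ast` module. The function name `find_instantiations` is hypothetical, and the sketch only detects `initial_stages` when it is passed as a keyword argument; calls that pass it positionally are reported as `False` and need a manual look.

```python
import ast
from typing import List, Tuple

TARGET_CLASSES = {"CourseChatStatusCallback", "ExerciseChatStatusCallback"}


def find_instantiations(source: str, keyword: str = "initial_stages") -> List[Tuple[str, int, bool]]:
    """Return (class_name, line_number, passes_keyword) for each call to a target class."""
    results = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        # Handle both `Name(...)` and `module.Name(...)` call forms.
        name = func.id if isinstance(func, ast.Name) else getattr(func, "attr", None)
        if name in TARGET_CLASSES:
            passes = any(kw.arg == keyword for kw in node.keywords)
            results.append((name, node.lineno, passes))
    return sorted(results, key=lambda item: item[1])
```

To scan a project, the function could be applied to the text of each `.py` file, for example via `pathlib.Path("app").rglob("*.py")`.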

Also applies to: 206-208

✅ Verification successful

coderabbitai bot (Contributor) left a comment

Actionable comments posted: 1

🧹 Outside diff range and nitpick comments (2)
app/pipeline/text_exercise_chat_pipeline.py (2)

18-18: Unused logger instance

The logger instance is initialized but not used in the code. If logging is intended, consider utilizing logger to record important information or debug messages. If logging is not needed, you may remove the logger initialization to clean up the code.
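
If the logger is kept, a minimal sketch of putting it to use might look like this. The `categorize` function is a toy stand-in rather than the pipeline's method; only the standard `logging` calls are real.

```python
import logging

logger = logging.getLogger(__name__)


def categorize(message: str) -> str:
    """Toy stand-in for a pipeline step that records what it is doing."""
    logger.debug("Categorizing message of length %d", len(message))
    if not message.strip():
        logger.warning("Received an empty message; treating it as Neutral")
        return "Neutral"
    return "Ok"
```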


111-111: Use UTC time for consistent timestamping

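Consider using a timezone-aware UTC timestamp, for example datetime.now(timezone.utc), instead of datetime.now() for current_date. This avoids issues related to time zone differences and ensures consistency across different environments. (datetime.utcnow() also yields UTC, but it returns a naive datetime and is deprecated as of Python 3.12.)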
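
A minimal sketch of UTC timestamping with the standard library, using `datetime.now(timezone.utc)` so that the result carries its offset. The helper names `current_utc_timestamp` and `format_for_prompt` are hypothetical.

```python
from datetime import datetime, timezone


def current_utc_timestamp() -> datetime:
    """Return the current time as a timezone-aware datetime in UTC."""
    return datetime.now(timezone.utc)


def format_for_prompt(moment: datetime) -> str:
    """Render a timestamp unambiguously, including its UTC offset."""
    return moment.isoformat(timespec="seconds")
```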

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between 68f1be5 and 28b0282.

📒 Files selected for processing (1)
  • app/pipeline/text_exercise_chat_pipeline.py (1 hunks)
🧰 Additional context used
🔇 Additional comments (4)
app/pipeline/text_exercise_chat_pipeline.py (4)

16-16: Duplicate Comment: Correct the import path for fmt_sentiment_analysis_prompt

The previous review comment regarding the incorrect import path for fmt_sentiment_analysis_prompt is still valid. The import path should be consistent with the other imports from app.pipeline.prompts.text_exercise_chat_prompts to prevent an ImportError.


47-47: Duplicate Comment: Check if callback is not None before invoking methods

The previous review comment about ensuring self.callback is not None before calling its methods is still applicable. Since callback is optional, attempting to invoke self.callback.done(...) without checking could raise an AttributeError.

Also applies to: 50-50
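
The guard being asked for can be sketched as below. `RecordingCallback` and `Pipeline` are hypothetical stand-ins and do not reflect the real class definitions; the point is only the `is not None` check before calling `done`.

```python
from typing import List, Optional


class RecordingCallback:
    """Hypothetical stand-in for a status callback; it only records messages."""

    def __init__(self) -> None:
        self.messages: List[str] = []

    def done(self, message: str) -> None:
        self.messages.append(message)


class Pipeline:
    def __init__(self, callback: Optional[RecordingCallback] = None) -> None:
        self.callback = callback

    def run(self) -> str:
        result = "response"
        # Guard the optional callback so a missing one is skipped
        # instead of raising AttributeError.
        if self.callback is not None:
            self.callback.done("Generated response")
        return result
```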


54-54: Duplicate Comment: Update type annotations to use Tuple and List from typing

The previous review comment regarding updating the type annotations to use Tuple and List from the typing module is still valid. This change improves code clarity and type checking.

Also applies to: 96-96
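
For reference, the annotation style being suggested looks like the sketch below; `split_by_relevance` is a toy function used only to show the syntax. On Python 3.9 and later, the built-in form `tuple[list[str], list[str]]` is accepted as well, so the `typing` aliases mainly matter for older interpreters.

```python
from typing import List, Tuple


def split_by_relevance(labels: List[str]) -> Tuple[List[str], List[str]]:
    """Return (relevant, other) labels."""
    relevant = [label for label in labels if label == "Ok"]
    other = [label for label in labels if label != "Ok"]
    return relevant, other
```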


76-79: Duplicate Comment: Add exception handling for request_handler.chat calls

The previous review comment suggesting adding exception handling around self.request_handler.chat calls remains applicable. Implementing exception handling will make the pipeline more robust against network issues or API failures.

Also applies to: 135-137
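
One possible shape for this handling is sketched below. `safe_chat` is a hypothetical helper, and `chat` stands in for `self.request_handler.chat`, whose real signature and exception types are not shown in this conversation, which is why the sketch catches `Exception` broadly.

```python
import logging
from typing import Callable, Optional

logger = logging.getLogger(__name__)


def safe_chat(chat: Callable[[str], str], prompt: str) -> Optional[str]:
    """Call a chat function and return None instead of raising on failure."""
    try:
        return chat(prompt)
    except Exception:  # broad on purpose: the real exception types are not known here
        logger.exception("Chat request failed")
        return None
```

A caller can then check for `None` and report an error status to the user rather than letting the worker crash.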

kaancayli (Contributor) left a comment

Changes look good, one suggestion

kaancayli previously approved these changes Oct 11, 2024
@MichaelOwenDyer MichaelOwenDyer dismissed kaancayli’s stale review October 11, 2024 17:09

The merge-base changed after approval.

kaancayli (Contributor) previously approved these changes Oct 11, 2024

Reapprove

@MichaelOwenDyer MichaelOwenDyer dismissed kaancayli’s stale review October 11, 2024 17:26

The merge-base changed after approval.

Hialus previously approved these changes Oct 11, 2024
@MichaelOwenDyer MichaelOwenDyer dismissed Hialus’s stale review October 11, 2024 19:27

The merge-base changed after approval.

@krusche krusche merged commit cc001dd into main Oct 12, 2024
4 checks passed
@krusche krusche deleted the feature/add-text-exercise-support branch October 12, 2024 18:31
isabellagessl pushed a commit that referenced this pull request Nov 11, 2024
4 participants