Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[MRG] added the summary agent #147

Merged
merged 9 commits into from
Sep 3, 2024
Merged

[MRG] added the summary agent #147

merged 9 commits into from
Sep 3, 2024

Conversation

huangyz0918
Copy link
Member

@huangyz0918 huangyz0918 commented Aug 28, 2024

User description

We need to update the Github integration function as well, I will do this in the same PR


PR Type

enhancement


Description

  • Introduced a new SummaryAgent class in mle/agents/summarizer.py to summarize GitHub projects, including project information, structure, issues, and source code.
  • Implemented a summary method to interact with a model and return project summaries in JSON format.
  • Added comments in mle/function/__init__.py for a new GitHub integration function schema, preparing for future enhancements.

Changes walkthrough 📝

Relevant files
Enhancement
summarizer.py
Added `SummaryAgent` class for GitHub project summarization

mle/agents/summarizer.py

  • Introduced a new SummaryAgent class for summarizing GitHub projects.
  • Implemented initialization with model, GitHub repository, and console.
  • Added system prompt and JSON output format for project summarization.
  • Developed a summary method to handle model queries and return
    summaries.
  • +84/-0   
    __init__.py
    Added comments for GitHub integration function schema       

    mle/function/init.py

  • Added comments for a new GitHub integration function schema.
  • Prepared for potential future implementation of project structure
    reading.
  • +16/-0   

    💡 PR-Agent usage:
    Comment /help on the PR to get a list of all available PR-Agent tools and their descriptions

    @dosubot dosubot bot added size:L This PR changes 100-499 lines, ignoring generated files. enhancement New feature or request labels Aug 28, 2024
    Copy link

    PR Reviewer Guide 🔍

    ⏱️ Estimated effort to review: 3 🔵🔵🔵⚪⚪
    🧪 No relevant tests
    🔒 No security concerns identified
    ⚡ Key issues to review

    Code Clarity
    The SummaryAgent class initialization method contains a large block of text for sys_prompt and json_mode_prompt which reduces code readability and maintainability. Consider moving these large text blocks to external files or resources.

    Exception Handling
    The summary method lacks exception handling when parsing JSON, which could lead to unhandled exceptions if the JSON is malformed.

    Copy link

    github-actions bot commented Aug 28, 2024

    PR Code Suggestions ✨

    CategorySuggestion                                                                                                                                    Score
    Possible bug
    Add error handling for JSON parsing to improve robustness

    Use a more structured approach to handle JSON parsing to avoid potential crashes
    from malformed JSON strings. Implement error handling around json.loads to catch and
    handle JSONDecodeError.

    mle/agents/summarizer.py [82]

    -summary = json.loads(text)
    +try:
    +    summary = json.loads(text)
    +except json.JSONDecodeError:
    +    self.console.log("Failed to parse JSON response.")
    +    summary = {}
     
    Suggestion importance[1-10]: 9

    Why: Adding error handling for JSON parsing is crucial for robustness, as it prevents crashes due to malformed JSON, which is a significant improvement.

    9
    Possible issue
    Handle cases of empty user content more explicitly to avoid issues in processing

    Instead of appending empty content directly to chat_history, provide a meaningful
    default message or handle this scenario more gracefully to avoid potential issues
    with processing empty strings.

    mle/agents/summarizer.py [73]

    -self.chat_history.append({"role": "user", "content": ""})  # TODO
    +# If user content is expected to be empty sometimes, handle it explicitly
    +if not some_user_input:
    +    self.chat_history.append({"role": "user", "content": "No input provided"})
    +else:
    +    self.chat_history.append({"role": "user", "content": some_user_input})
     
    Suggestion importance[1-10]: 8

    Why: Handling empty user content explicitly can prevent potential issues during processing, making this a valuable improvement for robustness.

    8
    Best practice
    Improve flexibility and testability by using dependency injection for the console

    Replace the direct instantiation of Console inside the SummaryAgent constructor with
    dependency injection. This change allows for better testing and flexibility, as it
    enables the use of different console implementations or mocks for testing.

    mle/agents/summarizer.py [24-25]

    -if not self.console:
    -    self.console = Console()
    +# Assuming console is injected, no need to check or create a new instance here.
     
    Suggestion importance[1-10]: 7

    Why: The suggestion to use dependency injection for the console improves testability and flexibility, but the existing code already allows for a console to be passed in, so the improvement is minor.

    7
    Maintainability
    Improve code readability and maintainability by externalizing long string assignments

    Refactor the large string assignments for sys_prompt and json_mode_prompt into
    separate methods or external files for better readability and maintainability.

    mle/agents/summarizer.py [26-57]

    -self.sys_prompt = """
    -    You are a software expert tasked with summarizing the Github project provided by the user. The project may
    -     contain the dataset, the source code, and the documentation, etc.
    -    ...
    -self.json_mode_prompt = """
    -    JSON Output Format:
    -    ...
    +self.sys_prompt = self.load_sys_prompt()
    +self.json_mode_prompt = self.load_json_mode_prompt()
     
    Suggestion importance[1-10]: 6

    Why: Refactoring long string assignments into separate methods or files enhances readability and maintainability, but it is not a critical change.

    6

    @dosubot dosubot bot added size:XL This PR changes 500-999 lines, ignoring generated files. and removed size:L This PR changes 100-499 lines, ignoring generated files. labels Sep 2, 2024
    @huangyz0918 huangyz0918 changed the title [WIP] added the summary agent [MRG] added the summary agent Sep 2, 2024
    @huangyz0918
    Copy link
    Member Author

    huangyz0918 commented Sep 2, 2024

    How to test this PR? @leeeizhang @HuaizhengZhang

    run this command under an MLE-Agent project:

    mle start report

    And you will be requiring to enter a GitHub repo name, please make sure you have set the github token before running:

    export GITHUB_TOKEN=xxx
    Screenshot 2024-09-02 at 12 23 57 PM

    @HuaizhengZhang
    Copy link
    Contributor

    image

    why does a public repo require a token to analyze?

    Copy link
    Contributor

    @HuaizhengZhang HuaizhengZhang left a comment

    Choose a reason for hiding this comment

    The reason will be displayed to describe this comment to others. Learn more.

    current workflow is weired

    I need to mle new first and then I can call mle start report for a non-relevant github repo in that project

    @HuaizhengZhang
    Copy link
    Contributor

    I prefer a mle report

    1. if users run mle report under our project without any github links, we generate reports for that project locally.
    2. if users run mle report github-url, we create a project called mle-report-github-repo-name and put report in that project

    Copy link
    Contributor

    @HuaizhengZhang HuaizhengZhang left a comment

    Choose a reason for hiding this comment

    The reason will be displayed to describe this comment to others. Learn more.

    we can merge and then improve next PR

    @dosubot dosubot bot added the lgtm This PR has been approved by a maintainer label Sep 3, 2024
    @leeeizhang
    Copy link
    Collaborator

    I prefer a mle report

    1. if users run mle report under our project without any github links, we generate reports for that project locally.

    2. if users run mle report github-url, we create a project called mle-report-github-repo-name and put report in that project

    awesome!! i will refactor the report workflow today!

    @huangyz0918 huangyz0918 merged commit 8de4cce into main Sep 3, 2024
    3 checks passed
    @huangyz0918 huangyz0918 deleted the feat/summary-agent branch September 3, 2024 02:15
    @huangyz0918
    Copy link
    Member Author

    I prefer a mle report

    1. if users run mle report under our project without any github links, we generate reports for that project locally.
    2. if users run mle report github-url, we create a project called mle-report-github-repo-name and put report in that project

    Agreed, @leeeizhang please file a PR to fix this, thank you!

    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    enhancement New feature or request lgtm This PR has been approved by a maintainer Review effort [1-5]: 3 size:XL This PR changes 500-999 lines, ignoring generated files.
    Projects
    None yet
    Development

    Successfully merging this pull request may close these issues.

    Improve the Github integration functions New summary agent for workspace summary
    3 participants