Introduce SeleniumBrowser #1733

Open
wants to merge 39 commits into base: main

Commits on Feb 19, 2024

  1. Update test_web_surfer.py

    Tests for the new Selenium WebDriver addition
    signalprime authored Feb 19, 2024
    23ee145
  2. Update browser_utils.py

    Adds `SeleniumBrowserWrapper`, `SeleniumBrowser`, and several required helper functions that are part of the upcoming `ContentCollector` class and the `WebCollectionAgent`.
    signalprime authored Feb 19, 2024
    2daec15
  3. Update web_surfer.py

    Provides `SeleniumBrowserWrapper` as an optional drop-in replacement for `SimpleTextBrowser`, for use cases such as pages that depend on JavaScript and pages that block calls from `requests`. Compatibility is preserved almost entirely; the one exception is page numbering.
    signalprime authored Feb 19, 2024
    9efb297
  4. ContentAgent: Custom LLM agent for collecting online content.

    The ContentAgent class is a custom AutoGen agent that collects and stores online content from web pages. It extends the ConversableAgent class and adds functionality for managing a queue of additional links, storing collected content in local directories, and customizing request headers.
    ContentAgent uses a deque to hold additional links for further exploration, with a maximum depth set by the `max_depth` parameter. Collected content is written to local directories under the specified storage path (`storage_path`).
    ContentAgent can be customized with the `request_kwargs` and `llm_config` parameters at instantiation. Requests use a default User-Agent header, which can be overridden by supplying a new dictionary of headers under `request_kwargs`.
    signalprime authored Feb 19, 2024
    217ed91
  5. Update content_agent.py

    Very minor updates prior to submitting a PR
    signalprime authored Feb 19, 2024
    72a165a
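
The link queue described for ContentAgent in commit 217ed91 can be pictured with a small sketch. This is not the agent's code: `crawl`, `get_links`, and the in-memory `SITE` map are hypothetical names that stand in for real page fetching. Only the use of a deque and a `max_depth` limit come from the commit message.

```python
from collections import deque

# Hypothetical in-memory "web": page URL -> links found on that page.
SITE = {
    "https://example.com/": ["https://example.com/a", "https://example.com/b"],
    "https://example.com/a": ["https://example.com/c"],
    "https://example.com/b": ["https://example.com/a"],
    "https://example.com/c": ["https://example.com/d"],
}


def get_links(url):
    """Stand-in for fetching a page and extracting its links."""
    return SITE.get(url, [])


def crawl(start_url, max_depth=1):
    """Visit pages breadth-first, following links at most max_depth hops from start_url."""
    queue = deque([(start_url, 0)])
    seen = {start_url}
    visited = []
    while queue:
        url, depth = queue.popleft()
        visited.append(url)
        if depth >= max_depth:
            continue  # do not queue links beyond the depth limit
        for link in get_links(url):
            if link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))
    return visited


order = crawl("https://example.com/", max_depth=2)
```

Popping from the left of the deque gives breadth-first order, so pages closer to the starting URL are collected before deeper ones, and the depth check keeps the crawl bounded.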

Commits on Feb 20, 2024

  1. Update browser_utils.py

    Small fix in the `fix_missing_protocol` function
    signalprime authored Feb 20, 2024
    46b2424
  2. Update content_agent.py

    Small addition to maintain a dictionary of processed HTML content, keyed by source URL
    signalprime authored Feb 20, 2024
    d34ae1b
  3. Update content_agent.py

    signalprime authored Feb 20, 2024
    1ba9e05
  4. Unit Tests for the ContentAgent

    We cover a small sample of websites, asserting expectations against a number of measurements performed on the collected content.  
    
    The assertions include, but are not limited to: 
    - the expected variables contain values
    - the presence of the expected output files
    - that the expected output files are not empty
    
    Further improvements can include:
    - evaluation against all choices of WebDriver to confirm functionality 
    - evaluation against a larger sample of websites
    signalprime authored Feb 20, 2024
    84fa1b8
  5. Update browser_utils.py

    Note that `_set_page_content`, `_split_pages`, and `viewport` are likely not yet compatible with the Selenium browser wrapper class, but they do not appear to be necessary for it at this time.
    signalprime authored Feb 20, 2024
    67f95bf
  6. Update web_surfer.py

    Small updates to imports that were recently refactored to other locations. Specifically:
    ```
    from ..agent import Agent
    from .. import ConversableAgent, AssistantAgent, UserProxyAgent, GroupChatManager, GroupChat
    from ...oai.client import OpenAIWrapper
    ```
    signalprime authored Feb 20, 2024
    08f8ff9
  7. Update content_agent.py

    A small change that declares `self.browser_kwargs` before initializing the parent class (ConversableAgent). This avoids triggering an unexpected argument error for `browser_kwargs`.
    signalprime authored Feb 20, 2024
    3954412
  8. Update content_agent.py

    Fixes the following pre-commit errors:
    autogen/agentchat/contrib/content_agent.py:21:1: E402 Module level import not at top of file
    autogen/agentchat/contrib/content_agent.py:34:1: E402 Module level import not at top of file
    autogen/agentchat/contrib/content_agent.py:65:33: F811 Redefinition of unused `deque` from line 6
    autogen/agentchat/contrib/content_agent.py:374:26: F811 Redefinition of unused `filename` from line 7
    signalprime authored Feb 20, 2024
    749a556
  9. Update browser_utils.py

    Fixing the redundant import of selenium webdriver within `SeleniumBrowser`
    signalprime authored Feb 20, 2024
    818a010
  10. Update content_agent.py

    Small corrections based on pre-commit errors, both resulting in removed code:
    content_agent.py:94:9: F821 Undefined name `f`
    content_agent.py:371:26: F811 Redefinition of unused `filename` from line 21
    signalprime authored Feb 20, 2024
    643bad0
  11. Update browser_utils.py

    pre-commit fixes for:
    autogen/browser_utils.py:455: argumnets ==> arguments
    autogen/browser_utils.py:486: compatability ==> compatibility
    signalprime authored Feb 20, 2024
    20cd2a6
  12. Update test_web_surfer.py

    Still a bit new to the unit test framework and had to remove some conditional statements that are covered elsewhere
    signalprime authored Feb 20, 2024
    0389387
  13. Updates to include selenium in websurfer extras, webdrivers in the python-package.yml workflow, and additional small fixes to bring the PR into compliance

    signalprime committed Feb 20, 2024
    be89b9b
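
Commit 3954412 above sets `self.browser_kwargs` before the parent initializer runs so that the parent never receives an argument it does not know. A minimal sketch of that pattern follows. `BaseAgent` and `CollectorAgent` are hypothetical stand-ins, not the AutoGen classes, and the parent's parameters here are assumptions made only for illustration.

```python
class BaseAgent:
    """Hypothetical parent class that accepts a fixed set of keyword arguments."""

    def __init__(self, name, llm_config=None):
        self.name = name
        self.llm_config = llm_config


class CollectorAgent(BaseAgent):
    """Hypothetical subclass that accepts an extra `browser_kwargs` option."""

    def __init__(self, *args, browser_kwargs=None, **kwargs):
        # Capture the subclass-only option by name so it never lands in **kwargs,
        # and store it before the parent initializer is called.
        self.browser_kwargs = browser_kwargs or {}
        super().__init__(*args, **kwargs)


agent = CollectorAgent("collector", browser_kwargs={"browser": "firefox"})

# Passing the option straight to the parent raises TypeError instead.
try:
    BaseAgent("collector", browser_kwargs={"browser": "firefox"})
    forwarded_ok = True
except TypeError:
    forwarded_ok = False
```

Naming the option in the subclass signature is one way to keep it out of the keyword arguments passed upward; popping it from `kwargs` before calling `super().__init__` would have the same effect.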

Commits on Feb 22, 2024

  1. 0a40763
  2. 25e15e0
  3. Restored to original form in official main branch. Added for clarity. Updated to account for refactoring. All updates now stable and done. Inside Dev Docker, all test files passed, all pre-commit checks passed.

    signalprime committed Feb 22, 2024
    5602958
  4. Further cleaned the two test files and confirmed they passed using the dev docker and the pytest library

    signalprime committed Feb 22, 2024
    8954fef
  5. 0c2202c
  6. Update contrib-tests.yml for Selenium

    This update should cause GitHub to use the WebSurfer extras when testing test_web_surfer_selenium.py.
    signalprime authored Feb 22, 2024
    13ba006
  7. Update contrib-openai.yml

    Adding coverage within the Websurfer workflow for this PR: 
    - test/agentchat/contrib/test_web_surfer_selenium.py 
    - test/agentchat/contrib/test_content_agent.py
    signalprime authored Feb 22, 2024
    e1e81f6
  8. Update contrib-tests.yml

    Adding `test/agentchat/contrib/test_content_agent.py --skip-openai` under the assumption that all test files must be accounted for or they will rely on the default workflow. This test requires OpenAI calls, but it still needs to be registered in this file to avoid build errors.
    signalprime authored Feb 22, 2024
    0b5e733
  9. Update contrib-openai.yml

    Removed duplicate entry for test_web_surfer_selenium.py
    signalprime authored Feb 22, 2024
    9099b57
  10. Update setup.py

    Added the missing `pillow` dependency for graphics-based web browsing and downstream tasks
    signalprime authored Feb 22, 2024
    7443458
  11. Update test_content_agent.py

    Moving the ContentAgent import to be conditional on "not skip_oai" in the hope that it helps avoid the `markdownify` import error during build tests.
    signalprime authored Feb 22, 2024
    1b87acd
  12. 11b00e5
  13. 66ac7bd
  14. 6fbe0b8
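
Commit 1b87acd above makes an import conditional on a skip flag so that a missing optional dependency does not break the build. One way to picture the idea is the sketch below. The helper `optional_import` is hypothetical and not part of the PR, and catching `ImportError` is an extra safeguard added here for illustration rather than something the commit message describes.

```python
import importlib


def optional_import(module_name, skip):
    """Import module_name unless skip is set; return None when skipped or unavailable."""
    if skip:
        return None
    try:
        return importlib.import_module(module_name)
    except ImportError:
        return None


skip_oai = True  # hypothetical flag, mirroring the name used in the commit message

# With the flag set, the import is never attempted, so a missing dependency cannot fail here.
skipped = optional_import("markdownify", skip=skip_oai)

# A standard-library module shows the non-skipped path without extra installs.
loaded = optional_import("json", skip=False)

# An unavailable module yields None instead of raising at import time.
missing = optional_import("module_that_does_not_exist_xyz", skip=False)
```

Tests that depend on the guarded module can then check for `None` and skip themselves rather than failing during collection.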

Commits on Feb 25, 2024

  1. 451405b
  2. Provided a more descriptive name for the agent responsible for collecting web data. Added '_' to internal functions and docstrings to the web_archiver_agent.py file.

    signalprime committed Feb 25, 2024
    c06f6fd

Commits on Mar 26, 2024

  1. Update web_surfer.py

    Rename `_set_page_content` to `set_page_content`
    signalprime authored Mar 26, 2024
    ef7586e
  2. Update browser_utils.py

    Rename `_set_page_content` to `set_page_content`
    signalprime authored Mar 26, 2024
    2be44bc
  3. Update browser_utils.py

    Removing the exception messages related to Selenium
    signalprime authored Mar 26, 2024
    e64ae32
  4. Update contrib-openai.yml

    Minor fix to permit testing
    signalprime authored Mar 26, 2024
    3e7cf18
  5. 841ed31