
[Roadmap]: Roadmap about Enhanced non-OpenAI models #2946

Open
6 of 9 tasks
qingyun-wu opened this issue Jun 14, 2024 · 18 comments
Labels
roadmap Issues related to roadmap of AutoGen

Comments

@qingyun-wu
Collaborator

qingyun-wu commented Jun 14, 2024

One major milestone in releases v0.2.30-v0.2.32 will be enhanced support for non-OpenAI models.

Plan for the next release v0.2.32:

Tasks about enhanced non-OpenAI Model Support

@marklysze, @Hk669, @yiranwu0, feel free to add tasks to the task list.

💡 Feel free to suggest any other pressing features you want to see, issues you hope to be addressed, or PRs to be merged in the next release!

And let's make it happen together 🏆💪!

Finished tasks in releases v0.2.30 and v0.2.31

Tasks about enhanced non-OpenAI Model Support

@Hk669
Collaborator

Hk669 commented Jun 14, 2024

thanks @qingyun-wu.

@Hk669 Hk669 added the roadmap Issues related to roadmap of AutoGen label Jun 14, 2024
@Josephrp
Collaborator

May I suggest the 01.AI Yi model family APIs? I just got the docs, and bringing them to AutoGen was actually already on my list :-) They are OpenAI drop-in compatible already.

@Josephrp
Collaborator

Also, the Cohere API has high potential since they have a performant function-calling model, among other interesting offerings. Just a thought.

@qingyun-wu
Collaborator Author

May I suggest the 01.AI Yi model family APIs? I just got the docs, and bringing them to AutoGen was actually already on my list :-) They are OpenAI drop-in compatible already.

Good ideas! Let's see if we can find volunteers to add those!

@qingyun-wu qingyun-wu pinned this issue Jun 14, 2024
@PanQiWei

Hi, I find this roadmap is mainly about client implementations. Here is an optimization suggestion I would very much appreciate if you could also implement in 0.2.30 🙏

Suggestion

Implement an object pool to cache clients, to avoid repeatedly instantiating clients with the same key.

Reason

This really hurts agents' init speed when there are tons of agents and tools. Especially for tools/functions: every time a tool is registered to an LLM, a new OpenAIWrapper is created to update the config. That would be fine if only the request config (payload) were updated, but in the current implementation this always creates a new client such as openai.OpenAI, because _register_default_client is called whenever an OpenAIWrapper is initialized, no matter what.

Below is a screenshot of my project's agent-init profiling result; as you can see, it took up to ~3 s in total to initialize all agents when tools are registered and an llm_config is provided (even though they are all the same for each agent).
[screenshot: agent initialization profiling result]

The root cause is load_verify_locations in the ssl package, which httpx uses under the hood in the openai client. Thus, if a caching mechanism (such as an object pool) were implemented at the client level, it would greatly speed up agent initialization for projects that use many agents and tools at the same time, and make production deployment truly feasible.

[screenshot: profiling trace showing the load_verify_locations cost]
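As a lighter-weight variant of the same idea, here is a hedged sketch (not from this thread): the openai v1 SDK accepts an http_client argument, so reusing one httpx.Client across all OpenAI instances builds the SSL context only once. The helper name below is illustrative.

# Hedged sketch: share a single httpx.Client so the SSL context (load_verify_locations)
# is created once per process instead of once per openai.OpenAI instance.
import httpx
from openai import OpenAI

_shared_http_client = httpx.Client()  # created once, reused by every client below


def make_openai_client(**config) -> OpenAI:
    # Illustrative helper; config holds the usual api_key / base_url / etc. kwargs.
    return OpenAI(http_client=_shared_http_client, **config)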

@PanQiWei

Here is my simple implementation for caching clients; hope it helps:

import json
import logging
import sys
from hashlib import md5
from typing import Any, Dict
from threading import Lock

from autogen import OpenAIWrapper
from autogen.oai.client import PlaceHolderClient
from flaml.automl.logger import logger_formatter

from omne._types import ThreadLevelSingleton  # project-specific per-thread singleton base class

logger = logging.getLogger(__name__)
if not logger.handlers:
    # Add the console handler.
    _ch = logging.StreamHandler(stream=sys.stdout)
    _ch.setFormatter(logger_formatter)
    logger.addHandler(_ch)


def _config_to_key(config: Dict[str, Any]) -> str:
    return md5(json.dumps(config, sort_keys=True).encode()).hexdigest()


class ClientCache(ThreadLevelSingleton):
    def __init__(self):
        self._client_creation_lock = Lock()

        self._oai_clients = {}
        self._aoai_clients = {}
        self._google_clients = {}

    def _get_client(self, cache: dict, config: Dict[str, Any], client_class: Any):
        key = _config_to_key(config)
        if key not in cache:
            with self._client_creation_lock:
                if key not in cache:
                    cache[key] = client_class(**config)
        return cache[key]

    def create_or_get_oai_client(self, config: Dict[str, Any]):
        from autogen.oai.client import OpenAIClient
        from openai import OpenAI
        return OpenAIClient(client=self._get_client(self._oai_clients, config, OpenAI).copy())

    def create_or_get_aoai_client(self, config: Dict[str, Any]):
        from autogen.oai.client import OpenAIClient
        from openai import AzureOpenAI
        return OpenAIClient(client=self._get_client(self._aoai_clients, config, AzureOpenAI).copy())

    def create_or_get_google_client(self, config: Dict[str, Any]):
        try:
            from autogen.oai.gemini import GeminiClient
        except ImportError as e:
            raise ImportError("Please install `google-generativeai` to use the Google Gemini API.") from e
        return self._get_client(self._google_clients, config, GeminiClient)


def _register_default_client(self, config: Dict[str, Any], openai_config: Dict[str, Any]) -> None:
    client_cache = ClientCache()

    openai_config = {**openai_config, **{k: v for k, v in config.items() if k in self.openai_kwargs}}
    api_type = config.get("api_type")
    model_client_cls_name = config.get("model_client_cls")
    if model_client_cls_name is not None:
        # a config for a custom client is set
        # adding placeholder until the register_model_client is called with the appropriate class
        self._clients.append(PlaceHolderClient(config))
        logger.info(
            f"Detected custom model client in config: {model_client_cls_name}, model client can not be used until register_model_client is called."
        )
    else:
        if api_type is not None and api_type.startswith("azure"):
            self._configure_azure_openai(config, openai_config)
            self._clients.append(client_cache.create_or_get_aoai_client(openai_config))
        elif api_type is not None and api_type.startswith("google"):
            self._clients.append(client_cache.create_or_get_google_client(openai_config))
        else:
            self._clients.append(client_cache.create_or_get_oai_client(openai_config))


def patch_openai_wrapper():
    OpenAIWrapper._register_default_client = _register_default_client


__all__ = ["patch_openai_wrapper"]

@qingyun-wu
Collaborator Author


Thanks @PanQiWei! This looks great! I wonder if you would like to contribute, or help to review/test it if we find contributors? We can chat in more detail on Discord: https://discord.com/invite/Yb5gwGVkE5. Thank you!

@Phodaie

Phodaie commented Jun 18, 2024

Regarding the Cohere Command R and Command R+ models, I have implemented a basic CohereAgent(ConversableAgent). Just as with GPTAssistantAgent, I think Cohere model support should take the form of a ConversableAgent extension (not a ModelClient). These models support parallel and sequential function calling, so a single prompt may result in the model calling multiple (dependent) functions/tools in sequence before returning its response.
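For context, a minimal sketch of what such a ConversableAgent extension could look like, assuming the cohere Python SDK's chat call and AutoGen's register_reply hook. This is not the CohereAgent referenced above; the class name, defaults, and single-turn handling are illustrative, and it omits the parallel/sequential tool-calling logic mentioned in the comment.

# Hedged sketch of a ConversableAgent subclass backed by Cohere's chat API.
# Illustrative only; not the implementation referenced in this comment.
import os

import cohere
from autogen import Agent, ConversableAgent


class CohereChatAgent(ConversableAgent):
    def __init__(self, name: str, model: str = "command-r-plus", **kwargs):
        super().__init__(name=name, llm_config=False, **kwargs)  # skip the default OpenAI client
        self._co = cohere.Client(api_key=os.environ["COHERE_API_KEY"])
        self._model = model
        # Route incoming messages to the Cohere-backed reply function.
        self.register_reply([Agent, None], CohereChatAgent._generate_cohere_reply)

    def _generate_cohere_reply(self, messages=None, sender=None, config=None):
        messages = messages or self._oai_messages[sender]
        # Send only the latest user message; a real agent would map the full history.
        response = self._co.chat(model=self._model, message=messages[-1]["content"])
        return True, response.text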

@geoffroy-noel-ddh

geoffroy-noel-ddh commented Jun 18, 2024

Better support for local models would be appreciated. See the issues reported in #2953. At a minimum, indicate in the documentation which examples have been successfully tested with local models. That would save a lot of time for developers new to AutoGen who are trying to understand why the examples work so differently once they use something other than OpenAI. If an example doesn't work, being upfront about the limitations would greatly help; if it works, saying which models it has been successfully tested with would also save users a lot of time and effort.

I think this is a common problem with many frameworks (e.g. LangChain). There are plenty of tutorials, examples, prompts, etc. designed primarily, and often tested exclusively, with OpenAI services, but assessing whether they work sufficiently well with local models (or how to make them work, or whether anyone has ever managed to make them work) requires a lot of experimentation, online searching, etc., which can quickly go beyond the resources of smaller development teams.

@qingyun-wu
Collaborator Author

Can we also address this issue in this release: #1262
@yiranwu0 , @Hk669, @marklysze Thanks!

@scruffynerf

Add #2929 and #2930

@Hk669
Collaborator

Hk669 commented Jun 19, 2024

Add #2929 and #2930

Thanks, I think the Anthropic client will close these issues.

@scruffynerf

scruffynerf commented Jun 19, 2024

Instructor clients? Instructor needs a custom client even when using the OpenAI API, because it wraps the calls and enforces the response model so it can re-request multiple times until it succeeds (or fails N times).
So while it might be possible to support it via a non-client method (you'd have to hook in multiple places and reproduce Instructor-ish behavior à la Guidance, but much better than that), allowing Instructor to be used as a client is much easier.

I have code for this to submit.
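For readers unfamiliar with Instructor, here is a hedged sketch of the wrap-and-retry pattern it provides, independent of AutoGen. This is not the client code mentioned above; the Pydantic model, prompt, and model name are illustrative.

# Hedged sketch of Instructor's retry-until-valid pattern, outside of AutoGen.
# The response_model, prompt, and model name are illustrative.
import instructor
from openai import OpenAI
from pydantic import BaseModel


class UserInfo(BaseModel):
    name: str
    age: int


client = instructor.from_openai(OpenAI())  # wraps the OpenAI client with validation and retries

user = client.chat.completions.create(
    model="gpt-4o-mini",
    response_model=UserInfo,  # Instructor validates the reply against this schema
    max_retries=3,            # and re-requests up to N times on validation failure
    messages=[{"role": "user", "content": "John is 30 years old."}],
)
print(user.name, user.age)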

@scruffynerf

I also did an Ollama Raw client, though it's not really worth the effort (I did it for Mistral v0.3 tools, but it works fine with my 'toolsfortoolless' code without Raw). I'll probably put it someplace regardless, just so it's out there.

@garnermccloud
Contributor

It might be worth exploring the use of LiteLLM in AutoGen to see if we can offload the non-OpenAI model support to a dedicated library:
https://github.com/BerriAI/litellm

Has anyone looked into this yet? Is there functionality specific to AutoGen that isn't supported in LiteLLM?
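For what it's worth, one way this already works today is through the LiteLLM proxy's OpenAI-compatible endpoint, which AutoGen's existing OpenAI client can talk to. A hedged sketch follows; the base_url, model alias, and api_key value are placeholders for whatever the proxy is configured to serve.

# Hedged sketch: pointing AutoGen's OpenAI-compatible client at a LiteLLM proxy.
# base_url, model alias, and api_key are placeholders.
from autogen import ConversableAgent

llm_config = {
    "config_list": [
        {
            "model": "my-litellm-model",             # alias configured in the LiteLLM proxy
            "base_url": "http://localhost:4000/v1",  # proxy address (placeholder)
            "api_key": "not-needed",                 # the proxy holds the provider credentials
        }
    ]
}

assistant = ConversableAgent("assistant", llm_config=llm_config)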

@scruffynerf

scruffynerf commented Jun 20, 2024

It might be worth exploring the use of LiteLLM in AutoGen to see if we can offload the non-OpenAI model support to a dedicated library...

That doesn't really address the problem of wanting other clients supported directly. Yes, LiteLLM as a wrapper can work for some, and recommending it is fine, but it's not an answer for everyone. We already have some 'tweaks' in various clients to allow adjusting things as needed; using a 'universal wrapper' means you can't tune that way. If it could, we could just adjust the OpenAI wrapper we already use in the majority of cases.

I go back and forth between using Ollama and LLMStudio, and I've played with other local servers. Each has pros and cons. The same is true of wrappers like LiteLLM; there are tradeoffs. LiteLLM as a library gives you a translation from a common format to various provider-specific formats, but you also lose the ability to do those tweaks.

OpenAI API support is the most common, but not universal, API, and adding additional APIs with 'in repo' supported clients is a good thing, because there will always be other flavors of API out there.

I wouldn't be opposed to seeing a LiteLLM-based client, to be clear; I just don't want it to be 'the answer'.

@marklysze
Collaborator

I was thinking that this roadmap could cover the cloud-based inference providers.

Separately, I think it would be good to have a local-LLM-focused blog post and associated PRs on a roadmap. That could cover client classes for the likes of LiteLLM / Ollama / etc., as well as approaches/classes like @scruffynerf's "toolsfortoolless". Local LLMs are an area I started out in and, like @geoffroy-noel-ddh noted, found frustrating when trying to figure out the right LLM for the right setup. If that's something people want to work on, let's create it.

@qingyun-wu qingyun-wu changed the title [Roadmap]: v0.2.30 roadmap [Roadmap]: v0.2.32 roadmap Jun 22, 2024
@qingyun-wu qingyun-wu changed the title [Roadmap]: v0.2.32 roadmap [Roadmap]: Roadmap about Enhanced non-OpenAI models in v0.2.30, v0.2.31, and v0.2.32 Jun 22, 2024
@qingyun-wu qingyun-wu changed the title [Roadmap]: Roadmap about Enhanced non-OpenAI models in v0.2.30, v0.2.31, and v0.2.32 [Roadmap]: Roadmap about Enhanced non-OpenAI models Jun 22, 2024
@brycecf

brycecf commented Jun 24, 2024

It might be worth exploring the use of LiteLLM in AutoGen to see if we can offload the non-OpenAI model support to a dedicated library: https://github.com/BerriAI/litellm

Has anyone looked into this yet? Is there functionality specific to AutoGen that isn't supported in LiteLLM?

LiteLLM does not actually resolve the underlying issue that AutoGen is implemented assuming an OpenAI/GPT-style valid conversation flow; LiteLLM just creates an API proxy.

At least that was the case prior to the Anthropic PRs. I haven't tested since then to see whether it consistently works now (with LiteLLM).
