
[Feature Request]: Generalize token accounting and budgeting #1680

Closed
afourney opened this issue Feb 14, 2024 · 1 comment
Assignees
Labels
enhancement New feature or request

Comments

@afourney
Member

afourney commented Feb 14, 2024

Is your feature request related to a problem? Please describe.

At present, model token limits are hardcoded here:

max_token_limit = {
    "gpt-3.5-turbo": 4096,
    "gpt-3.5-turbo-0301": 4096,
    "gpt-3.5-turbo-0613": 4096,
    "gpt-3.5-turbo-instruct": 4096,
    "gpt-3.5-turbo-16k": 16385,
    "gpt-3.5-turbo-16k-0613": 16385,
    "gpt-3.5-turbo-1106": 16385,
    "gpt-4": 8192,
    "gpt-4-32k": 32768,
    "gpt-4-32k-0314": 32768,  # deprecate in Sep
    "gpt-4-0314": 8192,  # deprecate in Sep
    "gpt-4-0613": 8192,
    "gpt-4-32k-0613": 32768,
    "gpt-4-1106-preview": 128000,
    "gpt-4-0125-preview": 128000,
    "gpt-4-turbo-preview": 128000,
    "gpt-4-vision-preview": 128000,
}

Token costs are hardcoded here:

OAI_PRICE1K = {
    "text-ada-001": 0.0004,
    "text-babbage-001": 0.0005,
    "text-curie-001": 0.002,
    "code-cushman-001": 0.024,
    "code-davinci-002": 0.1,
    "text-davinci-002": 0.02,
    "text-davinci-003": 0.02,
    "gpt-3.5-turbo-instruct": (0.0015, 0.002),
    "gpt-3.5-turbo-0301": (0.0015, 0.002),  # deprecate in Sep
    "gpt-3.5-turbo-0613": (0.0015, 0.002),
    "gpt-3.5-turbo-16k": (0.003, 0.004),
    "gpt-3.5-turbo-16k-0613": (0.003, 0.004),
    "gpt-35-turbo": (0.0015, 0.002),
    "gpt-35-turbo-16k": (0.003, 0.004),
    "gpt-35-turbo-instruct": (0.0015, 0.002),
    "gpt-4": (0.03, 0.06),
    "gpt-4-32k": (0.06, 0.12),
    "gpt-4-0314": (0.03, 0.06),  # deprecate in Sep
    "gpt-4-32k-0314": (0.06, 0.12),  # deprecate in Sep
    "gpt-4-0613": (0.03, 0.06),
    "gpt-4-32k-0613": (0.06, 0.12),
    # 11-06
    "gpt-3.5-turbo": (0.0015, 0.002),  # default is still 0613
    "gpt-3.5-turbo-1106": (0.001, 0.002),
    "gpt-35-turbo-1106": (0.001, 0.002),
    "gpt-4-1106-preview": (0.01, 0.03),
    "gpt-4-0125-preview": (0.01, 0.03),
    "gpt-4-turbo-preview": (0.01, 0.03),
    "gpt-4-1106-vision-preview": (0.01, 0.03),  # TODO: support vision pricing of images
}
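For context, a per-1K price table like this is typically applied to a usage record along the following lines. This is a minimal sketch using a local excerpt of the table; `completion_cost` is an illustrative helper, not the library's API:

```python
# Sketch: applying (input_price, output_price) per-1K-token pricing to a
# completion. OAI_PRICE1K entries here are a local excerpt for illustration.
OAI_PRICE1K = {
    "gpt-4": (0.03, 0.06),
    "gpt-3.5-turbo-1106": (0.001, 0.002),
}

def completion_cost(model: str, prompt_tokens: int, completion_tokens: int) -> float:
    """Dollar cost of one request, given token counts from the API's usage field."""
    price_in, price_out = OAI_PRICE1K[model]
    return (prompt_tokens * price_in + completion_tokens * price_out) / 1000

print(round(completion_cost("gpt-4", 1000, 500), 6))  # 0.06
```

Note that a `KeyError` is raised for any model name missing from the table, which is exactly the maintenance burden described below.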

However, there are a number of issues with this approach, including:

  • The name "max_token_limit" is confusing and can clash with "max_tokens", which is a parameter of the chat completion request, and refers exclusively to output tokens.
  • Adding new models requires editing both files. It's easy to forget one or the other.
  • It's keyed off model names, but those names may not be reliable. For example, "gpt-4" on Azure might point to an 8k, 32k, or 128k model; there is no requirement that deployment names match OpenAI's. Likewise, a deployment name may not match any known model at all (e.g., "my-gpt-4").
  • Adding new models or providers requires hand-editing core library files.
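The name-reliability problem in particular can be sketched like this, with a local excerpt of the table and a hypothetical `get_limit` helper:

```python
# Sketch: a table keyed on OpenAI model names silently misses Azure
# deployment names. This is a local excerpt for illustration only.
max_token_limit = {
    "gpt-4": 8192,
    "gpt-4-32k": 32768,
}

def get_limit(model: str, default: int = 4096) -> int:
    # An Azure deployment called "my-gpt-4" may actually serve a 32k model,
    # but a name-keyed lookup can only fall back to a guess.
    return max_token_limit.get(model, default)

print(get_limit("gpt-4"))     # 8192
print(get_limit("my-gpt-4"))  # 4096 -- wrong if the deployment is a 32k model
```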

Describe the solution you'd like

I would like to be able to provide token accounting information in the OAI_CONFIG_LIST.

Perhaps something like this:

[
    {
        "model": "gpt-4-turbo",
        "api_key": "blahblahblah",
        "base_url": "https://mymodel.openai.azure.com/",
        "api_type": "azure",
        "api_version": "2023-12-01-preview",
        "window_size": [128000, 4096],
        "1k_token_cost": [0.01, 0.03]
    }
]

In the case where input and output window sizes are not distinguished:

[
    {
        "model": "gpt-3.5-turbo",
        "api_key": "blahblahblah",
        "base_url": "https://mymodel.openai.azure.com/",
        "api_type": "azure",
        "api_version": "2023-12-01-preview",
        "window_size": 16385,
        "1k_token_cost": [0.0005, 0.0015]
    }
]

We would then use these values when present to compute costs, token limits, etc.
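A minimal sketch of that lookup, assuming the `window_size` and `1k_token_cost` keys proposed above; the fallback tables and the `window_size`/`cost` helpers are illustrative names, not existing APIs:

```python
# Sketch: prefer per-entry accounting info from the config, fall back to
# hardcoded tables (hypothetical local excerpts) only when absent.
FALLBACK_LIMITS = {"gpt-4-turbo": 128000}
FALLBACK_PRICES = {"gpt-4-turbo": (0.01, 0.03)}

def window_size(cfg: dict) -> int:
    ws = cfg.get("window_size")
    if ws is not None:
        # Either a single int, or an [input, output] pair whose first
        # element is the context window.
        return ws[0] if isinstance(ws, (list, tuple)) else ws
    return FALLBACK_LIMITS[cfg["model"]]

def cost(cfg: dict, prompt_tokens: int, completion_tokens: int) -> float:
    price_in, price_out = cfg.get("1k_token_cost") or FALLBACK_PRICES[cfg["model"]]
    return (prompt_tokens * price_in + completion_tokens * price_out) / 1000

cfg = {
    "model": "gpt-4-turbo",
    "window_size": [128000, 4096],
    "1k_token_cost": [0.01, 0.03],
}
print(window_size(cfg))                      # 128000
print(round(cost(cfg, 1000, 1000), 6))       # 0.04
```

One design note: keeping the hardcoded tables as a fallback preserves backward compatibility, while any entry in OAI_CONFIG_LIST can override them for Azure deployments or custom model names.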

Additional context

No response

@afourney
Member Author

#1682 was closed, so I'm closing this issue as well. Reopen as needed.
