It doesn't support new model "o1-mini" and "o1-preview" #120

Closed
Talented-Business opened this issue Sep 13, 2024 · 8 comments · Fixed by #121

Comments


Talented-Business commented Sep 13, 2024

Hi openai devs,

how can I count tokens for o1-preview and o1-mini?

Thanks in advance!


tmlxrd commented Sep 16, 2024

Hi,
I’m using the tiktoken library to count tokens for the gpt-4o-mini model. However, I’ve noticed a discrepancy between my token counts and the counts returned by the OpenAI API. It seems that tiktoken doesn’t fully support this new model yet, and the tokenization may differ slightly. Is there a plan to officially support gpt-4o-mini in tiktoken?

Thanks in advance!


tmlxrd commented Sep 16, 2024

> Hi openai devs,
>
> how can I count tokens for o1-preview and o1-mini?
>
> Thanks in advance!

Here’s my example code:

import { encoding_for_model, TiktokenModel } from "tiktoken";

const countTokens = (messages: any[], model: TiktokenModel): number => {
  const enc = encoding_for_model(model); // tokenizer for the model
  let tokenCount = 0;

  // Iterate over each message and count tokens for 'role' and 'content'
  messages.forEach((message) => {
    tokenCount += enc.encode(message.role).length;    // role tokens
    tokenCount += enc.encode(message.content).length; // content tokens
  });

  enc.free(); // release the WASM encoder when done
  return tokenCount;
};

const messages = [
  {
    role: "system",
    content: instructions, // defined elsewhere in my code
  },
  {
    role: "user",
    content: userContent, // defined elsewhere in my code
  },
];

const model: TiktokenModel = "gpt-4o-mini";
const tokenCountInput = countTokens(messages, model);

dqbd (Owner) commented Sep 16, 2024

Hello! I’ll keep monitoring openai#337 to see if there are any changes w.r.t. the underlying token map.

dqbd (Owner) commented Sep 16, 2024

@tmlxrd Just counting role and content is not necessarily enough; you also need to include the tokens used to separate the messages: see dqbd/tiktokenizer
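
For illustration, a minimal sketch of that per-message overhead accounting, modeled on the chat-counting recipe in OpenAI’s cookbook; the 3-tokens-per-message, 1-token-per-name, and 3-token reply-priming constants are assumptions carried over from older chat models, and countChatTokens / ChatMessage are illustrative names, not part of this library:

import { encoding_for_model, TiktokenModel } from "tiktoken";

// Illustrative message shape; `name` is the optional per-message name field.
interface ChatMessage {
  role: string;
  content: string;
  name?: string;
}

const countChatTokens = (messages: ChatMessage[], model: TiktokenModel): number => {
  const enc = encoding_for_model(model);
  const tokensPerMessage = 3; // assumed <|start|>{role}<|message|> framing per message
  const tokensPerName = 1;    // assumed extra token when `name` is present

  let tokenCount = 0;
  for (const message of messages) {
    tokenCount += tokensPerMessage;
    tokenCount += enc.encode(message.role).length;
    tokenCount += enc.encode(message.content).length;
    if (message.name) {
      tokenCount += tokensPerName + enc.encode(message.name).length;
    }
  }
  tokenCount += 3; // assumed priming of the reply with <|start|>assistant<|message|>

  enc.free(); // release the WASM encoder
  return tokenCount;
};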


tmlxrd commented Sep 16, 2024

> @tmlxrd Just counting role and content is not necessarily enough; you also need to include the tokens used to separate the messages: see dqbd/tiktokenizer

Thank you for your answer!
I did this because I was getting a smaller token count than what OpenAI returns in the API response.

For a large text I counted 1708 input tokens, while OpenAI’s response reported 1717. It’s a small difference, but I didn’t understand where it came from, so I added the two role strings to the count.

UPD: Thank you for the link to the tool. It works better now, but there are still discrepancies with the answer from OpenAI.
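
(If the per-message overhead sketched above applies, the 9-token gap is exactly accounted for: two messages at 3 framing tokens each, plus 3 tokens priming the assistant reply, gives 2 × 3 + 3 = 9, i.e. 1717 − 1708.)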

NuoJohnChen commented

Do 'o1-mini' and 'o1-preview' still use the cl100k_base vocabulary?


tmlxrd commented Sep 17, 2024

> Do 'o1-mini' and 'o1-preview' still use the cl100k_base vocabulary?

Hi. Unfortunately, I don’t know. Please share the answer if you find the information.

dqbd (Owner) commented Oct 3, 2024

Got clarification with the latest tiktoken@0.8.0 release; updating here as well.
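
A small usage sketch, assuming the updated token map routes o1-preview / o1-mini to the o200k_base encoding (the same one gpt-4o uses); the try/catch fallback is an illustrative workaround for older versions, not part of the library’s API:

import { encoding_for_model, get_encoding, Tiktoken, TiktokenModel } from "tiktoken";

let enc: Tiktoken;
try {
  // Resolves directly once the model map includes the o1-family (tiktoken >= 0.8.0).
  enc = encoding_for_model("o1-mini" as TiktokenModel);
} catch {
  // Assumed fallback: the o1 models are expected to share gpt-4o's encoding.
  enc = get_encoding("o200k_base");
}

console.log(enc.encode("hello world").length); // token count for a sample string
enc.free();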

dqbd closed this as completed in #121 on Oct 3, 2024