It doesn't support new model "o1-mini" and "o1-preview" #120

Closed
Talented-Business opened this issue Sep 13, 2024 · 8 comments · Fixed by #121

Comments


Talented-Business commented Sep 13, 2024

Hi openai devs,

how can I count tokens for o1-preview and o1-mini?

Thanks in advance!


tmlxrd commented Sep 16, 2024

Hi,
I’m using the tiktoken library to count tokens for the gpt-4o-mini model. However, I’ve noticed a discrepancy between my token counts and the counts returned by the OpenAI API. It seems that tiktoken doesn’t fully support this new model yet, and the tokenization may differ slightly. Is there a plan to officially support gpt-4o-mini in tiktoken?

Thanks in advance!


tmlxrd commented Sep 16, 2024

> Hi openai devs,
>
> how can I count tokens for o1-preview and o1-mini?
>
> Thanks in advance!

Here’s my example code:

import { encoding_for_model, TiktokenModel } from "tiktoken";

const countTokens = (messages: any[], model: TiktokenModel): number => {
  const enc = encoding_for_model(model); // tokenizer for the model
  let tokenCount = 0;

  // Iterate over each message and count tokens for 'role' and 'content'
  messages.forEach((message) => {
    tokenCount += enc.encode(message.role).length;    // role tokens
    tokenCount += enc.encode(message.content).length; // content tokens
  });

  enc.free(); // release the WASM encoder when done
  return tokenCount;
};

const messages = [
  {
    role: "system",
    content: instructions, // defined elsewhere in my code
  },
  {
    role: "user",
    content: userContent, // defined elsewhere in my code
  },
];

const model: TiktokenModel = "gpt-4o-mini";
const tokenCountInput = countTokens(messages, model);

dqbd (Owner) commented Sep 16, 2024

Hello! I’ll keep monitoring openai#337 to see if there are any changes w.r.t. the underlying token map.

dqbd (Owner) commented Sep 16, 2024

@tmlxrd Just counting role and content is not necessarily enough; you also need to include the tokens used to separate the messages: see dqbd/tiktokenizer
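
For illustration, a minimal sketch of that per-message overhead accounting, modeled on the chat-counting recipe in OpenAI’s cookbook; the 3-tokens-per-message, 1-token-per-name, and 3-token reply-priming constants are assumptions carried over from older chat models, and countChatTokens / ChatMessage are illustrative names, not part of this library:

import { encoding_for_model, TiktokenModel } from "tiktoken";

// Illustrative message shape; `name` is the optional per-message name field.
interface ChatMessage {
  role: string;
  content: string;
  name?: string;
}

const countChatTokens = (messages: ChatMessage[], model: TiktokenModel): number => {
  const enc = encoding_for_model(model);
  const tokensPerMessage = 3; // assumed <|start|>{role}<|message|> framing per message
  const tokensPerName = 1;    // assumed extra token when `name` is present

  let tokenCount = 0;
  for (const message of messages) {
    tokenCount += tokensPerMessage;
    tokenCount += enc.encode(message.role).length;
    tokenCount += enc.encode(message.content).length;
    if (message.name) {
      tokenCount += tokensPerName + enc.encode(message.name).length;
    }
  }
  tokenCount += 3; // assumed priming of the reply with <|start|>assistant<|message|>

  enc.free(); // release the WASM encoder
  return tokenCount;
};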


tmlxrd commented Sep 16, 2024

> @tmlxrd Just counting role and content is not necessarily enough; you also need to include the tokens used to separate the messages: see dqbd/tiktokenizer

Thank you for your answer!
I did this because I was getting a smaller token count than what OpenAI returns in the API response.

For a large text I counted 1708 input tokens, while OpenAI’s response reported 1717. It’s a small difference, but I didn’t understand where it came from, so I added the two role strings to the count.

UPD: Thank you for the link to the tool. It works better now, but there are still discrepancies with the answer from OpenAI.
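
(If the per-message overhead sketched above applies, the 9-token gap is exactly accounted for: two messages at 3 framing tokens each, plus 3 tokens priming the assistant reply, gives 2 × 3 + 3 = 9, i.e. 1717 − 1708.)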

NuoJohnChen commented

Do 'o1-mini' and 'o1-preview' still use the cl100k_base vocabulary?


tmlxrd commented Sep 17, 2024

> Do 'o1-mini' and 'o1-preview' still use the cl100k_base vocabulary?

Hi. Unfortunately, I don’t know. Please share the answer if you find the information.

dqbd (Owner) commented Oct 3, 2024

Got clarification with the latest tiktoken@0.8.0 release; updating here as well.
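
A small usage sketch, assuming the updated token map routes o1-preview / o1-mini to the o200k_base encoding (the same one gpt-4o uses); the try/catch fallback is an illustrative workaround for older versions, not part of the library’s API:

import { encoding_for_model, get_encoding, Tiktoken, TiktokenModel } from "tiktoken";

let enc: Tiktoken;
try {
  // Resolves directly once the model map includes the o1-family (tiktoken >= 0.8.0).
  enc = encoding_for_model("o1-mini" as TiktokenModel);
} catch {
  // Assumed fallback: the o1 models are expected to share gpt-4o's encoding.
  enc = get_encoding("o200k_base");
}

console.log(enc.encode("hello world").length); // token count for a sample string
enc.free();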

dqbd closed this as completed in #121 on Oct 3, 2024