How to get token count: callbacks work for ChatOpenAI but not for RetrievalQAChain #965
Comments
I'm having a similar issue. When I define gpt-3.5-turbo as the model for the OpenAI construct, llmOutput is missing the tokenUsage object. Using the same construct but not defining the model returns token usage as part of llmOutput.

Not working:

const model = new OpenAI({
  openAIApiKey: openAISecret,
  modelName: 'gpt-3.5-turbo',
  callbacks: [
    {
      handleLLMEnd: async (output: LLMResult) => {
        logger.info('output', { output })
        logger.info('tokenUsage', { tokenUsage: output.llmOutput })
        // tokenUsage: UNDEFINED
      },
    },
  ],
})

Working:

const model = new OpenAI({
  openAIApiKey: openAISecret,
  callbacks: [
    {
      handleLLMEnd: async (output: LLMResult) => {
        logger.info('output', { output })
        logger.info('tokenUsage', { tokenUsage: output.llmOutput })
        // tokenUsage: found
      },
    },
  ],
}) |
I added my own issue for this |
Yeah, the problem is that not defining the model uses text-davinci-003, which costs $0.02 per 1K tokens, vs. gpt-3.5-turbo, which is $0.002. |
Use it this way: I found that importing it like that returns the tokenUsage in the handleLLMEnd handler. |
still waiting for the solution on this |
Tested with the Azure API:

curl -X POST -H 'Content-type: application/json' -H 'User-Agent: OpenAI/NodeJS/3.3.0' -H 'api-key: xxxxx' --data '{"model":"gpt-3.5-turbo","temperature":0.7,"top_p":1,"frequency_penalty":0,"presence_penalty":0,"n":1,"stream":false,"messages":[{"role":"user","content":"!"}]}' https://{azureApiInstanceName}.openai.azure.com/openai/deployments/{azureOpenAIApiDeploymentName}/chat/completions\?api-version=2023-05-15

With stream=false this returns usage data and works as expected. |
I am also running into this. There doesn't seem to be any way to grab cost, or at least token usage, when calling chains or agents. Having an output after a chain or agent finishes with the total usage would be great. |
Same issue, tokenUsage is not returned when using OpenAI() model. |
Same problem here: when streaming is set to true it doesn't return token usage. Any ideas for a workaround? |
Hello everyone, I recently started working on a stealth startup, and I'm using langchainjs as a core component of our tech stack. I must say, I've been impressed with the work done here! Thank you so much for all your hard work on this project, and for providing tools that startups like mine can rely on!

While integrating the library, I noticed the lack of token statistics when using ChatOpenAI in streaming mode. I did some digging in the code and I believed I had found the source of the problem: adding the required fields using .getNumTokensFromMessages(...) might address it.

// EDIT 2023-08-17 8:50 CET
I did some more digging. It's not as simple as I thought. Using .getNumTokensFromMessages(...) would introduce two more calls to the OpenAI API. It also turns out that the original Python langchain implementation has the same problem when streaming is enabled. Both implementations are actually correct, as the source of the problem lies within the OpenAI API: when streaming is enabled, the token usage statistics are not sent to the client at all. What is sent is a stream of content deltas with no usage data. |
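Until the API returns usage for streamed responses, a local estimate is one workaround. A rough sketch using js-tiktoken, assuming a recent langchainjs where ChatOpenAI exposes .stream(); the counts are approximations and the model name is just an example:

import { ChatOpenAI } from "@langchain/openai";
import { encodingForModel } from "js-tiktoken";

const enc = encodingForModel("gpt-3.5-turbo");
const model = new ChatOpenAI({ modelName: "gpt-3.5-turbo", streaming: true });

const promptText = "Write a haiku about token counting.";
let completionText = "";

const stream = await model.stream(promptText);
for await (const chunk of stream) {
  // Accumulate streamed deltas; the chunks themselves carry no usage data.
  completionText += typeof chunk.content === "string" ? chunk.content : "";
}

// Approximate counts computed locally with the tokenizer.
const promptTokens = enc.encode(promptText).length;
const completionTokens = enc.encode(completionText).length;
console.log({ promptTokens, completionTokens, totalTokens: promptTokens + completionTokens });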
Did anyone find a solution for this? |
I think the reason is that the gpt-3.5-turbo model can only be used with chat models:

curl https://api.openai.com/v1/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer ...." \
  -d '{
    "model": "gpt-3.5-turbo",
    "prompt": "Say this is a test",
    "max_tokens": 7,
    "temperature": 0
  }'

{
  "error": {
    "message": "This is a chat model and not supported in the v1/completions endpoint. Did you mean to use v1/chat/completions?",
    "type": "invalid_request_error",
    "param": "model",
    "code": null
  }
}

I had to update my old code from 'OpenAI' to 'ChatOpenAI', and that fixed the issue.

// old
// const model = new OpenAI({ temperature: 0, openAIApiKey: KEY, modelName: "gpt-3.5-turbo" });
// new
const model = new ChatOpenAI({ temperature: 0, openAIApiKey: KEY, modelName: "gpt-3.5-turbo" });

const prompt = PromptTemplate.fromTemplate(
  "What is a good name for a company that makes {product}?"
);
const chain = new LLMChain({ llm: model, prompt });
const resA2 = await chain.run("colorful socks", {callbacks: [{
  handleLLMEnd: (output, runId, parentRunId?, tags?) => {
    const { completionTokens, promptTokens, totalTokens } =
      output.llmOutput?.tokenUsage ?? {};
    console.log(completionTokens ?? 0);
    console.log(promptTokens ?? 0);
    console.log(totalTokens ?? 0);
    // "llmOutput": {
    //   "tokenUsage": {
    //     "completionTokens": 3,
    //     "promptTokens": 20,
    //     "totalTokens": 23
    //   }
    // }
  },
}]}); |
I managed to count tokens for a streaming chat model with callbacks like this:

const model = new ChatOpenAI({ modelName: "gpt-3.5-turbo", streaming: true });
const chain = new LLMChain({ llm: model, prompt })
const { text: assistantResponse } = await chain.call({
  query: query,
}, {
  callbacks: [
    {
      handleChatModelStart: async (llm, messages) => {
        // The prompt is available here: messages[0][0].content
        const tokenCount = tokenCounter(messages[0][0].content);
      },
      handleChainEnd: async (outputs) => {
        // outputText is the response from the chat call
        const { text: outputText } = outputs;
        const tokenCount = tokenCounter(outputText);
      }
    }
  ]
}
); |
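The tokenCounter helper referenced above isn't defined in the thread; a minimal stand-in using js-tiktoken might look like this (the model name is an assumption):

import { encodingForModel } from "js-tiktoken";

const enc = encodingForModel("gpt-3.5-turbo");

// Rough token count for a plain string; close to, but not exactly, what the API
// bills for chat messages, since per-message framing tokens are ignored.
const tokenCounter = (text: string): number => enc.encode(text).length;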
Doesn't that only account for the initial prompt and the final response (not any intermediate calls for functions, etc)? |
This solved the issue for me too! |
Any news on this? I still get an empty object for the token usage with streaming mode enabled. |
Hi thread, I am using the TypeScript SDK of langchain. I am still receiving a 0 token count. Can you please help here?
@jacoblee93 Any help here ? |
@hwchase17 @nfcampos @bracesproul @sullivan-sean Any help here ? |
Yes, I'm experiencing the same issue here. The token counter seems not to be working for agents; I'm getting all token counts as 0. Here I paste the code I'm using and the log I'm getting back.

Package Version: 1.36.0
V8 and Chromium: Node: 20.9.0; Chromium: 122

import { ChatOpenAI } from "@langchain/openai";
import { ChatPromptTemplate, MessagesPlaceholder } from 'langchain/prompts';
import { TavilySearchResults } from "@langchain/community/tools/tavily_search";
import { AgentExecutor, createOpenAIToolsAgent } from "langchain/agents";

// Define the tools the agent will have access to.
const tools = [new TavilySearchResults({ maxResults: 1, apiKey: 'MY-API-KEY' })];

const llm = new ChatOpenAI({
  modelName: "gpt-4-turbo",
  temperature: 0.15,
  maxRetries: 3,
  timeout: 30000,
  callbacks: [
    {
      handleLLMEnd(output) {
        console.log(output)
        output.generations.map(generation => {
          generation.map(g => {
            // console.log(g.message.response_metadata.tokenUsage)
          })
        })
      },
    }
  ]
});

const prompt = ChatPromptTemplate.fromMessages([
  ['system', `You are a virtual agent`],
  new MessagesPlaceholder({ variableName: 'chat_history', optional: true }),
  ['user', '{input}'],
  new MessagesPlaceholder({ variableName: 'agent_scratchpad', optional: false }),
]);

const agent = await createOpenAIToolsAgent({ llm, tools, prompt });
const agentExecutor = new AgentExecutor({ agent, tools });

const result = await agentExecutor.invoke({
  input: "what is LangChain?, describe it in a sentence",
});
console.log(result);

The output:

{
  generations: [
    [
      ChatGenerationChunk {
        text: 'LangChain is a software library designed to facilitate the development of applications that integrate language models, providing tools and frameworks to streamline the process of building AI-powered language understanding and generation features.',
        generationInfo: { prompt: 0, completion: 0, finish_reason: 'stop' },
        message: AIMessageChunk {
          lc_serializable: true,
          content: 'LangChain is a software library designed to facilitate the development of applications that integrate language models, providing tools and frameworks to streamline the process of building AI-powered language understanding and generation features.',
          additional_kwargs: {},
          response_metadata: { prompt: 0, completion: 0, finish_reason: 'stop' },
          tool_calls: [],
          invalid_tool_calls: [],
          tool_call_chunks: []
        }
      }
    ]
  ]
} |
I just wrote my own using the OpenAI API. The implementation is not that complex, you have more control, and you don't have to wait over a year for someone else to fix it. |
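For reference, the usage block is available directly from the OpenAI SDK on non-streaming responses, which is roughly what rolling your own looks like. A sketch using the official openai npm package; the model, prompt, and printed numbers are placeholders:

import OpenAI from "openai";

const client = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

const completion = await client.chat.completions.create({
  model: "gpt-3.5-turbo",
  messages: [{ role: "user", content: "Say this is a test" }],
});

// Non-streaming chat completions include prompt/completion/total token counts.
console.log(completion.usage);
// e.g. { prompt_tokens: 12, completion_tokens: 5, total_tokens: 17 }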
same issue |
@bracesproul Brace, I think the 0 token issue is a very serious problem, any chance you can look into it? |
Hey community, I created this counter. It might not be perfect, but I tested it against LangSmith and it gets a pretty close count. If you have any ideas to improve it, everyone is more than welcome to do so. I hope you find it useful:

import { encodingForModel } from 'js-tiktoken';
export class TokenCounter {
private _totalTokens: number = 0;
private _promptTokens: number = 0;
private _completionTokens: number = 0;
private _enc: any;
constructor(model) {
this._enc = encodingForModel(model);
}
encodeAndCountTokens(text: string): number {
return this._enc.encode(text).length;
}
handleLLMEnd(result: any) {
result.generations.forEach((generation: any) => {
const content = generation[0]?.message?.text || '';
const calls = generation[0]?.message?.additional_kwargs || '';
console.log('Calls & Content:', {
calls,
content,
});
const output = JSON.stringify(calls, null, 2);
const tokens = this.encodeAndCountTokens(content + output);
this._completionTokens += tokens;
});
console.log('Tokens for this LLMEnd:', this._completionTokens);
}
handleChatModelStart(_, args) {
args[0].forEach((arg) => {
const content = arg?.content || '';
const calls = arg?.additional_kwargs || '';
const tokens = this.encodeAndCountTokens(
content + JSON.stringify(calls, null, 2),
);
this._promptTokens += tokens;
console.log('content:', content, calls);
});
console.log('Tokens for this ChatModelStart:', this._promptTokens);
}
modelTracer() {
return {
handleChatModelStart: this.handleChatModelStart.bind(this),
handleLLMEnd: this.handleLLMEnd.bind(this),
};
}
sumTokens() {
this._totalTokens = this._promptTokens + this._completionTokens;
console.log('Total Tokens:', this._totalTokens);
}
} |
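A usage sketch for the class above: the import path is hypothetical, and modelTracer() is passed as a callback handler object, with sumTokens() called once the run finishes.

import { ChatOpenAI } from "@langchain/openai";
import { TokenCounter } from "./token-counter"; // hypothetical path to the class above

const counter = new TokenCounter("gpt-4-turbo");

const llm = new ChatOpenAI({
  modelName: "gpt-4-turbo",
  // Wires up handleChatModelStart / handleLLMEnd bound to the counter.
  callbacks: [counter.modelTracer()],
});

await llm.invoke("What is LangChain? Describe it in a sentence.");

counter.sumTokens(); // logs "Total Tokens: <prompt + completion>"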
I will try your solution as soon as possible, thank you very much. The problem is that langsmith often shows 0 tokens. This makes a very important functionality of langsmith unusable due to this problem in Langchain. I hope @bracesproul or @jacoblee93 will look into this issue. |
Yes, will fix this as OpenAI recently added support. There is an open PR here #5485 |
Hey @jacoblee93. I just tested release 0.2.4 and it still does not show the token usage when streaming. The code:

const llm = new ChatOpenAI({ modelName: "gpt-3.5-turbo", temperature: 0.0 });
const vectorStore = await FaissStore.load(`data/search_index_${projectId}.pkl`, new OpenAIEmbeddings());
const vectorStoreRetriever = vectorStore.asRetriever();
const SYSTEM_TEMPLATE = `...`;
const messages = [
SystemMessagePromptTemplate.fromTemplate(SYSTEM_TEMPLATE),
HumanMessagePromptTemplate.fromTemplate("{question}"),
];
const prompt = ChatPromptTemplate.fromMessages(messages);
const chain = RunnableSequence.from([
{
sourceDocuments: RunnableSequence.from([
(input) => input.question,
vectorStoreRetriever,
]),
question: (input) => input.question,
},
{
sourceDocuments: (previousStepResult) => previousStepResult.sourceDocuments,
question: (previousStepResult) => previousStepResult.question,
context: (previousStepResult) =>
formatDocumentsAsString(previousStepResult.sourceDocuments),
},
{
result: prompt.pipe(llm).pipe(new StringOutputParser()),
sourceDocuments: (previousStepResult) => previousStepResult.sourceDocuments,
},
]);
return await chain.stream({question: question}, {
callbacks: [
{
handleLLMEnd(output: LLMResult, runId: string, parentRunId?: string, tags?: string[]): any {
output.generations.map((g) => console.log(JSON.stringify(g, null, 2)));
}
}
]
});

The output is as follows:

[
{
"text": "<the loooong answer goes here>",
"generationInfo": {
"prompt": 0,
"completion": 0,
"finish_reason": "stop"
},
"message": {
"lc": 1,
"type": "constructor",
"id": [
"langchain_core",
"messages",
"AIMessageChunk"
],
"kwargs": {
"content": "<the loooong answer goes here>",
"additional_kwargs": {},
"response_metadata": {
"prompt": 0,
"completion": 0,
"finish_reason": "stop"
},
"tool_call_chunks": [],
"tool_calls": [],
"invalid_tool_calls": []
}
}
}
]

Am I looking for the token count in the wrong place? Or has it not been implemented yet to provide the token count at handleLLMEnd when streaming?
Can you verify you're on latest version of core and LangChain OpenAI? https://js.langchain.com/v0.2/docs/how_to/installation/#installing-integration-packages Otherwise will check tomorrow |
Yes, definitely: all of these packages are on the latest version. |
I see OpenAI just released an update for that, and it seems like it was already done via this PR |
Unfortunately I am also on the latest packages, and I get a 0 token count even for the last chunk that is supposed to contain usage.

EDIT: I should add that I'm using the LangChain Agents. I'm guessing support for token usage hasn't reached them yet.
|
Ah @rrichc, @zaiddabaeen, @gkhngyk and others, you'll need to pass stream_options to get usage when streaming:

const response = await model.stream("Hello, how are you?", {
  stream_options: {
    include_usage: true,
  },
});

You can also bind it so it applies to every call:

const modelWithUsage = model.bind({
  stream_options: {
    include_usage: true,
  },
});

Here's your example with that applied:

import { StringOutputParser } from "@langchain/core/output_parsers";
import { ChatPromptTemplate } from "@langchain/core/prompts";
import { RunnableSequence } from "@langchain/core/runnables";
import { ChatOpenAI, OpenAIEmbeddings } from "@langchain/openai";
import { formatDocumentsAsString } from "langchain/util/document";
import { MemoryVectorStore } from "langchain/vectorstores/memory";
const llm = new ChatOpenAI({ modelName: "gpt-3.5-turbo", temperature: 0.0 }).bind({
stream_options: {
include_usage: true
}
});
const vectorStore = await MemoryVectorStore.fromTexts([], [], new OpenAIEmbeddings());
const vectorStoreRetriever = vectorStore.asRetriever();
const SYSTEM_TEMPLATE = `You are a pro at responding to questions.`;
const prompt = ChatPromptTemplate.fromMessages([
["system", SYSTEM_TEMPLATE],
["human", "{question}"],
]);
const chain = RunnableSequence.from([
{
sourceDocuments: RunnableSequence.from([
(input) => input.question,
vectorStoreRetriever,
]),
question: (input) => input.question,
},
{
sourceDocuments: (previousStepResult) => previousStepResult.sourceDocuments,
question: (previousStepResult) => previousStepResult.question,
context: (previousStepResult) =>
formatDocumentsAsString(previousStepResult.sourceDocuments),
},
{
result: prompt.pipe(llm).pipe(new StringOutputParser()),
sourceDocuments: (previousStepResult) => previousStepResult.sourceDocuments,
},
]);
const stream = await chain.stream({question: "Who is this about?"}, {
callbacks: [
{
handleLLMEnd(output: any, runId: string, parentRunId?: string, tags?: string[]): any {
output.generations.map((g: any) => console.log(JSON.stringify(g, null, 2)));
/*
[
{
"text": "I'm here to provide information and assistance on a wide range of topics. Feel free to ask me anything you'd like to know more about!",
"generationInfo": {
"prompt": 0,
"completion": 0,
"finish_reason": "stop"
},
"message": {
"lc": 1,
"type": "constructor",
"id": [
"langchain_core",
"messages",
"AIMessageChunk"
],
"kwargs": {
"content": "I'm here to provide information and assistance on a wide range of topics. Feel free to ask me anything you'd like to know more about!",
"additional_kwargs": {},
"response_metadata": {
"prompt": 0,
"completion": 0,
"finish_reason": "stop"
},
"tool_call_chunks": [],
"usage_metadata": {
"input_tokens": 25,
"output_tokens": 29,
"total_tokens": 54
},
"tool_calls": [],
"invalid_tool_calls": []
}
}
}
]
*/
}
}
]
});
for await (const chunk of stream) {
// console.log(chunk);
}

We don't pass this through by default; maybe we should reconsider. CC @ccurme. We should also fix that misleading blank prompt/completion metadata in generationInfo.
Ah, never mind. That data does not refer to usage. Will close this, please reopen if the above setting doesn't fix. Docs are here: https://js.langchain.com/v0.2/docs/how_to/chat_token_usage_tracking#openai-2 |
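For non-streaming calls, the linked docs describe reading usage from the returned message. A short sketch, assuming a 0.2-era langchain where AIMessage carries usage_metadata:

import { ChatOpenAI } from "@langchain/openai";

const llm = new ChatOpenAI({ modelName: "gpt-3.5-turbo" });
const aiMessage = await llm.invoke("Hello, how are you?");

// usage_metadata is populated on non-streaming responses in recent versions.
console.log(aiMessage.usage_metadata);
// e.g. { input_tokens: 12, output_tokens: 9, total_tokens: 21 }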
@jacoblee93 Is there a proper way to pass stream_options when instantiating the ChatOpenAI class? I'd also like to use the include_usage option, but the current typings don't seem to allow it.
As a temporary fix @rrichc, it's possible to apply this patch and then set stream_options when instantiating the ChatOpenAI class.
|
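Without patching the package, one way to sidestep the missing typing until the fix lands is to cast the bound options. A sketch; the cast is a workaround for the typings, not a supported API:

import { ChatOpenAI } from "@langchain/openai";

const base = new ChatOpenAI({ modelName: "gpt-3.5-turbo", temperature: 0 });

// stream_options is not in the published call-option typings yet, hence the cast.
const model = base.bind({
  stream_options: { include_usage: true },
} as any);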
Oh dear. Yeah I will fix that typing - sorry about this! That patch would be a good PR too if you're willing to push one up @deranga, thank you for making it. |
@jacoblee93, I've just created a PR as mentioned above. |
Have you solved this? And by the way, is there anything that provides a breakdown of token usage for each step in an AgentExecutor? |
I am trying to get a token count for a process; I am passing callbacks to the class initialization.
However, all of the calls from a RetrievalQAChain end up in the catch portion of that try/catch block, as 'tokenUsage' does not exist for those calls. Can someone point me in the right direction?
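A hypothetical sketch of the kind of callback setup being described; the model, logger output, and try/catch shape are assumptions, not the asker's actual code:

import { ChatOpenAI } from "langchain/chat_models/openai";
import type { LLMResult } from "langchain/schema";

const model = new ChatOpenAI({
  modelName: "gpt-3.5-turbo",
  callbacks: [
    {
      handleLLMEnd: async (output: LLMResult) => {
        try {
          const { totalTokens } = output.llmOutput!.tokenUsage;
          console.log("totalTokens", totalTokens);
        } catch (e) {
          // Calls made through RetrievalQAChain land here: llmOutput has no tokenUsage.
          console.warn("tokenUsage not available for this call");
        }
      },
    },
  ],
});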