How to get token count: callbacks work for ChatOpenAI but not for RetrievalQAChain #965

Closed
theTechGoose opened this issue Apr 23, 2023 · 37 comments · Fixed by #5761

Comments

@theTechGoose

I am trying to get a token count for a process. I am passing callbacks to the class initialization like this:

let finalTokens = 0
const initPayload = {
  openAIApiKey: process.env['OPEN_AI_KEY'],
  temperature: 1.5,
  callbacks: [
    {
      handleLLMEnd: (val) => {
        try {
          const tokens = val.llmOutput.tokenUsage.totalTokens
          finalTokens += tokens
          console.log({tokens, finalTokens})
        } catch {
          console.log(val.generations[0])
        }
      },
    },
  ],
};

However, all of the calls from a RetrievalQAChain end up in the catch portion of that try-catch block, as tokenUsage does not exist for those calls. Can someone point me in the right direction?
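
For reference, a defensive variant of the same callback that only accumulates when tokenUsage is actually present (this just avoids the try-catch; it does not change which chains populate the field):

let finalTokens = 0
const initPayload = {
  openAIApiKey: process.env['OPEN_AI_KEY'],
  temperature: 1.5,
  callbacks: [
    {
      handleLLMEnd: (val) => {
        // tokenUsage may be missing for some chains, so guard before accumulating
        const tokens = val?.llmOutput?.tokenUsage?.totalTokens ?? 0
        finalTokens += tokens
        console.log({ tokens, finalTokens })
      },
    },
  ],
};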

@miekassu

miekassu commented Apr 28, 2023

I'm having a similar issue.
When I define gpt-3.5-turbo as the model for the OpenAI constructor, llmOutput is missing the tokenUsage object.

Using the same constructor without specifying the model returns token usage as part of llmOutput.

Not working:

  const model = new OpenAI({
    openAIApiKey: openAISecret,
    modelName: 'gpt-3.5-turbo',
    callbacks: [
      {
        handleLLMEnd: async (output: LLMResult) => {
          logger.info('output', { output })
          logger.info('tokenUsage', { tokenUsage: output.llmOutput })
          // tokenUsage: UNDEFINED
        },
      },
    ],
  })

Working:

  const model = new OpenAI({
    openAIApiKey: openAISecret,
    callbacks: [
      {
        handleLLMEnd: async (output: LLMResult) => {
          logger.info('output', { output })
          logger.info('tokenUsage', { tokenUsage: output.llmOutput })
          // tokenUsage: found
        },
      },
    ],
  })

I opened my own issue for this.

@theTechGoose
Author

theTechGoose commented Apr 28, 2023 via email

@ciocan

ciocan commented May 4, 2023

Use it this way:

import { ChatOpenAI } from "langchain/chat_models/openai";

const llm = new ChatOpenAI({ modelName: "gpt-3.5-turbo" });

I found that importing it like that returns the tokenUsage in the handleLLMEnd handler.
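
A minimal end-to-end sketch of that wiring (assuming a recent langchain version where .invoke accepts a plain string; for non-streaming calls tokenUsage should then show up in the handler):

import { ChatOpenAI } from "langchain/chat_models/openai";

const llm = new ChatOpenAI({
  modelName: "gpt-3.5-turbo",
  callbacks: [
    {
      handleLLMEnd: (output) => {
        // Populated for non-streaming chat completions
        console.log(output.llmOutput?.tokenUsage);
      },
    },
  ],
});

await llm.invoke("Say hello");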

@ibrahimyaacob92

Still waiting for a solution to this.

@pond918

pond918 commented Jul 9, 2023

Tested with the Azure API:

curl -X POST -H 'Content-type: application/json' -H 'User-Agent: OpenAI/NodeJS/3.3.0' -H 'api-key: xxxxx' --data '{"model":"gpt-3.5-turbo","temperature":0.7,"top_p":1,"frequency_penalty":0,"presence_penalty":0,"n":1,"stream":false,"messages":[{"role":"user","content":"!"}]}' 'https://{azureApiInstanceName}.openai.azure.com/openai/deployments/{azureOpenAIApiDeploymentName}/chat/completions?api-version=2023-05-15'

stream=false returns usage data, works as expected.
stream=true returns "usage": null.

@j1philli

I am also running into this. There doesn't seem to be any way to grab cost, or at least token usage, when calling chains or agents. Having an output after a chain or agent finishes with total usage would be great.

@fvisticot

Same issue, tokenUsage is not returned when using the OpenAI() model.

@nikorter

Same problem here: when streaming is set to true it doesn't return token usage. Any ideas for a workaround?

@mchalapuk

mchalapuk commented Aug 17, 2023

Hello everyone,

I recently started working on a stealth startup, and I'm using langchainjs as a core component of our tech stack. I must say, I've been impressed with the work done here! Thank you so much for all your hard work on this project, and for providing tools that startups like mine can rely on!

While integrating the library, I noticed the lack of token statistics when using ChatOpenAI in streaming mode. I did some digging in the code and I believe I found the source of the problem.

In the _generate method of the ChatOpenAI class, it's expected that data.usage contains the completion_tokens, prompt_tokens, and total_tokens fields, which are later copied to tokenUsage. When ChatOpenAI is instantiated with streaming: false, the response.data field from the call to OpenAIApi.createChatCompletion is returned as data. response.data is an instance of Completion, which indeed contains usage with the required fields. That's why token usage works with streaming: false. When ChatOpenAI is instantiated with streaming: true, the response object is not created by OpenAIApi but in the code of _generate instead. This branch of the implementation doesn't set the usage field at all.

I believe that adding the required fields using .getNumTokensFromMessages(...) might address this.

// EDIT 2023-08-17 8:50 CET

I did some more digging. It's not as simple as I thought. Using .getNumTokensFromMessages(...) would introduce two more calls to the OpenAI API. Using it to get tokenUsage for each call with streaming: true would introduce additional cost for all users of the library, even if they don't care about token usage.

It turns out that the original langchain implementation has the same problem. When streaming=True, the ChatResult instance is created without the llm_output field, which contains the token usage stats.

Both implementations are actually correct, as the source of the problem lies within the OpenAI API. When streaming is enabled, the token usage statistics are not sent to the client at all. What is sent is a stream of chat.completion.chunk objects that don't contain any token information.
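
Roughly, the difference in the raw API payloads looks like this (shapes abbreviated; these come from the OpenAI API itself, not from langchainjs):

// Non-streaming: a single chat.completion object that includes usage
// { "object": "chat.completion", "choices": [...], "usage": { "prompt_tokens": 9, "completion_tokens": 12, "total_tokens": 21 } }

// Streaming: a sequence of chat.completion.chunk objects with no usage field (at the time of this comment)
// { "object": "chat.completion.chunk", "choices": [{ "delta": { "content": "Hel" } }] }
// { "object": "chat.completion.chunk", "choices": [{ "delta": { "content": "lo" } }] }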

@MirzaHasnat

MirzaHasnat commented Sep 12, 2023

Did anyone find a solution for this?

@ankitruong

ankitruong commented Sep 23, 2023

I think the reason is that the gpt-3.5-turbo model can only be used with chat models.

curl https://api.openai.com/v1/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer ...." \
  -d '{
    "model": "gpt-3.5-turbo",         
    "prompt": "Say this is a test",
    "max_tokens": 7,
    "temperature": 0
  }'
{
  "error": {
    "message": "This is a chat model and not supported in the v1/completions endpoint. Did you mean to use v1/chat/completions?",
    "type": "invalid_request_error",
    "param": "model",
    "code": null
  }
}

I had to update my old code from 'OpenAI' to 'ChatOpenAI', and that fixed the issue.

// old
// const model = new OpenAI({ temperature: 0, openAIApiKey: KEY, modelName: "gpt-3.5-turbo" });
// new 
const model = new ChatOpenAI({ temperature: 0, openAIApiKey: KEY, modelName: "gpt-3.5-turbo" });

const prompt = PromptTemplate.fromTemplate(
  "What is a good name for a company that makes {product}?"
);

const chain = new LLMChain({ llm: model, prompt }); 

const resA2 = await chain.run("colorful socks", {callbacks: [{
  handleLLMEnd: (output, runId, parentRunId?, tags?) => {
    const { completionTokens, promptTokens, totalTokens } =
      output.llmOutput?.tokenUsage; 
    console.log(completionTokens ?? 0);
    console.log(promptTokens ?? 0);
    console.log(totalTokens ?? 0);
    
// "llmOutput": {
//     "tokenUsage": {
//       "completionTokens": 3,
//       "promptTokens": 20,
//       "totalTokens": 23
//     }
//   }
  },
}]});

@liowalex

liowalex commented Sep 27, 2023

I managed to count tokens for streaming: true by using callbacks:

const model = new ChatOpenAI({ modelName: "gpt-3.5-turbo", streaming: true });
const chain = new LLMChain({ llm: model, prompt })
const { text: assistantResponse } = await chain.call({
    query: query,
  }, {
    callbacks: [
      {
        handleChatModelStart: async (llm, messages) => {
          const tokenCount = tokenCounter(messages[0][0].content);
          // The prompt is available here: messages[0][0].content
        },
        handleChainEnd: async (outputs) => {
          const { text: outputText } = outputs;
          // outputText is the response from the chat call
          const tokenCount = tokenCounter(outputText);
        }
      }
    ]
  }
);

@jwilger

jwilger commented Oct 9, 2023

(Quoting @liowalex's streaming callback workaround above.)

Doesn't that only account for the initial prompt and the final response (not any intermediate calls for functions, etc)?

@girithodu

(Quoting @ankitruong's OpenAI to ChatOpenAI fix above.)

This solved the issue for me too!

@brokenfiles

Any news on this?

I still get an empty object for the token usage with streaming mode enabled.

@rutwikpulseenergy

Hi thread, I am using the TypeScript SDK of LangChain. I am still receiving a 0 token count. Can you please help here?

@rutwikpulseenergy

rutwikpulseenergy commented Apr 24, 2024

@jacoblee93 Any help here?

@rutwikpulseenergy

@hwchase17 @nfcampos @bracesproul @sullivan-sean Any help here?

@clovisrodriguez

Yes, I'm experiencing the same issue here. The token counter doesn't seem to be working for agents; I'm getting all token counts as 0. Here is the code I'm using and the log I'm getting back:

Package Version: 1.36.0
V8 and Chromium: Node: 20.9.0; Chromium: 122

import { ChatOpenAI } from "@langchain/openai";
import { ChatPromptTemplate, MessagesPlaceholder } from 'langchain/prompts';
import { TavilySearchResults } from "@langchain/community/tools/tavily_search";
import { AgentExecutor, createOpenAIToolsAgent } from "langchain/agents";


// Define the tools the agent will have access to.
const tools = [new TavilySearchResults({ maxResults: 1, apiKey: 'MY-API-KEY' })];

const llm = new ChatOpenAI({
  modelName: "gpt-4-turbo",
  temperature: 0.15,
  maxRetries: 3,
  timeout: 30000,
  callbacks: [
    {
      handleLLMEnd(output) {
        console.log(output)
        output.generations.map(generation => {
          generation.map(g => {
            // console.log(g.message.response_metadata.tokenUsage)
          })
        })
      },
    }
  ]
});

const prompt = ChatPromptTemplate.fromMessages([
        [
            'system',
            `You are a virtual agent`,
        ],
        new MessagesPlaceholder({
            variableName: 'chat_history',
            optional: true,
        }),
        ['user', '{input}'],
        new MessagesPlaceholder({
            variableName: 'agent_scratchpad',
            optional: false,
        }),
    ]);

const agent = await createOpenAIToolsAgent({
  llm,
  tools,
  prompt,
});

const agentExecutor = new AgentExecutor({
  agent,
  tools,
});

const result = await agentExecutor.invoke({
  input: "what is LangChain?, describe it in a sentence",
});

console.log(result);

The output

{
  generations: [
    [
      ChatGenerationChunk {
        text: 'LangChain is a software library designed to facilitate the development of applications that integrate language models, providing tools and frameworks to streamline the process of building AI-powered language understanding and generation features.',
        generationInfo: {
          prompt: 0,
          completion: 0,
          finish_reason: 'stop'
        },
        message: AIMessageChunk {
          lc_serializable: true,
          lc_kwargs: {
            content: 'LangChain is a software library designed to facilitate the development of applications that integrate language models, providing tools and frameworks to streamline the process of building AI-powered language understanding and generation features.',
            additional_kwargs: {},
            response_metadata: {
              prompt: 0,
              completion: 0,
              finish_reason: 'stop'
            },
            tool_call_chunks: [],
            tool_calls: [],
            invalid_tool_calls: []
          },
          lc_namespace: [ 'langchain_core', 'messages' ],
          content: 'LangChain is a software library designed to facilitate the development of applications that integrate language models, providing tools and frameworks to streamline the process of building AI-powered language understanding and generation features.',
          name: undefined,
          additional_kwargs: {},
          response_metadata: {
            prompt: 0,
            completion: 0,
            finish_reason: 'stop'
          },
          tool_calls: [],
          invalid_tool_calls: [],
          tool_call_chunks: []
        },
        __proto__: {
          constructor: ƒ ChatGenerationChunk(),
          concat: ƒ concat()
        }
      }
    ]
  ]
}

@theTechGoose
Author

theTechGoose commented Apr 30, 2024 via email

@gkhngyk
Contributor

gkhngyk commented May 18, 2024

(Quoting @clovisrodriguez's agent example and output above.)

Same issue.

@gkhngyk
Contributor

gkhngyk commented May 18, 2024

@bracesproul Brace, I think the 0-token issue is a very serious problem; any chance you can look into it?

@clovisrodriguez

Hey community, I created this counter. It might not be perfect, but I tested it against LangSmith and it gets a pretty close count. If you have any ideas to improve it, you are more than welcome to. I hope you find it useful:

import { encodingForModel } from 'js-tiktoken';

export class TokenCounter {
    private _totalTokens: number = 0;
    private _promptTokens: number = 0;
    private _completionTokens: number = 0;
    private _enc: any;

    constructor(model) {
        this._enc = encodingForModel(model);
    }

    encodeAndCountTokens(text: string): number {
        return this._enc.encode(text).length;
    }

    handleLLMEnd(result: any) {
        result.generations.forEach((generation: any) => {
            const content = generation[0]?.message?.text || '';
            const calls = generation[0]?.message?.additional_kwargs || '';
            console.log('Calls & Content:', {
                calls,
                content,
            });
            const output = JSON.stringify(calls, null, 2);
            const tokens = this.encodeAndCountTokens(content + output);
            this._completionTokens += tokens;
        });
        console.log('Tokens for this LLMEnd:', this._completionTokens);
    }

    handleChatModelStart(_, args) {
        args[0].forEach((arg) => {
            const content = arg?.content || '';
            const calls = arg?.additional_kwargs || '';

            const tokens = this.encodeAndCountTokens(
                content + JSON.stringify(calls, null, 2),
            );
            this._promptTokens += tokens;
            console.log('content:', content, calls);
        });

        console.log('Tokens for this ChatModelStart:', this._promptTokens);
    }

    modelTracer() {
        return {
            handleChatModelStart: this.handleChatModelStart.bind(this),
            handleLLMEnd: this.handleLLMEnd.bind(this),
        };
    }

    sumTokens() {
        this._totalTokens = this._promptTokens + this._completionTokens;
        console.log('Total Tokens:', this._totalTokens);
    }
}
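
A usage sketch for the class above (the model name and prompt are illustrative; modelTracer() just returns the two handlers so they can be dropped into a callbacks array):

import { ChatOpenAI } from '@langchain/openai';

// Assumes TokenCounter from above is in scope
const counter = new TokenCounter('gpt-4');
const llm = new ChatOpenAI({ modelName: 'gpt-4', streaming: true });

const result = await llm.invoke('What is LangChain? Describe it in a sentence.', {
  callbacks: [counter.modelTracer()],
});

// Logs the locally counted prompt + completion totals
counter.sumTokens();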

@gkhngyk
Contributor

gkhngyk commented May 22, 2024

(Quoting @clovisrodriguez's TokenCounter class above.)

I will try your solution as soon as possible, thank you very much.

The problem is that LangSmith often shows 0 tokens. This makes a very important piece of LangSmith functionality unusable due to this problem in LangChain. I hope @bracesproul or @jacoblee93 will look into this issue.

@jacoblee93
Collaborator

Yes, will fix this as OpenAI recently added support. There is an open PR here #5485

@zaiddabaeen

zaiddabaeen commented Jun 2, 2024

Hey @jacoblee93. I just tested Release 0.2.4 and it still does not show the token usage when using RunnableSequence.

The code:

    const llm = new ChatOpenAI({ modelName: "gpt-3.5-turbo", temperature: 0.0 });
    const vectorStore = await FaissStore.load(`data/search_index_${projectId}.pkl`, new OpenAIEmbeddings());
    const vectorStoreRetriever = vectorStore.asRetriever();

    const SYSTEM_TEMPLATE = `...`;
    const messages = [
      SystemMessagePromptTemplate.fromTemplate(SYSTEM_TEMPLATE),
      HumanMessagePromptTemplate.fromTemplate("{question}"),
    ];
    const prompt = ChatPromptTemplate.fromMessages(messages);

    const chain = RunnableSequence.from([
    {
      sourceDocuments: RunnableSequence.from([
        (input) => input.question,
        vectorStoreRetriever,
      ]),
      question: (input) => input.question,
    },
    {
      sourceDocuments: (previousStepResult) => previousStepResult.sourceDocuments,
      question: (previousStepResult) => previousStepResult.question,
      context: (previousStepResult) =>
        formatDocumentsAsString(previousStepResult.sourceDocuments),
    },
    {
      result: prompt.pipe(llm).pipe(new StringOutputParser()),
      sourceDocuments: (previousStepResult) => previousStepResult.sourceDocuments,
    },
  ]);

  return await chain.stream({question: question}, {
    callbacks: [
      {
        handleLLMEnd(output: LLMResult, runId: string, parentRunId?: string, tags?: string[]): any {
          output.generations.map((g) => console.log(JSON.stringify(g, null, 2)));
        }
      }
    ]
  });

The output is as follows:

 [
   {
     "text": "<the loooong answer goes here>",
     "generationInfo": {
       "prompt": 0,
       "completion": 0,
       "finish_reason": "stop"
     },
     "message": {
       "lc": 1,
       "type": "constructor",
       "id": [
         "langchain_core",
         "messages",
         "AIMessageChunk"
       ],
       "kwargs": {
         "content": "<the loooong answer goes here>",
         "additional_kwargs": {},
         "response_metadata": {
           "prompt": 0,
           "completion": 0,
           "finish_reason": "stop"
         },
         "tool_call_chunks": [],
         "tool_calls": [],
         "invalid_tool_calls": []
       }
     }
   }
 ]

Am I looking for the token count in the wrong place? Or has it not been implemented yet to provide the token count in the handleLLMEnd callback?

@jacoblee93
Collaborator

Can you verify you're on the latest versions of core and LangChain OpenAI?

https://js.langchain.com/v0.2/docs/how_to/installation/#installing-integration-packages

Otherwise will check tomorrow

@zaiddabaeen

zaiddabaeen commented Jun 3, 2024

Yes definitely:

❯ npm list
aidocs@1.0.0 ...
├── @langchain/community@0.2.5
├── @langchain/core@0.2.5
├── @langchain/openai@0.1.1
...
├── langchain@0.2.4
...

All these packages are on latest.

@niztal

niztal commented Jun 7, 2024

I see OpenAI just released an update for that:

https://cookbook.openai.com/examples/how_to_stream_completions#4-how-to-get-token-usage-data-for-streamed-chat-completion-response

and it seems like it was already done via this PR

@rrichc

rrichc commented Jun 7, 2024

EDIT: I should add that I'm using LangChain Agents. I'm guessing support for token usage hasn't reached that yet.

Unfortunately I am also on the latest packages, and get a 0 token count even for the last chunk that is supposed to contain usage. Zero counts happen for the handleLLMEnd callback, the last message of .streamEvents, and the .invoke response.

{
  "generations": [
    [
      {
        "text": "Hi there! How can I assist you today?",
        "generationInfo": {
          "prompt": 0,
          "completion": 0,
          "finish_reason": "stop"
        },
        "message": {
          "lc": 1,
          "type": "constructor",
          "id": [
            "langchain_core",
            "messages",
            "AIMessageChunk"
          ],
          "kwargs": {
            "content": "Hi there! How can I assist you today?",
            "additional_kwargs": {},
            "response_metadata": {
              "prompt": 0,
              "completion": 0,
              "finish_reason": "stop"
            },
            "tool_call_chunks": [],
            "tool_calls": [],
            "invalid_tool_calls": []
          }
        }
      }
    ]
  ]
}
{
  "generations": [
    [
      {
        "text": "removed for readability",
        "generationInfo": {
          "prompt": 0,
          "completion": 0,
          "finish_reason": "stop"
        },
        "message": {
          "lc": 1,
          "type": "constructor",
          "id": [
            "langchain_core",
            "messages",
            "AIMessageChunk"
          ],
          "kwargs": {
            "content": "removed for readability",
            "additional_kwargs": {},
            "response_metadata": {
              "prompt": 0,
              "completion": 0,
              "finish_reason": "stop"
            },
            "tool_call_chunks": [],
            "tool_calls": [],
            "invalid_tool_calls": []
          }
        }
      }
    ]
  ]
}
├── @langchain/community@0.2.9
├── @langchain/core@0.2.6
├── @langchain/openai@0.1.2
├── langchain@0.2.5

@jacoblee93
Collaborator

Ah @rrichc, @zaiddabaeen, @gkhngyk and others, you'll need to pass stream_options like this:

  const response = await model.stream("Hello, how are you?", {
    stream_options: {
      include_usage: true,
    },
  });

You can also bind it to the model like this:

  const modelWithUsage = model.bind({
    stream_options: {
      include_usage: true,
    },
  });

import { StringOutputParser } from "@langchain/core/output_parsers";
import { ChatPromptTemplate } from "@langchain/core/prompts";
import { RunnableSequence } from "@langchain/core/runnables";
import { ChatOpenAI, OpenAIEmbeddings } from "@langchain/openai";
import { formatDocumentsAsString } from "langchain/util/document";
import { MemoryVectorStore } from "langchain/vectorstores/memory";

const llm = new ChatOpenAI({ modelName: "gpt-3.5-turbo", temperature: 0.0 }).bind({
  stream_options: {
    include_usage: true
  }
});
const vectorStore = await MemoryVectorStore.fromTexts([], [], new OpenAIEmbeddings());
const vectorStoreRetriever = vectorStore.asRetriever();

const SYSTEM_TEMPLATE = `You are a pro at responding to questions.`;
const prompt = ChatPromptTemplate.fromMessages([
  ["system", SYSTEM_TEMPLATE],
  ["human", "{question}"],
]);

const chain = RunnableSequence.from([
{
  sourceDocuments: RunnableSequence.from([
    (input) => input.question,
    vectorStoreRetriever,
  ]),
  question: (input) => input.question,
},
{
  sourceDocuments: (previousStepResult) => previousStepResult.sourceDocuments,
  question: (previousStepResult) => previousStepResult.question,
  context: (previousStepResult) =>
    formatDocumentsAsString(previousStepResult.sourceDocuments),
},
{
  result: prompt.pipe(llm).pipe(new StringOutputParser()),
  sourceDocuments: (previousStepResult) => previousStepResult.sourceDocuments,
},
]);

const stream = await chain.stream({question: "Who is this about?"}, {
  callbacks: [
    {
      handleLLMEnd(output: any, runId: string, parentRunId?: string, tags?: string[]): any {
        output.generations.map((g: any) => console.log(JSON.stringify(g, null, 2)));
        /*
          [
            {
              "text": "I'm here to provide information and assistance on a wide range of topics. Feel free to ask me anything you'd like to know more about!",
              "generationInfo": {
                "prompt": 0,
                "completion": 0,
                "finish_reason": "stop"
              },
              "message": {
                "lc": 1,
                "type": "constructor",
                "id": [
                  "langchain_core",
                  "messages",
                  "AIMessageChunk"
                ],
                "kwargs": {
                  "content": "I'm here to provide information and assistance on a wide range of topics. Feel free to ask me anything you'd like to know more about!",
                  "additional_kwargs": {},
                  "response_metadata": {
                    "prompt": 0,
                    "completion": 0,
                    "finish_reason": "stop"
                  },
                  "tool_call_chunks": [],
                  "usage_metadata": {
                    "input_tokens": 25,
                    "output_tokens": 29,
                    "total_tokens": 54
                  },
                  "tool_calls": [],
                  "invalid_tool_calls": []
                }
              }
            }
          ]
        */
      }
    }
  ]
});

for await (const chunk of stream) {
  // console.log(chunk);
}

We don't pass this through by default - maybe we should reconsider. CC @ccurme.

We should also fix that misleading blank response_metadata field.

@jacoblee93
Collaborator

Ah, never mind. That data does not refer to usage. Will close this; please reopen if the above setting doesn't fix it.

Docs are here: https://js.langchain.com/v0.2/docs/how_to/chat_token_usage_tracking#openai-2
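
A quick sanity check along the lines of those docs (a sketch assuming @langchain/core >= 0.2, where usage also comes back as usage_metadata on the returned AIMessage; streamed calls still need the stream_options / include_usage setting shown above):

import { ChatOpenAI } from "@langchain/openai";

const llm = new ChatOpenAI({ modelName: "gpt-3.5-turbo" });
const res = await llm.invoke("Hello!");
// Standardized usage field on the AIMessage, e.g. { input_tokens, output_tokens, total_tokens }
console.log(res.usage_metadata);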

@rrichc

rrichc commented Jun 8, 2024

@jacoblee93 Is there a proper way to pass stream_options to streamEvents? Like

      const eventStream = await this.agent.streamEvents(
        {
          input: input,
          chat_history: await this.zepMemory.chatHistory.getMessages(),
          signal: signal,
          stream_options: {
            include_usage: true,
          },
        },
        {
          version: "v1",
          callbacks: agentCallbacks,
        }
      );

I'd also like to use the model.bind approach, but after binding I'm left with a Runnable that I can't feed into createToolCallingAgent, which expects a BaseChatModel. I'm still using AgentExecutor and haven't been able to migrate to LCEL yet.

@deranga
Contributor

deranga commented Jun 13, 2024

As a temporary fix @rrichc, it's possible to apply this patch and then set stream_options when instantiating the ChatOpenAI class.

@langchain+openai+0.1.3.patch

const llm = new ChatOpenAI({
      model: "",
      temperature: 0,
      streaming: true,
      verbose: true,
      stream_options: {
        include_usage: true,
      },
    });

const agent = await createOpenAIToolsAgent({
      llm,
      tools,
      prompt,
    });

@jacoblee93
Collaborator

jacoblee93 commented Jun 13, 2024

Oh dear. Yeah I will fix that typing - sorry about this!

That patch would be a good PR too if you're willing to push one up @deranga, thank you for making it.

@deranga
Contributor

deranga commented Jun 14, 2024

@jacoblee93, I've just created a PR as mentioned above.

@akmaldira

akmaldira commented Jul 4, 2024

@clovisrodriguez

Have you solved this?

And by the way, does that provide a breakdown of token usage for each step in an AgentExecutor?
