How to get token count: callbacks work for ChatOpenAI but not for RetrievalQAChain #965

Closed
theTechGoose opened this issue Apr 23, 2023 · 37 comments · Fixed by #5761

Comments

@theTechGoose

I am trying to get a token count for a process. I am passing callbacks to the class initialization like this:

let finalTokens = 0
const initPayload = {
  openAIApiKey: process.env['OPEN_AI_KEY'],
  temperature: 1.5,
  callbacks: [
    {
      handleLLMEnd: (val) => {
        try {
          const tokens = val.llmOutput.tokenUsage.totalTokens
          finalTokens += tokens
          console.log({tokens, finalTokens})
        } catch {
          console.log(val.generations[0])
        }
      },
    },
  ],
};

However, all of the calls from a RetrievalQAChain end up in the catch portion of that try-catch block, as tokenUsage does not exist for those calls. Can someone point me in the right direction?
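
For reference, a defensive variant of the same callback that only accumulates when tokenUsage is actually present (this just avoids the try-catch; it does not change which chains populate the field):

let finalTokens = 0
const initPayload = {
  openAIApiKey: process.env['OPEN_AI_KEY'],
  temperature: 1.5,
  callbacks: [
    {
      handleLLMEnd: (val) => {
        // tokenUsage may be missing for some chains, so guard before accumulating
        const tokens = val?.llmOutput?.tokenUsage?.totalTokens ?? 0
        finalTokens += tokens
        console.log({ tokens, finalTokens })
      },
    },
  ],
};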

@miekassu

miekassu commented Apr 28, 2023

I'm having a similar issue.
When I define gpt-3.5-turbo as the model for the OpenAI constructor, llmOutput is missing the tokenUsage object.

Using the same constructor without specifying the model returns token usage as part of llmOutput.

Not working:

  const model = new OpenAI({
    openAIApiKey: openAISecret,
    modelName: 'gpt-3.5-turbo',
    callbacks: [
      {
        handleLLMEnd: async (output: LLMResult) => {
          logger.info('output', { output })
          logger.info('tokenUsage', { tokenUsage: output.llmOutput })
          // tokenUsage: UNDEFINED
        },
      },
    ],
  })

Working:

  const model = new OpenAI({
    openAIApiKey: openAISecret,
    callbacks: [
      {
        handleLLMEnd: async (output: LLMResult) => {
          logger.info('output', { output })
          logger.info('tokenUsage', { tokenUsage: output.llmOutput })
          // tokenUsage: found
        },
      },
    ],
  })

I opened my own issue for this.

@theTechGoose
Author

theTechGoose commented Apr 28, 2023 via email

@ciocan

ciocan commented May 4, 2023

Use it this way:

import { ChatOpenAI } from "langchain/chat_models/openai";

const llm = new ChatOpenAI({ modelName: "gpt-3.5-turbo" });

I found that importing it like that returns the tokenUsage in the handleLLMEnd handler.
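
A minimal end-to-end sketch of that wiring (assuming a recent langchain version where .invoke accepts a plain string; for non-streaming calls tokenUsage should then show up in the handler):

import { ChatOpenAI } from "langchain/chat_models/openai";

const llm = new ChatOpenAI({
  modelName: "gpt-3.5-turbo",
  callbacks: [
    {
      handleLLMEnd: (output) => {
        // Populated for non-streaming chat completions
        console.log(output.llmOutput?.tokenUsage);
      },
    },
  ],
});

await llm.invoke("Say hello");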

@ibrahimyaacob92

Still waiting for a solution to this.

@pond918

pond918 commented Jul 9, 2023

Tested with the Azure API:

curl -X POST -H 'Content-type: application/json' -H 'User-Agent: OpenAI/NodeJS/3.3.0' -H 'api-key: xxxxx' --data '{"model":"gpt-3.5-turbo","temperature":0.7,"top_p":1,"frequency_penalty":0,"presence_penalty":0,"n":1,"stream":false,"messages":[{"role":"user","content":"!"}]}' 'https://{azureApiInstanceName}.openai.azure.com/openai/deployments/{azureOpenAIApiDeploymentName}/chat/completions?api-version=2023-05-15'

stream=false returns usage data, works as expected.
stream=true returns "usage": null.

@j1philli

I am also running into this. There doesn't seem to be any way to grab cost, or at least token usage, when calling chains or agents. Having an output after a chain or agent finishes with total usage would be great.

@fvisticot

Same issue, tokenUsage is not returned when using the OpenAI() model.

@nikorter

Same problem here: when streaming is set to true it doesn't return token usage. Any ideas for a workaround?

@mchalapuk

mchalapuk commented Aug 17, 2023

Hello everyone,

I recently started working on a stealth startup, and I'm using langchainjs as a core component of our tech stack. I must say, I've been impressed with the work done here! Thank you so much for all your hard work on this project, and for providing tools that startups like mine can rely on!

While integrating the library, I noticed the lack of token statistics when using ChatOpenAI in streaming mode. I did some digging in the code and I believe I found the source of the problem.

In the _generate method of the ChatOpenAI class, it's expected that data.usage contains the completion_tokens, prompt_tokens, and total_tokens fields, which are later copied to tokenUsage. When ChatOpenAI is instantiated with streaming: false, the response.data field from the call to OpenAIApi.createChatCompletion is returned as data. response.data is an instance of Completion, which indeed contains usage with the required fields. That's why token usage works with streaming: false. When ChatOpenAI is instantiated with streaming: true, the response object is not created by OpenAIApi but in the code of _generate instead. This branch of the implementation doesn't set the usage field at all.

I believe that adding the required fields using .getNumTokensFromMessages(...) might address this.

// EDIT 2023-08-17 8:50 CET

I did some more digging. It's not as simple as I thought. Using .getNumTokensFromMessages(...) would introduce two more calls to the OpenAI API. Using it to get tokenUsage for each call with streaming: true would introduce additional cost for all users of the library, even if they don't care about token usage.

It turns out that the original langchain implementation has the same problem. When streaming=True, the ChatResult instance is created without the llm_output field, which contains the token usage stats.

Both implementations are actually correct, as the source of the problem lies within the OpenAI API. When streaming is enabled, the token usage statistics are not sent to the client at all. What is sent is a stream of chat.completion.chunk objects that don't contain any token information.
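
Roughly, the difference in the raw API payloads looks like this (shapes abbreviated; these come from the OpenAI API itself, not from langchainjs):

// Non-streaming: a single chat.completion object that includes usage
// { "object": "chat.completion", "choices": [...], "usage": { "prompt_tokens": 9, "completion_tokens": 12, "total_tokens": 21 } }

// Streaming: a sequence of chat.completion.chunk objects with no usage field (at the time of this comment)
// { "object": "chat.completion.chunk", "choices": [{ "delta": { "content": "Hel" } }] }
// { "object": "chat.completion.chunk", "choices": [{ "delta": { "content": "lo" } }] }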

@MirzaHasnat

MirzaHasnat commented Sep 12, 2023

Did anyone find a solution for this?

@ankitruong

ankitruong commented Sep 23, 2023

I think the reason is that the gpt-3.5-turbo model can only be used with chat models.

curl https://api.openai.com/v1/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer ...." \
  -d '{
    "model": "gpt-3.5-turbo",         
    "prompt": "Say this is a test",
    "max_tokens": 7,
    "temperature": 0
  }'
{
  "error": {
    "message": "This is a chat model and not supported in the v1/completions endpoint. Did you mean to use v1/chat/completions?",
    "type": "invalid_request_error",
    "param": "model",
    "code": null
  }
}

I had to update my old code from 'OpenAI' to 'ChatOpenAI', and that fixed the issue.

// old
// const model = new OpenAI({ temperature: 0, openAIApiKey: KEY, modelName: "gpt-3.5-turbo" });
// new 
const model = new ChatOpenAI({ temperature: 0, openAIApiKey: KEY, modelName: "gpt-3.5-turbo" });

const prompt = PromptTemplate.fromTemplate(
  "What is a good name for a company that makes {product}?"
);

const chain = new LLMChain({ llm: model, prompt }); 

const resA2 = await chain.run("colorful socks", {callbacks: [{
  handleLLMEnd: (output, runId, parentRunId?, tags?) => {
    const { completionTokens, promptTokens, totalTokens } =
      output.llmOutput?.tokenUsage; 
    console.log(completionTokens ?? 0);
    console.log(promptTokens ?? 0);
    console.log(totalTokens ?? 0);
    
// "llmOutput": {
//     "tokenUsage": {
//       "completionTokens": 3,
//       "promptTokens": 20,
//       "totalTokens": 23
//     }
//   }
  },
}]});

@liowalex

liowalex commented Sep 27, 2023

I managed to count tokens for streaming: true by using callbacks:

const model = new ChatOpenAI({ modelName: "gpt-3.5-turbo", streaming: true });
const chain = new LLMChain({ llm: model, prompt })
const { text: assistantResponse } = await chain.call({
    query: query,
  }, {
    callbacks: [
      {
        handleChatModelStart: async (llm, messages) => {
          const tokenCount = tokenCounter(messages[0][0].content);
          // The prompt is available here: messages[0][0].content
        },
        handleChainEnd: async (outputs) => {
          const { text: outputText } = outputs;
          // outputText is the response from the chat call
          const tokenCount = tokenCounter(outputText);
        }
      }
    ]
  }
);

@jwilger

jwilger commented Oct 9, 2023

(Quoting @liowalex's streaming callback workaround above.)

Doesn't that only account for the initial prompt and the final response (not any intermediate calls for functions, etc)?

@girithodu

(Quoting @ankitruong's OpenAI to ChatOpenAI fix above.)

This solved the issue for me too!

@brokenfiles

Any news on this?

I still get an empty object for the token usage with streaming mode enabled.

@rutwikpulseenergy

Hi thread, I am using the TypeScript SDK of LangChain. I am still receiving a 0 token count. Can you please help here?

@rutwikpulseenergy

rutwikpulseenergy commented Apr 24, 2024

@jacoblee93 Any help here?

@rutwikpulseenergy

@hwchase17 @nfcampos @bracesproul @sullivan-sean Any help here?

@clovisrodriguez

Yes, I'm experiencing the same issue here. The token counter doesn't seem to be working for agents; I'm getting all token counts as 0. Here is the code I'm using and the log I'm getting back:

Package Version: 1.36.0
V8 and Chromium: Node: 20.9.0; Chromium: 122

import { ChatOpenAI } from "@langchain/openai";
import { ChatPromptTemplate, MessagesPlaceholder } from 'langchain/prompts';
import { TavilySearchResults } from "@langchain/community/tools/tavily_search";
import { AgentExecutor, createOpenAIToolsAgent } from "langchain/agents";


// Define the tools the agent will have access to.
const tools = [new TavilySearchResults({ maxResults: 1, apiKey: 'MY-API-KEY' })];

const llm = new ChatOpenAI({
  modelName: "gpt-4-turbo",
  temperature: 0.15,
  maxRetries: 3,
  timeout: 30000,
  callbacks: [
    {
      handleLLMEnd(output) {
        console.log(output)
        output.generations.map(generation => {
          generation.map(g => {
            // console.log(g.message.response_metadata.tokenUsage)
          })
        })
      },
    }
  ]
});

const prompt = ChatPromptTemplate.fromMessages([
        [
            'system',
            `You are a virtual agent`,
        ],
        new MessagesPlaceholder({
            variableName: 'chat_history',
            optional: true,
        }),
        ['user', '{input}'],
        new MessagesPlaceholder({
            variableName: 'agent_scratchpad',
            optional: false,
        }),
    ]);

const agent = await createOpenAIToolsAgent({
  llm,
  tools,
  prompt,
});

const agentExecutor = new AgentExecutor({
  agent,
  tools,
});

const result = await agentExecutor.invoke({
  input: "what is LangChain?, describe it in a sentence",
});

console.log(result);

The output

{
  generations: [
    [
      ChatGenerationChunk {
        text: 'LangChain is a software library designed to facilitate the development of applications that integrate language models, providing tools and frameworks to streamline the process of building AI-powered language understanding and generation features.',
        generationInfo: {
          prompt: 0,
          completion: 0,
          finish_reason: 'stop'
        },
        message: AIMessageChunk {
          lc_serializable: true,
          lc_kwargs: {
            content: 'LangChain is a software library designed to facilitate the development of applications that integrate language models, providing tools and frameworks to streamline the process of building AI-powered language understanding and generation features.',
            additional_kwargs: {},
            response_metadata: {
              prompt: 0,
              completion: 0,
              finish_reason: 'stop'
            },
            tool_call_chunks: [],
            tool_calls: [],
            invalid_tool_calls: []
          },
          lc_namespace: [ 'langchain_core', 'messages' ],
          content: 'LangChain is a software library designed to facilitate the development of applications that integrate language models, providing tools and frameworks to streamline the process of building AI-powered language understanding and generation features.',
          name: undefined,
          additional_kwargs: {},
          response_metadata: {
            prompt: 0,
            completion: 0,
            finish_reason: 'stop'
          },
          tool_calls: [],
          invalid_tool_calls: [],
          tool_call_chunks: []
        },
        __proto__: {
          constructor: ƒ ChatGenerationChunk(),
          concat: ƒ concat()
        }
      }
    ]
  ]
}

@theTechGoose
Author

theTechGoose commented Apr 30, 2024 via email

@gkhngyk
Contributor

gkhngyk commented May 18, 2024

(Quoting @clovisrodriguez's agent example and output above.)

Same issue.

@gkhngyk
Contributor

gkhngyk commented May 18, 2024

@bracesproul Brace, I think the 0-token issue is a very serious problem; any chance you can look into it?

@clovisrodriguez

Hey community, I created this counter. It might not be perfect, but I tested it against LangSmith and it gets a pretty close count. If you have any ideas to improve it, you are more than welcome to. I hope you find it useful:

import { encodingForModel } from 'js-tiktoken';

export class TokenCounter {
    private _totalTokens: number = 0;
    private _promptTokens: number = 0;
    private _completionTokens: number = 0;
    private _enc: any;

    constructor(model) {
        this._enc = encodingForModel(model);
    }

    encodeAndCountTokens(text: string): number {
        return this._enc.encode(text).length;
    }

    handleLLMEnd(result: any) {
        result.generations.forEach((generation: any) => {
            const content = generation[0]?.message?.text || '';
            const calls = generation[0]?.message?.additional_kwargs || '';
            console.log('Calls & Content:', {
                calls,
                content,
            });
            const output = JSON.stringify(calls, null, 2);
            const tokens = this.encodeAndCountTokens(content + output);
            this._completionTokens += tokens;
        });
        console.log('Tokens for this LLMEnd:', this._completionTokens);
    }

    handleChatModelStart(_, args) {
        args[0].forEach((arg) => {
            const content = arg?.content || '';
            const calls = arg?.additional_kwargs || '';

            const tokens = this.encodeAndCountTokens(
                content + JSON.stringify(calls, null, 2),
            );
            this._promptTokens += tokens;
            console.log('content:', content, calls);
        });

        console.log('Tokens for this ChatModelStart:', this._promptTokens);
    }

    modelTracer() {
        return {
            handleChatModelStart: this.handleChatModelStart.bind(this),
            handleLLMEnd: this.handleLLMEnd.bind(this),
        };
    }

    sumTokens() {
        this._totalTokens = this._promptTokens + this._completionTokens;
        console.log('Total Tokens:', this._totalTokens);
    }
}
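
A usage sketch for the class above (the model name and prompt are illustrative; modelTracer() just returns the two handlers so they can be dropped into a callbacks array):

import { ChatOpenAI } from '@langchain/openai';

// Assumes TokenCounter from above is in scope
const counter = new TokenCounter('gpt-4');
const llm = new ChatOpenAI({ modelName: 'gpt-4', streaming: true });

const result = await llm.invoke('What is LangChain? Describe it in a sentence.', {
  callbacks: [counter.modelTracer()],
});

// Logs the locally counted prompt + completion totals
counter.sumTokens();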

@gkhngyk
Contributor

gkhngyk commented May 22, 2024

(Quoting @clovisrodriguez's TokenCounter class above.)

I will try your solution as soon as possible, thank you very much.

The problem is that LangSmith often shows 0 tokens. This makes a very important piece of LangSmith functionality unusable due to this problem in LangChain. I hope @bracesproul or @jacoblee93 will look into this issue.

@jacoblee93
Collaborator

Yes, will fix this as OpenAI recently added support. There is an open PR here #5485

@zaiddabaeen

zaiddabaeen commented Jun 2, 2024

Hey @jacoblee93. I just tested Release 0.2.4 and it still does not show the token usage when using RunnableSequence.

The code:

    const llm = new ChatOpenAI({ modelName: "gpt-3.5-turbo", temperature: 0.0 });
    const vectorStore = await FaissStore.load(`data/search_index_${projectId}.pkl`, new OpenAIEmbeddings());
    const vectorStoreRetriever = vectorStore.asRetriever();

    const SYSTEM_TEMPLATE = `...`;
    const messages = [
      SystemMessagePromptTemplate.fromTemplate(SYSTEM_TEMPLATE),
      HumanMessagePromptTemplate.fromTemplate("{question}"),
    ];
    const prompt = ChatPromptTemplate.fromMessages(messages);

    const chain = RunnableSequence.from([
    {
      sourceDocuments: RunnableSequence.from([
        (input) => input.question,
        vectorStoreRetriever,
      ]),
      question: (input) => input.question,
    },
    {
      sourceDocuments: (previousStepResult) => previousStepResult.sourceDocuments,
      question: (previousStepResult) => previousStepResult.question,
      context: (previousStepResult) =>
        formatDocumentsAsString(previousStepResult.sourceDocuments),
    },
    {
      result: prompt.pipe(llm).pipe(new StringOutputParser()),
      sourceDocuments: (previousStepResult) => previousStepResult.sourceDocuments,
    },
  ]);

  return await chain.stream({question: question}, {
    callbacks: [
      {
        handleLLMEnd(output: LLMResult, runId: string, parentRunId?: string, tags?: string[]): any {
          output.generations.map((g) => console.log(JSON.stringify(g, null, 2)));
        }
      }
    ]
  });

The output is as follows:

 [
   {
     "text": "<the loooong answer goes here>",
     "generationInfo": {
       "prompt": 0,
       "completion": 0,
       "finish_reason": "stop"
     },
     "message": {
       "lc": 1,
       "type": "constructor",
       "id": [
         "langchain_core",
         "messages",
         "AIMessageChunk"
       ],
       "kwargs": {
         "content": "<the loooong answer goes here>",
         "additional_kwargs": {},
         "response_metadata": {
           "prompt": 0,
           "completion": 0,
           "finish_reason": "stop"
         },
         "tool_call_chunks": [],
         "tool_calls": [],
         "invalid_tool_calls": []
       }
     }
   }
 ]

Am I looking for the token count in the wrong place? Or has it not been implemented yet to provide the token count in the handleLLMEnd callback?

@jacoblee93
Collaborator

Can you verify you're on the latest versions of core and LangChain OpenAI?

https://js.langchain.com/v0.2/docs/how_to/installation/#installing-integration-packages

Otherwise will check tomorrow

@zaiddabaeen

zaiddabaeen commented Jun 3, 2024

Yes definitely:

❯ npm list
aidocs@1.0.0 ...
├── @langchain/community@0.2.5
├── @langchain/core@0.2.5
├── @langchain/openai@0.1.1
...
├── langchain@0.2.4
...

All these packages are on latest.

@niztal

niztal commented Jun 7, 2024

I see OpenAI just released an update for that:

https://cookbook.openai.com/examples/how_to_stream_completions#4-how-to-get-token-usage-data-for-streamed-chat-completion-response

and it seems like it was already done via this PR

@rrichc

rrichc commented Jun 7, 2024

EDIT: I should add that I'm using LangChain Agents. I'm guessing support for token usage hasn't reached that yet.

Unfortunately I am also on the latest packages, and get a 0 token count even for the last chunk that is supposed to contain usage. Zero counts happen for the handleLLMEnd callback, the last message of .streamEvents, and the .invoke response.

{
  "generations": [
    [
      {
        "text": "Hi there! How can I assist you today?",
        "generationInfo": {
          "prompt": 0,
          "completion": 0,
          "finish_reason": "stop"
        },
        "message": {
          "lc": 1,
          "type": "constructor",
          "id": [
            "langchain_core",
            "messages",
            "AIMessageChunk"
          ],
          "kwargs": {
            "content": "Hi there! How can I assist you today?",
            "additional_kwargs": {},
            "response_metadata": {
              "prompt": 0,
              "completion": 0,
              "finish_reason": "stop"
            },
            "tool_call_chunks": [],
            "tool_calls": [],
            "invalid_tool_calls": []
          }
        }
      }
    ]
  ]
}
{
  "generations": [
    [
      {
        "text": "removed for readability",
        "generationInfo": {
          "prompt": 0,
          "completion": 0,
          "finish_reason": "stop"
        },
        "message": {
          "lc": 1,
          "type": "constructor",
          "id": [
            "langchain_core",
            "messages",
            "AIMessageChunk"
          ],
          "kwargs": {
            "content": "removed for readability",
            "additional_kwargs": {},
            "response_metadata": {
              "prompt": 0,
              "completion": 0,
              "finish_reason": "stop"
            },
            "tool_call_chunks": [],
            "tool_calls": [],
            "invalid_tool_calls": []
          }
        }
      }
    ]
  ]
}
├── @langchain/community@0.2.9
├── @langchain/core@0.2.6
├── @langchain/openai@0.1.2
├── langchain@0.2.5

@jacoblee93
Collaborator

Ah @rrichc, @zaiddabaeen, @gkhngyk and others, you'll need to pass stream_options like this:

  const response = await model.stream("Hello, how are you?", {
    stream_options: {
      include_usage: true,
    },
  });

You can also bind it to the model like this:

  const modelWithUsage = model.bind({
    stream_options: {
      include_usage: true,
    },
  });

import { StringOutputParser } from "@langchain/core/output_parsers";
import { ChatPromptTemplate } from "@langchain/core/prompts";
import { RunnableSequence } from "@langchain/core/runnables";
import { ChatOpenAI, OpenAIEmbeddings } from "@langchain/openai";
import { formatDocumentsAsString } from "langchain/util/document";
import { MemoryVectorStore } from "langchain/vectorstores/memory";

const llm = new ChatOpenAI({ modelName: "gpt-3.5-turbo", temperature: 0.0 }).bind({
  stream_options: {
    include_usage: true
  }
});
const vectorStore = await MemoryVectorStore.fromTexts([], [], new OpenAIEmbeddings());
const vectorStoreRetriever = vectorStore.asRetriever();

const SYSTEM_TEMPLATE = `You are a pro at responding to questions.`;
const prompt = ChatPromptTemplate.fromMessages([
  ["system", SYSTEM_TEMPLATE],
  ["human", "{question}"],
]);

const chain = RunnableSequence.from([
{
  sourceDocuments: RunnableSequence.from([
    (input) => input.question,
    vectorStoreRetriever,
  ]),
  question: (input) => input.question,
},
{
  sourceDocuments: (previousStepResult) => previousStepResult.sourceDocuments,
  question: (previousStepResult) => previousStepResult.question,
  context: (previousStepResult) =>
    formatDocumentsAsString(previousStepResult.sourceDocuments),
},
{
  result: prompt.pipe(llm).pipe(new StringOutputParser()),
  sourceDocuments: (previousStepResult) => previousStepResult.sourceDocuments,
},
]);

const stream = await chain.stream({question: "Who is this about?"}, {
  callbacks: [
    {
      handleLLMEnd(output: any, runId: string, parentRunId?: string, tags?: string[]): any {
        output.generations.map((g: any) => console.log(JSON.stringify(g, null, 2)));
        /*
          [
            {
              "text": "I'm here to provide information and assistance on a wide range of topics. Feel free to ask me anything you'd like to know more about!",
              "generationInfo": {
                "prompt": 0,
                "completion": 0,
                "finish_reason": "stop"
              },
              "message": {
                "lc": 1,
                "type": "constructor",
                "id": [
                  "langchain_core",
                  "messages",
                  "AIMessageChunk"
                ],
                "kwargs": {
                  "content": "I'm here to provide information and assistance on a wide range of topics. Feel free to ask me anything you'd like to know more about!",
                  "additional_kwargs": {},
                  "response_metadata": {
                    "prompt": 0,
                    "completion": 0,
                    "finish_reason": "stop"
                  },
                  "tool_call_chunks": [],
                  "usage_metadata": {
                    "input_tokens": 25,
                    "output_tokens": 29,
                    "total_tokens": 54
                  },
                  "tool_calls": [],
                  "invalid_tool_calls": []
                }
              }
            }
          ]
        */
      }
    }
  ]
});

for await (const chunk of stream) {
  // console.log(chunk);
}

We don't pass this through by default - maybe we should reconsider. CC @ccurme.

We should also fix that misleading blank response_metadata field.

@jacoblee93
Collaborator

Ah, never mind. That data does not refer to usage. Will close this; please reopen if the above setting doesn't fix it.

Docs are here: https://js.langchain.com/v0.2/docs/how_to/chat_token_usage_tracking#openai-2
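
A quick sanity check along the lines of those docs (a sketch assuming @langchain/core >= 0.2, where usage also comes back as usage_metadata on the returned AIMessage; streamed calls still need the stream_options / include_usage setting shown above):

import { ChatOpenAI } from "@langchain/openai";

const llm = new ChatOpenAI({ modelName: "gpt-3.5-turbo" });
const res = await llm.invoke("Hello!");
// Standardized usage field on the AIMessage, e.g. { input_tokens, output_tokens, total_tokens }
console.log(res.usage_metadata);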

@rrichc

rrichc commented Jun 8, 2024

@jacoblee93 Is there a proper way to pass stream_options to streamEvents? Like

      const eventStream = await this.agent.streamEvents(
        {
          input: input,
          chat_history: await this.zepMemory.chatHistory.getMessages(),
          signal: signal,
          stream_options: {
            include_usage: true,
          },
        },
        {
          version: "v1",
          callbacks: agentCallbacks,
        }
      );

I'd also like to use the model.bind approach, but after binding I'm left with a Runnable that I can't feed into createToolCallingAgent, which expects a BaseChatModel. I'm still using AgentExecutor and haven't been able to migrate to LCEL yet.

@deranga
Contributor

deranga commented Jun 13, 2024

As a temporary fix @rrichc, it's possible to apply this patch and then set stream_options when instantiating the ChatOpenAI class.

@langchain+openai+0.1.3.patch

const llm = new ChatOpenAI({
      model: "",
      temperature: 0,
      streaming: true,
      verbose: true,
      stream_options: {
        include_usage: true,
      },
    });

const agent = await createOpenAIToolsAgent({
      llm,
      tools,
      prompt,
    });

@jacoblee93
Collaborator

jacoblee93 commented Jun 13, 2024

Oh dear. Yeah I will fix that typing - sorry about this!

That patch would be a good PR too if you're willing to push one up @deranga, thank you for making it.

@deranga
Contributor

deranga commented Jun 14, 2024

@jacoblee93, I've just created a PR as mentioned above.

@akmaldira

akmaldira commented Jul 4, 2024

@clovisrodriguez

Have you solved this?

And by the way, does that provide a breakdown of token usage for each step in an AgentExecutor?
