Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ensure non-streaming usage data from function calling is in history #5676

Merged
merged 1 commit into from
Nov 20, 2024

Conversation

stephentoub
Copy link
Member

@stephentoub stephentoub commented Nov 20, 2024

It's already yielded during streaming, but it's not being surfaced for non-streaming. Do so by manufacturing a new UsageContent for the UsageDetails and adding that to the response message that's added to the history.

Fixes #5668

Microsoft Reviewers: Open in CodeFlow

It's already yielded during streaming, but it's not being surfaced for non-streaming. Do so by manufacturing a new UsageContent for the UsageDetails and adding that to the response message that's added to the history.
@SteveSandersonMS
Copy link
Member

While this is a valid solution, I suspect it's not what #5668 had in mind. It sounds like they were not expecting to have to set KeepFunctionCallingMessages = true or to have to sum over a set of auto-added messages (somehow working out which ones to sum over), and just expected the response object to contain the sum over usage details across all sub-requests.

I'm fairly sure you've done it this way because there could be usage information that we can't sum automatically, i.e., the UsageDetails.AdditionalProperties entries.

Even with this we're still discarding other intermediate state such as the AdditionalProperties and RawRepresentation on the original ChatCompletion objects, but then FunctionInvokingChatClient is meant as a convenient but lossy process that primarily just gives you the final post-function-call results and doesn't try to preserve and expose all the intermediate states. People who must have all the intermediate states verbatim would need not to use FunctionInvokingChatClient.

Given the lossiness of FunctionInvokingChatClient I'd also be open to the alternative of auto-summing known properties on UsageDetails and ignoring its AdditionalProperties. I know the casualty there would be gpt4-o1 reasoning tokens - at least in the short term, since we could potentially add that as a first-class thing to UsageDetails in the long term.

Overall I'm fine with whatever you prefer here.

@stephentoub
Copy link
Member Author

It sounds like they were not expecting to have to set KeepFunctionCallingMessages = true

That's the default, fwiw

I'm fairly sure you've done it this way because there could be usage information that we can't sum automatically, i.e., the UsageDetails.AdditionalProperties entries.

Yes

I know the casualty there would be gpt4-o1 reasoning tokens

And cached tokens. And predicted tokens. All of those have emerged in just the last few months. Presumably there will be more in the future.

I'd also be open to the alternative of auto-summing known properties on UsageDetails and ignoring its AdditionalProperties

That makes me nervous. The mitigating factor is at least all the ones we're aware of are included in the totals. But they also all have different pricing structures.

I liked this PR's approach because it logically matches the streaming case, but I understand the hesitancy and the rationale that the resulting ChatCompletion should represent the totality of the operation. I'm just concerned about the summing due to the financial aspect of tokens.

Another approach would be to include all of the additional properties in the final one, and assume that anything that's a property with an int value is summable. Thats not necessarily a valid assumption, but we could do it anyway. Or find another way to include everything, like putting in lists instead of indovidual values, or having a slot in additional properties thats a list of all the individual usagr details.

@SteveSandersonMS
Copy link
Member

That's the default, fwiw

Thanks for the reminder. I usually expect bool flags to default to false (in ASP.NET Core that's pretty much a rule, even if it makes naming harder) but it does make sense that we have this behavior as default.

That adds extra weight to the validity of this design.

Another approach would be to include all of the additional properties in the final one, and assume that anything that's a property with an int value is summable. Thats not necessarily a valid assumption, but we could do it anyway. Or find another way to include everything, like putting in lists instead of indovidual values, or having a slot in additional properties thats a list of all the individual usagr details.

Summing all ints seems very sketchy. Tracking all the UsageDetails in a list in AdditionalProperties would technically preserve the information but it's likely harder to use that information as a consumer because (1) it's not accessible in a strongly-typed way and (2) even if you find the list, it's hard to match up entries with their original sources.

I think the approach you've taken here is good, at least as good as we can do. It retains precision at the cost of a bit of ease-of-use. As long as it's relatively uncommon for people to need to track token usage, there's not much drawback to what this PR does. If it does turn into a common requirement we could look at adding some helper method that does the counting given an IEnumerable<ChatMessage> or something like that (or encourage IChatClient implementations to provide one, since they know about their own custom usage details extension data).

@stephentoub stephentoub merged commit 7ebb34d into dotnet:main Nov 20, 2024
6 checks passed
@stephentoub stephentoub deleted the addusagetohistory branch November 20, 2024 15:27
stephentoub added a commit to stephentoub/extensions that referenced this pull request Nov 20, 2024
…otnet#5676)

It's already yielded during streaming, but it's not being surfaced for non-streaming. Do so by manufacturing a new UsageContent for the UsageDetails and adding that to the response message that's added to the history.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

FunctionCallingChatClient reports incorrect UsageData
2 participants