Billing and API Key Management #9
Comments
W.r.t. tokens: the best way to represent the atomic resource is to represent the underlying compute rather than the length of the request prompt and response. It should be a combination of time (minutes of billing) and the "weight" of the RAM used relative to the total available to run the cumulative transformer blocks that are served to consumers. Think of Ethereum gas consumption as an example: you estimate the GPU compute in advance (with static analysis you can infer the cost of each operator that maps to a collection of instructions for a model, i.e.
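A minimal sketch of the compute-based unit suggested in this comment, assuming the unit is simply billed minutes multiplied by the share of total RAM the request occupies. The function name and parameters are hypothetical and do not come from any existing prem API; the static-analysis estimate mentioned above is not modelled here.

```python
def compute_units(billed_minutes: float, ram_used_gb: float, ram_total_gb: float) -> float:
    """Return compute units: billed time weighted by the share of RAM in use.

    All names and the formula itself are illustrative assumptions.
    """
    if ram_total_gb <= 0:
        raise ValueError("ram_total_gb must be positive")
    if billed_minutes < 0 or ram_used_gb < 0:
        raise ValueError("billed_minutes and ram_used_gb must be non-negative")
    # The "weight" is the fraction of available RAM used, capped at 100%.
    weight = min(ram_used_gb / ram_total_gb, 1.0)
    return billed_minutes * weight
```

For example, `compute_units(2.0, 8.0, 32.0)` weights two billed minutes by a quarter of the available RAM.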
User cost (what a user pays for one interaction) is a sum of:
While our service fee remains static, the compute cost varies based on the 'weight' of the user's request. For instance, for LLMs, the cost can be tied to the number of tokens. To make this practical, we could: For example, in Ethereum the price is determined by the 'gas used', an abstraction similar to 'number of tokens / image resolution / minutes of audio', and by the gas price (in our case this price is fixed and directly reflects the GPU computation cost).
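The gas analogy can be sketched as follows, assuming the user cost is the static service fee plus "gas used" multiplied by a fixed gas price. The fee and price values below are made-up placeholders, and `user_cost` is a hypothetical helper, not an existing function.

```python
from decimal import Decimal

# Hypothetical placeholder values; the real fee and price are not given in this thread.
SERVICE_FEE = Decimal("0.01")   # static fee charged per interaction
GAS_PRICE = Decimal("0.00002")  # fixed price per unit of "gas" (e.g. per token)


def user_cost(gas_used: int,
              gas_price: Decimal = GAS_PRICE,
              service_fee: Decimal = SERVICE_FEE) -> Decimal:
    """Return the cost of one interaction: static fee + gas_used * gas_price."""
    if gas_used < 0:
        raise ValueError("gas_used must be non-negative")
    return service_fee + gas_price * gas_used
```

Here `gas_used` stands for whatever unit fits the model type (tokens, image resolution, minutes of audio). `Decimal` is used so that small per-unit prices do not accumulate floating-point error.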
Project Overview
The goal is to develop a platform that integrates with an existing system to safeguard running services by requiring users to provide a valid API key. The UI/UX should draw inspiration from the ChatGPT API Key Management platform, focusing on credit-based payments without a subscription option.
Target Users
prem-app as an admin dashboard. In this role, they should be able to create API keys (e.g., without constraints), view usage, etc.
Core Features
Identity Management
prem-app dashboard.
API Key Management
Billing
Usage/Analytics
Integration
prem-app) should be enhanced to enable the admin to create API keys and view Billing/Usage analytics.
Main Flow
prem-service based on API Key constraints which include:
prem-service using the API key. The platform checks if the API key exists, whether the related user has enough balance, and whether the rate and usage limits for the desired service path are adhered to.
@tiero @filopedraz
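The checks described in the main flow could be sketched roughly like this. It is only an illustration under stated assumptions: `ApiKey`, `PathLimit`, `check_request`, the in-memory dictionaries, and the one-minute rate window are all hypothetical, and a key with no limit configured for a path is treated as unconstrained.

```python
import time
from dataclasses import dataclass, field


@dataclass
class PathLimit:
    max_calls_per_minute: int  # rate limit (assumed 60-second window)
    max_total_calls: int       # usage limit over the key's lifetime


@dataclass
class ApiKey:
    user_id: str
    limits: dict = field(default_factory=dict)  # service path -> PathLimit
    calls: dict = field(default_factory=dict)   # service path -> call timestamps


def check_request(keys, balances, api_key, path, cost, now=None):
    """Return "ok", or the reason the request is rejected."""
    now = time.time() if now is None else now

    # 1. Does the API key exist?
    key = keys.get(api_key)
    if key is None:
        return "unknown api key"

    # 2. Does the related user have enough balance?
    if balances.get(key.user_id, 0) < cost:
        return "insufficient balance"

    # 3. Are the usage and rate limits for this service path respected?
    history = key.calls.setdefault(path, [])
    limit = key.limits.get(path)  # None means no constraint (assumption)
    if limit is not None:
        if len(history) >= limit.max_total_calls:
            return "usage limit exceeded"
        recent = [t for t in history if now - t < 60]
        if len(recent) >= limit.max_calls_per_minute:
            return "rate limit exceeded"

    # Record the accepted call so later checks can see it.
    history.append(now)
    return "ok"
```

The sketch only decides whether a request may proceed; deducting credits from the balance after the call is left out because the issue does not say when that happens.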