feat: Add Get Token Metrics to GRPC server #3687
Conversation
Signed-off-by: Siddharth More <siddimore@gmail.com>
// Define the empty request
message MetricsRequest {}

message MetricsResponse {
What other metrics would be good to expose here?
Maybe not related to the processing request, but it could also be useful to expose info such as how many parallel requests can be served.
mmm, I wonder if it would also make sense to add an HTTP REST route to leverage this, but that shouldn't block merging this PR.
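To make the HTTP suggestion concrete, here is a minimal Fiber sketch of what such a route could look like; the route name and the stubbed payload are assumptions for illustration, not what this PR ends up wiring in:

package main

import "github.com/gofiber/fiber/v2"

func main() {
	app := fiber.New()
	// Hypothetical route: a real handler would ask the active backend for its
	// pb.MetricsResponse over gRPC and marshal it, instead of returning a stub.
	app.Get("/v1/tokenMetrics", func(c *fiber.Ctx) error {
		return c.JSON(fiber.Map{"tokens_per_second": 0})
	})
	app.Listen(":8080")
}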
@@ -374,3 +374,21 @@ func (c *Client) Rerank(ctx context.Context, in *pb.RerankRequest, opts ...grpc.
	client := pb.NewBackendClient(conn)
	return client.Rerank(ctx, in, opts...)
}

func (c *Client) GetTokenMetrics(ctx context.Context, in *pb.MetricsRequest, opts ...grpc.CallOption) (*pb.MetricsResponse, error) {
	if !c.parallel {
The code doesn't look formatted here; it might need a go fmt pass.
Forgot to fix this.
@mudler yup, I already made those changes; I just haven't pushed them yet because I had to investigate a build issue.
gotcha 👍 changes look good here otherwise!
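For reference, a rough sketch of how the formatted client wrapper might look, mirroring the Rerank pattern quoted above; the mutex field, address field, and dial options are assumptions based on that pattern rather than the exact code in this PR:

func (c *Client) GetTokenMetrics(ctx context.Context, in *pb.MetricsRequest, opts ...grpc.CallOption) (*pb.MetricsResponse, error) {
	if !c.parallel {
		// Serialize the call when the backend cannot serve parallel requests
		// (assumed lock field, following the other wrapper methods).
		c.opMutex.Lock()
		defer c.opMutex.Unlock()
	}
	// Dial the backend and invoke the generated gRPC method (name assumed to
	// mirror the wrapper), as the other methods in this file do.
	conn, err := grpc.Dial(c.address, grpc.WithTransportCredentials(insecure.NewCredentials()))
	if err != nil {
		return nil, err
	}
	defer conn.Close()
	client := pb.NewBackendClient(conn)
	return client.GetTokenMetrics(ctx, in, opts...)
}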
Signed-off-by: Siddharth More <siddimore@gmail.com>
@mudler added a route to the HTTP endpoints. I will update the tests and have this PR ready for review this week.
// @Summary Get TokenMetrics for Active Slot.
// @Accept json
// @Produce audio/x-wav
// @Success 200 {string} binary "generated audio/wav file"
binary "generated audio/wav file"?
// @Success 200 {string} binary "generated audio/wav file"
// @Router /v1/tokenMetrics [get]
// @Router /tokenMetrics [get]
func TokenMetricsEndpoint(cl *config.BackendConfigLoader, ml *model.ModelLoader, appConfig *config.ApplicationConfig) func(c *fiber.Ctx) error {
That endpoint isn't used anywhere at all.
If it gets included, it needs to be called in parallel with a running slot to get any output, given the logic in the modified grpc-server.cpp; but I guess loadModel just waits until the slot is released, producing empty output every time.
This is very unfinished, and the exposed Swagger API documentation makes it even more confusing. To make it work, every request would need to be stored somewhere together with its statistics (we have an id field in the OpenAI response). The less intrusive but hacky and unreliable alternative is to fire that route right after firing the inference request, so the metrics call waits until the inference finishes and returns that additional data.
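To illustrate that second, "hacky" option, here is a minimal client-side sketch; the server address and route are assumptions, and in a federated setup the follow-up request may land on a different node, which is exactly why it is unreliable:

package main

import (
	"fmt"
	"io"
	"net/http"
)

func main() {
	base := "http://localhost:8080" // assumed LocalAI address

	// 1. Fire the inference request here (omitted for brevity), e.g. a POST
	//    to base + "/v1/chat/completions".

	// 2. Immediately afterwards, query the metrics route; per the logic
	//    described above, it returns once the running slot finishes.
	resp, err := http.Get(base + "/v1/tokenMetrics")
	if err != nil {
		fmt.Println("metrics request failed:", err)
		return
	}
	defer resp.Body.Close()
	out, _ := io.ReadAll(resp.Body)
	fmt.Println(string(out))
}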
llama_client_slot* get_active_slot() {
    for (llama_client_slot& slot : slots) {
        // Check if the slot is currently processing
        if (slot.is_processing()) {
If we replace that with slot.available(), then if there has been at least one inference we get statistics for the last call; if not, we get a 'Server error error="json: unsupported value: NaN"'.
@mudler maybe we can introduce a request flag that adds some statistics to the response, as a simple addition to the OpenAI response? For my goals I need the token speed and the precise inference run time after each request, across multiple nodes running in federated mode; with the tokenMetrics endpoint the call can be routed to the wrong machine after each request, which is not reliable at all. What do you think about that?
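As a rough illustration of that proposal, the per-request statistics could ride along with the usual OpenAI-style response whenever a request flag is set; every name below is hypothetical and does not exist in LocalAI today:

package metricsdemo

// TokenMetrics is a hypothetical block returned only when the request sets a
// flag such as "return_token_metrics".
type TokenMetrics struct {
	PredictedTokens     int     `json:"predicted_tokens"`
	TokensPerSecond     float64 `json:"tokens_per_second"`
	InferenceTimeMillis int64   `json:"inference_time_ms"`
}

// Sketch of how the OpenAI-style response could carry one optional extra field.
type ChatCompletionResponse struct {
	ID      string        `json:"id"`
	Object  string        `json:"object"`
	Created int64         `json:"created"`
	Model   string        `json:"model"`
	Metrics *TokenMetrics `json:"token_metrics,omitempty"`
}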
Description
This PR creates a gRPC method to expose some of the token metrics.
TODO
Add tests
Signed commits