[WIP] Prometheus Metrics #1461

Closed
binarycrayon wants to merge 23 commits

binarycrayon
Contributor

@binarycrayon binarycrayon commented Sep 19, 2024

Motivation

Initial integration with the Prometheus client to record metrics and expose them at /metrics, where they can be scraped by external collectors such as Grafana Alloy.

prometheus client

This is a draft. How do I mark the PR as a draft?

Modifications

  • Add a basic metrics collector and metric types
  • Lazily initialize the Prometheus client in multiprocess mode
  • Log stats in TPServer
  • Expose stats at /metrics
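
The lazy initialization in the list above matters because prometheus_client decides whether to use multiprocess storage when it is first imported, so PROMETHEUS_MULTIPROC_DIR has to be set before that import. The following is a minimal sketch of that pattern, not the code in this PR: the helper names init_metrics and render_metrics, the "demo-model" label value, and the shortened bucket list are all assumptions made for illustration.

```python
import os
import tempfile


def init_metrics(model_name: str):
    """Hypothetical helper: configure multiprocess mode, then import the client."""
    # The multiprocess directory must exist in the environment before
    # prometheus_client is imported for the first time.
    if "PROMETHEUS_MULTIPROC_DIR" not in os.environ:
        os.environ["PROMETHEUS_MULTIPROC_DIR"] = tempfile.mkdtemp(prefix="prom_")

    from prometheus_client import Gauge, Histogram  # deferred on purpose

    running = Gauge(
        "sglang:num_requests_running",
        "Number of requests currently running on GPU",
        ["name"],
        multiprocess_mode="livesum",  # sum the gauge across worker processes
    )
    prompt_tokens = Histogram(
        "sglang:request_prompt_tokens",
        "Number of prefill tokens processed",
        ["name"],
        buckets=[1, 2, 5, 10, 20, 50, 100],  # shortened, illustrative buckets
    )
    return running.labels(name=model_name), prompt_tokens.labels(name=model_name)


def render_metrics() -> str:
    """Hypothetical helper: build the text a /metrics handler would return."""
    from prometheus_client import CollectorRegistry, generate_latest, multiprocess

    # A fresh registry that aggregates the per-process files on every scrape.
    registry = CollectorRegistry()
    multiprocess.MultiProcessCollector(registry)
    return generate_latest(registry).decode("utf-8")


# Usage: record two values, then render the exposition text.
running, prompt_tokens = init_metrics("demo-model")
running.set(3)
prompt_tokens.observe(8)
print(render_metrics())
```

An HTTP handler for /metrics would return the string from render_metrics with the Prometheus text content type. Wiring that into the server is left out here because the PR's actual routing is not shown in this thread.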

Checklist

  • Format your code according to the Contributor Guide.
  • Add unit tests as outlined in the Contributor Guide.
  • Update documentation as needed, including docstrings or example tutorials.

@blacker521
Contributor

blacker521 commented Sep 19, 2024

File "/python/sglang/srt/managers/tp_worker.py", line 142, in __init__
from python.sglang.srt.metrics.metrics_collector import SGLangMetricsCollector

You should delete the leading python. so that the import starts from sglang.srt.metrics.metrics_collector.

@binarycrayon binarycrayon changed the title from "Metrics" to "[WIP] Prometheus Metrics" Sep 19, 2024
@Ying1123 Ying1123 mentioned this pull request Sep 22, 2024
29 tasks
@merrymercy merrymercy mentioned this pull request Sep 22, 2024
33 tasks
@binarycrayon
Contributor Author

Sample of the currently logged metrics:

# HELP sglang:new_seq Number of new sequences
# TYPE sglang:new_seq gauge
sglang:new_seq{name="google/gemma-2-9b-it"} 50.0
# HELP sglang:new_token Number of new token
# TYPE sglang:new_token gauge
sglang:new_token{name="google/gemma-2-9b-it"} 153.0
# HELP sglang:cached_token Number of cached token
# TYPE sglang:cached_token gauge
sglang:cached_token{name="google/gemma-2-9b-it"} 6252.0
# HELP sglang:cache_hit_rate Cache hit rate
# TYPE sglang:cache_hit_rate gauge
sglang:cache_hit_rate{name="google/gemma-2-9b-it"} 87.58777633289988
# HELP sglang:queue_req Number of queue requests
# TYPE sglang:queue_req gauge
sglang:queue_req{name="google/gemma-2-9b-it"} 0.0
# HELP sglang:num_requests_running Number of requests currently running on GPU
# TYPE sglang:num_requests_running gauge
sglang:num_requests_running{name="google/gemma-2-9b-it"} 0.0
# HELP sglang:num_requests_waiting Number of requests waiting to be processed.
# TYPE sglang:num_requests_waiting gauge
sglang:num_requests_waiting{name="google/gemma-2-9b-it"} 0.0
# HELP sglang:gen_throughput Gen token throughput (token/s)
# TYPE sglang:gen_throughput gauge
sglang:gen_throughput{name="google/gemma-2-9b-it"} 0.0
# HELP sglang:token_usage Total token usage
# TYPE sglang:token_usage gauge
sglang:token_usage{name="google/gemma-2-9b-it"} 0.0
# HELP sglang:max_total_num_tokens Maximum total number of tokens
# TYPE sglang:max_total_num_tokens gauge
sglang:max_total_num_tokens{name="google/gemma-2-9b-it"} 161723.0
# HELP sglang:max_prefill_tokens Maximum prefill tokens
# TYPE sglang:max_prefill_tokens gauge
sglang:max_prefill_tokens{name="google/gemma-2-9b-it"} 16384.0
# HELP sglang:max_running_requests Maximum running requests
# TYPE sglang:max_running_requests gauge
sglang:max_running_requests{name="google/gemma-2-9b-it"} 4097.0
# HELP sglang:context_len Context length
# TYPE sglang:context_len gauge
sglang:context_len{name="google/gemma-2-9b-it"} 8192.0
# HELP sglang:request_prompt_tokens Number of prefill tokens processed
# TYPE sglang:request_prompt_tokens histogram
sglang:request_prompt_tokens_sum{name="google/gemma-2-9b-it"} 1129.0
sglang:request_prompt_tokens_bucket{le="1.0",name="google/gemma-2-9b-it"} 0.0
sglang:request_prompt_tokens_bucket{le="2.0",name="google/gemma-2-9b-it"} 0.0
sglang:request_prompt_tokens_bucket{le="5.0",name="google/gemma-2-9b-it"} 0.0
sglang:request_prompt_tokens_bucket{le="10.0",name="google/gemma-2-9b-it"} 1.0
sglang:request_prompt_tokens_bucket{le="20.0",name="google/gemma-2-9b-it"} 1.0
sglang:request_prompt_tokens_bucket{le="50.0",name="google/gemma-2-9b-it"} 51.0
sglang:request_prompt_tokens_bucket{le="100.0",name="google/gemma-2-9b-it"} 51.0
sglang:request_prompt_tokens_bucket{le="200.0",name="google/gemma-2-9b-it"} 51.0
sglang:request_prompt_tokens_bucket{le="500.0",name="google/gemma-2-9b-it"} 51.0
sglang:request_prompt_tokens_bucket{le="1000.0",name="google/gemma-2-9b-it"} 51.0
sglang:request_prompt_tokens_bucket{le="2000.0",name="google/gemma-2-9b-it"} 51.0
sglang:request_prompt_tokens_bucket{le="5000.0",name="google/gemma-2-9b-it"} 51.0
sglang:request_prompt_tokens_bucket{le="10000.0",name="google/gemma-2-9b-it"} 51.0
sglang:request_prompt_tokens_bucket{le="20000.0",name="google/gemma-2-9b-it"} 51.0
sglang:request_prompt_tokens_bucket{le="50000.0",name="google/gemma-2-9b-it"} 51.0
sglang:request_prompt_tokens_bucket{le="100000.0",name="google/gemma-2-9b-it"} 51.0
sglang:request_prompt_tokens_bucket{le="+Inf",name="google/gemma-2-9b-it"} 51.0
sglang:request_prompt_tokens_count{name="google/gemma-2-9b-it"} 51.0
# HELP sglang:request_generation_tokens Number of generation tokens processed.
# TYPE sglang:request_generation_tokens histogram
sglang:request_generation_tokens_sum{name="google/gemma-2-9b-it"} 5341.0
sglang:request_generation_tokens_bucket{le="1.0",name="google/gemma-2-9b-it"} 0.0
sglang:request_generation_tokens_bucket{le="2.0",name="google/gemma-2-9b-it"} 0.0
sglang:request_generation_tokens_bucket{le="5.0",name="google/gemma-2-9b-it"} 0.0
sglang:request_generation_tokens_bucket{le="10.0",name="google/gemma-2-9b-it"} 1.0
sglang:request_generation_tokens_bucket{le="20.0",name="google/gemma-2-9b-it"} 1.0
sglang:request_generation_tokens_bucket{le="50.0",name="google/gemma-2-9b-it"} 1.0
sglang:request_generation_tokens_bucket{le="100.0",name="google/gemma-2-9b-it"} 1.0
sglang:request_generation_tokens_bucket{le="200.0",name="google/gemma-2-9b-it"} 51.0
sglang:request_generation_tokens_bucket{le="500.0",name="google/gemma-2-9b-it"} 51.0
sglang:request_generation_tokens_bucket{le="1000.0",name="google/gemma-2-9b-it"} 51.0
sglang:request_generation_tokens_bucket{le="2000.0",name="google/gemma-2-9b-it"} 51.0
sglang:request_generation_tokens_bucket{le="5000.0",name="google/gemma-2-9b-it"} 51.0
sglang:request_generation_tokens_bucket{le="10000.0",name="google/gemma-2-9b-it"} 51.0
sglang:request_generation_tokens_bucket{le="20000.0",name="google/gemma-2-9b-it"} 51.0
sglang:request_generation_tokens_bucket{le="50000.0",name="google/gemma-2-9b-it"} 51.0
sglang:request_generation_tokens_bucket{le="100000.0",name="google/gemma-2-9b-it"} 51.0
sglang:request_generation_tokens_bucket{le="+Inf",name="google/gemma-2-9b-it"} 51.0
sglang:request_generation_tokens_count{name="google/gemma-2-9b-it"} 51.0
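
One way to read the two histograms in the sample: dividing each _sum by its _count gives the mean per request, and because the buckets are cumulative, the jump from 1 at le="20.0" to 51 at le="50.0" means 50 prompts fell between 20 and 50 tokens. The sketch below does the mean calculation on four lines copied from the sample. The parser is a deliberately simple, hypothetical helper that ignores labels and would not handle label values containing "}".

```python
import re

# Four sample lines copied from the output above.
SAMPLE = """\
sglang:request_prompt_tokens_sum{name="google/gemma-2-9b-it"} 1129.0
sglang:request_prompt_tokens_count{name="google/gemma-2-9b-it"} 51.0
sglang:request_generation_tokens_sum{name="google/gemma-2-9b-it"} 5341.0
sglang:request_generation_tokens_count{name="google/gemma-2-9b-it"} 51.0
"""

# metric name, optional {labels}, then the value
LINE = re.compile(r"^(?P<metric>[A-Za-z_:][A-Za-z0-9_:]*)(?:\{[^}]*\})?\s+(?P<value>\S+)$")


def parse_samples(text: str) -> dict:
    """Map metric name to value, skipping comments and ignoring labels."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        match = LINE.match(line)
        if match:
            values[match.group("metric")] = float(match.group("value"))
    return values


def histogram_mean(values: dict, base: str) -> float:
    """Mean observation of a histogram, computed as _sum / _count."""
    count = values[f"{base}_count"]
    return values[f"{base}_sum"] / count if count else 0.0


values = parse_samples(SAMPLE)
mean_prompt = histogram_mean(values, "sglang:request_prompt_tokens")
mean_generation = histogram_mean(values, "sglang:request_generation_tokens")
print(f"mean prompt tokens per request: {mean_prompt:.2f}")
print(f"mean generation tokens per request: {mean_generation:.2f}")
```

For the sample above this works out to roughly 22 prompt tokens and roughly 105 generated tokens per request.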

@merrymercy
Contributor

moved to #1853

@merrymercy merrymercy closed this Nov 1, 2024