Add controller performance metrics #391
Updated Grafana dashboard
… smithdavi/add-controller-performance-metrics
@dgkanatsios In the Excel file, the description for this issue was:
Did you mean Grafana charts, or should this connect to an existing (or to be built) Geneva/Jarvis system?
@dsmith111 thank you for the PR! The charts were meant to be Grafana, which aligns with your PR, thanks!
Refactor non-active handling
…//github.com/dsmith111/thundernetes into smithdavi/add-controller-performance-metrics
Add util function; Remove repeated lines
Rename deleteSum; Fix GS state update;
….com:dsmith111/thundernetes into smithdavi/add-controller-performance-metrics
A couple of comments, then I think we're good to merge! Appreciate all the work and the communication, thank you!
Also FYI @abbasahmed and @ghov since they have #414 open with changes to the Grafana dashboards. You'll need to rebase after we merge this one.
@dgkanatsios I believe that's all of the current comments wrapped up
Really appreciate all the effort and the discussion, will merge as soon as tests pass! Thank you!
Problem:
Currently we do not have any metrics in place to track the performance of controlling/reconciling GameServers: #361
Solution:
This PR adds two new Prometheus metrics:
GameServerReachedInitializingDuration
The 5-minute average time for new GameServers to reach the Initializing state.
GameServerReachedStandingByDuration
The 5-minute average time for new GameServers to reach the StandingBy state.
These metrics can surface issues in the controller itself (time taken to begin GameServer creation) as well as issues with the servers themselves or with cluster performance (time taken to complete GameServer initialization).
Testing used to verify the change:
I created a temporary custom Docker image to build the thundernetes controller manager; using the netcore sample GameServerBuild, I proceeded to:
All of these events were monitored within the modified Grafana dashboard: