Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a new metric 'StolenTargetMemoryRatio' to ‎sqlPerformanceMetrics in telegraf/plugins/inputs/sqlserver/sqlqueriesV1.go #14751

Closed
chenteddybear opened this issue Feb 12, 2024 · 12 comments · Fixed by #14792
Labels
feature request Requests for new plugin and for new features to existing plugins help wanted Request for community participation, code, contribution size/s 1 day effort, great beginniner issue

Comments

@chenteddybear
Copy link

chenteddybear commented Feb 12, 2024

Use Case

This metric will help identify potential memory leak introduced in sql 2019 CU14 where MEMORYCLERK_SOSNODE takes up exessive amount of available memory without releasing them, resulting in crashing the sql server. MEMORYCLERK_SOSNODE is categorized under 'Stolen Memory' in SQL Server.

UNION ALL SELECT 'Stolen target memory ratio', StolenTargetMemoryRatio = 100.0 * cntr_value / (SELECT 1.0*cntr_value FROM sys.dm_os_performance_counters WHERE counter_name = 'Target Server Memory (KB)') FROM sys.dm_os_performance_counters WHERE counter_name = 'Stolen Server Memory (KB)

Expected behavior

When the value of Stolen/Target memory goes above 50% or even 60% it can be considered alarming for DBA.

Actual behavior

it returns the ratio of stolen/target memory in sql server.

Additional info

https://learn.microsoft.com/en-us/answers/questions/826973/what-can-cause-memoryclerk-sosnode-clerk-gradually

@chenteddybear chenteddybear added the feature request Requests for new plugin and for new features to existing plugins label Feb 12, 2024
@srebhan
Copy link
Member

srebhan commented Feb 12, 2024

@Trovalo does that make sense to you?

@chenteddybear a pull-request to add this is welcome... ;-)

@Trovalo
Copy link
Collaborator

Trovalo commented Feb 13, 2024

it costs nothing to add this counter, it's just an additional string in the WHERE filter of the perfCounter query.

About how useful it might be... I usually use the "Page Life Expectancy" to check for memory pressure, but everyone has his ways

@powersj
Copy link
Contributor

powersj commented Feb 13, 2024

@chenteddybear,

Is using the Page Life Expectancy something you have considered?

Otherwise, are you going to put up a PR?

Thanks!

@powersj powersj added the waiting for response waiting for response from contributor label Feb 13, 2024
@chenteddybear
Copy link
Author

Hi @powersj
Page Life Expectancy is not enough to indicate memory pressure / memory leak. The new metric may be useful in advising DBAs to investigate SQL memory clerks for odd behavior while troubleshooting.

As for a PR, I'm not sure exactly how to proceed, as it'll be my first time. Thank you.

@telegraf-tiger telegraf-tiger bot removed the waiting for response waiting for response from contributor label Feb 13, 2024
@powersj powersj added help wanted Request for community participation, code, contribution size/s 1 day effort, great beginniner issue labels Feb 13, 2024
powersj added a commit to powersj/telegraf that referenced this issue Feb 13, 2024
@powersj
Copy link
Contributor

powersj commented Feb 13, 2024

@chenteddybear in 20-30mins there will be artifacts that you can download attached to the following PR: #14792

Could you please download and confirm that the new metric is visible? Thanks

@powersj powersj added the waiting for response waiting for response from contributor label Feb 13, 2024
@simonwhybrow-cbre
Copy link

That would be great to get it added as that a very useful stat to collect.

I'd mentioned it back here #9112 that it was missing from v2 stats onwards as well as Database Cache Memory and Free Memory, but never chased it up as just had to do it via an exec command.

It would be great to get these other two stats added in too.

@telegraf-tiger telegraf-tiger bot removed the waiting for response waiting for response from contributor label Feb 16, 2024
@srebhan
Copy link
Member

srebhan commented Feb 19, 2024

@chenteddybear and @simonwhybrow can you please test the binary in PR #14792 and verify it produces what you expect!?

@simonwhybrow-cbre
Copy link

@chenteddybear and @simonwhybrow can you please test the binary in PR #14792 and verify it produces what you expect!?

Yep, looks good. I can see it pulling through Stolen memory. Any chance we can get the other two metrics added as mentioned in #9112 - Database Cache Memory (KB) and Free Memory (KB)?
Or does that need to be a separate PR?

@srebhan
Copy link
Member

srebhan commented Feb 19, 2024

@powersj do you think we should add the two metric mentioned by @simonwhybrow?

@powersj
Copy link
Contributor

powersj commented Feb 20, 2024

I've updated the PR with the new additional metrics.

@simonwhybrow @chenteddybear,

Can you verify the updates look correct. Thanks

@simonwhybrow-cbre
Copy link

@powersj looks good. Thank you for adding those counters.

@chenteddybear
Copy link
Author

looks great! thank you for your help!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature request Requests for new plugin and for new features to existing plugins help wanted Request for community participation, code, contribution size/s 1 day effort, great beginniner issue
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants