Database io additional tags & fields #7103
Conversation
I kind of dislike adding
We could special-case these in the query, perhaps?
What do you think about these, @denzilribeiro?
I don't think we should do this change; instead, it should be done at query time with a sum aggregation. While it can be tempting to add totals, they add extra unneeded metrics and can cause confusion when applying the normal aggregation patterns: for example, sum() could return double the correct value if the total tag isn't excluded.
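For instance, the total can be computed at query time. A minimal InfluxQL sketch (using the measurement and field names this PR emits; the 5m window and grouping are illustrative assumptions):

```sql
-- Hypothetical query-time aggregation instead of a stored "total" series:
-- sum read_bytes across all database files, per instance.
SELECT sum("read_bytes")
FROM "sqlserver_database_io"
WHERE time > now() - 5m
GROUP BY "sql_instance"
```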
Maybe it is being returned as a Decimal type, similar to #6397
My opinion is that the more complex the queries, the more overhead on the server you are monitoring. People use this to monitor instances with few cores, and the query this PR submits is on the heavy side for sure.
| field | change | reason |
| --- | --- | --- |
| rg_read_stall_ms | removed | not compatible with SQL Server 2012 (11.x) |
I will edit the PR soon to remove the "total rows" and maybe simplify the query.
I've just updated the code; below is a quick recap. (I'm not sure if it's better to edit the PR text instead of adding comments; if so, I will update it later.)
I am still not a fan of any of the calculations that can, for the most part, be done outside. That is true for all of the added fields, e.g. read_latency_ms_per_read. @danielnelson thoughts?
I do see why you would want this, since it does make querying easier, especially with Grafana/Chronograf, if you can precompute the results you are interested in ahead of time. What I think might be a better solution for the division is to hold tight for Telegraf 1.15: we are planning a processor that will allow you to compute these rates at ingestion time. It would be a little more setup in Telegraf, but will be more flexible overall. I can't promise it for 1.15 yet; right now it's a weekend project for me. But we are also working on a query plugin that will let you do ad-hoc queries as well, so there are going to be more options available for power users, and we can keep this plugin lean for performance and simplicity.
There is the correctness issue too, which many people will miss. A straight division is an average since server startup, which means that if the server started 2 days ago and you had a bad time period 22 hours ago, it will still affect the average you take now. Hope I am making sense :)
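In other words, a recent-window latency needs the rate of the deltas, not a lifetime ratio. A hedged InfluxQL sketch of that pattern (field names taken from this plugin's query; the interval, window, and alias are illustrative assumptions):

```sql
-- Hypothetical per-interval read latency from the cumulative counters:
-- dividing the deltas (stall time / reads) avoids the since-startup average.
SELECT non_negative_derivative(last("read_latency_ms"), 1s)
       / non_negative_derivative(last("reads"), 1s) AS "ms_per_read"
FROM "sqlserver_database_io"
WHERE time > now() - 1h
GROUP BY time(1m), "database_name"
```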
@denzilribeiro is right on the precision: by calculating it in the frontend (Grafana/Chronograf) you are more accurate.
The code has been edited (hopefully for the last time).
In the end, this is what the whole PR does (only for SQL Server on-prem).
For Azure SQL DB nothing has changed (not even "file_type").
Sorry to ask you to possibly change again :) You could coalesce the on-prem scenarios to make the query shorter. Can you test this on SQL 2012/below? It should generate a statement that is correct for both (but I haven't tested with 2012):

```sql
DECLARE @sqlstatement AS nvarchar(max);
```
This reduces code duplication; also, 'ProductMajorVersion' is easier to use (I had to cast it anyway to avoid "strange" results).
The cast is needed, otherwise the result is always true; see the test below, run on a SQL Server 2017 instance:

```sql
SELECT
SERVERPROPERTY('ProductMajorVersion') as Version, --type sql_variant
IIF( CAST(SERVERPROPERTY('ProductMajorVersion') AS int) <= 11, 1 , 0) as checkWithCast, --Correct
    ,IIF( SERVERPROPERTY('ProductMajorVersion') <= 11, 1 , 0) as CheckWithoutCast --unexpected result
```

Here is the whole query (it works):

```sql
SET DEADLOCK_PRIORITY -10;
IF SERVERPROPERTY('EngineEdition') = 5
BEGIN
SELECT
'sqlserver_database_io' As [measurement]
,REPLACE(@@SERVERNAME,'\',':') AS [sql_instance]
,DB_NAME([vfs].[database_id]) AS [database_name]
,vfs.io_stall_read_ms AS read_latency_ms
,vfs.num_of_reads AS reads
,vfs.num_of_bytes_read AS read_bytes
,vfs.io_stall_write_ms AS write_latency_ms
,vfs.num_of_writes AS writes
,vfs.num_of_bytes_written AS write_bytes
,vfs.io_stall_queued_read_ms as rg_read_stall_ms
,ISNULL(b.name ,'RBPEX') as logical_filename
,ISNULL(b.physical_name, 'RBPEX') as physical_filename
,CASE WHEN vfs.file_id = 2 THEN 'LOG' ELSE 'DATA' END AS file_type
,ISNULL(size,0)/128 AS current_size_mb
,ISNULL(FILEPROPERTY(b.name,'SpaceUsed')/128,0) as space_used_mb
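/* NOTE: the next line re-selects io_stall_queued_read_ms under the alias [rg_read_stall_ms],
   duplicating the column already emitted above; this duplicate column name is the
   Scan-mismatch bug reported later in this thread */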
,vfs.io_stall_queued_read_ms AS [rg_read_stall_ms]
,vfs.io_stall_queued_write_ms AS [rg_write_stall_ms]
FROM [sys].[dm_io_virtual_file_stats](NULL,NULL) AS vfs
LEFT OUTER join sys.database_files b
ON b.file_id = vfs.file_id
END
ELSE
BEGIN
DECLARE @SqlStatement AS nvarchar(max);
SET @SqlStatement = N'
SELECT
''sqlserver_database_io'' AS [measurement]
,REPLACE(@@SERVERNAME,''\'','':'') AS [sql_instance]
,DB_NAME(vfs.[database_id]) AS [database_name]
,COALESCE(mf.[physical_name],''RBPEX'') AS [physical_filename] --RPBEX = Resilient Buffer Pool Extension
,COALESCE(mf.[name],''RBPEX'') AS [logical_filename] --RPBEX = Resilient Buffer Pool Extension
,mf.[type_desc] AS [file_type]
,vs.[volume_mount_point]
,IIF( RIGHT(vs.[volume_mount_point],1) = ''\'' /*Tag value cannot end with \ */
,LEFT(vs.[volume_mount_point],LEN(vs.[volume_mount_point])-1)
,vs.[volume_mount_point]
) AS [volume_mount_point]
,vfs.[io_stall_read_ms] AS [read_latency_ms]
,vfs.[num_of_reads] AS [reads]
,vfs.[num_of_bytes_read] AS [read_bytes]
,vfs.[io_stall_write_ms] AS [write_latency_ms]
,vfs.[num_of_writes] AS [writes]
,vfs.[num_of_bytes_written] AS [write_bytes]
'
+
CASE
WHEN CAST(SERVERPROPERTY('ProductMajorVersion') AS int) <= 11
/*SQL Server 2012 (ver 11.x) does not have [io_stall_queued_read_ms] and [io_stall_queued_write_ms]*/
THEN ''
ELSE N',vfs.io_stall_queued_read_ms AS [rg_read_stall_ms] ,vfs.io_stall_queued_write_ms AS [rg_write_stall_ms]'
END
+
N' FROM sys.dm_io_virtual_file_stats(NULL, NULL) AS vfs
INNER JOIN sys.master_files AS mf WITH (NOLOCK)
ON vfs.[database_id] = mf.[database_id] AND vfs.[file_id] = mf.[file_id]
CROSS APPLY sys.dm_os_volume_stats(vfs.[database_id], vfs.[file_id]) AS vs
'
EXEC sp_executesql @SqlStatement
END
```

I will push the broken code anyway and try to understand what goes wrong later. (I still don't understand that much when reading Go, but I'm working on it.)
I figured it out: the last time I copy/pasted some code I didn't notice that there were 2 columns with the same name (one has been removed; it was there by mistake).
I've run some tests and made some changes.
I've tested the whole plugin with multiple instances of different versions, from 2012 to 2019, and everything works fine. Some more testing is welcome though.
Is there a reason these 2 were moved for the Azure case? They are definitely useful, so I would leave them as is from the original. I am good with all the other changes.
```sql
,ISNULL(size,0)/128 AS current_size_mb -- size is in 8 KB pages, so /128 converts to MB
,ISNULL(FILEPROPERTY(b.name,''SpaceUsed'')/128,0) as space_used_mb
```
If those are missing then I've made an error; in the Azure SQL DB query nothing has changed, not even the file type (which is now kept as is in the on-prem queries). I've looked at the file and those two fields are still there. Are you looking at the latest version?
Looks good, I was looking at the wrong version.
Seems like this broke something. I pulled latest and am getting errors in this plugin. Did you test?

```
./telegraf --input-filter sqlserver --test > output.txt
2020-03-11T20:11:47Z E! [inputs.sqlserver] Error in plugin: sql: expected 17 destination arguments in Scan, not 16
2020-03-11T20:11:48Z E! [inputs.sqlserver] Error in plugin: sql: expected 17 destination arguments in Scan, not 16
2020-03-11T20:11:49Z E! [inputs.sqlserver] Error in plugin: sql: expected 17 destination arguments in Scan, not 16
```
Interesting, I had the same error but I fixed it. Is it happening on the on-prem or the Azure SQL DB version of the query? In the on-prem one it was caused by 2 columns having the same name; one was there by accident. I will have a look at it as soon as possible, but right now I cannot test on Azure SQL DB (I will be able to do that tomorrow).
This is on SQL DB, but it doesn't matter on what: run the test regardless and it throws an error.
It's happening because the Azure version gets rg_read_stall_ms 2 times, which causes this error. Also, I tested on-prem with 2012 and 2019 on the same plugin configuration; one returns 14 columns and the other 16, and it worked perfectly.
Good catch, this worked before this PR; I missed catching it. I am making a PR anyway and will fix it there.
If you can submit a PR I will be glad; as I said, it's hard for me to test this right now (I'm almost offline... the internet is so slow these days in Italy that I am barely able to access GitHub, and I need someone to turn on the Azure SQL DB instance, which won't happen until tomorrow). I think the root problem is that the keys (column names) are kept distinct while the values are not (and that makes sense). In this case you will have 17 columns, with 16 distinct names but 17 values.
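For illustration, a minimal repro of that mismatch (hypothetical; it assumes a client that builds its Scan destinations from the distinct column names, so it allocates one slot for the two values below):

```sql
-- Two values per row, but only one distinct column name.
SELECT
     vfs.io_stall_queued_read_ms AS [rg_read_stall_ms]
    ,vfs.io_stall_queued_read_ms AS [rg_read_stall_ms]
FROM sys.dm_io_virtual_file_stats(NULL, NULL) AS vfs;
```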
I will submit it soon. I think you moved columns around for your PR; it was working before.
Resolves #7073
The PR covers both Azure SQL DB and "other versions", adding new tags and fields that are better calculated on a per-point basis and might be unhandy to compute in InfluxDB, but it also removes some fields.
Below is a recap.
Changes
For SQL Server on-prem (that is, everything except Azure SQL DB)
Edited
Removed
Added
For SQL Azure DB
Added
Tests
Known Issues
Missing precision
In the new calculated fields, the division is made between integers, which causes the result to be an integer (the decimal part gets truncated). If the values are converted to float instead, the final result gets serialized as a string and becomes useless.
Sample:
Personally, I'm not able to understand why a returned value like 0.55 is seen as a string and output in line protocol as value="0.55", therefore making it unusable.
Can someone help me understand this? (Changing it afterwards will cause errors due to a field type conflict.)
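For reference, the truncation itself is plain T-SQL integer division and can be reproduced in isolation; whether the float survives serialization downstream is the open question above:

```sql
SELECT
     55 / 100                AS int_division    -- integer division truncates: returns 0
    ,CAST(55 AS float) / 100 AS float_division  -- casting one operand keeps the fraction: returns 0.55
```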