
sqlserver input: change statement_text from a tag to a field #6976

Open
sawo1337 opened this issue Feb 4, 2020 · 23 comments
Labels
area/sqlserver feature request Requests for new plugin and for new features to existing plugins

Comments

@sawo1337

sawo1337 commented Feb 4, 2020

Relevant telegraf.conf:

default sqlserver setup, query_version=2

System info:

Telegraf 1.13.2 for Ubuntu
SQL Server 2017 server

Steps to reproduce:

  1. Configure Telegraf to connect to mssql server with default settings
  2. Examine the data stored in measurement sqlserver_requests, tag statement_text

Expected behavior:

data to be stored without escape characters, for example:
select @par = parfrom [server].[db].dbo.table where id = @p_par_id and ([dbid] = @p_db_id or [db_id] = 0

Actual behavior:

data is stored with \n, \t, \r characters:
select @par = par\r\n\t\tfrom [server].[db].dbo.table\r\n\t\twhere id = @p_par_id and \r\n\t\t\t([dbid] = @p_db_id or [db_id] = 0

@danielnelson danielnelson added area/sqlserver bug unexpected problem or unintended behavior feature request Requests for new plugin and for new features to existing plugins labels Feb 4, 2020
@danielnelson
Contributor

Can you attach the output of telegraf --input-filter sqlserver --test? Just one line from the output that shows how we are writing the metrics?

@sawo1337
Author

sawo1337 commented Feb 4, 2020

Sure:

sqlserver_requests,command=AWAITING\ COMMAND,database_name=tempdb,host=HOST,host_name=HOST,program_name=telegraf,session_db_name=tempdb,sql_instance=SERVER,statement_text=SET\ DEADLOCK_PRIORITY\ -10;\nDECLARE\ @sys_info\ TABLE\ (\n\tcpu_count\ INT\,\n\tserver_memory\ BIGINT\,\n\tsku\ NVARCHAR(64)\,\n\tengine_edition\ SMALLINT\,\n\thardware_type\ VARCHAR(16)\,\n\ttotal_storage_mb\ BIGINT\,\n\tavailable_storage_mb\ BIGINT\,\n\tuptime\ INT\n)\n\nIF\ SERVERPROPERTY('EngineEdition')\ \=\ 8\ \ --\ Managed\ Instance\n\ \tINSERT\ INTO\ @sys_info\ (\ cpu_count\,\ server_memory\,\ sku\,\ engine_edition\,\ hardware_type\,\ total_storage_mb\,\ available_storage_mb\,\ uptime\ )\n\tSELECT\ \tTOP(1)\n\t\t\tvirtual_core_count\ AS\ cpu_count\,\n\t\t\t(SELECT\ process_memory_limit_mb\ FROM\ sys.dm_os_job_object)\ AS\ server_memory\,\n\t\t\tsku\,\n\t\t\tcast(SERVERPROPERTY('EngineEdition')\ as\ smallint)\ AS\ engine_edition\,\n\t\t\thardware_generation\ AS\ hardware_type\,\n\t\t\treserved_storage_mb\ AS\ total_storage_mb\,\n\t\t\t(reserved_storage_mb\ -\ storage_space_used_mb)\ AS\ available_storage_mb\,\n\t\t\t(select\ DATEDIFF(MINUTE\,sqlserver_start_time\,GETDATE())\ from\ sys.dm_os_sys_info)\ as\ uptime\n\tFROM\tsys.server_resource_stats\n\tORDER\ BY\ start_time\ DESC\n\nIF\ SERVERPROPERTY('EngineEdition')\ \=\ 5\ \ --\ Azure\ SQL\ DB\n\tINSERT\ INTO\ @sys_info\ (\ cpu_count\,\ server_memory\,\ sku\,\ engine_edition\,\ hardware_type\,\ total_storage_mb\,\ available_storage_mb\,\ uptime\ )\n\tSELECT\ \tTOP(1)\n\t\t\t(SELECT\ count(*)\ FROM\ sys.dm_os_schedulers\ WHERE\ status\ \=\ 'VISIBLE\ ONLINE')\ AS\ cpu_count\,\n\t\t\t(SELECT\ process_memory_limit_mb\ FROM\ sys.dm_os_job_object)\ AS\ server_memory\,\n\t\t\tslo.edition\ as\ sku\,\n\t\t\tcast(SERVERPROPERTY('EngineEdition')\ as\ smallint)\ \ AS\ engine_edition\,\n\t\t\tslo.service_objective\ AS\ hardware_type\,\n\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ cast(DATABASEPROPERTYEX(DB_NAME()\,'MaxSizeInBytes')\ as\ 
bigint)/(1024*1024)\ \ AS\ total_storage_mb\,\n\t\t\tNULL\ AS\ available_storage_mb\,\ \ --\ Can\ we\ find\ out\ storage?\n\t\t\tNULL\ as\ uptime\n\tFROM\t\ sys.databases\ d\ \ \ \n\t\t--\ sys.databases.database_id\ may\ not\ match\ current\ DB_ID\ on\ Azure\ SQL\ DB\n\t\tCROSS\ JOIN\ sys.database_service_objectives\ slo\n\t\tWHERE\ d.name\ \=\ DB_NAME()\ AND\ slo.database_id\ \=\ DB_ID()\n\nELSE\nBEGIN\n\tINSERT\ INTO\ @sys_info\ (\ cpu_count\,\ server_memory\,\ sku\,\ engine_edition\,\ hardware_type\,\ total_storage_mb\,\ available_storage_mb\,\ uptime\ )\n\tSELECT\tcpu_count\,\n\t\t\t(SELECT\ total_physical_memory_kb\ FROM\ sys.dm_os_sys_memory)\ AS\ server_memory\,\n\t\t\tCAST(SERVERPROPERTY('Edition')\ AS\ NVARCHAR(64))\ as\ sku\,\n\t\t\tCAST(SERVERPROPERTY('EngineEdition')\ as\ smallint)\ as\ engine_edition\,\n\t\t\tCASE\ virtual_machine_type_desc\n\t\t\t\tWHEN\ 'NONE'\ THEN\ 'PHYSICAL\ Machine'\n\t\t\t\tELSE\ virtual_machine_type_desc\n\t\t\tEND\ AS\ hardware_type\,\n\t\t\tNULL\,\n\t\t\tNULL\,\n\t\t\t\ DATEDIFF(MINUTE\,sqlserver_start_time\,GETDATE())\n\tFROM\tsys.dm_os_sys_info\nEND\nSELECT\t'sqlserver_server_properties'\ AS\ [measurement]\,\n\t\tREPLACE(@@SERVERNAME\,'\'\,':')\ AS\ [sql_instance]\,\n\t\tDB_NAME()\ as\ [database_name]\,\n\t\ts.cpu_count\,\n\t\ts.server_memory\,\n\t\ts.sku\,\n\t\ts.engine_edition\,\n\t\ts.hardware_type\,\n\t\ts.total_storage_mb\,\n\t\ts.available_storage_mb\,\n\t\ts.uptime\,\n\t\tSERVERPROPERTY('ProductVersion')\ AS\ sql_version\,\n\t\tdb_online\,\n\t\tdb_restoring\,\n\t\tdb_recovering\,\n\t\tdb_recoveryPending\,\n\t\tdb_suspect\,\n\t\tdb_offline\nFROM\t(\n\t\t\tSELECT\tSUM(\ CASE\ WHEN\ state\ \=\ 0\ THEN\ 1\ ELSE\ 0\ END\ )\ AS\ db_online\,\n\t\t\t\t\tSUM(\ CASE\ WHEN\ state\ \=\ 1\ THEN\ 1\ ELSE\ 0\ END\ )\ AS\ db_restoring\,\n\t\t\t\t\tSUM(\ CASE\ WHEN\ state\ \=\ 2\ THEN\ 1\ ELSE\ 0\ END\ )\ AS\ db_recovering\,\n\t\t\t\t\tSUM(\ CASE\ WHEN\ state\ \=\ 3\ THEN\ 1\ ELSE\ 0\ END\ )\ AS\ db_recoveryPending\,\n\t\t\t\t\tSUM(\ 
CASE\ WHEN\ state\ \=\ 4\ THEN\ 1\ ELSE\ 0\ END\ )\ AS\ db_suspect\,\n\t\t\t\t\tSUM(\ CASE\ WHEN\ state\ \=\ 6\ or\ state\ \=\ 10\ THEN\ 1\ ELSE\ 0\ END\ )\ AS\ db_offline\n\t\t\tFROM\tsys.databases\n\t\t)\ AS\ dbs\n\t\tCROSS\ APPLY\ (\n\t\t\tSELECT\tcpu_count\,\ server_memory\,\ sku\,\ engine_edition\,\ hardware_type\,\ total_storage_mb\,\ available_storage_mb\,\ uptime\n\t\t\tFROM\t@sys_info\n\t\t)\ AS\ s\n,status=runnable,transaction_isolation_level=2-Read\ Committed,wait_resource=2:1:16 blocking_session_id=0i,cpu_time_ms=4i,granted_query_memory_pages=0i,logical_reads=187i,open_transaction=0i,percent_complete=0,request_id=0i,session_id=64i,total_elasped_time_ms=15i,wait_time_ms=0i,writes=1i 1580849430000000000

@sawo1337
Author

sawo1337 commented Feb 6, 2020

Apparently the statement is being captured as formatted SQL, with newlines, tabs, etc., which is why it all gets converted to \n, \t, and so on.
While in theory you could convert the special characters back into structured SQL, I would strongly advise against storing all of this data in the database. At the very least it makes things unreadable, and it also takes much more space than it normally would. If one wants formatted SQL, it makes much more sense to reformat the SQL after examining the query.
I've tested this and found a workaround that removes the extra characters from the statement text. In sqlserver.go, lines 1431-1436, I changed this:

	, (SUBSTRING(qt.text, r.statement_start_offset / 2 + 1,
											(CASE WHEN r.statement_end_offset = -1
													THEN LEN(CONVERT(NVARCHAR(MAX), qt.text)) * 2
													ELSE r.statement_end_offset
											END - r.statement_start_offset) / 2)
		) AS statement_text

To this:

	, replace(replace(replace(replace(replace(replace((SUBSTRING(qt.text, r.statement_start_offset / 2 + 1,
											(CASE WHEN r.statement_end_offset = -1
													THEN LEN(CONVERT(NVARCHAR(MAX), qt.text)) * 2
													ELSE r.statement_end_offset
											END - r.statement_start_offset) / 2)
		),Char(13),''),Char(10),''),Char(9),''),' ','<>'),'><',''),'<>',' ') AS statement_text

The above removes all carriage returns, newlines, and tabs, and collapses runs of spaces into a single space.
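For anyone who wants to sanity-check the trick outside SQL Server, here is a minimal Python sketch of the same REPLACE chain (the `'<>'`/`'><'` dance collapses runs of spaces into one; note that, like the T-SQL version, it joins words that were separated only by a newline or tab):

```python
def collapse_ws(sql: str) -> str:
    # Mirror of the T-SQL REPLACE chain in the workaround above.
    for ch in ("\r", "\n", "\t"):
        sql = sql.replace(ch, "")      # drop CR/LF/TAB outright
    sql = sql.replace(" ", "<>")       # mark every space
    sql = sql.replace("><", "")        # merge adjacent markers
    return sql.replace("<>", " ")      # each run of spaces becomes one space

print(collapse_ws("select @par = par\r\n\t\tfrom t"))  # select @par = parfrom t
```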

@danielnelson
Contributor

Just linking into some related discussion: #6678 (comment).

@danielnelson
Contributor

This tag value definitely sticks out; we don't have any other tag values that are this large or that contain whitespace like this.

Additionally, as pointed out by @Trovalo on the community site, tag values are limited to 64KiB in length. This seems like a strong argument in favor of switching this to a string field.

@Trovalo
Collaborator

Trovalo commented Feb 19, 2020

To me this is not a bug: Telegraf is just saving the string as-is, encoding the chars that would otherwise get lost.

Yes, the raw output is not nice to look at, but even after removing the special chars you may not get a readable result.

  • Removing special chars will give you a "cleaner" string (by losing part of it), but it's still a one-line string, which is readable only if the statement is short; any mid-to-long statement will be unreadable anyway without proper indentation.
    This saves a bit of space, but in my case not that much.
  • Keeping special chars won't give you a "clean" string, but you will have the actual string with some encoded chars. The difference is that even a short statement will be unreadable, but if your tool can manipulate the string you can get a readable result.

IMO this kind of logic should be applied when reading the data, not while writing it to the database, for the following reasons:

  • most of the strings will be unreadable anyway without proper indentation (regardless of encoded chars or not)
  • if you can add an intermediate processing step while reading the data, you can solve the problem in the most appropriate way (up to the tool), for example:
    • showing the actual (decoded) chars
    • removing all those chars and formatting the string as you wish
  • if you have a huge one-line string, how are you going to format it? Probably the simplest approach is a replace function, but if there is no char left to replace, you can't

@danielnelson I'm curious to know whether Flux, which has a replace function (not present in InfluxQL), can actually return a multi-line string

Summary
IMO, this kind of logic belongs in the tool that reads the data and should not be applied while writing it.

Removing the encoded chars will save a bit of space, but it will also limit your options for managing the data later, since a huge one-line string with no chars left to replace is much harder to reformat. Of course the replace function may not be included in the query language itself (InfluxQL/Flux), but it may well be present in other reporting/dashboarding/analysis tools.

My Case
In my case I use Grafana as a dashboarding tool, and I can't format that string using InfluxQL; the only option left for me is to find or develop a plugin that can handle encoded chars or replace strings.
That said, other tools like PowerBI, Qlik, or any other BI/reporting tool with a "data modeling" layer (not so popular for this kind of data, but still valid) will be able to handle this string nicely by simply using replace.

Unrelated note:
the last char of the string is cut off for some reason

@sawo1337
Author

sawo1337 commented Feb 19, 2020

\n and \t are going to be rendered as plain text by pretty much any tool that reads data from InfluxDB, which is incorrect in my opinion. Also, if someone really wants pretty output they can use Poor Man's T-SQL Formatter, for example; it is available as a plugin for SSMS, which most SQL developers use anyway. I don't see any tool in the InfluxData stack that would automatically parse this kind of data into formatted SQL, either.
The other issue mentioned by @danielnelson is also valid: errors about the tag value size limit show up pretty often, which leads to some of the data being dropped. Personally, I've converted the tag to a field, and I believe most people would rather have it this way than not.

@danielnelson
Contributor

I'm curious to know if flux, which has the replace function (which is not present in InfluxQL), can actually return a multi-line string

Both Flux and InfluxQL should be able to return multi-line strings no problem, however I think Grafana and Chronograf would both have a hard time displaying this string since it is so far from how a tag is expected in terms of size/format.

@danielnelson
Contributor

It seems that there are two decisions to make:

  • Should we convert this tag to a field by default?
  • Should we remove the whitespace formatting of the query (\n, \t, etc.)?

I'm leaning towards yes on moving the tag to a field and no (at least for now) on removing whitespace formatting.

The downside of moving statement_text to a field is that you will no longer be able to group by the statement_text and instead you will need to use the query_hash. In order to determine what the query_hash is you will probably need to first look that up in a separate query.

I can't remember if we made query_hash a tag already; if it is still a field then we will need to move it over to a tag in order to keep series identity.

We probably should rename the tags/fields if we swap them, otherwise it can be somewhat confusing to query when you have a tag and field with the same name.

@Trovalo
Collaborator

Trovalo commented Feb 20, 2020

Completely agree on moving statement_text from tag to field.

The query_hash column has been a tag since issue #6679, which forced its value from binary to string, thereby making it a tag automatically (in this plugin, all string values are tags).

About statement_text, @sawo1337's point is not wrong: if you want to format that text "manually" via copy-paste, having clean text is handy.
I'm not sure it's worth adding a config parameter for this, something like "clean_sql" = true | false; how you manage this really depends on the tools you have as a frontend.

Finally, not being able to group by statement_text is not an issue for me anyway: it is unreadable in charts, and in tables with more than 3-4 columns.
I personally group or filter by query_hash in any visual that uses that measurement; if you want to see the query itself, you need a dedicated visual.
Personally, I have a table with the query hash and its statistics to identify expensive queries; from there, a link takes me to a "query detail" dashboard, which shows the data for a single query hash with some metrics and the statement text alone in its own table.

@spaghettidba
Contributor

spaghettidba commented Feb 25, 2020

Throwing in my 2 cents:

  • I understand that some characters are not allowed in the line protocol and have to be escaped, but I do not understand why the data is saved escaped in the database (e.g., a newline becomes "\n" in the protocol, but I expect to find an actual newline in the database). Can anyone comment on this?
  • Removing illegal characters from the SQL strings would be useless and potentially harmful. For instance, when investigating performance issues we need the exact query text, otherwise we might get a different query plan. Removing newlines or other characters would make this impossible.

I would investigate the following:

  1. Is this a limitation in the database storage (can InfluxDB store newlines and other illegal characters)?
  2. Is this a limitation in the write protocol (can Telegraf send escaped strings to InfluxDB and have the escaped characters saved correctly?)
  3. Is this a limitation in the read protocol (can InfluxDB read data from a point that contains illegal characters and send this data to applications like Grafana?)
  4. Is this a limitation in the applications (does Grafana handle illegal characters correctly?)

All of these have different solutions, but messing with the string data with replace and the like is not one of them.

@sawo1337
Author

Unescaped special characters are not allowed in the InfluxDB line protocol. You can get the plan handle with the statement (there is a field for it), which would be the most definitive way of getting the exact plan you want to see.
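As an illustration, looking the plan back up could be sketched roughly like this (the DMV names are standard SQL Server; `@captured_query_hash` is a placeholder for the query_hash value recorded by Telegraf, not something the plugin emits as SQL):

```sql
-- Pull the cached plan and full text for statements matching a
-- query_hash captured by the sqlserver input.
DECLARE @captured_query_hash BINARY(8) = 0x0000000000000000; -- placeholder
SELECT qs.plan_handle, qp.query_plan, st.text
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_query_plan(qs.plan_handle) AS qp
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
WHERE qs.query_hash = @captured_query_hash;
```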

@spaghettidba
Contributor

I see what you mean, but you're assuming that

  1. I can access the database (fair assumption, if I have to troubleshoot it)
  2. The plan is still in the cache (not guaranteed at all)
  3. The plan has ever been cached (again, not guaranteed. All trivial plans are not cached for instance)

I understand that the line protocol disallows some characters, hence they're escaped. What I cannot understand is why those characters are not saved to the database correctly. The central question is whether the characters are truly escaped or replaced with something that merely looks like an escape sequence. If it's really an escape sequence, the database should contain the original character, which is not the case.

@sawo1337
Author

In our environment, for example, we log the session data, including the plan, for anything that takes over 500ms, for later investigation.
I wouldn't use this plugin for anything but a pointer to the actual problem. Without the actual execution plan, most of the information is not that useful for finding the root cause.
For example, you know that you have a long-running query that uses CPU/IO/memory, but then what? You still need to examine the execution plan, IMO.
In this case, the special characters are stored as clear text and escaped as well. All that amounts to is making the query unreadable, and that's about it.
For me it is not that big of a deal, since I can always recompile the plugin with the special characters removed; I just don't see how having them stored as escaped plain text can benefit anyone.

@danielnelson
Contributor

@spaghettidba Unfortunately, it's not possible to encode all byte sequences using line protocol. In particular I don't think there is a way to send a tag value with a LF char. On the other hand, a string field could have newlines and tabs encoded without using replacement characters. For trivia purposes, other examples of unencodable data are the empty string and a value with a trailing space.
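To make the limitation concrete, here is a hedged Python sketch of the tag-value escaping rules (the function name is mine, not Telegraf's): commas, equals signs, and spaces get a backslash, but the protocol defines no escape sequence at all for a newline in a tag value:

```python
def escape_tag_value(value: str) -> str:
    # Line protocol escapes comma, equals, and space in tag values
    # with a backslash; a newline simply cannot be represented.
    if "\n" in value:
        raise ValueError("newline cannot be represented in a tag value")
    for ch in (",", "=", " "):
        value = value.replace(ch, "\\" + ch)
    return value

print(escape_tag_value("AWAITING COMMAND"))  # AWAITING\ COMMAND
```

This matches the `command=AWAITING\ COMMAND` tag visible in the `--test` output earlier in the thread.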

@spaghettidba
Contributor

@danielnelson Thanks for the info. I was under the impression that this should not be a tag, but a field. Did I misunderstand?

@danielnelson
Contributor

Currently Telegraf writes it as a tag and one of the points of discussion is to change it to be a field.

@Trovalo
Collaborator

Trovalo commented Feb 27, 2020

@spaghettidba Unfortunately, it's not possible to encode all byte sequences using line protocol. In particular, I don't think there is a way to send a tag value with a LF char. On the other hand, a string field could have newlines and tabs encoded without using replacement characters. For trivia purposes, other examples of unencodable data are the empty string and a value with a trailing space.

That answer intrigued me, so I ran some tests by manually entering some values via Chronograf.
Here is the line protocol string I passed to Chronograf (use it as-is):

Test,QueryId=1 QueryText="SELECT 
	ColumnA,
	ColumnB,
	ColumnC
FROM MyTable"
Test,QueryId=2 QueryText="SELECT 
	a.ColumnA,
	a.ColumnB,
	a.ColumnC,
	b.SomeThing
FROM MyTable AS a
INNER JOIN OtherTable AS b
ON a.Id = b.Id"

The result is then displayed correctly in Grafana by setting:

  • Query → Format As: Table
  • Visualization: Table
    • table transform: Table
    • column style (for the QueryText column):
      • Type: String
      • Type: Preserve Formatting: True

[screenshot: Grafana table rendering the multi-line QueryText field with preserved formatting]

In the end, it is possible to output those strings correctly if they are stored as a field and not encoded.

As a note, I've found the statement below in the Influx line protocol docs confusing:
https://docs.influxdata.com/influxdb/v1.7/write_protocols/line_protocol_reference/

Line protocol accepts the newline character \n and is whitespace-sensitive.

and immediately after

Note Line protocol does not support the newline character \n in tag values or field values.

In fact, as @danielnelson stated before, it is possible to use the newline char inside a string field

@Trovalo
Collaborator

Trovalo commented Apr 22, 2020

I've run some tests today: once the statement_text column is converted from a tag to a field, the text in it is interpreted correctly.

I've used the "converter" processor to achieve it.
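For reference, a converter setup along those lines might look like this in telegraf.conf (the `namepass` filter is an assumption; adjust to match your measurements):

```toml
# Turn the statement_text tag emitted by the sqlserver input into a
# string field so its newlines and tabs survive without tag escaping.
[[processors.converter]]
  namepass = ["sqlserver_requests"]
  [processors.converter.tags]
    string = ["statement_text"]
```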

@sjwang90
Contributor

sjwang90 commented Apr 5, 2021

Closing this as @Trovalo has provided a workaround:

I've run some tests today: once the statement_text column is converted from a tag to a field, the text in it is interpreted correctly.
I've used the "converter" processor to achieve it.

@sjwang90 sjwang90 closed this as completed Apr 5, 2021
@spaghettidba
Contributor

I'm not sure that closing bugs because there's a workaround is a good idea. The SQL Server plugin by default stores all text data as tags, which is what leads to this issue. The mere existence of a workaround does not solve the bug, IMHO.

@sawo1337
Author

sawo1337 commented Apr 5, 2021

Agreed. I don't see why we shouldn't fix this instead of leaving it closed with a workaround. There are no real benefits to keeping statement_text as a tag, as per the discussion above.

@sjwang90
Contributor

sjwang90 commented Apr 5, 2021

This is not a bug, since it is performing as expected. I am going to label this as a feature request instead and rename it to "change statement_text from a tag to a field".

Anyone please feel free to open a PR for this change.

@sjwang90 sjwang90 reopened this Apr 5, 2021
@sjwang90 sjwang90 changed the title sqlserver input stores statement_text with extra escape characters sqlserver input: change statement_text from a tag to a field Apr 5, 2021
@sjwang90 sjwang90 removed the bug unexpected problem or unintended behavior label Apr 5, 2021