sqlserver input: change statement_text from a tag to a field #6976
Can you attach the output of
Sure:
Apparently, the statement is being parsed as formatted SQL, with newlines, tabs, etc. This is why all of that is being converted to \n, \t, etc.
To this:
The above removes all extra whitespace, tabs, newlines, etc.
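The normalization query itself is not preserved here. As a rough Python sketch of the same idea (not the code that was posted), every run of whitespace can be collapsed into a single space. The sample string is the statement from the "Actual behavior" section, assumed to contain real CR, LF and tab characters before Telegraf escapes them.

```python
def collapse_whitespace(text: str) -> str:
    """Replace every run of whitespace (spaces, tabs, CR, LF) with one space."""
    # str.split() with no argument splits on any run of whitespace and
    # discards leading and trailing whitespace, so joining with " " collapses it.
    return " ".join(text.split())


# Statement from the "Actual behavior" section, with real control characters.
raw = (
    "select @par = par\r\n\t\tfrom [server].[db].dbo.table\r\n\t\t"
    "where id = @p_par_id and \r\n\t\t\t([dbid] = @p_db_id or [db_id] = 0"
)
print(collapse_whitespace(raw))
```

Line breaks are replaced with a space rather than deleted, so adjacent tokens such as `par` and `from` stay separate. Note that this also collapses whitespace inside SQL string literals, which can change their meaning.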
Just linking to some related discussion: #6678 (comment).
This tag value definitely sticks out: we don't have any other tag values that are this large or that contain whitespace like this. Additionally, as pointed out by @Trovalo on the community site, tag key values are limited to 64KiB in length. This seems like a strong argument in favor of switching this to a string field.
To me this is not a bug: Telegraf is just saving the string as-is, encoding the characters that would otherwise be lost. Yes, the raw output is not nice to look at, but even removing the special characters may not give you a readable result.
IMO this kind of logic belongs where the data is read, not where it is written to the database, for the reasons below:
@danielnelson I'm curious to know whether Flux, which has a replace function (not present in InfluxQL), can actually return multi-line strings.
Summary
Removing the encoded characters will just save a bit of space, but it will also limit your options for managing the data later, since the only way to format that huge one-line string will be to reformat it somehow, without the possibility of using a simple replace function.
My Case
Not Related Note:
\n and \t are going to be read as plain text by pretty much any tool that reads data from InfluxDB, which is incorrect in my opinion. Also, if someone really wants pretty output, they can use Poor Man's T-SQL Formatter, for example. It is available as a plugin for SSMS, which most SQL developers use anyway. I also don't see any tool in the InfluxData stack that would automatically render this kind of data as formatted SQL.
Both Flux and InfluxQL should be able to return multi-line strings without a problem; however, I think Grafana and Chronograf would both have a hard time displaying this string, since it is so far from what a tag is expected to look like in terms of size and format.
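The read-time approach argued for earlier in the thread amounts to mapping the stored two-character sequences back to the control characters they stand for. A minimal Python sketch, assuming the stored value contains the literal sequences `\r`, `\n` and `\t` exactly as in the "Actual behavior" example; it is not a Flux or InfluxQL function.

```python
import re

# Each two-character sequence as it appears in the stored value
# (a backslash followed by r, n or t) and the character it stands for.
_ESCAPES = {r"\r": "\r", r"\n": "\n", r"\t": "\t"}
_PATTERN = re.compile(r"\\[rnt]")


def decode_escapes(stored: str) -> str:
    """Turn literal backslash-r/n/t sequences back into CR, LF and TAB."""
    # A single left-to-right pass, so replaced characters are never re-scanned.
    return _PATTERN.sub(lambda m: _ESCAPES[m.group(0)], stored)


stored = r"select @par = par\r\n\t\tfrom [server].[db].dbo.table"
print(decode_escapes(stored))
```

This cannot tell an encoded newline apart from a genuine backslash followed by `n` in the original SQL (for example inside a string literal), an ambiguity that does not arise when the text is stored unencoded in a string field.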
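It seems that there are two decisions to make: whether to move statement_text from a tag to a field, and whether to remove the whitespace formatting.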
I'm leaning towards yes on moving the tag to a field and no (at least for now) on removing the whitespace formatting. The downside of moving statement_text to a field is that you will no longer be able to group by statement_text; instead you will need to use query_hash. To determine what the query_hash is, you will probably need to look it up first in a separate query. I can't remember if we made query_hash a tag already; if it is still a field, we will need to move it over to a tag in order to keep series identity. We should probably rename the tags/fields if we swap them; otherwise it can be somewhat confusing to query when you have a tag and a field with the same name.
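Why query_hash has to be a tag to keep series identity can be sketched with a simplified model in which a series is identified by the measurement plus its sorted tag set, while field values never contribute. This is an illustration, not InfluxDB's internal key format; the measurement name and hash value below are hypothetical, and escaping is ignored.

```python
def series_key(measurement: str, tags: dict) -> str:
    """Simplified series identity: measurement plus sorted tag set.

    Field values never contribute, and escaping is ignored in this sketch.
    """
    parts = [measurement] + [f"{k}={v}" for k, v in sorted(tags.items())]
    return ",".join(parts)


# Hypothetical measurement name and hash value, for illustration only.
as_tag = series_key(
    "sqlserver_requests",
    {"query_hash": "0x1A2B3C", "statement_text": "select 1"},
)
as_field = series_key("sqlserver_requests", {"query_hash": "0x1A2B3C"})
print(as_tag)    # the whole statement text is part of the series key
print(as_field)  # only query_hash identifies the series
```

With statement_text as a tag, the full SQL text becomes part of every series key; once it is a field, query_hash alone distinguishes the series.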
Completely agree on moving "statement_text" from tag to field. The column "query_hash" has been a tag since issue #6679, which changed its value from binary to string and therefore made it a tag automatically (for this plugin, all string values are tags). About "statement_text", @sawo1337's opinion is not wrong: if you want to format that text "manually" by copy-pasting, having clean text is handy. Finally, not being able to group by "statement_text" is not an issue to me anyway; it is unreadable in charts and in tables with more than 3-4 columns.
Throwing in my 2 cents:
I would investigate the following:
All of these have different solutions, but messing with the string data using replace and the like is not one of them.
Unescaped special characters are not allowed in the InfluxDB line protocol. You can get the plan handle ID along with the statement (there is a field for it), which would be the most definitive way of getting the exact plan you want to see.
I see what you mean, but you're assuming that
I understand that the line protocol disallows some characters, hence they're escaped. What I cannot understand is why those characters are not saved in the database correctly. The central question is whether the characters are escaped or replaced with something that looks like an escape sequence but isn't one. If it really were an escape sequence, the database would contain the original character, which is not the case.
In our environment, for example, we log the session data, including the plan, for anything that takes over 500 ms, for later investigation.
@spaghettidba Unfortunately, it's not possible to encode every byte sequence using line protocol. In particular, I don't think there is a way to send a tag value containing an LF character. On the other hand, a string field can have newlines and tabs encoded without using replacement characters. For trivia purposes, other examples of unencodable data are the empty string and a value with a trailing space.
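The difference between tag values and string field values can be sketched in Python. This assumes the usual line protocol rules (commas, equals signs and spaces backslash-escaped in tag values; double quotes and backslashes escaped inside a quoted string field) plus the unencodable cases listed in the comment above; the helper names, measurement name and values are hypothetical, and this is not Telegraf's serializer.

```python
def escape_tag_value(value: str) -> str:
    """Backslash-escape a tag value, or raise if it cannot be encoded."""
    # Unencodable cases as listed in the comment above (not an exhaustive spec).
    if value == "" or "\n" in value or value.endswith(" "):
        raise ValueError(f"tag value cannot be encoded: {value!r}")
    # Commas, equals signs and spaces must be escaped in tag values.
    for ch in (",", "=", " "):
        value = value.replace(ch, "\\" + ch)
    return value


def escape_string_field(value: str) -> str:
    """Quote a string field value; newlines and tabs are left as-is."""
    # Escape backslashes first so the ones added before quotes are not doubled.
    value = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{value}"'


# Hypothetical measurement and values, for illustration only.
statement = "select @par = par\r\n\t\tfrom [server].[db].dbo.table"
line = (
    "sqlserver_requests,query_hash=" + escape_tag_value("0x1A2B3C")
    + " statement_text=" + escape_string_field(statement)
)
print(line)

try:
    escape_tag_value(statement)
except ValueError as err:
    print(err)
```

The same statement is rejected as a tag value because it contains an LF, while as a string field its newlines and tabs pass through unchanged.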
@danielnelson Thanks for the info. I was under the impression that this should not be a tag, but a field. Did I misunderstand?
Currently Telegraf writes it as a tag, and one of the points of discussion is to change it to a field.
That answer intrigued me, so I ran some tests, manually entering some values using Chronograf.
The result is then displayed correctly in Grafana by setting:
In the end, it is possible to output those strings correctly if they are stored as a field and not encoded. As a note, I found the statement below in the InfluxDB line protocol docs confusing:
and immediately after
In fact, as @danielnelson stated before, it is possible to use the newline character inside a string field.
I ran some tests today: once the "statement_text" column is converted from a tag to a field, the text in it is interpreted correctly. I used the "converter" processor to achieve this.
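In telegraf.conf, that workaround is a `[[processors.converter]]` section listing `string = ["statement_text"]` under `[processors.converter.tags]`. What it does to each metric can be modeled with a small Python sketch; the dict layout is a hypothetical stand-in for Telegraf's metric type, not its API, and the metric name and field names are for illustration only.

```python
def tag_to_string_field(metric: dict, key: str) -> dict:
    """Return a copy of metric with tag `key` moved into its fields as a string."""
    tags = dict(metric["tags"])
    fields = dict(metric["fields"])
    if key in tags:
        # In this sketch an existing field with the same name is overwritten.
        fields[key] = tags.pop(key)
    return {"name": metric["name"], "tags": tags, "fields": fields}


# Hypothetical metric; names and values are for illustration only.
metric = {
    "name": "sqlserver_requests",
    "tags": {"query_hash": "0x1A2B3C", "statement_text": "select 1\r\nfrom t"},
    "fields": {"cpu_time_ms": 5},
}
print(tag_to_string_field(metric, "statement_text"))
```

The value keeps its real CR/LF characters; because it now travels as a string field, the output can encode them without replacement.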
Closing this as @Trovalo has provided a workaround.
I'm not sure that closing bugs because there's a workaround is a good idea. By default, the SQL Server plugin stores all text data as tags, which leads to this issue. The mere existence of a workaround does not solve the bug, IMHO.
Agreed. Why not fix this instead of closing it with a workaround? As per the discussion above, there are no real benefits to keeping "statement_text" as a tag.
This is not a bug, since it is performing as expected. I am going to label this as a feature request instead and rename it to "change statement_text from a tag to a field". Anyone, please feel free to open a PR for this change.
Relevant telegraf.conf:
default sqlserver setup, query_version=2
System info:
Telegraf 1.13.2 for Ubuntu
SQL Server 2017 server
Steps to reproduce:
Expected behavior:
data to be stored without escape characters, for example:
select @par = par from [server].[db].dbo.table where id = @p_par_id and ([dbid] = @p_db_id or [db_id] = 0
Actual behavior:
data is stored with \n, \t, \r characters:
select @par = par\r\n\t\tfrom [server].[db].dbo.table\r\n\t\twhere id = @p_par_id and \r\n\t\t\t([dbid] = @p_db_id or [db_id] = 0