Sql server remodel #3618
…y initialization to only happen once.
…tance is not running on AzureDB.
I do need someone to test this with Azure SQL DB. My free trial expired before I could fully test it. |
@regevbr Do you think you would be able to help with Azure testing? |
@regevbr "AzureDB support" in this case simply means adding a query against an AzureDB-specific DMV for resource utilization, as well as adding logic throughout the V2 queries to avoid DMVs that are not available in AzureDB. Please suggest more queries that could be added if you can. I was thinking of adding per-DB wait stats instead of the server-wide ones, since those can be misleading in AzureDB. I would also be interested to know whether AzureDB users would typically expect to set up telegraf by the database or by the server. |
@m82labs could you please update/upload the grafana dashboard with the modifications you have done? |
Since the queries are so different in how the data is gathered it would be
a complete re-write of the dashboard. I could upload a custom dashboard I
use (it is actually 4 different dashboards), but it is very much tweaked
for my own environment, and I assumed others would build their own from
scratch.
I should mention that the original queries are all still here and could
continue to be used for those using the dashboard created around them.
|
While we will have the original queries for some time, when eventually we release Telegraf 2.0 I would like to drop the old version. This might be a ton of work, but maybe we should make a list of the queries we have in the current version and note if we can still make them with this format? |
Just to make sure I understand, are you saying that we should see if we can
rewrite the existing queries to return data in a more telegraf-friendly
format? (No delta calculations, etc.)
If so, I will try to fit the new query results to the existing dashboard
and see what's missing and we can go from there. Since I added the ability
to exclude specific queries, adding more queries to cover what the old
queries were returning would likely meet everyone's needs here.
|
I'm interested in knowing if there are queries on the current dashboard that could not be rewritten with the new queries/format due to data not being collected. That doesn't necessarily mean we have to have equivalents, we just should have an idea what is no longer available. |
Ah, that makes sense. I will document what would no longer be available.
|
I'm excited to try this, because the old version generates an insane amount of measurements, most of which are of no real interest to me. Thanks for this. |
@kerams Should be available in the nightly build now if you are interested: https://dl.influxdata.com/telegraf/nightlies/telegraf-nightly_windows_amd64.zip, would love to hear what you think. |
@kerams longer term I want to add better support for AzureDB. Prior to the release I want to add backup throughput counters as well as the user defined counters. |
I've started porting the Grafana dashboard and these are the questions/notes so far:
- The database counts (online, offline, etc.) don't seem to include system databases
- Many performance counter readings have /sec in their name (and have the wrong c_type tag) even though they represent raw values, not deltas
- The CPU usage % counter has 2 tag values for instance - default and internal. What are these?
- Apart from the ones you mentioned above, I could not find the appropriate measurements to populate these panels:
  - Page file usage
  - Target memory
  - Used memory
  - Page file
  - Row/log writes and reads for the entire instance
  - Log used %
  - System log used %
- I'm running both versions of the plugin side-by-side on the same DB instance and several v1 and v2 counters seem to differ by a relatively wide margin. I'll keep an eye on this.
|
- Good catch @kerams. Not sure why I am not including system databases, I
will change that.
- Do you have a few examples of the c_type issues? I can take a look. I am
calculating that field so I am not sure what could be going wrong. (I based
it on this doc:
https://blogs.msdn.microsoft.com/psssql/2013/09/23/interpreting-the-counter-values-from-sys-dm_os_performance_counters/)
- The CPU usage included in this plugin is specifically for resource
governor workload groups. So in this case it shows the internal and default
workload groups. For more detailed CPU information you would need to use
the CPU plugin.
- Page file usage would be another that would need to be grabbed from
another plugin, as well as more detailed memory information. It does have
total physical memory, and I could include the max server memory as well.
- Row/log writes and reads are captured at the database level currently. It
wouldn't be too difficult to alter the query to get a total as well, I can
add that.
- Log Files Size and Used are both captured, so free % could be derived from
that (same for system log if you wanted that in another graph).
In any case where an existing plugin could capture better data I left the
data out. I figured this was the best approach to keep this a pure SQL
Server plugin. Some of the "server" counters might not make a whole lot of
sense on a SQL on Linux instance, for example. As far as the data being
different, this is expected. The original plugin was capturing short
pockets of time at a given interval, so it missed a lot of detail. This
plugin relies on the user to do diff calculations so it can report at a
much higher frequency, providing a lot more detail. Also, there is almost
zero math happening in the plugin itself, so the data should be a lot more
accurate. When I deployed this in my production environment I noticed
things I had never noticed before, but it matched my previous metrics for
the most part (I was not using telegraf, I was doing raw perf counter
captures).
If you can give me a few examples of where they vary, I can try to provide
a better explanation.
|
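As a rough illustration of the point above that log free % could be derived from the captured size and used values, here is a minimal Python sketch. The function names and the kilobyte inputs are assumptions made for illustration; the plugin's actual field names and units are not stated in this thread.

```python
def log_used_percent(size_kb: float, used_kb: float) -> float:
    """Percentage of the log file currently in use.

    size_kb and used_kb are hypothetical inputs standing in for the
    "Log Files Size" and "Log Files Used" readings mentioned above.
    """
    if size_kb <= 0:
        raise ValueError("size_kb must be positive")
    return used_kb / size_kb * 100.0


def log_free_percent(size_kb: float, used_kb: float) -> float:
    """Free percentage is simply the complement of the used percentage."""
    return 100.0 - log_used_percent(size_kb, used_kb)


# Hypothetical reading: a 2048 KB log file with 512 KB in use.
used = log_used_percent(2048, 512)
free = log_free_percent(2048, 512)
```

In a dashboard the same arithmetic would normally be done in the query layer rather than in code; the sketch only shows that both panels can be built from the two captured values.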
For instance, |
Ah this sounds like a terminology issue on my part maybe. 'raw' means it
can be reported directly, using avg or something similar, 'rate' means it
has to be fed to 'non_negative_derivative'. It sounds confusing now that I
write that. Maybe 'current' and 'cumulative' would be better?
|
I see. The rate and /sec combination is a bit unfortunate. Nonetheless, I don't really see the benefit of that tag. Sure, it helps you design your queries at first, but its value for a counter never changes. Could this kind of meta information perhaps be more suited for the docs? |
I originally wanted to do that, but wanted to avoid adding all of the
counters to the docs. After working with this a while though I DO feel
having that extra tag on there is kind of a waste.
|
Take a look at https://i.imgur.com/C8wuyAo.png and https://i.imgur.com/OI4lHlh.png. Logouts and SQL (Re-)Compilations appear to be significantly different. |
Can you change your graphs for the V2 queries to calculate like this:
`non_negative_derivative(last("value"),1s)`
This will give you a more accurate number. Using the mean on cumulative
numbers like these will also under-report the value.
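To make the suggestion above concrete, here is a small Python sketch of what a non-negative derivative does to a cumulative counter. It is a simplified model of the InfluxQL function, not its implementation, and the sample readings are hypothetical.

```python
from typing import List, Tuple

Sample = Tuple[float, float]  # (timestamp in seconds, cumulative counter value)


def non_negative_derivative(samples: List[Sample], unit_seconds: float = 1.0) -> List[Sample]:
    """Rate of change per unit between consecutive samples.

    Negative rates are dropped; they typically appear when a cumulative
    counter resets, for example after an instance restart.
    """
    rates = []
    for (t0, v0), (t1, v1) in zip(samples, samples[1:]):
        elapsed = t1 - t0
        if elapsed <= 0:
            continue  # skip duplicate or out-of-order timestamps
        rate = (v1 - v0) / elapsed * unit_seconds
        if rate >= 0:
            rates.append((t1, rate))
    return rates


# Hypothetical cumulative readings taken every 10 seconds,
# with a counter reset between t=20 and t=30.
samples = [(0, 1000), (10, 1050), (20, 1150), (30, 40), (40, 90)]
rates = non_negative_derivative(samples)
```

The counter values themselves only ever grow (until a reset), so graphing them directly says little about activity; the differences between consecutive readings are what correspond to a per-second figure.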
|
Tried it, but the averages reported by Grafana remained unaffected. |
Have you tried using performance monitor on the instance itself to see
which is closer to reality?
I am going to do this now on one of my instances.
|
SQL Compilations and Batch requests in perfmon indeed seem to match v2 more closely. |
@kerams I ran this for ~ 1 hour and checked and the new results seem to line up with performance monitor: |
Yeah, sorry for the false alarm. V2 looks good then. |
No worries, it forced me to double check. :)
|
Is it possible to get the values from sqlserver_performance as Integer and not as String?
SHOW FIELD KEYS
name: sqlserver_performance
fieldKey fieldType
value string
name: sqlserver_database_io
fieldKey fieldType
read_bytes integer
read_latency_ms integer
reads integer
write_bytes integer
write_latency_ms integer
writes integer
from telegraf test ->
sqlserver_performance,counter=Query,host=hamd,instance=User\ counter\ 9,object=SQLServer:User\ Settable,sql_instance=HAMD value="0.000000000000" 1522841856000000000
sqlserver_server_properties,host=ham,sql_instance=HAMD,sql_version=12.0.2000.8 db_suspect=0i,uptime=17249i,db_restoring=0i,server_memory=33553840i,cpu_count=4i,db_recovering=0i,db_online=8i,db_offline=0i,db_recoveryPending=0i 1522841856000000000
|
This was not intentional. I'll take a look at it.
|
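For context on the string-versus-integer question above: in InfluxDB line protocol a field value wrapped in double quotes is a string, a trailing `i` marks an integer, and a bare number is a float. The Python sketch below classifies a raw field value along those lines. It is a simplified illustration (it ignores unsigned integers and escaping), and `field_type` is a hypothetical helper, not part of Telegraf.

```python
BOOLEAN_LITERALS = {"t", "T", "true", "True", "TRUE", "f", "F", "false", "False", "FALSE"}


def field_type(raw: str) -> str:
    """Classify a raw line protocol field value as string, boolean, integer or float."""
    if len(raw) >= 2 and raw.startswith('"') and raw.endswith('"'):
        return "string"
    if raw in BOOLEAN_LITERALS:
        return "boolean"
    if raw.endswith("i"):
        try:
            int(raw[:-1])
            return "integer"
        except ValueError:
            pass  # not an integer literal, fall through to the float check
    float(raw)  # raises ValueError if the value is not numeric
    return "float"


# Field values taken from the telegraf test output quoted above.
performance_value = field_type('"0.000000000000"')
uptime_value = field_type("17249i")
```

This matches what SHOW FIELD KEYS reported: the quoted `value` field is stored as a string, while the `i`-suffixed fields in sqlserver_server_properties are integers.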
I use telegraf 1.6 RC3 with Kafka output. |
@zensqlmonitor would you be able to shed a little light on how the plugin decides which data type to use? It seems any recent build is outputting strings, but the SQL datatypes being returned are not, they are typically type |
@DieterHi I have a fix I am currently testing. |
Might be worth looking into moving back to the main go-mssqldb repo: https://github.com/denisenkom/go-mssqldb |
@danielnelson good call. I can look at what's involved. EDIT: A quick test and it looks like nothing would need to change, but I will do some further testing before doing a PR. |
Hi Mark,
I am now using the Telegraf nightly build v1.7.0~a28de4b5 (git: master a28de4b) with your new code, and it runs.
Thanks for your good work and your support!
Greetings, Dieter
|
Good to hear @DieterHi ! |
This is a rewrite of most of the data collection queries for the SQL Server plugin.
Changes:
Notes:
Required for all PRs: