Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sql server remodel #3618

Merged
merged 21 commits into from
Feb 1, 2018
Merged

Conversation

m82labs
Copy link
Contributor

@m82labs m82labs commented Dec 22, 2017

This is a rewrite of most of the data collection queries for the SQL Server plugin.

Changes:

  • Added AzureDB support
  • Added configuration option to exclude some queries
  • Added configuration option to switch between original and new queries
  • Queries are initialized once when the plugin is initialized, instead of on each time metrics are gathered
  • Re-wrote README to include new config options and explain the v2 queries. Also removed the Grafana examples and metrics output dump to simplify it.

Notes:

  • New queries were written with performance in mind (not readability :) )
  • New queries will return raw metrics whenever possible, deltas are not calculated in the TSQL or plugin code
  • More human friendly names have been added for things like memory clerks
  • Wait stats have been categorized based on Microsoft published categories
  • These changes have been running on 200+ instances in a development environment for over a week

Required for all PRs:

  • Signed CLA.
  • Associated README.md updated.
  • Has appropriate unit tests.

@m82labs
Copy link
Contributor Author

m82labs commented Dec 22, 2017

I do need someone to test this with Azure SQL DB. My free trial expired before I could fully test it.

@danielnelson
Copy link
Contributor

@regevbr Do you think you would be able to help with Azure testing?

@m82labs
Copy link
Contributor Author

m82labs commented Dec 28, 2017

@regevbr "AzureDB support" in this case is simply adding a query to an AzureDB specific DMV for resource utilization as well as adding logic throughout V2 queries to avoid DMVs not available in AzureDB. Please suggest more queries that could be added if you can. I was thinking of adding per DB wait stats instead of the "server" wide, since those can be misleading in AzureDB.

I would also be interested to know if AzureDB users would typically expect to set up telegraf by the database, or by the server.

@zensqlmonitor
Copy link
Contributor

@m82labs could you please update/upload the grafana dashboard with the modifications you have done?

@m82labs
Copy link
Contributor Author

m82labs commented Jan 4, 2018 via email

@danielnelson
Copy link
Contributor

While we will have the original queries for some time, when eventually we release Telegraf 2.0 I would like to drop the old version. This might be a ton of work, but maybe we should make a list of the queries we have in the current version and note if we can still make them with this format?

@m82labs
Copy link
Contributor Author

m82labs commented Jan 4, 2018 via email

@danielnelson
Copy link
Contributor

I'm interested in knowing if there are queries on the current dashboard that could not be rewritten with the new queries/format due to data not being collected. That doesn't necessarily mean we have to have equivalents, we just should have an idea what is no longer available.

@danielnelson danielnelson added this to the 1.6.0 milestone Jan 4, 2018
@danielnelson danielnelson added the feat Improvement on an existing feature such as adding a new setting/mode to an existing plugin label Jan 4, 2018
@m82labs
Copy link
Contributor Author

m82labs commented Jan 4, 2018 via email

@kerams
Copy link
Contributor

kerams commented Feb 2, 2018

I'm excited to try this, because the old version generates an insane amount of measurements, most of which are of no real interest to me. Thanks for this.

@danielnelson
Copy link
Contributor

@kerams Should be available in the nightly build now if you are interested: https://dl.influxdata.com/telegraf/nightlies/telegraf-nightly_windows_amd64.zip, would love to hear what you think.

@m82labs
Copy link
Contributor Author

m82labs commented Feb 3, 2018

@kerams longer term I want to add better support for AzureDB. Prior to the release I want to add backup throughput counters as well as the user defined counters.

@kerams
Copy link
Contributor

kerams commented Feb 5, 2018

I've started porting the Grafana dashboard and these are the questions/notes so far:

  • The database counts (online, offline, etc.) don't seem to include system databases
  • Many performance counter readings have /sec in their name (and have the wrong c_type tag) even though they represent raw values, not deltas
  • The CPU usage % counter has 2 tag values for instance - default and internal. What are these?
  • Apart from the ones you mentioned above, I could not find the appropriate measurements to populate these panels:
    • Page file usage
    • Target memory
    • Used memory
    • Page file
    • Row/log writes and reads for the entire instance
    • Log used %
    • System log used %
  • I'm running both versions of the plugin side-by-side on the same DB instance and several v1 and v2 counters seem to differ by a relatively wide margin. I'll keep an eye on this.

@m82labs
Copy link
Contributor Author

m82labs commented Feb 5, 2018 via email

@kerams
Copy link
Contributor

kerams commented Feb 5, 2018

Do you have a few examples of the c_type issues?

For instance, Batch Requests/sec's and SQL Compilations/sec's c_type is rate, but they contain raw values that I feed to non_negative_derivative.

@m82labs
Copy link
Contributor Author

m82labs commented Feb 5, 2018 via email

@kerams
Copy link
Contributor

kerams commented Feb 5, 2018

I see. The rate and /sec combination is a bit unfortunate. Nonetheless, I don't really see the benefit of that tag (and object for that matter). Sure, it helps you design your queries at first, but its value for a counter never changes. Could this kind of meta information perhaps be more suited for the docs?

@m82labs
Copy link
Contributor Author

m82labs commented Feb 5, 2018 via email

@kerams
Copy link
Contributor

kerams commented Feb 6, 2018

Take a look at https://i.imgur.com/C8wuyAo.png and https://i.imgur.com/OI4lHlh.png. Logouts and SQL (Re-)Compilations appear to be significantly different.

@m82labs
Copy link
Contributor Author

m82labs commented Feb 6, 2018 via email

@kerams
Copy link
Contributor

kerams commented Feb 6, 2018

Tried it, but the averages reported by Grafana remained unaffected.

@m82labs
Copy link
Contributor Author

m82labs commented Feb 6, 2018 via email

@kerams
Copy link
Contributor

kerams commented Feb 6, 2018

SQL Compilations and Batch requests in perfmon indeed seem to match v2 more closely.

@m82labs
Copy link
Contributor Author

m82labs commented Feb 6, 2018

@kerams I ran this for ~ 1 hour and checked and the new results seem to line up with performance monitor:
image
image

@kerams
Copy link
Contributor

kerams commented Feb 6, 2018

Yeah, sorry for the false alarm. V2 looks good then.

@m82labs
Copy link
Contributor Author

m82labs commented Feb 6, 2018 via email

@DieterHi
Copy link

DieterHi commented Apr 4, 2018

Is it possible to get the values from sqlserver_performance as Integer and not as String?
SHOW FIELD KEYS
name: sqlserver_performance
fieldKey fieldType
value string

name: sqlserver_database_io
fieldKey fieldType
read_bytes integer
read_latency_ms integer
reads integer
write_bytes integer
write_latency_ms integer
writes integer

from telegraf test ->
*sqlserver_performance,counter=Query,host=hamd,instance=User* counter\ 9,object=SQLServer:User\ Settable,sql_instance=HAMD value=“0.000000000000” 1522841856000000000

sqlserver_server_properties,host=ham,sql_instance=HAMD,sql_version=12.0.2000.8 db_suspect=0i,uptime=17249i,db_restoring=0i,server_memory=33553840i,cpu_count=4i,db_recovering=0i,db_online=8i,db_offline=0i,db_recoveryPending=0i 1522841856000000000

@m82labs
Copy link
Contributor Author

m82labs commented Apr 4, 2018 via email

@DieterHi
Copy link

DieterHi commented Apr 4, 2018

I use telegraf 1.6 RC3 with Kafka output.
sorry ->
Telegraf v1.6.0~rc2 (git: release-1.6 1e95f97)

@m82labs
Copy link
Contributor Author

m82labs commented Apr 6, 2018

@zensqlmonitor would you be able to shed a little light on how the plugin decides which data type to use? It seems any recent build is outputting strings, but the SQL datatypes being returned are not, they are typically type numeric.

@m82labs
Copy link
Contributor Author

m82labs commented Apr 6, 2018

@DieterHi I have a fix I am currently testing.

@danielnelson
Copy link
Contributor

Might be worth looking into moving back to the main go-mssqldb repo: https://github.com/denisenkom/go-mssqldb

@m82labs
Copy link
Contributor Author

m82labs commented Apr 6, 2018

@danielnelson good call. I can look at whats involved.

EDIT: A quick test and it looks like nothing would need to change, but I will do some further testing before doing a PR.

@DieterHi
Copy link

DieterHi commented Apr 9, 2018 via email

@m82labs
Copy link
Contributor Author

m82labs commented Apr 10, 2018

Good to hear @DieterHi !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feat Improvement on an existing feature such as adding a new setting/mode to an existing plugin
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants