
Some metrics disappear after upgrading to 1.6.x and above using the Prometheus Output Plugin #3977

Closed
allangood opened this issue Apr 5, 2018 · 7 comments · Fixed by #3978
Labels
bug unexpected problem or unintended behavior

Comments

@allangood

Bug report

After upgrading from 1.5.3 to 1.6.x or 1.7.x on Windows or Linux machines, I see strange behavior.

Some metrics from inputs such as inputs.mem and inputs.sqlserver just disappear from the list exposed for Prometheus (http://myserver/metrics), although they work perfectly with the --test switch on the command line.

I've tried running as a different user (on Windows) and using the IP address instead of the hostname; neither worked.
There are no relevant entries in the log file, even when running with the "debug" flag set to true.

I will be glad to provide more information if you need it.

Thank you.

Relevant telegraf.conf:

[global_tags]
  collector = "telegraf"
  fqdn = "myserver.mydomain"
  location = "mylocation"
  ostype = "windows"
[[outputs.prometheus_client]]
  expiration_interval = "60s"
  listen = "0.0.0.0:9273"
[[inputs.mem]]
[[inputs.sqlserver]]
  query_version = 2
  servers = [ "Server=localhost\\instance;User Id=telegraf;Password=telegraf;app name=telegraf;encrypt=disable;log=0"]	

System info:

Telegraf 1.6.x and above.
Using the Prometheus output plugin (prometheus_client).

Steps to reproduce:

Upgrade to 1.6.x or above.

Expected behavior:

Output from version 1.5.3:

curl http://myserver02/metrics

# HELP mem_active Telegraf collected metric
# TYPE mem_active counter
mem_active{collector="telegraf",fqdn="myserver02.mydomain",host="myserver02",location="mylocation",ostype="windows"} 0
# HELP mem_available Telegraf collected metric
# TYPE mem_available counter
mem_available{collector="telegraf",fqdn="myserver02.mydomain",host="myserver02",location="mylocation",ostype="windows"} 5.204361216e+09
# HELP mem_available_percent Telegraf collected metric
# TYPE mem_available_percent counter
mem_available_percent{collector="telegraf",fqdn="myserver02.mydomain",host="myserver02",location="mylocation",ostype="windows"} 17.31076795555045
# HELP mem_buffered Telegraf collected metric
# TYPE mem_buffered counter
mem_buffered{collector="telegraf",fqdn="myserver02.mydomain",host="myserver02",location="mylocation",ostype="windows"} 0
# HELP mem_cached Telegraf collected metric
# TYPE mem_cached counter
mem_cached{collector="telegraf",fqdn="myserver02.mydomain",host="myserver02",location="mylocation",ostype="windows"} 0
# HELP mem_free Telegraf collected metric
# TYPE mem_free counter
mem_free{collector="telegraf",fqdn="myserver02.mydomain",host="myserver02",location="mylocation",ostype="windows"} 0
# HELP mem_inactive Telegraf collected metric
# TYPE mem_inactive counter
mem_inactive{collector="telegraf",fqdn="myserver02.mydomain",host="myserver02",location="mylocation",ostype="windows"} 0
# HELP mem_slab Telegraf collected metric
# TYPE mem_slab counter
mem_slab{collector="telegraf",fqdn="myserver02.mydomain",host="myserver02",location="mylocation",ostype="windows"} 0
# HELP mem_total Telegraf collected metric
# TYPE mem_total counter
mem_total{collector="telegraf",fqdn="myserver02.mydomain",host="myserver02",location="mylocation",ostype="windows"} 3.0064300032e+10
# HELP mem_used Telegraf collected metric
# TYPE mem_used counter
mem_used{collector="telegraf",fqdn="myserver02.mydomain",host="myserver02",location="mylocation",ostype="windows"} 2.4859938816e+10
# HELP mem_used_percent Telegraf collected metric
# TYPE mem_used_percent counter
mem_used_percent{collector="telegraf",fqdn="myserver02.mydomain",host="myserver02",location="mylocation",ostype="windows"} 82.68923204444955

Actual behavior:

Version 1.6.x and above

Output from command line:

c:\Program Files\telegraf>telegraf.exe -config "c:\Program Files\telegraf\telegraf.conf" --config-directory "c:\Program Files\telegraf\telegraf.d" --test --input-filter mem

> mem,collector=telegraf,fqdn=myserver,host=myserver,location=mylocation,ostype=windows active=0i,available=8255950848i,available_percent=48.057264930773286,buffered=0i,cached=0i,free=i,inactive=0i,slab=0i,total=17179402240i,used=8923451392i,used_percent=51.942735069226714,wire=0i 1522960875000000000

Output from curl "http://myserver/metrics":

# TYPE mem_available_percent counter
mem_available_percent{collector="telegraf",fqdn="myserver",host="myserver",location="mylocation",ostype="windows"} 48.13911625367473
# HELP mem_used_percent Telegraf collected metric
# TYPE mem_used_percent counter
mem_used_percent{collector="telegraf",fqdn="myserver",host="myserver",location="mylocation",ostype="windows"} 51.86088374632527
@danielnelson
Contributor

I will add this to an official 1.6.0-rc4 tomorrow; in the meantime, you can test with these builds:

@allangood
Author

Hi @danielnelson

I've tested today and found that the "mem" plugin is working, but the sqlserver plugin is still not working properly.

All metrics for "Performance Counters" (sqlserver_performance) are missing.
The following outputs are working:

  • Database IO
  • Memory Clerk
  • Server properties
  • Wait stats

I'm not sure if all plugins are affected by this bug because I don't have a way to test all of them, sorry.

Thank you for your support.

@danielnelson
Contributor

I think it may be this issue where some fields are being returned as strings: #3618 (comment)

@allangood
Author

Looking deeper into the sqlserver problem, you are right.
All sqlserver_performance metrics are returned as strings, not as floats:

> sqlserver_performance,collector=telegraf,counter=Full\ Scans/sec,fqdn=myserver.mydomain,host=myserver,location=mylocation,object=MSSQL$MSSQL:Access\ Methods,ostype=windows,sql_instance=MYSERVER:MSSQL value="952301.000000000000" 1523025657000000000
> sqlserver_performance,collector=telegraf,counter=Index\ Searches/sec,fqdn=myserver.mydomain,host=myserver,location=mylocation,object=MSSQL$MSSQL:Access\ Methods,ostype=windows,sql_instance=MYSERVER:MSSQL value="65380032.000000000000" 1523025657000000000
> sqlserver_performance,collector=telegraf,counter=Page\ Splits/sec,fqdn=myserver.mydomain,host=myserver,location=mylocation,object=MSSQL$MSSQL:Access\ Methods,ostype=windows,sql_instance=MYSERVER:MSSQL value="91855.000000000000" 1523025657000000000
> sqlserver_performance,collector=telegraf,counter=Buffer\ cache\ hit\ ratio,fqdn=myserver.mydomain,host=myserver,location=mylocation,object=MSSQL$MSSQL:Buffer\ Manager,ostype=windows,sql_instance=MYSERVER:MSSQL value="100.000000000000" 1523025657000000000
> sqlserver_performance,collector=telegraf,counter=Checkpoint\ pages/sec,fqdn=myserver.mydomain,host=myserver,location=mylocation,object=MSSQL$MSSQL:Buffer\ Manager,ostype=windows,sql_instance=MYSERVER:MSSQL value="441499.000000000000" 1523025657000000000
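The quoted values above are string fields, and the thread attributes the missing series to them. Prometheus sample values are floating-point numbers, so a value such as "952301.000000000000" would have to be parsed before it could be exposed. A minimal Go sketch of that conversion using the standard library's strconv.ParseFloat; toSample is a hypothetical helper that illustrates the idea, not how Telegraf or the sqlserver input handles it.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// toSample converts a line protocol string field value, quotes included
// (e.g. "952301.000000000000"), into a float64. Hypothetical helper.
func toSample(raw string) (float64, error) {
	return strconv.ParseFloat(strings.Trim(raw, `"`), 64)
}

func main() {
	// The first two values are copied from the output above; the third is
	// an invented non-numeric string that exercises the error path.
	for _, raw := range []string{`"952301.000000000000"`, `"100.000000000000"`, `"not a number"`} {
		v, err := toSample(raw)
		if err != nil {
			fmt.Printf("%s: cannot convert (%v)\n", raw, err)
			continue
		}
		fmt.Printf("%s -> %g\n", raw, v)
	}
}
```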

Thank you @danielnelson

@bjammin

bjammin commented Jul 31, 2018

I'm still experiencing this on 1.7.2: the sqlserver_performance metrics are all missing.

@danielnelson
Contributor

@bjammin Are the sqlserver_performance metrics shown in the output of this command? If so, can you add that output here?

telegraf --input-filter sqlserver --test

@bjammin

bjammin commented Aug 1, 2018

Thanks for the reply. Yes, the metrics are shown, and I've now realized the problem is mine: I was looking at the metrics through Grafana and failed to filter properly by "counter".

Sorry to waste your time!
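The outputs earlier in the thread suggest how series are named: the mem measurement's active field is exposed as mem_active, and tags such as host become labels. Under that pattern, every sqlserver_performance counter would share one series name built from its value field, with individual counters told apart only by the counter label, which is why a Grafana query has to filter on it. A simplified Go sketch of the pattern; seriesName is a hypothetical helper, it does not sanitize names, and Go's %q quoting only approximates the exposition format's label escaping.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// seriesName formats one field of a metric as a Prometheus-style series,
// following the <measurement>_<field>{tag="value",...} pattern seen in this
// thread. Hypothetical helper for illustration only.
func seriesName(measurement, field string, tags map[string]string) string {
	name := measurement + "_" + field
	if len(tags) == 0 {
		return name
	}
	keys := make([]string, 0, len(tags))
	for k := range tags {
		keys = append(keys, k)
	}
	sort.Strings(keys) // stable label order for readable output
	labels := make([]string, 0, len(keys))
	for _, k := range keys {
		labels = append(labels, fmt.Sprintf("%s=%q", k, tags[k]))
	}
	return name + "{" + strings.Join(labels, ",") + "}"
}

func main() {
	// Two counters from the sqlserver_performance output share one name
	// and differ only in their counter label. Prints:
	// sqlserver_performance_value{counter="Full Scans/sec",object="MSSQL$MSSQL:Access Methods"}
	// sqlserver_performance_value{counter="Page Splits/sec",object="MSSQL$MSSQL:Access Methods"}
	for _, counter := range []string{"Full Scans/sec", "Page Splits/sec"} {
		tags := map[string]string{"counter": counter, "object": "MSSQL$MSSQL:Access Methods"}
		fmt.Println(seriesName("sqlserver_performance", "value", tags))
	}
}
```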

3 participants