Windows version #30
Comments
+1 |
Any comments at least!? |
+1 |
At the moment, no, but it could be considered for a future enhancement. What are the sorts of services you're looking for monitoring? |
I would say initially have a plugin that supports any Windows performance counter, including wildcards on the instance name, for example |
No Windows support is a deal breaker. Looking forward to seeing this implemented. UPDATE: What are Windows users using to ship (at least system) metrics to InfluxDB today? |
@Vye This has worked very well for me, as fully featured as you can get. https://github.com/MattHodge/Graphite-PowerShell-Functions TopBeat from Elastic is in beta and supports windows but it does not support all wmi calls. What it does that is very good is that it monitors and ships metrics for all running processes. So you get per process memory and cpu usage. Very cool. |
Hello, I saw the "Help Wanted" tag and maybe you would be interested in this : I saw at least 2 collectors that use gosigar, including Topbeat (from Elastic). You may also want to see how Mackerel works :
For wmi (including processes), you can use https://github.com/StackExchange/wmi. I hope it helps. Regards, |
@ymettier gosigar specifically doesn't support Windows. The other libraries look useful, thanks for the recommendations 👍 |
Hello, About gosigar, I'm confused... https://github.com/cloudfoundry/gosigar/blob/master/sigar_windows.go
OK, you are right. I'm sorry. But from topbeat :
So I'm sorry for the wrong URL. I would understand if you do not want to use gosigar (I would not use it either after this confusion). Regards, |
Hello again... Reading @discoduck2x's comment about typeperf and collectm again... Collectm uses http://markitondemand.github.io/node-perfmon/. Reading the code of this module is very interesting, and I would have recommended a similar implementation 1 or 2 months ago... Today I would not recommend this implementation because I noticed that the list of counters is limited. I have not investigated how many counters you can ask for, but the limit is probably due to the length of the command line. This is a bug in Collectm (and in node-perfmon). But I have no idea on how to do it without calling as many EDIT. https://github.com/lxn/win/blob/master/pdh.go is probably a good starting point for typeperf, perfmon & co. This is just about pdh.dll. Regards, |
downloaded https://s3.amazonaws.com/get.influxdb.org/telegraf/telegraf_0.1.9_amd64.msi |
Looks like datadog client uses WMI and Event Log https://github.com/DataDog/dd-agent/tree/master/checks.d |
Yes, we will need to do something with the Windows event log, this is going to be difficult though because I guess we'll need to have our own log wrapper that uses either stdout/stderr or the windows event log depending on the system. https://github.com/golang/sys/tree/master/windows/svc/example |
I tried https://s3.amazonaws.com/get.influxdb.org/telegraf/telegraf_0.1.9_amd64.msi on windows server and it works fine! |
@dbellantuono It's not officially supported yet so we aren't distributing it. Packaging up 0.1.9 was a bit of a one-off, there are also many plugins you will find don't work properly. |
Is it more official now? Where can I find a list of the MSI builds? |
@JulienChampseix It's not, sorry, I have many other core changes to telegraf to work on right now. I will be sure to update this case when I've made progress. |
I saw the question above about what people are using to get Windows metrics into InfluxDB. I am using the Sensu monitoring framework. There are a few checks (for Windows) and they are easy to extend. I am currently using this wmi_metrics check, but I have the network stuff commented out as it eventually caused WMI to hang. https://github.com/sensu/sensu-community-plugins/blob/master/extensions/checks/wmi_metrics.rb Since I am using other metrics on Linux that also write out to Graphite, I am using this great handler here. I then use a check that queries InfluxDB and is really flexible about what to search for and what the thresholds are. Then I'm using Graphite to show the metrics, which you can link or embed into Uchiwa (the Sensu dashboard). |
So, having read the comments in this issue, I still have no idea what the consensus is about designing such a feature for Telegraf. The only way that I can see to implement this purely in golang is WinRM. For your reference, here is a good comparison of RPC,WMI and WinRM: There also seems to be a bit of existing WinRM golang source out there: EDIT: Forgot to comment on the other wrapper idea about using the https://github.com/lxn/win/blob/master/pdh.go wrapper. This approach would indeed alleviate the concern that I think exists with the COM-wrapper approach. |
As a general Windows perf counter project to use as inspiration and help, I guess the C# PerfTap implementation that sends Graphite output is a very useful reference: https://github.com/Iristyle/PerfTap#other-historical-notes |
@cwegener - I got a very rough proof of concept (outside of telegraf) working with the github.com/lxn/win example you hinted at. I'm new to programming in general; I'm used to scripting and "if it works, don't optimize". I need to figure out some basic stuff with Go and programming in general regarding pointer/reference handling. I hope to get this done this week, hopefully useful on most if not all Performance Counters. |
@cwegener - Got a lot further today, the proof of concept is working standalone in the expected way, I hope to move this to telegraf on Friday or Saturday. Tomorrow is maintenance evening @ work. Sample of how I am specifying performance counters:
|
@TheFlyingCorpse - That sounds great! I'm a lousy programmer myself. 😉 But I'm more than happy to code review and of course to deploy any telegraf code to some spare machines I have running around. 😄 |
@cwegener - Get the latest from git if possible, the pull request was accepted. Performance Counters wooo ! |
@TheFlyingCorpse Excellent work. I will build a package and run it on a few systems. One note on this call: ret = win.PdhAddEnglishCounter(handle, query, 0, &counterHandle) (see https://msdn.microsoft.com/en-us/library/windows/desktop/aa372536(v=vs.85).aspx). As far as I know, PdhAddEnglishCounter is only available on Windows Vista and later. I am certainly not advocating that the plugin should support versions older than Vista! But I know from my professional experience that, still to this day, a lot of older versions of Windows are being used in many places. 😧 |
@discoduck2x I have now added the collection of "% Processor Time" for all instances of "Process" to my test setup. This now gives me ~150 counters. I will leave it running for a bit and see how the median and average % Processor time of telegraf.exe looks ... For now, I have not been able to make telegraf.exe go beyond 0.2% Processor utilization (I'm still at 10 second interval though ...) |
@discoduck2x I have dropped my telegraf test database and I am now running a 1 second interval for all telegraf counters. I am still collecting ~150 counters on the test machine. The "% Processor Time" mean value for 'telegraf.exe' still sits below 0.2 percent. |
Very promising, everyone. In my experience with Graphite PowerShell and Windows counters, some counters cost more to collect than others, so the ability to group counters by polling interval would be a very useful addition. One counter that I remember being heavy is disk utilization (free space). It would be interesting to see a test of one of those at a 1 second interval. This is also an example of a counter that is not that useful at a 1 second interval, but rather at intervals of 5 minutes or more. |
Happy to hear that @cwegener , my intention was to gather with a very low footprint. The plugin iterates over the configuration on startup to find the valid queries, then saves these with a handle to the search, so it can just query for the next value every interval, instead of creating a new query, waiting for the results (1s+ needed on the first sample), cleanup, wait for interval and do the process all over. |
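The "validate once at startup, then only read on each interval" approach described above might look roughly like this. It is a sketch under assumptions, not the plugin's actual code: counterSource and fakeSource are hypothetical stand-ins for the PDH API so the example runs on any platform.

```go
package main

import (
	"errors"
	"fmt"
)

// counterSource is a hypothetical stand-in for the PDH API: Open validates a
// counter path (the expensive step) and Read fetches the next value.
type counterSource interface {
	Open(path string) (handle int, err error)
	Read(handle int) (float64, error)
}

// fakeSource simulates a counter backend and counts how often Open is called.
type fakeSource struct {
	values map[string]float64
	paths  []string
	opens  int
}

func (f *fakeSource) Open(path string) (int, error) {
	f.opens++
	if _, ok := f.values[path]; !ok {
		return 0, errors.New("unknown counter: " + path)
	}
	f.paths = append(f.paths, path)
	return len(f.paths) - 1, nil
}

func (f *fakeSource) Read(handle int) (float64, error) {
	if handle < 0 || handle >= len(f.paths) {
		return 0, errors.New("bad handle")
	}
	return f.values[f.paths[handle]], nil
}

// collector keeps the handles of all counters that validated at startup.
type collector struct {
	src     counterSource
	handles map[string]int
}

// newCollector walks the configured paths once and keeps only the valid ones.
func newCollector(src counterSource, paths []string) *collector {
	c := &collector{src: src, handles: make(map[string]int)}
	for _, p := range paths {
		h, err := src.Open(p)
		if err != nil {
			continue // invalid counters are skipped, not retried
		}
		c.handles[p] = h
	}
	return c
}

// gather reads the next value from every saved handle without re-validating.
func (c *collector) gather() map[string]float64 {
	out := make(map[string]float64, len(c.handles))
	for p, h := range c.handles {
		if v, err := c.src.Read(h); err == nil {
			out[p] = v
		}
	}
	return out
}

func main() {
	src := &fakeSource{values: map[string]float64{
		`\Processor(_Total)\% Processor Time`: 12.5,
	}}
	c := newCollector(src, []string{
		`\Processor(_Total)\% Processor Time`,
		`\Bogus\Counter`,
	})
	for i := 0; i < 3; i++ {
		fmt.Println(c.gather())
	}
	fmt.Println("Open calls:", src.opens)
}
```

The point of the design is that the number of Open calls depends only on the configuration, not on how many intervals have passed.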
@TheFlyingCorpse @cwegener, how do I test this? Is there a compiled binary somewhere, or do I have to download the source from GitHub? One question: can you do wildcards on the counters? For example, "\process(*sql*)\% Processor Time" would give you all processes with sql in their process name. |
@discoduck2x - I can provide a compiled version if you'd like. It is not too hard to build your own, maybe 10-15 mins tops. Install git (with git in the path), and install make and go in the path as well. Then just set gopath=C:\temp\work, go get github.com/influxdata/telegraf, cd %gopath%\src\github.com\influxdata\telegraf, make windows. On the wildcard for counters, that is a good idea and I see the use case for it! |
@discoduck2x Regarding wildcards for instance names - I would probably prefer to simply collect all instances with the win_perf_counters plugin and do the filtering at a later stage due to the complexities involved when working with instance names. Though I clearly agree with your use case. SELECT * FROM win_process WHERE instance =~ /.*sql.*/ AND time > now() - 10s |
@cwegener & @TheFlyingCorpse, first, great discussing this with you! I don't think it makes sense to collect all instances; you will end up with too many, and the collection process will have too big a CPU footprint. Sure, there might be people wanting to collect CPU, memory, IO usage etc. for ALL running processes, and for all processes that will ever start after the telegraf process has started, but I don't think that makes much sense. If you are keen on collecting metrics about a few processes that matter to you, then surely you are not allowing the system to spawn unwanted or uncontrolled processes here and there just like that, right? My use case is within the finance industry and we need high performance and low latency. We even skip the server manufacturers' "own" processes, like the HP Insight management agents, because they themselves add load to / delay the system. At least make it controllable: if someone wants to collect ALL, then let that be possible, but don't collect all just because it seems easier from an implementation perspective, since it does not make sense. Also, the counters work totally fine with stars as wildcards when collected with typeperf (I don't know how to type a star here without it getting replaced :) ). I think collectm or collectw (can't remember which one) monitors its conf file for changes about every 30 seconds, and if something gets added there, like a new counter, then that counter will be collected going forward. The problem with collectm/w, though, was that every time it "rechecked or resched" due to a config file change, it would peg one CPU core for 1-2 seconds.....
:) Also, if using typeperf (sorry for going on about typeperf all the time, but it's really great at not making a footprint while collecting a lot of counters): if, let's say, you monitor a process by instance name with a wildcard, then if that process terminates and starts up again after X time, it will be collected again (probably as long as the process name is the same). I do agree with your comment on the SQL processes that seem to get a random number attached at startup (no biggie though, I think). By the way, if I'm going to try to build this myself, which Go version, 1.4 or 1.5? |
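To make the wildcard request concrete, here is a minimal sketch of matching instance names against a star pattern. It assumes the plugin can list the current instance names on each interval (so a restarted process is picked up again), and it uses Go's standard path.Match for the glob, which is an illustrative choice: it treats "/", "[" and "\" specially, so names containing those characters would need extra care. matchInstances is a hypothetical helper, not part of Telegraf.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// matchInstances returns the instance names that match the glob pattern,
// ignoring case. It is meant to be re-run on every interval against the
// instances that exist at that moment.
func matchInstances(pattern string, instances []string) ([]string, error) {
	var matched []string
	p := strings.ToLower(pattern)
	for _, inst := range instances {
		ok, err := path.Match(p, strings.ToLower(inst))
		if err != nil {
			return nil, err // malformed pattern
		}
		if ok {
			matched = append(matched, inst)
		}
	}
	return matched, nil
}

func main() {
	running := []string{"sqlservr", "sqlwriter", "chrome", "chrome#1", "telegraf"}
	m, err := matchInstances("*sql*", running)
	if err != nil {
		fmt.Println("bad pattern:", err)
		return
	}
	fmt.Println(m) // prints [sqlservr sqlwriter]
}
```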
@discoduck2x, I might have a solution for wildcards in the instance name. On pre-Vista compatibility, I might also have a solution. In the pull requests for Telegraf there is now also support for collecting a specific object every X iterations of the telegraf interval. On which Go version: I expect both 1.4 and 1.5 to work. I have only tested with 1.5 myself, as that is what came down via chocolatey |
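Collecting an object only every X iterations could be done with a simple tick counter, roughly as sketched below. The object type and its EveryN field are hypothetical names for illustration and are not taken from the pull request.

```go
package main

import "fmt"

// object is a hypothetical per-object setting: EveryN means "collect on every
// N-th interval"; 0 or 1 means every interval.
type object struct {
	Name   string
	EveryN int
}

// due reports whether the object should be collected on the given tick,
// where tick counts intervals starting from 0.
func due(o object, tick int) bool {
	if o.EveryN <= 1 {
		return true
	}
	return tick%o.EveryN == 0
}

func main() {
	objects := []object{
		{Name: "Processor", EveryN: 1},   // cheap, every interval
		{Name: "LogicalDisk", EveryN: 5}, // heavier, every 5th interval
	}
	for tick := 0; tick < 10; tick++ {
		for _, o := range objects {
			if due(o, tick) {
				fmt.Printf("tick %d: collect %s\n", tick, o.Name)
			}
		}
	}
}
```

With a 1 second agent interval, an EveryN of 300 would give the 5 minute polling suggested earlier for the disk counters.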
Pre-Vista compatibility "added" in my local copy for now, testing a bit more. It will use the pre-Vista method of adding perf counters, which will require users to use the localized counter names instead of the English ones that work across installed languages. |
Vista pushed to local, awaiting merge of my two other PRs before that is pushed. Depending on how many processes you want data for, for example for Chrome you can do Instances=["chrome","chrome#1","chrome#2","chrome#3","chrome#4"] |
@TheFlyingCorpse Then the API seems flawed, as the native way of collecting counters does support wildcards in instance names? Is there no way around this? |
@discoduck2x typeperf.exe is a cli tool that talks to the api. Implementation details of typeperf.exe are not easily accessible without a source code license from Microsoft. But if you have a look at @TheFlyingCorpse's implementation you will see that you can change the code yourself very easily to perform the discovery/validation of the pdh query on every call to gather. My guess is that performing the discovery/validation of the pdh query every time gather gets called, probably isn't going to add a lot of processing overhead after all. |
On another note, I am working on the code to be able to run telegraf as a Windows service. I have a functional PoC running. Depending on how much time I can spend on telegraf development this week, I might have a pull request ready in the next week or so. |
@cwegener how are you implementing that? what is currently blocking it running as a service? I was planning on using something like https://nssm.cc/ to wrap the binary |
@cwegener, oh I see, sorry if I sounded a bit naive, I admit I did :) |
@sparrc I've used github.com/chai2010/winsvc which is a tiny helper to make using x/sys/windows/svc easier. The first PoC is in my fork on github.com. I only have 2 items left before it's fully usable:
|
@discoduck2x No worries. Your feedback is always welcome. I'm sure @TheFlyingCorpse is already working on some of the suggested features. 😄 |
@TheFlyingCorpse Regarding the Process Name filtering scenario for @discoduck2x - I'm wondering if a simple 'tagpass' filter in the win_perf_counters section using the 'instance' tag would do the trick ... ?! I might give that a go. It would mean that all instances of the Process object are collected via pdh, but win_perf_counters should then filter out metrics according to 'tagpass' .... |
@sparrc I've sorted out passing additional telegraf.exe parameters through at service installation time. However, it needs a PR for the dependency package ( chai2010/winsvc#1 ). Once that is done I will push another update to my feature/telegraf_win_svc branch. The next step in the Windows service feature would be to figure out how to move all that telegraf Windows service code into a separate go file, since I'm currently creating a full copy of telegraf.go, which is not very sustainable. |
Hi @cwegener I just tried compiling the feature/telegraf_win_svc branch on your fork. It seems as if the build is always taking telegraf.go instead of telegraf-win.go ( via
and executing the build again yielded following output:
Executing Also, a short brain dump with some things that (I think) would be nice to haves:
|
FYI there are now windows binaries available on the README: https://github.com/influxdata/telegraf#windows-binaries-experimental I would like to get Windows service support in there too, @cwegener do you think you could open a PR? |
@sparrc I've just started working on the windows service support again. I'll make it a priority for the coming days. |
@Oro: That branch of mine was just a start to get a few things going. Once I have a PR it will be much nicer. 😄 |
We run some computer labs at a university and would like to get stats on application usage. The labs run thin clients that connect to large Windows servers. Currently for Linux and Mac we are using a bit of a hack job to get this data: We need to do the same as above but for Windows. Example: 2 users have an RDP session to a Windows server. One user, Steve, is running Chrome with 5 tabs; Jim is running Firefox and Chrome. Everyone is running MATLAB. Example config:
example output:
also toying with having this on all staff workstations. Will be neat to see Microsoft word, (open|libre)office and others like minesweeper :) |
@cwegener thanks! If you want any help, I'll try my best (unfortunately, I have never written a line in Go) |
Closing this because the basic Windows executable is available. This is NOT to say that there won't be more windows issues & feature requests, but please open those as separate issues as they come up, so that we can track them with more granularity (see #860, for example) |
Hi,
Is a Windows version planned?
Thanks for your feedback.