Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Provide same data structure as X-Pack for ES node_stats #6807

Merged
merged 2 commits into from
Apr 26, 2018

Conversation

ruflin
Copy link
Member

@ruflin ruflin commented Apr 9, 2018

This PR adds a flag xpack.enabled to the Elasticsearch Metricbeat module. If enabled the data is transformed to the format as X-Pack Monitoring needs for the UI. It overwrites the index with monitoring.

A config to enable the feature currently looks as following:

metricbeat.modules:
- module: elasticsearch
  metricsets: ["node_stats"]
  hosts: ["localhost:9200"]
  period: 1s
  xpack.enabled: true
  • Data is sent to the index .monitoring-es-6-mb-{date}. This is hard coded.
  • To enabled the feature xpack.enabled has to be set in the module
  • If a node is master is checked on each request as it can change over time. The master request is not atomic so this information could be inaccurate. The master detection also only works correctly, if data is fetched from only 1 node.
  • The cluser_uuid is fetched on the first request and cached from then based on the assumption that the cluster_id for a node id does not change over time.
  • limit_in_bytes and usage_in_bytes are provided as string by ES and not converted to int as they could overflow in ES.
  • Experimental message is logged when used. The feature is not documented yet on purpose.

As the data goes into a separate index it is expected that monitoring already has loaded the template.

@ruflin ruflin added in progress Pull request is currently in progress. module discuss Issue needs further discussion. Metricbeat Metricbeat labels Apr 9, 2018
Even better would be to have the cluster id as part of the response
*/

clusterId := "1234"

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

var clusterId should be clusterID

func GetClusterId(nodeId, clusterName string) string {

// Check if cluster id already cached. If yes, return it.
if clusterId, ok := clusterIdCache[nodeId]; ok {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

var clusterId should be clusterID

// Global clusterIdCache. Assumption is that the same node id never can belong to a different cluster id
var clusterIdCache = map[string]string{}

func GetClusterId(nodeId, clusterName string) string {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

exported function GetClusterId should have comment or be unexported
func GetClusterId should be GetClusterID
func parameter nodeId should be nodeID

package elasticsearch

// Global clusterIdCache. Assumption is that the same node id never can belong to a different cluster id
var clusterIdCache = map[string]string{}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

var clusterIdCache should be clusterIDCache

@@ -0,0 +1,220 @@
package node_stats

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don't use an underscore in package name

// master node will not be accurate anymore as often in these cases a proxy is in front
// of ES and it's not know if the request will be routed to the same node as before.
for nodeId, node := range nodesStruct.Nodes {
clusterId, err := elasticsearch.GetClusterId(m.http, m.HostData().SanitizedURI, nodeId)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

var clusterId should be clusterID

// it will provid the data for multiple nodes. This will mean the detection of the
// master node will not be accurate anymore as often in these cases a proxy is in front
// of ES and it's not know if the request will be routed to the same node as before.
for nodeId, node := range nodesStruct.Nodes {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

range var nodeId should be nodeID

json.Unmarshal(content, &nodesStruct)

// _local will only fetch one node info. First entry is node name
for k, _ := range nodesStruct.Nodes {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should omit 2nd value from range; this loop is equivalent to for k := range ...

func GetClusterId(http *helper.HTTP, uri string, nodeId string) (string, error) {

// Check if cluster id already cached. If yes, return it.
if clusterId, ok := clusterIdCache[nodeId]; ok {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

var clusterId should be clusterID

ClusterId string `json:"cluster_uuid"`
}

func GetClusterId(http *helper.HTTP, uri string, nodeId string) (string, error) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

exported function GetClusterId should have comment or be unexported
func GetClusterId should be GetClusterID
func parameter nodeId should be nodeID


type infoStruct struct {
ClusterName string `json:"cluster_name"`
ClusterId string `json:"cluster_uuid"`

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

struct field ClusterId should be ClusterID

)

// Global clusterIdCache. Assumption is that the same node id never can belong to a different cluster id
var clusterIdCache = map[string]string{}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

var clusterIdCache should be clusterIDCache

Copy link
Member

@pickypg pickypg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like it's in good shape. Left some thoughts.


// Parses the uri to replace the path
u, _ := url.Parse(uri)
u.Path = "_cluster/state/master_node"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can probably call GET /_cluster/state/master_node?local=true to avoid having to bounce to the elected master to find out what this node thinks. Since we have to asynchronously fetch this, it's a race condition regardless. If it's wrong, it'll fix itself on the next pass.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, wasn't aware of this param. That means we also get an answer in case the node is disconnected I assume.


// Parses the uri to replace the path
u, _ := url.Parse(uri)
u.Path = ""
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a shame that we have to call this to get the cluster_uuid after invoking the GET /_cluster/state/master_node. Perhaps we can get it added over there so we can do a single request to fetch both details.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would be great. Can you take this up with the ES team as you probably know best where this change would have to happen?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.


func getNodeName(http *helper.HTTP, uri string) (string, error) {

// Parses the uri to replace the path
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lines 64-73 are the same for the next 2 methods except for u.Path. Perhaps we can have a helper method that avoids repeating it all?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

// from the path and Metricbeat is not installed on the same machine as the node
// it will provid the data for multiple nodes. This will mean the detection of the
// master node will not be accurate anymore as often in these cases a proxy is in front
// of ES and it's not know if the request will be routed to the same node as before.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In order to support proxies, we could allow users to specify the node name that they want Beats to monitor and, if unspecified, use _local as the name.

Pitfalls:

  • Misconfiguration means it's broken.
  • Duplicate node names can cause confusion.
  • Requires the module's configuration to be reloaded if the node is renamed.

Advantages:

  • We'll work, by default, for non-proxies.
  • We can work through proxies.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is pretty interesting. I would skip it for the first iteration but I'm pretty sure that is a request that will pop up.

It will probably be a bit tricker for dynamic environments where the user does not have much control around the naming of nodes.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In that case they can set its address, which hopefully they know.

nodeData, _ := schemaXpack.Apply(node)
nodeStats := common.MapStr{
// This value is currently hard coded as it's not available in node_stats endpoint
"mlockall": false,
Copy link
Member

@pickypg pickypg Apr 11, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can ignore this value , which was a holdover from when it was in _nodes/stats. We do not use it and it should be in an information document [that does not exist] -- not a stats one.

In case you're wondering, you could fetch it via GET /_nodes/_local/process.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed.

@ruflin ruflin changed the title [Experiment] Provide same data structure as X-Pack for ES node_stats Provide same data structure as X-Pack for ES node_stats Apr 16, 2018
func GetClusterID(http *helper.HTTP, uri string, nodeID string) (string, error) {

// Check if cluster id already cached. If yes, return it.
if clusterId, ok := clusterIDCache[nodeID]; ok {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

var clusterId should be clusterID

@ruflin
Copy link
Member Author

ruflin commented Apr 16, 2018

Review feedback addressed. We should leave this open until our last 6.x merge from master for 6.3 is done.

@@ -60,6 +61,12 @@ func (e *Event) BeatEvent(module, metricSet string, modifiers ...EventModifier)
e.MetricSetFields = nil
}

// Set index prefix
// TODO: does this break if user sets an index himself?
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@urso What will happen here with user defined index patterns? Does this get overwritten?


func getNodeName(http *helper.HTTP, uri string) (string, error) {

// Parses the uri to replace the path
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done


// Parses the uri to replace the path
u, _ := url.Parse(uri)
u.Path = "_cluster/state/master_node"
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, wasn't aware of this param. That means we also get an answer in case the node is disconnected I assume.


// Parses the uri to replace the path
u, _ := url.Parse(uri)
u.Path = ""
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would be great. Can you take this up with the ES team as you probably know best where this change would have to happen?

// from the path and Metricbeat is not installed on the same machine as the node
// it will provid the data for multiple nodes. This will mean the detection of the
// master node will not be accurate anymore as often in these cases a proxy is in front
// of ES and it's not know if the request will be routed to the same node as before.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is pretty interesting. I would skip it for the first iteration but I'm pretty sure that is a request that will pop up.

It will probably be a bit tricker for dynamic environments where the user does not have much control around the naming of nodes.

nodeData, _ := schemaXpack.Apply(node)
nodeStats := common.MapStr{
// This value is currently hard coded as it's not available in node_stats endpoint
"mlockall": false,
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed.

ruflin added a commit to ruflin/beats that referenced this pull request Apr 16, 2018
ruflin added a commit to ruflin/beats that referenced this pull request Apr 25, 2018
To allow more flexibility the Elasticsearch node_stats metricset is updated to use the ReporverV2 interface. This is a subset of elastic#6807.

* Test data set for ES 6.2.3 added
* Reporting of error if json decoding fails
* Update node_stats integration tests to use reporter interface
* Update generator to also add metricset info
* Update data.json for node_stats and index
* Set namespace as part of registration
* Add service.name to event
exekias pushed a commit that referenced this pull request Apr 25, 2018
To allow more flexibility the Elasticsearch node_stats metricset is updated to use the ReporverV2 interface. This is a subset of #6807.

* Test data set for ES 6.2.3 added
* Reporting of error if json decoding fails
* Update node_stats integration tests to use reporter interface
* Update generator to also add metricset info
* Update data.json for node_stats and index
* Set namespace as part of registration
* Add service.name to event
@ruflin ruflin added review and removed discuss Issue needs further discussion. labels Apr 25, 2018
@ruflin ruflin removed the in progress Pull request is currently in progress. label Apr 26, 2018
This PR adds a flag `xpack.enabled` to the Elasticsearch Metricbeat module. If enabled the data is transformed to the format as X-Pack Monitoring needs for the UI. It overwrites the index with `monitoring`.

A config to enable the feature currently looks as following:

```
metricbeat.modules:
- module: elasticsearch
  metricsets: ["node_stats"]
  hosts: ["localhost:9200"]
  period: 1s
  xpack.enabled: true
```

* Data is sent to the index `.monitoring-es-6-mb-{date}`. This is hard coded.
* To enabled the feature `xpack.enabled` has to be set in the module
* If a node is master is checked on each request as it can change over time. The master request is not atomic so this information could be inaccurate. The master detection also only works correctly, if data is fetched from only 1 node.
* The cluser_uuid is fetched on the first request and cached from then based on the assumption that the cluster_id for a node id does not change over time.
* `limit_in_bytes` and `usage_in_bytes` are provided as string by ES and not converted to int as they could overflow in ES.
* Experimental message is logged when used. The feature is not documented yet on purpose.

As the data goes into a separate index it is expected that monitoring already has loaded the template.
@ruflin
Copy link
Member Author

ruflin commented Apr 26, 2018

This PR is ready for an other review.

As soon as elastic/elasticsearch#30143 is merged we can do a follow up PR for further simplifying the collection.

@ruflin
Copy link
Member Author

ruflin commented Apr 26, 2018

One additional note on elastic/elasticsearch#30143 As this will be only for 6.4 forward we will still need the lookup for earlier versions and it means we have to detect which version we are monitoring.

"node_master": isMaster,
"node_id": nodeID,
}
nodeStats.DeepUpdate(nodeData)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could also use:

nodeData["node_master"] = isMaster
nodeData["node_id"] = nodeId

That would avoid DeepUpdate iterating over all fields

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, changed.

Copy link
Contributor

@exekias exekias left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, left a comment

@exekias exekias merged commit ed840fd into elastic:master Apr 26, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants