Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Basic Heartbeat Job validation #7587

Merged
merged 1 commit into from
Jul 19, 2018
Merged

Conversation

andrewvc
Copy link
Contributor

@andrewvc andrewvc commented Jul 12, 2018

Takes over for #7551 which was getting quite long.

Warning, I haven't gone over this line-by-line completely yet. It could probably use some more cleanup. Feedback on the skima DSL would be the most welcome sort of feedback at this point, as well as the general testing approach

This PR seeks to establish a pattern for testing heartbeat jobs. It currently tests the HTTP and TCP jobs.

To do this, it made sense to validate event maps with a sort of schema library. I couldn't find one that did exactly what I wanted here, so I wrote one called skima.

One of the nicest things about the heartbeat architecture is the dialer chain behavior. It should be the case that any validated protocol using TCP (e.g. HTTP, TCP, Redis, etc.) has the exact same tcp metadata.

To help make testing these properties easy skima lets users compose portions of a schema into a bigger one. In other words, you can say "An HTTP response should be a TCP response, with the standard monitor data added in, and also the special HTTP fields". Even having only written a handful of tests this has uncovered some inconsistencies there, where TCP jobs have a hostname, but HTTP ones do not. It's also uncovered that HTTP 5xx responses don't include a status code, while 2xx ones do.

Files worth looking at first:

  1. HTTP Test: (monitors/active/http/http_test.go)
  2. TCP Test: (monitors/active/tcp/tcp_test.go
  3. Skima Unit Tests: (skima/core_test.go)

Adendum, why I didn't choose other schema libs

Well, this started off as being a very simple set of functions, and here we are. Still, that's not a good reason to write something new. So, let's compare:

  1. go-playground/go-validator : awkward tag based syntax. A goal here was clarity, this isn't so easy to use.
  2. ozzo-validation. Seems mostly targeted at web apps / forms. Doesn't really have a validation def that 'looks' like the end representation, which is a key goal of 'skima'
  3. go-validator/validator once again, awkward tag based syntax.

@andrewvc andrewvc added in progress Pull request is currently in progress. needs tests Heartbeat labels Jul 12, 2018
@andrewvc andrewvc requested review from urso and ruflin July 12, 2018 22:09
})
}

func TcpChecks(port uint16) skima.SchemaValidator {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

exported function TcpChecks should have comment or be unexported
func TcpChecks should be TCPChecks

return uint16(p), nil
}

func MonitorChecks(id string, host string, ip string, scheme string, status string) skima.SchemaValidator {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

exported function MonitorChecks should have comment or be unexported

io.WriteString(w, BadGatewayBody)
})

func ServerPort(server *httptest.Server) (uint16, error) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

exported function ServerPort should have comment or be unexported


var BadGatewayBody = "Bad Gateway"

var BadGatewayHandler = http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

exported var BadGatewayHandler should have comment or be unexported

io.WriteString(w, HelloWorldBody)
})

var BadGatewayBody = "Bad Gateway"

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

exported var BadGatewayBody should have comment or be unexported


var HelloWorldBody = "hello, world!"

var HelloWorldHandler = http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

exported var HelloWorldHandler should have comment or be unexported

"github.com/elastic/beats/heartbeat/skima"
)

var HelloWorldBody = "hello, world!"

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

exported var HelloWorldBody should have comment or be unexported

if id.checkKeyMissing {
if !keyExists {
return ValidValueResult
} else {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if block ends with a return statement, so drop this else and outdent its block

// It's a little weird, but is fairly efficient. We could stop using the flattened map as a datastructure, but
// that would add complexity elsewhere. Probably a good refactor at some point, but not worth it now.
validatedPaths := sort.StringSlice{}
for k, _ := range validatedResults {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should omit 2nd value from range; this loop is equivalent to for k := range ...

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

btw. gofmt -s (-s == simplify) will correct this.

Copy link
Member

@ruflin ruflin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the addition of the skima package as it seems to make testing easier with the composed event. One thing I worry as there is quite some complexity in the testing package that in some cases it will mean debugging the test package to figure out where the error is but we also have a good test coverage here so this should be ok.

There is a schema package in Metricbeat which does very similar things and got recently improved by @jsoriano for the error handling. Long term we should sync up on how the two packages overlap.

},
"http": skima.Map{
"url": urlStr,
// TODO: This should work in the future "response.status_code": statusCode,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this not working today? Perhaps the TODO needs to be more specific?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Glad to improve the todo. I haven't looked into the code, but after speaking with @urso this behavior is not desired.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug in HB. the http monitor returns to early on error, such that the status code is not reported.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it, would be good to have this note in the code somehow to make it clear.

Can be in a follow up PR.

if err != nil {
return start, end, nil, reason.IOFailed(err)
}
defer resp.Body.Close()

err = validator(resp)
end = time.Now()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we still need a note form @urso on why we update the end time here.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The different network layers record round trip times. For http having a response doesn't mean the request if finished yet. It means we have seen and parsed the headers. All additional contents will be streamed. If the body is bigger + if the validator consumes all contents, we at least have some idea about the total time taken (the validator generates additional IO). This can definitely be improved, but as we create IO when consuming the body, we need to report durations.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for the details.

scheme string
host string
expectedHost string
expectedPort uint16
expectedError error
}{
{
"plain",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice addition to make debugging easier.


import (
"strings"

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

goimports messing with you :-)

// specific language governing permissions and limitations
// under the License.

package skima
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It reminds me quite a bit of the schema package we use in Metricbeat to validate / convert events: https://github.com/elastic/beats/tree/master/libbeat/common/schema

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Me too 🙂 but I think that doing what skima does with schema would be quite tricky. The main objective of schema is to convert maps into events, and collaterally it can detect when expected keys are missing or its values have unexpected types or formats. On the other hand skima validates structures, more like an assert.Equals() with superpowers 🙂

My main concern now is that we are going to have two different things with similar names, maybe an option to merge both libraries would be to make a schema.Validator that reuses somehow the schema conversion functions.

}
}

// Strict is used when you want any unspecified keys that are encountered to be considered errors.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jsoriano Reminds me a lot of your recent changes to schema.Apply.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably Strict is a better name than AllRequired 🙂

package skima

import (
"testing"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

goimports :-)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahhhh, I need to fix intellij to do the right thing. Sorry!


// Valid returns true if there are no errors.
func (r SchemaValidationResults) Valid() bool {
// TODO: this is a pretty slow way to do this.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As this is only for testing should not matter?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep! I agree. N is pretty small.

One thing @urso did mention is we could eventually use this skima lib as the backend to a user-definable validator in YAML. I think it's important to call out slow functions so that if people use them in the future they're aware they may need optimization.

// specific language governing permissions and limitations
// under the License.

package testutil
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In other places we call this package testing

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

++ will change

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

both testing and testutil are somewhat generic. We even have a testing package that is not used for testing but for config validation... I like what the go stdlib does. E.g. httptest or iotest. Depending how generic this package is hbtest might be fine :)

@@ -1361,6 +1361,12 @@
"version": "v1.2.0",
"versionExact": "v1.2.0"
},
{
"checksumSHA1": "wnEANt4k5X/KGwoFyfSSnpxULm4=",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see this package gets added but I couldn't find where it's used?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I did delete the test that used it, but I'm going to use it again soon. The existing tests should use this to t.FailNow() when a test server returns err or the like. Apologies. So, a common idiom would be:

server, err := httptest.Server()
require.Nil(err)

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We use assert where requires should be used much too often. +1 for adding it. But maybe in an extra PR, so we can use requires in the future.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmm, I just pushed new commits that make heavier use of requires. Do we still want to break that up into a separate PR? (I'm fine with that, not a big deal)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm good with keeping it in this PR as it's now actually used.

// and turn it into a sorted string array, then do a binary prefix search to determine if a subkey was tested.
// It's a little weird, but is fairly efficient. We could stop using the flattened map as a datastructure, but
// that would add complexity elsewhere. Probably a good refactor at some point, but not worth it now.
validatedPaths := sort.StringSlice{}
Copy link

@urso urso Jul 13, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

while it works a more idiomatic use of sort package is to use it only for sorting, not for it's types:

e.g.

validatedPaths := make([]string, 0, len(validatedResults))
for k := range validatedResults {
  validatedPaths = append(validatedPaths, k)
}
sort.Strings(validatedPaths)

The sort.Strings is equivalent to sort.Strings(x) == sort.Sort(sort.StringSlice(x))

currentMap common.MapStr,
rootMap common.MapStr,
path []string,
dottedPath string) {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

when breaking like this common pattern is this:

func(
  param1,
  param2,
  ...
  paramN,
) <optional return> {
  code
}

This way you don't need the newline to visually distinguish parameters from function body, plus yeah can easily add/remove/move parameters by copy'n paste ;)

Can walk be simplified to pass less parameters? Or use a struct?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the heads up, glad to use a struct. I think that will make it much easier to read

type Map map[string]interface{}

// SchemaValidator is the result of Schema and is run against the map you'd like to test.
type SchemaValidator func(common.MapStr) SchemaValidationResults
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

try to reduce stutter. The package kinda name resembles schema. For users of a package, a symbol defined in a package always has the name combined by package name + symbol name. E.g. the actual type name here is skima.SchemaValidator. Without stuttering we get skima.Validator.

Same for the validation results. The SchemaValidationResults will be skima.SchemaValidationResults. Unless you have a more generic result type consider renaming this to skima.Results

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The confusing thing here is that there is a notion of Value validation vs. Schema validation in this package.

For values we have IsDef, Checker, and ValueResult
For map/schema validation we have SchemaValidator and SchemaValidationResults.

I'm thinking these names are a bit inconsistent, and I agree too long. WDYT about renaming them as indicated below?

IsDef->IsDef
Checker->ValValidator
ValueResult->ValResult
SchemaValidator->Validator
SchemaValidationResults->Results

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SGTM. ValueResult is fine as well.

// Strict is used when you want any unspecified keys that are encountered to be considered errors.
func Strict(laxValidator SchemaValidator) SchemaValidator {
return func(actual common.MapStr) SchemaValidationResults {
validatedResults := laxValidator(actual)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

everything is validatedX in here. Try to keep short names. E.g. code is understandable as well, by using results, paths.

@andrewvc
Copy link
Contributor Author

@ruflin thanks for the review.

Would love to get your input on this @jsoriano . I think the two packages have different aims. Am I correct in saying that schema seems to be more oriented around transformations? This package is solely focused on validation. There may be an opportunity here to unify stuff, we should sync on that.

path []string,
dottedPath string) {

_, validatedExactly := validatedResults[dottedPath]
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as results is it's own type, these two lines are calling for a method so you can do: if results.Has(dottedPath)...

import (
"strings"

"testing"
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do never import "testing" in packages that are not used for testing only. If skima is supposed to be used for testing only, then this should be reflected in the package path. Importaing "testing" has a many side-effects. E.g. global CLI flags we do not want (and that will clash with beats CLI flags) will be registered globally.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good to know! OK, I'll move the Test function out of skima into a new skimat support package. (does that naming work well? I'm trying to adjust to short var names in goland)

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SGTM. skimatest will also do. One can still alias or wrap the function locally if required.

assert.True(t, vr.valid, msg)
})
return r
}
Copy link

@urso urso Jul 13, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about changing this to DetailedErrors which returns an error wrapping multiple errors.

type multiErr []error

type (m *multiErr) Error() string {
   ...
}

func Details(r Results) error {
  var errs multiErr
  ...

  if len(errs) == 0 {
    return nil
  }
  return errs
}

Then one can use assert.NoError(skima.Details(results)) or requires.NoError(skima.Details(results)) in actual test code

We can also provide a runner returning a multi-error.

Btw. we vendor a multierr package for these use-cases.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the idea, I'd still like to keep around the map[string][]ValueResult structure for future use, but for testing it is overkill.

rootMap common.MapStr,
path []string,
dottedPath string,
)
Copy link

@urso urso Jul 13, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

path and dotted path are the same. How about:

type path struct {
  dotted string
  elems []string
}

Also the maps build some fields ctx like:

type fieldsCtx struct {
  map common.MapStr
  root common.MapStr
}

func (ctx *fieldsCtx) Get(...) ... {}
...

introducing these types + methods the walkObersver becomes

type walker func(ctx *fieldCtx, path path, key string, value interface{})

func walkFull(m common.MapStr, root common.MapStr, path []string, wo walkObserver) {
for k, v := range m {
splitK := strings.Split(k, ".")
var newPath []string
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor optimisation:

newPath := make([]string, len(path) + len(splitK)) // pre-alloc slice
copy(newPath, path)
copy(newPath[len(path):], splitK)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, yes, much nicer

} else if convertedM, ok := v.(Map); ok {
mapV = common.MapStr(convertedM)
vIsMap = true
}
Copy link

@urso urso Jul 13, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also add map[string]interface{} ?

e.g.

vIsMap := true
switch tmp := v.(type) {
case common.MapStr:
  mapV = tmp
case Map:
  mapV = common.MapStr(tmp)
case map[string]interface{}:
  mapV = common.MapStr(tmp)
default:
  vIsMap = false
}
if vIsMap {
  ...
}

// Walk nested maps
vIsMap := false
var mapV common.MapStr
// Note that we intentionally do not handle StrictMap
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've killed StrictMap for now, will remove this comment.

"github.com/elastic/beats/heartbeat/skima"
)

var HelloWorldBody = "hello, world!"
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you don't want anybody mess with globals, use const here ;)

return uint16(p), nil
}

func MonitorChecks(id string, host string, ip string, scheme string, status string) skima.SchemaValidator {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

alternative syntax:

func MonitorChecks(id, host, ip, scheme, status string) skima.Validator {
...
}

return skima.Schema(skima.Map{
"monitor": skima.Map{
// TODO: This is only optional because, for some reason, TCP returns
// this value, but HTTP does not. We should fix this
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if this was actually on purpose. Does http add an url field?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

HTTP does not, but TCP does, so we should probably be consistent here.

I kind of like having an URL representation of every resource TBH, but then it is one more field to be indexed....

WDYT?

@@ -218,41 +218,62 @@ func execPing(
body []byte,
timeout time.Duration,
validator func(*http.Response) error,
) (time.Time, time.Time, common.MapStr, reason.Reason) {
) (start time.Time, end time.Time, event common.MapStr, errReason reason.Reason) {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can also be written (start, end time.Time, event common.MapStr, err reason.Reason)


// TcpChecks creates a skima.Validator that represents the "tcp" field present
// in all heartbeat events that use a Tcp connection as part of their DialChain
func TcpChecks(port uint16) skima.Validator {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

func TcpChecks should be TCPChecks


// TcpChecks creates a skima.Validator that represents the "tcp" field present
// in all heartbeat events that use a Tcp connection as part of their DialChain
func TcpChecks(port uint16) skima.Validator {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

func TcpChecks should be TCPChecks

return fmt.Sprintf("@path '%s': %s", vre.path, vre.valueResult.Message)
}

func (r Results) Errors() []error {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

exported method Results.Errors should have comment or be unexported

return errors
}

type ValueResultError struct {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

exported type ValueResultError should have comment or be unexported

}
}

// Errors returns a new Results object consisting only of error data.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

comment on exported method Results.DetailedErrors should be of the form "DetailedErrors ..."

}
}

func (r Results) EachResult(f func(string, ValueResult) bool) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

exported method Results.EachResult should have comment or be unexported

if id.checkKeyMissing {
if !keyExists {
return ValidVR
} else {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if block ends with a return statement, so drop this else and outdent its block


// TcpChecks creates a skima.Validator that represents the "tcp" field present
// in all heartbeat events that use a Tcp connection as part of their DialChain
func TcpChecks(port uint16) skima.Validator {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

func TcpChecks should be TCPChecks

Copy link
Member

@jsoriano jsoriano left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is looking quite good, only some comments by now. I'll give another look and thought during the weekend 🙂

}
}

// Strict is used when you want any unspecified keys that are encountered to be considered errors.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably Strict is a better name than AllRequired 🙂

// specific language governing permissions and limitations
// under the License.

package skima
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Me too 🙂 but I think that doing what skima does with schema would be quite tricky. The main objective of schema is to convert maps into events, and collaterally it can detect when expected keys are missing or its values have unexpected types or formats. On the other hand skima validates structures, more like an assert.Equals() with superpowers 🙂

My main concern now is that we are going to have two different things with similar names, maybe an option to merge both libraries would be to make a schema.Validator that reuses somehow the schema conversion functions.

kontinue := true
for path, pathResults := range r {
for _, result := range pathResults {
kontinue = f(path, result)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if !f(path, result) {
    return
}

instead of kontinue?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lol, I forgot return was a thing. Good suggestion :-D

@andrewvc
Copy link
Contributor Author

@jsoriano I think some of the fixes I pushed blew-away your comment and the discussion about merging skima and schema since the lines in the code changed.

It seems like, as you said, schema is about map->event, and this is more about map validation. While there are some similarities (both walk maps for instance), it seems that the actual goals, as you mentioned are quite different. Merging them doesn't seem like a clear win to me at this point.

I do completely agree that if we do keep them separate the naming would be confusing. I'm glad to rename skima to something else. Maybe validatron or mapval or treecomp etc. etc.

@andrewvc
Copy link
Contributor Author

I think I've addressed all PR comments except the question of what to do with skima itself.

That aside, I do think this PR is pretty much done. The original intent here was to write some basic tests for http and TCP. I do think more tests are needed, however, I don't think this PR needs more going on inside of it. My preference would be to merge this sooner rather than later, and add those in a separate PR focused on testing the correctness of the various jobs.

Once we have tests for the current behavior, we'll also have a catalog of where consistency can be improved (things like status codes in http responses etc.) and can make some further follow-up PRs to rectify those issues.

Moving in phases for a refactor like this is important IMHO, since I don't want to change/break behavior until we have a way to reflect that in the tests.

@jsoriano
Copy link
Member

jsoriano commented Jul 15, 2018

I think that skima could find its place under libbeat, probably other beats can also make use of it, specially if we move more integration testing towards Go implementations.
Regarding the name, to avoid confusion with schema, I'd highlight its use as map validator (or event validator), so mapval or something like this looks great to me, maybe it could be placed in libbeat/testing/mapval, or if it can find uses out of testing in libbeat/common/mapval. Once this is solved I agree with merging this so development can continue.

@ruflin
Copy link
Member

ruflin commented Jul 16, 2018

As these 2 packages are currently in 2 different Beats I don't think the naming discussion should hold back the merging of this PR. @andrewvc Could you open a follow up Github issue for the discussion?

@ruflin
Copy link
Member

ruflin commented Jul 16, 2018

@andrewvc Can you look into the failing http tests? Also makes sure to rebase on master, this should resolve the failing tests in the generator.

@jsoriano
Copy link
Member

@ruflin schema is in libbeat and it is currently used both in libbeat and Metricbeat, and I guess it could be also used in heartbeat (or other beats too) if needed at some point. The same for "skima", it has value to be used in other beats tests, so apart of the naming, wdyt about moving it to libbeat?

(In any case I'd prefer a name that highlights its use for validation 🙂 like mapval or something like this)

@ruflin
Copy link
Member

ruflin commented Jul 17, 2018

@jsoriano Sorry, you are right. I forgot that we move schema to libbeat :-( I'm all on board with changing the name but I wonder if it must happen in this PR. The PR already got much larger then initially expected so I would rather get it in and have a follow up PR to changes the names and potentially moves the package around.

if errReason != nil {
if resp != nil {
return start, end, makeEvent(end.Sub(start), resp), errReason
} else {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if block ends with a return statement, so drop this else and outdent its block

@andrewvc
Copy link
Contributor Author

OK, renamed skima->mapval. Also, moved mapval to libbeat.

I also fixed the source of the test failure, which was due to a restructuring I made to the HTTP task. This actually fixes what I thought was a pre-existing bug, but was actually something I introduced, which was HTTP error responses not including their status code in the event. The good news is that adding these tests will help clarify such things in the future.

@andrewvc andrewvc removed the in progress Pull request is currently in progress. label Jul 18, 2018
@andrewvc andrewvc changed the title [WIP] Basic Heartbeat Job validation Basic Heartbeat Job validation Jul 18, 2018
@andrewvc
Copy link
Contributor Author

@ruflin this is good for a squash + merge FYI. I'm leaving it unsquashed for now in case you want to look over the last few commits.

Copy link
Member

@ruflin ruflin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Waiting for @andrewvc to squash and the for the green light from CI.

This commit seeks to establish a pattern for testing heartbeat jobs. It currently tests the HTTP and TCP jobs. It also required some minor refactors of those tasks for HTTP/TCP.

To do this, it made sense to validate event maps with a sort of schema library. I couldn't find one that did exactly what I wanted here, so I wrote one called mapval. That turned out to be a large undertaking, and is now the majority of this commit. Further tests need to be written, but this commit is large enough as is.

One of the nicest things about the heartbeat architecture is the dialer chain behavior. It should be the case that any validated protocol using TCP (e.g. HTTP, TCP, Redis, etc.) has the exact same tcp metadata.

To help make testing these properties easy mapval lets users compose portions of a schema into a bigger one. In other words, you can say "An HTTP response should be a TCP response, with the standard monitor data added in, and also the special HTTP fields". Even having only written a handful of tests this has uncovered some inconsistencies there, where TCP jobs have a hostname, but HTTP ones do not.
@ruflin ruflin merged commit efb1b2a into elastic:master Jul 19, 2018
@ruflin
Copy link
Member

ruflin commented Jul 19, 2018

Thanks for all the work on this. Solid foundation for more testing on Heartbeat.

tsg pushed a commit that referenced this pull request Jul 24, 2018
* Fix breaking change in monitoring data (#7563)

The prefix for the stats metrics was metrics but renamed to `stats` by accident as the name is now auto generated. This reverts this change.

Closes #7562

* Add http.request.mehod to Kibana log filset (#7607)

Take `http.request.method` from ECS and apply it to the Kibana fileset.

Additional logs are added to the example log files.

* Fix rename log message (#7614)

Instead of the from field the to field was logged.

* Add tests to verify template content (#7606)

We recently started to move fields.yml into the Golang binary to be used internally. To make sure the loading important and loading of all the data into the binary works as expected for Metricbeat, this adds some basic tests. Related to #7605.

* Basic support of ES GC metrics for jvm9 (#7628)

GC log format for JVM9 is more detailed than for JVM8.

Differences and possible improvements:
* To get cpu_times.* a corellation between log lines is required.
* Some GC metrics are available in jvm8 are not in jvm9
  (class_unload_time_sec, weak_refs_processing_time_sec, ...)
* heap.used_kb is empty, but it can be calculated as young_gen.used_kb +
  old_gen.size_kb
* GC phase times are logged in miliseconds vs seconds in jvm8

* Improve fields.yml generator of modules (#7533)

From now on when a user provides a type hint in an Ingest pipeline, it's added to the generated `fields.yml` instead of guessing.

Closes #7472

* Fix filebeat registry meta being nil vs empty (#7632)

Filebeat introduces a meta field to registry entries in 6.3.1. The meta field is used to distuingish different log streams in docker files. For other input types the meta field must be null. Unfortunately the input loader did initialize the meta field with an empty dictionary. This leads to failing matches of old and new registry entries. Due to the match failing, old entries will not be removed, and filebeat will handle all files as new files on startup (old logs are send again).

Users will observe duplicate entries in the reigstry file. One entry with "meta": null and one entry with "meta": {}. The entry with "meta": {} will be used by filebeat. The null-entry will not be used by filebeat, but is kept in the registry file, cause it has now active owner (yet).

Improvements provided by this PR:

* when matching states consider an empty map and a null-map to be equivalent
* update input loader to create a null map for old state -> registry entries will be compatible on upgrade
* Add checks in critical places replacing an empty map with a null-map
* Add support to fix registry entries on load. states from corrupted 6.3.1 files will be merged into one single state on load 
* introduce unit tests for loading different registry formats
* introduce system tests validating output and registry when upgrading filebeat from an older version

Closes: #7634

* Heartbeat Job Validation + addition of libbeat/mapval (#7587)

This commit seeks to establish a pattern for testing heartbeat jobs. It currently tests the HTTP and TCP jobs. It also required some minor refactors of those tasks for HTTP/TCP.

To do this, it made sense to validate event maps with a sort of schema library. I couldn't find one that did exactly what I wanted here, so I wrote one called mapval. That turned out to be a large undertaking, and is now the majority of this commit. Further tests need to be written, but this commit is large enough as is.

One of the nicest things about the heartbeat architecture is the dialer chain behavior. It should be the case that any validated protocol using TCP (e.g. HTTP, TCP, Redis, etc.) has the exact same tcp metadata.

To help make testing these properties easy mapval lets users compose portions of a schema into a bigger one. In other words, you can say "An HTTP response should be a TCP response, with the standard monitor data added in, and also the special HTTP fields". Even having only written a handful of tests this has uncovered some inconsistencies there, where TCP jobs have a hostname, but HTTP ones do not.

* Only fetch shard metrics from master node (#7635)

This PR makes it so that the `elasticsearch/shard` metricset only fetches information from the Elasticsearch node if that node is the master node.

* Create (X-Pack Monitoring) stats metricset for Kibana module (#7525)

This PR takes the `stats` metricset of the `kibana` Metricbeat module and makes it ship documents to `.monitoring-kibana-6-mb-%{YYYY.MM.DD}` indices, while preserving the current format/mapping expected by docs in these indices. This will ensure that current consumers of the data in these indices, viz. the X-Pack Monitoring UI and the Telemetry shipping module in Kibana, will continue to work as-is.

* Add kubernetes specs for auditbeat file integrity monitoring (#7642)

* Release the rename processor as GA

* Fix log message for Kibana beta state (#7631)

From copy paste Kafka was in the log message instead of Kibana.

* Clean up experimental and beta messages (#7659)

Sometimes the old logging mechanism was used. If all use the new one it is easier to find all the entries. In addition some messages were inconsistent.

* Release raid and socket metricset from system module as GA (#7658)

* Release raid and socket metricset from system module as GA

* remove raid metricset title

* Update geoip config docs (#7640)

* Document  breaking change in monitoring shcema

Situation:

* Edit breaking changes statement about monitoring schema changes (#7666)

* Marking Elasticsearch module and its metricsets as beta (#7662)

This PR marks the `elasticsearch` module and all its 8 existing metricsets all as `beta`. Previously only 
2 metricsets were marked as `beta` with the remaining 6 marked as `experimental`.

* Increase kafka version in tests to 1.1.1 (#7655)

* Add missing mongodb status fields (#7613)

Add `locks`, `global_locks`, `oplatencies` and `process` fields to `status` metricset of MongoDB module.

* Remove outdated vendor information. (#7676)

* Fix Filebeat tests with new region_iso_code field (#7678)

In elastic/elasticsearch#31669 the field `region_iso_code` was added to the geoip processor. Because of this test broke with the most recent release of Elasticsearch as the events contain an undocumented field.

* Fix duplicated module headers (#7650)

* Fix duplicated module headers

Closes #7643

* fix metricset titles for munin and kvm

* fix imssing kubernetes apiserver metricset doc

* remove headers from modules / metricset generator and clean up traefik title

* Release munin and traefik module as beta. (#7660)

* Release munin and treafik module as beta.

* fixes to munin module

* Report k8s pct metrics from enrichment process (#7677)

Instead of doing it from the `state_container`. Problem with the
previous approach is that `state_container` metricset is not run in all
nodes, but from a single point. Making performance metrics not available
in all cases.

With this new approach, the enriching process will also collect
performance metrics, so they should be available everywhere where the
module is run.

* Fix misspell in Beats repo (#7679)

Running `make misspell`.

* Update sarama (kafka client) to 1.17 (#7665)

- Update Sarama to 1.17. The Sarama testsuite tests kafka versions between 0.11 and 1.1.0.
- Update compatible versions in output docs
- Add compression_level setting for gzip compression

* Update github.com/OneOfOne/xxhash to fix mips

* Update boltdb to use github.com/coreos/bbolt fork

Closes #6052

* Generate fields.yml using Mage (#7670)

Make will now delegate to mage for generating fields.yml. Make will check if the mage command exists and go install it if not. The FIELDS_FILE_PATH make variable is not longer used because the path(s) are specified in magefile.go.

This allows fields.yml to be generated on Windows using Mage. The CI scripts for Windows have been updated so that fields.yml is generated for all Beats during testing.

This also adds a make.bat in each directory where building occurs to give Windows
users a starting point.

Some fixes were made to the generators because:
- rsync was excluding important source files contained in a directory
  named "build"
- the generated project needed to be `git init` before running certain
  magefile targets that detect project's root dir and import path.

* Update go-ucfg to 0.6.1 (#7599)

Update fixes config unpacking if users overwrite settings from CLI, with
missing values. When using `-E key=` (e.g. in scripts defining potential
empty defaults via env variables like `-E key=${MYVALUE}`), an untyped
`nil`-values was inserted into the config. This untyped value will make
Unpack fail for most typed settings.

* Docs: Add deprecation check for dashboard loading. (#7675)

For APM Server the recommended way of loading dashboards and Kibana index pattern will be through the Kibana UI from 6.4 on. Since the docs are based on the libbeat docs we need to add a deprecation flag for dashboard and index pattern related documentation.

relates to elastic/apm-server#1142

* Update expected filebeat module files for geoip change
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants