libbeat: Refactor error handling in schema.Apply() #7335

jsoriano · 2018-06-14T14:33:12Z

schema.Apply() returned a schema.Errors object as error, with this
object is not easy to check if there was really an error, specially when
optional fields are involved. This leads to lots of cases where these
errors were being ignored.

This change introduces a series of changes to have better control on the
errors happening during the application of an schema.

The interface is kept as event, err := schema.Apply(data), but now the
error returned is always a plain error in case of one or more errors
happened, or nil if no error happened.

schema.ApplyTo() returns the list of errors, and can be used where
multiple schemas are applied for the same event.

Apply() and ApplyTo() can now receive a list of options, these
options are functions that can be used to filter the errors returned or
to access the unfiltered list of errors. This allows different
combinations of resulting errors in case of optional or required fields.
It also allows to add additional checks on all the errors happened, even
on the ones ignored for the final result. Three options are added by now:

AllRequired to fail if any non-optional field is missing, this
mimics the current behaviour, and is set by default if no other option
is set.
FailOnRequired to fail if any required field is missing. A new
Required option is also added to be able to declare required fields.
NotFoundKeys(cb func([]string)) is an option builder that receives
a function, this function is called with the list of missing keys, so
additional custom checks or tests can be done.

For lists of errors, the multierror library is used now, some custom
errors have been added to help on previous modifications and to improve
some error messages. Errors on structured data contain now the full
paths of the fields.

houndci-bot · 2018-06-14T14:33:41Z

libbeat/common/schema/schema.go

@@ -55,7 +51,7 @@ func (conv Conv) HasKey(key string) bool {
 // implements Mapper interface for structure
 type Object map[string]Mapper

-func (o Object) Map(key string, event common.MapStr, data map[string]interface{}) *Errors {
+func (o Object) Map(key string, event common.MapStr, data map[string]interface{}) multierror.Errors {


exported method Object.Map should have comment or be unexported

houndci-bot · 2018-06-14T14:33:41Z

libbeat/common/schema/error.go

-	key       string
-	message   string
-	errorType ErrorType
+type KeyNotFoundError struct {


exported type KeyNotFoundError should have comment or be unexported

ruflin · 2018-06-15T08:51:42Z

+100 on that change. Now that we have a result struct we can also add potentially more / other data in the future if we need it.

ruflin · 2018-06-15T08:54:36Z

Two other options I was thinking of on what we could return:

event, result, err := schema.Apply()
if err != nil {
   return err
}

It would make error checking more go like and often people would probably skip the result.

result, err := schema.Apply()
if err != nil {
   return err
}
result.Event ...

An other option is to have event as part of the result. When I read the code I asked myself why is event not part of the result as it's kind of the result.

I'm good with all 3 option as all 3 are much better then we have now. Thanks for coming up with this.

jsoriano · 2018-06-15T12:05:14Z

The main reason why I chose to keep the event as first returned value and the "result" as second was to keep it more similar to the current interface so it continue working the same as before for the majority of cases where the error is being ignored.

But it is true that in any case the signature is changing and maybe it is a good idea to force the review of uses of this function, specially where the error is being ignored. I like the result, err := schema.Apply() form, I'll give a try to this.

houndci-bot · 2018-06-22T09:53:14Z

libbeat/common/schema/error.go

 }

-func (err *Error) IsType(errorType ErrorType) bool {
-	return err.errorType == errorType
+func NewWrongFormatError(key string, msg string) *WrongFormatError {


exported function NewWrongFormatError should have comment or be unexported

houndci-bot · 2018-06-22T09:53:14Z

libbeat/common/schema/error.go

 }

-func (err *Error) SetType(errorType ErrorType) {
-	err.errorType = errorType
+type WrongFormatError struct {


exported type WrongFormatError should have comment or be unexported

houndci-bot · 2018-06-22T09:53:14Z

libbeat/common/schema/error.go

-	key       string
-	message   string
-	errorType ErrorType
+func NewKeyNotFoundError(key string) *KeyNotFoundError {


exported function NewKeyNotFoundError should have comment or be unexported

houndci-bot · 2018-06-22T09:53:15Z

libbeat/common/schema/error.go

+	k.key = v
+}
+
+type KeyNotFoundError struct {


exported type KeyNotFoundError should have comment or be unexported

houndci-bot · 2018-06-22T09:53:15Z

libbeat/common/schema/error.go

 )

-type ErrorType int
+type KeyError interface {


exported type KeyError should have comment or be unexported

jsoriano · 2018-06-22T10:04:49Z

libbeat/common/schema/mapstriface/mapstriface.go

+	for _, err := range errors {
+		if err, ok := err.(schema.KeyError); ok {
+			err.SetKey(convMap.Key + "." + err.Key())
+		}


This is to have more meaningful errors on nested objects, previously a missing key would report something like Key "segments" not found, now it reports key "total.segments" not found.

nice, that was not very nice in the past.

jsoriano · 2018-06-22T10:10:13Z

metricbeat/module/elasticsearch/index_summary/data.go

@@ -16,68 +16,67 @@ var (
 			"docs": c.Dict("docs", s.Schema{
 				"count":   c.Int("count"),
 				"deleted": c.Int("deleted"),
-			}),
+			}, c.DictOptional),


Added these optional modifiers because there are test files that don't contain these fields.

Instead of that, added option to Apply to fail only on required fields.

jsoriano · 2018-06-22T10:40:35Z

metricbeat/module/elasticsearch/pending_tasks/data_test.go

-	errors, ok := errs.(*s.Errors)
-	if ok {
-		assert.True(t, errors.HasRequiredErrors(), "mapping error: %s", errors)
-		assert.EqualError(t, errors, "Required fields are missing: ,source")


@ruflin this is the only place where I have seen that we'd make use of ApplyResult.NotFound, to check if source is in the list of missing fields, but to take advantage of this we'd have to complicate the eventsMapping function to return the result and not only the event and the errors. Also, I think that here it'd be enough with testing that it returns an error for the case of having a missing field, going further looks like if we were testing here the behaviour of schema.Apply, that is already being tested in the schema package.

Given that, I wonder again if it worths it to have the ApplyResult object. From the perspective of a consumer of the API, it is cleaner to just have event, err := schema.Apply(data), with an event and an error that can be used directly. I think it is also more valuable to be able to give directly the semantics of the use case, as in event.MetricSetFields, err = schema.Apply(index) or nodeData, _ := schemaXpack.Apply(node) instead of obtaining the result and then assigning result.Event to a meaningful variable. From our perspective, if we want to test more particular cases of missing fields, I think that a better place would be the schema package.

I'd go back to the approach in #7167, but with the additional improvements in error handling added here too, and with ApplyTo still returning a list of errors.

I can see that from a user perspective it is much more convenient if the event can be assigned directly. It's also the reason why I started to think about 3 arguments.

For testing for the missing fields which are optional I think this is something specific to metricsets and not necessarly internal testing of Apply. The reason it only exists in one place so far I think is more that it was too complicated to do it. When we start to add more versions and have very subtle differences between each version I think tests for missing fields become more common.

I assume moving forward we will make optional fields the default as we discussed in the past and not required. In this scenario most fields will be optional. So I our testing will potentially more look like: max 3 missing fields and we get an error if there are more.

I have changed the approach again, to event, err := schema.Apply(data, options...), options is optional, and can be used to select the mode of operation of Apply (so for example we can make all fields optional by default). Or to access unfiltered errors, so additional tests can be done on the missing fields for example. The test that was checking the missing fields has been refactored to this approach.

I have updated the description of the PR accordingly.

jsoriano · 2018-06-22T10:43:30Z

metricbeat/module/elasticsearch/pending_tasks/data_test.go

 }

 func TestInvalidJsonForBadFormatShouldThrowError(t *testing.T) {
+	t.Skip("the code is not really checking for uknown fields")
+


This test was passing because schema.Apply() returned the empty schema.Errors, it was not detecting the unknown field in the (valid) JSON file. This test can be probably removed if the missing field is optional, if is not optional, an explicit check should be added in eventsMapping to check that it exists.

Assuming this is not optional, explicit check added and skip removed.

jsoriano · 2018-06-22T11:04:51Z

metricbeat/module/kibana/stats/data.go

-			"total":       c.Int("total"),
-			"disconnects": c.Int("disconnects"),
+			"total":       c.Int("total", s.Optional),
+			"disconnects": c.Int("disconnects", s.Optional),


These fields don't exist in the test input:

Error: Received unexpected error: 2 errors: key `requests.total` not found; key `requests.disconnects` not found Test: TestStats Messages: _meta/test/stats.700.json

jsoriano · 2018-06-22T12:21:54Z

libbeat/common/schema/mapstriface/mapstriface_test.go

+		keyErr, ok := errs[0].(s.KeyError)
+		if assert.True(t, ok, c.Description) {
+			assert.Equal(t, c.Expected, keyErr.Key(), c.Description)
+		}


ApplyTo can be used as a more advanced method for special cases, it still returns the full list of errors, that can be further analyzed if needed.

jsoriano · 2018-06-22T18:11:15Z

jenkins, test this again, please

houndci-bot · 2018-06-28T15:27:37Z

libbeat/common/schema/options.go

+	"github.com/elastic/beats/libbeat/common"
+)
+
+var DefaultApplyOptions = []ApplyOption{AllRequired}


exported var DefaultApplyOptions should have comment or be unexported

houndci-bot · 2018-06-28T15:27:37Z

libbeat/common/schema/schema.go

@@ -130,6 +135,13 @@ func Optional(c Conv) Conv {
 	return c
 }

+// The required flag provokes an error in case the key


comment on exported function Required should be of the form "Required ..."

houndci-bot · 2018-06-28T15:27:37Z

libbeat/common/schema/mapstriface/mapstriface.go

@@ -294,6 +312,12 @@ func DictOptional(c ConvMap) ConvMap {
 	return c
 }

+// The optional flag provokes an error in case the key doesn't exist.


comment on exported function DictRequired should be of the form "DictRequired ..."

ruflin

I really like this approach and now that I see it I'm quite happy that we kept iterating on it.

This will give us a lot of flexibility on how we can test things and making the switch to optional by default really easy.

ruflin · 2018-06-29T13:58:22Z

libbeat/common/schema/mapstriface/mapstriface.go

+	for _, err := range errors {
+		if err, ok := err.(schema.KeyError); ok {
+			err.SetKey(convMap.Key + "." + err.Key())
+		}


nice, that was not very nice in the past.

ruflin · 2018-06-29T14:03:25Z

metricbeat/module/elasticsearch/index_summary/data.go

@@ -66,25 +66,24 @@ var (
 	}
 )

-func eventMapping(r mb.ReporterV2, info elasticsearch.Info, content []byte) []error {
+func eventMapping(r mb.ReporterV2, info elasticsearch.Info, content []byte) error {
 	var all struct {
 		Data map[string]interface{} `json:"_all"`
 	}

 	err := json.Unmarshal(content, &all)
 	if err != nil {
 		r.Error(err)


One nice thing that we can do now is that we can check for the error from eventMapping and report it in only 1 place. The current code with reporting and returning the error now is only returning the error as there is only 1 type of errors left.

ruflin · 2018-06-29T14:04:29Z

metricbeat/module/elasticsearch/pending_tasks/data.go

@@ -29,28 +31,34 @@ var (
 	schema = s.Schema{
 		"insert_order":     c.Int("insert_order"),
 		"priority":         c.Str("priority"),
-		"source":           c.Str("source"),
+		"source":           c.Str("source", s.Required),


How did you pick this field as required?

There is a test failing if this field is not present, but I'm thinking now that all fields here seem to be required, I'm going to revert this part.

`schema.Apply()` returned a `schema.Errors` object as error, with this object is not easy to check if there was really an error, specially when optional fields are involved. This leads to lots of cases where these errors were being ignored. This change introduces a series of changes to have better control on the errors happening during the application of an schema. The interface is kept as `event, err := schema.Apply(data)`, but now the error returned is always a plain error in case of one or more errors happened, or nil if no error happened. `schema.ApplyTo()` returns the list of errors, and can be used where multiple schemas are applied for the same event. `Apply()` and `ApplyTo()` can now receive a list of options, these options are functions that can be used to filter the errors returned or to access the unfiltered list of errors. This allows different combinations of resulting errors in case of optional or required fields. It also allows to add additional checks on all the errors happened, even on the ones ignored for the final result. Three options are added by now: * `AllRequired` to fail if any non-optional field is missing, this mimics the current behaviour, and is set by default if no other option is set. * `FailOnRequired` to fail if any required field is missing. A new `Required` option is also added to be able to declare required fields. * `NotFoundKeys(cb func([]string))` is an option builder that receives a function, this function is called with the list of missing keys, so additional custom checks or tests can be done. For lists of errors, the `multierror` library is used now, some custom errors have been added to help on previous modifications and to improve some error messages. Errors on structured data contain now the full paths of the fields.

jsoriano · 2018-07-02T10:23:16Z

@exekias @andrewkroh if you have time, could you also take a look to this PR? Thanks!

exekias

Great work here! I think error handling has improved a lot with this, thank you for taking it!

jsoriano added in progress Pull request is currently in progress. libbeat labels Jun 14, 2018

houndci-bot reviewed Jun 14, 2018

View reviewed changes

jsoriano mentioned this pull request Jun 14, 2018

libbeat: schema.Apply() returns plain error or nil #7167

Closed

jsoriano added the refactoring label Jun 15, 2018

houndci-bot reviewed Jun 22, 2018

View reviewed changes

jsoriano force-pushed the schema-result branch from 4f1256d to 015a75b Compare June 22, 2018 09:57

jsoriano commented Jun 22, 2018

View reviewed changes

jsoriano added the discuss Issue needs further discussion. label Jun 22, 2018

jsoriano commented Jun 22, 2018

View reviewed changes

jsoriano added the review label Jun 22, 2018

jsoriano commented Jun 22, 2018

View reviewed changes

jsoriano added the Metricbeat Metricbeat label Jun 22, 2018

houndci-bot reviewed Jun 28, 2018

View reviewed changes

jsoriano changed the title ~~libbeat: Return a result object on schema.Apply()~~ libbeat: Refactor error handling in schema.Apply() Jun 28, 2018

jsoriano removed the in progress Pull request is currently in progress. label Jun 28, 2018

ruflin reviewed Jun 29, 2018

View reviewed changes

jsoriano force-pushed the schema-result branch 4 times, most recently from 4f3231c to b63168f Compare June 29, 2018 14:43

jsoriano force-pushed the schema-result branch from b63168f to de2edad Compare June 29, 2018 14:45

jsoriano removed the discuss Issue needs further discussion. label Jun 29, 2018

jsoriano requested review from exekias and andrewkroh July 2, 2018 10:23

exekias approved these changes Jul 2, 2018

View reviewed changes

ruflin merged commit 14da408 into elastic:master Jul 3, 2018

jsoriano mentioned this pull request Jul 18, 2018

Add metrics metricset to MongoDB module #7611

Merged

jsoriano mentioned this pull request Jan 1, 2019

Performance Metricset on X-Pack MSSQL Metricbeat module #9826

Merged

10 tasks

jsoriano deleted the schema-result branch January 2, 2019 15:37

jsoriano mentioned this pull request May 3, 2019

[libbeat]MapStr/Conv toInteger should behave different when converting from nil #11983

Closed

fearful-symmetry mentioned this pull request May 30, 2019

[dev guide] Add docs on optional schema arguments #12371

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

libbeat: Refactor error handling in schema.Apply() #7335

libbeat: Refactor error handling in schema.Apply() #7335

jsoriano commented Jun 14, 2018 •

edited

Loading

houndci-bot Jun 14, 2018

houndci-bot Jun 14, 2018

ruflin commented Jun 15, 2018

ruflin commented Jun 15, 2018

jsoriano commented Jun 15, 2018 •

edited

Loading

houndci-bot Jun 22, 2018

houndci-bot Jun 22, 2018

houndci-bot Jun 22, 2018

houndci-bot Jun 22, 2018

houndci-bot Jun 22, 2018

jsoriano Jun 22, 2018 •

edited

Loading

ruflin Jun 29, 2018

jsoriano Jun 22, 2018

jsoriano Jun 28, 2018

jsoriano Jun 22, 2018 •

edited

Loading

ruflin Jun 28, 2018

jsoriano Jun 28, 2018

jsoriano Jun 22, 2018

jsoriano Jun 28, 2018

jsoriano Jun 22, 2018

jsoriano Jun 22, 2018

jsoriano commented Jun 22, 2018

houndci-bot Jun 28, 2018

houndci-bot Jun 28, 2018

houndci-bot Jun 28, 2018

ruflin left a comment

ruflin Jun 29, 2018

ruflin Jun 29, 2018

ruflin Jun 29, 2018

jsoriano Jun 29, 2018

jsoriano commented Jul 2, 2018

exekias left a comment

libbeat: Refactor error handling in schema.Apply() #7335

libbeat: Refactor error handling in schema.Apply() #7335

Conversation

jsoriano commented Jun 14, 2018 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

ruflin commented Jun 15, 2018

ruflin commented Jun 15, 2018

jsoriano commented Jun 15, 2018 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jsoriano Jun 22, 2018 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jsoriano Jun 22, 2018 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jsoriano commented Jun 22, 2018

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

ruflin left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jsoriano commented Jul 2, 2018

exekias left a comment

Choose a reason for hiding this comment

jsoriano commented Jun 14, 2018 •

edited

Loading

jsoriano commented Jun 15, 2018 •

edited

Loading

jsoriano Jun 22, 2018 •

edited

Loading

jsoriano Jun 22, 2018 •

edited

Loading