[Saved Objects] introduce per-type validation schemas. #104088

pgayvallet · 2021-07-01T12:20:51Z

It is currently trivial to create saved objects containing invalid or corrupted data. You simply have to create (or update an existing) object with data that doesn't match the expected shape. We're currently not doing anything to protect against that.

This is dangerous because potential developer errors can lead to data being corrupted. Even worse, as the SO apis are documented, even end users can easily create corrupted objects that would break the system.

Another consequence is that, even if the corrupted object is unused in any application (e.g a corrupted dashboard that the end user never uses), it can lead to a migration failure during an upgrade, as all documents of a given type will go though the migration functions that are expecting a valid shape and/or data.

The first obvious improvement we could achieve to introduce basic protection against such corrupted would be to add per-type validation schemas, that would be provided when registering the associated type.

core.savedObjects.registerType({
  name: 'myType',
  mappings: {
    properties: {
       someField: { type: 'text' }
    }
  },
  schema: schema.object({
    someField: schema.string(),
  })
})

This would allow us to validate the attributes provided during a SOR.create operation against the schema, to ensure the created object is valid (and throw an error if it's not).

I'm unsure how we could implement that with other APIs. e.g update is effectively a partial update, so I'm not sure we could find a way to only validate the updated properties.

We could eventually use a plain object as the root schema instead of a schema.object, and pick the properties that are updated

schema: {
    someField: schema.string(),
}

but then we would loose some of the advanced schema validation features (such as by-reference validation between fields, or global validation methods), so I'm not sure it's worth it, and maybe only validating create is good enough

The text was updated successfully, but these errors were encountered:

elasticmachine · 2021-07-01T12:20:52Z

Pinging @elastic/kibana-core (Team:Core)

elasticmachine · 2021-07-01T12:20:53Z

Pinging @elastic/apm-ui (Team:apm)

sorenlouv · 2021-07-07T14:46:37Z

Was team:apm added on purpose?

pgayvallet · 2021-07-07T22:28:01Z

@sqren sorry, no, removed it after creating the issue

mshustov · 2021-07-08T07:49:52Z

  mappings: {
    properties: {
       someField: { type: 'text' }
    }
  },
  schema: schema.object({
    someField: schema.string(),
  })

How hard it would be to build schema validation automatically based on provided properties?

This is dangerous because potential developer errors can lead to data being corrupted. Even worse, as the SO apis are documented, even end users can easily create corrupted objects that would break the system.

Also, we can start with validating well-known properties created by SavedObjects service itself.

{
  id: schema.string({ validate: () => validateIdFormat }),
  type: schema.string(), // or literal?
  version: schema.maybe(schema.string()),
  attributes: schema.any(), 
  ....

pgayvallet · 2021-07-08T08:25:04Z

How hard it would be to build schema validation automatically based on provided properties

Would probably be tedious for some specific types of fields such as nested and so. Would also not work with attributes that are not indexed (and therefore could not be present in the mappings provided during the type registration).

But mostly it would not be good enough. In some case, we would like the owners to do more than just surface validation. E.g for dashboard, we would be expecting to validate their panelsJSON and optionsJSON properties by parsing them and running validation against the parsed properties, not just ensure that a string was passed, which we wouldn't be able to do with just auto-generated schemas.

We could try to generate a schema for types that does not explicitly provide one though, it would be better than no validation at all, but even in that case, we would be forced to use unknowns: 'allow' to not break in case of potential unregistered/non-indexed fields

Also, we can start with validating well-known properties created by SavedObjects service itself

I don't think we need that to secure the SOR.create API, as the consumers are only allowed to provide the attributes used during the creation of the doc?

mshustov · 2021-07-08T12:33:03Z

I don't think we need that to secure the SOR.create API, as the consumers are only allowed to provide the attributes used during the creation of the doc?

yeah. it's rather for the update case.

We could try to generate a schema for types that does not explicitly provide one though, it would be better than no validation at all

yes, I didn't say we should use auto-generation for 100% of cases, but it might be a good default.

lukeelmers · 2021-10-26T19:47:55Z

Discussed today that for now we will focus on SOR.create here. We can treat update as a follow-up task if we think the tradeoffs discussed in this thread are worth it.

lukeelmers · 2021-11-03T22:37:55Z

Another consequence is that, even if the corrupted object is unused in any application (e.g a corrupted dashboard that the end user never uses), it can lead to a migration failure during an upgrade, as all documents of a given type will go though the migration functions that are expecting a valid shape and/or data.

As I've started looking into this, one trade-off we will need to decide on is whether to run the type validations before or after the object is migrated during SOR.create.

If we validate before, devs would need to worry about writing a schema that maintains BWC, as for example field names could be changed when the object is migrated. This would be a pain. It also means we'd need a map of schemas to versions like we use for migrations, which would significantly increase complexity and require adding this to the migration algorithm:

core.savedObjects.registerType({
  name: 'myType',
  mappings: {
    properties: {
       someField: { type: 'text' }
    }
  },
  migrations: {
    '1.0.0': fn1,
    '2.0.0': fn2,
  },
  schemas: {
    '1.0.0': schema.object({ someField: schema.string() }),
    '2.0.0': schema.object({ someMigratedField: schema.string() }),
  }
})

If we validate after, the schema can be written based on the current structure of the object, however the error messages become less useful. For example, an invalid object may cause a failed migration, but you'd get back an error about the migration, not a general validation error. That's a bummer because it means migration functions cannot trust that the provided attributes are valid.

Overall I think validating after still makes the most sense and is the least complex / risky route, and is the path I'm going down now, but wanted to make note of it here in case others feel strongly otherwise. The user would still be protected against data corruption in this scenario because the object won't be written to ES, it just means they wouldn't always be getting back a useful validation error message unless the invalid object made it through the migration, and migrations themselves won't really be able to benefit from this feature.

lukeelmers · 2021-11-04T15:11:53Z

Just discussed this as a team, and we decided that moving toward per-version schemas would make sense for us longer term, as (among other things) it could improve the migration authoring experience if a version-specific validation could be run before each migration function.

However, this would significantly blow up the scope of this effort, and we want to start small here.

So we agreed a good path forward would be:

Provide versioned schemas when registering SO types as shown above
For now, just proceed with validating a SO inside SOR.create, and do it after the migration is run against the object
We would only run the schema validation for the current Kibana version

In this way, we get the near-term benefit of reducing the likelihood of data corruption, while having a future-proofed API that would allow for version-by-version schema validation to take place during migrations in the future. The tradeoff is the same as what I mentioned above, basically that users might get a migration error back when creating a SO instead of a validation error, which isn't the ideal DX but is at least an improvement on the current situation.

rudolf · 2021-11-04T15:20:39Z

👍

Adding validations to migrations would also mean that upgrades fail when we encounter invalid documents. Whereas some migrations are written in a way that ignores invalid data to allow the migration to succeed even with corrupt data.

So we would want to have API validation in place for a long time, then perhaps warn on failed migration validation and perhaps only then enforce it.

mshustov · 2021-11-04T15:38:59Z

So we would want to have API validation in place for a long time, then perhaps warn on failed migration validation and perhaps only then enforce it.

Can we start with emitting a warning in the dev mode and collecting production telemetry to track the number of such malformed docs?

lukeelmers · 2022-01-05T14:54:55Z

#118969 has merged with support for create and bulkCreate, so I will go ahead and close this issue as it represents what we wanted to accomplish for a first iteration.

I've opened #122343 to track some of the potential follow-ups we discussed here.

pgayvallet added Team:Core Core services & architecture: plugins, logging, config, saved objects, http, ES client, i18n, etc Team:APM All issues that need APM UI Team support Feature:Saved Objects labels Jul 1, 2021

pgayvallet removed the Team:APM All issues that need APM UI Team support label Jul 1, 2021

pgayvallet mentioned this issue Jul 1, 2021

[SO Migration] meta list of the possible improvements by area #104083

Open

sorenlouv added the [zube]: Backlog label Jul 7, 2021

This was referenced Aug 18, 2021

[Security Solutions][Detection Engine] Meta - Migration of security_solution rules to Stack Management export/import/copy #109169

Closed

[Saved Objects] Import hook/interceptor for saved object management #109189

Closed

pgayvallet removed the [zube]: Backlog label Aug 24, 2021

kobelb mentioned this issue Oct 5, 2021

[TSVB] Breaks requests when series id or metric id is purely digits #113601

Closed

lukeelmers added impact:high Addressing this issue will have a high level of impact on the quality/strength of our product. loe:medium Medium Level of Effort labels Oct 29, 2021

exalate-issue-sync bot added loe:large Large Level of Effort and removed loe:medium Medium Level of Effort labels Nov 2, 2021

lukeelmers self-assigned this Nov 2, 2021

lukeelmers mentioned this issue Nov 18, 2021

Adds validations for Saved Object types when calling create or bulkCreate. #118969

Merged

exalate-issue-sync bot added loe:medium Medium Level of Effort and removed loe:large Large Level of Effort labels Nov 19, 2021

pgayvallet mentioned this issue Nov 19, 2021

[Saved Object] No validations when create/import saved_objects with special characters #119110

Closed

pgayvallet mentioned this issue Dec 9, 2021

Corrupt saved object passes import validation, breaks UI, and blocks upgrades #118621

Closed

lukeelmers mentioned this issue Jan 5, 2022

[saved objects] Type validation enhancements #122343

Closed

exalate-issue-sync bot closed this as completed Jan 5, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Saved Objects] introduce per-type validation schemas. #104088

[Saved Objects] introduce per-type validation schemas. #104088

pgayvallet commented Jul 1, 2021

elasticmachine commented Jul 1, 2021

elasticmachine commented Jul 1, 2021

sorenlouv commented Jul 7, 2021

pgayvallet commented Jul 7, 2021

mshustov commented Jul 8, 2021

pgayvallet commented Jul 8, 2021

mshustov commented Jul 8, 2021

lukeelmers commented Oct 26, 2021

lukeelmers commented Nov 3, 2021

lukeelmers commented Nov 4, 2021

rudolf commented Nov 4, 2021

mshustov commented Nov 4, 2021

lukeelmers commented Jan 5, 2022

[Saved Objects] introduce per-type validation schemas. #104088

[Saved Objects] introduce per-type validation schemas. #104088

Comments

pgayvallet commented Jul 1, 2021

elasticmachine commented Jul 1, 2021

elasticmachine commented Jul 1, 2021

sorenlouv commented Jul 7, 2021

pgayvallet commented Jul 7, 2021

mshustov commented Jul 8, 2021

pgayvallet commented Jul 8, 2021

mshustov commented Jul 8, 2021

lukeelmers commented Oct 26, 2021

lukeelmers commented Nov 3, 2021

lukeelmers commented Nov 4, 2021

rudolf commented Nov 4, 2021

mshustov commented Nov 4, 2021

lukeelmers commented Jan 5, 2022