[Saved Objects] introduce per-type validation schemas. #104088
Pinging @elastic/kibana-core (Team:Core)
Pinging @elastic/apm-ui (Team:apm)
Was the `Team:apm` label intentional?
@sqren sorry, no, removed it after creating the issue
```ts
mappings: {
  properties: {
    someField: { type: 'text' }
  }
},
schema: schema.object({
  someField: schema.string(),
})
```

How hard would it be to build such a schema automatically from the mappings?
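For what it's worth, a rough sketch of what deriving a schema from the mappings could look like; the `schemaFromMappings` helper below is hypothetical and only covers a few primitive field types:

```ts
import { schema, Type } from '@kbn/config-schema';

// Hypothetical helper: derive a validation schema from a (small) subset of the
// ES mapping field types. Real mappings have many more types (nested, object,
// date, flattened, ...) that would each need dedicated handling.
const fieldToSchema = (mapping: { type: string }): Type<any> => {
  switch (mapping.type) {
    case 'text':
    case 'keyword':
      return schema.string();
    case 'integer':
    case 'long':
    case 'double':
      return schema.number();
    case 'boolean':
      return schema.boolean();
    default:
      // Fall back to accepting anything for field types we don't know how to map.
      return schema.any();
  }
};

// Hypothetical generator: one schema property per mapped field.
const schemaFromMappings = (mappings: { properties: Record<string, { type: string }> }) =>
  schema.object(
    Object.fromEntries(
      Object.entries(mappings.properties).map(([field, mapping]) => [
        field,
        // Mapped fields are not necessarily required, hence schema.maybe().
        schema.maybe(fieldToSchema(mapping)),
      ])
    )
  );
```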
Also, we can start with validating well-known properties created by the SavedObjects service itself:

```ts
{
  id: schema.string({ validate: () => validateIdFormat }),
  type: schema.string(), // or literal?
  version: schema.maybe(schema.string()),
  attributes: schema.any(),
  // ...
}
```
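For illustration, `validateIdFormat` above would presumably be a custom validator returning an error message for malformed ids; with @kbn/config-schema it would typically be passed directly as `validate: validateIdFormat`. A hypothetical sketch:

```ts
// Hypothetical id-format validator: custom `validate` functions in
// @kbn/config-schema return an error string for invalid values, and
// nothing (undefined) when the value is acceptable.
const validateIdFormat = (id: string): string | undefined => {
  // Example rule only; the real constraints on saved object ids may differ.
  if (id.length === 0 || id.length > 512) {
    return `Invalid saved object id: "${id}"`;
  }
  return undefined;
};
```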
Would probably be tedious for some specific types of fields, such as nested and so on. It would also not work with attributes that are not indexed (and therefore not present in the mappings).

But mostly, it would not be good enough. In some cases, we would like the owners to do more than just surface validation.

We could try to generate a schema for types that do not explicitly provide one though; it would be better than no validation at all, but even in that case we would be forced to use a very permissive schema.
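As an illustration of what "more than just surface validation" could mean, a hand-written schema can enforce rules that cannot be derived from the mappings at all, e.g. cross-field constraints (the attribute names below are made up):

```ts
import { schema } from '@kbn/config-schema';

// Made-up attributes: a time range where `to` must not precede `from`.
// A schema generated from the mappings could only check that both are strings.
const myTypeAttributesSchema = schema.object(
  {
    title: schema.string({ minLength: 1 }),
    from: schema.string(),
    to: schema.string(),
  },
  {
    validate: ({ from, to }) =>
      new Date(to).getTime() < new Date(from).getTime()
        ? '[to] must not be earlier than [from]'
        : undefined,
  }
);
```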
I don't think we need that to secure the …
yeah. it's rather for the …
yes, I didn't say we should use auto-generation for 100% of cases, but it might be a good default.
Discussed today that for now we will focus on …
As I've started looking into this, one trade-off we will need to decide on is whether to run the type validations before or after the object is migrated during creation.

If we validate before, devs would need to worry about writing a schema for each version of their type that has a migration, e.g.:

```ts
core.savedObjects.registerType({
  name: 'myType',
  mappings: {
    properties: {
      someField: { type: 'text' }
    }
  },
  migrations: {
    '1.0.0': fn1,
    '2.0.0': fn2,
  },
  schemas: {
    '1.0.0': schema.object({ someField: schema.string() }),
    '2.0.0': schema.object({ someMigratedField: schema.string() }),
  }
})
```

If we validate after, the registered schema only ever needs to describe the latest version of the type.

Overall I think validating after still makes the most sense, is the least complex / risky route, and is the path I'm going down now, but I wanted to make note of it here in case others feel strongly otherwise. The user would still be protected against data corruption in this scenario because the object won't be written to ES; it just means they wouldn't always get a useful validation error message unless the invalid object made it through the migration, and migrations themselves won't really be able to benefit from this feature.
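For clarity, a rough sketch of the "validate after migration" flow described above; the helpers are passed in as parameters and are not the actual SavedObjectsRepository internals:

```ts
import type { Type } from '@kbn/config-schema';

// Simplified view of a "migrate first, then validate, then write" create path.
// Everything here is illustrative; only the ordering matters.
const createWithValidation = async <T>(
  attributes: unknown,
  deps: {
    migrateToLatest: (attrs: unknown) => T; // may itself throw a migration error
    latestSchema?: Type<T>; // schema registered for the latest version, if any
    writeToEs: (attrs: T) => Promise<void>;
  }
): Promise<T> => {
  const migrated = deps.migrateToLatest(attributes);
  // Throws a validation error before anything is written to Elasticsearch,
  // so invalid objects never corrupt the index.
  const validated = deps.latestSchema ? deps.latestSchema.validate(migrated) : migrated;
  await deps.writeToEs(validated);
  return validated;
};
```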
Just discussed this as a team, and we decided that moving toward per-version schemas would make sense for us longer term, as (among other things) it could improve the migration authoring experience if a version-specific validation could be run before each migration function. However, this would significantly blow up the scope of this effort, and we want to start small here. So we agreed a good path forward would be to design the registration API around per-version schemas now, while only validating against the latest version on write for the time being.
In this way, we get the near-term benefit of reducing the likelihood of data corruption, while having a future-proofed API that would allow for version-by-version schema validation to take place during migrations in the future. The tradeoff is the same as what I mentioned above, basically that users might get a migration error back when creating a SO instead of a validation error, which isn't the ideal DX but is at least an improvement on the current situation.
👍 Adding validations to migrations would also mean that upgrades fail when we encounter invalid documents, whereas some migrations are written in a way that ignores invalid data so the migration succeeds even with corrupt documents. So we would want to have API validation in place for a long time first, then perhaps warn on failed migration validation, and perhaps only then enforce it.
Can we start with emitting a warning in dev mode and collecting production telemetry to track the number of such malformed docs?
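A sketch of what a "warn first, enforce later" mode could look like; the `strict` flag and the telemetry counter are hypothetical, not an existing Kibana API:

```ts
import type { Type } from '@kbn/config-schema';
import type { Logger } from '@kbn/logging';

// Hypothetical wrapper: validate a document, but only log a warning and bump a
// counter (e.g. reported through usage telemetry) instead of failing, until the
// feature is ready to be enforced.
const validateOrWarn = <T>(
  doc: T,
  typeSchema: Type<T>,
  logger: Logger,
  incrementInvalidDocCounter: () => void,
  strict = false
): T => {
  try {
    return typeSchema.validate(doc);
  } catch (error) {
    if (strict) {
      throw error; // enforce once we're confident most existing documents are valid
    }
    const message = error instanceof Error ? error.message : String(error);
    logger.warn(`Document failed schema validation: ${message}`);
    incrementInvalidDocCounter();
    return doc;
  }
};
```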
It is currently trivial to create saved objects containing invalid or corrupted data. You simply have to create (or update an existing) object with data that doesn't match the expected shape. We're currently not doing anything to protect against that.
This is dangerous because potential developer errors can lead to data being corrupted. Even worse, since the SO APIs are documented, even end users can easily create corrupted objects that would break the system.
Another consequence is that, even if a corrupted object is unused by any application (e.g. a corrupted dashboard that the end user never opens), it can still lead to a migration failure during an upgrade, as all documents of a given type go through migration functions that expect a valid shape and/or data.
The first obvious improvement we could make to introduce basic protection against such corrupted objects would be to add per-type validation schemas, provided when registering the associated type.
This would allow us to validate the attributes provided during a `SOR.create` operation against the schema, to ensure the created object is valid (and throw an error if it's not).

I'm unsure how we could implement that for other APIs. E.g. `update` is effectively a partial update, so I'm not sure we could find a way to only validate the updated properties. We could eventually use a plain object as the root schema instead of a `schema.object` and pick the properties that are updated, but then we would lose some of the advanced schema validation features (such as by-reference validation between fields, or global validation methods), so I'm not sure it's worth it, and maybe only validating `create` is good enough.
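To make the partial-update problem concrete, a sketch of the "plain object as root schema" idea mentioned above: per-field schemas can validate just the updated keys, but anything spanning multiple fields is lost (the names are hypothetical):

```ts
import { schema } from '@kbn/config-schema';

// Hypothetical per-field schemas kept as a plain map, so an `update` call can
// validate only the attributes it actually touches.
const fieldSchemas = {
  title: schema.string({ minLength: 1 }),
  description: schema.maybe(schema.string()),
};

// Validate just the provided attributes of a partial update. Rules that span
// several fields (by-reference validation, a global validate()) cannot run here
// because the untouched fields are unknown at this point.
const validatePartialUpdate = (attrs: Record<string, unknown>) => {
  for (const [key, value] of Object.entries(attrs)) {
    const fieldSchema = fieldSchemas[key as keyof typeof fieldSchemas];
    if (fieldSchema) {
      fieldSchema.validate(value);
    }
  }
  return attrs;
};
```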