Filter out documents of unknown types during migrations #104690
Comments
Pinging @elastic/kibana-core (Team:Core) |
Would running the 7.14 upgrade be necessary? I thought we ran migrations on SO import? Perhaps the workaround could be simplified to:
This would be another area where #55404 could potentially help... if we returned some sort of warning on ignored SO types, we could prompt the user to export those for future reference before attempting the initial upgrade. Then step (1) in the workaround above wouldn't be necessary. |
These scenarios (and their workarounds) work as long as we don't have references between a disabled and an enabled type during the migration matching the disabled type's Please refer to #101351 (comment) for the detailed edge case, but in short:
consequence:
Meaning that the suggested workaround
Would not work, as we would also need to alter the references of objects already migrated in our Kibana index. This is the main problem here. Now, I would be more than happy to ignore this edge case, and just go with the proposed approach/workarounds, but we need to be aware that this may come back and bite us during the 7.15 and/or 8.0 migration, which are the currently scheduled I think we do need to check the list available here: #100489, check which types are using references to types from other plugins (e.g. the references that we could potentially break by disabling a specific plugin before/during the migration), and the risk that their owning plugin can/could be disabled in our customer base. |
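To make the reference check above concrete, here is a minimal sketch of how one could scan for objects of enabled types that point at objects of disabled (unknown) types. Everything here is an assumption for illustration: the function name, the `CrossReference` shape, and the type names in the usage note are hypothetical, and only the `{ name, type, id }` reference shape follows the saved object format.

```typescript
// Hypothetical sketch, not Kibana's implementation.
interface Reference {
  name: string;
  type: string;
  id: string;
}

interface SavedObjectDoc {
  id: string;
  type: string;
  references: Reference[];
}

interface CrossReference {
  fromType: string;
  fromId: string;
  toType: string;
  toId: string;
}

// Returns every reference that goes from a document of a known (enabled) type
// to a document of an unknown (disabled or removed) type.
function findReferencesToUnknownTypes(
  docs: SavedObjectDoc[],
  knownTypes: string[]
): CrossReference[] {
  const result: CrossReference[] = [];
  for (const doc of docs) {
    // Documents of unknown types are not migrated, so skip them as sources.
    if (knownTypes.indexOf(doc.type) === -1) continue;
    for (const ref of doc.references) {
      if (knownTypes.indexOf(ref.type) === -1) {
        result.push({ fromType: doc.type, fromId: doc.id, toType: ref.type, toId: ref.id });
      }
    }
  }
  return result;
}
```

A non-empty result would mean the copy/reimport workaround is not safe for those objects if the referenced IDs are rewritten later.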
Yes, I had forgotten about the references ID rewriting. With all of these moving parts, it seems that the most thorough way to continue supporting this use case would be to implement the suggested solution in #101351 (comment) related to splitting the
This way we can migrate the references as needed separately from the ID. This is going to add additional complexity to the migration algorithm which I would like to avoid. But if we need to keep supporting this use case, I don't see another good option available. Regardless of the long-term solution, this seems far out of scope for the 7.14 release. For the 7.14 release we have two options:
I strongly prefer option (1) for the long-term benefits, though it could make for a bumpy upgrade for any users who have disabled a plugin. (3) is also attractive but I worry about giving our users a foot-gun. Making a decision here is quite hard without much visibility into how and when our users are using the plugin disabling 'feature'. |
I've completed an audit going back to version 5.5 where Saved Objects were added. There is only one additional SO type that needs to be filtered out for option (1) and I'll be opening a PR for this. Here are the results of my audit: https://gist.github.com/joshdover/e3c449d165498da49e8e4a4fb6f2cef5 |
I summarized these options and tradeoffs in a somewhat more digestible way. I'm going to dig into our data tomorrow to see if we can determine an estimated % of clusters that are disabling plugins and see if the most common plugins would be affected by options 1 or 3.
Summary Table
Detailed table
|
Yea, the added complexity is anything but negligible, and I agree that I would gladly avoid this option if possible
I agree with your summary and your order of preference. To explain a little more, I do think With all the previous versions check you performed the last few days, I feel pretty confident that option Also, I'll add that we're very close to the BC deadline, and implementing |
My preference is to go with option 2 in the short term. I fully acknowledge that this is the kick-the-can-down-the-road option, and we're going to have to deal with this problem in the future as soon as we allow saved-object IDs to be regenerated. My thinking is that this will buy us time to figure out what upgrade scenarios we need to support and evaluate our options against those upgrade scenarios. This also allows us to brainstorm other, more complex solutions we can implement. IMO, this is more in line with the make-it-minor mindset where we are willing to invest more effort in solutions that provide our users a seamless upgrade experience. |
In some offline discussions, we decided to go with option (2) for the remainder of the 7.x series and add a few things to make the 8.0 upgrade smoother & improve long-term data integrity:
I will have a PR up to fix the 7.14/7.x scenario first, followed by PRs for adding Upgrade Assistant deprecations and the fail-fast logic for the master/8.0 branch. |
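As a rough illustration of what fail-fast logic could look like, the sketch below counts documents whose type is not registered and throws before anything is written. The function names and the error message are hypothetical, and a real implementation would presumably query the index rather than scan an in-memory array.

```typescript
// Hypothetical sketch, not the code from any of the PRs mentioned here.
interface RawDoc {
  id: string;
  type: string;
}

// Counts documents per type for every type that is not in knownTypes.
function findUnknownTypeCounts(docs: RawDoc[], knownTypes: string[]): Record<string, number> {
  const counts: Record<string, number> = {};
  for (const doc of docs) {
    if (knownTypes.indexOf(doc.type) === -1) {
      counts[doc.type] = (counts[doc.type] || 0) + 1;
    }
  }
  return counts;
}

// Throws if any unknown types exist, so the migration stops before modifying data
// and the previous index can still be used for a rollback.
function assertNoUnknownTypes(docs: RawDoc[], knownTypes: string[]): void {
  const counts = findUnknownTypeCounts(docs, knownTypes);
  const unknownTypes = Object.keys(counts);
  if (unknownTypes.length > 0) {
    const summary = unknownTypes.map((t) => `${t} (${counts[t]})`).join(", ");
    throw new Error(`Unknown saved object types found, aborting before any writes: ${summary}`);
  }
}
```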
After #103341 merged, we've noticed many upgrade failures due to documents still existing for types that have been removed from Kibana's codebase. We've had no process for tracking these removals, so there are some SO types that have likely had old stale documents in customers' .kibana index for some time.
Prior to #103341, these documents would have caused issues for v2 migrations, causing the migration to fail halfway through the upgrade and leaving the cluster in a state that could not easily be rolled back. While we now fail earlier to enable a smooth rollback, the number of documents with unknown types seems to be much larger than we anticipated. (In v1 migrations, these documents avoided failing the migration only if their type never had any migrations registered.)
We are attempting to handle this situation by finding all the types that have been removed and adding them to the filter that is used to ignore these outdated documents in #104507. The problem is that this audit may not cover 100% of cases because auditing older branches is not foolproof and in some cases the old branches no longer work at all.
An alternative option may be to simply only migrate known types and ignore all others. This option was not previously considered in the original RFC section that discussed this scenario. Given that the v2 migration architecture retains the previous index, I think this option is now viable since it will not delete the documents of unknown types.
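The difference between the audit-based filter and the "only migrate known types" alternative can be sketched as two Elasticsearch query bodies. This is a minimal illustration under assumptions: the helper names are hypothetical, and the actual query used by the migrations may be shaped differently. Only the `bool`, `filter`, `must_not`, and `terms` clauses are standard query DSL.

```typescript
// Hypothetical helpers for illustration only.
type TermsClause = { terms: { type: string[] } };

interface BoolQuery {
  bool: { filter?: TermsClause[]; must_not?: TermsClause[] };
}

// Deny-list: skip only the types that an audit found to be removed.
// Any removed type the audit missed is still picked up by the migration.
function excludeRemovedTypes(removedTypes: string[]): BoolQuery {
  return { bool: { must_not: [{ terms: { type: removedTypes } }] } };
}

// Allow-list: migrate only the types registered in the running Kibana.
// Documents of any other type are left behind in the previous index.
function onlyKnownTypes(knownTypes: string[]): BoolQuery {
  return { bool: { filter: [{ terms: { type: knownTypes } }] } };
}
```

The allow-list does not depend on the audit being complete, which is the appeal of the alternative; the cost is that documents are skipped silently unless something else reports them.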
The main issue with this option is that it does not provide an easy path to migrate these documents later should a plugin become enabled that previously used this SO type. For instance, imagine the following scenarios:
Scenario 1: custom plugin
custom-type SO type and stores some data
custom-type documents are filtered out and left in the old 7.13 index.
If the plugin did not create a migration in 7.14, the user could simply copy the documents from the 7.13 index to the 7.14 index and everything should work fine. However, if the custom plugin did register migrations, the workaround would involve setting up a separate 7.13 cluster with the custom-type documents in the 7.13 index, running an upgrade to 7.14, then exporting those objects and reimporting them into the original 7.14 cluster.
Scenario 2: disable plugin
xpack.security_solution.enabled: false
cases documents are filtered out and left in the old 7.13 index.
The workaround would involve setting up a separate 7.13 cluster with the cases (and any other related) documents in the 7.13 index, running an upgrade to 7.14, then exporting those objects and reimporting them into the original 7.14 cluster.
If we're OK with just not providing a good user experience here, then I can't come up with a reason not to do this. I don't think the custom plugin scenario is a case we particularly need to support, and a (painful) workaround does exist. Providing a good, bulletproof default experience is probably more important. The second scenario, disabling plugins, I'm less sure about. At least with the current behavior we very explicitly make them choose to delete these documents.
It's also worth noting that we're moving away from allowing customers to disable most first-party plugins which will help mitigate this issue, but that won't be available until 8.0. See #89584 for discussion.
cc @pgayvallet @kobelb