Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ML][Fleet] Update Transform installation mechanism to support upgrade paths #142920

Merged
merged 25 commits into from
Dec 21, 2022

Conversation

qn895
Copy link
Member

@qn895 qn895 commented Oct 6, 2022

Summary

Follow up of #140046. This PR:

  • Adds index aliases to the destination index for Transforms:

    • It will automatically appends {{package-version}} name to the destination_index_name specified in transform.yml
    • Create a {destination_index_name}.all that points to all the destination indices from all the previous versions and new version
    • Create a {destination_index_name}.latest that points to just the destination index of the new version
  • Upgrading package to a new version no longer deletes the destination index

  • Downgrading package to an older version (e.g. from v3 to v1) will:

    • Delete the transform from the newer version (v3), create transform for the older version (v1)
    • If the older version was previously installed and the destination index of the older version v1 already exists: update the alias {destination_index_name}.latest to point to destination index v1.
    • If the older version was never installed and destination index of the older version v1 does not exist: create the destination index with {destination_index_name}.all and {destination_index_name}.latest alias.
  • Support installing transforms concurrently and sequentially.

    • If the order is specified in the transform.yml's _meta section, and all the numerical order are unique, transforms will be created and started sequentially. If not, they will be created and started concurrently.
  • Support versioning of transforms. If fleet_transform_version is specified in transform.yml's _meta section:

    • If fleet_transform_version changed (either incremented or decremented): delete old transform, keep the old destination index, install new index templates, component templates, and transform
    • If fleet_transform_version remains the same: keep old transform, keep the old destination index, do nothing new
  • Fixes an issue with the mappings and template not being applied to the destination index correctly when the destination index has an ingest pipeline. Previously, when the transform is associated with an ingest pipeline, we add the ingest pipeline to the settings when calling PUT index/{transform-destination-index}. This in turns makes the settings and mappings from the component templates not apply correctly to the destination. This PR changes so that it will add the pipeline to the component template.

Technical changes:

Checklist

Delete any items that are not applicable to this PR.

Risk Matrix

Delete this section if it is not applicable to this PR.

Before closing this PR, invite QA, stakeholders, and other developers to identify risks that should be tested prior to the change/feature release.

When forming the risk matrix, consider some of the following examples and how they may potentially impact the change:

Risk Probability Severity Mitigation/Notes
Multiple Spaces—unexpected behavior in non-default Kibana Space. Low High Integration tests will verify that all features are still supported in non-default Kibana Space and when user switches between spaces.
Multiple nodes—Elasticsearch polling might have race conditions when multiple Kibana nodes are polling for the same tasks. High Low Tasks are idempotent, so executing them multiple times will not result in logical error, but will degrade performance. To test for this case we add plenty of unit tests around this logic and document manual testing procedure.
Code should gracefully handle cases when feature X or plugin Y are disabled. Medium High Unit tests will verify that any feature flag or plugin combination still results in our service operational.
See more potential risk examples

For maintainers

@qn895 qn895 changed the title [ML] Fix template being overridden [ML] Fix Fleet transforms installation's templates being overridden by index settings Oct 6, 2022
@qn895 qn895 self-assigned this Oct 6, 2022
@qn895 qn895 added the :ml label Oct 6, 2022
@qn895 qn895 changed the title [ML] Fix Fleet transforms installation's templates being overridden by index settings [ML][Fleet] Update Transform installation mechanism to support upgrade paths Nov 2, 2022
@qn895 qn895 force-pushed the ml-fleet-fix-templates-being-overriden branch from 1fe8c56 to a637796 Compare November 2, 2022 14:53
@qn895 qn895 force-pushed the ml-fleet-fix-templates-being-overriden branch from bf16496 to 8e2a6ff Compare November 9, 2022 02:37
Comment on lines 49 to 50
// @TODO: Remove ignore 400
await esClient.ingest.deletePipeline({ id }, { ignore: [400] });
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the way to remove the 400 is to set the index.default_pipeline setting on the old destination index to the empty string before deleting the pipeline. See elastic/elasticsearch#32286 (comment).

droberts195 added a commit to droberts195/elasticsearch that referenced this pull request Nov 10, 2022
These changes go with those of elastic/kibana#142920.

As we formalize the process by which the Fleet package
installer will upgrade transforms more operations are
required for managing the transforms and the related
destination index:

1. Need to be able to add an alias on the transform
   destination index and adjust which indices it points
   to when upgrading the transform.
2. Need to be able to remove a default ingest pipeline
   from the settings of an old transform destination
   index during an upgrade that deletes the ingest
   pipeline.
droberts195 added a commit to elastic/elasticsearch that referenced this pull request Nov 22, 2022
These changes go with those of elastic/kibana#142920.

As we formalize the process by which the Fleet package
installer will upgrade transforms more operations are
required for managing the transforms and the related
destination index:

1. Need to be able to add an alias on the transform
   destination index and adjust which indices it points
   to when upgrading the transform.
2. Need to be able to remove a default ingest pipeline
   from the settings of an old transform destination
   index during an upgrade that deletes the ingest
   pipeline.
@qn895 qn895 requested a review from a team as a code owner December 7, 2022 19:17
Copy link
Contributor

@TinaHeiligers TinaHeiligers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Core related changes LGTM

Copy link
Member

@kpollich kpollich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I stepped through all the code/tests (much appreciated on how many tests we've got for this process!) and everything looks good to me overall. I don't have a great way to test this on my end, though, so I'll defer to @kevinlog who I know was testing the Endpoint transforms previously.

I just had a few questions about whether we should be bubbling up granular errors for specific transform failures during installation - but I wouldn't consider those blocking questions here.

} catch (err) {
logger.debug(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be logged to error or is this a non-blocking issue?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have this same question about making errors visible to the user. It's generally a lot easier to debug with error messages than without!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated here e1a0050 (#142920)

logger.debug(`Deleted alias: '${alias}' matching indices ${indicesMatchingAlias}`);
}
} catch (err) {
logger.debug(`Error deleting alias: ${alias}`);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here re: log levels.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated here e1a0050 (#142920)

Copy link
Contributor

@droberts195 droberts195 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@kevinlog
Copy link
Contributor

kevinlog commented Dec 19, 2022

@qn895 thank you for preparing this PR! I checked it out and tried it. See some observations below.

kibana_system permissions need to update

As discussed offline, since destination indices have the version appended to the end, we need to adjust the permissions for the kibana_system user in ES.

Initially, I saw the the following when trying to install the new transform schema with the Endpoint package.

Error: Error installing endpoint 8.7.1: security_exception: [security_exception] Reason: action [indices:admin/create] is unauthorized for user [kibana_system] with effective roles [kibana_system] on indices [.metrics-endpoint.metadata_united_default-8.7.1], this action is granted by the index privileges [create_index,manage,all]

I made the below change locally updating kibana_system permissions, and installation of transforms following the new schema works!

diff --git a/x-pack/plugin/core/src/main/java/org/elasticsearch/xpack/core/security/authz/store/ReservedRolesStore.java b/x-pack/plugin/core/src/main/java/org/elasticsearch/xpack/core/security/authz/store/ReservedRolesStore.java
index d8889003f36..df77b7660ad 100644
--- a/x-pack/plugin/core/src/main/java/org/elasticsearch/xpack/core/security/authz/store/ReservedRolesStore.java
+++ b/x-pack/plugin/core/src/main/java/org/elasticsearch/xpack/core/security/authz/store/ReservedRolesStore.java
@@ -820,9 +820,9 @@ public class ReservedRolesStore implements BiConsumer<Set<String>, ActionListene
                     .build(),
                 RoleDescriptor.IndicesPrivileges.builder()
                     .indices(
-                        "metrics-endpoint.metadata_current_default",
-                        ".metrics-endpoint.metadata_current_default",
-                        ".metrics-endpoint.metadata_united_default"
+                        "metrics-endpoint.metadata_current_default*",
+                        ".metrics-endpoint.metadata_current_default*",
+                        ".metrics-endpoint.metadata_united_default*"
                     )
                     .privileges("create_index", "delete_index", "read", "index", IndicesAliasesAction.NAME, UpdateSettingsAction.NAME)
                     .build(),

Testing scenarios

I tested the following scenarios with the endpoint package.

Fresh package installs

  • Install legacy Endpoint package with legacy transforms in fresh install ✅
  • Install new schema Endpoint package with new transforms schema in fresh install ✅

Package upgrade scenarios

  • Upgrade legacy Endpoint package with legacy transforms to a newer legacy Endpoint package with legacy transforms (legacy -> legacy) ✅
  • Upgrade new schema Endpoint package with new transforms schema to a newer Endpoint package with new transforms schema (new schema -> new schema) ✅
    • Note: I updated fleet_transform_version as specified to see the transform upgrade occur
  • Upgrade legacy Endpoint package with legacy transforms to a newer Endpoint package with new transforms schema (legacy -> new schema) ❌

In the (legacy -> new schema) upgrade test above, I am seeing that the new transforms get installed, but the legacy transforms are not being deleted. Further, the legacy destination indices are not being deleted. See some screenshots below to visualize what I'm seeing.

You see 2 sets of transforms. The ones prefixed logs-* are new schema. The ones without the logs-* prefix are legacy. I confirmed offline with @qn895 that the legacy transforms should be deleted.

image

You see 2 sets of destination indices below. The ones with the version at the end -8.9.1 are the new schema. The ones without the version at the end are legacy.

image

❓ It's understood that destination indices will now be kept around after upgrades between two packages both following the new transform schema. Should we delete the destination indices of the legacy transforms in the initial "cross over" upgrade to using the new schema for the first time? My initial opinion is that we should delete the legacy destination indices when we do the initial upgrade to the new schema to stay consistent with how legacy transforms worked and to have a cleaner "cross over". I'm interested to hear other opinions. cc @qn895 @droberts195 @joshdover @kpollich

Question about {destination_index_name}.latest alias and keeping old destination indices around

My understanding for the reasoning for keeping old destination indices around is to help with upgrade performance and re-indexing data. (Let me know if I have that right!)

For Endpoint purposes, we really only care about the latest data about which Endpoints are deployed. When testing the new upgrade paths, our Kibana API to display deployed Endpoints looks at our destination index and returns that list. With the new upgrade scenarios, our older destination indices will not be deleted. When our new transforms install, the new indices will be indexed with the latest documents representing deployed Endpoints. This will initially cause us to see duplicates in the UI because we'll get the same Endpoint ID from each destination index. To fix this, we'll update our API calls to point to the {destination_index_name}.latest alias to make sure we only get data from the latest index and avoid duplicates with stale data from older destination indices that are no longer updated.

Let me know if the above scenario and my usage of the {destination_index_name}.latest alias makes sense. My goal is to ensure that my interpretation of the intended usage of the aliases is in alignment with everyone. cc @qn895 @droberts195 @joshdover @kpollich

droberts195 added a commit to droberts195/elasticsearch that referenced this pull request Dec 20, 2022
Changes to support elastic/kibana#142920.

Transform destination indices now have a version number appended
to their names, so the patterns used to set up the index privileges
need to be adjusted accordingly.
@qn895
Copy link
Member Author

qn895 commented Dec 20, 2022

Thanks Kevin for the thorough testing. I've updated the behavior for Upgrade legacy Endpoint package with legacy transforms to a newer Endpoint package with new transforms schema (legacy -> new schema) ❌ to reflect the discussed plan. Current behavior should now be:

  • Old transforms from version using legacy schema should now be deleted
  • Old destination indices from version using legacy schema should also now be deleted (I agree it's better for consistency)

@kevinlog
Copy link
Contributor

@qn895 thank you for the update!

I checked it out and tried it again and it works!

I addition to the other testing scenarios, the below one now works.

  • Upgrade legacy Endpoint package with legacy transforms to a newer Endpoint package with new transforms schema (legacy -> new schema) ✅

LGTM!

@qn895
Copy link
Member Author

qn895 commented Dec 21, 2022

@elasticmachine merge upstream

@qn895 qn895 enabled auto-merge (squash) December 21, 2022 15:47
@qn895 qn895 added the Feature:Transforms ML transforms label Dec 21, 2022
droberts195 added a commit to elastic/elasticsearch that referenced this pull request Dec 21, 2022
Changes to support elastic/kibana#142920.

Transform destination indices now have a version number appended
to their names, so the patterns used to set up the index privileges
need to be adjusted accordingly.
@kibana-ci
Copy link
Collaborator

💚 Build Succeeded

Metrics [docs]

Page load bundle

Size of the bundles that are downloaded on every page load. Target size is below 100kb

id before after diff
fleet 119.7KB 119.8KB +105.0B

Saved Objects .kibana field count

Every field in each saved object type adds overhead to Elasticsearch. Kibana needs to keep the total field count below Elasticsearch's default limit of 1000 fields. Only specify field mappings for the fields you wish to search on or query. See https://www.elastic.co/guide/en/kibana/master/saved-objects-service.html#_mappings

id before after diff
epm-packages 27 28 +1
Unknown metric groups

ESLint disabled in files

id before after diff
osquery 1 2 +1

ESLint disabled line counts

id before after diff
enterpriseSearch 19 21 +2
fleet 61 67 +6
osquery 109 115 +6
securitySolution 439 445 +6
total +20

Total ESLint disabled count

id before after diff
enterpriseSearch 20 22 +2
fleet 70 76 +6
osquery 110 117 +7
securitySolution 515 521 +6
total +21

History

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

cc @qn895

@qn895 qn895 merged commit b61066d into elastic:main Dec 21, 2022
@kibanamachine kibanamachine added the backport:skip This commit does not require backporting label Dec 21, 2022
@susan-shu-c
Copy link
Member

Thanks again for this 🙏 ☀️ 🎉

@szeitlin
Copy link

Thanks again for this 🙏 ☀️ 🎉

+1

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport:skip This commit does not require backporting Feature:Transforms ML transforms :ml release_note:enhancement Team:Fleet Team label for Observability Data Collection Fleet team v8.7.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.