Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Render Mappings more Compact in GET /_cluster/state #83846

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

original-brownbear
Copy link
Member

@original-brownbear original-brownbear commented Feb 11, 2022

This deduplicates index mappings in the /_cluster/state response by grouping them by the sha256 hash of the mapping.
This significantly reduces the response size for large cluster states,
making this debug API easier to read for humans (because understanding
whether or not mappings are duplicated across indices might be valuable)
and more importantly less ressource hungry as far as serialization goes.

Adds an undocumented parameter to get back to the old format if we actually have to in an unforeseen instance.

New rendering of an index:

      "my-index-000001" : {
        "version" : 5,
        "mapping_version" : 2,
        "settings_version" : 1,
        "aliases_version" : 1,
        "routing_num_shards" : 1024,
        "state" : "open",
        "settings" : {
          "index" : {
            "routing" : {
              "allocation" : {
                "include" : {
                  "_tier_preference" : "data_content"
                }
              }
            },
            "number_of_shards" : "1",
            "provided_name" : "my-index-000001",
            "creation_date" : "1644601309989",
            "number_of_replicas" : "1",
            "uuid" : "9m19TP4DRsevoKkSG3DC0g",
            "version" : {
              "created" : "8020099"
            }
          }
        },
        "mappings" : "XinOY78Sl3l55aaFJrhIKDlniKVodszNHRmKhONrR/o=",

and then mappings at the same level as "indices":

  "mappings" : {
      "5jySc/VOXLKC1itGKeilXwF6kCFL0i5Xe0DqJ0eDgwk=" : {
        "dynamic" : "strict",
        "_meta" : {
          "version" : "8.2.0"
        },
        "properties" : {
          "chunk" : {
            "type" : "integer"
          },
          "data" : {
            "type" : "binary"
          },
          "name" : {
            "type" : "keyword"
          }
        }
      },
      "XinOY78Sl3l55aaFJrhIKDlniKVodszNHRmKhONrR/o=" : {
        "properties" : {
          "@timestamp" : {
            "type" : "date"
          },
          "message" : {
            "type" : "text",
            "fields" : {
              "keyword" : {
                "type" : "keyword",
                "ignore_above" : 256
              }
            }
          },
          "user" : {
            "properties" : {
              "id" : {
                "type" : "text",
                "fields" : {
                  "keyword" : {
                    "type" : "keyword",
                    "ignore_above" : 256
                  }
                }
              }
            }
          }
        }
      }
    }

relates #77466

This deduplicates index mappings in the `/_cluster_state` response.
This significantly reduces the response size for large cluster states,
making this debug API easier to read for humans (because understanding
whether or not mappings are duplicated across indices might be valuable)
and more importantly less ressource hungry as far as serialization goes.
@original-brownbear original-brownbear added >enhancement :Distributed Coordination/Cluster Coordination Cluster formation and cluster state publication, including cluster membership and fault detection. v8.2.0 labels Feb 11, 2022
@elasticmachine elasticmachine added the Team:Distributed (Obsolete) Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination. label Feb 11, 2022
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-distributed (Team:Distributed)

@elasticsearchmachine
Copy link
Collaborator

Hi @original-brownbear, I've created a changelog YAML for you.

Copy link
Contributor

@DaveCTurner DaveCTurner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally +1 from me but we still need to consider whether to treat this as a breaking change or not. We could alternatively not change the default but start sending 429s to clients that request an unreasonably large response. Marking this as Request changes until these questions are resolved.

Copy link
Contributor

@henningandersen henningandersen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Preparing for our discussion today, I read through this and provided a couple of comments.

if (mmd == null) {
builder.nullField(KEY_MAPPINGS);
} else {
builder.field(KEY_MAPPINGS, indexMetadata.mapping().getSha256());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer to pick a new name for the SHA256. mappings_sha256 maybe or perhaps shared_mappings_id. Then we can also allow dumping both, which could provide a transitional path for clients.

@@ -2124,6 +2126,15 @@ public static void toXContent(Metadata metadata, XContentBuilder builder, ToXCon
builder.endObject();

if (context == XContentContext.API) {
if (params.paramAsBoolean(MAPPINGS_BY_HASH_PARAM, true)) {
builder.startObject("mappings");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps we can call this shared_mappings and then have a flag of the same name to the API as well as a cluster setting to disable dumping non-shared mappings?

@elasticsearchmachine elasticsearchmachine changed the base branch from master to main July 22, 2022 23:09
@mark-vieira mark-vieira added v8.5.0 and removed v8.4.0 labels Jul 27, 2022
Copy link
Contributor

@DaveCTurner DaveCTurner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Henning and I discussed this again today. We're not confident enough that nobody would be affected by this change in the default behaviour, and we think with chunked-encoding we can protect Elasticsearch against harm at least, so we would prefer to make the desired behaviour opt-in today, deprecate the old default, and remove it in the next major version.

I'll raise a proposal to do this with the breaking changes committee.

@csoulios csoulios added v8.6.0 and removed v8.5.0 labels Sep 21, 2022
@kingherc kingherc added v8.7.0 and removed v8.6.0 labels Nov 16, 2022
@rjernst rjernst added v8.8.0 and removed v8.7.0 labels Feb 8, 2023
@gmarouli gmarouli added v8.9.0 and removed v8.8.0 labels Apr 26, 2023
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-distributed-obsolete (Team:Distributed (Obsolete))

@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-distributed-coordination (Team:Distributed Coordination)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Distributed Coordination/Cluster Coordination Cluster formation and cluster state publication, including cluster membership and fault detection. >enhancement Team:Distributed Coordination Meta label for Distributed Coordination team Team:Distributed (Obsolete) Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination. v9.1.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.