<!DOCTYPE html><html class="theme-next mist use-motion" lang="zh-Hans"><head><meta name="generator" content="Hexo 3.9.0"><meta charset="UTF-8"><meta http-equiv="X-UA-Compatible" content="IE=edge"><meta name="viewport" content="width=device-width,initial-scale=1,maximum-scale=1"><meta name="theme-color" content="#222"><script src="/lib/pace/pace.min.js?v=1.0.2"></script><link href="/lib/pace/pace-theme-minimal.min.css?v=1.0.2" rel="stylesheet"><meta http-equiv="Cache-Control" content="no-transform"><meta http-equiv="Cache-Control" content="no-siteapp"><link href="/lib/fancybox/source/jquery.fancybox.css?v=2.1.5" rel="stylesheet" type="text/css"><link href="/lib/font-awesome/css/font-awesome.min.css?v=4.6.2" rel="stylesheet" type="text/css"><link href="/css/main.css?v=5.1.3" rel="stylesheet" type="text/css"><link rel="apple-touch-icon" sizes="180x180" href="/images/apple-touch-icon-240x240-playpi.png?v=5.1.3"><link rel="icon" type="image/png" sizes="32x32" href="/images/favicon-32x32-playpi.png?v=5.1.3"><link rel="icon" type="image/png" sizes="16x16" href="/images/favicon-16x16-playpi.png?v=5.1.3"><link rel="mask-icon" href="/images/logo-playpi.svg?v=5.1.3" color="#222"><meta name="keywords" content="Elasticsearch,bulk,keyword,ignore_above"><link rel="alternate" href="/atom.xml" title="虾丸派" type="application/atom+xml"><meta name="description" content="最近在项目中遇到一个异常,写入数据到 Elasticsearch 中,报错:max_bytes_length_exceeded_exception。这个其实和 Elasticsearch 的字段长度限制有关,本文就回顾一下在 Elasticsearch 中一个字段支持的最大字符数。本文涉及的开发环境:Elasticsearch v5.6.8,读者需要注意 字符数 、 字节数 这两个基本概念的区别。"><meta name="keywords" content="Elasticsearch,bulk,keyword,ignore_above"><meta property="og:type" content="article"><meta property="og:title" content="在 Elasticsearch 中一个字段支持的最大字符数"><meta property="og:url" content="https://www.playpi.org/2017061401.html"><meta property="og:site_name" content="虾丸派"><meta property="og:description" content="最近在项目中遇到一个异常,写入数据到 Elasticsearch 中,报错:max_bytes_length_exceeded_exception。这个其实和 
Elasticsearch 的字段长度限制有关,本文就回顾一下在 Elasticsearch 中一个字段支持的最大字符数。本文涉及的开发环境:Elasticsearch v5.6.8,读者需要注意 字符数 、 字节数 这两个基本概念的区别。"><meta property="og:locale" content="zh-Hans"><meta property="og:image" content="https://raw.githubusercontent.com/iplaypi/img-playpi/master/img/2017/20200305213903.png"><meta property="og:image" content="https://raw.githubusercontent.com/iplaypi/img-playpi/master/img/2017/20200305213928.png"><meta property="og:image" content="https://raw.githubusercontent.com/iplaypi/img-playpi/master/img/2017/20200305213935.png"><meta property="og:updated_time" content="2017-06-14T13:29:31.000Z"><meta name="twitter:card" content="summary"><meta name="twitter:title" content="在 Elasticsearch 中一个字段支持的最大字符数"><meta name="twitter:description" content="最近在项目中遇到一个异常,写入数据到 Elasticsearch 中,报错:max_bytes_length_exceeded_exception。这个其实和 Elasticsearch 的字段长度限制有关,本文就回顾一下在 Elasticsearch 中一个字段支持的最大字符数。本文涉及的开发环境:Elasticsearch v5.6.8,读者需要注意 字符数 、 字节数 这两个基本概念的区别。"><meta name="twitter:image" content="https://raw.githubusercontent.com/iplaypi/img-playpi/master/img/2017/20200305213903.png"><script type="text/javascript" id="hexo.configurations">var NexT=window.NexT||{},CONFIG={root:"/",scheme:"Mist",version:"5.1.3",sidebar:{position:"left",display:"hide",offset:12,b2t:!1,scrollpercent:!0,onmobile:!1},fancybox:!0,tabs:!0,motion:{enable:!0,async:!1,transition:{post_block:"fadeIn",post_header:"slideDownIn",post_body:"slideDownIn",coll_header:"slideLeftIn",sidebar:"slideUpIn"}},duoshuo:{userId:"0",author:"博主"},algolia:{applicationID:"",apiKey:"",indexName:"",hits:{per_page:10},labels:{input_placeholder:"Search for Posts",hits_empty:"We didn't find any results for the search: ${query}",hits_stats:"${hits} results found in ${time} ms"}}}</script><link rel="canonical" href="https://www.playpi.org/2017061401.html"><title>在 Elasticsearch 中一个字段支持的最大字符数 | 虾丸派</title></head><body itemscope itemtype="http://schema.org/WebPage" lang="zh-Hans"><div class="container sidebar-position-left 
page-post-detail"><div class="headband"></div><header id="header" class="header" itemscope itemtype="http://schema.org/WPHeader"><div class="header-inner"><div class="site-brand-wrapper"><div class="site-meta"><div class="custom-logo-site-title"><a href="/" class="brand" rel="start"><span class="logo-line-before"><i></i></span> <span class="site-title">虾丸派</span> <span class="logo-line-after"><i></i></span></a></div><h1 class="site-subtitle" itemprop="description">烂笔头</h1></div><div class="site-nav-toggle"><button><span class="btn-bar"></span> <span class="btn-bar"></span> <span class="btn-bar"></span></button></div></div><nav class="site-nav"><ul id="menu" class="menu"><li class="menu-item menu-item-home"><a href="/" rel="section"><i class="menu-item-icon fa fa-fw fa-home"></i><br>首页</a></li><li class="menu-item menu-item-tags"><a href="/tags/" rel="section"><i class="menu-item-icon fa fa-fw fa-tags"></i><br>标签</a></li><li class="menu-item menu-item-categories"><a href="/categories/" rel="section"><i class="menu-item-icon fa fa-fw fa-th"></i><br>分类</a></li><li class="menu-item menu-item-archives"><a href="/archives/" rel="section"><i class="menu-item-icon fa fa-fw fa-archive"></i><br>归档</a></li><li class="menu-item menu-item-about"><a href="/about/" rel="section"><i class="menu-item-icon fa fa-fw fa-user"></i><br>关于</a></li><li class="menu-item menu-item-books"><a href="/books/" rel="section"><i class="menu-item-icon fa fa-fw fa-book"></i><br>书籍</a></li><li class="menu-item menu-item-guide"><a href="/guide/" rel="section"><i class="menu-item-icon fa fa-fw fa-location-arrow"></i><br>指南</a></li><li class="menu-item menu-item-search"><a href="javascript:;" class="popup-trigger"><i class="menu-item-icon fa fa-search fa-fw"></i><br>搜索</a></li></ul><div class="site-search"><div class="popup search-popup local-search-popup"><div class="local-search-header clearfix"><span class="search-icon"><i class="fa fa-search"></i> </span><span class="popup-btn-close"><i class="fa 
fa-times-circle"></i></span><div class="local-search-input-wrapper"><input autocomplete="off" placeholder="搜索..." spellcheck="false" type="text" id="local-search-input"></div></div><div id="local-search-result"></div></div></div></nav></div></header><main id="main" class="main"><div class="main-inner"><div class="content-wrap"><div id="content" class="content"><div id="posts" class="posts-expand"><article class="post post-type-normal" itemscope itemtype="http://schema.org/Article"><div class="post-block"><link itemprop="mainEntityOfPage" href="https://www.playpi.org/2017061401.html"><span hidden itemprop="author" itemscope itemtype="http://schema.org/Person"><meta itemprop="name" content="虾丸派"><meta itemprop="description" content="记录知识 | 分享技术"><meta itemprop="image" content="/images/favicon-1536x1536-playpi.png"></span><span hidden itemprop="publisher" itemscope itemtype="http://schema.org/Organization"><meta itemprop="name" content="虾丸派"></span><header class="post-header"><h2 class="post-title" itemprop="name headline">The Maximum Number of Characters a Field Supports in Elasticsearch</h2><div class="post-meta"><span class="post-time"><span class="post-meta-item-text">Posted on</span> <time title="创建于" itemprop="dateCreated datePublished" datetime="2017-06-14T21:29:31+08:00">2017-06-14 </time></span><span class="post-category"><span class="post-meta-divider">|</span> <span class="post-meta-item-text">In</span> <span itemprop="about" itemscope itemtype="http://schema.org/Thing"><a href="/categories/big-data-technical-knowledge/" itemprop="url" rel="index"><span itemprop="name">Big Data Technical Knowledge</span> </a></span></span><span id="busuanzi_container_page_pv" style="display:none"><span class="post-meta-divider">|</span> Views <span id="busuanzi_value_page_pv"></span></span><div class="post-wordcount"><span class="post-meta-item-text">Word count</span> <span title="字数统计">2,245 </span><span class="post-meta-divider">|</span> <span class="post-meta-item-text">Reading time ≈</span> <span title="阅读时长">9 min</span></div></div></header><div 
class="post-body" itemprop="articleBody"><p>A project of mine recently hit an exception while writing data to <code>Elasticsearch</code>: <code>max_bytes_length_exceeded_exception</code>. It is caused by the field length limit in <code>Elasticsearch</code>, so this post reviews the maximum number of characters a single field supports in <code>Elasticsearch</code>.</p><p>Environment used in this post: <code>Elasticsearch v5.6.8</code>. Keep in mind the difference between two basic concepts: <strong>character count</strong> and <strong>byte count</strong>.</p><a id="more"></a><h1 id="问题出现"><a href="#问题出现" class="headerlink" title="问题出现"></a>The Problem</h1><p>The business side noticed missing records. The log of the background job showed these exceptions:</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><span class="line">ERROR ESBulkProcessor: {"index":"your_index","type":"your_type","id":"b20ddaf126908506024aed6698b50214","cause":{"type":"exception","reason":"Elasticsearch exception [type=illegal_argument_exception, reason=Document contains at least one immense term in field=\"author.raw\" (whose UTF8 encoding is longer than the max length 32766), all of which were skipped. Please correct the analyzer to not produce such terms. 
The prefix of the first immense term is: '[-24, -87, -71, -25, -74, -83, -24, -128, -107, -17, -68, -113, -27, -113, -80, -27, -116, -105, -27, -96, -79, -27, -80, -114, 32, -27, -120, -111, -28, -70]...', original message: bytes can be at most 32766 in length; got 98345]","caused_by":{"type":"exception","reason":"Elasticsearch exception [type=max_bytes_length_exceeded_exception, reason=max_bytes_length_exceeded_exception: bytes can be at most 32766 in length; got 98345]"}},"status":400}</span><br><span class="line">17/06/14 18:07:04 ERROR ESBulkProcessor: bulk [76 : 1560506824519] 527 request - 526 response</span><br><span class="line">17/06/14 19:05:36 ERROR ESBulkProcessor: {"index":"your_index","type":"your_type","id":"cc36f925a9281389cb50b194cf590108","cause":{"type":"exception","reason":"Elasticsearch exception [type=illegal_argument_exception, reason=Document contains at least one immense term in field=\"author.raw\" (whose UTF8 encoding is longer than the max length 32766), all of which were skipped. Please correct the analyzer to not produce such terms. 
The prefix of the first immense term is: '[-27, -112, -77, -25, -112, -115, -27, -112, -101, -26, -114, -95, -24, -88, -86, -27, -96, -79, -27, -80, -114, 35, 34, 44, 34, 112, 117, 98, 116, 105]...', original message: bytes can be at most 32766 in length; got 94724]","caused_by":{"type":"exception","reason":"Elasticsearch exception [type=max_bytes_length_exceeded_exception, reason=max_bytes_length_exceeded_exception: bytes can be at most 32766 in length; got 94724]"}},"status":400}</span><br></pre></td></tr></table></figure><p>As the log shows, the exception occurs while writing data to <code>Elasticsearch</code> through the <code>bulk</code> API. When a field is of type <code>keyword</code> and a document supplies a very long text value for it, the write fails with <code>illegal_argument_exception</code> and <code>max_bytes_length_exceeded_exception</code>. The whole document is rejected, not just the field, so the record is not written at all. If the value is at most 32766 bytes the document is written, although the field may still not be indexed; see the <code>ignore_above</code> parameter below.</p><p>The key detail:</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">whose UTF8 encoding is longer than the max length 32766</span><br></pre></td></tr></table></figure><p>The value of <code>author.raw</code> is longer than 32766 bytes and cannot be written. Together with the rest of the exception, this shows that <code>author.raw</code> is mapped as <code>keyword</code> and that the incoming text was far too long: 98345 and 94724 bytes in the two entries above (most likely dirty data).</p><p>Again, the failure applies to the whole document: the record is not written to <code>Elasticsearch</code>.</p><h1 id="问题分析"><a href="#问题分析" class="headerlink" title="问题分析"></a>Analysis</h1><p>For values this long, simply mapping the field as <code>keyword</code> will not work.</p><p>The fix is to stop mapping such long-text fields as <code>keyword</code> and map them as the analyzed type, <code>text</code>, with a suitable analyzer.</p><p>If the field really should be <code>keyword</code> and the overlong values are just a small amount of dirty data that can be ignored, give the field a maximum character count, for example 200, check the length before writing, and drop or truncate anything longer, so that overlong text never reaches the <code>Elasticsearch</code> write path. Writing such text into a <code>keyword</code> field is unfriendly to <code>Elasticsearch</code>, the underlying <code>Lucene</code> cannot index it as a single term, and even if it were written it would be of no use to anyone (text this long is only useful with full-text search).</p><h2 id="禁止索引"><a href="#禁止索引" class="headerlink" title="禁止索引"></a>Disabling Indexing for Long Values</h2><p>Even below 32766 bytes, an overly long
<code>keyword</code> value, say several hundred or several thousand characters (possibly thousands or tens of thousands of bytes), is not useful as an exact-match term. <code>Elasticsearch</code> natively supports an upper length limit for indexing <code>keyword</code> fields: a value with more characters than the limit is not indexed for that field, but the document, including the field's data, is still written. Because such a value is skipped rather than indexed, a sufficiently small limit also guards against the 32766-byte error. The setting is the <code>ignore_above</code> parameter, illustrated below.</p><p>Map the <code>name_ignore</code> field as <code>keyword</code> with <code>ignore_above</code> set to 8, so that values of up to 8 characters are indexed. Likewise, map the <code>name</code> field as <code>keyword</code> with <code>ignore_above</code> set to 32, so that values of up to 32 characters are indexed.</p><p>Note that <code>ignore_above</code> limits the number of characters; the number of bytes depends on the content. If the content is all ASCII letters and digits, the character count equals the byte count. If it is mostly Chinese or Korean, each character usually takes 3 bytes in UTF-8 and at most 4, so multiplying the character count by 4 gives a safe upper bound on the byte count.</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br></pre></td><td class="code"><pre><span class="line">PUT /my-index-post/_mapping/post</span><br><span class="line">{</span><br><span class="line"> "properties": {</span><br><span class="line"> "name_ignore": {</span><br><span class="line"> "type": "keyword",</span><br><span class="line"> "ignore_above": 8</span><br><span class="line"> }</span><br><span class="line"> }</span><br><span class="line">}</span><br><span class="line"></span><br><span class="line">PUT /my-index-post/_mapping/post</span><br><span class="line">{</span><br><span class="line"> "properties": {</span><br><span class="line"> "name": {</span><br><span class="line"> "type": "keyword",</span><br><span class="line"> "ignore_above": 32</span><br><span class="line"> }</span><br><span 
class="line"> }</span><br><span class="line">}</span><br></pre></td></tr></table></figure><p>Write 2 documents. The value used for both fields is a 16-character Chinese string, longer than 8 characters but shorter than 32:</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br></pre></td><td class="code"><pre><span class="line">POST my-index-post/post/1</span><br><span class="line">{</span><br><span class="line"> "name": "名称过长会被过滤名称过长会被过滤",</span><br><span class="line"> "name_ignore": "名称过长会被过滤名称过长会被过滤"</span><br><span class="line">}</span><br><span class="line"></span><br><span class="line">POST my-index-post/post/2</span><br><span class="line">{</span><br><span class="line"> "name": "名称过长会被过滤名称过长会被过滤",</span><br><span class="line"> "name_ignore": "名称过长会被过滤名称过长会被过滤"</span><br><span class="line">}</span><br></pre></td></tr></table></figure><p>Check the data: both documents were written to <code>Elasticsearch</code> successfully:</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br></pre></td><td class="code"><pre><span class="line">POST my-index-post/_search</span><br><span class="line">{</span><br><span class="line"> "query": {</span><br><span class="line"> "bool": {</span><br><span class="line"> 
"must": [</span><br><span class="line"> {</span><br><span class="line"> "terms": {</span><br><span class="line"> "_id": [</span><br><span class="line"> "1",</span><br><span class="line"> "2"</span><br><span class="line"> ]</span><br><span class="line"> }</span><br><span class="line"> }</span><br><span class="line"> ]</span><br><span class="line"> }</span><br><span class="line"> }</span><br><span class="line">}</span><br></pre></td></tr></table></figure><p><img src="https://raw.githubusercontent.com/iplaypi/img-playpi/master/img/2017/20200305213903.png" alt="查看 2 条数据" title="查看 2 条数据"></p><p>Both documents come back with all fields intact. So what does setting <code>ignore_above</code> on <code>name_ignore</code> actually change? It decides whether the value is <strong>indexed</strong>. Add an exact-match clause to the query to see:</p><figure class="highlight plain"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br><span class="line">31</span><br><span class="line">32</span><br><span class="line">33</span><br><span class="line">34</span><br><span class="line">35</span><br><span class="line">36</span><br><span class="line">37</span><br><span class="line">38</span><br><span 
class="line">39</span><br><span class="line">40</span><br><span class="line">41</span><br><span class="line">42</span><br><span class="line">43</span><br><span class="line">44</span><br><span class="line">45</span><br><span class="line">46</span><br><span class="line">47</span><br><span class="line">48</span><br><span class="line">49</span><br><span class="line">50</span><br><span class="line">51</span><br></pre></td><td class="code"><pre><span class="line"># Returns no hits: the name_ignore value was not indexed</span><br><span class="line">POST my-index-post/_search</span><br><span class="line">{</span><br><span class="line"> "query": {</span><br><span class="line"> "bool": {</span><br><span class="line"> "must": [</span><br><span class="line"> {</span><br><span class="line"> "terms": {</span><br><span class="line"> "_id": [</span><br><span class="line"> "1",</span><br><span class="line"> "2"</span><br><span class="line"> ]</span><br><span class="line"> }</span><br><span class="line"> },</span><br><span class="line"> {</span><br><span class="line"> "terms": {</span><br><span class="line"> "name_ignore": [</span><br><span class="line"> "名称过长会被过滤名称过长会被过滤"</span><br><span class="line"> ]</span><br><span class="line"> }</span><br><span class="line"> }</span><br><span class="line"> ]</span><br><span class="line"> }</span><br><span class="line"> }</span><br><span class="line">}</span><br><span class="line"></span><br><span class="line"># Returns hits: the name value was indexed</span><br><span class="line">POST my-index-post/_search</span><br><span class="line">{</span><br><span class="line"> "query": {</span><br><span class="line"> "bool": {</span><br><span class="line"> "must": [</span><br><span class="line"> {</span><br><span class="line"> "terms": {</span><br><span class="line"> "_id": [</span><br><span class="line"> "1",</span><br><span class="line"> "2"</span><br><span class="line"> ]</span><br><span class="line"> }</span><br><span class="line"> },</span><br><span class="line"> {</span><br><span class="line"> "terms": 
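{</span><br><span class="line"> "name": [</span><br><span class="line"> "名称过长会被过滤名称过长会被过滤"</span><br><span class="line"> ]</span><br><span class="line"> }</span><br><span class="line"> }</span><br><span class="line"> ]</span><br><span class="line"> }</span><br><span class="line"> }</span><br><span class="line">}</span><br></pre></td></tr></table></figure><p><img src="https://raw.githubusercontent.com/iplaypi/img-playpi/master/img/2017/20200305213928.png" alt="查不到数据" title="查不到数据"></p><p><img src="https://raw.githubusercontent.com/iplaypi/img-playpi/master/img/2017/20200305213935.png" alt="可以查到数据" title="可以查到数据"></p><p>An exact match on <code>name_ignore</code> finds nothing, while the same match on <code>name</code> does. This shows that <code>Elasticsearch</code> did not index the <code>name_ignore</code> value because it is longer than 8 characters. The value is only kept in the document's <code>_source</code>, so it is returned with the document but cannot be searched.</p><p>From the official documentation:</p><blockquote><p>Strings longer than the ignore_above setting will not be indexed or stored.</p></blockquote><h1 id="总结"><a href="#总结" class="headerlink" title="总结"></a>Summary</h1><p>1. When a <code>keyword</code> field receives a very long value (more than 32766 bytes, and not already skipped by <code>ignore_above</code>), a byte-length exception is thrown and the whole document fails to be written.</p><p>2. Searching a field for a value longer than its <code>ignore_above</code> setting matches nothing (the value was not indexed at write time, although it is still kept in the document).</p><p>3. When a value has more characters than <code>ignore_above</code> allows, the whole document, including this field, is still written. The value is simply not indexed, and it is still returned when the document is hit by another query.</p><p>4. If <code>ignore_above</code> is not set explicitly on a <code>keyword</code> field, the default is 2147483647, which is effectively no limit. The frequently quoted 256 comes from the default dynamic mapping, which creates a <code>keyword</code> sub-field with <code>ignore_above</code> set to 256 for string values. Either way, indexed values are still bounded by the byte limit of the <code>keyword</code> type and cannot be arbitrarily large.</p><h2 id="引申说明"><a href="#引申说明" class="headerlink" title="引申说明"></a> Further Notes</h2><p>1. Because of this limit, a <code>keyword</code> value can be at most 32766 bytes. A <code>UTF-8</code> character takes 1 to 4 bytes, so in the worst case only about 8000 characters fit (more if the content is all digits and letters). In other words, a <code>term</code> exact match supports at most roughly 8000 characters in the worst case (values that long are rarely meaningful in practice).</p><p>2. Differences between the two types:</p><ul><li><code>text</code> type: no maximum length; supports analysis and full-text search; does not support aggregations or sorting by default; suited to large fields such as article bodies</li><li><code>keyword</code> type: at most 32766 bytes; with <code>UTF-8</code> encoding, a rough worst-case estimate of the maximum character count is the maximum byte count divided by 4; supports exact matching, aggregations, and sorting; suited to exact-match fields such as <code>url</code>, name, or gender</li></ul><p>From the official documentation:</p><blockquote><p>This option is also useful for 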
protecting against Lucene’s term byte-length limit of 32766.<br>The value for ignore_above is the character count, but Lucene counts bytes. If you use UTF-8 text with many non-ASCII characters, you may want to set the limit to 32766 / 4 = 8191 since UTF-8 characters may occupy at most 4 bytes.</p></blockquote><h1 id="备注"><a href="#备注" class="headerlink" title="备注"></a>Notes</h1><p>1. Official documentation: <a href="https://www.elastic.co/guide/en/elasticsearch/reference/5.6/ignore-above.html" target="_blank" rel="noopener">ignore-above</a>.</p><p>2. A field's <code>ignore_above</code> can be changed, although its type cannot, through the <code>put</code> mapping API (see the official documentation: <a href="https://www.elastic.co/guide/en/elasticsearch/reference/5.6/indices-put-mapping.html" target="_blank" rel="noopener">indices-put-mapping</a>). The change does not affect content that is already written; it only affects documents written afterwards, because the field type stays the same and only the length limit for indexing changes.</p><p>3. The value is a number, for example 6 or 16. It is a character count, not a byte count. If the data is all ASCII letters and digits, the largest useful value is 32766; for Chinese or Korean data, a safe maximum is 32766 / 4 = 8191.</p><p>4. If the same field needs to be indexed as more than one type, use the <code>multi-fields</code> feature; see <a href="https://www.elastic.co/guide/en/elasticsearch/reference/5.6/multi-fields.html" target="_blank" rel="noopener">multi-fields</a>.</p></div><div><ul class="post-copyright"><li class="post-copyright-author"><strong>Author:</strong> 虾丸派</li><li 
class="post-copyright-link"><strong>本文链接:</strong> <a href="https://www.playpi.org/2017061401.html" title="在 Elasticsearch 中一个字段支持的最大字符数">https://www.playpi.org/2017061401.html</a></li><li class="post-copyright-license"><strong>版权声明: </strong>本博客所有文章除特别声明外,均采用 <a href="https://creativecommons.org/licenses/by-nc-sa/3.0/" rel="external nofollow" target="_blank">CC BY-NC-SA 3.0</a> 许可协议。转载请注明出处!</li></ul></div><footer class="post-footer"><div class="post-tags"><a href="/tags/Elasticsearch/" rel="tag"><i class="fa fa-tag"></i> Elasticsearch</a> <a href="/tags/bulk/" rel="tag"><i class="fa fa-tag"></i> bulk</a> <a href="/tags/keyword/" rel="tag"><i class="fa fa-tag"></i> keyword</a> <a href="/tags/ignore-above/" rel="tag"><i class="fa fa-tag"></i> ignore_above</a></div><div class="post-nav"><div class="post-nav-next post-nav-item"><a href="/2017060101.html" rel="next" title="记录一个 Kafka 错误:OffsetOutOfRangeException"><i class="fa fa-chevron-left"></i> 记录一个 Kafka 错误:OffsetOutOfRangeException</a></div><span class="post-nav-divider"></span><div class="post-nav-prev post-nav-item"><a href="/2017071701.html" rel="prev" title="Spark 序列化的一些事">Spark 序列化的一些事 <i class="fa fa-chevron-right"></i></a></div></div></footer></div></article><div class="post-spread"></div></div></div><div class="comments" id="comments"><div id="vcomments"></div></div></div><div class="sidebar-toggle"><div class="sidebar-toggle-line-wrap"><span class="sidebar-toggle-line sidebar-toggle-line-first"></span> <span class="sidebar-toggle-line sidebar-toggle-line-middle"></span> <span class="sidebar-toggle-line sidebar-toggle-line-last"></span></div></div><aside id="sidebar" class="sidebar"><div class="sidebar-inner"><ul class="sidebar-nav motion-element"><li class="sidebar-nav-toc sidebar-nav-active" data-target="post-toc-wrap">文章目录</li><li class="sidebar-nav-overview" data-target="site-overview-wrap">站点概览</li></ul><section class="site-overview-wrap sidebar-panel"><div class="site-overview"><div 
class="site-author motion-element" itemprop="author" itemscope itemtype="http://schema.org/Person"><img class="site-author-image" itemprop="image" src="/images/favicon-1536x1536-playpi.png" alt="虾丸派"><p class="site-author-name" itemprop="name">虾丸派</p><p class="site-description motion-element" itemprop="description">记录知识 | 分享技术</p></div><nav class="site-state motion-element"><div class="site-state-item site-state-posts"><a href="/archives/"><span class="site-state-item-count">144</span> <span class="site-state-item-name">日志</span></a></div><div class="site-state-item site-state-categories"><a href="/categories/index.html"><span class="site-state-item-count">13</span> <span class="site-state-item-name">分类</span></a></div><div class="site-state-item site-state-tags"><a href="/tags/index.html"><span class="site-state-item-count">294</span> <span class="site-state-item-name">标签</span></a></div></nav><div class="feed-link motion-element"><a href="/atom.xml" rel="alternate"><i class="fa fa-rss"></i> RSS</a></div><div class="links-of-author motion-element"><span class="links-of-author-item"><a href="https://github.com/iplaypi" target="_blank" title="GitHub"><i class="fa fa-fw fa-github"></i>GitHub</a> </span><span class="links-of-author-item"><a href="https://weibo.com/u/3086148515" target="_blank" title="微博"><i class="fa fa-fw fa-weibo"></i>微博</a> </span><span class="links-of-author-item"><a href="mailto:playpi@qq.com" target="_blank" title="E-Mail"><i class="fa fa-fw fa-envelope"></i>E-Mail</a></span></div><div class="cc-license motion-element" itemprop="license"><a href="https://creativecommons.org/licenses/by-nc-sa/4.0/" class="cc-opacity" target="_blank" rel="external nofollow"><img src="/images/cc-by-nc-sa.svg" alt="Creative Commons"></a></div><div class="links-of-blogroll motion-element links-of-blogroll-inline"><div class="links-of-blogroll-title"><i class="fa fa-fw fa-link"></i> 友情链接</div><ul class="links-of-blogroll-list"><li class="links-of-blogroll-item"><a 
href="https://github.com/iplaypi" title="GitHub" target="_blank" rel="external nofollow">GitHub</a></li><li class="links-of-blogroll-item"><a href="https://weibo.com/u/3086148515" title="Weibo" target="_blank" rel="external nofollow">Weibo</a></li><li class="links-of-blogroll-item"><a href="https://www.playpi.org" title="虾丸派" target="_blank" rel="external nofollow">虾丸派</a></li><li class="links-of-blogroll-item"><a href="https://www.playpi.org" title="playpi" target="_blank" rel="external nofollow">playpi</a></li><li class="links-of-blogroll-item"><a href="https://www.liaoxuefeng.com" title="廖雪峰" target="_blank" rel="external nofollow">廖雪峰</a></li><li class="links-of-blogroll-item"><a href="http://www.ruanyifeng.com" title="阮一峰" target="_blank" rel="external nofollow">阮一峰</a></li><li class="links-of-blogroll-item"><a href="https://travis-ci.org/iplaypi/iplaypi.github.io" title="travis-ci" target="_blank" rel="external nofollow">travis-ci</a></li><li class="links-of-blogroll-item"><a href="https://www.vultr.com/?ref=7861302-4F" title="Vultr" target="_blank" rel="external nofollow">Vultr</a></li></ul></div></div></section><section class="post-toc-wrap motion-element sidebar-panel sidebar-panel-active"><div class="post-toc"><div class="post-toc-content"><ol class="nav"><li class="nav-item nav-level-1"><a class="nav-link" href="#问题出现"><span class="nav-number">1.</span> <span class="nav-text">问题出现</span></a></li><li class="nav-item nav-level-1"><a class="nav-link" href="#问题分析"><span class="nav-number">2.</span> <span class="nav-text">问题分析</span></a><ol class="nav-child"><li class="nav-item nav-level-2"><a class="nav-link" href="#禁止索引"><span class="nav-number">2.1.</span> <span class="nav-text">禁止索引</span></a></li></ol></li><li class="nav-item nav-level-1"><a class="nav-link" href="#总结"><span class="nav-number">3.</span> <span class="nav-text">总结</span></a><ol class="nav-child"><li class="nav-item nav-level-2"><a class="nav-link" href="#引申说明"><span 
class="nav-number">3.1.</span> <span class="nav-text">引申说明</span></a></li></ol></li><li class="nav-item nav-level-1"><a class="nav-link" href="#备注"><span class="nav-number">4.</span> <span class="nav-text">备注</span></a></li></ol></div></div></section></div></aside></div></main><footer id="footer" class="footer"><div class="footer-inner"><div class="copyright">© 2016–<span itemprop="copyrightYear">2021</span> <span class="post-meta-divider">|</span> <span class="with-love"><i class="fa fa-heart"></i> </span><span class="author" itemprop="copyrightHolder">虾丸派</span> <span class="post-meta-divider">|</span> <span class="post-meta-item-icon"><i class="fa fa-area-chart"></i> </span><span class="post-meta-item-text">全站字数统计</span> <span title="全站字数统计">326.3k 字</span></div><div class="powered-by">由 <a class="theme-link" target="_blank" href="https://hexo.io" rel="external nofollow">Hexo</a> 强力驱动</div><span class="post-meta-divider">|</span><div class="theme-info">主题 <a class="theme-link" target="_blank" href="https://github.com/iissnan/hexo-theme-next" rel="external nofollow">NexT.Mist</a><script async src="//busuanzi.ibruce.info/busuanzi/2.3/busuanzi.pure.mini.js"></script><span id="busuanzi_container_site_pv" style="display:none"><span class="post-meta-divider">|</span> 总访问量 <span id="busuanzi_value_site_pv"></span> 次 </span><span id="busuanzi_container_site_uv" style="display:none"><span class="post-meta-divider">|</span> 总访客 <span id="busuanzi_value_site_uv"></span> 人</span></div><div class="busuanzi-count"><script async src="https://dn-lbstatics.qbox.me/busuanzi/2.3/busuanzi.pure.mini.js"></script></div></div></footer><div class="back-to-top"><i class="fa fa-arrow-up"></i> <span id="scrollpercent"><span>0</span>%</span></div></div><script type="text/javascript">"[object Function]"!==Object.prototype.toString.call(window.Promise)&&(window.Promise=null)</script><script type="text/javascript" src="/lib/jquery/index.js?v=2.1.3"></script><script type="text/javascript" 
src="/lib/fastclick/lib/fastclick.min.js?v=1.0.6"></script><script type="text/javascript" src="/lib/jquery_lazyload/jquery.lazyload.js?v=1.9.7"></script><script type="text/javascript" src="/lib/velocity/velocity.min.js?v=1.2.1"></script><script type="text/javascript" src="/lib/velocity/velocity.ui.min.js?v=1.2.1"></script><script type="text/javascript" src="/lib/fancybox/source/jquery.fancybox.pack.js?v=2.1.5"></script><script type="text/javascript" src="/js/src/utils.js?v=5.1.3"></script><script type="text/javascript" src="/js/src/motion.js?v=5.1.3"></script><script type="text/javascript" src="/js/src/scrollspy.js?v=5.1.3"></script><script type="text/javascript" src="/js/src/post-details.js?v=5.1.3"></script><script type="text/javascript" src="/js/src/bootstrap.js?v=5.1.3"></script><script src="//unpkg.com/valine@1.3.7/dist/Valine.min.js"></script><script type="text/javascript">new Valine({av:AV,el:"#comments",verify:!1,notify:!1,app_id:"FC5Jijeg1meo2K2OzPYWK327-gzGzoHsz",app_key:"6A1ReY8tjhPutK00F01YbJSq",placeholder:"Any questions?"})</script><script type="text/javascript">var isfetched=!1,isXml=!0,search_path="search.xml";0===search_path.length?search_path="search.xml":/json$/i.test(search_path)&&(isXml=!1);var path="/"+search_path,onPopupClose=function(t){$(".popup").hide(),$("#local-search-input").val(""),$(".search-result-list").remove(),$("#no-result").remove(),$(".local-search-pop-overlay").remove(),$("body").css("overflow","")};function proceedsearch(){$("body").append('<div class="search-popup-overlay local-search-pop-overlay"></div>').css("overflow","hidden"),$(".search-popup-overlay").click(onPopupClose),$(".popup").toggle();var t=$("#local-search-input");t.attr("autocapitalize","none"),t.attr("autocorrect","off"),t.focus()}var searchFunc=function(t,e,s){"use strict";$("body").append('<div class="search-popup-overlay local-search-pop-overlay"><div id="search-loading-icon"><i class="fa fa-spinner fa-pulse fa-5x 
fa-fw"></i></div></div>').css("overflow","hidden"),$("#search-loading-icon").css("margin","20% auto 0 auto").css("text-align","center"),$.ajax({url:t,dataType:isXml?"xml":"json",async:!0,success:function(t){isfetched=!0,$(".popup").detach().appendTo(".header-inner");var o=isXml?$("entry",t).map(function(){return{title:$("title",this).text(),content:$("content",this).text(),url:$("url",this).text()}}).get():t,n=document.getElementById(e),r=document.getElementById(s),t=function(){var m=n.value.trim().toLowerCase(),x=m.split(/[\s\-]+/);1<x.length&&x.push(m);var e,w=[];0<m.length&&o.forEach(function(t){var e=!1,o=0,h=0,n=t.title.trim(),r=n.toLowerCase(),s=t.content.trim().replace(/<[^>]+>/g,""),a=s.toLowerCase(),i=decodeURIComponent(t.url),c=[],l=[];if(""!=n&&(x.forEach(function(t){function e(t,e,o){var n=t.length;if(0===n)return[];var r,s=0,a=[];for(o||(e=e.toLowerCase(),t=t.toLowerCase());-1<(r=e.indexOf(t,s));)a.push({position:r,word:t}),s=r+n;return a}c=c.concat(e(t,r,!1)),l=l.concat(e(t,a,!1))}),(0<c.length||0<l.length)&&(e=!0,o=c.length+l.length)),e){function p(t,e,o,n){for(var r=n[n.length-1],s=r.position,a=r.word,i=[],c=0;s+a.length<=o&&0!=n.length;){a===m&&c++,i.push({position:s,length:a.length});var l=s+a.length;for(n.pop();0!=n.length&&(s=(r=n[n.length-1]).position,a=r.word,s<l);)n.pop()}return h+=c,{hits:i,start:e,end:o,searchTextCount:c}}[c,l].forEach(function(t){t.sort(function(t,e){return e.position!==t.position?e.position-t.position:t.word.length-e.word.length})});t=[];0!=c.length&&t.push(p(0,0,n.length,c));for(var u=[];0!=l.length;){var f=l[l.length-1],d=f.position,g=f.word,v=d-20,f=d+80;v<0&&(v=0),(f=f<d+g.length?d+g.length:f)>s.length&&(f=s.length),u.push(p(0,v,f,l))}u.sort(function(t,e){return t.searchTextCount!==e.searchTextCount?e.searchTextCount-t.searchTextCount:t.hits.length!==e.hits.length?e.hits.length-t.hits.length:t.start-e.start});e=parseInt("1");function $(o,t){var n="",r=t.start;return 
t.hits.forEach(function(t){n+=o.substring(r,t.position);var e=t.position+t.length;n+='<b class="search-keyword">'+o.substring(t.position,e)+"</b>",r=e}),n+=o.substring(r,t.end)}0<=e&&(u=u.slice(0,e));var C="";0!=t.length?C+="<li><a href='"+i+"' class='search-result-title'>"+$(n,t[0])+"</a>":C+="<li><a href='"+i+"' class='search-result-title'>"+n+"</a>",u.forEach(function(t){C+="<a href='"+i+'\'><p class="search-result">'+$(s,t)+"...</p></a>"}),C+="</li>",w.push({item:C,searchTextCount:h,hitCount:o,id:w.length})}}),1===x.length&&""===x[0]?r.innerHTML='<div id="no-result"><i class="fa fa-search fa-5x" /></div>':0===w.length?r.innerHTML='<div id="no-result"><i class="fa fa-frown-o fa-5x" /></div>':(w.sort(function(t,e){return t.searchTextCount!==e.searchTextCount?e.searchTextCount-t.searchTextCount:t.hitCount!==e.hitCount?e.hitCount-t.hitCount:e.id-t.id}),e='<ul class="search-result-list">',w.forEach(function(t){e+=t.item}),e+="</ul>",r.innerHTML=e)};n.addEventListener("input",t),$(".local-search-pop-overlay").remove(),$("body").css("overflow",""),proceedsearch()}})};$(".popup-trigger").click(function(t){t.stopPropagation(),!1===isfetched?searchFunc(path,"local-search-input","local-search-result"):proceedsearch()}),$(".popup-btn-close").click(onPopupClose),$(".popup").click(function(t){t.stopPropagation()}),$(document).on("keyup",function(t){27===t.which&&$(".search-popup").is(":visible")&&onPopupClose()})</script><script>!function(){var t=document.createElement("script"),e=window.location.protocol.split(":")[0];t.src="https"===e?"https://zz.bdstatic.com/linksubmit/push.js":"http://push.zhanzhang.baidu.com/push.js";e=document.getElementsByTagName("script")[0];e.parentNode.insertBefore(t,e)}()</script><script type="text/javascript" src="/js/src/js.cookie.js?v=5.1.3"></script><script type="text/javascript" src="/js/src/scroll-cookie.js?v=5.1.3"></script><script 
src="/live2dw/lib/L2Dwidget.min.js?094cbace49a39548bed64abff5988b05"></script><script>L2Dwidget.init({pluginRootPath:"live2dw/",pluginJsPath:"lib/",pluginModelPath:"assets/",tagMode:!1,debug:!1,model:{scale:1,jsonPath:"/live2dw/assets/hijiki.model.json"},display:{position:"left",width:100,height:200,hOffset:0,vOffset:-20},mobile:{show:!1,motion:!0,scale:.3},log:!1})</script></body></html>