Merge branch 'main' into security-tokens-no-index-refresh
albertzaharovits committed Jul 24, 2023
2 parents c7f774f + 3e07b7c commit 947363f
Showing 220 changed files with 1,970 additions and 865 deletions.
Original file line number Diff line number Diff line change
@@ -116,8 +116,9 @@ private void checkModuleVersion(ModuleReference mref) {

private void checkModuleNamePrefix(ModuleReference mref) {
getLogger().info("{} checking module name prefix for {}", this, mref.descriptor().name());
if (mref.descriptor().name().startsWith("org.elasticsearch.") == false) {
throw new GradleException("Expected name starting with \"org.elasticsearch.\", in " + mref.descriptor());
if (mref.descriptor().name().startsWith("org.elasticsearch.") == false
&& mref.descriptor().name().startsWith("co.elastic.") == false) {
throw new GradleException("Expected name starting with \"org.elasticsearch.\" or \"co.elastic.\" in " + mref.descriptor());
}
}
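The new check accepts either prefix. A standalone sketch of the same logic (the class and method names here are illustrative, not the Gradle plugin's actual API):

```java
import java.util.List;

public class ModuleNameCheck {
    // Allowed module-name prefixes after this change.
    private static final List<String> ALLOWED_PREFIXES = List.of("org.elasticsearch.", "co.elastic.");

    // Returns true when the module name starts with an allowed prefix;
    // the real plugin throws a GradleException when this is false.
    static boolean hasAllowedPrefix(String moduleName) {
        return ALLOWED_PREFIXES.stream().anyMatch(moduleName::startsWith);
    }

    public static void main(String[] args) {
        System.out.println(hasAllowedPrefix("org.elasticsearch.server"));
        System.out.println(hasAllowedPrefix("co.elastic.logging"));
        System.out.println(hasAllowedPrefix("com.example.plugin"));
    }
}
```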

5 changes: 5 additions & 0 deletions docs/changelog/96515.yaml
@@ -0,0 +1,5 @@
pr: 96515
summary: Support boxplot aggregation in transform
area: Transform
type: enhancement
issues: []
5 changes: 5 additions & 0 deletions docs/changelog/97683.yaml
@@ -0,0 +1,5 @@
pr: 97683
summary: Refactor nested field handling in `FieldFetcher`
area: Search
type: enhancement
issues: []
6 changes: 6 additions & 0 deletions docs/changelog/97840.yaml
@@ -0,0 +1,6 @@
pr: 97840
summary: Improve exception handling in Coordinator#publish
area: Cluster Coordination
type: bug
issues:
- 97798
32 changes: 17 additions & 15 deletions docs/reference/aggregations/metrics/geoline-aggregation.asciidoc
@@ -1,8 +1,8 @@
[role="xpack"]
[[search-aggregations-metrics-geo-line]]
=== Geo-Line Aggregation
=== Geo-line aggregation
++++
<titleabbrev>Geo-Line</titleabbrev>
<titleabbrev>Geo-line</titleabbrev>
++++

The `geo_line` aggregation aggregates all `geo_point` values within a bucket into a `LineString` ordered
@@ -77,13 +77,12 @@ Which returns:
The resulting https://tools.ietf.org/html/rfc7946#section-3.2[GeoJSON Feature] contains both a `LineString` geometry
for the path generated by the aggregation, as well as a map of `properties`.
The property `complete` informs of whether all documents matched were used to generate the geometry.
The `size` option described below can be used to limit the number of documents included in the aggregation,
The <<search-aggregations-metrics-geo-line-size,`size` option>> can be used to limit the number of documents included in the aggregation,
leading to results with `complete: false`.
Exactly which documents are dropped from results depends on whether the aggregation is based
on `time_series` or not, and this is discussed in
<<search-aggregations-metrics-geo-line-grouping-time-series-advantages,more detail below>>.
Exactly which documents are dropped from results <<search-aggregations-metrics-geo-line-grouping-time-series-advantages,depends on whether the aggregation is based
on `time_series` or not>>.

The above result could be displayed in a map user interface:
This result could be displayed in a map user interface:

image:images/spatial/geo_line.png[Kibana map with museum tour of Amsterdam]

@@ -132,18 +131,19 @@ feature properties.
The line is sorted in ascending order by the sort key when set to "ASC", and in descending order when set to "DESC".

[[search-aggregations-metrics-geo-line-size]]
`size`::
(Optional, integer, default: `10000`) The maximum length of the line represented in the aggregation.
Valid sizes are between one and 10000.
Within <<search-aggregations-metrics-geo-line-grouping-time-series,`time_series`>>
the aggregation uses line simplification to constrain the size, otherwise it uses truncation.
See <<search-aggregations-metrics-geo-line-grouping-time-series-advantages,below>>
Refer to <<search-aggregations-metrics-geo-line-grouping-time-series-advantages>>
for a discussion on the subtleties involved.
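The two behaviours can be contrasted with a toy sketch (the method names and the evenly-spaced "simplification" below are illustrative stand-ins; the real implementation scores points by geometric significance):

```java
import java.util.ArrayList;
import java.util.List;

public class SizeLimit {
    // Truncation: keep only the first `size` points, dropping the tail.
    static List<Integer> truncate(List<Integer> points, int size) {
        return new ArrayList<>(points.subList(0, Math.min(size, points.size())));
    }

    // Stand-in for line simplification: keep `size` evenly spaced points,
    // always retaining both endpoints, so the line's overall shape survives.
    static List<Integer> simplify(List<Integer> points, int size) {
        if (points.size() <= size) return new ArrayList<>(points);
        List<Integer> kept = new ArrayList<>();
        for (int i = 0; i < size; i++) {
            kept.add(points.get(i * (points.size() - 1) / (size - 1)));
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Integer> pts = List.of(10, 20, 30, 40, 50);
        System.out.println(truncate(pts, 3)); // keeps the head of the line
        System.out.println(simplify(pts, 3)); // keeps first and last points
    }
}
```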

[[search-aggregations-metrics-geo-line-grouping]]
==== Grouping

The simple example above will produce a single track for all the data selected by the query. However, it is far more
This simple example produces a single track for all the data selected by the query. However, it is far more
common to need to group the data into multiple tracks. For example, grouping flight transponder measurements by
flight call-sign before sorting each flight by timestamp and producing a separate track for each.

@@ -210,7 +210,7 @@ POST /tour/_bulk?refresh
[[search-aggregations-metrics-geo-line-grouping-terms]]
==== Grouping with terms

Using the above data, for a non-time-series use case, the grouping can be done using a
Using this data, for a non-time-series use case, the grouping can be done using a
<<search-aggregations-bucket-terms-aggregation,terms aggregation>> based on city name.
This would work whether or not we had defined the `tour` index as a time series index.

@@ -294,17 +294,19 @@ Which returns:
----
// TESTRESPONSE

The above results contain an array of buckets, where each bucket is a JSON object with the `key` showing the name
These results contain an array of buckets, where each bucket is a JSON object with the `key` showing the name
of the `city` field, and an inner aggregation result called `museum_tour` containing a
https://tools.ietf.org/html/rfc7946#section-3.2[GeoJSON Feature] describing the
actual route between the various attractions in that city.
Each result also includes a `properties` object with a `complete` value which will be `false` if the geometry
was truncated to the limits specified in the `size` parameter.
Note that when we use `time_series` in the example below, we will get the same results structured a little differently.
Note that when we use `time_series` in the next example, we will get the same results structured a little differently.

[[search-aggregations-metrics-geo-line-grouping-time-series]]
==== Grouping with time-series

preview::[]

Using the same data as before, we can also perform the grouping with a
<<search-aggregations-bucket-time-series-aggregation,`time_series` aggregation>>.
This will group by TSID, which is defined as the combinations of all fields with `time_series_dimension: true`,
@@ -337,7 +339,7 @@ NOTE: The `geo_line` aggregation no longer requires the `sort` field when nested
This is because the sort field is set to `@timestamp`, which all time-series indexes are pre-sorted by.
If you do set this parameter and set it to something other than `@timestamp`, you will get an error.

The above query will result in:
This query will result in:

[source,js]
----
@@ -400,7 +402,7 @@ The above query will result in:
----
// TESTRESPONSE

The above results are essentially the same as with the previous `terms` aggregation example, but structured differently.
These results are essentially the same as with the previous `terms` aggregation example, but structured differently.
Here we see the buckets returned as a map, where the key is an internal description of the TSID.
This TSID is unique for each unique combination of fields with `time_series_dimension: true`.
Each bucket contains a `key` field which is also a map of all dimension values for the TSID, in this case only the city
@@ -414,7 +416,7 @@ was simplified to the limits specified in the `size` parameter.
[[search-aggregations-metrics-geo-line-grouping-time-series-advantages]]
==== Why group with time-series?

When reviewing the above examples, you might think that there is little difference between using
When reviewing these examples, you might think that there is little difference between using
<<search-aggregations-bucket-terms-aggregation,`terms`>> or
<<search-aggregations-bucket-time-series-aggregation,`time_series`>>
to group the geo-lines. However, there are some important differences in behaviour between the two cases.
29 changes: 15 additions & 14 deletions docs/reference/how-to/size-your-shards.asciidoc
@@ -140,20 +140,21 @@ Every new backing index is an opportunity to further tune your strategy.

[discrete]
[[shard-size-recommendation]]
==== Aim for shard sizes between 10GB and 50GB

Larger shards take longer to recover after a failure. When a node fails, {es}
rebalances the node's shards across the data tier's remaining nodes. This
recovery process typically involves copying the shard contents across the
network, so a 100GB shard will take twice as long to recover than a 50GB shard.
In contrast, small shards carry proportionally more overhead and are less
efficient to search. Searching fifty 1GB shards will take substantially more
resources than searching a single 50GB shard containing the same data.

There are no hard limits on shard size, but experience shows that shards
between 10GB and 50GB typically work well for logs and time series data. You
may be able to use larger shards depending on your network and use case.
Smaller shards may be appropriate for
==== Aim for shards of up to 200M documents, or with sizes between 10GB and 50GB

There is some overhead associated with each shard, both in terms of cluster
management and search performance. Searching a thousand 50MB shards will be
substantially more expensive than searching a single 50GB shard containing the
same data. However, very large shards can also cause slower searches and will
take longer to recover after a failure.

There is no hard limit on the physical size of a shard, and each shard can in
theory contain up to just over two billion documents. However, experience shows
that shards between 10GB and 50GB typically work well for many use cases, as
long as the per-shard document count is kept below 200 million.

You may be able to use larger shards depending on your network and use case,
and smaller shards may be appropriate for
{enterprise-search-ref}/index.html[Enterprise Search] and similar use cases.

If you use {ilm-init}, set the <<ilm-rollover,rollover action>>'s
2 changes: 1 addition & 1 deletion docs/reference/migration/apis/feature-migration.asciidoc
@@ -142,7 +142,7 @@ Example response:
"migration_status" : "NO_MIGRATION_NEEDED"
}
--------------------------------------------------
// TESTRESPONSE[s/"minimum_index_version" : "8100099"/"minimum_index_version" : $body.$_path/]
// TESTRESPONSE[skip:"AwaitsFix https://github.com/elastic/elasticsearch/issues/97780"]

When you submit a POST request to the `_migration/system_features` endpoint to
start the migration process, the response indicates what features will be
1 change: 1 addition & 0 deletions docs/reference/rest-api/common-parms.asciidoc
@@ -767,6 +767,7 @@ currently supported:
+
--
* <<search-aggregations-metrics-avg-aggregation,Average>>
* <<search-aggregations-metrics-boxplot-aggregation,Boxplot>>
* <<search-aggregations-pipeline-bucket-script-aggregation,Bucket script>>
* <<search-aggregations-pipeline-bucket-selector-aggregation,Bucket selector>>
* <<search-aggregations-metrics-cardinality-aggregation,Cardinality>>
2 changes: 1 addition & 1 deletion docs/reference/scripting/security.asciidoc
@@ -36,7 +36,7 @@ configured to run both types of scripts. To limit what type of scripts are run,
set `script.allowed_types` to `inline` or `stored`. To prevent any scripts from
running, set `script.allowed_types` to `none`.

IMPORTANT: If you use {kib}, set `script.allowed_types` to `both` or `inline`.
IMPORTANT: If you use {kib}, set `script.allowed_types` to both or just `inline`.
Some {kib} features rely on inline scripts and do not function as expected
if {es} does not allow inline scripts.
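For instance, a minimal sketch of the setting in `elasticsearch.yml` (values shown for illustration):

```yaml
# Allow only inline scripts; stored scripts are rejected.
script.allowed_types: inline

# To allow both types, list them explicitly instead:
# script.allowed_types: inline, stored
```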

@@ -131,7 +131,7 @@ public XContentBuilder toXContent(XContentBuilder builder, Params params) throws
builder.field("cluster_uuid", clusterUuid);
builder.startObject("version")
.field("number", build.qualifiedVersion())
.field("build_flavor", "default")
.field("build_flavor", build.flavor())
.field("build_type", build.type().displayName())
.field("build_hash", build.hash())
.field("build_date", build.date())
@@ -48,6 +48,7 @@ protected Collection<Class<? extends Plugin>> nodePlugins() {
}

/** Check that the reset method cleans up a feature */
@AwaitsFix(bugUrl = "https://github.com/elastic/elasticsearch/issues/97780")
public void testResetSystemIndices() throws Exception {
String systemIndex1 = ".test-system-idx-1";
String systemIndex2 = ".second-test-system-idx-1";
@@ -541,15 +541,25 @@ static ReducedQueryPhase reducedQueryPhase(
);
}
int total = queryResults.size();
queryResults = queryResults.stream().filter(res -> res.queryResult().isNull() == false).toList();
String errorMsg = "must have at least one non-empty search result, got 0 out of " + total;
assert queryResults.isEmpty() == false : errorMsg;
if (queryResults.isEmpty()) {
throw new IllegalStateException(errorMsg);
final Collection<SearchPhaseResult> nonNullResults = new ArrayList<>();
boolean hasSuggest = false;
boolean hasProfileResults = false;
for (SearchPhaseResult queryResult : queryResults) {
var res = queryResult.queryResult();
if (res.isNull()) {
continue;
}
hasSuggest |= res.suggest() != null;
hasProfileResults |= res.hasProfileResults();
nonNullResults.add(queryResult);
}
queryResults = nonNullResults;
validateMergeSortValueFormats(queryResults);
final boolean hasSuggest = queryResults.stream().anyMatch(res -> res.queryResult().suggest() != null);
final boolean hasProfileResults = queryResults.stream().anyMatch(res -> res.queryResult().hasProfileResults());
if (queryResults.isEmpty()) {
var ex = new IllegalStateException("must have at least one non-empty search result, got 0 out of " + total);
assert false : ex;
throw ex;
}

// count the total (we use the query result provider here, since we might not get any hits (we scrolled past them))
final Map<String, List<Suggestion<?>>> groupedSuggestions = hasSuggest ? new HashMap<>() : Collections.emptyMap();
@@ -578,9 +588,7 @@ static ReducedQueryPhase reducedQueryPhase(
}
}
}
if (bufferedTopDocs.isEmpty() == false) {
assert result.hasConsumedTopDocs() : "firstResult has no aggs but we got non null buffered aggs?";
}
assert bufferedTopDocs.isEmpty() || result.hasConsumedTopDocs() : "firstResult has no aggs but we got non null buffered aggs?";
if (hasProfileResults) {
String key = result.getSearchShardTarget().toString();
profileShardResults.put(key, result.consumeProfileResult());
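The refactor above collapses one filtering pass and two `anyMatch` stream passes into a single loop that filters the results and computes both flags at once. A simplified sketch with a stand-in result type (not the actual `SearchPhaseResult` API):

```java
import java.util.ArrayList;
import java.util.List;

public class SinglePassReduce {
    // Stand-in for a per-shard query result.
    record Result(boolean isNull, boolean hasSuggest, boolean hasProfile) {}

    // Single pass: drop empty results and accumulate both flags.
    // flags[0] plays the role of hasSuggest, flags[1] of hasProfileResults.
    static List<Result> filterAndScan(List<Result> results, boolean[] flags) {
        List<Result> nonNull = new ArrayList<>();
        for (Result r : results) {
            if (r.isNull()) continue;       // skip empty shard results
            flags[0] |= r.hasSuggest();
            flags[1] |= r.hasProfile();
            nonNull.add(r);
        }
        if (nonNull.isEmpty()) {
            throw new IllegalStateException(
                "must have at least one non-empty search result, got 0 out of " + results.size());
        }
        return nonNull;
    }

    public static void main(String[] args) {
        boolean[] flags = new boolean[2];
        List<Result> out = filterAndScan(
            List.of(new Result(true, false, false), new Result(false, true, false)), flags);
        System.out.println(out.size() + " " + flags[0] + " " + flags[1]);
    }
}
```

The single loop avoids re-reading each `queryResult()` three times, which matters when the result set is large.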
