Add doc_values for fields that need to be sorted or aggregated in ElasticSearch, and disable all others. #12782

Merged
merged 14 commits into from
Nov 24, 2024
7 changes: 7 additions & 0 deletions docs/en/changes/changes.md
Original file line number Diff line number Diff line change
@@ -1,5 +1,12 @@
## 10.2.0

#### Project
* Add [`doc_values`](https://www.elastic.co/guide/en/elasticsearch/reference/current/doc-values.html) for fields
that need to be sorted or aggregated in Elasticsearch, and disable all others.
  * This change does not affect existing deployments or their features for users of our official releases.
  * **Warning** If there are custom query plugins for our Elasticsearch indices, this change could break them, as
    sort and aggregation queries that use unexpected fields are blocked.

#### OAP Server

* Skip processing OTLP metrics data points with flag `FLAG_NO_RECORDED_VALUE`, which causes exceptional result.
@@ -58,6 +58,7 @@
import org.apache.skywalking.oap.server.core.storage.StorageBuilderFactory;
import org.apache.skywalking.oap.server.core.storage.annotation.BanyanDB;
import org.apache.skywalking.oap.server.core.storage.annotation.Column;
import org.apache.skywalking.oap.server.core.storage.annotation.ElasticSearch;
import org.apache.skywalking.oap.server.library.util.StringUtil;

import java.io.DataOutputStream;
@@ -205,6 +206,10 @@ private Class generateMetricsClass(AnalysisResult metricsStmt) throws OALCompile
Annotation banyanShardingKeyAnnotation = new Annotation(BanyanDB.SeriesID.class.getName(), constPool);
banyanShardingKeyAnnotation.addMemberValue("index", new IntegerMemberValue(constPool, 0));
annotationsAttribute.addAnnotation(banyanShardingKeyAnnotation);

// Entity id field should enable doc values.
final var enableDocValuesAnnotation = new Annotation(ElasticSearch.EnableDocValues.class.getName(), constPool);
annotationsAttribute.addAnnotation(enableDocValuesAnnotation);
}

if (field.isGroupByCondInTopN()) {
@@ -78,6 +78,7 @@ public StorageID id() {
private String id0;
@Column(name = ID1, storageOnly = true)
private String id1;
@ElasticSearch.EnableDocValues
Member

Why does alarm start time need this?

Member Author (kezhenxu94, Nov 21, 2024)

> Why does alarm start time need this?

All fields used for sorting or aggregation need this.

Member Author (kezhenxu94, Nov 21, 2024)

> Why does alarm start time need this?

This reminds me of one potential issue: if we only enable doc_values on the fields used in this repo, it might break third parties' custom plugins when they add features that aggregate or sort on fields for which we didn't enable doc_values. The extensibility will be restricted.
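To make the risk concrete, here is a minimal sketch of a guard that a custom plugin could run before building a sort or aggregation. It is not part of this PR: `SortFieldGuardSketch`, `DOC_VALUES_FIELDS`, `requireSortable`, and the field names in the allow-list are all hypothetical and only illustrate the idea that a field without doc_values is no longer safe to sort on.

```java
import java.util.Set;

class SortFieldGuardSketch {
    // Hypothetical allow-list of fields that keep doc_values enabled.
    // The names are illustrative, not read from the real index mappings.
    static final Set<String> DOC_VALUES_FIELDS =
        Set.of("entity_id", "latency", "timestamp", "start_time");

    // Returns the field unchanged if it can be sorted or aggregated on,
    // otherwise fails fast before any query is sent to the storage.
    static String requireSortable(String field) {
        if (!DOC_VALUES_FIELDS.contains(field)) {
            throw new IllegalArgumentException(
                "Field '" + field + "' has doc_values disabled and cannot be used for sorting or aggregation");
        }
        return field;
    }

    public static void main(String[] args) {
        System.out.println(requireSortable("latency"));
        try {
            requireSortable("uri");
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Failing in the plugin gives a clearer error than waiting for the storage to reject the query, but it also means the allow-list has to be kept in sync with the annotated columns.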

Member

Yes, it will. Generally, it may be worth seeing how much benefit we would get from this change. Could you run a benchmark for this?

Member

But to be honest, we have never guaranteed that users can read data from Elasticsearch on their own; we only guarantee access through our GraphQL/PromQL.

Member Author

@hanahmily can you share where you found that disabling doc_values can speed up query performance?

Member

No, he was talking about BanyanDB, and the concept was from this config.

Member

The key use cases are the _traffic indices and the trace/log indices. Their query conditions are not used for sorting or aggregation, so we could reduce the payload.

1. Value column of metrics.
2. Conditions of logs and traces (SkyWalking and Zipkin), excluding latency and timestamp, which are used in sorting.
3. All searchable fields in metadata (*_traffic).

In the original issue, I only proposed these three use cases. I have nothing more in mind.

Member Author

> Conditions for that, is not used for sorting and aggregation. So, we could reduce the payload.

What I’m wondering is this: what “payload” are we trying to reduce? As mentioned, disabling doc_values mainly reduces disk space; I don’t see how it would speed up query performance, as you said in #12782 (comment). If reducing disk space is our goal, disabling doc_values on all possible fields will maximize the outcome. That’s why I tend to disable doc_values by default and opt in only the fields that need this feature.
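The opt-in approach described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the code in this PR: the nested `EnableDocValues` annotation is a stand-in for `ElasticSearch.EnableDocValues`, and `SampleRecord`, `buildProperties`, and the type mapping are hypothetical. The only Elasticsearch fact relied on is that a field mapping accepts `"doc_values": false`.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.LinkedHashMap;
import java.util.Map;

class DocValuesMappingSketch {
    // Hypothetical stand-in for ElasticSearch.EnableDocValues.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.FIELD)
    @interface EnableDocValues {
    }

    // Hypothetical record: only the fields used for sorting opt in.
    static class SampleRecord {
        String traceId;
        String uri;
        @EnableDocValues
        long latency;
        @EnableDocValues
        long timestamp;
    }

    // Builds a "properties"-like map in which doc_values is disabled
    // unless the field carries the opt-in annotation.
    static Map<String, Map<String, Object>> buildProperties(Class<?> model) {
        Map<String, Map<String, Object>> properties = new LinkedHashMap<>();
        for (Field field : model.getDeclaredFields()) {
            if (field.isSynthetic()) {
                continue;
            }
            Map<String, Object> mapping = new LinkedHashMap<>();
            // Simplified type mapping, for illustration only.
            mapping.put("type", field.getType() == long.class ? "long" : "keyword");
            if (!field.isAnnotationPresent(EnableDocValues.class)) {
                mapping.put("doc_values", false);
            }
            properties.put(field.getName(), mapping);
        }
        return properties;
    }

    public static void main(String[] args) {
        buildProperties(SampleRecord.class)
            .forEach((name, mapping) -> System.out.println(name + " -> " + mapping));
    }
}
```

With a default of "disabled", forgetting the annotation on a sorted field shows up as a failing query rather than as wasted disk space, which is the trade-off discussed in this thread.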

Member

Reducing disk space is a benefit, and the speedup only comes from having smaller files/indices, which is what the BanyanDB side was asking about.
Sorry for the misleading statement.

@Column(name = START_TIME)
private long startTime;
@Column(name = ALARM_MESSAGE, length = 512)
@@ -89,6 +89,7 @@ public abstract class AbstractLogRecord extends Record {
private LongText content;
@Setter
@Getter
@ElasticSearch.EnableDocValues
@Column(name = TIMESTAMP)
private long timestamp;

@@ -30,6 +30,7 @@
import org.apache.skywalking.oap.server.core.storage.StorageID;
import org.apache.skywalking.oap.server.core.storage.annotation.BanyanDB;
import org.apache.skywalking.oap.server.core.storage.annotation.Column;
import org.apache.skywalking.oap.server.core.storage.annotation.ElasticSearch;
import org.apache.skywalking.oap.server.core.storage.type.Convert2Entity;
import org.apache.skywalking.oap.server.core.storage.type.Convert2Storage;
import org.apache.skywalking.oap.server.core.storage.type.StorageBuilder;
@@ -58,10 +59,12 @@ public class EndpointRelationServerSideMetrics extends Metrics {
private String destEndpoint;
@Setter
@Getter
@ElasticSearch.EnableDocValues
@Column(name = COMPONENT_ID, storageOnly = true)
private int componentId;
@Setter
@Getter
@ElasticSearch.EnableDocValues
@Column(name = ENTITY_ID, length = 512)
@BanyanDB.SeriesID(index = 0)
private String entityId;
@@ -30,6 +30,7 @@
import org.apache.skywalking.oap.server.core.storage.StorageID;
import org.apache.skywalking.oap.server.core.storage.annotation.BanyanDB;
import org.apache.skywalking.oap.server.core.storage.annotation.Column;
import org.apache.skywalking.oap.server.core.storage.annotation.ElasticSearch;
import org.apache.skywalking.oap.server.core.storage.type.Convert2Entity;
import org.apache.skywalking.oap.server.core.storage.type.Convert2Storage;
import org.apache.skywalking.oap.server.core.storage.type.StorageBuilder;
@@ -67,6 +68,7 @@ public class ServiceInstanceRelationClientSideMetrics extends Metrics {
private String destServiceInstanceId;
@Setter
@Getter
@ElasticSearch.EnableDocValues
@Column(name = ENTITY_ID, length = 512)
@BanyanDB.SeriesID(index = 0)
private String entityId;
@@ -30,6 +30,7 @@
import org.apache.skywalking.oap.server.core.storage.StorageID;
import org.apache.skywalking.oap.server.core.storage.annotation.BanyanDB;
import org.apache.skywalking.oap.server.core.storage.annotation.Column;
import org.apache.skywalking.oap.server.core.storage.annotation.ElasticSearch;
import org.apache.skywalking.oap.server.core.storage.type.Convert2Entity;
import org.apache.skywalking.oap.server.core.storage.type.Convert2Storage;
import org.apache.skywalking.oap.server.core.storage.type.StorageBuilder;
@@ -67,6 +68,7 @@ public class ServiceInstanceRelationServerSideMetrics extends Metrics {
private String destServiceInstanceId;
@Setter
@Getter
@ElasticSearch.EnableDocValues
@Column(name = ENTITY_ID, length = 512)
@BanyanDB.SeriesID(index = 0)
private String entityId;
@@ -31,6 +31,7 @@
import org.apache.skywalking.oap.server.core.storage.StorageID;
import org.apache.skywalking.oap.server.core.storage.annotation.BanyanDB;
import org.apache.skywalking.oap.server.core.storage.annotation.Column;
import org.apache.skywalking.oap.server.core.storage.annotation.ElasticSearch;
import org.apache.skywalking.oap.server.core.storage.type.Convert2Entity;
import org.apache.skywalking.oap.server.core.storage.type.Convert2Storage;
import org.apache.skywalking.oap.server.core.storage.type.StorageBuilder;
@@ -65,11 +66,13 @@ public class ProcessRelationClientSideMetrics extends Metrics {
private String destProcessId;
@Setter
@Getter
@ElasticSearch.EnableDocValues
@Column(name = ENTITY_ID, length = 512)
@BanyanDB.SeriesID(index = 0)
private String entityId;
@Setter
@Getter
@ElasticSearch.EnableDocValues
@Column(name = COMPONENT_ID, storageOnly = true)
@BanyanDB.SeriesID(index = 1)
private int componentId;
@@ -31,6 +31,7 @@
import org.apache.skywalking.oap.server.core.storage.StorageID;
import org.apache.skywalking.oap.server.core.storage.annotation.BanyanDB;
import org.apache.skywalking.oap.server.core.storage.annotation.Column;
import org.apache.skywalking.oap.server.core.storage.annotation.ElasticSearch;
import org.apache.skywalking.oap.server.core.storage.type.Convert2Entity;
import org.apache.skywalking.oap.server.core.storage.type.Convert2Storage;
import org.apache.skywalking.oap.server.core.storage.type.StorageBuilder;
@@ -64,11 +65,13 @@ public class ProcessRelationServerSideMetrics extends Metrics {
private String destProcessId;
@Setter
@Getter
@ElasticSearch.EnableDocValues
@Column(name = ENTITY_ID, length = 512)
@BanyanDB.SeriesID(index = 0)
private String entityId;
@Setter
@Getter
@ElasticSearch.EnableDocValues
@Column(name = COMPONENT_ID, storageOnly = true)
@BanyanDB.SeriesID(index = 1)
private int componentId;
@@ -60,9 +60,11 @@ public class ServiceRelationClientSideMetrics extends Metrics {
@Getter
@Column(name = COMPONENT_IDS, storageOnly = true)
@ElasticSearch.Keyword
@ElasticSearch.EnableDocValues
private IntList componentIds = new IntList(3);
@Setter
@Getter
@ElasticSearch.EnableDocValues
@Column(name = ENTITY_ID, length = 512)
@BanyanDB.SeriesID(index = 0)
private String entityId;
@@ -62,9 +62,11 @@ public class ServiceRelationServerSideMetrics extends Metrics {
@Getter
@Column(name = COMPONENT_IDS, storageOnly = true)
@ElasticSearch.Keyword
@ElasticSearch.EnableDocValues
private IntList componentIds = new IntList(3);
@Setter
@Getter
@ElasticSearch.EnableDocValues
@Column(name = ENTITY_ID, length = 512)
@BanyanDB.SeriesID(index = 0)
private String entityId;
@@ -30,6 +30,7 @@
import org.apache.skywalking.oap.server.core.storage.StorageID;
import org.apache.skywalking.oap.server.core.storage.annotation.BanyanDB;
import org.apache.skywalking.oap.server.core.storage.annotation.Column;
import org.apache.skywalking.oap.server.core.storage.annotation.ElasticSearch;
import org.apache.skywalking.oap.server.core.storage.type.Convert2Entity;
import org.apache.skywalking.oap.server.core.storage.type.Convert2Storage;
import org.apache.skywalking.oap.server.core.storage.type.StorageBuilder;
@@ -44,7 +45,7 @@
"tagKey",
"tagValue",
"tagType"
})
}, callSuper = true)
@BanyanDB.IndexMode
public class TagAutocompleteData extends Metrics {
public static final String INDEX_NAME = "tag_autocomplete";
@@ -55,6 +56,7 @@ public class TagAutocompleteData extends Metrics {
@Setter
@Getter
@Column(name = TAG_KEY)
@ElasticSearch.EnableDocValues
@BanyanDB.SeriesID(index = 1)
private String tagKey;
@Setter
@@ -84,11 +84,13 @@ public class SegmentRecord extends Record {
private String endpointId;
@Setter
@Getter
@ElasticSearch.EnableDocValues
@Column(name = START_TIME)
@BanyanDB.NoIndexing
private long startTime;
@Setter
@Getter
@ElasticSearch.EnableDocValues
@Column(name = LATENCY)
private int latency;
@Setter
@@ -27,6 +27,7 @@
import org.apache.skywalking.oap.server.core.storage.StorageID;
import org.apache.skywalking.oap.server.core.storage.annotation.BanyanDB;
import org.apache.skywalking.oap.server.core.storage.annotation.Column;
import org.apache.skywalking.oap.server.core.storage.annotation.ElasticSearch;
import org.apache.skywalking.oap.server.core.storage.type.Convert2Entity;
import org.apache.skywalking.oap.server.core.storage.type.Convert2Storage;
import org.apache.skywalking.oap.server.core.storage.type.StorageBuilder;
@@ -53,8 +54,10 @@ public class SpanAttachedEventRecord extends Record {
public static final String DATA_BINARY = "data_binary";
public static final String TIMESTAMP = "timestamp";

@ElasticSearch.EnableDocValues
@Column(name = START_TIME_SECOND)
private long startTimeSecond;
@ElasticSearch.EnableDocValues
@Column(name = START_TIME_NANOS)
private int startTimeNanos;
@Column(name = EVENT)
@@ -76,6 +79,7 @@ public class SpanAttachedEventRecord extends Record {
private byte[] dataBinary;
@Setter
@Getter
@ElasticSearch.EnableDocValues
@Column(name = TIMESTAMP)
@BanyanDB.NoIndexing
private long timestamp;
@@ -28,6 +28,7 @@
import org.apache.skywalking.oap.server.core.storage.StorageID;
import org.apache.skywalking.oap.server.core.storage.annotation.BanyanDB;
import org.apache.skywalking.oap.server.core.storage.annotation.Column;
import org.apache.skywalking.oap.server.core.storage.annotation.ElasticSearch;
import org.apache.skywalking.oap.server.core.storage.type.Convert2Entity;
import org.apache.skywalking.oap.server.core.storage.type.Convert2Storage;
import org.apache.skywalking.oap.server.core.storage.type.StorageBuilder;
@@ -52,17 +53,20 @@ public class SampledSlowTraceRecord extends Record {

@Column(name = SCOPE)
private int scope;
@ElasticSearch.EnableDocValues
@Column(name = ENTITY_ID)
@BanyanDB.SeriesID(index = 0)
private String entityId;
@Column(name = TRACE_ID, storageOnly = true)
private String traceId;
@Column(name = URI, storageOnly = true)
private String uri;
@ElasticSearch.EnableDocValues
@Column(name = LATENCY, dataType = Column.ValueDataType.SAMPLED_RECORD)
private long latency;
@Setter
@Getter
@ElasticSearch.EnableDocValues
@Column(name = TIMESTAMP)
private long timestamp;

@@ -28,6 +28,7 @@
import org.apache.skywalking.oap.server.core.storage.StorageID;
import org.apache.skywalking.oap.server.core.storage.annotation.BanyanDB;
import org.apache.skywalking.oap.server.core.storage.annotation.Column;
import org.apache.skywalking.oap.server.core.storage.annotation.ElasticSearch;
import org.apache.skywalking.oap.server.core.storage.type.Convert2Entity;
import org.apache.skywalking.oap.server.core.storage.type.Convert2Storage;
import org.apache.skywalking.oap.server.core.storage.type.StorageBuilder;
@@ -53,17 +54,20 @@ public class SampledStatus4xxTraceRecord extends Record {

@Column(name = SCOPE)
private int scope;
@ElasticSearch.EnableDocValues
@Column(name = ENTITY_ID)
@BanyanDB.SeriesID(index = 0)
private String entityId;
@Column(name = TRACE_ID, storageOnly = true)
private String traceId;
@Column(name = URI, storageOnly = true)
private String uri;
@ElasticSearch.EnableDocValues
@Column(name = LATENCY, dataType = Column.ValueDataType.SAMPLED_RECORD)
private long latency;
@Setter
@Getter
@ElasticSearch.EnableDocValues
@Column(name = TIMESTAMP)
private long timestamp;

@@ -28,6 +28,7 @@
import org.apache.skywalking.oap.server.core.storage.StorageID;
import org.apache.skywalking.oap.server.core.storage.annotation.BanyanDB;
import org.apache.skywalking.oap.server.core.storage.annotation.Column;
import org.apache.skywalking.oap.server.core.storage.annotation.ElasticSearch;
import org.apache.skywalking.oap.server.core.storage.type.Convert2Entity;
import org.apache.skywalking.oap.server.core.storage.type.Convert2Storage;
import org.apache.skywalking.oap.server.core.storage.type.StorageBuilder;
@@ -53,17 +54,20 @@ public class SampledStatus5xxTraceRecord extends Record {

@Column(name = SCOPE)
private int scope;
@ElasticSearch.EnableDocValues
@Column(name = ENTITY_ID)
@BanyanDB.SeriesID(index = 0)
private String entityId;
@Column(name = TRACE_ID, storageOnly = true)
private String traceId;
@Column(name = URI, storageOnly = true)
private String uri;
@ElasticSearch.EnableDocValues
@Column(name = LATENCY, dataType = Column.ValueDataType.SAMPLED_RECORD)
private long latency;
@Setter
@Getter
@ElasticSearch.EnableDocValues
@Column(name = TIMESTAMP)
private long timestamp;

@@ -32,6 +32,7 @@
import org.apache.skywalking.oap.server.core.storage.StorageID;
import org.apache.skywalking.oap.server.core.storage.annotation.BanyanDB;
import org.apache.skywalking.oap.server.core.storage.annotation.Column;
import org.apache.skywalking.oap.server.core.storage.annotation.ElasticSearch;
import org.apache.skywalking.oap.server.core.storage.type.Convert2Entity;
import org.apache.skywalking.oap.server.core.storage.type.Convert2Storage;
import org.apache.skywalking.oap.server.core.storage.type.StorageBuilder;
@@ -50,6 +51,7 @@ public abstract class HistogramFunction extends Meter implements AcceptableValue

@Setter
@Getter
@ElasticSearch.EnableDocValues
@Column(name = ENTITY_ID, length = 512)
@BanyanDB.SeriesID(index = 0)
private String entityId;
@@ -35,6 +35,7 @@
import org.apache.skywalking.oap.server.core.storage.StorageID;
import org.apache.skywalking.oap.server.core.storage.annotation.BanyanDB;
import org.apache.skywalking.oap.server.core.storage.annotation.Column;
import org.apache.skywalking.oap.server.core.storage.annotation.ElasticSearch;
import org.apache.skywalking.oap.server.core.storage.type.Convert2Entity;
import org.apache.skywalking.oap.server.core.storage.type.Convert2Storage;
import org.apache.skywalking.oap.server.core.storage.type.StorageBuilder;
@@ -49,6 +50,7 @@ public abstract class AvgFunction extends Meter implements AcceptableValue<Long>

@Setter
@Getter
@ElasticSearch.EnableDocValues
@Column(name = ENTITY_ID, length = 512)
@BanyanDB.SeriesID(index = 0)
private String entityId;
@@ -73,6 +75,7 @@ public abstract class AvgFunction extends Meter implements AcceptableValue<Long>
protected long count;
@Getter
@Setter
@ElasticSearch.EnableDocValues
@Column(name = VALUE, dataType = Column.ValueDataType.COMMON_VALUE)
@BanyanDB.MeasureField
private long value;
@@ -61,6 +61,7 @@ public abstract class AvgHistogramFunction extends Meter implements AcceptableVa

@Setter
@Getter
@ElasticSearch.EnableDocValues
@Column(name = ENTITY_ID, length = 512)
@BanyanDB.SeriesID(index = 0)
private String entityId;
@@ -76,6 +76,7 @@ public abstract class AvgHistogramPercentileFunction extends Meter implements Ac

@Setter
@Getter
@ElasticSearch.EnableDocValues
@Column(name = ENTITY_ID)
@BanyanDB.SeriesID(index = 0)
private String entityId;
@@ -50,6 +50,7 @@ public abstract class AvgLabeledFunction extends Meter implements AcceptableValu

@Setter
@Getter
@ElasticSearch.EnableDocValues
@Column(name = ENTITY_ID, length = 512)
@BanyanDB.SeriesID(index = 0)
private String entityId;