This repository has been archived by the owner on Aug 2, 2022. It is now read-only.

Extend the PPL identifier definition #888

Merged
merged 3 commits into from
Dec 4, 2020
16 changes: 9 additions & 7 deletions docs/experiment/ppl/general/identifiers.rst
@@ -23,6 +23,13 @@ Description

A regular identifier is a string of characters that must start with an ASCII letter (lower or upper case). Each subsequent character can be a letter, a digit, or an underscore (``_``). It cannot be a reserved keyword, and it cannot contain whitespace or other special characters.

For Elasticsearch, the following identifiers are also supported as extensions:

1. Identifiers prefixed by a dot ``.``: this is called a hidden index in Elasticsearch, for example ``.kibana``.
2. Identifiers prefixed by an at sign ``@``: this is common for meta fields generated during Logstash ingestion.
3. Identifiers with ``-`` in the middle: this is mostly the case for index names that contain date information.
4. Identifiers containing a star ``*``: this is mostly an index pattern used for wildcard matching.

Examples
--------

@@ -46,12 +53,7 @@ Delimited Identifiers
Description
-----------

A delimited identifier is an identifier enclosed in back ticks `````. In this case, the identifier enclosed is not necessarily a regular identifier. In other words, it can contain any special character not allowed by regular identifier. For Elasticsearch, the following identifiers are supported extensionally:

1. Identifiers prefixed by dot ``.``: this is called hidden index in Elasticsearch, for example ``.kibana``.
2. Identifiers prefixed by at sign ``@``: this is common for meta fields generated in Logstash ingestion.
3. Identifiers with ``-`` in the middle: this is mostly the case for index name with date information.
4. Identifiers with star ``*`` present: this is mostly an index pattern for wildcard match.
A delimited identifier is an identifier enclosed in backticks `````. In this case, the enclosed identifier need not be a regular identifier; in other words, it can contain special characters that a regular identifier does not allow.

Use Cases
---------
@@ -67,7 +69,7 @@ Examples

Here are examples of quoting an index name with backticks::

od> source=`acc*` | fields `account_number`;
od> source=`accounts` | fields `account_number`;
fetched rows / total rows = 4/4
+------------------+
| account_number |
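The identifier rules documented above can be approximated in Java for illustration. This sketch is not part of the PR: the class `IdentifierRules`, its methods, and the small keyword set are hypothetical, the regular expressions paraphrase the documentation rather than the ANTLR grammar, and backslashes inside delimited identifiers are not handled. `quote` doubles embedded backticks, which is one of the escapes the `BQUOTA_STRING` lexer rule accepts.

```java
import java.util.Locale;
import java.util.Set;
import java.util.regex.Pattern;

/** Hypothetical helper approximating the documented PPL identifier rules. */
public final class IdentifierRules {

  // Regular identifier: an ASCII letter followed by letters, digits or underscores.
  private static final Pattern REGULAR = Pattern.compile("[A-Za-z][A-Za-z0-9_]*");

  // Extended forms: optional leading '.' or '@', then letters, digits, '_', '-' or '*'.
  // Looser than "'-' in the middle": a trailing '-' is also accepted here.
  private static final Pattern EXTENDED = Pattern.compile("[.@]?[A-Za-z*][A-Za-z0-9_*-]*");

  // Placeholder subset of reserved keywords, for illustration only.
  private static final Set<String> KEYWORDS = Set.of("SOURCE", "FIELDS", "STATS", "WHERE");

  private IdentifierRules() {}

  private static boolean isKeyword(String id) {
    return KEYWORDS.contains(id.toUpperCase(Locale.ROOT));
  }

  /** True if the string satisfies the documented regular identifier rule. */
  public static boolean isRegular(String id) {
    return REGULAR.matcher(id).matches() && !isKeyword(id);
  }

  /** True if the string is a regular identifier or one of the four Elasticsearch extensions. */
  public static boolean isExtended(String id) {
    return EXTENDED.matcher(id).matches() && !isKeyword(id);
  }

  /** Wraps an identifier in backticks, doubling any embedded backtick. */
  public static String quote(String id) {
    return "`" + id.replace("`", "``") + "`";
  }

  /** Leaves acceptable identifiers unchanged and turns everything else into a delimited identifier. */
  public static String quoteIfNeeded(String id) {
    return isExtended(id) ? id : quote(id);
  }

  public static void main(String[] args) {
    String[] samples = {"account_number", ".kibana", "@timestamp",
        "log-2020-10-*", "log.2020.10.10", "first name"};
    for (String id : samples) {
      System.out.println(id + " -> " + quoteIfNeeded(id));
    }
  }
}
```

Under these assumptions, `log.2020.10.10` comes back in delimited form, which is consistent with the new test that expects an unquoted index name with a dot in the middle to be rejected.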
@@ -79,17 +79,18 @@ public void testStatsMax() throws IOException {
@Test
public void testStatsNested() throws IOException {
JSONObject response =
executeQuery(String.format("source=%s | stats avg(abs(age)*2) as AGE", TEST_INDEX_ACCOUNT));
executeQuery(String.format("source=%s | stats avg(abs(age) * 2) as AGE",
TEST_INDEX_ACCOUNT));
verifySchema(response, schema("AGE", null, "double"));
verifyDataRows(response, rows(60.342));
}

@Test
public void testStatsNestedDoubleValue() throws IOException {
JSONObject response =
executeQuery(String.format("source=%s | stats avg(abs(age)*2.0)",
executeQuery(String.format("source=%s | stats avg(abs(age) * 2.0)",
TEST_INDEX_ACCOUNT));
verifySchema(response, schema("avg(abs(age)*2.0)", null, "double"));
verifySchema(response, schema("avg(abs(age) * 2.0)", null, "double"));
verifyDataRows(response, rows(60.342));
}

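The integration tests now put spaces around `*`. A plausible reason, assuming ANTLR's longest-match rule, is that the new `ID_LITERAL` fragment can start with `*`, so in `abs(age)*2` the text `*2` is a longer match as `ID` than `*` alone is as a multiplication token. The sketch below models longest-match tokenization with `java.util.regex`. The class name is hypothetical, the `STAR`, `LPAREN`, and `RPAREN` token names are illustrative rather than copied from the grammar, and it assumes the star rule is declared before `ID`, so equal-length matches go to the star.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical longest-match tokenizer illustrating why "*2" can lex as an identifier. */
public class MaximalMunchDemo {

  // Rules in declaration order; on equal-length matches the earlier rule wins.
  private static final Map<String, Pattern> RULES = new LinkedHashMap<>();

  static {
    RULES.put("STAR", Pattern.compile("\\*"));
    RULES.put("LPAREN", Pattern.compile("\\("));
    RULES.put("RPAREN", Pattern.compile("\\)"));
    RULES.put("INTEGER_LITERAL", Pattern.compile("[0-9]+"));
    // Approximation of the new fragment ID_LITERAL: [@*A-Z]+?[*A-Z_\-0-9]*
    RULES.put("ID", Pattern.compile("[@*A-Za-z]+?[*A-Za-z_\\-0-9]*"));
    RULES.put("WS", Pattern.compile("\\s+"));
  }

  /** Returns tokens as "RULE:text", skipping whitespace. */
  public static List<String> tokenize(String input) {
    List<String> tokens = new ArrayList<>();
    int pos = 0;
    while (pos < input.length()) {
      String bestRule = null;
      int bestEnd = pos;
      for (Map.Entry<String, Pattern> rule : RULES.entrySet()) {
        Matcher m = rule.getValue().matcher(input);
        m.region(pos, input.length());
        // Only a strictly longer match replaces the current best, so ties keep the earlier rule.
        if (m.lookingAt() && m.end() > bestEnd) {
          bestRule = rule.getKey();
          bestEnd = m.end();
        }
      }
      if (bestRule == null) {
        throw new IllegalArgumentException("No rule matches at offset " + pos);
      }
      if (!bestRule.equals("WS")) {
        tokens.add(bestRule + ":" + input.substring(pos, bestEnd));
      }
      pos = bestEnd;
    }
    return tokens;
  }

  public static void main(String[] args) {
    System.out.println(tokenize("abs(age)*2"));
    System.out.println(tokenize("abs(age) * 2"));
  }
}
```

With the spaces, `*` is followed by whitespace, so both rules match a single character and the tie goes to the earlier star rule. The expected column name in `verifySchema` appears to echo the query text, which would explain why it changes along with the query.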
4 changes: 1 addition & 3 deletions ppl/src/main/antlr/OpenDistroPPLLexer.g4
@@ -230,13 +230,11 @@ CONCAT_WS: 'CONCAT_WS';
LENGTH: 'LENGTH';
STRCMP: 'STRCMP';

// LITERALS AND VALUES
//STRING_LITERAL: DQUOTA_STRING | SQUOTA_STRING | BQUOTA_STRING;
ID: ID_LITERAL;
INTEGER_LITERAL: DEC_DIGIT+;
DECIMAL_LITERAL: (DEC_DIGIT+)? '.' DEC_DIGIT+;

fragment ID_LITERAL: [A-Z_]+[A-Z_$0-9@\-]*;
fragment ID_LITERAL: [@*A-Z]+?[*A-Z_\-0-9]*;
DQUOTA_STRING: '"' ( '\\'. | '""' | ~('"'| '\\') )* '"';
SQUOTA_STRING: '\'' ('\\'. | '\'\'' | ~('\'' | '\\'))* '\'';
BQUOTA_STRING: '`' ( '\\'. | '``' | ~('`'|'\\'))* '`';
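To see what the lexer change does to plain identifiers, the old and new `ID_LITERAL` fragments can be translated into Java regular expressions. This sketch assumes the grammar is matched against upper-cased input, so `[A-Z]` is widened to `[A-Za-z]`, and it only checks whether a whole string matches each fragment; it does not model the rest of the grammar. The class name is hypothetical.

```java
import java.util.regex.Pattern;

/** Hypothetical comparison of the old and new ID_LITERAL fragments. */
public class IdLiteralComparison {

  // Old fragment: [A-Z_]+[A-Z_$0-9@\-]*
  static final Pattern OLD_ID = Pattern.compile("[A-Za-z_]+[A-Za-z_$0-9@\\-]*");

  // New fragment: [@*A-Z]+?[*A-Z_\-0-9]*
  static final Pattern NEW_ID = Pattern.compile("[@*A-Za-z]+?[*A-Za-z_\\-0-9]*");

  static boolean matchesOld(String s) {
    return OLD_ID.matcher(s).matches();
  }

  static boolean matchesNew(String s) {
    return NEW_ID.matcher(s).matches();
  }

  public static void main(String[] args) {
    String[] samples = {"account_number", "@timestamp", "log-2020-10-*", "_id", ".kibana"};
    for (String s : samples) {
      System.out.printf("%-16s old=%-5b new=%b%n", s, matchesOld(s), matchesNew(s));
    }
  }
}
```

Read literally, the new fragment accepts `@timestamp` and `log-2020-10-*` but no longer accepts a leading underscore such as `_id`. Neither fragment matches `.kibana`, so the dot prefix exercised by `testIdentifierAsIndexNameStartWithDot` is presumably handled by a parser rule outside this diff.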
@@ -50,12 +50,18 @@

import com.amazon.opendistroforelasticsearch.sql.ast.Node;
import com.amazon.opendistroforelasticsearch.sql.ast.tree.RareTopN.CommandType;
import com.amazon.opendistroforelasticsearch.sql.common.antlr.SyntaxCheckException;
import com.amazon.opendistroforelasticsearch.sql.ppl.antlr.PPLSyntaxParser;
import org.junit.Ignore;
import org.junit.Rule;
import org.junit.Test;
import org.junit.rules.ExpectedException;

public class AstBuilderTest {

@Rule
public ExpectedException exceptionRule = ExpectedException.none();

private PPLSyntaxParser parser = new PPLSyntaxParser();

@Test
@@ -366,6 +372,40 @@ public void testIndexName() {
));
}

@Test
public void testIdentifierAsIndexNameStartWithDot() {
assertEqual("source=.kibana",
relation(".kibana"));
}

@Test
public void identifierAsIndexNameWithDotInTheMiddleThrowException() {
exceptionRule.expect(SyntaxCheckException.class);
plan("source=log.2020.10.10");
}

@Test
public void testIdentifierAsIndexNameWithDashInTheMiddle() {
assertEqual("source=log-2020",
relation("log-2020"));
}

@Test
public void testIdentifierAsIndexNameContainStar() {
assertEqual("source=log-2020-10-*",
relation("log-2020-10-*"));
}

@Test
public void testIdentifierAsFieldNameStartWithAt() {
assertEqual("source=log-2020 | fields @timestamp",
projectWithArg(
relation("log-2020"),
defaultFieldsArgs(),
field("@timestamp")
));
}

@Test
public void testRareCommand() {
assertEqual("source=t | rare a",