This repository has been archived by the owner on Aug 2, 2022. It is now read-only.

Update out-of-date documentation (#820)
* Update readme and dev guide

* Update partiql doc for nested field

* Add PPL metrics

* Add PPL metrics

* Add PPL metrics

* Add PPL metrics

* Prepare PR
dai-chen authored Nov 18, 2020
1 parent 50ce34f commit 815f4ad
Showing 5 changed files with 193 additions and 50 deletions.
9 changes: 7 additions & 2 deletions README.md
@@ -21,7 +21,12 @@ The following projects have been merged into this repository as separate folders

## Documentation

Please refer to the [reference manual](./docs/user/index.rst) and [technical documentation](https://opendistro.github.io/for-elasticsearch-docs) for detailed information on installing and configuring opendistro-elasticsearch-sql plugin. Looking to contribute? Read the instructions on [Development Guide](./docs/developing.rst) and then submit a patch!
Please refer to the [SQL Language Reference Manual](./docs/user/index.rst), [Piped Processing Language (PPL) Reference Manual](./docs/experiment/ppl/index.rst), and [Technical Documentation](https://opendistro.github.io/for-elasticsearch-docs) for detailed information on installing and configuring the opendistro-elasticsearch-sql plugin. Looking to contribute? Read the instructions in the [Development Guide](./docs/developing.rst) and then submit a patch!


## Experimental

Recently we have been actively improving our query engine, primarily for better correctness and extensibility. The enhanced query engine already powers the newly released Piped Processing Language behind the scenes, and integration with the SQL language is under way. To try out the power of the new query engine with SQL, simply enable it via the [plugin setting](https://github.com/opendistro-for-elasticsearch/sql/blob/develop/docs/user/admin/settings.rst#opendistro-sql-engine-new-enabled). In a future release this will be enabled by default and nothing will be required on your side. Please stay tuned for updates on our progress and its exciting new features.
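As a sketch, enabling the new engine at runtime might look like the request below. Note that the setting name is an assumption inferred from the linked settings doc's anchor, so verify it against that documentation before relying on it:

```shell
# Enable the new SQL query engine via the cluster settings API.
# NOTE: the setting name below is inferred from the settings doc anchor
# (opendistro-sql-engine-new-enabled); confirm it against the linked doc.
curl -H 'Content-Type: application/json' -X PUT \
  localhost:9200/_cluster/settings \
  -d '{"transient": {"opendistro.sql.engine.new.enabled": "true"}}'
```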


## Setup
@@ -36,7 +41,7 @@ After doing this, you need to restart the Elasticsearch server. Otherwise you ma
The package uses the [Gradle](https://docs.gradle.org/4.10.2/userguide/userguide.html) build system.

1. Checkout this package from version control.
2. To build from command line set `JAVA_HOME` to point to a JDK >=12
2. To build from command line set `JAVA_HOME` to point to a JDK >=14
3. Run `./gradlew build`
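Assuming a typical macOS setup, steps 2 and 3 might look like the following sketch (the JDK path is illustrative; substitute your own installation):

```shell
# Point JAVA_HOME at a JDK 14 installation, then build the plugin.
export JAVA_HOME=/Library/Java/JavaVirtualMachines/adoptopenjdk-14.jdk/Contents/Home
./gradlew build
```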


126 changes: 79 additions & 47 deletions docs/developing.rst
@@ -17,15 +17,15 @@ Prerequisites
JDK
---

Java 12 is required to build the plugin because of the dependency on Elasticsearch test framework in our integration test. So you must have a JDK 12 installation on your machine. After the installation, please configure the ``JAVA_HOME`` environment variable accordingly. If everything goes right, you should something similar to this macOS sample output::
A specific JDK version is required to build the plugin because of the dependency on the Elasticsearch test framework in our integration tests, so you must have that JDK version installed on your machine. After the installation, please configure the ``JAVA_HOME`` environment variable accordingly. If everything goes right, you should see something similar to this sample output on macOS (taking OpenJDK 14 as an example)::

$ echo $JAVA_HOME
/Library/Java/JavaVirtualMachines/jdk-12.0.2.jdk/Contents/Home
/Library/Java/JavaVirtualMachines/adoptopenjdk-14.jdk/Contents/Home

$ java -version
java version "12.0.2" 2019-07-16
Java(TM) SE Runtime Environment (build 12.0.2+10)
Java HotSpot(TM) 64-Bit Server VM (build 12.0.2+10, mixed mode, sharing)
openjdk version "14.0.1" 2020-04-14
OpenJDK Runtime Environment AdoptOpenJDK (build 14.0.1+7)
OpenJDK 64-Bit Server VM AdoptOpenJDK (build 14.0.1+7, mixed mode, sharing)

Here are the official instructions on how to set ``JAVA_HOME`` for different platforms: https://docs.oracle.com/cd/E19182-01/820-7851/inst_cli_jdk_javahome_t/.

@@ -34,15 +34,16 @@ Elasticsearch & Kibana

For convenience, we recommend installing Elasticsearch and Kibana on your local machine. You can download the open source ZIP for each and extract them to a folder.

Kibana is optional, but makes it easier to test your queries. Alternately, you can use curl from the terminal to run queries against the plugin.
If you just want a quick look, you can also get Elasticsearch running with the plugin installed via ``./gradlew :plugin:run``.

Kibana is optional, but makes it easier to test your queries. Alternately, you can use curl from the terminal to run queries against the plugin.

Getting Source Code
===================

Now you can check out the code from your forked GitHub repository and create a new branch for your bug fix or enhancement work::

$ git clone https://github.com/<your_account>/sql.git
$ git clone git@github.com:<your_account>/sql.git
$ git checkout -b <branch_name>

If there are updates in master, or you want to keep the forked repository alive long-term, you can sync it by following these instructions: https://help.github.com/en/github/collaborating-with-issues-and-pull-requests/syncing-a-fork. Basically, once you have added the upstream remote for the first time, you just need to pull the latest changes from upstream master::
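The sync workflow might look like the following sketch (the remote name ``upstream`` is a common convention, not something the repo mandates):

```shell
# One-time setup: register the original repository as the "upstream" remote.
git remote add upstream https://github.com/opendistro-for-elasticsearch/sql.git

# Afterwards, whenever you want to sync: pull the latest upstream changes.
git checkout master
git pull upstream master
```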
@@ -67,7 +68,6 @@ After getting the source code as well as Elasticsearch and Kibana, your workspac
drwxr-xr-x 10 user group^users 4096 Nov 8 12:16 elasticsearch-7.3.2
drwxr-xr-x 14 user group^users 4096 Nov 8 12:14 kibana-7.3.2-linux-x86_64
drwxr-xr-x 16 user group^users 4096 Nov 15 10:59 sql
drwxr-xr-x 9 user group^users 4096 Oct 31 14:39 sql-jdbc


Configuring IDEs
@@ -78,7 +78,7 @@ You can develop the plugin in your favorite IDEs such as Eclipse and IntelliJ ID
Java Language Level
-------------------

Although JDK 12 is required to build the plugin, the Java language level needs to be Java 8 for compatibility. Only in this case your plugin works with Elasticsearch running against JDK 8. Otherwise it will raise runtime exception when executing new API from new JDK. In case your IDE doesn’t set it right, you may want to double check your project setting after import.
Although a later JDK version is required to build the plugin, the Java language level needs to be Java 8 for compatibility. Only then will your plugin work with Elasticsearch running on JDK 8; otherwise it will raise a runtime exception when executing a newer API from a newer JDK. In case your IDE doesn't set this correctly, you may want to double-check your project settings after import.

Remote Debugging
----------------
@@ -128,51 +128,67 @@ The plugin codebase is in standard layout of Gradle project::
├── build.gradle
├── config
│ └── checkstyle
│ ├── checkstyle.xml
│ └── suppressions.xml
├── docs
│ ├── dev
│ │ ├── SemanticAnalysis.md
│ │ ├── SubQuery.md
│ │ └── img
│ └── user
│ ├── admin
│ ├── index.rst
│ └── interfaces
│   ├── attributions.md
│   ├── category.json
│   ├── dev
│   ├── developing.rst
│   ├── experiment
│   └── user
├── gradle.properties
├── gradlew
├── gradlew.bat
├── opendistro-elasticsearch-sql.release-notes
├── settings.gradle
└── src
├── assembly
│ └── zip.xml
├── main
│ ├── antlr
│ ├── java
│ └── resources
└── test
├── java
└── resources

Here are files and folders you are most likely to touch:

- build.gradle: Gradle build script.
- config/: only Checkstyle configuration files for now.
- docs/: include documentation for developers and reference manual for users.
- src/: source code root

- main/antlr: ANTLR4 grammar files.
- main/java: Java source code.
- test/java: Java test code.
├── common
├── core
├── doctest
├── elasticsearch
├── integ-test
├── legacy
├── plugin
├── protocol
├── ppl
├── sql
├── sql-cli
├── sql-jdbc
├── sql-odbc
└── workbench

Here are sub-folders (Gradle modules) for plugin source code:

- ``plugin``: Elasticsearch plugin related code.
- ``sql``: SQL language processor.
- ``ppl``: PPL language processor.
- ``core``: core query engine.
- ``elasticsearch``: Elasticsearch storage engine.
- ``protocol``: request/response protocol formatter.
- ``common``: common util code.
- ``integ-test``: integration and comparison tests.

Here are other files and sub-folders that you are likely to touch:

- ``build.gradle``: Gradle build script.
- ``config``: only Checkstyle configuration files for now.
- ``docs``: documentation for developers and reference manual for users.
- ``doctest``: code that runs the .rst docs in the ``docs`` folder via the Python doctest library.

Note that other related projects have already been merged into this single repository:

- ``sql-cli``: CLI tool for running queries from the command line.
- ``sql-jdbc``: JDBC driver.
- ``sql-odbc``: ODBC driver.
- ``workbench``: query workbench UI.


Code Convention
---------------

We’re integrated Checkstyle plugin into Gradle build: https://github.com/opendistro-for-elasticsearch/sql/blob/master/config/checkstyle/checkstyle.xml. So any violation will fail the build. You need to identify the offending code from Gradle error message and fix them and rerun the Gradle build. Here are the highlight of some Checkstyle rules:
We’ve integrated the Checkstyle plugin into the Gradle build: https://github.com/opendistro-for-elasticsearch/sql/blob/master/config/checkstyle/google_checks.xml. Any violation will fail the build, so you need to identify the offending code from the Gradle error message, fix it, and rerun the build. Here are highlights of some Checkstyle rules:

* 2 spaces indentation.
* No line starts with tab character in source file.
* Line width <= 120 characters.
* Line width <= 100 characters.
* Wildcard imports: You can enforce single import by configuring your IDE. Instructions for Intellij IDEA: https://www.jetbrains.com/help/idea/creating-and-optimizing-imports.html#disable-wildcard-imports.
* Operator needs to wrap at next line.

@@ -201,10 +217,17 @@ Most of the time you just need to run ./gradlew build which will make sure you p
- Run all checks according to Checkstyle configuration.
* - ./gradlew test
- Run all unit tests.
* - ./gradlew integTestRunner
* - ./gradlew :integ-test:integTestRunner
- Run all integration test (this takes time).
* - ./gradlew build
- Build plugin by run all tasks above (this takes time).

For ``test`` and ``integTestRunner``, you can use —tests “UT full path” to run a task individually. For example ./gradlew test --tests “com.amazon.opendistroforelasticsearch.sql.unittest.LocalClusterStateTest”.
For integration tests, you can use ``-Dtests.class`` with the full test class path (wildcards supported) to run a single test class. For example: ``./gradlew :integ-test:integTest -Dtests.class="*QueryIT"``.

To run any task above for a specific module, you can run ``./gradlew :<module_name>:<task>``. For example, build only the core module with ``./gradlew :core:build``.

Troubleshooting
---------------

Sometimes your Gradle build fails or times out because a hung Elasticsearch integration test process is left behind. You can check for this with the following commands::

@@ -270,12 +293,13 @@ For test cases, you can use the cases in the following checklist in case you mis
For unit test:

* Put your test class in the same package in src/test/java so you can access and test package-level method.
* Make sure you are testing against the right abstraction. For example a bad practice is to create many classes by ESActionFactory class and write test cases on very high level. This makes it more like an integration test.
* Make sure you are testing against the right abstraction, with dependencies mocked. For example, a bad practice is to create many classes via the ESActionFactory class and write test cases at a very high level, which makes them more like integration tests.

For integration test:

* The Elasticsearch test framework is in use, so an in-memory cluster spins up for each test class.
* You can only access the plugin and verify the correctness of your functionality via REST client externally.
* You can only access the plugin and verify the correctness of your functionality via REST client externally.
* Our homegrown comparison test framework is used heavily to compare results with other databases without manually written assertions. More details can be found in `Testing <./dev/Testing.md>`_.

Here is a sample for integration test for your reference:

@@ -295,7 +319,7 @@ Here is a sample for integration test for your reference:
}
}
Finally thanks to JaCoCo library, you can check out the test coverage for your changes easily.
Finally, thanks to the JaCoCo library, you can easily check the test coverage for your changes in ``<module_name>/build/reports/jacoco``.

Deploying Locally
-----------------
@@ -325,6 +349,9 @@ For new feature or big enhancement, it is worth document your design idea for ot
Reference Manual
----------------

Doc Generator
>>>>>>>>>>>>>

Currently the reference manual documents are generated by a set of special integration tests. These tests use a custom DSL to build reStructuredText markup, with real queries and result sets captured and documented.

1. Add a new template to ``src/test/resources/doctest/templates``.
@@ -352,3 +379,8 @@ Sample test class:
);
}
}
Doctest
>>>>>>>

The Python doctest library makes our documents executable, which keeps them up to date with the source code. The aforementioned doc generator served as scaffolding and produced many docs in a short time; the examples inside are now gradually being converted to doctests. For more details please read `Doctest <./dev/Doctest.md>`_.
56 changes: 56 additions & 0 deletions docs/experiment/ppl/admin/monitoring.rst
@@ -0,0 +1,56 @@
.. highlight:: sh

==========
Monitoring
==========

.. rubric:: Table of contents

.. contents::
:local:
:depth: 1


Introduction
============

The stats endpoint lets you collect metrics for the plugin within the configured interval. Note that only node-level statistics collection is implemented for now; in other words, you only get the metrics for the node you're accessing. Cluster-level statistics have yet to be implemented.

Node Stats
==========

Description
-----------

The meaning of fields in the response is as follows:

+---------------------------------+-----------------------------------------------------------------------+
| Field name                      | Description                                                           |
+=================================+=======================================================================+
| ppl_request_total               | Total count of PPL requests                                           |
+---------------------------------+-----------------------------------------------------------------------+
| ppl_request_count               | Total count of PPL requests within the interval                       |
+---------------------------------+-----------------------------------------------------------------------+
| ppl_failed_request_count_syserr | Count of failed PPL requests due to system errors within the interval |
+---------------------------------+-----------------------------------------------------------------------+
| ppl_failed_request_count_cuserr | Count of failed PPL requests due to bad requests within the interval  |
+---------------------------------+-----------------------------------------------------------------------+


Example
-------

Stats request::

>> curl -H 'Content-Type: application/json' -X GET localhost:9200/_opendistro/_ppl/stats

Result set::

{
"ppl_request_total": 10,
"ppl_request_count": 2,
"ppl_failed_request_count_syserr": 0,
"ppl_failed_request_count_cuserr": 0,
...
}
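The per-interval counters in a response like the one above can be pulled out with basic shell tools. The sketch below parses a captured sample payload (field values are illustrative); ``jq`` would be cleaner if it is available on your machine:

```shell
# Sample stats payload as returned by the PPL stats endpoint (abridged).
stats='{"ppl_request_total": 10, "ppl_request_count": 2, "ppl_failed_request_count_syserr": 0, "ppl_failed_request_count_cuserr": 0}'

# Extract two counters with sed (POSIX basic regular expressions).
count=$(printf '%s' "$stats" | sed -n 's/.*"ppl_request_count": \([0-9]*\).*/\1/p')
failed=$(printf '%s' "$stats" | sed -n 's/.*"ppl_failed_request_count_cuserr": \([0-9]*\).*/\1/p')

echo "requests=$count failed_cuserr=$failed"
# prints: requests=2 failed_cuserr=0
```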

2 changes: 2 additions & 0 deletions docs/experiment/ppl/index.rst
@@ -30,6 +30,8 @@ The query start with search command and then flowing a set of command delimited

- `Plugin Settings <admin/settings.rst>`_

- `Monitoring <admin/monitoring.rst>`_

* **Commands**

- `Syntax <cmd/syntax.rst>`_
50 changes: 49 additions & 1 deletion docs/user/beyond/partiql.rst
@@ -69,10 +69,58 @@ There are three fields in test index ``people``: 1) deep nested object field ``c
Example: Employees
------------------

Here is the mapping for test index ``employees_nested``. Note that field ``projects`` is a nested field::

{
"mappings": {
"properties": {
"id": {
"type": "long"
},
"name": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"projects": {
"type": "nested",
"properties": {
"name": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword"
}
},
"fielddata": true
},
"started_year": {
"type": "long"
}
}
},
"title": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
}
}
}
}


Result set::

{
"employees" : [
"employees_nested" : [
{
"id" : 3,
"name" : "Bob Smith",
