
[NSE-206] Update documents and remove duplicate parts
HongW2019 committed Apr 30, 2021
1 parent 230df38 commit 8df21e1
Showing 37 changed files with 12,993 additions and 1,125 deletions.
285 changes: 284 additions & 1 deletion arrow-data-source/CHANGELOG.md → CHANGELOG.md

Large diffs are not rendered by default.

1,957 changes: 1,957 additions & 0 deletions LICENSE

Large diffs are not rendered by default.

8 changes: 6 additions & 2 deletions README.md
@@ -1,3 +1,7 @@
+##### \* LEGAL NOTICE: Your use of this software and any required dependent software (the "Software Package") is subject to the terms and conditions of the software license agreements for the Software Package, which may also include notices, disclaimers, or license terms for third party or open source software included in or with the Software Package, and your use indicates your acceptance of all such terms. Please refer to the "TPP.txt" or other similarly-named text file included with the Software Package for additional details.
+
+##### \* Optimized Analytics Package for Spark* Platform is under Apache 2.0 (https://www.apache.org/licenses/LICENSE-2.0).
+
# Spark Native SQL Engine

A Native Engine for Spark SQL with vectorized SIMD optimizations
Expand All @@ -10,7 +14,7 @@ You can find the all the Native SQL Engine documents on the [project web page](h

![Overview](./docs/image/nativesql_arch.png)

-Spark SQL works very well with structured row-based data. It used WholeStageCodeGen to improve the performance by Java JIT code. However Java JIT is usually not working very well on utilizing latest SIMD instructions, especially under complicated queries. [Apache Arrow](https://arrow.apache.org/) provided CPU-cache friendly columnar in-memory layout, its SIMD optimized kernels and LLVM based SQL engine Gandiva are also very efficient. Native SQL Engine used these technoligies and brought better performance to Spark SQL.
+Spark SQL works very well with structured row-based data. It used WholeStageCodeGen to improve the performance by Java JIT code. However Java JIT is usually not working very well on utilizing latest SIMD instructions, especially under complicated queries. [Apache Arrow](https://arrow.apache.org/) provided CPU-cache friendly columnar in-memory layout, its SIMD optimized kernels and LLVM based SQL engine Gandiva are also very efficient. Native SQL Engine used these technologies and brought better performance to Spark SQL.
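
For context, and not part of this commit's diff: switching Spark SQL onto the native engine is done through ordinary Spark configuration. Below is a minimal `spark-defaults.conf` sketch; the plugin and shuffle-manager class names are assumptions drawn from the project's configuration examples, so check the Get Started guide for the authoritative settings.

```
# Route SQL planning through the native columnar plugin (class name assumed)
spark.sql.extensions          com.intel.oap.ColumnarPlugin
# Columnar shuffle implementation shipped with the engine (class name assumed)
spark.shuffle.manager         org.apache.spark.shuffle.sort.ColumnarShuffleManager
# The native operators work on off-heap Arrow buffers, so reserve off-heap memory
spark.memory.offHeap.enabled  true
spark.memory.offHeap.size     20g
```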

## Key Features

@@ -58,7 +62,7 @@ Please notice the files are fat jars shipped with our custom Arrow library and p
### Building by Conda

If you already have a working Hadoop Spark Cluster, we provide a Conda package which will automatically install the dependencies needed by OAP; see the [OAP-Installation-Guide](./docs/OAP-Installation-Guide.md) for more information. Once you have finished the installation guide, you can find the built `spark-columnar-core-<version>-jar-with-dependencies.jar` under `$HOME/miniconda2/envs/oapenv/oap_jars`.
-Then you can just skip below steps and jump to Getting Started [Get Started](#get-started).
+Then you can just skip below steps and jump to [Get Started](#get-started).
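
As an illustrative sketch only, not text from this commit: with the Conda-built jar in place, you would typically add it to both the driver and executor classpaths, for example in `spark-defaults.conf`. Here `<version>` and `<user>` stay placeholders, and the path is the Conda location named above written out absolutely, since Spark does not expand environment variables in these keys.

```
spark.driver.extraClassPath    /home/<user>/miniconda2/envs/oapenv/oap_jars/spark-columnar-core-<version>-jar-with-dependencies.jar
spark.executor.extraClassPath  /home/<user>/miniconda2/envs/oapenv/oap_jars/spark-columnar-core-<version>-jar-with-dependencies.jar
```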

### Building by yourself

10,639 changes: 10,639 additions & 0 deletions TPP.txt

Large diffs are not rendered by default.

201 changes: 0 additions & 201 deletions arrow-data-source/LICENSE.txt

This file was deleted.

8 changes: 2 additions & 6 deletions arrow-data-source/README.md
@@ -6,10 +6,6 @@ A Spark DataSource implementation for reading files into Arrow compatible column

The development of this library is still in progress. As a result, some functionality may not yet be stable for use in production environments, since testing so far has been limited and not every environment has been covered.

-## Online Documentation
-
-You can find the all the Native SQL Engine documents on the [project web page](https://oap-project.github.io/arrow-data-source/).
-
## Build

### Prerequisite
Expand All @@ -27,7 +23,7 @@ Please make sure you have already installed the software in your system.

### Building by Conda

-If you already have a working Hadoop Spark Cluster, we provide a Conda package which will automatically install the dependencies needed by OAP; see the [OAP-Installation-Guide](./docs/OAP-Installation-Guide.md) for more information. Once you have finished the installation guide, you can find the built `spark-arrow-datasource-standard-<version>-jar-with-dependencies.jar` under `$HOME/miniconda2/envs/oapenv/oap_jars`.
+If you already have a working Hadoop Spark Cluster, we provide a Conda package which will automatically install the dependencies needed by OAP; see the [OAP-Installation-Guide](../docs/OAP-Installation-Guide.md) for more information. Once you have finished the installation guide, you can find the built `spark-arrow-datasource-standard-<version>-jar-with-dependencies.jar` under `$HOME/miniconda2/envs/oapenv/oap_jars`.
Then you can just skip steps below and jump to [Get Started](#get-started).

### cmake installation
@@ -213,7 +209,7 @@ spark.sql("SELECT * FROM my_temp_view LIMIT 10").show(10)

To validate that ArrowDataSource works, you can check the DAG of the example query above to see whether ArrowScan has been used.

-![Image of ArrowDataSource Validation](./docs/image/arrowdatasource_validation.png)
+![Image of ArrowDataSource Validation](../docs/image/arrowdatasource_validation.png)
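
For readers following along, a minimal Scala round trip that should surface ArrowScan in the DAG might look like the sketch below. It is not part of this commit; it assumes a spark-shell session (so `spark` is an active `SparkSession`), assumes `"arrow"` as the data source format name, and uses a made-up input path.

```scala
// Load a file through ArrowDataSource; the format name "arrow" and the
// input path are assumptions for illustration.
val df = spark.read
  .format("arrow")
  .load("hdfs://localhost:9000/data/example.parquet")

// Register the temp view used by the query above, run it, and then check
// the query's DAG in the Spark UI for an ArrowScan node.
df.createOrReplaceTempView("my_temp_view")
spark.sql("SELECT * FROM my_temp_view LIMIT 10").show(10)
```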


## Work together with ParquetDataSource (experimental)
70 changes: 0 additions & 70 deletions arrow-data-source/docs/ApacheArrowInstallation.md

This file was deleted.

