[Docs][DataQuality]: Add DataQuality Docs #9512
Conversation
LGTM, PTAL @Tianqi-Dotes @zhongjiajie
Will take a look this weekend
LGTM, waiting for CI
better rewrite it
## 1.1 Introduction

The data quality task is used to check the data accuracy during the integration and processing of data. Data quality tasks in this release include single-table checking, single-table custom SQL checking, multi-table accuracy, and two-table value comparisons. The running environment of the data quality task is Spark 2.4.0, and other versions have not been verified, and users can verify by themselves.
- The execution flow of the data quality task is as follows:
`The execution flow of` -> `The execution logic of` or `The execution code logic of`
# 1 Overview
## 1.1 Introduction

The data quality task is used to check the data accuracy during the integration and processing of data. Data quality tasks in this release include single-table checking, single-table custom SQL checking, multi-table accuracy, and two-table value comparisons. The running environment of the data quality task is Spark 2.4.0, and other versions have not been verified, and users can verify by themselves.
better add a blank line.
> The user defines the task in the interface, and the user input value is stored in `TaskParam`
When running a task, `Master` will parse `TaskParam`, encapsulate the parameters required by `DataQualityTask` and send it to `Worker`.
Worker runs the data quality task. After the data quality task finishes running, it writes the statistical results to the specified storage engine. The current data quality task result is stored in the `t_ds_dq_execute_result` table of `dolphinscheduler`
`t_ds_dq_execute_result` table of `dolphinscheduler` -> table `t_ds_dq_execute_result` of the database `dolphinscheduler`.
> The user defines the task in the interface, and the user input value is stored in `TaskParam`
When running a task, `Master` will parse `TaskParam`, encapsulate the parameters required by `DataQualityTask` and send it to `Worker`.
Worker runs the data quality task. After the data quality task finishes running, it writes the statistical results to the specified storage engine. The current data quality task result is stored in the `t_ds_dq_execute_result` table of `dolphinscheduler`
`Worker` sends the task result to `Master`, after `Master` receives `TaskResponse`, it will judge whether the task type is `DataQualityTask`, if so, it will read the corresponding result from `t_ds_dq_execute_result` according to `taskInstanceId`, and then The result is judged according to the check mode, operator and threshold configured by the user. If the result is a failure, the corresponding operation, alarm or interruption will be performed according to the failure policy configured by the user.
`and then The result` -> `and then the result`
`check mode` -> `check formula`
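To make the flow in the quoted passage easier to follow, here is a minimal sketch of the judgment step, with made-up numbers and names (the class, the `FailureStrategy` enum, and the configured operator/threshold are illustrative assumptions, not the actual DolphinScheduler implementation):

```java
// Minimal, illustrative sketch of the judgment the Master performs after the
// Worker has written the statistics. All names and numbers are assumptions.
public class DataQualityJudgeSketch {

    enum FailureStrategy { ALERT, BLOCK }

    public static void main(String[] args) {
        // Statistics assumed to have been read back from t_ds_dq_execute_result.
        double actual = 9_900;     // e.g. rows that satisfied the rule
        double expected = 10_000;  // e.g. total rows (counted with the filter condition)

        // The two check formulas quoted in the docs.
        double accuracy = actual / expected * 100;                // [Actual/Expected] x 100% = 99%
        double errorRate = (expected - actual) / expected * 100;  // [(Expected-Actual)/Expected] x 100% = 1%

        // User-configured operator ">=" and threshold 95 applied to the first formula.
        // We assume here that the check passes when the configured comparison holds.
        boolean passed = accuracy >= 95;

        // The failure strategy decides what happens when the check does not pass.
        FailureStrategy strategy = FailureStrategy.ALERT;
        if (passed) {
            System.out.printf("DQ check passed: accuracy=%.1f%%, errorRate=%.1f%%%n", accuracy, errorRate);
        } else if (strategy == FailureStrategy.ALERT) {
            System.out.println("DQ check failed: an alert is sent, the task itself still ends successfully");
        } else {
            System.out.println("DQ check failed: an alert is sent and the task is marked as failed");
        }
    }
}
```

In words: the Worker produces the raw statistics, and the Master turns them into a pass/fail decision using the configured formula, operator, threshold, and failure strategy.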
```properties
data-quality.jar.name=dolphinscheduler-data-quality-dev-SNAPSHOT.jar
```

Please fill in `data-quality.jar.name` according to the actual package name,
`,` -> `.`
If these two lines form one sentence, use `,` at the end of the first one and start the second with a lower-case 'if you ...' (when we write a long sentence, we usually break the line at about 35 words). If they are two separate sentences, end the first with `.` and start the second with an upper-case letter.
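On the quoted snippet above: the value of `data-quality.jar.name` simply has to match the file name of the packaged data-quality jar. As a hypothetical example, if a release build produced `dolphinscheduler-data-quality-3.0.0-alpha.jar`, the property would be set to exactly that name.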
- Src filter conditions: such as the title, it will also be used when counting the total number of rows in the table, optional
- Src table check column: drop-down to select check column name
- start time: the start time of a time range
- end time: the end time of a time range
use upper case at the start
- Src table check column: drop-down to select check column name
- start time: the start time of a time range
- end time: the end time of a time range
- Time Format: Set the corresponding time format
`Time Format: Set the corresponding time format` -> `Time Format: set the corresponding time format`
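To make the quoted parameters concrete, an illustrative (entirely hypothetical) configuration could be: check column `user_id`, filter condition `status = 1`, start time `2022-01-01 00:00:00`, end time `2022-01-31 23:59:59`, and time format `yyyy-MM-dd HH:mm:ss`; as the quoted text notes, the filter condition is applied both to the checked rows and to the total row count.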
- [Actual/Expected]x100%
- [(Expected-Actual)/Expected]x100%
- Check operators: =, >, >=, <, <=, !=
- Threshold: The value used in the formula for comparison
`The value used` -> `the value`
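A worked example of the quoted formulas, with made-up numbers: if Actual = 9900 and Expected = 10000, then [Actual/Expected]x100% = 99% and [(Expected-Actual)/Expected]x100% = 1%; the configured operator and threshold are then applied to the chosen formula's result, e.g. evaluating whether 99 >= 95 for an operator of `>=` and a threshold of 95.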
- Note: The SQL must be statistical SQL, such as counting the number of rows, calculating the maximum value, minimum value, etc.
- select max(a) as max_num from ${src_table}, the table name must be filled like this
- Src filter conditions: such as the title, it will also be used when counting the total number of rows in the table, optional
- Check method:
`Check method:` -> `Check method: select a suitable check method.`
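On the custom statistics SQL mentioned above: a hypothetical example in the same style as the quoted one would be `select count(*) as total_count from ${src_table} where create_time >= '2022-01-01'`, i.e. an aggregate with an alias, with the source table always referenced through the `${src_table}` placeholder and the column names made up for illustration.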
- Threshold: The value used in the formula for comparison
- Failure strategy
  - Alert: The data quality task failed, the DolphinScheduler task result is successful, and an alert is sent
  - Blocking: The data quality task fails, the DolphinScheduler task result is failed, and an alarm is sent
what's the difference between alarm and alert
@Tianqi-Dotes Sorry, I merged it directly because we released 3.0.0-alpha and this doc must be on our website.
Co-authored-by: Jiajie Zhong <zhongjiajie955@gmail.com>
Purpose of the pull request
This pull request adds the data quality docs.