modelscope · HYLcool · Sep 7, 2023
diff --git a/README.md b/README.md
@@ -47,21 +47,31 @@ Table of Contents
 
 ## Features
 
-- **Broad Range of Operators**: Equipped with 50+ core [operators (OPs)](docs/Operators.md), including Formatters, Mappers, Filters, Deduplicators, and beyond.
+![Overview](docs/imgs/overview.png)
 
-- **Specialized Toolkits**: Feature-rich specialized toolkits such as [Text Quality Classifier](tools/quality_classifier/README.md), [Dataset Splitter](tools/preprocess/README.md), [Analysers](#data-analysis), [Evaluators](tools/evaluator/README.md), and more that elevate your dataset handling capabilities.
+- **Systematic & Reusable**: 
+  Empowering users with a systematic library of 20+ reusable [config recipes](configs), 50+ core [OPs](docs/Operators.md), and feature-rich 
+  dedicated [toolkits](#documentation), designed to 
+  function independently of specific LLM datasets and processing pipelines.
 
-- **Systematic & Reusable**: Empowering users with a systematic library of reusable [config recipes](configs) and [OPs](docs/Operators.md), designed to function independently of specific datasets, models, or tasks.
+- **Data-in-the-loop**: Allowing detailed data analyses with an automated 
+  report generation feature for a deeper understanding of your dataset. Coupled with multi-dimension automatic evaluation capabilities, it supports a timely feedback loop at multiple stages in the LLM development process.
+  ![Data-in-the-loop](docs/imgs/feedback_loop.png)
 
-- **Data-in-the-loop**: Allowing detailed data analyses with an automated report generation feature for a deeper understanding of your dataset. Coupled with timely multi-dimension automatic evaluation capabilities, it supports a feedback loop at multiple stages in the LLM development process.
+- **Comprehensive Data Processing Recipes**: Offering tens of [pre-built data 
+  processing recipes](configs/data_juicer_recipes/README.md) for 
+  pre-training, post-tuning, en, zh, and more scenarios. Validated on 
+  reference LLaMA models.  
+  ![exp_llama](docs/imgs/exp_on_llama.png)
 
-- **Comprehensive Processing Recipes**: Offering tens of [pre-built data processing recipes](configs/data_juicer_recipes/README.md) for pre-training, SFT, en, zh, and more scenarios.
+- **Enhanced Efficiency**: Providing a speedy data processing pipeline 
+  requiring less memory and CPU usage, optimized for maximum productivity. 
+  ![sys-perf](docs/imgs/sys_perf.png)
 
-- **User-Friendly Experience**: Designed for simplicity, with [comprehensive documentation](#documentation), [easy start guides](#quick-start) and [demo configs](configs/README.md), and intuitive configuration with simple adding/removing OPs from [existing configs](configs/config_all.yaml).
 
 - **Flexible & Extensible**: Accommodating most types of data formats (e.g., jsonl, parquet, csv, ...) and allowing flexible combinations of OPs. Feel free to [implement your own OPs](docs/DeveloperGuide.md#build-your-own-ops) for customizable data processing.
 
-- **Enhanced Efficiency**: Providing a speedy data processing pipeline requiring less memory, optimized for maximum productivity.
+- **User-Friendly Experience**: Designed for simplicity, with [comprehensive documentation](#documentation), [easy start guides](#quick-start) and [demo configs](configs/README.md), and intuitive configuration with simple adding/removing OPs from [existing configs](configs/config_all.yaml).
 
 ## Prerequisites
 
@@ -193,8 +203,8 @@ python xxx.py --config configs/demo/process.yaml --language_id_score_filter.lang
 ## Data Recipes
 - [Recipes for data process in BLOOM](configs/reproduced_bloom/README.md)
 - [Recipes for data process in RedPajama](configs/redpajama/README.md)
-- [Refined recipes for pretraining data](configs/data_juicer_recipes/README.md)
-- [Refined recipes for SFT data](configs/data_juicer_recipes/README.md#before-and-after-refining-for-alpaca-cot-dataset)
+- [Refined recipes for pre-training data](configs/data_juicer_recipes/README.md)
+- [Refined recipes for post-tuning data](configs/data_juicer_recipes/README.md#before-and-after-refining-for-alpaca-cot-dataset)
 
 ## Demos
 - Introduction to Data-Juicer [[ModelScope](https://modelscope.cn/studios/Data-Juicer/overview_scan/summary)]
@@ -211,8 +221,8 @@ python xxx.py --config configs/demo/process.yaml --language_id_score_filter.lang
   - Quality Classifier for CommonCrawl [[ModelScope](https://modelscope.cn/studios/Data-Juicer/tool_quality_classifier/summary)]
   - Auto Evaluation on [HELM](https://github.com/stanford-crfm/helm) [[ModelScope](https://modelscope.cn/studios/Data-Juicer/auto_evaluation_helm/summary)]
   - Data Sampling and Mixture [[ModelScope](https://modelscope.cn/studios/Data-Juicer/data_mixture/summary)]
-- Data Process Loop [[ModelScope](https://modelscope.cn/studios/Data-Juicer/data_process_loop/summary)]
-- Data Process HPO [[ModelScope](https://modelscope.cn/studios/Data-Juicer/data_process_hpo/summary)]
+- Data Processing Loop [[ModelScope](https://modelscope.cn/studios/Data-Juicer/data_process_loop/summary)]
+- Data Processing HPO [[ModelScope](https://modelscope.cn/studios/Data-Juicer/data_process_hpo/summary)]
 
 ## License
 Data-Juicer is released under Apache License 2.0.

diff --git a/README_ZH.md b/README_ZH.md
@@ -46,21 +46,20 @@ Data-Juicer is a one-stop data processing system designed for large language models (LLM
 
 ## Features
 
-* **Rich Set of Operators**: Ships with 50+ core [operators (OPs)](docs/Operators_ZH.md), including Formatters, Mappers, Filters, Deduplicators, and more.
+![Overview](docs/imgs/overview.png)
 
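-* **Specialized Toolkits**: Provides feature-rich specialized toolkits such as the [Text Quality Classifier](tools/quality_classifier/README_ZH.md), [Dataset Splitter](tools/preprocess/README_ZH.md), [Analysers](#数据分析), [Evaluators](tools/evaluator/README_ZH.md), and more to strengthen your data processing capabilities.
+* **Systematic & Reusable**: Provides users with a systematic, reusable set of 20+ [config recipes](configs/README_ZH.md), 50+ core [OPs](docs/Operators_ZH.md), and dedicated [toolkits](#documentation), designed to make data processing independent of specific LLM datasets and processing pipelines.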
 
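-* **Systematic & Reusable**: Provides users with systematic, reusable [config recipes](configs/README_ZH.md) and an [OP library](docs/Operators_ZH.md), designed to make data processing run independently of specific datasets, models, or tasks.
+* **Data-in-the-loop**: Supports detailed data analyses with an automated report generation feature for a deeper understanding of your dataset. Combined with multi-dimension automatic evaluation, it supports a timely feedback loop at multiple stages of the LLM development process.  ![Data-in-the-loop](docs/imgs/feedback_loop.png)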
 
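-* **Data-in-the-loop**: Supports detailed data analyses with an automated report generation feature for a deeper understanding of your dataset. Combined with timely multi-dimension automatic evaluation, it supports a feedback loop at multiple stages of the LLM development process.
+* **Comprehensive Data Processing Recipes**: Offers dozens of [pre-built data processing recipes](configs/data_juicer_recipes/README_ZH.md) for pre-training, post-tuning, Chinese, English, and other scenarios.  ![exp_llama](docs/imgs/exp_on_llama.png)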
 
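-* **Comprehensive Processing Recipes**: Offers dozens of [pre-built data processing recipes](configs/data_juicer_recipes/README_ZH.md) for pre-training, SFT, Chinese, English, and other scenarios.
+* **Enhanced Efficiency**: Provides an efficient data processing pipeline that reduces memory usage and CPU overhead and improves productivity.  ![sys-perf](docs/imgs/sys_perf.png)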
 
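 * **User-Friendly**: Designed to be simple and easy to use, with comprehensive [documentation](#documentation), simple [getting-started guides](#快速上手), and [demo configs](configs/README_ZH.md); OPs can easily be added to or removed from [existing configs](configs/config_all.yaml).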
-
+  
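 * **Flexible & Extensible**: Supports most data formats (e.g., jsonl, parquet, csv) and allows flexible combinations of OPs. [Custom OPs](docs/DeveloperGuide_ZH.md#构建自己的算子) are supported for customized data processing.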
 
-* **Enhanced Efficiency**: Provides an efficient data processing pipeline that reduces memory usage and improves productivity.
 
 ## Prerequisites
 
@@ -189,7 +188,7 @@ python xxx.py --config configs/demo/process.yaml --language_id_score_filter.lang
 * [Recipes for data processing in BLOOM](configs/reproduced_bloom/README_ZH.md)
 * [Recipes for data processing in RedPajama](configs/reproduced_redpajama/README_ZH.md)
 * [Refined recipes for pre-training data](configs/data_juicer_recipes/README_ZH.md)
-* [Refined recipes for SFT data](configs/data_juicer_recipes/README_ZH.md#完善前后的alpaca-cot数据集)
+* [Refined recipes for post-tuning data](configs/data_juicer_recipes/README_ZH.md#完善前后的alpaca-cot数据集)
 
 ## Demos
 

diff --git a/docs/imgs/exp_on_llama.png b/docs/imgs/exp_on_llama.png
diff --git a/docs/imgs/feedback_loop.png b/docs/imgs/feedback_loop.png
diff --git a/docs/imgs/overview.png b/docs/imgs/overview.png
diff --git a/docs/imgs/sys_perf.png b/docs/imgs/sys_perf.png