Update docs for structured streaming #292

Merged (4 commits) on Apr 10, 2019
19 changes: 10 additions & 9 deletions README.md
@@ -27,13 +27,13 @@ Waterdrop provides directly executable software packages; there is no need to compile the source code yourself.
Apache Spark, open-sourced by Databricks, is a great step forward for distributed data processing. While using Spark we found a great deal to praise, and we also saw our opportunity: to make Spark simpler and more efficient to use, and to distill the industry's and our own best practices with Spark into the Waterdrop product, markedly reducing the learning cost and speeding up the adoption of distributed data processing in production.

Beyond greatly simplifying distributed data processing, Waterdrop does its best to solve the problems you may run into:

* Data loss and duplication
* Task backlog and latency
* Low throughput
* Long cycles to get applications into production
* Lack of monitoring of application runtime status


"Waterdrop" 的中文是“水滴”,来自中国当代科幻小说作家刘慈欣的《三体》系列,它是三体人制造的宇宙探测器,会反射几乎全部的电磁波,表面绝对光滑,温度处于绝对零度,全部由被强互作用力紧密锁死的质子与中子构成,无坚不摧。在末日之战中,仅一个水滴就摧毁了人类太空武装力量近2千艘战舰。

## Waterdrop Use Cases
@@ -44,14 +44,15 @@ Apache Spark, open-sourced by Databricks, is a great step forward for distributed data processing

## Features of Waterdrop

- * Simple to use, flexible configuration, no development required
- * Real-time streaming
- * Offline multi-source data analysis
- * High performance
- * Massive data processing capability
- * Modular and plugin-based, easy to extend
- * Supports data processing and aggregation with SQL
- * Supports Spark 2.x
+ * Simple to use, flexible configuration, no development required
+ * Real-time streaming
+ * Offline multi-source data analysis
+ * High performance
+ * Massive data processing capability
+ * Modular and plugin-based, easy to extend
+ * Supports data processing and aggregation with SQL
+ * Supports Spark Structured Streaming
+ * Supports Spark 2.x

## Waterdrop Workflow

Binary file modified docs/images/wd-workflow.png
1 change: 1 addition & 0 deletions docs/zh-cn/README.md
@@ -48,6 +48,7 @@ Apache Spark, open-sourced by Databricks, is a great step forward for distributed data processing
* Massive data processing capability
* Modular and plugin-based, easy to extend
* Supports data processing and aggregation with SQL
+ * Spark Structured Streaming
* Supports Spark 2.x

## Waterdrop Workflow
10 changes: 10 additions & 0 deletions docs/zh-cn/quick-start.md
@@ -118,3 +118,13 @@ cd waterdrop
./bin/start-waterdrop.sh --master local[4] --deploy-mode client --config ./config/batch.conf.template

```

[Configuration example 3: Structured Streaming](https://github.com/InterestingLab/waterdrop/blob/master/config/structuredstreaming.conf.template)

The configuration above is the default Structured Streaming configuration template. Configure a Kafka input source in it, then run it with the following command:

```
cd waterdrop
./bin/start-waterdrop-structured-streaming.sh --master local[4] --deploy-mode client --config ./config/structuredstreaming.conf.template

```
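
For orientation, the sketch below shows roughly what a filled-in Structured Streaming config with a Kafka input might look like. It is a minimal, hypothetical illustration only: the plugin name `kafka`, the option keys `consumer.bootstrap.servers` and `topics`, and the `stdout` output are assumptions rather than confirmed names, so refer to config/structuredstreaming.conf.template in the repository for the authoritative template.

```
# Hypothetical minimal Structured Streaming config sketch; plugin and option names
# are assumptions, see config/structuredstreaming.conf.template for the real template.
spark {
  spark.app.name = "Waterdrop-StructuredStreaming"
  spark.executor.instances = 2
  spark.executor.cores = 1
  spark.executor.memory = "1g"
}

input {
  # Kafka source: broker list and topic to consume from (keys assumed)
  kafka {
    consumer.bootstrap.servers = "localhost:9092"
    topics = "waterdrop_test"
  }
}

filter {
  # optional SQL / transform plugins would go here
}

output {
  # print results to the console for a quick sanity check
  stdout {}
}
```

With a file like this saved at the path passed via `--config`, the start command above launches the streaming job locally on Spark.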