Release Notes 0.13.0 #4370
Apache Doris (incubating) 0.13 has been released. Welcome to try it!
New Feature
Query spill to disk
Doris supports spilling query data to disk. When the parameter enable_spilling is true and the memory limit is reached, the query spills to disk, so queries no longer fail because of memory bottlenecks. Version 0.13 supports spilling in the sort and window functions. [#3820] [#4151] [#4152]
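For example, spilling can be enabled before running a memory-heavy sort (a minimal sketch; the table is illustrative and enable_spilling is assumed to be settable as a session variable):

```sql
-- Enable spilling for this session (illustrative scope).
SET enable_spilling = true;

-- A large sort that would otherwise hit the memory limit can now
-- spill intermediate data to disk instead of failing.
SELECT user_id, event_time
FROM big_events
ORDER BY event_time;
```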
Support bitmap_union, hll_union and count in materialized views
Materialized views support richer aggregate functions: bitmap_union, hll_union and count. In an order-analysis scenario, users can count the number of orders along different dimensions. The bitmap and hll functions can also be pre-calculated for deduplication scenarios such as analyzing PV and UV of website traffic. Doris automatically matches a user's query to the optimal materialized view to speed up the query. [#3651] [#3677] [#3705] [#3873] [#4014]
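A sketch of one way this can be used (table and column names are hypothetical): a bitmap_union materialized view pre-aggregates distinct users, and a matching count(distinct) query can be rewritten against it.

```sql
-- Pre-aggregate distinct users per day and city for UV analysis.
CREATE MATERIALIZED VIEW daily_uv AS
SELECT dt, city, bitmap_union(to_bitmap(user_id))
FROM page_visits
GROUP BY dt, city;

-- A query like this can be matched to the materialized view automatically.
SELECT dt, count(DISTINCT user_id)
FROM page_visits
GROUP BY dt;
```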
Spark load
Spark load preprocesses imported data using external Spark resources, which improves import performance for large data volumes and saves computing resources of the Doris cluster. It is mainly used for initial-migration scenarios where a large amount of data is imported into Doris.
[#3418] [#3712] [#3715] [#3716]
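A spark load statement has roughly the following shape (a sketch; the label, paths, resource name and properties are illustrative):

```sql
LOAD LABEL example_db.initial_migration
(
    DATA INFILE("hdfs://namenode:8020/user/data/*/part-*")
    INTO TABLE orders
)
WITH RESOURCE 'spark0'
(
    "spark.executor.memory" = "2g"
)
PROPERTIES
(
    "timeout" = "3600"
);
```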
Support loading JSON data into Doris by Routine Load or Stream Load
Routine Load and Stream Load support a new data format: JSON. JSON data is imported into Doris through the transform rules in the load statement. This is especially useful for log services whose original data is in JSON format; users no longer need to convert the data to CSV upstream.
[#3553]
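For instance, a routine load job can consume JSON messages from Kafka directly (a sketch; the job name, table, topic and jsonpaths are hypothetical):

```sql
CREATE ROUTINE LOAD example_db.access_log_job ON access_log
PROPERTIES
(
    "format" = "json",
    "jsonpaths" = "[\"$.ts\", \"$.url\", \"$.user_id\"]"
)
FROM KAFKA
(
    "kafka_broker_list" = "broker1:9092",
    "kafka_topic" = "access_log"
);
```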
Modify routine load
The properties of a routine load job, such as concurrency and Kafka consumption progress, can be modified with the ALTER ROUTINE LOAD statement. Only jobs in the PAUSED state can be modified. After a routine load job is modified, the newly set properties are used to plan its tasks when they are scheduled again. [#4158]
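In practice this looks roughly as follows (a sketch; the job name is illustrative and desired_concurrent_number is assumed to be one of the adjustable properties):

```sql
PAUSE ROUTINE LOAD FOR example_db.access_log_job;

-- Raise the job's concurrency while it is paused.
ALTER ROUTINE LOAD FOR example_db.access_log_job
PROPERTIES ("desired_concurrent_number" = "3");

RESUME ROUTINE LOAD FOR example_db.access_log_job;
```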
Support fetching _id from ES and creating tables on wildcard or alias indexes of ES
A native ES document carries an _id field, which is the primary key of the ES index. Doris-on-ES can now fetch this field. Doris also supports creating external tables on aliases or wildcard indexes such as log_*, so users can easily search all matching indexes through aliases and wildcards. [#3900] [#3968]
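A sketch of an external table over a wildcard index (host, columns and properties are illustrative):

```sql
CREATE EXTERNAL TABLE es_logs
(
    `_id` VARCHAR(64),
    `message` VARCHAR(1024)
)
ENGINE = ELASTICSEARCH
PROPERTIES
(
    "hosts" = "http://es-node:9200",
    "index" = "log_*",
    "type" = "doc"
);
```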
Logstash Doris output plugin
A Logstash output plugin is provided to send data from Logstash to Doris. It interacts with the Doris FE HTTP interface over the HTTP protocol and loads data through Doris's Stream Load.
[#3800]
Support SELECT INTO OUTFILE
Doris now supports exporting query results to third-party file systems such as HDFS, S3 and BOS. The grammar follows the MySQL grammar manual, and the export format is CSV. Exported results can be offered to other users for download or further processed by other systems. This is especially useful when the result set is too large to return through the MySQL protocol, for example a large number of ids produced by bitmap_to_string. [#3584]
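For example, a large result set can be written straight to HDFS (a sketch; the path, broker name and table are illustrative):

```sql
SELECT bitmap_to_string(user_bitmap)
FROM user_tags
WHERE tag = 'vip'
INTO OUTFILE "hdfs://namenode:8020/export/vip_ids_"
FORMAT AS CSV
PROPERTIES
(
    "broker.name" = "my_broker",
    "column_separator" = ","
);
```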
Support IN predicate in delete statement
The DELETE statement supports IN and NOT IN predicates in its conditions, so users can delete rows matching a set of values in a single statement.
[#4006]
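A minimal sketch (table, partition and column names are illustrative):

```sql
DELETE FROM orders PARTITION p202008
WHERE status IN ("cancelled", "expired");
```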
Enhancement
Compaction rules optimization
This optimization updates the strategy for triggering compaction: a version-merging strategy that balances write amplification, space amplification and read performance, tending to merge files of adjacent sizes. For the same number of versions, it performs fewer merges and leaves fewer files in total.
[#4212]
Simplify the delete process to make it fast
The polling load checker used during deletion is replaced by a transaction callback, which reduces the response time of the DELETE command to the millisecond level.
[#3191]
Support simple transitivity on join predicate pushdown
When the columns in a query's filter predicate match the columns in the join condition, the filter can be transferred across the join to also filter the other table, reducing the amount of data scanned and improving query speed.
[#3453]
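Illustratively (hypothetical tables), a filter written against one side of a join now also prunes the other side:

```sql
-- With predicate transitivity, the filter on t1.id is also applied to t2,
-- as if "AND t2.id = 100" had been written explicitly.
SELECT *
FROM t1 JOIN t2 ON t1.id = t2.id
WHERE t1.id = 100;
```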
Non-blocking OlapTableSink
In this optimization, the sending process and the row-adding process in OlapTableSink run concurrently, which greatly improves load performance. In a test with a 56 GB broker load, the original version ran for 4 hours while the concurrent version halved that time. [#3143]
Support txn management at db level and use ArrayDeque to improve txn task performance
Transaction management is now partitioned at the database level, so databases no longer block each other, which improves the execution efficiency of transaction tasks.
[#3369]
Improve the performance of queries with IN predicates
Added a new config max_pushdown_conditions_per_column to limit the number of conditions on a single column that can be pushed down to the storage engine. It is separate from the previous configuration that controls scan-key splitting, and its own default value is 1024. After separating the two configurations, Doris's QPS improved and CPU usage decreased. [#3694]
Optimized the speed of reading parquet files
A cache buffer array is used in the broker reading process for parquet files. When the broker is about to seek to a position and fetch data from a remote parquet file, it first tries to read that position from the cache buffer array; if the expected data hits the cache, there is no need to read from the remote file at all. In testing, this halved the load time of parquet files in broker and spark load.
[#3878]
New Built-in Functions
bitmap_intersect [Support bitmap_intersect #3571]
orthogonal_bitmap_intersect in UDAF [Add bitmap longitudinal cutting udaf #4198]
orthogonal_bitmap_intersect_count in UDAF [Add bitmap longitudinal cutting udaf #4198]
orthogonal_bitmap_union_count in UDAF [Add bitmap longitudinal cutting udaf #4198]
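As a hypothetical usage sketch (table, column and filter values are illustrative; exact signatures follow the PRs above), the orthogonal bitmap UDAFs aggregate pre-built bitmap columns:

```sql
-- Count distinct users across the selected tag groups.
SELECT orthogonal_bitmap_union_count(user_bitmap)
FROM tag_groups
WHERE tag_group IN (1001, 1002);
```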
Other
Support text type (#4300)
API Change
Credits
@ZhangYu0123
@wfjcmcb
@Fullstop000
@sduzh
@stalary
@worker24h
@chaoyli
@vagetablechicken
@jmk1011
@funyeah
@wutiangan
@gengjun-git
@xinghuayu007
@EmmyMiao87
@songenjie
@acelyc111
@yangzhg
@Seaven
@hexian55
@ChenXiaoFei
@WingsGo
@kangpinghuang
@wangbo
@weizuo93
@sdgshawn
@skyduy
@wyb
@gaodayue
@HappenLee
@kangkaisen
@wuyunfeng
@HangyuanLiu
@xy720
@liutang123
@caiconghui
@liyuance
@spaces-X
@hffariel
@decster
@blackfox1983
@Astralidea
@morningman
@hf200012
@xbyang18
@Youngwb
@imay
@marising
@caoyang10