Release Notes 0.13.0 #4370
Apache Doris (incubating) 0.13 has been released. Welcome to try it!
New Feature
Query spill to disk
Doris supports spilling query data to disk. When the parameter enable_spilling is true and the memory limit is reached, the query spills to disk, so queries no longer fail because of memory bottlenecks. Version 0.13 supports spilling in the sort and window functions. [#3820] [#4151] [#4152]
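For example, spilling can be enabled before running a memory-heavy sort (a minimal sketch; the table is illustrative and enable_spilling is assumed to be settable as a session variable):

```sql
-- Enable spilling for this session (illustrative scope).
SET enable_spilling = true;

-- A large sort that would otherwise hit the memory limit can now
-- spill intermediate data to disk instead of failing.
SELECT user_id, event_time
FROM big_events
ORDER BY event_time;
```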
Support bitmap_union, hll_union and count in materialized views
Materialized views support richer aggregate functions: bitmap_union, hll_union and count. In an order-analysis scenario, users can count the number of orders along different dimensions. The bitmap and hll functions can also be pre-calculated for deduplication scenarios such as analyzing PV and UV of website traffic. Doris automatically matches a user's query to the optimal materialized view to speed up the query. [#3651] [#3677] [#3705] [#3873] [#4014]
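A sketch of one way this can be used (table and column names are hypothetical): a bitmap_union materialized view pre-aggregates distinct users, and a matching count(distinct) query can be rewritten against it.

```sql
-- Pre-aggregate distinct users per day and city for UV analysis.
CREATE MATERIALIZED VIEW daily_uv AS
SELECT dt, city, bitmap_union(to_bitmap(user_id))
FROM page_visits
GROUP BY dt, city;

-- A query like this can be matched to the materialized view automatically.
SELECT dt, count(DISTINCT user_id)
FROM page_visits
GROUP BY dt;
```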
Spark load
Spark load preprocesses imported data using external Spark resources, which improves import performance for large data volumes and saves computing resources of the Doris cluster. It is mainly used for initial-migration scenarios where a large amount of data is imported into Doris.
[#3418] [#3712] [#3715] [#3716]
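A spark load statement has roughly the following shape (a sketch; the label, paths, resource name and properties are illustrative):

```sql
LOAD LABEL example_db.initial_migration
(
    DATA INFILE("hdfs://namenode:8020/user/data/*/part-*")
    INTO TABLE orders
)
WITH RESOURCE 'spark0'
(
    "spark.executor.memory" = "2g"
)
PROPERTIES
(
    "timeout" = "3600"
);
```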
Support loading JSON data into Doris by Routine Load or Stream Load
Routine Load and Stream Load support a new data format: JSON. JSON data is imported into Doris through the transform rules in the load statement. This is especially useful for log services whose original data is in JSON format; users no longer need to convert the data to CSV upstream.
[#3553]
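For instance, a routine load job can consume JSON messages from Kafka directly (a sketch; the job name, table, topic and jsonpaths are hypothetical):

```sql
CREATE ROUTINE LOAD example_db.access_log_job ON access_log
PROPERTIES
(
    "format" = "json",
    "jsonpaths" = "[\"$.ts\", \"$.url\", \"$.user_id\"]"
)
FROM KAFKA
(
    "kafka_broker_list" = "broker1:9092",
    "kafka_topic" = "access_log"
);
```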
Modify routine load
The properties of a routine load job, such as concurrency and Kafka consumption progress, can be modified with the ALTER ROUTINE LOAD statement. Only jobs in the PAUSED state can be modified. After a routine load job is modified, the newly set properties are used to plan its tasks when they are scheduled again. [#4158]
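In practice this looks roughly as follows (a sketch; the job name is illustrative and desired_concurrent_number is assumed to be one of the adjustable properties):

```sql
PAUSE ROUTINE LOAD FOR example_db.access_log_job;

-- Raise the job's concurrency while it is paused.
ALTER ROUTINE LOAD FOR example_db.access_log_job
PROPERTIES ("desired_concurrent_number" = "3");

RESUME ROUTINE LOAD FOR example_db.access_log_job;
```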
Support fetching _id from ES and creating tables on wildcard or alias indexes of ES
A native ES document carries an _id field, which is the primary key of the ES index. Doris-on-ES can now fetch this field. Doris also supports creating external tables on aliases or wildcard indexes such as log_*, so users can easily search all matching indexes through aliases and wildcards. [#3900] [#3968]
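A sketch of an external table over a wildcard index (host, columns and properties are illustrative):

```sql
CREATE EXTERNAL TABLE es_logs
(
    `_id` VARCHAR(64),
    `message` VARCHAR(1024)
)
ENGINE = ELASTICSEARCH
PROPERTIES
(
    "hosts" = "http://es-node:9200",
    "index" = "log_*",
    "type" = "doc"
);
```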
Logstash Doris output plugin
A Logstash output plugin is provided to send data from Logstash to Doris. It interacts with the Doris FE HTTP interface over the HTTP protocol and loads data through Doris's Stream Load.
[#3800]
Support SELECT INTO OUTFILE
Doris now supports exporting query results to third-party file systems such as HDFS, S3 and BOS. The grammar follows the MySQL grammar manual, and the export format is CSV. Exported results can be offered to other users for download or further processed by other systems. This is especially useful when the result set is too large to return through the MySQL protocol, for example a large number of ids produced by bitmap_to_string. [#3584]
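For example, a large result set can be written straight to HDFS (a sketch; the path, broker name and table are illustrative):

```sql
SELECT bitmap_to_string(user_bitmap)
FROM user_tags
WHERE tag = 'vip'
INTO OUTFILE "hdfs://namenode:8020/export/vip_ids_"
FORMAT AS CSV
PROPERTIES
(
    "broker.name" = "my_broker",
    "column_separator" = ","
);
```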
Support IN predicate in delete statement
The DELETE statement supports IN and NOT IN predicates in its conditions, so users can delete rows matching a set of values in a single statement.
[#4006]
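A minimal sketch (table, partition and column names are illustrative):

```sql
DELETE FROM orders PARTITION p202008
WHERE status IN ("cancelled", "expired");
```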
Enhancement
Compaction rules optimization
This optimization updates the strategy for triggering compaction: a version-merging strategy that balances write amplification, space amplification and read performance, tending to merge files of adjacent sizes. For the same number of versions, it performs fewer merges and leaves fewer files in total.
[#4212]
Simplify the delete process to make it fast
The polling load checker used during deletion is replaced by a transaction callback, which reduces the response time of the DELETE command to the millisecond level.
[#3191]
Support simple transitivity on join predicate pushdown
When the columns in a query's filter predicate match the columns in the join condition, the filter can be transferred across the join to also filter the other table, reducing the amount of data scanned and improving query speed.
[#3453]
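Illustratively (hypothetical tables), a filter written against one side of a join now also prunes the other side:

```sql
-- With predicate transitivity, the filter on t1.id is also applied to t2,
-- as if "AND t2.id = 100" had been written explicitly.
SELECT *
FROM t1 JOIN t2 ON t1.id = t2.id
WHERE t1.id = 100;
```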
Non-blocking OlapTableSink
In this optimization, the sending process and the row-adding process in OlapTableSink run concurrently, which greatly improves load performance. In a test with a 56 GB broker load, the original version ran for 4 hours while the concurrent version halved that time. [#3143]
Support txn management at db level and use ArrayDeque to improve txn task performance
Transaction management is now partitioned at the database level, so databases no longer block each other, which improves the execution efficiency of transaction tasks.
[#3369]
Improve the performance of queries with IN predicates
Added a new config max_pushdown_conditions_per_column to limit the number of conditions on a single column that can be pushed down to the storage engine. It is separate from the previous configuration that controls scan-key splitting, and its own default value is 1024. After separating the two configurations, Doris's QPS improved and CPU usage decreased. [#3694]
Optimized the speed of reading parquet files
A cache buffer array is used in the broker reading process for parquet files. When the broker is about to seek to a position and fetch data from a remote parquet file, it first tries to read that position from the cache buffer array; if the expected data hits the cache, there is no need to read from the remote file at all. In testing, this halved the load time of parquet files in broker and spark load.
[#3878]
New Built-in Functions
bitmap_intersect [Support bitmap_intersect #3571]
orthogonal_bitmap_intersect in UDAF [Add bitmap longitudinal cutting udaf #4198]
orthogonal_bitmap_intersect_count in UDAF [Add bitmap longitudinal cutting udaf #4198]
orthogonal_bitmap_union_count in UDAF [Add bitmap longitudinal cutting udaf #4198]
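As a hypothetical usage sketch (table, column and filter values are illustrative; exact signatures follow the PRs above), the orthogonal bitmap UDAFs aggregate pre-built bitmap columns:

```sql
-- Count distinct users across the selected tag groups.
SELECT orthogonal_bitmap_union_count(user_bitmap)
FROM tag_groups
WHERE tag_group IN (1001, 1002);
```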
Other
Support text type (#4300)
API Change
Credits
@ZhangYu0123
@wfjcmcb
@Fullstop000
@sduzh
@stalary
@worker24h
@chaoyli
@vagetablechicken
@jmk1011
@funyeah
@wutiangan
@gengjun-git
@xinghuayu007
@EmmyMiao87
@songenjie
@acelyc111
@yangzhg
@Seaven
@hexian55
@ChenXiaoFei
@WingsGo
@kangpinghuang
@wangbo
@weizuo93
@sdgshawn
@skyduy
@wyb
@gaodayue
@HappenLee
@kangkaisen
@wuyunfeng
@HangyuanLiu
@xy720
@liutang123
@caiconghui
@liyuance
@spaces-X
@hffariel
@decster
@blackfox1983
@Astralidea
@morningman
@hf200012
@xbyang18
@Youngwb
@imay
@marising
@caoyang10