Highlight
Storage Engine Refactor
We refactored our storage engine. The main changes are:
Refactored the BE to clarify the structure of the code.
Use a unique id to identify a rowset.
Naming a rowset by tablet_id and version led to
many conflicts among compaction, clone, and restore.
Extracted a rowset interface to encapsulate rowsets
with different formats.
Because of this work, we can support more storage file formats.
We are now working on BetaRowset, which will introduce a more
effective compression method for string types. We will support
an inverted index based on BetaRowset; it should be released in the
next version.
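To illustrate why a name derived from tablet_id and version invites conflicts, here is a minimal Python sketch. It is not Doris code: the helper names, the name layout, and the use of uuid4 for ids are assumptions made only for this illustration.

```python
import uuid

def name_by_version(tablet_id, start_version, end_version):
    # Old scheme (illustrative layout): the name depends only on tablet and version range.
    return f"{tablet_id}_{start_version}_{end_version}"

def name_by_unique_id():
    # New scheme (illustrative): every rowset gets its own id, whatever it covers.
    return uuid.uuid4().hex

# Suppose compaction and clone each produce a rowset covering versions 2-5 of one tablet.
compaction_name = name_by_version(10001, 2, 5)
clone_name = name_by_version(10001, 2, 5)
names_collide = compaction_name == clone_name  # both operations want the same name

compaction_id = name_by_unique_id()
clone_id = name_by_unique_id()
ids_collide = compaction_id == clone_id  # distinct ids, so both rowsets can coexist
```

With version-based names the two operations compete for one name; with unique ids each rowset can be created, tracked, and removed independently.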
Support Bitmap
We support a bitmap type and a bitmap union operation on it. Users can
use it to compute exact distinct counts quickly.
For example, users can map visitor ids to this type and get the distinct
number of visitors through a bitmap union operation.
#1610
#1721
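The idea behind the visitor example can be sketched in Python. This is a conceptual illustration only, not how Doris stores bitmaps: a Python int stands in for the bitmap, and the helper names and sample data are assumptions of this sketch.

```python
def to_bitmap(ids):
    """Encode non-negative integer ids as a bitmap, one bit per id."""
    bitmap = 0
    for i in ids:
        bitmap |= 1 << i
    return bitmap

def bitmap_union(bitmaps):
    """Union is a bitwise OR, so an id seen on several days is kept once."""
    result = 0
    for b in bitmaps:
        result |= b
    return result

def bitmap_count(bitmap):
    """The number of set bits is the exact distinct count."""
    return bin(bitmap).count("1")

# Hypothetical visitor ids per day.
visitors_by_day = {
    "2019-09-01": [1, 2, 3],
    "2019-09-02": [2, 3, 4],
    "2019-09-03": [4, 5],
}
daily = {day: to_bitmap(ids) for day, ids in visitors_by_day.items()}
distinct_visitors = bitmap_count(bitmap_union(daily.values()))  # 5
```

Because the union is exact, the per-day bitmaps can be merged without rescanning the raw visitor ids and without double counting.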
More documents
We unified all types of our documents. Previously, we had to write several
copies of a document for one function. Now only one copy
needs to be written, and it is used in many places,
such as our website and our help command.
We also support English documents and an English website in this release.
#1586
#1518
#1719
#1729
Support Load Parquet Format
Users can now load Parquet format files through broker load. In addition,
Doris can get column content from the file path, which makes
integration with Spark and Hive easier.
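As a rough picture of what getting column content from a file path means, the Python sketch below pulls key=value directory segments out of a path, the layout that Hive and Spark commonly use for partitioned output. The function name and the sample path are hypothetical, and the actual parsing rules in Broker Load may differ.

```python
def columns_from_path(path):
    """Collect key=value directory segments of a file path into a dict."""
    columns = {}
    for segment in path.split("/")[:-1]:  # the last segment is the file name
        key, sep, value = segment.partition("=")
        if sep and key:
            columns[key] = value
    return columns

# Hypothetical path; the host, directories, and file name are placeholders.
path = "hdfs://host:port/warehouse/visits/city=beijing/dt=2019-09-01/part-0000.parquet"
path_columns = columns_from_path(path)
```

Here path_columns holds city and dt, values that are not stored inside the file itself.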
Enhancement
Support Timezone
Timezones are now supported. Users can specify a timezone when querying or loading.
#1587
#1598
#1631
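Why the timezone matters for both querying and loading can be shown with a small Python sketch. It uses only the standard library and fixed UTC offsets; the +08:00 offset and the sample timestamps are arbitrary choices for illustration and say nothing about how Doris handles timezones internally.

```python
from datetime import datetime, timedelta, timezone

utc = timezone.utc
plus8 = timezone(timedelta(hours=8))  # fixed +08:00 offset, chosen only as an example

# Querying: one stored instant is displayed differently per session timezone.
instant = datetime(2019, 9, 1, 0, 30, tzinfo=utc)
fmt = "%Y-%m-%d %H:%M:%S"
shown_in_utc = instant.astimezone(utc).strftime(fmt)
shown_in_plus8 = instant.astimezone(plus8).strftime(fmt)

# Loading: the same wall-clock text denotes different instants per timezone.
text = "2019-09-01 08:30:00"
naive = datetime.strptime(text, fmt)
loaded_as_plus8 = naive.replace(tzinfo=plus8)
loaded_as_utc = naive.replace(tzinfo=utc)
gap = loaded_as_utc - loaded_as_plus8
```

The two loaded values are eight hours apart, which is why the timezone has to be stated when the input text carries no offset.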
Add ChunkAllocator
We added a chunk allocator to reduce the allocation and freeing of large memory blocks, which can improve
performance under highly concurrent requests.
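The general technique of recycling large chunks instead of allocating and freeing them can be sketched as a toy pool in Python. This is not the BE's ChunkAllocator: the class name, the power-of-two size classes, and the counter are assumptions of this sketch.

```python
class ChunkPool:
    """Toy pool that recycles buffers by size class instead of reallocating."""

    def __init__(self):
        self._free = {}         # size class -> list of idle buffers
        self.system_allocs = 0  # how often a fresh buffer had to be created

    @staticmethod
    def _round_up(size):
        # Round up to a power of two so nearby request sizes share chunks.
        return 1 << max(size - 1, 0).bit_length()

    def allocate(self, size):
        size = self._round_up(size)
        idle = self._free.get(size)
        if idle:
            return idle.pop()   # reuse an idle chunk, no new allocation
        self.system_allocs += 1
        return bytearray(size)

    def free(self, chunk):
        # Keep the chunk for later reuse instead of releasing it.
        self._free.setdefault(len(chunk), []).append(chunk)

pool = ChunkPool()
first = pool.allocate(3000)   # rounded up to 4096
pool.free(first)
second = pool.allocate(4096)  # served from the pool
```

After these calls system_allocs is still 1, because the second request reuses the chunk released by the first.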
Support MIN/MAX for char/varchar
Users can create MIN and MAX columns of char or varchar type.
Others
Remove query status report from BE when query is cancelled normally (#1489)
Optimize the load performance for large file (#1798)
Free olap scanner out of lock (#1733)
Add exchange in MemPool to reduce alloc/free operation (#1732)
Shuffle partitioned instance to avoid skew (#1744)
Reduce unnecessary memory allocate and copy in OlapScanNode (#1742)
Improve LRUCache to get better performance (#1826)
Split channel close operation into two phases (#1830)
Make http server and thrift server backlog num configurable (#1638)
Add timeout on snapshot of data (#1672)
Make the max recursion depth of distribution pruner configurable (#1709)
Limit the disk usage to avoid running out of disk capacity (#1702)
Refactor DateLiteral class in FE (#1644)
Add limit to show tablet stmt (#1547)
Unify the msg of 'Memory exceed limit' (#1737)
Encapsulate HLL logic (#1756)
Make CpuInfo::get_current_core work (#1773)
Refactor alter job (#1695)
Add parallel_exchange_instance_num to set parallel after exchange (#1788)
Resolve reduce/reduce conflict in our syntax (#1811)
Limit the max version to cumulative compaction (#1813)
Check file descriptor number is larger than 65536 upon start (#1819)
Check buckets limit: buckets > 0 when adding partition (#1855)
SQL Support
Multiple Columns Partition
When creating a table with the OLAP engine, users can specify multiple partition columns.
For example:
PARTITION BY RANGE(`date`, `id`)
(
PARTITION `p201701_1000` VALUES LESS THAN ("2017-02-01", "1000"),
PARTITION `p201702_2000` VALUES LESS THAN ("2017-03-01", "2000"),
PARTITION `p201703_all` VALUES LESS THAN ("2017-04-01")
)
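One way to read these bounds is as tuples compared column by column. The Python sketch below routes a row to a partition on that basis. It is an illustration, not Doris code, and it rests on two assumptions: the id column is treated as an integer, and the value omitted in the last partition is treated as the minimum value of that column.

```python
from bisect import bisect_right

MIN_VALUE = float("-inf")  # stand-in for the omitted bound value (assumption)

# Upper bounds (exclusive) taken from the example above, in declaration order.
partitions = [
    ("p201701_1000", ("2017-02-01", 1000)),
    ("p201702_2000", ("2017-03-01", 2000)),
    ("p201703_all", ("2017-04-01", MIN_VALUE)),
]
upper_bounds = [bound for _, bound in partitions]

def find_partition(date, id_):
    """Return the first partition whose upper bound is greater than (date, id_)."""
    # Tuples compare column by column; ISO dates compare correctly as strings.
    index = bisect_right(upper_bounds, (date, id_))
    return partitions[index][0] if index < len(partitions) else None

early = find_partition("2017-01-15", 2000)    # date alone decides
boundary = find_partition("2017-02-01", 1000) # equal to the first bound, so next partition
march = find_partition("2017-03-15", 5)
outside = find_partition("2017-04-01", 1)     # beyond every bound
```

Note that the second column only matters when the first column equals a bound value, which is why a row dated 2017-01-15 with id 2000 still lands in p201701_1000.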
More Push Down
Predicates can now be pushed down past aggregation, window, and sort operators. This filters data as early as possible,
which can improve query performance.
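The effect of pushing a predicate past an aggregation can be shown with a small Python sketch. The data and function names are made up, and the equivalence shown holds because the predicate refers only to the grouping column; it says nothing about which predicates the Doris planner actually pushes down.

```python
from collections import defaultdict

# Hypothetical (city, page views) rows.
rows = [
    ("beijing", 10),
    ("shanghai", 7),
    ("beijing", 5),
    ("shenzhen", 3),
]

def sum_by_city(data):
    totals = defaultdict(int)
    for city, pv in data:
        totals[city] += pv
    return dict(totals)

# Without push down: aggregate every row, then filter the groups.
aggregated = sum_by_city(rows)
filter_after = {c: t for c, t in aggregated.items() if c == "beijing"}

# With push down: filter on the grouping key first, then aggregate fewer rows.
filtered_rows = [r for r in rows if r[0] == "beijing"]
filter_before = sum_by_city(filtered_rows)
```

Both plans return the same result, but the second one aggregates two rows instead of four.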
Others
Support TIME type and timediff function (#1505)
Show load statement support offset (#1531)
Fix wrong results from the <=> operator and the IN operator
Add PreAgg Hint (#1617)
Bug-fix: error result of union stmt (#1758)
Fix bug: unknown column from the inline view (#1770)
Support table comment and column comment for view (#1799)
Support granting the GRANT privilege on a database or table (#1472)
Fix bug: Remove conjuncts for empty set node (#1840)
Add an ALTER operation to change distribution type from RANDOM to HASH (#1823)
Support cast datetime to decimal (#1849)
Enable StringLiteral cast to Varchar (#1846)
Support hll_empty function (#1825)
Fix NPE error when creating table with bool column (#1864)
Load
Function and Where clause in Broker Load
Users can specify column mapping functions and a WHERE clause in Broker Load.
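Conceptually, a load with mapping functions and a filter derives target columns from the source columns and then keeps only the rows that pass the filter. The Python sketch below illustrates that flow. Everything in it is hypothetical: the column names, the functions, and in particular the choice to evaluate the filter on the mapped values are assumptions, not a description of Broker Load semantics.

```python
# Hypothetical source rows as read from a file (all values are text).
source_rows = [
    {"ts": "2019-09-01 10:00:00", "city": "beijing", "pv": "3"},
    {"ts": "2019-09-01 11:00:00", "city": "shanghai", "pv": "0"},
    {"ts": "2019-09-02 09:00:00", "city": "beijing", "pv": "8"},
]

# Function map: target column -> function of the source row.
column_functions = {
    "dt": lambda row: row["ts"][:10],  # derive a date column from the timestamp
    "pv": lambda row: int(row["pv"]),  # convert text to a number
}

def where(row):
    # Filter condition, evaluated here on the mapped row (assumption).
    return row["pv"] > 0

def load(rows):
    loaded = []
    for raw in rows:
        row = dict(raw)
        for column, fn in column_functions.items():
            row[column] = fn(raw)
        if where(row):
            loaded.append(row)
    return loaded

loaded_rows = load(source_rows)
```

In this sketch two of the three rows are loaded, each with an added dt column and a numeric pv.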
Others
Support timeout in stream load (#1480)
Add more profile for OlapTableSink (#1487)
Fix the duplicated request bug of mini load (#1504)
Add more logs and metrics to trace the broker load process (#1530)
Fix Bug: Load fail when we don't specify format type. (#1538)
Allow the null default in insert into stmt (#1556)
Fix Broker load hang when rpc failed (#1567)
Fix parquet directory have empty file (#1593)
Support Decimal Type when load Parquet File (#1595)
Broker load supports function (#1592)
Insert select Stmt keep the same semantics with mysql (#1626) (#1628)
Support read kafka partition from start (#1642)
Support checking error data row when doing INSERT (#1597)
Enable parsing columns from file path for Broker Load (#1582) (#1635)
Support setting timeout for stream load (#1670)
Reduce the number of partition info in BrokerScanNode param (#1675)
Add strict mode in Routine load, Stream load and Mini load (#1677)
Fix bug that two identical stream load jobs may both be executed successfully (#1690)
Add a loaded rows in SHOW LOAD result (#1686)
Error check about column which has no default value (#1728)
Optimize some kinds of load jobs (#1762)
Commit kafka offset (#1734)
Fix bug that routine load may mistakenly skip some data (#1832)
Support setting timezone for stream load and routine load (#1831)
Bug Fixes
Fix bug that replicas of a tablet may be located on same host (#1517)
Fix bug that bad replica can not be synchronized when report (#1634)
Fix bug that failed to get enough normal replica because path hash is not set. (#1714)
Take segments in singleton rowset into consideration upon cumulative compaction (#1866)
Fix BE crash when doing rollup (#1502)
Fix get wrong partition type for non partition table (#1503)
Fix bug that BE may crash when closing OlapTableSink (#1507)
Fix bug that WrapperField does not consider HLL column type when creating (#1514)
Fix variable arguments bug in UDAF (#1523)
Fix bug that user with LOAD_PRIV can see load job by SHOW LOAD stmt (#1528)
Fix Bug: Load fail when we don't specify format type. (#1538)
Fix the null pointer exception when ReplayOnAborted of txn in broker load (#1543)
Fix bug which make BE crash when load HLL type (#1552)
Fix bug that getting compatible type for TIME with other types fails (#1544)
Fix bugs of Broker load (#1546)
Fix bug that unable to delete replica if version is missing (#1585)
Fix errors when ES username and passwd is empty (#1601)
Fix bug that cluster balance may cause load job failed (#1581)
Fix bug: localtime is not thread-safe; changed to localtime_r (#1614)
Fix bug that encounter "No more data to read" when accessing broker (#1621)
Fix tablet restore api in BE (#1623) (#1624)
Fix bug: "SHOW DATA" or "SHOW PARTITIONS", the DATA-SIZE less than 0 (#1680)
Fix bug that failed to create a new partition when no partition in a table (#1688)
Fix bug that the calculation of disk usage percent is wrong (#1791)
Fix tablet meta tool command argument bug (#1810)
Seek block when starts a ScanKey (#1828)
Fix two digit year bug in to_days function (#1839)
Fix bug: compare column with equals rather than == (#1850)
Collect scanner's status when es_http_scan_node close (#1861)
Thirdparty
Bump thirdparty's BZIP2 version to 1.0.8 (#1559)
Credits
Thanks to everyone who contributed to this release!
@DDDDDDouble
@EmmyMiao87
@HangyuanLiu
@WingsGo
@Youngwb
@acelyc111
@chaoyli
@chenhao7253886
@cquptEthan
@gaodayue
@imay
@kangkaisen
@kangpinghuang
@lenmom
@liutang123
@lxqfy
@manannan2017
@morningman
@shgxwxl
@wangbo
@wkhappy1
@worker24h
@wubiaoi
@wuyunfeng
@xionglei0
@xy720
@yangzhg
@yiguolei
@yuanlihan