I have searched the issues and found no similar issues.
Description
Currently, Doris stores data only on local disk, which makes reads and writes fast. However, not all data in a database is read or written frequently; most data is accessed mainly while it is new. Data that is no longer hot still occupies disk space. You could delete it, but some of it may become useful again later.
So cold data should be saved on cheaper storage, such as BOS, S3, or HDFS.
Cold data can then still be read when necessary, directly from remote storage.
Overall
Support remote storage: data will be moved to remote storage (BOS/S3) when it becomes cold.
Dynamic partitions are created continuously, so they need to be marked cold continuously, based on their creation time.
Metadata needs to stay local so that it can be read quickly. Data is then located and read through the metadata.
When cold data needs to be read, it is fetched from remote storage.
Remote storage needs to behave like local storage: cold data can be read, moved to trash, and deleted, but can't be appended to.
Detailed design
BE will resolve the mapping between local disk and remote storage.
Local disk will hold the metadata, which is used to find which data is needed.
Remote storage will hold the cold data, which will be read by BE.
                FE
                |
                BE
          |            |
        META          DATA
     LOCAL DISK   REMOTE STORAGE
Support remote storage
The remote storage configuration is set in the properties of CREATE/ALTER TABLE:
a. storage_medium is the storage medium for hot data.
b. storage_cold_medium is the destination storage that cold data will be moved to.
c. storage_cooldown_time is the time at which data becomes cold.
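A minimal sketch of how these three properties could interact. Only the property names come from this proposal. The values, the `target_medium` helper, and the reading of storage_cooldown_time as an absolute timestamp are assumptions made for illustration.

```python
from datetime import datetime

# Hypothetical values; only the property names are taken from the proposal.
table_properties = {
    "storage_medium": "SSD",
    "storage_cold_medium": "S3",
    "storage_cooldown_time": "2022-01-01 00:00:00",
}


def target_medium(properties: dict, now: datetime) -> str:
    """Return the medium the data should live on at time `now`.

    Assumption: storage_cooldown_time is an absolute timestamp after
    which the data counts as cold.
    """
    cooldown = datetime.strptime(
        properties["storage_cooldown_time"], "%Y-%m-%d %H:%M:%S"
    )
    if now >= cooldown:
        return properties["storage_cold_medium"]
    return properties["storage_medium"]
```

Before the cooldown time the helper reports the hot medium. From the cooldown time onward it reports the cold destination.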
Dynamic partition cold data
Dynamic partitions are created continuously, so the cooldown time must be derived from the partition time.
a. dynamic_partition.hot_partition_num is the number of most recent partitions that remain hot; older partitions will be set to cold.
b. dynamic_partition.storage_medium is the storage medium holding hot data.
c. dynamic_partition.storage_cold_medium is the destination storage for cold data.
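The hot/cold split driven by dynamic_partition.hot_partition_num can be sketched as follows. The `split_hot_cold` helper and the partition naming scheme (names that sort chronologically, such as p20220101) are assumptions, and the sketch ignores any partitions created ahead of time for future dates.

```python
def split_hot_cold(partition_names, hot_partition_num):
    """Split partitions into (hot, cold) lists.

    Assumption: partition names sort chronologically, so the last
    `hot_partition_num` names after sorting are the newest partitions.
    """
    ordered = sorted(partition_names)
    if hot_partition_num <= 0:
        # Nothing stays hot; every partition is a cooldown candidate.
        return [], ordered
    return ordered[-hot_partition_num:], ordered[:-hot_partition_num]


partitions = ["p20220103", "p20220101", "p20220102", "p20220104"]
hot, cold = split_hot_cold(partitions, 2)
```

Each time a new partition is created, the oldest hot partition falls out of the hot window and becomes a candidate for moving to the cold medium.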
Read cold data, meta will be local
When a SELECT touches cold data, BE first reads the metadata on local disk and chooses which data is needed.
Then the matching remote data is read and returned to BE.
SELECT * FROM TblPxy;
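The read path for such a query can be sketched as below: the locally held metadata is consulted first, and only segments whose key range matches are fetched from remote storage. The `SegmentMeta` structure, the min/max key pruning, and the dictionary-backed stores are illustrative assumptions, not the BE's actual data structures.

```python
from dataclasses import dataclass


@dataclass
class SegmentMeta:
    path: str        # key of the segment in its store
    min_key: int     # smallest key in the segment
    max_key: int     # largest key in the segment
    is_remote: bool  # True if the segment data lives on remote storage


def read_range(local_meta, local_store, remote_store, lo, hi):
    """Return keys in [lo, hi], pruning segments using local metadata only."""
    rows = []
    for meta in local_meta:
        # Prune with local metadata: no remote access for non-matching segments.
        if meta.max_key < lo or meta.min_key > hi:
            continue
        store = remote_store if meta.is_remote else local_store
        rows.extend(key for key in store[meta.path] if lo <= key <= hi)
    return rows


local_meta = [
    SegmentMeta("seg_hot", 100, 199, False),
    SegmentMeta("seg_cold_a", 0, 49, True),
    SegmentMeta("seg_cold_b", 50, 99, True),
]
local_store = {"seg_hot": [100, 150]}
# seg_cold_a is deliberately absent: fetching it would raise KeyError.
remote_store = {"seg_cold_b": [50, 75, 99]}
result = read_range(local_meta, local_store, remote_store, 60, 120)
```

Because pruning uses only local metadata, the query never touches seg_cold_a on the remote side, which is the reason for keeping metadata local.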
Cold data trash
When cold data needs to be dropped, it is moved to a trash path on remote storage, and that remote trash path is recorded in the local trash path.
The cleaner checks the local trash path. When it is time to delete, the remote data is deleted first, and then the local data.
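A minimal sketch of the cleaner's deletion order. The `clean_trash` helper, the `keep_seconds` retention parameter, and the dictionary layout are assumptions for illustration; the proposal only fixes the order (remote first, then local).

```python
def clean_trash(local_trash, remote_store, now, keep_seconds):
    """Delete expired trash entries, remote data first, then the local record.

    local_trash maps a local entry name to (remote_trash_path, trashed_at).
    remote_store maps a remote path to its data.
    """
    deleted = []
    for name, (remote_path, trashed_at) in list(local_trash.items()):
        if now - trashed_at < keep_seconds:
            continue  # not yet time to delete
        # Remote first: if this step fails, the local record survives
        # and the entry is retried on the next cleaner run.
        remote_store.pop(remote_path, None)
        del local_trash[name]
        deleted.append(name)
    return deleted


local_trash = {"t1": ("trash/t1", 0), "t2": ("trash/t2", 90)}
remote_store = {"trash/t1": b"x", "trash/t2": b"y"}
deleted = clean_trash(local_trash, remote_store, now=100, keep_seconds=60)
```

Deleting the remote data before the local record means that an interrupted cleanup leaves the local entry in place, so the remote data is not orphaned without any local reference to it.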