Consider refactoring the catalog trait #3098

BohuTANG · 2021-11-25T09:45:17Z

Summary

The catalog trait is unclear. For example, it has a build_table method that returns a dyn Table from a table info and is used in many places:

fn build_table(&self, table_info: &TableInfo) -> Result<Arc<dyn Table>>;

The database trait has a get_table method that returns a dyn Table from db_name and table_name and is used in many other places:

async fn get_table(&self, db_name: &str, table_name: &str) -> Result<Arc<dyn Table>>;

Goal:

Provide a unified trait API
Users can easily implement their own data source under that API
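One way to picture the goal is a single lookup method on the catalog, with table construction kept as an implementation detail. The following is only a sketch under assumptions: `TableInfo`, `Table`, `InMemoryCatalog`, and the `String` error type are hypothetical stand-ins rather than databend's definitions, and the methods are synchronous to stay within the standard library (the real `get_table` is async).

```rust
use std::collections::HashMap;
use std::sync::Arc;

// Hypothetical stand-in for the real table metadata type.
#[derive(Clone, Debug, PartialEq)]
pub struct TableInfo {
    pub db: String,
    pub name: String,
    pub engine: String,
}

// Hypothetical minimal table interface.
pub trait Table: Send + Sync {
    fn name(&self) -> &str;
    fn engine(&self) -> &str;
}

// The only public way to obtain a table: look it up by database and table name.
pub trait Catalog: Send + Sync {
    fn get_table(&self, db_name: &str, table_name: &str) -> Result<Arc<dyn Table>, String>;
}

struct SimpleTable {
    info: TableInfo,
}

impl Table for SimpleTable {
    fn name(&self) -> &str {
        &self.info.name
    }
    fn engine(&self) -> &str {
        &self.info.engine
    }
}

// A user-defined data source: it only has to implement `Catalog`.
#[derive(Default)]
pub struct InMemoryCatalog {
    tables: HashMap<(String, String), TableInfo>,
}

impl InMemoryCatalog {
    pub fn register(&mut self, info: TableInfo) {
        self.tables.insert((info.db.clone(), info.name.clone()), info);
    }

    // Building a table from its info is private here, so callers have
    // one entry point instead of two overlapping ones.
    fn build_table(&self, info: &TableInfo) -> Arc<dyn Table> {
        Arc::new(SimpleTable { info: info.clone() })
    }
}

impl Catalog for InMemoryCatalog {
    fn get_table(&self, db_name: &str, table_name: &str) -> Result<Arc<dyn Table>, String> {
        let key = (db_name.to_string(), table_name.to_string());
        self.tables
            .get(&key)
            .map(|info| self.build_table(info))
            .ok_or_else(|| format!("unknown table {}.{}", db_name, table_name))
    }
}
```

With this shape, a caller registers table metadata once and then resolves tables only through `get_table`; whether `build_table` should stay public is a design question this sketch does not settle.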


Veeupup · 2021-11-25T13:55:01Z

Agree with you... they have similar meanings, and I got confused when I first encountered these two trait methods.

BohuTANG · 2021-11-27T10:03:59Z

Some initial thoughts:

Lift the database trait into the catalog trait, something like: https://github.com/datafuselabs/databend/blob/2fd5b9c175ae6212d12c5115ab189333616d3271/query/src/catalogs/catalog.rs
How should we define the functions of catalog?
Rename datasources to storages, the directory layout looks like:

storages
├── csv
├── memory
├── null
├── parquet
├── github
├── system
└── fuse

In the storages dir, the table engine defines:
How and where data is stored, where to write it, and where to read it from.
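As a rough illustration of "the table engine defines how and where data is stored", here is a sketch with two engines behind one trait. The `Table` trait, the `Row` alias, and the behavior of `NullTable` (discard writes, return nothing) are assumptions made for this example, not the actual databend interfaces.

```rust
use std::sync::Mutex;

// Hypothetical row type for the sketch.
pub type Row = Vec<String>;

// Each engine decides where writes go and where reads come from.
pub trait Table: Send + Sync {
    fn engine(&self) -> &'static str;
    fn append(&self, rows: Vec<Row>) -> Result<(), String>;
    fn read(&self) -> Result<Vec<Row>, String>;
}

// Memory engine: rows live in a process-local vector.
#[derive(Default)]
pub struct MemoryTable {
    rows: Mutex<Vec<Row>>,
}

impl Table for MemoryTable {
    fn engine(&self) -> &'static str {
        "memory"
    }
    fn append(&self, rows: Vec<Row>) -> Result<(), String> {
        let mut guard = self.rows.lock().map_err(|e| e.to_string())?;
        guard.extend(rows);
        Ok(())
    }
    fn read(&self) -> Result<Vec<Row>, String> {
        let guard = self.rows.lock().map_err(|e| e.to_string())?;
        Ok(guard.clone())
    }
}

// Null engine (assumed semantics): accepts writes and stores nothing.
pub struct NullTable;

impl Table for NullTable {
    fn engine(&self) -> &'static str {
        "null"
    }
    fn append(&self, _rows: Vec<Row>) -> Result<(), String> {
        Ok(())
    }
    fn read(&self) -> Result<Vec<Row>, String> {
        Ok(Vec::new())
    }
}
```

Code that holds a `dyn Table` does not need to know which engine is behind it; only the engine knows where the rows end up.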

drmingdrmer · 2021-11-29T08:00:24Z

To my understanding, there are two major pieces of information about data:

Where the data is,
and what the format of the data is.

Take csv as an example: it actually defines both of these.

Maybe we need to separate these two roles from the bottom up, e.g.:
CsvTable should only care about the data format. Where the data is should be provided by something like a DAL.
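The split between location and format could look roughly like the sketch below. `DataAccessor`, `MemoryAccessor`, and this `CsvTable` are hypothetical names for illustration only, and the CSV parsing is a naive comma split with no quoting support.

```rust
use std::collections::HashMap;
use std::sync::Arc;

// "Where the data is": a hypothetical data access layer (DAL).
pub trait DataAccessor: Send + Sync {
    fn read(&self, path: &str) -> Result<Vec<u8>, String>;
}

// An in-memory DAL; a local-disk or object-store DAL would implement the same trait.
#[derive(Default)]
pub struct MemoryAccessor {
    files: HashMap<String, Vec<u8>>,
}

impl MemoryAccessor {
    pub fn put(&mut self, path: &str, bytes: &[u8]) {
        self.files.insert(path.to_string(), bytes.to_vec());
    }
}

impl DataAccessor for MemoryAccessor {
    fn read(&self, path: &str) -> Result<Vec<u8>, String> {
        self.files
            .get(path)
            .cloned()
            .ok_or_else(|| format!("no such object: {}", path))
    }
}

// "What the format is": CsvTable only knows how to parse bytes into rows.
pub struct CsvTable {
    dal: Arc<dyn DataAccessor>,
    path: String,
}

impl CsvTable {
    pub fn new(dal: Arc<dyn DataAccessor>, path: &str) -> Self {
        CsvTable { dal, path: path.to_string() }
    }

    pub fn read_rows(&self) -> Result<Vec<Vec<String>>, String> {
        // Fetching bytes is delegated to the DAL.
        let bytes = self.dal.read(&self.path)?;
        let text = String::from_utf8(bytes).map_err(|e| e.to_string())?;
        // Naive parsing: split lines on commas, no quoting or escaping.
        Ok(text
            .lines()
            .filter(|line| !line.is_empty())
            .map(|line| line.split(',').map(|f| f.trim().to_string()).collect())
            .collect())
    }
}
```

Swapping the accessor changes where the bytes come from without touching the CSV logic, which is the separation described above.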

BohuTANG · 2021-11-29T08:04:22Z

> To my understanding, there are two major pieces of information about data:
>
> Where the data is,
> and what the format of the data is.
>
> Take csv as an example: it actually defines both of these.
>
> Maybe we need to separate these two roles from the bottom up, e.g.: CsvTable should only care about the data format. Where the data is should be provided by something like a DAL.

I am working on it in #3136.
In storages, each table (CSV/Parquet/Memory etc.) is a storage engine; there is no concept of a database engine.
It looks a bit clearer now but still needs some work. I guess it will be finished soon.
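To illustrate the idea that the engine belongs to the table rather than to the database, here is a sketch of a registry keyed by engine name. `StorageFactory`, `TableCreator`, and the two table types are hypothetical and are not taken from #3136.

```rust
use std::collections::HashMap;
use std::sync::Arc;

// Hypothetical minimal table interface.
pub trait Table: Send + Sync {
    fn name(&self) -> &str;
    fn engine(&self) -> &str;
}

pub struct MemoryTable {
    name: String,
}

impl Table for MemoryTable {
    fn name(&self) -> &str {
        &self.name
    }
    fn engine(&self) -> &str {
        "memory"
    }
}

pub struct NullTable {
    name: String,
}

impl Table for NullTable {
    fn name(&self) -> &str {
        &self.name
    }
    fn engine(&self) -> &str {
        "null"
    }
}

// A creator builds one table of a given engine from its name.
pub type TableCreator = fn(&str) -> Arc<dyn Table>;

fn new_memory(name: &str) -> Arc<dyn Table> {
    Arc::new(MemoryTable { name: name.to_string() })
}

fn new_null(name: &str) -> Arc<dyn Table> {
    Arc::new(NullTable { name: name.to_string() })
}

// Engines are registered per table type; a database is just a namespace.
#[derive(Default)]
pub struct StorageFactory {
    creators: HashMap<String, TableCreator>,
}

impl StorageFactory {
    pub fn with_defaults() -> Self {
        let mut factory = StorageFactory::default();
        factory.register("memory", new_memory);
        factory.register("null", new_null);
        factory
    }

    pub fn register(&mut self, engine: &str, creator: TableCreator) {
        self.creators.insert(engine.to_string(), creator);
    }

    pub fn create(&self, engine: &str, table_name: &str) -> Result<Arc<dyn Table>, String> {
        self.creators
            .get(engine)
            .map(|creator| creator(table_name))
            .ok_or_else(|| format!("unknown engine: {}", engine))
    }
}
```

Under this sketch, two tables in the same database can use different engines, and adding an engine means registering one more creator.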

BohuTANG added the A-query (Area: databend query) and C-improvement (Category: improvement) labels Nov 26, 2021

BohuTANG self-assigned this Nov 26, 2021

BohuTANG mentioned this issue Nov 28, 2021

STORAGE-3098: Catalog refactor #3136

Merged

BohuTANG closed this as completed in #3136 Nov 29, 2021
