Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Consider refactory catalog trait #3098

Closed
BohuTANG opened this issue Nov 25, 2021 · 4 comments · Fixed by #3136
Closed

Consider refactory catalog trait #3098

BohuTANG opened this issue Nov 25, 2021 · 4 comments · Fixed by #3136
Assignees
Labels
A-query Area: databend query C-improvement Category: improvement

Comments

@BohuTANG
Copy link
Member

Summary

catalog trait is not clear, for example in catalog trait it has a build_table to return a dyn Table by table info and used by many places:

fn build_table(&self, table_info: &TableInfo) -> Result<Arc<dyn Table>>;

In database trait, it has get_table to return dyn Table by db_name and table_name and used by many places other:

async fn get_table(&self, db_name: &str, table_name: &str) -> Result<Arc<dyn Table>>;

Goal:

  1. Provide a unitfied trait api
  2. User can easily implement their own datasource under the api
@Veeupup
Copy link
Contributor

Veeupup commented Nov 25, 2021

agree with you ... they have a similar meaning and I get confused when first encountering these two trait methods

@BohuTANG BohuTANG added A-query Area: databend query C-improvement Category: improvement labels Nov 26, 2021
@BohuTANG BohuTANG self-assigned this Nov 26, 2021
@BohuTANG
Copy link
Member Author

Some initial thoughts:

  1. Lift the database trait to catalog trait, it something as: https://github.com/datafuselabs/databend/blob/2fd5b9c175ae6212d12c5115ab189333616d3271/query/src/catalogs/catalog.rs
    How should we define the functions of catalog?

  2. Rename datasources to storages, the directory layout looks like:

storages
├── csv
├── memory
├── null
├── parquet
├── github
├── system
└── fuse

In storages dir, the table engine defines:
How and where data is stored, where to write it and where read from.

@drmingdrmer
Copy link
Member

To my understanding, there two major pieces of information for data:

  • Where the data is,
  • and what the format of the data is.

Take csv as an example, it actually defines both of these two.

Maybe we need to clarify these two roles from bottom up, e.g.:
CsvTable should only care about the data format. where the data is should be provided by something like a DAL.

@BohuTANG
Copy link
Member Author

To my understanding, there two major pieces of information for data:

  • Where the data is,
  • and what the format of the data is.

Take csv as an example, it actually defines both of these two.

Maybe we need to clarify these two roles from bottom up, e.g.: CsvTable should only care about the data format. where the data is should be provided by something like a DAL.

I am trying on it:#3136
In storages, each table(CSV/Parquet/Memory etc.) is an storage engine, there is no concept of a database engine.
It looks a bit clearer now, but still needs some work, I guess it will be finished soon.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-query Area: databend query C-improvement Category: improvement
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants