Persist compaction time window to table manifest #1055

Closed
v0y4g3r opened this issue Feb 22, 2023 · 4 comments
Labels: C-feature (Category Features), good first issue (Good for newcomers)

@v0y4g3r
Contributor

v0y4g3r commented Feb 22, 2023

What problem does the new feature solve?

Every time we compact a table, we scan all SST files in level 0 and calculate a suitable time window for compaction.

let time_bucket = infer_time_bucket(&files);

Assuming that the workload of a given table remains consistent, we can persist the inferred time window to the table manifest to avoid recalculating it on every compaction.

Another benefit of this proposal is that it forces all SSTs in level 1 to share the same aligned time window, which makes it easier to find compactable SSTs in level 1.
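
To illustrate why a shared, aligned window simplifies the search, here is a minimal sketch. `SstMeta`, `align_to_window`, and `group_by_window` are hypothetical names used only for this example, not the engine's actual types, and timestamps are assumed to be plain seconds. It also assumes each file fits inside a single window, which is what a fixed window for level 1 would guarantee.

```rust
use std::collections::BTreeMap;

/// Hypothetical, simplified view of an SST file: a name and a time range in seconds.
#[derive(Debug, Clone, PartialEq)]
pub struct SstMeta {
    pub name: String,
    pub start_sec: i64,
    pub end_sec: i64,
}

/// Aligns a timestamp down to the start of the window that contains it.
/// `div_euclid` rounds toward negative infinity, so negative timestamps
/// are aligned downward as well.
pub fn align_to_window(ts_sec: i64, window_sec: i64) -> i64 {
    assert!(window_sec > 0, "window must be positive");
    ts_sec.div_euclid(window_sec) * window_sec
}

/// Groups files by the aligned start of their window. With one fixed window,
/// files that can be compacted together end up under the same key.
pub fn group_by_window(files: &[SstMeta], window_sec: i64) -> BTreeMap<i64, Vec<String>> {
    let mut groups: BTreeMap<i64, Vec<String>> = BTreeMap::new();
    for file in files {
        groups
            .entry(align_to_window(file.start_sec, window_sec))
            .or_default()
            .push(file.name.clone());
    }
    groups
}
```

With a two-hour window (7200 seconds), files starting at 0 and 3600 fall into the bucket keyed by 0, while a file starting at 7200 gets its own bucket.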

What does the feature do?

Persist the inferred time window to the table manifest on the first compaction and use that window for subsequent compactions.
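
A rough sketch of the "infer once, then reuse" behavior might look like the following. Everything here is an assumption made for illustration: `Manifest` is a stand-in holding a single in-memory field rather than the real table manifest, `infer_window` is a simplified substitute for `infer_time_bucket`, and the candidate window list is invented. A real implementation would also have to write the manifest change to storage.

```rust
/// Hypothetical stand-in for the table manifest; only the relevant field is modeled.
#[derive(Debug, Default)]
pub struct Manifest {
    /// Persisted compaction window in seconds, if one has been decided.
    pub compaction_time_window: Option<i64>,
}

/// Candidate window sizes in seconds (assumed for this sketch): 1h, 2h, 12h, 1d, 7d.
const CANDIDATE_WINDOWS: [i64; 5] = [3_600, 7_200, 43_200, 86_400, 604_800];

/// Simplified substitute for `infer_time_bucket`: picks the smallest candidate
/// window that covers the overall span of the given (start, end) ranges,
/// falling back to the largest candidate.
pub fn infer_window(ranges: &[(i64, i64)]) -> i64 {
    let min = ranges.iter().map(|r| r.0).min().unwrap_or(0);
    let max = ranges.iter().map(|r| r.1).max().unwrap_or(0);
    let span = max - min;
    CANDIDATE_WINDOWS
        .iter()
        .copied()
        .find(|w| *w >= span)
        .unwrap_or(CANDIDATE_WINDOWS[CANDIDATE_WINDOWS.len() - 1])
}

/// Returns the stored window if present; otherwise infers one from the
/// level 0 file ranges and records it so later compactions reuse it.
pub fn resolve_window(manifest: &mut Manifest, ranges: &[(i64, i64)]) -> i64 {
    *manifest
        .compaction_time_window
        .get_or_insert_with(|| infer_window(ranges))
}
```

Once the first call has stored a window, later calls return it unchanged even if the level 0 files would now suggest a different size, which is the consistency trade-off the proposal accepts.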

Implementation challenges

We can add a compaction_time_window field to TableOptions:

pub struct TableOptions {
    /// Size of the memtable.
    pub write_buffer_size: Option<ReadableSize>,
    /// Time-to-live of table. Expired data will be automatically purged.
    #[serde(with = "humantime_serde")]
    pub ttl: Option<Duration>,
    /// Extra options that may not be applicable to all table engines.
    pub extra_options: HashMap<String, String>,
}
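
As a sketch of where the new field could sit, the struct might be extended as below. This is not the final definition: the `Option<Duration>` type for `compaction_time_window` is an assumption, `write_buffer_size` uses plain bytes instead of `ReadableSize`, the serde attributes are left out so the example depends only on the standard library, and the `window_or` helper is hypothetical.

```rust
use std::collections::HashMap;
use std::time::Duration;

/// Simplified, std-only variant of `TableOptions` with the proposed field added.
#[derive(Debug, Clone, Default, PartialEq)]
pub struct TableOptions {
    /// Size of the memtable in bytes (the original uses `ReadableSize`).
    pub write_buffer_size: Option<u64>,
    /// Time-to-live of table. Expired data will be automatically purged.
    pub ttl: Option<Duration>,
    /// Proposed field: the compaction time window. `None` means no window
    /// has been inferred or specified yet. The type is an assumption.
    pub compaction_time_window: Option<Duration>,
    /// Extra options that may not be applicable to all table engines.
    pub extra_options: HashMap<String, String>,
}

impl TableOptions {
    /// Hypothetical helper: prefer the stored window, else use the inferred one.
    pub fn window_or(&self, inferred: Duration) -> Duration {
        self.compaction_time_window.unwrap_or(inferred)
    }
}
```

Keeping the field optional lets existing tables, which have no stored window, continue to fall back to inference.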

We can also allow users to specify this option manually when creating a table, using a WITH clause like:

CREATE TABLE monitor (
    host STRING,
    ts BIGINT,
    cpu DOUBLE DEFAULT 0,
    memory DOUBLE,
    TIME INDEX (ts),
    PRIMARY KEY(host)
) ENGINE=mito WITH(regions=1, ttl='7days', compaction_time_window='2hours');
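
To show how a value such as '2hours' from the WITH clause could be turned into a duration, here is a simplified sketch. The `parse_window` function is a hand-written stand-in that accepts only a single `<integer><unit>` pair; a real implementation would more likely reuse the same humantime-style parsing already applied to `ttl`. The function names and the assumption that WITH options arrive as a string map are illustrative only.

```rust
use std::collections::HashMap;
use std::time::Duration;

/// Parses a single "<integer><unit>" value such as "2hours" or "7days".
/// Returns `None` for anything it does not recognize.
pub fn parse_window(s: &str) -> Option<Duration> {
    let s = s.trim();
    // Index of the first non-digit character, where the unit begins.
    let split = s.find(|c: char| !c.is_ascii_digit())?;
    let (num, unit) = s.split_at(split);
    let n: u64 = num.parse().ok()?;
    let secs_per_unit: u64 = match unit.trim() {
        "s" | "sec" | "secs" | "second" | "seconds" => 1,
        "m" | "min" | "mins" | "minute" | "minutes" => 60,
        "h" | "hour" | "hours" => 3_600,
        "d" | "day" | "days" => 86_400,
        _ => return None,
    };
    // `checked_mul` guards against overflow on very large inputs.
    n.checked_mul(secs_per_unit).map(Duration::from_secs)
}

/// Looks up `compaction_time_window` in the WITH options, assumed here to be
/// a plain string map, and parses it if present.
pub fn window_from_with_options(opts: &HashMap<String, String>) -> Option<Duration> {
    opts.get("compaction_time_window")
        .and_then(|value| parse_window(value))
}
```

For the statement above, the option value '2hours' would resolve to 7200 seconds; a missing or unparsable value yields `None`, leaving the engine free to infer the window on the first compaction.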
@v0y4g3r v0y4g3r added good first issue Good for newcomers C-feature Category Features labels Feb 22, 2023
@v0y4g3r v0y4g3r mentioned this issue Feb 22, 2023
15 tasks
@etolbakov
Collaborator

Hi @v0y4g3r,
I would like to give it a go if you don't mind.

@killme2008
Contributor

@v0y4g3r Have we finished this job?

@v0y4g3r
Contributor Author

v0y4g3r commented Apr 11, 2023

@v0y4g3r Have we finished this job?

Not yet. We still need to finish the compaction window persistence part. I will resolve this ASAP.

@killme2008
Copy link
Contributor

ping @v0y4g3r

@v0y4g3r v0y4g3r closed this as completed May 10, 2023