[skip untouched files] Enable skipping untouched files during materialize #137

Merged


Zyiqin-Miranda (Member)

This PR adds support for:

  1. Skipping untouched files during materialize.
    All manifest entries that do not contain records to be updated are considered untouched, and those entries are copied by reference.
  2. Exposing the ratio of untouched files to the total number of manifest entry files as a log line for now.
    A follow-up PR will emit the ratio as a metric.
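For illustration, the two pieces above can be sketched in a few lines of Python. This is a minimal sketch, not the code in this PR: it assumes each manifest entry can be identified by an index and that the set of indices holding records to update is already known from an earlier compaction step. `ManifestEntry`, `split_untouched`, and `untouched_ratio` are hypothetical names, not deltacat APIs.

```python
import logging
from dataclasses import dataclass
from typing import Iterable, List, Set, Tuple

logger = logging.getLogger(__name__)


@dataclass(frozen=True)
class ManifestEntry:
    # Hypothetical stand-in for a manifest entry: an index plus a file URL.
    file_index: int
    url: str


def split_untouched(
    entries: Iterable[ManifestEntry], touched_indices: Set[int]
) -> Tuple[List[ManifestEntry], List[ManifestEntry]]:
    """Partition entries into (touched, untouched).

    An entry is untouched when none of its records need updating, so it
    can be carried over by reference instead of being rewritten.
    """
    touched: List[ManifestEntry] = []
    untouched: List[ManifestEntry] = []
    for entry in entries:
        (touched if entry.file_index in touched_indices else untouched).append(entry)
    return touched, untouched


def untouched_ratio(untouched_count: int, total_count: int) -> float:
    # Guard against an empty manifest to avoid division by zero.
    return untouched_count / total_count if total_count else 0.0


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    entries = [ManifestEntry(i, f"s3://bucket/file-{i}.parquet") for i in range(4)]
    touched, untouched = split_untouched(entries, touched_indices={1, 3})
    ratio = untouched_ratio(len(untouched), len(entries))
    # Log the ratio only; emitting it as a metric is left for later.
    logger.info("Untouched file ratio: %.2f (%d/%d)", ratio, len(untouched), len(entries))
```

Keeping the ratio computation separate from the partitioning makes it easy to route the same number to a metrics sink later without touching the skip logic.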

Zyiqin-Miranda force-pushed the efficiency-improvement-setup branch from c2cdf71 to c633130 on June 13, 2023 23:32
valiantljk (Collaborator) left a comment

Overall LGTM; just a minor comment on how to persist the ratio so we can easily reuse it in subsequent runs.
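The reviewer's exact suggestion is not shown on this page. As one possible approach, the ratio could be written to a small JSON file keyed by partition and read back on the next run. The file layout and the `save_ratio`/`load_ratio` helpers below are hypothetical and are not what the PR implements.

```python
import json
from pathlib import Path
from typing import Optional


def save_ratio(path: Path, partition_id: str, ratio: float) -> None:
    # Read existing stats first so ratios for other partitions are preserved.
    stats = json.loads(path.read_text()) if path.exists() else {}
    stats[partition_id] = {"untouched_file_ratio": ratio}
    path.write_text(json.dumps(stats, indent=2))


def load_ratio(path: Path, partition_id: str) -> Optional[float]:
    # Returns None when no previous run recorded a ratio for this partition.
    if not path.exists():
        return None
    entry = json.loads(path.read_text()).get(partition_id)
    return entry["untouched_file_ratio"] if entry else None
```

A subsequent run could call `load_ratio` before materializing and, for example, use a high previous ratio as a hint that most files will again be copied by reference.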

deltacat/compute/compactor/compaction_session.py (outdated, resolved)
raghumdani (Collaborator) left a comment

Mostly looks good. Please check the high-level comments and a few minor comments.

deltacat/compute/compactor/compaction_session.py (outdated, resolved)
deltacat/compute/compactor/steps/materialize.py (9 threads, outdated, resolved)
Zyiqin-Miranda force-pushed the efficiency-improvement-setup branch from c633130 to bd8967d on June 23, 2023 00:22
valiantljk (Collaborator)

LGTM

valiantljk (Collaborator)

Please create issues for the TODOs and link them in the PR.

Zyiqin-Miranda (Member, Author)

Related issue tracking the TODO for integration tests: #144

Zyiqin-Miranda (Member, Author)

Related issue tracking moving the stage delta implementation to internal: #145

Support Repartition to split and organize the data into multiple groups
Zyiqin-Miranda force-pushed the efficiency-improvement-setup branch from bd8967d to cf41061 on June 26, 2023 18:49
raghumdani (Collaborator) left a comment

Looks good to me.

Zyiqin-Miranda (Member, Author)

@raghumdani, any more comments on this PR?

raghumdani (Collaborator)

> @raghumdani, any more comments on this PR?

Looks good to me! Thanks

Zyiqin-Miranda merged commit 2ef0c40 into ray-project:main on June 28, 2023

3 participants