feat: copy files order by last modified time asc #8628

Merged (1 commit) on Nov 4, 2022

Conversation

BohuTANG
Member

@BohuTANG BohuTANG commented Nov 3, 2022

I hereby agree to the terms of the CLA available at: https://databend.rs/dev/policies/cla/

Summary

Copy files in asc order of file last modification time.

Closes #issue

@mergify mergify bot added the pr-feature this PR introduces a new feature to the codebase label Nov 3, 2022
@BohuTANG BohuTANG force-pushed the dev-copy-files-order branch from d87f1ff to d447fc5 Compare November 3, 2022 13:07
@BohuTANG BohuTANG requested a review from Xuanwo November 3, 2022 13:07
@BohuTANG BohuTANG marked this pull request as ready for review November 3, 2022 13:07
@Xuanwo
Member

Xuanwo commented Nov 3, 2022

Why do we need this change? Once stream copy is implemented, we won't be able to sort like this anymore.

@BohuTANG
Member Author

BohuTANG commented Nov 3, 2022

> Why do we need this change? Once stream copy is implemented, we won't be able to sort like this anymore.

If we have many files to copy, copying the oldest files first keeps the insert order the same as the file modification order.
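
A minimal sketch of the idea, not the code in this PR: `StageFile` and `sort_oldest_first` are hypothetical names, and the timestamp is assumed to be seconds since the Unix epoch. It orders a listed set of files by last modified time ascending before they are copied.

```rust
/// Hypothetical description of a staged file; not Databend's actual type.
#[derive(Debug, Clone, PartialEq)]
pub struct StageFile {
    pub path: String,
    /// Last modification time, assumed to be seconds since the Unix epoch.
    pub last_modified: u64,
}

/// Return the files ordered so the oldest (smallest `last_modified`) comes first.
/// `sort_by_key` is a stable sort, so files with equal timestamps keep
/// the order in which they were listed.
pub fn sort_oldest_first(mut files: Vec<StageFile>) -> Vec<StageFile> {
    files.sort_by_key(|f| f.last_modified);
    files
}
```

Note that this needs the whole listing in memory before the first file is copied.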

Member

@Xuanwo Xuanwo left a comment

Doesn't make sense to me. But worth a try.

@PsiACE
Member

PsiACE commented Nov 3, 2022

It feels possible to add a switch for it if necessary.

@Xuanwo
Member

Xuanwo commented Nov 3, 2022

> It feels possible to add a switch for it if necessary.

We can't sort the files if there are 100,000 of them. In the near future we will copy them as a stream, in the order returned by list.

@PsiACE
Member

PsiACE commented Nov 3, 2022

> We can't sort the files if there are 100,000 of them. In the near future we will copy them as a stream, in the order returned by list.

So I suggest leaving it up to the user to decide whether to use this change:

set order_by_last_modified_time = 1
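
As a rough sketch of how such a switch might behave (the setting above is only a suggestion in this thread, and `FileEntry` and `order_files` are hypothetical names, not Databend APIs), the flag would choose between sorting and keeping the listing order:

```rust
/// Hypothetical file entry: (path, last modified time in seconds since the Unix epoch).
pub type FileEntry = (String, u64);

/// When `order_by_last_modified_time` is true, return the files oldest first.
/// Otherwise leave them in the order returned by the listing.
pub fn order_files(
    mut files: Vec<FileEntry>,
    order_by_last_modified_time: bool,
) -> Vec<FileEntry> {
    if order_by_last_modified_time {
        // Stable sort: entries with equal timestamps keep their listing order.
        files.sort_by_key(|(_, mtime)| *mtime);
    }
    files
}
```

With the flag off, the function is a no-op, which matches the streaming case where files are copied in list order.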

@BohuTANG
Member Author

BohuTANG commented Nov 4, 2022

  1. Each file is large, for example 100 GB.
  2. We have many such files, and new ones are being produced all the time.
  3. The order is important: copy the oldest files first.

@BohuTANG BohuTANG merged commit a8bcc20 into databendlabs:main Nov 4, 2022