
0-byte file is the result of copying a file to itself with DVCFileSystem.get_file with any file larger than COPY_PBAR_MIN_SIZE #318

Open
adamliter opened this issue Dec 10, 2024 · 0 comments


Bug report

If you use DVCFileSystem's get_file method to copy a file to itself, the file is truncated to 0 bytes when its size is greater than COPY_PBAR_MIN_SIZE. If the file is smaller than COPY_PBAR_MIN_SIZE, the original file is left intact.

You end up with a 0-byte file because of this code here.
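The symptom matches a common pitfall, though this is an assumption that has not been checked against the dvc_objects source. A chunked copy that opens the destination with mode "wb" truncates it immediately. If the destination is the same file as the source, the first read from the source already returns EOF, so nothing gets written back. The hypothetical `naive_chunked_copy` below reproduces the effect using only the standard library. Its name and chunk size are illustrative, not DVC's.

```python
import os
import tempfile

CHUNK_SIZE = 1024 * 1024  # hypothetical chunk size, for illustration only


def naive_chunked_copy(src: str, dest: str) -> None:
    """Hypothetical chunked copy with no same-file check (not DVC's code)."""
    with open(src, "rb") as fsrc, open(dest, "wb") as fdest:
        # Opening dest with "wb" truncates it to 0 bytes. If dest is the
        # same file as src, the first read below already hits EOF.
        while True:
            buf = fsrc.read(CHUNK_SIZE)
            if not buf:
                break
            fdest.write(buf)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "model.ckpt")
    with open(path, "wb") as f:
        f.write(b"x" * 4096)
    before = os.path.getsize(path)
    naive_chunked_copy(path, path)  # copy the file onto itself
    after = os.path.getsize(path)
    print(before, after)  # 4096 0
```

For comparison, Python's own `shutil.copyfile` refuses this case and raises `shutil.SameFileError` rather than truncating the file.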

Current behavior

$ cd /tmp
$ mkdir dvc-test
$ cd dvc-test
$ pdm init --python cpython@3.12
$ git init
$ dvc init
$ git add .
$ git commit -m "initial commit"
$ truncate -s 2G model.ckpt
$ dvc add model.ckpt
$ git add .
$ git commit -m "trained model"
$ ls -lh model.ckpt
-rw-r--r--@ 1 adam.liter  wheel   2.0G Dec  9 17:21 model.ckpt

Then from Python (e.g., pdm run python):

from dvc.api import DVCFileSystem
fs = DVCFileSystem()
fs.get_file("model.ckpt", "model.ckpt")

Now go back to a shell and check the file size:

$ ls -lh model.ckpt
-rw-r--r--@ 1 adam.liter  wheel     0B Dec  9 17:25 model.ckpt

Expected behavior

The behavior of dvc_objects.fs.utils.copyfile should be the same for all files, regardless of size. In particular, copying a file to itself should leave it intact, not produce a 0-byte file, even when the file is larger than COPY_PBAR_MIN_SIZE.
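One way to get size-independent behavior is to detect the same-file case before opening anything for writing. The following is a minimal sketch under that assumption, not a proposed patch: `copyfile_guarded` is a hypothetical helper, and it treats copy-to-self as a silent no-op, which matches what the report describes for small files.

```python
import os
import shutil
import tempfile


def copyfile_guarded(src: str, dest: str) -> None:
    """Hypothetical copy helper that treats copying a file to itself as a no-op."""
    if os.path.isdir(dest):
        dest = os.path.join(dest, os.path.basename(src))
    # os.path.samefile compares device and inode numbers, so it also catches
    # symlinks and different spellings of the same path.
    if os.path.exists(dest) and os.path.samefile(src, dest):
        return
    with open(src, "rb") as fsrc, open(dest, "wb") as fdest:
        shutil.copyfileobj(fsrc, fdest)


with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "model.ckpt")
    with open(src, "wb") as f:
        f.write(b"x" * 4096)
    copyfile_guarded(src, src)  # copy to itself: file must survive
    self_size = os.path.getsize(src)
    other = os.path.join(tmp, "copy.ckpt")
    copyfile_guarded(src, other)  # ordinary copy still works
    copy_size = os.path.getsize(other)
    print(self_size, copy_size)  # 4096 4096
```

Raising an error, as `shutil.copyfile` does with `shutil.SameFileError`, would be an equally reasonable choice. What matters for this report is that the check runs before the destination is truncated, so the large-file and small-file paths agree.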
