repro: acknowledge that text file might come from different OS #6314
Comments
We had a few more cases where files pushed on Windows showed a change only in size when pulled and run with
I'm working with @SebbanSms on a project and we use different operating systems. It would be great to find a solution here.
Caused by #4658. The workaround for now is to use
@efiop Thanks for the reply. Do you have a guide in the DVC documentation on how to set this up? I admit, I'm quite unfamiliar with git config. Currently we use a And my guess is we would need to use
DVC stopped using dos2unix hashes, so in 3.x the hashes are different too. We recommend that users ensure outputs are generated with proper line endings. Closing.
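As an illustration of the "proper line endings" recommendation, here is a minimal sketch of a stage script that writes its text output with LF endings on every OS. The helper name `write_text_lf` and the file name `out.csv` are hypothetical and are not part of DVC. The only mechanism used is the standard `newline` argument of Python's built-in `open`.

```python
import os
import tempfile


def write_text_lf(path, lines):
    """Write lines with LF endings, regardless of the platform default.

    newline="\n" disables the "\n" -> os.linesep translation that text
    mode would otherwise apply on Windows.
    """
    with open(path, "w", newline="\n", encoding="utf-8") as fh:
        for line in lines:
            fh.write(line + "\n")


# Hypothetical output file, written to a temporary directory for the demo.
path = os.path.join(tempfile.mkdtemp(), "out.csv")
write_text_lf(path, ["id,value", "1,a"])

# Read back in binary mode to inspect the raw bytes on disk.
with open(path, "rb") as fh:
    raw = fh.read()
```

Written this way, the bytes on disk should be identical on macOS, Linux, and Windows, so both the size and the hash of the output match across machines.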
Discussed in #6313
Originally posted by SebbanSms July 14, 2021
I have some issues using an S3 bucket to push and pull input data with dvc.
I have a stage where the deps are outs of a prior stage.
When I reproduce the dvc.yaml and push the data on OSX (Mac) and then pull it on another machine using Windows, I reproduce the dvc.yaml again. dvc.lock shows the same hash but different file sizes for that stage's deps, and DVC reruns the stage completely, then also showing different file sizes and hashes on the outs.
git marks the diff in my IDE in the deps of that stage in the .lock file only for sizes:
Any idea how the file size in deps could change if the hash is the same?
What I tried so far:
deleting the files on OSX, pull them again from s3, reproducing dvc.yaml -> no changes detected, dvc.lock stays the same
deleting the files on Windows, pull them again from s3, reproducing dvc.yaml -> changes detected, dvc.lock shows different file sizes for the files in deps of that stage
On both systems, I definitely use the same git commit.
It seems that running repro on a different file system will retrigger existing stages just because the file size on a Unix system differs from the one on Windows.
We should probably acknowledge the OS when adding the file (same as we do when calculating the hash). The problem is that changing this behaviour now would probably affect all repositories on Windows that use text files.
Maybe we could do some additional check on Windows that would allow us to verify that a given file is unchanged even if the sizes do not match?
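A minimal sketch of what such a check could look like, assuming the hash is computed on content with CRLF collapsed to LF (the dos2unix-style behaviour mentioned in the comments). This is an illustration only, not DVC's actual implementation; `normalized_md5` and `same_text_content` are hypothetical names.

```python
import hashlib


def normalized_md5(data: bytes) -> str:
    """MD5 of the content after collapsing CRLF to LF (dos2unix-style)."""
    return hashlib.md5(data.replace(b"\r\n", b"\n")).hexdigest()


def same_text_content(a: bytes, b: bytes) -> bool:
    """Treat two text payloads as unchanged if they differ only in line endings."""
    return normalized_md5(a) == normalized_md5(b)


# The same logical file as it might exist on a Unix machine and on Windows.
unix_copy = b"id,value\n1,a\n2,b\n"
windows_copy = unix_copy.replace(b"\n", b"\r\n")

sizes_differ = len(unix_copy) != len(windows_copy)
hashes_match = normalized_md5(unix_copy) == normalized_md5(windows_copy)
unchanged = same_text_content(unix_copy, windows_copy)
```

Here the two copies differ in size by one byte per line while the normalized hash stays the same, which is the combination reported in dvc.lock. A size mismatch alone would therefore not have to count as a change when the normalized hashes agree.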