Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement optional dos2unix feature #5246

Closed

Conversation

MetalBlueberry
Copy link
Contributor

Hi again =)

Related to #4658, I've implemented the option to disable dos2unix fron the dvc configuration file.

We are starting to use dvc internally, but we are facing issues with pdf or image/tiff formats because they are accidentally identified as text. This means that every /r/n is replaced to /n and the md5 hash calculation no longer matches the file content.

We rolled out our own version of dvc internally with this patch, but it is hard to explain to people that they have to install it from another source. For that reason, I've invested a little bit more time to properly add the parameter and submit this PR. Hopefully, someone else will be able to enjoy this feature.

By default, the parameter dos2unix under core configuration will be True. So dvc will behave as usual for everyone. In case someone notices errors due to the dos2unix feature, they can simply set dos2unix to False.

I know that changing this value in an already existing repo may have weird issues, like dvc detecting changes on unchanged files. I think this won't be a usual case because either you have problems and disable it, or you never notice that this option exists. Usually, this value will be set at project creation and never modified again.

I will write the documentation page, but first, I would like to have your feedback.

Thanks for your support!

@MetalBlueberry MetalBlueberry changed the title implement optional to_unix feature Implement optional dos2unix feature Jan 11, 2021
Copy link
Contributor

@efiop efiop left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks pretty good, thank you!

Though this makes me think if we should indeed do the next step and just switch to no-dos2unix by default in 2.0, to not waste this chance. @pmrowla has been working on the new way to store data more efficiently, and we though that maybe it is better to wait for that to do this switch, but that big feature will likely become default only in 3.0 :(

If we want to do the switch in 2.0, we'll likely need to put a cache version/schema file in cache/remote, use it to verify that cache/remote is of expected version and provide a trivial migration script.

@efiop
Copy link
Contributor

efiop commented Jan 18, 2021

Just an update: @pmrowla is working on dropping CRLF for dvc 2.0. Closing this PR for now.

@efiop efiop closed this Jan 18, 2021
@MetalBlueberry
Copy link
Contributor Author

@efiop, Can you leave a reference to the @pmrowla work? My team will need to understand the new storage mechanism as we are maintaining our own server implementation to parse dvc files.
Thank you!

@efiop
Copy link
Contributor

efiop commented Jan 18, 2021

@MetalBlueberry Sure, we use #4658 to track progress on that.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants