Keep Research Data Private

cpfaff edited this page May 20, 2013 · 3 revisions

If you put primary research data into the datasafe folder of your repository and add it via git add data/my_research_data.csv, it will become public as soon as you push it to the remote repository. To prevent this, you can exclude such files by adding a line like the ones below to your repository's .gitignore file.

  • Open the .gitignore file in any editor

vim /path/to/your/paper_writing_folder/Open-Science-Paper/.gitignore

  • Add the path of the file or folder you would like to exclude, for example a single file

usr/statistics/datasafe/my_research_data.csv

  • Or use wildcards to exclude a bunch of files

usr/statistics/datasafe/*.csv
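An ignore rule only affects files that git does not track yet. If the data file was already added before the rule existed, it also has to be removed from the index. The following is a minimal sketch of both steps, run in a throwaway repository created with mktemp so that it does not touch your real project; the paths mirror the examples above and everything else is an assumption for illustration.

```shell
#!/bin/sh
# Sketch only: runs in a temporary repository, not in your real project.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .

# Create a data file at the path used in the examples above.
mkdir -p usr/statistics/datasafe
echo "id,value" > usr/statistics/datasafe/my_research_data.csv

# Simulate the mistake: the file is staged before any ignore rule exists.
git add usr/statistics/datasafe/my_research_data.csv

# Add the ignore rule with a wildcard.
echo "usr/statistics/datasafe/*.csv" >> .gitignore

# Remove the file from the index but keep it on disk.
# Note: this does not erase the file from commits that were already pushed.
git rm -q --cached usr/statistics/datasafe/my_research_data.csv

# git check-ignore exits with status 0 when the path matches an ignore rule.
if git check-ignore -q usr/statistics/datasafe/my_research_data.csv; then
    ignored=yes
else
    ignored=no
fi

# Count the files git still tracks inside the datasafe folder.
tracked=$(git ls-files usr/statistics/datasafe | wc -l | tr -d ' ')
echo "ignored=$ignored tracked_files=$tracked"
```

In a real project you would only need the git rm --cached and git check-ignore lines, run from the root of your repository.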

Ignoring a file like this has the disadvantage that the other authors you collaborate with do not get the file automatically when they update and merge their local branches. This can be a problem when you add a data file and make changes to the document that require the data to be in place to compile. If you push these changes to the remote repository and others update their branches, they will no longer be able to compile the document until they get the missing file from you. For projects where the data does not change while writing, this is not a big problem: you can distribute the files once by e-mail or through whatever other channel of communication you choose.
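One way to make a missing data file obvious before a confusing compile error appears is a small check that collaborators run first. The sketch below is an assumption and not part of the project: the function name check_data_files is hypothetical, and the file list has to be maintained by hand.

```shell
#!/bin/sh
# Hypothetical helper: succeeds only if every listed data file exists.
check_data_files() {
    missing=0
    for f in "$@"; do
        if [ ! -f "$f" ]; then
            # Report each missing file on stderr.
            echo "Missing data file: $f" >&2
            missing=$((missing + 1))
        fi
    done
    # Exit status 0 when nothing is missing, non-zero otherwise.
    [ "$missing" -eq 0 ]
}

# Example call with the path used on this page; extend the list as needed.
check_data_files usr/statistics/datasafe/my_research_data.csv \
    || echo "Ask the author who added the data for the missing files."
```

The list of required files is itself safe to commit, since it contains only paths and no data.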

But if a project's authors regularly add or change data files that are not supposed to be public, distributing the files over an additional communication channel can become a real pain. For these projects it might be a good alternative to turn the repository into a private one. At the time of writing, this costs a small monthly amount of money but excludes the public completely from the repository.

If you are worried about putting your data on a foreign server in general, you can also set up your own GitHub-like server for collaboration. There is a good open source alternative which you can find here. This takes a bit more effort to get your collaboration working, but your data stays completely in your hands.