Contributing Format Samples Without Needing to Know the Command Line

Ross Spencer edited this page Nov 3, 2017 · 14 revisions

Contributing to the Open Preserve file format corpus helps the community continue to test its format-based tools, e.g. those for format identification or validation. The corpus can also be used to test repository workflows from transfer through to SIP (submission information package) and AIP (archival information package) generation.

Contributing can be done from the command line with tools such as Git, but it isn't always feasible to have Git installed, e.g. in government environments, and not everyone knows how to use it.

Fortunately it is all possible via GitHub itself.

Pre-requisites

  • A GitHub account
  • You know what GitHub is
  • You might know what Git is
  • A set of files to upload that are well understood and safe for others to use
  • A willingness to waive all rights to your contribution using a CC0 License

Steps to Follow

Step 1:

Summary: Locate the format-corpus repository. It is where we will begin our journey.

If you're reading this Wiki then you've pretty much already completed Step 1. Find the Open Preserve Format Corpus:

OPF Format Corpus

Step 2:

Summary: Fork the repository to our own GitHub account. This is the way GitHub manages submissions to a project from an external source. This way we don't have to be part of the Open Preservation Foundation but we can still contribute. We can also make additions to the repository without impacting anyone else using the format-corpus.

Now you have to fork the repository into your existing, or new, GitHub account. A fork is like cloning, except the folks at Git already used the word clone for something else!

Choose to clone the repository

GitHub will give you a choice of repositories to fork into if you have more than one.

Choose your repository

You can confirm that the repository has been forked by looking in the top-left corner of the user interface. It will read:

 forked from openpreserve/format-corpus

Confirmation that the repository has been forked

Step 3:

Summary: Create a 'branch' in your newly forked repository for each discrete set of files that you want to contribute. In most cases, this will just be a small one-off collection.

The process we are following is called 'Creating a Pull Request' and it is central to open source development using Git and GitHub. In fact, once you crack this, you'll be one step closer to contributing to a whole host of projects!

Key to a pull request is a 'branch'. A branch is where all work related to a single pull request is saved (committed). Keeping to one branch per pull request isn't always feasible, but for our format corpus there should be little or no issue.

First, select the branch drop-down menu, and provide a name. Here we've gone for:

 disk-images/hfs/artefactual

It encapsulates:

 [what]/[what in more detail]/[whom]

Creating and naming a branch
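
The naming pattern above can also be sketched in a few lines of Python, purely as an illustration. The function name `branch_name` and the tidying rules (lower-casing, hyphens in place of spaces) are assumptions made for this sketch; the guide itself only asks for the three parts separated by slashes.

```python
import re


def branch_name(what: str, detail: str, whom: str) -> str:
    """Join three parts into a [what]/[what in more detail]/[whom] name."""
    parts = []
    for part in (what, detail, whom):
        # Assumption: lower-case each part and replace runs of anything
        # other than letters and digits with a single hyphen.
        slug = re.sub(r"[^a-z0-9]+", "-", part.strip().lower()).strip("-")
        if not slug:
            raise ValueError(f"empty name component: {part!r}")
        parts.append(slug)
    return "/".join(parts)


print(branch_name("Disk Images", "HFS", "Artefactual"))
# prints: disk-images/hfs/artefactual
```

You never need to run this to contribute; typing the name into the branch drop-down does the same job.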

GitHub confirms the new branch by switching to it automatically (see the same drop-down) and flashing up a blue banner at the top of the interface.

Confirmation of the branch being created

If you're happy the branch has been created, click the button that reads 'Create new file'.

Step 4:

Summary: Now that we've set up a new branch, we have to create a place to put our files. To do this, we will create a README.md file with information about what we're submitting. Once we've done this we will also be able to attach our file, or files.

We need somewhere to add our files on this branch. Let's create a structure with the following path:

 disk-images/hfs/artefactual/README.md

After clicking 'Create new file' we can type it in. Directory names will appear in blue next to the input-box after each slash.

Initiate a new README.md
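
For readers who like to see it spelled out, here is a small Python sketch of how the path typed above breaks down into directories and a file name. It only illustrates the structure; it is not how GitHub itself processes the input.

```python
from pathlib import PurePosixPath

# The path from this step: every segment before the last slash becomes a
# directory, and the final segment is the file itself.
readme = PurePosixPath("disk-images/hfs/artefactual/README.md")

directories = readme.parent.parts
file_name = readme.name

print(directories)  # ('disk-images', 'hfs', 'artefactual')
print(file_name)    # README.md
```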

README.md is the first file that we're creating. We CANNOT add our sample files without a README.md (or another simple file like it). We have added some information to it in the same step.

There are three reasons for this:

  1. README.md is rendered by GitHub so it provides immediate context for users accessing a repository ('.md' stands for Markdown and so it can be formatted using Markdown syntax).
  2. We can add contextual information that will help users to reuse our files.
  3. We have explicitly waived our rights to these files so that others can reuse them.
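
As a rough sketch of what such a README.md might contain, the Python below assembles a short Markdown file covering the three points above. The layout, the headings, the function name `build_readme` and the example file names are all hypothetical; the format-corpus does not prescribe this structure, so adapt it to whatever best describes your files.

```python
def build_readme(title: str, description: str, contributor: str, files: list[str]) -> str:
    """Assemble a small Markdown README (hypothetical layout, Python 3.9+)."""
    lines = [
        f"# {title}",
        "",
        description,
        "",
        f"Contributed by: {contributor}",
        "",
        "## Files",
        "",
    ]
    # One Markdown bullet per sample file.
    lines += [f"* `{name}`" for name in files]
    lines += [
        "",
        "## License",
        "",
        "These files are released under the Creative Commons CC0 1.0 "
        "Universal Public Domain Dedication.",
    ]
    return "\n".join(lines) + "\n"


# Example values only; the file names are placeholders.
print(build_readme(
    "HFS disk images",
    "Two sample HFS disk images for testing format identification tools.",
    "Artefactual",
    ["example-1.img", "example-2.img"],
))
```

The printed text can be pasted straight into the GitHub editor when creating README.md.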

Once you've typed this information and a commit message (also highlighted), click 'Commit changes'.

A commit message is programmer speak for adding a provenance note or audit trail to the files we're working with. It helps them to see where something has changed, and how it was changed in the past if that information is ever needed.

Step 5:

Summary: We have a small area to upload files to now. Let's do that!

Now that we have this new structure in our forked, and branched, repository, we can add our sample files to that location. In the image below I am adding two disk-images on behalf of Artefactual.

Adding two files to the repository

Uploads might be quite familiar to many through other tools like Dropbox or Google Drive. We just drag-and-drop from our computer.

GitHub has a flashy progress bar to help us see how we're doing. Like before, add a commit message and click 'Commit changes'.

Step 6:

Summary: GitHub is going to help us. It knows that we might want to submit a pull request so let's click one of those options to see what happens.

We're almost done. The process up until now will help GitHub recognize that we've changed the state of this repository (done something to it), and now we might want to submit a pull request.

The most helpful thing GitHub does for us is provide more than one way to submit a pull request: the big green button in the middle reading 'Compare & pull request', and the little gray button next to our branch drop-down reading 'New pull request'.

The tab reading 'Pull requests (0)' is for requests against this repository, which is technically ours even though it is a full copy of the Open Preserve one. It is unlikely this will ever have a positive count next to it for this copy. If it does, explore it. If it is useful, consider how to work with it. If not, GitHub allows you to communicate with the person who submitted it, so you can let them know the right place to go.

Pull request options

Click 'Compare & pull request'.

Step 7:

Summary: Now we're going to review what we've done to the repository so far and submit it for consideration to the Open Preserve Format-Corpus.

The 'Open a pull request' dialog will open and there's a lot to look at:

Opening a pull request

There is room here to:

  • Add a title for your pull request.
  • Add a message annotating your changes.
  • Review your changes.

NB: GitHub has made this as easy as possible, but if you don't see the same options, or your pull-request isn't showing, maybe check out the options under:

'Create a new pull request by comparing changes across two branches. If you need to, you can also compare across forks.'
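
If the buttons are hard to find, GitHub's compare page can also be opened directly by address. The sketch below builds such an address in Python. It rests on a few assumptions: that the base branch of the original repository is called `master`, that `your-username` stands in for your own account name, and that the helper `compare_url` is just an illustrative name. Check the branch drop-down on the original repository for its actual default branch.

```python
from urllib.parse import quote


def compare_url(base_owner: str, repo: str, base_branch: str,
                head_owner: str, head_branch: str) -> str:
    """Build a GitHub address comparing a branch on a fork with the original."""
    # quote() leaves "/" unescaped by default, so branch names that contain
    # slashes keep their shape in the address.
    return (
        f"https://github.com/{base_owner}/{repo}/compare/"
        f"{quote(base_branch)}...{quote(head_owner)}:{quote(head_branch)}"
    )


print(compare_url(
    "openpreserve", "format-corpus", "master",    # "master" is an assumption
    "your-username", "disk-images/hfs/artefactual",  # placeholder account name
))
```

Pasting the printed address into a browser should bring up the same comparison view that the 'compare across forks' option leads to.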

Once you're happy that what you see is what you've done (a common gotcha is accidentally doing more on the branch than anticipated), click 'Create pull request'.

NB: We're also working under the premise that your changes will result in the message (in the image above) 'Able to be merged'. It is unlikely that this will not be the case, but if it does happen, it might mean you've changed a part of the repository that has since been changed by someone else. It might also mean that your changes are not atomic enough, that is, much broader in scope than perhaps they should be. Consider reviewing the work done in the original repository since you forked it (and maybe forking the repository again fresh). Or consider splitting your pull-request into smaller ones.

Step 8:

Summary: Let's take a look again at our submission in the original repository. We can use this page in GitHub to communicate with the repository maintainers.

Finally, you can see your pull request in the original repository. Remember the tab 'Pull requests (0)'? Well, you've added one to whatever that number was.

A complete pull request

Pretty cool eh?

Step 9:

Summary: Keep an eye on your emails! If the maintainer has any questions you'll see an update there. Importantly, when the pull-request is accepted, you'll get an update too, and the process will be complete.

We're not done just yet!

The maintainer of the repository will take a look at your submission and attempt to merge it with the original. There may be further communications or a note of thanks. Keep an eye on your emails to keep up to date with the work and continue to participate if needed.

Et voilà.

We've just done our bit for the community and the tools we're all using.

Well done!

*GIF is a #GIFITUP2016 entry by Lorena Colme (Rosa Fiori) from Lavis, Trento, Italy. Source material from the Yale University Art Gallery via ArtStor via DPLA.