Introduction to Git

The materials below are adapted from Software Carpentry's excellent lessons, which they have generously made available under the CC BY 4.0 license. For each section, we encourage you to visit Software Carpentry's original lesson page for more in-depth content. Our workshop lessons below are based on their lessons 2 to 6.

  1. Setting Up Git
  2. Creating a Repository
  3. Tracking Changes
  4. A Commit Workflow (2nd half of SWC's Tracking Changes)
  5. Exploring History
  6. Ignoring Things

Setting Up Git

When we use Git on a new computer for the first time, we need to configure a few things. Below are a few examples of configurations we will set as we get started with Git:

  • our name and email address,
  • what our preferred text editor is,
  • and that we want to use these settings globally (i.e. for every project).

On the command line, Git commands are written as git verb options, where verb is what we actually want to do and options is any additional information the verb may need. Here is how you would look up your global settings (you will get an error at first if nothing has been configured yet):

$ git config --global --list
user.name=Na-Rae Han
user.email=naraehan@gmail.com
core.editor=nano

Your name and email address need to be set up. Use these commands to set them on your machine:

$ git config --global user.name "Henry Higgins"
$ git config --global user.email "profhiggins@oxford.edu"

Please use your own name and email address instead of Prof. Higgins's. This user name and email will be associated with your subsequent Git activity, which means that any changes pushed to online git host servers such as GitHub in a later lesson will include this information.

One additional detail is your default editor. If you followed our installation instructions, it should already be set to nano or your own favorite text editor. If that is not the case, reset it as shown below.

$ git config --global core.editor "nano"
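
To rehearse these commands without touching your real settings, one option is to point HOME at a throwaway directory first. The sketch below assumes a POSIX shell with mktemp available; the temporary HOME and the variable name configured_name are our own assumptions for practice, not part of the workshop setup.

```shell
# Practice run: use a temporary HOME so the real ~/.gitconfig is left alone.
export HOME="$(mktemp -d)"      # assumption: throwaway home directory
unset XDG_CONFIG_HOME           # keep Git from picking another config location

git config --global user.name "Henry Higgins"
git config --global user.email "profhiggins@oxford.edu"
git config --global core.editor "nano"

# List everything stored at the global level.
git config --global --list

# Read back a single setting by naming its key.
configured_name="$(git config --global user.name)"
echo "$configured_name"
```

Naming a key without a value, as in the last command, reads the setting instead of writing it.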

Lastly, if you forget how a Git command works, you can list its options by using -h and open its full manual page by using --help:

$ git config -h

Creating a Repository

Once Git is configured, we can start using it. First, let's create a directory in the Desktop folder for our work and then move into that directory:

$ cd ~/Desktop
$ mkdir languages
$ cd languages

Then we tell Git to make languages a repository -- a place where Git can store versions of our files:

$ git init

It is important to note that git init will create a repository that includes subdirectories and their files -- there is no need to create separate repositories nested within the languages repository, whether subdirectories are present from the beginning or added later. Also, note that the creation of the languages directory and its initialization as a repository are completely separate processes.

If we use ls to show the directory's contents, it appears that nothing has changed:

$ ls

But if we add the -a flag to show everything, we can see that Git has created a hidden directory within languages called .git:

$ ls -a
.	..	.git

Git uses this special sub-directory to store all the information about the project, including all files and sub-directories located within the project's directory. If we ever delete the .git sub-directory, we will lose the project's history.
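
The points above, that one repository covers its subdirectories and that its data lives in .git, can be checked with a small sketch. It assumes a POSIX shell and works in a temporary directory rather than on the Desktop; the notes subdirectory and the variable names are hypothetical.

```shell
# Work in a scratch location so nothing on the Desktop is affected.
workdir="$(mktemp -d)"
mkdir "$workdir/languages"
cd "$workdir/languages"

git init --quiet

# The hidden .git directory is where the repository data is kept.
ls -a

# A subdirectory added later belongs to the same repository.
mkdir notes
cd notes
toplevel="$(git rev-parse --show-toplevel)"
echo "repository root: $toplevel"
```

git rev-parse --show-toplevel prints the root of the repository that contains the current directory, so running it from inside notes still reports the languages directory.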

We can check that everything is set up correctly by asking Git to tell us the status of our project:

$ git status
On branch master

Initial commit

nothing to commit (create/copy files and use "git add" to track)

Tracking Changes

Let's create a file called zulu.txt that contains some notes about the language. I'll use nano to edit the file; you can use your favorite plain-text editor if you have one. (Note: It does not have to be the core.editor you set globally earlier.)

$ nano zulu.txt

An editor window will open up. Type the text below into the zulu.txt file:

belongs to the Bantu language family

To save and exit nano, press Ctrl+X, then y to confirm that you want to save, and then Enter to accept the file name. zulu.txt now contains a single line, which we can view by running the cat ("concatenate") command:

$ cat zulu.txt
belongs to the Bantu language family

If we check the status of our project again, Git tells us that it’s noticed the new file:

$ git status
On branch master

Initial commit

Untracked files:
   (use "git add <file>..." to include in what will be committed)

	zulu.txt
nothing added to commit but untracked files present (use "git add" to track)

The "untracked files" message means that there's a file in the directory that Git isn't keeping track of. We can tell Git to track a file using git add:

$ git add zulu.txt

and then check that the right thing happened:

$ git status
On branch master

Initial commit

Changes to be committed:
  (use "git rm --cached <file>..." to unstage)

	new file:   zulu.txt

Git now knows that it's supposed to keep track of zulu.txt, but it hasn't recorded these changes yet. To get it to do that, we need to run git commit:

$ git commit -m "started notes on Zulu language"
[master (root-commit) f22b25e] started notes on Zulu language
 1 file changed, 1 insertion(+)
 create mode 100644 zulu.txt

When we run git commit, Git takes everything we have told it to save by using git add and stores a copy permanently inside the special .git directory. This permanent copy is called a commit and its short identifier is f22b25e in this example.

We use the -m flag (for "message") to record a short, descriptive, and specific comment that will help us remember later on what we did and why. If we just run git commit without the -m option, Git will launch nano (or whatever other editor we configured as core.editor) so that we can write a longer message.

If we run git status now:

$ git status
On branch master
nothing to commit, working directory clean

it tells us everything is up to date. If we want to know what we've done recently, we can ask Git to show us the project's history using git log:

$ git log
commit f22b25e3233b4645dabd0d81e651fe074bd8e73b
Author: Henry Higgins <profhiggins@oxford.edu>
Date:   Thu Aug 22 09:51:46 2018 -0400

    started notes on Zulu language

git log lists all commits made to a repository in reverse chronological order. The listing for each commit includes the commit's full identifier (which starts with the same f22b25e), the commit's author, when it was created, and the log message Git was given when the commit was created.
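
The whole cycle so far (untracked, staged, committed) can be replayed in a scratch repository. This is a minimal sketch, assuming a POSIX shell; it uses echo instead of nano so it can run unattended, sets the name and email only for the scratch repository, and uses git status --short, a compact form of the status output.

```shell
repo="$(mktemp -d)"
cd "$repo"
git init --quiet
# Identity for this scratch repository only (no --global).
git config user.name "Henry Higgins"
git config user.email "profhiggins@oxford.edu"

echo "belongs to the Bantu language family" > zulu.txt
git status --short      # "?? zulu.txt": untracked

git add zulu.txt
git status --short      # "A  zulu.txt": staged as a new file

git commit --quiet -m "started notes on Zulu language"
git status --short      # no output: nothing left to commit

# One line per commit: short identifier and message.
git log --oneline
commit_count="$(git rev-list --count HEAD)"
```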

Now suppose Prof. Higgins adds more information to the file:

$ nano zulu.txt
$ cat zulu.txt
belongs to the Bantu language family
spoken in South Africa

When we run git status now, it tells us that a file it already knows about has been modified:

$ git status
On branch master
Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git checkout -- <file>..." to discard changes in working directory)

	modified:   zulu.txt

no changes added to commit (use "git add" and/or "git commit -a")

The last line is the key phrase: "no changes added to commit". We have changed this file, but we haven't told Git we will want to save those changes (which we do with git add) nor have we actually saved them (which we do with git commit). Before taking those steps, it is good practice to always review our changes. We do this using git diff. This shows us the differences between the current state of the file and the most recently saved version:

$ git diff
diff --git a/zulu.txt b/zulu.txt
index df0654a..315bf3a 100644
--- a/zulu.txt
+++ b/zulu.txt
@@ -1 +1,2 @@
 belongs to the Bantu language family
+spoken in South Africa

The output is cryptic, but to break it down into pieces:

  1. The first line tells us that Git is producing output similar to the Unix diff command comparing the old and new versions of the file.
  2. The second line tells exactly which versions of the file Git is comparing; df0654a and 315bf3a are unique computer-generated labels for those versions.
  3. The third and fourth lines once again show the name of the file being changed.
  4. The remaining lines are the most interesting: they show us the actual differences and the lines on which they occur. In particular, the + marker in the first column shows where we added a line.

After reviewing our change, let's try to commit it:

$ git commit -m "added region information"
$ git status
On branch master
Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git checkout -- <file>..." to discard changes in working directory)

	modified:   zulu.txt

no changes added to commit (use "git add" and/or "git commit -a")

Whoops: Git won't commit because we didn't use git add first. Let's fix that. We add and then commit:

$ git add zulu.txt
$ git commit -m "added region information"
[master 34961b1] added region information
 1 file changed, 1 insertion(+)

Git insists that we add files to the set we want to commit before actually committing anything. Suppose we are working on multiple languages: say Zulu, Xhosa and Japanese, and just edited all of them to include region information. All these files then can be git-added and committed in one swoop, as a version that reflects the same update across all files. Additionally, staging allows us to commit our changes in stages and capture changes in logical portions rather than only large batches. For example, suppose we're adding a few citations to relevant research to our thesis. We might want to commit those additions, and the corresponding bibliography entries, but not commit some of our work drafting the conclusion (which we haven't finished yet).

To allow for this, Git has a special staging area where it keeps track of things that have been added to the current changeset but not yet committed. If you think of Git as taking snapshots of changes over the life of a project, git add specifies what will go in a snapshot (putting things in the staging area), and git commit then actually takes the snapshot and makes a permanent record of it (as a commit).
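
As a rough illustration of committing in logical portions, the sketch below edits two files but stages and commits only one of them. It assumes a POSIX shell and a scratch repository; xhosa.txt and its contents are hypothetical additions for the example.

```shell
repo="$(mktemp -d)"
cd "$repo"
git init --quiet
git config user.name "Henry Higgins"
git config user.email "profhiggins@oxford.edu"

echo "belongs to the Bantu language family" > zulu.txt
echo "belongs to the Bantu language family" > xhosa.txt
git add zulu.txt xhosa.txt
git commit --quiet -m "started notes on Zulu and Xhosa"

# Edit both files, but stage only the one that is ready.
echo "spoken in South Africa" >> zulu.txt
echo "TODO: check region" >> xhosa.txt
git add zulu.txt

staged="$(git diff --staged --name-only)"   # what the next commit will contain
unstaged="$(git diff --name-only)"          # edits left out of it
echo "staged: $staged, not staged: $unstaged"

git commit --quiet -m "added region information for Zulu"
remaining="$(git status --short)"           # xhosa.txt is still modified
```

After the commit, the unfinished edit to xhosa.txt is still in the working directory, ready to be staged whenever it is done.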

A Commit Workflow

Let's recap by trying another round, from start to finish. First, we'll add another line to the Zulu file, this time about word order:

$ nano zulu.txt
$ cat zulu.txt
belongs to the Bantu language family
spoken in South Africa
word order: SVO

Then see what's changed:

$ git diff
diff --git a/zulu.txt b/zulu.txt
index 315bf3a..b36abfd 100644
--- a/zulu.txt
+++ b/zulu.txt
@@ -1,2 +1,3 @@
 belongs to the Bantu language family
 spoken in South Africa
+word order: SVO

So far, so good: we've added one line to the end of the file (shown with a + in the first column). Now let's put that change in the staging area and see what git diff reports:

$ git add zulu.txt
$ git diff

It displays no difference: with no options, git diff compares the working directory against the staging area, and our change has already been staged. To show the differences between the last commit and what's staged, we need to specify the --staged flag:

$ git diff --staged
diff --git a/zulu.txt b/zulu.txt
index 315bf3a..b36abfd 100644
--- a/zulu.txt
+++ b/zulu.txt
@@ -1,2 +1,3 @@
 belongs to the Bantu language family
 spoken in South Africa
+word order: SVO
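
The three comparisons can be told apart with the --quiet option, which prints nothing and signals through the exit status whether any difference exists. This is a small sketch, assuming a POSIX shell and a scratch repository; the variable names are our own.

```shell
repo="$(mktemp -d)"
cd "$repo"
git init --quiet
git config user.name "Henry Higgins"
git config user.email "profhiggins@oxford.edu"

echo "belongs to the Bantu language family" > zulu.txt
git add zulu.txt
git commit --quiet -m "started notes on Zulu language"

echo "spoken in South Africa" >> zulu.txt
git add zulu.txt

# git diff --quiet exits 0 when there is no difference, 1 when there is one.
worktree_vs_stage="different"
git diff --quiet && worktree_vs_stage="same"        # working directory vs. staging area

stage_vs_commit="same"
git diff --staged --quiet || stage_vs_commit="different"   # staging area vs. last commit

worktree_vs_commit="same"
git diff --quiet HEAD || worktree_vs_commit="different"    # working directory vs. last commit

echo "$worktree_vs_stage / $stage_vs_commit / $worktree_vs_commit"
```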

Let's then save our changes through committing:

$ git commit -m "added word order info"
[master 005937f] added word order info
 1 file changed, 1 insertion(+)

check our status:

$ git status
On branch master
nothing to commit, working directory clean

and look at the history of what we've done so far. By now, our log has gotten longer, so you will likely be thrown into pagination. You can hit SPACE to page down, and q to quit:

$ git log
commit 005937fbe2a98fb83f0ade869025dc2636b4dad5
Author: Henry Higgins <profhiggins@oxford.edu>
Date:   Thu Aug 22 10:14:07 2018 -0400

    added word order info

commit 34961b159c27df3b475cfe4415d94a6d1fcd064d
Author: Henry Higgins <profhiggins@oxford.edu>
Date:   Thu Aug 22 10:07:21 2018 -0400

    added region information

commit f22b25e3233b4645dabd0d81e651fe074bd8e73b
Author: Henry Higgins <profhiggins@oxford.edu>
Date:   Thu Aug 22 09:51:46 2018 -0400

    started notes on Zulu language

Exploring History

As we saw in the previous lesson, we can refer to commits by their identifiers. You can refer to the most recent commit of the working directory by using the identifier HEAD.

We've been adding one line at a time to zulu.txt, so it's easy to track our progress by comparing versions against HEAD. Before we start, let's make one more change to zulu.txt, adding a line which unfortunately contains misinformation:

$ nano zulu.txt
$ cat zulu.txt
belongs to the Bantu language family
spoken in South Africa
word order: SVO
a close relative of Spanish

Now, let's see what we get.

$ git diff HEAD zulu.txt
diff --git a/zulu.txt b/zulu.txt
index b36abfd..0848c8d 100644
--- a/zulu.txt
+++ b/zulu.txt
@@ -1,3 +1,4 @@
 belongs to the Bantu language family
 spoken in South Africa
 word order: SVO
+a close relative of Spanish

This is the same as what you would get if you left out HEAD. The real benefit comes when we refer to previous commits. We do that by adding ~1 to refer to the commit one before HEAD.

$ git diff HEAD~1 zulu.txt

If we want to see the differences between older commits we can use git diff again, but with the notation HEAD~1, HEAD~2, and so on, to refer to them:

$ git diff HEAD~2 zulu.txt
diff --git a/zulu.txt b/zulu.txt
index df0654a..0848c8d 100644
--- a/zulu.txt
+++ b/zulu.txt
@@ -1 +1,4 @@
 belongs to the Bantu language family
+spoken in South Africa
+word order: SVO
+a close relative of Spanish

We could also use git show which shows us what changes we made at an older commit as well as the commit message, rather than the differences between a commit and our working directory that we see by using git diff.

$ git show HEAD~2 zulu.txt
commit f22b25e3233b4645dabd0d81e651fe074bd8e73b
Author: Henry Higgins <profhiggins@oxford.edu>
Date:   Thu Aug 22 09:51:46 2018 -0400

    started notes on Zulu language

diff --git a/zulu.txt b/zulu.txt
new file mode 100644
index 0000000..df0654a
--- /dev/null
+++ b/zulu.txt
@@ -0,0 +1 @@
+belongs to the Bantu language family

In this way, we can build up a chain of commits. The most recent end of the chain is referred to as HEAD; we can refer to previous commits using the ~ notation, so HEAD~1 means "the previous commit", while HEAD~123 goes back 123 commits from where we are now.
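
To see how the ~ notation walks back along the chain, here is a short sketch that rebuilds the three commits in a scratch repository and asks for the message of each one. It assumes a POSIX shell; echo stands in for editing with nano, and the variable names are our own.

```shell
repo="$(mktemp -d)"
cd "$repo"
git init --quiet
git config user.name "Henry Higgins"
git config user.email "profhiggins@oxford.edu"

echo "belongs to the Bantu language family" > zulu.txt
git add zulu.txt
git commit --quiet -m "started notes on Zulu language"

echo "spoken in South Africa" >> zulu.txt
git add zulu.txt
git commit --quiet -m "added region information"

echo "word order: SVO" >> zulu.txt
git add zulu.txt
git commit --quiet -m "added word order info"

# %s prints only the subject line of the commit message.
git log -1 --format=%s HEAD      # added word order info
git log -1 --format=%s HEAD~1    # added region information
git log -1 --format=%s HEAD~2    # started notes on Zulu language

# commit:path shows a file exactly as it was recorded in that commit.
oldest_version="$(git show HEAD~2:zulu.txt)"
echo "$oldest_version"
```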

All right! So we can save changes to files and see what we've changed—now how can we restore older versions of things? We need that, as we realize Zulu is in fact not related to Spanish and decide to scrap that line. Checking git status tells us that the file has been changed, but those changes haven't been staged:

$ git status
On branch master
Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git checkout -- <file>..." to discard changes in working directory)

	modified:   zulu.txt

no changes added to commit (use "git add" and/or "git commit -a")

We can put things back to the state of last commit by simply using git checkout HEAD filename:

$ git checkout HEAD zulu.txt
$ cat zulu.txt
belongs to the Bantu language family
spoken in South Africa
word order: SVO

As you might guess from its name, git checkout checks out (i.e., restores) an old version of a file. In this case, we're telling Git that we want to recover the version of the file recorded in HEAD, which is the last saved commit. If we want to go back even further, we can use a commit identifier instead:

$ git checkout f22b25e zulu.txt
$ cat zulu.txt
belongs to the Bantu language family
$ git status
On branch master
Changes to be committed:
  (use "git reset HEAD <file>..." to unstage)

	modified:   zulu.txt

Notice that the changes are in the staging area. If you decide to keep this restored version, you will need to complete the process by committing, at which point the restored version becomes the new HEAD. If not, you can go back to the last commit using git checkout HEAD filename:

$ git checkout HEAD zulu.txt
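
Both uses of git checkout on a file, discarding an uncommitted edit and bringing back an older version, can be rehearsed in a scratch repository. This is a sketch under the assumption of a POSIX shell; it adds the optional -- separator, which marks what follows as a file name, and the variable names are hypothetical.

```shell
repo="$(mktemp -d)"
cd "$repo"
git init --quiet
git config user.name "Henry Higgins"
git config user.email "profhiggins@oxford.edu"

echo "belongs to the Bantu language family" > zulu.txt
git add zulu.txt
git commit --quiet -m "started notes on Zulu language"
echo "spoken in South Africa" >> zulu.txt
git add zulu.txt
git commit --quiet -m "added region information"

# 1. Discard an uncommitted mistake by restoring the file from HEAD.
echo "a close relative of Spanish" >> zulu.txt
git checkout HEAD -- zulu.txt
after_discard="$(git status --short)"        # empty: file matches HEAD again

# 2. Bring back the version from one commit earlier; Git stages it for us.
git checkout HEAD~1 -- zulu.txt
staged="$(git diff --staged --name-only)"    # zulu.txt
cat zulu.txt

# 3. Change of mind: return to the latest committed version.
git checkout HEAD -- zulu.txt
line_count="$(grep -c '' zulu.txt)"          # counts the lines in the file
```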

Ignoring Things

What if we have files that we do not want Git to track for us, like backup files created by our editor or intermediate files created during data analysis? Let's create a few dummy files:

$ mkdir results
$ touch a.dat b.dat c.dat results/a.out results/b.out

and see what Git says:

$ git status
On branch master
Untracked files:
  (use "git add <file>..." to include in what will be committed)

	a.dat
	b.dat
	c.dat
	results/
nothing added to commit but untracked files present (use "git add" to track)

Putting these files under version control would be a waste of disk space. What's worse, having them all listed could distract us from changes that actually matter, so let's tell Git to ignore them.

We do this by creating a file in the root directory of our project called .gitignore:

$ nano .gitignore
$ cat .gitignore
*.dat
results/

These patterns tell Git to ignore any file whose name ends in .dat and everything in the results directory. (If any of these files were already being tracked, Git would continue to track them.)
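
When it is unclear why a file is or is not being ignored, git check-ignore can report the matching pattern. The sketch below assumes a POSIX shell and recreates the dummy files in a scratch repository; printf stands in for editing .gitignore with nano.

```shell
repo="$(mktemp -d)"
cd "$repo"
git init --quiet

mkdir results
touch a.dat b.dat c.dat results/a.out results/b.out

# Write the two patterns, one per line.
printf '%s\n' '*.dat' 'results/' > .gitignore

# Only .gitignore itself is still reported as untracked.
visible="$(git status --short)"
echo "$visible"

# -v shows which line of which ignore file matched each path.
git check-ignore -v a.dat results/a.out
```

git check-ignore exits with status 0 when at least one of the given paths is ignored, which also makes it usable in scripts.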

Once we have created this file, the output of git status is much cleaner:

$ git status
On branch master
Untracked files:
  (use "git add <file>..." to include in what will be committed)

	.gitignore
nothing added to commit but untracked files present (use "git add" to track)

The only thing Git notices now is the newly-created .gitignore file. You might think we wouldn't want to track it, but everyone we're sharing our repository with will probably want to ignore the same things that we're ignoring. Let's add and commit .gitignore:

$ git add .gitignore
$ git commit -m "Ignore data files and the results folder."
$ git status
On branch master
nothing to commit, working directory clean

As a matter of fact, the workshop's repository has its own .gitignore file, which is found here. You will see .ipynb_checkpoints and other common OS configuration files.