Understanding old git branches
I will be refactoring and reorganising QuTiP’s internal data structures. Someone else previously attempted this large task but never quite completed it, and their work lives in a disused branch on their fork. In the intervening year or so, the codebase has moved on significantly, so GitHub now sounds the death knell:
This branch is 85 commits ahead, 366 commits behind qutip:master.
I want to know what changes they had made, without being inundated by unrelated changes on the master branch.
Let’s assume that the old branch of interest is called old-feature and lives in a forked repository which I have added as a remote called fork.
Getting the diff
The tool for showing changes is, predictably, git diff. However, I cannot simply run
git diff qutip/master fork/old-feature
because I end up with 201 files changed, 14255 insertions, 17882 deletions, since I also see all the changes made on master. Instead, I want to see only the changes that happened on fork/old-feature since it diverged from master. The point of divergence is the most recent common ancestor of the two branches, and I can find its hash using git merge-base with
git merge-base qutip/master fork/old-feature
so I can combine the two commands to get a more useful diff with
git diff $(git merge-base qutip/master fork/old-feature) fork/old-feature
In fact, this is such a useful combination that there is even a shorthand for it, git diff’s triple-dot notation:
git diff qutip/master...fork/old-feature
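As a sanity check, here is a minimal sketch that reproduces the situation in miniature. It assumes git and bash are available and builds a throwaway repository in which two hypothetical local branches, master and old-feature, stand in for qutip/master and fork/old-feature; a small commit_file helper defined in the script writes and commits one file at a time. The plain diff picks up master's unrelated file, while the merge-base diff and the triple-dot diff agree with each other and only show the feature's change.

```shell
#!/usr/bin/env bash
# Sketch: plain diff vs merge-base diff vs triple-dot diff.
# "master" and "old-feature" are hypothetical stand-ins for qutip/master and
# fork/old-feature; everything happens in a throwaway temporary repository.
set -euo pipefail

repo=$(mktemp -d)
trap 'rm -rf "$repo"' EXIT
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # call the first branch "master" on any git version
git config user.name "Example"
git config user.email "example@example.com"
git config commit.gpgsign false

# Write content ($2) to a file ($1), creating its directory if needed, and commit with message $3.
commit_file() {
    mkdir -p "$(dirname "$1")"
    echo "$2" > "$1"
    git add "$1"
    git commit -q -m "$3"
}

# Shared history that both branches start from.
commit_file core.txt core "initial commit"
fork_point=$(git rev-parse HEAD)

# Work done on the old feature branch.
git checkout -q -b old-feature
commit_file feature.txt feature "add feature"

# Unrelated work that master did in the meantime.
git checkout -q master
commit_file unrelated.txt unrelated "unrelated master change"

# Tip-to-tip diff: also reports master's unrelated.txt.
naive_files=$(git diff --name-only master old-feature)

# The merge base is the commit where old-feature split off from master.
base=$(git merge-base master old-feature)

# Diff from the merge base, spelled out and with the triple-dot shorthand.
via_merge_base=$(git diff "$base" old-feature)
via_triple_dot=$(git diff master...old-feature)
triple_dot_files=$(git diff --name-only master...old-feature)

echo "plain diff touches:"
echo "$naive_files"
echo "triple-dot diff touches:"
echo "$triple_dot_files"
```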
Searching the commit history
Using git diff, I can see the sum of all the changes to the code, but there may also be useful information stored in the commit messages. These are accessed, as always, through git log.
A quick glance at the manpages (side note: the manpage for a git subcommand is accessed by joining the two words with a hyphen, such as man git-log) tells us that git log understands two-dot (..) and three-dot (...) syntax that looks just like what we used with git diff. Here we must be very careful: git diff and git log treat the two forms in almost exactly opposite ways.
The three-dot form here is called the “symmetric difference” of two references. The set branch-a...branch-b now means all the commits that are ancestors of either branch-a or branch-b, but not of both. This means that we will get all the commits which happened on either branch since the two were split from each other. In other words, this is what git diff was doing before we put in the triple dots!
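The same kind of throwaway repository shows which commits the symmetric difference picks up. This sketch again assumes git and bash, with hypothetical branches master and old-feature standing in for the real ones; old-feature gets two commits after the split and master gets one. The --left-right option marks each commit with < when it is reachable only from the left-hand reference and > when it is reachable only from the right-hand one, so both branches' work shows up together.

```shell
#!/usr/bin/env bash
# Sketch: the symmetric difference master...old-feature as seen by git log.
# "master" and "old-feature" are hypothetical stand-ins for the real branches.
set -euo pipefail

repo=$(mktemp -d)
trap 'rm -rf "$repo"' EXIT
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example"
git config user.email "example@example.com"
git config commit.gpgsign false

# Write content ($2) to a file ($1), creating its directory if needed, and commit with message $3.
commit_file() {
    mkdir -p "$(dirname "$1")"
    echo "$2" > "$1"
    git add "$1"
    git commit -q -m "$3"
}

commit_file core.txt core "initial commit"

git checkout -q -b old-feature
commit_file feature.txt v1 "add feature"
commit_file feature.txt v2 "extend feature"

git checkout -q master
commit_file unrelated.txt x "unrelated master change"

# Commits on either side of the split, marked < (master side) or > (old-feature side).
sym_log=$(git log --left-right --format='%m %s' master...old-feature)
# Left and right commit counts, separated by a tab.
counts=$(git rev-list --left-right --count master...old-feature)

echo "$sym_log"
echo "$counts"
```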
Instead, we want the two-dot form, as in branch-a..branch-b. This form is the “range” notation, and means all commits which are ancestors of branch-b but not of branch-a. This way we only see the changes on branch-b since it was split off, even if branch-a also changed.
Our command to see the commits made only on the old branch, then, is
git log qutip/master..fork/old-feature
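Repeating the experiment with the range notation (same assumptions: git, bash, and hypothetical stand-in branches master and old-feature) shows that only old-feature's two commits are listed, and that swapping the two ends lists master's commit instead.

```shell
#!/usr/bin/env bash
# Sketch: the two-dot range only lists commits unique to the right-hand branch.
# "master" and "old-feature" are hypothetical stand-ins for the real branches.
set -euo pipefail

repo=$(mktemp -d)
trap 'rm -rf "$repo"' EXIT
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example"
git config user.email "example@example.com"
git config commit.gpgsign false

# Write content ($2) to a file ($1), creating its directory if needed, and commit with message $3.
commit_file() {
    mkdir -p "$(dirname "$1")"
    echo "$2" > "$1"
    git add "$1"
    git commit -q -m "$3"
}

commit_file core.txt core "initial commit"

git checkout -q -b old-feature
commit_file feature.txt v1 "add feature"
commit_file feature.txt v2 "extend feature"

git checkout -q master
commit_file unrelated.txt x "unrelated master change"

# Commits reachable from old-feature but not from master.
feature_only=$(git log --format=%s master..old-feature)
feature_count=$(git rev-list --count master..old-feature)

# Swapping the ends gives the commits master made after the split instead.
master_only=$(git log --format=%s old-feature..master)

echo "$feature_only"
echo "$master_only"
```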
As always, I can pass pathspec arguments to git log to limit the commits to those that touch the files I ask for, so if I only want to look at changes to the tests, I can run
Here the -- is a separator which git (like many Unix utilities) uses to mark the end of the options and revisions, so that everything after it is treated as a file path. This is clearly useful in this case, because the directory qutip/tests looks a lot like the branch qutip/master!
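To see where the separator really matters, the sketch below (again assuming git and bash, with hypothetical branches master and old-feature) sets up a deliberately contrived collision: a directory called master alongside the branch master. A pathspec after -- restricts the range to commits that touch tests. A bare git log master is refused with an "ambiguous argument" error, whereas putting -- after the name forces it to be read as a revision and putting -- before it forces it to be read as a path.

```shell
#!/usr/bin/env bash
# Sketch: pathspecs after "--", and "--" resolving a branch/directory name clash.
# "master" and "old-feature" are hypothetical stand-ins for the real branches.
set -euo pipefail

repo=$(mktemp -d)
trap 'rm -rf "$repo"' EXIT
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example"
git config user.email "example@example.com"
git config commit.gpgsign false

# Write content ($2) to a file ($1), creating its directory if needed, and commit with message $3.
commit_file() {
    mkdir -p "$(dirname "$1")"
    echo "$2" > "$1"
    git add "$1"
    git commit -q -m "$3"
}

commit_file core.txt core "initial commit"

git checkout -q -b old-feature
commit_file src/feature.txt code "add feature code"
commit_file tests/test_feature.txt test "add feature tests"

git checkout -q master
# Contrived collision: a directory with exactly the same name as the branch.
commit_file master/notes.txt notes "add master directory"

# Range plus pathspec: only old-feature commits that touch tests/.
test_log=$(git log --format=%s master..old-feature -- tests)

# "master" is now both a revision and a path, so git refuses to guess.
if git log master >/dev/null 2>&1; then
    ambiguous=no
else
    ambiguous=yes
fi

# Trailing "--": "master" is a revision. Leading "--": "master" is a path.
as_revision=$(git log --format=%s master --)
as_path=$(git log --format=%s -- master)

echo "$test_log"
echo "ambiguous without --: $ambiguous"
```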
Now that I’ve seen the code changes and some explanation of the thought process and history behind those changes as they were being made, hopefully I’ll not need to reinvent the wheel so much when trying to implement it myself!