I’ve recently been spending a lot of my time working on more open source projects. With this I’ve had to learn a couple of new things that you typically don’t when you are only contributing to proprietary work, specifically caring for the commits in your Pull Requests.
I’ve always thought my git workflow was fairly decent: I’d cut a branch off master, do my work in a single commit, --amend-ing anything further into that first commit, and when I was ready I would put in a Pull Request (PR). This workflow seemed decent. I wasn’t pushing my history into the mainline, and I was making it easy to revert, but this wasn’t good enough.
This all came up while working on a couple of features for the AWS IAM Authenticator. Nick Turner and Matt Landis from the Amazon EKS team have both been extremely helpful and patient with getting the history just right, and after a few code reviews of different pull requests I think I’m starting to get used to the approach.
The Idea
The git history for each pull request should be kept to a minimum number of commits, in an order that makes sense, and structured in a way that makes reviewing the work very easy. To explain this, take for example a pull request that adds one new feature, where the feature has no external dependencies and is 100% “handwritten”. This feature and PR would be represented by a single commit.
commit e678906aadb977a8e9161a3213213f5d62c2b05c (HEAD -> feature/125-version-command, christopherhein/feature/125-version-command)
Author: Christopher Hein <me@christopherhein.com>
Date: Tue Aug 7 15:30:05 2018 -0700
Adding `version` subcommand
**Why:**
* allows you to echo the version of the build, uses built-in
`goreleaser` `ldflags` to get the latest release information
**This change addresses the need by:**
* closes #125
Signed-off-by: Christopher Hein <me@christopherhein.com>
And the diff for this commit didn’t add any additional components. Perfect use case for a single commit PR.
Now for an example of a multiple-commit PR. Suppose that last PR did install additional packages. Because we use go and vendor our packages, after we have written the code we have a status that looks like this:
## feature/125-version-command
M .goreleaser.yaml
M Gopkg.lock
?? cmd/aws-iam-authenticator/version.go
?? vendor/github.com/someproject/
📎 In this example we have added an additional package in vendor and updated the .lock file, which manages installed packages. This is a great use case for multiple commits: I didn’t write the vendored code, and the dep project updated my Gopkg.lock file for me, so the only real review someone should do is to delete the vendored package and verify that dep ensure continues to return the same files. In this instance you would make two commits, one for the vendored additions and one for the code you wrote. Like so:
commit 36f80d6d1f4e85992de30b288928e6c1bb714b3d (HEAD -> feature/125-version-command, christopherhein/feature/125-version-command)
Author: Christopher Hein <me@christopherhein.com>
Date: Tue Aug 7 15:30:05 2018 -0700
Adding `version` subcommand
**Why:**
* allows you to echo the version of the build, uses built-in
`goreleaser` `ldflags` to get the latest release information
**This change addresses the need by:**
* closes #125
Signed-off-by: Christopher Hein <me@christopherhein.com>
commit f19cf04c3eeb770f564660a6c89f54bdfc18e08d
Author: Christopher Hein <me@christopherhein.com>
Date: Tue Aug 7 15:29:42 2018 -0700
Adding `go-version` vendored package
Signed-off-by: Christopher Hein <me@christopherhein.com>
If you had multiple parts that are generated and not originally authored in full by you, say a generated client produced by the Kubernetes code generation libraries from a types.go file, this would be treated the same way: one commit for the types.go file and anything else you wrote to set up the code gen, then a second commit for the generated code.
In Practice
Now we all have our ways of saving our work along the way: some use .bak files, some ignore it, and some use git throughout the development lifecycle, constantly commit-ing locally in case they need to step back. This last one is the most common experience I am seeing. So let’s break this down and show how to rewrite your git history and rebuild it with only the changes that are needed in each commit.
Rebase Is Really Powerful
So imagine you were continuously commit-ing into your local copy and ended up with a history that looks like this.
commit 1f231401246e00721f153dfb8ddc12e6407605f7 (HEAD -> feature/25-rewriting-git-history)
Author: Christopher Hein <me@christopherhein.com>
Date: Sun Sep 23 00:14:28 2018 -0700
Yay, it works!
commit d8479321716d937e6ff62f9f7e6bc923b1f687c1
Author: Christopher Hein <me@christopherhein.com>
Date: Sun Sep 23 00:14:07 2018 -0700
Think I have it
commit d522a8931ff92a4d45b9c8582d84343566653065
Author: Christopher Hein <me@christopherhein.com>
Date: Sun Sep 23 00:13:56 2018 -0700
Trying Again
commit b3dd3b8df5efa20179552f796627a4609c588074
Author: Christopher Hein <me@christopherhein.com>
Date: Sun Sep 23 00:13:42 2018 -0700
testing
All four of these commits represent writing some new code, which includes dependencies. First and foremost, reset and rebase can be your friends; if these aren’t in your development workflow, they should be. Let’s first reset everything so that the four commits go away but all the changes stay, ready to be rebuilt as 2 commits. For this we use reset.
First grab the SHA of the first commit of this feature. For us that is b3dd3b8df5efa20179552f796627a4609c588074. With this we’ll run the following.
git reset --soft b3dd3b8df5efa20179552f796627a4609c588074^
📎 Note the ^, which refers to the parent of that commit. Resetting to the parent means this commit won’t exist on the branch anymore. If you ran it without the ^ you would still see the b3dd3b8df5efa20179552f796627a4609c588074 commit in your history, but none of the three commits after it.
Once we have our code in this state, git status shows that all our files are still staged, meaning if we try to make a commit all files will be added. So let’s reset again, this time without --soft, to unstage them.
git reset
Now we can start to reconstruct your history, commit-ing the generated code first and then the “handwritten” code.
git add vendor/
git commit -m "adding vendored code for X"
Then we’ll commit our “handwritten” code, keeping the two separate to make code review much easier.
git add cmd/aws-iam-authenticator/version.go
git commit -m "adding go-version"
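To try the whole reset-and-rebuild sequence without risking a real branch, here is a minimal sketch that replays it in a throwaway repository. The file names (vendor/example/lib.go, cmd/version.go), the author identity, and the base commit are placeholders I made up for illustration; only the git commands themselves come from the steps above.

```shell
#!/bin/sh
set -eu

# Throwaway repository so nothing here touches a real project.
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Example Author"       # placeholder identity
git config user.email "author@example.com"  # placeholder identity

# A base commit standing in for the mainline the branch was cut from.
echo "base" > README
git add README
git commit -q -m "initial commit"

# Four work-in-progress commits, each mixing vendored and handwritten files.
mkdir -p vendor/example cmd
n=1
for msg in "testing" "Trying Again" "Think I have it" "Yay, it works!"; do
  echo "vendored $n" >> vendor/example/lib.go
  echo "handwritten $n" >> cmd/version.go
  git add -A
  git commit -q -m "$msg"
  n=$((n + 1))
done

# SHA of the first commit of the feature ("testing"), three commits behind HEAD.
first=$(git rev-parse HEAD~3)

# Move the branch to the parent of that commit; every change stays staged.
git reset -q --soft "$first^"
# Unstage everything; the working tree itself is left untouched.
git reset -q

# Rebuild the history: generated code first, handwritten code second.
git add vendor/
git commit -q -m "adding vendored code for X"
git add cmd/version.go
git commit -q -m "adding go-version"

git log --oneline
```

The final log should list three commits: the base commit plus the two rebuilt ones, with the four work-in-progress commits gone.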
Updating a Previously “Perfect History”
Fantastic, now you have a rewritten history. You push it up, and the reviewers ask you to change something. Does that mean you have to go through this full workflow again? No! We can continue to --amend new changes into the previous commit. Let’s try it, starting from a git status of:
## feature/125-version-command
M cmd/aws-iam-authenticator/version.go
We can use --amend to mutate the previous commit.
git add cmd/aws-iam-authenticator/version.go
git commit --amend
This will open the previous commit message in your editor (a vim session, in my case), and you can just save and close the file to apply.
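If the commit message doesn’t need to change, the editor step can be skipped. The sketch below assumes a throwaway repository with a placeholder version.go file and a placeholder identity, and uses git’s --no-edit flag to reuse the existing message. It also prints the SHA before and after to show that amending replaces the commit rather than editing it in place.

```shell
#!/bin/sh
set -eu

# Throwaway repository with a single commit to amend.
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Example Author"       # placeholder identity
git config user.email "author@example.com"  # placeholder identity

echo "v1" > version.go                      # placeholder file
git add version.go
git commit -q -m "adding go-version"
before=$(git rev-parse HEAD)

# Reviewer feedback arrives: change the file and fold it into the same commit.
echo "v2" > version.go
git add version.go
# --no-edit keeps the existing commit message, so no editor opens.
git commit -q --amend --no-edit
after=$(git rev-parse HEAD)

echo "before: $before"
echo "after:  $after"
```

The two SHAs differ because the amended commit is a new object, which is why the next push has to be a forced one.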
Now that you have the commit on the top of your stack updated, let’s push it to your remote origin to update the Pull Request. To do this we use push, but we pass an additional flag, --force-with-lease. This will update your remote origin and apply the new history, overwriting the existing one. It is a safer alternative to --force: --force-with-lease will force push to your remote branch only if the remote branch still points where your local copy last saw it, that is, nobody else has pushed to it since your last fetch.
git push --force-with-lease <origin> <branch>
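To see the “lease” part in action, here is a rough sketch using a local bare repository as the remote and two clones. The directory names (remote.git, mine, theirs), the branch name feature, and the identities are all made up for the demo. The first forced push succeeds because nobody else touched the branch; the second is refused because another clone pushed in between and the first clone never fetched.

```shell
#!/bin/sh
set -eu

work=$(mktemp -d)
cd "$work"
git init -q --bare remote.git               # stands in for the remote origin

# My clone: create the branch and push it.
git clone -q remote.git mine
cd mine
git config user.name "Example Author"       # placeholder identity
git config user.email "author@example.com"
git checkout -q -b feature
echo "v1" > version.go
git add version.go
git commit -q -m "adding go-version"
git push -q origin feature

# Amend and force push: the remote is where we last saw it, so this succeeds.
echo "v2" > version.go
git add version.go
git commit -q --amend --no-edit
git push -q --force-with-lease origin feature

# Someone else pushes a commit to the same branch.
cd "$work"
git clone -q --branch feature remote.git theirs
cd theirs
git config user.name "Other Person"         # placeholder identity
git config user.email "other@example.com"
echo "extra" > other.go
git add other.go
git commit -q -m "someone else's change"
git push -q origin feature

# Back in my clone: amend again without fetching, then try to force push.
cd "$work/mine"
echo "v3" > version.go
git add version.go
git commit -q --amend --no-edit
if git push -q --force-with-lease origin feature 2>/dev/null; then
  lease_result="pushed"
else
  lease_result="rejected"
fi
echo "second force push: $lease_result"
```

A plain --force in that last step would have silently discarded the other person’s commit. One caveat worth knowing: the lease is checked against your remote-tracking ref, so running git fetch first and then force pushing would pass the check and still overwrite their work.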
This can be a lot to write, so I use a git alias.
git config --global alias.pf "push --force-with-lease"
Rebasing in Additional Commits
With those updates in place, the reviewers then give you the feedback that the package you have vendored needs to be updated as well. Does this mean you have to start the whole workflow again? No! You can actually make a new commit with the updates, then rebase them into the correct commit. To do this we need to make a commit with the changes to vendor/.
git add vendor/
git commit -m "vendor commit to be squashed"
Once you have made this commit you can get the SHA of the adding vendored code for X commit and run rebase -i to rebase the commits interactively.
git rebase -i c55ed6ab270cb5c724cacf1feacde0c6998d42a3^
📎 Note the ^. If you do not have this, the commit you are targeting won’t appear in the output of the command and you won’t be able to squash anything into it.
The above command should open a vim session with 3 lines.
pick 2e05ad51 adding vendored code for X
pick 53d69ae4 adding go vendor
pick e3d1bd65 vendor commit to be squashed
In this vim session we’re going to reorder the lines into the order we want the commits listed in, and then squash the last commit into the first line. To do this our file should look like this.
pick 2e05ad51 adding vendored code for X
squash e3d1bd65 vendor commit to be squashed
pick 53d69ae4 adding go vendor
📎 Note the squash instead of pick for the latest commit.
By reordering the lines we put the most recent vendor additions directly after the earlier vendor additions, and by changing pick to squash we fold the two together. When you save and close this file you will be taken to another vim session with an aggregated view of the commit messages. Here you can remove the vendor commit to be squashed line, leaving the original message, adding vendored code for X.
Once that processes you can git push --force-with-lease, or if you are like me you can git pf, and everything will be in sync on the remote origin.
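As an alternative to reordering the todo list by hand, git can do the bookkeeping itself. This is a different technique from the manual squash described above, not a replacement for understanding it: git commit --fixup marks the new commit as belonging to an earlier one, and git rebase -i --autosquash moves it into place and folds it in, dropping its message. The sketch below runs in a throwaway repository with placeholder file names and a placeholder identity, and sets GIT_SEQUENCE_EDITOR=true so the generated todo list is accepted without opening an editor.

```shell
#!/bin/sh
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Example Author"       # placeholder identity
git config user.email "author@example.com"  # placeholder identity

# Base commit standing in for the mainline.
echo "base" > README
git add README
git commit -q -m "initial commit"

# The two "perfect history" commits: vendored code, then handwritten code.
mkdir -p vendor/example cmd
echo "vendored v1" > vendor/example/lib.go  # placeholder vendored file
git add vendor/
git commit -q -m "adding vendored code for X"
target=$(git rev-parse HEAD)

echo "handwritten" > cmd/version.go         # placeholder handwritten file
git add cmd/version.go
git commit -q -m "adding go-version"

# Reviewer feedback: the vendored package needs updating.
echo "vendored v2" > vendor/example/lib.go
git add vendor/
# Creates a commit titled "fixup! adding vendored code for X".
git commit -q --fixup="$target"

# --autosquash reorders the todo list and marks the fixup commit for us;
# GIT_SEQUENCE_EDITOR=true accepts that todo list as generated.
GIT_SEQUENCE_EDITOR=true git rebase -q -i --autosquash "$target^"

git log --oneline
```

The log should end up with the same three commits as before the feedback, with the vendored update folded into adding vendored code for X.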
Conclusion
This topic is vast, and it applies whether you are working on a private project or open source. Following these methods you make it simple for your reviewers to review code, make the review process less error prone, and allow the history to actually make sense. Here are a handful of links to dive deeper into:
- squashing commits with rebase
- --force considered harmful; understanding git’s --force-with-lease
- reorder commits with rebase
- Kubernetes Workflow
- How to Write a Git Commit Message
If you want to learn more git hacks and workflows, reach out to @christopherhein on Twitter.