I’ve recently been spending a lot of my time working on more open source projects, and with this I’ve had to learn a couple of new things that you typically don’t when you are only contributing to proprietary work. Specifically, caring for the commits in your Pull Requests.
I’ve always thought my git workflow was fairly decent: I’d cut a branch off
master, do my work, and commit into a single commit, --amending anything into
that first commit. When I was ready, I would put in a Pull Request (PR). This
workflow seemed decent; I wasn’t pushing my history into the mainline, and I
was making it easy to revert, but this wasn’t good enough.
This all came up while working on a couple of features for the AWS IAM Authenticator. Nick Turner and Matt Landis from the Amazon EKS team have both been extremely helpful and patient with getting the history just right, and after a few code reviews of different pull requests I think I’m starting to get used to the approach.
The git history for each pull request should be kept to a minimum number of commits, in an order that makes sense, and arranged in a way that makes reviewing the work very easy. To explain this more, take for example a pull request that added one new feature, where the feature had no external dependencies and was 100% “handwritten”. This feature and PR would be represented by a single commit.
commit e678906aadb977a8e9161a3213213f5d62c2b05c (HEAD -> feature/125-version-command, christopherhein/feature/125-version-command)
Author: Christopher Hein <email@example.com>
Date:   Tue Aug 7 15:30:05 2018 -0700

    Adding `version` subcommand

    **Why:**

    * allows you to echo the version of the build, uses built-in `goreleaser`
      `ldflags` to get the latest release information

    **This change addresses the need by:**

    * closes #125

    Signed-off-by: Christopher Hein <firstname.lastname@example.org>
And the diff for this commit didn’t add any additional components. Perfect use case for a single commit PR.
Now to give an example of a multiple commit PR. Flip that last PR and say it
did install additional packages. Because we use go and vendor our packages,
after we have written the code we have a status that looks like this:
## feature/125-version-command
 M .goreleaser.yaml
 M Gopkg.lock
?? cmd/aws-iam-authenticator/version.go
?? vendor/github.com/someproject/
📎 In this example we have added an additional package in vendor/ and updated
the Gopkg.lock file, which manages installed packages. This is a great use
case for multiple commits: I didn’t write the vendored code, and the dep
project updated my Gopkg.lock file for me, so the only real review someone
should do would be to delete the vendored package and verify that dep ensure
continues to return the same files. In this instance you would make two
commits, one for the vendored additions and one for the code you wrote. Like so:
commit 36f80d6d1f4e85992de30b288928e6c1bb714b3d (HEAD -> feature/125-version-command, christopherhein/feature/125-version-command)
Author: Christopher Hein <email@example.com>
Date:   Tue Aug 7 15:30:05 2018 -0700

    Adding `version` subcommand

    **Why:**

    * allows you to echo the version of the build, uses built-in `goreleaser`
      `ldflags` to get the latest release information

    **This change addresses the need by:**

    * closes #125

    Signed-off-by: Christopher Hein <firstname.lastname@example.org>

commit f19cf04c3eeb770f564660a6c89f54bdfc18e08d
Author: Christopher Hein <email@example.com>
Date:   Tue Aug 7 15:29:42 2018 -0700

    Adding `go-version` vendored package

    Signed-off-by: Christopher Hein <firstname.lastname@example.org>
If you had multiple parts that are generated and not originally authored in
full by you, for example a generated client based on a types.go file with the
Kubernetes code generation libraries processing it, this would be treated the
same way: one commit for the types.go file and anything else you wrote to set
up the code gen, then a second commit for the generated code.
Now, we all have our own ways of saving our work along the way: some ignore
it, and some use git throughout the development lifecycle, constantly
committing locally in case they need to step back. This is the most common
experience I am seeing. So let’s break this down and show you how to reset
your git history and start rebuilding it with only the changes that are
needed in each commit.
Rebase Is Really Powerful
So in that example, imagine you were continuously committing into your local copy and ended up with something that looks like this.
commit 1f231401246e00721f153dfb8ddc12e6407605f7 (HEAD -> feature/25-rewriting-git-history)
Author: Christopher Hein <email@example.com>
Date:   Sun Sep 23 00:14:28 2018 -0700

    Yay, it works!

commit d8479321716d937e6ff62f9f7e6bc923b1f687c1
Author: Christopher Hein <firstname.lastname@example.org>
Date:   Sun Sep 23 00:14:07 2018 -0700

    Think I have it

commit d522a8931ff92a4d45b9c8582d84343566653065
Author: Christopher Hein <email@example.com>
Date:   Sun Sep 23 00:13:56 2018 -0700

    Trying Again

commit b3dd3b8df5efa20179552f796627a4609c588074
Author: Christopher Hein <firstname.lastname@example.org>
Date:   Sun Sep 23 00:13:42 2018 -0700

    testing
All four of these commits represent writing some new code which includes
dependencies. First and foremost, reset and rebase can be your friends; if
these aren’t in your development workflow, they should be. Let’s first reset
everything so that we can rebuild it as 2 commits while still keeping all the
changes. For this we use git reset --soft. First grab the SHA of the first
commit of this feature; for us that is
b3dd3b8df5efa20179552f796627a4609c588074. With this we’ll run the following.
git reset --soft b3dd3b8df5efa20179552f796627a4609c588074^
📎 Note the ^, which points at the parent of that commit. The reset moves HEAD
to just before it, meaning this commit won’t exist in your history anymore. If
you ran this without the ^ you would still see the
b3dd3b8df5efa20179552f796627a4609c588074 commit in your history, but none of
the later ones.
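To see what the ^ changes, here is a minimal sketch in a throwaway repository. The file name, commit identity and the "base" commit are made up for the demo; only the reset commands mirror the step above.

```shell
set -eu
# Scratch repository; every name in here is hypothetical.
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Example Dev"
git config user.email "dev@example.com"
git config commit.gpgsign false

# One base commit, then three work-in-progress commits on top of it.
git commit -q --allow-empty -m "base"
for msg in "testing" "Trying Again" "Yay, it works!"; do
  echo "$msg" >> notes.txt
  git add notes.txt
  git commit -q -m "$msg"
done

tip="$(git rev-parse HEAD)"
first="$(git rev-parse HEAD~2)"   # the "testing" commit

# Without ^: HEAD lands ON the first feature commit, so it stays in history.
git reset -q --soft "$first"
without_caret="$(git rev-list --count HEAD)"

# Put the branch back where it was, then repeat with ^.
git reset -q --soft "$tip"

# With ^: HEAD lands on the PARENT, so the first feature commit is gone too.
git reset -q --soft "$first^"
with_caret="$(git rev-list --count HEAD)"

echo "commits left without ^: $without_caret"   # base + "testing"
echo "commits left with ^:    $with_caret"      # base only
```

In both cases the working tree and the staged changes are untouched; only where the branch points is different.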
Once we have our code in this state, git status will show that all our files
are still staged, meaning if we try to make a commit all files will be added.
So let’s run git reset again, this time without --soft, to unstage them.
Now we can start to reconstruct our history, committing the generated code first and then the “handwritten” code.
git add vendor/
git commit -m "adding vendored code for X"
Then we’ll commit our “handwritten” code, keeping the two separate to make code review much easier.
git add cmd/aws-iam-authenticator/version.go
git commit -m "adding go-version"
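Putting the steps so far together, the whole squash-and-split can be sketched end to end in a scratch repository. The paths (vendor/example/lib.go, cmd/version.go), the "base" commit and the identity are placeholders I made up; the reset and commit commands are the ones from the walkthrough.

```shell
set -eu
# Scratch repository; every name in here is hypothetical.
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Example Dev"
git config user.email "dev@example.com"
git config commit.gpgsign false

git commit -q --allow-empty -m "base"

# Four work-in-progress commits that each mix vendored and handwritten files.
mkdir -p vendor/example cmd
n=0
for msg in "testing" "Trying Again" "Think I have it" "Yay, it works!"; do
  n=$((n + 1))
  echo "vendored line $n" >> vendor/example/lib.go
  echo "handwritten line $n" >> cmd/version.go
  git add .
  git commit -q -m "$msg"
done

first="$(git rev-parse HEAD~3)"   # the "testing" commit

# Drop the four commits but keep every change staged...
git reset -q --soft "$first^"
# ...then unstage everything so each commit can be assembled by hand.
git reset -q

# Generated or vendored code first, handwritten code second.
git add vendor/
git commit -q -m "adding vendored code for X"
git add cmd/version.go
git commit -q -m "adding go-version"

git log --oneline
```

The log should now list three commits: the base, the vendored code, and the handwritten code, with the file contents exactly as they were after the last work-in-progress commit.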
Updating a Previously “Perfect History”
Fantastic, now you have a rewritten history. You go and push it up, and the
reviewers ask you to change something. Does that mean you have to go through
this full workflow again? No! We can continue to --amend our changes into the
previous commit. Let’s try it. Given a git status of
## feature/125-version-command
 M cmd/aws-iam-authenticator/version.go
We can use
--amend to mutate the previous commit.
git add cmd/aws-iam-authenticator/version.go
git commit --amend
This will open the previous commit message in a vim session, and you can just
save and close the file to apply.
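If you don't need to touch the message at all, git commit --amend also accepts --no-edit, which skips the editor. A small sketch in a scratch repository, with a made-up file and identity, shows that the commit count stays the same while the SHA changes:

```shell
set -eu
# Scratch repository; every name in here is hypothetical.
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Example Dev"
git config user.email "dev@example.com"
git config commit.gpgsign false

echo "v1" > version.go
git add version.go
git commit -q -m "Adding version subcommand"
before="$(git rev-parse HEAD)"

# Reviewer feedback arrives: change the file and fold it into the same commit.
echo "v2" > version.go
git add version.go
git commit -q --amend --no-edit   # keeps the existing message, no editor opens

after="$(git rev-parse HEAD)"
count="$(git rev-list --count HEAD)"
subject="$(git log -1 --format=%s)"

echo "commits: $count, subject: $subject"
echo "sha changed: $before -> $after"
```

Because the SHA changes, the branch on the remote no longer matches your local one, which is why the next step needs a forced push.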
Now that you have the commit on the top of your stack updated, let’s push it
to your remote origin to update the Pull Request. To do this we use git push,
but we pass an additional flag, --force-with-lease. This will update your
remote origin and apply the new history, rewriting the existing one. It is a
safer alternative to --force: --force-with-lease will force push to your
remote branch only if the branch on the remote hasn’t changed since you last
fetched it.
git push --force-with-lease <origin> <branch>
This can be a lot to write, so I use a git alias.
git config --global alias.pf "push --force-with-lease"
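To make the lease behaviour concrete, here is a sketch that uses a local bare repository as a stand-in for the remote and a second clone as a stand-in for a teammate. All of the paths, names and commit messages are assumptions for the demo. The first forced push should go through; the second should be refused because the remote branch moved without us fetching.

```shell
set -eu
# Everything lives in a scratch directory; all names are hypothetical.
work="$(mktemp -d)"
git init -q --bare "$work/remote.git"

# Our clone (git warns that the repository is empty, which is expected).
git clone -q "$work/remote.git" "$work/mine"
cd "$work/mine"
git config user.name "Example Dev"
git config user.email "dev@example.com"
git config commit.gpgsign false
git checkout -q -b feature
echo "v1" > version.go
git add version.go
git commit -q -m "Adding version subcommand"
git push -q origin feature

# Rewrite the commit and push: nobody else touched the branch, so this works.
echo "v2" > version.go
git add version.go
git commit -q --amend --no-edit
git push -q --force-with-lease origin feature
echo "first forced push: accepted"

# A teammate pushes to the same branch from their own clone.
git clone -q --branch feature "$work/remote.git" "$work/theirs"
(
  cd "$work/theirs"
  git config user.name "Teammate"
  git config user.email "teammate@example.com"
  git config commit.gpgsign false
  echo "theirs" > other.go
  git add other.go
  git commit -q -m "teammate work"
  git push -q origin feature
)

# We rewrite again WITHOUT fetching; our view of origin/feature is now stale.
echo "v3" > version.go
git add version.go
git commit -q --amend --no-edit
if git push -q --force-with-lease origin feature 2>/dev/null; then
  second_push="accepted"
else
  second_push="rejected"
fi
echo "second forced push: $second_push"

remote_subject="$(git --git-dir="$work/remote.git" log -1 --format=%s feature)"
echo "remote tip is still: $remote_subject"
```

A plain --force in that last step would have silently thrown away the teammate's commit, which is the case the lease is there to catch.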
Rebasing in Additional Commits
With the updates in place, the reviewers then give you the feedback that the
package you have vendored needs to be updated as well. Does this mean you have
to start the whole workflow again? No! You can actually make a new commit with
the updates, then rebase them into the correct commit. To do this we need to
make a commit with the changes to the vendored code.
git add vendor/
git commit -m "vendor commit to be squashed"
Once you have made this commit, grab the SHA of the adding vendored code for X
commit and run rebase -i to rebase the commits.
git rebase -i c55ed6ab270cb5c724cacf1feacde0c6998d42a3^
📎 Note the ^: if you do not have this, the adding vendored code for X commit
won’t appear in the output of the command and you won’t be able to squash
into it.
The above command should open a vim session with 3 lines.
pick 2e05ad51 adding vendored code for X
pick 53d69ae4 adding go vendor
pick e3d1bd65 vendor commit to be squashed
In this vim session we’re going to reorder the lines into the order we want
the commits to be listed in, and then we want to squash the last commit into
the first line. To do this our file should look like this.
pick 2e05ad51 adding vendored code for X
squash e3d1bd65 vendor commit to be squashed
pick 53d69ae4 adding go vendor
📎 Note the
squash instead of
pick for the latest commit.
By reordering the lines we can put the most recent vendor additions directly
on top of the last vendor additions, then we can squash them by changing the
pick to squash. When you save and close this file you will be taken to a
second vim session with an aggregated view of the commit messages. Here you
can remove the vendor commit to be squashed message, leaving the original
commit message as adding vendored code for X.
Once that has been processed you can git push --force-with-lease, or if you
are like me git pf, and everything will be in sync on the remote origin.
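As an aside, git can do the reordering for you. This is a variation on the steps above rather than the workflow I described: git commit --fixup marks a commit as belonging to an earlier one, and git rebase -i --autosquash moves it into place and folds it in. The sketch below runs in a scratch repository with made-up paths and identity, and sets GIT_SEQUENCE_EDITOR=true so the generated todo list is accepted without opening an editor.

```shell
set -eu
# Scratch repository; every name in here is hypothetical.
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Example Dev"
git config user.email "dev@example.com"
git config commit.gpgsign false

git commit -q --allow-empty -m "base"
mkdir -p vendor/example cmd
echo "vendored v1" > vendor/example/lib.go
git add vendor/
git commit -q -m "adding vendored code for X"
echo "handwritten" > cmd/version.go
git add cmd/
git commit -q -m "adding go-version"

vendor_sha="$(git rev-parse HEAD~1)"

# Reviewer asks for a newer vendored package: commit it as a fixup of the
# vendor commit instead of writing a throwaway message.
echo "vendored v2" > vendor/example/lib.go
git add vendor/
git commit -q --fixup "$vendor_sha"

# --autosquash reorders the todo list and marks the fixup for us; the
# sequence editor "true" simply accepts that list as generated.
GIT_SEQUENCE_EDITOR=true git rebase -q -i --autosquash "$vendor_sha^"

git log --oneline
```

The result is the same two-commit history as the manual reorder and squash, with the vendor update folded into adding vendored code for X and its original message kept.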
This topic is vast, no matter if you are working on a private project or open source. Following these methods you make it simple for your reviewers to review code, make the review process less error prone, and allow the history to actually make sense. Here are a handful of links to dive deeper into:
- squashing commits with rebase
- --force considered harmful; understanding git’s --force-with-lease
- reorder commits with rebase
- Kubernetes Workflow
- How to Write a Git Commit Message
If you want to learn more git hacks and workflows, reach out to @christopherhein on Twitter.