Rewriting Git History with Signed Commits
When we prepared the open source releases of ABQ and Captain, we decided to rewrite our git history.
We didn't want to make the existing issues and pull requests public, because we had never intended the discussions and notes in them to be public. However, the commits contained references to pull requests, and we didn't want those references to be incorrect. For example, this commit references pull request #111, but our new repository wouldn't have that pull request.
commit f28181c271ff82f65a6fbad370187f7cd6852faa
Author: Ayaz <[email protected]>
Date: Thu Sep 8 09:19:01 2022 -0500
Adds TLS support (#111)
Additionally, we had been using a squash and merge strategy on our repository. At one point, the default for that strategy was to include the pull request description in the commit message, which unfortunately made some of our commit messages quite verbose and poorly formatted.
We decided we’d rewrite the commits to look like this:
commit a1fd708249a7c72baf16ae70784804124aedaf07
Author: Ayaz <[email protected]>
Date: Thu Sep 8 09:19:01 2022 -0500
Adds TLS support
Original PR: 111
Original Commit: f28181c271ff82f65a6fbad370187f7cd6852faa
The script to rewrite the commit history was fairly straightforward, but our team members have commit signature verification enabled, which posed a challenge. To ensure that we produced verified commits, each person on our team needed to rewrite their own commits. We ended up writing a multiplayer git history rewrite script that produces signed commits. Each person ran the script like this:
# dan
$ ruby rewrite_git_history.rb abq [email protected]
# tommy
$ ruby rewrite_git_history.rb abq [email protected]
We all ran the script at the same time during standup one day, and a few minutes later, we had a completely new git history including signed commits!
If you want to skip to the completed code, here’s the full script as a gist.
High Level Process
The high level process we followed was to:
- iterate over the original commit history in reverse order (oldest commit first)
- build a new commit message by parsing and reformatting the original commit message
- use the git tree from the original commit with git commit-tree to make a new commit
- the git tree specifies the file contents of the commit; using it ensured the contents of the commits were identical to the original commits
- maintain a mapping of old commit hashes to new commit hashes
- to make sure the script could resume, we also wrote some code to parse the mapping from any commits that had already been rewritten
- use the mapping of commit hashes to map the parent commits for each commit
- with the combination of consistent git trees and consistent commit parents, our commit history and graph would remain identical other than the new commit messages and signatures
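The steps above can be illustrated with a toy model that needs no git repository. This is only a sketch: `Commit`, `commit_id`, and `rewrite_history` are hypothetical names, and a SHA-1 over a few joined fields stands in for git's real object hashing. It shows how rewriting oldest-first with an old-to-new mapping keeps the trees and the shape of the graph intact while the ids and messages change.

```ruby
require "digest"

# Toy model of a commit: an id, a tree id, parent ids, and a message.
Commit = Struct.new(:id, :tree, :parents, :message)

# Stand-in for git's hashing: derive an id from the commit's contents.
def commit_id(tree, parents, message)
  Digest::SHA1.hexdigest([tree, *parents, message].join("\n"))
end

# Rewrite commits oldest-first, keeping each tree and remapping parents
# through the old->new mapping built so far. The block supplies the
# new message for each original commit.
def rewrite_history(commits)
  mapping = {}
  commits.each_with_object([]) do |old, rewritten|
    # fetch raises if a parent has not been rewritten yet
    new_parents = old.parents.map { |parent| mapping.fetch(parent) }
    new_message = yield(old)
    new_id = commit_id(old.tree, new_parents, new_message)
    mapping[old.id] = new_id
    rewritten << Commit.new(new_id, old.tree, new_parents, new_message)
  end
end

root = Commit.new("old-1", "tree-a", [], "Initial commit (#1)")
child = Commit.new("old-2", "tree-b", ["old-1"], "Adds TLS support (#111)")

rewritten = rewrite_history([root, child]) do |commit|
  commit.message.sub(/ \(#\d+\)\z/, "")
end
```

Because the parents are looked up with `fetch`, processing commits out of order fails loudly instead of silently producing a broken graph.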
To facilitate all of us running the script at the same time, the script:
- had each person pass in the email addresses associated with their commits
- one of us added the bot email addresses to our input
- if the next commit to be rewritten was theirs, we'd make a new commit using git commit-tree with the -S option to produce a signed commit
- if the next commit to be rewritten was somebody else's, the script would periodically run a git fetch to see if the new commit had been pushed yet.
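The turn-taking rule can be sketched in isolation. The participant names, the example.com addresses, and the `action_for` helper below are made up for illustration; the point is only that each commit has exactly one person whose email list contains its author, so everyone else waits.

```ruby
# Decide what a participant does for the next commit in the queue.
def action_for(author_email, my_emails)
  my_emails.include?(author_email) ? :rewrite : :wait
end

# Hypothetical participants and the emails they passed to the script.
# One participant also takes responsibility for the bot's commits.
participants = {
  "dan" => ["dan@example.com", "bot@example.com"],
  "tommy" => ["tommy@example.com"]
}

# Author emails of the commits to rewrite, oldest first.
authors = ["dan@example.com", "tommy@example.com", "bot@example.com"]

# For each commit, find the one participant whose turn it is.
schedule = authors.map do |author|
  participants.find { |_name, emails| action_for(author, emails) == :rewrite }&.first
end
```

An author email that nobody passes in would leave every participant waiting, which is why the bot addresses had to be assigned to someone.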
Here’s some commentary on the script implementation details.
Args and Global State
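# standard libraries used below (SecureRandom, Set, Dir.mktmpdir)
require "securerandom"
require "set"
require "tmpdir"
# scratch directory for the commit message files
# (assumed here: this excerpt uses TMP_DIR without showing its definition)
TMP_DIR = Dir.mktmpdir
# pass in the repository as the first argument
REPO = ARGV[0] || raise("need repo")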
# pass in a list of email addresses for
# commits this person should rewrite
EMAILS = ARGV[1...]
# we had to map some old email addresses to new email addresses
# to ensure the signing keys matched the email address on the commit
EMAIL_MAP = {
  "[email protected]" => "[email protected]"
}
# maintain a mapping of old->new commits
MAPPED_COMMITS = {}
# a set of new commits
NEW_COMMITS = Set.new
The Main Loop
Dir.chdir("../#{REPO}") do
  # make sure the repository is up to date!
  `git fetch`
  raise "fetch failed" unless $?.success?

  # get a list of commit hashes (%H) with oldest first
  commits = `git log --reverse --pretty='%H' origin/main`.split("\n")
  raise "git log command failed" unless $?.success?

  # check to see if the main-oss branch has been created yet
  # we have to push the first commit using a different command syntax
  `git branch -r | grep main-oss`
  first_commit = !$?.success?

  # import commits that have already been rewritten
  # so that this script can be re-executed and resume where it left off
  unless first_commit
    import_new_commits
  end

  # the main loop!
  commits.each do |commit|
    # if this commit has already been rewritten, move on
    if MAPPED_COMMITS.include?(commit)
      first_commit = false
      next
    end

    # get the author of the next commit to rewrite
    author = `git show --no-patch --pretty=%ae #{commit}`.strip

    # check if the next commit is for
    # the person who ran the script
    if EMAILS.include?(author)
      # rewrite the commit
      new_commit = import_commit(commit)

      # we have to push differently for the first commit
      if first_commit
        `git push origin main:main-oss`
        raise "push failed" unless $?.success?
        `git push -f origin #{new_commit}:main-oss`
        raise "push failed" unless $?.success?
      else
        `git push origin #{new_commit}:main-oss`
        raise "push failed" unless $?.success?
      end
    else
      # waiting for somebody else to rewrite this commit
      puts "Waiting for #{author} to rewrite commit #{commit}..."
      loop do
        sleep 3
        puts "Fetching..."
        # check to see if the main-oss branch has been updated
        `git fetch 2>&1 | grep main-oss`
        break if $?.success?
      end
      import_new_commits

      # make sure the commit we were waiting on got imported!
      unless MAPPED_COMMITS.include?(commit)
        raise "expected #{commit} to be imported"
      end
    end

    first_commit = false
  end
end
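The waiting loop in the script polls until the branch moves, with no upper bound. If a bounded wait is preferable, one possible variant is sketched below. `poll_until` and its `sleeper` parameter are hypothetical and not part of the original script; the injectable sleeper exists only so the helper can be exercised without real delays.

```ruby
# Call the block repeatedly until it returns a truthy value or `timeout`
# seconds have elapsed, sleeping `interval` seconds between attempts.
# Returns true on success and false on timeout.
def poll_until(timeout:, interval:, sleeper: ->(seconds) { sleep(seconds) })
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout
  loop do
    return true if yield
    return false if Process.clock_gettime(Process::CLOCK_MONOTONIC) >= deadline
    sleeper.call(interval)
  end
end

# Demonstration with a no-op sleeper: the condition holds on the third attempt.
attempts = 0
found = poll_until(timeout: 5, interval: 0, sleeper: ->(_seconds) {}) do
  attempts += 1
  attempts >= 3
end

# A condition that never holds gives up once the deadline has passed.
timed_out = !poll_until(timeout: 0, interval: 0, sleeper: ->(_seconds) {}) { false }
```

In the script, the block would run the fetch-and-grep check, and a false return could raise with a message naming the author being waited on.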
Committing
# %H = commit hash
# %h = abbreviated commit hash
# %T = tree hash
# %P = parent hashes
# %an = author name
# %ae = author email
# %aD = author date, rfc2822 style
# %cn = committer name
# %ce = committer email
# %cD = committer date, rfc2822 style
# %s = subject
# %b = body
def import_commit(original_commit_hash)
  #           0  1  2  3  4   5   6   7   8   9   10 11 12
  format = %w[%H %h %T %P %an %ae %aD %cn %ce %cD %s %H %b]

  # fetch these attributes for the given commit
  # separating the attributes with newlines
  results = `git show --no-patch --pretty=#{format.join("%n")} #{original_commit_hash}`.split("\n")
  raise "failed to show #{original_commit_hash.inspect}" unless $?.success?

  # if results[11] isn't the expected original commit, something went wrong
  unless results[11] == original_commit_hash
    raise "format error: #{results}"
  end

  # take the parents for this commit, and
  # determine the new commit hashes
  mapped_parents = results[3].split(" ").map do |commit|
    MAPPED_COMMITS.fetch(commit) # will raise if is not mapped
  end

  # passing the new commit message as a file makes shell syntax easier
  commit_message_file = build_commit_message(results[10], results[0], results[12...].join("\n"))

  # map the author email to a new email if necessary
  email = EMAIL_MAP[results[5]] || results[5]

  command = [
    "env",
    "GIT_AUTHOR_NAME='#{results[4]}'",
    "GIT_AUTHOR_EMAIL='#{email}'",
    "GIT_AUTHOR_DATE='#{results[6]}'",
    "GIT_COMMITTER_NAME='#{results[4]}'",
    "GIT_COMMITTER_EMAIL='#{email}'",
    "GIT_COMMITTER_DATE='#{results[9]}'",
    "git commit-tree",
    mapped_parents.map { |parent| "-p #{parent}" },
    # don't sign bot commits
    # we had one person on one team add the bot emails to their ARGV
    (results[5].include?("[bot]") ? "" : "-S"),
    "-F #{commit_message_file}",
    # results[2] is the original commit tree, which stays the same
    results[2]
  ].flatten.join(" ")

  STDERR.puts "Rewriting #{original_commit_hash}"
  puts command
  new_commit = `#{command}`.strip
  puts new_commit
  raise "commit failed" unless $?.success?

  MAPPED_COMMITS[original_commit_hash] = new_commit
  NEW_COMMITS << new_commit
  new_commit
end
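The command above is assembled as a shell string with single-quoted values, which would break on an author name containing an apostrophe. One way to sidestep shell quoting is to build an environment hash and an argument array instead. The sketch below is an alternative, not what the original script does; `commit_tree_invocation` and all of the sample values are hypothetical. It only builds the pair and does not run git.

```ruby
# Build the environment and argument list for `git commit-tree`
# without going through a shell string.
def commit_tree_invocation(author_name:, email:, author_date:, committer_date:,
                           tree:, parents:, message_file:, sign:)
  env = {
    "GIT_AUTHOR_NAME" => author_name,
    "GIT_AUTHOR_EMAIL" => email,
    "GIT_AUTHOR_DATE" => author_date,
    "GIT_COMMITTER_NAME" => author_name,
    "GIT_COMMITTER_EMAIL" => email,
    "GIT_COMMITTER_DATE" => committer_date
  }

  argv = ["git", "commit-tree"]
  parents.each { |parent| argv.push("-p", parent) } # one -p per parent
  argv << "-S" if sign                              # sign unless it is a bot commit
  argv.push("-F", message_file, tree)               # message file, then the tree

  [env, argv]
end

env, argv = commit_tree_invocation(
  author_name: "Pat O'Neil",
  email: "pat@example.com",
  author_date: "Thu, 8 Sep 2022 09:19:01 -0500",
  committer_date: "Thu, 8 Sep 2022 09:19:01 -0500",
  tree: "tree-sha",
  parents: ["parent-a", "parent-b"],
  message_file: "/tmp/message",
  sign: true
)
```

The pair could then be handed to something like `Open3.capture2(env, *argv)`, which passes each argument to git directly rather than through a shell.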
Building New Commit Messages
def build_commit_message(subject, original_commit, body)
  # strip the PR number from the commit message
  # this is the default format in "squash and merge" commits
  if subject =~ /^(.+) \(#(\d+)\)$/
    sanitized_subject = $1
    original_pr = $2
  # we had a few normal merge commits too
  elsif subject =~ /^Merge pull request #(\d+) (.+)$/
    original_pr = $1
    sanitized_subject = "Merge pull request #{$2}"
  else
    original_pr = nil
    sanitized_subject = subject
  end

  # rewrite any remaining "#123" references so they don't link to the wrong PR
  sanitized_subject = sanitized_subject.gsub(/#(\d+)/) { "PR #{$1}" }

  # maintain credit to co-authors!
  co_authors = body.split("\n").select { |line| line.start_with?("Co-authored-by") }

  result = []
  result << "#{sanitized_subject}\n"
  result << "\n"
  result << "Original PR: #{original_pr}\n" if original_pr
  result << "Original Commit: #{original_commit}\n"
  if co_authors.any?
    result << "\n"
    # one trailer per line
    co_authors.each { |co| result << "#{co}\n" }
  end

  "#{TMP_DIR}/#{SecureRandom.uuid}".tap do |file|
    File.open(file, "w") { |f| f.write result.join }
  end
end
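To make the subject handling concrete, here is the parsing step pulled out into a small pure function that can be tried without a repository. `split_subject` and the two constants are hypothetical names used for this sketch, and the sample subjects other than "Adds TLS support (#111)" are invented.

```ruby
SQUASH_SUBJECT = /\A(.+) \(#(\d+)\)\z/
MERGE_SUBJECT = /\AMerge pull request #(\d+) (.+)\z/

# Returns [sanitized_subject, original_pr]; original_pr is nil when the
# subject does not reference a pull request in a recognized format.
def split_subject(subject)
  sanitized, original_pr =
    if (match = SQUASH_SUBJECT.match(subject))
      [match[1], match[2]]
    elsif (match = MERGE_SUBJECT.match(subject))
      ["Merge pull request #{match[2]}", match[1]]
    else
      [subject, nil]
    end

  # any "#123" left in the subject becomes "PR 123"
  [sanitized.gsub(/#(\d+)/) { "PR #{Regexp.last_match(1)}" }, original_pr]
end

squash = split_subject("Adds TLS support (#111)")
merge = split_subject("Merge pull request #12 from example/feature")
plain = split_subject("Fix typo")
nested = split_subject("Reverts #42 (#50)")
```

With these inputs, the squash subject yields "Adds TLS support" with PR "111", and the nested case keeps the inner reference as "Reverts PR 42" while recording "50" as the original PR.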
Importing Existing Mapped Commits
def import_new_commits
  new_commits = `git log --reverse --pretty='%H' origin/main-oss`.split("\n")
  raise "git log command failed" unless $?.success?

  new_commits.each do |new_commit|
    next if NEW_COMMITS.include?(new_commit)

    # because we maintain a reference to the original commit
    # in the new commit message, we can parse the git log for
    # the new branch to fetch the existing mapping
    # (to_s keeps a missing line from raising NoMethodError before the check below)
    original_commit = `git show --no-patch --pretty=%b #{new_commit} | grep 'Original Commit'`.split(" ")[2].to_s.strip
    raise "failed to get original commit" if !$?.success? || original_commit.empty?

    next if MAPPED_COMMITS.include?(original_commit)

    puts "#{original_commit} rewritten to #{new_commit}"
    MAPPED_COMMITS[original_commit] = new_commit
    NEW_COMMITS << new_commit
  end
end
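The mapping recovery relies on the "Original Commit" line written into each new commit message. As a sketch of that parsing step on its own, the lookup can also be done in Ruby rather than with grep. `original_commit_from` and the constant are hypothetical names, and the message text mirrors the rewritten commit format described earlier.

```ruby
# Matches a full 40-character hash on its own "Original Commit" line.
ORIGINAL_COMMIT = /^Original Commit: (\h{40})$/

# Returns the original commit hash recorded in a rewritten commit's
# message, or nil when the message has no such line.
def original_commit_from(message)
  message[ORIGINAL_COMMIT, 1]
end

rewritten_message = <<~MESSAGE
  Adds TLS support

  Original PR: 111
  Original Commit: f28181c271ff82f65a6fbad370187f7cd6852faa
MESSAGE

recovered = original_commit_from(rewritten_message)
missing = original_commit_from("Adds TLS support\n")
```

Requiring exactly 40 hex characters means a truncated or hand-edited line is treated as missing instead of producing a bad mapping entry.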
Full Script
Here’s the full script as a gist.
Alternative Approaches
We could also have handled the missing pull requests by creating placeholder issues on the new repository, each stating that the repository had been migrated and that the number (for example, #111) refers to a historical, private issue. Since we also wanted to clean up our commit messages, we decided to rewrite the history rather than create those placeholders.
Connect with our team
We spend most of our time at RWX solving problems related to builds and tests. We publish open source tools for Captain and ABQ and are happy to chat anytime. Say hello on Discord or reach out at [email protected]