Automating the Fracturing of a Git Repository

I have a git repo that’s become a monster. It’s got at least two WordPress themes, a handful of custom WordPress plugins, some .htaccess files, an artwork directory, and more. Back in the dark times, when it was a Subversion repository, this sort of made sense, in that I didn’t have to set up a new repo on the server for each component in the project. But I haven’t touched SVN in a long time, and making new repositories in Git is easy. We’re doing some major work on this client’s sites this fall, so it’s time to break up the giant repository into several smaller ones.

The monolith repository is being split into seventeen smaller components. One of these components is an “artwork” directory, the ~600MB history of which will bloat the git history of all sixteen other repositories if we don’t purge it properly. The goals of this script are:

  1. Avoid typing the same set of commands seventeen times.
  2. Keep the appropriate git history for each of the seventeen components.
  3. Purge any unrelated git history so that each component’s .git directory contains only its own history.

GitHub has a help page on splitting a subfolder into its own repository, and another on removing files from a repository’s history (it’s meant for sensitive data, but it works for any data). In theory only the first page should be needed, but in testing, following its instructions alone left the monolith’s entire history in place. Adding the purge and garbage-collection commands from the second page brought each component’s git history down to an appropriate size.
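To see what the second page’s commands actually buy you, the sketch below wraps the script’s purge steps in a function and measures object storage before and after. It assumes a component that has already been through git filter-branch; pack_kib and purge_history are hypothetical helper names, not part of the original script.

```bash
# Hypothetical helper: total on-disk size in KiB of loose objects plus packs,
# as reported by git count-objects -v (size-garbage is deliberately excluded).
pack_kib() {
  git -C "$1" count-objects -v |
    awk '/^size(-pack)?:/ { total += $2 } END { print total + 0 }'
}

# Hypothetical helper: the same purge steps the script runs after filter-branch.
purge_history() {
  # Delete the backup refs filter-branch leaves under refs/original.
  git -C "$1" for-each-ref --format='delete %(refname)' refs/original |
    git -C "$1" update-ref --stdin
  # Expire every reflog entry so old commits are no longer reachable.
  git -C "$1" reflog expire --expire=now --all
  # Repack reachable objects and delete unreachable ones immediately.
  git -C "$1" gc --quiet --prune=now
}
```

For example, after filtering a copy into ./utility, running pack_kib utility, then purge_history utility, then pack_kib utility again should show the second number dropping sharply whenever unrelated history (such as the artwork directory) was only being kept alive by the backup refs and reflog.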

The script is relatively simple, but it relies a bit on directory structure. It should live in the same directory as the monolith repo: the script doesn’t go inside the monolith, but in the folder that contains it. There are two places to change in the script:

  1. Put the directories for extraction into the repos array near the top of the script. These paths are relative to the root of the repository.
  2. Set the master variable to the path of a copy of the monolith repo, which serves as the source for every export.
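Because each export is named with basename, two paths that end in the same directory name (say, code/utility and a hypothetical lib/utility) would both target ./utility. A small pre-flight check can catch that, along with typos in the repos array, before anything is copied. This is a sketch: check_sources is a hypothetical helper, and like the script it assumes the source branch is called master. It avoids associative arrays so it also runs on older bash 3.2.

```bash
# Hypothetical pre-flight check: check_sources SRC_REPO PATH...
# Returns non-zero if SRC_REPO lacks a master branch, if any PATH is not a
# directory on master, or if two PATHs share a basename (same export dir).
check_sources() {
  local src=$1
  shift
  local status=0 path name i
  local -a names=() paths=()
  if [ ! -d "$src/.git" ] || ! git -C "$src" rev-parse --verify -q master >/dev/null; then
    echo "check_sources: $src is not a git repo with a master branch" >&2
    return 1
  fi
  for path in "$@"; do
    # "master:PATH" names the object at PATH on master; it must be a tree.
    if [ "$(git -C "$src" cat-file -t "master:$path" 2>/dev/null)" != tree ]; then
      echo "check_sources: $path is not a directory on master" >&2
      status=1
    fi
    # The export loop writes to ./$(basename PATH), so names must be unique.
    name=$(basename "$path")
    for i in "${!names[@]}"; do
      if [ "${names[$i]}" = "$name" ]; then
        echo "check_sources: ${paths[$i]} and $path would both export to ./$name" >&2
        status=1
      fi
    done
    names+=("$name")
    paths+=("$path")
  done
  return "$status"
}
```

Calling check_sources "$master" "${repos[@]}" || exit 1 just before the for loop would report every bad entry and stop the script before any copies are made.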

Once that’s done, running ./export-directory-repos.sh should export the repositories.
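Once it has run, a quick report can confirm that each export carries only a modest history. The sketch below uses report_exports, a hypothetical helper name, to print the commit count on master and the packed size from git count-objects -vH for each exported directory. It assumes the script has already run, so git gc has repacked everything and size-pack is the figure that matters.

```bash
# Hypothetical helper: report_exports DIR...
# Prints "DIR<TAB>N commits<TAB>packed size" for each exported repo.
report_exports() {
  local dir commits packed
  for dir in "$@"; do
    commits=$(git -C "$dir" rev-list --count master) || return 1
    # -H prints human-readable sizes, e.g. "1.05 KiB".
    packed=$(git -C "$dir" count-objects -vH | sed -n 's/^size-pack: //p')
    printf '%s\t%s commits\t%s\n' "$dir" "$commits" "$packed"
  done
}
```

For example, report_exports artwork checklists comingsoon utility, run from the containing folder, should show artwork as the only large repository.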

Here’s the gist:


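#!/bin/bash
# Enter the paths in the main repo to the subdirectories you want to extract.
# Separate paths with spaces or newlines.
repos=(
artwork
checklists
code/comingsoon
code/utility
)
echo "Total repos to export : ${#repos[*]}"
# Put the directory of the main repo here.
master=monolith-repo-master
for repo in "${repos[@]}"
do
  # Each export is named after the last path component (code/utility => utility)
  subdir=$(basename "$repo")
  echo "${repo} => ${subdir}"
  # Refuse to overwrite: cp -R into an existing directory would nest the copy
  if [ -e "$subdir" ]; then
    echo "Skipping ${repo}: ./${subdir} already exists" >&2
    continue
  fi
  # Work on a fresh copy so the source repo is never rewritten
  cp -R "$master" "$subdir"
  cd "$subdir" || exit 1
  # Make $repo the new project root and drop commits that don't touch it.
  # The trailing "master" is the branch name, not the $master variable.
  git filter-branch --prune-empty --subdirectory-filter "$repo" master
  # Delete the backup refs filter-branch keeps under refs/original
  git for-each-ref --format='delete %(refname)' refs/original | git update-ref --stdin
  # Expire the reflog and garbage-collect so unrelated objects are really gone
  git reflog expire --expire=now --all
  git gc --prune=now
  cd ../
done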