I have a git repo that’s become a monster. It’s got at least two WordPress themes, a handful of custom WordPress plugins, some .htaccess files, an artwork directory, and more. Back in the dark times, when it was a Subversion repository, this sort of made sense: I didn’t have to set up a new repo on the server for each component in the project. But I haven’t touched SVN in a long time, and making new repositories in Git is easy. We’re doing some major work on this client’s sites this fall, so it’s time to break up the giant repository into several smaller ones.
The monolith repository is being split into seventeen smaller components. One of these components is an “artwork” directory, the ~600MB history of which will bloat the git history of all sixteen other repositories if we don’t purge it properly. The goals of this script are:
- Avoid typing the same set of commands seventeen times.
- Keep the appropriate git history for each of the seventeen components.
- Purge any unrelated git history so each component only has its own history in its .git directory.
GitHub has a help page on splitting a subfolder into its own repository, and another on removing files from a repository’s history (it’s meant for sensitive data, but works for any data). In theory we only need the first page. In testing, though, the entire history of the monolith remained after following only those instructions. Adding the purge and garbage collection commands from the second page brought the git history down to an appropriate size.
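One way to check whether the purge actually worked is to compare the size of the .git directory before and after. The following is a minimal sketch, not part of the original script: `git_dir_size_kb` is a hypothetical helper name, and it assumes `du` and `cut` are available.

```shell
#!/bin/bash
# git_dir_size_kb is a hypothetical helper, not part of the original script.
# It prints the size of <repo>/.git in kilobytes, or fails if there is none.
git_dir_size_kb() {
  local repo=${1:-.}
  [ -d "$repo/.git" ] || { echo "no .git directory in $repo" >&2; return 1; }
  # du -sk prints "<kilobytes><TAB><path>"; keep only the number.
  du -sk "$repo/.git" | cut -f1
}
```

Calling it on the monolith copy and then on an extracted repository gives two numbers to compare. If the second is nearly as large as the first, the old history is probably still in there.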
The script is relatively simple, but relies a bit on directory structure. It should sit in the same directory as the monolith repo (that is, next to the repository in its containing folder, not inside it). There are two places to change the script:
- Put the directories for extraction into the `repos` array at the top of the script. These paths are relative to the root of the repository.
- Put the path of a copy of the monolith repo, for use as a source, in the `master` variable just above the `for` loop.
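Because the script names each output directory after the last path segment, it can help to preview that mapping before copying anything. This is a rough sketch under that assumption. `preview_exports` is a hypothetical helper, and the paths are the example entries from the script.

```shell
#!/bin/bash
# Example paths taken from the script's repos array.
repos=(
  artwork
  checklists
  code/comingsoon
  code/utility
)

# preview_exports is a hypothetical helper: it prints "path => directory" for
# each entry and stops with an error when two entries share a directory name.
preview_exports() {
  local repo subdir seen=" "
  for repo in "$@"; do
    subdir=$(basename "$repo")
    if [[ "$seen" == *" $subdir "* ]]; then
      echo "collision: ${repo} would overwrite ${subdir}" >&2
      return 1
    fi
    seen+="$subdir "
    echo "${repo} => ${subdir}"
  done
}

preview_exports "${repos[@]}"
```

With the example paths, `code/comingsoon` maps to `comingsoon`. Two entries such as `code/utility` and `other/utility` would both map to `utility`, which is the case the check is meant to catch.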
Once that’s done, running `./export-directory-repos.sh` should export the repositories.
Here’s the gist:
#!/bin/bash
# Enter the paths in the main repo to the subdirectories you want to extract.
# Separate paths with spaces or newlines.
repos=(
  artwork
  checklists
  code/comingsoon
  code/utility
)
echo "Total repos to export : ${#repos[*]}"
# Put the directory of the main repo here.
master=monolith-repo-master
for repo in "${repos[@]}"
do
  subdir=$(basename "$repo")
  echo "${repo} => ${subdir}"
  # Work on a fresh copy of the monolith so the source stays untouched.
  cp -R "$master" "$subdir"
  cd "$subdir" || exit 1
  # Rewrite the master branch (the branch name, not the $master variable)
  # so it contains only this subdirectory's history.
  git filter-branch --prune-empty --subdirectory-filter "$repo" master
  # Delete the backup refs that filter-branch leaves under refs/original.
  git for-each-ref --format='delete %(refname)' refs/original | git update-ref --stdin
  # Expire the reflog and garbage-collect so unrelated objects are removed.
  git reflog expire --expire=now --all
  git gc --prune=now
  cd ../
done
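After a run, a quick sanity check is to count the commits that survived in each exported repository. The sketch below assumes the exported directories sit next to the script and uses the basenames of the example paths. `commit_count` is a hypothetical helper, not something the original script provides.

```shell
#!/bin/bash
# commit_count is a hypothetical helper: it prints how many commits are
# reachable from HEAD in the repository at the given path.
commit_count() {
  git -C "$1" rev-list --count HEAD
}

# Report on each exported directory that exists in the current folder.
# These names are the basenames of the example paths in the repos array.
for dir in artwork checklists comingsoon utility; do
  if [ -d "$dir/.git" ]; then
    echo "${dir}: $(commit_count "$dir") commits"
  fi
done
```

A component whose count matches the monolith’s total commit count deserves a second look, since it may not have been filtered at all.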