Sometimes you may need to delete large files or clean up sensitive data such as passwords, credentials, or keys from your git repos. Perhaps these have been in your repos for a while. You may have deleted these files from your latest commits, but they are still in the history and can still be checked out. If you want them gone, you’ll need to purge these files from your entire history.
In git, every object, whether a file, commit, or tag, has a unique hash ID. If you were to change any part of an object, its ID would change completely. A commit ID depends not only on the commit’s content but also on the IDs of all the commits that came before it, so the commit ID is a hash that represents the commit’s entire history. This is why removing a file from an old commit changes the ID of that commit and of every commit after it.
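To see how sensitive these IDs are, here is a minimal sketch that hashes two nearly identical pieces of content with git hash-object, a standard git plumbing command. The sample strings are made up for illustration.

```shell
# Hash two blobs that differ by a single character.
# Nothing is written to any repository; the command only prints the ID.
id_a=$(printf 'secret=abc\n' | git hash-object --stdin)
id_b=$(printf 'secret=abd\n' | git hash-object --stdin)

echo "first blob:  $id_a"
echo "second blob: $id_b"
```

The two IDs share nothing recognizable even though the inputs differ by one letter, which is why any change to an old commit forces new IDs for everything built on top of it.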
You can use the built-in git-filter-branch command to rewrite history and remove these files. However, the BFG Repo-Cleaner tool is much better suited to purging large files or removing passwords and credentials. According to the BFG project, it is also simpler and 10–720x faster than git-filter-branch.
Where to Download BFG
You can grab this tool from the BFG Repo-Cleaner project page at rtyley.github.io.
The man page is available at
Requirements
You need Java 8 or later to run the latest version of BFG.
First Step
Clone a fresh copy of your repo with the mirror option. This downloads a bare Git repository, which means that your normal files won’t be visible. However, it is a full copy of the Git database for your repository, so at this point you should make a backup to ensure you don’t lose anything.
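The step above can be sketched as follows. So that it can be run safely, this example builds a throwaway source repository in a temporary directory and mirror-clones that. The names origin, my-repo.git, and my-repo.git.bak are placeholders; in real use you would pass your repository URL to git clone instead.

```shell
# Demo setup only: a tiny repository to clone from.
work=$(mktemp -d)
git init -q "$work/origin"
git -C "$work/origin" -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "initial commit"

# The actual step: a mirror clone is a bare copy of the full Git database.
# Replace "$work/origin" with your repository URL.
git clone -q --mirror "$work/origin" "$work/my-repo.git"

# Keep an untouched backup before rewriting any history.
cp -R "$work/my-repo.git" "$work/my-repo.git.bak"

# Prints "true" because a mirror clone has no working tree.
git -C "$work/my-repo.git" rev-parse --is-bare-repository
```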
Example — Delete a directory
This example will delete the ‘base’ directory and everything in it. Note that BFG matches on the directory name only, so you cannot specify a path to the directory; every directory named ‘base’ in the history is removed.
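A minimal sketch of this example, assuming the BFG jar has been saved as bfg.jar (release jars normally carry a version number in the file name) and the mirror clone is called my-repo.git. Both names are placeholders. The guard simply skips the run when either is missing.

```shell
BFG_JAR="${BFG_JAR:-bfg.jar}"   # assumed path to the downloaded BFG jar
REPO="${REPO:-my-repo.git}"     # assumed name of the mirror clone

if [ -f "$BFG_JAR" ] && [ -d "$REPO" ]; then
    # --delete-folders takes a folder name (or glob), not a path.
    java -jar "$BFG_JAR" --delete-folders base "$REPO"
else
    echo "BFG jar or mirror clone not found; nothing was changed." >&2
fi
```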
The BFG will not automatically modify the contents of your most recent commit on your master (or ‘HEAD’) branch, although it will clean all commits prior to it.
To force BFG to clean everything, including the most recent commit, pass it the “--no-blob-protection” option.
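As a sketch, the option is added alongside the cleaning flag. The file names bfg.jar and my-repo.git are again placeholders, and the run is skipped if they are missing. Note that the BFG documentation suggests the gentler route of first removing the unwanted content from your latest commit yourself, so treat this flag as a last resort.

```shell
BFG_JAR="${BFG_JAR:-bfg.jar}"   # assumed path to the downloaded BFG jar
REPO="${REPO:-my-repo.git}"     # assumed name of the mirror clone

if [ -f "$BFG_JAR" ] && [ -d "$REPO" ]; then
    # Also rewrite the latest commit instead of protecting its contents.
    java -jar "$BFG_JAR" --delete-folders base --no-blob-protection "$REPO"
else
    echo "BFG jar or mirror clone not found; nothing was changed." >&2
fi
```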
Push your changes to the upstream repo:
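BFG rewrites the commits but leaves the old objects on disk, so its documentation recommends expiring the reflog and running garbage collection before pushing. The sketch below shows that sequence. To keep it safe to run, it sets up throwaway repositories in a temporary directory (src, upstream.git, and my-repo.git are placeholder names); in real use you would run only the last four commands inside your cleaned mirror clone.

```shell
# Demo setup only: a stand-in upstream and a mirror clone of it.
work=$(mktemp -d)
git init -q "$work/src"
git -C "$work/src" -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "initial commit"
git clone -q --bare "$work/src" "$work/upstream.git"
git clone -q --mirror "$work/upstream.git" "$work/my-repo.git"

# The actual steps, run inside the cleaned mirror clone.
cd "$work/my-repo.git"
# Drop reflog entries that still point at the old commits.
git reflog expire --expire=now --all
# Physically remove the objects that are no longer referenced.
git gc --quiet --prune=now --aggressive
# A mirror clone pushes every ref, so upstream receives the rewritten history.
git push --quiet
```

Because the push replaces history upstream, anyone with an existing clone should re-clone afterwards rather than pull.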
Example — Delete files bigger than 50M
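A minimal sketch, assuming the jar is saved as bfg.jar and the mirror clone is my-repo.git (both placeholder names). The run is skipped if either is missing.

```shell
BFG_JAR="${BFG_JAR:-bfg.jar}"   # assumed path to the downloaded BFG jar
REPO="${REPO:-my-repo.git}"     # assumed name of the mirror clone

if [ -f "$BFG_JAR" ] && [ -d "$REPO" ]; then
    # Remove every blob larger than 50 megabytes from history.
    java -jar "$BFG_JAR" --strip-blobs-bigger-than 50M "$REPO"
else
    echo "BFG jar or mirror clone not found; nothing was changed." >&2
fi
```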
Example — Delete all files named ‘id_rsa’ or ‘id_dsa’
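A minimal sketch, with bfg.jar and my-repo.git as placeholder names and the run skipped if either is missing. The glob is quoted here so that the shell passes it to BFG as a single argument instead of expanding the braces itself.

```shell
BFG_JAR="${BFG_JAR:-bfg.jar}"   # assumed path to the downloaded BFG jar
REPO="${REPO:-my-repo.git}"     # assumed name of the mirror clone

if [ -f "$BFG_JAR" ] && [ -d "$REPO" ]; then
    # --delete-files matches on file name, in any directory of any commit.
    java -jar "$BFG_JAR" --delete-files 'id_{dsa,rsa}' "$REPO"
else
    echo "BFG jar or mirror clone not found; nothing was changed." >&2
fi
```

Keep in mind that any key or password that was ever pushed should still be considered compromised and rotated, even after it is removed from history.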
Conclusion
I found this tool while investigating ways to work around the fact that our company’s GitHub instance isn’t licensed to support files greater than 50MB. BFG makes it easy to remove large files, which was the solution I was looking for.