Mastering Git: Safely Rewriting History to Remove Sensitive Files
Undoing the Past with git-filter-repo
Once upon a time, in a classic developer rite of passage, someone accidentally committed a .env file to version control. I’m not saying it was me, but let’s just say we all know someone who’s been there. You can probably guess how that story ended. But what if I told you it’s possible to go back in time and make it like it never happened?
In this tutorial, we shall explore how to safely rewrite your git history to remove any unwanted files from version control. Whether it’s database credentials, API keys or any other sensitive information, having these details exposed in your git history poses very serious security risks. Even if you end up deleting the file in a new commit, the sensitive data still lives in your repository’s history. Yes, you heard that right! But don’t worry, we’ve got a solution! So come on, let’s go back in time!
The Need for Rewriting Git History
The strength of Git lies in its detailed record-keeping. Every single commit is saved, preserved and accessible. However, this quickly becomes a liability when sensitive data is committed. It’s critical to emphasize that simply deleting the file with a new commit isn’t enough; the information is still accessible from the repo history.
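To see why a follow-up commit is not enough, here is a small sketch that builds a throwaway repository in a temporary directory, commits a .env file, deletes it in a second commit, and then reads the secret back out of history. The file contents and the demo identity are made up for illustration.

```shell
#!/bin/sh
set -e

# Work in a throwaway repository so nothing real is touched.
scratch=$(mktemp -d)
cd "$scratch"
git init -q

# Hypothetical identity, passed per command so no global config is needed.
commit() { git -c user.name=demo -c user.email=demo@example.com commit -q "$@"; }

# Commit 1: the accidental secret (made-up value).
echo "API_KEY=secret123" > .env
git add -f .env
commit -m "add config"

# Commit 2: "fix" the mistake by deleting the file.
git rm -q .env
commit -m "remove .env"

# The file is gone from the working tree...
[ ! -e .env ] && echo "working tree: .env is gone"

# ...but the commit that added it is still reachable, and so is its content.
leaky_commit=$(git log --all --format=%H --diff-filter=A -- .env)
recovered=$(git show "$leaky_commit:.env")
echo "recovered from history: $recovered"
```

Anyone with a copy of the repository can run the last two commands, which is why the rest of this guide rewrites history instead of just deleting the file.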
Here are some scenarios to consider:
🔑 Committed API keys or passwords in configuration files.
🗄️ Database credentials in environment files.
📧 Personal data or email addresses in test files.
🔐 Private keys or certificates.
You might be thinking, “Why should I be concerned if my repo is private?” Even if it’s private today, it could be public tomorrow. Or worse still, a security breach could expose your entire git history. What are you going to say then? This is exactly why we need tools to safely remove sensitive data from Git’s history.
Introducing git-filter-repo
git-filter-repo is the modern tool for rewriting Git history, and Git's own documentation recommends it over the older git filter-branch command. It offers several advantages:
⚡ Performance: Up to 100x faster than filter-branch.
🛠️ Safety: Built-in safeguards against common mistakes.
🎯 Precision: More accurate handling of complex rewrites.
📦 Simplicity: Cleaner, more intuitive command syntax.
| Feature | git filter-repo | git filter-branch |
| --- | --- | --- |
| Speed | Very fast | Slow |
| Ease of use | Simple | Complex |
| Safety checks | Yes | Limited |
| Documentation | Comprehensive | Basic |
Step-by-Step Guide to Using git-filter-repo
Prerequisites
A GitHub repository with a sensitive file (e.g., .env) in its history.
Git installed on your machine.
Basic familiarity with Git commands.
A backup of your repository (seriously, do this first!).
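One way to take that backup is a mirror clone, which copies every branch and tag into a separate bare repository. The sketch below wraps it in a hypothetical helper, backup_repo; the function name and the timestamped destination naming are assumptions for illustration, not part of Git.

```shell
#!/bin/sh
# backup_repo SRC [DEST_PARENT]
# Hypothetical helper: mirror-clone SRC (a path or URL) into a timestamped
# bare repository under DEST_PARENT and print the backup's location.
backup_repo() {
    src=$1
    dest_parent=${2:-.}
    dest="$dest_parent/backup-$(date +%Y%m%d-%H%M%S).git"
    # --mirror copies all refs (branches, tags, notes), not just one branch.
    git clone -q --mirror "$src" "$dest" && echo "$dest"
}

# Usage (paths are placeholders):
#   backup_repo /path/to/your/repo ~/backups
```

Keeping the backup outside the working repository means it is unaffected by anything the rewrite does.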
Here are the steps:
First, we need to install git-filter-repo. It can be installed using Python's package manager, pip.
pip install git-filter-repo
Alternatively, you can download it directly from the GitHub repository.
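A quick way to confirm the installation worked is to check that the git-filter-repo executable is on your PATH, since that is how Git finds it as the subcommand git filter-repo. The helper name check_filter_repo below is hypothetical.

```shell
#!/bin/sh
# check_filter_repo
# Hypothetical helper: report whether the git-filter-repo executable is
# reachable on PATH. Git runs `git filter-repo` by looking for a program
# named `git-filter-repo`.
check_filter_repo() {
    if command -v git-filter-repo >/dev/null 2>&1; then
        echo "git-filter-repo found at: $(command -v git-filter-repo)"
    else
        echo "git-filter-repo not found on PATH" >&2
        return 1
    fi
}

check_filter_repo || echo "install it first, e.g. with: pip install git-filter-repo"
```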
In your terminal, navigate to the directory of the Git repository you want to clean.
cd /path/to/your/repo
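Now, use the git filter-repo command to remove the .env file from your repository history. The --path option selects the .env file at the repository root, and --invert-paths tells the tool to keep everything except that path. By default, git filter-repo expects to run in a fresh clone and refuses to proceed otherwise. Working in a fresh clone is the safer option, although passing --force overrides the check.
git filter-repo --path .env --invert-paths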
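After the rewrite, it is worth confirming that no commit still references the file. A minimal sketch, assuming a hypothetical helper named path_in_history: it asks Git for any commit on any ref that touched the given path and succeeds if at least one exists.

```shell
#!/bin/sh
# path_in_history PATH
# Hypothetical helper: exit 0 if any commit reachable from any ref touched
# PATH, exit 1 otherwise. Run it from inside the repository.
path_in_history() {
    [ -n "$(git log --all --format=%H -- "$1")" ]
}

# Usage after running git filter-repo:
#   if path_in_history .env; then
#       echo ".env is still in history"
#   else
#       echo ".env is gone from every branch and tag"
#   fi
```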
Next, ensure that the .env file is added to your .gitignore file so it won't be tracked in future commits.
echo ".env" >> .gitignore
git add .gitignore
git commit -m "chore: add .env to .gitignore"
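Running the echo command twice would add a duplicate line. As a small variation, the sketch below only appends the entry when it is missing; ignore_path is a hypothetical helper name.

```shell
#!/bin/sh
# ignore_path PATTERN
# Hypothetical helper: add PATTERN to .gitignore in the current directory
# only if that exact line is not already there.
ignore_path() {
    touch .gitignore
    # -x matches whole lines, -F treats the pattern as a literal string.
    grep -qxF -- "$1" .gitignore || echo "$1" >> .gitignore
}

# Usage (run from the repository root):
#   ignore_path .env
#   git check-ignore -v .env   # prints the .gitignore rule that matches .env
```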
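Finally, force push the changes to your remote repository. This is necessary because you’ve rewritten the history. Note that git filter-repo removes the origin remote as a safety measure, so if git remote -v shows nothing, add the remote back first (replace <repository-url> with your remote's URL).
git remote add origin <repository-url>
git push origin --force --all
git push origin --force --tags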
Summary of operations to be performed:
# Step 1: Navigate to your repository
cd /path/to/your/repo
# Step 2: Run git filter-repo to remove the .env file
git filter-repo --path .env --invert-paths
# Step 3: Add .env to .gitignore
echo ".env" >> .gitignore
git add .gitignore
git commit -m "chore: add .env to .gitignore"
# Step 4: Re-add the remote if git filter-repo removed it, then force push
git remote add origin <repository-url>
git push origin --force --all
git push origin --force --tags
Using --all and --tags ensures that all your branches and tags are pushed to the remote repository, which is especially useful after operations like history rewriting. However, these operations should be used with caution, especially in a collaborative environment, due to their potential to disrupt the work of others.
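Before the real force push, git push accepts --dry-run, which reports what would be updated without sending anything. The sketch below tries this against a throwaway bare repository standing in for the remote; all paths and the demo identity are illustrative.

```shell
#!/bin/sh
set -e

scratch=$(mktemp -d)
# A local bare repository plays the role of the hosted remote.
git init -q --bare "$scratch/remote.git"

git init -q "$scratch/work"
cd "$scratch/work"
git -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "initial commit"
git tag v1.0
git remote add origin "$scratch/remote.git"

# Preview: nothing is sent to the remote yet.
git push --dry-run --force --all origin
git push --dry-run --force --tags origin

# The real thing, once the preview looks right.
git push -q --force --all origin
git push -q --force --tags origin
```

On a real repository, the dry run is a cheap last check that you are about to overwrite the branches and tags you expect.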
Team Synchronization After History Rewrite
If this is a shared repository with other team members, they'll need to sync their local repositories. Share these instructions with your team:
For a Quick Sync (For team members with no local changes):
Switch to your main/default branch
git checkout main # or your default branch name
Fetch the latest changes
git fetch origin
Reset to match the remote
git reset --hard origin/main
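Because git reset --hard discards uncommitted changes to tracked files without asking, a guard along these lines can confirm the working tree really is clean first. tree_is_clean is a hypothetical helper name.

```shell
#!/bin/sh
# tree_is_clean
# Hypothetical helper: exit 0 when there are no staged, unstaged, or
# untracked changes in the current repository, exit 1 otherwise.
tree_is_clean() {
    [ -z "$(git status --porcelain)" ]
}

# Usage before the Quick Sync reset:
#   if tree_is_clean; then
#       git fetch origin && git reset --hard origin/main
#   else
#       echo "local changes present; use the Safe Sync steps instead" >&2
#   fi
```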
For a Safe Sync (For team members with uncommitted changes):
This applies when you’re working on the main branch, have uncommitted changes, and have no other local branches. If this is you, follow the steps below:
Save your current changes
git stash # Temporarily store uncommitted changes
Switch to main branch
git checkout main
Fetch and reset
git fetch origin
git reset --hard origin/main
Reapply your changes
git stash pop # Restore your uncommitted changes
For a Complete Sync (For team members with local branches):
This applies when you have local branches with commits, feature branches that have not been pushed yet, or branches based on the old history that need to be preserved. If this is you, follow the steps below:
Create a backup of your local main branch
git branch backup-main main
Update main branch
git checkout main
git fetch origin
git reset --hard origin/main
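For each local branch, replay only that branch's own commits onto the rewritten main. A plain git rebase main would also try to replay commits from the old main, which could bring the removed file back; backup-main marks where the old history ends.
git checkout <your-branch>
git rebase --onto main backup-main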
⚠️ Important Notes for Team Members
Choose the appropriate sync method based on your local setup.
Use Safe Sync if you're just working on the main branch with temporary uncommitted changes.
Use Complete Sync when you have:
Local branches that need to be preserved.
Unpushed commits.
Feature branches based on the old history.
Commit or stash any important changes before syncing.
The history rewrite means all commit hashes have changed.
You may need to force push your rebased local branches.
When in doubt, create a backup branch first.
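Since every commit hash has changed, it helps to know which local branches still sit on the old history. A minimal sketch, assuming a hypothetical helper named based_on: it checks whether the rewritten upstream is an ancestor of a given branch.

```shell
#!/bin/sh
# based_on BRANCH UPSTREAM
# Hypothetical helper: exit 0 if UPSTREAM is an ancestor of BRANCH, i.e.
# BRANCH already contains the rewritten history, exit 1 otherwise.
based_on() {
    git merge-base --is-ancestor "$2" "$1"
}

# Usage after `git fetch origin` (origin/main is a placeholder):
#   for b in $(git for-each-ref --format='%(refname:short)' refs/heads); do
#       if based_on "$b" origin/main; then
#           echo "$b: contains the rewritten main"
#       else
#           echo "$b: does not contain the rewritten main yet"
#       fi
#   done
```

A branch that fails the check is either based on the old history or simply behind the new main, so treat the result as a prompt to look closer, not a verdict.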
🔍 Troubleshooting
If you encounter issues:
1. Create a backup of your local work: git branch backup-$(date +%Y%m%d)
2. Try the Quick Sync method.
Conclusion
In this article, we have learnt that history can be rewritten, at least when it comes to your Git history.
Key Takeaways
Git history rewrites, while powerful, should be used thoughtfully and sparingly.
git filter-repo provides a reliable way to remove sensitive data from your repository's history.
Team coordination is crucial when performing history rewrites on shared repositories.
Prevention is always better than cure – always review changes before committing and pushing.
Best Practices moving forward
Make sure to add all your sensitive files to .gitignore.
Consider setting up pre-commit hooks to catch sensitive files before they are committed.
To learn more about pre-commit hooks, check out my detailed guide here.
Create clear guidelines for handling configuration files and establish a protocol for when sensitive data is accidentally exposed.
Rotate any exposed credentials immediately.
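As a rough illustration of the pre-commit idea, the script below could be saved as .git/hooks/pre-commit and made executable. It blocks a commit whenever a staged file is named .env, at any depth in the repository. The function names and the filename pattern are assumptions; adjust the pattern to the files your project treats as sensitive.

```shell
#!/bin/sh
# Sketch of a pre-commit hook: refuse to commit files named .env.

# staged_env_files prints staged paths whose file name is exactly .env.
staged_env_files() {
    # --cached inspects the index; --diff-filter=ACMR skips deletions.
    git diff --cached --name-only --diff-filter=ACMR |
        grep -E '(^|/)\.env$'
}

# check_staged returns non-zero if any matching file is staged.
check_staged() {
    blocked=$(staged_env_files)
    if [ -n "$blocked" ]; then
        echo "Refusing to commit sensitive files:" >&2
        echo "$blocked" >&2
        return 1
    fi
}

# As the last command, its status becomes the hook's exit status;
# a non-zero status makes git abort the commit.
check_staged
```

A hook like this lives only in your local clone, so each team member needs to install it, or the team can adopt a shared hook manager.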
Remember: Rewriting Git history is like time travel - powerful but should be used responsibly. The best strategy is to prevent sensitive data from being committed in the first place, but when accidents happen, you now have the tools to handle them properly.
Resources
Bookmark this guide for future reference (you will need it), share with your team and react with an emoji if this saved your day! 🎯 Until next time, commit responsibly!