Mastering Git: Safely Rewriting History to Remove Sensitive Files

Undoing the Past with git-filter-repo

Once upon a time, in a classic developer rite of passage, someone accidentally committed a .env file to version control. I’m not saying it was me, but let’s just say we all know someone who’s been there. You can probably guess how that story ended. But what if I told you it’s possible to go back in time and make it like it never happened?

In this tutorial, we shall explore how to safely rewrite your git history to remove any unwanted files from version control. Whether it’s database credentials, API keys or any other sensitive information, having these details exposed in your git history poses very serious security risks. Even if you end up deleting the file in a new commit, the sensitive data still lives in your repository’s history. Yes, you heard that right! But don’t worry, we’ve got a solution! So come on, let’s go back in time!

The Need for Rewriting Git History

The strength of Git lies in its detailed record-keeping. Every single commit is saved, preserved and accessible. However, this quickly becomes a liability when sensitive data is committed. It’s critical to emphasize that simply deleting the file with a new commit isn’t enough; the information is still accessible from the repo history.
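To see this for yourself, here is a small sketch in a throwaway repository. The temporary directory, the demo identity and the fake API_KEY value are all made up for illustration; the only point is that git show can still read the file from an earlier commit after a later commit deleted it.

```shell
# Throwaway demo repository; the path and the "secret" are invented.
demo=$(mktemp -d)
cd "$demo"
git init -q
git config user.email "demo@example.com"
git config user.name "Demo"

# Commit a fake secret, then delete it in a follow-up commit.
printf 'API_KEY=not-a-real-key\n' > .env
git add .env
git commit -q -m "add config"
git rm -q .env
git commit -q -m "remove .env"

# The file is gone from the working tree...
ls -A

# ...but the root commit still holds its full content.
first_commit=$(git rev-list --max-parents=0 HEAD)
leaked=$(git show "$first_commit:.env")
echo "$leaked"
```

Anyone with a clone of the repository can run the same git show command, which is why a delete commit alone does not help.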

Here are some scenarios to consider:

  • 🔑 Committed API keys or passwords in configuration files.

  • 🗄️ Database credentials in environment files.

  • 📧 Personal data or email addresses in test files.

  • 🔐 Private keys or certificates.

You might be thinking, “Why should I be concerned if my repo is private?” Even if it’s private today, it could be public tomorrow. Or worse still, a security breach could expose your entire git history. What are you going to say then? This is exactly why we need tools to safely remove sensitive data from Git’s history.

Introducing git-filter-repo

git-filter-repo is the modern tool for rewriting Git history, and Git's own documentation for filter-branch recommends it as the alternative. Compared with the older git filter-branch command, it offers several advantages:

  • Performance: Up to 100x faster than filter-branch.

  • 🛠️ Safety: Built-in safeguards against common mistakes.

  • 🎯 Precision: More accurate handling of complex rewrites.

  • 📦 Simplicity: Cleaner, more intuitive command syntax.

Feature          git filter-repo    git filter-branch
Speed            Very fast          Slow
Ease of use      Simple             Complex
Safety checks    Yes                Limited
Documentation    Comprehensive      Basic

Step-by-Step Guide to Using git-filter-repo

Prerequisites

  • A GitHub repository with a sensitive file (e.g., .env) in its history.

  • Git installed on your machine.

  • Basic familiarity with Git commands.

  • A backup of your repository (seriously, do this first!).
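One way to take that backup is a mirror clone, which copies every branch and tag into a separate bare repository. The sketch below wraps it in a small function; backup_repo is a hypothetical helper name and both paths are placeholders you would supply yourself.

```shell
# backup_repo is a hypothetical helper, not a Git command.
# Usage: backup_repo /path/to/your/repo /path/to/backups/repo-backup.git
backup_repo() {
  src=$1
  dest=$2
  # --mirror copies all refs (branches, tags and more) into a bare repository.
  git clone -q --mirror "$src" "$dest"
}
```

Keep in mind that the backup still contains the sensitive file, so store it somewhere private and delete it once you are happy with the rewrite.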

Here are the steps:

  1. First, install git-filter-repo. It can be installed using Python's package manager, pip.

     pip install git-filter-repo
    

Alternatively, you can download it directly from the GitHub repository.
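Whichever route you take, a quick check that the tool is reachable can save some confusion later. Git runs any executable named git-<name> on your PATH as git <name>, so the sketch below simply looks for git-filter-repo there. The filter_repo_status variable name is only illustrative.

```shell
# Look for the git-filter-repo executable on PATH; Git runs it as "git filter-repo".
if command -v git-filter-repo >/dev/null 2>&1; then
  filter_repo_status="installed"
else
  filter_repo_status="missing"
fi
echo "git-filter-repo: $filter_repo_status"
```

If this reports missing right after a pip install, the directory pip installs scripts into is probably not on your PATH.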

  1. In your terminal, navigate to the directory of the Git repository you want to clean.

     cd /path/to/your/repo
    
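  2. Now, use the git filter-repo command to remove the .env file from every commit in your repository history. Note that --path .env matches the file at the root of the repository, and that git filter-repo refuses to run on anything that doesn't look like a fresh clone unless you pass --force, so it is safest to work in a fresh clone.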

     git filter-repo --path .env --invert-paths
    
  3. Next, ensure that the .env file is added to your .gitignore file so it won't be tracked in future commits.

     echo ".env" >> .gitignore
     git add .gitignore
     git commit -m "chore: add .env to .gitignore"
    
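  4. Finally, force push the changes to your remote repository. This is necessary because you’ve rewritten the history. git filter-repo removes the origin remote as a safeguard, so add it back first.

     git remote add origin <your-remote-url>
     git push origin --force --all
     git push origin --force --tags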
    
  5. Summary of operations to be performed:

     # Step 1: Navigate to your repository
     cd /path/to/your/repo
    
     # Step 2: Run git filter-repo to remove .env files
     git filter-repo --path .env --invert-paths
    
     # Step 3: Add .env to .gitignore
     echo ".env" >> .gitignore
     git add .gitignore
     git commit -m "chore: add .env to .gitignore"
    
     # Step 4: Restore the origin remote (removed by git filter-repo) and force push
     git remote add origin <your-remote-url>
     git push origin --force --all
     git push origin --force --tags
    

Using --all and --tags ensures that all your branches and tags are pushed to the remote repository, which is especially useful after operations like history rewriting. However, these operations should be used with caution, especially in a collaborative environment, due to their potential to disrupt the work of others.
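Before force pushing, it is worth confirming that the rewrite did what you expected. The sketch below asks git log whether any commit on any ref ever touched a given path; history_has_path is a hypothetical helper name, not part of Git or git-filter-repo.

```shell
# history_has_path is a hypothetical helper name, not part of Git.
# It succeeds if any commit reachable from any ref touched the given path.
# --full-history stops Git from simplifying away side branches that
# added and removed the file before being merged.
history_has_path() {
  [ -n "$(git log --all --full-history --format=%H -- "$1")" ]
}

# Example, run from inside the rewritten repository:
#   if history_has_path .env; then echo ".env is still in history"; fi
```

This only checks the clone you run it in. Other clones, forks and any cached copies on the hosting side are not covered by it.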

Team Synchronization After History Rewrite

If this is a shared repository with other team members, they'll need to sync their local repositories. Share these instructions with your team:

  1. For a Quick Sync (For team members with no local changes):

    • Switch to your main/default branch

        git checkout main  # or your default branch name
      
    • Fetch the latest changes

        git fetch origin
      
    • Reset to match the remote

        git reset --hard origin/main
      
  2. For a Safe Sync (For team members with uncommitted changes)

    Applies when you’re working on the main branch, have some uncommitted changes and have no other local branches. If this is you, follow the steps below:

    • Save your current changes

        git stash # Temporarily store uncommitted changes
      
    • Switch to main branch

        git checkout main
      
    • Fetch and reset

        git fetch origin
        git reset --hard origin/main
      
    • Reapply your changes

        git stash pop # Restore your uncommitted changes
      
  3. For a Complete Sync (For team members with local branches)

    Applies when you have local branches with commits, feature branches that have not been pushed yet, or branches based on the old history that need to be preserved. If this is you, follow the steps below:

    • Create a backup of your old main branch

        git branch backup-main main
      
    • Update main branch

        git checkout main
        git fetch origin
        git reset --hard origin/main
      
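    • For each local branch, replay only its own commits onto the new main (a plain git rebase main can replay old commits, including the one that added the sensitive file)

        git checkout <your-branch>
        git rebase --onto main backup-main  # backup-main still points at the old history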
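To make the Complete Sync concrete, here is a self-contained simulation in a throwaway repository. It is a sketch under stated assumptions: the rewritten main is faked with an orphan branch instead of being produced by git filter-repo, and the names feature and rewritten, the file names and the fake key are all invented. It uses the three-argument form of git rebase --onto, which checks out the branch and replays only the commits in backup-main..feature.

```shell
demo=$(mktemp -d)
cd "$demo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.email "demo@example.com"
git config user.name "Demo"

# Old history: main contains the secret, feature builds on top of it.
echo "API_KEY=not-a-real-key" > .env
echo "v1" > app.txt
git add .env app.txt
git commit -q -m "initial commit"
git checkout -q -b feature
echo "feature" > feature.txt
git add feature.txt
git commit -q -m "feature work"

# Keep a pointer to the old main, as in the backup step.
git branch backup-main main

# Stand-in for the rewritten history: same content minus .env.
git checkout -q main
git checkout -q --orphan rewritten
git rm -q -f .env
git commit -q -m "initial commit"
git branch -f main rewritten

# Replay only the commits in backup-main..feature onto the new main.
git rebase -q --onto main backup-main feature

feature_commits=$(git rev-list --count feature)
env_commits=$(git log feature --full-history --format=%H -- .env)
echo "commits on feature: $feature_commits"
echo "commits touching .env: ${env_commits:-none}"
```

After the rebase, feature carries its own commit on top of the clean history, and nothing reachable from it touches .env. The backup-main branch still holds the old history, so delete it once you are done.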
      

⚠️ Important Notes for Team Members

  • Choose the appropriate sync method based on your local setup.

    • Use Safe Sync if you're just working on the main branch with temporary uncommitted changes.

    • Use Complete Sync when you have:

      • Local branches that need to be preserved.

      • Unpushed commits.

      • Feature branches based on the old history.

  • Commit or stash any important changes before syncing.

  • The history rewrite means all commit hashes have changed.

  • You may need to force push your rebased local branches.

  • When in doubt, create a backup branch first.
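If it helps, the choice between the three methods can be roughed out in a few lines of shell. This is a sketch, not a complete decision procedure: choose_sync is a hypothetical helper, it assumes the default branch is called main, and it does not detect unpushed commits sitting on main itself, which also call for the Complete Sync.

```shell
# choose_sync is a hypothetical helper; "main" as the default branch name is an assumption.
choose_sync() {
  # Any local branch besides main means branch history must be preserved.
  other_branches=$(git for-each-ref --format='%(refname:short)' refs/heads | grep -v -x 'main' || true)
  if [ -n "$other_branches" ]; then
    echo "complete"
  elif [ -n "$(git status --porcelain --untracked-files=no)" ]; then
    # Only uncommitted changes to tracked files.
    echo "safe"
  else
    echo "quick"
  fi
}
```

Run it inside your local clone before fetching the rewritten history; it prints quick, safe or complete.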

🔍 Troubleshooting

If you encounter issues:

1. Create a backup of your local work: git branch backup-$(date +%Y%m%d)

2. Try the Quick Sync method.

Conclusion

In this article, we have learnt that history can be rewritten. Your git history, at least.

Key Takeaways

  1. Git history rewrites, while powerful, should be used thoughtfully and sparingly.

  2. git filter-repo provides a reliable way to remove sensitive data from your repository's history.

  3. Team coordination is crucial when performing history rewrites on shared repositories.

  4. Prevention is always better than cure – always review changes before committing and pushing.

Best Practices moving forward

  1. Make sure to add all your sensitive files to .gitignore.

  2. Consider setting up pre-commit hooks to catch sensitive files before being committed.

    To learn more about pre-commit hooks, check out my detailed guide here.

  3. Create clear guidelines for handling configuration files and establish a protocol for when sensitive data is accidentally exposed.

  4. Rotate any exposed credentials immediately.
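As a starting point for the pre-commit hook idea above, here is a minimal sketch of what such a hook could contain. The function names and the pattern list (.env files and .pem files) are assumptions for illustration; adjust the patterns to whatever counts as sensitive in your project.

```shell
#!/bin/sh
# Sketch of a .git/hooks/pre-commit script; the pattern list is an assumption.

# List staged paths (added, copied, modified, renamed) that look sensitive.
find_sensitive_staged() {
  git diff --cached --name-only --diff-filter=ACMR |
    grep -E '(^|/)\.env$|\.pem$' || true
}

# Returns non-zero and prints the offending paths if anything matched.
check_staged() {
  hits=$(find_sensitive_staged)
  if [ -n "$hits" ]; then
    echo "Refusing to commit sensitive-looking files:" >&2
    echo "$hits" >&2
    return 1
  fi
}

# In a real hook, finish with:
#   check_staged || exit 1
```

To use it, save the script as .git/hooks/pre-commit with the final line uncommented and make it executable. Git aborts the commit whenever the hook exits with a non-zero status.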

Remember: Rewriting Git history is like time travel - powerful but should be used responsibly. The best strategy is to prevent sensitive data from being committed in the first place, but when accidents happen, you now have the tools to handle them properly.

Bookmark this guide for future reference (you will need it), share with your team and react with an emoji if this saved your day! 🎯 Until next time, commit responsibly!