The problem: secrets in git are forever
You know the drill. A developer hardcodes a Stripe secret key to test a webhook handler locally. They commit. They push. Maybe they catch it themselves and run git rm. Problem solved, right?
Wrong. The key is still in your git history. Anyone who clones the repo can run git log -p and find it. Bots scrape GitHub for exactly this pattern. GitGuardian reported over 10 million secrets detected in public commits in 2023 alone, and the number keeps climbing.
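To make that concrete, here is a small throwaway demo. It is a sketch for illustration only: the function name, the temporary repo, and the placeholder key are invented for this example, and it assumes git and mktemp are available.

```shell
#!/bin/sh
# Illustrative only: builds a throwaway repo, commits a fake key, removes it
# with `git rm`, then shows the key is still reachable through history.
demo_secret_survives_rm() {
    repo=$(mktemp -d) || return 1
    (
        cd "$repo" || exit 1
        # Per-command config so the demo does not depend on the user's git setup
        g() { git -c user.name=demo -c user.email=demo@example.com -c commit.gpgsign=false "$@"; }
        g init -q .

        printf 'STRIPE_KEY=sk_test_FAKE_NOT_A_REAL_KEY\n' > config.txt
        g add config.txt
        g commit -q -m 'add webhook config'

        g rm -q config.txt
        g commit -q -m 'remove hardcoded key'

        # The file is gone from the working tree...
        [ ! -e config.txt ] || exit 1
        # ...but the pickaxe search (-S) still finds the patch that added it.
        # Prints the number of added lines in history that contain the key.
        g log -p --all -S'sk_test_' | grep -c '^+STRIPE_KEY=sk_test_'
    )
    status=$?
    rm -rf "$repo"
    return "$status"
}
```

Calling demo_secret_survives_rm prints a non-zero count even though the file no longer exists in the working tree, which is exactly what a scraper relies on.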
Scrubbing secrets from git history means rewriting it with git filter-repo (which the Git documentation now recommends over git filter-branch) or BFG Repo-Cleaner, force-pushing to every remote, and hoping nobody already pulled the old history. If the key reached a public repo for even a few minutes, you need to rotate it. For AWS, that means updating every service, Lambda, and CI pipeline that uses it. For Stripe, that means regenerating keys and redeploying payment infrastructure.
The real cost is not the cleanup. It is the blast radius. A leaked AWS key can rack up tens of thousands in compute charges before you notice. A leaked Stripe key gives an attacker access to your customer payment data. Prevention is not optional.
The fix: a POSIX pre-commit hook
A git pre-commit hook runs automatically before every commit. If it exits with a non-zero status, the commit is blocked. The strategy: scan every staged file for patterns that look like secrets, and refuse to commit if anything matches.
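Here is the skeleton. This goes in .git/hooks/pre-commit and must be executable (chmod +x); alternatively, symlink it from a checked-in scripts/ directory so every developer on the team gets it. The check_patterns and filter_suppressed helpers defined in the next sections must appear in the file above the loop that calls them.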
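#!/bin/sh
# Pre-commit hook: block secrets from reaching git history
set -e

# check_patterns() and filter_suppressed() go here, above the loop.

# Added, copied, or modified files in the index.
# Note: this simple loop splits on whitespace, so paths with spaces are not handled.
STAGED_FILES=$(git diff --cached --name-only --diff-filter=ACM)
if [ -z "$STAGED_FILES" ]; then
    exit 0
fi

FOUND=0
for file in $STAGED_FILES; do
    # Skip binary files: git reports "-" for their line counts in --numstat.
    # (Grepping the output of file(1) for "binary" misses most binary formats
    # and inspects the working tree rather than the index.)
    case "$(git diff --cached --numstat -- "$file")" in
        -*) continue ;;
    esac
    # Get only the staged content (not working tree)
    CONTENT=$(git show ":$file" 2>/dev/null) || continue
    # Check for known secret patterns
    if printf '%s\n' "$CONTENT" | check_patterns "$file"; then
        FOUND=1
    fi
done

if [ "$FOUND" -eq 1 ]; then
    echo "COMMIT BLOCKED: potential secrets detected."
    echo "Add a pii-ok comment to suppress false positives."
    exit 1
fi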
Key detail: we use git show ":$file" to read the staged content, not the working tree. This prevents false negatives where a developer stages a file with a secret, then removes it from the working copy but does not re-stage.
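The difference between the index and the working tree is easy to see in a scratch repo. The following is a minimal sketch, not part of the hook; the function name, file name, and placeholder value are made up for the demo, and it assumes git and mktemp are available.

```shell
#!/bin/sh
# Illustrative only: stage a file containing a placeholder secret, then
# "clean up" the working copy without re-staging, and compare both views.
demo_staged_vs_worktree() {
    repo=$(mktemp -d) || return 1
    (
        cd "$repo" || exit 1
        git init -q .
        printf 'token=SECRET_PLACEHOLDER\n' > app.conf
        git add app.conf
        # The developer edits the file again but forgets `git add`
        printf 'token=\n' > app.conf

        echo "working tree: $(cat app.conf)"
        # ":path" names the version of the file stored in the index
        echo "index:        $(git show :app.conf)"
    )
    status=$?
    rm -rf "$repo"
    return "$status"
}
```

The working-tree line comes back empty after token= while the index line still carries the placeholder. A hook that scanned the file on disk would let this commit through.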
Pattern matching: what to look for
The core of the hook is a set of regular expressions that match known secret formats. These are not hypothetical patterns. They are extracted from real-world key formats.
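check_patterns() {
    file="$1"
    matched=0
    # Read stdin once. Without this, the first grep would consume the whole
    # stream and every later pattern would silently see empty input.
    content=$(cat)
    # AWS Access Key ID
    if printf '%s\n' "$content" | grep -nE 'AKIA[0-9A-Z]{16}' | filter_suppressed; then
        echo " [AWS] $file: AWS Access Key ID"
        matched=1
    fi
    # Stripe secret key
    if printf '%s\n' "$content" | grep -nE 'sk_(live|test)_[0-9a-zA-Z]{24,}' | filter_suppressed; then
        echo " [STRIPE] $file: Stripe secret key"
        matched=1
    fi
    # Stripe restricted key
    if printf '%s\n' "$content" | grep -nE 'rk_(live|test)_[0-9a-zA-Z]{24,}' | filter_suppressed; then
        echo " [STRIPE] $file: Stripe restricted key"
        matched=1
    fi
    # GitHub personal access token
    if printf '%s\n' "$content" | grep -nE 'ghp_[0-9a-zA-Z]{36}' | filter_suppressed; then
        echo " [GITHUB] $file: GitHub PAT"
        matched=1
    fi
    # Generic high-entropy strings (API keys, tokens)
    if printf '%s\n' "$content" | grep -nE "['\"][0-9a-zA-Z]{32,}['\"]" | filter_suppressed; then
        echo " [ENTROPY] $file: high-entropy string (>=32 chars)"
        matched=1
    fi
    # Exit 0 (success) when something matched, so that
    # `if ... | check_patterns "$file"; then FOUND=1` fires on a hit.
    [ "$matched" -eq 1 ]
}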
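The high-entropy check at the end is the catch-all. Despite the name, it does not measure entropy: it flags any quoted string of 32 or more alphanumeric characters. This catches tokens, API keys, and secrets that do not match a known vendor pattern, but it misses unquoted values and anything containing punctuation such as -, _, + or /. It will also flag some legitimate values like hex-encoded hashes and UUIDs written without hyphens, which is where the suppression pragma comes in.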
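It helps to try the catch-all regex on a few sample lines before trusting it. The helpers below (is_flagged and try) are hypothetical names used only for this illustration; the regex itself is the one from check_patterns.

```shell
#!/bin/sh
# The catch-all regex from check_patterns, wrapped so single lines can be tried.
is_flagged() {
    printf '%s\n' "$1" | grep -qE "['\"][0-9a-zA-Z]{32,}['\"]"
}

try() {
    if is_flagged "$1"; then echo "FLAGGED  $1"; else echo "ok       $1"; fi
}

# SHA-256 of the empty string: 64 hex characters, quoted
try "h = 'e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855'"
# UUID without hyphens: exactly 32 characters
try "id = '123e4567e89b12d3a456426614174000'"
# Standard hyphenated UUID: the hyphens break the alphanumeric run
try "id = '123e4567-e89b-12d3-a456-426614174000'"
# Long but unquoted value: the regex requires surrounding quotes
try "token=abcdefghijklmnopqrstuvwxyz0123456789"
```

The first two lines are flagged and the last two pass. The unquoted case is the one to keep in mind: the catch-all is a net for quoted literals in source code, not for key=value config files.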
The pii-ok pragma: handling false positives
Every secret scanner produces false positives. A SHA-256 hash in a test fixture. A base64-encoded public key. A long CSS class name generated by a build tool. If there is no escape hatch, developers will disable the hook entirely, which defeats the purpose.
The solution is a suppression comment: pii-ok. If a line contains this marker, the scanner skips it.
filter_suppressed() {
    # Drop lines containing the suppression marker, then succeed
    # only if at least one flagged line remains
    grep -v "pii-ok" | grep -q .
}
In practice it looks like this:
// This SHA-256 is a test fixture, not a secret
const EXPECTED_HASH = 'a1b2c3d4e5f6...'; // pii-ok
// This WILL be caught (no pragma)
const STRIPE_KEY = 'sk_live_abc123...';
The rule is simple: if you know a value is not a secret, add pii-ok on the same line. If you are not sure, leave it off and let the hook flag it. The inconvenience of a false positive is nothing compared to the cost of a leaked key.
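Here is the pragma exercised end to end on single lines. This is a sketch under stated assumptions: scan_line is a hypothetical helper, filter_suppressed is repeated so the snippet runs on its own, and the key is a placeholder assembled at runtime so that no key-shaped literal sits in the file.

```shell
#!/bin/sh
# Same idea as the hook's filter: drop suppressed lines, succeed if any remain
filter_suppressed() {
    grep -v 'pii-ok' | grep -q .
}

# Report whether one line of source would be flagged by the Stripe pattern
scan_line() {
    if printf '%s\n' "$1" | grep -nE 'sk_(live|test)_[0-9a-zA-Z]{24,}' | filter_suppressed; then
        echo "flagged"
    else
        echo "allowed"
    fi
}

# Placeholder key: the prefix plus 24 zeros
fake="sk_test_$(printf '%024d' 0)"

scan_line "const STRIPE_KEY = '$fake';"
scan_line "const STRIPE_KEY = '$fake'; // pii-ok"
```

The first call reports flagged and the second reports allowed. Note that the marker suppresses every pattern on that line, so it is worth reviewing new pii-ok comments in code review.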
Going further: .htaccess and env files
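The pattern-matching approach extends to other dangerous file types. .htaccess files with SetEnv directives often contain database passwords. .env files are secrets by definition. Your hook should flag both. These checks belong inside the for loop of the skeleton, after CONTENT has been read, because they rely on $file, $CONTENT, FOUND, and continue.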
# Block .env files entirely
if echo "$file" | grep -qE '\.env$'; then
    echo " [ENV] $file: .env files must be .gitignored"
    FOUND=1
    continue
fi
# Flag SetEnv with real values in .htaccess
if echo "$file" | grep -qE '\.htaccess$'; then
    # POSIX character classes are used here; \s and \S are GNU extensions
    if echo "$CONTENT" | grep -nE 'SetEnv[[:space:]]+[^[:space:]]+[[:space:]]+[^[:space:]]+' | filter_suppressed; then
        echo " [HTACCESS] $file: SetEnv with real values"
        FOUND=1
    fi
fi
The convention: commit .env.example with <REPLACE_ME> placeholders. The real .env stays in .gitignore. Same for .htaccess files that contain credentials -- commit a sanitized version, keep the real one out of version control.
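One gap worth knowing about: the \.env$ test only matches paths that end in .env, so variants such as .env.local pass through. A possible stricter filename check is sketched below. The function name and the list of allowed template names are assumptions for illustration, not part of the hook above.

```shell
#!/bin/sh
# Returns 0 when a path looks like a real env file that should stay out of git.
# Template files (.env.example, .env.sample) are explicitly allowed.
is_blocked_env_file() {
    name=${1##*/}    # strip any leading directories
    case "$name" in
        .env.example|.env.sample) return 1 ;;
        .env|*.env|.env.*) return 0 ;;
        *) return 1 ;;
    esac
}

for path in .env config/.env .env.local prod.env .env.example src/environment.ts; do
    if is_blocked_env_file "$path"; then
        echo "block  $path"
    else
        echo "allow  $path"
    fi
done
```

The loop blocks the first four paths and allows .env.example and src/environment.ts. Using case instead of grep also avoids spawning a process per file.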
The AI layer: catching what regex cannot
Pattern matching catches known secret formats. But what about a database connection string with an embedded password? Or a hardcoded JWT that does not match any vendor prefix? Or code that is technically functional but has a SQL injection vulnerability?
This is where an LLM-powered code review gate comes in. The idea: after the regex-based pre-commit hook passes, run a second pass that sends the diff to an LLM and asks it to identify security concerns. The model can catch patterns that regex never will -- SQL injection, logic errors that expose data, hardcoded credentials in unusual formats, and more.
The review gate reads your staged diff, sends it to an LLM with a security-focused system prompt, and blocks the commit if the model identifies high-severity issues. It complements the fast, deterministic regex hook with the contextual understanding of a language model.
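The shape of such a gate can be sketched in a few lines of shell. Everything model-specific is deliberately left out: REVIEWER stands for whatever command wraps your LLM call, and the convention that it reads a diff on stdin and prints findings prefixed with a severity such as HIGH: is an assumption made for this sketch, not a real tool's interface.

```shell
#!/bin/sh
# review_gate REVIEWER [ARGS...]
# Reads a diff on stdin, hands it to REVIEWER, and returns non-zero when the
# reviewer reports a high-severity finding.
# Assumed contract (hypothetical): REVIEWER reads the diff on stdin and
# prints one finding per line, prefixed with "HIGH:", "MEDIUM:" or "LOW:".
review_gate() {
    # Fail open: if the reviewer itself errors (network down, quota hit),
    # allow the commit rather than blocking all work
    findings=$("$@") || return 0
    if printf '%s\n' "$findings" | grep -q '^HIGH:'; then
        printf '%s\n' "$findings" >&2
        return 1
    fi
    return 0
}
```

In a hook this might be called as git diff --cached | review_gate ./scripts/llm-review || exit 1, where scripts/llm-review is a hypothetical wrapper script. Failing open is a design choice: it keeps the gate from blocking commits during an outage, at the cost of skipping review when the reviewer is unavailable. In CI, failing closed may be the better default.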
Making it a team standard
A pre-commit hook that lives in .git/hooks/ only works on one machine. To make it a team-wide standard:
- Check the hook into the repo under scripts/pre-commit
- Add a setup script that symlinks it: ln -sf ../../scripts/pre-commit .git/hooks/pre-commit
- Document the pii-ok pragma so developers know how to suppress false positives without disabling the hook
- Run the same patterns in CI as a backup, because developers can skip hooks with --no-verify
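For the CI backup, one option is to scan only the lines a branch adds. The following is a minimal sketch: scan_added_lines is a hypothetical name, and the pattern list is a condensed copy of the vendor patterns from this post.

```shell
#!/bin/sh
# Reads a unified diff on stdin and prints added lines that match a known
# secret pattern. Exits 0 when at least one such line is found.
scan_added_lines() {
    # Keep added lines, drop the "+++ b/file" headers, honor the pragma
    # (an added line whose own text begins with "++ " is also skipped,
    # a known limitation of this simple filter)
    grep -E '^\+' | grep -vE '^\+\+\+ ' | grep -v 'pii-ok' |
        grep -E 'AKIA[0-9A-Z]{16}|(sk|rk)_(live|test)_[0-9a-zA-Z]{24,}|ghp_[0-9a-zA-Z]{36}'
}
```

A CI step could then run something like: if git diff origin/main...HEAD | scan_added_lines; then echo "secrets found"; exit 1; fi. The base branch name origin/main is an assumption; use whatever your pipeline exposes as the merge target.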
The hook must be fast. If it takes more than a second or two, developers will bypass it. Pure POSIX shell with grep keeps it under 200ms even on large commits. The AI review gate is optional and can be configured to run only on push or in CI if latency is a concern.
Claude Code Kit
The complete pre-commit safety system, ready to drop in
The patterns in this post are a starting point. Claude Code Kit is the production-hardened version: a complete PII scanner, AI-powered code review gate, pre-commit hooks covering 15+ secret formats, and CLAUDE.md templates for teams using AI coding assistants. 53 tests. Zero dependencies. Pure POSIX shell.
- PII scanner with 15+ patterns
- AI code review gate (LLM-powered)
- pii-ok pragma suppression
- CLAUDE.md team templates
- .env / .htaccess protection
- 53 passing tests
- Zero dependencies
- POSIX shell -- works everywhere
What to do next
If you do nothing else today, add the basic pattern-matching hook from this post to your repositories. It takes five minutes, it costs nothing, and it will save you from at least one costly key rotation.
For a production-ready implementation with broader pattern coverage, the AI review gate, team templates, and a full test suite, Claude Code Kit has it all packaged up and documented. It is $29, it is a one-time purchase, and every line of source code is included. No binaries, no obfuscation, no vendor lock-in.
Your git history should contain your work, not your secrets.