The Pre-Commit Hook That Catches API Keys Before They Hit Git

The problem: secrets in git are forever

You know the drill. A developer hardcodes a Stripe secret key to test a webhook handler locally. They commit. They push. Maybe they catch it themselves and run git rm. Problem solved, right?

Wrong. The key is still in your git history. Anyone who clones the repo can run git log -p and find it. Bots scrape GitHub for exactly this pattern. GitGuardian reported over 10 million secrets detected in public commits in 2023 alone, and the number keeps climbing.

Scrubbing secrets from git history means rewriting it with git filter-repo (which Git's own documentation now recommends over git filter-branch) or BFG Repo-Cleaner, force-pushing to every remote, and hoping nobody already pulled the old history. Even then, if the key reached a public repo for even a few minutes, treat it as compromised and rotate it. For AWS, that means updating every service, Lambda, and CI pipeline that uses it. For Stripe, that means regenerating keys and redeploying payment infrastructure.

The real cost is not the cleanup. It is the blast radius. A leaked AWS key can rack up tens of thousands in compute charges before you notice. A leaked Stripe key gives an attacker access to your customer payment data. Prevention is not optional.

The fix: a POSIX pre-commit hook

A git pre-commit hook runs automatically before every commit. If it exits with a non-zero status, the commit is blocked. The strategy: scan every staged file for patterns that look like secrets, and refuse to commit if anything matches.

Here is the skeleton. This goes in .git/hooks/pre-commit and must be executable, because git silently ignores hooks that are not (or symlink it from a checked-in scripts/ directory so every developer on the team gets it).

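#!/bin/sh
# Pre-commit hook: block secrets from reaching git history.
# check_patterns and filter_suppressed (shown in the sections below) must be
# defined above this point in the final script: sh needs a function to exist
# before it is called.
set -e

# Added, copied, or modified files in the index
STAGED_FILES=$(git diff --cached --name-only --diff-filter=ACM)

if [ -z "$STAGED_FILES" ]; then
  exit 0
fi

FOUND=0

# Read one path per line so file names containing spaces survive intact
while IFS= read -r file; do
  # Skip binary files: git prints "-" instead of line counts for them
  if git diff --cached --numstat -- "$file" | grep -q '^-'; then
    continue
  fi

  # Get only the staged content (not working tree)
  CONTENT=$(git show ":$file" 2>/dev/null) || continue

  # Check for known secret patterns
  if printf '%s\n' "$CONTENT" | check_patterns "$file"; then
    FOUND=1
  fi
done <<EOF
$STAGED_FILES
EOF

if [ "$FOUND" -eq 1 ]; then
  echo "COMMIT BLOCKED: potential secrets detected."
  echo "If a flagged line is a false positive, add a pii-ok comment on that line."
  exit 1
fi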

Key detail: we use git show ":$file" to read the staged content, not the working tree. This prevents false negatives where a developer stages a file with a secret, then removes it from the working copy but does not re-stage.

Pattern matching: what to look for

The core of the hook is a set of regular expressions that match known secret formats. These are not hypothetical patterns. They are extracted from real-world key formats.

check_patterns() {
  file="$1"
  # Read stdin once: every grep below needs the full content, and a grep
  # reading the pipe directly would leave nothing for the checks after it.
  input=$(cat)
  # Exit status 0 means "secret found", so the caller's
  # `if ... | check_patterns "$file"` branch runs on a hit.
  rc=1

  # AWS Access Key ID
  if printf '%s\n' "$input" | grep -nE 'AKIA[0-9A-Z]{16}' | filter_suppressed; then
    echo "  [AWS] $file: AWS Access Key ID"
    rc=0
  fi

  # Stripe secret key
  if printf '%s\n' "$input" | grep -nE 'sk_(live|test)_[0-9a-zA-Z]{24,}' | filter_suppressed; then
    echo "  [STRIPE] $file: Stripe secret key"
    rc=0
  fi

  # Stripe restricted key
  if printf '%s\n' "$input" | grep -nE 'rk_(live|test)_[0-9a-zA-Z]{24,}' | filter_suppressed; then
    echo "  [STRIPE] $file: Stripe restricted key"
    rc=0
  fi

  # GitHub personal access token
  if printf '%s\n' "$input" | grep -nE 'ghp_[0-9a-zA-Z]{36}' | filter_suppressed; then
    echo "  [GITHUB] $file: GitHub PAT"
    rc=0
  fi

  # Generic long quoted strings (API keys, tokens)
  if printf '%s\n' "$input" | grep -nE "['\"][0-9a-zA-Z]{32,}['\"]" | filter_suppressed; then
    echo "  [ENTROPY] $file: high-entropy string (>=32 chars)"
    rc=0
  fi

  return "$rc"
}

The check at the end is the catch-all. Despite its label, it does not measure entropy: it flags any quoted run of 32 or more alphanumeric characters. This catches tokens, API keys, and secrets that do not match a known vendor pattern. It will also flag some legitimate values such as hex-encoded hashes and UUIDs written without hyphens, which is where the suppression pragma comes in.
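If the length-based catch-all proves too noisy, one possible refinement (not part of the hook as written) is to estimate the Shannon entropy of each candidate string and flag only those above a threshold. The sketch below is an illustration under assumptions: the names shannon_entropy and is_high_entropy and the default cutoff of 4.0 bits per character are hypothetical choices, not values from this post.

```shell
#!/bin/sh
# Hypothetical helper: print the Shannon entropy of $1 in bits per character.
shannon_entropy() {
  printf '%s\n' "$1" | awk '{
    n = length($0)
    if (n == 0) { print "0.00"; exit }
    # Count how often each character occurs
    for (i = 1; i <= n; i++) count[substr($0, i, 1)]++
    # H = -sum(p * log2(p)); awk log() is the natural logarithm
    h = 0
    for (c in count) {
      p = count[c] / n
      h -= p * log(p) / log(2)
    }
    printf "%.2f\n", h
  }'
}

# Hypothetical policy: succeed when the entropy of $1 is at or above the
# threshold in $2 (assumed default: 4.0 bits per character).
is_high_entropy() {
  awk -v h="$(shannon_entropy "$1")" -v t="${2:-4.0}" 'BEGIN { exit !(h >= t) }'
}
```

A hex string uses only 16 symbols, so it can never score above 4.0 bits per character, while random mixed-case alphanumeric tokens tend to score higher. Any real threshold would need tuning against your own codebase.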

The pii-ok pragma: handling false positives

Every secret scanner produces false positives. A SHA-256 hash in a test fixture. A base64-encoded public key. A long CSS class name generated by a build tool. If there is no escape hatch, developers will disable the hook entirely, which defeats the purpose.

The solution is a suppression comment: pii-ok. If a line contains this marker, the scanner skips it.

filter_suppressed() {
  # Drop lines carrying the suppression marker, then succeed (exit 0)
  # only if at least one unsuppressed match is left.
  grep -v "pii-ok" | grep -q .
}

In practice it looks like this:

// This SHA-256 is a test fixture, not a secret
const EXPECTED_HASH = 'a1b2c3d4e5f6...'; // pii-ok

// This WILL be caught (no pragma)
const STRIPE_KEY = 'sk_live_abc123...';

The rule is simple: if you know a value is not a secret, add pii-ok on the same line. If you are not sure, leave it off and let the hook flag it. The inconvenience of a false positive is nothing compared to the cost of a leaked key.
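To see the same-line rule in isolation, the sketch below feeds two lines through the grep pipeline the hook relies on and prints the line numbers that would still be flagged. The helper name unsuppressed_lines is made up for this illustration, and both sample keys are dummy values built only to match the Stripe pattern.

```shell
#!/bin/sh
# Hypothetical helper: print the numbers of the lines on stdin that match
# the extended regex in $1 and do NOT carry the pii-ok marker.
unsuppressed_lines() {
  grep -nE "$1" | grep -v 'pii-ok' | cut -d: -f1
}

STRIPE_RE='sk_(live|test)_[0-9a-zA-Z]{24,}'

# Line 1 carries the pragma and is skipped; line 2 does not.
sample="const FIXTURE = 'sk_test_000000000000000000000000'; // pii-ok
const STRIPE_KEY = 'sk_test_111111111111111111111111';"

printf '%s\n' "$sample" | unsuppressed_lines "$STRIPE_RE"   # prints: 2
```

Note that the marker suppresses every pattern on its line, not just the one you had in mind, so keep suppressed lines short.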

Going further: .htaccess and env files

The pattern-matching approach extends to other dangerous file types. .htaccess files with SetEnv directives often contain database passwords. .env files are secrets by definition. Your hook should flag both.

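# Both checks belong inside the per-file loop. The .env check can run first;
# the .htaccess check needs $CONTENT, so it goes after the git show line.

# Block .env files entirely (.env.example does not end in .env, so it passes)
if printf '%s\n' "$file" | grep -qE '\.env$'; then
  echo "  [ENV] $file: .env files must be .gitignored"
  FOUND=1
  continue
fi

# Flag SetEnv with real values in .htaccess
# (POSIX character classes, because \s and \S are not portable in grep -E)
if printf '%s\n' "$file" | grep -qE '\.htaccess$'; then
  if printf '%s\n' "$CONTENT" | grep -nE 'SetEnv[[:space:]]+[^[:space:]]+[[:space:]]+[^[:space:]]+' | filter_suppressed; then
    echo "  [HTACCESS] $file: SetEnv with real values"
    FOUND=1
  fi
fi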

The convention: commit .env.example with <REPLACE_ME> placeholders. The real .env stays in .gitignore. Same for .htaccess files that contain credentials -- commit a sanitized version, keep the real one out of version control.
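One way to produce that sanitized file is to strip the values mechanically instead of by hand. This is a minimal sketch, assuming simple KEY=value lines with no export prefix and no multi-line values; the function name sanitize_env is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read .env-style lines on stdin and replace every
# value with the <REPLACE_ME> placeholder. Comments and blank lines pass
# through unchanged.
sanitize_env() {
  sed 's/^\([A-Za-z_][A-Za-z0-9_]*\)=.*/\1=<REPLACE_ME>/'
}
```

Typical usage would be sanitize_env < .env > .env.example, followed by a quick read of the result before committing it.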

The AI layer: catching what regex cannot

Pattern matching catches known secret formats. But what about a database connection string with an embedded password? Or a hardcoded JWT that does not match any vendor prefix? Or code that is technically functional but has a SQL injection vulnerability?

This is where an LLM-powered code review gate comes in. The idea: after the regex-based pre-commit hook passes, run a second pass that sends the diff to an LLM and asks it to identify security concerns. The model can catch patterns that regex never will -- SQL injection, logic errors that expose data, hardcoded credentials in unusual formats, and more.

The review gate reads your staged diff, sends it to an LLM with a security-focused system prompt, and blocks the commit if the model identifies high-severity issues. It complements the fast, deterministic regex hook with the contextual understanding of a language model.
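This post does not show the gate's code, so what follows is only a sketch of the control flow under stated assumptions. The reviewer is any command that reads a diff on stdin and prints one finding per line, and findings rated high severity start with HIGH:. The review_gate name and that output format are both invented for illustration; no particular LLM API or CLI is implied.

```shell
#!/bin/sh
# Hypothetical gate: run the reviewer command given as arguments, feed it
# the diff from stdin, and block (return 1) if it reports a HIGH: finding.
review_gate() {
  # "$@" inherits this function's stdin, so the diff flows straight through.
  if ! findings=$("$@"); then
    # Assumed policy: fail open, so a reviewer outage never blocks commits.
    echo "review gate: reviewer failed, skipping" >&2
    return 0
  fi
  if printf '%s\n' "$findings" | grep -q '^HIGH:'; then
    # Show only the blocking findings to the developer.
    printf '%s\n' "$findings" | grep '^HIGH:' >&2
    return 1
  fi
  return 0
}
```

In a hook this might be wired up as git diff --cached | review_gate my_reviewer || exit 1, where my_reviewer stands in for whatever wrapper you write around your model of choice. Failing open is a design choice; a stricter team could return 1 when the reviewer itself fails.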

Making it a team standard

A pre-commit hook that lives in .git/hooks/ only works on one machine. To make it a team-wide standard:

Check the hook into the repo under scripts/pre-commit
Add a setup script that symlinks it: ln -sf ../../scripts/pre-commit .git/hooks/pre-commit
Document the pii-ok pragma so developers know how to suppress false positives without disabling the hook
Run the same patterns in CI as a backup, because developers can skip hooks with --no-verify
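The symlink step above can be wrapped in a small setup script. This is a sketch, assuming a plain clone where .git is a directory (not a worktree or submodule, where .git is a file); the function name install_hook is hypothetical.

```shell
#!/bin/sh
# Hypothetical setup helper: link the checked-in hook into a clone's
# .git/hooks directory. $1 is the repository root (default: current directory).
install_hook() {
  repo="${1:-.}"
  if [ ! -f "$repo/scripts/pre-commit" ]; then
    echo "install_hook: $repo/scripts/pre-commit not found" >&2
    return 1
  fi
  mkdir -p "$repo/.git/hooks"
  chmod +x "$repo/scripts/pre-commit"   # git ignores hooks that are not executable
  # The link target is resolved relative to .git/hooks/, hence the ../../ prefix.
  ln -sf ../../scripts/pre-commit "$repo/.git/hooks/pre-commit"
}
```

An alternative that avoids symlinks is pointing git at the checked-in directory with git config core.hooksPath scripts, which each developer still has to run once per clone.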

The hook must be fast. If it takes more than a second or two, developers will bypass it. Pure POSIX shell with grep keeps it under 200ms even on large commits. The AI review gate is optional and can be configured to run only on push or in CI if latency is a concern.
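The CI backup from the list above can reuse the same vendor patterns against every tracked file rather than only the staged ones. The sketch below assumes file paths arrive one per line on stdin; the function name scan_paths is hypothetical, and unlike check_patterns it follows the usual CI convention of exiting non-zero on a finding.

```shell
#!/bin/sh
# Hypothetical CI check: read file paths on stdin (one per line), scan each
# with the vendor patterns from this post, and fail if any unsuppressed
# match remains.
scan_paths() {
  hits=0
  while IFS= read -r path; do
    [ -f "$path" ] || continue
    if grep -nE \
        -e 'AKIA[0-9A-Z]{16}' \
        -e 'sk_(live|test)_[0-9a-zA-Z]{24,}' \
        -e 'rk_(live|test)_[0-9a-zA-Z]{24,}' \
        -e 'ghp_[0-9a-zA-Z]{36}' \
        -- "$path" | grep -v 'pii-ok' | grep -q .; then
      echo "  [CI] $path: potential secret"
      hits=1
    fi
  done
  # Exit 0 when clean, 1 when anything was flagged
  [ "$hits" -eq 0 ]
}
```

In a pipeline step this could run as git ls-files | scan_paths || exit 1. Because it scans the checked-out tree, it catches commits made with --no-verify but not secrets that were added and removed between two pushes.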

Claude Code Kit

The complete pre-commit safety system, ready to drop in

The patterns in this post are a starting point. Claude Code Kit is the production-hardened version: a complete PII scanner, AI-powered code review gate, pre-commit hooks covering 15+ secret formats, and CLAUDE.md templates for teams using AI coding assistants. 53 tests. Zero dependencies. Pure POSIX shell.

PII scanner with 15+ patterns
AI code review gate (LLM-powered)
pii-ok pragma suppression
CLAUDE.md team templates
.env / .htaccess protection
53 passing tests
Zero dependencies
POSIX shell -- works everywhere

Get Claude Code Kit -- $29. Full Stack Bundle -- $149. One-time purchase. No subscription.

What to do next

If you do nothing else today, add the basic pattern-matching hook from this post to your repositories. It takes five minutes, it costs nothing, and it will save you from at least one costly key rotation.

For a production-ready implementation with broader pattern coverage, the AI review gate, team templates, and a full test suite, Claude Code Kit has it all packaged up and documented. It is $29, it is a one-time purchase, and every line of source code is included. No binaries, no obfuscation, no vendor lock-in.

Your git history should contain your work, not your secrets.