GitHub scanner misses secrets in files created by copying content from previously scanned files #4672

@PascalThuet

TruffleHog Version

trufflehog 3.92.4

Trace Output

Not applicable - the behavior is deterministic and reproducible without trace logs. Certain files are simply not reported by the github scanner, while the filesystem scanner finds them correctly.

Expected Behavior

When scanning a GitHub repository, TruffleHog should detect secrets in all files that contain them, including files that were created by copying/renaming content from other files in subsequent commits.

Actual Behavior

The trufflehog github scanner only reports the secret in the original file from the first commit where it was introduced. When the same secret is copied into new files (e.g., when splitting a file into multiple files), those new files are not scanned/reported.

The trufflehog filesystem scanner on the same cloned repository does detect the secrets in the new files correctly.

| Scanner | Original file (deleted) | New file _DE | New file _FR |
| --- | --- | --- | --- |
| trufflehog github | ✅ Found (in history) | ❌ Not found | ❌ Not found |
| trufflehog filesystem | N/A (deleted) | ✅ Found | ✅ Found |

Steps to Reproduce

  1. Have a repository with a file containing a verified secret (e.g., fileA.ipynb with a Databricks token)
  2. In a later commit, create new files by copying content from the original file (e.g., fileA_DE.ipynb, fileA_FR.ipynb) - these new files contain the same secret
  3. Optionally delete the original file
  4. Run trufflehog github --repo=<repo_url> --token=<token> --json
  5. Observe: only the original file is reported, not the new files
  6. Clone the repo and run trufflehog filesystem <path> --json
  7. Observe: the new files (_DE, _FR) are correctly reported

Environment

  • OS: macOS Darwin 25.2.0
  • Version: trufflehog 3.92.4

Additional Context

This is a security concern because:

  1. The original file may be deleted (no longer on active branches)
  2. The new files with the secret are on active branches but not reported
  3. Users may think the secret exposure is "history only" when it's actually still present in the codebase

The github scanner seems to deduplicate based on the secret value, skipping files if the same secret was already found in a previous commit - even if the new files are currently on the active branch and the original file no longer exists.
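
To illustrate the hypothesis only (this is not TruffleHog's actual code, and I have not confirmed how it deduplicates), here is a toy model of what deduplicating on the secret value alone would do to the findings in this report, compared with a key that also includes the file path. The `report` function and the finding dictionaries are hypothetical.

```python
def report(findings, key):
    """Return findings in order, keeping only the first one per dedup key."""
    seen = set()
    reported = []
    for finding in findings:  # assumed to be in commit order, oldest first
        k = key(finding)
        if k in seen:
            continue  # treated as a duplicate and silently dropped
        seen.add(k)
        reported.append(finding)
    return reported


# Placeholder data mirroring the reproduction steps: one secret, three files.
findings = [
    {"commit": 1, "file": "fileA.ipynb", "secret": "dapi-EXAMPLE"},
    {"commit": 2, "file": "fileA_DE.ipynb", "secret": "dapi-EXAMPLE"},
    {"commit": 2, "file": "fileA_FR.ipynb", "secret": "dapi-EXAMPLE"},
]

# Keyed on the secret value only: the copies are dropped.
by_value = report(findings, key=lambda f: f["secret"])

# Keyed on secret value plus file path: every file is reported.
by_value_and_file = report(findings, key=lambda f: (f["secret"], f["file"]))

print([f["file"] for f in by_value])
print([f["file"] for f in by_value_and_file])
```

Under the value-only key, only fileA.ipynb survives, which matches what I see from the github scanner. Under the value-plus-file key, all three files are reported, which matches the filesystem scanner.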
