-
Notifications
You must be signed in to change notification settings - Fork 2.2k
[INS-285] Fix custom detectors line number reporting to match the full regex instead of capture group #4697
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
| } | ||
|
|
||
| // Ensure primary regex name is set. | ||
| ensurePrimaryRegexNameSet(pb) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we now always set the primary regex ourselves, is it still necessary to expose this as a user-configurable option and document it?
We should properly document this behavior in code, clearly explaining how and why the primary regex is always used for full matches.
We should explain this in the README as well: going forward, the custom detector line number will point to the full match of the primary regex or, if none is explicitly set, the first regex. This is important because a match may span multiple lines.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The user may want to set a regex other than the first one as the primary regex, so I think this option should still be exposed.
Totally agree with the documentation part! I'll do that
| // if the match is of the primary regex, set it's full match as primary secret value in result | ||
| if c.PrimaryRegexName == key { | ||
| result.SetPrimarySecretValue(secret) | ||
| result.SetPrimarySecretValue(fullMatch) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add a detailed comment here why are we using fullMatch instead of secret here.
| assert.Equal(t, "secret_YI7C90ACY1_yy", results[0].GetPrimarySecretValue()) | ||
| } | ||
|
|
||
| func TestDetectorPrimarySecretFullMatch(t *testing.T) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Write a test case for a match which span multiple lines.
| raw += secret | ||
|
|
||
| // if the match is of the primary regex, set it's value as primary secret value in result | ||
| // if the match is of the primary regex, set it's full match as primary secret value in result |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We set the full regex match as the primary secret value.
Reasoning:
The engine calculates the line number using the match. When a primary secret is set, it uses that value instead of the raw secret.
While the secret match itself is sufficient to calculate the line number, the same group match could appear elsewhere in the data.
To avoid ambiguity, we store the full regex match as the primary secret value.
This primary secret value is used only for identifying the exact line number and is not used anywhere else.
Example:
Full regex match: secret = ABC123
Secret (raw): ABC123
In this case, the primary secret value stores the full string `secret = ABC123`,
allowing the engine to pinpoint the exact location and avoid matching redundant occurrences of `ABC123` in the data.
| assert.Equal(t, "secret_YI7C90ACY1_yy", results[0].GetPrimarySecretValue()) | ||
| } | ||
|
|
||
| func TestDetectorPrimarySecretFullMatch(t *testing.T) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to write a test case in the engine too.
Location: https://github.com/trufflesecurity/trufflehog/blob/main/pkg/engine/engine_test.go#L166
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The test case should include the full regex match appearing across multiple lines, while the actual secret exists on only one line. The primary secret value will span multiple lines and should resolve to the line number where the match starts.
I want to confirm how the primary secret value is processed when it contains multi-line values - specifically, how the engine determines and reports the starting line.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added
…the-line-that-matches-the-full-regex
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Bugbot Autofix is OFF. To automatically fix reported issues with Cloud Agents, enable Autofix in the Cursor dashboard.
Description:
Problem
Currently for custom detectors, line number reporting works by looking for the
Rawvalue from the detector result in the chunk. This works for most cases, except this specific one: When the provided regex contains a capture group, and the raw value is also present before the actual regex match.This causes an incorrect line number to be reported.
An example for better explanation:
Regexp:
secret *= *"([^"\r\n]+)"Chunk:
In this example, the regex matches on Line 4, but the
Rawvalue is also present on Line 1, which causes Line 1 to be reported.Solution
We have a
primarySecretfield in the detector's result. It's primary purpose is to indicate the value to determine correct line numbers when multiple regexes are provided. However, we can also use it to solve this problem. by setting to the full match value instead of theRawvalue. This would result in correct line numbers to be reported.Implementing this also requires
primary_regex_nameto be set in the detector config. I've added logic to ensure it is always set.Checklist:
make test-community)?make lintthis requires golangci-lint)?