2 changes: 1 addition & 1 deletion pkg/custom_detectors/CUSTOM_DETECTORS.md
@@ -38,7 +38,7 @@ This guide will walk you through setting up a custom detector in TruffleHog to i
- **`verify`**: An optional section to validate detected secrets. If you want to verify or unverify detected secrets, this section needs to be configured. If not configured, all detected secrets will be marked as unverified. Read [verification server examples](#verification-server-examples)

**Other allowed parameters:**
-- **`primary_regex_name`**: This parameter allows you to designate the primary regex pattern when multiple regex patterns are defined in the regex section. If a match is found, the match for the designated primary regex will be used to determine the line number. The value must be one of the names specified in the regex section.
+- **`primary_regex_name`**: This parameter allows you to designate the primary regex pattern when multiple regex patterns are defined in the regex section. If a match is found, the match for the designated primary regex will be used to determine the line number. The value must be one of the names specified in the regex section. If not provided, the first regex defined in the regex section will be used as the primary regex by default.
- **`exclude_regexes_capture`**: This parameter allows you to define regex patterns to exclude specific parts of a detected secret. If a match is found within the detected secret, the portion matching this regex is excluded from the result.
- **`exclude_regexes_match`**: This parameter enables you to define regex patterns to exclude entire matches from being reported as secrets. This applies to the entire matched string, not just the token.
- **`entropy`**: This parameter is used to assess the randomness of detected strings. High entropy often indicates that a string is a potential secret, such as an API key or password, due to its complexity and unpredictability. It helps in filtering false-positives. While an entropy threshold of `3` can be a starting point, it's essential to adjust this value based on your project's specific requirements and the nature of the data you have.
31 changes: 29 additions & 2 deletions pkg/custom_detectors/custom_detectors.go
@@ -66,6 +66,9 @@ func NewWebhookCustomRegex(pb *custom_detectorspb.CustomRegex) (*CustomRegexWebh
}
}

// Ensure primary regex name is set.
ensurePrimaryRegexNameSet(pb)
Review comment (Contributor):

Since we now always set the primary regex ourselves, is it still necessary to expose this as a user-configurable option and document it?

We should properly document this behavior in code, clearly explaining how and why the primary regex is always used for full matches.

We should explain this in the README as well: going forward, the custom detector line number will point to the full match of the primary regex or, if none is explicitly set, the first regex. This is important because a match may span multiple lines.

Reply (Contributor Author):

The user may want to set a regex other than the first one as the primary regex, so I think this option should still be exposed.

Totally agree with the documentation part! I'll do that.


// TODO: Copy only necessary data out of pb.
return &CustomRegexWebhook{pb}, nil
}
@@ -229,14 +232,27 @@ func (c *CustomRegexWebhook) createResults(ctx context.Context, match map[string
values := match[key]
// values[0] contains the entire regex match.
secret := values[0]
fullMatch := values[0]
if len(values) > 1 {
secret = values[1]
}
raw += secret

-// if the match is of the primary regex, set its value as primary secret value in result
// We set the full regex match as the primary secret value.
// Reasoning:
// The engine calculates the line number using the match. When a primary secret is set, it uses that value instead of the raw secret.
// While the secret match itself is sufficient to calculate the line number, the same group match could appear elsewhere in the data.
// To avoid ambiguity, we store the full regex match as the primary secret value.
// This primary secret value is used only for identifying the exact line number and is not used anywhere else.

// Example:
// Full regex match: secret = ABC123
// Secret (raw): ABC123

// In this case, the primary secret value stores the full string `secret = ABC123`,
// allowing the engine to pinpoint the exact location and avoid matching redundant occurrences of `ABC123` in the data.
if c.PrimaryRegexName == key {
-result.SetPrimarySecretValue(secret)
+result.SetPrimarySecretValue(fullMatch)
Review comment (Contributor):

Add a detailed comment here explaining why we use fullMatch instead of secret.

}
}
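
The ambiguity described in the code comment above can be reproduced with a minimal sketch. The helper `lineOf` and the sample data are hypothetical and are not part of TruffleHog; the sketch only assumes that a line number is derived by locating the first occurrence of a value in the data.

```go
package main

import (
	"bytes"
	"fmt"
	"regexp"
)

// lineOf is a hypothetical helper (not TruffleHog's implementation).
// It returns the zero-based line on which needle first occurs in data,
// or -1 if needle does not occur.
func lineOf(data, needle []byte) int {
	idx := bytes.Index(data, needle)
	if idx < 0 {
		return -1
	}
	// The line index equals the number of newlines before the match start.
	return bytes.Count(data[:idx], []byte("\n"))
}

func main() {
	// ABC123 appears on line 0, but the actual assignment is on line 2.
	data := []byte("example: ABC123\n// some code\nsecret = ABC123\n")

	re := regexp.MustCompile(`secret = (\w+)`)
	m := re.FindSubmatch(data)
	fullMatch, group := m[0], m[1]

	// Searching for the capture group alone finds the earlier occurrence.
	fmt.Println(lineOf(data, group)) // 0
	// Searching for the full match finds the assignment itself.
	fmt.Println(lineOf(data, fullMatch)) // 2
}
```

Because the full match carries the surrounding context (`secret = `), it is far less likely to collide with an unrelated earlier occurrence of the same token.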

@@ -394,3 +410,14 @@ func (c *CustomRegexWebhook) Description() string {
}
return c.GetDescription()
}

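// ensurePrimaryRegexNameSet sets the PrimaryRegexName field to the first
// regex name if it is not already set.
// Note: pb.Regex is a map, and Go does not specify map iteration order, so
// with several regexes the name picked as "first" can vary between runs.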
func ensurePrimaryRegexNameSet(pb *custom_detectorspb.CustomRegex) {
if pb.PrimaryRegexName == "" {
for name := range pb.Regex {
pb.PrimaryRegexName = name
return
}
}
}
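
One caveat with taking the "first" entry of a map: the Go specification leaves map iteration order unspecified, so a `range` over `pb.Regex` does not follow declaration order. A minimal sketch of a deterministic alternative follows. The function `defaultPrimaryName` is hypothetical, and it picks the lexicographically smallest name, which is an assumption of this sketch rather than the behavior of the PR.

```go
package main

import (
	"fmt"
	"sort"
)

// defaultPrimaryName is a hypothetical helper. It returns the
// lexicographically smallest key of regexes, or "" for an empty map,
// so the chosen default does not depend on map iteration order.
func defaultPrimaryName(regexes map[string]string) string {
	names := make([]string, 0, len(regexes))
	for name := range regexes {
		names = append(names, name)
	}
	if len(names) == 0 {
		return ""
	}
	sort.Strings(names)
	return names[0]
}

func main() {
	regexes := map[string]string{
		"second": `second_regex`,
		"first":  `first_regex`,
	}
	// The result is stable across runs regardless of insertion order.
	fmt.Println(defaultPrimaryName(regexes)) // first
}
```

Sorting gives a stable default but not the declaration order from the configuration file. Honoring "the first regex defined" would need an ordered representation, such as a slice of name and pattern pairs.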
73 changes: 73 additions & 0 deletions pkg/custom_detectors/custom_detectors_test.go
@@ -232,6 +232,61 @@ func TestDetectorPrimarySecret(t *testing.T) {
assert.Equal(t, "secret_YI7C90ACY1_yy", results[0].GetPrimarySecretValue())
}

func TestDetectorPrimarySecretFullMatch(t *testing.T) {
Review comment (Contributor):

Write a test case for a match that spans multiple lines.

Review comment (Contributor):

We need to write a test case in the engine too.
Location: https://github.com/trufflesecurity/trufflehog/blob/main/pkg/engine/engine_test.go#L166

Review comment (Contributor):

The test case should include the full regex match appearing across multiple lines, while the actual secret exists on only one line. The primary secret value will span multiple lines and should resolve to the line number where the match starts.

I want to confirm how the primary secret value is processed when it contains multi-line values - specifically, how the engine determines and reports the starting line.

Reply (Contributor Author):

Added

tests := []struct {
name string
input *custom_detectorspb.CustomRegex
chunk []byte
want string
}{
{
name: "primary regex full match",
input: &custom_detectorspb.CustomRegex{
Name: "test",
Keywords: []string{"secret"},
Regex: map[string]string{"secret": `secret *= *"([^"\r\n]+)"`},
PrimaryRegexName: "secret",
},
chunk: []byte(`
// some code
secret="mysecret"
// some code
`),
want: `secret="mysecret"`,
},
// Test case for a match that spans multiple lines.
{
name: "primary regex full match multiline",
input: &custom_detectorspb.CustomRegex{
Name: "test",
Keywords: []string{"secret"},
Regex: map[string]string{"secret": `secret *= *"([^"]+)"`},
PrimaryRegexName: "secret",
},
chunk: []byte(`
// some code
secret="mysecret
thatspansmultiplelines"
// some code
`),
want: `secret="mysecret
thatspansmultiplelines"`,
},
}

for _, tt := range tests {
t.Run(tt.name, func(t *testing.T) {
detector, err := NewWebhookCustomRegex(tt.input)
assert.NoError(t, err)
results, err := detector.FromData(context.Background(), false, tt.chunk)
assert.NoError(t, err)
assert.Equal(t, 1, len(results))
assert.Equal(t, tt.want, results[0].GetPrimarySecretValue())
})
}

}
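
The two test cases above differ only in the character class of the capture group. In Go's `regexp` package a negated class such as `[^"]` also matches newline characters, which is why the second pattern can span lines while the first cannot. A minimal sketch, where `matches` is a hypothetical helper written for this illustration:

```go
package main

import (
	"fmt"
	"regexp"
)

// matches is a hypothetical helper that reports whether pattern
// matches anywhere in chunk.
func matches(pattern, chunk string) bool {
	return regexp.MustCompile(pattern).MatchString(chunk)
}

func main() {
	// A quoted value that continues on a second line.
	chunk := "secret=\"mysecret\nthatspansmultiplelines\""

	// Excluding \r and \n keeps the match on a single line.
	fmt.Println(matches(`secret *= *"([^"\r\n]+)"`, chunk)) // false
	// A plain negated class also consumes the newline.
	fmt.Println(matches(`secret *= *"([^"]+)"`, chunk)) // true
}
```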

func TestDetectorValidations(t *testing.T) {
type args struct {
CustomRegex *custom_detectorspb.CustomRegex
@@ -707,6 +762,24 @@ func TestNewWebhookCustomRegex_Validation(t *testing.T) {
}
}

func TestNewWebhookCustomRegex_EnsurePrimaryRegexNameSet(t *testing.T) {
t.Parallel()

pb := &custom_detectorspb.CustomRegex{
Name: "test",
Keywords: []string{"kw"},
Regex: map[string]string{
"first": `first_regex`,
"second": `second_regex`,
},
// PrimaryRegexName is not set.
}

detector, err := NewWebhookCustomRegex(pb)
assert.NoError(t, err)
assert.Equal(t, "first", detector.GetPrimaryRegexName(), "expected PrimaryRegexName to be set to the first regex name")
}

func BenchmarkProductIndices(b *testing.B) {
for i := 0; i < b.N; i++ {
_ = productIndices(3, 2, 6)
15 changes: 15 additions & 0 deletions pkg/engine/engine_test.go
@@ -214,6 +214,21 @@ func TestFragmentLineOffsetWithPrimarySecret(t *testing.T) {
}
}

func TestFragmentLineOffsetWithPrimarySecretMultiline(t *testing.T) {
result := &detectors.Result{
Raw: []byte("secret here"),
}
result.SetPrimarySecretValue("secret:\nsecret here")

chunk := &sources.Chunk{
Data: []byte("line1\nline2\nsecret:\nsecret here\nline5"),
}
lineOffset, isIgnored := FragmentLineOffset(chunk, result)
assert.False(t, isIgnored)
// Offset 2 is zero-based, so it refers to line 3, where the multi-line primary secret starts.
assert.Equal(t, int64(2), lineOffset)
}
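
The expectation in this test can be illustrated with a simplified sketch. The `finding` type and the `lineOffset` function below are hypothetical stand-ins, not the engine's `FragmentLineOffset`. The sketch assumes only what the PR states: the primary secret value is preferred over the raw secret when it is set, and the offset is the zero-based line where the searched value starts.

```go
package main

import (
	"bytes"
	"fmt"
)

// finding is a hypothetical stand-in for a detector result.
type finding struct {
	raw     []byte
	primary string // optional; preferred over raw when non-empty
}

// lineOffset returns the zero-based line on which the primary value
// (or the raw value, if no primary value is set) first starts.
// The boolean is false when the value is not present in data.
func lineOffset(data []byte, f finding) (int64, bool) {
	needle := f.raw
	if f.primary != "" {
		needle = []byte(f.primary)
	}
	idx := bytes.Index(data, needle)
	if idx < 0 {
		return 0, false
	}
	return int64(bytes.Count(data[:idx], []byte("\n"))), true
}

func main() {
	data := []byte("line1\nline2\nsecret:\nsecret here\nline5")

	withPrimary := finding{raw: []byte("secret here"), primary: "secret:\nsecret here"}
	rawOnly := finding{raw: []byte("secret here")}

	a, _ := lineOffset(data, withPrimary)
	b, _ := lineOffset(data, rawOnly)
	// The multi-line primary value resolves to the line where it starts.
	fmt.Println(a, b) // 2 3
}
```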

func setupFragmentLineOffsetBench(totalLines, needleLine int) (*sources.Chunk, *detectors.Result) {
data := make([]byte, 0, 4096)
needle := []byte("needle")