Skip to content
Open
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
19 changes: 17 additions & 2 deletions pkg/custom_detectors/custom_detectors.go
Original file line number Diff line number Diff line change
Expand Up @@ -66,6 +66,9 @@ func NewWebhookCustomRegex(pb *custom_detectorspb.CustomRegex) (*CustomRegexWebh
}
}

// Ensure primary regex name is set.
ensurePrimaryRegexNameSet(pb)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we now always set the primary regex ourselves, is it still necessary to expose this as a user-configurable option and document it?

We should properly document this behavior in code, clearly explaining how and why the primary regex is always used for full matches.

We should explain this in the README as well: going forward, the custom detector line number will point to the full match of the primary regex or, if none is explicitly set, the first regex. This is important because a match may span multiple lines.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The user may want to set a regex other than the first one as the primary regex, so I think this option should still be exposed.

Totally agree with the documentation part! I'll do that


// TODO: Copy only necessary data out of pb.
return &CustomRegexWebhook{pb}, nil
}
Expand Down Expand Up @@ -229,14 +232,15 @@ func (c *CustomRegexWebhook) createResults(ctx context.Context, match map[string
values := match[key]
// values[0] contains the entire regex match.
secret := values[0]
fullMatch := values[0]
if len(values) > 1 {
secret = values[1]
}
raw += secret

// if the match is of the primary regex, set it's value as primary secret value in result
// if the match is of the primary regex, set it's full match as primary secret value in result
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We set the full regex match as the primary secret value.
Reasoning:
The engine calculates the line number using the match. When a primary secret is set, it uses that value instead of the raw secret.
While the secret match itself is sufficient to calculate the line number, the same group match could appear elsewhere in the data.
To avoid ambiguity, we store the full regex match as the primary secret value.
This primary secret value is used only for identifying the exact line number and is not used anywhere else.

Example:
Full regex match: secret = ABC123
Secret (raw): ABC123

In this case, the primary secret value stores the full string `secret = ABC123`,
allowing the engine to pinpoint the exact location and avoid matching redundant occurrences of `ABC123` in the data.

if c.PrimaryRegexName == key {
result.SetPrimarySecretValue(secret)
result.SetPrimarySecretValue(fullMatch)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a detailed comment here why are we using fullMatch instead of secret here.

}
}

Expand Down Expand Up @@ -394,3 +398,14 @@ func (c *CustomRegexWebhook) Description() string {
}
return c.GetDescription()
}

// ensurePrimaryRegexNameSet sets the PrimaryRegexName field to the first
// regex name if it is not already set.
func ensurePrimaryRegexNameSet(pb *custom_detectorspb.CustomRegex) {
if pb.PrimaryRegexName == "" {
for name := range pb.Regex {
pb.PrimaryRegexName = name
return
}
}
}
37 changes: 37 additions & 0 deletions pkg/custom_detectors/custom_detectors_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -232,6 +232,25 @@ func TestDetectorPrimarySecret(t *testing.T) {
assert.Equal(t, "secret_YI7C90ACY1_yy", results[0].GetPrimarySecretValue())
}

func TestDetectorPrimarySecretFullMatch(t *testing.T) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Write a test case for a match which span multiple lines.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to write a test case in the engine too.
Location: https://github.com/trufflesecurity/trufflehog/blob/main/pkg/engine/engine_test.go#L166

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test case should include the full regex match appearing across multiple lines, while the actual secret exists on only one line. The primary secret value will span multiple lines and should resolve to the line number where the match starts.

I want to confirm how the primary secret value is processed when it contains multi-line values - specifically, how the engine determines and reports the starting line.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added

detector, err := NewWebhookCustomRegex(&custom_detectorspb.CustomRegex{
Name: "test",
Keywords: []string{"secret"},
Regex: map[string]string{"secret": `secret *= *"([^"\r\n]+)"`},
PrimaryRegexName: "secret",
})

assert.NoError(t, err)
results, err := detector.FromData(context.Background(), false, []byte(`
// some code
secret="$existing_secret"
// some code
`))
assert.NoError(t, err)
assert.Equal(t, 1, len(results))
assert.Equal(t, `secret="$existing_secret"`, results[0].GetPrimarySecretValue())
}

func TestDetectorValidations(t *testing.T) {
type args struct {
CustomRegex *custom_detectorspb.CustomRegex
Expand Down Expand Up @@ -707,6 +726,24 @@ func TestNewWebhookCustomRegex_Validation(t *testing.T) {
}
}

func TestNewWebhookCustomRegex_EnsurePrimaryRegexNameSet(t *testing.T) {
t.Parallel()

pb := &custom_detectorspb.CustomRegex{
Name: "test",
Keywords: []string{"kw"},
Regex: map[string]string{
"first": `first_regex`,
"second": `second_regex`,
},
// PrimaryRegexName is not set.
}

detector, err := NewWebhookCustomRegex(pb)
assert.NoError(t, err)
assert.Equal(t, "first", detector.GetPrimaryRegexName(), "expected PrimaryRegexName to be set to the first regex name")
}

func BenchmarkProductIndices(b *testing.B) {
for i := 0; i < b.N; i++ {
_ = productIndices(3, 2, 6)
Expand Down
Loading