Skip to content

Conversation

@h3n4l
Copy link
Member

@h3n4l h3n4l commented Jan 30, 2026

Summary

  • Split methodCall into collectionMethodCall (48 methods) and cursorMethodCall (35 methods) so C3 returns context-appropriate completions at each position
  • Remove genericMethod and genericDbMethod catch-all rules (identifier LPAREN arguments? RPAREN) that caused C3 to flood all keyword tokens as method candidates
  • Add 61 new lexer tokens (16 collection + 45 database methods) and 48 explicit dbStatement alternatives to replace the removed catch-alls
  • Add 20 new collection method parser rules (bulkWrite, update, mapReduce, findAndModify, explain, hideIndex, Atlas Search Index methods, etc.)

Test plan

  • make build && make test in mongodb/ directory passes
  • Verify autocomplete in bytebase consumer: db.| shows only db methods, db.users.| shows only collection methods, db.users.find().| shows only cursor methods

🤖 Generated with Claude Code

h3n4l added 3 commits January 30, 2026 16:22
…n this session:

## Session Summary

This session continued the implementation of the plan to remove hardcoded method lists by leveraging C3 token candidates from the restructured grammar.

### What was done:

**Task 3 — Fix `query_type.go` build failure:**
- Rewrote `classifyMethodChain()` to use the new grammar API: `chain.CollectionExplainMethod()`, `chain.AllCursorMethodCall()`, `chain.CollectionMethodCall()`
- Rewrote `classifyMethodCall()` → `classifyCollectionMethodCall()` to use `mongodb.ICollectionMethodCallContext` instead of the removed `mongodb.IMethodCallContext`
- Removed the `GenericMethod()` check (no longer exists)
- Added classification for the 20 new collection methods (bulkWrite, update, mapReduce, findAndModify, hideIndex, unhideIndex, reIndex, createSearchIndex, etc.)
- Updated the `incomplete_find_statement` test expectation (parser now recovers `find` better → `Select` instead of `DML`)

**Task 3 — Fix `convertCandidates()` token flooding:**
- At `db.|`, C3 tokens leak from the `collectionOperation` path. Fixed by only using C3 tokens when `collectionAccess` is NOT present (i.e., at collection/cursor method positions)
- At `db.|`, returns only hardcoded database methods + collections (unchanged)
- At `db.users.|` and `db.users.find().|`, returns C3 tokens directly

**Task 4 — Regenerate test expectations:**
- Used record mode to regenerate `test_completion.yaml`
- Key behavior change: `db.users.|` now shows 89 methods (both collection + cursor) instead of 34 collection-only. This is correct per the grammar, which allows either type as the first method.
- `db.users.find().|` correctly shows only 35 cursor methods (grammar enforces `cursorMethodCall` after first method)

**Task 5 — Cleanup verification:**
- `isMethodToken()` — still needed for token → function type classification
- `getDatabaseMethodCandidates()` — still needed because database methods from `dbStatement` aren't provided by C3 at `db.|` position
- `previousNonDotToken()` — still needed for `db.` vs `db[` distinction
- All 15 database methods verified as matching `dbStatement` grammar rule

**All tests pass, lint clean.**
**1. Grammar fix (`MongoShellParser.g4`)**: Removed `cursorMethodCall` from the first position in `methodChain`. Now only `collectionMethodCall` is valid as the first method after a collection:

```
methodChain
    : DOT collectionExplainMethod DOT collectionMethodCall (DOT cursorMethodCall)*
    | DOT collectionMethodCall (DOT cursorMethodCall)*
    ;
```

There's no valid use case for `db.users.sort()` — cursor methods only make sense after a collection method like `find()`.

**2. Removed `isMethodToken` and `getTokenCandidateType`**: In the `else` branch of `convertCandidates`, we're at a method position where every C3 token is a method. So we treat all tokens as `CandidateTypeFunction` with `()` suffix. This eliminates the ~120-line `isMethodToken` switch block entirely.

This also fixed a bug: `getPlanCache` was previously classified as KEYWORD (no `()`). Now it correctly appears as `getPlanCache()` with FUNCTION type.

**3. Completion behavior now**:
- `db.|` → 15 database methods + collections (hardcoded, unchanged)
- `db.users.|` → only **collection methods** (48 methods — no cursor methods like `allowDiskUse`, `sort`, `limit`)
- `db.users.find().|` → only **cursor methods** (35 methods)
**Removed `genericDbMethod`** — the `identifier LPAREN arguments? RPAREN` catch-all rule from `dbStatement`. Same pattern we already removed for collection methods (`genericMethod`).

**Added 45 new lexer tokens** for the 48 missing db methods (3 already had tokens: `AGGREGATE`, `WATCH`, `SET_WRITE_CONCERN`).

**Added 48 new `dbStatement` alternatives** — each follows the explicit pattern `DB DOT TOKEN LPAREN arguments? RPAREN` with a labeled alternative name.

**Updated `getDatabaseMethodCandidates()`** — expanded from 15 to 63 methods (all db methods now listed alphabetically).

**Updated `identifier` rule** — all 45 new tokens added so they can still be used as identifiers (collection names, keys, etc.).
Copilot AI review requested due to automatic review settings January 30, 2026 09:02
@rebelice rebelice merged commit effef73 into main Jan 30, 2026
9 checks passed
@rebelice rebelice deleted the vk/8788-still-mongodb-au branch January 30, 2026 09:06
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR refactors the MongoDB shell parser to replace generic catch-all grammar rules with explicit method definitions to improve C3 autocomplete context-awareness. The changes enable the autocomplete system to suggest only contextually-appropriate methods at each position (e.g., only db methods after "db.", only collection methods after "db.users.", only cursor methods after "db.users.find().").

Changes:

  • Removed two catch-all rules (genericMethod and genericDbMethod) that were causing autocomplete to flood all keywords as suggestions
  • Added 61 new lexer tokens (16 collection methods + 45 database methods) to explicitly recognize method names
  • Split the generic methodCall rule into two context-specific rules: collectionMethodCall (52 alternatives) and cursorMethodCall (35 alternatives)
  • Added 48 new explicit database method alternatives to the dbStatement rule

Reviewed changes

Copilot reviewed 6 out of 8 changed files in this pull request and generated 93 comments.

Show a summary per file
File Description
mongodb/MongoShellLexer.g4 Added 16 collection method tokens (bulkWrite, update, mapReduce, hideIndex, etc.) and 45 database method tokens (auth, createUser, grantRolesToUser, etc.)
mongodb/MongoShellParser.g4 Removed catch-all rules, split methodCall into collectionMethodCall/cursorMethodCall, added 48 explicit db method alternatives, added 20 new collection method parser rules
mongodb/mongoshellparser_visitor.go Updated visitor interface with new Visit methods for all new parser rules (45 db methods, 20 collection methods, split methodCall)
mongodb/mongoshellparser_listener.go Updated listener interface with Enter/Exit methods for all new parser rules
mongodb/mongoshellparser_base_visitor.go Added default implementations of Visit methods for all new rules
mongodb/mongoshellparser_base_listener.go Added default implementations of Enter/Exit methods for all new rules

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines 218 to 221
methodChain
: DOT methodCall (DOT methodCall)*
: DOT collectionExplainMethod DOT collectionMethodCall (DOT cursorMethodCall)*
| DOT collectionMethodCall (DOT cursorMethodCall)*
;
Copy link

Copilot AI Jan 30, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The methodChain grammar rule has a potential ambiguity issue. The first alternative DOT collectionExplainMethod DOT collectionMethodCall (DOT cursorMethodCall)* will never be matched because collectionExplainMethod is already included in collectionMethodCall (line 264). This means ANTLR will always choose the second alternative DOT collectionMethodCall (DOT cursorMethodCall)* when parsing explain() chains.

To fix this, collectionExplainMethod should be removed from the collectionMethodCall rule (line 264), so that the special case for explain().find() chains can be properly distinguished from normal method chains.

Copilot uses AI. Check for mistakes.
h3n4l added a commit to bytebase/gomongo that referenced this pull request Jan 30, 2026
The upstream parser (bytebase/parser#57) replaced the unified
MethodCallContext with CollectionMethodCallContext and
CursorMethodCallContext, and removed the GenericMethod catch-all.

- Split visitMethodCall into visitCollectionMethodCall and
  visitCursorMethodCall
- Update visitMethodChain to use CollectionMethodCall() and
  AllCursorMethodCall() instead of AllMethodCall()
- Replace GenericMethod fallback with extractMethodNameFromText

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
rebelice pushed a commit to bytebase/gomongo that referenced this pull request Jan 30, 2026
The upstream parser (bytebase/parser#57) replaced the unified
MethodCallContext with CollectionMethodCallContext and
CursorMethodCallContext, and removed the GenericMethod catch-all.

- Split visitMethodCall into visitCollectionMethodCall and
  visitCursorMethodCall
- Update visitMethodChain to use CollectionMethodCall() and
  AllCursorMethodCall() instead of AllMethodCall()
- Replace GenericMethod fallback with extractMethodNameFromText

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants