merve

A fast C++ lexer for extracting named exports from CommonJS modules. This library performs static analysis to detect CommonJS export patterns without executing the code.

Features

Fast: Zero-copy parsing for most exports using std::string_view
Accurate: Handles complex CommonJS patterns including re-exports, Object.defineProperty, and transpiler output
Unicode Support: Properly unescapes JavaScript string literals including \u{XXXX} and surrogate pairs
Optional SIMD Acceleration: Can use simdutf for faster string operations
No Dependencies: Single-header distribution available (simdutf is optional)
Cross-Platform: Works on Linux, macOS, and Windows

Installation

CMake

include(FetchContent)
FetchContent_Declare(
  merve
  GIT_REPOSITORY https://github.com/anonrig/merve.git
  GIT_TAG main
)
FetchContent_MakeAvailable(merve)

target_link_libraries(your_target PRIVATE lexer::lexer)

Single Header

Copy singleheader/merve.h and singleheader/merve.cpp into your project and compile merve.cpp alongside your own sources.

Usage

#include "merve.h"
#include <iostream>
#include <string_view>

int main() {
  // Raw string literal holding the CommonJS source to analyze.
  std::string_view source = R"(
    exports.foo = 1;
    exports.bar = function() {};
    module.exports.baz = 'hello';
  )";

  auto result = lexer::parse_commonjs(source);

  // result is empty (std::nullopt) when parsing fails.
  if (result) {
    std::cout << "Exports found:" << '\n';
    for (const auto& exp : result->exports) {
      std::cout << "  - " << lexer::get_string_view(exp) << '\n';
    }
  }

  return 0;
}

Output:

Exports found:
  - foo
  - bar
  - baz

API Reference

`lexer::parse_commonjs`

std::optional<lexer_analysis> parse_commonjs(std::string_view file_contents);

Parses CommonJS source code and extracts export information.

Parameters:

file_contents: The JavaScript source code to analyze

Returns:

std::optional<lexer_analysis>: Analysis result, or std::nullopt on parse error

`lexer::lexer_analysis`

struct lexer_analysis {
  std::vector<export_string> exports;      // Named exports
  std::vector<export_string> re_exports;   // Re-exported module specifiers
};

`lexer::export_string`

using export_string = std::variant<std::string, std::string_view>;

Export names are stored as a variant to avoid unnecessary copies:

std::string_view: Used for simple identifiers (zero-copy; the view points into the source buffer, so the source must outlive the result)
std::string: Used when unescaping is needed (e.g., Unicode escapes)
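Because the std::string_view alternative refers to the source buffer, a caller that wants to keep export names after releasing the source needs owning copies. The following is a minimal sketch of one way to do that. The alias is declared locally as a stand-in for lexer::export_string so the snippet compiles without merve.h, and to_owned is a hypothetical helper that is not part of the library.

```cpp
#include <string>
#include <string_view>
#include <variant>
#include <vector>

// Stand-in for lexer::export_string, declared locally for this sketch.
using export_string = std::variant<std::string, std::string_view>;

// Hypothetical helper (not part of merve): copy every export name into an
// owning std::string so the result no longer depends on the source buffer.
std::vector<std::string> to_owned(const std::vector<export_string>& names) {
  std::vector<std::string> owned;
  owned.reserve(names.size());
  for (const auto& name : names) {
    // Both alternatives can construct a std::string; the string_view case
    // copies the characters out of the source buffer.
    owned.push_back(std::visit(
        [](const auto& value) { return std::string(value); }, name));
  }
  return owned;
}
```

With the real library, the same idea would apply to result->exports before the buffer holding the JavaScript source goes out of scope.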

`lexer::get_string_view`

inline std::string_view get_string_view(const export_string& s);

Helper function to get a string_view from either variant type.
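A helper with this signature can be written with std::visit. The sketch below shows one possible implementation and is not necessarily how merve implements it. It lives in a hypothetical namespace sketch with a local copy of the alias so that it compiles without merve.h.

```cpp
#include <string>
#include <string_view>
#include <variant>

namespace sketch {

// Local stand-in for lexer::export_string.
using export_string = std::variant<std::string, std::string_view>;

// Visit whichever alternative is active and convert it to a view.
// For the std::string case the view refers to the string stored inside `s`,
// so it is only valid while `s` is alive.
inline std::string_view get_string_view(const export_string& s) {
  return std::visit(
      [](const auto& value) -> std::string_view { return value; }, s);
}

}  // namespace sketch
```

Returning a view in both cases lets callers print or compare names without caring which alternative is stored.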

`lexer::get_last_error`

const std::optional<lexer_error>& get_last_error();

Returns the last parse error, if any.

Supported Patterns

Direct Exports

exports.foo = 1;
exports['bar'] = 2;
module.exports.baz = 3;
module.exports['qux'] = 4;

Object Literal Assignment

module.exports = {
  foo: 1,
  bar: someValue,
  'string-key': 3
};

Object.defineProperty

Object.defineProperty(exports, 'foo', { value: 1 });
Object.defineProperty(exports, 'bar', {
  enumerable: true,
  get: function() { return something; }
});

Re-exports (Transpiler Patterns)

// Babel/TypeScript __export pattern
function __export(m) {
  for (var p in m) if (!exports.hasOwnProperty(p)) exports[p] = m[p];
}
__export(require("./other-module"));

// Object.keys forEach pattern
Object.keys(_module).forEach(function(key) {
  if (key === "default" || key === "__esModule") return;
  exports[key] = _module[key];
});

Spread Re-exports

module.exports = {
  ...require('./other-module'),
  foo: 1
};

Unicode Handling

The lexer properly handles JavaScript string escape sequences:

exports['\u0066\u006f\u006f'] = 1;     // 'foo'
exports['\u{1F310}'] = 2;              // Globe emoji
exports['\u{D83C}\u{DF10}'] = 3;       // Surrogate pair for emoji
exports['caf\u00e9'] = 4;              // 'café'

Export names containing invalid escape sequences (such as lone surrogates) are omitted from the results.
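To illustrate what unescaping involves, here is a simplified sketch that decodes \uXXXX and \u{X...} escapes to UTF-8, combines surrogate pairs, and rejects lone surrogates. It is not merve's implementation: the names in namespace sketch are hypothetical, and any backslash escape other than \u is rejected for brevity.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <string_view>

namespace sketch {

// Append one Unicode scalar value to `out` as UTF-8.
inline void append_utf8(std::string& out, std::uint32_t cp) {
  if (cp < 0x80) {
    out += static_cast<char>(cp);
  } else if (cp < 0x800) {
    out += static_cast<char>(0xC0 | (cp >> 6));
    out += static_cast<char>(0x80 | (cp & 0x3F));
  } else if (cp < 0x10000) {
    out += static_cast<char>(0xE0 | (cp >> 12));
    out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
    out += static_cast<char>(0x80 | (cp & 0x3F));
  } else {
    out += static_cast<char>(0xF0 | (cp >> 18));
    out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
    out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
    out += static_cast<char>(0x80 | (cp & 0x3F));
  }
}

// Read one \uXXXX or \u{X...} escape starting at in[i].
// On success, advances i past the escape and returns its numeric value.
inline std::optional<std::uint32_t> read_escape(std::string_view in,
                                                std::size_t& i) {
  if (i + 1 >= in.size() || in[i] != '\\' || in[i + 1] != 'u') {
    return std::nullopt;
  }
  std::size_t p = i + 2;
  const bool braced = p < in.size() && in[p] == '{';
  if (braced) ++p;
  std::uint32_t value = 0;
  std::size_t digits = 0;
  while (p < in.size() && (braced || digits < 4)) {
    const char c = in[p];
    std::uint32_t d = 0;
    if (c >= '0' && c <= '9') d = static_cast<std::uint32_t>(c - '0');
    else if (c >= 'a' && c <= 'f') d = static_cast<std::uint32_t>(c - 'a' + 10);
    else if (c >= 'A' && c <= 'F') d = static_cast<std::uint32_t>(c - 'A' + 10);
    else break;
    value = value * 16 + d;
    if (value > 0x10FFFF) return std::nullopt;  // beyond the Unicode range
    ++digits;
    ++p;
  }
  if (braced) {
    if (digits == 0 || p >= in.size() || in[p] != '}') return std::nullopt;
    ++p;  // consume '}'
  } else if (digits != 4) {
    return std::nullopt;
  }
  i = p;
  return value;
}

// Decode the body of a string literal; std::nullopt signals an invalid escape.
inline std::optional<std::string> unescape(std::string_view in) {
  std::string out;
  std::size_t i = 0;
  while (i < in.size()) {
    if (in[i] != '\\') {
      out += in[i++];
      continue;
    }
    const auto unit = read_escape(in, i);
    if (!unit) return std::nullopt;
    std::uint32_t cp = *unit;
    if (cp >= 0xD800 && cp <= 0xDBFF) {
      // High surrogate: an escaped low surrogate must follow immediately.
      const auto low = read_escape(in, i);
      if (!low || *low < 0xDC00 || *low > 0xDFFF) return std::nullopt;
      cp = 0x10000 + ((cp - 0xD800) << 10) + (*low - 0xDC00);
    } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
      return std::nullopt;  // lone low surrogate
    }
    append_utf8(out, cp);
  }
  return out;
}

}  // namespace sketch
```

In this sketch a std::nullopt result corresponds to a name that a caller would skip, which matches the lone-surrogate behavior described above.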

ESM Detection

The lexer detects ESM syntax and returns an error:

import foo from 'bar';        // Error: UNEXPECTED_ESM_IMPORT
export const foo = 1;         // Error: UNEXPECTED_ESM_EXPORT
import.meta.url;              // Error: UNEXPECTED_ESM_IMPORT_META

This helps identify files that should be parsed as ES modules instead.
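A caller can use these errors to decide how to treat a file. The sketch below shows that decision in isolation. The enum is a local stand-in that lists only the error names mentioned in this README (the real lexer::lexer_error may contain more), and module_kind, is_esm_error, and classify are hypothetical names.

```cpp
#include <optional>

namespace sketch {

// Stand-in for lexer::lexer_error, limited to the names shown in this README.
enum lexer_error {
  UNEXPECTED_ESM_IMPORT,
  UNEXPECTED_ESM_EXPORT,
  UNEXPECTED_ESM_IMPORT_META,
  UNTERMINATED_STRING_LITERAL,
};

enum class module_kind { commonjs, esm, invalid };

// True for the errors that indicate ESM syntax rather than malformed input.
constexpr bool is_esm_error(lexer_error e) {
  switch (e) {
    case UNEXPECTED_ESM_IMPORT:
    case UNEXPECTED_ESM_EXPORT:
    case UNEXPECTED_ESM_IMPORT_META:
      return true;
    default:
      return false;
  }
}

// `parsed_ok` plays the role of result.has_value(); `error` plays the role
// of the value returned by get_last_error().
inline module_kind classify(bool parsed_ok,
                            const std::optional<lexer_error>& error) {
  if (parsed_ok) return module_kind::commonjs;
  if (error && is_esm_error(*error)) return module_kind::esm;
  return module_kind::invalid;
}

}  // namespace sketch
```

A loader built this way could retry a file with an ES module parser when classify returns module_kind::esm, and report a syntax error otherwise.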

Error Handling

auto result = lexer::parse_commonjs(source);

if (!result) {
  // get_last_error() returns a reference to an optional, so bind by reference.
  const auto& error = lexer::get_last_error();
  if (error) {
    switch (*error) {
      case lexer::UNEXPECTED_ESM_IMPORT:
        std::cerr << "File contains ESM import syntax" << std::endl;
        break;
      case lexer::UNTERMINATED_STRING_LITERAL:
        std::cerr << "Unterminated string literal" << std::endl;
        break;
      // ... handle other errors
      default:
        std::cerr << "Parse failed" << std::endl;
        break;
    }
  }
}

Building

mkdir build && cd build
cmake ..
cmake --build .

Running Tests

cmake --build . --target real_world_tests
./tests/real_world_tests

Build Options

| Option | Default | Description |
| --- | --- | --- |
| `MERVE_TESTING` | `ON` | Build test suite |
| `MERVE_BENCHMARKS` | `OFF` | Build benchmarks |
| `MERVE_USE_SIMDUTF` | `OFF` | Use simdutf for optimized string operations |
| `MERVE_SANITIZE` | `OFF` | Enable address sanitizer |

Building with simdutf

To enable SIMD-accelerated string operations:

cmake -B build -DMERVE_USE_SIMDUTF=ON
cmake --build build

When MERVE_USE_SIMDUTF=ON, CMake will automatically fetch simdutf via CPM if it's not found on the system. The library uses simdutf's optimized find() function for faster escape sequence detection.

For projects that already have simdutf available (like Node.js), define MERVE_USE_SIMDUTF=1 and ensure the simdutf header is in the include path.
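The macro-based switch can be pictured with the sketch below, which checks whether a string contains a backslash and therefore might need unescaping. The helper has_escape is hypothetical and is not taken from merve's source. The simdutf branch assumes that simdutf::find takes (start, end, character) and returns a pointer to the first match or to end, so check the simdutf version you build against.

```cpp
#include <algorithm>
#include <string_view>

#if MERVE_USE_SIMDUTF
#include <simdutf.h>
#endif

namespace sketch {

// Hypothetical helper: true if `s` contains a backslash.
inline bool has_escape(std::string_view s) {
  const char* begin = s.data();
  const char* end = begin + s.size();
#if MERVE_USE_SIMDUTF
  // Assumed signature: simdutf::find(start, end, character).
  return simdutf::find(begin, end, '\\') != end;
#else
  // Portable fallback when simdutf is not enabled.
  return std::find(begin, end, '\\') != end;
#endif
}

}  // namespace sketch
```

When MERVE_USE_SIMDUTF is undefined, the preprocessor treats it as 0 and only the std::find branch is compiled.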

Performance

The lexer is optimized for speed:

Single-pass parsing with no backtracking
Zero-copy for most export names using std::string_view
String allocation only when unescaping is required
Compile-time lookup tables using C++20 consteval
Optional SIMD acceleration via simdutf for escape sequence detection
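Two of these points, compile-time lookup tables and zero-copy names, can be illustrated with a toy scanner. This is not merve's algorithm: the names in namespace sketch are hypothetical, the scanner ignores strings, comments, and whether an assignment follows, and the table builder uses constexpr so the snippet also compiles as C++17 (in C++20 it could be declared consteval instead).

```cpp
#include <array>
#include <cstddef>
#include <string_view>
#include <vector>

namespace sketch {

// Build a 256-entry table marking bytes allowed in an ASCII identifier.
constexpr std::array<bool, 256> make_identifier_table() {
  std::array<bool, 256> table{};
  for (int c = 'a'; c <= 'z'; ++c) table[static_cast<std::size_t>(c)] = true;
  for (int c = 'A'; c <= 'Z'; ++c) table[static_cast<std::size_t>(c)] = true;
  for (int c = '0'; c <= '9'; ++c) table[static_cast<std::size_t>(c)] = true;
  table[static_cast<std::size_t>('_')] = true;
  table[static_cast<std::size_t>('$')] = true;
  return table;
}

// Evaluated at compile time; lookups at run time are a single array index.
inline constexpr auto kIdentifierChar = make_identifier_table();

// Toy forward-only scan: collect NAME from every `exports.NAME` occurrence.
// Each result is a view into `source`, so no characters are copied.
inline std::vector<std::string_view> scan_dot_exports(std::string_view source) {
  constexpr std::string_view prefix = "exports.";
  std::vector<std::string_view> names;
  std::size_t pos = 0;
  while ((pos = source.find(prefix, pos)) != std::string_view::npos) {
    const std::size_t start = pos + prefix.size();
    std::size_t end = start;
    while (end < source.size() &&
           kIdentifierChar[static_cast<unsigned char>(source[end])]) {
      ++end;
    }
    if (end > start) names.push_back(source.substr(start, end - start));
    pos = end;  // continue after the identifier; the scan never moves backward
  }
  return names;
}

}  // namespace sketch
```

The returned views share storage with the input, which is the same trade-off the library makes: no allocation for simple names, at the cost of tying their lifetime to the source buffer.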

License

Licensed under either of

Apache License, Version 2.0 (LICENSE-APACHE or http://www.apache.org/licenses/LICENSE-2.0)
MIT license (LICENSE-MIT or http://opensource.org/licenses/MIT)

at your option.
