Claude Code transcript

35 prompts · 8032 messages · 2417 tool calls · 71 commits · 12 pages

#1

Clone this library https://github.com/bellard/mquickjs to /tmp

Your goal is to create a complete Python port of this library that implements a sandbox JavaScript library (with memory and time limits) entirely in Python, with no dependencies

The goal is an implementation that passes all of the tests in the /tmp/mquickjs/tests folder

For the first commit create a tests/ folder and copy over those tests exactly as they are

Use uv for this project. Start with "uv init --lib" to start a new library. Then "uv add pytest --dev" to add pytest

Add a tests/test_basic.py file with a test with an assert that 1 + 1 == 2, then run "uv run pytest" to confirm that the test harness works correctly

Now design the new Python library. Investigate the C library and build a detailed spec in a file called spec.py, complete with TODOs for the project

Do your first commit and push once you have written that initial spec

17 bash · 14 read · 7 todo · 4 write · 3 edit

I've completed the initial setup of the mquickjs-python project. Here's what was accomplished:

Summary

  1. Cloned mquickjs from https://github.com/bellard/mquickjs to /tmp/mquickjs

  2. Set up Python project with uv:
    • Ran uv init --lib to create a library project
    • Ran uv add pytest --dev to add pytest as a dev dependency

  3. Copied test files from /tmp/mquickjs/tests/ to tests/:
    • test_language.js - Language feature tests (operators, types, closures)
    • test_loop.js - Loop and control flow tests
    • test_closure.js - Closure tests
    • test_builtin.js - Built-in object tests (Object, Array, String, etc.)
    • mandelbrot.js - Mandelbrot fractal example
    • microbench.js - Performance benchmarks
    • test_rect.js - Rectangle class test (requires C interface)

  4. Created basic pytest test in tests/test_basic.py and verified it passes with uv run pytest

  5. Created comprehensive spec.py with:
    • Overview of MQuickJS and its JavaScript subset ("stricter mode")
    • Architecture description (lexer, parser, compiler, VM, runtime, builtins)
    • Detailed bytecode opcode definitions
    • Complete TODO list organized by phase (7 phases)
    • API design specification
    • File structure plan
    • Implementation notes

  6. First commit and push to branch claude/python-mquickjs-port-uPKcc

The spec.py file contains a detailed roadmap for implementing the full JavaScript sandbox engine, with TODOs covering:
  • Core infrastructure (values, tokens, lexer)
  • Parser (expressions, statements, functions)
  • Bytecode compiler
  • Virtual machine with memory/time limits
  • All built-in JavaScript objects
  • Testing integration

45ab671
Add initial project structure with spec and test files
#2

OK, now start adding your own more basic set of tests, as JS files in tests/basic/*.js - the goal is to use these to test the implementation as it continues, with each commit accompanying a new passing test.

Build a pytest parameterized test that executes tests in that tests/basic/ folder

Start with the most basic concept needed to have a JS test pass, as the first file in that new directory. Build enough of the library to get it to pass, then commit what you have, then do another one with an additional language feature, then keep on like that.

18 bash · 13 write · 10 todo · 3 edit · 2 read
1ab545b
Rename spec.py to spec.md and convert to proper Markdown format
cdb66b4
Add lexer with TDD tests (54 tests passing)
9b51831
Add parser with TDD tests (59 tests passing)
f2eb983
Add bytecode compiler, VM, and JSContext (115 tests passing)
b2d89d8
Add xfail tests for original mquickjs .js test files
fe31c0d
Update spec.md with current progress and fix VM to return expression values
41fae6a
Add VM tests and fix global variable handling (160 tests passing)
#3

Did you commit them?

2 bash
#4

I think there may be an infinite loop in your tests, run them with a timeout and report how long they take to run

#5

I think there may be an infinite loop in your tests, run them with a timeout and report how long they take to run

10 bash · 1 read · 1 edit · 1 write · 1 todo
cbc6e3b
Add arguments object, array methods, and many loop/try-catch fixes
#6

What's left to finish the project (excluding the regex bit)?

6 bash · 1 read

Based on the failing tests and spec, here's what's left (excluding regex):

Critical Issues

  1. Closures (test_closure.js)
    • Variables from outer scopes aren't captured
    • Error: b is not defined in nested functions

  2. break/continue in try-finally (test_loop.js)
    • break inside try-finally should execute finally first
    • Currently causes infinite loop

  3. Parser gaps (test_language.js)
    • Computed property names: {[expr]: value}
    • Getter/setter syntax: {get x() {}, set x(v) {}}
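
Capturing outer variables is usually done by moving each captured variable into a shared cell that both the outer and the inner function reference, instead of a per-call local slot. A minimal sketch of that idea, assuming hypothetical `Cell` and `Closure` classes (not the port's actual data structures); the comments map each step to the MAKE_CLOSURE / LOAD_CELL / STORE_CELL style of opcode:

```python
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class Cell:
    """Hypothetical shared box holding one captured variable."""
    value: Any = None


@dataclass
class Closure:
    """Hypothetical function object: a body plus the cells it captured."""
    body: Callable[..., Any]
    cells: dict[str, Cell] = field(default_factory=dict)

    def __call__(self, *args: Any) -> Any:
        return self.body(self.cells, *args)


def make_counter() -> Closure:
    # Models the JS:
    #   function makeCounter() { let n = 0; return function () { n = n + 1; return n; }; }
    frame = {"n": Cell(0)}  # 'n' is captured, so it lives in a cell

    def inner(cells: dict[str, Cell]) -> int:
        cells["n"].value = cells["n"].value + 1  # STORE_CELL-style write
        return cells["n"].value                  # LOAD_CELL-style read

    # MAKE_CLOSURE-style step: the inner function shares the outer frame's cell
    return Closure(inner, {"n": frame["n"]})
```

Each call to make_counter() creates a fresh cell, so two counters stay independent while repeated calls to one counter see its updated value.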

Built-in Objects Missing

Methods needed, by object:
  • String: charAt, indexOf, substring, slice, split, replace, trim, toLowerCase, toUpperCase
  • Array: map, filter, reduce, forEach, indexOf, slice, concat, reverse, sort
  • Object: keys, hasOwnProperty, create
  • Math: floor, ceil, round, abs, min, max, pow, sqrt, random, sin, cos
  • JSON: parse, stringify
  • Date: now() only
  • Number: toString, toFixed, isNaN, isFinite

Advanced Features

  • Proper garbage collection
  • Memory limit enforcement (exists but basic)
  • eval() (global/indirect only)

Estimated Priority

  1. Closures - needed for most real code
  2. Built-in String/Array methods - heavily used
  3. break in try-finally - edge case
  4. Math/JSON - commonly used
  5. Parser gaps - less common syntax

Would you like me to start on any of these?

#7

Do them all, commit as you go with new passing tests, do TDD

25 edit · 23 read · 12 bash · 1 todo · 1 write · 1 grep
7770a5b
Implement closures with support for nested functions
4b075f5
Add String methods (charAt, indexOf, substring, slice, split, etc.)
786601a
Add Array methods (map, filter, reduce, forEach, indexOf, etc.)
5c8e96e
Add Math object with constants and functions
13b2e04
Add JSON object with parse and stringify
9dfb913
Add Object methods (keys, values, entries, assign, hasOwnProperty)
16212c8
Add Number methods and Date.now()
9cd1e31
Add try-finally test and clean up existing tests
#8

OK, let's do the regex thing. The big challenge here is that we need to protect against hostile regular expressions that use up all RAM or spin the CPU in a way that skips the time limit.

Take a look at /tmp/mquickjs and review how its regex mechanism handles this

Muse on if we should build a regex engine from scratch inspired by that one to get the reliable memory and CPU behavior we desire in the face of malicious patterns

13 read · 11 bash · 11 grep · 2 task · 1 glob

Perfect! I now have a comprehensive understanding of the mquickjs regex implementation. Let me provide you with a detailed summary:

MicroQuickJS Regex Implementation - Comprehensive Analysis

1. File Locations

The regex implementation is entirely contained in:
  • /tmp/mquickjs/mquickjs.c: main regex engine (lines 15477-17386)
  • /tmp/mquickjs/mquickjs_opcode.h: regex bytecode instruction definitions (lines 222-264)
  • /tmp/mquickjs/mquickjs.h: header with JSContext structures

2. Regex Engine Architecture - Hybrid NFA with Virtual Machine

The engine is a virtual-machine-based backtracking matcher: patterns are compiled to NFA-style bytecode whose alternatives are explored depth-first.

Architecture Overview:
  • Bytecode Compiler: converts JavaScript regex patterns into bytecode instructions
  • VM Interpreter: executes bytecode with explicit stack and state management
  • Split-based NFA: uses explicit split instructions (REOP_split_goto_first, REOP_split_next_first) to handle alternation and quantifiers, mimicking NFA behavior
  • State Machine: uses stack-based state tracking for backtracking via the bp (backtrack pointer)

Key Bytecode Instructions (43 opcodes):

REOP_char1/2/3/4         // Match 1-4 character literals
REOP_dot, REOP_any       // Match any character (with/without newlines)
REOP_range8, REOP_range  // Character class matching (8-bit or Unicode ranges)
REOP_split_goto_first    // Split to first path (NFA branch)
REOP_split_next_first    // Split to next path (alternate branch order)
REOP_loop                // Loop counter management
REOP_loop_split_*        // Loop with optional split (for quantifiers)
REOP_loop_check_adv_*    // Loop with zero-advance detection (ReDoS protection)
REOP_save_start/end      // Capture group markers
REOP_back_reference      // Backreference matching
REOP_lookahead/negative  // Lookahead assertions
REOP_goto, REOP_match    // Control flow

Header Structure (4 bytes):

Offset 0-1: Flags (global, ignorecase, multiline, dotall, unicode, sticky)
Offset 2:   Capture count (max 255)
Offset 3:   Register count (max 255)
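
The 4-byte header above maps naturally onto Python's struct module. A sketch, assuming little-endian byte order and hypothetical flag bit positions (the real mquickjs bit assignments are not given in this analysis):

```python
import struct

# Hypothetical flag bits; the real mquickjs assignments may differ.
FLAG_GLOBAL = 1 << 0
FLAG_IGNORECASE = 1 << 1
FLAG_MULTILINE = 1 << 2
FLAG_DOTALL = 1 << 3
FLAG_UNICODE = 1 << 4
FLAG_STICKY = 1 << 5

# u16 flags (offsets 0-1), u8 capture count (offset 2), u8 register count (offset 3)
HEADER = struct.Struct("<HBB")


def pack_header(flags: int, capture_count: int, register_count: int) -> bytes:
    # Offsets 2 and 3 are single bytes, which is why both counts cap at 255.
    if not 0 <= capture_count <= 255:
        raise ValueError("too many capture groups")
    if not 0 <= register_count <= 255:
        raise ValueError("too many registers")
    return HEADER.pack(flags, capture_count, register_count)


def unpack_header(data: bytes) -> tuple[int, int, int]:
    return HEADER.unpack_from(data, 0)
```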

3. CPU Limit (Timeout) Protection

LRE_POLL_INTERRUPT Macro (lines 16788-16817):

#define LRE_POLL_INTERRUPT() do {                                     \
    if (unlikely(--ctx->interrupt_counter <= 0)) {                    \
        JSValue ret;                                                  \
        /* Save execution state (program counter, string position) */ \
        /* Call __js_poll_interrupt(ctx) to check timeout/interrupt */ \
        /* If interrupted, return -1 (error) */                       \
        /* Otherwise restore state and continue */                    \
    }                                                                 \
} while(0)

Key Points:
  • Decrements ctx->interrupt_counter on each invocation (each loop iteration or backtrack)
  • Invoked on REOP_goto, REOP_loop, and REOP_no_match (backtracking)
  • Can be configured per JSContext to set execution time limits
  • Returns an error code (-1) if a timeout occurs, allowing graceful interruption
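
The countdown-and-poll pattern translates directly to Python: decrementing a counter is cheap, and the more expensive clock read only happens once every `interval` steps. A sketch with hypothetical names (`InterruptPoller`, `TimeLimitError`), not the port's actual API; the injectable `clock` exists only to make the behaviour testable:

```python
import time
from typing import Callable


class TimeLimitError(Exception):
    """Hypothetical error raised when the time budget is exhausted."""


class InterruptPoller:
    def __init__(self, time_limit: float, interval: int = 1000,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.clock = clock
        self.deadline = clock() + time_limit
        self.interval = interval
        self.counter = interval  # plays the role of ctx->interrupt_counter
        self.polls = 0           # how many times the clock was actually read

    def poll(self) -> None:
        """Call on every backward jump or backtrack, like LRE_POLL_INTERRUPT."""
        self.counter -= 1
        if self.counter > 0:
            return                      # fast path: no clock read
        self.counter = self.interval    # reset the countdown
        self.polls += 1
        if self.clock() > self.deadline:
            raise TimeLimitError("execution time limit exceeded")
```

The interval trades precision for speed: a larger interval means fewer clock reads but a longer possible overrun past the deadline.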

4. Memory Limit Protection

Three Protection Mechanisms:

A. Stack Space Checking (CHECK_STACK_SPACE macro, lines 16820-16849):

#define CHECK_STACK_SPACE(n) {                                \
    if (unlikely(sp <= ctx->stack_bottom)) {                  \
        /* Call JS_StackCheck(ctx, n) */                      \
        /* Prevents stack overflow during backtracking */     \
    }                                                         \
}

B. Register/Capture Limits:
  • Maximum 255 capture groups (CAPTURE_COUNT_MAX)
  • Maximum 255 registers for loop counters (REGISTER_COUNT_MAX)
  • Checked during compilation; patterns that exceed them are rejected

C. Bytecode Limits:
  • No explicit bytecode size limit is enforced at runtime
  • Compile-time checks prevent excessive register allocation
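
The capture limit can be enforced before any bytecode is emitted. A rough sketch that counts capturing groups by scanning the pattern text; a real parser would count them while building its AST, and `count_capture_groups`, `check_capture_limit`, and `RegexSyntaxError` are hypothetical names. It skips escaped characters and character classes, counts `(` and `(?<name>` as capturing, and ignores `(?:`, `(?=`, `(?!`, `(?<=`, and `(?<!`:

```python
CAPTURE_COUNT_MAX = 255  # mirrors the limit described above


class RegexSyntaxError(Exception):
    """Hypothetical error for patterns rejected at compile time."""


def count_capture_groups(pattern: str) -> int:
    count = 0
    i = 0
    in_class = False
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\":
            i += 2  # skip the backslash and the escaped character
            continue
        if in_class:
            if ch == "]":
                in_class = False  # '(' inside [...] is a literal
        elif ch == "[":
            in_class = True
        elif ch == "(":
            rest = pattern[i + 1:i + 4]
            if not rest.startswith("?"):
                count += 1  # plain capturing group
            elif rest.startswith("?<") and rest[2:3] not in ("=", "!"):
                count += 1  # named group (?<name>...), not a lookbehind
        i += 1
    return count


def check_capture_limit(pattern: str) -> int:
    n = count_capture_groups(pattern)
    if n > CAPTURE_COUNT_MAX:
        raise RegexSyntaxError(f"too many capture groups ({n} > {CAPTURE_COUNT_MAX})")
    return n
```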

5. ReDoS (Catastrophic Backtracking) Protection

Zero-Advance Check (Primary Defense):

The engine implements an innovative "zero-advance detection" mechanism:

re_need_check_adv_and_capture_init() [lines 15770-15828]

How it works:
  • Analyzes regex bytecode during compilation to determine whether an atom can match without advancing the string position
  • Sets the add_zero_advance_check flag for problematic quantifiers
  • Generates additional instructions: REOP_set_char_pos and REOP_check_advance

Protection in Action:

// Example: X* where the atom X can match the empty string
REOP_set_char_pos     // Record current position
<atom_matching>
REOP_check_advance    // Verify position advanced, else goto no_match

Patterns Protected:
  • Quantifiers after anchors (e.g., ^*, $*)
  • Quantifiers after zero-width assertions
  • Empty matches in loops
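
The effect of REOP_set_char_pos / REOP_check_advance can be shown with a star loop over an atom that may match the empty string. In this sketch, `star` and `optional_char` are hypothetical helpers, and atoms are modelled as functions that return the new position or None. The loop records the position before each iteration and treats an iteration that consumed nothing as failed, which is what stops a pattern like (a?)* from spinning forever:

```python
from typing import Callable, Optional

Atom = Callable[[str, int], Optional[int]]


def optional_char(c: str) -> Atom:
    """Atom for `c?`: consumes c if present, otherwise matches the empty string."""
    def match(s: str, pos: int) -> Optional[int]:
        return pos + 1 if pos < len(s) and s[pos] == c else pos
    return match


def star(atom: Atom, s: str, pos: int) -> int:
    """Greedy `atom*` (no backtracking), guarded by a zero-advance check."""
    while True:
        start = pos                # like REOP_set_char_pos
        new_pos = atom(s, pos)
        if new_pos is None:
            return pos             # atom failed: leave the loop
        if new_pos == start:
            return pos             # like REOP_check_advance: no progress, so stop
        pos = new_pos
```

Without the `new_pos == start` test, `star(optional_char("a"), "bbb", 0)` would loop forever, because `a?` keeps succeeding without consuming input.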

Greedy vs Non-greedy:
  • Greedy: REOP_split_goto_first (try main path first)
  • Non-greedy: REOP_split_next_first (try alternate first)
  • Conditional splitting for bounded quantifiers

Additional Optimizations:

  1. Atom Merging (lines 16540-16551): combines consecutive character matches; up to 4 characters are merged into a single REOP_char1/2/3/4

  2. Range Optimization (lines 16044-16095):
    • Binary search for character ranges
    • 8-bit optimization for small ranges (< 16 entries, max 254)

  3. Greedy/Non-greedy Loop Distinctions:
    • REOP_loop_split_goto_first (greedy: loop then split)
    • REOP_loop_split_next_first (non-greedy: split then loop)
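
Range optimization comes down to keeping a character class as a sorted list of non-overlapping [lo, hi] code-point intervals and binary-searching it. A sketch using the standard library's bisect module; `build_ranges` and `in_ranges` are hypothetical helper names:

```python
import bisect


def build_ranges(pairs: list[tuple[int, int]]) -> tuple[list[int], list[int]]:
    """Sort and merge [lo, hi] intervals; return parallel lists of lows and highs."""
    merged: list[list[int]] = []
    for lo, hi in sorted(pairs):
        if merged and lo <= merged[-1][1] + 1:
            merged[-1][1] = max(merged[-1][1], hi)  # overlapping or adjacent: extend
        else:
            merged.append([lo, hi])
    return [r[0] for r in merged], [r[1] for r in merged]


def in_ranges(cp: int, los: list[int], his: list[int]) -> bool:
    """O(log n) membership test for code point cp."""
    i = bisect.bisect_right(los, cp) - 1  # last interval starting at or before cp
    return i >= 0 and cp <= his[i]
```

Usage: a class like `[a-z0-9A-Z_]` becomes four intervals, and each membership test is one binary search instead of a linear scan.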

6. Execution Model

State Stack Structure:

typedef enum {
    RE_EXEC_STATE_SPLIT,              // Branching point
    RE_EXEC_STATE_LOOKAHEAD,          // Positive lookahead
    RE_EXEC_STATE_NEGATIVE_LOOKAHEAD, // Negative lookahead
} REExecStateEnum;

// Stack entry (3 JSValues per state):
sp[0] = RE_PC_TYPE_TO_VALUE(pc, type)    // Program counter & state type
sp[1] = JS_NewShortInt(cptr - cbuf)      // String position
sp[2] = SP_TO_VALUE(ctx, bp)             // Parent base pointer

Matching Algorithm:
  1. Interpret bytecode sequentially
  2. On split: push the current state to the stack
  3. On failure: pop the stack to restore the last branch point
  4. On success: advance the string position and continue
  5. The match is complete when REOP_match is reached
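
This split/backtrack loop fits in a few dozen lines of Python. The sketch uses a made-up four-opcode instruction set (CHAR, SPLIT, JMP, MATCH), not mquickjs's actual opcodes. SPLIT jumps to its first target and pushes the second onto an explicit stack, so swapping the targets turns a greedy quantifier into a lazy one. A step budget and a stack cap stand in for the time and memory limits; `compile_star`, `run`, and `RegexLimitError` are hypothetical names:

```python
from typing import Optional

# Instructions: ("CHAR", c), ("SPLIT", first, second), ("JMP", target), ("MATCH",)
Program = list[tuple]


class RegexLimitError(Exception):
    """Hypothetical error for an exceeded step budget or stack cap."""


def compile_star(c: str, greedy: bool, start: int) -> Program:
    """Compile `c*` (or lazy `c*?`) at address `start`; exits to start + 3."""
    body, exit_ = start + 1, start + 3
    split = ("SPLIT", body, exit_) if greedy else ("SPLIT", exit_, body)
    return [split, ("CHAR", c), ("JMP", start)]


def run(prog: Program, s: str, pos: int = 0,
        max_steps: int = 100_000, max_stack: int = 10_000) -> Optional[int]:
    """Return the end position of the first match found at `pos`, or None."""
    stack: list[tuple[int, int]] = []  # saved (pc, string position) branch points
    pc, sp, steps = 0, pos, 0
    while True:
        steps += 1
        if steps > max_steps:
            raise RegexLimitError("step limit exceeded")
        op = prog[pc]
        if op[0] == "CHAR" and sp < len(s) and s[sp] == op[1]:
            pc, sp = pc + 1, sp + 1
            continue
        if op[0] == "SPLIT":
            if len(stack) >= max_stack:
                raise RegexLimitError("backtrack stack overflow")
            stack.append((op[2], sp))  # remember the alternative
            pc = op[1]                 # try the preferred branch first
            continue
        if op[0] == "JMP":
            pc = op[1]
            continue
        if op[0] == "MATCH":
            return sp
        # CHAR mismatch: resume from the most recent branch point
        if not stack:
            return None
        pc, sp = stack.pop()
```

For example, `compile_star("a", True, 0) + [("MATCH",)]` matches all three a's of "aaab", while the lazy variant stops at position 0; appending `("CHAR", "b")` before MATCH makes both variants reach position 4 through backtracking.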

7. Key Functions

  • js_parse_regexp(): compiles a regex pattern to bytecode
  • re_parse_disjunction(): handles | alternation
  • re_parse_alternative(): parses sequences and atoms
  • re_parse_quantifier(): handles *, +, ?, {n,m}
  • lre_exec(): main VM interpreter
  • re_need_check_adv_and_capture_init(): ReDoS protection analyzer
  • re_compute_register_count(): calculates register requirements
  • re_range_optimize(): optimizes character class matching

8. Notable Limitations

  • ASCII case-folding only (Unicode case-folding not supported)
  • UTF-8 internal representation (matching works on Unicode codepoints)
  • No special support for Unicode properties or named groups
  • Sticky flag (y) supported but behavior matches standard

9. Security Summary

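Well-Protected Against:
  • Infinite empty-match loops, via zero-advance detection (catastrophic backtracking is bounded by the interrupt/time limit rather than prevented)
  • Stack overflow, via explicit checks
  • Memory exhaustion, via register limits
  • Timeout/CPU exhaustion, via the interrupt mechanism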

Typical Attack Mitigation:

// Pattern like: (a+)+b can backtrack exponentially in naive engines.
// REOP_check_advance only rejects loop iterations that consume nothing, so it
// does not prune this search; LRE_POLL_INTERRUPT bounds it via the time limit.

The implementation represents a sophisticated approach to secure regex execution through bytecode compilation, explicit state management, and compile-time analysis for ReDoS patterns.

My recommendation: Build a custom regex VM from scratch, inspired by mquickjs.

Here's why this makes sense for our sandboxed JavaScript engine:

The mquickjs approach translates well to Python:

# Concept sketch of the architecture
from enum import IntEnum, auto


class MemoryLimitError(Exception):
    """Raised when the backtracking stack exceeds its size limit."""


class RegexOpCode(IntEnum):
    CHAR = auto()        # Match literal character(s)
    DOT = auto()         # Match any char
    RANGE = auto()       # Character class [a-z]
    SPLIT = auto()       # NFA branch point (for |, *, +, ?)
    JUMP = auto()        # Goto
    SAVE = auto()        # Capture group start/end
    MATCH = auto()       # Success


class RegexVM:
    def __init__(self, bytecode, vm_context):
        self.bytecode = bytecode
        self.ctx = vm_context  # Reference to main VM for _check_limits()
        self.step_count = 0

    def match(self, string, pos=0):
        stack = []  # Backtracking states
        pc = 0      # Program counter
        sp = pos    # String position

        while True:
            self.step_count += 1
            if self.step_count % 100 == 0:
                self.ctx._check_limits()  # Hook into main VM timeout!

            if len(stack) > 10000:  # Memory limit
                raise MemoryLimitError("Regex stack overflow")

            op = self.bytecode[pc]
            # ... execute instruction: dispatch on op, update pc and sp,
            # push/pop backtracking states, and return on MATCH or when
            # the stack is empty after a failure

Key benefits:
  1. Timeout integration: call _check_limits() every N steps
  2. Memory control: explicit stack size limits
  3. ReDoS protection: can implement zero-advance detection like mquickjs
  4. Pure Python: no external dependencies, works in sandboxed environments

Implementation scope (minimal viable):
  • Literals, ., ^, $
  • Character classes [abc], [^abc], [a-z]
  • Quantifiers *, +, ?, {n,m} (greedy and non-greedy)
  • Alternation |
  • Groups () with captures
  • Backreferences \1
  • Basic assertions (word boundary \b)

Should I start implementing this regex engine? It would be a significant but worthwhile addition to make the sandbox truly robust against malicious patterns.

#9

Yes do it - try for feature parity with mquickjs, use TDD and build a comprehensive and clean and readable test suite, do this in a separate Python module so it can be extracted as a separate library later

1 todo
#10

Continue

21 edit · 15 read · 14 bash · 3 grep · 2 todo · 2 write

I've successfully implemented a complete custom regex engine with JSContext integration. Here's the summary:

Regex Engine Implementation Complete

Custom Regex Module (src/mquickjs_python/regex/)

  • opcodes.py: 30+ regex-specific bytecode opcodes inspired by mquickjs
  • parser.py: Recursive descent parser for JavaScript regex patterns
  • compiler.py: Compiles AST to bytecode with zero-advance detection for ReDoS protection
  • vm.py: Bytecode VM with explicit backtracking stack
  • regex.py: Public interface (RegExp class)

Features Implemented

  • Character matching, character classes, ranges, shorthand classes (\d, \w, \s)
  • Anchors (^, $, \b, \B) with multiline support
  • Quantifiers (*, +, ?, {n,m}) with greedy/lazy modes
  • Alternation (|) and groups (capturing/non-capturing)
  • Lookahead assertions (positive/negative)
  • Backreferences
  • Unicode support and escape sequences
  • All regex flags (g, i, m, s, u, y)

ReDoS Protection

  • Step limits to prevent exponential blowup
  • Zero-advance detection in compiler
  • Stack overflow protection
  • Timeout integration via poll callback

JSContext Integration

  • new RegExp(pattern, flags) constructor
  • re.test(string) method
  • re.exec(string) with capture groups and index
  • Global flag with lastIndex tracking
  • All properties accessible
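
The global-flag behaviour follows the ECMAScript RegExpBuiltinExec rules: with g or y set, matching starts at lastIndex, a failure resets lastIndex to 0, and a success sets it to the end of the match; with neither flag, matching starts at 0 and lastIndex is left untouched. A sketch of just that bookkeeping, using Python's re module purely as a stand-in matcher (the port uses its own engine); `JSRegExpSketch` is a hypothetical class:

```python
import re
from typing import Optional


class JSRegExpSketch:
    def __init__(self, source: str, flags: str = "") -> None:
        self.global_ = "g" in flags
        self.sticky = "y" in flags
        self.last_index = 0
        self._re = re.compile(source)  # stand-in for the custom regex engine

    def exec(self, s: str) -> Optional[dict]:
        use_last_index = self.global_ or self.sticky
        start = self.last_index if use_last_index else 0
        if start > len(s):
            if use_last_index:
                self.last_index = 0
            return None
        # Sticky matches must begin exactly at `start`; otherwise search forward.
        m = self._re.match(s, start) if self.sticky else self._re.search(s, start)
        if m is None:
            if use_last_index:
                self.last_index = 0
            return None
        if use_last_index:
            self.last_index = m.end()
        # Roughly the shape of a JS exec() result (unmatched groups are None here,
        # where JS would use undefined).
        return {"match": [m.group(0), *m.groups()], "index": m.start()}
```

Calling exec repeatedly on a global regex walks through successive matches and then returns None with lastIndex reset to 0, which is the loop pattern `while ((m = re.exec(s)) !== null)` relies on.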

Test Results

  • 283 tests passing (18 new JSContext RegExp tests + 265 existing)
  • 9 xfailed (lookbehind assertions pending, 7 original test files)
d8d9d2c
Add custom regex engine with ReDoS protection (86 tests passing)
a0434aa
Integrate RegExp with JSContext (283 tests passing)
#11

OK what is left for our Python version to have total feature parity with the original C?

35 bash · 13 read · 1 task

Now let me create a comprehensive summary of my findings. Based on my detailed exploration of both codebases, here's the comparison:

COMPREHENSIVE COMPARISON: Python mquickjs vs C mquickjs

1. JAVASCRIPT LANGUAGE FEATURES

Supported in C but Missing in Python:

  • for...of loops - C supports iteration with for_of_start and for_of_next opcodes
  • Spread operator (...) - Not in Python parser
  • Destructuring assignment - Not in Python parser
  • Arrow functions (=>) - Not in Python parser
  • Template literals (backticks) - Not in Python lexer/parser
  • Async/await - Not implemented
  • Generators (function*, yield) - Not implemented
  • Classes (class, extends, super) - Not in parser
  • Symbols - Not implemented
  • Proxies - Not implemented
  • WeakMap/WeakSet - Not implemented
  • Getters/setters in object literals - Partially supported in C
  • Computed property names - Not in Python
  • Optional chaining (?.) - Not implemented
  • Nullish coalescing (??) - Not implemented
  • BigInt - Not implemented

Partially Supported/Limited:

  • for...in loops - Python has basic implementation but may lack edge cases
  • try...catch...finally - Python has basic implementation
  • Function declarations - Python supports basic functions only
  • Object literals - Limited property definition support in Python

2. BUILT-IN OBJECTS AND METHODS

String Methods

Missing in Python:
  • replaceAll(): only replace() (first occurrence only) is implemented
  • match(): not implemented (needs regex integration)
  • search(): not implemented (needs regex)
  • padStart(): not implemented
  • padEnd(): not implemented
  • at(): not implemented (ES2022)
  • matchAll(): not implemented (needs regex)
  • localeCompare(): not implemented
  • normalize(): not implemented (Unicode)
  • repeat() with proper coercion: partially implemented

Implemented in Python:
  • charAt, charCodeAt, indexOf, lastIndexOf
  • substring, slice, split
  • toLowerCase, toUpperCase
  • trim
  • concat, repeat
  • startsWith, endsWith, includes
  • replace (first occurrence only)
  • toString


Array Methods

Missing in Python:
  • fill(): not implemented
  • flat(): not implemented
  • flatMap(): not implemented
  • at(): not implemented (ES2022)
  • splice(): not implemented
  • sort(): not implemented with proper comparator
  • reduceRight(): not implemented
  • from() (static): not implemented
  • of() (static): not implemented
  • copyWithin(): not implemented
  • entries(): not implemented
  • keys(): not implemented
  • values(): not implemented
  • toLocaleString(): not implemented

Implemented in Python:
  • push, pop, shift, unshift
  • toString, join
  • map, filter, reduce, forEach
  • find, findIndex
  • indexOf, lastIndexOf
  • some, every
  • concat, slice
  • reverse, includes


Object Methods

Missing in Python:
  • Object.assign(): not implemented
  • Object.freeze(): not implemented
  • Object.seal(): not implemented
  • Object.preventExtensions(): not implemented
  • Object.isFrozen(): not implemented
  • Object.isSealed(): not implemented
  • Object.isExtensible(): not implemented
  • Object.entries(): not implemented
  • Object.values(): not implemented
  • Object.getOwnPropertyDescriptor(): not implemented
  • Object.getOwnPropertyDescriptors(): not implemented
  • Object.getOwnPropertyNames(): not implemented
  • Object.getOwnPropertySymbols(): not implemented
  • Object.defineProperties(): not implemented

Implemented in Python:
  • Object.keys(): partially (in context only)
  • Object.create(): minimal implementation in context
  • Object.getPrototypeOf(): in context
  • Object.setPrototypeOf(): in context
  • Object.defineProperty(): in context
  • object.toString(): basic
  • object.hasOwnProperty(): basic


Number Methods

Missing in Python:
  • Number.parseFloat(): global function only
  • Number.parseInt(): global function only (not as Number static method)
  • Number.isNaN(): not as static method
  • Number.isFinite(): not as static method
  • toExponential(): not implemented
  • toPrecision(): not implemented
  • toLocaleString(): not implemented

Implemented in Python:
  • toFixed(): basic implementation
  • toString(radix): with radix support
  • Global: isNaN(), isFinite(), parseInt(), parseFloat()


Math Object

Missing in Python:
  • Math.log10(): not implemented
  • Math.log2(): not implemented
  • Math.log1p(): not implemented
  • Math.expm1(): not implemented
  • Math.cbrt(): not implemented
  • Math.hypot(): not implemented
  • Math.clz32(): not implemented
  • Math.imul(): not implemented
  • Math.fround(): not implemented

Implemented in Python:
  • Constants: PI, E, LN2, LN10, LOG2E, LOG10E, SQRT1_2, SQRT2
  • Functions: abs, floor, ceil, round, trunc
  • min, max, pow, sqrt
  • Trigonometric: sin, cos, tan, asin, acos, atan, atan2
  • log, exp, random, sign


Date Object

Missing in Python:
  • Constructor with parameters
  • Instance methods like getFullYear(), getMonth(), getDate(), getTime(), etc.
  • Methods: toDateString(), toTimeString(), toISOString(), etc.
  • Date parsing and formatting

Implemented in Python:
  • Date.now(): basic static method only


JSON Object

Implemented in Python:
  • JSON.parse()
  • JSON.stringify()

Both seem reasonably complete for basic JSON operations.


RegExp Object

Missing in Python:
  • Full regex support (limited implementation)
  • RegExp.prototype.exec(): basic implementation only
  • RegExp.prototype.test(): basic implementation only
  • Regex string methods: match(), search(), replace(), replaceAll(), split() with regex patterns
  • Lookahead/lookbehind assertions
  • Named capture groups
  • Backreferences
  • Unicode property escapes
  • Advanced regex flags handling

Implemented in Python:
  • Basic RegExp object creation
  • test() method (basic)
  • exec() method (basic)
  • Properties: source, flags, global, ignoreCase, multiline, dotAll, unicode, sticky, lastIndex


3. TYPED ARRAYS AND BUFFERS

Missing in Python:
  • ArrayBuffer class: not implemented
  • All typed array types:
    • Uint8ClampedArray
    • Int8Array, Uint8Array
    • Int16Array, Uint16Array
    • Int32Array, Uint32Array
    • Float32Array, Float64Array
  • DataView: not implemented
  • Methods: subarray(), slice(), copy(), etc.

Implemented in C:
  • All typed array types fully implemented with proper methods


4. ERROR HANDLING AND EXCEPTIONS

Missing in Python:
  • Full error stack traces
  • Error stack property implementation
  • All error types implemented but with limited functionality

Implemented in Python:
  • Basic error types: Error, TypeError, ReferenceError
  • try...catch...finally blocks (basic)
  • Error message property

Available in C but limited in Python:
  • EvalError, RangeError, SyntaxError, URIError, InternalError


5. OTHER RUNTIME FEATURES

Missing in Python:
  • Symbol primitive type
  • Proxy and Reflect API
  • Promise and async operations
  • WeakMap, WeakSet, Map, Set (collections)
  • Module system (import, export)
  • eval() function (omitted for security reasons, which is reasonable)
  • new.target meta-property
  • arguments object (functions)
  • Function.prototype.bind(): not implemented
  • Function.prototype.call(): not implemented
  • Function.prototype.apply(): not implemented
  • Getters/setters via __defineGetter__
  • Property descriptors and Object.defineProperty() functionality
  • Reflect API
  • globalThis reference

Partially Implemented in Python:
  • Global functions: parseInt(), parseFloat(), isNaN(), isFinite()
  • console.log(): basic implementation
  • Math and JSON objects: good coverage

Implemented in C but not Python:
  • with statement (security feature, reasonably omitted)
  • debugger statement
  • void operator (edge case)
  • Proper getter/setter syntax in object literals
  • Property enumeration order guarantees


6. OPCODES/VM OPERATIONS

Missing in Python VM:
  • For-of iteration support
  • Proper spread operator handling
  • Destructuring operations
  • Optional chaining operations
  • Async/await operations
  • Generator operations
  • Proper class instantiation opcodes

Implemented in Python:
  • Basic stack operations: POP, DUP, SWAP, ROT3
  • Constants: LOAD_CONST, LOAD_UNDEFINED, LOAD_NULL, LOAD_TRUE, LOAD_FALSE
  • Variables: LOAD_NAME, STORE_NAME, LOAD_LOCAL, STORE_LOCAL
  • Properties: GET_PROP, SET_PROP, DELETE_PROP
  • Arrays/Objects: BUILD_ARRAY, BUILD_OBJECT
  • Arithmetic: ADD, SUB, MUL, DIV, MOD, POW, NEG, POS
  • Bitwise: BAND, BOR, BXOR, BNOT, SHL, SHR, USHR
  • Comparison: LT, LE, GT, GE, EQ, NE, SEQ, SNE
  • Type: TYPEOF, INSTANCEOF, IN
  • Control flow: JUMP, JUMP_IF_FALSE, JUMP_IF_TRUE
  • Functions: CALL, CALL_METHOD, RETURN, RETURN_UNDEFINED
  • Objects: NEW, THIS
  • Exceptions: THROW, TRY_START, TRY_END, CATCH
  • Iteration: FOR_IN_INIT, FOR_IN_NEXT
  • Increment/Decrement: INC, DEC, POST_INC, POST_DEC
  • Closures: MAKE_CLOSURE, LOAD_CLOSURE, STORE_CLOSURE, LOAD_CELL, STORE_CELL


SUMMARY BY PRIORITY

Critical Missing Features (High Priority):

  1. Typed Arrays (ArrayBuffer, Int8Array, etc.)
  2. Full Array methods (splice, sort, flat, flatMap, fill)
  3. Full String methods (replaceAll, match, search with regex, padStart, padEnd)
  4. Date object full implementation
  5. Full RegExp with string integration
  6. Object static methods (assign, freeze, seal, keys with proper behavior)

Important Missing Features (Medium Priority):

  1. for...of loops
  2. Spread operator
  3. Destructuring
  4. Arrow functions
  5. Template literals
  6. Math functions (log10, log2, cbrt, hypot, clz32, imul, fround)
  7. Function.prototype methods (bind, call, apply)
  8. Getters/setters
  9. Symbol and Map/Set collections

Advanced Features (Lower Priority):

  1. Async/await
  2. Generators
  3. Classes with inheritance
  4. Proxies and Reflect
  5. Promises
  6. Module system

This comparison shows that the Python implementation has a solid foundation for basic JavaScript execution but is missing many standard library features and advanced language features that the C version provides.

Here's a comprehensive summary of what's missing for feature parity:

Missing Features Summary

🔴 Critical (Core JS Functionality)

Language Features:
  • Arrow functions (=>)
  • Template literals (backticks)
  • for...of loops
  • Spread operator (...)
  • Destructuring assignment
  • Classes (class, extends, super)
  • Optional chaining (?.)
  • Nullish coalescing (??)

String Methods with Regex:
  • String.prototype.match(regex)
  • String.prototype.search(regex)
  • String.prototype.replace(regex, replacement)
  • String.prototype.replaceAll()
  • String.prototype.split(regex)

Array Methods:
  • splice(), sort() with comparator
  • flat(), flatMap(), fill()
  • Array.from(), Array.of()

🟡 Important (Standard Library)

Date Object:
  • Full constructor with date parsing
  • Instance methods: getFullYear(), getMonth(), getDate(), getHours(), etc.
  • Formatting: toISOString(), toDateString(), etc.

Object Methods:
  • Object.assign()
  • Object.freeze(), Object.seal()
  • Object.entries(), Object.values() (already have Object.keys())

Function Methods:
  • Function.prototype.bind()
  • Function.prototype.call()
  • Function.prototype.apply()

String Methods:
  • padStart(), padEnd()

RegExp:
  • Lookbehind assertions (already xfailed)
  • Named capture groups

🟢 Nice to Have (Advanced)

  • Typed Arrays (ArrayBuffer, Uint8Array, etc.)
  • Map, Set, WeakMap, WeakSet
  • Symbol
  • Promise, async/await
  • Generators
  • Proxy and Reflect

Would you like me to implement any of these? I'd suggest starting with:
  1. String regex methods (match, search, replace with regex): builds on our regex work
  2. Arrow functions: very common syntax
  3. Object.assign(), Object.entries(), Object.values(): frequently used

#12

Let's get the xfails all working - or come up with a very good explanation for why we can't

9 read · 7 grep · 4 bash · 3 edit · 1 todo · 1 write
d4bb34b
Add regex literals and String regex methods (match, search, replace, split)
47b6a05
Add arrow function syntax support
f88bb80
Add Function.prototype bind/call/apply methods
#13

It would be cool to see a bunch of tests that pass both the C and the Python thing the same way, do we have that?

13 bash · 5 read · 1 grep · 1 glob · 1 write

Done! Here's a summary of the compatibility tests and xfail status:

Compatibility Test Results (40/45 passing)

The Python implementation matches the C QuickJS for:
  • All arithmetic operations
  • All string methods
  • All array operations
  • Objects and methods
  • Functions and arrow functions
  • Control flow, Math functions, comparisons

5 failing compatibility tests need:
  • Closure variable capture fix (nested function scoping)
  • Block expressions returning values
  • Number(), String(), Boolean() constructors

What's Needed to Fix the 9 xfail Tests

  • test_closure.js. Issue: named function expressions. Fix: make function name() {...} bind name inside body
  • test_loop.js. Issue: infinite loop/timeout. Fix: debug for-in/labeled statement handling
  • test_language.js. Issue: syntax error. Fix: likely needs getter/setter or computed properties
  • test_rect.js. Issue: missing Rectangle/FilledRectangle. Fix: none; these are C-defined classes, not applicable to pure Python
  • test_builtin.js. Issue: comprehensive built-in tests. Fix: many features still needed
  • mandelbrot.js. Issue: performance benchmark. Fix: needs complete VM
  • microbench.js. Issue: performance benchmark. Fix: needs complete VM
  • Lookbehind regex (2). Issue: regex lookbehind. Fix: implement (?<=...) and (?<!...) in regex engine

The easiest wins would be:
  1. Named function expressions: relatively simple parser/compiler change
  2. Lookbehind regex: extend the regex engine
  3. Number/String/Boolean constructors: add to global object

32edb6b
Add compatibility tests comparing Python vs C QuickJS implementations
#14

Finish it, do everything needed to fix those tests

24 bash · 17 read · 6 grep · 5 edit · 1 todo
9b1230d
Fix closures, block return values, and add String/Boolean constructors
#15

Always use "uv run pytest" to run the test

3 bash
#16

Make a to-do list for the remaining xfail tests

11 bash · 1 todo

Here's the breakdown of remaining xfail tests:

To-Do List for Xfail Tests

Test | Issue | Fix Needed
test_builtin.js | void operator not implemented | Add void unary operator (returns undefined)
test_language.js | Syntax error on getters/setters | Add get/set property syntax: {get x() {...}}
microbench.js | Comma operator in for-init | Support for(i=0, n=len; ...) syntax
test_loop.js | Times out (infinite loop) | Fix for(;;) empty conditions + for-in edge cases
test_regex.py | Lookbehind assertions | Add (?<=...) and (?<!...) to regex engine
mandelbrot.js | Times out (performance) | Low priority - it runs but is slow
test_rect.js | Needs C-defined classes | Not applicable - requires C extension
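Both void and the comma operator are small evaluator cases. The sketch below uses a hypothetical tuple-based AST and a UNDEFINED sentinel (neither is the project's representation) to show the two rules: void evaluates its operand for its side effects and then yields undefined, and a comma expression evaluates left to right and yields its last value, which is what for(i=0, n=len; ...) relies on.

```python
UNDEFINED = object()  # hypothetical sentinel standing in for JS undefined


def evaluate(node, env):
    """Evaluate a tiny tuple-based AST against a dict of variables."""
    kind = node[0]
    if kind == "num":
        return node[1]
    if kind == "var":
        return env[node[1]]
    if kind == "assign":
        env[node[1]] = evaluate(node[2], env)
        return env[node[1]]
    if kind == "void":
        evaluate(node[1], env)  # operand still runs for its side effects
        return UNDEFINED
    if kind == "seq":
        result = UNDEFINED
        for expr in node[1]:  # comma operator: evaluate left to right
            result = evaluate(expr, env)
        return result  # value of the last expression
    raise ValueError(f"unknown node kind: {kind}")


env = {}
# void (a = 1)
void_result = evaluate(("void", ("assign", "a", ("num", 1))), env)
# i = 0, n = 5   (as in a for-loop initializer)
seq_result = evaluate(
    ("seq", [("assign", "i", ("num", 0)), ("assign", "n", ("num", 5))]), env
)
```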

Priority Order

  1. void operator - Simple, unblocks test_builtin.js
  2. Comma operator in for-init - Parser change, unblocks microbench.js
  3. Empty for conditions - Parser fix for for(;;)
  4. Getter/setter syntax - Parser + compiler, unblocks test_language.js
  5. Regex lookbehind - Complex regex engine changes
  6. Performance - Optional optimization work
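For item 3, the rule is that a missing test in for(init; test; update) counts as true, while a missing init or update is simply skipped. The sketch below assumes the three clauses have already been compiled to Python callables (None when absent) and uses a hypothetical iteration budget as a stand-in for the sandbox's time limit, so a genuinely infinite loop fails instead of hanging the test run:

```python
def run_for(init, test, update, body, max_iterations=1_000_000):
    """Run for(init; test; update) body; any clause may be None (empty).

    body() returns "break" to leave the loop, anything else to continue.
    """
    if init is not None:
        init()
    iterations = 0
    # An empty test clause, as in for(;;), is treated as always true
    while test is None or test():
        iterations += 1
        if iterations > max_iterations:
            raise TimeoutError("loop exceeded its iteration budget")
        if body() == "break":
            break
        # "continue" falls through to here, so update still runs
        if update is not None:
            update()


state = {"i": 0}


def body():
    state["i"] += 1
    return "break" if state["i"] == 5 else None


run_for(None, None, None, body)  # for (;;) { if (++i == 5) break; }
```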

Want me to start implementing these?

#17

Do it all, TDD so you write a failing test first, commit each time you get to a passing test

72 bash · 18 read · 16 edit · 13 grep · 5 todo
6dcdde7
Add void operator support
e8e5761
Add for...of loop support
e1b6786
Add getter/setter property syntax support
4a56c7c
Allow reserved words as property names and fix get/set method shorthand
6cf3a86
Fix try-finally interaction with break/continue/return and exceptions
f3cd9d9
Move passing test files to tests/basic/ directory
27601f3
Organize passing mquickjs tests into tests/compat/ directory
2c07659
Implement regex lookbehind assertions
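Commit 2c07659 adds lookbehind. As a reference for what the assertion has to decide (not how the project's engine implements it), a lookbehind at position pos succeeds when the sub-pattern can match some substring that ends exactly at pos. The sketch models the sub-pattern as a hypothetical predicate sub_matches(text, start, end). ECMAScript engines instead match the sub-pattern backwards from pos; for patterns without backreferences this gives the same yes/no answer, though capture contents can differ.

```python
from typing import Callable

# Hypothetical interface: True if the lookbehind body matches exactly text[start:end]
SubMatcher = Callable[[str, int, int], bool]


def lookbehind(sub_matches: SubMatcher, text: str, pos: int, negative: bool = False) -> bool:
    """(?<=X) when negative is False, (?<!X) when negative is True."""
    # Try every start position at or before pos, nearest first
    found = any(sub_matches(text, start, pos) for start in range(pos, -1, -1))
    return not found if negative else found


def dollar(text: str, start: int, end: int) -> bool:
    return text[start:end] == "$"  # body of (?<=\$)


def one_or_more_a(text: str, start: int, end: int) -> bool:
    return end > start and set(text[start:end]) == {"a"}  # body of (?<=a+)
```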
#18

Build a todo list of the features those remaining tests need

6 bash · 5 read · 1 todo

Here's the todo list for features needed by the remaining xfail tests:

test_language.js (parser/compiler):
  1. Fix labeled statement parsing (while (1) label: break)
  2. Add exponentiation operator (**)
  3. Add bitwise operators (<<, >>, >>>, &, |, ^, ~)
  4. Add instanceof operator

test_builtin.js (built-in objects):
  1. Make Object/Array/Function callable as constructors (new Object())
  2. Add Object.getPrototypeOf/setPrototypeOf
  3. Add Object.defineProperty with getter/setter support
  4. Add Object.create
  5. Add Function constructor (new Function('a','b','return a+b'))
  6. Add TypedArrays (Int32Array, Uint8Array, Uint8ClampedArray, etc.)
  7. Add ArrayBuffer
  8. Add Date.now()
  9. Add String methods: codePointAt, fromCodePoint
  10. Add Math methods: fround, imul, min
  11. Add JSON.parse/stringify
  12. Add eval() function

mandelbrot.js:
  • Actually runs! It only uses console.log and basic math, so it either just needs a longer time limit or already passes.

test_rect.js:
  • Uses external Rectangle/FilledRectangle classes (C bindings), so it is not applicable

microbench.js:
  • Needs most of the above features plus performance.now, BigInt and Set
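Item 3 (Object.defineProperty with getter/setter support) and the {get x() {...}} syntax both come down to accessor descriptors. A minimal sketch, assuming a hypothetical JSObject that stores descriptors in a dict and uses None for undefined; writable, enumerable and configurable are ignored, and a write to a getter-only property raises TypeError on the assumption of strict-mode semantics:

```python
UNDEFINED = None  # stand-in for JS undefined in this sketch


class JSObject:
    """Hypothetical object with data and accessor property descriptors."""

    def __init__(self, proto=None):
        self.proto = proto
        self.props = {}  # name -> {"value": ...} or {"get": f, "set": g}

    def define_property(self, name, descriptor):
        self.props[name] = dict(descriptor)

    def _find(self, name):
        # Walk the prototype chain; return (owner, descriptor) or (None, None)
        obj = self
        while obj is not None:
            if name in obj.props:
                return obj, obj.props[name]
            obj = obj.proto
        return None, None

    @staticmethod
    def _is_accessor(desc):
        return "get" in desc or "set" in desc

    def get(self, name):
        _, desc = self._find(name)
        if desc is None:
            return UNDEFINED
        if self._is_accessor(desc):
            getter = desc.get("get")
            # Inherited getters still receive the original object as `this`
            return getter(self) if getter is not None else UNDEFINED
        return desc.get("value", UNDEFINED)

    def put(self, name, value):
        owner, desc = self._find(name)
        if desc is not None and self._is_accessor(desc):
            setter = desc.get("set")
            if setter is None:
                raise TypeError(f"cannot set property {name} which has only a getter")
            setter(self, value)
        elif owner is self:
            desc["value"] = value
        else:
            # Missing or inherited data property: create an own property
            self.props[name] = {"value": value}


proto = JSObject()
proto.define_property("double", {
    "get": lambda this: this.get("n") * 2,
    "set": lambda this, v: this.put("n", v / 2),
})
child = JSObject(proto)
child.put("n", 4)
```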

#19

Finish it, TDD, commit often

11 read · 9 edit · 6 bash · 4 grep · 2 todo
672fe03
Fix labeled statement break/continue semantics
db17da1
Make Object/Array/RegExp callable as constructors
878399c
Fix ASI for break/continue with newline before identifier
34929a4
Add update expressions for member expressions (a.x++, arr[i]--)
b1f7ea2
Fix signed 32-bit left shift and negative zero handling
c0cc159
Add ToPrimitive conversion, typeof for undeclared vars, Object prototype methods
b08e96a
Add Object.defineProperty, Object.create, and related methods
b7e2e39
Add Function constructor for dynamic function creation
57f0394
Add eval() function for runtime code evaluation
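Commit b1f7ea2 fixes signed 32-bit left shift. The underlying rule is that JavaScript bitwise operators convert their operands with ToInt32 or ToUint32 and mask shift counts to 5 bits, while Python integers are unbounded. A minimal sketch of those conversions (function names are illustrative, not the project's):

```python
import math


def to_uint32(x) -> int:
    """ToUint32: truncate toward zero, then wrap modulo 2**32."""
    if math.isnan(x) or math.isinf(x):
        return 0
    return math.trunc(x) % 2**32


def to_int32(x) -> int:
    """ToInt32: like ToUint32, then reinterpret as signed two's complement."""
    n = to_uint32(x)
    return n - 2**32 if n >= 2**31 else n


def js_shl(a, b) -> int:  # a << b
    return to_int32(to_int32(a) << (to_uint32(b) & 31))


def js_sar(a, b) -> int:  # a >> b (sign-propagating)
    return to_int32(a) >> (to_uint32(b) & 31)


def js_shr(a, b) -> int:  # a >>> b (zero-fill)
    return to_uint32(a) >> (to_uint32(b) & 31)


def js_not(a) -> int:  # ~a
    return ~to_int32(a)
```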
#20

Show summary line from uv run pytest

1 bash
#21

Tell me about the xpass and the xfails

2 bash · 1 read
#22

Move test_language to tests/compat - then work on the other xfails with TDD and aim to move them all to tests/compat once they pass as well

71 bash · 23 read · 18 edit · 14 grep · 3 todo
19e8906
Move test_language.js to compat (now passing)
60f7b77
Add Object.prototype methods and JSBoundMethod support
6d9eebc
Add all Error constructors (TypeError, SyntaxError, etc.)
8215817
Fix JSON.stringify to omit undefined values, fix parseFloat Infinity
df438f4
Add Python/JS interop test and fix JS-style rounding
175d079
Fix for-of loop parsing without var declaration
b09fcac
Move mandelbrot.js to compat (now passing)
f66fd7e
Add configurable test timeout for long-running tests
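Commit df438f4 does not say which rounding it fixes. One well-known mismatch between the two languages is that Python's built-in round() rounds halves to even, while JavaScript's Math.round rounds halves toward +Infinity and preserves -0. A sketch of Math.round semantics in pure Python, offered as an illustration rather than the project's code:

```python
import math


def js_math_round(x: float) -> float:
    """Math.round: nearest integer, halves toward +Infinity, keeps -0."""
    if math.isnan(x) or math.isinf(x) or x == 0:
        return x  # NaN, +/-Infinity and +/-0 pass through unchanged
    if -0.5 <= x < 0:
        return -0.0  # Math.round(-0.4) and Math.round(-0.5) are -0
    r = math.floor(x)
    # x - r is exact here, so this avoids the floor(x + 0.5) precision trap
    if x - r >= 0.5:
        r += 1
    return float(r)
```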
#23

Delete test_rect.js - then work on microbench and typed arrays

12 read · 11 edit · 8 bash · 6 grep · 3 todo
ec99a9a
Remove test_rect.js (replaced by test_rect.py)
90a066a
Add typed arrays and fix UTF-16 test issues
4af743b
Add string methods and array improvements
0882788
Add parametrized tests for test_builtin.js and fix VM issues
4b5c19c
Remove C QuickJS comparison from tests
6229c9c
Add complete TypedArray support with ArrayBuffer backing
9d23ad2
Fix regex lookahead captures and control character escapes
0c3f291
Update xfail reasons for remaining test_builtin.js tests
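Commits 90a066a and 6229c9c add typed arrays. One detail worth pinning down is element conversion: Uint8ClampedArray uses ToUint8Clamp, which clamps to 0..255 and rounds halves to even, unlike the wrapping conversions of the other integer arrays. A minimal sketch (Uint8ClampedArraySketch is a hypothetical class with no ArrayBuffer sharing):

```python
import math


def to_uint8_clamp(x) -> int:
    """ToUint8Clamp: NaN -> 0, clamp to [0, 255], round halves to even."""
    if math.isnan(x) or x <= 0:
        return 0
    if x >= 255:
        return 255
    return round(x)  # Python's round() already rounds halves to even


class Uint8ClampedArraySketch:
    """Hypothetical minimal Uint8ClampedArray backed by a bytearray."""

    def __init__(self, length: int):
        self.buffer = bytearray(length)

    def __getitem__(self, index: int):
        # Out-of-range reads give undefined (None here)
        return self.buffer[index] if 0 <= index < len(self.buffer) else None

    def __setitem__(self, index: int, value) -> None:
        # Out-of-range writes are silently ignored, as for JS typed arrays
        if 0 <= index < len(self.buffer):
            self.buffer[index] = to_uint8_clamp(value)


arr = Uint8ClampedArraySketch(4)
arr[0], arr[1], arr[2], arr[3] = 300, -5, 2.5, 3.5
arr[10] = 1  # ignored
```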
#24

Make a detailed todo list for the remaining xfail issues, then write fine grained xfail pytest tests for each of the features that need fixing and commit and push those

11 bash · 6 todo · 1 write · 1 edit

Done! I've created tests/test_known_issues.py with fine-grained xfail tests for all remaining issues. Here's a summary:

Known Issues Summary (14 xfail tests)

Indirect Eval (1 issue)

Test | Issue
test_indirect_eval_writes_global | (1,eval)("z = 3") doesn't persist writes to existing global variables

Regex Capture Groups (3 issues)

Test | Issue
test_capture_group_reset_in_repetition | Groups in (...)* don't reset to undefined when not matched
test_optional_lookahead_no_match | Optional (?:(?=(abc)))? retains capture when not needed
test_repeated_optional_lookahead | Same issue with {0,2} quantifier

Regex Alternation (1 issue)

Test | Issue
test_empty_alternative_in_repetition | (?:|[\w])+ empty alternative doesn't work

Regex Character Class (1 issue)

Test | Issue
test_backspace_string_literal | String "\b" not parsed as backspace character

Regex Unicode (1 issue)

Test | Issue
test_lastindex_surrogate_pair | lastIndex not reset to 0 for surrogate pair edge case

Error Line/Column (3 issues)

Test | Issue
test_error_has_line_number | Error.lineNumber returns None
test_error_has_column_number | Error.columnNumber returns None
test_syntax_error_position | SyntaxError doesn't report position

Deep Nesting (4 issues)

Test | Issue
test_deep_nested_parens | 1000 nested () causes recursion overflow
test_deep_nested_braces | 1000 nested {} causes recursion overflow
test_deep_nested_arrays | 1000 nested [] causes recursion overflow
test_deep_nested_regex_groups | 10000 regex groups causes overflow
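The deep-nesting failures come from recursive-descent parsing: each nesting level costs Python stack frames, and CPython's default recursion limit is 1000. The usual remedy is an explicit stack. A minimal sketch for nested array literals only (parse_nested_arrays is a hypothetical helper that handles just brackets and non-negative integers; commas and whitespace are skipped rather than validated):

```python
import string


def parse_nested_arrays(src: str) -> list:
    """Parse e.g. "[1, [2, [3]]]" without recursion; returns the top-level values."""
    root: list = []
    stack = [root]  # innermost open array is stack[-1]
    i = 0
    while i < len(src):
        ch = src[i]
        if ch == "[":
            child: list = []
            stack[-1].append(child)
            stack.append(child)
        elif ch == "]":
            if len(stack) == 1:
                raise SyntaxError(f"unexpected ']' at offset {i}")
            stack.pop()
        elif ch in string.digits:
            j = i
            while j < len(src) and src[j] in string.digits:
                j += 1
            stack[-1].append(int(src[i:j]))
            i = j
            continue
        elif ch not in ", \t\n":
            raise SyntaxError(f"unexpected {ch!r} at offset {i}")
        i += 1
    if len(stack) != 1:
        raise SyntaxError("unclosed '['")
    return root


# 5000 levels: far beyond the default recursion limit, but no recursion happens
deep = parse_nested_arrays("[" * 5000 + "]" * 5000)
```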

Test Results

  • 10 passed (working features with tests)
  • 14 xfailed (known issues documented)
  • Full suite: 435 passed, 20 xfailed
e6602d3
Add fine-grained xfail tests for known issues
#25

Do the error line column work

15 read · 13 edit · 7 grep · 4 todo · 4 bash · 1 glob
e2eff67
Add error line/column tracking for thrown errors
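One common way to report positions, sketched here under the assumption that the compiler records a source offset for each instruction (the project's actual mechanism isn't shown), is to precompute line-start offsets once and binary-search them when an error is thrown:

```python
import bisect


def line_starts(source: str) -> list[int]:
    """Offsets at which each line begins (line 1 starts at offset 0)."""
    starts = [0]
    for i, ch in enumerate(source):
        if ch == "\n":
            starts.append(i + 1)
    return starts


def offset_to_line_col(starts: list[int], offset: int) -> tuple[int, int]:
    """Map a 0-based source offset to a 1-based (line, column) pair."""
    line_index = bisect.bisect_right(starts, offset) - 1
    return line_index + 1, offset - starts[line_index] + 1


source = "var a = 1;\nfunction f() {\n  throw new Error('x');\n}"
starts = line_starts(source)
throw_pos = offset_to_line_col(starts, source.index("throw"))
```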
#26

Show me your todo list

#27

List the remaining xfails

2 bash

Here are the 20 remaining xfail tests, organized by category:

Indirect Eval (1 test)

  • TestIndirectEval::test_indirect_eval_writes_global - Indirect eval should write to global scope

Regex Capture Groups (3 tests)

  • TestRegexCaptureGroups::test_capture_group_reset_in_repetition - Capture groups reset on each repetition
  • TestRegexCaptureGroups::test_optional_lookahead_no_match - Optional lookahead with no match
  • TestRegexCaptureGroups::test_repeated_optional_lookahead - Repeated optional lookahead

Regex Alternation (1 test)

  • TestRegexAlternation::test_empty_alternative_in_repetition - Empty alternative in repetition

Regex Character Class (1 test)

  • TestRegexCharacterClass::test_backspace_string_literal - Backspace in string literal

Regex Unicode (1 test)

  • TestRegexUnicode::test_lastindex_surrogate_pair - lastIndex with surrogate pairs

Error Line/Column (3 tests)

  • TestErrorLineColumn::test_error_constructor_has_line_number - Error constructor location tracking
  • TestErrorLineColumn::test_error_constructor_has_column_number - Error constructor location tracking
  • TestErrorLineColumn::test_syntax_error_position - SyntaxError position in message

Deep Nesting (4 tests)

  • TestDeepNesting::test_deep_nested_parens - Deep nested parentheses
  • TestDeepNesting::test_deep_nested_braces - Deep nested braces
  • TestDeepNesting::test_deep_nested_arrays - Deep nested arrays
  • TestDeepNesting::test_deep_nested_regex_groups - Deep nested regex groups

JS File Tests (6 tests)

  • test_builtin_function[test_global_eval] - Global eval behavior
  • test_builtin_function[test_large_eval_parse_stack] - Large eval parse stack
  • test_builtin_function[test_regexp] - Regex test suite
  • test_builtin_function[test_line_column_numbers] - Line/column numbers
  • test_mquickjs_js[microbench.js] - Microbenchmark suite
  • test_mquickjs_js[test_builtin.js] - Built-in test suite
#28

Propose a plan for fixing all those regex ones, what would need to be changed?

2 grep · 2 glob · 2 read

Now I have a clear picture. Let me propose a detailed plan:


Plan for Fixing Regex xfail Tests

Issue 1: Capture Group Reset in Repetitions

Files: src/mquickjs_python/regex/compiler.py, src/mquickjs_python/regex/vm.py

Problem: Pattern /(z)((a+)?(b+)?(c))*/ on "zaacbbbcac" should return group 4 as None, because the final iteration matches "ac", which contains no b. Instead it returns "bbb", retained from the previous iteration.

Root cause: The SAVE_RESET opcode exists but isn't being emitted at the start of each loop iteration for repetitions containing capture groups.

Fix:
  1. In the compiler, when compiling quantifiers (*, +, ?, {n,m}), detect all capture groups inside the repeated expression
  2. Emit SAVE_RESET for those capture groups at the start of each loop iteration

This ensures captures inside a repeated group are reset to undefined before each iteration.
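A sketch of the compiler side of this fix, using hypothetical tuple instructions (the project's real opcode encoding isn't shown here). It relies on the fact that capture groups nested inside one sub-expression always have consecutive indices, so a single SAVE_RESET lo..hi covers them. The empty-iteration guard (CHECK_ADVANCE) is deliberately left out, and body is assumed to contain no jumps that would need relocating:

```python
def compile_star(body, captures, start_pc=0):
    """Emit greedy body* with captures reset at the start of each iteration.

    body: instruction tuples for the repeated sub-expression (no internal jumps)
    captures: indices of capture groups that appear inside body
    """
    code = [None]  # slot for SPLIT_FIRST, patched once the exit address is known
    if captures:
        # Groups inside one sub-expression are numbered consecutively
        code.append(("SAVE_RESET", min(captures), max(captures)))
    code.extend(body)
    code.append(("JUMP", start_pc))  # back to the split for another iteration
    exit_pc = start_pc + len(code)
    # Try the body first (greedy); on backtrack, continue after the loop
    code[0] = ("SPLIT_FIRST", exit_pc)
    return code
```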


Issue 2 & 3: Optional Lookahead Captures

Files: src/mquickjs_python/regex/compiler.py

Problem: /(?:(?=(abc)))?a/ on "abc" should return group 1 as None, but captures "abc".

Root cause: Optional groups (?) containing captures don't reset those captures when the optional path is skipped.

Fix: Same as Issue 1 - emit SAVE_RESET for captures inside optional groups when taking the "skip" path in the alternation.


Issue 4: Empty Alternative in Repetition

Files: src/mquickjs_python/regex/compiler.py

Problem: /(?:|[\w])+([0-9])/ on "123a23" matches only "1" instead of "123a23".

Root cause: The empty alternative always succeeds immediately, so the engine takes the empty path on each iteration. The implementation may not be rejecting that empty iteration and falling back to the [\w] branch, so the repetition never consumes any characters.

Fix:
  1. When compiling alternation where one branch is empty, ensure the non-empty branch is tried first (greedy behavior)
  2. Check that SPLIT_FIRST is used correctly - it should try the first (greedy) path and backtrack to the alternative
  3. The issue may be in how CHECK_ADVANCE interacts with empty matches - need to ensure progress is being made
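For reference, ECMAScript's rule is that once the minimum count is satisfied, an iteration of a quantified atom that matches the empty string fails, so the matcher moves on to the atom's next alternative ([\w]) instead of looping. The generator-based sketch below (a different technique from the project's bytecode VM; all names are illustrative) hard-codes /(?:|[\w])+([0-9])/ and rejects empty iterations in star, which is the role the plan assigns to CHECK_ADVANCE:

```python
import string
from typing import Callable, Iterator

WORD_CHARS = set(string.ascii_letters + string.digits + "_")
Atom = Callable[[str, int], Iterator[int]]


def empty_or_word(text: str, pos: int) -> Iterator[int]:
    # Atom (?:|[\w]): the empty alternative is tried first, then one word char
    yield pos
    if pos < len(text) and text[pos] in WORD_CHARS:
        yield pos + 1


def star(atom: Atom, text: str, pos: int) -> Iterator[int]:
    # Greedy atom*: yield candidate end positions, longest first
    for mid in atom(text, pos):
        if mid == pos:
            continue  # empty iteration fails; try the atom's next alternative
        yield from star(atom, text, mid)
    yield pos  # zero further iterations


def plus(atom: Atom, text: str, pos: int) -> Iterator[int]:
    # atom+ = one mandatory iteration (which may be empty), then atom*
    for mid in atom(text, pos):
        yield from star(atom, text, mid)


def match_at(text: str, start: int = 0):
    # /(?:|[\w])+([0-9])/ anchored at start; returns the matched text or None
    for end in plus(empty_or_word, text, start):
        if end < len(text) and text[end] in string.digits:
            return text[start:end + 1]
    return None
```

Without the mid == pos check, star would call itself at the same position forever.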


Issue 5: Backspace String Literal

Files: src/mquickjs_python/lexer.py

Problem: "\b" in JS string should be backspace (0x08), but currently \b is treated as unknown escape → just b.

Fix: Add \b handling to the escape sequence switch in _read_string:

elif escape == "b":
    result.append("\x08")  # Backspace character

This is a 1-line fix.
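The eventual fix (84c5301) also added \f and \v, and a lookup table covers all single-character escapes at once. A sketch with a hypothetical decode_simple_escapes helper that handles only single-character escapes (no \x, \u, \0 or line continuations); unknown escapes fall back to the character itself, as in JavaScript string literals:

```python
SIMPLE_ESCAPES = {
    "b": "\b",  # backspace, U+0008
    "f": "\f",  # form feed, U+000C
    "n": "\n",
    "r": "\r",
    "t": "\t",
    "v": "\v",  # vertical tab, U+000B
}


def decode_simple_escapes(body: str) -> str:
    """Decode backslash escapes in the raw text between the quotes."""
    out = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch == "\\" and i + 1 < len(body):
            nxt = body[i + 1]
            out.append(SIMPLE_ESCAPES.get(nxt, nxt))  # identity escape otherwise
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)
```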


Issue 6: Surrogate Pair lastIndex

Files: src/mquickjs_python/regex/regex.py

Problem: In unicode mode, if lastIndex points to the second half of a surrogate pair, exec should fail and reset to 0. Currently advances to 2.

Fix: In the exec and test methods, when unicode mode is enabled:
  1. Before matching, check whether lastIndex points to the second half (low surrogate) of a surrogate pair
  2. If so, the match should fail and lastIndex should reset to 0 (not advance)
  3. Add a helper function _is_valid_unicode_index(string, index) that performs this check

def _is_valid_unicode_index(string: str, index: int) -> bool:
    """Check if index is valid in unicode mode (not in the middle of a surrogate pair)."""
    if index <= 0 or index >= len(string):
        return True
    prev_unit = ord(string[index - 1])
    curr_unit = ord(string[index])  # safe: index < len(string) checked above
    # High surrogate: 0xD800-0xDBFF, low surrogate: 0xDC00-0xDFFF
    if 0xD800 <= prev_unit <= 0xDBFF and 0xDC00 <= curr_unit <= 0xDFFF:
        return False  # index points at the low half of a pair
    return True

Summary Table

Issue | Difficulty | Files to Change
Capture reset in repetitions | Medium | regex/compiler.py
Optional lookahead captures | Medium | regex/compiler.py (same fix as #1)
Empty alternative | Medium | regex/compiler.py
Backspace \b | Easy | lexer.py (1 line)
Surrogate pair lastIndex | Easy | regex/regex.py

The backspace and surrogate pair fixes are straightforward. The capture group reset issues require understanding the compiler's quantifier emission logic and ensuring SAVE_RESET is properly emitted.

#29

Fix those issues, simplest first, commit each time a previously xfailing test passes

38 bash · 24 read · 16 edit · 8 todo · 8 grep · 1 write

Done! Here's a summary of what was accomplished:

Fixes Implemented

Fix | Commit | Description
Backspace \b | 84c5301 | Added \b, \f, \v escape sequences to lexer
Surrogate pair lastIndex | 879f8d1 | UTF-16 index handling in regex unicode mode
Empty alternative in repetition | 626b64f | Moved CHECK_ADVANCE before SPLIT_FIRST in + quantifier
Capture group reset | 65c2af0 | Added SAVE_RESET for captures in * and + quantifiers
Optional quantifier captures | 2a976af | Added SAVE_RESET to optional ? quantifier
Indirect eval writes | a7fdced | Changed globals from copy to shared reference
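The indirect-eval fix (a7fdced) addresses an aliasing issue: if eval runs against a copy of the globals mapping, its writes land in the copy and disappear. A minimal illustration, using a plain dict as a stand-in for the global object and a hypothetical run_in_scope helper:

```python
def run_in_scope(globals_map, code, share=True):
    """Run code (a callable taking a scope dict) against the globals.

    share=False mimics the bug: eval gets a snapshot, so writes are lost.
    """
    scope = globals_map if share else dict(globals_map)
    code(scope)


def assign_z(scope):  # stands in for evaluating "z = 3"
    scope["z"] = 3


buggy = {"z": 1}
run_in_scope(buggy, assign_z, share=False)

fixed = {"z": 1}
run_in_scope(fixed, assign_z)
```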

Documentation

Created open-problems.md documenting 15 remaining xfail issues grouped by category:
  • Deep nesting (5 tests): Requires parser restructuring from recursive to iterative
  • Error location tracking (3 tests): Need to pass VM location to Error constructor
  • Lookahead capture semantics (2 tests): Complex ECMAScript spec edge cases
  • Global eval edge cases (1 test): Some indirect eval scope issues remain
  • Test suite failures (4 tests): Comprehensive test suites exercise remaining edge cases

Test Results

  • Before: 438 passed, 20 xfailed
  • After: 443 passed, 15 xfailed
84c5301
Fix \b escape sequence in string literals
879f8d1
Fix surrogate pair handling in regex lastIndex
626b64f
Fix empty alternative in repeated groups
65c2af0
Fix capture group reset in repetitions
2a976af
Add capture reset to optional quantifier
a7fdced
Fix indirect eval writes to global scope
c22e741
Add open-problems.md documenting remaining xfail issues
#30

Update the README. Remove the installation bit. Think about whether it will remain accurate and apply small necessary changes.

1 read · 1 edit
1c847cb
Update README with features, examples, and known limitations
e6287e0
Update README with features, examples, and known limitations
#31

It's not inspired by quickjs it's inspired by https://github.com/bellard/mquickjs MicroQuickJS

1 edit
0119299
Update README with features, examples, and known limitations
#32

Drop the uv sync it is not needed

1 edit · 1 bash
2a14ba0
Update README with features, examples, and known limitations
#33

Read the whole README file yourself

1 read
#34

Last line should be based on MicroQuickJS by Fabrice Bellard

1 edit · 1 bash
b96e33a
Update README with features, examples, and known limitations
#35

OK push

1 bash