Testing Strategy
Principles
- TDD (Red-Green-Refactor): Write the failing test first, implement minimal code, refactor.
- Property-based testing: Use fast-check to verify invariants over generated inputs. Tests define equivalence classes — partitions of the input space where the system must behave uniformly — rather than enumerating individual examples.
- Unknown = error: The compiler never silently succeeds when it cannot verify. See ADR-010.
Test Categories
Parser Tests (test/parsing/)
Verify that .ddd source parses into the expected AST. Use the parseValid helper and ddd tagged template from test/parsing/helpers.ts.
Validation Tests (test/validation/)
Two kinds per validation rule:
- Integration tests (.test.ts): Parse .ddd source via Langium's validationHelper and check diagnostics for expected errors/warnings.
- Property-based tests (.property.test.ts): Build mock AST nodes directly and verify validation invariants with fast-check arbitraries.
Shared helpers: validate, ddd, expectError, expectWarning, expectNoIssues, expectErrorCount from test/validation/helpers.ts.
Code Generation Tests (packages/generator-emmett/test/)
Tests for the Emmett code generator verify that .ddd AST nodes produce correct TypeScript output.
Mock AST pattern: Test fixtures build plain JavaScript objects with $type discriminators and { ref: ... } cross-references, typed as any. This avoids coupling tests to the Langium runtime while preserving the AST shape that generators consume.
Fixture builder pattern: Complex decider fixtures (e.g., buildRegistrationDecider()) construct a full decider AST with commands, events, states, decisions, and evolutions. Each fixture exercises a specific code generation path.
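The mock AST and fixture builder patterns can be sketched roughly as follows. This is a minimal illustration, not the project's fixtures: buildMinimalDecider, generateEventUnion, and the property names (commands, events, decisions, emits) are hypothetical stand-ins for whatever the real grammar and Emmett generator define.

```typescript
// Hypothetical mock node shape; the real node types come from the Langium grammar.
type MockNode = { $type: string; name: string; [key: string]: unknown };

// Fixture builder (hypothetical): a small decider with one command, one event, one decision.
// Typed as `any` so the test does not depend on the Langium runtime.
function buildMinimalDecider(): any {
  const register: MockNode = { $type: "Command", name: "Register" };
  const registered: MockNode = { $type: "Event", name: "Registered" };
  return {
    $type: "Decider",
    name: "Registration",
    commands: [register],
    events: [registered],
    // Cross-references use the { ref: node } shape that generators read.
    decisions: [
      { $type: "Decision", command: { ref: register }, emits: [{ ref: registered }] },
    ],
  };
}

// Toy generator standing in for the real code generator (assumption).
function generateEventUnion(decider: any): string {
  const names = decider.events.map((e: MockNode) => `"${e.name}"`);
  return `export type ${decider.name}Event = ${names.join(" | ")};`;
}

const output = generateEventUnion(buildMinimalDecider());
console.log(output);
```

Because the fixture is a plain object, a test can follow a cross-reference with decider.decisions[0].command.ref exactly as a generator would, without parsing any .ddd source.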
Property tests: test/generators/codegen.property.test.ts uses fast-check arbitraries from test/arbitraries/codegen.ts to generate random CodegenDeciderSpec values and verify structural invariants (e.g., generated output contains the correct event type names, state type names, and function signatures).
Assertion layering: Tests combine behavioral assertions (toContain, toEqual) with snapshot assertions (toMatchSnapshot) for defense in depth. See Assertion Strength Guidance below.
Snapshot Testing
Snapshot tests capture the exact output of generators as regression guards. When a code change modifies generated output — even whitespace or punctuation — the snapshot comparison fails, forcing explicit review.
Tool: Bun’s built-in snapshot API.
APIs:
| API | Storage | Use Case |
|---|---|---|
| toMatchSnapshot() | __snapshots__/<test-file>.snap (colocated) | Default; serializes to an external file |
| toMatchInlineSnapshot() | Inline in test source (auto-populated on first run) | Small values where inline is clearer |
| toThrowErrorMatchingSnapshot() | External .snap file | Error message regression |
Snapshot file format: Files start with a // Bun Snapshot v1 header, followed by one entry per snapshot of the form exports["test name 1"] = `"value"`; (the serialized value is wrapped in backticks). Test names become snapshot keys.
Workflow:
- Add expect(result).toMatchSnapshot(). The first run auto-creates the .snap file.
- Review git diff __snapshots__/. Every snapshot change must be intentional.
- After intentional output changes, run bun test --update-snapshots to regenerate.
Git: __snapshots__/ directories are committed to version control. Reviewers see snapshot diffs in PRs.
When to use: After behavioral tests verify correctness, as a safety net for remaining string-level mutations.
When NOT to use: As the sole assertion. Snapshots encode current behavior (including bugs). Pair with behavioral assertions that verify correctness.
Stryker interaction: Snapshot mismatches cause non-zero exit from bun test, so Stryker’s command-runner registers the mutant as killed.
Arbitraries (test/arbitraries/)
Custom fast-check arbitraries derived from the Langium grammar via LLM-assisted generation. The grammar defines the shape of valid AST nodes; the arbitraries mirror this structure to produce random but grammatically valid inputs.
Grammar → Arbitrary mapping: Each grammar rule (e.g., Decider, Decision, Evolution) maps to an arbitrary that generates structurally valid mock AST nodes. The LLM reads the grammar and produces arbitraries that respect the rule’s cardinalities, references, and type constraints.
Equivalence classes: Arbitraries partition the input space into classes that exercise distinct code paths:
| Class | What It Covers | Example |
|---|---|---|
| Complete deciders | All (Command, State) pairs have decide clauses | No exhaustiveness errors |
| Incomplete deciders | Missing decide clauses for some pairs | Exhaustiveness errors expected |
| Guarded decisions | Decisions with require guards | Guard consistency checks |
| Unguarded decisions | Short-form decisions without guards | No guard checks needed |
| Terminal states | States marked as terminal | Terminal state enforcement |
| Dead declarations | Commands/events declared but unused | Dead code warnings |
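To make the first two classes concrete, here is a small sketch of what separates a complete decider from an incomplete one. The DeciderSpec shape and the missingPairs function are hypothetical and are not the project's arbDeciderSpec or its exhaustiveness check; they only illustrate how membership in a class determines the expected diagnostics.

```typescript
// Hypothetical spec shape; the project's real spec type may differ.
interface DeciderSpec {
  commands: string[];
  states: string[];
  // Each entry means "there is a decide clause for this (command, state) pair".
  decisions: Array<{ command: string; state: string }>;
}

// Returns every (command, state) pair that has no decide clause.
function missingPairs(spec: DeciderSpec): string[] {
  const covered = new Set(spec.decisions.map((d) => `${d.command}/${d.state}`));
  const missing: string[] = [];
  for (const command of spec.commands) {
    for (const state of spec.states) {
      const key = `${command}/${state}`;
      if (!covered.has(key)) missing.push(key);
    }
  }
  return missing;
}

// Member of the "complete deciders" class: every pair is covered.
const complete: DeciderSpec = {
  commands: ["Register"],
  states: ["Initial", "Registered"],
  decisions: [
    { command: "Register", state: "Initial" },
    { command: "Register", state: "Registered" },
  ],
};

// Dropping one clause moves the spec into the "incomplete deciders" class.
const incomplete: DeciderSpec = { ...complete, decisions: complete.decisions.slice(0, 1) };

console.log(missingPairs(complete).length);   // no missing pairs
console.log(missingPairs(incomplete));        // one missing pair: Register/Registered
```

Every member of the complete class must yield zero missing pairs, and every member of the incomplete class at least one. That uniform behavior is what lets a single property cover a whole class.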
Key exports:
- arbDeciderSpec: generates decider configurations across equivalence classes
- buildMockDecider(spec): converts a spec to a mock Langium AST Decider node
- collectErrors(): creates a mock ValidationAcceptor that captures emitted diagnostics
Property-Based Test Design
Each validation rule has a corresponding .property.test.ts file that defines properties over equivalence classes rather than individual test cases.
Structure of a property test:
- Define the equivalence class via a constrained arbitrary (e.g., “deciders where all Command × State pairs are covered”)
- State the invariant that must hold for all members of the class (e.g., “no exhaustiveness errors are emitted”)
- Let fast-check explore the space with random inputs (numRuns: 100)
Example pattern:

    test('complete deciders produce no exhaustiveness errors', () => {
      fc.assert(
        fc.property(arbDeciderSpec({ complete: true }), (spec) => {
          const decider = buildMockDecider(spec);
          const { errors } = collectErrors();
          checkExhaustiveness(decider, errors);
          expect(errors()).toHaveLength(0);
        }),
        { numRuns: 100 },
      );
    });

This single property replaces dozens of example-based tests by verifying the invariant holds across the entire equivalence class.
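The three steps (constrained generator, invariant, many random runs) can also be mimicked without any library, which may help when reasoning about what a property actually checks. The sketch below is a dependency-free stand-in and is not fast-check's API: seededRandom, genSpecWithDeadCommand, deadCommands, and checkProperty are all hypothetical names, and the runner omits features such as shrinking that fast-check provides.

```typescript
// Small seeded linear congruential generator so a failing run can be reproduced.
function seededRandom(seed: number): () => number {
  let state = seed >>> 0;
  return () => {
    state = (Math.imul(state, 1664525) + 1013904223) >>> 0;
    return state / 4294967296; // value in [0, 1)
  };
}

interface Spec {
  commands: string[];
  decisions: Array<{ command: string }>;
}

// Step 1: constrained generator for the "dead declarations" class.
// Every generated spec declares at least one command that no decision uses.
function genSpecWithDeadCommand(rand: () => number): Spec {
  const count = 2 + Math.floor(rand() * 4); // 2..5 commands
  const commands = Array.from({ length: count }, (_, i) => `Cmd${i}`);
  const usedCount = Math.floor(rand() * count); // 0..count-1, so one stays unused
  const decisions = commands.slice(0, usedCount).map((command) => ({ command }));
  return { commands, decisions };
}

// Toy system under test: reports declared-but-unused commands.
function deadCommands(spec: Spec): string[] {
  const used = new Set(spec.decisions.map((d) => d.command));
  return spec.commands.filter((c) => !used.has(c));
}

// Step 3: minimal runner that checks the invariant on numRuns generated inputs.
function checkProperty<T>(
  gen: (rand: () => number) => T,
  invariant: (value: T) => boolean,
  numRuns = 100,
  seed = 42,
): void {
  const rand = seededRandom(seed);
  for (let run = 0; run < numRuns; run++) {
    const value = gen(rand);
    if (!invariant(value)) {
      throw new Error(`Property failed on run ${run} (seed ${seed}): ${JSON.stringify(value)}`);
    }
  }
}

// Step 2: the invariant. Every member of the class yields at least one dead command.
checkProperty(genSpecWithDeadCommand, (spec) => deadCommands(spec).length > 0);
```

The invariant is stated once and never mentions a concrete input. The generator alone decides which part of the input space is explored.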
Running Tests

    # All packages
    bun test

    # Single package
    cd packages/language && bun test

    # Single test file
    cd packages/language && bun test test/validation/guard-consistency.test.ts

    # With filter
    bun test --filter "exhaustiveness"

Mutation Testing

    # Single package
    cd packages/generator-emmett && bun run stryker
    cd packages/language && bun run stryker

Mutation Testing
Stryker verifies that the test suite detects code changes — a test suite that passes when code is mutated is not providing real coverage.
Configuration
All packages with tests have Stryker configs (stryker.config.json). Stryker runs in command-runner mode because Bun has no native Stryker plugin — each mutant spawns a full bun test invocation.
Reporters configured: JSON (machine-readable), HTML (visual), clear-text (terminal).
Running Mutation Tests

    # Single package
    cd packages/generator-emmett && bun run stryker

    # Language package
    cd packages/language && bun run stryker

CI Integration
The .forgejo/workflows/mutation-testing.yml workflow runs Stryker on push and uploads the JSON and HTML reports as artifacts with 30-day retention.
Reading Mutation Reports
The JSON report at reports/mutation/mutation.json contains per-file mutant data. To extract survivor counts:

    jq '.files | to_entries[] | {file: .key, survived: [.value.mutants[] | select(.status == "Survived")] | length}' reports/mutation/mutation.json

The HTML report provides a navigable view. Open reports/mutation/mutation.html in a browser to see highlighted source with mutant status overlays.
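The same survivor count can be computed in TypeScript, for instance inside a script that ranks files by surviving mutants. This is a sketch under assumptions: the MutationReport interface models only the fields the jq filter reads (files, mutants, status), survivorCounts is a hypothetical helper, and the sample data is hand-written rather than real project output.

```typescript
// Only the fields used by the jq filter are modeled; the real report contains more.
interface MutationReport {
  files: Record<string, { mutants: Array<{ status: string }> }>;
}

// Counts mutants with status "Survived" per file.
function survivorCounts(report: MutationReport): Array<{ file: string; survived: number }> {
  return Object.entries(report.files).map(([file, { mutants }]) => ({
    file,
    survived: mutants.filter((m) => m.status === "Survived").length,
  }));
}

// Hand-written sample data for illustration only.
const sample: MutationReport = {
  files: {
    "src/a.ts": { mutants: [{ status: "Killed" }, { status: "Survived" }] },
    "src/b.ts": {
      mutants: [{ status: "Survived" }, { status: "Survived" }, { status: "NoCoverage" }],
    },
  },
};

// Sort so the files with the most survivors come first.
const ranked = survivorCounts(sample).sort((x, y) => y.survived - x.survived);
console.log(ranked);
```

In practice the report would be loaded with JSON.parse from reports/mutation/mutation.json instead of the inline sample.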
Assertion Strength Guidance
Mutation testing reveals that assertion choice directly affects mutant kill rate. The following table orders assertions by their effectiveness against Stryker’s mutators:
| Assertion | Mutant Kill Rate | Maintenance Cost | Use For |
|---|---|---|---|
| toContain(token) | ~50% | Low | Property tests, behavioral checks where exact output varies |
| Line-split + toContain | ~70% | Medium | Isolating assertions to specific output regions |
| toEqual(exactString) | ~100% | High (brittle to formatting) | Small static generators with fixed output |
| toMatchSnapshot() | ~100% | Medium (update on intentional change) | Regression guard on all generators |
Recommended layering:
- Behavioral toContain: verify presence of critical tokens (event names, type tags, keywords).
- toEqual for static generators: small generators with fixed templates (e.g., Emmett wiring, type aliases).
- toMatchSnapshot: catch remaining string-level mutations (indentation, empty lines, decorative punctuation).
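The StringLiteral mutator dominates code generator mutation testing. It replaces string literals and template strings with "". toContain often misses these mutants: the emptied string usually contributed text the assertion never checks (indentation, separators, closing punctuation), and when the expected token is itself derived from the mutated source, toContain("") passes trivially because the empty string is contained in any string. toEqual and toMatchSnapshot compare the full output and catch this mutator class.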
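The effect is easy to reproduce by hand. In the sketch below, generateEvent is a toy generator and generateEventMutant is a hand-written copy with one string literal emptied, imitating what a StringLiteral mutant would look like. Neither is project code, and plain string operations stand in for the toContain and toEqual matchers.

```typescript
// Toy generator standing in for a real code generator (assumption).
function generateEvent(name: string): string {
  return "export type " + name + " = {\n" + "  type: '" + name + "';\n" + "};\n";
}

// Hand-written mutant: the closing "};\n" literal has been replaced with "".
function generateEventMutant(name: string): string {
  return "export type " + name + " = {\n" + "  type: '" + name + "';\n" + "";
}

const expected = "export type Registered = {\n  type: 'Registered';\n};\n";
const original = generateEvent("Registered");
const mutant = generateEventMutant("Registered");

// Token check (the substring test toContain performs): cannot tell the two apart.
console.log(original.includes("type: 'Registered'"), mutant.includes("type: 'Registered'")); // true true

// Exact comparison (what toEqual or a snapshot performs): only the original matches.
console.log(original === expected, mutant === expected); // true false
```

The mutant survives the token check because the asserted token is still present. Only a full-output comparison notices the missing closing brace.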
Test Conventions
- Test files use bun:test (describe, test, expect)
- Property tests use fc.assert(fc.property(...), { numRuns: 100 })
- Mock AST nodes use the as unknown as Type cast pattern (avoids Langium runtime dependency in unit tests)
- Integration tests parse real .ddd source for end-to-end validation