Testing Philosophy: Scenarios as Algebraic Laws

This document explains how Gherkin scenarios, step definitions, and the nested TDD loop fit into Morph’s algebraic model.

The Key Insight

Gherkin scenarios are algebraic laws expressed as examples.

In algebra, laws constrain which implementations are valid. For example, addition must be commutative: a + b = b + a. Any implementation that violates this law is incorrect.

Similarly, Gherkin scenarios define what it means to correctly implement the domain algebra. An implementation is valid if and only if it satisfies all scenarios.

Scenarios, Steps, and Interpretations

Component	Role	Category Theory
Scenario	A law the algebra must satisfy	Equation in theory T
Step definition	How to verify the law for a specific interpretation	Component of natural transformation η applied to equation
Passing tests	Evidence that the interpretation satisfies the laws	η preserves equations (naturality holds)

See Algebraic Foundations for the full functorial semantics model.

Same Scenarios, Different Step Definitions

The same scenarios can be executed against different interpretations:

Scenario: Create an item
  Given an empty list
  When I create an item titled "Widget"
  Then the list contains one item

@core steps (test the library directly):

Given("an empty list", function () {
	this.list = createList({ repo: inMemoryRepo() });
});
When("I create an item titled {string}", function (title) {
	this.result = createItem(this.list.id, title);
});

@cli steps (test via command line):

Given("an empty list", function () {
	this.listId = execSync("myapp create-list").trim();
});
When("I create an item titled {string}", function (title) {
	execSync(`myapp create-item --list ${this.listId} --title "${title}"`);
});

@api steps (test via HTTP):

Given("an empty list", function () {
	const res = await fetch("/lists", { method: "POST" });
	this.listId = (await res.json()).id;
});
When("I create an item titled {string}", function (title) {
	await fetch(`/lists/${this.listId}/items`, {
		method: "POST",
		body: JSON.stringify({ title }),
	});
});

All three execute the same scenario but verify different interpretations.

The Nested TDD Loop

The development workflow uses a nested loop structure:

OUTER LOOP (Cucumber scenarios)
│
│   Write/modify scenario
│   ↓
│   Run scenarios → RED (expected to fail)
│   ↓
│   ┌─────────────────────────────────────┐
│   │ INNER LOOP (unit tests)             │
│   │                                     │
│   │   Write unit test → RED             │
│   │   Write implementation → GREEN      │
│   │   Refactor                          │
│   │   (repeat for each helper needed)   │
│   │                                     │
│   └─────────────────────────────────────┘
│   ↓
│   Run scenarios → GREEN
│   ↓
│   Refactor (run all scenarios to verify)
│
└── Next scenario

Why Two Loops?

Outer loop — Verifies the interpretation satisfies the algebra’s laws
Inner loop — Builds the internal machinery of the implementation

The outer loop tests what the system does (behavior). The inner loop tests how it does it (mechanics).

Step Definition Generation

Morph generates step definitions for each interpretation:

Library Steps (@core)

Generated from schema operations:

Given steps set up state via repository
When steps call operations directly
Then steps assert on results/state

App Steps (@cli, @api, @mcp)

Generated from the same scenarios but different execution:

Given steps use the app interface to set up state
When steps invoke the app (shell command, HTTP request, MCP message)
Then steps verify the app’s response

Because apps are derived from operations via natural transformation, scenarios that pass against @core should also pass against apps — this validates the transformation is correct.

What This Enables

Portable specifications — Same scenarios test library and all apps
Transformation validation — If @core passes but @cli fails, the CLI generator has a bug
Confidence in derivation — Mechanical generation is verified by running the same laws
Living documentation — Scenarios describe behavior in domain language

Generation Strategy

For each interpretation, generate:

World/context — Test state shared across steps
Parameter types — Parse domain values from step text
Step definitions — Map step patterns to interpretation calls
Hooks — Before/After for setup/teardown

The step patterns come from the schema’s examples. The step bodies differ per interpretation but verify the same laws.

When Scenarios Don’t Apply: Pure Functions

Not every operation type benefits from scenario testing. Scenarios verify that stateful operations (commands and queries) produce correct results across interpretations — but functions are pure morphisms with none of the properties that make scenarios valuable.

Functions have:

No state to set up — no Given step, because functions don’t read or write repositories
No side effects to observe — no Then-state-changed step, because functions don’t emit events or mutate state
No execution context — functions don’t depend on R (environment), so there’s nothing to swap between interpretations
Identical behavior everywhere — the same pure computation in every interpretation, so running scenarios against @core vs @api vs @cli would just repeat the same test

Since scenarios are equations in theory T (see Algebraic Foundations: “functions — Pure morphisms (no state, no effects)”), and functions don’t participate in the stateful theory, scenarios add no verification value.

Pure functions are better tested with:

Direct unit tests — assert input→output for known examples
Property-based tests — use fast-check to verify algebraic laws (idempotency, commutativity, associativity) over random inputs

This is why the defineOp wrapper is intentionally not generated for functions (see context-dsl.ts: “Functions are pure and don’t need defineOp wrappers”). They don’t need the operation infrastructure — dependency injection, events, context — they’re just computations.

Morph itself only has function operations, which is why fixtures/scenarios/scenarios.ts is deliberately empty.