# Physical Combinatorics — Implementation Plan

## Context

This project systematically generates novel concepts by computing the Cartesian product of real-world attribute dimensions (platforms, power sources, and dimensions added later), then filters the combinatorial explosion through a multi-pass viability pipeline. The goal is to surface "bizarre but plausible" innovations (like hydrogen-powered bicycles) while eliminating the vast majority of nonsensical pairings. The core insight is that attributes are real things — the risk isn't bad input, it's how much noise survives the filters.

**Stack:** Python, SQLite, abstract LLM interface, CLI-first.

---

## 1. Project Structure

```
physicalCombinatorics/
├── README.md
├── IMPLEMENTATION_PLAN.md
├── pyproject.toml                  # Package config, dependencies
├── src/
│   └── physcom/
│       ├── __init__.py
│       ├── cli.py                  # CLI entry point (argparse/click)
│       ├── db/
│       │   ├── __init__.py
│       │   ├── schema.py           # DDL, table creation, migrations
│       │   └── repository.py      # CRUD operations for all entities
│       ├── models/
│       │   ├── __init__.py
│       │   ├── entity.py           # Entity, Dependency dataclasses
│       │   ├── domain.py           # Domain, MetricBound dataclasses
│       │   └── combination.py     # Combination, Score dataclasses
│       ├── engine/
│       │   ├── __init__.py
│       │   ├── combinator.py       # Cartesian product generator
│       │   ├── constraint_resolver.py  # Dependency contradiction detection
│       │   ├── scorer.py           # Multiplicative logarithmic scoring
│       │   └── pipeline.py        # Multi-pass orchestrator
│       ├── llm/
│       │   ├── __init__.py
│       │   ├── base.py             # Abstract LLM interface
│       │   ├── prompts.py          # Prompt templates for physics/social passes
│       │   └── providers/         # Concrete implementations (future)
│       │       └── __init__.py
│       └── seed/
│           └── transport_example.py  # Seed data from the README example
├── tests/
│   ├── test_constraint_resolver.py
│   ├── test_scorer.py
│   ├── test_combinator.py
│   ├── test_pipeline.py
│   └── test_repository.py
└── data/
    └── physcom.db                  # SQLite database (gitignored)
```

---

## 2. Database Schema

### Table: `dimensions`

Defines attribute categories (e.g., "platform", "power_source"). Adding a new dimension is just inserting a row — no schema change needed.

| Column | Type | Notes |
|---|---|---|
| `id` | INTEGER PK | Auto-increment |
| `name` | TEXT UNIQUE NOT NULL | e.g., "platform", "power_source" |
| `description` | TEXT | Human-readable purpose |

### Table: `entities`

Individual attributes within a dimension.

| Column | Type | Notes |
|---|---|---|
| `id` | INTEGER PK | Auto-increment |
| `dimension_id` | INTEGER FK → dimensions.id | Which dimension this belongs to |
| `name` | TEXT NOT NULL | e.g., "Bicycle", "Solar Sail" |
| `description` | TEXT | Longer description |
| UNIQUE | (dimension_id, name) | No duplicate names within a dimension |

### Table: `dependencies`

The core metadata on every entity. Uses a flexible category/key/value/unit structure so any type of dependency can be expressed without schema changes.

| Column | Type | Notes |
|---|---|---|
| `id` | INTEGER PK | Auto-increment |
| `entity_id` | INTEGER FK → entities.id | Owner entity |
| `category` | TEXT NOT NULL | One of: "environment", "force", "material", "physical", "infrastructure" |
| `key` | TEXT NOT NULL | e.g., "requires_ground", "min_mass_kg", "force_output_watts" |
| `value` | TEXT NOT NULL | e.g., "true", "500", "vacuum" |
| `unit` | TEXT | Optional unit: "kg", "watts", "celsius", etc. |
| `constraint_type` | TEXT NOT NULL | One of: "requires", "provides", "range_min", "range_max", "excludes" |

**Constraint types explained:**
- `requires` — this entity needs this condition to function (walking requires ground=true)
- `provides` — this entity supplies this condition (a sealed cabin provides atmosphere=true)
- `range_min` / `range_max` — numeric bounds (nuclear reactor: min_mass_kg=2000)
- `excludes` — this entity cannot coexist with this condition (solar sail excludes atmosphere=dense)
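
As a concrete sketch, the `dependencies` DDL might look like this in stdlib `sqlite3` (the `CHECK` clause and column layout are assumptions; `schema.py` may enforce the enum in Python instead):

```python
import sqlite3

DEPENDENCIES_DDL = """
CREATE TABLE IF NOT EXISTS dependencies (
    id              INTEGER PRIMARY KEY AUTOINCREMENT,
    entity_id       INTEGER NOT NULL REFERENCES entities(id),
    category        TEXT NOT NULL,
    key             TEXT NOT NULL,
    value           TEXT NOT NULL,
    unit            TEXT,
    constraint_type TEXT NOT NULL CHECK (constraint_type IN
        ('requires', 'provides', 'range_min', 'range_max', 'excludes'))
);
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entities (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute(DEPENDENCIES_DDL)
conn.execute(
    "INSERT INTO dependencies (entity_id, category, key, value, unit, constraint_type) "
    "VALUES (1, 'physical', 'min_mass_kg', '2000', 'kg', 'range_min')"
)
row = conn.execute("SELECT constraint_type FROM dependencies").fetchone()
print(row[0])  # range_min
```

The `CHECK` constraint makes SQLite reject rows with an unknown `constraint_type`, which keeps the enum honest even when rows are inserted outside the repository layer.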

### Table: `domains`

Context frames that define what "good" means.

| Column | Type | Notes |
|---|---|---|
| `id` | INTEGER PK | Auto-increment |
| `name` | TEXT UNIQUE NOT NULL | e.g., "urban_commuting", "interplanetary_travel" |
| `description` | TEXT | What this domain represents |

### Table: `metrics`

The measurable dimensions of viability.

| Column | Type | Notes |
|---|---|---|
| `id` | INTEGER PK | Auto-increment |
| `name` | TEXT UNIQUE NOT NULL | e.g., "speed", "cost_efficiency", "safety" |
| `unit` | TEXT | e.g., "km/h", "usd/km", "score_0_1" |
| `description` | TEXT | What this measures |

### Table: `domain_metric_weights`

Per-domain weighting and normalization bounds for each metric.

| Column | Type | Notes |
|---|---|---|
| `id` | INTEGER PK | Auto-increment |
| `domain_id` | INTEGER FK → domains.id | |
| `metric_id` | INTEGER FK → metrics.id | |
| `weight` | REAL NOT NULL | 0.0–1.0, weights within a domain should sum to 1.0 |
| `norm_min` | REAL | Lower bound for normalization (below this → score 0) |
| `norm_max` | REAL | Upper bound for normalization (above this → score 1) |
| UNIQUE | (domain_id, metric_id) | |

### Table: `combinations`

Each generated combination of entities.

| Column | Type | Notes |
|---|---|---|
| `id` | INTEGER PK | Auto-increment |
| `hash` | TEXT UNIQUE NOT NULL | Deterministic hash of sorted entity IDs (dedup) |
| `status` | TEXT NOT NULL DEFAULT 'pending' | One of: "pending", "valid", "blocked", "scored", "reviewed" |
| `block_reason` | TEXT | If blocked, why (which dependencies contradicted) |
| `created_at` | TIMESTAMP | |
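
The `hash` column can be derived like this (a sketch; the choice of SHA-256 and the join format are assumptions, any stable digest of the sorted IDs works):

```python
import hashlib

def combination_hash(entity_ids: list[int]) -> str:
    """Deterministic hash of sorted entity IDs, independent of input order."""
    canonical = "-".join(str(i) for i in sorted(entity_ids))
    return hashlib.sha256(canonical.encode()).hexdigest()

# Same entities in any order → same hash → the UNIQUE constraint dedups them
print(combination_hash([7, 3]) == combination_hash([3, 7]))  # True
```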

### Table: `combination_entities`

Junction table linking combinations to their constituent entities.

| Column | Type | Notes |
|---|---|---|
| `combination_id` | INTEGER FK → combinations.id | |
| `entity_id` | INTEGER FK → entities.id | |
| PRIMARY KEY | (combination_id, entity_id) | |

### Table: `combination_scores`

Per-metric scores for each combination within a domain.

| Column | Type | Notes |
|---|---|---|
| `id` | INTEGER PK | Auto-increment |
| `combination_id` | INTEGER FK → combinations.id | |
| `domain_id` | INTEGER FK → domains.id | |
| `metric_id` | INTEGER FK → metrics.id | |
| `raw_value` | REAL | The estimated raw metric value (e.g., 40.0 km/h) |
| `normalized_score` | REAL | 0.0–1.0 after log-normalization against domain bounds |
| `estimation_method` | TEXT | "physics_calc", "llm_estimate", "human_input" |
| `confidence` | REAL | 0.0–1.0 confidence in the estimate |
| UNIQUE | (combination_id, domain_id, metric_id) | |

### Table: `combination_results`

Final composite viability scores per domain.

| Column | Type | Notes |
|---|---|---|
| `id` | INTEGER PK | Auto-increment |
| `combination_id` | INTEGER FK → combinations.id | |
| `domain_id` | INTEGER FK → domains.id | |
| `composite_score` | REAL | Weighted geometric mean of normalized metric scores |
| `novelty_flag` | TEXT | "novel", "exists", "researched", or NULL |
| `llm_review` | TEXT | LLM-generated plausibility summary |
| `human_notes` | TEXT | Human reviewer notes |
| `pass_reached` | INTEGER | Highest pass this combination survived (1–5) |
| UNIQUE | (combination_id, domain_id) | |

### Indexes

```sql
CREATE INDEX idx_deps_entity ON dependencies(entity_id);
CREATE INDEX idx_deps_category_key ON dependencies(category, key);
CREATE INDEX idx_combo_status ON combinations(status);
CREATE INDEX idx_scores_combo_domain ON combination_scores(combination_id, domain_id);
CREATE INDEX idx_results_domain_score ON combination_results(domain_id, composite_score DESC);
```

---

## 3. Entity & Dependency Data Model (Python)

```python
from dataclasses import dataclass


@dataclass
class Dependency:
    category: str         # "environment", "force", "material", "physical", "infrastructure"
    key: str              # "requires_ground", "force_output_watts", "min_mass_kg"
    value: str            # "true", "75", "vacuum"
    unit: str | None      # "kg", "watts", etc.
    constraint_type: str  # "requires", "provides", "range_min", "range_max", "excludes"


@dataclass
class Entity:
    id: int | None
    dimension: str        # "platform", "power_source"
    name: str             # "Bicycle"
    description: str
    dependencies: list[Dependency]


@dataclass
class Combination:
    id: int | None
    entities: list[Entity]  # One per dimension
    status: str             # "pending" → "valid"/"blocked" → "scored" → "reviewed"
    block_reason: str | None
```

### Example: Entities with Full Dependencies

```python
Entity(
    name="Solar Sail",
    dimension="power_source",
    description="Propulsion via radiation pressure from a star",
    dependencies=[
        Dependency("environment", "atmosphere", "vacuum_or_thin", None, "requires"),
        Dependency("environment", "star_proximity", "true", None, "requires"),
        Dependency("physical", "surface_area", "100", "m^2", "range_min"),
        Dependency("force", "force_output_watts", "0.001", "N", "provides"),
        Dependency("force", "thrust_profile", "continuous_low", None, "provides"),
    ]
)

Entity(
    name="Walking",
    dimension="platform",
    description="Bipedal locomotion",
    dependencies=[
        Dependency("environment", "ground_surface", "true", None, "requires"),
        Dependency("environment", "gravity", "true", None, "requires"),
        Dependency("physical", "max_mass_kg", "150", "kg", "range_max"),
        Dependency("force", "force_output_watts", "75", "watts", "provides"),
        Dependency("infrastructure", "fuel_infrastructure", "none", None, "requires"),
    ]
)

Entity(
    name="Modular Nuclear Reactor",
    dimension="power_source",
    description="Small-scale fission reactor for sustained high power output",
    dependencies=[
        Dependency("physical", "min_mass_kg", "2000", "kg", "range_min"),
        Dependency("material", "radiation_shielding", "true", None, "requires"),
        Dependency("material", "coolant_system", "true", None, "requires"),
        Dependency("force", "force_output_watts", "1000000", "watts", "provides"),
        Dependency("infrastructure", "nuclear_fuel", "enriched_uranium", None, "requires"),
        Dependency("infrastructure", "regulatory_approval", "nuclear", None, "requires"),
    ]
)

Entity(
    name="Bicycle",
    dimension="platform",
    description="Two-wheeled human-scale vehicle",
    dependencies=[
        Dependency("environment", "ground_surface", "true", None, "requires"),
        Dependency("environment", "atmosphere", "standard", None, "requires"),
        Dependency("physical", "max_mass_kg", "30", "kg", "range_max"),
        Dependency("physical", "max_payload_kg", "120", "kg", "range_max"),
        Dependency("force", "force_required_watts", "50", "watts", "range_min"),
        Dependency("force", "force_required_watts", "500", "watts", "range_max"),
    ]
)
```

Note how **Bicycle + Nuclear Reactor** is caught by Rule 3: the reactor's `min_mass_kg=2000` exceeds the bicycle's `max_mass_kg=30`. Meanwhile **Bicycle + Human Pedalling** passes all checks — the force ranges overlap, the mass is compatible, and no environmental contradictions exist.

---

## 4. Constraint Resolution Engine

**File:** `src/physcom/engine/constraint_resolver.py`

The resolver takes a `Combination` (a set of entities, one per dimension) and checks all dependencies for contradictions. It returns `VALID`, `BLOCKED`, or `CONDITIONAL` with reasons.

### Contradiction Rules

**Rule 1: Requires vs. Excludes**
If entity A `requires` key=X and entity B `excludes` key=X → BLOCKED.

> Walking requires `ground_surface=true`; if a hypothetical power source excluded ground operation, the combination is impossible.

**Rule 2: Mutual Exclusion**
If entity A `requires` key=X and entity B `requires` key=Y where X and Y are mutually exclusive values of the same key → BLOCKED.

> Solar sail requires `atmosphere=vacuum_or_thin`; a ground platform requires `atmosphere=standard` → contradiction on the `atmosphere` key.

This requires a mutual exclusion registry:

```python
MUTEX_VALUES = {
    "atmosphere": [{"vacuum", "vacuum_or_thin"}, {"dense", "standard"}],
    "medium": [{"ground"}, {"water"}, {"air"}, {"space"}],
}
```
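
Given such a registry, Rule 2 reduces to a group lookup. A minimal sketch (the function name and registry shape are assumptions, not the actual resolver API):

```python
def values_conflict(registry: dict[str, list[set[str]]],
                    key: str, v1: str, v2: str) -> bool:
    """True if v1 and v2 fall into *different* mutual-exclusion groups for key."""
    groups = registry.get(key, [])
    g1 = next((g for g in groups if v1 in g), None)
    g2 = next((g for g in groups if v2 in g), None)
    # Values in the same group (or unknown values) are compatible
    return g1 is not None and g2 is not None and g1 is not g2

registry = {"atmosphere": [{"vacuum", "vacuum_or_thin"}, {"dense", "standard"}]}
print(values_conflict(registry, "atmosphere", "vacuum_or_thin", "standard"))  # True
print(values_conflict(registry, "atmosphere", "vacuum", "vacuum_or_thin"))   # False
```

Unknown keys or values fall through to "compatible", so an incomplete registry produces false negatives rather than spurious blocks.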

**Rule 3: Range Incompatibility**
If entity A has `range_min` for key=K at value V1 and entity B has `range_max` for the same key at value V2, and V1 > V2 → BLOCKED.

> Nuclear reactor `range_min` mass=2000kg vs. Bicycle `range_max` mass=30kg → the reactor cannot physically fit on the bicycle.

**Rule 4: Force Scale Mismatch**
A specialized range check on force-related keys. If a platform requires a minimum force that the power source cannot provide, or the power source outputs force orders of magnitude beyond what the platform can structurally handle → flag. This may be a soft constraint (CONDITIONAL with warning) rather than a hard BLOCKED, since some edge cases are debatable.

**Rule 5: Unmet Requirements**
If entity A `requires` key=K but no other entity in the combination `provides` that key, AND it is not an ambient environmental assumption → `CONDITIONAL`. The combination works only if the missing requirement is externally supplied.

> A hydrogen engine requires `fuel_infrastructure=hydrogen_station`. If no other entity provides it, the combination is conditionally viable — it works where hydrogen stations exist.

### Resolver Interface

```python
from dataclasses import dataclass


@dataclass
class ConstraintResult:
    status: str            # "valid", "blocked", "conditional"
    violations: list[str]  # Human-readable descriptions of hard blocks
    warnings: list[str]    # Soft constraint notes


class ConstraintResolver:
    def __init__(self, mutex_registry: dict): ...

    def resolve(self, combination: Combination) -> ConstraintResult: ...
```

---

## 5. Scoring Pipeline

**File:** `src/physcom/engine/scorer.py`

### Normalization — Logarithmic Scaling

Raw metric values are normalized to 0.0–1.0 against domain-specific bounds. The log scale reflects the expected behavior: most combinations cluster near 0 (useless) or near 1 (fully competitive) within a given domain.

```python
import math


def normalize(raw_value: float, norm_min: float, norm_max: float) -> float:
    """Log-normalize a raw value to 0-1 within domain bounds.

    Values at or below norm_min → 0.0
    Values at or above norm_max → 1.0
    Values between are log-interpolated.
    """
    if raw_value <= norm_min:
        return 0.0
    if raw_value >= norm_max:
        return 1.0
    log_min = math.log1p(norm_min)
    log_max = math.log1p(norm_max)
    log_val = math.log1p(raw_value)
    return (log_val - log_min) / (log_max - log_min)
```

### Composite Score — Weighted Geometric Mean

Scores are **multiplied**, not averaged. This is the key design decision: a single near-zero metric kills the overall viability, filtering out "technically possible but completely pointless in practice" concepts.

```python
def composite_score(scores: list[float], weights: list[float]) -> float:
    """Weighted geometric mean. Any score near 0 drives the result toward 0.

    Assumes the weights sum to 1.0:
    composite = product(score_i ^ weight_i) for all metrics i
    """
    result = 1.0
    for score, weight in zip(scores, weights):
        result *= score ** weight
    return result
```

**Properties:**
- If any `score = 0.0` → composite = 0.0 regardless of other scores
- If all scores = 1.0 → composite = 1.0
- The resulting distribution is heavily skewed toward 0 — this is intended as the primary noise filter
- A person pushing a car: speed ≈ 0 in the ground transport domain → composite ≈ 0 → eliminated
- A rocket-powered car: speed ≈ 1 in the ground transport domain, but safety ≈ 0 → composite ≈ 0 → also eliminated
- Only combinations that score reasonably across ALL metrics survive
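
The first two properties can be checked directly (restating `composite_score` so the snippet runs standalone; the weights are illustrative values summing to 1.0):

```python
def composite_score(scores: list[float], weights: list[float]) -> float:
    """Weighted geometric mean: product(score_i ** weight_i)."""
    result = 1.0
    for score, weight in zip(scores, weights):
        result *= score ** weight
    return result

weights = [0.25, 0.25, 0.25, 0.15, 0.10]
print(composite_score([0.9, 0.8, 0.0, 0.7, 0.6], weights))  # 0.0 — one dead metric kills it
print(composite_score([1.0] * 5, weights))                  # 1.0
```

A third property worth a unit test: with weights summing to 1.0, equal scores `s` across all metrics yield a composite of exactly `s` (the geometric mean identity).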

### Scorer Interface

```python
class Scorer:
    def __init__(self, domain: Domain): ...

    def score_combination(self, combination: Combination,
                          raw_metrics: dict[str, float]) -> ScoredResult: ...
```

---

## 6. Domain System

**File:** `src/physcom/models/domain.py`

Domains define the context in which combinations are evaluated. The same combination may be viable in one domain and worthless in another.

```python
from dataclasses import dataclass


@dataclass
class MetricBound:
    metric_name: str
    weight: float    # 0.0–1.0, all weights in a domain sum to 1.0
    norm_min: float  # Below this = score 0 in this domain
    norm_max: float  # Above this = score 1 in this domain


@dataclass
class Domain:
    name: str
    description: str
    metric_bounds: list[MetricBound]
```

### Example Domains

**Urban Commuting** (daily city travel, 1–50km):

| Metric | Weight | Min (score=0) | Max (score=1) |
|---|---|---|---|
| speed | 0.25 | 5 km/h | 120 km/h |
| cost_efficiency | 0.25 | $0.01/km | $2.00/km |
| safety | 0.25 | 0.0 | 1.0 |
| availability | 0.15 | 0.0 | 1.0 |
| range_fuel | 0.10 | 5 km | 500 km |

**Interplanetary Travel** (between planets in a solar system):

| Metric | Weight | Min (score=0) | Max (score=1) |
|---|---|---|---|
| speed | 0.30 | 1,000 km/s | 300,000 km/s |
| range_fuel | 0.30 | 1M km | 10B km |
| safety | 0.20 | 0.0 | 1.0 |
| cost_efficiency | 0.10 | $1K/km | $1B/km |
| range_degradation | 0.10 | 100 days | 36,500 days |

A car at 100 km/h scores ~0.94 for speed in urban commuting (log-interpolated between the 5 and 120 km/h bounds) but ~0.0 in interplanetary travel. This is by design — domain context determines relevance.
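
As a spot check, the urban speed score for a 100 km/h car under the log1p scheme (restating the Section 5 `normalize` so the snippet runs standalone):

```python
import math

def normalize(raw_value: float, norm_min: float, norm_max: float) -> float:
    # Same log1p interpolation as scorer.py's normalize()
    if raw_value <= norm_min:
        return 0.0
    if raw_value >= norm_max:
        return 1.0
    span = math.log1p(norm_max) - math.log1p(norm_min)
    return (math.log1p(raw_value) - math.log1p(norm_min)) / span

speed_score = normalize(100, 5, 120)  # car speed vs. urban commuting bounds
print(round(speed_score, 2))  # 0.94
```

Note the log compression: linear interpolation would give (100-5)/(120-5) ≈ 0.83, but the log scale rewards clearing the lower bound more steeply.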
---

## 7. Multi-Pass Pipeline

**File:** `src/physcom/engine/pipeline.py`

```
All Combinations (Cartesian product)
        │
Pass 1: Constraint Resolution (hard physics, deterministic)
        │  filter: BLOCKED removed
        ▼
Valid + Conditional Combinations
        │
Pass 2: Physics Estimation (compute or LLM-assisted)
        │  estimates raw metric values per combination
        ▼
Estimated Combinations
        │
Pass 3: Scoring & Ranking (per domain)
        │  filter: composite_score < threshold removed
        ▼
High-Scoring Shortlist
        │
Pass 4: LLM Review (social factors, novelty, plausibility)
        │  annotates with natural-language assessment
        ▼
Annotated Shortlist
        │
Pass 5: Human Review (manual, via CLI)
        │  human adds notes, approves/rejects
        ▼
Final Curated Concepts
```

### Design Notes

- **Pass 1 is deterministic and cheap.** It prunes the bulk of the combinatorial explosion using only dependency logic. No LLM calls, no estimation. This is the first and most aggressive filter.
- **Pass 2 is the most expensive.** Physics estimation (whether formula-based or LLM-assisted) runs only on combinations that survived Pass 1. For the initial transport example (81 combinations, many blocked), this might mean ~20–40 surviving combinations need estimation.
- **Pass 3 applies the multiplicative filter.** The logarithmic scoring distribution means most estimated combinations still score near 0. Only a handful survive the threshold.
- **Passes 4–5 are human-scale.** By the time concepts reach LLM review and human review, the list should be small enough for thoughtful individual assessment.
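
The combinator feeding Pass 1 is essentially `itertools.product` over per-dimension entity lists; a minimal sketch (entity names are illustrative, not the full seed set):

```python
from itertools import product

dimensions = {
    "platform": ["Bicycle", "Walking", "Car"],
    "power_source": ["Human Pedalling", "Hydrogen Engine", "Solar Sail"],
}

# One dict per combination, keyed by dimension name
combinations = [dict(zip(dimensions, combo)) for combo in product(*dimensions.values())]
print(len(combinations))  # 9
print(combinations[0])    # {'platform': 'Bicycle', 'power_source': 'Human Pedalling'}
```

Adding a dimension is just another key in the dict; the product grows multiplicatively, which is why Pass 1 must be cheap.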

### Pipeline Interface

```python
class Pipeline:
    def __init__(self, db: Repository, resolver: ConstraintResolver,
                 scorer: Scorer, llm: LLMProvider | None): ...

    def run(self, domain: Domain, dimensions: list[str],
            score_threshold: float = 0.1,
            passes: tuple[int, ...] = (1, 2, 3, 4, 5)) -> PipelineResult: ...
```

The `passes` parameter allows partial runs (e.g., `(1, 2, 3)` skips LLM and human review). A tuple default avoids Python's shared-mutable-default pitfall.

---

## 8. LLM Interface (Abstract)

**File:** `src/physcom/llm/base.py`

```python
from abc import ABC, abstractmethod


class LLMProvider(ABC):
    """Provider-agnostic LLM interface."""

    @abstractmethod
    def estimate_physics(self, combination_description: str,
                         metrics: list[str]) -> dict[str, float]:
        """Estimate raw metric values for a combination using
        order-of-magnitude physics reasoning.

        Returns {metric_name: estimated_value}."""
        ...

    @abstractmethod
    def review_plausibility(self, combination_description: str,
                            scores: dict[str, float]) -> str:
        """Return a natural-language assessment of plausibility,
        novelty, social viability, and practical barriers."""
        ...
```

**File:** `src/physcom/llm/prompts.py`

Structured prompt templates for:
- `PHYSICS_ESTIMATION_PROMPT` — asks for order-of-magnitude metric estimates with reasoning
- `PLAUSIBILITY_REVIEW_PROMPT` — asks for social viability, barriers, novelty, and prior art

A `MockLLMProvider` is included for deterministic testing. Concrete providers (Anthropic, OpenAI, local) are implemented in `src/physcom/llm/providers/` and registered by name.
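
A deterministic mock might look like this (a sketch: the canned-value scheme is an assumption, and the ABC base class is omitted so the snippet runs standalone):

```python
class MockLLMProvider:
    """Deterministic stand-in for testing: canned values, no network calls."""

    def estimate_physics(self, combination_description: str,
                         metrics: list[str]) -> dict[str, float]:
        # Derive a stable pseudo-estimate per metric from the description length,
        # so repeated runs over the same seed data are reproducible
        base = float(len(combination_description) % 10 + 1)
        return {m: base * (i + 1) for i, m in enumerate(metrics)}

    def review_plausibility(self, combination_description: str,
                            scores: dict[str, float]) -> str:
        return f"[mock review] {combination_description}: {len(scores)} metrics scored"

mock = MockLLMProvider()
est = mock.estimate_physics("Bicycle + Hydrogen Engine", ["speed", "safety"])
print(est)  # {'speed': 6.0, 'safety': 12.0}
```

What matters for the tests is the contract: correct types, one value per requested metric, same input → same output. The numbers themselves are meaningless.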

---

## 9. CLI Interface

**File:** `src/physcom/cli.py`

```
physcom init                          # Create database, initialize schema
physcom seed <seed_name>              # Load seed data (e.g., "transport")
physcom entity add <dim> <name>       # Add an entity interactively
physcom entity list [--dimension X]   # List entities with dependencies
physcom domain add <name>             # Add a domain interactively
physcom domain list                   # List domains with metric weights
physcom run <domain> [--passes 1,2,3] [--threshold 0.1]
physcom results <domain> [--top N]    # View ranked results
physcom review <combination_id>       # Interactive human review
physcom export <domain> --format md   # Export to markdown report
```
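
A minimal `argparse` skeleton for a few of these commands (a sketch; the plan leaves argparse vs. click open, and only three subcommands are shown):

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="physcom")
    sub = parser.add_subparsers(dest="command", required=True)

    sub.add_parser("init", help="Create database, initialize schema")

    seed = sub.add_parser("seed", help="Load seed data")
    seed.add_argument("seed_name")

    run = sub.add_parser("run", help="Run the pipeline for a domain")
    run.add_argument("domain")
    run.add_argument("--passes", default="1,2,3,4,5")
    run.add_argument("--threshold", type=float, default=0.1)
    return parser

args = build_parser().parse_args(["run", "urban_commuting", "--passes", "1,2,3"])
print(args.command, args.domain, args.passes)  # run urban_commuting 1,2,3
```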

---

## 10. Implementation Phases

### Phase A: Foundation (database + models)
1. Set up `pyproject.toml` with dependencies (`click` for the CLI; `sqlite3` ships with the standard library)
2. Implement `db/schema.py` — all table DDL, `init_db()` function, indexes
3. Implement `models/` — all dataclasses (`Entity`, `Dependency`, `Domain`, `Combination`, etc.)
4. Implement `db/repository.py` — CRUD for entities, dependencies, domains, metrics
5. Implement `seed/transport_example.py` — the README example with full dependency metadata on all 18 entities
6. Implement `cli.py` — `init` and `seed` commands
7. **Verify:** Create the DB, load the seed, query entities, confirm all dependency data persists correctly

### Phase B: Constraint Engine
1. Implement `engine/combinator.py` — Cartesian product generator across N dimensions
2. Implement the mutual exclusion registry (dict initially, DB table later)
3. Implement `engine/constraint_resolver.py` — all five contradiction rules
4. Wire Pass 1 into the pipeline and CLI
5. **Verify:** Solar sail + walking → BLOCKED (atmosphere contradiction). Hydrogen engine + bicycle → CONDITIONAL (fuel infrastructure unmet, per Rule 5). Nuclear reactor + bicycle → BLOCKED (mass range). Confirm expected blocked/valid counts for the full 81-combination transport example.

### Phase C: Scoring
1. Implement `engine/scorer.py` — log normalization + weighted geometric mean
2. Implement domain metric weights in the database
3. Wire Pass 2 with stub physics estimates (hardcoded for the transport seed)
4. Wire Pass 3 into the pipeline
5. **Verify:** Score distribution is heavily logarithmic. A single zero-metric kills composite score. Domain bounds shift rankings (same combination scores differently across domains).

### Phase D: LLM Integration
1. Implement `llm/base.py` — abstract interface
2. Implement `llm/prompts.py` — prompt templates
3. Implement `MockLLMProvider` for testing
4. Wire Pass 2 to fall back to LLM when no hardcoded estimate exists
5. Wire Pass 4 (plausibility review)
6. **Verify:** Mock provider returns structured data. Pipeline processes LLM output correctly. Full pipeline runs end-to-end with mock.

### Phase E: Human Review & Output
1. Implement Pass 5 — interactive CLI review workflow
2. Implement `export` command — markdown report generation
3. Implement `results` and `review` CLI commands
4. **Verify:** Full pipeline from seed data through all 5 passes to final curated output.

### Phase F: Extension & Refinement
1. Add new attribute dimensions (terrain, passenger count, cargo type)
2. Add new evaluation domains (military logistics, recreational, emergency services)
3. Refine dependency metadata based on pipeline output analysis
4. Implement a concrete LLM provider
5. Optional: Streamlit dashboard for interactive exploration

---

## 11. Testing Strategy

| Test Type | Scope | Purpose |
|---|---|---|
| **Unit** | Each engine module | Constraint rules, scoring math, combinator logic |
| **Integration** | Full pipeline with seed + mock LLM | End-to-end data flow |
| **Property-based** | Scorer | Multiplicative zero-kill, score bounds [0,1], geometric mean identity |
| **Snapshot** | Transport example | Known combinations produce known constraint/score results |
| **Interface contract** | LLM providers | Structured output schema compliance (not output quality) |

LLM output quality is inherently non-deterministic and is NOT tested. Only the interface contract (correct types, valid JSON, expected keys) is validated.
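
The scorer's property tests can be approximated without an external property-testing library; a seeded random-sampling sketch of the bounds and zero-kill properties:

```python
import random

def composite_score(scores: list[float], weights: list[float]) -> float:
    """Weighted geometric mean: product(score_i ** weight_i)."""
    result = 1.0
    for s, w in zip(scores, weights):
        result *= s ** w
    return result

rng = random.Random(42)  # fixed seed for reproducible test runs
for _ in range(1000):
    n = rng.randint(1, 6)
    scores = [rng.random() for _ in range(n)]
    weights = [1.0 / n] * n  # equal weights summing to 1.0
    c = composite_score(scores, weights)
    assert 0.0 <= c <= 1.0                                       # bounded in [0, 1]
    assert composite_score(scores[:-1] + [0.0], weights) == 0.0  # zero-kill
print("1000 sampled cases passed")
```

A `hypothesis`-based version would generalize the same assertions over generated inputs; the random loop above is just the dependency-free equivalent.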

---

## 12. Verification Checkpoints

After each phase, run:

1. `python -m pytest tests/` — all tests green
2. `physcom init && physcom seed transport` — seed loads without error
3. `physcom run urban_commuting --passes 1` — constraint pass produces the expected blocked/valid split
4. `physcom run urban_commuting --passes 1,2,3 --threshold 0.05` — scoring produces a ranked shortlist
5. `physcom results urban_commuting --top 10` — top concepts are plausible, not nonsense
6. **Manual smell test:** Do obviously-absurd combinations survive? If so, dependency metadata or constraint rules need refinement — that's where the real signal-to-noise quality lives.