Add async pipeline with progress monitoring, resumability, and result transparency

Pipeline engine rewritten with combo-first loop: each combination is processed
through all requested passes before moving to the next, with incremental DB
saves after every step (crash-safe). Blocked combos now get result rows so they
appear in the results page with constraint violation reasons.

New pipeline_runs table tracks run lifecycle (pending/running/completed/failed/
cancelled). Web route launches pipeline in a background thread with its own DB
connection. HTMX polling partial shows live progress with per-pass breakdown.

Also: status guard prevents reviewed->scored downgrade, save_combination loads
existing status on dedup for correct resume, per-metric scores show domain
bounds + units + position bars, ensure_metric backfills units on existing rows.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Simonson, Andrew
2026-02-18 15:30:52 -06:00
parent 8118a62242
commit d2028a642b
17 changed files with 1263 additions and 217 deletions

65
CLAUDE.md Normal file

@@ -0,0 +1,65 @@
# PhysCom — Physical Combinatorics
Innovation discovery engine: generate entity combinations, filter by physical constraints, score against domain-specific metrics, rank results.
## Commands
- **Tests**: `python -m pytest tests/ -q` (48 tests, ~3s). Run after every change.
- **Web dev server**: `python -m physcom_web`
- **CLI**: `python -m physcom`
- **Seed data**: loaded automatically on first DB init (SQLite, `physcom.db` or `$PHYSCOM_DB`)
## Architecture
```
src/physcom/ # Core library (no web dependency)
models/ # Dataclasses: Entity, Dependency, Combination, Domain, MetricBound
db/schema.py # DDL (all CREATE TABLE statements)
db/repository.py # All DB access — single Repository class, sqlite3 row_factory=Row
engine/combinator.py # Cartesian product of entities across dimensions
engine/constraint_resolver.py # Pass 1: requires/excludes/mutex/range/force checks
engine/scorer.py # Pass 3: log-normalize raw→0-1, weighted geometric mean composite
engine/pipeline.py # Orchestrator: combo-first loop, incremental saves, resume, cancel
llm/base.py # LLMProvider ABC (estimate_physics, review_plausibility)
llm/providers/mock.py # MockLLMProvider for tests
seed/transport_example.py # 9 platforms + 9 power sources, 2 domains
src/physcom_web/ # Flask web UI
app.py # App factory, get_repo(), DB path resolution
routes/pipeline.py # Background thread pipeline execution, HTMX status/cancel endpoints
routes/results.py # Results browse, detail view, human review submission
routes/entities.py # Entity CRUD
routes/domains.py # Domain listing
templates/ # Jinja2, extends base.html, uses HTMX for polling
static/style.css # Single stylesheet
tests/ # pytest, uses seeded_repo fixture from conftest.py
```
## Key patterns
- **Repository is the only DB interface.** No raw SQL outside `repository.py`.
- **Pipeline is combo-first**: each combo goes through all requested passes before the next combo starts. Progress is persisted per-combo (crash-safe, resumable).
- **`pipeline_runs` table** tracks run lifecycle: pending → running → completed/failed/cancelled. The web route creates the record, then starts a background thread with its own `sqlite3.Connection`.
- **`combination_results`** has rows for ALL combos including blocked ones (pass_reached=1, composite_score=0.0). Scored combos get pass_reached=3+.
- **Status guard**: `update_combination_status` refuses to downgrade `reviewed` → `scored`.
- **`save_combination`** loads existing status/block_reason on dedup (important for resume).
- **`ensure_metric`** backfills unit if the row already exists with an empty unit.
- **MetricBound** carries `unit` — flows through seed → ensure_metric → metrics table → get_combination_scores → template display.
- **HTMX polling**: `_run_status.html` partial polls every 2s while run is pending/running; stops polling when terminal.
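
A minimal sketch of the `pipeline_runs` lifecycle described above, assuming a file-backed SQLite database so that the worker thread can open its own connection. The trimmed table, the `worker` function, and the three-step loop are illustrative stand-ins, not the code in `routes/pipeline.py`.

```python
import os
import sqlite3
import tempfile
import threading

# Trimmed, hypothetical version of the pipeline_runs table.
DDL = """CREATE TABLE IF NOT EXISTS pipeline_runs (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    status TEXT NOT NULL DEFAULT 'pending',
    combos_pass1 INTEGER DEFAULT 0
)"""


def worker(db_path: str, run_id: int) -> None:
    """Background worker: opens its OWN connection instead of sharing one."""
    conn = sqlite3.connect(db_path)
    try:
        conn.execute(
            "UPDATE pipeline_runs SET status = 'running' WHERE id = ?", (run_id,)
        )
        conn.commit()
        for done in range(1, 4):  # stand-in for processing three combos
            conn.execute(
                "UPDATE pipeline_runs SET combos_pass1 = ? WHERE id = ?",
                (done, run_id),
            )
            conn.commit()  # commit per step so a poller sees live progress
        conn.execute(
            "UPDATE pipeline_runs SET status = 'completed' WHERE id = ?", (run_id,)
        )
        conn.commit()
    finally:
        conn.close()


fd, db_path = tempfile.mkstemp(suffix=".db")
os.close(fd)

main_conn = sqlite3.connect(db_path)
main_conn.execute(DDL)
cur = main_conn.execute("INSERT INTO pipeline_runs DEFAULT VALUES")  # status = 'pending'
main_conn.commit()
run_id = cur.lastrowid

thread = threading.Thread(target=worker, args=(db_path, run_id))
thread.start()
thread.join()

final_status, final_count = main_conn.execute(
    "SELECT status, combos_pass1 FROM pipeline_runs WHERE id = ?", (run_id,)
).fetchone()
main_conn.close()
os.remove(db_path)
```

In the web UI the request thread would return right after `thread.start()` and the polling partial would read the row instead of calling `join()`.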
## Data flow (pipeline passes)
1. **Pass 1 — Constraints**: `ConstraintResolver.resolve()` → blocked/conditional/valid. Blocked combos get a result row and `continue`.
2. **Pass 2 — Estimation**: LLM or `_stub_estimate()` → raw metric values. Saved immediately via `save_raw_estimates()` (normalized_score=NULL).
3. **Pass 3 — Scoring**: `Scorer.score_combination()` → log-normalized scores + weighted geometric mean composite. Saves via `save_scores()` + `save_result()`.
4. **Pass 4 — LLM Review**: Only for above-threshold combos with an LLM provider.
5. **Pass 5 — Human Review**: Manual via web UI results page.
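
The combo-first ordering and resume behaviour can be sketched as follows. The `progress` dict is a hypothetical stand-in for the `pass_reached` column in `combination_results`, and `run_passes` is an illustrative name rather than the real `Pipeline.run`; blocked combos and cancellation are left out for brevity.

```python
from __future__ import annotations

# Hypothetical stand-in for combination_results: combo id -> highest pass saved.
progress: dict[int, int] = {}


def run_passes(
    combo_ids: list[int], passes: list[int], log: list[tuple[int, int]]
) -> None:
    """Take each combo through every requested pass before starting the next."""
    for combo_id in combo_ids:
        done = progress.get(combo_id, 0)
        for p in sorted(passes):
            if p <= done:
                continue  # already persisted by an earlier run, skip on resume
            log.append((combo_id, p))  # stand-in for the real work of pass p
            progress[combo_id] = p  # stand-in for the incremental DB save


first_run: list[tuple[int, int]] = []
run_passes([1, 2], [1, 2, 3], first_run)

# Simulate an interrupted run that saved combo 3 only through pass 1.
progress[3] = 1
resumed: list[tuple[int, int]] = []
run_passes([1, 2, 3], [1, 2, 3], resumed)
```

On the second call, combos 1 and 2 are skipped entirely and combo 3 picks up at pass 2, which is the property the per-combo saves are meant to provide.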
## Conventions
- Python 3.11+, `from __future__ import annotations` everywhere.
- Dataclasses for models, no ORM.
- Tests use `seeded_repo` fixture (in-memory SQLite with transport seed data).
- Don't use `cd` in Bash commands — run from the working directory so pre-approved permission patterns match.
- Don't add docstrings/comments/type annotations to code you didn't change.


@@ -3,7 +3,9 @@
from __future__ import annotations from __future__ import annotations
import hashlib import hashlib
import json
import sqlite3 import sqlite3
from datetime import datetime, timezone
from typing import Sequence from typing import Sequence
from physcom.models.entity import Dependency, Entity from physcom.models.entity import Dependency, Entity
@@ -170,6 +172,11 @@ class Repository:
"INSERT OR IGNORE INTO metrics (name, unit, description) VALUES (?, ?, ?)", "INSERT OR IGNORE INTO metrics (name, unit, description) VALUES (?, ?, ?)",
(name, unit, description), (name, unit, description),
) )
if unit:
self.conn.execute(
"UPDATE metrics SET unit = ? WHERE name = ? AND (unit IS NULL OR unit = '')",
(unit, name),
)
row = self.conn.execute("SELECT id FROM metrics WHERE name = ?", (name,)).fetchone() row = self.conn.execute("SELECT id FROM metrics WHERE name = ?", (name,)).fetchone()
self.conn.commit() self.conn.commit()
return row["id"] return row["id"]
@@ -181,7 +188,7 @@ class Repository:
) )
domain.id = cur.lastrowid domain.id = cur.lastrowid
for mb in domain.metric_bounds: for mb in domain.metric_bounds:
metric_id = self.ensure_metric(mb.metric_name) metric_id = self.ensure_metric(mb.metric_name, unit=mb.unit)
mb.metric_id = metric_id mb.metric_id = metric_id
self.conn.execute( self.conn.execute(
"""INSERT INTO domain_metric_weights """INSERT INTO domain_metric_weights
@@ -233,10 +240,13 @@ class Repository:
combination.hash = self.compute_hash(entity_ids) combination.hash = self.compute_hash(entity_ids)
existing = self.conn.execute( existing = self.conn.execute(
"SELECT id FROM combinations WHERE hash = ?", (combination.hash,) "SELECT id, status, block_reason FROM combinations WHERE hash = ?",
(combination.hash,),
).fetchone() ).fetchone()
if existing: if existing:
combination.id = existing["id"] combination.id = existing["id"]
combination.status = existing["status"]
combination.block_reason = existing["block_reason"]
return combination return combination
cur = self.conn.execute( cur = self.conn.execute(
@@ -255,6 +265,13 @@ class Repository:
def update_combination_status( def update_combination_status(
self, combo_id: int, status: str, block_reason: str | None = None self, combo_id: int, status: str, block_reason: str | None = None
) -> None: ) -> None:
# Don't downgrade 'reviewed' to 'scored' — preserve human review state
if status == "scored":
row = self.conn.execute(
"SELECT status FROM combinations WHERE id = ?", (combo_id,)
).fetchone()
if row and row["status"] == "reviewed":
return
self.conn.execute( self.conn.execute(
"UPDATE combinations SET status = ?, block_reason = ? WHERE id = ?", "UPDATE combinations SET status = ?, block_reason = ? WHERE id = ?",
(status, block_reason, combo_id), (status, block_reason, combo_id),
@@ -327,7 +344,7 @@ class Repository:
def get_combination_scores(self, combo_id: int, domain_id: int) -> list[dict]: def get_combination_scores(self, combo_id: int, domain_id: int) -> list[dict]:
"""Return per-metric scores for a combination in a domain.""" """Return per-metric scores for a combination in a domain."""
rows = self.conn.execute( rows = self.conn.execute(
"""SELECT cs.*, m.name as metric_name """SELECT cs.*, m.name as metric_name, m.unit as metric_unit
FROM combination_scores cs FROM combination_scores cs
JOIN metrics m ON cs.metric_id = m.id JOIN metrics m ON cs.metric_id = m.id
WHERE cs.combination_id = ? AND cs.domain_id = ?""", WHERE cs.combination_id = ? AND cs.domain_id = ?""",
@@ -335,12 +352,52 @@ class Repository:
).fetchall() ).fetchall()
return [dict(r) for r in rows] return [dict(r) for r in rows]
def count_combinations_by_status(self) -> dict[str, int]: def count_combinations_by_status(self, domain_name: str | None = None) -> dict[str, int]:
"""Count combos by status. If domain_name given, only combos with results in that domain."""
if domain_name:
rows = self.conn.execute(
"""SELECT c.status, COUNT(*) as cnt
FROM combination_results cr
JOIN combinations c ON cr.combination_id = c.id
JOIN domains d ON cr.domain_id = d.id
WHERE d.name = ?
GROUP BY c.status""",
(domain_name,),
).fetchall()
else:
rows = self.conn.execute( rows = self.conn.execute(
"SELECT status, COUNT(*) as cnt FROM combinations GROUP BY status" "SELECT status, COUNT(*) as cnt FROM combinations GROUP BY status"
).fetchall() ).fetchall()
return {r["status"]: r["cnt"] for r in rows} return {r["status"]: r["cnt"] for r in rows}
def get_pipeline_summary(self, domain_name: str) -> dict | None:
"""Return a summary of results for a domain, or None if no results."""
row = self.conn.execute(
"""SELECT COUNT(*) as total,
AVG(cr.composite_score) as avg_score,
MAX(cr.composite_score) as max_score,
MIN(cr.composite_score) as min_score,
MAX(cr.pass_reached) as last_pass
FROM combination_results cr
JOIN domains d ON cr.domain_id = d.id
WHERE d.name = ?""",
(domain_name,),
).fetchone()
if not row or row["total"] == 0:
return None
# Also count blocked combos (they have no results but exist)
blocked = self.conn.execute(
"SELECT COUNT(*) as cnt FROM combinations WHERE status = 'blocked'"
).fetchone()
return {
"total_results": row["total"],
"avg_score": row["avg_score"],
"max_score": row["max_score"],
"min_score": row["min_score"],
"last_pass": row["last_pass"],
"blocked": blocked["cnt"] if blocked else 0,
}
def get_result(self, combo_id: int, domain_id: int) -> dict | None: def get_result(self, combo_id: int, domain_id: int) -> dict | None:
"""Return a single combination_result row.""" """Return a single combination_result row."""
row = self.conn.execute( row = self.conn.execute(
@@ -412,3 +469,88 @@ class Repository:
"pass_reached": r["pass_reached"], "pass_reached": r["pass_reached"],
}) })
return results return results
# ── Pipeline Runs ────────────────────────────────────────
def create_pipeline_run(self, domain_id: int, config: dict) -> int:
"""Create a new pipeline_run record. Returns the run id."""
cur = self.conn.execute(
"""INSERT INTO pipeline_runs (domain_id, status, config, created_at)
VALUES (?, 'pending', ?, ?)""",
(domain_id, json.dumps(config), datetime.now(timezone.utc).isoformat()),
)
self.conn.commit()
return cur.lastrowid
def update_pipeline_run(self, run_id: int, **fields) -> None:
"""Update arbitrary fields on a pipeline_run."""
if not fields:
return
set_clause = ", ".join(f"{k} = ?" for k in fields)
values = list(fields.values())
values.append(run_id)
self.conn.execute(
f"UPDATE pipeline_runs SET {set_clause} WHERE id = ?", values
)
self.conn.commit()
def get_pipeline_run(self, run_id: int) -> dict | None:
row = self.conn.execute(
"SELECT * FROM pipeline_runs WHERE id = ?", (run_id,)
).fetchone()
return dict(row) if row else None
def list_pipeline_runs(self, domain_id: int | None = None) -> list[dict]:
if domain_id is not None:
rows = self.conn.execute(
"""SELECT pr.*, d.name as domain_name
FROM pipeline_runs pr
JOIN domains d ON pr.domain_id = d.id
WHERE pr.domain_id = ?
ORDER BY pr.created_at DESC""",
(domain_id,),
).fetchall()
else:
rows = self.conn.execute(
"""SELECT pr.*, d.name as domain_name
FROM pipeline_runs pr
JOIN domains d ON pr.domain_id = d.id
ORDER BY pr.created_at DESC"""
).fetchall()
return [dict(r) for r in rows]
def get_combo_pass_reached(self, combo_id: int, domain_id: int) -> int | None:
"""Return the pass_reached for a combo in a domain, or None if no result."""
row = self.conn.execute(
"""SELECT pass_reached FROM combination_results
WHERE combination_id = ? AND domain_id = ?""",
(combo_id, domain_id),
).fetchone()
return row["pass_reached"] if row else None
def save_raw_estimates(
self, combo_id: int, domain_id: int, estimates: list[dict]
) -> None:
"""Save raw metric estimates (pass 2) with normalized_score=NULL.
Each dict: metric_id, raw_value, estimation_method, confidence.
"""
for e in estimates:
self.conn.execute(
"""INSERT OR REPLACE INTO combination_scores
(combination_id, domain_id, metric_id, raw_value, normalized_score,
estimation_method, confidence)
VALUES (?, ?, ?, ?, NULL, ?, ?)""",
(combo_id, domain_id, e["metric_id"], e["raw_value"],
e["estimation_method"], e["confidence"]),
)
self.conn.commit()
def get_existing_result(self, combo_id: int, domain_id: int) -> dict | None:
"""Return the full combination_results row for resume logic."""
row = self.conn.execute(
"""SELECT * FROM combination_results
WHERE combination_id = ? AND domain_id = ?""",
(combo_id, domain_id),
).fetchone()
return dict(row) if row else None
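
`update_pipeline_run` interpolates keyword names into the SQL text, which is safe only while callers pass trusted column names. One possible hardening, sketched here under the assumption that the column list mirrors the `pipeline_runs` DDL in this commit, is to reject unknown keys before building the statement. `ALLOWED_COLUMNS` and `update_run` are hypothetical names, and the demo table is trimmed to three columns.

```python
from __future__ import annotations

import sqlite3

# Assumed to match the updatable columns of pipeline_runs in this commit.
ALLOWED_COLUMNS = frozenset({
    "status", "config", "total_combos", "combos_pass1", "combos_pass2",
    "combos_pass3", "combos_pass4", "current_pass", "error_message",
    "started_at", "completed_at",
})


def update_run(conn: sqlite3.Connection, run_id: int, **fields: object) -> None:
    """Update selected columns, refusing any key that is not a known column."""
    unknown = set(fields) - ALLOWED_COLUMNS
    if unknown:
        raise ValueError(f"unknown pipeline_runs columns: {sorted(unknown)}")
    if not fields:
        return
    # Only whitelisted names reach the SQL text; values stay parameterized.
    set_clause = ", ".join(f"{name} = ?" for name in fields)
    conn.execute(
        f"UPDATE pipeline_runs SET {set_clause} WHERE id = ?",
        [*fields.values(), run_id],
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE pipeline_runs (id INTEGER PRIMARY KEY, status TEXT, total_combos INTEGER)"
)
conn.execute(
    "INSERT INTO pipeline_runs (id, status, total_combos) VALUES (1, 'pending', 0)"
)
update_run(conn, 1, status="running", total_combos=81)
row = conn.execute(
    "SELECT status, total_combos FROM pipeline_runs WHERE id = 1"
).fetchone()

try:
    update_run(conn, 1, **{"status = 'x' --": "oops"})
    rejected = False
except ValueError:
    rejected = True
```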


@@ -91,11 +91,29 @@ CREATE TABLE IF NOT EXISTS combination_results (
UNIQUE(combination_id, domain_id) UNIQUE(combination_id, domain_id)
); );
CREATE TABLE IF NOT EXISTS pipeline_runs (
id INTEGER PRIMARY KEY AUTOINCREMENT,
domain_id INTEGER NOT NULL REFERENCES domains(id),
status TEXT NOT NULL DEFAULT 'pending',
config TEXT,
total_combos INTEGER DEFAULT 0,
combos_pass1 INTEGER DEFAULT 0,
combos_pass2 INTEGER DEFAULT 0,
combos_pass3 INTEGER DEFAULT 0,
combos_pass4 INTEGER DEFAULT 0,
current_pass INTEGER,
error_message TEXT,
started_at TIMESTAMP,
completed_at TIMESTAMP,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
CREATE INDEX IF NOT EXISTS idx_deps_entity ON dependencies(entity_id); CREATE INDEX IF NOT EXISTS idx_deps_entity ON dependencies(entity_id);
CREATE INDEX IF NOT EXISTS idx_deps_category_key ON dependencies(category, key); CREATE INDEX IF NOT EXISTS idx_deps_category_key ON dependencies(category, key);
CREATE INDEX IF NOT EXISTS idx_combo_status ON combinations(status); CREATE INDEX IF NOT EXISTS idx_combo_status ON combinations(status);
CREATE INDEX IF NOT EXISTS idx_scores_combo_domain ON combination_scores(combination_id, domain_id); CREATE INDEX IF NOT EXISTS idx_scores_combo_domain ON combination_scores(combination_id, domain_id);
CREATE INDEX IF NOT EXISTS idx_results_domain_score ON combination_results(domain_id, composite_score DESC); CREATE INDEX IF NOT EXISTS idx_results_domain_score ON combination_results(domain_id, composite_score DESC);
CREATE INDEX IF NOT EXISTS idx_pipeline_runs_domain ON pipeline_runs(domain_id);
""" """


@@ -1,15 +1,15 @@
"""Multi-pass pipeline orchestrator.""" """Multi-pass pipeline orchestrator with incremental saves and resumability."""
from __future__ import annotations from __future__ import annotations
from dataclasses import dataclass, field from dataclasses import dataclass, field
from datetime import datetime, timezone
from physcom.db.repository import Repository from physcom.db.repository import Repository
from physcom.engine.combinator import generate_combinations from physcom.engine.combinator import generate_combinations
from physcom.engine.constraint_resolver import ConstraintResolver, ConstraintResult from physcom.engine.constraint_resolver import ConstraintResolver, ConstraintResult
from physcom.engine.scorer import Scorer from physcom.engine.scorer import Scorer
from physcom.llm.base import LLMProvider from physcom.llm.base import LLMProvider
from physcom.llm.prompts import PHYSICS_ESTIMATION_PROMPT, PLAUSIBILITY_REVIEW_PROMPT
from physcom.models.combination import Combination, ScoredResult from physcom.models.combination import Combination, ScoredResult
from physcom.models.domain import Domain from physcom.models.domain import Domain
@@ -23,12 +23,17 @@ class PipelineResult:
pass1_blocked: int = 0 pass1_blocked: int = 0
pass1_conditional: int = 0 pass1_conditional: int = 0
pass2_estimated: int = 0 pass2_estimated: int = 0
pass3_scored: int = 0
pass3_above_threshold: int = 0 pass3_above_threshold: int = 0
pass4_reviewed: int = 0 pass4_reviewed: int = 0
pass5_human_reviewed: int = 0 pass5_human_reviewed: int = 0
top_results: list[dict] = field(default_factory=list) top_results: list[dict] = field(default_factory=list)
class CancelledError(Exception):
"""Raised when a pipeline run is cancelled."""
def _describe_combination(combo: Combination) -> str: def _describe_combination(combo: Combination) -> str:
"""Build a natural-language description of a combination.""" """Build a natural-language description of a combination."""
parts = [f"{e.dimension}: {e.name}" for e in combo.entities] parts = [f"{e.dimension}: {e.name}" for e in combo.entities]
@@ -53,71 +58,88 @@ class Pipeline:
self.scorer = scorer self.scorer = scorer
self.llm = llm self.llm = llm
def _check_cancelled(self, run_id: int | None) -> None:
"""Raise CancelledError if the run has been cancelled."""
if run_id is None:
return
run = self.repo.get_pipeline_run(run_id)
if run and run["status"] == "cancelled":
raise CancelledError("Pipeline run cancelled")
def _update_run_counters(
self, run_id: int | None, result: PipelineResult, current_pass: int
) -> None:
"""Update pipeline_run progress counters in the DB."""
if run_id is None:
return
self.repo.update_pipeline_run(
run_id,
combos_pass1=result.pass1_valid
+ result.pass1_conditional
+ result.pass1_blocked,
combos_pass2=result.pass2_estimated,
combos_pass3=result.pass3_scored,
combos_pass4=result.pass4_reviewed,
current_pass=current_pass,
)
def run( def run(
self, self,
domain: Domain, domain: Domain,
dimensions: list[str], dimensions: list[str],
score_threshold: float = 0.1, score_threshold: float = 0.1,
passes: list[int] | None = None, passes: list[int] | None = None,
run_id: int | None = None,
) -> PipelineResult: ) -> PipelineResult:
if passes is None: if passes is None:
passes = [1, 2, 3, 4, 5] passes = [1, 2, 3, 4, 5]
result = PipelineResult() result = PipelineResult()
# Mark run as running (unless already cancelled)
if run_id is not None:
run_record = self.repo.get_pipeline_run(run_id)
if run_record and run_record["status"] == "cancelled":
result.top_results = self.repo.get_top_results(domain.name, limit=20)
return result
self.repo.update_pipeline_run(
run_id,
status="running",
started_at=datetime.now(timezone.utc).isoformat(),
)
# Generate all combinations # Generate all combinations
combos = generate_combinations(self.repo, dimensions) combos = generate_combinations(self.repo, dimensions)
result.total_generated = len(combos) result.total_generated = len(combos)
# Save all combinations to DB # Save all combinations to DB (also loads status for existing combos)
for combo in combos: for combo in combos:
self.repo.save_combination(combo) self.repo.save_combination(combo)
# ── Pass 1: Constraint Resolution ─────────────────────── if run_id is not None:
valid_combos: list[Combination] = [] self.repo.update_pipeline_run(run_id, total_combos=len(combos))
if 1 in passes:
valid_combos = self._pass1_constraints(combos, result)
else:
valid_combos = combos
# ── Pass 2: Physics Estimation ────────────────────────── # Prepare metric lookup
estimated: list[tuple[Combination, dict[str, float]]] = [] metric_names = [mb.metric_name for mb in domain.metric_bounds]
if 2 in passes: bounds_by_name = {mb.metric_name: mb for mb in domain.metric_bounds}
estimated = self._pass2_estimation(valid_combos, domain, result)
else:
# Skip estimation, use zeros
estimated = [(c, {}) for c in valid_combos]
# ── Pass 3: Scoring & Ranking ─────────────────────────── # ── Combo-first loop ─────────────────────────────────────
scored: list[tuple[Combination, ScoredResult]] = [] try:
if 3 in passes:
scored = self._pass3_scoring(estimated, domain, score_threshold, result)
# ── Pass 4: LLM Review ──────────────────────────────────
if 4 in passes and self.llm:
self._pass4_llm_review(scored, domain, result)
# ── Save results after scoring ─────────────────────────
if 3 in passes:
max_pass = max(p for p in passes if p <= 5)
for combo, sr in scored:
self.repo.save_result(
combo.id, domain.id, sr.composite_score,
pass_reached=max_pass,
novelty_flag=sr.novelty_flag,
llm_review=sr.llm_review,
)
self.repo.update_combination_status(combo.id, "scored")
# Collect top results
result.top_results = self.repo.get_top_results(domain.name, limit=20)
return result
def _pass1_constraints(
self, combos: list[Combination], result: PipelineResult
) -> list[Combination]:
valid = []
for combo in combos: for combo in combos:
self._check_cancelled(run_id)
# Check existing progress for this combo in this domain
existing_pass = self.repo.get_combo_pass_reached(
combo.id, domain.id
) or 0
# Load existing result to preserve human review data
existing_result = self.repo.get_existing_result(
combo.id, domain.id
)
# ── Pass 1: Constraint Resolution ────────────────
if 1 in passes and existing_pass < 1:
cr: ConstraintResult = self.resolver.resolve(combo) cr: ConstraintResult = self.resolver.resolve(combo)
if cr.status == "blocked": if cr.status == "blocked":
combo.status = "blocked" combo.status = "blocked"
@@ -125,56 +147,87 @@ class Pipeline:
self.repo.update_combination_status( self.repo.update_combination_status(
combo.id, "blocked", combo.block_reason combo.id, "blocked", combo.block_reason
) )
# Save a result row so blocked combos appear in results
self.repo.save_result(
combo.id,
domain.id,
composite_score=0.0,
pass_reached=1,
)
result.pass1_blocked += 1 result.pass1_blocked += 1
self._update_run_counters(run_id, result, current_pass=1)
continue # blocked — skip remaining passes
elif cr.status == "conditional": elif cr.status == "conditional":
combo.status = "valid" combo.status = "valid"
self.repo.update_combination_status(combo.id, "valid") self.repo.update_combination_status(combo.id, "valid")
valid.append(combo)
result.pass1_conditional += 1 result.pass1_conditional += 1
else: else:
combo.status = "valid" combo.status = "valid"
self.repo.update_combination_status(combo.id, "valid") self.repo.update_combination_status(combo.id, "valid")
valid.append(combo)
result.pass1_valid += 1 result.pass1_valid += 1
return valid
def _pass2_estimation( self._update_run_counters(run_id, result, current_pass=1)
self, elif 1 in passes:
combos: list[Combination], # Already pass1'd — check if it was blocked
domain: Domain, if combo.status == "blocked":
result: PipelineResult, result.pass1_blocked += 1
) -> list[tuple[Combination, dict[str, float]]]: continue
metric_names = [mb.metric_name for mb in domain.metric_bounds] else:
estimated = [] result.pass1_valid += 1
else:
# Pass 1 not requested; check if blocked from a prior run
if combo.status == "blocked":
result.pass1_blocked += 1
continue
for combo in combos: # ── Pass 2: Physics Estimation ───────────────────
raw_metrics: dict[str, float] = {}
if 2 in passes and existing_pass < 2:
description = _describe_combination(combo) description = _describe_combination(combo)
if self.llm: if self.llm:
raw_metrics = self.llm.estimate_physics(description, metric_names) raw_metrics = self.llm.estimate_physics(
description, metric_names
)
else: else:
# Stub estimation: derive from dependencies where possible
raw_metrics = self._stub_estimate(combo, metric_names) raw_metrics = self._stub_estimate(combo, metric_names)
estimated.append((combo, raw_metrics))
# Save raw estimates immediately (crash-safe)
estimate_dicts = []
for mname, rval in raw_metrics.items():
mb = bounds_by_name.get(mname)
if mb and mb.metric_id:
estimate_dicts.append({
"metric_id": mb.metric_id,
"raw_value": rval,
"estimation_method": "llm" if self.llm else "stub",
"confidence": 1.0,
})
if estimate_dicts:
self.repo.save_raw_estimates(
combo.id, domain.id, estimate_dicts
)
result.pass2_estimated += 1 result.pass2_estimated += 1
self._update_run_counters(run_id, result, current_pass=2)
elif 2 in passes:
# Already estimated — reload raw values from DB
existing_scores = self.repo.get_combination_scores(
combo.id, domain.id
)
raw_metrics = {
s["metric_name"]: s["raw_value"] for s in existing_scores
}
result.pass2_estimated += 1
else:
# Pass 2 not requested, use empty metrics
raw_metrics = {}
return estimated # ── Pass 3: Scoring & Ranking ────────────────────
if 3 in passes and existing_pass < 3:
def _pass3_scoring(
self,
estimated: list[tuple[Combination, dict[str, float]]],
domain: Domain,
threshold: float,
result: PipelineResult,
) -> list[tuple[Combination, ScoredResult]]:
scored = []
for combo, raw_metrics in estimated:
sr = self.scorer.score_combination(combo, raw_metrics) sr = self.scorer.score_combination(combo, raw_metrics)
if sr.composite_score >= threshold:
scored.append((combo, sr)) # Persist per-metric scores with normalized values
result.pass3_above_threshold += 1
# Persist per-metric scores
score_dicts = [] score_dicts = []
bounds_by_name = {mb.metric_name: mb for mb in domain.metric_bounds}
for s in sr.scores: for s in sr.scores:
mb = bounds_by_name.get(s.metric_name) mb = bounds_by_name.get(s.metric_name)
if mb and mb.metric_id: if mb and mb.metric_id:
@@ -188,22 +241,97 @@ class Pipeline:
if score_dicts: if score_dicts:
self.repo.save_scores(combo.id, domain.id, score_dicts) self.repo.save_scores(combo.id, domain.id, score_dicts)
# Sort by composite score descending # Preserve existing human data
scored.sort(key=lambda x: x[1].composite_score, reverse=True) novelty_flag = (
return scored existing_result["novelty_flag"] if existing_result else None
)
human_notes = (
existing_result["human_notes"] if existing_result else None
)
def _pass4_llm_review( self.repo.save_result(
self, combo.id,
scored: list[tuple[Combination, ScoredResult]], domain.id,
domain: Domain, sr.composite_score,
result: PipelineResult, pass_reached=3,
) -> None: novelty_flag=novelty_flag,
for combo, sr in scored: human_notes=human_notes,
)
self.repo.update_combination_status(combo.id, "scored")
result.pass3_scored += 1
if sr.composite_score >= score_threshold:
result.pass3_above_threshold += 1
self._update_run_counters(run_id, result, current_pass=3)
elif 3 in passes and existing_pass >= 3:
# Already scored — count it
result.pass3_scored += 1
if existing_result and existing_result["composite_score"] is not None:
if existing_result["composite_score"] >= score_threshold:
result.pass3_above_threshold += 1
# ── Pass 4: LLM Review ───────────────────────────
if 4 in passes and self.llm:
cur_pass = self.repo.get_combo_pass_reached(
combo.id, domain.id
) or 0
if cur_pass < 4:
cur_result = self.repo.get_existing_result(
combo.id, domain.id
)
if (
cur_result
and cur_result["composite_score"] is not None
and cur_result["composite_score"] >= score_threshold
):
description = _describe_combination(combo) description = _describe_combination(combo)
score_dict = {s.metric_name: s.normalized_score for s in sr.scores} db_scores = self.repo.get_combination_scores(
review = self.llm.review_plausibility(description, score_dict) combo.id, domain.id
sr.llm_review = review )
score_dict = {
s["metric_name"]: s["normalized_score"]
for s in db_scores
if s["normalized_score"] is not None
}
review = self.llm.review_plausibility(
description, score_dict
)
self.repo.save_result(
combo.id,
domain.id,
cur_result["composite_score"],
pass_reached=4,
novelty_flag=cur_result.get("novelty_flag"),
llm_review=review,
human_notes=cur_result.get("human_notes"),
)
result.pass4_reviewed += 1 result.pass4_reviewed += 1
self._update_run_counters(
run_id, result, current_pass=4
)
except CancelledError:
if run_id is not None:
self.repo.update_pipeline_run(
run_id,
status="cancelled",
completed_at=datetime.now(timezone.utc).isoformat(),
)
result.top_results = self.repo.get_top_results(domain.name, limit=20)
return result
# Mark run as completed
if run_id is not None:
self.repo.update_pipeline_run(
run_id,
status="completed",
completed_at=datetime.now(timezone.utc).isoformat(),
)
result.top_results = self.repo.get_top_results(domain.name, limit=20)
return result
def _stub_estimate( def _stub_estimate(
self, combo: Combination, metric_names: list[str] self, combo: Combination, metric_names: list[str]
@@ -223,24 +351,21 @@ class Pipeline:
# Rough speed estimate: F=ma -> v proportional to power/mass # Rough speed estimate: F=ma -> v proportional to power/mass
if "speed" in raw and mass_kg > 0: if "speed" in raw and mass_kg > 0:
# Very rough: speed ~ power / (mass * drag_coeff)
raw["speed"] = min(force_watts / mass_kg * 0.5, 300000) raw["speed"] = min(force_watts / mass_kg * 0.5, 300000)
if "cost_efficiency" in raw: if "cost_efficiency" in raw:
# Lower force = cheaper per km (roughly)
raw["cost_efficiency"] = max(0.01, 2.0 - force_watts / 100000) raw["cost_efficiency"] = max(0.01, 2.0 - force_watts / 100000)
if "safety" in raw: if "safety" in raw:
raw["safety"] = 0.5 # default mid-range raw["safety"] = 0.5
if "availability" in raw: if "availability" in raw:
raw["availability"] = 0.5 raw["availability"] = 0.5
if "range_fuel" in raw: if "range_fuel" in raw:
# More power = more range (very rough)
raw["range_fuel"] = min(force_watts * 0.01, 1e10) raw["range_fuel"] = min(force_watts * 0.01, 1e10)
if "range_degradation" in raw: if "range_degradation" in raw:
raw["range_degradation"] = 365 # 1 year default raw["range_degradation"] = 365
return raw return raw


@@ -13,6 +13,7 @@ class MetricBound:
weight: float # 0.0-1.0 weight: float # 0.0-1.0
norm_min: float # Below this → score 0 norm_min: float # Below this → score 0
norm_max: float # Above this → score 1 norm_max: float # Above this → score 1
unit: str = ""
metric_id: int | None = None metric_id: int | None = None
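
The `norm_min`/`norm_max` bounds above feed the scorer, which CLAUDE.md describes as log-normalizing raw values to 0-1 and combining them with a weighted geometric mean. The exact formulas in `scorer.py` are not shown in this diff, so the following is only a sketch under stated assumptions: `log1p` is used so that a lower bound of 0 stays usable, and any zero score forces the composite to 0. Both function names are hypothetical.

```python
import math


def log_normalize(raw: float, norm_min: float, norm_max: float) -> float:
    """Map a raw value onto 0-1 on a log scale, clamping outside the bounds."""
    if raw <= norm_min:
        return 0.0
    if raw >= norm_max:
        return 1.0
    # Assumption: log1p rather than log, so norm_min == 0 does not blow up.
    lo, hi = math.log1p(norm_min), math.log1p(norm_max)
    return (math.log1p(raw) - lo) / (hi - lo)


def weighted_geometric_mean(scores: list[float], weights: list[float]) -> float:
    """exp(sum(w * ln s) / sum(w)); a single zero score zeroes the composite."""
    total = sum(weights)
    if total <= 0 or any(s <= 0 for s in scores):
        return 0.0
    return math.exp(
        sum(w * math.log(s) for s, w in zip(scores, weights)) / total
    )


# Bounds taken from the urban_commuting "speed" metric (5 to 120).
mid_speed = log_normalize(30, 5, 120)
composite = weighted_geometric_mean([mid_speed, 0.5], [0.25, 0.25])
```

A geometric mean penalizes a combination that fails badly on one metric more than an arithmetic mean would, which suits a filter-then-rank pipeline.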


@@ -243,11 +243,11 @@ URBAN_COMMUTING = Domain(
name="urban_commuting", name="urban_commuting",
description="Daily travel within a city, 1-50km range", description="Daily travel within a city, 1-50km range",
metric_bounds=[ metric_bounds=[
MetricBound("speed", weight=0.25, norm_min=5, norm_max=120), MetricBound("speed", weight=0.25, norm_min=5, norm_max=120, unit="km/h"),
MetricBound("cost_efficiency", weight=0.25, norm_min=0.01, norm_max=2.0), MetricBound("cost_efficiency", weight=0.25, norm_min=0.01, norm_max=2.0, unit="$/km"),
MetricBound("safety", weight=0.25, norm_min=0.0, norm_max=1.0), MetricBound("safety", weight=0.25, norm_min=0.0, norm_max=1.0, unit="0-1"),
MetricBound("availability", weight=0.15, norm_min=0.0, norm_max=1.0), MetricBound("availability", weight=0.15, norm_min=0.0, norm_max=1.0, unit="0-1"),
MetricBound("range_fuel", weight=0.10, norm_min=5, norm_max=500), MetricBound("range_fuel", weight=0.10, norm_min=5, norm_max=500, unit="km"),
], ],
) )
@@ -255,11 +255,11 @@ INTERPLANETARY = Domain(
name="interplanetary_travel", name="interplanetary_travel",
description="Travel between planets within a solar system", description="Travel between planets within a solar system",
metric_bounds=[ metric_bounds=[
MetricBound("speed", weight=0.30, norm_min=1000, norm_max=300000), MetricBound("speed", weight=0.30, norm_min=1000, norm_max=300000, unit="km/s"),
MetricBound("range_fuel", weight=0.30, norm_min=1e6, norm_max=1e10), MetricBound("range_fuel", weight=0.30, norm_min=1e6, norm_max=1e10, unit="km"),
MetricBound("safety", weight=0.20, norm_min=0.0, norm_max=1.0), MetricBound("safety", weight=0.20, norm_min=0.0, norm_max=1.0, unit="0-1"),
MetricBound("cost_efficiency", weight=0.10, norm_min=1e3, norm_max=1e9), MetricBound("cost_efficiency", weight=0.10, norm_min=1e3, norm_max=1e9, unit="$/km"),
MetricBound("range_degradation", weight=0.10, norm_min=100, norm_max=36500), MetricBound("range_degradation", weight=0.10, norm_min=100, norm_max=36500, unit="days"),
], ],
) )


@@ -40,18 +40,8 @@ def entity_new():
dimensions=repo.list_dimensions()) dimensions=repo.list_dimensions())
@bp.route("/<int:entity_id>") @bp.route("/<int:entity_id>", methods=["GET", "POST"])
def entity_detail(entity_id: int): def entity_detail(entity_id: int):
repo = get_repo()
entity = repo.get_entity(entity_id)
if not entity:
flash("Entity not found.", "error")
return redirect(url_for("entities.entity_list"))
return render_template("entities/detail.html", entity=entity)
@bp.route("/<int:entity_id>/edit", methods=["GET", "POST"])
def entity_edit(entity_id: int):
repo = get_repo() repo = get_repo()
entity = repo.get_entity(entity_id) entity = repo.get_entity(entity_id)
if not entity: if not entity:
@@ -62,13 +52,17 @@ def entity_edit(entity_id: int):
description = request.form.get("description", "").strip() description = request.form.get("description", "").strip()
if not name: if not name:
flash("Name is required.", "error") flash("Name is required.", "error")
return render_template("entities/form.html", entity=entity, else:
dimensions=repo.list_dimensions())
repo.update_entity(entity_id, name, description) repo.update_entity(entity_id, name, description)
flash(f"Entity '{name}' updated.", "success") flash(f"Entity '{name}' updated.", "success")
entity = repo.get_entity(entity_id)
return render_template("entities/detail.html", entity=entity)
@bp.route("/<int:entity_id>/edit")
def entity_edit(entity_id: int):
"""Legacy route — redirect to detail page."""
return redirect(url_for("entities.entity_detail", entity_id=entity_id)) return redirect(url_for("entities.entity_detail", entity_id=entity_id))
return render_template("entities/form.html", entity=entity,
dimensions=repo.list_dimensions())
@bp.route("/<int:entity_id>/delete", methods=["POST"]) @bp.route("/<int:entity_id>/delete", methods=["POST"])


@@ -1,7 +1,12 @@
"""Pipeline run routes.""" """Pipeline run routes with background execution and progress monitoring."""
from __future__ import annotations from __future__ import annotations
import json
import os
import threading
from pathlib import Path
from flask import Blueprint, flash, redirect, render_template, request, url_for from flask import Blueprint, flash, redirect, render_template, request, url_for
from physcom_web.app import get_repo from physcom_web.app import get_repo
@@ -9,12 +14,77 @@ from physcom_web.app import get_repo
bp = Blueprint("pipeline", __name__, url_prefix="/pipeline") bp = Blueprint("pipeline", __name__, url_prefix="/pipeline")
def _run_pipeline_in_background(
db_path: str,
domain_name: str,
dim_list: list[str],
passes: list[int],
threshold: float,
run_id: int,
) -> None:
"""Run the pipeline in a background thread with its own DB connection."""
from physcom.db.schema import init_db
from physcom.db.repository import Repository
from physcom.engine.constraint_resolver import ConstraintResolver
from physcom.engine.scorer import Scorer
from physcom.engine.pipeline import Pipeline
try:
conn = init_db(db_path)
repo = Repository(conn)
domain = repo.get_domain(domain_name)
if not domain:
repo.update_pipeline_run(
run_id, status="failed",
error_message=f"Domain '{domain_name}' not found",
)
conn.close()
return
resolver = ConstraintResolver()
scorer = Scorer(domain)
pipeline = Pipeline(repo, resolver, scorer, llm=None)
pipeline.run(
domain, dim_list,
score_threshold=threshold,
passes=passes,
run_id=run_id,
)
except Exception as exc:
try:
repo.update_pipeline_run(
run_id, status="failed",
error_message=str(exc)[:500],
)
except Exception:
pass
finally:
try:
conn.close()
except Exception:
pass
@bp.route("/") @bp.route("/")
def pipeline_form(): def pipeline_form():
repo = get_repo() repo = get_repo()
domains = repo.list_domains() domains = repo.list_domains()
dimensions = repo.list_dimensions() dimensions = repo.list_dimensions()
return render_template("pipeline/run.html", domains=domains, dimensions=dimensions) # Build per-domain summaries
summaries = {}
for d in domains:
summaries[d.name] = repo.get_pipeline_summary(d.name)
# Get recent pipeline runs
runs = repo.list_pipeline_runs()
return render_template(
"pipeline/run.html",
domains=domains,
dimensions=dimensions,
summaries=summaries,
runs=runs,
)
@bp.route("/run", methods=["POST"]) @bp.route("/run", methods=["POST"])
@@ -37,20 +107,50 @@ def pipeline_run():
flash("Select at least one dimension.", "error") flash("Select at least one dimension.", "error")
return redirect(url_for("pipeline.pipeline_form")) return redirect(url_for("pipeline.pipeline_form"))
from physcom.engine.constraint_resolver import ConstraintResolver # Create pipeline_run record
from physcom.engine.scorer import Scorer config = {
from physcom.engine.pipeline import Pipeline "passes": passes,
"threshold": threshold,
"dimensions": dim_list,
}
run_id = repo.create_pipeline_run(domain.id, config)
resolver = ConstraintResolver() # Resolve DB path for the background thread
scorer = Scorer(domain) from physcom_web.app import DEFAULT_DB
pipeline = Pipeline(repo, resolver, scorer, llm=None) db_path = str(Path(os.environ.get("PHYSCOM_DB", str(DEFAULT_DB))))
result = pipeline.run(domain, dim_list, score_threshold=threshold, passes=passes) # Start background thread
t = threading.Thread(
target=_run_pipeline_in_background,
args=(db_path, domain_name, dim_list, passes, threshold, run_id),
daemon=True,
)
t.start()
flash( flash(
f"Pipeline complete: {result.total_generated} combos generated, " f"Pipeline run #{run_id} started for {domain_name} "
f"{result.pass1_valid} valid, {result.pass1_blocked} blocked, " f"(passes {passes}, threshold {threshold}).",
f"{result.pass3_above_threshold} above threshold.", "info",
"success",
) )
return redirect(url_for("results.results_domain", domain_name=domain_name)) return redirect(url_for("pipeline.pipeline_form"))
@bp.route("/runs/<int:run_id>/status")
def run_status(run_id: int):
"""HTMX partial: returns live progress for a single pipeline run."""
repo = get_repo()
run = repo.get_pipeline_run(run_id)
if not run:
return "<p>Run not found.</p>", 404
return render_template("pipeline/_run_status.html", run=run)
@bp.route("/runs/<int:run_id>/cancel", methods=["POST"])
def run_cancel(run_id: int):
"""Set a running pipeline to cancelled. The pipeline checks this flag."""
repo = get_repo()
run = repo.get_pipeline_run(run_id)
if run and run["status"] == "running":
repo.update_pipeline_run(run_id, status="cancelled")
flash(f"Run #{run_id} cancellation requested.", "info")
return redirect(url_for("pipeline.pipeline_form"))
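
Review note: the routes above start the pipeline on a daemon thread that opens its own DB connection, and cancellation works by flipping the run's status so the worker can notice it. The sketch below illustrates that pattern in isolation using only `sqlite3` and `threading`. The `runs` table, its columns, and the `worker` / `start_run` helpers are hypothetical stand-ins for illustration; they are not PhysCom's `pipeline_runs` schema or its `Repository` API.

```python
import sqlite3
import threading


def worker(db_path: str, run_id: int, items: list[int]) -> None:
    """Process items on a private connection; stop early if the run is cancelled."""
    conn = sqlite3.connect(db_path)  # opened inside the thread, never shared
    try:
        conn.execute("UPDATE runs SET status = 'running' WHERE id = ?", (run_id,))
        conn.commit()
        for _ in items:
            # Re-read the status on every step so a cancel request is noticed.
            (status,) = conn.execute(
                "SELECT status FROM runs WHERE id = ?", (run_id,)
            ).fetchone()
            if status == "cancelled":
                return
            conn.execute("UPDATE runs SET done = done + 1 WHERE id = ?", (run_id,))
            conn.commit()  # incremental save after each step
        conn.execute("UPDATE runs SET status = 'completed' WHERE id = ?", (run_id,))
        conn.commit()
    except Exception as exc:
        conn.execute(
            "UPDATE runs SET status = 'failed', error = ? WHERE id = ?",
            (str(exc)[:500], run_id),
        )
        conn.commit()
    finally:
        conn.close()


def start_run(db_path: str, items: list[int]) -> tuple[int, threading.Thread]:
    """Create a pending run row, then hand the work to a background thread."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS runs ("
        "id INTEGER PRIMARY KEY, status TEXT, done INTEGER DEFAULT 0, error TEXT)"
    )
    cur = conn.execute("INSERT INTO runs (status) VALUES ('pending')")
    conn.commit()
    run_id = cur.lastrowid
    conn.close()
    thread = threading.Thread(
        target=worker, args=(db_path, run_id, items), daemon=True
    )
    thread.start()
    return run_id, thread
```

Passing the database path rather than a connection object matters here: by default a `sqlite3` connection may only be used from the thread that created it, so each thread opens its own.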


@@ -26,7 +26,8 @@ def results_domain(domain_name: str):
status_filter = request.args.get("status") status_filter = request.args.get("status")
results = repo.get_all_results(domain_name, status=status_filter) results = repo.get_all_results(domain_name, status=status_filter)
statuses = repo.count_combinations_by_status() # Domain-scoped status counts (only combos that have results in this domain)
statuses = repo.count_combinations_by_status(domain_name=domain_name)
return render_template( return render_template(
"results/list.html", "results/list.html",
@@ -35,6 +36,7 @@ def results_domain(domain_name: str):
results=results, results=results,
status_filter=status_filter, status_filter=status_filter,
statuses=statuses, statuses=statuses,
total_results=sum(statuses.values()),
) )


@@ -154,3 +154,77 @@ dd { font-size: 0.9rem; }
/* ── Dep add form ────────────────────────────────────────── */ /* ── Dep add form ────────────────────────────────────────── */
.dep-add-form { margin-top: 0.75rem; } .dep-add-form { margin-top: 0.75rem; }
/* ── Form hints ──────────────────────────────────────────── */
.form-hint { color: #666; font-size: 0.8rem; margin-bottom: 0.25rem; font-weight: 400; }
/* ── Vertical checkbox list ──────────────────────────────── */
.checkbox-col { display: flex; flex-direction: column; gap: 0.5rem; }
.checkbox-col label { display: flex; align-items: baseline; gap: 0.4rem; font-size: 0.9rem; }
.checkbox-col label .form-hint { display: block; margin-left: 1.3rem; }
/* ── Summary DL (pipeline) ───────────────────────────────── */
.summary-dl { display: grid; grid-template-columns: auto 1fr; gap: 0.15rem 1rem; }
/* ── Pipeline run status ────────────────────────────────── */
.badge-running { background: #dbeafe; color: #1e40af; }
.badge-completed { background: #dcfce7; color: #166534; }
.badge-failed { background: #fee2e2; color: #991b1b; }
.badge-cancelled { background: #fef3c7; color: #92400e; }
.run-status { padding: 0.25rem 0; }
.run-status-header { display: flex; align-items: center; gap: 0.5rem; margin-bottom: 0.5rem; }
.run-status-label { font-weight: 600; font-size: 0.9rem; }
.progress-bar-container {
background: #e5e7eb;
border-radius: 4px;
height: 8px;
overflow: hidden;
margin-bottom: 0.35rem;
}
.progress-bar {
background: #2563eb;
height: 100%;
border-radius: 4px;
transition: width 0.3s ease;
}
.run-status-counters {
display: flex;
gap: 1rem;
font-size: 0.8rem;
color: #555;
margin-bottom: 0.35rem;
}
.run-status-actions { margin-top: 0.35rem; }
/* ── Block reason ───────────────────────────────────────── */
.block-reason-cell {
font-size: 0.8rem;
color: #666;
max-width: 350px;
word-break: break-word;
}
/* ── Metric position bar ────────────────────────────────── */
.metric-bar-container {
display: inline-block;
width: 60px;
height: 6px;
background: #e5e7eb;
border-radius: 3px;
overflow: hidden;
vertical-align: middle;
}
.metric-bar {
height: 100%;
background: #2563eb;
border-radius: 3px;
}
.metric-bar-label {
font-size: 0.75rem;
color: #666;
margin-left: 0.3rem;
}


@@ -7,14 +7,13 @@
{% if not domains %} {% if not domains %}
<p class="empty">No domains found. Seed data via CLI first.</p> <p class="empty">No domains found. Seed data via CLI first.</p>
{% else %} {% else %}
<div class="card-grid">
{% for d in domains %} {% for d in domains %}
<div class="card"> <div class="card">
<h2>{{ d.name }}</h2> <h2>{{ d.name }}</h2>
<p>{{ d.description }}</p> <p>{{ d.description }}</p>
<table> <table>
<thead> <thead>
<tr><th>Metric</th><th>Weight</th><th>Min</th><th>Max</th></tr> <tr><th>Metric</th><th>Weight</th><th>Norm Min</th><th>Norm Max</th></tr>
</thead> </thead>
<tbody> <tbody>
{% for mb in d.metric_bounds %} {% for mb in d.metric_bounds %}
@@ -29,6 +28,5 @@
</table> </table>
</div> </div>
{% endfor %} {% endfor %}
</div>
{% endif %} {% endif %}
{% endblock %} {% endblock %}


@@ -3,21 +3,29 @@
{% block content %} {% block content %}
<div class="page-header"> <div class="page-header">
<h1>{{ entity.name }}</h1> <h1>{{ entity.name }} <span class="subtitle">{{ entity.dimension }}</span></h1>
<div>
<a href="{{ url_for('entities.entity_edit', entity_id=entity.id) }}" class="btn">Edit</a>
<form method="post" action="{{ url_for('entities.entity_delete', entity_id=entity.id) }}" class="inline-form" <form method="post" action="{{ url_for('entities.entity_delete', entity_id=entity.id) }}" class="inline-form"
onsubmit="return confirm('Delete this entity?')"> onsubmit="return confirm('Delete this entity and all its dependencies?')">
<button type="submit" class="btn btn-danger">Delete</button> <button type="submit" class="btn btn-danger">Delete Entity</button>
</form> </form>
</div>
</div> </div>
<div class="card"> <div class="card">
<dl> <form method="post" action="{{ url_for('entities.entity_detail', entity_id=entity.id) }}">
<dt>Dimension</dt><dd>{{ entity.dimension }}</dd> <div class="form-row">
<dt>Description</dt><dd>{{ entity.description or '—' }}</dd> <div class="form-group" style="flex:1">
</dl> <label for="name">Name</label>
<input type="text" id="name" name="name" value="{{ entity.name }}" required>
</div>
<div class="form-group" style="flex:2">
<label for="description">Description</label>
<input type="text" id="description" name="description" value="{{ entity.description }}">
</div>
<div class="form-group" style="align-self:end">
<button type="submit" class="btn btn-primary">Save</button>
</div>
</div>
</form>
</div> </div>
<h2>Dependencies</h2> <h2>Dependencies</h2>


@@ -0,0 +1,78 @@
{# HTMX partial: live status for a single pipeline run #}
<div class="run-status run-status-{{ run.status }}"
{% if run.status == 'running' or run.status == 'pending' %}
hx-get="{{ url_for('pipeline.run_status', run_id=run.id) }}"
hx-trigger="every 2s"
hx-swap="outerHTML"
{% endif %}>
<div class="run-status-header">
<span class="badge badge-{{ run.status }}">{{ run.status }}</span>
<span class="run-status-label">Run #{{ run.id }}</span>
{% if run.current_pass %}
<span class="subtitle">Processing pass {{ run.current_pass }}</span>
{% endif %}
</div>
{% if run.total_combos and run.total_combos > 0 %}
{% set done = run.combos_pass1 or 0 %}
{% set pct = (done / run.total_combos * 100) | int %}
<div class="progress-bar-container">
<div class="progress-bar" style="width: {{ pct }}%"></div>
</div>
<div class="run-status-counters">
<span>{{ done }} / {{ run.total_combos }} combos processed</span>
</div>
<table class="compact" style="margin-top:0.35rem">
<thead>
<tr><th>Pass</th><th>Result</th></tr>
</thead>
<tbody>
{% if (run.combos_pass1 or 0) > 0 %}
{% set valid = (run.combos_pass1 or 0) - (run.total_combos - (run.combos_pass2 or 0)) if (run.combos_pass2 or 0) > 0 else (run.combos_pass1 or 0) %}
<tr>
<td>1 — Constraints</td>
<td>{{ run.combos_pass1 or 0 }} checked
{%- if (run.combos_pass2 or 0) > 0 and (run.combos_pass1 or 0) > (run.combos_pass2 or 0) %},
<span class="badge badge-blocked">{{ (run.combos_pass1 or 0) - (run.combos_pass2 or 0) }} blocked</span>
{%- endif -%}
</td>
</tr>
{% endif %}
{% if (run.combos_pass2 or 0) > 0 %}
<tr>
<td>2 — Estimation</td>
<td>{{ run.combos_pass2 or 0 }} estimated</td>
</tr>
{% endif %}
{% if (run.combos_pass3 or 0) > 0 %}
<tr>
<td>3 — Scoring</td>
<td>{{ run.combos_pass3 or 0 }} scored</td>
</tr>
{% endif %}
{% if (run.combos_pass4 or 0) > 0 %}
<tr>
<td>4 — LLM Review</td>
<td>{{ run.combos_pass4 or 0 }} reviewed</td>
</tr>
{% endif %}
</tbody>
</table>
{% endif %}
{% if run.error_message %}
<div class="flash flash-error" style="margin-top:0.5rem">{{ run.error_message }}</div>
{% endif %}
<div class="run-status-actions">
{% if run.status == 'running' %}
<form method="post" action="{{ url_for('pipeline.run_cancel', run_id=run.id) }}" class="inline-form">
<button type="submit" class="btn btn-danger btn-sm">Cancel</button>
</form>
{% endif %}
{% if run.status == 'completed' %}
<a href="{{ url_for('results.results_index') }}" class="btn btn-sm">View results</a>
{% endif %}
</div>
</div>


@@ -8,6 +8,7 @@
<form method="post" action="{{ url_for('pipeline.pipeline_run') }}"> <form method="post" action="{{ url_for('pipeline.pipeline_run') }}">
<div class="form-group"> <div class="form-group">
<label for="domain">Domain</label> <label for="domain">Domain</label>
<p class="form-hint">The evaluation context that defines which metrics matter and how they're weighted.</p>
<select name="domain" id="domain" required> <select name="domain" id="domain" required>
<option value="">— select —</option> <option value="">— select —</option>
{% for d in domains %} {% for d in domains %}
@@ -18,30 +19,45 @@
<fieldset> <fieldset>
<legend>Passes</legend> <legend>Passes</legend>
<div class="checkbox-row"> <p class="form-hint">Each pass progressively filters and enriches combinations. Later passes depend on earlier ones.</p>
{% for p in [1, 2, 3, 4, 5] %} <div class="checkbox-col">
<label> <label>
<input type="checkbox" name="passes" value="{{ p }}" <input type="checkbox" name="passes" value="1" checked>
{{ 'checked' if p <= 3 }}> <strong>Pass 1 — Constraint Resolution</strong>
Pass {{ p }} <span class="form-hint">Checks requires/provides/excludes compatibility between entities. Blocks impossible combinations.</span>
{% if p == 1 %}(Constraints) </label>
{% elif p == 2 %}(Estimation) <label>
{% elif p == 3 %}(Scoring) <input type="checkbox" name="passes" value="2" checked>
{% elif p == 4 %}(LLM Review) <strong>Pass 2 — Physics Estimation</strong>
{% elif p == 5 %}(Human Review) <span class="form-hint">Estimates raw metric values (speed, cost, etc.) using heuristics or an LLM. Without an LLM provider, uses a force/mass stub.</span>
{% endif %} </label>
<label>
<input type="checkbox" name="passes" value="3" checked>
<strong>Pass 3 — Scoring &amp; Ranking</strong>
<span class="form-hint">Normalizes estimates against domain bounds and computes a weighted geometric mean composite score.</span>
</label>
<label>
<input type="checkbox" name="passes" value="4">
<strong>Pass 4 — LLM Review</strong>
<span class="form-hint">Sends top combinations to an LLM for a plausibility and novelty assessment. Requires an LLM provider to be configured.</span>
</label>
<label>
<input type="checkbox" name="passes" value="5">
<strong>Pass 5 — Human Review</strong>
<span class="form-hint">Marks results as ready for human review on the Results page.</span>
</label> </label>
{% endfor %}
</div> </div>
</fieldset> </fieldset>
<div class="form-group"> <div class="form-group">
<label for="threshold">Score Threshold</label> <label for="threshold">Score Threshold</label>
<p class="form-hint">Minimum composite score (0-1) for a combination to pass scoring. Lower values keep more results; higher values are more selective.</p>
<input type="number" name="threshold" id="threshold" value="0.1" step="0.01" min="0" max="1"> <input type="number" name="threshold" id="threshold" value="0.1" step="0.01" min="0" max="1">
</div> </div>
<fieldset> <fieldset>
<legend>Dimensions</legend> <legend>Dimensions</legend>
<p class="form-hint">Which entity dimensions to combine. The pipeline generates the Cartesian product of all entities in the selected dimensions.</p>
<div class="checkbox-row"> <div class="checkbox-row">
{% for d in dimensions %} {% for d in dimensions %}
<label> <label>
@@ -57,4 +73,76 @@
</div> </div>
</form> </form>
</div> </div>
{% set active_runs = runs | selectattr('status', 'in', ['pending', 'running']) | list %}
{% if active_runs %}
<h2>Active Runs</h2>
{% for run in active_runs %}
<div class="card"
hx-get="{{ url_for('pipeline.run_status', run_id=run.id) }}"
hx-trigger="every 2s"
hx-swap="innerHTML">
{% include "pipeline/_run_status.html" %}
</div>
{% endfor %}
{% endif %}
{% if runs %}
<h2>Run History</h2>
<table>
<thead>
<tr>
<th>ID</th>
<th>Domain</th>
<th>Status</th>
<th>Total</th>
<th>P1 Checked</th>
<th>P1 Blocked</th>
<th>P2 Estimated</th>
<th>P3 Scored</th>
<th>P4 Reviewed</th>
<th>Started</th>
</tr>
</thead>
<tbody>
{% for run in runs %}
{% set blocked = (run.combos_pass1 or 0) - (run.combos_pass2 or 0) if (run.combos_pass2 or 0) > 0 and (run.combos_pass1 or 0) > (run.combos_pass2 or 0) else 0 %}
<tr>
<td>{{ run.id }}</td>
<td>{{ run.domain_name }}</td>
<td><span class="badge badge-{{ run.status }}">{{ run.status }}</span></td>
<td>{{ run.total_combos or '—' }}</td>
<td>{{ run.combos_pass1 or '—' }}</td>
<td>{% if blocked %}<span class="badge badge-blocked">{{ blocked }}</span>{% else %}—{% endif %}</td>
<td>{{ run.combos_pass2 or '—' }}</td>
<td>{{ run.combos_pass3 or '—' }}</td>
<td>{{ run.combos_pass4 or '—' }}</td>
<td>{{ run.started_at or run.created_at }}</td>
</tr>
{% endfor %}
</tbody>
</table>
{% endif %}
{% if summaries.values()|select|list %}
<h2>Domain Summaries</h2>
{% for d in domains %}
{% set s = summaries[d.name] %}
{% if s %}
<div class="card">
<h3>{{ d.name }} <span class="subtitle">{{ d.description }}</span></h3>
<dl class="summary-dl">
<dt>Results</dt><dd>{{ s.total_results }} scored combinations</dd>
<dt>Blocked</dt><dd>{{ s.blocked }} combinations</dd>
<dt>Score range</dt><dd class="score-cell">{{ "%.4f"|format(s.min_score) }} — {{ "%.4f"|format(s.max_score) }}</dd>
<dt>Avg score</dt><dd class="score-cell">{{ "%.4f"|format(s.avg_score) }}</dd>
<dt>Last pass</dt><dd>{{ s.last_pass }}</dd>
</dl>
<div style="margin-top:0.5rem">
<a href="{{ url_for('results.results_domain', domain_name=d.name) }}" class="btn btn-sm">View results</a>
</div>
</div>
{% endif %}
{% endfor %}
{% endif %}
{% endblock %} {% endblock %}
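
Review note: the Pass 3 hint in this template says estimates are normalized against domain bounds and combined into a weighted geometric mean. A minimal sketch of that idea follows. It assumes linear min-max normalization with clamping; the actual `Scorer` is not shown in this diff and may normalize differently (for example on a log scale). The `normalize` and `composite_score` functions are hypothetical names used only for illustration.

```python
import math


def normalize(raw: float, norm_min: float, norm_max: float) -> float:
    """Linearly map raw into [0, 1], clamping values outside the bounds."""
    if norm_max <= norm_min:
        raise ValueError("norm_max must exceed norm_min")
    return min(1.0, max(0.0, (raw - norm_min) / (norm_max - norm_min)))


def composite_score(scores: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted geometric mean: prod(s_i ** w_i) ** (1 / sum(w_i))."""
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    if any(scores[name] <= 0.0 for name in weights):
        return 0.0  # a zero on any weighted metric zeroes the whole product
    log_sum = sum(w * math.log(scores[name]) for name, w in weights.items())
    return math.exp(log_sum / total_weight)
```

Unlike a weighted arithmetic mean, the geometric form lets a single very poor metric drag the composite toward zero, which fits a filter that is meant to reject combinations failing on any one dimension.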


@@ -55,19 +55,55 @@
{% if scores %} {% if scores %}
<h2>Per-Metric Scores</h2> <h2>Per-Metric Scores</h2>
{% set bounds = {} %}
{% for mb in domain.metric_bounds %}
{% set _ = bounds.update({mb.metric_name: mb}) %}
{% endfor %}
<div class="card"> <div class="card">
<table> <table>
<thead> <thead>
<tr><th>Metric</th><th>Raw Value</th><th>Normalized</th><th>Method</th><th>Confidence</th></tr> <tr>
<th>Metric</th>
<th>Raw Value</th>
<th>Domain Range</th>
<th>Position</th>
<th>Normalized</th>
<th>Weight</th>
</tr>
</thead> </thead>
<tbody> <tbody>
{% for s in scores %} {% for s in scores %}
{% set mb = bounds.get(s.metric_name) %}
<tr> <tr>
<td>{{ s.metric_name }}</td> <td>{{ s.metric_name }}</td>
<td>{{ "%.2f"|format(s.raw_value) if s.raw_value is not none else '' }}</td> {% set unit = s.metric_unit or '' %}
<td class="score-cell">{{ "%.2f"|format(s.raw_value) if s.raw_value is not none else '—' }}{{ ' ' + unit if unit and s.raw_value is not none else '' }}</td>
<td>
{%- if mb -%}
{{ "%.2f"|format(mb.norm_min) }} — {{ "%.2f"|format(mb.norm_max) }}{{ ' ' + unit if unit else '' }}
{%- else -%}
{%- endif -%}
</td>
<td>
{%- if mb and s.raw_value is not none -%}
{%- if s.raw_value <= mb.norm_min -%}
<span class="badge badge-blocked">at/below min</span>
{%- elif s.raw_value >= mb.norm_max -%}
<span class="badge badge-valid">at/above max</span>
{%- else -%}
{% set pct = ((s.raw_value - mb.norm_min) / (mb.norm_max - mb.norm_min) * 100) | int %}
<div class="metric-bar-container">
<div class="metric-bar" style="width: {{ pct }}%"></div>
</div>
<span class="metric-bar-label">~{{ pct }}%</span>
{%- endif -%}
{%- else -%}
{%- endif -%}
</td>
<td class="score-cell">{{ "%.4f"|format(s.normalized_score) if s.normalized_score is not none else '—' }}</td> <td class="score-cell">{{ "%.4f"|format(s.normalized_score) if s.normalized_score is not none else '—' }}</td>
<td>{{ s.estimation_method or '—' }}</td> <td>{{ "%.0f%%"|format(mb.weight * 100) if mb else '—' }}</td>
<td>{{ "%.2f"|format(s.confidence) if s.confidence is not none else '—' }}</td>
</tr> </tr>
{% endfor %} {% endfor %}
</tbody> </tbody>


@@ -21,7 +21,7 @@
<div class="filter-row"> <div class="filter-row">
<span>Filter:</span> <span>Filter:</span>
<a href="{{ url_for('results.results_domain', domain_name=domain.name) }}" <a href="{{ url_for('results.results_domain', domain_name=domain.name) }}"
class="btn btn-sm {{ '' if status_filter else 'btn-primary' }}">All</a> class="btn btn-sm {{ '' if status_filter else 'btn-primary' }}">All ({{ total_results }})</a>
{% for s, cnt in statuses.items() %} {% for s, cnt in statuses.items() %}
<a href="{{ url_for('results.results_domain', domain_name=domain.name, status=s) }}" <a href="{{ url_for('results.results_domain', domain_name=domain.name, status=s) }}"
class="btn btn-sm {{ 'btn-primary' if status_filter == s else '' }}"> class="btn btn-sm {{ 'btn-primary' if status_filter == s else '' }}">
@@ -32,7 +32,11 @@
{% endif %} {% endif %}
{% if not results %} {% if not results %}
<p class="empty">No results yet. <a href="{{ url_for('pipeline.pipeline_form') }}">Run the pipeline</a> first.</p> {% if status_filter %}
<p class="empty">No results with status "{{ status_filter }}" in this domain.</p>
{% else %}
<p class="empty">No results for this domain yet. <a href="{{ url_for('pipeline.pipeline_form') }}">Run the pipeline</a> first.</p>
{% endif %}
{% else %} {% else %}
<table> <table>
<thead> <thead>
@@ -41,7 +45,7 @@
<th>Score</th> <th>Score</th>
<th>Entities</th> <th>Entities</th>
<th>Status</th> <th>Status</th>
<th>Novelty</th> <th>Details</th>
<th></th> <th></th>
</tr> </tr>
</thead> </thead>
@@ -49,10 +53,18 @@
{% for r in results %} {% for r in results %}
<tr> <tr>
<td>{{ loop.index }}</td> <td>{{ loop.index }}</td>
<td class="score-cell">{{ "%.4f"|format(r.composite_score) }}</td> <td class="score-cell">{{ "%.4f"|format(r.composite_score) if r.composite_score else '—' }}</td>
<td>{{ r.combination.entities|map(attribute='name')|join(' + ') }}</td> <td>{{ r.combination.entities|map(attribute='name')|join(' + ') }}</td>
<td><span class="badge badge-{{ r.combination.status }}">{{ r.combination.status }}</span></td> <td><span class="badge badge-{{ r.combination.status }}">{{ r.combination.status }}</span></td>
<td>{{ r.novelty_flag or '—' }}</td> <td class="block-reason-cell">
{%- if r.combination.status == 'blocked' and r.combination.block_reason -%}
{{ r.combination.block_reason }}
{%- elif r.novelty_flag -%}
{{ r.novelty_flag }}
{%- else -%}
{%- endif -%}
</td>
<td> <td>
<a href="{{ url_for('results.result_detail', domain_name=domain.name, combo_id=r.combination.id) }}" <a href="{{ url_for('results.result_detail', domain_name=domain.name, combo_id=r.combination.id) }}"
class="btn btn-sm">View</a> class="btn btn-sm">View</a>


@@ -0,0 +1,305 @@
"""Tests for async pipeline: resume, cancellation, status guard, run lifecycle."""
import json
from physcom.engine.constraint_resolver import ConstraintResolver
from physcom.engine.scorer import Scorer
from physcom.engine.pipeline import Pipeline, CancelledError
def test_pipeline_run_lifecycle(seeded_repo):
"""Pipeline run should transition: pending -> running -> completed."""
repo = seeded_repo
domain = repo.get_domain("urban_commuting")
config = {"passes": [1, 2, 3], "threshold": 0.1, "dimensions": ["platform", "power_source"]}
run_id = repo.create_pipeline_run(domain.id, config)
run = repo.get_pipeline_run(run_id)
assert run["status"] == "pending"
resolver = ConstraintResolver()
scorer = Scorer(domain)
pipeline = Pipeline(repo, resolver, scorer)
pipeline.run(domain, ["platform", "power_source"], passes=[1, 2, 3], run_id=run_id)
run = repo.get_pipeline_run(run_id)
assert run["status"] == "completed"
assert run["total_combos"] == 81
assert run["started_at"] is not None
assert run["completed_at"] is not None
def test_pipeline_run_failed(seeded_repo):
"""Pipeline run should be marked failed on error."""
repo = seeded_repo
domain = repo.get_domain("urban_commuting")
config = {"passes": [1], "threshold": 0.1, "dimensions": ["platform", "power_source"]}
run_id = repo.create_pipeline_run(domain.id, config)
# Manually mark as failed (simulating what the web route does on exception)
repo.update_pipeline_run(run_id, status="failed", error_message="Test error")
run = repo.get_pipeline_run(run_id)
assert run["status"] == "failed"
assert run["error_message"] == "Test error"
def test_resume_skips_completed_combos(seeded_repo):
"""Re-running the same passes on the same domain should skip already-completed combos."""
repo = seeded_repo
domain = repo.get_domain("urban_commuting")
resolver = ConstraintResolver()
scorer = Scorer(domain)
pipeline = Pipeline(repo, resolver, scorer)
# First run: passes 1-3
run_id_1 = repo.create_pipeline_run(domain.id, {"passes": [1, 2, 3]})
result1 = pipeline.run(
domain, ["platform", "power_source"],
score_threshold=0.01, passes=[1, 2, 3], run_id=run_id_1,
)
assert result1.pass2_estimated > 0
first_estimated = result1.pass2_estimated
# Second run: same passes — should skip all combos (already pass_reached >= 3)
run_id_2 = repo.create_pipeline_run(domain.id, {"passes": [1, 2, 3]})
result2 = pipeline.run(
domain, ["platform", "power_source"],
score_threshold=0.01, passes=[1, 2, 3], run_id=run_id_2,
)
# pass2_estimated still counted (reloaded from DB) but no new estimation work
# The key thing: the run completes successfully
assert result2.total_generated == result1.total_generated
run2 = repo.get_pipeline_run(run_id_2)
assert run2["status"] == "completed"
def test_cancellation_stops_processing(seeded_repo):
"""Cancelling a run mid-flight should stop the pipeline gracefully."""
repo = seeded_repo
domain = repo.get_domain("urban_commuting")
resolver = ConstraintResolver()
scorer = Scorer(domain)
pipeline = Pipeline(repo, resolver, scorer)
run_id = repo.create_pipeline_run(domain.id, {"passes": [1, 2, 3]})
# Pre-cancel the run before it starts processing
repo.update_pipeline_run(run_id, status="running")
repo.update_pipeline_run(run_id, status="cancelled")
result = pipeline.run(
domain, ["platform", "power_source"],
score_threshold=0.01, passes=[1, 2, 3], run_id=run_id,
)
# Should have stopped without processing all combos
run = repo.get_pipeline_run(run_id)
assert run["status"] == "cancelled"
# The pipeline was cancelled before any combo processing could happen
assert result.pass2_estimated == 0
def test_status_guard_no_downgrade_reviewed(seeded_repo):
"""update_combination_status should not downgrade 'reviewed' to 'scored'."""
repo = seeded_repo
domain = repo.get_domain("urban_commuting")
resolver = ConstraintResolver()
scorer = Scorer(domain)
pipeline = Pipeline(repo, resolver, scorer)
# Run pipeline to get scored combos
result = pipeline.run(
domain, ["platform", "power_source"],
score_threshold=0.01, passes=[1, 2, 3],
)
# Find a scored combo and manually mark it as reviewed
scored_combos = repo.list_combinations(status="scored")
assert len(scored_combos) > 0
combo = scored_combos[0]
repo.conn.execute(
"UPDATE combinations SET status = 'reviewed' WHERE id = ?", (combo.id,)
)
repo.conn.commit()
# Attempt to downgrade to 'scored'
repo.update_combination_status(combo.id, "scored")
# Should still be 'reviewed'
reloaded = repo.get_combination(combo.id)
assert reloaded.status == "reviewed"
def test_human_notes_preserved_on_rerun(seeded_repo):
"""Human notes should not be overwritten when re-running the pipeline."""
repo = seeded_repo
domain = repo.get_domain("urban_commuting")
resolver = ConstraintResolver()
scorer = Scorer(domain)
pipeline = Pipeline(repo, resolver, scorer)
# First run
pipeline.run(
domain, ["platform", "power_source"],
score_threshold=0.01, passes=[1, 2, 3],
)
# Add human notes to a result
results = repo.get_all_results(domain.name)
assert len(results) > 0
target = results[0]
combo_id = target["combination"].id
domain_id = target["domain_id"]
repo.save_result(
combo_id, domain_id,
target["composite_score"],
pass_reached=target["pass_reached"],
novelty_flag=target["novelty_flag"],
human_notes="Important human insight",
)
# Clear pass_reached so re-run processes this combo again
repo.conn.execute(
"""UPDATE combination_results SET pass_reached = 0
WHERE combination_id = ? AND domain_id = ?""",
(combo_id, domain_id),
)
repo.conn.commit()
# Re-run pipeline
pipeline.run(
domain, ["platform", "power_source"],
score_threshold=0.01, passes=[1, 2, 3],
)
# Check that human_notes survived
result = repo.get_existing_result(combo_id, domain_id)
assert result["human_notes"] == "Important human insight"
def test_list_pipeline_runs(seeded_repo):
"""list_pipeline_runs should return runs for a domain or all domains."""
repo = seeded_repo
domain = repo.get_domain("urban_commuting")
run_id_1 = repo.create_pipeline_run(domain.id, {"passes": [1]})
run_id_2 = repo.create_pipeline_run(domain.id, {"passes": [1, 2, 3]})
all_runs = repo.list_pipeline_runs()
assert len(all_runs) >= 2
domain_runs = repo.list_pipeline_runs(domain_id=domain.id)
assert len(domain_runs) >= 2
assert all(r["domain_id"] == domain.id for r in domain_runs)
def test_get_combo_pass_reached(seeded_repo):
"""get_combo_pass_reached returns the correct pass level."""
repo = seeded_repo
domain = repo.get_domain("urban_commuting")
resolver = ConstraintResolver()
scorer = Scorer(domain)
pipeline = Pipeline(repo, resolver, scorer)
pipeline.run(
domain, ["platform", "power_source"],
score_threshold=0.01, passes=[1, 2, 3],
)
# Get a scored combo
scored_combos = repo.list_combinations(status="scored")
assert len(scored_combos) > 0
combo = scored_combos[0]
pass_reached = repo.get_combo_pass_reached(combo.id, domain.id)
assert pass_reached == 3
# Non-existent combo
assert repo.get_combo_pass_reached(99999, domain.id) is None


def test_blocked_combos_have_results(seeded_repo):
    """Blocked combinations should still appear in combination_results."""
    repo = seeded_repo
    domain = repo.get_domain("urban_commuting")
    resolver = ConstraintResolver()
    scorer = Scorer(domain)
    pipeline = Pipeline(repo, resolver, scorer)
    result = pipeline.run(
        domain, ["platform", "power_source"],
        score_threshold=0.01, passes=[1, 2, 3],
    )
    assert result.pass1_blocked > 0
    # All combos (blocked + scored) should have result rows
    all_results = repo.get_all_results(domain.name)
    total_with_results = len(all_results)
    # blocked combos get pass_reached=1 results, non-blocked get pass_reached=3
    assert total_with_results == result.pass1_blocked + result.pass3_scored
    # Blocked combos should have pass_reached=1 and composite_score=0.0
    blocked_results = [r for r in all_results if r["combination"].status == "blocked"]
    assert len(blocked_results) == result.pass1_blocked
    for br in blocked_results:
        assert br["pass_reached"] == 1
        assert br["composite_score"] == 0.0


def test_all_passes_run_and_tracked(seeded_repo):
    """With passes [1,2,3], all three should show nonzero counts in run record."""
    repo = seeded_repo
    domain = repo.get_domain("urban_commuting")
    resolver = ConstraintResolver()
    scorer = Scorer(domain)
    pipeline = Pipeline(repo, resolver, scorer)
    run_id = repo.create_pipeline_run(domain.id, {"passes": [1, 2, 3]})
    result = pipeline.run(
        domain, ["platform", "power_source"],
        score_threshold=0.01, passes=[1, 2, 3], run_id=run_id,
    )
    run = repo.get_pipeline_run(run_id)
    assert run["combos_pass1"] > 0, "Pass 1 counter should be nonzero"
    assert run["combos_pass2"] > 0, "Pass 2 counter should be nonzero"
    assert run["combos_pass3"] > 0, "Pass 3 counter should be nonzero"
    # Pass 2 should equal valid + conditional (blocked don't get estimated)
    assert run["combos_pass2"] == result.pass2_estimated
    # Pass 3 should equal pass3_scored (all scored combos, not just above threshold)
    assert run["combos_pass3"] == result.pass3_scored


def test_save_combination_loads_existing_status(seeded_repo):
    """save_combination should load the status of an existing combo from DB."""
    repo = seeded_repo
    from physcom.models.combination import Combination
    from physcom.models.entity import Entity
    entities = repo.list_entities(dimension="platform")[:1] + repo.list_entities(dimension="power_source")[:1]
    combo = Combination(entities=entities)
    saved = repo.save_combination(combo)
    assert saved.status == "pending"
    # Mark it blocked in DB
    repo.update_combination_status(saved.id, "blocked", "test reason")
    # Re-saving should pick up the blocked status
    combo2 = Combination(entities=entities)
    reloaded = repo.save_combination(combo2)
    assert reloaded.id == saved.id
    assert reloaded.status == "blocked"
    assert reloaded.block_reason == "test reason"