Add pluggable LLM support with Gemini provider

- Add LLMProvider registry (llm/registry.py) that builds a provider from
  env vars (LLM_PROVIDER, GEMINI_API_KEY, GEMINI_MODEL)
- Add GeminiLLMProvider using the google-genai SDK
- Wire build_llm_provider() into CLI and web pipeline route (replacing llm=None)
- Wrap pass 2 and pass 4 LLM calls in per-combo try/except so API errors
  skip individual combos rather than aborting the whole run
- Add gemini optional dep to pyproject.toml; Dockerfile installs [web,gemini]
- Document env vars in .env.example and README
- Lower requires-python to >=3.10 to match installed system Python

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
This commit is contained in:
2026-02-18 22:04:35 -06:00
parent f1b3c75190
commit 20dae0dce3
10 changed files with 204 additions and 40 deletions

11
.env.example Normal file
View File

@@ -0,0 +1,11 @@
# Copy to .env — FLASK_SECRET_KEY is auto-generated on first run if omitted.
FLASK_SECRET_KEY=
# LLM provider (leave blank to use stub estimation)
# Supported: gemini
LLM_PROVIDER=
# Gemini (required when LLM_PROVIDER=gemini)
# Install: pip install -e '.[gemini]' (from repo root)
GEMINI_API_KEY=
GEMINI_MODEL=gemini-2.0-flash

View File

@@ -5,7 +5,7 @@ WORKDIR /app
COPY pyproject.toml .
COPY src/ src/
RUN pip install --no-cache-dir ".[web]"
RUN pip install --no-cache-dir ".[web,gemini]"
VOLUME /app/data
ENV PHYSCOM_DB=/app/data/physcom.db

View File

@@ -41,6 +41,55 @@ Putting together lists 1 and 2 we can create 81 mostly novel forms of transporta
Using these metrics this experiment intends to sift vaguely reasonable concepts from nonsense. Its shortlist may include concepts that sound bizarre but may be technically plausible. Bicycles, motorcycles, and e-bikes all had their turn. Why not hydrogen-bikes?
## Setup
### Docker (recommended)
```bash
docker compose up web
```
Then open [http://localhost:5000](http://localhost:5000).
Seed the database with the transport example:
```bash
docker compose run cli seed transport
```
### Local development
```bash
pip install -e ".[dev,web]"
python -m physcom init
python -m physcom seed transport
python -m physcom_web
```
Then open [http://localhost:5000](http://localhost:5000).
Run tests:
```bash
python -m pytest tests/ -q
```
### LLM integration (optional)
By default the pipeline uses stub estimation. To enable Gemini:
```bash
pip install -e ".[gemini]"
export LLM_PROVIDER=gemini
export GEMINI_API_KEY=your_key_here
# export GEMINI_MODEL=gemini-2.0-flash # optional, this is the default
physcom run urban_commuting --passes 1,2,3,4
```
Copy `.env.example` to `.env` and fill in your key for persistent configuration.
---
A few notes: the thin atmosphere and the sun are obvious dependencies of the solar-sail power source. Dependencies would include things such as scale of force (a nuclear reactor vs. pedalling obviously has an important force differential) and geographic requirements (walking requires ground and gravity). The project should include, on every entity, a list of dependencies. The viability tester will need to pull in all of these dependencies to ensure they do not contradict one another.
Additionally, metrics are expected to be extremely close to full points or none at all. The speed of a person pushing a car is effectively zero in its domain, whereas a rocket-powered car would easily reach the limits of speed in the domain. The resulting multiplication between metrics to get the viability score will be heavily logarithmic. This is expected, and is intended as a filter to eliminate concepts that are technically plausible but completely pointless in practice.

View File

@@ -7,8 +7,13 @@ services:
- physcom-data:/app/data
environment:
- PHYSCOM_DB=/app/data/physcom.db
- FLASK_SECRET_KEY=${FLASK_SECRET_KEY:-physcom-dev-key}
- FLASK_SECRET_KEY=${FLASK_SECRET_KEY:-}
- LLM_PROVIDER=${LLM_PROVIDER:-}
- GEMINI_API_KEY=${GEMINI_API_KEY:-}
- GEMINI_MODEL=${GEMINI_MODEL:-gemini-2.0-flash}
restart: unless-stopped
expose:
- 5000
cli:
build: .
@@ -16,6 +21,9 @@ services:
- physcom-data:/app/data
environment:
- PHYSCOM_DB=/app/data/physcom.db
- LLM_PROVIDER=${LLM_PROVIDER:-}
- GEMINI_API_KEY=${GEMINI_API_KEY:-}
- GEMINI_MODEL=${GEMINI_MODEL:-gemini-2.0-flash}
entrypoint: ["python", "-m", "physcom"]
profiles: [cli]

View File

@@ -6,7 +6,7 @@ build-backend = "setuptools.build_meta"
name = "physcom"
version = "0.1.0"
description = "Physical Combinatorics — innovation via attribute mixing"
requires-python = ">=3.11"
requires-python = ">=3.10"
dependencies = [
"click>=8.1",
]
@@ -18,6 +18,9 @@ dev = [
web = [
"flask>=3.0",
]
gemini = [
"google-genai>=1.0",
]
[project.scripts]
physcom = "physcom.cli:main"
@@ -26,5 +29,8 @@ physcom-web = "physcom_web.app:run"
[tool.setuptools.packages.find]
where = ["src"]
[tool.setuptools.package-data]
physcom_web = ["templates/**/*.html", "static/**/*"]
[tool.pytest.ini_options]
testpaths = ["tests"]

View File

@@ -135,7 +135,8 @@ def run(ctx, domain_name, passes, threshold, dimensions):
pass_list = [int(p.strip()) for p in passes.split(",")]
dim_list = [d.strip() for d in dimensions.split(",")]
pipeline = Pipeline(repo, resolver, scorer, llm=None)
from physcom.llm.registry import build_llm_provider
pipeline = Pipeline(repo, resolver, scorer, llm=build_llm_provider())
click.echo(f"Running pipeline for '{domain_name}' (passes={pass_list}, threshold={threshold})")
click.echo(f"Dimensions: {dim_list}")
@@ -209,16 +210,12 @@ def review(ctx, combination_id):
notes = click.prompt("Human notes (or empty)", default="")
if novelty != "skip" or notes:
# Get all domains this combo has results for
rows = repo.conn.execute(
"SELECT domain_id, composite_score FROM combination_results WHERE combination_id = ?",
(combo.id,),
).fetchall()
for row in rows:
for row in repo.get_results_for_combination(combo.id):
repo.save_result(
combo.id, row["domain_id"], row["composite_score"],
pass_reached=5,
novelty_flag=novelty if novelty != "skip" else None,
llm_review=row.get("llm_review"),
human_notes=notes or None,
)
repo.update_combination_status(combo.id, "reviewed")

View File

@@ -184,9 +184,12 @@ class Pipeline:
if 2 in passes and existing_pass < 2:
description = _describe_combination(combo)
if self.llm:
try:
raw_metrics = self.llm.estimate_physics(
description, metric_names
)
except Exception:
raw_metrics = self._stub_estimate(combo, metric_names)
else:
raw_metrics = self._stub_estimate(combo, metric_names)
@@ -284,6 +287,7 @@ class Pipeline:
and cur_result["composite_score"] is not None
and cur_result["composite_score"] >= score_threshold
):
try:
description = _describe_combination(combo)
db_scores = self.repo.get_combination_scores(
combo.id, domain.id
@@ -296,7 +300,6 @@ class Pipeline:
review = self.llm.review_plausibility(
description, score_dict
)
self.repo.save_result(
combo.id,
domain.id,
@@ -310,6 +313,8 @@ class Pipeline:
self._update_run_counters(
run_id, result, current_pass=4
)
except Exception:
pass # skip this combo; don't abort the run
except CancelledError:
if run_id is not None:

View File

@@ -0,0 +1,57 @@
"""Gemini LLM provider via google-genai SDK."""
from __future__ import annotations
import json
import re
from physcom.llm.base import LLMProvider
from physcom.llm.prompts import PHYSICS_ESTIMATION_PROMPT, PLAUSIBILITY_REVIEW_PROMPT
class GeminiLLMProvider(LLMProvider):
    """LLM provider backed by Google Gemini via the google-genai SDK."""

    def __init__(self, api_key: str, model: str = "gemini-2.0-flash") -> None:
        """Create a Gemini client.

        Args:
            api_key: Gemini API key.
            model: Gemini model name to use for all requests.

        Raises:
            ImportError: if the optional google-genai dependency is not installed.
        """
        try:
            from google import genai
        except ImportError as exc:
            # Chain the original error so the missing-module details are preserved.
            raise ImportError(
                "google-genai is required: pip install 'physcom[gemini]'"
            ) from exc
        self._client = genai.Client(api_key=api_key)
        self._model = model

    def estimate_physics(
        self, combination_description: str, metrics: list[str]
    ) -> dict[str, float]:
        """Ask Gemini for per-metric physics estimates.

        Returns a dict of metric name -> float for the metrics the model
        answered; falls back to 0.5 per metric when the reply is unparseable.
        """
        prompt = PHYSICS_ESTIMATION_PROMPT.format(
            description=combination_description,
            metrics=", ".join(metrics),
        )
        response = self._client.models.generate_content(
            model=self._model, contents=prompt
        )
        # response.text can be None when the model returns no usable candidate;
        # treat that the same as an unparseable reply.
        return self._parse_json(response.text or "", metrics)

    def review_plausibility(
        self, combination_description: str, scores: dict[str, float]
    ) -> str:
        """Ask Gemini for a free-text plausibility review of a scored combination."""
        scores_str = "\n".join(f"- {k}: {v:.3f}" for k, v in scores.items())
        prompt = PLAUSIBILITY_REVIEW_PROMPT.format(
            description=combination_description,
            scores=scores_str,
        )
        response = self._client.models.generate_content(
            model=self._model, contents=prompt
        )
        # Guard against a None text payload from the SDK.
        return (response.text or "").strip()

    def _parse_json(self, text: str, metrics: list[str]) -> dict[str, float]:
        """Strip markdown fences and parse JSON; fall back to 0.5 per metric on error.

        Only keys named in *metrics* are kept. A reply that parses to a
        non-object JSON value (list, string, number) is treated as a parse
        failure rather than crashing with AttributeError on ``.items()``.
        """
        text = re.sub(r"```(?:json)?\s*", "", text).strip().rstrip("`").strip()
        try:
            data = json.loads(text)
            if not isinstance(data, dict):
                raise TypeError("expected a JSON object")
            return {k: float(v) for k, v in data.items() if k in metrics}
        except (json.JSONDecodeError, ValueError, TypeError):
            return {m: 0.5 for m in metrics}

View File

@@ -0,0 +1,30 @@
"""Build an LLMProvider from environment variables."""
from __future__ import annotations
import os
from physcom.llm.base import LLMProvider
def build_llm_provider() -> LLMProvider | None:
    """Construct an LLMProvider from environment variables.

    Environment variables:
        LLM_PROVIDER   — provider name ('gemini'; more can be added)
        GEMINI_API_KEY — required when LLM_PROVIDER=gemini
        GEMINI_MODEL   — optional Gemini model name (default: gemini-2.0-flash)

    Returns:
        A configured provider, or None when LLM_PROVIDER is unset/blank
        (the pipeline then falls back to stub estimation).

    Raises:
        ValueError: on an unknown provider name, or gemini without an API key.
    """
    name = os.environ.get("LLM_PROVIDER", "").strip().lower()
    if not name:
        return None
    if name != "gemini":
        raise ValueError(f"Unknown LLM_PROVIDER: {name!r}. Supported: gemini")
    key = os.environ.get("GEMINI_API_KEY", "")
    if not key:
        raise ValueError("LLM_PROVIDER=gemini requires GEMINI_API_KEY to be set")
    # Imported lazily so the optional google-genai dependency is only
    # required when gemini is actually selected.
    from physcom.llm.providers.gemini import GeminiLLMProvider

    return GeminiLLMProvider(
        api_key=key,
        model=os.environ.get("GEMINI_MODEL", "gemini-2.0-flash"),
    )

View File

@@ -42,9 +42,10 @@ def _run_pipeline_in_background(
conn.close()
return
from physcom.llm.registry import build_llm_provider
resolver = ConstraintResolver()
scorer = Scorer(domain)
pipeline = Pipeline(repo, resolver, scorer, llm=None)
pipeline = Pipeline(repo, resolver, scorer, llm=build_llm_provider())
pipeline.run(
domain, dim_list,