The ai-executive spec: cross-review system
The spec prompt to paste into Claude Code Desktop during Part 1. Three phases — independent evaluation, peer cross-review, synthesis. The sample output on the 300-million-yen MMORPG investment question shows what the tool actually produces in about forty-five seconds.
Why "generate from a spec"
The original plan was to clone ~/Products/ai-executive onto each executive's laptop and run docker compose up. But the laptops MIXI hands to executives aren't signed into GitHub. Neither git clone nor gh repo clone will run. So we flipped it: Part 1 of the workshop now has Claude Code Desktop generate the whole app from an empty folder.
- No GitHub account needed — the device policy wall goes away, and every executive's MacBook boots the same way
- The demo lands harder — watching Claude draw the app in front of you beats running a pre-made repo for "I could do this too" impact
- If you can read and write the spec, you have the tool — a small preview of what 2026 software development is converging toward: "spec → generate → adjust"
- It reproduces easily — paste the §5 prompt at home, and the executive re-runs the whole thing solo
Where ai-executive (reference) fits
The full version under ~/Products/ai-executive is a multi-agent system for MIXI strategy work: 17 agents (16 standard + 1 company-specific), 12 strategy frameworks, 3 analysis modes, 4 interfaces (Web / CLI / REST / MCP). That's the reference implementation — it's there to show the summit, the direction this can grow in. We don't clone it during the workshop. It's an inspiration source, nothing more.
The difference between the reference and the mini version we generate live:
| Item | Reference (full) | Workshop (mini) |
|---|---|---|
| Agents | 17 | 4 (CEO / CFO / CTO + Facilitator) |
| File count | 150+ | ≤ 16 files |
| Languages / stack | Python + TypeScript (Next.js) | Python only (+ one HTML file) |
| Frameworks | 12 (SWOT, Porter, 3C, and so on) | None — free-form + a JSON schema |
| Flow | 3 modes | 3 phases (independent eval → cross-review → synthesis) |
| Transport | REST + WebSocket | Server-Sent Events streaming |
| Startup | docker compose up (backend + frontend) | docker compose up (one container) |
What we're generating (executive-facing summary)
The outline of the mini version, at the level of detail you'd use to brief an executive:
- Goal — a very simple AI board tool that evaluates an item from the CEO / CFO / CTO perspectives in parallel
- Input — one agenda item (e.g. "Should we invest 300 million yen in new game X?")
- Processing — three agents running in parallel on the OpenAI API (gpt-4o-mini), each producing a 3–5 line assessment
- Output — a Markdown report with three sections side by side (## CEO / ## CFO / ## CTO)
- Runtime — one docker compose up, served on localhost:8000 (backend and static frontend from the same container)
- Execution time — 40–70 seconds per agenda item (seven gpt-4o-mini calls across three phases)
Sample output (should we invest 300M yen in new game X?)
What the tool actually produces — the concrete example that converts a skeptical executive in 30 seconds. Below is a dummy sample of what you'd expect from gpt-4o-mini (don't use this for real investment decisions).
Phase 1: Independent evaluation (completed in 14.2s)
CEO (stance: conditional yes)
MMORPG sits well alongside MIXI's existing IP, and the three-year portfolio diversification plan argues for the move. But 300M yen at a moment when competitor A could ship around the same time means the differentiation story gets weaker. It's a conditional yes — only if we can front-load two existing IP characters and build brand advantage early.
Key points: reuse of existing IP, first-mover window vs competitor A, portfolio expansion.
Risks: launch timing, brand dilution.
CFO (stance: no)
A 24-month payback window on an MMORPG is optimistic. Industry average is 36–48 months, and for Japanese titles at this scale, over 60% miss their DAU targets. Also, 300M yen covers dev only — post-launch marketing plus ops will reliably pull in another 200–300M yen. Total investment of 500–600M yen, and IRR likely drops below our 15% company floor. The numbers are loose.
Key points: payback assumption, hidden costs, IRR below hurdle.
Risks: snowballing add-on investment, opportunity cost.
Quantitative note: 24-month payback assumed, industry median is 42 months.
CTO (stance: holding judgment)
MMORPG's technical stack diverges from MIXI's existing assets (server sync, world persistence, anti-cheat). Twelve months assumes 8 in-house engineers full-time, but that collides with our existing project priorities. Contracting an external studio is possible, but MIXI inherits the operational technical debt afterward.
Key points: stack divergence, resource priority clash, operational debt inheritance.
Risks: 12-month schedule gap, long-term technical debt.
Phase 2: Cross-review (completed in 18.4s)
CEO (reviewing CFO and CTO)
Agreements: CFO is right — 24-month payback is optimistic, it's out of line with industry benchmarks.
Blind spots: CFO doesn't model CAC reduction from existing IP reuse. If even 10% of MIXI's current users convert to the MMORPG, CAC drops ~60%.
Counterpoints: CTO's "external studio only" stance is too cautious. MIXI's technical brand depends on owning this kind of work in-house and taking on the debt.
Question: Can we actually imagine a 3-year IP strategy that doesn't include an MMORPG?
CFO (reviewing CEO and CTO)
Agreements: CEO's "front-load existing IP" is a real differentiation lever. CTO's point about long-term tech debt being a direct cost driver is important.
Blind spots: CEO hasn't priced in macro risks beyond competitor A (weak yen, PC/console user growth slowing). CTO hasn't put numbers on contract costs or IP licensing fees for an external studio.
Counterpoints: CEO's "portfolio diversification" is abstract. Go back to the corporate plan and point at the evidence that current businesses are decelerating.
Question: What's the IRR if we put the same 300M yen into improving existing titles' operations instead?
CTO (reviewing CEO and CFO)
Agreements: CFO's "hidden costs" is realistic. MMORPG ops from year three onward is maintenance-heavy. Agree with CEO on front-loading IP.
Blind spots: Neither CEO nor CFO accounts for AI coding assistants cutting development time (tools like today's Claude Code). In 2026 you can expect ~30% improvement in dev efficiency.
Counterpoints: CEO's "we'll bring it in-house" is decided without looking at current team utilization.
Question: With the 8 engineers already fully loaded, how does this rank against other large projects (core system replacement, etc.)?
Phase 3: Synthesis (completed in 8.1s)
Consensus:
1. MMORPG as new IP has market potential.
2. Existing IP can drive differentiation (all three agree).
3. The current 300M-yen, 24-month-payback assumption is open to question.
CFO "optimistic payback / IRR miss" vs CEO "strategic necessity" → restructure as staged investment (Phase 1 prototype 50M yen → KPI-gated 200M yen GO/NO-GO → remaining 50M yen for launch).
CTO "stack divergence" vs CEO "bring it in-house" → hybrid: outsource core components, MIXI builds an in-house wrapper.
Actions:
- Rebuild the staged-investment numbers — Owner: CFO. Deadline: 2026 Q3. Success metric: 3-phase investment scenario + IRR sensitivity approved at the management meeting.
- Technical stack survey and due diligence on 2 outsourcing candidates — Owner: CTO. Deadline: 2026 Q3. Success metric: architecture option + 2-candidate RFP complete.
- Simulation of existing-user conversion to the new title — Owner: Marketing. Deadline: 2026 Q4. Success metric: survey of 100 current users + quantified CAC reduction estimate.
Residual risks and monitoring:
1. Competitor A launches earlier than 2027 Q2 → monthly competitive research.
2. In-house engineer priority collision → quarterly EM utilization review.
3. AI dev-efficiency gains come in below estimate → measure Claude Code hours and PRs merged over 3 months.
Decision recommendation:
Strategic intent is agreed by all three. But "300M yen, 24-month payback" as currently specified is a NO-GO. Once the three conditions — redesigned staged investment, technical DD completed, IP conversion quantified — are met, bring it back for a formal GO/NO-GO.
Why this sample works:
- Taken alone, the three stances (CEO "conditional yes" / CFO "no" / CTO "holding judgment") deadlock the discussion.
- Phase 2 cross-review surfaces what none of them saw alone (CAC reduction from IP conversion, macro risks, AI dev efficiency).
- Phase 3 synthesis reframes "yes vs no" as "staged investment + three conditions" — the discussion now has forward motion.
- A debate that usually takes 60–90 minutes of board meeting lands in 40 seconds of runtime + 3–5 minutes of reading.
File layout (expected output)
The directory tree Claude produces should land close to this. The spec caps it at "≤ 16 files, ≤ 700 lines", so results converge here.
ai-executive/
├── .env.example
├── .env               # written by the app via /settings, gitignored
├── .gitignore
├── README.md
├── docker-compose.yml
├── Dockerfile
├── requirements.txt
├── app/
│   ├── __init__.py
│   ├── main.py        # FastAPI entrypoint + SSE streaming
│   ├── agents.py      # the 4 agents (CEO/CFO/CTO/Facilitator)
│   ├── pipeline.py    # 3-phase pipeline orchestration
│   ├── settings.py    # settings API (/api/settings/*)
│   ├── env_writer.py  # atomic .env write + gitignore check
│   ├── models.py      # Pydantic schemas
│   └── prompts.py     # system prompts in one place
└── static/
    ├── index.html     # single-page UI (SSE receiver + Markdown rendering)
    └── settings.html  # API key settings UI
Total ≤ 16 generated files (.env is created by the app at runtime), ≤ 700 lines of code. app/pipeline.py is the core — within Phase 1, and again within Phase 2, asyncio.gather fans the three agents out in parallel; Phase 3 runs serially through the Facilitator. static/index.html receives Server-Sent Events and lights up each phase as it completes; the final report is rendered with marked.js.
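For orientation (this is not part of the paste-able spec), here is a minimal sketch of the shape app/pipeline.py tends to converge to. The ask helper and the event dicts are illustrative assumptions; only the event names come from the SSE spec in §5.

```python
# Illustrative sketch of app/pipeline.py's 3-phase orchestration, not the
# generated file verbatim. Assumes agents.py exposes an async helper
# `ask(role, payload) -> dict` that calls AsyncOpenAI with that role's
# system prompt and returns the parsed JSON output.
import asyncio
from typing import AsyncIterator, Awaitable, Callable

ROLES = ["CEO", "CFO", "CTO"]
Ask = Callable[[str, dict], Awaitable[dict]]

async def run_pipeline(agenda: str, ask: Ask) -> AsyncIterator[dict]:
    # Phase 1: three independent evaluations, fanned out in parallel.
    yield {"event": "phase1_start"}
    evals = dict(zip(ROLES, await asyncio.gather(
        *(ask(role, {"agenda": agenda}) for role in ROLES))))
    for role, ev in evals.items():
        yield {"event": "agent_eval", "role": role, "data": ev}
    yield {"event": "phase1_done"}

    # Phase 2: each agent reviews the other two, again in parallel.
    yield {"event": "phase2_start"}
    reviews = dict(zip(ROLES, await asyncio.gather(*(
        ask(role, {"agenda": agenda,
                   "peers": {r: evals[r] for r in ROLES if r != role}})
        for role in ROLES))))
    for role, rv in reviews.items():
        yield {"event": "cross_review", "role": role, "data": rv}
    yield {"event": "phase2_done"}

    # Phase 3: the Facilitator synthesizes everything, serially.
    yield {"event": "phase3_start"}
    synthesis = await ask("Facilitator",
                          {"agenda": agenda, "evals": evals, "reviews": reviews})
    yield {"event": "synthesis", "data": synthesis}
    yield {"event": "phase3_done"}
    yield {"event": "complete"}
```

In main.py, FastAPI wraps this generator in a StreamingResponse with media_type="text/event-stream"; the frontend's EventSource does the rest.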
Claude sometimes proposes a tests/ directory or a more elaborate package split. Given the time budget, say "skip that for now, keep to MVP" in Plan mode. Extensions can wait until the executive gets home.
The spec prompt (copy and paste)
This block is the single most important artifact in the workshop. Open Claude Code Desktop's Code tab, set the project folder to ~/workshop-demo/, and paste the whole thing. Turn Plan mode ON before sending, so it pauses for file-list review. Approve, and generation begins.
Create a new empty folder at ~/workshop-demo/ai-executive/, and build the MIXI "AI Executive cross-review agenda-evaluation system" inside it.
## Product goal
A practical management-support tool that evaluates agenda items from three executive perspectives (CEO / CFO / CTO), runs peer cross-review, then synthesizes. The core value: surface blind spots that individual evaluations miss, through executive cross-review.
## Three-phase processing
### Phase 1: Independent evaluation (parallel)
Each agent (CEO / CFO / CTO) evaluates the agenda item independently, without seeing the others' output.
Output fields:
- stance: "yes" / "no" / "conditional yes" / "holding judgment"
- summary: 2–3 lines of assessment
- key_points: main points (3 bullets)
- risks: concerning risks (2 bullets)
- quantitative_notes: quantitative notes (required for CFO, optional for others)
### Phase 2: Cross-review (parallel)
Each agent reads the other two evaluations from Phase 1 and produces a peer review covering:
- agreements: points of agreement (1–2)
- blind_spots: blind spots and pointed questions (2–3) ← the core value of this tool
- counter_arguments: counterpoints (0–2)
- questions: follow-up questions / information requests (1–2)
### Phase 3: Synthesis (serial)
A Facilitator agent reads all Phase 1 + Phase 2 output and produces:
- consensus: points of agreement (2–4)
- conflicts: points of conflict + recommended resolution
- actions: 3 actionable items (title / owner / deadline / success_metric)
- residual_risks: residual risks and monitoring metrics
- decision_recommendation: "GO" / "NO-GO" / "conditional GO" + one-line reason
## Agent specs
### CEO Agent (key: OPENAI_API_KEY_CEO)
System prompt: "You are MIXI's CEO. Your evaluation perspective covers (1) alignment with the 3-year vision, (2) building competitive advantage, (3) long-term shareholder value. Weight 'why now' over the numbers. Put executive intuition into words. Always respond in the specified JSON schema."
### CFO Agent (key: OPENAI_API_KEY_CFO)
System prompt: "You are MIXI's CFO. Your evaluation perspective covers (1) payback period (ROI, IRR), (2) quantifying business risks, (3) financial discipline. Strip out unfounded optimism and speak in numbers. Always cover the pessimistic scenario. Always respond in the specified JSON schema."
### CTO Agent (key: OPENAI_API_KEY_CTO)
System prompt: "You are MIXI's CTO. Your evaluation perspective covers (1) technical feasibility, (2) scalability ceilings, (3) load on the engineering org. Imagine the ops load six months in. Be honest about technical debt. Always respond in the specified JSON schema."
### Facilitator Agent (key: OPENAI_API_KEY)
System prompt: "You are the strategy-meeting facilitator at MIXI. Read the three executives' independent evaluations + cross-reviews and produce a synthesized recommendation. State points of agreement, points of conflict, and action items clearly. When opinions split, don't force consensus — surface the conflict as-is and propose a resolution direction. Always respond in the specified JSON schema."
## API key management: web-UI settings page
### Important: don't ask anyone to edit .env directly
- Neither the facilitator nor the executives open .env by hand
- API keys are entered via the browser /settings page
- The app persists them safely to .env behind the scenes
### Settings page requirements
GET /settings returns HTML (or static/settings.html):
- Title "API key settings"
- 4 password fields (all <input type="password">):
- OPENAI_API_KEY (default / for the Facilitator)
- OPENAI_API_KEY_CEO / _CFO / _CTO
- A show/hide toggle next to each field, and a "Validate & Save" button below
- Existing keys show only ••••abc1 (last 4 chars), masked
- A rotate button for each field
- .gitignore status at the bottom (green check = .env excluded)
- The 6 principles quick-reference at the bottom
### API endpoints
- GET /api/settings/status → { all_keys_set, masked, gitignore_ok }
- POST /api/settings → validate (OpenAI test call per key) → atomic .env write → auto-append to .gitignore → hot-apply in memory
- POST /api/settings/rotate → replace only the specified key
- GET /api/settings/gitignore-check → is .env gitignored?
### Startup flow
1. GET / calls /api/settings/status
2. If all_keys_set is false, redirect to /settings?first_run=true
3. After setup, return to /
### Security
- /settings endpoints accept only localhost (docker binds 127.0.0.1:8000:8000)
- Never log or render key values (masked last-4 chars only)
- Regex-check the OpenAI key format
- chmod 600 on .env
- Reject keys that fail validation
### Why this approach
- Executive UX first. Especially on Windows, eliminate the friction of editing dotfiles
- .env stays compatible with Docker Compose as the standard
- All 6 principles are honored (UI design makes them easier to honor)
## Technical requirements
### Backend
- Python 3.11+
- FastAPI (async, asyncio.gather for parallel Phase 1 / Phase 2)
- OpenAI SDK (openai>=1.30) via AsyncOpenAI
- Pydantic v2 for typed models
- python-dotenv to load .env
- Use response_format={"type":"json_object"} for structured OpenAI output
### Frontend
- Single static/index.html
- Vanilla JS + fetch()
- SSE (Server-Sent Events) streaming
- UI updates as each phase/agent completes (progress bar + agent card lights up)
- Markdown rendering: marked.js via CDN
- "Copy to clipboard" button
### API spec (POST /api/review, SSE)
Request: { "agenda": "Should we invest 300M yen in new game X?" }
Response (text/event-stream):
event: phase1_start / agent_eval(x3) / phase1_done
event: phase2_start / cross_review(x3) / phase2_done
event: phase3_start / synthesis / phase3_done
event: complete / error
### Error handling
- OpenAI rate limit → exponential backoff, max 3 retries (1s, 2s, 4s)
- Missing API key → detect at startup and report which keys are missing
- Per-phase timeout: 30 seconds
- Broken JSON output → one retry with "the format was invalid, please regenerate"
### Logging
- Save raw I/O to logs/{timestamp}-{request_id}.json
- Record per-phase duration and input/output token counts
### Docker
- One command: docker compose up -d
- localhost:8000 serves both the frontend and the API
- Based on python:3.11-slim
## File layout
ai-executive/
├── .env.example
├── .gitignore
├── README.md
├── docker-compose.yml
├── Dockerfile
├── requirements.txt
├── app/
│   ├── __init__.py
│   ├── main.py        # FastAPI + SSE + redirect logic
│   ├── agents.py      # 4 agents
│   ├── pipeline.py    # 3-phase orchestration
│   ├── settings.py    # Settings API (/api/settings/*)
│   ├── env_writer.py  # atomic .env write + gitignore check
│   ├── models.py      # Pydantic schemas
│   └── prompts.py     # system prompts
└── static/
    ├── index.html     # agenda-evaluation UI
    └── settings.html  # API key settings UI
Total ≤ 16 files.
## Constraints
### File count and code volume
- Total files: ≤ 16
- Total code: ≤ 700 lines
### Execution time (measured)
- Full processing per agenda: 40–70 seconds typical (hard cap 90s total; per-phase timeout 30s as specified under error handling)
- Phase 1 (parallel): 8–20s
- Phase 2 (parallel, longer prompts): 10–25s
- Phase 3 (serial): 5–15s
### Cost (7 OpenAI calls: Phase 1 x3 + Phase 2 x3 + Phase 3 x1)
- gpt-4o-mini (default): $0.01–0.02 per agenda (~15–25k in / ~5–8k out tokens)
- gpt-4o: $0.25–0.50 per agenda
- DEFAULT_MODEL env var + /settings dropdown to switch models
- Per-agent model override (MODEL_CEO, MODEL_CFO, etc.)
### Phase 2 partial-failure handling (required)
If 1–2 agents fail in Phase 2 (broken JSON / timeout / rate limit):
1. One retry (with backoff)
2. If still failing, record as {"status": "unavailable", "reason": "..."}
3. Phase 3 Facilitator synthesizes with whatever cross-reviews are available
4. residual_risks must state "Only N/3 cross-reviews obtained: {missing role}'s view not reflected"
5. Mark decision_recommendation confidence as "weak"
6. UI: grey out the missing card, show a warning icon, offer retry
Never issue GO/NO-GO on 2/3 opinions silently. The missing view must be visible.
### README.md required contents
Purpose / 4-step startup / .env setup (via web UI) / API spec / license
### First-run UX target
docker compose up → /settings → evaluation screen reachable in under 60 seconds
## What to show in Plan mode
1. File list + one-line responsibility per file
2. Pseudocode for each phase (≤ 30 lines, Python-flavored)
3. API endpoint spec (SSE event list included)
4. /settings UI flow (state machine: unset / partial / all-set)
5. Settings API endpoints + validation flow
6. .env atomic write + gitignore check implementation approach
7. Error-handling strategy (4 error sources × response)
8. Pydantic model skeletons (4–5 models)
9. requirements.txt equivalent
10. Startup smoke-test procedure
After approval, generate all files. API keys will be placed later — for now, only create .env.example.
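Before moving on, it helps to see how little schema the spec actually implies. Here is a sketch of the Pydantic skeletons Plan mode step 8 should roughly produce: the model names and the Literal stance values mirror the field lists above, everything else is an assumption.

```python
# Sketch of the app/models.py skeletons implied by the spec's field lists.
from typing import Literal, Optional
from pydantic import BaseModel

Stance = Literal["yes", "no", "conditional yes", "holding judgment"]

class Evaluation(BaseModel):        # Phase 1 output, one per agent
    stance: Stance
    summary: str                    # 2–3 lines
    key_points: list[str]           # 3 bullets
    risks: list[str]                # 2 bullets
    quantitative_notes: Optional[str] = None  # enforced for the CFO in its prompt

class CrossReview(BaseModel):       # Phase 2 output, one per agent
    agreements: list[str]           # 1–2
    blind_spots: list[str]          # 2–3, the core value
    counter_arguments: list[str] = []
    questions: list[str]            # 1–2

class ActionItem(BaseModel):
    title: str
    owner: str
    deadline: str
    success_metric: str

class Synthesis(BaseModel):         # Phase 3 output from the Facilitator
    consensus: list[str]
    conflicts: list[str]
    actions: list[ActionItem]       # exactly 3 in the spec
    residual_risks: list[str]
    decision_recommendation: Literal["GO", "NO-GO", "conditional GO"]
    reason: str                     # the one-line reason
```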
Placing the API keys (6 principles in practice) — web UI approach
To remove the friction of executives (especially Windows users) editing dotfiles directly, the generated app includes a web-UI settings page (/settings). Workshop step P1-02 walks everyone through that page.
1. First launch: docker compose up -d → open http://localhost:8000 → the app auto-redirects to /settings?first_run=true
2. Paste the pre-issued keys into the four <input type="password"> fields
3. Click "Validate & Save" — the app test-calls OpenAI (models.list or similar) per key
4. All keys validate → atomic write to .env, auto-check and append to .gitignore, hot-apply in memory (no restart)
5. Success toast + a "Go to evaluation" button returns you to /
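The atomic write behind "Validate & Save" is small. Here is a stdlib-only sketch of what app/env_writer.py might look like under the spec's requirements (temp file in the same directory, chmod 600, os.replace, gitignore guard); the function names are assumptions, not the generated code.

```python
# Sketch of app/env_writer.py: atomic .env write with 0600 perms,
# plus a .gitignore guard. Names are illustrative.
import os
import tempfile
from pathlib import Path

def write_env(values: dict[str, str], env_path: Path = Path(".env")) -> None:
    body = "".join(f"{key}={value}\n" for key, value in values.items())
    # Write to a temp file on the same filesystem, then atomically swap it in.
    fd, tmp = tempfile.mkstemp(dir=env_path.parent, prefix=".env.tmp.")
    try:
        with os.fdopen(fd, "w") as f:
            f.write(body)
        os.chmod(tmp, 0o600)       # keys readable by the owner only
        os.replace(tmp, env_path)  # atomic on POSIX
    finally:
        if os.path.exists(tmp):    # clean up only if the swap never happened
            os.remove(tmp)

def ensure_gitignored(repo: Path = Path(".")) -> None:
    gitignore = repo / ".gitignore"
    lines = gitignore.read_text().splitlines() if gitignore.exists() else []
    if ".env" not in lines:
        with gitignore.open("a") as f:
            f.write(".env\n")
```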
How the 6 principles are honored
| # | Principle | How the web UI delivers it |
|---|---|---|
| 1 | Don't paste in chat | Password input — not visible to eyes or screen capture |
| 2 | Via .env | App writes .env atomically behind the scenes (same mechanism) |
| 3 | .gitignore check | Auto-check and append on Save, green check displayed |
| 4 | Rotation | Dedicated "Rotate" button in /settings |
| 5 | Least privilege | OpenAI-side scoping (UI doesn't touch this) |
| 6 | Prod / dev split | dev/prod tabs in /settings (planned extension) |
Startup and verification
Generation and key placement done — time to launch.
1. Start with Docker Compose:
   cd ~/workshop-demo/ai-executive
   docker compose up -d
   open http://localhost:8000
2. First run auto-redirects to /settings — enter keys into the 4 fields → Validate & Save → on success, the app writes .env, auto-checks .gitignore, and returns you to the evaluation screen.
3. Drop in a sample agenda — paste "Should we invest 300M yen in new game X?" and run. SSE streams Phase 1 → 2 → 3 across the screen in 40–45 seconds.
4. Watch Claude self-repair — first runs almost always hit a minor error (an async implementation slip, missing SSE config, broken JSON output). Paste the log to Claude with "we're getting an error" and auto-repair runs. That's the highlight of the demo.
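If the browser UI stalls, the stream itself can be checked from a terminal. Here is a stdlib-only smoke test against the spec's POST /api/review endpoint (the agenda is the sample one; adjust the port if you remapped it):

```python
# Minimal SSE smoke test for POST /api/review: prints raw event frames.
import json
import urllib.request

req = urllib.request.Request(
    "http://localhost:8000/api/review",
    data=json.dumps({"agenda": "Should we invest 300M yen in new game X?"}).encode(),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    for raw in resp:  # SSE frames arrive as newline-delimited lines
        line = raw.decode("utf-8").rstrip()
        if line.startswith(("event:", "data:")):
            print(line)
```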
Typical first-run errors and how Claude responds:
| Error | Cause | Claude's response |
|---|---|---|
| ModuleNotFoundError: openai | requirements.txt missing it | Adds openai>=1.30 and proposes re-running docker compose build |
| Port 8000 is already in use | Another app is holding the port | Rewrites docker-compose.yml to 8001:8000 |
| openai.AuthenticationError | .env not loaded / key wrong | Adds python-dotenv or adds env_file: .env to compose |
| json.decoder.JSONDecodeError | Broken JSON from OpenAI | Makes response_format explicit and adds one-retry logic |
| SSE EventSource error in Chrome | CORS or buffering issue | FastAPI StreamingResponse with correct media_type |
| Phase 2 has one slow agent | No gather timeout | Adds asyncio.wait_for with a 30-second limit |
| Settings page doesn't appear | Redirect logic missing | Adds a status check on GET /, redirects to /settings when all_keys_set is false |
| .env write fails | Volume not mounted / permission | Adds .env:/app/.env (rw) volume to docker-compose.yml |
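Several of these fixes reduce to the same two patterns: exponential backoff around the OpenAI call, and asyncio.wait_for to keep one slow agent inside the phase budget. Here is a sketch of the shape Claude usually lands on in app/agents.py; the client wiring and per-role keys are simplified, and the broken-JSON path here just retries, where the spec asks for a corrective "please regenerate" message.

```python
# Sketch: backoff-wrapped structured OpenAI call with a per-call timeout.
import asyncio
import json

import openai
from openai import AsyncOpenAI

client = AsyncOpenAI()  # reads OPENAI_API_KEY; per-role keys omitted here

async def ask(role: str, system_prompt: str, payload: dict,
              model: str = "gpt-4o-mini") -> dict:
    delays = [1, 2, 4]  # max 3 retries: 1s, 2s, 4s
    for attempt, delay in enumerate([0] + delays):
        if delay:
            await asyncio.sleep(delay)
        try:
            resp = await asyncio.wait_for(
                client.chat.completions.create(
                    model=model,
                    response_format={"type": "json_object"},
                    messages=[
                        {"role": "system", "content": system_prompt},
                        {"role": "user", "content": json.dumps(payload)},
                    ],
                ),
                timeout=30,  # keeps one slow agent inside the phase budget
            )
            return json.loads(resp.choices[0].message.content)
        except (openai.RateLimitError, asyncio.TimeoutError, json.JSONDecodeError):
            if attempt == len(delays):
                raise  # out of retries; the caller marks this agent unavailable
```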
Natural-language feature addition (P1-04): a devil's advocate field
Once startup is verified, the Part 1 climax is adding a feature by typing English. New spec: add a "devil's advocate question" to Phase 2 cross-review, tightening the mutual check against optimism bias.
Add one more item to Phase 2 cross-review:
a "devil's advocate question".
Each agent, when reviewing the other agents' evaluations,
adds one question: "Does this still hold in the worst case?"
Purpose: strengthen mutual checks on optimism bias,
with one added field.
Display the new field in the frontend UI as well.
Claude shows the diff in the Visual Diff preview: a new field on CrossReview in app/models.py, an extended prompt template in app/prompts.py, a new card in static/index.html. Click Accept → rebuild and restart with docker compose up -d --build (a plain restart won't pick up code baked into the image) → re-run the same agenda → the three cross-reviews now each carry the "worst-case" question. Done.
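On the models side, the accepted diff is literally one field. Here is a sketch against the CrossReview skeleton shown earlier; the field name devils_advocate_question is an assumption, since the spec text only names a "devil's advocate question".

```python
# Sketch of the app/models.py side of the P1-04 diff.
from pydantic import BaseModel, Field

class CrossReview(BaseModel):
    agreements: list[str]
    blind_spots: list[str]
    counter_arguments: list[str] = []
    questions: list[str]
    # New in P1-04: one worst-case challenge per cross-review.
    devils_advocate_question: str = Field(
        description="Does this still hold in the worst case?")
```

app/prompts.py gains the matching instruction and static/index.html the matching card, as the Visual Diff shows.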
Extension ideas (to try at home)
A list of "what's next" the executive can try at home. All of these need one more line in the spec prompt and Claude handles it.
- Add more agents — CMO / COO / CLO / CHRO. One line of system_prompt and Claude adds it to the agent array.
- Persist past agendas and outcomes to a DB — add SQLite, store agenda + 3 evaluations + action items. Full-text or embedding search across "similar investments in the last 3 years".
- MCP to post agendas to Slack / Drive — via MCP, post agendas to a Slack channel / Google Drive / Notion, and send results back. See MCP deep dive.
- Route the output to Claude Design for board slides — three evaluations + action items auto-flow into a slide template. See Claude Design deep dive.
- Derive system prompts from actual MIXI materials — extract phrasing and perspectives from shareholder letters, AGM notices, and integrated reports, and fold them into each agent's system_prompt. The "it sounds like us" factor goes up significantly.
- Highlight conflict points — pull the contradictory claims across the three and surface them as a separate "to-debate" section. Port the synthesizer.py idea from the reference back down into the mini.
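For instance, the first idea on the list can be as short as the block below, worded in the same style as the §5 agent specs; the CMO perspective bullets are illustrative, not taken from the workshop materials:

Add a CMO agent (key: OPENAI_API_KEY_CMO).
System prompt: "You are MIXI's CMO. Your evaluation perspective covers (1) brand fit, (2) customer acquisition cost, (3) community growth. Always respond in the specified JSON schema."
Include the CMO in Phase 1 and Phase 2, and add a fourth card to the UI.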
Troubleshooting
What tends to jam the day of, and how to handle it on the spot.
| Symptom | Cause | Fix |
|---|---|---|
| Plan mode doesn't appear | Selected model doesn't support Plan mode | Pick a model marked "Plan mode supported" (Sonnet 4.7 / Opus 4.7) |
| OpenAI AuthenticationError | .env isn't being read (source .env doesn't reach the container) | Load via python-dotenv inside app/main.py, or add env_file: .env to docker-compose.yml |
| Port 8000 already in use | Another app holding the port (usually the reference running in the background) | Change docker-compose.yml ports to 8001:8000, use localhost:8001 |
| Docker is slow to start | First-time image pull (python:3.11-slim is ~150MB) | Wait 2–3 minutes. Running docker pull python:3.11-slim ahead of time makes it instant. |
| CORS error on the frontend | Static mount misconfigured, or frontend calling a different origin | Use FastAPI's StaticFiles as app.mount("/", StaticFiles(..., html=True)) to unify origin |
| Agents run sequentially instead of in parallel | Sync for loop instead of asyncio.gather | Tell Claude "it's not parallel" — it rewrites with asyncio.gather(*tasks) |
Official links
- Claude Code Desktop Documentation
- OpenAI API Reference
- FastAPI Documentation
- Docker Compose Documentation
- Reference implementation (local): ~/Products/ai-executive — inspiration source only
- Internal: API key hygiene — the complete guide