@@ -71,10 +71,11 @@ Delegates to the `code-reviewer` and `architect` agents in parallel for a two-la
Do not self-review as a fallback. If the `code-reviewer` or `architect` agent path is missing, unavailable, skipped, or fails, emit a clear unavailable-review result and block approval until the independent lane evidence exists.
Respect the user's current model and reasoning/effort selection when launching review lanes. Do not pass `model` or `reasoning_effort` overrides in the review-lane task calls unless the user explicitly asks for review-specific overrides; omitting them lets native subagents inherit the active session settings.
```
task(
agent_type="code-reviewer",
reasoning_effort="xhigh",
prompt="CODE REVIEW TASK
Review code changes for quality, security, and maintainability.
@@ -51,6 +51,11 @@ If no flag is provided, use **Standard**.
- Gather codebase facts via `explore` before asking user about internals
-`omx explore` is deprecated. Use normal repository inspection tools/subagents for simple read-only brownfield fact gathering; use `omx sparkshell` only for explicit shell-native read-only evidence, and keep ambiguous or non-shell-only investigation on the richer normal path.
- Always run a preflight context intake before the first interview question
- For brownfield work, preflight must include doc/context grounding before user-facing questions: inspect applicable `AGENTS.md` files, README/getting-started docs, relevant `docs/` contracts/plans/ADRs, existing `.omx/context/` snapshots, and any project-local glossary/context files such as `CONTEXT.md` or `CONTEXT-MAP.md` when present.
- Treat existing repo language as evidence, not authority: if the user uses a fuzzy, overloaded, or conflicting term, surface the specific doc/code wording and ask which meaning should govern before implementation.
- Cross-check user claims about current behavior against code or documented contracts when discoverable. If docs and code disagree, ask a confirmation question that names both sources instead of silently choosing one.
- Use scenario-based edge-case grilling when relationships, boundaries, or handoff behavior are unclear: invent one concrete scenario that stresses the ambiguous boundary, then ask one focused question about the expected outcome.
- Durable docs, glossary, ADR, or memory updates are opt-in and public-safe only. Deep-interview may recommend such updates in the handoff summary, but must not automatically create or dump public docs from interview transcripts unless the user explicitly chooses that as in-scope.
- If initial context is oversized or would exceed the prompt budget, do not paste or forward the raw payload into interview prompts; request and record a prompt-safe initial-context summary first
- The oversized initial-context summary gate is blocking: wait for the concise summary before ambiguity scoring, crystallizing artifacts, or any downstream execution handoff
- The summary must preserve goals, constraints, success criteria, non-goals, decision boundaries, and references to any full source documents so downstream consumers receive a prompt-safe but faithful context
...
...
@@ -97,8 +102,15 @@ If no flag is provided, use **Standard**.
- Unknowns/open questions
- Decision-boundary unknowns
- Likely codebase touchpoints
- Relevant repo docs/rules/context inspected
- Terminology or doc/code conflicts found
- Prompt-safe initial-context summary status (`not_needed`, `needed`, or `recorded`)
5. Save snapshot to `.omx/context/{slug}-{timestamp}.md` (UTC `YYYYMMDDTHHMMSSZ`) and reference it in mode state.
5. For brownfield tasks, inspect the applicable documentation/rule surface before the first user-facing round. Prefer exact, nearby sources over broad scans:
- governing `AGENTS.md` files and template/runtime instruction surfaces that apply to the touched paths
- README/getting-started docs and relevant docs under `docs/`, especially contracts, plans, ADR-like records, and workflow docs
- existing `.omx/context/` snapshots, `.omx/specs/`, and planning artifacts relevant to the slug
- project-local glossary/context files such as `CONTEXT.md`, `CONTEXT-MAP.md`, or context-specific docs when they exist
6. Save snapshot to `.omx/context/{slug}-{timestamp}.md` (UTC `YYYYMMDDTHHMMSSZ`) and reference it in mode state.
## Phase 1: Initialize
...
...
@@ -137,13 +149,14 @@ If no flag is provided, use **Standard**.
Repeat until ambiguity `<= threshold`, the pressure pass is complete, the readiness gates are explicit, the user exits with warning, or max rounds are reached. This is a stop condition: below threshold, do not open a new ordinary interview branch.
### 2a) Generate next question
If the initial context is oversized and no prompt-safe summary has been recorded yet, the next question must be only a summary request. Do not score ambiguity, do not run readiness gates, and do not hand off to `$ralplan`, `$autopilot`, `$ralph`, or `$team` until that summary answer is captured.
If the initial context is oversized and no prompt-safe summary has been recorded yet, the next question must be only a summary request. Do not score ambiguity, do not run readiness gates, and do not hand off to `$ultragoal`, `$ralplan`, `$autopilot`, `$ralph`, or `$team` until that summary answer is captured.
Use:
- Original idea
- Prior Q&A rounds
- Current dimension scores
- Brownfield context (if any)
- Doc/context grounding notes, including existing terminology, governing rules, and any doc/code mismatch
- Activated challenge mode injection (Phase 3)
Target the lowest-scoring dimension, but respect stage priority:
...
...
@@ -155,12 +168,21 @@ Follow-up pressure ladder after each answer:
1. Ask for a concrete example, counterexample, or evidence signal behind the latest claim
2. Probe the hidden assumption, dependency, or belief that makes the claim true
3. Force a boundary or tradeoff: what would you explicitly not do, defer, or reject?
4. If the answer still describes symptoms, reframe toward essence / root cause before moving on
4. Challenge fuzzy or conflicting terms against the repo's documented language and current code behavior
5. Stress-test the boundary with one concrete scenario or edge case when a relationship or handoff remains ambiguous
6. If the answer still describes symptoms, reframe toward essence / root cause before moving on
Prefer staying on the same thread for multiple rounds when it has the highest leverage. Breadth without pressure is not progress.
Maintain a **Breadth Ledger** across independent ambiguity tracks: scope, constraints, outputs, verification, brownfield integration, and any user-mentioned deliverable tracks. The ledger is a guard, not a mandatory rotation rule: stay deep on the current thread until it has been pressure-tested, then zoom out only when another material track remains unresolved and would change execution.
Maintain a **Docs/Terminology Ledger** for brownfield interviews:
- repo docs/rules/context sources inspected, with path references
- canonical terms already used by the repo and terms to avoid or disambiguate
- user terms that conflict with docs or current code behavior
- doc/code mismatches that require a human decision before implementation
- optional durable-doc follow-ups that are safe to propose but not auto-apply
Detailed dimensions:
- Intent Clarity — why the user wants this
- Outcome Clarity — what end state they want
...
...
@@ -306,6 +328,7 @@ Append round result and updated scores via `omx state write --input '<json>' --j
Use each mode once when applicable. These are normal escalation tools, not rare rescue moves:
-**Contrarian** (round 2+ or immediately when an answer rests on an untested assumption): challenge core assumptions
-**Terminologist** (brownfield, whenever a key term is fuzzy, overloaded, or conflicts with repo docs/code): force a canonical meaning against existing project language before implementation
-**Simplifier** (round 4+ or when scope expands faster than outcome clarity): probe minimal viable scope
-**Ontologist** (round 5+ and ambiguity > 0.25, or when the user keeps describing symptoms): ask for essence-level reframing
...
...
@@ -336,6 +359,9 @@ Spec should include:
- Assumptions exposed + resolutions
- Pressure-pass findings (which answer was revisited, and what changed)
- Brownfield evidence vs inference notes for any repository-grounded confirmation questions
- Docs/Terminology Ledger with inspected repo docs/rules/context, term conflicts, and any doc/code mismatch decisions
- Scenario/edge-case pressure findings that materially shaped scope or acceptance criteria
- Optional durable documentation recommendations, explicitly marked opt-in and public-safe; do not include raw private transcript dumps
- Technical context findings
- Full or condensed transcript
...
...
@@ -365,11 +391,11 @@ When the clarified task is specifically about `$autoresearch`, or the skill is i
## Phase 5: Execution Bridge
Present execution options after artifact generation using explicit handoff contracts. Treat the deep-interview spec as the current requirements source of truth and preserve intent, non-goals, decision boundaries, acceptance criteria, and any residual-risk warnings across the handoff.
Present execution options after artifact generation using explicit handoff contracts. Treat the deep-interview spec as the current requirements source of truth and preserve intent, non-goals, decision boundaries, acceptance criteria, docs/terminology grounding, and any residual-risk warnings across the handoff.
### Goal-mode follow-ups
Include these product-facing suggestions when they fit the clarified spec, without removing the existing `$ralplan`, `$autopilot`, `$ralph`, and `$team` handoff options:
Include these product-facing suggestions when they fit the clarified spec, without removing the existing `$ultragoal`, `$ralplan`, `$autopilot`, `$ralph`, and `$team` handoff options:
-**`$ultragoal`** — default goal-mode follow-up for implementation or general goal-oriented follow-up specs that should be converted into durable Codex/OMX goals with sequential completion tracking.
-**`$autoresearch-goal`** — use when the clarified context is a research project: a research question, reference/literature gathering, evaluator-backed analysis, or professor/critic-style deliverable.
...
...
@@ -377,7 +403,16 @@ Include these product-facing suggestions when they fit the clarified spec, witho
Recommend `$ultragoal` as the default durable goal-mode follow-up because it supersedes Ralph for goal tracking. Preserve `$team` for coordinated parallel implementation and keep `$ralph` only as an explicit fallback for persistent single-owner execution/verification when the user specifically selects it.
-**Input Artifact:**`.omx/specs/deep-interview-{slug}.md` (optionally accompanied by the transcript/context snapshot for traceability)
-**Invocation:**`$ultragoal create-goals --brief-file <spec-path>` followed by `$ultragoal complete-goals` in the active execution lane
-**Consumer Behavior:** Convert the clarified spec into durable goal-mode work. Preserve intent, non-goals, decision boundaries, acceptance criteria, docs/terminology grounding, scenario-pressure findings, and residual-risk warnings as binding story constraints.
-**Skipped / Already-Satisfied Stages:** Requirement interview, ambiguity clarification, doc/context preflight, and early intent-boundary elicitation
-**Expected Output:**`.omx/ultragoal/brief.md`, `.omx/ultragoal/goals.json`, `.omx/ultragoal/ledger.jsonl`, implementation evidence, verification evidence, and final cleanup/review-gate evidence
-**Best When:** The clarified spec is execution-ready or the user explicitly wants durable goal tracking as the next step
-**Next Recommended Step:** Run the Ultragoal completion loop; launch `$team` only inside an active Ultragoal story when parallel lanes are warranted, and use `$ralph` only as an explicit fallback when the user asks for that legacy persistence mode
### 2. **`$ralplan` (Recommended when architecture/test-shape review is still needed)**
-**Input Artifact:**`.omx/specs/deep-interview-{slug}.md` (optionally accompanied by the transcript/context snapshot for traceability)
-**Consumer Behavior:** Treat the deep-interview spec as the requirements source of truth. Do not repeat the interview by default; refine architecture/feasibility around the clarified intent and boundaries instead.
...
...
@@ -386,7 +421,7 @@ Recommend `$ultragoal` as the default durable goal-mode follow-up because it sup
-**Best When:** Requirements are clear enough to stop interviewing, but architectural validation / consensus planning is still desirable
-**Next Recommended Step:** Use the approved planning artifacts with `$ultragoal` as the default durable goal-mode follow-up (optionally with `$team` for parallel lanes); choose `$autoresearch-goal` for research validation or `$performance-goal` for measurable optimization, and use `$ralph` only as an explicit fallback when a narrow single-owner persistence loop is requested
-**Consumer Behavior:** Use the deep-interview spec as the clarified execution brief. Preserve intent, non-goals, decision boundaries, and acceptance criteria as binding context for planning/execution.
...
...
@@ -395,7 +430,7 @@ Recommend `$ultragoal` as the default durable goal-mode follow-up because it sup
-**Best When:** The clarified spec is already strong enough for direct planning + execution without an additional consensus gate
-**Next Recommended Step:** Continue through autopilot's execution/QA/validation flow; if coordination-heavy execution emerges, prefer `$team` under a leader-owned `$ultragoal` ledger, using `$ralph` only as an explicit fallback when a narrow single-owner persistence loop is requested
-**Consumer Behavior:** Use the spec's acceptance criteria and boundary constraints as the persistence target. Do not reopen requirements discovery unless the user explicitly asks to refine further.
...
...
@@ -404,7 +439,7 @@ Recommend `$ultragoal` as the default durable goal-mode follow-up because it sup
-**Best When:** The user explicitly asks for Ralph's persistent sequential completion pressure; otherwise use `$ultragoal` for durable goal tracking and completion checkpoints
-**Next Recommended Step:** If this explicit fallback is selected, continue Ralph's persistence loop; if work expands into coordination-heavy lanes, hand off to `$team` under `$ultragoal` checkpointing rather than promoting Ralph as the next default
-**Consumer Behavior:** Treat the spec as shared execution context for coordinated parallel work. Preserve the clarified intent, non-goals, decision boundaries, and acceptance criteria as common lane constraints.
...
...
@@ -413,7 +448,7 @@ Recommend `$ultragoal` as the default durable goal-mode follow-up because it sup
-**Best When:** The task is large, multi-lane, or blocker-sensitive enough to justify coordinated parallel execution instead of a single persistent loop
-**Next Recommended Step:** Follow the team verification path when the coordinated execution phase finishes; checkpoint completion through `$ultragoal` by default, escalating to a separate Ralph loop only when the user explicitly asks for that persistent verification/fix owner
### 5. **Refine further**
### 6. **Refine further**
-**Input Artifact:** Existing transcript, context snapshot, and current spec draft
-**Invocation:** Continue the interview loop
-**Consumer Behavior:** Re-enter questioning to resolve the highest-leverage remaining uncertainty
...
...
@@ -437,6 +472,7 @@ Recommend `$ultragoal` as the default durable goal-mode follow-up because it sup
- Use `omx state write/read --input '<json>' --json` for resumable mode state; `state_write` / `state_read` are explicit MCP compatibility fallbacks only
- If the interview cannot ask a required `omx question` round, persist the blocker as terminal state with `active: false` and `current_phase: "blocked"`; do not write a terminal blocked phase with `active: true`
- Read/write context snapshots under `.omx/context/`
- Read applicable repo docs/rules/context during preflight; write durable docs, glossary, ADR, or memory updates only when the user explicitly opts in and the content is public-safe
- Record whether the oversized-context summary gate is not needed, pending, or satisfied before any scoring or handoff step
- Save transcript/spec artifacts under `.omx/interviews/` and `.omx/specs/`
</Tool_Usage>
...
...
@@ -460,7 +496,11 @@ Recommend `$ultragoal` as the default durable goal-mode follow-up because it sup
-[] Transcript written to `.omx/interviews/{slug}-{timestamp}.md`
-[] Spec written to `.omx/specs/deep-interview-{slug}.md`
-[] Brownfield questions use evidence-backed confirmation when applicable
-[] Handoff options provided (`$ralplan`, `$autopilot`, `$ralph`, `$team`) plus context-sensitive goal-mode suggestions (`$ultragoal`, `$autoresearch-goal`, `$performance-goal`) when applicable
-[] Brownfield preflight inspected applicable repo docs/rules/context before user-facing questions
-[] Fuzzy or conflicting terminology was challenged against repo language/current code behavior when applicable
-[] Scenario-based edge-case grilling was used when boundary ambiguity would materially affect implementation
-[] Durable docs/ADR/memory updates, if any, were explicitly opted into and public-safe
-[] Handoff options provided (`$ultragoal`, `$ralplan`, `$autopilot`, `$ralph`, `$team`) plus context-sensitive goal-mode suggestions (`$autoresearch-goal`, `$performance-goal`) when applicable
-[] No direct implementation performed in this mode
@@ -254,31 +254,31 @@ Auto-generated by `omx setup` from the current `config.toml` plus OMX model over
| Role | Model | Reasoning Effort | Use Case |
| --- | --- | --- | --- |
| Frontier (leader) | `gpt-5.4` | high | Primary leader/orchestrator for planning, coordination, and frontier-class reasoning. |
| Frontier (leader) | `gpt-5.5` | high | Primary leader/orchestrator for planning, coordination, and frontier-class reasoning. |
| Spark (explorer/fast) | `gpt-5.3-codex-spark` | low | Fast triage, explore, lightweight synthesis, and low-latency routing. |
| Standard (subagent default) | `gpt-5.4` | high | Default standard-capability model for installable specialists and secondary worker lanes unless a role is explicitly frontier or spark. |
| Standard (subagent default) | `gpt-5.5` | high | Default standard-capability model for installable specialists and secondary worker lanes unless a role is explicitly frontier or spark. |
| `explore` | `gpt-5.3-codex-spark` | low | Fast codebase search and file/symbol mapping (fast-lane, fast) |