- 02 Jun, 2026 40 commits
Update the handoff package with newer runtime evidence so the next session can distinguish a still-progressing epoch from a hung pipeline while waiting for the first saved model file.
Constraint: Verification had to rely on live process state because Epoch 1 has not completed yet
Rejected: Leave the prior checkpoint as-is | would force the next session to re-check whether progress continued
Confidence: high
Scope-risk: narrow
Directive: Continue checking for the first transition into saved model output, build-index, or evaluate before drawing quality conclusions
Tested: ps on PID 311629; process scan for smoke-local/build-index/evaluate; validate-splits on /tmp/fma_real_smoke_stopcheck/fma/manifests; find on /tmp/fma_real_smoke_stopcheck/fma_models_smoke
Not-tested: Final FMA smoke report and accuracy metrics
cnb.bofCdSsphPA authored
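The check described above combines two signals: whether the training process is still alive, and whether a first model file has appeared. The following Python sketch shows one way to combine them. It is not the project's tooling. It assumes a POSIX host and assumes saved models are `*.pt` files somewhere under the models directory; the `best_model.pt` name appears in later checkpoints. The status labels are hypothetical. On its own, the sketch cannot tell a hung process from a slow one; that still needs a second signal, such as a step counter that advances between two checks.

```python
import os
from pathlib import Path


def pid_alive(pid: int) -> bool:
    """Return True if a process with this PID exists (POSIX only)."""
    try:
        os.kill(pid, 0)  # signal 0 only checks that the PID exists; nothing is delivered
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but is owned by another user
    return True


def saved_models(models_dir: str, pattern: str = "*.pt") -> list:
    """List saved model files under models_dir (assumes *.pt naming)."""
    root = Path(models_dir)
    if not root.is_dir():
        return []
    return sorted(str(p) for p in root.rglob(pattern))


def smoke_status(pid: int, models_dir: str) -> str:
    """Hypothetical labels combining liveness and first-checkpoint evidence."""
    if saved_models(models_dir):
        return "checkpoint-written"
    if pid_alive(pid):
        return "alive-no-checkpoint"  # slow or hung: needs a second signal to decide
    return "stopped-no-checkpoint"
```

For the run above, the call would be `smoke_status(311629, "/tmp/fma_real_smoke_stopcheck/fma_models_smoke")`.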
Capture the current real-FMA CPU smoke checkpoint, restart path, and delivery handoff so the next session can resume without re-diagnosing an expected long-running training stage.
Constraint: Real FMA smoke is still running on CPU with no GPU available
Rejected: Wait for final smoke completion before documenting | would delay a usable handoff artifact
Confidence: high
Scope-risk: narrow
Directive: Keep staging explicit; do not include datasets, smoke outputs, checkpoints, or caches
Tested: git diff review; live process check; validate-splits on /tmp/fma_real_smoke_stopcheck/fma/manifests
Not-tested: Final FMA smoke metrics after Epoch 1 completion
cnb.bofCdSsphPA authored
Constraint: This checkpoint records running-smoke evidence only and must not stage data, model artifacts, or tmp outputs
Rejected: Wait for the full real FMA smoke to finish before updating handoff docs | The running-state evidence is already valuable for the next session and should not be lost
Confidence: high
Scope-risk: narrow
Directive: Keep future restart notes aligned with the live smoke status and continue using explicit file staging
Tested: Re-verified that the real FMA smoke is running on CPU, that the manifests validate, and that the documented no-GPU condition explains the long training phase
Not-tested: Did not wait for Epoch 1 completion, model checkpoint emission, or downstream build-index/evaluate completion
cnb.bofCdSsphPA authored
Constraint: Limit this checkpoint to ignore rules and handoff notes; do not change tracked artifact history
Rejected: Expand ignore coverage to all noisy data trees immediately | This pass only suppresses well-understood untracked cache noise
Confidence: high
Scope-risk: narrow
Directive: Keep ignore changes incremental and distinguish between untracked cache noise and already-tracked historical artifacts
Tested: Confirmed untracked __pycache__ and .pyc noise disappeared from git status after the ignore update
Not-tested: Did not rewrite tracking state for already-versioned cache or data artifacts
cnb.bofCdSsphPA authored
Constraint: Limit this checkpoint to ignore rules and handoff notes; do not alter dataset contents
Rejected: Ignore broad data or cache trees immediately | This pass only suppresses confirmed locally generated noise with low risk
Confidence: high
Scope-risk: narrow
Directive: Keep adding ignore rules incrementally and only for artifacts proven to be locally generated noise
Tested: Confirmed the targeted .omx wait files and real-smoke CSV no longer appear in git status after the ignore update
Not-tested: Did not broaden ignore coverage to larger data/cache trees in this checkpoint
cnb.bofCdSsphPA authored
Constraint: Limit this checkpoint to handoff and changelog documentation consistency
Rejected: Leave small formatting mismatches across entrypoints | Restart guidance should be easy to copy and compare byte for byte
Confidence: high
Scope-risk: narrow
Directive: Keep the first verification command identical across AGENT, README, session handoff, and delivery handoff
Tested: Verified all four primary entrypoints contain the same runnable command block
Not-tested: No code or training path executed in this consistency-only checkpoint
cnb.bofCdSsphPA authored
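One way to automate the "same first command block" check across the four entrypoints is to extract the first fenced code block from each Markdown file and compare the results. This is a minimal sketch. It assumes the verification command is the first fenced block in each file. The caller supplies the file paths; they are not taken from the repo.

```python
import re
from pathlib import Path
from typing import Dict, List, Optional

FENCE = "`" * 3  # built indirectly so this snippet can sit inside a Markdown file
FENCE_RE = re.compile(re.escape(FENCE) + r"[^\n]*\n(.*?)" + re.escape(FENCE), re.DOTALL)


def first_code_block(text: str) -> Optional[str]:
    """Return the body of the first fenced code block, or None if there is none."""
    match = FENCE_RE.search(text)
    return match.group(1).strip() if match else None


def first_blocks(paths: List[str]) -> Dict[str, Optional[str]]:
    """Map each Markdown file to its first fenced block."""
    return {p: first_code_block(Path(p).read_text(encoding="utf-8")) for p in paths}


def all_identical(blocks: Dict[str, Optional[str]]) -> bool:
    """True only if every file has a first block and all blocks match exactly."""
    values = list(blocks.values())
    return bool(values) and None not in values and len(set(values)) == 1
```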
Constraint: Restrict this checkpoint to handoff documentation consistency only
Rejected: Leave delivery handoff behind the newer restart guidance | All primary restart entrypoints should expose the same first verification command
Confidence: high
Scope-risk: narrow
Directive: Keep the first runnable command identical across AGENT, README, session handoff, and delivery handoff
Tested: Rechecked relative links in the updated delivery handoff and changelog docs
Not-tested: No code or training path executed in this handoff-consistency checkpoint
cnb.bofCdSsphPA authored
Constraint: Limit this checkpoint to memory and handoff documentation only
Rejected: Keep the restart command only in README | New sessions should see the first verification command directly in AGENT memory too
Confidence: high
Scope-risk: narrow
Directive: Keep AGENT memory focused on restart-critical commands and avoid duplicating full workflow specs there
Tested: Rechecked 174 relative links across AGENT, changelog, and handoff docs
Not-tested: No code or training path executed in this memory-only checkpoint
cnb.bofCdSsphPA authored
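Several checkpoints mention rechecking relative links across the docs. That recheck could look roughly like the sketch below. It handles only inline Markdown links of the form `[text](target)`. It skips external URLs and in-page anchors, then resolves each remaining target against the directory of the file that contains the link. It does not handle reference-style links, angle-bracket targets, or URL-encoded paths.

```python
import re
from pathlib import Path
from typing import List, Tuple

LINK_RE = re.compile(r"\[[^\]]*\]\(([^)\s]+)\)")  # inline links only: [text](target)
SKIP_PREFIXES = ("http://", "https://", "mailto:", "#")


def check_relative_links(md_path: Path) -> Tuple[int, List[str]]:
    """Return (number of relative links checked, broken targets) for one file."""
    checked, broken = 0, []
    for target in LINK_RE.findall(md_path.read_text(encoding="utf-8")):
        if target.startswith(SKIP_PREFIXES):
            continue  # external URL or in-page anchor
        file_part = target.split("#", 1)[0]  # drop any section anchor
        checked += 1
        if not (md_path.parent / file_part).exists():
            broken.append(target)
    return checked, broken


def check_many(paths: List[str]) -> Tuple[int, List[str]]:
    """Aggregate link counts and broken targets over several Markdown files."""
    total, problems = 0, []
    for p in paths:
        n, bad = check_relative_links(Path(p))
        total += n
        problems.extend(f"{p}: {t}" for t in bad)
    return total, problems
```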
Constraint: Restrict this checkpoint to navigation and handoff documentation backed by fresh local verification
Rejected: Keep restart guidance read-only without a first command to run | New sessions benefit from an immediate executable sanity check
Confidence: high
Scope-risk: narrow
Directive: Keep README focused on compressed restart guidance and use the offline smoke only as an environment and chain sanity check
Tested: Re-ran business_export_offline_smoke.py successfully and rechecked 215 relative links across the updated docs
Not-tested: Did not connect to a live business export or run full training/evaluation beyond dry-run
cnb.bofCdSsphPA authored
Constraint: Restrict this checkpoint to navigation documentation only
Rejected: Leave the overview mismatch and rely on users to infer reading order | Restart sessions should get a direct, explicit path
Confidence: high
Scope-risk: narrow
Directive: Keep README focused on compressed navigation and restart order, not on duplicating full specs
Tested: Rechecked 215 relative links across the updated overview, changelog, and handoff docs
Not-tested: No code or training path executed in this navigation-only checkpoint
cnb.bofCdSsphPA authored
Constraint: Keep this checkpoint limited to navigation docs and preserve the condensed doc structure
Rejected: Keep the new business-export material discoverable only through deep links | New sessions should find the intake chain from the overview immediately
Confidence: high
Scope-risk: narrow
Directive: Maintain README as the compressed navigation surface and avoid expanding it into another full spec
Tested: Rechecked 211 relative links across the updated overview, changelog, and handoff docs
Not-tested: No code or training path executed in this navigation-only checkpoint
cnb.bofCdSsphPA authored
Constraint: Limit this checkpoint to documentation updates backed by already-collected local evidence
Rejected: Leave the smoke result only in transient chat output | The next session needs the proof captured in repo-native handoff files
Confidence: high
Scope-risk: narrow
Directive: Keep treating the offline smoke as an integration proof, not as a substitute for real business-data validation
Tested: Rechecked 183 relative links and documented the successful offline smoke summary already verified locally
Not-tested: No new code path executed in this documentation-only checkpoint
cnb.bofCdSsphPA authored
Constraint: Keep verification offline-only and avoid touching real databases or production assets
Rejected: Stop at manifest generation without execution evidence | A dry-run smoke gives the next session stronger handoff confidence
Confidence: high
Scope-risk: narrow
Directive: Stage local sample audio inside the smoke workspace so manifest paths remain self-contained and reproducible
Tested: Ran business_export_offline_smoke.py end-to-end; verified normalize/build summaries and train.py --dry-run success; rechecked adapter doc links
Not-tested: Did not run full training/evaluation on live business exports or connect to any database
cnb.bofCdSsphPA authored
…y from normalized rows
Constraint: Keep this checkpoint offline-only and avoid touching real business data, datasets, or model artifacts
Rejected: Leave final manifest shaping as a manual next-session task | The handoff is stronger when catalog/train/test/val can already be produced automatically
Confidence: high
Scope-risk: narrow
Directive: Treat these generated manifests as integration-stage scaffolds and validate final field policy again before production data ingestion
Tested: Ran build_business_project_manifests.py on normalized sample data and verified catalog/train/test/val structure; rechecked 70 relative links
Not-tested: Did not run the generated manifests through full training/evaluation against live business audio
cnb.bofCdSsphPA authored
Constraint: Keep this checkpoint offline-only and avoid touching real business data, datasets, or model artifacts
Rejected: Leave role splitting as a manual next-session step | The export chain is more usable when reference/query/excluded lists are produced automatically
Confidence: high
Scope-risk: narrow
Directive: Treat the split outputs as staging lists and keep final project-manifest adaptation explicit in the downstream integration step
Tested: Normalized the sample CSV, ran split_business_manifest_ready.py, verified 1 reference + 1 query + 1 excluded row, and rechecked 73 relative links
Not-tested: Did not run against a live business export or feed the split outputs into the full training pipeline
cnb.bofCdSsphPA authored
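The sketch below shows how the reference/query/excluded split could work together with the directive, recorded in the manifest-guidance checkpoint, to default ambiguous assets to excluded. The field names `role` and `song_id` are hypothetical stand-ins for whatever the normalized JSONL actually contains. The real logic of `split_business_manifest_ready.py` is not shown in this log.

```python
from typing import Dict, Iterable, List

ASSIGNABLE_ROLES = ("reference", "query")


def split_by_role(rows: Iterable[dict]) -> Dict[str, List[dict]]:
    """Route rows into reference/query/excluded; anything ambiguous is excluded."""
    buckets: Dict[str, List[dict]] = {"reference": [], "query": [], "excluded": []}
    for row in rows:
        role = str(row.get("role") or "").strip().lower()  # hypothetical field name
        has_identity = bool(row.get("song_id"))  # hypothetical field name
        if role in ASSIGNABLE_ROLES and has_identity:
            buckets[role].append(row)
        else:
            # Unknown role or unconfirmed song identity: exclude until manual review.
            buckets["excluded"].append(row)
    return buckets
```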
Constraint: Keep this checkpoint offline-only and avoid touching real databases, datasets, or model artifacts
Rejected: Stop at static CSV/JSONL examples only | The next session needs an executable normalization path, not just samples
Confidence: high
Scope-risk: narrow
Directive: Treat normalized JSONL as manifest-ready staging output and keep final manifest shaping explicit in the integration step
Tested: Ran normalize_business_export.py on the sample CSV and JSONL inputs; verified 3 output rows each; rechecked 71 relative links
Not-tested: Did not run against a live business export or connect to any database
cnb.bofCdSsphPA authored
Constraint: Keep this checkpoint static and avoid any real database connectivity or dataset mutation
Rejected: Leave export details implicit until a live exporter exists | The next session needs concrete SQL, CSV, and JSONL examples now
Confidence: high
Scope-risk: narrow
Directive: Treat the SQL as a field-mapping example only and adapt table names to the real schema during integration
Tested: Parsed the CSV and JSONL examples and rechecked 69 relative links across the export docs
Not-tested: Did not connect to a production database or execute a live export
cnb.bofCdSsphPA authored
Constraint: Keep the checkpoint lightweight and avoid touching real datasets or generated artifacts
Rejected: Defer manifest guidance until a DB export tool exists | The next session needs repo-native field and role contracts now
Confidence: high
Scope-risk: narrow
Directive: Default ambiguous assets to excluded until manual review confirms song identity and usable role
Tested: Parsed manifest templates; verified print_business_type_mapping.py emits valid JSON; rechecked 94 relative links
Not-tested: Did not connect to a real database or run a live export in this checkpoint
cnb.bofCdSsphPA authored
Constraint: Keep this checkpoint documentation-first and avoid staging dataset, cache, or model artifacts
Rejected: Leave the asset-type strategy implicit in chat only | The next session needs repo-native guidance and templates
Confidence: high
Scope-risk: narrow
Directive: Treat type-based buckets as a starting scaffold and keep hard-negative curation manual until evidence supports automation
Tested: Parsed both bucket JSON templates and rechecked 104 relative links across the new docs
Not-tested: Did not run a fresh business-type benchmark in this checkpoint
cnb.bofCdSsphPA authored
Constraint: Keep the checkpoint lightweight and avoid touching dataset or model artifacts
Rejected: Wait to add buckets until automatic semantic labeling exists | Manually curated buckets are enough to unblock the next session now
Confidence: high
Scope-risk: narrow
Directive: Use the template as a curated benchmark scaffold, not as evidence that filenames imply semantics
Tested: Parsed the new JSON template; ran ab_smoke_bucketed.py --help; rechecked targeted relative links
Not-tested: Did not launch a new semantic bucket benchmark run in this checkpoint
cnb.bofCdSsphPA authored
Constraint: Avoid staging datasets, smoke artifacts, /tmp outputs, and caches
Rejected: Delay handoff until larger semantic buckets exist | User asked for immediate delivery and resumability now
Confidence: high
Scope-risk: narrow
Directive: Treat toy prefix buckets as a methodology baseline, not a product conclusion
Tested: Verified /tmp/ab_smoke_bucketed_smoke/report.json and bucket_report.json outputs; reviewed targeted git diff
Not-tested: No new training or benchmark execution in this documentation-only checkpoint
cnb.bofCdSsphPA authored
Constraint: The cap48/cap64 reversal means strategy guidance can no longer rely on a single overall subset result
Rejected: Keep bucket benchmarking as a doc-only next step | The repo now needs an executable baseline so later sessions can measure scale/style divergence directly
Confidence: high
Scope-risk: moderate
Directive: Treat ab_smoke_bucketed.py as the canonical seed for style-aware evaluation, and expand bucket definitions before revisiting global default-strategy claims
Tested: Verified acr-engine/scripts/ab_smoke_bucketed.py passes py_compile; verified the first bucket prefix_000_a produced bucket_report.json with hybrid 4/1.0/1.0 and high_energy 3/1.0/1.0; verified the second bucket execution is in progress
Not-tested: Full multi-bucket report.json completion, richer bucket definitions, and bucket-level aggregate conclusions
cnb.bofCdSsphPA authored
Constraint: Strategy guidance must now reflect that cap48 and cap64 produce different winners under verified runs
Rejected: Keep high_energy as the generic default | The completed cap64 run shows hybrid winning clearly at a larger subset size, so the docs must acknowledge scale sensitivity
Confidence: high
Scope-risk: moderate
Directive: Do not present a single global default strategy again until bucketed and style-aware benchmarks explain the cap48/cap64 divergence
Tested: Verified cap64 report.json, progress.json, high_energy eval.json, and hybrid eval.json; confirmed cap64 winner=hybrid with top1 0.875 vs high_energy 0.625
Not-tested: Multi-seed cap64 aggregates, bucket/style-aware benchmarks, and any revised hybrid training design
cnb.bofCdSsphPA authored
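The sketch below shows one possible winner rule for comparing strategy legs from their `eval.json` outputs. It uses the `num_queries`, `top1`, and `topk` fields recorded in these checkpoints. It refuses to compare legs evaluated on different query counts and breaks `top1` ties with `topk`. Both the refusal and the tie-break are assumptions, not documented behavior of the benchmark scripts.

```python
import json
from pathlib import Path
from typing import Dict


def load_eval(path: str) -> dict:
    """Read one leg's eval.json (fields used below: num_queries, top1, topk)."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def pick_winner(results: Dict[str, dict]) -> str:
    """Rank strategies by top1, then topk; refuse mismatched query counts."""
    if not results:
        raise ValueError("no strategy results to compare")
    counts = {r["num_queries"] for r in results.values()}
    if len(counts) != 1:
        raise ValueError(f"num_queries differ across strategies: {sorted(counts)}")
    return max(results, key=lambda name: (results[name]["top1"], results[name]["topk"]))
```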
Constraint: The cap64 run is still incomplete, so only verified hybrid index-complete and evaluation-running evidence can be recorded safely now
Rejected: Wait for hybrid eval.json before checkpointing | Would lose the verified handoff that hybrid indexing finished and evaluate.py is already running
Confidence: high
Scope-risk: narrow
Directive: Keep cap64 high_energy and hybrid checkpoints symmetric so the final comparison can be written from docs alone if needed
Tested: Verified hybrid reference_progress.json shows 64 refs, 657 windows, 192-d embeddings, and complete status; verified active process is evaluate.py on /tmp/ab_smoke_seg_cap64_top2/hybrid/fma/manifests; verified hybrid eval.json and report.json are still absent
Not-tested: Final hybrid cap64 metrics, final report.json, and any cap64 winner conclusion
cnb.bofCdSsphPA authored
Constraint: The cap64 run is still active, so only verified training-complete evidence can be recorded now without overstating results
Rejected: Wait for hybrid eval before checkpointing | Would lose the stronger handoff evidence that the full hybrid epoch already completed
Confidence: high
Scope-risk: narrow
Directive: Keep distinguishing hybrid training-complete from hybrid index/eval completion until report.json lands
Tested: Verified live session output shows hybrid Epoch 1 progressed from 0/32 to 32/32, and verified the active process remains run_demo.py build-index on /tmp/ab_smoke_seg_cap64_top2/hybrid/fma/manifests while hybrid eval.json and report.json remain absent
Not-tested: Final hybrid cap64 metrics, final report.json, and any cap64 winner conclusion
cnb.bofCdSsphPA authored
Constraint: The cap64 run is still in progress, so this checkpoint can only record verified hybrid stage transitions, not final comparisons
Rejected: Wait for hybrid eval before checkpointing | Would lose the verified evidence that hybrid training finished and indexing has already started
Confidence: high
Scope-risk: narrow
Directive: Keep cap64 branch checkpoints symmetric so high_energy and hybrid can be compared later without re-reading process history
Tested: Verified active process is run_demo.py build-index on /tmp/ab_smoke_seg_cap64_top2/hybrid/fma/manifests; verified /tmp/ab_smoke_seg_cap64_top2/hybrid/fma_models_smoke/best_model.pt exists; verified hybrid eval.json and report.json are still absent
Not-tested: Final hybrid cap64 metrics, final report.json, and any cap64 winner conclusion
cnb.bofCdSsphPA authored
Constraint: The cap64 run is still incomplete, so only branch-transition evidence can be recorded safely at this point
Rejected: Wait for the hybrid eval before checkpointing | Would lose the verified handoff that execution has moved beyond high_energy into hybrid training
Confidence: high
Scope-risk: narrow
Directive: Keep cap64 branch progression explicit so the next session can resume from the current strategy leg without re-inspection
Tested: Verified high_energy eval.json reports num_queries=32, top1=0.625, topk=1.0; verified active processes show external_adapters.py on /tmp/ab_smoke_seg_cap64_top2/hybrid and train.py on /tmp/ab_smoke_seg_cap64_top2/hybrid/fma/manifests; verified hybrid eval.json and report.json are still absent
Not-tested: Final hybrid cap64 metrics, final report.json, and any cap64 winner conclusion
cnb.bofCdSsphPA authored
Constraint: The cap64 run has only produced the high_energy leg so far, so any larger conclusion must wait for hybrid and the final report
Rejected: Wait for report.json before checkpointing | Would lose the verified cap64 high_energy score and the proof that execution has already switched into the hybrid branch
Confidence: high
Scope-risk: narrow
Directive: Do not compare cap64 strategy winners until both legs and the final report land; treat the current 0.625 high_energy score as an intermediate checkpoint only
Tested: Verified high_energy eval.json reports num_queries=32, top1=0.625, topk=1.0; verified progress.json records the same result; verified the active process has switched to the hybrid smoke-local branch and report.json is still absent
Not-tested: Final cap64 hybrid metrics, final report.json, and any cap64-based strategy conclusion
cnb.bofCdSsphPA authored
Constraint: The cap64 run is still active, so this checkpoint can only record stage completion evidence rather than final benchmark conclusions
Rejected: Wait for eval.json or report.json before committing | Would lose the verified handoff that indexing finished and evaluate.py is now running
Confidence: high
Scope-risk: narrow
Directive: Keep stage checkpoints explicit (training complete, index complete, evaluation running, report complete) until cap64 fully settles
Tested: Verified reference_progress.json shows 64 refs, 657 windows, and complete status; verified active process is evaluate.py on /tmp/ab_smoke_seg_cap64_top2/high_energy/fma/manifests; verified high_energy eval.json and report.json are still absent
Not-tested: Final cap64 high_energy metrics, hybrid branch execution, and post-cap64 strategy guidance
cnb.bofCdSsphPA authored
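The stage milestones listed above can be read from the artifacts each leg writes. This sketch assumes the layout seen in these checkpoints: `fma_models_smoke/best_model.pt` and `fma_reports_smoke/eval.json` under each leg, and `report.json` at the work root. Because the exact location of `reference_progress.json` is not recorded, the sketch searches the whole leg directory for it. It also assumes the file's `status` field equals `"complete"` once indexing finishes. Files alone cannot show that evaluation is running: a leg reported as `index-complete` with no `eval.json` still needs a process check.

```python
import json
from pathlib import Path
from typing import Dict, Iterable


def leg_stage(leg_dir: Path) -> str:
    """Latest milestone a strategy leg has reached, judged from files on disk."""
    if not leg_dir.is_dir():
        return "not-started"
    if (leg_dir / "fma_reports_smoke" / "eval.json").exists():
        return "evaluation-complete"
    for progress in leg_dir.rglob("reference_progress.json"):
        data = json.loads(progress.read_text(encoding="utf-8"))
        if data.get("status") == "complete":  # assumed field name and value
            return "index-complete"  # evaluate.py may be running; confirm with ps
    if (leg_dir / "fma_models_smoke" / "best_model.pt").exists():
        return "training-complete"
    return "training-or-earlier"


def run_stages(work_root: str, legs: Iterable[str] = ("high_energy", "hybrid")) -> Dict[str, str]:
    """Stage of each leg plus whether the final report.json has landed."""
    root = Path(work_root)
    stages = {leg: leg_stage(root / leg) for leg in legs}
    stages["report"] = "complete" if (root / "report.json").exists() else "pending"
    return stages
```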
Constraint: The cap64 run is still active, so only verified training-complete evidence can be recorded without overstating results
Rejected: Keep only the older build-index note | The live session now proves the entire high_energy epoch finished, which is stronger handoff evidence
Confidence: high
Scope-risk: narrow
Directive: Distinguish clearly between training-complete, indexing-complete, and report-complete milestones in future cap64 checkpoints
Tested: Verified live session output shows high_energy Epoch 1 progressed from 0/32 to 32/32, and verified the active process remains run_demo.py build-index on /tmp/ab_smoke_seg_cap64_top2/high_energy/fma/manifests
Not-tested: Final cap64 eval metrics, hybrid branch progress, and report.json generation
cnb.bofCdSsphPA authored
Constraint: The cap64 benchmark is still running, so only verified stage-transition evidence can be documented safely
Rejected: Wait for cap64 completion before checkpointing | Would leave the next session without proof that the run advanced from training into build-index
Confidence: high
Scope-risk: narrow
Directive: Keep recording cap64 milestones as they happen, but avoid updating winner guidance until report.json lands
Tested: Verified cap64 processes are active, confirmed the high_energy branch advanced from train.py to run_demo.py build-index on /tmp/ab_smoke_seg_cap64_top2/high_energy/fma/manifests, and confirmed report.json is still absent
Not-tested: Final cap64 scores, hybrid branch progression, and any post-cap64 strategy conclusion
cnb.bofCdSsphPA authored
Constraint: The new cap64 run is still in-flight, so only startup and stage-transition evidence can be documented safely
Rejected: Wait for cap64 results before checkpointing | Would leave the next session without a verified handoff that the larger benchmark is already running
Confidence: high
Scope-risk: narrow
Directive: Keep cap64 artifacts out of git and update strategy guidance only after report.json lands
Tested: Verified the cap64 ab_smoke process is running, confirmed the high_energy smoke-local branch entered train.py on /tmp/ab_smoke_seg_cap64_top2/high_energy/fma/manifests, and recorded the active work root and parameters in docs
Not-tested: Final cap64 metrics, hybrid branch execution, and any post-cap64 strategy conclusion
cnb.bofCdSsphPA authored
Constraint: Strategy guidance had to wait until the full seed=999 report landed and all three cap48 runs could be aggregated consistently
Rejected: Keep treating cap48 as unresolved | The third seed now confirms high_energy repeats the same score while hybrid remains volatile
Confidence: high
Scope-risk: narrow
Directive: Treat high_energy as the cap48 default only within the documented FMA smoke condition until larger cap64 and bucketed benchmarks either confirm or overturn it
Tested: Verified seed=999 report.json, high_energy eval.json, hybrid eval.json, and computed three-seed aggregate showing high_energy mean_top1=0.9167 with zero variance versus hybrid mean_top1=0.8750
Not-tested: cap64-or-larger benchmarks, bucket/style-aware evaluations, and any future hybrid redesign
cnb.bofCdSsphPA authored
Constraint: The cap48 seed=999 run has only completed the hybrid leg, so the three-seed aggregate is still incomplete
Rejected: Wait for high_energy to finish before checkpointing | Would risk losing the verified hybrid seed999 score from the active Ralph session
Confidence: high
Scope-risk: narrow
Directive: Keep recording verified partial benchmark milestones, but do not revise default-strategy guidance until both strategies and the final report are available
Tested: Verified hybrid eval.json reports num_queries=24, top1=0.875, topk=1.0; verified progress.json records the same result; verified high_energy is still running and report.json is still absent
Not-tested: Final high_energy seed999 metrics, final report.json, and updated three-seed aggregate
cnb.bofCdSsphPA authored
Constraint: The running cap48 seed=999 benchmark has not emitted its final report yet, so only in-flight evidence can be recorded safely
Rejected: Claim a new three-seed conclusion now | The aggregate would be speculative without report.json and eval outputs
Confidence: high
Scope-risk: narrow
Directive: When a long benchmark is still active, checkpoint stage evidence explicitly and wait for report.json before changing strategy guidance
Tested: Verified process tree shows hybrid moved from build-index to evaluate.py; verified reference_progress.json reports 48 refs, 491 windows, 192-d embeddings, and complete status; verified report.json is still absent
Not-tested: Final hybrid eval metrics, subsequent high_energy run, and final three-seed aggregate
cnb.bofCdSsphPA authored
Constraint: The cap48 seed=999 benchmark is still running, so this checkpoint must avoid unverified algorithm conclusions
Rejected: Wait for the CPU benchmark to finish | Would delay handoff and leave the next session without a clean restart package
Confidence: high
Scope-risk: narrow
Directive: Keep future doc-only checkpoints surgically staged and do not add data/raw, external_smoke, /tmp outputs, or model artifacts
Tested: Verified staged diff only includes AGENT memory, handoff, changelog, and changelist docs; confirmed /tmp cap48 seed=999 report is not ready yet
Not-tested: The in-flight cap48 seed=999 benchmark result and any follow-up aggregate metrics
cnb.bofCdSsphPA authored
Persist the current two-seed cap48 summary so the strategy recommendation is grounded in aggregated evidence rather than whichever single run happened most recently.
Constraint: Only documentation changes are allowed because benchmark artifacts remain outside version control
Rejected: Keep narrating cap48 one run at a time | The aggregate is now more informative than any individual cap48 run
Confidence: high
Scope-risk: narrow
Directive: Prefer reporting aggregate seed statistics once two or more runs exist; avoid re-elevating single-seed claims above the aggregate
Tested: Verified both cap48 report.json files; computed aggregate mean/min/max/stdev; verified docs now record high_energy mean_top1=0.9167 and hybrid mean_top1=0.8750
Not-tested: Aggregates beyond two seeds or style-bucketed aggregates
cnb.bofCdSsphPA authored
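The per-strategy aggregate across seeds can be computed from the `top1` values in each run's `eval.json`. This is a minimal sketch. It uses the sample standard deviation (`statistics.stdev`), which is an assumption about which stdev the recorded aggregate used, and it reports 0.0 when there is only one seed.

```python
import json
import statistics
from pathlib import Path
from typing import List


def collect_top1(eval_paths: List[str]) -> List[float]:
    """Read the top1 field from each seed's eval.json for one strategy."""
    return [float(json.loads(Path(p).read_text(encoding="utf-8"))["top1"]) for p in eval_paths]


def aggregate_top1(values: List[float]) -> dict:
    """Mean/min/max/stdev of top1 across seeds; stdev is the sample stdev."""
    if not values:
        raise ValueError("need at least one seed")
    return {
        "seeds": len(values),
        "mean_top1": statistics.mean(values),
        "min_top1": min(values),
        "max_top1": max(values),
        "stdev_top1": statistics.stdev(values) if len(values) > 1 else 0.0,
    }
```

If a strategy scores the same `top1` on every seed, the stdev is 0.0. This is the "zero variance" case described in the later three-seed checkpoint.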
Persist the completed seed123 benchmark showing hybrid ahead again, and update the strategy guidance from single-run winner claims to a multi-seed interpretation.
Constraint: Only documentation changes are allowed because benchmark outputs remain outside version control
Rejected: Keep framing cap48 as a stable high_energy win | The second seed materially weakens that interpretation
Confidence: high
Scope-risk: narrow
Directive: Base the hybrid vs high_energy default decision on aggregated multi-seed evidence, not any single cap48 run
Tested: Verified /tmp/ab_smoke_seg_cap48_top2_seed123/report.json; verified high_energy eval.json; verified docs now record hybrid=24/0.9583/1.0 and high_energy=24/0.9167/1.0 for seed123
Not-tested: Formal aggregation across multiple seeds beyond these two cap48 runs
cnb.bofCdSsphPA authored
Persist the newly finished cap48 seed123 hybrid result so the second-seed validation run now has measured evidence instead of only a runtime checkpoint.
Constraint: seed123 high_energy and the final report are still pending
Rejected: Wait for the full seed123 report before updating docs | Would leave the multi-seed evidence stale across sessions
Confidence: high
Scope-risk: narrow
Directive: Replace the seed123 partial section with the final two-strategy ranking once high_energy eval and report.json land
Tested: Verified /tmp/ab_smoke_seg_cap48_top2_seed123/hybrid/fma_reports_smoke/eval.json; verified docs record hybrid=24/0.9583/1.0 and high_energy still in build-index
Not-tested: Final seed123 comparison because high_energy has not finished yet
cnb.bofCdSsphPA authored
Update the handoff and changelog with the newer seed123 runtime milestone so later sessions know the hybrid lane has advanced from build-index into capped evaluation.
Constraint: No measured seed123 score is available yet, only a later execution milestone
Rejected: Leave the older build-index note in place | Would make the restart handoff stale and less actionable
Confidence: high
Scope-risk: narrow
Directive: Replace the seed123 runtime note with measured scores as soon as hybrid eval.json or report.json land
Tested: Verified active seed123 hybrid evaluate.py process; verified docs now record seed123 current phase as evaluate.py --max-queries 24
Not-tested: Seed123 strategy scores because hybrid eval.json has not landed yet
cnb.bofCdSsphPA authored