Lock the cap32 result and harden the hybrid default recommendation

Persist the larger 32-track benchmark showing hybrid strongly outperforming high_energy, so the default strategy decision rests on multiple larger real-data checkpoints instead of a single subset.

Constraint: Only documentation changes are allowed because benchmark artifacts stay outside version control
Rejected: Keep the default recommendation tentative after cap32 | The 24-track and 32-track capped benchmarks now agree on hybrid superiority
Confidence: high
Scope-risk: narrow
Directive: Use cap24 and cap32 together as the current strongest strategy evidence until a broader multi-style benchmark supersedes them
Tested: Verified /tmp/ab_smoke_seg_cap32_top2/report.json; verified high_energy eval.json; verified docs now record hybrid=20/0.95/1.0 and high_energy=20/0.5/1.0
Not-tested: Wider style-balanced benchmark beyond the FMA top-two subsets
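
The verification described in this commit message can be sketched as a small script. This is a minimal illustration, not the project's tooling: the helper names `load_metrics`, `rank_strategies`, and `correct_at_1` are hypothetical, and the flat `num_queries` / `top1` / `topk` layout assumed for `eval.json` is a guess based on the fields the docs record.

```python
import json
from pathlib import Path


def load_metrics(eval_path):
    """Read one eval.json and keep the three fields the docs record.

    Assumption: the file is a flat JSON object with `num_queries`, `top1`
    and `topk` keys; adjust the lookups if the real schema nests them.
    """
    data = json.loads(Path(eval_path).read_text(encoding="utf-8"))
    return {key: data[key] for key in ("num_queries", "top1", "topk")}


def rank_strategies(metrics_by_strategy):
    """Return strategy names sorted by top1, then topk, best first."""
    return sorted(
        metrics_by_strategy,
        key=lambda name: (
            metrics_by_strategy[name]["top1"],
            metrics_by_strategy[name]["topk"],
        ),
        reverse=True,
    )


def correct_at_1(metrics):
    """Convert a top1 rate back into a count of queries ranked first."""
    return round(metrics["top1"] * metrics["num_queries"])


# Values recorded in the docs for cap32 (subset=32, max_test_queries=20).
recorded = {
    "hybrid": {"num_queries": 20, "top1": 0.95, "topk": 1.0},
    "high_energy": {"num_queries": 20, "top1": 0.5, "topk": 1.0},
}

if __name__ == "__main__":
    for name in rank_strategies(recorded):
        hits = correct_at_1(recorded[name])
        print(f"{name}: {hits} of {recorded[name]['num_queries']} correct at rank 1")
```

With the recorded values, `hybrid` ranks first with 19 of 20 queries correct at rank 1, against 10 of 20 for `high_energy`. To check the real artifacts instead, `load_metrics` could be pointed at each strategy's `eval.json`, provided the schema assumption holds.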

Commit f05e7023 ... f05e70234cff85e3d77a9d273897f0e80d55cfa0 authored 2026-06-02 17:46:42 +0800 by cnb.bofCdSsphPA
Showing 3 changed files with 43 additions and 7 deletions
docs/CHANGELOG.md
docs/open-dataset-workflow.md
docs/session-handoff.md
--- a/docs/CHANGELOG.md
+++ b/docs/CHANGELOG.md
@@ -2,6 +2,27 @@
 ## 2026-06-02
+### Stage: Wrap up the cap32 top2 real-FMA comparison and stabilize the default-strategy conclusion
+Completed:
+- Read `/tmp/ab_smoke_seg_cap32_top2/report.json`
+- Read:
+  - `/tmp/ab_smoke_seg_cap32_top2/hybrid/fma_reports_smoke/eval.json`
+  - `/tmp/ab_smoke_seg_cap32_top2/high_energy/fma_reports_smoke/eval.json`
+- Updated:
+  - [open-dataset-workflow.md](./open-dataset-workflow.md)
+  - [session-handoff.md](./session-handoff.md)
+  - [CHANGELOG.md](./CHANGELOG.md)
+Final results (subset=32, `max_test_queries=20`):
+- `hybrid`: `num_queries=20`, `top1=0.95`, `topk=1.0`
+- `high_energy`: `num_queries=20`, `top1=0.5`, `topk=1.0`
+Conclusions:
+- cap32 reinforces the cap24 conclusion: `hybrid` clearly outperforms `high_energy`
+- The default training / query strategy can now be fixed to `hybrid`
+- `high_energy` is better suited as a specialized comparison strategy than as the default
 ### Stage: Launch the cap32 top2 real-FMA comparison and record the run stages
 Completed:
--- a/docs/open-dataset-workflow.md
+++ b/docs/open-dataset-workflow.md
@@ -175,6 +175,20 @@ flowchart LR
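 - `hybrid` does more than merely tie with `high_energy`
 - On the larger real subset, `hybrid` is more stable
 - The default recommendation should now clearly converge on **`hybrid`**
+### Update: cap32 top2 comparison (subset=32, `max_test_queries=20`)
+Expanding further to a 32-track real FMA subset reinforces the conclusion:
+| Rank | Strategy | num_queries | top1 | topk |
+|---:|---|---:|---:|---:|
+| 1 | `hybrid` | 20 | 0.95 | 1.0 |
+| 2 | `high_energy` | 20 | 0.5 | 1.0 |
+This shows that:
+- `hybrid` still leads clearly on the larger real subset
+- `high_energy` can serve as a strategy biased toward high-energy regions, but it is not stable enough to be the default
+- The default strategy can now be hard-coded as **`hybrid`**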
 /usr/local/miniconda3/bin/python evaluate.py --data data/external_ingested/fma/manifests --model data/models_fma_smoke/best_model.pt --index-prefix data/index_fma_smoke/reference --split test --device cpu --fast-eval --output-json reports/fma-smoke/eval.json
 /usr/local/miniconda3/bin/python scripts/generate_artifacts.py --eval-json reports/fma-smoke/eval.json --config-json reports/fma-smoke/config.json --output-dir reports/fma-smoke --model-version fma-smoke --data-version fma_local
 ```
--- a/docs/session-handoff.md
+++ b/docs/session-handoff.md
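@@ -416,7 +416,7 @@ cap24 top2 final conclusion:
 ---
-## 11. cap32 top2 comparison experiment (in progress)
+## 11. cap32 top2 comparison experiment (completed)
 To confirm that the cap24 conclusion was not a fluke, a larger real-FMA top2 comparison was launched: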
@@ -436,15 +436,15 @@ cd /workspace/acr-engine
  --output-json /tmp/ab_smoke_seg_cap32_top2/report.json
 ```
-Fresh evidence confirmed so far:
+Final results:
 | Item | Status |
 |---|---|
 | `subset_size` | `32` |
 | `max_test_queries` | `20` |
 | `hybrid` | `num_queries=20`, `top1=0.95`, `topk=1.0` |
-| `high_energy` | training in progress |
+| `high_energy` | `num_queries=20`, `top1=0.5`, `topk=1.0` |
-| `report.json` | not yet generated |
+| `report.json` | generated |
 Recovery check command:
@@ -452,9 +452,10 @@ cd /workspace/acr-engine
 pgrep -af 'ab_smoke_seg_cap32_top2|external_adapters.py smoke-local fma /tmp/ab_smoke_seg_cap32_top2|evaluate.py --data /tmp/ab_smoke_seg_cap32_top2|run_demo.py build-index --data /tmp/ab_smoke_seg_cap32_top2|train.py --data /tmp/ab_smoke_seg_cap32_top2'
 ```
-Files to wait for first:
+cap32 top2 final conclusion:
- `/tmp/ab_smoke_seg_cap32_top2/high_energy/fma_reports_smoke/eval.json`
+- `hybrid`: `20 / 0.95 / 1.0`
- `/tmp/ab_smoke_seg_cap32_top2/report.json`
+- `high_energy`: `20 / 0.5 / 1.0`
+- Both larger real-subset rounds, cap24 and cap32, point to the same conclusion: **the default strategy is fixed to `hybrid`**
 - `b766c74` Make open-dataset manifests trainable end to end
 - `fa23144` Add a single-page open dataset workflow for training prep
 - `af33be3` Condense docs and add manifest validation before training