Promote hybrid to the default strategy using the stronger cap24 evidence

Persist the larger real-FMA benchmark result showing hybrid clearly outperforming high_energy, so the project recommendation can converge on one default instead of an unresolved tie.

Constraint: Only docs change because benchmark outputs remain outside version control
Rejected: Keep treating hybrid and high_energy as co-equal defaults | The larger 24-track capped benchmark now separates them clearly
Confidence: high
Scope-risk: narrow
Directive: Use cap24 top-two as the current strongest public evidence until a larger capped benchmark supersedes it
Tested: Verified /tmp/ab_smoke_seg_cap24_top2/report.json; verified high_energy eval.json; verified docs now state hybrid=16/1.0/1.0 and high_energy=16/0.8125/1.0
Not-tested: Broader strategy comparison beyond hybrid vs high_energy on the 24-track subset
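
As a quick sanity check on the reported figures, the top-1 rates can be converted back into hit counts. The sketch below assumes that `top1` is the fraction of queries whose first-ranked result is correct. The names `RESULTS` and `top1_hits` are illustrative and not part of the repository.

```python
# Figures copied from the commit message: (num_queries, top1) per strategy.
RESULTS = {
    "hybrid": (16, 1.0),
    "high_energy": (16, 0.8125),
}


def top1_hits(num_queries: int, top1: float) -> int:
    """Convert a top-1 rate back into a count of correct queries.

    Assumes top1 = correct_top1 / num_queries (an assumption about the metric).
    """
    return round(num_queries * top1)


hits = {name: top1_hits(n, rate) for name, (n, rate) in RESULTS.items()}
gap = hits["hybrid"] - hits["high_energy"]
print(hits, gap)
```

Under that assumption, hybrid answers 16 of 16 queries correctly at rank 1 and high_energy answers 13 of 16, a gap of 3 queries on this subset.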

Commit 08379e56 ... 08379e5636315c8ed958706c7288c302e8237d47 authored 2026-06-02 17:36:12 +0800 by cnb.bofCdSsphPA
Showing 3 changed files with 38 additions and 4 deletions
docs/CHANGELOG.md
docs/open-dataset-workflow.md
docs/session-handoff.md
--- a/docs/CHANGELOG.md
+++ b/docs/CHANGELOG.md
@@ -2,6 +2,25 @@
 ## 2026-06-02
+### Stage: Wrap up the cap24 top2 real-FMA comparison and confirm the default strategy
+Completed:
+- Read `/tmp/ab_smoke_seg_cap24_top2/report.json`
+- Read `/tmp/ab_smoke_seg_cap24_top2/high_energy/fma_reports_smoke/eval.json`
+- Updated:
+  - [open-dataset-workflow.md](./open-dataset-workflow.md)
+  - [session-handoff.md](./session-handoff.md)
+  - [CHANGELOG.md](./CHANGELOG.md)
+Final results (subset=24, `max_test_queries=16`):
+- `hybrid`: `num_queries=16`, `top1=1.0`, `topk=1.0`
+- `high_energy`: `num_queries=16`, `top1=0.8125`, `topk=1.0`
+Conclusions:
+- cap24 discriminates better than cap16; `hybrid` no longer merely ties with `high_energy`
+- The default training / query strategy should now be fixed to `hybrid`
+- `high_energy` is better suited as a supplementary baseline, or as the second choice for data skewed toward high-energy regions
 ### Stage: Launch the larger cap24 top2 real-FMA comparison and record the first result
 Completed:
--- a/docs/open-dataset-workflow.md
+++ b/docs/open-dataset-workflow.md
@@ -161,6 +161,20 @@ flowchart LR
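 - `high_energy` is currently the strongest joint runner-up, suited to data weighted toward main sections / high-energy regions
 - `beat_aware` suits styles with a strong regular beat, but is slightly weaker on this round's FMA subset
 - `repeated_section_aware` on its own is less stable than the hybrid strategy
+### Update: larger cap24 top2 comparison (subset=24, `max_test_queries=16`)
+On the larger real-FMA subset, only the top two strategies were kept for further comparison:
+| Rank | Strategy | num_queries | top1 | topk |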
+|---:|---|---:|---:|---:|
+| 1 | `hybrid` | 16 | 1.0 | 1.0 |
+| 2 | `high_energy` | 16 | 0.8125 | 1.0 |
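+This round discriminates better than cap16, which shows that:
+- `hybrid` does not merely "tie with `high_energy`"
+- On the larger real subset, `hybrid` is more stable
+- The default recommendation should now converge on **`hybrid`**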
 /usr/local/miniconda3/bin/python evaluate.py --data data/external_ingested/fma/manifests --model data/models_fma_smoke/best_model.pt --index-prefix data/index_fma_smoke/reference --split test --device cpu --fast-eval --output-json reports/fma-smoke/eval.json
 /usr/local/miniconda3/bin/python scripts/generate_artifacts.py --eval-json reports/fma-smoke/eval.json --config-json reports/fma-smoke/config.json --output-dir reports/fma-smoke --model-version fma-smoke --data-version fma_local
 ```
--- a/docs/session-handoff.md
+++ b/docs/session-handoff.md
@@ -401,7 +401,7 @@ cd /workspace/acr-engine
 | Strategy | subset | max_test_queries | top1 | topk | Status |
 |---|---:|---:|---:|---:|---|
 | `hybrid` | 24 | 16 | 1.0 | 1.0 | Completed |
-| `high_energy` | 24 | 16 | - | - | Training |
+| `high_energy` | 24 | 16 | 0.8125 | 1.0 | Completed |
 Resume-check command:
@@ -409,9 +409,10 @@ cd /workspace/acr-engine
 pgrep -af 'ab_smoke_seg_cap24_top2|external_adapters.py smoke-local fma /tmp/ab_smoke_seg_cap24_top2|evaluate.py --data /tmp/ab_smoke_seg_cap24_top2|run_demo.py build-index --data /tmp/ab_smoke_seg_cap24_top2'
 ```
-If `report.json` has not been generated yet, wait first for:
+cap24 top2 final conclusion:
- `/tmp/ab_smoke_seg_cap24_top2/high_energy/fma_reports_smoke/eval.json`
+- `hybrid`: `16 / 1.0 / 1.0`
- `/tmp/ab_smoke_seg_cap24_top2/report.json`
+- `high_energy`: `16 / 0.8125 / 1.0`
+- This result is more conclusive than cap16: **the default strategy should now be fixed to `hybrid`**
 - `b766c74` Make open-dataset manifests trainable end to end
 - `fa23144` Add a single-page open dataset workflow for training prep
 - `af33be3` Condense docs and add manifest validation before training