Commit 08379e56 08379e5636315c8ed958706c7288c302e8237d47 by cnb.bofCdSsphPA

Promote hybrid to the default strategy using the stronger cap24 evidence

Persist the larger real-FMA benchmark result showing hybrid clearly outperforming high_energy, so the project recommendation can converge on one default instead of an unresolved tie.

Constraint: Only docs change because benchmark outputs remain outside version control
Rejected: Keep treating hybrid and high_energy as co-equal defaults | The larger 24-track capped benchmark now separates them clearly
Confidence: high
Scope-risk: narrow
Directive: Use cap24 top-two as the current strongest public evidence until a larger capped benchmark supersedes it
Tested: Verified /tmp/ab_smoke_seg_cap24_top2/report.json; verified high_energy eval.json; verified docs now state hybrid=16/1.0/1.0 and high_energy=16/0.8125/1.0
Not-tested: Broader strategy comparison beyond hybrid vs high_energy on the 24-track subset
1 parent 48a5957a
......@@ -2,6 +2,25 @@
## 2026-06-02
### Stage: Wrap up the cap24 top-two real-FMA comparison and confirm the default strategy
Completed:
- Read `/tmp/ab_smoke_seg_cap24_top2/report.json`
- Read `/tmp/ab_smoke_seg_cap24_top2/high_energy/fma_reports_smoke/eval.json`
- Updated:
  - [open-dataset-workflow.md](./open-dataset-workflow.md)
  - [session-handoff.md](./session-handoff.md)
  - [CHANGELOG.md](./CHANGELOG.md)
Final results (subset=24, `max_test_queries=16`):
- `hybrid`: `num_queries=16`, `top1=1.0`, `topk=1.0`
- `high_energy`: `num_queries=16`, `top1=0.8125`, `topk=1.0`
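Both rates are fractions of the same 16 test queries, so they can be converted back into whole query counts. Below is a minimal sketch of that arithmetic. The values are copied from the results above, and the helper `hits` is hypothetical, written only for this illustration.

```python
# Reported cap24 results (copied from the list above).
RESULTS = {
    "hybrid": {"num_queries": 16, "top1": 1.0, "topk": 1.0},
    "high_energy": {"num_queries": 16, "top1": 0.8125, "topk": 1.0},
}


def hits(num_queries: int, rate: float) -> int:
    """Convert a rate back into the number of queries it counts (hypothetical helper)."""
    count = num_queries * rate
    # A rate computed over num_queries must be a multiple of 1/num_queries.
    if abs(count - round(count)) > 1e-9:
        raise ValueError(f"{rate} is not a multiple of 1/{num_queries}")
    return round(count)


for name, r in RESULTS.items():
    n = r["num_queries"]
    print(f"{name}: top1 {hits(n, r['top1'])}/{n}, topk {hits(n, r['topk'])}/{n}")
```

For `high_energy` this gives top1 13/16 and topk 16/16. The whole top1 gap therefore comes from 3 queries where the correct track was still inside the top-k but not at rank 1.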
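Conclusions:
- cap24 separates the strategies better than cap16 did: `hybrid` no longer merely ties with `high_energy`
- The default training / query strategy should now be fixed explicitly as `hybrid`
- `high_energy` fits better as a supplementary baseline, or as the second-choice strategy for data skewed toward high-energy sections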
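The choice of default can be written as a sort over the reported metrics. This sketch assumes `top1` is the primary key and `topk` only breaks ties; the document does not state the ordering rule, and `pick_default` is a hypothetical name.

```python
from typing import Dict


def pick_default(results: Dict[str, Dict[str, float]]) -> str:
    """Return the strategy with the highest (top1, topk) pair.

    Assumed ordering: top1 first, topk only breaks ties on top1.
    On an exact tie in both keys, max() keeps the first strategy it saw.
    """
    return max(results, key=lambda name: (results[name]["top1"], results[name]["topk"]))


cap24 = {
    "hybrid": {"top1": 1.0, "topk": 1.0},
    "high_energy": {"top1": 0.8125, "topk": 1.0},
}
print(pick_default(cap24))  # hybrid
```

If both keys tie exactly, `max` returns whichever strategy appears first. A genuine tie would still need a separate decision.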
### Stage: Launch the larger cap24 top-two real-FMA comparison and record the first result
Completed:
......
......@@ -161,6 +161,20 @@ flowchart LR
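- `high_energy` is currently the strongest tied runner-up, better suited to data that leans toward main sections / high-energy regions
- `beat_aware` suits styles with a strong, regular beat, but was slightly weaker on this round's FMA subset
- `repeated_section_aware` on its own is less stable than the hybrid strategy
### Update: larger cap24 top-two comparison (subset=24, `max_test_queries=16`)
On the larger real FMA subset, only the top two strategies were kept for a further comparison:
| Rank | Strategy | num_queries | top1 | topk |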
|---:|---|---:|---:|---:|
| 1 | `hybrid` | 16 | 1.0 | 1.0 |
| 2 | `high_energy` | 16 | 0.8125 | 1.0 |
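This round separates the strategies better than cap16 did, which shows that:
- `hybrid` does more than just "tie with `high_energy`"
- On the larger real subset, `hybrid` is more stable: top1 1.0 (16 of 16 queries) versus 0.8125 (13 of 16) for `high_energy`
- The default recommendation should now converge explicitly on **`hybrid`**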
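The `top1` and `topk` columns read as retrieval accuracies over the test queries. The sketch below is a generic illustration only. It assumes each query returns a best-first list of candidate track IDs and counts as a hit when its true track appears among the first k. It is not taken from `evaluate.py`, and the value of k used there is not stated in this document. The data is a hypothetical toy example.

```python
from typing import Sequence


def topk_accuracy(ranked: Sequence[Sequence[str]], truth: Sequence[str], k: int) -> float:
    """Fraction of queries whose true track ID is among the first k ranked candidates.

    Generic (assumed) definition; k=1 gives top-1 accuracy.
    """
    if len(ranked) != len(truth):
        raise ValueError("need one ranked candidate list per query")
    if not truth:
        raise ValueError("no queries to score")
    hits = sum(1 for cands, t in zip(ranked, truth) if t in list(cands)[:k])
    return hits / len(truth)


# Hypothetical toy data: 4 queries, candidate track IDs ranked best-first.
ranked = [["a", "b"], ["c", "b"], ["x", "d"], ["e", "f"]]
truth = ["a", "b", "d", "e"]
print(topk_accuracy(ranked, truth, k=1))  # 0.5
print(topk_accuracy(ranked, truth, k=2))  # 1.0
```

Under this definition, a top-1 hit is also a top-k hit for any k >= 1, so `topk` can never be lower than `top1`. Both rows of the table are consistent with that.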
```bash
/usr/local/miniconda3/bin/python evaluate.py --data data/external_ingested/fma/manifests --model data/models_fma_smoke/best_model.pt --index-prefix data/index_fma_smoke/reference --split test --device cpu --fast-eval --output-json reports/fma-smoke/eval.json
/usr/local/miniconda3/bin/python scripts/generate_artifacts.py --eval-json reports/fma-smoke/eval.json --config-json reports/fma-smoke/config.json --output-dir reports/fma-smoke --model-version fma-smoke --data-version fma_local
```
......
......@@ -401,7 +401,7 @@ cd /workspace/acr-engine
| Strategy | subset | max_test_queries | top1 | topk | Status |
|---|---:|---:|---:|---:|---|
| `hybrid` | 24 | 16 | 1.0 | 1.0 | Done |
| `high_energy` | 24 | 16 | - | - | Training |
| `high_energy` | 24 | 16 | 0.8125 | 1.0 | Done |
Status check to run when resuming:
......@@ -409,9 +409,10 @@ cd /workspace/acr-engine
```bash
pgrep -af 'ab_smoke_seg_cap24_top2|external_adapters.py smoke-local fma /tmp/ab_smoke_seg_cap24_top2|evaluate.py --data /tmp/ab_smoke_seg_cap24_top2|run_demo.py build-index --data /tmp/ab_smoke_seg_cap24_top2'
```
If `report.json` has not been generated yet, wait first for:
- `/tmp/ab_smoke_seg_cap24_top2/high_energy/fma_reports_smoke/eval.json`
- `/tmp/ab_smoke_seg_cap24_top2/report.json`
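The wait condition above can also be checked with a short script. This is a minimal sketch that only tests whether the two listed files exist, not whether they are complete. `pending_outputs` is a hypothetical helper.

```python
from pathlib import Path
from typing import Iterable, List

# The two outputs listed above.
EXPECTED = (
    Path("/tmp/ab_smoke_seg_cap24_top2/high_energy/fma_reports_smoke/eval.json"),
    Path("/tmp/ab_smoke_seg_cap24_top2/report.json"),
)


def pending_outputs(paths: Iterable[Path] = EXPECTED) -> List[Path]:
    """Return the expected files that do not exist yet (existence check only)."""
    return [p for p in paths if not p.is_file()]


if __name__ == "__main__":
    missing = pending_outputs()
    if missing:
        print("still waiting for:")
        for p in missing:
            print(f"  {p}")
    else:
        print("both outputs exist")
```

This complements the `pgrep` check. `pgrep` shows whether the jobs are still running, and this shows whether their outputs have been written.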
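Final cap24 top-two conclusion (`num_queries / top1 / topk`):
- `hybrid`: `16 / 1.0 / 1.0`
- `high_energy`: `16 / 0.8125 / 1.0`
- This result is more conclusive than cap16: **the default strategy should now be fixed explicitly as `hybrid`**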
- `b766c74` Make open-dataset manifests trainable end to end
- `fa23144` Add a single-page open dataset workflow for training prep
- `af33be3` Condense docs and add manifest validation before training
......