Commit f05e7023 f05e70234cff85e3d77a9d273897f0e80d55cfa0 by cnb.bofCdSsphPA

Lock the cap32 result and harden the hybrid default recommendation

Persist the larger 32-track benchmark showing hybrid strongly outperforming high_energy, so the default strategy decision rests on multiple larger real-data checkpoints instead of a single subset.

Constraint: Only documentation changes are allowed because benchmark artifacts stay outside version control
Rejected: Keep the default recommendation tentative after cap32 | The 24-track and 32-track capped benchmarks now agree on hybrid superiority
Confidence: high
Scope-risk: narrow
Directive: Use cap24 and cap32 together as the current strongest strategy evidence until a broader multi-style benchmark supersedes them
Tested: Verified /tmp/ab_smoke_seg_cap32_top2/report.json; verified high_energy eval.json; verified docs now record hybrid=20/0.95/1.0 and high_energy=20/0.5/1.0
Not-tested: Wider style-balanced benchmark beyond the FMA top-two subsets
1 parent f228197d
......@@ -2,6 +2,27 @@
## 2026-06-02
### Stage: Wrap up the cap32 top2 real-FMA comparison and settle the default-strategy conclusion
Completed:
- Read `/tmp/ab_smoke_seg_cap32_top2/report.json`
- Read:
  - `/tmp/ab_smoke_seg_cap32_top2/hybrid/fma_reports_smoke/eval.json`
  - `/tmp/ab_smoke_seg_cap32_top2/high_energy/fma_reports_smoke/eval.json`
- Updated:
  - [open-dataset-workflow.md](./open-dataset-workflow.md)
  - [session-handoff.md](./session-handoff.md)
  - [CHANGELOG.md](./CHANGELOG.md)
Final results (subset=32, `max_test_queries=20`):
- `hybrid`: `num_queries=20`, `top1=0.95`, `topk=1.0`
- `high_energy`: `num_queries=20`, `top1=0.5`, `topk=1.0`
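Conclusions:
- cap32 further reinforces the cap24 conclusion: `hybrid` clearly outperforms `high_energy` (top1 0.95 vs. 0.5, with topk=1.0 for both)
- The default training / query strategy can now be firmly fixed to `hybrid`
- `high_energy` is better suited as a targeted comparison strategy than as the default
### Stage: Launch the cap32 top2 real-FMA comparison and record the run phase
Completed: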
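The top1 gap is easier to read as hit counts. A minimal sketch, assuming `top1` and `topk` are the fraction of `num_queries` whose correct reference ranks first / within the top k (the usual meaning; the evaluation code is not shown here). `hit_counts` and `CAP32_RESULTS` are hypothetical names; the numbers are the cap32 results recorded above.

```python
from typing import Dict

# Metrics recorded for the cap32 top2 run (subset=32, max_test_queries=20).
CAP32_RESULTS: Dict[str, Dict[str, float]] = {
    "hybrid": {"num_queries": 20, "top1": 0.95, "topk": 1.0},
    "high_energy": {"num_queries": 20, "top1": 0.5, "topk": 1.0},
}


def hit_counts(metrics: Dict[str, float]) -> Dict[str, int]:
    """Convert top1/topk rates back into query counts.

    Assumption: top1/topk are fractions of num_queries, so rate * n
    recovers the number of queries that hit at rank 1 / within top k.
    """
    n = int(metrics["num_queries"])
    return {
        "top1_hits": round(metrics["top1"] * n),
        "topk_hits": round(metrics["topk"] * n),
    }


for name, metrics in CAP32_RESULTS.items():
    print(name, hit_counts(metrics))
```

Under that assumption, `hybrid` ranks the correct reference first for 19 of 20 queries and `high_energy` for 10 of 20, while both reach 20 of 20 within the top k; topk alone therefore cannot separate the two strategies, and top1 carries the decision.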
......
......@@ -175,6 +175,20 @@ flowchart LR
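- `hybrid` does not merely "tie with `high_energy`"
- On the larger real subset, `hybrid` is more stable
- The default recommendation should now converge explicitly on **`hybrid`**
### Update: cap32 top2 comparison (subset=32, `max_test_queries=20`)
Expanding further to a 32-track real FMA subset strengthens the conclusion:
| Rank | Strategy | num_queries | top1 | topk |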
|---:|---|---:|---:|---:|
| 1 | `hybrid` | 20 | 0.95 | 1.0 |
| 2 | `high_energy` | 20 | 0.5 | 1.0 |
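This shows that:
- `hybrid` still leads clearly on the larger real subset
- Although `high_energy` can serve as a strategy biased toward high-energy regions, it is not stable enough to be the default
- The default strategy can now be hard-coded as **`hybrid`**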
/usr/local/miniconda3/bin/python evaluate.py --data data/external_ingested/fma/manifests --model data/models_fma_smoke/best_model.pt --index-prefix data/index_fma_smoke/reference --split test --device cpu --fast-eval --output-json reports/fma-smoke/eval.json
/usr/local/miniconda3/bin/python scripts/generate_artifacts.py --eval-json reports/fma-smoke/eval.json --config-json reports/fma-smoke/config.json --output-dir reports/fma-smoke --model-version fma-smoke --data-version fma_local
```
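The cap32 table above can be regenerated from the two per-strategy `eval.json` files. A minimal sketch, assuming each `eval.json` has top-level `num_queries`, `top1`, and `topk` keys (the actual schema written by `evaluate.py` is not shown here) and assuming strategies are ranked by `top1` with `topk` as the tie-breaker. `load_metrics` and `render_table` are hypothetical helpers; the run directory is the one used in this doc.

```python
import json
from pathlib import Path
from typing import Dict, List

# Assumption: eval.json exposes these keys at the top level.
METRIC_KEYS = ("num_queries", "top1", "topk")


def load_metrics(path: Path) -> Dict[str, float]:
    """Read the three comparison metrics from one strategy's eval.json."""
    data = json.loads(path.read_text(encoding="utf-8"))
    return {key: data[key] for key in METRIC_KEYS}


def render_table(results: Dict[str, Dict[str, float]]) -> str:
    """Rank strategies by top1, then topk (both descending), as a markdown table."""
    ranked = sorted(
        results.items(),
        key=lambda item: (item[1]["top1"], item[1]["topk"]),
        reverse=True,
    )
    lines: List[str] = [
        "| Rank | Strategy | num_queries | top1 | topk |",
        "|---:|---|---:|---:|---:|",
    ]
    for rank, (name, m) in enumerate(ranked, start=1):
        lines.append(
            f"| {rank} | `{name}` | {m['num_queries']} | {m['top1']} | {m['topk']} |"
        )
    return "\n".join(lines)


if __name__ == "__main__":
    root = Path("/tmp/ab_smoke_seg_cap32_top2")
    paths = {
        name: root / name / "fma_reports_smoke" / "eval.json"
        for name in ("hybrid", "high_energy")
    }
    # Only render when both runs have finished writing their eval.json.
    if all(p.exists() for p in paths.values()):
        print(render_table({name: load_metrics(p) for name, p in paths.items()}))
```

Sorting on the `(top1, topk)` tuple keeps the ranking deterministic when top1 ties; with the cap32 numbers, top1 alone already decides the order.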
......
......@@ -416,7 +416,7 @@ cap24 top2 final conclusion:
---
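## 11. cap32 top2 comparison experiment (in progress)
## 11. cap32 top2 comparison experiment (completed)
To confirm that the cap24 conclusion was not a fluke, a larger real-FMA top2 comparison has been launched: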
......@@ -436,15 +436,15 @@ cd /workspace/acr-engine
--output-json /tmp/ab_smoke_seg_cap32_top2/report.json
```
Fresh evidence confirmed so far:
Final results:
| Item | Status |
|---|---|
| `subset_size` | `32` |
| `max_test_queries` | `20` |
| `hybrid` | `num_queries=20`, `top1=0.95`, `topk=1.0` |
| `high_energy` | training in progress |
| `report.json` | not yet generated |
| `high_energy` | `num_queries=20`, `top1=0.5`, `topk=1.0` |
| `report.json` | generated |
Recovery check command:
......@@ -452,9 +452,10 @@ cd /workspace/acr-engine
pgrep -af 'ab_smoke_seg_cap32_top2|external_adapters.py smoke-local fma /tmp/ab_smoke_seg_cap32_top2|evaluate.py --data /tmp/ab_smoke_seg_cap32_top2|run_demo.py build-index --data /tmp/ab_smoke_seg_cap32_top2|train.py --data /tmp/ab_smoke_seg_cap32_top2'
```
Files to wait for first:
- `/tmp/ab_smoke_seg_cap32_top2/high_energy/fma_reports_smoke/eval.json`
- `/tmp/ab_smoke_seg_cap32_top2/report.json`
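cap32 top2 final conclusion (`num_queries / top1 / topk`):
- `hybrid`: `20 / 0.95 / 1.0`
- `high_energy`: `20 / 0.5 / 1.0`
- Both larger real-subset runs, cap24 and cap32, point to the same conclusion: **the default strategy is fixed to `hybrid`**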
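The decision rule behind this conclusion can be stated as: keep a strategy as the default only while every current strongest checkpoint names the same winner. A minimal sketch; `agreed_default` and the checkpoint labels are hypothetical, and each checkpoint's winner is taken from the conclusions recorded in these docs rather than recomputed from metrics.

```python
from typing import Mapping, Optional


def agreed_default(winners: Mapping[str, str]) -> Optional[str]:
    """Return the shared winning strategy if all checkpoints agree, else None.

    `winners` maps a checkpoint label (e.g. "cap24") to the strategy that
    ranked first on it. An empty mapping yields None: no evidence, no default.
    """
    distinct = set(winners.values())
    return distinct.pop() if len(distinct) == 1 else None


# Winners recorded for the two larger real-FMA checkpoints.
print(agreed_default({"cap24": "hybrid", "cap32": "hybrid"}))
```

If a broader multi-style benchmark later names a different winner, adding it to the mapping makes the function return `None`, signalling that the default should be revisited rather than silently kept.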
- `b766c74` Make open-dataset manifests trainable end to end
- `fa23144` Add a single-page open dataset workflow for training prep
- `af33be3` Condense docs and add manifest validation before training
......