Commit c659380d c659380d9cceff27e1fdde325b89191486ea1ad5 by cnb.bofCdSsphPA

Lock the final cap16 FMA benchmark ranking into the workflow docs

Persist the completed capped real-data benchmark results so future sessions can use the final strategy ordering and recommendation without replaying the run.

Constraint: Only documentation should change because benchmark artifacts live outside version control
Rejected: Leave the result only in /tmp report files | Would make the evidence fragile across sessions
Confidence: high
Scope-risk: narrow
Directive: Use cap16 as the current default evidence point until a larger capped benchmark supersedes it
Tested: Verified /tmp/ab_smoke_seg_cap16/report.json; verified repeated_section_aware eval.json; verified docs reflect final ranking hybrid/high_energy/beat_aware/repeated_section_aware
Not-tested: Larger real-dataset benchmark beyond the 16-track capped subset
1 parent 29c1962c
......@@ -2,6 +2,27 @@
## 2026-06-02
### Stage: Wrap up the cap16 real-FMA capped segmentation benchmark
Completed:
- Read `/tmp/ab_smoke_seg_cap16/report.json`
- Confirmed the final evaluation result for `repeated_section_aware`
- Updated:
  - [open-dataset-workflow.md](./open-dataset-workflow.md)
  - [session-handoff.md](./session-handoff.md)
  - [CHANGELOG.md](./CHANGELOG.md)
Final results (subset=16, `max_test_queries=12`):
- `hybrid`: `num_queries=12`, `top1=1.0`, `topk=1.0`
- `high_energy`: `num_queries=12`, `top1=1.0`, `topk=1.0`
- `beat_aware`: `num_queries=12`, `top1=0.9167`, `topk=1.0`
- `repeated_section_aware`: `num_queries=12`, `top1=0.8333`, `topk=1.0`
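Because every strategy answered the same 12 capped queries, the rounded rates correspond to whole hit counts. This assumes `top1` is the fraction of queries whose correct track ranks first, which these notes do not spell out. Under that assumption `beat_aware` missed 1 of 12 and `repeated_section_aware` missed 2 of 12. A small sanity check (`correct_count` is a hypothetical helper):

```python
# Scores copied from the cap16 results above; every strategy answered 12 queries.
NUM_QUERIES = 12
TOP1 = {
    "hybrid": 1.0,
    "high_energy": 1.0,
    "beat_aware": 0.9167,
    "repeated_section_aware": 0.8333,
}


def correct_count(rate: float, n: int) -> int:
    """Recover the whole number of top-1 hits behind a rate rounded to 4 decimals.

    Assumes top1 = hits / n (an assumption, not stated in the benchmark docs).
    """
    hits = round(rate * n)
    # The recovered count must round back to the reported rate.
    if round(hits / n, 4) != rate:
        raise ValueError(f"{rate} is not k/{n} for any whole k")
    return hits


for name, rate in TOP1.items():
    print(f"{name}: {correct_count(rate, NUM_QUERIES)}/{NUM_QUERIES} top-1 hits")
```

This prints 12/12, 12/12, 11/12 and 10/12. A single extra miss therefore costs about 0.083 `top1` at this query budget.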
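Conclusions:
- Under a fixed query budget, `hybrid` remains the current default choice
- `high_energy` is the strongest runner-up and tied with `hybrid` on this cap16 run
- `beat_aware` and `repeated_section_aware` are less stable than the hybrid strategy when used on their own
### Stage: Deliver the handoff for resuming the current segmentation benchmark
Completed: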
......
......@@ -144,6 +144,23 @@ flowchart LR
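What this step achieves:
- The earlier A/B ranking mostly reflected "coverage"
- With the cap, strategies can be compared more fairly on "recognition quality at the same query cost"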
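Here is a minimal sketch of an equal query budget. It assumes each segmentation strategy produces its own list of candidate query segments and that the cap keeps a reproducible random subset of at most `max_test_queries`. `cap_queries` and `cap_all` are hypothetical helpers, not the benchmark's actual code. The default `seed=42` only mirrors the `--seed 42` flag used in the handoff commands.

```python
import random
from typing import Dict, List, Sequence


def cap_queries(queries: Sequence[str], max_test_queries: int, seed: int = 42) -> List[str]:
    """Keep at most max_test_queries queries, chosen reproducibly.

    Hypothetical helper: the real benchmark may cap differently (e.g. first N).
    """
    if len(queries) <= max_test_queries:
        return list(queries)
    rng = random.Random(seed)  # private RNG: same seed -> same subset every run
    return rng.sample(list(queries), max_test_queries)


def cap_all(per_strategy: Dict[str, Sequence[str]], max_test_queries: int,
            seed: int = 42) -> Dict[str, List[str]]:
    """Apply the same query budget to every segmentation strategy."""
    return {name: cap_queries(qs, max_test_queries, seed) for name, qs in per_strategy.items()}


# Invented candidate lists, only to show the effect of the cap.
candidates = {
    "hybrid": [f"hybrid_seg_{i}" for i in range(30)],
    "beat_aware": [f"beat_seg_{i}" for i in range(8)],
}
capped = cap_all(candidates, max_test_queries=12)
print({name: len(qs) for name, qs in capped.items()})  # {'hybrid': 12, 'beat_aware': 8}
```

A strategy with fewer candidates than the cap keeps all of them, so the budget is an upper bound. In the cap16 run every strategy reached the full 12.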
### Latest real-FMA capped results (subset=16, `max_test_queries=12`)
A fairer real-FMA A/B round has been completed:
| Rank | Strategy | num_queries | top1 | topk |
|---:|---|---:|---:|---:|
| 1 | `hybrid` | 12 | 1.0 | 1.0 |
| 2 | `high_energy` | 12 | 1.0 | 1.0 |
| 3 | `beat_aware` | 12 | 0.9167 | 1.0 |
| 4 | `repeated_section_aware` | 12 | 0.8333 | 1.0 |
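The 1/2 split between `hybrid` and `high_energy` reflects a default preference, not a score gap: both reach 1.0 on `top1` and `topk`. The sketch below applies standard competition ranking on (`top1`, `topk`) to the numbers in the table, which makes the tie explicit. Using that pair as the sort key is an assumption about how the table was ordered, and `rank_strategies` is a hypothetical helper.

```python
from typing import Dict, List, Tuple

# (num_queries, top1, topk) per strategy, copied from the table above.
RESULTS: Dict[str, Tuple[int, float, float]] = {
    "hybrid": (12, 1.0, 1.0),
    "high_energy": (12, 1.0, 1.0),
    "beat_aware": (12, 0.9167, 1.0),
    "repeated_section_aware": (12, 0.8333, 1.0),
}


def rank_strategies(results: Dict[str, Tuple[int, float, float]]) -> List[Tuple[int, str]]:
    """Competition ranking ("1, 1, 3, 4") by top1, then topk; ties share a rank."""
    # sorted() stays stable with reverse=True, so tied strategies keep input order.
    ordered = sorted(results.items(), key=lambda kv: (kv[1][1], kv[1][2]), reverse=True)
    ranked: List[Tuple[int, str]] = []
    prev_key = None
    rank = 0
    for position, (name, (_, top1, topk)) in enumerate(ordered, start=1):
        key = (top1, topk)
        if key != prev_key:
            rank = position  # a new score level starts at the current position
            prev_key = key
        ranked.append((rank, name))
    return ranked


for rank, name in rank_strategies(RESULTS):
    print(rank, name)
```

Both `hybrid` and `high_energy` come out at rank 1. Choosing `hybrid` as the default is therefore a judgment call, not something the scores decide.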
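Current recommendations:
- **`hybrid` remains the preferred default training / query strategy**
- `high_energy` is currently the strongest runner-up (tied with `hybrid` on scores) and suits data dominated by main sections / high-energy regions
- `beat_aware` suits styles with strong, regular beats better, but was slightly weaker on this FMA subset
- `repeated_section_aware` on its own is less stable than the hybrid strategy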
/usr/local/miniconda3/bin/python evaluate.py --data data/external_ingested/fma/manifests --model data/models_fma_smoke/best_model.pt --index-prefix data/index_fma_smoke/reference --split test --device cpu --fast-eval --output-json reports/fma-smoke/eval.json
/usr/local/miniconda3/bin/python scripts/generate_artifacts.py --eval-json reports/fma-smoke/eval.json --config-json reports/fma-smoke/config.json --output-dir reports/fma-smoke --model-version fma-smoke --data-version fma_local
```
......
......@@ -331,14 +331,14 @@ cd /workspace/acr-engine
--output-json /tmp/ab_smoke_seg_cap16/report.json
```
At the time of this handoff, cap16 has finished; the final results are:
| Strategy | num_queries | top1 | topk | Status |
|---|---:|---:|---:|---|
| `hybrid` | 12 | 1.0 | 1.0 | Completed |
| `high_energy` | 12 | 1.0 | 1.0 | Completed |
| `beat_aware` | 12 | 0.9167 | 1.0 | Completed |
| `repeated_section_aware` | 12 | 0.8333 | 1.0 | Completed |
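To regenerate this table from `/tmp/ab_smoke_seg_cap16/report.json` in a later session, a sketch like the one below could work. The JSON layout it expects (strategy name mapped to `num_queries`, `top1`, `topk`) is an assumption, so check the real `report.json` before relying on it. `render_status_table` is a hypothetical helper. Strategies missing from the report are marked as not started / incomplete, mirroring the status column above.

```python
import json
from typing import Dict, List

STRATEGIES = ["hybrid", "high_energy", "beat_aware", "repeated_section_aware"]


def render_status_table(report: Dict[str, Dict[str, float]],
                        strategies: List[str] = STRATEGIES) -> str:
    """Render the handoff table; strategies absent from the report are marked unfinished."""
    lines = [
        "| Strategy | num_queries | top1 | topk | Status |",
        "|---|---:|---:|---:|---|",
    ]
    for name in strategies:
        row = report.get(name)
        if row is None:
            lines.append(f"| `{name}` | - | - | - | Not started / incomplete |")
        else:
            lines.append(
                f"| `{name}` | {row['num_queries']} | {row['top1']} | {row['topk']} | Completed |"
            )
    return "\n".join(lines)


# Hypothetical schema: {"<strategy>": {"num_queries": ..., "top1": ..., "topk": ...}}.
example = json.loads('{"hybrid": {"num_queries": 12, "top1": 1.0, "topk": 1.0}}')
print(render_status_table(example))
```

To use it, replace `example` with `json.load(open("/tmp/ab_smoke_seg_cap16/report.json"))`, once you have confirmed that the schema matches.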
### 重启后第一优先动作
......@@ -348,10 +348,10 @@ cd /workspace/acr-engine
pgrep -af 'ab_smoke_seg_cap16|external_adapters.py smoke-local fma /tmp/ab_smoke_seg_cap16|evaluate.py --data /tmp/ab_smoke_seg_cap16|run_demo.py build-index --data /tmp/ab_smoke_seg_cap16'
```
2. If it is still running, wait for `/tmp/ab_smoke_seg_cap16/report.json`; if `report.json` already exists, read it first and sync the docs
3. If interrupted:
   - Keep the existing `/tmp/ab_smoke_seg_cap16/*` results as a manual record
   - Re-run the missing strategies, or run a single one on its own:
```bash
cd /workspace/acr-engine
......@@ -369,10 +369,10 @@ cd /workspace/acr-engine
--seed 42
```
4. The final recommendation for this cap16 round is settled:
   - Default: `hybrid`
   - Strong runner-up: `high_energy`
   - `beat_aware` / `repeated_section_aware` are better kept as supplementary baselines than used as the default strategy
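Steps 1 to 3 of the restart triage can be sketched in Python as follows. This rests on two assumptions. First, each finished strategy is assumed to leave `<root>/<strategy>/eval.json`; that per-strategy layout is inferred, not confirmed. Second, `pgrep -f` on a single substring stands in for the longer pattern from step 1. `benchmark_running`, `missing_strategies`, and `next_action` are hypothetical helpers.

```python
import subprocess
from pathlib import Path
from typing import List

ROOT = Path("/tmp/ab_smoke_seg_cap16")
STRATEGIES = ["hybrid", "high_energy", "beat_aware", "repeated_section_aware"]


def benchmark_running(pattern: str = "ab_smoke_seg_cap16") -> bool:
    """True if any process command line contains `pattern` (needs pgrep on PATH)."""
    # pgrep exits 0 when at least one process matches and 1 when none do.
    result = subprocess.run(["pgrep", "-f", pattern], capture_output=True)
    return result.returncode == 0


def missing_strategies(root: Path, strategies: List[str] = STRATEGIES) -> List[str]:
    """Strategies without eval.json (assumed layout: <root>/<strategy>/eval.json)."""
    return [s for s in strategies if not (root / s / "eval.json").is_file()]


def next_action(root: Path, running: bool) -> str:
    """Prefer an existing report, then wait for a live run, then list gaps to re-run."""
    if (root / "report.json").is_file():
        return "read report.json and sync the docs"
    if running:
        return "wait for report.json"
    missing = missing_strategies(root)
    if missing:
        return "re-run: " + ", ".join(missing)
    return "all strategies have eval.json; record them manually"
```

On the benchmark host, `print(next_action(ROOT, benchmark_running()))` prints one of these actions. Passing `running` in as a parameter keeps `next_action` testable without a live process.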
- `b766c74` Make open-dataset manifests trainable end to end
- `fa23144` Add a single-page open dataset workflow for training prep
- `af33be3` Condense docs and add manifest validation before training
......