Lock the final cap16 FMA benchmark ranking into the workflow docs
Persist the completed capped real-data benchmark results so future sessions can use the final strategy ordering and recommendation without replaying the run.

Constraint: Only documentation should change because benchmark artifacts live outside version control
Rejected: Leave the result only in /tmp report files | Would make the evidence fragile across sessions
Confidence: high
Scope-risk: narrow
Directive: Use cap16 as the current default evidence point until a larger capped benchmark supersedes it
Tested: Verified /tmp/ab_smoke_seg_cap16/report.json; verified repeated_section_aware eval.json; verified docs reflect final ranking hybrid/high_energy/beat_aware/repeated_section_aware
Not-tested: Larger real-dataset benchmark beyond the 16-track capped subset
Showing 3 changed files with 48 additions and 10 deletions
````diff
@@ -2,6 +2,27 @@
 
 ## 2026-06-02
 
+### Stage: Wrap up the cap16 real-FMA capped segmentation benchmark
+
+Completed:
+- Read `/tmp/ab_smoke_seg_cap16/report.json`
+- Confirmed the final evaluation result for `repeated_section_aware`
+- Updated:
+  - [open-dataset-workflow.md](./open-dataset-workflow.md)
+  - [session-handoff.md](./session-handoff.md)
+  - [CHANGELOG.md](./CHANGELOG.md)
+
+Final results (subset=16, `max_test_queries=12`):
+- `hybrid`: `num_queries=12`, `top1=1.0`, `topk=1.0`
+- `high_energy`: `num_queries=12`, `top1=1.0`, `topk=1.0`
+- `beat_aware`: `num_queries=12`, `top1=0.9167`, `topk=1.0`
+- `repeated_section_aware`: `num_queries=12`, `top1=0.8333`, `topk=1.0`
+
+Conclusions:
+- Under a fixed query budget, `hybrid` remains the current default first choice
+- `high_energy` is the strongest runner-up and ties `hybrid` in this cap16 round
+- Used on their own, `beat_aware` and `repeated_section_aware` are less stable than the hybrid strategy
+
 ### Stage: Deliver the handoff for resuming the current slicing benchmark
 
 Completed:
````
````diff
@@ -144,6 +144,23 @@ flowchart LR
 The point of this step:
 - The earlier A/B ranking leaned more toward "coverage"
 - With the cap, "recognition quality at equal query cost" can be compared more fairly
+
+### Latest real-FMA capped results (subset=16, `max_test_queries=12`)
+
+A fairer real-FMA A/B round has been completed:
+
+| Rank | Strategy | num_queries | top1 | topk |
+|---:|---|---:|---:|---:|
+| 1 | `hybrid` | 12 | 1.0 | 1.0 |
+| 2 | `high_energy` | 12 | 1.0 | 1.0 |
+| 3 | `beat_aware` | 12 | 0.9167 | 1.0 |
+| 4 | `repeated_section_aware` | 12 | 0.8333 | 1.0 |
+
+Current recommendation:
+- **Default training / query strategy: still prefer `hybrid`**
+- `high_energy` is currently the strongest (tied) runner-up, suited to data dominated by main sections / high-energy regions
+- `beat_aware` suits styles with strong, regular beats, but is slightly weaker on this FMA subset
+- `repeated_section_aware` on its own is less stable than the hybrid strategy
 /usr/local/miniconda3/bin/python evaluate.py --data data/external_ingested/fma/manifests --model data/models_fma_smoke/best_model.pt --index-prefix data/index_fma_smoke/reference --split test --device cpu --fast-eval --output-json reports/fma-smoke/eval.json
 /usr/local/miniconda3/bin/python scripts/generate_artifacts.py --eval-json reports/fma-smoke/eval.json --config-json reports/fma-smoke/config.json --output-dir reports/fma-smoke --model-version fma-smoke --data-version fma_local
 ```
````
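The sketch below shows one plausible way to derive the ranking table from the capped results. It makes two assumptions, since the diff specifies neither: strategies are ordered by `top1` and then `topk`, both descending, and a ranking is only meaningful when every strategy used the same `num_queries`. The names `CAP16` and `rank_strategies` are hypothetical and for illustration only.

```python
# Hedged sketch, not the benchmark's own code. ASSUMED ordering rule:
# top1 descending, then topk descending. Numbers copied from the table.
from typing import Dict, List, Tuple

Results = Dict[str, Dict[str, float]]

CAP16: Results = {
    "hybrid": {"num_queries": 12, "top1": 1.0, "topk": 1.0},
    "high_energy": {"num_queries": 12, "top1": 1.0, "topk": 1.0},
    "beat_aware": {"num_queries": 12, "top1": 0.9167, "topk": 1.0},
    "repeated_section_aware": {"num_queries": 12, "top1": 0.8333, "topk": 1.0},
}


def rank_strategies(results: Results) -> List[Tuple[int, str]]:
    # The cap only makes the comparison fair if every strategy spent the
    # same query budget, so refuse to rank mixed budgets.
    budgets = {r["num_queries"] for r in results.values()}
    if len(budgets) != 1:
        raise ValueError(f"unequal query budgets: {sorted(budgets)}")
    # sorted() is stable: exact ties keep their input order. That is what
    # puts `hybrid` ahead of `high_energy` here; the metrics alone do not.
    ordered = sorted(results, key=lambda s: (-results[s]["top1"], -results[s]["topk"]))
    return [(i + 1, name) for i, name in enumerate(ordered)]


if __name__ == "__main__":
    for rank, name in rank_strategies(CAP16):
        print(rank, name)
```

Because the sort is stable, the tie-break is visible in the code. It shows that preferring `hybrid` over `high_energy` as the default is a judgement made on top of these numbers. The cap16 metrics alone do not decide it.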
````diff
@@ -331,14 +331,14 @@ cd /workspace/acr-engine
 --output-json /tmp/ab_smoke_seg_cap16/report.json
 ```
 
-At the time of this handoff, the partial results obtained so far:
+At the time of this handoff, cap16 has finished. Final results:
 
 | Strategy | num_queries | top1 | topk | Status |
 |---|---:|---:|---:|---|
 | `hybrid` | 12 | 1.0 | 1.0 | Done |
-| `beat_aware` | 12 | 0.9167 | 1.0 | Done |
 | `high_energy` | 12 | 1.0 | 1.0 | Done |
-| `repeated_section_aware` | - | - | - | Not started / not finished |
+| `beat_aware` | 12 | 0.9167 | 1.0 | Done |
+| `repeated_section_aware` | 12 | 0.8333 | 1.0 | Done |
 
 ### First-priority actions after restart
 
````
````diff
@@ -348,10 +348,10 @@ cd /workspace/acr-engine
 pgrep -af 'ab_smoke_seg_cap16|external_adapters.py smoke-local fma /tmp/ab_smoke_seg_cap16|evaluate.py --data /tmp/ab_smoke_seg_cap16|run_demo.py build-index --data /tmp/ab_smoke_seg_cap16'
 ```
 
-2. If it is still running, wait for `/tmp/ab_smoke_seg_cap16/report.json`
+2. If `report.json` already exists, read it first and sync the docs
 3. If interrupted:
-   - Keep the existing `/tmp/ab_smoke_seg_cap16/hybrid` and `/tmp/ab_smoke_seg_cap16/beat_aware` results as a manual record
-   - Rerun the remaining strategies, or run one on its own:
+   - Keep the existing `/tmp/ab_smoke_seg_cap16/*` results as a manual record
+   - Rerun the missing strategies, or run one on its own:
 
 ```bash
 cd /workspace/acr-engine
````
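Restart steps 2 and 3 can be sketched as a small check. The sketch either says "read `report.json`" or lists the strategies that still need a run. The directory layout is an assumption and does not appear in the diff: each strategy is taken to write `<root>/<strategy>/eval.json`, and the aggregate result to land in `<root>/report.json`. `STRATEGIES` and `missing_strategies` are hypothetical names. Step 1 (the `pgrep` check) still comes first, because a job that is still running may not have written its `eval.json` yet.

```python
# Hedged sketch of restart steps 2-3. ASSUMED layout (not shown in the diff):
#   <root>/<strategy>/eval.json  per-strategy evaluation output
#   <root>/report.json           aggregate A/B report
from pathlib import Path
from typing import List, Sequence

STRATEGIES = ("hybrid", "high_energy", "beat_aware", "repeated_section_aware")


def missing_strategies(root: Path, strategies: Sequence[str] = STRATEGIES) -> List[str]:
    """Return the strategies that still need a (re)run under `root`."""
    if (root / "report.json").exists():
        # Step 2: the final report exists, so read it and sync the docs instead.
        return []
    # Step 3: keep whatever per-strategy output exists; rerun only the gaps.
    return [s for s in strategies if not (root / s / "eval.json").exists()]


if __name__ == "__main__":
    todo = missing_strategies(Path("/tmp/ab_smoke_seg_cap16"))
    print("rerun:", ", ".join(todo) if todo else "none, read report.json")
```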
````diff
@@ -369,10 +369,10 @@ cd /workspace/acr-engine
 --seed 42
 ```
 
-4. Once the full results are in:
-   - Update [open-dataset-workflow.md](./open-dataset-workflow.md)
-   - Update [CHANGELOG.md](./CHANGELOG.md)
-   - commit + push
+4. The final recommendation from this cap16 round is now settled:
+   - Default first choice: `hybrid`
+   - Strong second choice: `high_energy`
+   - `beat_aware` / `repeated_section_aware` are better used as supplementary baselines than as the default strategy
 - `b766c74` Make open-dataset manifests trainable end to end
 - `fa23144` Add a single-page open dataset workflow for training prep
 - `af33be3` Condense docs and add manifest validation before training
````