Commit c659380d c659380d9cceff27e1fdde325b89191486ea1ad5 by cnb.bofCdSsphPA

Lock the final cap16 FMA benchmark ranking into the workflow docs

Persist the completed capped real-data benchmark results so future sessions can use the final strategy ordering and recommendation without replaying the run.

Constraint: Only documentation should change because benchmark artifacts live outside version control
Rejected: Leave the result only in /tmp report files | Would make the evidence fragile across sessions
Confidence: high
Scope-risk: narrow
Directive: Use cap16 as the current default evidence point until a larger capped benchmark supersedes it
Tested: Verified /tmp/ab_smoke_seg_cap16/report.json; verified repeated_section_aware eval.json; verified docs reflect final ranking hybrid/high_energy/beat_aware/repeated_section_aware
Not-tested: Larger real-dataset benchmark beyond the 16-track capped subset
1 parent 29c1962c
@@ -2,6 +2,27 @@
2 2
3 ## 2026-06-02 3 ## 2026-06-02
4 4
5 ### Stage: Wrap up the cap16 real-FMA capped segmentation benchmark
6
7 Completed:
8 - Read `/tmp/ab_smoke_seg_cap16/report.json`
9 - Confirmed the final evaluation result for `repeated_section_aware`
10 - Updated:
11 - [open-dataset-workflow.md](./open-dataset-workflow.md)
12 - [session-handoff.md](./session-handoff.md)
13 - [CHANGELOG.md](./CHANGELOG.md)
14
15 Final results (subset=16, `max_test_queries=12`):
16 - `hybrid`: `num_queries=12`, `top1=1.0`, `topk=1.0`
17 - `high_energy`: `num_queries=12`, `top1=1.0`, `topk=1.0`
18 - `beat_aware`: `num_queries=12`, `top1=0.9167`, `topk=1.0`
19 - `repeated_section_aware`: `num_queries=12`, `top1=0.8333`, `topk=1.0`
20
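21 Conclusions:
22 - Under a fixed query budget, `hybrid` remains the default first choice
23 - `high_energy` is the strongest runner-up and tied with `hybrid` on both `top1` and `topk` in this cap16 run
24 - Used on their own, `beat_aware` and `repeated_section_aware` trail the hybrid strategy on `top1` (11/12 and 10/12 hits respectively), although every strategy reaches `topk=1.0`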
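The sketch below shows how the ranking can be reproduced from the four metric pairs. It assumes strategies are ordered by `top1` and then `topk`, both descending, and that ties keep the listed preference order. That tie-breaking rule is an assumption for illustration. The results themselves only show that `hybrid` and `high_energy` are tied, and `rank` / `tied_with_leader` are hypothetical helper names.

```python
# Metrics copied from the cap16 results (subset=16, max_test_queries=12),
# listed in preference order. Python's sort is stable (also with reverse=True),
# so strategies with equal scores keep this order.
RESULTS = [
    ("hybrid", 1.0, 1.0),
    ("high_energy", 1.0, 1.0),
    ("beat_aware", 0.9167, 1.0),
    ("repeated_section_aware", 0.8333, 1.0),
]


def rank(results):
    """Sort by (top1, topk) descending; equal scores keep their input order."""
    return sorted(results, key=lambda r: (r[1], r[2]), reverse=True)


def tied_with_leader(ranked):
    """Return the names whose (top1, topk) pair equals the leader's."""
    best = ranked[0][1:]
    return [name for name, top1, topk in ranked if (top1, topk) == best]


if __name__ == "__main__":
    ranked = rank(RESULTS)
    for pos, (name, top1, topk) in enumerate(ranked, start=1):
        print(f"{pos}. {name}: top1={top1}, topk={topk}")
    print("tied with leader:", tied_with_leader(ranked))
```

Because the sort is stable, the first-place position for `hybrid` reflects the stated default preference, not a measured margin over `high_energy`.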
25
5 ### Stage: Deliver the handoff for resuming the current segmentation benchmark 26 ### Stage: Deliver the handoff for resuming the current segmentation benchmark
6 27
7 Completed: 28 Completed:
......
@@ -144,6 +144,23 @@ flowchart LR
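144 The point of this step: 144 The point of this step:
145 - The earlier A/B ranking mostly reflected "coverage" 145 - The earlier A/B ranking mostly reflected "coverage"
146 - With the cap in place, "recognition quality at equal query cost" can be compared more fairly 146 - With the cap in place, "recognition quality at equal query cost" can be compared more fairly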
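The sketch below makes "equal query cost" concrete by applying a per-run query budget. Each strategy's candidate queries are shuffled with a fixed seed and truncated to `max_test_queries`. The helper name `cap_queries` and the shuffle-then-truncate rule are hypothetical and not taken from the project's code. Only `max_test_queries=12` and `--seed 42` come from these docs.

```python
import random
from typing import List, Sequence


def cap_queries(queries: Sequence[str], max_test_queries: int, seed: int = 42) -> List[str]:
    """Hypothetical helper: keep at most `max_test_queries` queries, reproducibly.

    A seeded shuffle followed by truncation means strategies that propose
    different numbers of candidate segments are all scored on the same
    query budget.
    """
    if max_test_queries < 0:
        raise ValueError("max_test_queries must be non-negative")
    pool = list(queries)  # copy so the caller's sequence is not reordered
    random.Random(seed).shuffle(pool)
    return pool[:max_test_queries]


if __name__ == "__main__":
    # Two hypothetical strategies proposing 30 and 18 candidate segments.
    dense = [f"seg_{i}" for i in range(30)]
    sparse = [f"seg_{i}" for i in range(18)]
    print(len(cap_queries(dense, 12)), len(cap_queries(sparse, 12)))  # 12 12
```

Without such a cap, a strategy that simply emits more segments can look better on coverage alone. With it, every strategy spends the same number of queries.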
147
148 ### Latest real FMA capped results (subset=16, `max_test_queries=12`)
149
150 A fairer real-FMA A/B round has been completed:
151
152 | Rank | Strategy | num_queries | top1 | topk |
153 |---:|---|---:|---:|---:|
154 | 1 | `hybrid` | 12 | 1.0 | 1.0 |
155 | 2 | `high_energy` | 12 | 1.0 | 1.0 |
156 | 3 | `beat_aware` | 12 | 0.9167 | 1.0 |
157 | 4 | `repeated_section_aware` | 12 | 0.8333 | 1.0 |
158
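Every strategy ran exactly 12 queries, so the `top1` rates map back to whole hit counts. The sketch below shows that arithmetic. It assumes the four-decimal rates in the table are rounded `hits / num_queries` values, and `hits` is a hypothetical helper name.

```python
NUM_QUERIES = 12
TOP1 = {
    "hybrid": 1.0,
    "high_energy": 1.0,
    "beat_aware": 0.9167,
    "repeated_section_aware": 0.8333,
}


def hits(rate: float, n: int = NUM_QUERIES) -> int:
    """Recover the integer hit count behind a rate rounded to 4 decimals."""
    return round(rate * n)


if __name__ == "__main__":
    for name, rate in TOP1.items():
        h = hits(rate)
        # Sanity check: the recovered count reproduces the reported rate.
        assert round(h / NUM_QUERIES, 4) == rate
        print(f"{name}: {h}/{NUM_QUERIES} correct at top1")
```

With only 12 queries, one query is worth about 0.083 of `top1`. The gaps in this table therefore amount to one or two queries, which is why a larger capped run is still listed as untested.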
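159 Current recommendation:
160 - **`hybrid` remains the preferred default training / query strategy**
161 - `high_energy` is currently the strongest runner-up (tied with `hybrid` in this run) and suits data dominated by main sections / high-energy regions
162 - `beat_aware` suits styles with strong, regular beats, but was slightly weaker on this FMA subset
163 - `repeated_section_aware` on its own is less reliable than the hybrid strategy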
147 /usr/local/miniconda3/bin/python evaluate.py --data data/external_ingested/fma/manifests --model data/models_fma_smoke/best_model.pt --index-prefix data/index_fma_smoke/reference --split test --device cpu --fast-eval --output-json reports/fma-smoke/eval.json 164 /usr/local/miniconda3/bin/python evaluate.py --data data/external_ingested/fma/manifests --model data/models_fma_smoke/best_model.pt --index-prefix data/index_fma_smoke/reference --split test --device cpu --fast-eval --output-json reports/fma-smoke/eval.json
148 /usr/local/miniconda3/bin/python scripts/generate_artifacts.py --eval-json reports/fma-smoke/eval.json --config-json reports/fma-smoke/config.json --output-dir reports/fma-smoke --model-version fma-smoke --data-version fma_local 165 /usr/local/miniconda3/bin/python scripts/generate_artifacts.py --eval-json reports/fma-smoke/eval.json --config-json reports/fma-smoke/config.json --output-dir reports/fma-smoke --model-version fma-smoke --data-version fma_local
149 ``` 166 ```
......
@@ -331,14 +331,14 @@ cd /workspace/acr-engine
331 --output-json /tmp/ab_smoke_seg_cap16/report.json 331 --output-json /tmp/ab_smoke_seg_cap16/report.json
332 ``` 332 ```
333 333
334 At the time of this handoff, the following partial results were available: 334 At the time of this handoff, cap16 has finished; the final results are:
335 335
336 | Strategy | num_queries | top1 | topk | Status | 336 | Strategy | num_queries | top1 | topk | Status |
337 |---|---:|---:|---:|---| 337 |---|---:|---:|---:|---|
338 | `hybrid` | 12 | 1.0 | 1.0 | done | 338 | `hybrid` | 12 | 1.0 | 1.0 | done |
339 | `beat_aware` | 12 | 0.9167 | 1.0 | done |
340 | `high_energy` | 12 | 1.0 | 1.0 | done | 339 | `high_energy` | 12 | 1.0 | 1.0 | done |
341 | `repeated_section_aware` | - | - | - | not started / incomplete | 340 | `beat_aware` | 12 | 0.9167 | 1.0 | done |
341 | `repeated_section_aware` | 12 | 0.8333 | 1.0 | done |
342 342
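The sketch below regenerates this status table from `report.json` so it does not have to be copied by hand. The JSON layout is assumed to map each strategy name to a dict with `num_queries`, `top1` and `topk`, and `table_rows` is a hypothetical helper name. Only the report path comes from these notes.

```python
import json
from pathlib import Path
from typing import Dict, List, Optional

# Path taken from the handoff notes; the JSON layout below is an assumption.
REPORT = Path("/tmp/ab_smoke_seg_cap16/report.json")
FIELDS = ("num_queries", "top1", "topk")


def table_rows(report: Dict[str, Optional[dict]]) -> List[str]:
    """Render a strategy -> metrics mapping as the Markdown status table.

    Hypothetical schema: {"hybrid": {"num_queries": 12, "top1": 1.0, "topk": 1.0}, ...};
    a missing or partial entry is reported as not started / incomplete.
    """
    rows = [
        "| Strategy | num_queries | top1 | topk | Status |",
        "|---|---:|---:|---:|---|",
    ]
    for name, metrics in report.items():
        if isinstance(metrics, dict) and all(k in metrics for k in FIELDS):
            rows.append(
                f"| `{name}` | {metrics['num_queries']} | {metrics['top1']} "
                f"| {metrics['topk']} | done |"
            )
        else:
            rows.append(f"| `{name}` | - | - | - | not started / incomplete |")
    return rows


if __name__ == "__main__":
    if REPORT.exists():
        print("\n".join(table_rows(json.loads(REPORT.read_text(encoding="utf-8")))))
    else:
        print(f"{REPORT} not found; the run may still be in progress")
```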
343 ### First-priority actions after restart 343 ### First-priority actions after restart
344 344
@@ -348,10 +348,10 @@ cd /workspace/acr-engine
348 pgrep -af 'ab_smoke_seg_cap16|external_adapters.py smoke-local fma /tmp/ab_smoke_seg_cap16|evaluate.py --data /tmp/ab_smoke_seg_cap16|run_demo.py build-index --data /tmp/ab_smoke_seg_cap16' 348 pgrep -af 'ab_smoke_seg_cap16|external_adapters.py smoke-local fma /tmp/ab_smoke_seg_cap16|evaluate.py --data /tmp/ab_smoke_seg_cap16|run_demo.py build-index --data /tmp/ab_smoke_seg_cap16'
349 ``` 349 ```
350 350
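351 2. If it is still running, wait for `/tmp/ab_smoke_seg_cap16/report.json` 351 2. If `report.json` already exists, read it first and sync the docs
352 3. If interrupted: 352 3. If interrupted:
353 - Keep the existing `/tmp/ab_smoke_seg_cap16/hybrid` and `/tmp/ab_smoke_seg_cap16/beat_aware` results as a manual record 353 - Keep the existing `/tmp/ab_smoke_seg_cap16/*` results as a manual record
354 - Rerun the remaining strategies, or run one on its own: 354 - Rerun the missing strategies, or run one on its own: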
355 355
356 ```bash 356 ```bash
357 cd /workspace/acr-engine 357 cd /workspace/acr-engine
@@ -369,10 +369,10 @@ cd /workspace/acr-engine
369 --seed 42 369 --seed 42
370 ``` 370 ```
371 371
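372 4. Once the full results are in: 372 4. The final recommendation for this cap16 round is settled:
373 - Update [open-dataset-workflow.md](./open-dataset-workflow.md) 373 - Default first choice: `hybrid`
374 - Update [CHANGELOG.md](./CHANGELOG.md) 374 - Strong runner-up: `high_energy`
375 - commit + push 375 - `beat_aware` / `repeated_section_aware` are better used as supplementary comparisons than as the default strategy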
376 - `b766c74` Make open-dataset manifests trainable end to end 376 - `b766c74` Make open-dataset manifests trainable end to end
377 - `fa23144` Add a single-page open dataset workflow for training prep 377 - `fa23144` Add a single-page open dataset workflow for training prep
378 - `af33be3` Condense docs and add manifest validation before training 378 - `af33be3` Condense docs and add manifest validation before training
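The restart checklist can be folded into a small decision helper after the `pgrep` check. The sketch below assumes the same report layout as elsewhere: strategy name mapped to a metrics dict containing `top1`. `missing_strategies` and `next_action` are hypothetical names, and the helper does not replace inspecting running processes.

```python
import json
from pathlib import Path
from typing import List, Sequence

STRATEGIES = ("hybrid", "high_energy", "beat_aware", "repeated_section_aware")
REPORT = Path("/tmp/ab_smoke_seg_cap16/report.json")


def missing_strategies(report: dict, expected: Sequence[str] = STRATEGIES) -> List[str]:
    """Strategies without a completed top1 entry (hypothetical report schema)."""
    return [
        s for s in expected
        if not isinstance(report.get(s), dict) or "top1" not in report[s]
    ]


def next_action(report_path: Path = REPORT) -> str:
    """Map the restart checklist onto the current state of report.json."""
    if not report_path.exists():
        return "no report yet: check pgrep, then wait or rerun"
    missing = missing_strategies(json.loads(report_path.read_text(encoding="utf-8")))
    if missing:
        return "rerun missing: " + ", ".join(missing)
    return "complete: sync the docs"


if __name__ == "__main__":
    print(next_action())
```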
......