Commit f05e7023 f05e70234cff85e3d77a9d273897f0e80d55cfa0 by cnb.bofCdSsphPA

Lock the cap32 result and harden the hybrid default recommendation

Persist the larger 32-track benchmark showing hybrid strongly outperforming high_energy, so the default strategy decision rests on multiple larger real-data checkpoints instead of a single subset.

Constraint: Only documentation changes are allowed because benchmark artifacts stay outside version control
Rejected: Keep the default recommendation tentative after cap32 | The 24-track and 32-track capped benchmarks now agree on hybrid superiority
Confidence: high
Scope-risk: narrow
Directive: Use cap24 and cap32 together as the current strongest strategy evidence until a broader multi-style benchmark supersedes them
Tested: Verified /tmp/ab_smoke_seg_cap32_top2/report.json; verified high_energy eval.json; verified docs now record hybrid=20/0.95/1.0 and high_energy=20/0.5/1.0
Not-tested: Wider style-balanced benchmark beyond the FMA top-two subsets
1 parent f228197d
...@@ -2,6 +2,27 @@ ...@@ -2,6 +2,27 @@
2 2
3 ## 2026-06-02 3 ## 2026-06-02
4 4
5 ### Stage: Wrap up the cap32 top2 real-FMA comparison and settle the default strategy conclusion
6
7 Completed:
8 - Read `/tmp/ab_smoke_seg_cap32_top2/report.json`
9 - Read:
10 - `/tmp/ab_smoke_seg_cap32_top2/hybrid/fma_reports_smoke/eval.json`
11 - `/tmp/ab_smoke_seg_cap32_top2/high_energy/fma_reports_smoke/eval.json`
12 - Updated:
13 - [open-dataset-workflow.md](./open-dataset-workflow.md)
14 - [session-handoff.md](./session-handoff.md)
15 - [CHANGELOG.md](./CHANGELOG.md)
16
17 Final results (subset=32, `max_test_queries=20`):
18 - `hybrid`: `num_queries=20`, `top1=0.95`, `topk=1.0`
19 - `high_energy`: `num_queries=20`, `top1=0.5`, `topk=1.0`
20
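21 Conclusions:
22 - cap32 reinforces the cap24 conclusion: `hybrid` clearly outperforms `high_energy`
23 - The current default training / query strategy can be fixed to `hybrid`
24 - `high_energy` is better suited as a targeted comparison strategy than as the default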
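
The ranking above can be illustrated with a minimal sketch. It assumes each `eval.json` is a flat JSON object with `num_queries`, `top1`, and `topk` keys; the actual schema is not shown in this commit, and `load_metrics` and `rank_strategies` are hypothetical helper names. The example values are the ones recorded in this log.

```python
import json
from pathlib import Path

METRIC_KEYS = ("num_queries", "top1", "topk")

def load_metrics(path):
    """Read one eval.json (assumed flat schema) and keep the three metrics."""
    with Path(path).open(encoding="utf-8") as fh:
        data = json.load(fh)
    return {key: data[key] for key in METRIC_KEYS}

def rank_strategies(metrics_by_strategy):
    """Order strategies best-first by top1, breaking ties on topk."""
    return sorted(
        metrics_by_strategy.items(),
        key=lambda item: (item[1]["top1"], item[1]["topk"]),
        reverse=True,
    )

# Values copied from the cap32 results recorded above.
recorded = {
    "high_energy": {"num_queries": 20, "top1": 0.5, "topk": 1.0},
    "hybrid": {"num_queries": 20, "top1": 0.95, "topk": 1.0},
}

for rank, (name, m) in enumerate(rank_strategies(recorded), start=1):
    print(rank, name, m["num_queries"], m["top1"], m["topk"])
```

To run it against real artifacts, `load_metrics` could be pointed at the two `eval.json` paths listed under "Read:", provided the files match the assumed schema.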
25
5 ### Stage: Launch the cap32 top2 real-FMA comparison and record the run stage 26 ### Stage: Launch the cap32 top2 real-FMA comparison and record the run stage
6 27
7 Completed: 28 Completed:
......
...@@ -175,6 +175,20 @@ flowchart LR ...@@ -175,6 +175,20 @@ flowchart LR
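175 - `hybrid` does not merely "tie with `high_energy`" 175 - `hybrid` does not merely "tie with `high_energy`"
176 - On the larger real subset, `hybrid` is more stable 176 - On the larger real subset, `hybrid` is more stable
177 - The current default recommendation should converge explicitly on **`hybrid`** 177 - The current default recommendation should converge explicitly on **`hybrid`**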
178
179 ### Update: cap32 top2 comparison (subset=32, `max_test_queries=20`)
180
181 After expanding further to a 32-track real FMA subset, the conclusion is reinforced:
182
183 | Rank | Strategy | num_queries | top1 | topk |
184 |---:|---|---:|---:|---:|
185 | 1 | `hybrid` | 20 | 0.95 | 1.0 |
186 | 2 | `high_energy` | 20 | 0.5 | 1.0 |
187
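188 This shows that:
189 - `hybrid` still leads clearly on the larger real subset
190 - `high_energy` can serve as a strategy biased toward high-energy regions, but it is not stable enough to be the default
191 - The default strategy can now be hard-coded as **`hybrid`**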
178 /usr/local/miniconda3/bin/python evaluate.py --data data/external_ingested/fma/manifests --model data/models_fma_smoke/best_model.pt --index-prefix data/index_fma_smoke/reference --split test --device cpu --fast-eval --output-json reports/fma-smoke/eval.json 192 /usr/local/miniconda3/bin/python evaluate.py --data data/external_ingested/fma/manifests --model data/models_fma_smoke/best_model.pt --index-prefix data/index_fma_smoke/reference --split test --device cpu --fast-eval --output-json reports/fma-smoke/eval.json
179 /usr/local/miniconda3/bin/python scripts/generate_artifacts.py --eval-json reports/fma-smoke/eval.json --config-json reports/fma-smoke/config.json --output-dir reports/fma-smoke --model-version fma-smoke --data-version fma_local 193 /usr/local/miniconda3/bin/python scripts/generate_artifacts.py --eval-json reports/fma-smoke/eval.json --config-json reports/fma-smoke/config.json --output-dir reports/fma-smoke --model-version fma-smoke --data-version fma_local
180 ``` 194 ```
......
...@@ -416,7 +416,7 @@ cap24 top2 final conclusion: ...@@ -416,7 +416,7 @@ cap24 top2 final conclusion:
416 416
417 --- 417 ---
418 418
419 ## 11. cap32 top2 comparison experiment (in progress) 419 ## 11. cap32 top2 comparison experiment (completed)
420 420
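421 To confirm that the cap24 conclusion was not a fluke, a larger real-FMA top2 comparison has been launched: 421 To confirm that the cap24 conclusion was not a fluke, a larger real-FMA top2 comparison has been launched: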
422 422
...@@ -436,15 +436,15 @@ cd /workspace/acr-engine ...@@ -436,15 +436,15 @@ cd /workspace/acr-engine
436 --output-json /tmp/ab_smoke_seg_cap32_top2/report.json 436 --output-json /tmp/ab_smoke_seg_cap32_top2/report.json
437 ``` 437 ```
438 438
439 Fresh evidence confirmed so far 439 Final results
440 440
441 | Item | Status | 441 | Item | Status |
442 |---|---| 442 |---|---|
443 | `subset_size` | `32` | 443 | `subset_size` | `32` |
444 | `max_test_queries` | `20` | 444 | `max_test_queries` | `20` |
445 | `hybrid` | `num_queries=20`, `top1=0.95`, `topk=1.0` | 445 | `hybrid` | `num_queries=20`, `top1=0.95`, `topk=1.0` |
446 | `high_energy` | training in progress | 446 | `high_energy` | `num_queries=20`, `top1=0.5`, `topk=1.0` |
447 | `report.json` | not yet generated | 447 | `report.json` | generated |
448 448
449 Resume-check command: 449 Resume-check command:
450 450
...@@ -452,9 +452,10 @@ cd /workspace/acr-engine ...@@ -452,9 +452,10 @@ cd /workspace/acr-engine
452 pgrep -af 'ab_smoke_seg_cap32_top2|external_adapters.py smoke-local fma /tmp/ab_smoke_seg_cap32_top2|evaluate.py --data /tmp/ab_smoke_seg_cap32_top2|run_demo.py build-index --data /tmp/ab_smoke_seg_cap32_top2|train.py --data /tmp/ab_smoke_seg_cap32_top2' 452 pgrep -af 'ab_smoke_seg_cap32_top2|external_adapters.py smoke-local fma /tmp/ab_smoke_seg_cap32_top2|evaluate.py --data /tmp/ab_smoke_seg_cap32_top2|run_demo.py build-index --data /tmp/ab_smoke_seg_cap32_top2|train.py --data /tmp/ab_smoke_seg_cap32_top2'
453 ``` 453 ```
454 454
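
Alongside the process check, a small sketch like the following could report which cap32 artifacts are still missing. The `missing_artifacts` helper is hypothetical and not part of the repository; the paths are the ones named in this log.

```python
from pathlib import Path

# Artifact paths named in this log, relative to the run directory.
EXPECTED = (
    "report.json",
    "hybrid/fma_reports_smoke/eval.json",
    "high_energy/fma_reports_smoke/eval.json",
)

def missing_artifacts(root, expected=EXPECTED):
    """Return the expected files that do not yet exist under root (hypothetical helper)."""
    root = Path(root)
    return [rel for rel in expected if not (root / rel).is_file()]

# An empty list means every expected artifact is present.
print(missing_artifacts("/tmp/ab_smoke_seg_cap32_top2"))
```

An existence check only shows that a file was written, not that the run finished cleanly, so it complements the `pgrep` command rather than replacing it.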
455 Files to wait for first: 455 cap32 top2 final conclusion:
456 - `/tmp/ab_smoke_seg_cap32_top2/high_energy/fma_reports_smoke/eval.json` 456 - `hybrid`: `20 / 0.95 / 1.0`
457 - `/tmp/ab_smoke_seg_cap32_top2/report.json` 457 - `high_energy`: `20 / 0.5 / 1.0`
458 - Both larger real-subset rounds, cap24 and cap32, point to the same conclusion: **the default strategy is fixed to `hybrid`**
458 - `b766c74` Make open-dataset manifests trainable end to end 459 - `b766c74` Make open-dataset manifests trainable end to end
459 - `fa23144` Add a single-page open dataset workflow for training prep 460 - `fa23144` Add a single-page open dataset workflow for training prep
460 - `af33be3` Condense docs and add manifest validation before training 461 - `af33be3` Condense docs and add manifest validation before training
......