Lock the cap32 result and harden the hybrid default recommendation
Persist the larger 32-track benchmark showing hybrid strongly outperforming high_energy, so the default strategy decision rests on multiple larger real-data checkpoints instead of a single subset.

Constraint: Only documentation changes are allowed because benchmark artifacts stay outside version control
Rejected: Keep the default recommendation tentative after cap32 | The 24-track and 32-track capped benchmarks now agree on hybrid superiority
Confidence: high
Scope-risk: narrow
Directive: Use cap24 and cap32 together as the current strongest strategy evidence until a broader multi-style benchmark supersedes them
Tested: Verified /tmp/ab_smoke_seg_cap32_top2/report.json; verified high_energy eval.json; verified docs now record hybrid=20/0.95/1.0 and high_energy=20/0.5/1.0
Not-tested: Wider style-balanced benchmark beyond the FMA top-two subsets
Showing 3 changed files with 43 additions and 7 deletions
| ... | @@ -2,6 +2,27 @@ | ... | @@ -2,6 +2,27 @@ |
| 2 | 2 | ||
| 3 | ## 2026-06-02 | 3 | ## 2026-06-02 |
| 4 | 4 | ||
| 5 | ### Stage: Wrap up the cap32 top2 real-FMA comparison and settle the default strategy conclusion | ||
| 6 | |||
| 7 | Completed: | ||
| 8 | - Read `/tmp/ab_smoke_seg_cap32_top2/report.json` | ||
| 9 | - Read: | ||
| 10 | - `/tmp/ab_smoke_seg_cap32_top2/hybrid/fma_reports_smoke/eval.json` | ||
| 11 | - `/tmp/ab_smoke_seg_cap32_top2/high_energy/fma_reports_smoke/eval.json` | ||
| 12 | - Updated: | ||
| 13 | - [open-dataset-workflow.md](./open-dataset-workflow.md) | ||
| 14 | - [session-handoff.md](./session-handoff.md) | ||
| 15 | - [CHANGELOG.md](./CHANGELOG.md) | ||
| 16 | |||
| 17 | Final results (subset=32, `max_test_queries=20`): | ||
| 18 | - `hybrid`: `num_queries=20`, `top1=0.95`, `topk=1.0` | ||
| 19 | - `high_energy`: `num_queries=20`, `top1=0.5`, `topk=1.0` | ||
| 20 | |||
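| 21 | Conclusions: | ||
| 22 | - cap32 reinforces the cap24 conclusion: `hybrid` clearly outperforms `high_energy` | ||
| 23 | - The default training / query strategy can now be fixed to `hybrid` | ||
| 24 | - `high_energy` is better suited as a targeted comparison strategy than as the default | ||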
| 25 | |||
| 5 | ### Stage: Start the cap32 top2 real-FMA comparison and record the run stage | 26 | ### Stage: Start the cap32 top2 real-FMA comparison and record the run stage |
| 6 | 27 | ||
| 7 | Completed: | 28 | Completed: | ... | ... |
| ... | @@ -175,6 +175,20 @@ flowchart LR | ... | @@ -175,6 +175,20 @@ flowchart LR |
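| 175 | - `hybrid` does more than just "tie with `high_energy`" | 175 | - `hybrid` does more than just "tie with `high_energy`" |
| 176 | - On the larger real-data subset, `hybrid` is more stable | 176 | - On the larger real-data subset, `hybrid` is more stable |
| 177 | - The default recommendation should now converge explicitly on **`hybrid`** | 177 | - The default recommendation should now converge explicitly on **`hybrid`** |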
| 178 | |||
| 179 | ### Update: cap32 top2 comparison (subset=32, `max_test_queries=20`) | ||
| 180 | |||
| 181 | After expanding further to a 32-track real FMA subset, the conclusion holds and strengthens: | ||
| 182 | |||
| 183 | | Rank | Strategy | num_queries | top1 | topk | | ||
| 184 | |---:|---|---:|---:|---:| | ||
| 185 | | 1 | `hybrid` | 20 | 0.95 | 1.0 | | ||
| 186 | | 2 | `high_energy` | 20 | 0.5 | 1.0 | | ||
| 187 | |||
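| 188 | This shows that: | ||
| 189 | - `hybrid` still leads clearly on the larger real-data subset | ||
| 190 | - `high_energy` can serve as a strategy biased toward high-energy regions, but it is not stable enough to be the default | ||
| 191 | - The default strategy can now be hard-coded as **`hybrid`** | ||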
| 178 | /usr/local/miniconda3/bin/python evaluate.py --data data/external_ingested/fma/manifests --model data/models_fma_smoke/best_model.pt --index-prefix data/index_fma_smoke/reference --split test --device cpu --fast-eval --output-json reports/fma-smoke/eval.json | 192 | /usr/local/miniconda3/bin/python evaluate.py --data data/external_ingested/fma/manifests --model data/models_fma_smoke/best_model.pt --index-prefix data/index_fma_smoke/reference --split test --device cpu --fast-eval --output-json reports/fma-smoke/eval.json |
| 179 | /usr/local/miniconda3/bin/python scripts/generate_artifacts.py --eval-json reports/fma-smoke/eval.json --config-json reports/fma-smoke/config.json --output-dir reports/fma-smoke --model-version fma-smoke --data-version fma_local | 193 | /usr/local/miniconda3/bin/python scripts/generate_artifacts.py --eval-json reports/fma-smoke/eval.json --config-json reports/fma-smoke/config.json --output-dir reports/fma-smoke --model-version fma-smoke --data-version fma_local |
| 180 | ``` | 194 | ``` | ... | ... |
| ... | @@ -416,7 +416,7 @@ cap24 top2 final conclusion: | ... | @@ -416,7 +416,7 @@ cap24 top2 final conclusion: |
| 416 | 416 | ||
| 417 | --- | 417 | --- |
| 418 | 418 | ||
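| 419 | ## 11. cap32 top2 comparison experiment (in progress) | 419 | ## 11. cap32 top2 comparison experiment (completed) |
| 420 | 420 | ||
| 421 | To confirm that the cap24 conclusion was not a fluke, a larger real-FMA top2 comparison was launched: | 421 | To confirm that the cap24 conclusion was not a fluke, a larger real-FMA top2 comparison was launched: |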
| 422 | 422 | ||
| ... | @@ -436,15 +436,15 @@ cd /workspace/acr-engine | ... | @@ -436,15 +436,15 @@ cd /workspace/acr-engine |
| 436 | --output-json /tmp/ab_smoke_seg_cap32_top2/report.json | 436 | --output-json /tmp/ab_smoke_seg_cap32_top2/report.json |
| 437 | ``` | 437 | ``` |
| 438 | 438 | ||
| 439 | Fresh evidence confirmed so far: | 439 | Final results: |
| 440 | 440 | ||
| 441 | | Item | Status | | 441 | | Item | Status | |
| 442 | |---|---| | 442 | |---|---| |
| 443 | | `subset_size` | `32` | | 443 | | `subset_size` | `32` | |
| 444 | | `max_test_queries` | `20` | | 444 | | `max_test_queries` | `20` | |
| 445 | | `hybrid` | `num_queries=20`, `top1=0.95`, `topk=1.0` | | 445 | | `hybrid` | `num_queries=20`, `top1=0.95`, `topk=1.0` | |
| 446 | | `high_energy` | training in progress | | 446 | | `high_energy` | `num_queries=20`, `top1=0.5`, `topk=1.0` | |
| 447 | | `report.json` | not yet generated | | 447 | | `report.json` | generated | |
| 448 | 448 | ||
| 449 | Recovery check command: | 449 | Recovery check command: |
| 450 | 450 | ||
| ... | @@ -452,9 +452,10 @@ cd /workspace/acr-engine | ... | @@ -452,9 +452,10 @@ cd /workspace/acr-engine |
| 452 | pgrep -af 'ab_smoke_seg_cap32_top2|external_adapters.py smoke-local fma /tmp/ab_smoke_seg_cap32_top2|evaluate.py --data /tmp/ab_smoke_seg_cap32_top2|run_demo.py build-index --data /tmp/ab_smoke_seg_cap32_top2|train.py --data /tmp/ab_smoke_seg_cap32_top2' | 452 | pgrep -af 'ab_smoke_seg_cap32_top2|external_adapters.py smoke-local fma /tmp/ab_smoke_seg_cap32_top2|evaluate.py --data /tmp/ab_smoke_seg_cap32_top2|run_demo.py build-index --data /tmp/ab_smoke_seg_cap32_top2|train.py --data /tmp/ab_smoke_seg_cap32_top2' |
| 453 | ``` | 453 | ``` |
| 454 | 454 | ||
| 455 | Files to wait for first: | 455 | cap32 top2 final conclusion: |
| 456 | - `/tmp/ab_smoke_seg_cap32_top2/high_energy/fma_reports_smoke/eval.json` | 456 | - `hybrid`: `20 / 0.95 / 1.0` |
| 457 | - `/tmp/ab_smoke_seg_cap32_top2/report.json` | 457 | - `high_energy`: `20 / 0.5 / 1.0` |
| 458 | - Both larger real-data rounds, cap24 and cap32, point to the same conclusion: **the default strategy is fixed to `hybrid`** | ||
| 458 | - `b766c74` Make open-dataset manifests trainable end to end | 459 | - `b766c74` Make open-dataset manifests trainable end to end |
| 459 | - `fa23144` Add a single-page open dataset workflow for training prep | 460 | - `fa23144` Add a single-page open dataset workflow for training prep |
| 460 | - `af33be3` Condense docs and add manifest validation before training | 461 | - `af33be3` Condense docs and add manifest validation before training | ... | ... |
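
The cap32 figures recorded in this diff can be sanity-checked with a few lines of Python. The sketch below is illustrative only: `CAP32`, `rank_strategies`, and `top1_hits` are hypothetical names that are not part of the repository, and the metrics are copied by hand from the tables above rather than parsed from `report.json`, whose schema is not shown here.

```python
# Metrics copied from the cap32 top2 tables in this commit
# (subset=32, max_test_queries=20). Hand-entered, not read from disk.
CAP32 = {
    "hybrid": {"num_queries": 20, "top1": 0.95, "topk": 1.0},
    "high_energy": {"num_queries": 20, "top1": 0.5, "topk": 1.0},
}


def rank_strategies(metrics_by_name):
    """Return (name, metrics) pairs sorted best first by top1, then topk."""
    return sorted(
        metrics_by_name.items(),
        key=lambda item: (item[1]["top1"], item[1]["topk"]),
        reverse=True,
    )


def top1_hits(metrics):
    """Convert a top1 rate back into a count of correctly ranked queries."""
    return round(metrics["top1"] * metrics["num_queries"])


if __name__ == "__main__":
    for rank, (name, metrics) in enumerate(rank_strategies(CAP32), start=1):
        hits = top1_hits(metrics)
        print(f"{rank}. {name}: {hits}/{metrics['num_queries']} top-1 hits")
```

With 20 queries per strategy, a top1 of 0.95 corresponds to 19 correct first-rank matches and a top1 of 0.5 to 10, which is the gap the default recommendation rests on.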