Revise the default-strategy story after the cap48 reversal
Persist the larger 48-track benchmark where high_energy overtook hybrid, and downgrade the previously overconfident default-strategy claim to a conditional recommendation pending broader validation.

Constraint: Only documentation changes are allowed because benchmark outputs remain outside version control
Rejected: Keep asserting hybrid as fully settled default after cap48 | The 48-track capped benchmark materially contradicts that stronger claim
Confidence: high
Scope-risk: narrow
Directive: Resolve the hybrid vs high_energy default question with larger, multi-seed, style-aware benchmarks before making a final hard default claim
Tested: Verified /tmp/ab_smoke_seg_cap48_top2/report.json; verified high_energy eval.json; verified docs now record high_energy=24/0.9167/1.0 and hybrid=24/0.7917/1.0
Not-tested: Multi-seed or style-balanced follow-up benchmark beyond the single cap48 run
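To illustrate why a single 24-query run supports only a conditional recommendation, here is a minimal sketch that puts a Wilson score interval around each reported top1 value. The hit counts (22 and 19) are an assumption inferred from the reported figures (0.9167 × 24 ≈ 22, 0.7917 × 24 ≈ 19); the helper name `wilson_interval` is hypothetical and not part of the repository.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z=1.96 is roughly 95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half


# Assumed hit counts, inferred from the reported top1 values at num_queries=24.
N_QUERIES = 24
assumed_hits = {"high_energy": 22, "hybrid": 19}

intervals = {}
for name, hits in assumed_hits.items():
    low, high = wilson_interval(hits, N_QUERIES)
    intervals[name] = (low, high)
    print(f"{name}: top1={hits / N_QUERIES:.4f}, 95% CI=({low:.3f}, {high:.3f})")

# The gap between the two strategies is only 3 queries out of 24.
print("difference in hits:", assumed_hits["high_energy"] - assumed_hits["hybrid"])
```

Under these assumptions the two intervals overlap, which is consistent with treating the cap48 reversal as a reason to run larger, multi-seed benchmarks rather than as grounds for switching the default outright.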
Showing 3 changed files with 53 additions and 7 deletions
| ... | @@ -2,6 +2,30 @@ | ... | @@ -2,6 +2,30 @@ |
| 2 | 2 | ||
| 3 | ## 2026-06-02 | 3 | ## 2026-06-02 |
| 4 | 4 | ||
| 5 | ### Stage: Wrap up the cap48 top2 real-FMA comparison and find that high_energy overtook hybrid | ||
| 6 | |||
| 7 | Completed: | ||
| 8 | - Read `/tmp/ab_smoke_seg_cap48_top2/report.json` | ||
| 9 | - Read: | ||
| 10 | - `/tmp/ab_smoke_seg_cap48_top2/hybrid/fma_reports_smoke/eval.json` | ||
| 11 | - `/tmp/ab_smoke_seg_cap48_top2/high_energy/fma_reports_smoke/eval.json` | ||
| 12 | - Updated: | ||
| 13 | - [open-dataset-workflow.md](./open-dataset-workflow.md) | ||
| 14 | - [session-handoff.md](./session-handoff.md) | ||
| 15 | - [CHANGELOG.md](./CHANGELOG.md) | ||
| 16 | |||
| 17 | Final results (subset=48, `max_test_queries=24`): | ||
| 18 | - `high_energy`: `num_queries=24`, `top1=0.9167`, `topk=1.0` | ||
| 19 | - `hybrid`: `num_queries=24`, `top1=0.7917`, `topk=1.0` | ||
| 20 | |||
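| 21 | Conclusions: | ||
| 22 | - cap48 points in a different direction from cap24 / cap32 | ||
| 23 | - This means the claim that "the default strategy is fully fixed as hybrid" must be downgraded to a **provisional conclusion** | ||
| 24 | - The safer wording for now is: | ||
| 25 | - `hybrid` can remain the conservative default | ||
| 26 | - `high_energy` has become a strong contender that must be taken seriously | ||
| 27 | - Next, a larger-sample / multi-seed / style-aware benchmark is needed before the final default is settled | ||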
| 28 | |||
| 5 | ### Stage: Launch the cap48 top2 real-FMA comparison and record the run phases | 29 | ### Stage: Launch the cap48 top2 real-FMA comparison and record the run phases |
| 6 | 30 | ||
| 7 | Completed: | 31 | Completed: | ... | ... |
| ... | @@ -189,6 +189,23 @@ flowchart LR | ... | @@ -189,6 +189,23 @@ flowchart LR |
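| 189 | - `hybrid` still clearly leads on the larger real subset | 189 | - `hybrid` still clearly leads on the larger real subset |
| 190 | - `high_energy` can serve as a high-energy-region bias strategy, but it is not stable enough to be the default | 190 | - `high_energy` can serve as a high-energy-region bias strategy, but it is not stable enough to be the default |
| 191 | - The default strategy can now be firmly hard-coded as **`hybrid`** | 191 | - The default strategy can now be firmly hard-coded as **`hybrid`** |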
| 192 | |||
| 193 | ### Update: cap48 top2 comparison (subset=48, `max_test_queries=24`) | ||
| 194 | |||
| 195 | After expanding to a 48-track real FMA subset, the **result reversed**: | ||
| 196 | |||
| 197 | | Rank | Strategy | num_queries | top1 | topk | | ||
| 198 | |---:|---|---:|---:|---:| | ||
| 199 | | 1 | `high_energy` | 24 | 0.9167 | 1.0 | | ||
| 200 | | 2 | `hybrid` | 24 | 0.7917 | 1.0 | | ||
| 201 | |||
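| 202 | This round shows: | ||
| 203 | - The earlier cap24 / cap32 runs support `hybrid` | ||
| 204 | - But on cap48, `high_energy` overtakes it | ||
| 205 | - So the current conclusion should change from "the default strategy is fully fixed" to: | ||
| 206 | - **`hybrid` remains the current conservative default** | ||
| 207 | - **`high_energy` has become a strong contender** | ||
| 208 | - The next step must be re-validation with a larger sample or multiple random seeds; a single cap48 run is not enough to switch the default outright | ||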
| 192 | /usr/local/miniconda3/bin/python evaluate.py --data data/external_ingested/fma/manifests --model data/models_fma_smoke/best_model.pt --index-prefix data/index_fma_smoke/reference --split test --device cpu --fast-eval --output-json reports/fma-smoke/eval.json | 209 | /usr/local/miniconda3/bin/python evaluate.py --data data/external_ingested/fma/manifests --model data/models_fma_smoke/best_model.pt --index-prefix data/index_fma_smoke/reference --split test --device cpu --fast-eval --output-json reports/fma-smoke/eval.json |
| 193 | /usr/local/miniconda3/bin/python scripts/generate_artifacts.py --eval-json reports/fma-smoke/eval.json --config-json reports/fma-smoke/config.json --output-dir reports/fma-smoke --model-version fma-smoke --data-version fma_local | 210 | /usr/local/miniconda3/bin/python scripts/generate_artifacts.py --eval-json reports/fma-smoke/eval.json --config-json reports/fma-smoke/config.json --output-dir reports/fma-smoke --model-version fma-smoke --data-version fma_local |
| 194 | ``` | 211 | ``` | ... | ... |
| ... | @@ -459,7 +459,7 @@ cap32 top2 final conclusion: | ... | @@ -459,7 +459,7 @@ cap32 top2 final conclusion: |
| 459 | 459 | ||
| 460 | --- | 460 | --- |
| 461 | 461 | ||
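| 462 | ## 12. cap48 top2 comparison experiment (in progress) | 462 | ## 12. cap48 top2 comparison experiment (completed) |
| 463 | 463 | ||
| 464 | To further extend the real-data evidence chain, a larger FMA top2 comparison has been launched: | 464 | To further extend the real-data evidence chain, a larger FMA top2 comparison has been launched: |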
| 465 | 465 | ||
| ... | @@ -479,15 +479,15 @@ cd /workspace/acr-engine | ... | @@ -479,15 +479,15 @@ cd /workspace/acr-engine |
| 479 | --output-json /tmp/ab_smoke_seg_cap48_top2/report.json | 479 | --output-json /tmp/ab_smoke_seg_cap48_top2/report.json |
| 480 | ``` | 480 | ``` |
| 481 | 481 | ||
| 482 | Current fresh evidence: | 482 | Final results: |
| 483 | 483 | ||
| 484 | | Item | Status | | 484 | | Item | Status | |
| 485 | |---|---| | 485 | |---|---| |
| 486 | | `subset_size` | `48` | | 486 | | `subset_size` | `48` | |
| 487 | | `max_test_queries` | `24` | | 487 | | `max_test_queries` | `24` | |
| 488 | | `high_energy` | `num_queries=24`, `top1=0.9167`, `topk=1.0` | | ||
| 488 | | `hybrid` | `num_queries=24`, `top1=0.7917`, `topk=1.0` | | 489 | | `hybrid` | `num_queries=24`, `top1=0.7917`, `topk=1.0` | |
| 489 | | `high_energy` | `evaluate.py --max-queries 24` | | 490 | | `report.json` | generated | |
| 490 | | `report.json` | not yet generated | | ||
| 491 | 491 | ||
| 492 | Recovery check command: | 492 | Recovery check command: |
| 493 | 493 | ||
| ... | @@ -495,9 +495,14 @@ cd /workspace/acr-engine | ... | @@ -495,9 +495,14 @@ cd /workspace/acr-engine |
| 495 | pgrep -af 'ab_smoke_seg_cap48_top2|external_adapters.py smoke-local fma /tmp/ab_smoke_seg_cap48_top2|evaluate.py --data /tmp/ab_smoke_seg_cap48_top2|run_demo.py build-index --data /tmp/ab_smoke_seg_cap48_top2|train.py --data /tmp/ab_smoke_seg_cap48_top2' | 495 | pgrep -af 'ab_smoke_seg_cap48_top2|external_adapters.py smoke-local fma /tmp/ab_smoke_seg_cap48_top2|evaluate.py --data /tmp/ab_smoke_seg_cap48_top2|run_demo.py build-index --data /tmp/ab_smoke_seg_cap48_top2|train.py --data /tmp/ab_smoke_seg_cap48_top2' |
| 496 | ``` | 496 | ``` |
| 497 | 497 | ||
| 498 | Files to wait for first: | 498 | cap48 top2 final conclusion: |
| 499 | - `/tmp/ab_smoke_seg_cap48_top2/high_energy/fma_reports_smoke/eval.json` | 499 | - `high_energy`:`24 / 0.9167 / 1.0` |
| 500 | - `/tmp/ab_smoke_seg_cap48_top2/report.json` | 500 | - `hybrid`:`24 / 0.7917 / 1.0` |
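| 501 | - This round disagrees with cap24 / cap32, so the default-strategy conclusion **cannot yet be treated as fully settled** | ||
| 502 | - Next-step priorities: | ||
| 503 | 1. A larger subset (e.g. 64+) | ||
| 504 | 2. Multi-seed reruns | ||
| 505 | 3. A style-aware bucket benchmark | ||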
| 501 | - `b766c74` Make open-dataset manifests trainable end to end | 506 | - `b766c74` Make open-dataset manifests trainable end to end |
| 502 | - `fa23144` Add a single-page open dataset workflow for training prep | 507 | - `fa23144` Add a single-page open dataset workflow for training prep |
| 503 | - `af33be3` Condense docs and add manifest validation before training | 508 | - `af33be3` Condense docs and add manifest validation before training | ... | ... |