Commit e49dc0b9 e49dc0b9de5fc6a4f0de81c2f4f0c386eee1469d by cnb.bofCdSsphPA

Record the cap64 reversal once the larger benchmark finished

Constraint: Strategy guidance must now reflect that cap48 and cap64 produce different winners under verified runs
Rejected: Keep high_energy as the generic default | The completed cap64 run shows hybrid winning clearly at a larger subset size, so the docs must acknowledge scale sensitivity
Confidence: high
Scope-risk: moderate
Directive: Do not present a single global default strategy again until bucketed and style-aware benchmarks explain the cap48/cap64 divergence
Tested: Verified cap64 report.json, progress.json, high_energy eval.json, and hybrid eval.json; confirmed cap64 winner=hybrid with top1 0.875 vs high_energy 0.625
Not-tested: Multi-seed cap64 aggregates, bucket/style-aware benchmarks, and any revised hybrid training design
1 parent 8f2e6016
## 2026-06-02 checkpoint: cap64 complete
Completed:
- The cap64 real-FMA comparison is complete.
- Final evaluation results and the winner are in for both `high_energy` and `hybrid`.
Final results (cap64, seed=42):
- `hybrid`: `num_queries=32, top1=0.8750, topk=1.0`
- `high_energy`: `num_queries=32, top1=0.6250, topk=1.0`
- winner: `hybrid`
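The winner can be reproduced with simple arithmetic. With `num_queries=32`, `top1=0.875` means 28 correct top-1 hits and `top1=0.625` means 20, a gap of 8 queries. The sketch below assumes each eval result is a dict carrying the `num_queries`, `top1`, and `topk` fields listed above, and that the winner is chosen by `top1` with `topk` as a tie-breaker. `pick_winner` and `top1_hits` are hypothetical helpers, not the benchmark's actual selection logic.

```python
from typing import Dict


def pick_winner(results: Dict[str, Dict[str, float]]) -> str:
    # Hypothetical rule: highest top1 wins, topk breaks ties.
    # The real benchmark script may use a different rule.
    return max(results, key=lambda name: (results[name]["top1"], results[name]["topk"]))


def top1_hits(result: Dict[str, float]) -> int:
    # top1 is a fraction of queries, so hits = top1 * num_queries
    # (rounded to absorb floating-point error).
    return round(result["top1"] * result["num_queries"])


# Values copied from the cap64 (seed=42) results above.
cap64 = {
    "hybrid": {"num_queries": 32, "top1": 0.875, "topk": 1.0},
    "high_energy": {"num_queries": 32, "top1": 0.625, "topk": 1.0},
}

winner = pick_winner(cap64)
hits = {name: top1_hits(r) for name, r in cap64.items()}
print(winner, hits)  # hybrid {'hybrid': 28, 'high_energy': 20}
```

Because both strategies have `topk=1.0`, the tie-breaker does not matter here: `top1` alone separates them.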
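Conclusions:
- cap64 and cap48 point to different winners:
  - at cap48, across three seeds, `high_energy` is more stable and in the lead
  - at cap64, on the current single seed, `hybrid` leads clearly
- This means the choice of a default strategy now depends on subset size and data composition.
- Required next steps:
  - bucket/style-aware benchmark
  - more systematic hard-case / genre-bucket evaluation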
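A bucket/style-aware benchmark reports top-1 per bucket instead of one global number. That way a strategy that wins overall but fails on one genre or hard-case bucket becomes visible. The sketch below assumes per-query records that carry a bucket label and a top-1 hit flag. The record layout and bucket names are hypothetical, and the sample records are toy inputs, not benchmark results.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical per-query record: (bucket label, whether top-1 was correct).
Record = Tuple[str, bool]


def top1_by_bucket(records: Iterable[Record]) -> Dict[str, Tuple[int, float]]:
    """Return {bucket: (num_queries, top1)} for every bucket seen."""
    totals: Dict[str, int] = defaultdict(int)
    hits: Dict[str, int] = defaultdict(int)
    for bucket, correct in records:
        totals[bucket] += 1
        hits[bucket] += int(correct)
    return {b: (totals[b], hits[b] / totals[b]) for b in totals}


# Toy input only, to show the output shape.
toy = [("rock", True), ("rock", True), ("folk", False), ("folk", True)]
print(top1_by_bucket(toy))  # {'rock': (2, 1.0), 'folk': (2, 0.5)}
```

Reporting `num_queries` next to each bucket's `top1` matters. Small buckets give noisy rates, which is the same single-seed caveat that applies to the cap64 result.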
## 2026-06-02 checkpoint: cap64 hybrid index complete, evaluation started
Completed:
......
......
@@ -78,3 +78,5 @@ cd /workspace/acr-engine
- 已补充 cap64 新鲜证据:从运行会话确认 `hybrid``Epoch 1/1` 已完整跑完。
- 已补充 cap64 新鲜证据:`hybrid` reference index 完成(`64 refs / 657 windows / 192-d`)并进入 `evaluate.py`
- 已补齐 cap64 最终结果:`hybrid=0.875``high_energy=0.625`,winner=`hybrid`
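The index summary `64 refs / 657 windows / 192-d` reads as 657 window embeddings of dimension 192 spread over 64 reference tracks, or about 10.3 windows per track. Stored as float32 (an assumption), that comes to 657 × 192 × 4 = 504,576 bytes. The sketch below shows one common way to turn a window-level index into per-track rankings for top-1/top-k scoring: score each reference by its best-matching window under cosine similarity. Max-over-windows aggregation, cosine similarity, and all names here are assumptions, not necessarily what `evaluate.py` does.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_refs(query: Vector, windows: List[Tuple[str, Vector]]) -> List[str]:
    """Rank reference ids by their best (max) window similarity to the query."""
    best: Dict[str, float] = {}
    for ref_id, emb in windows:
        s = cosine(query, emb)
        if s > best.get(ref_id, -math.inf):
            best[ref_id] = s
    return sorted(best, key=lambda ref: best[ref], reverse=True)


def hit_at_k(ranking: List[str], true_ref: str, k: int) -> bool:
    # top1 is hit_at_k(..., k=1) averaged over queries; topk uses the larger k.
    return true_ref in ranking[:k]


# Toy 2-d index: two windows for reference "a", one for reference "b".
index = [("a", [1.0, 0.0]), ("a", [0.0, 1.0]), ("b", [0.7, 0.7])]
ranking = rank_refs([0.1, 1.0], index)
print(ranking)  # ['a', 'b']
```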
......
......
@@ -61,5 +61,6 @@ test -f /tmp/ab_smoke_seg_cap48_top2_seed999/report.json && cat /tmp/ab_smoke_se
- New benchmark: `/tmp/ab_smoke_seg_cap64_top2`
- Current stage: `high_energy` evaluation is done, with `top1=0.625 / topk=1.0 / num_queries=32`
- The `hybrid` index is complete and evaluation is now running
- Next session: first check the `hybrid` result and whether `report.json` has been generated
- cap64 is complete; top-1 results: `hybrid=0.875`, `high_energy=0.625`
- cap64 winner=`hybrid`
- Next session: start with the bucket/style-aware benchmark
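The handoff check "has `report.json` been generated yet" is the same as the `test -f .../report.json && cat ...` command in the hunk header above. A Python version is a minimal sketch that assumes only that the run writes `report.json` into the benchmark directory. The `"winner"` key name is an assumption about the JSON layout, and `load_report` is a hypothetical helper.

```python
import json
from pathlib import Path
from typing import Any, Dict, Optional


def load_report(bench_dir: str) -> Optional[Dict[str, Any]]:
    """Return the parsed report.json, or None if the run has not written it yet."""
    path = Path(bench_dir) / "report.json"
    if not path.is_file():
        return None
    return json.loads(path.read_text(encoding="utf-8"))


report = load_report("/tmp/ab_smoke_seg_cap64_top2")
if report is None:
    print("report.json not generated yet; run still in progress")
else:
    # "winner" is an assumed key name; adjust to the real report layout.
    print("winner:", report.get("winner"))
```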
......
......
@@ -240,10 +240,10 @@
- `hybrid`: `mean_top1=0.8750, min=0.7917, max=0.9583, stdev=0.0680`
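### Top-priority to-dos
1. Follow up on the cap64 `hybrid` result and the final `/tmp/ab_smoke_seg_cap64_top2/report.json`
2. Once cap64 finishes, update `open-dataset-workflow.md / session-handoff.md / CHANGELOG.md`
3. Then add a bucket/style-aware benchmark
4. Keep optimizing `hybrid`, focusing on lower variance and more stable hard-case results
1. Design and launch the bucket/style-aware benchmark
2. Analyze the cap48 vs cap64 discrepancy and add per-scale conclusions
3. Keep optimizing `hybrid`, focusing on lower variance and more stable hard-case results
4. Keep committing and pushing against the new benchmark baseline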
### What not to do when resuming
- Do not `git add .`
......
@@ -675,8 +675,9 @@ seed123 final conclusion:
- Launched: `/tmp/ab_smoke_seg_cap64_top2`
- Configuration: `subset_size=64`, `max_test_queries=32`, `seed=42`
- Latest evidence:
  - `high_energy` evaluation done: `num_queries=32, top1=0.625, topk=1.0`
  - `hybrid` reference index complete: `64 refs / 657 windows / 192-d`
  - `hybrid` has now entered `evaluate.py`
  - the final `report.json` has not been generated yet
- cap64 is complete:
  - `hybrid`: `num_queries=32, top1=0.875, topk=1.0`
  - `high_energy`: `num_queries=32, top1=0.625, topk=1.0`
- cap64 winner: `hybrid`
- Results now diverge by subset size, so the bucket/style-aware benchmark has to come next
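The cap48/cap64 divergence hinges partly on which tracks land in the subset, so it helps to make the draw fully determined by the configuration (`subset_size=64`, `max_test_queries=32`, `seed=42`). The following is a minimal sketch. It assumes the subset is drawn uniformly from a pool of track ids with a private seeded RNG, and that queries are the first `max_test_queries` subset tracks. `RunConfig`, `draw_subset`, and the pool are hypothetical, not the benchmark's actual sampling code.

```python
import random
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class RunConfig:
    subset_size: int = 64
    max_test_queries: int = 32
    seed: int = 42


def draw_subset(pool: List[str], cfg: RunConfig) -> Tuple[List[str], List[str]]:
    """Return (reference subset, query tracks), both determined by cfg.seed."""
    rng = random.Random(cfg.seed)  # private RNG: global random state is untouched
    refs = rng.sample(pool, cfg.subset_size)
    queries = refs[: cfg.max_test_queries]  # hypothetical: query the first N refs
    return refs, queries


pool = [f"track_{i:05d}" for i in range(1000)]  # stand-in for dataset track ids
refs, queries = draw_subset(pool, RunConfig())
print(len(refs), len(queries))  # 64 32
```

Under this kind of scheme, cap48 and cap64 with the same seed generally draw different track sets. That is one reason a per-bucket breakdown is needed before attributing the reversal to subset size alone.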
......