Commit ef9b24f8 ef9b24f83aa45d9ac32f0f9096b6b991d706a9bb by cnb.bofCdSsphPA

Preserve proof that the cap64 benchmark has started before it finishes

Constraint: The new cap64 run is still in-flight, so only startup and stage-transition evidence can be documented safely
Rejected: Wait for cap64 results before checkpointing | Would leave the next session without a verified handoff that the larger benchmark is already running
Confidence: high
Scope-risk: narrow
Directive: Keep cap64 artifacts out of git and update strategy guidance only after report.json lands
Tested: Verified the cap64 ab_smoke process is running, confirmed the high_energy smoke-local branch entered train.py on /tmp/ab_smoke_seg_cap64_top2/high_energy/fma/manifests, and recorded the active work root and parameters in docs
Not-tested: Final cap64 metrics, hybrid branch execution, and any post-cap64 strategy conclusion
1 parent d1f13203
1 ## 2026-06-02 cap64 benchmark launch checkpoint
2
3 Completed:
4 - Started a new real-FMA `cap64` comparison benchmark.
5 - Configuration for this run: `subset_size=64`, `max_test_queries=32`, `seed=42`
6 - Confirmed that the pipeline has entered the `high_energy` training stage.
7
8 Verification evidence:
9 - Main process: `scripts/ab_smoke_segmentation.py --work-root /tmp/ab_smoke_seg_cap64_top2`
10 - Subprocess: `external_adapters.py smoke-local ... /tmp/ab_smoke_seg_cap64_top2/high_energy`
11 - Current training: `train.py --data /tmp/ab_smoke_seg_cap64_top2/high_energy/fma/manifests ... --segment-strategy high_energy`
12
13 Notes:
14 - As of this checkpoint, no cap64 results have been generated yet.
15 - The purpose of this commit is to lock in, for the next session, the evidence that the new benchmark round has officially started.
16
1 ## 2026-06-02 cap48 seed999 completion and three-seed aggregation checkpoint 17 ## 2026-06-02 cap48 seed999 completion and three-seed aggregation checkpoint
2 18
3 Completed: 19 Completed:
......
...@@ -64,3 +64,5 @@ cd /workspace/acr-engine ...@@ -64,3 +64,5 @@ cd /workspace/acr-engine
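64 - Backfilled the intermediate `hybrid` seed=999 result: `top1=0.875 / topk=1.0 / num_queries=24` 64 - Backfilled the intermediate `hybrid` seed=999 result: `top1=0.875 / topk=1.0 / num_queries=24`
65 65
66 - Filled in the final `seed=999` results and completed the cap48 three-seed aggregate summary. 66 - Filled in the final `seed=999` results and completed the cap48 three-seed aggregate summary.
67
68 - Recorded that the cap64 benchmark has started and confirmed that it has entered the `high_energy` training stage.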
......
...@@ -56,3 +56,9 @@ pgrep -af 'ab_smoke_seg_cap48_top2_seed999|external_adapters.py smoke-local fma ...@@ -56,3 +56,9 @@ pgrep -af 'ab_smoke_seg_cap48_top2_seed999|external_adapters.py smoke-local fma
56 ```bash 56 ```bash
57 test -f /tmp/ab_smoke_seg_cap48_top2_seed999/report.json && cat /tmp/ab_smoke_seg_cap48_top2_seed999/report.json || echo NO_REPORT 57 test -f /tmp/ab_smoke_seg_cap48_top2_seed999/report.json && cat /tmp/ab_smoke_seg_cap48_top2_seed999/report.json || echo NO_REPORT
58 ``` 58 ```
59
60 ## Next round already started
61
62 - New benchmark: `/tmp/ab_smoke_seg_cap64_top2`
63 - Current stage: `high_energy` training in progress
64 - The next session should first check whether `report.json` has been generated
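
The `report.json` check recommended above can be sketched by adapting the cap48 one-liner from this same hunk to the cap64 work root. This is a minimal sketch, not part of the commit: the helper name `check_report` is hypothetical, and the only paths used are the work root and file name that appear in the notes.

```bash
#!/usr/bin/env bash
# Work root taken from the handoff notes; override via the environment if needed.
WORK_ROOT="${WORK_ROOT:-/tmp/ab_smoke_seg_cap64_top2}"

# check_report (hypothetical helper): print report.json if the benchmark
# has finished writing it, otherwise print the NO_REPORT sentinel used
# by the existing cap48 check.
check_report() {
  local root="$1"
  if [ -f "$root/report.json" ]; then
    cat "$root/report.json"
  else
    echo NO_REPORT
  fi
}

check_report "$WORK_ROOT"
```

Printing `NO_REPORT` instead of failing keeps the check safe to run while the benchmark is still in flight.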
......
...@@ -240,10 +240,10 @@ ...@@ -240,10 +240,10 @@
240 - `hybrid`: `mean_top1=0.8750, min=0.7917, max=0.9583, stdev=0.0680` 240 - `hybrid`: `mean_top1=0.8750, min=0.7917, max=0.9583, stdev=0.0680`
241 241
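242 ### Top-priority to-dos 242 ### Top-priority to-dos
243 1. Continue designing the cap64 benchmark based on the 3-seed results 243 1. Follow up on the running cap64 benchmark: `/tmp/ab_smoke_seg_cap64_top2/report.json`
244 2. Add a bucket/style-aware benchmark 244 2. After cap64 completes, update `open-dataset-workflow.md / session-handoff.md / CHANGELOG.md`
245 3. Keep optimizing `hybrid`, focusing on reducing variance and improving hard-case stability 245 3. Then add a bucket/style-aware benchmark
246 4. Commit and push, then continue with the next validation round 246 4. Keep optimizing `hybrid`, focusing on reducing variance and improving hard-case stability
247 247
248 ### What not to do when resuming 248 ### What not to do when resuming
249 - Do not run `git add .` 249 - Do not run `git add .`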
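
The `hybrid` three-seed aggregate quoted earlier in this hunk can be reproduced with a short awk sketch. The per-seed inputs are an inference, not data from the commit: only seed=999 is documented (`top1=0.875`, `num_queries=24`), and the values 19/24, 21/24 and 23/24 are assumed because they match the reported min, mean and max. The function name `summarize_top1` is hypothetical.

```bash
#!/usr/bin/env bash
# summarize_top1 (hypothetical helper): first argument is the number of
# queries per seed, the remaining arguments are correct-top1 counts.
summarize_top1() {
  local n="$1"; shift
  printf '%s\n' "$@" | awk -v n="$n" '
    {
      x = $1 / n                      # per-seed top1 accuracy
      v[NR] = x; sum += x
      if (NR == 1 || x < min) min = x
      if (NR == 1 || x > max) max = x
    }
    END {
      mean = sum / NR
      for (i = 1; i <= NR; i++) { d = v[i] - mean; ss += d * d }
      # Population standard deviation (divide by NR, not NR - 1).
      printf "mean_top1=%.4f, min=%.4f, max=%.4f, stdev=%.4f\n", mean, min, max, sqrt(ss / NR)
    }'
}

# Assumed counts out of 24 queries for the three seeds.
summarize_top1 24 19 21 23
```

Under these assumed inputs, the population standard deviation gives 0.0680, matching the reported figure. The sample standard deviation (dividing by `NR - 1`) would give about 0.0833.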
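...@@ -669,3 +669,12 @@ seed123 final conclusion: ...@@ -669,3 +669,12 @@ seed123 final conclusion:
669 - `hybrid` has a higher ceiling but larger variance 669 - `hybrid` has a higher ceiling but larger variance
670 - The final default strategy depends on aggregated results from more seeds 670 - The final default strategy depends on aggregated results from more seeds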
671 671
672
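673 ## 100. New validation round started: cap64
674
675 - Started: `/tmp/ab_smoke_seg_cap64_top2`
676 - Configuration: `subset_size=64`, `max_test_queries=32`, `seed=42`
677 - Latest evidence so far:
678 - Entered the `train.py` stage of the `high_energy` branch
679 - The final `report.json` has not been produced yet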
680
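
The stage evidence above comes from the command line of the running `train.py` process. A minimal sketch of how the active strategy could be read back from such a command line follows. The helper name `current_strategy` is hypothetical, and the parsing assumes the `--segment-strategy <name>` form shown in the checkpoint notes.

```bash
#!/usr/bin/env bash
# current_strategy (hypothetical helper): read process command lines on
# stdin and print the value that follows --segment-strategy, if any.
current_strategy() {
  sed -n 's/.*--segment-strategy[ =]\([^ ]*\).*/\1/p'
}

# Possible usage against the live benchmark, mirroring the pgrep -af
# pattern already used for cap48 in the handoff notes:
#   pgrep -af 'ab_smoke_seg_cap64_top2' | current_strategy
```

An empty result means that no matching `train.py` process is running, in which case `report.json` is the next thing to check.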
......