Preserve proof that the cap64 benchmark has started before it finishes
Constraint: The new cap64 run is still in-flight, so only startup and stage-transition evidence can be documented safely
Rejected: Wait for cap64 results before checkpointing | Would leave the next session without a verified handoff that the larger benchmark is already running
Confidence: high
Scope-risk: narrow
Directive: Keep cap64 artifacts out of git and update strategy guidance only after report.json lands
Tested: Verified the cap64 ab_smoke process is running, confirmed the high_energy smoke-local branch entered train.py on /tmp/ab_smoke_seg_cap64_top2/high_energy/fma/manifests, and recorded the active work root and parameters in docs
Not-tested: Final cap64 metrics, hybrid branch execution, and any post-cap64 strategy conclusion
Showing 4 changed files with 37 additions and 4 deletions
| 1 | ## 2026-06-02 cap64 benchmark startup checkpoint | ||
| 2 | |||
| 3 | Completed: | ||
| 4 | - Started a new real-FMA `cap64` comparison benchmark. | ||
| 5 | - Configuration for this round: `subset_size=64`, `max_test_queries=32`, `seed=42`. | ||
| 6 | - Confirmed that the pipeline has entered the `high_energy` training stage. | ||
| 7 | |||
| 8 | Verification evidence: | ||
| 9 | - Main process: `scripts/ab_smoke_segmentation.py --work-root /tmp/ab_smoke_seg_cap64_top2` | ||
| 10 | - Sub-process: `external_adapters.py smoke-local ... /tmp/ab_smoke_seg_cap64_top2/high_energy` | ||
| 11 | - Current training: `train.py --data /tmp/ab_smoke_seg_cap64_top2/high_energy/fma/manifests ... --segment-strategy high_energy` | ||
| 12 | |||
| 13 | Notes: | ||
| 14 | - As of this checkpoint, no cap64 results have been generated. | ||
| 15 | - The purpose of this commit is to preserve, for the next session, evidence that the new benchmark round has formally started. | ||
| 16 | |||
| 1 | ## 2026-06-02 cap48 seed999 completion and three-seed aggregate checkpoint | 17 | ## 2026-06-02 cap48 seed999 completion and three-seed aggregate checkpoint |
| 2 | 18 | ||
| 3 | Completed: | 19 | Completed: | ... | ... |
| ... | @@ -64,3 +64,5 @@ cd /workspace/acr-engine | ... | @@ -64,3 +64,5 @@ cd /workspace/acr-engine |
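| 64 | - Back-filled the intermediate `hybrid` seed=999 result: `top1=0.875 / topk=1.0 / num_queries=24`. | 64 | - Back-filled the intermediate `hybrid` seed=999 result: `top1=0.875 / topk=1.0 / num_queries=24`. |
| 65 | 65 | ||
| 66 | - Filled in the final `seed=999` result and completed the cap48 three-seed aggregate summary. | 66 | - Filled in the final `seed=999` result and completed the cap48 three-seed aggregate summary. |
| 67 | |||
| 68 | - Recorded that the cap64 benchmark has started and confirmed that it entered the `high_energy` training stage. | ... | ... |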
| ... | @@ -56,3 +56,9 @@ pgrep -af 'ab_smoke_seg_cap48_top2_seed999|external_adapters.py smoke-local fma | ... | @@ -56,3 +56,9 @@ pgrep -af 'ab_smoke_seg_cap48_top2_seed999|external_adapters.py smoke-local fma |
| 56 | ```bash | 56 | ```bash |
| 57 | test -f /tmp/ab_smoke_seg_cap48_top2_seed999/report.json && cat /tmp/ab_smoke_seg_cap48_top2_seed999/report.json || echo NO_REPORT | 57 | test -f /tmp/ab_smoke_seg_cap48_top2_seed999/report.json && cat /tmp/ab_smoke_seg_cap48_top2_seed999/report.json || echo NO_REPORT |
| 58 | ``` | 58 | ``` |
| 59 | |||
| 60 | ## Next round already started | ||
| 61 | |||
| 62 | - New benchmark: `/tmp/ab_smoke_seg_cap64_top2` | ||
| 63 | - Current stage: `high_energy` training in progress | ||
| 64 | - The next session should first check whether `report.json` has been generated | ... | ... |
| ... | @@ -240,10 +240,10 @@ | ... | @@ -240,10 +240,10 @@ |
| 240 | - `hybrid`: `mean_top1=0.8750, min=0.7917, max=0.9583, stdev=0.0680` | 240 | - `hybrid`: `mean_top1=0.8750, min=0.7917, max=0.9583, stdev=0.0680` |
| 241 | 241 | ||
| 242 | ### Top-priority to-dos | 242 | ### Top-priority to-dos |
| 243 | 1. Continue designing the cap64 benchmark based on the 3-seed results. | 243 | 1. Follow up on the running cap64 benchmark: `/tmp/ab_smoke_seg_cap64_top2/report.json`. |
| 244 | 2. Add a bucket/style-aware benchmark. | 244 | 2. After cap64 completes, update `open-dataset-workflow.md / session-handoff.md / CHANGELOG.md`. |
| 245 | 3. Continue optimizing `hybrid`, focusing on reducing variance and improving hard-case stability. | 245 | 3. Then add a bucket/style-aware benchmark. |
| 246 | 4. Commit and push, then continue with the next validation round. | 246 | 4. Continue optimizing `hybrid`, focusing on reducing variance and improving hard-case stability. |
| 247 | 247 | ||
| 248 | ### What not to do when resuming | 248 | ### What not to do when resuming |
| 249 | - Do not run `git add .` | 249 | - Do not run `git add .` |
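| ... | @@ -669,3 +669,12 @@ seed123 final conclusion: | ... | @@ -669,3 +669,12 @@ seed123 final conclusion: |
| 669 | - `hybrid` has a higher ceiling but larger variance | 669 | - `hybrid` has a higher ceiling but larger variance |
| 670 | - The final default strategy depends on aggregate results from more seeds | 670 | - The final default strategy depends on aggregate results from more seeds |
| 671 | 671 | ||
| 672 | |||
| 673 | ## 100. New validation round started: cap64 | ||
| 674 | |||
| 675 | - Started: `/tmp/ab_smoke_seg_cap64_top2` | ||
| 676 | - Configuration: `subset_size=64`, `max_test_queries=32`, `seed=42` | ||
| 677 | - Latest evidence so far: | ||
| 678 | - Entered the `train.py` stage of `high_energy` | ||
| 679 | - The final `report.json` has not been produced yet | ||
| 680 | ... | ... |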
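
The follow-up check that the handoff notes ask for (has `report.json` landed under the cap64 work root yet?) could be sketched as follows. It mirrors the `test -f ... report.json` check already used for the cap48 seed999 run. The helper name `report_status`, the `WORK_ROOT` variable, and the `REPORT_READY` label are assumptions introduced for illustration and are not part of the repository.

```bash
#!/usr/bin/env bash
# Sketch only: report whether a benchmark work root has produced report.json.
# report_status is a hypothetical helper; the default path is the cap64 work
# root named in the notes above.

report_status() {
  local work_root="$1"
  if [ -f "$work_root/report.json" ]; then
    echo "REPORT_READY"
  else
    echo "NO_REPORT"
  fi
}

# Override WORK_ROOT to inspect a different run.
report_status "${WORK_ROOT:-/tmp/ab_smoke_seg_cap64_top2}"
```

Printing a fixed label instead of dumping the file keeps the check safe to run repeatedly while the benchmark is still in flight; once it prints `REPORT_READY`, the report can be inspected with `cat` as in the cap48 command.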