Commit d13a3b8b d13a3b8b9c45a37ff6243e0b9851b8bc8061a21a by cnb.bofCdSsphPA

Preserve the hybrid seed999 score before the second strategy finishes

Constraint: The cap48 seed=999 run has only completed the hybrid leg, so the three-seed aggregate is still incomplete
Rejected: Wait for high_energy to finish before checkpointing | Would risk losing the verified hybrid seed999 score from the active Ralph session
Confidence: high
Scope-risk: narrow
Directive: Keep recording verified partial benchmark milestones, but do not revise default-strategy guidance until both strategies and the final report are available
Tested: Verified hybrid eval.json reports num_queries=24, top1=0.875, topk=1.0; verified progress.json records the same result; verified high_energy is still running and report.json is still absent
Not-tested: Final high_energy seed999 metrics, final report.json, and updated three-seed aggregate
1 parent bdc04f72
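## 2026-06-02 seed999 interim result checkpoint (hybrid written to disk)
Completed:
- Recorded the completed evaluation result for `hybrid` in `cap48 top2 seed=999`.
- Confirmed that the `hybrid` result has been written to `progress.json`, while the overall `report.json` will only be generated after `high_energy` finishes.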
Verification evidence:
- `hybrid/fma_reports_smoke/eval.json`
  - `num_queries=24`
  - `top1=0.875`
  - `topk=1.0`
- `/tmp/ab_smoke_seg_cap48_top2_seed999/progress.json` records the same result.
- The current process has switched to:
  - `run_demo.py build-index` for `high_energy`
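A later session can re-check the evidence above with a small script. This is a minimal sketch and is not part of the repository. It assumes `eval.json` stores `num_queries`, `top1` and `topk` as top-level keys, because these notes show only the values and not the JSON layout. `compare_metrics` and `check_eval_file` are hypothetical helper names.

```python
import json
from pathlib import Path

# Values recorded in this checkpoint for hybrid, seed=999.
RECORDED_HYBRID_SEED999 = {"num_queries": 24, "top1": 0.875, "topk": 1.0}


def compare_metrics(data: dict, expected: dict, tol: float = 1e-9) -> list[str]:
    """Return human-readable mismatches between `data` and `expected` (empty = match)."""
    problems = []
    for key, want in expected.items():
        if key not in data:
            problems.append(f"{key}: missing")
        elif abs(float(data[key]) - float(want)) > tol:
            problems.append(f"{key}: got {data[key]}, expected {want}")
    return problems


def check_eval_file(path: Path, expected: dict = RECORDED_HYBRID_SEED999) -> list[str]:
    # Assumption: eval.json keeps num_queries/top1/topk as top-level keys.
    return compare_metrics(json.loads(path.read_text()), expected)


if __name__ == "__main__":
    eval_path = Path("/tmp/ab_smoke_seg_cap48_top2_seed999/hybrid/fma_reports_smoke/eval.json")
    if eval_path.exists():
        print(check_eval_file(eval_path) or "eval.json matches the recorded hybrid result")
    else:
        print(f"not found: {eval_path}")
```

The same comparison could be run against `progress.json` once its layout is known. These notes do not show that layout.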
Notes:
- As of this checkpoint, the three-seed aggregate still cannot be finalized, because the seed=999 run of `high_energy` has not finished.
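The rule above can be written as a completeness check: the aggregate is not updated until every strategy in every seed run has finished and `report.json` exists. This is a hypothetical sketch. The per-strategy `eval.json` path for `high_energy` is assumed to mirror the `hybrid` one, and the helper names are invented for illustration. The `exists` parameter defaults to `Path.exists` and exists only so the check can be exercised without real run directories.

```python
from pathlib import Path
from typing import Callable, Iterable

# Assumed layout, inferred from the paths in these notes (high_energy is assumed
# to mirror hybrid):
#   <run_dir>/<strategy>/fma_reports_smoke/eval.json   per-strategy result
#   <run_dir>/report.json                              overall report
STRATEGIES = ("hybrid", "high_energy")


def missing_artifacts(
    run_dir: Path,
    strategies: Iterable[str] = STRATEGIES,
    exists: Callable[[Path], bool] = Path.exists,
) -> list[str]:
    """List the result files a seed run still lacks; an empty list means it is complete."""
    expected = [run_dir / s / "fma_reports_smoke" / "eval.json" for s in strategies]
    expected.append(run_dir / "report.json")
    return [str(p) for p in expected if not exists(p)]


def aggregate_ready(
    run_dirs: Iterable[Path],
    exists: Callable[[Path], bool] = Path.exists,
) -> bool:
    """Refresh the multi-seed aggregate only when every seed run has all its artifacts."""
    return all(not missing_artifacts(d, exists=exists) for d in run_dirs)
```

At this checkpoint, `missing_artifacts(Path("/tmp/ab_smoke_seg_cap48_top2_seed999"))` should list the `high_energy` `eval.json` and `report.json`, assuming the layout above holds.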
## 2026-06-02 Checkpoint: new evidence from the running benchmark
Completed:
......
@@ -60,3 +60,5 @@ cd /workspace/acr-engine
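- Confirmed that `cap48 top2 seed=999` is not stuck in build-index.
- `hybrid` has finished the reference index and then moved on to `evaluate.py`.
- This commit records this fresh verification evidence so that the next session does not have to repeat the investigation.
- Added the interim `hybrid` seed=999 result: `top1=0.875 / topk=1.0 / num_queries=24`.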
......
@@ -22,7 +22,8 @@
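Latest status:
- `hybrid` reference index is complete
- `evaluate.py` is currently running
- `hybrid` has finished evaluation; current result is `top1=0.875 / topk=1.0 / num_queries=24`
- `high_energy` is still running
- The overall `report.json` has not been written to disk yet
To check: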
......
@@ -232,8 +232,10 @@
- The process tree was confirmed to have entered:
  - `evaluate.py --data /tmp/ab_smoke_seg_cap48_top2_seed999/hybrid/fma/manifests ... --output-json /tmp/ab_smoke_seg_cap48_top2_seed999/hybrid/fma_reports_smoke/eval.json --seed 999 --max-queries 24`
- As of this checkpoint:
  - The seed=999 evaluation result for `hybrid` has been written to `hybrid/fma_reports_smoke/eval.json`
  - Current `hybrid` result: `num_queries=24, top1=0.875, topk=1.0`
  - The overall report `/tmp/ab_smoke_seg_cap48_top2_seed999/report.json` has not been generated yet
  - The `high_energy` stage has not yet produced any visible evaluation results
  - `high_energy` is still running and has not yet written its final `eval.json`
### Top-priority to-dos
1. Check whether `/tmp/ab_smoke_seg_cap48_top2_seed999/report.json` has been generated.
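The first to-do is a file-existence check. If it is scripted, a bounded poll avoids waiting indefinitely should `high_energy` stall. This is a minimal sketch under assumptions: `wait_for_file` and the 30-second poll interval are illustrative choices, not something the benchmark scripts provide.

```python
import time
from pathlib import Path

REPORT = Path("/tmp/ab_smoke_seg_cap48_top2_seed999/report.json")


def wait_for_file(path: Path, timeout_s: float, poll_s: float = 30.0) -> bool:
    """Poll until `path` exists or `timeout_s` seconds pass; return True if it appeared."""
    deadline = time.monotonic() + timeout_s
    while True:
        if path.exists():
            return True
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return False
        # Never sleep past the deadline.
        time.sleep(min(poll_s, remaining))


if __name__ == "__main__":
    # timeout_s=0 turns this into a single, non-blocking existence check.
    print(f"report.json present: {wait_for_file(REPORT, timeout_s=0)}")
```

The script uses `time.monotonic()` rather than wall-clock time so that a system clock change during a long benchmark cannot stretch or cut short the wait.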
@@ -657,6 +659,7 @@ seed123 final conclusion:
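## 99. Explicit conclusions of this checkpoint
- The seed=999 evaluation result for `hybrid` has already been written to disk ahead of the rest of the run: `top1=0.875, topk=1.0, num_queries=24`.
- This session has completed the "make the handoff resumable" deliverable.
- This session did not wait for the long-running `seed=999` CPU benchmark to finish, so the guidance on the algorithm's default strategy is not changed by any new conclusion.
- The most up-to-date safe statement remains: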
......