Preserve the hybrid seed999 score before the second strategy finishes
Constraint: The cap48 seed=999 run has only completed the hybrid leg, so the three-seed aggregate is still incomplete
Rejected: Wait for high_energy to finish before checkpointing | Would risk losing the verified hybrid seed999 score from the active Ralph session
Confidence: high
Scope-risk: narrow
Directive: Keep recording verified partial benchmark milestones, but do not revise default-strategy guidance until both strategies and the final report are available
Tested: Verified hybrid eval.json reports num_queries=24, top1=0.875, topk=1.0; verified progress.json records the same result; verified high_energy is still running and report.json is still absent
Not-tested: Final high_energy seed999 metrics, final report.json, and updated three-seed aggregate
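The checks listed under "Tested" could be repeated with a small script along the following lines. This is only a sketch under assumptions: the run directory layout follows the paths quoted in the diff, `eval.json` is assumed to hold the three metrics as plain keys, and because the layout of `progress.json` is not shown, the sketch searches it recursively for any object carrying the same values. The names `contains_metrics` and `checkpoint_status` are hypothetical helpers, not part of the repository.

```python
import json
from pathlib import Path

# Metrics quoted in the commit message for the hybrid seed=999 leg.
EXPECTED = {"num_queries": 24, "top1": 0.875, "topk": 1.0}


def contains_metrics(node, expected):
    """Return True if any dict nested inside `node` carries every expected key/value pair."""
    if isinstance(node, dict):
        if all(node.get(key) == value for key, value in expected.items()):
            return True
        return any(contains_metrics(child, expected) for child in node.values())
    if isinstance(node, list):
        return any(contains_metrics(child, expected) for child in node)
    return False


def _file_matches(path, expected):
    """True if `path` exists and its JSON content contains the expected metrics."""
    if not path.exists():
        return False
    return contains_metrics(json.loads(path.read_text()), expected)


def checkpoint_status(run_dir, expected=EXPECTED):
    """Summarize the partial-benchmark state of one run directory (assumed layout)."""
    run_dir = Path(run_dir)
    return {
        "hybrid_eval_matches": _file_matches(
            run_dir / "hybrid" / "fma_reports_smoke" / "eval.json", expected
        ),
        "progress_matches": _file_matches(run_dir / "progress.json", expected),
        # The overall report is only expected once high_energy has finished.
        "report_present": (run_dir / "report.json").exists(),
    }
```

Calling `checkpoint_status("/tmp/ab_smoke_seg_cap48_top2_seed999")` at the time of this checkpoint would be expected to show both match flags set and `report_present` unset, provided the assumptions above hold for the real files.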
Showing 4 changed files with 26 additions and 2 deletions
| 1 | ## 2026-06-02 seed999 intermediate-result checkpoint (hybrid written to disk) | ||
| 2 | |||
| 3 | Completed: | ||
| 4 | - Recorded the completed evaluation result for `hybrid` in `cap48 top2 seed=999`. | ||
| 5 | - Confirmed that the `hybrid` result has been written to `progress.json`, while the overall `report.json` will only be generated after `high_energy` finishes. | ||
| 6 | |||
| 7 | Verification evidence: | ||
| 8 | - `hybrid/fma_reports_smoke/eval.json`: | ||
| 9 | - `num_queries=24` | ||
| 10 | - `top1=0.875` | ||
| 11 | - `topk=1.0` | ||
| 12 | - `/tmp/ab_smoke_seg_cap48_top2_seed999/progress.json` records the same result. | ||
| 13 | - The current process has switched to: | ||
| 14 | - `run_demo.py build-index` for `high_energy` | ||
| 15 | |||
| 16 | Notes: | ||
| 17 | - As of this checkpoint, the three-seed aggregate cannot yet be finalized, because seed=999 for `high_energy` has not finished. | ||
| 18 | |||
| 1 | ## 2026-06-02 new-evidence checkpoint for the running benchmark | 19 | ## 2026-06-02 new-evidence checkpoint for the running benchmark |
| 2 | 20 | ||
| 3 | Completed: | 21 | Completed: | ... | ... |
| ... | @@ -60,3 +60,5 @@ cd /workspace/acr-engine | ... | @@ -60,3 +60,5 @@ cd /workspace/acr-engine |
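| 60 | - Confirmed that `cap48 top2 seed=999` is not stuck in build-index. | 60 | - Confirmed that `cap48 top2 seed=999` is not stuck in build-index. |
| 61 | - `hybrid` has finished the reference index and then entered `evaluate.py`. | 61 | - `hybrid` has finished the reference index and then entered `evaluate.py`. |
| 62 | - This commit preserves this fresh verification evidence so that the next session does not have to repeat the investigation. | 62 | - This commit preserves this fresh verification evidence so that the next session does not have to repeat the investigation. |
| 63 | |||
| 64 | - Added a follow-up record of the intermediate `hybrid` seed=999 result: `top1=0.875 / topk=1.0 / num_queries=24`. | ... | ... |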
| ... | @@ -22,7 +22,8 @@ | ... | @@ -22,7 +22,8 @@ |
| 22 | 22 | ||
| 23 | Latest status: | 23 | Latest status: |
| 24 | - `hybrid` reference index is complete | 24 | - `hybrid` reference index is complete |
| 25 | - Currently running `evaluate.py` | 25 | - `hybrid` evaluation is complete; the current result is `top1=0.875 / topk=1.0 / num_queries=24` |
| 26 | - `high_energy` is still running | ||
| 26 | - The overall `report.json` has not yet been written to disk | 27 | - The overall `report.json` has not yet been written to disk |
| 27 | 28 | ||
| 28 | To check: | 29 | To check: | ... | ... |
| ... | @@ -232,8 +232,10 @@ | ... | @@ -232,8 +232,10 @@ |
| 232 | - The process tree is confirmed to have entered: | 232 | - The process tree is confirmed to have entered: |
| 233 | - `evaluate.py --data /tmp/ab_smoke_seg_cap48_top2_seed999/hybrid/fma/manifests ... --output-json /tmp/ab_smoke_seg_cap48_top2_seed999/hybrid/fma_reports_smoke/eval.json --seed 999 --max-queries 24` | 233 | - `evaluate.py --data /tmp/ab_smoke_seg_cap48_top2_seed999/hybrid/fma/manifests ... --output-json /tmp/ab_smoke_seg_cap48_top2_seed999/hybrid/fma_reports_smoke/eval.json --seed 999 --max-queries 24` |
| 234 | - As of this checkpoint: | 234 | - As of this checkpoint: |
| 235 | - The `hybrid` seed=999 evaluation result has been written to `hybrid/fma_reports_smoke/eval.json` | ||
| 236 | - Current `hybrid` result: `num_queries=24, top1=0.875, topk=1.0` | ||
| 235 | - The overall report `/tmp/ab_smoke_seg_cap48_top2_seed999/report.json` has not yet been generated | 237 | - The overall report `/tmp/ab_smoke_seg_cap48_top2_seed999/report.json` has not yet been generated |
| 236 | - The `high_energy` stage has not yet started producing visible evaluation results | 238 | - `high_energy` is still running and has not yet written its final `eval.json` |
| 237 | 239 | ||
| 238 | ### Top-priority to-dos | 240 | ### Top-priority to-dos |
| 239 | 1. Check whether `/tmp/ab_smoke_seg_cap48_top2_seed999/report.json` has been generated. | 241 | 1. Check whether `/tmp/ab_smoke_seg_cap48_top2_seed999/report.json` has been generated. |
| ... | @@ -657,6 +659,7 @@ seed123 final conclusion: | ... | @@ -657,6 +659,7 @@ seed123 final conclusion: |
| 657 | 659 | ||
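| 658 | ## 99. Explicit conclusions of this checkpoint | 660 | ## 99. Explicit conclusions of this checkpoint |
| 659 | 661 | ||
| 662 | - The `hybrid` seed=999 evaluation result has already been written to disk: `top1=0.875, topk=1.0, num_queries=24`. | ||
| 660 | - This checkpoint completed the "resumable handoff" deliverable. | 663 | - This checkpoint completed the "resumable handoff" deliverable. |
| 661 | - This checkpoint did not wait for the long-running `seed=999` CPU benchmark to finish, so no new conclusion is drawn about the default algorithm strategy. | 664 | - This checkpoint did not wait for the long-running `seed=999` CPU benchmark to finish, so no new conclusion is drawn about the default algorithm strategy. |
| 662 | - The current safest statement remains: | 665 | - The current safest statement remains: | ... | ... |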