Checkpoint the cap48 benchmark while the larger run is still building

Preserve the new 48-track top-two benchmark entry point and current build-index phase so later sessions can continue the expanding validation ladder without rediscovering runtime state.

Constraint: cap48 has not produced scores yet, so only execution-state evidence is available
Rejected: Wait for cap48 scores before recording anything | Risks losing the larger-benchmark checkpoint if the session ends first
Confidence: high
Scope-risk: narrow
Directive: Replace the cap48 running-state section with measured scores once hybrid eval.json or report.json land
Tested: Verified active cap48 processes; verified handoff records work-root, subset size, query cap, and current build-index phase
Not-tested: cap48 strategy scores because the run is still in progress

Commit 026b5539 ... 026b553984497e893a1578370b29a3b4f4bd7f8d authored 2026-06-02 17:50:57 +0800 by cnb.bofCdSsphPA
Showing 2 changed files with 64 additions and 0 deletions
docs/CHANGELOG.md
docs/session-handoff.md
--- a/docs/CHANGELOG.md
+++ b/docs/CHANGELOG.md
@@ -2,6 +2,28 @@

 ## 2026-06-02

+### Stage: Launch the cap48 top2 real-FMA comparison and record the run phase
+
+Completed:
+- Launched a larger real-FMA top2 benchmark:
+  - `work_root = /tmp/ab_smoke_seg_cap48_top2`
+  - `subset_size = 48`
+  - `max_test_queries = 24`
+  - Strategies: `hybrid` vs `high_energy`
+- Updated [session-handoff.md](./session-handoff.md)
+
+Current fresh evidence:
+- `scripts/ab_smoke_segmentation.py ... --work-root /tmp/ab_smoke_seg_cap48_top2` has been launched
+- The current first lane is:
+  - `hybrid`
+- The run has now entered:
+  - `run_demo.py build-index --resume --checkpoint-every-refs 100`
+- `report.json` has not been written to disk yet
+
+Conclusions:
+- Validation has begun on whether the cap24 / cap32 conclusions still hold on the larger `subset=48`
+- Even if the current session ends, a new session can pick up directly from the cap48 entry point in the handoff and keep watching for results
+
 ### Stage: Wrap up the cap32 top2 real-FMA comparison and settle the default-strategy conclusion

 Completed:
--- a/docs/session-handoff.md
+++ b/docs/session-handoff.md
@@ -456,6 +456,48 @@ cap32 top2 final conclusion:
 - `hybrid`：`20 / 0.95 / 1.0`
 - `high_energy`：`20 / 0.5 / 1.0`
 - Both larger real-subset rounds, cap24 and cap32, point to the same conclusion: **the default strategy is fixed as `hybrid`**
+
+---
+
+## 12. cap48 top2 comparison experiment (in progress)
+
+To keep extending the real-data evidence chain, a larger FMA top2 comparison has been launched:
+
+```bash
+cd /workspace/acr-engine
+/usr/local/miniconda3/bin/python scripts/ab_smoke_segmentation.py \
+  --dataset fma \
+  --input-dir data/raw/fma_small_audio \
+  --work-root /tmp/ab_smoke_seg_cap48_top2 \
+  --subset-size 48 \
+  --query-duration 8 \
+  --train-epochs 1 \
+  --batch-size 2 \
+  --device cpu \
+  --strategies hybrid high_energy \
+  --max-test-queries 24 \
+  --output-json /tmp/ab_smoke_seg_cap48_top2/report.json
+```
+
+Current fresh evidence:
+
+| Item | Status |
+|---|---|
+| `subset_size` | `48` |
+| `max_test_queries` | `24` |
+| First strategy to run | `hybrid` |
+| Current phase | `run_demo.py build-index --resume --checkpoint-every-refs 100` |
+| `report.json` | Not generated yet |
+
+Recovery check command:
+
+```bash
+pgrep -af 'ab_smoke_seg_cap48_top2|external_adapters.py smoke-local fma /tmp/ab_smoke_seg_cap48_top2|evaluate.py --data /tmp/ab_smoke_seg_cap48_top2|run_demo.py build-index --data /tmp/ab_smoke_seg_cap48_top2|train.py --data /tmp/ab_smoke_seg_cap48_top2'
+```
+
+Files to wait for first:
+- `/tmp/ab_smoke_seg_cap48_top2/hybrid/fma_reports_smoke/eval.json`
+- `/tmp/ab_smoke_seg_cap48_top2/report.json`
 - `b766c74` Make open-dataset manifests trainable end to end
 - `fa23144` Add a single-page open dataset workflow for training prep
 - `af33be3` Condense docs and add manifest validation before training
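
A later session may want a quick way to tell whether either of the two result files named in the handoff has landed. The following is a minimal sketch and not part of this commit: the helper name `ready_reports` is hypothetical, and because the JSON schema of `eval.json` and `report.json` is not described here, it only checks that each file exists and parses as JSON.

```python
import json
from pathlib import Path

# Relative paths copied from the handoff note (relative to the work root).
PRIORITY_FILES = (
    "hybrid/fma_reports_smoke/eval.json",
    "report.json",
)


def ready_reports(work_root):
    """Return the priority files that exist and already parse as JSON.

    Hypothetical helper: it makes no assumption about the JSON contents.
    """
    ready = []
    for rel in PRIORITY_FILES:
        path = Path(work_root) / rel
        try:
            with path.open(encoding="utf-8") as fh:
                json.load(fh)
        except (OSError, ValueError):
            # Missing, unreadable, or only partially written: not ready yet.
            continue
        ready.append(path)
    return ready


if __name__ == "__main__":
    for path in ready_reports("/tmp/ab_smoke_seg_cap48_top2"):
        print(f"ready: {path}")
```

An empty result means the run is probably still in progress, in which case the `pgrep` check from the handoff is the next thing to try.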