Preserve the larger cap24 top-two benchmark checkpoint

Record the new 24-track capped benchmark setup and the first completed hybrid result so the next session can continue the stronger tie-break experiment without rediscovering runtime state.

Constraint: The cap24 benchmark is still in progress, so only partial evidence can be documented now
Rejected: Wait for high_energy to finish before updating handoff | Risks losing the fresh larger-subset evidence if the session ends first
Confidence: high
Scope-risk: narrow
Directive: Replace the partial cap24 section with the final two-strategy ranking once report.json lands
Tested: Verified /tmp/ab_smoke_seg_cap24_top2/hybrid/fma_reports_smoke/eval.json; verified active cap24 processes; verified docs include the exact work-root and resume command
Not-tested: Final cap24 top-two comparison because high_energy is still training

Commit 48a5957aba254114c0d37aa54c7abab3b019da5b authored 2026-06-02 17:33:42 +0800 by cnb.bofCdSsphPA
Showing 2 changed files with 60 additions and 0 deletions
docs/CHANGELOG.md
docs/session-handoff.md
--- a/docs/CHANGELOG.md
+++ b/docs/CHANGELOG.md
@@ -2,6 +2,27 @@
 ## 2026-06-02
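+### Stage: Launch the larger cap24 top2 real-FMA comparison and record the first result
+Completed:
+- Launched:
+  - `/tmp/ab_smoke_seg_cap24_top2`
+  - Strategies limited to `hybrid` and `high_energy`
+  - `subset_size = 24`
+  - `max_test_queries = 16`
+- Updated [session-handoff.md](./session-handoff.md)
+Current fresh evidence:
+- `hybrid` has finished:
+  - `num_queries = 16`
+  - `top1 = 1.0`
+  - `topk = 1.0`
+- `high_energy` has entered the training phase; the full comparison is not yet complete
+Conclusion:
+- On a real FMA subset larger than cap16, `hybrid` still holds a perfect score so far
+- The next step is simply to wait for `high_energy` to finish, which will show whether the two stay tied or diverge on the larger subset
 ### Stage: Wrap up the cap16 real-FMA capped segmentation benchmark
 Completed: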
--- a/docs/session-handoff.md
+++ b/docs/session-handoff.md
@@ -373,6 +373,45 @@ cd /workspace/acr-engine
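   - Default first choice: `hybrid`
   - Strong runner-up: `high_energy`
   - `beat_aware` / `repeated_section_aware` are better suited as supplementary comparisons than as default strategies
+---
+## 10. cap24 top2 comparison experiment (in progress)
+To further assess the tie between `hybrid` and `high_energy`, a larger real-FMA comparison has been launched: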
+```bash
+cd /workspace/acr-engine
+/usr/local/miniconda3/bin/python scripts/ab_smoke_segmentation.py \
+  --dataset fma \
+  --input-dir data/raw/fma_small_audio \
+  --work-root /tmp/ab_smoke_seg_cap24_top2 \
+  --subset-size 24 \
+  --query-duration 8 \
+  --train-epochs 1 \
+  --batch-size 2 \
+  --device cpu \
+  --strategies hybrid high_energy \
+  --max-test-queries 16 \
+  --output-json /tmp/ab_smoke_seg_cap24_top2/report.json
+```
+Current fresh evidence:
+| Strategy | subset | max_test_queries | top1 | topk | Status |
+|---|---:|---:|---:|---:|---|
+| `hybrid` | 24 | 16 | 1.0 | 1.0 | Done |
+| `high_energy` | 24 | 16 | - | - | Training |
+Resume check command:
+```bash
+pgrep -af 'ab_smoke_seg_cap24_top2|external_adapters.py smoke-local fma /tmp/ab_smoke_seg_cap24_top2|evaluate.py --data /tmp/ab_smoke_seg_cap24_top2|run_demo.py build-index --data /tmp/ab_smoke_seg_cap24_top2'
+```
+If `report.json` has not been generated yet, wait first for:
+- `/tmp/ab_smoke_seg_cap24_top2/high_energy/fma_reports_smoke/eval.json`
+- `/tmp/ab_smoke_seg_cap24_top2/report.json`
 - `b766c74` Make open-dataset manifests trainable end to end
 - `fa23144` Add a single-page open dataset workflow for training prep
 - `af33be3` Condense docs and add manifest validation before training
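
Reviewer note: the resume check in the handoff section comes down to "which per-strategy `eval.json` files exist, and has `report.json` landed yet". A minimal Python sketch of that check follows. It is not part of this commit. The directory layout is taken from the paths quoted above, while the `num_queries`, `top1`, and `topk` keys are assumed to be top-level fields of `eval.json` (inferred from the changelog entry, not confirmed against the actual file). The `read_eval` and `summarize` helpers are hypothetical names.

```python
import json
from pathlib import Path
from typing import Optional

# Work root and strategy names come from the handoff section above.
WORK_ROOT = Path("/tmp/ab_smoke_seg_cap24_top2")
STRATEGIES = ("hybrid", "high_energy")
# Assumed key names; adjust if eval.json uses a different schema.
METRIC_KEYS = ("num_queries", "top1", "topk")


def read_eval(work_root: Path, strategy: str) -> Optional[dict]:
    """Return the parsed eval.json for one strategy, or None if unavailable."""
    path = work_root / strategy / "fma_reports_smoke" / "eval.json"
    if not path.is_file():
        return None
    try:
        data = json.loads(path.read_text(encoding="utf-8"))
    except json.JSONDecodeError:
        return None  # the file may still be mid-write
    return data if isinstance(data, dict) else None


def summarize(work_root: Path = WORK_ROOT) -> dict:
    """Report per-strategy state plus whether the final report.json exists."""
    status = {}
    for strategy in STRATEGIES:
        data = read_eval(work_root, strategy)
        if data is None:
            status[strategy] = {"state": "pending"}
        else:
            metrics = {key: data.get(key) for key in METRIC_KEYS}
            status[strategy] = {"state": "done", **metrics}
    status["report_ready"] = (work_root / "report.json").is_file()
    return status


if __name__ == "__main__":
    print(json.dumps(summarize(), indent=2))
```

A strategy reported as `pending` only means its `eval.json` is missing or unreadable. The `pgrep` command in the handoff section is still needed to tell a run that is in progress from one that has died.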