Preserve restart-safe handoff for the capped FMA benchmark

Record the latest delivered benchmark evidence, active work-root, partial results, and exact resume commands so a new session can continue without rediscovering context.

Constraint: User requested immediate delivery artifacts before the long benchmark fully finishes
Rejected: Wait for the entire cap16 benchmark to finish before handing off | Would delay delivery and risk losing resumable context
Confidence: high
Scope-risk: narrow
Directive: Update the handoff again once high_energy and repeated_section_aware finish on cap16
Tested: Verified partial eval files for hybrid and beat_aware; verified active cap16 benchmark processes; verified session-handoff contains resume commands and partial scores
Not-tested: Final multi-strategy cap16 ranking because high_energy and repeated_section_aware are still running

Commit 2c909862 ... 2c9098625a13e9109e42d89fd9761db1cf70ec08 authored 2026-06-02 17:22:44 +0800 by cnb.bofCdSsphPA
Showing 2 changed files with 141 additions and 0 deletions
docs/CHANGELOG.md
docs/session-handoff.md
--- a/docs/CHANGELOG.md
+++ b/docs/CHANGELOG.md
@@ -2,6 +2,32 @@

 ## 2026-06-02

+### Stage: Deliver a resume handoff for the current segmentation benchmark
+
+Completed:
+- Updated [session-handoff.md](./session-handoff.md)
+- Recorded the latest key commits:
+  - `6232787`
+  - `f04a314`
+  - `d7a0894`
+  - `b6cdf66`
+- Recorded the resume entry point for the medium-scale capped benchmark on real FMA data
+- Wrote down the partial results obtained so far:
+  - `hybrid`: `num_queries=12`, `top1=1.0`, `topk=1.0`
+  - `beat_aware`: `num_queries=12`, `top1=0.9167`, `topk=1.0`
+
+Verification:
+- Running processes confirmed:
+  - `scripts/ab_smoke_segmentation.py ... --work-root /tmp/ab_smoke_seg_cap16`
+  - The `high_energy` strategy is still in progress
+- Evaluation files confirmed on disk:
+  - `/tmp/ab_smoke_seg_cap16/hybrid/fma_reports_smoke/eval.json`
+  - `/tmp/ab_smoke_seg_cap16/beat_aware/fma_reports_smoke/eval.json`
+
+Conclusion:
+- Even if the current session is interrupted right now, the handoff material needed to resume already exists
+- A new session can pick up directly from `/tmp/ab_smoke_seg_cap16` and `docs/session-handoff.md` without reconstructing context
+
 ### Stage: Add a fair query cap to segmentation strategy evaluation and clarify the current state of librosa segmentation

 Completed:
--- a/docs/session-handoff.md
+++ b/docs/session-handoff.md
@@ -254,10 +254,125 @@

 Recent key commits worth reading first:

+- `6232787` Make segmentation strategy benchmarks comparable under fixed query budgets
+- `f04a314` Benchmark real FMA segmentation strategies on a capped smoke subset
+- `d7a0894` Favor beat-aligned candidate segments for music ACR training/query generation
+- `b6cdf66` Add high-energy and onset-aware music segment selection
 - `d221852` Add explicit drop zones for real open-music corpora
 - `eee15ac` Automate the full open-dataset smoke workflow behind one command
 - `8795907` Generate release artifacts for the open-dataset smoke path
 - `dc9ef1b` Close the open-dataset smoke loop through evaluation
+
+---
+
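+## 9. Latest real-data segmentation benchmark status (resume here first after a restart)
+
+### Latest established facts
+
+The project is no longer limited to `random` segmentation:
+
+- Training and external query generation support: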
+  - `random`
+  - `silence_aware`
+  - `high_energy`
+  - `onset_aware`
+  - `beat_aware`
+  - `repeated_section_aware`
+  - `hybrid`
+- `librosa` routines already integrated:
+  - `effects.split`
+  - `onset.onset_detect`
+  - `beat.beat_track`
+  - `feature.chroma_cqt`
+
+### Completed small-scale capped validation
+
+Command:
+
+```bash
+cd /workspace/acr-engine
+/usr/local/miniconda3/bin/python scripts/ab_smoke_segmentation.py \
+  --dataset fma \
+  --input-dir data/raw/fma_small_audio \
+  --work-root /tmp/ab_smoke_cap \
+  --subset-size 6 \
+  --query-duration 8 \
+  --train-epochs 1 \
+  --batch-size 2 \
+  --device cpu \
+  --strategies hybrid \
+  --max-test-queries 5 \
+  --output-json /tmp/ab_smoke_cap/report.json
+```
+
+Result:
+- `max_test_queries = 5`
+- `num_queries = 5`
+- `top1 = 1.0`
+- `topk = 1.0`
+
+### Medium-scale capped FMA benchmark in progress
+
+Command:
+
+```bash
+cd /workspace/acr-engine
+/usr/local/miniconda3/bin/python scripts/ab_smoke_segmentation.py \
+  --dataset fma \
+  --input-dir data/raw/fma_small_audio \
+  --work-root /tmp/ab_smoke_seg_cap16 \
+  --subset-size 16 \
+  --query-duration 8 \
+  --train-epochs 1 \
+  --batch-size 2 \
+  --device cpu \
+  --strategies hybrid beat_aware high_energy repeated_section_aware \
+  --max-test-queries 12 \
+  --output-json /tmp/ab_smoke_seg_cap16/report.json
+```
+
+Partial results obtained at the time of this handoff:
+
+| Strategy | num_queries | top1 | topk | Status |
+|---|---:|---:|---:|---|
+| `hybrid` | 12 | 1.0 | 1.0 | Done |
+| `beat_aware` | 12 | 0.9167 | 1.0 | Done |
+| `high_energy` | - | - | - | In progress |
+| `repeated_section_aware` | - | - | - | Not started / not finished |
+
+### First actions after a restart
+
+1. First check whether the benchmark is still running:
+
+```bash
+pgrep -af 'ab_smoke_seg_cap16|external_adapters.py smoke-local fma /tmp/ab_smoke_seg_cap16|evaluate.py --data /tmp/ab_smoke_seg_cap16|run_demo.py build-index --data /tmp/ab_smoke_seg_cap16'
+```
+
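+2. If it is still running, wait for `/tmp/ab_smoke_seg_cap16/report.json`
+3. If it was interrupted:
+   - Keep the existing `/tmp/ab_smoke_seg_cap16/hybrid` and `/tmp/ab_smoke_seg_cap16/beat_aware` results as a manual record
+   - Rerun the remaining strategies, or run a single one on its own: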
+
+```bash
+cd /workspace/acr-engine
+/usr/local/miniconda3/bin/python src/data/external_adapters.py smoke-local \
+  fma /tmp/ab_smoke_seg_cap16/subset_audio \
+  --output-root /tmp/ab_smoke_seg_cap16/high_energy \
+  --eval-ratio 0.2 \
+  --query-duration 8.0 \
+  --query-strategy high_energy \
+  --segment-strategy high_energy \
+  --train-epochs 1 \
+  --batch-size 2 \
+  --device cpu \
+  --max-test-queries 12 \
+  --seed 42
+```
+
+4. Once the full results are available:
+   - Update [open-dataset-workflow.md](./open-dataset-workflow.md)
+   - Update [CHANGELOG.md](./CHANGELOG.md)
+   - commit + push
 - `b766c74` Make open-dataset manifests trainable end to end
 - `fa23144` Add a single-page open dataset workflow for training prep
 - `af33be3` Condense docs and add manifest validation before training
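
The restart procedure in the handoff (check what finished, then rerun only what is missing) could be scripted roughly as follows. This is a minimal sketch, not part of the commit: `summarize` is a hypothetical helper, and it assumes each strategy writes `<work-root>/<strategy>/fma_reports_smoke/eval.json` as a JSON object with top-level `num_queries`, `top1`, and `topk` keys. The path layout matches the two files listed in the changelog, but the key names are inferred from the reported scores and are not confirmed by the diff.

```python
import json
from pathlib import Path

# Strategy names taken from the cap16 benchmark command in the handoff.
STRATEGIES = ["hybrid", "beat_aware", "high_energy", "repeated_section_aware"]


def summarize(work_root, strategies=STRATEGIES):
    """Split strategies into finished (with metrics) and pending.

    A strategy counts as finished only if its eval.json exists and parses
    as a JSON object; anything else is treated as pending.
    """
    finished, pending = {}, []
    for name in strategies:
        # Assumed layout: <work_root>/<strategy>/fma_reports_smoke/eval.json
        eval_path = Path(work_root) / name / "fma_reports_smoke" / "eval.json"
        try:
            data = json.loads(eval_path.read_text(encoding="utf-8"))
        except (OSError, ValueError):
            # Missing, unreadable, or half-written file: needs a rerun.
            pending.append(name)
            continue
        if not isinstance(data, dict):
            pending.append(name)
            continue
        # Assumed metric keys; absent keys are reported as None.
        finished[name] = {k: data.get(k) for k in ("num_queries", "top1", "topk")}
    return finished, pending


if __name__ == "__main__":
    finished, pending = summarize("/tmp/ab_smoke_seg_cap16")
    for name, metrics in finished.items():
        print(f"{name}: {metrics}")
    print("pending:", " ".join(pending) if pending else "(none)")
```

The pending list can be passed to `--strategies` when rerunning `scripts/ab_smoke_segmentation.py`. Treating an unparsable `eval.json` as pending is a deliberate choice here: a file that was being written when the session died should trigger a rerun rather than be recorded as a result.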