Preserve restartable delivery state before the long benchmark finishes

Constraint: The cap48 seed=999 benchmark is still running, so this checkpoint must avoid unverified algorithm conclusions
Rejected: Wait for the CPU benchmark to finish | Would delay handoff and leave the next session without a clean restart package
Confidence: high
Scope-risk: narrow
Directive: Keep future doc-only checkpoints surgically staged and do not add data/raw, external_smoke, /tmp outputs, or model artifacts
Tested: Verified staged diff only includes AGENT memory, handoff, changelog, and changelist docs; confirmed /tmp cap48 seed=999 report is not ready yet
Not-tested: The in-flight cap48 seed=999 benchmark result and any follow-up aggregate metrics

Commit 0d40b05c42e1ec04ee3b34f5bc10aa90cd7ed652 authored 2026-06-02 18:20:30 +0800 by cnb.bofCdSsphPA
Showing 5 changed files with 241 additions and 0 deletions
AGENT.md
docs/CHANGELOG.md
docs/changelist-2026-06-02.md
docs/delivery-handoff-2026-06-02.md
docs/session-handoff.md
--- a/AGENT.md 0 → 100644
+++ b/AGENT.md 0 → 100644
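+# AGENT Memory / Development Resume Notes
+> Updated: 2026-06-02
+> Purpose: let a new session pick up the current development rhythm within 1 to 3 minutes.
+## 1. Long-term user preferences
+- Respond in Chinese by default.
+- Push forward autonomously; do not stop frequently to ask questions.
+- After completing each milestone checklist:
+  1. Update `docs/CHANGELOG.md`
+  2. `git commit`
+  3. `git push origin main`
+- Always use this Python: `/usr/local/miniconda3/bin/python`
+- Documentation priority: diagrams > tables > prose > detailed appendices.
+- Keep docs condensed and categorized; avoid too many documents at the same level.
+- For links to external or internal docs, prefer clickable relative-path links; do not just wrap the path in backticks.
+- Never accidentally commit large data files, training artifacts, `/tmp` results, or `__pycache__`.
+## 2. Current project mainline
+This is a music ACR / retrieval project that is being industrialized. The mainline is:
+- Open-dataset ingestion
+- Audio segmentation strategy optimization
+- A closed loop of training / index building / evaluation
+- Better accuracy and robustness to confusion
+- A complete documentation and handoff system
+## 3. Key completed work
+- Implemented segmentation strategies: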
+  - `random`
+  - `silence_aware`
+  - `high_energy`
+  - `onset_aware`
+  - `beat_aware`
+  - `repeated_section_aware`
+  - `hybrid`
+- Implemented fair-evaluation controls:
+  - `evaluate.py --max-queries --seed`
+  - `smoke-local --max-test-queries`
+  - `scripts/ab_smoke_segmentation.py --max-test-queries`
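+- Strengthened the data spec, the pgvector guide, and the FMA / open-data workflow docs.
+## 4. Current empirical findings
+- On small real-FMA smoke runs, several strategies can score near-perfect, so these runs cannot decide the default strategy.
+- At cap48 scale, results are sensitive to the seed.
+- In the two cap48 aggregates known so far:
+  - `high_energy` is more stable
+  - `hybrid` has a higher ceiling but larger variance
+- The default-strategy decision should rest on a multi-seed aggregate, not on a single run.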
+## 5. Current resume priorities
+1. Finish / confirm the `cap48 top2 seed=999` result.
+2. Compute the 3-seed aggregate.
+3. Update:
+   - `docs/open-dataset-workflow.md`
+   - `docs/session-handoff.md`
+   - `docs/CHANGELOG.md`
+4. Commit and push.
+5. In the next round, move on to:
+   - cap64 benchmark
+   - bucket/style-aware benchmark
+   - industrial-grade hard-negative / confusion set design
+## 6. High-risk cautions
+- `git status` usually lists many of:
+  - `acr-engine/data/raw/...`
+  - `acr-engine/data/external_smoke/...`
+  - model checkpoints
+  - `__pycache__`
+- When committing, explicitly `git add` each target doc file; never use broad, stage-everything commands.
+## 7. Key documentation entry points
+- `docs/README.md`
+- `docs/open-dataset-workflow.md`
+- `docs/dataset-spec.md`
+- `docs/training-data-and-pgvector-guide.md`
+- `docs/session-handoff.md`
+- `docs/CHANGELOG.md`
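Section 6 of `AGENT.md` asks for explicit staging of doc files only. A minimal sketch of a guard to run before committing, assuming the forbidden locations are the ones listed in section 6; `check_staged_paths` is a hypothetical helper, and the checkpoint extensions `.pt`, `.pth`, `.ckpt` are an assumption because the source does not name the checkpoint files.

```shell
#!/usr/bin/env bash
# check_staged_paths (hypothetical helper): read staged paths, one per line,
# on stdin and fail if any falls under a location AGENT.md section 6 forbids.
# The checkpoint extensions (.pt/.pth/.ckpt) are an assumption; adjust them to
# whatever the training scripts actually write.
check_staged_paths() {
  local bad
  # grep exits 1 when nothing matches, which is the good case here.
  bad=$(grep -E '(^|/)(data/raw|data/external_smoke|__pycache__)/|\.(pt|pth|ckpt)$' || true)
  if [ -n "$bad" ]; then
    printf 'Refusing to commit these paths:\n%s\n' "$bad" >&2
    return 1
  fi
  return 0
}
```

Run it between the explicit `git add -- <doc files>` and `git commit`, for example `git diff --cached --name-only | check_staged_paths && git commit`. `git diff --cached --name-only` prints paths relative to the repository root, which is why the patterns also accept a leading directory such as `acr-engine/`.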
--- a/docs/CHANGELOG.md
+++ b/docs/CHANGELOG.md
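+## 2026-06-02 Delivery checkpoint: handoff / changelist / agent memory
+Completed:
+- Added a root-level `AGENT.md` that records current development preferences, commit habits, resume priorities, and pitfalls to avoid.
+- Added `docs/changelist-2026-06-02.md`, a file-level change description for this delivery.
+- Added `docs/delivery-handoff-2026-06-02.md`, for quick takeover by a new session.
+- Extended `docs/session-handoff.md`: current blockers, the running benchmark, next-step commands, and items that must not be committed.
+Current blockers:
+- The `cap48 top2 seed=999` benchmark is still running and has not yet produced the final `report.json`.
+- The repository contains many untracked data and model artifacts, so at this stage only docs should be committed.
+Delivery notes:
+- This commit aims for a "resumable handoff" and does not wait for the long CPU benchmark to finish.
+- On entry, the next session should first check:
+  - `/tmp/ab_smoke_seg_cap48_top2_seed999/report.json`
+  - the related `eval.json` files
+  - whether the process is still running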
 # Changelog
 ## 2026-06-02
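The CHANGELOG entry asks the next session to check `report.json` and the related `eval.json` files. A sketch of that file check, assuming the layout given in `docs/delivery-handoff-2026-06-02.md` (`report.json` at the work root, one `<strategy>/fma_reports_smoke/eval.json` per strategy); `report_status` is a hypothetical helper, and it only checks that each file exists and is non-empty, not that its contents are valid.

```shell
#!/usr/bin/env bash
# report_status (hypothetical helper): list which expected outputs of one
# benchmark work root exist (non-empty). Returns 0 only if all are present.
#   $1     work root, e.g. /tmp/ab_smoke_seg_cap48_top2_seed999
#   $2...  strategy names (default: hybrid high_energy, the running top2 pair)
report_status() {
  local root=$1
  shift
  [ "$#" -gt 0 ] || set -- hybrid high_energy
  local -a files=("$root/report.json")
  local s f missing=0
  for s in "$@"; do
    files+=("$root/$s/fma_reports_smoke/eval.json")
  done
  for f in "${files[@]}"; do
    if [ -s "$f" ]; then
      echo "OK      $f"
    else
      echo "MISSING $f"
      missing=$((missing + 1))
    fi
  done
  [ "$missing" -eq 0 ]
}
```

Usage: `report_status /tmp/ab_smoke_seg_cap48_top2_seed999`; pass strategy names explicitly for runs that compare a different set.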
--- a/docs/changelist-2026-06-02.md 0 → 100644
+++ b/docs/changelist-2026-06-02.md 0 → 100644
+# Changelist / 2026-06-02
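+## Goal of this delivery
+Without waiting for the long benchmark to finish, deliver a set of resume docs complete enough that a new session immediately knows:
+- what is done
+- where work is blocked
+- what to run next
+- which files may be committed and which may not
+## File-level changes
+| File | Change |
+|---|---|
+| [../AGENT.md](../AGENT.md) | New: development preferences and resume memory |
+| [./session-handoff.md](./session-handoff.md) | Added current blockers, to-dos, and resume commands |
+| [./delivery-handoff-2026-06-02.md](./delivery-handoff-2026-06-02.md) | New: quick-takeover summary |
+| [./CHANGELOG.md](./CHANGELOG.md) | Records this delivery checkpoint |
+## Not included in this commit
+- FMA / MTG-Jamendo raw data
+- Audio and model artifacts under `data/external_smoke`
+- `/tmp` benchmark outputs
+- `__pycache__`
+- checkpoint / index directories
+## Currently running job
+- `cap48 top2 seed=999`
+- Launch command: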
+```bash
+cd /workspace/acr-engine
+/usr/local/miniconda3/bin/python scripts/ab_smoke_segmentation.py \
+  --dataset fma \
+  --input-dir data/raw/fma_small_audio \
+  --work-root /tmp/ab_smoke_seg_cap48_top2_seed999 \
+  --subset-size 48 \
+  --query-duration 8 \
+  --train-epochs 1 \
+  --batch-size 2 \
+  --device cpu \
+  --strategies hybrid high_energy \
+  --max-test-queries 24 \
+  --seed 999 \
+  --output-json /tmp/ab_smoke_seg_cap48_top2_seed999/report.json
+```
+## Suggested next steps
+1. Check whether `seed=999` has finished.
+2. Produce the 3-seed aggregate.
+3. Write the results back to workflow / handoff / changelog.
+4. Commit and push.
+5. Then start the cap64 or bucket benchmark.
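Step 1 can be automated with a polling loop instead of repeated manual checks. A minimal sketch; `wait_for_report` and its default timeout (3600 s) and interval (60 s) are hypothetical choices, and treating a non-empty file as "done" assumes the benchmark writes `report.json` only when it finishes, which the source does not confirm.

```shell
#!/usr/bin/env bash
# wait_for_report (hypothetical helper): poll until a report file exists and is
# non-empty, then print its path. Returns 1 if the timeout elapses first.
#   $1  report path
#   $2  timeout in seconds (default 3600, an arbitrary choice)
#   $3  poll interval in seconds (default 60, an arbitrary choice)
wait_for_report() {
  local report=$1 timeout=${2:-3600} interval=${3:-60} waited=0
  until [ -s "$report" ]; do
    if [ "$waited" -ge "$timeout" ]; then
      echo "TIMEOUT: $report still missing after ${waited}s" >&2
      return 1
    fi
    sleep "$interval"
    waited=$((waited + interval))
  done
  echo "$report"
}
```

Usage: `wait_for_report /tmp/ab_smoke_seg_cap48_top2_seed999/report.json 7200 120 && echo ready`. A timeout only means the file has not appeared yet; check whether the process is still alive before concluding the run failed.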
--- a/docs/delivery-handoff-2026-06-02.md 0 → 100644
+++ b/docs/delivery-handoff-2026-06-02.md 0 → 100644
+# Delivery Handoff / 2026-06-02
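+## One-page takeover
+What can be delivered now is not a "final algorithm conclusion" but a "continuously resumable engineering state":
+- The main documentation structure is in place
+- The data spec, input/output, and pgvector notes are complete
+- Segmentation strategies and fair-evaluation controls are implemented
+- The latest, somewhat larger benchmark is still running, and its results are not final
+## Done
+- Multiple music-aware segmentation strategies are wired into training and query generation.
+- Smoke A/B on a real FMA mini-subset has been validated over multiple rounds.
+- `high_energy` and `hybrid` are the strongest current candidates.
+- cap48 results clearly show seed sensitivity.
+- The docs have been condensed into a navigable structure.
+## Current blockers
+### Blocker 1: seed=999 benchmark not finished
+To check: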
+- `/tmp/ab_smoke_seg_cap48_top2_seed999/report.json`
+- `/tmp/ab_smoke_seg_cap48_top2_seed999/hybrid/fma_reports_smoke/eval.json`
+- `/tmp/ab_smoke_seg_cap48_top2_seed999/high_energy/fma_reports_smoke/eval.json`
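+### Blocker 2: very noisy working tree
+There are many untracked or modified data/artifact files, so commits must stage only the specific doc files.
+## Suggested takeover order
+1. Check whether the process is still running.
+2. If it has finished, compute the 3-seed aggregate.
+3. Write the conclusions back to:
+   - [open-dataset-workflow.md](./open-dataset-workflow.md)
+   - [session-handoff.md](./session-handoff.md)
+   - [CHANGELOG.md](./CHANGELOG.md)
+4. Commit the docs on their own.
+5. Then continue with the next benchmark round.
+## Recommended check commands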
+```bash
+pgrep -af 'ab_smoke_seg_cap48_top2_seed999|external_adapters.py smoke-local fma /tmp/ab_smoke_seg_cap48_top2_seed999|evaluate.py --data /tmp/ab_smoke_seg_cap48_top2_seed999|run_demo.py build-index --data /tmp/ab_smoke_seg_cap48_top2_seed999|train.py --data /tmp/ab_smoke_seg_cap48_top2_seed999'
+```
+```bash
+test -f /tmp/ab_smoke_seg_cap48_top2_seed999/report.json && cat /tmp/ab_smoke_seg_cap48_top2_seed999/report.json || echo NO_REPORT
+```
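Every alternative in the `pgrep` pattern above contains the substring `ab_smoke_seg_cap48_top2_seed999`, so its first alternative alone already matches the same processes. A sketch that folds both check commands into one status word; `job_state` is a hypothetical helper, and because `pgrep -f` matches against the full command line, any other process that mentions the tag (for example a `tail -f` on a log inside the work root) also counts as RUNNING.

```shell
#!/usr/bin/env bash
# job_state (hypothetical helper): print one word describing a benchmark run.
#   $1  substring of the command line shared by all of the run's processes
#   $2  path of the report the run writes when it finishes
job_state() {
  local tag=$1 report=$2
  if pgrep -f "$tag" >/dev/null 2>&1; then
    echo RUNNING            # some process command line contains the tag
  elif [ -s "$report" ]; then
    echo DONE               # nothing running and a non-empty report exists
  else
    echo STOPPED_NO_REPORT  # nothing running and no report: likely crashed or killed
  fi
}
```

Usage: `job_state ab_smoke_seg_cap48_top2_seed999 /tmp/ab_smoke_seg_cap48_top2_seed999/report.json`. STOPPED_NO_REPORT is the case worth investigating before relaunching.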
--- a/docs/session-handoff.md
+++ b/docs/session-handoff.md
@@ -209,6 +209,26 @@
 ---
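+## 0. Current delivery status (this checkpoint)
+### Deliverable now
+- The documentation system, data spec, segmentation strategies, and fair-evaluation controls are in place.
+- A new session can continue from this file and `AGENT.md`.
+### Current blockers
+- `cap48 top2 seed=999` is still running or awaiting wrap-up; the final 3-seed aggregate conclusion has not been written back yet.
+- The working tree contains many data and model artifacts; for now, commit only explicitly selected doc files.
+### Top-priority to-dos
+1. Check whether `/tmp/ab_smoke_seg_cap48_top2_seed999/report.json` has been generated.
+2. If it has, compute the aggregate over the three seeds `default`, `123`, and `999`.
+3. Update `open-dataset-workflow.md`, `session-handoff.md`, and `CHANGELOG.md`.
+4. Commit and push.
+### Do not do this when resuming
+- Do not run `git add .`
+- Do not commit `data/raw`, `data/external_smoke`, `/tmp`, `__pycache__`, or model and index artifacts
 ## 7. Most important current to-dos
 ### Priority A: replace with real open data
@@ -618,3 +638,13 @@ seed123 final conclusion:
 - After the FMA download finishes, run directly: [acr-engine/scripts/fma_postdownload_ready.py](../acr-engine/scripts/fma_postdownload_ready.py)
 - To wait for the download to finish and then switch automatically to extraction and readiness checks, run: [acr-engine/scripts/wait_for_fma_and_prepare.py](../acr-engine/scripts/wait_for_fma_and_prepare.py)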
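+## 99. Firm conclusions of this checkpoint
+- This checkpoint completes the "resumable handoff" delivery.
+- It did not wait for the long `seed=999` CPU benchmark, so the default algorithm strategy does not change on any new conclusion.
+- The latest safe statement is still:
+  - `high_energy` is more stable across the two known cap48 aggregates
+  - `hybrid` has a higher ceiling but larger variance
+  - the final default strategy depends on aggregates over more seeds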
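Once the `seed=999` run is done, the 3-seed aggregate summarizes each strategy over seeds `default`, `123`, and `999`. The `report.json` schema is not shown in this commit, so this sketch assumes the per-seed scores (for example one accuracy value per seed for one strategy) have already been extracted, one number per line; `seed_aggregate` is a hypothetical helper. The standard deviation speaks to stability and the maximum to the ceiling, the two properties the conclusions above compare.

```shell
#!/usr/bin/env bash
# seed_aggregate (hypothetical helper): read one score per line on stdin
# (one value per seed for a single strategy) and print count, mean,
# sample standard deviation, min, and max. Exits 1 on empty input.
seed_aggregate() {
  awk '
    NF { v[n++] = $1 + 0 }          # skip blank lines, coerce to number
    END {
      if (n == 0) { print "n=0"; exit 1 }
      sum = 0; min = v[0]; max = v[0]
      for (i = 0; i < n; i++) {
        sum += v[i]
        if (v[i] < min) min = v[i]
        if (v[i] > max) max = v[i]
      }
      mean = sum / n
      ss = 0
      for (i = 0; i < n; i++) ss += (v[i] - mean) ^ 2
      # Sample standard deviation (divide by n - 1); 0 for a single seed.
      std = (n > 1) ? sqrt(ss / (n - 1)) : 0
      printf "n=%d mean=%.4f std=%.4f min=%.4f max=%.4f\n", n, mean, std, min, max
    }'
}
```

Usage: `printf '%s\n' <score_default> <score_123> <score_999> | seed_aggregate`, run once per strategy. With only three seeds the standard deviation is itself noisy, which is another reason the conclusions above stay provisional.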