Commit 75fa5e93 75fa5e932efe8d1c0980dba3081216c18c1af885 by cnb.bofCdSsphPA

Provide a runnable semantic-bucket template so the next benchmark step can start immediately

Constraint: Keep the checkpoint lightweight and avoid touching dataset or model artifacts
Rejected: Wait to add buckets until automatic semantic labeling exists | Manual curated buckets are enough to unblock the next session now
Confidence: high
Scope-risk: narrow
Directive: Use the template as a curated benchmark scaffold, not as evidence that filenames imply semantics
Tested: Parsed the new JSON template; ran ab_smoke_bucketed.py --help; rechecked targeted relative links
Not-tested: Did not launch a new semantic bucket benchmark run in this checkpoint
1 parent 1bdca61b
......@@ -53,6 +53,7 @@
## 5. Current resume priorities
1. Upgrade the toy prefix buckets to semantic buckets.
- Template entry point: `acr-engine/configs/buckets/fma_semantic_bucket_template.json`
2. Add the cap64 multi-seed aggregate.
3. Update:
- `docs/open-dataset-workflow.md`
......
{
"notes": {
"purpose": "Template for semantic/style-aware bucket benchmarking on local FMA-like trees.",
"how_to_use": "Replace placeholder glob patterns with your own curated track groups before running ab_smoke_bucketed.py.",
"warning": "Do not treat filename prefixes as product semantics; this file is for manually curated semantic buckets."
},
"buckets": [
{
"name": "energy_dominant",
"patterns": [
"fma_small/*/REPLACE_WITH_HIGH_ENERGY_TRACKS_*.mp3"
],
"subset_size": 16,
"label_hint": "chorus-heavy or consistently high-energy songs"
},
{
"name": "repeated_section_rich",
"patterns": [
"fma_small/*/REPLACE_WITH_REPEATED_SECTION_TRACKS_*.mp3"
],
"subset_size": 16,
"label_hint": "clear repeating hook/chorus structure"
},
{
"name": "steady_beat_regular_meter",
"patterns": [
"fma_small/*/REPLACE_WITH_STEADY_BEAT_TRACKS_*.mp3"
],
"subset_size": 16,
"label_hint": "stable beat, strong downbeat, regular meter"
},
{
"name": "hard_negative_confusable",
"patterns": [
"fma_small/*/REPLACE_WITH_CONFUSABLE_TRACKS_*.mp3"
],
"subset_size": 16,
"label_hint": "sonically similar tracks likely to trigger confusion"
}
]
}
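## 2026-06-02 Semantic bucket template delivery checkpoint
Completed:
- Added a semantic bucket config template: `acr-engine/configs/buckets/fma_semantic_bucket_template.json`
- Added the template entry point and run command to the workflow, benchmark, and handoff docs.
First batch of buckets covered by the template: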
- `energy_dominant`
- `repeated_section_rich`
- `steady_beat_regular_meter`
- `hard_negative_confusable`
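Conclusions:
- The next session no longer needs to design the bucket structure from scratch.
- It can replace the globs in the template directly and start running bucket benchmarks that carry more business meaning.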
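
Before launching a run, it may help to confirm that no placeholder glob is left in the config. The following is a minimal sketch, not part of the repository: the helper name `unreplaced_patterns` and the inline sample (including the `curated_chorus_*.mp3` glob) are hypothetical, and it assumes placeholders can be recognized by the `REPLACE_WITH_` marker that the template uses.

```python
import json

# Marker shared by the template's placeholder globs.
PLACEHOLDER = "REPLACE_WITH_"


def unreplaced_patterns(config):
    """Return {bucket name: [patterns that still contain the placeholder]}."""
    pending = {}
    for bucket in config.get("buckets", []):
        stale = [p for p in bucket.get("patterns", []) if PLACEHOLDER in p]
        if stale:
            pending[bucket["name"]] = stale
    return pending


# Inline sample: the first bucket is copied from the template, the second
# uses a hypothetical curated glob standing in for a real replacement.
sample = json.loads("""
{
  "buckets": [
    {"name": "energy_dominant",
     "patterns": ["fma_small/*/REPLACE_WITH_HIGH_ENERGY_TRACKS_*.mp3"],
     "subset_size": 16},
    {"name": "repeated_section_rich",
     "patterns": ["fma_small/*/curated_chorus_*.mp3"],
     "subset_size": 16}
  ]
}
""")

for name, patterns in unreplaced_patterns(sample).items():
    print(f"{name}: still a placeholder -> {patterns}")
```

To check the real template, load `acr-engine/configs/buckets/fma_semantic_bucket_template.json` with `json.load` instead of the inline sample.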
## 2026-06-02 Bucket/style-aware benchmark summary completed checkpoint
Completed:
......
......@@ -117,3 +117,9 @@ flowchart LR
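- At the aggregate level, both have a `mean_top1` of `1.0`.
So the current value of the bucket benchmark is not to "pick a single winner" but to provide a reusable execution framework for the semantic and hard-case buckets that follow.
Recommended template: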
- [../acr-engine/configs/buckets/fma_semantic_bucket_template.json](../acr-engine/configs/buckets/fma_semantic_bucket_template.json)
It is not an automatic labeler; it is an execution template for "curate the buckets by hand first, then reuse the unified benchmark flow".
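
The "curate by hand, then reuse the flow" idea can be sketched as a small glob resolver. This is an illustration only: how `ab_smoke_bucketed.py` actually resolves `patterns` and applies `subset_size` is not shown in this change, so the function `resolve_bucket`, the choice to resolve patterns relative to an input directory, the sorted truncation, and the `curated_energy_*.mp3` file names are all assumptions.

```python
import tempfile
from pathlib import Path


def resolve_bucket(input_dir, bucket, default_subset_size=16):
    """Expand a bucket's glob patterns under input_dir and cap the result.

    Assumption: patterns are relative to input_dir, and the first
    subset_size matches (in sorted order) form the bucket.
    """
    matched = set()
    for pattern in bucket["patterns"]:
        matched.update(p for p in Path(input_dir).glob(pattern) if p.is_file())
    limit = bucket.get("subset_size", default_subset_size)
    return sorted(matched)[:limit]


# Demo on a throwaway tree with hypothetical file names.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for folder, stem in [("000", "curated_energy_a"), ("000", "curated_energy_b"),
                         ("001", "curated_energy_c"), ("001", "other_track")]:
        path = root / "fma_small" / folder / f"{stem}.mp3"
        path.parent.mkdir(parents=True, exist_ok=True)
        path.touch()

    bucket = {"name": "energy_dominant",
              "patterns": ["fma_small/*/curated_energy_*.mp3"],
              "subset_size": 2}
    selected = [p.name for p in resolve_bucket(root, bucket)]
    print(selected)
```

Sorting before truncation keeps the selection deterministic across runs, which matters when bucket results are compared between sessions.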
......
......@@ -367,3 +367,32 @@ cd acr-engine
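Current conclusions:
- The bucket baseline already reliably reproduces the observation that different subsets select different winners.
- The next step is not more prefix toy buckets but an upgrade to buckets that carry more business meaning.
Start directly from the template: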
- [../acr-engine/configs/buckets/fma_semantic_bucket_template.json](../acr-engine/configs/buckets/fma_semantic_bucket_template.json)
Hand-pick a batch of tracks first, then replace the globs with your own candidate sets, covering these buckets first:
1. `energy_dominant`
2. `repeated_section_rich`
3. `steady_beat_regular_meter`
4. `hard_negative_confusable`
Corresponding command:
```bash
cd /workspace/acr-engine
/usr/local/miniconda3/bin/python scripts/ab_smoke_bucketed.py \
--dataset fma \
--input-dir data/raw/fma_small_audio \
--bucket-config configs/buckets/fma_semantic_bucket_template.json \
--work-root /tmp/ab_smoke_bucketed_semantic \
--default-subset-size 16 \
--query-duration 8 \
--train-epochs 1 \
--batch-size 2 \
--device cpu \
--strategies high_energy hybrid \
--max-test-queries 8 \
--seed 42 \
--output-json /tmp/ab_smoke_bucketed_semantic/report.json
```
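
If the same run needs to be repeated across several seeds, the command line can be generated rather than edited by hand. The sketch below only builds and prints the commands; it does not launch anything. The flags are copied from the command shown in this document, while the helper `command_for_seed`, the seeds other than 42, and the per-seed `seed_<n>` work-root layout are assumptions made for illustration.

```python
import shlex

# Flags copied from the documented command; only the seed-dependent ones vary.
BASE_ARGS = [
    "/usr/local/miniconda3/bin/python", "scripts/ab_smoke_bucketed.py",
    "--dataset", "fma",
    "--input-dir", "data/raw/fma_small_audio",
    "--bucket-config", "configs/buckets/fma_semantic_bucket_template.json",
    "--default-subset-size", "16",
    "--query-duration", "8",
    "--train-epochs", "1",
    "--batch-size", "2",
    "--device", "cpu",
    "--strategies", "high_energy", "hybrid",
    "--max-test-queries", "8",
]


def command_for_seed(seed, work_root="/tmp/ab_smoke_bucketed_semantic"):
    """Build the argument list for one seed (hypothetical per-seed layout)."""
    run_root = f"{work_root}/seed_{seed}"
    return BASE_ARGS + [
        "--work-root", run_root,
        "--seed", str(seed),
        "--output-json", f"{run_root}/report.json",
    ]


# Seeds 43 and 44 are placeholders; pick whichever seeds the study needs.
for seed in (42, 43, 44):
    print(shlex.join(command_for_seed(seed)))
```

The printed commands use paths relative to `acr-engine`, so they are meant to be run from that directory, as in the original command.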
......
......@@ -255,6 +255,7 @@
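### Top-priority to-dos
1. Upgrade the completed toy bucket baseline to semantic buckets (style / structure / hard-case).
- Template: `acr-engine/configs/buckets/fma_semantic_bucket_template.json`
2. Compare the inconsistencies between cap48 and cap64 and add per-scale conclusions.
3. Keep adding cap64 multi-seed runs rather than keeping only a single seed.
4. Keep optimizing `hybrid`, focusing on reducing variance and improving stability on hard cases.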
......