Provide a runnable semantic-bucket template so the next benchmark step can start immediately
Constraint: Keep the checkpoint lightweight and avoid touching dataset or model artifacts
Rejected: Wait to add buckets until automatic semantic labeling exists | Manual curated buckets are enough to unblock the next session now
Confidence: high
Scope-risk: narrow
Directive: Use the template as a curated benchmark scaffold, not as evidence that filenames imply semantics
Tested: Parsed the new JSON template; ran ab_smoke_bucketed.py --help; rechecked targeted relative links
Not-tested: Did not launch a new semantic bucket benchmark run in this checkpoint
Showing 6 changed files with 94 additions and 0 deletions
| ... | @@ -53,6 +53,7 @@ | ... | @@ -53,6 +53,7 @@ |
| 53 | ## 5. Current resume priorities | 53 | ## 5. Current resume priorities |
| 54 | 54 | ||
| 55 | 1. Upgrade the toy prefix buckets to semantic buckets. | 55 | 1. Upgrade the toy prefix buckets to semantic buckets. |
| 56 | - Template entry point: `acr-engine/configs/buckets/fma_semantic_bucket_template.json` | ||
| 56 | 2. Add the cap64 multi-seed aggregate. | 57 | 2. Add the cap64 multi-seed aggregate. |
| 57 | 3. Update: | 58 | 3. Update: |
| 58 | - `docs/open-dataset-workflow.md` | 59 | - `docs/open-dataset-workflow.md` | ... | ... |
| 1 | { | ||
| 2 | "notes": { | ||
| 3 | "purpose": "Template for semantic/style-aware bucket benchmarking on local FMA-like trees.", | ||
| 4 | "how_to_use": "Replace placeholder glob patterns with your own curated track groups before running ab_smoke_bucketed.py.", | ||
| 5 | "warning": "Do not treat filename prefixes as product semantics; this file is for manually curated semantic buckets." | ||
| 6 | }, | ||
| 7 | "buckets": [ | ||
| 8 | { | ||
| 9 | "name": "energy_dominant", | ||
| 10 | "patterns": [ | ||
| 11 | "fma_small/*/REPLACE_WITH_HIGH_ENERGY_TRACKS_*.mp3" | ||
| 12 | ], | ||
| 13 | "subset_size": 16, | ||
| 14 | "label_hint": "chorus-heavy or consistently high-energy songs" | ||
| 15 | }, | ||
| 16 | { | ||
| 17 | "name": "repeated_section_rich", | ||
| 18 | "patterns": [ | ||
| 19 | "fma_small/*/REPLACE_WITH_REPEATED_SECTION_TRACKS_*.mp3" | ||
| 20 | ], | ||
| 21 | "subset_size": 16, | ||
| 22 | "label_hint": "clear repeating hook/chorus structure" | ||
| 23 | }, | ||
| 24 | { | ||
| 25 | "name": "steady_beat_regular_meter", | ||
| 26 | "patterns": [ | ||
| 27 | "fma_small/*/REPLACE_WITH_STEADY_BEAT_TRACKS_*.mp3" | ||
| 28 | ], | ||
| 29 | "subset_size": 16, | ||
| 30 | "label_hint": "stable beat, strong downbeat, regular meter" | ||
| 31 | }, | ||
| 32 | { | ||
| 33 | "name": "hard_negative_confusable", | ||
| 34 | "patterns": [ | ||
| 35 | "fma_small/*/REPLACE_WITH_CONFUSABLE_TRACKS_*.mp3" | ||
| 36 | ], | ||
| 37 | "subset_size": 16, | ||
| 38 | "label_hint": "sonically similar tracks likely to trigger confusion" | ||
| 39 | } | ||
| 40 | ] | ||
| 41 | } |
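The template's `warning` and `how_to_use` notes imply that every placeholder glob has to be replaced before a run. A minimal sketch of a check for leftover placeholders follows. It assumes only the JSON layout shown above; the helper name `unreplaced_buckets` and the second, already-curated pattern are hypothetical and not part of the repository.

```python
import json

# Marker shared by all placeholder globs in the template above.
PLACEHOLDER = "REPLACE_WITH_"


def unreplaced_buckets(config: dict) -> list[str]:
    """Return names of buckets that are empty or still hold a placeholder glob."""
    pending = []
    for bucket in config.get("buckets", []):
        patterns = bucket.get("patterns", [])
        if not patterns or any(PLACEHOLDER in p for p in patterns):
            pending.append(bucket["name"])
    return pending


# Inline excerpt so the sketch runs without the repository checkout.
# The second pattern is a hypothetical curated glob, used only for illustration.
template = json.loads("""
{
  "buckets": [
    {"name": "energy_dominant",
     "patterns": ["fma_small/*/REPLACE_WITH_HIGH_ENERGY_TRACKS_*.mp3"],
     "subset_size": 16},
    {"name": "hard_negative_confusable",
     "patterns": ["fma_small/000/000*.mp3"],
     "subset_size": 16}
  ]
}
""")

print(unreplaced_buckets(template))
```

Running the same function on the committed template would presumably list all four buckets, since none of them has been curated yet.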
| 1 | ## 2026-06-02 Checkpoint: semantic bucket template delivered | ||
| 2 | |||
| 3 | Completed: | ||
| 4 | - Added a semantic bucket config template: `acr-engine/configs/buckets/fma_semantic_bucket_template.json` | ||
| 5 | - Added the template entry point and the run command to the workflow / benchmark / handoff docs. | ||
| 6 | |||
| 7 | Initial buckets covered by the template: | ||
| 8 | - `energy_dominant` | ||
| 9 | - `repeated_section_rich` | ||
| 10 | - `steady_beat_regular_meter` | ||
| 11 | - `hard_negative_confusable` | ||
| 12 | |||
| 13 | Conclusions: | ||
| 14 | - The next session no longer has to design the bucket structure from scratch. | ||
| 15 | - The globs in the template can be replaced directly to start a more business-relevant bucket benchmark. | ||
| 16 | |||
| 1 | ## 2026-06-02 Checkpoint: bucket/style-aware benchmark summary completed | 17 | ## 2026-06-02 Checkpoint: bucket/style-aware benchmark summary completed |
| 2 | 18 | ||
| 3 | Completed: | 19 | Completed: | ... | ... |
| ... | @@ -117,3 +117,9 @@ flowchart LR | ... | @@ -117,3 +117,9 @@ flowchart LR |
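| 117 | - At the aggregate level, `mean_top1` is `1.0` for both | 117 | - At the aggregate level, `mean_top1` is `1.0` for both |
| 118 | 118 | ||
| 119 | The current value of the bucket benchmark is therefore not to "pick a single winner" but to provide a reusable execution framework for the upcoming semantic and hard-case buckets. | 119 | The current value of the bucket benchmark is therefore not to "pick a single winner" but to provide a reusable execution framework for the upcoming semantic and hard-case buckets. |
| 120 | |||
| 121 | |||
| 122 | Recommended template: | ||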
| 123 | - [../acr-engine/configs/buckets/fma_semantic_bucket_template.json](../acr-engine/configs/buckets/fma_semantic_bucket_template.json) | ||
| 124 | |||
| 125 | It is not an automatic labeler. It is an execution template for "bucket the tracks by hand first, then reuse the unified benchmark flow". | ... | ... |
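The manual bucketing step described above could be sketched as follows: hand-picked patterns are merged into a copy of the template by bucket name. This is an illustration under stated assumptions, not the project's tooling. `fill_template`, the inlined two-bucket excerpt, and the track paths are all hypothetical.

```python
import copy
import json

# Two-bucket excerpt of the template, inlined so the sketch runs on its own.
TEMPLATE = {
    "buckets": [
        {"name": "energy_dominant",
         "patterns": ["fma_small/*/REPLACE_WITH_HIGH_ENERGY_TRACKS_*.mp3"],
         "subset_size": 16},
        {"name": "steady_beat_regular_meter",
         "patterns": ["fma_small/*/REPLACE_WITH_STEADY_BEAT_TRACKS_*.mp3"],
         "subset_size": 16},
    ]
}


def fill_template(template: dict, curated: dict) -> dict:
    """Return a copy of the template with patterns replaced per bucket name.

    Buckets absent from `curated` keep their placeholder patterns.
    Unknown bucket names raise KeyError so typos do not pass silently.
    """
    filled = copy.deepcopy(template)
    known = {b["name"] for b in filled["buckets"]}
    unknown = set(curated) - known
    if unknown:
        raise KeyError(f"unknown bucket names: {sorted(unknown)}")
    for bucket in filled["buckets"]:
        if bucket["name"] in curated:
            bucket["patterns"] = list(curated[bucket["name"]])
    return filled


# Hypothetical hand-picked tracks; placeholders, not real curation results.
curated = {
    "energy_dominant": ["fma_small/000/000002.mp3", "fma_small/000/000005.mp3"],
}
filled = fill_template(TEMPLATE, curated)
print(json.dumps(filled, indent=2))
```

Working on a deep copy keeps the committed template untouched, so the same scaffold can be reused for several curated variants.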
| ... | @@ -367,3 +367,32 @@ cd acr-engine | ... | @@ -367,3 +367,32 @@ cd acr-engine |
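| 367 | Current conclusions: | 367 | Current conclusions: |
| 368 | - The bucket baseline already reliably reproduces the finding that different subsets select different winners. | 368 | - The bucket baseline already reliably reproduces the finding that different subsets select different winners. |
| 369 | - The next step is not more prefix toy buckets but an upgrade to more business-relevant buckets. | 369 | - The next step is not more prefix toy buckets but an upgrade to more business-relevant buckets. |
| 370 | |||
| 371 | Recommended: start directly from the template: | ||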
| 372 | - [../acr-engine/configs/buckets/fma_semantic_bucket_template.json](../acr-engine/configs/buckets/fma_semantic_bucket_template.json) | ||
| 373 | |||
| 374 | Pick a batch of tracks by hand first, then replace the globs with your own candidate sets, covering these buckets first: | ||
| 375 | 1. `energy_dominant` | ||
| 376 | 2. `repeated_section_rich` | ||
| 377 | 3. `steady_beat_regular_meter` | ||
| 378 | 4. `hard_negative_confusable` | ||
| 379 | |||
| 380 | Corresponding command: | ||
| 381 | |||
| 382 | ```bash | ||
| 383 | cd /workspace/acr-engine | ||
| 384 | /usr/local/miniconda3/bin/python scripts/ab_smoke_bucketed.py \ | ||
| 385 | --dataset fma \ | ||
| 386 | --input-dir data/raw/fma_small_audio \ | ||
| 387 | --bucket-config configs/buckets/fma_semantic_bucket_template.json \ | ||
| 388 | --work-root /tmp/ab_smoke_bucketed_semantic \ | ||
| 389 | --default-subset-size 16 \ | ||
| 390 | --query-duration 8 \ | ||
| 391 | --train-epochs 1 \ | ||
| 392 | --batch-size 2 \ | ||
| 393 | --device cpu \ | ||
| 394 | --strategies high_energy hybrid \ | ||
| 395 | --max-test-queries 8 \ | ||
| 396 | --seed 42 \ | ||
| 397 | --output-json /tmp/ab_smoke_bucketed_semantic/report.json | ||
| 398 | ``` | ... | ... |
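Before launching the command above, it may help to check how many files each bucket's globs actually match. The sketch below is a preflight under explicit assumptions: it resolves patterns relative to a caller-supplied root with `pathlib.Path.glob`, and it compares the match count against `subset_size`. How `ab_smoke_bucketed.py` itself resolves patterns, and whether it treats `subset_size` as a requirement or a cap, is not shown in this change. `count_matches`, `short_buckets`, and the demo file names are hypothetical.

```python
import tempfile
from pathlib import Path


def count_matches(root: Path, buckets: list) -> dict:
    """Count distinct files matched by each bucket's glob patterns under root."""
    counts = {}
    for bucket in buckets:
        matched = set()
        for pattern in bucket["patterns"]:
            matched.update(p for p in root.glob(pattern) if p.is_file())
        counts[bucket["name"]] = len(matched)
    return counts


def short_buckets(counts: dict, buckets: list, default_subset_size: int = 16) -> list:
    """Return names of buckets with fewer matches than their subset_size."""
    return [
        b["name"]
        for b in buckets
        if counts[b["name"]] < b.get("subset_size", default_subset_size)
    ]


# Demo on a throwaway tree; directory layout and file names are hypothetical.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "fma_small" / "000").mkdir(parents=True)
    for i in range(3):
        (root / "fma_small" / "000" / f"00000{i}.mp3").touch()
    demo = [
        {"name": "energy_dominant",
         "patterns": ["fma_small/*/00000*.mp3"],
         "subset_size": 2},
        {"name": "hard_negative_confusable",
         "patterns": ["fma_small/*/REPLACE_WITH_CONFUSABLE_TRACKS_*.mp3"],
         "subset_size": 16},
    ]
    counts = count_matches(root, demo)
    short = short_buckets(counts, demo)

print(counts)
print(short)
```

A bucket that still carries a placeholder glob matches nothing, so it shows up in the short list, which gives a cheap signal that curation is incomplete.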
| ... | @@ -255,6 +255,7 @@ | ... | @@ -255,6 +255,7 @@ |
| 255 | 255 | ||
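| 256 | ### Top-priority TODOs | 256 | ### Top-priority TODOs |
| 257 | 1. Upgrade the completed toy bucket baseline to semantic buckets (style / structure / hard-case). | 257 | 1. Upgrade the completed toy bucket baseline to semantic buckets (style / structure / hard-case). |
| 258 | - Template: `acr-engine/configs/buckets/fma_semantic_bucket_template.json` | ||
| 258 | 2. Compare the inconsistency between cap48 and cap64 and add per-scale conclusions. | 259 | 2. Compare the inconsistency between cap48 and cap64 and add per-scale conclusions. |
| 259 | 3. Keep adding cap64 multi-seed runs instead of keeping only a single seed. | 260 | 3. Keep adding cap64 multi-seed runs instead of keeping only a single seed. |
| 260 | 4. Keep optimizing `hybrid`, focusing on reducing variance and improving stability on hard cases. | 261 | 4. Keep optimizing `hybrid`, focusing on reducing variance and improving stability on hard cases. | ... | ... |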