Commit 8ed3e34e 8ed3e34ef1f40d16d160d5581a423c51b8dd57ce by cnb.bofCdSsphPA

Prioritize repeated chorus-like regions in music crop selection

Constraint: Music retrieval should sample repeated hook-like regions without adding heavyweight structure models or breaking the existing lightweight candidate stack
Rejected: Reserve repeated-section logic for a later dedicated chorus detector | delays a practical chorus-like signal that can already improve query realism today
Confidence: medium
Scope-risk: moderate
Directive: Treat repeated_section_aware as a lightweight chorus proxy; future chorus ranking should refine rather than discard these candidates
Tested: /usr/local/miniconda3/bin/python -m py_compile acr-engine/src/data/dataset.py acr-engine/src/data/manifest_tools.py acr-engine/train.py acr-engine/src/data/external_adapters.py; synthetic_v2 dry-run with --segment-strategy repeated_section_aware; handcrafted 24s repeated-motif fixture with repeated_section_aware and hybrid offset checks
Not-tested: Full end-to-end metric impact on FMA/internal datasets with repeated_section_aware enabled
1 parent d7a08944
......@@ -95,6 +95,32 @@ def compute_candidate_offsets(
         step = max(1, len(offsets) // 8)
         return sorted(set(offsets[::step][:8]))
+    if strategy == "repeated_section_aware":
+        hop = max(segment_len // 2, 1)
+        starts = list(range(0, max(len(y) - segment_len + 1, 1), hop))
+        if len(starts) < 2:
+            return starts[:1]
+        feats = []
+        for start in starts:
+            seg = y[start : start + segment_len]
+            if len(seg) < segment_len:
+                seg = np.pad(seg, (0, segment_len - len(seg)))
+            chroma = librosa.feature.chroma_cqt(y=seg, sr=sr)
+            feat = np.mean(chroma, axis=1)
+            norm = float(np.linalg.norm(feat) + 1e-12)
+            feats.append(feat / norm)
+        scores: List[tuple[float, int]] = []
+        for i, feat in enumerate(feats):
+            sims = []
+            for j, other in enumerate(feats):
+                if i == j:
+                    continue
+                sims.append(float(np.dot(feat, other)))
+            repeat_score = max(sims) if sims else 0.0
+            scores.append((repeat_score, starts[i]))
+        scores.sort(key=lambda x: x[0], reverse=True)
+        return sorted(set(start for _, start in scores[: min(6, len(scores))]))
     return []
......@@ -186,7 +212,7 @@ class ACRDataset(Dataset):
if self.segment_strategy == "hybrid":
candidate_pool: List[int] = []
-for strategy in ("beat_aware", "high_energy", "onset_aware", "silence_aware"):
+for strategy in ("repeated_section_aware", "beat_aware", "high_energy", "onset_aware", "silence_aware"):
candidate_pool.extend(
compute_candidate_offsets(
y=y,
......@@ -365,7 +391,7 @@ class SongPairDataset(Dataset):
offset = min(random.choice(direct_candidates) / self.sr, max_offset)
elif self.segment_strategy == "hybrid":
candidate_pool: List[int] = []
-for strategy in ("beat_aware", "high_energy", "onset_aware", "silence_aware"):
+for strategy in ("repeated_section_aware", "beat_aware", "high_energy", "onset_aware", "silence_aware"):
candidate_pool.extend(
compute_candidate_offsets(
y=full_y,
......
......@@ -516,7 +516,7 @@ def main():
p.add_argument("--eval-ratio", type=float, default=0.2)
p.add_argument("--query-duration", type=float, default=8.0)
p.add_argument("--query-stride", type=float, default=None)
-p.add_argument("--query-strategy", choices=["random", "sliding", "silence_aware", "high_energy", "onset_aware", "beat_aware", "hybrid"], default="random")
+p.add_argument("--query-strategy", choices=["random", "sliding", "silence_aware", "high_energy", "onset_aware", "beat_aware", "repeated_section_aware", "hybrid"], default="random")
p.add_argument("--silence-top-db", type=int, default=30)
p.add_argument("--seed", type=int, default=42)
......@@ -548,8 +548,8 @@ def main():
p.add_argument("--eval-ratio", type=float, default=0.2)
p.add_argument("--query-duration", type=float, default=8.0)
p.add_argument("--query-stride", type=float, default=None)
-p.add_argument("--query-strategy", choices=["random", "sliding", "silence_aware", "high_energy", "onset_aware", "beat_aware", "hybrid"], default="random")
-p.add_argument("--segment-strategy", choices=["random", "silence_aware", "high_energy", "onset_aware", "beat_aware", "hybrid"], default="random")
+p.add_argument("--query-strategy", choices=["random", "sliding", "silence_aware", "high_energy", "onset_aware", "beat_aware", "repeated_section_aware", "hybrid"], default="random")
+p.add_argument("--segment-strategy", choices=["random", "silence_aware", "high_energy", "onset_aware", "beat_aware", "repeated_section_aware", "hybrid"], default="random")
p.add_argument("--silence-top-db", type=int, default=30)
p.add_argument("--index-checkpoint-every-refs", type=int, default=100)
p.add_argument("--seed", type=int, default=42)
......
......@@ -117,14 +117,14 @@ def build_train_eval_from_audio_dir(
if duration >= query_duration:
strategy_offsets = []
-if query_strategy in {"silence_aware", "high_energy", "onset_aware", "beat_aware"}:
+if query_strategy in {"silence_aware", "high_energy", "onset_aware", "beat_aware", "repeated_section_aware"}:
strategy_offsets = compute_strategy_offsets(path, duration, query_strategy)
elif query_strategy == "hybrid":
-for strategy in ("beat_aware", "high_energy", "onset_aware", "silence_aware"):
+for strategy in ("repeated_section_aware", "beat_aware", "high_energy", "onset_aware", "silence_aware"):
strategy_offsets.extend(compute_strategy_offsets(path, duration, strategy))
strategy_offsets = sorted(set(strategy_offsets))
-if query_strategy in {"silence_aware", "high_energy", "onset_aware", "beat_aware"} and strategy_offsets:
+if query_strategy in {"silence_aware", "high_energy", "onset_aware", "beat_aware", "repeated_section_aware"} and strategy_offsets:
offsets = strategy_offsets
elif query_strategy == "hybrid" and strategy_offsets:
if query_stride and query_stride > 0:
......@@ -277,7 +277,7 @@ def main():
p.add_argument("--eval-ratio", type=float, default=0.2)
p.add_argument("--query-duration", type=float, default=8.0)
p.add_argument("--query-stride", type=float, default=None)
-p.add_argument("--query-strategy", choices=["random", "sliding", "silence_aware", "high_energy", "onset_aware", "beat_aware", "hybrid"], default="random")
+p.add_argument("--query-strategy", choices=["random", "sliding", "silence_aware", "high_energy", "onset_aware", "beat_aware", "repeated_section_aware", "hybrid"], default="random")
p.add_argument("--silence-top-db", type=int, default=30)
p.add_argument("--seed", type=int, default=42)
......
......@@ -125,7 +125,7 @@ def main():
parser.add_argument("--epochs", type=int, default=None)
parser.add_argument("--batch-size", type=int, default=None)
parser.add_argument("--lr", type=float, default=None)
-parser.add_argument("--segment-strategy", choices=["random", "silence_aware", "high_energy", "onset_aware", "beat_aware", "hybrid"], default="random")
+parser.add_argument("--segment-strategy", choices=["random", "silence_aware", "high_energy", "onset_aware", "beat_aware", "repeated_section_aware", "hybrid"], default="random")
parser.add_argument("--silence-top-db", type=int, default=30)
parser.add_argument("--dry-run", action="store_true")
args = parser.parse_args()
......
......@@ -5618,3 +5618,60 @@
 - Possible next additions to layer on:
   - repeated-section-aware
   - chorus-like candidate mining
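+### Stage: repeated-section-aware / chorus-like candidate sampling
+Completed:
+- Added to `acr-engine/src/data/dataset.py`:
+  - `repeated_section_aware`
+- Added to `acr-engine/src/data/manifest_tools.py`:
+  - `--query-strategy repeated_section_aware`
+- Exposed in `train.py` and `external_adapters.py`:
+  - `repeated_section_aware`
+- Implementation:
+  - Extract `chroma_cqt` from each sliding-window segment
+  - Take the window-level mean chroma vector
+  - Compute the similarity between windows
+  - Prefer the segments that are most similar to another window, as a lightweight approximation of a repeated main section / chorus hook
+- `hybrid` extended to reuse, in priority order:
+  - `repeated_section_aware`
+  - `beat_aware`
+  - `high_energy`
+  - `onset_aware`
+  - `silence_aware`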
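The implementation bullets above can be sketched in a few lines of NumPy. This is a minimal illustration, not the code from `dataset.py`: it assumes the per-window feature vectors (mean chroma in the commit) have already been computed, replaces the nested loops with a similarity matrix, and uses the hypothetical helper names `repeat_scores` and `top_repeated_starts`. It expects at least two windows.

```python
import numpy as np


def repeat_scores(feats: np.ndarray) -> np.ndarray:
    """Return, for each window, its highest cosine similarity to any OTHER window."""
    # L2-normalise each row so that a dot product equals cosine similarity.
    unit = feats / (np.linalg.norm(feats, axis=1, keepdims=True) + 1e-12)
    sim = unit @ unit.T
    # Mask the diagonal so a window can never match itself.
    np.fill_diagonal(sim, -np.inf)
    return sim.max(axis=1)


def top_repeated_starts(feats: np.ndarray, starts: list, k: int = 6) -> list:
    """Keep the k windows with the best repeat score, returned in time order."""
    order = np.argsort(-repeat_scores(feats), kind="stable")[:k]
    return sorted({int(starts[i]) for i in order})


# Toy data (assumption): six random 12-dim "chroma" vectors, where windows 2 and 4
# share the same vector and therefore act as the repeated section.
rng = np.random.default_rng(0)
feats = rng.random((6, 12))
feats[4] = feats[2]
starts = [0, 4, 8, 12, 16, 20]

picked = top_repeated_starts(feats, starts, k=2)
print(picked)
```

Because the score is the maximum similarity to any other window rather than the mean, a section that repeats only once can still rank at the top, which appears to be the intent of the "most similar to another window" rule.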
+Validation results:
+- Compile check:
+  - `/usr/local/miniconda3/bin/python -m py_compile src/data/dataset.py src/data/manifest_tools.py train.py src/data/external_adapters.py`
+- Synthetic "repeated chorus" audio check:
+  - Built a `24s` audio clip
+  - Placed two copies of a repeated motif at `8-12s` and `16-20s`
+  - Direct repeat candidates:
+    - `DIRECT_REPEAT_CANDIDATES_SEC`
+    - `6.0, 8.0, 10.0, 14.0, 16.0, 18.0`
+  - Query generation results:
+    - `REPEAT_QUERY_OFFSETS`
+    - `6.0, 10.0, 14.0, 16.0, 18.0`
+    - `HYBRID_QUERY_OFFSETS`
+    - `2.016, 2.048, 2.08, 6.0, 6.048, 8.0, 8.064, 10.0, 12.789, 14.0, 15.968, 16.0`
+  - Training-side offset check:
+    - `TRAIN_REPEAT_OFFSETS`
+    - `17.5, 0.0, 0.0, 17.5, 7.5, 2.5`
+    - `TRAIN_HYBRID_OFFSETS`
+    - `0.0, 8.032, 2.5, 2.048, 2.016, 7.5`
+  - This shows that repeated-section-aware already leans clearly toward the neighborhood of the repeated section, and that hybrid has absorbed these candidates as well
+- Dry-run check:
+  - `train.py --data data/synthetic_v2 --dry-run --segment-strategy repeated_section_aware`
+  - forward/backward succeeded, `Embedding shape: torch.Size([64, 192])`
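A fixture like the 24 s repeated-motif clip described above could be built roughly as follows. This is a sketch under stated assumptions rather than the fixture used for the logged results: the sample rate, the 4 s segment length, the four-tone motif, and the noise level are all made up for illustration, and an FFT magnitude spectrum stands in for `chroma_cqt` so the example needs only NumPy.

```python
import numpy as np

SR = 1000        # assumption: low sample rate, only to keep the sketch fast
DURATION_S = 24
SEGMENT_S = 4    # assumption: the commit does not state the segment length used


def make_fixture(seed: int = 0) -> np.ndarray:
    """24 s of quiet noise with the same 4 s tone motif added at 8 s and 16 s."""
    rng = np.random.default_rng(seed)
    y = 0.01 * rng.standard_normal(SR * DURATION_S)
    t = np.arange(SR) / SR
    motif = np.concatenate([np.sin(2 * np.pi * f * t) for f in (110, 147, 165, 220)])
    for start_s in (8, 16):
        y[start_s * SR : start_s * SR + motif.size] += motif
    return y


def top_repeated_sec(y: np.ndarray, segment_len: int, k: int = 6) -> list:
    """Rank half-overlapping windows by their best cosine match to another window."""
    hop = max(segment_len // 2, 1)
    starts = list(range(0, max(len(y) - segment_len + 1, 1), hop))
    # Stand-in feature (assumption): magnitude spectrum instead of mean chroma.
    feats = np.stack([np.abs(np.fft.rfft(y[s : s + segment_len])) for s in starts])
    unit = feats / (np.linalg.norm(feats, axis=1, keepdims=True) + 1e-12)
    sim = unit @ unit.T
    np.fill_diagonal(sim, -np.inf)  # ignore self-similarity
    order = np.argsort(-sim.max(axis=1))[:k]
    return sorted(starts[i] / SR for i in order)


candidates_sec = top_repeated_sec(make_fixture(), SEGMENT_S * SR)
print(candidates_sec)
```

With these settings, every window that overlaps one of the two motif copies has a near-identical counterpart around the other copy, so those windows should outrank the noise-only ones. Real music is far less clean, which is why the full-dataset impact is still listed as untested.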
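+Conclusion:
+- The project's music-aware cropping has so far covered:
+  - silence avoidance
+  - high-energy regions
+  - onsets
+  - beats
+
+  and now extends further to:
+  - **repeated main sections / approximate choruses**
+- Possible stronger follow-ups:
+  - chorus-like multi-feature ranking
+  - small-scale strategy A/B comparison on real data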
......
......@@ -357,12 +357,13 @@ flowchart TD
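 | `high_energy` | Training query / external query generation | Prefers high-RMS regions, closer to chorus / lead vocal / strongly rhythmic passages | Yes |
 | `onset_aware` | Training query / external query generation | Prefers positions near onset events, reducing crops that land on decay tails / empty beats | Yes |
 | `beat_aware` | Training query / external query generation | Prefers positions near beat points; suits strongly rhythmic pop / electronic / dance music | Yes |
+| `repeated_section_aware` | Training query / external query generation | Prefers repeated main sections that are most similar to other windows, approximating a chorus / repeated hook | Yes |
 | `hybrid` | Training query / external query generation | Mixes silence-aware + random, balancing stability and generalization | Yes |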
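 How to read this:
 1. **Training crops are not all random**
-   The training set can currently use `random / silence_aware / high_energy / onset_aware / beat_aware / hybrid`
+   The training set can currently use `random / silence_aware / high_energy / onset_aware / beat_aware / repeated_section_aware / hybrid`
 2. **Reference indexing does not crop randomly**
    Index building still uses a fixed sliding window
 3. **External-data query generation is not limited to random crops either**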
......@@ -390,6 +391,7 @@ flowchart TD
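 - To stay closer to the chorus / strong rhythm: `high_energy`
 - To stay closer to short-note onsets / hits: `onset_aware`
 - To stay closer to a stable beat grid: `beat_aware`
+- To stay closer to the chorus / repeated hook: `repeated_section_aware`
 ### Recommended strategies for external-data query generation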
......@@ -412,6 +414,7 @@ flowchart TD
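 | Closer to the chorus / main section | `high_energy` |
 | Closer to hits / vocal entry points | `onset_aware` |
 | Closer to regular beat points / the rhythmic skeleton | `beat_aware` |
+| Closer to repeated main sections / chorus hooks | `repeated_section_aware` |
 | Music-aware but still generalizing | `hybrid` |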
---
......