Commit d7a08944 d7a08944003d0ceb67427dae605dfa7d46507600 by cnb.bofCdSsphPA

Align music crop sampling with rhythmic grid candidates

Constraint: Music queries often begin near stable pulse locations, but beat tracking can fail on sparse or synthetic signals and must degrade safely
Rejected: Depend on beat tracking alone for all rhythmic sampling | too brittle when beat extraction is weak or absent
Confidence: high
Scope-risk: moderate
Directive: Keep beat_aware as a lightweight candidate generator with onset fallback; future chorus/repeated-section logic should compose with beat-aware rather than bypass it
Tested: /usr/local/miniconda3/bin/python -m py_compile acr-engine/src/data/dataset.py acr-engine/src/data/manifest_tools.py acr-engine/train.py acr-engine/src/data/external_adapters.py; synthetic_v2 dry-run with --segment-strategy beat_aware; handcrafted 20s pulse-track fixture with beat_aware and hybrid offset checks
Not-tested: Full retraining/evaluation impact on open/internal datasets using beat_aware end-to-end
1 parent b6cdf668
......@@ -61,6 +61,40 @@ def compute_candidate_offsets(
offsets.append(start)
return sorted(set(offsets[: min(8, len(offsets))]))
+if strategy == "beat_aware":
+try:
+tempo, beat_frames = librosa.beat.beat_track(y=y, sr=sr, hop_length=512, units="frames")
+beat_samples = librosa.frames_to_samples(beat_frames, hop_length=512)
+except Exception:
+beat_samples = np.array([], dtype=int)
+if beat_samples.size == 0:
+try:
+onset_frames = librosa.onset.onset_detect(y=y, sr=sr, hop_length=512, units="frames")
+onset_samples = librosa.frames_to_samples(onset_frames, hop_length=512)
+if onset_samples.size >= 2:
+diffs = np.diff(onset_samples)
+median_step = int(np.median(diffs)) if diffs.size else 0
+if median_step > 0:
+approx = [int(onset_samples[0])]
+while approx[-1] + median_step < len(y):
+approx.append(approx[-1] + median_step)
+beat_samples = np.array(approx, dtype=int)
+elif onset_samples.size == 1:
+beat_samples = onset_samples
+except Exception:
+beat_samples = np.array([], dtype=int)
+if beat_samples.size == 0:
+return []
+offsets = []
+max_start = max(len(y) - segment_len, 0)
+for beat in beat_samples.tolist():
+start = max(0, min(int(beat), max_start))
+offsets.append(start)
+if not offsets:
+return []
+step = max(1, len(offsets) // 8)
+return sorted(set(offsets[::step][:8]))
return []
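The onset fallback and the candidate thinning in this hunk are easier to check in isolation. The following is a minimal pure-Python sketch of the same logic, assuming onsets arrive as sorted sample indices (the `librosa` calls are omitted so it runs without audio). `approximate_beat_grid` and `thin_offsets` are hypothetical helper names, not functions in the repository, and the toy values (1 kHz sample rate, 5 s segments) are illustrative only.

```python
from statistics import median
from typing import List, Sequence


def approximate_beat_grid(onset_samples: Sequence[int], num_samples: int) -> List[int]:
    """Extrapolate an even grid from onsets, like the onset fallback in beat_aware.

    The grid starts at the first onset and advances by the median inter-onset
    interval until the next point would reach the end of the signal.
    """
    if len(onset_samples) >= 2:
        diffs = [b - a for a, b in zip(onset_samples, onset_samples[1:])]
        median_step = int(median(diffs))
        if median_step <= 0:
            return []  # degenerate spacing: no usable grid
        grid = [int(onset_samples[0])]
        while grid[-1] + median_step < num_samples:
            grid.append(grid[-1] + median_step)
        return grid
    if len(onset_samples) == 1:
        return [int(onset_samples[0])]  # a single onset is used as-is
    return []


def thin_offsets(beat_samples: Sequence[int], num_samples: int,
                 segment_len: int, max_candidates: int = 8) -> List[int]:
    """Clamp beat positions to valid crop starts and keep at most max_candidates."""
    max_start = max(num_samples - segment_len, 0)
    offsets = [max(0, min(int(b), max_start)) for b in beat_samples]
    if not offsets:
        return []
    step = max(1, len(offsets) // max_candidates)
    return sorted(set(offsets[::step][:max_candidates]))


# Toy signal: 20 s at 1 kHz, slightly jittered onsets roughly 0.5 s apart.
sr = 1000
grid = approximate_beat_grid([4000, 4500, 5000, 5600], num_samples=20 * sr)
candidates = thin_offsets(grid, num_samples=20 * sr, segment_len=5 * sr)
print(len(grid), candidates)  # 32 [4000, 6000, 8000, 10000, 12000, 14000, 15000]
```

Clamping to `max_start` collapses late beats onto the same start (15000 here), which is why the deduplicating `set` matters. Also, with fewer than 16 beats `step` is 1 and only the first eight beats survive, so candidates lean toward the start of the track.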
......@@ -152,7 +186,7 @@ class ACRDataset(Dataset):
if self.segment_strategy == "hybrid":
candidate_pool: List[int] = []
-for strategy in ("high_energy", "onset_aware", "silence_aware"):
+for strategy in ("beat_aware", "high_energy", "onset_aware", "silence_aware"):
candidate_pool.extend(
compute_candidate_offsets(
y=y,
......@@ -331,7 +365,7 @@ class SongPairDataset(Dataset):
offset = min(random.choice(direct_candidates) / self.sr, max_offset)
elif self.segment_strategy == "hybrid":
candidate_pool: List[int] = []
-for strategy in ("high_energy", "onset_aware", "silence_aware"):
+for strategy in ("beat_aware", "high_energy", "onset_aware", "silence_aware"):
candidate_pool.extend(
compute_candidate_offsets(
y=full_y,
......
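Both dataset classes now put `beat_aware` first when they build the hybrid `candidate_pool`; what happens to the pool once it is filled is cut off in these hunks. A minimal sketch, assuming the pool is deduplicated and then sampled the way the direct-candidate branch in `SongPairDataset` does (random choice, samples to seconds, clamped to `max_offset`). `HYBRID_ORDER`, `build_hybrid_pool`, and `draw_offset_seconds` are hypothetical names, and the lambdas stand in for `compute_candidate_offsets`.

```python
import random
from typing import Callable, Dict, List, Optional

HYBRID_ORDER = ("beat_aware", "high_energy", "onset_aware", "silence_aware")


def build_hybrid_pool(strategies: Dict[str, Callable[[], List[int]]]) -> List[int]:
    """Concatenate sample offsets from each strategy in order, then dedupe."""
    pool: List[int] = []
    for name in HYBRID_ORDER:
        fn = strategies.get(name)
        if fn is not None:
            pool.extend(fn())
    return sorted(set(pool))


def draw_offset_seconds(pool: List[int], sr: int, max_offset: float,
                        rng: random.Random) -> Optional[float]:
    """Pick one candidate, convert samples to seconds, clamp like the direct branch."""
    if not pool:
        return None  # assumption: the caller falls back to another strategy
    return min(rng.choice(pool) / sr, max_offset)


strategies = {
    "beat_aware": lambda: [4000, 6000],  # would be [] if no beats were found
    "high_energy": lambda: [6000, 9000],
    "onset_aware": lambda: [],
    "silence_aware": lambda: [2500],
}
pool = build_hybrid_pool(strategies)
offset = draw_offset_seconds(pool, sr=1000, max_offset=8.0, rng=random.Random(42))
print(pool)  # [2500, 4000, 6000, 9000]
```

Deduplicating means an offset proposed by several strategies is drawn no more often than any other; keeping duplicates would instead up-weight positions the strategies agree on. Which of the two the committed code does is not visible in the hunk.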
......@@ -516,7 +516,7 @@ def main():
p.add_argument("--eval-ratio", type=float, default=0.2)
p.add_argument("--query-duration", type=float, default=8.0)
p.add_argument("--query-stride", type=float, default=None)
-p.add_argument("--query-strategy", choices=["random", "sliding", "silence_aware", "high_energy", "onset_aware", "hybrid"], default="random")
+p.add_argument("--query-strategy", choices=["random", "sliding", "silence_aware", "high_energy", "onset_aware", "beat_aware", "hybrid"], default="random")
p.add_argument("--silence-top-db", type=int, default=30)
p.add_argument("--seed", type=int, default=42)
......@@ -548,8 +548,8 @@ def main():
p.add_argument("--eval-ratio", type=float, default=0.2)
p.add_argument("--query-duration", type=float, default=8.0)
p.add_argument("--query-stride", type=float, default=None)
-p.add_argument("--query-strategy", choices=["random", "sliding", "silence_aware", "high_energy", "onset_aware", "hybrid"], default="random")
-p.add_argument("--segment-strategy", choices=["random", "silence_aware", "high_energy", "onset_aware", "hybrid"], default="random")
+p.add_argument("--query-strategy", choices=["random", "sliding", "silence_aware", "high_energy", "onset_aware", "beat_aware", "hybrid"], default="random")
+p.add_argument("--segment-strategy", choices=["random", "silence_aware", "high_energy", "onset_aware", "beat_aware", "hybrid"], default="random")
p.add_argument("--silence-top-db", type=int, default=30)
p.add_argument("--index-checkpoint-every-refs", type=int, default=100)
p.add_argument("--seed", type=int, default=42)
......
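With `beat_aware` added to `choices`, argparse accepts the new value and still rejects unknown ones at parse time. The same lists are repeated in several `main()` functions touched by this commit; a minimal sketch of keeping them in one place, where `SEGMENT_STRATEGIES`, `QUERY_STRATEGIES`, and `build_parser` are hypothetical names (the commit itself keeps inline lists).

```python
import argparse

# Hypothetical shared definitions; query strategies add "sliding" to the segment set.
SEGMENT_STRATEGIES = ("random", "silence_aware", "high_energy", "onset_aware", "beat_aware", "hybrid")
QUERY_STRATEGIES = ("random", "sliding") + SEGMENT_STRATEGIES[1:]


def build_parser() -> argparse.ArgumentParser:
    p = argparse.ArgumentParser()
    p.add_argument("--query-strategy", choices=QUERY_STRATEGIES, default="random")
    p.add_argument("--segment-strategy", choices=SEGMENT_STRATEGIES, default="random")
    return p


args = build_parser().parse_args(["--segment-strategy", "beat_aware"])
print(args.query_strategy, args.segment_strategy)  # random beat_aware
```

Passing something like `--segment-strategy sliding` makes argparse print a usage error and exit, matching the committed lists, where `sliding` is a query-only option.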
......@@ -117,14 +117,14 @@ def build_train_eval_from_audio_dir(
if duration >= query_duration:
strategy_offsets = []
-if query_strategy in {"silence_aware", "high_energy", "onset_aware"}:
+if query_strategy in {"silence_aware", "high_energy", "onset_aware", "beat_aware"}:
strategy_offsets = compute_strategy_offsets(path, duration, query_strategy)
elif query_strategy == "hybrid":
-for strategy in ("high_energy", "onset_aware", "silence_aware"):
+for strategy in ("beat_aware", "high_energy", "onset_aware", "silence_aware"):
strategy_offsets.extend(compute_strategy_offsets(path, duration, strategy))
strategy_offsets = sorted(set(strategy_offsets))
-if query_strategy in {"silence_aware", "high_energy", "onset_aware"} and strategy_offsets:
+if query_strategy in {"silence_aware", "high_energy", "onset_aware", "beat_aware"} and strategy_offsets:
offsets = strategy_offsets
elif query_strategy == "hybrid" and strategy_offsets:
if query_stride and query_stride > 0:
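The `and strategy_offsets` guards are what let `beat_aware` degrade safely at query-generation time: if beat and onset detection both come back empty, the direct strategy falls through to the later branches instead of yielding no queries. A minimal sketch of that branch structure, where `compute` stands in for `compute_strategy_offsets(path, duration, strategy)` and `fallback` is a hypothetical placeholder for the code the hunk cuts off. In the commit, non-empty hybrid offsets continue into a `query_stride` branch, which this sketch replaces with a plain return.

```python
from typing import Callable, Dict, List

DIRECT_STRATEGIES = {"silence_aware", "high_energy", "onset_aware", "beat_aware"}
HYBRID_ORDER = ("beat_aware", "high_energy", "onset_aware", "silence_aware")


def choose_query_offsets(query_strategy: str,
                         compute: Callable[[str], List[float]],
                         fallback: Callable[[], List[float]]) -> List[float]:
    """Return query start times in seconds, falling back when a strategy finds nothing."""
    strategy_offsets: List[float] = []
    if query_strategy in DIRECT_STRATEGIES:
        strategy_offsets = list(compute(query_strategy))
    elif query_strategy == "hybrid":
        for strategy in HYBRID_ORDER:
            strategy_offsets.extend(compute(strategy))
        strategy_offsets = sorted(set(strategy_offsets))
    if strategy_offsets:
        return strategy_offsets
    return fallback()  # hypothetical stand-in for the elided default path


# Fake per-strategy results in seconds: beat_aware found nothing on this track.
fake: Dict[str, List[float]] = {
    "beat_aware": [],
    "high_energy": [6.0, 9.0],
    "onset_aware": [4.0, 6.0],
    "silence_aware": [2.5],
}
compute = fake.__getitem__


def fallback() -> List[float]:
    return [0.0, 4.0, 8.0]  # e.g. fixed sliding-window starts (illustrative)


print(choose_query_offsets("beat_aware", compute, fallback))  # [0.0, 4.0, 8.0]
print(choose_query_offsets("hybrid", compute, fallback))      # [2.5, 4.0, 6.0, 9.0]
```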
......@@ -277,7 +277,7 @@ def main():
p.add_argument("--eval-ratio", type=float, default=0.2)
p.add_argument("--query-duration", type=float, default=8.0)
p.add_argument("--query-stride", type=float, default=None)
-p.add_argument("--query-strategy", choices=["random", "sliding", "silence_aware", "high_energy", "onset_aware", "hybrid"], default="random")
+p.add_argument("--query-strategy", choices=["random", "sliding", "silence_aware", "high_energy", "onset_aware", "beat_aware", "hybrid"], default="random")
p.add_argument("--silence-top-db", type=int, default=30)
p.add_argument("--seed", type=int, default=42)
......
......@@ -125,7 +125,7 @@ def main():
parser.add_argument("--epochs", type=int, default=None)
parser.add_argument("--batch-size", type=int, default=None)
parser.add_argument("--lr", type=float, default=None)
-parser.add_argument("--segment-strategy", choices=["random", "silence_aware", "high_energy", "onset_aware", "hybrid"], default="random")
+parser.add_argument("--segment-strategy", choices=["random", "silence_aware", "high_energy", "onset_aware", "beat_aware", "hybrid"], default="random")
parser.add_argument("--silence-top-db", type=int, default=30)
parser.add_argument("--dry-run", action="store_true")
args = parser.parse_args()
......
......@@ -5569,3 +5569,52 @@
- beat-aware
- chorus-aware
- repeated-section-aware
+### Stage: beat-aware music segmentation
+Completed:
+- `acr-engine/src/data/dataset.py` adds:
+  - the `beat_aware` candidate crop strategy
+- `acr-engine/src/data/manifest_tools.py` adds:
+  - `--query-strategy beat_aware`
+- `train.py` and `external_adapters.py` expose:
+  - the `beat_aware` option
+- `beat_aware` gains fault tolerance:
+  - it uses `librosa.beat.beat_track` first
+  - if beat detection fails, it falls back to approximate beat points estimated from onset intervals
+- `hybrid` now draws on, in priority order:
+  - `beat_aware`
+  - `high_energy`
+  - `onset_aware`
+  - `silence_aware`
+Verification results:
+- Compile check:
+  - `/usr/local/miniconda3/bin/python -m py_compile src/data/dataset.py src/data/manifest_tools.py train.py src/data/external_adapters.py`
+- Synthetic beat-audio check:
+  - build a `20s` audio clip
+  - inject a pulse every `0.5s` in the `4s-18s` range (about 120 BPM)
+  - overlay a faint tonal bed
+- Direct beat candidates:
+  - `DIRECT_BEAT_CANDIDATES_SEC`
+    - `4.032, 5.952, 7.872, 9.792, 11.712, 13.632, 15.0`
+- Query generation:
+  - `BEAT_QUERY_OFFSETS`
+    - `4.032, 7.872, 9.792, 11.712, 13.632, 15.0`
+  - `HYBRID_QUERY_OFFSETS`
+    - `3.968, 4.032, 4.064, 4.544, 5.0, 5.536, 6.016, 6.048, 7.872, 9.591, 9.792, 10.0`
+- Training-side offset check:
+  - `TRAIN_BEAT_AWARE_OFFSETS`
+    - `13.632, 4.032, 4.032, 13.632, 7.872, 5.952`
+  - `TRAIN_HYBRID_OFFSETS`
+    - `2.5, 5.536, 4.064, 12.5, 7.872, 4.032`
+- Takeaway: beat-aware offsets now clearly favor regular beat positions, and hybrid has picked up the beat-aware candidates (e.g. `4.032`, `7.872`)
+- Dry-run check:
+  - `train.py --data data/synthetic_v2 --dry-run --segment-strategy beat_aware`
+  - forward/backward pass succeeded, `Embedding shape: torch.Size([64, 192])`
+Conclusion:
+- Music-aware cropping in this project now extends beyond "high energy / onsets" to "regular beat positions"
+- Possible next layers:
+  - repeated-section-aware
+  - chorus-like candidate mining
......
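The pulse-track fixture described in the verification notes can be rebuilt roughly as follows. This is a minimal sketch: the sample rate (8 kHz), click shape (20 ms decaying 1 kHz burst), tonal-bed frequency and gain, and inclusive `4s`/`18s` endpoints are all assumptions, not the values used in the commit. A 0.5 s pulse spacing corresponds to 60 / 0.5 = 120 BPM.

```python
import math
from typing import List, Tuple


def pulse_times(start_s: float = 4.0, end_s: float = 18.0, period_s: float = 0.5) -> List[float]:
    """Pulse times every period_s seconds from start_s to end_s (both inclusive)."""
    count = int(math.floor((end_s - start_s) / period_s + 1e-9)) + 1
    return [start_s + k * period_s for k in range(count)]


def make_pulse_fixture(sr: int = 8000, duration_s: float = 20.0,
                       bed_hz: float = 220.0, bed_gain: float = 0.02,
                       click_hz: float = 1000.0, click_ms: float = 20.0
                       ) -> Tuple[List[float], List[float]]:
    """Return (signal, pulse_times): a faint sine bed plus short decaying clicks."""
    total = int(round(sr * duration_s))
    # Faint continuous "tonal bed" so the clip is never pure silence.
    y = [bed_gain * math.sin(2 * math.pi * bed_hz * i / sr) for i in range(total)]
    click_len = int(sr * click_ms / 1000)
    times = pulse_times()
    for t in times:
        start = int(round(t * sr))
        for j in range(min(click_len, total - start)):
            # Exponentially decaying sine burst: a sharp, beat-like transient.
            env = math.exp(-5.0 * j / click_len)
            y[start + j] += 0.8 * env * math.sin(2 * math.pi * click_hz * j / sr)
    return y, times


y, times = make_pulse_fixture()
period_s = times[1] - times[0]
print(len(y), len(times), 60.0 / period_s)  # 160000 29 120.0
```

Converted with `np.asarray(y, dtype=np.float32)`, the signal can be fed to the `beat_aware` path; detected beats need not sit exactly on the 0.5 s grid, since librosa reports them on frame boundaries (`hop_length=512`).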
......@@ -356,12 +356,13 @@ flowchart TD
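| `silence_aware` | training queries / external query generation | Prefers to avoid silence, landing on segments with real musical content | Yes |
| `high_energy` | training queries / external query generation | Prefers high-RMS regions, closer to chorus / lead vocal / strongly rhythmic sections | Yes |
| `onset_aware` | training queries / external query generation | Prefers positions near onset events, reducing crops that start on decays or empty beats | Yes |
+| `beat_aware` | training queries / external query generation | Prefers positions near beat points; suited to strongly rhythmic pop, electronic, dance, and similar music | Yes |
| `hybrid` | training queries / external query generation | Mixes silence-aware + random, balancing stability and generalization | Yes |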
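How to read this:
1. **Training does not crop everything at random**
-Training can currently use `random / silence_aware / high_energy / onset_aware / hybrid`
+Training can currently use `random / silence_aware / high_energy / onset_aware / beat_aware / hybrid`
2. **Reference indexing does not crop at random**
Indexing still uses a fixed sliding window
3. **External-data query generation is not limited to random crops either**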
......@@ -388,6 +389,7 @@ flowchart TD
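- Source audio known to contain a lot of silence: `silence_aware`
- Want crops closer to choruses / strong rhythm: `high_energy`
- Want crops closer to short-note onsets / hits: `onset_aware`
+- Want crops closer to a stable beat grid: `beat_aware`
### Recommendations for external-data query generation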
......@@ -409,6 +411,7 @@ flowchart TD
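| Recordings with long silent heads/tails | `silence_aware` |
| Want crops closer to choruses / main sections | `high_energy` |
| Want crops closer to hits / vocal entries | `onset_aware` |
+| Want crops closer to regular beat points / the rhythmic backbone | `beat_aware` |
| Want music awareness while preserving generalization | `hybrid` |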
---
......