Commit c89ef4f9 c89ef4f94ed49b1cd4df93f0bf7d77ad6bf17184 by cnb.bofCdSsphPA

Improve confused-case retrieval with sample-level hard weighting

Constraint: Must preserve runnable pipeline and record stage evidence before continuing optimization
Rejected: More naive oversampling | Regressed overall and hard-case accuracy in smoke-v4
Confidence: medium
Scope-risk: moderate
Directive: Treat confused and humming_like as separate optimization lanes in future stages
Tested: /usr/local/miniconda3/bin/python train.py --data data/synthetic_v2 --output data/models_v6 --device cpu --epochs 1 --batch-size 6 --dry-run; /usr/local/miniconda3/bin/python -m py_compile train.py src/models/losses.py src/data/dataset.py; /usr/local/miniconda3/bin/python train.py --data data/synthetic_v2 --output data/models_v6 --device cpu --epochs 2 --batch-size 6; /usr/local/miniconda3/bin/python run_demo.py build-index --data data/synthetic_v2 --model data/models_v6/best_model.pt --output data/index_v6 --device cpu; /usr/local/miniconda3/bin/python evaluate.py --data data/synthetic_v2 --model data/models_v6/best_model.pt --index-prefix data/index_v6/reference --split test --device cpu --fast-eval --output-json reports/smoke-v6/synthetic_v2/eval.json; /usr/local/miniconda3/bin/python scripts/generate_artifacts.py --eval-json reports/smoke-v6/synthetic_v2/eval.json --config-json reports/smoke-v6/synthetic_v2/config.json --output-dir reports/smoke-v6/synthetic_v2 --model-version smoke-v6 --data-version synthetic_v2
Not-tested: Real external dataset training run and GPU-scale convergence
1 parent 48c97a90
No preview for this file type
No preview for this file type
No preview for this file type
This file is too large to display.
{
"song_0000": 0,
"song_0001": 1,
"song_0002": 2,
"song_0003": 3,
"song_0004": 4,
"song_0005": 5,
"song_0006": 6,
"song_0007": 7,
"song_0008": 8,
"song_0009": 9,
"song_0010": 10,
"song_0011": 11,
"song_0012": 12,
"song_0013": 13,
"song_0014": 14,
"song_0015": 15
}
\ No newline at end of file
{
"generated_at": "2026-06-02T04:19:34Z",
"model_version": "smoke-v6",
"data_version": "synthetic_v2",
"files": {
"benchmark_report": "reports/smoke-v6/synthetic_v2/benchmark-report.md",
"model_card": "reports/smoke-v6/synthetic_v2/model-card.md",
"release_checklist": "reports/smoke-v6/synthetic_v2/release-checklist.md"
}
}
\ No newline at end of file
# Benchmark Report
## 一页结论
- 模型版本:smoke-v6
- 数据版本:synthetic_v2
- 核心结论:top1=0.65 top5=0.95
- 是否通过上线门禁:TBD
## 1. 评测范围图
```mermaid
flowchart LR
A[smoke-v6] --> B[synthetic_v2]
A --> C[Scenario Buckets]
A --> D[Latency / Ops]
```
## 2. 指标表
| Bucket | top1 | top5 | MRR | FAR | Notes |
|---|---:|---:|---:|---:|---|
| clean | 1.0 | 1.0 | | | |
| augmented | 0.75 | 1.0 | | | |
| humming_like | 0.25 | 1.0 | | | |
| confused | 0.25 | 0.75 | | | |
## 3. 文字分析
- 最强项:clean/augmented buckets if present
- 最弱项:see hard-case summary
- 与上一版本对比:TBD
## 4. 细节附录
- 原始 JSON 报告:embedded source
## Sources
- docs/industrial-benchmark-spec.md
{
"model_version": "smoke-v6",
"data_version": "synthetic_v2",
"focus": "sample-level hard weighting with confused-priority sampling",
"train_command": "/usr/local/miniconda3/bin/python train.py --data data/synthetic_v2 --output data/models_v6 --device cpu --epochs 2 --batch-size 6",
"index_command": "/usr/local/miniconda3/bin/python run_demo.py build-index --data data/synthetic_v2 --model data/models_v6/best_model.pt --output data/index_v6 --device cpu",
"eval_command": "/usr/local/miniconda3/bin/python evaluate.py --data data/synthetic_v2 --model data/models_v6/best_model.pt --index-prefix data/index_v6/reference --split test --device cpu --fast-eval --output-json reports/smoke-v6/synthetic_v2/eval.json",
"notes": [
"confused improved from 0.00 to 0.25 top1 vs smoke-v5",
"humming_like regressed from 0.50 to 0.25 top1 vs smoke-v5",
"overall top1 improved from 0.60 to 0.65"
]
}
{
"split": "test",
"num_queries": 20,
"top1": 0.65,
"topk": 0.95,
"by_type": {
"clean": {
"n": 8,
"top1": 1.0,
"topk": 1.0
},
"augmented": {
"n": 4,
"top1": 0.75,
"topk": 1.0
},
"humming_like": {
"n": 4,
"top1": 0.25,
"topk": 1.0
},
"confused": {
"n": 4,
"top1": 0.25,
"topk": 0.75
}
},
"hard_case_summary": {
"humming_like": {
"n": 4,
"top1": 0.25,
"topk": 1.0
},
"confused": {
"n": 4,
"top1": 0.25,
"topk": 0.75
}
},
"sample_failures": [
{
"truth": "song_0023",
"query": "segments/song_0023_seg_04_confused.wav",
"type": "confused",
"preds": [
"song_0006",
"song_0002",
"song_0022",
"song_0001",
"song_0000"
]
}
]
}
\ No newline at end of file
# Model Card
## 一页结论
- 模型名称:ACR Hybrid Encoder
- 版本:smoke-v6
- 适用场景:music ACR prototype / retrieval
- 不适用场景:未经白名单数据验证的生产商用全量上线
## 1. 模型结构图
```mermaid
flowchart LR
A[Input Audio] --> B[128 Mel + BandSplit]
B --> C[Encoder]
C --> D[Embedding]
D --> E[Hybrid Retrieval]
```
## 2. 关键信息表
| 项 | 内容 |
|---|---|
| embed_dim | None |
| channels | None |
| n_mels | None |
| use_band_split | None |
| benchmark report | reports/smoke-v6/synthetic_v2/benchmark-report.md |
## 3. 文字说明
- 训练方式:retrieval-oriented pair training
- 模型限制:hard-case accuracy still evolving
- 风险提示:requires whitelist-reviewed datasets for commercial deployment
## 4. 细节附录
- config embedded from source JSON
## Sources
- docs/dataset-spec.md
- docs/benchmark-report-template.md
# Release Checklist
## 一页结论
发布前必须同时满足:质量通过、合规通过、服务通过、文档齐全。
## 1. 发布门禁图
```mermaid
flowchart TD
A[smoke-v6] --> B[Benchmark Pass]
A --> C[License Review Pass]
A --> D[Service Smoke Pass]
A --> E[Docs Complete]
```
## 2. Checklist 表
| 项目 | 状态 |
|---|---|
| benchmark report 已生成 | yes |
| model card 已生成 | yes |
| license registry 已更新 | pending |
| service smoke test 通过 | yes |
| dataset whitelist 已确认 | pending |
| changelog 已更新 | pending |
## 3. 文字说明
- 当前用于工程治理与预发布检查,不代表已满足商用法律门槛。
## 4. 细节附录
- benchmark 报告路径:reports/smoke-v6/synthetic_v2/benchmark-report.md
- model card 路径:reports/smoke-v6/synthetic_v2/model-card.md
## Sources
- docs/dataset-sources-and-licensing.md
- docs/industrial-benchmark-spec.md
......@@ -193,7 +193,13 @@ class SongPairDataset(Dataset):
self.song_ids = sorted(self.by_song)
self.sample_song_ids = []
for sid, items in self.by_song.items():
weight = 3 if any(x.get("type") in {"confused", "humming_like"} for x in items) else 1
item_types = {x.get("type") for x in items}
if "confused" in item_types:
weight = 5
elif "humming_like" in item_types:
weight = 3
else:
weight = 1
self.sample_song_ids.extend([sid] * weight)
self.song_to_idx = {sid: i for i, sid in enumerate(self.song_ids)}
......@@ -228,8 +234,15 @@ class SongPairDataset(Dataset):
else:
a, b = random.sample(choices, 2)
pair_types = {a.get("type", "unknown"), b.get("type", "unknown")}
hard_weight = 2.5 if pair_types & {"confused", "humming_like"} else 1.0
type_to_weight = {
"confused": 4.0,
"humming_like": 2.5,
"augmented": 1.4,
}
pair_weights = [
type_to_weight.get(a.get("type", "unknown"), 1.0),
type_to_weight.get(b.get("type", "unknown"), 1.0),
]
wavs = []
for sample in (a, b):
......@@ -247,5 +260,5 @@ class SongPairDataset(Dataset):
"mel": torch.stack(wavs, dim=0),
"song_id": torch.tensor([label, label], dtype=torch.long),
"song_name": song_id,
"hard_weight": torch.tensor(hard_weight, dtype=torch.float32),
"hard_weight": torch.tensor(pair_weights, dtype=torch.float32),
}
......
......@@ -26,7 +26,7 @@ class SupConLoss(nn.Module):
pos_count = pos_mask.sum(dim=1)
loss = -(log_prob * pos_mask).sum(dim=1)
loss = loss / pos_count.clamp(min=1)
return loss.mean()
return loss
class CombinedLoss(nn.Module):
......@@ -51,12 +51,17 @@ class CombinedLoss(nn.Module):
hard_weight: torch.Tensor | None = None,
) -> dict:
loss_supcon = self.supcon(embedding, supcon_labels)
loss_ce = self.ce(logits, labels)
loss_ce = F.cross_entropy(logits, labels, reduction="none")
if hard_weight is not None:
weight = hard_weight.float().mean()
weight = hard_weight.float()
if weight.dim() == 0:
weight = weight.unsqueeze(0)
loss_supcon = loss_supcon * weight
loss_ce = loss_ce * weight
loss_supcon = loss_supcon.mean()
loss_ce = loss_ce.mean()
total = self.supcon_weight * loss_supcon + self.aam_weight * loss_ce
return {
"loss": total,
......
......@@ -32,6 +32,9 @@ def collate_fn(batch):
mels.append(mel[i])
song_ids.append(b["song_id"][i])
song_names.append(b["song_name"])
if torch.is_tensor(hw) and hw.dim() > 0:
hard_weights.append(hw[i])
else:
hard_weights.append(hw)
else:
mels.append(mel)
......
......@@ -2,6 +2,34 @@
## 2026-06-02
### Stage: confused 定向优化 v6(sample-level weighting)
完成项:
- 将 hard-case loss 从 batch 级平均权重改为 **sample-level weighting**
- `SongPairDataset` 改为对 `confused` / `humming_like` 区分采样强度
- `confused` 样本权重提高到更高优先级
- 重训 `models_v6`、重建 `index_v6`、重跑 `smoke-v6` 评测
- 生成 `reports/smoke-v6/synthetic_v2/` 发布制品
- 补充 `docs/dataset-spec.md` 中的 hard-case 输入规范说明
- 补充 `docs/sota-research-2026.md` 中的 v4/v5/v6 对比结论
验证结果:
- `train.py --dry-run` 成功
- `py_compile` 成功
- `run_demo.py build-index` 成功
- `evaluate.py --fast-eval --output-json reports/smoke-v6/synthetic_v2/eval.json` 成功
- 当前结果:
- overall top1=0.65, top5=0.95
- humming_like top1=0.25
- confused top1=0.25
结论:
- 相比 smoke-v5,overall top1 从 0.60 提升到 0.65
- `confused top1` 从 0.00 提升到 0.25,说明 sample-level 权重有效
- `humming_like top1` 从 0.50 回落到 0.25,说明两类 hard case 需要分治,而不能只靠单轴加权
## 2026-06-02
### Stage: 文档补全 + ACR 最小可运行链路
完成项:
......
......@@ -73,6 +73,30 @@ flowchart TD
---
## 4.1 Hard-case 训练信号图
```mermaid
flowchart LR
A[Query Segment] --> B{type}
B -->|clean| C[w=1.0]
B -->|augmented| D[w=1.4]
B -->|humming_like| E[w=2.5]
B -->|confused| F[w=4.0]
C --> G[Sample-level SupCon + CE]
D --> G
E --> G
F --> G
```
| 类型 | 当前训练权重 | 目标 |
|---|---:|---|
| clean | 1.0 | 保持基础识别稳定 |
| augmented | 1.4 | 提高常规退化鲁棒性 |
| humming_like | 2.5 | 提高旋律近似查询鲁棒性 |
| confused | 4.0 | 强化最易混淆片段分离能力 |
---
## 5. 文字说明
### 5.1 为什么必须分离 catalog 和 query
......@@ -88,6 +112,18 @@ flowchart TD
### 5.3 query 类型为什么要显式标注
`clean / augmented / confused / humming_like` 是评测与训练策略的重要条件,不应只放在隐式文件名里。
### 5.4 为什么 hard-case 权重必须做到 sample-level
如果只在 batch 级取平均权重,`confused` 这种少量但高风险样本会被正常样本稀释。当前版本已经改成 **sample-level weighted SupCon + sample-level weighted CE**,从而让单个困难片段在损失中真实“被看见”。
### 5.5 当前经验结论
- 简单过采样会导致整体退化
- type-aware weighting 能提升一部分 hard case
- confused 类需要更高权重,但过强偏置会回伤 `humming_like`
- 因此 dataset 规范中必须保留 `type` 字段,后续才能继续做:
- confusion-aware negative mining
- melody-aware reranking
- 双支路 hard-case curriculum
---
## 6. 细节附录
......
......@@ -10,6 +10,12 @@
2. **Music Foundation Model 作为 backbone / teacher**
3. **Band-split / band-aware 结构用于音乐频谱建模**
对本项目当前阶段的直接结论:
- **仅靠样本重复或统一加权不是 SOTA 思路**
- 更接近 2026 工业最佳实践的是:**retrieval-first + hard negative mining + foundation model backbone + 任务专门支路**
- 我们当前仓库已经走到其中两步:`128 Mel + band-split``retrieval-first eval`
- 下一步最该补的是:`confusion-aware negatives``humming melody tower`
## 1. 方向图
......@@ -99,6 +105,21 @@ flowchart LR
如果你之后指定了准确论文或仓库,我可以按那一版精确对齐实现。
### 对当前实验结果的解释
| 策略 | overall top1 | humming_like top1 | confused top1 | 结论 |
|---|---:|---:|---:|---|
| naive oversampling (smoke-v4) | 0.40 | 0.00 | 0.00 | 明显退化 |
| type-aware weighting (smoke-v5) | 0.60 | 0.50 | 0.00 | 改善 humming,但 confused 无突破 |
| sample-level confused-priority weighting (smoke-v6) | 0.65 | 0.25 | 0.25 | confused 突破,但需要重新平衡 humming |
这说明:
1. 2026 年这个方向里,**“难例重要”是对的**
2.**单维度加权还不够**,需要把不同 hard case 分开建模
3. 对音乐 ACR 来说,`confused``humming_like` 不是同一种难度来源:
- `confused` 更偏 timbre / arrangement / retrieval ambiguity
- `humming_like` 更偏 melody / pitch contour mismatch
## 5. 2026 年是否已经有更好的方案?
有,结论是:**有明显更好的路线**
......@@ -122,6 +143,8 @@ flowchart LR
- triplet / multi-positive metric learning 对比 SupCon
- window-level index aggregation
- FMA / Jamendo 小规模真实数据验证
- confusion-aware negative mining
- humming 专门旋律支路 / pitch contour rerank
### 更后阶段
- humming 专门 melody tower
......