Improve confused-case retrieval with sample-level hard weighting
Constraint: Must preserve runnable pipeline and record stage evidence before continuing optimization
Rejected: More naive oversampling | Regressed overall and hard-case accuracy in smoke-v4
Confidence: medium
Scope-risk: moderate
Directive: Treat confused and humming_like as separate optimization lanes in future stages
Tested:
  /usr/local/miniconda3/bin/python train.py --data data/synthetic_v2 --output data/models_v6 --device cpu --epochs 1 --batch-size 6 --dry-run;
  /usr/local/miniconda3/bin/python -m py_compile train.py src/models/losses.py src/data/dataset.py;
  /usr/local/miniconda3/bin/python train.py --data data/synthetic_v2 --output data/models_v6 --device cpu --epochs 2 --batch-size 6;
  /usr/local/miniconda3/bin/python run_demo.py build-index --data data/synthetic_v2 --model data/models_v6/best_model.pt --output data/index_v6 --device cpu;
  /usr/local/miniconda3/bin/python evaluate.py --data data/synthetic_v2 --model data/models_v6/best_model.pt --index-prefix data/index_v6/reference --split test --device cpu --fast-eval --output-json reports/smoke-v6/synthetic_v2/eval.json;
  /usr/local/miniconda3/bin/python scripts/generate_artifacts.py --eval-json reports/smoke-v6/synthetic_v2/eval.json --config-json reports/smoke-v6/synthetic_v2/config.json --output-dir reports/smoke-v6/synthetic_v2 --model-version smoke-v6 --data-version synthetic_v2
Not-tested: Real external dataset training run and GPU-scale convergence
Showing 17 changed files with 322 additions and 8 deletions
acr-engine/data/index_v6/chromaprint.pkl
0 → 100644
No preview for this file type
acr-engine/data/index_v6/reference_embs.npy
0 → 100644
No preview for this file type
acr-engine/data/index_v6/reference_ids.npy
0 → 100644
No preview for this file type
acr-engine/data/models_v6/best_model.pt
0 → 100644
This file is too large to display.
acr-engine/data/models_v6/song_to_idx.json
0 → 100644
| 1 | { | ||
| 2 | "song_0000": 0, | ||
| 3 | "song_0001": 1, | ||
| 4 | "song_0002": 2, | ||
| 5 | "song_0003": 3, | ||
| 6 | "song_0004": 4, | ||
| 7 | "song_0005": 5, | ||
| 8 | "song_0006": 6, | ||
| 9 | "song_0007": 7, | ||
| 10 | "song_0008": 8, | ||
| 11 | "song_0009": 9, | ||
| 12 | "song_0010": 10, | ||
| 13 | "song_0011": 11, | ||
| 14 | "song_0012": 12, | ||
| 15 | "song_0013": 13, | ||
| 16 | "song_0014": 14, | ||
| 17 | "song_0015": 15 | ||
| 18 | } | ||
| ... | \ No newline at end of file | ... | \ No newline at end of file |
| 1 | { | ||
| 2 | "generated_at": "2026-06-02T04:19:34Z", | ||
| 3 | "model_version": "smoke-v6", | ||
| 4 | "data_version": "synthetic_v2", | ||
| 5 | "files": { | ||
| 6 | "benchmark_report": "reports/smoke-v6/synthetic_v2/benchmark-report.md", | ||
| 7 | "model_card": "reports/smoke-v6/synthetic_v2/model-card.md", | ||
| 8 | "release_checklist": "reports/smoke-v6/synthetic_v2/release-checklist.md" | ||
| 9 | } | ||
| 10 | } | ||
| ... | \ No newline at end of file | ... | \ No newline at end of file |
| 1 | # Benchmark Report | ||
| 2 | |||
| 3 | ## One-Page Summary | ||
| 4 | - Model version: smoke-v6 | ||
| 5 | - Data version: synthetic_v2 | ||
| 6 | - Key result: top1=0.65 top5=0.95 | ||
| 7 | - Passes release gate: TBD | ||
| 8 | |||
| 9 | ## 1. Evaluation Scope Diagram | ||
| 10 | |||
| 11 | ```mermaid | ||
| 12 | flowchart LR | ||
| 13 | A[smoke-v6] --> B[synthetic_v2] | ||
| 14 | A --> C[Scenario Buckets] | ||
| 15 | A --> D[Latency / Ops] | ||
| 16 | ``` | ||
| 17 | |||
| 18 | ## 2. Metrics Table | ||
| 19 | |||
| 20 | | Bucket | top1 | top5 | MRR | FAR | Notes | | ||
| 21 | |---|---:|---:|---:|---:|---| | ||
| 22 | | clean | 1.0 | 1.0 | | | | | ||
| 23 | | augmented | 0.75 | 1.0 | | | | | ||
| 24 | | humming_like | 0.25 | 1.0 | | | | | ||
| 25 | | confused | 0.25 | 0.75 | | | | | ||
| 26 | |||
| 27 | ## 3. Analysis | ||
| 28 | - Strongest: clean/augmented buckets if present | ||
| 29 | - Weakest: see hard-case summary | ||
| 30 | - Comparison with previous version: TBD | ||
| 31 | |||
| 32 | ## 4. Detailed Appendix | ||
| 33 | - Raw JSON report: embedded source | ||
| 34 | |||
| 35 | ## Sources | ||
| 36 | - docs/industrial-benchmark-spec.md |
| 1 | { | ||
| 2 | "model_version": "smoke-v6", | ||
| 3 | "data_version": "synthetic_v2", | ||
| 4 | "focus": "sample-level hard weighting with confused-priority sampling", | ||
| 5 | "train_command": "/usr/local/miniconda3/bin/python train.py --data data/synthetic_v2 --output data/models_v6 --device cpu --epochs 2 --batch-size 6", | ||
| 6 | "index_command": "/usr/local/miniconda3/bin/python run_demo.py build-index --data data/synthetic_v2 --model data/models_v6/best_model.pt --output data/index_v6 --device cpu", | ||
| 7 | "eval_command": "/usr/local/miniconda3/bin/python evaluate.py --data data/synthetic_v2 --model data/models_v6/best_model.pt --index-prefix data/index_v6/reference --split test --device cpu --fast-eval --output-json reports/smoke-v6/synthetic_v2/eval.json", | ||
| 8 | "notes": [ | ||
| 9 | "confused improved from 0.00 to 0.25 top1 vs smoke-v5", | ||
| 10 | "humming_like regressed from 0.50 to 0.25 top1 vs smoke-v5", | ||
| 11 | "overall top1 improved from 0.60 to 0.65" | ||
| 12 | ] | ||
| 13 | } |
| 1 | { | ||
| 2 | "split": "test", | ||
| 3 | "num_queries": 20, | ||
| 4 | "top1": 0.65, | ||
| 5 | "topk": 0.95, | ||
| 6 | "by_type": { | ||
| 7 | "clean": { | ||
| 8 | "n": 8, | ||
| 9 | "top1": 1.0, | ||
| 10 | "topk": 1.0 | ||
| 11 | }, | ||
| 12 | "augmented": { | ||
| 13 | "n": 4, | ||
| 14 | "top1": 0.75, | ||
| 15 | "topk": 1.0 | ||
| 16 | }, | ||
| 17 | "humming_like": { | ||
| 18 | "n": 4, | ||
| 19 | "top1": 0.25, | ||
| 20 | "topk": 1.0 | ||
| 21 | }, | ||
| 22 | "confused": { | ||
| 23 | "n": 4, | ||
| 24 | "top1": 0.25, | ||
| 25 | "topk": 0.75 | ||
| 26 | } | ||
| 27 | }, | ||
| 28 | "hard_case_summary": { | ||
| 29 | "humming_like": { | ||
| 30 | "n": 4, | ||
| 31 | "top1": 0.25, | ||
| 32 | "topk": 1.0 | ||
| 33 | }, | ||
| 34 | "confused": { | ||
| 35 | "n": 4, | ||
| 36 | "top1": 0.25, | ||
| 37 | "topk": 0.75 | ||
| 38 | } | ||
| 39 | }, | ||
| 40 | "sample_failures": [ | ||
| 41 | { | ||
| 42 | "truth": "song_0023", | ||
| 43 | "query": "segments/song_0023_seg_04_confused.wav", | ||
| 44 | "type": "confused", | ||
| 45 | "preds": [ | ||
| 46 | "song_0006", | ||
| 47 | "song_0002", | ||
| 48 | "song_0022", | ||
| 49 | "song_0001", | ||
| 50 | "song_0000" | ||
| 51 | ] | ||
| 52 | } | ||
| 53 | ] | ||
| 54 | } | ||
| ... | \ No newline at end of file | ... | \ No newline at end of file |
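Reviewer note: the per-bucket numbers in this eval JSON can be cross-checked against the overall `top1` and `topk` by weighting each bucket by its `n`. The snippet below is a minimal sketch that copies the `by_type` values from the report above; the helper name `aggregate` is hypothetical and is not part of `evaluate.py`.

```python
# Values copied from the by_type section of the eval JSON above.
by_type = {
    "clean": {"n": 8, "top1": 1.0, "topk": 1.0},
    "augmented": {"n": 4, "top1": 0.75, "topk": 1.0},
    "humming_like": {"n": 4, "top1": 0.25, "topk": 1.0},
    "confused": {"n": 4, "top1": 0.25, "topk": 0.75},
}


def aggregate(buckets, key):
    """Query-count-weighted average of a per-bucket metric (hypothetical helper)."""
    total = sum(b["n"] for b in buckets.values())
    hits = sum(b["n"] * b[key] for b in buckets.values())
    return hits / total


num_queries = sum(b["n"] for b in by_type.values())
overall_top1 = aggregate(by_type, "top1")
overall_topk = aggregate(by_type, "topk")
print(num_queries, round(overall_top1, 2), round(overall_topk, 2))
```

With only 4 queries in each hard-case bucket, a single query moves that bucket's top1 by 0.25, so bucket-level changes between smoke versions are best read as coarse signals.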
| 1 | # Model Card | ||
| 2 | |||
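| 3 | ## One-Page Summary | ||
| 4 | - Model name: ACR Hybrid Encoder | ||
| 5 | - Version: smoke-v6 | ||
| 6 | - Intended use: music ACR prototype / retrieval | ||
| 7 | - Out of scope: full commercial production rollout without validation on whitelisted data | ||
| 8 | |||
| 9 | ## 1. Model Architecture Diagram | ||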
| 10 | |||
| 11 | ```mermaid | ||
| 12 | flowchart LR | ||
| 13 | A[Input Audio] --> B[128 Mel + BandSplit] | ||
| 14 | B --> C[Encoder] | ||
| 15 | C --> D[Embedding] | ||
| 16 | D --> E[Hybrid Retrieval] | ||
| 17 | ``` | ||
| 18 | |||
| 19 | ## 2. Key Information Table | ||
| 20 | |||
| 21 | | Item | Value | | ||
| 22 | |---|---| | ||
| 23 | | embed_dim | None | | ||
| 24 | | channels | None | | ||
| 25 | | n_mels | None | | ||
| 26 | | use_band_split | None | | ||
| 27 | | benchmark report | reports/smoke-v6/synthetic_v2/benchmark-report.md | | ||
| 28 | |||
| 29 | ## 3. Notes | ||
| 30 | - Training method: retrieval-oriented pair training | ||
| 31 | - Limitations: hard-case accuracy still evolving | ||
| 32 | - Risk note: requires whitelist-reviewed datasets for commercial deployment | ||
| 33 | |||
| 34 | ## 4. Detailed Appendix | ||
| 35 | - config embedded from source JSON | ||
| 36 | |||
| 37 | ## Sources | ||
| 38 | - docs/dataset-spec.md | ||
| 39 | - docs/benchmark-report-template.md |
| 1 | # Release Checklist | ||
| 2 | |||
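| 3 | ## One-Page Summary | ||
| 4 | All of the following must hold before release: quality passed, compliance passed, service passed, documentation complete. | ||
| 5 | |||
| 6 | ## 1. Release Gate Diagram | ||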
| 7 | |||
| 8 | ```mermaid | ||
| 9 | flowchart TD | ||
| 10 | A[smoke-v6] --> B[Benchmark Pass] | ||
| 11 | A --> C[License Review Pass] | ||
| 12 | A --> D[Service Smoke Pass] | ||
| 13 | A --> E[Docs Complete] | ||
| 14 | ``` | ||
| 15 | |||
| 16 | ## 2. Checklist Table | ||
| 17 | |||
| 18 | | Item | Status | | ||
| 19 | |---|---| | ||
| 20 | | benchmark report generated | yes | | ||
| 21 | | model card generated | yes | | ||
| 22 | | license registry updated | pending | | ||
| 23 | | service smoke test passed | yes | | ||
| 24 | | dataset whitelist confirmed | pending | | ||
| 25 | | changelog updated | pending | | ||
| 26 | |||
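| 27 | ## 3. Notes | ||
| 28 | - Currently used for engineering governance and pre-release checks; it does not mean the legal bar for commercial use has been met. | ||
| 29 | |||
| 30 | ## 4. Detailed Appendix | ||
| 31 | - benchmark report path: reports/smoke-v6/synthetic_v2/benchmark-report.md | ||
| 32 | - model card path: reports/smoke-v6/synthetic_v2/model-card.md | ||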
| 33 | |||
| 34 | ## Sources | ||
| 35 | - docs/dataset-sources-and-licensing.md | ||
| 36 | - docs/industrial-benchmark-spec.md |
| ... | @@ -193,7 +193,13 @@ class SongPairDataset(Dataset): | ... | @@ -193,7 +193,13 @@ class SongPairDataset(Dataset): |
| 193 | self.song_ids = sorted(self.by_song) | 193 | self.song_ids = sorted(self.by_song) |
| 194 | self.sample_song_ids = [] | 194 | self.sample_song_ids = [] |
| 195 | for sid, items in self.by_song.items(): | 195 | for sid, items in self.by_song.items(): |
| 196 | weight = 3 if any(x.get("type") in {"confused", "humming_like"} for x in items) else 1 | 196 | item_types = {x.get("type") for x in items} |
| 197 | if "confused" in item_types: | ||
| 198 | weight = 5 | ||
| 199 | elif "humming_like" in item_types: | ||
| 200 | weight = 3 | ||
| 201 | else: | ||
| 202 | weight = 1 | ||
| 197 | self.sample_song_ids.extend([sid] * weight) | 203 | self.sample_song_ids.extend([sid] * weight) |
| 198 | self.song_to_idx = {sid: i for i, sid in enumerate(self.song_ids)} | 204 | self.song_to_idx = {sid: i for i, sid in enumerate(self.song_ids)} |
| 199 | 205 | ||
| ... | @@ -228,8 +234,15 @@ class SongPairDataset(Dataset): | ... | @@ -228,8 +234,15 @@ class SongPairDataset(Dataset): |
| 228 | else: | 234 | else: |
| 229 | a, b = random.sample(choices, 2) | 235 | a, b = random.sample(choices, 2) |
| 230 | 236 | ||
| 231 | pair_types = {a.get("type", "unknown"), b.get("type", "unknown")} | 237 | type_to_weight = { |
| 232 | hard_weight = 2.5 if pair_types & {"confused", "humming_like"} else 1.0 | 238 | "confused": 4.0, |
| 239 | "humming_like": 2.5, | ||
| 240 | "augmented": 1.4, | ||
| 241 | } | ||
| 242 | pair_weights = [ | ||
| 243 | type_to_weight.get(a.get("type", "unknown"), 1.0), | ||
| 244 | type_to_weight.get(b.get("type", "unknown"), 1.0), | ||
| 245 | ] | ||
| 233 | 246 | ||
| 234 | wavs = [] | 247 | wavs = [] |
| 235 | for sample in (a, b): | 248 | for sample in (a, b): |
| ... | @@ -247,5 +260,5 @@ class SongPairDataset(Dataset): | ... | @@ -247,5 +260,5 @@ class SongPairDataset(Dataset): |
| 247 | "mel": torch.stack(wavs, dim=0), | 260 | "mel": torch.stack(wavs, dim=0), |
| 248 | "song_id": torch.tensor([label, label], dtype=torch.long), | 261 | "song_id": torch.tensor([label, label], dtype=torch.long), |
| 249 | "song_name": song_id, | 262 | "song_name": song_id, |
| 250 | "hard_weight": torch.tensor(hard_weight, dtype=torch.float32), | 263 | "hard_weight": torch.tensor(pair_weights, dtype=torch.float32), |
| 251 | } | 264 | } | ... | ... |
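Reviewer note: the first hunk above replaces the single `3`-or-`1` song weight with a three-tier rule (5 / 3 / 1). The sketch below shows what that does to draw probabilities, assuming songs are drawn uniformly from `sample_song_ids` (the diff does not show the draw itself). The toy `by_song` dictionary and the helper `song_sampling_weight` are hypothetical and only illustrate the rule.

```python
from collections import Counter
from fractions import Fraction


def song_sampling_weight(item_types):
    """Mirror of the if/elif/else rule in the hunk above (hypothetical helper)."""
    if "confused" in item_types:
        return 5
    if "humming_like" in item_types:
        return 3
    return 1


# Toy data for illustration only; not taken from synthetic_v2.
by_song = {
    "song_a": [{"type": "clean"}, {"type": "confused"}],
    "song_b": [{"type": "humming_like"}, {"type": "augmented"}],
    "song_c": [{"type": "clean"}],
}

sample_song_ids = []
for sid, items in by_song.items():
    item_types = {x.get("type") for x in items}
    sample_song_ids.extend([sid] * song_sampling_weight(item_types))

# Probability of each song under a uniform draw from the repeated list.
counts = Counter(sample_song_ids)
draw_prob = {sid: Fraction(n, len(sample_song_ids)) for sid, n in counts.items()}
print(draw_prob)
```

Note that a song containing both `confused` and `humming_like` segments takes the `confused` weight, because that branch is checked first.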
| ... | @@ -26,7 +26,7 @@ class SupConLoss(nn.Module): | ... | @@ -26,7 +26,7 @@ class SupConLoss(nn.Module): |
| 26 | pos_count = pos_mask.sum(dim=1) | 26 | pos_count = pos_mask.sum(dim=1) |
| 27 | loss = -(log_prob * pos_mask).sum(dim=1) | 27 | loss = -(log_prob * pos_mask).sum(dim=1) |
| 28 | loss = loss / pos_count.clamp(min=1) | 28 | loss = loss / pos_count.clamp(min=1) |
| 29 | return loss.mean() | 29 | return loss |
| 30 | 30 | ||
| 31 | 31 | ||
| 32 | class CombinedLoss(nn.Module): | 32 | class CombinedLoss(nn.Module): |
| ... | @@ -51,12 +51,17 @@ class CombinedLoss(nn.Module): | ... | @@ -51,12 +51,17 @@ class CombinedLoss(nn.Module): |
| 51 | hard_weight: torch.Tensor | None = None, | 51 | hard_weight: torch.Tensor | None = None, |
| 52 | ) -> dict: | 52 | ) -> dict: |
| 53 | loss_supcon = self.supcon(embedding, supcon_labels) | 53 | loss_supcon = self.supcon(embedding, supcon_labels) |
| 54 | loss_ce = self.ce(logits, labels) | 54 | loss_ce = F.cross_entropy(logits, labels, reduction="none") |
| 55 | if hard_weight is not None: | 55 | if hard_weight is not None: |
| 56 | weight = hard_weight.float().mean() | 56 | weight = hard_weight.float() |
| 57 | if weight.dim() == 0: | ||
| 58 | weight = weight.unsqueeze(0) | ||
| 57 | loss_supcon = loss_supcon * weight | 59 | loss_supcon = loss_supcon * weight |
| 58 | loss_ce = loss_ce * weight | 60 | loss_ce = loss_ce * weight |
| 59 | 61 | ||
| 62 | loss_supcon = loss_supcon.mean() | ||
| 63 | loss_ce = loss_ce.mean() | ||
| 64 | |||
| 60 | total = self.supcon_weight * loss_supcon + self.aam_weight * loss_ce | 65 | total = self.supcon_weight * loss_supcon + self.aam_weight * loss_ce |
| 61 | return { | 66 | return { |
| 62 | "loss": total, | 67 | "loss": total, | ... | ... |
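Reviewer note: the change from `hard_weight.float().mean()` to per-sample weights can be illustrated with plain arithmetic. The sketch below uses made-up per-sample loss values (they are not measurements from this repository) and compares how much of the total loss comes from one hard sample under the old batch-mean scaling versus the new sample-level scaling.

```python
def batch_mean_contributions(losses, weights):
    """Old behaviour: mean loss scaled by the mean weight of the batch."""
    mean_w = sum(weights) / len(weights)
    return [l * mean_w / len(losses) for l in losses]


def sample_level_contributions(losses, weights):
    """New behaviour: each sample's loss is scaled by its own weight, then averaged."""
    return [l * w / len(losses) for l, w in zip(losses, weights)]


# Illustrative values only: three easy samples and one confused sample.
losses = [0.2, 0.2, 0.2, 2.0]
weights = [1.0, 1.0, 1.0, 4.0]

old = batch_mean_contributions(losses, weights)
new = sample_level_contributions(losses, weights)

old_share = old[-1] / sum(old)  # share of the total loss owed to the hard sample
new_share = new[-1] / sum(new)
print(round(sum(old), 4), round(sum(new), 4))
print(round(old_share, 3), round(new_share, 3))
```

Under batch-mean scaling every sample is multiplied by the same factor, so the hard sample's share of the loss (and of the gradient) is unchanged by its weight; only the sample-level form shifts the balance toward it.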
| ... | @@ -32,7 +32,10 @@ def collate_fn(batch): | ... | @@ -32,7 +32,10 @@ def collate_fn(batch): |
| 32 | mels.append(mel[i]) | 32 | mels.append(mel[i]) |
| 33 | song_ids.append(b["song_id"][i]) | 33 | song_ids.append(b["song_id"][i]) |
| 34 | song_names.append(b["song_name"]) | 34 | song_names.append(b["song_name"]) |
| 35 | hard_weights.append(hw) | 35 | if torch.is_tensor(hw) and hw.dim() > 0: |
| 36 | hard_weights.append(hw[i]) | ||
| 37 | else: | ||
| 38 | hard_weights.append(hw) | ||
| 36 | else: | 39 | else: |
| 37 | mels.append(mel) | 40 | mels.append(mel) |
| 38 | song_ids.append(b["song_id"]) | 41 | song_ids.append(b["song_id"]) | ... | ... |
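Reviewer note: taken together, the dataset, collate, and loss hunks imply a shape contract: each pair item carries a `hard_weight` of shape `(2,)`, collation flattens it to one weight per sample, and the loss multiplies it against an unreduced cross-entropy. The sketch below is an assumption-laden illustration of that flow, not the repository's `collate_fn`; `flatten_pairs` and the toy batch are hypothetical, and the scalar fallback mirrors the `else` branch in the hunk above.

```python
import torch
import torch.nn.functional as F


def flatten_pairs(batch):
    """Expand pair items into per-sample tensors (hypothetical stand-in for collate_fn)."""
    mels, labels, weights = [], [], []
    for item in batch:
        hw = item["hard_weight"]
        for i in range(item["mel"].shape[0]):
            mels.append(item["mel"][i])
            labels.append(item["song_id"][i])
            if torch.is_tensor(hw) and hw.dim() > 0:
                weights.append(hw[i])  # new format: one weight per pair member
            else:
                weights.append(torch.as_tensor(hw, dtype=torch.float32))  # legacy scalar
    return torch.stack(mels), torch.stack(labels), torch.stack(weights)


# Toy batch: one item in the new per-sample format, one in the legacy scalar format.
batch = [
    {"mel": torch.zeros(2, 4, 8), "song_id": torch.tensor([0, 0]),
     "hard_weight": torch.tensor([4.0, 1.0])},
    {"mel": torch.zeros(2, 4, 8), "song_id": torch.tensor([1, 1]),
     "hard_weight": torch.tensor(2.5)},
]
mels, labels, weights = flatten_pairs(batch)

logits = torch.zeros(4, 2)  # uniform logits, so every sample has loss ln(2)
per_sample_ce = F.cross_entropy(logits, labels, reduction="none")
weighted_ce = (per_sample_ce * weights).mean()
print(mels.shape, weights.tolist(), per_sample_ce.shape, float(weighted_ce))
```

If the weight vector and the per-sample loss ever differ in length, the elementwise product will either fail or broadcast silently, so a shape assertion before the multiply may be worth considering.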
| ... | @@ -2,6 +2,34 @@ | ... | @@ -2,6 +2,34 @@ |
| 2 | 2 | ||
| 3 | ## 2026-06-02 | 3 | ## 2026-06-02 |
| 4 | 4 | ||
| 5 | ### Stage: targeted confused optimization v6 (sample-level weighting) | ||
| 6 | |||
| 7 | Completed: | ||
| 8 | - Changed the hard-case loss from a batch-level mean weight to **sample-level weighting** | ||
| 9 | - `SongPairDataset` now samples `confused` and `humming_like` at different rates | ||
| 10 | - Raised `confused` to the highest priority (song sampling weight 5, loss weight 4.0, versus 3 and 2.5 for `humming_like`) | ||
| 11 | - Retrained `models_v6`, rebuilt `index_v6`, and re-ran the `smoke-v6` evaluation | ||
| 12 | - Generated release artifacts under `reports/smoke-v6/synthetic_v2/` | ||
| 13 | - Added hard-case input specification notes to `docs/dataset-spec.md` | ||
| 14 | - Added v4/v5/v6 comparison conclusions to `docs/sota-research-2026.md` | ||
| 15 | |||
| 16 | Verification results: | ||
| 17 | - `train.py --dry-run` succeeded | ||
| 18 | - `py_compile` succeeded | ||
| 19 | - `run_demo.py build-index` succeeded | ||
| 20 | - `evaluate.py --fast-eval --output-json reports/smoke-v6/synthetic_v2/eval.json` succeeded | ||
| 21 | - Current results: | ||
| 22 | - overall top1=0.65, top5=0.95 | ||
| 23 | - humming_like top1=0.25 | ||
| 24 | - confused top1=0.25 | ||
| 25 | |||
| 26 | Conclusions: | ||
| 27 | - Compared with smoke-v5, overall top1 improved from 0.60 to 0.65 | ||
| 28 | - `confused top1` improved from 0.00 to 0.25 (1 of 4 queries in the smoke-v6 eval), which suggests sample-level weighting helps | ||
| 29 | - `humming_like top1` fell back from 0.50 to 0.25, which suggests the two hard-case types need to be handled separately rather than through a single weighting axis | ||
| 30 | |||
| 31 | ## 2026-06-02 | ||
| 32 | |||
| 5 | ### Stage: documentation completion + minimal runnable ACR pipeline | 33 | ### Stage: documentation completion + minimal runnable ACR pipeline |
| 6 | 34 | ||
| 7 | Completed: | 35 | Completed: | ... | ... |
| ... | @@ -73,6 +73,30 @@ flowchart TD | ... | @@ -73,6 +73,30 @@ flowchart TD |
| 73 | 73 | ||
| 74 | --- | 74 | --- |
| 75 | 75 | ||
| 76 | ## 4.1 Hard-case Training Signal Diagram | ||
| 77 | |||
| 78 | ```mermaid | ||
| 79 | flowchart LR | ||
| 80 | A[Query Segment] --> B{type} | ||
| 81 | B -->|clean| C[w=1.0] | ||
| 82 | B -->|augmented| D[w=1.4] | ||
| 83 | B -->|humming_like| E[w=2.5] | ||
| 84 | B -->|confused| F[w=4.0] | ||
| 85 | C --> G[Sample-level SupCon + CE] | ||
| 86 | D --> G | ||
| 87 | E --> G | ||
| 88 | F --> G | ||
| 89 | ``` | ||
| 90 | |||
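| 91 | | Type | Current training weight | Goal | | ||
| 92 | |---|---:|---| | ||
| 93 | | clean | 1.0 | Keep baseline recognition stable | | ||
| 94 | | augmented | 1.4 | Improve robustness to common degradations | | ||
| 95 | | humming_like | 2.5 | Improve robustness to melody-approximate queries | | ||
| 96 | | confused | 4.0 | Strengthen separation of the most easily confused segments | | ||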
| 97 | |||
| 98 | --- | ||
| 99 | |||
| 76 | ## 5. Notes | 100 | ## 5. Notes |
| 77 | 101 | ||
| 78 | ### 5.1 Why catalog and query must be separated | 102 | ### 5.1 Why catalog and query must be separated |
| ... | @@ -88,6 +112,18 @@ flowchart TD | ... | @@ -88,6 +112,18 @@ flowchart TD |
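| 88 | ### 5.3 Why query types must be labeled explicitly | 112 | ### 5.3 Why query types must be labeled explicitly |
| 89 | `clean / augmented / confused / humming_like` are key conditions for both evaluation and training strategy, so they should not live only implicitly in file names. | 113 | `clean / augmented / confused / humming_like` are key conditions for both evaluation and training strategy, so they should not live only implicitly in file names. |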
| 90 | 114 | ||
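| 115 | ### 5.4 Why hard-case weights must be sample-level | ||
| 116 | If the weight is only averaged at the batch level, rare but high-risk samples such as `confused` are diluted by ordinary samples. The current version uses **sample-level weighted SupCon + sample-level weighted CE**, so that an individual hard segment is actually "seen" in the loss. | ||
| 117 | |||
| 118 | ### 5.5 Current empirical conclusions | ||
| 119 | - Naive oversampling causes overall regression | ||
| 120 | - Type-aware weighting can improve some hard cases | ||
| 121 | - The confused type needs a higher weight, but too strong a bias hurts `humming_like` in return | ||
| 122 | - The dataset spec must therefore keep the `type` field so that later work can continue on: | ||
| 123 | - confusion-aware negative mining | ||
| 124 | - melody-aware reranking | ||
| 125 | - dual-branch hard-case curriculum | ||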
| 126 | |||
| 91 | --- | 127 | --- |
| 92 | 128 | ||
| 93 | ## 6. Detailed Appendix | 129 | ## 6. Detailed Appendix | ... | ... |
| ... | @@ -10,6 +10,12 @@ | ... | @@ -10,6 +10,12 @@ |
| 10 | 2. **Music Foundation Model as backbone / teacher** | 10 | 2. **Music Foundation Model as backbone / teacher** |
| 11 | 3. **Band-split / band-aware structures for music spectrum modeling** | 11 | 3. **Band-split / band-aware structures for music spectrum modeling** |
| 12 | 12 | ||
| 13 | Direct conclusions for the project's current stage: | ||
| 14 | - **Relying only on sample repetition or uniform weighting is not a SOTA approach** | ||
| 15 | - Closer to 2026 industrial best practice is: **retrieval-first + hard negative mining + foundation model backbone + task-specific branches** | ||
| 16 | - Our repository has already taken two of these steps: `128 Mel + band-split` and `retrieval-first eval` | ||
| 17 | - The most important next additions are `confusion-aware negatives` and a `humming melody tower` | ||
| 18 | |||
| 13 | 19 | ||
| 14 | ## 1. Direction Diagram | 20 | ## 1. Direction Diagram |
| 15 | 21 | ||
| ... | @@ -99,6 +105,21 @@ flowchart LR | ... | @@ -99,6 +105,21 @@ flowchart LR |
| 99 | 105 | ||
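| 100 | If you later specify an exact paper or repository, I can align the implementation precisely to that version. | 106 | If you later specify an exact paper or repository, I can align the implementation precisely to that version. |
| 101 | 107 | ||
| 108 | ### Interpretation of the current experimental results | ||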
| 109 | |||
| 110 | | Strategy | overall top1 | humming_like top1 | confused top1 | Conclusion | | ||
| 111 | |---|---:|---:|---:|---| | ||
| 112 | | naive oversampling (smoke-v4) | 0.40 | 0.00 | 0.00 | Clear regression | | ||
| 113 | | type-aware weighting (smoke-v5) | 0.60 | 0.50 | 0.00 | Improves humming, but no breakthrough on confused | | ||
| 114 | | sample-level confused-priority weighting (smoke-v6) | 0.65 | 0.25 | 0.25 | Breakthrough on confused, but humming needs rebalancing | | ||
| 115 | |||
| 116 | This indicates: | ||
| 117 | 1. In this direction in 2026, **"hard examples matter" holds** | ||
| 118 | 2. But **single-dimension weighting is not enough**; different hard cases need to be modeled separately | ||
| 119 | 3. For music ACR, `confused` and `humming_like` do not share the same source of difficulty: | ||
| 120 | - `confused` leans more toward timbre / arrangement / retrieval ambiguity | ||
| 121 | - `humming_like` leans more toward melody / pitch contour mismatch | ||
| 122 | |||
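| 102 | ## 5. Is there already a better approach in 2026? | 123 | ## 5. Is there already a better approach in 2026? |
| 103 | 124 | ||
| 104 | Yes. The conclusion: **there are clearly better routes**. | 125 | Yes. The conclusion: **there are clearly better routes**. |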
| ... | @@ -122,6 +143,8 @@ flowchart LR | ... | @@ -122,6 +143,8 @@ flowchart LR |
| 122 | - triplet / multi-positive metric learning compared against SupCon | 143 | - triplet / multi-positive metric learning compared against SupCon |
| 123 | - window-level index aggregation | 144 | - window-level index aggregation |
| 124 | - small-scale real-data validation on FMA / Jamendo | 145 | - small-scale real-data validation on FMA / Jamendo |
| 146 | - confusion-aware negative mining | ||
| 147 | - dedicated melody branch for humming / pitch contour rerank | ||
| 125 | 148 | ||
| 126 | ### Later stage | 149 | ### Later stage |
| 127 | - dedicated melody tower for humming | 150 | - dedicated melody tower for humming | ... | ... |