Selected20 SongID Retrieval Evaluation: Reproduction Guide


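This document covers only the selected20 file-level regression evaluation. For the full database pipeline and the "song ingest -> query -> delivery" flow, see song-ingest-query-delivery.md.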


1. Goal

Directory under evaluation:


/root/hikoon_song_files/output/selected_20_songs/downloads


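Under the current scheme:


reference = type_11
query = type_1 / type_7 / type_12 / type_16
the directory name is the song_id


The goal is to check whether, in practice, each query is matched back to its own song_id.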


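2. Current scheme

The scheme evaluated by default in this guide is:


exact lane: chromaprint_matcher

semantic lane: mert-v1-95m

fused: 0.6 * exact + 0.4 * semantic


Notes:


type_11 is treated as the lossless reference

type_1/7/12/16 are treated as queries
the final evaluation metric is the song-level song_id hit rate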


3. Directory layout assumptions

Each song is laid out roughly as follows:

<downloads>/<song_id>/type_1/*
<downloads>/<song_id>/type_7/*
<downloads>/<song_id>/type_11/*
<downloads>/<song_id>/type_12/*
<downloads>/<song_id>/type_16/*


Where:


the directory name <song_id> is the ground truth

type_11 contains at least one file
each query type may contain multiple files
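
The layout above can be walked with a few lines of Python. This is only a minimal sketch: `scan_downloads` and the shape of the returned dict are hypothetical and are not taken from evaluate_selected20_songid_retrieval.py. It assumes that a song without any type_11 file should be skipped, which the actual script may handle differently.

```python
from pathlib import Path

REFERENCE_TYPE = 11
QUERY_TYPES = (1, 7, 12, 16)


def list_files(directory):
    """Return the regular files directly inside a directory, sorted by name."""
    if not directory.is_dir():
        return []
    return sorted(p for p in directory.glob("*") if p.is_file())


def scan_downloads(downloads_dir, reference_type=REFERENCE_TYPE, query_types=QUERY_TYPES):
    """Map each <song_id> directory to its reference files and per-type query files."""
    layout = {}
    for song_dir in sorted(Path(downloads_dir).iterdir()):
        if not song_dir.is_dir():
            continue
        references = list_files(song_dir / f"type_{reference_type}")
        if not references:
            # Assumption: a song without a type_11 file cannot serve as a reference.
            continue
        queries = {}
        for qtype in query_types:
            files = list_files(song_dir / f"type_{qtype}")
            if files:
                queries[qtype] = files
        # The directory name is the ground-truth song_id.
        layout[song_dir.name] = {"references": references, "queries": queries}
    return layout
```

Calling `scan_downloads("/root/hikoon_song_files/output/selected_20_songs/downloads")` would then give one entry per song_id, keyed by the directory name.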


4. Reproduction command

cd /workspace/acr-engine
/usr/local/miniconda3/bin/python scripts/evaluate_selected20_songid_retrieval.py \
  --downloads-dir /root/hikoon_song_files/output/selected_20_songs/downloads \
  --reference-type 11 \
  --query-types 1 7 12 16 \
  --duration 8.0 \
  --topk 3 \
  --exact-weight 0.6 \
  --semantic-weight 0.4 \
  --output-json /workspace/acr-engine/data/local_eval/selected20_songid_eval_report.json \
  --output-md /workspace/docs/selected20_songid_eval.md
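
For orientation, the flags in the command above could be parsed roughly as follows. This is a hypothetical sketch, not the script's real parser: the defaults simply mirror the values used in the command, and the meaning of `--duration` (presumably seconds of audio per query) is an assumption.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the flags used in the reproduction command."""
    parser = argparse.ArgumentParser(
        description="selected20 song_id retrieval evaluation (sketch)"
    )
    parser.add_argument("--downloads-dir", required=True)
    parser.add_argument("--reference-type", type=int, default=11)
    # nargs="+" lets the flag take several space-separated types: 1 7 12 16
    parser.add_argument("--query-types", type=int, nargs="+", default=[1, 7, 12, 16])
    # Assumption: duration is expressed in seconds.
    parser.add_argument("--duration", type=float, default=8.0)
    parser.add_argument("--topk", type=int, default=3)
    parser.add_argument("--exact-weight", type=float, default=0.6)
    parser.add_argument("--semantic-weight", type=float, default=0.4)
    parser.add_argument("--output-json")
    parser.add_argument("--output-md")
    return parser
```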


5. Outputs


JSON report


acr-engine/data/local_eval/selected20_songid_eval_report.json


Contains:


overall top1/top3
per-type top1/top3
exact / semantic / fused rank for each query
failure examples


Markdown summary


docs/selected20_songid_eval.md


Intended for R&D, product, and algorithm colleagues to read directly.


6. Evaluation logic


6.1 Building the reference


Iterate over every song_id

Read the type_11 files as the reference
Build both of the following:


chromaprint_matcher fingerprint index

mert-v1-95m song embedding
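
The reference-building loop can be sketched as below. chromaprint_matcher and the MERT model are not reproduced here: `fingerprint_fn` and `embed_fn` are hypothetical callables standing in for them, and the index structure is an assumption made for illustration only.

```python
def build_reference_index(reference_files, fingerprint_fn, embed_fn):
    """Build a per-song reference index.

    reference_files: dict mapping song_id -> list of type_11 file paths.
    fingerprint_fn / embed_fn: hypothetical stand-ins for the exact-lane
    fingerprinter and the semantic-lane embedder.
    """
    index = {}
    for song_id, files in reference_files.items():
        if not files:
            raise ValueError(f"song {song_id} has no type_11 reference file")
        index[song_id] = {
            # One fingerprint and one embedding per reference file.
            "fingerprints": [fingerprint_fn(path) for path in files],
            "embeddings": [embed_fn(path) for path in files],
        }
    return index
```

Passing the extractors in as arguments keeps the loop independent of the concrete lanes, so a further lane could be added without changing the iteration itself.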


6.2 Query evaluation

For each query file:


Exact lane: chromaprint_matcher.match(...)

Semantic lane: similarity between the query's MERT embedding and each reference song embedding
Fusion:


fused_score = 0.6 * exact_norm + 0.4 * semantic_norm


Output the topk candidates
Check the rank of the true song_id
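
The fusion and ranking steps can be illustrated with a small self-contained sketch. The source does not say how exact_norm and semantic_norm are computed; min-max normalization over the candidate set is used here purely as an assumption, and the function names are hypothetical.

```python
def min_max_normalize(scores):
    """Rescale a {song_id: raw_score} dict to [0, 1]; all-equal scores map to 0.0."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {song_id: 0.0 for song_id in scores}
    return {song_id: (s - lo) / (hi - lo) for song_id, s in scores.items()}


def fuse_and_rank(exact_scores, semantic_scores, exact_weight=0.6, semantic_weight=0.4):
    """Return [(song_id, fused_score), ...] sorted from best to worst."""
    exact_norm = min_max_normalize(exact_scores)
    semantic_norm = min_max_normalize(semantic_scores)
    candidates = set(exact_norm) | set(semantic_norm)
    fused = {
        # A candidate missing from one lane contributes 0.0 for that lane.
        song_id: exact_weight * exact_norm.get(song_id, 0.0)
        + semantic_weight * semantic_norm.get(song_id, 0.0)
        for song_id in candidates
    }
    # Sort by fused score descending; break ties by song_id for determinism.
    return sorted(fused.items(), key=lambda item: (-item[1], item[0]))


def rank_of(true_song_id, ranked):
    """1-based rank of the true song_id, or None if it is not among the candidates."""
    for position, (song_id, _score) in enumerate(ranked, start=1):
        if song_id == true_song_id:
            return position
    return None
```

The topk candidates are then `ranked[:topk]`, and `rank_of(song_id, ranked)` gives the value that feeds the top1/top3 metrics.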


6.3 Metrics


top1: whether the correct song_id is ranked first

top3: whether the correct song_id is ranked in the top 3
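
Aggregated over all queries, these indicators become hit rates. A minimal sketch, assuming each query has been reduced to the 1-based rank of its true song_id (or `None` when it was not retrieved); `hit_rates` is a hypothetical helper name:

```python
def hit_rates(ranks, ks=(1, 3)):
    """Fraction of queries whose true song_id is ranked within the top k, for each k."""
    total = len(ranks)
    if total == 0:
        return {f"top{k}": 0.0 for k in ks}
    return {
        f"top{k}": sum(1 for r in ranks if r is not None and r <= k) / total
        for k in ks
    }
```

For example, the ranks `[1, 2, None, 4, 1]` contain two top1 hits and three top3 hits out of five queries.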


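7. How to read the results

Look at these first:


overall
per query type
failed_fused_examples


Suggested order of interpretation:


exact lane on its own
semantic lane on its own
whether fused beats either single lane
which types are the current main weak spots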
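
The comparison order above can be sketched as a per-type top1 table across the three lanes. The record fields used here (`query_type`, `exact_rank`, `semantic_rank`, `fused_rank`) are assumed names for illustration; the real JSON report may use different keys.

```python
from collections import defaultdict

LANES = ("exact", "semantic", "fused")


def top1_by_type(records):
    """records: dicts with 'query_type' and a 1-based '<lane>_rank' (or None) per lane."""
    hits = defaultdict(lambda: defaultdict(int))
    totals = defaultdict(int)
    for rec in records:
        qtype = rec["query_type"]
        totals[qtype] += 1
        for lane in LANES:
            if rec.get(f"{lane}_rank") == 1:
                hits[qtype][lane] += 1
    return {
        qtype: {lane: hits[qtype][lane] / totals[qtype] for lane in LANES}
        for qtype in totals
    }


def weakest_types(table, lane="fused"):
    """Query types sorted from lowest to highest top1 for the given lane."""
    return sorted(table, key=lambda qtype: (table[qtype][lane], qtype))
```

Reading each row left to right shows whether fused beats either single lane for that type, and `weakest_types(table)` lists the types to investigate first.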


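8. Future extensions

If MuQ is integrated later:


a muq lane can be added to this script
output: exact / mert / muq / fused

compare whether type_7 / type_12 / type_16 improve


If the system later moves to PostgreSQL:


the current file-level query/reference logic can be migrated to song -> asset -> window -> feature_fact

this script is still kept as the small-sample regression baseline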