Commit 43644ac8 43644ac8006f0ff089a93a251e323b64dbe6dcf7 by cnb.bofCdSsphPA

Why the docs need a first-pass fusion strategy, not just storage paths

Constraint: Phase-1 retrieval must explain how exact and semantic candidates become a ranked song list on the current 4-table schema
Rejected: Defer fusion guidance until model integration finishes | leaves implementation teams without a default ranking contract
Confidence: high
Scope-risk: narrow
Directive: Keep Phase-1 ranking docs exact-led and evidence-oriented until measured recall data justifies a different default
Tested: markdown link check on /workspace/docs after adding fusion diagrams and SQL skeletons
Not-tested: No live retrieval benchmark rerun; this change documents the intended ranking path only
1 parent 5869c876
@@ -5,6 +5,7 @@
- Rewrote `docs/postgresql-data-model.md` to spell out where "sliced data + model + feature" are persisted: `window` slices go to `audio_object`, model identity goes to `feature_fact.model_name/model_version/feature_set_name`, and the concrete `fingerprint/embedding` payloads also go to `feature_fact`
- Rewrote `docs/postgres_db_schema_samples.md` and the entry-point docs, adding a flowchart of the current 4-table main chain, typical SQL samples, the query trace-back path, and the write order, and aligning all docs on `media_entity -> audio_object -> feature_fact -> set_membership`
- Further strengthened the online retrieval notes: added a `feature_fact -> window -> asset -> song_id` trace-back flowchart and a song-level aggregation SQL template to `docs/postgresql-data-model.md` and `docs/postgres_db_schema_samples.md`, so engineers can implement post-recall attribution directly on the current schema.
- Further extended the retrieval fusion design: added a song-level aggregation flowchart for the exact lane + semantic lane dual-lane path, the rule-based fusion formula, and SQL skeletons to `docs/postgresql-data-model.md` and `docs/postgres_db_schema_samples.md`, making explicit that Phase-1 ranks with an `exact-led, semantic-supplemented` strategy.

## 2026-06-04

......
@@ -437,3 +437,61 @@ group by ff.song_id, s.title, s.artist_name
```sql
order by matched_windows desc
limit 20;
```

---

## 12. Exact + semantic dual-lane fusion sample

### 12.1 Fusion flowchart

```mermaid
flowchart TD
    A[exact candidates] --> C[song aggregation]
    B[semantic candidates] --> C
    C --> D[rerank]
    D --> E[topK song_ids]
```

### 12.2 Recommended Phase-1 fusion formula

```text
final_song_score =
    0.55 * exact_score_norm
  + 0.35 * semantic_score_norm
  + 0.10 * coverage_score_norm
```

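To make the weighting concrete, here is a minimal Python sketch of the formula above. The function name `final_song_score` and the sample input values are illustrative assumptions, and all three inputs are assumed to be already normalized to [0, 1].

```python
# Weights copied from the Phase-1 rule above; they sum to 1.0.
W_EXACT, W_SEMANTIC, W_COVERAGE = 0.55, 0.35, 0.10


def final_song_score(exact_norm: float, semantic_norm: float, coverage_norm: float) -> float:
    """Weighted sum of three scores assumed to be normalized to [0, 1]."""
    return W_EXACT * exact_norm + W_SEMANTIC * semantic_norm + W_COVERAGE * coverage_norm


# Hypothetical inputs: one song with a strong exact match, one semantic-only match.
exact_led = final_song_score(exact_norm=0.9, semantic_norm=0.2, coverage_norm=0.5)
semantic_only = final_song_score(exact_norm=0.0, semantic_norm=0.9, coverage_norm=0.5)
print(exact_led > semantic_only)
```

With these made-up inputs, the song with a strong exact match outranks the semantic-only song even though its semantic score is much lower, which is the intended effect of the exact-led weighting.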
### 12.3 Fusion aggregation SQL skeleton

```sql
-- Skeleton only: :matched_feature_ids and :score_map are placeholders bound by the
-- application. :score_map[...] stands for a feature_id -> raw score lookup and is
-- not valid PostgreSQL as written.
with matched as (
    select ff.song_id,
           ff.feature_type,
           w.object_id        as window_id,
           w.parent_object_id as asset_id,
           w.start_ms,
           w.end_ms,
           :score_map[ff.feature_id]::double precision as raw_score
    from feature_fact ff
    join audio_object w
      on w.object_id = ff.object_id
     and w.object_type = 'window'
    where ff.feature_id = any(:matched_feature_ids)
)
select m.song_id,
       s.title,
       s.artist_name,
       -- 'fingerprint' rows come from the exact lane, 'embedding' rows from the semantic lane
       count(*) filter (where m.feature_type = 'fingerprint') as exact_hit_count,
       count(*) filter (where m.feature_type = 'embedding') as semantic_hit_count,
       max(m.raw_score) filter (where m.feature_type = 'fingerprint') as exact_best_score,
       max(m.raw_score) filter (where m.feature_type = 'embedding') as semantic_best_score,
       max(m.end_ms) - min(m.start_ms) as offset_coverage_ms
from matched m
join media_entity s
  on s.entity_id = m.song_id
group by m.song_id, s.title, s.artist_name
-- Exact-first ordering: best exact score, then best semantic score, then coverage
order by coalesce(max(m.raw_score) filter (where m.feature_type = 'fingerprint'), 0) desc,
         coalesce(max(m.raw_score) filter (where m.feature_type = 'embedding'), 0) desc,
         offset_coverage_ms desc
limit 20;
```
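The skeleton expects the application to supply `:matched_feature_ids` and a per-feature score lookup. The sketch below shows one way those two inputs might be assembled from the lane results. `build_query_inputs`, the `(feature_id, raw_score)` tuple shape, integer ids, and the "keep the best score for a repeated feature" rule are all assumptions for illustration, not part of the documented schema.

```python
from typing import Dict, Iterable, List, Tuple

Candidate = Tuple[int, float]  # (feature_id, raw_score); integer ids are an assumption


def build_query_inputs(
    exact: Iterable[Candidate], semantic: Iterable[Candidate]
) -> Tuple[List[int], Dict[int, float]]:
    """Merge both lanes into the id list and score lookup the skeleton expects."""
    score_map: Dict[int, float] = {}
    for feature_id, raw_score in list(exact) + list(semantic):
        # Assumption: when a feature is returned more than once
        # (e.g. by several query windows), keep its best score.
        if feature_id not in score_map or raw_score > score_map[feature_id]:
            score_map[feature_id] = raw_score
    matched_feature_ids = sorted(score_map)
    return matched_feature_ids, score_map


# Hypothetical lane outputs; feature 101 is hit by two query windows.
ids, scores = build_query_inputs(
    exact=[(101, 0.92), (102, 0.40), (101, 0.61)],
    semantic=[(201, 0.81)],
)
print(ids, scores)
```

How the two values are then bound to the query depends on the database driver, so that step is left out here.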
......
@@ -388,3 +388,149 @@ limit 20;
- Clip/BGM localization
- evidence lookup
- topK song-level recall

---

## 14. How the exact + semantic dual lanes fuse into song ranking

The current recommendation is to treat online recall as two parallel lanes:

- **exact lane**: fingerprints such as `chromaprint`
- **semantic lane**: `MERT / MuQ / fallback embedding`

Neither lane should return `feature_id` directly. Both must first trace back along:

```text
feature_fact -> window -> asset -> song
```

and only then aggregate at the `song_id` level.

### 14.1 Fusion flowchart

```mermaid
flowchart TD
    Q[query audio] --> WQ[query windows]
    WQ --> E1[exact lane\nfingerprint retrieval]
    WQ --> E2[semantic lane\nembedding retrieval]
    E1 --> C1[exact candidates\nfeature_fact rows]
    E2 --> C2[semantic candidates\nfeature_fact rows]
    C1 --> N1[normalize exact scores]
    C2 --> N2[normalize semantic scores]
    N1 --> G[song_id aggregation]
    N2 --> G
    G --> R[rerank top songs]
    R --> O[return topK song_ids + evidence]
```

### 14.2 What to look at during song-level aggregation

Keep at least these aggregation signals:

- `exact_hit_count`
- `semantic_hit_count`
- `exact_best_score`
- `semantic_best_score`
- `matched_asset_count`
- `matched_window_count`
- `offset_coverage_ms`
- `first_hit_ms`
- `last_hit_ms`

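The signals listed above can be computed in the application layer once each candidate has been traced back to a window, asset, and song. The following is a minimal sketch: the `Hit` record, integer ids, and the helper name `aggregate_by_song` are hypothetical, and `first_hit_ms` / `last_hit_ms` are assumed to mean the smallest `start_ms` and largest `end_ms` among a song's matched windows.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Hit:
    song_id: int
    asset_id: int
    window_id: int
    feature_type: str  # 'fingerprint' (exact lane) or 'embedding' (semantic lane)
    start_ms: int
    end_ms: int
    raw_score: float


def aggregate_by_song(hits: List[Hit]) -> Dict[int, dict]:
    """Group hits by song_id and compute the per-song signals."""
    grouped: Dict[int, List[Hit]] = defaultdict(list)
    for h in hits:
        grouped[h.song_id].append(h)

    signals: Dict[int, dict] = {}
    for song_id, rows in grouped.items():
        exact = [r.raw_score for r in rows if r.feature_type == "fingerprint"]
        semantic = [r.raw_score for r in rows if r.feature_type == "embedding"]
        first_hit_ms = min(r.start_ms for r in rows)  # assumed definition
        last_hit_ms = max(r.end_ms for r in rows)     # assumed definition
        signals[song_id] = {
            "exact_hit_count": len(exact),
            "semantic_hit_count": len(semantic),
            "exact_best_score": max(exact) if exact else None,
            "semantic_best_score": max(semantic) if semantic else None,
            "matched_asset_count": len({r.asset_id for r in rows}),
            "matched_window_count": len({r.window_id for r in rows}),
            "offset_coverage_ms": last_hit_ms - first_hit_ms,
            "first_hit_ms": first_hit_ms,
            "last_hit_ms": last_hit_ms,
        }
    return signals


# Hypothetical hits: song 1 matched by both lanes, song 2 by the semantic lane only.
hits = [
    Hit(1, 10, 100, "fingerprint", 0, 5000, 0.9),
    Hit(1, 10, 101, "embedding", 5000, 10000, 0.7),
    Hit(2, 20, 200, "embedding", 0, 5000, 0.6),
]
signals = aggregate_by_song(hits)
```

A song with no hit in a lane gets `None` for that lane's best score, mirroring what `max(...) filter (...)` returns in SQL when no row qualifies.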
### 14.3 A recommended fusion formula

Phase-1 can start with **rule-based fusion**; there is no rush to adopt learning-to-rank:

```text
final_song_score =
    0.55 * exact_score_norm
  + 0.35 * semantic_score_norm
  + 0.10 * coverage_score_norm
```

where:
- `exact_score_norm`: song-level exact match strength
- `semantic_score_norm`: song-level semantic match strength
- `coverage_score_norm`: whether multiple windows contiguously cover the same song

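The formula takes normalized inputs, but the docs do not fix how the normalization is done. One possible approach, shown purely as an assumption, is to divide each song's best lane score by the largest best score among the candidate songs, and to express coverage as a fraction of the query duration. The helper names `max_normalize` and `coverage_norm` and the `query_duration_ms` parameter are hypothetical.

```python
from typing import Dict, Optional


def max_normalize(best_scores: Dict[int, Optional[float]]) -> Dict[int, float]:
    """Scale per-song best scores into [0, 1] by dividing by the largest one.

    Assumes non-negative, higher-is-better raw scores; songs with no hit in
    the lane (None) get 0.0.
    """
    top = max((s for s in best_scores.values() if s is not None), default=0.0)
    if top <= 0:
        return {song_id: 0.0 for song_id in best_scores}
    return {song_id: (s or 0.0) / top for song_id, s in best_scores.items()}


def coverage_norm(offset_coverage_ms: int, query_duration_ms: int) -> float:
    """Covered span as a fraction of the query length, capped at 1.0."""
    if query_duration_ms <= 0:
        return 0.0
    return min(offset_coverage_ms / query_duration_ms, 1.0)


# Hypothetical per-song exact_best_score values; song 3 has no exact hit.
exact_norm = max_normalize({1: 0.8, 2: 0.4, 3: None})
half_covered = coverage_norm(offset_coverage_ms=15000, query_duration_ms=30000)
over_covered = coverage_norm(offset_coverage_ms=45000, query_duration_ms=30000)
```

Distance-style scores, where lower is better, would need to be inverted before this kind of scaling. Whichever scheme is chosen, it should be checked against measured recall data rather than taken from this sketch.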
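### 14.4 Why exact carries the higher weight

Because the current use case is copyright protection / song-level ACR:
- exact lane hits usually have higher precision
- the semantic lane is better suited to supplementing recall and resisting cover versions, tempo changes, and BGM interference
- so the safer Phase-1 strategy is **exact-led, semantic-supplemented**
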
### 14.5 A fused song-level result structure (logical view)

```text
song_id
exact_hit_count
semantic_hit_count
exact_best_score
semantic_best_score
offset_coverage_ms
final_song_score
best_asset_id
best_window_id
best_model_name
```

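In application code, the logical view above might be carried as a small record type. The sketch below is one possible shape. The class name `SongResult`, the integer id types, the optional best scores, and the sample values (including the `best_model_name` strings) are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SongResult:
    song_id: int
    exact_hit_count: int
    semantic_hit_count: int
    exact_best_score: Optional[float]     # None when the exact lane had no hit
    semantic_best_score: Optional[float]  # None when the semantic lane had no hit
    offset_coverage_ms: int
    final_song_score: float
    best_asset_id: int
    best_window_id: int
    best_model_name: str


# Hypothetical fused rows for two songs.
results: List[SongResult] = [
    SongResult(
        song_id=2, exact_hit_count=0, semantic_hit_count=3,
        exact_best_score=None, semantic_best_score=0.71,
        offset_coverage_ms=12000, final_song_score=0.30,
        best_asset_id=20, best_window_id=200, best_model_name="MuQ",
    ),
    SongResult(
        song_id=1, exact_hit_count=4, semantic_hit_count=2,
        exact_best_score=0.93, semantic_best_score=0.64,
        offset_coverage_ms=25000, final_song_score=0.82,
        best_asset_id=10, best_window_id=100, best_model_name="chromaprint",
    ),
]

# Rank by the fused score, highest first.
top = sorted(results, key=lambda r: r.final_song_score, reverse=True)
```

Keeping `best_asset_id`, `best_window_id`, and `best_model_name` on the row lets a caller go from a ranked song straight to the evidence behind it.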
### 14.6 Pseudo-SQL aggregation template

```sql
-- Pseudo-SQL: :matched_feature_ids and :score_map are placeholders bound by the
-- application. :score_map[...] stands for a feature_id -> raw score lookup and is
-- not valid PostgreSQL as written.
with matched as (
    select ff.song_id,
           ff.feature_type,
           ff.model_name,
           w.object_id        as window_id,
           w.parent_object_id as asset_id,
           w.start_ms,
           w.end_ms,
           :score_map[ff.feature_id]::double precision as raw_score
    from feature_fact ff
    join audio_object w
      on w.object_id = ff.object_id
     and w.object_type = 'window'
    where ff.feature_id = any(:matched_feature_ids)
), song_agg as (
    -- One row per song: 'fingerprint' = exact lane, 'embedding' = semantic lane
    select song_id,
           count(*) filter (where feature_type = 'fingerprint') as exact_hit_count,
           count(*) filter (where feature_type = 'embedding') as semantic_hit_count,
           max(raw_score) filter (where feature_type = 'fingerprint') as exact_best_score,
           max(raw_score) filter (where feature_type = 'embedding') as semantic_best_score,
           count(distinct asset_id) as matched_asset_count,
           count(distinct window_id) as matched_window_count,
           max(end_ms) - min(start_ms) as offset_coverage_ms
    from matched
    group by song_id
)
select sa.song_id,
       s.title,
       s.artist_name,
       sa.exact_hit_count,
       sa.semantic_hit_count,
       sa.exact_best_score,
       sa.semantic_best_score,
       sa.matched_asset_count,
       sa.matched_window_count,
       sa.offset_coverage_ms
from song_agg sa
join media_entity s on s.entity_id = sa.song_id
-- Exact-first ordering; the weighted final_song_score is left to the application layer
order by coalesce(sa.exact_best_score, 0) desc,
         coalesce(sa.semantic_best_score, 0) desc,
         sa.offset_coverage_ms desc
limit 20;
```

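### 14.7 The most pragmatic implementation order for now

1. First fetch the exact lane's topN feature candidates
2. Then fetch the semantic lane's topN feature candidates
3. Trace all of them back to `song_id` granularity
4. Apply rule-based fusion in the application layer
5. Output `topK song_id + evidence`

The benefits of this approach:
- The fusion logic does not have to be hard-coded in the database from the start
- Weights are easy to tune later
- It is easy to compare the gains from `MERT` / `MuQ` / fallback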
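
The five steps above can be sketched end to end as follows. This is an illustration under stated assumptions rather than the project's implementation. The two lane callables, the `feature_to_song` mapping (standing in for the SQL trace-back), the precomputed `coverage_by_song` values, and the assumption that lane scores are already normalized to [0, 1] are all hypothetical.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

Candidate = Tuple[int, float]  # (feature_id, score assumed normalized to [0, 1])
WEIGHTS = {"exact": 0.55, "semantic": 0.35, "coverage": 0.10}


def rank_songs(
    exact_lane: Callable[[], List[Candidate]],
    semantic_lane: Callable[[], List[Candidate]],
    feature_to_song: Dict[int, int],
    coverage_by_song: Dict[int, float],
    top_k: int = 20,
) -> List[Tuple[int, float, dict]]:
    # Steps 1-2: fetch topN feature candidates from each lane.
    lanes = {"exact": exact_lane(), "semantic": semantic_lane()}

    # Step 3: trace every candidate back to song_id, keeping the best score per lane.
    best: Dict[int, Dict[str, float]] = defaultdict(lambda: {"exact": 0.0, "semantic": 0.0})
    evidence: Dict[int, dict] = defaultdict(lambda: {"exact": [], "semantic": []})
    for lane, candidates in lanes.items():
        for feature_id, score in candidates:
            song_id = feature_to_song[feature_id]
            best[song_id][lane] = max(best[song_id][lane], score)
            evidence[song_id][lane].append(feature_id)

    # Step 4: rule-based fusion in the application layer.
    ranked = []
    for song_id, lane_scores in best.items():
        score = (
            WEIGHTS["exact"] * lane_scores["exact"]
            + WEIGHTS["semantic"] * lane_scores["semantic"]
            + WEIGHTS["coverage"] * coverage_by_song.get(song_id, 0.0)
        )
        ranked.append((song_id, score, evidence[song_id]))

    # Step 5: return the topK song_ids together with their evidence.
    ranked.sort(key=lambda item: item[1], reverse=True)
    return ranked[:top_k]


# Hypothetical lane outputs and trace-back mapping.
result = rank_songs(
    exact_lane=lambda: [(101, 0.9), (102, 0.3)],
    semantic_lane=lambda: [(201, 0.8), (202, 0.7)],
    feature_to_song={101: 1, 102: 2, 201: 1, 202: 3},
    coverage_by_song={1: 1.0, 2: 0.2, 3: 0.5},
)
```

Because the weights live in a plain dictionary outside the database, they can be changed and re-evaluated without touching any SQL. Swapping the semantic lane callable is enough to compare `MERT`, `MuQ`, and the fallback embedding.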
......