Why the docs need a first-pass fusion strategy, not just storage paths
Constraint: Phase-1 retrieval must explain how exact and semantic candidates become a ranked song list on the current 4-table schema
Rejected: Defer fusion guidance until model integration finishes | leaves implementation teams without a default ranking contract
Confidence: high
Scope-risk: narrow
Directive: Keep Phase-1 ranking docs exact-led and evidence-oriented until measured recall data justifies a different default
Tested: markdown link check on /workspace/docs after adding fusion diagrams and SQL skeletons
Not-tested: No live retrieval benchmark rerun; this change documents the intended ranking path only
Showing 3 changed files with 205 additions and 0 deletions
| ... | @@ -5,6 +5,7 @@ | ... | @@ -5,6 +5,7 @@ |
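| 5 | - Rewrote `docs/postgresql-data-model.md` to specify how `stored slice data + model + feature` map to tables: `window` goes in `audio_object`, model identity goes in `feature_fact.model_name/model_version/feature_set_name`, and the concrete `fingerprint/embedding` values also go in `feature_fact`. | 5 | - Rewrote `docs/postgresql-data-model.md` to specify how `stored slice data + model + feature` map to tables: `window` goes in `audio_object`, model identity goes in `feature_fact.model_name/model_version/feature_set_name`, and the concrete `fingerprint/embedding` values also go in `feature_fact`. |
| 6 | - Rewrote `docs/postgres_db_schema_samples.md` and the entry docs, adding a flowchart of the current 4-table main chain, typical SQL samples, query trace-back paths, and the write order, and aligning all docs on `media_entity -> audio_object -> feature_fact -> set_membership`. | 6 | - Rewrote `docs/postgres_db_schema_samples.md` and the entry docs, adding a flowchart of the current 4-table main chain, typical SQL samples, query trace-back paths, and the write order, and aligning all docs on `media_entity -> audio_object -> feature_fact -> set_membership`. |
| 7 | - Further strengthened the online retrieval notes: added a `feature_fact -> window -> asset -> song_id` trace-back flowchart and a song-level aggregation SQL template to `docs/postgresql-data-model.md` and `docs/postgres_db_schema_samples.md`, so developers can implement post-recall attribution directly on the current schema. | 7 | - Further strengthened the online retrieval notes: added a `feature_fact -> window -> asset -> song_id` trace-back flowchart and a song-level aggregation SQL template to `docs/postgresql-data-model.md` and `docs/postgres_db_schema_samples.md`, so developers can implement post-recall attribution directly on the current schema. |
| 8 | - Further expanded the retrieval fusion design: added a song-level aggregation flowchart for the exact lane + semantic lane, the rule-based fusion formula, and a SQL skeleton to `docs/postgresql-data-model.md` and `docs/postgres_db_schema_samples.md`, making explicit that Phase-1 ranks with an `exact-led, semantic-assisted` strategy. | ||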
| 8 | 9 | ||
| 9 | ## 2026-06-04 | 10 | ## 2026-06-04 |
| 10 | 11 | ... | ... |
| ... | @@ -437,3 +437,61 @@ group by ff.song_id, s.title, s.artist_name | ... | @@ -437,3 +437,61 @@ group by ff.song_id, s.title, s.artist_name |
| 437 | order by matched_windows desc | 437 | order by matched_windows desc |
| 438 | limit 20; | 438 | limit 20; |
| 439 | ``` | 439 | ``` |
| 440 | |||
| 441 | --- | ||
| 442 | |||
| 443 | ## 12. Exact + semantic dual-lane fusion sample | ||
| 444 | |||
| 445 | ### 12.1 Fusion flowchart | ||
| 446 | |||
| 447 | ```mermaid | ||
| 448 | flowchart TD | ||
| 449 | A[exact candidates] --> C[song aggregation] | ||
| 450 | B[semantic candidates] --> C | ||
| 451 | C --> D[rerank] | ||
| 452 | D --> E[topK song_ids] | ||
| 453 | ``` | ||
| 454 | |||
| 455 | ### 12.2 Recommended Phase-1 fusion formula | ||
| 456 | |||
| 457 | ```text | ||
| 458 | final_song_score = | ||
| 459 | 0.55 * exact_score_norm | ||
| 460 | + 0.35 * semantic_score_norm | ||
| 461 | + 0.10 * coverage_score_norm | ||
| 462 | ``` | ||
| 463 | |||
| 464 | ### 12.3 Fusion aggregation SQL skeleton | ||
| 465 | |||
| 466 | ```sql | ||
| 467 | with matched as ( | ||
| 468 | select ff.song_id, | ||
| 469 | ff.feature_type, | ||
| 470 | w.object_id as window_id, | ||
| 471 | w.parent_object_id as asset_id, | ||
| 472 | w.start_ms, | ||
| 473 | w.end_ms, | ||
| 474 | :score_map[ff.feature_id]::double precision as raw_score | ||
| 475 | from feature_fact ff | ||
| 476 | join audio_object w | ||
| 477 | on w.object_id = ff.object_id | ||
| 478 | and w.object_type = 'window' | ||
| 479 | where ff.feature_id = any(:matched_feature_ids) | ||
| 480 | ) | ||
| 481 | select m.song_id, | ||
| 482 | s.title, | ||
| 483 | s.artist_name, | ||
| 484 | count(*) filter (where m.feature_type = 'fingerprint') as exact_hit_count, | ||
| 485 | count(*) filter (where m.feature_type = 'embedding') as semantic_hit_count, | ||
| 486 | max(raw_score) filter (where m.feature_type = 'fingerprint') as exact_best_score, | ||
| 487 | max(raw_score) filter (where m.feature_type = 'embedding') as semantic_best_score, | ||
| 488 | max(end_ms) - min(start_ms) as offset_coverage_ms | ||
| 489 | from matched m | ||
| 490 | join media_entity s | ||
| 491 | on s.entity_id = m.song_id | ||
| 492 | group by m.song_id, s.title, s.artist_name | ||
| 493 | order by coalesce(max(raw_score) filter (where m.feature_type = 'fingerprint'), 0) desc, | ||
| 494 | coalesce(max(raw_score) filter (where m.feature_type = 'embedding'), 0) desc, | ||
| 495 | offset_coverage_ms desc | ||
| 496 | limit 20; | ||
| 497 | ``` | ... | ... |
| ... | @@ -388,3 +388,149 @@ limit 20; | ... | @@ -388,3 +388,149 @@ limit 20; |
| 388 | - Clip/BGM localization | 388 | - Clip/BGM localization |
| 389 | - evidence lookup | 389 | - evidence lookup |
| 390 | - topK song-level recall | 390 | - topK song-level recall |
| 391 | |||
| 392 | --- | ||
| 393 | |||
| 394 | ## 14. How the exact + semantic lanes fuse into a song ranking | ||
| 395 | |||
| 396 | The current recommendation is to treat online recall as two parallel lanes: | ||
| 397 | |||
| 398 | - **exact lane**: fingerprints such as `chromaprint` | ||
| 399 | - **semantic lane**: `MERT / MuQ / fallback embedding` | ||
| 400 | |||
| 401 | Neither lane should return `feature_id` directly; both must first trace back through: | ||
| 402 | |||
| 403 | ```text | ||
| 404 | feature_fact -> window -> asset -> song | ||
| 405 | ``` | ||
| 406 | |||
| 407 | and only then aggregate at the `song_id` level. | ||
| 408 | |||
| 409 | ### 14.1 Fusion flowchart | ||
| 410 | |||
| 411 | ```mermaid | ||
| 412 | flowchart TD | ||
| 413 | Q[query audio] --> WQ[query windows] | ||
| 414 | WQ --> E1[exact lane\nfingerprint retrieval] | ||
| 415 | WQ --> E2[semantic lane\nembedding retrieval] | ||
| 416 | E1 --> C1[exact candidates\nfeature_fact rows] | ||
| 417 | E2 --> C2[semantic candidates\nfeature_fact rows] | ||
| 418 | C1 --> N1[normalize exact scores] | ||
| 419 | C2 --> N2[normalize semantic scores] | ||
| 420 | N1 --> G[song_id aggregation] | ||
| 421 | N2 --> G | ||
| 422 | G --> R[rerank top songs] | ||
| 423 | R --> O[return topK song_ids + evidence] | ||
| 424 | ``` | ||
| 425 | |||
| 426 | ### 14.2 Signals to look at during song-level aggregation | ||
| 427 | |||
| 428 | Keep at least these aggregate signals: | ||
| 429 | |||
| 430 | - `exact_hit_count` | ||
| 431 | - `semantic_hit_count` | ||
| 432 | - `exact_best_score` | ||
| 433 | - `semantic_best_score` | ||
| 434 | - `matched_asset_count` | ||
| 435 | - `matched_window_count` | ||
| 436 | - `offset_coverage_ms` | ||
| 437 | - `first_hit_ms` | ||
| 438 | - `last_hit_ms` | ||
| 439 | |||
| 440 | ### 14.3 A recommended fusion formula | ||
| 441 | |||
| 442 | Phase-1 can start with **rule-based fusion**; there is no rush to adopt learned ranking: | ||
| 443 | |||
| 444 | ```text | ||
| 445 | final_song_score = | ||
| 446 | 0.55 * exact_score_norm | ||
| 447 | + 0.35 * semantic_score_norm | ||
| 448 | + 0.10 * coverage_score_norm | ||
| 449 | ``` | ||
| 450 | |||
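| 451 | Where: | ||
| 452 | - `exact_score_norm`: song-level exact match strength | ||
| 453 | - `semantic_score_norm`: song-level semantic match strength | ||
| 454 | - `coverage_score_norm`: whether multiple windows contiguously cover the same song | ||
| 455 | |||
| 456 | ### 14.4 Why exact gets the higher weight | ||
| 457 | |||
| 458 | Because the current scenario is copyright protection / song-level ACR: | ||
| 459 | - exact-lane hits usually have higher precision | ||
| 460 | - the semantic lane is better suited to supplementing recall and resisting cover, tempo-change, and BGM interference | ||
| 461 | - so the safer Phase-1 strategy is **exact-led, semantic-assisted** | ||
| 462 | |||
| 463 | ### 14.5 Shape of a fused song-level result (logical view) | ||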
| 464 | |||
| 465 | ```text | ||
| 466 | song_id | ||
| 467 | exact_hit_count | ||
| 468 | semantic_hit_count | ||
| 469 | exact_best_score | ||
| 470 | semantic_best_score | ||
| 471 | offset_coverage_ms | ||
| 472 | final_song_score | ||
| 473 | best_asset_id | ||
| 474 | best_window_id | ||
| 475 | best_model_name | ||
| 476 | ``` | ||
| 477 | |||
| 478 | ### 14.6 Pseudo-SQL aggregation template | ||
| 479 | |||
| 480 | ```sql | ||
| 481 | with matched as ( | ||
| 482 | select ff.song_id, | ||
| 483 | ff.feature_type, | ||
| 484 | ff.model_name, | ||
| 485 | w.object_id as window_id, | ||
| 486 | w.parent_object_id as asset_id, | ||
| 487 | w.start_ms, | ||
| 488 | w.end_ms, | ||
| 489 | :score_map[ff.feature_id]::double precision as raw_score | ||
| 490 | from feature_fact ff | ||
| 491 | join audio_object w | ||
| 492 | on w.object_id = ff.object_id | ||
| 493 | and w.object_type = 'window' | ||
| 494 | where ff.feature_id = any(:matched_feature_ids) | ||
| 495 | ), song_agg as ( | ||
| 496 | select song_id, | ||
| 497 | count(*) filter (where feature_type = 'fingerprint') as exact_hit_count, | ||
| 498 | count(*) filter (where feature_type = 'embedding') as semantic_hit_count, | ||
| 499 | max(raw_score) filter (where feature_type = 'fingerprint') as exact_best_score, | ||
| 500 | max(raw_score) filter (where feature_type = 'embedding') as semantic_best_score, | ||
| 501 | count(distinct asset_id) as matched_asset_count, | ||
| 502 | count(distinct window_id) as matched_window_count, | ||
| 503 | max(end_ms) - min(start_ms) as offset_coverage_ms | ||
| 504 | from matched | ||
| 505 | group by song_id | ||
| 506 | ) | ||
| 507 | select sa.song_id, | ||
| 508 | s.title, | ||
| 509 | s.artist_name, | ||
| 510 | sa.exact_hit_count, | ||
| 511 | sa.semantic_hit_count, | ||
| 512 | sa.exact_best_score, | ||
| 513 | sa.semantic_best_score, | ||
| 514 | sa.matched_asset_count, | ||
| 515 | sa.matched_window_count, | ||
| 516 | sa.offset_coverage_ms | ||
| 517 | from song_agg sa | ||
| 518 | join media_entity s on s.entity_id = sa.song_id | ||
| 519 | order by coalesce(sa.exact_best_score, 0) desc, | ||
| 520 | coalesce(sa.semantic_best_score, 0) desc, | ||
| 521 | sa.offset_coverage_ms desc | ||
| 522 | limit 20; | ||
| 523 | ``` | ||
| 524 | |||
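| 525 | ### 14.7 The most pragmatic implementation order for now | ||
| 526 | |||
| 527 | 1. Fetch the exact lane's topN feature candidates | ||
| 528 | 2. Fetch the semantic lane's topN feature candidates | ||
| 529 | 3. Trace all of them back to `song_id` granularity | ||
| 530 | 4. Apply rule-based fusion in the application layer | ||
| 531 | 5. Output `topK song_id + evidence` | ||
| 532 | |||
| 533 | Benefits of this approach: | ||
| 534 | - The fusion logic does not have to be hard-coded in the database from the start | ||
| 535 | - Weights are easy to tune later | ||
| 536 | - It is easy to compare the gains from `MERT` / `MuQ` / fallback | ... | ... |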
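
The diff gives the Phase-1 weights (section 14.3) and says fusion should happen in the application layer (step 4 of section 14.7), but it does not define how the `*_norm` inputs are computed. The sketch below is one possible reading, not the documented implementation. It assumes raw lane scores are already similarities in [0, 1], treats a lane with no hits (a SQL `NULL`) as 0, and approximates `coverage_score_norm` as the matched span divided by the query duration. The names `SongAgg`, `final_song_score`, `rank_songs`, and `query_duration_ms` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Weights copied from the Phase-1 formula in section 14.3.
W_EXACT, W_SEMANTIC, W_COVERAGE = 0.55, 0.35, 0.10


@dataclass
class SongAgg:
    """One row of the song-level aggregation (fields mirror the SQL template)."""
    song_id: str
    exact_best_score: Optional[float]      # None when the exact lane had no hit
    semantic_best_score: Optional[float]   # None when the semantic lane had no hit
    offset_coverage_ms: int


def clamp01(x: float) -> float:
    """Clamp a value into [0, 1]."""
    return max(0.0, min(1.0, x))


def final_song_score(row: SongAgg, query_duration_ms: int) -> float:
    # Assumption: raw scores are similarities in [0, 1]; a missing lane counts as 0.
    exact = clamp01(row.exact_best_score or 0.0)
    semantic = clamp01(row.semantic_best_score or 0.0)
    # Assumption: coverage is the matched span relative to the query length.
    if query_duration_ms > 0:
        coverage = clamp01(row.offset_coverage_ms / query_duration_ms)
    else:
        coverage = 0.0
    return W_EXACT * exact + W_SEMANTIC * semantic + W_COVERAGE * coverage


def rank_songs(rows: List[SongAgg], query_duration_ms: int,
               top_k: int = 20) -> List[Tuple[str, float]]:
    """Return (song_id, final_song_score) pairs, best first."""
    scored = [(final_song_score(r, query_duration_ms), r.song_id) for r in rows]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(song_id, score) for score, song_id in scored[:top_k]]


rows = [
    SongAgg("a", 0.9, None, 5000),     # exact-only hit, half the query covered
    SongAgg("b", None, 0.95, 10000),   # semantic-only hit, full coverage
    SongAgg("c", 0.5, 0.5, 10000),     # moderate hits in both lanes
]
ranking = rank_songs(rows, query_duration_ms=10000)
```

Under these assumptions a song with moderate hits in both lanes can edge past a strong exact-only hit, which differs from the strict lexicographic `order by` in the SQL templates. Which behavior is wanted is a design decision the weights alone do not settle.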