Commit 43644ac8 43644ac8006f0ff089a93a251e323b64dbe6dcf7 by cnb.bofCdSsphPA

Why the docs need a first-pass fusion strategy, not just storage paths

Constraint: Phase-1 retrieval must explain how exact and semantic candidates become a ranked song list on the current 4-table schema
Rejected: Defer fusion guidance until model integration finishes | leaves implementation teams without a default ranking contract
Confidence: high
Scope-risk: narrow
Directive: Keep Phase-1 ranking docs exact-led and evidence-oriented until measured recall data justifies a different default
Tested: markdown link check on /workspace/docs after adding fusion diagrams and SQL skeletons
Not-tested: No live retrieval benchmark rerun; this change documents the intended ranking path only
1 parent 5869c876
......@@ -5,6 +5,7 @@
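- Rewrote `docs/postgresql-data-model.md` to spell out where `slice data + model + feature` is stored: `window` rows go in `audio_object`, model identity goes in `feature_fact.model_name/model_version/feature_set_name`, and the concrete `fingerprint/embedding` values also go in `feature_fact`
- Rewrote `docs/postgres_db_schema_samples.md` and the entry docs, adding the flowchart of the current 4-table main chain, typical SQL samples, the query trace-back path, and the write order, and aligning all docs on `media_entity -> audio_object -> feature_fact -> set_membership`
- Further strengthened the online retrieval notes: added a `feature_fact -> window -> asset -> song_id` trace-back flowchart and a song-level aggregation SQL template to `docs/postgresql-data-model.md` and `docs/postgres_db_schema_samples.md`, so engineers can implement post-recall attribution directly on the current schema.
- Further extended the retrieval fusion design: added a song-level aggregation flowchart for the exact lane + semantic lane, the rule-based fusion formula, and a SQL skeleton to `docs/postgresql-data-model.md` and `docs/postgres_db_schema_samples.md`, making explicit that Phase-1 uses an `exact-led, semantic-supplemented` ranking strategy.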
## 2026-06-04
......
......@@ -437,3 +437,61 @@ group by ff.song_id, s.title, s.artist_name
order by matched_windows desc
limit 20;
```
---
## 12. exact + semantic dual-lane fusion sample
### 12.1 Fusion flowchart
```mermaid
flowchart TD
A[exact candidates] --> C[song aggregation]
B[semantic candidates] --> C
C --> D[rerank]
D --> E[topK song_ids]
```
### 12.2 Recommended Phase-1 fusion formula
```text
final_song_score =
0.55 * exact_score_norm
+ 0.35 * semantic_score_norm
+ 0.10 * coverage_score_norm
```
### 12.3 Fusion aggregation SQL skeleton
```sql
with matched as (
select ff.song_id,
ff.feature_type,
w.object_id as window_id,
w.parent_object_id as asset_id,
w.start_ms,
w.end_ms,
c.raw_score::double precision as raw_score
-- assumption: the caller binds :matched_feature_ids and :matched_scores as parallel arrays
from unnest(:matched_feature_ids, :matched_scores) as c(feature_id, raw_score)
join feature_fact ff
on ff.feature_id = c.feature_id
join audio_object w
on w.object_id = ff.object_id
and w.object_type = 'window'
)
select m.song_id,
s.title,
s.artist_name,
count(*) filter (where m.feature_type = 'fingerprint') as exact_hit_count,
count(*) filter (where m.feature_type = 'embedding') as semantic_hit_count,
max(raw_score) filter (where m.feature_type = 'fingerprint') as exact_best_score,
max(raw_score) filter (where m.feature_type = 'embedding') as semantic_best_score,
max(end_ms) - min(start_ms) as offset_coverage_ms
from matched m
join media_entity s
on s.entity_id = m.song_id
group by m.song_id, s.title, s.artist_name
order by coalesce(max(raw_score) filter (where m.feature_type = 'fingerprint'), 0) desc,
coalesce(max(raw_score) filter (where m.feature_type = 'embedding'), 0) desc,
offset_coverage_ms desc
limit 20;
```
......
......@@ -388,3 +388,149 @@ limit 20;
- clip/BGM localization
- evidence lookup
- topK song-level recall
---
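## 14. How the exact + semantic lanes fuse into song ranking
The current recommendation is to treat online recall as two parallel lanes:
- **exact lane**: fingerprints such as `chromaprint`
- **semantic lane**: `MERT / MuQ / fallback embedding`
Neither lane should return `feature_id` directly; both must first trace back through: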
```text
feature_fact -> window -> asset -> song
```
and only then aggregate at the `song_id` level.
### 14.1 Fusion flowchart
```mermaid
flowchart TD
Q[query audio] --> WQ[query windows]
WQ --> E1[exact lane\nfingerprint retrieval]
WQ --> E2[semantic lane\nembedding retrieval]
E1 --> C1[exact candidates\nfeature_fact rows]
E2 --> C2[semantic candidates\nfeature_fact rows]
C1 --> N1[normalize exact scores]
C2 --> N2[normalize semantic scores]
N1 --> G[song_id aggregation]
N2 --> G
G --> R[rerank top songs]
R --> O[return topK song_ids + evidence]
```
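The two `normalize` nodes in the flowchart do not prescribe a method. As a minimal sketch, assuming per-lane min-max scaling and assuming that a higher raw score means a better match (a distance metric would need to be inverted first), normalization could look like the following. The helper name `min_max_normalize` and the sample feature ids are hypothetical.

```python
from typing import Dict


def min_max_normalize(raw_scores: Dict[str, float]) -> Dict[str, float]:
    """Scale one lane's raw scores into [0, 1] (hypothetical helper)."""
    if not raw_scores:
        return {}
    lo = min(raw_scores.values())
    hi = max(raw_scores.values())
    if hi == lo:
        # Every candidate ties, so treat them as equally strong.
        return {feature_id: 1.0 for feature_id in raw_scores}
    return {
        feature_id: (score - lo) / (hi - lo)
        for feature_id, score in raw_scores.items()
    }


# Each lane is normalized on its own, so scales never mix across lanes.
exact_norm = min_max_normalize({"f1": 12.0, "f2": 30.0, "f3": 21.0})
semantic_norm = min_max_normalize({"f7": 0.62, "f8": 0.91})
```

Normalizing per lane keeps a fingerprint hit count and an embedding similarity comparable before they meet in the song-level aggregation.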
### 14.2 What to look at during song-level aggregation
At a minimum, keep these aggregate signals:
- `exact_hit_count`
- `semantic_hit_count`
- `exact_best_score`
- `semantic_best_score`
- `matched_asset_count`
- `matched_window_count`
- `offset_coverage_ms`
- `first_hit_ms`
- `last_hit_ms`
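For teams that aggregate in the application layer, the signals above could be computed roughly as follows. This is a sketch under stated assumptions: the `Hit` record and `aggregate_by_song` are hypothetical names, `first_hit_ms` is taken as the smallest `start_ms`, `last_hit_ms` as the largest `end_ms`, and `offset_coverage_ms` as their difference.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Hit:
    """One matched feature_fact row, already traced back to its song (hypothetical)."""
    song_id: int
    feature_type: str  # 'fingerprint' = exact lane, 'embedding' = semantic lane
    asset_id: int
    window_id: int
    start_ms: int
    end_ms: int
    raw_score: float


def aggregate_by_song(hits: List[Hit]) -> Dict[int, dict]:
    grouped = defaultdict(list)
    for hit in hits:
        grouped[hit.song_id].append(hit)

    signals = {}
    for song_id, rows in grouped.items():
        exact = [r.raw_score for r in rows if r.feature_type == "fingerprint"]
        semantic = [r.raw_score for r in rows if r.feature_type == "embedding"]
        first_hit_ms = min(r.start_ms for r in rows)  # assumed definition
        last_hit_ms = max(r.end_ms for r in rows)     # assumed definition
        signals[song_id] = {
            "exact_hit_count": len(exact),
            "semantic_hit_count": len(semantic),
            "exact_best_score": max(exact) if exact else None,
            "semantic_best_score": max(semantic) if semantic else None,
            "matched_asset_count": len({r.asset_id for r in rows}),
            "matched_window_count": len({r.window_id for r in rows}),
            "offset_coverage_ms": last_hit_ms - first_hit_ms,
            "first_hit_ms": first_hit_ms,
            "last_hit_ms": last_hit_ms,
        }
    return signals


hits = [
    Hit(1, "fingerprint", 10, 100, 0, 5000, 0.9),
    Hit(1, "fingerprint", 10, 101, 5000, 10000, 0.7),
    Hit(1, "embedding", 10, 101, 5000, 10000, 0.8),
    Hit(2, "embedding", 20, 200, 30000, 35000, 0.6),
]
signals = aggregate_by_song(hits)
```

A lane with no hits yields `None` for its best score, so the caller can still tell "no hit" apart from "a hit with score 0".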
### 14.3 A recommended fusion formula
Phase-1 can start with **rule-based fusion**; there is no rush to adopt learning-to-rank:
```text
final_song_score =
0.55 * exact_score_norm
+ 0.35 * semantic_score_norm
+ 0.10 * coverage_score_norm
```
where:
- `exact_score_norm`: song-level exact match strength
- `semantic_score_norm`: song-level semantic match strength
- `coverage_score_norm`: whether multiple windows contiguously cover the same song
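The formula can be written as a small function. This is a minimal sketch, assuming all three inputs are already scaled into [0, 1] and that a lane with no hit contributes 0.0; the function name and the sample inputs are illustrative only.

```python
EXACT_WEIGHT = 0.55
SEMANTIC_WEIGHT = 0.35
COVERAGE_WEIGHT = 0.10


def final_song_score(
    exact_score_norm: float,
    semantic_score_norm: float,
    coverage_score_norm: float,
) -> float:
    """Weighted sum of the three normalized song-level signals."""
    return (
        EXACT_WEIGHT * exact_score_norm
        + SEMANTIC_WEIGHT * semantic_score_norm
        + COVERAGE_WEIGHT * coverage_score_norm
    )


# A song with a perfect exact hit but no semantic hit ...
exact_only = final_song_score(1.0, 0.0, 0.5)
# ... outranks a song with a perfect semantic hit but no exact hit.
semantic_only = final_song_score(0.0, 1.0, 0.5)
```

With equal coverage, the exact-only song scores about 0.60 and the semantic-only song about 0.40, so an exact hit decides the order whenever one exists.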
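### 14.4 Why the exact weight is higher
Because the current scenario is copyright protection / song-level ACR:
- when the exact lane hits, its precision is usually higher
- the semantic lane is better suited to supplementing recall and to withstanding covers, tempo changes, and BGM interference
- so the safer Phase-1 strategy is **exact-led, semantic-supplemented**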
### 14.5 Song-level result structure after fusion (logical view)
```text
song_id
exact_hit_count
semantic_hit_count
exact_best_score
semantic_best_score
offset_coverage_ms
final_song_score
best_asset_id
best_window_id
best_model_name
```
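In application code this logical view could map onto a simple record type. The sketch below is illustrative: the field names come from the list above, but every type annotation and every sample value is an assumption, since the document does not fix column types.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class SongResult:
    """One fused song-level row; the types are assumptions, not the schema's."""
    song_id: int
    exact_hit_count: int
    semantic_hit_count: int
    exact_best_score: Optional[float]     # None when the exact lane did not hit
    semantic_best_score: Optional[float]  # None when the semantic lane did not hit
    offset_coverage_ms: int
    final_song_score: float
    best_asset_id: int
    best_window_id: int
    best_model_name: str


row = SongResult(
    song_id=1,
    exact_hit_count=2,
    semantic_hit_count=1,
    exact_best_score=0.9,
    semantic_best_score=0.8,
    offset_coverage_ms=10000,
    final_song_score=0.875,
    best_asset_id=10,
    best_window_id=100,
    best_model_name="chromaprint",
)
payload = asdict(row)  # plain dict, ready to serialize as the evidence payload
```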
### 14.6 Pseudo-SQL aggregation template
```sql
with matched as (
select ff.song_id,
ff.feature_type,
ff.model_name,
w.object_id as window_id,
w.parent_object_id as asset_id,
w.start_ms,
w.end_ms,
c.raw_score::double precision as raw_score
-- assumption: the caller binds :matched_feature_ids and :matched_scores as parallel arrays
from unnest(:matched_feature_ids, :matched_scores) as c(feature_id, raw_score)
join feature_fact ff
on ff.feature_id = c.feature_id
join audio_object w
on w.object_id = ff.object_id
and w.object_type = 'window'
), song_agg as (
select song_id,
count(*) filter (where feature_type = 'fingerprint') as exact_hit_count,
count(*) filter (where feature_type = 'embedding') as semantic_hit_count,
max(raw_score) filter (where feature_type = 'fingerprint') as exact_best_score,
max(raw_score) filter (where feature_type = 'embedding') as semantic_best_score,
count(distinct asset_id) as matched_asset_count,
count(distinct window_id) as matched_window_count,
max(end_ms) - min(start_ms) as offset_coverage_ms
from matched
group by song_id
)
select sa.song_id,
s.title,
s.artist_name,
sa.exact_hit_count,
sa.semantic_hit_count,
sa.exact_best_score,
sa.semantic_best_score,
sa.matched_asset_count,
sa.matched_window_count,
sa.offset_coverage_ms
from song_agg sa
join media_entity s on s.entity_id = sa.song_id
order by coalesce(sa.exact_best_score, 0) desc,
coalesce(sa.semantic_best_score, 0) desc,
sa.offset_coverage_ms desc
limit 20;
```
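### 14.7 The most pragmatic implementation order for now
1. First fetch the exact lane's topN feature candidates
2. Then fetch the semantic lane's topN feature candidates
3. Trace all of them back to `song_id` granularity
4. Apply the rule-based fusion in the application layer
5. Output `topK song_id + evidence`
The benefits of this approach:
- the fusion logic does not have to be hard-coded in the database from the start
- the weights are easy to tune later
- the gains from `MERT` / `MuQ` / fallback are easy to compare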
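The five steps can be sketched end to end in the application layer. This is a hedged illustration, not the project's implementation: `fuse_top_k`, the tuple layout `(song_id, window_id, score_norm)`, and the precomputed `coverage_norm` map are all assumptions, and the candidates are assumed to be already normalized and traced back to `song_id`.

```python
from typing import Dict, List, Tuple

# Weights live in application config, so they can be tuned without a schema change.
WEIGHTS = {"exact": 0.55, "semantic": 0.35, "coverage": 0.10}

Candidate = Tuple[int, int, float]  # (song_id, window_id, score_norm), assumed layout


def fuse_top_k(
    exact_hits: List[Candidate],
    semantic_hits: List[Candidate],
    coverage_norm: Dict[int, float],
    k: int = 20,
) -> List[dict]:
    per_song: Dict[int, dict] = {}
    # Steps 1-3: both lanes' candidates arrive already resolved to song_id.
    for lane, hits in (("exact", exact_hits), ("semantic", semantic_hits)):
        for song_id, window_id, score in hits:
            entry = per_song.setdefault(
                song_id, {"exact": 0.0, "semantic": 0.0, "evidence": []}
            )
            entry[lane] = max(entry[lane], score)  # keep the best score per lane
            entry["evidence"].append((lane, window_id, score))

    # Step 4: rule-based fusion; a missing lane contributes 0.0.
    ranked = [
        {
            "song_id": song_id,
            "final_song_score": WEIGHTS["exact"] * entry["exact"]
            + WEIGHTS["semantic"] * entry["semantic"]
            + WEIGHTS["coverage"] * coverage_norm.get(song_id, 0.0),
            "evidence": entry["evidence"],
        }
        for song_id, entry in per_song.items()
    ]
    # Step 5: return the topK songs together with their evidence.
    ranked.sort(key=lambda r: r["final_song_score"], reverse=True)
    return ranked[:k]


top = fuse_top_k(
    exact_hits=[(1, 100, 0.9), (1, 101, 0.7)],
    semantic_hits=[(1, 101, 0.8), (2, 200, 1.0)],
    coverage_norm={1: 1.0, 2: 0.2},
)
```

Because the weights are plain data, comparing `MERT`, `MuQ`, and the fallback embedding only requires swapping the semantic candidates and rerunning the same function.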
......