Commit 6ee8c576 6ee8c576b04b4538dcf21a113600f08e1f08adbb by cnb.bofCdSsphPA

Why the Phase-1 docs must explain feature-to-window binding explicitly

Constraint: The current default must stay aligned with the live 4-table song-centric path and the real MERT baseline
Rejected: Re-expanding old multi-layer docs | increases onboarding cost and reintroduces stale states
Confidence: high
Scope-risk: narrow
Directive: Keep future schema docs anchored to live model_name/feature_set_name facts, not aspirational placeholders
Tested: markdown link check under docs; live PostgreSQL spot-check of feature_fact model_name/object_id/song_id lineage
Not-tested: Mermaid rendering in external markdown viewers
1 parent 08d24bd4
1 # Changelog 1 # Changelog
2 2
3 ## 2026-06-04 3 ## 2026-06-04
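4 - Continued converging the docs on the current live main-chain conventions: added binding notes for `feature_fact.object_id -> audio_object(window)`, `window.parent_object_id -> asset`, and `feature_fact.song_id -> media_entity(song)`, plus paired manifest/SQL samples that specifically answer how the Phase-1 open-source model set should be stored and how features relate to audio objects.
5 - Fixed stale state in `docs/session-handoff.md` about the semantic lane and aligned it with the current facts: the live default already writes `chromaprint_matcher + mert-v1-95m`, and MuQ remains the next-stage challenger.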
6
7 ## 2026-06-04
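4 - Fresh runtime progress: `torch-2.12.0+cpu`, `torchaudio-2.11.0+cpu`, and `transformers-5.10.1` are now installed on the current host. Rerunning the song-centric main chain confirmed `semantic_runtime_available = true`, `semantic_runtime_ready_count = 5`, and `semantic_fallback_count = 0`. The semantic lane has moved from fallback to `mert-v1-95m`; the next step is to add the `MuQ` adapter without breaking the current MERT baseline. 8 - Fresh runtime progress: `torch-2.12.0+cpu`, `torchaudio-2.11.0+cpu`, and `transformers-5.10.1` are now installed on the current host. Rerunning the song-centric main chain confirmed `semantic_runtime_available = true`, `semantic_runtime_ready_count = 5`, and `semantic_fallback_count = 0`. The semantic lane has moved from fallback to `mert-v1-95m`; the next step is to add the `MuQ` adapter without breaking the current MERT baseline.
5 - Recorded MuQ integration leads: based on the repository's existing Phase-1 scripts and external model leads, the next step can try `OpenMuQ/MuQ-large-msd-iter` first as the minimal integration target for the MuQ challenger; the official loading entry point can be tried first as `from muq import MuQ` + `MuQ.from_pretrained("OpenMuQ/MuQ-large-msd-iter")` 9 - Recorded MuQ integration leads: based on the repository's existing Phase-1 scripts and external model leads, the next step can try `OpenMuQ/MuQ-large-msd-iter` first as the minimal integration target for the MuQ challenger; the official loading entry point can be tried first as `from muq import MuQ` + `MuQ.from_pretrained("OpenMuQ/MuQ-large-msd-iter")`
6 - Fresh MuQ progress: the `muq` package is installed on the current host, but `import muq` still fails with `RuntimeError: operator torchvision::nms does not exist`; the blocker has moved from "MuQ not installed" to "torchvision compatibility issue". 10 - Fresh MuQ progress: the `muq` package is installed on the current host, but `import muq` still fails with `RuntimeError: operator torchvision::nms does not exist`; the blocker has moved from "MuQ not installed" to "torchvision compatibility issue".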
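
To tell "not installed" apart from "installed but broken at import time", a small probe can help. This is a minimal sketch, not part of the repository: `probe_import` is a hypothetical helper name, and it only reports whatever exception the import raises.

```python
import importlib


def probe_import(module_name: str) -> tuple[bool, str]:
    """Try to import a module; return (ok, error_text).

    Hypothetical helper for illustration. A missing package surfaces as
    ModuleNotFoundError, while a package that is installed but fails while
    loading (for example a RuntimeError raised during import) is reported
    with its own exception type.
    """
    try:
        importlib.import_module(module_name)
    except Exception as exc:  # import-time failures can be any exception type
        return False, f"{type(exc).__name__}: {exc}"
    return True, ""
```

Calling `probe_import("muq")` on the host described above would be expected to return `False` with the `RuntimeError` text rather than a `ModuleNotFoundError`, which is the distinction the changelog entry draws.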
......
...@@ -73,8 +73,8 @@ acr-engine/scripts/start_songcentric_shortest_path.sh 'postgres://d2:d2pass@127. ...@@ -73,8 +73,8 @@ acr-engine/scripts/start_songcentric_shortest_path.sh 'postgres://d2:d2pass@127.
73 73
74 - [start-here.md](./start-here.md): 10-minute onboarding entry point for newcomers 74 - [start-here.md](./start-here.md): 10-minute onboarding entry point for newcomers
75 - [session-handoff.md](./session-handoff.md): where to resume in the next session 75 - [session-handoff.md](./session-handoff.md): where to resume in the next session
76 - [postgresql-data-model.md](./postgresql-data-model.md): table design, field semantics, flow diagrams, design trade-offs 76 - [postgresql-data-model.md](./postgresql-data-model.md): table design, field semantics, feature-to-audio_object binding, Phase-1 model storage conventions
77 - [postgres_db_schema_samples.md](./postgres_db_schema_samples.md): DDL, sample data, typical SQL, import and query paths 77 - [postgres_db_schema_samples.md](./postgres_db_schema_samples.md): DDL, manifest/SQL samples, typical query paths, real storage examples
78 - [CHANGELOG.md](./CHANGELOG.md): change history 78 - [CHANGELOG.md](./CHANGELOG.md): change history
79 79
80 --- 80 ---
......
...@@ -33,14 +33,39 @@ song -> asset -> window -> fingerprint / embedding ...@@ -33,14 +33,39 @@ song -> asset -> window -> fingerprint / embedding
33 | song | `media_entity` | `entity_type='song'` | `song_000001` | 33 | song | `media_entity` | `entity_type='song'` | `song_000001` |
34 | asset | `audio_object` | `object_type='asset'` | the original wav/mp3/flac file of a song | 34 | asset | `audio_object` | `object_type='asset'` | the original wav/mp3/flac file of a song |
35 | window | `audio_object` | `object_type='window'` | `0-5000ms`, `2500-7500ms` | 35 | window | `audio_object` | `object_type='window'` | `0-5000ms`, `2500-7500ms` |
36 | fingerprint | `feature_fact` | `feature_type='fingerprint'` | chromaprint | 36 | fingerprint | `feature_fact` | `feature_type='fingerprint'` | chromaprint_matcher |
37 | embedding | `feature_fact` | `feature_type='embedding'` | MERT/MuQ/fallback vector | 37 | embedding | `feature_fact` | `feature_type='embedding'` | MERT/MuQ/fallback vector |
38 | model | `feature_fact` | `model_name`, `model_version` | `mert-v1-95m`, `muq-base`, `local_wavehash_embed` | 38 | model | `feature_fact` | `model_name`, `model_version` | `chromaprint_matcher`, `mert-v1-95m`, `muq-large-msd-iter`, `local_wavehash_embed` |
39 | feature set | `feature_fact` | `feature_set_name`, `feature_schema_ver` | `mert_5s_hop2.5_v1` | 39 | feature set | `feature_fact` | `feature_set_name`, `feature_schema_ver` | `mert_5s_hop2.5_v1` |
40 40
41 --- 41 ---
42 42
43 ## 3. DDL 43 ## 3. Phase-1 data binding at a glance
44
45 ```mermaid
46 flowchart LR
47 S[media_entity
48 song] --> A[audio_object
49 asset]
50 A --> W[audio_object
51 window]
52 W --> F1[feature_fact
53 chromaprint_matcher]
54 W --> F2[feature_fact
55 mert-v1-95m]
56 W --> F3[feature_fact
57 muq-large-msd-iter planned]
58 ```
59
60 Key binding fields:
61 - `audio_object.song_id -> media_entity.entity_id`
62 - `window.parent_object_id -> asset.object_id`
63 - `feature_fact.object_id -> window.object_id`
64 - `feature_fact.song_id -> media_entity.entity_id`
65
66 In one sentence: `feature_fact` binds to a specific window, not to an abstract song; `song_id` is still written redundantly so that results can be returned quickly.
67
68 ## 4. DDL
44 69
45 ### 3.1 `media_entity` 70 ### 3.1 `media_entity`
46 71
...@@ -170,9 +195,63 @@ flowchart LR ...@@ -170,9 +195,63 @@ flowchart LR
170 195
171 --- 196 ---
172 197
173 ## 5. Sample data 198 ## 5. Manifest sample before import
199
200 Before importing through the current main chain, the recommended shape places features directly under `windows[].features[]`:
201
202 ```json
203 {
204 "song": {"biz_key": "song_alpha", "title": "song alpha", "artist_name": "artist a"},
205 "asset": {
206 "source_type": "official",
207 "storage_uri": "/workspace/acr-engine/data/songcentric_builder_smoke/song_alpha/artist_a/clip1.wav",
208 "storage_scheme": "file",
209 "checksum": "path:/workspace/acr-engine/data/songcentric_builder_smoke/song_alpha/artist_a/clip1.wav",
210 "codec": "wav",
211 "sample_rate": 16000,
212 "channels": 1,
213 "duration_ms": 8000
214 },
215 "windows": [
216 {
217 "start_ms": 0,
218 "end_ms": 5000,
219 "features": [
220 {
221 "feature_type": "fingerprint",
222 "model_name": "chromaprint_matcher",
223 "model_version": "phase1_local",
224 "feature_set_name": "chromaprint_matcher_5s",
225 "fingerprint_value": "dc0c731425f360787f462da693ff4a50"
226 },
227 {
228 "feature_type": "embedding",
229 "model_name": "mert-v1-95m",
230 "model_version": "hf-main",
231 "feature_set_name": "mert_5s_hop2.5_v1",
232 "feature_schema_ver": "v1",
233 "embedding_dim": 768,
234 "embedding_uri": "inline-mert://19c0162d3bdde235:0:5000",
235 "vector_table_name": "audio_embedding_vector_768_placeholder"
236 }
237 ]
238 }
239 ],
240 "memberships": [
241 {"set_type": "reference_set", "set_name": "phase1_hot_reference_v1", "member_type": "asset", "priority": 100}
242 ]
243 }
244 ```
245
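246 This JSON reads very directly:
247 - `song` decides which `song_id` results finally resolve to
248 - `asset` decides which original audio file is involved
249 - `windows[]` decides the slice boundaries
250 - `windows[].features[]` decides which models have already encoded each slice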
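
The nesting above can be illustrated by flattening a manifest into one row per (window, feature). This is a minimal sketch under assumptions: `flatten_features` is a hypothetical helper, not the repository's importer, and the manifest below is a trimmed copy of the sample with only the keys needed here.

```python
def flatten_features(manifest: dict) -> list[dict]:
    """Turn windows[].features[] into one flat row per (window, feature).

    Hypothetical helper for illustration; the real importer may carry
    more fields and resolve database ids instead of biz_key / storage_uri.
    """
    song_key = manifest["song"]["biz_key"]
    asset_uri = manifest["asset"]["storage_uri"]
    rows = []
    for window in manifest.get("windows", []):
        for feature in window.get("features", []):
            rows.append({
                "song_biz_key": song_key,
                "asset_uri": asset_uri,
                "start_ms": window["start_ms"],
                "end_ms": window["end_ms"],
                "feature_type": feature["feature_type"],
                "model_name": feature["model_name"],
                "feature_set_name": feature.get("feature_set_name"),
            })
    return rows


# Trimmed manifest (assumed subset of the sample's keys).
manifest = {
    "song": {"biz_key": "song_alpha"},
    "asset": {"storage_uri": "clip1.wav"},
    "windows": [{
        "start_ms": 0,
        "end_ms": 5000,
        "features": [
            {"feature_type": "fingerprint", "model_name": "chromaprint_matcher",
             "feature_set_name": "chromaprint_matcher_5s"},
            {"feature_type": "embedding", "model_name": "mert-v1-95m",
             "feature_set_name": "mert_5s_hop2.5_v1"},
        ],
    }],
}

rows = flatten_features(manifest)
```

Each resulting row carries both the window boundaries and the song key, which mirrors how a `feature_fact` row points at a window while also keeping `song_id`.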
251
252 ## 6. Sample data
174 253
175 ### 5.1 Write the song 254 ### 6.1 Write the song
176 255
177 ```sql 256 ```sql
178 insert into media_entity ( 257 insert into media_entity (
...@@ -184,7 +263,7 @@ insert into media_entity ( ...@@ -184,7 +263,7 @@ insert into media_entity (
184 returning entity_id; 263 returning entity_id;
185 ``` 264 ```
186 265
187 ### 5.2 Write the asset 266 ### 6.2 Write the asset
188 267
189 ```sql 268 ```sql
190 insert into audio_object ( 269 insert into audio_object (
...@@ -199,7 +278,7 @@ insert into audio_object ( ...@@ -199,7 +278,7 @@ insert into audio_object (
199 returning object_id; 278 returning object_id;
200 ``` 279 ```
201 280
202 ### 5.3 Write the window 281 ### 6.3 Write the window
203 282
204 ```sql 283 ```sql
205 insert into audio_object ( 284 insert into audio_object (
...@@ -211,7 +290,7 @@ insert into audio_object ( ...@@ -211,7 +290,7 @@ insert into audio_object (
211 returning object_id; 290 returning object_id;
212 ``` 291 ```
213 292
214 ### 5.4 Write the fingerprint 293 ### 6.4 Write the fingerprint
215 294
216 ```sql 295 ```sql
217 insert into feature_fact ( 296 insert into feature_fact (
...@@ -220,13 +299,13 @@ insert into feature_fact ( ...@@ -220,13 +299,13 @@ insert into feature_fact (
220 fingerprint_value, checksum, metadata_json 299 fingerprint_value, checksum, metadata_json
221 ) values ( 300 ) values (
222 'fingerprint', :window_id, :song_id, 301 'fingerprint', :window_id, :song_id,
223 'chromaprint', '1.0', 'chromaprint_5s_v1', 'v1', 302 'chromaprint_matcher', 'phase1_local', 'chromaprint_matcher_5s', 'v1',
224 'AQAAE0mUaEkSZSo...', 'sha256:fp001', 303 'AQAAE0mUaEkSZSo...', 'sha256:fp001',
225 '{"lane":"exact"}'::jsonb 304 '{"lane":"exact"}'::jsonb
226 ); 305 );
227 ``` 306 ```
228 307
229 ### 5.5 Write the embedding 308 ### 6.5 Write the embedding
230 309
231 ```sql 310 ```sql
232 insert into feature_fact ( 311 insert into feature_fact (
...@@ -241,7 +320,7 @@ insert into feature_fact ( ...@@ -241,7 +320,7 @@ insert into feature_fact (
241 ); 320 );
242 ``` 321 ```
243 322
244 ### 5.6 Write the set membership 323 ### 6.6 Write the set membership
245 324
246 ```sql 325 ```sql
247 insert into set_membership ( 326 insert into set_membership (
...@@ -254,7 +333,7 @@ insert into set_membership ( ...@@ -254,7 +333,7 @@ insert into set_membership (
254 333
255 --- 334 ---
256 335
257 ## 6. Typical queries 336 ## 7. Typical queries
258 337
259 ### 6.1 List the assets of a song 338
260 339
...@@ -555,7 +634,7 @@ insert into feature_fact ( ...@@ -555,7 +634,7 @@ insert into feature_fact (
555 fingerprint_value 634 fingerprint_value
556 ) values ( 635 ) values (
557 'fingerprint', :window_id, :song_id, 636 'fingerprint', :window_id, :song_id,
558 'chromaprint', '1.0', 'chromaprint_5s_v1', 'v1', 637 'chromaprint_matcher', 'phase1_local', 'chromaprint_matcher_5s', 'v1',
559 'AQAAE0mUaEkSZSo...' 638 'AQAAE0mUaEkSZSo...'
560 ); 639 );
561 ``` 640 ```
...@@ -583,7 +662,7 @@ insert into feature_fact ( ...@@ -583,7 +662,7 @@ insert into feature_fact (
583 embedding_dim, embedding_uri, vector_table_name 662 embedding_dim, embedding_uri, vector_table_name
584 ) values ( 663 ) values (
585 'embedding', :window_id, :song_id, 664 'embedding', :window_id, :song_id,
586 'muq-base', 'hf-main', 'muq_5s_hop2.5_v1', 'v1', 665 'muq-large-msd-iter', 'hf-main', 'muq_5s_hop2.5_v1', 'v1',
587 768, 's3://bucket/emb/demo_song_win0001_muq.npy', 'audio_embedding_vector_768' 666 768, 's3://bucket/emb/demo_song_win0001_muq.npy', 'audio_embedding_vector_768'
588 ); 667 );
589 ``` 668 ```
...@@ -636,13 +715,50 @@ order by ff.feature_type, ff.model_name; ...@@ -636,13 +715,50 @@ order by ff.feature_type, ff.model_name;
636 715
637 --- 716 ---
638 717
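639 ## 14. A complete multi-asset / multi-window / multi-model sample 718 ## 14. A real binding query sample
719
720 The SQL below answers the question users care about most:
721
722 > How does a feature bind to an audio object and finally resolve back to `song_id`?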
723
724 ```sql
725 select ff.feature_id,
726 ff.feature_type,
727 ff.model_name,
728 ff.model_version,
729 ff.feature_set_name,
730 w.object_id as window_id,
731 w.start_ms,
732 w.end_ms,
733 a.object_id as asset_id,
734 a.storage_uri,
735 s.entity_id as song_id,
736 s.biz_key
737 from feature_fact ff
738 join audio_object w
739 on w.object_id = ff.object_id
740 and w.object_type = 'window'
741 join audio_object a
742 on a.object_id = w.parent_object_id
743 and a.object_type = 'asset'
744 join media_entity s
745 on s.entity_id = ff.song_id
746 where ff.feature_id = :feature_id;
747 ```
748
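749 You can read it as four steps:
750 1. Find the feature row in `feature_fact`
751 2. Follow `object_id` to the `window` it is bound to
752 3. Follow `parent_object_id` to the `asset` that owns the window
753 4. Follow `song_id` to the `song` it finally belongs to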
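
The four steps above can be tried without a PostgreSQL server. The sketch below is an illustration under assumptions: it uses SQLite as a stand-in, the tables are trimmed to the binding columns named in this document (the real DDL has more), and the ids `1001`, `2001`, `3001` are made-up sample values.

```python
import sqlite3

# Trimmed stand-in tables: only the binding columns named in this document.
# SQLite is used only so the sketch runs locally; the live chain is PostgreSQL.
SCHEMA_AND_ROWS = """
create table media_entity (
    entity_id integer primary key, entity_type text, biz_key text
);
create table audio_object (
    object_id integer primary key, object_type text, song_id integer,
    parent_object_id integer, storage_uri text, start_ms integer, end_ms integer
);
create table feature_fact (
    feature_id integer primary key, feature_type text, object_id integer,
    song_id integer, model_name text, feature_set_name text
);
insert into media_entity values (1001, 'song', 'song_alpha');
insert into audio_object values (2001, 'asset', 1001, null, 'clip1.wav', null, null);
insert into audio_object values (3001, 'window', 1001, 2001, null, 0, 5000);
insert into feature_fact values
    (1, 'embedding', 3001, 1001, 'mert-v1-95m', 'mert_5s_hop2.5_v1');
"""

# Same join shape as the binding query: feature -> window -> asset, feature -> song.
BINDING_QUERY = """
select ff.model_name, w.object_id, w.start_ms, w.end_ms,
       a.object_id, a.storage_uri, s.entity_id, s.biz_key
from feature_fact ff
join audio_object w on w.object_id = ff.object_id and w.object_type = 'window'
join audio_object a on a.object_id = w.parent_object_id and a.object_type = 'asset'
join media_entity s on s.entity_id = ff.song_id
where ff.feature_id = ?
"""


def resolve_feature(conn: sqlite3.Connection, feature_id: int):
    """Return (model, window_id, start_ms, end_ms, asset_id, uri, song_id, biz_key) or None."""
    return conn.execute(BINDING_QUERY, (feature_id,)).fetchone()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA_AND_ROWS)
resolved = resolve_feature(conn, 1)
print(resolved)
```

An unknown `feature_id` yields `None`, because the inner joins drop any feature whose window, asset, or song link is missing.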
754
755 ## 15. A complete multi-asset / multi-window / multi-model sample
640 756
641 Assume: 757 Assume:
642 - a single `song_id = 1001` 758 - a single `song_id = 1001`
643 - two audio files: `master.wav` and `ugc_clip.mp3` 759 - two audio files: `master.wav` and `ugc_clip.mp3`
644 - each asset is cut into 2 windows 760 - each asset is cut into 2 windows
645 - every window runs `chromaprint + mert-v1-95m + muq-base` 761 - every window runs `chromaprint_matcher + mert-v1-95m + muq-large-msd-iter`
646 762
647 ### 14.1 Logical structure 763
648 764
...@@ -650,22 +766,22 @@ order by ff.feature_type, ff.model_name; ...@@ -650,22 +766,22 @@ order by ff.feature_type, ff.model_name;
650 song(1001) 766 song(1001)
651 -> asset(2001, master.wav) 767 -> asset(2001, master.wav)
652 -> window(3001, 0-5000) 768 -> window(3001, 0-5000)
653 -> chromaprint 769 -> chromaprint_matcher
654 -> mert-v1-95m 770 -> mert-v1-95m
655 -> muq-base 771 -> muq-large-msd-iter
656 -> window(3002, 2500-7500) 772 -> window(3002, 2500-7500)
657 -> chromaprint 773 -> chromaprint_matcher
658 -> mert-v1-95m 774 -> mert-v1-95m
659 -> muq-base 775 -> muq-large-msd-iter
660 -> asset(2002, ugc_clip.mp3) 776 -> asset(2002, ugc_clip.mp3)
661 -> window(3003, 10000-15000) 777 -> window(3003, 10000-15000)
662 -> chromaprint 778 -> chromaprint_matcher
663 -> mert-v1-95m 779 -> mert-v1-95m
664 -> muq-base 780 -> muq-large-msd-iter
665 -> window(3004, 12500-17500) 781 -> window(3004, 12500-17500)
666 -> chromaprint 782 -> chromaprint_matcher
667 -> mert-v1-95m 783 -> mert-v1-95m
668 -> muq-base 784 -> muq-large-msd-iter
669 ``` 785 ```
670 786
671 ### 14.2 How many rows this produces 787
...@@ -706,7 +822,7 @@ order by a.object_id, w.start_ms, ff.feature_type, ff.model_name; ...@@ -706,7 +822,7 @@ order by a.object_id, w.start_ms, ff.feature_type, ff.model_name;
706 822
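707 ### 14.4 Find windows missing a given model 823
708 824
709 This SQL is well suited to scanning for backfill jobs, for example checking which windows have not yet run `muq-base` 825 This SQL is well suited to scanning for backfill jobs, for example checking which windows have not yet run `muq-large-msd-iter`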
710 826
711 ```sql 827 ```sql
712 select w.object_id as window_id, 828 select w.object_id as window_id,
...@@ -721,7 +837,7 @@ where w.object_type = 'window' ...@@ -721,7 +837,7 @@ where w.object_type = 'window'
721 from feature_fact ff 837 from feature_fact ff
722 where ff.object_id = w.object_id 838 where ff.object_id = w.object_id
723 and ff.feature_type = 'embedding' 839 and ff.feature_type = 'embedding'
724 and ff.model_name = 'muq-base' 840 and ff.model_name = 'muq-large-msd-iter'
725 and ff.model_version = 'hf-main' 841 and ff.model_version = 'hf-main'
726 and ff.feature_set_name = 'muq_5s_hop2.5_v1' 842 and ff.feature_set_name = 'muq_5s_hop2.5_v1'
727 ) 843 )
...@@ -746,7 +862,7 @@ order by w.start_ms; ...@@ -746,7 +862,7 @@ order by w.start_ms;
746 862
747 --- 863 ---
748 864
749 ## 15. Bulk ingestion and index-building sample 865 ## 16. Bulk ingestion and index-building sample
750 866
751 ### 15.1 Recommended batch order 867
752 868
...@@ -756,7 +872,7 @@ batch-2: audio_object(asset) ...@@ -756,7 +872,7 @@ batch-2: audio_object(asset)
756 batch-3: audio_object(window) 872 batch-3: audio_object(window)
757 batch-4: feature_fact(chromaprint) 873 batch-4: feature_fact(chromaprint)
758 batch-5: feature_fact(mert-v1-95m) 874 batch-5: feature_fact(mert-v1-95m)
759 batch-6: feature_fact(muq-base) 875 batch-6: feature_fact(muq-large-msd-iter)
760 ``` 876 ```
761 877
762 ### 15.2 Recommended additional indexes 878
......
...@@ -67,7 +67,68 @@ song -> asset -> window -> fingerprint / embedding ...@@ -67,7 +67,68 @@ song -> asset -> window -> fingerprint / embedding
67 | feature set identity | `feature_fact` | `feature_set_name`, `feature_schema_ver` | distinguishes feature config, windowing strategy, and schema version | 67 | feature set identity | `feature_fact` | `feature_set_name`, `feature_schema_ver` | distinguishes feature config, windowing strategy, and schema version |
68 | reference routing | `set_membership` | `set_type`, `set_name` | controls the reference/eval/hot scope | 68 | reference routing | `set_membership` | `set_type`, `set_name` | controls the reference/eval/hot scope |
69 69
70 ### 4.1 One key design point 70 ### 4.1 How features actually bind to audio_object
71
72 This is the most important layer of the current schema:
73
74 ```text
75 feature_fact.object_id -> audio_object.object_id
76 ```
77
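78 Meaning:
79 - each `feature_fact` row always corresponds to one concrete audio object
80 - in the current Phase-1 main chain, that object is a `window` by default
81 - so the smallest unit of evidence for a retrieval hit is the `window`, not the whole song and not the whole asset
82
83 Tracing further upward: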
84
85 ```text
86 feature_fact.object_id -> window.object_id
87 window.parent_object_id -> asset.object_id
88 window.song_id / feature_fact.song_id -> media_entity.entity_id
89 ```
90
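91 In other words:
92 - `object_id` binds the feature to the specific audio segment
93 - `parent_object_id` traces that segment back to the asset it belongs to
94 - `song_id` quickly resolves which song the feature finally belongs to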
95
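96 ### 4.2 Why `feature_fact` also stores `song_id` redundantly
97
98 In copyright-protection scenarios, the online service ultimately has to return `song_id` quickly.
99
100 So `feature_fact.song_id` is a **deliberately redundant field** with three purposes:
101 - reduce join cost during song-level aggregation after recall
102 - allow coverage audits directly by `song_id + model_name + feature_type`
103 - make it easy to later collapse `window` hits into song-level evidence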
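
The third purpose can be sketched as follows. This is an illustration under assumptions, not the project's ranking logic: `collapse_to_songs` is a hypothetical helper, and the `score` field and the "best score, then hit count" ordering are invented for the example. The point is only that hits already carrying `song_id` can be grouped without joining back to `audio_object` or `media_entity`.

```python
from collections import defaultdict


def collapse_to_songs(hits):
    """Group window-level hits by song_id and rank the songs.

    Hypothetical sketch: each hit is a dict with song_id, window_id, score.
    Ranking by best score and then hit count is an assumption made for
    illustration only.
    """
    by_song = defaultdict(
        lambda: {"hit_count": 0, "best_score": float("-inf"), "windows": []}
    )
    for hit in hits:
        entry = by_song[hit["song_id"]]
        entry["hit_count"] += 1
        entry["best_score"] = max(entry["best_score"], hit["score"])
        entry["windows"].append(hit["window_id"])
    return sorted(
        by_song.items(),
        key=lambda item: (-item[1]["best_score"], -item[1]["hit_count"]),
    )


# Made-up hits: two windows of song 1001 and one window of song 1002.
hits = [
    {"song_id": 1001, "window_id": 3001, "score": 0.91},
    {"song_id": 1001, "window_id": 3002, "score": 0.88},
    {"song_id": 1002, "window_id": 4001, "score": 0.40},
]
ranked = collapse_to_songs(hits)
```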
104
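105 ### 4.3 Why Phase-1 binds features to `window` by default rather than `asset`
106
107 Because the Phase-1 goal is not only to know roughly which song a piece of audio resembles, but also to keep:
108 - the offset of the hit
109 - the specific 5s segment that was hit
110 - parallel exact / semantic evidence over the same time span
111
112 The default strategy is therefore:
113 - `asset`: holds the original audio file
114 - `window`: the smallest unit for retrieval, matching, and traceback
115 - `feature_fact`: attached to `window` by default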
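
Since windows are the smallest unit, it may help to see how a 5s window with a 2.5s hop tiles an asset. This is a minimal sketch under assumptions: `make_windows` is a hypothetical helper, the 5000 ms / 2500 ms defaults are read off the `mert_5s_hop2.5_v1` naming, and dropping the trailing partial window is an assumption rather than confirmed pipeline behavior.

```python
def make_windows(duration_ms: int, window_ms: int = 5000,
                 hop_ms: int = 2500) -> list[tuple[int, int]]:
    """Return (start_ms, end_ms) pairs for fixed-length sliding windows.

    Hypothetical sketch: only full-length windows are emitted, so any
    tail shorter than window_ms is dropped (an assumption).
    """
    windows = []
    start = 0
    while start + window_ms <= duration_ms:
        windows.append((start, start + window_ms))
        start += hop_ms
    return windows


# An 8000 ms asset yields the two windows used in this document's examples.
windows = make_windows(8000)
```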
116
117 ### 4.4 A minimal chain diagram
118
119 ```mermaid
120 flowchart LR
121 F[feature_fact
122 model_name=mert-v1-95m] --> W[audio_object
123 object_type=window]
124 W --> A[audio_object
125 object_type=asset]
126 W --> S[media_entity
127 entity_type=song]
128 F --> S
129 ```
130
131 ### 4.5 One key design point
71 132
72 Currently, **model information is not kept in a separate registry table that the default main chain depends on**; it is stored directly in `feature_fact` first: 133
73 - this keeps Phase-1 lighter 134
...@@ -610,10 +671,10 @@ flowchart TD ...@@ -610,10 +671,10 @@ flowchart TD
610 671
611 | lane | model_name | model_version | feature_type | purpose | 672 | lane | model_name | model_version | feature_type | purpose |
612 |---|---|---|---|---| 673 |---|---|---|---|---|
613 | exact | `chromaprint` | `1.0` | `fingerprint` | high-precision exact hits | 674 | exact (current live) | `chromaprint_matcher` | `phase1_local` | `fingerprint` | current live exact baseline |
614 | semantic baseline | `mert-v1-95m` | `hf-main` | `embedding` | song semantic baseline | 675 | semantic baseline (current live) | `mert-v1-95m` | `hf-main` | `embedding` | current live semantic baseline |
615 | semantic challenger | `muq-base` | `hf-main` | `embedding` | cover / bgm / complex-interference challenger | 676 | semantic challenger (planned) | `muq-large-msd-iter` | `hf-main` | `embedding` | next-stage cover / bgm / complex-interference challenger |
616 | semantic fallback | `local_wavehash_embed` | `phase1_local` | `embedding` | fallback when the current host lacks the runtime | 677 | semantic fallback | `local_wavehash_embed` | `phase1_local` | `embedding` | fallback when the runtime is unavailable |
617 | historical baseline | `ecapa-tdnn` | `baseline_only` | `embedding` | historical comparison; not recommended to lead Phase-1 | 678 | historical baseline | `ecapa-tdnn` | `baseline_only` | `embedding` | historical comparison; not recommended to lead Phase-1 |
618 679
619 ### 16.2 Which fields should pin down model identity 680
...@@ -635,17 +696,17 @@ flowchart TD ...@@ -635,17 +696,17 @@ flowchart TD
635 ``` 696 ```
636 697
637 For example: 698 For example:
638 - `chromaprint_5s_v1` 699 - `chromaprint_matcher_5s` (current live)
639 - `mert_5s_hop2.5_v1` 700 - `mert_5s_hop2.5_v1` (current live)
640 - `muq_5s_hop2.5_v1` 701 - `muq_5s_hop2.5_v1` (planned)
641 - `wavehash_5s_hop2.5_v1` 702 - `wavehash_5s_hop2.5_v1` (fallback)
642 703
643 ### 16.4 Recommended Phase-1 storage rules 704
644 705
645 #### exact lane 706 #### exact lane
646 - `feature_type = 'fingerprint'` 707 - `feature_type = 'fingerprint'`
647 - `fingerprint_value` is required 708 - `fingerprint_value` is required
648 - `model_name = 'chromaprint'` 709 - `model_name = 'chromaprint_matcher'`
649 - `embedding_uri / vector_table_name` are empty 710 - `embedding_uri / vector_table_name` are empty
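
The exact-lane rules above can be turned into a small pre-import check. This is a hedged sketch: `check_exact_lane_row` is a hypothetical helper, it encodes only the four exact-lane rules visible in this hunk (new side), and it says nothing about the semantic lane, whose rules are not shown here.

```python
def check_exact_lane_row(row: dict) -> list[str]:
    """Return a list of rule violations for an exact-lane feature row.

    Hypothetical validator for illustration; an empty list means the row
    satisfies the four exact-lane rules listed above.
    """
    problems = []
    if row.get("feature_type") != "fingerprint":
        problems.append("feature_type must be 'fingerprint'")
    if not row.get("fingerprint_value"):
        problems.append("fingerprint_value is required")
    if row.get("model_name") != "chromaprint_matcher":
        problems.append("model_name must be 'chromaprint_matcher'")
    for field in ("embedding_uri", "vector_table_name"):
        if row.get(field):
            problems.append(f"{field} must be empty")
    return problems


good_row = {
    "feature_type": "fingerprint",
    "model_name": "chromaprint_matcher",
    "fingerprint_value": "dc0c731425f360787f462da693ff4a50",
}
bad_row = dict(good_row, fingerprint_value="", embedding_uri="s3://bucket/x.npy")
```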
650 711
651 #### semantic lane 712 #### semantic lane
...@@ -674,7 +735,7 @@ flowchart TD ...@@ -674,7 +735,7 @@ flowchart TD
674 3. Cut windows and write `audio_object(window)` 735 3. Cut windows and write `audio_object(window)`
675 4. Run `chromaprint`, write `feature_fact(fingerprint)` 736 4. Run `chromaprint`, write `feature_fact(fingerprint)`
676 5. Run `mert-v1-95m`, write `feature_fact(embedding)` 737 5. Run `mert-v1-95m`, write `feature_fact(embedding)`
677 6. Run `muq-base`, write `feature_fact(embedding)` 738 6. In the next stage, integrate `muq-large-msd-iter` and write `feature_fact(embedding)`
678 7. If the runtime is unavailable, write at least the `local_wavehash_embed` fallback 739 7. If the runtime is unavailable, write at least the `local_wavehash_embed` fallback
679 740
680 The end result is: 741 The end result is:
...@@ -683,7 +744,7 @@ flowchart TD ...@@ -683,7 +744,7 @@ flowchart TD
683 the same window 744 the same window
684 -> 1 chromaprint fingerprint row 745 -> 1 chromaprint fingerprint row
685 -> 1 mert embedding row 746 -> 1 mert embedding row
686 -> 1 muq embedding row 747 -> 1 muq embedding row (once integrated)
687 -> (optional) 1 fallback embedding row 748 -> (optional) 1 fallback embedding row
688 ``` 749 ```
689 750
...@@ -693,7 +754,87 @@ flowchart TD ...@@ -693,7 +754,87 @@ flowchart TD
693 754
694 --- 755 ---
695 756
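696 ## 17. Bulk ingestion and index-building strategy for 1M audio files / 300K songs 757 ## 17. Current live sample: how a feature resolves back to song_id
758
759 The following reflects the actual main-chain conventions in the current PostgreSQL `acr_songcentric_test`:
760
761 - when `feature_type = 'fingerprint'`, the current live `model_name = 'chromaprint_matcher'`
762 - when `feature_type = 'embedding'`, the current live baseline `model_name = 'mert-v1-95m'`
763 - older placeholder / fallback rows still show up in historical tests, but they are not the current default baseline
764
765 ### 17.1 A real manifest sample (before import)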
766
767 ```json
768 {
769 "song": {"biz_key": "song_alpha", "title": "song alpha", "artist_name": "artist a"},
770 "asset": {"storage_uri": ".../clip1.wav", "duration_ms": 8000},
771 "windows": [
772 {
773 "start_ms": 0,
774 "end_ms": 5000,
775 "features": [
776 {
777 "feature_type": "fingerprint",
778 "model_name": "chromaprint_matcher",
779 "model_version": "phase1_local",
780 "feature_set_name": "chromaprint_matcher_5s"
781 },
782 {
783 "feature_type": "embedding",
784 "model_name": "mert-v1-95m",
785 "model_version": "hf-main",
786 "feature_set_name": "mert_5s_hop2.5_v1",
787 "embedding_dim": 768
788 }
789 ]
790 }
791 ]
792 }
793 ```
794
795 ### 17.2 What the bindings should look like after import
796
797 ```text
798 media_entity(song_alpha)
799 -> audio_object(asset: clip1.wav)
800 -> audio_object(window: 0-5000)
801 -> feature_fact(fingerprint, chromaprint_matcher)
802 -> feature_fact(embedding, mert-v1-95m)
803 ```
804
805 ### 17.3 Look up which window / asset / song a feature is bound to
806
807 ```sql
808 select ff.feature_id,
809 ff.feature_type,
810 ff.model_name,
811 ff.feature_set_name,
812 w.object_id as window_id,
813 w.start_ms,
814 w.end_ms,
815 a.object_id as asset_id,
816 a.storage_uri,
817 s.entity_id as song_id,
818 s.biz_key
819 from feature_fact ff
820 join audio_object w
821 on w.object_id = ff.object_id
822 and w.object_type = 'window'
823 join audio_object a
824 on a.object_id = w.parent_object_id
825 and a.object_type = 'asset'
826 join media_entity s
827 on s.entity_id = ff.song_id
828 where ff.feature_id = :feature_id;
829 ```
830
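831 This SQL answers:
832 - which model computed the feature
833 - which window it is bound to
834 - which asset that window belongs to
835 - which `song_id` it should finally resolve to
836
837 ## 18. Bulk ingestion and index-building strategy for 1M audio files / 300K songs
697 838
698 At the current scale, the most important principle is not to compute every model in one pass, but: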
699 840
......
...@@ -42,7 +42,7 @@ acr-engine/scripts/start_songcentric_shortest_path.sh 'postgres://d2:d2pass@127. ...@@ -42,7 +42,7 @@ acr-engine/scripts/start_songcentric_shortest_path.sh 'postgres://d2:d2pass@127.
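42 > **The 4-table song-centric schema has been run end to end on live PostgreSQL through the host chain "real directory -> slicing -> exact/semantic feature enrichment -> import -> feature_fact".** 42 > **The 4-table song-centric schema has been run end to end on live PostgreSQL through the host chain "real directory -> slicing -> exact/semantic feature enrichment -> import -> feature_fact".**
43 43
44 The most important next step is: 44 The most important next step is:
45 > **Without breaking this host chain, upgrade the semantic lane from the runtime-aware fallback to real MERT / MuQ adapters.** 45 > **Without breaking this host chain, keep extending the semantic lane from the current MERT baseline to the MuQ challenger.**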
46 46
47 --- 47 ---
48 48
...@@ -114,7 +114,7 @@ flowchart TD ...@@ -114,7 +114,7 @@ flowchart TD
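114 5. real directory -> manifest -> import has been verified 114 5. real directory -> manifest -> import has been verified
115 6. real directory -> fingerprint enrichment -> import has been verified 115 6. real directory -> fingerprint enrichment -> import has been verified
116 7. the exact lane already prefers reusing the in-repo `ChromaprintMatcher` 116 7. the exact lane already prefers reusing the in-repo `ChromaprintMatcher`
117 8. the semantic lane is runtime-ready, and the current host can enter the placeholder runtime branch 117 8. the semantic lane is really wired to the `mert-v1-95m` baseline, and the live main chain on the current host no longer stops at the placeholder branch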
118 118
119 --- 119 ---
120 120
...@@ -169,48 +169,67 @@ flowchart TD ...@@ -169,48 +169,67 @@ flowchart TD
169 169
170 --- 170 ---
171 171
172 ## 10. Where the real semantic adapter should plug in next 172 ## 10. Data bindings and what the live chain currently stores
173 173
174 The most direct integration point is already clear: 174 Only three bindings matter most right now:
175 175
176 - Entry script: `acr-engine/scripts/enrich_songcentric_manifest_with_local_features.py` 176 1. `feature_fact.object_id -> audio_object.object_id`
177 - Key function: `build_semantic_feature(...)` 177 - a feature binds to a concrete audio object
178 178 - Phase-1 binds to `window` by default, not directly to the song
179 ### Current actual state 179 2. `audio_object.parent_object_id -> audio_object.object_id`
180 180 - the `window -> asset` parent-child traceback chain
181 - the exact lane already prefers reusing `ChromaprintMatcher` 181 3. `feature_fact.song_id -> media_entity.entity_id`
182 - the semantic lane is not yet really wired to `MERT / MuQ` 182 - used for fast song-level aggregation and for returning the final `song_id`
183 - when the runtime is ready, it currently produces: 183
184 - `model_name = mert-v1-95m` 184 In one sentence:
185 - the fallback branch is still kept: 185
186 - `model_name = local_wavehash_embed` 186 > `audio_object` says which piece of audio this is; `feature_fact` says which model encoded that audio into which feature.
187 187
188 ### Fresh dependency check facts 188 ### What the live main chain has actually stored
189 189
190 The current host still lacks: 190 New live data is currently written as:
191 - `torch` 191 - exact: `chromaprint_matcher / phase1_local / chromaprint_matcher_5s`
192 - `torchaudio` 192 - semantic baseline: `mert-v1-95m / hf-main / mert_5s_hop2.5_v1`
193 - `transformers` 193
194 194 Current MuQ status:
195 ### Most direct implementation order for the next session 195 - target model: `OpenMuQ/MuQ-large-msd-iter`
196 196 - current blocker: `import muq` raises `RuntimeError: operator torchvision::nms does not exist`
197 1. Install `torch / torchaudio / transformers` 197 - conclusion: MuQ is still the next-stage challenger, not the current live default baseline
198 2. Wire a real `MERT` / `MuQ` adapter inside `build_semantic_feature(...)` 198
199 3. Keep the current `local_wavehash_embed` fallback; do not delete it 199 ### Current manifest shape (before import)
200 4. Rerun: 200
201 201 ```json
202 ```bash 202 {
203 cd /workspace 203 "song": {"biz_key": "song_alpha", "title": "song alpha"},
204 /usr/local/miniconda3/bin/python acr-engine/scripts/run_songcentric_directory_pipeline_live.py \ 204 "asset": {"storage_uri": ".../clip1.wav"},
205 --dsn 'postgres://d2:d2pass@127.0.0.1:5432/d2' \ 205 "windows": [
206 --schema acr_songcentric_test \ 206 {
207 --input-root acr-engine/data/songcentric_builder_smoke \ 207 "start_ms": 0,
208 --output-dir acr-engine/data/pgvector_eval/music20 208 "end_ms": 5000,
209 "features": [
210 {
211 "feature_type": "fingerprint",
212 "model_name": "chromaprint_matcher",
213 "feature_set_name": "chromaprint_matcher_5s"
214 },
215 {
216 "feature_type": "embedding",
217 "model_name": "mert-v1-95m",
218 "feature_set_name": "mert_5s_hop2.5_v1",
219 "embedding_dim": 768
220 }
221 ]
222 }
223 ]
224 }
209 ``` 225 ```
210 226
211 ### Expected changes in fresh metrics 227 ### Most direct continuation point for the next session
212
213 - `semantic_runtime_available = true`
214 - `semantic_runtime_ready_count > 0`
215 - `semantic_fallback_count` drops sharply or reaches zero
216 228
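229 1. Do not re-verify whether MERT is wired in; it already is
230 2. Go straight to the MuQ `torchvision::nms` compatibility issue
231 3. Integrate the `OpenMuQ/MuQ-large-msd-iter` challenger
232 4. Rerun the main-chain runner and confirm that every window ends up with all of:
233 - `chromaprint_matcher`
234 - `mert-v1-95m`
235 - `muq-large-msd-iter` (or whatever `model_name` is finally standardized on)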
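
Step 4 can be checked mechanically once feature rows are exported. This is a minimal sketch under assumptions: `missing_models` is a hypothetical helper, the rows are plain dicts with `window_id` and `model_name` keys (an assumed export shape, not a repository format), and windows with no feature rows at all will not appear in the input, so they need a separate check against `audio_object`.

```python
# Expected per-window models after MuQ is integrated; the MuQ name may
# change if the final model_name is standardized differently.
EXPECTED_MODELS = frozenset(
    {"chromaprint_matcher", "mert-v1-95m", "muq-large-msd-iter"}
)


def missing_models(feature_rows, expected=EXPECTED_MODELS) -> dict:
    """Map window_id -> sorted list of expected models that have no row yet.

    Hypothetical sketch: only windows that already have at least one
    feature row can be reported here.
    """
    seen = {}
    for row in feature_rows:
        seen.setdefault(row["window_id"], set()).add(row["model_name"])
    return {
        window_id: sorted(expected - models)
        for window_id, models in seen.items()
        if expected - models
    }


# Made-up rows: window 3001 is complete, window 3002 still lacks MuQ.
feature_rows = [
    {"window_id": 3001, "model_name": "chromaprint_matcher"},
    {"window_id": 3001, "model_name": "mert-v1-95m"},
    {"window_id": 3001, "model_name": "muq-large-msd-iter"},
    {"window_id": 3002, "model_name": "chromaprint_matcher"},
    {"window_id": 3002, "model_name": "mert-v1-95m"},
]
gaps = missing_models(feature_rows)
```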
......
...@@ -78,6 +78,12 @@ song -> asset -> window -> fingerprint / embedding ...@@ -78,6 +78,12 @@ song -> asset -> window -> fingerprint / embedding
78 | model info | `feature_fact` | `model_name`, `model_version`, `feature_set_name` | 78 | model info | `feature_fact` | `model_name`, `model_version`, `feature_set_name` |
79 | reference/eval/hot sets | `set_membership` | `set_type`, `set_name` | 79 | reference/eval/hot sets | `set_membership` | `set_type`, `set_name` |
80 80
81 Additional notes:
82 - `feature_fact.object_id -> audio_object.object_id`: a feature binds directly to a concrete audio object; Phase-1 binds to `window` by default
83 - `audio_object.parent_object_id`: traces a `window` back to its `asset`
84 - `feature_fact.song_id -> media_entity.entity_id`: a deliberate redundancy for song-level aggregation and fast `song_id` return
85 - For a detailed explanation of just this layer, read section 4 of [postgresql-data-model.md](./postgresql-data-model.md) and section 5 of [postgres_db_schema_samples.md](./postgres_db_schema_samples.md).
86
81 --- 87 ---
82 88
83 ## 5. Current main-chain flow diagram 89 ## 5. Current main-chain flow diagram
...@@ -99,8 +105,9 @@ flowchart TD ...@@ -99,8 +105,9 @@ flowchart TD
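99 - the live PostgreSQL schema has been created successfully 105 - the live PostgreSQL schema has been created successfully
100 - real directory -> manifest -> import works end to end 106 - real directory -> manifest -> import works end to end
101 - real directory -> fingerprint enrichment -> import works end to end 107 - real directory -> fingerprint enrichment -> import works end to end
102 - the semantic lane has been made runtime-ready 108 - the semantic lane is really wired to the `mert-v1-95m` baseline
103 - the current host can enter the runtime-ready placeholder branch; the next step is to add `MuQ` without breaking the current MERT baseline 109 - on the current host, the live main chain already writes `chromaprint_matcher + mert-v1-95m`
110 - the next step is to add the `MuQ` challenger without breaking the current MERT baseline
104 - the exact lane already prefers reusing the in-repo `ChromaprintMatcher` 111 - the exact lane already prefers reusing the in-repo `ChromaprintMatcher`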
105 112
106 --- 113 ---
......