Why the docs need explicit bindings between audio objects and feature facts
Constraint: Phase-1 implementers need one concrete explanation for how songs, assets, windows, and open-model features are linked and stored
Rejected: Rely on schema columns alone | does not show the intended per-model storage pattern for the current encoder-only phase
Confidence: high
Scope-risk: narrow
Directive: Keep future model-onboarding docs grounded in feature_fact object_id/song_id bindings unless the schema default changes
Tested: markdown link check on /workspace/docs after adding binding diagrams and SQL storage examples
Not-tested: No live database rerun; this is a documentation clarification over an already-verified schema
Showing 3 changed files with 295 additions and 0 deletions
| ... | @@ -6,6 +6,7 @@ | ... | @@ -6,6 +6,7 @@ |
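| 6 | - Rewrote `docs/postgres_db_schema_samples.md` and the entry-point docs, adding a flowchart of the current 4-table main chain, typical SQL samples, the query trace-back path, and the write order, and aligning the docs on `media_entity -> audio_object -> feature_fact -> set_membership`. | 6 | - Rewrote `docs/postgres_db_schema_samples.md` and the entry-point docs, adding a flowchart of the current 4-table main chain, typical SQL samples, the query trace-back path, and the write order, and aligning the docs on `media_entity -> audio_object -> feature_fact -> set_membership`. |
| 7 | - Further strengthened the online retrieval notes: added a `feature_fact -> window -> asset -> song_id` trace-back flowchart and a song-level aggregation SQL template to `docs/postgresql-data-model.md` and `docs/postgres_db_schema_samples.md`, so developers can implement post-recall song attribution directly against the current schema. | 7 | - Further strengthened the online retrieval notes: added a `feature_fact -> window -> asset -> song_id` trace-back flowchart and a song-level aggregation SQL template to `docs/postgresql-data-model.md` and `docs/postgres_db_schema_samples.md`, so developers can implement post-recall song attribution directly against the current schema. |
| 8 | - Further documented retrieval fusion: added to `docs/postgresql-data-model.md` and `docs/postgres_db_schema_samples.md` a song-level aggregation flowchart for the dual exact lane + semantic lane, the rule-based fusion criteria, and an SQL skeleton, making explicit that Phase-1 ranks with an `exact-led, semantic-boosted` strategy. | 8 | - Further documented retrieval fusion: added to `docs/postgresql-data-model.md` and `docs/postgres_db_schema_samples.md` a song-level aggregation flowchart for the dual exact lane + semantic lane, the rule-based fusion criteria, and an SQL skeleton, making explicit that Phase-1 ranks with an `exact-led, semantic-boosted` strategy. |
| 9 | - Further documented data binding and model storage: `docs/postgresql-data-model.md` and `docs/postgres_db_schema_samples.md` now spell out the binding fields of `media_entity -> audio_object(asset/window) -> feature_fact` and give the Phase-1 storage conventions and SQL samples for `chromaprint / mert-v1-95m / muq-base / local_wavehash_embed / ecapa-tdnn`. | ||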
| 9 | 10 | ||
| 10 | ## 2026-06-04 | 11 | ## 2026-06-04 |
| 11 | 12 | ... | ... |
| ... | @@ -495,3 +495,141 @@ order by coalesce(max(raw_score) filter (where m.feature_type = 'fingerprint'), | ... | @@ -495,3 +495,141 @@ order by coalesce(max(raw_score) filter (where m.feature_type = 'fingerprint'), |
| 495 | offset_coverage_ms desc | 495 | offset_coverage_ms desc |
| 496 | limit 20; | 496 | limit 20; |
| 497 | ``` | 497 | ``` |
| 498 | |||
| 499 | --- | ||
| 500 | |||
| 501 | ## 13. Bindings and open-source model storage examples | ||
| 502 | |||
| 503 | ### 13.1 Minimal binding chain | ||
| 504 | |||
| 505 | ```text | ||
| 506 | media_entity(song) | ||
| 507 | -> audio_object(asset) | ||
| 508 | -> audio_object(window) | ||
| 509 | -> feature_fact(chromaprint) | ||
| 510 | -> feature_fact(mert-v1-95m) | ||
| 511 | -> feature_fact(muq-base) | ||
| 512 | ``` | ||
| 513 | |||
| 514 | ### 13.2 Worked example | ||
| 515 | |||
| 516 | #### Step 1: song | ||
| 517 | |||
| 518 | ```sql | ||
| 519 | insert into media_entity ( | ||
| 520 | entity_type, biz_key, title, artist_name | ||
| 521 | ) values ( | ||
| 522 | 'song', 'song_000123', 'Demo Song', 'Demo Artist' | ||
| 523 | ) | ||
| 524 | returning entity_id; | ||
| 525 | ``` | ||
| 526 | |||
| 527 | #### Step 2: asset | ||
| 528 | |||
| 529 | ```sql | ||
| 530 | insert into audio_object ( | ||
| 531 | object_type, song_id, storage_uri, checksum, codec, sample_rate, channels, duration_ms | ||
| 532 | ) values ( | ||
| 533 | 'asset', :song_id, 's3://bucket/demo_song/master.wav', 'sha256:asset-demo', 'wav', 44100, 2, 210000 | ||
| 534 | ) | ||
| 535 | returning object_id; | ||
| 536 | ``` | ||
| 537 | |||
| 538 | #### Step 3: window | ||
| 539 | |||
| 540 | ```sql | ||
| 541 | insert into audio_object ( | ||
| 542 | object_type, song_id, parent_object_id, start_ms, end_ms, duration_ms | ||
| 543 | ) values ( | ||
| 544 | 'window', :song_id, :asset_id, 0, 5000, 5000 | ||
| 545 | ) | ||
| 546 | returning object_id; | ||
| 547 | ``` | ||
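The window rows in Step 3 are derived from the parent asset's `duration_ms`. As a rough illustration, the Python sketch below slides a fixed window over the asset. The 5 s window and 2.5 s hop are assumptions read off the `mert_5s_hop2.5_v1` feature set name, dropping a trailing partial window is an assumed policy, and `cut_windows` is a hypothetical helper rather than part of the pipeline:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Window:
    """One audio_object(window) row; offsets are relative to the parent asset."""
    start_ms: int
    end_ms: int

    @property
    def duration_ms(self) -> int:
        return self.end_ms - self.start_ms


def cut_windows(asset_duration_ms: int, window_ms: int = 5000, hop_ms: int = 2500) -> list[Window]:
    """Slide a fixed-size window with a fixed hop; a trailing partial window is dropped (assumption)."""
    if window_ms <= 0 or hop_ms <= 0:
        raise ValueError("window_ms and hop_ms must be positive")
    windows = []
    start = 0
    while start + window_ms <= asset_duration_ms:
        windows.append(Window(start, start + window_ms))
        start += hop_ms
    return windows
```

For the 210000 ms asset in Step 2, `cut_windows(210_000)` yields 83 windows, the first being `0..5000` as inserted in Step 3. Padding the tail or keeping a shorter last window are alternative policies.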
| 548 | |||
| 549 | #### Step 4: chromaprint fingerprint | ||
| 550 | |||
| 551 | ```sql | ||
| 552 | insert into feature_fact ( | ||
| 553 | feature_type, object_id, song_id, | ||
| 554 | model_name, model_version, feature_set_name, feature_schema_ver, | ||
| 555 | fingerprint_value | ||
| 556 | ) values ( | ||
| 557 | 'fingerprint', :window_id, :song_id, | ||
| 558 | 'chromaprint', '1.0', 'chromaprint_5s_v1', 'v1', | ||
| 559 | 'AQAAE0mUaEkSZSo...' | ||
| 560 | ); | ||
| 561 | ``` | ||
| 562 | |||
| 563 | #### Step 5: MERT embedding | ||
| 564 | |||
| 565 | ```sql | ||
| 566 | insert into feature_fact ( | ||
| 567 | feature_type, object_id, song_id, | ||
| 568 | model_name, model_version, feature_set_name, feature_schema_ver, | ||
| 569 | embedding_dim, embedding_uri, vector_table_name | ||
| 570 | ) values ( | ||
| 571 | 'embedding', :window_id, :song_id, | ||
| 572 | 'mert-v1-95m', 'hf-main', 'mert_5s_hop2.5_v1', 'v1', | ||
| 573 | 768, 's3://bucket/emb/demo_song_win0001_mert.npy', 'audio_embedding_vector_768' | ||
| 574 | ); | ||
| 575 | ``` | ||
| 576 | |||
| 577 | #### Step 6: MuQ embedding | ||
| 578 | |||
| 579 | ```sql | ||
| 580 | insert into feature_fact ( | ||
| 581 | feature_type, object_id, song_id, | ||
| 582 | model_name, model_version, feature_set_name, feature_schema_ver, | ||
| 583 | embedding_dim, embedding_uri, vector_table_name | ||
| 584 | ) values ( | ||
| 585 | 'embedding', :window_id, :song_id, | ||
| 586 | 'muq-base', 'hf-main', 'muq_5s_hop2.5_v1', 'v1', | ||
| 587 | 768, 's3://bucket/emb/demo_song_win0001_muq.npy', 'audio_embedding_vector_768' | ||
| 588 | ); | ||
| 589 | ``` | ||
| 590 | |||
| 591 | #### Step 7: fallback embedding | ||
| 592 | |||
| 593 | ```sql | ||
| 594 | insert into feature_fact ( | ||
| 595 | feature_type, object_id, song_id, | ||
| 596 | model_name, model_version, feature_set_name, feature_schema_ver, | ||
| 597 | embedding_dim, embedding_uri, vector_table_name | ||
| 598 | ) values ( | ||
| 599 | 'embedding', :window_id, :song_id, | ||
| 600 | 'local_wavehash_embed', 'phase1_local', 'wavehash_5s_hop2.5_v1', 'v1', | ||
| 601 | 8, 'file:///tmp/demo_song_win0001_wavehash.npy', 'audio_embedding_vector_8_placeholder' | ||
| 602 | ); | ||
| 603 | ``` | ||
| 604 | |||
| 605 | ### 13.3 Query which open-source models have already encoded a given window | ||
| 606 | |||
| 607 | ```sql | ||
| 608 | select object_id, | ||
| 609 | song_id, | ||
| 610 | feature_type, | ||
| 611 | model_name, | ||
| 612 | model_version, | ||
| 613 | feature_set_name, | ||
| 614 | embedding_dim, | ||
| 615 | fingerprint_value, | ||
| 616 | embedding_uri, | ||
| 617 | vector_table_name | ||
| 618 | from feature_fact | ||
| 619 | where object_id = :window_id | ||
| 620 | order by feature_type, model_name; | ||
| 621 | ``` | ||
| 622 | |||
| 623 | ### 13.4 Query which model features a song currently has | ||
| 624 | |||
| 625 | ```sql | ||
| 626 | select ff.song_id, | ||
| 627 | ff.model_name, | ||
| 628 | ff.model_version, | ||
| 629 | ff.feature_type, | ||
| 630 | count(*) as feature_rows | ||
| 631 | from feature_fact ff | ||
| 632 | where ff.song_id = :song_id | ||
| 633 | group by ff.song_id, ff.model_name, ff.model_version, ff.feature_type | ||
| 634 | order by ff.feature_type, ff.model_name; | ||
| 635 | ``` | ... | ... |
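To sanity-check the write order of 13.2 and the two lookups in 13.3/13.4 without a PostgreSQL instance, here is a minimal sketch using Python's built-in `sqlite3`. The table definitions are an assumed, simplified subset of the real schema (only the columns the samples touch, no constraints, SQLite types), and `cursor.lastrowid` stands in for PostgreSQL's `returning`:

```python
import sqlite3

# Assumed, simplified subset of the PostgreSQL tables: only columns used in 13.2-13.4.
DDL = """
create table media_entity (entity_id integer primary key, entity_type text, biz_key text,
                           title text, artist_name text);
create table audio_object (object_id integer primary key, object_type text, song_id integer,
                           parent_object_id integer, storage_uri text,
                           start_ms integer, end_ms integer, duration_ms integer);
create table feature_fact (feature_type text, object_id integer, song_id integer,
                           model_name text, model_version text, feature_set_name text,
                           feature_schema_ver text, embedding_dim integer,
                           fingerprint_value text, embedding_uri text, vector_table_name text);
"""


def build_demo() -> tuple[sqlite3.Connection, int, int]:
    conn = sqlite3.connect(":memory:")
    conn.executescript(DDL)
    cur = conn.cursor()
    # Step 1: song
    cur.execute("insert into media_entity (entity_type, biz_key, title, artist_name) "
                "values ('song', 'song_000123', 'Demo Song', 'Demo Artist')")
    song_id = cur.lastrowid
    # Step 2: asset bound to the song
    cur.execute("insert into audio_object (object_type, song_id, storage_uri, duration_ms) "
                "values ('asset', :song_id, 's3://bucket/demo_song/master.wav', 210000)",
                {"song_id": song_id})
    asset_id = cur.lastrowid
    # Step 3: window bound to both the song and its parent asset
    cur.execute("insert into audio_object (object_type, song_id, parent_object_id, "
                "start_ms, end_ms, duration_ms) values ('window', :song_id, :asset_id, 0, 5000, 5000)",
                {"song_id": song_id, "asset_id": asset_id})
    window_id = cur.lastrowid
    # Steps 4-5: one fingerprint row and one embedding row on the same window
    rows = [
        ("fingerprint", "chromaprint", "1.0", "chromaprint_5s_v1", None, "AQAAE0mUaEkSZSo...", None, None),
        ("embedding", "mert-v1-95m", "hf-main", "mert_5s_hop2.5_v1", 768, None,
         "s3://bucket/emb/demo_song_win0001_mert.npy", "audio_embedding_vector_768"),
    ]
    cur.executemany(
        "insert into feature_fact (feature_type, object_id, song_id, model_name, model_version, "
        "feature_set_name, feature_schema_ver, embedding_dim, fingerprint_value, embedding_uri, "
        "vector_table_name) values (?, ?, ?, ?, ?, ?, 'v1', ?, ?, ?, ?)",
        [(r[0], window_id, song_id, *r[1:]) for r in rows],
    )
    conn.commit()
    return conn, song_id, window_id


conn, song_id, window_id = build_demo()
# 13.3: which models have encoded this window
per_window = conn.execute(
    "select feature_type, model_name from feature_fact where object_id = :window_id "
    "order by feature_type, model_name", {"window_id": window_id}).fetchall()
# 13.4: feature rows per model for this song
per_song = conn.execute(
    "select model_name, model_version, feature_type, count(*) from feature_fact "
    "where song_id = :song_id group by song_id, model_name, model_version, feature_type "
    "order by feature_type, model_name", {"song_id": song_id}).fetchall()
print(per_window)  # [('embedding', 'mert-v1-95m'), ('fingerprint', 'chromaprint')]
print(per_song)    # [('mert-v1-95m', 'hf-main', 'embedding', 1), ('chromaprint', '1.0', 'fingerprint', 1)]
```

The point of the exercise is that both lookups are single-table filters on `feature_fact`: no join back to `audio_object` or `media_entity` is needed to answer them.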
| ... | @@ -534,3 +534,159 @@ limit 20; | ... | @@ -534,3 +534,159 @@ limit 20; |
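| 534 | - The fusion logic does not have to be hard-coded in the database from day one | 534 | - The fusion logic does not have to be hard-coded in the database from day one |
| 535 | - Weights are easy to tune later | 535 | - Weights are easy to tune later |
| 536 | - The gains of `MERT` / `MuQ` / fallback are easy to compare | 536 | - The gains of `MERT` / `MuQ` / fallback are easy to compare |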
| 537 | |||
| 538 | --- | ||
| 539 | |||
| 540 | ## 15. How the data is actually bound together | ||
| 541 | |||
| 542 | This is the core binding relationship of the current 4-table schema: | ||
| 543 | |||
| 544 | ```text | ||
| 545 | song(media_entity) | ||
| 546 | 1 -> N asset(audio_object) | ||
| 547 | 1 asset -> N window(audio_object) | ||
| 548 | 1 window -> N feature_fact | ||
| 549 | ``` | ||
| 550 | |||
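| 551 | In other words: | ||
| 552 | - `media_entity` defines **which song a thing ultimately belongs to** | ||
| 553 | - `audio_object` defines **which audio files the song has, and which windows each file is cut into** | ||
| 554 | - `feature_fact` defines **which models have encoded these windows, and which features they produced** | ||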
| 555 | |||
| 556 | ### 15.1 Binding diagram | ||
| 557 | |||
| 558 | ```mermaid | ||
| 559 | flowchart TD | ||
| 560 | S[media_entity\nsong] --> A1[audio_object\nasset] | ||
| 561 | S --> A2[audio_object\nasset] | ||
| 562 | A1 --> W1[audio_object\nwindow] | ||
| 563 | A1 --> W2[audio_object\nwindow] | ||
| 564 | W1 --> F1[feature_fact\nchromaprint] | ||
| 565 | W1 --> F2[feature_fact\nmert] | ||
| 566 | W1 --> F3[feature_fact\nmuq] | ||
| 567 | W2 --> F4[feature_fact\nchromaprint] | ||
| 568 | W2 --> F5[feature_fact\nlocal_wavehash_embed] | ||
| 569 | ``` | ||
| 570 | |||
| 571 | ### 15.2 Which fields each table binds on | ||
| 572 | |||
| 573 | | From | To | Binding field | Notes | | ||
| 574 | |---|---|---|---| | ||
| 575 | | `audio_object(asset/window)` | `media_entity(song)` | `audio_object.song_id = media_entity.entity_id` | Both assets and windows belong to a song | | ||
| 576 | | `audio_object(window)` | `audio_object(asset)` | `audio_object.parent_object_id = asset.object_id` | A window's parent is always an asset | | ||
| 577 | | `feature_fact` | `audio_object(window)` | `feature_fact.object_id = window.object_id` | A feature is bound to one specific window | | ||
| 578 | | `feature_fact` | `media_entity(song)` | `feature_fact.song_id = media_entity.entity_id` | `song_id` is stored redundantly to simplify retrieval-time aggregation | | ||
| 579 | | `set_membership` | `song/asset/window/feature` | `member_type + member_id` | Set membership is a polymorphic binding | | ||
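The `audio_object` and `feature_fact` rows in this table can be checked mechanically before a batch is committed. A minimal Python sketch over in-memory rows; the dict-based row shape and the `check_bindings` helper are illustrative assumptions, and the polymorphic `set_membership` binding is deliberately left out:

```python
def check_bindings(songs: set[int], objects: dict[int, dict], features: list[dict]) -> list[str]:
    """Return readable violations of the audio_object / feature_fact binding rules (empty = consistent)."""
    errors = []
    for oid, obj in objects.items():
        # audio_object.song_id must point at an existing song
        if obj["song_id"] not in songs:
            errors.append(f"object {oid}: unknown song_id {obj['song_id']}")
        if obj["object_type"] == "window":
            parent = objects.get(obj.get("parent_object_id"))
            # a window's parent must be an asset belonging to the same song
            if parent is None or parent["object_type"] != "asset":
                errors.append(f"window {oid}: parent is not an asset")
            elif parent["song_id"] != obj["song_id"]:
                errors.append(f"window {oid}: song_id differs from its asset")
    for i, ff in enumerate(features):
        target = objects.get(ff["object_id"])
        # feature_fact.object_id must point at a window
        if target is None or target["object_type"] != "window":
            errors.append(f"feature {i}: object_id is not a window")
        # the redundant song_id must agree with the window's song_id
        elif ff["song_id"] != target["song_id"]:
            errors.append(f"feature {i}: song_id does not match its window")
    return errors
```

The last check matters most: because `feature_fact.song_id` is a denormalized copy, nothing in the join path would otherwise catch it drifting from the window it describes.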
| 580 | |||
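| 581 | ### 15.3 Why `feature_fact` stores both `object_id` and `song_id` | ||
| 582 | |||
| 583 | Because the two answer different questions: | ||
| 584 | |||
| 585 | - `object_id` answers: **which window was this feature extracted from?** | ||
| 586 | - `song_id` answers: **which song does this feature ultimately belong to?** | ||
| 587 | |||
| 588 | Benefits of storing both: | ||
| 589 | - Online recall can aggregate directly by `song_id` | ||
| 590 | - Each hit can still be traced back to its specific `window -> asset -> offset` | ||
| 591 | - Aggregation does not need a deep join chain on every query just to learn which song a feature belongs to | ||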
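The benefit shows up at recall time: hits can be grouped by the stored `song_id` immediately, and only the songs that survive ranking need the `window -> asset` lookup for offsets. A minimal Python sketch; the hit shape, using the maximum hit score as the song score, and the `rank_songs` / `trace` helpers are assumptions for illustration, not the fusion rule defined in these docs:

```python
def rank_songs(hits: list[dict], top_k: int = 20) -> list[tuple[int, float]]:
    """Group feature hits by their stored song_id (no join needed); keep the best score per song."""
    best: dict[int, float] = {}
    for h in hits:
        sid = h["song_id"]
        best[sid] = max(best.get(sid, float("-inf")), h["score"])
    return sorted(best.items(), key=lambda kv: kv[1], reverse=True)[:top_k]


def trace(window_id: int, objects: dict[int, dict]) -> tuple[int, int, int]:
    """Follow window -> asset only when an offset is actually needed: (asset_id, start_ms, end_ms)."""
    win = objects[window_id]
    return win["parent_object_id"], win["start_ms"], win["end_ms"]
```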
| 592 | |||
| 593 | ### 15.4 How to read a single feature record | ||
| 594 | |||
| 595 | A `feature_fact` row essentially says: | ||
| 596 | |||
| 597 | > For some `window(object_id = Y)` under `song_id = X`, the encoding scheme `model_name/model_version/feature_set_name` produced one `fingerprint` or `embedding` feature. | ||
| 598 | |||
| 599 | So `feature_fact` is not a "model registry" but a **fact table of model computation results**. | ||
| 600 | |||
| 601 | --- | ||
| 602 | |||
| 603 | ## 16. How to store the Phase-1 open-source model set | ||
| 604 | |||
| 605 | The current Phase-1 principle is: | ||
| 606 | |||
| 607 | > **Use the open-source models directly as encoders, without fine-tuning; in the database, first pin down "who computed it, how, and where the result lives."** | ||
| 608 | |||
| 609 | ### 16.1 Currently recommended model set | ||
| 610 | |||
| 611 | | lane | model_name | model_version | feature_type | Purpose | | ||
| 612 | |---|---|---|---|---| | ||
| 613 | | exact | `chromaprint` | `1.0` | `fingerprint` | High-precision exact matching | | ||
| 614 | | semantic baseline | `mert-v1-95m` | `hf-main` | `embedding` | Song-level semantic baseline | | ||
| 615 | | semantic challenger | `muq-base` | `hf-main` | `embedding` | Challenger for covers, BGM, and heavy interference | | ||
| 616 | | semantic fallback | `local_wavehash_embed` | `phase1_local` | `embedding` | Fallback when the current host lacks the model runtime | | ||
| 617 | | historical baseline | `ecapa-tdnn` | `baseline_only` | `embedding` | Historical comparison only; not recommended to lead Phase-1 | | ||
| 618 | |||
| 619 | ### 16.2 Which fields pin down model identity | ||
| 620 | |||
| 621 | All of them live in `feature_fact`: | ||
| 622 | |||
| 623 | - `model_name` | ||
| 624 | - `model_version` | ||
| 625 | - `feature_set_name` | ||
| 626 | - `feature_schema_ver` | ||
| 627 | - `embedding_dim` (embeddings only) | ||
| 628 | |||
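| 629 | ### 16.3 How to name `feature_set_name` | ||
| 630 | |||
| 631 | Encode the following information into the name; as the examples show, the `hop` and `variant` segments are omitted when they do not apply (`chromaprint_5s_v1` has neither): | ||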
| 632 | |||
| 633 | ```text | ||
| 634 | <model_family>_<window_sec>s_hop<stride_sec>_<variant>_v<schema> | ||
| 635 | ``` | ||
| 636 | |||
| 637 | For example: | ||
| 638 | - `chromaprint_5s_v1` | ||
| 639 | - `mert_5s_hop2.5_v1` | ||
| 640 | - `muq_5s_hop2.5_v1` | ||
| 641 | - `wavehash_5s_hop2.5_v1` | ||
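The naming convention can be generated and checked in code so that feature set names stay consistent across models. A minimal Python sketch; since the listed examples omit the variant (and `chromaprint_5s_v1` also omits the hop), it treats both segments as optional, and it assumes a model family is a single lowercase alphanumeric token. `build_feature_set_name` and `parse_feature_set_name` are hypothetical helpers:

```python
import re
from typing import Optional

# Assumed grammar: <family>_<window>s[_hop<stride>][_<variant>]_v<schema>
_PATTERN = re.compile(
    r"^(?P<family>[a-z0-9]+)_(?P<window>\d+(?:\.\d+)?)s"
    r"(?:_hop(?P<hop>\d+(?:\.\d+)?))?"
    r"(?:_(?P<variant>[a-z0-9]+))?"
    r"_v(?P<schema>\d+)$"
)


def build_feature_set_name(family: str, window_sec: float, schema: int,
                           hop_sec: Optional[float] = None, variant: Optional[str] = None) -> str:
    """Assemble a name; ':g' formatting renders 5 as '5' and 2.5 as '2.5'."""
    parts = [family, f"{window_sec:g}s"]
    if hop_sec is not None:
        parts.append(f"hop{hop_sec:g}")
    if variant:
        parts.append(variant)
    parts.append(f"v{schema}")
    return "_".join(parts)


def parse_feature_set_name(name: str) -> dict:
    """Split a name back into its segments; reject names that break the convention."""
    m = _PATTERN.match(name)
    if m is None:
        raise ValueError(f"feature_set_name does not follow the convention: {name!r}")
    return m.groupdict()
```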
| 642 | |||
| 643 | ### 16.4 Recommended Phase-1 storage rules | ||
| 644 | |||
| 645 | #### exact lane | ||
| 646 | - `feature_type = 'fingerprint'` | ||
| 647 | - `fingerprint_value` is required | ||
| 648 | - `model_name = 'chromaprint'` | ||
| 649 | - `embedding_uri` / `vector_table_name` are null | ||
| 650 | |||
| 651 | #### semantic lane | ||
| 652 | - `feature_type = 'embedding'` | ||
| 653 | - `embedding_dim` is required | ||
| 654 | - At least one of `embedding_uri` or `vector_table_name` is required | ||
| 655 | - `fingerprint_value` is null | ||
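These lane rules are easy to enforce in the writer before rows reach `feature_fact`. A minimal Python sketch of that check; treating empty strings as missing is an assumption, and whether the real schema also enforces these rules as database constraints is not stated here. `validate_feature_row` is a hypothetical helper:

```python
def validate_feature_row(row: dict) -> list[str]:
    """Check one feature_fact row against the Phase-1 exact/semantic lane rules."""
    errors = []
    ftype = row.get("feature_type")
    if ftype == "fingerprint":  # exact lane
        if not row.get("fingerprint_value"):
            errors.append("fingerprint_value is required")
        if row.get("model_name") != "chromaprint":
            errors.append("exact lane expects model_name = 'chromaprint'")
        if row.get("embedding_uri") is not None or row.get("vector_table_name") is not None:
            errors.append("embedding_uri / vector_table_name must be null")
    elif ftype == "embedding":  # semantic lane
        if not row.get("embedding_dim"):
            errors.append("embedding_dim is required")
        if not (row.get("embedding_uri") or row.get("vector_table_name")):
            errors.append("embedding_uri or vector_table_name is required")
        if row.get("fingerprint_value") is not None:
            errors.append("fingerprint_value must be null")
    else:
        errors.append(f"unknown feature_type: {ftype!r}")
    return errors
```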
| 656 | |||
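| 657 | ### 16.5 Why there is no hard dependency on a separate model_registry yet | ||
| 658 | |||
| 659 | Because Phase-1 is currently focused on: | ||
| 660 | - Computing features reliably first | ||
| 661 | - Fixing the binding between features and song/window first | ||
| 662 | - Closing the loop between retrieval and the attribution chain first | ||
| 663 | |||
| 664 | So the most pragmatic approach for now is: | ||
| 665 | - **Write model identity directly into `feature_fact`** | ||
| 666 | - If the number of models keeps growing, a registry can still be added later | ||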
| 667 | |||
| 668 | ### 16.6 A recommended write order | ||
| 669 | |||
| 670 | For each asset: | ||
| 671 | |||
| 672 | 1. Write `media_entity(song)` | ||
| 673 | 2. Write `audio_object(asset)` | ||
| 674 | 3. Cut windows and write `audio_object(window)` | ||
| 675 | 4. Run `chromaprint` and write `feature_fact(fingerprint)` | ||
| 676 | 5. Run `mert-v1-95m` and write `feature_fact(embedding)` | ||
| 677 | 6. Run `muq-base` and write `feature_fact(embedding)` | ||
| 678 | 7. If the model runtime is unavailable, write at least the `local_wavehash_embed` fallback | ||
| 679 | |||
| 680 | The end result is: | ||
| 681 | |||
| 682 | ```text | ||
| 683 | the same window | ||
| 684 | -> 1 chromaprint fingerprint row | ||
| 685 | -> 1 mert embedding row | ||
| 686 | -> 1 muq embedding row | ||
| 687 | -> (optional) 1 fallback embedding row | ||
| 688 | ``` | ||
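The per-window part of this write order (steps 4 to 7) can be sketched as one function that emits the `feature_fact` rows for a window. In this minimal Python sketch the encoder hooks are hypothetical callables that return `None` when their runtime is unavailable, and the reading of step 7 as "write the fallback only when neither MERT nor MuQ produced an embedding" is an assumption:

```python
from typing import Callable, Optional

# Hypothetical encoder hook: returns the feature columns, or None if its runtime is missing.
Encoder = Callable[[bytes], Optional[dict]]


def encode_window(window_id: int, song_id: int, audio: bytes,
                  encoders: dict[str, Encoder], fallback: Encoder) -> list[dict]:
    """Produce feature_fact rows for one window in the recommended order."""
    rows = []
    semantic_written = False
    for model_name in ("chromaprint", "mert-v1-95m", "muq-base"):
        encoder = encoders.get(model_name)
        result = encoder(audio) if encoder else None
        if result is None:
            continue  # runtime missing: skip this model instead of failing the whole window
        rows.append({"object_id": window_id, "song_id": song_id, "model_name": model_name, **result})
        semantic_written |= model_name != "chromaprint"
    if not semantic_written:
        # Step 7: guarantee at least one embedding; the fallback is assumed to always succeed
        rows.append({"object_id": window_id, "song_id": song_id,
                     "model_name": "local_wavehash_embed", **fallback(audio)})
    return rows
```

Every row carries both `object_id` and `song_id`, so the bindings described above hold no matter which subset of models actually ran on the host.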
| 689 | |||
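| 690 | ### 16.7 The Phase-1 storage strategy in one sentence | ||
| 691 | |||
| 692 | > `audio_object` answers "which piece of audio", `feature_fact` answers "which model computed which feature"; the two are bound by `object_id`, and `song_id` then reliably attributes every result to its song. | ||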