Why the schema samples need a complete multi-asset multi-model example

Constraint: Phase-1 implementers need one concrete end-to-end sample that shows how a single song expands into multiple assets, windows, and model facts
Rejected: Keep only isolated insert snippets | does not help with batch backfill or completeness checks
Confidence: high
Scope-risk: narrow
Directive: When extending storage examples, include operational queries for gap detection and model completeness, not just inserts
Tested: markdown link check on /workspace/docs after adding the complete sample and audit SQL
Not-tested: No live database rerun; this is a documentation-only expansion over the verified schema

Showing 2 changed files with 111 additions and 0 deletions
| ... | @@ -7,6 +7,7 @@ | ... | @@ -7,6 +7,7 @@ |
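| 7 | - Continued strengthening the online retrieval notes: added a `feature_fact -> window -> asset -> song_id` traceback flowchart and a song-level aggregation SQL template to `docs/postgresql-data-model.md` and `docs/postgres_db_schema_samples.md`, so developers can implement post-recall attribution directly against the current schema. | 7 | - Continued strengthening the online retrieval notes: added a `feature_fact -> window -> asset -> song_id` traceback flowchart and a song-level aggregation SQL template to `docs/postgresql-data-model.md` and `docs/postgres_db_schema_samples.md`, so developers can implement post-recall attribution directly against the current schema. |
| 8 | - Continued expanding the retrieval fusion design: added a song-level aggregation flowchart for the exact lane + semantic lane dual-channel setup, the rule-based fusion criteria, and a SQL skeleton to `docs/postgresql-data-model.md` and `docs/postgres_db_schema_samples.md`, making explicit that Phase-1 uses an `exact-led, semantic-reinforced` ranking strategy. | 8 | - Continued expanding the retrieval fusion design: added a song-level aggregation flowchart for the exact lane + semantic lane dual-channel setup, the rule-based fusion criteria, and a SQL skeleton to `docs/postgresql-data-model.md` and `docs/postgres_db_schema_samples.md`, making explicit that Phase-1 uses an `exact-led, semantic-reinforced` ranking strategy. |
| 9 | - Continued expanding the data-binding and model-persistence notes: clarified the binding-field relationships of `media_entity -> audio_object(asset/window) -> feature_fact` in `docs/postgresql-data-model.md` and `docs/postgres_db_schema_samples.md`, and provided the Phase-1 storage conventions and SQL samples for `chromaprint / mert-v1-95m / muq-base / local_wavehash_embed / ecapa-tdnn`. | 9 | - Continued expanding the data-binding and model-persistence notes: clarified the binding-field relationships of `media_entity -> audio_object(asset/window) -> feature_fact` in `docs/postgresql-data-model.md` and `docs/postgres_db_schema_samples.md`, and provided the Phase-1 storage conventions and SQL samples for `chromaprint / mert-v1-95m / muq-base / local_wavehash_embed / ecapa-tdnn`. |
| 10 | - Added a complete `one song -> multiple assets -> multiple windows -> multiple models` sample to `docs/postgres_db_schema_samples.md`, together with a missing-model scan SQL and an asset-level feature completeness check SQL, to support Phase-1 batch backfill and inspection. | ||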
| 10 | 11 | ||
| 11 | ## 2026-06-04 | 12 | ## 2026-06-04 |
| 12 | 13 | ... | ... |
| ... | @@ -633,3 +633,113 @@ where ff.song_id = :song_id | ... | @@ -633,3 +633,113 @@ where ff.song_id = :song_id |
| 633 | group by ff.song_id, ff.model_name, ff.model_version, ff.feature_type | 633 | group by ff.song_id, ff.model_name, ff.model_version, ff.feature_type |
| 634 | order by ff.feature_type, ff.model_name; | 634 | order by ff.feature_type, ff.model_name; |
| 635 | ``` | 635 | ``` |
| 636 | |||
| 637 | --- | ||
| 638 | |||
| 639 | ## 14. A complete multi-asset / multi-window / multi-model sample | ||
| 640 | |||
| 641 | Assume: | ||
| 642 | - A single `song_id = 1001` | ||
| 643 | - It has 2 audio files: `master.wav` and `ugc_clip.mp3` | ||
| 644 | - Each asset is split into 2 windows | ||
| 645 | - Every window is run through `chromaprint + mert-v1-95m + muq-base` | ||
| 646 | |||
| 647 | ### 14.1 Logical structure | ||
| 648 | |||
| 649 | ```text | ||
| 650 | song(1001) | ||
| 651 |   -> asset(2001, master.wav) | ||
| 652 |     -> window(3001, 0-5000) | ||
| 653 |       -> chromaprint | ||
| 654 |       -> mert-v1-95m | ||
| 655 |       -> muq-base | ||
| 656 |     -> window(3002, 2500-7500) | ||
| 657 |       -> chromaprint | ||
| 658 |       -> mert-v1-95m | ||
| 659 |       -> muq-base | ||
| 660 |   -> asset(2002, ugc_clip.mp3) | ||
| 661 |     -> window(3003, 10000-15000) | ||
| 662 |       -> chromaprint | ||
| 663 |       -> mert-v1-95m | ||
| 664 |       -> muq-base | ||
| 665 |     -> window(3004, 12500-17500) | ||
| 666 |       -> chromaprint | ||
| 667 |       -> mert-v1-95m | ||
| 668 |       -> muq-base | ||
| 669 | ``` | ||
| 670 | |||
| 671 | ### 14.2 How many rows this produces | ||
| 672 | |||
| 673 | | Table | Rows | Notes | | ||
| 674 | |---|---:|---| | ||
| 675 | | `media_entity` | 1 | One song | | ||
| 676 | | `audio_object` | 6 | 2 assets + 4 windows | | ||
| 677 | | `feature_fact` | 12 | 4 windows × 3 models | | ||
| 678 | | `set_membership` | As needed | A reference_set can be attached to a song/asset/window | | ||
| 679 | |||
| 680 | ### 14.3 Query the full tree of data for a song | ||
| 681 | |||
| 682 | ```sql | ||
| 683 | select s.entity_id as song_id, | ||
| 684 | s.title, | ||
| 685 | a.object_id as asset_id, | ||
| 686 | a.storage_uri, | ||
| 687 | w.object_id as window_id, | ||
| 688 | w.start_ms, | ||
| 689 | w.end_ms, | ||
| 690 | ff.feature_type, | ||
| 691 | ff.model_name, | ||
| 692 | ff.model_version, | ||
| 693 | ff.feature_set_name | ||
| 694 | from media_entity s | ||
| 695 | join audio_object a | ||
| 696 | on a.song_id = s.entity_id | ||
| 697 | and a.object_type = 'asset' | ||
| 698 | join audio_object w | ||
| 699 | on w.parent_object_id = a.object_id | ||
| 700 | and w.object_type = 'window' | ||
| 701 | left join feature_fact ff | ||
| 702 | on ff.object_id = w.object_id | ||
| 703 | where s.entity_id = :song_id | ||
| 704 | order by a.object_id, w.start_ms, ff.feature_type, ff.model_name; | ||
| 705 | ``` | ||
| 706 | |||
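| 707 | ### 14.4 Find windows that are missing a given model | ||
| 708 | |||
| 709 | This SQL is well suited to scanning for backfill jobs, for example checking which windows have not yet been run through `muq-base`: | ||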
| 710 | |||
| 711 | ```sql | ||
| 712 | select w.object_id as window_id, | ||
| 713 | w.song_id, | ||
| 714 | w.parent_object_id as asset_id, | ||
| 715 | w.start_ms, | ||
| 716 | w.end_ms | ||
| 717 | from audio_object w | ||
| 718 | where w.object_type = 'window' | ||
| 719 | and not exists ( | ||
| 720 | select 1 | ||
| 721 | from feature_fact ff | ||
| 722 | where ff.object_id = w.object_id | ||
| 723 | and ff.feature_type = 'embedding' | ||
| 724 | and ff.model_name = 'muq-base' | ||
| 725 | and ff.model_version = 'hf-main' | ||
| 726 | and ff.feature_set_name = 'muq_5s_hop2.5_v1' | ||
| 727 | ) | ||
| 728 | order by w.song_id, w.parent_object_id, w.start_ms; | ||
| 729 | ``` | ||
| 730 | |||
| 731 | ### 14.5 List which models each window under an asset already has | ||
| 732 | |||
| 733 | ```sql | ||
| 734 | select w.object_id as window_id, | ||
| 735 | w.start_ms, | ||
| 736 | w.end_ms, | ||
| 737 | string_agg(ff.model_name || ':' || ff.feature_type, ', ' order by ff.model_name) as ready_features | ||
| 738 | from audio_object w | ||
| 739 | left join feature_fact ff | ||
| 740 | on ff.object_id = w.object_id | ||
| 741 | where w.object_type = 'window' | ||
| 742 | and w.parent_object_id = :asset_id | ||
| 743 | group by w.object_id, w.start_ms, w.end_ms | ||
| 744 | order by w.start_ms; | ||
| 745 | ``` | ... | ... |
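
The row counts in section 14.2 and the gap scan in section 14.4 can be sanity-checked without a live PostgreSQL instance. The following is a minimal sketch under stated assumptions, not the project's actual tooling: it uses Python's standard-library `sqlite3` in place of PostgreSQL, the table definitions are a hypothetical reduction containing only the columns the sample queries touch, and the scan filters on `model_name` alone (the documented query also filters on `feature_type`, `model_version`, and `feature_set_name`). The helper names `build_sample`, `row_counts`, and `windows_missing` are illustrative only.

```python
import sqlite3

MODELS = ["chromaprint", "mert-v1-95m", "muq-base"]

# Hypothetical, reduced schema: only the columns used by the sample queries.
SCHEMA = """
create table media_entity (entity_id integer primary key, title text);
create table audio_object (
    object_id        integer primary key,
    object_type      text not null,      -- 'asset' or 'window'
    song_id          integer not null,
    parent_object_id integer,            -- asset id for windows, null for assets
    storage_uri      text,
    start_ms         integer,
    end_ms           integer
);
create table feature_fact (
    object_id  integer not null,         -- the window the fact belongs to
    song_id    integer not null,
    model_name text not null
);
"""

def build_sample(skip=()):
    """Load the song 1001 sample into an in-memory SQLite database.

    `skip` holds (window_id, model_name) pairs to leave out, simulating
    windows whose features have not been computed yet.
    """
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("insert into media_entity values (1001, 'sample song')")
    assets = {2001: "master.wav", 2002: "ugc_clip.mp3"}
    windows = [(3001, 2001, 0, 5000), (3002, 2001, 2500, 7500),
               (3003, 2002, 10000, 15000), (3004, 2002, 12500, 17500)]
    for asset_id, uri in assets.items():
        conn.execute(
            "insert into audio_object values (?, 'asset', 1001, null, ?, null, null)",
            (asset_id, uri))
    for window_id, asset_id, start_ms, end_ms in windows:
        conn.execute(
            "insert into audio_object values (?, 'window', 1001, ?, null, ?, ?)",
            (window_id, asset_id, start_ms, end_ms))
        for model in MODELS:
            if (window_id, model) not in skip:
                conn.execute("insert into feature_fact values (?, 1001, ?)",
                             (window_id, model))
    return conn

def row_counts(conn):
    """Return the row count of each table in the sample."""
    tables = ("media_entity", "audio_object", "feature_fact")
    return {t: conn.execute(f"select count(*) from {t}").fetchone()[0]
            for t in tables}

def windows_missing(conn, model_name):
    """Return ids of windows with no feature_fact row for `model_name`."""
    rows = conn.execute("""
        select w.object_id
        from audio_object w
        where w.object_type = 'window'
          and not exists (
                select 1
                from feature_fact ff
                where ff.object_id = w.object_id
                  and ff.model_name = ?)
        order by w.song_id, w.parent_object_id, w.start_ms
    """, (model_name,)).fetchall()
    return [r[0] for r in rows]

if __name__ == "__main__":
    conn = build_sample(skip={(3003, "muq-base")})
    print(row_counts(conn))
    print(windows_missing(conn, "muq-base"))
```

Calling `build_sample()` with no arguments yields the fully populated sample, which is useful for confirming the 1 / 6 / 12 row counts; passing a `skip` set exercises the backfill scan against a known gap.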