Commit b624273c b624273c7015949694852e7fb46f17d25ecab86a by cnb.bofCdSsphPA

Why the schema samples need a complete multi-asset multi-model example

Constraint: Phase-1 implementers need one concrete end-to-end sample that shows how a single song expands into multiple assets, windows, and model facts
Rejected: Keep only isolated insert snippets | does not help with batch backfill or completeness checks
Confidence: high
Scope-risk: narrow
Directive: When extending storage examples, include operational queries for gap detection and model completeness, not just inserts
Tested: markdown link check on /workspace/docs after adding the complete sample and audit SQL
Not-tested: No live database rerun; this is a documentation-only expansion over the verified schema
1 parent 75f156b8
@@ -7,6 +7,7 @@
7 - Continued strengthening the online retrieval documentation: added a `feature_fact -> window -> asset -> song_id` traceback flowchart and a song-level aggregation SQL template to `docs/postgresql-data-model.md` and `docs/postgres_db_schema_samples.md`, so developers can implement post-recall attribution directly against the current schema.
8 - Continued extending the retrieval fusion design: added to `docs/postgresql-data-model.md` and `docs/postgres_db_schema_samples.md` a song-level aggregation flowchart for the two-lane setup (exact lane + semantic lane), the rule-based fusion criteria, and a SQL skeleton, making explicit that Phase-1 uses an `exact-led, semantic-reinforced` ranking strategy.
9 - Continued extending the data binding and model persistence notes: `docs/postgresql-data-model.md` and `docs/postgres_db_schema_samples.md` now spell out the binding field relationships of `media_entity -> audio_object(asset/window) -> feature_fact`, and give the Phase-1 storage conventions and SQL samples for `chromaprint / mert-v1-95m / muq-base / local_wavehash_embed / ecapa-tdnn`.
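10 - Continued extending `docs/postgres_db_schema_samples.md` with a complete `one song -> multiple assets -> multiple windows -> multiple models` example, along with a missing-model scan SQL and an asset-level feature completeness check SQL, to support Phase-1 batch backfill and routine inspection.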
10 11
11 ## 2026-06-04 12 ## 2026-06-04
12 13
......
@@ -633,3 +633,113 @@ where ff.song_id = :song_id
633 group by ff.song_id, ff.model_name, ff.model_version, ff.feature_type
634 order by ff.feature_type, ff.model_name;
635 ```
636
637 ---
638
639 ## 14. A complete multi-asset / multi-window / multi-model example
640
641 Assume:
642 - A single song with `song_id = 1001`
643 - It has 2 audio files: `master.wav` and `ugc_clip.mp3`
644 - Each asset is split into 2 windows
645 - Every window is processed with `chromaprint + mert-v1-95m + muq-base`
646
647 ### 14.1 Logical structure
648
649 ```text
650 song(1001)
651   -> asset(2001, master.wav)
652     -> window(3001, 0-5000)
653       -> chromaprint
654       -> mert-v1-95m
655       -> muq-base
656     -> window(3002, 2500-7500)
657       -> chromaprint
658       -> mert-v1-95m
659       -> muq-base
660   -> asset(2002, ugc_clip.mp3)
661     -> window(3003, 10000-15000)
662       -> chromaprint
663       -> mert-v1-95m
664       -> muq-base
665     -> window(3004, 12500-17500)
666       -> chromaprint
667       -> mert-v1-95m
668       -> muq-base
669 ```
670
671 ### 14.2 How many rows this produces
672
673 | Table | Rows | Notes |
674 |---|---:|---|
675 | `media_entity` | 1 | One song |
676 | `audio_object` | 6 | 2 assets + 4 windows |
677 | `feature_fact` | 12 | 4 windows × 3 models |
678 | `set_membership` | as needed | A reference_set can be attached to the song, asset, or window |
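
The row counts in the table above follow from simple multiplication. A minimal Python sketch of that arithmetic, assuming every asset has the same number of windows and every window gets exactly one `feature_fact` row per model (the function name `expected_row_counts` is hypothetical, not part of the schema docs):

```python
def expected_row_counts(n_assets: int, windows_per_asset: int, n_models: int) -> dict[str, int]:
    """Expected row counts for one song.

    Assumes a uniform number of windows per asset and one feature_fact
    row per (window, model) pair; set_membership is left out because it
    is optional in the sample.
    """
    n_windows = n_assets * windows_per_asset
    return {
        "media_entity": 1,                       # the song itself
        "audio_object": n_assets + n_windows,    # assets and windows share one table
        "feature_fact": n_windows * n_models,    # one fact per window per model
    }


counts = expected_row_counts(n_assets=2, windows_per_asset=2, n_models=3)
print(counts)  # {'media_entity': 1, 'audio_object': 6, 'feature_fact': 12}
```

Comparing these expected numbers with actual `count(*)` results per song is one cheap way to spot an incomplete backfill.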
679
680 ### 14.3 Query the full tree of data for a song
681
682 ```sql
683 select s.entity_id as song_id,
684        s.title,
685        a.object_id as asset_id,
686        a.storage_uri,
687        w.object_id as window_id,
688        w.start_ms,
689        w.end_ms,
690        ff.feature_type,
691        ff.model_name,
692        ff.model_version,
693        ff.feature_set_name
694 from media_entity s
695 join audio_object a
696   on a.song_id = s.entity_id
697  and a.object_type = 'asset'
698 join audio_object w
699   on w.parent_object_id = a.object_id
700  and w.object_type = 'window'
701 left join feature_fact ff
702   on ff.object_id = w.object_id
703 where s.entity_id = :song_id
704 order by a.object_id, w.start_ms, ff.feature_type, ff.model_name;
705 ```
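
The query above returns flat rows, one per (window, feature) pair, so the caller still has to regroup them into the asset -> window -> model tree. A minimal Python sketch of that regrouping, assuming the rows have already been fetched and reduced to the hypothetical column order `(asset_id, window_id, start_ms, end_ms, model_name)`; `build_tree` is an illustrative helper, not an existing function:

```python
from collections import defaultdict

# Hypothetical result rows, already ordered by asset and window start.
# model_name is None when the left join found no feature_fact row.
rows = [
    (2001, 3001, 0, 5000, "chromaprint"),
    (2001, 3001, 0, 5000, "mert-v1-95m"),
    (2001, 3002, 2500, 7500, "muq-base"),
    (2002, 3003, 10000, 15000, None),
]


def build_tree(rows):
    """Group flat rows into {asset_id: {window_id: {start_ms, end_ms, models}}}."""
    tree = defaultdict(dict)
    for asset_id, window_id, start_ms, end_ms, model_name in rows:
        window = tree[asset_id].setdefault(
            window_id, {"start_ms": start_ms, "end_ms": end_ms, "models": []}
        )
        if model_name is not None:  # keep the window even if it has no features yet
            window["models"].append(model_name)
    return dict(tree)


tree = build_tree(rows)
print(tree[2001][3001]["models"])  # ['chromaprint', 'mert-v1-95m']
print(tree[2002][3003]["models"])  # []
```

Because of the `left join`, a window with no features still appears in the tree with an empty model list, which is usually what an inspection view wants.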
706
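707 ### 14.4 Find windows that are missing a given model
708
709 This query works well as a scan for backfill jobs, for example to find windows that have not yet been processed by `muq-base`: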
710
711 ```sql
712 select w.object_id as window_id,
713        w.song_id,
714        w.parent_object_id as asset_id,
715        w.start_ms,
716        w.end_ms
717 from audio_object w
718 where w.object_type = 'window'
719   and not exists (
720       select 1
721       from feature_fact ff
722       where ff.object_id = w.object_id
723         and ff.feature_type = 'embedding'
724         and ff.model_name = 'muq-base'
725         and ff.model_version = 'hf-main'
726         and ff.feature_set_name = 'muq_5s_hop2.5_v1'
727   )
728 order by w.song_id, w.parent_object_id, w.start_ms;
729 ```
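
To try the gap scan without a live PostgreSQL instance, the same `not exists` pattern can be exercised against an in-memory SQLite database. The sketch below is only an approximation: the two tables are cut down to the columns the query touches (the real schema has more), and the seeded data assumes window 3004 is the one that has not been processed by `muq-base` yet.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Simplified stand-ins for the real tables; only the columns used by the scan.
conn.executescript("""
create table audio_object (
    object_id integer primary key,
    object_type text not null,
    song_id integer not null,
    parent_object_id integer,
    start_ms integer,
    end_ms integer
);
create table feature_fact (
    object_id integer not null,
    feature_type text not null,
    model_name text not null,
    model_version text not null,
    feature_set_name text not null
);
""")

conn.executemany(
    "insert into audio_object values (?, ?, ?, ?, ?, ?)",
    [
        (2001, "asset", 1001, None, None, None),
        (2002, "asset", 1001, None, None, None),
        (3001, "window", 1001, 2001, 0, 5000),
        (3002, "window", 1001, 2001, 2500, 7500),
        (3003, "window", 1001, 2002, 10000, 15000),
        (3004, "window", 1001, 2002, 12500, 17500),
    ],
)
# Assumed state: muq-base has run on every window except 3004.
conn.executemany(
    "insert into feature_fact values (?, 'embedding', 'muq-base', 'hf-main', 'muq_5s_hop2.5_v1')",
    [(3001,), (3002,), (3003,)],
)

missing = conn.execute(
    """
    select w.object_id, w.song_id, w.parent_object_id, w.start_ms, w.end_ms
    from audio_object w
    where w.object_type = 'window'
      and not exists (
          select 1
          from feature_fact ff
          where ff.object_id = w.object_id
            and ff.feature_type = 'embedding'
            and ff.model_name = ?
            and ff.model_version = ?
            and ff.feature_set_name = ?
      )
    order by w.song_id, w.parent_object_id, w.start_ms
    """,
    ("muq-base", "hf-main", "muq_5s_hop2.5_v1"),
).fetchall()
print(missing)  # [(3004, 1001, 2002, 12500, 17500)]
```

Passing the model name, version, and feature set as parameters lets one scan routine serve every model in the Phase-1 list.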
730
731 ### 14.5 List the models already available for each window under an asset
732
733 ```sql
734 select w.object_id as window_id,
735        w.start_ms,
736        w.end_ms,
737        string_agg(ff.model_name || ':' || ff.feature_type, ', ' order by ff.model_name) as ready_features
738 from audio_object w
739 left join feature_fact ff
740   on ff.object_id = w.object_id
741 where w.object_type = 'window'
742   and w.parent_object_id = :asset_id
743 group by w.object_id, w.start_ms, w.end_ms
744 order by w.start_ms;
745 ```
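
The query above lists what each window already has; a completeness check also needs the complement, that is, what is still missing. A minimal Python sketch of that step, assuming the required model set is the three models used in this sample and that rows arrive as hypothetical `(window_id, model_name)` pairs (with `None` for a window that has no `feature_fact` row). `REQUIRED_MODELS` and `missing_models` are illustrative names only:

```python
# Assumed required set for this sample; a real deployment would configure this.
REQUIRED_MODELS = {"chromaprint", "mert-v1-95m", "muq-base"}


def missing_models(rows, required=REQUIRED_MODELS):
    """Return {window_id: sorted list of missing model names} for incomplete windows."""
    ready = {}
    for window_id, model_name in rows:
        models = ready.setdefault(window_id, set())
        if model_name is not None:  # None comes from the left join on an empty window
            models.add(model_name)
    return {
        window_id: sorted(required - models)
        for window_id, models in ready.items()
        if required - models
    }


# Hypothetical rows for asset 2001: window 3001 is complete, 3002 is not.
rows = [
    (3001, "chromaprint"),
    (3001, "mert-v1-95m"),
    (3001, "muq-base"),
    (3002, "chromaprint"),
]
gaps = missing_models(rows)
print(gaps)  # {3002: ['mert-v1-95m', 'muq-base']}
```

An empty result means every window under the asset has all required models, so the asset can be marked feature-complete.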
......