Make the fused Phase-1 ACR schema concrete with DDL samples

Constraint: Keep the storage design aligned to the current song-centric model while turning the 4-table fused schema into something engineers can directly review and implement.
Rejected: Keep only conceptual docs without concrete SQL | It leaves too much ambiguity about where slices, models, and features actually land.
Confidence: high
Scope-risk: narrow
Directive: Until the repository gains a production SQL file for the fused model, treat postgres_db_schema_samples.md as the authoritative DDL draft for media_entity/audio_object/feature_fact/set_membership.
Tested: git diff --check on touched files; /usr/local/miniconda3/bin/python scripts/check_markdown_links.py --root docs returned OK for 11 active markdown files
Not-tested: Executing the fused DDL against a live PostgreSQL schema

cnb.bofCdSsphPA
Commit fe416ec9 ... fe416ec9cae627abe79814ec3a3e000feea99d02 authored 2026-06-04 14:39:30 +0800 by cnb.bofCdSsphPA
Showing 4 changed files with 86 additions and 2 deletions
docs/CHANGELOG.md
docs/postgres_db_schema_samples.md
docs/start-here.md
scripts/check_markdown_links.py
--- a/docs/CHANGELOG.md
+++ b/docs/CHANGELOG.md
 ## 2026-06-04

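+- Rewrote `docs/postgres_db_schema_samples.md` as the DDL draft for the current song-centric, fusion-first design, filling in the 4 core tables (`media_entity` / `audio_object` / `feature_fact` / `set_membership`), notes on which table each kind of data lands in, flow diagrams, and common SQL samples.
+
 - Added to `docs/postgresql-data-model.md` a table and flow diagram showing which table slice data, models, and features land in, making explicit that the current default trace-back chain is `feature_fact -> audio_object(window) -> audio_object(asset) -> media_entity(song)`.
 - Narrowed `docs/README.md` to the entry point for the current song-centric design, and removed from the docs directory the template, open-data, business-export, and historical-roadmap documents unrelated to the current design.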

--- a/docs/postgres_db_schema_samples.md
+++ b/docs/postgres_db_schema_samples.md
--- a/docs/start-here.md
+++ b/docs/start-here.md
@@ -59,7 +59,7 @@ cd /workspace/acr-engine
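 ## 3. The project in one sentence

 We are building a music ACR system for **copyright protection / song identification / version attribution**,
-with the goal of quickly locating the correct `song_id / work / recording` attribution among `100w` (1 million) audio files and roughly `30w` (300,000) songs.
+with the goal of quickly locating the correct `song_id` attribution among `100w` (1 million) audio files and roughly `30w` (300,000) songs; at the current stage, the version/recording is not a required return object.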

 ---

@@ -71,7 +71,12 @@ cd /workspace/acr-engine
 - semantic lane challenger：`MuQ`
 - historical baseline：`ECAPA`

-### Data mainline
+### Current Phase-1 minimal mainline
+```text
+song -> asset -> window
+```
+
+### Evolvable full mainline
 ```text
 canonical_song -> work -> recording -> recording_asset -> audio_window
 ```
@@ -139,6 +144,7 @@ model_registry -> feature_set_registry -> audio_embedding / audio_fingerprint ->
 - [README.md](./README.md)
 - [session-handoff.md](./session-handoff.md)
 - [postgresql-data-model.md](./postgresql-data-model.md)
+- [postgres_db_schema_samples.md](./postgres_db_schema_samples.md)
 - [phase1-worker-contract.md](./phase1-worker-contract.md)

 ### Scripts
--- a/scripts/check_markdown_links.py 0 → 100755
+++ b/scripts/check_markdown_links.py 0 → 100755
+#!/usr/bin/env /usr/local/miniconda3/bin/python
+from __future__ import annotations
+
+import argparse
+import fnmatch
+import re
+import sys
+from pathlib import Path
+
+LINK_RE = re.compile(r'!?(?:\[([^\]]*)\])\(([^)]+)\)')
+SKIP_PREFIXES = ('http://', 'https://', 'mailto:', 'tel:', '#')
+DEFAULT_EXCLUDES = ['CHANGELOG.md']
+
+
+def should_check(target: str) -> bool:
+    target = target.strip()
+    return bool(target) and not target.startswith(SKIP_PREFIXES)
+
+
+def normalize_target(raw: str) -> str:
+    target = raw.strip()
+    if target.startswith('<') and target.endswith('>'):
+        target = target[1:-1]
+    target = target.split('#', 1)[0].split('?', 1)[0].strip()
+    return target
+
+
+def iter_markdown_files(root: Path, excludes: list[str]) -> list[Path]:
+    files: list[Path] = []
+    for path in sorted(root.rglob('*.md')):
+        rel = path.relative_to(root).as_posix()
+        if any(fnmatch.fnmatch(rel, pattern) for pattern in excludes):
+            continue
+        files.append(path)
+    return files
+
+
+def scan_markdown_file(path: Path, root: Path) -> list[tuple[str, str]]:
+    missing: list[tuple[str, str]] = []
+    text = path.read_text(encoding='utf-8')
+    for _, raw_target in LINK_RE.findall(text):
+        if not should_check(raw_target):
+            continue
+        target = normalize_target(raw_target)
+        if not target:
+            continue
+        resolved = (path.parent / target).resolve()
+        if not resolved.exists():
+            missing.append((path.relative_to(root).as_posix(), raw_target))
+    return missing
+
+
+if __name__ == '__main__':
+    parser = argparse.ArgumentParser(description='Check relative Markdown links for missing files.')
+    parser.add_argument('--root', default='docs', help='Root directory containing markdown files')
+    parser.add_argument('--exclude', action='append', default=[], help='Glob patterns relative to root to exclude')
+    args = parser.parse_args()
+
+    root = Path(args.root).resolve()
+    if not root.exists():
+        print(f'root not found: {root}', file=sys.stderr)
+        sys.exit(2)
+
+    excludes = DEFAULT_EXCLUDES + list(args.exclude)
+    files = iter_markdown_files(root, excludes)
+    failures: list[tuple[str, str]] = []
+    for md in files:
+        failures.extend(scan_markdown_file(md, root))
+
+    if failures:
+        print('Missing relative markdown targets:')
+        for source, target in failures:
+            print(f'- {source}: {target}')
+        sys.exit(1)
+
+    print(f'OK: checked {len(files)} markdown files under {root} (excluded: {excludes})')
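
To make the link-filtering behavior of `scripts/check_markdown_links.py` easier to review, here is a minimal sketch that re-declares the regex and the two helpers from the diff above and runs them on a made-up Markdown snippet. The `extract_targets` helper and the `SAMPLE` text are hypothetical and exist only for illustration; they are not part of the commit.

```python
from __future__ import annotations

import re

# Copied from scripts/check_markdown_links.py as added in this commit.
LINK_RE = re.compile(r'!?(?:\[([^\]]*)\])\(([^)]+)\)')
SKIP_PREFIXES = ('http://', 'https://', 'mailto:', 'tel:', '#')


def should_check(target: str) -> bool:
    target = target.strip()
    return bool(target) and not target.startswith(SKIP_PREFIXES)


def normalize_target(raw: str) -> str:
    target = raw.strip()
    if target.startswith('<') and target.endswith('>'):
        target = target[1:-1]
    return target.split('#', 1)[0].split('?', 1)[0].strip()


def extract_targets(text: str) -> list[str]:
    """Hypothetical helper: list the relative paths the checker would resolve."""
    targets: list[str] = []
    for _, raw in LINK_RE.findall(text):
        if not should_check(raw):
            continue  # external URLs and pure anchors are skipped
        normalized = normalize_target(raw)
        if normalized:
            targets.append(normalized)
    return targets


# Made-up sample covering the cases the script distinguishes.
SAMPLE = (
    "- [README.md](./README.md)\n"
    "- [data model](./postgresql-data-model.md#tables)\n"
    "- [external](https://example.com/doc.md)\n"
    "- [anchor only](#section)\n"
    "- ![diagram](<./img/flow.png>)\n"
)

print(extract_targets(SAMPLE))
# ['./README.md', './postgresql-data-model.md', './img/flow.png']
```

One behavior worth noting when reviewing: a link written with a title, such as `[x](a.md "t")`, keeps the title inside the captured target, so the script as committed would likely report it as a missing file.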