Commit 7ada6f21 7ada6f21f4fbbffa470ad5cf474fc46ffb6d7e97 by cnb.bofCdSsphPA

Freeze the Phase-1 minimal schema story for ACR delivery

Constraint: Keep the production-ready v2 model intact while making the first-delivery table set explicit for engineers starting implementation.
Rejected: Introduce a separate competing Phase-1 schema document | It would create another parallel truth and slow handoff.
Confidence: high
Scope-risk: narrow
Directive: When discussing first-stage storage, default to song/recording/recording_asset/audio_window plus feature and reference tables before bringing in heavier governance tables.
Tested: git diff --check on touched docs; /usr/local/miniconda3/bin/python scripts/check_markdown_links.py --root docs returned OK for 31 markdown files
Not-tested: SQL DDL generation from the simplified narrative
1 parent 89d9d72b
......@@ -8,6 +8,8 @@
- Added to `docs/postgresql-data-model.md` and `docs/acr-architecture.md` an explicit position on "why there are so many layers, how Phase-1 simplifies them, and whether `recording` and `asset` can be merged": keep `song -> recording -> asset -> window -> feature` as the minimum viable skeleton, and do not merge `recording` and `recording_asset` in the formal schema.
- Added a Phase-1 minimal schema view to `docs/postgresql-data-model.md`, making explicit the table set to stabilize first: `song/recording/recording_asset/audio_window`, `feature_set_registry/audio_fingerprint/audio_embedding`, and `reference_set_registry/reference_set_member`.
## 2026-06-04
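- Updated the top of `docs/README.md` to the "shortest startup path" consistent with `session-handoff`, then re-ran `run_planner_validation_commands_live.py` using that entry command and confirmed the fresh result is still `executed_count=4` and `all_passed=true`.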
......
......@@ -45,6 +45,7 @@ cd /workspace/acr-engine
### C. How to land the first phase
- [phase1-implementation-checklist.md](./phase1-implementation-checklist.md): Phase-1 execution checklist
- [postgresql-data-model.md](./postgresql-data-model.md): includes the Phase-1 minimal schema view
- [model-feature-registry-bootstrap.md](./model-feature-registry-bootstrap.md): model/feature/reference set initialization
- [phase1-worker-contract.md](./phase1-worker-contract.md): worker, job, and failure-semantics contract
- [postgres_db_schema_samples.md](./postgres_db_schema_samples.md): PostgreSQL storage samples
......
......@@ -256,6 +256,62 @@ window -> fingerprint / embedding -> candidate -> aggregate
---
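## 1.3 Phase-1 minimal schema view
Viewed purely as "which tables must land in the first phase", the formal design can be compressed into the following minimal table set:
| Layer | Recommended tables | Current role |
|---|---|---|
| Ownership layer | `song` (or the equivalent of the current `canonical_song`), `recording` | Attributes the final result to a song and distinguishes recording versions |
| Asset layer | `recording_asset` | Manages the actual audio files, their sources, and their encoding versions |
| Window layer | `audio_window` | Supports offset / evidence / multi-segment voting |
| Feature layer | `feature_set_registry`, `audio_fingerprint`, `audio_embedding` | Records the generation facts for fingerprints and embeddings |
| Reference layer | `reference_set_registry`, `reference_set_member` | Manages the current online reference set |
In other words, what Phase-1 should stabilize first is: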
```text
song -> recording -> recording_asset -> audio_window
feature_set_registry -> audio_fingerprint / audio_embedding
reference_set_registry -> reference_set_member
```
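As a minimal sketch of the chain above, the following builds the nine tables in an in-memory SQLite database. Only the table names and the parent-to-child direction come from this document. Every column name, the use of SQLite as a stand-in for PostgreSQL, and the choice to attach `reference_set_member` to `recording` are assumptions made for illustration, not the project's actual DDL.

```python
import sqlite3

# Hypothetical columns throughout: only the table names and the chain
# direction are taken from the text. SQLite stands in for PostgreSQL.
DDL = """
CREATE TABLE song (
    id INTEGER PRIMARY KEY,
    title TEXT NOT NULL
);
CREATE TABLE recording (
    id INTEGER PRIMARY KEY,
    song_id INTEGER NOT NULL REFERENCES song(id),
    version_label TEXT
);
CREATE TABLE recording_asset (
    id INTEGER PRIMARY KEY,
    recording_id INTEGER NOT NULL REFERENCES recording(id),
    source_uri TEXT NOT NULL,
    codec TEXT
);
CREATE TABLE audio_window (
    id INTEGER PRIMARY KEY,
    asset_id INTEGER NOT NULL REFERENCES recording_asset(id),
    start_ms INTEGER NOT NULL,
    duration_ms INTEGER NOT NULL
);
CREATE TABLE feature_set_registry (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    version TEXT NOT NULL,
    UNIQUE (name, version)
);
CREATE TABLE audio_fingerprint (
    id INTEGER PRIMARY KEY,
    window_id INTEGER NOT NULL REFERENCES audio_window(id),
    feature_set_id INTEGER NOT NULL REFERENCES feature_set_registry(id),
    payload BLOB NOT NULL
);
CREATE TABLE audio_embedding (
    id INTEGER PRIMARY KEY,
    window_id INTEGER NOT NULL REFERENCES audio_window(id),
    feature_set_id INTEGER NOT NULL REFERENCES feature_set_registry(id),
    vector BLOB NOT NULL
);
CREATE TABLE reference_set_registry (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
-- Assumption: members point at recordings; the source does not say.
CREATE TABLE reference_set_member (
    reference_set_id INTEGER NOT NULL REFERENCES reference_set_registry(id),
    recording_id INTEGER NOT NULL REFERENCES recording(id),
    PRIMARY KEY (reference_set_id, recording_id)
);
"""


def build_phase1_schema() -> sqlite3.Connection:
    """Create the nine Phase-1 tables in a throwaway in-memory database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
    conn.executescript(DDL)
    return conn


def table_names(conn: sqlite3.Connection) -> list:
    """Return the names of all tables in the database, sorted alphabetically."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
    return [row[0] for row in rows]
```

With foreign keys enabled, inserting a `recording` row whose `song_id` has no matching `song` is rejected, which is the property the layered chain is meant to guarantee.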
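### What this minimal schema explicitly does not require heavy investment in on day one
These can be added later:
- `work`
- A heavier `retrieval_index_registry`
- Finer-grained online audit tables for `retrieval_candidate / match_decision`
- Complex multi-lane re-ranking governance tables
### But minimal does not mean flat
Even with the minimal version, falling back to a flat structure like the following is **not recommended**: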
```text
song -> embedding
song -> fingerprint
```
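Reasons:
- Without `recording`, version information is lost
- Without `asset`, file provenance and deduplication become messy
- Without `window`, evidence / offset / multi-segment aggregation become much weaker
- Without `feature_set_registry`, every model upgrade ends up hard-coded into the schema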
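To make these reasons concrete, here is a small sketch of multi-segment voting over per-window hits. It is an illustration under assumptions, not the project's matching logic: the `WindowHit` type, its fields, the `vote` function, and feature-set labels such as `"fp-v1"` are all hypothetical. The point is that the recording, window offset, and feature-set fields it relies on have no home in a flat `song -> fingerprint` layout.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class WindowHit:
    """One matched reference window (hypothetical shape)."""
    song_id: int
    recording_id: int
    window_start_ms: int  # offset of the matched window inside the asset
    feature_set: str      # label of a feature_set_registry row, e.g. "fp-v1"


def vote(hits, feature_set):
    """Pick the recording with the most window hits for one feature set."""
    scoped = [h for h in hits if h.feature_set == feature_set]
    if not scoped:
        return {"song_id": None, "recording_id": None, "offsets_ms": []}
    counts = Counter(h.recording_id for h in scoped)
    best_recording, _ = counts.most_common(1)[0]
    winners = [h for h in scoped if h.recording_id == best_recording]
    return {
        "song_id": winners[0].song_id,            # attribution still ends at song
        "recording_id": best_recording,           # but the version is preserved
        "offsets_ms": sorted(h.window_start_ms for h in winners),  # evidence
    }
```

Because hits are filtered by `feature_set`, results from an older and a newer model can coexist as rows and be compared without changing the table layout.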
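### A practical implementation approach
If the team needs to start building now, the recommended order is:
1. First land `song / recording / recording_asset / audio_window`
2. Then land `feature_set_registry / audio_fingerprint / audio_embedding`
3. Then land `reference_set_registry / reference_set_member`
4. Finally add enhancement layers such as `work / retrieval_index_registry / match_decision`
This keeps the current Phase-1 simple without breaking future extensibility.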
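One way to sanity-check this ordering is to write the table dependencies down and confirm that no table is scheduled before something it references. The sketch below does that with the standard-library `graphlib`. The dependency map is an assumption inferred from the chains in this section (in particular, `reference_set_member` pointing at `recording` is a guess), and the step numbers mirror steps 1 to 3 of the list.

```python
from graphlib import TopologicalSorter

# Hypothetical map: table -> tables it references.
DEPENDS_ON = {
    "song": set(),
    "recording": {"song"},
    "recording_asset": {"recording"},
    "audio_window": {"recording_asset"},
    "feature_set_registry": set(),
    "audio_fingerprint": {"audio_window", "feature_set_registry"},
    "audio_embedding": {"audio_window", "feature_set_registry"},
    "reference_set_registry": set(),
    "reference_set_member": {"reference_set_registry", "recording"},
}

# Rollout step for each table, mirroring steps 1 to 3 of the list.
STEP_OF = {
    "song": 1, "recording": 1, "recording_asset": 1, "audio_window": 1,
    "feature_set_registry": 2, "audio_fingerprint": 2, "audio_embedding": 2,
    "reference_set_registry": 3, "reference_set_member": 3,
}


def creation_order():
    """Return one valid order in which the tables could be created."""
    return list(TopologicalSorter(DEPENDS_ON).static_order())


def respects_dependencies(order):
    """True if every table appears after all tables it references."""
    position = {table: index for index, table in enumerate(order)}
    return all(
        position[dep] < position[table]
        for table, deps in DEPENDS_ON.items()
        for dep in deps
    )


def rollout_is_safe():
    """True if no table depends on a table from a later rollout step."""
    return all(
        STEP_OF[dep] <= STEP_OF[table]
        for table, deps in DEPENDS_ON.items()
        for dep in deps
    )
```

Under these assumed dependencies the three-step rollout is safe: each step only references tables from the same or an earlier step.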
---
## 2. Main data chain
```mermaid
......