Freeze the Phase-1 minimal schema story for ACR delivery
Constraint: Keep the production-ready v2 model intact while making the first-delivery table set explicit for engineers starting implementation.
Rejected: Introduce a separate competing Phase-1 schema document | It would create another parallel truth and slow handoff.
Confidence: high
Scope-risk: narrow
Directive: When discussing first-stage storage, default to song/recording/recording_asset/audio_window plus feature and reference tables before bringing in heavier governance tables.
Tested: git diff --check on touched docs; /usr/local/miniconda3/bin/python scripts/check_markdown_links.py --root docs returned OK for 31 markdown files
Not-tested: SQL DDL generation from the simplified narrative
Showing 3 changed files with 59 additions and 0 deletions
| ... | @@ -8,6 +8,8 @@ | ... | @@ -8,6 +8,8 @@ |
| 8 | 8 | ||
| 9 | - In `docs/postgresql-data-model.md` and `docs/acr-architecture.md`, added an explicit position on "why there are so many layers, how Phase-1 simplifies them, and whether `recording` and `asset` can be merged": keep `song -> recording -> asset -> window -> feature` as the minimal viable skeleton, and do not merge `recording` and `recording_asset` in the formal schema. | 9 | - In `docs/postgresql-data-model.md` and `docs/acr-architecture.md`, added an explicit position on "why there are so many layers, how Phase-1 simplifies them, and whether `recording` and `asset` can be merged": keep `song -> recording -> asset -> window -> feature` as the minimal viable skeleton, and do not merge `recording` and `recording_asset` in the formal schema. |
| 10 | 10 | ||
| 11 | - In `docs/postgresql-data-model.md`, added a Phase-1 minimal schema view that spells out the table set to stabilize first: `song/recording/recording_asset/audio_window`, `feature_set_registry/audio_fingerprint/audio_embedding`, `reference_set_registry/reference_set_member`. | ||
| 12 | |||
| 11 | ## 2026-06-04 | 13 | ## 2026-06-04 |
| 12 | 14 | ||
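| 13 | - Updated the top of `docs/README.md` to the same "shortest startup path" used by `session-handoff`, then reran `run_planner_validation_commands_live.py` through that entry command and confirmed the fresh result is still `executed_count=4`, `all_passed=true`. | 15 | - Updated the top of `docs/README.md` to the same "shortest startup path" used by `session-handoff`, then reran `run_planner_validation_commands_live.py` through that entry command and confirmed the fresh result is still `executed_count=4`, `all_passed=true`. | ... | ... |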
| ... | @@ -45,6 +45,7 @@ cd /workspace/acr-engine | ... | @@ -45,6 +45,7 @@ cd /workspace/acr-engine |
| 45 | 45 | ||
| 46 | ### C. How to land the first phase | 46 | ### C. How to land the first phase |
| 47 | - [phase1-implementation-checklist.md](./phase1-implementation-checklist.md): Phase-1 implementation checklist | 47 | - [phase1-implementation-checklist.md](./phase1-implementation-checklist.md): Phase-1 implementation checklist |
| 48 | - [postgresql-data-model.md](./postgresql-data-model.md): includes the Phase-1 minimal schema view | ||
| 48 | - [model-feature-registry-bootstrap.md](./model-feature-registry-bootstrap.md): model/feature/reference set initialization | 49 | - [model-feature-registry-bootstrap.md](./model-feature-registry-bootstrap.md): model/feature/reference set initialization |
| 49 | - [phase1-worker-contract.md](./phase1-worker-contract.md): worker, job, and failure-semantics contract | 50 | - [phase1-worker-contract.md](./phase1-worker-contract.md): worker, job, and failure-semantics contract |
| 50 | - [postgres_db_schema_samples.md](./postgres_db_schema_samples.md): PostgreSQL storage samples | 51 | - [postgres_db_schema_samples.md](./postgres_db_schema_samples.md): PostgreSQL storage samples | ... | ... |
| ... | @@ -256,6 +256,62 @@ window -> fingerprint / embedding -> candidate -> aggregate | ... | @@ -256,6 +256,62 @@ window -> fingerprint / embedding -> candidate -> aggregate |
| 256 | 256 | ||
| 257 | --- | 257 | --- |
| 258 | 258 | ||
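| 259 | ## 1.3 Phase-1 minimal schema view | ||
| 260 | |||
| 261 | If the only question is which tables Phase-1 must land, the formal design can be compressed into the following minimal table set: | ||
| 262 | |||
| 263 | | Layer | Tables to keep | Current role | | ||
| 264 | |---|---|---| | ||
| 265 | | Ownership layer | `song` (or the equivalent of the current `canonical_song`), `recording` | Attributes results to a song and distinguishes recording versions | | ||
| 266 | | Asset layer | `recording_asset` | Manages the actual audio files, their sources, and encoding versions | | ||
| 267 | | Window layer | `audio_window` | Supports offset / evidence / multi-segment voting | | ||
| 268 | | Feature layer | `feature_set_registry`, `audio_fingerprint`, `audio_embedding` | Records the generation facts for fingerprints / embeddings | | ||
| 269 | | Reference layer | `reference_set_registry`, `reference_set_member` | Manages the current online reference set | | ||
| 270 | |||
| 271 | In other words, what Phase-1 should stabilize first is: | ||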
| 272 | |||
| 273 | ```text | ||
| 274 | song -> recording -> recording_asset -> audio_window | ||
| 275 | feature_set_registry -> audio_fingerprint / audio_embedding | ||
| 276 | reference_set_registry -> reference_set_member | ||
| 277 | ``` | ||
| 278 | |||
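| 279 | ### What this minimal schema explicitly does not require heavy investment in on day one | ||
| 280 | |||
| 281 | These can be added later: | ||
| 282 | - `work` | ||
| 283 | - a heavier `retrieval_index_registry` | ||
| 284 | - finer-grained `retrieval_candidate / match_decision` online audit tables | ||
| 285 | - complex multi-lane re-ranking governance tables | ||
| 286 | |||
| 287 | ### But minimal does not mean flat | ||
| 288 | |||
| 289 | Even with the minimal version, falling back to a flat structure like the one below is **not recommended**: | ||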
| 290 | |||
| 291 | ```text | ||
| 292 | song -> embedding | ||
| 293 | song -> fingerprint | ||
| 294 | ``` | ||
| 295 | |||
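| 296 | Reasons: | ||
| 297 | - Without `recording`, version information is lost | ||
| 298 | - Without `asset`, file provenance and deduplication become messy | ||
| 299 | - Without `window`, evidence / offset / multi-segment aggregation becomes much weaker | ||
| 300 | - Without `feature_set_registry`, model upgrades end up hard-coded into the schema | ||
| 301 | |||
| 302 | ### The most practical implementation approach | ||
| 303 | |||
| 304 | If the team needs to start building now, the recommended order is: | ||
| 305 | |||
| 306 | 1. First land `song / recording / recording_asset / audio_window` | ||
| 307 | 2. Then land `feature_set_registry / audio_fingerprint / audio_embedding` | ||
| 308 | 3. Then land `reference_set_registry / reference_set_member` | ||
| 309 | 4. Finally add enhancement layers such as `work / retrieval_index_registry / match_decision` | ||
| 310 | |||
| 311 | This keeps the current Phase-1 simple without breaking future extension. | ||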
| 312 | |||
| 313 | --- | ||
| 314 | |||
| 259 | ## 2. Main data chain | 315 | ## 2. Main data chain |
| 260 | 316 | ||
| 261 | ```mermaid | 317 | ```mermaid | ... | ... |
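
The commit message lists SQL DDL generation from the simplified narrative as not tested. As a rough illustration only, the Phase-1 table chain in the diff above might be sketched as follows. This is a minimal sketch under stated assumptions: Python's built-in `sqlite3` stands in for PostgreSQL, every column name (`title`, `uri`, `start_ms`, `payload`, and so on) is a hypothetical placeholder rather than something taken from `docs/postgresql-data-model.md`, and the choice to key `reference_set_member` on `recording` is also an assumption. Only the table names and their parent-to-child order come from the diff.

```python
import sqlite3

# Hypothetical Phase-1 DDL sketch. Table names follow the diff; all columns
# are illustrative assumptions. SQLite stands in for PostgreSQL here.
PHASE1_DDL = """
CREATE TABLE song (
    id INTEGER PRIMARY KEY,
    title TEXT NOT NULL
);
CREATE TABLE recording (
    id INTEGER PRIMARY KEY,
    song_id INTEGER NOT NULL REFERENCES song(id)
);
CREATE TABLE recording_asset (
    id INTEGER PRIMARY KEY,
    recording_id INTEGER NOT NULL REFERENCES recording(id),
    uri TEXT NOT NULL
);
CREATE TABLE audio_window (
    id INTEGER PRIMARY KEY,
    asset_id INTEGER NOT NULL REFERENCES recording_asset(id),
    start_ms INTEGER NOT NULL,
    end_ms INTEGER NOT NULL
);
CREATE TABLE feature_set_registry (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
CREATE TABLE audio_fingerprint (
    window_id INTEGER NOT NULL REFERENCES audio_window(id),
    feature_set_id INTEGER NOT NULL REFERENCES feature_set_registry(id),
    payload BLOB NOT NULL,
    PRIMARY KEY (window_id, feature_set_id)
);
CREATE TABLE audio_embedding (
    window_id INTEGER NOT NULL REFERENCES audio_window(id),
    feature_set_id INTEGER NOT NULL REFERENCES feature_set_registry(id),
    payload BLOB NOT NULL,
    PRIMARY KEY (window_id, feature_set_id)
);
CREATE TABLE reference_set_registry (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
-- Assumption: members are recordings; the diff does not say what a member is.
CREATE TABLE reference_set_member (
    reference_set_id INTEGER NOT NULL REFERENCES reference_set_registry(id),
    recording_id INTEGER NOT NULL REFERENCES recording(id),
    PRIMARY KEY (reference_set_id, recording_id)
);
"""


def create_phase1_schema(conn: sqlite3.Connection) -> None:
    """Create the nine Phase-1 tables in dependency order."""
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(PHASE1_DDL)


def song_for_window(conn: sqlite3.Connection, window_id: int):
    """Walk window -> asset -> recording -> song; return the title or None."""
    row = conn.execute(
        """
        SELECT s.title
        FROM audio_window w
        JOIN recording_asset a ON a.id = w.asset_id
        JOIN recording r ON r.id = a.recording_id
        JOIN song s ON s.id = r.song_id
        WHERE w.id = ?
        """,
        (window_id,),
    ).fetchone()
    return row[0] if row else None
```

The `song_for_window` join illustrates why the diff argues against a flat `song -> embedding` layout: a matched window can be traced back through its asset and recording to a song, while the version and file-provenance information stays available along the way.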