Why the retrieval docs need the online song backtrace made explicit
Constraint: New engineers need a direct feature_fact-to-song_id query path on the current 4-table schema without reconstructing it from scattered examples
Rejected: Leave only insert-side diagrams | does not explain how online recall returns song ownership evidence
Confidence: high
Scope-risk: narrow
Directive: Keep query-path docs aligned with the feature_fact -> window -> asset -> song chain when adding new retrieval lanes
Tested: markdown link check on /workspace/docs after adding retrieval flow diagrams and SQL templates
Not-tested: No live database rerun; this change only documents the already-verified schema path
Showing 3 changed files with 157 additions and 0 deletions
| ... | @@ -4,6 +4,7 @@ | ... | @@ -4,6 +4,7 @@ |
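| 4 | - Consolidate `docs/` onto the current song-centric mainline: keep only the six core documents `README / start-here / session-handoff / postgresql-data-model / postgres_db_schema_samples / CHANGELOG` and delete the old v2 / planner-worker / registry extension docs, so new team members do not wander into designs that have been demoted to secondary status. | 4 | - Consolidate `docs/` onto the current song-centric mainline: keep only the six core documents `README / start-here / session-handoff / postgresql-data-model / postgres_db_schema_samples / CHANGELOG` and delete the old v2 / planner-worker / registry extension docs, so new team members do not wander into designs that have been demoted to secondary status. |
| 5 | - Rewrite `docs/postgresql-data-model.md` to spell out the table mapping for `stored slice data + model + feature`: a `window` is stored in `audio_object`, model identity is stored in `feature_fact.model_name/model_version/feature_set_name`, and the concrete `fingerprint/embedding` values are likewise stored in `feature_fact`. | 5 | - Rewrite `docs/postgresql-data-model.md` to spell out the table mapping for `stored slice data + model + feature`: a `window` is stored in `audio_object`, model identity is stored in `feature_fact.model_name/model_version/feature_set_name`, and the concrete `fingerprint/embedding` values are likewise stored in `feature_fact`. |
| 6 | - Rewrite `docs/postgres_db_schema_samples.md` and the entry-point docs, adding flowcharts for the current 4-table main chain, typical SQL samples, the query backtrace path, and the write order, and align all docs on `media_entity -> audio_object -> feature_fact -> set_membership`. | 6 | - Rewrite `docs/postgres_db_schema_samples.md` and the entry-point docs, adding flowcharts for the current 4-table main chain, typical SQL samples, the query backtrace path, and the write order, and align all docs on `media_entity -> audio_object -> feature_fact -> set_membership`. |
| 7 | - Further strengthen the online retrieval docs: add `feature_fact -> window -> asset -> song_id` backtrace flowcharts and song-level aggregation SQL templates to `docs/postgresql-data-model.md` and `docs/postgres_db_schema_samples.md`, so engineers can implement post-recall song attribution directly against the current schema. | ||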
| 7 | 8 | ||
| 8 | ## 2026-06-04 | 9 | ## 2026-06-04 |
| 9 | 10 | ... | ... |
| ... | @@ -381,3 +381,59 @@ song(Song Alpha) | ... | @@ -381,3 +381,59 @@ song(Song Alpha) |
| 381 | - [start-here.md](./start-here.md) | 381 | - [start-here.md](./start-here.md) |
| 382 | - [session-handoff.md](./session-handoff.md) | 382 | - [session-handoff.md](./session-handoff.md) |
| 383 | - [postgresql-data-model.md](./postgresql-data-model.md) | 383 | - [postgresql-data-model.md](./postgresql-data-model.md) |
| 384 | |||
| 385 | --- | ||
| 386 | |||
| 387 | ## 11. Online retrieval backtrace samples | ||
| 388 | |||
| 389 | ### 11.1 Tracing a matched feature back to its song | ||
| 390 | |||
| 391 | ```mermaid | ||
| 392 | flowchart LR | ||
| 393 | A[feature_fact] --> B[window] | ||
| 394 | B --> C[asset] | ||
| 395 | C --> D[song] | ||
| 396 | ``` | ||
| 397 | |||
| 398 | ### 11.2 Typical online query SQL | ||
| 399 | |||
| 400 | ```sql | ||
| 401 | select ff.feature_id, | ||
| 402 | ff.feature_type, | ||
| 403 | ff.model_name, | ||
| 404 | ff.feature_set_name, | ||
| 405 | w.object_id as window_id, | ||
| 406 | w.start_ms, | ||
| 407 | w.end_ms, | ||
| 408 | a.object_id as asset_id, | ||
| 409 | a.storage_uri, | ||
| 410 | s.entity_id as song_id, | ||
| 411 | s.title, | ||
| 412 | s.artist_name | ||
| 413 | from feature_fact ff | ||
| 414 | join audio_object w | ||
| 415 | on w.object_id = ff.object_id | ||
| 416 | and w.object_type = 'window' | ||
| 417 | join audio_object a | ||
| 418 | on a.object_id = w.parent_object_id | ||
| 419 | and a.object_type = 'asset' | ||
| 420 | join media_entity s | ||
| 421 | on s.entity_id = ff.song_id | ||
| 422 | where ff.feature_id = :feature_id; | ||
| 423 | ``` | ||
| 424 | |||
| 425 | ### 11.3 Typical song-level aggregation SQL | ||
| 426 | |||
| 427 | ```sql | ||
| 428 | select ff.song_id, | ||
| 429 | s.title, | ||
| 430 | s.artist_name, | ||
| 431 | count(*) as matched_windows | ||
| 432 | from feature_fact ff | ||
| 433 | join media_entity s | ||
| 434 | on s.entity_id = ff.song_id | ||
| 435 | where ff.feature_id = any(:matched_feature_ids) | ||
| 436 | group by ff.song_id, s.title, s.artist_name | ||
| 437 | order by matched_windows desc | ||
| 438 | limit 20; | ||
| 439 | ``` | ... | ... |
| ... | @@ -288,3 +288,103 @@ Phase-1 does not yet require: | ... | @@ -288,3 +288,103 @@ Phase-1 does not yet require: |
| 288 | - [start-here.md](./start-here.md) | 288 | - [start-here.md](./start-here.md) |
| 289 | - [session-handoff.md](./session-handoff.md) | 289 | - [session-handoff.md](./session-handoff.md) |
| 290 | - [postgres_db_schema_samples.md](./postgres_db_schema_samples.md) | 290 | - [postgres_db_schema_samples.md](./postgres_db_schema_samples.md) |
| 291 | |||
| 292 | --- | ||
| 293 | |||
| 294 | ## 13. How to get from a feature back to `song_id` during online retrieval | ||
| 295 | |||
| 296 | This is the backtrace chain engineers most need to keep in mind right now: | ||
| 297 | |||
| 298 | ```text | ||
| 299 | feature_fact -> audio_object(window) -> audio_object(asset) -> media_entity(song) | ||
| 300 | ``` | ||
| 301 | |||
| 302 | ### 13.1 Online retrieval flowchart | ||
| 303 | |||
| 304 | ```mermaid | ||
| 305 | flowchart LR | ||
| 306 | Q[query audio] --> QW[query windows] | ||
| 307 | QW --> QE[query fingerprint / embedding] | ||
| 308 | QE --> FF[feature_fact] | ||
| 309 | FF --> W[audio_object\nobject_type=window] | ||
| 310 | W --> A[audio_object\nobject_type=asset] | ||
| 311 | A --> S[media_entity\nentity_type=song] | ||
| 312 | S --> R[return song_id + title + artist + evidence] | ||
| 313 | ``` | ||
| 314 | |||
| 315 | ### 13.2 Aggregation flowchart | ||
| 316 | |||
| 317 | ```mermaid | ||
| 318 | flowchart TD | ||
| 319 | A[query window features] --> B[match multiple feature_fact rows] | ||
| 320 | B --> C[look up window] | ||
| 321 | C --> D[look up asset] | ||
| 322 | D --> E[aggregate to song_id] | ||
| 323 | E --> F[rank by hit_count / score / offset coverage] | ||
| 324 | F --> G[return topK songs] | ||
| 325 | ``` | ||
| 326 | |||
| 327 | ### 13.3 Minimal query SQL template | ||
| 328 | |||
| 329 | ```sql | ||
| 330 | select ff.feature_id, | ||
| 331 | ff.feature_type, | ||
| 332 | ff.model_name, | ||
| 333 | ff.model_version, | ||
| 334 | ff.feature_set_name, | ||
| 335 | w.object_id as window_id, | ||
| 336 | w.start_ms, | ||
| 337 | w.end_ms, | ||
| 338 | a.object_id as asset_id, | ||
| 339 | a.storage_uri, | ||
| 340 | s.entity_id as song_id, | ||
| 341 | s.title, | ||
| 342 | s.artist_name | ||
| 343 | from feature_fact ff | ||
| 344 | join audio_object w | ||
| 345 | on w.object_id = ff.object_id | ||
| 346 | and w.object_type = 'window' | ||
| 347 | join audio_object a | ||
| 348 | on a.object_id = w.parent_object_id | ||
| 349 | and a.object_type = 'asset' | ||
| 350 | join media_entity s | ||
| 351 | on s.entity_id = ff.song_id | ||
| 352 | where ff.feature_id = :feature_id; | ||
| 353 | ``` | ||
| 354 | |||
| 355 | ### 13.4 A song-level aggregation SQL template | ||
| 356 | |||
| 357 | ```sql | ||
| 358 | select ff.song_id, | ||
| 359 | s.title, | ||
| 360 | s.artist_name, | ||
| 361 | count(*) as matched_windows, | ||
| 362 | min(w.start_ms) as first_hit_ms, | ||
| 363 | max(w.end_ms) as last_hit_ms | ||
| 364 | from feature_fact ff | ||
| 365 | join audio_object w | ||
| 366 | on w.object_id = ff.object_id | ||
| 367 | and w.object_type = 'window' | ||
| 368 | join media_entity s | ||
| 369 | on s.entity_id = ff.song_id | ||
| 370 | where ff.feature_type = :feature_type | ||
| 371 | and ff.model_name = :model_name | ||
| 372 | and ff.feature_set_name = :feature_set_name | ||
| 373 | and ff.feature_id = any(:matched_feature_ids) | ||
| 374 | group by ff.song_id, s.title, s.artist_name | ||
| 375 | order by matched_windows desc, first_hit_ms asc | ||
| 376 | limit 20; | ||
| 377 | ``` | ||
| 378 | |||
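| 379 | ### 13.5 Why this chain matters | ||
| 380 | |||
| 381 | Because it cleanly separates three concerns: | ||
| 382 | - `feature_fact` answers: **which feature was matched** | ||
| 383 | - `audio_object(window/asset)` answers: **which segment was matched, and which file it came from** | ||
| 384 | - `media_entity(song)` answers: **which `song_id` the match is ultimately attributed to** | ||
| 385 | |||
| 386 | So even without introducing the more complex `recording/work/version` model, Phase-1 is already sufficient to support: | ||
| 387 | - copyright-protection attribution | ||
| 388 | - segment/BGM localization | ||
| 389 | - evidence lookup | ||
| 390 | - topK song-level recall | ... | ... |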
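
The backtrace and aggregation templates in the diff can be exercised without a live PostgreSQL instance. The following is a minimal sketch, not the project's actual tooling: it uses Python's standard `sqlite3` module with an in-memory database, and it assumes a toy schema whose column names are copied from the SQL templates above. Column types, the sample rows, the `demo-*` values, and the helper names `build_demo_db`, `backtrace`, and `top_songs` are all hypothetical. Because SQLite has no `= any(:array)`, the sketch substitutes an `in (...)` list with generated placeholders.

```python
import sqlite3

# Hypothetical toy schema: column names come from the SQL templates in the
# diff; the column types and all sample rows below are assumptions.
SCHEMA = """
create table media_entity (
    entity_id text primary key, entity_type text, title text, artist_name text
);
create table audio_object (
    object_id text primary key, object_type text, parent_object_id text,
    start_ms integer, end_ms integer, storage_uri text
);
create table feature_fact (
    feature_id text primary key, object_id text, song_id text,
    feature_type text, model_name text, model_version text,
    feature_set_name text
);
"""


def build_demo_db():
    """Create an in-memory database with two songs, two assets, three windows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("insert into media_entity values (?,?,?,?)", [
        ("song-1", "song", "Song Alpha", "Artist A"),
        ("song-2", "song", "Song Beta", "Artist B"),
    ])
    conn.executemany("insert into audio_object values (?,?,?,?,?,?)", [
        ("asset-1", "asset", None, None, None, "s3://demo/alpha.wav"),
        ("asset-2", "asset", None, None, None, "s3://demo/beta.wav"),
        ("win-1", "window", "asset-1", 0, 5000, None),
        ("win-2", "window", "asset-1", 5000, 10000, None),
        ("win-3", "window", "asset-2", 0, 5000, None),
    ])
    conn.executemany("insert into feature_fact values (?,?,?,?,?,?,?)", [
        ("f-1", "win-1", "song-1", "embedding", "demo-model", "v1", "demo-set"),
        ("f-2", "win-2", "song-1", "embedding", "demo-model", "v1", "demo-set"),
        ("f-3", "win-3", "song-2", "embedding", "demo-model", "v1", "demo-set"),
    ])
    return conn


def backtrace(conn, feature_id):
    """Follow feature_fact -> window -> asset -> song for one matched feature."""
    return conn.execute(
        """
        select ff.feature_id, w.object_id, w.start_ms, w.end_ms,
               a.object_id, a.storage_uri, s.entity_id, s.title
        from feature_fact ff
        join audio_object w
          on w.object_id = ff.object_id and w.object_type = 'window'
        join audio_object a
          on a.object_id = w.parent_object_id and a.object_type = 'asset'
        join media_entity s
          on s.entity_id = ff.song_id
        where ff.feature_id = ?
        """,
        (feature_id,),
    ).fetchone()


def top_songs(conn, matched_feature_ids, limit=20):
    """Aggregate matched features to song level, most hits first."""
    if not matched_feature_ids:
        return []
    # SQLite stand-in for PostgreSQL's `= any(:matched_feature_ids)`.
    placeholders = ",".join("?" for _ in matched_feature_ids)
    return conn.execute(
        f"""
        select ff.song_id, s.title, count(*) as matched_windows,
               min(w.start_ms) as first_hit_ms, max(w.end_ms) as last_hit_ms
        from feature_fact ff
        join audio_object w
          on w.object_id = ff.object_id and w.object_type = 'window'
        join media_entity s
          on s.entity_id = ff.song_id
        where ff.feature_id in ({placeholders})
        group by ff.song_id, s.title
        order by matched_windows desc, first_hit_ms asc
        limit ?
        """,
        (*matched_feature_ids, limit),
    ).fetchall()


if __name__ == "__main__":
    demo = build_demo_db()
    print(backtrace(demo, "f-2"))
    print(top_songs(demo, ["f-1", "f-2", "f-3"]))
```

Note that, as in the committed templates, the song is resolved through `feature_fact.song_id` directly, and the window and asset joins supply the evidence columns (offsets and `storage_uri`). Against PostgreSQL the `in (...)` list would revert to `= any(:matched_feature_ids)` with an array parameter.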