Commit 797f9032 797f9032e68fa7e89d1e472187eeb720b93e2432 by cnb.bofCdSsphPA

Why the handoff should include a concrete live feature lineage example

Constraint: Future sessions need a zero-ambiguity PostgreSQL example that matches the current live song-centric pipeline
Rejected: Only describing lineage abstractly | forces re-verification every session
Confidence: high
Scope-risk: narrow
Directive: Prefer concrete live feature/window/asset/song examples in handoff docs whenever the default path changes
Tested: markdown link check under docs; live SQL verification for feature_id=34 lineage; manifest feature sample extraction
Not-tested: Re-running the full directory pipeline in this commit
1 parent 6ee8c576
......@@ -3,6 +3,7 @@
## 2026-06-04
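- Continued converging the docs on the current live main-pipeline conventions: filled in the binding description for `feature_fact.object_id -> audio_object(window)`, `window.parent_object_id -> asset`, and `feature_fact.song_id -> media_entity(song)`, and added paired manifest/SQL samples that specifically answer how the Phase-1 open-source model set should be stored and how features relate to audio objects.
- Fixed stale semantic-lane status left over in `docs/session-handoff.md` and aligned it with the current facts: live now defaults to `chromaprint_matcher + mert-v1-95m`, and MuQ remains the next-stage challenger.
- Added more verifiable live samples: wrote the real PostgreSQL trace-back result for `feature_id = 34 -> window_id = 22 -> asset_id = 20 -> song_beta` into the handoff and schema sample docs, so the next session can manually re-verify the binding chain directly.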
## 2026-06-04
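- Fresh runtime progress: `torch-2.12.0+cpu`, `torchaudio-2.11.0+cpu`, and `transformers-5.10.1` were installed successfully on the current host. After re-running the song-centric main pipeline, confirmed `semantic_runtime_available = true`, `semantic_runtime_ready_count = 5`, and `semantic_fallback_count = 0`. Semantic has now moved from fallback to `mert-v1-95m`; the next step is to wire in the `MuQ` adapter without breaking the current MERT baseline.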
......
......@@ -752,6 +752,29 @@ where ff.feature_id = :feature_id;
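3. Use `parent_object_id` to find the `asset` it belongs to
4. Use `song_id` to find the `song` it ultimately belongs to
### 14.1 A real result from the current live environment
In the current PostgreSQL `acr_songcentric_test`, the real trace-back result for `feature_id = 34` is: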
```text
feature_id = 34
feature_type = embedding
model_name = mert-v1-95m
model_version = hf-main
feature_set_name = mert_5s_hop2.5_v1
window_id = 22
window_range = 1000-6000 ms
asset_id = 20
asset_uri = /workspace/acr-engine/data/songcentric_builder_smoke/song_beta/artist_b/clip2.wav
song_id = 9
song_biz_key = song_beta
```
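This live result shows that:
- The current real semantic baseline is already `mert-v1-95m`
- A single embedding feature can be traced back precisely to a specific `window/asset/song`
- This is exactly the minimal evidence loop for "quickly locating the song_id" in the current copyright-protection pipeline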
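
The trace-back above can be sketched as a single join. This is a minimal illustration, not the project's actual schema: it uses an in-memory SQLite database in place of PostgreSQL, and every column name other than `feature_id`, `object_id`, `parent_object_id`, and `song_id` (for example `object_type`, `uri`, `start_ms`, `entity_id`, `biz_key`) is an assumption made for this example. `trace_feature` and `build_demo_db` are hypothetical helper names.

```python
import sqlite3

# Hypothetical, simplified schema. Only the table names and the columns
# feature_id, object_id, parent_object_id and song_id come from the doc;
# all other column names are assumptions for illustration.
SCHEMA = """
CREATE TABLE media_entity (
    entity_id INTEGER PRIMARY KEY,
    entity_type TEXT,
    biz_key TEXT
);
CREATE TABLE audio_object (
    object_id INTEGER PRIMARY KEY,
    object_type TEXT,
    parent_object_id INTEGER,
    uri TEXT,
    start_ms INTEGER,
    end_ms INTEGER
);
CREATE TABLE feature_fact (
    feature_id INTEGER PRIMARY KEY,
    object_id INTEGER,
    song_id INTEGER,
    feature_type TEXT,
    model_name TEXT
);
"""

# feature -> window (object_id), window -> asset (parent_object_id),
# feature -> song (song_id).
LINEAGE_SQL = """
SELECT ff.feature_id, ff.model_name,
       w.object_id AS window_id, w.start_ms, w.end_ms,
       a.object_id AS asset_id, a.uri AS asset_uri,
       s.entity_id AS song_id, s.biz_key AS song_biz_key
FROM feature_fact ff
JOIN audio_object w ON w.object_id = ff.object_id AND w.object_type = 'window'
JOIN audio_object a ON a.object_id = w.parent_object_id AND a.object_type = 'asset'
JOIN media_entity s ON s.entity_id = ff.song_id AND s.entity_type = 'song'
WHERE ff.feature_id = :feature_id
"""


def build_demo_db():
    """Create an in-memory database seeded with the feature_id = 34 sample."""
    conn = sqlite3.connect(":memory:")
    conn.row_factory = sqlite3.Row
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO media_entity VALUES (9, 'song', 'song_beta')")
    conn.execute(
        "INSERT INTO audio_object VALUES (20, 'asset', NULL, ?, NULL, NULL)",
        ("/workspace/acr-engine/data/songcentric_builder_smoke/song_beta/artist_b/clip2.wav",),
    )
    conn.execute("INSERT INTO audio_object VALUES (22, 'window', 20, NULL, 1000, 6000)")
    conn.execute("INSERT INTO feature_fact VALUES (34, 22, 9, 'embedding', 'mert-v1-95m')")
    return conn


def trace_feature(conn, feature_id):
    """Return the lineage row for one feature as a dict, or None if not found."""
    row = conn.execute(LINEAGE_SQL, {"feature_id": feature_id}).fetchone()
    return dict(row) if row else None


if __name__ == "__main__":
    print(trace_feature(build_demo_db(), 34))
```

Against the real PostgreSQL instance, the same join shape would apply, with the actual column names substituted and a PostgreSQL driver in place of `sqlite3`.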
## 15. A complete multi-asset / multi-window / multi-model sample
Assume:
......
......@@ -196,6 +196,29 @@ flowchart TD
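- Current blocker: `import muq` triggers `RuntimeError: operator torchvision::nms does not exist`
- Conclusion: MuQ remains the next-stage challenger, not the current live default baseline
### A live sample that can be verified directly
`feature_id = 34` can currently be used directly for manual verification: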
```text
feature_id = 34
feature_type = embedding
model_name = mert-v1-95m
model_version = hf-main
feature_set_name = mert_5s_hop2.5_v1
window_id = 22
window_range = 1000-6000 ms
asset_id = 20
asset_uri = /workspace/acr-engine/data/songcentric_builder_smoke/song_beta/artist_b/clip2.wav
song_id = 9
song_biz_key = song_beta
```
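This sample demonstrates very directly that:
- A feature is not attached to a song directly; it is attached to a `window` first
- The `window` leads back to its `asset` via `parent_object_id`
- `song_id` finally leads back to `song_beta`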
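
For manual re-verification, the recorded sample can be compared field by field with whatever a fresh query returns. The following is a minimal sketch, assuming the sample stays in the `key = value` text format shown above; `parse_sample` and `diff_lineage` are hypothetical helper names, not part of the repository.

```python
# Recorded sample, copied from the handoff doc.
SAMPLE = """\
feature_id = 34
feature_type = embedding
model_name = mert-v1-95m
model_version = hf-main
feature_set_name = mert_5s_hop2.5_v1
window_id = 22
window_range = 1000-6000 ms
asset_id = 20
asset_uri = /workspace/acr-engine/data/songcentric_builder_smoke/song_beta/artist_b/clip2.wav
song_id = 9
song_biz_key = song_beta
"""


def parse_sample(text):
    """Parse 'key = value' lines into a dict of strings."""
    record = {}
    for line in text.splitlines():
        if "=" not in line:
            continue
        # Split on the first '=' only, so values may contain '='.
        key, _, value = line.partition("=")
        record[key.strip()] = value.strip()
    return record


def diff_lineage(expected, actual):
    """Return {field: (expected, actual)} for every field that differs."""
    return {
        key: (value, actual.get(key))
        for key, value in expected.items()
        if actual.get(key) != value
    }


if __name__ == "__main__":
    expected = parse_sample(SAMPLE)
    # 'actual' would normally come from a fresh query; here it is a copy
    # with one field changed to show what a mismatch looks like.
    actual = dict(expected, window_id="23")
    print(diff_lineage(expected, actual))
```

An empty result from `diff_lineage` means the fresh query still matches the recorded lineage; any entry points to the field that drifted.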
### Current manifest shape (before import)
```json
......