Commit 80df0d301f60778aac95e3d4fd528af8c7afb47d by cnb.bofCdSsphPA

Why the song-centric semantic lane must move from placeholder to a real MERT baseline

Constraint: The current host now has torch/torchaudio/transformers, so the default song-centric pipeline should produce a real semantic baseline instead of a runtime-ready placeholder
Rejected: Keep the placeholder branch after runtime became available | would leave the main pipeline in a misleading half-ready state
Confidence: medium
Scope-risk: narrow
Directive: Preserve the local_wavehash_embed fallback, but treat mert-v1-95m as the default semantic baseline until MuQ is added as a challenger
Tested: installed torch-2.12.0+cpu, torchaudio-2.11.0+cpu, transformers-5.10.1; py_compile for enrich_songcentric_manifest_with_local_features.py; reran song-centric pipeline; verified latest embedding rows are mert-v1-95m; markdown link check on /workspace/docs
Not-tested: MuQ adapter implementation and production vector-table persistence are still pending
1 parent b0c52b54
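The Constraint and Directive trailers describe a two-way choice. If `torch`, `torchaudio`, and `transformers` can be imported, use `mert-v1-95m`. Otherwise, keep the `local_wavehash_embed` fallback. The sketch below shows one way to compute a `runtime_ok` flag and a `missing` list like the ones the diff passes to `build_semantic_feature`. The helper names `check_semantic_runtime` and `pick_semantic_model` are hypothetical. Probing with `importlib.util.find_spec` instead of importing is also an assumption, not necessarily how the script detects the runtime.

```python
import importlib.util

# Modules the MERT lane needs, taken from the commit's Constraint trailer.
SEMANTIC_RUNTIME_MODULES = ("torch", "torchaudio", "transformers")


def check_semantic_runtime(modules=SEMANTIC_RUNTIME_MODULES):
    """Hypothetical helper: return (runtime_ok, missing) without importing heavy deps.

    find_spec only checks that a top-level module can be located, so a broken
    install can still fail later at import time. In the diff, such a failure
    inside load_mert_runtime is caught by build_semantic_feature's except branch.
    """
    missing = [name for name in modules if importlib.util.find_spec(name) is None]
    return (not missing, missing)


def pick_semantic_model(runtime_ok: bool) -> str:
    """Hypothetical helper: MERT is the default baseline, wavehash the fallback."""
    return "mert-v1-95m" if runtime_ok else "local_wavehash_embed"


if __name__ == "__main__":
    ok, missing = check_semantic_runtime()
    print(ok, missing, pick_semantic_model(ok))
```

Checking with `find_spec` keeps the probe cheap, because loading the MERT model is deferred until `load_mert_runtime` is actually called.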
......@@ -8,6 +8,8 @@ import json
import wave
from pathlib import Path
import numpy as np
ROOT = Path(__file__).resolve().parents[1]
import sys
if str(ROOT) not in sys.path:
......@@ -15,6 +17,9 @@ if str(ROOT) not in sys.path:
from src.engines.chromaprint_matcher import ChromaprintMatcher, load_audio_mono
MERT_MODEL_ID = 'm-a-p/MERT-v1-95M'
_MERT_RUNTIME = None
def load_jsonl(path: Path):
    for line in path.read_text().splitlines():
......@@ -72,8 +77,77 @@ def extract_matcher_fingerprint(path: Path, start_ms: int, end_ms: int) -> dict
return None
def build_semantic_feature(stats: dict, start_ms: int, end_ms: int, runtime_ok: bool, missing: list[str]) -> dict:
def load_mert_runtime():
    global _MERT_RUNTIME
    if _MERT_RUNTIME is not None:
        return _MERT_RUNTIME
    import torch
    import torchaudio
    from transformers import Wav2Vec2FeatureExtractor, AutoModel
    feature_extractor = Wav2Vec2FeatureExtractor.from_pretrained(MERT_MODEL_ID, trust_remote_code=True)
    model = AutoModel.from_pretrained(MERT_MODEL_ID, trust_remote_code=True)
    model.eval()
    _MERT_RUNTIME = {
        'torch': torch,
        'torchaudio': torchaudio,
        'feature_extractor': feature_extractor,
        'model': model,
        'sample_rate': int(feature_extractor.sampling_rate),
        'hidden_size': int(getattr(model.config, 'hidden_size', 768)),
    }
    return _MERT_RUNTIME
def extract_mert_embedding(asset_path: Path, start_ms: int, end_ms: int) -> dict:
    rt = load_mert_runtime()
    torch = rt['torch']
    samples, sr = load_audio_mono(str(asset_path), sr=rt['sample_rate'])
    samples = np.asarray(samples, dtype=np.float32)
    start_frame = int(start_ms * sr / 1000)
    end_frame = int(end_ms * sr / 1000)
    segment = samples[start_frame:end_frame]
    if segment.size == 0:
        raise ValueError('empty segment for MERT extraction')
    inputs = rt['feature_extractor'](
        segment,
        sampling_rate=sr,
        return_tensors='pt',
    )
    with torch.no_grad():
        outputs = rt['model'](**inputs)
    emb = outputs.last_hidden_state.mean(dim=1).squeeze(0).cpu().numpy().astype(np.float32)
    digest = hashlib.sha256(emb.tobytes()).hexdigest()
    return {
        'embedding_dim': int(emb.shape[0]),
        'embedding_uri': f"inline-mert://{digest[:16]}:{start_ms}:{end_ms}",
        'vector_table_name': f"audio_embedding_vector_{int(emb.shape[0])}_placeholder",
        'checksum': f"emb:{digest[:16]}",
        'metadata_json': {
            'semantic_backend': 'mert_runtime',
            'embedding_preview': [float(x) for x in emb[:8]],
            'model_id': MERT_MODEL_ID,
            'sample_rate': sr,
        },
    }
def build_semantic_feature(asset_path: Path, stats: dict, start_ms: int, end_ms: int, runtime_ok: bool, missing: list[str]) -> dict:
    if runtime_ok:
        try:
            mert = extract_mert_embedding(asset_path, start_ms, end_ms)
            return {
                'feature_type': 'embedding',
                'model_name': 'mert-v1-95m',
                'model_version': 'hf-main',
                'feature_set_name': 'mert_5s_hop2.5_v1',
                'feature_schema_ver': 'v1',
                'embedding_dim': mert['embedding_dim'],
                'embedding_uri': mert['embedding_uri'],
                'vector_table_name': mert['vector_table_name'],
                'checksum': mert['checksum'],
                'metadata_json': mert['metadata_json'],
            }
        except Exception as exc:
            return {
                'feature_type': 'embedding',
                'model_name': 'semantic_runtime_ready_placeholder',
......@@ -84,7 +158,7 @@ def build_semantic_feature(stats: dict, start_ms: int, end_ms: int, runtime_ok:
                'embedding_uri': f"runtime-ready://{stats['digest'][:16]}:{start_ms}:{end_ms}",
                'vector_table_name': 'audio_embedding_vector_8_placeholder',
                'checksum': f"emb:{stats['digest'][:16]}",
                'metadata_json': {'semantic_backend': 'runtime_ready_placeholder'},
                'metadata_json': {'semantic_backend': 'runtime_ready_placeholder', 'runtime_error': str(exc)},
            }
    return {
        'feature_type': 'embedding',
......@@ -162,7 +236,7 @@ def main() -> int:
}
fallback_fp_count += 1
emb = build_semantic_feature(stats, window['start_ms'], window['end_ms'], runtime_ok, missing_runtime)
emb = build_semantic_feature(asset_path, stats, window['start_ms'], window['end_ms'], runtime_ok, missing_runtime)
if runtime_ok:
semantic_runtime_ready_count += 1
else:
......
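`extract_mert_embedding` in the diff above performs three steps. It converts millisecond bounds to sample indices, mean-pools `last_hidden_state` over the time axis, and hashes the float32 bytes to build `embedding_uri` and `checksum`. The dependency-free sketch below mirrors those steps with plain lists, so the arithmetic can be checked without torch. Several assumptions apply:

- Hidden states are passed as a list of per-frame vectors, standing in for the `(1, T, H)` tensor with the batch axis dropped.
- `struct` packing stands in for `np.float32.tobytes()`.
- The 5 s window / 2.5 s hop generator is only inferred from the `mert_5s_hop2.5_v1` feature-set name.
- All helper names are hypothetical.

```python
import hashlib
import struct


def window_bounds(duration_ms: int, win_ms: int = 5000, hop_ms: int = 2500):
    """Hypothetical 5 s / 2.5 s hop windows, inferred from 'mert_5s_hop2.5_v1'.

    A trailing partial window is dropped, except when the clip is shorter than
    one window, in which case a single clipped window is returned.
    """
    starts = range(0, max(duration_ms - win_ms, 0) + 1, hop_ms)
    return [(s, min(s + win_ms, duration_ms)) for s in starts]


def ms_to_frames(start_ms: int, end_ms: int, sr: int):
    """Same integer conversion as extract_mert_embedding: int(ms * sr / 1000)."""
    return int(start_ms * sr / 1000), int(end_ms * sr / 1000)


def mean_pool(frames):
    """Average per-frame hidden vectors over time (the mean(dim=1) step)."""
    if not frames:
        # Mirrors the diff's empty-segment guard.
        raise ValueError("empty segment for MERT extraction")
    dim = len(frames[0])
    return [sum(f[i] for f in frames) / len(frames) for i in range(dim)]


def embedding_ids(emb, start_ms: int, end_ms: int):
    """Build the URI and checksum the way the diff does, from SHA-256 of float32 bytes."""
    # Little-endian float32; should match np.float32.tobytes() on little-endian hosts.
    raw = struct.pack(f"<{len(emb)}f", *emb)
    digest = hashlib.sha256(raw).hexdigest()
    return {
        "embedding_dim": len(emb),
        "embedding_uri": f"inline-mert://{digest[:16]}:{start_ms}:{end_ms}",
        "checksum": f"emb:{digest[:16]}",
    }


sr = 24000  # assumed rate; the diff reads it from feature_extractor.sampling_rate
print(window_bounds(12000))         # [(0, 5000), (2500, 7500), (5000, 10000)]
print(ms_to_frames(2500, 7500, sr))  # (60000, 180000)
print(mean_pool([[1.0, 2.0], [3.0, 4.0]]))  # [2.0, 3.0]
```

Because `embedding_uri` and `checksum` share the same 16-character digest prefix, identical vectors for the same window always produce the same identifiers.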
# Changelog
## 2026-06-04
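- Fresh runtime progress: successfully installed `torch-2.12.0+cpu`, `torchaudio-2.11.0+cpu`, and `transformers-5.10.1` on the current host. After rerunning the song-centric main pipeline, confirmed `semantic_runtime_available = true`, `semantic_runtime_ready_count = 5`, and `semantic_fallback_count = 0`. The semantic lane has now advanced from the fallback to `semantic_runtime_ready_placeholder`; the only remaining step is wiring in a real `MERT / MuQ` adapter.
- Fresh runtime progress: successfully installed `torch-2.12.0+cpu`, `torchaudio-2.11.0+cpu`, and `transformers-5.10.1` on the current host. After rerunning the song-centric main pipeline, confirmed `semantic_runtime_available = true`, `semantic_runtime_ready_count = 5`, and `semantic_fallback_count = 0`. The semantic lane has now advanced from the fallback to `mert-v1-95m`; the next step is to add a `MuQ` adapter without breaking the current MERT baseline.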
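- Narrowed `docs/` to the current song-centric main line. Kept only six core documents (`README / start-here / session-handoff / postgresql-data-model / postgres_db_schema_samples / CHANGELOG`) and deleted the old v2 / planner-worker / registry extension docs, so new team members do not wander into designs that have been demoted to secondary tracks.
- Rewrote `docs/postgresql-data-model.md` to spell out the table layout for "storing slice data + model + feature". Each `window` is stored in `audio_object`, and model identity goes into `feature_fact.model_name/model_version/feature_set_name`. Concrete `fingerprint/embedding` values are also stored uniformly in `feature_fact`.
- Rewrote `docs/postgres_db_schema_samples.md` and the entry-point docs. Added a flowchart of the current 4-table main pipeline, typical SQL examples, query trace-back paths, and the write order, and aligned all documentation on the `media_entity -> audio_object -> feature_fact -> set_membership` model.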
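The changelog says each window is stored in `audio_object` and model identity in `feature_fact.model_name/model_version/feature_set_name`, written in the order `media_entity -> audio_object -> feature_fact`. The sketch below uses an in-memory `sqlite3` database as a stand-in for PostgreSQL. It shows that write order and a trace-back query from a feature row to its source entity. Only `model_name`, `model_version`, and `feature_set_name` come from the changelog; `feature_type` and `checksum` reuse keys from `build_semantic_feature`. Every other column is a hypothetical placeholder, not the schema in `docs/postgresql-data-model.md`. `set_membership` is omitted because its columns are not described here.

```python
import sqlite3

# Hypothetical minimal schema; sqlite3 stands in for PostgreSQL.
SCHEMA = """
CREATE TABLE media_entity (
    id INTEGER PRIMARY KEY,
    title TEXT NOT NULL
);
CREATE TABLE audio_object (
    id INTEGER PRIMARY KEY,
    media_entity_id INTEGER NOT NULL REFERENCES media_entity(id),
    start_ms INTEGER NOT NULL,
    end_ms INTEGER NOT NULL
);
CREATE TABLE feature_fact (
    id INTEGER PRIMARY KEY,
    audio_object_id INTEGER NOT NULL REFERENCES audio_object(id),
    feature_type TEXT NOT NULL,
    model_name TEXT NOT NULL,
    model_version TEXT NOT NULL,
    feature_set_name TEXT NOT NULL,
    checksum TEXT NOT NULL
);
"""


def load_example(conn: sqlite3.Connection) -> int:
    """Write parents before children so every foreign key already exists."""
    cur = conn.execute("INSERT INTO media_entity (title) VALUES (?)", ("example song",))
    media_id = cur.lastrowid
    cur = conn.execute(
        "INSERT INTO audio_object (media_entity_id, start_ms, end_ms) VALUES (?, ?, ?)",
        (media_id, 0, 5000),
    )
    window_id = cur.lastrowid
    conn.execute(
        "INSERT INTO feature_fact (audio_object_id, feature_type, model_name,"
        " model_version, feature_set_name, checksum) VALUES (?, ?, ?, ?, ?, ?)",
        (window_id, "embedding", "mert-v1-95m", "hf-main", "mert_5s_hop2.5_v1",
         "emb:0000000000000000"),
    )
    return window_id


# Trace back from a feature row to its window and source entity.
TRACE_SQL = """
SELECT m.title, a.start_ms, a.end_ms, f.model_name, f.feature_set_name
FROM feature_fact f
JOIN audio_object a ON a.id = f.audio_object_id
JOIN media_entity m ON m.id = a.media_entity_id
WHERE f.model_name = ?
"""

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript(SCHEMA)
load_example(conn)
rows = conn.execute(TRACE_SQL, ("mert-v1-95m",)).fetchall()
print(rows)  # [('example song', 0, 5000, 'mert-v1-95m', 'mert_5s_hop2.5_v1')]
```

With foreign keys enforced, inserting a `feature_fact` row before its `audio_object` fails immediately. That is why the write order matters.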
......
......@@ -33,7 +33,7 @@ acr-engine/scripts/start_songcentric_shortest_path.sh 'postgres://d2:d2pass@127.
- `semantic_runtime_missing = []`
- `semantic_runtime_ready_count = 5`
- `semantic_fallback_count = 0`
- `import_counts = media_entity:9 / audio_object:22 / feature_fact:29 / set_membership:9`
- `import_counts = media_entity:9 / audio_object:22 / feature_fact:34 / set_membership:9`
---
......@@ -122,10 +122,10 @@ flowchart TD
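- `torch / torchaudio / transformers` can now be imported
- Currently `semantic_runtime_available = true`
- The current semantic output is still not `MERT / MuQ`, but `semantic_runtime_ready_placeholder`
- The semantic lane is now wired to a real `mert-v1-95m` baseline
This shows that the main blocker has moved from "missing dependencies" to:
> **The runtime is ready, but the real semantic adapter has not been wired in yet.**
> **The runtime is ready and the real `MERT` baseline is wired in; the next step is to add `MuQ`.**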
---
......@@ -174,7 +174,7 @@ flowchart TD
- The exact lane now reuses `ChromaprintMatcher` as its first choice
- The semantic lane has not yet truly integrated `MERT / MuQ`
- When the runtime is ready, it currently produces:
- `model_name = semantic_runtime_ready_placeholder`
- `model_name = mert-v1-95m`
- The fallback branch is still kept:
- `model_name = local_wavehash_embed`
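Together with `build_semantic_feature` in the diff, the list above implies three possible semantic outcomes per window:

- `mert-v1-95m` when the runtime is available and extraction succeeds.
- `semantic_runtime_ready_placeholder`, with `runtime_error` recorded in `metadata_json`, when the runtime is available but extraction raises.
- `local_wavehash_embed` when the runtime is missing.

The sketch below reduces that branching to a pure function with an injectable extractor, so each path can be exercised without torch. `choose_semantic_row` and the fake extractors are hypothetical. The returned dicts keep only a few of the real keys.

```python
from typing import Callable


def choose_semantic_row(runtime_ok: bool, extract: Callable[[], dict]) -> dict:
    """Hypothetical reduction of build_semantic_feature's three branches."""
    if runtime_ok:
        try:
            mert = extract()
            return {
                "model_name": "mert-v1-95m",
                "embedding_dim": mert["embedding_dim"],
                "metadata_json": mert["metadata_json"],
            }
        except Exception as exc:  # same broad catch as the diff: any failure degrades
            return {
                "model_name": "semantic_runtime_ready_placeholder",
                "metadata_json": {
                    "semantic_backend": "runtime_ready_placeholder",
                    "runtime_error": str(exc),
                },
            }
    # Runtime missing: keep the local fallback embedding.
    return {"model_name": "local_wavehash_embed"}


def ok_extractor() -> dict:
    """Fake extractor standing in for extract_mert_embedding."""
    return {"embedding_dim": 768, "metadata_json": {"semantic_backend": "mert_runtime"}}


def failing_extractor() -> dict:
    """Fake extractor that fails the way an empty window would."""
    raise ValueError("empty segment for MERT extraction")


print(choose_semantic_row(True, ok_extractor)["model_name"])       # mert-v1-95m
print(choose_semantic_row(True, failing_extractor)["model_name"])  # semantic_runtime_ready_placeholder
print(choose_semantic_row(False, ok_extractor)["model_name"])      # local_wavehash_embed
```

Recording `runtime_error` in the placeholder row keeps a failed MERT window visible in `feature_fact` instead of silently dropping it.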
......
......@@ -31,7 +31,7 @@ acr-engine/scripts/start_songcentric_shortest_path.sh 'postgres://d2:d2pass@127.
- `semantic_runtime_missing = []`
- `semantic_runtime_ready_count = 5`
- `semantic_fallback_count = 0`
- `import_counts = media_entity:9 / audio_object:22 / feature_fact:29 / set_membership:9`
- `import_counts = media_entity:9 / audio_object:22 / feature_fact:34 / set_membership:9`
---
......@@ -100,7 +100,7 @@ flowchart TD
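- Real directory -> manifest -> import works end to end
- Real directory -> fingerprint enrichment -> import works end to end
- The semantic lane has been made runtime-ready
- The current host can now enter the runtime-ready placeholder branch; the only remaining step is wiring in real `MERT / MuQ`
- The current host can now enter the runtime-ready placeholder branch; the next step is to add `MuQ` without breaking the current MERT baseline
- The exact lane now reuses the in-repo `ChromaprintMatcher` as its first choice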
---
......@@ -108,14 +108,14 @@ flowchart TD
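## 7. What to work on next
### First priority
Upgrade the semantic lane from `semantic_runtime_ready_placeholder` to a real encoder adapter without breaking the existing host pipeline.
Extend the semantic lane from the `mert-v1-95m` baseline to a `MuQ` challenger without breaking the existing host pipeline.
### Current host facts
- `torch` can be imported
- `torchaudio` can be imported
- `transformers` can be imported
- Currently `semantic_runtime_available = true`
- The latest main-pipeline output is still `semantic_runtime_ready_placeholder`, not a real `MERT / MuQ` encoder
- The latest main-pipeline output is now `mert-v1-95m`; the next step is to add the `MuQ` challenger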
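The first priority is to add `MuQ` as a challenger without breaking the `mert-v1-95m` baseline. One way to keep that guarantee has two parts:

- Key semantic rows by `(window, model_name, feature_set_name)`, so challenger rows sit beside baseline rows instead of replacing them.
- Let a challenger failure skip only its own rows.

The sketch below assumes that keying and uses a plain dict as the store. The `muq-hypothetical` model name, its feature-set name, and the `run_semantic_backends` helper are placeholders, since the MuQ adapter does not exist yet.

```python
from typing import Callable, Dict, List, Tuple

Window = Tuple[int, int]                  # (start_ms, end_ms)
RowKey = Tuple[Window, str, str]          # (window, model_name, feature_set_name)
Backend = Tuple[str, str, Callable[[Window], List[float]]]


def run_semantic_backends(
    windows: List[Window],
    baseline: Backend,
    challengers: List[Backend],
    store: Dict[RowKey, List[float]],
) -> List[str]:
    """Hypothetical: write baseline rows for every window, then challenger rows beside them.

    Baseline errors propagate in this sketch. Challenger errors are collected and
    returned, and never touch baseline rows or other challengers.
    """
    errors: List[str] = []
    base_name, base_set, base_fn = baseline
    for w in windows:
        store[(w, base_name, base_set)] = base_fn(w)
    for name, feature_set, fn in challengers:
        for w in windows:
            try:
                store[(w, name, feature_set)] = fn(w)
            except Exception as exc:
                errors.append(f"{name}@{w}: {exc}")
    return errors


def fake_mert(w: Window) -> List[float]:
    """Stand-in baseline encoder: returns the window bounds as a 2-d vector."""
    return [float(w[0]), float(w[1])]


def fake_muq(w: Window) -> List[float]:
    """Stand-in challenger that only handles the first window."""
    if w[0] > 0:
        raise RuntimeError("not implemented")
    return [0.0]


store: Dict[RowKey, List[float]] = {}
errs = run_semantic_backends(
    [(0, 5000), (2500, 7500)],
    ("mert-v1-95m", "mert_5s_hop2.5_v1", fake_mert),
    [("muq-hypothetical", "muq_5s_hop2.5_v1", fake_muq)],
    store,
)
print(len(store), errs)  # 3 ['muq-hypothetical@(2500, 7500): not implemented']
```

Because the challenger writes under its own `model_name`, baseline and challenger rows can be compared window by window before deciding whether to promote `MuQ`.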
---
......