Make semantic feature enrichment runtime-aware on the song-centric path
Constraint: Keep the current real-directory import path executable on this host while making semantic-lane readiness explicit instead of pretending the heavyweight runtime exists.
Rejected: Hardwire semantic enrichment to the local fallback without reporting missing runtime state | It hides the true blocker and weakens the upgrade path to real semantic models.
Confidence: high
Scope-risk: narrow
Directive: On this host, treat local_wavehash_embed as a fallback semantic backend and persist missing runtime evidence until torch/torchaudio/transformers are installed.
Tested: /usr/local/miniconda3/bin/python acr-engine/scripts/enrich_songcentric_manifest_with_local_features.py on the real wav smoke manifest; imported the v3 enriched manifest twice into postgres://d2:d2pass@127.0.0.1:5432/d2 schema acr_songcentric_test and verified counts stayed media_entity=9, audio_object=22, feature_fact=24, set_membership=9; report shows semantic_runtime_available=false and missing=[torch, torchaudio, transformers]; git diff --check; /usr/local/miniconda3/bin/python scripts/check_markdown_links.py --root docs returned OK for 11 active markdown files
Not-tested: real MERT/MuQ extraction on this host
Showing 7 changed files with 232 additions and 12 deletions
acr-engine/data/pgvector_eval/music20/songcentric_directory_manifest_with_features_v3.jsonl
0 → 100644
| 1 | {"song": {"biz_key": "song_alpha", "title": "song alpha", "artist_name": "artist a"}, "asset": {"source_type": "official", "storage_uri": "/workspace/acr-engine/data/songcentric_builder_smoke/song_alpha/artist_a/clip1.wav", "storage_scheme": "file", "checksum": "path:/workspace/acr-engine/data/songcentric_builder_smoke/song_alpha/artist_a/clip1.wav", "codec": "wav", "sample_rate": 16000, "channels": 1, "duration_ms": 8000}, "windows": [{"start_ms": 0, "end_ms": 5000, "features": [{"feature_type": "fingerprint", "model_name": "chromaprint_matcher", "model_version": "phase1_local", "feature_set_name": "chromaprint_matcher_5s", "fingerprint_value": "dc0c731425f360787f462da693ff4a50", "checksum": "chromaprint:dc0c731425f36078", "metadata_json": {"hash_count": 2643, "hash_sample": [[1842187, 11], [1842188, 11], [1842189, 11], [1842201, 11], [1842212, 11], [1842213, 11], [1842214, 11], [1842438, 11]]}}, {"feature_type": "embedding", "model_name": "local_wavehash_embed", "model_version": "v1", "feature_set_name": "wavehash_embed_5s", "feature_schema_ver": "v1", "embedding_dim": 8, "embedding_uri": "inline://593c7a661cc87444:0:5000", "vector_table_name": "audio_embedding_vector_8_placeholder", "checksum": "emb:593c7a661cc87444", "metadata_json": {"energy": 30555200, "rate": 16000, "channels": 1, "semantic_backend": "local_fallback", "runtime_missing": ["torch", "torchaudio", "transformers"]}}]}, {"start_ms": 2500, "end_ms": 7500, "features": [{"feature_type": "fingerprint", "model_name": "chromaprint_matcher", "model_version": "phase1_local", "feature_set_name": "chromaprint_matcher_5s", "fingerprint_value": "dc0c731425f360787f462da693ff4a50", "checksum": "chromaprint:dc0c731425f36078", "metadata_json": {"hash_count": 2643, "hash_sample": [[1842187, 11], [1842188, 11], [1842189, 11], [1842201, 11], [1842212, 11], [1842213, 11], [1842214, 11], [1842438, 11]]}}, {"feature_type": "embedding", "model_name": "local_wavehash_embed", "model_version": "v1", 
"feature_set_name": "wavehash_embed_5s", "feature_schema_ver": "v1", "embedding_dim": 8, "embedding_uri": "inline://593c7a661cc87444:2500:7500", "vector_table_name": "audio_embedding_vector_8_placeholder", "checksum": "emb:593c7a661cc87444", "metadata_json": {"energy": 30555200, "rate": 16000, "channels": 1, "semantic_backend": "local_fallback", "runtime_missing": ["torch", "torchaudio", "transformers"]}}]}, {"start_ms": 3000, "end_ms": 8000, "features": [{"feature_type": "fingerprint", "model_name": "chromaprint_matcher", "model_version": "phase1_local", "feature_set_name": "chromaprint_matcher_5s", "fingerprint_value": "dc0c731425f360787f462da693ff4a50", "checksum": "chromaprint:dc0c731425f36078", "metadata_json": {"hash_count": 2643, "hash_sample": [[1842187, 11], [1842188, 11], [1842189, 11], [1842201, 11], [1842212, 11], [1842213, 11], [1842214, 11], [1842438, 11]]}}, {"feature_type": "embedding", "model_name": "local_wavehash_embed", "model_version": "v1", "feature_set_name": "wavehash_embed_5s", "feature_schema_ver": "v1", "embedding_dim": 8, "embedding_uri": "inline://593c7a661cc87444:3000:8000", "vector_table_name": "audio_embedding_vector_8_placeholder", "checksum": "emb:593c7a661cc87444", "metadata_json": {"energy": 30555200, "rate": 16000, "channels": 1, "semantic_backend": "local_fallback", "runtime_missing": ["torch", "torchaudio", "transformers"]}}]}], "memberships": [{"set_type": "reference_set", "set_name": "phase1_hot_reference_v1", "member_type": "asset", "priority": 100}]} | ||
| 2 | {"song": {"biz_key": "song_beta", "title": "song beta", "artist_name": "artist b"}, "asset": {"source_type": "official", "storage_uri": "/workspace/acr-engine/data/songcentric_builder_smoke/song_beta/artist_b/clip2.wav", "storage_scheme": "file", "checksum": "path:/workspace/acr-engine/data/songcentric_builder_smoke/song_beta/artist_b/clip2.wav", "codec": "wav", "sample_rate": 16000, "channels": 1, "duration_ms": 6000}, "windows": [{"start_ms": 0, "end_ms": 5000, "features": [{"feature_type": "fingerprint", "model_name": "chromaprint_matcher", "model_version": "phase1_local", "feature_set_name": "chromaprint_matcher_5s", "fingerprint_value": "d8fc2442b4ec3ce5ae180c5845cffccb", "checksum": "chromaprint:d8fc2442b4ec3ce5", "metadata_json": {"hash_count": 2202, "hash_sample": [[2763289, 23], [2763524, 23], [2763541, 23], [2763549, 23], [2763566, 23], [2763801, 23], [2764050, 23], [2764075, 23]]}}, {"feature_type": "embedding", "model_name": "local_wavehash_embed", "model_version": "v1", "feature_set_name": "wavehash_embed_5s", "feature_schema_ver": "v1", "embedding_dim": 8, "embedding_uri": "inline://4ed2ccfa55b10b88:0:5000", "vector_table_name": "audio_embedding_vector_8_placeholder", "checksum": "emb:4ed2ccfa55b10b88", "metadata_json": {"energy": 30555680, "rate": 16000, "channels": 1, "semantic_backend": "local_fallback", "runtime_missing": ["torch", "torchaudio", "transformers"]}}]}, {"start_ms": 1000, "end_ms": 6000, "features": [{"feature_type": "fingerprint", "model_name": "chromaprint_matcher", "model_version": "phase1_local", "feature_set_name": "chromaprint_matcher_5s", "fingerprint_value": "d8fc2442b4ec3ce5ae180c5845cffccb", "checksum": "chromaprint:d8fc2442b4ec3ce5", "metadata_json": {"hash_count": 2202, "hash_sample": [[2763289, 23], [2763524, 23], [2763541, 23], [2763549, 23], [2763566, 23], [2763801, 23], [2764050, 23], [2764075, 23]]}}, {"feature_type": "embedding", "model_name": "local_wavehash_embed", "model_version": "v1", "feature_set_name": 
"wavehash_embed_5s", "feature_schema_ver": "v1", "embedding_dim": 8, "embedding_uri": "inline://4ed2ccfa55b10b88:1000:6000", "vector_table_name": "audio_embedding_vector_8_placeholder", "checksum": "emb:4ed2ccfa55b10b88", "metadata_json": {"energy": 30555680, "rate": 16000, "channels": 1, "semantic_backend": "local_fallback", "runtime_missing": ["torch", "torchaudio", "transformers"]}}]}], "memberships": [{"set_type": "reference_set", "set_name": "phase1_hot_reference_v1", "member_type": "asset", "priority": 100}]} |
| 1 | { | ||
| 2 | "schema": "acr_songcentric_test", | ||
| 3 | "manifest": "acr-engine/data/pgvector_eval/music20/songcentric_directory_manifest_with_features_v3.jsonl", | ||
| 4 | "imported": [ | ||
| 5 | { | ||
| 6 | "song_id": 8, | ||
| 7 | "asset_id": 16, | ||
| 8 | "window_ids": [ | ||
| 9 | 17, | ||
| 10 | 18, | ||
| 11 | 19 | ||
| 12 | ], | ||
| 13 | "feature_ids": [ | ||
| 14 | 20, | ||
| 15 | 11, | ||
| 16 | 21, | ||
| 17 | 13, | ||
| 18 | 22, | ||
| 19 | 15 | ||
| 20 | ], | ||
| 21 | "membership_ids": [ | ||
| 22 | 8 | ||
| 23 | ] | ||
| 24 | }, | ||
| 25 | { | ||
| 26 | "song_id": 9, | ||
| 27 | "asset_id": 20, | ||
| 28 | "window_ids": [ | ||
| 29 | 21, | ||
| 30 | 22 | ||
| 31 | ], | ||
| 32 | "feature_ids": [ | ||
| 33 | 23, | ||
| 34 | 17, | ||
| 35 | 24, | ||
| 36 | 19 | ||
| 37 | ], | ||
| 38 | "membership_ids": [ | ||
| 39 | 9 | ||
| 40 | ] | ||
| 41 | } | ||
| 42 | ], | ||
| 43 | "counts": { | ||
| 44 | "media_entity": 9, | ||
| 45 | "audio_object": 22, | ||
| 46 | "feature_fact": 24, | ||
| 47 | "set_membership": 9 | ||
| 48 | }, | ||
| 49 | "window_lineage_sample": { | ||
| 50 | "window_id": 22, | ||
| 51 | "asset_id": 20, | ||
| 52 | "song_id": 9, | ||
| 53 | "title": "song beta", | ||
| 54 | "start_ms": 1000, | ||
| 55 | "end_ms": 6000 | ||
| 56 | }, | ||
| 57 | "feature_lineage_sample": { | ||
| 58 | "feature_type": "fingerprint", | ||
| 59 | "model_name": "chromaprint_matcher", | ||
| 60 | "model_version": "phase1_local", | ||
| 61 | "feature_set_name": "chromaprint_matcher_5s", | ||
| 62 | "window_id": 22, | ||
| 63 | "song_id": 9, | ||
| 64 | "title": "song beta" | ||
| 65 | } | ||
| 66 | } | ||
| ... | \ No newline at end of file | ... | \ No newline at end of file |
| 1 | { | ||
| 2 | "schema": "acr_songcentric_test", | ||
| 3 | "manifest": "acr-engine/data/pgvector_eval/music20/songcentric_directory_manifest_with_features_v3.jsonl", | ||
| 4 | "imported": [ | ||
| 5 | { | ||
| 6 | "song_id": 8, | ||
| 7 | "asset_id": 16, | ||
| 8 | "window_ids": [ | ||
| 9 | 17, | ||
| 10 | 18, | ||
| 11 | 19 | ||
| 12 | ], | ||
| 13 | "feature_ids": [ | ||
| 14 | 20, | ||
| 15 | 11, | ||
| 16 | 21, | ||
| 17 | 13, | ||
| 18 | 22, | ||
| 19 | 15 | ||
| 20 | ], | ||
| 21 | "membership_ids": [ | ||
| 22 | 8 | ||
| 23 | ] | ||
| 24 | }, | ||
| 25 | { | ||
| 26 | "song_id": 9, | ||
| 27 | "asset_id": 20, | ||
| 28 | "window_ids": [ | ||
| 29 | 21, | ||
| 30 | 22 | ||
| 31 | ], | ||
| 32 | "feature_ids": [ | ||
| 33 | 23, | ||
| 34 | 17, | ||
| 35 | 24, | ||
| 36 | 19 | ||
| 37 | ], | ||
| 38 | "membership_ids": [ | ||
| 39 | 9 | ||
| 40 | ] | ||
| 41 | } | ||
| 42 | ], | ||
| 43 | "counts": { | ||
| 44 | "media_entity": 9, | ||
| 45 | "audio_object": 22, | ||
| 46 | "feature_fact": 24, | ||
| 47 | "set_membership": 9 | ||
| 48 | }, | ||
| 49 | "window_lineage_sample": { | ||
| 50 | "window_id": 22, | ||
| 51 | "asset_id": 20, | ||
| 52 | "song_id": 9, | ||
| 53 | "title": "song beta", | ||
| 54 | "start_ms": 1000, | ||
| 55 | "end_ms": 6000 | ||
| 56 | }, | ||
| 57 | "feature_lineage_sample": { | ||
| 58 | "feature_type": "fingerprint", | ||
| 59 | "model_name": "chromaprint_matcher", | ||
| 60 | "model_version": "phase1_local", | ||
| 61 | "feature_set_name": "chromaprint_matcher_5s", | ||
| 62 | "window_id": 22, | ||
| 63 | "song_id": 9, | ||
| 64 | "title": "song beta" | ||
| 65 | } | ||
| 66 | } | ||
| ... | \ No newline at end of file | ... | \ No newline at end of file |
acr-engine/data/pgvector_eval/music20/songcentric_directory_manifest_with_features_v3_report.json
0 → 100644
| 1 | { | ||
| 2 | "input_manifest": "/workspace/acr-engine/data/pgvector_eval/music20/songcentric_directory_manifest.jsonl", | ||
| 3 | "output_manifest": "/workspace/acr-engine/data/pgvector_eval/music20/songcentric_directory_manifest_with_features_v3.jsonl", | ||
| 4 | "rows": 2, | ||
| 5 | "wav_windows_seen": 5, | ||
| 6 | "features_added": 10, | ||
| 7 | "matcher_fingerprint_count": 5, | ||
| 8 | "fallback_fingerprint_count": 0, | ||
| 9 | "semantic_runtime_available": false, | ||
| 10 | "semantic_runtime_missing": [ | ||
| 11 | "torch", | ||
| 12 | "torchaudio", | ||
| 13 | "transformers" | ||
| 14 | ], | ||
| 15 | "semantic_runtime_ready_count": 0, | ||
| 16 | "semantic_fallback_count": 5 | ||
| 17 | } | ||
| ... | \ No newline at end of file | ... | \ No newline at end of file |
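The counters in this report are tied together by simple arithmetic, which makes the report easy to sanity-check after a run. The following is a minimal sketch, assuming the report keys shown above and assuming every seen window yields one fingerprint and one embedding (as in this run). `check_report` is a hypothetical helper for illustration and is not part of the commit.

```python
from __future__ import annotations

import json

# Report values copied from the diff above (the two path fields are omitted).
REPORT = json.loads("""
{
  "rows": 2,
  "wav_windows_seen": 5,
  "features_added": 10,
  "matcher_fingerprint_count": 5,
  "fallback_fingerprint_count": 0,
  "semantic_runtime_available": false,
  "semantic_runtime_missing": ["torch", "torchaudio", "transformers"],
  "semantic_runtime_ready_count": 0,
  "semantic_fallback_count": 5
}
""")


def check_report(report: dict) -> list[str]:
    """Return human-readable inconsistencies; an empty list means the report adds up."""
    problems = []
    windows = report["wav_windows_seen"]

    # Assumption: one fingerprint plus one embedding per window.
    if report["features_added"] != 2 * windows:
        problems.append("features_added != 2 * wav_windows_seen")

    # Each window's fingerprint comes from either the matcher or the fallback.
    fp_total = report["matcher_fingerprint_count"] + report["fallback_fingerprint_count"]
    if fp_total != windows:
        problems.append("fingerprint counts do not sum to wav_windows_seen")

    # Each window's embedding is either runtime-ready or local fallback.
    sem_total = report["semantic_runtime_ready_count"] + report["semantic_fallback_count"]
    if sem_total != windows:
        problems.append("semantic counts do not sum to wav_windows_seen")

    # The availability flag must agree with the missing-module list.
    if report["semantic_runtime_available"] != (len(report["semantic_runtime_missing"]) == 0):
        problems.append("semantic_runtime_available disagrees with semantic_runtime_missing")

    return problems


if __name__ == "__main__":
    print(check_report(REPORT))
```

A check like this could run right after the enrichment step, so that a report whose counters disagree is caught before the manifest is imported.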
| ... | @@ -3,6 +3,7 @@ from __future__ import annotations | ... | @@ -3,6 +3,7 @@ from __future__ import annotations |
| 3 | 3 | ||
| 4 | import argparse | 4 | import argparse |
| 5 | import hashlib | 5 | import hashlib |
| 6 | import importlib | ||
| 6 | import json | 7 | import json |
| 7 | import wave | 8 | import wave |
| 8 | from pathlib import Path | 9 | from pathlib import Path |
| ... | @@ -22,6 +23,20 @@ def load_jsonl(path: Path): | ... | @@ -22,6 +23,20 @@ def load_jsonl(path: Path): |
| 22 | yield json.loads(line) | 23 | yield json.loads(line) |
| 23 | 24 | ||
| 24 | 25 | ||
| 26 | def module_available(name: str) -> bool: | ||
| 27 | try: | ||
| 28 | importlib.import_module(name) | ||
| 29 | return True | ||
| 30 | except Exception: | ||
| 31 | return False | ||
| 32 | |||
| 33 | |||
| 34 | def semantic_runtime_available() -> tuple[bool, list[str]]: | ||
| 35 | required = ['torch', 'torchaudio', 'transformers'] | ||
| 36 | missing = [m for m in required if not module_available(m)] | ||
| 37 | return (len(missing) == 0, missing) | ||
| 38 | |||
| 39 | |||
| 25 | def read_wav_stats(path: Path, start_ms: int, end_ms: int) -> dict: | 40 | def read_wav_stats(path: Path, start_ms: int, end_ms: int) -> dict: |
| 26 | with wave.open(str(path), 'rb') as wf: | 41 | with wave.open(str(path), 'rb') as wf: |
| 27 | rate = wf.getframerate() | 42 | rate = wf.getframerate() |
| ... | @@ -57,6 +72,40 @@ def extract_matcher_fingerprint(path: Path, start_ms: int, end_ms: int) -> dict | ... | @@ -57,6 +72,40 @@ def extract_matcher_fingerprint(path: Path, start_ms: int, end_ms: int) -> dict |
| 57 | return None | 72 | return None |
| 58 | 73 | ||
| 59 | 74 | ||
| 75 | def build_semantic_feature(stats: dict, start_ms: int, end_ms: int, runtime_ok: bool, missing: list[str]) -> dict: | ||
| 76 | if runtime_ok: | ||
| 77 | return { | ||
| 78 | 'feature_type': 'embedding', | ||
| 79 | 'model_name': 'semantic_runtime_ready_placeholder', | ||
| 80 | 'model_version': 'awaiting_real_adapter', | ||
| 81 | 'feature_set_name': 'semantic_runtime_ready_5s', | ||
| 82 | 'feature_schema_ver': 'v1', | ||
| 83 | 'embedding_dim': 8, | ||
| 84 | 'embedding_uri': f"runtime-ready://{stats['digest'][:16]}:{start_ms}:{end_ms}", | ||
| 85 | 'vector_table_name': 'audio_embedding_vector_8_placeholder', | ||
| 86 | 'checksum': f"emb:{stats['digest'][:16]}", | ||
| 87 | 'metadata_json': {'semantic_backend': 'runtime_ready_placeholder'}, | ||
| 88 | } | ||
| 89 | return { | ||
| 90 | 'feature_type': 'embedding', | ||
| 91 | 'model_name': 'local_wavehash_embed', | ||
| 92 | 'model_version': 'v1', | ||
| 93 | 'feature_set_name': 'wavehash_embed_5s', | ||
| 94 | 'feature_schema_ver': 'v1', | ||
| 95 | 'embedding_dim': 8, | ||
| 96 | 'embedding_uri': f"inline://{stats['digest'][:16]}:{start_ms}:{end_ms}", | ||
| 97 | 'vector_table_name': 'audio_embedding_vector_8_placeholder', | ||
| 98 | 'checksum': f"emb:{stats['digest'][:16]}", | ||
| 99 | 'metadata_json': { | ||
| 100 | 'energy': stats['energy'], | ||
| 101 | 'rate': stats['rate'], | ||
| 102 | 'channels': stats['channels'], | ||
| 103 | 'semantic_backend': 'local_fallback', | ||
| 104 | 'runtime_missing': missing, | ||
| 105 | }, | ||
| 106 | } | ||
| 107 | |||
| 108 | |||
| 60 | def main() -> int: | 109 | def main() -> int: |
| 61 | parser = argparse.ArgumentParser() | 110 | parser = argparse.ArgumentParser() |
| 62 | parser.add_argument('--input-manifest', required=True) | 111 | parser.add_argument('--input-manifest', required=True) |
| ... | @@ -71,11 +120,16 @@ def main() -> int: | ... | @@ -71,11 +120,16 @@ def main() -> int: |
| 71 | if report_path: | 120 | if report_path: |
| 72 | report_path.parent.mkdir(parents=True, exist_ok=True) | 121 | report_path.parent.mkdir(parents=True, exist_ok=True) |
| 73 | 122 | ||
| 123 | runtime_ok, missing_runtime = semantic_runtime_available() | ||
| 124 | |||
| 74 | rows = [] | 125 | rows = [] |
| 75 | feature_count = 0 | 126 | feature_count = 0 |
| 76 | wav_windows_seen = 0 | 127 | wav_windows_seen = 0 |
| 77 | matcher_fp_count = 0 | 128 | matcher_fp_count = 0 |
| 78 | fallback_fp_count = 0 | 129 | fallback_fp_count = 0 |
| 130 | semantic_runtime_ready_count = 0 | ||
| 131 | semantic_fallback_count = 0 | ||
| 132 | |||
| 79 | for row in load_jsonl(in_path): | 133 | for row in load_jsonl(in_path): |
| 80 | asset = row['asset'] | 134 | asset = row['asset'] |
| 81 | asset_path = Path(asset['storage_uri']) | 135 | asset_path = Path(asset['storage_uri']) |
| ... | @@ -107,18 +161,13 @@ def main() -> int: | ... | @@ -107,18 +161,13 @@ def main() -> int: |
| 107 | 'metadata_json': {'energy': stats['energy'], 'bytes_read': stats['bytes_read']}, | 161 | 'metadata_json': {'energy': stats['energy'], 'bytes_read': stats['bytes_read']}, |
| 108 | } | 162 | } |
| 109 | fallback_fp_count += 1 | 163 | fallback_fp_count += 1 |
| 110 | emb = { | 164 | |
| 111 | 'feature_type': 'embedding', | 165 | emb = build_semantic_feature(stats, window['start_ms'], window['end_ms'], runtime_ok, missing_runtime) |
| 112 | 'model_name': 'local_wavehash_embed', | 166 | if runtime_ok: |
| 113 | 'model_version': 'v1', | 167 | semantic_runtime_ready_count += 1 |
| 114 | 'feature_set_name': 'wavehash_embed_5s', | 168 | else: |
| 115 | 'feature_schema_ver': 'v1', | 169 | semantic_fallback_count += 1 |
| 116 | 'embedding_dim': 8, | 170 | |
| 117 | 'embedding_uri': f"inline://{stats['digest'][:16]}:{window['start_ms']}:{window['end_ms']}", | ||
| 118 | 'vector_table_name': 'audio_embedding_vector_8_placeholder', | ||
| 119 | 'checksum': f"emb:{stats['digest'][:16]}", | ||
| 120 | 'metadata_json': {'energy': stats['energy'], 'rate': stats['rate'], 'channels': stats['channels']}, | ||
| 121 | } | ||
| 122 | features.extend([fp, emb]) | 171 | features.extend([fp, emb]) |
| 123 | feature_count += 2 | 172 | feature_count += 2 |
| 124 | rows.append(row) | 173 | rows.append(row) |
| ... | @@ -132,6 +181,10 @@ def main() -> int: | ... | @@ -132,6 +181,10 @@ def main() -> int: |
| 132 | 'features_added': feature_count, | 181 | 'features_added': feature_count, |
| 133 | 'matcher_fingerprint_count': matcher_fp_count, | 182 | 'matcher_fingerprint_count': matcher_fp_count, |
| 134 | 'fallback_fingerprint_count': fallback_fp_count, | 183 | 'fallback_fingerprint_count': fallback_fp_count, |
| 184 | 'semantic_runtime_available': runtime_ok, | ||
| 185 | 'semantic_runtime_missing': missing_runtime, | ||
| 186 | 'semantic_runtime_ready_count': semantic_runtime_ready_count, | ||
| 187 | 'semantic_fallback_count': semantic_fallback_count, | ||
| 135 | } | 188 | } |
| 136 | if report_path: | 189 | if report_path: |
| 137 | report_path.write_text(json.dumps(report, ensure_ascii=False, indent=2)) | 190 | report_path.write_text(json.dumps(report, ensure_ascii=False, indent=2)) | ... | ... |
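The `module_available` helper in this diff detects the runtime by actually importing each module, which for a package like `torch` can be slow. One possible alternative, shown here only as a sketch and not as what the commit does, is to look the module up with `importlib.util.find_spec`, which locates a top-level module without executing it. The names `module_installed`, `probe_semantic_runtime`, and `REQUIRED_MODULES` are hypothetical.

```python
from __future__ import annotations

import importlib.util

# Same three modules the commit treats as the semantic runtime.
REQUIRED_MODULES = ("torch", "torchaudio", "transformers")


def module_installed(name: str) -> bool:
    """True if a top-level module can be located, without running its code."""
    try:
        return importlib.util.find_spec(name) is not None
    except (ImportError, ValueError):
        # find_spec can raise for dotted names with a missing parent,
        # or for modules whose __spec__ is unset.
        return False


def probe_semantic_runtime(required=REQUIRED_MODULES) -> tuple[bool, list[str]]:
    """Return (all_present, missing_names), mirroring the shape used in the diff."""
    missing = [m for m in required if not module_installed(m)]
    return (not missing, missing)


if __name__ == "__main__":
    ok, missing = probe_semantic_runtime()
    print({"semantic_runtime_available": ok, "semantic_runtime_missing": missing})
```

The trade-off is that a locatable module is not guaranteed to import cleanly (for example, if a native library fails to load), so the import-based check in the diff is the stricter of the two.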
| 1 | ## 2026-06-04 | 1 | ## 2026-06-04 |
| 2 | 2 | ||
| 3 | - Upgraded `enrich_songcentric_manifest_with_local_features.py` to runtime-aware semantic adapter selection: because `torch/torchaudio/transformers` are missing on the current host, the semantic lane explicitly writes the `local_wavehash_embed` fallback and records the missing dependencies in the report and metadata. | |
| 4 | | |
| 3 | - Upgraded `enrich_songcentric_manifest_with_local_features.py`: fingerprints in the directory chain now prefer the in-repo `ChromaprintMatcher`; verified on live PostgreSQL that all 5 wav windows hit the matcher path, with `fallback_fingerprint_count=0`. | 5 | - Upgraded `enrich_songcentric_manifest_with_local_features.py`: fingerprints in the directory chain now prefer the in-repo `ChromaprintMatcher`; verified on live PostgreSQL that all 5 wav windows hit the matcher path, with `fallback_fingerprint_count=0`. |
| 4 | 6 | |
| 5 | - Added `acr-engine/scripts/enrich_songcentric_manifest_with_local_features.py`, which automatically adds local deterministic fingerprints/embeddings to a manifest generated from a real wav directory and then imports them into `feature_fact`; verified on live PostgreSQL that the `audio files -> manifest -> features -> import` loop closes and is idempotent. | 7 | - Added `acr-engine/scripts/enrich_songcentric_manifest_with_local_features.py`, which automatically adds local deterministic fingerprints/embeddings to a manifest generated from a real wav directory and then imports them into `feature_fact`; verified on live PostgreSQL that the `audio files -> manifest -> features -> import` loop closes and is idempotent. | ... | ... |
| ... | @@ -312,6 +312,20 @@ flowchart TD | ... | @@ -312,6 +312,20 @@ flowchart TD |
| 312 | 312 | ||
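| 313 | This shows that the exact lane in the current directory chain is no longer just a temporary hash; it now preferentially uses the repository's existing fingerprint extraction capability. | 313 | This shows that the exact lane in the current directory chain is no longer just a temporary hash; it now preferentially uses the repository's existing fingerprint extraction capability. |
| 314 | 314 | |
| 315 | | |
| 316 | ### 4.9 Runtime selection for the semantic lane in the directory chain | |
| 317 | | |
| 318 | `enrich_songcentric_manifest_with_local_features.py` currently uses **runtime-aware** selection for the semantic lane: | |
| 319 | - If `torch / torchaudio / transformers` are available, an entry point for a real semantic adapter is reserved | |
| 320 | - If they are not available, the lane explicitly falls back to `local_wavehash_embed` and writes the missing dependencies into the metadata/report | |
| 321 | | |
| 322 | Fresh evidence from this round: | |
| 323 | - `semantic_runtime_available = false` | |
| 324 | - `semantic_runtime_missing = ["torch", "torchaudio", "transformers"]` | |
| 325 | - `semantic_fallback_count = 5` | |
| 326 | | |
| 327 | This shows that the semantic lane on the current host is not yet connected to a real model, but the chain already has explicit runtime branching and auditable evidence. | |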
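Because the backend choice is written into each embedding's `metadata_json`, the same evidence can be recovered from the manifest itself rather than only from the report. Below is a minimal sketch, assuming rows shaped like the v3 manifest in this commit; `tally_semantic_backends` and the abbreviated `SAMPLE` row are hypothetical and exist only for illustration.

```python
from __future__ import annotations

import json
from collections import Counter


def tally_semantic_backends(jsonl_lines) -> Counter:
    """Count embedding features per `semantic_backend` across manifest rows."""
    counts: Counter = Counter()
    for line in jsonl_lines:
        line = line.strip()
        if not line:
            continue
        row = json.loads(line)
        for window in row.get("windows", []):
            for feature in window.get("features", []):
                if feature.get("feature_type") != "embedding":
                    continue  # fingerprints carry no semantic backend
                meta = feature.get("metadata_json") or {}
                counts[meta.get("semantic_backend", "unknown")] += 1
    return counts


# One abbreviated row with two windows, mimicking the v3 manifest layout.
_FALLBACK_EMB = {
    "feature_type": "embedding",
    "model_name": "local_wavehash_embed",
    "metadata_json": {
        "semantic_backend": "local_fallback",
        "runtime_missing": ["torch", "torchaudio", "transformers"],
    },
}
_FINGERPRINT = {"feature_type": "fingerprint", "model_name": "chromaprint_matcher"}
SAMPLE = [
    json.dumps({
        "song": {"biz_key": "song_beta"},
        "windows": [
            {"start_ms": 0, "end_ms": 5000, "features": [_FINGERPRINT, _FALLBACK_EMB]},
            {"start_ms": 1000, "end_ms": 6000, "features": [_FINGERPRINT, _FALLBACK_EMB]},
        ],
    })
]

if __name__ == "__main__":
    print(dict(tally_semantic_backends(SAMPLE)))
```

Run against the real v3 manifest, a tally like this would be expected to agree with `semantic_fallback_count` in the report, though that has not been verified here.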
| 328 | |||
| 315 | --- | 329 | --- |
| 316 | 330 | ||
| 317 | ## 5. Most common SQL examples | 331 | ## 5. Most common SQL examples | ... | ... |