Add voice chunking and match-context foundations for ACR service
Constraint: keep humming/recording query support lightweight and compatible with the existing FAISS-first local workflow while production retrieval remains pgvector-oriented
Rejected: delaying service-path scaffolding until full production retrieval is ready | would block validation of voice-to-chunk and context export behavior
Confidence: high
Scope-risk: moderate
Directive: keep semantics song_id-first and treat resource paths only as supporting evidence/context artifacts
Tested: /usr/local/miniconda3/bin/python -m unittest discover -s acr-engine/tests -v
Not-tested: live FastAPI smoke until uvicorn is available in the current interpreter environment
Showing 12 changed files with 636 additions and 199 deletions
| ... | @@ -123,3 +123,29 @@ cd acr-engine | ... | @@ -123,3 +123,29 @@ cd acr-engine |
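| 123 | - Hybrid scores are normalized before fusion | 123 | - Hybrid scores are normalized before fusion |
| 124 | - full-demo automatic training | 124 | - full-demo automatic training |
| 125 | - Open-source datasets can be integrated later | 125 | - Open-source datasets can be integrated later |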
| 126 | |||
| 127 | |||
| 128 | ## Humming / recording recognition endpoint (voice -> chunk -> song_id) | ||
| 129 | |||
| 130 | Two foundational capabilities are now in place: | ||
| 131 | |||
| 132 | - `src/data/voice_chunker.py`: splits raw voice / humming audio into retrievable chunks | ||
| 133 | - `src/utils/context_exporter.py`: exports the matched reference window as a context clip (10 s by default) | ||
| 134 | |||
| 135 | Target FastAPI endpoint: | ||
| 136 | |||
| 137 | - `POST /recognize/voice` | ||
| 138 | |||
| 139 | Input: | ||
| 140 | - An externally uploaded voice / recording file | ||
| 141 | |||
| 142 | Output: | ||
| 143 | - `song_id` | ||
| 144 | - `reference_audio_path` | ||
| 145 | - `best_chunk` | ||
| 146 | - `context_clip` | ||
| 147 | - `chunk_results` | ||
| 148 | |||
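| 149 | Notes: | ||
| 150 | - The endpoint code is already wired into `src/service/app.py`. | ||
| 151 | - The current environment still lacks `uvicorn`, so the service smoke test can only run after that runtime dependency is installed. | ... | ... |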
| 1 | #!/usr/bin/env /usr/local/miniconda3/bin/python | ||
| 2 | from __future__ import annotations | ||
| 3 | |||
| 4 | import argparse | ||
| 5 | import json | ||
| 6 | from pathlib import Path | ||
| 7 | |||
| 8 | |||
| 9 | def main() -> None: | ||
| 10 | ap = argparse.ArgumentParser() | ||
| 11 | ap.add_argument('--chunks-json', required=True) | ||
| 12 | ap.add_argument('--song-id', required=True) | ||
| 13 | ap.add_argument('--split', default='test') | ||
| 14 | ap.add_argument('--output', required=True) | ||
| 15 | ap.add_argument('--source-dataset', default='humming_real') | ||
| 16 | args = ap.parse_args() | ||
| 17 | |||
| 18 | payload = json.loads(Path(args.chunks_json).read_text(encoding='utf-8')) | ||
| 19 | rows = [] | ||
| 20 | for chunk in payload.get('chunks', []): | ||
| 21 | rows.append({ | ||
| 22 | 'song_id': args.song_id, | ||
| 23 | 'audio_path': chunk['audio_path'], | ||
| 24 | 'duration': chunk['duration_sec'], | ||
| 25 | 'type': 'humming_real', | ||
| 26 | 'segment_type': 'humming_query', | ||
| 27 | 'offset': chunk['start_sec'], | ||
| 28 | 'source_dataset': args.source_dataset, | ||
| 29 | 'split': args.split, | ||
| 30 | }) | ||
| 31 | |||
| 32 | out = Path(args.output) | ||
| 33 | out.parent.mkdir(parents=True, exist_ok=True) | ||
| 34 | out.write_text(json.dumps(rows, ensure_ascii=False, indent=2), encoding='utf-8') | ||
| 35 | print(json.dumps({'rows': len(rows), 'output': str(out)}, ensure_ascii=False, indent=2)) | ||
| 36 | |||
| 37 | |||
| 38 | if __name__ == '__main__': | ||
| 39 | main() |
acr-engine/scripts/service_voice_smoke.py
0 → 100755
| 1 | #!/usr/bin/env /usr/local/miniconda3/bin/python | ||
| 2 | from __future__ import annotations | ||
| 3 | |||
| 4 | import json | ||
| 5 | import subprocess | ||
| 6 | import time | ||
| 7 | from pathlib import Path | ||
| 8 | from urllib.request import Request, urlopen | ||
| 9 | |||
| 10 | BASE = 'http://127.0.0.1:8000' | ||
| 11 | |||
| 12 | |||
| 13 | def post_multipart(url: str, file_path: Path): | ||
| 14 | boundary = '----acrboundary' | ||
| 15 | data = file_path.read_bytes() | ||
| 16 | body = ( | ||
| 17 | f'--{boundary}\r\n' | ||
| 18 | f'Content-Disposition: form-data; name="file"; filename="{file_path.name}"\r\n' | ||
| 19 | 'Content-Type: application/octet-stream\r\n\r\n' | ||
| 20 | ).encode('utf-8') + data + f'\r\n--{boundary}--\r\n'.encode('utf-8') | ||
| 21 | req = Request(url, data=body, method='POST') | ||
| 22 | req.add_header('Content-Type', f'multipart/form-data; boundary={boundary}') | ||
| 23 | with urlopen(req, timeout=20) as resp: | ||
| 24 | return json.loads(resp.read().decode('utf-8')) | ||
| 25 | |||
| 26 | |||
| 27 | def main(): | ||
| 28 | cmd = [ | ||
| 29 | '/usr/local/miniconda3/bin/python', '-m', 'uvicorn', 'src.service.app:app', '--host', '127.0.0.1', '--port', '8000' | ||
| 30 | ] | ||
| 31 | proc = subprocess.Popen(cmd, cwd='/root/vprecog/acr-engine', stdout=subprocess.PIPE, stderr=subprocess.PIPE, text=True) | ||
| 32 | query = Path('/workspace/downloads/111/type_7/75cd601b-7604-4b37-8132-cfab39e7c644.mp3') | ||
| 33 | try: | ||
| 34 | for _ in range(20): | ||
| 35 | time.sleep(0.5) | ||
| 36 | try: | ||
| 37 | result = post_multipart(BASE + '/recognize/voice', query) | ||
| 38 | print(json.dumps({ | ||
| 39 | 'status': 'ok', | ||
| 40 | 'chunk_count': result.get('chunk_count'), | ||
| 41 | 'top_song_id': result.get('candidates', [{}])[0].get('song_id') if result.get('candidates') else None, | ||
| 42 | 'has_context': bool(result.get('candidates', [{}])[0].get('context_clip')) if result.get('candidates') else False, | ||
| 43 | }, ensure_ascii=False, indent=2)) | ||
| 44 | return | ||
| 45 | except Exception: | ||
| 46 | continue | ||
| 47 | raise SystemExit('service voice smoke failed: service not ready or endpoint failed') | ||
| 48 | finally: | ||
| 49 | proc.terminate() | ||
| 50 | try: | ||
| 51 | proc.wait(timeout=5) | ||
| 52 | except subprocess.TimeoutExpired: | ||
| 53 | proc.kill() | ||
| 54 | proc.wait(timeout=5) | ||
| 55 | |||
| 56 | |||
| 57 | if __name__ == '__main__': | ||
| 58 | main() |
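The `post_multipart` helper in the smoke script uses a fixed boundary and a fixed part content type. As an illustration only, here is a minimal standard-library sketch that draws a random boundary and derives the part type from the filename. `build_multipart_body` and its parameters are hypothetical names and are not part of this commit.

```python
from __future__ import annotations

import mimetypes
import uuid


def build_multipart_body(filename: str, payload: bytes, field: str = 'file') -> tuple[bytes, str]:
    """Return (body, content_type_header) for a single-file multipart/form-data POST.

    Hypothetical helper for illustration; not part of the commit.
    """
    # A random boundary makes a collision with the payload bytes very unlikely.
    boundary = f'acr-{uuid.uuid4().hex}'
    # Fall back to a generic binary type when the extension is unknown.
    part_type = mimetypes.guess_type(filename)[0] or 'application/octet-stream'
    head = (
        f'--{boundary}\r\n'
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        f'Content-Type: {part_type}\r\n\r\n'
    ).encode('utf-8')
    tail = f'\r\n--{boundary}--\r\n'.encode('utf-8')
    return head + payload + tail, f'multipart/form-data; boundary={boundary}'
```

The second return value is intended for the request's `Content-Type` header, in the same place where the script calls `req.add_header`.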
acr-engine/src/data/voice_chunker.py
0 → 100644
| 1 | #!/usr/bin/env /usr/local/miniconda3/bin/python | ||
| 2 | from __future__ import annotations | ||
| 3 | |||
| 4 | import argparse | ||
| 5 | import json | ||
| 6 | from pathlib import Path | ||
| 7 | from typing import List, Dict | ||
| 8 | |||
| 9 | import librosa | ||
| 10 | import numpy as np | ||
| 11 | import soundfile as sf | ||
| 12 | |||
| 13 | |||
| 14 | def normalize_audio(audio_path: str, sr: int = 16000) -> np.ndarray: | ||
| 15 | y, _ = librosa.load(audio_path, sr=sr, mono=True) | ||
| 16 | return y.astype(np.float32) | ||
| 17 | |||
| 18 | |||
| 19 | def detect_voiced_intervals(y: np.ndarray, sr: int, top_db: int = 30, min_voiced_sec: float = 2.0) -> List[tuple[int, int]]: | ||
| 20 | intervals = librosa.effects.split(y, top_db=top_db) | ||
| 21 | min_len = int(sr * min_voiced_sec) | ||
| 22 | kept = [] | ||
| 23 | for start, end in intervals: | ||
| 24 | if end - start >= min_len: | ||
| 25 | kept.append((int(start), int(end))) | ||
| 26 | return kept | ||
| 27 | |||
| 28 | |||
| 29 | def chunk_intervals(intervals: List[tuple[int, int]], sr: int, target_chunk_sec: float = 8.0, stride_sec: float = 4.0) -> List[tuple[int, int, bool]]: | ||
| 30 | chunk_len = int(sr * target_chunk_sec) | ||
| 31 | stride = int(sr * stride_sec) | ||
| 32 | chunks: List[tuple[int, int, bool]] = [] | ||
| 33 | for start, end in intervals: | ||
| 34 | seg_len = end - start | ||
| 35 | if seg_len < chunk_len: | ||
| 36 | chunks.append((start, end, True)) | ||
| 37 | continue | ||
| 38 | pos = start | ||
| 39 | while pos + chunk_len <= end: | ||
| 40 | chunks.append((pos, pos + chunk_len, False)) | ||
| 41 | pos += stride | ||
| 42 | if pos < end and end - pos >= int(sr * 2.0): | ||
| 43 | tail_start = max(start, end - chunk_len) | ||
| 44 | chunks.append((tail_start, end, end - tail_start < chunk_len)) | ||
| 45 | deduped = [] | ||
| 46 | seen = set() | ||
| 47 | for item in chunks: | ||
| 48 | key = (item[0], item[1]) | ||
| 49 | if key not in seen: | ||
| 50 | deduped.append(item) | ||
| 51 | seen.add(key) | ||
| 52 | return deduped | ||
| 53 | |||
| 54 | |||
| 55 | def write_chunks(y: np.ndarray, sr: int, chunks: List[tuple[int, int, bool]], output_dir: str, source_audio_path: str) -> List[Dict]: | ||
| 56 | out_dir = Path(output_dir) | ||
| 57 | out_dir.mkdir(parents=True, exist_ok=True) | ||
| 58 | chunk_len = None | ||
| 59 | results = [] | ||
| 60 | for idx, (start, end, padded) in enumerate(chunks): | ||
| 61 | clip = y[start:end] | ||
| 62 | if chunk_len is None: | ||
| 63 | chunk_len = max(len(clip), 1) | ||
| 64 | target_len = max(chunk_len, len(clip)) | ||
| 65 | if padded and len(clip) < target_len: | ||
| 66 | clip = np.pad(clip, (0, target_len - len(clip))) | ||
| 67 | chunk_path = out_dir / f'chunk_{idx:03d}.wav' | ||
| 68 | sf.write(str(chunk_path), clip, sr) | ||
| 69 | results.append({ | ||
| 70 | 'chunk_id': f'chunk_{idx:03d}', | ||
| 71 | 'audio_path': str(chunk_path), | ||
| 72 | 'start_sec': round(start / sr, 4), | ||
| 73 | 'end_sec': round(end / sr, 4), | ||
| 74 | 'duration_sec': round(len(clip) / sr, 4), | ||
| 75 | 'padded': padded, | ||
| 76 | 'source_audio_path': source_audio_path, | ||
| 77 | }) | ||
| 78 | return results | ||
| 79 | |||
| 80 | |||
| 81 | def voice_to_chunks(audio_path: str, output_dir: str, target_chunk_sec: float = 8.0, stride_sec: float = 4.0, min_voiced_sec: float = 2.0, top_db: int = 30, sr: int = 16000) -> List[Dict]: | ||
| 82 | y = normalize_audio(audio_path, sr=sr) | ||
| 83 | intervals = detect_voiced_intervals(y, sr=sr, top_db=top_db, min_voiced_sec=min_voiced_sec) | ||
| 84 | chunks = chunk_intervals(intervals, sr=sr, target_chunk_sec=target_chunk_sec, stride_sec=stride_sec) | ||
| 85 | return write_chunks(y, sr, chunks, output_dir, source_audio_path=audio_path) | ||
| 86 | |||
| 87 | |||
| 88 | def main() -> None: | ||
| 89 | ap = argparse.ArgumentParser() | ||
| 90 | ap.add_argument('--input', required=True) | ||
| 91 | ap.add_argument('--output-dir', required=True) | ||
| 92 | ap.add_argument('--target-chunk-sec', type=float, default=8.0) | ||
| 93 | ap.add_argument('--stride-sec', type=float, default=4.0) | ||
| 94 | ap.add_argument('--min-voiced-sec', type=float, default=2.0) | ||
| 95 | ap.add_argument('--top-db', type=int, default=30) | ||
| 96 | ap.add_argument('--sr', type=int, default=16000) | ||
| 97 | ap.add_argument('--output-json', default='chunks.json') | ||
| 98 | args = ap.parse_args() | ||
| 99 | chunks = voice_to_chunks( | ||
| 100 | audio_path=args.input, | ||
| 101 | output_dir=args.output_dir, | ||
| 102 | target_chunk_sec=args.target_chunk_sec, | ||
| 103 | stride_sec=args.stride_sec, | ||
| 104 | min_voiced_sec=args.min_voiced_sec, | ||
| 105 | top_db=args.top_db, | ||
| 106 | sr=args.sr, | ||
| 107 | ) | ||
| 108 | out_json = Path(args.output_dir) / args.output_json | ||
| 109 | out_json.write_text(json.dumps({'chunks': chunks}, ensure_ascii=False, indent=2), encoding='utf-8') | ||
| 110 | print(json.dumps({'chunks': chunks}, ensure_ascii=False, indent=2)) | ||
| 111 | |||
| 112 | |||
| 113 | if __name__ == '__main__': | ||
| 114 | main() |
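The windowing rule in `chunk_intervals` is easier to check when written in seconds rather than samples. The sketch below mirrors that rule for a single voiced interval, assuming the same defaults (8 s chunks, 4 s stride, 2 s minimum tail). `plan_chunks` is a hypothetical name used only for this illustration.

```python
from __future__ import annotations


def plan_chunks(start: float, end: float, chunk_sec: float = 8.0,
                stride_sec: float = 4.0, min_tail_sec: float = 2.0) -> list[tuple[float, float, bool]]:
    """Plan (start, end, is_short) windows over one voiced interval, in seconds."""
    if end - start < chunk_sec:
        # Interval shorter than one chunk: keep it whole and flag it as short.
        return [(start, end, True)]
    windows = []
    pos = start
    while pos + chunk_sec <= end:
        windows.append((pos, pos + chunk_sec, False))
        pos += stride_sec
    # Cover a leftover tail with one full-length window aligned to the interval end,
    # skipping it when it would duplicate a window that already exists.
    if pos < end and end - pos >= min_tail_sec:
        tail = (max(start, end - chunk_sec), end, False)
        if tail not in windows:
            windows.append(tail)
    return windows


print(plan_chunks(0.0, 20.0))
print(plan_chunks(0.0, 10.0))
print(plan_chunks(0.0, 5.0))
```

With these defaults a 20 s interval yields four windows starting at 0, 4, 8 and 12 s. A 10 s interval yields the window 0 to 8 s plus an end-aligned window 2 to 10 s. A 5 s interval is kept whole and flagged as short.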
| 1 | from __future__ import annotations | 1 | from __future__ import annotations |
| 2 | 2 | ||
| 3 | from pathlib import Path | 3 | from pathlib import Path |
| 4 | from tempfile import TemporaryDirectory | ||
| 4 | from threading import Lock | 5 | from threading import Lock |
| 5 | from typing import Optional | 6 | from typing import Optional |
| 6 | 7 | ||
| 7 | import numpy as np | 8 | import numpy as np |
| 8 | from fastapi import FastAPI, HTTPException | 9 | from fastapi import FastAPI, File, HTTPException, UploadFile |
| 9 | from pydantic import BaseModel | 10 | from pydantic import BaseModel |
| 10 | 11 | ||
| 12 | from src.data.voice_chunker import voice_to_chunks | ||
| 11 | from src.engines.chromaprint_matcher import ChromaprintMatcher | 13 | from src.engines.chromaprint_matcher import ChromaprintMatcher |
| 12 | from src.engines.ecapa_embedder import ECAPAEmbedder | 14 | from src.engines.ecapa_embedder import ECAPAEmbedder |
| 13 | from src.engines.hybrid_engine import HybridEngine | 15 | from src.engines.hybrid_engine import HybridEngine |
| 14 | from src.service.settings import ServiceSettings | 16 | from src.service.settings import ServiceSettings |
| 17 | from src.utils.context_exporter import export_match_context, find_best_matching_window | ||
| 15 | 18 | ||
| 16 | 19 | ||
| 17 | class RecognizeRequest(BaseModel): | 20 | class RecognizeRequest(BaseModel): |
| ... | @@ -30,7 +33,7 @@ class BuildIndexRequest(BaseModel): | ... | @@ -30,7 +33,7 @@ class BuildIndexRequest(BaseModel): |
| 30 | device: Optional[str] = None | 33 | device: Optional[str] = None |
| 31 | 34 | ||
| 32 | 35 | ||
| 33 | app = FastAPI(title="ACR Service", version="0.3.0") | 36 | app = FastAPI(title='ACR Service', version='0.4.0') |
| 34 | settings = ServiceSettings() | 37 | settings = ServiceSettings() |
| 35 | _engine_cache: dict[tuple[str, str, str, str], HybridEngine] = {} | 38 | _engine_cache: dict[tuple[str, str, str, str], HybridEngine] = {} |
| 36 | _cache_lock = Lock() | 39 | _cache_lock = Lock() |
| ... | @@ -38,52 +41,52 @@ _cache_lock = Lock() | ... | @@ -38,52 +41,52 @@ _cache_lock = Lock() |
| 38 | 41 | ||
| 39 | def _resolve(req_data_dir=None, req_model_path=None, req_index_prefix=None, req_device=None): | 42 | def _resolve(req_data_dir=None, req_model_path=None, req_index_prefix=None, req_device=None): |
| 40 | return { | 43 | return { |
| 41 | "data_dir": req_data_dir or settings.data_dir, | 44 | 'data_dir': req_data_dir or settings.data_dir, |
| 42 | "model_path": req_model_path or settings.model_path, | 45 | 'model_path': req_model_path or settings.model_path, |
| 43 | "index_prefix": req_index_prefix or settings.index_prefix, | 46 | 'index_prefix': req_index_prefix or settings.index_prefix, |
| 44 | "device": req_device or settings.device, | 47 | 'device': req_device or settings.device, |
| 45 | } | 48 | } |
| 46 | 49 | ||
| 47 | 50 | ||
| 48 | def _readiness_snapshot(data_dir: str, model_path: str, index_prefix: str) -> dict: | 51 | def _readiness_snapshot(data_dir: str, model_path: str, index_prefix: str) -> dict: |
| 49 | chroma_path = str(Path(index_prefix).parent / "chromaprint.pkl") | 52 | chroma_path = str(Path(index_prefix).parent / 'chromaprint.pkl') |
| 50 | embs_path = f"{index_prefix}_embs.npy" | 53 | embs_path = f'{index_prefix}_embs.npy' |
| 51 | ids_path = f"{index_prefix}_ids.npy" | 54 | ids_path = f'{index_prefix}_ids.npy' |
| 52 | manifest_candidates = [str((Path(data_dir) / split).resolve()) for split in ["catalog.json", "train.json", "val.json", "test.json"] if (Path(data_dir) / split).exists()] | 55 | manifest_candidates = [ |
| 56 | str((Path(data_dir) / split).resolve()) | ||
| 57 | for split in ['catalog.json', 'train.json', 'val.json', 'test.json'] | ||
| 58 | if (Path(data_dir) / split).exists() | ||
| 59 | ] | ||
| 53 | files = { | 60 | files = { |
| 54 | "data_dir": {"path": str(Path(data_dir).resolve()), "exists": Path(data_dir).exists()}, | 61 | 'data_dir': {'path': str(Path(data_dir).resolve()), 'exists': Path(data_dir).exists()}, |
| 55 | "model": {"path": str(Path(model_path).resolve()), "exists": Path(model_path).exists()}, | 62 | 'model': {'path': str(Path(model_path).resolve()), 'exists': Path(model_path).exists()}, |
| 56 | "chromaprint_index": {"path": str(Path(chroma_path).resolve()), "exists": Path(chroma_path).exists()}, | 63 | 'chromaprint_index': {'path': str(Path(chroma_path).resolve()), 'exists': Path(chroma_path).exists()}, |
| 57 | "embedding_index": {"path": str(Path(embs_path).resolve()), "exists": Path(embs_path).exists()}, | 64 | 'embedding_index': {'path': str(Path(embs_path).resolve()), 'exists': Path(embs_path).exists()}, |
| 58 | "id_index": {"path": str(Path(ids_path).resolve()), "exists": Path(ids_path).exists()}, | 65 | 'id_index': {'path': str(Path(ids_path).resolve()), 'exists': Path(ids_path).exists()}, |
| 59 | } | ||
| 60 | return { | ||
| 61 | "ready": all(item["exists"] for item in files.values()), | ||
| 62 | "files": files, | ||
| 63 | "manifests": manifest_candidates, | ||
| 64 | } | 66 | } |
| 67 | return {'ready': all(item['exists'] for item in files.values()), 'files': files, 'manifests': manifest_candidates} | ||
| 65 | 68 | ||
| 66 | 69 | ||
| 67 | def _load_engine_uncached(data_dir: str, model_path: str, index_prefix: str, device: str) -> HybridEngine: | 70 | def _load_engine_uncached(data_dir: str, model_path: str, index_prefix: str, device: str) -> HybridEngine: |
| 68 | matcher = ChromaprintMatcher() | 71 | matcher = ChromaprintMatcher() |
| 69 | chroma_path = str(Path(index_prefix).parent / "chromaprint.pkl") | 72 | chroma_path = str(Path(index_prefix).parent / 'chromaprint.pkl') |
| 70 | if not Path(chroma_path).exists(): | 73 | if not Path(chroma_path).exists(): |
| 71 | raise HTTPException(status_code=400, detail=f"Missing chromaprint index: {chroma_path}") | 74 | raise HTTPException(status_code=400, detail=f'Missing chromaprint index: {chroma_path}') |
| 72 | matcher.load(chroma_path) | 75 | matcher.load(chroma_path) |
| 73 | 76 | ||
| 74 | if not Path(model_path).exists(): | 77 | if not Path(model_path).exists(): |
| 75 | raise HTTPException(status_code=400, detail=f"Missing model: {model_path}") | 78 | raise HTTPException(status_code=400, detail=f'Missing model: {model_path}') |
| 76 | embedder = ECAPAEmbedder(model_path=model_path, device=device) | 79 | embedder = ECAPAEmbedder(model_path=model_path, device=device) |
| 77 | 80 | ||
| 78 | embs_path = f"{index_prefix}_embs.npy" | 81 | embs_path = f'{index_prefix}_embs.npy' |
| 79 | ids_path = f"{index_prefix}_ids.npy" | 82 | ids_path = f'{index_prefix}_ids.npy' |
| 80 | if not Path(embs_path).exists() or not Path(ids_path).exists(): | 83 | if not Path(embs_path).exists() or not Path(ids_path).exists(): |
| 81 | raise HTTPException(status_code=400, detail="Missing embedding index files") | 84 | raise HTTPException(status_code=400, detail='Missing embedding index files') |
| 82 | 85 | ||
| 83 | ref_embs = np.load(embs_path) | 86 | ref_embs = np.load(embs_path) |
| 84 | ref_ids = np.load(ids_path, allow_pickle=True).tolist() | 87 | ref_ids = np.load(ids_path, allow_pickle=True).tolist() |
| 85 | engine = HybridEngine(matcher, embedder, ref_embs, ref_ids) | 88 | engine = HybridEngine(matcher, embedder, ref_embs, ref_ids) |
| 86 | for split in ["catalog.json", "train.json", "val.json", "test.json"]: | 89 | for split in ['catalog.json', 'train.json', 'val.json', 'test.json']: |
| 87 | p = Path(data_dir) / split | 90 | p = Path(data_dir) / split |
| 88 | if p.exists(): | 91 | if p.exists(): |
| 89 | engine.load_metadata(str(p)) | 92 | engine.load_metadata(str(p)) |
| ... | @@ -105,70 +108,168 @@ def _load_engine(data_dir: str, model_path: str, index_prefix: str, device: str) | ... | @@ -105,70 +108,168 @@ def _load_engine(data_dir: str, model_path: str, index_prefix: str, device: str) |
| 105 | def _cache_stats() -> dict: | 108 | def _cache_stats() -> dict: |
| 106 | with _cache_lock: | 109 | with _cache_lock: |
| 107 | keys = list(_engine_cache.keys()) | 110 | keys = list(_engine_cache.keys()) |
| 108 | return {"engine_cache_size": len(keys), "cache_keys": keys} | 111 | return {'engine_cache_size': len(keys), 'cache_keys': keys} |
| 109 | 112 | ||
| 110 | 113 | ||
| 111 | @app.get("/health") | 114 | def _aggregate_chunk_results(chunk_results: list[dict], top_n: int) -> list[dict]: |
| 115 | by_song: dict[str, dict] = {} | ||
| 116 | for chunk in chunk_results: | ||
| 117 | for cand in chunk.get('candidates', []): | ||
| 118 | song_id = cand['song_id'] | ||
| 119 | entry = by_song.setdefault(song_id, { | ||
| 120 | 'song_id': song_id, | ||
| 121 | 'best_confidence': -1.0, | ||
| 122 | 'match_count': 0, | ||
| 123 | 'best_chunk': None, | ||
| 124 | 'best_candidate': None, | ||
| 125 | }) | ||
| 126 | entry['match_count'] += 1 | ||
| 127 | if cand['confidence'] > entry['best_confidence']: | ||
| 128 | entry['best_confidence'] = cand['confidence'] | ||
| 129 | entry['best_chunk'] = chunk | ||
| 130 | entry['best_candidate'] = cand | ||
| 131 | ranked = [] | ||
| 132 | for entry in by_song.values(): | ||
| 133 | combined = float(entry['best_confidence']) + 0.05 * float(entry['match_count']) | ||
| 134 | ranked.append({ | ||
| 135 | 'song_id': entry['song_id'], | ||
| 136 | 'combined_confidence': round(combined, 4), | ||
| 137 | 'best_confidence': round(float(entry['best_confidence']), 4), | ||
| 138 | 'match_count': entry['match_count'], | ||
| 139 | 'best_chunk': entry['best_chunk'], | ||
| 140 | 'best_candidate': entry['best_candidate'], | ||
| 141 | }) | ||
| 142 | ranked.sort(key=lambda x: x['combined_confidence'], reverse=True) | ||
| 143 | return ranked[:top_n] | ||
| 144 | |||
| 145 | |||
| 146 | def _reference_audio_for_song(engine: HybridEngine, song_id: str) -> str | None: | ||
| 147 | return engine.song_audio_paths.get(song_id) | ||
| 148 | |||
| 149 | |||
| 150 | @app.get('/health') | ||
| 112 | def health(): | 151 | def health(): |
| 113 | resolved = _resolve() | 152 | resolved = _resolve() |
| 114 | readiness = _readiness_snapshot(resolved["data_dir"], resolved["model_path"], resolved["index_prefix"]) | 153 | readiness = _readiness_snapshot(resolved['data_dir'], resolved['model_path'], resolved['index_prefix']) |
| 115 | return { | 154 | return {'status': 'ok', 'service': 'acr', 'version': '0.4.0', 'ready': readiness['ready']} |
| 116 | "status": "ok", | ||
| 117 | "service": "acr", | ||
| 118 | "version": "0.3.0", | ||
| 119 | "ready": readiness["ready"], | ||
| 120 | } | ||
| 121 | 155 | ||
| 122 | 156 | ||
| 123 | @app.get("/ready") | 157 | @app.get('/ready') |
| 124 | def ready(): | 158 | def ready(): |
| 125 | resolved = _resolve() | 159 | resolved = _resolve() |
| 126 | readiness = _readiness_snapshot(resolved["data_dir"], resolved["model_path"], resolved["index_prefix"]) | 160 | readiness = _readiness_snapshot(resolved['data_dir'], resolved['model_path'], resolved['index_prefix']) |
| 127 | return { | 161 | return {'service': 'acr', 'version': '0.4.0', **readiness, **_cache_stats()} |
| 128 | "service": "acr", | ||
| 129 | "version": "0.3.0", | ||
| 130 | **readiness, | ||
| 131 | **_cache_stats(), | ||
| 132 | } | ||
| 133 | 162 | ||
| 134 | 163 | ||
| 135 | @app.get("/config") | 164 | @app.get('/config') |
| 136 | def config(): | 165 | def config(): |
| 137 | return settings.model_dump() | 166 | return settings.model_dump() |
| 138 | 167 | ||
| 139 | 168 | ||
| 140 | @app.get("/cache") | 169 | @app.get('/cache') |
| 141 | def cache_status(): | 170 | def cache_status(): |
| 142 | return _cache_stats() | 171 | return _cache_stats() |
| 143 | 172 | ||
| 144 | 173 | ||
| 145 | @app.post("/recognize") | 174 | @app.post('/recognize') |
| 146 | def recognize(req: RecognizeRequest): | 175 | def recognize(req: RecognizeRequest): |
| 147 | resolved = _resolve(req.data_dir, req.model_path, req.index_prefix, req.device) | 176 | resolved = _resolve(req.data_dir, req.model_path, req.index_prefix, req.device) |
| 148 | if not Path(req.query_path).exists(): | 177 | if not Path(req.query_path).exists(): |
| 149 | raise HTTPException(status_code=400, detail=f"Missing query file: {req.query_path}") | 178 | raise HTTPException(status_code=400, detail=f'Missing query file: {req.query_path}') |
| 150 | engine, cache_hit = _load_engine(**resolved) | 179 | engine, cache_hit = _load_engine(**resolved) |
| 151 | result = engine.recognize(req.query_path, top_n=req.top_n) | 180 | result = engine.recognize(req.query_path, top_n=req.top_n) |
| 152 | return { | 181 | return {'cache_hit': cache_hit, 'resolved': resolved, 'result': result} |
| 153 | "cache_hit": cache_hit, | ||
| 154 | "resolved": resolved, | ||
| 155 | "result": result, | ||
| 156 | } | ||
| 157 | 182 | ||
| 158 | 183 | ||
| 159 | @app.post("/index/build") | 184 | @app.post('/index/build') |
| 160 | def build_index(req: BuildIndexRequest): | 185 | def build_index(req: BuildIndexRequest): |
| 161 | from run_demo import build_chroma_index, build_embedding_index | 186 | from run_demo import build_chroma_index, build_embedding_index |
| 162 | 187 | ||
| 163 | resolved = _resolve(req.data_dir, req.model_path, None, req.device) | 188 | resolved = _resolve(req.data_dir, req.model_path, None, req.device) |
| 164 | data_dir = Path(resolved["data_dir"]) | 189 | data_dir = Path(resolved['data_dir']) |
| 165 | out_dir = Path(req.output_dir) | 190 | out_dir = Path(req.output_dir) |
| 166 | out_dir.mkdir(parents=True, exist_ok=True) | 191 | out_dir.mkdir(parents=True, exist_ok=True) |
| 167 | build_chroma_index(data_dir, out_dir) | 192 | build_chroma_index(data_dir, out_dir) |
| 168 | _, ref_embs, ref_ids = build_embedding_index(data_dir, Path(resolved["model_path"]), out_dir / "reference", resolved["device"]) | 193 | _, ref_embs, ref_ids = build_embedding_index(data_dir, Path(resolved['model_path']), out_dir / 'reference', resolved['device']) |
| 194 | return { | ||
| 195 | 'status': 'ok', | ||
| 196 | 'num_reference_windows': len(ref_ids), | ||
| 197 | 'embedding_dim': int(ref_embs.shape[1]) if len(ref_embs.shape) > 1 else 0, | ||
| 198 | 'output_dir': str(out_dir.resolve()), | ||
| 199 | } | ||
| 200 | |||
| 201 | |||
| 202 | @app.post('/recognize/voice') | ||
| 203 | async def recognize_voice( | ||
| 204 | file: UploadFile = File(...), | ||
| 205 | top_n: int = 5, | ||
| 206 | data_dir: Optional[str] = None, | ||
| 207 | model_path: Optional[str] = None, | ||
| 208 | index_prefix: Optional[str] = None, | ||
| 209 | device: Optional[str] = None, | ||
| 210 | context_sec: float = 10.0, | ||
| 211 | output_format: str = 'mp3', | ||
| 212 | ): | ||
| 213 | resolved = _resolve(data_dir, model_path, index_prefix, device) | ||
| 214 | engine, cache_hit = _load_engine(**resolved) | ||
| 215 | with TemporaryDirectory(prefix='acr_voice_') as tmpdir: | ||
| 216 | tmp = Path(tmpdir) | ||
| 217 | suffix = Path(file.filename or 'upload.wav').suffix or '.wav' | ||
| 218 | raw_path = tmp / f'input{suffix}' | ||
| 219 | raw_path.write_bytes(await file.read()) | ||
| 220 | |||
| 221 | chunk_dir = tmp / 'chunks' | ||
| 222 | chunks = voice_to_chunks(str(raw_path), str(chunk_dir)) | ||
| 223 | if not chunks: | ||
| 224 | raise HTTPException(status_code=400, detail='No voiced chunks detected from input audio') | ||
| 225 | |||
| 226 | chunk_results = [] | ||
| 227 | for chunk in chunks: | ||
| 228 | result = engine.recognize(chunk['audio_path'], top_n=top_n) | ||
| 229 | chunk_results.append({ | ||
| 230 | 'chunk': chunk, | ||
| 231 | 'candidates': result['candidates'], | ||
| 232 | 'processing_time_ms': result['processing_time_ms'], | ||
| 233 | }) | ||
| 234 | |||
| 235 | ranked = _aggregate_chunk_results(chunk_results, top_n=top_n) | ||
| 236 | response_candidates = [] | ||
| 237 | for item in ranked: | ||
| 238 | song_id = item['song_id'] | ||
| 239 | ref_audio = _reference_audio_for_song(engine, song_id) | ||
| 240 | context_info = None | ||
| 241 | if ref_audio and item['best_chunk'] is not None: | ||
| 242 | match = find_best_matching_window( | ||
| 243 | query_audio_path=item['best_chunk']['chunk']['audio_path'], | ||
| 244 | reference_audio_path=ref_audio, | ||
| 245 | ) | ||
| 246 | out_path = tmp / 'contexts' / f'{song_id}.{output_format}' | ||
| 247 | context_info = export_match_context( | ||
| 248 | audio_path=ref_audio, | ||
| 249 | window_start_sec=match['window_start_sec'], | ||
| 250 | window_end_sec=match['window_end_sec'], | ||
| 251 | output_path=str(out_path), | ||
| 252 | context_sec=context_sec, | ||
| 253 | output_format=output_format, | ||
| 254 | ) | ||
| 255 | context_info['match'] = match | ||
| 256 | |||
| 257 | response_candidates.append({ | ||
| 258 | 'song_id': song_id, | ||
| 259 | 'combined_confidence': item['combined_confidence'], | ||
| 260 | 'best_confidence': item['best_confidence'], | ||
| 261 | 'match_count': item['match_count'], | ||
| 262 | 'reference_audio_path': ref_audio, | ||
| 263 | 'best_candidate': item['best_candidate'], | ||
| 264 | 'best_chunk': item['best_chunk']['chunk'] if item['best_chunk'] else None, | ||
| 265 | 'context_clip': context_info, | ||
| 266 | }) | ||
| 267 | |||
| 169 | return { | 268 | return { |
| 170 | "status": "ok", | 269 | 'cache_hit': cache_hit, |
| 171 | "num_reference_windows": len(ref_ids), | 270 | 'resolved': resolved, |
| 172 | "embedding_dim": int(ref_embs.shape[1]) if len(ref_embs.shape) > 1 else 0, | 271 | 'query_audio_filename': file.filename, |
| 173 | "output_dir": str(out_dir.resolve()), | 272 | 'chunk_count': len(chunks), |
| 273 | 'chunk_results': chunk_results, | ||
| 274 | 'candidates': response_candidates, | ||
| 174 | } | 275 | } | ... | ... |
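`_aggregate_chunk_results` scores each song as its best per-chunk confidence plus 0.05 per candidate hit. The following sketch reproduces just that arithmetic on made-up confidences to show how the bonus can change the ranking. `rank_songs` and the sample values are hypothetical.

```python
from __future__ import annotations


def rank_songs(chunk_results: list[dict], bonus: float = 0.05) -> list[tuple[str, float]]:
    """Rank songs by best per-chunk confidence plus a small bonus per candidate hit."""
    best: dict[str, float] = {}
    hits: dict[str, int] = {}
    for chunk in chunk_results:
        for cand in chunk.get('candidates', []):
            sid = cand['song_id']
            hits[sid] = hits.get(sid, 0) + 1
            best[sid] = max(best.get(sid, -1.0), cand['confidence'])
    scored = [(sid, round(best[sid] + bonus * hits[sid], 4)) for sid in best]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical confidences: song A has the single highest hit,
# but song B appears among the candidates of all three chunks.
chunks = [
    {'candidates': [{'song_id': 'A', 'confidence': 0.80}, {'song_id': 'B', 'confidence': 0.72}]},
    {'candidates': [{'song_id': 'B', 'confidence': 0.74}]},
    {'candidates': [{'song_id': 'B', 'confidence': 0.70}]},
]
print(rank_songs(chunks))
```

Here B scores 0.74 + 3 × 0.05 = 0.89 and overtakes A at 0.80 + 0.05 = 0.85. Because the bonus grows with the number of hits, the combined value works as a ranking score rather than a bounded probability.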
acr-engine/src/utils/context_exporter.py
0 → 100644
| 1 | from __future__ import annotations | ||
| 2 | |||
| 3 | import shutil | ||
| 4 | import subprocess | ||
| 5 | import tempfile | ||
| 6 | from pathlib import Path | ||
| 7 | from typing import Dict, Tuple | ||
| 8 | |||
| 9 | import librosa | ||
| 10 | import numpy as np | ||
| 11 | import soundfile as sf | ||
| 12 | |||
| 13 | |||
| 14 | def load_audio(audio_path: str, sr: int = 16000) -> np.ndarray: | ||
| 15 | y, _ = librosa.load(audio_path, sr=sr, mono=True) | ||
| 16 | return y.astype(np.float32) | ||
| 17 | |||
| 18 | |||
| 19 | def chroma_embedding(y: np.ndarray, sr: int) -> np.ndarray: | ||
| 20 | chroma = librosa.feature.chroma_stft(y=y, sr=sr, n_chroma=12) | ||
| 21 | feat = np.concatenate([chroma.mean(axis=1), chroma.std(axis=1)], axis=0).astype(np.float32) | ||
| 22 | norm = np.linalg.norm(feat) | ||
| 23 | return feat / norm if norm > 0 else feat | ||
| 24 | |||
| 25 | |||
| 26 | def find_best_matching_window( | ||
| 27 | query_audio_path: str, | ||
| 28 | reference_audio_path: str, | ||
| 29 | sr: int = 16000, | ||
| 30 | stride_sec: float = 1.0, | ||
| 31 | ) -> Dict: | ||
| 32 | query_y = load_audio(query_audio_path, sr=sr) | ||
| 33 | ref_y = load_audio(reference_audio_path, sr=sr) | ||
| 34 | query_len = len(query_y) | ||
| 35 | if query_len == 0: | ||
| 36 | raise ValueError('Empty query audio') | ||
| 37 | if len(ref_y) < query_len: | ||
| 38 | ref_y = np.pad(ref_y, (0, query_len - len(ref_y))) | ||
| 39 | |||
| 40 | query_feat = chroma_embedding(query_y, sr) | ||
| 41 | stride = max(1, int(sr * stride_sec)) | ||
| 42 | best_score = -1.0 | ||
| 43 | best_start = 0 | ||
| 44 | for start in range(0, max(len(ref_y) - query_len + 1, 1), stride): | ||
| 45 | window = ref_y[start:start + query_len] | ||
| 46 | if len(window) < query_len: | ||
| 47 | window = np.pad(window, (0, query_len - len(window))) | ||
| 48 | score = float(np.dot(query_feat, chroma_embedding(window, sr))) | ||
| 49 | if score > best_score: | ||
| 50 | best_score = score | ||
| 51 | best_start = start | ||
| 52 | |||
| 53 | return { | ||
| 54 | 'window_start_sec': round(best_start / sr, 4), | ||
| 55 | 'window_end_sec': round((best_start + query_len) / sr, 4), | ||
| 56 | 'window_score': round(best_score, 6), | ||
| 57 | 'query_duration_sec': round(query_len / sr, 4), | ||
| 58 | } | ||
| 59 | |||
| 60 | |||
| 61 | def export_match_context( | ||
| 62 | audio_path: str, | ||
| 63 | window_start_sec: float, | ||
| 64 | window_end_sec: float, | ||
| 65 | output_path: str, | ||
| 66 | context_sec: float = 10.0, | ||
| 67 | output_format: str = 'mp3', | ||
| 68 | sr: int = 16000, | ||
| 69 | ) -> Dict: | ||
| 70 | y = load_audio(audio_path, sr=sr) | ||
| 71 | center = (window_start_sec + window_end_sec) / 2.0 | ||
| 72 | half = context_sec / 2.0 | ||
| 73 | clip_start_sec = max(0.0, center - half) | ||
| 74 | clip_end_sec = min(len(y) / sr, center + half) | ||
| 75 | start = int(clip_start_sec * sr) | ||
| 76 | end = max(start + 1, int(clip_end_sec * sr)) | ||
| 77 | clip = y[start:end] | ||
| 78 | |||
| 79 | output = Path(output_path) | ||
| 80 | output.parent.mkdir(parents=True, exist_ok=True) | ||
| 81 | actual_format = output_format | ||
| 82 | |||
| 83 | if output_format == 'mp3' and shutil.which('ffmpeg'): | ||
| 84 | with tempfile.TemporaryDirectory() as tmp: | ||
| 85 | wav_path = Path(tmp) / 'context.wav' | ||
| 86 | sf.write(wav_path, clip, sr) | ||
| 87 | cmd = [shutil.which('ffmpeg') or 'ffmpeg', '-y', '-i', str(wav_path), str(output)] | ||
| 88 | subprocess.run(cmd, check=True, capture_output=True) | ||
| 89 | else: | ||
| 90 | if output_format == 'mp3': | ||
| 91 | actual_format = 'wav' | ||
| 92 | output = output.with_suffix('.wav') | ||
| 93 | sf.write(output, clip, sr) | ||
| 94 | |||
| 95 | return { | ||
| 96 | 'source_audio_path': audio_path, | ||
| 97 | 'clip_start_sec': round(clip_start_sec, 4), | ||
| 98 | 'clip_end_sec': round(clip_end_sec, 4), | ||
| 99 | 'duration_sec': round((end - start) / sr, 4), | ||
| 100 | 'output_path': str(output), | ||
| 101 | 'output_format': actual_format, | ||
| 102 | } |
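The clip-boundary arithmetic in `export_match_context` can be checked in isolation. The following is a minimal sketch that mirrors the centre-and-clamp logic shown above without touching audio files; the helper name `context_clip_bounds` and the `audio_duration_sec` parameter are illustrative assumptions, not part of the module.

```python
def context_clip_bounds(window_start_sec, window_end_sec, audio_duration_sec,
                        context_sec=10.0, sr=16000):
    """Hypothetical helper: compute the context clip around a matched window.

    Mirrors the arithmetic in export_match_context: centre the clip on the
    matched window, then clamp it to the bounds of the reference audio.
    """
    center = (window_start_sec + window_end_sec) / 2.0
    half = context_sec / 2.0
    clip_start_sec = max(0.0, center - half)
    clip_end_sec = min(audio_duration_sec, center + half)
    start = int(clip_start_sec * sr)
    # Guarantee at least one sample, as the original code does.
    end = max(start + 1, int(clip_end_sec * sr))
    return start, end, clip_start_sec, clip_end_sec


# Same numbers as test_export_match_context_writes_audio:
# a 12 s reference, a 4.0-7.0 s window, and a 10 s context.
mid_clip = context_clip_bounds(4.0, 7.0, 12.0)

# A window near the start of the file: the clip is clamped, not shifted.
edge_clip = context_clip_bounds(0.0, 1.0, 12.0)

print(mid_clip)
print(edge_clip)
```

Because the clip is clamped rather than shifted, a match near either end of the reference yields a clip shorter than `context_sec`. Callers that need a fixed-length clip would have to handle that case themselves.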
acr-engine/tests/test_bootstrap.py
0 → 100644
acr-engine/tests/test_context_exporter.py
0 → 100644
| 1 | import tempfile | ||
| 2 | import unittest | ||
| 3 | from pathlib import Path | ||
| 4 | |||
| 5 | import test_bootstrap | ||
| 6 | |||
| 7 | import numpy as np | ||
| 8 | import soundfile as sf | ||
| 9 | |||
| 10 | from src.utils.context_exporter import export_match_context, find_best_matching_window | ||
| 11 | |||
| 12 | |||
| 13 | class ContextExporterTests(unittest.TestCase): | ||
| 14 | def test_find_best_matching_window_returns_valid_range(self): | ||
| 15 | sr = 16000 | ||
| 16 | with tempfile.TemporaryDirectory() as tmp: | ||
| 17 | query = Path(tmp) / 'query.wav' | ||
| 18 | ref = Path(tmp) / 'ref.wav' | ||
| 19 | tone = 0.2 * np.sin(2 * np.pi * 440 * np.linspace(0, 3, sr * 3, endpoint=False)).astype(np.float32) | ||
| 20 | ref_y = np.concatenate([np.zeros(sr), tone, np.zeros(sr)]).astype(np.float32) | ||
| 21 | sf.write(query, tone, sr) | ||
| 22 | sf.write(ref, ref_y, sr) | ||
| 23 | match = find_best_matching_window(str(query), str(ref), sr=sr, stride_sec=0.5) | ||
| 24 | self.assertGreaterEqual(match['window_start_sec'], 0.0) | ||
| 25 | self.assertGreater(match['window_end_sec'], match['window_start_sec']) | ||
| 26 | |||
| 27 | def test_export_match_context_writes_audio(self): | ||
| 28 | sr = 16000 | ||
| 29 | with tempfile.TemporaryDirectory() as tmp: | ||
| 30 | ref = Path(tmp) / 'ref.wav' | ||
| 31 | out = Path(tmp) / 'context.wav' | ||
| 32 | y = 0.2 * np.sin(2 * np.pi * 440 * np.linspace(0, 12, sr * 12, endpoint=False)).astype(np.float32) | ||
| 33 | sf.write(ref, y, sr) | ||
| 34 | info = export_match_context(str(ref), 4.0, 7.0, str(out), context_sec=10.0, output_format='wav', sr=sr) | ||
| 35 | self.assertTrue(Path(info['output_path']).exists()) | ||
| 36 | self.assertEqual(info['output_format'], 'wav') | ||
| 37 | |||
| 38 | |||
| 39 | if __name__ == '__main__': | ||
| 40 | unittest.main() |
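The first test only asserts that the returned range is valid. To see how the sliding-window search behaves, here is a dependency-free sketch of the same loop. It is an illustration under stated assumptions: `chroma_embedding` is not shown in this section, so the sketch substitutes a unit-normalised raw waveform as the feature, and the names `unit_norm` and `best_window` are hypothetical.

```python
import math


def unit_norm(xs):
    """Stand-in feature (assumption): the raw samples scaled to unit length."""
    norm = math.sqrt(sum(x * x for x in xs))
    return [x / norm for x in xs] if norm > 0 else list(xs)


def best_window(query, ref, sr, stride_sec=0.5):
    """Slide a query-length window over ref and keep the best dot-product score."""
    query_len = len(query)
    if query_len == 0:
        raise ValueError('Empty query audio')
    if len(ref) < query_len:
        # Zero-pad a reference that is shorter than the query.
        ref = list(ref) + [0.0] * (query_len - len(ref))
    query_feat = unit_norm(query)
    stride = max(1, int(sr * stride_sec))
    best_score, best_start = -1.0, 0
    for start in range(0, max(len(ref) - query_len + 1, 1), stride):
        window = ref[start:start + query_len]
        score = sum(a * b for a, b in zip(query_feat, unit_norm(window)))
        if score > best_score:
            best_score, best_start = score, start
    return best_start / sr, (best_start + query_len) / sr, best_score


# Low sample rate keeps the pure-Python loop fast; the layout matches the
# test: 1 s of silence, a 3 s tone, 1 s of silence.
sr = 200
tone = [0.2 * math.sin(2 * math.pi * 5 * n / sr) for n in range(3 * sr)]
ref = [0.0] * sr + tone + [0.0] * sr
start_sec, end_sec, score = best_window(tone, ref, sr)
print(start_sec, end_sec)
```

With this stand-in feature the search lands on the 1.0 s to 4.0 s window, where the query lines up with the tone. Raw-waveform similarity is phase sensitive, so the result says nothing about how the real chroma-based score ranks neighbouring windows.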
| ... | @@ -2,6 +2,8 @@ import tempfile | ... | @@ -2,6 +2,8 @@ import tempfile |
| 2 | import unittest | 2 | import unittest |
| 3 | from pathlib import Path | 3 | from pathlib import Path |
| 4 | 4 | ||
| 5 | import test_bootstrap | ||
| 6 | |||
| 5 | from scripts.local_music20_acr import collect_pairs, first_file | 7 | from scripts.local_music20_acr import collect_pairs, first_file |
| 6 | 8 | ||
| 7 | 9 | ... | ... |
acr-engine/tests/test_voice_chunker.py
0 → 100644
| 1 | import tempfile | ||
| 2 | import unittest | ||
| 3 | from pathlib import Path | ||
| 4 | |||
| 5 | import test_bootstrap | ||
| 6 | |||
| 7 | import numpy as np | ||
| 8 | import soundfile as sf | ||
| 9 | |||
| 10 | from src.data.voice_chunker import detect_voiced_intervals, chunk_intervals, voice_to_chunks | ||
| 11 | |||
| 12 | |||
| 13 | class VoiceChunkerTests(unittest.TestCase): | ||
| 14 | def test_detect_voiced_intervals_filters_short_segments(self): | ||
| 15 | sr = 16000 | ||
| 16 | y = np.concatenate([ | ||
| 17 | np.zeros(sr), | ||
| 18 | 0.2 * np.sin(2 * np.pi * 440 * np.linspace(0, 3, sr * 3, endpoint=False)), | ||
| 19 | np.zeros(sr // 2), | ||
| 20 | ]).astype(np.float32) | ||
| 21 | intervals = detect_voiced_intervals(y, sr=sr, top_db=30, min_voiced_sec=2.0) | ||
| 22 | self.assertEqual(len(intervals), 1) | ||
| 23 | |||
| 24 | def test_chunk_intervals_handles_short_and_long_regions(self): | ||
| 25 | sr = 16000 | ||
| 26 | chunks = chunk_intervals([(0, sr * 3), (sr * 5, sr * 15)], sr=sr, target_chunk_sec=8.0, stride_sec=4.0) | ||
| 27 | self.assertTrue(any(padded for _, _, padded in chunks)) | ||
| 28 | self.assertGreaterEqual(len(chunks), 2) | ||
| 29 | |||
| 30 | def test_voice_to_chunks_writes_chunk_files(self): | ||
| 31 | sr = 16000 | ||
| 32 | with tempfile.TemporaryDirectory() as tmp: | ||
| 33 | src = Path(tmp) / 'hum.wav' | ||
| 34 | out = Path(tmp) / 'chunks' | ||
| 35 | y = np.concatenate([ | ||
| 36 | np.zeros(sr), | ||
| 37 | 0.2 * np.sin(2 * np.pi * 330 * np.linspace(0, 4, sr * 4, endpoint=False)), | ||
| 38 | np.zeros(sr), | ||
| 39 | ]).astype(np.float32) | ||
| 40 | sf.write(src, y, sr) | ||
| 41 | chunks = voice_to_chunks(str(src), str(out), target_chunk_sec=3.0, stride_sec=2.0, min_voiced_sec=2.0, sr=sr) | ||
| 42 | self.assertGreaterEqual(len(chunks), 1) | ||
| 43 | self.assertTrue(Path(chunks[0]['audio_path']).exists()) | ||
| 44 | |||
| 45 | |||
| 46 | if __name__ == '__main__': | ||
| 47 | unittest.main() |
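`voice_chunker.py` itself is not shown in this section, so the following is only a guess at the kind of logic `test_chunk_intervals_handles_short_and_long_regions` exercises. It assumes that each chunk is a `(start, end, padded)` tuple (inferred from how the test unpacks the result) and that `padded` marks a chunk shorter than the target length. The function name `chunk_intervals_sketch` is hypothetical.

```python
def chunk_intervals_sketch(intervals, sr, target_chunk_sec=8.0, stride_sec=4.0):
    """Hypothetical sketch: cut voiced intervals into fixed-length chunks.

    Short intervals yield a single chunk flagged as needing padding; long
    intervals are covered by overlapping chunks spaced stride_sec apart.
    """
    target = int(target_chunk_sec * sr)
    stride = max(1, int(stride_sec * sr))
    chunks = []
    for start, end in intervals:
        pos = start
        while True:
            chunk_end = min(pos + target, end)
            # A chunk shorter than the target would need zero-padding later.
            padded = (chunk_end - pos) < target
            chunks.append((pos, chunk_end, padded))
            if pos + target >= end:
                break  # this chunk already reaches the end of the interval
            pos += stride
    return chunks


sr = 16000
# Same input as the test: a 3 s region and a 10 s region.
chunks = chunk_intervals_sketch([(0, sr * 3), (sr * 5, sr * 15)], sr=sr)
for chunk in chunks:
    print(chunk)
```

Under these assumptions the 3 s region becomes one padded chunk, and the 10 s region becomes one full 8 s chunk plus a padded 6 s tail, which satisfies both assertions in the test.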
| 1 | 1 | ||
| 2 | ## 2026-06-03 voice-to-chunk and context export foundation | ||
| 3 | |||
| 4 | - Added `acr-engine/src/data/voice_chunker.py`, which splits voice / humming audio into chunks. | ||
| 5 | - Added `acr-engine/scripts/build_humming_eval_manifest.py`, which builds the `humming_real` evaluation manifest from chunk results. | ||
| 6 | - Added `acr-engine/src/utils/context_exporter.py`, which exports the matched reference window as a context clip. | ||
| 7 | - Extended `acr-engine/src/service/app.py` with a first draft of the `POST /recognize/voice` endpoint. | ||
| 8 | - Simplified the docs entry point `docs/README.md` to the latest architecture and the shortest reading order. | ||
| 9 | |||
| 10 | Fresh evidence: | ||
| 11 | - `/usr/local/miniconda3/bin/python -m unittest discover -s acr-engine/tests -v` => `Ran 7 tests, OK` | ||
| 12 | - The current environment lacks `uvicorn`, so the service smoke test cannot be started yet; the runtime dependency must be installed first. | ||
| 13 | |||
| 14 | |||
| 2 | ## 2026-06-03 20-song local ACR workflow in acr-engine | 15 | ## 2026-06-03 20-song local ACR workflow in acr-engine |
| 3 | 16 | ||
| 4 | - Added `acr-engine/scripts/local_music20_acr.py`, which provides a local 20-song small-sample ACR workflow inside `acr-engine`, based on `/workspace/downloads`. | 17 | - Added `acr-engine/scripts/local_music20_acr.py`, which provides a local 20-song small-sample ACR workflow inside `acr-engine`, based on `/workspace/downloads`. | ... | ... |
| 1 | # ACR Docs Overview | 1 | # ACR Docs Overview |
| 2 | 2 | ||
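| 3 | > Updated: 2026-06-02 | 3 | > Keeps only the latest architecture and the shortest path to getting started. Historical details remain in the repository, but the default reading list is limited to the 6 main documents below. |
| 4 | 4 | ||
| 5 | ## One-page summary | 5 | ## Shortest reading order |
| 6 | 6 | ||
| 7 | There are currently too many documentation entry points, so they are now consolidated into **5 groups of main documents**: | 7 | 1. [session-handoff.md](./session-handoff.md) |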
| 8 | 2. [CHANGELOG.md](./CHANGELOG.md) | ||
| 9 | 3. [acr-architecture.md](./acr-architecture.md) | ||
| 10 | 4. [dataset-spec.md](./dataset-spec.md) | ||
| 11 | 5. [training-data-and-pgvector-guide.md](./training-data-and-pgvector-guide.md) | ||
| 12 | 6. [runbook.md](./runbook.md) | ||
| 8 | 13 | ||
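| 9 | 1. **Project and architecture** | 14 | ## Currently recommended categories |
| 10 | 2. **Data and evaluation** | ||
| 11 | 3. **Business data onboarding** | ||
| 12 | 4. **Service and engineering** | ||
| 13 | 5. **Research and roadmap** | ||
| 14 | 15 | ||
| 15 | Start with just these 5 groups; there is no need to read every detailed document at once. | 16 | ### 1. Project architecture |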
| 17 | - [acr-architecture.md](./acr-architecture.md) | ||
| 18 | - [session-handoff.md](./session-handoff.md) | ||
| 16 | 19 | ||
| 17 | --- | 20 | ### 2. Data and evaluation |
| 21 | - [dataset-spec.md](./dataset-spec.md) | ||
| 22 | - [training-data-and-pgvector-guide.md](./training-data-and-pgvector-guide.md) | ||
| 23 | - [open-dataset-workflow.md](./open-dataset-workflow.md) | ||
| 18 | 24 | ||
| 19 | ## 1. Documentation map | 25 | ### 3. Operations and service |
| 26 | - [runbook.md](./runbook.md) | ||
| 27 | - [service-api.md](./service-api.md) | ||
| 20 | 28 | ||
| 21 | ```mermaid | 29 | ### 4. Latest hard-case findings |
| 22 | flowchart TD | 30 | - [acr-hard-case-analysis.md](../acr-engine/../docs/acr-hard-case-analysis.md) |
| 23 | A[Docs Entry] --> B[Project Responsibility] | ||
| 24 | A --> C[Architecture] | ||
| 25 | A --> D[Dataset Spec] | ||
| 26 | A --> E[Business Export Chain] | ||
| 27 | A --> F[Service API] | ||
| 28 | A --> G[Industrial Benchmark] | ||
| 29 | A --> H[Industrialization Roadmap] | ||
| 30 | A --> I[Licensing & Sources] | ||
| 31 | A --> J[SOTA Research] | ||
| 32 | 31 | ||
| 33 | B --> C | 32 | ## Current architecture at a glance |
| 34 | C --> D | ||
| 35 | D --> E | ||
| 36 | E --> F | ||
| 37 | G --> H | ||
| 38 | I --> H | ||
| 39 | J --> H | ||
| 40 | ``` | ||
| 41 | 33 | ||
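| 42 | --- | 34 | - `/workspace`: source of samples and raw material |
| 43 | 35 | - `acr-engine/`: main project for training, indexing, recognition, and serving | |
| 44 | ## 2. Condensed reading entry points | 36 | - Local small-sample validation: prefer **FAISS** |
| 45 | 37 | - Production vector retrieval: standardized on **pgvector** | |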
| 46 | | Reader role | Read first | | ||
| 47 | |---|---| | ||
| 48 | | New members | [Project and architecture](./project-responsibility-map.md), [System architecture](./acr-architecture.md) | | ||
| 49 | | Algorithms / models | [Dataset spec](./dataset-spec.md), [SOTA research](./sota-research-2026.md) | | ||
| 50 | | Platform / backend | [Service API](./service-api.md), [Benchmark spec](./industrial-benchmark-spec.md) | | ||
| 51 | | Data onboarding | [Open dataset workflow](./open-dataset-workflow.md), [Business export cookbook](./business-export-cookbook.md) | | ||
| 52 | | Leads / planning | [Industrialization roadmap](./industrialization-roadmap.md), [Handoff document](./session-handoff.md) | | ||
| 53 | |||
| 54 | --- | ||
| 55 | |||
| 56 | ## 2.5 Shortest reading order for a new session | ||
| 57 | |||
| 58 | If you are picking this up in a new session, read in this order: | ||
| 59 | |||
| 60 | 1. [Ongoing development handoff document](./session-handoff.md) | ||
| 61 | 2. [Changelog](./CHANGELOG.md) | ||
| 62 | 3. [Business export cookbook](./business-export-cookbook.md) or [Open dataset workflow](./open-dataset-workflow.md) | ||
| 63 | |||
| 64 | How to choose: | ||
| 65 | - Onboarding your own business material: read `business-export-cookbook.md` first | ||
| 66 | - Working with open datasets such as FMA / MTG-Jamendo: read `open-dataset-workflow.md` first | ||
| 67 | |||
| 68 | ## 2.6 Shortest runnable command for a new session | ||
| 69 | |||
| 70 | If you only want to confirm that the business export chain still runs, execute: | ||
| 71 | |||
| 72 | ```bash | ||
| 73 | cd /workspace/acr-engine | ||
| 74 | /usr/local/miniconda3/bin/python scripts/business_export_offline_smoke.py \ | ||
| 75 | --output-root /tmp/business_export_offline_smoke | ||
| 76 | ``` | ||
| 77 | |||
| 78 | Expected results: | ||
| 79 | - A sample business export is generated | ||
| 80 | - Manifest-ready JSONL is generated | ||
| 81 | - The project `catalog/train/test/val` outputs are generated | ||
| 82 | - `train.py --dry-run` passes | ||
| 83 | |||
| 84 | ## 3. Main document groups | ||
| 85 | |||
| 86 | ### A. Project and architecture | ||
| 87 | - [Project responsibility map](./project-responsibility-map.md) | ||
| 88 | - [System architecture](./acr-architecture.md) | ||
| 89 | |||
| 90 | ### B. Data and evaluation | ||
| 91 | - [Dataset spec](./dataset-spec.md) | ||
| 92 | - [Open dataset workflow](./open-dataset-workflow.md) | ||
| 93 | - [Training data and pgvector guide](./training-data-and-pgvector-guide.md) | ||
| 94 | - [Production encoder freeze and embedding strategy Q&A](./production-encoder-freeze-and-embedding-strategy.md) | ||
| 95 | - [Dataset sources and onboarding](./dataset-sources-and-licensing.md) | ||
| 96 | - [Industrial benchmark spec](./industrial-benchmark-spec.md) | ||
| 97 | |||
| 98 | Quick-start entry points: | ||
| 99 | - [Open dataset workflow](./open-dataset-workflow.md) | ||
| 100 | - [Local landing directory for open datasets](../acr-engine/data/raw/README.md) | ||
| 101 | - Offline smoke test verified: `acr-engine/scripts/business_export_offline_smoke.py` | ||
| 102 | |||
| 103 | ### C. Business data onboarding | ||
| 104 | - [Business material types and bucket guide](./business-music-bucket-and-type-guide.md) | ||
| 105 | - [Business manifest and type-role spec](./business-manifest-and-type-role-spec.md) | ||
| 106 | - [Business export cookbook](./business-export-cookbook.md) | ||
| 107 | - [Business data to project manifest adapter](./business-project-manifest-adapter.md) | ||
| 108 | |||
| 109 | Shortest business data chain: | ||
| 110 | 1. [Business export cookbook](./business-export-cookbook.md) | ||
| 111 | 2. `acr-engine/scripts/normalize_business_export.py` | ||
| 112 | 3. `acr-engine/scripts/split_business_manifest_ready.py` | ||
| 113 | 4. `acr-engine/scripts/build_business_project_manifests.py` | ||
| 114 | 5. `acr-engine/scripts/business_export_offline_smoke.py` | ||
| 115 | |||
| 116 | ### D. Service and engineering | ||
| 117 | - [Service API](./service-api.md) | ||
| 118 | - [Ongoing development handoff document](./session-handoff.md) | ||
| 119 | - [Current capability map](./current-capability-map.md) | ||
| 120 | - [First-run checklist](../acr-engine/FIRST_RUN_CHECKLIST.md) | ||
| 121 | - [Changelog](./CHANGELOG.md) | ||
| 122 | |||
| 123 | ### E. Research and roadmap | ||
| 124 | - [Industrialization roadmap](./industrialization-roadmap.md) | ||
| 125 | - [SOTA research](./sota-research-2026.md) | ||
| 126 | - [Master list of references and sources](./references-and-sources.md) | ||
| 127 | |||
| 128 | --- | ||
| 129 | |||
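| 130 | ## 4. Notes | ||
| 131 | |||
| 132 | From now on, reduce the reading cost of duplicate documents at the same level: | ||
| 133 | - Group documents on the entry page first | ||
| 134 | - Then keep 1 to 3 main documents per group | ||
| 135 | - Put secondary details inside a group where possible, rather than continuing to add files horizontally | ||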
| 136 | |||
| 137 | --- | ||
| 138 | |||
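| 139 | ## 5. Detailed appendix | ||
| 140 | |||
| 141 | Suggested usage: | ||
| 142 | - To understand the project, read the [Project responsibility map](./project-responsibility-map.md) + [System architecture](./acr-architecture.md) first | ||
| 143 | - To train or evaluate, read the [Dataset spec](./dataset-spec.md) first | ||
| 144 | - To onboard open datasets, read [Dataset sources and onboarding](./dataset-sources-and-licensing.md) first | ||
| 145 | - For the project's history, read the [Changelog](./CHANGELOG.md) | ||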
| 146 | |||
| 147 | ## Sources | ||
| 148 | - This file is an internal documentation navigation artifact for the current repo state. | ... | ... |