Commit bd66c06bd7512295f9d9510ddb3ae45a150685c0 by cnb.bofCdSsphPA

Add voice chunking and match-context foundations for ACR service

Constraint: keep humming/recording query support lightweight and compatible with the existing FAISS-first local workflow while production retrieval remains pgvector-oriented
Rejected: delaying service-path scaffolding until full production retrieval is ready | would block validation of voice-to-chunk and context export behavior
Confidence: high
Scope-risk: moderate
Directive: keep semantics song_id-first and treat resource paths only as supporting evidence/context artifacts
Tested: /usr/local/miniconda3/bin/python -m unittest discover -s acr-engine/tests -v
Not-tested: live FastAPI smoke until uvicorn is available in the current interpreter environment
1 parent 69843933
@@ -123,3 +123,29 @@ cd acr-engine
 - Normalize Hybrid scores before fusing them
 - Automatic training in full-demo
 - Open-source datasets can be plugged in later
+
+
+## Humming / recording recognition endpoint (voice -> chunk -> song_id)
+
+Two foundational capabilities are now in place:
+
+- `src/data/voice_chunker.py`: splits raw voice / humming audio into retrievable chunks
+- `src/utils/context_exporter.py`: exports the matched reference window as a context clip (10 s by default)
+
+Target FastAPI endpoint:
+
+- `POST /recognize/voice`
+
+Input:
+- an externally uploaded voice / recording file
+
+Output:
+- `song_id`
+- `reference_audio_path`
+- `best_chunk`
+- `context_clip`
+- `chunk_results`
+
+Notes:
+- The endpoint code is already wired into `src/service/app.py`.
+- `uvicorn` is not yet installed in the current environment, so the service smoke test can only run once that runtime dependency is added.
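The last step of voice -> chunk -> song_id is the song-level aggregation in `_aggregate_chunk_results`, which this commit adds to `src/service/app.py`. Each chunk is recognized separately. Each song is then scored as its best per-chunk confidence plus 0.05 for every chunk-level hit. The sketch below re-expresses only that scoring rule so it can run without the engine. `rank_songs` and `count_bonus` are hypothetical names, and the song ids and confidences are invented.

```python
from __future__ import annotations

from typing import Any


def rank_songs(
    chunk_results: list[dict[str, Any]],
    top_n: int = 5,
    count_bonus: float = 0.05,
) -> list[dict[str, Any]]:
    """Song-level ranking that mirrors the scoring in _aggregate_chunk_results.

    combined = best per-chunk confidence + count_bonus * number of chunk-level hits
    """
    best: dict[str, float] = {}
    hits: dict[str, int] = {}
    for chunk in chunk_results:
        for cand in chunk.get('candidates', []):
            sid = cand['song_id']
            hits[sid] = hits.get(sid, 0) + 1
            best[sid] = max(best.get(sid, -1.0), float(cand['confidence']))
    ranked = [
        {
            'song_id': sid,
            'combined_confidence': round(best[sid] + count_bonus * hits[sid], 4),
            'best_confidence': round(best[sid], 4),
            'match_count': hits[sid],
        }
        for sid in best
    ]
    # Stable sort: ties keep first-seen order, as in app.py.
    ranked.sort(key=lambda r: r['combined_confidence'], reverse=True)
    return ranked[:top_n]


# Per-chunk candidates for one hummed query (ids and scores are invented).
chunk_results = [
    {'candidates': [{'song_id': 'song_b', 'confidence': 0.80}, {'song_id': 'song_a', 'confidence': 0.72}]},
    {'candidates': [{'song_id': 'song_a', 'confidence': 0.70}]},
    {'candidates': [{'song_id': 'song_a', 'confidence': 0.65}, {'song_id': 'song_c', 'confidence': 0.40}]},
]
ranking = rank_songs(chunk_results)
# song_a: 0.72 + 3 * 0.05 = 0.87 ranks above song_b: 0.80 + 1 * 0.05 = 0.85
```

The bonus grows with the number of chunks. A song that recurs across a long recording can therefore overtake one with a higher single-chunk peak. If per-chunk confidences lie in [0, 1], `combined_confidence` can also exceed 1.0.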
......
#!/usr/bin/env /usr/local/miniconda3/bin/python
from __future__ import annotations

import argparse
import json
from pathlib import Path


def main() -> None:
    ap = argparse.ArgumentParser()
    ap.add_argument('--chunks-json', required=True)
    ap.add_argument('--song-id', required=True)
    ap.add_argument('--split', default='test')
    ap.add_argument('--output', required=True)
    ap.add_argument('--source-dataset', default='humming_real')
    args = ap.parse_args()

    payload = json.loads(Path(args.chunks_json).read_text(encoding='utf-8'))
    rows = []
    for chunk in payload.get('chunks', []):
        rows.append({
            'song_id': args.song_id,
            'audio_path': chunk['audio_path'],
            'duration': chunk['duration_sec'],
            'type': 'humming_real',
            'segment_type': 'humming_query',
            'offset': chunk['start_sec'],
            'source_dataset': args.source_dataset,
            'split': args.split,
        })

    out = Path(args.output)
    out.parent.mkdir(parents=True, exist_ok=True)
    out.write_text(json.dumps(rows, ensure_ascii=False, indent=2), encoding='utf-8')
    print(json.dumps({'rows': len(rows), 'output': str(out)}, ensure_ascii=False, indent=2))


if __name__ == '__main__':
    main()
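The manifest script reads the `chunks.json` written by the voice_chunker CLI (`{'chunks': [...]}`) and labels every chunk with the single `--song-id` passed on the command line. To unit-test that mapping without touching the filesystem, the loop could be lifted into a pure function. A sketch follows; `chunks_to_manifest_rows` is a hypothetical name, not part of this commit. One detail is easy to miss: `duration` is copied from `duration_sec`, and for padded chunks that is the length after padding, not the voiced span `end_sec - start_sec`.

```python
from __future__ import annotations

from typing import Any, Iterable


def chunks_to_manifest_rows(
    chunks: Iterable[dict[str, Any]],
    song_id: str,
    split: str = 'test',
    source_dataset: str = 'humming_real',
) -> list[dict[str, Any]]:
    """Map voice_chunker entries to humming_real manifest rows (same fields as the script)."""
    return [
        {
            'song_id': song_id,                  # every chunk of one recording shares the label
            'audio_path': chunk['audio_path'],
            'duration': chunk['duration_sec'],   # padded length when chunk['padded'] is True
            'type': 'humming_real',
            'segment_type': 'humming_query',
            'offset': chunk['start_sec'],
            'source_dataset': source_dataset,
            'split': split,
        }
        for chunk in chunks
    ]


# Two overlapping 8 s chunks (4 s stride) as voice_chunker would describe them.
chunks = [
    {'chunk_id': 'chunk_000', 'audio_path': 'chunks/chunk_000.wav',
     'start_sec': 1.0, 'end_sec': 9.0, 'duration_sec': 8.0, 'padded': False},
    {'chunk_id': 'chunk_001', 'audio_path': 'chunks/chunk_001.wav',
     'start_sec': 5.0, 'end_sec': 13.0, 'duration_sec': 8.0, 'padded': False},
]
rows = chunks_to_manifest_rows(chunks, song_id='song_042')
```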
#!/usr/bin/env /usr/local/miniconda3/bin/python
from __future__ import annotations

import json
import subprocess
import time
from pathlib import Path
from urllib.request import Request, urlopen

BASE = 'http://127.0.0.1:8000'


def post_multipart(url: str, file_path: Path):
    boundary = '----acrboundary'
    data = file_path.read_bytes()
    body = (
        f'--{boundary}\r\n'
        f'Content-Disposition: form-data; name="file"; filename="{file_path.name}"\r\n'
        f'Content-Type: audio/wav\r\n\r\n'
    ).encode('utf-8') + data + f'\r\n--{boundary}--\r\n'.encode('utf-8')
    req = Request(url, data=body, method='POST')
    req.add_header('Content-Type', f'multipart/form-data; boundary={boundary}')
    with urlopen(req, timeout=20) as resp:
        return json.loads(resp.read().decode('utf-8'))


def main():
    cmd = [
        '/usr/local/miniconda3/bin/python', '-m', 'uvicorn', 'src.service.app:app', '--host', '127.0.0.1', '--port', '8000'
    ]
    proc = subprocess.Popen(cmd, cwd='/root/vprecog/acr-engine', stdout=subprocess.PIPE, stderr=subprocess.PIPE, text=True)
    query = Path('/workspace/downloads/111/type_7/75cd601b-7604-4b37-8132-cfab39e7c644.mp3')
    try:
        for _ in range(20):
            time.sleep(0.5)
            try:
                result = post_multipart(BASE + '/recognize/voice', query)
                print(json.dumps({
                    'status': 'ok',
                    'chunk_count': result.get('chunk_count'),
                    'top_song_id': result.get('candidates', [{}])[0].get('song_id') if result.get('candidates') else None,
                    'has_context': bool(result.get('candidates', [{}])[0].get('context_clip')) if result.get('candidates') else False,
                }, ensure_ascii=False, indent=2))
                return
            except Exception:
                continue
        raise SystemExit('service voice smoke failed: service not ready or endpoint failed')
    finally:
        proc.terminate()
        try:
            proc.wait(timeout=5)
        except subprocess.TimeoutExpired:
            proc.kill()
            proc.wait(timeout=5)


if __name__ == '__main__':
    main()
#!/usr/bin/env /usr/local/miniconda3/bin/python
from __future__ import annotations

import argparse
import json
from pathlib import Path
from typing import List, Dict

import librosa
import numpy as np
import soundfile as sf


def normalize_audio(audio_path: str, sr: int = 16000) -> np.ndarray:
    y, _ = librosa.load(audio_path, sr=sr, mono=True)
    return y.astype(np.float32)


def detect_voiced_intervals(y: np.ndarray, sr: int, top_db: int = 30, min_voiced_sec: float = 2.0) -> List[tuple[int, int]]:
    intervals = librosa.effects.split(y, top_db=top_db)
    min_len = int(sr * min_voiced_sec)
    kept = []
    for start, end in intervals:
        if end - start >= min_len:
            kept.append((int(start), int(end)))
    return kept


def chunk_intervals(intervals: List[tuple[int, int]], sr: int, target_chunk_sec: float = 8.0, stride_sec: float = 4.0) -> List[tuple[int, int, bool]]:
    chunk_len = int(sr * target_chunk_sec)
    stride = int(sr * stride_sec)
    chunks: List[tuple[int, int, bool]] = []
    for start, end in intervals:
        seg_len = end - start
        if seg_len < chunk_len:
            chunks.append((start, end, True))
            continue
        pos = start
        while pos + chunk_len <= end:
            chunks.append((pos, pos + chunk_len, False))
            pos += stride
        if pos < end and end - pos >= int(sr * 2.0):
            tail_start = max(start, end - chunk_len)
            chunks.append((tail_start, end, end - tail_start < chunk_len))
    deduped = []
    seen = set()
    for item in chunks:
        key = (item[0], item[1])
        if key not in seen:
            deduped.append(item)
            seen.add(key)
    return deduped


def write_chunks(y: np.ndarray, sr: int, chunks: List[tuple[int, int, bool]], output_dir: str, source_audio_path: str) -> List[Dict]:
    out_dir = Path(output_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    chunk_len = None
    results = []
    for idx, (start, end, padded) in enumerate(chunks):
        clip = y[start:end]
        if chunk_len is None:
            chunk_len = max(len(clip), 1)
        target_len = max(chunk_len, len(clip))
        if padded and len(clip) < target_len:
            clip = np.pad(clip, (0, target_len - len(clip)))
        chunk_path = out_dir / f'chunk_{idx:03d}.wav'
        sf.write(str(chunk_path), clip, sr)
        results.append({
            'chunk_id': f'chunk_{idx:03d}',
            'audio_path': str(chunk_path),
            'start_sec': round(start / sr, 4),
            'end_sec': round(end / sr, 4),
            'duration_sec': round(len(clip) / sr, 4),
            'padded': padded,
            'source_audio_path': source_audio_path,
        })
    return results


def voice_to_chunks(audio_path: str, output_dir: str, target_chunk_sec: float = 8.0, stride_sec: float = 4.0, min_voiced_sec: float = 2.0, top_db: int = 30, sr: int = 16000) -> List[Dict]:
    y = normalize_audio(audio_path, sr=sr)
    intervals = detect_voiced_intervals(y, sr=sr, top_db=top_db, min_voiced_sec=min_voiced_sec)
    chunks = chunk_intervals(intervals, sr=sr, target_chunk_sec=target_chunk_sec, stride_sec=stride_sec)
    return write_chunks(y, sr, chunks, output_dir, source_audio_path=audio_path)


def main() -> None:
    ap = argparse.ArgumentParser()
    ap.add_argument('--input', required=True)
    ap.add_argument('--output-dir', required=True)
    ap.add_argument('--target-chunk-sec', type=float, default=8.0)
    ap.add_argument('--stride-sec', type=float, default=4.0)
    ap.add_argument('--min-voiced-sec', type=float, default=2.0)
    ap.add_argument('--top-db', type=int, default=30)
    ap.add_argument('--sr', type=int, default=16000)
    ap.add_argument('--output-json', default='chunks.json')
    args = ap.parse_args()
    chunks = voice_to_chunks(
        audio_path=args.input,
        output_dir=args.output_dir,
        target_chunk_sec=args.target_chunk_sec,
        stride_sec=args.stride_sec,
        min_voiced_sec=args.min_voiced_sec,
        top_db=args.top_db,
        sr=args.sr,
    )
    out_json = Path(args.output_dir) / args.output_json
    out_json.write_text(json.dumps({'chunks': chunks}, ensure_ascii=False, indent=2), encoding='utf-8')
    print(json.dumps({'chunks': chunks}, ensure_ascii=False, indent=2))


if __name__ == '__main__':
    main()
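`chunk_intervals` works in samples, which makes its window placement hard to read at a glance. The sketch below is a seconds-based mirror of the same logic for a single voiced interval. `plan_windows` and `min_tail_sec` are hypothetical names; the original hard-codes the 2 s tail threshold as `int(sr * 2.0)`. A related observation from reading `write_chunks`, worth confirming: a short (`padded=True`) chunk is padded to the length of the *first* chunk written, not to `target_chunk_sec`. So when the first voiced region is itself short, no padded chunk reaches 8 s.

```python
from __future__ import annotations


def plan_windows(
    start_sec: float,
    end_sec: float,
    chunk_sec: float = 8.0,
    stride_sec: float = 4.0,
    min_tail_sec: float = 2.0,
) -> list[tuple[float, float, bool]]:
    """Seconds-based mirror of chunk_intervals() for one voiced interval."""
    if end_sec - start_sec < chunk_sec:
        return [(start_sec, end_sec, True)]  # short region: one chunk, flagged for padding
    windows: list[tuple[float, float, bool]] = []
    pos = start_sec
    while pos + chunk_sec <= end_sec:  # full-length sliding windows
        windows.append((pos, pos + chunk_sec, False))
        pos += stride_sec
    if pos < end_sec and end_sec - pos >= min_tail_sec:
        # a tail window aligned to the end of the region
        tail_start = max(start_sec, end_sec - chunk_sec)
        windows.append((tail_start, end_sec, end_sec - tail_start < chunk_sec))
    # drop exact duplicates (the tail can coincide with the last full window)
    seen: set[tuple[float, float]] = set()
    unique = []
    for w in windows:
        if (w[0], w[1]) not in seen:
            seen.add((w[0], w[1]))
            unique.append(w)
    return unique


print(plan_windows(0.0, 20.0))  # four windows; the tail (12, 20) is deduplicated
print(plan_windows(0.0, 10.0))  # (0, 8) plus an end-aligned tail (2, 10)
print(plan_windows(0.0, 3.0))   # one short chunk flagged for padding
```

With the defaults, a 20 s hum produces four overlapping 8 s queries. Each of them is sent through `engine.recognize` in `/recognize/voice`.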
 from __future__ import annotations

 from pathlib import Path
+from tempfile import TemporaryDirectory
 from threading import Lock
 from typing import Optional

 import numpy as np
-from fastapi import FastAPI, HTTPException
+from fastapi import FastAPI, File, HTTPException, UploadFile
 from pydantic import BaseModel

+from src.data.voice_chunker import voice_to_chunks
 from src.engines.chromaprint_matcher import ChromaprintMatcher
 from src.engines.ecapa_embedder import ECAPAEmbedder
 from src.engines.hybrid_engine import HybridEngine
 from src.service.settings import ServiceSettings
+from src.utils.context_exporter import export_match_context, find_best_matching_window


 class RecognizeRequest(BaseModel):
@@ -30,7 +33,7 @@ class BuildIndexRequest(BaseModel):
     device: Optional[str] = None


-app = FastAPI(title="ACR Service", version="0.3.0")
+app = FastAPI(title='ACR Service', version='0.4.0')
 settings = ServiceSettings()
 _engine_cache: dict[tuple[str, str, str, str], HybridEngine] = {}
 _cache_lock = Lock()
@@ -38,52 +41,52 @@ _cache_lock = Lock()

 def _resolve(req_data_dir=None, req_model_path=None, req_index_prefix=None, req_device=None):
     return {
-        "data_dir": req_data_dir or settings.data_dir,
-        "model_path": req_model_path or settings.model_path,
-        "index_prefix": req_index_prefix or settings.index_prefix,
-        "device": req_device or settings.device,
+        'data_dir': req_data_dir or settings.data_dir,
+        'model_path': req_model_path or settings.model_path,
+        'index_prefix': req_index_prefix or settings.index_prefix,
+        'device': req_device or settings.device,
     }


 def _readiness_snapshot(data_dir: str, model_path: str, index_prefix: str) -> dict:
-    chroma_path = str(Path(index_prefix).parent / "chromaprint.pkl")
-    embs_path = f"{index_prefix}_embs.npy"
-    ids_path = f"{index_prefix}_ids.npy"
-    manifest_candidates = [str((Path(data_dir) / split).resolve()) for split in ["catalog.json", "train.json", "val.json", "test.json"] if (Path(data_dir) / split).exists()]
+    chroma_path = str(Path(index_prefix).parent / 'chromaprint.pkl')
+    embs_path = f'{index_prefix}_embs.npy'
+    ids_path = f'{index_prefix}_ids.npy'
+    manifest_candidates = [
+        str((Path(data_dir) / split).resolve())
+        for split in ['catalog.json', 'train.json', 'val.json', 'test.json']
+        if (Path(data_dir) / split).exists()
+    ]
     files = {
-        "data_dir": {"path": str(Path(data_dir).resolve()), "exists": Path(data_dir).exists()},
-        "model": {"path": str(Path(model_path).resolve()), "exists": Path(model_path).exists()},
-        "chromaprint_index": {"path": str(Path(chroma_path).resolve()), "exists": Path(chroma_path).exists()},
-        "embedding_index": {"path": str(Path(embs_path).resolve()), "exists": Path(embs_path).exists()},
-        "id_index": {"path": str(Path(ids_path).resolve()), "exists": Path(ids_path).exists()},
+        'data_dir': {'path': str(Path(data_dir).resolve()), 'exists': Path(data_dir).exists()},
+        'model': {'path': str(Path(model_path).resolve()), 'exists': Path(model_path).exists()},
+        'chromaprint_index': {'path': str(Path(chroma_path).resolve()), 'exists': Path(chroma_path).exists()},
+        'embedding_index': {'path': str(Path(embs_path).resolve()), 'exists': Path(embs_path).exists()},
+        'id_index': {'path': str(Path(ids_path).resolve()), 'exists': Path(ids_path).exists()},
     }
-    return {
-        "ready": all(item["exists"] for item in files.values()),
-        "files": files,
-        "manifests": manifest_candidates,
-    }
+    return {'ready': all(item['exists'] for item in files.values()), 'files': files, 'manifests': manifest_candidates}


 def _load_engine_uncached(data_dir: str, model_path: str, index_prefix: str, device: str) -> HybridEngine:
     matcher = ChromaprintMatcher()
-    chroma_path = str(Path(index_prefix).parent / "chromaprint.pkl")
+    chroma_path = str(Path(index_prefix).parent / 'chromaprint.pkl')
     if not Path(chroma_path).exists():
-        raise HTTPException(status_code=400, detail=f"Missing chromaprint index: {chroma_path}")
+        raise HTTPException(status_code=400, detail=f'Missing chromaprint index: {chroma_path}')
     matcher.load(chroma_path)

     if not Path(model_path).exists():
-        raise HTTPException(status_code=400, detail=f"Missing model: {model_path}")
+        raise HTTPException(status_code=400, detail=f'Missing model: {model_path}')
     embedder = ECAPAEmbedder(model_path=model_path, device=device)

-    embs_path = f"{index_prefix}_embs.npy"
-    ids_path = f"{index_prefix}_ids.npy"
+    embs_path = f'{index_prefix}_embs.npy'
+    ids_path = f'{index_prefix}_ids.npy'
     if not Path(embs_path).exists() or not Path(ids_path).exists():
-        raise HTTPException(status_code=400, detail="Missing embedding index files")
+        raise HTTPException(status_code=400, detail='Missing embedding index files')

     ref_embs = np.load(embs_path)
     ref_ids = np.load(ids_path, allow_pickle=True).tolist()
     engine = HybridEngine(matcher, embedder, ref_embs, ref_ids)
-    for split in ["catalog.json", "train.json", "val.json", "test.json"]:
+    for split in ['catalog.json', 'train.json', 'val.json', 'test.json']:
         p = Path(data_dir) / split
         if p.exists():
             engine.load_metadata(str(p))
@@ -105,70 +108,168 @@ def _load_engine(data_dir: str, model_path: str, index_prefix: str, device: str)
 def _cache_stats() -> dict:
     with _cache_lock:
         keys = list(_engine_cache.keys())
-    return {"engine_cache_size": len(keys), "cache_keys": keys}
+    return {'engine_cache_size': len(keys), 'cache_keys': keys}


-@app.get("/health")
+def _aggregate_chunk_results(chunk_results: list[dict], top_n: int) -> list[dict]:
+    by_song: dict[str, dict] = {}
+    for chunk in chunk_results:
+        for cand in chunk.get('candidates', []):
+            song_id = cand['song_id']
+            entry = by_song.setdefault(song_id, {
+                'song_id': song_id,
+                'best_confidence': -1.0,
+                'match_count': 0,
+                'best_chunk': None,
+                'best_candidate': None,
+            })
+            entry['match_count'] += 1
+            if cand['confidence'] > entry['best_confidence']:
+                entry['best_confidence'] = cand['confidence']
+                entry['best_chunk'] = chunk
+                entry['best_candidate'] = cand
+    ranked = []
+    for entry in by_song.values():
+        combined = float(entry['best_confidence']) + 0.05 * float(entry['match_count'])
+        ranked.append({
+            'song_id': entry['song_id'],
+            'combined_confidence': round(combined, 4),
+            'best_confidence': round(float(entry['best_confidence']), 4),
+            'match_count': entry['match_count'],
+            'best_chunk': entry['best_chunk'],
+            'best_candidate': entry['best_candidate'],
+        })
+    ranked.sort(key=lambda x: x['combined_confidence'], reverse=True)
+    return ranked[:top_n]
+
+
+def _reference_audio_for_song(engine: HybridEngine, song_id: str) -> str | None:
+    return engine.song_audio_paths.get(song_id)
+
+
+@app.get('/health')
 def health():
     resolved = _resolve()
-    readiness = _readiness_snapshot(resolved["data_dir"], resolved["model_path"], resolved["index_prefix"])
-    return {
-        "status": "ok",
-        "service": "acr",
-        "version": "0.3.0",
-        "ready": readiness["ready"],
-    }
+    readiness = _readiness_snapshot(resolved['data_dir'], resolved['model_path'], resolved['index_prefix'])
+    return {'status': 'ok', 'service': 'acr', 'version': '0.4.0', 'ready': readiness['ready']}


-@app.get("/ready")
+@app.get('/ready')
 def ready():
     resolved = _resolve()
-    readiness = _readiness_snapshot(resolved["data_dir"], resolved["model_path"], resolved["index_prefix"])
-    return {
-        "service": "acr",
-        "version": "0.3.0",
-        **readiness,
-        **_cache_stats(),
-    }
+    readiness = _readiness_snapshot(resolved['data_dir'], resolved['model_path'], resolved['index_prefix'])
+    return {'service': 'acr', 'version': '0.4.0', **readiness, **_cache_stats()}


-@app.get("/config")
+@app.get('/config')
 def config():
     return settings.model_dump()


-@app.get("/cache")
+@app.get('/cache')
 def cache_status():
     return _cache_stats()


-@app.post("/recognize")
+@app.post('/recognize')
 def recognize(req: RecognizeRequest):
     resolved = _resolve(req.data_dir, req.model_path, req.index_prefix, req.device)
     if not Path(req.query_path).exists():
-        raise HTTPException(status_code=400, detail=f"Missing query file: {req.query_path}")
+        raise HTTPException(status_code=400, detail=f'Missing query file: {req.query_path}')
     engine, cache_hit = _load_engine(**resolved)
     result = engine.recognize(req.query_path, top_n=req.top_n)
-    return {
-        "cache_hit": cache_hit,
-        "resolved": resolved,
-        "result": result,
-    }
+    return {'cache_hit': cache_hit, 'resolved': resolved, 'result': result}


-@app.post("/index/build")
+@app.post('/index/build')
 def build_index(req: BuildIndexRequest):
     from run_demo import build_chroma_index, build_embedding_index

     resolved = _resolve(req.data_dir, req.model_path, None, req.device)
-    data_dir = Path(resolved["data_dir"])
+    data_dir = Path(resolved['data_dir'])
     out_dir = Path(req.output_dir)
     out_dir.mkdir(parents=True, exist_ok=True)
     build_chroma_index(data_dir, out_dir)
-    _, ref_embs, ref_ids = build_embedding_index(data_dir, Path(resolved["model_path"]), out_dir / "reference", resolved["device"])
+    _, ref_embs, ref_ids = build_embedding_index(data_dir, Path(resolved['model_path']), out_dir / 'reference', resolved['device'])
     return {
-        "status": "ok",
-        "num_reference_windows": len(ref_ids),
-        "embedding_dim": int(ref_embs.shape[1]) if len(ref_embs.shape) > 1 else 0,
-        "output_dir": str(out_dir.resolve()),
+        'status': 'ok',
+        'num_reference_windows': len(ref_ids),
+        'embedding_dim': int(ref_embs.shape[1]) if len(ref_embs.shape) > 1 else 0,
+        'output_dir': str(out_dir.resolve()),
     }
+
+
+@app.post('/recognize/voice')
+async def recognize_voice(
+    file: UploadFile = File(...),
+    top_n: int = 5,
+    data_dir: Optional[str] = None,
+    model_path: Optional[str] = None,
+    index_prefix: Optional[str] = None,
+    device: Optional[str] = None,
+    context_sec: float = 10.0,
+    output_format: str = 'mp3',
+):
+    resolved = _resolve(data_dir, model_path, index_prefix, device)
+    engine, cache_hit = _load_engine(**resolved)
+    with TemporaryDirectory(prefix='acr_voice_') as tmpdir:
+        tmp = Path(tmpdir)
+        suffix = Path(file.filename or 'upload.wav').suffix or '.wav'
+        raw_path = tmp / f'input{suffix}'
+        raw_path.write_bytes(await file.read())
+
+        chunk_dir = tmp / 'chunks'
+        chunks = voice_to_chunks(str(raw_path), str(chunk_dir))
+        if not chunks:
+            raise HTTPException(status_code=400, detail='No voiced chunks detected from input audio')
+
+        chunk_results = []
+        for chunk in chunks:
+            result = engine.recognize(chunk['audio_path'], top_n=top_n)
+            chunk_results.append({
+                'chunk': chunk,
+                'candidates': result['candidates'],
+                'processing_time_ms': result['processing_time_ms'],
+            })
+
+        ranked = _aggregate_chunk_results(chunk_results, top_n=top_n)
+        response_candidates = []
+        for item in ranked:
+            song_id = item['song_id']
+            ref_audio = _reference_audio_for_song(engine, song_id)
+            context_info = None
+            if ref_audio and item['best_chunk'] is not None:
+                match = find_best_matching_window(
+                    query_audio_path=item['best_chunk']['chunk']['audio_path'],
+                    reference_audio_path=ref_audio,
+                )
+                out_path = tmp / 'contexts' / f'{song_id}.{output_format}'
+                context_info = export_match_context(
+                    audio_path=ref_audio,
+                    window_start_sec=match['window_start_sec'],
+                    window_end_sec=match['window_end_sec'],
+                    output_path=str(out_path),
+                    context_sec=context_sec,
+                    output_format=output_format,
+                )
+                context_info['match'] = match
+
+            response_candidates.append({
+                'song_id': song_id,
+                'combined_confidence': item['combined_confidence'],
+                'best_confidence': item['best_confidence'],
+                'match_count': item['match_count'],
+                'reference_audio_path': ref_audio,
+                'best_candidate': item['best_candidate'],
+                'best_chunk': item['best_chunk']['chunk'] if item['best_chunk'] else None,
+                'context_clip': context_info,
+            })
+
+        return {
+            'cache_hit': cache_hit,
+            'resolved': resolved,
+            'query_audio_filename': file.filename,
+            'chunk_count': len(chunks),
+            'chunk_results': chunk_results,
+            'candidates': response_candidates,
+        }
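In `recognize_voice`, both the chunk WAVs and the exported context clips are written under the `acr_voice_` `TemporaryDirectory`. That directory is removed as soon as the `with` block is left. So by the time FastAPI serializes the response, `context_clip.output_path` and every `chunk.audio_path` point at files that no longer exist. One possible fix, sketched under the assumption that the service has some long-lived artifact directory (this commit does not define one), is to copy each clip out before returning. `persist_artifact` is a hypothetical helper.

```python
from __future__ import annotations

import shutil
import tempfile
from pathlib import Path
from typing import Optional


def persist_artifact(src_path: str, dest_dir: str, name: Optional[str] = None) -> str:
    """Copy a file out of a soon-to-be-deleted temp dir and return its new path.

    Hypothetical helper: dest_dir stands in for a long-lived artifact location,
    which this commit does not define.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    target = dest / (name or Path(src_path).name)
    shutil.copy2(src_path, target)
    return str(target)


# Demo of the lifetime problem and the fix.
artifact_root = tempfile.mkdtemp(prefix='acr_artifacts_')  # stand-in for a persistent directory
with tempfile.TemporaryDirectory(prefix='acr_voice_') as tmpdir:
    clip = Path(tmpdir) / 'contexts' / 'song_042.wav'
    clip.parent.mkdir(parents=True)
    clip.write_bytes(b'placeholder, not real audio')
    lost = str(clip)                              # the kind of path the endpoint currently returns
    kept = persist_artifact(lost, artifact_root)  # a copy that outlives the temp dir

print(Path(lost).exists(), Path(kept).exists())  # False True
```

In the endpoint, this would mean passing `context_info['output_path']` through such a helper before it is placed in the response. The same applies to each chunk's `audio_path`, if clients need those. For a single clip, returning the bytes directly (for example with FastAPI's `FileResponse`) is another option.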
......
from __future__ import annotations

import shutil
import subprocess
import tempfile
from pathlib import Path
from typing import Dict, Tuple

import librosa
import numpy as np
import soundfile as sf


def load_audio(audio_path: str, sr: int = 16000) -> np.ndarray:
    y, _ = librosa.load(audio_path, sr=sr, mono=True)
    return y.astype(np.float32)


def chroma_embedding(y: np.ndarray, sr: int) -> np.ndarray:
    chroma = librosa.feature.chroma_stft(y=y, sr=sr, n_chroma=12)
    feat = np.concatenate([chroma.mean(axis=1), chroma.std(axis=1)], axis=0).astype(np.float32)
    norm = np.linalg.norm(feat)
    return feat / norm if norm > 0 else feat


def find_best_matching_window(
    query_audio_path: str,
    reference_audio_path: str,
    sr: int = 16000,
    stride_sec: float = 1.0,
) -> Dict:
    query_y = load_audio(query_audio_path, sr=sr)
    ref_y = load_audio(reference_audio_path, sr=sr)
    query_len = len(query_y)
    if query_len == 0:
        raise ValueError('Empty query audio')
    if len(ref_y) < query_len:
        ref_y = np.pad(ref_y, (0, query_len - len(ref_y)))

    query_feat = chroma_embedding(query_y, sr)
    stride = max(1, int(sr * stride_sec))
    best_score = -1.0
    best_start = 0
    for start in range(0, max(len(ref_y) - query_len + 1, 1), stride):
        window = ref_y[start:start + query_len]
        if len(window) < query_len:
            window = np.pad(window, (0, query_len - len(window)))
        score = float(np.dot(query_feat, chroma_embedding(window, sr)))
        if score > best_score:
            best_score = score
            best_start = start

    return {
        'window_start_sec': round(best_start / sr, 4),
        'window_end_sec': round((best_start + query_len) / sr, 4),
        'window_score': round(best_score, 6),
        'query_duration_sec': round(query_len / sr, 4),
    }


def export_match_context(
    audio_path: str,
    window_start_sec: float,
    window_end_sec: float,
    output_path: str,
    context_sec: float = 10.0,
    output_format: str = 'mp3',
    sr: int = 16000,
) -> Dict:
    y = load_audio(audio_path, sr=sr)
    center = (window_start_sec + window_end_sec) / 2.0
    half = context_sec / 2.0
    clip_start_sec = max(0.0, center - half)
    clip_end_sec = min(len(y) / sr, center + half)
    start = int(clip_start_sec * sr)
    end = max(start + 1, int(clip_end_sec * sr))
    clip = y[start:end]

    output = Path(output_path)
    output.parent.mkdir(parents=True, exist_ok=True)
    actual_format = output_format

    if output_format == 'mp3' and shutil.which('ffmpeg'):
        with tempfile.TemporaryDirectory() as tmp:
            wav_path = Path(tmp) / 'context.wav'
            sf.write(wav_path, clip, sr)
            cmd = [shutil.which('ffmpeg') or 'ffmpeg', '-y', '-i', str(wav_path), str(output)]
            subprocess.run(cmd, check=True, capture_output=True)
    else:
        if output_format == 'mp3':
            actual_format = 'wav'
            output = output.with_suffix('.wav')
        sf.write(output, clip, sr)

    return {
        'source_audio_path': audio_path,
        'clip_start_sec': round(clip_start_sec, 4),
        'clip_end_sec': round(clip_end_sec, 4),
        'duration_sec': round((end - start) / sr, 4),
        'output_path': str(output),
        'output_format': actual_format,
    }
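`export_match_context` centres a `context_sec` clip on the matched window and clamps it to the file. Two consequences are easy to miss. First, the window returned by `find_best_matching_window` is as long as the query chunk (8 s for a full-length chunk with the defaults), so a 10 s context adds only about 1 s on each side. Second, near the start or end of the reference, the clip is truncated rather than shifted. The sketch mirrors that arithmetic as `context_bounds`. It also includes `context_bounds_shifted`, a hypothetical alternative that is not in this commit and slides the clip to keep its full length.

```python
from __future__ import annotations


def context_bounds(window_start_sec: float, window_end_sec: float,
                   total_sec: float, context_sec: float = 10.0) -> tuple[float, float]:
    """Mirror of the clip-bound arithmetic in export_match_context: centre, then clamp."""
    center = (window_start_sec + window_end_sec) / 2.0
    half = context_sec / 2.0
    return max(0.0, center - half), min(total_sec, center + half)


def context_bounds_shifted(window_start_sec: float, window_end_sec: float,
                           total_sec: float, context_sec: float = 10.0) -> tuple[float, float]:
    """Hypothetical variant: slide the clip inside [0, total_sec] to keep its full length."""
    length = min(context_sec, total_sec)
    center = (window_start_sec + window_end_sec) / 2.0
    start = min(max(0.0, center - length / 2.0), total_sec - length)
    return start, start + length


# Matched windows inside a 200 s reference track:
print(context_bounds(60.0, 68.0, 200.0))            # (59.0, 69.0): 1 s of context each side
print(context_bounds(0.0, 6.0, 200.0))              # (0.0, 8.0): truncated at the start
print(context_bounds_shifted(0.0, 6.0, 200.0))      # (0.0, 10.0)
print(context_bounds_shifted(195.0, 200.0, 200.0))  # (190.0, 200.0)
```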
from pathlib import Path
import sys

ROOT = Path(__file__).resolve().parents[1]
if str(ROOT) not in sys.path:
    sys.path.insert(0, str(ROOT))
import tempfile
import unittest
from pathlib import Path

import test_bootstrap

import numpy as np
import soundfile as sf

from src.utils.context_exporter import export_match_context, find_best_matching_window


class ContextExporterTests(unittest.TestCase):
    def test_find_best_matching_window_returns_valid_range(self):
        sr = 16000
        with tempfile.TemporaryDirectory() as tmp:
            query = Path(tmp) / 'query.wav'
            ref = Path(tmp) / 'ref.wav'
            tone = 0.2 * np.sin(2 * np.pi * 440 * np.linspace(0, 3, sr * 3, endpoint=False)).astype(np.float32)
            ref_y = np.concatenate([np.zeros(sr), tone, np.zeros(sr)]).astype(np.float32)
            sf.write(query, tone, sr)
            sf.write(ref, ref_y, sr)
            match = find_best_matching_window(str(query), str(ref), sr=sr, stride_sec=0.5)
            self.assertGreaterEqual(match['window_start_sec'], 0.0)
            self.assertGreater(match['window_end_sec'], match['window_start_sec'])

    def test_export_match_context_writes_audio(self):
        sr = 16000
        with tempfile.TemporaryDirectory() as tmp:
            ref = Path(tmp) / 'ref.wav'
            out = Path(tmp) / 'context.wav'
            y = 0.2 * np.sin(2 * np.pi * 440 * np.linspace(0, 12, sr * 12, endpoint=False)).astype(np.float32)
            sf.write(ref, y, sr)
            info = export_match_context(str(ref), 4.0, 7.0, str(out), context_sec=10.0, output_format='wav', sr=sr)
            self.assertTrue(Path(info['output_path']).exists())
            self.assertEqual(info['output_format'], 'wav')


if __name__ == '__main__':
    unittest.main()
@@ -2,6 +2,8 @@ import tempfile
 import unittest
 from pathlib import Path

+import test_bootstrap
+
 from scripts.local_music20_acr import collect_pairs, first_file


......
import tempfile
import unittest
from pathlib import Path

import test_bootstrap

import numpy as np
import soundfile as sf

from src.data.voice_chunker import detect_voiced_intervals, chunk_intervals, voice_to_chunks


class VoiceChunkerTests(unittest.TestCase):
    def test_detect_voiced_intervals_filters_short_segments(self):
        sr = 16000
        y = np.concatenate([
            np.zeros(sr),
            0.2 * np.sin(2 * np.pi * 440 * np.linspace(0, 3, sr * 3, endpoint=False)),
            np.zeros(sr // 2),
        ]).astype(np.float32)
        intervals = detect_voiced_intervals(y, sr=sr, top_db=30, min_voiced_sec=2.0)
        self.assertEqual(len(intervals), 1)

    def test_chunk_intervals_handles_short_and_long_regions(self):
        sr = 16000
        chunks = chunk_intervals([(0, sr * 3), (sr * 5, sr * 15)], sr=sr, target_chunk_sec=8.0, stride_sec=4.0)
        self.assertTrue(any(padded for _, _, padded in chunks))
        self.assertGreaterEqual(len(chunks), 2)

    def test_voice_to_chunks_writes_chunk_files(self):
        sr = 16000
        with tempfile.TemporaryDirectory() as tmp:
            src = Path(tmp) / 'hum.wav'
            out = Path(tmp) / 'chunks'
            y = np.concatenate([
                np.zeros(sr),
                0.2 * np.sin(2 * np.pi * 330 * np.linspace(0, 4, sr * 4, endpoint=False)),
                np.zeros(sr),
            ]).astype(np.float32)
            sf.write(src, y, sr)
            chunks = voice_to_chunks(str(src), str(out), target_chunk_sec=3.0, stride_sec=2.0, min_voiced_sec=2.0, sr=sr)
            self.assertGreaterEqual(len(chunks), 1)
            self.assertTrue(Path(chunks[0]['audio_path']).exists())


if __name__ == '__main__':
    unittest.main()

+## 2026-06-03 voice-to-chunk and context export foundation
+
+- Added `acr-engine/src/data/voice_chunker.py`, which splits voice / humming audio into chunks.
+- Added `acr-engine/scripts/build_humming_eval_manifest.py`, which generates a `humming_real` evaluation manifest from chunk results.
+- Added `acr-engine/src/utils/context_exporter.py`, which exports the matched reference window as a context clip.
+- Extended `acr-engine/src/service/app.py` with a first draft of the `POST /recognize/voice` endpoint.
+- Trimmed the docs entry point `docs/README.md` down to the latest architecture and the shortest reading order.
+
+Fresh evidence:
+- `/usr/local/miniconda3/bin/python -m unittest discover -s acr-engine/tests -v` => `Ran 7 tests, OK`
+- `uvicorn` is missing from the current environment, so the service smoke test cannot be started yet; the runtime dependency must be installed first.
+
+
 ## 2026-06-03 20-song local ACR workflow in acr-engine

 - Added `acr-engine/scripts/local_music20_acr.py`, which provides a small-sample local ACR workflow for 20 songs inside `acr-engine`, based on `/workspace/downloads`.
......
1 # ACR Docs Overview 1 # ACR Docs Overview
2 2
3 > 更新:2026-06-02 3 > 保留最新架构与最短落地入口。历史细节仍在仓库中,但默认阅读只保留下面 6 份主文档。
4 4
5 ## 一页结论 5 ## 最短阅读顺序
6 6
7 当前文档入口过多,现统一浓缩为 **5 组主文档** 7 1. [session-handoff.md](./session-handoff.md)
8 2. [CHANGELOG.md](./CHANGELOG.md)
9 3. [acr-architecture.md](./acr-architecture.md)
10 4. [dataset-spec.md](./dataset-spec.md)
11 5. [training-data-and-pgvector-guide.md](./training-data-and-pgvector-guide.md)
12 6. [runbook.md](./runbook.md)
8 13
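9 1. **Project and architecture** 14 ## Currently recommended categories
10 2. **Data and evaluation**
11 3. **Business data ingestion**
12 4. **Service and engineering**
13 5. **Research and roadmap**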
14 15
15 Start with these 5 groups only; there is no need to read every detailed document at once. 16 ### 1. Project architecture
17 - [acr-architecture.md](./acr-architecture.md)
18 - [session-handoff.md](./session-handoff.md)
16 19
17 --- 20 ### 2. Data and evaluation
21 - [dataset-spec.md](./dataset-spec.md)
22 - [training-data-and-pgvector-guide.md](./training-data-and-pgvector-guide.md)
23 - [open-dataset-workflow.md](./open-dataset-workflow.md)
18 24
19 ## 1. Documentation map 25 ### 3. Operations and service
26 - [runbook.md](./runbook.md)
27 - [service-api.md](./service-api.md)
20 28
21 ```mermaid 29 ### 4. Latest hard-case findings
22 flowchart TD 30 - [acr-hard-case-analysis.md](../acr-engine/../docs/acr-hard-case-analysis.md)
23 A[Docs Entry] --> B[Project Responsibility]
24 A --> C[Architecture]
25 A --> D[Dataset Spec]
26 A --> E[Business Export Chain]
27 A --> F[Service API]
28 A --> G[Industrial Benchmark]
29 A --> H[Industrialization Roadmap]
30 A --> I[Licensing & Sources]
31 A --> J[SOTA Research]
32 31
33 B --> C 32 ## Current architecture at a glance
34 C --> D
35 D --> E
36 E --> F
37 G --> H
38 I --> H
39 J --> H
40 ```
41 33
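42 --- 34 - `/workspace`: source of samples and raw material
43 35 - `acr-engine/`: main project for training, indexing, recognition, and serving
44 ## 2. Condensed reading entry points 36 - Local small-sample validation: prefer **FAISS**
45 37 - Production vector retrieval: standardize on **pgvector**
46 | Reader role | Read first |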
47 |---|---|
48 | New member | [Project and architecture](./project-responsibility-map.md), [System architecture](./acr-architecture.md) |
49 | Algorithms / models | [Dataset spec](./dataset-spec.md), [SOTA survey](./sota-research-2026.md) |
50 | Platform / backend | [Service API](./service-api.md), [Evaluation spec](./industrial-benchmark-spec.md) |
51 | Data ingestion | [Open dataset workflow](./open-dataset-workflow.md), [Business export cookbook](./business-export-cookbook.md) |
52 | Leads / planning | [Industrialization roadmap](./industrialization-roadmap.md), [Handoff doc](./session-handoff.md) |
53
54 ---
55
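56 ## 2.5 Shortest reading order for a new session
57
58 If you are picking the project up in a new session, follow this order:
59
60 1. [Ongoing development handoff](./session-handoff.md)
61 2. [Changelog](./CHANGELOG.md)
62 3. [Business export cookbook](./business-export-cookbook.md) or [Open dataset workflow](./open-dataset-workflow.md)
63
64 How to choose:
65 - Ingesting your own business material: read `business-export-cookbook.md` first
66 - Working with open datasets such as FMA / MTG-Jamendo: read `open-dataset-workflow.md` first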
67
68 ## 2.6 Shortest runnable command for a new session
69
70 If you only want to confirm that the business export chain still runs, execute:
71
72 ```bash
73 cd /workspace/acr-engine
74 /usr/local/miniconda3/bin/python scripts/business_export_offline_smoke.py \
75 --output-root /tmp/business_export_offline_smoke
76 ```
77
78 Expected results:
79 - Sample business exports are generated
80 - Manifest-ready JSONL is generated
81 - Project `catalog/train/test/val` manifests are generated
82 - `train.py --dry-run` passes
83
84 ## 3. Core document groups
85
86 ### A. Project and architecture
87 - [Project responsibility map](./project-responsibility-map.md)
88 - [System architecture](./acr-architecture.md)
89
90 ### B. Data and evaluation
91 - [Dataset spec](./dataset-spec.md)
92 - [Open dataset workflow](./open-dataset-workflow.md)
93 - [Training data and pgvector guide](./training-data-and-pgvector-guide.md)
94 - [Production encoder freezing and embedding strategy FAQ](./production-encoder-freeze-and-embedding-strategy.md)
95 - [Dataset sources and ingestion](./dataset-sources-and-licensing.md)
96 - [Industrial evaluation spec](./industrial-benchmark-spec.md)
97
98 Quick-start entry points:
99 - [Open dataset workflow](./open-dataset-workflow.md)
100 - [Local landing directory for open data](../acr-engine/data/raw/README.md)
101 - Offline smoke test verified: `acr-engine/scripts/business_export_offline_smoke.py`
102
103 ### C. Business data ingestion
104 - [Business material types and bucket guide](./business-music-bucket-and-type-guide.md)
105 - [Business manifest and type-role spec](./business-manifest-and-type-role-spec.md)
106 - [Business export cookbook](./business-export-cookbook.md)
107 - [Adapting business data to project manifests](./business-project-manifest-adapter.md)
108
109 Shortest business-data pipeline:
110 1. [Business export cookbook](./business-export-cookbook.md)
111 2. `acr-engine/scripts/normalize_business_export.py`
112 3. `acr-engine/scripts/split_business_manifest_ready.py`
113 4. `acr-engine/scripts/build_business_project_manifests.py`
114 5. `acr-engine/scripts/business_export_offline_smoke.py`
115
116 ### D. Service and engineering
117 - [Service API](./service-api.md)
118 - [Ongoing development handoff](./session-handoff.md)
119 - [Current capability map](./current-capability-map.md)
120 - [First-run checklist](../acr-engine/FIRST_RUN_CHECKLIST.md)
121 - [Changelog](./CHANGELOG.md)
122
123 ### E. Research and roadmap
124 - [Industrialization roadmap](./industrialization-roadmap.md)
125 - [SOTA survey](./sota-research-2026.md)
126 - [Master list of references and sources](./references-and-sources.md)
127
128 ---
129
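130 ## 4. Notes
131
132 From now on, we cut the reading cost of overlapping same-level documents by:
133 - grouping documents on the entry page first
134 - keeping 1 to 3 core documents within each group
135 - placing secondary details inside their group instead of adding more top-level files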
136
137 ---
138
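139 ## 5. Detail appendix
140
141 Suggested usage:
142 - To understand the project, read the [Project responsibility map](./project-responsibility-map.md) and [System architecture](./acr-architecture.md) first
143 - For training / evaluation, read the [Dataset spec](./dataset-spec.md) first
144 - For open-dataset ingestion, read [Dataset sources and ingestion](./dataset-sources-and-licensing.md) first
145 - For the history of changes, read the [Changelog](./CHANGELOG.md)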
146
147 ## Sources
148 - This file is an internal documentation navigation artifact for the current repo state.
......