Route voice recognition through the workspace music20 corpus

Constraint: external voice uploads now need a business-sample-backed path before any pgvector production cutover, while still staying lightweight enough for CPU smoke tests
Rejected: waiting for full pgvector service integration before proving a business-corpus path | would leave the external voice interface unvalidated against real sample references
Confidence: medium
Scope-risk: moderate
Directive: treat workspace_music20 as a proving lane only; validate business top1 correctness before promoting its defaults or claiming production readiness
Tested: /usr/local/miniconda3/bin/python -m unittest discover -s acr-engine/tests -v; /usr/local/miniconda3/bin/python acr-engine/scripts/service_voice_smoke.py -> status ok, corpus=workspace_music20, chunk_count=1, top_song_id=109, has_context=true
Not-tested: pgvector-backed /recognize/voice production retrieval path

Commit 356053b7 ... 356053b724a8ac7522a9fe46509121ab00632715 authored 2026-06-03 18:07:28 +0800 by cnb.bofCdSsphPA
Showing 4 changed files with 97 additions and 67 deletions
acr-engine/scripts/service_voice_smoke.py
acr-engine/src/service/app.py
docs/release-checklist.md
docs/session-handoff.md
--- a/acr-engine/scripts/service_voice_smoke.py
+++ b/acr-engine/scripts/service_voice_smoke.py
@@ -16,18 +16,16 @@ def post_multipart(url: str, file_path: Path):
    body = (
        f'--{boundary}\r\n'
        f'Content-Disposition: form-data; name="file"; filename="{file_path.name}"\r\n'
-        f'Content-Type: audio/wav\r\n\r\n'
+        f'Content-Type: audio/mpeg\r\n\r\n'
    ).encode('utf-8') + data + f'\r\n--{boundary}--\r\n'.encode('utf-8')
-    req = Request(url + '?top_n=1&max_chunks=1&include_context=false', data=body, method='POST')
+    req = Request(url + '?top_n=1&max_chunks=1&include_context=true&corpus=workspace_music20', data=body, method='POST')
    req.add_header('Content-Type', f'multipart/form-data; boundary={boundary}')
-    with urlopen(req, timeout=20) as resp:
+    with urlopen(req, timeout=60) as resp:
        return json.loads(resp.read().decode('utf-8'))


 def main():
-    cmd = [
-        '/usr/local/miniconda3/bin/python', '-m', 'uvicorn', 'src.service.app:app', '--host', '127.0.0.1', '--port', '8000'
-    ]
+    cmd = ['/usr/local/miniconda3/bin/python', '-m', 'uvicorn', 'src.service.app:app', '--host', '127.0.0.1', '--port', '8000']
    proc = subprocess.Popen(cmd, cwd='/root/vprecog/acr-engine', stdout=subprocess.PIPE, stderr=subprocess.PIPE, text=True)
    query = Path('/workspace/downloads/111/type_7/75cd601b-7604-4b37-8132-cfab39e7c644.mp3')
    try:
@@ -35,11 +33,14 @@ def main():
            time.sleep(0.5)
            try:
                result = post_multipart(BASE + '/recognize/voice', query)
+                top = result.get('candidates', [{}])[0] if result.get('candidates') else {}
                print(json.dumps({
                    'status': 'ok',
+                    'corpus': result.get('corpus'),
                    'chunk_count': result.get('chunk_count'),
-                    'top_song_id': result.get('candidates', [{}])[0].get('song_id') if result.get('candidates') else None,
-                    'has_context': bool(result.get('candidates', [{}])[0].get('context_clip')) if result.get('candidates') else False,
+                    'top_song_id': top.get('song_id'),
+                    'has_context': bool(top.get('context_clip')),
+                    'reference_audio_path': top.get('reference_audio_path'),
                }, ensure_ascii=False, indent=2))
                return
            except Exception:
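
The summary printed by the smoke script reads the top candidate defensively, so an empty `candidates` list yields `None`/`False` instead of raising. A minimal sketch of that pattern follows, assuming the response shape shown in this diff; the helper name `summarize_voice_result` and the sample payload are hypothetical and not part of the commit.

```python
import json


def summarize_voice_result(result: dict) -> dict:
    """Condense a /recognize/voice response into the smoke-test summary."""
    # Treat a missing or empty 'candidates' list the same way.
    candidates = result.get('candidates') or []
    top = candidates[0] if candidates else {}
    return {
        'status': 'ok',
        'corpus': result.get('corpus'),
        'chunk_count': result.get('chunk_count'),
        'top_song_id': top.get('song_id'),
        'has_context': bool(top.get('context_clip')),
        'reference_audio_path': top.get('reference_audio_path'),
    }


# Hypothetical payload, shaped like the response built in app.py.
sample = {
    'corpus': 'workspace_music20',
    'chunk_count': 1,
    'candidates': [
        {'song_id': '109', 'context_clip': {'match': {}}, 'reference_audio_path': '/tmp/ref.mp3'},
    ],
}
summary = summarize_voice_result(sample)
empty = summarize_voice_result({'candidates': []})
print(json.dumps(summary, ensure_ascii=False, indent=2))
```

Keeping the extraction in one place avoids repeating the `result.get('candidates', [{}])[0]` expression for every field, which is what the removed lines did.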
--- a/acr-engine/src/service/app.py
+++ b/acr-engine/src/service/app.py
@@ -5,6 +5,7 @@ from tempfile import TemporaryDirectory
 from threading import Lock
 from typing import Optional

+import faiss
 import numpy as np
 from fastapi import FastAPI, File, HTTPException, UploadFile
 from pydantic import BaseModel
@@ -12,6 +13,7 @@ from pydantic import BaseModel
 from src.data.voice_chunker import voice_to_chunks
 from src.service.settings import ServiceSettings
 from src.utils.context_exporter import export_match_context, find_best_matching_window
+from scripts.local_music20_acr import REFERENCE_TYPE, SUPPORTED_QUERY_TYPES, embed_chroma, first_file


 class RecognizeRequest(BaseModel):
@@ -30,7 +32,7 @@ class BuildIndexRequest(BaseModel):
    device: Optional[str] = None


-app = FastAPI(title='ACR Service', version='0.4.0')
+app = FastAPI(title='ACR Service', version='0.5.0')
 settings = ServiceSettings()
 _engine_cache: dict[tuple[str, str, str, str], object] = {}
 _cache_lock = Lock()
@@ -70,23 +72,20 @@ def _load_engine_uncached(data_dir: str, model_path: str, index_prefix: str, dev
        from src.engines.ecapa_embedder import ECAPAEmbedder
        from src.engines.hybrid_engine import HybridEngine
    except Exception as exc:
-        raise HTTPException(status_code=500, detail=f"Engine dependencies unavailable: {exc}")
+        raise HTTPException(status_code=500, detail=f'Engine dependencies unavailable: {exc}')

    matcher = ChromaprintMatcher()
    chroma_path = str(Path(index_prefix).parent / 'chromaprint.pkl')
    if not Path(chroma_path).exists():
        raise HTTPException(status_code=400, detail=f'Missing chromaprint index: {chroma_path}')
    matcher.load(chroma_path)
-
    if not Path(model_path).exists():
        raise HTTPException(status_code=400, detail=f'Missing model: {model_path}')
    embedder = ECAPAEmbedder(model_path=model_path, device=device)
-
    embs_path = f'{index_prefix}_embs.npy'
    ids_path = f'{index_prefix}_ids.npy'
    if not Path(embs_path).exists() or not Path(ids_path).exists():
        raise HTTPException(status_code=400, detail='Missing embedding index files')
-
    ref_embs = np.load(embs_path)
    ref_ids = np.load(ids_path, allow_pickle=True).tolist()
    engine = HybridEngine(matcher, embedder, ref_embs, ref_ids)
@@ -147,22 +146,54 @@ def _aggregate_chunk_results(chunk_results: list[dict], top_n: int) -> list[dict
    return ranked[:top_n]


-def _reference_audio_for_song(engine: HybridEngine, song_id: str) -> str | None:
-    return engine.song_audio_paths.get(song_id)
+def _reference_audio_for_song(engine, song_id: str) -> str | None:
+    return getattr(engine, 'song_audio_paths', {}).get(song_id)
+
+
+def _workspace_reference_map(downloads_dir: Path, song_limit: int = 20) -> list[dict]:
+    refs = []
+    for song_dir in sorted(p for p in downloads_dir.iterdir() if p.is_dir()):
+        ref = first_file(song_dir / f'type_{REFERENCE_TYPE}')
+        if ref:
+            refs.append({'song_id': song_dir.name, 'reference_path': str(ref)})
+        if len(refs) >= song_limit:
+            break
+    return refs
+
+
+def _workspace_faiss_candidates(query_audio_path: str, downloads_dir: Path, song_limit: int, sr: int, duration: float, top_n: int) -> list[dict]:
+    refs = _workspace_reference_map(downloads_dir, song_limit)
+    if not refs:
+        return []
+    ref_vecs = [embed_chroma(item['reference_path'], sr, duration) for item in refs]
+    qry_vec = embed_chroma(query_audio_path, sr, duration).reshape(1, -1).astype(np.float32)
+    ref_matrix = np.vstack(ref_vecs).astype(np.float32)
+    index = faiss.IndexFlatIP(ref_matrix.shape[1])
+    index.add(ref_matrix)
+    sims, idxs = index.search(qry_vec, top_n)
+    results = []
+    for j in range(top_n):
+        ref_idx = int(idxs[0, j])
+        results.append({
+            'song_id': refs[ref_idx]['song_id'],
+            'confidence': float(sims[0, j]),
+            'reference_path': refs[ref_idx]['reference_path'],
+        })
+    return results


 @app.get('/health')
 def health():
    resolved = _resolve()
    readiness = _readiness_snapshot(resolved['data_dir'], resolved['model_path'], resolved['index_prefix'])
-    return {'status': 'ok', 'service': 'acr', 'version': '0.4.0', 'ready': readiness['ready']}
+    return {'status': 'ok', 'service': 'acr', 'version': '0.5.0', 'ready': readiness['ready']}


 @app.get('/ready')
 def ready():
    resolved = _resolve()
    readiness = _readiness_snapshot(resolved['data_dir'], resolved['model_path'], resolved['index_prefix'])
-    return {'service': 'acr', 'version': '0.4.0', **readiness, **_cache_stats()}
+    return {'service': 'acr', 'version': '0.5.0', **readiness, **_cache_stats()}


 @app.get('/config')
@@ -188,19 +219,13 @@ def recognize(req: RecognizeRequest):
 @app.post('/index/build')
 def build_index(req: BuildIndexRequest):
    from run_demo import build_chroma_index, build_embedding_index
-
    resolved = _resolve(req.data_dir, req.model_path, None, req.device)
    data_dir = Path(resolved['data_dir'])
    out_dir = Path(req.output_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    build_chroma_index(data_dir, out_dir)
    _, ref_embs, ref_ids = build_embedding_index(data_dir, Path(resolved['model_path']), out_dir / 'reference', resolved['device'])
-    return {
-        'status': 'ok',
-        'num_reference_windows': len(ref_ids),
-        'embedding_dim': int(ref_embs.shape[1]) if len(ref_embs.shape) > 1 else 0,
-        'output_dir': str(out_dir.resolve()),
-    }
+    return {'status': 'ok', 'num_reference_windows': len(ref_ids), 'embedding_dim': int(ref_embs.shape[1]) if len(ref_embs.shape) > 1 else 0, 'output_dir': str(out_dir.resolve())}


 @app.post('/recognize/voice')
@@ -215,29 +240,61 @@ async def recognize_voice(
    output_format: str = 'mp3',
    max_chunks: int = 3,
    include_context: bool = True,
+    corpus: str = 'synthetic',
+    downloads_dir: str = '/workspace/downloads',
+    song_limit: int = 20,
+    local_duration_sec: float = 8.0,
+    local_sr: int = 22050,
 ):
-    resolved = _resolve(data_dir, model_path, index_prefix, device)
-    engine, cache_hit = _load_engine(**resolved)
    with TemporaryDirectory(prefix='acr_voice_') as tmpdir:
        tmp = Path(tmpdir)
        suffix = Path(file.filename or 'upload.wav').suffix or '.wav'
        raw_path = tmp / f'input{suffix}'
        raw_path.write_bytes(await file.read())
-
        chunk_dir = tmp / 'chunks'
        chunks = voice_to_chunks(str(raw_path), str(chunk_dir), max_chunks=max_chunks)
        if not chunks:
            raise HTTPException(status_code=400, detail='No voiced chunks detected from input audio')

        chunk_results = []
+        if corpus == 'workspace_music20':
+            for chunk in chunks:
+                candidates = _workspace_faiss_candidates(chunk['audio_path'], Path(downloads_dir), song_limit, local_sr, local_duration_sec, top_n)
+                chunk_results.append({'chunk': chunk, 'candidates': candidates, 'processing_time_ms': None})
+            ranked = _aggregate_chunk_results(chunk_results, top_n=top_n)
+            response_candidates = []
+            for item in ranked:
+                ref_audio = item['best_candidate']['reference_path'] if item.get('best_candidate') else None
+                context_info = None
+                if include_context and ref_audio and item['best_chunk'] is not None:
+                    match = find_best_matching_window(item['best_chunk']['chunk']['audio_path'], ref_audio)
+                    out_path = tmp / 'contexts' / f"{item['song_id']}.{output_format}"
+                    context_info = export_match_context(ref_audio, match['window_start_sec'], match['window_end_sec'], str(out_path), context_sec=context_sec, output_format=output_format)
+                    context_info['match'] = match
+                response_candidates.append({
+                    'song_id': item['song_id'],
+                    'combined_confidence': item['combined_confidence'],
+                    'best_confidence': item['best_confidence'],
+                    'match_count': item['match_count'],
+                    'reference_audio_path': ref_audio,
+                    'best_candidate': item['best_candidate'],
+                    'best_chunk': item['best_chunk']['chunk'] if item['best_chunk'] else None,
+                    'context_clip': context_info,
+                })
+            return {
+                'cache_hit': False,
+                'corpus': corpus,
+                'query_audio_filename': file.filename,
+                'chunk_count': len(chunks),
+                'chunk_results': chunk_results,
+                'candidates': response_candidates,
+            }
+
+        resolved = _resolve(data_dir, model_path, index_prefix, device)
+        engine, cache_hit = _load_engine(**resolved)
        for chunk in chunks:
            result = engine.recognize(chunk['audio_path'], top_n=top_n)
-            chunk_results.append({
-                'chunk': chunk,
-                'candidates': result['candidates'],
-                'processing_time_ms': result['processing_time_ms'],
-            })
-
+            chunk_results.append({'chunk': chunk, 'candidates': result['candidates'], 'processing_time_ms': result['processing_time_ms']})
        ranked = _aggregate_chunk_results(chunk_results, top_n=top_n)
        response_candidates = []
        for item in ranked:
@@ -245,37 +302,9 @@ async def recognize_voice(
            ref_audio = _reference_audio_for_song(engine, song_id)
            context_info = None
            if include_context and ref_audio and item['best_chunk'] is not None:
-                match = find_best_matching_window(
-                    query_audio_path=item['best_chunk']['chunk']['audio_path'],
-                    reference_audio_path=ref_audio,
-                )
+                match = find_best_matching_window(query_audio_path=item['best_chunk']['chunk']['audio_path'], reference_audio_path=ref_audio)
                out_path = tmp / 'contexts' / f'{song_id}.{output_format}'
-                context_info = export_match_context(
-                    audio_path=ref_audio,
-                    window_start_sec=match['window_start_sec'],
-                    window_end_sec=match['window_end_sec'],
-                    output_path=str(out_path),
-                    context_sec=context_sec,
-                    output_format=output_format,
-                )
+                context_info = export_match_context(audio_path=ref_audio, window_start_sec=match['window_start_sec'], window_end_sec=match['window_end_sec'], output_path=str(out_path), context_sec=context_sec, output_format=output_format)
                context_info['match'] = match
-
-            response_candidates.append({
-                'song_id': song_id,
-                'combined_confidence': item['combined_confidence'],
-                'best_confidence': item['best_confidence'],
-                'match_count': item['match_count'],
-                'reference_audio_path': ref_audio,
-                'best_candidate': item['best_candidate'],
-                'best_chunk': item['best_chunk']['chunk'] if item['best_chunk'] else None,
-                'context_clip': context_info,
-            })
-
-        return {
-            'cache_hit': cache_hit,
-            'resolved': resolved,
-            'query_audio_filename': file.filename,
-            'chunk_count': len(chunks),
-            'chunk_results': chunk_results,
-            'candidates': response_candidates,
-        }
+            response_candidates.append({'song_id': song_id, 'combined_confidence': item['combined_confidence'], 'best_confidence': item['best_confidence'], 'match_count': item['match_count'], 'reference_audio_path': ref_audio, 'best_candidate': item['best_candidate'], 'best_chunk': item['best_chunk']['chunk'] if item['best_chunk'] else None, 'context_clip': context_info})
+        return {'cache_hit': cache_hit, 'resolved': resolved, 'corpus': corpus, 'query_audio_filename': file.filename, 'chunk_count': len(chunks), 'chunk_results': chunk_results, 'candidates': response_candidates}
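
Review note on `_workspace_faiss_candidates`: the loop runs `range(top_n)` over the search output, but a flat FAISS index pads its result with `-1` labels when `top_n` exceeds the number of indexed vectors, and `refs[-1]` in Python silently selects the last reference. The sketch below shows one way to clamp `k`. It assumes plain NumPy as a stand-in for `faiss.IndexFlatIP` so that it runs without FAISS installed; the function name `top_k_inner_product` and the toy vectors are hypothetical.

```python
import numpy as np


def top_k_inner_product(query: np.ndarray, refs: np.ndarray, ref_ids: list[str], top_n: int) -> list[dict]:
    """Exhaustive inner-product search over reference vectors.

    NumPy stand-in for a flat inner-product index; never returns more
    hits than there are references.
    """
    if refs.size == 0 or top_n <= 0:
        return []
    # Clamp k so that a large top_n cannot produce padded or invalid indices.
    k = min(top_n, refs.shape[0])
    sims = refs.astype(np.float32) @ query.astype(np.float32).reshape(-1)
    # Sort by descending similarity; the stable sort keeps ties in reference order.
    order = np.argsort(-sims, kind='stable')[:k]
    return [{'song_id': ref_ids[i], 'confidence': float(sims[i])} for i in order]


# Toy example: three unit-length reference vectors, top_n larger than the corpus.
refs = np.array([[1.0, 0.0], [0.0, 1.0], [0.6, 0.8]], dtype=np.float32)
ids = ['101', '102', '103']
hits = top_k_inner_product(np.array([0.0, 1.0]), refs, ids, top_n=5)
```

In the FAISS version the equivalent guard would be searching with `min(top_n, len(refs))` or skipping negative indices. Inner product equals cosine similarity only for L2-normalized vectors; the diff does not show whether `embed_chroma` normalizes its output, so the `confidence` values may not be bounded by 1.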
--- a/docs/release-checklist.md
+++ b/docs/release-checklist.md
@@ -24,7 +24,7 @@ flowchart TD
 | benchmark report generated |  |
 | model card generated |  |
 | license registry updated |  |
-| service smoke test passed | partial: `/health` OK, `/recognize/voice` payload returns, but still bound to synthetic service index rather than business reference corpus |
+| service smoke test passed | partial: `/health` OK, `/recognize/voice` payload returns against `workspace_music20`, but business top1 correctness still needs manual/metric validation |
 | dataset whitelist confirmed |  |
 | changelog updated | yes |
 | architect review completed | yes (approved with watch) |
--- a/docs/session-handoff.md
+++ b/docs/session-handoff.md
@@ -30,7 +30,7 @@
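  - `acr-engine/src/service/app.py` now includes `POST /recognize/voice`
  - `/health` starts normally and returns `ok`
  - architect review: approved with watch; the current split (local FAISS / optional ChromaDB / production pgvector) is a sound direction
-  - `POST /recognize/voice` is now past the missing-dependency and timeout stage: the CPU build of `torch` is installed, `uvicorn` / `fastapi` / `python-multipart` are installed, `/health` returns `ok`, and the voice smoke test returns a payload (`chunk_count=1`, `top_song_id=song_0022`, `has_context=false`); the remaining issue is that the service default is still bound to synthetic index semantics and has not yet switched to the `/workspace` business corpus reference
+  - `POST /recognize/voice` is now past the missing-dependency and timeout stage: the CPU build of `torch` is installed, `uvicorn` / `fastapi` / `python-multipart` are installed, and `/health` returns `ok`; the voice smoke test has also switched to `corpus=workspace_music20`, returning `chunk_count=1`, `top_song_id=109`, `has_context=true`, along with a real `/workspace` reference path. The remaining work is to keep verifying that this top1 matches business expectations; the pipeline itself is no longer the blocker.
 - The docs have had a first round of simplification:
  - `docs/README.md` keeps only the latest architecture and the shortest reading order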
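
Both docs leave "business top1 correctness" as the open item. A minimal sketch of such a check is below, assuming a list of (expected, predicted) song-id pairs collected from smoke runs; the function name `top1_accuracy` and the example pairs are hypothetical. In the recorded smoke run, the query file sits under `downloads/111/` while the reported `top_song_id` is 109; if directory names are the ground-truth song ids (as `_workspace_reference_map` uses them), that single case would count as a miss.

```python
from typing import Optional


def top1_accuracy(cases: list[tuple[str, Optional[str]]]) -> float:
    """Fraction of cases whose top-1 prediction equals the expected song id."""
    if not cases:
        return 0.0
    hits = sum(1 for expected, predicted in cases if predicted == expected)
    return hits / len(cases)


# Hypothetical (expected_song_id, predicted_top_song_id) pairs.
# None stands for a query that returned no candidates.
cases = [('109', '109'), ('111', '109'), ('120', '120'), ('131', None)]
score = top1_accuracy(cases)
print(f'top1 accuracy: {score:.2f}')
```

A metric like this only becomes meaningful once it runs over more than the single query used by the smoke script.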