Route voice recognition through the workspace music20 corpus

Constraint: external voice uploads now need a business-sample-backed path before any pgvector production cutover, while still staying lightweight enough for CPU smoke tests
Rejected: waiting for full pgvector service integration before proving a business-corpus path | would leave the external voice interface unvalidated against real sample references
Confidence: medium
Scope-risk: moderate
Directive: treat workspace_music20 as a proving lane only; validate business top1 correctness before promoting its defaults or claiming production readiness
Tested: /usr/local/miniconda3/bin/python -m unittest discover -s acr-engine/tests -v; /usr/local/miniconda3/bin/python acr-engine/scripts/service_voice_smoke.py -> status ok, corpus=workspace_music20, chunk_count=1, top_song_id=109, has_context=true
Not-tested: pgvector-backed /recognize/voice production retrieval path

Commit 356053b7 ... 356053b724a8ac7522a9fe46509121ab00632715 authored 2026-06-03 18:07:28 +0800 by cnb.bofCdSsphPA
Showing 4 changed files with 11 additions and 10 deletions
acr-engine/scripts/service_voice_smoke.py
acr-engine/src/service/app.py
docs/release-checklist.md
docs/session-handoff.md
--- a/acr-engine/scripts/service_voice_smoke.py
+++ b/acr-engine/scripts/service_voice_smoke.py
@@ -16,18 +16,16 @@ def post_multipart(url: str, file_path: Path):
    body = (
        f'--{boundary}\r\n'
        f'Content-Disposition: form-data; name="file"; filename="{file_path.name}"\r\n'
-        f'Content-Type: audio/wav\r\n\r\n'
+        f'Content-Type: audio/mpeg\r\n\r\n'
    ).encode('utf-8') + data + f'\r\n--{boundary}--\r\n'.encode('utf-8')
-    req = Request(url + '?top_n=1&max_chunks=1&include_context=false', data=body, method='POST')
+    req = Request(url + '?top_n=1&max_chunks=1&include_context=true&corpus=workspace_music20', data=body, method='POST')
    req.add_header('Content-Type', f'multipart/form-data; boundary={boundary}')
-    with urlopen(req, timeout=20) as resp:
+    with urlopen(req, timeout=60) as resp:
        return json.loads(resp.read().decode('utf-8'))
 def main():
-    cmd = [
-        '/usr/local/miniconda3/bin/python', '-m', 'uvicorn', 'src.service.app:app', '--host', '127.0.0.1', '--port', '8000'
-    ]
+    cmd = ['/usr/local/miniconda3/bin/python', '-m', 'uvicorn', 'src.service.app:app', '--host', '127.0.0.1', '--port', '8000']
    proc = subprocess.Popen(cmd, cwd='/root/vprecog/acr-engine', stdout=subprocess.PIPE, stderr=subprocess.PIPE, text=True)
    query = Path('/workspace/downloads/111/type_7/75cd601b-7604-4b37-8132-cfab39e7c644.mp3')
    try:
@@ -35,11 +33,14 @@ def main():
            time.sleep(0.5)
            try:
                result = post_multipart(BASE + '/recognize/voice', query)
+                top = result.get('candidates', [{}])[0] if result.get('candidates') else {}
                print(json.dumps({
                    'status': 'ok',
+                    'corpus': result.get('corpus'),
                    'chunk_count': result.get('chunk_count'),
-                    'top_song_id': result.get('candidates', [{}])[0].get('song_id') if result.get('candidates') else None,
+                    'top_song_id': top.get('song_id'),
-                    'has_context': bool(result.get('candidates', [{}])[0].get('context_clip')) if result.get('candidates') else False,
+                    'has_context': bool(top.get('context_clip')),
+                    'reference_audio_path': top.get('reference_audio_path'),
                }, ensure_ascii=False, indent=2))
                return
            except Exception:
--- a/acr-engine/src/service/app.py
+++ b/acr-engine/src/service/app.py
--- a/docs/release-checklist.md
+++ b/docs/release-checklist.md
@@ -24,7 +24,7 @@ flowchart TD
 | benchmark report generated |  |
 | model card generated |  |
 | license registry updated |  |
-| service smoke test passed | partial: `/health` OK, `/recognize/voice` payload returns, but still bound to synthetic service index rather than business reference corpus |
+| service smoke test passed | partial: `/health` OK, `/recognize/voice` payload returns against `workspace_music20`, but business top1 correctness still needs manual/metric validation |
 | dataset whitelist confirmed |  |
 | changelog updated | yes |
 | architect review completed | yes (approved with watch) |
--- a/docs/session-handoff.md
+++ b/docs/session-handoff.md
@@ -30,7 +30,7 @@
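  - `acr-engine/src/service/app.py` now includes `POST /recognize/voice`
  - `/health` starts normally and returns `ok`
  - architect review: approved with watch; the current split (local FAISS / optional ChromaDB / production pgvector) is a sound direction
-  - `POST /recognize/voice` is now past the missing-dependency and timeout stage: the CPU build of `torch` is installed, `uvicorn` / `fastapi` / `python-multipart` are installed, `/health` returns `ok`, and the voice smoke returns a payload (`chunk_count=1`, `top_song_id=song_0022`, `has_context=false`); the remaining problem is that the service default is still bound to synthetic index semantics and has not yet switched to the `/workspace` business catalog reference
+  - `POST /recognize/voice` is now past the missing-dependency and timeout stage: the CPU build of `torch` is installed, `uvicorn` / `fastapi` / `python-multipart` are installed, and `/health` returns `ok`; the voice smoke has also been switched to `corpus=workspace_music20` and returns `chunk_count=1`, `top_song_id=109`, `has_context=true`, along with a real `/workspace` reference path. The remaining work is to keep checking whether this top1 matches business expectations; the request path itself is no longer broken.
 - The docs have been through a first round of simplification:
  - `docs/README.md` keeps only the latest architecture and the shortest reading order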
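
The commit leaves business top1 correctness as the open validation item. A minimal sketch of such a check follows, assuming the `/recognize/voice` response keeps the `candidates[0].song_id` shape that the smoke script reads. The helper names `top1_song_id` and `top1_accuracy` and the labelled pairs below are hypothetical and are not part of the repository.

```python
from typing import Any, Iterable, Mapping, Optional, Tuple


def top1_song_id(result: Mapping[str, Any]) -> Optional[str]:
    """Return the first candidate's song_id as a string, or None if absent."""
    candidates = result.get('candidates') or []
    if not candidates:
        return None
    song_id = candidates[0].get('song_id')
    return None if song_id is None else str(song_id)


def top1_accuracy(pairs: Iterable[Tuple[Mapping[str, Any], str]]) -> float:
    """Fraction of (response, expected_song_id) pairs whose top candidate matches."""
    pairs = list(pairs)
    if not pairs:
        raise ValueError('need at least one labelled query')
    hits = sum(1 for result, expected in pairs
               if top1_song_id(result) == str(expected))
    return hits / len(pairs)


if __name__ == '__main__':
    # Hypothetical responses and expected labels, for illustration only.
    labelled = [
        ({'corpus': 'workspace_music20', 'candidates': [{'song_id': 109}]}, '109'),
        ({'corpus': 'workspace_music20', 'candidates': [{'song_id': 7}]}, '12'),
        ({'corpus': 'workspace_music20', 'candidates': []}, '3'),
    ]
    print(f'top1 accuracy: {top1_accuracy(labelled):.2f}')
```

Comparing IDs as strings avoids a false mismatch when the service returns an integer `song_id` and the label file stores text. An empty candidate list counts as a miss rather than raising, so a single failed query does not abort the whole run.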