Extend the business-corpus voice correctness baseline to type8 and type16

Constraint: we need a complete hard-query picture before claiming the workspace_music20 voice lane is usable or deciding where pgvector work should start
Rejected: extrapolating from type_7 alone | the type_8 and type_16 lanes can fail differently and need their own measured baselines
Confidence: high
Scope-risk: narrow
Directive: keep all future business-corpus voice evaluations split by query type so we can see exactly which hard lanes fail and why
Tested: /usr/local/miniconda3/bin/python -m unittest discover -s acr-engine/tests -v; generated voice_workspace20_type8_eval.json (top1=0.0, top3=0.0) and voice_workspace20_type16_eval.json (top1=0.0, top3=0.0)
Not-tested: improved business-corpus voice correctness after moving to embedding/pgvector retrieval
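
The directive to keep evaluations split by query type can be illustrated with a minimal sketch. This is not the repository's evaluation code: the function name `topk_by_query_type` and the record keys `query_type`, `expected_song_id`, and `ranked_song_ids` are hypothetical, chosen only for illustration. The output keys `num_queries`, `top1`, and `top3` mirror the fields reported for the eval JSON files.

```python
from collections import defaultdict


def topk_by_query_type(results, ks=(1, 3)):
    """Compute top-k accuracy separately for each query type.

    `results` is an iterable of dicts with the hypothetical keys
    'query_type', 'expected_song_id', and 'ranked_song_ids'
    (candidate song IDs, best match first).
    """
    hits = defaultdict(lambda: {k: 0 for k in ks})
    counts = defaultdict(int)
    for record in results:
        query_type = record["query_type"]
        counts[query_type] += 1
        for k in ks:
            # A hit at k means the expected song is among the first k candidates.
            if record["expected_song_id"] in record["ranked_song_ids"][:k]:
                hits[query_type][k] += 1
    report = {}
    for query_type, total in counts.items():
        entry = {"num_queries": total}
        for k in ks:
            entry[f"top{k}"] = hits[query_type][k] / total
        report[query_type] = entry
    return report
```

Reporting one entry per query type, rather than one pooled number, keeps a lane that fails completely from being masked by a lane that partially works.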

Commit a0ceb991 ... a0ceb99163204d4f7cf57634c2319a8aa613335d authored 2026-06-03 18:11:11 +0800 by cnb.bofCdSsphPA
Showing 5 changed files with 6 additions and 1 deletion
acr-engine/data/local_eval/voice_workspace20_type16_eval.json
acr-engine/data/local_eval/voice_workspace20_type8_eval.json
docs/CHANGELOG.md
docs/release-checklist.md
docs/session-handoff.md
--- a/acr-engine/data/local_eval/voice_workspace20_type16_eval.json 0 → 100644
+++ b/acr-engine/data/local_eval/voice_workspace20_type16_eval.json 0 → 100644
--- a/acr-engine/data/local_eval/voice_workspace20_type8_eval.json 0 → 100644
+++ b/acr-engine/data/local_eval/voice_workspace20_type8_eval.json 0 → 100644
--- a/docs/CHANGELOG.md
+++ b/docs/CHANGELOG.md
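 - Added `acr-engine/data/local_eval/voice_workspace20_type7_eval.json`, a batch validation of 20 `type_7` queries against the current `workspace_music20` semantics: `top1=0.0`, `top3=0.05`, which shows that business song_id correctness is still clearly insufficient.
+- Added `acr-engine/data/local_eval/voice_workspace20_type8_eval.json` and `voice_workspace20_type16_eval.json`, extending the business-corpus voice correctness baseline: `type_8 top1=0.0/top3=0.0`, `type_16 top1=0.0/top3=0.0`.
 - Current architect review conclusion: `APPROVED (WATCH)`; work may continue along the current architecture, but the current business-corpus results must not be treated as complete.
 - `docs/session-handoff.md` has been refreshed with the latest voice service runtime status, stating that `/health` is available, that `/recognize/voice` still times out, and what the shortest next troubleshooting path is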
--- a/docs/release-checklist.md
+++ b/docs/release-checklist.md
@@ -24,7 +24,7 @@ flowchart TD
 | benchmark report generated |  |
 | model card generated |  |
 | license registry updated |  |
-| service smoke test passed | partial: `/health` OK, `/recognize/voice` payload returns against `workspace_music20`, but batch validation is currently poor (`type_7 top1=0.0`, `top3=0.05`) |
+| service smoke test passed | partial: `/health` OK, `/recognize/voice` payload returns against `workspace_music20`, but batch validation is currently poor (`type_7 top1=0.0/top3=0.05`, `type_8 top1=0.0/top3=0.0`, `type_16 top1=0.0/top3=0.0`) |
 | dataset whitelist confirmed |  |
 | changelog updated | yes |
 | architect review completed | yes (approved with watch) |
--- a/docs/session-handoff.md
+++ b/docs/session-handoff.md
@@ -51,6 +51,10 @@
  - `top1=0.0`
  - `top3=0.05`
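  - This shows that although the current business sample semantics now run end to end, song_id correctness is still very poor; optimization must continue, and the result must not be treated as usable recognition capability.
+- The `type_8 / type_16` business-corpus voice correctness baselines have now been added as well:
+  - `voice_workspace20_type8_eval.json`: `num_queries=15`, `top1=0.0`, `top3=0.0`
+  - `voice_workspace20_type16_eval.json`: `num_queries=12`, `top1=0.0`, `top3=0.0`
+  - This shows that the current `/workspace`-based local chroma+FAISS voice lane is almost unusable on hard queries; follow-up work should prioritize an evaluation path closer to production, based on embedding/pgvector retrieval.
 - Current architect review conclusion: `APPROVED (WATCH)`; work may continue along the current architecture, but "the pipeline runs" must be clearly distinguished from "the business result is correct".
 ### Latest update (2026-06-03 voice service runtime)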