Commit a0ceb991 a0ceb99163204d4f7cf57634c2319a8aa613335d by cnb.bofCdSsphPA

Extend the business-corpus voice correctness baseline to type8 and type16

Constraint: we need a complete hard-query picture before claiming the workspace_music20 voice lane is usable or deciding where pgvector work should start
Rejected: extrapolating from type_7 alone | the type_8 and type_16 lanes can fail differently and need their own measured baselines
Confidence: high
Scope-risk: narrow
Directive: keep all future business-corpus voice evaluations split by query type so we can see exactly which hard lanes fail and why
Tested: /usr/local/miniconda3/bin/python -m unittest discover -s acr-engine/tests -v; generated voice_workspace20_type8_eval.json (top1=0.0, top3=0.0) and voice_workspace20_type16_eval.json (top1=0.0, top3=0.0)
Not-tested: improved business-corpus voice correctness after moving to embedding/pgvector retrieval
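
The per-type split that the Directive asks for can be sketched as below. This is a minimal illustration, not the repository's evaluation code: the function name `topk_by_query_type` and the record keys `query_type`, `expected_song_id`, and `ranked_song_ids` are assumptions made for the example, and the real eval JSON files may be structured differently.

```python
from collections import defaultdict


def topk_by_query_type(results, ks=(1, 3)):
    """Compute top-k hit rates separately for each query type.

    `results` is an iterable of dicts with hypothetical keys:
      - "query_type": e.g. "type_7", "type_8", "type_16"
      - "expected_song_id": the ground-truth business song_id
      - "ranked_song_ids": candidate song_ids, best match first
    """
    hits = defaultdict(lambda: {k: 0 for k in ks})
    counts = defaultdict(int)

    for record in results:
        query_type = record["query_type"]
        counts[query_type] += 1
        for k in ks:
            # A hit at k means the expected id is among the first k candidates.
            if record["expected_song_id"] in record["ranked_song_ids"][:k]:
                hits[query_type][k] += 1

    # One summary per query type, so a failing lane is never hidden
    # behind an aggregate number.
    return {
        query_type: {
            "num_queries": n,
            **{f"top{k}": hits[query_type][k] / n for k in ks},
        }
        for query_type, n in counts.items()
    }
```

Keeping one summary per query type, rather than a pooled score, is what lets a lane such as `type_8` show up as failing on its own.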
1 parent 5a01ab7f
@@ -1,4 +1,5 @@
 - Added `acr-engine/data/local_eval/voice_workspace20_type7_eval.json`, a batch validation of 20 `type_7` queries against the current `workspace_music20` semantics: `top1=0.0`, `top3=0.05`, showing that business song_id correctness is still clearly insufficient.
+- Added `acr-engine/data/local_eval/voice_workspace20_type8_eval.json` and `voice_workspace20_type16_eval.json` to extend the business-corpus voice correctness baseline: `type_8 top1=0.0/top3=0.0`, `type_16 top1=0.0/top3=0.0`.
 - Current architect review conclusion: `APPROVED (WATCH)`. Work may continue on the current architecture, but the current business-corpus results must not be treated as done.
 - `docs/session-handoff.md` has been refreshed with the latest voice service runtime status: `/health` is available, `/recognize/voice` still times out, and the shortest next debugging path is spelled out.

......
@@ -24,7 +24,7 @@ flowchart TD
 | benchmark report generated | |
 | model card generated | |
 | license registry updated | |
-| service smoke test passed | partial: `/health` OK, `/recognize/voice` payload returns against `workspace_music20`, but batch validation is currently poor (`type_7 top1=0.0`, `top3=0.05`) |
+| service smoke test passed | partial: `/health` OK, `/recognize/voice` payload returns against `workspace_music20`, but batch validation is currently poor (`type_7 top1=0.0/top3=0.05`, `type_8 top1=0.0/top3=0.0`, `type_16 top1=0.0/top3=0.0`) |
 | dataset whitelist confirmed | |
 | changelog updated | yes |
 | architect review completed | yes (approved with watch) |
......
@@ -51,6 +51,10 @@
 - `top1=0.0`
 - `top3=0.05`
 - This shows that although the current business sample semantics now work end to end, song_id correctness is still very poor; it needs further optimization and must not be treated as a usable recognition capability.
+- The business-corpus voice correctness baseline has now been extended to `type_8` / `type_16`:
+- `voice_workspace20_type8_eval.json`: `num_queries=15`, `top1=0.0`, `top3=0.0`
+- `voice_workspace20_type16_eval.json`: `num_queries=12`, `top1=0.0`, `top3=0.0`
+- This shows that the current local chroma+FAISS voice lane based on `/workspace` is nearly unusable on hard queries; follow-up work should prioritize moving to an embedding/pgvector evaluation path that is closer to production.
 - Current architect review conclusion: `APPROVED (WATCH)`. Work may continue on the current architecture, but "pipeline works" must be clearly distinguished from "business-correct".

 ### Latest update (2026-06-03 voice service runtime)
......