Commit 5a01ab7f 5a01ab7faa1dad545daa57c00dd9f624249fb76f by cnb.bofCdSsphPA

Record the first business-corpus voice correctness check

Constraint: the repo needs to distinguish runtime success from business-level song_id correctness before any production claim
Rejected: treating the workspace_music20 smoke as good enough | the current type_7 batch result is top1=0.0 and top3=0.05, which is far below a usable threshold
Confidence: high
Scope-risk: narrow
Directive: keep all future business-corpus voice evaluations written to local_eval artifacts and mirrored into changelog/checklist/handoff before push
Tested: /usr/local/miniconda3/bin/python -m unittest discover -s acr-engine/tests -v; generated acr-engine/data/local_eval/voice_workspace20_type7_eval.json with num_queries=20, top1=0.0, top3=0.05
Not-tested: improved business-corpus correctness after further retrieval tuning
1 parent 356053b7
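
The commit message reports `top1` and `top3` without defining them. Assuming they are top-k hit rates (the fraction of queries whose true song_id appears among the first k returned candidates), the sketch below shows one way to read the numbers. The function names and the synthetic data are hypothetical and are not taken from `acr-engine`. Under that assumption, `top3=0.05` with `num_queries=20` corresponds to exactly one query out of 20 having the correct id in its top three.

```python
import json
from typing import Dict, Sequence


def top_k_hit_rate(ranked: Sequence[Sequence[str]], truth: Sequence[str], k: int) -> float:
    """Fraction of queries whose true song_id is among the first k candidates."""
    if len(ranked) != len(truth):
        raise ValueError("ranked and truth must have the same length")
    if not truth:
        return 0.0
    hits = sum(1 for cands, gold in zip(ranked, truth) if gold in list(cands)[:k])
    return hits / len(truth)


def summarize(ranked: Sequence[Sequence[str]], truth: Sequence[str]) -> Dict[str, float]:
    """Build a summary with the same field names the commit reports (assumed layout)."""
    return {
        "num_queries": len(truth),
        "top1": top_k_hit_rate(ranked, truth, 1),
        "top3": top_k_hit_rate(ranked, truth, 3),
    }


if __name__ == "__main__":
    # Synthetic data only: 20 queries, exactly one with the true id at rank 3.
    truth = [f"song_{i}" for i in range(20)]
    ranked = [[f"wrong_{i}_a", f"wrong_{i}_b", f"wrong_{i}_c"] for i in range(20)]
    ranked[0][2] = truth[0]
    print(json.dumps(summarize(ranked, truth)))
```

A service can return well-formed payloads for every query and still score near zero here, which is the runtime-versus-business-correctness distinction the commit's Constraint line calls for.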
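1 - Added `acr-engine/data/local_eval/voice_workspace20_type7_eval.json`: a 20-query `type_7` batch validation against the current `workspace_music20` semantics gives `top1=0.0`, `top3=0.05`, so business-level song_id correctness is still clearly insufficient.
2 - Current architect review verdict: `APPROVED (WATCH)`. Work may continue on the current architecture, but the current business-corpus result must not be treated as done.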
1 3 - `docs/session-handoff.md` has been refreshed with the latest voice service runtime status: `/health` is available, `/recognize/voice` still times out, and the shortest next troubleshooting path is spelled out
2 4
3 5 ## 2026-06-03 voice-to-chunk and context export foundation
......
@@ -24,7 +24,7 @@ flowchart TD
24 24 | benchmark report generated | |
25 25 | model card generated | |
26 26 | license registry updated | |
-27 | service smoke test passed | partial: `/health` OK, `/recognize/voice` payload returns against `workspace_music20`, but business top1 correctness still needs manual/metric validation |
+27 | service smoke test passed | partial: `/health` OK, `/recognize/voice` payload returns against `workspace_music20`, but batch validation is currently poor (`type_7 top1=0.0`, `top3=0.05`) |
28 28 | dataset whitelist confirmed | |
29 29 | changelog updated | yes |
30 30 | architect review completed | yes (approved with watch) |
......
@@ -46,6 +46,13 @@
46 46 3. Wire the humming evaluation set into `evaluate.py` or a standalone evaluation script
47 47 4. Continue the second round of docs consolidation, keeping only the currently valid main documents
48 48
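49 - Initial business-correctness check on the current `workspace_music20` (`acr-engine/data/local_eval/voice_workspace20_type7_eval.json`):
50 - `num_queries=20`
51 - `top1=0.0`
52 - `top3=0.05`
53 - The current business sample semantics now run end to end, but song_id correctness is still very poor. Optimization must continue, and this must not be treated as a usable recognition capability.
54 - Current architect review verdict: `APPROVED (WATCH)`. Work may continue on the current architecture, but "pipeline works" must be clearly distinguished from "business-correct".
55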
49 56 ### Latest update (2026-06-03 voice service runtime)
50 57
51 58 - Confirmed under the current interpreter `/usr/local/miniconda3/bin/python`:
......