Commit 5a01ab7faa1dad545daa57c00dd9f624249fb76f by cnb.bofCdSsphPA

Record the first business-corpus voice correctness check

Constraint: the repo needs to distinguish runtime success from business-level song_id correctness before any production claim
Rejected: treating the workspace_music20 smoke test as sufficient | the current type_7 batch result is top1=0.0 and top3=0.05, which is far below a usable threshold
Confidence: high
Scope-risk: narrow
Directive: keep all future business-corpus voice evaluations written to local_eval artifacts and mirrored into changelog/checklist/handoff before push
Tested: /usr/local/miniconda3/bin/python -m unittest discover -s acr-engine/tests -v; generated acr-engine/data/local_eval/voice_workspace20_type7_eval.json with num_queries=20, top1=0.0, top3=0.05
Not-tested: improved business-corpus correctness after further retrieval tuning
1 parent 356053b7
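- Added `acr-engine/data/local_eval/voice_workspace20_type7_eval.json`, a batch validation of 20 `type_7` queries against the current `workspace_music20` semantics. It yields `top1=0.0` and `top3=0.05`, which shows that business song_id correctness is still clearly insufficient.
- Current architect review verdict: `APPROVED (WATCH)`. Work may continue along the current architecture, but the current business-corpus results must not be treated as complete.
- `docs/session-handoff.md` has been refreshed with the latest voice service runtime state. It states that `/health` is available, that `/recognize/voice` still times out, and what the shortest next debugging path is.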
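To make the numbers concrete: with `num_queries=20`, `top3=0.05` means exactly one query had the correct song_id among its first three candidates (1/20), and `top1=0.0` means no query ranked it first. The sketch below shows one way such top-k accuracy could be computed. It assumes each query has one expected song_id and a ranked candidate list; the function name `topk_accuracy` and the sample data are hypothetical and do not describe the actual schema of the eval JSON.

```python
from typing import Sequence


def topk_accuracy(expected: Sequence[str], ranked: Sequence[Sequence[str]], k: int) -> float:
    """Fraction of queries whose expected song_id appears in the first k ranked candidates.

    Hypothetical helper for illustration; not the repo's evaluation code.
    """
    if len(expected) != len(ranked):
        raise ValueError("expected and ranked must have the same length")
    if not expected:
        raise ValueError("need at least one query")
    hits = sum(1 for want, cands in zip(expected, ranked) if want in cands[:k])
    return hits / len(expected)


# Hypothetical 20-query run: only query 0 retrieves its song_id, and only at rank 2.
expected = [f"song_{i}" for i in range(20)]
ranked = [["song_x", "song_0", "song_y"]] + [["song_x", "song_y", "song_z"] for _ in range(19)]

print(topk_accuracy(expected, ranked, k=1))  # 0.0
print(topk_accuracy(expected, ranked, k=3))  # 0.05
```

With only 20 queries, each hit moves the metric by 0.05, so small changes after retrieval tuning should be read with that granularity in mind.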
## 2026-06-03 voice-to-chunk and context export foundation
......
@@ -24,7 +24,7 @@ flowchart TD
| benchmark report generated | |
| model card generated | |
| license registry updated | |
-| service smoke test passed | partial: `/health` OK, `/recognize/voice` payload returns against `workspace_music20`, but business top1 correctness still needs manual/metric validation |
+| service smoke test passed | partial: `/health` OK, `/recognize/voice` payload returns against `workspace_music20`, but batch validation is currently poor (`type_7 top1=0.0`, `top3=0.05`) |
| dataset whitelist confirmed | |
| changelog updated | yes |
| architect review completed | yes (approved with watch) |
......
@@ -46,6 +46,13 @@
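3. Wire the humming evaluation set into `evaluate.py` or a standalone evaluation script
4. Continue the second round of docs consolidation, keeping only the currently valid main documents
- Initial business-correctness test on the current `workspace_music20` (`acr-engine/data/local_eval/voice_workspace20_type7_eval.json`):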
- `num_queries=20`
- `top1=0.0`
- `top3=0.05`
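  - This shows that although the current business-sample path now works end to end, song_id correctness is still very poor. It must be optimized further and must not be treated as a usable recognition capability.
- Current architect review verdict: `APPROVED (WATCH)`. Work may continue along the current architecture, but "the pipeline works" must be clearly distinguished from "the results are business-correct".
### Latest addendum (2026-06-03 voice service runtime)
- Confirmed under the current interpreter `/usr/local/miniconda3/bin/python`: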
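The distinction between "the pipeline works" and "the results are business-correct" can be made mechanical instead of a judgment call. The sketch below is a hypothetical gate, assuming the eval JSON exposes `num_queries`, `top1` and `top3` as top-level keys (the values match the ones recorded above, but the schema is an assumption). `EvalSummary`, `load_summary`, `classify` and the `min_top1=0.8` threshold are illustrative names and values, not something the repo defines.

```python
import json
from dataclasses import dataclass


@dataclass
class EvalSummary:
    num_queries: int
    top1: float
    top3: float


def load_summary(text: str) -> EvalSummary:
    # Assumes num_queries/top1/top3 are top-level keys (hypothetical schema).
    data = json.loads(text)
    return EvalSummary(int(data["num_queries"]), float(data["top1"]), float(data["top3"]))


def classify(summary: EvalSummary, runtime_ok: bool, min_top1: float = 0.8) -> str:
    # runtime_ok stands for "the service answered" (e.g. /health and /recognize/voice returned).
    # min_top1 is an illustrative threshold, not a value defined by the repo.
    if not runtime_ok:
        return "runtime_broken"
    if summary.num_queries == 0:
        return "no_eval"
    if summary.top1 < min_top1:
        return "pipeline_ok_business_not_ready"
    return "business_ready"


summary = load_summary('{"num_queries": 20, "top1": 0.0, "top3": 0.05}')
print(classify(summary, runtime_ok=True))  # pipeline_ok_business_not_ready
```

Keeping runtime status and metric status as separate inputs means a passing smoke test can never, on its own, promote the result to "business_ready".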
......