Record the first business-corpus voice correctness check
Constraint: the repo needs to distinguish runtime success from business-level song_id correctness before any production claim
Rejected: treating the workspace_music20 smoke as good enough | the current type_7 batch result is top1=0.0 and top3=0.05, which is far below a usable threshold
Confidence: high
Scope-risk: narrow
Directive: keep all future business-corpus voice evaluations written to local_eval artifacts and mirrored into changelog/checklist/handoff before push
Tested: /usr/local/miniconda3/bin/python -m unittest discover -s acr-engine/tests -v; generated acr-engine/data/local_eval/voice_workspace20_type7_eval.json with num_queries=20, top1=0.0, top3=0.05
Not-tested: improved business-corpus correctness after further retrieval tuning
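For readers unfamiliar with the metrics quoted in the message, here is a minimal sketch of how `top1` and `top3` could be computed, assuming each is a plain hit rate: the fraction of queries whose expected song_id appears among the first k ranked candidates. The function name `top_k_accuracy`, the sample song ids, and the input layout are hypothetical; they are not taken from the repo's evaluation script or from the schema of `voice_workspace20_type7_eval.json`. Under that assumption, `top3=0.05` with `num_queries=20` corresponds to exactly one query with a correct song_id in its top three, and `top1=0.0` to none ranked first.

```python
from typing import Sequence


def top_k_accuracy(expected: Sequence[str], ranked: Sequence[Sequence[str]], k: int) -> float:
    """Fraction of queries whose expected id is among the first k candidates."""
    if len(expected) != len(ranked):
        raise ValueError("expected and ranked must have the same length")
    if not expected:
        return 0.0
    hits = sum(1 for truth, candidates in zip(expected, ranked) if truth in candidates[:k])
    return hits / len(expected)


# Hypothetical data: 20 queries, each with three wrong candidates.
expected = [f"song_{i}" for i in range(20)]
ranked = [[f"wrong_{i}_a", f"wrong_{i}_b", f"wrong_{i}_c"] for i in range(20)]
# Exactly one query has the correct song at rank 2 (never at rank 1).
ranked[7][1] = "song_7"

summary = {
    "num_queries": len(expected),
    "top1": top_k_accuracy(expected, ranked, 1),
    "top3": top_k_accuracy(expected, ranked, 3),
}
print(summary)
```

Keeping the metric in one small function makes it easy to report further cutoffs, such as top5, from the same ranked output without rerunning recognition.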
Showing 4 changed files with 10 additions and 1 deletion
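| 1 | - Added `acr-engine/data/local_eval/voice_workspace20_type7_eval.json`: a batch validation of 20 `type_7` queries against the current `workspace_music20` semantics gives `top1=0.0` and `top3=0.05`, which shows that business song_id correctness is still clearly insufficient. | ||
| 2 | - Current architect review conclusion: `APPROVED (WATCH)`. Work may continue along the current architecture, but the current business-corpus result must not be treated as complete. | ||
| 1 | - `docs/session-handoff.md` has been refreshed to the latest voice service runtime status, stating that `/health` is available, that `/recognize/voice` still times out, and what the shortest next troubleshooting path is | 3 | - `docs/session-handoff.md` has been refreshed to the latest voice service runtime status, stating that `/health` is available, that `/recognize/voice` still times out, and what the shortest next troubleshooting path is |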
| 2 | 4 | ||
| 3 | ## 2026-06-03 voice-to-chunk and context export foundation | 5 | ## 2026-06-03 voice-to-chunk and context export foundation | ... | ... |
| ... | @@ -24,7 +24,7 @@ flowchart TD | ... | @@ -24,7 +24,7 @@ flowchart TD |
| 24 | | benchmark report generated | | | 24 | | benchmark report generated | | |
| 25 | | model card generated | | | 25 | | model card generated | | |
| 26 | | license registry updated | | | 26 | | license registry updated | | |
| 27 | | service smoke test passed | partial: `/health` OK, `/recognize/voice` payload returns against `workspace_music20`, but business top1 correctness still needs manual/metric validation | | 27 | | service smoke test passed | partial: `/health` OK, `/recognize/voice` payload returns against `workspace_music20`, but batch validation is currently poor (`type_7 top1=0.0`, `top3=0.05`) | |
| 28 | | dataset whitelist confirmed | | | 28 | | dataset whitelist confirmed | | |
| 29 | | changelog updated | yes | | 29 | | changelog updated | yes | |
| 30 | | architect review completed | yes (approved with watch) | | 30 | | architect review completed | yes (approved with watch) | | ... | ... |
| ... | @@ -46,6 +46,13 @@ | ... | @@ -46,6 +46,13 @@ |
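| 46 | 3. Wire the humming evaluation set into `evaluate.py` or a standalone evaluation script | 46 | 3. Wire the humming evaluation set into `evaluate.py` or a standalone evaluation script |
| 47 | 4. Continue the second round of docs consolidation, keeping only the currently valid main documents | 47 | 4. Continue the second round of docs consolidation, keeping only the currently valid main documents |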
| 48 | 48 | ||
| 49 | - Initial business-correctness test on the current `workspace_music20` (`acr-engine/data/local_eval/voice_workspace20_type7_eval.json`): | ||
| 50 | - `num_queries=20` | ||
| 51 | - `top1=0.0` | ||
| 52 | - `top3=0.05` | ||
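| 53 | - This shows that although the current business sample semantics now run end to end, song_id correctness is still very poor. Optimization must continue, and the result must not be treated as usable recognition capability. | ||
| 54 | - Current architect review conclusion: `APPROVED (WATCH)`. Work may continue along the current architecture, but "pipeline works" must be clearly distinguished from "business-correct". | ||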
| 55 | |||
| 49 | ### Latest addendum (2026-06-03 voice service runtime) | 56 | ### Latest addendum (2026-06-03 voice service runtime) |
| 50 | 57 | ||
| 51 | - Confirmed under the current interpreter `/usr/local/miniconda3/bin/python`: | 58 | - Confirmed under the current interpreter `/usr/local/miniconda3/bin/python`: | ... | ... |