Commit 9de8092d5982e71e52d18fd3247b5eb11d2ca4eb by cnb.bofCdSsphPA

Freeze the production encoder before scaling the music index

Document the production decision to stabilize the embedding space before onboarding a 300k-song catalog, and record the migration rules for future encoder upgrades.

Constraint: 300k-song production rollout makes embedding churn expensive and risky
Rejected: keep iterating on the encoder before defining a production embedding version | would force repeated full-vector rebuilds and unstable rollout criteria
Confidence: high
Scope-risk: narrow
Directive: Treat encoder changes as versioned index migrations, not in-place model swaps
Tested: reviewed rendered markdown content, docs index link, changelog entry, and git diff for the three touched docs
Not-tested: git push / remote sync outcome depends on repository remote state
1 parent 73d28fae
### Stage: production encoder freeze FAQ and rollout guidance
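Completed:
- Added document:
  - `docs/production-encoder-freeze-and-embedding-strategy.md`
- The document covers:
  - Why the encoder should be frozen now
  - The generalization limits of the current architecture
  - How other songs can use the model directly once its weights are externalized
  - How a wav/mp3/flac/ogg collection can quickly enter the manifest -> build-index -> evaluate pipeline
  - Governance recommendations for embeddings, versions, and indexes with a 300k-song production catalog
  - Which data must be rebuilt after an encoder upgrade and which metadata can be kept
- Docs entry point updated:
  - `docs/README.md` now links to this FAQ document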
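The rebuild-vs-keep question above can be made concrete with a small planning helper. This is a minimal sketch, assuming that anything computed from encoder outputs (reference embeddings, the vector index, evaluation reports) must be rebuilt, while source audio, catalog metadata, and the manifest are encoder-independent. `Artifact`, `plan_encoder_upgrade`, and the artifact names are hypothetical and are not taken from the repository.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Artifact:
    name: str
    # True if the artifact is computed from encoder outputs (vectors, indexes, scores).
    depends_on_encoder: bool


# Hypothetical inventory for illustration; not taken from the FAQ document.
ARTIFACTS = [
    Artifact("source_audio", False),
    Artifact("song_metadata", False),
    Artifact("manifest", False),
    Artifact("reference_embeddings", True),
    Artifact("vector_index", True),
    Artifact("eval_report", True),
]


def plan_encoder_upgrade(
    artifacts: List[Artifact], old_version: str, new_version: str
) -> Tuple[List[str], List[str]]:
    """Return (to_rebuild, to_keep) for moving from old_version to new_version.

    Encoder-derived artifacts are rebuilt under a new version tag instead of
    being overwritten, so the old version stays available for comparison.
    """
    if old_version == new_version:
        # No encoder change: nothing needs rebuilding.
        return [], [a.name for a in artifacts]
    to_rebuild = [f"{a.name}@{new_version}" for a in artifacts if a.depends_on_encoder]
    to_keep = [a.name for a in artifacts if not a.depends_on_encoder]
    return to_rebuild, to_keep


to_rebuild, to_keep = plan_encoder_upgrade(ARTIFACTS, "encoder-v1", "encoder-v2")
print(to_rebuild)  # ['reference_embeddings@encoder-v2', 'vector_index@encoder-v2', 'eval_report@encoder-v2']
print(to_keep)     # ['source_audio', 'song_metadata', 'manifest']
```

Tagging rebuilt artifacts with the new encoder version, rather than reusing the old names, keeps the "versioned index migration" rule visible in the artifact names themselves.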
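Conclusions:
- At the current stage, freezing `encoder v1` first is the safer production decision
- In production, model files, embedding versions, reference indexes, and evaluation reports should be managed as decoupled artifacts
- Future encoders should follow an "offline shadow build -> A/B -> cutover" upgrade path rather than overwriting the existing embedding store in place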
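The shadow build -> A/B -> cutover path can be sketched as a registry that keeps one immutable index per encoder version and only moves an "active" pointer. This is an illustrative sketch under stated assumptions: `EmbeddingIndexRegistry`, `shadow_build`, `cutover`, and the single scalar A/B score are hypothetical. A real deployment would store vectors in the production index (for example pgvector) and compare full evaluation reports rather than one number.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, Optional


@dataclass
class EmbeddingIndexRegistry:
    """Hypothetical registry: one immutable index per encoder version plus an active pointer."""

    # version -> {song_id: embedding}
    indexes: Dict[str, Dict[str, Any]] = field(default_factory=dict)
    # Version currently serving production queries (None until the first promotion).
    active: Optional[str] = None

    def shadow_build(self, version: str, songs: Dict[str, Any], encode: Callable[[Any], Any]) -> None:
        """Offline build under a new version key; the active index is never touched."""
        if version in self.indexes:
            raise ValueError(f"version {version!r} already exists; build a new version instead")
        self.indexes[version] = {song_id: encode(audio) for song_id, audio in songs.items()}

    def cutover(self, candidate: str, baseline_score: float, candidate_score: float,
                min_gain: float = 0.0) -> bool:
        """Switch the active pointer only if the candidate beats the baseline by more than min_gain."""
        if candidate not in self.indexes:
            raise KeyError(f"unknown version {candidate!r}")
        if candidate_score > baseline_score + min_gain:
            self.active = candidate  # the old index is kept, so rollback is a pointer change
            return True
        return False


# Toy usage: "audio" is a float and the encoders are placeholder lambdas.
registry = EmbeddingIndexRegistry()
registry.shadow_build("encoder-v1", {"song-a": 1.0, "song-b": 2.0}, encode=lambda x: [x])
registry.active = "encoder-v1"
registry.shadow_build("encoder-v2", {"song-a": 1.0, "song-b": 2.0}, encode=lambda x: [x, x * 2])
print(registry.cutover("encoder-v2", baseline_score=0.5, candidate_score=0.4))  # False
print(registry.active)  # encoder-v1
```

Because the previous version's index is never overwritten, rolling back is just another pointer change, which is one reason to avoid overwriting the old embedding store in place.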
## 2026-06-02 16:11 UTC / hum_guard fresh eval did not beat hum_focus
-`/tmp/dualaxis_sweep/hum_guard/eval.json` 做了最新复核
......
......
@@ -91,6 +91,7 @@ cd /workspace/acr-engine
- [Dataset specification](./dataset-spec.md)
- [Open dataset workflow](./open-dataset-workflow.md)
- [Training data and pgvector guide](./training-data-and-pgvector-guide.md)
- [Production encoder freeze and embedding strategy FAQ](./production-encoder-freeze-and-embedding-strategy.md)
- [Data sources and ingestion](./dataset-sources-and-licensing.md)
- [Industrial benchmark specification](./industrial-benchmark-spec.md)
......