Commit 9de8092d5982e71e52d18fd3247b5eb11d2ca4eb by cnb.bofCdSsphPA

Freeze the production encoder before scaling the music index

Document the production decision to stabilize the embedding space before onboarding a 300k-song catalog, and record the migration rules for future encoder upgrades.

Constraint: 300k-song production rollout makes embedding churn expensive and risky
Rejected: keep iterating encoder before defining a production embedding version | would force repeated full-vector rebuilds and unstable rollout criteria
Confidence: high
Scope-risk: narrow
Directive: Treat encoder changes as versioned index migrations, not in-place model swaps
Tested: reviewed rendered markdown content, docs index link, changelog entry, and git diff for the three touched docs
Not-tested: git push / remote sync outcome depends on repository remote state
1 parent 73d28fae
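### Stage: production encoder freeze FAQ and rollout guidance

Completed:
- New document:
  - `docs/production-encoder-freeze-and-embedding-strategy.md`
- The document covers:
  - Why the encoder should be frozen first at this stage
  - The generalization limits of the current architecture
  - How externally stored model weights can be applied directly to other songs
  - How wav/mp3/flac/ogg collections can quickly enter the manifest -> build-index -> evaluate pipeline
  - Governance recommendations for embedding/version/index management with a 300k-song production catalog
  - Which data must be rebuilt after an encoder upgrade, and which metadata can be kept
- docs entry point updated:
  - `docs/README.md` now links to this FAQ document

Conclusions:
- Freezing `encoder v1` first is the safer production decision at this stage
- Production should manage model files, embedding versions, reference indexes, and evaluation reports as decoupled artifacts
- Future encoders should follow an "offline shadow build -> A/B -> cutover" upgrade path instead of directly overwriting the old embedding store
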
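The "offline shadow build -> A/B -> cutover" path can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: `IndexRegistry`, `shadow_build`, `promote`, and the in-memory dictionaries are hypothetical names and stand-ins, not code from this repository, and the A/B result is passed in as a plain boolean rather than computed.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class IndexVersion:
    """One reference index built by a single encoder version (hypothetical)."""
    encoder_version: str
    vectors: Dict[str, List[float]] = field(default_factory=dict)


class IndexRegistry:
    """Hypothetical registry: stores every index version plus one active pointer."""

    def __init__(self) -> None:
        self._versions: Dict[str, IndexVersion] = {}
        self.active: Optional[str] = None

    def shadow_build(
        self,
        encoder_version: str,
        encode: Callable[[List[float]], List[float]],
        songs: Dict[str, List[float]],
    ) -> IndexVersion:
        """Build a new index next to the existing ones; never overwrite."""
        if encoder_version in self._versions:
            raise ValueError(f"index for {encoder_version} already exists")
        index = IndexVersion(
            encoder_version,
            {song_id: encode(audio) for song_id, audio in songs.items()},
        )
        self._versions[encoder_version] = index
        return index  # the active pointer is intentionally left unchanged

    def promote(self, encoder_version: str, ab_passed: bool) -> bool:
        """Move the active pointer only when the A/B comparison passed."""
        if encoder_version not in self._versions:
            raise KeyError(encoder_version)
        if not ab_passed:
            return False
        self.active = encoder_version
        return True

    def get(self, encoder_version: str) -> IndexVersion:
        return self._versions[encoder_version]
```

In this sketch the old index stays available after a cutover, so a rollback amounts to pointing `active` back at the previous version; a real deployment would persist the vectors and the pointer rather than hold them in memory.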
## 2026-06-02 16:11 UTC / hum_guard fresh eval did not beat hum_focus

- Ran a fresh re-check of `/tmp/dualaxis_sweep/hum_guard/eval.json`
......
@@ -91,6 +91,7 @@ cd /workspace/acr-engine
- [Dataset Specification](./dataset-spec.md)
- [Open Dataset Workflow](./open-dataset-workflow.md)
- [Training Data and pgvector Guide](./training-data-and-pgvector-guide.md)
- [Production Encoder Freeze and Embedding Strategy FAQ](./production-encoder-freeze-and-embedding-strategy.md)
- [Dataset Sources and Onboarding](./dataset-sources-and-licensing.md)
- [Industrial Benchmark Specification](./industrial-benchmark-spec.md)

......