Freeze the production encoder before scaling the music index
Document the production decision to stabilize the embedding space before onboarding a 300k-song catalog, and record the migration rules for future encoder upgrades.

Constraint: 300k-song production rollout makes embedding churn expensive and risky
Rejected: keep iterating encoder before defining a production embedding version | would force repeated full-vector rebuilds and unstable rollout criteria
Confidence: high
Scope-risk: narrow
Directive: Treat encoder changes as versioned index migrations, not in-place model swaps
Tested: reviewed rendered markdown content, docs index link, changelog entry, and git diff for the three touched docs
Not-tested: git push / remote sync outcome depends on repository remote state
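As an illustration of the directive to treat encoder changes as versioned index migrations, here is a minimal Python sketch. It is not taken from the repository: `EmbeddingVersion`, `VersionedIndex`, and `start_migration` are hypothetical names, and the in-memory dictionary only stands in for whatever vector store production uses. The sketch assumes that song IDs survive an upgrade while every vector has to be re-embedded.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class EmbeddingVersion:
    """One embedding space. Vectors from different versions are not comparable."""
    encoder: str  # hypothetical label such as "encoder-v1"
    dim: int


@dataclass
class VersionedIndex:
    """In-memory stand-in for a reference index pinned to one embedding version."""
    version: EmbeddingVersion
    vectors: dict = field(default_factory=dict)

    def add(self, song_id: str, vector: list, produced_by: EmbeddingVersion) -> None:
        # Refuse in-place swaps: a vector from another encoder belongs in a new index.
        if produced_by != self.version:
            raise ValueError(
                f"{song_id}: {produced_by.encoder} vector offered to "
                f"{self.version.encoder} index"
            )
        if len(vector) != self.version.dim:
            raise ValueError(f"{song_id}: expected {self.version.dim} dimensions")
        self.vectors[song_id] = list(vector)


def start_migration(old: VersionedIndex, target: EmbeddingVersion) -> tuple:
    """Create an empty index for the new version and list the songs to re-embed.

    Assumption: song IDs carry over unchanged; the old index is left untouched.
    """
    return VersionedIndex(target), sorted(old.vectors)
```

Pinning the version on the index itself means a mixed-version write fails loudly instead of silently degrading retrieval quality.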
Showing 3 changed files with 21 additions and 0 deletions
| 1 | ### Stage: production encoder freeze FAQ and rollout guidance | ||
| 2 | |||
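| 3 | Completed: | ||
| 4 | - New document: | ||
| 5 | - `docs/production-encoder-freeze-and-embedding-strategy.md` | ||
| 6 | - The document covers: | ||
| 7 | - Why the encoder should be frozen first at this stage | ||
| 8 | - The generalization limits of the current architecture | ||
| 9 | - How to apply the externalized model weights directly to other songs | ||
| 10 | - How a wav/mp3/flac/ogg collection can quickly enter the manifest -> build-index -> evaluate pipeline | ||
| 11 | - Governance recommendations for embeddings, versions, and indexes in a 300k-song production catalog | ||
| 12 | - Which data must be rebuilt and which metadata can be kept after an encoder upgrade | ||
| 13 | - Docs entry point updated: | ||
| 14 | - `docs/README.md` now links to this FAQ document | ||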
| 15 | |||
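| 16 | Conclusions: | ||
| 17 | - Freezing `encoder v1` at this stage is the safer production decision | ||
| 18 | - Production should manage model files, embedding versions, reference indexes, and evaluation reports as decoupled artifacts | ||
| 19 | - A future encoder should follow the "offline shadow build -> A/B -> switch" upgrade path rather than directly overwrite the old embedding store | ||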
| 20 | |||
| 1 | ## 2026-06-02 16:11 UTC / hum_guard fresh eval did not beat hum_focus | 21 | ## 2026-06-02 16:11 UTC / hum_guard fresh eval did not beat hum_focus |
| 2 | 22 | ||
| 3 | - Ran a fresh re-check of `/tmp/dualaxis_sweep/hum_guard/eval.json` | 23 | - Ran a fresh re-check of `/tmp/dualaxis_sweep/hum_guard/eval.json` | ... | ... |
| ... | @@ -91,6 +91,7 @@ cd /workspace/acr-engine | ... | @@ -91,6 +91,7 @@ cd /workspace/acr-engine |
| 91 | - [Dataset specification](./dataset-spec.md) | 91 | - [Dataset specification](./dataset-spec.md) |
| 92 | - [Open dataset workflow](./open-dataset-workflow.md) | 92 | - [Open dataset workflow](./open-dataset-workflow.md) |
| 93 | - [Training data and pgvector guide](./training-data-and-pgvector-guide.md) | 93 | - [Training data and pgvector guide](./training-data-and-pgvector-guide.md) |
| 94 | - [Production encoder freeze and embedding strategy FAQ](./production-encoder-freeze-and-embedding-strategy.md) | ||
| 94 | - [Data sources and onboarding](./dataset-sources-and-licensing.md) | 95 | - [Data sources and onboarding](./dataset-sources-and-licensing.md) |
| 95 | - [Industrial benchmark specification](./industrial-benchmark-spec.md) | 96 | - [Industrial benchmark specification](./industrial-benchmark-spec.md) |
| 96 | 97 | ... | ... |
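
The upgrade path named in the changelog entry (offline shadow build, then A/B, then switch) could be sketched roughly as follows. This is an assumption-laden illustration, not the project's implementation: `IndexRouter`, the index names, the recall figures, and the `min_gain` threshold are all hypothetical, and a real A/B comparison would use the project's own evaluation reports.

```python
from typing import Optional


class IndexRouter:
    """Tracks which index serves traffic and which one is being shadow-built."""

    def __init__(self, live: str) -> None:
        self.live = live                    # index currently serving queries
        self.shadow: Optional[str] = None   # candidate built offline, not serving

    def start_shadow(self, name: str) -> None:
        # The shadow index is built next to the live one; the live one is never overwritten.
        if name == self.live:
            raise ValueError("shadow index must not reuse the live index name")
        self.shadow = name

    def promote(self, live_recall: float, shadow_recall: float,
                min_gain: float = 0.0) -> bool:
        """Switch traffic only if the shadow index wins the A/B comparison."""
        if self.shadow is None:
            raise RuntimeError("no shadow index has been built")
        if shadow_recall - live_recall <= min_gain:
            return False  # keep serving the old index; the shadow stays available
        self.live, self.shadow = self.shadow, None
        return True
```

Usage under these assumptions: call `start_shadow("index-encoder-v2")` after the offline build, then `promote(live_recall, shadow_recall)` with metrics from the same evaluation set. A `False` result leaves the `encoder v1` index in place.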