
Model and Feature Set Initialization Handbook (Model & Feature Registry Bootstrap)

Updated: 2026-06-04
Goal: define the initialization conventions for model_registry, feature_set_registry, and reference_set_registry in Phase-1, so the design is not reinvented every time a new encoder is onboarded.

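One-page summary

When Phase-1 does not fine-tune the base model, what actually needs initializing is not a "training job" but three kinds of objects:

  1. Model definitions: model_registry
  2. Feature definitions: feature_set_registry
  3. Reference set definitions: reference_set_registry

In other words, write down how you intend to use each model first, then start extracting features.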


1. Recommended naming conventions

1.1 model_name

Use fixed lowercase names:

  • chromaprint
  • mert
  • muq
  • ecapa
  • coverhunter_encoder (Phase-2+)

1.2 model_version

Make the source and the scale explicit:

  • v1-95m
  • v1-330m
  • large-msd-iter
  • acr-baseline-v1

1.3 Feature set naming

Recommended format:

<model_name>__<feature_name>__<window>_<hop>__<pooling>__<metric>

Examples:

  • mert__semantic_embedding__5s_2.5s__mean__cosine
  • mert__semantic_embedding__10s_5s__mean__cosine
  • muq__semantic_embedding__5s_2.5s__mean__cosine
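
The naming scheme can be checked mechanically. Below is a minimal sketch, assuming the form used by the three examples above (window and hop written as seconds with an "s" suffix and joined by a single underscore, fields joined by a double underscore). FeatureSetName, build_feature_set_name, and parse_feature_set_name are hypothetical helpers, not part of any existing codebase.

```python
from typing import NamedTuple


class FeatureSetName(NamedTuple):
    model_name: str
    feature_name: str
    window_sec: float
    hop_sec: float
    pooling: str
    metric: str


def _fmt_seconds(value: float) -> str:
    # The "g" format drops trailing zeros: 5.0 -> "5s", 2.5 -> "2.5s".
    return f"{value:g}s"


def build_feature_set_name(name: FeatureSetName) -> str:
    text_fields = [name.model_name, name.feature_name, name.pooling, name.metric]
    # "__" is the field separator, so it must not occur inside a field.
    if any(not field or "__" in field for field in text_fields):
        raise ValueError("fields must be non-empty and must not contain '__'")
    window_hop = f"{_fmt_seconds(name.window_sec)}_{_fmt_seconds(name.hop_sec)}"
    return "__".join(
        [name.model_name, name.feature_name, window_hop, name.pooling, name.metric]
    )


def parse_feature_set_name(text: str) -> FeatureSetName:
    fields = text.split("__")
    if len(fields) != 5:
        raise ValueError(f"expected 5 '__'-separated fields, got {len(fields)}")
    model_name, feature_name, window_hop, pooling, metric = fields
    window, hop = window_hop.split("_")  # raises ValueError if not exactly two parts
    return FeatureSetName(
        model_name,
        feature_name,
        float(window.rstrip("s")),
        float(hop.rstrip("s")),
        pooling,
        metric,
    )
```

Round-tripping every name through the parser before it is written anywhere is one cheap way to keep names consistent across encoders.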

2. Recommended Phase-1 initialization objects

2.1 Model list

model_name    model_version     Role
chromaprint   v1                exact lane
mert          v1-95m            primary semantic baseline
muq           large-msd-iter    semantic challenger
ecapa         acr-baseline-v1   historical baseline / control

2.2 Feature set list

feature_set          Purpose
chromaprint          asset-level exact matching
mert 5s/2.5s mean    primary semantic baseline
mert 10s/5s mean     longer-context validation
muq 5s/2.5s mean     challenger baseline
ecapa 5s/2.5s        historical comparison
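
The "5s/2.5s" and "10s/5s" labels read as window length / hop length. A minimal sketch of how many windows each setting yields for one recording follows. It assumes that a window which would run past the end of the audio is dropped rather than padded, which the source does not specify. window_starts is a hypothetical helper.

```python
from typing import List


def window_starts(duration_sec: float, window_sec: float, hop_sec: float) -> List[float]:
    """Start offsets (seconds) of all full windows inside a recording."""
    if window_sec <= 0 or hop_sec <= 0:
        raise ValueError("window_sec and hop_sec must be positive")
    # Work in integer milliseconds so repeated float addition cannot drift.
    duration_ms = round(duration_sec * 1000)
    window_ms = round(window_sec * 1000)
    hop_ms = round(hop_sec * 1000)
    if duration_ms < window_ms:
        return []  # assumption: recordings shorter than one window yield nothing
    count = (duration_ms - window_ms) // hop_ms + 1
    return [i * hop_ms / 1000 for i in range(count)]
```

Under these assumptions a 30-second recording produces 11 windows at 5s/2.5s and 5 windows at 10s/5s, so the longer-context feature set stores fewer than half as many vectors.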

3. Recommended initialization SQL

3.1 Register models

insert into model_registry (
    model_name, model_family, model_version, model_source, model_uri,
    license_name, input_modality, input_sample_rate, input_channel_mode,
    default_window_sec, default_hop_sec, output_embedding_dim,
    pooling_supported, layer_selection_supported, is_trainable
) values
('chromaprint', 'fingerprint', 'v1', 'local', null,
 null, 'audio', 16000, 'mono',
 5.0, 2.5, null,
 array['none'], false, false),
('mert', 'music_ssl', 'v1-95m', 'github', 'https://github.com/yizhilll/MERT',
 null, 'audio', 24000, 'mono',
 5.0, 2.5, 768,
 array['mean','cls'], true, false),
('muq', 'music_ssl', 'large-msd-iter', 'github', 'https://github.com/tencent-ailab/MuQ',
 null, 'audio', 24000, 'mono',
 5.0, 2.5, 768,
 array['mean','cls'], true, false),
('ecapa', 'speech_derived', 'acr-baseline-v1', 'local', null,
 null, 'audio', 16000, 'mono',
 5.0, 2.5, 192,
 array['mean'], false, true);
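
Before feature sets are registered against these rows, their settings can be checked for consistency with the model row. The sketch below is illustrative only. ModelRow and feature_set_problems are hypothetical names, the rule that a feature set's embedding_dim equals the model's output_embedding_dim is an assumption that holds for simple pooling such as mean, and treating 'final' as the only layer choice that needs no layer-selection support is also an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class ModelRow:
    # Mirrors a subset of the model_registry columns used above.
    model_name: str
    model_version: str
    output_embedding_dim: Optional[int]
    pooling_supported: Tuple[str, ...]
    layer_selection_supported: bool


MERT = ModelRow("mert", "v1-95m", 768, ("mean", "cls"), True)
ECAPA = ModelRow("ecapa", "acr-baseline-v1", 192, ("mean",), False)


def feature_set_problems(
    model: ModelRow, embedding_dim: int, pooling_strategy: str, layer_selection: str
) -> List[str]:
    """Return human-readable mismatches; an empty list means the combination looks consistent."""
    problems = []
    if model.output_embedding_dim is not None and embedding_dim != model.output_embedding_dim:
        problems.append(
            f"embedding_dim {embedding_dim} != model output {model.output_embedding_dim}"
        )
    if pooling_strategy not in model.pooling_supported:
        problems.append(f"pooling '{pooling_strategy}' not in {model.pooling_supported}")
    # Assumption: 'final' is always available; anything else needs layer selection.
    if layer_selection != "final" and not model.layer_selection_supported:
        problems.append(f"layer_selection '{layer_selection}' not supported by model")
    return problems
```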

3.2 Register feature sets

insert into feature_set_registry (
    model_id, feature_name, feature_level, extraction_granularity,
    window_sec, hop_sec, embedding_dim, pooling_strategy, layer_selection,
    normalize_l2, distance_metric, quantization_type, feature_schema_version
)
select model_id, 'semantic_embedding', 'window', 'sliding_window',
       5.0, 2.5, 768, 'mean', 'final',
       true, 'cosine', 'none', 'v1'
from model_registry
where model_name = 'mert' and model_version = 'v1-95m';

insert into feature_set_registry (
    model_id, feature_name, feature_level, extraction_granularity,
    window_sec, hop_sec, embedding_dim, pooling_strategy, layer_selection,
    normalize_l2, distance_metric, quantization_type, feature_schema_version
)
select model_id, 'semantic_embedding', 'window', 'sliding_window',
       10.0, 5.0, 768, 'mean', 'final',
       true, 'cosine', 'none', 'v1'
from model_registry
where model_name = 'mert' and model_version = 'v1-95m';

insert into feature_set_registry (
    model_id, feature_name, feature_level, extraction_granularity,
    window_sec, hop_sec, embedding_dim, pooling_strategy, layer_selection,
    normalize_l2, distance_metric, quantization_type, feature_schema_version
)
select model_id, 'semantic_embedding', 'window', 'sliding_window',
       5.0, 2.5, 768, 'mean', 'final',
       true, 'cosine', 'none', 'v1'
from model_registry
where model_name = 'muq' and model_version = 'large-msd-iter';
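
All three rows combine pooling_strategy 'mean', normalize_l2 true, and distance_metric 'cosine'. The sketch below shows what that combination means for a single window, using toy 2-dimensional vectors in place of 768-dimensional ones. It assumes the encoder returns one vector per frame inside the window, which the source does not state. The function names are hypothetical.

```python
import math
from typing import List, Sequence


def mean_pool(frames: Sequence[Sequence[float]]) -> List[float]:
    """Average per-frame vectors into one window-level vector."""
    if not frames:
        raise ValueError("need at least one frame")
    dim = len(frames[0])
    return [sum(frame[i] for frame in frames) / len(frames) for i in range(dim)]


def l2_normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    # For unit-length vectors the dot product equals cosine similarity.
    return sum(x * y for x, y in zip(a, b))


window_a = l2_normalize(mean_pool([[3.0, 0.0], [3.0, 8.0]]))  # mean is [3, 4]
window_b = l2_normalize(mean_pool([[4.0, 3.0]]))
similarity = dot(window_a, window_b)
```

Because the stored vectors are already unit length, an index configured for inner product ranks candidates the same way as one configured for cosine similarity.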

3.3 Register the reference set

insert into reference_set_registry (
    set_name, description, encoder_scope, status
) values (
    'phase1_hot_reference_v1',
    'Phase-1 primary reference set; contains only the recordings currently used as hot references in production',
    'mert-v1-95m / muq-large-msd-iter',
    'active'
);
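
The encoder_scope value above appears to join model_name and model_version with a hyphen and to separate encoders with " / ". That format is inferred from this single value, not from a documented rule. Under that assumption, a minimal sketch for building and reading the string (both helper names are hypothetical):

```python
from typing import Iterable, List, Tuple


def encoder_scope(models: Iterable[Tuple[str, str]]) -> str:
    """Join (model_name, model_version) pairs into one scope string."""
    return " / ".join(f"{name}-{version}" for name, version in models)


def parse_encoder_scope(scope: str) -> List[Tuple[str, str]]:
    pairs = []
    for item in scope.split(" / "):
        # model_name uses underscores, never hyphens, so the first hyphen
        # separates the name from the version.
        name, _, version = item.partition("-")
        pairs.append((name, version))
    return pairs
```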

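4. Operating principles for reference sets

Current recommendation

  • Allow only one primary active hot reference set at any point in time
  • When a new encoder or a new aggregation strategy goes live, create a new set; do not overwrite the old one
  • During A/B or shadow runs, multiple sets may coexist, but only one carries the primary online marker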

Why

This supports:

  • Online rollback
  • Encoder upgrades
  • Hot index switchover
  • Offline replay
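
The "one primary, several coexisting" rule and the rollback it enables can be sketched as follows. This is an illustration under stated assumptions: is_primary is a hypothetical field standing in for the primary online marker, 'active' is the only status value that appears in the source, and phase1_hot_reference_v2 is an invented set name.

```python
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class ReferenceSet:
    set_name: str
    status: str       # 'active' appears in the source; other values are hypothetical
    is_primary: bool  # hypothetical stand-in for the primary online marker


def switch_primary(sets: List[ReferenceSet], new_primary: str) -> List[ReferenceSet]:
    """Move the primary marker to new_primary; no set is deleted or overwritten."""
    matches = [s for s in sets if s.set_name == new_primary]
    if not matches:
        raise KeyError(f"unknown reference set: {new_primary}")
    if matches[0].status != "active":
        raise ValueError("only an active set can become primary")
    # Exactly one set ends up primary; every other set is kept for rollback.
    return [replace(s, is_primary=(s.set_name == new_primary)) for s in sets]


sets = [
    ReferenceSet("phase1_hot_reference_v1", "active", True),
    ReferenceSet("phase1_hot_reference_v2", "active", False),  # hypothetical new set
]
upgraded = switch_primary(sets, "phase1_hot_reference_v2")
rolled_back = switch_primary(upgraded, "phase1_hot_reference_v1")
```

Since the old set is never overwritten, rollback is the same operation as the upgrade with the names reversed.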

5. Dimension extension rules

The current DDL only demonstrates:

  • audio_embedding_vector_192
  • audio_embedding_vector_768

If a later encoder introduces a new dimension, for example 1024:

  1. Add audio_embedding_vector_1024
  2. Set embedding_dim=1024 on the corresponding feature_set
  3. Build a separate index for it
  4. Switch over through retrieval_index_registry

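Principles

  • A dimension change is a feature set upgrade, not an upgrade of the core data model
  • The core data layer should not need table changes because of an encoder upgrade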
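
One way to keep the core data layer untouched is to derive the target table from the feature set's embedding_dim and to fail loudly when no table exists for that dimension yet. A minimal sketch, assuming the audio_embedding_vector_<dim> naming shown above; embedding_table_for and KNOWN_DIMS are hypothetical names.

```python
from typing import FrozenSet

# Dimensions that the current DDL provides a table for.
KNOWN_DIMS: FrozenSet[int] = frozenset({192, 768})


def embedding_table_for(embedding_dim: int, known_dims: FrozenSet[int] = KNOWN_DIMS) -> str:
    """Map a feature set's embedding_dim to its per-dimension vector table."""
    if embedding_dim not in known_dims:
        raise LookupError(
            f"no audio_embedding_vector_{embedding_dim} table yet; "
            "add the table and its index before registering this feature set"
        )
    return f"audio_embedding_vector_{embedding_dim}"
```

Onboarding a 1024-dimensional encoder then means extending the set of known dimensions alongside the new table, with no change to callers.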

6. Currently recommended order

flowchart TD
    A[Register model_registry] --> B[Register feature_set_registry]
    B --> C[Register reference_set_registry]
    C --> D[Extract embeddings/fingerprint]
    D --> E[Write audio_embedding/audio_fingerprint]
    E --> F[Create retrieval_index_registry]
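
The flow above is strictly linear, so a bootstrap script can refuse to run a step whose predecessors are missing. A minimal sketch with hypothetical step labels that mirror the six nodes:

```python
from typing import Optional, Set

BOOTSTRAP_ORDER = [
    "model_registry",
    "feature_set_registry",
    "reference_set_registry",
    "extract_features",
    "write_features",
    "retrieval_index_registry",
]


def next_step(done: Set[str]) -> Optional[str]:
    """Return the first step not yet done, or None when all steps are complete."""
    for position, step in enumerate(BOOTSTRAP_ORDER):
        if step not in done:
            skipped_ahead = [s for s in BOOTSTRAP_ORDER[position + 1:] if s in done]
            if skipped_ahead:
                # A later step was recorded without its prerequisite.
                raise ValueError(f"{skipped_ahead[0]} was done before {step}")
            return step
    return None
```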

7. Final recommendations

If you are starting Phase-1 initialization today, register at least the following:

  1. chromaprint v1
  2. mert v1-95m
  3. muq large-msd-iter
  4. mert 5s/2.5s mean
  5. muq 5s/2.5s mean
  6. phase1_hot_reference_v1

This gives the data, model, and index tracks each a stable entry point.