Bootstrap the Phase-1 model registry on live PostgreSQL

Constraint: Continue the Ralph loop without waiting on missing business sample mounts, while still leaving a push-ready implementation and documentation trail
Rejected: Keep Phase-1 registry setup as static SQL snippets only | It slows live validation and leaves no machine-checkable bootstrap path
Confidence: high
Scope-risk: narrow
Directive: Treat model_registry/feature_set_registry/reference_set_registry as the mandatory entrypoint before any future MERT/MuQ extraction jobs
Tested: /usr/local/miniconda3/bin/python scripts/bootstrap_phase1_model_registry_live.py --dsn 'postgres://d2:d2pass@127.0.0.1:5432/d2' --schema acr_test --output data/pgvector_eval/music20/phase1_registry_bootstrap_report.json; /usr/local/miniconda3/bin/python -m py_compile scripts/bootstrap_phase1_model_registry_live.py; git diff --check -- acr-engine/scripts/bootstrap_phase1_model_registry_live.py acr-engine/data/pgvector_eval/music20/phase1_registry_bootstrap_report.json docs/model-feature-registry-bootstrap.md docs/postgres_db_schema_samples.md docs/session-handoff.md docs/CHANGELOG.md
Not-tested: Actual MERT/MuQ embedding extraction, hard-case type_8/type_16 live queries, multi-recording/cover-lane retrieval

Commit fef8f438 ... fef8f4387d95be5d4a017ba55150b6fa7463f1f6 authored 2026-06-04 12:44:49 +0800 by cnb.bofCdSsphPA
Showing 6 changed files with 588 additions and 0 deletions
acr-engine/data/pgvector_eval/music20/phase1_registry_bootstrap_report.json
acr-engine/scripts/bootstrap_phase1_model_registry_live.py
docs/CHANGELOG.md
docs/model-feature-registry-bootstrap.md
docs/postgres_db_schema_samples.md
docs/session-handoff.md
--- a/acr-engine/data/pgvector_eval/music20/phase1_registry_bootstrap_report.json 0 → 100644
+++ b/acr-engine/data/pgvector_eval/music20/phase1_registry_bootstrap_report.json 0 → 100644
+{
+  "schema": "acr_test",
+  "dsn_redacted": "postgres://d2:***@127.0.0.1:5432/d2",
+  "models": [
+    {
+      "model_id": 2,
+      "model_name": "chromaprint",
+      "model_version": "v1",
+      "output_embedding_dim": null
+    },
+    {
+      "model_id": 3,
+      "model_name": "mert",
+      "model_version": "v1-95m",
+      "output_embedding_dim": 768
+    },
+    {
+      "model_id": 4,
+      "model_name": "muq",
+      "model_version": "large-msd-iter",
+      "output_embedding_dim": 768
+    },
+    {
+      "model_id": 5,
+      "model_name": "ecapa",
+      "model_version": "acr-baseline-v1",
+      "output_embedding_dim": 192
+    }
+  ],
+  "feature_sets": [
+    {
+      "feature_set_id": 2,
+      "model_name": "chromaprint",
+      "model_version": "v1",
+      "feature_name": "fingerprint_asset",
+      "window_sec": 5.0,
+      "hop_sec": 2.5,
+      "embedding_dim": null,
+      "distance_metric": "hamming"
+    },
+    {
+      "feature_set_id": 3,
+      "model_name": "mert",
+      "model_version": "v1-95m",
+      "feature_name": "semantic_embedding",
+      "window_sec": 5.0,
+      "hop_sec": 2.5,
+      "embedding_dim": 768,
+      "distance_metric": "cosine"
+    },
+    {
+      "feature_set_id": 4,
+      "model_name": "mert",
+      "model_version": "v1-95m",
+      "feature_name": "semantic_embedding",
+      "window_sec": 10.0,
+      "hop_sec": 5.0,
+      "embedding_dim": 768,
+      "distance_metric": "cosine"
+    },
+    {
+      "feature_set_id": 5,
+      "model_name": "muq",
+      "model_version": "large-msd-iter",
+      "feature_name": "semantic_embedding",
+      "window_sec": 5.0,
+      "hop_sec": 2.5,
+      "embedding_dim": 768,
+      "distance_metric": "cosine"
+    },
+    {
+      "feature_set_id": 6,
+      "model_name": "ecapa",
+      "model_version": "acr-baseline-v1",
+      "feature_name": "semantic_embedding",
+      "window_sec": 5.0,
+      "hop_sec": 2.5,
+      "embedding_dim": 192,
+      "distance_metric": "cosine"
+    }
+  ],
+  "reference_set": {
+    "reference_set_id": 2,
+    "set_name": "phase1_hot_reference_v1",
+    "encoder_scope": "chromaprint-v1 / mert-v1-95m / muq-large-msd-iter"
+  },
+  "counts": {
+    "model_registry": 5,
+    "feature_set_registry": 6,
+    "reference_set_registry": 2
+  }
+}
\ No newline at end of file
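
The report above can be cross-checked mechanically. The following is a minimal sketch, assuming the report keeps the JSON shape shown in this commit; `check_report` is a hypothetical helper name and is not part of the repository.

```python
from __future__ import annotations

from typing import Any


def check_report(report: dict[str, Any]) -> list[str]:
    """Return a list of consistency problems found in a bootstrap report."""
    problems: list[str] = []

    # Index models by (name, version), the same key the bootstrap script uses.
    dims = {
        (m["model_name"], m["model_version"]): m["output_embedding_dim"]
        for m in report["models"]
    }

    for fs in report["feature_sets"]:
        key = (fs["model_name"], fs["model_version"])
        fs_id = fs["feature_set_id"]
        if key not in dims:
            problems.append(f"feature set {fs_id}: unknown model {key}")
        elif fs["embedding_dim"] != dims[key]:
            problems.append(
                f"feature set {fs_id}: dim {fs['embedding_dim']} "
                f"differs from model dim {dims[key]}"
            )

    # Counts are whole-table totals, so they may exceed what one run wrote,
    # but they should never be smaller than the rows listed in the report.
    counts = report["counts"]
    expected_minimums = {
        "model_registry": len(report["models"]),
        "feature_set_registry": len(report["feature_sets"]),
        "reference_set_registry": 1,
    }
    for table, minimum in expected_minimums.items():
        if counts[table] < minimum:
            problems.append(f"{table}: count {counts[table]} < {minimum}")

    return problems
```

An empty return value means every feature set points at a listed model with a matching embedding dimension and the table totals are at least as large as the rows in the report.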
--- a/acr-engine/scripts/bootstrap_phase1_model_registry_live.py 0 → 100755
+++ b/acr-engine/scripts/bootstrap_phase1_model_registry_live.py 0 → 100755
+#!/usr/bin/env /usr/local/miniconda3/bin/python
+from __future__ import annotations
+
+import argparse
+import json
+from pathlib import Path
+from typing import Any
+
+import psycopg
+
+ROOT = Path(__file__).resolve().parents[1]
+DEFAULT_OUTPUT = ROOT / 'data' / 'pgvector_eval' / 'music20' / 'phase1_registry_bootstrap_report.json'
+
+MODELS = [
+    {
+        'model_name': 'chromaprint',
+        'model_family': 'fingerprint',
+        'model_version': 'v1',
+        'model_source': 'acoustid',
+        'model_uri': 'https://acoustid.org/chromaprint',
+        'license_name': 'lgpl-2.1',
+        'input_modality': 'audio',
+        'input_sample_rate': 16000,
+        'input_channel_mode': 'mono',
+        'default_window_sec': 5.0,
+        'default_hop_sec': 2.5,
+        'output_embedding_dim': None,
+        'pooling_supported': ['none'],
+        'layer_selection_supported': False,
+        'is_trainable': False,
+        'metadata_json': {
+            'lane': 'exact',
+            'phase': 'phase1',
+            'note': 'exact fingerprint lane baseline',
+        },
+    },
+    {
+        'model_name': 'mert',
+        'model_family': 'music_ssl',
+        'model_version': 'v1-95m',
+        'model_source': 'github',
+        'model_uri': 'https://github.com/yizhilll/MERT',
+        'license_name': 'apache-2.0',
+        'input_modality': 'audio',
+        'input_sample_rate': 24000,
+        'input_channel_mode': 'mono',
+        'default_window_sec': 5.0,
+        'default_hop_sec': 2.5,
+        'output_embedding_dim': 768,
+        'pooling_supported': ['mean', 'cls'],
+        'layer_selection_supported': True,
+        'is_trainable': False,
+        'metadata_json': {
+            'lane': 'semantic',
+            'role': 'primary_baseline',
+            'phase': 'phase1',
+        },
+    },
+    {
+        'model_name': 'muq',
+        'model_family': 'music_ssl',
+        'model_version': 'large-msd-iter',
+        'model_source': 'github',
+        'model_uri': 'https://github.com/tencent-ailab/MuQ',
+        'license_name': 'apache-2.0',
+        'input_modality': 'audio',
+        'input_sample_rate': 24000,
+        'input_channel_mode': 'mono',
+        'default_window_sec': 5.0,
+        'default_hop_sec': 2.5,
+        'output_embedding_dim': 768,
+        'pooling_supported': ['mean', 'cls'],
+        'layer_selection_supported': True,
+        'is_trainable': False,
+        'metadata_json': {
+            'lane': 'semantic',
+            'role': 'challenger',
+            'phase': 'phase1',
+        },
+    },
+    {
+        'model_name': 'ecapa',
+        'model_family': 'speech_derived',
+        'model_version': 'acr-baseline-v1',
+        'model_source': 'local',
+        'model_uri': None,
+        'license_name': 'internal-eval',
+        'input_modality': 'audio',
+        'input_sample_rate': 16000,
+        'input_channel_mode': 'mono',
+        'default_window_sec': 5.0,
+        'default_hop_sec': 2.5,
+        'output_embedding_dim': 192,
+        'pooling_supported': ['mean'],
+        'layer_selection_supported': False,
+        'is_trainable': True,
+        'metadata_json': {
+            'lane': 'semantic',
+            'role': 'historical_baseline',
+            'phase': 'phase1',
+        },
+    },
+]
+
+FEATURE_SETS = [
+    {
+        'model_name': 'chromaprint',
+        'model_version': 'v1',
+        'feature_name': 'fingerprint_asset',
+        'feature_level': 'asset',
+        'extraction_granularity': 'full_asset',
+        'window_sec': 5.0,
+        'hop_sec': 2.5,
+        'embedding_dim': None,
+        'pooling_strategy': 'none',
+        'layer_selection': 'na',
+        'normalize_l2': False,
+        'distance_metric': 'hamming',
+        'quantization_type': 'fingerprint_hash',
+        'feature_schema_version': 'v1',
+        'config_json': {'lane': 'exact', 'index_target': 'audio_fingerprint'},
+        'status': 'active',
+    },
+    {
+        'model_name': 'mert',
+        'model_version': 'v1-95m',
+        'feature_name': 'semantic_embedding',
+        'feature_level': 'window',
+        'extraction_granularity': 'sliding_window',
+        'window_sec': 5.0,
+        'hop_sec': 2.5,
+        'embedding_dim': 768,
+        'pooling_strategy': 'mean',
+        'layer_selection': 'final',
+        'normalize_l2': True,
+        'distance_metric': 'cosine',
+        'quantization_type': None,
+        'feature_schema_version': 'v1',
+        'config_json': {'role': 'primary_semantic_baseline'},
+        'status': 'active',
+    },
+    {
+        'model_name': 'mert',
+        'model_version': 'v1-95m',
+        'feature_name': 'semantic_embedding',
+        'feature_level': 'window',
+        'extraction_granularity': 'sliding_window',
+        'window_sec': 10.0,
+        'hop_sec': 5.0,
+        'embedding_dim': 768,
+        'pooling_strategy': 'mean',
+        'layer_selection': 'final',
+        'normalize_l2': True,
+        'distance_metric': 'cosine',
+        'quantization_type': None,
+        'feature_schema_version': 'v1',
+        'config_json': {'role': 'long_context_validation'},
+        'status': 'active',
+    },
+    {
+        'model_name': 'muq',
+        'model_version': 'large-msd-iter',
+        'feature_name': 'semantic_embedding',
+        'feature_level': 'window',
+        'extraction_granularity': 'sliding_window',
+        'window_sec': 5.0,
+        'hop_sec': 2.5,
+        'embedding_dim': 768,
+        'pooling_strategy': 'mean',
+        'layer_selection': 'final',
+        'normalize_l2': True,
+        'distance_metric': 'cosine',
+        'quantization_type': None,
+        'feature_schema_version': 'v1',
+        'config_json': {'role': 'semantic_challenger'},
+        'status': 'active',
+    },
+    {
+        'model_name': 'ecapa',
+        'model_version': 'acr-baseline-v1',
+        'feature_name': 'semantic_embedding',
+        'feature_level': 'window',
+        'extraction_granularity': 'sliding_window',
+        'window_sec': 5.0,
+        'hop_sec': 2.5,
+        'embedding_dim': 192,
+        'pooling_strategy': 'mean',
+        'layer_selection': 'na',
+        'normalize_l2': True,
+        'distance_metric': 'cosine',
+        'quantization_type': None,
+        'feature_schema_version': 'v1',
+        'config_json': {'role': 'historical_baseline'},
+        'status': 'active',
+    },
+]
+
+REFERENCE_SET = {
+    'set_name': 'phase1_hot_reference_v1',
+    'description': 'Phase-1 hot reference set bootstrap for MERT/MuQ/Chromaprint lanes',
+    'encoder_scope': 'chromaprint-v1 / mert-v1-95m / muq-large-msd-iter',
+    'status': 'active',
+    'metadata_json': {
+        'phase': 'phase1',
+        'purpose': 'registry_bootstrap',
+    },
+}
+
+
+def upsert_model(conn: psycopg.Connection, model: dict[str, Any]) -> int:
+    row = conn.execute(
+        """
+        INSERT INTO model_registry (
+            model_name, model_family, model_version, model_source, model_uri,
+            license_name, input_modality, input_sample_rate, input_channel_mode,
+            default_window_sec, default_hop_sec, output_embedding_dim,
+            pooling_supported, layer_selection_supported, is_trainable, metadata_json
+        ) VALUES (
+            %(model_name)s, %(model_family)s, %(model_version)s, %(model_source)s, %(model_uri)s,
+            %(license_name)s, %(input_modality)s, %(input_sample_rate)s, %(input_channel_mode)s,
+            %(default_window_sec)s, %(default_hop_sec)s, %(output_embedding_dim)s,
+            %(pooling_supported)s, %(layer_selection_supported)s, %(is_trainable)s, %(metadata_json)s::jsonb
+        )
+        ON CONFLICT (model_name, model_version)
+        DO UPDATE SET
+            model_family = EXCLUDED.model_family,
+            model_source = EXCLUDED.model_source,
+            model_uri = EXCLUDED.model_uri,
+            license_name = EXCLUDED.license_name,
+            input_modality = EXCLUDED.input_modality,
+            input_sample_rate = EXCLUDED.input_sample_rate,
+            input_channel_mode = EXCLUDED.input_channel_mode,
+            default_window_sec = EXCLUDED.default_window_sec,
+            default_hop_sec = EXCLUDED.default_hop_sec,
+            output_embedding_dim = EXCLUDED.output_embedding_dim,
+            pooling_supported = EXCLUDED.pooling_supported,
+            layer_selection_supported = EXCLUDED.layer_selection_supported,
+            is_trainable = EXCLUDED.is_trainable,
+            metadata_json = EXCLUDED.metadata_json,
+            updated_at = NOW()
+        RETURNING model_id;
+        """,
+        {**model, 'metadata_json': json.dumps(model['metadata_json'])},
+    ).fetchone()
+    return int(row[0])
+
+
+def ensure_feature_set(conn: psycopg.Connection, model_id: int, feature: dict[str, Any]) -> int:
+    existing = conn.execute(
+        """
+        SELECT feature_set_id
+        FROM feature_set_registry
+        WHERE model_id = %s
+          AND feature_name = %s
+          AND feature_level = %s
+          AND extraction_granularity = %s
+          AND coalesce(window_sec, -1) = coalesce(%s, -1)
+          AND coalesce(hop_sec, -1) = coalesce(%s, -1)
+          AND coalesce(embedding_dim, -1) = coalesce(%s, -1)
+          AND coalesce(pooling_strategy, '') = coalesce(%s, '')
+          AND coalesce(layer_selection, '') = coalesce(%s, '')
+          AND normalize_l2 = %s
+          AND distance_metric = %s
+          AND coalesce(feature_schema_version, '') = coalesce(%s, '');
+        """,
+        (
+            model_id,
+            feature['feature_name'],
+            feature['feature_level'],
+            feature['extraction_granularity'],
+            feature['window_sec'],
+            feature['hop_sec'],
+            feature['embedding_dim'],
+            feature['pooling_strategy'],
+            feature['layer_selection'],
+            feature['normalize_l2'],
+            feature['distance_metric'],
+            feature['feature_schema_version'],
+        ),
+    ).fetchone()
+    if existing:
+        conn.execute(
+            "UPDATE feature_set_registry SET config_json = %s::jsonb, status = %s, updated_at = NOW() WHERE feature_set_id = %s",
+            (json.dumps(feature['config_json']), feature['status'], existing[0]),
+        )
+        return int(existing[0])
+
+    row = conn.execute(
+        """
+        INSERT INTO feature_set_registry (
+            model_id, feature_name, feature_level, extraction_granularity,
+            window_sec, hop_sec, embedding_dim, pooling_strategy, layer_selection,
+            normalize_l2, distance_metric, quantization_type, feature_schema_version,
+            config_json, status
+        ) VALUES (
+            %s, %s, %s, %s,
+            %s, %s, %s, %s, %s,
+            %s, %s, %s, %s,
+            %s::jsonb, %s
+        )
+        RETURNING feature_set_id;
+        """,
+        (
+            model_id,
+            feature['feature_name'],
+            feature['feature_level'],
+            feature['extraction_granularity'],
+            feature['window_sec'],
+            feature['hop_sec'],
+            feature['embedding_dim'],
+            feature['pooling_strategy'],
+            feature['layer_selection'],
+            feature['normalize_l2'],
+            feature['distance_metric'],
+            feature['quantization_type'],
+            feature['feature_schema_version'],
+            json.dumps(feature['config_json']),
+            feature['status'],
+        ),
+    ).fetchone()
+    return int(row[0])
+
+
+def upsert_reference_set(conn: psycopg.Connection, payload: dict[str, Any]) -> int:
+    row = conn.execute(
+        """
+        INSERT INTO reference_set_registry (set_name, description, encoder_scope, status, metadata_json)
+        VALUES (%s, %s, %s, %s, %s::jsonb)
+        ON CONFLICT (set_name)
+        DO UPDATE SET
+            description = EXCLUDED.description,
+            encoder_scope = EXCLUDED.encoder_scope,
+            status = EXCLUDED.status,
+            metadata_json = EXCLUDED.metadata_json,
+            updated_at = NOW()
+        RETURNING reference_set_id;
+        """,
+        (
+            payload['set_name'],
+            payload['description'],
+            payload['encoder_scope'],
+            payload['status'],
+            json.dumps(payload['metadata_json']),
+        ),
+    ).fetchone()
+    return int(row[0])
+
+
+def main() -> None:
+    ap = argparse.ArgumentParser()
+    ap.add_argument('--dsn', required=True)
+    ap.add_argument('--schema', default='acr_test')
+    ap.add_argument('--output', default=str(DEFAULT_OUTPUT))
+    args = ap.parse_args()
+
+    summary: dict[str, Any] = {
+        'schema': args.schema,
+        'dsn_redacted': 'postgres://d2:***@127.0.0.1:5432/d2',
+        'models': [],
+        'feature_sets': [],
+        'reference_set': None,
+    }
+
+    with psycopg.connect(args.dsn, autocommit=True) as conn:
+        conn.execute(f'SET search_path TO {args.schema}, public;')
+        model_ids: dict[tuple[str, str], int] = {}
+        for model in MODELS:
+            model_id = upsert_model(conn, model)
+            model_ids[(model['model_name'], model['model_version'])] = model_id
+            summary['models'].append({
+                'model_id': model_id,
+                'model_name': model['model_name'],
+                'model_version': model['model_version'],
+                'output_embedding_dim': model['output_embedding_dim'],
+            })
+
+        for feature in FEATURE_SETS:
+            model_id = model_ids[(feature['model_name'], feature['model_version'])]
+            feature_set_id = ensure_feature_set(conn, model_id, feature)
+            summary['feature_sets'].append({
+                'feature_set_id': feature_set_id,
+                'model_name': feature['model_name'],
+                'model_version': feature['model_version'],
+                'feature_name': feature['feature_name'],
+                'window_sec': feature['window_sec'],
+                'hop_sec': feature['hop_sec'],
+                'embedding_dim': feature['embedding_dim'],
+                'distance_metric': feature['distance_metric'],
+            })
+
+        reference_set_id = upsert_reference_set(conn, REFERENCE_SET)
+        summary['reference_set'] = {
+            'reference_set_id': reference_set_id,
+            'set_name': REFERENCE_SET['set_name'],
+            'encoder_scope': REFERENCE_SET['encoder_scope'],
+        }
+        summary['counts'] = {
+            'model_registry': int(conn.execute('SELECT count(*) FROM model_registry;').fetchone()[0]),
+            'feature_set_registry': int(conn.execute('SELECT count(*) FROM feature_set_registry;').fetchone()[0]),
+            'reference_set_registry': int(conn.execute('SELECT count(*) FROM reference_set_registry;').fetchone()[0]),
+        }
+
+    out = Path(args.output)
+    out.parent.mkdir(parents=True, exist_ok=True)
+    out.write_text(json.dumps(summary, ensure_ascii=False, indent=2), encoding='utf-8')
+    print(json.dumps(summary, ensure_ascii=False, indent=2))
+
+
+if __name__ == '__main__':
+    main()
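
The script stores a fixed `dsn_redacted` string in the summary regardless of the `--dsn` argument, so the report would be misleading when run against another host or user. One possible way to derive the value from the actual DSN is sketched below. It assumes a URI-style DSN (`postgres://user:password@host:port/db`) and does not handle bracketed IPv6 hosts; `redact_dsn` is a hypothetical helper, not part of the commit.

```python
from urllib.parse import urlsplit, urlunsplit


def redact_dsn(dsn: str) -> str:
    """Mask the password of a URI-style DSN.

    Anything that does not parse as scheme://netloc (for example a
    keyword/value DSN) is replaced entirely, so a password is never echoed.
    """
    parts = urlsplit(dsn)
    if not parts.scheme or not parts.netloc:
        return "***"
    if parts.password is None:
        # No password in the URI, nothing to hide.
        return dsn

    host = parts.hostname or ""
    if parts.port is not None:
        host = f"{host}:{parts.port}"
    netloc = f"{parts.username or ''}:***@{host}"
    return urlunsplit(
        (parts.scheme, netloc, parts.path, parts.query, parts.fragment)
    )
```

With such a helper, the summary could record `redact_dsn(args.dsn)` instead of a literal string.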
--- a/docs/CHANGELOG.md
+++ b/docs/CHANGELOG.md
 ## 2026-06-04

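+- Added `acr-engine/scripts/bootstrap_phase1_model_registry_live.py` and `acr-engine/data/pgvector_eval/music20/phase1_registry_bootstrap_report.json`, turning the Phase-1 initialization of `chromaprint / mert / muq / ecapa` and the corresponding `feature_set_registry / reference_set_registry` rows into a live bootstrap script that connects directly to PostgreSQL; verified on the `acr_test` schema.
 - Documented a blocker: the current container has no `/workspace/downloads`, so this round could not continue generating live PostgreSQL query JSONL for `type_8 / type_16` directly from the business sample directory; this environment prerequisite is now recorded in the handoff and PostgreSQL samples documents.
 - Updated [PostgreSQL storage samples and live test pipeline](./postgres_db_schema_samples.md) and `acr-engine/scripts/live_pgvector_music20_eval.py`, extending lineage negative-case validation from a single `audio_window` case to the three core triggers `recording` / `audio_window` / `audio_embedding`; the live pgvector report was rerun and confirms the retrieval metrics are unchanged. Also recorded that `py_compile` and `diff --check` pass.
 - Added [PostgreSQL storage samples and live test pipeline](./postgres_db_schema_samples.md), covering real stored-row samples for `acr_pg_schema_v2.sql`, live `pgvector` retrieval validation, negative-case tests for the lineage triggers, and an interpretation of the current recall/confusion results.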
--- a/docs/model-feature-registry-bootstrap.md
+++ b/docs/model-feature-registry-bootstrap.md
@@ -216,3 +216,67 @@ flowchart TD
 6. `phase1_hot_reference_v1`

 This gives the data, model, and index tracks each a stable entry point.
+
+---
+
+## 8. Live PostgreSQL bootstrap script
+
+To avoid running SQL by hand each time, this repository now provides a live bootstrap script that connects directly to PostgreSQL:
+
+- `acr-engine/scripts/bootstrap_phase1_model_registry_live.py`
+
+Purpose:
+- Write `model_registry` rows into the target schema
+- Write `feature_set_registry` rows
+- Write `reference_set_registry` rows
+- Use an **idempotent upsert / ensure** approach, so the script is safe to rerun
+
+### 8.1 Command
+
+```bash
+cd /workspace/acr-engine
+/usr/local/miniconda3/bin/python scripts/bootstrap_phase1_model_registry_live.py \
+  --dsn 'postgres://d2:d2pass@127.0.0.1:5432/d2' \
+  --schema acr_test \
+  --output data/pgvector_eval/music20/phase1_registry_bootstrap_report.json
+```
+
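+### 8.2 Verified results so far (acr_test)
+
+This round ran the script against the live `acr_test` schema. The table row counts after the run are below; the script itself defines 4 models, 5 feature sets, and 1 reference set, so each total includes rows that already existed:
+
+| Object | Row count |
+|---|---:|
+| `model_registry` | `5` |
+| `feature_set_registry` | `6` |
+| `reference_set_registry` | `2` |
+
+The Phase-1 objects added in this round are: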
+
+#### models
+- `chromaprint v1`
+- `mert v1-95m`
+- `muq large-msd-iter`
+- `ecapa acr-baseline-v1`
+
+#### feature sets
+- `chromaprint fingerprint_asset`
+- `mert semantic_embedding 5s/2.5s`
+- `mert semantic_embedding 10s/5s`
+- `muq semantic_embedding 5s/2.5s`
+- `ecapa semantic_embedding 5s/2.5s`
+
+#### reference set
+- `phase1_hot_reference_v1`
+
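+### 8.3 Current artifact
+
+- `acr-engine/data/pgvector_eval/music20/phase1_registry_bootstrap_report.json`
+
+This file records:
+- model_id
+- feature_set_id
+- reference_set_id
+- final table row counts
+
+As a result, the next session does not need to start by running SQL snippets by hand; it can pick up directly from the live bootstrap script.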
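
Section 8 of the document describes the script as idempotent. One way to check that claim is to keep the reports from two consecutive runs and compare the ids and table counts they record. This is a sketch only, assuming both reports use the JSON shape written by the bootstrap script; `ids_and_counts` and `same_registry_state` are hypothetical names.

```python
from __future__ import annotations

from typing import Any


def ids_and_counts(report: dict[str, Any]) -> dict[str, Any]:
    """Extract the fields that must stay stable across repeated runs."""
    return {
        "model_ids": sorted(m["model_id"] for m in report["models"]),
        "feature_set_ids": sorted(
            fs["feature_set_id"] for fs in report["feature_sets"]
        ),
        "reference_set_id": report["reference_set"]["reference_set_id"],
        "counts": dict(report["counts"]),
    }


def same_registry_state(first: dict[str, Any], second: dict[str, Any]) -> bool:
    """True when a rerun produced no new ids and no extra rows."""
    return ids_and_counts(first) == ids_and_counts(second)
```

If a rerun changed any id or increased any count, the comparison returns `False`, which would point at a row being inserted again instead of matched.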
--- a/docs/postgres_db_schema_samples.md
+++ b/docs/postgres_db_schema_samples.md
@@ -62,8 +62,10 @@
 |---|---|
 | Recommended DDL | `acr-engine/sql/acr_pg_schema_v2.sql` |
 | Live test script | `acr-engine/scripts/live_pgvector_music20_eval.py` |
+| Registry bootstrap script | `acr-engine/scripts/bootstrap_phase1_model_registry_live.py` |
 | Live report | `acr-engine/data/pgvector_eval/music20/live_pgvector_report.json` |
 | FAISS comparison report | `acr-engine/data/pgvector_eval/music20/songid_eval_report_fresh.json` |
+| Registry bootstrap report | `acr-engine/data/pgvector_eval/music20/phase1_registry_bootstrap_report.json` |
 | Historical comparison report | `acr-engine/data/pgvector_eval/music20/songid_eval_report.json` |

 ---
@@ -379,6 +381,23 @@ flowchart LR

 ## Recommended next steps

+### New this round: the Phase-1 registry can now be bootstrapped live
+
+Besides the live retrieval script, this round also added:
+
+- `acr-engine/scripts/bootstrap_phase1_model_registry_live.py`
+
+It has already written the following to the live `acr_test` schema:
+- `chromaprint`
+- `mert`
+- `muq`
+- `ecapa`
+- the corresponding feature sets
+- `phase1_hot_reference_v1`
+
+Corresponding live report:
+- `acr-engine/data/pgvector_eval/music20/phase1_registry_bootstrap_report.json`
+
 ### Track 1: continue the PostgreSQL engineering work

 1. Generalize `live_pgvector_music20_eval.py` into:
--- a/docs/session-handoff.md
+++ b/docs/session-handoff.md
@@ -24,6 +24,7 @@
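 - The SOTA evolution path is settled: **Phase-1 goes encoder-only first**
 - The PostgreSQL master-data and feature-registry DDL has landed as the recommended schema
 - The Phase-1 implementation checklist and the model/feature/reference set initialization guide are complete
+- The Phase-1 `model_registry / feature_set_registry / reference_set_registry` bootstrap has been verified against the live `acr_test` schema

 The most important next step is not more design writing, but: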

@@ -180,6 +181,7 @@ sed -n '1,320p' acr-engine/sql/acr_pg_schema_v2.sql
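 - Code has been pushed to the remote
 - The PostgreSQL `acr_test` live path has been re-verified: all three lineage triggers (`recording` / `audio_window` / `audio_embedding`) have real negative-case evidence
 - Mechanical checks are in place: `py_compile` passes for `live_pgvector_music20_eval.py`, and `diff --check` passes for the related changes
+- The Phase-1 registry bootstrap has been written to the live PostgreSQL `acr_test` schema: `chromaprint / mert / muq / ecapa` + 5 feature sets + `phase1_hot_reference_v1`

 ### Not verified / remaining gaps
 - **MERT / MuQ encoder-only feature extraction has not actually been run**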