ACR Docs Overview
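The docs directory currently keeps only the documents directly related to the song-centric design and the 4-table fused schema.
1. Where to start
Suggested reading order for newcomers taking over: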
- start-here.md
- session-handoff.md
- postgresql-data-model.md
- postgres_db_schema_samples.md
- CHANGELOG.md
2. Current default design
Logical semantics:
song -> asset -> window -> fingerprint / embedding
Physical tables:
media_entity -> audio_object -> feature_fact -> set_membership
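Core goals:
- The final result always returns a stable song_id
- Multiple audio files are allowed under the same song
- A window is the smallest unit for slicing, evidence, and recall
- feature_fact carries both the exact lane and the semantic lane
- Phase-1 reuses an open-source encoder directly, with no training or fine-tuning first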
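The logical chain above (song -> asset -> window -> fingerprint / embedding) can be sketched as plain Python dataclasses. This is only an illustration of the hierarchy and of why a hit on any window can resolve to one stable song_id; the class names, field names (`start_ms`, `payload`, `path`) and the `resolve_song_id` helper are hypothetical and are not taken from the actual schema or codebase.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Feature:
    lane: str       # "exact" (fingerprint) or "semantic" (embedding)
    payload: bytes  # hypothetical: opaque fingerprint hash or packed vector


@dataclass
class Window:
    window_id: str
    start_ms: int   # hypothetical slice boundaries
    end_ms: int
    features: List[Feature] = field(default_factory=list)


@dataclass
class Asset:
    asset_id: str
    path: str       # one audio file belonging to a song
    windows: List[Window] = field(default_factory=list)


@dataclass
class Song:
    song_id: str
    assets: List[Asset] = field(default_factory=list)


def resolve_song_id(songs: List[Song], window_id: str) -> Optional[str]:
    """Walk song -> asset -> window and return the owning song_id, if any."""
    for song in songs:
        for asset in song.assets:
            for window in asset.windows:
                if window.window_id == window_id:
                    return song.song_id
    return None


# One song with two audio files: a hit on either file's window
# resolves to the same song_id.
demo = Song(
    song_id="song-1",
    assets=[
        Asset("asset-a", "a.wav", [Window("w1", 0, 5000, [Feature("exact", b"\x01")])]),
        Asset("asset-b", "b.mp3", [Window("w2", 0, 5000, [Feature("semantic", b"\x02")])]),
    ],
)
```

In the real system this lookup would presumably be a join across the physical tables rather than an in-memory walk; the sketch only shows the ownership direction.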
3. One-command verification of the main pipeline
cd /workspace
/usr/local/miniconda3/bin/python acr-engine/scripts/run_songcentric_directory_pipeline_live.py \
--dsn 'postgres://d2:d2pass@127.0.0.1:5432/d2' \
--schema acr_songcentric_test \
--input-root acr-engine/data/songcentric_builder_smoke \
--output-dir acr-engine/data/pgvector_eval/music20
Wrapper script:
acr-engine/scripts/start_songcentric_shortest_path.sh 'postgres://d2:d2pass@127.0.0.1:5432/d2'
Current fresh evidence:
- song_count = 2
- asset_count = 2
- window_count = 5
- matcher_fingerprint_count = 5
- fallback_fingerprint_count = 0
- semantic_runtime_available = false
- import_counts.feature_fact = 24
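The counts above can be checked mechanically after each run. The following is a minimal sketch, assuming the summary is available as a Python dict with the same keys; how the pipeline actually emits it is not specified here. The two invariants it checks (at least one asset per song, and exactly one fingerprint per window from either the matcher or the fallback) are assumptions inferred from the numbers shown, not documented guarantees. `check_evidence` is a hypothetical helper.

```python
from typing import Dict, List, Union

Value = Union[int, bool]


def check_evidence(summary: Dict[str, Value]) -> List[str]:
    """Return a list of problems; an empty list means the summary looks sane."""
    problems: List[str] = []

    # Basic presence: a successful run should have produced something at each level.
    for key in ("song_count", "asset_count", "window_count"):
        if int(summary.get(key, 0)) <= 0:
            problems.append(f"{key} should be positive")

    # Assumption: every song owns at least one asset.
    if int(summary.get("asset_count", 0)) < int(summary.get("song_count", 0)):
        problems.append("asset_count is smaller than song_count")

    # Assumption: each window gets exactly one fingerprint,
    # produced either by the matcher or by the fallback path.
    fingerprints = int(summary.get("matcher_fingerprint_count", 0)) + int(
        summary.get("fallback_fingerprint_count", 0)
    )
    if fingerprints != int(summary.get("window_count", 0)):
        problems.append("fingerprint total does not match window_count")

    return problems


# The values reported in this document.
evidence = {
    "song_count": 2,
    "asset_count": 2,
    "window_count": 5,
    "matcher_fingerprint_count": 5,
    "fallback_fingerprint_count": 0,
    "semantic_runtime_available": False,
    "import_counts.feature_fact": 24,
}
```

Calling `check_evidence(evidence)` on the values above returns an empty list under these assumed invariants.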
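4. What each retained document covers
- start-here.md: 10-minute onboarding entry point for newcomers
- session-handoff.md: where to pick up at the start of the next session
- postgresql-data-model.md: table design, field semantics, flow diagrams, and design trade-offs
- postgres_db_schema_samples.md: DDL, sample data, typical SQL, and the import/query path
- CHANGELOG.md: change history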
5. Docs maintenance command
/usr/local/miniconda3/bin/python /workspace/scripts/check_markdown_links.py --root /workspace/docs
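The behaviour of check_markdown_links.py is not described in this overview. As a rough illustration of what a relative-link check of this kind might do, here is a minimal sketch; `find_broken_links` and the link regex are hypothetical and are not the script's actual implementation. It only handles inline `[text](target)` links and skips external URLs and in-page anchors.

```python
import re
from pathlib import Path
from typing import List, Tuple

# Inline Markdown links: [text](target). Reference-style links are not handled.
LINK_RE = re.compile(r"\[[^\]]*\]\(([^)\s]+)\)")
# Anything starting with a URI scheme (https:, mailto:, ...) is treated as external.
SCHEME_RE = re.compile(r"^[a-zA-Z][a-zA-Z0-9+.-]*:")


def find_broken_links(root: Path) -> List[Tuple[Path, str]]:
    """Return (file, target) pairs whose relative link target does not exist."""
    broken: List[Tuple[Path, str]] = []
    for md_file in sorted(root.rglob("*.md")):
        text = md_file.read_text(encoding="utf-8")
        for target in LINK_RE.findall(text):
            if SCHEME_RE.match(target) or target.startswith("#"):
                continue  # skip external URLs and in-page anchors
            relative = target.split("#", 1)[0]  # drop any trailing #fragment
            if not (md_file.parent / relative).exists():
                broken.append((md_file, target))
    return broken
```

A call such as `find_broken_links(Path("/workspace/docs"))` would then list every file and target that fails to resolve on disk.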