# ACR Docs Overview
Only the documents directly related to the song-centric, fusion-first ACR design are kept here for now.
## 0. Start here if you are new
To continue the song-centric mainline, run this first:
```bash
cd /workspace
/usr/local/miniconda3/bin/python acr-engine/scripts/run_songcentric_directory_pipeline_live.py \
  --dsn 'postgres://d2:d2pass@127.0.0.1:5432/d2' \
  --schema acr_songcentric_test \
  --input-root acr-engine/data/songcentric_builder_smoke \
  --output-dir acr-engine/data/pgvector_eval/music20
```
To regression-test the legacy planner/worker contract, run this next:
```bash
cd /workspace/acr-engine
/usr/local/miniconda3/bin/python scripts/run_planner_validation_commands_live.py \
  --dsn 'postgres://d2:d2pass@127.0.0.1:5432/d2' \
  --output data/pgvector_eval/music20/planner_validation_commands_runner_report.json
```
Alternatively, use the wrapper script: `acr-engine/scripts/start_phase1_shortest_path.sh 'postgres://d2:d2pass@127.0.0.1:5432/d2'`
Current fresh evidence:
- `executed_count = 4`
- `all_passed = true`
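To re-check this evidence after a run, a small helper can read the runner report. This is a minimal sketch that assumes (unverified) the report written by the planner validation command is a JSON object with top-level `executed_count` and `all_passed` keys; the function name `evidence_ok` is hypothetical and not part of the repository.

```python
import json
from pathlib import Path


def evidence_ok(report_path, expected_count=4):
    """Return True if the report shows the expected run count and a full pass.

    Assumption: the report is a JSON object with top-level
    "executed_count" (int) and "all_passed" (bool) keys.
    """
    data = json.loads(Path(report_path).read_text(encoding="utf-8"))
    return (
        data.get("executed_count") == expected_count
        and data.get("all_passed") is True
    )
```

From `/workspace`, a call would look like `evidence_ok("acr-engine/data/pgvector_eval/music20/planner_validation_commands_runner_report.json")`; adjust the keys if the real report nests them differently.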
## 1. Current default design baseline
By default, Phase-1 is currently understood as:
`song -> asset -> window -> fingerprint / embedding`
The corresponding fusion-first physical tables are:
`media_entity -> audio_object -> feature_fact -> set_membership`
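The logical chain can be pictured as a minimal in-memory Python sketch. All class and field names here (`Song`, `Asset`, `Window`, `Feature`, `start_ms`, `value`, `resolve_song_id`) are hypothetical illustrations, not the actual columns described in postgresql-data-model.md; the only grounded points are the layer order and that one feature record carries both fingerprints and embeddings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Song:
    song_id: str


@dataclass(frozen=True)
class Asset:
    asset_id: str
    song: Song  # one song may own several audio assets


@dataclass(frozen=True)
class Window:
    asset: Asset
    start_ms: int  # offset of the window inside its asset
    end_ms: int


@dataclass(frozen=True)
class Feature:
    window: Window
    kind: str      # "fingerprint" or "embedding": one type holds both kinds
    value: tuple   # hash codes or vector components

    def __post_init__(self):
        # Reject kinds other than the two this design names.
        if self.kind not in ("fingerprint", "embedding"):
            raise ValueError(f"unknown feature kind: {self.kind}")


def resolve_song_id(feature: Feature) -> str:
    """Walk a feature back up the chain to the song it belongs to."""
    return feature.window.asset.song.song_id
```

Keeping `window` as its own layer is what lets a match carry an offset back to the reader, while `resolve_song_id` shows why the final answer can still collapse to a single `song_id`.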
## 2. Required reading
- start-here.md
- session-handoff.md
- acr-architecture.md
- postgresql-data-model.md
- phase1-implementation-checklist.md
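## 3. Implementation documents
- postgresql-data-model.md: the only current default data model; covers how slicing windows, models, and features are written to tables, with flowcharts
- postgres_db_schema_samples.md: PostgreSQL storage samples
- model-feature-registry-bootstrap.md: initialization of models, features, and reference sets
- phase1-worker-contract.md: contract for workers, jobs, and failure semantics
- phase1-implementation-checklist.md: Phase-1 implementation checklist
- production-encoder-freeze-and-embedding-strategy.md: encoder-only freeze strategy
- sota-evolution-guide.md: current SOTA evolution roadmap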
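## 4. Current stable conclusions
- For now, the final attribution only needs to return a stable `song_id`.
- A single `song` may have multiple audio files.
- `recording`/`version` is not yet a required return object.
- `window` is still kept, because it is the smallest unit for evidence, offsets, and retrieval.
- `feature_fact` stores both `fingerprint` and `embedding` in one table.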
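These conclusions imply that window-level evidence from different audio files of the same song collapses into one `song_id`. The following is a minimal sketch of that roll-up, assuming a hypothetical hit format of `(asset_id, offset_ms, kind, score)` tuples and a plain score sum as a stand-in; it is not the project's actual fusion logic, and `fuse_to_song` is a hypothetical name.

```python
from collections import defaultdict


def fuse_to_song(hits, asset_to_song):
    """Aggregate window-level hits into a single song_id.

    hits: iterable of (asset_id, offset_ms, kind, score) tuples, where kind
          is "fingerprint" or "embedding" (hypothetical hit format).
    asset_to_song: dict mapping asset_id -> song_id; several assets may map
          to the same song_id.
    Returns (song_id, total_score, evidence) or None when there are no hits.
    """
    totals = defaultdict(float)
    evidence = defaultdict(list)
    for asset_id, offset_ms, kind, score in hits:
        song_id = asset_to_song[asset_id]
        totals[song_id] += score  # naive sum; a stand-in for real fusion
        # Keep window-level evidence (asset, offset, feature kind) per song.
        evidence[song_id].append((asset_id, offset_ms, kind))
    if not totals:
        return None
    best = max(totals, key=totals.get)
    return best, totals[best], evidence[best]
```

Only `song_id` is returned as the answer, while the per-window evidence list preserves offsets without making `recording`/`version` part of the contract.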
## 5. Documentation maintenance commands
```bash
/usr/local/miniconda3/bin/python scripts/check_markdown_links.py --root docs
```
By default, it skips historical archive documents such as CHANGELOG.md.
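For orientation, this is a minimal sketch of the kind of relative-link check such a script might perform. It is not the logic of scripts/check_markdown_links.py; the function name `find_broken_links`, the regex, and the skip set are assumptions, and only the CHANGELOG.md exclusion comes from this README.

```python
import re
from pathlib import Path

# Inline links [text](target) and images; reference-style links are not handled.
LINK_RE = re.compile(r"\[[^\]]*\]\(([^)\s]+)[^)]*\)")
SKIP_NAMES = {"CHANGELOG.md"}  # historical archives are skipped


def find_broken_links(root):
    """Return (file, target) pairs whose relative target does not exist."""
    broken = []
    for md in sorted(Path(root).rglob("*.md")):
        if md.name in SKIP_NAMES:
            continue
        for target in LINK_RE.findall(md.read_text(encoding="utf-8")):
            if target.startswith(("http://", "https://", "mailto:", "#")):
                continue  # external links and in-page anchors are not checked
            path_part = target.split("#", 1)[0]  # drop any #section suffix
            if not (md.parent / path_part).exists():
                broken.append((str(md), target))
    return broken
```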