Add voice chunking and match-context foundations for ACR service

Constraint: keep humming/recording query support lightweight and compatible with the existing FAISS-first local workflow while production retrieval remains pgvector-oriented
Rejected: delaying service-path scaffolding until full production retrieval is ready | would block validation of voice-to-chunk and context export behavior
Confidence: high
Scope-risk: moderate
Directive: keep semantics song_id-first and treat resource paths only as supporting evidence/context artifacts
Tested: /usr/local/miniconda3/bin/python -m unittest discover -s acr-engine/tests -v
Not-tested: live FastAPI smoke until uvicorn is available in the current interpreter environment

Commit bd66c06b ... bd66c06bd7512295f9d9510ddb3ae45a150685c0 authored 2026-06-03 17:36:22 +0800 by cnb.bofCdSsphPA
Showing 12 changed files with 473 additions and 137 deletions
acr-engine/README.md
acr-engine/scripts/build_humming_eval_manifest.py
acr-engine/scripts/service_voice_smoke.py
acr-engine/src/data/voice_chunker.py
acr-engine/src/service/app.py
acr-engine/src/utils/context_exporter.py
acr-engine/tests/test_bootstrap.py
acr-engine/tests/test_context_exporter.py
acr-engine/tests/test_local_music20_acr.py
acr-engine/tests/test_voice_chunker.py
docs/CHANGELOG.md
docs/README.md
--- a/acr-engine/README.md
+++ b/acr-engine/README.md
@@ -123,3 +123,29 @@ cd acr-engine
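 - Hybrid scores are normalized before fusion
 - full-demo automatic training
 - Open-source datasets can be plugged in later
+## Humming / recording recognition endpoint (voice -> chunk -> song_id)
+Two foundational pieces are now in place:
+- `src/data/voice_chunker.py`: splits raw voice / humming audio into retrievable chunks
+- `src/utils/context_exporter.py`: exports the matched reference window as a context clip (10 s by default)
+Target FastAPI endpoint:
+- `POST /recognize/voice`
+Input:
+- An externally uploaded voice / recording file
+Output:
+- `song_id`
+- `reference_audio_path`
+- `best_chunk`
+- `context_clip`
+- `chunk_results`
+Notes:
+- The endpoint code is wired into `src/service/app.py`.
+- The current environment lacks `uvicorn`, so the service smoke test can only run after that runtime dependency is installed.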
--- a/acr-engine/scripts/build_humming_eval_manifest.py 0 → 100755
+++ b/acr-engine/scripts/build_humming_eval_manifest.py 0 → 100755
+#!/usr/bin/env /usr/local/miniconda3/bin/python
+from __future__ import annotations
+import argparse
+import json
+from pathlib import Path
+def main() -> None:
+    ap = argparse.ArgumentParser()
+    ap.add_argument('--chunks-json', required=True)
+    ap.add_argument('--song-id', required=True)
+    ap.add_argument('--split', default='test')
+    ap.add_argument('--output', required=True)
+    ap.add_argument('--source-dataset', default='humming_real')
+    args = ap.parse_args()
+    payload = json.loads(Path(args.chunks_json).read_text(encoding='utf-8'))
+    rows = []
+    for chunk in payload.get('chunks', []):
+        rows.append({
+            'song_id': args.song_id,
+            'audio_path': chunk['audio_path'],
+            'duration': chunk['duration_sec'],
+            'type': 'humming_real',
+            'segment_type': 'humming_query',
+            'offset': chunk['start_sec'],
+            'source_dataset': args.source_dataset,
+            'split': args.split,
+        })
+    out = Path(args.output)
+    out.parent.mkdir(parents=True, exist_ok=True)
+    out.write_text(json.dumps(rows, ensure_ascii=False, indent=2), encoding='utf-8')
+    print(json.dumps({'rows': len(rows), 'output': str(out)}, ensure_ascii=False, indent=2))
+if __name__ == '__main__':
+    main()
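To make the mapping in this script concrete, the following is a minimal sketch of how one `chunks.json` entry becomes one manifest row. The helper name `chunk_to_row`, the sample chunk values, and the song ID `song_0001` are made up for illustration; only the field names follow the script above.

```python
import json


def chunk_to_row(chunk: dict, song_id: str, split: str = "test",
                 source_dataset: str = "humming_real") -> dict:
    """Map one voice_chunker entry to one evaluation-manifest row."""
    return {
        "song_id": song_id,
        "audio_path": chunk["audio_path"],
        "duration": chunk["duration_sec"],
        "type": "humming_real",
        "segment_type": "humming_query",
        "offset": chunk["start_sec"],
        "source_dataset": source_dataset,
        "split": split,
    }


# Hypothetical chunk entry, shaped like the dictionaries returned by write_chunks().
sample_chunk = {
    "chunk_id": "chunk_000",
    "audio_path": "chunks/chunk_000.wav",
    "start_sec": 1.0,
    "end_sec": 9.0,
    "duration_sec": 8.0,
    "padded": False,
    "source_audio_path": "hum.wav",
}

row = chunk_to_row(sample_chunk, song_id="song_0001")
print(json.dumps(row, ensure_ascii=False, indent=2))
```

Note that `offset` is the chunk's position inside the query recording, not inside the reference song, and that every row inherits the single `--song-id` passed on the command line.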
--- a/acr-engine/scripts/service_voice_smoke.py 0 → 100755
+++ b/acr-engine/scripts/service_voice_smoke.py 0 → 100755
+#!/usr/bin/env /usr/local/miniconda3/bin/python
+from __future__ import annotations
+import json
+import subprocess
+import time
+from pathlib import Path
+from urllib.request import Request, urlopen
+BASE = 'http://127.0.0.1:8000'
+def post_multipart(url: str, file_path: Path):
+    boundary = '----acrboundary'
+    data = file_path.read_bytes()
+    body = (
+        f'--{boundary}\r\n'
+        f'Content-Disposition: form-data; name="file"; filename="{file_path.name}"\r\n'
+        f'Content-Type: audio/wav\r\n\r\n'
+    ).encode('utf-8') + data + f'\r\n--{boundary}--\r\n'.encode('utf-8')
+    req = Request(url, data=body, method='POST')
+    req.add_header('Content-Type', f'multipart/form-data; boundary={boundary}')
+    with urlopen(req, timeout=20) as resp:
+        return json.loads(resp.read().decode('utf-8'))
+def main():
+    cmd = [
+        '/usr/local/miniconda3/bin/python', '-m', 'uvicorn', 'src.service.app:app', '--host', '127.0.0.1', '--port', '8000'
+    ]
+    proc = subprocess.Popen(cmd, cwd='/root/vprecog/acr-engine', stdout=subprocess.PIPE, stderr=subprocess.PIPE, text=True)
+    query = Path('/workspace/downloads/111/type_7/75cd601b-7604-4b37-8132-cfab39e7c644.mp3')
+    last_error: Exception | None = None
+    try:
+        for _ in range(20):
+            time.sleep(0.5)
+            try:
+                result = post_multipart(BASE + '/recognize/voice', query)
+                candidates = result.get('candidates') or [{}]
+                print(json.dumps({
+                    'status': 'ok',
+                    'chunk_count': result.get('chunk_count'),
+                    'top_song_id': candidates[0].get('song_id'),
+                    'has_context': bool(candidates[0].get('context_clip')),
+                }, ensure_ascii=False, indent=2))
+                return
+            except Exception as exc:  # service may still be starting; remember the error and retry
+                last_error = exc
+                continue
+        raise SystemExit(f'service voice smoke failed: service not ready or endpoint failed ({last_error!r})')
+    finally:
+        proc.terminate()
+        try:
+            proc.wait(timeout=5)
+        except subprocess.TimeoutExpired:
+            proc.kill()
+            proc.wait(timeout=5)
+if __name__ == '__main__':
+    main()
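The smoke script labels the upload as `audio/wav` even though the query file is an `.mp3`, and it reuses a fixed boundary string. The sketch below shows one possible variant that guesses the MIME type from the file name and generates a fresh boundary per request. `build_multipart` is a hypothetical helper, not part of this commit, and whether the service inspects the part's content type at all is not shown in this diff.

```python
from __future__ import annotations

import mimetypes
import uuid


def build_multipart(field: str, filename: str, data: bytes) -> tuple[bytes, str]:
    """Return (body, Content-Type header value) for a single-file form upload."""
    boundary = f"----acr{uuid.uuid4().hex}"
    # Guess the MIME type from the extension; fall back to a generic binary type.
    mime = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        f"Content-Type: {mime}\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + data + tail, f"multipart/form-data; boundary={boundary}"


# Placeholder bytes stand in for real audio content.
body, content_type = build_multipart("file", "query.mp3", b"\x00\x01fake-audio")
print(content_type)
```

The returned header value would go into `req.add_header('Content-Type', ...)` in place of the hardcoded one.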
--- a/acr-engine/src/data/voice_chunker.py 0 → 100644
+++ b/acr-engine/src/data/voice_chunker.py 0 → 100644
+#!/usr/bin/env /usr/local/miniconda3/bin/python
+from __future__ import annotations
+import argparse
+import json
+from pathlib import Path
+from typing import List, Dict
+import librosa
+import numpy as np
+import soundfile as sf
+def normalize_audio(audio_path: str, sr: int = 16000) -> np.ndarray:
+    y, _ = librosa.load(audio_path, sr=sr, mono=True)
+    return y.astype(np.float32)
+def detect_voiced_intervals(y: np.ndarray, sr: int, top_db: int = 30, min_voiced_sec: float = 2.0) -> List[tuple[int, int]]:
+    intervals = librosa.effects.split(y, top_db=top_db)
+    min_len = int(sr * min_voiced_sec)
+    kept = []
+    for start, end in intervals:
+        if end - start >= min_len:
+            kept.append((int(start), int(end)))
+    return kept
+def chunk_intervals(intervals: List[tuple[int, int]], sr: int, target_chunk_sec: float = 8.0, stride_sec: float = 4.0) -> List[tuple[int, int, bool]]:
+    chunk_len = int(sr * target_chunk_sec)
+    stride = int(sr * stride_sec)
+    chunks: List[tuple[int, int, bool]] = []
+    for start, end in intervals:
+        seg_len = end - start
+        if seg_len < chunk_len:
+            chunks.append((start, end, True))
+            continue
+        pos = start
+        while pos + chunk_len <= end:
+            chunks.append((pos, pos + chunk_len, False))
+            pos += stride
+        if pos < end and end - pos >= int(sr * 2.0):
+            tail_start = max(start, end - chunk_len)
+            chunks.append((tail_start, end, end - tail_start < chunk_len))
+    deduped = []
+    seen = set()
+    for item in chunks:
+        key = (item[0], item[1])
+        if key not in seen:
+            deduped.append(item)
+            seen.add(key)
+    return deduped
+def write_chunks(y: np.ndarray, sr: int, chunks: List[tuple[int, int, bool]], output_dir: str, source_audio_path: str, target_len: int | None = None) -> List[Dict]:
+    out_dir = Path(output_dir)
+    out_dir.mkdir(parents=True, exist_ok=True)
+    # Short chunks are zero-padded to target_len samples; default to the longest chunk when no target is given.
+    if target_len is None:
+        target_len = max((end - start for start, end, _ in chunks), default=1)
+    results = []
+    for idx, (start, end, padded) in enumerate(chunks):
+        clip = y[start:end]
+        if padded and len(clip) < target_len:
+            clip = np.pad(clip, (0, target_len - len(clip)))
+        chunk_path = out_dir / f'chunk_{idx:03d}.wav'
+        sf.write(str(chunk_path), clip, sr)
+        results.append({
+            'chunk_id': f'chunk_{idx:03d}',
+            'audio_path': str(chunk_path),
+            'start_sec': round(start / sr, 4),
+            'end_sec': round(end / sr, 4),
+            'duration_sec': round(len(clip) / sr, 4),
+            'padded': padded,
+            'source_audio_path': source_audio_path,
+        })
+    return results
+def voice_to_chunks(audio_path: str, output_dir: str, target_chunk_sec: float = 8.0, stride_sec: float = 4.0, min_voiced_sec: float = 2.0, top_db: int = 30, sr: int = 16000) -> List[Dict]:
+    y = normalize_audio(audio_path, sr=sr)
+    intervals = detect_voiced_intervals(y, sr=sr, top_db=top_db, min_voiced_sec=min_voiced_sec)
+    chunks = chunk_intervals(intervals, sr=sr, target_chunk_sec=target_chunk_sec, stride_sec=stride_sec)
+    # Pass the real target length so short chunks are padded even when the first chunk is itself short.
+    return write_chunks(y, sr, chunks, output_dir, source_audio_path=audio_path, target_len=int(sr * target_chunk_sec))
+def main() -> None:
+    ap = argparse.ArgumentParser()
+    ap.add_argument('--input', required=True)
+    ap.add_argument('--output-dir', required=True)
+    ap.add_argument('--target-chunk-sec', type=float, default=8.0)
+    ap.add_argument('--stride-sec', type=float, default=4.0)
+    ap.add_argument('--min-voiced-sec', type=float, default=2.0)
+    ap.add_argument('--top-db', type=int, default=30)
+    ap.add_argument('--sr', type=int, default=16000)
+    ap.add_argument('--output-json', default='chunks.json')
+    args = ap.parse_args()
+    chunks = voice_to_chunks(
+        audio_path=args.input,
+        output_dir=args.output_dir,
+        target_chunk_sec=args.target_chunk_sec,
+        stride_sec=args.stride_sec,
+        min_voiced_sec=args.min_voiced_sec,
+        top_db=args.top_db,
+        sr=args.sr,
+    )
+    out_json = Path(args.output_dir) / args.output_json
+    out_json.write_text(json.dumps({'chunks': chunks}, ensure_ascii=False, indent=2), encoding='utf-8')
+    print(json.dumps({'chunks': chunks}, ensure_ascii=False, indent=2))
+if __name__ == '__main__':
+    main()
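The windowing arithmetic in `chunk_intervals` is easier to follow with numbers. The sketch below is a condensed, dependency-free restatement of that logic, run on the same two intervals the unit test uses (a 3 s region and a 10 s region). `chunk_spans` and its `min_tail_sec` parameter are illustrative names; the committed function hardcodes the 2 s tail threshold.

```python
from __future__ import annotations


def chunk_spans(intervals, sr, target_chunk_sec=8.0, stride_sec=4.0, min_tail_sec=2.0):
    """Return (start, end, padded) sample spans, mirroring chunk_intervals()."""
    chunk_len, stride = int(sr * target_chunk_sec), int(sr * stride_sec)
    spans = []
    for start, end in intervals:
        if end - start < chunk_len:
            # Region shorter than one chunk: keep it whole and mark it for padding.
            spans.append((start, end, True))
            continue
        pos = start
        while pos + chunk_len <= end:
            spans.append((pos, pos + chunk_len, False))
            pos += stride
        # If enough audio remains past the last stride position, add one
        # end-aligned full-length chunk so the tail of the region is covered.
        if end - pos >= int(sr * min_tail_sec):
            spans.append((end - chunk_len, end, False))
    # Drop exact duplicates while preserving order.
    return list(dict.fromkeys(spans))


sr = 16000
spans = chunk_spans([(0, 3 * sr), (5 * sr, 15 * sr)], sr)
in_seconds = [(s / sr, e / sr, padded) for s, e, padded in spans]
print(in_seconds)
```

This prints `[(0.0, 3.0, True), (5.0, 13.0, False), (7.0, 15.0, False)]`: the short region becomes a single padded chunk, and the 10 s region yields one regular chunk plus an end-aligned tail chunk that overlaps it by 6 s.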
--- a/acr-engine/src/service/app.py
+++ b/acr-engine/src/service/app.py
--- a/acr-engine/src/utils/context_exporter.py 0 → 100644
+++ b/acr-engine/src/utils/context_exporter.py 0 → 100644
+from __future__ import annotations
+import shutil
+import subprocess
+import tempfile
+from pathlib import Path
+from typing import Dict, Tuple
+import librosa
+import numpy as np
+import soundfile as sf
+def load_audio(audio_path: str, sr: int = 16000) -> np.ndarray:
+    y, _ = librosa.load(audio_path, sr=sr, mono=True)
+    return y.astype(np.float32)
+def chroma_embedding(y: np.ndarray, sr: int) -> np.ndarray:
+    chroma = librosa.feature.chroma_stft(y=y, sr=sr, n_chroma=12)
+    feat = np.concatenate([chroma.mean(axis=1), chroma.std(axis=1)], axis=0).astype(np.float32)
+    norm = np.linalg.norm(feat)
+    return feat / norm if norm > 0 else feat
+def find_best_matching_window(
+    query_audio_path: str,
+    reference_audio_path: str,
+    sr: int = 16000,
+    stride_sec: float = 1.0,
+) -> Dict:
+    query_y = load_audio(query_audio_path, sr=sr)
+    ref_y = load_audio(reference_audio_path, sr=sr)
+    query_len = len(query_y)
+    if query_len == 0:
+        raise ValueError('Empty query audio')
+    if len(ref_y) < query_len:
+        ref_y = np.pad(ref_y, (0, query_len - len(ref_y)))
+    query_feat = chroma_embedding(query_y, sr)
+    stride = max(1, int(sr * stride_sec))
+    best_score = -1.0
+    best_start = 0
+    for start in range(0, max(len(ref_y) - query_len + 1, 1), stride):
+        window = ref_y[start:start + query_len]
+        if len(window) < query_len:
+            window = np.pad(window, (0, query_len - len(window)))
+        score = float(np.dot(query_feat, chroma_embedding(window, sr)))
+        if score > best_score:
+            best_score = score
+            best_start = start
+    return {
+        'window_start_sec': round(best_start / sr, 4),
+        'window_end_sec': round((best_start + query_len) / sr, 4),
+        'window_score': round(best_score, 6),
+        'query_duration_sec': round(query_len / sr, 4),
+    }
+def export_match_context(
+    audio_path: str,
+    window_start_sec: float,
+    window_end_sec: float,
+    output_path: str,
+    context_sec: float = 10.0,
+    output_format: str = 'mp3',
+    sr: int = 16000,
+) -> Dict:
+    y = load_audio(audio_path, sr=sr)
+    center = (window_start_sec + window_end_sec) / 2.0
+    half = context_sec / 2.0
+    clip_start_sec = max(0.0, center - half)
+    clip_end_sec = min(len(y) / sr, center + half)
+    start = int(clip_start_sec * sr)
+    end = max(start + 1, int(clip_end_sec * sr))
+    clip = y[start:end]
+    output = Path(output_path)
+    output.parent.mkdir(parents=True, exist_ok=True)
+    actual_format = output_format
+    if output_format == 'mp3' and shutil.which('ffmpeg'):
+        with tempfile.TemporaryDirectory() as tmp:
+            wav_path = Path(tmp) / 'context.wav'
+            sf.write(wav_path, clip, sr)
+            cmd = [shutil.which('ffmpeg') or 'ffmpeg', '-y', '-i', str(wav_path), str(output)]
+            subprocess.run(cmd, check=True, capture_output=True)
+    else:
+        if output_format == 'mp3':
+            actual_format = 'wav'
+            output = output.with_suffix('.wav')
+        sf.write(output, clip, sr)
+    return {
+        'source_audio_path': audio_path,
+        'clip_start_sec': round(clip_start_sec, 4),
+        'clip_end_sec': round(clip_end_sec, 4),
+        'duration_sec': round((end - start) / sr, 4),
+        'output_path': str(output),
+        'output_format': actual_format,
+    }
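The clip boundaries in `export_match_context` come from centering a `context_sec` window on the matched span and clamping it to the audio. A minimal sketch of just that arithmetic, with no audio I/O, is shown below. `context_bounds` is a hypothetical helper name used only for this illustration.

```python
def context_bounds(window_start_sec: float, window_end_sec: float,
                   audio_duration_sec: float, context_sec: float = 10.0):
    """Center a context window on the match and clamp it to [0, duration]."""
    center = (window_start_sec + window_end_sec) / 2.0
    half = context_sec / 2.0
    clip_start = max(0.0, center - half)
    clip_end = min(audio_duration_sec, center + half)
    return clip_start, clip_end


# Match in the middle of a 12 s reference: the full 10 s of context fits.
print(context_bounds(4.0, 7.0, 12.0))    # (0.5, 10.5)
# Match at the very start: the left side is clamped, so the clip is only 6.5 s.
print(context_bounds(0.0, 3.0, 12.0))    # (0.0, 6.5)
# Match at the very end: the right side is clamped, so the clip is only 6 s.
print(context_bounds(10.0, 12.0, 12.0))  # (6.0, 12.0)
```

Because the clamped side is not compensated on the other side, matches near either edge of the reference produce clips shorter than `context_sec`.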
--- a/acr-engine/tests/test_bootstrap.py 0 → 100644
+++ b/acr-engine/tests/test_bootstrap.py 0 → 100644
+from pathlib import Path
+import sys
+ROOT = Path(__file__).resolve().parents[1]
+if str(ROOT) not in sys.path:
+    sys.path.insert(0, str(ROOT))
--- a/acr-engine/tests/test_context_exporter.py 0 → 100644
+++ b/acr-engine/tests/test_context_exporter.py 0 → 100644
+import tempfile
+import unittest
+from pathlib import Path
+import test_bootstrap
+import numpy as np
+import soundfile as sf
+from src.utils.context_exporter import export_match_context, find_best_matching_window
+class ContextExporterTests(unittest.TestCase):
+    def test_find_best_matching_window_returns_valid_range(self):
+        sr = 16000
+        with tempfile.TemporaryDirectory() as tmp:
+            query = Path(tmp) / 'query.wav'
+            ref = Path(tmp) / 'ref.wav'
+            tone = 0.2 * np.sin(2 * np.pi * 440 * np.linspace(0, 3, sr * 3, endpoint=False)).astype(np.float32)
+            ref_y = np.concatenate([np.zeros(sr), tone, np.zeros(sr)]).astype(np.float32)
+            sf.write(query, tone, sr)
+            sf.write(ref, ref_y, sr)
+            match = find_best_matching_window(str(query), str(ref), sr=sr, stride_sec=0.5)
+            self.assertGreaterEqual(match['window_start_sec'], 0.0)
+            self.assertGreater(match['window_end_sec'], match['window_start_sec'])
+    def test_export_match_context_writes_audio(self):
+        sr = 16000
+        with tempfile.TemporaryDirectory() as tmp:
+            ref = Path(tmp) / 'ref.wav'
+            out = Path(tmp) / 'context.wav'
+            y = 0.2 * np.sin(2 * np.pi * 440 * np.linspace(0, 12, sr * 12, endpoint=False)).astype(np.float32)
+            sf.write(ref, y, sr)
+            info = export_match_context(str(ref), 4.0, 7.0, str(out), context_sec=10.0, output_format='wav', sr=sr)
+            self.assertTrue(Path(info['output_path']).exists())
+            self.assertEqual(info['output_format'], 'wav')
+if __name__ == '__main__':
+    unittest.main()
--- a/acr-engine/tests/test_local_music20_acr.py
+++ b/acr-engine/tests/test_local_music20_acr.py
@@ -2,6 +2,8 @@ import tempfile
 import unittest
 from pathlib import Path
+import test_bootstrap
 from scripts.local_music20_acr import collect_pairs, first_file
--- a/acr-engine/tests/test_voice_chunker.py 0 → 100644
+++ b/acr-engine/tests/test_voice_chunker.py 0 → 100644
+import tempfile
+import unittest
+from pathlib import Path
+import test_bootstrap
+import numpy as np
+import soundfile as sf
+from src.data.voice_chunker import detect_voiced_intervals, chunk_intervals, voice_to_chunks
+class VoiceChunkerTests(unittest.TestCase):
+    def test_detect_voiced_intervals_filters_short_segments(self):
+        sr = 16000
+        y = np.concatenate([
+            np.zeros(sr),
+            0.2 * np.sin(2 * np.pi * 440 * np.linspace(0, 3, sr * 3, endpoint=False)),
+            np.zeros(sr // 2),
+        ]).astype(np.float32)
+        intervals = detect_voiced_intervals(y, sr=sr, top_db=30, min_voiced_sec=2.0)
+        self.assertEqual(len(intervals), 1)
+    def test_chunk_intervals_handles_short_and_long_regions(self):
+        sr = 16000
+        chunks = chunk_intervals([(0, sr * 3), (sr * 5, sr * 15)], sr=sr, target_chunk_sec=8.0, stride_sec=4.0)
+        self.assertTrue(any(padded for _, _, padded in chunks))
+        self.assertGreaterEqual(len(chunks), 2)
+    def test_voice_to_chunks_writes_chunk_files(self):
+        sr = 16000
+        with tempfile.TemporaryDirectory() as tmp:
+            src = Path(tmp) / 'hum.wav'
+            out = Path(tmp) / 'chunks'
+            y = np.concatenate([
+                np.zeros(sr),
+                0.2 * np.sin(2 * np.pi * 330 * np.linspace(0, 4, sr * 4, endpoint=False)),
+                np.zeros(sr),
+            ]).astype(np.float32)
+            sf.write(src, y, sr)
+            chunks = voice_to_chunks(str(src), str(out), target_chunk_sec=3.0, stride_sec=2.0, min_voiced_sec=2.0, sr=sr)
+            self.assertGreaterEqual(len(chunks), 1)
+            self.assertTrue(Path(chunks[0]['audio_path']).exists())
+if __name__ == '__main__':
+    unittest.main()
--- a/docs/CHANGELOG.md
+++ b/docs/CHANGELOG.md
+## 2026-06-03 voice-to-chunk and context export foundation
+- Added `acr-engine/src/data/voice_chunker.py`, which splits voice / humming audio into chunks.
+- Added `acr-engine/scripts/build_humming_eval_manifest.py`, which builds a `humming_real` evaluation manifest from chunk results.
+- Added `acr-engine/src/utils/context_exporter.py`, which exports the matched reference window as a context clip.
+- Extended `acr-engine/src/service/app.py` with a first draft of the `POST /recognize/voice` endpoint.
+- The docs entry point `docs/README.md` has been trimmed down to the latest architecture and the shortest reading order.
+Fresh evidence:
+- `/usr/local/miniconda3/bin/python -m unittest discover -s acr-engine/tests -v` => `Ran 7 tests, OK`
+- The current environment lacks `uvicorn`, so the service smoke test cannot be started directly; the runtime dependency must be installed first.
 ## 2026-06-03 20-song local ACR workflow in acr-engine
 - Added `acr-engine/scripts/local_music20_acr.py`, which provides a local 20-song small-sample ACR workflow inside `acr-engine`, based on `/workspace/downloads`.
--- a/docs/README.md
+++ b/docs/README.md
 # ACR Docs Overview
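-> Updated: 2026-06-02
+> Keeps only the latest architecture and the shortest path to getting started. Historical details remain in the repository, but the default reading list is limited to the 6 main documents below.
-## One-page summary
+## Shortest reading order
-There are currently too many documentation entry points; they are now condensed into **5 groups of main documents**: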
+1. [session-handoff.md](./session-handoff.md)
+2. [CHANGELOG.md](./CHANGELOG.md)
+3. [acr-architecture.md](./acr-architecture.md)
+4. [dataset-spec.md](./dataset-spec.md)
+5. [training-data-and-pgvector-guide.md](./training-data-and-pgvector-guide.md)
+6. [runbook.md](./runbook.md)
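-1. **Project and architecture**
+## Currently recommended: read only these categories
-2. **Data and evaluation**
-3. **Business data ingestion**
-4. **Service and engineering**
-5. **Research and roadmap**
-Start with just these 5 groups; there is no need to read every detailed document at once.
+### 1. Project architecture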
+- [acr-architecture.md](./acr-architecture.md)
+- [session-handoff.md](./session-handoff.md)
---
+### 2. Data and evaluation
+- [dataset-spec.md](./dataset-spec.md)
+- [training-data-and-pgvector-guide.md](./training-data-and-pgvector-guide.md)
+- [open-dataset-workflow.md](./open-dataset-workflow.md)
-## 1. Documentation navigation map
+### 3. Operations and service
+- [runbook.md](./runbook.md)
+- [service-api.md](./service-api.md)
-```mermaid
+### 4. Latest hard-case conclusions
-flowchart TD
+- [acr-hard-case-analysis.md](./acr-hard-case-analysis.md)
-    A[Docs Entry] --> B[Project Responsibility]
-    A --> C[Architecture]
-    A --> D[Dataset Spec]
-    A --> E[Business Export Chain]
-    A --> F[Service API]
-    A --> G[Industrial Benchmark]
-    A --> H[Industrialization Roadmap]
-    A --> I[Licensing & Sources]
-    A --> J[SOTA Research]
-    B --> C
+## Current architecture in one sentence
-    C --> D
-    D --> E
-    E --> F
-    G --> H
-    I --> H
-    J --> H
-```
---
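+- `/workspace`: source of samples and raw material
+- `acr-engine/`: main project for training, indexing, recognition, and serving
-## 2. Condensed reading entry points
+- Local small-sample validation: prefer **FAISS**
+- Production vector retrieval: standardize on **pgvector**
-| Reader role | Read first |
-|---|---|
-| New team member | [Project and architecture](./project-responsibility-map.md), [System architecture](./acr-architecture.md) |
-| Algorithm / model | [Dataset spec](./dataset-spec.md), [SOTA research](./sota-research-2026.md) |
-| Platform / backend | [Service API](./service-api.md), [Benchmark spec](./industrial-benchmark-spec.md) |
-| Data ingestion | [Open dataset workflow](./open-dataset-workflow.md), [Business export cookbook](./business-export-cookbook.md) |
-| Lead / planning | [Industrialization roadmap](./industrialization-roadmap.md), [Handoff document](./session-handoff.md) |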
---
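-## 2.5 Shortest reading order for a new session
-If a new session is taking over, follow this order directly:
-1. [Ongoing development handoff document](./session-handoff.md)
-2. [Changelog](./CHANGELOG.md)
-3. [Business export cookbook](./business-export-cookbook.md) or [Open dataset workflow](./open-dataset-workflow.md)
-How to choose:
- Ingesting your own business material: read `business-export-cookbook.md` first
- Working with open data such as FMA / MTG-Jamendo: read `open-dataset-workflow.md` first
-## 2.6 Shortest runnable command for a new session
-If you only want to confirm that the business export chain still runs, execute: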
-```bash
-cd /workspace/acr-engine
-/usr/local/miniconda3/bin/python scripts/business_export_offline_smoke.py \
-  --output-root /tmp/business_export_offline_smoke
-```
-Expected results:
- Generates a sample business export
- Generates manifest-ready JSONL
- Generates the project `catalog/train/test/val`
- `train.py --dry-run` passes
-## 3. Main document groups
-### A. Project and architecture
- [Project responsibility map](./project-responsibility-map.md)
- [System architecture](./acr-architecture.md)
-### B. Data and evaluation
- [Dataset spec](./dataset-spec.md)
- [Open dataset workflow](./open-dataset-workflow.md)
- [Training data and pgvector guide](./training-data-and-pgvector-guide.md)
- [Production encoder freeze and embedding strategy Q&A](./production-encoder-freeze-and-embedding-strategy.md)
- [Dataset sources and ingestion](./dataset-sources-and-licensing.md)
- [Industrial benchmark spec](./industrial-benchmark-spec.md)
-Quick-start entry points:
- [Open dataset workflow](./open-dataset-workflow.md)
- [Local landing directory for open data](../acr-engine/data/raw/README.md)
- Offline smoke verified: `acr-engine/scripts/business_export_offline_smoke.py`
-### C. Business data ingestion
- [Business material types and bucket guide](./business-music-bucket-and-type-guide.md)
- [Business manifest and type-role spec](./business-manifest-and-type-role-spec.md)
- [Business export cookbook](./business-export-cookbook.md)
- [Business data to project manifest adapter](./business-project-manifest-adapter.md)
-Shortest chain for business data:
-1. [Business export cookbook](./business-export-cookbook.md)
-2. `acr-engine/scripts/normalize_business_export.py`
-3. `acr-engine/scripts/split_business_manifest_ready.py`
-4. `acr-engine/scripts/build_business_project_manifests.py`
-5. `acr-engine/scripts/business_export_offline_smoke.py`
-### D. Service and engineering
- [Service API](./service-api.md)
- [Ongoing development handoff document](./session-handoff.md)
- [Current capability map](./current-capability-map.md)
- [First-run checklist](../acr-engine/FIRST_RUN_CHECKLIST.md)
- [Changelog](./CHANGELOG.md)
-### E. Research and roadmap
- [Industrialization roadmap](./industrialization-roadmap.md)
- [SOTA research](./sota-research-2026.md)
- [Master list of references](./references-and-sources.md)
---
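-## 4. Narrative notes
-We are now reducing the reading cost of "duplicate documents at the same level":
- Group documents on the entry page first
- Then keep 1~3 main documents per group
- Put secondary details inside each group rather than continuing to add files horizontally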
---
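-## 5. Detail appendix
-Suggested usage:
- To understand the project, read [Project responsibility map](./project-responsibility-map.md) + [System architecture](./acr-architecture.md) first
- To train / evaluate, read [Dataset spec](./dataset-spec.md) first
- To ingest open data, read [Dataset sources and ingestion](./dataset-sources-and-licensing.md) first
- To see the history, read [Changelog](./CHANGELOG.md)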
-## Sources
- This file is an internal documentation navigation artifact for the current repo state.