Add voice chunking and match-context foundations for ACR service
Constraint: keep humming/recording query support lightweight and compatible with the existing FAISS-first local workflow while production retrieval remains pgvector-oriented
Rejected: delaying service-path scaffolding until full production retrieval is ready | would block validation of voice-to-chunk and context export behavior
Confidence: high
Scope-risk: moderate
Directive: keep semantics song_id-first and treat resource paths only as supporting evidence/context artifacts
Tested: /usr/local/miniconda3/bin/python -m unittest discover -s acr-engine/tests -v
Not-tested: live FastAPI smoke until uvicorn is available in the current interpreter environment
Showing 12 changed files with 473 additions and 137 deletions
| ... | @@ -123,3 +123,29 @@ cd acr-engine | ... | @@ -123,3 +123,29 @@ cd acr-engine |
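| 123 | - Normalize hybrid scores before fusion | 123 | - Normalize hybrid scores before fusion |
| 124 | - full-demo automatic training | 124 | - full-demo automatic training |
| 125 | - Open-source datasets can be integrated later | 125 | - Open-source datasets can be integrated later |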
| 126 | |||
| 127 | |||
| 128 | ## Humming / recording recognition endpoint (voice -> chunk -> song_id) | ||
| 129 | |||
| 130 | Two foundational capabilities are now in place: | ||
| 131 | |||
| 132 | - `src/data/voice_chunker.py`: splits raw voice / humming audio into retrievable chunks | ||
| 133 | - `src/utils/context_exporter.py`: exports the matched reference window as a context clip (10 s by default) | ||
| 134 | |||
| 135 | FastAPI target endpoint: | ||
| 136 | |||
| 137 | - `POST /recognize/voice` | ||
| 138 | |||
| 139 | Input: | ||
| 140 | - An externally uploaded voice / recording file | ||
| 141 | |||
| 142 | Output: | ||
| 143 | - `song_id` | ||
| 144 | - `reference_audio_path` | ||
| 145 | - `best_chunk` | ||
| 146 | - `context_clip` | ||
| 147 | - `chunk_results` | ||
| 148 | |||
| 149 | Notes: | ||
| 150 | - The endpoint code is already wired into `src/service/app.py`. | ||
| 151 | - The current environment still lacks `uvicorn`, so the service smoke test can only run after the runtime dependencies are installed. | ... | ... |
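The `voice -> chunk -> song_id` flow in the README section above implies that per-chunk retrieval hits are reduced to a single `song_id`. The `src/service/app.py` diff is not shown here, so the following is only a minimal sketch under stated assumptions: the helper name `aggregate_chunk_results`, the per-hit keys `chunk_id` and `score`, and the sum-of-scores voting rule are all hypothetical and are not taken from the commit.

```python
from collections import defaultdict
from typing import Dict, List, Optional


def aggregate_chunk_results(chunk_results: List[Dict]) -> Optional[Dict]:
    """Reduce per-chunk top hits to one song_id by summing scores.

    Hypothetical helper: the voting rule and the field names are assumptions.
    """
    if not chunk_results:
        return None
    totals: Dict[str, float] = defaultdict(float)
    for hit in chunk_results:
        totals[hit["song_id"]] += hit["score"]
    # song_id-first: decide on the song before looking at any resource path.
    song_id = max(totals, key=totals.get)
    winners = [h for h in chunk_results if h["song_id"] == song_id]
    best = max(winners, key=lambda h: h["score"])
    return {
        "song_id": song_id,
        # The path is carried along only as supporting evidence.
        "reference_audio_path": best.get("reference_audio_path"),
        "best_chunk": best["chunk_id"],
        "chunk_results": chunk_results,
    }


# Made-up hits for illustration only.
hits = [
    {"chunk_id": "chunk_000", "song_id": "song_a", "score": 0.80},
    {"chunk_id": "chunk_001", "song_id": "song_b", "score": 0.85},
    {"chunk_id": "chunk_002", "song_id": "song_a", "score": 0.70},
]
result = aggregate_chunk_results(hits)
print(result["song_id"], result["best_chunk"])
```

With this rule a song that matches several chunks moderately well can outrank a song that matches one chunk strongly, which is one plausible reading of the song_id-first directive in the commit message.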
| 1 | #!/usr/bin/env /usr/local/miniconda3/bin/python | ||
| 2 | from __future__ import annotations | ||
| 3 | |||
| 4 | import argparse | ||
| 5 | import json | ||
| 6 | from pathlib import Path | ||
| 7 | |||
| 8 | |||
| 9 | def main() -> None: | ||
| 10 | ap = argparse.ArgumentParser() | ||
| 11 | ap.add_argument('--chunks-json', required=True) | ||
| 12 | ap.add_argument('--song-id', required=True) | ||
| 13 | ap.add_argument('--split', default='test') | ||
| 14 | ap.add_argument('--output', required=True) | ||
| 15 | ap.add_argument('--source-dataset', default='humming_real') | ||
| 16 | args = ap.parse_args() | ||
| 17 | |||
| 18 | payload = json.loads(Path(args.chunks_json).read_text(encoding='utf-8')) | ||
| 19 | rows = [] | ||
| 20 | for chunk in payload.get('chunks', []): | ||
| 21 | rows.append({ | ||
| 22 | 'song_id': args.song_id, | ||
| 23 | 'audio_path': chunk['audio_path'], | ||
| 24 | 'duration': chunk['duration_sec'], | ||
| 25 | 'type': 'humming_real', | ||
| 26 | 'segment_type': 'humming_query', | ||
| 27 | 'offset': chunk['start_sec'], | ||
| 28 | 'source_dataset': args.source_dataset, | ||
| 29 | 'split': args.split, | ||
| 30 | }) | ||
| 31 | |||
| 32 | out = Path(args.output) | ||
| 33 | out.parent.mkdir(parents=True, exist_ok=True) | ||
| 34 | out.write_text(json.dumps(rows, ensure_ascii=False, indent=2), encoding='utf-8') | ||
| 35 | print(json.dumps({'rows': len(rows), 'output': str(out)}, ensure_ascii=False, indent=2)) | ||
| 36 | |||
| 37 | |||
| 38 | if __name__ == '__main__': | ||
| 39 | main() |
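The row mapping in the script above can be exercised without touching the filesystem. A minimal sketch, assuming a chunk record shaped like the ones `voice_chunker.py` writes to `chunks.json`; `chunk_to_manifest_row` is a hypothetical helper name and the sample values are made up:

```python
from typing import Any, Dict


def chunk_to_manifest_row(
    chunk: Dict[str, Any],
    song_id: str,
    split: str = "test",
    source_dataset: str = "humming_real",
) -> Dict[str, Any]:
    """Map one chunk record to one manifest row (hypothetical helper)."""
    return {
        "song_id": song_id,
        "audio_path": chunk["audio_path"],
        "duration": chunk["duration_sec"],
        "type": "humming_real",
        "segment_type": "humming_query",
        # The chunk start within the original recording becomes the row offset.
        "offset": chunk["start_sec"],
        "source_dataset": source_dataset,
        "split": split,
    }


# Illustrative chunk record; values are invented for the example.
example_chunk = {
    "chunk_id": "chunk_000",
    "audio_path": "chunks/chunk_000.wav",
    "start_sec": 1.0,
    "end_sec": 9.0,
    "duration_sec": 8.0,
    "padded": False,
}
row = chunk_to_manifest_row(example_chunk, song_id="song_demo")
print(row["offset"], row["duration"], row["split"])
```

Note that every row produced from one `chunks.json` shares the same `song_id`, so the script expects one invocation per labelled recording.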
acr-engine/scripts/service_voice_smoke.py
0 → 100755
| 1 | #!/usr/bin/env /usr/local/miniconda3/bin/python | ||
| 2 | from __future__ import annotations | ||
| 3 | |||
| 4 | import json | ||
| 5 | import subprocess | ||
| 6 | import time | ||
| 7 | from pathlib import Path | ||
| 8 | from urllib.request import Request, urlopen | ||
| 9 | |||
| 10 | BASE = 'http://127.0.0.1:8000' | ||
| 11 | |||
| 12 | |||
| 13 | def post_multipart(url: str, file_path: Path): | ||
| 14 | boundary = '----acrboundary' | ||
| 15 | data = file_path.read_bytes() | ||
| 16 | body = ( | ||
| 17 | f'--{boundary}\r\n' | ||
| 18 | f'Content-Disposition: form-data; name="file"; filename="{file_path.name}"\r\n' | ||
| 19 | f'Content-Type: audio/wav\r\n\r\n' | ||
| 20 | ).encode('utf-8') + data + f'\r\n--{boundary}--\r\n'.encode('utf-8') | ||
| 21 | req = Request(url, data=body, method='POST') | ||
| 22 | req.add_header('Content-Type', f'multipart/form-data; boundary={boundary}') | ||
| 23 | with urlopen(req, timeout=20) as resp: | ||
| 24 | return json.loads(resp.read().decode('utf-8')) | ||
| 25 | |||
| 26 | |||
| 27 | def main(): | ||
| 28 | cmd = [ | ||
| 29 | '/usr/local/miniconda3/bin/python', '-m', 'uvicorn', 'src.service.app:app', '--host', '127.0.0.1', '--port', '8000' | ||
| 30 | ] | ||
| 31 | proc = subprocess.Popen(cmd, cwd='/root/vprecog/acr-engine', stdout=subprocess.PIPE, stderr=subprocess.PIPE, text=True) | ||
| 32 | query = Path('/workspace/downloads/111/type_7/75cd601b-7604-4b37-8132-cfab39e7c644.mp3') | ||
| 33 | try: | ||
| 34 | for _ in range(20): | ||
| 35 | time.sleep(0.5) | ||
| 36 | try: | ||
| 37 | result = post_multipart(BASE + '/recognize/voice', query) | ||
| 38 | print(json.dumps({ | ||
| 39 | 'status': 'ok', | ||
| 40 | 'chunk_count': result.get('chunk_count'), | ||
| 41 | 'top_song_id': result.get('candidates', [{}])[0].get('song_id') if result.get('candidates') else None, | ||
| 42 | 'has_context': bool(result.get('candidates', [{}])[0].get('context_clip')) if result.get('candidates') else False, | ||
| 43 | }, ensure_ascii=False, indent=2)) | ||
| 44 | return | ||
| 45 | except Exception: | ||
| 46 | continue | ||
| 47 | raise SystemExit('service voice smoke failed: service not ready or endpoint failed') | ||
| 48 | finally: | ||
| 49 | proc.terminate() | ||
| 50 | try: | ||
| 51 | proc.wait(timeout=5) | ||
| 52 | except subprocess.TimeoutExpired: | ||
| 53 | proc.kill() | ||
| 54 | proc.wait(timeout=5) | ||
| 55 | |||
| 56 | |||
| 57 | if __name__ == '__main__': | ||
| 58 | main() |
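The smoke script above labels the uploaded part as `audio/wav` even though the query file is an `.mp3`. Whether the endpoint inspects that header is not visible in this diff, so the following is only a sketch of one way to derive the part's content type from the file name using the standard library; `build_multipart` is a hypothetical helper, not part of the commit.

```python
import mimetypes
import uuid
from pathlib import Path
from typing import Tuple


def build_multipart(file_name: str, data: bytes, field: str = "file") -> Tuple[bytes, str]:
    """Return (body, Content-Type header value) for a single-file upload."""
    # A random suffix makes a collision between boundary and payload unlikely.
    boundary = f"----acrboundary{uuid.uuid4().hex}"
    # Guess the MIME type from the extension; fall back to a generic type.
    mime = mimetypes.guess_type(file_name)[0] or "application/octet-stream"
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; '
        f'filename="{Path(file_name).name}"\r\n'
        f"Content-Type: {mime}\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + data + tail, f"multipart/form-data; boundary={boundary}"


body, header = build_multipart("query.mp3", b"\x00\x01fake-audio-bytes")
print(header)
```

The returned header value would be passed to `req.add_header('Content-Type', ...)` in place of the fixed boundary string.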
acr-engine/src/data/voice_chunker.py
0 → 100644
| 1 | #!/usr/bin/env /usr/local/miniconda3/bin/python | ||
| 2 | from __future__ import annotations | ||
| 3 | |||
| 4 | import argparse | ||
| 5 | import json | ||
| 6 | from pathlib import Path | ||
| 7 | from typing import List, Dict | ||
| 8 | |||
| 9 | import librosa | ||
| 10 | import numpy as np | ||
| 11 | import soundfile as sf | ||
| 12 | |||
| 13 | |||
| 14 | def normalize_audio(audio_path: str, sr: int = 16000) -> np.ndarray: | ||
| 15 | y, _ = librosa.load(audio_path, sr=sr, mono=True) | ||
| 16 | return y.astype(np.float32) | ||
| 17 | |||
| 18 | |||
| 19 | def detect_voiced_intervals(y: np.ndarray, sr: int, top_db: int = 30, min_voiced_sec: float = 2.0) -> List[tuple[int, int]]: | ||
| 20 | intervals = librosa.effects.split(y, top_db=top_db) | ||
| 21 | min_len = int(sr * min_voiced_sec) | ||
| 22 | kept = [] | ||
| 23 | for start, end in intervals: | ||
| 24 | if end - start >= min_len: | ||
| 25 | kept.append((int(start), int(end))) | ||
| 26 | return kept | ||
| 27 | |||
| 28 | |||
| 29 | def chunk_intervals(intervals: List[tuple[int, int]], sr: int, target_chunk_sec: float = 8.0, stride_sec: float = 4.0) -> List[tuple[int, int, bool]]: | ||
| 30 | chunk_len = int(sr * target_chunk_sec) | ||
| 31 | stride = int(sr * stride_sec) | ||
| 32 | chunks: List[tuple[int, int, bool]] = [] | ||
| 33 | for start, end in intervals: | ||
| 34 | seg_len = end - start | ||
| 35 | if seg_len < chunk_len: | ||
| 36 | chunks.append((start, end, True)) | ||
| 37 | continue | ||
| 38 | pos = start | ||
| 39 | while pos + chunk_len <= end: | ||
| 40 | chunks.append((pos, pos + chunk_len, False)) | ||
| 41 | pos += stride | ||
| 42 | if pos < end and end - pos >= int(sr * 2.0): | ||
| 43 | tail_start = max(start, end - chunk_len) | ||
| 44 | chunks.append((tail_start, end, end - tail_start < chunk_len)) | ||
| 45 | deduped = [] | ||
| 46 | seen = set() | ||
| 47 | for item in chunks: | ||
| 48 | key = (item[0], item[1]) | ||
| 49 | if key not in seen: | ||
| 50 | deduped.append(item) | ||
| 51 | seen.add(key) | ||
| 52 | return deduped | ||
| 53 | |||
| 54 | |||
| 55 | def write_chunks(y: np.ndarray, sr: int, chunks: List[tuple[int, int, bool]], output_dir: str, source_audio_path: str) -> List[Dict]: | ||
| 56 | out_dir = Path(output_dir) | ||
| 57 | out_dir.mkdir(parents=True, exist_ok=True) | ||
| 58 | chunk_len = None | ||
| 59 | results = [] | ||
| 60 | for idx, (start, end, padded) in enumerate(chunks): | ||
| 61 | clip = y[start:end] | ||
| 62 | if chunk_len is None: | ||
| 63 | chunk_len = max(len(clip), 1) | ||
| 64 | target_len = max(chunk_len, len(clip)) | ||
| 65 | if padded and len(clip) < target_len: | ||
| 66 | clip = np.pad(clip, (0, target_len - len(clip))) | ||
| 67 | chunk_path = out_dir / f'chunk_{idx:03d}.wav' | ||
| 68 | sf.write(str(chunk_path), clip, sr) | ||
| 69 | results.append({ | ||
| 70 | 'chunk_id': f'chunk_{idx:03d}', | ||
| 71 | 'audio_path': str(chunk_path), | ||
| 72 | 'start_sec': round(start / sr, 4), | ||
| 73 | 'end_sec': round(end / sr, 4), | ||
| 74 | 'duration_sec': round(len(clip) / sr, 4), | ||
| 75 | 'padded': padded, | ||
| 76 | 'source_audio_path': source_audio_path, | ||
| 77 | }) | ||
| 78 | return results | ||
| 79 | |||
| 80 | |||
| 81 | def voice_to_chunks(audio_path: str, output_dir: str, target_chunk_sec: float = 8.0, stride_sec: float = 4.0, min_voiced_sec: float = 2.0, top_db: int = 30, sr: int = 16000) -> List[Dict]: | ||
| 82 | y = normalize_audio(audio_path, sr=sr) | ||
| 83 | intervals = detect_voiced_intervals(y, sr=sr, top_db=top_db, min_voiced_sec=min_voiced_sec) | ||
| 84 | chunks = chunk_intervals(intervals, sr=sr, target_chunk_sec=target_chunk_sec, stride_sec=stride_sec) | ||
| 85 | return write_chunks(y, sr, chunks, output_dir, source_audio_path=audio_path) | ||
| 86 | |||
| 87 | |||
| 88 | def main() -> None: | ||
| 89 | ap = argparse.ArgumentParser() | ||
| 90 | ap.add_argument('--input', required=True) | ||
| 91 | ap.add_argument('--output-dir', required=True) | ||
| 92 | ap.add_argument('--target-chunk-sec', type=float, default=8.0) | ||
| 93 | ap.add_argument('--stride-sec', type=float, default=4.0) | ||
| 94 | ap.add_argument('--min-voiced-sec', type=float, default=2.0) | ||
| 95 | ap.add_argument('--top-db', type=int, default=30) | ||
| 96 | ap.add_argument('--sr', type=int, default=16000) | ||
| 97 | ap.add_argument('--output-json', default='chunks.json') | ||
| 98 | args = ap.parse_args() | ||
| 99 | chunks = voice_to_chunks( | ||
| 100 | audio_path=args.input, | ||
| 101 | output_dir=args.output_dir, | ||
| 102 | target_chunk_sec=args.target_chunk_sec, | ||
| 103 | stride_sec=args.stride_sec, | ||
| 104 | min_voiced_sec=args.min_voiced_sec, | ||
| 105 | top_db=args.top_db, | ||
| 106 | sr=args.sr, | ||
| 107 | ) | ||
| 108 | out_json = Path(args.output_dir) / args.output_json | ||
| 109 | out_json.write_text(json.dumps({'chunks': chunks}, ensure_ascii=False, indent=2), encoding='utf-8') | ||
| 110 | print(json.dumps({'chunks': chunks}, ensure_ascii=False, indent=2)) | ||
| 111 | |||
| 112 | |||
| 113 | if __name__ == '__main__': | ||
| 114 | main() |
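The window arithmetic in `chunk_intervals` is easier to follow on concrete numbers. The sketch below re-states the same logic for a single voiced interval using plain integers (no `librosa` or `numpy`); `chunk_bounds` is a hypothetical name, and `min_tail` stands in for the 2-second tail threshold that the original hard-codes.

```python
from typing import List, Tuple


def chunk_bounds(
    start: int, end: int, chunk_len: int, stride: int, min_tail: int
) -> List[Tuple[int, int, bool]]:
    """Sliding-window boundaries (in samples) for one voiced interval."""
    if end - start < chunk_len:
        # Interval shorter than one chunk: keep it whole and flag it for padding.
        return [(start, end, True)]
    out: List[Tuple[int, int, bool]] = []
    pos = start
    while pos + chunk_len <= end:
        out.append((pos, pos + chunk_len, False))
        pos += stride
    # Cover the leftover tail with a full-length window aligned to the end.
    if pos < end and end - pos >= min_tail:
        tail = (max(start, end - chunk_len), end, False)
        if tail not in out:  # same effect as the dedup pass in the original
            out.append(tail)
    return out


sr = 16000
ten_sec = chunk_bounds(0, 10 * sr, 8 * sr, 4 * sr, 2 * sr)
three_sec = chunk_bounds(0, 3 * sr, 8 * sr, 4 * sr, 2 * sr)
twelve_sec = chunk_bounds(0, 12 * sr, 8 * sr, 4 * sr, 2 * sr)
print([(s / sr, e / sr) for s, e, _ in ten_sec])
```

For a 10 s interval with 8 s chunks and a 4 s stride, the regular window covers 0 to 8 s and the end-aligned tail window covers 2 to 10 s, so the two chunks overlap by 6 s rather than leaving the last 2 s unsearched.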
acr-engine/src/utils/context_exporter.py
0 → 100644
| 1 | from __future__ import annotations | ||
| 2 | |||
| 3 | import shutil | ||
| 4 | import subprocess | ||
| 5 | import tempfile | ||
| 6 | from pathlib import Path | ||
| 7 | from typing import Dict, Tuple | ||
| 8 | |||
| 9 | import librosa | ||
| 10 | import numpy as np | ||
| 11 | import soundfile as sf | ||
| 12 | |||
| 13 | |||
| 14 | def load_audio(audio_path: str, sr: int = 16000) -> np.ndarray: | ||
| 15 | y, _ = librosa.load(audio_path, sr=sr, mono=True) | ||
| 16 | return y.astype(np.float32) | ||
| 17 | |||
| 18 | |||
| 19 | def chroma_embedding(y: np.ndarray, sr: int) -> np.ndarray: | ||
| 20 | chroma = librosa.feature.chroma_stft(y=y, sr=sr, n_chroma=12) | ||
| 21 | feat = np.concatenate([chroma.mean(axis=1), chroma.std(axis=1)], axis=0).astype(np.float32) | ||
| 22 | norm = np.linalg.norm(feat) | ||
| 23 | return feat / norm if norm > 0 else feat | ||
| 24 | |||
| 25 | |||
| 26 | def find_best_matching_window( | ||
| 27 | query_audio_path: str, | ||
| 28 | reference_audio_path: str, | ||
| 29 | sr: int = 16000, | ||
| 30 | stride_sec: float = 1.0, | ||
| 31 | ) -> Dict: | ||
| 32 | query_y = load_audio(query_audio_path, sr=sr) | ||
| 33 | ref_y = load_audio(reference_audio_path, sr=sr) | ||
| 34 | query_len = len(query_y) | ||
| 35 | if query_len == 0: | ||
| 36 | raise ValueError('Empty query audio') | ||
| 37 | if len(ref_y) < query_len: | ||
| 38 | ref_y = np.pad(ref_y, (0, query_len - len(ref_y))) | ||
| 39 | |||
| 40 | query_feat = chroma_embedding(query_y, sr) | ||
| 41 | stride = max(1, int(sr * stride_sec)) | ||
| 42 | best_score = -1.0 | ||
| 43 | best_start = 0 | ||
| 44 | for start in range(0, max(len(ref_y) - query_len + 1, 1), stride): | ||
| 45 | window = ref_y[start:start + query_len] | ||
| 46 | if len(window) < query_len: | ||
| 47 | window = np.pad(window, (0, query_len - len(window))) | ||
| 48 | score = float(np.dot(query_feat, chroma_embedding(window, sr))) | ||
| 49 | if score > best_score: | ||
| 50 | best_score = score | ||
| 51 | best_start = start | ||
| 52 | |||
| 53 | return { | ||
| 54 | 'window_start_sec': round(best_start / sr, 4), | ||
| 55 | 'window_end_sec': round((best_start + query_len) / sr, 4), | ||
| 56 | 'window_score': round(best_score, 6), | ||
| 57 | 'query_duration_sec': round(query_len / sr, 4), | ||
| 58 | } | ||
| 59 | |||
| 60 | |||
| 61 | def export_match_context( | ||
| 62 | audio_path: str, | ||
| 63 | window_start_sec: float, | ||
| 64 | window_end_sec: float, | ||
| 65 | output_path: str, | ||
| 66 | context_sec: float = 10.0, | ||
| 67 | output_format: str = 'mp3', | ||
| 68 | sr: int = 16000, | ||
| 69 | ) -> Dict: | ||
| 70 | y = load_audio(audio_path, sr=sr) | ||
| 71 | center = (window_start_sec + window_end_sec) / 2.0 | ||
| 72 | half = context_sec / 2.0 | ||
| 73 | clip_start_sec = max(0.0, center - half) | ||
| 74 | clip_end_sec = min(len(y) / sr, center + half) | ||
| 75 | start = int(clip_start_sec * sr) | ||
| 76 | end = max(start + 1, int(clip_end_sec * sr)) | ||
| 77 | clip = y[start:end] | ||
| 78 | |||
| 79 | output = Path(output_path) | ||
| 80 | output.parent.mkdir(parents=True, exist_ok=True) | ||
| 81 | actual_format = output_format | ||
| 82 | |||
| 83 | if output_format == 'mp3' and shutil.which('ffmpeg'): | ||
| 84 | with tempfile.TemporaryDirectory() as tmp: | ||
| 85 | wav_path = Path(tmp) / 'context.wav' | ||
| 86 | sf.write(wav_path, clip, sr) | ||
| 87 | cmd = [shutil.which('ffmpeg') or 'ffmpeg', '-y', '-i', str(wav_path), str(output)] | ||
| 88 | subprocess.run(cmd, check=True, capture_output=True) | ||
| 89 | else: | ||
| 90 | if output_format == 'mp3': | ||
| 91 | actual_format = 'wav' | ||
| 92 | output = output.with_suffix('.wav') | ||
| 93 | sf.write(output, clip, sr) | ||
| 94 | |||
| 95 | return { | ||
| 96 | 'source_audio_path': audio_path, | ||
| 97 | 'clip_start_sec': round(clip_start_sec, 4), | ||
| 98 | 'clip_end_sec': round(clip_end_sec, 4), | ||
| 99 | 'duration_sec': round((end - start) / sr, 4), | ||
| 100 | 'output_path': str(output), | ||
| 101 | 'output_format': actual_format, | ||
| 102 | } |
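The clip boundaries in `export_match_context` are centered on the matched window and then clamped to the audio, which means a match near the start or end of a track yields a clip shorter than `context_sec`. A minimal sketch of just that arithmetic, with `context_bounds` as a hypothetical helper name:

```python
from typing import Tuple


def context_bounds(
    window_start_sec: float,
    window_end_sec: float,
    audio_duration_sec: float,
    context_sec: float = 10.0,
) -> Tuple[float, float]:
    """Center a context clip on the matched window and clamp it to the audio."""
    center = (window_start_sec + window_end_sec) / 2.0
    half = context_sec / 2.0
    clip_start = max(0.0, center - half)
    clip_end = min(audio_duration_sec, center + half)
    return clip_start, clip_end


# Match in the middle of a 12 s reference: full 10 s of context.
mid_clip = context_bounds(4.0, 7.0, 12.0)
# Match at the very start: the clip is clamped, not shifted, so it is shorter.
edge_clip = context_bounds(0.0, 3.0, 12.0)
print(mid_clip, edge_clip)
```

If a fixed-length clip were preferred, the clamped side could be compensated by extending the other side; the commit as shown does not do this.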
acr-engine/tests/test_bootstrap.py
0 → 100644
acr-engine/tests/test_context_exporter.py
0 → 100644
| 1 | import tempfile | ||
| 2 | import unittest | ||
| 3 | from pathlib import Path | ||
| 4 | |||
| 5 | import test_bootstrap | ||
| 6 | |||
| 7 | import numpy as np | ||
| 8 | import soundfile as sf | ||
| 9 | |||
| 10 | from src.utils.context_exporter import export_match_context, find_best_matching_window | ||
| 11 | |||
| 12 | |||
| 13 | class ContextExporterTests(unittest.TestCase): | ||
| 14 | def test_find_best_matching_window_returns_valid_range(self): | ||
| 15 | sr = 16000 | ||
| 16 | with tempfile.TemporaryDirectory() as tmp: | ||
| 17 | query = Path(tmp) / 'query.wav' | ||
| 18 | ref = Path(tmp) / 'ref.wav' | ||
| 19 | tone = 0.2 * np.sin(2 * np.pi * 440 * np.linspace(0, 3, sr * 3, endpoint=False)).astype(np.float32) | ||
| 20 | ref_y = np.concatenate([np.zeros(sr), tone, np.zeros(sr)]).astype(np.float32) | ||
| 21 | sf.write(query, tone, sr) | ||
| 22 | sf.write(ref, ref_y, sr) | ||
| 23 | match = find_best_matching_window(str(query), str(ref), sr=sr, stride_sec=0.5) | ||
| 24 | self.assertGreaterEqual(match['window_start_sec'], 0.0) | ||
| 25 | self.assertGreater(match['window_end_sec'], match['window_start_sec']) | ||
| 26 | |||
| 27 | def test_export_match_context_writes_audio(self): | ||
| 28 | sr = 16000 | ||
| 29 | with tempfile.TemporaryDirectory() as tmp: | ||
| 30 | ref = Path(tmp) / 'ref.wav' | ||
| 31 | out = Path(tmp) / 'context.wav' | ||
| 32 | y = 0.2 * np.sin(2 * np.pi * 440 * np.linspace(0, 12, sr * 12, endpoint=False)).astype(np.float32) | ||
| 33 | sf.write(ref, y, sr) | ||
| 34 | info = export_match_context(str(ref), 4.0, 7.0, str(out), context_sec=10.0, output_format='wav', sr=sr) | ||
| 35 | self.assertTrue(Path(info['output_path']).exists()) | ||
| 36 | self.assertEqual(info['output_format'], 'wav') | ||
| 37 | |||
| 38 | |||
| 39 | if __name__ == '__main__': | ||
| 40 | unittest.main() |
| ... | @@ -2,6 +2,8 @@ import tempfile | ... | @@ -2,6 +2,8 @@ import tempfile |
| 2 | import unittest | 2 | import unittest |
| 3 | from pathlib import Path | 3 | from pathlib import Path |
| 4 | 4 | ||
| 5 | import test_bootstrap | ||
| 6 | |||
| 5 | from scripts.local_music20_acr import collect_pairs, first_file | 7 | from scripts.local_music20_acr import collect_pairs, first_file |
| 6 | 8 | ||
| 7 | 9 | ... | ... |
acr-engine/tests/test_voice_chunker.py
0 → 100644
| 1 | import tempfile | ||
| 2 | import unittest | ||
| 3 | from pathlib import Path | ||
| 4 | |||
| 5 | import test_bootstrap | ||
| 6 | |||
| 7 | import numpy as np | ||
| 8 | import soundfile as sf | ||
| 9 | |||
| 10 | from src.data.voice_chunker import detect_voiced_intervals, chunk_intervals, voice_to_chunks | ||
| 11 | |||
| 12 | |||
| 13 | class VoiceChunkerTests(unittest.TestCase): | ||
| 14 | def test_detect_voiced_intervals_filters_short_segments(self): | ||
| 15 | sr = 16000 | ||
| 16 | y = np.concatenate([ | ||
| 17 | np.zeros(sr), | ||
| 18 | 0.2 * np.sin(2 * np.pi * 440 * np.linspace(0, 3, sr * 3, endpoint=False)), | ||
| 19 | np.zeros(sr // 2), | ||
| 20 | ]).astype(np.float32) | ||
| 21 | intervals = detect_voiced_intervals(y, sr=sr, top_db=30, min_voiced_sec=2.0) | ||
| 22 | self.assertEqual(len(intervals), 1) | ||
| 23 | |||
| 24 | def test_chunk_intervals_handles_short_and_long_regions(self): | ||
| 25 | sr = 16000 | ||
| 26 | chunks = chunk_intervals([(0, sr * 3), (sr * 5, sr * 15)], sr=sr, target_chunk_sec=8.0, stride_sec=4.0) | ||
| 27 | self.assertTrue(any(padded for _, _, padded in chunks)) | ||
| 28 | self.assertGreaterEqual(len(chunks), 2) | ||
| 29 | |||
| 30 | def test_voice_to_chunks_writes_chunk_files(self): | ||
| 31 | sr = 16000 | ||
| 32 | with tempfile.TemporaryDirectory() as tmp: | ||
| 33 | src = Path(tmp) / 'hum.wav' | ||
| 34 | out = Path(tmp) / 'chunks' | ||
| 35 | y = np.concatenate([ | ||
| 36 | np.zeros(sr), | ||
| 37 | 0.2 * np.sin(2 * np.pi * 330 * np.linspace(0, 4, sr * 4, endpoint=False)), | ||
| 38 | np.zeros(sr), | ||
| 39 | ]).astype(np.float32) | ||
| 40 | sf.write(src, y, sr) | ||
| 41 | chunks = voice_to_chunks(str(src), str(out), target_chunk_sec=3.0, stride_sec=2.0, min_voiced_sec=2.0, sr=sr) | ||
| 42 | self.assertGreaterEqual(len(chunks), 1) | ||
| 43 | self.assertTrue(Path(chunks[0]['audio_path']).exists()) | ||
| 44 | |||
| 45 | |||
| 46 | if __name__ == '__main__': | ||
| 47 | unittest.main() |
| 1 | 1 | ||
| 2 | ## 2026-06-03 voice-to-chunk and context export foundation | ||
| 3 | |||
| 4 | - Added `acr-engine/src/data/voice_chunker.py`, which splits voice / humming audio into chunks. | ||
| 5 | - Added `acr-engine/scripts/build_humming_eval_manifest.py`, which generates a `humming_real` evaluation manifest from chunk results. | ||
| 6 | - Added `acr-engine/src/utils/context_exporter.py`, which exports the matched reference window as a context clip. | ||
| 7 | - Extended `acr-engine/src/service/app.py` with a first draft of the `POST /recognize/voice` endpoint. | ||
| 8 | - The docs entry point `docs/README.md` has been simplified to the latest architecture and the shortest reading order. | ||
| 9 | |||
| 10 | Fresh evidence: | ||
| 11 | - `/usr/local/miniconda3/bin/python -m unittest discover -s acr-engine/tests -v` => `Ran 7 tests, OK` | ||
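| 12 | - The current environment lacks `uvicorn`, so the service smoke test cannot be started directly yet; the runtime dependencies must be installed first. | ||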
| 13 | |||
| 14 | |||
| 2 | ## 2026-06-03 20-song local ACR workflow in acr-engine | 15 | ## 2026-06-03 20-song local ACR workflow in acr-engine |
| 3 | 16 | ||
| 4 | - Added `acr-engine/scripts/local_music20_acr.py`, which provides a local 20-song small-sample ACR workflow inside `acr-engine`, based on `/workspace/downloads`. | 17 | - Added `acr-engine/scripts/local_music20_acr.py`, which provides a local 20-song small-sample ACR workflow inside `acr-engine`, based on `/workspace/downloads`. | ... | ... |
| 1 | # ACR Docs Overview | 1 | # ACR Docs Overview |
| 2 | 2 | ||
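| 3 | > Updated: 2026-06-02 | 3 | > Keeps only the latest architecture and the shortest path to getting started. Historical details remain in the repository, but the default reading list is limited to the 6 main documents below. |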
| 4 | 4 | ||
| 5 | ## One-page summary | 5 | ## Shortest reading order |
| 6 | 6 | ||
| 7 | There are currently too many documentation entry points; they are now consolidated into **5 groups of main documents**: | 7 | 1. [session-handoff.md](./session-handoff.md) |
| 8 | 2. [CHANGELOG.md](./CHANGELOG.md) | ||
| 9 | 3. [acr-architecture.md](./acr-architecture.md) | ||
| 10 | 4. [dataset-spec.md](./dataset-spec.md) | ||
| 11 | 5. [training-data-and-pgvector-guide.md](./training-data-and-pgvector-guide.md) | ||
| 12 | 6. [runbook.md](./runbook.md) | ||
| 8 | 13 | ||
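| 9 | 1. **Project and architecture** | 14 | ## Currently recommended categories |
| 10 | 2. **Data and evaluation** | ||
| 11 | 3. **Business data ingestion** | ||
| 12 | 4. **Service and engineering** | ||
| 13 | 5. **Research and roadmap** | ||
| 14 | 15 | ||
| 15 | Start with just these 5 groups; there is no need to read every detailed document at once. | 16 | ### 1. Project architecture |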
| 17 | - [acr-architecture.md](./acr-architecture.md) | ||
| 18 | - [session-handoff.md](./session-handoff.md) | ||
| 16 | 19 | ||
| 17 | --- | 20 | ### 2. Data and evaluation |
| 21 | - [dataset-spec.md](./dataset-spec.md) | ||
| 22 | - [training-data-and-pgvector-guide.md](./training-data-and-pgvector-guide.md) | ||
| 23 | - [open-dataset-workflow.md](./open-dataset-workflow.md) | ||
| 18 | 24 | ||
| 19 | ## 1. Documentation navigation map | 25 | ### 3. Operations and service |
| 26 | - [runbook.md](./runbook.md) | ||
| 27 | - [service-api.md](./service-api.md) | ||
| 20 | 28 | ||
| 21 | ```mermaid | 29 | ### 4. Latest hard-case conclusions |
| 22 | flowchart TD | 30 | - [acr-hard-case-analysis.md](../acr-engine/../docs/acr-hard-case-analysis.md) |
| 23 | A[Docs Entry] --> B[Project Responsibility] | ||
| 24 | A --> C[Architecture] | ||
| 25 | A --> D[Dataset Spec] | ||
| 26 | A --> E[Business Export Chain] | ||
| 27 | A --> F[Service API] | ||
| 28 | A --> G[Industrial Benchmark] | ||
| 29 | A --> H[Industrialization Roadmap] | ||
| 30 | A --> I[Licensing & Sources] | ||
| 31 | A --> J[SOTA Research] | ||
| 32 | 31 | ||
| 33 | B --> C | 32 | ## Current architecture at a glance |
| 34 | C --> D | ||
| 35 | D --> E | ||
| 36 | E --> F | ||
| 37 | G --> H | ||
| 38 | I --> H | ||
| 39 | J --> H | ||
| 40 | ``` | ||
| 41 | 33 | ||
| 42 | --- | 34 | - `/workspace`: source of samples and material |
| 43 | 35 | - `acr-engine/`: main project for training, indexing, recognition, and serving | |
| 44 | ## 2. Condensed reading entry points | 36 | - Local small-sample validation: prefer **FAISS** |
| 45 | 37 | - Production vector retrieval: standardized on **pgvector** | |
| 46 | | Reader role | Read first | | ||
| 47 | |---|---| | ||
| 48 | | New members | [Project and architecture](./project-responsibility-map.md), [System architecture](./acr-architecture.md) | | ||
| 49 | | Algorithms / models | [Dataset spec](./dataset-spec.md), [SOTA research](./sota-research-2026.md) | | ||
| 50 | | Platform / backend | [Service API](./service-api.md), [Benchmark spec](./industrial-benchmark-spec.md) | | ||
| 51 | | Data ingestion | [Open dataset workflow](./open-dataset-workflow.md), [Business export cookbook](./business-export-cookbook.md) | | ||
| 52 | | Leads / planning | [Industrialization roadmap](./industrialization-roadmap.md), [Handoff document](./session-handoff.md) | | ||
| 53 | |||
| 54 | --- | ||
| 55 | |||
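| 56 | ## 2.5 Shortest reading order for a new session | ||
| 57 | |||
| 58 | If you are picking this up in a new session, follow this order directly: | ||
| 59 | |||
| 60 | 1. [Ongoing development handoff document](./session-handoff.md) | ||
| 61 | 2. [Changelog](./CHANGELOG.md) | ||
| 62 | 3. [Business export cookbook](./business-export-cookbook.md) or [Open dataset workflow](./open-dataset-workflow.md) | ||
| 63 | |||
| 64 | How to choose: | ||
| 65 | - Ingesting your own business material: read `business-export-cookbook.md` first | ||
| 66 | - Working with open datasets such as FMA / MTG-Jamendo: read `open-dataset-workflow.md` first | ||
| 67 | |||
| 68 | ## 2.6 Shortest runnable command for a new session | ||
| 69 | |||
| 70 | If you just want to confirm that the business export chain still runs, execute: | ||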
| 71 | |||
| 72 | ```bash | ||
| 73 | cd /workspace/acr-engine | ||
| 74 | /usr/local/miniconda3/bin/python scripts/business_export_offline_smoke.py \ | ||
| 75 | --output-root /tmp/business_export_offline_smoke | ||
| 76 | ``` | ||
| 77 | |||
| 78 | Expected results: | ||
| 79 | - Business export samples are generated | ||
| 80 | - Manifest-ready JSONL is generated | ||
| 81 | - Project `catalog/train/test/val` is generated | ||
| 82 | - `train.py --dry-run` passes | ||
| 83 | |||
| 84 | ## 3. Main document groups | ||
| 85 | |||
| 86 | ### A. Project and architecture | ||
| 87 | - [Project responsibility map](./project-responsibility-map.md) | ||
| 88 | - [System architecture](./acr-architecture.md) | ||
| 89 | |||
| 90 | ### B. Data and evaluation | ||
| 91 | - [Dataset spec](./dataset-spec.md) | ||
| 92 | - [Open dataset workflow](./open-dataset-workflow.md) | ||
| 93 | - [Training data and pgvector guide](./training-data-and-pgvector-guide.md) | ||
| 94 | - [Production encoder freeze and embedding strategy Q&A](./production-encoder-freeze-and-embedding-strategy.md) | ||
| 95 | - [Dataset sources and ingestion](./dataset-sources-and-licensing.md) | ||
| 96 | - [Industrial benchmark spec](./industrial-benchmark-spec.md) | ||
| 97 | |||
| 98 | Quick-start entry points: | ||
| 99 | - [Open dataset workflow](./open-dataset-workflow.md) | ||
| 100 | - [Local landing directory for open datasets](../acr-engine/data/raw/README.md) | ||
| 101 | - Offline smoke test verified: `acr-engine/scripts/business_export_offline_smoke.py` | ||
| 102 | |||
| 103 | ### C. Business data ingestion | ||
| 104 | - [Business material types and bucket guide](./business-music-bucket-and-type-guide.md) | ||
| 105 | - [Business manifest and type-role spec](./business-manifest-and-type-role-spec.md) | ||
| 106 | - [Business export cookbook](./business-export-cookbook.md) | ||
| 107 | - [Business data to project manifest adapter](./business-project-manifest-adapter.md) | ||
| 108 | |||
| 109 | Shortest business data chain: | ||
| 110 | 1. [Business export cookbook](./business-export-cookbook.md) | ||
| 111 | 2. `acr-engine/scripts/normalize_business_export.py` | ||
| 112 | 3. `acr-engine/scripts/split_business_manifest_ready.py` | ||
| 113 | 4. `acr-engine/scripts/build_business_project_manifests.py` | ||
| 114 | 5. `acr-engine/scripts/business_export_offline_smoke.py` | ||
| 115 | |||
| 116 | ### D. Service and engineering | ||
| 117 | - [Service API](./service-api.md) | ||
| 118 | - [Ongoing development handoff document](./session-handoff.md) | ||
| 119 | - [Current capability map](./current-capability-map.md) | ||
| 120 | - [First-run checklist](../acr-engine/FIRST_RUN_CHECKLIST.md) | ||
| 121 | - [Changelog](./CHANGELOG.md) | ||
| 122 | |||
| 123 | ### E. Research and roadmap | ||
| 124 | - [Industrialization roadmap](./industrialization-roadmap.md) | ||
| 125 | - [SOTA research](./sota-research-2026.md) | ||
| 126 | - [Master list of references and sources](./references-and-sources.md) | ||
| 127 | |||
| 128 | --- | ||
| 129 | |||
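| 130 | ## 4. Notes | ||
| 131 | |||
| 132 | From now on, reduce the reading cost of duplicate documents at the same level: | ||
| 133 | - Group documents on the entry page first | ||
| 134 | - Then keep 1 to 3 main documents in each group | ||
| 135 | - Put secondary details inside a group wherever possible instead of adding more files | ||
| 136 | |||
| 137 | --- | ||
| 138 | |||
| 139 | ## 5. Appendix | ||
| 140 | |||
| 141 | Suggested usage: | ||
| 142 | - To understand the project, start with the [Project responsibility map](./project-responsibility-map.md) + [System architecture](./acr-architecture.md) | ||
| 143 | - For training / evaluation, start with the [Dataset spec](./dataset-spec.md) | ||
| 144 | - To ingest open datasets, start with [Dataset sources and ingestion](./dataset-sources-and-licensing.md) | ||
| 145 | - For historical context, read the [Changelog](./CHANGELOG.md) | ||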
| 146 | |||
| 147 | ## Sources | ||
| 148 | - This file is an internal documentation navigation artifact for the current repo state. | ... | ... |