Commit cc263571 cc2635716524efcb73a357b25f4f11b1d5e2b60b by cnb.bofCdSsphPA

Expose smoke device control before scaling real-data runs

Constraint: Real FMA smoke is already running on CPU, but future smoke runs must be able to target GPU without manually splitting the pipeline
Rejected: Pass through raw 'auto' everywhere | run_demo/evaluate embedder paths cannot consume torch.device('auto') safely
Confidence: high
Scope-risk: narrow
Directive: Keep smoke orchestration device handling normalized at the adapter boundary unless all downstream CLIs gain native auto-device support
Tested: smoke-local --help shows --device; resolve_device('auto') returns cpu on this host; smoke-local synthetic run prints Device: cpu; manual build-index and evaluate succeed on smoke artifacts with top1=1.0 topk=1.0
Not-tested: End-to-end smoke-local completion on the long-running real FMA job and a live CUDA host path
1 parent fa7f5f57
......@@ -8,6 +8,7 @@ from typing import Dict, List
import argparse
import json
import subprocess
import torch
AUDIO_EXTS = (".wav", ".mp3", ".flac", ".ogg")
......@@ -15,6 +16,12 @@ MIN_SMOKE_AUDIO_FILES = 2
MIN_SMOKE_ELIGIBLE_QUERY_FILES = 2
def resolve_device(device: str) -> str:
    if device == "auto":
        return "cuda" if torch.cuda.is_available() else "cpu"
    return device
@dataclass
class DatasetRecord:
    name: str
......@@ -306,6 +313,7 @@ def smoke_local_dataset(
seed: int,
train_epochs: int,
batch_size: int,
device: str,
) -> Dict:
readiness = assess_local_dataset_ready(
dataset,
......@@ -321,6 +329,7 @@ def smoke_local_dataset(
}, indent=2, ensure_ascii=False))
adapter = ADAPTERS[dataset]
resolved_device = resolve_device(device)
inspect_summary = readiness["inspect"]
prepare_summary = adapter.prepare_local_audio(
input_dir,
......@@ -342,7 +351,7 @@ def smoke_local_dataset(
"train.py",
"--data", str(manifests_dir),
"--output", str(model_dir),
"--device", "cpu",
"--device", resolved_device,
"--epochs", str(train_epochs),
"--batch-size", str(batch_size),
], check=True)
......@@ -354,7 +363,7 @@ def smoke_local_dataset(
"--data", str(manifests_dir),
"--model", str(model_dir / "best_model.pt"),
"--output", str(index_dir),
"--device", "cpu",
"--device", resolved_device,
], check=True)
report_dir.mkdir(parents=True, exist_ok=True)
......@@ -366,7 +375,7 @@ def smoke_local_dataset(
"--model", str(model_dir / "best_model.pt"),
"--index-prefix", str(index_dir / "reference"),
"--split", "test",
"--device", "cpu",
"--device", resolved_device,
"--fast-eval",
"--output-json", str(eval_json),
], check=True)
......@@ -377,6 +386,8 @@ def smoke_local_dataset(
"run": {
"train_epochs": train_epochs,
"batch_size": batch_size,
"requested_device": device,
"resolved_device": resolved_device,
},
}
report_dir.mkdir(parents=True, exist_ok=True)
......@@ -398,6 +409,8 @@ def smoke_local_dataset(
"inspect": inspect_summary,
"prepare": prepare_summary,
"validate": validate_summary,
"requested_device": device,
"resolved_device": resolved_device,
"model_dir": str(model_dir),
"index_dir": str(index_dir),
"report_dir": str(report_dir),
......@@ -457,6 +470,7 @@ def main():
p.add_argument("--seed", type=int, default=42)
p.add_argument("--train-epochs", type=int, default=1)
p.add_argument("--batch-size", type=int, default=2)
p.add_argument("--device", default="cpu")
args = parser.parse_args()
if args.cmd == "registry":
......@@ -508,6 +522,7 @@ def main():
seed=args.seed,
train_epochs=args.train_epochs,
batch_size=args.batch_size,
device=args.device,
)
print(json.dumps(summary, indent=2, ensure_ascii=False))
......
......@@ -2,6 +2,35 @@
## 2026-06-02
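### Stage: Add explicit device selection to smoke-local and verify auto device resolution
Completed:
- Modified `acr-engine/src/data/external_adapters.py` to add `--device` to `smoke-local`
- Added internal `auto -> cpu/cuda` resolution so the string `auto` is never passed directly to the embedding / eval side
- Changed the train, build-index, and evaluate subcommands so that all three pass through the resolved device
- Recorded `requested_device` / `resolved_device` in the smoke config summary
- Updated [open-dataset-workflow.md](./open-dataset-workflow.md) and [training-data-and-pgvector-guide.md](./training-data-and-pgvector-guide.md) to match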
Verification results:
- CLI check:
  - `/usr/local/miniconda3/bin/python src/data/external_adapters.py smoke-local --help` now lists `--device DEVICE`
- Minimal pipeline check:
  - Ran `smoke-local ... --device auto` against `data/synthetic_v2/songs`
  - The training stage printed `Device: cpu`, showing that `auto` was resolved correctly
  - The second half of the pipeline was then verified by hand and ran normally:
    - `run_demo.py build-index --device cpu`
    - `evaluate.py --device cpu`
  - `evaluate.py` returned:
    - `top1=1.0`
    - `topk=1.0`
- Real FMA status recheck:
  - The real FMA smoke main process is still alive
  - The current child process is sitting at `run_demo.py build-index ... --device cpu`
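Conclusion:
- `smoke-local` now has a GPU/CPU/auto device entry point and can be used directly for upcoming real-data GPU smoke runs
- This also surfaced a follow-up task: the second half of the real FMA smoke (index / artifact generation) still needs monitoring and optimization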
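
The `auto` resolution verified above can be sketched in isolation. This is a minimal sketch, not the committed code: the `cuda_available` parameter and the `_torch_cuda_available` helper are hypothetical additions that let both outcomes be exercised on a host without torch or without a GPU. The committed `resolve_device` calls `torch.cuda.is_available()` directly.

```python
from typing import Callable


def _torch_cuda_available() -> bool:
    """Hypothetical helper: True only if torch imports and reports CUDA."""
    try:
        import torch  # imported lazily so the sketch runs without torch
    except ImportError:
        return False
    return bool(torch.cuda.is_available())


def resolve_device(
    device: str,
    cuda_available: Callable[[], bool] = _torch_cuda_available,
) -> str:
    """Map 'auto' to a concrete device; pass any other value through."""
    if device == "auto":
        return "cuda" if cuda_available() else "cpu"
    return device


# Simulate a CPU-only host and a CUDA host without needing either.
print(resolve_device("auto", cuda_available=lambda: False))  # cpu
print(resolve_device("auto", cuda_available=lambda: True))   # cuda
```

Explicit values such as `cpu` or `cuda` are returned unchanged, so only the `auto` case depends on the host.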
### Stage: Fill in the docs on training data, overlapping windows, GPU, and FMA data processing
......
......@@ -85,6 +85,7 @@ flowchart LR
```bash
/usr/local/miniconda3/bin/python src/data/external_adapters.py smoke-local fma data/raw/fma_small_audio --output-root data/external_smoke --eval-ratio 0.2 --query-duration 8.0 --train-epochs 1 --batch-size 2
/usr/local/miniconda3/bin/python src/data/external_adapters.py smoke-local fma data/raw/fma_small_audio --output-root data/external_smoke --eval-ratio 0.2 --query-duration 8.0 --train-epochs 1 --batch-size 2 --device auto
```
For where to place the real data directories, see:
......@@ -128,6 +129,8 @@ flowchart LR
- `release-checklist.md`
- `smoke-local`
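- Returns the inspect / prepare / validate / report path summaries in a single call
- Now supports `--device cpu|cuda|auto`
- `auto` is resolved to an actual device inside smoke, so the string `auto` is never passed directly to the embedding/eval side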
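
A minimal sketch of how the flag and the recorded summary fields fit together, assuming a cut-down parser. `build_parser` and `run_summary` are hypothetical names, the real `smoke-local` takes more arguments, and the resolved value is hard-coded here instead of being probed from the host.

```python
import argparse
import json
from typing import Dict


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical, reduced version of the smoke-local CLI."""
    parser = argparse.ArgumentParser(prog="external_adapters.py")
    sub = parser.add_subparsers(dest="cmd", required=True)
    p = sub.add_parser("smoke-local")
    p.add_argument("--train-epochs", type=int, default=1)
    p.add_argument("--batch-size", type=int, default=2)
    p.add_argument("--device", default="cpu")  # cpu, cuda, or auto
    return parser


def run_summary(args: argparse.Namespace, resolved_device: str) -> Dict:
    """Record both what the user asked for and what was actually used."""
    return {
        "run": {
            "train_epochs": args.train_epochs,
            "batch_size": args.batch_size,
            "requested_device": args.device,
            "resolved_device": resolved_device,
        }
    }


args = build_parser().parse_args(["smoke-local", "--device", "auto"])
# Assumption for illustration: 'auto' resolved to 'cpu' on this host.
summary = run_summary(args, resolved_device="cpu")
print(json.dumps(summary, indent=2))
```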
---
......
......@@ -9,7 +9,7 @@
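1. **The smallest unit of training input is currently "a query sample carrying a `song_id` + reference assets + a manifest"**, not a batch of 3-minute mp3 files thrown straight into the model.
2. **On the training side, a 3-minute mp3 is usually not pre-cut into a full set of overlapping windows; a random 5 s crop is taken at runtime. Only the retrieval side uses overlapping sliding windows.**
3. **With a GPU, training on real data such as FMA speeds up noticeably. `train.py` already supports `auto/cuda`, but `smoke-local` is still hard-coded to CPU.**
3. **With a GPU, training on real data such as FMA speeds up noticeably; `train.py` supports `auto/cuda`, and `smoke-local` now also supports `--device cpu|cuda|auto`, where `auto` is resolved to an actual device inside smoke.**
4. **FMA, MTG-Jamendo, and in-house BGM/recordings should all be converted to a unified manifest first, and only then used for training, evaluation, and pgvector ingestion.**
5. **When you later extend this with your own datasets, what matters most is not the file extension but structured fields such as `song_id / type / offset / source_dataset / split`.**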
......@@ -242,18 +242,21 @@ cd /workspace/acr-engine
|---|---|
| `train.py` | Supports `--device auto/cuda/cpu` |
| CUDA mixed precision | Supported |
| `smoke-local` | **Currently hard-coded to `--device cpu`** |
| `smoke-local` | Now supports `--device cpu|cuda|auto` |
| `evaluate.py` | CLI currently defaults to `cpu` |
| `run_demo.py build-index` | Also runs on `cpu` in the current smoke |
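### One thing to note for now
`smoke-local` does support real data, but to be safe it currently pins:
- training
- index building
- evaluation
`smoke-local` now supports explicit device selection, but one implementation detail needs to be spelled out:
- `train.py` can interpret `auto` directly
- the embedding side of `run_demo.py / evaluate.py` cannot take the string `auto` directly
all to **CPU**. So if you want to actually speed up real FMA training on this machine, the next step is to make the `smoke-local` device configurable.
So what `smoke-local` currently does is:
- externally, accept `--device auto`
- internally, resolve it to a real device first, then hand it to train / build-index / evaluate
This lets real-data smoke runs use the GPU directly, without manually splitting the pipeline into several commands.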
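
The flow of accepting `auto` externally, resolving it internally, and then dispatching could look roughly like this. It is a sketch under assumptions: `stage_commands` is a hypothetical helper, `sys.executable` stands in for whatever interpreter the real code launches, and each command is trimmed to its device-related arguments (the real calls also pass data, model, and output paths).

```python
import sys
from typing import Dict, List


def stage_commands(resolved_device: str) -> Dict[str, List[str]]:
    """Build the three stage commands with one already-resolved device."""
    if resolved_device == "auto":
        # Guard the adapter boundary: downstream embedding/eval code
        # is assumed unable to consume the literal string 'auto'.
        raise ValueError("resolve 'auto' before dispatching to stage CLIs")
    device_args = ["--device", resolved_device]
    return {
        "train": [sys.executable, "train.py", *device_args],
        "build-index": [sys.executable, "run_demo.py", "build-index", *device_args],
        "evaluate": [sys.executable, "evaluate.py", *device_args, "--fast-eval"],
    }


# Every stage receives the same concrete device string.
for name, cmd in stage_commands("cuda").items():
    print(name, cmd[1:])
```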
---
......