Expose smoke device control before scaling real-data runs

Constraint: Real FMA smoke is already running on CPU, but future smoke runs must be able to target GPU without manually splitting the pipeline
Rejected: Pass through raw 'auto' everywhere | run_demo/evaluate embedder paths cannot consume torch.device('auto') safely
Confidence: high
Scope-risk: narrow
Directive: Keep smoke orchestration device handling normalized at the adapter boundary unless all downstream CLIs gain native auto-device support
Tested: smoke-local --help shows --device; resolve_device('auto') returns cpu on this host; smoke-local synthetic run prints Device: cpu; manual build-index and evaluate succeed on smoke artifacts with top1=1.0 topk=1.0
Not-tested: End-to-end smoke-local completion on the long-running real FMA job and a live CUDA host path
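The `Rejected` trailer turns on one detail: `'auto'` is not a device string that `torch.device` accepts, so it has to be turned into a concrete value before it reaches the embedder paths. The sketch below restates the resolution rule without importing torch. The `cuda_available` callable is a hypothetical parameter added only so the rule can be exercised on any host; the committed `resolve_device` calls `torch.cuda.is_available()` directly.

```python
from typing import Callable


def resolve_device(device: str, cuda_available: Callable[[], bool]) -> str:
    """Map the user-facing 'auto' onto a concrete device string.

    Mirrors the committed helper, except that CUDA detection is injected
    (hypothetical parameter) instead of calling torch.cuda.is_available().
    Any value other than 'auto' is passed through unchanged.
    """
    if device == "auto":
        return "cuda" if cuda_available() else "cpu"
    return device


# On a CPU-only host, 'auto' collapses to 'cpu', matching the "Device: cpu" log.
print(resolve_device("auto", lambda: False))  # cpu
print(resolve_device("auto", lambda: True))   # cuda
print(resolve_device("cuda", lambda: False))  # cuda (passed through as-is)
```

Because any value other than `auto` passes through untouched, a `cuda` request on a host without CUDA is not rejected at this layer; handling it is left to the downstream CLIs.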

Commit cc2635716524efcb73a357b25f4f11b1d5e2b60b authored 2026-06-02 15:03:12 +0800 by cnb.bofCdSsphPA
Showing 4 changed files with 60 additions and 10 deletions
acr-engine/src/data/external_adapters.py
docs/CHANGELOG.md
docs/open-dataset-workflow.md
docs/training-data-and-pgvector-guide.md
--- a/acr-engine/src/data/external_adapters.py
+++ b/acr-engine/src/data/external_adapters.py
@@ -8,6 +8,7 @@ from typing import Dict, List
 import argparse
 import json
 import subprocess
+import torch


 AUDIO_EXTS = (".wav", ".mp3", ".flac", ".ogg")
@@ -15,6 +16,12 @@ MIN_SMOKE_AUDIO_FILES = 2
 MIN_SMOKE_ELIGIBLE_QUERY_FILES = 2


+def resolve_device(device: str) -> str:
+    if device == "auto":
+        return "cuda" if torch.cuda.is_available() else "cpu"
+    return device
+
+
 @dataclass
 class DatasetRecord:
    name: str
@@ -306,6 +313,7 @@ def smoke_local_dataset(
    seed: int,
    train_epochs: int,
    batch_size: int,
+    device: str,
 ) -> Dict:
    readiness = assess_local_dataset_ready(
        dataset,
@@ -321,6 +329,7 @@ def smoke_local_dataset(
        }, indent=2, ensure_ascii=False))

    adapter = ADAPTERS[dataset]
+    resolved_device = resolve_device(device)
    inspect_summary = readiness["inspect"]
    prepare_summary = adapter.prepare_local_audio(
        input_dir,
@@ -342,7 +351,7 @@ def smoke_local_dataset(
        "train.py",
        "--data", str(manifests_dir),
        "--output", str(model_dir),
-        "--device", "cpu",
+        "--device", resolved_device,
        "--epochs", str(train_epochs),
        "--batch-size", str(batch_size),
    ], check=True)
@@ -354,7 +363,7 @@ def smoke_local_dataset(
        "--data", str(manifests_dir),
        "--model", str(model_dir / "best_model.pt"),
        "--output", str(index_dir),
-        "--device", "cpu",
+        "--device", resolved_device,
    ], check=True)

    report_dir.mkdir(parents=True, exist_ok=True)
@@ -366,7 +375,7 @@ def smoke_local_dataset(
        "--model", str(model_dir / "best_model.pt"),
        "--index-prefix", str(index_dir / "reference"),
        "--split", "test",
-        "--device", "cpu",
+        "--device", resolved_device,
        "--fast-eval",
        "--output-json", str(eval_json),
    ], check=True)
@@ -377,6 +386,8 @@ def smoke_local_dataset(
        "run": {
            "train_epochs": train_epochs,
            "batch_size": batch_size,
+            "requested_device": device,
+            "resolved_device": resolved_device,
        },
    }
    report_dir.mkdir(parents=True, exist_ok=True)
@@ -398,6 +409,8 @@ def smoke_local_dataset(
        "inspect": inspect_summary,
        "prepare": prepare_summary,
        "validate": validate_summary,
+        "requested_device": device,
+        "resolved_device": resolved_device,
        "model_dir": str(model_dir),
        "index_dir": str(index_dir),
        "report_dir": str(report_dir),
@@ -457,6 +470,7 @@ def main():
    p.add_argument("--seed", type=int, default=42)
    p.add_argument("--train-epochs", type=int, default=1)
    p.add_argument("--batch-size", type=int, default=2)
+    p.add_argument("--device", default="cpu")

    args = parser.parse_args()
    if args.cmd == "registry":
@@ -508,6 +522,7 @@ def main():
            seed=args.seed,
            train_epochs=args.train_epochs,
            batch_size=args.batch_size,
+            device=args.device,
        )
        print(json.dumps(summary, indent=2, ensure_ascii=False))
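On the CLI side, the commit adds `--device` with a default of `cpu` and threads it into `smoke_local_dataset`, which records both `requested_device` and `resolved_device`. A minimal sketch of that flow follows. It assumes the positional argument names `dataset` and `input_dir`, and it adds a `choices` list mirroring the documented `cpu|cuda|auto` values. The commit itself passes any string through, so `choices` is a hypothetical hardening, not the committed behavior.

```python
import argparse
from typing import Dict


def resolve_device(device: str, cuda_available: bool) -> str:
    # Same rule as the committed helper; availability is passed in for illustration.
    if device == "auto":
        return "cuda" if cuda_available else "cpu"
    return device


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="external_adapters.py")
    sub = parser.add_subparsers(dest="cmd", required=True)
    p = sub.add_parser("smoke-local")
    p.add_argument("dataset")    # assumed name, e.g. "fma"
    p.add_argument("input_dir")  # assumed name, e.g. "data/raw/fma_small_audio"
    # Default matches the commit ("cpu"); `choices` is an assumption that
    # mirrors the documented cpu|cuda|auto values and is NOT in the commit.
    p.add_argument("--device", default="cpu", choices=["cpu", "cuda", "auto"])
    return parser


def device_summary(requested: str, cuda_available: bool) -> Dict[str, str]:
    # Record both what the user asked for and what was actually used.
    return {
        "requested_device": requested,
        "resolved_device": resolve_device(requested, cuda_available),
    }


args = build_parser().parse_args(
    ["smoke-local", "fma", "data/raw/fma_small_audio", "--device", "auto"]
)
print(device_summary(args.device, cuda_available=False))
# {'requested_device': 'auto', 'resolved_device': 'cpu'}
```

Restricting `choices` would make argparse reject typos such as `--device gpu` at parse time, at the cost of also rejecting indexed forms like `cuda:1` that the committed pass-through currently forwards.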

--- a/docs/CHANGELOG.md
+++ b/docs/CHANGELOG.md
@@ -2,6 +2,35 @@

 ## 2026-06-02

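+### Stage: Add explicit device selection to smoke-local and verify `auto` device resolution
+
+Completed:
+- Modified `acr-engine/src/data/external_adapters.py` to add `--device` to `smoke-local`
+- Added internal `auto -> cpu/cuda` resolution so the literal string `auto` is never passed to the embedding / eval side
+- Switched all three sub-commands (training, index building, evaluation) to pass through the resolved device
+- Recorded `requested_device` and `resolved_device` in the smoke config summary
+- Updated [open-dataset-workflow.md](./open-dataset-workflow.md) and [training-data-and-pgvector-guide.md](./training-data-and-pgvector-guide.md) to match
+
+Verification results:
+- CLI check:
+  - `/usr/local/miniconda3/bin/python src/data/external_adapters.py smoke-local --help` now lists `--device DEVICE`
+- Minimal pipeline check:
+  - Ran `smoke-local ... --device auto` on `data/synthetic_v2/songs`
+  - The training stage printed `Device: cpu`, showing that `auto` was resolved correctly
+  - The remaining commands were then verified manually and ran normally:
+    - `run_demo.py build-index --device cpu`
+    - `evaluate.py --device cpu`
+  - `evaluate.py` returned:
+    - `top1=1.0`
+    - `topk=1.0`
+- Real FMA status re-check:
+  - The real FMA smoke main process is still alive
+  - Its current child process is sitting at `run_demo.py build-index ... --device cpu`
+
+Conclusion:
+- `smoke-local` now has a GPU/CPU/auto device option and can be used directly for upcoming real-data GPU smoke runs
+- This also surfaces a follow-up task: the second half of the real FMA smoke (index and artifact generation) still needs further observation and optimization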
+

 ### Stage: Fill in documentation on training data, overlapping windows, GPU, and FMA data processing

--- a/docs/open-dataset-workflow.md
+++ b/docs/open-dataset-workflow.md
@@ -85,6 +85,7 @@ flowchart LR

 ```bash
 /usr/local/miniconda3/bin/python src/data/external_adapters.py smoke-local fma data/raw/fma_small_audio --output-root data/external_smoke --eval-ratio 0.2 --query-duration 8.0 --train-epochs 1 --batch-size 2
+/usr/local/miniconda3/bin/python src/data/external_adapters.py smoke-local fma data/raw/fma_small_audio --output-root data/external_smoke --eval-ratio 0.2 --query-duration 8.0 --train-epochs 1 --batch-size 2 --device auto
 ```

 For where to place the real data directory, refer to:
@@ -128,6 +129,8 @@ flowchart LR
  - `release-checklist.md`
 - `smoke-local`:
  - Returns a single summary of the inspect / prepare / validate / report paths
+  - Now supports `--device cpu|cuda|auto`
+  - `auto` is resolved to the actual device inside smoke, so the literal string `auto` is never passed to the embedding/eval side

 ---

--- a/docs/training-data-and-pgvector-guide.md
+++ b/docs/training-data-and-pgvector-guide.md
@@ -9,7 +9,7 @@

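 1. **The smallest unit of training input is currently "a query sample carrying a `song_id` + reference assets + manifest"**, not whole 3-minute mp3 files fed into the model in bulk.
 2. **On the training side, 3-minute mp3s are currently not pre-cut into a full set of overlapping windows; a random 5 s crop is taken at runtime instead. Overlapping sliding windows are used only on the retrieval side.**
-3. **With a GPU, training on real data such as FMA speeds up noticeably. `train.py` already supports `auto/cuda`, but `smoke-local` is still hardcoded to CPU.**
+3. **With a GPU, training on real data such as FMA speeds up noticeably. `train.py` supports `auto/cuda`, and `smoke-local` now also supports `--device cpu|cuda|auto`, where `auto` is resolved to the actual device inside smoke.**
 4. **FMA, MTG-Jamendo, and in-house BGM/recordings should all be converted into a unified manifest first, before training, evaluation, and pgvector ingestion.**
 5. **When you extend this with your own datasets later, what matters most is not the file extension but the structured fields `song_id / type / offset / source_dataset / split`.**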

@@ -242,18 +242,21 @@ cd /workspace/acr-engine
 |---|---|
 | `train.py` | Supports `--device auto/cuda/cpu` |
 | CUDA mixed precision | Supported |
-| `smoke-local` | **Currently hardcoded to `--device cpu`** |
+| `smoke-local` | Now supports `--device cpu\|cuda\|auto` |
 | `evaluate.py` | CLI currently defaults to `cpu` |
 | `run_demo.py build-index` | Currently also runs on `cpu` inside smoke |

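 ### One thing to watch right now

-Although `smoke-local` supports real data, to stay on the safe side it currently pins:
-- training
-- index building
-- evaluation

-all to **CPU**. So if you want to actually speed up real FMA training on this machine, the next step is to make the `smoke-local` device configurable.
+`smoke-local` now supports explicit device selection, but one implementation detail needs to be stated clearly:
+- `train.py` understands `auto` directly
+- the embedding side of `run_demo.py / evaluate.py` cannot accept the literal string `auto`
+
+So `smoke-local` currently handles it as follows:
+- externally, it accepts `--device auto`
+- internally, it first resolves that to a real device, then dispatches it to training / index building / evaluation
+
+This lets real-data smoke runs use the GPU directly, without manually splitting the run into several commands.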

 ---
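The guide's rule (accept `auto` externally, resolve it once, then dispatch) can be sketched as a hypothetical `stage_commands` helper that builds the three argv lists around a single resolved device. Only the flags visible in the diff hunks are included. The interpreter prefix (`sys.executable`) and the directory names are assumptions, and the `ValueError` guard against an unresolved `auto` is an illustrative addition rather than part of the commit.

```python
import sys
from pathlib import Path
from typing import List


def stage_commands(device: str, manifests_dir: Path, model_dir: Path,
                   index_dir: Path, eval_json: Path,
                   train_epochs: int = 1, batch_size: int = 2) -> List[List[str]]:
    """Build train / build-index / evaluate argv lists sharing one device.

    `device` must already be resolved (e.g. 'cpu' or 'cuda'), never 'auto',
    because the embedding side of run_demo.py / evaluate.py cannot take it.
    """
    if device == "auto":
        raise ValueError("resolve 'auto' before dispatching to the stage CLIs")
    model = str(model_dir / "best_model.pt")
    return [
        [sys.executable, "train.py", "--data", str(manifests_dir),
         "--output", str(model_dir), "--device", device,
         "--epochs", str(train_epochs), "--batch-size", str(batch_size)],
        [sys.executable, "run_demo.py", "build-index", "--data", str(manifests_dir),
         "--model", model, "--output", str(index_dir), "--device", device],
        [sys.executable, "evaluate.py", "--model", model,
         "--index-prefix", str(index_dir / "reference"), "--split", "test",
         "--device", device, "--fast-eval", "--output-json", str(eval_json)],
    ]


root = Path("data/external_smoke")  # hypothetical layout
cmds = stage_commands("cpu", root / "manifests", root / "model",
                      root / "index", root / "report" / "eval.json")
for cmd in cmds:
    # Each list could be handed to subprocess.run(cmd, check=True).
    print(cmd[1], cmd[cmd.index("--device") + 1])
# train.py cpu
# run_demo.py cpu
# evaluate.py cpu
```

Building the lists first and running them afterwards keeps the device decision in one place, which matches the commit's `Directive` trailer: normalization stays at the adapter boundary until every downstream CLI supports `auto` natively.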