Reduce the handoff gap between archive completion and real-data readiness

Constraint: Once the large FMA archive finishes, future sessions should not need to manually stitch extraction and readiness checks together
Rejected: Leave post-download steps as manual shell sequences | Increases delay and error risk at the most valuable transition point
Confidence: high
Scope-risk: narrow
Directive: Keep fma_postdownload_ready.py as the canonical first command after archive completion before attempting real-data smoke runs
Tested: /usr/local/miniconda3/bin/python -m py_compile acr-engine/scripts/fma_postdownload_ready.py; /usr/local/miniconda3/bin/python acr-engine/scripts/fma_postdownload_ready.py
Not-tested: Successful extract and readiness on the full archive remain pending completion of the download

Commit 46b9d8d4 ... 46b9d8d40b465cffc6d512639e3fe33de04779da authored 2026-06-02 13:54:06 +0800 by cnb.bofCdSsphPA
Showing 4 changed files with 94 additions and 0 deletions
acr-engine/scripts/fma_postdownload_ready.py
docs/CHANGELOG.md
docs/open-dataset-workflow.md
docs/session-handoff.md
--- a/acr-engine/scripts/fma_postdownload_ready.py 0 → 100755
+++ b/acr-engine/scripts/fma_postdownload_ready.py 0 → 100755
+#!/usr/bin/env python3
+"""Post-download automation for the real FMA small archive.
+
+Runs extract + readiness checks once the archive is complete.
+If the archive is incomplete, exits with a structured blocked result.
+"""
+
+from __future__ import annotations
+
+import json
+import subprocess
+from pathlib import Path
+
+PYTHON = "/usr/local/miniconda3/bin/python"
+EXPECTED_BYTES = 7679594875
+ARCHIVE = Path("data/raw/fma_small.zip")
+EXTRACT_DIR = Path("data/raw/fma_small_audio")
+
+
+def run_json(cmd: list[str]) -> dict:
+    out = subprocess.check_output(cmd, text=True)
+    return json.loads(out)
+
+
+def main():
+    archive_exists = ARCHIVE.exists()
+    archive_size = ARCHIVE.stat().st_size if archive_exists else 0
+    if (not archive_exists) or archive_size < EXPECTED_BYTES:
+        print(json.dumps({
+            "status": "blocked",
+            "reason": "archive_not_complete",
+            "archive_exists": archive_exists,
+            "archive_size": archive_size,
+            "expected_bytes": EXPECTED_BYTES,
+            "progress_percent": round((archive_size / EXPECTED_BYTES) * 100, 4) if archive_exists else 0.0,
+        }, indent=2, ensure_ascii=False))
+        return
+
+    extract = run_json([PYTHON, "scripts/prepare_fma_archive.py", "extract"])
+    ready = run_json([PYTHON, "src/data/external_adapters.py", "check-local-ready", "fma", str(EXTRACT_DIR), "--eval-ratio", "0.2", "--query-duration", "8.0"])
+    inspect = run_json([PYTHON, "src/data/external_adapters.py", "inspect-local", "fma", str(EXTRACT_DIR), "--eval-ratio", "0.2", "--query-duration", "8.0"])
+
+    print(json.dumps({
+        "status": "ok",
+        "extract": extract,
+        "ready": ready,
+        "inspect": inspect,
+    }, indent=2, ensure_ascii=False))
+
+
+if __name__ == "__main__":
+    main()
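
The completeness gate at the top of `main()` is the only logic in the script that does not shell out, so it can be exercised without the real archive. The following is a minimal sketch, assuming a hypothetical helper named `archive_status` that is not part of this commit; it mirrors the blocked payload fields shown above and takes the expected size as a parameter so a small temporary file can stand in for the download.

```python
from __future__ import annotations

import json
import tempfile
from pathlib import Path

EXPECTED_BYTES = 7679594875  # same constant as in fma_postdownload_ready.py


def archive_status(archive: Path, expected_bytes: int = EXPECTED_BYTES) -> dict:
    """Hypothetical helper (not in the commit): describe archive completeness.

    Returns the same "blocked" payload shape as the script when the file is
    missing or smaller than expected_bytes, and {"status": "ok"} otherwise.
    """
    exists = archive.exists()
    size = archive.stat().st_size if exists else 0
    if not exists or size < expected_bytes:
        return {
            "status": "blocked",
            "reason": "archive_not_complete",
            "archive_exists": exists,
            "archive_size": size,
            "expected_bytes": expected_bytes,
            # size is 0 for a missing file, so this is 0.0 in that case.
            "progress_percent": round(size / expected_bytes * 100, 4),
        }
    return {"status": "ok"}


if __name__ == "__main__":
    # Simulate a partial download with a 25-byte file and a 100-byte target.
    with tempfile.TemporaryDirectory() as tmp:
        partial = Path(tmp) / "fma_small.zip"
        partial.write_bytes(b"\0" * 25)
        print(json.dumps(archive_status(partial, expected_bytes=100), indent=2))
```

Because `ARCHIVE` and `EXTRACT_DIR` in the script are relative paths, the result depends on the working directory; the workflow doc runs the script from inside `acr-engine`.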
--- a/docs/CHANGELOG.md
+++ b/docs/CHANGELOG.md
@@ -236,6 +236,30 @@



+
+### Stage: Automatic readiness after the FMA download completes
+
+Completed:
+- Added [acr-engine/scripts/fma_postdownload_ready.py](../acr-engine/scripts/fma_postdownload_ready.py)
+- Once the full FMA archive is complete, the script automatically runs:
+  - `extract`
+  - `check-local-ready`
+  - `inspect-local`
+- Linked the script from [docs/open-dataset-workflow.md](./open-dataset-workflow.md) and [docs/session-handoff.md](./session-handoff.md)
+
+Verification:
+- `/usr/local/miniconda3/bin/python -m py_compile scripts/fma_postdownload_ready.py` succeeded
+- `/usr/local/miniconda3/bin/python scripts/fma_postdownload_ready.py` returned a structured result
+- Current result:
+  - `status=blocked`
+  - `reason=archive_not_complete`
+  - `archive_size=1631584256`
+  - `progress_percent=21.2457`
+
+Conclusion:
+- Once the real FMA download completes, the repository can go from archive to "extract + local readiness check" with a single command
+- Later sessions no longer need to chain these steps by hand
+
 ### Stage: pgvector bulk load plan template

 Completed:
--- a/docs/open-dataset-workflow.md
+++ b/docs/open-dataset-workflow.md
@@ -131,6 +131,22 @@ flowchart LR

 ---

+
+### Single preparation command after the FMA download completes
+
+```bash
+cd acr-engine
+/usr/local/miniconda3/bin/python scripts/fma_postdownload_ready.py
+```
+
+When the archive is complete, this script automatically runs:
+
+1. `extract`
+2. `check-local-ready`
+3. `inspect-local`
+
+If the archive has not finished downloading, it returns a structured `archive_not_complete` result.
+
 ## Sources
 - See [dataset-spec.md](./dataset-spec.md)
 - See [dataset-sources-and-licensing.md](./dataset-sources-and-licensing.md)
--- a/docs/session-handoff.md
+++ b/docs/session-handoff.md
@@ -317,3 +317,5 @@
 - [README.md](./README.md)
 - [open-dataset-workflow.md](./open-dataset-workflow.md)
 - [CHANGELOG.md](./CHANGELOG.md)
+
+- After the FMA download completes, run directly: [acr-engine/scripts/fma_postdownload_ready.py](../acr-engine/scripts/fma_postdownload_ready.py)
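
Since `main()` returns normally in the blocked case, the script appears to exit with status 0 whether or not the archive is ready, so a later session would need to read the JSON `status` field rather than the exit code. A minimal sketch of such a consumer follows; `run_postdownload` and `next_step` are hypothetical names introduced here for illustration, and the default script path assumes the caller's working directory is `acr-engine`.

```python
from __future__ import annotations

import json
import subprocess
import sys


def run_postdownload(
    python: str = sys.executable,
    script: str = "scripts/fma_postdownload_ready.py",  # assumes cwd is acr-engine
) -> dict:
    """Hypothetical wrapper: run the script and parse its JSON stdout."""
    out = subprocess.check_output([python, script], text=True)
    return json.loads(out)


def next_step(result: dict) -> str:
    """Hypothetical helper: map the structured result to the next action."""
    if result.get("status") == "ok":
        return "run real-data smoke runs"
    if result.get("reason") == "archive_not_complete":
        percent = result.get("progress_percent", 0.0)
        return f"wait for download ({percent}% complete)"
    return "inspect output manually"


if __name__ == "__main__":
    # Blocked payload values recorded in docs/CHANGELOG.md for this commit.
    recorded = {
        "status": "blocked",
        "reason": "archive_not_complete",
        "archive_size": 1631584256,
        "progress_percent": 21.2457,
    }
    print(next_step(recorded))
```

The demo in the `__main__` guard only feeds in the recorded payload; calling `run_postdownload()` would require the repository layout and the sub-scripts from this commit to be present.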