Commit 46b9d8d4 46b9d8d40b465cffc6d512639e3fe33de04779da by cnb.bofCdSsphPA

Reduce the handoff gap between archive completion and real-data readiness

Constraint: Once the large FMA archive finishes, future sessions should not need to manually stitch extraction and readiness checks together
Rejected: Leave post-download steps as manual shell sequences | Increases delay and error risk at the most valuable transition point
Confidence: high
Scope-risk: narrow
Directive: Keep fma_postdownload_ready.py as the canonical first command after archive completion before attempting real-data smoke runs
Tested: /usr/local/miniconda3/bin/python -m py_compile acr-engine/scripts/fma_postdownload_ready.py; /usr/local/miniconda3/bin/python acr-engine/scripts/fma_postdownload_ready.py
Not-tested: Successful extract and readiness on the full archive remain pending completion of the download
1 parent 44bbfcb5
#!/usr/bin/env python3
"""Post-download automation for the real FMA small archive.

Runs extract + readiness checks once the archive is complete.
If the archive is incomplete, exits with a structured blocked result.
"""

from __future__ import annotations

import json
import subprocess
from pathlib import Path

PYTHON = "/usr/local/miniconda3/bin/python"
EXPECTED_BYTES = 7679594875
ARCHIVE = Path("data/raw/fma_small.zip")
EXTRACT_DIR = Path("data/raw/fma_small_audio")


def run_json(cmd: list[str]) -> dict:
    out = subprocess.check_output(cmd, text=True)
    return json.loads(out)


def main():
    archive_exists = ARCHIVE.exists()
    archive_size = ARCHIVE.stat().st_size if archive_exists else 0
    if (not archive_exists) or archive_size < EXPECTED_BYTES:
        print(json.dumps({
            "status": "blocked",
            "reason": "archive_not_complete",
            "archive_exists": archive_exists,
            "archive_size": archive_size,
            "expected_bytes": EXPECTED_BYTES,
            "progress_percent": round((archive_size / EXPECTED_BYTES) * 100, 4) if archive_exists else 0.0,
        }, indent=2, ensure_ascii=False))
        return

    extract = run_json([PYTHON, "scripts/prepare_fma_archive.py", "extract"])
    ready = run_json([PYTHON, "src/data/external_adapters.py", "check-local-ready", "fma", str(EXTRACT_DIR), "--eval-ratio", "0.2", "--query-duration", "8.0"])
    inspect = run_json([PYTHON, "src/data/external_adapters.py", "inspect-local", "fma", str(EXTRACT_DIR), "--eval-ratio", "0.2", "--query-duration", "8.0"])

    print(json.dumps({
        "status": "ok",
        "extract": extract,
        "ready": ready,
        "inspect": inspect,
    }, indent=2, ensure_ascii=False))


if __name__ == "__main__":
    main()
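
The `run_json` helper in the script raises if a sub-command exits non-zero or prints something that is not JSON, so the `ok` path has no structured failure output. A minimal sketch of one possible hardening follows. The name `run_json_safe` and the `error` / `command_failed` / `invalid_json` labels are assumptions made for illustration and are not part of the commit.

```python
import json
import subprocess
import sys
from typing import Sequence


def run_json_safe(cmd: Sequence[str]) -> dict:
    """Hypothetical variant of run_json: report failures as a dict instead of raising."""
    try:
        # stderr is captured so a failing child does not interleave with our JSON output.
        out = subprocess.check_output(list(cmd), text=True, stderr=subprocess.PIPE)
    except (subprocess.CalledProcessError, OSError) as exc:
        # Non-zero exit status, or the executable could not be started at all.
        return {"status": "error", "reason": "command_failed", "detail": str(exc)}
    try:
        return json.loads(out)
    except json.JSONDecodeError as exc:
        # The command ran but its stdout was not valid JSON.
        return {"status": "error", "reason": "invalid_json", "detail": str(exc)}


# Demo with the current interpreter standing in for the real sub-commands.
demo = run_json_safe([sys.executable, "-c", "import json; print(json.dumps({'status': 'ok'}))"])
print(demo)
```

With a helper like this, `main()` could include the error dict in its final JSON rather than ending in a traceback. Whether that is preferable to failing loudly is a design choice the commit does not address.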
@@ -236,6 +236,30 @@

### Stage: automatic readiness after the FMA download completes

Completed:
- Added [acr-engine/scripts/fma_postdownload_ready.py](../acr-engine/scripts/fma_postdownload_ready.py)
- Once the full FMA archive is complete, it automatically runs:
  - `extract`
  - `check-local-ready`
  - `inspect-local`
- Linked the script from [docs/open-dataset-workflow.md](./open-dataset-workflow.md) and [docs/session-handoff.md](./session-handoff.md)

Verification:
- `/usr/local/miniconda3/bin/python -m py_compile scripts/fma_postdownload_ready.py` succeeded
- `/usr/local/miniconda3/bin/python scripts/fma_postdownload_ready.py` returned a structured result
- Current result:
  - `status=blocked`
  - `reason=archive_not_complete`
  - `archive_size=1631584256`
  - `progress_percent=21.2457`

Conclusion:
- Once the real FMA download completes, the repository can go from the archive to "extract + local readiness check" with a single command
- Later sessions no longer need to chain these steps by hand

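
The `progress_percent` value recorded above can be reproduced from the two byte counts. The sketch below reuses the formula from `fma_postdownload_ready.py`. The function name `blocked_payload` is an assumption for illustration, since the script builds this dict inline.

```python
EXPECTED_BYTES = 7679594875  # same constant as in fma_postdownload_ready.py


def blocked_payload(archive_exists: bool, archive_size: int) -> dict:
    """Rebuild the 'blocked' result with the same rounding as the script."""
    progress = round((archive_size / EXPECTED_BYTES) * 100, 4) if archive_exists else 0.0
    return {
        "status": "blocked",
        "reason": "archive_not_complete",
        "archive_exists": archive_exists,
        "archive_size": archive_size,
        "expected_bytes": EXPECTED_BYTES,
        "progress_percent": progress,
    }


payload = blocked_payload(True, 1631584256)
print(payload["progress_percent"])  # 21.2457, matching the recorded result
```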
### Stage: pgvector bulk load plan template

Completed:
......
@@ -131,6 +131,22 @@ flowchart LR

---

### Single preparation command after the FMA download completes

```bash
cd acr-engine
/usr/local/miniconda3/bin/python scripts/fma_postdownload_ready.py
```

When the archive is complete, this script automatically runs:

1. `extract`
2. `check-local-ready`
3. `inspect-local`

If the archive has not finished downloading, it prints a structured `blocked` result with `reason=archive_not_complete`.

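
A later session could branch on the script's JSON output before attempting real-data smoke runs. The following is a minimal sketch under stated assumptions: the function names and the action labels (`run_smoke`, `wait_for_download`, `investigate`) are hypothetical. Only the `status` and `reason` fields come from the script itself.

```python
import json
import subprocess


def next_action(result: dict) -> str:
    """Map the script's structured result to a hypothetical next step."""
    status = result.get("status")
    if status == "ok":
        return "run_smoke"
    if status == "blocked" and result.get("reason") == "archive_not_complete":
        return "wait_for_download"
    # Anything else (missing fields, unknown status) needs a human look.
    return "investigate"


def check_fma_ready(python: str = "/usr/local/miniconda3/bin/python") -> str:
    """Run the script from the acr-engine directory and classify its output."""
    out = subprocess.check_output([python, "scripts/fma_postdownload_ready.py"], text=True)
    return next_action(json.loads(out))


print(next_action({"status": "blocked", "reason": "archive_not_complete"}))
```

`check_fma_ready` is not called here because it depends on the working directory. Like the script's own relative paths, it assumes the process was started inside `acr-engine`.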
## Sources
- See [dataset-spec.md](./dataset-spec.md)
- See [dataset-sources-and-licensing.md](./dataset-sources-and-licensing.md)
......
@@ -317,3 +317,5 @@
- [README.md](./README.md)
- [open-dataset-workflow.md](./open-dataset-workflow.md)
- [CHANGELOG.md](./CHANGELOG.md)

- Once the FMA download completes, run this directly: [acr-engine/scripts/fma_postdownload_ready.py](../acr-engine/scripts/fma_postdownload_ready.py)
......