Commit d1d7a512 d1d7a512c67befb32efac74a0ae54a6ac09580e2 by cnb.bofCdSsphPA

Align real FMA ingestion with the user-provided ModelScope source

Constraint: The user supplied a verified archive URL that is a better current source of truth than the previously tested mirror path
Rejected: Keep the older archive URL as the default control surface | Would ignore fresher user evidence and split operational guidance across sources
Confidence: high
Scope-risk: narrow
Directive: Treat the ModelScope FMA archive URL as the primary default until a newer verified source supersedes it
Tested: curl -I -L --max-time 60 https://modelscope.cn/datasets/pengzhendong/fma/resolve/master/fma_small.zip; curl -L --range 0-1023 --max-time 60 -o /tmp/fma_modelscope_probe.bin https://modelscope.cn/datasets/pengzhendong/fma/resolve/master/fma_small.zip; /usr/local/miniconda3/bin/python acr-engine/scripts/prepare_fma_archive.py inspect
Not-tested: Full archive completion, extraction, and downstream real-data smoke remain pending
1 parent 2ee3e829
...@@ -45,7 +45,7 @@ flowchart LR ...@@ -45,7 +45,7 @@ flowchart LR
45 45
46 ### What this script standardizes 46 ### What this script standardizes
47 47
48 - official source URL: `https://os.unil.cloud.switch.ch/fma/fma_small.zip` 48 - official source URL: `https://modelscope.cn/datasets/pengzhendong/fma/resolve/master/fma_small.zip`
49 - resumable archive download to `data/raw/fma_small.zip` 49 - resumable archive download to `data/raw/fma_small.zip`
50 - extraction target: `data/raw/fma_small_audio/` 50 - extraction target: `data/raw/fma_small_audio/`
51 51
......
...@@ -8,7 +8,7 @@ import json ...@@ -8,7 +8,7 @@ import json
8 import subprocess 8 import subprocess
9 from pathlib import Path 9 from pathlib import Path
10 10
11 FMA_SMALL_URL = "https://os.unil.cloud.switch.ch/fma/fma_small.zip" 11 FMA_SMALL_URL = "https://modelscope.cn/datasets/pengzhendong/fma/resolve/master/fma_small.zip"
12 ARCHIVE_PATH = Path("data/raw/fma_small.zip") 12 ARCHIVE_PATH = Path("data/raw/fma_small.zip")
13 EXTRACT_DIR = Path("data/raw/fma_small_audio") 13 EXTRACT_DIR = Path("data/raw/fma_small_audio")
14 14
......
...@@ -229,6 +229,34 @@ ...@@ -229,6 +229,34 @@
229 229
230 230
231 231
232
233 ### Stage: Switch the FMA source to ModelScope
234
235 Completed:
236 - Switched the default full-archive FMA source in [acr-engine/scripts/prepare_fma_archive.py](../acr-engine/scripts/prepare_fma_archive.py) to the user-provided ModelScope URL
237 - Updated in sync:
238   - [acr-engine/data/raw/README.md](../acr-engine/data/raw/README.md)
239   - [docs/open-dataset-workflow.md](./open-dataset-workflow.md)
240   - [docs/session-handoff.md](./session-handoff.md)
241 - Restarted the managed download process through the in-repo script
242
243 Verification results:
244 - `curl -I -L --max-time 60 https://modelscope.cn/datasets/pengzhendong/fma/resolve/master/fma_small.zip` succeeded
245 - Key details of the current response:
246   - `200`
247   - `content-length=7679594875`
248   - `accept-ranges: bytes`
249 - `curl -L --range 0-1023 ...` successfully fetched `1024` bytes
250 - `/usr/local/miniconda3/bin/python scripts/prepare_fma_archive.py inspect` succeeded
251 - Current result:
252   - `archive_url=https://modelscope.cn/datasets/pengzhendong/fma/resolve/master/fma_small.zip`
253   - `archive_size=53620736`
254 - The managed download process is running: `prepare_fma_archive.py download`
255
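256 Conclusion:
257 - The real FMA download now goes through the user-specified ModelScope channel
258 - Download control is also consolidated back into the in-repo script, which makes it easier for later sessions to resume the download and hand off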
259
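The verification figures above (`content-length=7679594875`, `accept-ranges: bytes`, `archive_size=53620736`) are enough to work out how a resumed download would continue. The following is a minimal sketch of that arithmetic, not the logic in `prepare_fma_archive.py`; `ResumePlan` and `plan_resume` are hypothetical names introduced only for illustration, and the sketch assumes the server keeps honoring byte ranges.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ResumePlan:
    """Hypothetical container describing how to continue a partial download."""
    range_header: Optional[str]  # value for an HTTP Range header, None if complete
    remaining_bytes: int
    percent_done: float


def plan_resume(local_size: int, remote_size: int) -> ResumePlan:
    """Derive the Range header and progress from local and remote sizes."""
    if local_size < 0 or remote_size <= 0:
        raise ValueError("sizes must be non-negative and remote_size positive")
    if local_size > remote_size:
        # A larger local file suggests a corrupt or mismatched archive.
        raise ValueError("local file is larger than the remote archive")

    remaining = remote_size - local_size
    # An open-ended range asks for everything from the first missing byte onward.
    header = f"bytes={local_size}-" if remaining > 0 else None
    return ResumePlan(
        range_header=header,
        remaining_bytes=remaining,
        percent_done=100.0 * local_size / remote_size,
    )


if __name__ == "__main__":
    # Values taken from the verification results recorded in this stage.
    plan = plan_resume(local_size=53620736, remote_size=7679594875)
    print(plan.range_header, plan.remaining_bytes, f"{plan.percent_done:.2f}%")
```

With the recorded values, the partial archive is well under one percent of the full file, so a later session should expect most of the transfer to still be pending.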
232 ### Stage: Service HTTP smoke 260 ### Stage: Service HTTP smoke
233 261
234 Completed: 262 Completed:
...@@ -311,7 +339,7 @@ ...@@ -311,7 +339,7 @@
311 - Added this path to the open dataset workflow and handoff docs 339 - Added this path to the open dataset workflow and handoff docs
312 340
313 Verification results: 341 Verification results:
314 - `curl -I -L --max-time 60 https://os.unil.cloud.switch.ch/fma/fma_small.zip` succeeded 342 - `curl -I -L --max-time 60 https://modelscope.cn/datasets/pengzhendong/fma/resolve/master/fma_small.zip` succeeded
315 - Key response headers: 343 - Key response headers:
316 - `200 OK` 344 - `200 OK`
317 - `Content-Type: application/zip` 345 - `Content-Type: application/zip`
......
...@@ -276,7 +276,7 @@ ...@@ -276,7 +276,7 @@
276 - [docs/session-handoff.md](./session-handoff.md) 276 - [docs/session-handoff.md](./session-handoff.md)
277 - [docs/current-capability-map.md](./current-capability-map.md) 277 - [docs/current-capability-map.md](./current-capability-map.md)
278 - [acr-engine/FIRST_RUN_CHECKLIST.md](../acr-engine/FIRST_RUN_CHECKLIST.md) 278 - [acr-engine/FIRST_RUN_CHECKLIST.md](../acr-engine/FIRST_RUN_CHECKLIST.md)
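279 - A download scaffold for the real FMA subset already exists: [acr-engine/scripts/fetch_fma_subset.py](../acr-engine/scripts/fetch_fma_subset.py); the most recent checks returned `403` for the old direct link and `404` for the historical page-level URL, but `https://os.unil.cloud.switch.ch/fma/fma_small.zip` is verified to return `200 OK` and supports range requests 279 - A download scaffold for the real FMA subset already exists: [acr-engine/scripts/fetch_fma_subset.py](../acr-engine/scripts/fetch_fma_subset.py); the most recent checks returned `403` for the old direct link and `404` for the historical page-level URL, but `https://modelscope.cn/datasets/pengzhendong/fma/resolve/master/fma_small.zip` is verified to return `200 OK` and supports range requests
280 - Run [acr-engine/scripts/status_snapshot.py](../acr-engine/scripts/status_snapshot.py) 280 - Run [acr-engine/scripts/status_snapshot.py](../acr-engine/scripts/status_snapshot.py)
281 - Or inspect the latest snapshot written to disk: `acr-engine/.omx/latest_status_snapshot.json` 281 - Or inspect the latest snapshot written to disk: `acr-engine/.omx/latest_status_snapshot.json`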
282 282
......