Commit d1d7a512 d1d7a512c67befb32efac74a0ae54a6ac09580e2 by cnb.bofCdSsphPA

Align real FMA ingestion with the user-provided ModelScope source

Constraint: The user supplied a verified archive URL that is a better current source of truth than the previously tested mirror path
Rejected: Keep the older archive URL as the default control surface | Would ignore fresher user evidence and split operational guidance across sources
Confidence: high
Scope-risk: narrow
Directive: Treat the ModelScope FMA archive URL as the primary default until a newer verified source supersedes it
Tested: curl -I -L --max-time 60 https://modelscope.cn/datasets/pengzhendong/fma/resolve/master/fma_small.zip; curl -L --range 0-1023 --max-time 60 -o /tmp/fma_modelscope_probe.bin https://modelscope.cn/datasets/pengzhendong/fma/resolve/master/fma_small.zip; /usr/local/miniconda3/bin/python acr-engine/scripts/prepare_fma_archive.py inspect
Not-tested: Full archive completion, extraction, and downstream real-data smoke remain pending
1 parent 2ee3e829
......@@ -45,7 +45,7 @@ flowchart LR
### What this script standardizes
- official source URL: `https://os.unil.cloud.switch.ch/fma/fma_small.zip`
- official source URL: `https://modelscope.cn/datasets/pengzhendong/fma/resolve/master/fma_small.zip`
- resumable archive download to `data/raw/fma_small.zip`
- extraction target: `data/raw/fma_small_audio/`
......
......@@ -8,7 +8,7 @@ import json
import subprocess
from pathlib import Path
FMA_SMALL_URL = "https://os.unil.cloud.switch.ch/fma/fma_small.zip"
FMA_SMALL_URL = "https://modelscope.cn/datasets/pengzhendong/fma/resolve/master/fma_small.zip"
ARCHIVE_PATH = Path("data/raw/fma_small.zip")
EXTRACT_DIR = Path("data/raw/fma_small_audio")
......
......@@ -229,6 +229,34 @@
### Stage: Switch the FMA source to ModelScope
Completed:
- The default FMA full-archive source in [acr-engine/scripts/prepare_fma_archive.py](../acr-engine/scripts/prepare_fma_archive.py) now points to the user-provided ModelScope URL
- Also updated:
  - [acr-engine/data/raw/README.md](../acr-engine/data/raw/README.md)
  - [docs/open-dataset-workflow.md](./open-dataset-workflow.md)
  - [docs/session-handoff.md](./session-handoff.md)
- Restarted the managed download through the in-repo script
Verification results:
- `curl -I -L --max-time 60 https://modelscope.cn/datasets/pengzhendong/fma/resolve/master/fma_small.zip` succeeded
- Key details of the current response:
  - `200`
  - `content-length=7679594875`
  - `accept-ranges: bytes`
- `curl -L --range 0-1023 ...` succeeded and fetched `1024` bytes
- `/usr/local/miniconda3/bin/python scripts/prepare_fma_archive.py inspect` succeeded
- Current output:
  - `archive_url=https://modelscope.cn/datasets/pengzhendong/fma/resolve/master/fma_small.zip`
  - `archive_size=53620736`
- The managed download process is running: `prepare_fma_archive.py download`
Conclusion:
- Real FMA downloads now go through the user-specified ModelScope channel
- The download control surface is unified back onto the in-repo script, which makes it easier for later sessions to resume the transfer and hand off
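
The verification figures in this stage (a `content-length`, an `accept-ranges: bytes` header, and a partial local archive size) are the inputs a resumable download needs. The following is a minimal sketch of how those values could be turned into a resume decision. It is not the logic of `prepare_fma_archive.py`, which this diff does not show; the names `ResumePlan` and `plan_resume` are hypothetical and exist only for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class ResumePlan:
    """Hypothetical summary of how to continue a partial download."""

    supports_range: bool
    total_bytes: Optional[int]
    local_bytes: int
    range_header: Optional[str]  # value for the HTTP "Range" request header
    percent_done: Optional[float]


def plan_resume(headers: Mapping[str, str], local_bytes: int) -> ResumePlan:
    """Derive a resume plan from HEAD response headers and the local file size."""
    # HTTP header names are case-insensitive, so normalize before lookup.
    lowered = {key.lower(): value.strip() for key, value in headers.items()}
    supports_range = lowered.get("accept-ranges", "").lower() == "bytes"
    raw_length = lowered.get("content-length", "")
    total = int(raw_length) if raw_length.isdigit() else None

    complete = total is not None and local_bytes >= total
    range_header = None
    if supports_range and local_bytes > 0 and not complete:
        # Open-ended byte range: everything from the first missing byte onward.
        range_header = f"bytes={local_bytes}-"

    percent = None
    if total:
        percent = round(100.0 * min(local_bytes, total) / total, 2)
    return ResumePlan(supports_range, total, local_bytes, range_header, percent)


if __name__ == "__main__":
    # Values taken from the verification results recorded in this stage.
    plan = plan_resume(
        {"Content-Length": "7679594875", "Accept-Ranges": "bytes"},
        local_bytes=53620736,
    )
    print(plan)
```

Keeping the decision in a pure function means it can be checked without network access. An actual transfer would pass `range_header` to whatever HTTP client performs the download and append the response body to the existing partial file.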
### Stage: Service HTTP smoke
Completed:
......@@ -311,7 +339,7 @@
- Added this path to the open-dataset workflow and handoff docs
Verification results:
- `curl -I -L --max-time 60 https://os.unil.cloud.switch.ch/fma/fma_small.zip` succeeded
- `curl -I -L --max-time 60 https://modelscope.cn/datasets/pengzhendong/fma/resolve/master/fma_small.zip` succeeded
- Key headers of the current response:
  - `200 OK`
  - `Content-Type: application/zip`
......
......@@ -276,7 +276,7 @@
- [docs/session-handoff.md](./session-handoff.md)
- [docs/current-capability-map.md](./current-capability-map.md)
- [acr-engine/FIRST_RUN_CHECKLIST.md](../acr-engine/FIRST_RUN_CHECKLIST.md)
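- Scaffolding for downloading a real FMA subset already exists: [acr-engine/scripts/fetch_fma_subset.py](../acr-engine/scripts/fetch_fma_subset.py); the most recent checks returned `403` for the old direct link and `404` for the historical page-level URL, but `https://os.unil.cloud.switch.ch/fma/fma_small.zip` is verified to return `200 OK` and supports range requests
- Scaffolding for downloading a real FMA subset already exists: [acr-engine/scripts/fetch_fma_subset.py](../acr-engine/scripts/fetch_fma_subset.py); the most recent checks returned `403` for the old direct link and `404` for the historical page-level URL, but `https://modelscope.cn/datasets/pengzhendong/fma/resolve/master/fma_small.zip` is verified to return `200 OK` and supports range requests
- Run [acr-engine/scripts/status_snapshot.py](../acr-engine/scripts/status_snapshot.py)
- Or inspect the latest snapshot written to disk: `acr-engine/.omx/latest_status_snapshot.json`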
......