Align real FMA ingestion with the user-provided ModelScope source
Constraint: The user supplied a verified archive URL that is a better current source of truth than the previously tested mirror path
Rejected: Keep the older archive URL as the default control surface | Would ignore fresher user evidence and split operational guidance across sources
Confidence: high
Scope-risk: narrow
Directive: Treat the ModelScope FMA archive URL as the primary default until a newer verified source supersedes it
Tested: curl -I -L --max-time 60 https://modelscope.cn/datasets/pengzhendong/fma/resolve/master/fma_small.zip; curl -L --range 0-1023 --max-time 60 -o /tmp/fma_modelscope_probe.bin https://modelscope.cn/datasets/pengzhendong/fma/resolve/master/fma_small.zip; /usr/local/miniconda3/bin/python acr-engine/scripts/prepare_fma_archive.py inspect
Not-tested: Full archive completion, extraction, and downstream real-data smoke remain pending
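The two `curl` probes listed under `Tested:` can be approximated in Python as a minimal sketch. The helper names `summarize_head`, `probe`, and `probe_first_bytes` are hypothetical and are not taken from `prepare_fma_archive.py`; the `timeout` argument is a per-socket timeout, so it only roughly corresponds to curl's `--max-time`.

```python
from urllib.request import Request, urlopen

FMA_SMALL_URL = "https://modelscope.cn/datasets/pengzhendong/fma/resolve/master/fma_small.zip"


def summarize_head(status, headers):
    """Reduce a HEAD response to the fields the commit message checks."""
    # Header names are case-insensitive, so normalize before lookup.
    lowered = {key.lower(): value for key, value in headers.items()}
    length = lowered.get("content-length")
    return {
        "ok": status == 200,
        "content_length": int(length) if length is not None else None,
        "supports_range": lowered.get("accept-ranges", "").strip().lower() == "bytes",
    }


def probe(url=FMA_SMALL_URL, timeout=60):
    """Hypothetical helper: HEAD request, roughly `curl -I -L --max-time 60`."""
    request = Request(url, method="HEAD")
    # urlopen follows redirects by default, similar to curl's -L.
    with urlopen(request, timeout=timeout) as response:
        return summarize_head(response.status, dict(response.headers.items()))


def probe_first_bytes(url=FMA_SMALL_URL, count=1024, timeout=60):
    """Hypothetical helper: fetch bytes 0..count-1, like `curl --range 0-1023`."""
    request = Request(url, headers={"Range": f"bytes=0-{count - 1}"})
    with urlopen(request, timeout=timeout) as response:
        return response.status, len(response.read(count))
```

Keeping `summarize_head` free of network calls means the header interpretation can be checked offline, while `probe` and `probe_first_bytes` stay thin wrappers around it.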
Showing 4 changed files with 32 additions and 4 deletions
| ... | @@ -45,7 +45,7 @@ flowchart LR | ... | @@ -45,7 +45,7 @@ flowchart LR |
| 45 | 45 | ||
| 46 | ### What this script standardizes | 46 | ### What this script standardizes |
| 47 | 47 | ||
| 48 | - official source URL: `https://os.unil.cloud.switch.ch/fma/fma_small.zip` | 48 | - official source URL: `https://modelscope.cn/datasets/pengzhendong/fma/resolve/master/fma_small.zip` |
| 49 | - resumable archive download to `data/raw/fma_small.zip` | 49 | - resumable archive download to `data/raw/fma_small.zip` |
| 50 | - extraction target: `data/raw/fma_small_audio/` | 50 | - extraction target: `data/raw/fma_small_audio/` |
| 51 | 51 | ... | ... |
| ... | @@ -8,7 +8,7 @@ import json | ... | @@ -8,7 +8,7 @@ import json |
| 8 | import subprocess | 8 | import subprocess |
| 9 | from pathlib import Path | 9 | from pathlib import Path |
| 10 | 10 | ||
| 11 | FMA_SMALL_URL = "https://os.unil.cloud.switch.ch/fma/fma_small.zip" | 11 | FMA_SMALL_URL = "https://modelscope.cn/datasets/pengzhendong/fma/resolve/master/fma_small.zip" |
| 12 | ARCHIVE_PATH = Path("data/raw/fma_small.zip") | 12 | ARCHIVE_PATH = Path("data/raw/fma_small.zip") |
| 13 | EXTRACT_DIR = Path("data/raw/fma_small_audio") | 13 | EXTRACT_DIR = Path("data/raw/fma_small_audio") |
| 14 | 14 | ... | ... |
| ... | @@ -229,6 +229,34 @@ | ... | @@ -229,6 +229,34 @@ |
| 229 | 229 | ||
| 230 | 230 | ||
| 231 | 231 | ||
| 232 | |||
| 233 | ### Stage: Switch the FMA source to ModelScope | ||
| 234 | |||
| 235 | Completed: | ||
| 236 | - Switched the default FMA full-archive source in [acr-engine/scripts/prepare_fma_archive.py](../acr-engine/scripts/prepare_fma_archive.py) to the user-provided ModelScope URL | ||
| 237 | - Updated in sync: | ||
| 238 | - [acr-engine/data/raw/README.md](../acr-engine/data/raw/README.md) | ||
| 239 | - [docs/open-dataset-workflow.md](./open-dataset-workflow.md) | ||
| 240 | - [docs/session-handoff.md](./session-handoff.md) | ||
| 241 | - Restarted the managed download flow through the in-repo script | ||
| 242 | |||
| 243 | Verification results: | ||
| 244 | - `curl -I -L --max-time 60 https://modelscope.cn/datasets/pengzhendong/fma/resolve/master/fma_small.zip` succeeded | ||
| 245 | - Key details of the current response: | ||
| 246 | - `200` | ||
| 247 | - `content-length=7679594875` | ||
| 248 | - `accept-ranges: bytes` | ||
| 249 | - `curl -L --range 0-1023 ...` successfully fetched `1024` bytes | ||
| 250 | - `/usr/local/miniconda3/bin/python scripts/prepare_fma_archive.py inspect` succeeded | ||
| 251 | - Current result: | ||
| 252 | - `archive_url=https://modelscope.cn/datasets/pengzhendong/fma/resolve/master/fma_small.zip` | ||
| 253 | - `archive_size=53620736` | ||
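| 254 | - The managed download process is running: `prepare_fma_archive.py download` | ||
| 255 | |||
| 256 | Conclusion: | ||
| 257 | - Real FMA downloads now go through the user-specified ModelScope channel | ||
| 258 | - The download control surface is also unified back onto the in-repo script, which makes it easier for later sessions to resume the download and hand off | ||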
| 259 | |||
| 232 | ### Stage: Service HTTP smoke | 260 | ### Stage: Service HTTP smoke |
| 233 | 261 | ||
| 234 | Completed: | 262 | Completed: |
| ... | @@ -311,7 +339,7 @@ | ... | @@ -311,7 +339,7 @@ |
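| 311 | - Added this path to the open-dataset workflow and handoff docs | 339 | - Added this path to the open-dataset workflow and handoff docs |
| 312 | 340 | ||
| 313 | Verification results: | 341 | Verification results: |
| 314 | - `curl -I -L --max-time 60 https://os.unil.cloud.switch.ch/fma/fma_small.zip` succeeded | 342 | - `curl -I -L --max-time 60 https://modelscope.cn/datasets/pengzhendong/fma/resolve/master/fma_small.zip` succeeded |
| 315 | - Key details of the current response headers: | 343 | - Key details of the current response headers: |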
| 316 | - `200 OK` | 344 | - `200 OK` |
| 317 | - `Content-Type: application/zip` | 345 | - `Content-Type: application/zip` | ... | ... |
| ... | @@ -276,7 +276,7 @@ | ... | @@ -276,7 +276,7 @@ |
| 276 | - [docs/session-handoff.md](./session-handoff.md) | 276 | - [docs/session-handoff.md](./session-handoff.md) |
| 277 | - [docs/current-capability-map.md](./current-capability-map.md) | 277 | - [docs/current-capability-map.md](./current-capability-map.md) |
| 278 | - [acr-engine/FIRST_RUN_CHECKLIST.md](../acr-engine/FIRST_RUN_CHECKLIST.md) | 278 | - [acr-engine/FIRST_RUN_CHECKLIST.md](../acr-engine/FIRST_RUN_CHECKLIST.md) |
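| 279 | - A download scaffold for a real FMA subset already exists: [acr-engine/scripts/fetch_fma_subset.py](../acr-engine/scripts/fetch_fma_subset.py); the most recent checks returned `403` for the old direct link and `404` for the page-level historical URL; however, `https://os.unil.cloud.switch.ch/fma/fma_small.zip` has been verified to return `200 OK` and to support range requests | 279 | - A download scaffold for a real FMA subset already exists: [acr-engine/scripts/fetch_fma_subset.py](../acr-engine/scripts/fetch_fma_subset.py); the most recent checks returned `403` for the old direct link and `404` for the page-level historical URL; however, `https://modelscope.cn/datasets/pengzhendong/fma/resolve/master/fma_small.zip` has been verified to return `200 OK` and to support range requests |
| 280 | - Run [acr-engine/scripts/status_snapshot.py](../acr-engine/scripts/status_snapshot.py) | 280 | - Run [acr-engine/scripts/status_snapshot.py](../acr-engine/scripts/status_snapshot.py) |
| 281 | - Or inspect the latest persisted snapshot directly: `acr-engine/.omx/latest_status_snapshot.json` | 281 | - Or inspect the latest persisted snapshot directly: `acr-engine/.omx/latest_status_snapshot.json` |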
| 282 | 282 | ... | ... |
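The diff describes a resumable download to `data/raw/fma_small.zip` from a server that advertises `accept-ranges: bytes`. How a resume request could be planned from the size of the partial file can be sketched as follows. `plan_resume` is a hypothetical helper, not the logic in `prepare_fma_archive.py`, and it assumes that the reported `archive_size` is the size of the partial file on disk.

```python
from pathlib import Path


def plan_resume(archive_path, total_size):
    """Return the Range header and progress for a partially downloaded archive."""
    path = Path(archive_path)
    # A missing file simply means the download starts from byte 0.
    have = path.stat().st_size if path.exists() else 0
    if have >= total_size:
        return {"complete": True, "range_header": None, "fraction": 1.0}
    return {
        "complete": False,
        # An open-ended range asks the server for everything after `have` bytes.
        "range_header": f"bytes={have}-",
        "fraction": have / total_size,
    }
```

Under that assumption, a partial file of `53620736` bytes against a `content-length` of `7679594875` would give a `Range: bytes=53620736-` request and a completed fraction below 1%.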