Add explicit drop zones for real open-music corpora
Constraint: Replacing the synthetic stand-in with real FMA or MTG-Jamendo data should not require users to infer directory structure
Rejected: Leave only generic workflow text | Still forces users to guess where local audio should live before smoke runs
Confidence: high
Scope-risk: narrow
Directive: Keep future real-corpus onboarding anchored to data/raw drop zones and smoke-local commands
Tested: filesystem existence checks for acr-engine/data/raw/fma_small_audio, acr-engine/data/raw/mtg_jamendo_audio, acr-engine/data/raw/README.md, docs/README.md, docs/open-dataset-workflow.md, acr-engine/data/external_ingested/README.md
Not-tested: Real downloaded audio placed into the new drop zones
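The `Tested:` trailer describes filesystem existence checks over six paths. The following is a minimal sketch of such a check, assuming it runs from the repository root. The helper name `missing_paths` is hypothetical and not part of the repository.

```python
from pathlib import Path

# Paths listed in the commit's "Tested:" trailer, relative to the repository root.
EXPECTED_PATHS = [
    "acr-engine/data/raw/fma_small_audio",
    "acr-engine/data/raw/mtg_jamendo_audio",
    "acr-engine/data/raw/README.md",
    "docs/README.md",
    "docs/open-dataset-workflow.md",
    "acr-engine/data/external_ingested/README.md",
]


def missing_paths(repo_root: Path, expected=EXPECTED_PATHS) -> list:
    """Return the expected paths (file or directory) that do not exist under repo_root."""
    return [p for p in expected if not (repo_root / p).exists()]
```

Calling `missing_paths(Path("."))` from the repository root should return an empty list once the drop zones and docs are in place. Note that this only checks existence, not whether each path is a file or a directory.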
Showing 5 changed files with 50 additions and 0 deletions
| ... | @@ -13,6 +13,9 @@ Examples: | ... | @@ -13,6 +13,9 @@ Examples: |
| 13 | - [data/raw/fma_small_audio/](../raw/fma_small_audio/) | 13 | - [data/raw/fma_small_audio/](../raw/fma_small_audio/) |
| 14 | - [data/raw/mtg_jamendo_audio/](../raw/mtg_jamendo_audio/) | 14 | - [data/raw/mtg_jamendo_audio/](../raw/mtg_jamendo_audio/) |
| 15 | | 15 | |
| | | 16 | Drop-zone details: |
| | | 17 | - [data/raw/README.md](../raw/README.md) |
| | | 18 | |
| 16 | ### 2. Generate manifests through the adapter entrypoint | 19 | ### 2. Generate manifests through the adapter entrypoint |
| 17 | Optional pre-check: | 20 | Optional pre-check: |
| 18 | ```bash | 21 | ```bash |
acr-engine/data/raw/README.md
0 → 100644 (new file)
| | | 1 | # Local Open-Music Drop Zones |
| | | 2 | |
| | | 3 | Put real downloaded open-music audio files here before running the one-shot smoke flow. |
| | | 4 | |
| | | 5 | ## Recommended folders |
| | | 6 | - `data/raw/fma_small_audio/` |
| | | 7 | - `data/raw/mtg_jamendo_audio/` |
| | | 8 | |
| | | 9 | ## Next command |
| | | 10 | For FMA: |
| | | 11 | ```bash |
| | | 12 | /usr/local/miniconda3/bin/python src/data/external_adapters.py smoke-local fma data/raw/fma_small_audio --output-root data/external_smoke --eval-ratio 0.2 --query-duration 8.0 --train-epochs 1 --batch-size 2 |
| | | 13 | ``` |
| | | 14 | |
| | | 15 | For MTG-Jamendo: |
| | | 16 | ```bash |
| | | 17 | /usr/local/miniconda3/bin/python src/data/external_adapters.py smoke-local mtg_jamendo data/raw/mtg_jamendo_audio --output-root data/external_smoke --eval-ratio 0.2 --query-duration 8.0 --train-epochs 1 --batch-size 2 |
| | | 18 | ``` |
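The two smoke commands in the new README differ only in the corpus name and the input directory. A minimal sketch that derives both from one mapping, using only the flags that appear in the README, is shown below. The helper `smoke_local_argv` and the `DROP_ZONES` dict are hypothetical; the interpreter path is copied from the README and will likely differ per machine. Because the paths are relative, the commands are presumably meant to be run from `acr-engine/`.

```python
import shlex

# Corpus name -> drop-zone directory, as listed in data/raw/README.md.
DROP_ZONES = {
    "fma": "data/raw/fma_small_audio",
    "mtg_jamendo": "data/raw/mtg_jamendo_audio",
}

# Interpreter path as written in the README; adjust for your environment.
PYTHON = "/usr/local/miniconda3/bin/python"


def smoke_local_argv(corpus: str, python: str = PYTHON) -> list:
    """Build the smoke-local argv for one corpus, with the README's flag values."""
    return [
        python, "src/data/external_adapters.py", "smoke-local",
        corpus, DROP_ZONES[corpus],
        "--output-root", "data/external_smoke",
        "--eval-ratio", "0.2",
        "--query-duration", "8.0",
        "--train-epochs", "1",
        "--batch-size", "2",
    ]


for name in DROP_ZONES:
    # Prints the same command lines that the README lists.
    print(shlex.join(smoke_local_argv(name)))
```

Passing the list to `subprocess.run(argv, check=True)` instead of printing it would launch the run; keeping the argv as a list avoids shell-quoting issues if a drop-zone path ever contains spaces.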
| ... | @@ -144,6 +144,26 @@ | ... | @@ -144,6 +144,26 @@ |
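| 144 | - Now, by swapping only `input_dir`, the full smoke run works against real local FMA / MTG-Jamendo directories | 144 | - Now, by swapping only `input_dir`, the full smoke run works against real local FMA / MTG-Jamendo directories |
| 145 | - This significantly lowers the cost of onboarding and validating real open datasets | 145 | - This significantly lowers the cost of onboarding and validating real open datasets |
| 146 | | 146 | |
| | | 147 | ### Stage: Drop-zone directory template for real open data |
| | | 148 | |
| | | 149 | Completed: |
| | | 150 | - Added [acr-engine/data/raw/README.md](../acr-engine/data/raw/README.md) |
| | | 151 | - Created local drop-zone directories for open data: |
| | | 152 | - [acr-engine/data/raw/fma_small_audio/](../acr-engine/data/raw/fma_small_audio/) |
| | | 153 | - [acr-engine/data/raw/mtg_jamendo_audio/](../acr-engine/data/raw/mtg_jamendo_audio/) |
| | | 154 | - Linked these directories from the open-dataset workflow and the top-level docs index |
| | | 155 | |
| | | 156 | Verification: |
| | | 157 | - Local directories created: |
| | | 158 | - `data/raw/` |
| | | 159 | - `data/raw/fma_small_audio/` |
| | | 160 | - `data/raw/mtg_jamendo_audio/` |
| | | 161 | - `data/raw/README.md` contains ready-to-run templates for the next smoke command |
| | | 162 | |
| | | 163 | Conclusion: |
| | | 164 | - Real open data now only needs to be placed in the designated directories |
| | | 165 | - Swapping in real local FMA / MTG-Jamendo audio later no longer requires guessing the directory layout |
| | | 166 | |
| 147 | ### Stage: Targeted optimization for confused cases, v6 (sample-level weighting) | 167 | ### Stage: Targeted optimization for confused cases, v6 (sample-level weighting) |
| 148 | | 168 | |
| 149 | Completed: | 169 | Completed: |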
| ... | @@ -63,6 +63,10 @@ flowchart TD | ... | @@ -63,6 +63,10 @@ flowchart TD |
| 63 | - [Data sources and onboarding](./dataset-sources-and-licensing.md) | 63 | - [Data sources and onboarding](./dataset-sources-and-licensing.md) |
| 64 | - [Industrial benchmark spec](./industrial-benchmark-spec.md) | 64 | - [Industrial benchmark spec](./industrial-benchmark-spec.md) |
| 65 | | 65 | |
| | | 66 | Quick-start entry points: |
| | | 67 | - [Open-dataset workflow](./open-dataset-workflow.md) |
| | | 68 | - [Local open-data drop zones](../acr-engine/data/raw/README.md) |
| | | 69 | |
| 66 | ### C. Service and engineering | 70 | ### C. Service and engineering |
| 67 | - [Service API](./service-api.md) | 71 | - [Service API](./service-api.md) |
| 68 | - [Changelog](./CHANGELOG.md) | 72 | - [Changelog](./CHANGELOG.md) |
| ... | @@ -69,6 +69,11 @@ flowchart LR | ... | @@ -69,6 +69,11 @@ flowchart LR |
| 69 | /usr/local/miniconda3/bin/python src/data/external_adapters.py smoke-local fma data/raw/fma_small_audio --output-root data/external_smoke --eval-ratio 0.2 --query-duration 8.0 --train-epochs 1 --batch-size 2 | 69 | /usr/local/miniconda3/bin/python src/data/external_adapters.py smoke-local fma data/raw/fma_small_audio --output-root data/external_smoke --eval-ratio 0.2 --query-duration 8.0 --train-epochs 1 --batch-size 2 |
| 70 | ``` | 70 | ``` |
| 71 | | 71 | |
| | | 72 | For where to place the real directories, see: |
| | | 73 | - [acr-engine/data/raw/README.md](../acr-engine/data/raw/README.md) |
| | | 74 | - [acr-engine/data/raw/fma_small_audio/](../acr-engine/data/raw/fma_small_audio/) |
| | | 75 | - [acr-engine/data/raw/mtg_jamendo_audio/](../acr-engine/data/raw/mtg_jamendo_audio/) |
| | | 76 | |
| 72 | --- | 77 | --- |
| 73 | | 78 | |
| 74 | ## 4. Output artifacts | 79 | ## 4. Output artifacts |
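The commit lists "Real downloaded audio placed into the new drop zones" as not tested, so a cheap guard before launching `smoke-local` is to confirm that a drop zone actually contains audio. This is a sketch under stated assumptions: the suffix set is a guess rather than the adapter's documented input formats, the scan is recursive because archives such as FMA's nest MP3s in numbered subfolders, and whether `external_adapters.py` itself scans recursively is not stated in this diff. The helper `count_audio_files` is hypothetical.

```python
from pathlib import Path

# Assumed audio extensions; the adapter's accepted formats are not documented here.
AUDIO_SUFFIXES = {".mp3", ".wav", ".flac", ".ogg"}


def count_audio_files(drop_zone: Path, suffixes=AUDIO_SUFFIXES) -> int:
    """Count audio files anywhere under drop_zone (recursive, case-insensitive suffix)."""
    if not drop_zone.is_dir():
        # A missing drop zone is treated the same as an empty one.
        return 0
    return sum(
        1 for p in drop_zone.rglob("*")
        if p.is_file() and p.suffix.lower() in suffixes
    )
```

For example, `count_audio_files(Path("data/raw/fma_small_audio")) > 0` could gate the FMA smoke command so that an empty drop zone fails fast instead of producing an empty manifest.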