Enable open music datasets to feed train and eval splits

Constraint: Personal-use workflow needs real train/eval manifests rather than bootstrap-only placeholders
Rejected: Keep external datasets as catalog skeletons only | Does not satisfy training/evaluation reuse requirement
Confidence: high
Scope-risk: narrow
Directive: Wire real FMA or MTG-Jamendo local download directories into this ingestion path before larger-scale training
Tested: /usr/local/miniconda3/bin/python -m py_compile src/data/manifest_tools.py; /usr/local/miniconda3/bin/python src/data/manifest_tools.py audio-dir-to-splits tmp/open_music_demo data/external_ingested/demo_fma_like --source-dataset demo_fma_like --eval-ratio 0.5 --query-duration 5.0
Not-tested: Full download/import of upstream FMA or MTG-Jamendo corpora
8 changed files with 215 additions and 1 deletion
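The commit does not show how `audio-dir-to-splits` divides a directory into train and eval splits. Below is a minimal Python sketch of one plausible approach. It assumes a seeded random split controlled by `--eval-ratio`, and a fixed query window of `--query-duration` seconds attached to each eval item. `split_audio_dir`, `write_jsonl`, the extension list, the seed, and the manifest field names are all hypothetical. They are not the actual code in `src/data/manifest_tools.py`.

```python
import json
import random
from pathlib import Path

# Hypothetical: the commit does not say which audio extensions the real tool accepts.
AUDIO_EXTENSIONS = {".mp3", ".wav", ".flac", ".ogg"}


def split_audio_dir(audio_dir, source_dataset, eval_ratio=0.5, query_duration=5.0, seed=0):
    """Assign every audio file under audio_dir to a train or eval split.

    Returns (train_records, eval_records). The record fields are illustrative
    assumptions, not the manifest schema used by manifest_tools.py.
    """
    if not 0.0 <= eval_ratio <= 1.0:
        raise ValueError("eval_ratio must be in [0, 1]")
    # Collect audio files recursively; sort first so the shuffle below is reproducible.
    files = sorted(
        p for p in Path(audio_dir).rglob("*")
        if p.is_file() and p.suffix.lower() in AUDIO_EXTENSIONS
    )
    rng = random.Random(seed)  # fixed seed -> same split on every run
    rng.shuffle(files)
    n_eval = round(len(files) * eval_ratio)
    eval_files, train_files = files[:n_eval], files[n_eval:]

    def record(path, split):
        rec = {"path": str(path), "source_dataset": source_dataset, "split": split}
        if split == "eval":
            # Hypothetical: each eval item carries a query window starting at 0 s.
            rec["query_start"] = 0.0
            rec["query_duration"] = query_duration
        return rec

    return ([record(p, "train") for p in train_files],
            [record(p, "eval") for p in eval_files])


def write_jsonl(records, out_path):
    """Write one JSON object per line, creating parent directories as needed."""
    out_path = Path(out_path)
    out_path.parent.mkdir(parents=True, exist_ok=True)
    with out_path.open("w", encoding="utf-8") as f:
        for rec in records:
            f.write(json.dumps(rec) + "\n")
```

A fixed seed over a sorted file list keeps the split stable across reruns. This matters if eval manifests are reused between training runs.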