Add open-dataset inventory checks before ingestion
Constraint: Personal-use dataset setup needs quick scale visibility before generating train/eval manifests
Rejected: Generate splits blindly | Hides whether a local corpus is large enough for meaningful train/test separation
Confidence: high
Scope-risk: narrow
Directive: Run inspect-local on real FMA or MTG-Jamendo folders before prepare-local and training
Tested: /usr/local/miniconda3/bin/python -m py_compile src/data/manifest_tools.py src/data/external_adapters.py; /usr/local/miniconda3/bin/python src/data/manifest_tools.py inspect-audio-dir tmp/open_music_demo --query-duration 5.0 --eval-ratio 0.5; /usr/local/miniconda3/bin/python src/data/external_adapters.py inspect-local fma tmp/open_music_demo --eval-ratio 0.5 --query-duration 5.0
Not-tested: Real large external corpus inventory on downloaded FMA or MTG-Jamendo directories
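For illustration only, the kind of pre-ingestion inventory check this commit describes might look roughly like the sketch below. It is not the code in src/data/manifest_tools.py or src/data/external_adapters.py; the function name `inventory_audio_dir`, the extension set, and the returned keys are all hypothetical. It only counts files and sizes, so duration-based filtering (what --query-duration presumably controls) is left out.

```python
from pathlib import Path

# Assumed set of audio extensions; the real tools may accept a different list.
AUDIO_EXTENSIONS = {".mp3", ".wav", ".flac", ".ogg"}


def inventory_audio_dir(root, eval_ratio=0.5, extensions=AUDIO_EXTENSIONS):
    """Summarize a local audio folder before any train/eval manifest is written.

    Hypothetical helper: reports how many audio files exist and how many
    would land in each split for the given eval_ratio.
    """
    if not 0.0 <= eval_ratio <= 1.0:
        raise ValueError("eval_ratio must be between 0 and 1")

    # Recursively collect files whose suffix looks like audio.
    files = sorted(
        p for p in Path(root).rglob("*")
        if p.is_file() and p.suffix.lower() in extensions
    )

    total = len(files)
    eval_files = int(total * eval_ratio)  # floor, so eval never exceeds total
    train_files = total - eval_files

    return {
        "total_files": total,
        "total_bytes": sum(p.stat().st_size for p in files),
        "eval_files": eval_files,
        "train_files": train_files,
        # A split is only meaningful if both sides get at least one file.
        "splittable": eval_files >= 1 and train_files >= 1,
    }


if __name__ == "__main__":
    # Same demo folder as in the Tested trailer; adjust to a real corpus path.
    print(inventory_audio_dir("tmp/open_music_demo", eval_ratio=0.5))
```

Checking `splittable` before generating manifests is one way to avoid the "Rejected" option above, where splits are produced without knowing whether the corpus can support them.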
Showing 5 changed files with 155 additions and 0 deletions
acr-engine/data/external_ingested/README.md
0 → 100644