# External Open-Music Ingestion
## Goal
Convert local open-music audio folders into ACR-ready manifests for:
- training queries
- evaluation queries
- reference catalog indexing
## Recommended personal-use flow
1. Prepare a local audio directory
Examples:
- `data/raw/fma_small_audio/`
- `data/raw/mtg_jamendo_audio/`
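Before running the adapter, it can help to confirm that the directory actually contains audio. The following is a minimal sketch. `count_audio_files` is a hypothetical helper, and the extension set is an assumption, not taken from `external_adapters.py`.

```python
from collections import Counter
from pathlib import Path

# Assumed audio extensions; the adapter's actual accepted formats may differ.
AUDIO_EXTS = {".mp3", ".wav", ".flac", ".ogg", ".m4a"}


def count_audio_files(root: str) -> Counter:
    """Recursively count files under `root` whose extension looks like audio."""
    base = Path(root)
    if not base.is_dir():
        raise NotADirectoryError(f"not a directory: {root}")
    counts: Counter = Counter()
    for path in base.rglob("*"):
        # Normalize case so ".MP3" and ".mp3" are counted together.
        if path.is_file() and path.suffix.lower() in AUDIO_EXTS:
            counts[path.suffix.lower()] += 1
    return counts
```

For example, `count_audio_files("data/raw/fma_small_audio")` returns per-extension counts. An empty result usually means the wrong folder was passed.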
2. Generate manifests through the adapter entrypoint
Optional pre-check:
```bash
/usr/local/miniconda3/bin/python src/data/external_adapters.py inspect-local fma data/raw/fma_small_audio --eval-ratio 0.2 --query-duration 8.0
```
Then generate manifests:
```bash
/usr/local/miniconda3/bin/python src/data/external_adapters.py prepare-local fma data/raw/fma_small_audio --output-root data/external_ingested --eval-ratio 0.2 --query-duration 8.0
```
or
```bash
/usr/local/miniconda3/bin/python src/data/external_adapters.py prepare-local mtg_jamendo data/raw/mtg_jamendo_audio --output-root data/external_ingested --eval-ratio 0.2 --query-duration 8.0
```
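The two `prepare-local` calls can also be scripted so that both datasets are regenerated with identical settings. This is a minimal sketch. The helper names `build_prepare_cmd` and `prepare_all` are hypothetical. The flag values are copied from the commands above. The sketch assumes it is run from the repository root, because all paths are relative.

```python
import subprocess
import sys
from typing import List

# Dataset name -> local audio directory, taken from the examples in this README.
DATASETS = {
    "fma": "data/raw/fma_small_audio",
    "mtg_jamendo": "data/raw/mtg_jamendo_audio",
}


def build_prepare_cmd(dataset: str, audio_dir: str, python: str = sys.executable,
                      output_root: str = "data/external_ingested",
                      eval_ratio: float = 0.2,
                      query_duration: float = 8.0) -> List[str]:
    """Build the argv for one `prepare-local` call, mirroring the commands above."""
    return [
        python, "src/data/external_adapters.py", "prepare-local", dataset, audio_dir,
        "--output-root", output_root,
        "--eval-ratio", str(eval_ratio),
        "--query-duration", str(query_duration),
    ]


def prepare_all(python: str = sys.executable) -> None:
    """Run `prepare-local` for every dataset; check=True stops on the first failure."""
    for dataset, audio_dir in DATASETS.items():
        subprocess.run(build_prepare_cmd(dataset, audio_dir, python=python), check=True)
```

For example, `prepare_all("/usr/local/miniconda3/bin/python")` reproduces both commands. A non-zero exit raises `subprocess.CalledProcessError` instead of silently continuing.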
3. Use outputs
Generated files:
- `catalog.json`: reference tracks for indexing
- `train.json`: training queries + references
- `test.json`: held-out eval queries + references
- `val.json`: optional validation split
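A quick post-run check is to confirm that the required manifests exist and parse. This README does not document the manifest schema, so the sketch below only checks presence and JSON validity. `load_manifests` is a hypothetical helper. The exact directory layout under `--output-root` is also an assumption, so pass the directory that actually contains the JSON files.

```python
import json
from pathlib import Path
from typing import Any, Dict

REQUIRED = ("catalog.json", "train.json", "test.json")
OPTIONAL = ("val.json",)


def load_manifests(manifest_dir: str) -> Dict[str, Any]:
    """Parse the generated manifests; fail loudly if a required one is missing."""
    root = Path(manifest_dir)
    manifests: Dict[str, Any] = {}
    for name in REQUIRED + OPTIONAL:
        path = root / name
        if not path.exists():
            if name in REQUIRED:
                raise FileNotFoundError(f"missing required manifest: {path}")
            continue  # val.json is optional, so its absence is not an error
        with path.open(encoding="utf-8") as fh:
            manifests[name] = json.load(fh)
    return manifests
```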
## Notes
- Small datasets are handled automatically so that both the train and test query sets are non-empty.
- For personal use, FMA and MTG-Jamendo should be the first real baselines.
- Keep `test.json` fixed across experiments so that models are compared fairly.
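The small-dataset protection described in the notes can be pictured as a clamped split. The following is a minimal sketch, not the adapter's actual code. The function name `split_queries`, the seeded shuffle, and the rounding rule are all assumptions. A fixed seed keeps the held-out set stable across runs, which matches the advice to keep `test.json` fixed.

```python
import random
from typing import List, Sequence, Tuple


def split_queries(track_ids: Sequence[str], eval_ratio: float = 0.2,
                  seed: int = 0) -> Tuple[List[str], List[str]]:
    """Hypothetical train/eval split that keeps both sides non-empty."""
    if not 0.0 < eval_ratio < 1.0:
        raise ValueError("eval_ratio must be in (0, 1)")
    if len(track_ids) < 2:
        raise ValueError("need at least 2 tracks to build both train and eval queries")
    # Sort before shuffling so the result depends only on the IDs and the seed.
    ids = sorted(track_ids)
    random.Random(seed).shuffle(ids)
    n = len(ids)
    # Clamp so that at least one track lands on each side, even for tiny datasets.
    n_eval = min(max(round(n * eval_ratio), 1), n - 1)
    return ids[n_eval:], ids[:n_eval]
```

With `eval_ratio=0.2`, ten tracks split 8/2. Two tracks would round to zero eval tracks, but the clamp still yields a 1/1 split.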