External Open-Music Ingestion
Goal
Convert local open-music audio folders into ACR-ready manifests for:
- training queries
- evaluation queries
- reference catalog indexing
Recommended personal-use flow
1. Prepare a local audio directory
Examples:
Drop-zone details:
2. Generate manifests through the adapter entrypoint
Optional pre-check:
/usr/local/miniconda3/bin/python src/data/external_adapters.py inspect-local fma data/raw/fma_small_audio --eval-ratio 0.2 --query-duration 8.0
Batch pre-check across multiple candidate corpora:
/usr/local/miniconda3/bin/python src/data/external_adapters.py inspect-batch fma=data/raw/fma_small_audio mtg_jamendo=data/raw/mtg_jamendo_audio --eval-ratio 0.2 --query-duration 8.0
Then generate manifests:
/usr/local/miniconda3/bin/python src/data/external_adapters.py prepare-local fma data/raw/fma_small_audio --output-root data/external_ingested --eval-ratio 0.2 --query-duration 8.0
or
/usr/local/miniconda3/bin/python src/data/external_adapters.py prepare-local mtg_jamendo data/raw/mtg_jamendo_audio --output-root data/external_ingested --eval-ratio 0.2 --query-duration 8.0
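If you would rather drive step 2 from a script than retype the commands, the following is a minimal sketch. It only assembles the `prepare-local` invocation shown above; `build_prepare_command` and `CORPORA` are hypothetical names introduced for illustration, and `sys.executable` stands in for the interpreter path used in this README.

```python
import subprocess
import sys
from typing import List

# Script path exactly as it appears in the commands above.
SCRIPT = "src/data/external_adapters.py"

# Hypothetical mapping of adapter name -> local audio directory.
CORPORA = {
    "fma": "data/raw/fma_small_audio",
    "mtg_jamendo": "data/raw/mtg_jamendo_audio",
}


def build_prepare_command(
    dataset: str,
    audio_dir: str,
    output_root: str = "data/external_ingested",
    eval_ratio: float = 0.2,
    query_duration: float = 8.0,
    python: str = sys.executable,
) -> List[str]:
    """Return the argv list for one prepare-local run (nothing is executed)."""
    return [
        python, SCRIPT, "prepare-local", dataset, audio_dir,
        "--output-root", output_root,
        "--eval-ratio", str(eval_ratio),
        "--query-duration", str(query_duration),
    ]


def run_all(dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False."""
    for dataset, audio_dir in CORPORA.items():
        cmd = build_prepare_command(dataset, audio_dir)
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # raises if the adapter exits non-zero


if __name__ == "__main__":
    run_all(dry_run=True)
```

Passing the arguments as a list rather than a single shell string avoids quoting problems when a directory name contains spaces.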
3. Use outputs
Generated files:
- catalog.json: reference tracks for indexing
- train.json: train queries + references
- test.json: held-out eval queries + references
- val.json: optional validation split
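As a quick sanity check after generation, the files listed above can be loaded like this. It is a sketch under stated assumptions: the manifest directory is passed in because the exact layout under `--output-root` is not described here, and no JSON schema is assumed, so the files are only parsed. `load_manifests` is a hypothetical helper, not part of the adapter.

```python
import json
from pathlib import Path
from typing import Any, Dict

REQUIRED = ("catalog.json", "train.json", "test.json")
OPTIONAL = ("val.json",)  # described above as an optional validation split


def load_manifests(manifest_dir: str) -> Dict[str, Any]:
    """Parse every manifest found in manifest_dir, keyed by file name.

    Raises FileNotFoundError if any required manifest is missing;
    val.json is loaded only when present.
    """
    root = Path(manifest_dir)
    missing = [name for name in REQUIRED if not (root / name).is_file()]
    if missing:
        raise FileNotFoundError(f"missing manifests in {root}: {missing}")

    loaded: Dict[str, Any] = {}
    for name in REQUIRED + OPTIONAL:
        path = root / name
        if path.is_file():
            with path.open(encoding="utf-8") as fh:
                loaded[name] = json.load(fh)
    return loaded
```

Calling `load_manifests` on the directory that holds the generated files fails early if the adapter produced an incomplete set, before any training or indexing starts.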
Notes
- For small datasets, the split is adjusted automatically so that both the train and test query sets are non-empty.
- For personal use, FMA and MTG-Jamendo are the recommended first real-data baselines.
- Keep test.json fixed across experiments to compare models fairly.
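Two of the notes above can be illustrated with a short sketch. `split_counts` shows one plausible way to keep both query sets non-empty on a small dataset; it is an assumption for illustration, not the adapter's actual logic. `file_sha256` gives a fingerprint you can record once and compare later to confirm that test.json has not changed between experiments. Both function names are hypothetical.

```python
import hashlib
from typing import Tuple


def split_counts(n_tracks: int, eval_ratio: float) -> Tuple[int, int]:
    """Return (n_train, n_test), forcing both to be at least 1.

    Illustrative only: the real adapter may protect small datasets differently.
    """
    if n_tracks < 2:
        raise ValueError("need at least two tracks to form both splits")
    n_test = round(n_tracks * eval_ratio)
    # Clamp so neither side of the split ends up empty.
    n_test = max(1, min(n_test, n_tracks - 1))
    return n_tracks - n_test, n_test


def file_sha256(path: str) -> str:
    """Return the hex SHA-256 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

A simple routine is to store the digest of test.json next to your experiment logs and compare it before each evaluation run; a mismatch means the held-out set was regenerated and earlier results are no longer directly comparable.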