Open Dataset Integration Plan

Recommended order

  1. FMA small
  2. MTG-Jamendo
  3. QBSH / humming corpora
    • Why: add these after the retrieval baseline is stable

Repo strategy

  • Keep external dataset ingestion optional
  • Convert external tracks into:
    • catalog.json for searchable references
    • query segment manifests for evaluation
  • Start with small local subsets before full-corpus scaling
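
The conversion step above could be sketched as follows. This is a minimal illustration, not the repo's actual ingestion code: the function names, the entry fields (id, source, path, track_id, start_s, duration_s), the extension list, and the fixed 10-second query window are all assumptions made for the example. The limit parameter is one way to start with a small local subset.

```python
import json
from pathlib import Path

# Assumption: these extensions cover the audio files of interest.
AUDIO_EXTS = {".mp3", ".wav", ".flac", ".ogg"}


def build_catalog(dataset_dir, source, limit=None):
    """Walk dataset_dir and return one catalog entry per audio file.

    `limit` keeps only the first N files (sorted by path) so a small
    local subset can be tried before scaling to the full corpus.
    """
    root = Path(dataset_dir)
    files = sorted(
        p for p in root.rglob("*")
        if p.is_file() and p.suffix.lower() in AUDIO_EXTS
    )
    if limit is not None:
        files = files[:limit]

    catalog = []
    for p in files:
        rel = p.relative_to(root)
        catalog.append({
            # Hypothetical id scheme: "<source>:<relative path without extension>"
            "id": f"{source}:{rel.with_suffix('').as_posix()}",
            "source": source,
            "path": rel.as_posix(),
        })
    return catalog


def build_query_manifest(catalog, start_s=0.0, duration_s=10.0):
    """Return one query segment per catalog entry.

    Assumption: a fixed start offset and duration for every track; real
    evaluation would likely vary these and check them against track length.
    """
    return [
        {"track_id": entry["id"], "start_s": start_s, "duration_s": duration_s}
        for entry in catalog
    ]


def write_json(obj, path):
    """Write obj as indented JSON, e.g. to catalog.json."""
    Path(path).write_text(json.dumps(obj, indent=2) + "\n", encoding="utf-8")
```

A typical call might be build_catalog("data/fma_small", source="fma_small", limit=100), followed by write_json(catalog, "catalog.json") and write_json(build_query_manifest(catalog), "query_manifest.json"). The directory and manifest file name here are placeholders. Because nothing runs unless these functions are called, the ingestion stays optional.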