Extend dataset bootstrap coverage and improve humming hard-case weighting · 48c97a90 ...

Broaden external dataset bootstrap support and replace naive hard-case oversampling with a more targeted weighting signal that measurably helps humming-like queries while preserving the release/eval workflow.

Constraint: Hard-case optimization must be evidence-driven and preserve a record of mixed outcomes across iterations
Rejected: Reuse naive oversampling after regression | it already showed worse overall behavior with no hard-case gain
Confidence: medium
Scope-risk: moderate
Directive: Next iteration should target confused-case negatives explicitly; do not assume humming gains transfer to confusion robustness
Tested: bootstrap generation for MTG-Jamendo and ModelScope placeholders; 2-epoch CPU training for models_v5; index_v5 build; fast eval JSON generation for smoke-v5
Not-tested: real audio ingestion for the new datasets; full melody-aware slow evaluation on models_v5

authored 2026-06-02 12:15:19 +0800

README.bootstrap.md 142 Bytes

Raw Blame History Permalink

modelscope_music bootstrap

Fill raw audio files under raw/
Review license before training
Convert to final catalog/query manifests