Extend dataset bootstrap coverage and improve humming hard-case weighting
Broaden external dataset bootstrap support and replace naive hard-case oversampling with a more targeted weighting signal that measurably helps humming-like queries while preserving the release/eval workflow.

Constraint: Hard-case optimization must be evidence-driven and preserve a record of mixed outcomes across iterations
Rejected: Reuse naive oversampling after regression | it already showed worse overall behavior with no hard-case gain
Confidence: medium
Scope-risk: moderate
Directive: Next iteration should target confused-case negatives explicitly; do not assume humming gains transfer to confusion robustness
Tested: bootstrap generation for MTG-Jamendo and ModelScope placeholders; 2-epoch CPU training for models_v5; index_v5 build; fast eval JSON generation for smoke-v5
Not-tested: real audio ingestion for the new datasets; full melody-aware slow evaluation on models_v5
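
For readers unfamiliar with the distinction the message draws, the following is a minimal sketch of per-sample loss weighting as an alternative to oversampling. It is not the code from this commit: the function names `hard_case_weights` and `weighted_mean_loss`, the `alpha` parameter, and the assumption that each sample carries a hardness score in [0, 1] are all hypothetical and chosen only for illustration.

```python
from typing import List, Sequence


def hard_case_weights(hardness: Sequence[float], alpha: float = 1.0) -> List[float]:
    """Map per-sample hardness scores to loss weights with mean 1.

    Hypothetical helper: `hardness` is assumed to lie in [0, 1] (values
    outside that range are clamped), and `alpha` controls how strongly
    hard samples are emphasized.
    """
    if not hardness:
        return []
    # Clamp each score, then boost the base weight of 1.0 by alpha * hardness.
    raw = [1.0 + alpha * min(max(h, 0.0), 1.0) for h in hardness]
    # Normalize so the weights average to 1, keeping the overall loss scale
    # comparable to the unweighted case.
    mean = sum(raw) / len(raw)
    return [w / mean for w in raw]


def weighted_mean_loss(losses: Sequence[float], weights: Sequence[float]) -> float:
    """Average per-sample losses after scaling each by its weight."""
    if len(losses) != len(weights):
        raise ValueError("losses and weights must have the same length")
    if not losses:
        return 0.0
    return sum(l * w for l, w in zip(losses, weights)) / len(losses)
```

Unlike oversampling, this leaves the batch composition unchanged and only rescales each sample's contribution to the loss, which is one possible way to emphasize humming-like queries without duplicating them.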
Showing 17 changed files with 184 additions and 2 deletions
acr-engine/__pycache__/train.cpython-310.pyc
0 → 100644
No preview for this file type
acr-engine/data/index_v5/chromaprint.pkl
0 → 100644
No preview for this file type
acr-engine/data/index_v5/reference_embs.npy
0 → 100644
No preview for this file type
acr-engine/data/index_v5/reference_ids.npy
0 → 100644
No preview for this file type
acr-engine/data/models_v5/best_model.pt
0 → 100644
This file is too large to display.
acr-engine/data/models_v5/song_to_idx.json
0 → 100644
No preview for this file type