Make open-dataset manifests trainable end to end

Constraint: Open dataset onboarding was incomplete until generated manifests could enter train.py without manual path fixes
Rejected: Keep manifests as ingestion-only artifacts | Fails the actual training handoff and leaves the workflow broken
Confidence: high
Scope-risk: moderate
Directive: Preserve the self-contained output layout (audio plus manifests) for all future external dataset imports
Tested: /usr/local/miniconda3/bin/python src/data/external_adapters.py prepare-local fma data/synthetic_v2/songs --output-root data/external_ingested/synthetic_as_open_fixed --eval-ratio 0.2 --query-duration 5.0;
  /usr/local/miniconda3/bin/python src/data/external_adapters.py validate-local fma data/external_ingested/synthetic_as_open_fixed/fma/manifests;
  /usr/local/miniconda3/bin/python train.py --data data/external_ingested/synthetic_as_open_fixed/fma/manifests --output data/models_open_smoke_fixed --device cpu --epochs 1 --batch-size 2 --dry-run;
  /usr/local/miniconda3/bin/python -m py_compile src/data/dataset.py train.py src/data/manifest_tools.py src/data/external_adapters.py
Not-tested: Full multi-epoch training and index/eval loop on a real downloaded FMA or MTG-Jamendo corpus

Commit b766c74e ... b766c74e9ff1c9be3223d226d9ef4e0da9a7cb03 authored 2026-06-02 12:53:53 +0800 by cnb.bofCdSsphPA
Showing 32 changed files with 772 additions and 9 deletions
acr-engine/data/external_ingested/synthetic_as_open_fixed/fma/audio/fma_00000.wav
acr-engine/data/external_ingested/synthetic_as_open_fixed/fma/audio/fma_00001.wav
acr-engine/data/external_ingested/synthetic_as_open_fixed/fma/audio/fma_00002.wav
acr-engine/data/external_ingested/synthetic_as_open_fixed/fma/audio/fma_00003.wav
acr-engine/data/external_ingested/synthetic_as_open_fixed/fma/audio/fma_00004.wav
acr-engine/data/external_ingested/synthetic_as_open_fixed/fma/audio/fma_00005.wav
acr-engine/data/external_ingested/synthetic_as_open_fixed/fma/audio/fma_00006.wav
acr-engine/data/external_ingested/synthetic_as_open_fixed/fma/audio/fma_00007.wav
acr-engine/data/external_ingested/synthetic_as_open_fixed/fma/audio/fma_00008.wav
acr-engine/data/external_ingested/synthetic_as_open_fixed/fma/audio/fma_00009.wav
acr-engine/data/external_ingested/synthetic_as_open_fixed/fma/audio/fma_00010.wav
acr-engine/data/external_ingested/synthetic_as_open_fixed/fma/audio/fma_00011.wav
acr-engine/data/external_ingested/synthetic_as_open_fixed/fma/audio/fma_00012.wav
acr-engine/data/external_ingested/synthetic_as_open_fixed/fma/audio/fma_00013.wav
acr-engine/data/external_ingested/synthetic_as_open_fixed/fma/audio/fma_00014.wav
acr-engine/data/external_ingested/synthetic_as_open_fixed/fma/audio/fma_00015.wav
acr-engine/data/external_ingested/synthetic_as_open_fixed/fma/audio/fma_00016.wav
acr-engine/data/external_ingested/synthetic_as_open_fixed/fma/audio/fma_00017.wav
acr-engine/data/external_ingested/synthetic_as_open_fixed/fma/audio/fma_00018.wav
acr-engine/data/external_ingested/synthetic_as_open_fixed/fma/audio/fma_00019.wav
--- a/acr-engine/data/external_ingested/synthetic_as_open_fixed/fma/audio/fma_00000.wav 0 → 100644
+++ b/acr-engine/data/external_ingested/synthetic_as_open_fixed/fma/audio/fma_00000.wav 0 → 100644
--- a/acr-engine/data/external_ingested/synthetic_as_open_fixed/fma/audio/fma_00001.wav 0 → 100644
+++ b/acr-engine/data/external_ingested/synthetic_as_open_fixed/fma/audio/fma_00001.wav 0 → 100644
--- a/acr-engine/data/external_ingested/synthetic_as_open_fixed/fma/audio/fma_00002.wav 0 → 100644
+++ b/acr-engine/data/external_ingested/synthetic_as_open_fixed/fma/audio/fma_00002.wav 0 → 100644
--- a/acr-engine/data/external_ingested/synthetic_as_open_fixed/fma/audio/fma_00003.wav 0 → 100644
+++ b/acr-engine/data/external_ingested/synthetic_as_open_fixed/fma/audio/fma_00003.wav 0 → 100644
--- a/acr-engine/data/external_ingested/synthetic_as_open_fixed/fma/audio/fma_00004.wav 0 → 100644
+++ b/acr-engine/data/external_ingested/synthetic_as_open_fixed/fma/audio/fma_00004.wav 0 → 100644
--- a/acr-engine/data/external_ingested/synthetic_as_open_fixed/fma/audio/fma_00005.wav 0 → 100644
+++ b/acr-engine/data/external_ingested/synthetic_as_open_fixed/fma/audio/fma_00005.wav 0 → 100644
--- a/acr-engine/data/external_ingested/synthetic_as_open_fixed/fma/audio/fma_00006.wav 0 → 100644
+++ b/acr-engine/data/external_ingested/synthetic_as_open_fixed/fma/audio/fma_00006.wav 0 → 100644
--- a/acr-engine/data/external_ingested/synthetic_as_open_fixed/fma/audio/fma_00007.wav 0 → 100644
+++ b/acr-engine/data/external_ingested/synthetic_as_open_fixed/fma/audio/fma_00007.wav 0 → 100644
--- a/acr-engine/data/external_ingested/synthetic_as_open_fixed/fma/audio/fma_00008.wav 0 → 100644
+++ b/acr-engine/data/external_ingested/synthetic_as_open_fixed/fma/audio/fma_00008.wav 0 → 100644
--- a/acr-engine/data/external_ingested/synthetic_as_open_fixed/fma/audio/fma_00009.wav 0 → 100644
+++ b/acr-engine/data/external_ingested/synthetic_as_open_fixed/fma/audio/fma_00009.wav 0 → 100644
--- a/acr-engine/data/external_ingested/synthetic_as_open_fixed/fma/audio/fma_00010.wav 0 → 100644
+++ b/acr-engine/data/external_ingested/synthetic_as_open_fixed/fma/audio/fma_00010.wav 0 → 100644
--- a/acr-engine/data/external_ingested/synthetic_as_open_fixed/fma/audio/fma_00011.wav 0 → 100644
+++ b/acr-engine/data/external_ingested/synthetic_as_open_fixed/fma/audio/fma_00011.wav 0 → 100644
--- a/acr-engine/data/external_ingested/synthetic_as_open_fixed/fma/audio/fma_00012.wav 0 → 100644
+++ b/acr-engine/data/external_ingested/synthetic_as_open_fixed/fma/audio/fma_00012.wav 0 → 100644
--- a/acr-engine/data/external_ingested/synthetic_as_open_fixed/fma/audio/fma_00013.wav 0 → 100644
+++ b/acr-engine/data/external_ingested/synthetic_as_open_fixed/fma/audio/fma_00013.wav 0 → 100644
--- a/acr-engine/data/external_ingested/synthetic_as_open_fixed/fma/audio/fma_00014.wav 0 → 100644
+++ b/acr-engine/data/external_ingested/synthetic_as_open_fixed/fma/audio/fma_00014.wav 0 → 100644
--- a/acr-engine/data/external_ingested/synthetic_as_open_fixed/fma/audio/fma_00015.wav 0 → 100644
+++ b/acr-engine/data/external_ingested/synthetic_as_open_fixed/fma/audio/fma_00015.wav 0 → 100644
--- a/acr-engine/data/external_ingested/synthetic_as_open_fixed/fma/audio/fma_00016.wav 0 → 100644
+++ b/acr-engine/data/external_ingested/synthetic_as_open_fixed/fma/audio/fma_00016.wav 0 → 100644
--- a/acr-engine/data/external_ingested/synthetic_as_open_fixed/fma/audio/fma_00017.wav 0 → 100644
+++ b/acr-engine/data/external_ingested/synthetic_as_open_fixed/fma/audio/fma_00017.wav 0 → 100644
--- a/acr-engine/data/external_ingested/synthetic_as_open_fixed/fma/audio/fma_00018.wav 0 → 100644
+++ b/acr-engine/data/external_ingested/synthetic_as_open_fixed/fma/audio/fma_00018.wav 0 → 100644
--- a/acr-engine/data/external_ingested/synthetic_as_open_fixed/fma/audio/fma_00019.wav 0 → 100644
+++ b/acr-engine/data/external_ingested/synthetic_as_open_fixed/fma/audio/fma_00019.wav 0 → 100644
--- a/acr-engine/data/external_ingested/synthetic_as_open_fixed/fma/audio/fma_00020.wav 0 → 100644
+++ b/acr-engine/data/external_ingested/synthetic_as_open_fixed/fma/audio/fma_00020.wav 0 → 100644
--- a/acr-engine/data/external_ingested/synthetic_as_open_fixed/fma/audio/fma_00021.wav 0 → 100644
+++ b/acr-engine/data/external_ingested/synthetic_as_open_fixed/fma/audio/fma_00021.wav 0 → 100644
--- a/acr-engine/data/external_ingested/synthetic_as_open_fixed/fma/audio/fma_00022.wav 0 → 100644
+++ b/acr-engine/data/external_ingested/synthetic_as_open_fixed/fma/audio/fma_00022.wav 0 → 100644
--- a/acr-engine/data/external_ingested/synthetic_as_open_fixed/fma/audio/fma_00023.wav 0 → 100644
+++ b/acr-engine/data/external_ingested/synthetic_as_open_fixed/fma/audio/fma_00023.wav 0 → 100644
--- a/acr-engine/data/external_ingested/synthetic_as_open_fixed/fma/manifests/catalog.json 0 → 100644
+++ b/acr-engine/data/external_ingested/synthetic_as_open_fixed/fma/manifests/catalog.json 0 → 100644
+[
+  {
+    "song_id": "fma_00000",
+    "audio_path": "audio/fma_00000.wav",
+    "duration": 15.0,
+    "type": "reference",
+    "source_dataset": "fma"
+  },
+  {
+    "song_id": "fma_00001",
+    "audio_path": "audio/fma_00001.wav",
+    "duration": 15.0,
+    "type": "reference",
+    "source_dataset": "fma"
+  },
+  {
+    "song_id": "fma_00002",
+    "audio_path": "audio/fma_00002.wav",
+    "duration": 15.0,
+    "type": "reference",
+    "source_dataset": "fma"
+  },
+  {
+    "song_id": "fma_00003",
+    "audio_path": "audio/fma_00003.wav",
+    "duration": 15.0,
+    "type": "reference",
+    "source_dataset": "fma"
+  },
+  {
+    "song_id": "fma_00004",
+    "audio_path": "audio/fma_00004.wav",
+    "duration": 15.0,
+    "type": "reference",
+    "source_dataset": "fma"
+  },
+  {
+    "song_id": "fma_00005",
+    "audio_path": "audio/fma_00005.wav",
+    "duration": 15.0,
+    "type": "reference",
+    "source_dataset": "fma"
+  },
+  {
+    "song_id": "fma_00006",
+    "audio_path": "audio/fma_00006.wav",
+    "duration": 15.0,
+    "type": "reference",
+    "source_dataset": "fma"
+  },
+  {
+    "song_id": "fma_00007",
+    "audio_path": "audio/fma_00007.wav",
+    "duration": 15.0,
+    "type": "reference",
+    "source_dataset": "fma"
+  },
+  {
+    "song_id": "fma_00008",
+    "audio_path": "audio/fma_00008.wav",
+    "duration": 15.0,
+    "type": "reference",
+    "source_dataset": "fma"
+  },
+  {
+    "song_id": "fma_00009",
+    "audio_path": "audio/fma_00009.wav",
+    "duration": 15.0,
+    "type": "reference",
+    "source_dataset": "fma"
+  },
+  {
+    "song_id": "fma_00010",
+    "audio_path": "audio/fma_00010.wav",
+    "duration": 15.0,
+    "type": "reference",
+    "source_dataset": "fma"
+  },
+  {
+    "song_id": "fma_00011",
+    "audio_path": "audio/fma_00011.wav",
+    "duration": 15.0,
+    "type": "reference",
+    "source_dataset": "fma"
+  },
+  {
+    "song_id": "fma_00012",
+    "audio_path": "audio/fma_00012.wav",
+    "duration": 15.0,
+    "type": "reference",
+    "source_dataset": "fma"
+  },
+  {
+    "song_id": "fma_00013",
+    "audio_path": "audio/fma_00013.wav",
+    "duration": 15.0,
+    "type": "reference",
+    "source_dataset": "fma"
+  },
+  {
+    "song_id": "fma_00014",
+    "audio_path": "audio/fma_00014.wav",
+    "duration": 15.0,
+    "type": "reference",
+    "source_dataset": "fma"
+  },
+  {
+    "song_id": "fma_00015",
+    "audio_path": "audio/fma_00015.wav",
+    "duration": 15.0,
+    "type": "reference",
+    "source_dataset": "fma"
+  },
+  {
+    "song_id": "fma_00016",
+    "audio_path": "audio/fma_00016.wav",
+    "duration": 15.0,
+    "type": "reference",
+    "source_dataset": "fma"
+  },
+  {
+    "song_id": "fma_00017",
+    "audio_path": "audio/fma_00017.wav",
+    "duration": 15.0,
+    "type": "reference",
+    "source_dataset": "fma"
+  },
+  {
+    "song_id": "fma_00018",
+    "audio_path": "audio/fma_00018.wav",
+    "duration": 15.0,
+    "type": "reference",
+    "source_dataset": "fma"
+  },
+  {
+    "song_id": "fma_00019",
+    "audio_path": "audio/fma_00019.wav",
+    "duration": 15.0,
+    "type": "reference",
+    "source_dataset": "fma"
+  },
+  {
+    "song_id": "fma_00020",
+    "audio_path": "audio/fma_00020.wav",
+    "duration": 15.0,
+    "type": "reference",
+    "source_dataset": "fma"
+  },
+  {
+    "song_id": "fma_00021",
+    "audio_path": "audio/fma_00021.wav",
+    "duration": 15.0,
+    "type": "reference",
+    "source_dataset": "fma"
+  },
+  {
+    "song_id": "fma_00022",
+    "audio_path": "audio/fma_00022.wav",
+    "duration": 15.0,
+    "type": "reference",
+    "source_dataset": "fma"
+  },
+  {
+    "song_id": "fma_00023",
+    "audio_path": "audio/fma_00023.wav",
+    "duration": 15.0,
+    "type": "reference",
+    "source_dataset": "fma"
+  }
+]
\ No newline at end of file
--- a/acr-engine/data/external_ingested/synthetic_as_open_fixed/fma/manifests/test.json 0 → 100644
+++ b/acr-engine/data/external_ingested/synthetic_as_open_fixed/fma/manifests/test.json 0 → 100644
+[
+  {
+    "song_id": "fma_00000",
+    "audio_path": "audio/fma_00000.wav",
+    "duration": 5.0,
+    "type": "clean",
+    "offset": 6.394,
+    "segment_type": "external_query",
+    "source_dataset": "fma"
+  },
+  {
+    "song_id": "fma_00003",
+    "audio_path": "audio/fma_00003.wav",
+    "duration": 5.0,
+    "type": "clean",
+    "offset": 8.922,
+    "segment_type": "external_query",
+    "source_dataset": "fma"
+  },
+  {
+    "song_id": "fma_00004",
+    "audio_path": "audio/fma_00004.wav",
+    "duration": 5.0,
+    "type": "clean",
+    "offset": 4.219,
+    "segment_type": "external_query",
+    "source_dataset": "fma"
+  },
+  {
+    "song_id": "fma_00006",
+    "audio_path": "audio/fma_00006.wav",
+    "duration": 5.0,
+    "type": "clean",
+    "offset": 0.265,
+    "segment_type": "external_query",
+    "source_dataset": "fma"
+  },
+  {
+    "song_id": "fma_00009",
+    "audio_path": "audio/fma_00009.wav",
+    "duration": 5.0,
+    "type": "clean",
+    "offset": 8.094,
+    "segment_type": "external_query",
+    "source_dataset": "fma"
+  },
+  {
+    "song_id": "fma_00011",
+    "audio_path": "audio/fma_00011.wav",
+    "duration": 5.0,
+    "type": "clean",
+    "offset": 3.403,
+    "segment_type": "external_query",
+    "source_dataset": "fma"
+  },
+  {
+    "song_id": "fma_00013",
+    "audio_path": "audio/fma_00013.wav",
+    "duration": 5.0,
+    "type": "clean",
+    "offset": 0.927,
+    "segment_type": "external_query",
+    "source_dataset": "fma"
+  },
+  {
+    "song_id": "fma_00020",
+    "audio_path": "audio/fma_00020.wav",
+    "duration": 5.0,
+    "type": "clean",
+    "offset": 7.046,
+    "segment_type": "external_query",
+    "source_dataset": "fma"
+  },
+  {
+    "song_id": "fma_00000",
+    "audio_path": "audio/fma_00000.wav",
+    "duration": 15.0,
+    "type": "reference",
+    "source_dataset": "fma"
+  },
+  {
+    "song_id": "fma_00001",
+    "audio_path": "audio/fma_00001.wav",
+    "duration": 15.0,
+    "type": "reference",
+    "source_dataset": "fma"
+  },
+  {
+    "song_id": "fma_00002",
+    "audio_path": "audio/fma_00002.wav",
+    "duration": 15.0,
+    "type": "reference",
+    "source_dataset": "fma"
+  },
+  {
+    "song_id": "fma_00003",
+    "audio_path": "audio/fma_00003.wav",
+    "duration": 15.0,
+    "type": "reference",
+    "source_dataset": "fma"
+  },
+  {
+    "song_id": "fma_00004",
+    "audio_path": "audio/fma_00004.wav",
+    "duration": 15.0,
+    "type": "reference",
+    "source_dataset": "fma"
+  },
+  {
+    "song_id": "fma_00005",
+    "audio_path": "audio/fma_00005.wav",
+    "duration": 15.0,
+    "type": "reference",
+    "source_dataset": "fma"
+  },
+  {
+    "song_id": "fma_00006",
+    "audio_path": "audio/fma_00006.wav",
+    "duration": 15.0,
+    "type": "reference",
+    "source_dataset": "fma"
+  },
+  {
+    "song_id": "fma_00007",
+    "audio_path": "audio/fma_00007.wav",
+    "duration": 15.0,
+    "type": "reference",
+    "source_dataset": "fma"
+  },
+  {
+    "song_id": "fma_00008",
+    "audio_path": "audio/fma_00008.wav",
+    "duration": 15.0,
+    "type": "reference",
+    "source_dataset": "fma"
+  },
+  {
+    "song_id": "fma_00009",
+    "audio_path": "audio/fma_00009.wav",
+    "duration": 15.0,
+    "type": "reference",
+    "source_dataset": "fma"
+  },
+  {
+    "song_id": "fma_00010",
+    "audio_path": "audio/fma_00010.wav",
+    "duration": 15.0,
+    "type": "reference",
+    "source_dataset": "fma"
+  },
+  {
+    "song_id": "fma_00011",
+    "audio_path": "audio/fma_00011.wav",
+    "duration": 15.0,
+    "type": "reference",
+    "source_dataset": "fma"
+  },
+  {
+    "song_id": "fma_00012",
+    "audio_path": "audio/fma_00012.wav",
+    "duration": 15.0,
+    "type": "reference",
+    "source_dataset": "fma"
+  },
+  {
+    "song_id": "fma_00013",
+    "audio_path": "audio/fma_00013.wav",
+    "duration": 15.0,
+    "type": "reference",
+    "source_dataset": "fma"
+  },
+  {
+    "song_id": "fma_00014",
+    "audio_path": "audio/fma_00014.wav",
+    "duration": 15.0,
+    "type": "reference",
+    "source_dataset": "fma"
+  },
+  {
+    "song_id": "fma_00015",
+    "audio_path": "audio/fma_00015.wav",
+    "duration": 15.0,
+    "type": "reference",
+    "source_dataset": "fma"
+  },
+  {
+    "song_id": "fma_00016",
+    "audio_path": "audio/fma_00016.wav",
+    "duration": 15.0,
+    "type": "reference",
+    "source_dataset": "fma"
+  },
+  {
+    "song_id": "fma_00017",
+    "audio_path": "audio/fma_00017.wav",
+    "duration": 15.0,
+    "type": "reference",
+    "source_dataset": "fma"
+  },
+  {
+    "song_id": "fma_00018",
+    "audio_path": "audio/fma_00018.wav",
+    "duration": 15.0,
+    "type": "reference",
+    "source_dataset": "fma"
+  },
+  {
+    "song_id": "fma_00019",
+    "audio_path": "audio/fma_00019.wav",
+    "duration": 15.0,
+    "type": "reference",
+    "source_dataset": "fma"
+  },
+  {
+    "song_id": "fma_00020",
+    "audio_path": "audio/fma_00020.wav",
+    "duration": 15.0,
+    "type": "reference",
+    "source_dataset": "fma"
+  },
+  {
+    "song_id": "fma_00021",
+    "audio_path": "audio/fma_00021.wav",
+    "duration": 15.0,
+    "type": "reference",
+    "source_dataset": "fma"
+  },
+  {
+    "song_id": "fma_00022",
+    "audio_path": "audio/fma_00022.wav",
+    "duration": 15.0,
+    "type": "reference",
+    "source_dataset": "fma"
+  },
+  {
+    "song_id": "fma_00023",
+    "audio_path": "audio/fma_00023.wav",
+    "duration": 15.0,
+    "type": "reference",
+    "source_dataset": "fma"
+  }
+]
\ No newline at end of file
--- a/acr-engine/data/external_ingested/synthetic_as_open_fixed/fma/manifests/train.json 0 → 100644
+++ b/acr-engine/data/external_ingested/synthetic_as_open_fixed/fma/manifests/train.json 0 → 100644
+[
+  {
+    "song_id": "fma_00001",
+    "audio_path": "audio/fma_00001.wav",
+    "duration": 5.0,
+    "type": "clean",
+    "offset": 2.75,
+    "segment_type": "external_query",
+    "source_dataset": "fma"
+  },
+  {
+    "song_id": "fma_00002",
+    "audio_path": "audio/fma_00002.wav",
+    "duration": 5.0,
+    "type": "clean",
+    "offset": 7.365,
+    "segment_type": "external_query",
+    "source_dataset": "fma"
+  },
+  {
+    "song_id": "fma_00005",
+    "audio_path": "audio/fma_00005.wav",
+    "duration": 5.0,
+    "type": "clean",
+    "offset": 2.186,
+    "segment_type": "external_query",
+    "source_dataset": "fma"
+  },
+  {
+    "song_id": "fma_00007",
+    "audio_path": "audio/fma_00007.wav",
+    "duration": 5.0,
+    "type": "clean",
+    "offset": 6.499,
+    "segment_type": "external_query",
+    "source_dataset": "fma"
+  },
+  {
+    "song_id": "fma_00008",
+    "audio_path": "audio/fma_00008.wav",
+    "duration": 5.0,
+    "type": "clean",
+    "offset": 2.204,
+    "segment_type": "external_query",
+    "source_dataset": "fma"
+  },
+  {
+    "song_id": "fma_00010",
+    "audio_path": "audio/fma_00010.wav",
+    "duration": 5.0,
+    "type": "clean",
+    "offset": 8.058,
+    "segment_type": "external_query",
+    "source_dataset": "fma"
+  },
+  {
+    "song_id": "fma_00012",
+    "audio_path": "audio/fma_00012.wav",
+    "duration": 5.0,
+    "type": "clean",
+    "offset": 9.572,
+    "segment_type": "external_query",
+    "source_dataset": "fma"
+  },
+  {
+    "song_id": "fma_00014",
+    "audio_path": "audio/fma_00014.wav",
+    "duration": 5.0,
+    "type": "clean",
+    "offset": 8.475,
+    "segment_type": "external_query",
+    "source_dataset": "fma"
+  },
+  {
+    "song_id": "fma_00015",
+    "audio_path": "audio/fma_00015.wav",
+    "duration": 5.0,
+    "type": "clean",
+    "offset": 8.071,
+    "segment_type": "external_query",
+    "source_dataset": "fma"
+  },
+  {
+    "song_id": "fma_00016",
+    "audio_path": "audio/fma_00016.wav",
+    "duration": 5.0,
+    "type": "clean",
+    "offset": 5.362,
+    "segment_type": "external_query",
+    "source_dataset": "fma"
+  },
+  {
+    "song_id": "fma_00017",
+    "audio_path": "audio/fma_00017.wav",
+    "duration": 5.0,
+    "type": "clean",
+    "offset": 3.785,
+    "segment_type": "external_query",
+    "source_dataset": "fma"
+  },
+  {
+    "song_id": "fma_00018",
+    "audio_path": "audio/fma_00018.wav",
+    "duration": 5.0,
+    "type": "clean",
+    "offset": 8.294,
+    "segment_type": "external_query",
+    "source_dataset": "fma"
+  },
+  {
+    "song_id": "fma_00019",
+    "audio_path": "audio/fma_00019.wav",
+    "duration": 5.0,
+    "type": "clean",
+    "offset": 8.617,
+    "segment_type": "external_query",
+    "source_dataset": "fma"
+  },
+  {
+    "song_id": "fma_00021",
+    "audio_path": "audio/fma_00021.wav",
+    "duration": 5.0,
+    "type": "clean",
+    "offset": 2.279,
+    "segment_type": "external_query",
+    "source_dataset": "fma"
+  },
+  {
+    "song_id": "fma_00022",
+    "audio_path": "audio/fma_00022.wav",
+    "duration": 5.0,
+    "type": "clean",
+    "offset": 0.798,
+    "segment_type": "external_query",
+    "source_dataset": "fma"
+  },
+  {
+    "song_id": "fma_00023",
+    "audio_path": "audio/fma_00023.wav",
+    "duration": 5.0,
+    "type": "clean",
+    "offset": 1.01,
+    "segment_type": "external_query",
+    "source_dataset": "fma"
+  },
+  {
+    "song_id": "fma_00000",
+    "audio_path": "audio/fma_00000.wav",
+    "duration": 15.0,
+    "type": "reference",
+    "source_dataset": "fma"
+  },
+  {
+    "song_id": "fma_00001",
+    "audio_path": "audio/fma_00001.wav",
+    "duration": 15.0,
+    "type": "reference",
+    "source_dataset": "fma"
+  },
+  {
+    "song_id": "fma_00002",
+    "audio_path": "audio/fma_00002.wav",
+    "duration": 15.0,
+    "type": "reference",
+    "source_dataset": "fma"
+  },
+  {
+    "song_id": "fma_00003",
+    "audio_path": "audio/fma_00003.wav",
+    "duration": 15.0,
+    "type": "reference",
+    "source_dataset": "fma"
+  },
+  {
+    "song_id": "fma_00004",
+    "audio_path": "audio/fma_00004.wav",
+    "duration": 15.0,
+    "type": "reference",
+    "source_dataset": "fma"
+  },
+  {
+    "song_id": "fma_00005",
+    "audio_path": "audio/fma_00005.wav",
+    "duration": 15.0,
+    "type": "reference",
+    "source_dataset": "fma"
+  },
+  {
+    "song_id": "fma_00006",
+    "audio_path": "audio/fma_00006.wav",
+    "duration": 15.0,
+    "type": "reference",
+    "source_dataset": "fma"
+  },
+  {
+    "song_id": "fma_00007",
+    "audio_path": "audio/fma_00007.wav",
+    "duration": 15.0,
+    "type": "reference",
+    "source_dataset": "fma"
+  },
+  {
+    "song_id": "fma_00008",
+    "audio_path": "audio/fma_00008.wav",
+    "duration": 15.0,
+    "type": "reference",
+    "source_dataset": "fma"
+  },
+  {
+    "song_id": "fma_00009",
+    "audio_path": "audio/fma_00009.wav",
+    "duration": 15.0,
+    "type": "reference",
+    "source_dataset": "fma"
+  },
+  {
+    "song_id": "fma_00010",
+    "audio_path": "audio/fma_00010.wav",
+    "duration": 15.0,
+    "type": "reference",
+    "source_dataset": "fma"
+  },
+  {
+    "song_id": "fma_00011",
+    "audio_path": "audio/fma_00011.wav",
+    "duration": 15.0,
+    "type": "reference",
+    "source_dataset": "fma"
+  },
+  {
+    "song_id": "fma_00012",
+    "audio_path": "audio/fma_00012.wav",
+    "duration": 15.0,
+    "type": "reference",
+    "source_dataset": "fma"
+  },
+  {
+    "song_id": "fma_00013",
+    "audio_path": "audio/fma_00013.wav",
+    "duration": 15.0,
+    "type": "reference",
+    "source_dataset": "fma"
+  },
+  {
+    "song_id": "fma_00014",
+    "audio_path": "audio/fma_00014.wav",
+    "duration": 15.0,
+    "type": "reference",
+    "source_dataset": "fma"
+  },
+  {
+    "song_id": "fma_00015",
+    "audio_path": "audio/fma_00015.wav",
+    "duration": 15.0,
+    "type": "reference",
+    "source_dataset": "fma"
+  },
+  {
+    "song_id": "fma_00016",
+    "audio_path": "audio/fma_00016.wav",
+    "duration": 15.0,
+    "type": "reference",
+    "source_dataset": "fma"
+  },
+  {
+    "song_id": "fma_00017",
+    "audio_path": "audio/fma_00017.wav",
+    "duration": 15.0,
+    "type": "reference",
+    "source_dataset": "fma"
+  },
+  {
+    "song_id": "fma_00018",
+    "audio_path": "audio/fma_00018.wav",
+    "duration": 15.0,
+    "type": "reference",
+    "source_dataset": "fma"
+  },
+  {
+    "song_id": "fma_00019",
+    "audio_path": "audio/fma_00019.wav",
+    "duration": 15.0,
+    "type": "reference",
+    "source_dataset": "fma"
+  },
+  {
+    "song_id": "fma_00020",
+    "audio_path": "audio/fma_00020.wav",
+    "duration": 15.0,
+    "type": "reference",
+    "source_dataset": "fma"
+  },
+  {
+    "song_id": "fma_00021",
+    "audio_path": "audio/fma_00021.wav",
+    "duration": 15.0,
+    "type": "reference",
+    "source_dataset": "fma"
+  },
+  {
+    "song_id": "fma_00022",
+    "audio_path": "audio/fma_00022.wav",
+    "duration": 15.0,
+    "type": "reference",
+    "source_dataset": "fma"
+  },
+  {
+    "song_id": "fma_00023",
+    "audio_path": "audio/fma_00023.wav",
+    "duration": 15.0,
+    "type": "reference",
+    "source_dataset": "fma"
+  }
+]
\ No newline at end of file
--- a/acr-engine/data/external_ingested/synthetic_as_open_fixed/fma/manifests/val.json 0 → 100644
+++ b/acr-engine/data/external_ingested/synthetic_as_open_fixed/fma/manifests/val.json 0 → 100644
+[]
\ No newline at end of file
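Review note: in the manifests above, each split lists its query clips (`"type": "clean"`) followed by all 24 reference entries, and the 16 train queries and 8 test queries use disjoint `song_id` values. A minimal sketch of how that property could be checked is below. The helper names and the tiny stand-in data are hypothetical; this is not the logic of `validate-local`, which is not shown in this commit.

```python
from typing import Dict, List, Set


def query_ids(entries: List[Dict]) -> Set[str]:
    """song_ids of query entries, i.e. every entry whose type is not 'reference'."""
    return {e["song_id"] for e in entries if e.get("type") != "reference"}


def reference_ids(entries: List[Dict]) -> Set[str]:
    """song_ids of reference entries."""
    return {e["song_id"] for e in entries if e.get("type") == "reference"}


# Tiny stand-ins shaped like the manifests above (not the real files).
refs = [{"song_id": f"fma_{i:05d}", "type": "reference"} for i in range(4)]
train = [
    {"song_id": "fma_00001", "type": "clean"},
    {"song_id": "fma_00002", "type": "clean"},
] + refs
test = [{"song_id": "fma_00000", "type": "clean"}] + refs

# Queries should not leak between splits, and every query needs a reference.
overlap = query_ids(train) & query_ids(test)
missing = (query_ids(train) | query_ids(test)) - reference_ids(train)
print(len(query_ids(train)), len(query_ids(test)), sorted(overlap), sorted(missing))
```

Loading the real `train.json` and `test.json` with `json.load` and passing them to the same helpers would apply the check to the generated files.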
--- a/acr-engine/src/data/dataset.py
+++ b/acr-engine/src/data/dataset.py
@@ -32,6 +32,7 @@ class ACRDataset(Dataset):
        self.augment = augment
        self.n_crops = n_crops_per_song
        self.data_dir = Path(data_dir)
+        self.asset_root = self.data_dir.parent if self.data_dir.name == "manifests" else self.data_dir
        meta_path = self.data_dir / f"{split}.json"
        with open(meta_path) as f:
@@ -41,7 +42,7 @@ class ACRDataset(Dataset):
        for item in self.metadata:
            if references_only and item.get("type") != "reference":
                continue
-            song_path = self.data_dir / item["audio_path"]
+            song_path = self.asset_root / item["audio_path"]
            if song_path.exists():
                self.samples.append(item)
@@ -75,7 +76,7 @@ class ACRDataset(Dataset):
        max_offset = max(0, duration - 5.0)
        offset = random.uniform(0, max_offset) if max_offset > 0 else 0
-        audio_path = self.data_dir / sample["audio_path"]
+        audio_path = self.asset_root / sample["audio_path"]
        y = self._load_segment(str(audio_path), offset, 5.0)
        if self.augment and sample.get("type") != "reference":
@@ -113,6 +114,7 @@ class ACRTestDataset(Dataset):
        self.n_fft = n_fft
        self.hop_length = hop_length
        self.data_dir = Path(data_dir)
+        self.asset_root = self.data_dir.parent if self.data_dir.name == "manifests" else self.data_dir
        meta_path = self.data_dir / f"{split}.json"
        with open(meta_path) as f:
@@ -120,7 +122,7 @@ class ACRTestDataset(Dataset):
        self.samples = []
        for item in self.metadata:
-            p = self.data_dir / item["audio_path"]
+            p = self.asset_root / item["audio_path"]
            if p.exists():
                self.samples.append(item)
@@ -132,7 +134,7 @@ class ACRTestDataset(Dataset):
    def __getitem__(self, idx):
        sample = self.samples[idx]
-        audio_path = self.data_dir / sample["audio_path"]
+        audio_path = self.asset_root / sample["audio_path"]
        y, _ = librosa.load(str(audio_path), sr=self.sr, mono=True, offset=0, duration=min(sample["duration"], 5.0))
        seg_len = 5 * self.sr
        if len(y) < seg_len:
@@ -178,6 +180,7 @@ class SongPairDataset(Dataset):
        self.segment_len = int(segment_dur * sr)
        self.augment = augment
        self.data_dir = Path(data_dir)
+        self.asset_root = self.data_dir.parent if self.data_dir.name == "manifests" else self.data_dir
        with open(self.data_dir / f"{split}.json") as f:
            metadata = json.load(f)
@@ -186,7 +189,7 @@ class SongPairDataset(Dataset):
        for item in metadata:
            if item.get("type") == "reference":
                continue
-            p = self.data_dir / item["audio_path"]
+            p = self.asset_root / item["audio_path"]
            if p.exists():
                self.by_song.setdefault(item["song_id"], []).append(item)
@@ -207,7 +210,7 @@ class SongPairDataset(Dataset):
        return len(self.sample_song_ids)
    def _load_clip(self, sample: Dict) -> np.ndarray:
-        path = self.data_dir / sample["audio_path"]
+        path = self.asset_root / sample["audio_path"]
        y, _ = librosa.load(str(path), sr=self.sr, mono=True, duration=5.0)
        if len(y) < self.segment_len:
            y = np.pad(y, (0, self.segment_len - len(y)))
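Review note: all three dataset classes in this file now share the same path rule, which can be read in isolation as the sketch below. The function name `resolve_asset_root` and the example paths are illustrative assumptions; the commit itself inlines the expression in each `__init__`.

```python
from pathlib import Path


def resolve_asset_root(data_dir: str) -> Path:
    """Return the directory that manifest `audio_path` entries are relative to."""
    root = Path(data_dir)
    # Same rule as the diff: a directory named "manifests" holds only the JSON
    # files, so audio paths are resolved against its parent directory.
    return root.parent if root.name == "manifests" else root


# Hypothetical inputs, for illustration only.
new_layout = resolve_asset_root("data/external_ingested/fma/manifests")
old_layout = resolve_asset_root("data/synthetic_v2")
print(new_layout / "audio/fma_00000.wav")
print(old_layout / "audio/fma_00000.wav")
```

Because the rule keys on the directory name alone, directories not named `manifests` keep resolving audio paths against `data_dir` as before.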
--- a/acr-engine/src/data/manifest_tools.py
+++ b/acr-engine/src/data/manifest_tools.py
@@ -6,6 +6,7 @@ import argparse
 import csv
 import json
 import random
+import shutil
 from pathlib import Path
 from typing import List, Dict
 import soundfile as sf
@@ -49,13 +50,19 @@ def build_train_eval_from_audio_dir(
    output_dir.mkdir(parents=True, exist_ok=True)
    manifests_dir = output_dir / "manifests"
    manifests_dir.mkdir(parents=True, exist_ok=True)
+    audio_out_dir = output_dir / "audio"
+    audio_out_dir.mkdir(parents=True, exist_ok=True)
    refs = []
    train = []
    test = []
    for idx, path in enumerate(files):
-        rel = path.relative_to(output_dir.parent if output_dir.parent in path.parents else audio_dir.parent)
+        target_name = f"{source_dataset}_{idx:05d}{path.suffix.lower()}"
+        target_path = audio_out_dir / target_name
+        if not target_path.exists():
+            shutil.copy2(path, target_path)
+        rel = target_path.relative_to(output_dir)
        song_id = f"{source_dataset}_{idx:05d}"
        try:
            info = sf.info(str(path))
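Review note: the hunk above copies each source file into `audio/` under the output directory and records a path relative to that directory. The standalone sketch below reproduces that step under stated assumptions: `stage_audio` is a hypothetical helper name, the `.as_posix()` conversion is an addition for readable output, and the input file is a placeholder rather than real audio.

```python
import shutil
import tempfile
from pathlib import Path


def stage_audio(src: Path, output_dir: Path, source_dataset: str, idx: int) -> str:
    """Copy `src` under `output_dir/audio` and return its manifest-relative path."""
    audio_out_dir = output_dir / "audio"
    audio_out_dir.mkdir(parents=True, exist_ok=True)
    # Deterministic name: dataset prefix, zero-padded index, lower-cased suffix.
    target = audio_out_dir / f"{source_dataset}_{idx:05d}{src.suffix.lower()}"
    if not target.exists():  # skip the copy when re-running on the same output
        shutil.copy2(src, target)
    return target.relative_to(output_dir).as_posix()


with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "raw" / "Track One.WAV"
    src.parent.mkdir()
    src.write_bytes(b"placeholder, not real audio")
    out = Path(tmp) / "ingested" / "fma"
    rel = stage_audio(src, out, "fma", 0)
    copied_ok = (out / rel).is_file()

print(rel, copied_ok)
```

One consequence of the skip-if-exists check with index-based names: if the source directory changes between runs, an existing target keeps its old contents, so re-ingesting into a fresh output root may be the safer choice.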
--- a/docs/CHANGELOG.md
+++ b/docs/CHANGELOG.md
@@ -50,6 +50,28 @@
 - The open-dataset onboarding path is now condensed into a single-page executable workflow
 - Onboarding real local FMA / MTG-Jamendo directories later will take less effort
+### Stage: Open-dataset manifests feed training directly
+Completed:
+- Fixed the path self-consistency of open-dataset manifests generated by `src/data/manifest_tools.py`
+- Open-dataset audio is now copied into `audio/` under the output root
+- Fixed path resolution in `src/data/dataset.py` for the `.../manifests` directory layout
+- Connected `prepare-local -> validate-local -> train.py --dry-run` end to end
+Verification results:
+- `/usr/local/miniconda3/bin/python src/data/external_adapters.py prepare-local fma data/synthetic_v2/songs --output-root data/external_ingested/synthetic_as_open_fixed --eval-ratio 0.2 --query-duration 5.0` succeeded
+- `/usr/local/miniconda3/bin/python src/data/external_adapters.py validate-local fma data/external_ingested/synthetic_as_open_fixed/fma/manifests` succeeded
+- `/usr/local/miniconda3/bin/python train.py --data data/external_ingested/synthetic_as_open_fixed/fma/manifests --output data/models_open_smoke_fixed --device cpu --epochs 1 --batch-size 2 --dry-run` succeeded
+- Current results:
+  - `catalog=24`
+  - `train_queries=16`
+  - `test_queries=8`
+  - `Dry run passed!`
+Conclusion:
+- The open-dataset path now not only generates manifests but also feeds them into training
+- Real FMA / MTG-Jamendo data can follow the same pipeline when it is onboarded later
 ### Stage: confused targeted optimization v6 (sample-level weighting)
 Completed:
--- a/docs/open-dataset-workflow.md
+++ b/docs/open-dataset-workflow.md
@@ -20,8 +20,8 @@ flowchart LR
    A[Local Open Audio Dir] --> B[inspect-local / inspect-batch]
    B --> C[prepare-local]
    C --> D[validate-local]
-    D --> E[train.json]
+    D --> E[train.py]
-    D --> F[test.json]
+    D --> F[evaluate.py]
 ```
 ---
@@ -34,6 +34,7 @@ flowchart LR
 | Batch comparison | [`src/data/external_adapters.py`](../acr-engine/src/data/external_adapters.py) `inspect-batch ...` | Compare multiple candidate directories |
 | Generate manifests | [`src/data/external_adapters.py`](../acr-engine/src/data/external_adapters.py) `prepare-local ...` | Produce train/test/catalog |
 | Pre-training validation | [`src/data/external_adapters.py`](../acr-engine/src/data/external_adapters.py) `validate-local ...` | Confirm the structure is correct |
+| Training smoke test | [`train.py`](../acr-engine/train.py) `--data ... --dry-run` | Verify that manifests can enter training directly |
 ---
@@ -45,6 +46,7 @@ flowchart LR
 /usr/local/miniconda3/bin/python src/data/external_adapters.py inspect-local fma data/raw/fma_small_audio --eval-ratio 0.2 --query-duration 8.0
 /usr/local/miniconda3/bin/python src/data/external_adapters.py prepare-local fma data/raw/fma_small_audio --output-root data/external_ingested --eval-ratio 0.2 --query-duration 8.0
 /usr/local/miniconda3/bin/python src/data/external_adapters.py validate-local fma data/external_ingested/fma/manifests
+/usr/local/miniconda3/bin/python train.py --data data/external_ingested/fma/manifests --output data/models_fma_smoke --device cpu --epochs 1 --batch-size 2 --dry-run
 ```
 ### 3.2 Multi-directory comparison
@@ -78,6 +80,8 @@ flowchart LR
  - `test_queries=8`
 - `validate-local`：
  - `ok=true`
+- `train.py --dry-run`：
+  - `Dry run passed! Pipeline is working.`
 ---