
Business Project Manifest Adapter: Adapting Business Data to Project Manifests

Updated: 2026-06-02
Related documents: Business Export Cookbook · Business Manifest and Type-Role Specification

One-Page Summary

The repository now has an offline script chain whose output is close to the project's training/evaluation manifests:

  1. Export the business database tables to CSV / JSONL
  2. ../acr-engine/scripts/normalize_business_export.py
  3. ../acr-engine/scripts/split_business_manifest_ready.py
  4. ../acr-engine/scripts/build_business_project_manifests.py

The last step directly generates:

  • catalog.json
  • train.json
  • test.json
  • val.json

The format is aligned with the manifest structure the project already uses.


1. Aligned project format

catalog.json

  • Contains reference entries only
  • Fields: song_id / audio_path / duration / type=reference / source_dataset
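As a rough illustration of the catalog.json field list above, here is a minimal Python sketch of one reference entry. The helper name `make_reference_entry`, the sample values, and the assumption that catalog.json is a JSON list of objects are all hypothetical; only the field names and `type=reference` come from this document.

```python
import json

# Hypothetical helper: only the field names and type="reference" come from
# the document; the function name and sample values are illustrative.
def make_reference_entry(song_id, audio_path, duration, source_dataset):
    """Build one reference entry using the catalog.json field list."""
    return {
        "song_id": song_id,
        "audio_path": audio_path,
        "duration": duration,
        "type": "reference",
        "source_dataset": source_dataset,
    }

# Assumption: catalog.json is a JSON list of such entries.
catalog = [
    make_reference_entry(
        song_id="song_0001",
        audio_path="audio/ref/song_0001.wav",
        duration=215.4,  # unit assumed to be seconds
        source_dataset="business_export",
    )
]

print(json.dumps(catalog, indent=2, ensure_ascii=False))
```

The actual output of build_business_project_manifests.py may contain additional fields or a different top-level container, so treat this only as a reading aid for the field list.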

train.json / test.json

  • The first part holds the query entries
  • The reference entries are appended after them
  • Query fields:
    • song_id
    • audio_path
    • duration
    • type=clean
    • offset
    • segment_type=external_query
    • source_dataset
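The "queries first, references appended" layout described above can be sketched as follows. The helper names `make_query_entry` and `build_split_manifest` are hypothetical, the sample values are made up, and the manifest is assumed to be a flat JSON list; the field names and the fixed values `type=clean` and `segment_type=external_query` are taken from the list above.

```python
import json

# Hypothetical helpers; field names and fixed values follow the document,
# everything else is an illustrative assumption.
def make_query_entry(song_id, audio_path, duration, offset, source_dataset):
    """Build one query entry with the query field list."""
    return {
        "song_id": song_id,
        "audio_path": audio_path,
        "duration": duration,
        "type": "clean",
        "offset": offset,
        "segment_type": "external_query",
        "source_dataset": source_dataset,
    }

def build_split_manifest(queries, references):
    """Return the queries followed by the reference entries."""
    return list(queries) + list(references)

# Sample reference entry (same shape as a catalog.json entry).
reference = {
    "song_id": "song_0001",
    "audio_path": "audio/ref/song_0001.wav",
    "duration": 215.4,
    "type": "reference",
    "source_dataset": "business_export",
}
query = make_query_entry(
    song_id="song_0001",
    audio_path="audio/query/q_0001.wav",
    duration=10.0,   # unit assumed to be seconds
    offset=32.5,     # assumed: position of the query within the reference
    source_dataset="business_export",
)

train_manifest = build_split_manifest([query], [reference])
print(json.dumps(train_manifest, indent=2, ensure_ascii=False))
```

Keeping the two groups in a fixed order means a consumer can separate them either by position or by the `type` field; which one train.py actually relies on is not stated here.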

val.json

  • By default it currently contains only the queries with split=val
  • Optionally, holdout can be merged into val
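The val.json selection rule above could look roughly like this. It is a sketch under assumptions: that each normalized row carries a `split` field, and that holdout rows are marked with `split=holdout`. The function name `select_val_queries` is hypothetical and is not taken from the real script.

```python
# Hypothetical sketch of the val.json selection rule.
# Assumption: each row has a "split" field; "holdout" as a split value is a guess.
def select_val_queries(rows, include_holdout=False):
    """Keep split=val rows, plus holdout rows when include_holdout is set."""
    wanted = {"val", "holdout"} if include_holdout else {"val"}
    return [row for row in rows if row.get("split") in wanted]

rows = [
    {"song_id": "song_0001", "split": "train"},
    {"song_id": "song_0002", "split": "val"},
    {"song_id": "song_0003", "split": "holdout"},
]

default_val = select_val_queries(rows)
merged_val = select_val_queries(rows, include_holdout=True)
print([row["song_id"] for row in default_val])
print([row["song_id"] for row in merged_val])
```

The `include_holdout` parameter mirrors the intent of the `--include-holdout-in-val` flag used in the example commands.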

2. Example commands

cd /workspace/acr-engine
/usr/local/miniconda3/bin/python scripts/normalize_business_export.py \
  --input configs/manifests/examples/business_asset_export_example.csv \
  --output /tmp/business_asset_manifest_ready.jsonl

/usr/local/miniconda3/bin/python scripts/build_business_project_manifests.py \
  --input /tmp/business_asset_manifest_ready.jsonl \
  --output-dir /tmp/business_project_manifests

If you want to merge holdout into val.json first:

/usr/local/miniconda3/bin/python scripts/build_business_project_manifests.py \
  --input /tmp/business_asset_manifest_ready.jsonl \
  --output-dir /tmp/business_project_manifests \
  --include-holdout-in-val

3. Adaptation boundary

This step is not yet the final "real business production integration", but it is already enough for the next session to:

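  • Run the manifest structure end to end with real business export samples
  • Hook it up to train.py / evaluate.py / run_demo.py
  • Then make small fixes limited to the final field details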
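Before wiring the generated files into train.py / evaluate.py / run_demo.py, a quick structural check can catch missing fields early. The following is a minimal sketch, not part of the repository: `check_manifest_dir` is a hypothetical helper, and it assumes each of the four files is a JSON list of objects with the fields listed in section 1. The demo writes tiny sample files into a temporary directory so the snippet runs on its own.

```python
import json
import tempfile
from pathlib import Path

# Field sets taken from section 1; the checker itself is a hypothetical helper.
REFERENCE_FIELDS = {"song_id", "audio_path", "duration", "type", "source_dataset"}
QUERY_FIELDS = REFERENCE_FIELDS | {"offset", "segment_type"}
EXPECTED_FILES = ("catalog.json", "train.json", "test.json", "val.json")

def missing_fields(entry):
    """Return the required fields that an entry lacks, based on its type."""
    required = REFERENCE_FIELDS if entry.get("type") == "reference" else QUERY_FIELDS
    return sorted(required - set(entry))

def check_manifest_dir(output_dir):
    """Return a list of human-readable problems (empty if none were found)."""
    problems = []
    for name in EXPECTED_FILES:
        path = Path(output_dir) / name
        if not path.exists():
            problems.append(f"{name}: file not found")
            continue
        # Assumption: each manifest file is a JSON list of objects.
        entries = json.loads(path.read_text(encoding="utf-8"))
        for index, entry in enumerate(entries):
            missing = missing_fields(entry)
            if missing:
                problems.append(f"{name}[{index}]: missing {missing}")
    return problems

# Self-contained demo with made-up sample entries.
reference = {"song_id": "s1", "audio_path": "ref/s1.wav", "duration": 200.0,
             "type": "reference", "source_dataset": "business_export"}
query = {"song_id": "s1", "audio_path": "query/q1.wav", "duration": 10.0,
         "type": "clean", "offset": 5.0, "segment_type": "external_query",
         "source_dataset": "business_export"}

with tempfile.TemporaryDirectory() as tmp:
    samples = {
        "catalog.json": [reference],
        "train.json": [query, reference],
        "test.json": [query, reference],
        "val.json": [query],
    }
    for name, entries in samples.items():
        (Path(tmp) / name).write_text(json.dumps(entries), encoding="utf-8")
    problems = check_manifest_dir(tmp)

    # Drop a field from val.json to show what a reported problem looks like.
    broken = {k: v for k, v in query.items() if k != "offset"}
    (Path(tmp) / "val.json").write_text(json.dumps([broken]), encoding="utf-8")
    problems_after_break = check_manifest_dir(tmp)

print(problems)
print(problems_after_break)
```

For a real run, the directory passed to `check_manifest_dir` would be the `--output-dir` used in the example commands, such as /tmp/business_project_manifests.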
