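Business Project Manifest Adapter: adapting business data to project manifests
Updated: 2026-06-02
Related documents: Business Export Cookbook · Business Manifest and Type-Role Specification
One-page summary
The repository now has an offline script chain that produces output close to the project's training/evaluation manifests:
- Export business database tables to CSV / JSONL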
- ../acr-engine/scripts/normalize_business_export.py
- ../acr-engine/scripts/split_business_manifest_ready.py
- ../acr-engine/scripts/build_business_project_manifests.py
The last step directly generates:
catalog.json / train.json / test.json / val.json
The format matches the manifest structure the project already uses.
1. Aligned project format
catalog.json
- Contains reference entries only
- Fields:
song_id / audio_path / duration / type=reference / source_dataset
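A catalog entry can be checked against the field list above with a few lines of Python. This is a minimal sketch, not the project's own validator: it assumes catalog.json is a JSON array of objects, and the helper name `check_catalog` and all sample values are hypothetical.

```python
# Field names are taken from the list above. Treating the manifest as a
# JSON array of objects is an assumption this document does not confirm.
CATALOG_FIELDS = {"song_id", "audio_path", "duration", "type", "source_dataset"}


def check_catalog(entries):
    """Return a list of human-readable problems found in catalog entries."""
    problems = []
    for i, entry in enumerate(entries):
        missing = CATALOG_FIELDS - set(entry)
        if missing:
            problems.append(f"entry {i}: missing {sorted(missing)}")
        if entry.get("type") != "reference":
            problems.append(
                f"entry {i}: type is {entry.get('type')!r}, expected 'reference'"
            )
    return problems


# Hypothetical sample values, for illustration only.
sample = [
    {"song_id": "s001", "audio_path": "audio/s001.wav", "duration": 183.2,
     "type": "reference", "source_dataset": "business_export"},
    {"song_id": "s002", "audio_path": "audio/s002.wav", "duration": 201.0,
     "type": "clean", "source_dataset": "business_export"},
]
catalog_problems = check_catalog(sample)
print(catalog_problems)
```

The second sample entry is flagged because catalog.json should hold only entries with type=reference.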
train.json / test.json
- The first part holds the query entries
- The reference entries are appended after them
- Query fields:
song_id / audio_path / duration / type=clean / offset / segment_type=external_query / source_dataset
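The "queries first, references appended" layout can be verified mechanically. The sketch below assumes train.json / test.json are JSON arrays and that an entry with type=reference marks the start of the reference part. The function name and the sample values are hypothetical.

```python
def split_queries_and_references(entries):
    """Split a train/test-style list into (queries, references).

    Raises ValueError if a non-reference entry appears after the first
    reference entry, i.e. if the "queries first" layout is violated.
    """
    # Index of the first reference entry, or len(entries) if there is none.
    boundary = next(
        (i for i, e in enumerate(entries) if e.get("type") == "reference"),
        len(entries),
    )
    queries, references = entries[:boundary], entries[boundary:]
    for offset, entry in enumerate(references):
        if entry.get("type") != "reference":
            raise ValueError(
                f"non-reference entry at index {boundary + offset} "
                "after the reference part began"
            )
    return queries, references


# Hypothetical sample values, for illustration only.
sample_train = [
    {"song_id": "s001", "audio_path": "queries/q001.wav", "duration": 10.0,
     "type": "clean", "offset": 42.5, "segment_type": "external_query",
     "source_dataset": "business_export"},
    {"song_id": "s001", "audio_path": "audio/s001.wav", "duration": 183.2,
     "type": "reference", "source_dataset": "business_export"},
]
queries, references = split_queries_and_references(sample_train)
print(len(queries), len(references))
```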
val.json
- By default, contains only query entries with split=val
- Optionally, holdout can be merged into val
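The selection rule for val.json might look roughly like the following. This is only a sketch of the rule as described above, not the code in build_business_project_manifests.py: it assumes each normalized record carries a `split` field, that `holdout` is one of its values, and that reference records are identified by type=reference. The function name and sample records are hypothetical.

```python
def select_val_entries(records, include_holdout=False):
    """Pick the query records that would go into val.json.

    Assumption: `split` holds values such as "val" and "holdout", and
    reference records are marked with type == "reference".
    """
    wanted_splits = {"val", "holdout"} if include_holdout else {"val"}
    return [
        r for r in records
        if r.get("split") in wanted_splits and r.get("type") != "reference"
    ]


# Hypothetical records, for illustration only.
records = [
    {"song_id": "a", "type": "clean", "split": "val"},
    {"song_id": "b", "type": "clean", "split": "holdout"},
    {"song_id": "c", "type": "clean", "split": "train"},
    {"song_id": "d", "type": "reference", "split": "val"},
]
default_val = [r["song_id"] for r in select_val_entries(records)]
merged_val = [r["song_id"] for r in select_val_entries(records, include_holdout=True)]
print(default_val, merged_val)
```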
2. Example commands
cd /workspace/acr-engine
/usr/local/miniconda3/bin/python scripts/normalize_business_export.py \
--input configs/manifests/examples/business_asset_export_example.csv \
--output /tmp/business_asset_manifest_ready.jsonl
/usr/local/miniconda3/bin/python scripts/build_business_project_manifests.py \
--input /tmp/business_asset_manifest_ready.jsonl \
--output-dir /tmp/business_project_manifests
If you want to merge holdout into val.json for now:
/usr/local/miniconda3/bin/python scripts/build_business_project_manifests.py \
--input /tmp/business_asset_manifest_ready.jsonl \
--output-dir /tmp/business_project_manifests \
--include-holdout-in-val
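After either command finishes, a quick look at what landed in the output directory can catch structural problems early. The sketch below assumes each manifest is a JSON array of objects with a `type` field. The helper `summarize_manifests` is hypothetical and not part of the repository.

```python
import json
from collections import Counter
from pathlib import Path

MANIFEST_NAMES = ("catalog.json", "train.json", "test.json", "val.json")


def summarize_manifests(output_dir):
    """Map each manifest file name to a Counter of entry `type` values.

    A missing file is reported as None instead of raising.
    Assumption: every manifest is a JSON array of objects.
    """
    summary = {}
    for name in MANIFEST_NAMES:
        path = Path(output_dir) / name
        if not path.exists():
            summary[name] = None
            continue
        entries = json.loads(path.read_text(encoding="utf-8"))
        summary[name] = Counter(entry.get("type") for entry in entries)
    return summary
```

Calling `summarize_manifests("/tmp/business_project_manifests")` and printing the result should show only `reference` entries for catalog.json and a mix of `clean` and `reference` entries for train.json and test.json, if the output follows the format described in section 1.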
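3. Scope and limits of the adapter
This step is not yet the final "real business production integration", but it is already enough for the next session to:
- Run real business export samples through the manifest structure end to end
- Hook the output up to train.py / evaluate.py / run_demo.py
- Then make small fixes that target only the final field details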