Validate internal audio assets before manifest-scale training

Constraint: Internal CSV exports should expose missing audio and usable durations before they are treated as train-ready manifests
Rejected: Defer path and duration checks to later training failures | Would make ingestion debugging slow and noisy
Confidence: high
Scope-risk: narrow
Directive: Keep internal asset validation lightweight at mapping time; surface existence and duration early, then layer richer QC rules incrementally
Tested: internal_asset_type_mapper.py with --audio-root on a 6-row sample detected missing_audio=2 and emitted durations for existing reference/query assets
Not-tested: Production-scale scans over the full internal asset repository
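
The mapping-time check described by the Directive (confirm each asset exists, then record its duration) can be sketched as follows. This is a minimal illustration, not the actual internal_asset_type_mapper.py logic. It assumes a CSV with a hypothetical `audio_path` column holding paths relative to the audio root, and WAV files readable by Python's stdlib `wave` module; `validate_assets` and the `audio_exists`/`duration_s` fields are hypothetical names.

```python
import csv
import wave
from pathlib import Path


def validate_assets(csv_path, audio_root, path_column="audio_path"):
    """Hypothetical sketch: flag rows whose audio file is missing under
    audio_root and record duration in seconds for the files that exist.

    Assumes WAV audio; other formats would need a different reader.
    Returns (annotated_rows, missing_count).
    """
    root = Path(audio_root)
    rows_out = []
    missing = 0
    with open(csv_path, newline="") as f:
        for row in csv.DictReader(f):
            audio = root / row[path_column]
            if not audio.is_file():
                # Existence check: count and flag missing audio early.
                missing += 1
                row["audio_exists"] = False
                row["duration_s"] = None
            else:
                row["audio_exists"] = True
                try:
                    # Duration = frame count / sample rate.
                    with wave.open(str(audio), "rb") as w:
                        row["duration_s"] = w.getnframes() / w.getframerate()
                except wave.Error:
                    # File exists but is not a readable WAV; leave duration unset.
                    row["duration_s"] = None
            rows_out.append(row)
    return rows_out, missing
```

Keeping the check to existence plus a header-level duration read avoids decoding audio, which fits the "lightweight at mapping time" constraint; heavier QC rules can be layered on later.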

3 changed files with 76 additions and 3 deletions