Attach runnable command templates to the extraction plan
Constraint: The Phase-1 PostgreSQL plan needed to become immediately actionable without pretending the workers already exist
Rejected: Keep the plan as ordering-only metadata | It still leaves the next session to reconstruct command wiring by hand
Confidence: high
Scope-risk: narrow
Directive: Keep future worker implementations compatible with the env-var contract emitted by the planner report
Tested: /usr/local/miniconda3/bin/python scripts/plan_phase1_extraction_jobs_live.py --dsn 'postgres://d2:d2pass@127.0.0.1:5432/d2' --schema acr_test --job-status pending --output data/pgvector_eval/music20/phase1_extraction_plan_report.json; /usr/local/miniconda3/bin/python -m py_compile scripts/plan_phase1_extraction_jobs_live.py; git diff --check -- acr-engine/scripts/plan_phase1_extraction_jobs_live.py acr-engine/data/pgvector_eval/music20/phase1_extraction_plan_report.json docs/model-feature-registry-bootstrap.md docs/postgres_db_schema_samples.md docs/session-handoff.md docs/CHANGELOG.md
Not-tested: Real worker binaries at workers/run_chromaprint_job.py and workers/run_embedding_job.py do not exist yet
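The Directive above asks future workers to stay compatible with the env-var contract. As a rough illustration only, a worker could read that contract as sketched below. The variable names are the ones the planner emits in this commit; `JobEnv` and `load_job_env` are hypothetical names, and which variables count as required is an assumption, since the workers do not exist yet.

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class JobEnv:
    """Hypothetical container for the variables emitted by the planner."""
    extraction_job_id: int
    feature_set_id: int
    target_scope: str
    pg_schema: str
    output_target: Optional[str] = None
    # The next three are only emitted for embedding jobs.
    model_name: Optional[str] = None
    model_version: Optional[str] = None
    vector_table: Optional[str] = None


def load_job_env(environ: Mapping[str, str]) -> JobEnv:
    """Read the planner's variables; fail early if a shared one is missing."""
    # Assumption: the four variables in the planner's base prefix are mandatory.
    required = ("EXTRACTION_JOB_ID", "FEATURE_SET_ID", "TARGET_SCOPE", "PG_SCHEMA")
    missing = [name for name in required if not environ.get(name)]
    if missing:
        raise KeyError("missing planner variables: " + ", ".join(missing))
    return JobEnv(
        extraction_job_id=int(environ["EXTRACTION_JOB_ID"]),
        feature_set_id=int(environ["FEATURE_SET_ID"]),
        target_scope=environ["TARGET_SCOPE"],
        pg_schema=environ["PG_SCHEMA"],
        output_target=environ.get("OUTPUT_TARGET"),
        model_name=environ.get("MODEL_NAME"),
        model_version=environ.get("MODEL_VERSION"),
        vector_table=environ.get("VECTOR_TABLE"),
    )
```

A worker entry point would typically call `load_job_env(os.environ)` once at startup, so a mismatch with the planner report surfaces before any extraction work begins.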
Showing 6 changed files with 102 additions and 5 deletions
| ... | @@ -49,6 +49,10 @@ | ... | @@ -49,6 +49,10 @@ |
| 49 | "run feature extraction for chromaprint v1", | 49 | "run feature extraction for chromaprint v1", |
| 50 | "write to audio_fingerprint", | 50 | "write to audio_fingerprint", |
| 51 | "target scope: reference_set:phase1_hot_reference_v1" | 51 | "target scope: reference_set:phase1_hot_reference_v1" |
| 52 | ], | ||
| 53 | "command_suggestions": [ | ||
| 54 | "EXTRACTION_JOB_ID=1 FEATURE_SET_ID=2 TARGET_SCOPE='reference_set:phase1_hot_reference_v1' PG_SCHEMA=acr_test OUTPUT_TARGET=audio_fingerprint \\\npython workers/run_chromaprint_job.py", | ||
| 55 | "EXTRACTION_JOB_ID=1 FEATURE_SET_ID=2 TARGET_SCOPE='reference_set:phase1_hot_reference_v1' PG_SCHEMA=acr_test \\\npython workers/mark_job_status.py --status running" | ||
| 52 | ] | 56 | ] |
| 53 | }, | 57 | }, |
| 54 | { | 58 | { |
| ... | @@ -90,6 +94,10 @@ | ... | @@ -90,6 +94,10 @@ |
| 90 | "run feature extraction for mert v1-95m", | 94 | "run feature extraction for mert v1-95m", |
| 91 | "write to audio_embedding + audio_embedding_vector_768", | 95 | "write to audio_embedding + audio_embedding_vector_768", |
| 92 | "target scope: reference_set:phase1_hot_reference_v1" | 96 | "target scope: reference_set:phase1_hot_reference_v1" |
| 97 | ], | ||
| 98 | "command_suggestions": [ | ||
| 99 | "EXTRACTION_JOB_ID=2 FEATURE_SET_ID=3 TARGET_SCOPE='reference_set:phase1_hot_reference_v1' PG_SCHEMA=acr_test MODEL_NAME=mert MODEL_VERSION=v1-95m VECTOR_TABLE=audio_embedding_vector_768 OUTPUT_TARGET=audio_embedding \\\npython workers/run_embedding_job.py", | ||
| 100 | "EXTRACTION_JOB_ID=2 FEATURE_SET_ID=3 TARGET_SCOPE='reference_set:phase1_hot_reference_v1' PG_SCHEMA=acr_test \\\npython workers/mark_job_status.py --status running" | ||
| 93 | ] | 101 | ] |
| 94 | }, | 102 | }, |
| 95 | { | 103 | { |
| ... | @@ -131,6 +139,10 @@ | ... | @@ -131,6 +139,10 @@ |
| 131 | "run feature extraction for mert v1-95m", | 139 | "run feature extraction for mert v1-95m", |
| 132 | "write to audio_embedding + audio_embedding_vector_768", | 140 | "write to audio_embedding + audio_embedding_vector_768", |
| 133 | "target scope: reference_set:phase1_hot_reference_v1" | 141 | "target scope: reference_set:phase1_hot_reference_v1" |
| 142 | ], | ||
| 143 | "command_suggestions": [ | ||
| 144 | "EXTRACTION_JOB_ID=3 FEATURE_SET_ID=4 TARGET_SCOPE='reference_set:phase1_hot_reference_v1' PG_SCHEMA=acr_test MODEL_NAME=mert MODEL_VERSION=v1-95m VECTOR_TABLE=audio_embedding_vector_768 OUTPUT_TARGET=audio_embedding \\\npython workers/run_embedding_job.py", | ||
| 145 | "EXTRACTION_JOB_ID=3 FEATURE_SET_ID=4 TARGET_SCOPE='reference_set:phase1_hot_reference_v1' PG_SCHEMA=acr_test \\\npython workers/mark_job_status.py --status running" | ||
| 134 | ] | 146 | ] |
| 135 | }, | 147 | }, |
| 136 | { | 148 | { |
| ... | @@ -172,6 +184,10 @@ | ... | @@ -172,6 +184,10 @@ |
| 172 | "run feature extraction for muq large-msd-iter", | 184 | "run feature extraction for muq large-msd-iter", |
| 173 | "write to audio_embedding + audio_embedding_vector_768", | 185 | "write to audio_embedding + audio_embedding_vector_768", |
| 174 | "target scope: reference_set:phase1_hot_reference_v1" | 186 | "target scope: reference_set:phase1_hot_reference_v1" |
| 187 | ], | ||
| 188 | "command_suggestions": [ | ||
| 189 | "EXTRACTION_JOB_ID=4 FEATURE_SET_ID=5 TARGET_SCOPE='reference_set:phase1_hot_reference_v1' PG_SCHEMA=acr_test MODEL_NAME=muq MODEL_VERSION=large-msd-iter VECTOR_TABLE=audio_embedding_vector_768 OUTPUT_TARGET=audio_embedding \\\npython workers/run_embedding_job.py", | ||
| 190 | "EXTRACTION_JOB_ID=4 FEATURE_SET_ID=5 TARGET_SCOPE='reference_set:phase1_hot_reference_v1' PG_SCHEMA=acr_test \\\npython workers/mark_job_status.py --status running" | ||
| 175 | ] | 191 | ] |
| 176 | }, | 192 | }, |
| 177 | { | 193 | { |
| ... | @@ -213,6 +229,10 @@ | ... | @@ -213,6 +229,10 @@ |
| 213 | "run feature extraction for ecapa acr-baseline-v1", | 229 | "run feature extraction for ecapa acr-baseline-v1", |
| 214 | "write to audio_embedding + audio_embedding_vector_192", | 230 | "write to audio_embedding + audio_embedding_vector_192", |
| 215 | "target scope: reference_set:phase1_hot_reference_v1" | 231 | "target scope: reference_set:phase1_hot_reference_v1" |
| 232 | ], | ||
| 233 | "command_suggestions": [ | ||
| 234 | "EXTRACTION_JOB_ID=5 FEATURE_SET_ID=6 TARGET_SCOPE='reference_set:phase1_hot_reference_v1' PG_SCHEMA=acr_test MODEL_NAME=ecapa MODEL_VERSION=acr-baseline-v1 VECTOR_TABLE=audio_embedding_vector_192 OUTPUT_TARGET=audio_embedding \\\npython workers/run_embedding_job.py", | ||
| 235 | "EXTRACTION_JOB_ID=5 FEATURE_SET_ID=6 TARGET_SCOPE='reference_set:phase1_hot_reference_v1' PG_SCHEMA=acr_test \\\npython workers/mark_job_status.py --status running" | ||
| 216 | ] | 236 | ] |
| 217 | } | 237 | } |
| 218 | ], | 238 | ], |
| ... | @@ -257,6 +277,10 @@ | ... | @@ -257,6 +277,10 @@ |
| 257 | "run feature extraction for chromaprint v1", | 277 | "run feature extraction for chromaprint v1", |
| 258 | "write to audio_fingerprint", | 278 | "write to audio_fingerprint", |
| 259 | "target scope: reference_set:phase1_hot_reference_v1" | 279 | "target scope: reference_set:phase1_hot_reference_v1" |
| 280 | ], | ||
| 281 | "command_suggestions": [ | ||
| 282 | "EXTRACTION_JOB_ID=1 FEATURE_SET_ID=2 TARGET_SCOPE='reference_set:phase1_hot_reference_v1' PG_SCHEMA=acr_test OUTPUT_TARGET=audio_fingerprint \\\npython workers/run_chromaprint_job.py", | ||
| 283 | "EXTRACTION_JOB_ID=1 FEATURE_SET_ID=2 TARGET_SCOPE='reference_set:phase1_hot_reference_v1' PG_SCHEMA=acr_test \\\npython workers/mark_job_status.py --status running" | ||
| 260 | ] | 284 | ] |
| 261 | } | 285 | } |
| 262 | ], | 286 | ], |
| ... | @@ -300,6 +324,10 @@ | ... | @@ -300,6 +324,10 @@ |
| 300 | "run feature extraction for mert v1-95m", | 324 | "run feature extraction for mert v1-95m", |
| 301 | "write to audio_embedding + audio_embedding_vector_768", | 325 | "write to audio_embedding + audio_embedding_vector_768", |
| 302 | "target scope: reference_set:phase1_hot_reference_v1" | 326 | "target scope: reference_set:phase1_hot_reference_v1" |
| 327 | ], | ||
| 328 | "command_suggestions": [ | ||
| 329 | "EXTRACTION_JOB_ID=2 FEATURE_SET_ID=3 TARGET_SCOPE='reference_set:phase1_hot_reference_v1' PG_SCHEMA=acr_test MODEL_NAME=mert MODEL_VERSION=v1-95m VECTOR_TABLE=audio_embedding_vector_768 OUTPUT_TARGET=audio_embedding \\\npython workers/run_embedding_job.py", | ||
| 330 | "EXTRACTION_JOB_ID=2 FEATURE_SET_ID=3 TARGET_SCOPE='reference_set:phase1_hot_reference_v1' PG_SCHEMA=acr_test \\\npython workers/mark_job_status.py --status running" | ||
| 303 | ] | 331 | ] |
| 304 | }, | 332 | }, |
| 305 | { | 333 | { |
| ... | @@ -341,6 +369,10 @@ | ... | @@ -341,6 +369,10 @@ |
| 341 | "run feature extraction for mert v1-95m", | 369 | "run feature extraction for mert v1-95m", |
| 342 | "write to audio_embedding + audio_embedding_vector_768", | 370 | "write to audio_embedding + audio_embedding_vector_768", |
| 343 | "target scope: reference_set:phase1_hot_reference_v1" | 371 | "target scope: reference_set:phase1_hot_reference_v1" |
| 372 | ], | ||
| 373 | "command_suggestions": [ | ||
| 374 | "EXTRACTION_JOB_ID=3 FEATURE_SET_ID=4 TARGET_SCOPE='reference_set:phase1_hot_reference_v1' PG_SCHEMA=acr_test MODEL_NAME=mert MODEL_VERSION=v1-95m VECTOR_TABLE=audio_embedding_vector_768 OUTPUT_TARGET=audio_embedding \\\npython workers/run_embedding_job.py", | ||
| 375 | "EXTRACTION_JOB_ID=3 FEATURE_SET_ID=4 TARGET_SCOPE='reference_set:phase1_hot_reference_v1' PG_SCHEMA=acr_test \\\npython workers/mark_job_status.py --status running" | ||
| 344 | ] | 376 | ] |
| 345 | }, | 377 | }, |
| 346 | { | 378 | { |
| ... | @@ -382,6 +414,10 @@ | ... | @@ -382,6 +414,10 @@ |
| 382 | "run feature extraction for muq large-msd-iter", | 414 | "run feature extraction for muq large-msd-iter", |
| 383 | "write to audio_embedding + audio_embedding_vector_768", | 415 | "write to audio_embedding + audio_embedding_vector_768", |
| 384 | "target scope: reference_set:phase1_hot_reference_v1" | 416 | "target scope: reference_set:phase1_hot_reference_v1" |
| 417 | ], | ||
| 418 | "command_suggestions": [ | ||
| 419 | "EXTRACTION_JOB_ID=4 FEATURE_SET_ID=5 TARGET_SCOPE='reference_set:phase1_hot_reference_v1' PG_SCHEMA=acr_test MODEL_NAME=muq MODEL_VERSION=large-msd-iter VECTOR_TABLE=audio_embedding_vector_768 OUTPUT_TARGET=audio_embedding \\\npython workers/run_embedding_job.py", | ||
| 420 | "EXTRACTION_JOB_ID=4 FEATURE_SET_ID=5 TARGET_SCOPE='reference_set:phase1_hot_reference_v1' PG_SCHEMA=acr_test \\\npython workers/mark_job_status.py --status running" | ||
| 385 | ] | 421 | ] |
| 386 | }, | 422 | }, |
| 387 | { | 423 | { |
| ... | @@ -423,6 +459,10 @@ | ... | @@ -423,6 +459,10 @@ |
| 423 | "run feature extraction for ecapa acr-baseline-v1", | 459 | "run feature extraction for ecapa acr-baseline-v1", |
| 424 | "write to audio_embedding + audio_embedding_vector_192", | 460 | "write to audio_embedding + audio_embedding_vector_192", |
| 425 | "target scope: reference_set:phase1_hot_reference_v1" | 461 | "target scope: reference_set:phase1_hot_reference_v1" |
| 462 | ], | ||
| 463 | "command_suggestions": [ | ||
| 464 | "EXTRACTION_JOB_ID=5 FEATURE_SET_ID=6 TARGET_SCOPE='reference_set:phase1_hot_reference_v1' PG_SCHEMA=acr_test MODEL_NAME=ecapa MODEL_VERSION=acr-baseline-v1 VECTOR_TABLE=audio_embedding_vector_192 OUTPUT_TARGET=audio_embedding \\\npython workers/run_embedding_job.py", | ||
| 465 | "EXTRACTION_JOB_ID=5 FEATURE_SET_ID=6 TARGET_SCOPE='reference_set:phase1_hot_reference_v1' PG_SCHEMA=acr_test \\\npython workers/mark_job_status.py --status running" | ||
| 426 | ] | 466 | ] |
| 427 | } | 467 | } |
| 428 | ] | 468 | ] |
| ... | @@ -436,7 +476,8 @@ | ... | @@ -436,7 +476,8 @@ |
| 436 | "feature_name": "fingerprint_asset", | 476 | "feature_name": "fingerprint_asset", |
| 437 | "window_sec": 5.0, | 477 | "window_sec": 5.0, |
| 438 | "hop_sec": 2.5, | 478 | "hop_sec": 2.5, |
| 439 | "physical_target": "audio_fingerprint" | 479 | "physical_target": "audio_fingerprint", |
| 480 | "primary_command": "EXTRACTION_JOB_ID=1 FEATURE_SET_ID=2 TARGET_SCOPE='reference_set:phase1_hot_reference_v1' PG_SCHEMA=acr_test OUTPUT_TARGET=audio_fingerprint \\\npython workers/run_chromaprint_job.py" | ||
| 440 | }, | 481 | }, |
| 441 | { | 482 | { |
| 442 | "order": 2, | 483 | "order": 2, |
| ... | @@ -446,7 +487,8 @@ | ... | @@ -446,7 +487,8 @@ |
| 446 | "feature_name": "semantic_embedding", | 487 | "feature_name": "semantic_embedding", |
| 447 | "window_sec": 5.0, | 488 | "window_sec": 5.0, |
| 448 | "hop_sec": 2.5, | 489 | "hop_sec": 2.5, |
| 449 | "physical_target": "audio_embedding" | 490 | "physical_target": "audio_embedding", |
| 491 | "primary_command": "EXTRACTION_JOB_ID=2 FEATURE_SET_ID=3 TARGET_SCOPE='reference_set:phase1_hot_reference_v1' PG_SCHEMA=acr_test MODEL_NAME=mert MODEL_VERSION=v1-95m VECTOR_TABLE=audio_embedding_vector_768 OUTPUT_TARGET=audio_embedding \\\npython workers/run_embedding_job.py" | ||
| 450 | }, | 492 | }, |
| 451 | { | 493 | { |
| 452 | "order": 3, | 494 | "order": 3, |
| ... | @@ -456,7 +498,8 @@ | ... | @@ -456,7 +498,8 @@ |
| 456 | "feature_name": "semantic_embedding", | 498 | "feature_name": "semantic_embedding", |
| 457 | "window_sec": 10.0, | 499 | "window_sec": 10.0, |
| 458 | "hop_sec": 5.0, | 500 | "hop_sec": 5.0, |
| 459 | "physical_target": "audio_embedding" | 501 | "physical_target": "audio_embedding", |
| 502 | "primary_command": "EXTRACTION_JOB_ID=3 FEATURE_SET_ID=4 TARGET_SCOPE='reference_set:phase1_hot_reference_v1' PG_SCHEMA=acr_test MODEL_NAME=mert MODEL_VERSION=v1-95m VECTOR_TABLE=audio_embedding_vector_768 OUTPUT_TARGET=audio_embedding \\\npython workers/run_embedding_job.py" | ||
| 460 | }, | 503 | }, |
| 461 | { | 504 | { |
| 462 | "order": 4, | 505 | "order": 4, |
| ... | @@ -466,7 +509,8 @@ | ... | @@ -466,7 +509,8 @@ |
| 466 | "feature_name": "semantic_embedding", | 509 | "feature_name": "semantic_embedding", |
| 467 | "window_sec": 5.0, | 510 | "window_sec": 5.0, |
| 468 | "hop_sec": 2.5, | 511 | "hop_sec": 2.5, |
| 469 | "physical_target": "audio_embedding" | 512 | "physical_target": "audio_embedding", |
| 513 | "primary_command": "EXTRACTION_JOB_ID=4 FEATURE_SET_ID=5 TARGET_SCOPE='reference_set:phase1_hot_reference_v1' PG_SCHEMA=acr_test MODEL_NAME=muq MODEL_VERSION=large-msd-iter VECTOR_TABLE=audio_embedding_vector_768 OUTPUT_TARGET=audio_embedding \\\npython workers/run_embedding_job.py" | ||
| 470 | }, | 514 | }, |
| 471 | { | 515 | { |
| 472 | "order": 5, | 516 | "order": 5, |
| ... | @@ -476,7 +520,8 @@ | ... | @@ -476,7 +520,8 @@ |
| 476 | "feature_name": "semantic_embedding", | 520 | "feature_name": "semantic_embedding", |
| 477 | "window_sec": 5.0, | 521 | "window_sec": 5.0, |
| 478 | "hop_sec": 2.5, | 522 | "hop_sec": 2.5, |
| 479 | "physical_target": "audio_embedding" | 523 | "physical_target": "audio_embedding", |
| 524 | "primary_command": "EXTRACTION_JOB_ID=5 FEATURE_SET_ID=6 TARGET_SCOPE='reference_set:phase1_hot_reference_v1' PG_SCHEMA=acr_test MODEL_NAME=ecapa MODEL_VERSION=acr-baseline-v1 VECTOR_TABLE=audio_embedding_vector_192 OUTPUT_TARGET=audio_embedding \\\npython workers/run_embedding_job.py" | ||
| 480 | } | 525 | } |
| 481 | ] | 526 | ] |
| 482 | } | 527 | } |
| ... | \ No newline at end of file | ... | \ No newline at end of file | ... | ... |
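For readers who want to pull the new fields out of the report, the sketch below collects every `primary_command` value from the parsed JSON. The hunks above do not show the names of the enclosing keys, so the walk is deliberately recursive instead of assuming them; `iter_primary_commands` and `load_primary_commands` are hypothetical helper names.

```python
import json
from typing import Any, Iterator, List


def iter_primary_commands(node: Any) -> Iterator[str]:
    """Yield every "primary_command" string found anywhere in parsed JSON."""
    if isinstance(node, dict):
        command = node.get("primary_command")
        if isinstance(command, str):
            yield command
        # Keep descending: the field may sit in nested lists or objects.
        for value in node.values():
            yield from iter_primary_commands(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_primary_commands(item)


def load_primary_commands(path: str) -> List[str]:
    """Load a planner report from disk and return its primary commands in file order."""
    with open(path, encoding="utf-8") as handle:
        return list(iter_primary_commands(json.load(handle)))
```

Called with the `--output` path from the commit's test command, `load_primary_commands` should return one template per ordered step, assuming the report on disk matches the diff above.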
| ... | @@ -25,6 +25,26 @@ def parse_target_scope(target_scope: str) -> dict[str, Any]: | ... | @@ -25,6 +25,26 @@ def parse_target_scope(target_scope: str) -> dict[str, Any]: |
| 25 | return {'scope_type': 'unknown', 'scope_value': target_scope} | 25 | return {'scope_type': 'unknown', 'scope_value': target_scope} |
| 26 | 26 | ||
| 27 | 27 | ||
| 28 | def build_command_suggestions(job: dict[str, Any], schema: str) -> list[str]: | ||
| 29 | base_env = f"EXTRACTION_JOB_ID={job['extraction_job_id']} FEATURE_SET_ID={job['feature_set_id']} TARGET_SCOPE='{job['target_scope']}' PG_SCHEMA={schema}" | ||
| 30 | commands = [] | ||
| 31 | if job['lane'] == 'exact': | ||
| 32 | commands.append( | ||
| 33 | base_env | ||
| 34 | + " OUTPUT_TARGET=audio_fingerprint \\\npython workers/run_chromaprint_job.py" | ||
| 35 | ) | ||
| 36 | else: | ||
| 37 | commands.append( | ||
| 38 | base_env | ||
| 39 | + f" MODEL_NAME={job['model_name']} MODEL_VERSION={job['model_version']} VECTOR_TABLE={job['vector_table']} OUTPUT_TARGET={job['physical_target']} \\\npython workers/run_embedding_job.py" | ||
| 40 | ) | ||
| 41 | commands.append( | ||
| 42 | base_env | ||
| 43 | + " \\\npython workers/mark_job_status.py --status running" | ||
| 44 | ) | ||
| 45 | return commands | ||
| 46 | |||
| 47 | |||
| 28 | def main() -> None: | 48 | def main() -> None: |
| 29 | ap = argparse.ArgumentParser() | 49 | ap = argparse.ArgumentParser() |
| 30 | ap.add_argument('--dsn', required=True) | 50 | ap.add_argument('--dsn', required=True) |
| ... | @@ -112,6 +132,7 @@ def main() -> None: | ... | @@ -112,6 +132,7 @@ def main() -> None: |
| 112 | f"target scope: {row[2]}", | 132 | f"target scope: {row[2]}", |
| 113 | ], | 133 | ], |
| 114 | } | 134 | } |
| 135 | item['command_suggestions'] = build_command_suggestions(item, args.schema) | ||
| 115 | jobs.append(item) | 136 | jobs.append(item) |
| 116 | by_lane.setdefault(lane, []).append(item) | 137 | by_lane.setdefault(lane, []).append(item) |
| 117 | 138 | ||
| ... | @@ -139,6 +160,7 @@ def main() -> None: | ... | @@ -139,6 +160,7 @@ def main() -> None: |
| 139 | 'window_sec': job['window_sec'], | 160 | 'window_sec': job['window_sec'], |
| 140 | 'hop_sec': job['hop_sec'], | 161 | 'hop_sec': job['hop_sec'], |
| 141 | 'physical_target': job['physical_target'], | 162 | 'physical_target': job['physical_target'], |
| 163 | 'primary_command': job['command_suggestions'][0], | ||
| 142 | } | 164 | } |
| 143 | for idx, job in enumerate(jobs) | 165 | for idx, job in enumerate(jobs) |
| 144 | ], | 166 | ], | ... | ... |
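`build_command_suggestions` interpolates database values directly into the shell prefix and only wraps `TARGET_SCOPE` in single quotes, so a value containing a space or a quote would produce a broken template. One possible hardening, shown here as a sketch and not as the commit's code, is to route every value through `shlex.quote`. `render_env_prefix` and `render_command` are hypothetical names.

```python
import shlex
from typing import Mapping, Sequence


def render_env_prefix(env: Mapping[str, object]) -> str:
    """Render NAME=value pairs, quoting a value only when the shell needs it."""
    return " ".join(f"{name}={shlex.quote(str(value))}" for name, value in env.items())


def render_command(env: Mapping[str, object], argv: Sequence[str]) -> str:
    """Join the env prefix and the command with the same backslash-newline the planner uses."""
    return render_env_prefix(env) + " \\\n" + shlex.join(argv)
```

Note that `shlex.quote` leaves safe values such as `reference_set:phase1_hot_reference_v1` unquoted, so templates rendered this way would differ textually from the ones in the report while staying equivalent to a POSIX shell.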
| 1 | ## 2026-06-04 | 1 | ## 2026-06-04 |
| 2 | 2 | ||
| 3 | - Updated `plan_phase1_extraction_jobs_live.py` and `phase1_extraction_plan_report.json`, advancing the Phase-1 execution plan from an "ordering-only plan" to one that carries copy-paste command templates via `command_suggestions / primary_command`. | ||
| 3 | - Added `acr-engine/scripts/plan_phase1_extraction_jobs_live.py` and `acr-engine/data/pgvector_eval/music20/phase1_extraction_plan_report.json`, which read pending jobs live from the PostgreSQL `feature_extraction_job` table and join related tables to produce a Phase-1 execution plan ordered by lane / priority. | 4 | - Added `acr-engine/scripts/plan_phase1_extraction_jobs_live.py` and `acr-engine/data/pgvector_eval/music20/phase1_extraction_plan_report.json`, which read pending jobs live from the PostgreSQL `feature_extraction_job` table and join related tables to produce a Phase-1 execution plan ordered by lane / priority. |
| 4 | - Added `acr-engine/scripts/bootstrap_phase1_extraction_jobs_live.py` and `acr-engine/data/pgvector_eval/music20/phase1_extraction_jobs_report.json`, turning Phase-1 `feature_extraction_job` initialization into a live script that connects directly to PostgreSQL; it has already created 5 pending jobs in the `acr_test` schema. | 5 | - Added `acr-engine/scripts/bootstrap_phase1_extraction_jobs_live.py` and `acr-engine/data/pgvector_eval/music20/phase1_extraction_jobs_report.json`, turning Phase-1 `feature_extraction_job` initialization into a live script that connects directly to PostgreSQL; it has already created 5 pending jobs in the `acr_test` schema. |
| 5 | - Added `phase1_registry_bootstrap_idempotency_report.json` and accompanying documentation, verifying that table counts stay stable after `bootstrap_phase1_model_registry_live.py` runs twice in a row against the `acr_test` schema, which shows the Phase-1 registry bootstrap is idempotent and safe to re-run. | 6 | - Added `phase1_registry_bootstrap_idempotency_report.json` and accompanying documentation, verifying that table counts stay stable after `bootstrap_phase1_model_registry_live.py` runs twice in a row against the `acr_test` schema, which shows the Phase-1 registry bootstrap is idempotent and safe to re-run. | ... | ... |
| ... | @@ -397,3 +397,25 @@ cd /workspace/acr-engine | ... | @@ -397,3 +397,25 @@ cd /workspace/acr-engine |
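| 397 | Conclusion: | 397 | Conclusion: |
| 398 | 398 | ||
| 399 | > PostgreSQL can now not only describe "which jobs exist" but also directly generate a **feature-extraction plan sorted in execution order**. | 399 | > PostgreSQL can now not only describe "which jobs exist" but also directly generate a **feature-extraction plan sorted in execution order**. |
| 400 | |||
| 401 | ### 10.3 ready-to-run command suggestions (completed) | ||
| 402 | |||
| 403 | This round further upgrades the planner so that **every job gets a command suggestion**. | ||
| 404 | |||
| 405 | Examples: | ||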
| 406 | |||
| 407 | #### exact lane | ||
| 408 | |||
| 409 | ```bash | ||
| 410 | EXTRACTION_JOB_ID=1 FEATURE_SET_ID=2 TARGET_SCOPE='reference_set:phase1_hot_reference_v1' PG_SCHEMA=acr_test OUTPUT_TARGET=audio_fingerprint \ | ||
| 411 | python workers/run_chromaprint_job.py | ||
| 412 | ``` | ||
| 413 | |||
| 414 | #### semantic lane | ||
| 415 | |||
| 416 | ```bash | ||
| 417 | EXTRACTION_JOB_ID=2 FEATURE_SET_ID=3 TARGET_SCOPE='reference_set:phase1_hot_reference_v1' PG_SCHEMA=acr_test MODEL_NAME=mert MODEL_VERSION=v1-95m VECTOR_TABLE=audio_embedding_vector_768 OUTPUT_TARGET=audio_embedding \ | ||
| 418 | python workers/run_embedding_job.py | ||
| 419 | ``` | ||
| 420 | |||
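| 421 | This means the next session no longer has to hand-assemble the environment variables and job bindings first; it can copy the command templates straight from the planner report. | ... | ... |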
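Beyond copy-pasting, a later session might want to launch a template programmatically. The sketch below splits one template into an environment mapping and an argument list; `split_command_template` is a hypothetical helper, and it assumes the templates keep the `NAME=value ... \` + newline + command shape shown in this commit.

```python
import re
import shlex
from typing import Dict, List, Tuple

# A leading token counts as an assignment only if it looks like NAME=value.
_ASSIGNMENT = re.compile(r"^([A-Za-z_][A-Za-z0-9_]*)=(.*)$", re.DOTALL)


def split_command_template(template: str) -> Tuple[Dict[str, str], List[str]]:
    """Split 'A=1 B=2 \\<newline>cmd args' into ({'A': '1', 'B': '2'}, ['cmd', 'args'])."""
    # shlex does not treat backslash-newline as a line continuation, so fold it first.
    tokens = shlex.split(template.replace("\\\n", " "))
    env: Dict[str, str] = {}
    index = 0
    while index < len(tokens):
        match = _ASSIGNMENT.match(tokens[index])
        if match is None:
            break  # first non-assignment token starts the actual command
        env[match.group(1)] = match.group(2)
        index += 1
    return env, tokens[index:]
```

The result could then be handed to `subprocess.run(argv, env={**os.environ, **env}, check=True)` once the worker scripts exist; as the commit message notes, they do not exist yet.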
| ... | @@ -430,6 +430,12 @@ flowchart LR | ... | @@ -430,6 +430,12 @@ flowchart LR |
| 430 | Corresponding live report: | 430 | Corresponding live report: |
| 431 | - `acr-engine/data/pgvector_eval/music20/phase1_extraction_plan_report.json` | 431 | - `acr-engine/data/pgvector_eval/music20/phase1_extraction_plan_report.json` |
| 432 | 432 | ||
| 433 | After this round's additions, the plan also emits: | ||
| 434 | - `command_suggestions` | ||
| 435 | - `primary_command` | ||
| 436 | |||
| 437 | In other words, the path from pending jobs in PostgreSQL to "copy-paste command templates" is now direct. | ||
| 438 | |||
| 433 | ### Route 1: Continue PostgreSQL engineering work | 439 | ### Route 1: Continue PostgreSQL engineering work |
| 434 | 440 | ||
| 435 | 1. Generalize `live_pgvector_music20_eval.py` into: | 441 | 1. Generalize `live_pgvector_music20_eval.py` into: | ... | ... |
| ... | @@ -185,6 +185,7 @@ sed -n '1,320p' acr-engine/sql/acr_pg_schema_v2.sql | ... | @@ -185,6 +185,7 @@ sed -n '1,320p' acr-engine/sql/acr_pg_schema_v2.sql |
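| 185 | - The Phase-1 registry bootstrap now has idempotency evidence: after two consecutive runs against the same schema, `model_registry=5 / feature_set_registry=6 / reference_set_registry=2` remain unchanged | 185 | - The Phase-1 registry bootstrap now has idempotency evidence: after two consecutive runs against the same schema, `model_registry=5 / feature_set_registry=6 / reference_set_registry=2` remain unchanged |
| 186 | - 5 `feature_extraction_job` rows have been created for real in the PostgreSQL `acr_test` schema, so later MERT / MuQ integration can start directly from the pending jobs | 186 | - 5 `feature_extraction_job` rows have been created for real in the PostgreSQL `acr_test` schema, so later MERT / MuQ integration can start directly from the pending jobs |
| 187 | - A Phase-1 extraction execution plan has been generated for real against the PostgreSQL `acr_test` schema; the current order is `chromaprint -> mert -> mert-long -> muq -> ecapa` | 187 | - A Phase-1 extraction execution plan has been generated for real against the PostgreSQL `acr_test` schema; the current order is `chromaprint -> mert -> mert-long -> muq -> ecapa` |
| 188 | - The extraction plan report now includes `command_suggestions / primary_command`, so next time the worker command templates can be copied straight from the plan | ||
| 188 | 189 | ||
| 189 | ### Unverified / remaining gaps | 190 | ### Unverified / remaining gaps |
| 190 | - **MERT / MuQ encoder-only feature extraction has not actually been run** | 191 | - **MERT / MuQ encoder-only feature extraction has not actually been run** | ... | ... |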