Attach runnable command templates to the extraction plan

Constraint: The Phase-1 PostgreSQL plan needed to become immediately actionable without pretending the workers already exist
Rejected: Keep the plan as ordering-only metadata | It still leaves the next session to reconstruct command wiring by hand
Confidence: high
Scope-risk: narrow
Directive: Keep future worker implementations compatible with the env-var contract emitted by the planner report
Tested: /usr/local/miniconda3/bin/python scripts/plan_phase1_extraction_jobs_live.py --dsn 'postgres://d2:d2pass@127.0.0.1:5432/d2' --schema acr_test --job-status pending --output data/pgvector_eval/music20/phase1_extraction_plan_report.json; /usr/local/miniconda3/bin/python -m py_compile scripts/plan_phase1_extraction_jobs_live.py; git diff --check -- acr-engine/scripts/plan_phase1_extraction_jobs_live.py acr-engine/data/pgvector_eval/music20/phase1_extraction_plan_report.json docs/model-feature-registry-bootstrap.md docs/postgres_db_schema_samples.md docs/session-handoff.md docs/CHANGELOG.md
Not-tested: Real worker binaries at workers/run_chromaprint_job.py and workers/run_embedding_job.py do not exist yet

Commit 06794812 ... 067948120de164a61de09d3141d3c106299c68ff authored 2026-06-04 12:57:39 +0800 by cnb.bofCdSsphPA
Showing 6 changed files with 102 additions and 5 deletions
acr-engine/data/pgvector_eval/music20/phase1_extraction_plan_report.json
acr-engine/scripts/plan_phase1_extraction_jobs_live.py
docs/CHANGELOG.md
docs/model-feature-registry-bootstrap.md
docs/postgres_db_schema_samples.md
docs/session-handoff.md
--- a/acr-engine/data/pgvector_eval/music20/phase1_extraction_plan_report.json
+++ b/acr-engine/data/pgvector_eval/music20/phase1_extraction_plan_report.json
@@ -49,6 +49,10 @@
        "run feature extraction for chromaprint v1",
        "write to audio_fingerprint",
        "target scope: reference_set:phase1_hot_reference_v1"
+      ],
+      "command_suggestions": [
+        "EXTRACTION_JOB_ID=1 FEATURE_SET_ID=2 TARGET_SCOPE='reference_set:phase1_hot_reference_v1' PG_SCHEMA=acr_test OUTPUT_TARGET=audio_fingerprint \\\npython workers/run_chromaprint_job.py",
+        "EXTRACTION_JOB_ID=1 FEATURE_SET_ID=2 TARGET_SCOPE='reference_set:phase1_hot_reference_v1' PG_SCHEMA=acr_test \\\npython workers/mark_job_status.py --status running"
      ]
    },
    {
@@ -90,6 +94,10 @@
        "run feature extraction for mert v1-95m",
        "write to audio_embedding + audio_embedding_vector_768",
        "target scope: reference_set:phase1_hot_reference_v1"
+      ],
+      "command_suggestions": [
+        "EXTRACTION_JOB_ID=2 FEATURE_SET_ID=3 TARGET_SCOPE='reference_set:phase1_hot_reference_v1' PG_SCHEMA=acr_test MODEL_NAME=mert MODEL_VERSION=v1-95m VECTOR_TABLE=audio_embedding_vector_768 OUTPUT_TARGET=audio_embedding \\\npython workers/run_embedding_job.py",
+        "EXTRACTION_JOB_ID=2 FEATURE_SET_ID=3 TARGET_SCOPE='reference_set:phase1_hot_reference_v1' PG_SCHEMA=acr_test \\\npython workers/mark_job_status.py --status running"
      ]
    },
    {
@@ -131,6 +139,10 @@
        "run feature extraction for mert v1-95m",
        "write to audio_embedding + audio_embedding_vector_768",
        "target scope: reference_set:phase1_hot_reference_v1"
+      ],
+      "command_suggestions": [
+        "EXTRACTION_JOB_ID=3 FEATURE_SET_ID=4 TARGET_SCOPE='reference_set:phase1_hot_reference_v1' PG_SCHEMA=acr_test MODEL_NAME=mert MODEL_VERSION=v1-95m VECTOR_TABLE=audio_embedding_vector_768 OUTPUT_TARGET=audio_embedding \\\npython workers/run_embedding_job.py",
+        "EXTRACTION_JOB_ID=3 FEATURE_SET_ID=4 TARGET_SCOPE='reference_set:phase1_hot_reference_v1' PG_SCHEMA=acr_test \\\npython workers/mark_job_status.py --status running"
      ]
    },
    {
@@ -172,6 +184,10 @@
        "run feature extraction for muq large-msd-iter",
        "write to audio_embedding + audio_embedding_vector_768",
        "target scope: reference_set:phase1_hot_reference_v1"
+      ],
+      "command_suggestions": [
+        "EXTRACTION_JOB_ID=4 FEATURE_SET_ID=5 TARGET_SCOPE='reference_set:phase1_hot_reference_v1' PG_SCHEMA=acr_test MODEL_NAME=muq MODEL_VERSION=large-msd-iter VECTOR_TABLE=audio_embedding_vector_768 OUTPUT_TARGET=audio_embedding \\\npython workers/run_embedding_job.py",
+        "EXTRACTION_JOB_ID=4 FEATURE_SET_ID=5 TARGET_SCOPE='reference_set:phase1_hot_reference_v1' PG_SCHEMA=acr_test \\\npython workers/mark_job_status.py --status running"
      ]
    },
    {
@@ -213,6 +229,10 @@
        "run feature extraction for ecapa acr-baseline-v1",
        "write to audio_embedding + audio_embedding_vector_192",
        "target scope: reference_set:phase1_hot_reference_v1"
+      ],
+      "command_suggestions": [
+        "EXTRACTION_JOB_ID=5 FEATURE_SET_ID=6 TARGET_SCOPE='reference_set:phase1_hot_reference_v1' PG_SCHEMA=acr_test MODEL_NAME=ecapa MODEL_VERSION=acr-baseline-v1 VECTOR_TABLE=audio_embedding_vector_192 OUTPUT_TARGET=audio_embedding \\\npython workers/run_embedding_job.py",
+        "EXTRACTION_JOB_ID=5 FEATURE_SET_ID=6 TARGET_SCOPE='reference_set:phase1_hot_reference_v1' PG_SCHEMA=acr_test \\\npython workers/mark_job_status.py --status running"
      ]
    }
  ],
@@ -257,6 +277,10 @@
          "run feature extraction for chromaprint v1",
          "write to audio_fingerprint",
          "target scope: reference_set:phase1_hot_reference_v1"
+        ],
+        "command_suggestions": [
+          "EXTRACTION_JOB_ID=1 FEATURE_SET_ID=2 TARGET_SCOPE='reference_set:phase1_hot_reference_v1' PG_SCHEMA=acr_test OUTPUT_TARGET=audio_fingerprint \\\npython workers/run_chromaprint_job.py",
+          "EXTRACTION_JOB_ID=1 FEATURE_SET_ID=2 TARGET_SCOPE='reference_set:phase1_hot_reference_v1' PG_SCHEMA=acr_test \\\npython workers/mark_job_status.py --status running"
        ]
      }
    ],
@@ -300,6 +324,10 @@
          "run feature extraction for mert v1-95m",
          "write to audio_embedding + audio_embedding_vector_768",
          "target scope: reference_set:phase1_hot_reference_v1"
+        ],
+        "command_suggestions": [
+          "EXTRACTION_JOB_ID=2 FEATURE_SET_ID=3 TARGET_SCOPE='reference_set:phase1_hot_reference_v1' PG_SCHEMA=acr_test MODEL_NAME=mert MODEL_VERSION=v1-95m VECTOR_TABLE=audio_embedding_vector_768 OUTPUT_TARGET=audio_embedding \\\npython workers/run_embedding_job.py",
+          "EXTRACTION_JOB_ID=2 FEATURE_SET_ID=3 TARGET_SCOPE='reference_set:phase1_hot_reference_v1' PG_SCHEMA=acr_test \\\npython workers/mark_job_status.py --status running"
        ]
      },
      {
@@ -341,6 +369,10 @@
          "run feature extraction for mert v1-95m",
          "write to audio_embedding + audio_embedding_vector_768",
          "target scope: reference_set:phase1_hot_reference_v1"
+        ],
+        "command_suggestions": [
+          "EXTRACTION_JOB_ID=3 FEATURE_SET_ID=4 TARGET_SCOPE='reference_set:phase1_hot_reference_v1' PG_SCHEMA=acr_test MODEL_NAME=mert MODEL_VERSION=v1-95m VECTOR_TABLE=audio_embedding_vector_768 OUTPUT_TARGET=audio_embedding \\\npython workers/run_embedding_job.py",
+          "EXTRACTION_JOB_ID=3 FEATURE_SET_ID=4 TARGET_SCOPE='reference_set:phase1_hot_reference_v1' PG_SCHEMA=acr_test \\\npython workers/mark_job_status.py --status running"
        ]
      },
      {
@@ -382,6 +414,10 @@
          "run feature extraction for muq large-msd-iter",
          "write to audio_embedding + audio_embedding_vector_768",
          "target scope: reference_set:phase1_hot_reference_v1"
+        ],
+        "command_suggestions": [
+          "EXTRACTION_JOB_ID=4 FEATURE_SET_ID=5 TARGET_SCOPE='reference_set:phase1_hot_reference_v1' PG_SCHEMA=acr_test MODEL_NAME=muq MODEL_VERSION=large-msd-iter VECTOR_TABLE=audio_embedding_vector_768 OUTPUT_TARGET=audio_embedding \\\npython workers/run_embedding_job.py",
+          "EXTRACTION_JOB_ID=4 FEATURE_SET_ID=5 TARGET_SCOPE='reference_set:phase1_hot_reference_v1' PG_SCHEMA=acr_test \\\npython workers/mark_job_status.py --status running"
        ]
      },
      {
@@ -423,6 +459,10 @@
          "run feature extraction for ecapa acr-baseline-v1",
          "write to audio_embedding + audio_embedding_vector_192",
          "target scope: reference_set:phase1_hot_reference_v1"
+        ],
+        "command_suggestions": [
+          "EXTRACTION_JOB_ID=5 FEATURE_SET_ID=6 TARGET_SCOPE='reference_set:phase1_hot_reference_v1' PG_SCHEMA=acr_test MODEL_NAME=ecapa MODEL_VERSION=acr-baseline-v1 VECTOR_TABLE=audio_embedding_vector_192 OUTPUT_TARGET=audio_embedding \\\npython workers/run_embedding_job.py",
+          "EXTRACTION_JOB_ID=5 FEATURE_SET_ID=6 TARGET_SCOPE='reference_set:phase1_hot_reference_v1' PG_SCHEMA=acr_test \\\npython workers/mark_job_status.py --status running"
        ]
      }
    ]
@@ -436,7 +476,8 @@
      "feature_name": "fingerprint_asset",
      "window_sec": 5.0,
      "hop_sec": 2.5,
-      "physical_target": "audio_fingerprint"
+      "physical_target": "audio_fingerprint",
+      "primary_command": "EXTRACTION_JOB_ID=1 FEATURE_SET_ID=2 TARGET_SCOPE='reference_set:phase1_hot_reference_v1' PG_SCHEMA=acr_test OUTPUT_TARGET=audio_fingerprint \\\npython workers/run_chromaprint_job.py"
    },
    {
      "order": 2,
@@ -446,7 +487,8 @@
      "feature_name": "semantic_embedding",
      "window_sec": 5.0,
      "hop_sec": 2.5,
-      "physical_target": "audio_embedding"
+      "physical_target": "audio_embedding",
+      "primary_command": "EXTRACTION_JOB_ID=2 FEATURE_SET_ID=3 TARGET_SCOPE='reference_set:phase1_hot_reference_v1' PG_SCHEMA=acr_test MODEL_NAME=mert MODEL_VERSION=v1-95m VECTOR_TABLE=audio_embedding_vector_768 OUTPUT_TARGET=audio_embedding \\\npython workers/run_embedding_job.py"
    },
    {
      "order": 3,
@@ -456,7 +498,8 @@
      "feature_name": "semantic_embedding",
      "window_sec": 10.0,
      "hop_sec": 5.0,
-      "physical_target": "audio_embedding"
+      "physical_target": "audio_embedding",
+      "primary_command": "EXTRACTION_JOB_ID=3 FEATURE_SET_ID=4 TARGET_SCOPE='reference_set:phase1_hot_reference_v1' PG_SCHEMA=acr_test MODEL_NAME=mert MODEL_VERSION=v1-95m VECTOR_TABLE=audio_embedding_vector_768 OUTPUT_TARGET=audio_embedding \\\npython workers/run_embedding_job.py"
    },
    {
      "order": 4,
@@ -466,7 +509,8 @@
      "feature_name": "semantic_embedding",
      "window_sec": 5.0,
      "hop_sec": 2.5,
-      "physical_target": "audio_embedding"
+      "physical_target": "audio_embedding",
+      "primary_command": "EXTRACTION_JOB_ID=4 FEATURE_SET_ID=5 TARGET_SCOPE='reference_set:phase1_hot_reference_v1' PG_SCHEMA=acr_test MODEL_NAME=muq MODEL_VERSION=large-msd-iter VECTOR_TABLE=audio_embedding_vector_768 OUTPUT_TARGET=audio_embedding \\\npython workers/run_embedding_job.py"
    },
    {
      "order": 5,
@@ -476,7 +520,8 @@
      "feature_name": "semantic_embedding",
      "window_sec": 5.0,
      "hop_sec": 2.5,
-      "physical_target": "audio_embedding"
+      "physical_target": "audio_embedding",
+      "primary_command": "EXTRACTION_JOB_ID=5 FEATURE_SET_ID=6 TARGET_SCOPE='reference_set:phase1_hot_reference_v1' PG_SCHEMA=acr_test MODEL_NAME=ecapa MODEL_VERSION=acr-baseline-v1 VECTOR_TABLE=audio_embedding_vector_192 OUTPUT_TARGET=audio_embedding \\\npython workers/run_embedding_job.py"
    }
  ]
 }
\ No newline at end of file
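The report diff above adds a `primary_command` next to each `order` entry. As a minimal sketch of how a later session might pull those templates out, the helper below walks any decoded JSON structure and collects every dict carrying both keys. The function name `iter_primary_commands` and the `ordered_plan` key in the sample are assumptions for illustration; the diff does not show the report's top-level key names.

```python
from typing import Any, Iterator


def iter_primary_commands(node: Any) -> Iterator[tuple[int, str]]:
    """Yield (order, primary_command) from any dict that carries both keys."""
    if isinstance(node, dict):
        if 'order' in node and 'primary_command' in node:
            yield node['order'], node['primary_command']
        for value in node.values():
            yield from iter_primary_commands(value)
    elif isinstance(node, list):
        for value in node:
            yield from iter_primary_commands(value)


# Hypothetical stand-in for the decoded report; the real file has more fields.
sample_report = {
    'ordered_plan': [  # assumed key name, not confirmed by the diff
        {'order': 2, 'primary_command': 'EXTRACTION_JOB_ID=2 \\\npython workers/run_embedding_job.py'},
        {'order': 1, 'primary_command': 'EXTRACTION_JOB_ID=1 \\\npython workers/run_chromaprint_job.py'},
    ]
}

# Sort by the planner's order field so commands print in execution order.
ordered = sorted(iter_primary_commands(sample_report))
for order, command in ordered:
    print(f"# step {order}\n{command}\n")
```

Walking the structure recursively avoids depending on where the ordered list sits in the report, at the cost of also matching any unrelated dict that happens to carry both keys.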
--- a/acr-engine/scripts/plan_phase1_extraction_jobs_live.py
+++ b/acr-engine/scripts/plan_phase1_extraction_jobs_live.py
@@ -25,6 +25,26 @@ def parse_target_scope(target_scope: str) -> dict[str, Any]:
    return {'scope_type': 'unknown', 'scope_value': target_scope}


+def build_command_suggestions(job: dict[str, Any], schema: str) -> list[str]:
+    base_env = f"EXTRACTION_JOB_ID={job['extraction_job_id']} FEATURE_SET_ID={job['feature_set_id']} TARGET_SCOPE='{job['target_scope']}' PG_SCHEMA={schema}"
+    commands = []
+    if job['lane'] == 'exact':
+        commands.append(
+            base_env
+            + " OUTPUT_TARGET=audio_fingerprint \\\npython workers/run_chromaprint_job.py"
+        )
+    else:
+        commands.append(
+            base_env
+            + f" MODEL_NAME={job['model_name']} MODEL_VERSION={job['model_version']} VECTOR_TABLE={job['vector_table']} OUTPUT_TARGET={job['physical_target']} \\\npython workers/run_embedding_job.py"
+        )
+    commands.append(
+        base_env
+        + " \\\npython workers/mark_job_status.py --status running"
+    )
+    return commands
+
+
 def main() -> None:
    ap = argparse.ArgumentParser()
    ap.add_argument('--dsn', required=True)
@@ -112,6 +132,7 @@ def main() -> None:
                f"target scope: {row[2]}",
            ],
        }
+        item['command_suggestions'] = build_command_suggestions(item, args.schema)
        jobs.append(item)
        by_lane.setdefault(lane, []).append(item)

@@ -139,6 +160,7 @@ def main() -> None:
                'window_sec': job['window_sec'],
                'hop_sec': job['hop_sec'],
                'physical_target': job['physical_target'],
+                'primary_command': job['command_suggestions'][0],
            }
            for idx, job in enumerate(jobs)
        ],
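`build_command_suggestions` in the diff above wraps `TARGET_SCOPE` in literal single quotes and interpolates the other values unquoted, which holds as long as no value contains a quote or whitespace. A possible hardening, sketched here and not part of this commit, is to pass every value through `shlex.quote`. The helper names `render_env_prefix` and `render_command` are hypothetical. Note that `shlex.quote` leaves values made only of safe characters unquoted, so the emitted report text would differ slightly from the current templates.

```python
import shlex


def render_env_prefix(env: dict[str, object]) -> str:
    """Render KEY=value pairs, shell-quoting a value only when it needs it."""
    return ' '.join(f"{key}={shlex.quote(str(value))}" for key, value in env.items())


def render_command(env: dict[str, object], argv: list[str]) -> str:
    """Join env prefix and worker argv with the backslash-newline used in the report."""
    return render_env_prefix(env) + ' \\\n' + shlex.join(argv)


# The scope value here is deliberately awkward (space and single quote).
example = render_command(
    {'EXTRACTION_JOB_ID': 1, 'TARGET_SCOPE': "ref set:o'brien", 'PG_SCHEMA': 'acr_test'},
    ['python', 'workers/run_chromaprint_job.py'],
)
print(example)
```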
--- a/docs/CHANGELOG.md
+++ b/docs/CHANGELOG.md
 ## 2026-06-04

+- Updated `plan_phase1_extraction_jobs_live.py` and `phase1_extraction_plan_report.json`, advancing the Phase-1 execution plan from an ordering-only plan to one that carries copy-pasteable command templates in `command_suggestions / primary_command`.
 - Added `acr-engine/scripts/plan_phase1_extraction_jobs_live.py` and `acr-engine/data/pgvector_eval/music20/phase1_extraction_plan_report.json`, which read pending jobs directly from PostgreSQL's `feature_extraction_job` table and join related tables to generate a Phase-1 execution plan ordered by lane / priority.
 - Added `acr-engine/scripts/bootstrap_phase1_extraction_jobs_live.py` and `acr-engine/data/pgvector_eval/music20/phase1_extraction_jobs_report.json`, turning Phase-1 `feature_extraction_job` initialization into a live script that connects directly to PostgreSQL; 5 pending jobs have actually been created in the `acr_test` schema.
 - Added `phase1_registry_bootstrap_idempotency_report.json` and accompanying documentation, verifying that table counts stay stable after `bootstrap_phase1_model_registry_live.py` runs twice in a row against the `acr_test` schema, which shows that the Phase-1 registry bootstrap is idempotent and safe to re-run.
--- a/docs/model-feature-registry-bootstrap.md
+++ b/docs/model-feature-registry-bootstrap.md
@@ -397,3 +397,25 @@ cd /workspace/acr-engine
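 Conclusion:

 > PostgreSQL can now not only describe which jobs exist but also directly produce a **feature-extraction plan sorted in execution order**.
+
+### 10.3 ready-to-run command suggestions (completed)
+
+This round upgrades the planner further: **every job now gets a command suggestion**.
+
+Examples: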
+
+#### exact lane
+
+```bash
+EXTRACTION_JOB_ID=1 FEATURE_SET_ID=2 TARGET_SCOPE='reference_set:phase1_hot_reference_v1' PG_SCHEMA=acr_test OUTPUT_TARGET=audio_fingerprint \
+python workers/run_chromaprint_job.py
+```
+
+#### semantic lane
+
+```bash
+EXTRACTION_JOB_ID=2 FEATURE_SET_ID=3 TARGET_SCOPE='reference_set:phase1_hot_reference_v1' PG_SCHEMA=acr_test MODEL_NAME=mert MODEL_VERSION=v1-95m VECTOR_TABLE=audio_embedding_vector_768 OUTPUT_TARGET=audio_embedding \
+python workers/run_embedding_job.py
+```
+
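+This means the next session does not have to assemble environment variables and job bindings by hand first; it can copy the command templates straight from the planner report.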
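The commit message asks future workers to stay compatible with the env-var contract these templates emit, and notes that the workers themselves do not exist yet. As a rough sketch of what the receiving side of the semantic-lane contract could look like, the block below parses the eight variables from the template into a typed object. The class name `EmbeddingJobEnv` and its validation behavior are assumptions for illustration, not the project's actual worker code.

```python
from dataclasses import dataclass
from typing import Mapping

# Variable names taken from the semantic-lane command template in the report.
REQUIRED_VARS = (
    'EXTRACTION_JOB_ID', 'FEATURE_SET_ID', 'TARGET_SCOPE', 'PG_SCHEMA',
    'MODEL_NAME', 'MODEL_VERSION', 'VECTOR_TABLE', 'OUTPUT_TARGET',
)


@dataclass(frozen=True)
class EmbeddingJobEnv:
    """Hypothetical typed view of the planner's env-var contract."""
    extraction_job_id: int
    feature_set_id: int
    target_scope: str
    pg_schema: str
    model_name: str
    model_version: str
    vector_table: str
    output_target: str

    @classmethod
    def from_env(cls, env: Mapping[str, str]) -> 'EmbeddingJobEnv':
        # Fail early and list every missing or empty variable at once.
        missing = [name for name in REQUIRED_VARS if not env.get(name)]
        if missing:
            raise KeyError('missing environment variables: ' + ', '.join(missing))
        return cls(
            extraction_job_id=int(env['EXTRACTION_JOB_ID']),
            feature_set_id=int(env['FEATURE_SET_ID']),
            target_scope=env['TARGET_SCOPE'],
            pg_schema=env['PG_SCHEMA'],
            model_name=env['MODEL_NAME'],
            model_version=env['MODEL_VERSION'],
            vector_table=env['VECTOR_TABLE'],
            output_target=env['OUTPUT_TARGET'],
        )


# Values copied from the job 2 template; a real worker would pass os.environ.
job_env = EmbeddingJobEnv.from_env({
    'EXTRACTION_JOB_ID': '2', 'FEATURE_SET_ID': '3',
    'TARGET_SCOPE': 'reference_set:phase1_hot_reference_v1',
    'PG_SCHEMA': 'acr_test', 'MODEL_NAME': 'mert', 'MODEL_VERSION': 'v1-95m',
    'VECTOR_TABLE': 'audio_embedding_vector_768', 'OUTPUT_TARGET': 'audio_embedding',
})
print(job_env)
```

An exact-lane worker would need only the first four variables plus `OUTPUT_TARGET`, so a separate, smaller class (or optional fields) would fit that template better.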
--- a/docs/postgres_db_schema_samples.md
+++ b/docs/postgres_db_schema_samples.md
@@ -430,6 +430,12 @@ flowchart LR
 Corresponding live report:
 - `acr-engine/data/pgvector_eval/music20/phase1_extraction_plan_report.json`

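+After this round's update, the plan also includes:
+- `command_suggestions`
+- `primary_command`
+
+In other words, the path from PostgreSQL pending jobs to copy-pasteable command templates is now direct.
+
 ### Route 1: continue the PostgreSQL engineering work

 1. Generalize `live_pgvector_music20_eval.py` into: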
--- a/docs/session-handoff.md
+++ b/docs/session-handoff.md
@@ -185,6 +185,7 @@ sed -n '1,320p' acr-engine/sql/acr_pg_schema_v2.sql
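 - Phase-1 registry bootstrap now has idempotency evidence: after two consecutive runs against the same schema, `model_registry=5 / feature_set_registry=6 / reference_set_registry=2` stays unchanged
 - 5 `feature_extraction_job` rows have actually been created in the PostgreSQL `acr_test` schema, so the upcoming MERT / MuQ integration can start directly from the pending jobs
 - The Phase-1 extraction execution plan has actually been generated against the PostgreSQL `acr_test` schema; the current order is `chromaprint -> mert -> mert-long -> muq -> ecapa`
+- The extraction plan report now includes `command_suggestions / primary_command`, so the worker command templates can be copied directly from the plan next time

 ### Unverified / remaining gaps
 - **MERT / MuQ encoder-only feature extraction has not actually been run**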