Commits · 8002dfb03ecaffefd2ee691b9fd275131a159cf4 · wanghai-tech / hikoon-ACR

04 Jun, 2026 39 commits

Import song-centric manifests into live PostgreSQL with idempotent upserts · 8002dfb0 ...

Constraint: Extend the current 4-table song-centric schema with a practical manifest ingestion path without introducing the older split-table model or hidden side metadata tables.
Rejected: Leave ingestion as handwritten SQL or one-off bootstrap logic | It slows real asset onboarding and makes repeatability hard to verify.
Confidence: high
Scope-risk: narrow
Directive: Use import_songcentric_manifest_live.py plus a manifest JSONL as the default path for batch asset/window onboarding into the fused schema.
Tested: /usr/local/miniconda3/bin/python acr-engine/scripts/import_songcentric_manifest_live.py --dsn postgres://d2:d2pass@127.0.0.1:5432/d2 --schema acr_songcentric_test --manifest acr-engine/data/pgvector_eval/music20/songcentric_manifest_sample.jsonl; repeated the import and verified counts remained media_entity=5, audio_object=11, feature_fact=6, set_membership=5; git diff --check; /usr/local/miniconda3/bin/python scripts/check_markdown_links.py --root docs returned OK for 11 active markdown files
Not-tested: feature_fact generation during manifest import and large-scale manifest throughput
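
The "repeat the import and verify counts stay the same" check above can be sketched as follows. This is a minimal illustration only, not the contents of import_songcentric_manifest_live.py: it uses an in-memory SQLite database as a stand-in for PostgreSQL, and the `entity_key` and `title` columns and the two manifest lines are hypothetical.

```python
import json
import sqlite3


def import_manifest(conn, lines):
    """Upsert one media_entity row per JSONL line, keyed on a natural key."""
    for line in lines:
        rec = json.loads(line)
        conn.execute(
            "INSERT INTO media_entity (entity_key, title) VALUES (?, ?) "
            "ON CONFLICT (entity_key) DO UPDATE SET title = excluded.title",
            (rec["entity_key"], rec["title"]),
        )
    conn.commit()


def row_count(conn):
    return conn.execute("SELECT COUNT(*) FROM media_entity").fetchone()[0]


conn = sqlite3.connect(":memory:")
# Hypothetical columns; the real fused schema is not shown in this log.
conn.execute("CREATE TABLE media_entity (entity_key TEXT PRIMARY KEY, title TEXT)")

manifest = [
    '{"entity_key": "song-1", "title": "First"}',
    '{"entity_key": "song-2", "title": "Second"}',
]

import_manifest(conn, manifest)
first = row_count(conn)
import_manifest(conn, manifest)  # second run must not add rows
second = row_count(conn)
print(first, second)
```

Because the conflict target is the natural key, a rerun updates rows in place, so any count drift between runs points to a regression.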

authored 2026-06-04 14:45:31 +0800

Bootstrap the fused song-centric schema with repeatable live seed data · 5e43f28b ...

Constraint: Keep all new initialization logic on top of the current 4-table song-centric schema and validate it against the user PostgreSQL instead of synthetic-only assumptions.
Rejected: Stop at one-row smoke evidence | It does not prove the schema is practical for repeated Phase-1 bootstrap workflows.
Confidence: high
Scope-risk: narrow
Directive: Use bootstrap_songcentric_phase1_live.py as the default seed/bootstrap path when demonstrating or validating the fused schema on live PostgreSQL.
Tested: /usr/local/miniconda3/bin/python acr-engine/scripts/bootstrap_songcentric_phase1_live.py --dsn postgres://d2:d2pass@127.0.0.1:5432/d2 --schema acr_songcentric_test; git diff --check; /usr/local/miniconda3/bin/python scripts/check_markdown_links.py --root docs returned OK for 11 active markdown files
Not-tested: large-batch bootstrap and conflict handling under concurrent writers

authored 2026-06-04 14:43:27 +0800

Prove the fused song-centric ACR schema on live PostgreSQL · 3ce36679 ...

Constraint: Stay within the current 4-table song-centric model and validate it against the user-provided PostgreSQL before treating it as the active schema candidate.
Rejected: Leave the fused model as docs-only guidance | Without a runnable SQL file and smoke evidence, downstream implementation would still be ambiguous.
Confidence: high
Scope-risk: narrow
Directive: Prefer acr_pg_schema_songcentric_v1.sql for new schema experiments tied to the current song-centric design; do not revive the older split-table model for Phase-1 by default.
Tested: /usr/local/miniconda3/bin/python acr-engine/scripts/smoke_songcentric_schema_live.py --dsn postgres://d2:d2pass@127.0.0.1:5432/d2 --schema acr_songcentric_test; git diff --check; /usr/local/miniconda3/bin/python scripts/check_markdown_links.py --root docs returned OK for 11 active markdown files
Not-tested: high-volume bulk ingest on the fused schema

authored 2026-06-04 14:41:37 +0800

Make the fused Phase-1 ACR schema concrete with DDL samples · fe416ec9 ...

Constraint: Keep the storage design aligned to the current song-centric model while turning the 4-table fused schema into something engineers can directly review and implement.
Rejected: Keep only conceptual docs without concrete SQL | It leaves too much ambiguity about where slices, models, and features actually land.
Confidence: high
Scope-risk: narrow
Directive: Until the repository gains a production SQL file for the fused model, treat postgres_db_schema_samples.md as the authoritative DDL draft for media_entity/audio_object/feature_fact/set_membership.
Tested: git diff --check on touched files; /usr/local/miniconda3/bin/python scripts/check_markdown_links.py --root docs returned OK for 11 active markdown files
Not-tested: Executing the fused DDL against a live PostgreSQL schema

authored 2026-06-04 14:39:30 +0800

Reduce ACR docs to the current song-centric storage design · ac2e6730 ...

Constraint: Keep only documentation that directly serves the current Phase-1 song-centric + fused-table storage and retrieval design.
Rejected: Preserve broad historical, dataset, business-export, and template docs in the main docs root | They increase handoff cost and blur the active design surface.
Confidence: high
Scope-risk: moderate
Directive: Treat postgresql-data-model.md as the single source of truth for where slices, models, and features are stored until a concrete fused DDL supersedes it.
Tested: git diff --check on touched docs; /usr/local/miniconda3/bin/python scripts/check_markdown_links.py --root docs returned OK for 11 active markdown files; final docs root reduced to 12 files
Not-tested: external markdown renderers and downstream readers that may still expect removed auxiliary docs

authored 2026-06-04 14:37:22 +0800

Favor typed unified tables for the Phase-1 ACR storage model · 44222971 ...

Constraint: Reduce schema reading cost for new engineers while preserving the logical distinctions needed for copyright-scale retrieval and attribution.
Rejected: Keep adding highly specialized tables for every layer in Phase-1 | It increases join cost in the mental model faster than it improves first-stage delivery.
Confidence: high
Scope-risk: narrow
Directive: Prefer a fused physical model (media_entity/audio_object/feature_fact/set_membership) with type fields, while keeping song/recording/asset/window as logical semantics.
Tested: git diff --check on touched docs; /usr/local/miniconda3/bin/python scripts/check_markdown_links.py --root docs returned OK for 31 markdown files; rg confirmed fused-model sections are present in docs
Not-tested: concrete SQL DDL for the fused physical model

authored 2026-06-04 14:32:58 +0800

Freeze the Phase-1 minimal schema story for ACR delivery · 7ada6f21 ...

Constraint: Keep the production-ready v2 model intact while making the first-delivery table set explicit for engineers starting implementation.
Rejected: Introduce a separate competing Phase-1 schema document | It would create another parallel truth and slow handoff.
Confidence: high
Scope-risk: narrow
Directive: When discussing first-stage storage, default to song/recording/recording_asset/audio_window plus feature and reference tables before bringing in heavier governance tables.
Tested: git diff --check on touched docs; /usr/local/miniconda3/bin/python scripts/check_markdown_links.py --root docs returned OK for 31 markdown files
Not-tested: SQL DDL generation from the simplified narrative

authored 2026-06-04 14:31:31 +0800

Keep ACR entity layers simple without collapsing recording assets · 89d9d72b ...

Constraint: Preserve the Phase-1 minimal schema story while clarifying when simplification is safe and when it creates future refactor risk.
Rejected: Merge recording and recording_asset in the formal schema now | Copyright-scale catalogs will quickly need multi-file and multi-source recording support.
Confidence: high
Scope-risk: narrow
Directive: Use 'song -> recording -> asset -> window -> feature' as the default communication shorthand, but keep recording and asset split in the persisted model.
Tested: git diff --check on touched docs; /usr/local/miniconda3/bin/python scripts/check_markdown_links.py --root docs returned OK for 31 markdown files
Not-tested: Rendered markdown preview in external viewers

authored 2026-06-04 14:30:15 +0800

Reduce ACR handoff time with a single doc chain · 6d4f8c1c ...

Constraint: Preserve the current Phase-1 runner, PostgreSQL v2 contract, and live validation narrative while removing duplicate doc entrypoints.
Rejected: Keep multiple parallel handoff docs | They force new contributors to diff stale narratives before they can act.
Confidence: high
Scope-risk: narrow
Directive: Treat README -> start-here -> session-handoff as the only first-read path unless a newer handoff chain fully replaces it.
Tested: git diff --check on touched docs/script; rg for deleted-doc residual refs outside CHANGELOG; reran scripts/run_planner_validation_commands_live.py with executed_count=4 and all_passed=true
Not-tested: Markdown link rendering in external viewers

authored 2026-06-04 14:25:25 +0800

Put the shortest verified startup path at the docs entrypoint · 8d6e4b29 ...

Constraint: The README remained a reading-first surface while the handoff had already converged on a faster validated startup command, so the docs entrypoint needed to match the actual recovery workflow.
Rejected: Keep the shortest path only in session-handoff | That would still force many sessions to open the wrong document first.
Confidence: high
Scope-risk: narrow
Directive: Treat docs/README.md and docs/session-handoff.md as aligned startup surfaces; keep the runner command identical in both places.
Tested: git diff --check; /usr/local/miniconda3/bin/python scripts/run_planner_validation_commands_live.py --dsn 'postgres://d2:d2pass@127.0.0.1:5432/d2' --output data/pgvector_eval/music20/planner_validation_commands_runner_report.json
Not-tested: This commit reshapes documentation only; it does not change worker behavior.

authored 2026-06-04 14:16:18 +0800

Make the handoff start from the shortest verified recovery path · 061dd5e7 ...

Constraint: The session handoff had become information-rich enough that the next session still needed manual triage, so the opening section had to be collapsed to one verified command path.
Rejected: Keep the handoff primarily as a reading list | That would preserve context but not minimize restart latency.
Confidence: high
Scope-risk: narrow
Directive: Start future sessions with the planner validation runner before reading deeper docs unless the task explicitly skips validation.
Tested: git diff --check; verified docs/session-handoff.md now points to scripts/run_planner_validation_commands_live.py backed by fresh data/pgvector_eval/music20/planner_validation_commands_runner_report.json evidence (executed_count=4, all_passed=true)
Not-tested: No new code-path execution was needed in this commit because it reorganizes the already-verified startup flow.

authored 2026-06-04 14:14:47 +0800

Add a runner for all planner validation entrypoints · eb2ea03a ...

Constraint: The planner artifact had become executable, but future sessions still needed a reusable entrypoint instead of ad-hoc inline Python to consume it.
Rejected: Keep the execution proof as one-off shell snippets | That would not give the next session a durable command surface.
Confidence: high
Scope-risk: narrow
Directive: Use run_planner_validation_commands_live.py as the default preflight gate before attempting new Phase-1 worker changes on a host.
Tested: /usr/local/miniconda3/bin/python -m py_compile scripts/run_planner_validation_commands_live.py; git diff --check; /usr/local/miniconda3/bin/python scripts/run_planner_validation_commands_live.py --dsn 'postgres://d2:d2pass@127.0.0.1:5432/d2' --output data/pgvector_eval/music20/planner_validation_commands_runner_report.json
Not-tested: The runner only validates planner entrypoints; it does not unlock successful extraction on an environment-blocked host.
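
A runner of this kind can be sketched as below. This is an assumption-laden illustration rather than the actual run_planner_validation_commands_live.py: it assumes the planner artifact exposes `validation_commands` as a mapping from entrypoint name to a shell-style command string, and it reuses the `executed_count` and `all_passed` field names reported in this log. The two demo commands are placeholders.

```python
import shlex
import subprocess
import sys


def run_validation_commands(plan):
    """Run every command under plan['validation_commands'] and summarize."""
    results = []
    for name, command in plan.get("validation_commands", {}).items():
        proc = subprocess.run(shlex.split(command), capture_output=True, text=True)
        results.append({"name": name, "returncode": proc.returncode})
    return {
        "executed_count": len(results),
        # An empty plan is reported as not passed rather than vacuously true.
        "all_passed": bool(results) and all(r["returncode"] == 0 for r in results),
        "results": results,
    }


# Placeholder commands standing in for the real planner entrypoints.
python = shlex.quote(sys.executable)
demo_plan = {
    "validation_commands": {
        "always_ok": f"{python} -c pass",
        "always_fails": f'{python} -c "raise SystemExit(3)"',
    }
}
report = run_validation_commands(demo_plan)
print(report["executed_count"], report["all_passed"])
```

Treating an empty command list as a failure keeps a missing or truncated artifact from passing the preflight gate silently.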

authored 2026-06-04 14:12:50 +0800

Close the planner validation loop across all four live entrypoints · 8e2d4852 ...

Constraint: Partial execution proof for planner validation commands still left room for manual reconstruction risk, so the remaining entrypoints had to be exercised too.
Rejected: Stop after two executed planner commands | It would leave the negative matrix and asset-upsert entrypoints unproven.
Confidence: high
Scope-risk: narrow
Directive: Treat phase1_validation_commands_execution_report.json as the authoritative proof that the planner artifact is executable end-to-end.
Tested: git diff --check; /usr/local/miniconda3/bin/python - <<'PY' ... execute validation_commands.semantic_vector_negative_matrix and validation_commands.asset_level_upsert_validation from data/pgvector_eval/music20/phase1_extraction_plan_report.json ... PY
Not-tested: Individual extraction jobs still remain environment-blocked; this commit proves validation entrypoints, not successful feature extraction.

authored 2026-06-04 14:11:30 +0800

Prove planner validation commands execute without manual reconstruction · fa33c3a1 ...

Constraint: Adding validation_commands to the planner was only useful if the emitted commands could be consumed directly, so the plan artifact needed one more layer of execution proof.
Rejected: Assume command strings are correct because they look valid | That would leave restart automation unproven.
Confidence: high
Scope-risk: narrow
Directive: Prefer executing validation_commands from the planner artifact instead of retyping equivalent checks by hand.
Tested: git diff --check; /usr/local/miniconda3/bin/python - <<'PY' ... execute validation_commands.prereq_audit and validation_commands.worker_contract_smoke from data/pgvector_eval/music20/phase1_extraction_plan_report.json ... PY
Not-tested: The remaining planner validation commands were not executed in this commit, though their sibling commands proved the artifact is directly consumable.

authored 2026-06-04 14:10:01 +0800

Make the Phase-1 planner carry live validation entrypoints · 9b020339 ...

Constraint: The repo now has multiple verified smoke and audit scripts, so leaving them outside the planner would force future sessions to rediscover the right validation commands by reading docs.
Rejected: Document the commands only in markdown | That would drift from the executable plan artifact and slow restart execution.
Confidence: high
Scope-risk: narrow
Directive: Treat validation_commands in the planner as the first-stop entrypoints before running individual extraction jobs on a new host.
Tested: /usr/local/miniconda3/bin/python -m py_compile scripts/plan_phase1_extraction_jobs_live.py; git diff --check; /usr/local/miniconda3/bin/python scripts/plan_phase1_extraction_jobs_live.py --dsn 'postgres://d2:d2pass@127.0.0.1:5432/d2' --schema acr_test --job-status pending --output data/pgvector_eval/music20/phase1_extraction_plan_report.json
Not-tested: The planner still emits commands for an environment-blocked host and does not prove successful extraction by itself.

authored 2026-06-04 14:08:30 +0800

Turn Phase-1 host prerequisites into a live audit artifact · 58c29eaa ...

Constraint: Worker-contract validation is now stable enough that the remaining uncertainty is host readiness, so the next blocker had to be made explicit instead of inferred from repeated failed runs.
Rejected: Keep prerequisite knowledge only in prose | It would drift and force future sessions to rediscover the same missing mounts and packages.
Confidence: high
Scope-risk: narrow
Directive: Run the prerequisite audit before retrying live extraction so host blockers are measured once and reused across lanes.
Tested: /usr/local/miniconda3/bin/python -m py_compile scripts/run_phase1_prereq_audit_live.py; git diff --check; /usr/local/miniconda3/bin/python scripts/run_phase1_prereq_audit_live.py --dsn 'postgres://d2:d2pass@127.0.0.1:5432/d2' --schema acr_test --output data/pgvector_eval/music20/phase1_prereq_audit_report.json
Not-tested: This audit does not install dependencies or mount assets; it only reports readiness.
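
A report-only readiness check in the spirit of this audit might look like the sketch below. It is not the code of run_phase1_prereq_audit_live.py; the function name and report keys are hypothetical, and the directory and package names are the blockers named elsewhere in this log (/workspace/downloads, torch, torchaudio, transformers).

```python
import importlib.util
import os


def audit_prerequisites(required_dirs, required_modules):
    """Report missing mounts and packages without installing or mounting anything."""
    missing_dirs = [d for d in required_dirs if not os.path.isdir(d)]
    # find_spec returns None for a top-level module that cannot be imported.
    missing_modules = [
        m for m in required_modules if importlib.util.find_spec(m) is None
    ]
    return {
        "missing_dirs": missing_dirs,
        "missing_modules": missing_modules,
        "ready": not missing_dirs and not missing_modules,
    }


report = audit_prerequisites(
    required_dirs=["/workspace/downloads"],
    required_modules=["torch", "torchaudio", "transformers"],
)
print(report)
```

The check only inspects the host, so it can be rerun freely and its output reused across lanes.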

authored 2026-06-04 14:05:48 +0800

Add code-server variant as Dockerfile.cnb · 43d2f93a ...

Create Dockerfile.cnb based on the optimized Dockerfile with
code-server v4.123.0 + 10 VS Code extensions (golang, cnb-welcome,
code-runner, kubernetes, coding-copilot, github-theme, zh-hans-langpack,
vscode-icons, indent-rainbow, markdown-all-in-one).

Constraint: extensions install as user to ~/.local, then switch back to root
Confidence: high
Scope-risk: narrow
Tested: docker build succeeded, all 10 extensions installed OK

authored 2026-06-04 14:02:50 +0800

Make semantic vector-table misconfigurations fail with live evidence · 71bbe76f ...

Constraint: Phase-1 semantic jobs were already blocked by missing audio and model runtimes, so vector-table regressions needed their own isolated live proof to avoid being masked by the same environment failures.
Rejected: Infer vector-table coverage from code inspection only | It would not prove the worker writes the correct blocker reasons into PostgreSQL metadata.
Confidence: high
Scope-risk: narrow
Directive: When semantic extraction fails, inspect vector_table_report.reason before assuming the host is only missing mounts or model dependencies.
Tested: /usr/local/miniconda3/bin/python -m py_compile scripts/run_embedding_vector_table_negative_matrix_live.py; git diff --check; /usr/local/miniconda3/bin/python scripts/run_embedding_vector_table_negative_matrix_live.py --dsn 'postgres://d2:d2pass@127.0.0.1:5432/d2' --output data/pgvector_eval/music20/embedding_vector_table_negative_matrix_report.json
Not-tested: No successful semantic extraction path exists yet on this host; this commit validates negative preflight cases only.

authored 2026-06-04 14:02:20 +0800

Collapse Phase-1 worker validation into one live smoke entrypoint · 223f80ac ...

Constraint: Phase-1 now has multiple lane-specific validation scripts, so without a single smoke entrypoint the next session must manually reconstruct the current blocker picture.
Rejected: Keep exact and semantic checks separate only | It would slow restart diagnosis and hide the shared environment blockers.
Confidence: high
Scope-risk: narrow
Directive: Use the smoke entrypoint first on future sessions to distinguish contract regressions from missing mounts/runtime prerequisites.
Tested: /usr/local/miniconda3/bin/python -m py_compile scripts/run_phase1_worker_contract_smoke_live.py; git diff --check; /usr/local/miniconda3/bin/python scripts/run_phase1_worker_contract_smoke_live.py --dsn 'postgres://d2:d2pass@127.0.0.1:5432/d2' --schema acr_test --output data/pgvector_eval/music20/phase1_worker_contract_smoke_report.json
Not-tested: This smoke still reflects an environment-blocked host and does not prove successful extraction.

authored 2026-06-04 13:58:37 +0800

Prove asset-level embedding upserts against live PostgreSQL · 6ea7365b ...

Constraint: The schema already declared asset-level idempotency, but without live evidence future work could mistake it for an unverified design note.
Rejected: Rely on DDL inspection alone | It would not prove duplicate inserts are blocked and upserts reuse the same embedding row.
Confidence: high
Scope-risk: narrow
Directive: Keep asset-level writer implementations aligned with the verified ON CONFLICT (feature_set_id, asset_id) WHERE window_id IS NULL contract.
Tested: /usr/local/miniconda3/bin/python -m py_compile scripts/validate_audio_embedding_asset_upsert_live.py; git diff --check; /usr/local/miniconda3/bin/python scripts/validate_audio_embedding_asset_upsert_live.py --dsn 'postgres://d2:d2pass@127.0.0.1:5432/d2' --schema acr_asset_upsert_test --output data/pgvector_eval/music20/audio_embedding_asset_upsert_live_report.json
Not-tested: No production semantic writer uses the asset-level contract yet; this commit validates the DB contract, not an end-to-end extractor.
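
The ON CONFLICT (feature_set_id, asset_id) WHERE window_id IS NULL contract relies on a partial unique index. The sketch below illustrates the idea on an in-memory SQLite database, which accepts the same conflict-target syntax; it is not validate_audio_embedding_asset_upsert_live.py, and the `embedding_id` and `payload` columns and the index name are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE audio_embedding (
        embedding_id   INTEGER PRIMARY KEY,
        feature_set_id INTEGER NOT NULL,
        asset_id       INTEGER NOT NULL,
        window_id      INTEGER,          -- NULL marks an asset-level row
        payload        TEXT
    );
    -- Uniqueness applies only to asset-level rows (window_id IS NULL).
    CREATE UNIQUE INDEX uq_asset_level_embedding
        ON audio_embedding (feature_set_id, asset_id)
        WHERE window_id IS NULL;
    """
)


def upsert_asset_embedding(feature_set_id, asset_id, payload):
    """Insert or update the asset-level row and return its embedding_id."""
    conn.execute(
        "INSERT INTO audio_embedding (feature_set_id, asset_id, window_id, payload) "
        "VALUES (?, ?, NULL, ?) "
        "ON CONFLICT (feature_set_id, asset_id) WHERE window_id IS NULL "
        "DO UPDATE SET payload = excluded.payload",
        (feature_set_id, asset_id, payload),
    )
    conn.commit()
    return conn.execute(
        "SELECT embedding_id FROM audio_embedding "
        "WHERE feature_set_id = ? AND asset_id = ? AND window_id IS NULL",
        (feature_set_id, asset_id),
    ).fetchone()[0]


first_id = upsert_asset_embedding(1, 10, "v1")
second_id = upsert_asset_embedding(1, 10, "v2")  # reuses the same row
print(first_id == second_id)
```

The conflict target has to repeat the index predicate so the database can match it to the partial index; window-level rows with a non-NULL window_id are outside this constraint.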

authored 2026-06-04 13:57:11 +0800

Freeze a live blocker matrix for semantic extraction jobs · 015e3261 ...

Constraint: The current container still lacks mounted source audio and the semantic model runtimes, so repeated manual spot-checks are noisy and wasteful.
Rejected: Ad-hoc one-job validation only | It would not show whether failures are contract-wide or model-specific.
Confidence: high
Scope-risk: narrow
Directive: Re-run the matrix before claiming any semantic worker progress so blocker drift across MERT/MuQ/ECAPA is visible.
Tested: /usr/local/miniconda3/bin/python -m py_compile scripts/run_phase1_embedding_preflight_matrix_live.py; git diff --check; /usr/local/miniconda3/bin/python scripts/run_phase1_embedding_preflight_matrix_live.py --dsn 'postgres://d2:d2pass@127.0.0.1:5432/d2' --schema acr_test --output data/pgvector_eval/music20/phase1_embedding_preflight_matrix_report.json
Not-tested: This matrix still cannot prove successful semantic inference until assets and runtime dependencies are available.

authored 2026-06-04 13:54:15 +0800

Make semantic extraction failures auditable before model runtimes land · 399db601 ...

Constraint: Current container lacks /workspace/downloads and torch/torchaudio/transformers, so Phase-1 semantic work must prove honest failure semantics instead of pretending inference succeeded.
Rejected: Stub semantic embeddings | Would blur the contract between real model outputs and repo-local placeholders.
Confidence: high
Scope-risk: narrow
Directive: Keep the preflight blockers explicit until real MERT/MuQ/ECAPA adapters and asset-level embedding tests exist.
Tested: /usr/local/miniconda3/bin/python -m py_compile workers/run_embedding_job.py workers/run_chromaprint_job.py workers/_job_common.py scripts/bootstrap_phase1_extraction_jobs_live.py scripts/plan_phase1_extraction_jobs_live.py scripts/bootstrap_phase1_reference_members_live.py scripts/live_pgvector_music20_eval.py; git diff --check; /usr/local/miniconda3/bin/python scripts/bootstrap_phase1_extraction_jobs_live.py --dsn 'postgres://d2:d2pass@127.0.0.1:5432/d2' --schema acr_test; /usr/local/miniconda3/bin/python workers/run_embedding_job.py --dsn 'postgres://d2:d2pass@127.0.0.1:5432/d2' --schema acr_test --job-id 2 --model-name mert --model-version v1-95m --vector-table audio_embedding_vector_768 --output data/pgvector_eval/music20/phase1_worker_embedding_write_attempt.json
Not-tested: Real encoder inference and asset-level embedding upsert path remain unavailable in this container.

authored 2026-06-04 13:51:52 +0800

Optimize Dockerfile: reduce image size from 4.51GB to 2.16GB · 94d75e92 ...

Remove code-server, build-essential, gcc, libc6-dev, pkg-config, libssl-dev
from final stage. Add conda clean post-install in builder. Strip
unnecessary opencode platform binaries (musl/baseline variants) post-npm-install.
Remove redundant COPY layers for opencode (already covered by full
node directory copy). Keep opencode.exe entry point (Node.js bootstrap).

Constraint: buildkit crashes with 'frontend grpc server closed unexpectedly' on this host; legacy builder used
Confidence: high
Scope-risk: narrow
Directive: opencode.exe is the Node.js bootstrapper, not a Windows binary; do not delete
Tested: docker run --rm verified node/npm/bun/python/hx/claude/opencode/nvim all work

authored 2026-06-04 13:50:24 +0800

Make the exact lane fail honestly before real audio is mounted · 6a97ca13 ...

Constraint: the Phase-1 exact lane must not pretend success when reference audio is unreadable, and repeated writes must be idempotent at the database boundary.
Rejected: keep partial-success writes in completed state | it would blur asset-readability failures and weaken auditability.
Confidence: high
Scope-risk: moderate
Directive: preserve the repo-local chromaprint-style wording and the all-or-nothing failure semantics until production audio mounts and real extractor validation are in place.
Tested: py_compile for chromaprint matcher and chromaprint worker; live PostgreSQL unique index creation on acr_test; non-dry-run chromaprint worker attempt with job_status=failed and failure_reason=unreadable_audio_assets; bootstrap reset back to pending; architect review APPROVED.
Not-tested: successful audio_fingerprint writes against mounted production audio, semantic worker real writes, large-scale concurrent exact-lane execution.
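
The all-or-nothing failure semantics can be sketched as below. This is a hypothetical outline, not the chromaprint worker itself: the function names and the `unreadable_assets` and `fingerprints_written` keys are invented for illustration, while `job_status=failed` and `failure_reason=unreadable_audio_assets` are the values reported in this log.

```python
import os


def default_is_readable(path):
    """Treat an asset as readable only if it is an existing, readable file."""
    return os.path.isfile(path) and os.access(path, os.R_OK)


def run_exact_lane(asset_paths, is_readable=default_is_readable):
    """Fail the whole job if any asset is unreadable; never write a partial result."""
    unreadable = [p for p in asset_paths if not is_readable(p)]
    if unreadable:
        return {
            "job_status": "failed",
            "failure_reason": "unreadable_audio_assets",
            "unreadable_assets": unreadable,
            "fingerprints_written": 0,
        }
    # A real worker would extract and write fingerprints here.
    return {
        "job_status": "completed",
        "failure_reason": None,
        "unreadable_assets": [],
        "fingerprints_written": len(asset_paths),
    }


result = run_exact_lane(["/nonexistent/reference/track.wav"])
print(result["job_status"], result["failure_reason"])
```

Checking readability for the full scope before any write is what keeps a job from landing in a completed state with only some assets covered.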

authored 2026-06-04 13:38:45 +0800

Harden the Phase-1 worker contract before real extractors land · b4f304c1 ...

Constraint: planner outputs must be copy-runnable in the current environment and live PostgreSQL entrypoints must treat schema input as untrusted.
Rejected: defer state guards until real inference arrives | repeat execution and empty-scope drift would corrupt Phase-1 evidence now.
Confidence: high
Scope-risk: moderate
Directive: keep using the guarded job contract (expected status, schema validation, explicit python path) when replacing dry-run with real writes.
Tested: py_compile for live bootstrap/planner/worker scripts; live PostgreSQL bootstrap for model registry, reference members, and extraction jobs; regenerated extraction plan report; chromaprint + mert dry-run worker runs with scope=20; double-claim guard report returns non-zero; architect review APPROVED.
Not-tested: real fingerprint writes, real embedding writes, large-scale production reference-set ingestion beyond the 20-song acr_test sample.
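
Two of the guards named above, schema validation and the expected-status check behind the double-claim guard, can be sketched as follows. This is an illustration under assumptions, not the repository's worker code: SQLite stands in for PostgreSQL, the `job_id` column and the `running` status value are hypothetical, and the allow-list pattern is a deliberately conservative choice.

```python
import re
import sqlite3

# Conservative allow-list for schema names taken from untrusted input.
SCHEMA_NAME = re.compile(r"[a-z_][a-z0-9_]*")


def validate_schema(name):
    """Reject anything that is not a plain lowercase identifier."""
    if not SCHEMA_NAME.fullmatch(name):
        raise ValueError(f"invalid schema name: {name!r}")
    return name


def claim_job(conn, job_id, expected_status="pending"):
    """Move a job to 'running' only if it is still in the expected status."""
    cur = conn.execute(
        "UPDATE feature_extraction_job SET job_status = 'running' "
        "WHERE job_id = ? AND job_status = ?",
        (job_id, expected_status),
    )
    conn.commit()
    return cur.rowcount == 1


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE feature_extraction_job (job_id INTEGER PRIMARY KEY, job_status TEXT)"
)
conn.execute("INSERT INTO feature_extraction_job VALUES (1, 'pending')")

first_claim = claim_job(conn, 1)
second_claim = claim_job(conn, 1)  # already running, so the guard refuses
print(first_claim, second_claim)
```

A worker that gets a refused claim can exit non-zero, which matches the double-claim guard behavior reported in the Tested line.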

authored 2026-06-04 13:25:37 +0800

Make Phase-1 extraction jobs executable through PostgreSQL workers · 1b1096ae ...

Constraint: Phase-1 must stay encoder-only and use PostgreSQL as the orchestration/state plane before real extractor inference lands.
Rejected: implement real MERT/MuQ inference first | planner/job/state contracts were not yet executable or verified end-to-end.
Confidence: high
Scope-risk: moderate
Directive: preserve the worker job contract and replace dry-run incrementally with real fingerprint/embedding writes.
Tested: py_compile for new workers and planner; live PostgreSQL dry-run for chromaprint job 1 and mert job 2; planner report regeneration; bootstrap restore to pending; git diff --check.
Not-tested: real chromaprint extraction, real MERT/MuQ/ECAPA embedding writes, failed-job retry handling.

authored 2026-06-04 13:10:09 +0800

Attach runnable command templates to the extraction plan · 06794812 ...

Constraint: The Phase-1 PostgreSQL plan needed to become immediately actionable without pretending the workers already exist
Rejected: Keep the plan as ordering-only metadata | It still leaves the next session to reconstruct command wiring by hand
Confidence: high
Scope-risk: narrow
Directive: Keep future worker implementations compatible with the env-var contract emitted by the planner report
Tested: /usr/local/miniconda3/bin/python scripts/plan_phase1_extraction_jobs_live.py --dsn 'postgres://d2:d2pass@127.0.0.1:5432/d2' --schema acr_test --job-status pending --output data/pgvector_eval/music20/phase1_extraction_plan_report.json; /usr/local/miniconda3/bin/python -m py_compile scripts/plan_phase1_extraction_jobs_live.py; git diff --check -- acr-engine/scripts/plan_phase1_extraction_jobs_live.py acr-engine/data/pgvector_eval/music20/phase1_extraction_plan_report.json docs/model-feature-registry-bootstrap.md docs/postgres_db_schema_samples.md docs/session-handoff.md docs/CHANGELOG.md
Not-tested: Real worker binaries at workers/run_chromaprint_job.py and workers/run_embedding_job.py do not exist yet

authored 2026-06-04 12:57:39 +0800

Generate a live execution plan from pending extraction jobs · f13caa3e ...

Constraint: Ralph must keep turning PostgreSQL state into concrete next-step artifacts rather than leaving implied manual steps
Rejected: Stop at creating pending jobs only | It still leaves future sessions to infer ordering and physical targets by hand
Confidence: high
Scope-risk: narrow
Directive: Treat the planner report as the canonical bridge between pending jobs and real extraction workers
Tested: /usr/local/miniconda3/bin/python scripts/plan_phase1_extraction_jobs_live.py --dsn 'postgres://d2:d2pass@127.0.0.1:5432/d2' --schema acr_test --job-status pending --output data/pgvector_eval/music20/phase1_extraction_plan_report.json; /usr/local/miniconda3/bin/python -m py_compile scripts/plan_phase1_extraction_jobs_live.py; git diff --check -- acr-engine/scripts/plan_phase1_extraction_jobs_live.py acr-engine/data/pgvector_eval/music20/phase1_extraction_plan_report.json docs/model-feature-registry-bootstrap.md docs/postgres_db_schema_samples.md docs/session-handoff.md docs/CHANGELOG.md
Not-tested: Actual worker that consumes the plan to run MERT/MuQ/Chromaprint extraction end-to-end

authored 2026-06-04 12:54:11 +0800

Create live Phase-1 extraction jobs in PostgreSQL · 5be68c1d ...

Constraint: Continue Phase-1 industrialization without waiting on missing audio mounts, and keep every Ralph step documented and pushed
Rejected: Leave extraction scheduling as an implicit next step after registry bootstrap | It forces future sessions to reconstruct pending jobs by hand
Confidence: high
Scope-risk: narrow
Directive: Use feature_extraction_job as the canonical handoff between registry bootstrap and actual encoder extraction runs
Tested: /usr/local/miniconda3/bin/python scripts/bootstrap_phase1_extraction_jobs_live.py --dsn 'postgres://d2:d2pass@127.0.0.1:5432/d2' --schema acr_test --output data/pgvector_eval/music20/phase1_extraction_jobs_report.json; /usr/local/miniconda3/bin/python -m py_compile scripts/bootstrap_phase1_extraction_jobs_live.py; git diff --check -- acr-engine/scripts/bootstrap_phase1_extraction_jobs_live.py acr-engine/data/pgvector_eval/music20/phase1_extraction_jobs_report.json docs/model-feature-registry-bootstrap.md docs/postgres_db_schema_samples.md docs/session-handoff.md docs/CHANGELOG.md
Not-tested: Downstream worker that consumes these pending jobs to run real MERT/MuQ extraction

authored 2026-06-04 12:49:45 +0800

Prove the Phase-1 registry bootstrap is idempotent · f0c82687 ...

Constraint: Ralph follow-up work must keep producing audit-ready evidence and a pushed trail for the next session
Rejected: Assume the new bootstrap script is safe to rerun without proof | Duplicate feature-set inserts would erode trust in the PostgreSQL bootstrap path
Confidence: high
Scope-risk: narrow
Directive: Re-run registry bootstrap in-place before future extraction jobs and treat count drift as a regression signal
Tested: /usr/local/miniconda3/bin/python scripts/bootstrap_phase1_model_registry_live.py --dsn 'postgres://d2:d2pass@127.0.0.1:5432/d2' --schema acr_test --output data/pgvector_eval/music20/phase1_registry_bootstrap_report.json (run twice); /usr/local/miniconda3/bin/python -m py_compile scripts/bootstrap_phase1_model_registry_live.py; git diff --check -- acr-engine/scripts/bootstrap_phase1_model_registry_live.py acr-engine/data/pgvector_eval/music20/phase1_registry_bootstrap_report.json acr-engine/data/pgvector_eval/music20/phase1_registry_bootstrap_idempotency_report.json docs/model-feature-registry-bootstrap.md docs/postgres_db_schema_samples.md docs/session-handoff.md docs/CHANGELOG.md
Not-tested: Actual downstream MERT/MuQ extraction after bootstrap, missing business sample mount recovery

authored 2026-06-04 12:47:24 +0800

Bootstrap the Phase-1 model registry on live PostgreSQL · fef8f438 ...

Constraint: Continue the Ralph loop without waiting on missing business sample mounts, while still leaving a push-ready implementation and documentation trail
Rejected: Keep Phase-1 registry setup as static SQL snippets only | It slows live validation and leaves no machine-checkable bootstrap path
Confidence: high
Scope-risk: narrow
Directive: Treat model_registry/feature_set_registry/reference_set_registry as the mandatory entrypoint before any future MERT/MuQ extraction jobs
Tested: /usr/local/miniconda3/bin/python scripts/bootstrap_phase1_model_registry_live.py --dsn 'postgres://d2:d2pass@127.0.0.1:5432/d2' --schema acr_test --output data/pgvector_eval/music20/phase1_registry_bootstrap_report.json; /usr/local/miniconda3/bin/python -m py_compile scripts/bootstrap_phase1_model_registry_live.py; git diff --check -- acr-engine/scripts/bootstrap_phase1_model_registry_live.py acr-engine/data/pgvector_eval/music20/phase1_registry_bootstrap_report.json docs/model-feature-registry-bootstrap.md docs/postgres_db_schema_samples.md docs/session-handoff.md docs/CHANGELOG.md
Not-tested: Actual MERT/MuQ embedding extraction, hard-case type_8/type_16 live queries, multi-recording/cover-lane retrieval

authored 2026-06-04 12:44:49 +0800
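
One way to enforce the Directive above is a guard that refuses to start extraction until all three registries are populated. The sketch below is an assumption about how such a guard might look, again using sqlite3 in place of PostgreSQL. The three table names come from the commit message; the single name column, the example row, and the functions missing_registries and require_registries are hypothetical.

```python
import sqlite3

# Table names taken from the commit message.
REQUIRED_REGISTRIES = (
    "model_registry",
    "feature_set_registry",
    "reference_set_registry",
)


def missing_registries(conn):
    """Return the registry tables that are absent or empty."""
    missing = []
    for table in REQUIRED_REGISTRIES:
        try:
            (count,) = conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()
        except sqlite3.OperationalError:
            # Raised by sqlite3 when the table does not exist.
            count = 0
        if count == 0:
            missing.append(table)
    return missing


def require_registries(conn):
    """Raise before any extraction job if the bootstrap is incomplete."""
    missing = missing_registries(conn)
    if missing:
        raise RuntimeError("registry bootstrap incomplete: " + ", ".join(missing))


# Hypothetical partially bootstrapped database: one populated registry,
# one empty registry, one registry table missing entirely.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE model_registry (name TEXT PRIMARY KEY)")
conn.execute("INSERT INTO model_registry VALUES ('example_model')")
conn.execute("CREATE TABLE feature_set_registry (name TEXT PRIMARY KEY)")
gaps = missing_registries(conn)
print(gaps)
```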

Record the current blocker for hard-case live samples · ea51b9c1 ...

Constraint: Each Ralph follow-up change must leave a documented, pushed trail for the next session
Rejected: Keep the missing /workspace/downloads discovery only in transient shell output | It would be rediscovered and waste the next session
Confidence: high
Scope-risk: narrow
Directive: Treat hard-case live evaluation as environment-dependent until business sample mounts are restored
Tested: git diff --check -- docs/postgres_db_schema_samples.md docs/session-handoff.md docs/CHANGELOG.md; ls -ld /workspace/downloads => no such file or directory
Not-tested: Restoring or remounting the missing business sample directory

authored 2026-06-04 12:40:52 +0800
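
Treating hard-case evaluation as environment-dependent can be made explicit in code, so a missing mount is recorded as a blocker instead of surfacing as a crash. The following is only a sketch: the function hard_case_status and the shape of the returned dictionary are hypothetical, and the demo uses a temporary directory rather than /workspace/downloads.

```python
import os
import tempfile


def hard_case_status(downloads_dir):
    """Report whether hard-case evaluation can run in this environment."""
    if os.path.isdir(downloads_dir):
        return {"status": "ready", "downloads_dir": downloads_dir}
    # Record the blocker so the next session does not have to rediscover it.
    return {
        "status": "blocked",
        "downloads_dir": downloads_dir,
        "reason": "sample mount not found",
    }


with tempfile.TemporaryDirectory() as present:
    ready = hard_case_status(present)
    absent = hard_case_status(os.path.join(present, "missing"))
print(ready["status"], absent["status"])
```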

Harden lineage validation evidence for the PostgreSQL ACR path · e54e2ff2 ...

Constraint: Each follow-up Ralph edit must update docs and preserve a push-ready, auditable validation trail
Rejected: Stop at a single audio_window negative test | It left recording/audio_embedding trigger coverage and report readability weaker than needed
Confidence: high
Scope-risk: narrow
Directive: Keep live retrieval reports self-explanatory enough for reviewers who only inspect JSON artifacts
Tested: /usr/local/miniconda3/bin/python scripts/live_pgvector_music20_eval.py --dsn 'postgres://d2:d2pass@127.0.0.1:5432/d2' --schema acr_test --reset-schema --output data/pgvector_eval/music20/live_pgvector_report.json; /usr/local/miniconda3/bin/python -m py_compile scripts/live_pgvector_music20_eval.py; git diff --check -- acr-engine/scripts/live_pgvector_music20_eval.py acr-engine/data/pgvector_eval/music20/live_pgvector_report.json docs/postgres_db_schema_samples.md docs/CHANGELOG.md docs/session-handoff.md
Not-tested: type_8/type_16 live JSONL coverage, MERT/MuQ live embeddings, multi-recording/cover-lane decision flow

authored 2026-06-04 12:39:05 +0800
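
A lineage negative test of the kind this commit extends checks that a row without a valid parent is rejected, and reports the outcome in a form a reviewer can read directly. The sketch below illustrates the idea with a sqlite3 foreign-key constraint, which is a substitute for the PostgreSQL triggers the commit mentions, not a reproduction of them. The table names recording and audio_window appear in the commit log; the columns and the helper insert_is_rejected are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is enabled.
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE recording (recording_id INTEGER PRIMARY KEY)")
conn.execute(
    "CREATE TABLE audio_window ("
    "window_id INTEGER PRIMARY KEY, "
    "recording_id INTEGER NOT NULL REFERENCES recording(recording_id))"
)
conn.execute("INSERT INTO recording (recording_id) VALUES (1)")


def insert_is_rejected(recording_id):
    """Return True if inserting a window for this recording is refused."""
    try:
        conn.execute(
            "INSERT INTO audio_window (recording_id) VALUES (?)",
            (recording_id,),
        )
    except sqlite3.IntegrityError:
        return True
    return False


# Self-explanatory report: one positive case, one negative (orphan) case.
results = {
    "valid_parent_rejected": insert_is_rejected(1),
    "orphan_rejected": insert_is_rejected(999),
}
print(results)
```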

Validate the PostgreSQL ACR storage path with live evidence · 96c9ce7d ...

Constraint: The new data model had to be proven against the user-provided PostgreSQL instance and stay aligned with Phase-1 encoder-only decisions
Rejected: Document-only schema guidance without a live database run | It would leave retrieval correctness and table intent unproven
Confidence: high
Scope-risk: narrow
Directive: Keep future retrieval experiments writing through model/feature/reference registries instead of adding fixed per-model columns
Tested: /usr/local/miniconda3/bin/python scripts/live_pgvector_music20_eval.py --dsn 'postgres://d2:d2pass@127.0.0.1:5432/d2' --schema acr_test --reset-schema --output data/pgvector_eval/music20/live_pgvector_report.json; /usr/local/miniconda3/bin/python scripts/evaluate_songid_pgvector_path.py --reference-embeddings-jsonl data/pgvector_eval/music20/reference_embeddings.jsonl --query-embeddings-jsonl data/pgvector_eval/music20/query_embeddings.jsonl --output data/pgvector_eval/music20/songid_eval_report_fresh.json; /usr/local/miniconda3/bin/python -m py_compile scripts/live_pgvector_music20_eval.py scripts/evaluate_songid_pgvector_path.py; git diff --check -- docs/README.md docs/CHANGELOG.md docs/postgres_db_schema_samples.md acr-engine/scripts/live_pgvector_music20_eval.py acr-engine/data/pgvector_eval/music20/live_pgvector_report.json acr-engine/data/pgvector_eval/music20/songid_eval_report_fresh.json
Not-tested: MERT/MuQ live embeddings, type_8/type_16 live JSONL coverage, multi-recording/cover-lane decision flow

authored 2026-06-04 12:20:15 +0800

Update 1 · b220751b

cnb.bofCdSsphPA authored 2026-06-04 11:43:33 +0800

Preserve a fast handoff entrypoint for the ACR roadmap · d8fd2d15 ...

Constraint: The startup handoff must reflect the new Phase-1 encoder-only and PostgreSQL v2 decisions without carrying stale timeline noise
Rejected: Keep appending runtime logs to session-handoff.md | It obscures the current start point for the next session
Confidence: high
Scope-risk: narrow
Directive: Keep session-handoff.md focused on where to resume next, and move detailed chronology into changelog/history docs
Tested: git diff --check -- docs/session-handoff.md docs/CHANGELOG.md
Not-tested: No link checker or markdown linter was run

authored 2026-06-04 11:17:46 +0800

Make the Phase-1 ACR plan executable for each delivery role · 4b23f546 ...

Constraint: The architecture and schema docs were already in place, but teams still lacked a concrete implementation checklist and registry bootstrap contract for encoder-only rollout
Rejected: Leaving execution guidance implicit in architecture prose | It would slow Phase-1 delivery and cause inconsistent model/feature initialization
Confidence: high
Scope-risk: narrow
Directive: Treat Phase-1 implementation sequencing and model/feature/reference-set bootstrap as first-class docs that evolve with the schema
Tested: git diff --check on changed docs; Python document sanity check; README/CHANGELOG link coverage verified with rg
Not-tested: No runtime behavior changed; no database apply was executed

authored 2026-06-04 11:13:31 +0800

Keep the new ACR architecture guide clean for follow-up edits · e514a6c7 ...

Constraint: The documentation refactor was already pushed and only needed a formatting-only hygiene follow-up
Rejected: Leaving known markdown whitespace debt in the freshly introduced guide | It would add avoidable noise to future reviews
Confidence: high
Scope-risk: narrow
Directive: Keep the new role-oriented architecture docs diff-clean so future schema/model edits stay reviewable
Tested: git diff --check on docs/acr-architecture.md
Not-tested: Content semantics are unchanged; no runtime effects

authored 2026-06-04 11:07:22 +0800

Clarify the ACR evolution path and freeze a production-grade data model · a549d1de ...

Constraint: Phase-1 must support encoder-only open-source backbones without destabilizing future schema evolution
Rejected: Extending the old flat song_id + fixed-vector schema | It would couple model swaps to schema rewrites and weaken copyright lineage
Confidence: high
Scope-risk: moderate
Directive: Treat canonical_song/work/recording/recording_asset/audio_window plus model/feature registries as the stable contract; evolve models and indexes around them
Tested: git diff --check on changed files; Python content/structure sanity check; architect review APPROVED; README link coverage and DDL object presence verified
Not-tested: Live PostgreSQL apply was not run because psql is unavailable in this environment

authored 2026-06-04 11:06:42 +0800

03 Jun, 2026 1 commit

Add the song_id pgvector evaluation scaffolding · 2898ef26 ...

Constraint: We need a song-level evaluation path that matches the future pgvector production shape before moving off the local FAISS proving lane
Rejected: Jumping straight to a live pgvector-only implementation | We still need a reproducible repo-local evaluation harness and artifact trail first
Confidence: high
Scope-risk: moderate
Directive: Keep future pgvector work song_id-first and measure each query type separately before aggregating product claims
Tested: /usr/local/miniconda3/bin/python -m unittest discover -s acr-engine/tests -v; /usr/local/miniconda3/bin/python acr-engine/scripts/export_workspace_music20_embeddings_jsonl.py --downloads-dir /workspace/downloads --song-limit 20 --out-dir acr-engine/data/pgvector_eval/music20; /usr/local/miniconda3/bin/python acr-engine/scripts/evaluate_songid_pgvector_path.py --reference-embeddings-jsonl acr-engine/data/pgvector_eval/music20/reference_embeddings.jsonl --query-embeddings-jsonl acr-engine/data/pgvector_eval/music20/query_embeddings.jsonl --output acr-engine/data/pgvector_eval/music20/songid_eval_report.json
Not-tested: Live PostgreSQL/pgvector online retrieval path

authored 2026-06-03 18:13:59 +0800
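
Measuring each query type separately, as the Directive asks, amounts to grouping hits by query type before computing any aggregate. The sketch below shows that grouping for top-1 song_id retrieval with cosine similarity over in-memory vectors. It is not the logic of evaluate_songid_pgvector_path.py: the record fields (song_id, query_type, embedding), the function names, and the toy vectors are all hypothetical, and the actual JSONL layout is not shown in the log. The labels type_8 and type_16 are borrowed from the later commits.

```python
import math
from collections import defaultdict


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top1_by_query_type(references, queries):
    """Return {query_type: top-1 song_id accuracy}, one entry per type."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for q in queries:
        # Nearest reference by cosine similarity.
        best = max(references, key=lambda r: cosine(q["embedding"], r["embedding"]))
        totals[q["query_type"]] += 1
        hits[q["query_type"]] += int(best["song_id"] == q["song_id"])
    return {t: hits[t] / totals[t] for t in totals}


# Toy data for illustration only.
references = [
    {"song_id": "song_a", "embedding": [1.0, 0.0]},
    {"song_id": "song_b", "embedding": [0.0, 1.0]},
]
queries = [
    {"song_id": "song_a", "query_type": "type_8", "embedding": [0.9, 0.1]},
    {"song_id": "song_b", "query_type": "type_8", "embedding": [0.2, 0.8]},
    {"song_id": "song_a", "query_type": "type_16", "embedding": [0.1, 0.9]},
    {"song_id": "song_b", "query_type": "type_16", "embedding": [0.3, 0.7]},
]
report = top1_by_query_type(references, queries)
print(report)
```

Keeping the per-type numbers in the report makes it visible when one query type drags down an otherwise healthy aggregate.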