Commits · 8265f961abf69f08a78fc4ed14778ae272627637 · wanghai-tech / hikoon-ACR

04 Jun, 2026 40 commits

Why the handoff must distinguish current model coverage from historical test rows · 8265f961 ...

Constraint: The live schema contains historical placeholder and fallback rows that must not be mistaken for the current baseline
Rejected: Relying on informal memory of which rows are current | too easy for future sessions to misread the database
Confidence: high
Scope-risk: narrow
Directive: When live schemas keep historical test data, always document the exact SQL that identifies the current baseline coverage
Tested: markdown link check under docs; live SQL counts for chromaprint_matcher, mert-v1-95m, and muq-large-msd-iter
Not-tested: Full schema cleanup of historical test rows
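
The directive above asks for exact SQL that separates current baseline rows from historical ones. The following is a minimal sketch of such a count, using an in-memory SQLite database as a stand-in for the live PostgreSQL. The three model names come from the Tested line; the simplified `feature_fact` layout and the idea that these three names alone define the current baseline are assumptions of this sketch, not the repository's actual query.

```python
import sqlite3

# Model names taken from the commit's "Tested" line. Treating exactly these
# three as the current baseline is an assumption of this sketch.
CURRENT_MODELS = ("chromaprint_matcher", "mert-v1-95m", "muq-large-msd-iter")

COVERAGE_SQL = """
SELECT model_name, COUNT(*) AS n_rows
FROM feature_fact
WHERE model_name IN (?, ?, ?)
GROUP BY model_name
ORDER BY model_name
"""

def baseline_coverage(conn):
    """Return {model_name: row_count} for current-baseline models only."""
    return dict(conn.execute(COVERAGE_SQL, CURRENT_MODELS).fetchall())

# Demo data: a simplified, hypothetical feature_fact table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE feature_fact (feature_id INTEGER PRIMARY KEY, model_name TEXT)"
)
conn.executemany(
    "INSERT INTO feature_fact (model_name) VALUES (?)",
    [
        ("mert-v1-95m",),
        ("mert-v1-95m",),
        ("chromaprint_matcher",),
        ("local_wavehash_embed",),  # fallback row that must not be counted
    ],
)
coverage = baseline_coverage(conn)
print(coverage)
```

A model that is absent from the result (here `muq-large-msd-iter`) has zero current rows, which is itself useful handoff information.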

authored 2026-06-04 16:15:24 +0800

Why the handoff should include a concrete live feature lineage example · 797f9032 ...

Constraint: Future sessions need a zero-ambiguity PostgreSQL example that matches the current live song-centric pipeline
Rejected: Only describing lineage abstractly | forces re-verification every session
Confidence: high
Scope-risk: narrow
Directive: Prefer concrete live feature/window/asset/song examples in handoff docs whenever the default path changes
Tested: markdown link check under docs; live SQL verification for feature_id=34 lineage; manifest feature sample extraction
Not-tested: Re-running the full directory pipeline in this commit
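
A lineage example of the kind this commit describes could look like the sketch below. It walks one feature row back through window, asset, and song. Only `feature_id=34`, the four table names, and the `object_id`/`song_id`/`model_name` columns appear in the commit log; every other column (`object_type`, `parent_object_id`, `entity_type`) and all other ids are hypothetical, and SQLite stands in for the live PostgreSQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE media_entity (
    entity_id INTEGER PRIMARY KEY,
    entity_type TEXT,
    title TEXT
);
CREATE TABLE audio_object (
    object_id INTEGER PRIMARY KEY,
    object_type TEXT,            -- 'asset' or 'window' (assumed type field)
    parent_object_id INTEGER,    -- window -> asset; NULL for assets (assumed)
    song_id INTEGER
);
CREATE TABLE feature_fact (
    feature_id INTEGER PRIMARY KEY,
    object_id INTEGER,
    song_id INTEGER,
    model_name TEXT
);
INSERT INTO media_entity VALUES (1, 'song', 'demo song');
INSERT INTO audio_object VALUES (10, 'asset', NULL, 1);
INSERT INTO audio_object VALUES (11, 'window', 10, 1);
INSERT INTO feature_fact VALUES (34, 11, 1, 'mert-v1-95m');
""")

# feature_fact -> window -> asset -> song, resolved in one query.
LINEAGE_SQL = """
SELECT f.feature_id, f.model_name,
       w.object_id AS window_id,
       a.object_id AS asset_id,
       s.entity_id AS song_id
FROM feature_fact f
JOIN audio_object w ON w.object_id = f.object_id AND w.object_type = 'window'
JOIN audio_object a ON a.object_id = w.parent_object_id AND a.object_type = 'asset'
JOIN media_entity s ON s.entity_id = f.song_id
WHERE f.feature_id = ?
"""
lineage = conn.execute(LINEAGE_SQL, (34,)).fetchone()
print(lineage)
```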

authored 2026-06-04 16:14:12 +0800

Why the Phase-1 docs must explain feature-to-window binding explicitly · 6ee8c576 ...

Constraint: The current default must stay aligned with the live 4-table song-centric path and the real MERT baseline
Rejected: Re-expanding old multi-layer docs | increases onboarding cost and reintroduces stale states
Confidence: high
Scope-risk: narrow
Directive: Keep future schema docs anchored to live model_name/feature_set_name facts, not aspirational placeholders
Tested: markdown link check under docs; live PostgreSQL spot-check of feature_fact model_name/object_id/song_id lineage
Not-tested: Mermaid rendering in external markdown viewers

authored 2026-06-04 16:11:59 +0800

Why the handoff must record MuQ's real blocker, not just its target model name · 08d24bd4 ...

Constraint: The next challenger session should not misdiagnose the current MuQ state as 'package missing' when the actual blocker is a torchvision::nms runtime error
Rejected: Keep only the MuQ target name in docs | would hide the concrete runtime failure already observed on this host
Confidence: high
Scope-risk: narrow
Directive: Treat torchvision::nms import failure as the current MuQ blocker until a working import path is verified on this host
Tested: markdown link check on /workspace/docs after updating MuQ blocker notes
Not-tested: MuQ runtime import is still failing; this commit only records the verified blocker
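
To record a blocker like this as the actual exception rather than a generic "package missing", an import probe can capture the exception type and message. This is a minimal sketch; `probe_import` is a hypothetical helper, not a script in the repository, and the demo deliberately probes a stdlib module and a made-up module name instead of the real MuQ stack.

```python
import importlib

def probe_import(module_name):
    """Try to import a module and report the outcome instead of raising."""
    try:
        importlib.import_module(module_name)
    except Exception as exc:  # ImportError, RuntimeError, and similar
        return {
            "module": module_name,
            "ok": False,
            "error": f"{type(exc).__name__}: {exc}",
        }
    return {"module": module_name, "ok": True, "error": None}

# On the host one would pass the real module names (for example
# "torchvision"); the names below only demonstrate both outcomes.
report = [probe_import(name) for name in ("json", "no_such_module_for_demo")]
for entry in report:
    print(entry)
```

Keeping the exception class in the report distinguishes a missing package (`ModuleNotFoundError`) from a runtime failure inside an installed one.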

authored 2026-06-04 16:06:28 +0800

Why the handoff should include MuQ's official loading entrypoint · 3913b97f ...

Constraint: The next challenger implementation step should not rediscover MuQ's official model identifier and loader after the MERT baseline is already live
Rejected: Keep only the model name | still leaves the next session to infer the intended loading API from scratch
Confidence: medium
Scope-risk: narrow
Directive: Use the official MuQ loading entrypoint in future challenger work unless repo-local evidence establishes a better integration path
Tested: markdown link check on /workspace/docs after updating MuQ handoff notes
Not-tested: MuQ adapter implementation itself is still pending

authored 2026-06-04 16:00:15 +0800

Why the handoff should name the likely MuQ challenger explicitly · 3be95b9a ...

Constraint: The next implementation step should not waste time rediscovering the most likely MuQ model identifier after the MERT baseline is already live
Rejected: Leave MuQ as a generic future challenger | forces the next session to repeat model-name discovery work
Confidence: medium
Scope-risk: narrow
Directive: Treat OpenMuQ/MuQ-large-msd-iter as the first MuQ challenger candidate to try unless newer repo-local evidence supersedes it
Tested: markdown link check on /workspace/docs after updating handoff notes
Not-tested: MuQ model loading itself is still pending

authored 2026-06-04 15:57:19 +0800

Why the song-centric semantic lane must move from placeholder to a real MERT baseline · 80df0d30 ...

Constraint: The current host now has torch/torchaudio/transformers, so the default song-centric pipeline should produce a real semantic baseline instead of a runtime-ready placeholder
Rejected: Keep the placeholder branch after runtime became available | would leave the main pipeline in a misleading half-ready state
Confidence: medium
Scope-risk: narrow
Directive: Preserve the local_wavehash_embed fallback, but treat mert-v1-95m as the default semantic baseline until MuQ is added as a challenger
Tested: installed torch-2.12.0+cpu, torchaudio-2.11.0+cpu, transformers-5.10.1; py_compile for enrich_songcentric_manifest_with_local_features.py; reran song-centric pipeline; verified latest embedding rows are mert-v1-95m; markdown link check on /workspace/docs
Not-tested: MuQ adapter implementation and production vector-table persistence are still pending

authored 2026-06-04 15:53:24 +0800

Why the handoff must reflect the runtime-ready semantic state · b0c52b54 ...

Constraint: The latest host state is no longer fallback-only; docs must show that torch/torchaudio/transformers are installed and the song-centric pipeline now reaches the runtime-ready placeholder branch
Rejected: Keep the old missing-dependency handoff | would mislead the next session into debugging an already-cleared blocker
Confidence: high
Scope-risk: narrow
Directive: Keep future handoff notes aligned with the latest runner report and import status before planning semantic-adapter implementation
Tested: installed torch-2.12.0+cpu, torchaudio-2.11.0+cpu, transformers-5.10.1; reran song-centric pipeline; markdown link check on /workspace/docs
Not-tested: Real MERT/MuQ adapter implementation is still pending; current semantic output is the runtime-ready placeholder

authored 2026-06-04 15:44:07 +0800

Why the handoff needs the exact semantic adapter insertion point · 21388b99 ...

Constraint: The next session should not spend time rediscovering where the real MERT/MuQ adapter belongs in the song-centric pipeline
Rejected: Leave the adapter step as a generic future task | does not identify the concrete file and function to change
Confidence: high
Scope-risk: narrow
Directive: Keep future semantic-adapter handoffs anchored on enrich_songcentric_manifest_with_local_features.py unless the host pipeline entrypoint changes
Tested: markdown link check on /workspace/docs after adding the semantic adapter handoff note
Not-tested: No runtime install or adapter implementation yet; this commit records the verified insertion point only

authored 2026-06-04 15:18:56 +0800

Why the docs need an explicit vector payload storage contract · e6c2e0a1 ...

Constraint: Phase-1 embedding storage must explain where vectors live, how feature_fact points to them, and how hot/cold migration works at scale
Rejected: Leave vector payload placement implicit | does not give operators a stable contract for ANN loading, backfill, or cleanup
Confidence: high
Scope-risk: narrow
Directive: Keep embedding payload guidance split between feature_fact metadata and vector-side storage unless the physical schema default changes
Tested: markdown link check on /workspace/docs after adding vector payload storage and lifecycle guidance
Not-tested: No live database rerun; this is a documentation-only clarification over the current schema path

authored 2026-06-04 15:16:20 +0800

Why the docs need a concrete scale-up ingestion and indexing strategy · df241d20 ...

Constraint: A Phase-1 system holding one million (100w) audio files needs a default order for ingest, exact coverage, semantic backfill, and index build timing on the current schema
Rejected: Leave scale-up as implied by the small examples | does not guide batch execution or gap audits at production volume
Confidence: high
Scope-risk: narrow
Directive: Keep future scale docs anchored on main-chain-first ingestion and model-by-model feature backfill unless benchmark evidence proves otherwise
Tested: markdown link check on /workspace/docs after adding scale-up ingestion, indexing, and audit SQL guidance
Not-tested: No live database rerun; this is a documentation-only expansion over the verified schema path

authored 2026-06-04 15:14:46 +0800

Why the schema samples need a complete multi-asset multi-model example · b624273c ...

Constraint: Phase-1 implementers need one concrete end-to-end sample that shows how a single song expands into multiple assets, windows, and model facts
Rejected: Keep only isolated insert snippets | does not help with batch backfill or completeness checks
Confidence: high
Scope-risk: narrow
Directive: When extending storage examples, include operational queries for gap detection and model completeness, not just inserts
Tested: markdown link check on /workspace/docs after adding the complete sample and audit SQL
Not-tested: No live database rerun; this is a documentation-only expansion over the verified schema
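
A gap-detection query of the kind the directive calls for might be sketched as follows: list every window that has no `feature_fact` row for a given model. The column names other than `object_id` and `model_name` are assumptions, and SQLite stands in for the live PostgreSQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE audio_object (object_id INTEGER PRIMARY KEY, object_type TEXT);
CREATE TABLE feature_fact (
    feature_id INTEGER PRIMARY KEY,
    object_id INTEGER,
    model_name TEXT
);
INSERT INTO audio_object VALUES (1, 'window'), (2, 'window'), (3, 'asset');
INSERT INTO feature_fact (object_id, model_name) VALUES
    (1, 'chromaprint_matcher'),
    (1, 'mert-v1-95m'),
    (2, 'chromaprint_matcher');
""")

# Windows with no feature row for the requested model (anti-join).
GAP_SQL = """
SELECT w.object_id
FROM audio_object w
LEFT JOIN feature_fact f
       ON f.object_id = w.object_id AND f.model_name = ?
WHERE w.object_type = 'window' AND f.feature_id IS NULL
ORDER BY w.object_id
"""

def windows_missing(conn, model_name):
    """Return window ids that still need a feature from model_name."""
    return [row[0] for row in conn.execute(GAP_SQL, (model_name,))]

missing_mert = windows_missing(conn, "mert-v1-95m")
missing_chroma = windows_missing(conn, "chromaprint_matcher")
print(missing_mert, missing_chroma)
```

Running the same query once per model gives a model-by-model backfill list.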

authored 2026-06-04 15:13:22 +0800

Why the docs need explicit bindings between audio objects and feature facts · 75f156b8 ...

Constraint: Phase-1 implementers need one concrete explanation for how songs, assets, windows, and open-model features are linked and stored
Rejected: Rely on schema columns alone | does not show the intended per-model storage pattern for the current encoder-only phase
Confidence: high
Scope-risk: narrow
Directive: Keep future model-onboarding docs grounded in feature_fact object_id/song_id bindings unless the schema default changes
Tested: markdown link check on /workspace/docs after adding binding diagrams and SQL storage examples
Not-tested: No live database rerun; this is a documentation clarification over an already-verified schema

authored 2026-06-04 15:12:13 +0800

Why the docs need a first-pass fusion strategy, not just storage paths · 43644ac8 ...

Constraint: Phase-1 retrieval must explain how exact and semantic candidates become a ranked song list on the current 4-table schema
Rejected: Defer fusion guidance until model integration finishes | leaves implementation teams without a default ranking contract
Confidence: high
Scope-risk: narrow
Directive: Keep Phase-1 ranking docs exact-led and evidence-oriented until measured recall data justifies a different default
Tested: markdown link check on /workspace/docs after adding fusion diagrams and SQL skeletons
Not-tested: No live retrieval benchmark rerun; this change documents the intended ranking path only
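
One possible reading of "exact-led and evidence-oriented" ranking is sketched below: any song with an exact-lane hit outranks songs with semantic hits only, and each result keeps the hits that support it. The function name, the score semantics, and the tie-breaking order are all assumptions of this sketch; the commit log does not specify the actual fusion rule.

```python
def fuse_candidates(exact_hits, semantic_hits):
    """Merge (song_id, score) hits from both lanes into one ranked list.

    Songs with any exact hit rank first; ties fall back to the best exact
    score, then to the best semantic score. Evidence is kept per song.
    """
    songs = {}
    for lane, hits in (("exact", exact_hits), ("semantic", semantic_hits)):
        for song_id, score in hits:
            entry = songs.setdefault(
                song_id,
                {"song_id": song_id, "exact": 0.0, "semantic": 0.0, "evidence": []},
            )
            entry[lane] = max(entry[lane], score)
            entry["evidence"].append((lane, score))
    return sorted(
        songs.values(),
        key=lambda e: (e["exact"] > 0, e["exact"], e["semantic"]),
        reverse=True,
    )

ranked = fuse_candidates(
    exact_hits=[(1, 0.90)],
    semantic_hits=[(2, 0.99), (1, 0.50)],
)
print([entry["song_id"] for entry in ranked])
```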

authored 2026-06-04 15:10:52 +0800

Why the retrieval docs need the online song backtrace made explicit · 5869c876 ...

Constraint: New engineers need a direct feature_fact-to-song_id query path on the current 4-table schema without reconstructing it from scattered examples
Rejected: Leave only insert-side diagrams | does not explain how online recall returns song ownership evidence
Confidence: high
Scope-risk: narrow
Directive: Keep query-path docs aligned with the feature_fact -> window -> asset -> song chain when adding new retrieval lanes
Tested: markdown link check on /workspace/docs after adding retrieval flow diagrams and SQL templates
Not-tested: No live database rerun; this change only documents the already-verified schema path

authored 2026-06-04 15:09:36 +0800

Why the schema docs need one song-centric story, not parallel histories · 38b37e08 ...

Constraint: New teammates must understand where slice/model/feature data lands without reading deprecated v2/planner-worker material
Rejected: Keep old docs with disclaimers | still leaves two competing mental models in the default docs path
Confidence: high
Scope-risk: narrow
Directive: Keep future docs anchored on the 4-table song-centric path unless the physical schema default truly changes
Tested: markdown link check on /workspace/docs; staged diff review; verified referenced wrapper script is present
Not-tested: No database or pipeline rerun was needed for this docs-only consolidation

authored 2026-06-04 15:07:26 +0800

Promote the one-command song-centric runner as the default handoff path · 020702cc ...

Constraint: Reduce resume friction for future sessions by making the current live-validated runner the first-class entrypoint in docs and artifacts.
Rejected: Keep the active song-centric workflow scattered across multiple lower-level commands in the handoff docs | It slows recovery and increases cognitive overhead.
Confidence: high
Scope-risk: narrow
Directive: For current development, start with run_songcentric_directory_pipeline_live.py before dropping to the lower-level builder/enricher/importer commands.
Tested: /usr/local/miniconda3/bin/python acr-engine/scripts/run_songcentric_directory_pipeline_live.py --dsn postgres://d2:d2pass@127.0.0.1:5432/d2 --schema acr_songcentric_test --input-root acr-engine/data/songcentric_builder_smoke --output-dir acr-engine/data/pgvector_eval/music20; git diff --check; /usr/local/miniconda3/bin/python scripts/check_markdown_links.py --root docs returned OK for 11 active markdown files
Not-tested: alternate input roots beyond the current smoke directory

authored 2026-06-04 15:00:03 +0800

Collapse the song-centric directory workflow into one live runner · 3b4b3684 ...

Constraint: Keep the current real-directory onboarding path executable end-to-end on this host while exposing exact/semantic backend selection in one reproducible report.
Rejected: Leave the song-centric pipeline as multiple manual commands only | It raises handoff cost and makes repeated host validation slower and noisier.
Confidence: high
Scope-risk: narrow
Directive: Use run_songcentric_directory_pipeline_live.py as the default smoke/verification entrypoint for the current song-centric ingestion path.
Tested: /usr/local/miniconda3/bin/python acr-engine/scripts/run_songcentric_directory_pipeline_live.py --dsn postgres://d2:d2pass@127.0.0.1:5432/d2 --schema acr_songcentric_test --input-root acr-engine/data/songcentric_builder_smoke --output-dir acr-engine/data/pgvector_eval/music20; git diff --check; /usr/local/miniconda3/bin/python scripts/check_markdown_links.py --root docs returned OK for 11 active markdown files
Not-tested: very large directory trees and true semantic runtime readiness on this host

authored 2026-06-04 14:58:32 +0800

Make semantic feature enrichment runtime-aware on the song-centric path · 35d883a8 ...

Constraint: Keep the current real-directory import path executable on this host while making semantic-lane readiness explicit instead of pretending the heavyweight runtime exists.
Rejected: Hardwire semantic enrichment to the local fallback without reporting missing runtime state | It hides the true blocker and weakens the upgrade path to real semantic models.
Confidence: high
Scope-risk: narrow
Directive: On this host, treat local_wavehash_embed as a fallback semantic backend and persist missing runtime evidence until torch/torchaudio/transformers are installed.
Tested: /usr/local/miniconda3/bin/python acr-engine/scripts/enrich_songcentric_manifest_with_local_features.py on the real wav smoke manifest; imported the v3 enriched manifest twice into postgres://d2:d2pass@127.0.0.1:5432/d2 schema acr_songcentric_test and verified counts stayed media_entity=9, audio_object=22, feature_fact=24, set_membership=9; report shows semantic_runtime_available=false and missing=[torch, torchaudio, transformers]; git diff --check; /usr/local/miniconda3/bin/python scripts/check_markdown_links.py --root docs returned OK for 11 active markdown files
Not-tested: real MERT/MuQ extraction on this host
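
The runtime-aware behaviour described here can be approximated with a dependency check that reports what is missing instead of silently falling back. This is a minimal sketch: the `semantic_runtime_available` and `missing` keys mirror the report fields quoted in the Tested line, while the function name and the `semantic_backend` key with its `"semantic_model"` value are hypothetical.

```python
import importlib.util

REQUIRED_MODULES = ("torch", "torchaudio", "transformers")

def semantic_runtime_report(required=REQUIRED_MODULES):
    """Report which required modules are importable, without importing them."""
    missing = [name for name in required if importlib.util.find_spec(name) is None]
    return {
        "semantic_runtime_available": not missing,
        "missing": missing,
        # Fall back explicitly so the report shows why the fallback was used.
        "semantic_backend": "local_wavehash_embed" if missing else "semantic_model",
    }

# Demo with names that behave the same on any host.
ready = semantic_runtime_report(required=("json",))
blocked = semantic_runtime_report(required=("json", "no_such_runtime_for_demo"))
print(ready)
print(blocked)
```

`find_spec` only locates the package; an installed package that fails at import time would still need an actual import probe.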

authored 2026-06-04 14:56:51 +0800

Upgrade Python to 3.12 via Miniconda3-py312_26.3.2-2 · 7e3b0136 ...

Switch from Miniconda3-py310_23.11.0-2 to Miniconda3-py312_26.3.2-2
(Python 3.10 -> 3.12) across both Dockerfile variants.

Constraint: Tsinghua mirror URL updated in both files
Confidence: high
Scope-risk: narrow
Tested: both images build and report Python 3.12.13

authored 2026-06-04 14:55:59 +0800

Prefer the repo fingerprint matcher in the real-directory song-centric pipeline · 8095eeea ...

Constraint: Improve the current directory-to-feature path using components already present in the repo, without depending on unavailable heavyweight semantic runtimes.
Rejected: Keep exact-lane validation on a purely ad-hoc local hash path | It underuses the repo's existing fingerprint extraction capability and weakens evidence for the real pipeline.
Confidence: high
Scope-risk: narrow
Directive: In host-level song-centric pipeline validation, prefer ChromaprintMatcher-backed fingerprints first and use local_wavehash only as fallback.
Tested: /usr/local/miniconda3/bin/python acr-engine/scripts/enrich_songcentric_manifest_with_local_features.py on the real wav smoke manifest; imported the enriched manifest into postgres://d2:d2pass@127.0.0.1:5432/d2 schema acr_songcentric_test twice and verified counts stayed media_entity=9, audio_object=22, feature_fact=24, set_membership=9 on rerun; matcher_fingerprint_count=5 and fallback_fingerprint_count=0; git diff --check; /usr/local/miniconda3/bin/python scripts/check_markdown_links.py --root docs returned OK for 11 active markdown files
Not-tested: true external chromaprint library integration and semantic-model-backed enrichment on this host
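
The matcher-first, fallback-second rule can be sketched as below. The two counter names match the ones quoted in the Tested line; the `matcher` callable is a stand-in for the repository's ChromaprintMatcher (whose real interface is not shown in this log), and the SHA-256 fallback is purely illustrative.

```python
import hashlib

def local_wavehash(pcm_bytes):
    """Illustrative fallback: a deterministic hash of the raw bytes."""
    return hashlib.sha256(pcm_bytes).hexdigest()

def fingerprint_all(items, matcher=None):
    """Fingerprint each item with the matcher first, else the fallback."""
    counts = {"matcher_fingerprint_count": 0, "fallback_fingerprint_count": 0}
    rows = []
    for pcm_bytes in items:
        fingerprint = None
        if matcher is not None:
            try:
                fingerprint = matcher(pcm_bytes)
            except Exception:
                fingerprint = None  # matcher failure triggers the fallback
        if fingerprint is not None:
            counts["matcher_fingerprint_count"] += 1
            rows.append(("chromaprint_matcher", fingerprint))
        else:
            counts["fallback_fingerprint_count"] += 1
            rows.append(("local_wavehash", local_wavehash(pcm_bytes)))
    return rows, counts

def demo_matcher(pcm_bytes):
    """Hypothetical matcher that cannot handle empty input."""
    if not pcm_bytes:
        raise ValueError("empty audio")
    return f"fp-{len(pcm_bytes)}"

rows, counts = fingerprint_all([b"\x00\x01", b""], matcher=demo_matcher)
print(counts)
```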

authored 2026-06-04 14:54:47 +0800

Complete the real-directory song-centric pipeline through feature_fact · 5e00c5b0 ...

Constraint: Finish the current real-directory onboarding loop without depending on missing heavyweight model runtimes, while still writing concrete feature rows into the fused schema.
Rejected: Wait for MERT/MuQ runtime availability before validating directory-to-feature ingestion | It would leave the Phase-1 data path unproven on this host.
Confidence: high
Scope-risk: narrow
Directive: Use enrich_songcentric_manifest_with_local_features.py as the temporary deterministic feature stage for host-level pipeline validation until full model runtimes are installed.
Tested: /usr/local/miniconda3/bin/python acr-engine/scripts/enrich_songcentric_manifest_with_local_features.py on the real wav smoke manifest; imported the enriched manifest twice into postgres://d2:d2pass@127.0.0.1:5432/d2 schema acr_songcentric_test and verified counts remained media_entity=9, audio_object=22, feature_fact=19, set_membership=9; git diff --check; /usr/local/miniconda3/bin/python scripts/check_markdown_links.py --root docs returned OK for 11 active markdown files
Not-tested: semantic quality of the temporary local features and large-scale feature enrichment throughput

authored 2026-06-04 14:52:26 +0800

Build song-centric manifests directly from real audio directories · 0f75787b ...

Constraint: Keep the current fused 4-table workflow while reducing manual JSONL authoring for onboarding real audio files into live PostgreSQL.
Rejected: Require hand-authored manifests as the only path into the song-centric importer | It slows real data onboarding and raises operator effort.
Confidence: high
Scope-risk: narrow
Directive: Prefer build_songcentric_manifest_from_directory.py -> import_songcentric_manifest_live.py as the default Phase-1 path for real file-directory onboarding.
Tested: /usr/local/miniconda3/bin/python acr-engine/scripts/build_songcentric_manifest_from_directory.py on a real local wav smoke directory; imported the generated manifest into postgres://d2:d2pass@127.0.0.1:5432/d2 schema acr_songcentric_test; reran the import and verified counts remained media_entity=9, audio_object=22, feature_fact=9, set_membership=9; git diff --check; /usr/local/miniconda3/bin/python scripts/check_markdown_links.py --root docs returned OK for 11 active markdown files
Not-tested: non-wav duration probing and very large directory trees
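
The directory-to-manifest step might look roughly like the sketch below, which walks a directory for `.wav` files and probes duration with the standard `wave` module (consistent with non-wav probing being untested). The record fields `song_key`, `asset_path`, and `duration_s` are hypothetical; the actual manifest layout of `build_songcentric_manifest_from_directory.py` is not shown in this log.

```python
import json
import tempfile
import wave
from pathlib import Path

def wav_duration_s(path):
    """Duration in seconds, read from the WAV header."""
    with wave.open(str(path), "rb") as wf:
        return wf.getnframes() / wf.getframerate()

def build_manifest(input_root):
    """One record per .wav file under input_root (hypothetical field names)."""
    root = Path(input_root)
    return [
        {
            "song_key": path.stem,
            "asset_path": str(path.relative_to(root)),
            "duration_s": wav_duration_s(path),
        }
        for path in sorted(root.rglob("*.wav"))
    ]

# Demo: write one second of silence, then build a one-line JSONL manifest.
with tempfile.TemporaryDirectory() as tmp_root:
    demo_path = Path(tmp_root) / "demo_song.wav"
    with wave.open(str(demo_path), "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)
        wf.setframerate(8000)
        wf.writeframes(b"\x00\x00" * 8000)
    manifest = build_manifest(tmp_root)
    manifest_jsonl = "\n".join(json.dumps(record) for record in manifest)

print(manifest_jsonl)
```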

authored 2026-06-04 14:50:48 +0800

Close the song-centric import loop with feature_fact ingestion · d04a6e65 ...

Constraint: Complete the current manifest-to-PostgreSQL onboarding path on the 4-table fused schema without reintroducing any split-table storage path.
Rejected: Keep feature generation outside the manifest import workflow for Phase-1 | It leaves the current onboarding path incomplete and harder to validate end-to-end.
Confidence: high
Scope-risk: narrow
Directive: Treat windows[].features[] in song-centric manifests as the default batch path for writing fingerprint and embedding rows into feature_fact.
Tested: /usr/local/miniconda3/bin/python acr-engine/scripts/import_songcentric_manifest_live.py --dsn postgres://d2:d2pass@127.0.0.1:5432/d2 --schema acr_songcentric_test --manifest acr-engine/data/pgvector_eval/music20/songcentric_feature_manifest_sample.jsonl; repeated the import and verified counts remained media_entity=7, audio_object=15, feature_fact=9, set_membership=7; git diff --check; /usr/local/miniconda3/bin/python scripts/check_markdown_links.py --root docs returned OK for 11 active markdown files
Not-tested: automatic feature extraction from raw audio during import; large-scale concurrent manifest ingest
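
As an illustration of the `windows[].features[]` path, the sketch below flattens one manifest line into rows shaped for `feature_fact`. Only the `windows` and `features` nesting comes from the directive; every other field name (`song_key`, `window_key`, `feature_type`, `payload`) is hypothetical.

```python
import json

# One hypothetical manifest line; real manifests may carry other fields.
manifest_line = json.dumps({
    "song_key": "song-001",
    "windows": [
        {
            "window_key": "song-001/w0",
            "features": [
                {"feature_type": "fingerprint", "model_name": "chromaprint_matcher",
                 "payload": "abc123"},
                {"feature_type": "embedding", "model_name": "mert-v1-95m",
                 "payload": [0.1, 0.2]},
            ],
        },
        {"window_key": "song-001/w1", "features": []},
    ],
})

def iter_feature_rows(line):
    """Yield one flat row per windows[].features[] entry."""
    record = json.loads(line)
    for window in record.get("windows", []):
        for feature in window.get("features", []):
            yield {
                "song_key": record["song_key"],
                "window_key": window["window_key"],
                "feature_type": feature["feature_type"],
                "model_name": feature["model_name"],
                "payload": feature["payload"],
            }

feature_rows = list(iter_feature_rows(manifest_line))
print(len(feature_rows), [row["model_name"] for row in feature_rows])
```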

authored 2026-06-04 14:48:56 +0800

Keep the song-centric manifest import docs mechanically clean · ba387bf0 ...

Constraint: Preserve the just-validated manifest import workflow while removing formatting noise from the retained docs set.
Rejected: Leave doc formatting drift after a live-validated workflow change | It accumulates avoidable friction for later edits and checks.
Confidence: high
Scope-risk: narrow
Directive: Run diff-check and docs link validation after each docs-only follow-up on the reduced docs surface.
Tested: git diff --check; /usr/local/miniconda3/bin/python scripts/check_markdown_links.py --root docs returned OK for 11 active markdown files
Not-tested: no runtime logic changes in this cleanup commit

authored 2026-06-04 14:45:45 +0800

Import song-centric manifests into live PostgreSQL with idempotent upserts · 8002dfb0 ...

Constraint: Extend the current 4-table song-centric schema with a practical manifest ingestion path without introducing the older split-table model or hidden side metadata tables.
Rejected: Leave ingestion as handwritten SQL or one-off bootstrap logic | It slows real asset onboarding and makes repeatability hard to verify.
Confidence: high
Scope-risk: narrow
Directive: Use import_songcentric_manifest_live.py plus a manifest JSONL as the default path for batch asset/window onboarding into the fused schema.
Tested: /usr/local/miniconda3/bin/python acr-engine/scripts/import_songcentric_manifest_live.py --dsn postgres://d2:d2pass@127.0.0.1:5432/d2 --schema acr_songcentric_test --manifest acr-engine/data/pgvector_eval/music20/songcentric_manifest_sample.jsonl; repeated the import and verified counts remained media_entity=5, audio_object=11, feature_fact=6, set_membership=5; git diff --check; /usr/local/miniconda3/bin/python scripts/check_markdown_links.py --root docs returned OK for 11 active markdown files
Not-tested: feature_fact generation during manifest import and large-scale manifest throughput
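
The "import twice, counts unchanged" check relies on idempotent upserts. A minimal sketch of that pattern follows, using SQLite's `ON CONFLICT ... DO UPDATE` (PostgreSQL supports the same clause) on an assumed natural key. The `object_key` column and the simplified table are hypothetical, not the importer's real schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE audio_object (
    object_key TEXT PRIMARY KEY,   -- assumed natural key for the upsert
    object_type TEXT,
    duration_s REAL
)
""")

UPSERT_SQL = """
INSERT INTO audio_object (object_key, object_type, duration_s)
VALUES (:object_key, :object_type, :duration_s)
ON CONFLICT (object_key) DO UPDATE SET
    object_type = excluded.object_type,
    duration_s = excluded.duration_s
"""

def import_manifest(conn, records):
    """Upsert all records in one transaction and return the row count."""
    with conn:
        conn.executemany(UPSERT_SQL, records)
    return conn.execute("SELECT COUNT(*) FROM audio_object").fetchone()[0]

records = [
    {"object_key": "song-001/asset", "object_type": "asset", "duration_s": 30.0},
    {"object_key": "song-001/w0", "object_type": "window", "duration_s": 10.0},
]
first_count = import_manifest(conn, records)
second_count = import_manifest(conn, records)  # rerun must not add rows
print(first_count, second_count)
```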

authored 2026-06-04 14:45:31 +0800

Bootstrap the fused song-centric schema with repeatable live seed data · 5e43f28b ...

Constraint: Keep all new initialization logic on top of the current 4-table song-centric schema and validate it against the user PostgreSQL instead of synthetic-only assumptions.
Rejected: Stop at one-row smoke evidence | It does not prove the schema is practical for repeated Phase-1 bootstrap workflows.
Confidence: high
Scope-risk: narrow
Directive: Use bootstrap_songcentric_phase1_live.py as the default seed/bootstrap path when demonstrating or validating the fused schema on live PostgreSQL.
Tested: /usr/local/miniconda3/bin/python acr-engine/scripts/bootstrap_songcentric_phase1_live.py --dsn postgres://d2:d2pass@127.0.0.1:5432/d2 --schema acr_songcentric_test; git diff --check; /usr/local/miniconda3/bin/python scripts/check_markdown_links.py --root docs returned OK for 11 active markdown files
Not-tested: large-batch bootstrap and conflict handling under concurrent writers

authored 2026-06-04 14:43:27 +0800

Prove the fused song-centric ACR schema on live PostgreSQL · 3ce36679 ...

Constraint: Stay within the current 4-table song-centric model and validate it against the user-provided PostgreSQL before treating it as the active schema candidate.
Rejected: Leave the fused model as docs-only guidance | Without a runnable SQL file and smoke evidence, downstream implementation would still be ambiguous.
Confidence: high
Scope-risk: narrow
Directive: Prefer acr_pg_schema_songcentric_v1.sql for new schema experiments tied to the current song-centric design; do not revive the older split-table model for Phase-1 by default.
Tested: /usr/local/miniconda3/bin/python acr-engine/scripts/smoke_songcentric_schema_live.py --dsn postgres://d2:d2pass@127.0.0.1:5432/d2 --schema acr_songcentric_test; git diff --check; /usr/local/miniconda3/bin/python scripts/check_markdown_links.py --root docs returned OK for 11 active markdown files
Not-tested: high-volume bulk ingest on the fused schema

authored 2026-06-04 14:41:37 +0800

Make the fused Phase-1 ACR schema concrete with DDL samples · fe416ec9 ...

Constraint: Keep the storage design aligned to the current song-centric model while turning the 4-table fused schema into something engineers can directly review and implement.
Rejected: Keep only conceptual docs without concrete SQL | It leaves too much ambiguity about where slices, models, and features actually land.
Confidence: high
Scope-risk: narrow
Directive: Until the repository gains a production SQL file for the fused model, treat postgres_db_schema_samples.md as the authoritative DDL draft for media_entity/audio_object/feature_fact/set_membership.
Tested: git diff --check on touched files; /usr/local/miniconda3/bin/python scripts/check_markdown_links.py --root docs returned OK for 11 active markdown files
Not-tested: Executing the fused DDL against a live PostgreSQL schema

authored 2026-06-04 14:39:30 +0800

Reduce ACR docs to the current song-centric storage design · ac2e6730 ...

Constraint: Keep only documentation that directly serves the current Phase-1 song-centric + fused-table storage and retrieval design.
Rejected: Preserve broad historical, dataset, business-export, and template docs in the main docs root | They increase handoff cost and blur the active design surface.
Confidence: high
Scope-risk: moderate
Directive: Treat postgresql-data-model.md as the single source of truth for where slices, models, and features are stored until a concrete fused DDL supersedes it.
Tested: git diff --check on touched docs; /usr/local/miniconda3/bin/python scripts/check_markdown_links.py --root docs returned OK for 11 active markdown files; final docs root reduced to 12 files
Not-tested: external markdown renderers and downstream readers that may still expect removed auxiliary docs

authored 2026-06-04 14:37:22 +0800

Favor typed unified tables for the Phase-1 ACR storage model · 44222971 ...

Constraint: Reduce schema reading cost for new engineers while preserving the logical distinctions needed for copyright-scale retrieval and attribution.
Rejected: Keep adding highly specialized tables for every layer in Phase-1 | It increases join cost in the mental model faster than it improves first-stage delivery.
Confidence: high
Scope-risk: narrow
Directive: Prefer a fused physical model (media_entity/audio_object/feature_fact/set_membership) with type fields, while keeping song/recording/asset/window as logical semantics.
Tested: git diff --check on touched docs; /usr/local/miniconda3/bin/python scripts/check_markdown_links.py --root docs returned OK for 31 markdown files; rg confirmed fused-model sections are present in docs
Not-tested: concrete SQL DDL for the fused physical model

authored 2026-06-04 14:32:58 +0800

Freeze the Phase-1 minimal schema story for ACR delivery · 7ada6f21 ...

Constraint: Keep the production-ready v2 model intact while making the first-delivery table set explicit for engineers starting implementation.
Rejected: Introduce a separate competing Phase-1 schema document | It would create another parallel truth and slow handoff.
Confidence: high
Scope-risk: narrow
Directive: When discussing first-stage storage, default to song/recording/recording_asset/audio_window plus feature and reference tables before bringing in heavier governance tables.
Tested: git diff --check on touched docs; /usr/local/miniconda3/bin/python scripts/check_markdown_links.py --root docs returned OK for 31 markdown files
Not-tested: SQL DDL generation from the simplified narrative

authored 2026-06-04 14:31:31 +0800

Keep ACR entity layers simple without collapsing recording assets · 89d9d72b ...

Constraint: Preserve the Phase-1 minimal schema story while clarifying when simplification is safe and when it creates future refactor risk.
Rejected: Merge recording and recording_asset in the formal schema now | Copyright-scale catalogs will quickly need multi-file and multi-source recording support.
Confidence: high
Scope-risk: narrow
Directive: Use 'song -> recording -> asset -> window -> feature' as the default communication shorthand, but keep recording and asset split in the persisted model.
Tested: git diff --check on touched docs; /usr/local/miniconda3/bin/python scripts/check_markdown_links.py --root docs returned OK for 31 markdown files
Not-tested: Rendered markdown preview in external viewers

authored 2026-06-04 14:30:15 +0800

Reduce ACR handoff time with a single doc chain · 6d4f8c1c ...

Constraint: Preserve the current Phase-1 runner, PostgreSQL v2 contract, and live validation narrative while removing duplicate doc entrypoints.
Rejected: Keep multiple parallel handoff docs | They force new contributors to diff stale narratives before they can act.
Confidence: high
Scope-risk: narrow
Directive: Treat README -> start-here -> session-handoff as the only first-read path unless a newer handoff chain fully replaces it.
Tested: git diff --check on touched docs/script; rg for deleted-doc residual refs outside CHANGELOG; reran scripts/run_planner_validation_commands_live.py with executed_count=4 and all_passed=true
Not-tested: Markdown link rendering in external viewers

authored 2026-06-04 14:25:25 +0800

Put the shortest verified startup path at the docs entrypoint · 8d6e4b29 ...

Constraint: The README remained a reading-first surface while the handoff had already converged on a faster validated startup command, so the docs entrypoint needed to match the actual recovery workflow.
Rejected: Keep the shortest path only in session-handoff | That would still force many sessions to open the wrong document first.
Confidence: high
Scope-risk: narrow
Directive: Treat docs/README.md and docs/session-handoff.md as aligned startup surfaces; keep the runner command identical in both places.
Tested: git diff --check; /usr/local/miniconda3/bin/python scripts/run_planner_validation_commands_live.py --dsn 'postgres://d2:d2pass@127.0.0.1:5432/d2' --output data/pgvector_eval/music20/planner_validation_commands_runner_report.json
Not-tested: This commit reshapes documentation only; it does not change worker behavior.

authored 2026-06-04 14:16:18 +0800
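
The Directive above asks that the runner command stay identical in docs/README.md and docs/session-handoff.md. One way to guard that is sketched below, assuming the command sits on its own line in each file and that no other line mentions the script path. The helper names `runner_lines` and `commands_aligned` are hypothetical and are not part of the repo.

```python
RUNNER = "scripts/run_planner_validation_commands_live.py"


def runner_lines(text, marker=RUNNER):
    """Collect whitespace-normalized lines that mention the runner script."""
    return [" ".join(line.split()) for line in text.splitlines() if marker in line]


def commands_aligned(readme_text, handoff_text, marker=RUNNER):
    """True when both docs mention the runner and list the same command lines.

    Assumption: prose lines do not mention the script path; if they do,
    they are compared as if they were commands.
    """
    readme_cmds = runner_lines(readme_text, marker)
    handoff_cmds = runner_lines(handoff_text, marker)
    return bool(readme_cmds) and set(readme_cmds) == set(handoff_cmds)
```

Calling `commands_aligned` with the contents of the two files returns `False` as soon as one copy gains or loses a flag.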

Make the handoff start from the shortest verified recovery path · 061dd5e7 ...

Constraint: The session handoff had become information-rich enough that the next session still needed manual triage, so the opening section had to be collapsed to one verified command path.
Rejected: Keep the handoff primarily as a reading list | That would preserve context but not minimize restart latency.
Confidence: high
Scope-risk: narrow
Directive: Start future sessions with the planner validation runner before reading deeper docs unless the task explicitly skips validation.
Tested: git diff --check; verified docs/session-handoff.md now points to scripts/run_planner_validation_commands_live.py backed by fresh data/pgvector_eval/music20/planner_validation_commands_runner_report.json evidence (executed_count=4, all_passed=true)
Not-tested: No new code-path execution was needed in this commit because it reorganizes the already-verified startup flow.

authored 2026-06-04 14:14:47 +0800

Add a runner for all planner validation entrypoints · eb2ea03a ...

Constraint: The planner artifact had become executable, but future sessions still needed a reusable entrypoint instead of ad-hoc inline Python to consume it.
Rejected: Keep the execution proof as one-off shell snippets | That would not give the next session a durable command surface.
Confidence: high
Scope-risk: narrow
Directive: Use run_planner_validation_commands_live.py as the default preflight gate before attempting new Phase-1 worker changes on a host.
Tested: /usr/local/miniconda3/bin/python -m py_compile scripts/run_planner_validation_commands_live.py; git diff --check; /usr/local/miniconda3/bin/python scripts/run_planner_validation_commands_live.py --dsn 'postgres://d2:d2pass@127.0.0.1:5432/d2' --output data/pgvector_eval/music20/planner_validation_commands_runner_report.json
Not-tested: The runner only validates planner entrypoints; it does not unlock successful extraction on an environment-blocked host.

authored 2026-06-04 14:12:50 +0800
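
For orientation, the behavior this commit describes (consume the planner artifact, execute each validation command, report `executed_count` and `all_passed`) could look roughly like the sketch below. This is not the contents of scripts/run_planner_validation_commands_live.py. It assumes `validation_commands` is a JSON object mapping a name to a command string, and the `results` field and function name are hypothetical.

```python
import json
import shlex
import subprocess
import sys
from pathlib import Path


def run_validation_commands(plan_path, output_path=None):
    """Execute every command listed under the plan's validation_commands key.

    Assumption: validation_commands maps a name to a single command string;
    the real artifact layout may differ.
    """
    plan = json.loads(Path(plan_path).read_text(encoding="utf-8"))
    commands = plan.get("validation_commands", {})

    results = []
    for name, command in commands.items():
        # shlex.split avoids shell=True while still honoring quoted arguments.
        completed = subprocess.run(
            shlex.split(command), capture_output=True, text=True
        )
        results.append(
            {
                "name": name,
                "returncode": completed.returncode,
                "passed": completed.returncode == 0,
            }
        )

    report = {
        "executed_count": len(results),
        # An empty plan is treated as a failure, not a vacuous pass.
        "all_passed": bool(results) and all(r["passed"] for r in results),
        "results": results,
    }
    if output_path is not None:
        Path(output_path).write_text(json.dumps(report, indent=2), encoding="utf-8")
    return report


if __name__ == "__main__" and len(sys.argv) > 1:
    print(json.dumps(run_validation_commands(sys.argv[1]), indent=2))
```

Treating an empty command list as `all_passed=false` is a design choice of this sketch, so that a malformed plan cannot pass the preflight gate silently.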

Close the planner validation loop across all four live entrypoints · 8e2d4852 ...

Constraint: Partial execution proof for planner validation commands still left room for manual reconstruction risk, so the remaining entrypoints had to be exercised too.
Rejected: Stop after two executed planner commands | It would leave the negative matrix and asset-upsert entrypoints unproven.
Confidence: high
Scope-risk: narrow
Directive: Treat phase1_validation_commands_execution_report.json as the authoritative proof that the planner artifact is executable end-to-end.
Tested: git diff --check; /usr/local/miniconda3/bin/python - <<'PY' ... execute validation_commands.semantic_vector_negative_matrix and validation_commands.asset_level_upsert_validation from data/pgvector_eval/music20/phase1_extraction_plan_report.json ... PY
Not-tested: Individual extraction jobs still remain environment-blocked; this commit proves validation entrypoints, not successful feature extraction.

authored 2026-06-04 14:11:30 +0800

Prove planner validation commands execute without manual reconstruction · fa33c3a1 ...

Constraint: Adding validation_commands to the planner was only useful if the emitted commands could be consumed directly, so the plan artifact needed one more layer of execution proof.
Rejected: Assume command strings are correct because they look valid | That would leave restart automation unproven.
Confidence: high
Scope-risk: narrow
Directive: Prefer executing validation_commands from the planner artifact instead of retyping equivalent checks by hand.
Tested: git diff --check; /usr/local/miniconda3/bin/python - <<'PY' ... execute validation_commands.prereq_audit and validation_commands.worker_contract_smoke from data/pgvector_eval/music20/phase1_extraction_plan_report.json ... PY
Not-tested: The remaining planner validation commands were not executed in this commit, though their sibling commands proved the artifact is directly consumable.

authored 2026-06-04 14:10:01 +0800

Make the Phase-1 planner carry live validation entrypoints · 9b020339 ...

Constraint: The repo now has multiple verified smoke and audit scripts, so leaving them outside the planner would force future sessions to rediscover the right validation commands by reading docs.
Rejected: Document the commands only in markdown | That would drift from the executable plan artifact and slow restart execution.
Confidence: high
Scope-risk: narrow
Directive: Treat validation_commands in the planner as the first-stop entrypoints before running individual extraction jobs on a new host.
Tested: /usr/local/miniconda3/bin/python -m py_compile scripts/plan_phase1_extraction_jobs_live.py; git diff --check; /usr/local/miniconda3/bin/python scripts/plan_phase1_extraction_jobs_live.py --dsn 'postgres://d2:d2pass@127.0.0.1:5432/d2' --schema acr_test --job-status pending --output data/pgvector_eval/music20/phase1_extraction_plan_report.json
Not-tested: The planner still emits commands for an environment-blocked host and does not prove successful extraction by itself.

authored 2026-06-04 14:08:30 +0800