- 04 Jun, 2026 40 commits
Constraint: Phase-1 embedding storage must explain where vectors live, how feature_fact points to them, and how hot/cold migration works at scale Rejected: Leave vector payload placement implicit | does not give operators a stable contract for ANN loading, backfill, or cleanup Confidence: high Scope-risk: narrow Directive: Keep embedding payload guidance split between feature_fact metadata and vector-side storage unless the physical schema default changes Tested: markdown link check on /workspace/docs after adding vector payload storage and lifecycle guidance Not-tested: No live database rerun; this is a documentation-only clarification over the current schema path
cnb.bofCdSsphPA authored -
Constraint: A 1M-audio (one million audio files) Phase-1 system needs a default order for ingest, exact coverage, semantic backfill, and index build timing on the current schema Rejected: Leave scale-up as implied by the small examples | does not guide batch execution or gap audits at production volume Confidence: high Scope-risk: narrow Directive: Keep future scale docs anchored on main-chain-first ingestion and model-by-model feature backfill unless benchmark evidence proves otherwise Tested: markdown link check on /workspace/docs after adding scale-up ingestion, indexing, and audit SQL guidance Not-tested: No live database rerun; this is a documentation-only expansion over the verified schema path
cnb.bofCdSsphPA authored -
Constraint: Phase-1 implementers need one concrete end-to-end sample that shows how a single song expands into multiple assets, windows, and model facts Rejected: Keep only isolated insert snippets | does not help with batch backfill or completeness checks Confidence: high Scope-risk: narrow Directive: When extending storage examples, include operational queries for gap detection and model completeness, not just inserts Tested: markdown link check on /workspace/docs after adding the complete sample and audit SQL Not-tested: No live database rerun; this is a documentation-only expansion over the verified schema
cnb.bofCdSsphPA authored -
Constraint: Phase-1 implementers need one concrete explanation for how songs, assets, windows, and open-model features are linked and stored Rejected: Rely on schema columns alone | does not show the intended per-model storage pattern for the current encoder-only phase Confidence: high Scope-risk: narrow Directive: Keep future model-onboarding docs grounded in feature_fact object_id/song_id bindings unless the schema default changes Tested: markdown link check on /workspace/docs after adding binding diagrams and SQL storage examples Not-tested: No live database rerun; this is a documentation clarification over an already-verified schema
cnb.bofCdSsphPA authored -
Constraint: Phase-1 retrieval must explain how exact and semantic candidates become a ranked song list on the current 4-table schema Rejected: Defer fusion guidance until model integration finishes | leaves implementation teams without a default ranking contract Confidence: high Scope-risk: narrow Directive: Keep Phase-1 ranking docs exact-led and evidence-oriented until measured recall data justifies a different default Tested: markdown link check on /workspace/docs after adding fusion diagrams and SQL skeletons Not-tested: No live retrieval benchmark rerun; this change documents the intended ranking path only
cnb.bofCdSsphPA authored -
Constraint: New engineers need a direct feature_fact-to-song_id query path on the current 4-table schema without reconstructing it from scattered examples Rejected: Leave only insert-side diagrams | does not explain how online recall returns song ownership evidence Confidence: high Scope-risk: narrow Directive: Keep query-path docs aligned with the feature_fact -> window -> asset -> song chain when adding new retrieval lanes Tested: markdown link check on /workspace/docs after adding retrieval flow diagrams and SQL templates Not-tested: No live database rerun; this change only documents the already-verified schema path
cnb.bofCdSsphPA authored -
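The feature_fact -> window -> asset -> song chain named in the commit above can be illustrated with a small, self-contained sketch. It uses an in-memory SQLite database as a stand-in for PostgreSQL. Only the table names `media_entity`, `audio_object`, and `feature_fact` and the `object_id` column come from this log; every other column name, the sample rows, and the helper `song_for_fact` are assumptions made for illustration, not the repository's schema.

```python
import sqlite3

# In-memory stand-in for the fused schema. Column names other than object_id
# are hypothetical; the real DDL may differ.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE media_entity (
        entity_id   INTEGER PRIMARY KEY,
        entity_type TEXT NOT NULL,      -- e.g. 'song'
        title       TEXT
    );
    CREATE TABLE audio_object (
        object_id        INTEGER PRIMARY KEY,
        object_type      TEXT NOT NULL, -- 'asset' or 'window'
        parent_object_id INTEGER,       -- window -> owning asset
        entity_id        INTEGER        -- asset -> owning song
    );
    CREATE TABLE feature_fact (
        fact_id      INTEGER PRIMARY KEY,
        object_id    INTEGER NOT NULL,  -- the window the feature was computed on
        feature_type TEXT NOT NULL
    );

    INSERT INTO media_entity VALUES (1, 'song', 'Example Song');
    INSERT INTO audio_object VALUES (10, 'asset', NULL, 1);
    INSERT INTO audio_object VALUES (11, 'window', 10, NULL);
    INSERT INTO feature_fact VALUES (100, 11, 'fingerprint');
    """
)

# Walk feature_fact -> window -> asset -> song with three joins.
CHAIN_SQL = """
    SELECT f.fact_id,
           w.object_id AS window_id,
           a.object_id AS asset_id,
           s.entity_id AS song_id,
           s.title
    FROM feature_fact f
    JOIN audio_object w ON w.object_id = f.object_id
                       AND w.object_type = 'window'
    JOIN audio_object a ON a.object_id = w.parent_object_id
                       AND a.object_type = 'asset'
    JOIN media_entity s ON s.entity_id = a.entity_id
                       AND s.entity_type = 'song'
    WHERE f.fact_id = ?
"""


def song_for_fact(fact_id):
    """Return (fact_id, window_id, asset_id, song_id, title), or None if no chain exists."""
    return conn.execute(CHAIN_SQL, (fact_id,)).fetchone()


if __name__ == "__main__":
    print(song_for_fact(100))
```

Each hop stays visible in the returned row, so a recall hit can be traced back to the window and asset that produced it as well as to the owning song.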
Constraint: New teammates must understand where slice/model/feature data lands without reading deprecated v2/planner-worker material Rejected: Keep old docs with disclaimers | still leaves two competing mental models in the default docs path Confidence: high Scope-risk: narrow Directive: Keep future docs anchored on the 4-table song-centric path unless the physical schema default truly changes Tested: markdown link check on /workspace/docs; staged diff review; verified referenced wrapper script is present Not-tested: No database or pipeline rerun was needed for this docs-only consolidation
cnb.bofCdSsphPA authored -
Constraint: Reduce resume friction for future sessions by making the current live-validated runner the first-class entrypoint in docs and artifacts. Rejected: Keep the active song-centric workflow scattered across multiple lower-level commands in the handoff docs | It slows recovery and increases cognitive overhead. Confidence: high Scope-risk: narrow Directive: For current development, start with run_songcentric_directory_pipeline_live.py before dropping to the lower-level builder/enricher/importer commands. Tested: /usr/local/miniconda3/bin/python acr-engine/scripts/run_songcentric_directory_pipeline_live.py --dsn postgres://d2:d2pass@127.0.0.1:5432/d2 --schema acr_songcentric_test --input-root acr-engine/data/songcentric_builder_smoke --output-dir acr-engine/data/pgvector_eval/music20; git diff --check; /usr/local/miniconda3/bin/python scripts/check_markdown_links.py --root docs returned OK for 11 active markdown files Not-tested: alternate input roots beyond the current smoke directory
cnb.bofCdSsphPA authored -
Constraint: Keep the current real-directory onboarding path executable end-to-end on this host while exposing exact/semantic backend selection in one reproducible report. Rejected: Leave the song-centric pipeline as multiple manual commands only | It raises handoff cost and makes repeated host validation slower and noisier. Confidence: high Scope-risk: narrow Directive: Use run_songcentric_directory_pipeline_live.py as the default smoke/verification entrypoint for the current song-centric ingestion path. Tested: /usr/local/miniconda3/bin/python acr-engine/scripts/run_songcentric_directory_pipeline_live.py --dsn postgres://d2:d2pass@127.0.0.1:5432/d2 --schema acr_songcentric_test --input-root acr-engine/data/songcentric_builder_smoke --output-dir acr-engine/data/pgvector_eval/music20; git diff --check; /usr/local/miniconda3/bin/python scripts/check_markdown_links.py --root docs returned OK for 11 active markdown files Not-tested: very large directory trees and true semantic runtime readiness on this host
cnb.bofCdSsphPA authored -
Constraint: Keep the current real-directory import path executable on this host while making semantic-lane readiness explicit instead of pretending the heavyweight runtime exists. Rejected: Hardwire semantic enrichment to the local fallback without reporting missing runtime state | It hides the true blocker and weakens the upgrade path to real semantic models. Confidence: high Scope-risk: narrow Directive: On this host, treat local_wavehash_embed as a fallback semantic backend and persist missing runtime evidence until torch/torchaudio/transformers are installed. Tested: /usr/local/miniconda3/bin/python acr-engine/scripts/enrich_songcentric_manifest_with_local_features.py on the real wav smoke manifest; imported the v3 enriched manifest twice into postgres://d2:d2pass@127.0.0.1:5432/d2 schema acr_songcentric_test and verified counts stayed media_entity=9, audio_object=22, feature_fact=24, set_membership=9; report shows semantic_runtime_available=false and missing=[torch, torchaudio, transformers]; git diff --check; /usr/local/miniconda3/bin/python scripts/check_markdown_links.py --root docs returned OK for 11 active markdown files Not-tested: real MERT/MuQ extraction on this host
cnb.bofCdSsphPA authored -
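A runtime readiness report of the kind described in the commit above (`semantic_runtime_available=false`, `missing=[torch, torchaudio, transformers]`) could be produced roughly as follows. This is a minimal sketch: the package names and the two report keys come from the commit, while the function name `probe_semantic_runtime` and the overall shape are assumptions and do not reproduce the repository's enrichment script.

```python
import importlib.util
import json

# Package names as listed in the commit's report.
REQUIRED_RUNTIME = ("torch", "torchaudio", "transformers")


def probe_semantic_runtime(required=REQUIRED_RUNTIME):
    """Report which required top-level packages cannot be located.

    find_spec only looks the package up; it does not import it, so the
    probe stays cheap even when the heavyweight runtime is installed.
    """
    missing = [name for name in required if importlib.util.find_spec(name) is None]
    return {
        "semantic_runtime_available": not missing,
        "missing": missing,
    }


if __name__ == "__main__":
    # Hypothetical usage: persist this next to the other pipeline evidence.
    print(json.dumps(probe_semantic_runtime(), indent=2))
```

Recording the missing packages explicitly, rather than silently switching to the fallback backend, keeps the blocker visible in the report.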
Switch from Miniconda3-py310_23.11.0-2 to Miniconda3-py312_26.3.2-2 (Python 3.10 -> 3.12) across both Dockerfile variants. Constraint: Tsinghua mirror URL updated in both files Confidence: high Scope-risk: narrow Tested: both images build and report Python 3.12.13
cnb.bofCdSsphPA authored -
Constraint: Improve the current directory-to-feature path using components already present in the repo, without depending on unavailable heavyweight semantic runtimes. Rejected: Keep exact-lane validation on a purely ad-hoc local hash path | It underuses the repo's existing fingerprint extraction capability and weakens evidence for the real pipeline. Confidence: high Scope-risk: narrow Directive: In host-level song-centric pipeline validation, prefer ChromaprintMatcher-backed fingerprints first and use local_wavehash only as fallback. Tested: /usr/local/miniconda3/bin/python acr-engine/scripts/enrich_songcentric_manifest_with_local_features.py on the real wav smoke manifest; imported the enriched manifest into postgres://d2:d2pass@127.0.0.1:5432/d2 schema acr_songcentric_test twice and verified counts stayed media_entity=9, audio_object=22, feature_fact=24, set_membership=9 on rerun; matcher_fingerprint_count=5 and fallback_fingerprint_count=0; git diff --check; /usr/local/miniconda3/bin/python scripts/check_markdown_links.py --root docs returned OK for 11 active markdown files Not-tested: true external chromaprint library integration and semantic-model-backed enrichment on this host
cnb.bofCdSsphPA authored -
Constraint: Finish the current real-directory onboarding loop without depending on missing heavyweight model runtimes, while still writing concrete feature rows into the fused schema. Rejected: Wait for MERT/MuQ runtime availability before validating directory-to-feature ingestion | It would leave the Phase-1 data path unproven on this host. Confidence: high Scope-risk: narrow Directive: Use enrich_songcentric_manifest_with_local_features.py as the temporary deterministic feature stage for host-level pipeline validation until full model runtimes are installed. Tested: /usr/local/miniconda3/bin/python acr-engine/scripts/enrich_songcentric_manifest_with_local_features.py on the real wav smoke manifest; imported the enriched manifest twice into postgres://d2:d2pass@127.0.0.1:5432/d2 schema acr_songcentric_test and verified counts remained media_entity=9, audio_object=22, feature_fact=19, set_membership=9; git diff --check; /usr/local/miniconda3/bin/python scripts/check_markdown_links.py --root docs returned OK for 11 active markdown files Not-tested: semantic quality of the temporary local features and large-scale feature enrichment throughput
cnb.bofCdSsphPA authored -
Constraint: Keep the current fused 4-table workflow while reducing manual JSONL authoring for onboarding real audio files into live PostgreSQL. Rejected: Require hand-authored manifests as the only path into the song-centric importer | It slows real data onboarding and raises operator effort. Confidence: high Scope-risk: narrow Directive: Prefer build_songcentric_manifest_from_directory.py -> import_songcentric_manifest_live.py as the default Phase-1 path for real file-directory onboarding. Tested: /usr/local/miniconda3/bin/python acr-engine/scripts/build_songcentric_manifest_from_directory.py on a real local wav smoke directory; imported the generated manifest into postgres://d2:d2pass@127.0.0.1:5432/d2 schema acr_songcentric_test; reran the import and verified counts remained media_entity=9, audio_object=22, feature_fact=9, set_membership=9; git diff --check; /usr/local/miniconda3/bin/python scripts/check_markdown_links.py --root docs returned OK for 11 active markdown files Not-tested: non-wav duration probing and very large directory trees
cnb.bofCdSsphPA authored -
Constraint: Complete the current manifest-to-PostgreSQL onboarding path on the 4-table fused schema without reintroducing any split-table storage path. Rejected: Keep feature generation outside the manifest import workflow for Phase-1 | It leaves the current onboarding path incomplete and harder to validate end-to-end. Confidence: high Scope-risk: narrow Directive: Treat windows[].features[] in song-centric manifests as the default batch path for writing fingerprint and embedding rows into feature_fact. Tested: /usr/local/miniconda3/bin/python acr-engine/scripts/import_songcentric_manifest_live.py --dsn postgres://d2:d2pass@127.0.0.1:5432/d2 --schema acr_songcentric_test --manifest acr-engine/data/pgvector_eval/music20/songcentric_feature_manifest_sample.jsonl; repeated the import and verified counts remained media_entity=7, audio_object=15, feature_fact=9, set_membership=7; git diff --check; /usr/local/miniconda3/bin/python scripts/check_markdown_links.py --root docs returned OK for 11 active markdown files Not-tested: automatic feature extraction from raw audio during import; large-scale concurrent manifest ingest
cnb.bofCdSsphPA authored -
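To make the `windows[].features[]` path from the commit above concrete, the sketch below flattens one manifest line into candidate `feature_fact` rows. Only the `windows` and `features` keys are named in this log. The remaining keys (`song_key`, `asset_key`, `start_ms`, `feature_type`, `model`, `value`), the sample values, and the helper `iter_feature_rows` are hypothetical and may not match the real manifest format.

```python
import json

# Hypothetical manifest line; every key except `windows` and `features`
# is an illustrative assumption.
SAMPLE_LINE = json.dumps({
    "song_key": "song-001",
    "asset_key": "song-001/master.wav",
    "windows": [
        {"start_ms": 0, "end_ms": 10000, "features": [
            {"feature_type": "fingerprint", "model": "chromaprint", "value": "placeholder"},
            {"feature_type": "embedding", "model": "mert", "value": [0.1, 0.2]},
        ]},
        # A window without features yields no rows.
        {"start_ms": 10000, "end_ms": 20000, "features": []},
    ],
})


def iter_feature_rows(manifest_line):
    """Yield one flat dict per windows[].features[] entry in a JSONL line."""
    record = json.loads(manifest_line)
    for window_index, window in enumerate(record.get("windows", [])):
        for feature in window.get("features", []):
            yield {
                "song_key": record["song_key"],
                "asset_key": record["asset_key"],
                "window_index": window_index,
                "feature_type": feature["feature_type"],
                "model": feature["model"],
                "value": feature["value"],
            }


if __name__ == "__main__":
    for row in iter_feature_rows(SAMPLE_LINE):
        print(row["window_index"], row["feature_type"], row["model"])
```

Keeping the song and asset keys on every row means each feature can later be bound back to its owner without re-reading the manifest.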
Constraint: Preserve the just-validated manifest import workflow while removing formatting noise from the retained docs set. Rejected: Leave doc formatting drift after a live-validated workflow change | It accumulates avoidable friction for later edits and checks. Confidence: high Scope-risk: narrow Directive: Run diff-check and docs link validation after each docs-only follow-up on the reduced docs surface. Tested: git diff --check; /usr/local/miniconda3/bin/python scripts/check_markdown_links.py --root docs returned OK for 11 active markdown files Not-tested: no runtime logic changes in this cleanup commit
cnb.bofCdSsphPA authored -
Constraint: Extend the current 4-table song-centric schema with a practical manifest ingestion path without introducing the older split-table model or hidden side metadata tables. Rejected: Leave ingestion as handwritten SQL or one-off bootstrap logic | It slows real asset onboarding and makes repeatability hard to verify. Confidence: high Scope-risk: narrow Directive: Use import_songcentric_manifest_live.py plus a manifest JSONL as the default path for batch asset/window onboarding into the fused schema. Tested: /usr/local/miniconda3/bin/python acr-engine/scripts/import_songcentric_manifest_live.py --dsn postgres://d2:d2pass@127.0.0.1:5432/d2 --schema acr_songcentric_test --manifest acr-engine/data/pgvector_eval/music20/songcentric_manifest_sample.jsonl; repeated the import and verified counts remained media_entity=5, audio_object=11, feature_fact=6, set_membership=5; git diff --check; /usr/local/miniconda3/bin/python scripts/check_markdown_links.py --root docs returned OK for 11 active markdown files Not-tested: feature_fact generation during manifest import and large-scale manifest throughput
cnb.bofCdSsphPA authored -
Constraint: Keep all new initialization logic on top of the current 4-table song-centric schema and validate it against the user PostgreSQL instead of synthetic-only assumptions. Rejected: Stop at one-row smoke evidence | It does not prove the schema is practical for repeated Phase-1 bootstrap workflows. Confidence: high Scope-risk: narrow Directive: Use bootstrap_songcentric_phase1_live.py as the default seed/bootstrap path when demonstrating or validating the fused schema on live PostgreSQL. Tested: /usr/local/miniconda3/bin/python acr-engine/scripts/bootstrap_songcentric_phase1_live.py --dsn postgres://d2:d2pass@127.0.0.1:5432/d2 --schema acr_songcentric_test; git diff --check; /usr/local/miniconda3/bin/python scripts/check_markdown_links.py --root docs returned OK for 11 active markdown files Not-tested: large-batch bootstrap and conflict handling under concurrent writers
cnb.bofCdSsphPA authored -
Constraint: Stay within the current 4-table song-centric model and validate it against the user-provided PostgreSQL before treating it as the active schema candidate. Rejected: Leave the fused model as docs-only guidance | Without a runnable SQL file and smoke evidence, downstream implementation would still be ambiguous. Confidence: high Scope-risk: narrow Directive: Prefer acr_pg_schema_songcentric_v1.sql for new schema experiments tied to the current song-centric design; do not revive the older split-table model for Phase-1 by default. Tested: /usr/local/miniconda3/bin/python acr-engine/scripts/smoke_songcentric_schema_live.py --dsn postgres://d2:d2pass@127.0.0.1:5432/d2 --schema acr_songcentric_test; git diff --check; /usr/local/miniconda3/bin/python scripts/check_markdown_links.py --root docs returned OK for 11 active markdown files Not-tested: high-volume bulk ingest on the fused schema
cnb.bofCdSsphPA authored -
Constraint: Keep the storage design aligned to the current song-centric model while turning the 4-table fused schema into something engineers can directly review and implement. Rejected: Keep only conceptual docs without concrete SQL | It leaves too much ambiguity about where slices, models, and features actually land. Confidence: high Scope-risk: narrow Directive: Until the repository gains a production SQL file for the fused model, treat postgres_db_schema_samples.md as the authoritative DDL draft for media_entity/audio_object/feature_fact/set_membership. Tested: git diff --check on touched files; /usr/local/miniconda3/bin/python scripts/check_markdown_links.py --root docs returned OK for 11 active markdown files Not-tested: Executing the fused DDL against a live PostgreSQL schema
cnb.bofCdSsphPA authored -
Constraint: Keep only documentation that directly serves the current Phase-1 song-centric + fused-table storage and retrieval design. Rejected: Preserve broad historical, dataset, business-export, and template docs in the main docs root | They increase handoff cost and blur the active design surface. Confidence: high Scope-risk: moderate Directive: Treat postgresql-data-model.md as the single source of truth for where slices, models, and features are stored until a concrete fused DDL supersedes it. Tested: git diff --check on touched docs; /usr/local/miniconda3/bin/python scripts/check_markdown_links.py --root docs returned OK for 11 active markdown files; final docs root reduced to 12 files Not-tested: external markdown renderers and downstream readers that may still expect removed auxiliary docs
cnb.bofCdSsphPA authored -
Constraint: Reduce schema reading cost for new engineers while preserving the logical distinctions needed for copyright-scale retrieval and attribution. Rejected: Keep adding highly specialized tables for every layer in Phase-1 | It increases join cost in the mental model faster than it improves first-stage delivery. Confidence: high Scope-risk: narrow Directive: Prefer a fused physical model (media_entity/audio_object/feature_fact/set_membership) with type fields, while keeping song/recording/asset/window as logical semantics. Tested: git diff --check on touched docs; /usr/local/miniconda3/bin/python scripts/check_markdown_links.py --root docs returned OK for 31 markdown files; rg confirmed fused-model sections are present in docs Not-tested: concrete SQL DDL for the fused physical model
cnb.bofCdSsphPA authored -
Constraint: Keep the production-ready v2 model intact while making the first-delivery table set explicit for engineers starting implementation. Rejected: Introduce a separate competing Phase-1 schema document | It would create another parallel truth and slow handoff. Confidence: high Scope-risk: narrow Directive: When discussing first-stage storage, default to song/recording/recording_asset/audio_window plus feature and reference tables before bringing in heavier governance tables. Tested: git diff --check on touched docs; /usr/local/miniconda3/bin/python scripts/check_markdown_links.py --root docs returned OK for 31 markdown files Not-tested: SQL DDL generation from the simplified narrative
cnb.bofCdSsphPA authored -
Constraint: Preserve the Phase-1 minimal schema story while clarifying when simplification is safe and when it creates future refactor risk. Rejected: Merge recording and recording_asset in the formal schema now | Copyright-scale catalogs will quickly need multi-file and multi-source recording support. Confidence: high Scope-risk: narrow Directive: Use 'song -> recording -> asset -> window -> feature' as the default communication shorthand, but keep recording and asset split in the persisted model. Tested: git diff --check on touched docs; /usr/local/miniconda3/bin/python scripts/check_markdown_links.py --root docs returned OK for 31 markdown files Not-tested: Rendered markdown preview in external viewers
cnb.bofCdSsphPA authored -
Constraint: Preserve the current Phase-1 runner, PostgreSQL v2 contract, and live validation narrative while removing duplicate doc entrypoints. Rejected: Keep multiple parallel handoff docs | They force new contributors to diff stale narratives before they can act. Confidence: high Scope-risk: narrow Directive: Treat README -> start-here -> session-handoff as the only first-read path unless a newer handoff chain fully replaces it. Tested: git diff --check on touched docs/script; rg for deleted-doc residual refs outside CHANGELOG; reran scripts/run_planner_validation_commands_live.py with executed_count=4 and all_passed=true Not-tested: Markdown link rendering in external viewers
cnb.bofCdSsphPA authored -
Constraint: The README remained a reading-first surface while the handoff had already converged on a faster validated startup command, so the docs entrypoint needed to match the actual recovery workflow. Rejected: Keep the shortest path only in session-handoff | That would still force many sessions to open the wrong document first. Confidence: high Scope-risk: narrow Directive: Treat docs/README.md and docs/session-handoff.md as aligned startup surfaces; keep the runner command identical in both places. Tested: git diff --check; /usr/local/miniconda3/bin/python scripts/run_planner_validation_commands_live.py --dsn 'postgres://d2:d2pass@127.0.0.1:5432/d2' --output data/pgvector_eval/music20/planner_validation_commands_runner_report.json Not-tested: This commit reshapes documentation only; it does not change worker behavior.
cnb.bofCdSsphPA authored -
Constraint: The session handoff had become information-rich enough that the next session still needed manual triage, so the opening section had to be collapsed to one verified command path. Rejected: Keep the handoff primarily as a reading list | That would preserve context but not minimize restart latency. Confidence: high Scope-risk: narrow Directive: Start future sessions with the planner validation runner before reading deeper docs unless the task explicitly skips validation. Tested: git diff --check; verified docs/session-handoff.md now points to scripts/run_planner_validation_commands_live.py backed by fresh data/pgvector_eval/music20/planner_validation_commands_runner_report.json evidence (executed_count=4, all_passed=true) Not-tested: No new code-path execution was needed in this commit because it reorganizes the already-verified startup flow.
cnb.bofCdSsphPA authored -
Constraint: The planner artifact had become executable, but future sessions still needed a reusable entrypoint instead of ad-hoc inline Python to consume it. Rejected: Keep the execution proof as one-off shell snippets | That would not give the next session a durable command surface. Confidence: high Scope-risk: narrow Directive: Use run_planner_validation_commands_live.py as the default preflight gate before attempting new Phase-1 worker changes on a host. Tested: /usr/local/miniconda3/bin/python -m py_compile scripts/run_planner_validation_commands_live.py; git diff --check; /usr/local/miniconda3/bin/python scripts/run_planner_validation_commands_live.py --dsn 'postgres://d2:d2pass@127.0.0.1:5432/d2' --output data/pgvector_eval/music20/planner_validation_commands_runner_report.json Not-tested: The runner only validates planner entrypoints; it does not unlock successful extraction on an environment-blocked host.
cnb.bofCdSsphPA authored -
Constraint: Partial execution proof for planner validation commands still left room for manual reconstruction risk, so the remaining entrypoints had to be exercised too. Rejected: Stop after two executed planner commands | It would leave the negative matrix and asset-upsert entrypoints unproven. Confidence: high Scope-risk: narrow Directive: Treat phase1_validation_commands_execution_report.json as the authoritative proof that the planner artifact is executable end-to-end. Tested: git diff --check; /usr/local/miniconda3/bin/python - <<'PY' ... execute validation_commands.semantic_vector_negative_matrix and validation_commands.asset_level_upsert_validation from data/pgvector_eval/music20/phase1_extraction_plan_report.json ... PY Not-tested: Individual extraction jobs still remain environment-blocked; this commit proves validation entrypoints, not successful feature extraction.
cnb.bofCdSsphPA authored -
Constraint: Adding validation_commands to the planner was only useful if the emitted commands could be consumed directly, so the plan artifact needed one more layer of execution proof. Rejected: Assume command strings are correct because they look valid | That would leave restart automation unproven. Confidence: high Scope-risk: narrow Directive: Prefer executing validation_commands from the planner artifact instead of retyping equivalent checks by hand. Tested: git diff --check; /usr/local/miniconda3/bin/python - <<'PY' ... execute validation_commands.prereq_audit and validation_commands.worker_contract_smoke from data/pgvector_eval/music20/phase1_extraction_plan_report.json ... PY Not-tested: The remaining planner validation commands were not executed in this commit, though their sibling commands proved the artifact is directly consumable.
cnb.bofCdSsphPA authored -
Constraint: The repo now has multiple verified smoke and audit scripts, so leaving them outside the planner would force future sessions to rediscover the right validation commands by reading docs. Rejected: Document the commands only in markdown | That would drift from the executable plan artifact and slow restart execution. Confidence: high Scope-risk: narrow Directive: Treat validation_commands in the planner as the first-stop entrypoints before running individual extraction jobs on a new host. Tested: /usr/local/miniconda3/bin/python -m py_compile scripts/plan_phase1_extraction_jobs_live.py; git diff --check; /usr/local/miniconda3/bin/python scripts/plan_phase1_extraction_jobs_live.py --dsn 'postgres://d2:d2pass@127.0.0.1:5432/d2' --schema acr_test --job-status pending --output data/pgvector_eval/music20/phase1_extraction_plan_report.json Not-tested: The planner still emits commands for an environment-blocked host and does not prove successful extraction by itself.
cnb.bofCdSsphPA authored -
Constraint: Worker-contract validation is now stable enough that the remaining uncertainty is host readiness, so the next blocker had to be made explicit instead of inferred from repeated failed runs. Rejected: Keep prerequisite knowledge only in prose | It would drift and force future sessions to rediscover the same missing mounts and packages. Confidence: high Scope-risk: narrow Directive: Run the prerequisite audit before retrying live extraction so host blockers are measured once and reused across lanes. Tested: /usr/local/miniconda3/bin/python -m py_compile scripts/run_phase1_prereq_audit_live.py; git diff --check; /usr/local/miniconda3/bin/python scripts/run_phase1_prereq_audit_live.py --dsn 'postgres://d2:d2pass@127.0.0.1:5432/d2' --schema acr_test --output data/pgvector_eval/music20/phase1_prereq_audit_report.json Not-tested: This audit does not install dependencies or mount assets; it only reports readiness.
cnb.bofCdSsphPA authored -
Create Dockerfile.cnb based on the optimized Dockerfile with code-server v4.123.0 + 10 VS Code extensions (golang, cnb-welcome, code-runner, kubernetes, coding-copilot, github-theme, zh-hans-langpack, vscode-icons, indent-rainbow, markdown-all-in-one). Constraint: extensions install as user to ~/.local, then switch back to root Confidence: high Scope-risk: narrow Tested: docker build succeeded, all 10 extensions installed OK
cnb.bofCdSsphPA authored -
Constraint: Phase-1 semantic jobs were already blocked by missing audio and model runtimes, so vector-table regressions needed their own isolated live proof to avoid being masked by the same environment failures. Rejected: Infer vector-table coverage from code inspection only | It would not prove the worker writes the correct blocker reasons into PostgreSQL metadata. Confidence: high Scope-risk: narrow Directive: When semantic extraction fails, inspect vector_table_report.reason before assuming the host is only missing mounts or model dependencies. Tested: /usr/local/miniconda3/bin/python -m py_compile scripts/run_embedding_vector_table_negative_matrix_live.py; git diff --check; /usr/local/miniconda3/bin/python scripts/run_embedding_vector_table_negative_matrix_live.py --dsn 'postgres://d2:d2pass@127.0.0.1:5432/d2' --output data/pgvector_eval/music20/embedding_vector_table_negative_matrix_report.json Not-tested: No successful semantic extraction path exists yet on this host; this commit validates negative preflight cases only.
cnb.bofCdSsphPA authored -
Constraint: Phase-1 now has multiple lane-specific validation scripts, so without a single smoke entrypoint the next session must manually reconstruct the current blocker picture. Rejected: Keep exact and semantic checks separate only | It would slow restart diagnosis and hide the shared environment blockers. Confidence: high Scope-risk: narrow Directive: Use the smoke entrypoint first on future sessions to distinguish contract regressions from missing mounts/runtime prerequisites. Tested: /usr/local/miniconda3/bin/python -m py_compile scripts/run_phase1_worker_contract_smoke_live.py; git diff --check; /usr/local/miniconda3/bin/python scripts/run_phase1_worker_contract_smoke_live.py --dsn 'postgres://d2:d2pass@127.0.0.1:5432/d2' --schema acr_test --output data/pgvector_eval/music20/phase1_worker_contract_smoke_report.json Not-tested: This smoke still reflects an environment-blocked host and does not prove successful extraction.
cnb.bofCdSsphPA authored -
Constraint: The schema already declared asset-level idempotency, but without live evidence future work could mistake it for an unverified design note. Rejected: Rely on DDL inspection alone | It would not prove duplicate inserts are blocked and upserts reuse the same embedding row. Confidence: high Scope-risk: narrow Directive: Keep asset-level writer implementations aligned with the verified ON CONFLICT (feature_set_id, asset_id) WHERE window_id IS NULL contract. Tested: /usr/local/miniconda3/bin/python -m py_compile scripts/validate_audio_embedding_asset_upsert_live.py; git diff --check; /usr/local/miniconda3/bin/python scripts/validate_audio_embedding_asset_upsert_live.py --dsn 'postgres://d2:d2pass@127.0.0.1:5432/d2' --schema acr_asset_upsert_test --output data/pgvector_eval/music20/audio_embedding_asset_upsert_live_report.json Not-tested: No production semantic writer uses the asset-level contract yet; this commit validates the DB contract, not an end-to-end extractor.
cnb.bofCdSsphPA authored -
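The `ON CONFLICT (feature_set_id, asset_id) WHERE window_id IS NULL` contract from the commit above relies on a partial unique index: asset-level rows (no window) are unique per feature set and asset, and an upsert reuses the existing row. The sketch below demonstrates this with SQLite as a stand-in, since SQLite accepts the same conflict-target form. It is not the PostgreSQL validation script. The table name `audio_embedding`, the `payload` and `embedding_id` columns, and the helper `upsert_asset_embedding` are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hypothetical table; only feature_set_id, asset_id and window_id
    -- are named in the commit.
    CREATE TABLE audio_embedding (
        embedding_id   INTEGER PRIMARY KEY,
        feature_set_id INTEGER NOT NULL,
        asset_id       INTEGER NOT NULL,
        window_id      INTEGER,           -- NULL marks an asset-level row
        payload        TEXT NOT NULL
    );
    -- Partial unique index: applies only to asset-level rows.
    CREATE UNIQUE INDEX audio_embedding_asset_level_uq
        ON audio_embedding (feature_set_id, asset_id)
        WHERE window_id IS NULL;
    """
)

UPSERT_SQL = """
    INSERT INTO audio_embedding (feature_set_id, asset_id, window_id, payload)
    VALUES (?, ?, NULL, ?)
    ON CONFLICT (feature_set_id, asset_id) WHERE window_id IS NULL
    DO UPDATE SET payload = excluded.payload
"""

LOOKUP_SQL = """
    SELECT embedding_id FROM audio_embedding
    WHERE feature_set_id = ? AND asset_id = ? AND window_id IS NULL
"""


def upsert_asset_embedding(feature_set_id, asset_id, payload):
    """Insert or update the asset-level row and return its embedding_id."""
    conn.execute(UPSERT_SQL, (feature_set_id, asset_id, payload))
    return conn.execute(LOOKUP_SQL, (feature_set_id, asset_id)).fetchone()[0]


if __name__ == "__main__":
    first = upsert_asset_embedding(1, 42, "v1")
    second = upsert_asset_embedding(1, 42, "v2")
    print("same row reused:", first == second)
```

Window-level rows (non-NULL `window_id`) fall outside the index predicate, so this constraint does not limit how many windows an asset may have.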
Constraint: The current container still lacks mounted source audio and the semantic model runtimes, so repeated manual spot-checks are noisy and wasteful. Rejected: Ad-hoc one-job validation only | It would not show whether failures are contract-wide or model-specific. Confidence: high Scope-risk: narrow Directive: Re-run the matrix before claiming any semantic worker progress so blocker drift across MERT/MuQ/ECAPA is visible. Tested: /usr/local/miniconda3/bin/python -m py_compile scripts/run_phase1_embedding_preflight_matrix_live.py; git diff --check; /usr/local/miniconda3/bin/python scripts/run_phase1_embedding_preflight_matrix_live.py --dsn 'postgres://d2:d2pass@127.0.0.1:5432/d2' --schema acr_test --output data/pgvector_eval/music20/phase1_embedding_preflight_matrix_report.json Not-tested: This matrix still cannot prove successful semantic inference until assets and runtime dependencies are available.
cnb.bofCdSsphPA authored -
Constraint: Current container lacks /workspace/downloads and torch/torchaudio/transformers, so Phase-1 semantic work must prove honest failure semantics instead of pretending inference succeeded. Rejected: Stub semantic embeddings | Would blur the contract between real model outputs and repo-local placeholders. Confidence: high Scope-risk: narrow Directive: Keep the preflight blockers explicit until real MERT/MuQ/ECAPA adapters and asset-level embedding tests exist. Tested: /usr/local/miniconda3/bin/python -m py_compile workers/run_embedding_job.py workers/run_chromaprint_job.py workers/_job_common.py scripts/bootstrap_phase1_extraction_jobs_live.py scripts/plan_phase1_extraction_jobs_live.py scripts/bootstrap_phase1_reference_members_live.py scripts/live_pgvector_music20_eval.py; git diff --check; /usr/local/miniconda3/bin/python scripts/bootstrap_phase1_extraction_jobs_live.py --dsn 'postgres://d2:d2pass@127.0.0.1:5432/d2' --schema acr_test; /usr/local/miniconda3/bin/python workers/run_embedding_job.py --dsn 'postgres://d2:d2pass@127.0.0.1:5432/d2' --schema acr_test --job-id 2 --model-name mert --model-version v1-95m --vector-table audio_embedding_vector_768 --output data/pgvector_eval/music20/phase1_worker_embedding_write_attempt.json Not-tested: Real encoder inference and asset-level embedding upsert path remain unavailable in this container.
cnb.bofCdSsphPA authored -
Remove code-server, build-essential, gcc, libc6-dev, pkg-config, libssl-dev from final stage. Add conda clean post-install in builder. Strip unnecessary opencode platform binaries (musl/baseline variants) post-npm-install. Remove redundant COPY layers for opencode (already covered by full node directory copy). Keep opencode.exe entry point (Node.js bootstrap). Constraint: buildkit crashes with 'frontend grpc server closed unexpectedly' on this host; legacy builder used Confidence: high Scope-risk: narrow Directive: opencode.exe is the Node.js bootstrapper, not a Windows binary; do not delete Tested: docker run --rm verified node/npm/bun/python/hx/claude/opencode/nvim all work
cnb.bofCdSsphPA authored -
Constraint: The Phase-1 exact lane must not pretend success when reference audio is unreadable, and repeated writes must be idempotent at the database boundary. Rejected: Keep partial-success writes in completed state | It would blur asset-readability failures and weaken auditability. Confidence: high Scope-risk: moderate Directive: Preserve the repo-local chromaprint-style wording and the all-or-nothing failure semantics until production audio mounts and real extractor validation are in place. Tested: py_compile for chromaprint matcher and chromaprint worker; live PostgreSQL unique index creation on acr_test; non-dry-run chromaprint worker attempt with job_status=failed and failure_reason=unreadable_audio_assets; bootstrap reset back to pending; architect review APPROVED. Not-tested: successful audio_fingerprint writes against mounted production audio, semantic worker real writes, large-scale concurrent exact-lane execution.
cnb.bofCdSsphPA authored