Adds the end-to-end pipeline that mines unwritten coding rules out of this project's GitLab MR review history and renders them as Nexedi-style docs/coding-rules/*.md pages. The pages are both reviewable by humans and loadable as context by coding agents (Claude Code, Cursor, Aider, Codex). AGENTS.md at the root points every such agent at the docs.
Harvesting (tools/harvest-mr-rules/)
download.py + fetch_single_comment_batch.py Fetch every merged and closed MR's review threads from lab.nexedi.com via the GitLab REST API into a local SQLite corpus (~10 MB, gitignored). Includes per-discussion diff_hunks, resolved-state, reviewer authors, and the linked notes. Resumable via checkpoint table.
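The resumable behaviour can be pictured with a minimal sketch. The table names, column names, and the `fetch_discussions` callable below are assumptions made for illustration; they are not taken from download.py or schema.sql.

```python
import sqlite3
from typing import Callable, Iterable


def harvest(conn: sqlite3.Connection,
            mr_iids: Iterable[int],
            fetch_discussions: Callable[[int], list]) -> int:
    """Fetch discussions for each MR once; skip MRs already checkpointed.

    Hypothetical schema: `checkpoint` and `discussions` are illustrative names.
    """
    conn.execute("CREATE TABLE IF NOT EXISTS checkpoint (mr_iid INTEGER PRIMARY KEY)")
    conn.execute(
        "CREATE TABLE IF NOT EXISTS discussions "
        "(mr_iid INTEGER, discussion_id TEXT, body TEXT)"
    )
    done = {row[0] for row in conn.execute("SELECT mr_iid FROM checkpoint")}
    fetched = 0
    for iid in mr_iids:
        if iid in done:
            continue  # already harvested in an earlier run
        rows = fetch_discussions(iid)  # stand-in for the REST call
        with conn:  # data and checkpoint are committed together, or not at all
            conn.executemany(
                "INSERT INTO discussions VALUES (?, ?, ?)",
                [(iid, d["id"], d["body"]) for d in rows],
            )
            conn.execute("INSERT INTO checkpoint VALUES (?)", (iid,))
        fetched += 1
    return fetched
```

Writing the checkpoint row in the same transaction as the data means an interrupted run leaves no half-recorded MR behind, so a second run only picks up what is missing.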
schema.sql + filter.sql Corpus schema plus an interesting_discussions view that keeps only discussions containing a human comment of at least a minimum length and with a reviewer participating.
scopes.py Maps a rule's path-scope to an output area-file (slapos/recipe/ → recipe.md, software/rapid-cdn/ → rapid-cdn.md, …).
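A scope-to-area mapping of this kind might look like the sketch below. Only the two prefixes named above come from this description; the longest-prefix rule and the `general.md` fallback are assumptions, not the behaviour of scopes.py.

```python
# Prefix table: the two entries are the examples given in this description.
SCOPE_TO_AREA = [
    ("slapos/recipe/", "recipe.md"),
    ("software/rapid-cdn/", "rapid-cdn.md"),
]

# Hypothetical fallback for scopes that match no prefix.
DEFAULT_AREA = "general.md"


def area_file_for(path_scope: str) -> str:
    """Return the area file for a rule's path scope (longest matching prefix wins)."""
    matches = [(prefix, area) for prefix, area in SCOPE_TO_AREA
               if path_scope.startswith(prefix)]
    if not matches:
        return DEFAULT_AREA
    return max(matches, key=lambda m: len(m[0]))[1]
```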
Curation skills (.claude/skills/)
extract-mr-rules/ Walks a batch of discussions and saves candidates.jsonl entries (rule + rationale + evidence + scope). SKILL.md briefs the agent on the extraction protocol; save_candidate.py / fetch_batch.py do the DB work.
dedupe-rules/ Clusters near-duplicates with simple TF-IDF, asks the agent to decide merge / keep-separate / subsume per pair, then runs promote.py to apply the promotion threshold (≥2 reviewers OR ≥3 MRs → promoted; else soft).
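The promotion threshold can be written down as a small function. This is a sketch of the stated rule only; the evidence record shape (`reviewer`, `mr_iid` keys) is an assumption and may differ from what promote.py reads.

```python
def promotion_status(evidence: list) -> str:
    """Apply the threshold: >=2 distinct reviewers OR >=3 distinct MRs -> promoted.

    Each evidence item is assumed to be a dict with hypothetical
    'reviewer' and 'mr_iid' keys.
    """
    reviewers = {item["reviewer"] for item in evidence}
    merge_requests = {item["mr_iid"] for item in evidence}
    if len(reviewers) >= 2 or len(merge_requests) >= 3:
        return "promoted"
    return "soft"
```

Counting distinct reviewers and distinct MRs, rather than raw comments, keeps one reviewer repeating the same remark in a single MR from promoting a rule.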
attach-examples/ Enriches rules.jsonl with a one-line title, a Bad code snippet (extracted verbatim from the source discussion's diff_hunk, re-anchored against position_new_line so GitLab's preview-truncated hunks line up), and a Good code snippet (extracted verbatim from git show origin/master:<path>, anchor-matched against the Bad snippet's first non-blank line with a line-range fallback). No LLM is involved. Rules whose files were deleted on master are marked examples_status: needs-curation.
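The anchor match with a line-range fallback for the Good snippet can be sketched as follows. The function name, its arguments, and the choice to return as many lines as the Bad snippet has are assumptions for illustration, not the actual attach-examples code.

```python
def extract_good(master_text: str, bad_snippet: str,
                 start_line: int, end_line: int) -> str:
    """Locate the Good snippet in the file content taken from master.

    First try to find the Bad snippet's first non-blank line (the anchor);
    if it no longer exists, fall back to the 1-based inclusive line range.
    """
    master_lines = master_text.splitlines()
    bad_lines = bad_snippet.splitlines()
    first = next((i for i, line in enumerate(bad_lines) if line.strip()), None)
    if first is not None:
        anchor = bad_lines[first].strip()
        length = len(bad_lines) - first  # assumed: same size as the Bad snippet
        for i, line in enumerate(master_lines):
            if line.strip() == anchor:
                return "\n".join(master_lines[i:i + length])
    # Fallback: the anchor was rewritten or removed, so use the recorded range.
    return "\n".join(master_lines[start_line - 1:end_line])
```

Comparing stripped lines makes the match tolerant of re-indentation between the reviewed diff and master.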
place-rules/ Renders rules.jsonl into docs/coding-rules/*.md as per-rule H3 sections with a TOC table at the top of each area file. Each section: title + metadata strip + description + Bad fenced block + Good fenced block + Why-this-rule rationale + Evidence URL list. INDEX.md preamble (between META-RULES BEGIN/END markers) is preserved across regenerations so the hand-edited meta-rules summary survives.
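Preserving the hand-edited block across regenerations might work along these lines. The exact marker syntax is not given here, so the HTML-comment form below is an assumption, as is the function name.

```python
# Assumed marker spelling; the real markers may be formatted differently.
BEGIN = "<!-- META-RULES BEGIN -->"
END = "<!-- META-RULES END -->"


def regenerate_index(existing: str, generated_body: str) -> str:
    """Return new INDEX.md content, keeping the marked preamble from `existing`."""
    start = existing.find(BEGIN)
    stop = existing.find(END, start + len(BEGIN)) if start != -1 else -1
    if start == -1 or stop == -1:
        # No complete marker pair: nothing to preserve.
        return generated_body
    preamble = existing[start:stop + len(END)]
    return preamble + "\n\n" + generated_body
```

Everything outside the marker pair is treated as generated output and replaced on each run.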
Pre-commit hook (.claude/hooks/ + tools/check-coding-rules/)
Spawns claude -p with the staged diff plus the area-relevant rule
files; warns on the commit if the diff appears to violate a rule.
Warn-only by default; CODING_RULES_BLOCK=1 makes it fatal. Skippable
with CODING_RULES_SKIP=1. Installed both as a git pre-commit hook
(terminal commits) and as a Claude Code PreToolUse(Bash) hook
registered in .claude/settings.json.
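The warn-only, blocking, and skip modes reduce to a small decision on the environment. The sketch below covers only that decision; the function name and the `violations` list are hypothetical, and the model call itself is left out.

```python
import sys
from typing import Mapping, Sequence


def hook_exit_code(env: Mapping[str, str], violations: Sequence[str]) -> int:
    """Map the environment and the reported violations to a hook exit status."""
    if env.get("CODING_RULES_SKIP") == "1":
        return 0  # check skipped entirely
    for message in violations:
        print("coding-rules warning: " + message, file=sys.stderr)
    if violations and env.get("CODING_RULES_BLOCK") == "1":
        return 1  # non-zero exit aborts the commit
    return 0  # warn-only default


if __name__ == "__main__":
    import os
    # An empty violation list stands in for the real check result.
    sys.exit(hook_exit_code(os.environ, []))
```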
Re-running the pipeline
download.py refreshes the corpus → /extract-mr-rules walks new discussions → /dedupe-rules promotes → /attach-examples re-attaches Bad/Good snippets → /place-rules regenerates the docs. Rule IDs are sticky once allocated; existing rules survive a re-run unchanged.
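Sticky rule IDs could be kept as in the following sketch. The integer ID format and the idea of keying rules by a stable string are assumptions; this description does not say how the pipeline identifies a rule.

```python
from typing import Dict, Iterable


def allocate_ids(existing: Dict[str, int], rule_keys: Iterable[str]) -> Dict[str, int]:
    """Keep every already-allocated ID and give new rules the next free number.

    `existing` maps a hypothetical stable rule key to its ID.
    """
    ids = dict(existing)
    next_id = max(ids.values(), default=0) + 1
    for key in rule_keys:
        if key not in ids:
            ids[key] = next_id
            next_id += 1
    return ids
```

Because existing entries are never renumbered, links to a rule stay valid across re-runs.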