
claude: harvest MR review rules into reviewable, agent-readable docs

Status: Open. Cédric Le Ninivin requested to merge harvest-coding-rules into claude-code-v0 on May 13, 2026.

Adds the end-to-end pipeline that mines unwritten coding rules out of this project's GitLab MR review history and renders them as Nexedi-style docs/coding-rules/*.md pages — both reviewable by humans and loadable as context by coding agents (Claude Code, Cursor, Aider, Codex). AGENTS.md at the root points every such agent at the docs.

Harvesting (tools/harvest-mr-rules/)

download.py + fetch_single_comment_batch.py: Fetch every merged and closed MR's review threads from lab.nexedi.com via the GitLab REST API into a local SQLite corpus (~10 MB, gitignored). Includes per-discussion diff_hunks, resolved-state, reviewer authors, and the linked notes. Resumable via a checkpoint table.
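As an illustration of the resumable download, here is a minimal sketch of a checkpoint table, not the actual download.py. The table layout, the `harvest` function and the `fetch_discussions` callback (which stands in for the GitLab REST calls) are assumptions made for the example.

```python
import sqlite3


def init_corpus(conn):
    """Create a hypothetical two-table corpus: discussions plus a checkpoint."""
    conn.executescript(
        """
        CREATE TABLE IF NOT EXISTS discussions (
            mr_iid        INTEGER NOT NULL,
            discussion_id TEXT    NOT NULL,
            body          TEXT    NOT NULL,
            PRIMARY KEY (mr_iid, discussion_id)
        );
        CREATE TABLE IF NOT EXISTS checkpoint (
            mr_iid INTEGER PRIMARY KEY
        );
        """
    )


def harvest(conn, mr_iids, fetch_discussions):
    """Fetch each MR at most once and return the MR iids fetched in this run.

    fetch_discussions(iid) is a stand-in for the real API client; it must
    yield (discussion_id, body) pairs.
    """
    done = {row[0] for row in conn.execute("SELECT mr_iid FROM checkpoint")}
    fetched = []
    for iid in mr_iids:
        if iid in done:
            continue  # already stored by an earlier (possibly interrupted) run
        # One transaction per MR: the rows and the checkpoint commit together,
        # so a crash never leaves a checkpoint without its data.
        with conn:
            for discussion_id, body in fetch_discussions(iid):
                conn.execute(
                    "INSERT OR REPLACE INTO discussions VALUES (?, ?, ?)",
                    (iid, discussion_id, body),
                )
            conn.execute("INSERT INTO checkpoint VALUES (?)", (iid,))
        fetched.append(iid)
    return fetched
```

Calling `harvest` a second time with an overlapping list only fetches the MRs that are not yet in the checkpoint table.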

schema.sql + filter.sql: Corpus schema + an interesting_discussions view that gates on minimum-length human comments with a reviewer participating.
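The gating view could look roughly like the following sketch. Only the view name interesting_discussions comes from this MR; the `notes` table, its columns, the 40-character threshold and the `interesting_ids` helper are assumptions for illustration, and the real filter.sql may define "reviewer" and "minimum length" differently.

```python
import sqlite3

MIN_BODY_CHARS = 40  # assumed threshold, not taken from filter.sql

SCHEMA = f"""
CREATE TABLE notes (
    discussion_id TEXT    NOT NULL,
    author        TEXT    NOT NULL,
    is_bot        INTEGER NOT NULL DEFAULT 0,
    is_mr_author  INTEGER NOT NULL DEFAULT 0,
    body          TEXT    NOT NULL
);

-- Keep a discussion when, among its human notes, at least one is long
-- enough and at least one was written by someone other than the MR author.
CREATE VIEW interesting_discussions AS
SELECT discussion_id
FROM notes
WHERE is_bot = 0
GROUP BY discussion_id
HAVING MAX(LENGTH(body)) >= {MIN_BODY_CHARS}
   AND SUM(is_mr_author = 0) >= 1;
"""


def interesting_ids(rows):
    """Load rows into an in-memory corpus and return the ids the view keeps."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO notes VALUES (?, ?, ?, ?, ?)", rows)
    query = "SELECT discussion_id FROM interesting_discussions ORDER BY discussion_id"
    return [row[0] for row in conn.execute(query)]
```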

scopes.py: Maps a rule's path-scope to an output area-file (slapos/recipe/ → recipe.md, software/rapid-cdn/ → rapid-cdn.md, …).
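A path-scope mapping of this kind can be sketched as a longest-prefix lookup. The two entries below are the ones named above; the fallback file `general.md` and the function name `area_file` are hypothetical, and the real scopes.py may resolve scopes differently.

```python
# The two mappings quoted in the description; the real table is longer.
SCOPE_TO_AREA = {
    "slapos/recipe/": "recipe.md",
    "software/rapid-cdn/": "rapid-cdn.md",
}

DEFAULT_AREA = "general.md"  # hypothetical fallback for unmatched paths


def area_file(path):
    """Return the area file for a path, preferring the longest matching prefix."""
    matches = [prefix for prefix in SCOPE_TO_AREA if path.startswith(prefix)]
    if not matches:
        return DEFAULT_AREA
    return SCOPE_TO_AREA[max(matches, key=len)]
```

Choosing the longest prefix keeps the lookup unambiguous if a nested scope is added later.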

Curation skills (.claude/skills/)

extract-mr-rules/: Walks a batch of discussions and saves candidates.jsonl entries (rule + rationale + evidence + scope). SKILL.md briefs the agent on the extraction protocol; save_candidate.py / fetch_batch.py do the DB work.
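For orientation, a candidates.jsonl entry with the four fields listed above might be serialized as in this sketch. The field types, the validation and the function names are assumptions; the actual save_candidate.py is not shown in this description and may store more metadata.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class Candidate:
    rule: str        # the rule, phrased as an instruction
    rationale: str   # why reviewers asked for it
    evidence: list   # discussion URLs backing the rule
    scope: str       # path-scope, e.g. "slapos/recipe/"


def format_candidate(candidate):
    """Serialize one candidate as a single JSONL line (without the newline)."""
    if not candidate.rule.strip():
        raise ValueError("a candidate needs a non-empty rule")
    if not candidate.evidence:
        raise ValueError("a candidate needs at least one evidence URL")
    return json.dumps(asdict(candidate), ensure_ascii=False)


def save_candidate(path, candidate):
    """Append the candidate to a JSONL file, one JSON object per line."""
    with open(path, "a", encoding="utf-8") as handle:
        handle.write(format_candidate(candidate) + "\n")
```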

dedupe-rules/: Clusters near-duplicates with simple TF-IDF, asks the agent to decide merge / keep-separate / subsume per pair, then runs promote.py to apply the promotion threshold (≥2 reviewers OR ≥3 MRs → promoted; else soft).
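The two mechanical parts of this step can be sketched with the standard library alone. The threshold (at least 2 reviewers or at least 3 MRs) is the one stated above; the tokenizer, the smoothed IDF formula, the `Evidence` record and all function names are assumptions, not the code in dedupe-rules/ or promote.py.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass


def tfidf_vectors(texts):
    """Return one sparse TF-IDF vector (dict term -> weight) per text."""
    docs = [Counter(re.findall(r"[a-z0-9_]+", text.lower())) for text in texts]
    doc_freq = Counter(term for doc in docs for term in doc)
    n = len(docs)
    # Smoothed IDF, so terms that occur in every text keep a small weight.
    idf = {term: math.log((1 + n) / (1 + df)) + 1 for term, df in doc_freq.items()}
    return [{term: tf * idf[term] for term, tf in doc.items()} for doc in docs]


def cosine(a, b):
    """Cosine similarity of two sparse vectors; 0.0 if either is empty."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


@dataclass(frozen=True)
class Evidence:
    mr_iid: int     # merge request where the rule was raised
    reviewer: str   # reviewer who raised it


def promotion_status(evidence):
    """Apply the threshold: >=2 distinct reviewers OR >=3 distinct MRs."""
    reviewers = {item.reviewer for item in evidence}
    mrs = {item.mr_iid for item in evidence}
    return "promoted" if len(reviewers) >= 2 or len(mrs) >= 3 else "soft"
```

Pairs whose cosine similarity exceeds some cutoff would be the ones handed to the agent for a merge / keep-separate / subsume decision; the cutoff itself is not stated in this description.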

attach-examples/: Enriches rules.jsonl with a one-line title, a Bad code snippet (extracted verbatim from the source discussion's diff_hunk, re-anchored against position_new_line so GitLab's preview-truncated hunks line up), and a Good code snippet (extracted verbatim from git show origin/master:, anchor-matched against the Bad's first non-blank line with a line-range fallback). No LLM. Files deleted on master → examples_status: needs-curation.
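The anchor match with a line-range fallback can be sketched as follows. This is an illustration under assumptions: the function names, the whitespace-insensitive comparison and the choice to return as many lines as the Bad snippet has are not taken from attach-examples/.

```python
def first_nonblank(lines):
    """Return the first non-blank line, stripped, or None if there is none."""
    for line in lines:
        if line.strip():
            return line.strip()
    return None


def extract_good(master_lines, bad_lines, fallback_start):
    """Pick the Good snippet out of the file as it exists on master.

    master_lines:   the file content on master, split into lines
    bad_lines:      the Bad snippet taken from the discussion's diff hunk
    fallback_start: 1-based line number to use when the anchor is not found
    """
    length = len(bad_lines)
    anchor = first_nonblank(bad_lines)

    start = None
    if anchor is not None:
        for index, line in enumerate(master_lines):
            if line.strip() == anchor:
                start = index  # first occurrence wins
                break

    if start is None:
        # Line-range fallback: clamp the recorded position into the file.
        start = max(0, min(fallback_start - 1, len(master_lines) - 1))

    return master_lines[start:start + length]
```

When the anchor line itself was rewritten by the fix, the lookup fails and the fallback range is used instead.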

place-rules/: Renders rules.jsonl into docs/coding-rules/*.md as per-rule H3 sections with a TOC table at the top of each area file. Each section: title + metadata strip + description + Bad fenced block + Good fenced block + Why-this-rule rationale + Evidence URL list. INDEX.md preamble (between META-RULES BEGIN/END markers) is preserved across regenerations so the hand-edited meta-rules summary survives.
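Preserving a hand-edited block across regenerations can be sketched like this. The exact marker syntax is an assumption (HTML comments are used here); only the META-RULES BEGIN/END naming comes from the description, and `regenerate_index` is a hypothetical name.

```python
# Assumed marker syntax; the real INDEX.md may spell the markers differently.
BEGIN = "<!-- META-RULES BEGIN -->"
END = "<!-- META-RULES END -->"


def regenerate_index(old_text, generated_body):
    """Return new INDEX.md text: the preserved preamble plus the fresh body."""
    start = old_text.find(BEGIN)
    end = old_text.find(END, start + len(BEGIN)) if start != -1 else -1

    if start == -1 or end == -1:
        # No complete marker pair yet: emit an empty block to edit by hand.
        preserved = BEGIN + "\n" + END
    else:
        # Keep everything from BEGIN through END verbatim.
        preserved = old_text[start:end + len(END)]

    return preserved + "\n\n" + generated_body
```

Everything outside the marker pair is treated as generated output and replaced on each run.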

Pre-commit hook (.claude/hooks/ + tools/check-coding-rules/)

Spawns claude -p with the staged diff plus the area-relevant rule files; warns on the commit if the diff appears to violate a rule. Warn-only by default; CODING_RULES_BLOCK=1 makes it fatal. Skippable with CODING_RULES_SKIP=1. Installed both as a git pre-commit hook (terminal commits) and as a Claude Code PreToolUse(Bash) hook registered in .claude/settings.json.
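The warn-only / fatal / skip policy can be summarized in a few lines. The two environment variable names are the ones given above; the function name, the assumption that `CODING_RULES_SKIP` takes precedence over `CODING_RULES_BLOCK`, and the representation of violations as a list of strings are illustrative choices, and the call to claude is deliberately left out.

```python
def hook_exit_code(violations, env):
    """Map the check result to a process exit code for the pre-commit hook.

    violations: list of human-readable rule violations found in the diff
    env:        mapping of environment variables (for example os.environ)
    """
    if env.get("CODING_RULES_SKIP") == "1":
        return 0  # check skipped entirely (assumed to win over BLOCK)
    if violations and env.get("CODING_RULES_BLOCK") == "1":
        return 1  # fatal mode: a non-zero exit aborts the commit
    return 0      # warn-only default: the commit goes through
```

A hook script would print the violations as warnings and then exit with `hook_exit_code(violations, os.environ)`.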

Re-running the pipeline

download.py refreshes the corpus → /extract-mr-rules walks new discussions → /dedupe-rules promotes → /attach-examples re-attaches Bad/Good snippets → /place-rules regenerates the docs. Rule IDs are sticky once allocated; existing rules survive a re-run unchanged.
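One way to keep rule IDs sticky across re-runs is sketched below. The ID format (`R` plus a zero-padded counter), the idea of keying rules by a stable string, and the function name are all hypothetical; the description only states that IDs do not change once allocated.

```python
def allocate_ids(existing, rule_keys, prefix="R"):
    """Return a key -> ID mapping that never changes an existing ID.

    existing:  mapping of rule key -> ID from the previous run
    rule_keys: keys of the rules present in this run, in a stable order
    """
    ids = dict(existing)  # old rules survive, even if absent from this run
    highest = max((int(rule_id[len(prefix):]) for rule_id in ids.values()), default=0)
    next_number = highest + 1
    for key in rule_keys:
        if key not in ids:
            ids[key] = f"{prefix}{next_number:04d}"
            next_number += 1
    return ids
```

Numbering continues from the highest ID ever allocated, so a deleted rule's ID is not reused.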
