A small-model-friendly Python CLI that scans a repository, finds duplicate or near-duplicate code/logic, and prepares safe replacement plans.
- Walks a repo while honoring `.gitignore`-style excludes.
- Extracts Python functions and methods with `ast`.
- Normalizes code to improve duplicate detection.
- Finds exact duplicates by hashing normalized bodies.
- Finds near-duplicates with embeddings and FAISS.
- Produces a JSON plan and Markdown report for human review.
- Emits prompt bundles so a small local LLM can propose replacements with tight context windows.
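
The extraction, normalization, and exact-duplicate steps listed above can be sketched with the standard library alone. This is a minimal illustration and not the tool's actual implementation: the function names `normalized_body_hash` and `find_exact_duplicates` are hypothetical, the normalization shown (dropping docstrings, comments, and layout) is only one possible choice, and the `path:start-end:name` key format is an assumption modeled on the member key used in the patch example in this README.

```python
import ast
import hashlib
from collections import defaultdict


def normalized_body_hash(func: ast.AST) -> str:
    """Hash a function body, ignoring docstrings, comments, and layout."""
    body = list(func.body)
    # Drop a leading docstring so documentation differences do not matter.
    if (
        body
        and isinstance(body[0], ast.Expr)
        and isinstance(body[0].value, ast.Constant)
        and isinstance(body[0].value.value, str)
    ):
        body = body[1:]
    # ast.dump omits line/column attributes by default, and comments never
    # reach the AST, so only the code structure is hashed.
    dumped = "".join(ast.dump(node) for node in body)
    return hashlib.sha256(dumped.encode("utf-8")).hexdigest()


def find_exact_duplicates(source: str, path: str = "<memory>") -> list[list[str]]:
    """Group functions and methods whose normalized bodies hash identically."""
    tree = ast.parse(source)
    groups = defaultdict(list)
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            # Hypothetical key format: path:start-end:name
            key = f"{path}:{node.lineno}-{node.end_lineno}:{node.name}"
            groups[normalized_body_hash(node)].append(key)
    return [sorted(keys) for keys in groups.values() if len(keys) > 1]


SAMPLE = '''
def area(w, h):
    """Rectangle area."""
    return w * h


def rect_area(w, h):
    # same logic, different comment
    return w * h


def perimeter(w, h):
    return 2 * (w + h)
'''

if __name__ == "__main__":
    for cluster in find_exact_duplicates(SAMPLE):
        print(cluster)
```

Because this sketch does not rename identifiers, two functions that differ only in parameter names hash differently; that kind of pair is what the near-duplicate stage is meant to catch.
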
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

python -m repo_dedupe_tool scan /path/to/repo --out findings
python -m repo_dedupe_tool prompts findings/plan.json --out findings/prompts

- `report.md`: readable summary
- `plan.json`: machine-readable candidate actions
- `prompts/*.md`: one prompt per duplicate cluster

Use the export command to create a zip archive of the project directory.
python -m repo_dedupe_tool export output/repo_dedupe_tool --zip-path repo_dedupe_tool.zip

The tool is designed to stop at a plan stage first. After user review, you can add a patch-generation step that converts one approved cluster into a unified diff and applies it only after explicit confirmation.

- Run `scan` to produce `plan.json`.
- Review a cluster and prepare a replacement snippet file that contains the approved function body.
- Generate a patch:
python -m repo_dedupe_tool make-patch findings/plan.json /path/to/repo \
--cluster-id cluster-001 \
--member-key 'src/app.py:10-35:old_func' \
--replacement-file approved_replacement.py \
--out findings/patches

- Apply it only after confirmation:
python -m repo_dedupe_tool apply-patch findings/patches/cluster-001_old_func.patch /path/to/repo
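
The idea of turning an approved replacement into a unified diff and gating it behind explicit confirmation can be illustrated with `difflib` from the standard library. This is a hedged sketch rather than a description of how `make-patch` or `apply-patch` work internally; `make_patch` and `confirmed` are hypothetical helper names, and the sample function bodies are invented for the example.

```python
import difflib


def make_patch(original: str, replacement: str, rel_path: str) -> str:
    """Return a unified diff that turns `original` into `replacement`."""
    diff = difflib.unified_diff(
        original.splitlines(keepends=True),
        replacement.splitlines(keepends=True),
        fromfile=f"a/{rel_path}",
        tofile=f"b/{rel_path}",
    )
    # An empty string means the two texts are identical.
    return "".join(diff)


def confirmed(answer: str) -> bool:
    """Treat only an explicit 'y' or 'yes' as approval."""
    return answer.strip().lower() in {"y", "yes"}


if __name__ == "__main__":
    before = "def old_func():\n    return 1\n"
    after = "def old_func():\n    return helper()\n"
    patch = make_patch(before, after, "src/app.py")
    print(patch)
    if confirmed(input("Apply this patch? [y/N] ")):
        print("approved; the patch would be applied here")
    else:
        print("nothing applied")
```

Defaulting to "no" on anything other than an explicit yes keeps the behavior aligned with the plan-first design: an empty or mistyped answer never modifies the repository.
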