Your private document brain.
PDFs in, RAG out. Self-hosted. Plug everywhere.
Quickstart · What you get · Plug everywhere · Docs · OpenAPI · Changelog
v1.1.0: cloud integrations end-to-end. Connect Dropbox / Google Drive / OneDrive from the UI (paste your own OAuth client_id + secret if the operator hasn't), browse and import any file from the cloud, send any OCR result back as md, docx, xlsx, obsidian, or zip, and configure watched folders (cloud or local) that auto-submit new files for OCR. See the changelog.

v1.0.0: side-by-side multi-model comparison with server-computed word-level diff, model recommendations from your own OCR history, PII auto-redaction with an audit trail, form-field extraction, LaTeX equation extraction, and an E2E encryption scaffold (RSA SPKI public-key registration + AES-256-GCM envelope).
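The word-level diff behind the comparison view can be approximated with Python's stdlib `difflib`. This is a hedged sketch of the idea, not Extracto's actual implementation:

```python
import difflib

def word_diff(a: str, b: str) -> list[tuple[str, str]]:
    """Token-level diff between two OCR outputs.

    Returns (op, word) pairs where op is "equal", "delete", or "insert".
    """
    aw, bw = a.split(), b.split()
    out: list[tuple[str, str]] = []
    sm = difflib.SequenceMatcher(a=aw, b=bw)
    for op, i1, i2, j1, j2 in sm.get_opcodes():
        if op == "equal":
            out.extend(("equal", w) for w in aw[i1:i2])
        else:  # a "replace" opcode surfaces as delete + insert
            out.extend(("delete", w) for w in aw[i1:i2])
            out.extend(("insert", w) for w in bw[j1:j2])
    return out
```

A UI can then render "delete" words struck through and "insert" words highlighted, which is what a side-by-side model comparison needs.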
Most document-to-AI tools are SaaS. They cost per page, they see your documents, they lock you into one provider. Extracto is the opposite: one Docker container, your machine, any vision model (local or hosted), output goes wherever you want it. Browser, code, agent, vector store. You pick.
A complete pipeline from raw document to retrievable knowledge, in one container:
- Ingest any PDF, image, watched local folder, or watched Dropbox / Google Drive / OneDrive folder.
- Extract with the vision model of your choice (Ollama, Mistral OCR, OpenRouter, any OpenAI-compatible endpoint).
- Post-process with a second LLM pass (clean to markdown or strict JSON, with your own instruction).
- Chunk + embed + store into Chroma, Qdrant, Weaviate, Milvus, OpenSearch, Pinecone, or Typesense.
- Retrieve through a stable v1 REST API, an OpenAI-Chat-Completions adapter, an MCP server, a typed CLI, or the browser UI.
- Push any result back to Dropbox / Google Drive / OneDrive, S3/MinIO, or download as md, json, docx, rtf, csv, xlsx, obsidian, or per-page zip.
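The chunk step above is, in spirit, a sliding window over the text. A minimal illustrative chunker (fixed word-window with overlap; Extracto's real chunking strategy and parameters may differ):

```python
def chunk(text: str, size: int = 200, overlap: int = 40) -> list[str]:
    """Split text into overlapping word windows so retrieval never
    loses context at a chunk boundary."""
    words = text.split()
    if len(words) <= size:
        return [" ".join(words)] if words else []
    step = size - overlap
    return [" ".join(words[i:i + size])
            for i in range(0, len(words) - overlap, step)]
```

Each chunk is then embedded and written to whichever vector store you configured.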
Everything else (per-user accounts, scoped API keys, rate limits, signed webhooks, S3/MinIO offload, Prometheus metrics, multi-language UI, per-user OAuth credentials when the operator hasn't preconfigured them) is documented at extracto.help.
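Signed webhooks like the ones mentioned above are conventionally verified by recomputing an HMAC over the raw request body. A generic HMAC-SHA256 sketch — Extracto's actual header name and encoding are documented at extracto.help and may differ:

```python
import hashlib
import hmac

def verify_webhook(secret: bytes, body: bytes, signature_hex: str) -> bool:
    """Recompute HMAC-SHA256 of the raw body and compare in constant
    time against the hex signature sent in the webhook header."""
    expected = hmac.new(secret, body, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signature_hex)
```

Always verify against the raw bytes before JSON-parsing, and use a constant-time compare as shown; both are standard practice for any signed-webhook consumer.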
You need Docker. That's it.
```sh
curl -fsSL https://extracto.help/install.sh | bash
```

This pulls the prebuilt multi-arch image, runs a single container with an auto-generated AUTH_SECRET and a persistent SQLite volume, waits for the healthcheck, and prints the URL. Open http://localhost:3000, sign up, follow the tour.
For the full install (compose stack, Docker + Ollama provisioning, extracto CLI on PATH, Windows path), see extracto.help/install.
Same backend, five surfaces. Pick what fits.
| Surface | Use it when | Read |
|---|---|---|
| Browser UI | You're a human with a stack of PDFs | How it works |
| REST API (`/api/v1/*`) | You're building a document-intake pipeline | API reference |
| MCP server | Your agent speaks MCP (Claude Desktop, Cursor, Codex, OpenClaw, Hermes) | Agents |
| CLI + SKILL.md | Your agent only has a shell tool | Skill file |
| OpenAI-Chat adapter | You already have OpenAI-SDK code; point it at Extracto | OpenAI compat |
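Pointing existing OpenAI-SDK code at Extracto usually means changing only the base URL and API key. Sketched here with stdlib `urllib` instead of the SDK; the base URL, path, model name, and key format are assumptions — see the OpenAI compat doc for the real values:

```python
import json
import urllib.request

# Hypothetical values — check the OpenAI compat doc for the real ones.
BASE_URL = "http://localhost:3000/api/v1"
API_KEY = "sk-extracto-example"

def build_chat_request(question: str) -> urllib.request.Request:
    """Build a Chat-Completions-shaped POST against the adapter.
    Any OpenAI SDK client can do the same by overriding base_url."""
    payload = {
        "model": "extracto",  # assumed; the adapter routes to your configured model
        "messages": [{"role": "user", "content": question}],
    }
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode(),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {API_KEY}",
        },
        method="POST",
    )

# To actually send:
#   urllib.request.urlopen(build_chat_request("What does the contract say about termination?"))
```

The point of the adapter is exactly this: existing OpenAI-SDK code keeps its request shape and only the endpoint changes.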
OpenAPI 3.1 spec at openapi.yaml. Live Scalar reference at /api/v1/docs on every running instance.
MIT © codelined
