Schema-driven document extraction with local OCR + LLM. Document in, Structured JSON out.
Document in, Structured JSON out. Locally. With your schema.

docpick is a lightweight, schema-driven document extraction pipeline that combines local OCR engines with local LLMs to extract structured JSON from any document: invoices, receipts, bills of lading, tax forms, and more.

- Zero cloud dependency: runs entirely on your machine (CPU or GPU)
- Custom schemas: define your own Pydantic models or use 8 built-in document schemas
- Validation built-in: checkdigit verification, cross-field rules,…
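To make the validation idea concrete, here is a minimal sketch of the two kinds of checks named above: a check-digit test and a cross-field rule. It is not docpick's API. The names `luhn_valid`, `InvoiceTotals`, and `cross_field_errors` are hypothetical, the Luhn algorithm is only one common check-digit scheme (which schemes docpick actually applies is not stated here), and a plain dataclass stands in for a Pydantic model so the example needs nothing beyond the standard library.

```python
from dataclasses import dataclass
from decimal import Decimal


def luhn_valid(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn check.

    Hypothetical helper: Luhn is used here only as a familiar example
    of a check-digit scheme.
    """
    digits = [int(ch) for ch in number if ch.isdigit()]
    if len(digits) < 2:
        return False
    total = 0
    # Walk right to left, doubling every second digit.
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


@dataclass
class InvoiceTotals:
    """Hypothetical extracted fields (stand-in for a Pydantic model)."""
    subtotal: Decimal
    tax: Decimal
    total: Decimal


def cross_field_errors(
    inv: InvoiceTotals, tolerance: Decimal = Decimal("0.01")
) -> list[str]:
    """Cross-field rule: subtotal + tax should equal total within a tolerance."""
    errors = []
    if abs(inv.subtotal + inv.tax - inv.total) > tolerance:
        errors.append("subtotal + tax does not match total")
    return errors
```

Checks like these are useful after OCR and LLM extraction because a single misread digit usually breaks either the check digit or the arithmetic, so the error can be flagged instead of silently passing through as valid JSON.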