Paperless

AI-Powered Document Classification with paperless-ai and Ollama

·2743 words·13 mins
This post is a complete runbook for integrating AI-powered auto-tagging and classification into paperless-ngx using paperless-ai and a locally running Ollama instance. The setup uses a local LLM to read document text and automatically populate metadata fields: title, document type, tags, correspondent, date, and custom fields.

Hardware and Architecture

NAS (Synology DS1621+, 10.0.10.10): runs paperless-ngx on port 5656
Desktop PC: Windows with WSL2, Docker Desktop, RTX 4090
Goal: AI auto-tagging/classification using a local LLM, zero cloud dependency

The key architecture decision is a pull model: paperless-ai runs in WSL2 Docker, polls the paperless-ngx API for documents tagged ai-pending, processes them with Ollama, and writes metadata back. This suits a desktop that is not on 24/7: the NAS holds the queue and the desktop drains it whenever it is available.
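To make the pull model concrete, here is a minimal Python sketch of one polling pass. It is an illustration of the pattern, not paperless-ai's actual implementation. The NAS address and port come from the post; the Ollama port, the model name, the API token, and the numeric ID of the ai-pending tag are placeholder assumptions, and the sketch only reads the first page of results.

```python
import json
import urllib.request
from typing import Callable, Iterable

PAPERLESS_URL = "http://10.0.10.10:5656"  # NAS address and port from the post
OLLAMA_URL = "http://localhost:11434"     # assumption: Ollama on its default port
PENDING_TAG_ID = 1                        # hypothetical ID of the ai-pending tag
API_TOKEN = "changeme"                    # hypothetical paperless-ngx API token
MODEL = "llama3.1"                        # hypothetical model name


def _request(url: str, payload=None, method: str = "GET", token=None) -> dict:
    """Send a JSON request and decode the JSON response."""
    headers = {"Content-Type": "application/json"}
    if token:
        headers["Authorization"] = f"Token {token}"
    data = json.dumps(payload).encode() if payload is not None else None
    req = urllib.request.Request(url, data=data, headers=headers, method=method)
    with urllib.request.urlopen(req, timeout=120) as resp:
        return json.load(resp)


def fetch_pending() -> list:
    """Ask paperless-ngx for documents carrying the pending tag (first page only)."""
    url = f"{PAPERLESS_URL}/api/documents/?tags__id__all={PENDING_TAG_ID}"
    return _request(url, token=API_TOKEN)["results"]


def classify(text: str) -> dict:
    """Ask the local model for a JSON object with a suggested title."""
    prompt = 'Reply with JSON like {"title": "..."} for this document:\n' + text[:4000]
    body = {"model": MODEL, "prompt": prompt, "stream": False, "format": "json"}
    reply = _request(f"{OLLAMA_URL}/api/generate", payload=body, method="POST")
    return json.loads(reply["response"])


def write_back(doc_id: int, patch: dict) -> None:
    """Write the new metadata to the document on the NAS."""
    _request(f"{PAPERLESS_URL}/api/documents/{doc_id}/", payload=patch,
             method="PATCH", token=API_TOKEN)


def build_patch(doc: dict, suggestion: dict, pending_tag_id: int = PENDING_TAG_ID) -> dict:
    """Drop the pending tag so the document leaves the queue; set the title if given."""
    patch = {"tags": [t for t in doc["tags"] if t != pending_tag_id]}
    if suggestion.get("title"):
        patch["title"] = suggestion["title"]
    return patch


def drain_queue(fetch: Callable[[], Iterable[dict]],
                classify_fn: Callable[[str], dict],
                write: Callable[[int, dict], None]) -> int:
    """Process every pending document once; failures stay queued for the next pass."""
    done = 0
    for doc in fetch():
        try:
            suggestion = classify_fn(doc["content"])
            write(doc["id"], build_patch(doc, suggestion))
        except Exception as exc:  # keep the tag so a later pass can retry
            print(f"skipping document {doc['id']}: {exc}")
            continue
        done += 1
    return done
```

A real pass would be `drain_queue(fetch_pending, classify, write_back)`, run on a timer whenever the desktop is up. Because the pending tag is only removed after a successful write, a document that fails or is interrupted mid-run simply stays in the queue on the NAS.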

Paperless-ngx: Migrating a Decade of Documents from Google Drive

Runbook and design journal for migrating ~400 personal documents from a folder-based Google Drive system into Paperless-ngx on a Synology NAS. Covers taxonomy design, bulk import from Google Takeout, ML classifier setup, and ongoing intake workflow.

Problem Statement

For years my “document management” was a manually maintained folder tree on Google Drive:

    10 - 文书材料/
    10 - 证件材料/身份证件/
    30 - 移民文档/
    30 - Tax Filing/
    40 - Finance/
    50 - 车辆注册/
    60 - 住房买房/
    80 - Medical/
    20 - 家装住房信息/
    80 - 旅行计划/

This worked well enough for filing but poorly for retrieval. Finding “what insurance forms did I have in 2022?” meant navigating six folders and guessing what I had named things. Paperless-ngx offers full-text search, OCR, and an ML classifier that learns from your own labeling, which makes it a meaningfully better system for a document archive that spans immigration paperwork, tax filings, mortgage docs, and medical records across 10+ years.