ArchiveBox is an open-source, self-hosted web archiving tool. It preserves websites, bookmarks, RSS feeds, social posts, media, evidence, research sources, and institutional records in durable formats: HTML, PDF, PNG, WARC, JSON, SQLite, and ordinary files on the filesystem.
Everything it saves stays in portable formats that you control, which makes it well suited to personal archiving, OSINT, journalism, and long-term knowledge preservation.
✨ Features
📦 Multiple capture formats — HTML, PDF, PNG, TXT, JSON, WARC, MP4, SQLite and file system formats
🔁 Multiple captures per URL — screenshots, article text, headers, favicons, media, git repos
🌐 Import from anywhere — bookmarks, browser history, RSS, JSON, CSV, TXT, Markdown, Pocket, Pinboard, Shaarli
🖥️ Web UI + CLI + REST API — manage archives however you prefer
🧩 Extensible ecosystem — plugins, extractors, automation tools
🔒 Self‑hosted by default — keep everything on infrastructure you control
🗂️ Readable for decades — snapshots stored as ordinary files and folders
🐳 Docker‑friendly — recommended setup with bundled dependencies (Chrome, wget, yt‑dlp, SingleFile, Readability)
⚙️ Highly configurable — environment variables, config file, or CLI
🧠 Great for professionals — journalists, lawyers, researchers, OSINT teams or personal archivers
🚀 Automate everything — scheduled imports, webhooks, API‑driven workflows
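
As a rough illustration of the import and automation features above, here is a minimal Python sketch that turns a plain list of URLs into `archivebox add` invocations. It assumes the `archivebox` CLI is installed and on your PATH, and that your version accepts the `--depth` option with the values 0 or 1 (check `archivebox add --help` to confirm). The helper names `build_add_command` and `commands_from_lines` are hypothetical and exist only for this example, and skipping blank lines and `#` comments is a convention of this sketch, not something ArchiveBox requires. The script only builds and prints the commands; it does not run them.

```python
import shlex
from typing import Iterable, List


def build_add_command(url: str, depth: int = 0) -> List[str]:
    """Build the argument list for one `archivebox add` call (hypothetical helper)."""
    # Assumption: --depth accepts 0 (just the URL) or 1 (the URL plus pages it links to).
    if depth not in (0, 1):
        raise ValueError("depth must be 0 or 1")
    return ["archivebox", "add", f"--depth={depth}", url]


def commands_from_lines(lines: Iterable[str], depth: int = 0) -> List[List[str]]:
    """Turn lines of text into commands, skipping blanks and '#' comments."""
    commands = []
    for line in lines:
        url = line.strip()
        # Skipping blank lines and '#' comments is this sketch's own convention.
        if not url or url.startswith("#"):
            continue
        commands.append(build_add_command(url, depth))
    return commands


if __name__ == "__main__":
    sample = [
        "https://example.com",
        "",
        "# reading list",
        "https://example.org/feed.xml",
    ]
    # Print shell-quoted commands instead of executing them.
    for cmd in commands_from_lines(sample):
        print(shlex.join(cmd))
```

To actually archive the URLs, each argument list could be passed to `subprocess.run(cmd, check=True)` from inside an initialized ArchiveBox data directory. Building argument lists rather than shell strings avoids quoting problems with URLs that contain `&` or `?`.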
