Self-hosted, privacy-first document conversion API. Covers Office-to-PDF, HTML-to-PDF, PDF manipulation and OCR, and is designed to slot into CI/CD pipelines without per-conversion fees or third-party uploads.
Existing document conversion services charge per conversion or require uploading sensitive files to third-party servers. I needed enterprise-grade conversion that runs anywhere with no vendor lock-in.
Built a Python async stack with FastAPI for the API layer, SQLAlchemy 2.0 async with PostgreSQL 16 for persistence, Redis 7 powering both the arq job queue and rate limiting, and MinIO for S3-compatible object storage. Delegated the heavy lifting to Gotenberg (LibreOffice and headless Chromium) for Office-to-PDF and HTML-to-PDF, and to Stirling-PDF (Tesseract OCR) for PDF manipulation and OCR. Workers run independently of the API. Every external engine call sits behind a circuit breaker.
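To illustrate the circuit breaker idea, here is a minimal async sketch using only the standard library. It is not the project's actual implementation: the class name, the thresholds, the `clock` parameter and the three-state model (closed, open, half-open) are assumptions chosen for illustration.

```python
import time
from typing import Awaitable, Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitOpenError(RuntimeError):
    """Raised when a call is rejected because the breaker is open."""


class CircuitBreaker:
    """Hypothetical breaker: opens after N consecutive failures, retries after a timeout."""

    def __init__(
        self,
        failure_threshold: int = 3,
        reset_timeout: float = 30.0,
        clock: Callable[[], float] = time.monotonic,  # injectable for tests
    ) -> None:
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self._clock = clock
        self._failures = 0
        self._opened_at: Optional[float] = None

    @property
    def state(self) -> str:
        if self._opened_at is None:
            return "closed"
        if self._clock() - self._opened_at >= self.reset_timeout:
            return "half_open"  # timeout elapsed: allow a trial call
        return "open"

    async def call(self, func: Callable[[], Awaitable[T]]) -> T:
        if self.state == "open":
            # Fail fast instead of waiting on an engine that is known to be down.
            raise CircuitOpenError("engine unavailable, failing fast")
        try:
            result = await func()
        except Exception:
            self._failures += 1
            # A failed trial call in half-open, or reaching the threshold, (re)opens the circuit.
            if self._opened_at is not None or self._failures >= self.failure_threshold:
                self._opened_at = self._clock()
            raise
        # Any success closes the circuit and resets the failure count.
        self._failures = 0
        self._opened_at = None
        return result
```

A worker would keep one breaker per engine and wrap each outbound request, for example `await gotenberg_breaker.call(lambda: convert(job))`, where `gotenberg_breaker` and `convert` are hypothetical names. This sketch lets every caller through while half-open; a production breaker would usually admit a single trial call at a time.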
Foundation and core conversion phases complete. The API currently handles job submission, authentication, rate limiting and the main conversion types. Testing combines property-based tests with Hypothesis on the validation logic and Testcontainers-backed integration tests, with a coverage target of 80% or higher.