Lyall Paldano
© 2026 Lyall Paldano. All rights reserved.

DocConvert

Sep 2025 - Present

Self-hosted, privacy-first document conversion API. It covers Office to PDF, HTML to PDF, PDF manipulation and OCR, and is designed to slot into CI/CD pipelines without per-conversion fees or third-party uploads.

Tech Stack
Python, FastAPI, PostgreSQL 16, SQLAlchemy 2.0, Redis 7, arq, MinIO, Gotenberg, Stirling-PDF, Tesseract, Docker Compose, Traefik, Hypothesis

Challenge

Existing document conversion services charge per conversion or require uploading sensitive files to third-party servers. I needed enterprise-grade conversion that runs anywhere with no vendor lock-in.

Solution

Built a Python async stack with FastAPI for the API layer, SQLAlchemy 2.0 async with PostgreSQL 16 for persistence, Redis 7 powering both the arq job queue and rate limiting, and MinIO for S3-compatible object storage. Delegated the heavy lifting to Gotenberg (LibreOffice and headless Chromium) for Office and HTML to PDF, and to Stirling-PDF (with Tesseract) for PDF manipulation and OCR. Workers run independently of the API, so conversions never block request handling. Every external engine call sits behind a circuit breaker, so a failing engine is rejected quickly instead of tying up workers.
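To illustrate the circuit breaker idea, here is a minimal sketch in async Python. It is not the project's actual code: the `CircuitBreaker` class, the `CircuitOpenError` exception, and the `failure_threshold`, `reset_timeout` and `clock` parameters are all hypothetical names chosen for this example, and it uses only the standard library.

```python
import asyncio
import time
from typing import Awaitable, Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitOpenError(RuntimeError):
    """Raised when a call is rejected because the breaker is open."""


class CircuitBreaker:
    """Hypothetical breaker: opens after N consecutive failures."""

    def __init__(
        self,
        failure_threshold: int = 3,
        reset_timeout: float = 30.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self._clock = clock  # injectable so tests can control time
        self._failures = 0
        self._opened_at: Optional[float] = None

    @property
    def state(self) -> str:
        if self._opened_at is None:
            return "closed"
        if self._clock() - self._opened_at >= self.reset_timeout:
            return "half-open"  # timeout elapsed, trial calls are allowed
        return "open"

    async def call(self, func: Callable[[], Awaitable[T]]) -> T:
        if self.state == "open":
            # Fail fast instead of waiting on an engine known to be down.
            raise CircuitOpenError("engine unavailable, failing fast")
        try:
            result = await func()
        except Exception:
            self._failures += 1
            if self._failures >= self.failure_threshold:
                # Trip the breaker, or re-open it after a failed trial call.
                self._opened_at = self._clock()
            raise
        # Any success closes the breaker and clears the failure count.
        self._failures = 0
        self._opened_at = None
        return result
```

In a worker, a call to a conversion engine would be wrapped as `await breaker.call(lambda: convert(job))`, where `convert` stands in for whatever coroutine talks to the engine. This sketch lets every caller through while half-open; a stricter version would admit a single trial call at a time.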

Outcome

Foundation and core conversion phases complete. The API handles job submission, authentication, rate limiting and the main conversion types. Validation logic is covered by property-based tests with Hypothesis, alongside testcontainers-backed integration tests, with an 80%+ coverage target.

Skills Learned

Async Python at the API layer
Worker and API decoupling
Circuit breaker patterns
Property-based testing with Hypothesis
Self-hosted document pipelines
Vertical slice delivery