When Your Celery Workers Ran Tasks Twice: Migrating from Redis to RabbitMQ

What happened (the incident): We had a production incident in which long-running AI pricing tasks were requeued and re-executed, doubling downstream LLM API costs and wasting CPU time. The short timeline: (1) A batch of long-running tasks (each ~20–25 minutes) completed on workers. (2) Immediately after completion, the network layer silently dropped the broker TCP connection. (3) Because the tasks were running with late ACKs, Celery never acknowledged the finished tasks to Redis. (4) Redis treated the tasks as unacknowledged and requeued them; workers picked them up again and executed them a second time. The financial impact was immediate: LLM calls and compute doubled for the affected runs. The incident also exposed two failure modes of using Redis as a Celery broker for long-running tasks: visibility timeout semantics and idle TCP connection drops. ...

March 1, 2026 · 10 min · Zeeshan Khan

Extracting Structured Content from PDFs Using OCR and LLM

Introduction: In this post, I document the steps I took to parse PDFs and extract structured output using OCR and an LLM. The goal is to extract structured content from PDFs and other documents with high accuracy. The results of this experiment were quite impressive: the process was straightforward, and the data extracted from various documents was accurate, well-structured, and useful. ...

August 30, 2024 · 3 min · Zeeshan Khan