
FFmpeg in the real world: how we use it in LetsAI to process AI video and audio
FFmpeg is one of those tools everyone knows by name but few run in production. We put it at the center of LetsAI's pipeline for transcoding, format conversion and compositing. Here's how it works when handling around 1,500 files per day.

Why FFmpeg and not cloud services
The first temptation was a managed service such as AWS MediaConvert or Cloudflare Stream. Then we did the math: at LetsAI's file volume, the cost would be unsustainable. A 30-second AI video at 1080p weighs 50-80 MB. With hundreds of generations per day, that adds up to terabytes per month. FFmpeg runs on our own servers, costs nothing in license fees (it is LGPL by default, GPL when built with components such as libx264), and covers everything we need. The trade-off is that you must know how to configure it, and the docs aren't user-friendly. But after a few weeks we had a solid pipeline, and it hasn't failed in 18 months of production.

The pipeline: from raw AI file to final format
When an AI provider returns video or audio, the format is almost always different from what's needed. One model generates WebM, but the user wants MP4. Another generates WAV at 48kHz when we need MP3 at 44.1kHz. The pipeline:
1. Automatic analysis with ffprobe (codec, resolution, bitrate, duration)
2. Audio normalization: loudness at -14 LUFS (the target used by major streaming platforms), silence removal
3. Video transcoding: H.264 for universal compatibility
4. Thumbnail generation: frame at 1/3 of the duration, 640x360
5. Output in the requested formats (MP4, WebM, MP3, WAV, FLAC)
Everything runs in a separate BullMQ worker. Average time: 8-15 seconds for a 30-second video.
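
Steps 1 to 4 above could be sketched roughly as follows. This is an illustration, not the actual LetsAI code: the function names, the MediaInfo type, and every encoder setting (CRF, preset, audio bitrate, true-peak and loudness-range values, silence threshold, the order of silence removal and loudness normalization) are assumptions. Only the -14 LUFS target, H.264, the 1/3 seek point and the 640x360 size come from the text. The functions only build argument lists, which a worker would then hand to ffprobe or ffmpeg.

```typescript
// Subset of the JSON printed by:
//   ffprobe -v error -print_format json -show_format -show_streams <file>
interface ProbeStream {
  codec_type?: string;
  codec_name?: string;
  width?: number;
  height?: number;
}
interface ProbeOutput {
  streams?: ProbeStream[];
  format?: { duration?: string; bit_rate?: string };
}

// Hypothetical summary type, introduced for this sketch only.
interface MediaInfo {
  durationSec: number;
  bitRate: number;
  videoCodec: string | undefined;
  audioCodec: string | undefined;
  width: number | undefined;
  height: number | undefined;
}

// Step 1: arguments for ffprobe; its stdout is the JSON parsed by parseProbe.
function probeArgs(input: string): string[] {
  return ["-v", "error", "-print_format", "json", "-show_format", "-show_streams", input];
}

// ffprobe reports duration and bit_rate as strings, so convert them to numbers.
function parseProbe(json: string): MediaInfo {
  const probe = JSON.parse(json) as ProbeOutput;
  const streams = probe.streams ?? [];
  const video = streams.find((s) => s.codec_type === "video");
  const audio = streams.find((s) => s.codec_type === "audio");
  return {
    durationSec: Number(probe.format?.duration ?? NaN),
    bitRate: Number(probe.format?.bit_rate ?? NaN),
    videoCodec: video?.codec_name,
    audioCodec: audio?.codec_name,
    width: video?.width,
    height: video?.height,
  };
}

// Steps 2 and 3: trim leading silence, normalize to -14 LUFS, encode H.264/AAC.
// All numeric settings except I=-14 are assumed values.
function transcodeArgs(input: string, output: string): string[] {
  const audioFilter =
    "silenceremove=start_periods=1:start_threshold=-50dB," +
    "loudnorm=I=-14:TP=-1.5:LRA=11";
  return [
    "-y", "-i", input,
    "-af", audioFilter,
    "-c:v", "libx264", "-preset", "medium", "-crf", "23", "-pix_fmt", "yuv420p",
    "-c:a", "aac", "-b:a", "192k", "-ar", "48000",
    "-movflags", "+faststart",
    output,
  ];
}

// Step 4: grab one frame at a third of the duration, scaled to 640x360.
function thumbnailArgs(input: string, output: string, durationSec: number): string[] {
  const seek = (durationSec / 3).toFixed(3);
  return ["-y", "-ss", seek, "-i", input, "-frames:v", "1", "-vf", "scale=640:360", output];
}
```

Keeping the builders free of side effects makes them easy to cover with unit tests, with the process spawning left to the worker.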

Concatenation: merging AI clips into a coherent video
One of the most requested features is generating multiple short clips and merging them. It sounds simple, but in reality it's a nightmare if the clips have different codecs, resolutions or frame rates. FFmpeg has the concat filter, but it only works if the inputs share the same specs, and in the real world they don't. Our solution: normalize every clip first (same resolution, codec, frame rate), then concatenate. We crossfade between scenes with the xfade filter; 0.5 seconds is enough for continuity. For audio: loudness normalization, crossfade between tracks, final mix. The resulting FFmpeg command runs to 20-30 lines and is generated dynamically in Node.js.
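
A minimal sketch of how such a filter graph might be generated, assuming every clip has exactly one video and one audio stream and that clip durations are already known from ffprobe. The function name, the label names and the 1920x1080 / 30 fps / 48 kHz targets are hypothetical. The filters themselves (scale, fps, format, settb, xfade, loudnorm, aformat, acrossfade) are standard FFmpeg filters. The part that is easy to get wrong is the xfade offset: each transition starts at the length of the video built so far minus the fade duration.

```typescript
interface CrossfadeGraph {
  filter: string;      // value for -filter_complex
  videoLabel: string;  // final video pad, to be used with -map
  audioLabel: string;  // final audio pad, to be used with -map
  totalSec: number;    // expected duration of the merged output
}

function buildCrossfadeGraph(
  durations: number[],
  fade = 0.5,
  width = 1920,   // assumed target resolution
  height = 1080,
  fps = 30,       // assumed target frame rate
): CrossfadeGraph {
  if (durations.length === 0) throw new Error("need at least one clip");
  if (durations.some((d) => d <= fade)) throw new Error("every clip must be longer than the fade");

  const parts: string[] = [];

  // Normalize every input so xfade and acrossfade see identical stream parameters.
  // Note that scale stretches to the target size; aspect ratio is not preserved here.
  durations.forEach((_, i) => {
    parts.push(
      `[${i}:v]scale=${width}:${height},setsar=1,fps=${fps},format=yuv420p,settb=AVTB[v${i}]`,
    );
    parts.push(
      `[${i}:a]loudnorm=I=-14:TP=-1.5:LRA=11,aformat=sample_rates=48000:channel_layouts=stereo[a${i}]`,
    );
  });

  // Chain the transitions: result of clip 0+1 fades into clip 2, and so on.
  let videoLabel = "v0";
  let audioLabel = "a0";
  let elapsed = durations[0];
  for (let i = 1; i < durations.length; i++) {
    const offset = (elapsed - fade).toFixed(3);
    parts.push(`[${videoLabel}][v${i}]xfade=transition=fade:duration=${fade}:offset=${offset}[vx${i}]`);
    parts.push(`[${audioLabel}][a${i}]acrossfade=d=${fade}[ax${i}]`);
    videoLabel = `vx${i}`;
    audioLabel = `ax${i}`;
    // Each crossfade overlaps the two clips, so the total grows by (duration - fade).
    elapsed += durations[i] - fade;
  }

  return { filter: parts.join(";"), videoLabel, audioLabel, totalSec: elapsed };
}

// Full ffmpeg argument list: one -i per clip, then the graph and the mapped outputs.
function concatArgs(inputs: string[], durations: number[], output: string): string[] {
  if (inputs.length !== durations.length) throw new Error("one duration per input required");
  const graph = buildCrossfadeGraph(durations);
  const inputArgs = inputs.flatMap((path) => ["-i", path]);
  return [
    "-y", ...inputArgs,
    "-filter_complex", graph.filter,
    "-map", `[${graph.videoLabel}]`, "-map", `[${graph.audioLabel}]`,
    "-c:v", "libx264", "-c:a", "aac",
    output,
  ];
}
```

For three clips of 10, 8 and 6 seconds with a 0.5 second fade, the two transitions start at 9.5 and 17 seconds, and the merged video lasts 23 seconds.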

Common mistakes and how we solved them
FFmpeg is powerful, but it punishes mistakes. Ours:
Memory: FFmpeg can eat all available RAM. We use -threads 2 and a limited -bufsize, with a maximum of 2GB per worker.
Corrupted files: AI providers sometimes return truncated files. We check with ffprobe first; if the file is corrupted, we discard it and notify.
Timeout: any transcoding job that runs over 60 seconds gets killed. Better to fail fast than block the queue.
Permissions: the worker runs as a dedicated user with access only to the working directory.
Every error became an automated CI test. Today: 1,500 files/day, error rate under 0.3%.
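
Two of these guards can be illustrated with a short sketch. Everything named here is hypothetical (ProbeSummary, looksCorrupted, limitedEncodeArgs), and the 5M / 10M rate values are placeholders, not the production settings. The corruption check is only a heuristic: a truncated file can still carry a plausible duration in its header, so a non-zero ffprobe exit code should be treated as a failure too. Note that -bufsize sets the encoder's rate-control buffer and is normally paired with -maxrate.

```typescript
// Hypothetical summary of an ffprobe run, for this sketch only.
interface ProbeSummary {
  exitCode: number;     // ffprobe process exit code
  durationSec: number;  // NaN when ffprobe reported no duration
  streamCount: number;  // number of entries in "streams"
}

// Reject files that ffprobe could not read or that report nothing playable.
function looksCorrupted(probe: ProbeSummary): boolean {
  return (
    probe.exitCode !== 0 ||
    probe.streamCount === 0 ||
    !Number.isFinite(probe.durationSec) ||
    probe.durationSec <= 0
  );
}

// Kill limit from the text: jobs longer than 60 seconds are terminated.
const TRANSCODE_TIMEOUT_MS = 60000;

// Encode with capped threads and a bounded rate-control buffer.
// The 5M / 10M values are assumptions chosen for illustration.
function limitedEncodeArgs(input: string, output: string): string[] {
  return [
    "-y", "-i", input,
    "-threads", "2",
    "-c:v", "libx264",
    "-maxrate", "5M", "-bufsize", "10M",
    output,
  ];
}
```

One way to enforce the limit in Node.js is the timeout option that child_process.spawn accepts, passing TRANSCODE_TIMEOUT_MS so the process is killed when the limit is exceeded. How the 2GB memory cap per worker is enforced is not described in the text, so it is left out of the sketch.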
Securvita S.r.l. — i3k.eu