Software & AI | February 8, 2026

Nginx: The Reverse Proxy That Protects and Accelerates Every Application We Build

None of our applications — RAG Enterprise, CRM81, LetsAI — is directly exposed to the internet. Nginx is the guardian that sits in front of everything: terminates SSL, compresses responses, rate-limits traffic, and forwards requests to backends. Here's how we configure it in production.


SSL Termination and Let's Encrypt: HTTPS Without Compromise

Every on-premise RAG Enterprise deployment gets a dedicated client domain (e.g., rag.company.com) with a free SSL certificate from Let's Encrypt via Certbot. Nginx handles SSL termination, so the FastAPI backend receives cleartext traffic on localhost:8000 without ever being directly exposed. This greatly simplifies backend configuration.

Our SSL configuration follows Mozilla's "Modern" profile: TLS 1.3 only, restricted cipher suites (TLS_AES_256_GCM_SHA384, TLS_CHACHA20_POLY1305_SHA256), HSTS with a one-year max-age, and OCSP stapling enabled. The result is an A+ score on SSL Labs for every installation. That's not vanity: banking and legal sector clients require it contractually.

Certbot renews certificates automatically 30 days before expiration via a systemd timer. In two years of deployments we have never had a certificate expire. The Nginx reload after renewal is zero-downtime: nginx -s reload lets the old workers finish their active connections before shutting down.
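A minimal sketch of such a server block, assuming an illustrative domain and the standard Certbot certificate paths (exact directives vary per deployment):

```nginx
# Redirect all plain-HTTP traffic to HTTPS
server {
    listen 80;
    server_name rag.company.com;
    return 301 https://$host$request_uri;
}

server {
    listen 443 ssl http2;
    server_name rag.company.com;

    # Certificates issued and renewed by Certbot
    ssl_certificate     /etc/letsencrypt/live/rag.company.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/rag.company.com/privkey.pem;

    # Mozilla "Modern" profile: TLS 1.3 only
    ssl_protocols TLSv1.3;
    ssl_prefer_server_ciphers off;

    # HSTS with one-year max-age
    add_header Strict-Transport-Security "max-age=31536000; includeSubDomains" always;

    # OCSP stapling
    ssl_stapling on;
    ssl_stapling_verify on;

    location / {
        # SSL is terminated here; FastAPI sees plain HTTP on localhost
        proxy_pass http://127.0.0.1:8000;
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto https;
    }
}
```

The X-Forwarded-* headers let the backend reconstruct the original client IP and scheme even though it only ever sees local cleartext traffic.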

Rate Limiting, Gzip, and Performance Optimization

A RAG Enterprise system receives heavy requests: each query can generate a 2-5 KB text response, and document indexing can upload multi-megabyte files. Without rate limiting, a single user could saturate the LLM with continuous requests. Our Nginx configuration enforces a limit of 10 requests per second per IP on the /api/query endpoint with a burst of 20, and 2 requests per minute on the /api/upload endpoint to prevent abuse.

Gzip compression is enabled for all text content types (application/json, text/html, text/plain) at level 6, which offers the best compression/CPU ratio. JSON responses from our RAG, which can include extensive document citations, compress by an average of 75%. On a typical enterprise connection this means 200 ms less perceived latency.

We have also configured Nginx caching for the React frontend's static assets. JS/CSS files with hashed names are served with Cache-Control: max-age=31536000 (one year). API responses are never cached: every RAG query must be fresh. For SSE streaming endpoints, proxy_buffering is disabled so AI responses arrive token by token without buffering.
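The pieces above can be sketched as follows. Zone names, paths, and the upload size limit are illustrative, and the limit_req_zone and gzip directives live at the http level; note that text/html is compressed by gzip by default and must not be repeated in gzip_types:

```nginx
# http-level: per-IP rate-limit zones keyed on the client address
limit_req_zone $binary_remote_addr zone=query:10m  rate=10r/s;
limit_req_zone $binary_remote_addr zone=upload:10m rate=2r/m;

# Gzip for text content types at level 6 (text/html is included by default)
gzip on;
gzip_comp_level 6;
gzip_types application/json text/plain;

server {
    listen 443 ssl;
    server_name rag.company.com;

    # Query endpoint: 10 r/s per IP, burst of 20
    location /api/query {
        limit_req zone=query burst=20 nodelay;
        proxy_pass http://127.0.0.1:8000;
    }

    # Upload endpoint: 2 r/m per IP; allow multi-megabyte bodies
    location /api/upload {
        limit_req zone=upload;
        client_max_body_size 50m;   # illustrative limit
        proxy_pass http://127.0.0.1:8000;
    }

    # SSE streaming: disable buffering so tokens pass through immediately
    location /api/stream {
        proxy_pass http://127.0.0.1:8000;
        proxy_http_version 1.1;
        proxy_set_header Connection "";
        proxy_buffering off;
    }

    # Hashed frontend assets: cache aggressively for one year
    location /assets/ {
        root /var/www/rag-frontend;
        add_header Cache-Control "public, max-age=31536000, immutable";
    }
}
```

Requests over the rate limit receive a 503 by default (configurable via limit_req_status), which the frontend can surface as a "slow down" message.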

WebSocket Proxying and Multi-Service Configuration

RAG Enterprise PRO uses Server-Sent Events for response streaming, but CRM81 requires bidirectional WebSocket for real-time notifications and integrated chat. Nginx handles both protocols on the same domain with separate location blocks. For WebSocket, the key directives are proxy_set_header Upgrade $http_upgrade and proxy_set_header Connection "upgrade", with a proxy_read_timeout of 3600 seconds to keep idle connections alive.

On more complex installations, where the client wants RAG Enterprise and CRM81 on the same server, we use Nginx as an application router. The /rag/ path forwards to localhost:8000 (FastAPI), /crm/ to localhost:3000 (Node.js), and the React frontend is served directly by Nginx as static files. One server, one domain, one SSL certificate.

For clients with high availability requirements, we configure Nginx as a load balancer with upstream blocks and health checks. Two backend instances run on different ports (8000 and 8001), with Nginx distributing traffic and automatically removing an unresponsive backend. Failover happens in under 5 seconds and the client notices nothing. This setup has let us perform zero-downtime upgrades even on on-premise deployments.
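A condensed sketch of the combined setup, with illustrative ports, paths, and WebSocket route. Open-source Nginx does health checking passively, via max_fails/fail_timeout on each upstream server (active probing is an NGINX Plus feature):

```nginx
# Two FastAPI instances; a backend failing 3 times in 5 s is taken out of rotation
upstream rag_backend {
    server 127.0.0.1:8000 max_fails=3 fail_timeout=5s;
    server 127.0.0.1:8001 max_fails=3 fail_timeout=5s;
}

server {
    listen 443 ssl;
    server_name apps.company.com;

    # React frontend served directly as static files
    root /var/www/frontend/dist;

    # RAG Enterprise (FastAPI, load-balanced across the upstream)
    location /rag/ {
        proxy_pass http://rag_backend/;
        proxy_set_header Host $host;
    }

    # CRM81 (Node.js)
    location /crm/ {
        proxy_pass http://127.0.0.1:3000/;
        proxy_set_header Host $host;
    }

    # CRM81 WebSocket: upgrade handshake plus a long read timeout
    location /crm/ws/ {
        proxy_pass http://127.0.0.1:3000;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
        proxy_read_timeout 3600s;
    }
}
```

The trailing slash on proxy_pass strips the /rag/ or /crm/ prefix before forwarding, so each backend can keep serving from its own root.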

Interested?

Contact us to receive a personalized quote.


Securvita S.r.l. — i3k.eu