
Linux Server: The Foundation of Every On-Premise AI Deployment
Every RAG Enterprise and CRM81 instance we deploy runs on Ubuntu Server. It's not a random choice: Linux gives us total control over resources, security, and the reliability that enterprise clients demand. Here's how we configure and harden our servers.

Ubuntu Server and Systemd: The Heart of Our Deployments
When an enterprise client signs the contract for RAG Enterprise on-premise, the first thing we do is prepare an Ubuntu 22.04 LTS server. The LTS choice is deliberate: it gives us 5 years of security updates without disruptive changes to system dependencies. We had a client who kept the same installation for 3 years without a single OS-related issue.

Every component of our stack runs as a systemd unit: the FastAPI backend, the Qdrant database, the embedding service, the document indexing worker. Systemd gives us automatic restart on crash (Restart=on-failure with RestartSec=5s), centralized log management with journald, and startup ordering between services (After=qdrant.service), so FastAPI is started only after the Qdrant unit.

We use systemd units with advanced sandboxing: ProtectSystem=strict, PrivateTmp=true, NoNewPrivileges=true. Every service runs with the minimum required permissions. The FastAPI service can't write outside its data directory, and the indexing worker can't access the network (a restriction that takes a network-isolation setting on top of the three directives above). It's defense in depth at the process level.
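To make the unit layout described above concrete, here is a minimal Python sketch that renders a unit file from the directives named in the text. It is an illustration, not our actual deployment tooling: the `ServiceSpec` fields, the service name, user, and paths are hypothetical, and the `Requires=` and `ReadWritePaths=` lines are assumptions added to show how a hard dependency and a writable data directory are usually expressed alongside `After=` and `ProtectSystem=strict`.

```python
from dataclasses import dataclass


@dataclass
class ServiceSpec:
    """Inputs for one unit file. All example values are illustrative."""
    name: str
    exec_start: str
    user: str
    data_dir: str
    after: str = ""  # e.g. "qdrant.service"; empty means no dependency


def render_unit(spec: ServiceSpec) -> str:
    """Render a systemd unit using the directives named in the article."""
    unit = [f"Description={spec.name}"]
    if spec.after:
        # After= only orders startup. Requires= (an assumption here)
        # adds the hard dependency on the other unit.
        unit += [f"After={spec.after}", f"Requires={spec.after}"]
    service = [
        f"User={spec.user}",
        f"ExecStart={spec.exec_start}",
        "Restart=on-failure",
        "RestartSec=5s",
        "ProtectSystem=strict",
        "PrivateTmp=true",
        "NoNewPrivileges=true",
        # ProtectSystem=strict mounts the file system read-only for the
        # service; ReadWritePaths= re-opens only the data directory.
        f"ReadWritePaths={spec.data_dir}",
    ]
    install = ["WantedBy=multi-user.target"]
    sections = [("Unit", unit), ("Service", service), ("Install", install)]
    return "\n\n".join(
        f"[{title}]\n" + "\n".join(lines) for title, lines in sections
    ) + "\n"


# Hypothetical backend service, used only to show the output shape.
example = ServiceSpec(
    name="example-backend",
    exec_start="/opt/example/venv/bin/uvicorn app:app",
    user="example-svc",
    data_dir="/var/lib/example",
    after="qdrant.service",
)
print(render_unit(example))
```

Generating units from one template keeps the sandboxing directives identical across services, so a missing `NoNewPrivileges=true` on a single unit is harder to overlook.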
Hardening: UFW, Fail2ban, and SSH Access
Server security starts from minute one. As soon as the OS is installed, our Ansible playbook applies our hardening baseline. The UFW firewall is configured with a deny-all inbound policy: we only open ports 22 (SSH), 443 (HTTPS via Nginx), and 80 (redirect to HTTPS). No database port, no Qdrant port, no backend port is directly exposed. Everything goes through the reverse proxy.

Fail2ban monitors SSH and Nginx logs. After 3 failed attempts in 10 minutes, the IP gets banned for 24 hours. For SSH we have an even stricter policy: authentication only via RSA or Ed25519 keys, password login disabled, root login disabled, and access restricted to a dedicated group (AllowGroups i3k-deploy). In 2 years of on-premise deployments, we've never had a breach.

We also manage automatic security updates with unattended-upgrades, configured to apply only security patches without touching application packages. Every week a cron job sends us a report on the server's security status. If a critical CVE is published for an installed package, we know within 24 hours.
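The ban policy above (3 failures within 10 minutes, 24-hour ban) boils down to a sliding-window counter per IP. The Python sketch below illustrates that logic only; it is not how Fail2ban is implemented internally, and the `BanTracker` class and its method names are hypothetical. Timestamps are plain seconds so the behavior is easy to follow.

```python
from collections import defaultdict, deque

MAX_RETRY = 3              # failed attempts that trigger a ban
FIND_TIME = 10 * 60        # window length in seconds (10 minutes)
BAN_TIME = 24 * 60 * 60    # ban length in seconds (24 hours)


class BanTracker:
    """Sliding-window ban policy, one failure queue per IP address."""

    def __init__(self):
        self._failures = defaultdict(deque)  # ip -> recent failure times
        self._banned_until = {}              # ip -> time the ban expires

    def is_banned(self, ip, now):
        """Return True while the ban for this IP has not expired."""
        return self._banned_until.get(ip, 0.0) > now

    def record_failure(self, ip, now):
        """Register a failed login; return True if the IP is now banned."""
        if self.is_banned(ip, now):
            return True
        window = self._failures[ip]
        window.append(now)
        # Drop failures that fell out of the 10-minute window.
        while window and now - window[0] > FIND_TIME:
            window.popleft()
        if len(window) >= MAX_RETRY:
            self._banned_until[ip] = now + BAN_TIME
            window.clear()
            return True
        return False


tracker = BanTracker()
for second in (0, 60, 120):
    banned = tracker.record_failure("203.0.113.7", second)
print(banned)  # the third failure inside the window triggers the ban
```

The window is what separates a typo from an attack: three failures spread over an hour never trigger a ban, while three in two minutes do.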
Resource Monitoring and Performance Management
A RAG Enterprise system on an on-premise server consumes resources very differently from a traditional web application. Document embedding is CPU-intensive and can saturate all cores for hours during initial indexing. Vector search on Qdrant requires RAM proportional to index size. The LLM, if run locally, needs a dedicated GPU.

That's why we monitor everything with a combination of native Linux tools and Prometheus. Scripts triggered by systemd timers collect CPU, RAM, disk I/O, and GPU temperature metrics every 30 seconds. If RAM usage exceeds 85% or GPU temperature exceeds 82°C, we get a Slack alert. We've saved at least two servers from the OOM killer by intervening proactively.

Server sizing is an art we've refined through experience. For a RAG Enterprise installation with up to 50,000 documents, we recommend a minimum of 32 GB RAM, 8 cores, and a 500 GB NVMe SSD. For larger installations (over 200,000 documents) we spec 64 GB RAM and consider an NVIDIA T4 GPU for local embedding. These specs aren't theory: they're the result of 20+ real deployments.
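The alert rule above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not our production collector: the function names are hypothetical, RAM usage is derived from the `MemTotal` and `MemAvailable` fields of `/proc/meminfo`, and the GPU temperature is passed in as a number because how it is read depends on the hardware and driver.

```python
RAM_LIMIT_PERCENT = 85.0   # alert threshold for RAM usage
GPU_LIMIT_CELSIUS = 82.0   # alert threshold for GPU temperature


def ram_used_percent(meminfo_text):
    """Compute used RAM in percent from the content of /proc/meminfo."""
    fields = {}
    for line in meminfo_text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            fields[key.strip()] = int(parts[0])  # values are in kB
    total = fields["MemTotal"]
    available = fields["MemAvailable"]
    return 100.0 * (total - available) / total


def check_thresholds(ram_percent, gpu_temp_c=None):
    """Return one alert message per exceeded threshold."""
    alerts = []
    if ram_percent > RAM_LIMIT_PERCENT:
        alerts.append(f"RAM usage at {ram_percent:.1f}%")
    # gpu_temp_c is None on servers without a GPU.
    if gpu_temp_c is not None and gpu_temp_c > GPU_LIMIT_CELSIUS:
        alerts.append(f"GPU temperature at {gpu_temp_c:.0f} C")
    return alerts


# Made-up sample in the /proc/meminfo format (32 GB total, 4 GB available).
sample = """MemTotal:       32000000 kB
MemFree:         1000000 kB
MemAvailable:    4000000 kB
"""
print(check_thresholds(ram_used_percent(sample), gpu_temp_c=70.0))
```

On a real server the text would come from reading `/proc/meminfo`, and the returned messages would be forwarded to the alerting channel. Using `MemAvailable` rather than `MemFree` avoids false alarms caused by the page cache, which the kernel can reclaim under pressure.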
Securvita S.r.l. — i3k.eu