Why Ubuntu Server instead of Debian or CentOS for on-premise AI deployments?

Ubuntu LTS offers 5 years of support, official NVIDIA drivers in its repositories, and the broadest compatibility with the ML stack, including PyTorch and CUDA. Our clients' IT teams already know it.

How many resources does an on-premise RAG Enterprise server need?

A minimum of 32 GB RAM, 8 CPU cores, and a 500 GB NVMe SSD for up to 50,000 documents. For more than 200,000 documents, we recommend 64 GB RAM and an NVIDIA T4 GPU.


Software & AI | February 7, 2026

Linux Server: The Foundation of Every On-Premise AI Deployment

linux server on-premise deploy ubuntu systemd

Every RAG Enterprise and CRM81 instance we deploy runs on Ubuntu Server. It's not a random choice: Linux gives us full control over resources and security, along with the reliability that enterprise clients demand. Here's how we configure and harden our servers.


Ubuntu Server and Systemd: The Heart of Our Deployments

When an enterprise client signs the contract for RAG Enterprise on-premise, the first thing we do is prepare an Ubuntu 22.04 LTS server. The LTS choice isn't random: it guarantees us 5 years of security updates without touching system dependencies. We had a client who kept the same installation for 3 years without a single OS-related issue.

Every component of our stack runs as a systemd unit: the FastAPI backend, the Qdrant database, the embedding service, and the document indexing worker. Systemd gives us automatic restart on crash (Restart=on-failure with RestartSec=5s), centralized log management with journald, and service dependencies (After=qdrant.service). If Qdrant isn't ready, FastAPI doesn't start.

Our systemd units also use advanced sandboxing: ProtectSystem=strict, PrivateTmp=true, NoNewPrivileges=true. Every service runs with the minimum required permissions. The FastAPI service can't write outside its data directory, and the indexing worker can't access the network. It's defense in depth at the process level.
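The directives mentioned in this section could be combined into a unit file along the lines of the sketch below. It is an illustration under stated assumptions, not the actual production unit: the description, user, `ExecStart` command, and data directory are hypothetical placeholders. `Requires=`, `ReadWritePaths=`, and `PrivateNetwork=` are real systemd options, but their use here is an assumption about how the described behavior might be achieved.

```python
def render_unit(description, exec_start, user, data_dir, private_network=False):
    """Render a systemd service unit using the directives named in the article.

    All arguments are hypothetical placeholders chosen by the caller.
    """
    lines = [
        "[Unit]",
        f"Description={description}",
        # From the article: order this service after Qdrant.
        "After=qdrant.service",
        # Assumption: After= only orders startup; Requires= adds the hard dependency.
        "Requires=qdrant.service",
        "",
        "[Service]",
        f"User={user}",
        f"ExecStart={exec_start}",
        # From the article: restart automatically after a crash.
        "Restart=on-failure",
        "RestartSec=5s",
        # From the article: sandboxing options.
        "ProtectSystem=strict",
        "PrivateTmp=true",
        "NoNewPrivileges=true",
        # Assumption: with ProtectSystem=strict, the data directory has to be
        # made writable explicitly.
        f"ReadWritePaths={data_dir}",
    ]
    if private_network:
        # Assumption: one way to keep a worker off the network.
        lines.append("PrivateNetwork=true")
    lines += ["", "[Install]", "WantedBy=multi-user.target"]
    return "\n".join(lines) + "\n"


# Hypothetical example values, for illustration only.
unit_text = render_unit(
    description="RAG backend (example)",
    exec_start="/opt/rag/venv/bin/uvicorn app:app",
    user="rag",
    data_dir="/var/lib/rag",
)
print(unit_text)
```

Generating the unit from a function keeps the sandboxing options identical across services, while per-service differences (such as cutting off network access for a worker) stay explicit in the arguments.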

Hardening: UFW, Fail2ban, and SSH Access

Server security starts from minute one. As soon as the OS is installed, our Ansible playbook applies complete hardening. The UFW firewall is configured with a deny-all inbound policy: we only open ports 22 (SSH), 443 (HTTPS via Nginx), and 80 (redirect to HTTPS). No database port, no Qdrant port, no backend port is directly exposed. Everything goes through the reverse proxy.

Fail2ban monitors SSH and Nginx logs. After 3 failed attempts in 10 minutes, the IP gets banned for 24 hours. For SSH we have an even stricter policy: key-based authentication only (RSA or Ed25519), password login disabled, root login disabled, and access restricted to a dedicated group (AllowGroups i3k-deploy). In 2 years of on-premise deployments, we've never had a breach.

We also manage automatic security updates with unattended-upgrades, configured to apply only security patches without touching application packages. Every week a cron job sends us a report on the server's security status. If a critical CVE is published for an installed package, we know within 24 hours.
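To make the ban policy concrete, here is a simplified model of the "3 failed attempts in 10 minutes, banned for 24 hours" rule. It is not Fail2ban's implementation, only a sketch of the sliding-window logic; the class name `BanTracker` and its methods are hypothetical, and the three constants mirror what Fail2ban calls `maxretry`, `findtime`, and `bantime`.

```python
from collections import defaultdict, deque

MAX_RETRY = 3          # failed attempts that trigger a ban
FIND_TIME = 10 * 60    # sliding window length in seconds (10 minutes)
BAN_TIME = 24 * 3600   # ban duration in seconds (24 hours)


class BanTracker:
    """Toy model of a 'maxretry failures within findtime' ban policy."""

    def __init__(self, max_retry=MAX_RETRY, find_time=FIND_TIME, ban_time=BAN_TIME):
        self.max_retry = max_retry
        self.find_time = find_time
        self.ban_time = ban_time
        self._failures = defaultdict(deque)  # ip -> timestamps of recent failures
        self._banned_until = {}              # ip -> time at which the ban expires

    def is_banned(self, ip, now):
        """Return True while the ban for this IP has not yet expired."""
        return self._banned_until.get(ip, 0) > now

    def record_failure(self, ip, now):
        """Record a failed login; return True if the IP is banned afterwards."""
        if self.is_banned(ip, now):
            return True
        window = self._failures[ip]
        window.append(now)
        # Drop failures that have fallen out of the sliding window.
        while window and now - window[0] > self.find_time:
            window.popleft()
        if len(window) >= self.max_retry:
            self._banned_until[ip] = now + self.ban_time
            window.clear()
            return True
        return False


tracker = BanTracker()
ip = "203.0.113.7"  # address from a range reserved for documentation
print(tracker.record_failure(ip, now=0))    # False
print(tracker.record_failure(ip, now=60))   # False
print(tracker.record_failure(ip, now=120))  # True: third failure within 10 minutes
```

Timestamps are passed in explicitly rather than read from the clock, which makes the policy easy to exercise in tests without waiting for real time to pass.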

Resource Monitoring and Performance Management

A RAG Enterprise system on an on-premise server consumes resources very differently from a traditional web application. Document embedding is CPU-intensive and can saturate all cores for hours during initial indexing. Vector search on Qdrant requires RAM proportional to index size. The LLM, if run locally, needs a dedicated GPU.

That's why we monitor everything with a combination of native Linux tools and Prometheus. Scripts triggered by systemd timers collect CPU, RAM, disk I/O, and GPU temperature metrics every 30 seconds. If RAM usage exceeds 85% or GPU temperature exceeds 82°C, we get a Slack alert. We've saved at least two servers from the OOM killer by intervening proactively.

Server sizing is an art we've refined through experience. For a RAG Enterprise installation with up to 50,000 documents, we recommend a minimum of 32 GB RAM, 8 cores, and a 500 GB NVMe SSD. For larger installations (over 200,000 documents) we spec 64 GB RAM and consider an NVIDIA T4 GPU for local embedding. These specs aren't theory: they're the result of 20+ real deployments.
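The alert thresholds described in this section (RAM above 85%, GPU above 82°C) can be sketched as a small check. This is a minimal illustration, not the actual collection script: the function names are hypothetical, RAM usage is assumed to be derived from the `MemTotal` and `MemAvailable` fields of `/proc/meminfo`, the GPU temperature is assumed to come from a separate source such as `nvidia-smi`, and delivery to Slack is left out.

```python
RAM_LIMIT_PERCENT = 85.0   # threshold from the article
GPU_LIMIT_CELSIUS = 82.0   # threshold from the article


def ram_used_percent(meminfo_text):
    """Compute used RAM in percent from the text content of /proc/meminfo."""
    values = {}
    for line in meminfo_text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            values[key.strip()] = int(parts[0])  # first token is the numeric value
    total = values["MemTotal"]
    available = values["MemAvailable"]
    return 100.0 * (total - available) / total


def check_thresholds(ram_percent, gpu_temp_c):
    """Return a list of alert messages; an empty list means all is well."""
    alerts = []
    if ram_percent > RAM_LIMIT_PERCENT:
        alerts.append(f"RAM usage {ram_percent:.1f}% exceeds {RAM_LIMIT_PERCENT:.0f}%")
    if gpu_temp_c > GPU_LIMIT_CELSIUS:
        alerts.append(f"GPU temperature {gpu_temp_c:.0f}C exceeds {GPU_LIMIT_CELSIUS:.0f}C")
    return alerts


# Made-up sample in /proc/meminfo format, for illustration only.
sample_meminfo = """MemTotal:       32000000 kB
MemFree:         1000000 kB
MemAvailable:    3200000 kB
"""

ram = ram_used_percent(sample_meminfo)
for message in check_thresholds(ram, gpu_temp_c=70.0):
    print(message)  # prints: RAM usage 90.0% exceeds 85%
```

Keeping the threshold logic in a pure function, separate from metric collection and alert delivery, means it can be tested with canned input and reused regardless of where the numbers come from.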


Securvita S.r.l. — i3k.eu