DeepSeek and Ollama: Advanced AI on bare-metal servers and private cloud with Stackscale

Deploying DeepSeek on Dedicated Servers and Private Cloud

Generative AI and machine learning have revolutionized how businesses process and analyze data. DeepSeek-R1, an open-source AI model, stands out for its advanced reasoning capabilities, efficient resource usage, and security, since it can run entirely on local hardware. By combining Ollama with Stackscale’s infrastructure, organizations can deploy DeepSeek-R1 on bare-metal dedicated servers or in private cloud environments, ensuring high performance and complete data sovereignty.

Why Use DeepSeek-R1 on Private Infrastructure?

1. Security and Privacy

DeepSeek-R1 enables local data processing, eliminating reliance on external servers and ensuring sensitive information remains protected.

2. Optimized Costs

Running DeepSeek-R1 on Stackscale’s dedicated GPU servers reduces dependence on cloud-based AI services, avoiding per-token fees and recurring monthly subscriptions.

3. Enhanced Performance with NVIDIA GPUs

Stackscale’s infrastructure offers NVIDIA Tesla T4, L4, and L40S GPUs, designed to accelerate AI workloads, machine learning, and high-performance computing (HPC).

4. Flexibility and Scalability

Private cloud and bare-metal servers allow businesses to scale resources based on project demand, ensuring stability and full control over infrastructure.


Installing DeepSeek-R1 with Ollama

To deploy DeepSeek-R1 on dedicated servers or private cloud, Ollama is the recommended tool for managing local AI models.

Installation Steps:

1️⃣ Install Ollama

curl -fsSL https://ollama.com/install.sh | sh
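
Once the script finishes, you can verify the installation before pulling any models (the systemctl check assumes a systemd-based distribution, where the installer registers Ollama as a service):

ollama --version          # confirm the CLI is installed
systemctl status ollama   # confirm the Ollama service is running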

2️⃣ Download DeepSeek-R1

ollama pull deepseek-r1:8b

(The 8B version offers a good balance between output quality and resource usage; other sizes are available depending on GPU memory.)

3️⃣ Run DeepSeek-R1

ollama run deepseek-r1:8b

Once started, DeepSeek-R1 opens an interactive prompt and is ready to process queries entirely within the private environment; after the initial download, no internet connection is required.
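
The run command also accepts a one-off prompt as an argument, which is convenient for scripting; for example:

ollama run deepseek-r1:8b "Summarize the advantages of running AI models on private infrastructure."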

Replace the 8b tag with the desired model size:

  • 1.5B parameters: ollama run deepseek-r1:1.5b
  • 7B parameters: ollama run deepseek-r1:7b
  • 70B parameters (requires a high-memory GPU such as the 48 GB L40S): ollama run deepseek-r1:70b
  • Full-scale 671B model (far beyond a single GPU; intended for multi-GPU deployments): ollama run deepseek-r1:671b
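
Besides the interactive CLI, Ollama exposes a local REST API on port 11434, so other applications inside the private environment can query the model over HTTP; a minimal sketch:

curl http://localhost:11434/api/generate -d '{
  "model": "deepseek-r1:8b",
  "prompt": "Explain data sovereignty in one sentence.",
  "stream": false
}'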

Optimizing GPU Acceleration

  • Ensure NVIDIA CUDA drivers are installed.
  • Use ollama list to check installed models.
  • Start the service: ollama serve
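
A quick check that the model is actually running on the GPU rather than falling back to the CPU:

nvidia-smi    # driver status and GPU utilization
ollama ps     # loaded models and whether they run on GPU or CPU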

Enhancing Performance with Open WebUI

For an improved user experience, Open WebUI provides a browser-based interface to interact with AI models running on Ollama. Key features include:
  • Model switching via @ commands.
  • Conversation tagging and management.
  • Easy model download and removal.
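
As a sketch, Open WebUI can be deployed as a Docker container next to Ollama (assuming Docker is installed and Ollama is listening on its default port 11434 on the same host):

docker run -d -p 3000:8080 \
  --add-host=host.docker.internal:host-gateway \
  -v open-webui:/app/backend/data \
  --name open-webui --restart always \
  ghcr.io/open-webui/open-webui:main

The interface is then reachable at http://localhost:3000 and connects to the local Ollama instance.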


Optimizing Performance with NVIDIA GPUs on Stackscale

For maximum DeepSeek-R1 performance, using optimized GPUs is recommended. Stackscale offers:

GPU        Memory        Tensor Cores   Shading Units   TFLOPS (FP32)
Tesla T4   16 GB GDDR6   320            2,560           8.1
L4         24 GB GDDR6   240            7,024           30.3
L40S       48 GB GDDR6   586            18,176          91.6

These GPUs accelerate AI processing, reduce execution times, and optimize computing resources.


Benefits of Using Stackscale for AI and Machine Learning

  • 100% European Infrastructure: Servers in Madrid and Amsterdam, ensuring full data sovereignty.
  • High Availability: 99.90% SLA, with redundant power supply and ultra-fast networks.
  • Complete Isolation: No resource oversubscription, ensuring dedicated high-performance computing.
  • 24/7 Support: Specialized technical assistance in English and Spanish.


Conclusion

Deploying DeepSeek-R1 with Ollama on Stackscale’s private infrastructure enables businesses to access an optimized, secure AI environment with full data control. With high-end GPUs and an infrastructure designed for demanding workloads, Stackscale provides the ideal solution for AI and machine learning projects.

If you need more information about our dedicated GPU solutions and private cloud, contact us, and our team will help you configure the ideal setup for your project.
