# Which compute instance for AI model training and inference?

Choosing the right GPU instance is crucial to the success of your AI projects. An optimal configuration not only improves performance but also keeps costs under control. This guide helps you navigate the available options and find the ideal solution for your needs.

## Why is the compute instance choice so important?

Your choice of GPU infrastructure has a direct impact on:

* **Performance**: model training speed and inference latency
* **Cost**: budget control by avoiding over-provisioning
* **Scalability**: ability to scale according to your needs
* **Reliability**: stability of your workloads in production

## Which compute instance to choose for model training?

### 1. Large Language Models (LLMs) training

Fine-tuning Large Language Models is one of the most resource-intensive tasks in modern AI development.

Hardware requirements vary significantly with model size, from smaller 7B-parameter models to massive 70B+ architectures. This section helps you select the optimal configuration for your fine-tuning project, ensuring efficient resource utilization while maintaining performance.

<table><thead><tr><th width="147.28125">Model Size</th><th width="125.578125">Server type</th><th width="181.26171875">VRAM</th><th width="247.87890625">Recommended Offers</th></tr></thead><tbody><tr><td>Small (less than 7B parameters)</td><td>VM</td><td>From <strong>22 GB</strong> (for 1B-parameter models) to <strong>140 GB</strong> (to fine-tune models such as DeepSeek-R1 7B)</td><td><p>1 to 3B models:<br><span data-gb-custom-inline data-tag="emoji" data-code="1f449">👉</span> The most cost-effective: <a href="https://cloud.sesterce.com/compute/new?gpuType=A100_80G&#x26;numGpus=1"><strong>1xA100 80G</strong></a></p><p><span data-gb-custom-inline data-tag="emoji" data-code="1f449">👉</span> The most efficient: <a href="https://cloud.sesterce.com/compute/new?gpuType=H100&#x26;numGpus=1"><strong>1xH100 80G</strong></a><br>7B models:<br><span data-gb-custom-inline data-tag="emoji" data-code="1f449">👉</span> <a href="https://cloud.sesterce.com/compute/new?gpuType=H200&#x26;numGpus=1"><strong>1xH200 141 GB</strong></a></p></td></tr><tr><td>Medium (12B-32B)</td><td>VM or Bare-Metal</td><td>From <strong>200</strong> to <strong>500 GB</strong> (to fine-tune models such as DeepSeek-R1 32B)</td><td><p>12 to 14B models:<br><span data-gb-custom-inline data-tag="emoji" data-code="1f449">👉</span> <a href="https://cloud.sesterce.com/compute/new?gpuType=H200&#x26;numGpus=2"><strong>2xH200 141 GB</strong></a><br>27 to 32B models:</p><p><span data-gb-custom-inline data-tag="emoji" data-code="1f449">👉</span> <a href="https://cloud.sesterce.com/compute/new?gpuType=A100_80G&#x26;numGpus=8"><strong>8xA100 80G Bare Metal</strong></a><br>or <a href="https://cloud.sesterce.com/compute/new?gpuType=H200&#x26;numGpus=4"><strong>4xH200 141 GB</strong></a></p></td></tr><tr><td>Large (70B and more)</td><td>Bare-Metal</td><td>More than <strong>1,000 GB</strong></td><td><span data-gb-custom-inline data-tag="emoji" data-code="1f449">👉</span> <a href="https://cloud.sesterce.com/compute/new?gpuType=H200&#x26;numGpus=8"><strong>8xH200 141 GB Bare-Metal</strong></a><br><span data-gb-custom-inline data-tag="emoji" data-code="1f449">👉</span> <a href="https://cloud.sesterce.com/compute/new?gpuType=B200&#x26;numGpus=8"><strong>8xB200 192 GB Bare-Metal</strong></a></td></tr></tbody></table>
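The VRAM ranges in the table above follow from a common rule of thumb: full fine-tuning with Adam in mixed precision costs roughly 16-18 bytes per parameter (fp16 weights and gradients, fp32 master weights, two Adam moment tensors), before activations. A minimal sketch of that arithmetic, where the 18 bytes/param figure is an assumption rather than a measurement:

```python
def full_finetune_vram_gb(n_params_billion, bytes_per_param=18):
    """Rough VRAM estimate for full fine-tuning with Adam in mixed precision.

    bytes_per_param ~ 18 assumes: fp16 weights (2) + fp16 gradients (2) +
    fp32 master weights (4) + Adam moments (8) + ~2 for activations.
    A rule of thumb, not a measurement -- always benchmark your own setup.
    """
    return n_params_billion * bytes_per_param

# A 7B model lands around 126 GB, close to the ~140 GB the table
# recommends (1x H200 141 GB), with headroom for longer sequences.
print(f"7B  -> ~{full_finetune_vram_gb(7):.0f} GB")
print(f"32B -> ~{full_finetune_vram_gb(32):.0f} GB")
```

Parameter-efficient methods (LoRA, QLoRA) cut these numbers drastically, which is why smaller instances can still fine-tune large models in practice.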

### 2. Computer Vision models training

Whether you're developing object detection systems, processing medical imagery, or creating next-generation AI art, selecting the right GPU infrastructure is crucial for your success.

The computational requirements for vision tasks vary significantly based on complexity and scale. Classification tasks might require modest GPU power, while advanced generative models demand substantial computational resources. This section outlines three primary categories of computer vision workloads (classification, segmentation, and generation), each with its own hardware requirements and optimal configurations.

<table><thead><tr><th width="253.15625">Use Case</th><th width="148.96875">Models example</th><th width="156.046875">Resource intensity</th><th>Batch</th></tr></thead><tbody><tr><td><a href="#classification-models"><strong>Classification</strong></a> (images, videos): object detection, face recognition</td><td>ResNet, YOLO</td><td>⭐️⭐️⭐️</td><td>32-64</td></tr><tr><td><a href="#segmentation-models"><strong>Segmentation</strong></a> (pixel-level): medical imaging, satellite analysis</td><td>U-Net, DeepLab</td><td>⭐️⭐️⭐️⭐️</td><td>16-32</td></tr><tr><td><a href="#generative-models"><strong>Generative models</strong></a> (stable diffusion): image generation, AI art</td><td>Stable Diffusion</td><td>⭐️⭐️⭐️⭐️⭐️</td><td>8-16</td></tr></tbody></table>
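The batch sizes above are effective batch sizes. If the largest micro-batch that fits in your GPU's VRAM is smaller, gradient accumulation makes up the difference by running several forward/backward passes per optimizer step. A small sketch of that bookkeeping (the example numbers are illustrative):

```python
import math

def accumulation_steps(target_batch, micro_batch):
    """Gradient-accumulation steps needed to reach an effective batch size
    when only `micro_batch` samples fit in VRAM at once."""
    if micro_batch <= 0:
        raise ValueError("micro_batch must be positive")
    return math.ceil(target_batch / micro_batch)

# Hitting the table's batch 64 for classification when only
# 8 images fit on the GPU at a time:
steps = accumulation_steps(64, 8)
print(steps)  # 8 forward/backward passes per optimizer step
```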

#### Classification Models

<table><thead><tr><th width="114.4140625">Data Volume</th><th width="112.734375">Training Time</th><th width="176.16015625">Recommended Instance</th><th width="88.5">Type</th><th>Comment</th></tr></thead><tbody><tr><td>&#x3C;100GB</td><td>&#x3C;24h</td><td><a href="https://cloud.sesterce.com/compute/new?gpuType=L40S&#x26;numGpus=1"><strong>1xL40S 48 GB</strong></a></td><td>VM</td><td>Perfect for development and testing</td></tr><tr><td>100-500GB</td><td>1-3 days</td><td><a href="https://cloud.sesterce.com/compute/new?gpuType=H100&#x26;numGpus=1"><strong>1xH100 80GB</strong></a></td><td>VM</td><td>Parallel processing beneficial</td></tr><tr><td>500GB-1TB</td><td>3-7 days</td><td><a href="https://cloud.sesterce.com/compute/new?gpuType=L40S&#x26;numGpus=2"><strong>2xL40S 48 GB</strong></a></td><td>VM/BM</td><td>Higher throughput needed</td></tr><tr><td>>1TB</td><td>>1 week</td><td><a href="https://cloud.sesterce.com/compute/new?gpuType=H100&#x26;numGpus=8"><strong>8xH100 80 GB</strong></a></td><td>BM</td><td>Bare Metal for optimal performance</td></tr></tbody></table>

#### Segmentation Models

<table><thead><tr><th width="112.78515625">Data Volume</th><th width="111.54296875">Training Time</th><th width="180.3828125">Recommended Instance</th><th width="87.15234375">Type</th><th>Comment</th></tr></thead><tbody><tr><td>&#x3C; 200GB</td><td>&#x3C; 48h</td><td><a href="https://cloud.sesterce.com/compute/new?gpuType=H200&#x26;numGpus=2"><strong>2x H200 141 GB</strong></a></td><td>VM</td><td>High-res image processing</td></tr><tr><td>200GB-1TB</td><td>3-5 days</td><td><a href="https://cloud.sesterce.com/compute/new?gpuType=A100_80G&#x26;numGpus=4"><strong>4x A100 80GB</strong></a></td><td>VM</td><td>Multiple batch processing</td></tr><tr><td>1TB-5TB</td><td>1-2 weeks</td><td><a href="https://cloud.sesterce.com/compute/new?gpuType=H100&#x26;numGpus=8"><strong>8x H100 80 GB</strong></a></td><td>BM</td><td>Heavy data augmentation</td></tr><tr><td>> 5TB</td><td>> 2 weeks</td><td><a href="https://cloud.sesterce.com/compute/new?gpuType=H200&#x26;numGpus=8"><strong>8x H200 141 GB</strong></a></td><td>BM</td><td>Maximum processing power</td></tr></tbody></table>

#### Generative Models

<table><thead><tr><th width="113.91796875">Data Volume</th><th width="110.5625">Training Time</th><th width="170.08203125">Recommended Instance</th><th width="101.39453125">Type</th><th>Notes</th></tr></thead><tbody><tr><td>&#x3C; 500GB</td><td>&#x3C; 3 days</td><td><a href="https://cloud.sesterce.com/compute/new?gpuType=H100&#x26;numGpus=4"><strong>4x H100 80 GB</strong></a></td><td>VM</td><td>Model fine-tuning</td></tr><tr><td>500GB-2TB</td><td>3-7 days</td><td><a href="https://cloud.sesterce.com/compute/new?gpuType=H100&#x26;numGpus=8"><strong>8x H100 80 GB</strong></a></td><td>BM</td><td>Full model training</td></tr><tr><td>2TB-10TB</td><td>1-3 weeks</td><td><a href="https://cloud.sesterce.com/compute/new?gpuType=H200&#x26;numGpus=8"><strong>8x H200 141 GB</strong></a></td><td>BM</td><td>Large scale training</td></tr><tr><td>> 10TB</td><td>> 3 weeks</td><td><a href="https://www.sesterce.com/booking"><strong>16x H100 80GB</strong></a></td><td>BM</td><td>Distributed training</td></tr></tbody></table>
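The "Training Time" columns in these tables follow from dataset size, epoch count, and per-GPU throughput. A back-of-envelope calculator can sanity-check whether a configuration fits your deadline; all throughput numbers here are placeholders you should replace with a measured benchmark of your own model:

```python
def training_days(num_images, epochs, images_per_sec_per_gpu,
                  num_gpus, scaling=0.9):
    """Rough training-time estimate. `scaling` discounts imperfect
    multi-GPU scaling (0.9 = 90% efficiency, an assumption)."""
    throughput = images_per_sec_per_gpu * num_gpus * scaling
    seconds = num_images * epochs / throughput
    return seconds / 86400  # seconds per day

# e.g. 1M images, 90 epochs, an assumed ~1000 img/s on a single GPU:
print(f"1 GPU : {training_days(1_000_000, 90, 1000, 1):.1f} days")
print(f"8 GPUs: {training_days(1_000_000, 90, 1000, 8):.1f} days")
```

If the single-GPU estimate exceeds your window, move down the tables to a multi-GPU or Bare-Metal row rather than stretching the schedule.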

### 3. Audio/Speech models training

Whether you're developing voice recognition systems, building text-to-speech applications, or exploring the cutting edge of AI music generation, selecting the right GPU infrastructure is crucial for successful model training.

This section outlines three primary categories of audio ML workloads (speech recognition, text-to-speech synthesis, and audio generation), each requiring specific hardware configurations to achieve optimal performance.

| Use Case           | Input Data Type                               | Dataset Size | Model Examples                  | Resource Intensity |
| ------------------ | --------------------------------------------- | ------------ | ------------------------------- | ------------------ |
| Speech Recognition | Audio files (.wav, .mp3), Labeled transcripts | 100GB-1TB    | Whisper, DeepSpeech, Wav2Vec    | ⭐⭐⭐                |
| Text-to-Speech     | Text corpus, Audio pairs                      | 50-500GB     | Tacotron, FastSpeech, VALL-E    | ⭐⭐⭐⭐               |
| Audio Generation   | Audio samples, MIDI files                     | 1-2TB        | MusicLM, AudioLDM, Stable Audio | ⭐⭐⭐⭐⭐              |
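The dataset-size brackets above translate into hours of audio. For uncompressed 16 kHz, 16-bit mono PCM (a typical speech-recognition format), one hour is about 115 MB, so a quick conversion helps you place your corpus in the table. Compressed formats such as .mp3 fit roughly 5-10x more per GB, so treat this as a lower bound:

```python
def wav_hours_per_gb(sample_rate=16_000, bytes_per_sample=2, channels=1):
    """Hours of uncompressed PCM audio per GB.
    Defaults assume 16 kHz, 16-bit, mono WAV -- adjust for your format."""
    bytes_per_hour = sample_rate * bytes_per_sample * channels * 3600
    return 1e9 / bytes_per_hour

# ~8.7 hours of 16 kHz mono audio per GB, so the table's 100 GB
# speech-recognition floor is on the order of 870 hours of speech.
print(f"{wav_hours_per_gb():.1f} h/GB")
```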

#### Speech Recognition Models

<table><thead><tr><th>Model Size (param)</th><th width="138.01953125">VRAM/RAM Needed</th><th width="164.63671875">Recommended Instance</th><th width="101.12890625">Type</th><th width="156.2421875">Comment</th></tr></thead><tbody><tr><td>Small (&#x3C; 100M params)</td><td>24GB/64GB</td><td><a href="https://cloud.sesterce.com/compute/new?gpuType=A6000&#x26;numGpus=1"><strong>1x A6000 48GB</strong></a></td><td>VM</td><td>Development/testing</td></tr><tr><td>Medium (100M-500M)</td><td>48GB/128GB</td><td><a href="https://cloud.sesterce.com/compute/new?gpuType=L40S&#x26;numGpus=2"><strong>2x L40S 48GB</strong></a><br><a href="https://cloud.sesterce.com/compute/new?gpuType=H100&#x26;numGpus=1"><strong>1x H100 80GB</strong></a></td><td>VM</td><td>Production training</td></tr><tr><td>Large (500M-1B)</td><td>96GB/256GB</td><td><a href="https://cloud.sesterce.com/compute/new?gpuType=L40S&#x26;numGpus=4"><strong>4x L40S 48GB</strong></a></td><td>VM/BM</td><td>Large scale training</td></tr><tr><td>Very Large (>1B)</td><td>160GB/384GB</td><td><a href="https://cloud.sesterce.com/compute/new?gpuType=H100&#x26;numGpus=8"><strong>8xH100 80GB</strong></a></td><td>BM</td><td>Enterprise scale</td></tr></tbody></table>

#### Text-to-Speech Models

<table><thead><tr><th width="143.1796875">Model Size</th><th width="143.52734375">VRAM/RAM Needed</th><th width="160.80078125">Recommended Instance</th><th width="102.390625">Type</th><th>Notes</th></tr></thead><tbody><tr><td>Small (&#x3C; 200M params)</td><td>80GB/192GB</td><td><a href="https://cloud.sesterce.com/compute/new?gpuType=H100&#x26;numGpus=2"><strong>2x H100 80GB</strong></a></td><td>VM</td><td>Basic TTS</td></tr><tr><td>Medium (200M-500M)</td><td>160GB/384GB</td><td><a href="https://cloud.sesterce.com/compute/new?gpuType=H100&#x26;numGpus=4"><strong>4xH100 80GB</strong></a></td><td>VM</td><td>Multi-speaker</td></tr><tr><td>Large (500M-1B)</td><td>320GB/768GB</td><td><a href="https://cloud.sesterce.com/compute/new?gpuType=H100&#x26;numGpus=8"><strong>8x H100 80GB</strong></a></td><td>BM</td><td>High-quality TTS</td></tr><tr><td>Very Large (>1B)</td><td>640GB/1TB</td><td><a href="https://cloud.sesterce.com/compute/new?gpuType=H200&#x26;numGpus=8"><strong>8x H200 141GB</strong></a></td><td>BM</td><td>Enterprise TTS</td></tr></tbody></table>

## Which compute instance to choose for model inference?

### 1. Large Language Models (LLMs) inference

Whether you're serving chatbots, content-generation tools, or text-analysis applications, choosing the right infrastructure is crucial for balancing performance, cost, and user experience.

The requirements for LLM inference vary significantly based on several key factors: model size (from 7B to 70B+ parameters), user load (from individual testing to thousands of concurrent users), and latency requirements (from real-time chat applications to batch processing). Each of these factors directly impacts your choice of infrastructure, from single GPU instances to distributed multi-GPU deployments.
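The VRAM columns in the sizing tables below roughly follow weights plus KV cache: fp16 weights cost 2 bytes per parameter, and each in-flight sequence adds a KV cache that grows with context length and batch size. A minimal sketch, where the architecture numbers (layers, heads, head dimension) are illustrative Llama-2-7B-like values, not a statement about any specific model:

```python
def llm_inference_vram_gb(n_params_b, n_layers, n_kv_heads, head_dim,
                          seq_len, batch, bytes_per_weight=2):
    """Rough serving-VRAM estimate: fp16 weights + fp16 KV cache.
    Ignores framework overhead and activation scratch space."""
    weights = n_params_b * 1e9 * bytes_per_weight
    # K and V tensors: 2 * layers * kv_heads * head_dim * 2 bytes,
    # per token, per sequence in the batch
    kv_cache = 2 * n_layers * n_kv_heads * head_dim * 2 * seq_len * batch
    return (weights + kv_cache) / 1e9

# 7B model, 32 layers, 32 KV heads of dim 128, 4k context, batch 1:
print(f"~{llm_inference_vram_gb(7, 32, 32, 128, 4096, 1):.0f} GB")
```

With batch 1 this lands near the 16 GB the first table row budgets for a 7B model; the KV-cache term grows linearly with batch size and context length, which is why concurrency drives the larger rows.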

#### LLM Inference Sizing - Small Scale (1-50 concurrent users)

<table><thead><tr><th width="113.4453125">Model Size</th><th width="143.8046875">Concurrent Users</th><th width="155.19140625">VRAM/RAM needed</th><th>Latency Target</th><th>Recommended Instance</th><th>Type</th><th>Estimated RPS*</th><th>Comment</th></tr></thead><tbody><tr><td>7B</td><td>1-10</td><td>16GB/32GB</td><td>&#x3C; 100ms</td><td><a href="https://cloud.sesterce.com/compute/new?gpuType=RTX4090&#x26;numGpus=1"><strong>1x RTX4090</strong></a></td><td>VM</td><td>15-20</td><td>Development/testing</td></tr><tr><td>7B</td><td>11-25</td><td>24GB/64GB</td><td>&#x3C; 100ms</td><td><a href="https://cloud.sesterce.com/compute/new?gpuType=L40S&#x26;numGpus=1"><strong>1x L40S</strong></a></td><td>VM</td><td>30-40</td><td>Small production</td></tr><tr><td>7B</td><td>26-50</td><td>48GB/128GB</td><td>&#x3C; 100ms</td><td><a href="https://cloud.sesterce.com/compute/new?gpuType=H100&#x26;numGpus=1"><strong>1x H100</strong></a></td><td>VM</td><td>60-80</td><td>Medium production</td></tr><tr><td>13B</td><td>1-10</td><td>24GB/64GB</td><td>&#x3C; 150ms</td><td><a href="https://cloud.sesterce.com/compute/new?gpuType=RTX4090&#x26;numGpus=2"><strong>2x RTX4090</strong></a></td><td>VM</td><td>10-15</td><td>Development/testing</td></tr><tr><td>13B</td><td>11-25</td><td>48GB/128GB</td><td>&#x3C; 150ms</td><td><a href="https://cloud.sesterce.com/compute/new?gpuType=H100&#x26;numGpus=1"><strong>1x H100</strong></a></td><td>VM</td><td>25-35</td><td>Small production</td></tr><tr><td>13B</td><td>26-50</td><td>80GB/192GB</td><td>&#x3C; 150ms</td><td><a href="https://cloud.sesterce.com/compute/new?gpuType=H100&#x26;numGpus=2"><strong>2x H100</strong></a></td><td>VM/BM</td><td>50-70</td><td>Medium production</td></tr><tr><td>70B</td><td>1-10</td><td>80GB/192GB</td><td>&#x3C; 200ms</td><td><a href="https://cloud.sesterce.com/compute/new?gpuType=H100&#x26;numGpus=2"><strong>2x H100</strong></a></td><td>VM/BM</td><td>5-8</td><td>Small production</td></tr><tr><td>70B</td><td>11-25</td><td>160GB/384GB</td><td>&#x3C; 200ms</td><td><a href="https://cloud.sesterce.com/compute/new?gpuType=H100&#x26;numGpus=4"><strong>4x H100</strong></a></td><td>BM</td><td>15-20</td><td>Medium production</td></tr><tr><td>70B</td><td>26-50</td><td>320GB/768GB</td><td>&#x3C; 200ms</td><td><a href="https://cloud.sesterce.com/compute/new?gpuType=H100&#x26;numGpus=8"><strong>8x H100</strong></a></td><td>BM</td><td>35-45</td><td>Large production</td></tr></tbody></table>

#### LLM Inference Sizing - Medium Scale (51-200 concurrent users)

<table><thead><tr><th width="114.2421875">Model Size</th><th width="141.48828125">Concurrent Users</th><th width="160.5546875">VRAM/RAM</th><th>Latency Target</th><th>Recommended Instance</th><th>Type</th><th>Estimated RPS*</th><th>Notes</th></tr></thead><tbody><tr><td>7B</td><td>51-100</td><td>48GB/128GB</td><td>&#x3C; 100ms</td><td><a href="https://cloud.sesterce.com/compute/new?gpuType=L40S&#x26;numGpus=2"><strong>2x L40S</strong></a></td><td>VM</td><td>100-120</td><td>Production</td></tr><tr><td>7B</td><td>101-150</td><td>80GB/192GB</td><td>&#x3C; 100ms</td><td><a href="https://cloud.sesterce.com/compute/new?gpuType=H100&#x26;numGpus=2"><strong>2x H100</strong></a></td><td>VM/BM</td><td>150-180</td><td>High-performance</td></tr><tr><td>7B</td><td>151-200</td><td>160GB/384GB</td><td>&#x3C; 100ms</td><td><a href="https://cloud.sesterce.com/compute/new?gpuType=H100&#x26;numGpus=4"><strong>4x H100</strong></a></td><td>BM</td><td>200-240</td><td>Enterprise scale</td></tr><tr><td>13B</td><td>51-100</td><td>160GB/384GB</td><td>&#x3C; 150ms</td><td><a href="https://cloud.sesterce.com/compute/new?gpuType=H100&#x26;numGpus=4"><strong>4x H100</strong></a></td><td>BM</td><td>80-100</td><td>Production</td></tr><tr><td>13B</td><td>101-150</td><td>240GB/512GB</td><td>&#x3C; 150ms</td><td><a href="https://cloud.sesterce.com/compute/new?gpuType=H200&#x26;numGpus=4"><strong>4x H200</strong></a></td><td>BM</td><td>120-150</td><td>High-performance</td></tr><tr><td>13B</td><td>151-200</td><td>320GB/768GB</td><td>&#x3C; 150ms</td><td><a href="https://cloud.sesterce.com/compute/new?gpuType=H100&#x26;numGpus=8"><strong>8x H100</strong></a></td><td>BM</td><td>160-200</td><td>Enterprise scale</td></tr><tr><td>70B</td><td>51-100</td><td>480GB/1TB</td><td>&#x3C; 200ms</td><td><a href="https://cloud.sesterce.com/compute/new?gpuType=H200&#x26;numGpus=8"><strong>8x 
H200</strong></a></td><td>BM</td><td>60-80</td><td>Production</td></tr><tr><td>70B</td><td>101-150</td><td>640GB/1.5TB</td><td>&#x3C; 200ms</td><td><a href="https://www.sesterce.com/booking"><strong>16x H100</strong></a></td><td>BM</td><td>90-120</td><td>High-performance</td></tr><tr><td>70B</td><td>151-200</td><td>800GB/2TB</td><td>&#x3C; 200ms</td><td><a href="https://www.sesterce.com/booking"><strong>16x H200</strong></a></td><td>BM</td><td>140-180</td><td>Enterprise scale</td></tr></tbody></table>

#### LLM Inference Sizing - Large Scale (201-1000+ concurrent users)

<table><thead><tr><th width="111.828125">Model Size</th><th>Concurrent Users</th><th width="173.234375">VRAM/RAM</th><th width="135.875">Latency Target</th><th>Recommended Instance</th><th>Type</th><th>Estimated RPS*</th><th>Notes</th></tr></thead><tbody><tr><td>7B</td><td>201-500</td><td>320GB/768GB</td><td>&#x3C; 100ms</td><td><a href="https://cloud.sesterce.com/compute/new?gpuType=H100&#x26;numGpus=8"><strong>8x H100</strong></a></td><td>BM</td><td>300-400</td><td>Enterprise scale</td></tr><tr><td>7B</td><td>501-1000</td><td>640GB/1.5TB</td><td>&#x3C; 100ms</td><td><a href="https://cloud.sesterce.com/compute/new?gpuType=H200&#x26;numGpus=8"><strong>8x H200</strong></a></td><td>BM</td><td>600-800</td><td>High-scale production</td></tr><tr><td>7B</td><td>1000+</td><td>1.2TB/2.5TB</td><td>&#x3C; 100ms</td><td><a href="https://www.sesterce.com/booking"><strong>16xH200</strong></a></td><td>BM</td><td>1000+</td><td>Distributed clusters</td></tr><tr><td>13B</td><td>201-500</td><td>480GB/1TB</td><td>&#x3C; 150ms</td><td><a href="https://cloud.sesterce.com/compute/new?gpuType=H200&#x26;numGpus=8"><strong>8x H200</strong></a></td><td>BM</td><td>250-350</td><td>Enterprise scale</td></tr><tr><td>13B</td><td>501-1000</td><td>800GB/2TB</td><td>&#x3C; 150ms</td><td><a href="https://cloud.sesterce.com/compute/new?gpuType=H200&#x26;numGpus=8"><strong>8x H200</strong></a></td><td>BM</td><td>500-700</td><td>High-scale production</td></tr><tr><td>13B</td><td>1000+</td><td>1.6TB/3TB</td><td>&#x3C; 150ms</td><td><a href="https://www.sesterce.com/booking"><strong>16x H200</strong></a></td><td>BM</td><td>800+</td><td>Distributed clusters</td></tr><tr><td>70B</td><td>201-500</td><td>1.2TB/2.5TB</td><td>&#x3C; 200ms</td><td><a href="https://www.sesterce.com/booking"><strong>16x H100</strong></a></td><td>BM</td><td>200-300</td><td>Enterprise scale</td></tr><tr><td>70B</td><td>501-1000</td><td>2TB/4TB</td><td>&#x3C; 200ms</td><td><a href="https://www.sesterce.com/booking"><strong>24x 
H200</strong></a></td><td>BM</td><td>400-600</td><td>High-scale production</td></tr><tr><td>70B</td><td>1000+</td><td>3TB/6TB</td><td>&#x3C; 200ms</td><td><a href="https://www.sesterce.com/booking"><strong>32x H200</strong></a></td><td>BM</td><td>700+</td><td>Distributed clusters</td></tr></tbody></table>
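The "Estimated RPS" columns relate to concurrent users and per-request latency through Little's law (L = λ × W): with L requests in flight and an average end-to-end time W per request, sustainable throughput is λ = L / W. A sketch for sanity-checking a row, with an illustrative request time:

```python
def sustainable_rps(concurrent_users, avg_request_seconds):
    """Little's law upper bound on throughput: lambda = L / W.
    Real throughput also depends on batching and queueing behavior,
    so treat this as a sanity check, not a benchmark."""
    return concurrent_users / avg_request_seconds

# 50 concurrent users whose requests complete in ~0.7s end to end:
print(f"~{sustainable_rps(50, 0.7):.0f} RPS")
```

That lands in the same ballpark as the 60-80 RPS the small-scale table lists for 50 users on a 7B model, which is the kind of agreement to look for when validating your own latency measurements against these tables.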

### 2. Image Generation Inference Sizing

Deploying image generation models like Stable Diffusion for production introduces unique infrastructure challenges compared to traditional ML workloads.

The hardware requirements vary significantly based on three key factors: model complexity (from base models to SDXL with refiners), concurrent user load (affecting batch processing and queue management), and image generation parameters (resolution, steps, and additional features like ControlNet or inpainting). Each of these factors directly impacts your choice of infrastructure and can significantly affect both performance and operational costs.
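In practice, sizing reduces to comparing your peak demand (images per minute) against per-GPU throughput at your chosen resolution and step count. A hedged sketch; the per-GPU throughput and headroom factor are assumptions to replace with a benchmark of your exact model and parameters:

```python
import math

def gpus_needed(peak_images_per_min, images_per_min_per_gpu, headroom=1.3):
    """GPUs required to serve a peak image-generation load.
    `headroom` (assumed 30%) pads for queue spikes; per-GPU throughput
    must come from benchmarking your model, resolution, and steps."""
    return math.ceil(peak_images_per_min * headroom / images_per_min_per_gpu)

# e.g. 120 images/min at peak, an assumed ~20 images/min per GPU:
print(gpus_needed(120, 20))
```

Features like ControlNet or inpainting cut per-GPU throughput further, pushing you toward the larger rows in the tables below even at the same user count.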

#### Small scale (1-50 concurrent users)

<table><thead><tr><th width="116.60546875">Model Type</th><th>Concurrent Users</th><th width="162.890625">VRAM/RAM</th><th>Latency Target*</th><th width="161.92578125">Recommended Instance</th><th>Type</th><th>Images/Minute**</th><th>Notes</th></tr></thead><tbody><tr><td>SD XL Base</td><td>1-10</td><td>16GB/32GB</td><td>&#x3C; 3s</td><td><a href="https://cloud.sesterce.com/compute/new?gpuType=RTX4090&#x26;numGpus=1"><strong>1x RTX4090</strong></a></td><td>VM</td><td>15-20</td><td>Development/testing</td></tr><tr><td>SD XL Base</td><td>11-25</td><td>24GB/64GB</td><td>&#x3C; 3s</td><td><a href="https://cloud.sesterce.com/compute/new?gpuType=L40S&#x26;numGpus=1"><strong>1x L40S</strong></a></td><td>VM</td><td>30-40</td><td>Small production</td></tr><tr><td>SD XL Base</td><td>26-50</td><td>48GB/128GB</td><td>&#x3C; 3s</td><td><a href="https://cloud.sesterce.com/compute/new?gpuType=H100&#x26;numGpus=1"><strong>1x H100</strong></a></td><td>VM</td><td>60-80</td><td>Medium production</td></tr><tr><td>SD XL + Refiner</td><td>1-10</td><td>24GB/64GB</td><td>&#x3C; 5s</td><td><a href="https://cloud.sesterce.com/compute/new?gpuType=L40S&#x26;numGpus=1"><strong>1x L40S</strong></a></td><td>VM</td><td>10-15</td><td>Development/testing</td></tr><tr><td>SD XL + Refiner</td><td>11-25</td><td>48GB/128GB</td><td>&#x3C; 5s</td><td><a href="https://cloud.sesterce.com/compute/new?gpuType=H100&#x26;numGpus=1"><strong>1x H100</strong></a></td><td>VM</td><td>25-35</td><td>Small production</td></tr><tr><td>SD XL + Refiner</td><td>26-50</td><td>80GB/192GB</td><td>&#x3C; 5s</td><td><a href="https://cloud.sesterce.com/compute/new?gpuType=H100&#x26;numGpus=2"><strong>2x H100</strong></a></td><td>VM or BM</td><td>50-70</td><td>Medium production</td></tr></tbody></table>

#### Medium scale (51-200 concurrent users)

<table><thead><tr><th>Model Type</th><th>Concurrent Users</th><th width="164.40234375">VRAM/RAM</th><th>Latency Target*</th><th>Recommended Instance</th><th>Type</th><th>Images/Minute**</th><th>Notes</th></tr></thead><tbody><tr><td>SD XL Base</td><td>51-100</td><td>160GB/384GB</td><td>&#x3C; 3s</td><td><a href="https://cloud.sesterce.com/compute/new?gpuType=H100&#x26;numGpus=4"><strong>4x H100</strong></a></td><td>VM or BM</td><td>120-150</td><td>Production</td></tr><tr><td>SD XL Base</td><td>101-150</td><td>320GB/768GB</td><td>&#x3C; 3s</td><td><a href="https://cloud.sesterce.com/compute/new?gpuType=H100&#x26;numGpus=8"><strong>8x H100</strong></a></td><td>BM</td><td>200-250</td><td>High-performance</td></tr><tr><td>SD XL Base</td><td>151-200</td><td>480GB/1TB</td><td>&#x3C; 3s</td><td><a href="https://cloud.sesterce.com/compute/new?gpuType=H200&#x26;numGpus=8"><strong>8x H200</strong></a></td><td>BM</td><td>300-350</td><td>Enterprise scale</td></tr><tr><td>SD XL + Refiner</td><td>51-100</td><td>320GB/768GB</td><td>&#x3C; 5s</td><td><a href="https://cloud.sesterce.com/compute/new?gpuType=H100&#x26;numGpus=8"><strong>8x H100</strong></a></td><td>BM</td><td>100-130</td><td>Production</td></tr><tr><td>SD XL + Refiner</td><td>101-150</td><td>480GB/1TB</td><td>&#x3C; 5s</td><td><a href="https://cloud.sesterce.com/compute/new?gpuType=H200&#x26;numGpus=8"><strong>8x H200</strong></a></td><td>BM</td><td>180-220</td><td>High-performance</td></tr><tr><td>SD XL + Refiner</td><td>151-200</td><td>640GB/1.5TB</td><td>&#x3C; 5s</td><td><a href="https://cloud.sesterce.com/compute/new?gpuType=H200&#x26;numGpus=8"><strong>8x H200</strong></a></td><td>BM</td><td>250-300</td><td>Enterprise scale</td></tr></tbody></table>

#### Large scale (201-1000+ concurrent users)

| Model Type      | Concurrent Users | VRAM/RAM  | Latency Target\* | Recommended Instance                             | Type | Images/Minute\*\* | Notes               |
| --------------- | ---------------- | --------- | ---------------- | ------------------------------------------------ | ---- | ----------------- | ------------------- |
| SD XL Base      | 201-500          | 800GB/2TB | < 3s             | [**16x H100**](https://www.sesterce.com/booking) | BM   | 400-500           | Multi-cluster       |
| SD XL Base      | 501-1000         | 1.6TB/4TB | < 3s             | [**16x H200**](https://www.sesterce.com/booking) | BM   | 800-1000          | Distributed system  |
| SD XL Base      | 1000+            | 2.4TB/6TB | < 3s             | [**24x H200**](https://www.sesterce.com/booking) | BM   | 1500+             | Global distribution |
| SD XL + Refiner | 201-500          | 1.2TB/3TB | < 5s             | [**24x H100**](https://www.sesterce.com/booking) | BM   | 350-450           | Multi-cluster       |
| SD XL + Refiner | 501-1000         | 2TB/5TB   | < 5s             | [**32x H200**](https://www.sesterce.com/booking) | BM   | 700-900           | Distributed system  |
| SD XL + Refiner | 1000+            | 3TB/8TB   | < 5s             | [**40x H200**](https://www.sesterce.com/booking) | BM   | 1200+             | Global distribution |
