# AI Inference instances

## What is Sesterce AI Inference service?

Our inference service lets you bring your ML model to life by deploying it in a dedicated production environment, exposed to your users through an endpoint.

You can use our inference service to **deploy your own custom model** to make it accessible to your users, or to [**infer with the best models**](#pre-charged-public-models) on the market which you can then seamlessly integrate into your applications.
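As a sketch of what that integration can look like, the snippet below assembles an authenticated POST request to a deployed model's endpoint. The URL, header names, and payload shape are placeholders following common REST conventions, not Sesterce's documented API; check your endpoint's details in the cloud console.

```python
import json
import urllib.request


def build_request(endpoint_url: str, api_key: str, inputs: dict) -> urllib.request.Request:
    """Assemble an authenticated POST request for a model endpoint.
    The bearer-token header and JSON body are common conventions,
    not a documented Sesterce contract."""
    return urllib.request.Request(
        endpoint_url,
        data=json.dumps(inputs).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Example: query a hypothetical deployed sentiment model.
req = build_request(
    "https://example.sesterce.endpoint/v1/predict",  # placeholder URL
    "YOUR_API_KEY",
    {"text": "The new release is fantastic."},
)
# response = urllib.request.urlopen(req)  # uncomment once you have a real endpoint
```

From there, `urllib.request.urlopen(req)` (or any HTTP client such as `requests`) sends the call; the response format depends on the model you deployed.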

{% hint style="info" %}
In addition to the [classic compute instances](https://docs.sesterce.com/compute-instances), which are well suited to building and training your model, the **AI inference feature** is an additional building block that **also lets you manage the deployment** of your ML model **as close as possible to your users**.
{% endhint %}

## Deploy your model as close as possible to your customers

Sesterce's AI inference feature deploys your model as close as possible to your end users, guaranteeing minimal latency. Here is how:

{% stepper %}
{% step %}

### Edge inference nodes distributed around the world

Processes data locally at the network's edge, minimizing latency and bandwidth usage for real-time applications.
{% endstep %}

{% step %}

### Anycast endpoint set up automatically

Directs user requests to the nearest instance of a service, optimizing performance and reducing response time.
{% endstep %}

{% step %}

### Smart Routing technology to 180 points of presence worldwide

End users' queries are routed to the closest active model, ensuring low latency and an improved user experience.
{% endstep %}
{% endstepper %}

## Pre-charged public models

Here is a non-exhaustive list of models available on Sesterce Cloud. [Click here to discover the entire model catalog](https://cloud.sesterce.com/ai-inference)!

<table data-full-width="false"><thead><tr><th width="202">Model</th><th width="162.33333333333331">Type</th><th>Description</th></tr></thead><tbody><tr><td>distilbert-base</td><td>Text processing</td><td>A smaller, faster version of BERT used for natural language tasks.</td></tr><tr><td>stable-diffusion</td><td>Text-to-image</td><td>Generates images from text descriptions using deep learning techniques.</td></tr><tr><td>stable-cascade</td><td>Text-to-image</td><td>Enhances image generation with multiple refinement steps.</td></tr><tr><td>sdxl-lightning</td><td>Text-to-image</td><td>Optimized for fast image generation from text inputs.</td></tr><tr><td>ResNet-50</td><td>Image classification</td><td>A convolutional neural network designed for image recognition tasks.</td></tr><tr><td>Llama-Pro-8b</td><td>Text generation</td><td>A large language model designed for generating human-like text.</td></tr><tr><td>Llama-3.2-3B-Instruct</td><td>Text generation</td><td>An instruction-tuned model for generating text with specific guidelines.</td></tr><tr><td>Mistral-Nemo-Instruct-2407</td><td>Text generation</td><td>Tailored for creating text based on given instructions.</td></tr><tr><td>Llama-3.1-8B-Instruct</td><td>Text generation</td><td>An advanced model for generating text with detailed instructions.</td></tr><tr><td>Pixtral-12B-2409</td><td>Image-to-text</td><td>A multimodal model that interprets images alongside text prompts.</td></tr><tr><td>Llama-3.2-1B-Instruct</td><td>Text generation</td><td>Focused on generating text according to user-provided instructions.</td></tr><tr><td>Mistral-7B-Instruct-v0.3</td><td>Text generation</td><td>Designed for generating guided text outputs with minimal latency.</td></tr><tr><td>Whisper-large-V3-turbo</td><td>Audio-to-text</td><td>Quickly transcribes audio into text with high accuracy.</td></tr><tr><td>Whisper-large-V3</td><td>Audio-to-text</td><td>Transcribes spoken language into written text using deep learning.</td></tr></tbody></table>
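For the text-generation models in the catalog above, requests typically follow a chat-style schema. The sketch below builds such a payload for Llama-3.1-8B-Instruct; the field names (`model`, `messages`, `max_tokens`) follow the widely used OpenAI-compatible convention and are an assumption, not Sesterce's documented schema, so verify them against your endpoint.

```python
import json


def build_chat_payload(model: str, prompt: str, max_tokens: int = 256) -> str:
    """Serialize a chat-completion style request body.
    Field names follow the common OpenAI-compatible convention --
    confirm the exact schema with your endpoint's documentation."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }
    return json.dumps(payload)


body = build_chat_payload(
    "Llama-3.1-8B-Instruct",
    "Explain edge inference in two sentences.",
)
```

The resulting JSON string can then be sent as the body of an authenticated POST request to the model's endpoint.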
