# AI Inference instances

## What is the Sesterce AI Inference service?

We built our inference feature so that you can bring your ML model to life: deploy it in a dedicated production environment and make it accessible to anyone through an endpoint.

You can use our inference service to **deploy your own custom model** and make it accessible to your users, or to [**run inference with leading models**](#pre-charged-public-models) on the market and seamlessly integrate them into your applications.

{% hint style="info" %}
In addition to the [classic compute instances](/compute-instances.md), which are very useful for building and training your model, the **AI inference feature** is an additional building block that **also lets you manage the deployment** of your ML model **as close as possible to your users**.
{% endhint %}
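Once a model is deployed, client applications reach it over HTTPS through its endpoint. The following is a minimal sketch using only the Python standard library. It assumes a JSON-over-HTTP endpoint with bearer-token authentication; the URL, the `Authorization` scheme, and the `{"inputs": ...}` payload shape are hypothetical placeholders, not Sesterce's documented API, so check the details shown for your deployment in Sesterce Cloud before adapting it.

```python
import json
import urllib.request

# Hypothetical placeholders: replace with the endpoint URL and API key of your
# own deployment. The URL, auth header, and payload shape are assumptions.
ENDPOINT_URL = "https://your-deployment.example.com/predict"
API_KEY = "YOUR_API_KEY"


def build_inference_request(url: str, api_key: str, payload: dict) -> urllib.request.Request:
    """Build a JSON POST request for a deployed model endpoint."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={
            "Content-Type": "application/json",
            # Assumed bearer-token authentication; adjust to your deployment.
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def query_model(url: str, api_key: str, payload: dict, timeout: float = 30.0) -> dict:
    """Send the request and decode the JSON response body."""
    request = build_inference_request(url, api_key, payload)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Usage (requires a live deployment):
# result = query_model(ENDPOINT_URL, API_KEY, {"inputs": "Hello, world!"})
```

Keeping request construction separate from sending makes it easy to inspect or log the exact request before it reaches the endpoint.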

## Deploy your model as close as possible to your customers

Sesterce's AI inference feature lets you deploy your model as close as possible to your end users to minimize latency. Here's how:

{% stepper %}
{% step %}

### Edge inference nodes distributed around the world

Data is processed locally at the network's edge, minimizing latency and bandwidth usage for real-time applications.
{% endstep %}

{% step %}

### Anycast endpoint set up automatically

The anycast endpoint directs user requests to the nearest instance of the service, improving performance and reducing response time.
{% endstep %}

{% step %}

### Smart Routing technology across 180 points of presence worldwide

End users' queries are routed to the closest active model, ensuring low latency and an improved user experience.
{% endstep %}
{% endstepper %}
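Anycast itself operates at the network layer (the same IP address is announced from many locations, and BGP routing delivers each request to a nearby one), so it is not something you implement in application code. The selection rule described in these steps, "send the query to the closest *active* model", can still be illustrated with a toy model. This is a conceptual sketch, not Sesterce's routing implementation: the `PointOfPresence` type, the locations, and the use of great-circle distance as a stand-in for measured network latency are all assumptions made for illustration.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class PointOfPresence:
    """Hypothetical point of presence hosting a model replica."""
    name: str
    lat: float
    lon: float
    active: bool  # whether the replica at this location is currently serving


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two coordinates, in kilometres."""
    r = 6371.0  # mean Earth radius in km
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def route(user_lat: float, user_lon: float, pops: list[PointOfPresence]) -> PointOfPresence:
    """Pick the closest active PoP, using distance as a proxy for latency."""
    candidates = [p for p in pops if p.active]  # skip replicas that are down
    if not candidates:
        raise RuntimeError("no active point of presence available")
    return min(candidates, key=lambda p: haversine_km(user_lat, user_lon, p.lat, p.lon))
```

For a user near Cologne, a Frankfurt replica wins while it is active; if it goes down, the same call falls back to the next-closest active location (Paris, in a list of Paris, Frankfurt, and New York). A production router would rely on measured latency and health checks rather than geography alone.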

## Pre-charged public models

Here is a non-exhaustive list of models available on Sesterce Cloud. [Browse the full model catalog](https://cloud.sesterce.com/ai-inference)!

<table data-full-width="false"><thead><tr><th width="202">Model</th><th width="162.33333333333331">Type</th><th>Description</th></tr></thead><tbody><tr><td>distilbert-base</td><td>Text processing</td><td>A smaller, faster version of BERT used for natural language tasks.</td></tr><tr><td>stable-diffusion</td><td>Text-to-image</td><td>Generates images from text descriptions using deep learning techniques.</td></tr><tr><td>stable-cascade</td><td>Text-to-image</td><td>Enhances image generation with multiple refinement steps.</td></tr><tr><td>sdxl-lightning</td><td>Text-to-image</td><td>Optimized for fast image generation from text inputs.</td></tr><tr><td>ResNet-50</td><td>Image classification</td><td>A convolutional neural network designed for image recognition tasks.</td></tr><tr><td>Llama-Pro-8b</td><td>Text generation</td><td>A large language model designed for generating human-like text.</td></tr><tr><td>Llama-3.2-3B-Instruct</td><td>Text generation</td><td>An instruction-tuned model that generates text following specific instructions.</td></tr><tr><td>Mistral-Nemo-Instruct-2407</td><td>Text generation</td><td>Tailored for creating text based on given instructions.</td></tr><tr><td>Llama-3.1-8B-Instruct</td><td>Text generation</td><td>An advanced model for generating text from detailed instructions.</td></tr><tr><td>Pixtral-12B-2409</td><td>Image-text-to-text</td><td>A multimodal model that takes images and text as input and generates text.</td></tr><tr><td>Llama-3.2-1B-Instruct</td><td>Text generation</td><td>Focused on generating text according to user-provided instructions.</td></tr><tr><td>Mistral-7B-Instruct-v0.3</td><td>Text generation</td><td>Designed for generating guided text outputs with minimal latency.</td></tr><tr><td>Whisper-large-V3-turbo</td><td>Audio-to-text</td><td>Quickly transcribes audio into text with high accuracy.</td></tr><tr><td>Whisper-large-V3</td><td>Audio-to-text</td><td>Transcribes spoken language into written text using deep learning.</td></tr></tbody></table>

