Chat with Endpoint


Last updated 2 months ago


From Chat Playground

When your inference instance becomes active, an "Open Playground" button appears if the hosted model is OpenAI-compatible.

Click the button to open the Chat Playground. There you can interact with your endpoint and adjust parameters such as Temperature, Top-p, Top-k, and repetition penalty.

From your Terminal

To interact with the endpoint directly from your terminal, follow the steps below. Our endpoints follow OpenAI's specification, enabling seamless integration with existing tools and libraries.

Prerequisites

  • Your endpoint URL (format: https://<id>-<hash>.ai.sesterce.dev/)

  • Your API Secret (provided upon deployment)

  • Model ID (retrieved via API)

  • OpenAI-compatible client or SDK

Authentication

Verifying connection

First, list available models:

curl -H "x-api-key: <SECRET>" -X GET "<ENDPOINT>/v1/models"
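The response is expected to follow the OpenAI list format. As a sketch (the payload shape and the model name below are illustrative assumptions, not actual output from your endpoint), you could pull the model IDs out of the response body in Python like this:

```python
# Sketch: extract model IDs from an OpenAI-style /v1/models response.
# The payload shape ({"object": "list", "data": [{"id": ...}, ...]}) is an
# assumption based on the OpenAI spec; verify it against your own endpoint.

def extract_model_ids(payload: dict) -> list[str]:
    """Return the list of model IDs from a /v1/models response body."""
    return [model["id"] for model in payload.get("data", [])]

# Example payload as the endpoint might return it (illustrative only)
sample = {
    "object": "list",
    "data": [{"id": "meta-llama/Llama-3.1-8B-Instruct", "object": "model"}],
}
print(extract_model_ids(sample))  # ['meta-llama/Llama-3.1-8B-Instruct']
```

Use one of the returned IDs as `<MODEL_ID>` in the requests below.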

Model types and endpoints

Be sure to replace <SECRET> with your own secret (available on the launched instance page), and likewise substitute your own values for <MODEL_ID> and <ENDPOINT>.

1. Text generation

Endpoint: /v1/chat/completions

curl -H "Content-Type: application/json" \
     -H "x-api-key: <SECRET>" \
     -X POST "<ENDPOINT>/v1/chat/completions" \
     -d '{
       "model": "<MODEL_ID>",
       "messages": [
         {
           "role": "user",
           "content": "Hello, how are you?"
         }
       ]
     }'

2. Multimodal (text + image)

Endpoint: /v1/chat/completions

curl -H "Content-Type: application/json" \
     -H "x-api-key: <SECRET>" \
     -X POST "<ENDPOINT>/v1/chat/completions" \
     -d '{
       "model": "<MODEL_ID>",
       "messages": [
         {
           "role": "user",
           "content": [
             {
               "type": "text",
               "text": "What's in this image?"
             },
             {
               "type": "image_url",
               "image_url": {
                 "url": "https://example.com/image.jpg"
               }
             }
           ]
         }
       ]
     }'

3. Automatic Speech Recognition (ASR)

Endpoint: /v1/audio/transcriptions

curl -H "x-api-key: <SECRET>" \
     -X POST "<ENDPOINT>/v1/audio/transcriptions" \
     -H "Content-Type: multipart/form-data" \
     -F file="@/path/to/audio.mp3" \
     -F model="<MODEL_ID>"

OpenAI SDK integration

JavaScript/TypeScript

import OpenAI from "openai";

const openai = new OpenAI({
    apiKey: "<SECRET>",
    baseURL: "<ENDPOINT>/v1"
});

// Text Generation
async function generateText() {
    const completion = await openai.chat.completions.create({
        model: "<MODEL_ID>",
        messages: [
            {
                role: "user",
                content: "Hello, how are you?"
            }
        ]
    });
    console.log(completion.choices[0].message.content);
}

// Multimodal
async function analyzeImage() {
    const response = await openai.chat.completions.create({
        model: "<MODEL_ID>",
        messages: [
            {
                role: "user",
                content: [
                    {
                        type: "text",
                        text: "What's in this image?"
                    },
                    {
                        type: "image_url",
                        image_url: {
                            url: "https://example.com/image.jpg"
                        }
                    }
                ]
            }
        ]
    });
    console.log(response.choices[0].message.content);
}

Python

from openai import OpenAI

client = OpenAI(
    api_key="<SECRET>",
    base_url="<ENDPOINT>/v1"
)

# Text Generation
response = client.chat.completions.create(
    model="<MODEL_ID>",
    messages=[
        {"role": "user", "content": "Hello!"}
    ]
)

# Audio Transcription
with open("audio.mp3", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        model="<MODEL_ID>", 
        file=audio_file
    )

Best Practices

  1. Performance Optimization

    • Use batch processing for multiple requests

    • Implement caching when possible

    • Compress files before upload

    • Use URLs for large files

  2. Security

    • Never share your API Secret

    • Implement rate limiting

    • Monitor usage patterns

    • Secure stored credentials

  3. Integration Tips

    • Always validate model availability

    • Implement proper error handling

    • Use the SDK for robust integration

    • Keep dependencies updated
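The caching tip above can be sketched with `functools.lru_cache`. Here `fetch_completion` is a hypothetical stub standing in for a real client call; the counter only illustrates that a second identical request never reaches the network:

```python
from functools import lru_cache

calls = 0  # counts how many times the (stub) network call actually runs

def fetch_completion(model: str, prompt: str) -> str:
    """Hypothetical stand-in for client.chat.completions.create(...)."""
    global calls
    calls += 1
    return f"reply to: {prompt}"

@lru_cache(maxsize=256)
def cached_completion(model: str, prompt: str) -> str:
    # Identical (model, prompt) pairs are served from the cache.
    return fetch_completion(model, prompt)

cached_completion("<MODEL_ID>", "Hello")
cached_completion("<MODEL_ID>", "Hello")
print(calls)  # 1 — the second call was served from the cache
```

Note that caching only makes sense for deterministic settings (e.g. temperature 0); sampled responses vary between calls by design.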

Supported Formats and Limitations

File Formats

  • Images: JPEG, PNG, WEBP, GIF

  • Audio: mp3, mp4, mpeg, mpga, m4a, wav, webm

Size Limits

  • Images: Maximum 20MB

  • Audio: Maximum 25MB

  • Audio Duration: Up to 4 hours
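These limits can be enforced client-side before uploading. A minimal sketch in Python, using the extension lists and byte limits documented above (adjust them if your instance differs):

```python
import os

# Allowed extensions and size limits, taken from the documentation above.
IMAGE_EXTS = {".jpeg", ".jpg", ".png", ".webp", ".gif"}
AUDIO_EXTS = {".mp3", ".mp4", ".mpeg", ".mpga", ".m4a", ".wav", ".webm"}
MAX_BYTES = {"image": 20 * 1024 * 1024, "audio": 25 * 1024 * 1024}

def check_upload(path: str, kind: str) -> None:
    """Raise ValueError if the file's extension or size is out of bounds.

    kind is "image" or "audio".
    """
    ext = os.path.splitext(path)[1].lower()
    allowed = IMAGE_EXTS if kind == "image" else AUDIO_EXTS
    if ext not in allowed:
        raise ValueError(f"unsupported file type: {ext}")
    if os.path.getsize(path) > MAX_BYTES[kind]:
        raise ValueError("file too large")
```

Running this check before the request lets you fail fast locally instead of waiting for the endpoint to reject the upload.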

Error Handling

Common Error Codes

  • 401: Invalid API Secret

  • 404: Model not found

  • 429: Too many requests

  • 500: Server error

Error Handling example

try:
    response = client.chat.completions.create(...)
except Exception as e:
    message = str(e)
    if "file too large" in message:
        pass  # Handle size error
    elif "unsupported file type" in message:
        pass  # Handle format error
    else:
        pass  # Handle other errors
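For the transient errors (429 and 500), a common pattern is to retry with exponential backoff. A minimal sketch, assuming the client raises exceptions whose message contains the HTTP status code (adapt the detection to the exception types your client actually raises):

```python
import time

def with_retries(call, attempts=3, base_delay=1.0):
    """Run `call` (a zero-argument function), retrying transient failures.

    Retries only when the error message contains 429 (rate limit) or
    500 (server error); other errors are re-raised immediately.
    """
    for attempt in range(attempts):
        try:
            return call()
        except Exception as e:
            retryable = "429" in str(e) or "500" in str(e)
            if not retryable or attempt == attempts - 1:
                raise
            time.sleep(base_delay * 2 ** attempt)  # exponential backoff
```

Usage: `with_retries(lambda: client.chat.completions.create(...))`.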

If you need support, please reach out to us at support@sesterce.com.
