Chat with Endpoint


Last updated 2 months ago


From Chat Playground

When your inference instance becomes active, an "Open Playground" button appears if the hosted model is OpenAI-compatible.

Click the button to open the Chat Playground. There you can interact with your endpoint and adjust parameters such as Temperature, Top-p, Top-k, and repetition penalty.

From your Terminal

To interact with the endpoint directly from your terminal, follow the steps below. Our endpoints follow OpenAI's specification, enabling seamless integration with existing tools and libraries.

Prerequisites

  • Your endpoint URL (format: https://<id>-<hash>.ai.sesterce.dev/)

  • Your API Secret (provided upon deployment)

  • Model ID (retrieved via API)

  • OpenAI-compatible client or SDK

Authentication

Verifying connection

First, list available models:

curl -H "x-api-key: <SECRET>" -X GET "<ENDPOINT>/v1/models"
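The response is expected to follow the OpenAI list format. As a sketch (the payload shape and the model name below are illustrative assumptions, not actual output from your endpoint), you could pull the model IDs out of the response body in Python like this:

```python
# Sketch: extract model IDs from an OpenAI-style /v1/models response.
# The payload shape ({"object": "list", "data": [{"id": ...}, ...]}) is an
# assumption based on the OpenAI spec; verify it against your own endpoint.

def extract_model_ids(payload: dict) -> list[str]:
    """Return the list of model IDs from a /v1/models response body."""
    return [model["id"] for model in payload.get("data", [])]

# Example payload as the endpoint might return it (illustrative only)
sample = {
    "object": "list",
    "data": [{"id": "meta-llama/Llama-3.1-8B-Instruct", "object": "model"}],
}
print(extract_model_ids(sample))  # ['meta-llama/Llama-3.1-8B-Instruct']
```

Use one of the returned IDs as `<MODEL_ID>` in the requests below.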

Model types and endpoints

Be sure to replace <SECRET> with your own secret (available on the launched instance page), and likewise substitute your own values for <MODEL_ID> and <ENDPOINT>.

1. Text generation

Endpoint: /v1/chat/completions

curl -H "Content-Type: application/json" \
     -H "x-api-key: <SECRET>" \
     -X POST "<ENDPOINT>/v1/chat/completions" \
     -d '{
       "model": "<MODEL_ID>",
       "messages": [
         {
           "role": "user",
           "content": "Hello, how are you?"
         }
       ]
     }'

2. Multimodal (text + image)

Endpoint: /v1/chat/completions

curl -H "Content-Type: application/json" \
     -H "x-api-key: <SECRET>" \
     -X POST "<ENDPOINT>/v1/chat/completions" \
     -d '{
       "model": "<MODEL_ID>",
       "messages": [
         {
           "role": "user",
           "content": [
             {
               "type": "text",
               "text": "What's in this image?"
             },
             {
               "type": "image_url",
               "image_url": {
                 "url": "https://example.com/image.jpg"
               }
             }
           ]
         }
       ]
     }'

3. Automatic Speech Recognition (ASR)

Endpoint: /v1/audio/transcriptions

curl -H "x-api-key: <SECRET>" \
     -X POST "<ENDPOINT>/v1/audio/transcriptions" \
     -H "Content-Type: multipart/form-data" \
     -F file="@/path/to/audio.mp3" \
     -F model="<MODEL_ID>"

OpenAI SDK integration

JavaScript/TypeScript

import OpenAI from "openai";

const openai = new OpenAI({
    apiKey: "<SECRET>",
    baseURL: "<ENDPOINT>/v1"
});

// Text Generation
async function generateText() {
    const completion = await openai.chat.completions.create({
        model: "<MODEL_ID>",
        messages: [
            {
                role: "user",
                content: "Hello, how are you?"
            }
        ]
    });
    console.log(completion.choices[0].message.content);
}

// Multimodal
async function analyzeImage() {
    const response = await openai.chat.completions.create({
        model: "<MODEL_ID>",
        messages: [
            {
                role: "user",
                content: [
                    {
                        type: "text",
                        text: "What's in this image?"
                    },
                    {
                        type: "image_url",
                        image_url: {
                            url: "https://example.com/image.jpg"
                        }
                    }
                ]
            }
        ]
    });
    console.log(response.choices[0].message.content);
}

Python

from openai import OpenAI

client = OpenAI(
    api_key="<SECRET>",
    base_url="<ENDPOINT>/v1"
)

# Text Generation
response = client.chat.completions.create(
    model="<MODEL_ID>",
    messages=[
        {"role": "user", "content": "Hello!"}
    ]
)

# Audio Transcription
with open("audio.mp3", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        model="<MODEL_ID>", 
        file=audio_file
    )

Best Practices

  1. Performance Optimization

    • Use batch processing for multiple requests

    • Implement caching when possible

    • Compress files before upload

    • Use URLs for large files

  2. Security

    • Never share your API Secret

    • Implement rate limiting

    • Monitor usage patterns

    • Secure stored credentials

  3. Integration Tips

    • Always validate model availability

    • Implement proper error handling

    • Use the SDK for robust integration

    • Keep dependencies updated
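The caching tip above can be sketched with `functools.lru_cache`. Here `fetch_completion` is a hypothetical stub standing in for a real client call; the counter only illustrates that a second identical request never reaches the network:

```python
from functools import lru_cache

calls = 0  # counts how many times the (stub) network call actually runs

def fetch_completion(model: str, prompt: str) -> str:
    """Hypothetical stand-in for client.chat.completions.create(...)."""
    global calls
    calls += 1
    return f"reply to: {prompt}"

@lru_cache(maxsize=256)
def cached_completion(model: str, prompt: str) -> str:
    # Identical (model, prompt) pairs are served from the cache.
    return fetch_completion(model, prompt)

cached_completion("<MODEL_ID>", "Hello")
cached_completion("<MODEL_ID>", "Hello")
print(calls)  # 1 — the second call was served from the cache
```

Note that caching only makes sense for deterministic settings (e.g. temperature 0); sampled responses vary between calls by design.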

Supported Formats and Limitations

File Formats

  • Images: JPEG, PNG, WEBP, GIF

  • Audio: mp3, mp4, mpeg, mpga, m4a, wav, webm

Size Limits

  • Images: Maximum 20MB

  • Audio: Maximum 25MB

  • Audio Duration: Up to 4 hours
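These limits can be enforced client-side before uploading. A minimal sketch in Python, using the extension lists and byte limits documented above (adjust them if your instance differs):

```python
import os

# Allowed extensions and size limits, taken from the documentation above.
IMAGE_EXTS = {".jpeg", ".jpg", ".png", ".webp", ".gif"}
AUDIO_EXTS = {".mp3", ".mp4", ".mpeg", ".mpga", ".m4a", ".wav", ".webm"}
MAX_BYTES = {"image": 20 * 1024 * 1024, "audio": 25 * 1024 * 1024}

def check_upload(path: str, kind: str) -> None:
    """Raise ValueError if the file's extension or size is out of bounds.

    kind is "image" or "audio".
    """
    ext = os.path.splitext(path)[1].lower()
    allowed = IMAGE_EXTS if kind == "image" else AUDIO_EXTS
    if ext not in allowed:
        raise ValueError(f"unsupported file type: {ext}")
    if os.path.getsize(path) > MAX_BYTES[kind]:
        raise ValueError("file too large")
```

Running this check before the request lets you fail fast locally instead of waiting for the endpoint to reject the upload.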

Error Handling

Common Error Codes

  • 401: Invalid API Secret

  • 404: Model not found

  • 429: Too many requests

  • 500: Server error

Error Handling example

try:
    response = client.chat.completions.create(...)
except Exception as e:
    message = str(e)
    if "file too large" in message:
        pass  # Handle size error
    elif "unsupported file type" in message:
        pass  # Handle format error
    else:
        pass  # Handle other errors
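For the transient errors (429 and 500), a common pattern is to retry with exponential backoff. A minimal sketch, assuming the client raises exceptions whose message contains the HTTP status code (adapt the detection to the exception types your client actually raises):

```python
import time

def with_retries(call, attempts=3, base_delay=1.0):
    """Run `call` (a zero-argument function), retrying transient failures.

    Retries only when the error message contains 429 (rate limit) or
    500 (server error); other errors are re-raised immediately.
    """
    for attempt in range(attempts):
        try:
            return call()
        except Exception as e:
            retryable = "429" in str(e) or "500" in str(e)
            if not retryable or attempt == attempts - 1:
                raise
            time.sleep(base_delay * 2 ** attempt)  # exponential backoff
```

Usage: `with_retries(lambda: client.chat.completions.create(...))`.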

If you need support, please reach out to us at support@sesterce.com.
