API Reference

Inference Instances

Last updated 5 months ago

Get Inference models

This endpoint allows you to view available AI models for deployment, aiding in selecting the right model for your needs.

Check model features to match your specific project requirements.
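For example, here is a minimal client-side sketch (Python standard library; the base URL is a placeholder, not the real API host) that fetches the model catalogue and filters it by feature tag:

```python
import json
import urllib.request

# Placeholder: substitute the actual Sesterce Cloud API host.
BASE_URL = "https://api.example.com"

def get_models(api_key: str) -> list:
    """Call GET /ai-inference/models with the x-api-key header."""
    req = urllib.request.Request(
        f"{BASE_URL}/ai-inference/models",
        headers={"x-api-key": api_key, "Accept": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

def models_with_feature(models: list, feature: str) -> list:
    """Return the names of models that advertise a given feature tag."""
    return [m["name"] for m in models if feature in m.get("features", [])]
```

Given the example response body shown for this endpoint, `models_with_feature(models, "image")` would pick out `stable-diffusion`.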

Get inference hardware

You have several hardware options available through the AI inference feature of Sesterce Cloud (you can consult this section for more information). This endpoint allows you to explore the options for deploying AI instances, which is crucial for planning resources and managing latency.

Evaluate hardware capabilities to ensure optimal performance for your AI tasks.
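As an illustration (a sketch, not official tooling), the hardware list returned by this endpoint can be filtered to find a flavor that meets minimum GPU-memory and vCPU requirements:

```python
def pick_hardware(hardwares: list, min_gpu_mem_gib: int, min_vcpus: int = 0):
    """Return the first hardware flavor meeting the GPU memory (GiB) and
    vCPU requirements. The cpu field appears to be expressed in millicores
    (16000 for 16 vCPU in the sample response) -- an assumption here."""
    for hw in hardwares:
        if hw["gpu"]["memory"] >= min_gpu_mem_gib and hw["cpu"] >= min_vcpus * 1000:
            return hw
    return None
```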

Get Regions available for inference instances

Identify available regions for deploying AI instances, important for compliance and latency considerations.

The region choice is a crucial parameter for your inference endpoint hosting. It determines the latency experienced by your end users. Choose regions that align with your data residency and latency needs.
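For instance, a small helper (a sketch based on the response shape shown for this endpoint) can keep only active regions in allowed countries that still have capacity for a given flavor:

```python
def eligible_regions(regions: list, hardware_id: str, allowed_countries: set) -> list:
    """Return names of ACTIVE regions in allowed countries that still
    have free capacity for the requested hardware flavor."""
    names = []
    for region in regions:
        if region["state"] != "ACTIVE":
            continue
        if region["countryCode"] not in allowed_countries:
            continue
        has_capacity = any(
            c["hardwareId"] == hardware_id and c["capacity"] > 0
            for c in region["capacities"]
        )
        if has_capacity:
            names.append(region["name"])
    return names
```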

Create a Registry

Get the list of registries created

To manage your registries for storing and accessing AI models, use the following endpoint:

Update a registry

To modify registry details to ensure they meet current security and access needs, use the following endpoint:

Delete a Registry

The following endpoint allows you to remove outdated or unused registries to keep your environment clean.

Create an inference instance

The time has come! You can now deploy a new AI inference instance to scale your applications and services, or push an existing model to production. Use the following endpoint to perform this action.
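The request body (see the body schema for this endpoint) can be assembled client-side. A minimal sketch, leaving nullable fields as null where unused:

```python
def build_instance_payload(name: str, model_id: str, hardware_id: str,
                           region_id: int, min_scale: int = 1,
                           max_scale: int = 4) -> dict:
    """Assemble a minimal JSON body for POST /ai-inference/instances.
    Nullable fields (registryId, startupCommand, envs, ...) are set to
    None here and can be filled in when deploying a custom model."""
    return {
        "modelId": model_id,
        "containerPort": 80,
        "name": name,
        "description": None,
        "autoScalingConfigurations": [
            {
                "regionId": region_id,
                "scale": {"min": min_scale, "max": max_scale},
            }
        ],
        "podLifetime": None,
        "envs": None,
        "hardwareId": hardware_id,
        "startupCommand": None,
        "registryId": None,
        "apiKeyIds": None,
    }
```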

Start an inference instance

This endpoint allows you to activate an AI inference instance to begin processing tasks and data.

You can monitor startup times to assess performance efficiency.

Get the list of your Inference instances

Use this endpoint to list your active AI instances and monitor their resources and performance.

Get details about a specific Inference Instance

Retrieve detailed information about a specific AI instance for management and troubleshooting.

Preview AI instance pricing

This endpoint allows you to estimate costs for your running AI instances, helping with budget planning.

Sesterce Cloud's AI inference service uses unlimited-token pricing. This means you are charged a flat hourly price, regardless of how heavily your dedicated endpoint is used.
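In other words, the cost depends only on how long the instance runs (and on its flavor and scaling), never on token volume. A toy illustration:

```python
def estimated_cost(price_per_hour: float, hours_running: float) -> float:
    """Unlimited-token pricing: the bill is uptime times the hourly rate,
    regardless of how many tokens the endpoint served."""
    return round(price_per_hour * hours_running, 2)
```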

Update an inference instance

This endpoint allows you to modify existing AI instances to adapt to changing project needs. It is particularly useful if you need to update your hardware flavor and/or autoscaling limits according to the usage of your dedicated endpoint:
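A sketch of issuing that PATCH with Python's standard library (the base URL is a placeholder; the body follows the schema documented for this endpoint):

```python
import json
import urllib.request

# Placeholder host -- substitute the actual Sesterce Cloud API endpoint.
BASE_URL = "https://api.example.com"

def build_update_request(api_key: str, instance_id: str,
                         body: dict) -> urllib.request.Request:
    """Build (but do not send) a PATCH /ai-inference/instances/{id} request."""
    return urllib.request.Request(
        f"{BASE_URL}/ai-inference/instances/{instance_id}",
        data=json.dumps(body).encode("utf-8"),
        headers={"x-api-key": api_key, "Content-Type": "application/json"},
        method="PATCH",
    )

def update_instance(api_key: str, instance_id: str, body: dict) -> dict:
    """Send the update and return the refreshed instance representation."""
    with urllib.request.urlopen(build_update_request(api_key, instance_id, body)) as resp:
        return json.load(resp)
```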

Stop an inference instance

If you need to pause an AI instance to conserve resources and manage costs, use the following endpoint:

A registry is necessary if you need to run inference on your own custom model that is not publicly available on the Sesterce Cloud AI Inference service.

To create an inference instance, check that your credit balance is topped up. Please refer to our documentation to top up your balance.

get
Header parameters
x-api-key (string, required)

The API Key secret should be sent through this header to authenticate the request.

Responses
200 Success
application/json
get
GET /ai-inference/models HTTP/1.1
Host: 
x-api-key: text
Accept: */*
200 Success
[
  {
    "id": "12ed7523-432c-48f5-b3cd-32e6726d07c8",
    "name": "stable-diffusion",
    "port": 8000,
    "defaultHardwareId": "59651ba4-657a-41d4-8c42-00f34f732375",
    "features": [
      "image"
    ]
  }
]
get
Header parameters
x-api-key (string, required)

The API Key secret should be sent through this header to authenticate the request.

Responses
200 Success
application/json
get
GET /ai-inference/hardwares HTTP/1.1
Host: 
x-api-key: text
Accept: */*
200 Success
[
  {
    "id": "59651ba4-657a-41d4-8c42-00f34f732375",
    "name": "1xL40S / 16 vCPU / 232GiB RAM",
    "cpu": 16000,
    "ram": 237568,
    "gpu": {
      "model": "NVIDIA-L40S",
      "count": 1,
      "memory": 48
    }
  }
]
get
Header parameters
x-api-key (string, required)

The API Key secret should be sent through this header to authenticate the request.

Responses
200 Success
application/json
get
GET /ai-inference/regions HTTP/1.1
Host: 
x-api-key: text
Accept: */*
200 Success
[
  {
    "id": 18,
    "name": "Singapore",
    "countryCode": "SG",
    "state": "ACTIVE",
    "capacities": [
      {
        "hardwareId": "30328755-51f0-4251-b535-029358700099",
        "capacity": 38
      },
      {
        "hardwareId": "dc172cc8-0035-4527-958b-775f951e8836",
        "capacity": 19
      }
    ]
  }
]
get
Header parameters
x-api-key (string, required)

The API Key secret should be sent through this header to authenticate the request.

Responses
200 Success
application/json
get
GET /ai-inference/registries HTTP/1.1
Host: 
x-api-key: text
Accept: */*
200 Success
[
  {
    "_id": "6721058be81810b9dd045f40",
    "name": "example-registry",
    "url": "docker.io/library/user/image:tag",
    "createdAt": "2019-06-26T13:00:00.000Z",
    "updatedAt": "2019-06-26T13:00:00.000Z"
  }
]
delete
Path parameters
id (string, required)
Header parameters
x-api-key (string, required)

The API Key secret should be sent through this header to authenticate the request.

Responses
204
Registry successfully deleted.
404
Registry not found.
application/json
delete
DELETE /ai-inference/registries/{id} HTTP/1.1
Host: 
x-api-key: text
Accept: */*

No content

post
Path parameters
id (string, required)
Header parameters
x-api-key (string, required)

The API Key secret should be sent through this header to authenticate the request.

Responses
204
Inference instance successfully started
404
Inference instance not found.
application/json
post
POST /ai-inference/instances/{id}/start HTTP/1.1
Host: 
x-api-key: text
Accept: */*

No content

get
Header parameters
x-api-key (string, required)

The API Key secret should be sent through this header to authenticate the request.

Responses
200 Success
application/json
get
GET /ai-inference/instances HTTP/1.1
Host: 
x-api-key: text
Accept: */*
200 Success
[
  {
    "_id": "6721058be81810b9dd045f40",
    "name": "example-inference-instance",
    "status": "ACTIVE",
    "hourlyPrice": 2.55,
    "features": [
      "chat"
    ],
    "address": "https://iate-example-6672-d7c85154.ai.sesterce.dev/",
    "createdAt": "2019-06-26T13:00:00.000Z",
    "updatedAt": "2019-06-26T13:00:00.000Z"
  }
]
get
Path parameters
id (string, required)
Header parameters
x-api-key (string, required)

The API Key secret should be sent through this header to authenticate the request.

Responses
200 Success
application/json
404
Inference instance not found.
application/json
get
GET /ai-inference/instances/{id} HTTP/1.1
Host: 
x-api-key: text
Accept: */*
{
  "_id": "6721058be81810b9dd045f40",
  "name": "example-inference-instance",
  "status": "ACTIVE",
  "hourlyPrice": 2.55,
  "features": [
    "chat"
  ],
  "address": "https://iate-example-6672-d7c85154.ai.sesterce.dev/",
  "containerPort": 80,
  "description": "some description",
  "podLifetime": 120,
  "containers": [
    {
      "regionId": 78,
      "scale": {
        "min": 1,
        "max": 4,
        "cooldownPeriod": 300,
        "triggers": {
          "cpu": {
            "threshold": 92
          },
          "gpuMemory": {
            "threshold": 92
          },
          "gpuUtilization": {
            "threshold": 92
          },
          "memory": {
            "threshold": 92
          },
          "http": {
            "rate": 12,
            "window": 20
          }
        }
      },
      "deployStatus": {
        "total": 2,
        "ready": 1
      },
      "errorMessage": "Unexpected error occurred while launching the instance. Please contact the support team."
    }
  ],
  "hardwareId": "59651ba4-657a-41d4-8c42-00f34f732375",
  "envs": {
    "PORT": "3333"
  },
  "startupCommand": "npx create-llama",
  "apiKeysIds": [
    "6721058be81810b9dd045f40"
  ],
  "createdAt": "2019-06-26T13:00:00.000Z",
  "updatedAt": "2019-06-26T13:00:00.000Z"
}
post
Path parameters
id (string, required)
Header parameters
x-api-key (string, required)

The API Key secret should be sent through this header to authenticate the request.

Responses
204
Inference instance successfully stopped
404
Inference instance not found.
application/json
post
POST /ai-inference/instances/{id}/stop HTTP/1.1
Host: 
x-api-key: text
Accept: */*

No content

post
Header parameters
x-api-key (string, required)

The API Key secret should be sent through this header to authenticate the request.

Body
url (string, required). Example: docker.io/library/user/image:tag
username (string, required). Example: someusername
password (string, required). Example: securepassword
name (string, required). Example: example-registry
Responses
201 Success
application/json
post
POST /ai-inference/registries HTTP/1.1
Host: 
x-api-key: text
Content-Type: application/json
Accept: */*
Content-Length: 122

{
  "url": "docker.io/library/user/image:tag",
  "username": "someusername",
  "password": "securepassword",
  "name": "example-registry"
}
201 Success
{
  "_id": "6721058be81810b9dd045f40",
  "name": "example-registry",
  "url": "docker.io/library/user/image:tag",
  "username": "someusername",
  "createdAt": "2019-06-26T13:00:00.000Z",
  "updatedAt": "2019-06-26T13:00:00.000Z"
}
patch
Path parameters
id (string, required)
Header parameters
x-api-key (string, required)

The API Key secret should be sent through this header to authenticate the request.

Body
url (string, required). Example: docker.io/library/user/image:tag
username (string, required). Example: someusername
password (string, required). Example: securepassword
Responses
204
Registry successfully updated.
404
Registry not found.
application/json
patch
PATCH /ai-inference/registries/{id} HTTP/1.1
Host: 
x-api-key: text
Content-Type: application/json
Accept: */*
Content-Length: 96

{
  "url": "docker.io/library/user/image:tag",
  "username": "someusername",
  "password": "securepassword"
}

No content

post
Header parameters
x-api-key (string, required)

The API Key secret should be sent through this header to authenticate the request.

Body
modelId (string | nullable, required). Example: 201a99c3-7cd4-4831-865e-b261082fda4b
containerPort (number, required). Example: 80
name (string, required). Example: example-inference-instance
description (string | nullable, required). Example: example description.
podLifetime (number | nullable, required). Example: 120
envs (object | nullable, required). Example: {"PORT":"3333"}
hardwareId (string, required). Example: 59651ba4-657a-41d4-8c42-00f34f732375
startupCommand (string | nullable, required). Example: npx create-llama
registryId (string | nullable, required). Example: 6721058be81810b9dd045f40
apiKeyIds (string[] | nullable, required). Example: ["6721058be81810b9dd045f40"]
Responses
201 Success
application/json
400
You do not have enough credits to run this inference instance for at least one hour. Please add more credits.
application/json
404
Model image not found.
application/json
post
POST /ai-inference/instances HTTP/1.1
Host: 
x-api-key: text
Content-Type: application/json
Accept: */*
Content-Length: 601

{
  "modelId": "201a99c3-7cd4-4831-865e-b261082fda4b",
  "containerPort": 80,
  "name": "example-inference-instance",
  "description": "example description.",
  "autoScalingConfigurations": [
    {
      "regionId": 18,
      "scale": {
        "min": 1,
        "max": 4,
        "cooldownPeriod": 300,
        "triggers": {
          "cpu": {
            "threshold": 92
          },
          "gpuMemory": {
            "threshold": 92
          },
          "gpuUtilization": {
            "threshold": 92
          },
          "memory": {
            "threshold": 92
          },
          "http": {
            "rate": 12,
            "window": 20
          }
        }
      }
    }
  ],
  "podLifetime": 120,
  "envs": {
    "PORT": "3333"
  },
  "hardwareId": "59651ba4-657a-41d4-8c42-00f34f732375",
  "startupCommand": "npx create-llama",
  "registryId": "6721058be81810b9dd045f40",
  "apiKeyIds": [
    "6721058be81810b9dd045f40"
  ]
}
{
  "_id": "6721058be81810b9dd045f40",
  "name": "example-inference-instance",
  "status": "ACTIVE",
  "hourlyPrice": 2.55,
  "features": [
    "chat"
  ],
  "address": "https://iate-example-6672-d7c85154.ai.sesterce.dev/",
  "createdAt": "2019-06-26T13:00:00.000Z",
  "updatedAt": "2019-06-26T13:00:00.000Z"
}
post
Header parameters
x-api-key (string, required)

The API Key secret should be sent through this header to authenticate the request.

Body
hardwareId (string, required). Example: 59651ba4-657a-41d4-8c42-00f34f732375
Responses
201 Success
application/json
post
POST /ai-inference/instances/pricing HTTP/1.1
Host: 
x-api-key: text
Content-Type: application/json
Accept: */*
Content-Length: 117

{
  "hardwareId": "59651ba4-657a-41d4-8c42-00f34f732375",
  "autoScalingConfigurations": [
    {
      "regionId": 18,
      "scale": {
        "max": 4
      }
    }
  ]
}
201 Success
{
  "currencyCode": "USD",
  "pricePerHour": 1.44,
  "pricePerMonth": 43.2
}
patch
Path parameters
id (string, required)
Header parameters
x-api-key (string, required)

The API Key secret should be sent through this header to authenticate the request.

Body
modelId (string | nullable, required). Example: 201a99c3-7cd4-4831-865e-b261082fda4b
containerPort (number, required). Example: 80
description (string | nullable, required). Example: example description.
podLifetime (number | nullable, required). Example: 120
envs (object | nullable, required). Example: {"PORT":"3333"}
hardwareId (string, required). Example: 59651ba4-657a-41d4-8c42-00f34f732375
startupCommand (string | nullable, required). Example: npx create-llama
registryId (string | nullable, required). Example: 6721058be81810b9dd045f40
apiKeyIds (string[] | nullable, required). Example: ["6721058be81810b9dd045f40"]
Responses
200 Success
application/json
404
Inference instance not found.
application/json
patch
PATCH /ai-inference/instances/{id} HTTP/1.1
Host: 
x-api-key: text
Content-Type: application/json
Accept: */*
Content-Length: 565

{
  "modelId": "201a99c3-7cd4-4831-865e-b261082fda4b",
  "containerPort": 80,
  "description": "example description.",
  "autoScalingConfigurations": [
    {
      "regionId": 18,
      "scale": {
        "min": 1,
        "max": 4,
        "cooldownPeriod": 300,
        "triggers": {
          "cpu": {
            "threshold": 92
          },
          "gpuMemory": {
            "threshold": 92
          },
          "gpuUtilization": {
            "threshold": 92
          },
          "memory": {
            "threshold": 92
          },
          "http": {
            "rate": 12,
            "window": 20
          }
        }
      }
    }
  ],
  "podLifetime": 120,
  "envs": {
    "PORT": "3333"
  },
  "hardwareId": "59651ba4-657a-41d4-8c42-00f34f732375",
  "startupCommand": "npx create-llama",
  "registryId": "6721058be81810b9dd045f40",
  "apiKeyIds": [
    "6721058be81810b9dd045f40"
  ]
}
{
  "_id": "6721058be81810b9dd045f40",
  "name": "example-inference-instance",
  "status": "ACTIVE",
  "hourlyPrice": 2.55,
  "features": [
    "chat"
  ],
  "address": "https://iate-example-6672-d7c85154.ai.sesterce.dev/",
  "createdAt": "2019-06-26T13:00:00.000Z",
  "updatedAt": "2019-06-26T13:00:00.000Z"
}