Inference Instances
Get Inference models
This endpoint lists the AI models available for deployment, helping you select the right model for your needs.
The API Key secret should be sent through this header to authenticate the request.
GET /ai-inference/models HTTP/1.1
Host:
x-api-key: text
Accept: */*
[
{
"id": "12ed7523-432c-48f5-b3cd-32e6726d07c8",
"name": "stable-diffusion",
"port": 8000,
"defaultHardwareId": "59651ba4-657a-41d4-8c42-00f34f732375",
"features": [
"image"
]
}
]
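As a quick illustration, here is a minimal Python sketch that filters the response above by feature to find a suitable model. The payload is the sample response; the helper name is ours, not part of the API.

```python
# Sample payload taken from the /ai-inference/models response example above.
models = [
    {
        "id": "12ed7523-432c-48f5-b3cd-32e6726d07c8",
        "name": "stable-diffusion",
        "port": 8000,
        "defaultHardwareId": "59651ba4-657a-41d4-8c42-00f34f732375",
        "features": ["image"],
    }
]

def models_with_feature(models, feature):
    """Return the models whose 'features' list includes the given feature."""
    return [m for m in models if feature in m.get("features", [])]

image_models = models_with_feature(models, "image")
print(image_models[0]["name"])  # stable-diffusion
```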
Get inference hardware
Several hardware options are available through the AI inference feature of Sesterce Cloud (you can consult this section for more information). This endpoint lets you explore the options for deploying AI instances, which is essential for planning resources and managing latency.
The API Key secret should be sent through this header to authenticate the request.
GET /ai-inference/hardwares HTTP/1.1
Host:
x-api-key: text
Accept: */*
[
{
"id": "59651ba4-657a-41d4-8c42-00f34f732375",
"name": "1xL40S / 16 vCPU / 232GiB RAM",
"cpu": 16000,
"ram": 237568,
"gpu": {
"model": "NVIDIA-L40S",
"count": 1,
"memory": 48
}
}
]
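Judging by the sample, `cpu` appears to be expressed in millicores and `ram` in MiB; this is an assumption inferred from the hardware name, not something the reference states. A small sketch checking that reading against the example:

```python
# Hardware entry taken from the response example above. Assuming 'cpu' is in
# millicores and 'ram' in MiB (an inference from the name, not a documented
# fact): 16000 millicores = 16 vCPU, 237568 MiB = 232 GiB, which matches
# the "1xL40S / 16 vCPU / 232GiB RAM" name.
hardware = {
    "id": "59651ba4-657a-41d4-8c42-00f34f732375",
    "name": "1xL40S / 16 vCPU / 232GiB RAM",
    "cpu": 16000,
    "ram": 237568,
    "gpu": {"model": "NVIDIA-L40S", "count": 1, "memory": 48},
}

vcpus = hardware["cpu"] // 1000    # millicores -> vCPU
ram_gib = hardware["ram"] // 1024  # MiB -> GiB
gpu = hardware["gpu"]
label = f'{gpu["count"]}x{gpu["model"]} / {vcpus} vCPU / {ram_gib}GiB RAM'
```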
Get Regions available for inference instances
This endpoint identifies the regions available for deploying AI instances, which matters for compliance and latency considerations.
The API Key secret should be sent through this header to authenticate the request.
GET /ai-inference/regions HTTP/1.1
Host:
x-api-key: text
Accept: */*
[
{
"id": 18,
"name": "Singapore",
"countryCode": "SG",
"state": "ACTIVE",
"capacities": [
{
"hardwareId": "30328755-51f0-4251-b535-029358700099",
"capacity": 38
},
{
"hardwareId": "dc172cc8-0035-4527-958b-775f951e8836",
"capacity": 19
}
]
}
]
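Before deploying, you may want to check which regions still have capacity for your chosen hardware flavor. A minimal sketch over the response shape above (the helper name is ours):

```python
# Sample payload taken from the /ai-inference/regions response example above.
regions = [
    {
        "id": 18,
        "name": "Singapore",
        "countryCode": "SG",
        "state": "ACTIVE",
        "capacities": [
            {"hardwareId": "30328755-51f0-4251-b535-029358700099", "capacity": 38},
            {"hardwareId": "dc172cc8-0035-4527-958b-775f951e8836", "capacity": 19},
        ],
    }
]

def regions_with_capacity(regions, hardware_id):
    """Return ACTIVE regions with at least one free slot for hardware_id."""
    return [
        r for r in regions
        if r["state"] == "ACTIVE"
        and any(c["hardwareId"] == hardware_id and c["capacity"] > 0
                for c in r["capacities"])
    ]

matches = regions_with_capacity(regions, "dc172cc8-0035-4527-958b-775f951e8836")
```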
Create a Registry
A registry is required if you need to run inference on your own custom model that is not publicly available. Click here to learn more about Registries on the Sesterce Cloud AI Inference service!
The API Key secret should be sent through this header to authenticate the request.
POST /ai-inference/registries HTTP/1.1
Host:
x-api-key: text
Content-Type: application/json
Accept: */*
Content-Length: 122
{
"url": "docker.io/library/user/image:tag",
"username": "someusername",
"password": "securepassword",
"name": "example-registry"
}
{
"_id": "6721058be81810b9dd045f40",
"name": "example-registry",
"url": "docker.io/library/user/image:tag",
"username": "someusername",
"createdAt": "2019-06-26T13:00:00.000Z",
"updatedAt": "2019-06-26T13:00:00.000Z"
}
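The request above can be sketched in Python using only the standard library. The `API_HOST` value is a placeholder (the reference leaves `Host` blank), and the key is the `x-api-key` secret from your account:

```python
import json
import urllib.request

API_HOST = "https://api.example.com"  # placeholder: the reference leaves Host blank
API_KEY = "text"                      # your API Key secret

payload = {
    "url": "docker.io/library/user/image:tag",
    "username": "someusername",
    "password": "securepassword",
    "name": "example-registry",
}

req = urllib.request.Request(
    f"{API_HOST}/ai-inference/registries",
    data=json.dumps(payload).encode(),
    headers={"x-api-key": API_KEY, "Content-Type": "application/json"},
    method="POST",
)
# urllib.request.urlopen(req) would send the request and return the created
# registry document shown in the example response.
```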
Get the list of registries created
To manage your registries for storing and accessing AI models, use the following endpoint:
The API Key secret should be sent through this header to authenticate the request.
GET /ai-inference/registries HTTP/1.1
Host:
x-api-key: text
Accept: */*
[
{
"_id": "6721058be81810b9dd045f40",
"name": "example-registry",
"url": "docker.io/library/user/image:tag",
"createdAt": "2019-06-26T13:00:00.000Z",
"updatedAt": "2019-06-26T13:00:00.000Z"
}
]
Update a registry
To modify registry details to ensure they meet current security and access needs, use the following endpoint:
The API Key secret should be sent through this header to authenticate the request.
PATCH /ai-inference/registries/{id} HTTP/1.1
Host:
x-api-key: text
Content-Type: application/json
Accept: */*
Content-Length: 96
{
"url": "docker.io/library/user/image:tag",
"username": "someusername",
"password": "securepassword"
}
No content
Delete a Registry
The following endpoint allows you to remove outdated or unused registries to keep your environment clean.
The API Key secret should be sent through this header to authenticate the request.
DELETE /ai-inference/registries/{id} HTTP/1.1
Host:
x-api-key: text
Accept: */*
No content
Create an inference instance
Time has come! You can now deploy a new AI inference instance to scale your applications and services, or deploy an existing model to production. Use the following endpoint to perform this action.
Before creating an inference instance, make sure your credit balance is sufficient. Please check our documentation here to top up your balance.
The API Key secret should be sent through this header to authenticate the request.
POST /ai-inference/instances HTTP/1.1
Host:
x-api-key: text
Content-Type: application/json
Accept: */*
Content-Length: 601
{
"modelId": "201a99c3-7cd4-4831-865e-b261082fda4b",
"containerPort": 80,
"name": "example-inference-instance",
"description": "example description.",
"autoScalingConfigurations": [
{
"regionId": 18,
"scale": {
"min": 1,
"max": 4,
"cooldownPeriod": 300,
"triggers": {
"cpu": {
"threshold": 92
},
"gpuMemory": {
"threshold": 92
},
"gpuUtilization": {
"threshold": 92
},
"memory": {
"threshold": 92
},
"http": {
"rate": 12,
"window": 20
}
}
}
}
],
"podLifetime": 120,
"envs": {
"PORT": "3333"
},
"hardwareId": "59651ba4-657a-41d4-8c42-00f34f732375",
"startupCommand": "npx create-llama",
"registryId": "6721058be81810b9dd045f40",
"apiKeyIds": [
"6721058be81810b9dd045f40"
]
}
{
"_id": "6721058be81810b9dd045f40",
"name": "example-inference-instance",
"status": "ACTIVE",
"hourlyPrice": 2.55,
"features": [
"chat"
],
"address": "https://iate-example-6672-d7c85154.ai.sesterce.dev/",
"createdAt": "2019-06-26T13:00:00.000Z",
"updatedAt": "2019-06-26T13:00:00.000Z"
}
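The `autoScalingConfigurations` array is the most intricate part of the body above. A small builder sketch, with defaults copied from the example payload (the function name and parameter names are ours):

```python
# Sketch: build one autoScalingConfigurations element as used in the
# create-instance body above. Default values come from the example payload.
def scaling_config(region_id, min_replicas=1, max_replicas=4,
                   cooldown=300, threshold=92, http_rate=12, http_window=20):
    """Return one autoScalingConfigurations element."""
    return {
        "regionId": region_id,
        "scale": {
            "min": min_replicas,
            "max": max_replicas,
            "cooldownPeriod": cooldown,
            "triggers": {
                "cpu": {"threshold": threshold},
                "gpuMemory": {"threshold": threshold},
                "gpuUtilization": {"threshold": threshold},
                "memory": {"threshold": threshold},
                "http": {"rate": http_rate, "window": http_window},
            },
        },
    }

body = {
    "modelId": "201a99c3-7cd4-4831-865e-b261082fda4b",
    "containerPort": 80,
    "name": "example-inference-instance",
    "hardwareId": "59651ba4-657a-41d4-8c42-00f34f732375",
    "autoScalingConfigurations": [scaling_config(18)],
}
```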
Start an inference instance
This endpoint allows you to activate an AI inference instance to begin processing tasks and data.
The API Key secret should be sent through this header to authenticate the request.
POST /ai-inference/instances/{id}/start HTTP/1.1
Host:
x-api-key: text
Accept: */*
No content
Get the list of your Inference instances
Use this endpoint to list your AI instances so you can monitor resources and performance.
The API Key secret should be sent through this header to authenticate the request.
GET /ai-inference/instances HTTP/1.1
Host:
x-api-key: text
Accept: */*
[
{
"_id": "6721058be81810b9dd045f40",
"name": "example-inference-instance",
"status": "ACTIVE",
"hourlyPrice": 2.55,
"features": [
"chat"
],
"address": "https://iate-example-6672-d7c85154.ai.sesterce.dev/",
"createdAt": "2019-06-26T13:00:00.000Z",
"updatedAt": "2019-06-26T13:00:00.000Z"
}
]
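One common use of this list is cost monitoring. A minimal sketch that sums the hourly price of ACTIVE instances from the response above:

```python
# Sample payload taken from the /ai-inference/instances response example above.
instances = [
    {
        "_id": "6721058be81810b9dd045f40",
        "name": "example-inference-instance",
        "status": "ACTIVE",
        "hourlyPrice": 2.55,
    }
]

# Sum the hourly price of everything currently running.
active_hourly_cost = sum(
    i["hourlyPrice"] for i in instances if i["status"] == "ACTIVE"
)
```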
Get details about a specific Inference Instance
Retrieve detailed information about a specific AI instance for management and troubleshooting.
The API Key secret should be sent through this header to authenticate the request.
GET /ai-inference/instances/{id} HTTP/1.1
Host:
x-api-key: text
Accept: */*
{
"_id": "6721058be81810b9dd045f40",
"name": "example-inference-instance",
"status": "ACTIVE",
"hourlyPrice": 2.55,
"features": [
"chat"
],
"address": "https://iate-example-6672-d7c85154.ai.sesterce.dev/",
"containerPort": 80,
"description": "some description",
"podLifetime": 120,
"containers": [
{
"regionId": 78,
"scale": {
"min": 1,
"max": 4,
"cooldownPeriod": 300,
"triggers": {
"cpu": {
"threshold": 92
},
"gpuMemory": {
"threshold": 92
},
"gpuUtilization": {
"threshold": 92
},
"memory": {
"threshold": 92
},
"http": {
"rate": 12,
"window": 20
}
}
},
"deployStatus": {
"total": 2,
"ready": 1
},
"errorMessage": "Unexpected error occurred while launching the instance. Please contact the support team."
}
],
"hardwareId": "59651ba4-657a-41d4-8c42-00f34f732375",
"envs": {
"PORT": "3333"
},
"startupCommand": "npx create-llama",
"apiKeysIds": [
"6721058be81810b9dd045f40"
],
"createdAt": "2019-06-26T13:00:00.000Z",
"updatedAt": "2019-06-26T13:00:00.000Z"
}
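For troubleshooting, the per-region `containers[].deployStatus` counters show rollout progress. A sketch that summarizes them (the helper name is ours):

```python
# Containers array taken from the instance detail response example above.
containers = [
    {
        "regionId": 78,
        "deployStatus": {"total": 2, "ready": 1},
        "errorMessage": "Unexpected error occurred while launching the "
                        "instance. Please contact the support team.",
    }
]

def rollout_summary(containers):
    """Map regionId -> ready/total counters plus any reported error."""
    return {
        c["regionId"]: {
            "ready": c["deployStatus"]["ready"],
            "total": c["deployStatus"]["total"],
            "error": c.get("errorMessage"),
        }
        for c in containers
    }

summary = rollout_summary(containers)
```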
Preview AI instance pricing
This endpoint allows you to estimate costs for your AI instances, helping with budget planning.
The API Key secret should be sent through this header to authenticate the request.
POST /ai-inference/instances/pricing HTTP/1.1
Host:
x-api-key: text
Content-Type: application/json
Accept: */*
Content-Length: 117
{
"hardwareId": "59651ba4-657a-41d4-8c42-00f34f732375",
"autoScalingConfigurations": [
{
"regionId": 18,
"scale": {
"max": 4
}
}
]
}
{
"currencyCode": "USD",
"pricePerHour": 1.44,
"pricePerMonth": 43.2
}
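A sketch assembling the minimal pricing body from the example above and deriving a worst-case figure at full scale. Whether `pricePerHour` already accounts for the maximum replica count is an assumption to verify against your actual billing:

```python
# Minimal pricing-preview body, as in the example request above: only
# hardwareId and a per-region max scale are supplied.
pricing_body = {
    "hardwareId": "59651ba4-657a-41d4-8c42-00f34f732375",
    "autoScalingConfigurations": [{"regionId": 18, "scale": {"max": 4}}],
}

# Sample response from the example above.
response = {"currencyCode": "USD", "pricePerHour": 1.44, "pricePerMonth": 43.2}

# Assumption: pricePerHour is per replica, so the worst case at full scale
# multiplies by the largest max across regions. Verify against your invoice.
max_replicas = max(
    c["scale"]["max"] for c in pricing_body["autoScalingConfigurations"]
)
worst_case_per_hour = response["pricePerHour"] * max_replicas
```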
Update an inference instance
This endpoint allows you to modify an existing AI instance to adapt to changing project needs. This is particularly useful if you need to update your hardware flavor and/or autoscaling limits based on the usage of your dedicated endpoint:
The API Key secret should be sent through this header to authenticate the request.
PATCH /ai-inference/instances/{id} HTTP/1.1
Host:
x-api-key: text
Content-Type: application/json
Accept: */*
Content-Length: 565
{
"modelId": "201a99c3-7cd4-4831-865e-b261082fda4b",
"containerPort": 80,
"description": "example description.",
"autoScalingConfigurations": [
{
"regionId": 18,
"scale": {
"min": 1,
"max": 4,
"cooldownPeriod": 300,
"triggers": {
"cpu": {
"threshold": 92
},
"gpuMemory": {
"threshold": 92
},
"gpuUtilization": {
"threshold": 92
},
"memory": {
"threshold": 92
},
"http": {
"rate": 12,
"window": 20
}
}
}
}
],
"podLifetime": 120,
"envs": {
"PORT": "3333"
},
"hardwareId": "59651ba4-657a-41d4-8c42-00f34f732375",
"startupCommand": "npx create-llama",
"registryId": "6721058be81810b9dd045f40",
"apiKeyIds": [
"6721058be81810b9dd045f40"
]
}
{
"_id": "6721058be81810b9dd045f40",
"name": "example-inference-instance",
"status": "ACTIVE",
"hourlyPrice": 2.55,
"features": [
"chat"
],
"address": "https://iate-example-6672-d7c85154.ai.sesterce.dev/",
"createdAt": "2019-06-26T13:00:00.000Z",
"updatedAt": "2019-06-26T13:00:00.000Z"
}
Stop an inference instance
If you need to pause an AI instance to conserve resources and manage costs, use the following endpoint:
The API Key secret should be sent through this header to authenticate the request.
POST /ai-inference/instances/{id}/stop HTTP/1.1
Host:
x-api-key: text
Accept: */*
No content