API Reference

Inference Instances

Last updated 5 months ago

Get Inference models

This endpoint allows you to view available AI models for deployment, aiding in selecting the right model for your needs.

Check model features to match your specific project requirements.
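For example, here is a minimal client-side sketch (Python standard library; the base URL is a placeholder, not the real API host) that fetches the model catalogue and filters it by feature tag:

```python
import json
import urllib.request

# Placeholder: substitute the actual Sesterce Cloud API host.
BASE_URL = "https://api.example.com"

def get_models(api_key: str) -> list:
    """Call GET /ai-inference/models with the x-api-key header."""
    req = urllib.request.Request(
        f"{BASE_URL}/ai-inference/models",
        headers={"x-api-key": api_key, "Accept": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

def models_with_feature(models: list, feature: str) -> list:
    """Return the names of models that advertise a given feature tag."""
    return [m["name"] for m in models if feature in m.get("features", [])]
```

Given the example response body shown for this endpoint, `models_with_feature(models, "image")` would pick out `stable-diffusion`.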

Get inference hardware

You have several hardware options available through the AI inference feature of Sesterce Cloud (you can consult this section for more information). This endpoint allows you to explore the options for deploying AI instances, which is crucial for planning resources and managing latency.

Evaluate hardware capabilities to ensure optimal performance for your AI tasks.
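As an illustration (a sketch, not official tooling), the hardware list returned by this endpoint can be filtered to find a flavor that meets minimum GPU-memory and vCPU requirements:

```python
def pick_hardware(hardwares: list, min_gpu_mem_gib: int, min_vcpus: int = 0):
    """Return the first hardware flavor meeting the GPU memory (GiB) and
    vCPU requirements. The cpu field appears to be expressed in millicores
    (16000 for 16 vCPU in the sample response) -- an assumption here."""
    for hw in hardwares:
        if hw["gpu"]["memory"] >= min_gpu_mem_gib and hw["cpu"] >= min_vcpus * 1000:
            return hw
    return None
```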

Get Regions available for inference instances

Identify available regions for deploying AI instances, important for compliance and latency considerations.

The region choice is a crucial parameter for your inference endpoint hosting. It determines the latency experienced by your end users. Choose regions that align with your data residency and latency needs.
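For instance, a small helper (a sketch based on the response shape shown for this endpoint) can keep only active regions in allowed countries that still have capacity for a given flavor:

```python
def eligible_regions(regions: list, hardware_id: str, allowed_countries: set) -> list:
    """Return names of ACTIVE regions in allowed countries that still
    have free capacity for the requested hardware flavor."""
    names = []
    for region in regions:
        if region["state"] != "ACTIVE":
            continue
        if region["countryCode"] not in allowed_countries:
            continue
        has_capacity = any(
            c["hardwareId"] == hardware_id and c["capacity"] > 0
            for c in region["capacities"]
        )
        if has_capacity:
            names.append(region["name"])
    return names
```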

Create a Registry

Get the list of registries created

To manage your registries for storing and accessing AI models, use the following endpoint:

Update a registry

To modify registry details to ensure they meet current security and access needs, use the following endpoint:

Delete a Registry

The following endpoint allows you to remove outdated or unused registries to keep your environment clean.

Create an inference instance

The time has come! You can now deploy a new AI inference instance to scale your applications and services, or push an existing model to production. Use the following endpoint to perform this action.
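The request body (see the body schema for this endpoint) can be assembled client-side. A minimal sketch, leaving nullable fields as null where unused:

```python
def build_instance_payload(name: str, model_id: str, hardware_id: str,
                           region_id: int, min_scale: int = 1,
                           max_scale: int = 4) -> dict:
    """Assemble a minimal JSON body for POST /ai-inference/instances.
    Nullable fields (registryId, startupCommand, envs, ...) are set to
    None here and can be filled in when deploying a custom model."""
    return {
        "modelId": model_id,
        "containerPort": 80,
        "name": name,
        "description": None,
        "autoScalingConfigurations": [
            {
                "regionId": region_id,
                "scale": {"min": min_scale, "max": max_scale},
            }
        ],
        "podLifetime": None,
        "envs": None,
        "hardwareId": hardware_id,
        "startupCommand": None,
        "registryId": None,
        "apiKeyIds": None,
    }
```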

Start an inference instance

This endpoint allows you to activate an AI inference instance to begin processing tasks and data.

You can monitor startup times to assess performance efficiency.

Get the list of your Inference instances

Use this endpoint to list your active AI instances and monitor their resources and performance.

Get details about a specific Inference Instance

Retrieve detailed information about a specific AI instance for management and troubleshooting.

Preview AI instance pricing

This endpoint allows you to estimate costs for your running AI instances, helping with budget planning.

Sesterce Cloud's AI inference service uses unlimited-token pricing. This means you are charged a flat hourly price, regardless of how heavily your dedicated endpoint is used.
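In other words, the cost depends only on how long the instance runs (and on its flavor and scaling), never on token volume. A toy illustration:

```python
def estimated_cost(price_per_hour: float, hours_running: float) -> float:
    """Unlimited-token pricing: the bill is uptime times the hourly rate,
    regardless of how many tokens the endpoint served."""
    return round(price_per_hour * hours_running, 2)
```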

Update an inference instance

This endpoint allows you to modify existing AI instances to adapt to changing project needs. It is particularly useful if you need to update your hardware flavor and/or autoscaling limits according to the usage of your dedicated endpoint:
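A sketch of issuing that PATCH with Python's standard library (the base URL is a placeholder; the body follows the schema documented for this endpoint):

```python
import json
import urllib.request

# Placeholder host -- substitute the actual Sesterce Cloud API endpoint.
BASE_URL = "https://api.example.com"

def build_update_request(api_key: str, instance_id: str,
                         body: dict) -> urllib.request.Request:
    """Build (but do not send) a PATCH /ai-inference/instances/{id} request."""
    return urllib.request.Request(
        f"{BASE_URL}/ai-inference/instances/{instance_id}",
        data=json.dumps(body).encode("utf-8"),
        headers={"x-api-key": api_key, "Content-Type": "application/json"},
        method="PATCH",
    )

def update_instance(api_key: str, instance_id: str, body: dict) -> dict:
    """Send the update and return the refreshed instance representation."""
    with urllib.request.urlopen(build_update_request(api_key, instance_id, body)) as resp:
        return json.load(resp)
```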

Stop an inference instance

If you need to pause an AI instance to conserve resources and manage costs, use the following endpoint:

A registry is necessary if you need to run inference on your own custom model that is not publicly available on the Sesterce Cloud AI Inference service.

To create an inference instance, check that your credit balance is topped up. Please refer to our documentation to top up your balance.

get
Header parameters
x-api-key (string, required)

The API Key secret should be sent through this header to authenticate the request.

Responses
200 Success
application/json
get
GET /ai-inference/models HTTP/1.1
Host: 
x-api-key: text
Accept: */*
200 Success
[
  {
    "id": "12ed7523-432c-48f5-b3cd-32e6726d07c8",
    "name": "stable-diffusion",
    "port": 8000,
    "defaultHardwareId": "59651ba4-657a-41d4-8c42-00f34f732375",
    "features": [
      "image"
    ]
  }
]
get
Header parameters
x-api-key (string, required)

The API Key secret should be sent through this header to authenticate the request.

Responses
200 Success
application/json
get
GET /ai-inference/hardwares HTTP/1.1
Host: 
x-api-key: text
Accept: */*
200 Success
[
  {
    "id": "59651ba4-657a-41d4-8c42-00f34f732375",
    "name": "1xL40S / 16 vCPU / 232GiB RAM",
    "cpu": 16000,
    "ram": 237568,
    "gpu": {
      "model": "NVIDIA-L40S",
      "count": 1,
      "memory": 48
    }
  }
]
get
Header parameters
x-api-key (string, required)

The API Key secret should be sent through this header to authenticate the request.

Responses
200 Success
application/json
get
GET /ai-inference/regions HTTP/1.1
Host: 
x-api-key: text
Accept: */*
200 Success
[
  {
    "id": 18,
    "name": "Singapore",
    "countryCode": "SG",
    "state": "ACTIVE",
    "capacities": [
      {
        "hardwareId": "30328755-51f0-4251-b535-029358700099",
        "capacity": 38
      },
      {
        "hardwareId": "dc172cc8-0035-4527-958b-775f951e8836",
        "capacity": 19
      }
    ]
  }
]
get
Header parameters
x-api-key (string, required)

The API Key secret should be sent through this header to authenticate the request.

Responses
200 Success
application/json
get
GET /ai-inference/registries HTTP/1.1
Host: 
x-api-key: text
Accept: */*
200 Success
[
  {
    "_id": "6721058be81810b9dd045f40",
    "name": "example-registry",
    "url": "docker.io/library/user/image:tag",
    "createdAt": "2019-06-26T13:00:00.000Z",
    "updatedAt": "2019-06-26T13:00:00.000Z"
  }
]
delete
Path parameters
id (string, required)
Header parameters
x-api-key (string, required)

The API Key secret should be sent through this header to authenticate the request.

Responses
204
Registry successfully deleted.
404
Registry not found.
application/json
delete
DELETE /ai-inference/registries/{id} HTTP/1.1
Host: 
x-api-key: text
Accept: */*

No content

post
Path parameters
id (string, required)
Header parameters
x-api-key (string, required)

The API Key secret should be sent through this header to authenticate the request.

Responses
204
Inference instance successfully started
404
Inference instance not found.
application/json
post
POST /ai-inference/instances/{id}/start HTTP/1.1
Host: 
x-api-key: text
Accept: */*

No content

get
Header parameters
x-api-key (string, required)

The API Key secret should be sent through this header to authenticate the request.

Responses
200 Success
application/json
get
GET /ai-inference/instances HTTP/1.1
Host: 
x-api-key: text
Accept: */*
200 Success
[
  {
    "_id": "6721058be81810b9dd045f40",
    "name": "example-inference-instance",
    "status": "ACTIVE",
    "hourlyPrice": 2.55,
    "features": [
      "chat"
    ],
    "address": "https://iate-example-6672-d7c85154.ai.sesterce.dev/",
    "createdAt": "2019-06-26T13:00:00.000Z",
    "updatedAt": "2019-06-26T13:00:00.000Z"
  }
]
get
Path parameters
id (string, required)
Header parameters
x-api-key (string, required)

The API Key secret should be sent through this header to authenticate the request.

Responses
200 Success
application/json
404
Inference instance not found.
application/json
get
GET /ai-inference/instances/{id} HTTP/1.1
Host: 
x-api-key: text
Accept: */*
{
  "_id": "6721058be81810b9dd045f40",
  "name": "example-inference-instance",
  "status": "ACTIVE",
  "hourlyPrice": 2.55,
  "features": [
    "chat"
  ],
  "address": "https://iate-example-6672-d7c85154.ai.sesterce.dev/",
  "containerPort": 80,
  "description": "some description",
  "podLifetime": 120,
  "containers": [
    {
      "regionId": 78,
      "scale": {
        "min": 1,
        "max": 4,
        "cooldownPeriod": 300,
        "triggers": {
          "cpu": {
            "threshold": 92
          },
          "gpuMemory": {
            "threshold": 92
          },
          "gpuUtilization": {
            "threshold": 92
          },
          "memory": {
            "threshold": 92
          },
          "http": {
            "rate": 12,
            "window": 20
          }
        }
      },
      "deployStatus": {
        "total": 2,
        "ready": 1
      },
      "errorMessage": "Unexpected error occurred while launching the instance. Please contact the support team."
    }
  ],
  "hardwareId": "59651ba4-657a-41d4-8c42-00f34f732375",
  "envs": {
    "PORT": "3333"
  },
  "startupCommand": "npx create-llama",
  "apiKeysIds": [
    "6721058be81810b9dd045f40"
  ],
  "createdAt": "2019-06-26T13:00:00.000Z",
  "updatedAt": "2019-06-26T13:00:00.000Z"
}
post
Path parameters
id (string, required)
Header parameters
x-api-key (string, required)

The API Key secret should be sent through this header to authenticate the request.

Responses
204
Inference instance successfully stopped
404
Inference instance not found.
application/json
post
POST /ai-inference/instances/{id}/stop HTTP/1.1
Host: 
x-api-key: text
Accept: */*

No content

post
Header parameters
x-api-key (string, required)

The API Key secret should be sent through this header to authenticate the request.

Body
url (string, required). Example: docker.io/library/user/image:tag
username (string, required). Example: someusername
password (string, required). Example: securepassword
name (string, required). Example: example-registry
Responses
201 Success
application/json
post
POST /ai-inference/registries HTTP/1.1
Host: 
x-api-key: text
Content-Type: application/json
Accept: */*
Content-Length: 122

{
  "url": "docker.io/library/user/image:tag",
  "username": "someusername",
  "password": "securepassword",
  "name": "example-registry"
}
201 Success
{
  "_id": "6721058be81810b9dd045f40",
  "name": "example-registry",
  "url": "docker.io/library/user/image:tag",
  "username": "someusername",
  "createdAt": "2019-06-26T13:00:00.000Z",
  "updatedAt": "2019-06-26T13:00:00.000Z"
}
patch
Path parameters
id (string, required)
Header parameters
x-api-key (string, required)

The API Key secret should be sent through this header to authenticate the request.

Body
url (string, required). Example: docker.io/library/user/image:tag
username (string, required). Example: someusername
password (string, required). Example: securepassword
Responses
204
Registry successfully updated.
404
Registry not found.
application/json
patch
PATCH /ai-inference/registries/{id} HTTP/1.1
Host: 
x-api-key: text
Content-Type: application/json
Accept: */*
Content-Length: 96

{
  "url": "docker.io/library/user/image:tag",
  "username": "someusername",
  "password": "securepassword"
}

No content

post
Header parameters
x-api-key (string, required)

The API Key secret should be sent through this header to authenticate the request.

Body
modelId (string | nullable, required). Example: 201a99c3-7cd4-4831-865e-b261082fda4b
containerPort (number, required). Example: 80
name (string, required). Example: example-inference-instance
description (string | nullable, required). Example: example description.
podLifetime (number | nullable, required). Example: 120
envs (object | nullable, required). Example: {"PORT":"3333"}
hardwareId (string, required). Example: 59651ba4-657a-41d4-8c42-00f34f732375
startupCommand (string | nullable, required). Example: npx create-llama
registryId (string | nullable, required). Example: 6721058be81810b9dd045f40
apiKeyIds (string[] | nullable, required). Example: ["6721058be81810b9dd045f40"]
Responses
201 Success
application/json
400
You do not have enough credits to run this inference instance for at least one hour. Please add more credits.
application/json
404
Model image not found.
application/json
post
POST /ai-inference/instances HTTP/1.1
Host: 
x-api-key: text
Content-Type: application/json
Accept: */*
Content-Length: 601

{
  "modelId": "201a99c3-7cd4-4831-865e-b261082fda4b",
  "containerPort": 80,
  "name": "example-inference-instance",
  "description": "example description.",
  "autoScalingConfigurations": [
    {
      "regionId": 18,
      "scale": {
        "min": 1,
        "max": 4,
        "cooldownPeriod": 300,
        "triggers": {
          "cpu": {
            "threshold": 92
          },
          "gpuMemory": {
            "threshold": 92
          },
          "gpuUtilization": {
            "threshold": 92
          },
          "memory": {
            "threshold": 92
          },
          "http": {
            "rate": 12,
            "window": 20
          }
        }
      }
    }
  ],
  "podLifetime": 120,
  "envs": {
    "PORT": "3333"
  },
  "hardwareId": "59651ba4-657a-41d4-8c42-00f34f732375",
  "startupCommand": "npx create-llama",
  "registryId": "6721058be81810b9dd045f40",
  "apiKeyIds": [
    "6721058be81810b9dd045f40"
  ]
}
{
  "_id": "6721058be81810b9dd045f40",
  "name": "example-inference-instance",
  "status": "ACTIVE",
  "hourlyPrice": 2.55,
  "features": [
    "chat"
  ],
  "address": "https://iate-example-6672-d7c85154.ai.sesterce.dev/",
  "createdAt": "2019-06-26T13:00:00.000Z",
  "updatedAt": "2019-06-26T13:00:00.000Z"
}
post
Header parameters
x-api-key (string, required)

The API Key secret should be sent through this header to authenticate the request.

Body
hardwareId (string, required). Example: 59651ba4-657a-41d4-8c42-00f34f732375
Responses
201 Success
application/json
post
POST /ai-inference/instances/pricing HTTP/1.1
Host: 
x-api-key: text
Content-Type: application/json
Accept: */*
Content-Length: 117

{
  "hardwareId": "59651ba4-657a-41d4-8c42-00f34f732375",
  "autoScalingConfigurations": [
    {
      "regionId": 18,
      "scale": {
        "max": 4
      }
    }
  ]
}
201 Success
{
  "currencyCode": "USD",
  "pricePerHour": 1.44,
  "pricePerMonth": 43.2
}
patch
Path parameters
id (string, required)
Header parameters
x-api-key (string, required)

The API Key secret should be sent through this header to authenticate the request.

Body
modelId (string | nullable, required). Example: 201a99c3-7cd4-4831-865e-b261082fda4b
containerPort (number, required). Example: 80
description (string | nullable, required). Example: example description.
podLifetime (number | nullable, required). Example: 120
envs (object | nullable, required). Example: {"PORT":"3333"}
hardwareId (string, required). Example: 59651ba4-657a-41d4-8c42-00f34f732375
startupCommand (string | nullable, required). Example: npx create-llama
registryId (string | nullable, required). Example: 6721058be81810b9dd045f40
apiKeyIds (string[] | nullable, required). Example: ["6721058be81810b9dd045f40"]
Responses
200 Success
application/json
404
Inference instance not found.
application/json
patch
PATCH /ai-inference/instances/{id} HTTP/1.1
Host: 
x-api-key: text
Content-Type: application/json
Accept: */*
Content-Length: 565

{
  "modelId": "201a99c3-7cd4-4831-865e-b261082fda4b",
  "containerPort": 80,
  "description": "example description.",
  "autoScalingConfigurations": [
    {
      "regionId": 18,
      "scale": {
        "min": 1,
        "max": 4,
        "cooldownPeriod": 300,
        "triggers": {
          "cpu": {
            "threshold": 92
          },
          "gpuMemory": {
            "threshold": 92
          },
          "gpuUtilization": {
            "threshold": 92
          },
          "memory": {
            "threshold": 92
          },
          "http": {
            "rate": 12,
            "window": 20
          }
        }
      }
    }
  ],
  "podLifetime": 120,
  "envs": {
    "PORT": "3333"
  },
  "hardwareId": "59651ba4-657a-41d4-8c42-00f34f732375",
  "startupCommand": "npx create-llama",
  "registryId": "6721058be81810b9dd045f40",
  "apiKeyIds": [
    "6721058be81810b9dd045f40"
  ]
}
{
  "_id": "6721058be81810b9dd045f40",
  "name": "example-inference-instance",
  "status": "ACTIVE",
  "hourlyPrice": 2.55,
  "features": [
    "chat"
  ],
  "address": "https://iate-example-6672-d7c85154.ai.sesterce.dev/",
  "createdAt": "2019-06-26T13:00:00.000Z",
  "updatedAt": "2019-06-26T13:00:00.000Z"
}