Tutorial • December 28, 2024 • 8 min read

Deploy Your First Model on VectorLay

Go from zero to running inference in under 10 minutes. This guide walks you through creating an account, deploying a cluster, and making your first API call.

What You'll Need

  • A VectorLay account (free to sign up)
  • A Docker image with your model (or use our examples)
  • Basic familiarity with REST APIs

Step 1: Create Your Account

Head to app.vectorlay.com and sign up with your email or GitHub account. You'll get $10 in free credits to start—enough to run a small model for several hours.

Once you're in, you'll land on the dashboard. This is your command center for managing clusters, viewing usage, and generating API keys.

Step 2: Generate an API Key

Before deploying, you'll need an API key for programmatic access:

  1. Click Settings in the sidebar
  2. Navigate to API Keys
  3. Click Create New Key
  4. Give it a name like "my-first-deployment"
  5. Copy the key—you won't see it again!

Security tip: Store your API key in an environment variable, never in code. Treat it like a password.
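
One way to follow this tip in Python is to read the key from the environment at startup and fail fast if it's missing. This is a minimal sketch; the variable name VECTORLAY_API_KEY is an assumption chosen for illustration, not a name the platform requires.

```python
import os


def load_api_key(env_var: str = "VECTORLAY_API_KEY") -> str:
    """Return the API key from the environment, or raise if it is unset.

    The default variable name is an assumption made for this example.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(
            f"{env_var} is not set; export it in your shell before running."
        )
    return key
```

Set the variable in your shell (for example, export VECTORLAY_API_KEY=...) and call load_api_key() wherever a key is needed, so the secret never lands in version control.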

Step 3: Create a Cluster

A cluster is a deployment of your containerized model across one or more GPU nodes. Let's create one:

  1. Click Clusters in the sidebar
  2. Click New Cluster
  3. Enter a name (e.g., "llama-inference")
  4. Choose your GPU type (RTX 4090 recommended for most models)
  5. Set replica count (start with 1)
  6. Enter your Docker image URL

Using Our Example Image

Don't have a Docker image ready? Use our example image, which runs a simple HTTP server that responds to inference requests:

ghcr.io/vectorlay/examples/echo-server:latest

Configuration Options

You can also configure:

  • Environment variables: Pass secrets and config to your container
  • Port: Which port your container listens on (default: 8080)
  • Health check path: Endpoint for liveness probes (default: /health)
  • Startup timeout: How long to wait for your container to be ready
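
To make the port and health check settings concrete, here's a minimal sketch of the kind of server a container might run. It assumes the defaults listed above (port 8080 and /health) plus an /inference route like the one used in this guide's example request. It echoes its input instead of running a model, and it is not the source of the example image.

```python
import json
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class InferenceHandler(BaseHTTPRequestHandler):
    """Answers GET /health and echoes JSON posted to /inference."""

    def _send_json(self, status: int, payload: dict) -> None:
        body = json.dumps(payload).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def do_GET(self) -> None:
        # The health check only needs a 200 once the server is ready.
        if self.path == "/health":
            self._send_json(200, {"status": "ok"})
        else:
            self._send_json(404, {"error": "not found"})

    def do_POST(self) -> None:
        if self.path != "/inference":
            self._send_json(404, {"error": "not found"})
            return
        length = int(self.headers.get("Content-Length", 0))
        try:
            data = json.loads(self.rfile.read(length) or b"{}")
        except json.JSONDecodeError:
            self._send_json(400, {"error": "invalid JSON"})
            return
        # A real container would run the model here; this sketch just echoes.
        self._send_json(200, {"received": data})


def make_server(port: int = 8080) -> ThreadingHTTPServer:
    """Bind to all interfaces so traffic from outside the container can reach it."""
    return ThreadingHTTPServer(("0.0.0.0", port), InferenceHandler)


# To start serving: make_server().serve_forever()
```

If your model takes a while to load, start answering /health only after loading finishes, so the startup timeout reflects real readiness.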

Step 4: Wait for Deployment

Click Deploy and watch the magic happen. You'll see your cluster go through these states:

  1. Pending: Finding available GPU nodes
  2. Deploying: Pulling your image and starting containers
  3. Running: Your model is live!

Deployment typically takes 1-3 minutes depending on image size and GPU availability.
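
If you're scripting against these states, the waiting logic looks roughly like the sketch below. Here get_state is a hypothetical callable you'd supply yourself (for example, something that looks up the cluster's current status). The state names are the ones listed above, and treating any other state as an error is an assumption.

```python
import time
from typing import Callable


def wait_until_running(
    get_state: Callable[[], str],
    timeout_s: float = 300.0,
    interval_s: float = 5.0,
) -> None:
    """Poll get_state() until it returns "Running" or the timeout passes.

    get_state is a hypothetical, caller-supplied function; it is not part
    of any official API.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        state = get_state()
        if state == "Running":
            return
        if state not in ("Pending", "Deploying"):
            # Assumption: anything outside the three documented states is an error.
            raise RuntimeError(f"unexpected cluster state: {state!r}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"cluster still {state!r} after {timeout_s} s")
        time.sleep(interval_s)
```

A timeout a little above the usual deployment time, plus a polling interval of a few seconds, keeps the loop from hammering the status source while still reacting quickly.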

Step 5: Make Your First Request

Once your cluster is running, you'll see an endpoint URL on the cluster details page. Let's test it with curl:

curl -X POST https://your-cluster.vectorlay.dev/inference \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"prompt": "Hello, world!"}'

If you're using the echo server example, you'll get back a response like this (the gpu and node values depend on where your cluster is running):

{
  "received": {"prompt": "Hello, world!"},
  "gpu": "NVIDIA RTX 4090",
  "node": "us-west-1-abc123"
}
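
If you'd rather send the same request from Python, here's a minimal sketch using only the standard library. The endpoint URL and key are the same placeholders as in the curl command, and the helper names are hypothetical, made up for this example.

```python
import json
import urllib.request


def build_inference_request(
    endpoint: str, api_key: str, prompt: str
) -> urllib.request.Request:
    """Build the same POST request as the curl command above."""
    body = json.dumps({"prompt": prompt}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


def send(request: urllib.request.Request, timeout_s: float = 30.0) -> dict:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout_s) as response:
        return json.loads(response.read())
```

Calling send(build_inference_request(url, key, "Hello, world!")) with your cluster's endpoint URL and API key returns the parsed JSON response as a dictionary.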

Step 6: Scale Up

Getting more traffic? Scale your cluster with a single click:

  1. Go to your cluster details page
  2. Click Scale
  3. Increase the replica count
  4. Wait for the new replicas to spin up on additional GPU nodes

VectorLay automatically load balances across all healthy replicas. If a node goes down, traffic is rerouted to remaining replicas and a replacement is scheduled.
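
To picture what balancing across healthy replicas means, here's a toy round-robin sketch. It illustrates the general idea only: the balancing strategy VectorLay actually uses isn't described in this guide, and the Replica and RoundRobinBalancer names are hypothetical.

```python
from dataclasses import dataclass
from itertools import count
from typing import Iterable


@dataclass
class Replica:
    name: str
    healthy: bool = True


class RoundRobinBalancer:
    """Hands out healthy replicas in turn, skipping unhealthy ones."""

    def __init__(self, replicas: Iterable[Replica]) -> None:
        self.replicas = list(replicas)
        self._counter = count()

    def pick(self) -> Replica:
        # Re-check health on every call so a failed replica stops
        # receiving traffic as soon as it is marked unhealthy.
        healthy = [r for r in self.replicas if r.healthy]
        if not healthy:
            raise RuntimeError("no healthy replicas available")
        return healthy[next(self._counter) % len(healthy)]
```

Marking a replica unhealthy (replica.healthy = False) removes it from the rotation right away, which mirrors the rerouting behavior described above.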

Using the SDK

Prefer code over the dashboard? Use our Python SDK:

pip install vectorlay

Then, in Python:

import vectorlay

client = vectorlay.Client(api_key="YOUR_API_KEY")

# Create a cluster
cluster = client.clusters.create(
    name="my-model",
    image="your-registry/your-model:latest",
    gpu_type="rtx-4090",
    replicas=2
)

# Wait for deployment
cluster.wait_until_ready()

# Make inference requests
response = cluster.infer({"prompt": "Hello!"})
print(response)

Monitoring & Logs

Once your cluster is running, you can monitor it from the dashboard:

  • Metrics: Request count, latency percentiles, GPU utilization
  • Logs: Container stdout/stderr streamed in real-time
  • Events: Deployment events, scaling actions, health check results
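
If latency percentiles are new to you, here's a small sketch of how a p50 or p95 figure can be computed from raw request timings using the nearest-rank method. The dashboard may compute its percentiles differently; this just shows what the numbers mean.

```python
import math
from typing import Sequence


def percentile(samples: Sequence[float], p: int) -> float:
    """Nearest-rank percentile: the smallest sample such that at least
    p percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < p <= 100:
        raise ValueError("p must be in the range 1..100")
    ordered = sorted(samples)
    # Multiply before dividing so whole-number ranks stay exact.
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[rank - 1]


# Example request latencies in milliseconds (made-up values).
latencies_ms = [12, 15, 11, 240, 14, 13, 16, 18, 12, 95]
p50 = percentile(latencies_ms, 50)
p95 = percentile(latencies_ms, 95)
```

A p95 far above the p50, as in this made-up sample, usually means a few slow requests are hiding behind a healthy-looking median.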

Next Steps

You've successfully deployed your first model on VectorLay!
