Wollnut Endpoints

Deploy AI Models as APIs

One click to deploy any open-source model as a production-ready, auto-scaling API endpoint. OpenAI-compatible. Pay per token.

Deploy Your First Model Browse Models

One-Click Deploy

Deploy any HuggingFace or custom model as a production API in seconds. We handle vLLM, TGI, and GPU allocation.

OpenAI-Compatible API

Use the same OpenAI SDK you already know. Change the base URL and API key — that's it.

Auto-Scaling

Scale from 1 to N replicas automatically based on request queue depth. Pay only for what you use.

API Key Management

Generate API keys for your endpoints. Give your users access to your Wollnut-hosted models securely.

Real-Time Analytics

Monitor request count, latency (p50/p95/p99), token throughput, and cost — all in real time.

Multi-Model Support

Run text generation, image generation, audio transcription, and embeddings — all on the same platform.

OpenAI-Compatible

Drop-In Replacement

Use any OpenAI SDK. Change two lines — base URL and API key. Your existing code just works.

from openai import OpenAI

client = OpenAI(
    base_url="https://api.wollnut.ai/v1",
    api_key="wn_ek_your_key_here"
)

response = client.chat.completions.create(
    model="meta-llama/Llama-3.1-70B",
    messages=[
        {"role": "user", "content": "Explain transformers in 3 sentences."}
    ],
    temperature=0.7,
    max_tokens=256
)

print(response.choices[0].message.content)

Per-Token Pricing

Pay per 1M tokens. No minimum spend. Scale to zero when idle.

Model	Provider	Category	Per 1M Tokens
Llama 3.1 8B	Meta	Text Generation	₹15
Llama 3.1 70B	Meta	Text Generation	₹120
Llama 3.1 405B	Meta	Text Generation	₹400
Mistral 7B / Mixtral	Mistral	Text Generation	₹12–₹90
DeepSeek V3 / R1	DeepSeek	Text + Reasoning	₹15–₹150
Qwen 2.5	Alibaba	Text Generation	₹12–₹100
Whisper Large V3	OpenAI	Audio Transcription	₹8
Stable Diffusion XL / Flux	Stability AI	Image Generation	₹20

Custom models and fine-tuned weights supported. Pricing varies by model size.

Automatic Scaling

Your endpoint scales from 1 to N GPU replicas based on request queue depth. Scale to zero during quiet hours. No configuration needed.

Min 1 replica

Always-on baseline

Auto-scale up

Based on queue depth

Scale to zero

Pay nothing when idle

Deploy Your First Endpoint

Go from model to production API in under 60 seconds. Start with ₹500 free credit.

Get Started Free Try the Playground