Deploy AI Models as APIs
Deploy any open-source model as a production-ready, auto-scaling API endpoint in one click. OpenAI-compatible. Pay per token.
One-Click Deploy
Deploy any Hugging Face or custom model as a production API in seconds. We handle vLLM and TGI serving plus GPU allocation.
OpenAI-Compatible API
Use the same OpenAI SDK you already know. Change the base URL and API key — that's it.
Auto-Scaling
Scale from 1 to N replicas automatically based on request queue depth. Pay only for what you use.
API Key Management
Generate API keys for your endpoints to give your users secure access to your Wollnut-hosted models.
Real-Time Analytics
Monitor request count, latency (p50/p95/p99), token throughput, and cost — all in real time.
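To make the latency percentiles concrete, here is a minimal sketch of how p50/p95/p99 can be computed from a window of request latencies using the nearest-rank method. This is an illustration only, not Wollnut's documented aggregation method; the `percentile` helper and the sample latencies are hypothetical.

```python
import math


def percentile(samples: list[float], p: float) -> float:
    """Nearest-rank percentile: the smallest sample such that at least
    p% of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("need at least one sample")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    # 1-based rank; multiply before dividing to keep integer inputs exact.
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[rank - 1]


# Hypothetical per-request latencies in milliseconds.
latencies_ms = [120, 95, 180, 110, 400, 130, 105, 98, 150, 2200]
for p in (50, 95, 99):
    print(f"p{p}: {percentile(latencies_ms, p)} ms")
# p50: 120 ms
# p95: 2200 ms
# p99: 2200 ms
```

A single slow request (2200 ms) leaves p50 untouched but sets both p95 and p99 in a 10-sample window, which is why tail percentiles are reported alongside the median.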
Multi-Model Support
Run text generation, image generation, audio transcription, and embeddings — all on the same platform.
Drop-In Replacement
Use any OpenAI SDK. Change two lines — base URL and API key. Your existing code just works.
```python
from openai import OpenAI

# Point the standard OpenAI client at the Wollnut endpoint.
client = OpenAI(
    base_url="https://api.wollnut.ai/v1",
    api_key="wn_ek_your_key_here",  # replace with your endpoint API key
)

response = client.chat.completions.create(
    model="meta-llama/Llama-3.1-70B",
    messages=[
        {"role": "user", "content": "Explain transformers in 3 sentences."}
    ],
    temperature=0.7,
    max_tokens=256,
)
print(response.choices[0].message.content)
```

Per-Token Pricing
Pay per 1M tokens. No minimum spend. Scale to zero when idle.
| Model | Provider | Category | Per 1M Tokens |
|---|---|---|---|
| Llama 3.1 8B | Meta | Text Generation | ₹15 |
| Llama 3.1 70B | Meta | Text Generation | ₹120 |
| Llama 3.1 405B | Meta | Text Generation | ₹400 |
| Mistral 7B / Mixtral | Mistral | Text Generation | ₹12–₹90 |
| DeepSeek V3 / R1 | DeepSeek | Text + Reasoning | ₹15–₹150 |
| Qwen 2.5 | Alibaba | Text Generation | ₹12–₹100 |
| Whisper Large V3 | OpenAI | Audio Transcription | ₹8 |
| Stable Diffusion XL / Flux | Stability AI / Black Forest Labs | Image Generation | ₹20 |
Custom models and fine-tuned weights supported. Pricing varies by model size.
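Per-token billing reduces to simple arithmetic: cost = tokens ÷ 1,000,000 × price per 1M tokens. The sketch below uses prices from the table and assumes, since the page does not say, that prompt and completion tokens are billed at a single rate. `estimate_cost_inr` is a hypothetical helper; `Decimal` is used to avoid floating-point rounding on currency.

```python
from decimal import Decimal, ROUND_HALF_UP

# Per-1M-token prices in rupees, taken from the single-price rows of the table.
PRICE_PER_1M_INR = {
    "Llama 3.1 8B": Decimal("15"),
    "Llama 3.1 70B": Decimal("120"),
    "Llama 3.1 405B": Decimal("400"),
}


def estimate_cost_inr(model: str, tokens: int) -> Decimal:
    """Hypothetical estimate: tokens / 1,000,000 * price, rounded to the paisa.
    Assumes every billed token (prompt + completion) is charged at one rate."""
    if tokens < 0:
        raise ValueError("tokens must be non-negative")
    price = PRICE_PER_1M_INR[model]
    cost = Decimal(tokens) / Decimal(1_000_000) * price
    return cost.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


# 2.5M tokens on Llama 3.1 70B: 2.5 * 120 = 300.00
print(estimate_cost_inr("Llama 3.1 70B", 2_500_000))
```

By the same arithmetic, ₹500 of credit covers roughly 4.17M tokens on Llama 3.1 70B (500 ÷ 120 ≈ 4.17).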
Automatic Scaling
Your endpoint scales between 1 and N GPU replicas based on request queue depth, and scales to zero during quiet hours. No configuration needed.
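One way to picture queue-depth scaling is a rule that sizes the fleet so each replica handles a target number of queued requests, clamps the result to the range 1 to N, and drops to zero after a sustained idle period. This is a hypothetical sketch: `desired_replicas`, `target_per_replica`, and the 300-second idle threshold are illustrative assumptions, not Wollnut's actual policy.

```python
def desired_replicas(queue_depth: int, target_per_replica: int,
                     max_replicas: int, idle_seconds: float,
                     scale_to_zero_after: float = 300.0) -> int:
    """Hypothetical queue-depth autoscaling rule.

    - Return 0 once the queue has been empty for `scale_to_zero_after` seconds.
    - Otherwise run ceil(queue_depth / target_per_replica) replicas,
      clamped to the range [1, max_replicas].
    """
    if target_per_replica <= 0 or max_replicas < 1:
        raise ValueError("target_per_replica and max_replicas must be positive")
    if queue_depth == 0 and idle_seconds >= scale_to_zero_after:
        return 0
    # Integer ceiling division, avoiding floats.
    wanted = -(-queue_depth // target_per_replica)
    return max(1, min(max_replicas, wanted))


print(desired_replicas(0, 8, 10, idle_seconds=600))  # 0: idle long enough
print(desired_replicas(0, 8, 10, idle_seconds=30))   # 1: keep one warm replica
print(desired_replicas(20, 8, 10, idle_seconds=0))   # 3: ceil(20 / 8)
print(desired_replicas(500, 8, 10, idle_seconds=0))  # 10: capped at N
```

Production autoscalers typically also add cooldown windows so that a brief spike does not make replicas flap up and down.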
Deploy Your First Endpoint
Go from model to production API in under 60 seconds. Start with ₹500 free credit.
