Prerequisites
You'll need an SSH key pair, a fine-tuning dataset in JSONL format, and a Hugging Face account with access to the gated Llama weights.
Step 1: Launch a GPU Instance
Your instance will be ready in about 60 seconds.
Step 2: Connect via SSH
ssh -i ~/.ssh/your_key root@YOUR_INSTANCE_IP

Step 3: Install Fine-Tuning Dependencies
pip install trl peft datasets bitsandbytes accelerate
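Before going further, it's worth a quick check that PyTorch (which should already be present as a dependency of accelerate and trl) can actually see the GPU:

import torch

# Sanity check: the GPU must be visible to PyTorch before training.
print(torch.cuda.is_available())      # should print True
print(torch.cuda.get_device_name(0))  # e.g. "NVIDIA H100 80GB HBM3"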
Step 4: Upload Your Dataset
scp -i ~/.ssh/your_key dataset.jsonl root@YOUR_IP:/workspace/
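By default, SFTTrainer reads training text from a "text" field, so a minimal dataset.jsonl is one JSON object per line. The instruction/response template below is only an illustration, not a required format:

import json

# Illustrative layout for dataset.jsonl: one JSON object per line,
# each with a "text" field containing a complete training example.
examples = [
    {"text": "### Instruction:\nWhat is the capital of France?\n\n### Response:\nParis."},
    {"text": "### Instruction:\nName a prime number.\n\n### Response:\n7."},
]
with open("dataset.jsonl", "w") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")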
Step 5: Run Fine-Tuning
Create a training script:
import torch
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from trl import SFTTrainer

# Gated repo: request access on Hugging Face and run `huggingface-cli login` first.
model_name = "meta-llama/Llama-4-Scout-17B-16E-Instruct"

dataset = load_dataset("json", data_files="dataset.jsonl")

# Load the weights in 4-bit (QLoRA-style) so the model fits in single-GPU
# memory; this is what the bitsandbytes package from Step 3 is for.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# LoRA: train small adapter matrices on the attention projections
# instead of updating all model weights.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

# Note: recent TRL releases move max_seq_length into SFTConfig and rename
# `tokenizer` to `processing_class`; adjust for your installed version.
trainer = SFTTrainer(
    model=model,
    train_dataset=dataset["train"],
    peft_config=lora_config,
    max_seq_length=2048,
    tokenizer=tokenizer,
)
trainer.train()
trainer.save_model("./llama-finetuned")

Step 6: Export Your Model
scp -r -i ~/.ssh/your_key root@YOUR_IP:/workspace/llama-finetuned ./
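The exported directory contains only the LoRA adapter weights, which are small; to run the model you attach them to the base weights with peft. A minimal sketch for a local sanity check, assuming the machine has enough memory for the base model:

from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_name = "meta-llama/Llama-4-Scout-17B-16E-Instruct"

base = AutoModelForCausalLM.from_pretrained(base_name, torch_dtype="auto", device_map="auto")
model = PeftModel.from_pretrained(base, "./llama-finetuned")  # attach the LoRA adapter
tokenizer = AutoTokenizer.from_pretrained(base_name)

prompt = "### Instruction:\nSay hello.\n\n### Response:\n"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))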
Step 7: Stop Your Instance
Don't forget to stop your instance when training is complete! Billing stops immediately.
Cost Estimate
Fine-tuning Llama 4 Scout 17B on a typical dataset (10k examples, 3 epochs) takes about 2-4 hours on an H100. At $2.25/hr, that's roughly $4.50–$9.00 total.
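If your dataset size or epoch count differs, the same back-of-envelope arithmetic applies (the 2-4 hour figure above is an assumption, not a guarantee):

# Back-of-envelope cost: hourly rate x estimated training hours.
rate_per_hour = 2.25          # USD/hr for an H100, as quoted above
hours_low, hours_high = 2, 4  # rough duration for 10k examples, 3 epochs

print(f"${rate_per_hour * hours_low:.2f} to ${rate_per_hour * hours_high:.2f}")
# -> $4.50 to $9.00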
