How to Fine-Tune Qwen 3.5 on Your Own Data
Alibaba’s Qwen 3.5 is the strongest open-weight multimodal model available today. But a base model trained on general data will not match the performance you need for domain-specific tasks. Fine-tuning Qwen 3.5 on your own data lets you specialize the model for your use case while keeping the broad knowledge from pre-training intact.
This guide walks through the complete process: data preparation, hardware choices, training configuration, evaluation, and cost estimates. We focus on LoRA (Low-Rank Adaptation) because it is the most practical approach for teams without hundreds of GPUs.
Before You Start: What You Need
- Training data: A minimum of 1,000 high-quality examples in your target format. More data helps, but quality matters more than quantity: clean, consistent, well-labeled examples beat a dataset ten times the size that is noisy.
- Hardware: One or two GPUs with at least 48GB VRAM each (A6000, L40S, or RTX 6000 Ada). Consumer GPUs with 24GB (RTX 4090) work for smaller LoRA ranks with gradient checkpointing.
- Software: Python 3.11+, PyTorch 2.3+, Hugging Face Transformers, PEFT library for LoRA, and the Qwen 3.5 base model weights from Hugging Face.
- Time estimate: 2-8 hours of training depending on dataset size and hardware.
Step 1: Prepare Your Dataset
Your training data should match the input-output format you want the model to produce in production. If you want the model to answer customer questions about your product, each training example should contain a customer question as input and an ideal answer as output.
Format your data as JSONL with three fields per line: system (the system prompt defining behavior), input (the user message), and output (the target response). Here is the structure:
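A minimal sketch of one record in that format, written with the standard library (the product name and wording are illustrative):

```python
import json

# One training record; the "system" prompt stays identical across examples.
record = {
    "system": "You are a support assistant for Acme Analytics. Answer concisely.",
    "input": "How do I export a dashboard as a PDF?",
    "output": "Open the dashboard, click Share > Export, and choose PDF.",
}

# Append the record to the training file: one JSON object per line (JSONL).
with open("train.jsonl", "a", encoding="utf-8") as f:
    f.write(json.dumps(record, ensure_ascii=False) + "\n")
```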
The system field stays consistent across all examples and defines the model’s persona and constraints. Keep it under 200 words. Long system prompts dilute the fine-tuning signal.
Data quality checklist:
- Remove duplicate or near-duplicate examples.
- Verify that every output accurately answers its input.
- Ensure consistent formatting (same tone, same structure, same level of detail).
- Remove examples where the correct answer is ambiguous.
- Balance your categories if your task involves classification.
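The first two checklist items can be partially automated before training. A minimal sketch (stdlib only; the sample records are illustrative) that drops exact duplicates and empty outputs:

```python
import json

def clean_dataset(lines):
    """Drop exact duplicates and records with an empty input or output."""
    seen = set()
    cleaned = []
    for line in lines:
        rec = json.loads(line)
        key = (rec["input"].strip(), rec["output"].strip())
        if not key[0] or not key[1]:  # empty input or output
            continue
        if key in seen:               # exact duplicate
            continue
        seen.add(key)
        cleaned.append(rec)
    return cleaned

raw = [
    '{"system": "s", "input": "Q1", "output": "A1"}',
    '{"system": "s", "input": "Q1", "output": "A1"}',  # duplicate
    '{"system": "s", "input": "Q2", "output": "  "}',  # empty output
]
print(len(clean_dataset(raw)))  # → 1 (only the first record survives)
```

Exact matching only catches the easy cases; near-duplicates, tone, and ambiguity still need human review or embedding-based checks.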
Step 2: Configure LoRA Fine-Tuning for Qwen 3.5
LoRA works by adding small trainable matrices to the model’s attention layers while keeping the original weights frozen. This reduces GPU memory requirements by 70-80% compared to full fine-tuning and trains 5-10x faster.
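The savings follow directly from the parameter counts. For a single d x d attention projection, full fine-tuning updates all d^2 weights, while rank-r LoRA trains only the two factor matrices A (r x d) and B (d x r), or 2 * r * d weights:

```python
d = 4096  # width of one attention projection (illustrative hidden size)
r = 32    # LoRA rank

full = d * d      # trainable weights under full fine-tuning
lora = 2 * r * d  # trainable weights in the A and B factor matrices

print(full, lora, f"{100 * lora / full:.1f}%")  # → 16777216 262144 1.6%
```

Gradients and optimizer state (e.g. Adam moments) are only kept for trainable parameters, which is where most of the memory savings come from.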
Key LoRA parameters to set:
- Rank (r): Start with 16 or 32. Higher ranks capture more complex adaptations but use more memory. For most tasks, 32 is sufficient.
- Alpha: Set to 2x your rank (alpha = 64 for rank 32). Alpha sets the scale applied to the adapter's output (the LoRA update is multiplied by alpha / r), so it acts like a learning-rate multiplier for the LoRA layers.
- Target modules: Apply LoRA to all attention projection layers (q_proj, k_proj, v_proj, o_proj) and the gate projection in MoE layers. Targeting more layers improves quality but increases training time.
- Learning rate: Start at 2e-4 for LoRA fine-tuning. Use a cosine scheduler with warmup for the first 10% of steps.
- Batch size: As large as your GPU memory allows. With a 48GB GPU and rank-32 LoRA on Qwen 3.5 (4-bit quantized base), you can fit a batch size of 4 with sequences up to 4,096 tokens.
“LoRA rank 32 captures 95% of what full fine-tuning achieves for instruction-following tasks. Save full fine-tuning for cases where you need the model to learn fundamentally new capabilities, not just follow new patterns.” — ML engineer who fine-tuned Qwen 3.5 for medical document extraction.
Step 3: Train and Monitor
Start training with a small subset (100-200 examples) to verify your setup works before running the full dataset. Watch for these signals during training:
Loss should decrease steadily for the first 1-2 epochs, then flatten. If loss spikes or oscillates, reduce the learning rate.
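A crude way to flag oscillation from the logged loss values (a sketch; the window size and threshold are illustrative, not tuned recommendations):

```python
def loss_is_oscillating(losses, window=10, threshold=0.15):
    """Flag training when recent losses spread more than `threshold`
    (relative to their mean) over the last `window` steps."""
    if len(losses) < window:
        return False
    recent = losses[-window:]
    mean = sum(recent) / window
    return (max(recent) - min(recent)) / mean > threshold

smooth = [2.0 - 0.01 * i for i in range(50)]  # steadily decreasing loss
spiky = smooth[:40] + [1.6, 2.4, 1.5, 2.6, 1.4, 2.5, 1.6, 2.3, 1.5, 2.4]

print(loss_is_oscillating(smooth), loss_is_oscillating(spiky))  # → False True
```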
Stop at 2-3 epochs for most datasets. Training beyond 3 epochs on small datasets (under 5,000 examples) usually leads to overfitting. The model will memorize training examples instead of learning generalizable patterns.
Save checkpoints every 500 steps. After training, evaluate each checkpoint on a held-out validation set and pick the best one, not necessarily the last one.
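Selecting the winner is then a one-line lookup over the validation results (the loss numbers here are illustrative):

```python
# Validation loss per saved checkpoint step (illustrative numbers).
val_loss = {500: 1.42, 1000: 1.18, 1500: 1.09, 2000: 1.13, 2500: 1.21}

best_step = min(val_loss, key=val_loss.get)
print(best_step)  # → 1500: the best checkpoint, not the last one
```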
Step 4: Evaluate the Fine-Tuned Model
Run your fine-tuned model on a test set of 100-200 examples that were not in the training data. Compare outputs against the base model to measure improvement.
Check for three things: task accuracy (does it produce correct answers?), format compliance (does it follow your output format?), and regression (did it lose general capabilities you still need?).
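A sketch of the scoring loop (stdlib only; here "format compliance" just means the output parses as JSON, so substitute your own format check, and note that exact-match accuracy is a deliberately strict stand-in for real task scoring):

```python
import json

def evaluate(examples, generate):
    """Score a model callable generate(input_text) -> str on a test set."""
    correct = fmt_ok = 0
    for ex in examples:
        out = generate(ex["input"])
        try:
            json.loads(out)  # format compliance: output must parse as JSON
            fmt_ok += 1
        except ValueError:
            pass
        if out.strip() == ex["output"].strip():  # exact-match accuracy
            correct += 1
    n = len(examples)
    return {"accuracy": correct / n, "format_ok": fmt_ok / n}

tests = [
    {"input": "q1", "output": '{"a": 1}'},
    {"input": "q2", "output": '{"a": 2}'},
]
scores = evaluate(tests, lambda q: '{"a": 1}')  # stand-in for the model
print(scores)  # → {'accuracy': 0.5, 'format_ok': 1.0}
```

Running the same loop with the base model's generate function gives you the regression comparison.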
If accuracy improved but general knowledge degraded, reduce the number of training epochs or lower the learning rate. The frozen base weights are never overwritten; the regression comes from the adapter pulling behavior too far toward your training distribution, and a lighter training touch usually restores the balance.
Cost Estimates for Fine-Tuning Qwen 3.5
Cloud GPU rental: An A6000 on Lambda Cloud costs about $1.10/hour. A typical fine-tuning run on 5,000 examples takes 4-6 hours, costing $4.40-$6.60. Multi-GPU setups on A100s run about $2.00-$3.50/hour per GPU.
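The arithmetic behind those figures (a sketch using the rates quoted above):

```python
def run_cost(rate_per_hour, hours, num_gpus=1):
    """Total GPU rental cost for one training run."""
    return rate_per_hour * hours * num_gpus

# A6000 at ~$1.10/hour, 4-6 hour run on 5,000 examples:
print(round(run_cost(1.10, 4), 2), round(run_cost(1.10, 6), 2))  # → 4.4 6.6
```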
Total project cost: Including experimentation, hyperparameter tuning, and multiple runs, expect to spend $20-$80 in compute for a well-prepared fine-tuning project. This is dramatically cheaper than even one month of high-volume API costs for a custom use case.
Fine-tuning Qwen 3.5 gives you a model that is specialized to your data, runs on your infrastructure, and has no per-token API costs. For teams with proprietary data and specific output requirements, it is the most cost-effective path to production-quality AI.