Shared low-rank subspaces for efficient LoRA adapter management.
One shared basis. Per-task coefficients. Up to 122× compression at scale.
Benchmarks
Tested with 8 Lots-of-LoRAs adapters (Mistral-7B, rank 16, 96 layers each).
Variance Explained
Cumulative fraction of variance explained by the top-k shared components; the B matrices share structure much more strongly than the A matrices.
| k | A matrices | B matrices |
|---|---|---|
| 1 | 0.19 | 0.43 |
| 2 | 0.37 | 0.73 |
| 4 | 0.69 | 0.95 |
| 6 | 1.00 | 1.00 |
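For reference, here is a minimal NumPy sketch of how a cumulative variance-explained curve like this can be computed: stack each adapter's flattened matrices as rows, take the SVD, and report the running share of squared singular values. The stacking scheme and the synthetic data are illustrative assumptions, so the printed numbers will not match the table.

```python
import numpy as np

rng = np.random.default_rng(0)
n_adapters, d, r = 8, 4096, 16

# One row per adapter: a layer's B matrix (d x r) flattened to a vector.
B_stack = np.stack([rng.standard_normal((d, r)).ravel() for _ in range(n_adapters)])

# Squared singular values measure the energy captured by each shared component.
s = np.linalg.svd(B_stack, compute_uv=False)
var_explained = np.cumsum(s**2) / np.sum(s**2)

for k in (1, 2, 4, 6):
    print(f"k={k}: variance explained = {var_explained[k - 1]:.2f}")
```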
Reconstruction Error
Relative L2 reconstruction error (||W - W_hat|| / ||W||); reconstruction is near-perfect at k = 6.
| k | Mean error | Max error |
|---|---|---|
| 1 | 0.826 | 0.938 |
| 4 | 0.387 | 0.846 |
| 6 | 0.000002 | 0.000003 |
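The metric is easy to reproduce on synthetic data: build a rank-k basis from the top right singular vectors, project each adapter onto it, and measure the relative L2 distance to the original. A minimal NumPy sketch with illustrative shapes:

```python
import numpy as np

rng = np.random.default_rng(0)
n_adapters, d, r, k = 8, 512, 16, 4

# Each row is one adapter's flattened weight matrix for a single layer.
W = np.stack([rng.standard_normal((d, r)).ravel() for _ in range(n_adapters)])

# Top-k right singular vectors form an orthonormal shared basis.
_, _, Vt = np.linalg.svd(W, full_matrices=False)
basis = Vt[:k]                       # (k, d*r)

loadings = W @ basis.T               # (n_adapters, k) per-task coefficients
W_hat = loadings @ basis             # best rank-k reconstruction

rel_err = np.linalg.norm(W - W_hat, axis=1) / np.linalg.norm(W, axis=1)
print(f"mean error = {rel_err.mean():.3f}, max error = {rel_err.max():.3f}")
```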
Compression at Scale
The shared basis is a one-time cost; each additional adapter adds only a small per-task loadings vector.
| Adapters (N) | Full LoRA | vLoRA | Compression |
|---|---|---|---|
| 8 | 288 MB | 288 MB | 1.0× |
| 100 | 3,600 MB | 289 MB | 12.5× |
| 1,000 | 36,000 MB | 293 MB | 122.8× |
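The scaling follows from a simple storage model: the full scheme pays the per-adapter cost N times, while vLoRA pays for the basis once plus a small loadings vector per task. The constants below are read off the table above (36 MB per full adapter, roughly 5 KB of loadings per projected task) and should be treated as approximations.

```python
# Approximate storage model implied by the table above.
full_adapter_mb = 288 / 8           # 8 full adapters occupy 288 MB -> 36 MB each
basis_mb = 288                      # shared basis: one-time cost
loadings_mb = (293 - 288) / 1000    # ~5 KB of loadings per projected task

for n in (8, 100, 1000):
    full = n * full_adapter_mb
    vlora = basis_mb + n * loadings_mb
    print(f"N={n:>5}: full={full:>7,.0f} MB  vLoRA={vlora:>4,.0f} MB  ratio={full / vlora:.1f}x")
```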
The 3-Step Algorithm
Build a shared subspace, project new adapters, and absorb them — all in a few lines of code. A conceptual sketch of the underlying math follows the three steps.
Initialize
SharedSubspace.from_adapters()
SVD on stacked weight matrices to extract the shared basis across all adapters.
Project
subspace.project()
A new adapter is reduced to a small loadings vector — per-task coefficients against the shared basis.
Absorb
subspace.absorb()
Incorporate a new adapter and recompute the basis to include its structure.
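A conceptual NumPy sketch of what the three steps amount to mathematically. This is not the library's implementation; the flatten-and-stack representation is an assumption for illustration, and the real SharedSubspace presumably works per layer and separately for the A and B matrices.

```python
import numpy as np

def build_basis(adapters, k):
    """Initialize: SVD on the stacked, flattened adapters -> top-k shared basis."""
    stack = np.stack([a.ravel() for a in adapters])        # (n_adapters, d*r)
    _, _, Vt = np.linalg.svd(stack, full_matrices=False)
    return Vt[:k]                                           # (k, d*r)

def project(basis, adapter):
    """Project: reduce a new adapter to k per-task coefficients."""
    return basis @ adapter.ravel()                          # (k,)

def absorb(adapters, new_adapter, k):
    """Absorb: recompute the basis so it also spans the new adapter."""
    return build_basis(list(adapters) + [new_adapter], k)

# Toy usage with illustrative shapes.
rng = np.random.default_rng(0)
d, r, k = 256, 16, 6
adapters = [rng.standard_normal((d, r)) for _ in range(8)]

basis = build_basis(adapters, k)
loadings = project(basis, rng.standard_normal((d, r)))      # only k numbers stored per task
reconstructed = (loadings @ basis).reshape(d, r)             # back to the full LoRA shape
basis = absorb(adapters, rng.standard_normal((d, r)), k)
```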
Key Insight
LoRA adapters across tasks share a common low-rank subspace. Instead of storing N separate adapters, maintain one shared basis and per-task coefficient vectors — achieving up to 100× parameter reduction. The shared basis is a one-time cost; each new adapter adds only k loadings per layer.
Beyond Compression
A complete toolkit for LoRA adapter lifecycle — from training to merging to serving.
Adapter Merging
Task arithmetic, TIES, and DARE merging — combine adapters into one with state-of-the-art techniques.
vlora merge adapters/* --method ties
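For intuition, task arithmetic merges adapters by taking a weighted sum of their weight deltas (for LoRA, delta_i = B_i @ A_i); TIES and DARE refine this by trimming or randomly dropping and rescaling entries before summing. A minimal sketch of the task-arithmetic case, not the library's implementation:

```python
import torch

def task_arithmetic_merge(deltas, weights=None):
    """Weighted sum of per-task weight deltas."""
    if weights is None:
        weights = [1.0] * len(deltas)
    return sum(w * d for w, d in zip(weights, deltas))

# Toy example: three LoRA adapters (A: r x d_in, B: d_out x r) for one weight matrix.
r, d = 16, 64
A = [torch.randn(r, d) for _ in range(3)]
B = [torch.randn(d, r) for _ in range(3)]
merged_delta = task_arithmetic_merge([b @ a for a, b in zip(A, B)])   # (d, d)
```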
Instant Adapter Switching
VLoRAModel wraps any PyTorch model — switch adapters with a single call, no reloading.
model.set_task("sentiment")
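Why switching is cheap: once the base model and the shared basis are loaded, the only per-task state is a small loadings array, so changing tasks moves kilobytes rather than reloading adapter files. A toy illustration, not the VLoRAModel API:

```python
import numpy as np

class TaskSwitcher:
    """Toy stand-in for adapter switching: per-task state is just k loadings per layer."""

    def __init__(self, loadings_by_task):
        self.loadings_by_task = loadings_by_task
        self.active = None

    def set_task(self, name):
        self.active = self.loadings_by_task[name]   # swaps ~num_layers * k floats
        return self.active

num_layers, k = 96, 6
switcher = TaskSwitcher({
    "sentiment": np.random.randn(num_layers, k),
    "summarization": np.random.randn(num_layers, k),
})
print(switcher.set_task("sentiment").nbytes, "bytes of per-task state")
```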
Train in the Subspace
100×+ parameter reduction — optimize k loadings per layer instead of full LoRA matrices.
SubspaceTrainer(subspace, "task")
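What training in the subspace means in practice: the shared basis is frozen and only the k loadings per layer receive gradients. A minimal single-layer sketch in plain PyTorch; it is not the SubspaceTrainer API, and the shapes, flattened basis, and placeholder loss are illustrative assumptions.

```python
import torch

d, r, k = 1024, 16, 6
basis = torch.randn(k, d * r)                        # frozen shared basis for one layer
loadings = (0.01 * torch.randn(k)).requires_grad_()  # the only trainable parameters

optimizer = torch.optim.AdamW([loadings], lr=1e-3)

for step in range(100):
    delta = (loadings @ basis).view(d, r)            # reconstruct the LoRA matrix
    loss = delta.square().mean()                     # placeholder for the real task loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

print(f"trainable parameters: {loadings.numel()} vs. {d * r} for the full matrix")
```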
HuggingFace Trainer
Drop-in VLoRACallback for HF Trainer — train subspace loadings with your existing pipeline.
pip install vlora-dev[hf]
Serving Ready
Export to vLLM, TGI, or Ollama-compatible formats with proper adapter configs.
vlora export --alpha 32
9 CLI Commands
compress, export, merge, analyze, validate, diff, benchmark, info, add — everything from the terminal.
vlora validate subspace/
Quickstart
Get started in minutes with the vLoRA Python library.
from vlora import SharedSubspace, load_adapter
# Step 1: Build shared subspace from existing adapters
adapters = [load_adapter(f"adapters/task_{i}") for i in range(5)]
subspace = SharedSubspace.from_adapters(adapters, num_components=16)
# Step 2: Project a new adapter (only stores small loadings vector)
new_adapter = load_adapter("adapters/new_task")
projection = subspace.project(new_adapter, task_id="new_task")
subspace.add_task(projection)
# Step 3: Absorb — recompute basis to include new adapter
subspace.absorb(load_adapter("adapters/another_task"), new_task_id="another")
# Reconstruct any task back to full LoRA weights
weights = subspace.reconstruct("new_task")
# Save / load
subspace.save("shared_subspace/")
subspace = SharedSubspace.load("shared_subspace/")
Run the benchmark yourself: