vLoRA

Shared low-rank subspaces for efficient LoRA adapter management.

One shared basis. Per-task coefficients. Up to 122× compression at scale.

$ pip install vlora-dev
Based on arXiv:2602.06043 · Apache 2.0 · PyTorch ≥ 2.0

Benchmarks

Tested with 8 Lots-of-LoRAs adapters (Mistral-7B, rank 16, 96 layers each).

Variance Explained

The B matrices share structure across tasks much more strongly than the A matrices do.

k   Var. explained (A)   Var. explained (B)
1   0.19                 0.43
2   0.37                 0.73
4   0.69                 0.95
6   1.00                 1.00
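
"Variance explained" here is presumably the usual PCA-style quantity: the share of total variance captured by the top-k components of the stacked matrices. A minimal sketch under that assumption (how the benchmark stacks and centers the matrices is not specified; random tensors stand in for real adapter weights):

import torch

def explained_variance(matrices, k):
    # Stack each LoRA matrix as one row, center across the stack, and take the
    # PCA-style ratio: top-k squared singular values over their total sum.
    X = torch.stack([m.flatten() for m in matrices])
    X = X - X.mean(dim=0, keepdim=True)
    var = torch.linalg.svdvals(X) ** 2
    return (var[:k].sum() / var.sum()).item()

# Random stand-ins for B matrices (d_out x r) gathered across adapters.
bs = [torch.randn(4096, 16) for _ in range(8)]
print(explained_variance(bs, k=6))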

Reconstruction Error

Relative L2 norm of the reconstruction error; near-perfect at k=6.

k   Mean error   Max error
1   0.826        0.938
4   0.387        0.846
6   0.000002     0.000003
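
The reconstruction metric, as typically defined (whether the benchmark averages per-matrix or per-layer errors is not stated here):

import torch

def relative_l2_error(w, w_hat):
    # ||W - W_hat|| / ||W||, computed over the full weight matrix
    return (torch.linalg.norm(w - w_hat) / torch.linalg.norm(w)).item()

w = torch.randn(4096, 16)
w_hat = w + 1e-6 * torch.randn_like(w)
print(relative_l2_error(w, w_hat))   # on the order of 1e-6: near-perfect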

Compression at Scale

The shared basis is a one-time cost; storage grows only slowly with N.

N adapters   Full LoRA    vLoRA     Ratio
8            288 MB       288 MB    1.0×
100          3,600 MB     289 MB    12.5×
1,000        36,000 MB    293 MB    122.8×
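
The scaling follows from a simple storage model: full LoRA grows linearly with N, while vLoRA pays for the shared basis once plus a few kilobytes of loadings per task. A back-of-the-envelope check, with per-adapter and per-task sizes inferred from the table rather than measured (small rounding differences are expected):

PER_ADAPTER_MB = 288 / 8   # ~36 MB per full rank-16 adapter
BASIS_MB = 288.0           # one-time shared basis
PER_TASK_MB = 0.005        # ~5 KB of loadings per task (inferred)

for n in (8, 100, 1_000):
    full = n * PER_ADAPTER_MB
    vlora = BASIS_MB + n * PER_TASK_MB
    print(f"N={n}: full={full:,.0f} MB, vLoRA={vlora:,.0f} MB, ratio={full / vlora:.1f}x")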

The 3-Step Algorithm

Build a shared subspace, project new adapters, and absorb them, all in a few lines of code. A plain-PyTorch sketch of the underlying linear algebra follows the three steps.

1. Initialize: SharedSubspace.from_adapters()

Run SVD on the stacked weight matrices to extract the shared basis across all adapters.

2. Project: subspace.project()

Reduce a new adapter to a small loadings vector: its per-task coefficients against the shared basis.

3. Absorb: subspace.absorb()

Incorporate a new adapter and recompute the basis to include its structure.
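
The mechanics behind those three calls can be sketched in a few lines of plain PyTorch. This illustrates the underlying linear algebra only, not the library's implementation: whether vLoRA builds one basis per layer or a global one, and whether it centers the stacked matrices, is not shown here, and random tensors stand in for trained adapter weights.

import torch

# Stand-ins for the same layer's LoRA B matrix from several trained adapters.
adapter_bs = [torch.randn(4096, 16) for _ in range(8)]
k = 6

# Step 1 (Initialize): SVD of the stacked, flattened matrices yields a shared
# orthonormal basis of k components.
X = torch.stack([b.flatten() for b in adapter_bs])       # (N, 4096 * 16)
basis = torch.linalg.svd(X, full_matrices=False).Vh[:k]  # (k, 4096 * 16)

# Step 2 (Project): a new adapter reduces to k loadings against that basis,
# and reconstruction is just loadings @ basis.
new_b = torch.randn(4096, 16)
loadings = basis @ new_b.flatten()                       # (k,)
recon = (loadings @ basis).reshape(4096, 16)

# Step 3 (Absorb): refit the basis with the new matrix included so it also
# captures the newcomer's structure.
X = torch.cat([X, new_b.flatten().unsqueeze(0)])
basis = torch.linalg.svd(X, full_matrices=False).Vh[:k]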

Key Insight

LoRA adapters across tasks share a common low-rank subspace. Instead of storing N separate adapters, maintain one shared basis and per-task coefficient vectors, achieving over 100× parameter reduction at scale. The shared basis is a one-time cost; each new adapter adds only k loadings per layer.

Beyond Compression

A complete toolkit for LoRA adapter lifecycle — from training to merging to serving.

Adapter Merging

Task arithmetic, TIES, and DARE merging — combine adapters into one with state-of-the-art techniques.

vlora merge adapters/* --method ties
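
For reference, task arithmetic by itself is just a weighted sum of per-task weight deltas; TIES adds trimming and sign election on top, and DARE adds random dropping with rescaling. A minimal sketch of the plain task-arithmetic case (this is not the vlora merge implementation; names and shapes are illustrative):

import torch

def task_arithmetic_merge(deltas, weights=None):
    # Weighted sum of per-task weight deltas, keyed by parameter name.
    weights = weights or [1.0 / len(deltas)] * len(deltas)
    return {
        name: sum(w * d[name] for w, d in zip(weights, deltas))
        for name in deltas[0]
    }

# Illustrative per-layer deltas (for LoRA, delta = (alpha / r) * B @ A).
deltas = [{"q_proj": torch.randn(64, 64)} for _ in range(3)]
merged = task_arithmetic_merge(deltas)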

Instant Adapter Switching

VLoRAModel wraps any PyTorch model — switch adapters with a single call, no reloading.

model.set_task("sentiment")
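
Switching can be this cheap because the shared basis stays resident and only the k-dimensional loadings change per task: reconstructing a delta is one small matrix-vector product plus an add. A toy illustration of that idea (not VLoRAModel's actual implementation; the class and its arguments are made up):

import torch

class ToySwitcher:
    def __init__(self, linear, basis, loadings_by_task):
        self.linear = linear                           # nn.Linear to patch in place
        self.base_weight = linear.weight.data.clone()  # frozen base weights
        self.basis = basis                             # (k, out_features * in_features)
        self.loadings_by_task = loadings_by_task       # task_id -> (k,) tensor

    def set_task(self, task_id):
        # One mat-vec and one add: no checkpoint reload, no disk I/O.
        delta = (self.loadings_by_task[task_id] @ self.basis).reshape_as(self.base_weight)
        self.linear.weight.data = self.base_weight + delta

lin = torch.nn.Linear(16, 16, bias=False)
switcher = ToySwitcher(lin, torch.randn(6, 16 * 16), {"sentiment": torch.randn(6)})
switcher.set_task("sentiment")   # instant, in-memory switch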

Train in the Subspace

100×+ parameter reduction — optimize k loadings per layer instead of full LoRA matrices.

SubspaceTrainer(subspace, "task")
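
The idea in a nutshell: freeze the base weights and the shared basis, and optimize only the k loadings that define the weight delta. A self-contained toy sketch of training in the subspace (single layer, small dimensions; SubspaceTrainer's internals may differ):

import torch

d_in, d_out, k = 64, 64, 6
base = torch.nn.Linear(d_in, d_out, bias=False)
base.weight.requires_grad_(False)                  # base model stays frozen

basis = torch.randn(k, d_out * d_in)               # frozen shared basis
loadings = torch.zeros(k, requires_grad=True)      # the only trainable parameters

opt = torch.optim.Adam([loadings], lr=1e-2)
x, y = torch.randn(32, d_in), torch.randn(32, d_out)

for _ in range(200):
    delta = (loadings @ basis).reshape(d_out, d_in)   # task-specific weight delta
    loss = torch.nn.functional.mse_loss(x @ (base.weight + delta).T, y)
    opt.zero_grad()
    loss.backward()
    opt.step()

# A full rank-16 LoRA on this layer would train 16 * (64 + 64) = 2,048
# parameters; here only k = 6 loadings are optimized.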

HuggingFace Trainer

Drop-in VLoRACallback for HF Trainer — train subspace loadings with your existing pipeline.

pip install vlora-dev[hf]

Serving Ready

Export to vLLM, TGI, or Ollama-compatible formats with proper adapter configs.

vlora export --alpha 32
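
If the export produces a standard PEFT-style adapter directory (an assumption based on the "proper adapter configs" claim above), it can be served with vLLM's built-in LoRA support. A sketch with an illustrative base model and paths; check the vLLM docs for the exact API of your version:

from vllm import LLM, SamplingParams
from vllm.lora.request import LoRARequest

# Illustrative: base model name and exported adapter path are assumptions.
llm = LLM(model="mistralai/Mistral-7B-v0.1", enable_lora=True)

outputs = llm.generate(
    ["The movie was surprisingly"],
    SamplingParams(max_tokens=32),
    lora_request=LoRARequest("sentiment", 1, "exported/sentiment"),
)
print(outputs[0].outputs[0].text)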

9 CLI Commands

compress, export, merge, analyze, validate, diff, benchmark, info, add — everything from the terminal.

vlora validate subspace/

Quickstart

Get started in minutes with the vLoRA Python library.

quickstart.py
from vlora import SharedSubspace, load_adapter

# Step 1: Build shared subspace from existing adapters
adapters = [load_adapter(f"adapters/task_{i}") for i in range(5)]
subspace = SharedSubspace.from_adapters(adapters, num_components=16)

# Step 2: Project a new adapter (only stores small loadings vector)
new_adapter = load_adapter("adapters/new_task")
projection = subspace.project(new_adapter, task_id="new_task")
subspace.add_task(projection)

# Step 3: Absorb — recompute basis to include new adapter
subspace.absorb(load_adapter("adapters/another_task"), new_task_id="another")

# Reconstruct any task back to full LoRA weights
weights = subspace.reconstruct("new_task")

# Save / load
subspace.save("shared_subspace/")
subspace = SharedSubspace.load("shared_subspace/")

Run the benchmark yourself:

$ pip install vlora-dev[hub] && python examples/real_adapters.py