IBM Granite Embedding Multilingual R2: 97M and 311M Parameter Models Top MTEB Multilingual Retrieval Charts
IBM releases two Apache 2.0 multilingual embedding models built on ModernBERT, with 32K-token context and coverage for 200+ languages.
IBM’s Granite Embedding Multilingual R2 family, a compact 97M-parameter model and a full-size 311M-parameter model both built on ModernBERT, claims first place among open multilingual embedders under 100M parameters and second place among open models under 500M parameters, respectively, on the MTEB Multilingual Retrieval benchmark. Released under Apache 2.0 with 32,768-token context windows, the two models push the frontier of what small open-source embedding models can deliver across 200+ languages.
MTEB Multilingual Retrieval Benchmark Results
According to the Hugging Face Blog, IBM’s granite-embedding-97m-multilingual-r2 scores 60.3 on MTEB Multilingual Retrieval, surpassing every other open multilingual embedding model under 100M parameters. Its larger sibling, granite-embedding-311m-multilingual-r2, reaches 65.2 on the same benchmark, placing it second among open models with fewer than 500M parameters. Both models support a 32K-token context window, a 64x expansion over the R1 generation's 512-token limit, and add code retrieval spanning nine programming languages, a meaningful addition for engineering teams working across international codebases.
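As a quick illustration of the cross-lingual retrieval these scores measure, the sketch below encodes an English query against documents in several languages and ranks them by cosine similarity. It assumes sentence-transformers is installed and that the Hugging Face model ID follows the pattern ibm-granite/granite-embedding-97m-multilingual-r2; check the model card for the exact identifier.

```python
# Minimal cross-lingual retrieval sketch; the model ID is an assumption
# based on the article, not a confirmed Hugging Face identifier.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("ibm-granite/granite-embedding-97m-multilingual-r2")

query = "How do I reset my password?"
documents = [
    "Para restablecer su contraseña, haga clic en 'Olvidé mi contraseña'.",  # Spanish
    "Die Lieferzeit beträgt in der Regel drei bis fünf Werktage.",  # German
    "パスワードをリセットするには、設定ページを開いてください。",  # Japanese
]

# No instruction prefixes are needed; queries and documents are encoded as-is.
query_emb = model.encode(query, normalize_embeddings=True, convert_to_tensor=True)
doc_embs = model.encode(documents, normalize_embeddings=True, convert_to_tensor=True)

# Cosine similarity ranks documents against the query across languages.
scores = util.cos_sim(query_emb, doc_embs)[0]
best = int(scores.argmax())
print(f"Top match (score {scores[best].item():.3f}): {documents[best]}")
```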
Architecture and Language Coverage
Both models are built atop ModernBERT and produce embeddings without requiring task-specific instruction prefixes, a usability advantage over instruction-tuned alternatives like E5-mistral. The 311M model outputs 768-dimensional vectors with Matryoshka dimension support, allowing downstream teams to truncate embeddings for storage or latency trade-offs without retraining. The 97M model produces 384-dimensional embeddings. The Hugging Face Blog notes that while the underlying encoder was pretrained on text from 200+ languages, 52 languages receive explicit retrieval-pair and cross-lingual fine-tuning for higher accuracy.
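The Matryoshka property described above can be exercised at load time. The following sketch assumes sentence-transformers 2.7 or later (which added the truncate_dim option) and the same assumed model ID pattern as above; truncating 768-dimensional vectors to 256 dimensions trades some accuracy for a roughly 3x smaller vector store.

```python
# Matryoshka-truncation sketch; the model ID is an assumption, verify it on
# the model card before use.
from sentence_transformers import SentenceTransformer

# truncate_dim keeps only the first 256 of the 768 output dimensions.
# Matryoshka-trained models are optimized so truncated prefixes stay useful.
model = SentenceTransformer(
    "ibm-granite/granite-embedding-311m-multilingual-r2",
    truncate_dim=256,
)

vecs = model.encode(
    ["Ein Beispielsatz auf Deutsch.", "Una frase de ejemplo en español."],
    normalize_embeddings=True,
)
print(vecs.shape)  # (2, 256) rather than the full (2, 768)
```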
Drop-in Integration and Deployment Flexibility
IBM engineered both models for minimal adoption friction. They function as drop-in replacements inside LangChain, LlamaIndex, Haystack, and Milvus via a single model-name change, with no API modifications or new dependencies required. ONNX and OpenVINO weights are included, enabling CPU-optimized inference for organizations that cannot rely on GPU infrastructure.
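As a concrete picture of that single model-name change, the sketch below shows the swap inside a LangChain pipeline. It assumes the langchain-huggingface integration package and the same assumed Granite model ID; both should be verified against the official documentation.

```python
# Hypothetical drop-in swap in LangChain; only the model name changes.
from langchain_huggingface import HuggingFaceEmbeddings

# Before: HuggingFaceEmbeddings(model_name="intfloat/multilingual-e5-small")
embeddings = HuggingFaceEmbeddings(
    model_name="ibm-granite/granite-embedding-97m-multilingual-r2"  # assumed ID
)

vector = embeddings.embed_query("global supply chain regulations")
print(len(vector))  # 384 dimensions for the 97M model
```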
Why This Matters
The sub-100M parameter embedding category has historically been dominated by English-centric models, forcing practitioners to choose between speed and multilingual fidelity. Granite Embedding R2’s performance suggests that ModernBERT’s architecture, combined with broad multilingual pretraining and targeted retrieval fine-tuning, can close much of that quality gap at compact size. For teams building retrieval-augmented generation pipelines over multilingual corpora — legal, healthcare, government, or global e-commerce — this release expands the viable model tier downward, reducing inference costs without the usual accuracy penalty. The Apache 2.0 license removes legal friction for commercial deployments, which enterprise procurement teams typically cite as a prerequisite for open-model adoption. Whether these MTEB scores hold across domain-specific multilingual corpora outside the benchmark distribution remains to be validated, but the headline numbers establish a new reference point for the sub-100M retrieval class.
Frequently Asked Questions
How does Granite Embedding 97M Multilingual R2 compare to other sub-100M multilingual embedding models?
According to the Hugging Face Blog, it achieves a score of 60.3 on MTEB Multilingual Retrieval, making it the top-ranked open model under 100M parameters on that benchmark.
What context length do the new Granite Embedding R2 models support?
Both the 97M and 311M models support up to 32,768 tokens of context, a 64x increase over their R1 predecessors.
What license governs the Granite Embedding R2 models?
Both models are released under the Apache 2.0 license, permitting commercial use without royalty obligations.