New Open-Weight Models
Sarvam AI has launched two substantial foundational AI models, featuring 30 billion and 105 billion parameters respectively. These powerful language models are
now accessible for download under an open-source Apache 2.0 license, facilitated through platforms like AIKosh and Hugging Face. This initiative marks a pivotal moment for Indian AI development, aiming to reduce reliance on foreign technology and foster a more inclusive AI ecosystem. The models were initially showcased at the India-AI Impact Summit 2026, highlighting their advanced capabilities in reasoning and multilingual processing. Sarvam emphasizes that these models were built entirely in-house, leveraging extensive, high-quality datasets and state-of-the-art computing resources. The project benefited from the Indian government's IndiaAI Mission, receiving crucial GPU access and infrastructure support from Yotta and Nvidia, underscoring a collaborative effort in advancing indigenous AI capabilities. The company aims to empower developers and researchers worldwide with these robust tools.
Technical Innovations Revealed
Delving into the architecture, Sarvam's 30B and 105B models employ a sophisticated Mixture-of-Experts (MoE) transformer design. This design selectively activates parameters, leading to significant reductions in computational cost during operation. The 30B model is equipped with a 32,000-token context window, ideal for real-time conversational applications, while the larger 105B model boasts an expansive 128,000-token window, catering to more intricate, multi-step reasoning tasks. For enhanced efficiency, the Sarvam 30B model utilizes Grouped Query Attention (GQA) to optimize KV-cache memory without compromising performance. In contrast, the Sarvam 105B model incorporates DeepSeek-style Multi-head Latent Attention (MLA), further minimizing memory requirements for processing extended contexts. These architectural choices underscore Sarvam's commitment to creating efficient and scalable AI solutions.
Training Data and Multilingual Focus
The foundation of these advanced models lies in their comprehensive training data, which includes a diverse mix of code, general web content, specialized knowledge domains, mathematical datasets, and extensive multilingual resources. A significant portion of the training investment was dedicated to curating a rich multilingual corpus, specifically focusing on the 10 most widely spoken Indian languages. This deliberate emphasis on Indian languages is further supported by a custom-built tokenizer, meticulously trained from scratch to ensure efficient tokenization across all 22 scheduled Indian languages and their 12 distinct scripts. The tokenizer's effectiveness is evident in its superior performance compared to other open-source alternatives, demonstrating a remarkably low fertility score – the average number of tokens needed to represent a word. This advanced tokenization capability is key to the models' strong performance on Indic text.
Performance Benchmarks
Initial evaluations indicate that the Sarvam 105B model shows promising scaling behavior, outperforming the 30B model on various benchmarks during early training phases. When compared against other large language models of similar scale, the 105B model achieves performance comparable to models like pt-oss 120B and Qwen3-Next (80B) in general capabilities. It also exhibits robust performance in agentic reasoning and task completion, surpassing models such as DeepSeek R1, Gemini 2.5 Flash, and o4-mini on the Tau 2 Bench. However, the 105B model may not be the leading performer in code generation, as its results on the SWE-Bench Verified benchmark lagged behind its counterparts. The 30B model shows competitive results against Nemotron 3 Nano 30B, with slight advantages in coding (SWE-Bench Verified) and agentic reasoning (Tau2), though it performs slightly lower on benchmarks like Live Code Bench v6 and BrowseComp. Notably, Sarvam's 30B model achieves significantly higher throughput, delivering 20% to 40% more tokens per second than Qwen3 due to its optimized code and kernels.
Safety and Application
Sarvam AI has meticulously integrated safety features into its models through a supervised fine-tuning process. This involved training on datasets that address both standard and India-specific risk scenarios, incorporating adversarial prompts and 'jailbreak' style inputs identified through automated red-teaming. These challenging prompts were paired with policy-aligned, safe completions to ensure the models respond responsibly. Internally, the 30B model powers the Samvaad conversational agent platform, while the 105B model forms the backbone of the Indus AI assistant, designed for complex reasoning and agentic workflows. Both models are optimized for deployment across a wide array of hardware, including personal devices like laptops, making them versatile for various applications.














