AI Startup Sarvam AI Releases New Indic Language Model Named ‘Sarvam-1’


Three key points from this article:

  • Sarvam-1 supports 10 Indian languages, including Hindi, Bengali, Tamil, and Telugu, in addition to English.
  • The model seeks to address two major issues: token inefficiency and low data quality in Indic languages.
  • Sarvam AI established a partnership with Yotta Data Services for the Indic language model.

Sarvam AI Launches ‘Sarvam-1’, Its New AI Language Model


Sarvam AI has unveiled Sarvam-1, a 2-billion-parameter large language model designed specifically for Indian languages.

In a blog post, the startup stated that the model is optimized for 10 Indian languages, including Hindi, Bengali, Tamil, and Telugu, in addition to English.

The model seeks to address two major issues: token inefficiency and low data quality in Indic languages.

Token inefficiency refers to the number of pieces (tokens) into which a language model's tokenizer must split a word before the model can process it. For example, in English, the word “apple” may be processed as a single token. However, in several Indian languages, the equivalent word may be broken into four to eight tokens. Longer token sequences take more computation to process, which reduces speed and efficiency.
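To make the idea concrete, here is a toy Python sketch of greedy longest-match subword tokenization with a UTF-8 byte fallback, a scheme similar in spirit to how byte-level tokenizers handle text their vocabulary does not cover. The vocabularies (`ENGLISH_VOCAB`, `INDIC_VOCAB`) and the `tokenize` helper are hypothetical and tiny; real tokenizers learn tens of thousands of entries from data, and this is not Sarvam-1's actual tokenizer.

```python
def tokenize(word: str, vocab: set[str]) -> list[str]:
    """Split one word by greedy longest match against `vocab`.

    Characters not covered by any vocabulary entry fall back to one token
    per UTF-8 byte, so an English-centric vocabulary breaks Devanagari
    text into many small pieces.
    """
    tokens: list[str] = []
    i = 0
    while i < len(word):
        # Try the longest vocabulary entry that starts at position i.
        for j in range(len(word), i, -1):
            if word[i:j] in vocab:
                tokens.append(word[i:j])
                i = j
                break
        else:
            # No entry matched: emit one byte token per UTF-8 byte of this character.
            tokens.extend(f"<0x{b:02X}>" for b in word[i].encode("utf-8"))
            i += 1
    return tokens


# Hypothetical vocabularies, for illustration only.
ENGLISH_VOCAB = {"apple", "app", "le"}
HINDI_APPLE = "\u0938\u0947\u092c"  # "सेब", the Hindi word for apple
INDIC_VOCAB = ENGLISH_VOCAB | {HINDI_APPLE}

print(len(tokenize("apple", ENGLISH_VOCAB)))      # 1
print(len(tokenize(HINDI_APPLE, ENGLISH_VOCAB)))  # 9 (3 characters x 3 UTF-8 bytes)
print(len(tokenize(HINDI_APPLE, INDIC_VOCAB)))    # 1
```

The same Hindi word costs nine tokens under the English-only vocabulary but a single token once the vocabulary includes it, which is the kind of gap an Indic-focused tokenizer aims to close.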

Sarvam AI claims that Sarvam-1 achieves a token efficiency of 1.4 to 2.1 tokens per word, compared with 4 to 8 for existing models. The company said the model was trained on Sarvam-2T, a 2-trillion-token dataset curated specifically for Indian languages. According to the company, this improves performance on tasks such as cross-lingual translation and question answering.
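The gap between those ranges is easier to see as arithmetic. The Python sketch below takes the reported figures at face value and assumes a hypothetical 1,000-word document; real token counts vary with the text and the tokenizer.

```python
# Ranges reported in the article (tokens per word).
SARVAM_1 = (1.4, 2.1)
EXISTING_MODELS = (4.0, 8.0)


def sequence_length(num_words: int, tokens_per_word: float) -> int:
    """Approximate number of tokens needed to encode `num_words` words."""
    return round(num_words * tokens_per_word)


words = 1_000  # hypothetical document length
for label, (low, high) in [("Sarvam-1", SARVAM_1), ("existing models", EXISTING_MODELS)]:
    print(f"{label}: {sequence_length(words, low)}-{sequence_length(words, high)} tokens")

# Conservative reduction: best existing figure vs. worst Sarvam-1 figure.
conservative = EXISTING_MODELS[0] / SARVAM_1[1]
# Optimistic reduction: worst existing figure vs. best Sarvam-1 figure.
optimistic = EXISTING_MODELS[1] / SARVAM_1[0]
print(f"roughly {conservative:.1f}x to {optimistic:.1f}x fewer tokens")
```

This prints 1400-2100 tokens for Sarvam-1 against 4000-8000 for existing models, or roughly 1.9x to 5.7x fewer tokens. Because a model generates one token per forward step and its context window is measured in tokens, fewer tokens per word generally means cheaper inference and more text fitting in the same window.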

Other Features of Sarvam AI’s Sarvam-1

Despite being smaller than models such as Meta’s Llama-3.2-3B, Sarvam-1 outperforms them on a range of industry benchmarks, according to the company.

Sarvam-1 is now available to download via Hugging Face.
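For readers who want to check the tokens-per-word figure themselves, the Python sketch below measures it with any tokenizer that exposes a Hugging Face-style `tokenize(text)` method. It assumes the `transformers` library is installed and that the model is published under the repository id `sarvamai/sarvam-1`; both are assumptions to verify against the model card. The helper names are hypothetical, and the result depends on the sample text, so it will not necessarily match the company's reported range.

```python
from typing import Protocol


class Tokenizer(Protocol):
    """Anything with a Hugging Face-style `tokenize(text) -> list[str]` method."""

    def tokenize(self, text: str) -> list[str]: ...


def tokens_per_word(tokenizer: Tokenizer, text: str) -> float:
    """Average number of tokens per whitespace-separated word in `text`."""
    words = text.split()
    if not words:
        raise ValueError("text must contain at least one word")
    return len(tokenizer.tokenize(text)) / len(words)


def load_sarvam_tokenizer(model_id: str = "sarvamai/sarvam-1"):
    """Download the tokenizer from Hugging Face (needs `transformers` and network access).

    The default repository id is an assumption; check the model card for the exact name.
    """
    # Imported lazily so tokens_per_word works without transformers installed.
    from transformers import AutoTokenizer

    return AutoTokenizer.from_pretrained(model_id)


# Usage (downloads files, so it is left commented out):
# tok = load_sarvam_tokenizer()
# print(tokens_per_word(tok, "your Hindi or Tamil sample text here"))
```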

Earlier on Thursday (October 25), Jensen Huang, CEO of semiconductor giant Nvidia, said that a Hindi language model is the most difficult to build.

Meanwhile, Sarvam AI has partnered with Yotta Data Services. According to the company, Sarvam-1 was trained on Yotta’s Shakti Cloud infrastructure.

Sarvam AI’s Other Products and Funding Status

Earlier this year, the company unveiled its full-stack GenAI platform, which includes several products: Sarvam Agents, Sarvam 2B, Shuka 1.0, Sarvam Models, and A1.

In December last year, the company raised $41 million (approximately INR 342 crore) in a Series A round led by Lightspeed Venture Partners, with participation from Peak XV Partners and Khosla Ventures.

At the center of it all is India’s booming GenAI market, which is projected to grow at a CAGR of 48% between 2023 and 2030, generating over $17 billion in revenue.
