Google's Gemini 2.5 Pro outperforms GPT-4o on Hindi and Tamil reasoning tasks

New benchmark results released by researchers at IIT Madras show Gemini 2.5 Pro scoring 18 points higher than GPT-4o on a standardised Indian-language reasoning test.

Google’s Gemini 2.5 Pro has outperformed OpenAI’s GPT-4o on a comprehensive set of Indian language reasoning benchmarks, according to research published by a team at IIT Madras on Monday. The results suggest that Google has made significant progress in training its models on Indian language data.

The benchmark, called IndicReason, tests language models on tasks including reading comprehension, mathematical word problems, logical inference, and cultural context understanding — all in Hindi, Tamil, Telugu, Bengali, and Marathi. Gemini 2.5 Pro scored an average of 74.2 across all languages, compared to GPT-4o’s 56.1 and Anthropic’s Claude 3.5 Sonnet’s 61.8.

The researchers found that all three models performed significantly worse on Tamil and Telugu than on Hindi, suggesting that training data for southern Indian languages remains considerably thinner than for Hindi. “The gap between Hindi performance and Tamil performance is around 20 points for every model tested,” the paper states. “This is a problem that no frontier lab has yet solved.”

Google said it had invested significantly in Indian language data collection over the past two years, working with universities, government bodies, and content creators across multiple states. The company declined to give specific figures on the volume of training data.
