This study introduces MathEval, a comprehensive benchmarking framework designed to systematically evaluate the mathematical reasoning capabilities of large language models (LLMs). Addressing key ...
Researchers from the University of Edinburgh and NVIDIA have introduced a new method that helps large language models reason ...
A new study digs into why modern AI models stumble over multi-digit multiplication and what kind of training finally makes ...
Want smarter insights in your inbox? Sign up for our weekly newsletters to get only what matters to enterprise AI, data, and security leaders. Subscribe Now Very small language models (SLMs) can ...
Microsoft has announced Phi-4 — a new AI model with 14 billion parameters — designed for complex reasoning tasks, including mathematics. Phi-4 excels in areas such as STEM question-answering and ...
Microsoft has introduced a new set of small language models called Phi-4-reasoning, Phi-4-reasoning-plus, and Phi-4-mini-reasoning, which are described as "marking a new era for efficient AI." These ...
Joel David Hamkins, a leading mathematician and logic professor at the University of Notre Dame, has fired a withering salvo ...
In 2025, large language models moved beyond benchmarks to efficiency, reliability, and integration, reshaping how AI is ...
A new community-driven initiative evaluates large language models using Italian-native tasks, with AI translation among the ...
The hype around generative AI (GenAI) is undeniable. Tools like ChatGPT have captivated the public imagination, demonstrating an impressive ability to generate human-like text, create content and ...