This study introduces MathEval, a comprehensive benchmarking framework designed to systematically evaluate the mathematical reasoning capabilities of large language models (LLMs). Addressing key ...
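The snippet does not describe MathEval's actual interface, but as a rough illustration of what benchmarking mathematical reasoning involves, the sketch below scores a model's answers against reference solutions by normalized exact match. Everything here is an assumption for illustration: query_model is a hypothetical stand-in for whatever LLM interface such a framework would wrap, and the toy problems are not drawn from MathEval.

```python
# Illustrative sketch only: not MathEval's API. Shows the generic shape of
# scoring an LLM's math answers by normalized exact match.
from fractions import Fraction

def query_model(prompt: str) -> str:
    # Placeholder for a real LLM call; returns canned answers for this sketch.
    return {"What is 12 * 34?": "408", "Simplify 6/8.": "3/4"}.get(prompt, "")

def normalize(answer: str) -> str:
    # Reduce simple numeric or fraction answers to a canonical form.
    try:
        return str(Fraction(answer.strip()))
    except (ValueError, ZeroDivisionError):
        return answer.strip().lower()

def evaluate(problems: list[tuple[str, str]]) -> float:
    # Exact-match accuracy after normalization, the simplest common scoring rule.
    correct = sum(
        normalize(query_model(question)) == normalize(gold)
        for question, gold in problems
    )
    return correct / len(problems)

if __name__ == "__main__":
    benchmark = [("What is 12 * 34?", "408"), ("Simplify 6/8.", "3/4")]
    print(f"accuracy = {evaluate(benchmark):.2f}")
```

Real frameworks add to this skeleton in two places: a robust answer extractor (models rarely emit a bare number) and per-category breakdowns, but the accuracy loop is the core idea.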
A new study digs into why modern AI models stumble over multi-digit multiplication and what kind of training finally makes ...
Researchers from the University of Edinburgh and NVIDIA have introduced a new method that helps large language models reason ...
Very small language models (SLMs) can ...
Joel David Hamkins, a leading mathematician and logic professor at the University of Notre Dame, has fired a withering salvo ...
Microsoft has announced Phi-4 — a new AI model with 14 billion parameters — designed for complex reasoning tasks, including mathematics. Phi-4 excels in areas such as STEM question-answering and ...
Microsoft has introduced a new set of small language models called Phi-4-reasoning, Phi-4-reasoning-plus, and Phi-4-mini-reasoning, which are described as "marking a new era for efficient AI." These ...
One of the world's leading mathematicians, Joel David Hamkins, has slammed AI models used for solving mathematics and called ...
In 2025, large language models moved beyond benchmarks to efficiency, reliability, and integration, reshaping how AI is ...