A new test from OpenAI aims to understand how close AI is to outperforming humans at economically valuable work.
Anthropic's Claude Opus 4.1 excelled at many professional tasks, especially those performed by clerks, software developers, ...
Anthropic's Claude Sonnet 4.5 now scores 77% on a key software engineering benchmark and can work autonomously for over 30 ...
Google's Gemini 2.5 Flash Lite is now the fastest proprietary model (and there are more big Gemini updates). Google continues to improve its Gemini family of large language models (LLMs) and its audio ...
MITRE said the ALUE benchmark for aerospace LLM evaluation supports custom datasets, open-source LLMs and user-defined prompts.
Preview, a trillion-parameter natural language reasoning model and the first open-source system of its scale. On the ...
MLCommons today released AILuminate, a new benchmark test for evaluating the safety of large language models. Launched in 2020, MLCommons is an industry consortium backed by several dozen tech firms.
‘We’ve identified multiple loopholes with SWE-bench Verified,’ says the manager at Meta Platforms’ AI research lab FAIR.
Scientists at Singapore-based AI firm Sapient have unveiled a new hierarchical reasoning model (HRM), inspired by how the human brain processes information.