[R] Higher effort settings reduce deep research accuracy for GPT-5 and Gemini Flash 3
We evaluated 22 model configurations across different effort/thinking levels on Deep Research Bench (169 web research tasks, human-verified answers). For two of the most capable models, higher effort settings scored worse. ...