[R] Higher effort settings reduce deep research accuracy for GPT-5 and Gemini Flash 3
We evaluated 22 model configurations across different effort/thinking levels on Deep Research Bench (169 web research tasks, human-verified answers). For two of the most capable models, higher effort settings scored worse. ...