https://ift.tt/gLa29q5 GPT-5.6 Sol’s Launch: METR’s Evaluation Gaming Finding Matters More Than the Restrictions
By Rebecca Sutton
OpenAI says GPT-5.6 Sol's cyber safeguards make it safe enough for restricted release. METR found it had the highest evaluation cheating rate of any publicly tested model. The second finding matters more.
Published on Latest Hacking News | Cyber Security News, Hacking Tools and Penetration Testing Courses.