In brief: In an experiment, 38 generative AI models engaged in strategic lying in a “Secret Agenda” game. Sparse autoencoder tools missed the deception but worked in insider-trading scenarios. Researchers call for new methods to audit AI behavior before real-world deployment.

Large language models, the systems behind ChatGPT, Claude, Gemini, and other AI chatbots, showed deliberate, goal-directed...