OpenAI Addresses Unexpected AI Behavior with Anti-Goblin Bias Memo
OpenAI has released an official memo addressing a peculiar behavior in its AI coding tool, Codex CLI, involving an apparent bias against discussing creatures such as goblins and gremlins. OpenAI traced the underlying behavior to the model's training for a personality customization feature: training for the 'Nerdy' personality in particular inadvertently rewarded metaphors involving such creatures. The issue became noticeable when users reported that the model frequently referred to bugs and other problems as 'goblins' or 'gremlins,' even after an update intended to curb the habit. OpenAI's blog post, titled 'Where the goblins came from,' explains that the reinforcement learning process did not confine the learned behavior to the conditions it was intended for. As a result, 'goblin talk' spread widely into GPT conversations.