Following the release of GPT-5.5 Codex, OpenAI’s latest AI model enhanced with coding skills, earlier this month, a few users discovered an interesting phenomenon: the model seemed to repeatedly reference goblins, gremlins, and other creatures in its AI-generated responses.

The unusual pattern was first noticed among users who paired OpenAI’s model with OpenClaw, an AI tool that lets users delegate tasks to autonomous agents and sub-agents called ‘claws’, which take control of a computer and apps to complete the tasks. “Been using it a lot lately and it actually can’t stop speaking of bugs as ‘gremlins’ and ‘goblins’ it’s hilarious,” one user posted on X, while another user wrote, “I was wondering why my claw suddenly became a goblin with codex 5.5.”
Conspiracy theories quickly followed, sparking a wave of memes on social media. Even OpenAI CEO Sam Altman joined in by posting a screenshot of a prompt for ChatGPT which read: “Start training GPT-6, you can have the whole cluster. Extra goblins.”
Now, OpenAI has said that GPT-5.5’s odd obsession with goblins and other creatures stemmed from its ‘Nerdy’ personality mode, which was shaped by reward signals during the reinforcement learning (RL) stage of model development. In a blog post published on Thursday, April 30, the ChatGPT-maker said it accidentally encouraged the model to use metaphors involving fantasy creatures, leading to the repeated references.
In an early attempt to address the issue, OpenAI said it added specific guardrails to stop GPT-5.5 from randomly mentioning mythical and real creatures. The move, however, has only ended up drawing more attention to the behavioural quirk.
The episode comes at a critical time for OpenAI as it races against rivals such as Anthropic to roll out more advanced AI coding tools and autonomous agents in order to capture the business of enterprise customers and developers. With coding emerging as one of AI’s most commercially viable use cases, even minor model flaws could raise questions about reliability and product readiness.
How did it begin?
Although the behaviour intensified with GPT-5.5, OpenAI said that the strange habit crept in with GPT-5.1, which was launched in November 2025. The issue was first flagged to OpenAI by a safety researcher who said they had spotted a few ‘goblin’ and ‘gremlin’ mentions in AI-generated responses.

Based on its own check, OpenAI then found that the use of ‘goblin’ in ChatGPT’s responses had risen by 175 per cent while mentions of ‘gremlin’ were up by 52 per cent. GPT-5.4 showed an even bigger uptick in references to these creatures, and early testing of GPT-5.5 in Codex had shown “an odd affinity for goblin metaphors,” OpenAI said.
What caused the model behaviour?
Before discussing the reasons why OpenAI’s GPT models behaved this way, note that these models are essentially trained to predict the word or code that should follow a given prompt. While LLMs are extremely good at next-token prediction to the point that they appear to exhibit genuine intelligence or personality, their probabilistic nature means that they can sometimes behave in surprising ways.
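That probabilistic nature can be illustrated with a toy next-token sampler. The vocabulary and probabilities below are invented for illustration only; real models score tens of thousands of candidate tokens per step.

```python
import random

# Toy next-token sampler: a language model assigns a probability to each
# candidate token and samples from that distribution. These numbers are
# made up for illustration, not taken from any real model.
vocab_probs = {
    "bug": 0.50,
    "issue": 0.30,
    "gremlin": 0.15,  # a small but non-zero weight on a quirky token
    "goblin": 0.05,
}

def sample_next_token(probs, rng):
    tokens, weights = zip(*probs.items())
    return rng.choices(tokens, weights=weights, k=1)[0]

rng = random.Random(42)
samples = [sample_next_token(vocab_probs, rng) for _ in range(1000)]
# Even a low-probability token surfaces regularly at scale: across 1,000
# draws, "goblin" appears on the order of 50 times.
print(samples.count("goblin"))
```

The point is that no single response has to be “wrong” for odd words to show up: a small probability, multiplied across millions of generations, produces a visible pattern.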
A small but measurable lexical quirk in GPT‑5.1. (Image: OpenAI)
Like many LLM quirks, the issue can be traced back to how the model was trained. Reinforcement learning (RL) is a crucial step of building an LLM, in addition to pre-training and fine-tuning. In simple terms, the RL process involves rewarding the model for generating accurate responses. Over time, these small incentives shape a model’s behaviour, allowing developers to steer it toward preferred outcomes.
In this case, OpenAI said that it provided incentives for the model to display a ‘Nerdy’ personality as part of its personality customisation feature. “We unknowingly gave particularly high rewards for metaphors with creatures. From there, the goblins spread,” the company said.
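That dynamic can be sketched with a deliberately simplified bandit-style update, where a made-up +0.2 ‘creature bonus’ stands in for the mis-specified reward signal OpenAI describes. This is an illustration of the mechanism, not OpenAI’s actual training code.

```python
import math
import random

# Two equally accurate phrasings of the same answer; the grader
# accidentally pays a small bonus for the creature metaphor.
logits = {"plain": 0.0, "creature": 0.0}

def reward(name):
    base = 1.0                                # both answers are accurate
    bonus = 0.2 if name == "creature" else 0  # the unintended incentive
    return base + bonus

def softmax(logits):
    z = max(logits.values())
    exps = {k: math.exp(v - z) for k, v in logits.items()}
    total = sum(exps.values())
    return {k: v / total for k, v in exps.items()}

rng = random.Random(0)
for _ in range(500):
    probs = softmax(logits)
    choice = rng.choices(list(probs), weights=probs.values())[0]
    # REINFORCE-style update: reward above the 1.0 baseline pushes the
    # chosen phrasing's logit up; reward at the baseline leaves it alone.
    logits[choice] += 0.1 * (reward(choice) - 1.0)

# After training, the creature phrasing dominates the policy.
print(softmax(logits)["creature"])
```

Nothing here depends on the bonus being large: any systematic extra reward, however small, compounds over many updates into a strong stylistic preference.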
Goblins increased in GPT-5.4 especially for the Nerdy personality. (Image: OpenAI)
As a result, creature references were especially common in production traffic from users who had selected the ‘Nerdy’ personality. The mode, which explicitly tunes the model via system prompts to respond in a playful, nerdy style, accounted for just 2.5 per cent of all ChatGPT responses but was responsible for 66.7 per cent of all goblin mentions in ChatGPT responses.
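A quick back-of-the-envelope check, using only the two percentages reported above, shows how stark that over-representation is:

```python
# The 'Nerdy' personality produced 2.5% of all responses but 66.7% of
# all goblin mentions (figures reported by OpenAI).
share_of_traffic = 0.025
share_of_goblin_mentions = 0.667

# Lift: how much more often goblins appear in Nerdy traffic than a
# uniform distribution across personalities would predict.
lift = share_of_goblin_mentions / share_of_traffic
print(round(lift, 1))  # → 26.7
```

In other words, goblin mentions were roughly 27 times over-represented in Nerdy-mode traffic relative to its share of responses.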

In 76.2 per cent of the RL training datasets of model outputs analysed by OpenAI, adding goblin or gremlin references gave the model a scoring boost, effectively teaching it that mentioning these creatures was a rewarded behaviour. The style tic then spread to, or was reinforced in, other training stages and other models, even though the rewards were initially applied only in the Nerdy condition.
“Reinforcement learning does not guarantee that learned behaviours stay neatly scoped to the condition that produced them,” OpenAI said. “A search through GPT‑5.5’s supervised fine-tuning (SFT) data found many datapoints containing ‘goblin’ and ‘gremlin’. Further investigation revealed a whole family of other odd creatures: raccoons, trolls, ogres, and pigeons were identified as other tic words, while most uses of frog turned out to be legitimate,” the company further explained.
How has OpenAI addressed the issue?
To begin with, OpenAI said it disabled the ‘Nerdy’ personality option for users in March after launching GPT‑5.4. In the training cycle, the company said it has removed the RL reward signal that was identified as the root cause of the issue and filtered out ‘creature-words’ from the training data to ensure that goblin or gremlin mentions are less likely to show up in inappropriate contexts.
ChatGPT conversations with ‘goblin’ or ‘gremlin’. (Image: OpenAI)
OpenAI said it could not implement these fixes during training of GPT-5.5 because the process was already underway by the time the root cause was identified. However, it has added hard-coded instructions for the model to “never talk about goblins, gremlins, raccoons, trolls, ogres, pigeons, or other animals or creatures unless it is absolutely and unambiguously relevant to the user’s query.”

Additionally, users who want GPT-5.5-powered Codex to keep mentioning goblins and other creatures alongside lines of code can run the following command:
instructions=$(mktemp /tmp/gpt-5.5-instructions.XXXXXX) && \
jq -r '.models[] | select(.slug=="gpt-5.5") | .base_instructions' \
  ~/.codex/models_cache.json | \
  grep -vi 'goblins' > "$instructions" && \
codex -m gpt-5.5 -c "model_instructions_file=\"$instructions\""


