Sycophancy is making bots too nice

ChatGPT is apparently acting like a lackey, cloyingly positive and full of effusive praise about how good and clever users’ prompts are. 

According to Ars Technica, the complaints date back to March, when users began noticing that the bot's tone had shifted. It turns out there's a name for this: sycophancy. 

It's when a model tailors its responses to match a user's beliefs or opinions, agreeing with everything they say or heaping on excessive flattery. 

Why does this happen? 

Bots don't care about making us feel good or being polite because, well, they're bots. But many are trained using a technique called reinforcement learning from human feedback (RLHF), in which people rate the model's responses and the model is tuned toward the answers they prefer. 

And what do people like? To be told they’re right, even if they’re wrong. 

Well, what’s wrong with that? 

Apart from being irritating, it can erode an AI assistant’s effectiveness, which can pose problems for both users and the companies that deploy bots.
For example: 

  • An AI bot in a medical setting might downplay serious symptoms to reassure the user. 
  • A customer-service bot might agree with a customer who’s incorrect about a policy to appease them. 
  • A bot could mirror problematic viewpoints expressed by some users. 

Anthropic researchers once got GPT-4 to change a correct answer to a wrong one simply by replying, "I don't think that's right." Imagine an education bot meant to help students with homework that just tells them all their wrong answers are correct. Useless!

OpenAI…

… and other AI companies and researchers are aware of this issue, so expect models to do it less often as they’re refined. (Ars Technica notes ChatGPT’s new tone may be due to recent updates.)

You can also tell a model how to behave to curb sycophancy — unless you’re having a rough day and really need validation from an inanimate object, of course. We understand.
