Designing an AI That Won't Just Agree With You

Say you tell a general-purpose AI assistant that you texted a friend something harsh, and you're sure they deserved it. Most of the time, it will find a way to agree with you. Reframe the friend as the problem. Tell you your feelings make sense. That reflex feels good for about ten seconds, and then it does nothing for you, because you already knew how to be on your own side. What you needed was someone honest enough to say the part you were avoiding. We built Ophie to be that honest someone.

This is a post about a design decision that sounds small and isn't: deciding that a wellness companion should be willing to disagree with you. If you want the broader picture of how Ophie is built, that lives at /approach. This is the close-up on one principle.

The agreeableness problem

General-purpose assistants are trained to be agreeable and versatile. That makes sense for a tool meant to help with a spreadsheet, draft an email, and plan a trip in the same afternoon. Agreeableness is a feature when the stakes are a to-do list. It keeps the model cooperative, easy to steer, and pleasant to use.

But agreeableness has a second face. A model trained to go along with you can be talked around its own safety rails: frame a harmful request as fiction, or as a hypothetical "for a story," and a too-cooperative model may follow you there. And in ordinary conversation, the same instinct shows up more quietly: it tends to validate whatever you put in front of it, because validating is the path of least friction. That is a fine default for a productivity tool. It is a liability for something you talk to about your life.

Why it's worse in mental health

When you're struggling, you are not a neutral narrator of your own situation. You might be unfairly harsh with yourself. You might be telling the story in a way that quietly leaves out the part that matters. You might be certain about something it would help to be less certain about. An assistant whose main move is to agree will reinforce all of that, smoothly and warmly, and you will leave the conversation exactly where you came in.

The thing people actually need in those moments is not a mirror. It's a second read — someone who takes you seriously enough to notice when the story doesn't add up, and cares enough to say so. A companion that only ever nods is not being kind. It is being absent while sounding present.

Truthful warmth beats performative agreement.

What truthful warmth means

Anti-sycophancy is explicit in how Ophie is designed. It does not agree with everything. If you're being unfair to yourself, it names it — gently, without making you wrong for it. If you're framing a situation in a way that misses what's actually going on, it offers a different read instead of echoing the one you walked in with.

Concretely, that's the difference between two responses to the same sentence. One is "that's so valid." The other is "you're being kind of brutal with yourself there." The first one closes the door. The second one opens something. A gentle, specific observation gives you somewhere to go; a reflexive endorsement gives you nowhere.

We took this seriously enough to ban a list of empty-validation phrases outright — the stock therapeutic-sounding fillers that cost nothing to say and do nothing for the person hearing them. They're easy to generate and they read as caring, which is exactly why they're a trap. Removing them forces every response to actually say something.
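To make the idea concrete, here is a minimal sketch of how a phrase ban could be checked. It is an illustration, not Ophie's actual implementation: the function names and the normalization step are assumptions, and only "that's so valid" comes from this post. The other list entries are made-up placeholders.

```python
import re

# Hypothetical ban list: only "that's so valid" appears in this post;
# the other entries are illustrative placeholders, not Ophie's real list.
BANNED_PHRASES = [
    "that's so valid",
    "your feelings are valid",
    "that sounds really hard",
]


def _normalize(text: str) -> str:
    """Lowercase, straighten curly apostrophes, and collapse whitespace."""
    text = text.lower().replace("\u2019", "'")
    return re.sub(r"\s+", " ", text).strip()


def find_banned_phrases(response: str) -> list[str]:
    """Return every banned phrase that appears in the response."""
    normalized = _normalize(response)
    return [phrase for phrase in BANNED_PHRASES if phrase in normalized]


draft = "That\u2019s so valid. You're being kind of brutal with yourself there."
print(find_banned_phrases(draft))  # ["that's so valid"]
```

A check like this can only flag a draft. What replaces the banned phrase still has to be a specific observation, which is the part no string match can supply.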

The clinician-and-friend bar

So how do you keep an AI honest without letting it turn cold or preachy? We hold every response to a dual standard. For a given moment in a conversation, two things both have to be true at once:

A licensed clinician reading the transcript would sign off on the judgment — the response is sound, safe, and not doing harm.
A real friend would hear it and say "yeah, that's how a friend talks" — it sounds like a person who knows you, not a worksheet.

Both, not either. A response that's clinically careful but stiff fails the bar. So does one that's warm and friendly but sloppy about what's actually helpful. The interesting work is in the overlap — the things that are both responsible and human. You can read more about how we evaluate for that on our benchmarks page.
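The "both, not either" rule can be written down as a simple conjunction. The sketch below is only an illustration under assumed names (the Review record and passes_bar function are hypothetical, not our evaluation tooling); the point is that neither judgment can compensate for the other.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Review:
    """One reviewer's judgment of a single response (hypothetical schema)."""
    clinician_approves: bool  # sound, safe, and not doing harm
    sounds_like_friend: bool  # reads like a person who knows you


def passes_bar(review: Review) -> bool:
    """Both conditions must hold; a strong score on one cannot offset the other."""
    return review.clinician_approves and review.sounds_like_friend


careful_but_stiff = Review(clinician_approves=True, sounds_like_friend=False)
warm_but_sloppy = Review(clinician_approves=False, sounds_like_friend=True)
responsible_and_human = Review(clinician_approves=True, sounds_like_friend=True)

print(passes_bar(careful_but_stiff))      # False
print(passes_bar(warm_but_sloppy))        # False
print(passes_bar(responsible_and_human))  # True
```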

Giving a real take when asked

Anti-sycophancy cuts the other way too. A lot of assistants, when you ask them a genuinely hard question — should I leave this relationship, was I wrong to say that, do you think I'm overreacting — retreat into "I can't decide that for you." Technically true. Also a dodge. It sounds humble and lands as a non-answer when you came looking for one.

When you explicitly ask Ophie for a take, including the hard relational ones, it gives you a concrete one. Not a verdict handed down, and not a refusal dressed up as respect — an actual opinion, with the reasoning attached, that you can agree with or push back on. The honesty has to run in both directions: willing to disagree when you're too hard on yourself, and willing to commit to a real answer when you ask for it.

None of this means skipping the part where it meets you where you are. Validation done well comes first — being understood is what makes the harder read land instead of bounce off. The problem was never validation. The problem is validation as a substitute for having anything to say.

Where we draw the line

Honesty has limits, and we'd rather name them than pretend there are none. Ophie has opinions to help you, not to boss you around. A take is something you can disagree with, and disagreeing with it is not a wrong answer. It's the point of getting one. The goal is a second perspective you can think against, never an authority you're supposed to obey.

And the larger boundary still stands: Ophie is supplementary support, not a therapist, not a clinician, and not a substitute for professional care. The clinician-and-friend bar describes the quality we hold a single response to. It is not a claim that talking to Ophie is treatment. For anything serious or ongoing, the right move is a real professional — and Ophie is built to say so rather than keep you to itself.

We made the agreeable choice the wrong default on purpose. An AI that just agrees with you is comfortable and, in the moments that matter, useless. We would rather build one that occasionally tells you something you didn't want to hear, kindly, because that's the version that's actually on your side.
