Your Legal AI Has a Prompting Problem

Christopher Martin

June 16, 2025

The Hype and the Hangover

When generative AI hit the mainstream, I was leading innovation work at Latham & Watkins. Like many, I was intrigued—suddenly, tools like ChatGPT were drafting language, summarizing documents, and producing legal-sounding content with impressive fluency. It felt less like a trend and more like a turning point.

We rolled up our sleeves and started testing. Could we speed up contract review? Extract deal terms more efficiently? Reduce the need for first-draft memos? At first, it seemed promising. But the more we used it, the more obvious the cracks became.

The outputs weren’t reliable. Key clauses were misinterpreted. Slight changes in wording led to completely different results. Most concerning of all, we couldn’t explain how the model arrived at its conclusions or whether we could trust it to do the same thing twice.

What initially appeared to be a shortcut became another layer of complexity. We came to a hard truth: these tools weren’t built for legal work but for language. And law isn’t just language. It’s logic, precedent, and risk. Legal outputs need to be structured, auditable, and repeatable. You can’t prompt your way into that.

The Reality Check

Early adopters quickly found themselves trading old inefficiencies for new frustrations, troubleshooting inconsistent outputs and second-guessing unreliable results.

Why? Because these tools weren’t built for legal work.

Legal language isn’t casual conversation. It’s structured. Precise. Precedent-driven. There’s no room for “close enough.” And yet, that’s exactly how large language models (LLMs) operate: by predicting the next most likely word, not by understanding what’s legally correct.

“Prompting simply won’t cut it for legal work. An LLM is not a legal expert, and you can’t prompt it into being one.” – Doug Bemis, Syntracts co-founder

This is the core tension: legal work demands logic and structure. LLMs, by design, generate plausible text, not correct answers.
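
To make that concrete, here is a minimal sketch of why the same legal question can come back with different answers. It assumes the OpenAI Python SDK; the model name and the clause are purely illustrative. The model samples each next word from a probability distribution, so two identical requests can fork as soon as one sampled word differs.

```python
# A minimal sketch of why "same prompt, different answer" happens.
# Assumes the OpenAI Python SDK and an API key; model and prompt are illustrative.
from openai import OpenAI

client = OpenAI()

PROMPT = (
    "Does the following clause permit assignment without consent? "
    "Answer yes or no and explain briefly.\n\n"
    "Neither party may assign this Agreement except to an affiliate."
)

# The model samples each next token from a probability distribution. At a
# nonzero temperature, two identical requests can diverge as soon as one
# sampled word differs, and the rest of the answer follows that fork.
for run in range(3):
    response = client.chat.completions.create(
        model="gpt-4o",  # illustrative; any chat model behaves the same way
        messages=[{"role": "user", "content": PROMPT}],
        temperature=1.0,
    )
    print(f"Run {run + 1}: {response.choices[0].message.content}\n")
```

Nothing in that loop is broken. Sampling variation is simply how these models generate text.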

Why Legal Language Breaks LLMs

Legal writing is formal and deliberate. Every clause, phrase, and punctuation mark matters. Sticking to familiar language isn’t laziness, it’s protection. Lawyers rely on language courts have already vetted.

Meanwhile, LLMs are trained to sound right, not be right. They lack grounding in legal precedent or interpretive logic, and are prone to:

  • Hallucinations: Misapplying precedent or inventing citations
  • Inconsistency: Different answers to the same prompt
  • Opacity: No audit trail for how conclusions were formed

“LLMs are trained to sound right, not be right. This is why they seem very convincing but ultimately turn out to be fundamentally flawed.” – Doug Bemis

And that’s just the theory. In practice, it gets even messier.

A day in the life of a legal prompt engineer:

  • Rewrite the same clause prompt 14 different ways
  • Check Twitter to see if GPT-4 quietly became GPT-4.2 overnight
  • Decode vague complaints into 2,000-character prompt fixes
  • Re-test results across hundreds of documents
  • Track which version of which prompt worked with which model

It’s brittle, expensive, and not what anyone signed up for when they invested in legal AI.
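
For a sense of what that tracking overhead looks like, here is a rough sketch of the kind of ad-hoc harness teams end up maintaining. The prompt versions, model names, and pass/fail rule are all hypothetical; the LLM call assumes the OpenAI Python SDK only because something has to stand in for it.

```python
# A rough sketch of the ad-hoc prompt-regression harness teams end up maintaining.
# Prompt versions, model names, and the scoring rule are hypothetical; the call
# assumes the OpenAI Python SDK purely as a stand-in.
from dataclasses import dataclass
from openai import OpenAI

client = OpenAI()

@dataclass
class PromptVersion:
    prompt_id: str   # e.g. "assignment-clause-v14", the fourteenth rewrite
    model: str       # the model this wording was tuned against
    template: str    # instruction text with a {document} placeholder

def ask(model: str, prompt: str) -> str:
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

def regression_suite(versions: list[PromptVersion],
                     labelled_docs: list[tuple[str, str]]) -> dict[str, float]:
    """Re-run every prompt version against every labelled document and record
    pass rates, because last week's numbers stop applying after any change."""
    results = {}
    for v in versions:
        passes = sum(
            expected.lower() in ask(v.model, v.template.format(document=doc)).lower()
            for doc, expected in labelled_docs
        )
        results[f"{v.prompt_id} @ {v.model}"] = passes / len(labelled_docs)
    return results
```

Every prompt rewrite, and every model change, means re-running that matrix against the labelled documents, and the labels themselves need lawyers to produce.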

The Prompting Pitfall: Why Iterating Doesn’t Save You

Some teams try to “fix” LLMs with better prompts. But that approach eventually hits a wall—not because the tools are broken, but because they weren’t built for legal work in the first place. It’s like squeezing a balloon: fix one issue, and another pops up somewhere else. Improve precision, and you lose consistency. Add context, and you break the formatting. Trying to build stability this way is exhausting and unreliable.

Prompt engineering is a crutch, not a cure. You can’t prompt your way into legal expertise. Law is about logic, not linguistic guesswork—and LLMs weren’t built to reason.

Even when a firm nails a prompt that seems to work, it’s short-lived. Model updates happen constantly. Every time OpenAI, Google, or Anthropic pushes an update, everything changes behind the scenes:

  • Prompt results shift unpredictably
  • Products scramble to recalibrate workflows
  • Legal teams are left guessing whether to trust today’s outputs

“Sam Altman just tweets randomly, ‘Good news! We completely deprecated the old model…’ And everyone starts going bonkers.” – Christopher Martin, Syntracts co-founder

It’s funny, until your workflow breaks.

Legal teams trying to build stable systems on top of unstable models are left constantly adjusting, second-guessing, and firefighting. Instead of scaling expertise, they’re rebuilding it from scratch every day.
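
For illustration, here is a minimal sketch of the drift check that gets bolted on after every provider update. The model alias, the pinned-snapshot caveat, and the golden-answer file are assumptions for the example, not a recommendation.

```python
# A minimal sketch of a "did the provider's update change our answers?" check,
# assuming the OpenAI Python SDK. The model alias and the golden-answer file
# are illustrative assumptions.
import json
from openai import OpenAI

client = OpenAI()

def answer(model: str, prompt: str) -> str:
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        temperature=0,  # reduces, but does not eliminate, run-to-run variation
    )
    return response.choices[0].message.content

# Answers that were reviewed and accepted before the last model update.
with open("golden_answers.json") as f:
    golden = json.load(f)  # e.g. [{"prompt": "...", "expected": "..."}]

# An alias like "gpt-4o" silently moves to whatever snapshot the provider ships
# next; pinning a dated snapshot only delays the problem until it is deprecated.
drifted = [
    case["prompt"]
    for case in golden
    if answer("gpt-4o", case["prompt"]).strip() != case["expected"].strip()
]
print(f"{len(drifted)} of {len(golden)} golden prompts changed after the update")
```

Even a clean pass only says the golden prompts still behave. Everything outside that set remains a guess.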

A Smarter Approach: Purpose-Built Legal AI

The problem with prompting isn’t just technical, it’s foundational. It comes down to how today’s mainstream AI is designed.

General purpose LLMs work by predicting the next likely word based on patterns in large datasets. That’s great for writing something that sounds convincing. But legal work isn’t about sounding right, it’s about being right. It demands accuracy, consistency, and explainability.

That foundational rethink is what led us to build Syntracts.

Rather than bending a general-purpose model into legal shape, Syntracts starts from legal logic. It trains on synthetic legal data and fine-tunes on firm-specific examples—capturing not just how legal text looks, but how it thinks.

“Law will be transformed by LLMs—but not by prompting. You need something purpose-built.” – Doug Bemis

What Makes Syntracts Different:

  • No Hallucinations: Outputs are grounded in legal logic, not probabilistic guesswork
  • Consistency by Design: One contract, one reliable set of structured interpretations
  • Audit-Ready: Every conclusion is traceable, with a clear decision path
  • Enterprise-Ready: Fully on-premise with zero black-box exposure

And unlike prompt-heavy tools that spit out loosely structured paragraphs requiring human review, Syntracts delivers structured, machine-readable answers—ready for compliance, search, and analysis.
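
To show the contrast, here is a purely hypothetical illustration, not Syntracts’ actual schema, of the difference between a loosely structured paragraph and a structured, machine-readable answer. The field names are invented for the example.

```python
# A purely hypothetical illustration of unstructured vs. structured output.
# This is NOT Syntracts' actual schema; field names are invented for the example.
from dataclasses import dataclass

# What a prompt-heavy tool typically returns: prose a human still has to read.
loose_output = (
    "It appears that assignment is generally restricted, although Section 12.2 "
    "may allow transfers to affiliates under certain circumstances."
)

# What a structured, machine-readable answer enables: fields you can search,
# filter, and audit without re-reading the paragraph.
@dataclass
class ClauseFinding:
    clause_id: str        # which provision the finding came from
    topic: str            # e.g. "assignment"
    conclusion: str       # a constrained value, not free prose
    supporting_text: str  # the exact language relied on, for auditability

finding = ClauseFinding(
    clause_id="12.2",
    topic="assignment",
    conclusion="permitted_to_affiliates_only",
    supporting_text="Neither party may assign this Agreement except to an affiliate.",
)

# Downstream compliance checks become simple queries instead of manual review.
flagged = [f for f in [finding]
           if f.topic == "assignment" and f.conclusion != "freely_permitted"]
```

Once the answer is a record rather than prose, compliance checks, search, and analysis become queries instead of another round of reading.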

It’s not just faster; it’s more trustworthy.

From Prompt Chaos to Legal Clarity

Before we built Syntracts, we watched legal teams spend hours wrestling with prompt-based AI. Excitement turned into frustration—rewriting instructions, troubleshooting hallucinations, wondering why the same clause generated five different answers.

These weren’t edge cases. They were the norm. And they drained time, introduced risk, and made teams question whether the tools were worth it.

That’s the gap we built Syntracts to close—not with magic, but with structure. Instead of hoping the right prompt produces a usable answer, we focused on making sure the output is predictable, auditable, and grounded in legal logic.

Early Adopters Are Seeing Real Results:

  • 5x faster contract reviews without compromising accuracy
  • 80% reduction in QA time for AI-generated insights
  • Seamless workflow integration; no prompt tuning required
  • Full on-premise deployments meeting strict security standards

It’s not just a new way to use legal AI, it’s a new standard for how legal intelligence should work.

The End of the Prompting Era

Prompting made for flashy demos. But law isn’t a demo. It’s contracts, obligations, deadlines, negotiations, and risk.

Legal teams don’t need tools that impress on stage. They need systems they can trust: ones that deliver the same answer every time, with logic they can explain.

That’s why we moved beyond prompting. With Syntracts, legal AI becomes clear, consistent, and completely under your control.

Ready to see it in action?
See why top legal teams are choosing Syntracts to move past the prompting era. Book a demo today.