AI ‘hallucinations’ are a growing problem for the legal profession

Sam Altman is the chief executive of OpenAI, the developer of ChatGPT. Should lawyers be more cautious about using his products and others like it? (Eric Risberg / Associated Press)

You’ve probably heard the one about the product that blows up in its creators’ faces when they’re trying to demonstrate how great it is.

Here’s a ripped-from-the-headlines yarn about what happened when a big law firm used an AI bot developed by its own client, Anthropic, to help write an expert’s testimony in the client’s defense.

It didn’t go well. Anthropic’s chatbot, Claude, got the title and authors of one paper cited in the expert’s statement wrong, and injected wording errors elsewhere. The errors were incorporated in the statement when it was filed in court in April.

Those errors were enough to prompt the plaintiffs suing Anthropic — music publishers who allege that the AI firm is infringing their copyrights by feeding lyrics into Claude to “train” the bot — to ask the federal magistrate overseeing the case to throw out the expert’s testimony in its entirety.

It may also become a black eye for the big law firm Latham & Watkins, which represents Anthropic and submitted the errant declaration.

Latham argues that the errors were inconsequential, amounting to an “honest citation mistake and not a fabrication.” The firm’s failure to notice the errors before the statement was filed is “an embarrassing and unintentional mistake,” but it shouldn’t be exploited to invalidate the expert’s opinion, the firm told Magistrate Judge Susan van Keulen of San Jose, who is managing the pretrial phase of the lawsuit. The plaintiffs, however, say the errors “fatally undermine the reliability” of the expert’s declaration.

At a May 13 hearing conducted by phone, van Keulen herself expressed doubts.

“There is a world of difference between a missed citation and a hallucination generated by AI, and everyone on this call knows that,” she said, according to a transcript of the hearing cited by the plaintiffs. (Van Keulen hasn’t yet ruled on whether to keep the expert’s declaration in the record or whether to hit the law firm with sanctions.)

That’s the issue confronting judges as courthouse filings peppered with serious errors and even outright fabrications — what AI experts term “hallucinations” — continue to be submitted in lawsuits.

A roster compiled by the French lawyer and data expert Damien Charlotin now numbers 99 cases from federal courts in two dozen states as well as from courts in Europe, Israel, Australia, Canada and South Africa.

That’s almost certainly an undercount, Charlotin says. The number of cases in which AI-generated errors have gone undetected is incalculable, he says: “I can only cover cases where people got caught.”

In nearly half the cases, the guilty parties are pro se litigants — that is, people pursuing a case without a lawyer. Those litigants generally have been treated leniently by judges who recognize their inexperience; they seldom are fined, though their cases may be dismissed.

In most of the cases, however, the responsible parties were lawyers. Amazingly, in some 30 of the cases involving lawyers, the AI-generated errors were discovered, or appeared in documents filed, as recently as this year, long after the tendency of AI bots to “hallucinate” became evident. That suggests that the problem is getting worse, not better.

“I can’t believe people haven’t yet cottoned to the thought that AI-generated material is full of errors and fabrications, and therefore every citation in a filing needs to be confirmed,” says UCLA law professor Eugene Volokh.

Judges have been making it clear that they have had it up to here with fabricated quotes, incorrect references to legal decisions and citations to nonexistent precedents generated by AI bots. Under Rule 11 of the Federal Rules of Civil Procedure, a lawyer who files a brief or other document is certifying that its factual assertions, including its citations to other cases and court decisions, are accurate and supported; filings that fall short leave lawyers vulnerable to monetary sanctions or disciplinary action.

Some courts have issued standing orders that the use of AI at any point in the preparation of a filing must be disclosed, along with a certification that every reference in the document has been verified. At least one federal judicial district has forbidden almost any use of AI.

The proliferation of faulty references in court filings also points to the most serious problem with the spread of AI bots into our daily lives: They can’t be trusted. Long ago it became evident that when even the most sophisticated AI systems are flummoxed by a question or task, they fill in the blanks in their own knowledge by making things up.

As other fields use AI bots to perform important tasks, the consequences can be dire. Many medical patients “can be led astray by hallucinations,” a team of Stanford researchers wrote last year. Even the most advanced bots, they found, couldn’t back up their medical assertions with solid sources 30% of the time.

It’s fair to say that workers in almost any occupation can fall victim to weariness or inattention. But attorneys often handle disputes in which thousands or even millions of dollars are at stake, and they’re expected to be especially rigorous about fact-checking their formal submissions.

Some legal experts say there’s a legitimate role for AI in the law — even to make decisions customarily left to judges. But lawyers can hardly be unaware of the pitfalls for their own profession in failing to monitor bots’ outputs.

The very first sanctions case on Charlotin’s list originated in June 2023 — Mata vs. Avianca, a New York personal injury case that resulted in a $5,000 penalty for two lawyers who prepared and submitted a legal brief that was largely the product of the ChatGPT chatbot. The brief cited at least nine court decisions that were soon exposed as nonexistent. The case was widely publicized coast to coast.

One would think fiascos like this would cure lawyers of their reliance on artificial intelligence chatbots to do their work for them. One would be wrong. Charlotin believes that the superficially authentic tone of AI bots’ output may encourage overworked or inattentive lawyers to accept bogus citations without double-checking.

“AI is very good at looking good,” he told me. Legal citations follow a standardized format, so “they’re easy to mimic in fake citations,” he says.

It may also be true that the sanctions in the earliest cases, which generally amounted to no more than a few thousand dollars, were insufficient to capture the bar’s attention. But Volokh believes the financial consequences of filing bogus citations should pale next to the nonmonetary consequences.

“The main sanctions to each lawyer are the humiliation in front of the judge, in front of the client, in front of supervisors or partners..., possibly in front of opposing counsel, and, if the case hits the news, in front of prospective future clients, other lawyers, etc.,” he told me. “Bad for business and bad for the ego.”

Charlotin’s dataset makes for amusing reading — if mortifying for the lawyers involved. It’s peopled by lawyers who appear to be totally oblivious to the technological world they live in.

The lawyer who prepared the hallucinatory ChatGPT filing in the Avianca case, Steven A. Schwartz, later testified that he was “operating under the false perception that this website could not possibly be fabricating cases on its own.” When he began to suspect that the cases couldn’t be found in legal databases because they were fake, he sought reassurance — from ChatGPT!

“Is Varghese a real case?” he texted the bot. Yes, it’s “a real case,” the bot replied. Schwartz didn’t respond to my request for comment.

Other cases underscore the perils of placing one’s trust in AI.

For example, last year Keith Ellison, the attorney general of Minnesota, hired Jeff Hancock, a communications professor at Stanford, to provide an expert opinion on the danger of AI-faked material in politics. Ellison was defending a state law that made the distribution of such material in political campaigns a crime; the law was challenged in a lawsuit as an infringement of free speech.

Hancock, a well-respected expert in the social harms of AI-generated deepfakes — photos, videos and recordings that seem to be the real thing but are convincingly fabricated — submitted a declaration that Ellison duly filed in court.

But Hancock’s declaration included three hallucinated references apparently generated by ChatGPT, the AI bot he had consulted while writing it. One attributed to bogus authors an article he himself had written, but he didn’t catch the mistake until it was pointed out by the plaintiffs.

Laura M. Provinzino, the federal judge in the case, was struck by what she called “the irony” of the episode: “Professor Hancock, a credentialed expert on the dangers of AI and misinformation, has fallen victim to the siren call of relying too heavily on AI — in a case that revolves around the dangers of AI, no less.”

That provoked her to anger. Hancock’s reliance on fake citations, she wrote, “shatters his credibility with this Court.” Noting that he had attested to the veracity of his declaration under penalty of perjury, she threw out his entire expert declaration and refused to allow Ellison to file a corrected version.

In a mea culpa statement to the court, Hancock explained that the errors might have crept into his declaration when he cut-and-pasted a note to himself. But he maintained that the points he made in his declaration were valid nevertheless. He didn’t respond to my request for further comment.

On Feb. 6, Michael R. Wilner, a former federal magistrate serving as a special master in a California federal case against State Farm Insurance, hit the two law firms representing the plaintiff with $31,000 in sanctions for submitting a brief with “numerous false, inaccurate, and misleading legal citations and quotations.”

In that case, a lawyer had prepared an outline of the brief for the associates assigned to write it. He had used an AI bot to help write the outline, but didn’t warn the associates of the bot’s role. Consequently, they treated the citations in the outline as genuine and didn’t bother to double-check them.

As it happened, Wilner noted, “approximately nine of the 27 legal citations in the ten-page brief were incorrect in some way.” He chose not to sanction the individual lawyers: “This was a collective debacle,” he wrote.

Wilner added that when he read the brief, the citations almost persuaded him that the plaintiff’s case was sound — until he looked up the cases and discovered they were bogus. “That’s scary,” he wrote. His monetary sanction for misusing AI appears to be the largest in a U.S. court ... so far.
