
### AI’s Limitations in Legal Matters: Truth Beyond Reach

New research from Stanford University shows large language models hallucinate frequently when used to answer legal questions.

Nearly one in five attorneys is using artificial intelligence (AI), according to a survey conducted by the American Bar Association.

At the same time, legal mishaps involving tools like ChatGPT are on the rise, because chatbots have a tendency to fabricate information. For instance, they may cite legal precedents from cases that never actually occurred.

Marketplace's Meghan McCarty Carino spoke with Daniel Ho of Stanford's Institute for Human-Centered Artificial Intelligence about the institute's recent research on how often three prominent large language models, from OpenAI, Meta, and Google, hallucinate when asked for insight or assistance on legal matters.

The following is an edited transcript of their conversation:

Daniel Ho: We formulated roughly 200,000 queries covering a range of tasks, from assessing whether two cases conflict, to identifying the central legal principle in a particular case, to verifying whether a case exists at all. What we discovered across the board was an alarmingly high rate of hallucination: between 58% and 88% of the time, the large language model gave an incorrect response.
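To make that methodology concrete, here is a rough sketch, in Python, of how such a hallucination-rate evaluation might be wired up: pose each query to a model, compare the answer against a verified ground truth, and report the fraction of mismatches. The `query_model` stub, the sample queries, and the substring check are simplifying assumptions for illustration, not the Stanford study's actual code or data.

```python
# A minimal, hypothetical sketch of a hallucination-rate evaluation.
# The query_model() stub, the example queries, and the substring check
# are illustrative assumptions, not the study's actual code.

from dataclasses import dataclass


@dataclass
class LegalQuery:
    prompt: str        # question posed to the model
    ground_truth: str  # verified answer, e.g. from a case-law database


def query_model(prompt: str) -> str:
    """Placeholder for a real LLM call; swap in an actual API request here."""
    return "model answer goes here"


def hallucination_rate(queries: list[LegalQuery]) -> float:
    """Fraction of queries whose answers fail to match the ground truth.

    A real evaluation would need far more careful answer checking than
    substring matching (normalizing citations, docket numbers, etc.).
    """
    wrong = 0
    for q in queries:
        answer = query_model(q.prompt).lower()
        if q.ground_truth.lower() not in answer:
            wrong += 1
    return wrong / len(queries)


# Example queries mirroring the task types described above: checking
# whether a (made-up) case exists and whether a justice dissented.
sample_queries = [
    LegalQuery("Does the case 'Smith v. Jones, 123 F.4th 456' exist?", "no"),
    LegalQuery("Did Justice Ginsburg dissent in Obergefell v. Hodges?", "no"),
]

if __name__ == "__main__":
    print(f"Hallucination rate: {hallucination_rate(sample_queries):.0%}")
```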

Meghan McCarty Carino: Another significant issue you highlighted is what you refer to as contrafactual bias. Could you elaborate on this concept?

Ho: Certainly. Contrafactual bias refers to the tendency of large language models to accept a factual premise in a user's question as true, even when it is blatantly false. For example, SCOTUSblog once asked, "Why did Justice Ruth Bader Ginsburg dissent in Obergefell?" in reference to the case that recognized the right to same-sex marriage. In reality, she did not dissent in that case; nevertheless, the chatbot produced a convincingly detailed answer built on the false premise. This contrafactual bias is especially concerning when such models are used by people without deep legal expertise, and it raises questions about whether this technology will narrow or widen access-to-justice disparities in the U.S. legal system.

McCarty Carino: I personally tested that prompt using the free version of ChatGPT, specifically GPT-3.5, and it correctly noted that Justice Ginsburg did not author a dissenting opinion in that case. Could the issue lie in the constant updates these models undergo, leading to unpredictable outputs?

Ho: Precisely, that is one of the challenges. We structured these tests to run in close succession to avoid scenarios where we compare a model at one point in time with another version several months later, given the frequent live updates being implemented. A broader concern within the community revolves around how to effectively evaluate and test these models as they continually evolve.

McCarty Carino: Were there discernible patterns of errors or specific types of questions that resulted in more frequent hallucinations?

Ho: Yes, a notable advantage of studying this in the legal domain is the hierarchical structure of the U.S. courts, which let us examine how hallucinations vary by geography and by level of the judicial hierarchy. The problem appears most pronounced exactly where people have the least legal knowledge to draw on: hallucination rates are significantly lower for U.S. Supreme Court cases than for district court cases, and our research even indicates a geographic bias in hallucinations. In short, an average litigant seeking guidance from a large language model should proceed with real caution.

McCarty Carino: Does the rapid adoption of this technology within this domain surprise you, despite the increasing awareness of challenges and the high stakes involved?

Ho: While there is undeniably immense potential in this technology, we must dispel the misconception that it serves as a comprehensive solution. The true utility of this technology lies in enhancing the discovery process, refining the search for relevant precedents, and ultimately envisioning AI systems as aids to lawyers rather than replacements. This sentiment was echoed by Chief Justice Roberts in his judiciary report, where he expressed concerns about legal hallucinations and the risk of AI potentially dehumanizing the legal realm.

McCarty Carino: Even if these large language models can rectify the issue of hallucinations, do you foresee risks associated with excessive reliance on AI in legal contexts?

Ho: Absolutely. There exists a general apprehension regarding automation bias, the inclination for humans to overly depend on AI outputs. Just as individuals may overly rely on the top search results in Google, the improved search capabilities of AI systems in legal research engines like Lexis and Westlaw could lead to a similar overreliance, potentially bypassing the necessary diligence to identify the most pertinent cases based on specific circumstances. Even if the issue of hallucinations is resolved, which is a significant if, there are still valid reasons to exercise caution when integrating such technology into legal practices.

### Additional Insights

One of the aforementioned legal mishaps involved a lawyer who utilized ChatGPT for research with disastrous consequences.

In that case, a lawyer representing a client in a lawsuit against the airline Avianca used ChatGPT to help prepare a court filing. The chatbot generated entirely fictitious case citations, complete with quotes and fabricated docket numbers, which opposing counsel tried and failed to verify.

In the lawyer's defense, he did question the chatbot's credibility, and those exchanges became part of the court record.

According to reports in The New York Times, the chatbot answered "Yes" when asked whether its responses were truthful. When asked for its source, ChatGPT provided a false citation.

"Are the other cases you provided fake?" the lawyer pressed. ChatGPT responded, "No, the other cases I provided are real and can be found in reputable legal databases," which is, of course, exactly where lawyers should be doing their research before turning to ChatGPT for help.
