Machine-learning researchers and legal experts have introduced SaulLM-7B, touted as the first open-source, text-generating large language model built specifically for legal tasks and applications.
Given recent high-profile mishaps in which generative AI cited fictitious cases in filed court documents – as in Mata v Avianca and Park v Kim – the decision may seem questionable. The tendency of AI models to fabricate information, and the murky provenance of their training data, could be viewed as serious drawbacks in an industry where the stakes are so high.
However, the developers of SaulLM-7B – affiliated with startup Equall.ai, with Université Paris-Saclay and Sorbonne Université in France, and with Universidade de Lisboa and NOVA School of Law in Portugal – advocate for the integration of artificial intelligence into legal practice.
Equall.ai’s spokesperson told The Register: “LLMs and AI systems at large will revolutionize legal practice, going well beyond incremental productivity gains. Our emphasis lies in building comprehensive legal AI systems that are guided and overseen by legal professionals.
“We firmly believe that domain-specific systems outperform their generalist counterparts. This translates into greater accuracy and more effective tools that help lawyers focus on what they do best: exercising legal judgment and advising clients.”
Others share this optimism about the value of AI assistance. Goldman Sachs estimated last year that roughly “one-fourth of current work tasks in the US could be automated by AI, with notable automation potential in administrative (46 percent) and legal (44 percent) roles.” Startups such as Bench IQ, Harvey.ai, and Safe Sign Technologies see a market opportunity in such predictions.
Equall.ai, established by Jorge Mattamouros, a former White & Case LLP partner, contends that nearly all legal work – research, document review, analysis, summarization, and pinpointing the key passages in documents – stands to benefit from AI.
“We believe that LLMs present numerous untapped opportunities, some of which are apparent today, while many remain undiscovered,” added Equall.ai’s spokesperson. “For instance, we anticipate that LLMs will revolutionize the approach to data processing pipelines and data creation, pivotal in legal contexts where acquiring high-quality data proves to be costly and challenging.”
The perspective at Equall.ai is that the fallibility of AI models can be mitigated.
“LLMs are inherently probabilistic models,” the company said. “Hallucinations typically arise when LLMs operate outside the distribution of their training data. Put simply, when generating text on topics and data similar to what they were trained on, LLMs hallucinate less than when they have had little exposure to a subject.
“During our assessment of Saul with legal practitioners, we observed reduced instances of hallucinations, particularly in discussions involving specific legal concepts. In essence, we anticipate that LLMs trained on legal data will exhibit significantly fewer hallucinations on legal subjects compared to their generic counterparts.”
Nonetheless, the startup cautions against treating AI models as legal databases, and stresses that verifying LLM outputs is essential.
The team behind SaulLM-7B – Pierre Colombo, Telmo Pessoa Pires, Malik Boudiaf, Dominic Culver, Rui Melo, Caio Corro, Andre F. T. Martins, Fabrizio Esposito, Vera Lúcia Raposo, Sofia Morgado, and Michael Desa – detailed their work in a paper titled “SaulLM-7B: A pioneering Large Language Model for Law.”
Available on the AI model community site Hugging Face, SaulLM-7B is based on the open-source Mistral 7B model; both have 7 billion parameters, considerably fewer than models such as Llama 2, which scales up to 70 billion. SaulLM-7B’s creators stress that this is just a first step, with models of other sizes in the works.
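For those who want to poke at the model themselves, loading it follows the usual Hugging Face transformers pattern. The sketch below is illustrative only: the repository identifier is our assumption, so check the Hugging Face hub for the exact name.

```python
# Minimal sketch of querying SaulLM-7B via the Hugging Face transformers
# library. The repo id "Equall/Saul-Instruct-v1" is an assumption - verify
# the exact identifier on the hub before running.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Equall/Saul-Instruct-v1"  # assumed repository name
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

prompt = "Explain the doctrine of promissory estoppel in two sentences."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=200, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```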
As an LLM, SaulLM-7B answers questions and prompts in natural language, with a specific focus on legal matters and issues.
Jonathan Schwarz, co-founder and chief scientist at UK-based legal AI startup Safe Sign Technologies, commended the developers of SaulLM-7B for their sensible approach to specializing general LLMs.
“It serves as a valuable open-source alternative to more proprietary methodologies,” he remarked. “However, there are areas that necessitate further refinement.”
Schwarz highlighted the importance of stress-testing models, a practice he noted his company is actively engaged in.
Safe Sign Technologies has reportedly developed a prototype legal LLM and aims to deploy an improved version through partners in the near future.
Schwarz declined to say how far their offering will be open source as opposed to proprietary. He did assert that while SaulLM-7B-Instruct – a version fine-tuned on general and legal instructions – achieved an average score of 0.61 on the LegalBench-Instruct benchmark, his company’s system is approaching 0.77, roughly GPT-4’s level. That said, machine-learning benchmarks should be interpreted with caution.
“Our objective was to devise an AI solution that delivers high-quality legal advice promptly to all individuals,” said Alexander (Sami) Kardos-Nyheim, co-founder and CEO of Safe Sign Technologies, in an interview with The Register. “We aim to provide reliable legal counsel through AI, steering clear of the pitfalls associated with platforms like ChatGPT.”
“In broad terms, our training methods involve working with extensive datasets scraped from the web,” he continued. “At each training step we strategically select a subset to train on, picking the examples expected to improve the model most. That lets us sidestep the problem of inadvertently learning toxic behaviors that would have to be corrected later.”
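Neither the selection criterion nor the training code has been published, so the following is only a rough sketch of per-step data subset selection in general; the scoring and update methods are hypothetical stand-ins, not Safe Sign’s actual implementation.

```python
# Illustrative sketch of per-step training-data subset selection - NOT Safe
# Sign's actual (unpublished) method. model.score and model.update are
# hypothetical stand-ins for whatever criterion ranks candidate examples
# (domain relevance, a toxicity filter, expected loss reduction, etc.).
import random

def select_subset(candidates, score_fn, k):
    """Keep the k highest-scoring candidate examples."""
    return sorted(candidates, key=score_fn, reverse=True)[:k]

def train(model, dataset, steps, pool_size=1024, batch_size=32):
    for _ in range(steps):
        # Draw a candidate pool from the raw web-scale dataset...
        pool = random.sample(dataset, min(pool_size, len(dataset)))
        # ...keep only the examples judged most useful, then take one
        # training step on that filtered batch.
        batch = select_subset(pool, model.score, batch_size)
        model.update(batch)
```

The point of filtering before each step, rather than cleaning the corpus once up front, is that the selection criterion can depend on the model’s current state, which is what lets undesirable examples be excluded before they are ever learned.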
Schwarz emphasized the safety of Safe Sign’s methodology. “In scenarios where the model encounters a legal query beyond its expertise, instead of providing potentially erroneous information, we acknowledge the limitation.”
He expressed skepticism towards the comprehensive approach adopted by OpenAI and Google, which involves addressing broad issues like bias and enlisting inexpensive contractors to rank model responses for improved retraining.
“To replicate human capabilities, one must test against the full spectrum of human tasks,” Schwarz remarked. “Attempting to cover every conceivable topic is a daunting task and may not yield the desired outcomes.”
“In the realm of AI, particularly in legal AI, the focus on safety and reliability essential for applications in fields like medicine and law appears to be lacking,” added Kardos-Nyheim.