Vals Legal AI Report (VLAIR)
All four AI products tested outperformed human lawyers on legal research tasks, with scores ranging from 74% to 78% compared to 69% for the lawyer baseline. Surprisingly, ChatGPT performed nearly identically to specialized legal AI products on accuracy; the main advantage for legal AI was better citations drawn from proprietary databases.
Details
Author(s)
Vals study team
Date
October 14, 2025
Summary
The Vals Legal AI Report evaluated four AI products (Alexi, Counsel Stack, Midpage, and ChatGPT) against a baseline of human lawyers on 200 US legal research questions. All four AI products outperformed the lawyer baseline, scoring between 74% and 78% compared to 69% for lawyers, a margin of 5 to 9 percentage points. The study scored responses on three criteria: accuracy, authoritativeness of citations, and appropriateness of format.
The most striking finding is that ChatGPT performed nearly identically to the specialized legal AI products, with both averaging 80% on accuracy. The legal AI products showed an advantage only in citations and sourcing, scoring 6 points higher on authoritativeness thanks to their proprietary legal databases. The study notes, however, that this gap may close as Deep Research capabilities are adopted. All participants also struggled significantly with multi-jurisdictional questions, particularly 50-state surveys.
Key Takeaways
1. All AI products outperformed human lawyers on legal research tasks, with AI scoring 74% to 78% compared to 69% for lawyers working without AI assistance. On the questions where AI excelled, it outperformed lawyers by an average margin of 31 percentage points.
2. ChatGPT performed nearly identically to specialized legal AI products on accuracy (80% for both), with the primary advantage for legal AI being citations and sourcing from proprietary databases. This 6-point advantage in authoritativeness may diminish as Deep Research capabilities become more widely adopted.
3. Multi-jurisdictional questions proved challenging for all participants, with scores averaging 14 points lower than on single-jurisdiction questions. The single 50-state survey question in the study was particularly difficult: most participants failed to provide a response, timed out, or gave an answer without authoritative sources.
Why It Matters
This study challenges the assumption that specialized legal AI tools are necessary for accurate legal research, finding that ChatGPT performs nearly as well as products built on proprietary legal databases. The findings suggest law firms may not need to invest in expensive specialized tools for basic research accuracy, though citation quality remains an advantage for legal AI products accessing vetted source libraries.
Practical Implications
1. Lawyers can use AI tools for legal research with reasonable confidence, given that every tool tested outscored the lawyer baseline, but should focus AI on tasks where it excels (straightforward legal questions) and reserve human review for complex multi-jurisdictional matters, where lawyers still maintain an edge.
2. The near-identical performance between ChatGPT and specialized legal AI products on accuracy suggests firms should evaluate whether expensive specialized tools provide sufficient value beyond better citations, particularly for research that doesn't require extensive source documentation.
3. All AI products struggled with 50-state surveys and multi-jurisdictional questions, indicating lawyers should use specialized workflows or iterative prompting for these tasks rather than relying on single zero-shot queries, and should expect to provide more hands-on guidance for jurisdictionally complex research.
Citation
Vals AI, "VLAIR - Legal Research," October 14, 2025, https://www.vals.ai/industry-reports/vlair-10-14-25.
Publication
Vals
