TaxProf Blog


May 31, 2023


Wednesday, May 31, 2023

Re-Evaluating GPT-4’s Bar Exam Performance

Following up on my previous post, GPT-4 Beats 90% Of Aspiring Lawyers On The Bar Exam: Eric Martínez (MIT; Google Scholar), Re-Evaluating GPT-4’s Bar Exam Performance:

Perhaps the most widely touted of GPT-4’s at-launch, zero-shot capabilities has been its reported 90th-percentile performance on the Uniform Bar Exam, with its reported 80-percentile-points boost over its predecessor, GPT-3.5, far exceeding that for any other exam. This paper investigates the methodological challenges in documenting and verifying the 90th-percentile claim, presenting four sets of findings that suggest that OpenAI’s estimates of GPT-4’s UBE percentile, though clearly an impressive leap over those of GPT-3.5, appear to be overinflated, particularly if taken as a “conservative” estimate representing “the lower range of percentiles,” and moreso if meant to reflect the actual capabilities of a practicing lawyer.

First, although GPT-4’s UBE score nears the 90th percentile when examining approximate conversions from February administrations of the Illinois Bar Exam, these estimates are heavily skewed towards repeat test-takers who failed the July administration and score significantly lower than the general test-taking population. Second, data from a recent July administration of the same exam suggests GPT-4’s overall UBE percentile was ~68th percentile, and ~48th percentile on essays. Third, examining official NCBE data and using several conservative statistical assumptions, GPT-4’s performance against first-time test takers is estimated to be ~63rd percentile, including ~41st percentile on essays. Fourth, when examining only those who passed the exam (i.e. licensed or license-pending attorneys), GPT-4’s performance is estimated to drop to ~48th percentile overall, and ~15th percentile on essays.

Taken together, these findings carry timely insights for the desirability and feasibility of outsourcing legally relevant tasks to AI models, as well as for the importance for AI developers to implement rigorous and transparent capabilities evaluations to help secure safe and trustworthy AI.

https://taxprof.typepad.com/taxprof_blog/2023/05/re-evaluating-gpt-4s-bar-exam-performance.html

