{"id": 1210172, "name": "Best MMLU score by country of origin", "unit": "%", "createdAt": "2026-03-25T15:35:38.000Z", "updatedAt": "2026-03-25T15:35:38.000Z", "coverage": "", "timespan": "", "datasetId": 7780, "shortUnit": "%", "columnOrder": 0, "shortName": "em", "catalogPath": "grapher/artificial_intelligence/2026-01-30/mmlu/mmlu_by_country#em", "descriptionShort": "Highest MMLU score achieved by a model from a given country or region, shown whenever a new record was set. MMLU is a multitask benchmark made up of exam-style questions across dozens of academic and professional subjects.", "descriptionFromProducer": "MMLU consists of four-choice questions spanning humanities, STEM, social sciences, and professional domains. Many questions require recall of domain facts, application of definitions, or light reasoning under time constraints\u2014skills analogous to standardized testing. Due to its breadth and stability, MMLU is frequently used as a headline indicator of general knowledge in model reports. Sub-scores by discipline can reveal strengths and weaknesses across subject areas.", "type": "float", "grapherConfigIdETL": "019d25a3-159d-7b83-8bdd-3c59963c854e", "datasetName": "Epoch AI Benchmark Data", "updatePeriodDays": 31, "datasetVersion": "2026-01-30", "nonRedistributable": false, "display": {"unit": "%", "zeroDay": "2021-08-05", "shortUnit": "%", "yearIsDay": true, "numDecimalPlaces": 1}, "schemaVersion": 2, "processingLevel": "major", "presentation": {"topicTagsLinks": ["Artificial Intelligence"]}, "descriptionKey": ["This indicator shows the highest MMLU score achieved so far by a model from a given country or region.", "MMLU (Massive Multitask Language Understanding) is a benchmark that tests AI models across 57 subjects, from school-level science and mathematics to professional fields such as law and medicine.", "Scores show the share of multiple-choice questions a model answered correctly. Some models are tested with example questions before answering new questions, while others are not.", "Models are grouped by country or region using information on where the developer is based.", "France, the United Kingdom, and Germany are grouped into a single \"Europe\" category.", "Data are compiled by Epoch AI from published papers, official leaderboards, and other primary sources."], "dimensions": {"years": {"values": [{"id": 0}, {"id": 125}, {"id": 175}, {"id": 222}, {"id": 236}, {"id": 586}, {"id": 587}, {"id": 665}, {"id": 705}, {"id": 714}, {"id": 762}, {"id": 780}, {"id": 784}, {"id": 819}, {"id": 858}, {"id": 986}, {"id": 1006}, {"id": 1012}, {"id": 1037}, {"id": 1050}, {"id": 1084}, {"id": 1121}, {"id": 1141}, {"id": 1146}, {"id": 1174}, {"id": 1203}, {"id": 1239}]}, "entities": {"values": [{"id": 13, "name": "United States", "code": "USA"}, {"id": 276, "name": "Europe", "code": "OWID_EUR"}, {"id": 72, "name": "United Arab Emirates", "code": "ARE"}, {"id": 171, "name": "China", "code": "CHN"}, {"id": 44, "name": "Canada", "code": "CAN"}]}}, "origins": [{"id": 14142, "title": "Epoch AI Benchmark Data", "description": "Comprehensive collection of AI benchmark datasets from Epoch AI, including FrontierMath and other performance benchmarks.", "producer": "Epoch AI", "citationFull": "Epoch AI, \u2018AI Benchmarking Hub\u2019. Published online at epoch.ai. Retrieved from \u2018https://epoch.ai/benchmarks\u2019 [online resource]. Accessed 30 Jan 2026.", "urlMain": "https://epoch.ai/benchmarks", "urlDownload": "https://epoch.ai/data/benchmark_data.zip", "dateAccessed": "2026-03-08", "datePublished": "2026-01-26", "license": {"url": "https://epoch.ai/about", "name": "CC BY 4.0"}}]}