ParEval Leaderboard: Evaluating the Ability of Large Language Models to Generate Parallel Code
We introduced the ParEval benchmark in “Can Large Language Models Write Parallel Code?” to evaluate how well LLMs generate parallel code. We found a significant gap between their ability to generate sequential code and their ability to generate parallel code across a wide array of computational problems and parallel programming models. On this page we maintain an up-to-date table tracking the progress of state-of-the-art LLMs on ParEval.
ParEval Results
Last updated March 5, 2024
If you would like a model added, reach out to dnicho@umd.edu or open an issue in the GitHub repo.
Citing ParEval
@inproceedings{nichols2024large,
  title = {Can Large Language Models Write Parallel Code?},
  author = {Daniel Nichols and Joshua H. Davis and Zhaojun Xie and Arjun Rajaram and Abhinav Bhatele},
  year = {2024},
  booktitle = {Proceedings of the 33rd International Symposium on High-Performance Parallel and Distributed Computing},
  series = {HPDC '24},
  publisher = {Association for Computing Machinery},
  address = {New York, NY, USA}
}