CodeXGLUE is a benchmark dataset and open challenge for code intelligence. It includes a collection of code intelligence tasks and a platform for model evaluation and comparison. CodeXGLUE stands for General Language Understanding Evaluation benchmark for CODE. It includes 14 datasets for 10 diversified code intelligence tasks spanning four scenarios: code-code, text-code, code-text, and text-text.
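As a minimal sketch of how one of these tasks can be pulled for evaluation, the snippet below loads the defect-detection task through the Hugging Face `datasets` mirror of CodeXGLUE; the dataset id and field names reflect that mirror and may differ in other distributions.

```python
from datasets import load_dataset

# Defect detection (Devign): C functions labeled as vulnerable or not.
dataset = load_dataset("code_x_glue_cc_defect_detection", split="test")

for example in dataset.select(range(3)):
    # Each record pairs a function body with a boolean vulnerability label.
    print(example["func"][:80], "->", example["target"])
```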
PyTorrent contains 218,814 Python package libraries from PyPI and the Anaconda environment. These sources were chosen because earlier studies have shown that much of the code found elsewhere is redundant, whereas packages from these environments tend to be higher quality and well documented. PyTorrent enables users (such as data scientists, students, etc.) to build off-the-shelf machine learning models directly, without spending months of effort on large infrastructure.
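A minimal sketch of streaming PyTorrent records for training follows; it assumes the CodeSearchNet-style gzipped JSONL shards the corpus is released in, and the shard file name is a hypothetical placeholder.

```python
import gzip
import json

def iter_records(path):
    """Yield one parsed record per line of a gzipped JSONL shard."""
    with gzip.open(path, "rt", encoding="utf-8") as f:
        for line in f:
            yield json.loads(line)

# Hypothetical shard name; field names follow the CodeSearchNet schema.
for record in iter_records("pytorrent_train_00.jsonl.gz"):
    # Each record pairs a Python function with its docstring, ready for
    # code-search or code-summarization training.
    print(record["func_name"], "-", record["docstring"][:60])
    break
```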
CriticBench is a comprehensive benchmark designed to assess the abilities of Large Language Models (LLMs) to critique and rectify their reasoning across various tasks. It encompasses five reasoning domains: mathematical, commonsense, symbolic, coding, and algorithmic reasoning.
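The sketch below illustrates the generate-critique-correct loop that this kind of benchmark evaluates; `llm` is a hypothetical completion function standing in for the model under test, and the prompts are illustrative rather than the benchmark's own templates.

```python
def generate_critique_correct(llm, question):
    # 1. Generation: produce an initial step-by-step answer.
    answer = llm(f"Question: {question}\nAnswer step by step:")

    # 2. Critique: ask the model to judge its own reasoning.
    critique = llm(
        f"Question: {question}\nProposed answer: {answer}\n"
        "Point out any flaw in the reasoning, or say 'correct'."
    )

    # 3. Correction: revise the answer in light of the critique.
    corrected = llm(
        f"Question: {question}\nProposed answer: {answer}\n"
        f"Critique: {critique}\nGive a corrected final answer:"
    )
    return answer, critique, corrected
```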
DISL is a large dataset of unique Solidity smart contracts deployed to the Ethereum mainnet. The full dataset report is available at: https://arxiv.org/abs/2403.16861
We introduce FixEval, a dataset for competitive programming bug fixing, along with a comprehensive test suite, and show the necessity of execution-based evaluation compared to suboptimal match-based metrics such as BLEU, CodeBLEU, Syntax Match, and Exact Match.
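The snippet below is a minimal sketch of the execution-based evaluation FixEval argues for: a candidate fix counts as correct only if it passes every test case. The runner and the (stdin, expected-stdout) test-case format are illustrative, not FixEval's actual harness.

```python
import subprocess

def passes_tests(source_path, test_cases, timeout=5):
    """Run a candidate Python fix on (stdin, expected-stdout) pairs."""
    for stdin_text, expected in test_cases:
        try:
            result = subprocess.run(
                ["python", source_path],
                input=stdin_text,
                capture_output=True,
                text=True,
                timeout=timeout,
            )
        except subprocess.TimeoutExpired:
            return False  # a hang counts as a failed fix
        if result.stdout.strip() != expected.strip():
            return False  # one failing test rejects the candidate
    return True
```

A fix with the right behavior passes this check even if it shares few tokens with the reference solution, which is exactly what BLEU-style match metrics miss.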
PIE stands for Performance Improving Code Edits. PIE contains trajectories of programs, where a programmer begins with an initial, slower version and iteratively makes changes to improve the program’s performance.
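A minimal sketch of how a PIE-style trajectory can be scored follows: time the initial and the edited program on the same input and report the speedup. The two implementations below are illustrative stand-ins for a real slow/fast pair, not examples from the dataset.

```python
import time

def slow_sum_of_squares(n):
    # Initial version: explicit loop, O(n) work.
    total = 0
    for i in range(n):
        total += i * i
    return total

def fast_sum_of_squares(n):
    # Edited version: closed form for 0^2 + 1^2 + ... + (n-1)^2.
    return (n - 1) * n * (2 * n - 1) // 6

def measure(fn, n, repeats=5):
    start = time.perf_counter()
    for _ in range(repeats):
        fn(n)
    return (time.perf_counter() - start) / repeats

n = 1_000_000
assert slow_sum_of_squares(n) == fast_sum_of_squares(n)
speedup = measure(slow_sum_of_squares, n) / measure(fast_sum_of_squares, n)
print(f"speedup: {speedup:.1f}x")
```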