CodeXGLUE is a benchmark dataset and open challenge for code intelligence. It includes a collection of code intelligence tasks and a platform for model evaluation and comparison. CodeXGLUE stands for General Language Understanding Evaluation benchmark for CODE. It includes 14 datasets for 10 diversified code intelligence tasks covering the following scenarios:
161 PAPERS • 15 BENCHMARKS
PyTorrent contains 218,814 Python package libraries from PyPI and Anaconda environment. This is because earlier studies have shown that much of the code is redundant and Python packages from these environments are better in quality and are well-documented. PyTorrent enables users (such as data scientists, students, etc.) to build off the shelf machine learning models directly without spending months of effort on large infrastructure.
4 PAPERS • NO BENCHMARKS YET
Description
3 PAPERS • NO BENCHMARKS YET
SLNET is collection of third party Simulink models. It is curated via mining open source repository (GitHub and Matlab Central) using SLNET-Miner (https://github.com/50417/SLNet_Miner).
2 PAPERS • NO BENCHMARKS YET
DotPrompts is a set of testcases derived from PragmaticCode, such that each testcase consists of a prompt to a dereference location (a code location having the "." operator in Java). It is primarily meant as a benchmark for Code LMs.
1 PAPER • 1 BENCHMARK
MMCode is a multi-modal code generation dataset designed to evaluate the problem-solving skills of code language models in visually rich contexts (i.e. images). It contains 3,548 questions paired with 6,620 images, derived from real-world programming challenges across 10 code competition websites, with Python solutions and tests provided. The dataset emphasizes the extreme demand for reasoning abilities, the interwoven nature of textual and visual contents, and the occurrence of questions containing multiple images.
1 PAPER • NO BENCHMARKS YET
Syntax-Aware Fill-in-the-Middle (SAFIM) is a benchmark for evaluating Large Language Models (LLMs) on the code Fill-in-the-Middle (FIM) task. SAFIM has three subtasks: Algorithmic Block Completion, Control-Flow Expression Completion, and API Function Call Completion. SAFIM is sourced from code submitted from April 2022 to January 2023 to minimize the impact of data contamination on evaluation results.
Verified Smart Contracts Code Comments is a dataset of real Ethereum smart contract functions, containing "code, comment" pairs of both Solidity and Vyper source code. The dataset is based on every deployed Ethereum smart contract as of 1st of April 2022, whose been verified on Etherscan and has a least one transaction. A total of 1,541,370 smart contract functions are provided, parsed from 186,397 unique smart contracts, filtered down from 2,217,692 smart contracts.
Verified Smart Contracts is a dataset of real Ethereum smart contracts, containing both Solidity and Vyper source code. It consists of every deployed Ethereum smart contract as of 1st of April 2022, whose been verified on Etherscan and has a least one transaction. A total of 186,397 unique smart contracts are provided, filtered down from 2,217,692 smart contracts. The dataset contains 53,843,305 lines of code.