Code Generation
331 papers with code • 15 benchmarks • 43 datasets
Code Generation is the task of predicting explicit code or program structure from multimodal sources such as incomplete code, programs in another programming language, natural language descriptions, or execution examples. Code generation tools can support the development of automatic programming systems and improve programming productivity.
Source: Deep Learning for Source Code Modeling and Generation
Image source: Measuring Coding Challenge Competence With APPS
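As a concrete illustration of the natural-language-to-code setting described above, the sketch below prompts an open causal code model from the Hugging Face Hub to complete a Python function from a comment. The checkpoint name and decoding settings are assumptions for illustration, not a reference implementation; any causal code model would work the same way.

```python
# Minimal sketch: natural-language-to-code generation with an open model.
# The checkpoint "Salesforce/codegen-350M-mono" is an assumed example;
# substitute any causal code model available on the Hugging Face Hub.
from transformers import pipeline

generator = pipeline("text-generation", model="Salesforce/codegen-350M-mono")

# A natural language description framed as a comment, plus a function header.
prompt = "# Return the n-th Fibonacci number\ndef fib(n):"
out = generator(prompt, max_new_tokens=64, do_sample=False)
print(out[0]["generated_text"])
```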
Libraries
Use these libraries to find Code Generation models and implementations.
Most implemented papers
LLaMA: Open and Efficient Foundation Language Models
We introduce LLaMA, a collection of foundation language models ranging from 7B to 65B parameters.
Llama 2: Open Foundation and Fine-Tuned Chat Models
In this work, we develop and release Llama 2, a collection of pretrained and fine-tuned large language models (LLMs) ranging in scale from 7 billion to 70 billion parameters.
Evaluating Large Language Models Trained on Code
We introduce Codex, a GPT language model fine-tuned on publicly available code from GitHub, and study its Python code-writing capabilities.
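The Codex paper also introduces the HumanEval benchmark and reports results with the pass@k metric. Below is a short sketch of the unbiased pass@k estimator described in the paper: given n generated samples per problem, of which c pass the unit tests, it estimates the probability that at least one of k randomly drawn samples is correct.

```python
import numpy as np

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator from the Codex paper.

    n: total samples generated for a problem
    c: number of those samples that pass the unit tests
    k: size of the random subset considered
    """
    if n - c < k:
        return 1.0  # every size-k subset must contain a correct sample
    # Equals 1 - C(n-c, k) / C(n, k), computed as a numerically stable product.
    return 1.0 - np.prod(1.0 - k / np.arange(n - c + 1, n + 1))

# Example: 200 samples, 37 correct -> estimated pass@1
print(pass_at_k(200, 37, 1))
```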
pix2code: Generating Code from a Graphical User Interface Screenshot
Transforming a graphical user interface screenshot created by a designer into computer code is a typical task conducted by a developer in order to build customized software, websites, and mobile applications.
GPT-4 Technical Report
We report the development of GPT-4, a large-scale, multimodal model which can accept image and text inputs and produce text outputs.
StructVAE: Tree-structured Latent Variable Models for Semi-supervised Semantic Parsing
Semantic parsing is the task of transducing natural language (NL) utterances into formal meaning representations (MRs), commonly represented as tree structures.
A Syntactic Neural Model for General-Purpose Code Generation
We consider the problem of parsing natural language descriptions into source code written in a general-purpose programming language like Python.
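This model decodes an abstract syntax tree rather than a flat token sequence, so every output is syntactically well-formed by construction. As a rough illustration of that target representation, using Python's standard ast module rather than the paper's grammar, a small statement can be viewed as the kind of tree such a decoder generates:

```python
import ast

# Parse a small Python statement into its abstract syntax tree; a
# syntax-driven decoder emits structures like this instead of raw tokens.
tree = ast.parse("x = sorted(lst)")
print(ast.dump(tree, indent=2))
```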
A parallel corpus of Python functions and documentation strings for automated code documentation and code generation
Automated documentation of programming source code and automated code generation from natural language are challenging tasks of both practical and scientific interest.
CodeT5: Identifier-aware Unified Pre-trained Encoder-Decoder Models for Code Understanding and Generation
We present CodeT5, a unified pre-trained encoder-decoder Transformer model that better leverages the code semantics conveyed from the developer-assigned identifiers.
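CodeT5 is pretrained with identifier-aware span masking, so it can fill a masked span in code with a standard seq2seq generate call. A hedged usage sketch, assuming the Salesforce/codet5-base checkpoint on the Hugging Face Hub:

```python
from transformers import AutoTokenizer, T5ForConditionalGeneration

# Assumed checkpoint; CodeT5 exposes the standard T5 encoder-decoder API.
tokenizer = AutoTokenizer.from_pretrained("Salesforce/codet5-base")
model = T5ForConditionalGeneration.from_pretrained("Salesforce/codet5-base")

# <extra_id_0> marks the masked span the decoder should fill in.
code = "def greet(user): print(f'hello <extra_id_0>!')"
inputs = tokenizer(code, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=8)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```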
CodeGen: An Open Large Language Model for Code with Multi-Turn Program Synthesis
To democratize program synthesis with large language models, we train and release CodeGen, a family of large language models of up to 16.1B parameters trained on natural language and programming language data, and open-source the training library JAXFORMER.
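Multi-turn program synthesis decomposes a specification into a sequence of prompts, each conditioned on the code generated so far. A minimal sketch of that loop, reusing the assumed CodeGen checkpoint from above; the turn prompts are illustrative, not drawn from the paper's benchmark:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Salesforce/codegen-350M-mono")
model = AutoModelForCausalLM.from_pretrained("Salesforce/codegen-350M-mono")

# Each turn adds one sub-specification on top of the code produced so far.
turns = [
    "# Step 1: define a function that loads a CSV file into a list of rows\n",
    "# Step 2: define a function that averages a numeric column\n",
]

context = ""
for turn in turns:
    context += turn
    inputs = tokenizer(context, return_tensors="pt")
    output = model.generate(**inputs, max_new_tokens=64, do_sample=False)
    # The decoded output contains the prompt plus the new continuation,
    # which becomes the context for the next turn.
    context = tokenizer.decode(output[0], skip_special_tokens=True)

print(context)
```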