MIMIC-IT

Introduced by Li et al. in Otter: A Multi-Modal Model with In-Context Instruction Tuning

MultI-Modal In-Context Instruction Tuning (MIMIC-IT) is a dataset for instruction tuning of multi-modal models, motivated by the interleaved format of the Flamingo model's upstream pretraining dataset. Each data sample consists of a queried image-instruction-answer triplet, in which the instruction and answer are tailored to the image, together with a context. The context contains a series of image-instruction-answer triplets that contextually correlate with the queried triplet, emulating the relationship between the context and the queried image-text pair found in the MMC4 dataset.
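
The sample layout described above can be sketched as a small data structure. This is a minimal illustration only: the class names (`Triplet`, `MimicItSample`), the field names, and the prompt template in `to_interleaved_prompt` are assumptions made for this example, not the dataset's actual schema or the Otter training format.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Triplet:
    """One image-instruction-answer triplet (field names are hypothetical)."""
    image: str        # assumed to be an image path or identifier
    instruction: str
    answer: str


@dataclass
class MimicItSample:
    """A queried triplet plus the in-context triplets that relate to it."""
    query: Triplet
    context: List[Triplet] = field(default_factory=list)


def to_interleaved_prompt(sample: MimicItSample) -> str:
    """Lay out context triplets before the query, leaving the query answer open.

    The "<image:...> Instruction: ... Answer: ..." template is an assumption
    chosen for readability, not a format taken from the paper.
    """
    lines = [
        f"<image:{t.image}> Instruction: {t.instruction} Answer: {t.answer}"
        for t in sample.context
    ]
    q = sample.query
    lines.append(f"<image:{q.image}> Instruction: {q.instruction} Answer:")
    return "\n".join(lines)


# Hypothetical example values, invented purely for illustration.
example = MimicItSample(
    query=Triplet("img_003", "What is the person holding?", "An umbrella."),
    context=[
        Triplet("img_001", "What is the weather like?", "It is raining."),
        Triplet("img_002", "Where is the scene set?", "On a city street."),
    ],
)
prompt = to_interleaved_prompt(example)
```

Keeping the query separate from the context makes it explicit which answer a model is expected to produce and which triplets serve only as in-context demonstrations.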

Source: Otter: A Multi-Modal Model with In-Context Instruction Tuning

Tasks

Instruction Following

Similar Datasets

GRIT

VisIT-Bench

Bongard-OpenWorld

SciGraphQA

License

MIT license

Modalities

Images
Texts
