Overview

The LaMini Dataset is an instruction dataset generated using h2ogpt-gm-oasst1-en-2048-falcon-40b-v2. It is designed for instruction-tuning pre-trained models to specialize them in a variety of downstream tasks.

Dataset Generation

  • Base Model: h2oai/h2ogpt-gm-oasst1-en-2048-falcon-40b-v2.
  • Seed Instructions: Sourced from the databricks/databricks-dolly-15k dataset.
  • Generation Approach: Example-guided and topic-guided strategies.
  • Total Instructions: 1,504 unique instruction examples.

Dataset Sources

Structure

Each entry in the dataset contains:

  • Instruction
  • Response

Usage

The LaMini Dataset can be used to fine-tune language models to improve their ability to follow instructions and generate relevant responses.
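As a rough illustration of preparing entries for fine-tuning, the sketch below joins an instruction/response pair into a single training string. The lowercase field names `instruction` and `response` and the prompt template are assumptions made for this example; they are not confirmed by the dataset card, so check the actual column names before use.

```python
# Hypothetical prompt template; the dataset card does not prescribe one.
PROMPT_TEMPLATE = "### Instruction:\n{instruction}\n\n### Response:\n{response}"


def format_example(
    record: dict,
    instruction_key: str = "instruction",  # assumed column name
    response_key: str = "response",        # assumed column name
) -> str:
    """Turn one instruction/response record into a single training string."""
    instruction = record[instruction_key].strip()
    response = record[response_key].strip()
    return PROMPT_TEMPLATE.format(instruction=instruction, response=response)
```

The key names are parameters so that the function can be adapted if the real columns are named differently.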

Access

The dataset is available on Hugging Face at the following link: https://huggingface.co/datasets/SurgeGlobal/LaMini
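One possible way to fetch the data programmatically is through the Hugging Face `datasets` library. The sketch below derives the repository id from the link above and passes it to `load_dataset`. The helper names are hypothetical, the `datasets` package must be installed separately, and the available splits and configurations are not stated here, so inspect the returned object rather than assuming a particular split.

```python
from urllib.parse import urlparse

DATASET_URL = "https://huggingface.co/datasets/SurgeGlobal/LaMini"


def repo_id_from_url(url: str) -> str:
    """Extract the '<owner>/<name>' repository id from a Hugging Face dataset URL."""
    parts = urlparse(url).path.strip("/").split("/")
    if len(parts) < 3 or parts[0] != "datasets":
        raise ValueError(f"not a Hugging Face dataset URL: {url}")
    return "/".join(parts[1:3])


def load_lamini():
    """Download the dataset (needs network access and the `datasets` package)."""
    from datasets import load_dataset  # third-party: pip install datasets

    return load_dataset(repo_id_from_url(DATASET_URL))
```

Calling `load_lamini()` and printing the result should show which splits and columns are present.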

Citation

If you find our work useful, please cite our paper as follows:

@misc{surge2024openbezoar,
      title={OpenBezoar: Small, Cost-Effective and Open Models Trained on Mixes of Instruction Data}, 
      author={Chandeepa Dissanayake and Lahiru Lowe and Sachith Gunasekara and Yasiru Ratnayake},
      year={2024},
      eprint={2404.12195},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}

Dataset Authors

Chandeepa Dissanayake, Lahiru Lowe, Sachith Gunasekara, and Yasiru Ratnayake

License


  • Apache 2.0
