SurgeGlobal/LaMini Dataset | Papers With Code

## Overview
The LaMini Dataset is an instruction dataset generated using [h2ogpt-gm-oasst1-en-2048-falcon-40b-v2](https://huggingface.co/h2oai/h2ogpt-gm-oasst1-en-2048-falcon-40b-v2). It is designed for instruction-tuning pre-trained models to specialize them in a variety of downstream tasks.

## Dataset Generation
- **Base Model**: h2oai/h2ogpt-gm-oasst1-en-2048-falcon-40b-v2.
- **Seed Instructions**: Sourced from the databricks/databricks-dolly-15k dataset.
- **Generation Approach**: Example-guided and topic-guided strategies.
- **Total Instructions**: 1,504 unique instruction examples.

### Dataset Sources

- **Repository:** [Bitbucket Project](https://bitbucket.org/paladinanalytics/workspace/projects/OP)
- **Paper:** [Pre-Print](https://arxiv.org/abs/2404.12195)

## Structure
Each entry in the dataset contains:
- **Instruction**
- **Response**
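
As a minimal sketch of this two-field structure, the snippet below models one entry and checks that both fields are present. The lowercase key names `instruction` and `response`, the `LaMiniRecord` type, the `is_valid_record` helper, and the sample entry are all assumptions made for illustration; the actual column names should be checked against the published dataset.

```python
from typing import TypedDict


class LaMiniRecord(TypedDict):
    """One dataset entry. Key names are assumed, not confirmed by this card."""

    instruction: str
    response: str


def is_valid_record(record: dict) -> bool:
    """Return True if both fields are present and are non-blank strings."""
    return all(
        isinstance(record.get(key), str) and bool(record[key].strip())
        for key in ("instruction", "response")
    )


# Hypothetical entry, invented purely for illustration (not taken from the dataset).
example: LaMiniRecord = {
    "instruction": "Name three primary colors.",
    "response": "Red, yellow, and blue.",
}
```

A check like this can be run over every entry before training to catch empty or malformed rows.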

## Usage
The LaMini Dataset can be used to fine-tune language models to improve their ability to follow instructions and generate relevant responses.
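
One way to prepare entries for supervised fine-tuning is to join each instruction and response into a single training string. The sketch below assumes an Alpaca-style template, lowercase `instruction`/`response` keys, and a `</s>` end-of-sequence token; none of these are confirmed as what the dataset authors used, and `to_training_text` is a hypothetical helper.

```python
# Assumed prompt layout; swap in whatever template your base model expects.
PROMPT_TEMPLATE = "### Instruction:\n{instruction}\n\n### Response:\n"


def to_training_text(record: dict, eos_token: str = "</s>") -> str:
    """Join one instruction/response pair into a single training string."""
    prompt = PROMPT_TEMPLATE.format(instruction=record["instruction"].strip())
    # The EOS token marks where the model should stop generating.
    return prompt + record["response"].strip() + eos_token
```

The `eos_token` default is only a placeholder; in practice it would typically come from the tokenizer of the model being fine-tuned.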

## Access
The dataset is available on Hugging Face: [https://huggingface.co/datasets/SurgeGlobal/LaMini](https://huggingface.co/datasets/SurgeGlobal/LaMini)
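
A minimal sketch of loading the dataset with the Hugging Face `datasets` library follows. It assumes the library is installed, network access is available, and the repository exposes a `train` split; the split name and the `load_lamini` wrapper are assumptions for illustration.

```python
DATASET_ID = "SurgeGlobal/LaMini"


def load_lamini(split: str = "train"):
    """Download (or load from cache) the requested split of the dataset.

    The default split name "train" is an assumption; check the dataset page.
    """
    # Imported lazily so this module can be imported without `datasets` installed.
    from datasets import load_dataset

    return load_dataset(DATASET_ID, split=split)
```

Calling `load_lamini()` and indexing the result (for example `load_lamini()[0]`) should return a single entry as a dictionary.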

## Citation
If you find our work useful, please cite our paper as follows:
```
@misc{surge2024openbezoar,
      title={OpenBezoar: Small, Cost-Effective and Open Models Trained on Mixes of Instruction Data}, 
      author={Chandeepa Dissanayake and Lahiru Lowe and Sachith Gunasekara and Yasiru Ratnayake},
      year={2024},
      eprint={2404.12195},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
```

## Dataset Authors

Chandeepa Dissanayake, Lahiru Lowe, Sachith Gunasekara, and Yasiru Ratnayake

## Similar Datasets
- SurgeGlobal/Evol-Instruct
- SurgeGlobal/Orca
