Tamil Alpaca Orca Dataset | Papers With Code

# Dataset Card for "tamil-alpaca"

This repository includes Tamil-translated versions of the [Alpaca dataset](https://huggingface.co/datasets/yahma/alpaca-cleaned) and of a subset of the [OpenOrca](https://huggingface.co/datasets/Open-Orca/OpenOrca) dataset.
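
For readers unfamiliar with the Alpaca layout, the sketch below shows how one record could be rendered as a prompt and response pair. It assumes the Tamil translation keeps the `instruction`, `input`, and `output` fields of the upstream Alpaca dataset, which this card does not state. The `format_record` helper, the template wording, and the sample record are hypothetical and are not taken from the Tamil LLaMA code.

```python
# Minimal sketch: render an Alpaca-style record as a (prompt, response) pair.
# Assumption: records carry "instruction", "input", and "output" keys, as in
# the upstream Alpaca dataset. The template text is illustrative only.

def format_record(record: dict) -> tuple[str, str]:
    """Return (prompt, response) for one Alpaca-style record."""
    instruction = record["instruction"].strip()
    # "input" is optional context; treat a missing or None value as empty.
    context = (record.get("input") or "").strip()
    if context:
        prompt = (
            f"### Instruction:\n{instruction}\n\n"
            f"### Input:\n{context}\n\n"
            "### Response:\n"
        )
    else:
        prompt = f"### Instruction:\n{instruction}\n\n### Response:\n"
    return prompt, record["output"].strip()


# Hypothetical sample record (not taken from the dataset).
sample = {
    "instruction": "Translate the word to English.",
    "input": "வணக்கம்",
    "output": "Hello",
}

prompt, response = format_record(sample)
print(prompt + response)
```

Records with an empty `input` field fall back to the shorter template, so the same helper covers both record shapes.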

This dataset is part of the release of the Tamil LLaMA family of models, an important step in advancing LLMs for the Tamil language. For details on the development and capabilities of these models, please read the [research paper](https://arxiv.org/abs/2311.05845). An introductory blog post outlining our journey and the models' potential impact is a work in progress.

**GitHub Repository:** [https://github.com/abhinand5/tamil-llama](https://github.com/abhinand5/tamil-llama)

## Models trained using this dataset

| Model                    | Type                        | Data              | Base Model           | # Params | Download Links                                                          |
|--------------------------|-----------------------------|-------------------|----------------------|----------|-------------------------------------------------------------------------|
| Tamil LLaMA 7B Instruct  | Instruction-following model | 145k instructions | Tamil LLaMA 7B Base  | 7B       | [HF Hub](https://huggingface.co/abhinand/tamil-llama-7b-instruct-v0.1)  |
| Tamil LLaMA 13B Instruct | Instruction-following model | 145k instructions | Tamil LLaMA 13B Base | 13B      | [HF Hub](https://huggingface.co/abhinand/tamil-llama-13b-instruct-v0.1) |

## Meet the Developers

Get to know the creators behind this innovative model and follow their contributions to the field:

- [Abhinand Balachandran](https://www.linkedin.com/in/abhinand-05/)

## Citation

If you use this model or any of the Tamil-Llama datasets in your research, please cite:

```bibtex
@misc{balachandran2023tamilllama,
      title={Tamil-Llama: A New Tamil Language Model Based on Llama 2}, 
      author={Abhinand Balachandran},
      year={2023},
      eprint={2311.05845},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
```
