Tamil Alpaca

Introduced by Balachandran in Tamil-Llama: A New Tamil Language Model Based on Llama 2

Dataset Card for "tamil-alpaca"

This repository includes a Tamil-translated version of the Alpaca dataset.
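As a rough illustration of what an Alpaca-style record looks like and how it is typically turned into training text, here is a minimal sketch. It assumes the translated records keep the original Stanford Alpaca field names (`instruction`, `input`, `output`) and uses the original English Alpaca prompt template; neither is confirmed by this card, and the Tamil LLaMA models may use a different template. The helper names `build_prompt` and `build_training_text` are hypothetical.

```python
from typing import TypedDict


class AlpacaRecord(TypedDict):
    # Field names follow the original Stanford Alpaca format (assumed here,
    # not confirmed for the Tamil translation).
    instruction: str
    input: str
    output: str


# Original Alpaca prompt templates (English). The Tamil LLaMA models may
# use a different template; this is only an illustration.
PROMPT_WITH_INPUT = (
    "Below is an instruction that describes a task, paired with an input "
    "that provides further context. Write a response that appropriately "
    "completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n"
    "### Input:\n{input}\n\n"
    "### Response:\n"
)
PROMPT_NO_INPUT = (
    "Below is an instruction that describes a task. Write a response that "
    "appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n"
    "### Response:\n"
)


def build_prompt(record: AlpacaRecord) -> str:
    """Render the prompt part of a record, choosing the template by whether
    the optional `input` field is non-empty."""
    if record.get("input", "").strip():
        return PROMPT_WITH_INPUT.format(
            instruction=record["instruction"], input=record["input"]
        )
    return PROMPT_NO_INPUT.format(instruction=record["instruction"])


def build_training_text(record: AlpacaRecord) -> str:
    """Concatenate the prompt and the reference response into one string."""
    return build_prompt(record) + record["output"]


if __name__ == "__main__":
    # Hypothetical record, made up for illustration; not taken from the dataset.
    example: AlpacaRecord = {
        "instruction": "Translate the input to Tamil.",
        "input": "Hello",
        "output": "வணக்கம்",
    }
    print(build_training_text(example))
```

Keeping the prompt builder separate from the full training text makes it possible to reuse the same template at inference time, when no reference response is available.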

This dataset is part of the release of the Tamil LLaMA family of models, an important step in advancing LLMs for the Tamil language. For details on the development and capabilities of these models, please read the research paper and the introductory blog post (WIP), which outline our journey and the models' potential impact.

GitHub Repository: https://github.com/abhinand5/tamil-llama

Models trained using this dataset

Model | Type | Data | Base Model | # Params | Download Links
Tamil LLaMA 7B Instruct | Instruction-following model | 145k instructions | Tamil LLaMA 7B Base | 7B | HF Hub
Tamil LLaMA 13B Instruct | Instruction-following model | 145k instructions | Tamil LLaMA 13B Base | 13B | HF Hub

Citation

If you use this model or any of the Tamil-Llama datasets in your research, please cite:

@misc{balachandran2023tamilllama,
      title={Tamil-Llama: A New Tamil Language Model Based on Llama 2}, 
      author={Abhinand Balachandran},
      year={2023},
      eprint={2311.05845},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}

License


  • gpl-3.0
