This repository includes Tamil-translated versions of the Alpaca dataset and of a subset of the OpenOrca dataset.
This dataset is part of the release of the Tamil LLaMA family of models, an important step in advancing LLMs for the Tamil language. To learn more about the development and capabilities of these models, please read the research paper and the introductory blog post (WIP), which outline our journey and the models' potential impact.
GitHub Repository: https://github.com/abhinand5/tamil-llama
Model | Type | Data | Base Model | # Params | Download Links |
---|---|---|---|---|---|
Tamil LLaMA 7B Instruct | Instruction following model | 145k instructions | Tamil LLaMA 7B Base | 7B | HF Hub |
Tamil LLaMA 13B Instruct | Instruction following model | 145k instructions | Tamil LLaMA 13B Base | 13B | HF Hub |
If you use this model or any of the Tamil-Llama datasets in your research, please cite:

```bibtex
@misc{balachandran2023tamilllama,
      title={Tamil-Llama: A New Tamil Language Model Based on Llama 2},
      author={Abhinand Balachandran},
      year={2023},
      eprint={2311.05845},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
```