metadata

language:
  - as
  - bn
  - brx
  - doi
  - en
  - gom
  - gu
  - hi
  - kn
  - ks
  - kas
  - mai
  - ml
  - mr
  - mni
  - mnb
  - ne
  - or
  - pa
  - sa
  - sat
  - sd
  - snd
  - ta
  - te
  - ur
language_details: >-
  asm_Beng, ben_Beng, brx_Deva, doi_Deva, eng_Latn, gom_Deva, guj_Gujr,
  hin_Deva, kan_Knda, kas_Arab, kas_Deva, mai_Deva, mal_Mlym, mar_Deva,
  mni_Beng, mni_Mtei, npi_Deva, ory_Orya, pan_Guru, san_Deva, sat_Olck,
  snd_Arab, snd_Deva, tam_Taml, tel_Telu, urd_Arab
tags:
  - indictrans2
  - translation
  - ai4bharat
  - multilingual
license: mit
datasets:
  - flores-200
  - IN22-Gen
  - IN22-Conv
metrics:
  - bleu
  - chrf
  - chrf++
  - comet
inference: false

IndicTrans2

This is the model card for the IndicTrans2 Indic-En Distilled 200M variant.

Please refer to Section 7.6 (Distilled Models) of the TMLR submission for further details on model training, data, and metrics.

Usage Instructions

Please refer to the GitHub repository for a detailed description of how to use the HF-compatible IndicTrans2 models for inference.

Note: IndicTrans2 is not compatible with AutoTokenizer; we therefore provide IndicTransTokenizer.
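
The language_details field in the metadata lists the language-script tags covered by this card. The following is a minimal sketch of how a caller might check a language pair before running inference, assuming those tags are the identifiers used to select source and target languages. The names SUPPORTED_TAGS, validate_pair, and split_tag are hypothetical helpers written for illustration; they are not part of IndicTrans2 or IndicTransTokenizer.

```python
from typing import Tuple

# Tags copied verbatim from the language_details field of this model card.
SUPPORTED_TAGS = frozenset(
    """
    asm_Beng ben_Beng brx_Deva doi_Deva eng_Latn gom_Deva guj_Gujr
    hin_Deva kan_Knda kas_Arab kas_Deva mai_Deva mal_Mlym mar_Deva
    mni_Beng mni_Mtei npi_Deva ory_Orya pan_Guru san_Deva sat_Olck
    snd_Arab snd_Deva tam_Taml tel_Telu urd_Arab
    """.split()
)

ENGLISH = "eng_Latn"


def split_tag(tag: str) -> Tuple[str, str]:
    """Split a tag such as 'hin_Deva' into (language, script)."""
    language, script = tag.split("_")
    return language, script


def validate_pair(src_tag: str, tgt_tag: str = ENGLISH) -> Tuple[str, str]:
    """Hypothetical check for an Indic-to-English (Indic-En) language pair."""
    if src_tag not in SUPPORTED_TAGS:
        raise ValueError(f"unsupported source tag: {src_tag!r}")
    if src_tag == ENGLISH:
        raise ValueError("source must be an Indic language for Indic-En")
    if tgt_tag != ENGLISH:
        raise ValueError("target must be eng_Latn for Indic-En")
    return src_tag, tgt_tag


if __name__ == "__main__":
    print(validate_pair("hin_Deva"))
    print(split_tag("mni_Mtei"))
```

Because this variant translates in the Indic-En direction, the sketch fixes the target to eng_Latn and rejects English as a source. The actual inference calls are described in the GitHub repository.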

Citation

If you use our work, please cite it as:

@article{gala2023indictrans,
title={IndicTrans2: Towards High-Quality and Accessible Machine Translation Models for all 22 Scheduled Indian Languages},
author={Jay Gala and Pranjal A Chitale and A K Raghavan and Varun Gumma and Sumanth Doddapaneni and Aswanth Kumar M and Janki Atul Nawale and Anupama Sujatha and Ratish Puduppully and Vivek Raghavan and Pratyush Kumar and Mitesh M Khapra and Raj Dabre and Anoop Kunchukuttan},
journal={Transactions on Machine Learning Research},
issn={2835-8856},
year={2023},
url={https://openreview.net/forum?id=vfT4YuzAYA},
note={}
}