limited pre-training and guard rails
Hi Abinaya and team,
This is a great effort and a significant contribution for Tamil. I think you should write a paper on this topic and post it to arXiv.
Can you release the methodology for training, tokenization, and encoding representations?
However, the model seems to have only limited self-correction and guard rails, and only limited cleanup of personally identifiable information. This should be mentioned in the announcement and user guide. More guard rails should be added to the model, and the risk of harmful content generation should be listed.
Thank you
-Muthu Annamalai
Hi @mannamalai: Sure, we will try to write up a paper covering the points you mentioned. This model was built as part of a hackathon, and the amount of data used to pretrain it is very small. We plan to improve the model further as part of the AI Tamil Nadu initiative.