
GPT2-Kalki

Model description

GPT2-Kalki is a GPT-2 transformer model fine-tuned on a corpus of Tamil-language data from Wikipedia, and specifically on the works of Kalki Krishnamurthy, a Tamil writer from the 1900s. The model is an experiment in "what if" scenarios involving the characters of his novels. The recently released film Ponniyin Selvan - I is based on a novel by the same author. Fine-tuning started from GPT2-Tamil, a model already trained on a Tamil dataset.

Dataset Used:

The GPT-2 model is trained on the OSCAR dataset (ta), the IndicNLP dataset (ta), and a manually scraped Wikipedia dataset consisting specifically of stories and novels. The scraped dataset will be released soon.

Usage

You can use this model for Tamil text generation:

```python
>>> from transformers import AutoTokenizer, AutoModelForCausalLM
>>> tokenizer = AutoTokenizer.from_pretrained('tsaditya/GPT-Kalki')
>>> model = AutoModelForCausalLM.from_pretrained('tsaditya/GPT-Kalki')
>>> text = "ஆதித்த கரிகாலர் தஞ்சைக்குச் செல்ல உடனடியாக ஒப்புக்கொண்டார். "
>>> # Encode the prompt as PyTorch tensors to match the PyTorch model class
>>> encoded_text = tokenizer.encode(text, return_tensors='pt')
>>> # Sample one continuation (top-k / nucleus sampling, not beam search)
>>> sample_output = model.generate(
...     encoded_text,
...     do_sample=True,
...     max_length=512,
...     top_k=50,
...     top_p=0.95,
...     num_return_sequences=1,
...     no_repeat_ngram_size=3,
...     temperature=0.7,
... )
>>> print(tokenizer.decode(sample_output[0], skip_special_tokens=True))
```
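The `temperature`, `top_k` and `top_p` arguments in the usage example control how the next token is sampled. The following is a minimal, standard-library sketch of the idea, not the `transformers` implementation: the helper name `filter_next_token_probs` and the toy logits are hypothetical and exist only for illustration.

```python
import math
import random


def filter_next_token_probs(logits, temperature=0.7, top_k=50, top_p=0.95):
    """Return {token_index: probability} after temperature, top-k and top-p filtering.

    Hypothetical helper for illustration; a simplified view of sampling-based decoding.
    """
    # Temperature below 1 sharpens the distribution; above 1 flattens it.
    scaled = [x / temperature for x in logits]

    # Numerically stable softmax over the scaled logits.
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Top-k: keep only the k most probable tokens, then renormalise.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]
    mass_k = sum(probs[i] for i in ranked)

    # Top-p (nucleus): keep the smallest prefix whose cumulative mass reaches top_p.
    kept, cumulative = [], 0.0
    for i in ranked:
        kept.append(i)
        cumulative += probs[i] / mass_k
        if cumulative >= top_p:
            break

    # Renormalise the surviving probabilities so they sum to 1.
    mass = sum(probs[i] for i in kept)
    return {i: probs[i] / mass for i in kept}


# Toy example with four candidate tokens (made-up logits).
filtered = filter_next_token_probs([2.0, 1.0, 0.5, -1.0], temperature=0.7, top_k=3, top_p=0.9)
next_token = random.choices(list(filtered), weights=list(filtered.values()))[0]
print(filtered, next_token)
```

Lower `temperature` and smaller `top_k` or `top_p` values concentrate sampling on the most likely tokens, which tends to give more conservative text. Larger values allow more varied continuations.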
