metadata
license: apache-2.0
datasets:
  - AIAT/Pangpuriye-dataset
  - AIAT/Pangpuriye-public_ThaiSum40k
  - AIAT/Pangpuriye-generated_by_LLama3-codeLlama
  - AIAT/Pangpuriye-public_alpaca-cleaned
  - AIAT/Pangpuriye-generated_by_typhoon
language:
  - th
  - en
pipeline_tag: text-generation
tags:
  - code_generation
  - sql
metrics:
  - accuracy

🤖 Super AI Engineer Development Program Season 4 - Pangpuriye Table-based Question Answering Model


This model was fine-tuned from the original OpenThaiGPT-1.0.1-7b and is released under the Apache License 2.0.

Example inference using Hugging Face Transformers

The following code is an example of how to run inference with our model.

from transformers import AutoModelForCausalLM, LlamaTokenizer

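def get_prediction(raw_prediction):
    # The decoded output echoes the prompt, so keep only the text
    # that follows the first closing instruction marker.
    marker = "[/INST]"
    if marker in raw_prediction:
        index = raw_prediction.index(marker)
        return raw_prediction[index + len(marker):]

    # No marker found: return the decoded text unchanged.
    return raw_prediction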

tokenizer = LlamaTokenizer.from_pretrained("AIAT/Pangpuriye-openthaigpt-1.0.0-7b-chat", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("AIAT/Pangpuriye-openthaigpt-1.0.0-7b-chat", trust_remote_code=True)

schema = """your SQL schema"""
query = "หาจำนวนลูกค้าที่เป็นเพศชาย"  # Thai: "Find the number of male customers"

prompt = f"""
    [INST] <<SYS>>
    You are a question answering assistant. Answer the question as truthful and helpful as possible คุณคือผู้ช่วยตอบคำถาม จงตอบคำถามอย่างถูกต้องและมีประโยชน์ที่สุด
    <</SYS>>
    {schema}### (sql extract) {query} [/INST]
"""

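# Tokenize the prompt and generate a continuation.
tokens = tokenizer(prompt, return_tensors="pt")
output = model.generate(
    tokens["input_ids"],
    attention_mask=tokens["attention_mask"],  # passed explicitly to avoid the missing-mask warning
    max_new_tokens=20,
    eos_token_id=tokenizer.eos_token_id,
)
# Decode the generated ids and strip the echoed prompt.
print(get_prediction(tokenizer.decode(output[0], skip_special_tokens=True)))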
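
The prompt layout and the post-processing step above can also be exercised without loading the model, which is useful for checking the surrounding plumbing. The following is a minimal sketch under stated assumptions: `build_prompt` and `run_sql` are hypothetical helpers that are not part of this repository, the `customers` schema is an invented example, the schema format the model expects is not specified here, and the `decoded` string is a hand-written stand-in for real model output. Unlike the example above, this variant of `get_prediction` also strips surrounding whitespace.

```python
import sqlite3

INST_CLOSE = "[/INST]"

# System prompt copied verbatim from the inference example above.
SYSTEM_PROMPT = (
    "You are a question answering assistant. Answer the question as truthful "
    "and helpful as possible คุณคือผู้ช่วยตอบคำถาม จงตอบคำถามอย่างถูกต้องและมีประโยชน์ที่สุด"
)


def build_prompt(schema: str, query: str) -> str:
    """Hypothetical helper: assemble the prompt in the layout shown above."""
    return (
        "\n"
        "    [INST] <<SYS>>\n"
        f"    {SYSTEM_PROMPT}\n"
        "    <</SYS>>\n"
        f"    {schema}### (sql extract) {query} [/INST]\n"
    )


def get_prediction(raw_prediction: str) -> str:
    """Return the text after the first [/INST] marker, whitespace-stripped."""
    _, sep, tail = raw_prediction.partition(INST_CLOSE)
    return tail.strip() if sep else raw_prediction.strip()


def run_sql(schema: str, sql: str) -> list:
    """Hypothetical helper: run `sql` on an empty in-memory SQLite database.

    This only checks that the statement parses and matches the schema;
    it says nothing about whether the answer is correct.
    """
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(schema)
        return conn.execute(sql).fetchall()
    finally:
        conn.close()


# Invented schema, for illustration only.
schema = "CREATE TABLE customers (id INTEGER PRIMARY KEY, gender TEXT);"
query = "หาจำนวนลูกค้าที่เป็นเพศชาย"  # "Find the number of male customers"
prompt = build_prompt(schema, query)

# Stand-in for tokenizer.decode(...): the echoed prompt plus hand-written SQL.
decoded = prompt + "SELECT COUNT(*) FROM customers WHERE gender = 'male';"

sql = get_prediction(decoded)
rows = run_sql(schema, sql)
print(sql)
print(rows)
```

Running a generated statement against an empty in-memory database is a cheap way to reject output that is not valid SQL for the given schema before it reaches real data.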

Acknowledgements

The model was developed collaboratively by the members of Pangpuriye's house during the LLM hackathon in Super AI Engineer Development Program Season 4.

We thank the organizers of this hackathon, OpenThaiGPT, AIAT, NECTEC and ThaiSC, for this challenging task and the opportunity to take part in developing a Thai large language model.

Citation Information

If our work is useful for future development, please cite our model as follows:

@misc {artificial_intelligence_association_of_thailand_2024,
    author       = { {Artificial Intelligence Association of Thailand} },
    title        = { Pangpuriye-openthaigpt-1.0.0-7b-chat (Revision 21f9a62) },
    year         = 2024,
    url          = { https://huggingface.co/AIAT/Pangpuriye-openthaigpt-1.0.0-7b-chat },
    doi          = { 10.57967/hf/2193 },
    publisher    = { Hugging Face }
}