---
library_name: transformers
language:
- pt
pipeline_tag: text-generation
model-index:
- name: internlm-chatbode-20b
  results:
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: ENEM Challenge (No Images)
      type: eduagarcia/enem_challenge
      split: train
      args:
        num_few_shot: 3
    metrics:
    - type: acc
      value: 65.78
      name: accuracy
    source:
      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=recogna-nlp/internlm-chatbode-20b
      name: Open Portuguese LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: BLUEX (No Images)
      type: eduagarcia-temp/BLUEX_without_images
      split: train
      args:
        num_few_shot: 3
    metrics:
    - type: acc
      value: 58.69
      name: accuracy
    source:
      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=recogna-nlp/internlm-chatbode-20b
      name: Open Portuguese LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: OAB Exams
      type: eduagarcia/oab_exams
      split: train
      args:
        num_few_shot: 3
    metrics:
    - type: acc
      value: 43.33
      name: accuracy
    source:
      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=recogna-nlp/internlm-chatbode-20b
      name: Open Portuguese LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: Assin2 RTE
      type: assin2
      split: test
      args:
        num_few_shot: 15
    metrics:
    - type: f1_macro
      value: 91.53
      name: f1-macro
    source:
      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=recogna-nlp/internlm-chatbode-20b
      name: Open Portuguese LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: Assin2 STS
      type: eduagarcia/portuguese_benchmark
      split: test
      args:
        num_few_shot: 15
    metrics:
    - type: pearson
      value: 78.95
      name: pearson
    source:
      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=recogna-nlp/internlm-chatbode-20b
      name: Open Portuguese LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: FaQuAD NLI
      type: ruanchaves/faquad-nli
      split: test
      args:
        num_few_shot: 15
    metrics:
    - type: f1_macro
      value: 81.36
      name: f1-macro
    source:
      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=recogna-nlp/internlm-chatbode-20b
      name: Open Portuguese LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: HateBR Binary
      type: ruanchaves/hatebr
      split: test
      args:
        num_few_shot: 25
    metrics:
    - type: f1_macro
      value: 81.72
      name: f1-macro
    source:
      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=recogna-nlp/internlm-chatbode-20b
      name: Open Portuguese LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: PT Hate Speech Binary
      type: hate_speech_portuguese
      split: test
      args:
        num_few_shot: 25
    metrics:
    - type: f1_macro
      value: 73.66
      name: f1-macro
    source:
      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=recogna-nlp/internlm-chatbode-20b
      name: Open Portuguese LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: tweetSentBR
      type: eduagarcia/tweetsentbr_fewshot
      split: test
      args:
        num_few_shot: 25
    metrics:
    - type: f1_macro
      value: 70.11
      name: f1-macro
    source:
      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=recogna-nlp/internlm-chatbode-20b
      name: Open Portuguese LLM Leaderboard
---

# internlm-chatbode-20b
InternLm-ChatBode is a language model tuned for Portuguese, built on top of [InternLM2](https://huggingface.co/internlm/internlm2-chat-20b). The model was refined by fine-tuning on the UltraAlpaca dataset.

## Key Features

- **Base Model:** [internlm/internlm2-chat-20b](https://huggingface.co/internlm/internlm2-chat-20b)
- **Fine-tuning Dataset:** UltraAlpaca
- **Training:** The model was fine-tuned from internlm2-chat-20b using QLoRA.

## Usage Example

The following code shows how to load and use the model:

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("recogna-nlp/internlm-chatbode-20b", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    "recogna-nlp/internlm-chatbode-20b",
    torch_dtype=torch.float16,
    trust_remote_code=True,
).cuda()
model = model.eval()

response, history = model.chat(tokenizer, "Olá", history=[])
print(response)

response, history = model.chat(tokenizer, "O que é o Teorema de Pitágoras? Me dê um exemplo", history=history)
print(response)
```

Responses can also be generated as a stream with the `stream_chat` method:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "recogna-nlp/internlm-chatbode-20b"
model = AutoModelForCausalLM.from_pretrained(model_path, torch_dtype=torch.float16, trust_remote_code=True).cuda()
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
model = model.eval()

# Print only the newly generated suffix on each streamed update
length = 0
for response, history in model.stream_chat(tokenizer, "Olá", history=[]):
    print(response[length:], flush=True, end="")
    length = len(response)
```

# Open Portuguese LLM Leaderboard Evaluation Results

Detailed results can be found [here](https://huggingface.co/datasets/eduagarcia-temp/llm_pt_leaderboard_raw_results/tree/main/recogna-nlp/internlm-chatbode-20b) and on the [🚀 Open Portuguese LLM Leaderboard](https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard).

| Metric                   | Value   |
|--------------------------|---------|
|Average                   |**71.68**|
|ENEM Challenge (No Images)|    65.78|
|BLUEX (No Images)         |    58.69|
|OAB Exams                 |    43.33|
|Assin2 RTE                |    91.53|
|Assin2 STS                |    78.95|
|FaQuAD NLI                |    81.36|
|HateBR Binary             |    81.72|
|PT Hate Speech Binary     |    73.66|
|tweetSentBR               |    70.11|
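As a quick sanity check, and assuming the leaderboard's "Average" is an unweighted mean over the nine tasks, the headline number can be reproduced from the per-task scores in the table above:

```python
# Per-task scores copied from the leaderboard table above
scores = {
    "ENEM Challenge (No Images)": 65.78,
    "BLUEX (No Images)": 58.69,
    "OAB Exams": 43.33,
    "Assin2 RTE": 91.53,
    "Assin2 STS": 78.95,
    "FaQuAD NLI": 81.36,
    "HateBR Binary": 81.72,
    "PT Hate Speech Binary": 73.66,
    "tweetSentBR": 70.11,
}

# Unweighted mean, rounded to two decimals
average = round(sum(scores.values()) / len(scores), 2)
print(average)  # 71.68, matching the reported Average
```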