MMedLM

💻 GitHub Repo 🖨️ arXiv Paper

The official model weights for "Towards Building Multilingual Language Model for Medicine".

Introduction

This repo contains MMedLM 2-1.8B, a multilingual medical foundation model with 1.8 billion parameters. MMedLM 2-1.8B builds on InternLM 2-1.8B and has been further pretrained on MMedC, a comprehensive multilingual medical corpus, which enhances the model's medical-domain knowledge. With continued auto-regressive training on MMedC, MMedLM 2-1.8B exceeds the performance of most 7B models on MMedBench, including InternLM and LLaMA 2.

The model underwent further pretraining on MMedC with the following hyperparameters:

  • Iterations: 15000
  • Global batch size: 512
  • Cutoff length: 2048
  • Learning rate: 2e-5

The model can be loaded as follows:

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
tokenizer = AutoTokenizer.from_pretrained("Henrychur/MMedLM2-1.8B", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("Henrychur/MMedLM2-1.8B", torch_dtype=torch.float16, trust_remote_code=True)
  • Note that this is a foundation model that has not undergone instruction fine-tuning.
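
Because this is a base model, it is prompted as a text-completion model rather than through a chat template. The following is a minimal sketch of that usage. The helper names `build_completion_prompt` and `complete`, the prompt layout, and the generation settings are illustrative assumptions, not a format recommended by the authors.

```python
def build_completion_prompt(question: str, options: dict[str, str]) -> str:
    """Format a multiple-choice question as plain text ending in 'Answer:'.

    The layout is a hypothetical example; a base model simply continues it.
    """
    lines = [f"Question: {question}"]
    lines += [f"{key}. {text}" for key, text in options.items()]
    lines.append("Answer:")
    return "\n".join(lines)


def complete(prompt: str, max_new_tokens: int = 32) -> str:
    """Greedy-decode a continuation of `prompt` (downloads the weights on first use)."""
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    name = "Henrychur/MMedLM2-1.8B"
    device = "cuda" if torch.cuda.is_available() else "cpu"
    # float16 on GPU as in the loading snippet above; float32 on CPU is an
    # assumption made here for portability.
    dtype = torch.float16 if device == "cuda" else torch.float32
    tokenizer = AutoTokenizer.from_pretrained(name, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(
        name, torch_dtype=dtype, trust_remote_code=True
    ).to(device)
    model.eval()
    inputs = tokenizer(prompt, return_tensors="pt").to(device)
    with torch.no_grad():
        output_ids = model.generate(
            input_ids=inputs["input_ids"],
            attention_mask=inputs["attention_mask"],
            max_new_tokens=max_new_tokens,
            do_sample=False,  # greedy decoding
        )
    # Drop the prompt tokens so only the newly generated text is returned.
    new_tokens = output_ids[0, inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)


prompt = build_completion_prompt(
    "Which vitamin deficiency causes scurvy?",
    {"A": "Vitamin A", "B": "Vitamin C", "C": "Vitamin D"},
)
print(prompt)
# print(complete(prompt))  # uncomment to run generation; requires the model weights
```

Since the model has not been instruction-tuned, the continuation may run past the answer, so a small `max_new_tokens` or post-processing of the output is usually needed.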

News

[2024.3.1] We release MMedLM 2-1.8B, a lightweight 1.8B model based on InternLM 2-1.8B. With continued auto-regressive training on MMedC, MMedLM 2-1.8B exceeds the performance of most 7B models, including InternLM and LLaMA 2.

[2024.2.21] Our pre-print paper is released on arXiv. Dive into our findings here.

[2024.2.20] We release MMedLM and MMedLM 2. With continued auto-regressive training on MMedC, these models achieve superior performance compared to all other open-source models, even rivaling GPT-4 on MMedBench.

[2024.2.20] We release MMedC, a multilingual medical corpus containing 25.5B tokens.

[2024.2.20] We release MMedBench, a new multilingual medical multiple-choice question-answering benchmark with rationales. Check out the leaderboard here.

Evaluation on MMedBench

The further pretrained MMedLM 2 shows strong performance in the medical domain across different languages.

Method           Size  Year     MMedC  MMedBench  English  Chinese  Japanese  French  Russian  Spanish  Avg.
GPT-3.5          -     2022.12  ✗      ✗          56.88    52.29    34.63     32.48   66.36    66.06    51.47
GPT-4            -     2023.3   ✗      ✗          78.00    75.07    72.91     56.59   83.62    85.67    74.27
Gemini-1.0 pro   -     2024.1   ✗      ✗          53.73    60.19    44.22     29.90   73.44    69.69    55.20
BLOOMZ           7B    2023.5   ✗      trainset   43.28    58.06    32.66     26.37   62.89    47.34    45.10
InternLM         7B    2023.7   ✗      trainset   44.07    64.62    37.19     24.92   58.20    44.97    45.67
Llama 2          7B    2023.7   ✗      trainset   43.36    50.29    25.13     20.90   66.80    47.10    42.26
MedAlpaca        7B    2023.3   ✗      trainset   46.74    44.80    29.64     21.06   59.38    45.00    41.11
ChatDoctor       7B    2023.4   ✗      trainset   43.52    43.26    25.63     18.81   62.50    43.44    39.53
PMC-LLaMA        7B    2023.4   ✗      trainset   47.53    42.44    24.12     20.74   62.11    43.29    40.04
Mistral          7B    2023.10  ✗      trainset   61.74    71.10    44.72     48.71   74.22    63.86    60.73
InternLM 2       1.8B  2024.2   ✗      trainset   38.49    64.10    32.16     18.01   53.91    36.83    40.58
InternLM 2       7B    2024.2   ✗      trainset   57.27    77.55    47.74     41.00   68.36    59.59    58.59
MMedLM (Ours)    7B    -        ✓      trainset   49.88    70.49    46.23     36.66   72.27    54.52    55.01
MMedLM 2 (Ours)  7B    -        ✓      trainset   61.74    80.01    61.81     52.09   80.47    67.65    67.30
MMedLM 2 (Ours)  1.8B  -        ✓      trainset   45.40    66.78    42.21     25.56   69.14    43.40    48.75
  • GPT and Gemini are evaluated in the zero-shot setting through their APIs.
  • Open-source models are first trained on the MMedBench trainset before evaluation.
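
The claim that the 1.8B model exceeds most 7B models can be checked against the Avg. column of the table above. The snippet below is a small sketch that copies those averages by hand and counts the 7B baselines that MMedLM 2-1.8B outperforms; the dictionary and variable names are illustrative.

```python
# Avg. scores copied from the table above (7B open-source baselines, MMedLM excluded).
baseline_7b_avg = {
    "BLOOMZ": 45.10,
    "InternLM": 45.67,
    "Llama 2": 42.26,
    "MedAlpaca": 41.11,
    "ChatDoctor": 39.53,
    "PMC-LLaMA": 40.04,
    "Mistral": 60.73,
    "InternLM 2": 58.59,
}
mmedlm2_1_8b_avg = 48.75
internlm2_1_8b_avg = 40.58  # base model before further pretraining on MMedC

# Names of the 7B baselines whose average is below that of MMedLM 2-1.8B.
beaten = sorted(name for name, avg in baseline_7b_avg.items() if mmedlm2_1_8b_avg > avg)
# Average-score improvement over the base model, rounded to two decimals.
gain = round(mmedlm2_1_8b_avg - internlm2_1_8b_avg, 2)

print(f"Outperforms {len(beaten)} of {len(baseline_7b_avg)} 7B baselines: {', '.join(beaten)}")
print(f"Gain over InternLM 2-1.8B: {gain} points")
```

With these numbers, MMedLM 2-1.8B is ahead of six of the eight 7B baselines on average, trailing only Mistral and InternLM 2, and improves on its own base model by 8.17 points.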

Contact

If you have any questions, please feel free to contact [email protected].

Citation

@misc{qiu2024building,
      title={Towards Building Multilingual Language Model for Medicine}, 
      author={Pengcheng Qiu and Chaoyi Wu and Xiaoman Zhang and Weixiong Lin and Haicheng Wang and Ya Zhang and Yanfeng Wang and Weidi Xie},
      year={2024},
      eprint={2402.13963},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}