---
language:
- en
license: apache-2.0
library_name: transformers
tags:
- drugs
- classification
- bert
datasets:
- lloydmeta/drug_dataset_cleaned
widget:
- text: 'I have been taking ambien or zolphidem for almost 15 years. '
  example_title: Drugs for insomnia
---

# Model Card for drug-BERT

This is a multiclass classification model built on top of [google-bert/bert-base-uncased](https://huggingface.co/google-bert/bert-base-uncased) and trained on the [Drug Review Dataset (Drugs.com)](https://archive.ics.uci.edu/dataset/462/drug+review+dataset+drugs+com). It makes a best-attempt classification of the _condition_ someone has, based on their review of a drug.

## Model Details

### Model Description

The model was created as a learning exercise covering:

* Colab
* Transformer architecture
* Finetuning/training on top of existing NLP models
* Hugging Face libraries

**Developed by:** [lloydmeta](http://github.com/lloydmeta) of [beachape.com](https://beachape.com)

**License:** Apache 2.0

**Finetuned from model:** [google-bert/bert-base-uncased](https://huggingface.co/google-bert/bert-base-uncased)

## Uses

Classifying (identifying) the _condition_ someone has, based on their review of a drug.

### Out-of-Scope Use

Actual clinical diagnosis.
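
Because the output is a best-attempt classification and not a diagnosis, it can help to look at several candidate conditions and their scores instead of a single answer. The sketch below assumes predictions in the list-of-dicts form that `text-classification` pipelines return (`label` and `score` keys); in recent `transformers` versions, passing `top_k=None` to the pipeline call should return a score for every label. The helper `top_conditions`, its parameters, and the sample scores are hypothetical and only illustrate the idea.

```python
from typing import Dict, List, Union

Prediction = Dict[str, Union[str, float]]


def top_conditions(
    predictions: List[Prediction], k: int = 3, min_score: float = 0.0
) -> List[Prediction]:
    """Return the k highest-scoring predictions at or above min_score."""
    kept = [p for p in predictions if p["score"] >= min_score]
    return sorted(kept, key=lambda p: p["score"], reverse=True)[:k]


# Made-up scores for illustration only; these are NOT real outputs of drug-BERT.
sample_predictions = [
    {"label": "Anxiety", "score": 0.08},
    {"label": "Insomnia", "score": 0.85},
    {"label": "Depression", "score": 0.05},
    {"label": "Pain", "score": 0.02},
]

# Show the two most likely candidate conditions with their scores.
for prediction in top_conditions(sample_predictions, k=2):
    print(f'{prediction["label"]}: {prediction["score"]:.2f}')
```

Raising `min_score` drops low-confidence candidates, which is one simple way to avoid over-reading a weak prediction.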
## Bias, Risks, and Limitations

* Biases from the base `bert-base-uncased` model apply here.
* Only drugs and conditions in the drug review dataset are included.

## How to Get Started with the Model

```python
from transformers import pipeline

# Load the fine-tuned classifier from the Hugging Face Hub
condition_from_drug_review_classifier = pipeline("text-classification", model="lloydmeta/drug-bert")

# A drug review; the model predicts the condition it was written about
review_text = "I have been taking ambien or zolphidem for almost 15 years."
condition_from_drug_review_classifier(review_text)
```

## Training Details

### Training Data

* Trained on the [Drug Review Dataset (Drugs.com)](https://archive.ics.uci.edu/dataset/462/drug+review+dataset+drugs+com), cleaned by removing HTML tags from reviews and dropping samples that lacked a `condition`.
* 60% of the data set was used as training data.
* Irrelevant columns such as `patient_id`, `drugName`, `rating`, and `date` were removed.

### Training Procedure

* `review` data was tokenised with a maximum length of 512 tokens
* Learning rate: 2e-5
* Epochs: 3
* Weight decay: 0.01
* Per device train batch size: 4

## Evaluation

15% of the data set was held out for evaluation.

### Testing Data, Factors & Metrics

#### Testing Data

25% of the data set was held out for testing.

## Model Card Authors

[lloydmeta](http://github.com/lloydmeta)

## Model Card Contact

[lloydmeta](http://github.com/lloydmeta)
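
The 60% / 15% / 25% split described under Training Details and Evaluation can be sketched as follows. The card does not say how the split was produced, so this is only an illustration that assumes a simple shuffled split with a fixed seed; the function name `split_indices`, its parameters, and the seed value are hypothetical.

```python
import random
from typing import List, Tuple


def split_indices(
    n_samples: int,
    train_frac: float = 0.60,
    eval_frac: float = 0.15,
    seed: int = 42,
) -> Tuple[List[int], List[int], List[int]]:
    """Shuffle sample indices and split them into train / eval / test parts.

    Whatever is left after the train and eval parts (25% with the defaults)
    becomes the test part.
    """
    indices = list(range(n_samples))
    # A seeded generator keeps the split reproducible (assumed, not from the card).
    random.Random(seed).shuffle(indices)

    n_train = round(n_samples * train_frac)
    n_eval = round(n_samples * eval_frac)

    train = indices[:n_train]
    evaluation = indices[n_train:n_train + n_eval]
    test = indices[n_train + n_eval:]
    return train, evaluation, test


train_idx, eval_idx, test_idx = split_indices(1000)
print(len(train_idx), len(eval_idx), len(test_idx))
```

Splitting indices instead of rows keeps the sketch independent of how the reviews are stored; the same index lists can then select rows from whichever dataset object is in use.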