---
language:
- en
license: apache-2.0
library_name: transformers
tags:
- drugs
- classification
- bert
datasets:
- lloydmeta/drug_dataset_cleaned
widget:
- text: 'I have been taking ambien or zolphidem for almost 15 years. '
  example_title: Drugs for insomnia
---

# Model Card for drug-BERT

drug-BERT is a multiclass classification model fine-tuned from [google-bert/bert-base-uncased](https://huggingface.co/google-bert/bert-base-uncased) on
the [Drug Review Dataset (Drugs.com)](https://archive.ics.uci.edu/dataset/462/drug+review+dataset+drugs+com).
Given the text of a drug review, it makes a best-effort prediction of the _condition_ the reviewer has.


## Model Details

### Model Description

drug-BERT was created as a learning exercise covering:
* Colab
* Transformer architecture
* Fine-tuning/training on top of existing NLP models
* Hugging Face libraries


**Developed by:** [lloydmeta](http://github.com/lloydmeta) of [beachape.com](https://beachape.com)  
**License:** Apache 2.0  
**Finetuned from model:** [google-bert/bert-base-uncased](https://huggingface.co/google-bert/bert-base-uncased)

## Uses

Classifying (identifying) the _condition_ someone has, based on their review of a drug.

### Out-of-Scope Use

Actual clinical diagnosis.

## Bias, Risks, and Limitations

* Biases in the base `bert-base-uncased` model carry over to this model.
* Only drugs and conditions that appear in the Drug Review Dataset are covered.

## How to Get Started with the Model

```python
from transformers import pipeline

# Load the fine-tuned classifier from the Hugging Face Hub
condition_from_drug_review_classifier = pipeline("text-classification", model="lloydmeta/drug-bert")

# A drug review; the model predicts the condition being treated
review_text = "I have been taking ambien or zolphidem for almost 15 years."
condition_from_drug_review_classifier(review_text)
```
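
By default, a `text-classification` pipeline returns a list of dictionaries with `label` and `score` keys. The sketch below shows one way to pull out the best prediction. The helper name `top_prediction` and the example label and score are hypothetical and are not real output from this model.

```python
def top_prediction(results):
    """Return (label, score) for the highest-scoring entry in pipeline output."""
    best = max(results, key=lambda item: item["score"])
    return best["label"], best["score"]


# Hypothetical output, shaped like what a text-classification pipeline returns;
# the label and score here are made up for illustration.
example_results = [{"label": "Insomnia", "score": 0.93}]

label, score = top_prediction(example_results)
print(f"{label} ({score:.2f})")
```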

## Training Details

### Training Data

* Trained on the [Drug Review Dataset (Drugs.com)](https://archive.ics.uci.edu/dataset/462/drug+review+dataset+drugs+com), cleaned by removing HTML tags from reviews and
dropping samples that lacked a `condition`.
* 60% of the dataset was used for training.
* Irrelevant columns such as `patient_id`, `drugName`, `rating`, and `date` were removed.
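
The cleaning and splitting steps above can be sketched with the standard library alone. This is a minimal illustration and not the actual preprocessing code used for this model. The function names, the tag-matching regular expression, the entity unescaping, the fixed seed, and the shuffle-then-slice strategy are all assumptions. Only the HTML tag removal, the dropping of rows without a `condition`, the column pruning, and the 60/15/25 proportions come from this card.

```python
import html
import random
import re

# Assumption: a simple pattern is enough to match the tags found in reviews.
TAG_PATTERN = re.compile(r"<[^>]+>")


def clean_review(text):
    """Drop HTML tags, unescape entities, and collapse whitespace."""
    without_tags = TAG_PATTERN.sub(" ", text)
    unescaped = html.unescape(without_tags)
    return " ".join(unescaped.split())


def prepare(rows):
    """Keep only `review` and `condition`, skipping rows without a condition."""
    prepared = []
    for row in rows:
        condition = (row.get("condition") or "").strip()
        if not condition:
            continue
        prepared.append({"review": clean_review(row["review"]), "condition": condition})
    return prepared


def split_60_15_25(rows, seed=42):
    """Shuffle, then slice into train (60%), evaluation (15%), and test (the rest)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_train = len(shuffled) * 60 // 100
    n_eval = len(shuffled) * 15 // 100
    train = shuffled[:n_train]
    evaluation = shuffled[n_train:n_train + n_eval]
    test = shuffled[n_train + n_eval:]
    return train, evaluation, test
```

Integer arithmetic is used for the split sizes so that rounding never drops a row: whatever is left after the training and evaluation slices goes to the test set.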


### Training Procedure

* `review` text was tokenised with a maximum sequence length of 512 tokens
* Learning rate: 2e-5
* Epochs: 3
* Weight decay: 0.01
* Per device train batch size: 4
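
As a rough sketch, the settings listed above map onto the `transformers` API as shown below. This is not the original training script. The names `tokenize_batch`, `build_training_arguments`, and `optimizer_steps` and the output directory are hypothetical, and the use of truncation to enforce the 512-token limit is an assumption. `transformers` is imported lazily so that the rest of the block runs without it installed.

```python
import math

MAX_LENGTH = 512

# Values taken from the list above; keys are TrainingArguments parameter names.
HYPERPARAMETERS = {
    "learning_rate": 2e-5,
    "num_train_epochs": 3,
    "weight_decay": 0.01,
    "per_device_train_batch_size": 4,
}


def tokenize_batch(tokenizer, batch):
    """Tokenise the `review` column, truncating to MAX_LENGTH tokens (assumed)."""
    return tokenizer(batch["review"], truncation=True, max_length=MAX_LENGTH)


def build_training_arguments(output_dir="drug-bert-checkpoints"):
    """Build TrainingArguments; output_dir is a placeholder name."""
    from transformers import TrainingArguments  # lazy import

    return TrainingArguments(output_dir=output_dir, **HYPERPARAMETERS)


def optimizer_steps(n_train_examples, n_devices=1):
    """Total optimiser steps, assuming no gradient accumulation."""
    examples_per_step = HYPERPARAMETERS["per_device_train_batch_size"] * n_devices
    steps_per_epoch = math.ceil(n_train_examples / examples_per_step)
    return steps_per_epoch * HYPERPARAMETERS["num_train_epochs"]
```

With a batch size of 4 on a single device, a hypothetical training set of 1,000 reviews would take 250 steps per epoch, so `optimizer_steps(1000)` gives 750 steps over 3 epochs.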


## Evaluation

15% of the dataset was held out for evaluation.

### Testing Data, Factors & Metrics

#### Testing Data

25% of the dataset was held out for testing.

## Model Card Authors

[lloydmeta](http://github.com/lloydmeta)

## Model Card Contact

[lloydmeta](http://github.com/lloydmeta)