---
tags:
  - bertopic
library_name: bertopic
pipeline_tag: text-classification
---

# transformers_issues_topics

This is a BERTopic model. BERTopic is a flexible and modular topic modeling framework that allows for the generation of easily interpretable topics from large datasets.

## Usage

To use this model, please install BERTopic:

```
pip install -U bertopic
```

You can use the model as follows:

```python
from bertopic import BERTopic
topic_model = BERTopic.load("davanstrien/transformers_issues_topics")

topic_model.get_topic_info()
```
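
Beyond inspecting the fitted topics, the loaded model can also assign topics to new documents. The snippet below is a minimal sketch: the example issue titles are invented, it assumes the saved model still bundles its embedding model (the default when a BERTopic model is saved whole), and `probs` may be `None` because this model was trained with `calculate_probabilities=False`.

```python
# Hypothetical, unseen documents (e.g. new GitHub issue titles) -- not from the training data.
new_docs = [
    "RuntimeError: CUDA out of memory when fine-tuning",
    "BertTokenizer returns unexpected token ids",
]

# Assign a topic id to each document; the ids correspond to the overview table below.
topics, probs = topic_model.transform(new_docs)
print(topics)
```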

## Topic overview

* Number of topics: 30
* Number of training documents: 7235
<details>
  <summary>Click here for an overview of all topics.</summary>

  | Topic ID | Topic Keywords | Topic Frequency | Label |
  |----------|----------------|-----------------|-------|
  | -1 | encoder - bert - tensorflow - decoder - output | 11 | -1_encoder_bert_tensorflow_decoder |
  | 0 | tokenizer - tokenizers - tokenization - tokenize - berttokenizer | 2265 | 0_tokenizer_tokenizers_tokenization_tokenize |
  | 1 | cuda - runtimeerror - conda - pytorch - tensorflow | 1513 | 1_cuda_runtimeerror_conda_pytorch |
  | 2 | readmemd - readmetxt - readme - docstring - docstrings | 763 | 2_readmemd_readmetxt_readme_docstring |
  | 3 | trainertrain - trainer - trainertfpy - trainers - training | 550 | 3_trainertrain_trainer_trainertfpy_trainers |
  | 4 | rag - roberta - robertatokenizer - robertatokenizerfast - robertabase | 546 | 4_rag_roberta_robertatokenizer_robertatokenizerfast |
  | 5 | modelcard - modelcards - card - model - cards | 473 | 5_modelcard_modelcards_card_model |
  | 6 | importerror - transformerscli - transformers - transformerxl - transformer | 432 | 6_importerror_transformerscli_transformers_transformerxl |
  | 7 | seq2seq - seq2seqtrainer - seq2seqdataset - runseq2seq - examplesseq2seq | 405 | 7_seq2seq_seq2seqtrainer_seq2seqdataset_runseq2seq |
  | 8 | gpt2 - gpt2tokenizer - gpt2xl - gpt2tokenizerfast - gpt | 365 | 8_gpt2_gpt2tokenizer_gpt2xl_gpt2tokenizerfast |
  | 9 | t5 - t5model - t5base - t5large - tf | 289 | 9_t5_t5model_t5base_t5large |
  | 10 | tests - testing - speedup - test - testgeneratefp16 | 230 | 10_tests_testing_speedup_test |
  | 11 | questionansweringpipeline - questionanswering - answering - questionasnwering - distilbertforquestionanswering | 138 | 11_questionansweringpipeline_questionanswering_answering_questionasnwering |
  | 12 | ner - pipeline - pipelinener - pipelines - pipelineframework | 138 | 12_ner_pipeline_pipelinener_pipelines |
  | 13 | deberta - debertav2 - debertav2initpy - debertatokenizer - distilbertmodel | 132 | 13_deberta_debertav2_debertav2initpy_debertatokenizer |
  | 14 | onnxonnxruntime - onnx - onnxexport - 04onnxexport - 04onnxexportipynb | 110 | 14_onnxonnxruntime_onnx_onnxexport_04onnxexport |
  | 15 | benchmark - benchmarks - accuracy - precision - comparison | 85 | 15_benchmark_benchmarks_accuracy_precision |
  | 16 | labelsmoothingfactor - labelsmoothednllloss - labelsmoothing - labels - label | 79 | 16_labelsmoothingfactor_labelsmoothednllloss_labelsmoothing_labels |
  | 17 | longformer - longformers - longform - longformerforqa - longformerlayer | 71 | 17_longformer_longformers_longform_longformerforqa |
  | 18 | generationbeamsearchpy - generatebeamsearch - beamsearch - nonbeamsearch - beam | 60 | 18_generationbeamsearchpy_generatebeamsearch_beamsearch_nonbeamsearch |
  | 19 | cachedir - cache - cachedpath - caching - cached | 58 | 19_cachedir_cache_cachedpath_caching |
  | 20 | wav2vec2 - wav2vec - wav2vec20 - wav2vec2forctc - wav2vec2xlrswav2vec2 | 56 | 20_wav2vec2_wav2vec_wav2vec20_wav2vec2forctc |
  | 21 | flax - flaxelectraformaskedlm - flaxelectraforpretraining - flaxjax - flaxelectramodel | 52 | 21_flax_flaxelectraformaskedlm_flaxelectraforpretraining_flaxjax |
  | 22 | wandbproject - wandb - wandbcallback - wandbdisabled - wandbdisabledtrue | 49 | 22_wandbproject_wandb_wandbcallback_wandbdisabled |
  | 23 | electra - electrapretrainedmodel - electraformaskedlm - electraformultiplechoice - electrafortokenclassification | 38 | 23_electra_electrapretrainedmodel_electraformaskedlm_electraformultiplechoice |
  | 24 | layoutlm - layout - layoutlmtokenizer - layoutlmbaseuncased - tf | 24 | 24_layoutlm_layout_layoutlmtokenizer_layoutlmbaseuncased |
  | 25 | notebook - notebooks - community - text - multilabel | 18 | 25_notebook_notebooks_community_text |
  | 26 | dict - dictstr - returndict - parse - arguments | 18 | 26_dict_dictstr_returndict_parse |
  | 27 | pplm - pr - deprecated - variable - ppl | 17 | 27_pplm_pr_deprecated_variable |
  | 28 | isort - github - repo - version - setupcfg | 15 | 28_isort_github_repo_version |

</details>
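
The keyword column above lists only each topic's top words. The full word/weight (c-TF-IDF) pairs for a single topic can be pulled from the loaded model; the sketch below uses topic 0, the tokenizer topic from the table, purely as an example id.

```python
# Weighted keywords for a single topic, e.g. topic 0 from the table above.
for word, weight in topic_model.get_topic(0):
    print(f"{word}: {weight:.4f}")
```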

## Training hyperparameters

* calculate_probabilities: False
* language: english
* low_memory: False
* min_topic_size: 10
* n_gram_range: (1, 1)
* nr_topics: 30
* seed_topic_list: None
* top_n_words: 10
* verbose: True
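
For illustration only, a BERTopic model with the hyperparameters listed above could be instantiated as sketched below. This is not the exact training script used for this repository, and `docs` is a placeholder for the corpus of transformers GitHub issue texts.

```python
from bertopic import BERTopic

# Mirrors the hyperparameters listed above (illustrative sketch, not the original training code).
topic_model = BERTopic(
    language="english",
    nr_topics=30,
    min_topic_size=10,
    top_n_words=10,
    n_gram_range=(1, 1),
    calculate_probabilities=False,
    low_memory=False,
    seed_topic_list=None,
    verbose=True,
)

# `docs` would be a list of raw issue texts (7235 documents for this model).
# topics, probs = topic_model.fit_transform(docs)
```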

## Framework versions

* Numpy: 1.22.4
* HDBSCAN: 0.8.29
* UMAP: 0.5.3
* Pandas: 1.5.3
* Scikit-Learn: 1.2.2
* Sentence-transformers: 2.2.2
* Transformers: 4.29.2
* Numba: 0.56.4
* Plotly: 5.13.1
* Python: 3.10.11