metadata
tags:
  - bertopic
library_name: bertopic
pipeline_tag: text-classification
license: apache-2.0
datasets:
  - kmfoda/booksum
language:
  - en
inference: false

BERTopic-booksum-ngram1-sentence-t5-xl-chapter

This is a BERTopic model. BERTopic is a flexible, modular topic modeling framework that generates easily interpretable topics from large datasets.

Usage

To use this model, install BERTopic and safetensors:

pip install -U bertopic safetensors

You can use the model as follows:

from bertopic import BERTopic
topic_model = BERTopic.load("pszemraj/BERTopic-booksum-ngram1-sentence-t5-xl-chapter")

topic_model.get_topic_info()
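
Beyond inspecting the topic table, a loaded BERTopic model can assign topics to new text with `transform`. The following is a minimal sketch rather than a documented workflow for this repository: `assign_topics` is a hypothetical helper, and it assumes the saved model can resolve its embedding model on load. If it cannot, pass a sentence-transformers model name through the optional `embedding_model` argument (which model was used is not stated in this card).

```python
from typing import List, Optional, Tuple

REPO_ID = "pszemraj/BERTopic-booksum-ngram1-sentence-t5-xl-chapter"


def assign_topics(
    docs: List[str], embedding_model: Optional[str] = None
) -> Tuple[list, object]:
    """Hypothetical helper: predict one topic ID per input document."""
    if not docs:
        raise ValueError("docs must contain at least one document")

    # Imported lazily so this module can be imported without bertopic installed.
    from bertopic import BERTopic

    if embedding_model is None:
        # Assumption: the saved model knows which embedding model to use.
        topic_model = BERTopic.load(REPO_ID)
    else:
        topic_model = BERTopic.load(REPO_ID, embedding_model=embedding_model)

    # transform returns the predicted topic IDs and their probabilities.
    topics, probs = topic_model.transform(docs)
    return list(topics), probs
```

Calling `assign_topics(["Some chapter text ..."])` downloads the model on first use. A returned topic ID of -1 marks a document that BERTopic treats as an outlier.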

Topic overview

  • Number of topics: 138
  • Number of training documents: 70840
Overview of all topics:
Topic ID Topic Keywords Topic Frequency Label
-1 were - her - was - had - she 30 -1_were_her_was_had
0 were - had - was - could - miss 28715 0_were_had_was_could
1 artagnan - athos - musketeers - porthos - treville 16916 1_artagnan_athos_musketeers_porthos
2 rama - ravan - brahma - lakshman - raghu 4563 2_rama_ravan_brahma_lakshman
3 were - canoe - hist - huron - hutter 1268 3_were_canoe_hist_huron
4 slave - were - slavery - had - was 1011 4_slave_were_slavery_had
5 holmes - sherlock - watson - moor - baskerville 580 5_holmes_sherlock_watson_moor
6 prisoner - milady - felton - were - madame 549 6_prisoner_milady_felton_were
7 coriolanus - cassius - brutus - sicinius - titus 527 7_coriolanus_cassius_brutus_sicinius
8 confederation - constitution - federal - states - senate 511 8_confederation_constitution_federal_states
9 heathcliff - catherine - wuthering - cathy - hindley 498 9_heathcliff_catherine_wuthering_cathy
10 were - seemed - rima - was - had 492 10_were_seemed_rima_was
11 laws - lawes - law - civill - actions 452 11_laws_lawes_law_civill
12 fang - wolf - fangs - musher - growl 401 12_fang_wolf_fangs_musher
13 sigurd - thorgeir - thord - gunnar - skarphedinn 395 13_sigurd_thorgeir_thord_gunnar
14 achilles - troy - patroclus - aeneas - ulysses 385 14_achilles_troy_patroclus_aeneas
15 fogg - passengers - passed - phileas - travellers 376 15_fogg_passengers_passed_phileas
16 troy - trojans - aeneas - fates - trojan 370 16_troy_trojans_aeneas_fates
17 disciples - jesus - pharisees - temple - jerusalem 340 17_disciples_jesus_pharisees_temple
18 helsing - harker - diary - dr - he 324 18_helsing_harker_diary_dr
19 lama - who - no - kim - am 312 19_lama_who_no_kim
20 sara - princess - herself - she - minchin 301 20_sara_princess_herself_she
21 horses - horse - saddle - stable - were 293 21_horses_horse_saddle_stable
22 hester - pearl - scarlet - her - human 292 22_hester_pearl_scarlet_her
23 candide - inquisitor - friar - cunegonde - philosopher 286 23_candide_inquisitor_friar_cunegonde
24 dick - aunt - were - could - had 275 24_dick_aunt_were_could
25 wolves - wolf - cub - hunger - were 261 25_wolves_wolf_cub_hunger
26 god - gods - consequences - satan - som 241 26_god_gods_consequences_satan
27 modesty - women - behaviour - human - woman 240 27_modesty_women_behaviour_human
28 society - education - distribution - service - labour 240 28_society_education_distribution_service
29 siddhartha - buddha - gotama - kamaswami - om 237 29_siddhartha_buddha_gotama_kamaswami
30 ship - captain - aboard - squire - ll 229 30_ship_captain_aboard_squire
31 cyrano - roxane - montfleury - hark - love 227 31_cyrano_roxane_montfleury_hark
32 alice - were - rabbit - hare - hatter 225 32_alice_were_rabbit_hare
33 toto - kansas - dorothy - oz - scarecrow 211 33_toto_kansas_dorothy_oz
34 lancelot - camelot - merlin - guinevere - arthur 209 34_lancelot_camelot_merlin_guinevere
35 were - soldiers - seemed - soldier - th 201 35_were_soldiers_seemed_soldier
36 were - was - fields - seemed - hills 200 36_were_was_fields_seemed
37 reason - thyself - actions - thine - life 179 37_reason_thyself_actions_thine
38 hetty - her - she - judith - were 170 38_hetty_her_she_judith
39 othello - iago - desdemona - ll - roderigo 170 39_othello_iago_desdemona_ll
40 wildeve - yes - were - vye - was 165 40_wildeve_yes_were_vye
41 utilitarian - morality - morals - virtue - moral 165 41_utilitarian_morality_morals_virtue
42 ransom - isaac - thine - thy - shekels 163 42_ransom_isaac_thine_thy
43 weasels - rat - ratty - toad - badger 157 43_weasels_rat_ratty_toad
44 philip - he - were - vicar - was 155 44_philip_he_were_vicar
45 macbeth - banquo - macduff - fleance - murderer 154 45_macbeth_banquo_macduff_fleance
46 lydgate - bulstrode - himself - he - had 145 46_lydgate_bulstrode_himself_he
47 capulet - romeo - juliet - verona - mercutio 142 47_capulet_romeo_juliet_verona
48 dying - her - were - helen - she 141 48_dying_her_were_helen
49 anne - avonlea - diana - her - marilla 141 49_anne_avonlea_diana_her
50 tartuffe - scene - dorine - pernelle - scoundrel 140 50_tartuffe_scene_dorine_pernelle
51 were - yes - had - was - no 139 51_were_yes_had_was
52 jekyll - hyde - were - myself - had 135 52_jekyll_hyde_were_myself
53 loved - were - philip - was - could 128 53_loved_were_philip_was
54 falstaff - mistress - ford - forsooth - windsor 127 54_falstaff_mistress_ford_forsooth
55 hurstwood - were - barn - had - was 127 55_hurstwood_were_barn_had
56 provost - capell - collier - conj - pope 126 56_provost_capell_collier_conj
57 gretchen - highness - chancellor - hildegarde - yes 125 57_gretchen_highness_chancellor_hildegarde
58 delamere - watson - dr - ll - no 124 58_delamere_watson_dr_ll
59 jem - her - were - felt - margaret 123 59_jem_her_were_felt
60 beowulf - grendel - hrothgar - wiglaf - hero 111 60_beowulf_grendel_hrothgar_wiglaf
61 verloc - seemed - was - were - had 102 61_verloc_seemed_was_were
62 hamlet - guildenstern - rosencrantz - fortinbras - polonius 102 62_hamlet_guildenstern_rosencrantz_fortinbras
63 corey - mrs - yes - business - lapham 101 63_corey_mrs_yes_business
64 projectiles - cannon - projectile - distance - satellite 99 64_projectiles_cannon_projectile_distance
65 piano - musical - music - played - beethoven 98 65_piano_musical_music_played
66 wedding - bridegroom - were - marriage - looked 93 66_wedding_bridegroom_were_marriage
67 juan - her - fame - some - had 92 67_juan_her_fame_some
68 were - looked - felt - her - had 91 68_were_looked_felt_her
69 staked - gambling - wildeve - stakes - dice 91 69_staked_gambling_wildeve_stakes
70 mistress - leonora - wanted - florence - was 89 70_mistress_leonora_wanted_florence
71 delano - ship - sailor - captain - benito 87 71_delano_ship_sailor_captain
72 yes - goring - no - robert - room 85 72_yes_goring_no_robert
73 stockmann - yes - horster - mayor - dr 81 73_stockmann_yes_horster_mayor
74 ll - were - looked - carl - was 80 74_ll_were_looked_carl
75 barber - philosophy - no - some - man 78 75_barber_philosophy_no_some
76 tom - maggie - came - had - tulliver 78 76_tom_maggie_came_had
77 middlemarch - hustings - candidate - brooke - may 75 77_middlemarch_hustings_candidate_brooke
78 inspector - verloc - yes - affair - police 75 78_inspector_verloc_yes_affair
79 scrooge - merry - no - christmas - man 73 79_scrooge_merry_no_christmas
80 coquenard - mutton - served - were - pudding 70 80_coquenard_mutton_served_were
81 yes - no - jack - ll - tell 69 81_yes_no_jack_ll
82 seth - lisbeth - th - ud - no 67 82_seth_lisbeth_th_ud
83 higgins - eliza - her - she - liza 66 83_higgins_eliza_her_she
84 yarmouth - were - went - had - was 65 84_yarmouth_were_went_had
85 servian - sergius - yes - catherine - no 64 85_servian_sergius_yes_catherine
86 service - army - salvation - institution - training 61 86_service_army_salvation_institution
87 condemn - ff - pray - mercy - conj 58 87_condemn_ff_pray_mercy
88 lucy - bartlett - were - could - she 57 88_lucy_bartlett_were_could
89 wills - seemed - bequest - were - testator 54 89_wills_seemed_bequest_were
90 scene - iii - malvolio - valentine - cesario 54 90_scene_iii_malvolio_valentine
91 fuss - think - ll - thinks - oh 53 91_fuss_think_ll_thinks
92 hermia - demetrius - helena - theseus - helen 50 92_hermia_demetrius_helena_theseus
93 seemed - rochester - were - had - yes 50 93_seemed_rochester_were_had
94 sorrow - mourned - myself - had - was 48 94_sorrow_mourned_myself_had
95 gerty - sleepless - tea - weariness - tired 48 95_gerty_sleepless_tea_weariness
96 rushworth - crawford - were - sotherton - was 47 96_rushworth_crawford_were_sotherton
97 reasoning - syllogisme - names - signification - definitions 46 97_reasoning_syllogisme_names_signification
98 could - caleb - sure - work - no 46 98_could_caleb_sure_work
99 rose - tears - hope - tell - wish 46 99_rose_tears_hope_tell
100 peggotty - em - gummidge - he - ll 46 100_peggotty_em_gummidge_he
101 time - future - story - paradox - traveller 46 101_time_future_story_paradox
102 cleopatra - antony - caesar - loved - slave 45 102_cleopatra_antony_caesar_loved
103 appendicitis - doctors - doctor - dr - wanted 45 103_appendicitis_doctors_doctor_dr
104 slept - awoke - waking - sleep - seemed 44 104_slept_awoke_waking_sleep
105 parlour - room - seemed - sat - had 43 105_parlour_room_seemed_sat
106 prophets - scripture - prophet - moses - prophecy 43 106_prophets_scripture_prophet_moses
107 letter - honour - adieu - duval - evelina 43 107_letter_honour_adieu_duval
108 complications - cranky - had - tanis - was 43 108_complications_cranky_had_tanis
109 fled - armies - brussels - imperial - napoleon 42 109_fled_armies_brussels_imperial
110 philip - easel - greco - impressionists - manet 42 110_philip_easel_greco_impressionists
111 harlings - harling - frances - were - shimerdas 40 111_harlings_harling_frances_were
112 jane - mrs - janet - eyre - her 40 112_jane_mrs_janet_eyre
113 prisoner - confinement - prisoners - prison - gaoler 40 113_prisoner_confinement_prisoners_prison
114 hardcastle - marlow - impudence - constance - modesty 40 114_hardcastle_marlow_impudence_constance
115 horatio - murder - revenge - sorrow - hieronimo 40 115_horatio_murder_revenge_sorrow
116 traddles - had - married - room - horace 39 116_traddles_had_married_room
117 philip - tell - feelings - was - remember 38 117_philip_tell_feelings_was
118 nervous - countenance - seemed - he - huxtable 38 118_nervous_countenance_seemed_he
119 rogers - wanted - lapham - could - silas 38 119_rogers_wanted_lapham_could
120 titus - timon - varro - servilius - alcibiades 37 120_titus_timon_varro_servilius
121 morality - justice - moral - impartiality - unjust 37 121_morality_justice_moral_impartiality
122 willard - elmer - were - was - henderson 37 122_willard_elmer_were_was
123 had - was - could - circumstances - possession 37 123_had_was_could_circumstances
124 monkey - he - sahib - rat - sara 36 124_monkey_he_sahib_rat
125 mcmurdo - mcginty - cormac - police - scanlan 36 125_mcmurdo_mcginty_cormac_police
126 hetty - herself - she - her - had 36 126_hetty_herself_she_her
127 dimmesdale - reverend - chillingworth - clergyman - deacon 35 127_dimmesdale_reverend_chillingworth_clergyman
128 formerly - eliza - was - friend - friends 34 128_formerly_eliza_was_friend
129 were - seemed - had - was - felt 34 129_were_seemed_had_was
130 prisoner - jerry - lorry - tellson - court 33 130_prisoner_jerry_lorry_tellson
131 macmurdo - wenham - captain - steyne - crawley 33 131_macmurdo_wenham_captain_steyne
132 ducal - duchy - xv - fetes - theatre 32 132_ducal_duchy_xv_fetes
133 chapter - book - dows - unt - windowpane 32 133_chapter_book_dows_unt
134 money - riches - things - risk - thoughts 31 134_money_riches_things_risk
135 bethy - beth - seemed - sister - her 31 135_bethy_beth_seemed_sister
136 oliver - pickwick - were - was - inn 30 136_oliver_pickwick_were_was
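
The frequencies above are heavily concentrated in the first two topics. As a small worked example, the sketch below computes their share of the training set, assuming the Topic Frequency column counts training documents assigned to each topic (the card does not say so explicitly). The two frequencies and the document total are copied from this card; `share` is a hypothetical helper.

```python
TRAINING_DOCS = 70840  # "Number of training documents" from the overview

# (topic ID, frequency) for the two largest topics in the table
largest = [(0, 28715), (1, 16916)]


def share(freq: int, total: int = TRAINING_DOCS) -> float:
    """Return freq as a percentage of total."""
    return 100.0 * freq / total


for topic_id, freq in largest:
    print(f"topic {topic_id}: {share(freq):.1f}% of training documents")

combined = share(sum(freq for _, freq in largest))
print(f"combined: {combined:.1f}%")
```

Under that assumption, topics 0 and 1 together account for roughly 64% of the training documents, and topic 0 is dominated by common function words rather than book-specific names.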

Training hyperparameters

  • calculate_probabilities: True
  • language: None
  • low_memory: False
  • min_topic_size: 30
  • n_gram_range: (1, 1)
  • nr_topics: auto
  • seed_topic_list: None
  • top_n_words: 10
  • verbose: True
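
For reference, the settings above map directly onto keyword arguments of the `BERTopic` constructor. The sketch below shows one way to build an unfitted model with them. It is not the author's training script: `build_topic_model` is a hypothetical helper, the embedding model name is inferred from the repository name rather than stated in this card, and any UMAP, HDBSCAN, or vectorizer settings are left at BERTopic's defaults because the card does not list them.

```python
# Hyperparameters copied from the list above.
HYPERPARAMETERS = {
    "calculate_probabilities": True,
    "language": None,
    "low_memory": False,
    "min_topic_size": 30,
    "n_gram_range": (1, 1),
    "nr_topics": "auto",
    "seed_topic_list": None,
    "top_n_words": 10,
    "verbose": True,
}

# Assumption: inferred from the repository name, not stated in this card.
ASSUMED_EMBEDDING_MODEL = "sentence-transformers/sentence-t5-xl"


def build_topic_model(embedding_model=ASSUMED_EMBEDDING_MODEL):
    """Hypothetical helper: return an unfitted BERTopic with these settings."""
    # Imported lazily; requires `pip install bertopic`.
    from bertopic import BERTopic

    return BERTopic(embedding_model=embedding_model, **HYPERPARAMETERS)
```

Fitting such a model on the same data is not guaranteed to reproduce the topics above, since clustering depends on settings and library versions that this sketch does not pin.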

Framework versions

  • Numpy: 1.24.3
  • HDBSCAN: 0.8.29
  • UMAP: 0.5.3
  • Pandas: 2.0.2
  • Scikit-Learn: 1.2.2
  • Sentence-transformers: 2.2.2
  • Transformers: 4.30.2
  • Numba: 0.57.1
  • Plotly: 5.15.0
  • Python: 3.10.11
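
Loading a saved topic model can be sensitive to library versions, so it may help to compare the local environment against the list above. The sketch below does this with the standard library only. The version numbers are copied from this card; the mapping to PyPI distribution names (for example UMAP to `umap-learn`) is an assumption, and `compare_versions` is a hypothetical helper.

```python
from importlib import metadata

# Versions copied from the list above, keyed by assumed PyPI distribution name.
EXPECTED = {
    "numpy": "1.24.3",
    "hdbscan": "0.8.29",
    "umap-learn": "0.5.3",
    "pandas": "2.0.2",
    "scikit-learn": "1.2.2",
    "sentence-transformers": "2.2.2",
    "transformers": "4.30.2",
    "numba": "0.57.1",
    "plotly": "5.15.0",
}


def compare_versions(expected=EXPECTED):
    """Return {name: (version in card, installed version or None)}."""
    report = {}
    for name, wanted in expected.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed locally
        report[name] = (wanted, installed)
    return report


for name, (wanted, installed) in compare_versions().items():
    status = "ok" if installed == wanted else "differs"
    print(f"{name}: card={wanted} installed={installed} ({status})")
```

A mismatch does not necessarily mean the model will fail to load; the report is only a starting point if loading or inference behaves unexpectedly.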