metadata
tags:
  - bertopic
library_name: bertopic
pipeline_tag: text-classification

xsum_108_50000_25000_test

This is a BERTopic model. BERTopic is a flexible, modular topic modeling framework for extracting easily interpretable topics from large document collections.

Usage

To use this model, please install BERTopic:

```bash
pip install -U bertopic
```

You can use the model as follows:

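```python
from bertopic import BERTopic

# Download the model from the Hugging Face Hub and load it
topic_model = BERTopic.load("KingKazma/xsum_108_50000_25000_test")

# Overview of all topics: topic ID, number of documents, and label
print(topic_model.get_topic_info())
```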
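
A common next step after inspecting the topics is to assign topics to new documents. The sketch below wraps BERTopic's `transform`, which returns one topic ID per document together with probabilities, and looks up each ID in the model's `topic_labels_` dictionary. It assumes the loaded model can embed new text. If no embedding model was saved with it, `BERTopic.load` also accepts an `embedding_model` argument. The helper `label_documents` is hypothetical and not part of BERTopic.

```python
from typing import Any, List, Sequence, Tuple


def label_documents(topic_model: Any, docs: Sequence[str]) -> List[Tuple[str, int, str]]:
    """Hypothetical helper: return (document, topic_id, topic_label) for each document.

    Assumes `topic_model` behaves like a fitted BERTopic model: `transform`
    returns (topics, probabilities) and `topic_labels_` maps topic IDs to labels.
    """
    topics, _probs = topic_model.transform(list(docs))
    labels = topic_model.topic_labels_
    # int() normalises numpy integer IDs before the dictionary lookup
    return [(doc, int(t), labels.get(int(t), "unknown")) for doc, t in zip(docs, topics)]


# Example (needs bertopic installed and network access to the Hugging Face Hub):
# from bertopic import BERTopic
# topic_model = BERTopic.load("KingKazma/xsum_108_50000_25000_test")
# for doc, topic_id, label in label_documents(topic_model, ["The striker scored twice."]):
#     print(topic_id, label)
```

Documents that fall into no cluster receive topic -1, the outlier topic. Callers may want to filter these out.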

Topic overview

  • Number of topics: 84 (including the outlier topic -1)
  • Number of training documents: 11334
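
These two figures agree with the topic table. The table has 84 rows: topic -1 (BERTopic's outlier topic, for documents not assigned to any cluster) plus topics 0 to 82. Its Topic Frequency values also sum to exactly 11,334, so every training document is counted in exactly one topic. The short check below recomputes this from the listed frequencies. It also shows how skewed the distribution is: topic 0 alone accounts for about 43% of the documents. The function name `summarize` is illustrative.

```python
# Topic Frequency column from the topic table, in Topic ID order (-1, 0, 1, ..., 82).
TOPIC_FREQUENCIES = [
    5,                                                   # topic -1 (outliers)
    4854, 1637, 872, 341, 325, 278, 206, 199, 193, 181,  # topics 0-9
    145, 119, 80, 79, 79, 75, 70, 68, 61, 55,            # topics 10-19
    55, 54, 54, 53, 49, 47, 44, 41, 39, 36,              # topics 20-29
    36, 34, 33, 31, 30, 30, 27, 27, 27, 26,              # topics 30-39
    25, 24, 23, 23, 22, 22, 22, 21, 21, 21,              # topics 40-49
    20, 20, 19, 19, 19, 19, 18, 18, 18, 17,              # topics 50-59
    17, 17, 15, 15, 14, 12, 12, 11, 11, 10,              # topics 60-69
    9, 9, 8, 8, 8, 8, 7, 7, 7, 6,                        # topics 70-79
    6, 6, 5,                                             # topics 80-82
]


def summarize(freqs):
    """Summarise topic sizes; freqs[0] is the outlier topic -1."""
    total = sum(freqs)
    return {
        "n_topics_incl_outlier": len(freqs),
        "n_documents": total,
        "outlier_share": freqs[0] / total,
        "largest_topic_share": max(freqs[1:]) / total,
    }


print(summarize(TOPIC_FREQUENCIES))
```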
| Topic ID | Topic Keywords | Topic Frequency | Label |
|---|---|---|---|
| -1 | said - mr - people - would - year | 5 | -1_said_mr_people_would |
| 0 | win - goal - game - league - foul | 4854 | 0_win_goal_game_league |
| 1 | police - court - said - officer - mr | 1637 | 1_police_court_said_officer |
| 2 | labour - party - eu - election - vote | 872 | 2_labour_party_eu_election |
| 3 | health - care - nhs - patient - cancer | 341 | 3_health_care_nhs_patient |
| 4 | olympic - sport - race - gold - medal | 325 | 4_olympic_sport_race_gold |
| 5 | cricket - england - wicket - test - captain | 278 | 5_cricket_england_wicket_test |
| 6 | animal - dog - bird - whale - specie | 206 | 6_animal_dog_bird_whale |
| 7 | bridge - rail - council - said - transport | 199 | 7_bridge_rail_council_said |
| 8 | school - education - student - teacher - university | 193 | 8_school_education_student_teacher |
| 9 | bank - rate - growth - economy - market | 181 | 9_bank_rate_growth_economy |
| 10 | syria - syrian - iraq - iran - force | 145 | 10_syria_syrian_iraq_iran |
| 11 | energy - industry - wind - electricity - company | 119 | 11_energy_industry_wind_electricity |
| 12 | film - actress - star - actor - character | 80 | 12_film_actress_star_actor |
| 13 | president - boko - african - haram - mr | 79 | 13_president_boko_african_haram |
| 14 | fire - blaze - service - smoke - said | 79 | 14_fire_blaze_service_smoke |
| 15 | trump - mr - republican - trumps - president | 75 | 15_trump_mr_republican_trumps |
| 16 | music - album - song - band - singer | 70 | 16_music_album_song_band |
| 17 | race - hamilton - f1 - mercedes - lap | 68 | 17_race_hamilton_f1_mercedes |
| 18 | space - earth - planet - solar - orbit | 61 | 18_space_earth_planet_solar |
| 19 | lifeboat - rnli - beach - coastguard - rescue | 55 | 19_lifeboat_rnli_beach_coastguard |
| 20 | flood - flooding - water - weather - rain | 55 | 20_flood_flooding_water_weather |
| 21 | fight - boxing - champion - joshua - ali | 54 | 21_fight_boxing_champion_joshua |
| 22 | plane - aircraft - flight - passenger - pilot | 54 | 22_plane_aircraft_flight_passenger |
| 23 | earthquake - quake - flood - people - water | 53 | 23_earthquake_quake_flood_people |
| 24 | russian - russia - ukraine - putin - ukrainian | 49 | 24_russian_russia_ukraine_putin |
| 25 | murray - match - wimbledon - tennis - konta | 47 | 25_murray_match_wimbledon_tennis |
| 26 | bitcoin - security - talktalk - data - tor | 44 | 26_bitcoin_security_talktalk_data |
| 27 | round - birdie - bogey - par - shot | 41 | 27_round_birdie_bogey_par |
| 28 | ireland - dup - sinn - northern - party | 39 | 28_ireland_dup_sinn_northern |
| 29 | maduro - venezuela - president - venezuelan - opposition | 36 | 29_maduro_venezuela_president_venezuelan |
| 30 | yn - ar - yr - ei - wedi | 36 | 30_yn_ar_yr_ei |
| 31 | painting - art - gallery - portrait - museum | 34 | 31_painting_art_gallery_portrait |
| 32 | unsupported - updated - bst - playback - media | 33 | 32_unsupported_updated_bst_playback |
| 33 | migrant - eu - asylum - turkey - germany | 31 | 33_migrant_eu_asylum_turkey |
| 34 | stone - cave - discovery - site - tree | 30 | 34_stone_cave_discovery_site |
| 35 | parade - poppy - flag - jesus - statue | 30 | 35_parade_poppy_flag_jesus |
| 36 | drug - cannabis - drugs - heroin - cocaine | 27 | 36_drug_cannabis_drugs_heroin |
| 37 | church - pope - bishop - vatican - cardinal | 27 | 37_church_pope_bishop_vatican |
| 38 | greek - greece - bailout - eurozone - bank | 27 | 38_greek_greece_bailout_eurozone |
| 39 | nama - ireland - northern - cerberus - irish | 26 | 39_nama_ireland_northern_cerberus |
| 40 | prison - prisoner - prisons - justice - turing | 25 | 40_prison_prisoner_prisons_justice |
| 41 | radio - show - bbc - series - programme | 24 | 41_radio_show_bbc_series |
| 42 | fifa - blatter - platini - fifas - football | 23 | 42_fifa_blatter_platini_fifas |
| 43 | tesco - sale - store - supermarket - customer | 23 | 43_tesco_sale_store_supermarket |
| 44 | china - taiwan - chinese - hong - taiwans | 22 | 44_china_taiwan_chinese_hong |
| 45 | afghan - taliban - afghanistan - mansour - mullah | 22 | 45_afghan_taliban_afghanistan_mansour |
| 46 | council - local - funding - government - authority | 22 | 46_council_local_funding_government |
| 47 | nsa - encryption - cia - snowden - us | 21 | 47_nsa_encryption_cia_snowden |
| 48 | ice - glacier - temperature - ocean - climate | 21 | 48_ice_glacier_temperature_ocean |
| 49 | osullivan - world - snooker - beat - champion | 21 | 49_osullivan_world_snooker_beat |
| 50 | book - prize - novel - author - award | 20 | 50_book_prize_novel_author |
| 51 | auschwitz - jews - holocaust - camp - winton | 20 | 51_auschwitz_jews_holocaust_camp |
| 52 | samsung - apple - phone - company - battery | 19 | 52_samsung_apple_phone_company |
| 53 | picture - image - pictures - please - submit | 19 | 53_picture_image_pictures_please |
| 54 | korea - north - korean - missile - koreas | 19 | 54_korea_north_korean_missile |
| 55 | pension - worker - pay - work - hour | 19 | 55_pension_worker_pay_work |
| 56 | pen - fillon - le - macron - mr | 18 | 56_pen_fillon_le_macron |
| 57 | paris - eaw - french - attack - suspect | 18 | 57_paris_eaw_french_attack |
| 58 | content - app - tv - digital - apple | 18 | 58_content_app_tv_digital |
| 59 | israel - israeli - palestinians - palestinian - gaza | 17 | 59_israel_israeli_palestinians_palestinian |
| 60 | housing - affordable - rent - homelessness - government | 17 | 60_housing_affordable_rent_homelessness |
| 61 | prince - queen - birthday - duke - royal | 17 | 61_prince_queen_birthday_duke |
| 62 | australia - australian - asylum - visa - abbott | 15 | 62_australia_australian_asylum_visa |
| 63 | tax - spending - cut - osborne - fiscal | 15 | 63_tax_spending_cut_osborne |
| 64 | updated - 2017 - bst - last - gmt | 14 | 64_updated_2017_bst_last |
| 65 | refugee - uk - child - vulnerable - refugees | 12 | 65_refugee_uk_child_vulnerable |
| 66 | ebola - sierra - leone - outbreak - liberia | 12 | 66_ebola_sierra_leone_outbreak |
| 67 | shah - ahmed - mosque - muslims - prophet | 11 | 67_shah_ahmed_mosque_muslims |
| 68 | broadband - 4g - ee - customer - internet | 11 | 68_broadband_4g_ee_customer |
| 69 | pistorius - steenkamp - toilet - door - reeva | 10 | 69_pistorius_steenkamp_toilet_door |
| 70 | eu - uk - population - migrant - trade | 9 | 70_eu_uk_population_migrant |
| 71 | australia - marriage - turnbull - katter - samesex | 9 | 71_australia_marriage_turnbull_katter |
| 72 | sugar - gin - sabmiller - inbev - ab | 8 | 72_sugar_gin_sabmiller_inbev |
| 73 | suu - kyi - rohingya - rakhine - myanmar | 8 | 73_suu_kyi_rohingya_rakhine |
| 74 | nadeau - field - aircraft - cordon - accidents | 8 | 74_nadeau_field_aircraft_cordon |
| 75 | abortion - ireland - law - unborn - case | 8 | 75_abortion_ireland_law_unborn |
| 76 | homosexuality - tor - homosexual - law - gay | 7 | 76_homosexuality_tor_homosexual_law |
| 77 | castro - cuba - cuban - fidel - havana | 7 | 77_castro_cuba_cuban_fidel |
| 78 | china - samsung - firm - business - cheil | 7 | 78_china_samsung_firm_business |
| 79 | event - festival - technology - campsite - interactive | 6 | 79_event_festival_technology_campsite |
| 80 | vw - volkswagen - production - emission - carmaker | 6 | 80_vw_volkswagen_production_emission |
| 81 | mohammed - gjolla - sheriff - nca - terrorism | 6 | 81_mohammed_gjolla_sheriff_nca |
| 82 | tb - tuberculosis - disease - badger - zoonotic | 5 | 82_tb_tuberculosis_disease_badger |
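
Every Label in the table follows the same pattern: the topic ID, then the first four of the five listed keywords, all joined by underscores. As far as can be seen here, this matches BERTopic's default topic naming. BERTopic also exposes `generate_topic_labels` for producing variants of these labels. The sketch below rebuilds labels from keyword lists and spot-checks a few rows of the table. The function name `default_topic_label` is illustrative and not part of BERTopic's API.

```python
from typing import Sequence


def default_topic_label(topic_id: int, keywords: Sequence[str], n_words: int = 4) -> str:
    """Build '<topic_id>_<kw1>_<kw2>_<kw3>_<kw4>' from a topic's ranked keywords."""
    return f"{topic_id}_" + "_".join(keywords[:n_words])


# Spot-check against rows of the topic table: (Topic ID, Topic Keywords, Label).
ROWS = [
    (-1, ["said", "mr", "people", "would", "year"], "-1_said_mr_people_would"),
    (0, ["win", "goal", "game", "league", "foul"], "0_win_goal_game_league"),
    (82, ["tb", "tuberculosis", "disease", "badger", "zoonotic"], "82_tb_tuberculosis_disease_badger"),
]
for topic_id, keywords, label in ROWS:
    assert default_topic_label(topic_id, keywords) == label
```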

Training hyperparameters

  • calculate_probabilities: True
  • language: english
  • low_memory: False
  • min_topic_size: 10
  • n_gram_range: (1, 1)
  • nr_topics: None
  • seed_topic_list: None
  • top_n_words: 10
  • verbose: False
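
The hyperparameters above map directly onto keyword arguments of the `BERTopic` constructor. The sketch collects them in a dictionary and builds an untrained model from it, importing `bertopic` lazily so the dictionary can be inspected without the library installed. The card does not list the embedding model, UMAP or HDBSCAN settings, or the exact training documents. Because UMAP is also stochastic, refitting with these settings alone should not be expected to reproduce the topics above. `HYPERPARAMS` and `build_topic_model` are hypothetical names.

```python
# Hyperparameters as listed in this model card.
HYPERPARAMS = {
    "calculate_probabilities": True,
    "language": "english",
    "low_memory": False,
    "min_topic_size": 10,
    "n_gram_range": (1, 1),
    "nr_topics": None,
    "seed_topic_list": None,
    "top_n_words": 10,
    "verbose": False,
}


def build_topic_model(**overrides):
    """Create an untrained BERTopic model with the card's hyperparameters.

    Keyword overrides (e.g. a custom embedding_model) replace or extend the defaults.
    """
    from bertopic import BERTopic  # requires `pip install -U bertopic`

    return BERTopic(**{**HYPERPARAMS, **overrides})
```

A model built this way would then be fitted with `topics, probs = build_topic_model().fit_transform(docs)`. With `calculate_probabilities=True`, `probs` holds a probability for every topic per document rather than a single value, which can be slow on large corpora.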

Framework versions

  • Numpy: 1.22.4
  • HDBSCAN: 0.8.33
  • UMAP: 0.5.3
  • Pandas: 1.5.3
  • Scikit-Learn: 1.2.2
  • Sentence-transformers: 2.2.2
  • Transformers: 4.31.0
  • Numba: 0.57.1
  • Plotly: 5.13.1
  • Python: 3.10.12
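
When a loaded model behaves unexpectedly, comparing the local environment against these versions is a reasonable first step. The sketch uses the standard library's `importlib.metadata` to report packages whose installed version differs from the card's or that are missing. The mapping from the names above to PyPI distribution names is an assumption (for example, UMAP is published as `umap-learn`). `version_mismatches` is a hypothetical helper name.

```python
import platform
from importlib import metadata
from typing import Dict, Optional, Tuple

# Versions from "Framework versions", keyed by (assumed) PyPI distribution name.
EXPECTED = {
    "numpy": "1.22.4",
    "hdbscan": "0.8.33",
    "umap-learn": "0.5.3",
    "pandas": "1.5.3",
    "scikit-learn": "1.2.2",
    "sentence-transformers": "2.2.2",
    "transformers": "4.31.0",
    "numba": "0.57.1",
    "plotly": "5.13.1",
}
EXPECTED_PYTHON = "3.10.12"


def version_mismatches(expected: Dict[str, str]) -> Dict[str, Tuple[str, Optional[str]]]:
    """Map each differing or missing package to (expected_version, installed_version or None)."""
    mismatches = {}
    for dist, wanted in expected.items():
        try:
            installed = metadata.version(dist)
        except metadata.PackageNotFoundError:
            installed = None
        if installed != wanted:
            mismatches[dist] = (wanted, installed)
    return mismatches


if __name__ == "__main__":
    print("Python:", platform.python_version(), "(card:", EXPECTED_PYTHON + ")")
    for dist, (wanted, installed) in version_mismatches(EXPECTED).items():
        print(f"{dist}: card {wanted}, installed {installed}")
```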