Lukashenko Generator
This is a text-to-text generative AI model. It generates phrases that sound like something the Belarusian dictator Aliaksandr Lukashenko could say.
Documentation
Description
The model was trained on the dataset NebulasBellum/pizdziuk_luka, which was collected from the Telegram channel Pul Pervogo. Only with the help of this channel could we do this great job of recreating the speech of the dictator Lukashenko :)
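If you want to inspect the training data yourself, the dataset can in principle be pulled straight from the Hub. This is only a minimal sketch: it assumes the repo is loadable with the datasets library, and the "train" split and column layout are assumptions, so check the dataset card for the real structure.
from datasets import load_dataset

# Minimal sketch: assumes NebulasBellum/pizdziuk_luka is loadable with the
# datasets library; the "train" split and column names are assumptions.
dataset = load_dataset("NebulasBellum/pizdziuk_luka", split="train")
print(dataset)      # inspect which columns the dataset actually provides
print(dataset[0])   # look at the first example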
The model was trained for 250 epochs and produces very good results:
loss: 0.1890 - accuracy: 0.9416
The model is constantly improving through an extended dataset, which keeps growing as new speeches of the fascist Lukashenko are added. All information is collected from public sources, and not only :) (thanks to our partisans).
Right now the model folder NebulasBellum/Lukashenko_tarakan contains all the files necessary to download and use the model with the TensorFlow library, together with the trained weights weights_lukash.h5.
Quick Start
To use this model with the TensorFlow library, you need to:
- Download the model (a programmatic alternative is sketched after the commands):
md Luka_Pizdziuk
cd Luka_Pizdziuk
git clone https://huggingface.co/NebulasBellum/Lukashenko_tarakan/tree/main
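Alternatively, the same files can be fetched without git. This is a sketch using the huggingface_hub client; the repo id is the one from the description above, everything else is standard library usage.
from huggingface_hub import snapshot_download

# Sketch: download the whole NebulasBellum/Lukashenko_tarakan repo and get
# the local path, which can then be passed to tf.keras.models.load_model.
local_dir = snapshot_download(repo_id="NebulasBellum/Lukashenko_tarakan")
print(local_dir)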
Then create the following Python script:
import tensorflow as tf
import copy
import numpy as np
# Seed text to start generating Lukashenko-style speech
seed_text = 'я не глядя поддержу'
weights_path = 'weights_lukash.h5'
model_path = 'Lukashenko_tarakan'
# Load the saved Keras model and its weights
model = tf.keras.models.load_model(model_path)
model.load_weights(weights_path)
# Show the Model summary
model.summary()
# Load the source text used to fit the tokenizer
with open('source_text_lukash.txt', 'r', encoding='utf-8') as source_text_file:
    data = source_text_file.read().splitlines()

# Drop very short lines and count the words in the remaining ones
tmp_data = copy.deepcopy(data)
sent_length = 0
for line in data:
    if len(line) < 5:
        tmp_data.remove(line)
    else:
        sent_length += len(line.split())
data = tmp_data
# Average line length in words, used below to decide how many words to generate
lstm_length = int(sent_length / len(data))
# Tokenize the dataset
token = tf.keras.preprocessing.text.Tokenizer()
token.fit_on_texts(data)
encoded_text = token.texts_to_sequences(data)
# Vocabulary size (plus one for the padding index 0)
vocab_size = len(token.word_counts) + 1
# Create the sequences
datalist = []
for d in encoded_text:
    if len(d) > 1:
        for i in range(2, len(d)):
            datalist.append(d[:i])
max_length = 20
sequences = tf.keras.preprocessing.sequence.pad_sequences(datalist, maxlen=max_length, padding='pre')
# X - input data, y - target data
X = sequences[:, :-1]
y = sequences[:, -1]
y = tf.keras.utils.to_categorical(y, num_classes=vocab_size)
seq_length = X.shape[1]
# Generate the Lukashenko speech from the seed
generated_text = ''
number_lines = 3
for i in range(number_lines):
    text_word_list = []
    for _ in range(lstm_length * 2):
        encoded = token.texts_to_sequences([seed_text])
        encoded = tf.keras.preprocessing.sequence.pad_sequences(encoded, maxlen=seq_length, padding='pre')
        y_pred = int(np.argmax(model.predict(encoded), axis=-1)[0])
        predicted_word = ""
        for word, index in token.word_index.items():
            if index == y_pred:
                predicted_word = word
                break
        seed_text = seed_text + ' ' + predicted_word
        text_word_list.append(predicted_word)
    # Start the next line from the last predicted word
    seed_text = text_word_list[-1]
    generated_text = ' '.join(text_word_list)
    generated_text += '\n'
    print(f"Lukashenko is saying: {generated_text}")
Try in HF space
A ready-to-check Space with the working model is available here: