Bloomz-3b Refuses to Summarize Text
I read that Bloomz was better suited for summarization tasks than the regular BLOOM model.
However, in my experience it refuses to summarize text. I have tried several prompts to no avail, and it always returns the input text.
Has anyone had success with abstractive summarization with Bloomz?
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

device = 'cuda:0' if torch.cuda.is_available() else 'cpu'

tokenizer = AutoTokenizer.from_pretrained("bigscience/bloomz-3b")
# Load with automatic dtype/device placement and CPU offloading for weights that don't fit
model = AutoModelForCausalLM.from_pretrained("bigscience/bloomz-3b", torch_dtype='auto', device_map='auto', offload_folder='offload', offload_state_dict=True)

text = "In this lesson, we will learn about the different types of clouds that exist in the atmosphere. Clouds are classified based on their height, shape, and composition. The three main cloud types are cumulus, stratus, and cirrus. Cumulus clouds are puffy and white, with a flat base and a rounded top. Stratus clouds are low, gray, and flat, and they often cover the entire sky. Cirrus clouds are high and thin, with a wispy appearance. They are often an indicator of an approaching storm. Understanding the different cloud types is important for weather forecasting and aviation safety."

# Instruction placed before the context
inputs = tokenizer.encode(f'Write a brief summary for the following text that focuses on the main idea:\n\n{text}', return_tensors="pt").to(device)
outputs = model.generate(inputs, max_new_tokens=64)
summary = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(summary)
Output:
Write a brief summary for the following text that focuses on the main idea:
In this lesson, we will learn about the different types of clouds that exist in the atmosphere. Clouds are classified based on their height, shape, and composition. The three main cloud types are cumulus, stratus, and cirrus. Cumulus clouds are puffy and white, with a flat base and a rounded top. Stratus clouds are low, gray, and flat, and they often cover the entire sky. Cirrus clouds are high and thin, with a wispy appearance. They are often an indicator of an approaching storm. Understanding the different cloud types is important for weather forecasting and aviation safety.
It's not returning the input text; it's finishing the generation directly after your prompt. You can see that by setting skip_special_tokens=False.
It's best to have the instruction after the context; see the generation below, which I got with bloomz-3b:
# Instruction placed after the context
inputs = tokenizer.encode(f'{text}\n\nWrite a short summary for the prior text.', return_tensors="pt").to(device)
outputs = model.generate(inputs, max_new_tokens=64)
summary = tokenizer.decode(outputs[0], skip_special_tokens=False)
print(summary)
Learn about different types of clouds.
Thanks, adding the instruction after the context helped. However, it looks like Bloomz is doing extractive summarization rather than abstractive summarization.
I tried several prompts such as:
- What are the main points of the prior text?
- What was the prior text about?
- Summarize the prior text.
- Write a brief summary of the prior text that highlights its key points.
- Write a summary of the prior text that highlights its key points.
But all of them returned essentially the same output, "Learn about different types of clouds." (the last two prompts changed "learn" to "identify").
Even using a larger model (bigscience/bloomz-7b1) yielded the same results.
How can I use Bloomz for abstractive summarization?
You could give it a few-shot example, e.g. provide one example with an abstractive summary at the beginning of your prompt.
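For instance, a minimal one-shot sketch that reuses text, tokenizer, model, and device from the code above; the example passage and its summary are invented for illustration, so swap in your own:

# Hypothetical one-shot example: the passage and summary below are made up for illustration
example_text = "Photosynthesis is the process by which plants convert sunlight, water, and carbon dioxide into glucose and oxygen. It takes place in the chloroplasts and underpins most food chains."
example_summary = "Plants use sunlight to turn water and carbon dioxide into food, producing the oxygen most life depends on."

prompt = f"{example_text}\n\nWrite a short summary for the prior text: {example_summary}\n\n{text}\n\nWrite a short summary for the prior text:"
inputs = tokenizer.encode(prompt, return_tensors="pt").to(device)
outputs = model.generate(inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))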
@Muennighoff Hi, I have the same problem. I tried skip_special_tokens=False, but the prompt is still printed; does the instruction need to be changed?
If it's just generating the endoftext token, i.e. it does not generate anything and changing the prompt does not help, you can enforce a minimum number of tokens by setting min_new_tokens to a value larger than 0, which ignores the endoftext token for that many tokens.
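Something like the following, reusing the inputs from the earlier snippet (min_new_tokens is available in recent transformers versions; check that yours supports it):

# Suppress the endoftext token for at least 10 generated tokens
outputs = model.generate(inputs, max_new_tokens=64, min_new_tokens=10)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))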
Thank you. Sorry, I didn't describe my problem clearly. In my case, the output is prompt + generation.
For example, the output is:
"Here is an example.
Generate a summary: This is an example.
Test sentence .....
Generate a summary: This is a test sentence."
What I want is only: "This is a test sentence."
Oh, that is just the default behavior of generate: it returns the input prompt along with the generation; the model is not actually generating that part.
You can remove it by doing something like gen = gen[len(prompt):]
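A minimal sketch of that, reusing prompt, inputs, and outputs from above; the second option (slicing the token ids before decoding) is my own addition and avoids character-level mismatches that decoding the full sequence can introduce:

# Option 1: drop the prompt from the decoded string by character length
gen = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(gen[len(prompt):])

# Option 2: decode only the newly generated token ids
new_token_ids = outputs[0][inputs.shape[1]:]
print(tokenizer.decode(new_token_ids, skip_special_tokens=True))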
Thank you for your answer! It helps. :)
Another question: if the example is in English and the test sentence is in another language, is there any way to prevent the generation from always being in English?
Explicitly specifying the language may work, e.g. "Please reply in Japanese."
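For example, a hypothetical prompt along those lines (the instruction wording is an assumption on my part; adjust it to your target language):

# Hypothetical prompt with an explicit language instruction appended
prompt = f"{text}\n\nWrite a short summary for the prior text. Please reply in Japanese."
inputs = tokenizer.encode(prompt, return_tensors="pt").to(device)
outputs = model.generate(inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))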
I will try this method. Thank you for your help!