Unusual behaviour with inference using transformers library
#15 by vmajor
I originally posted this on the neox GitHub page and was politely told to ask here.
Describe the bug:
The inference output does not seem sensible; please see a sample output below:
You: what are large language models?
Bot: You: what are large language models? Small language models? Are one size fits all?
<dholbach> https://wiki.ubuntu.com/UbuntuGlobalJam explains the scope of it
<nigelb> dholbach: oooh, right. /me is a bit slow
<dholbach> it's a place where anyone can do anything they like (if some rules aren't broken)
<dholbach> it's a place where people can get together and work on stuff they care about
<dholbach> if you're doing app development, if you love playing with the phone or the new tablet, or if you love doing advocacy, testing, whatever, you can do it there
<dholbach> https://wiki.ubuntu.com/UbuntuGlobalJam has more info on how you can get involved
<dholbach> https://spreadsheets.google.com/spreadsheet/ccc?key=0AkEUPNDy0YB1dDJpdE90QHVvUHZZRXBwRUhBQmdC&hl=en_US#gid=1 has a list of some ideas
<dholbach> a few ideas that folks have came up with are:
<dholbach> - a quiz with 5 questions, 1 for each day of UGJ - people can take a photo after completing the quiz and email it to the team
You:
To Reproduce
Steps to reproduce the behavior:
Run this code:
# Import the transformers library
from transformers import GPTNeoXForCausalLM, GPTNeoXTokenizerFast

# Load the tokenizer and model for gpt-neox-20b
model = GPTNeoXForCausalLM.from_pretrained("EleutherAI/gpt-neox-20b")
tokenizer = GPTNeoXTokenizerFast.from_pretrained("EleutherAI/gpt-neox-20b")

# Start a loop to get user input and generate chatbot output
while True:
    # Get user input
    user_input = input("You: ")
    # Break the loop if user types "quit"
    if user_input.lower() == "quit":
        break
    # Add a prompt to the user input
    prompt = "You: " + user_input
    # Encode the prompt using the tokenizer
    input_ids = tokenizer(prompt, return_tensors="pt").input_ids
    # Generate chatbot output using the model
    bot_output_ids = model.generate(
        input_ids,
        do_sample=True,
        temperature=0.9,
        max_length=300,
        pad_token_id=tokenizer.eos_token_id
    )
    # Decode chatbot output ids as text
    bot_output = tokenizer.decode(bot_output_ids[0], skip_special_tokens=True)
    # Print chatbot output
    print("Bot:", bot_output)
Then ask: what are large language models?
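In case it helps with reproduction, a slightly more constrained version of the generation call is sketched below. It uses max_new_tokens to cap only the newly generated tokens rather than the whole sequence; the sampling values are purely illustrative, not the settings used above.

# Same model and tokenizer as above; only the generate() call differs
bot_output_ids = model.generate(
    input_ids,
    do_sample=True,
    temperature=0.7,      # illustrative value
    top_p=0.9,            # illustrative value
    max_new_tokens=100,   # caps generated tokens only, not prompt + output
    pad_token_id=tokenizer.eos_token_id
)
bot_output = tokenizer.decode(bot_output_ids[0], skip_special_tokens=True)
print("Bot:", bot_output)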
Expected behavior:
A sensible answer of some kind.
Environment:
GPUs: 0
CPU only
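For completeness, the checkpoint was loaded exactly as in the code above. A lower-memory way to load it for CPU-only inference might look like the sketch below; the torch_dtype and low_cpu_mem_usage choices are assumptions, not what was actually used.

import torch
from transformers import GPTNeoXForCausalLM, GPTNeoXTokenizerFast

# Assumed lower-memory CPU load; bfloat16 roughly halves memory versus float32
model = GPTNeoXForCausalLM.from_pretrained(
    "EleutherAI/gpt-neox-20b",
    torch_dtype=torch.bfloat16,
    low_cpu_mem_usage=True
)
tokenizer = GPTNeoXTokenizerFast.from_pretrained("EleutherAI/gpt-neox-20b")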
There is something wrong with the model. Here is its response to a stacked query:
Query: "What is the highest mountain in the world? Tell me the height in meters."
Response: "This is my code:
import java.io.*;
import java.util.*;
public class Main {
public static void main(String[]"