How to test the model
Hi there,
First of all, thanks for sharing this cool model.
I tried to test it but couldn't get a result, so I think the way I tried to test it might be wrong. Could you please guide me on how to test it simply, or check my code?
Here is the code that I used, but it just rewrites the query and nothing more!
from transformers import AutoTokenizer
import transformers
import torch

model = "beomi/llama-2-ko-7b"
tokenizer = AutoTokenizer.from_pretrained(model)
pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    torch_dtype=torch.float16,
    device_map="auto",
)

query = "whatever"
sequences = pipeline(
    query,
    do_sample=False,
    top_k=10,
    num_return_sequences=1,
    eos_token_id=tokenizer.eos_token_id,
    max_length=200,
)
for seq in sequences:
    print(f"Result: {seq['generated_text']}")
Thanks for your attention!
It seems to be working as intended. I tested it on Google Colab using your code and it seems to work fine.
Here's demo colab link: https://colab.research.google.com/drive/1yw2wnge6iHfj7PO5VVDA3jkmliiOqQvd?usp=sharing
What I changed is one line of code: since this checkpoint is stored in BF16, you'll need to use torch_dtype=torch.bfloat16, or remove that line entirely (the model's config already specifies torch_dtype). But actually it is not a critical issue for running the model.
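In other words, only that one argument changes; a minimal sketch of the edit, keeping the rest of your call as-is:

pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    torch_dtype=torch.bfloat16,  # match the BF16 checkpoint (or drop this line and let the model config decide)
    device_map="auto",
)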
Could you share more details about your environment (Python version, PyTorch version, GPU, NVIDIA driver version, CUDA version, transformers/tokenizers/accelerate versions)?
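If it's easier, something like this should print most of them (just a small helper sketch; it assumes the packages are importable in your environment):

import sys
import torch, transformers, tokenizers, accelerate

print("python:", sys.version.split()[0])
print("pytorch:", torch.__version__, "| cuda:", torch.version.cuda)
print("gpu:", torch.cuda.get_device_name(0) if torch.cuda.is_available() else "none")
print("transformers:", transformers.__version__)
print("tokenizers:", tokenizers.__version__)
print("accelerate:", accelerate.__version__)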
Thank you for the prompt response, and thanks for your guidance.
It seems that the issue has been solved.
I also obtained the same result that you shared:
Loading checkpoint shards: 100%|██████████| 15/15 [00:26<00:00, 1.77s/it]
Result: How's the weather today? (오늘 날씨가 어떻습니까?) 10. 오늘 저녁에 뭐 할 겁니까? What are you doing tonight? 11. 몇 시에 퇴근합니까? What time do you get off? 12. 오늘은 몇 시에 출근합니까? What time are you coming to work today? 13. 어디를 가십니까? Where are you headed? 14. 당신은 무슨 일로 전화하셨습니까? May I help you, sir? 15. 이 옷은 어때요? How does this look on me? 16. 차 한 잔 어떻습니까? How about a cup of coffee? 17. 왜 저에게 그렇게 화를 내고 있습니까? Why are you so angry with me? 18. 나는 당신을 사랑합니다. I love you. 19. 나는 당신을 좋아합니다. I like you. 20. 당신은 참 … You...
It's working, but it seems to be generating questions and translations rather than general text.
Did you train the model to generate similar questions with their translations?
I've also noticed that it doesn't work well with longer texts; it just rewrites the whole given context again.
and here are my current environment details:
python: 3.8.16
pytorch: 2.0.1+cu117
GPU: A100 80G
Nvidia-Driver Version: 495.29.05
CUDA Version: 11.5
transformers: 4.32.0.dev0
tokenizers: 0.13.3
accelerate: 0.20.3
thanks again!
It might be a sampling issue.
How about adding some temperature and top-p sampling?
The phenomenon shown is NOT intended, since I trained the model on shuffled texts.
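For example, something along these lines (the exact values are just a starting point, not tuned):

sequences = pipeline(
    query,
    do_sample=True,      # enable sampling instead of greedy decoding
    temperature=0.7,
    top_p=0.9,           # nucleus sampling
    num_return_sequences=1,
    eos_token_id=tokenizer.eos_token_id,
    max_length=200,
)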
I saw the same issue as reported by Soroor. I already used temperature 0.7, top-p 0.9.
Is there a good prompt template to use for chat?