doberst committed
Commit c80968e
Parent: 11c462d

Update README.md

Files changed (1): README.md (+6, -4)
README.md CHANGED

@@ -23,7 +23,9 @@ Evaluated against the benchmark test: [RAG-Instruct-Benchmark-Tester](https://
 --Summarization Quality (1-5): 4 (Above Average)
 --Hallucinations: No hallucinations observed in test runs.
 
-For test run results (and a good indicator of target use cases), please see the files ("core_rag_test" and "answer_sheet") in this repo.
+For test run results (and a good indicator of target use cases), please see the files ("core_rag_test" and "answer_sheet") in this repo.
+
+Note: the PyTorch version answered 1 question with "Not Found" while the quantized version answered it correctly, hence the small difference in scores.
 
 ### Model Description
 
@@ -99,15 +101,15 @@ If you are using a HuggingFace generation script:
 inputs = tokenizer(new_prompt, return_tensors="pt")
 start_of_output = len(inputs.input_ids[0])
 
-# temperature: set at 0.3 for consistency of output
+# temperature: set at 0.0 with do_sample=False for consistency of output
 # max_new_tokens: set at 100 - may prematurely stop a few of the summaries
 
 outputs = model.generate(
     inputs.input_ids.to(device),
     eos_token_id=tokenizer.eos_token_id,
     pad_token_id=tokenizer.eos_token_id,
-    do_sample=True,
-    temperature=0.3,
+    do_sample=False,
+    temperature=0.0,
     max_new_tokens=100,
     )
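The second hunk switches `model.generate` from sampled decoding (`do_sample=True, temperature=0.3`) to greedy decoding (`do_sample=False`), which is what makes repeated runs produce identical output. A minimal sketch of the difference, independent of the model itself; the helper names `greedy_pick` and `sample_pick` are hypothetical, not part of this repo:

```python
import math
import random

def greedy_pick(logits):
    # do_sample=False: take the argmax token, so the same logits
    # always yield the same token id on every run.
    return max(range(len(logits)), key=lambda i: logits[i])

def sample_pick(logits, temperature, rng):
    # do_sample=True: weight tokens by a temperature-scaled softmax
    # and draw; the chosen token id can vary from run to run.
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(x - peak) for x in scaled]
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]

logits = [1.0, 3.5, 2.0]
print(greedy_pick(logits))                          # always 1 (argmax)
print(sample_pick(logits, 0.3, random.Random(0)))   # seed-dependent
```

At temperature 0.3 the distribution is already sharply peaked on the argmax token, so sampling usually agrees with greedy decoding, but only `do_sample=False` guarantees it.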