michaelfeil committed on
Commit
6abd23d
1 Parent(s): 84a751e

Upload togethercomputer/GPT-JT-6B-v0 ctranslate fp16 weights

Files changed (1): README.md (+4 -4)
README.md CHANGED
@@ -22,18 +22,18 @@ widget:
   example_title: "Question Answering"
 ---
 # Fast-Inference with Ctranslate2
-Speedup inference by 2x-8x using int8 inference in C++
+Speed up inference while reducing memory by 2x-4x using int8 inference in C++ on CPU or GPU.
 
 quantized version of [togethercomputer/GPT-JT-6B-v0](https://huggingface.co/togethercomputer/GPT-JT-6B-v0)
 ```bash
-pip install hf-hub-ctranslate2>=2.0.6 ctranslate2>=3.13.0
+pip install hf-hub-ctranslate2>=2.0.6
 ```
 Converted on 2023-05-19 using
 ```
 ct2-transformers-converter --model togethercomputer/GPT-JT-6B-v0 --output_dir /home/michael/tmp-ct2fast-GPT-JT-6B-v0 --force --copy_files merges.txt tokenizer.json README.md tokenizer_config.json vocab.json special_tokens_map.json added_tokens.json .gitattributes --quantization float16
 ```
 
-Checkpoint compatible to [ctranslate2](https://github.com/OpenNMT/CTranslate2) and [hf-hub-ctranslate2](https://github.com/michaelfeil/hf-hub-ctranslate2)
+Checkpoint compatible with [ctranslate2>=3.13.0](https://github.com/OpenNMT/CTranslate2) and [hf-hub-ctranslate2>=2.0.6](https://github.com/michaelfeil/hf-hub-ctranslate2)
 - `compute_type=int8_float16` for `device="cuda"`
 - `compute_type=int8` for `device="cpu"`
 
@@ -51,7 +51,7 @@ model = GeneratorCT2fromHfHub(
   tokenizer=AutoTokenizer.from_pretrained("togethercomputer/GPT-JT-6B-v0")
 )
 outputs = model.generate(
-    text=["How do you call a fast Flan-ingo?", "User: How are you doing?"],
+    text=["How do you call a fast Flan-ingo?", "User: How are you doing? Bot:"],
 )
 print(outputs)
 ```
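The second hunk only shows the tail of the README's Python example. For context, a minimal end-to-end sketch of the usage the README is building toward, assuming hf-hub-ctranslate2>=2.0.6; the repo id is an assumption inferred from the converter's output directory, while the tokenizer source, the `generate()` call, and the `compute_type`/`device` pairings come from the diff itself:

```python
# Minimal usage sketch, assuming hf-hub-ctranslate2>=2.0.6.
# The repo id below is an assumption inferred from the converter's
# --output_dir; only the tokenizer source, the generate() call, and the
# compute_type/device pairings are confirmed by the diff.
from hf_hub_ctranslate2 import GeneratorCT2fromHfHub
from transformers import AutoTokenizer

model = GeneratorCT2fromHfHub(
    model_name_or_path="michaelfeil/ct2fast-GPT-JT-6B-v0",  # assumed repo id
    device="cuda",                # or "cpu"
    compute_type="int8_float16",  # "int8" on CPU, per the README
    tokenizer=AutoTokenizer.from_pretrained("togethercomputer/GPT-JT-6B-v0"),
)
outputs = model.generate(
    text=["How do you call a fast Flan-ingo?", "User: How are you doing? Bot:"],
)
print(outputs)
```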
 
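Since the upload is a plain CTranslate2 model directory holding float16 weights, it can also be loaded without the wrapper. A hedged sketch using ctranslate2's `Generator` API directly, where `compute_type` downcasts the stored weights at load time; the local path and generation parameters are placeholders, not from the diff:

```python
# Direct CTranslate2 usage sketch; the model directory path is a placeholder
# and the generation parameters are illustrative.
# compute_type downcasts the stored float16 weights at load time:
# "int8" on CPU, "int8_float16" on CUDA, as listed in the README.
import ctranslate2
from transformers import AutoTokenizer

generator = ctranslate2.Generator(
    "path/to/ct2fast-GPT-JT-6B-v0",  # placeholder local directory
    device="cpu",
    compute_type="int8",
)
tokenizer = AutoTokenizer.from_pretrained("togethercomputer/GPT-JT-6B-v0")

# CTranslate2 generators consume token strings, not token ids.
start_tokens = tokenizer.convert_ids_to_tokens(
    tokenizer.encode("User: How are you doing? Bot:")
)
results = generator.generate_batch([start_tokens], max_length=64, sampling_topk=1)
print(tokenizer.decode(results[0].sequences_ids[0]))
```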
 
 
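The CLI conversion shown in the first hunk can likewise be reproduced from Python; a sketch assuming ctranslate2>=3.13.0's `TransformersConverter`, with the output directory as a placeholder:

```python
# Python equivalent of the ct2-transformers-converter call in the diff;
# the output directory is a placeholder. Assumes ctranslate2>=3.13.0.
from ctranslate2.converters import TransformersConverter

converter = TransformersConverter(
    "togethercomputer/GPT-JT-6B-v0",
    copy_files=[
        "merges.txt", "tokenizer.json", "README.md", "tokenizer_config.json",
        "vocab.json", "special_tokens_map.json", "added_tokens.json",
        ".gitattributes",
    ],
)
converter.convert("ct2fast-GPT-JT-6B-v0", quantization="float16", force=True)
```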