TheBloke committed
Commit db4ef0d
Parent: 5ec0b35

Upload README.md

Files changed (1)
  1. README.md +22 -7
README.md CHANGED
````diff
@@ -61,6 +61,7 @@ Here are a list of clients and libraries that are known to support GGUF:
 <!-- repositories-available start -->
 ## Repositories available

+* [GPTQ models for GPU inference, with multiple quantisation parameter options.](https://huggingface.co/TheBloke/Falcon-180B-GPTQ)
 * [2, 3, 4, 5, 6 and 8-bit GGUF models for CPU+GPU inference](https://huggingface.co/TheBloke/Falcon-180B-GGUF)
 * [Technology Innovation Institute's original unquantised fp16 model in pytorch format, for GPU inference and for further conversions](https://huggingface.co/tiiuae/falcon-180B)
 <!-- repositories-available end -->
@@ -116,23 +117,37 @@ Refer to the Provided Files table below to see what files use which methods, and

 **Note**: the above RAM figures assume no GPU offloading. If layers are offloaded to the GPU, this will reduce RAM usage and use VRAM instead.

-### All files are split and require joining after download
+### Q6_K and Q8_0 files are split and require joining

-**Note:** HF does not support uploading files larger than 50GB. Therefore I have uploaded all files as split files
+**Note:** HF does not support uploading files larger than 50GB. Therefore I have uploaded the Q6_K and Q8_0 files as split files.

 <details>
-<summary>Click for instructions regarding joining files</summary>
+<summary>Click for instructions regarding Q6_K and Q8_0 files</summary>

-To join the files, use the following example for each file you're interested in:
+### q6_K
+Please download:
+* `falcon-180b.Q6_K.gguf-split-a`
+* `falcon-180b.Q6_K.gguf-split-b`
+
+### q8_0
+Please download:
+* `falcon-180b.Q8_0.gguf-split-a`
+* `falcon-180b.Q8_0.gguf-split-b`
+
+To join the files, do the following:

 Linux and macOS:
 ```
-cat falcon-180b.Q2_K.gguf-split-* > falcon-180b.Q2_K.gguf && rm falcon-180b.Q2_K.gguf-split-*
+cat falcon-180b.Q6_K.gguf-split-* > falcon-180b.Q6_K.gguf && rm falcon-180b.Q6_K.gguf-split-*
+cat falcon-180b.Q8_0.gguf-split-* > falcon-180b.Q8_0.gguf && rm falcon-180b.Q8_0.gguf-split-*
 ```
 Windows command line:
 ```
-COPY /B falcon-180b.Q2_K.gguf-split-a + falcon-180b.Q2_K.gguf-split-b falcon-180b.Q2_K.gguf
-del falcon-180b.Q2_K.gguf-split-a falcon-180b.Q2_K.gguf-split-b
+COPY /B falcon-180b.Q6_K.gguf-split-a + falcon-180b.Q6_K.gguf-split-b falcon-180b.Q6_K.gguf
+del falcon-180b.Q6_K.gguf-split-a falcon-180b.Q6_K.gguf-split-b
+
+COPY /B falcon-180b.Q8_0.gguf-split-a + falcon-180b.Q8_0.gguf-split-b falcon-180b.Q8_0.gguf
+del falcon-180b.Q8_0.gguf-split-a falcon-180b.Q8_0.gguf-split-b
 ```

 </details>
````
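
For convenience, here is a minimal shell sketch of the same join step the updated README describes, assuming the Q6_K and Q8_0 split parts listed above have already been downloaded into the current directory. It is simply the README's `cat`/`rm` commands wrapped in a loop, not a different procedure.

```bash
#!/usr/bin/env bash
# Minimal sketch: join the split GGUF parts, check the result, then delete the parts.
# Assumes falcon-180b.<quant>.gguf-split-a / -split-b are already in the
# current directory (file names as listed in the README above).
set -euo pipefail

for quant in Q6_K Q8_0; do
  out="falcon-180b.${quant}.gguf"
  # Concatenate the parts in lexical order (split-a, then split-b).
  cat "falcon-180b.${quant}.gguf-split-"* > "${out}"
  # Quick sanity check on the joined file before removing the parts.
  ls -lh "${out}"
  rm "falcon-180b.${quant}.gguf-split-"*
done
```

On Windows, the `COPY /B` and `del` commands from the README are the equivalent; once joined, the resulting `.gguf` file is used like any other single-file GGUF model.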