license: other
license_name: yi-license
license_link: https://huggingface.co/01-ai/Yi-34B-200K/blob/main/LICENSE
tags:
- merge
- GGUF
- imatrix
- 2bit
Kyllene-57B
Kyllene-57B quantized to 2~3 bpw GGUF
Please ❤️like❤️/📧comment📧/💌mail me some anthrax spores💌 if you use these! The download ticker won't work on a repo like this, so there's no feedback. I'm not wasting my time, right?
NOTICE: I did not start from the original full-precision file! I started from the Q6_K (there was no Q8, and more precision than that would be absurd for this). There may well be problems with these quants, but I'll eat my own entire ass if a 57B Q6_K (>6.5 bpw) is the root of any of them. More suspect is how I produced the imatrix.
imatrix included, generated from a ~900 KB text file (also included). That file was made by concatenating most of the default exllamav2 calibration data: coherent text only, with some formatting and code, but no endless broken HTML tags or other nonsense. It includes multilingual text, for those deep layers. The artefact was produced from:
```
$ cd exllamav2/conversion/standard_cal_data
$ cat technical.utf8 multilingual.utf8 code.utf8 tiny.utf8 > techmulcodetiny.utf8
```
where exllamav2/conversion/standard_cal_data holds the source files and techmulcodetiny.utf8 is the result, a file that imatrix then consumes as roughly 560 chunks.
imatrix was run with default sampling settings apart from the dataset (I think? I increased the batch count and reduced the batch size so I could cram more layers onto the GPU, but the generation should have come out the same in the end). (Someone tell me why I was wrong to run imatrix with -cb continuous batching. Shame me.)
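For reference, a minimal sketch of what such a run looks like with llama.cpp's imatrix tool; the model filename, layer count, and batch values below are placeholders, not the exact ones used:

```
# hypothetical invocation: build an importance matrix from the Q6_K source
# using the concatenated calibration file described above
./imatrix -m Kyllene-57B-Q6_K.gguf -f techmulcodetiny.utf8 -o Kyllene-57B.imatrix \
  -ngl 30 -c 512 -b 256 -cb   # -ngl/-c/-b are illustrative; -cb = continuous batching
```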
Downloads (eventually)
under consideration:
- Q2_K_S (imatrix only, but I think it's compatible with older builds. I'm not very sure what this one is.)
- Q2_K (should be strictly better than the original Q2_K, but this may be where my use of --allow-requantize comes back to bite me; we'll see. A sketch of that step follows this list.)
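A minimal sketch of that requantization step, assuming llama.cpp's quantize tool; the filenames are placeholders and the exact command used may have differed:

```
# hypothetical: requantize from the Q6_K source rather than fp16,
# steering the low-bit quant with the imatrix generated above
./quantize --allow-requantize --imatrix Kyllene-57B.imatrix \
  Kyllene-57B-Q6_K.gguf Kyllene-57B-Q2_K.gguf Q2_K
```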
upload in progress: (probably done by now)
IQ2_XS, 2.38 BPW (CUDA0 buffer size = 15941.43 MiB)
- This file only exists because I did the maths wrong (I was expecting it to be bigger), but I recall that 16 GB GPUs exist, and I may give it a go alongside Stable Diffusion.
uploads scheduled, in order: (big gpuboys just have to wait)
IQ2_M, 2.7 BPW
- briefly existed before I clobbered (verb, transitory) it. It might/will be back.
IQ3_XXS, 3.0 < size < 3.1 BPW
- 3090 enjoyers and their friends may want to run this with -nkvo and -ngl 100 (no K/V offload, 100 layers, in koboldcpp). There are 101 layers, and the last one becomes distressed if separated from its K/V cache; it will invariably choke your PCIe lanes to death as a survival mechanism. Nature is beautiful.
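A minimal sketch of that kind of launch, assuming llama.cpp's main binary (koboldcpp users would set the equivalent options there); the model filename and context size are placeholders:

```
# hypothetical: offload 100 of the 101 layers to the GPU,
# but keep the K/V cache in system RAM (-nkvo = --no-kv-offload)
./main -m Kyllene-57B-IQ3_XXS.gguf -ngl 100 -nkvo -c 4096 -i
```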