imatrix support (Space running on A10G)
continued from https://huggingface.co/spaces/ggml-org/gguf-my-repo/discussions/78
Sadly, no clue why I have no permissions to push; `huggingface-cli whoami` says the login is valid.
The diffs are naturally still off because, after the denies, I uploaded through `huggingface-cli upload ggml-org/gguf-my-repo . . --repo-type space --revision refs/pr/80` instead. Looking forward to a solution regarding the auth; it's still in draft mode, so all good until we find a way :)
Will we be able to submit our own .txt file for imatrix generation? That would be really cool. I hope this gets merged soon; it's a game changer.
Of course! The fallback file is there solely for less familiar users who would otherwise try to quantize without providing their own :)
@SixOpen - can you try creating a pull request like this: https://huggingface.co/docs/hub/en/repositories-pull-requests-discussions#pull-requests-advanced-usage - this way you should have the correct diff.
Otherwise maybe you can open a PR through the UI :/
All ready! Oddly, it still didn't push after the `huggingface-cli login`, but `git remote set-url origin` with username and token did! Glad I didn't have to clutter you with that many separate PRs, and thanks for the patience 😆
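For anyone hitting the same auth issue, the workaround described above looks roughly like this (a sketch only; `<user>` and `<token>` are placeholders, and the local branch name is assumed):

```
# Illustrative: embed the username and a write-scoped access token in
# the remote URL so pushes authenticate, then push to the PR ref.
git remote set-url origin https://<user>:<token>@huggingface.co/spaces/ggml-org/gguf-my-repo
git push origin main:refs/pr/80
```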
Brilliant! Reviewing it now!
This generally looks good to me! Thanks for keeping it clean! Would really like it if @ggerganov can give it a review too!
- Looks like we are calling `make` twice: one time in the Dockerfile `ENTRYPOINT` and one more time in `start.sh`. Maybe it is better to just call it in the Dockerfile like this:

  ```
  ENTRYPOINT ["/bin/bash", "-c", "cd llama.cpp && LLAMA_CUDA=1 make -j quantize gguf-split imatrix && cd .. && /bin/sh start.sh"]
  ```

  And simplify `start.sh` to just:

  ```
  python app.py
  ```
- In `app.py`, is it necessary to compile again? If not, then `generate_importance_matrix` can be simplified.
- Since the imatrix computation can take a lot of time if the training data is too big, we can put a time limit on the `imatrix` command - let's say 1 minute. If the process does not finish within this time limit, it gets killed and we use whatever `imatrix.dat` has been generated last (the `imatrix` tool periodically outputs the current result to `imatrix.dat`; see the `--output-frequency` CLI argument).
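The time-limit idea above can be sketched in Python like so (the function and parameter names are illustrative, not the actual `app.py` code):

```python
import subprocess

def run_imatrix_with_timeout(cmd, timeout_s=60):
    """Run `cmd`, killing it if it exceeds `timeout_s` seconds.

    Sketch only: if the process is killed, whatever imatrix.dat was
    last written to disk is the most recent periodic snapshot (see
    the --output-frequency CLI argument of the imatrix tool).
    """
    proc = subprocess.Popen(cmd)
    try:
        proc.wait(timeout=timeout_s)
    except subprocess.TimeoutExpired:
        proc.terminate()          # SIGTERM first, politely
        try:
            proc.wait(timeout=5)
        except subprocess.TimeoutExpired:
            proc.kill()           # SIGKILL if it ignores SIGTERM
            proc.wait()
    return proc.returncode
```

A call would look like `run_imatrix_with_timeout(["./imatrix", "-m", "model.gguf", "-f", "calibration.txt"], timeout_s=60)`, after which the latest `imatrix.dat` on disk is used regardless of whether the process finished.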
Great calls :) The time limit is definitely something we should have; will add that in a bit! Looks like while stashing, start.sh remained on the version prior to the entrypoint tweaks, and some LFS shenanigans might have affected the txt as well, but I'll update the branch to take care of all of that 😄 along with the superfluous compile in app.py.
Thanks @ggerganov for the review! and thanks @SixOpen for updating the PR.
Small comment - let's keep the build process in `start.sh`. This is because Spaces sometimes build the `Dockerfile` in a different environment than the final Space. If the build happens during `start.sh`, then we make sure that the build is correct for the hardware assigned to the Space (this also makes it easy for people to duplicate this Space).
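A `start.sh` along the lines suggested above might look like this (a sketch only, not the exact script from the PR):

```
#!/bin/sh
# Build inside the running Space so the binaries match the hardware
# actually assigned to it, then launch the app (illustrative sketch).
cd llama.cpp
LLAMA_CUDA=1 make -j quantize gguf-split imatrix
cd ..
python app.py
```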
Question: how are we ensuring the imatrix process runs for only a minute?
EDIT: Never mind, saw the `signal` usage. Code LGTM.
Agree to move the build inside `start.sh`. Btw, the 1-minute timeout was just an example - I'm not sure what number would make sense, so feel free to experiment if it is too short or too long.
Kalomaze's groups_merged is a very popular imatrix dataset: https://github.com/ggerganov/llama.cpp/discussions/5263#discussioncomment-8395384
I suggest running that on a large model and see how long it takes, then add a few minutes in case people want to add something to it.
Of course! Good to know about Spaces :) Will update soon, covering all of the above.
Very much looking forward to this PR getting merged.
Lovely! Looks good to me! 🚀