AQLM-executorch / README.md
BlackSamorez's picture
Update README.md
d1e0141 verified
|
raw
history blame
725 Bytes

Bringing SOTA quantization to mobile LLM deployment: A practical Executorch integration guide

Article: https://blacksamorez.substack.com/p/aqlm-executorch-android

Usage

  • Download and install the .apk file on your Android phone.
  • Download the .pte and .model files and put them into the /data/local/tmp/llama folder on your Android phone.
  • Running the app you will see the option to load the .pte and .model files. After loading them, you'll be able to chat with the model.

Requirements

This app was tested on Samsung S24 Ultra running Android 14.

Limitations

  • Although the app looks like chat, generation requests are independent.
  • Llama-3 chat template is hard-coded into the app.