[Cache Request] alejandrovil/llama3-AWQ

#107
by alejandrovil - opened

Please add the following model to the neuron cache

AWS Inferentia and Trainium org

AWQ models use a 4-bit quantization scheme that is not supported on Neuron platforms.
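As context, AWQ-quantized checkpoints declare their scheme in the model's `config.json` under `quantization_config` (with `quant_method` set to `"awq"`), which is how tooling can detect them up front. A minimal sketch of such a check, using a hypothetical inline config for illustration:

```python
import json

# Hypothetical config.json content for an AWQ-quantized model; the
# quantization_config / quant_method layout follows the transformers convention.
config_json = """
{
  "model_type": "llama",
  "quantization_config": {
    "quant_method": "awq",
    "bits": 4
  }
}
"""

def is_awq_quantized(cfg: dict) -> bool:
    """Return True if the model config declares AWQ quantization."""
    qc = cfg.get("quantization_config") or {}
    return qc.get("quant_method") == "awq"

cfg = json.loads(config_json)
print(is_awq_quantized(cfg))  # True for this sample config
```

A check like this lets a caching pipeline reject unsupported quantization schemes before attempting Neuron compilation.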

dacorvo changed discussion status to closed
