Tried with MPS?
#7 opened by kronosprime
I adjusted the code to replace the CUDA references with MPS, but even after 20 minutes on the fastest M2 configuration with 96 GB of RAM, the generations still hadn't finished. Has anyone else seen the same result, or did it work for you?
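For anyone trying the same swap, here is a minimal sketch of device selection. It assumes the code is PyTorch-based; the poster's actual edits aren't shown, and `pick_device` / `detect_device` are hypothetical helper names. The idea is to pick the device once (CUDA, then MPS, then CPU) and pass it around, rather than hard-coding `"cuda"` throughout:

```python
def pick_device(cuda_available: bool, mps_available: bool) -> str:
    """Return a torch device string, preferring CUDA, then MPS, then CPU."""
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"


def detect_device() -> str:
    """Ask PyTorch which backends are usable; fall back to CPU if torch is missing."""
    try:
        import torch
    except ImportError:
        return "cpu"
    # torch.backends.mps only exists in newer PyTorch builds, so guard the lookup.
    mps_backend = getattr(torch.backends, "mps", None)
    mps_ok = bool(mps_backend is not None and mps_backend.is_available())
    return pick_device(torch.cuda.is_available(), mps_ok)


device = detect_device()
# Instead of hard-coded model.to("cuda") / tensor.cuda() calls, use:
#   model = model.to(device)
#   inputs = inputs.to(device)
print(f"Using device: {device}")
```

Centralising the choice this way makes it easy to compare the MPS and CPU runs on the same machine.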
You might want to run a GGUF version instead, no? Not sure whether @TheBloke has quantized this one yet.