zhangce committed on
Commit
65778e0
1 Parent(s): b02c8e1

Update README.md

Files changed (1)
  1. README.md +3 -3
README.md CHANGED
@@ -67,10 +67,10 @@ Please refer to [OpenChatKit](https://github.com/togethercomputer/OpenChatKit) f
 
  ## Inference
 
- You can use the Together API to try out Llama-2-7B-32K-beta for inference.
- The updated inference stack allows for efficient and speedy inference.
+ You can use the [Together API](https://together.ai/blog/api-announcement) to try out Llama-2-7B-32K-beta for inference.
+ The updated inference stack allows for efficient inference.
 
- To run the model locally, we strongly recommend to install Flash Attention V2:
+ To run the model locally, we strongly recommend to install Flash Attention V2, which is necessary to obtain the best performance:
  ```
  # Please update the path of `CUDA_HOME`
  export CUDA_HOME=/usr/local/cuda-11.8