Feature(MInference): update NeurIPS'24
app.py
CHANGED
@@ -14,7 +14,7 @@ HF_TOKEN = os.environ.get("HF_TOKEN", None)
 
 
 DESCRIPTION = """
-# [MInference 1.0: Accelerating Pre-filling for Long-Context LLMs via Dynamic Sparse Attention](https://aka.ms/MInference) (
+# [MInference 1.0: Accelerating Pre-filling for Long-Context LLMs via Dynamic Sparse Attention](https://aka.ms/MInference) (NeurIPS'24 Spotlight)
 
 _Huiqiang Jiang†, Yucheng Li†, Chengruidong Zhang†, Qianhui Wu, Xufang Luo, Surin Ahn, Zhenhua Han, Amir H. Abdi, Dongsheng Li, Chin-Yew Lin, Yuqing Yang and Lili Qiu_
 
@@ -23,7 +23,11 @@ _Huiqiang Jiang†, Yucheng Li†, Chengruidong Zhang†, Qianhui Wu, Xufang Luo
 <a href="https://arxiv.org/abs/2407.02490" target="blank"> [Paper]</a></h3>
 
 ## News
+- 🧤 [24/09/26] MInference has been accepted as **spotlight** at **NeurIPS'24**. See you in Vancouver!
+- 👘 [24/09/16] We are pleased to announce the release of our KV cache offloading work, [RetrievalAttention](https://aka.ms/RetrievalAttention), which accelerates long-context LLM inference via vector retrieval.
+- 🥤 [24/07/24] MInference support [meta-llama/Meta-Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3.1-8B-Instruct) now.
 - 🪗 [24/07/07] Thanks @AK for sponsoring. You can now use MInference online in the [HF Demo](https://huggingface.co/spaces/microsoft/MInference) with ZeroGPU.
+- 📃 [24/07/03] Due to an issue with arXiv, the PDF is currently unavailable there. You can find the paper at this [link](https://export.arxiv.org/pdf/2407.02490).
 - 🧩 [24/07/03] We will present **MInference 1.0** at the _**Microsoft Booth**_ and _**ES-FoMo**_ at ICML'24. See you in Vienna!
 
 <font color="brown"><b>This is only a deployment demo. You can follow the code below to try MInference locally.</b></font>