Spaces:

microsoft
/

MInference

Running on Zero

iofu728 commited on Oct 16

Commit

0830175

•

1 Parent(s): 962818f

Feature(MInference): update NeurIPS'24

Files changed (1) hide show

app.py CHANGED Viewed

@@ -14,7 +14,7 @@ HF_TOKEN = os.environ.get("HF_TOKEN", None)
 DESCRIPTION = """
-# [MInference 1.0: Accelerating Pre-filling for Long-Context LLMs via Dynamic Sparse Attention](https://aka.ms/MInference) (Under Review, ES-FoMo @ ICML'24)
 _Huiqiang Jiang†, Yucheng Li†, Chengruidong Zhang†, Qianhui Wu, Xufang Luo, Surin Ahn, Zhenhua Han, Amir H. Abdi, Dongsheng Li, Chin-Yew Lin, Yuqing Yang and Lili Qiu_
@@ -23,7 +23,11 @@ _Huiqiang Jiang†, Yucheng Li†, Chengruidong Zhang†, Qianhui Wu, Xufang Luo
 <a href="https://arxiv.org/abs/2407.02490" target="blank"> [Paper]</a></h3>
 ## News
 - 🪗 [24/07/07] Thanks @AK for sponsoring. You can now use MInference online in the [HF Demo](https://huggingface.co/spaces/microsoft/MInference) with ZeroGPU.
 - 🧩 [24/07/03] We will present **MInference 1.0** at the _**Microsoft Booth**_ and _**ES-FoMo**_ at ICML'24. See you in Vienna!
 <font color="brown"><b>This is only a deployment demo. You can follow the code below to try MInference locally.</b></font>

 DESCRIPTION = """
+# [MInference 1.0: Accelerating Pre-filling for Long-Context LLMs via Dynamic Sparse Attention](https://aka.ms/MInference) (NeurIPS'24 Spotlight)
 _Huiqiang Jiang†, Yucheng Li†, Chengruidong Zhang†, Qianhui Wu, Xufang Luo, Surin Ahn, Zhenhua Han, Amir H. Abdi, Dongsheng Li, Chin-Yew Lin, Yuqing Yang and Lili Qiu_
 <a href="https://arxiv.org/abs/2407.02490" target="blank"> [Paper]</a></h3>
 ## News
+- 🧤 [24/09/26] MInference has been accepted as **spotlight** at **NeurIPS'24**. See you in Vancouver!
+- 👘 [24/09/16] We are pleased to announce the release of our KV cache offloading work, [RetrievalAttention](https://aka.ms/RetrievalAttention), which accelerates long-context LLM inference via vector retrieval.
+- 🥤 [24/07/24] MInference support [meta-llama/Meta-Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3.1-8B-Instruct) now.
 - 🪗 [24/07/07] Thanks @AK for sponsoring. You can now use MInference online in the [HF Demo](https://huggingface.co/spaces/microsoft/MInference) with ZeroGPU.
+- 📃 [24/07/03] Due to an issue with arXiv, the PDF is currently unavailable there. You can find the paper at this [link](https://export.arxiv.org/pdf/2407.02490).
 - 🧩 [24/07/03] We will present **MInference 1.0** at the _**Microsoft Booth**_ and _**ES-FoMo**_ at ICML'24. See you in Vienna!
 <font color="brown"><b>This is only a deployment demo. You can follow the code below to try MInference locally.</b></font>