RetrievalAttention: Accelerating Long-Context LLM Inference via Vector Retrieval • Paper 2409.10516 • Published Sep 16
MInference 1.0: Accelerating Pre-filling for Long-Context LLMs via Dynamic Sparse Attention • Paper 2407.02490 • Published Jul 2
LLMLingua-2: Data Distillation for Efficient and Faithful Task-Agnostic Prompt Compression • Paper 2403.12968 • Published Mar 19