
Conference Paper: LLM-PQ: Serving LLM on Heterogeneous Clusters with Phase-Aware Partition and Adaptive Quantization

Title: LLM-PQ: Serving LLM on Heterogeneous Clusters with Phase-Aware Partition and Adaptive Quantization
Authors: Zhao, Juntao; Wan, Borui; Wu, Chuan; Peng, Yanghua; Lin, Haibin
Issue Date: 2-Mar-2024
Persistent Identifier: http://hdl.handle.net/10722/340643

 

DC Field | Value | Language
dc.contributor.author | Zhao, Juntao | -
dc.contributor.author | Wan, Borui | -
dc.contributor.author | Wu, Chuan | -
dc.contributor.author | Peng, Yanghua | -
dc.contributor.author | Lin, Haibin | -
dc.date.accessioned | 2024-03-11T10:46:05Z | -
dc.date.available | 2024-03-11T10:46:05Z | -
dc.date.issued | 2024-03-02 | -
dc.identifier.uri | http://hdl.handle.net/10722/340643 | -
dc.language | eng | -
dc.relation.ispartof | the 29th ACM SIGPLAN Annual Symposium Principles and Practice of Parallel Programming (PPoPP’24) (02/03/2024-06/03/2024, Edinburgh) | -
dc.title | LLM-PQ: Serving LLM on Heterogeneous Clusters with Phase-Aware Partition and Adaptive Quantization | -
dc.type | Conference_Paper | -
