
postgraduate thesis: Some optimization algorithms for applications in cloud computing and large language models

Title: Some optimization algorithms for applications in cloud computing and large language models
Authors: Hu, Hanyu (胡翰宇)
Advisors: Yuan, X
Issue Date: 2025
Publisher: The University of Hong Kong (Pokfulam, Hong Kong)
Citation: Hu, H. [胡翰宇]. (2025). Some optimization algorithms for applications in cloud computing and large language models. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR.
Abstract: This thesis develops novel optimization algorithms to address two critical challenges in modern information technologies: cost-efficient bandwidth allocation for cloud-based livestreaming services and efficient compression of large language models (LLMs) through unstructured, semi-structured, and structured pruning. Both problems are large-scale, combinatorial in nature, and bear significant practical implications. For bandwidth allocation, we propose a three-phase optimization framework to minimize costs under the industry-standard 95th percentile billing. Our approach integrates circling reduction techniques with clustering-guided reallocations and constraint restoration, achieving reductions of up to 86% in billed outflow bandwidth and 42% in billed back-to-source bandwidth compared to baseline methods, while respecting indivisible stream constraints and quality-of-service requirements. For LLM compression, we advance the field through two key contributions. First, we present FISTAPruner for unstructured and semi-structured pruning, which introduces a convex optimization framework leveraging the Fast Iterative Shrinkage-Thresholding Algorithm (FISTA) with intra-layer error correction and adaptive hyperparameter tuning. This approach achieves 50% sparsity while retaining 98.6% of zero-shot task performance on LLaMA-3-70B. Second, we formalize an interlinking operator property in transformers and develop rigorous mixed-integer optimization models for structured pruning. Two novel structured pruning methods, FASP and SPAP, are proposed to address this challenge. FASP combines efficient importance scoring with optimal weight reconstruction, enabling structured pruning with high efficiency while preserving model performance. SPAP employs penalty methods and alternating minimization, further enhancing pruned model performance through optimization-driven approaches.
Our structured pruning methods demonstrate superior performance across seven LLM families, including OPT, LLaMA-1/2/3/3.1/3.2, and Qwen2.5, achieving hardware-independent inference speedups of 1.29× at 30% sparsity with proportional memory reductions. The proposed methods are rigorously evaluated on synthetic and real-world cloud datasets as well as a wide range of LLMs, demonstrating scalability, practical applicability and superior performance compared with state-of-the-art baselines. This thesis bridges gaps in resource allocation and model compression, providing tools to reduce operational costs in cloud computing and enable efficient deployment of LLMs in resource-constrained environments.
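The 95th percentile billing rule referenced in the abstract can be made concrete with a short sketch. This is illustrative background, not code from the thesis; the function name and the assumption of fixed-interval (commonly 5-minute) usage samples are ours.

```python
import math

def percentile95_billed_bandwidth(samples):
    """Bandwidth charged under 95th percentile billing.

    Usage is sampled at fixed intervals over the billing period;
    the top 5% of samples are discarded and the bill is based on
    the highest remaining sample.
    """
    ordered = sorted(samples)
    # Index of the 95th percentile sample; the top 5% are "free".
    idx = math.ceil(0.95 * len(ordered)) - 1
    return ordered[idx]
```

For example, with 100 samples the five largest are ignored, so brief traffic spikes do not raise the bill; cost-minimization frameworks for this billing scheme, such as the three-phase approach described above, exploit exactly that slack.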
Degree: Doctor of Philosophy
Subject: Mathematical optimization; Cloud computing; Natural language processing (Computer science)
Dept/Program: Mathematics
Persistent Identifier: http://hdl.handle.net/10722/367422
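As background to FISTAPruner, named in the abstract: FISTA is the classical accelerated proximal-gradient method, whose l1 proximal step (soft-thresholding) is what produces sparsity. A minimal sketch for the standard LASSO problem follows; it illustrates the generic algorithm, not the thesis's pruner, and the variable names and test problem are our own.

```python
import numpy as np

def soft_threshold(v, tau):
    # Proximal operator of tau * ||.||_1 (elementwise shrinkage).
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def fista_lasso(A, b, lam, n_iter=500):
    """Minimize 0.5*||A x - b||^2 + lam*||x||_1 with FISTA."""
    L = np.linalg.norm(A, 2) ** 2          # Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    y, t = x.copy(), 1.0
    for _ in range(n_iter):
        grad = A.T @ (A @ y - b)
        x_new = soft_threshold(y - grad / L, lam / L)   # proximal-gradient step
        t_new = (1 + np.sqrt(1 + 4 * t * t)) / 2
        y = x_new + ((t - 1) / t_new) * (x_new - x)     # momentum extrapolation
        x, t = x_new, t_new
    return x
```

The momentum sequence t is what lifts the convergence rate from O(1/k) for plain ISTA to O(1/k^2); the shrinkage step zeroes out small coefficients, which is the mechanism a FISTA-based pruner uses to drive weights to exact zero.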

 

DC Field | Value | Language
dc.contributor.advisor | Yuan, X | -
dc.contributor.author | Hu, Hanyu | -
dc.contributor.author | 胡翰宇 | -
dc.date.accessioned | 2025-12-11T06:41:52Z | -
dc.date.available | 2025-12-11T06:41:52Z | -
dc.date.issued | 2025 | -
dc.identifier.citation | Hu, H. [胡翰宇]. (2025). Some optimization algorithms for applications in cloud computing and large language models. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. | -
dc.identifier.uri | http://hdl.handle.net/10722/367422 | -
dc.description.abstract | This thesis develops novel optimization algorithms to address two critical challenges in modern information technologies: cost-efficient bandwidth allocation for cloud-based livestreaming services and efficient compression of large language models (LLMs) through unstructured, semi-structured, and structured pruning. Both problems are large-scale, combinatorial in nature, and bear significant practical implications. For bandwidth allocation, we propose a three-phase optimization framework to minimize costs under the industry-standard 95th percentile billing. Our approach integrates circling reduction techniques with clustering-guided reallocations and constraint restoration, achieving reductions of up to 86% in billed outflow bandwidth and 42% in billed back-to-source bandwidth compared to baseline methods, while respecting indivisible stream constraints and quality-of-service requirements. For LLM compression, we advance the field through two key contributions. First, we present FISTAPruner for unstructured and semi-structured pruning, which introduces a convex optimization framework leveraging the Fast Iterative Shrinkage-Thresholding Algorithm (FISTA) with intra-layer error correction and adaptive hyperparameter tuning. This approach achieves 50% sparsity while retaining 98.6% of zero-shot task performance on LLaMA-3-70B. Second, we formalize an interlinking operator property in transformers and develop rigorous mixed-integer optimization models for structured pruning. Two novel structured pruning methods, FASP and SPAP, are proposed to address this challenge. FASP combines efficient importance scoring with optimal weight reconstruction, enabling structured pruning with high efficiency while preserving model performance. SPAP employs penalty methods and alternating minimization, further enhancing pruned model performance through optimization-driven approaches. Our structured pruning methods demonstrate superior performance across seven LLM families, including OPT, LLaMA-1/2/3/3.1/3.2, and Qwen2.5, achieving hardware-independent inference speedups of 1.29× at 30% sparsity with proportional memory reductions. The proposed methods are rigorously evaluated on synthetic and real-world cloud datasets as well as a wide range of LLMs, demonstrating scalability, practical applicability and superior performance compared with state-of-the-art baselines. This thesis bridges gaps in resource allocation and model compression, providing tools to reduce operational costs in cloud computing and enable efficient deployment of LLMs in resource-constrained environments. | -
dc.language | eng | -
dc.publisher | The University of Hong Kong (Pokfulam, Hong Kong) | -
dc.relation.ispartof | HKU Theses Online (HKUTO) | -
dc.rights | The author retains all proprietary rights (such as patent rights) and the right to use in future works. | -
dc.rights | This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. | -
dc.subject.lcsh | Mathematical optimization | -
dc.subject.lcsh | Cloud computing | -
dc.subject.lcsh | Natural language processing (Computer science) | -
dc.title | Some optimization algorithms for applications in cloud computing and large language models | -
dc.type | PG_Thesis | -
dc.description.thesisname | Doctor of Philosophy | -
dc.description.thesislevel | Doctoral | -
dc.description.thesisdiscipline | Mathematics | -
dc.description.nature | published_or_final_version | -
dc.date.hkucongregation | 2025 | -
dc.identifier.mmsid | 991045147148803414 | -
