Appears in Collections:
postgraduate thesis: Realizing intelligence at the wireless edge : AI model compression, downloading, and cooperative fine-tuning
Title | Realizing intelligence at the wireless edge : AI model compression, downloading, and cooperative fine-tuning |
---|---|
Authors | Wu, Hai (吴海) |
Issue Date | 2024 |
Publisher | The University of Hong Kong (Pokfulam, Hong Kong) |
Citation | Wu, H. [吴海]. (2024). Realizing intelligence at the wireless edge : AI model compression, downloading, and cooperative fine-tuning. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. |
Abstract | Recent advances in artificial intelligence (AI) have driven researchers to integrate AI models into the sixth-generation (6G) mobile network, enabling the automation of various tasks in next-generation mobile applications. By exploiting modern mobile AI accelerators, edge devices can execute AI models and training algorithms locally, benefiting from enhanced data security and faster execution. However, the current wireless network infrastructure, which primarily targets rate maximization, struggles to support these tasks owing to its limited versatility and efficiency, heterogeneous design, and lack of end-to-end performance guarantees. Realizing intelligent operations at the edge therefore calls for a task-oriented communication system with high scalability, ease of deployment, and low-cost maintenance. This dissertation contributes to intelligent edge realization through three key innovations: 1) designing scalable, high-performance AI model quantization; 2) investigating efficient multi-user model downloading via broadcasting; and 3) optimizing cooperative model fine-tuning for seamless model adaptation.
First, to support the local execution of AI models on heterogeneous devices, we study a new vertical-layered representation of neural-network weights that encapsulates all quantized models in a single one. It represents each weight as a group of bits ordered from the most significant bit down to the least significant bits; a neural network of arbitrary quantization precision can then be obtained by integrating the corresponding number of bit groups, i.e., vertical layers. To obtain high-performance vertical-layered quantized models, a simple once quantization-aware training scheme is proposed, which combines a cascade-downsampling mechanism with multi-objective optimization to train the shared source-model weights so that they are updated simultaneously while accounting for the performance of all derived models.
Next, to overcome the communication bottleneck caused by simultaneous multi-user AI model downloading, a model broadcasting and assembling (MBA) framework is proposed that leverages reusable knowledge, i.e., parameters shared among tasks/models, to enable parameter broadcasting. The MBA framework comprises the MBA protocol and a joint design of parameter selection and power control. The former defines the system operations, including parameter selection from an AI library, power control for broadcasting, and model assembling at the devices; the latter guarantees the devices' model performance while minimizing the downloading latency.
Last, a multi-device cooperation mechanism for the efficient fine-tuning of foundation models (FoMos) is proposed, allowing edge devices to align a model with user preferences. Within the device-edge cooperative fine-tuning (DEFT) paradigm, devices distributed at the wireless edge simultaneously optimize different subsets of the fine-tuning parameters of a FoMo. Because blocks residing at different depths of a FoMo incur different computation latencies and memory consumption, an integrated communication-and-computation allocation of local computation loads and uplink communication resources is optimized to achieve low-latency DEFT. |
Degree | Doctor of Philosophy |
Subject | Edge computing; Artificial intelligence; Wireless communication systems; Mobile communication systems |
Dept/Program | Electrical and Electronic Engineering |
Persistent Identifier | http://hdl.handle.net/10722/351039 |
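The vertical-layered representation described in the abstract can be illustrated with a small sketch: weights quantized to 8 bits are decomposed into bit planes (vertical layers) ordered from the most significant bit down, and a device assembles a lower-precision model by integrating only its top bit groups. The function names, the unsigned 8-bit setting, and the toy weights below are illustrative assumptions, not the thesis's actual implementation.

```python
import numpy as np

def to_bit_planes(q_weights, num_bits=8):
    """Decompose unsigned quantized weights into vertical layers
    (bit planes), ordered from most to least significant bit."""
    return [(q_weights >> b) & 1 for b in range(num_bits - 1, -1, -1)]

def assemble(planes, precision, num_bits=8):
    """Rebuild a `precision`-bit model by integrating only the top
    `precision` bit groups; the discarded low bits act as truncation."""
    w = np.zeros_like(planes[0])
    for i in range(precision):
        b = num_bits - 1 - i          # bit position encoded by plane i
        w = w + (planes[i] << b)
    return w

# Toy example: an 8-bit source model; one device assembles a 4-bit model.
q = np.array([200, 37, 129, 14], dtype=np.uint16)
planes = to_bit_planes(q)
w4 = assemble(planes, precision=4)    # → [192, 32, 128, 0]
```

Integrating all eight planes recovers the original weights exactly, so a single stored model serves every precision level.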
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Wu, Hai | - |
dc.contributor.author | 吴海 | - |
dc.date.accessioned | 2024-11-08T07:10:53Z | - |
dc.date.available | 2024-11-08T07:10:53Z | - |
dc.date.issued | 2024 | - |
dc.identifier.citation | Wu, H. [吴海]. (2024). Realizing intelligence at the wireless edge : AI model compression, downloading, and cooperative fine-tuning. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. | - |
dc.identifier.uri | http://hdl.handle.net/10722/351039 | - |
dc.language | eng | - |
dc.publisher | The University of Hong Kong (Pokfulam, Hong Kong) | - |
dc.relation.ispartof | HKU Theses Online (HKUTO) | - |
dc.rights | The author retains all proprietary rights (such as patent rights) and the right to use in future works. | - |
dc.rights | This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. | - |
dc.subject.lcsh | Edge computing | - |
dc.subject.lcsh | Artificial intelligence | - |
dc.subject.lcsh | Wireless communication systems | - |
dc.subject.lcsh | Mobile communication systems | - |
dc.title | Realizing intelligence at the wireless edge : AI model compression, downloading, and cooperative fine-tuning | - |
dc.type | PG_Thesis | - |
dc.description.thesisname | Doctor of Philosophy | - |
dc.description.thesislevel | Doctoral | - |
dc.description.thesisdiscipline | Electrical and Electronic Engineering | - |
dc.description.nature | published_or_final_version | - |
dc.date.hkucongregation | 2024 | - |
dc.identifier.mmsid | 991044869876803414 | - |
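The parameter-reuse idea behind the MBA framework in the abstract can be quantified with a back-of-the-envelope sketch: when K devices request task-specific models that share a common parameter block, broadcasting transmits that block once instead of K times. The sizes and the 90% sharing ratio below are illustrative assumptions; the actual framework additionally optimizes parameter selection and power control under model-performance guarantees.

```python
def unicast_bits(num_devices, shared_bits, task_bits):
    """Total payload when each device downloads its full model separately."""
    return num_devices * (shared_bits + task_bits)

def broadcast_bits(num_devices, shared_bits, task_bits):
    """Total payload when the shared parameters are broadcast once and
    only the task-specific parts are sent per device."""
    return shared_bits + num_devices * task_bits

# e.g., 10 devices, 90% of each 10-Mbit model shared across tasks
K, S, T = 10, 9_000_000, 1_000_000
saving = 1 - broadcast_bits(K, S, T) / unicast_bits(K, S, T)  # → 0.81
```

Even this crude accounting shows why exploiting shared parameters relieves the multi-user downloading bottleneck: the saving grows with both the number of devices and the fraction of reusable knowledge.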