Transparent and efficient fault tolerance on high-speed datacenter networking

Wang, Cheng; 王成

Appears in Collections:
- HKU Theses Online
- Computer Science: Theses

postgraduate thesis: Transparent and efficient fault tolerance on high-speed datacenter networking

Title	Transparent and efficient fault tolerance on high-speed datacenter networking
Authors	Wang, Cheng 王成
Advisors	Cui, H; Lau, FCM
Issue Date	2019
Publisher	The University of Hong Kong (Pokfulam, Hong Kong)
Citation	Wang, C. [王成]. (2019). Transparent and efficient fault tolerance on high-speed datacenter networking. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR.
Abstract	Hardware failures are commonplace in datacenters and typically lead to catastrophic consequences. For example, in March 2017, Amazon's storage service (S3) went offline for an extended period of time due to server failures in the datacenter, and this outage caused the company to lose over $150 million. To achieve fault tolerance for important services deployed in the datacenter, replication is the de facto standard technique. The core of replication is to provide multiple identical copies of a service on different physical machines so that the service can continue to operate even if some of its replicas fail. Unfortunately, despite much research and engineering effort, a replication system that is both transparent and efficient is still missing. A transparent replication system runs general applications without any modifications, and thus relieves application developers of intrusive and error-prone modifications to the program’s structure and code. An efficient replication system imposes only a reasonable performance overhead on the underlying applications. This thesis proposes three transparent and efficient replication systems: APUS, PLOVER, and GANNET. To achieve transparency, APUS, PLOVER, and GANNET utilise unique features at three different system levels: the application level, the virtual machine level, and the storage level, respectively. To build efficient (high-performance) systems, they leverage high-speed RDMA networking in modern datacenters, with new protocols designed to exploit the full potential of this emerging hardware.
Degree	Doctor of Philosophy
Subject	Computer networks
Dept/Program	Computer Science
Persistent Identifier	http://hdl.handle.net/10722/283115

DC Field	Value	Language
dc.contributor.advisor	Cui, H	-
dc.contributor.advisor	Lau, FCM	-
dc.contributor.author	Wang, Cheng	-
dc.contributor.author	王成	-
dc.date.accessioned	2020-06-10T01:02:12Z	-
dc.date.available	2020-06-10T01:02:12Z	-
dc.date.issued	2019	-
dc.identifier.citation	Wang, C. [王成]. (2019). Transparent and efficient fault tolerance on high-speed datacenter networking. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR.	-
dc.identifier.uri	http://hdl.handle.net/10722/283115	-
dc.description.abstract	Hardware failures are commonplace in datacenters and typically lead to catastrophic consequences. For example, in March 2017, Amazon's storage service (S3) went offline for an extended period of time due to server failures in the datacenter, and this outage caused the company to lose over $150 million. To achieve fault tolerance for important services deployed in the datacenter, replication is the de facto standard technique. The core of replication is to provide multiple identical copies of a service on different physical machines so that the service can continue to operate even if some of its replicas fail. Unfortunately, despite much research and engineering effort, a replication system that is both transparent and efficient is still missing. A transparent replication system runs general applications without any modifications, and thus relieves application developers of intrusive and error-prone modifications to the program’s structure and code. An efficient replication system imposes only a reasonable performance overhead on the underlying applications. This thesis proposes three transparent and efficient replication systems: APUS, PLOVER, and GANNET. To achieve transparency, APUS, PLOVER, and GANNET utilise unique features at three different system levels: the application level, the virtual machine level, and the storage level, respectively. To build efficient (high-performance) systems, they leverage high-speed RDMA networking in modern datacenters, with new protocols designed to exploit the full potential of this emerging hardware.	-
dc.language	eng	-
dc.publisher	The University of Hong Kong (Pokfulam, Hong Kong)	-
dc.relation.ispartof	HKU Theses Online (HKUTO)	-
dc.rights	The author retains all proprietary rights (such as patent rights) and the right to use in future works.	-
dc.rights	This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.	-
dc.subject.lcsh	Computer networks	-
dc.title	Transparent and efficient fault tolerance on high-speed datacenter networking	-
dc.type	PG_Thesis	-
dc.description.thesisname	Doctor of Philosophy	-
dc.description.thesislevel	Doctoral	-
dc.description.thesisdiscipline	Computer Science	-
dc.description.nature	published_or_final_version	-
dc.date.hkucongregation	2020	-
dc.identifier.mmsid	991044242096103414	-
