Links for fulltext
(May Require Subscription)
- Publisher Website: 10.1145/2967360.2967364
- Scopus: eid_2-s2.0-84986569448
Appears in Collections:
- Conference Paper: TRIPOD: an efficient, highly-available Cluster Management System
Title | TRIPOD: an efficient, highly-available Cluster Management System |
---|---|
Authors | Wang, C; Yang, JY; Yi, N; Cui, H |
Issue Date | 2016 |
Publisher | ACM. |
Citation | The 7th ACM SIGOPS Asia-Pacific Workshop on Systems (APSys 2016), Hong Kong, China, 4-5 August 2016. In Conference Proceedings, 2016, p. 1-9 |
Abstract | Driven by increasing computational demands, cluster management systems (e.g., MESOS) are already pervasive for deploying many applications. Unfortunately, despite much effort, existing systems still struggle to meet the stringent requirements of critical applications (e.g., trading and military applications), because these applications naturally require high availability and low performance overhead in deployment. Existing systems typically replicate their job controllers so that the controllers are highly available and can handle application failures. However, the applications themselves are often still a single point of failure, leaving arbitrary windows of unavailability. This paper proposes the design of TRIPOD, a cluster management system that automatically provides high availability to general applications. TRIPOD’s key to making applications highly available efficiently is a new PAXOS replication protocol that leverages RDMA (Remote Direct Memory Access). TRIPOD runs replicas of the same job with replicas of the controllers, and the controllers agree on job requests efficiently using this protocol. Evaluation shows that TRIPOD incurs low performance overhead in both throughput and response time compared to an application’s unreplicated execution. |
Description | Article no. 9 |
Persistent Identifier | http://hdl.handle.net/10722/229719 |
ISBN | 978-1-4503-4265-0 |
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Wang, C | - |
dc.contributor.author | Yang, JY | - |
dc.contributor.author | Yi, N | - |
dc.contributor.author | Cui, H | - |
dc.date.accessioned | 2016-08-23T14:12:51Z | - |
dc.date.available | 2016-08-23T14:12:51Z | - |
dc.date.issued | 2016 | - |
dc.identifier.citation | The 7th ACM SIGOPS Asia-Pacific Workshop on Systems (APSys 2016), Hong Kong, China, 4-5 August 2016. In Conference Proceedings, 2016, p. 1-9 | - |
dc.identifier.isbn | 978-1-4503-4265-0 | - |
dc.identifier.uri | http://hdl.handle.net/10722/229719 | - |
dc.description | Article no. 9 | - |
dc.description.abstract | Driven by increasing computational demands, cluster management systems (e.g., MESOS) are already pervasive for deploying many applications. Unfortunately, despite much effort, existing systems still struggle to meet the stringent requirements of critical applications (e.g., trading and military applications), because these applications naturally require high availability and low performance overhead in deployment. Existing systems typically replicate their job controllers so that the controllers are highly available and can handle application failures. However, the applications themselves are often still a single point of failure, leaving arbitrary windows of unavailability. This paper proposes the design of TRIPOD, a cluster management system that automatically provides high availability to general applications. TRIPOD’s key to making applications highly available efficiently is a new PAXOS replication protocol that leverages RDMA (Remote Direct Memory Access). TRIPOD runs replicas of the same job with replicas of the controllers, and the controllers agree on job requests efficiently using this protocol. Evaluation shows that TRIPOD incurs low performance overhead in both throughput and response time compared to an application’s unreplicated execution. | - |
dc.language | eng | - |
dc.publisher | ACM. | - |
dc.relation.ispartof | Proceedings of the 7th ACM SIGOPS Asia-Pacific Workshop on Systems, APSys '16 | - |
dc.title | TRIPOD: an efficient, highly-available Cluster Management System | - |
dc.type | Conference_Paper | - |
dc.identifier.email | Cui, H: heming@hku.hk | - |
dc.identifier.authority | Cui, H=rp02008 | - |
dc.description.nature | link_to_OA_fulltext | - |
dc.identifier.doi | 10.1145/2967360.2967364 | - |
dc.identifier.scopus | eid_2-s2.0-84986569448 | - |
dc.identifier.hkuros | 262858 | - |
dc.identifier.spage | 1 | - |
dc.identifier.epage | 9 | - |
dc.publisher.place | United States | - |
dc.customcontrol.immutable | sml 160919 | - |