postgraduate thesis: Performance analysis of access latency in distributed storage systems
Title | Performance analysis of access latency in distributed storage systems |
---|---|
Authors | Shuai, Qiqi [帥奇奇] |
Issue Date | 2016 |
Publisher | The University of Hong Kong (Pokfulam, Hong Kong) |
Citation | Shuai, Q. [帥奇奇]. (2016). Performance analysis of access latency in distributed storage systems. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. Retrieved from http://dx.doi.org/10.5353/th_b5801616. |
Abstract | Access latency is a key performance metric in distributed storage systems because it greatly impacts user experience, yet existing codes mainly focus on improving other metrics such as storage overhead and repair cost. In this work, we design new XOR-based erasure codes, HTSC and FH HTSC, that reduce access latency in distributed storage systems by generating parity nodes from parity nodes. By comparing them with other popular and representative codes, we show that, under the same repair cost, HTSC and FH HTSC can reduce access latency while maintaining favorable performance in other metrics. In particular, under the same repair cost, FH HTSC can achieve lower access latency, higher or equal failure tolerance, and lower computation cost than the representative codes, with similar storage overhead. Accordingly, FH HTSC is a superior choice for applications that require both low access latency and outstanding failure tolerance.
Both direct and k-access reads are common in distributed storage systems. However, much previous research considers only k-access reads, and many schemes, such as Redundant Scheme, have been shown to reduce latency only for k-access reads. It is unknown whether those existing schemes also work for direct reads. Studies of the latency characteristics of direct reads, and of appropriate schemes to reduce their latency, are still lacking. In this work, we study the latency performance of direct reads and its correlation with degraded reads. We illustrate the relationship between degraded reads and bandwidth cost and answer important questions such as when degraded reads can help reduce latency. We then propose a scheme, DRALB, to reduce latency for direct reads. DRALB can be easily added to existing schemes and can greatly reduce the latency of hot data. We also conduct trace-driven simulations to verify that DRALB significantly outperforms existing schemes in terms of the latency performance of direct reads.
To date, almost all previous studies analyze access latency when a user is interested in reading all the files in a codeword. Our research extends these studies by analyzing access latency in the general case in which users request different sizes of files from a codeword. We also characterize the latency-cost tradeoffs for the general case. In addition, we study the latency performance of coding and replication with non-uniform data popularity in practical storage systems. Accounting for practical conditions and using extensive simulations driven by real service time traces from Amazon S3, we compare the latency performance of coding and replication. We find that, in contrast to previous results, under the same storage cost it is not easy to determine which one is better, since the outcome depends on many conditions, especially on whether data popularity is uniform. |
Degree | Doctor of Philosophy |
Subject | Storage area networks (Computer networks); Electronic data processing - Distributed processing |
Dept/Program | Electrical and Electronic Engineering |
Persistent Identifier | http://hdl.handle.net/10722/246681 |
HKU Library Item ID | b5801616 |
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Shuai, Qiqi | - |
dc.contributor.author | 帥奇奇 | - |
dc.date.accessioned | 2017-09-22T03:40:11Z | - |
dc.date.available | 2017-09-22T03:40:11Z | - |
dc.date.issued | 2016 | - |
dc.identifier.citation | Shuai, Q. [帥奇奇]. (2016). Performance analysis of access latency in distributed storage systems. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. Retrieved from http://dx.doi.org/10.5353/th_b5801616. | - |
dc.identifier.uri | http://hdl.handle.net/10722/246681 | - |
dc.language | eng | - |
dc.publisher | The University of Hong Kong (Pokfulam, Hong Kong) | - |
dc.relation.ispartof | HKU Theses Online (HKUTO) | - |
dc.rights | The author retains all proprietary rights (such as patent rights) and the right to use in future works. | - |
dc.rights | This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. | - |
dc.subject.lcsh | Storage area networks (Computer networks) | - |
dc.subject.lcsh | Electronic data processing - Distributed processing | - |
dc.title | Performance analysis of access latency in distributed storage systems | - |
dc.type | PG_Thesis | - |
dc.identifier.hkul | b5801616 | - |
dc.description.thesisname | Doctor of Philosophy | - |
dc.description.thesislevel | Doctoral | - |
dc.description.thesisdiscipline | Electrical and Electronic Engineering | - |
dc.description.nature | published_or_final_version | - |
dc.identifier.doi | 10.5353/th_b5801616 | - |
dc.identifier.mmsid | 991043959797403414 | - |