
postgraduate thesis: Performance analysis of access latency in distributed storage systems

Title: Performance analysis of access latency in distributed storage systems
Authors: Shuai, Qiqi (帥奇奇)
Issue Date: 2016
Publisher: The University of Hong Kong (Pokfulam, Hong Kong)
Citation: Shuai, Q. [帥奇奇]. (2016). Performance analysis of access latency in distributed storage systems. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. Retrieved from http://dx.doi.org/10.5353/th_b5801616.
Abstract: Access latency is a key metric in distributed storage systems because it greatly affects user experience, yet existing codes mainly focus on improving other metrics such as storage overhead and repair cost. By generating parity nodes from parity nodes, in this work we design new XOR-based erasure codes, HTSC and FH HTSC, to reduce access latency in distributed storage systems. By comparing with other popular and representative codes, we show that, under the same repair cost, HTSC and FH HTSC can reduce access latency while maintaining favorable performance in other metrics. In particular, under the same repair cost, FH HTSC achieves lower access latency, higher or equal failure tolerance, and lower computation cost than the representative codes, with similar storage overhead. Accordingly, FH HTSC is a superior choice for applications that require both low access latency and strong failure tolerance.
Both direct and k-access reads are common in distributed storage systems. However, much previous research considers only k-access reads, and many schemes, such as Redundant Scheme, have been shown to reduce latency only for k-access reads. It is unclear whether those existing schemes also work for direct reads. A study of the latency characteristics of direct reads, and of appropriate schemes to reduce their latency, is still lacking. In this work, we study the latency performance of direct reads and its correlation with degraded reads. We illustrate the relationship between degraded reads and bandwidth cost and answer important questions such as when degraded reads can help reduce latency. We then propose a scheme, DRALB, to reduce latency for direct reads. DRALB can be easily added to existing schemes and can greatly reduce the latency of hot data. We also conduct trace-driven simulations to verify that DRALB significantly outperforms existing schemes in terms of the latency performance of direct reads.
Until now, almost all previous studies have analyzed access latency when a user is interested in reading all the files in a codeword. Our research extends these studies and analyzes access latency in the general case where users request different sizes of files from a codeword. We also characterize the latency-cost tradeoffs for the general case. In addition, we study the latency performance of coding and replication with non-uniform data popularity in practical storage systems. Accounting for practical conditions and through extensive simulations using real service time traces from Amazon S3, we compare the latency performance of coding and replication and find that, in contrast to previous results, under the same storage cost it is not easy to determine which one is better, since the outcome depends on many conditions, especially on whether data popularity is uniform.
Degree: Doctor of Philosophy
Subject: Storage area networks (Computer networks); Electronic data processing - Distributed processing
Dept/Program: Electrical and Electronic Engineering
Persistent Identifier: http://hdl.handle.net/10722/246681
HKU Library Item ID: b5801616

 

DC Field | Value | Language
dc.contributor.author | Shuai, Qiqi | -
dc.contributor.author | 帥奇奇 | -
dc.date.accessioned | 2017-09-22T03:40:11Z | -
dc.date.available | 2017-09-22T03:40:11Z | -
dc.date.issued | 2016 | -
dc.identifier.citation | Shuai, Q. [帥奇奇]. (2016). Performance analysis of access latency in distributed storage systems. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. Retrieved from http://dx.doi.org/10.5353/th_b5801616. | -
dc.identifier.uri | http://hdl.handle.net/10722/246681 | -
dc.language | eng | -
dc.publisher | The University of Hong Kong (Pokfulam, Hong Kong) | -
dc.relation.ispartof | HKU Theses Online (HKUTO) | -
dc.rights | The author retains all proprietary rights, (such as patent rights) and the right to use in future works. | -
dc.rights | This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. | -
dc.subject.lcsh | Storage area networks (Computer networks) | -
dc.subject.lcsh | Electronic data processing - Distributed processing | -
dc.title | Performance analysis of access latency in distributed storage systems | -
dc.type | PG_Thesis | -
dc.identifier.hkul | b5801616 | -
dc.description.thesisname | Doctor of Philosophy | -
dc.description.thesislevel | Doctoral | -
dc.description.thesisdiscipline | Electrical and Electronic Engineering | -
dc.description.nature | published_or_final_version | -
dc.identifier.doi | 10.5353/th_b5801616 | -
