postgraduate thesis: SIMPO : a scalable in-memory persistent object framework using NVRAM for reliable big data computing

Title: SIMPO : a scalable in-memory persistent object framework using NVRAM for reliable big data computing
Authors: Zhang, Mingzhe (张明哲)
Advisor(s): Wang, CL
Issue Date: 2018
Publisher: The University of Hong Kong (Pokfulam, Hong Kong)
Citation: Zhang, M. [张明哲]. (2018). SIMPO : a scalable in-memory persistent object framework using NVRAM for reliable big data computing. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR.
Abstract: While CPU architectures are incorporating many more cores to meet ever-bigger workloads, advances in scalable fault-tolerance support are indispensable for sustaining system performance under reliability constraints. Emerging non-volatile memory (NVRAM) technologies are yielding fast, dense, energy-efficient, and byte-addressable memory that can dethrone SSDs or HDDs for persisting data. Research on using NVRAM to enable fast in-memory data persistence is ongoing. It is therefore attractive and significant to design persistent data storage, propose efficient logging and checkpointing mechanisms, and optimize program scalability on multi-socket, multi-core machines with NVRAM. To achieve data persistence, state-of-the-art techniques apply logging and checkpointing mechanisms. Log-based systems can be further classified into two types: value logging and function logging. For every function or transaction on the persistent data, systems with value logging record the updated data. Other systems use function logging (a.k.a. operation logging) together with checkpointing; they record the function pointer and its arguments.
In this work, we design and implement a scalable persistent data system, which exploits NVRAM alongside DRAM to support efficient data persistence in highly threaded big data applications. We study both value logging and function logging mechanisms and analyze the applications to which each is suited. Based on function logging, we develop and optimize our log recording and checkpointing scheme for NVRAM, mitigating its long write latency through write-combining and consolidated flushing techniques. Efficient persistent object management, with features including safe references and memory leak prevention, is also implemented and tailored to NVRAM. Based on value logging, we design a fine-grained differential logging scheme that flushes non-contiguous dirty data at cache-line granularity to NVRAM to guarantee persistence. Unlike previous work that persists updated data at block granularity, we effectively reduce the size of the log data written to NVRAM.
To achieve multi-core scalability, we study the flat combining technique. By integrating function logging mechanisms with flat combining, we propose a new programming model that classifies functions into instant and deferrable groups. We feature a streamlined execution model, which allows lazy evaluation of deferrable functions and is well suited to big data computing workloads that would see improved data locality and concurrency. We evaluate a wide range of applications with machine learning, high-performance computing, database, and big data workloads. Experiments show that our approaches can significantly improve the scalability and runtime performance of data persistence systems on multi-core architectures.
Degree: Doctor of Philosophy
Subject: Flash memories (Computers)
Dept/Program: Computer Science
Persistent Identifier: http://hdl.handle.net/10722/263213
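
The abstract's distinction between value logging and function logging can be illustrated with a small sketch. This is not the thesis's implementation: the `ValueLog` and `FunctionLog` classes, the `OPS` registry, and the `add` operation are hypothetical names chosen for illustration, ordinary Python lists stand in for logs that a real system would flush to NVRAM, and an operation name stands in for the function pointer.

```python
from typing import Callable, Dict, List, Optional, Tuple


def add(state: Dict[str, int], key: str, delta: int) -> None:
    """Example update operation on the persistent data (hypothetical)."""
    state[key] = state.get(key, 0) + delta


# Hypothetical registry: a name stands in for a recorded function pointer.
OPS: Dict[str, Callable[..., None]] = {"add": add}


class ValueLog:
    """Value logging: record the updated data after each operation."""

    def __init__(self) -> None:
        self.entries: List[Tuple[str, int]] = []

    def apply(self, state: Dict[str, int], op: str, key: str, delta: int) -> None:
        OPS[op](state, key, delta)
        # Log the key and its new value (assumption: one key per update).
        self.entries.append((key, state[key]))

    def replay(self) -> Dict[str, int]:
        # Recovery copies the logged values back; nothing is re-executed.
        state: Dict[str, int] = {}
        for key, value in self.entries:
            state[key] = value
        return state


class FunctionLog:
    """Function logging: record the operation and its arguments."""

    def __init__(self) -> None:
        self.entries: List[Tuple[str, tuple]] = []

    def apply(self, state: Dict[str, int], op: str, *args) -> None:
        # Log first, then execute, so a logged operation can be redone.
        self.entries.append((op, args))
        OPS[op](state, *args)

    def replay(self, checkpoint: Optional[Dict[str, int]] = None) -> Dict[str, int]:
        # Recovery starts from a checkpoint and re-executes logged operations.
        state = dict(checkpoint or {})
        for op, args in self.entries:
            OPS[op](state, *args)
        return state
```

In this sketch the value log grows with the amount of data each operation changes, while the function log grows only with the size of the arguments but must re-run every operation since the last checkpoint during recovery.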

 

DC Field | Value | Language
dc.contributor.advisor | Wang, CL | -
dc.contributor.author | Zhang, Mingzhe | -
dc.contributor.author | 张明哲 | -
dc.date.accessioned | 2018-10-16T07:35:01Z | -
dc.date.available | 2018-10-16T07:35:01Z | -
dc.date.issued | 2018 | -
dc.identifier.citation | Zhang, M. [张明哲]. (2018). SIMPO : a scalable in-memory persistent object framework using NVRAM for reliable big data computing. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. | -
dc.identifier.uri | http://hdl.handle.net/10722/263213 | -
dc.language | eng | -
dc.publisher | The University of Hong Kong (Pokfulam, Hong Kong) | -
dc.relation.ispartof | HKU Theses Online (HKUTO) | -
dc.rights | The author retains all proprietary rights, (such as patent rights) and the right to use in future works. | -
dc.rights | This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. | -
dc.subject.lcsh | Flash memories (Computers) | -
dc.title | SIMPO : a scalable in-memory persistent object framework using NVRAM for reliable big data computing | -
dc.type | PG_Thesis | -
dc.description.thesisname | Doctor of Philosophy | -
dc.description.thesislevel | Doctoral | -
dc.description.thesisdiscipline | Computer Science | -
dc.description.nature | published_or_final_version | -
dc.identifier.doi | 10.5353/th_991044046590503414 | -
dc.date.hkucongregation | 2018 | -
dc.identifier.mmsid | 991044046590503414 | -
