postgraduate thesis: SIMPO : a scalable in-memory persistent object framework using NVRAM for reliable big data computing
Title | SIMPO : a scalable in-memory persistent object framework using NVRAM for reliable big data computing |
---|---|
Authors | Zhang, Mingzhe (张明哲) |
Advisors | Wang, CL |
Issue Date | 2018 |
Publisher | The University of Hong Kong (Pokfulam, Hong Kong) |
Citation | Zhang, M. [张明哲]. (2018). SIMPO : a scalable in-memory persistent object framework using NVRAM for reliable big data computing. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. |
Abstract | While CPU architectures are incorporating many more cores to meet ever-bigger workloads, advances in scalable fault-tolerance support are indispensable for sustaining system performance under reliability constraints. Emerging non-volatile memory (NVRAM) technologies are yielding fast, dense, energy-efficient, and byte-addressable memory that can dethrone SSDs or HDDs for persisting data. Research on using NVRAM to enable fast in-memory data persistence is ongoing. It is attractive and significant to design persistent data storage, propose efficient logging and checkpointing mechanisms, and optimize program scalability on multi-socket, multi-core machines with NVRAM. To achieve data persistence, state-of-the-art techniques apply logging and checkpointing mechanisms. Log-based systems can be further classified into two types: value logging and function logging. For every function or transaction on the persistent data, systems with value logging record the updated data. Other systems use function logging (a.k.a. operation logging) together with checkpointing; they record the function pointer and its arguments. In this work, we design and implement a scalable persistent data system, which exploits NVRAM, alongside DRAM, to support efficient data persistence in highly-threaded big data applications. We study both value logging and function logging mechanisms and analyze the applications each is suited to. Based on function logging, we develop and optimize a log recording and checkpointing scheme for NVRAM, mitigating its long write latency through write-combining and consolidated flushing techniques. Efficient persistent object management, with features including safe references and memory leak prevention, is also implemented and tailored to NVRAM. Based on value logging, we design a fine-grained differential logging scheme that flushes noncontiguous dirty data at cache-line granularity to NVRAM to guarantee persistence. Unlike previous work that persists updated data at block granularity, we effectively reduce the size of the log data written to NVRAM. To achieve multi-core scalability, we study the flat combining technique. By integrating function logging mechanisms with flat combining, we propose a new programming model that classifies functions into instant and deferrable groups. We feature a streamlined execution model, which allows lazy evaluation of deferrable functions and is well-suited to big data computing workloads that would see improved data locality and concurrency. We evaluate a wide range of applications with machine learning, high-performance computing, database, and big data workloads. Experiments show that our approaches can significantly improve the scalability and runtime performance of data persistence systems on multi-core architectures. |
Degree | Doctor of Philosophy |
Subject | Flash memories (Computers) |
Dept/Program | Computer Science |
Persistent Identifier | http://hdl.handle.net/10722/263213 |
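
The abstract distinguishes value logging (record the updated data) from function logging (record the function and its arguments, then replay from a checkpoint). The following is a minimal Python sketch of that distinction only. It is not the thesis's implementation: the names `ValueLog`, `FunctionLog`, `REGISTRY`, `add`, `scale`, and `demo` are hypothetical, a name-to-callable registry stands in for function pointers, and plain in-memory lists stand in for logs that a real system would flush to NVRAM.

```python
# Hypothetical illustration; none of these names come from the thesis.

class ValueLog:
    """Value logging: each entry stores the updated data itself."""

    def __init__(self):
        self.entries = []  # list of (key, new_value)

    def record(self, key, new_value):
        self.entries.append((key, new_value))

    def recover(self, checkpoint):
        # Rebuild state by overwriting checkpointed values with logged ones.
        state = dict(checkpoint)
        for key, value in self.entries:
            state[key] = value
        return state


# Operations that mutate the persistent state in place.
def add(state, key, delta):
    state[key] = state.get(key, 0) + delta


def scale(state, key, factor):
    state[key] = state.get(key, 0) * factor


# Assumption: a registry keyed by name plays the role of function pointers.
REGISTRY = {"add": add, "scale": scale}


class FunctionLog:
    """Function logging: each entry stores a function name and its arguments."""

    def __init__(self):
        self.entries = []  # list of (function_name, args)

    def apply(self, state, name, *args):
        self.entries.append((name, args))  # log first, then execute
        REGISTRY[name](state, *args)

    def recover(self, checkpoint):
        # Rebuild state by replaying every logged call on the checkpoint.
        state = dict(checkpoint)
        for name, args in self.entries:
            REGISTRY[name](state, *args)
        return state


def demo():
    checkpoint = {"x": 1}
    live = dict(checkpoint)
    function_log = FunctionLog()
    value_log = ValueLog()
    for name, args in [("add", ("x", 4)), ("scale", ("x", 3))]:
        function_log.apply(live, name, *args)
        value_log.record(args[0], live[args[0]])  # log the resulting value
    return live, function_log.recover(checkpoint), value_log.recover(checkpoint)
```

In this sketch both logs recover the same state from the same checkpoint. The trade-off it hints at is that a function-log entry stays small regardless of how much data the call touches but must be re-executed on recovery, whereas a value-log entry grows with the data updated but needs no re-execution.
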
DC Field | Value | Language |
---|---|---|
dc.contributor.advisor | Wang, CL | - |
dc.contributor.author | Zhang, Mingzhe | - |
dc.contributor.author | 张明哲 | - |
dc.date.accessioned | 2018-10-16T07:35:01Z | - |
dc.date.available | 2018-10-16T07:35:01Z | - |
dc.date.issued | 2018 | - |
dc.identifier.citation | Zhang, M. [张明哲]. (2018). SIMPO : a scalable in-memory persistent object framework using NVRAM for reliable big data computing. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR. | - |
dc.identifier.uri | http://hdl.handle.net/10722/263213 | - |
dc.language | eng | - |
dc.publisher | The University of Hong Kong (Pokfulam, Hong Kong) | - |
dc.relation.ispartof | HKU Theses Online (HKUTO) | - |
dc.rights | The author retains all proprietary rights, (such as patent rights) and the right to use in future works. | - |
dc.rights | This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. | - |
dc.subject.lcsh | Flash memories (Computers) | - |
dc.title | SIMPO : a scalable in-memory persistent object framework using NVRAM for reliable big data computing | - |
dc.type | PG_Thesis | - |
dc.description.thesisname | Doctor of Philosophy | - |
dc.description.thesislevel | Doctoral | - |
dc.description.thesisdiscipline | Computer Science | - |
dc.description.nature | published_or_final_version | - |
dc.identifier.doi | 10.5353/th_991044046590503414 | - |
dc.date.hkucongregation | 2018 | - |
dc.identifier.mmsid | 991044046590503414 | - |