SIMPO : a scalable in-memory persistent object framework using NVRAM for reliable big data computing

Zhang, Mingzhe; 张明哲

DOI: 10.5353/th_991044046590503414

Appears in Collections:
- HKU Theses Online
- Computer Science: Theses

postgraduate thesis: SIMPO : a scalable in-memory persistent object framework using NVRAM for reliable big data computing

Title	SIMPO : a scalable in-memory persistent object framework using NVRAM for reliable big data computing
Authors	Zhang, Mingzhe 张明哲
Advisors	Wang, CL
Issue Date	2018
Publisher	The University of Hong Kong (Pokfulam, Hong Kong)
Citation	Zhang, M. [张明哲]. (2018). SIMPO : a scalable in-memory persistent object framework using NVRAM for reliable big data computing. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR.
Abstract	While CPU architectures are incorporating many more cores to meet ever-bigger workloads, advances in scalable fault-tolerance support are indispensable for sustaining system performance under reliability constraints. Emerging non-volatile memory (NVRAM) technologies are yielding fast, dense, energy-efficient, and byte-addressable NVRAM that can dethrone SSDs and HDDs for persisting data. Research on using NVRAM to enable fast in-memory data persistence is ongoing. It is both attractive and important to design persistent data storage, propose efficient logging and checkpointing mechanisms, and optimize program scalability on multi-socket, multi-core machines with NVRAM. To achieve data persistence, state-of-the-art techniques apply logging and checkpointing mechanisms. Log-based systems fall into two types: value logging and function logging. For every function or transaction on persistent data, value-logging systems record the updated data. Other systems use function logging (also known as operation logging) together with checkpointing, recording the function pointer and its arguments. In this work, we design and implement a scalable persistent data system that exploits NVRAM, alongside DRAM, to support efficient data persistence in highly threaded big data applications. We study both value logging and function logging mechanisms and analyze the applications each is suited to. Building on function logging, we develop and optimize a log recording and checkpointing scheme for NVRAM, mitigating its long write latency through write-combining and consolidated flushing techniques. Efficient persistent object management, with features including safe references and memory leak prevention, is also implemented and tailored to NVRAM. Building on value logging, we design a fine-grained differential logging scheme that flushes noncontiguous dirty data to NVRAM at cache-line granularity to guarantee persistence. Unlike previous work that persists updated data at block granularity, this scheme reduces the size of the log data written to NVRAM. To achieve multi-core scalability, we study the flat combining technique. By integrating function logging mechanisms with flat combining, we propose a new programming model that classifies functions into instant and deferrable groups. We feature a streamlined execution model that allows lazy evaluation of deferrable functions and is well suited to big data computing workloads, which would see improved data locality and concurrency. We evaluate a wide range of applications with machine learning, high-performance computing, database, and big data workloads. Experiments show that our approaches can significantly improve the scalability and runtime performance of data persistence systems on multi-core architectures.
Degree	Doctor of Philosophy
Subject	Flash memories (Computers)
Dept/Program	Computer Science
Persistent Identifier	http://hdl.handle.net/10722/263213
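
The abstract contrasts value logging (record the updated data) with function logging (record the function and its arguments). The Python sketch below is a minimal illustration of that distinction only. It is not taken from the thesis: the names `ValueLog`, `FunctionLog`, and `deposit` are hypothetical, plain in-memory lists stand in for NVRAM logs, and checkpointing, flushing, and concurrency are omitted.

```python
# Illustrative sketch only: all names here are hypothetical, and ordinary
# Python lists stand in for a persistent log region in NVRAM.

_MISSING = object()  # sentinel so a key that was absent counts as "changed"


def deposit(state, account, amount):
    """Example operation on the persistent data (a plain dict here)."""
    state[account] = state.get(account, 0) + amount


class ValueLog:
    """Value logging: after each operation, record the data it updated."""

    def __init__(self):
        self.entries = []  # list of (key, new_value)

    def run(self, state, func, *args):
        before = dict(state)
        func(state, *args)
        # Record only entries whose value changed (deletions are not
        # handled in this sketch).
        for key, value in state.items():
            if before.get(key, _MISSING) != value:
                self.entries.append((key, value))

    def recover(self):
        # Recovery reapplies the logged values in order.
        state = {}
        for key, value in self.entries:
            state[key] = value
        return state


class FunctionLog:
    """Function logging: record which function ran and with what arguments."""

    def __init__(self):
        self.entries = []  # list of (function, args)

    def run(self, state, func, *args):
        self.entries.append((func, args))  # log first, then execute
        func(state, *args)

    def recover(self):
        # Recovery re-executes the logged functions in order, so the
        # functions must be deterministic.
        state = {}
        for func, args in self.entries:
            func(state, *args)
        return state
```

In this sketch the value log grows with the amount of data each operation touches, while the function log grows with the number of operations and pays for re-execution at recovery time. That trade-off is one reason function logging is usually paired with checkpointing, as the abstract notes.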

DC Field	Value	Language
dc.contributor.advisor	Wang, CL	-
dc.contributor.author	Zhang, Mingzhe	-
dc.contributor.author	张明哲	-
dc.date.accessioned	2018-10-16T07:35:01Z	-
dc.date.available	2018-10-16T07:35:01Z	-
dc.date.issued	2018	-
dc.identifier.citation	Zhang, M. [张明哲]. (2018). SIMPO : a scalable in-memory persistent object framework using NVRAM for reliable big data computing. (Thesis). University of Hong Kong, Pokfulam, Hong Kong SAR.	-
dc.identifier.uri	http://hdl.handle.net/10722/263213	-
dc.description.abstract	While CPU architectures are incorporating many more cores to meet ever-bigger workloads, advances in scalable fault-tolerance support are indispensable for sustaining system performance under reliability constraints. Emerging non-volatile memory (NVRAM) technologies are yielding fast, dense, energy-efficient, and byte-addressable NVRAM that can dethrone SSDs and HDDs for persisting data. Research on using NVRAM to enable fast in-memory data persistence is ongoing. It is both attractive and important to design persistent data storage, propose efficient logging and checkpointing mechanisms, and optimize program scalability on multi-socket, multi-core machines with NVRAM. To achieve data persistence, state-of-the-art techniques apply logging and checkpointing mechanisms. Log-based systems fall into two types: value logging and function logging. For every function or transaction on persistent data, value-logging systems record the updated data. Other systems use function logging (also known as operation logging) together with checkpointing, recording the function pointer and its arguments. In this work, we design and implement a scalable persistent data system that exploits NVRAM, alongside DRAM, to support efficient data persistence in highly threaded big data applications. We study both value logging and function logging mechanisms and analyze the applications each is suited to. Building on function logging, we develop and optimize a log recording and checkpointing scheme for NVRAM, mitigating its long write latency through write-combining and consolidated flushing techniques. Efficient persistent object management, with features including safe references and memory leak prevention, is also implemented and tailored to NVRAM. Building on value logging, we design a fine-grained differential logging scheme that flushes noncontiguous dirty data to NVRAM at cache-line granularity to guarantee persistence. Unlike previous work that persists updated data at block granularity, this scheme reduces the size of the log data written to NVRAM. To achieve multi-core scalability, we study the flat combining technique. By integrating function logging mechanisms with flat combining, we propose a new programming model that classifies functions into instant and deferrable groups. We feature a streamlined execution model that allows lazy evaluation of deferrable functions and is well suited to big data computing workloads, which would see improved data locality and concurrency. We evaluate a wide range of applications with machine learning, high-performance computing, database, and big data workloads. Experiments show that our approaches can significantly improve the scalability and runtime performance of data persistence systems on multi-core architectures.	-
dc.language	eng	-
dc.publisher	The University of Hong Kong (Pokfulam, Hong Kong)	-
dc.relation.ispartof	HKU Theses Online (HKUTO)	-
dc.rights	The author retains all proprietary rights (such as patent rights) and the right to use in future works.	-
dc.rights	This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.	-
dc.subject.lcsh	Flash memories (Computers)	-
dc.title	SIMPO : a scalable in-memory persistent object framework using NVRAM for reliable big data computing	-
dc.type	PG_Thesis	-
dc.description.thesisname	Doctor of Philosophy	-
dc.description.thesislevel	Doctoral	-
dc.description.thesisdiscipline	Computer Science	-
dc.description.nature	published_or_final_version	-
dc.identifier.doi	10.5353/th_991044046590503414	-
dc.date.hkucongregation	2018	-
dc.identifier.mmsid	991044046590503414	-
