Software Architecture for Fault-Tolerant Multicore Computing with Hybridized Non-Volatile Memories
Professor Wang, Cho Li (Principal investigator)
Fault tolerance, STT-MRAM, Many-core, Operating systems, Non-volatile memory
RGC General Research Fund (GRF)
1) Propose a new fault-tolerant multicore architecture built from next-generation non-volatile memory (e.g., STT-MRAM), used as last-level cache (LLC), on-chip programmable scratchpad memory, and off-chip memory.
2) Revamp the Linux operating system to exploit this hybridized memory hierarchy, which combines caches and programmable scratchpads built from SRAM and STT-MRAM. A novel persistent process model and a persistent page table design are proposed to provide native fault tolerance for program execution against transient and crash failures.
3) Explore new data affinity techniques tailored to big data computing along this memory hierarchy. In particular, we propose anti-caching of datasets whose access via the conventional cacheable datapath could cause serious cache pollution. The anti-caching mechanism exploits the on-chip programmable memory.
4) Failure atomicity using non-volatile memory: we will design and implement kernel modules that ensure a globally consistent state of data stored across non-volatile MRAM and volatile memory components (DRAM, caches, store buffers, etc.), using variants of read-copy-update (RCU) synchronization (and transactional semantics, for comparison) in the OS.
5) Propose a new fault-tolerant programming model built atop a new object abstraction, called "FT-Object", to facilitate fast recovery. It is supplemented with a library of common data structures such as trees, lists, and maps, built from FT-Object, to combine fault tolerance with user-friendly programming.
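The copy-then-publish idea behind read-copy-update in objective 4 can be illustrated with a small user-space sketch. This is not the project's kernel design: the class name `FTObject` and its methods `read` and `update` are hypothetical, chosen only to echo the FT-Object abstraction in objective 5, and the sketch does not model non-volatile memory, cache flushing, or RCU grace periods. It shows only the property that an update interrupted before publication leaves the previously published state intact.

```python
import copy
import threading


class FTObject:
    """Hypothetical stand-in for an FT-Object; all names here are assumptions."""

    def __init__(self, value):
        self._published = value                 # version currently visible to readers
        self._writer_lock = threading.Lock()    # serializes writers only

    def read(self):
        # Readers take no lock; they see whichever version was last published.
        return self._published

    def update(self, mutate):
        with self._writer_lock:
            draft = copy.deepcopy(self._published)  # copy the current version
            mutate(draft)                            # update the private copy
            self._published = draft                  # publish with one reference swap


def failing_withdrawal(state):
    # Simulated failure in the middle of an update.
    state["balance"] -= 50
    raise RuntimeError("simulated crash")


account = FTObject({"balance": 100})

try:
    account.update(failing_withdrawal)
except RuntimeError:
    pass  # the half-finished draft was never published

state_after_failure = account.read()

account.update(lambda state: state.update(balance=70))
state_after_success = account.read()
```

Because the mutation runs on a private copy, the failed withdrawal never becomes visible, while the successful update replaces the published version in a single step. A real implementation on non-volatile memory would additionally need ordered, durable writes for the publish step and a reclamation policy for old versions.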